## Attention-based Extraction of Structured Information from Street View Imagery

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/attention-based-extraction-of-structured/optical-character-recognition-on-fsns-test)](https://paperswithcode.com/sota/optical-character-recognition-on-fsns-test?p=attention-based-extraction-of-structured)
[![Paper](http://img.shields.io/badge/paper-arXiv.1704.03549-B3181B.svg)](https://arxiv.org/abs/1704.03549)
[![TensorFlow 1.15](https://img.shields.io/badge/tensorflow-1.15-brightgreen)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)

*A TensorFlow model for real-world image text extraction problems.*

This folder contains the code needed to train a new Attention OCR model on the
[FSNS dataset][FSNS] to transcribe street names in France. You can also use it
to train the model on your own data.

More details can be found in our paper:

["Attention-based Extraction of Structured Information from Street View
Imagery"](https://arxiv.org/abs/1704.03549)

## Contacts

Authors

* Zbigniew Wojna (zbigniewwojna@gmail.com)
* Alexander Gorban (gorban@google.com)

Maintainer: Xavier Gibert [@xavigibert](https://github.com/xavigibert)

## Requirements

1. Install the TensorFlow library ([instructions][TF]). For example:

```
python3 -m venv ~/.tensorflow
source ~/.tensorflow/bin/activate
pip install --upgrade pip
pip install --upgrade tensorflow-gpu==1.15
```

2. At least 158GB of free disk space to download the FSNS dataset:

```
cd research/attention_ocr/python/datasets
aria2c -c -j 20 -i ../../../street/python/fsns_urls.txt
cd ..
```

3. 16GB of RAM or more; 32GB is recommended.
4. `train.py` works with both CPU and GPU, though using a GPU is preferable. It has been tested with a Titan X and a GTX 980.

[TF]: https://www.tensorflow.org/install/
[FSNS]: https://github.com/tensorflow/models/tree/master/research/street

## How to use this code

To run all unit tests:

```
cd research/attention_ocr/python
find . -name "*_test.py" -printf '%P\n' | xargs python3 -m unittest
```

To train from scratch:

```
python train.py
```

To train a model using pre-trained Inception weights as initialization:

```
wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
tar xf inception_v3_2016_08_28.tar.gz
python train.py --checkpoint_inception=./inception_v3.ckpt
```

To fine-tune the Attention OCR model using a checkpoint:

```
wget http://download.tensorflow.org/models/attention_ocr_2017_08_09.tar.gz
tar xf attention_ocr_2017_08_09.tar.gz
python train.py --checkpoint=model.ckpt-399731
```

## How to use your own image data to train the model

You need to define a new dataset. There are two options:

1. Store data in the same format as the FSNS dataset and just reuse the
[python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/datasets/fsns.py)
module. E.g., create a file `datasets/newtextdataset.py`:
```
import fsns

DEFAULT_DATASET_DIR = 'path/to/the/dataset'

DEFAULT_CONFIG = {
    'name': 'MYDATASET',
    'splits': {
        'train': {
            'size': 123,  # Number of examples in the train split.
            'pattern': 'tfexample_train*'
        },
        'test': {
            'size': 123,  # Number of examples in the test split.
            'pattern': 'tfexample_test*'
        }
    },
    'charset_filename': 'charset_size.txt',
    'image_shape': (150, 600, 3),
    'num_of_views': 4,
    'max_sequence_length': 37,
    'null_code': 42,
    'items_to_descriptions': {
        'image': 'A [150 x 600 x 3] color image.',
        'label': 'Character codes.',
        'text': 'A unicode string.',
        'length': 'A length of the encoded text.',
        'num_of_views': 'A number of different views stored within the image.'
    }
}


def get_split(split_name, dataset_dir=None, config=None):
  if not dataset_dir:
    dataset_dir = DEFAULT_DATASET_DIR
  if not config:
    config = DEFAULT_CONFIG

  return fsns.get_split(split_name, dataset_dir, config)
```
You will also need to register it in `datasets/__init__.py` (a minimal sketch of
that registration follows the command below) and specify the dataset name on the
command line:

```
python train.py --dataset_name=newtextdataset
```
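The registration step just makes the new module importable from the `datasets`
package so that the training script can look it up via the `--dataset_name` flag.
A minimal sketch of what to add to `datasets/__init__.py` (the real file may
differ, so verify its actual contents):

```
# Expose the new dataset module next to the existing ones.
from datasets import fsns
from datasets import newtextdataset
```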

Please note that `eval.py` will also require the same flag.

To learn how to store data in the FSNS format, please refer to
https://stackoverflow.com/a/44461910/743658.
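As a rough illustration, the snippet below builds one FSNS-style `tf.Example`
using the feature keys that `fsns.py` reads; the helper name and the argument
values are hypothetical, so verify the details against the linked answer.

```
import tensorflow as tf


def _int64_feature(values):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


def _bytes_feature(values):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))


def make_fsns_example(png_bytes, char_ids_padded, char_ids_unpadded, text,
                      width=600):
  """Hypothetical helper: builds one FSNS-style example for a single image."""
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': _bytes_feature([png_bytes]),
      'image/format': _bytes_feature([b'png']),
      'image/width': _int64_feature([width]),
      'image/orig_width': _int64_feature([width]),
      'image/class': _int64_feature(char_ids_padded),
      'image/unpadded_class': _int64_feature(char_ids_unpadded),
      'image/text': _bytes_feature([text.encode('utf-8')]),
  }))
```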

2. Define a new dataset format. The model needs the following data to train:

- images: input images, shape [batch_size x H x W x 3];
- labels: ground truth label ids, shape [batch_size x seq_length];
- labels_one_hot: labels in one-hot encoding, shape [batch_size x seq_length x num_char_classes].

Refer to [python/data_provider.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/data_provider.py#L33)
for more details. You can use [python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/datasets/fsns.py)
as an example.
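For orientation, here is a toy sketch of those three tensors; the shapes are
illustrative only (134 is the FSNS charset size), since the real pipeline builds
these tensors from TFRecords.

```
import tensorflow as tf

batch_size, height, width = 32, 150, 600
seq_length, num_char_classes = 37, 134

# images: input images, float32, shape [batch_size, H, W, 3].
images = tf.placeholder(tf.float32, [batch_size, height, width, 3])
# labels: ground truth character ids, shape [batch_size, seq_length].
labels = tf.placeholder(tf.int64, [batch_size, seq_length])
# labels_one_hot: the same labels in one-hot form,
# shape [batch_size, seq_length, num_char_classes].
labels_one_hot = tf.one_hot(labels, depth=num_char_classes)
```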

## How to use a pre-trained model

The inference part has not been released yet, but it is straightforward to
implement in Python or C++.

The recommended way is to use the [Serving infrastructure][serving].

To export to SavedModel format:

```
python model_export.py \
  --checkpoint=model.ckpt-399731 \
  --export_dir=/tmp/attention_ocr_export
```
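Once exported, the SavedModel can be loaded back in Python. The sketch below only
lists the available signatures, because the exact input and output tensor names
depend on the export script; adjust `export_dir` if the exporter writes a
versioned subdirectory.

```
import tensorflow as tf

export_dir = '/tmp/attention_ocr_export'  # same path as --export_dir above

with tf.Session(graph=tf.Graph()) as sess:
  # Load the SavedModel and print its signature names together with the
  # input/output keys of each signature.
  meta_graph = tf.saved_model.loader.load(
      sess, [tf.saved_model.tag_constants.SERVING], export_dir)
  for name, signature in meta_graph.signature_def.items():
    print(name, list(signature.inputs), list(signature.outputs))
```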

Alternatively you can (a combined sketch of these steps follows the list):
1. define a placeholder for images (or directly use a NumPy array);
2. [create a graph](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/eval.py#L60):
```
endpoints = model.create_base(images_placeholder, labels_one_hot=None)
```
3. [load a pretrained model](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/model.py#L494);
4. run computations through the graph:
```
predictions = sess.run(endpoints.predicted_chars,
                       feed_dict={images_placeholder: images_actual_data})
```
5. Convert character IDs (predictions) to UTF-8 using the provided charset file.

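Put together, a minimal sketch of these five steps might look like the following.
It assumes TensorFlow 1.x graph mode, the FSNS-style values shown earlier in this
README (134 character classes, sequence length 37, 4 views, null code 42,
150x600 images), the checkpoint downloaded above, and a hypothetical
`load_images()` helper that returns a float32 NumPy array of shape
`[1, 150, 600, 3]`.

```
import tensorflow as tf
import model as attention_ocr  # research/attention_ocr/python/model.py

# Step 1: a placeholder for a batch with a single input image.
images_placeholder = tf.placeholder(tf.float32, shape=[1, 150, 600, 3])

# Step 2: build the inference graph. The positional arguments are
# num_char_classes, seq_length, num_views and null_code.
ocr_model = attention_ocr.Model(134, 37, 4, 42)
endpoints = ocr_model.create_base(images_placeholder, labels_one_hot=None)

saver = tf.train.Saver()
with tf.Session() as sess:
  sess.run(tf.tables_initializer())
  # Step 3: load the pre-trained weights.
  saver.restore(sess, 'model.ckpt-399731')
  # Step 4: run the graph on actual image data.
  images_actual_data = load_images()  # hypothetical helper, see above
  predictions = sess.run(endpoints.predicted_chars,
                         feed_dict={images_placeholder: images_actual_data})
  # Step 5: map the predicted character ids back to UTF-8 text using the
  # charset file that ships with the dataset.
```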
191
192
193
194
195
196
197
198
Please note that tensor names may change over time and old stored checkpoints can
become unloadable. In many cases such backward-incompatible changes can be
fixed with a [string substitution][1] to update the checkpoint itself, or by using a
custom var_list with [assign_from_checkpoint_fn][2]. For anything
other than a one-time experiment, please use [TensorFlow Serving][serving].

[1]: https://github.com/tensorflow/tensorflow/blob/aaf7adc/tensorflow/contrib/rnn/python/tools/checkpoint_convert.py
[2]: https://www.tensorflow.org/api_docs/python/tf/contrib/framework/assign_from_checkpoint_fn
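For illustration, restoring through a custom `var_list` might look roughly like
this; the `old_scope/` prefix is a hypothetical example of a renamed variable
scope.

```
import tensorflow as tf

# Map the variable names stored in the old checkpoint to the variables of the
# currently built graph, then restore only that subset.
var_list = {'old_scope/' + v.op.name: v for v in tf.global_variables()}
init_fn = tf.contrib.framework.assign_from_checkpoint_fn(
    'model.ckpt-399731', var_list, ignore_missing_vars=True)

with tf.Session() as sess:
  init_fn(sess)
```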
[serving]: https://www.tensorflow.org/tfx/serving/serving_basic

## Disclaimer

This code is a modified version of the internal model we used for our paper.
Currently it reaches 83.79% full sequence accuracy after 400k steps of training.

The main differences between this version and the version used in the paper are:
for the paper we used distributed training with 50 GPU (K80) workers and
asynchronous updates, whereas the provided checkpoint was created with this code
after ~6 days of training on a single GPU (Titan X), reaching 81% after 24 hours
of training; in addition, the coordinate encoding is disabled by default.