# Attention-based Extraction of Structured Information from Street View Imagery

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/attention-based-extraction-of-structured/optical-character-recognition-on-fsns-test)](https://paperswithcode.com/sota/optical-character-recognition-on-fsns-test?p=attention-based-extraction-of-structured)
[![Paper](http://img.shields.io/badge/paper-arXiv.1704.03549-B3181B.svg)](https://arxiv.org/abs/1704.03549)
[![TensorFlow 1.15](https://img.shields.io/badge/tensorflow-1.15-brightgreen)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)

*A TensorFlow model for real-world image text extraction problems.*

This folder contains the code needed to train a new Attention OCR model on the
[FSNS dataset][FSNS] to transcribe street names in France. You can also train the model on your own data.

More details can be found in our paper:

["Attention-based Extraction of Structured Information from Street View
Imagery"](https://arxiv.org/abs/1704.03549)

## Description

* The paper presents a model based on ConvNets, RNNs, and a novel attention mechanism.
It achieves **84.2%** on FSNS, beating the previous benchmark (**72.46%**), and also
studies the speed/accuracy tradeoff that results from using CNN feature extractors of
different depths.

## Contacts

Authors

* Zbigniew Wojna (zbigniewwojna@gmail.com)
* Alexander Gorban (gorban@google.com)

Maintainer

* Xavier Gibert ([@xavigibert](https://github.com/xavigibert))

## Table of Contents

* [Requirements](https://github.com/tensorflow/models/blob/master/research/attention_ocr/README.md#requirements)
* [Dataset](https://github.com/tensorflow/models/blob/master/research/attention_ocr/README.md#dataset)
* [How to use this code](https://github.com/tensorflow/models/blob/master/research/attention_ocr/README.md#how-to-use-this-code)
* [Using your own image data](https://github.com/tensorflow/models/blob/master/research/attention_ocr/README.md#using-your-own-image-data)
* [How to use a pre-trained model](https://github.com/tensorflow/models/blob/master/research/attention_ocr/README.md#how-to-use-a-pre-trained-model)
* [Disclaimer](https://github.com/tensorflow/models/blob/master/research/attention_ocr/README.md#disclaimer)

## Requirements

1. Install the TensorFlow library ([instructions][TF]). For example:

```
python3 -m venv ~/.tensorflow
source ~/.tensorflow/bin/activate
pip install --upgrade pip
pip install --upgrade tensorflow-gpu==1.15
```
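
To verify the installation, you can print the installed TensorFlow version:

```
python3 -c "import tensorflow as tf; print(tf.__version__)"
```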

2. At least 158GB of free disk space to download the FSNS dataset:

```
cd research/attention_ocr/python/datasets
aria2c -c -j 20 -i ../../../street/python/fsns_urls.txt
cd ..
```

3. 16GB of RAM or more; 32GB is recommended.
4. `train.py` works with both CPU and GPU, though a GPU is preferable. It has been tested with a Titan X and a GTX 980.

[TF]: https://www.tensorflow.org/install/
[FSNS]: https://github.com/tensorflow/models/tree/master/research/street

## Dataset

The French Street Name Signs (FSNS) dataset is split into subsets, 
each of which is composed of multiple files. Note that these datasets 
are very large. The approximate sizes are:

* Train: 512 files of 300MB each.
* Validation: 64 files of 40MB each.
* Test: 64 files of 50MB each.
* The download also includes a directory `testdata` containing several 
small datasets, each just large enough to verify that models can 
actually learn something.
* Total: around 158GB.

The download paths are in the following list:

```
https://download.tensorflow.org/data/fsns-20160927/charset_size=134.txt
https://download.tensorflow.org/data/fsns-20160927/test/test-00000-of-00064
...
https://download.tensorflow.org/data/fsns-20160927/test/test-00063-of-00064
https://download.tensorflow.org/data/fsns-20160927/testdata/arial-32-00000-of-00001
https://download.tensorflow.org/data/fsns-20160927/testdata/fsns-00000-of-00001
https://download.tensorflow.org/data/fsns-20160927/testdata/mnist-sample-00000-of-00001
https://download.tensorflow.org/data/fsns-20160927/testdata/numbers-16-00000-of-00001
https://download.tensorflow.org/data/fsns-20160927/train/train-00000-of-00512
...
https://download.tensorflow.org/data/fsns-20160927/train/train-00511-of-00512
https://download.tensorflow.org/data/fsns-20160927/validation/validation-00000-of-00064
...
https://download.tensorflow.org/data/fsns-20160927/validation/validation-00063-of-00064
```

All URLs are stored in the [research/street](https://github.com/tensorflow/models/tree/master/research/street) 
repository in the text file `python/fsns_urls.txt`.
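
Each of these files is a TFRecord file of serialized `tf.train.Example` protos.
If you want to peek inside a downloaded shard, here is a minimal sketch; the
feature keys (e.g. `image/text`) are the ones read by `datasets/fsns.py`:

```
import tensorflow as tf  # TF 1.15

# Any downloaded shard works; the small testdata file is convenient.
path = 'testdata/fsns-00000-of-00001'
for record in tf.python_io.tf_record_iterator(path):
  example = tf.train.Example.FromString(record)
  features = example.features.feature
  print(sorted(features.keys()))
  print(features['image/text'].bytes_list.value[0].decode('utf-8'))
  break
```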

## How to use this code

To run all unit tests:

```
cd research/attention_ocr/python
find . -name "*_test.py" -printf '%P\n' | xargs python3 -m unittest
```

To train from scratch:

```
python train.py
```

To train a model using pre-trained Inception weights as initialization:

```
wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
tar xf inception_v3_2016_08_28.tar.gz
python train.py --checkpoint_inception=./inception_v3.ckpt
```

To fine-tune the Attention OCR model using a checkpoint:

```
wget http://download.tensorflow.org/models/attention_ocr_2017_08_09.tar.gz
tar xf attention_ocr_2017_08_09.tar.gz
python train.py --checkpoint=model.ckpt-399731
```

## Using your own image data

You need to define a new dataset. There are two options:

1. Store data in the same format as the FSNS dataset and just reuse the
[python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/datasets/fsns.py)
module. E.g., create a file `datasets/newtextdataset.py`:
```
import fsns

DEFAULT_DATASET_DIR = 'path/to/the/dataset'

DEFAULT_CONFIG = {
    'name': 'MYDATASET',
    'splits': {
        'train': {
            'size': 123,
            'pattern': 'tfexample_train*'
        },
        'test': {
            'size': 123,
            'pattern': 'tfexample_test*'
        }
    },
    'charset_filename': 'charset_size.txt',
    'image_shape': (150, 600, 3),
    'num_of_views': 4,
    'max_sequence_length': 37,
    'null_code': 42,
    'items_to_descriptions': {
        'image': 'A [150 x 600 x 3] color image.',
        'label': 'Character codes.',
        'text': 'A unicode string.',
        'length': 'The length of the encoded text.',
        'num_of_views': 'The number of different views stored within the image.'
    }
}


def get_split(split_name, dataset_dir=None, config=None):
  if not dataset_dir:
    dataset_dir = DEFAULT_DATASET_DIR
  if not config:
    config = DEFAULT_CONFIG

  return fsns.get_split(split_name, dataset_dir, config)
```
You will also need to register it in `datasets/__init__.py` (a sketch of the
registration follows the command below) and specify the dataset name on the
command line:

```
python train.py --dataset_name=newtextdataset
```
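
The exact contents of `datasets/__init__.py` depend on the version of this
repository you have checked out; a minimal, hypothetical sketch of the
registration, assuming the file simply imports each dataset module by name:

```
# datasets/__init__.py (sketch only -- adapt to the existing file)
from datasets import fsns
from datasets import newtextdataset  # the module defined above

__all__ = ['fsns', 'newtextdataset']
```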

Please note that `eval.py` will also require the same flag.

To learn how to store data in the FSNS format, please refer to this
[Stack Overflow answer](https://stackoverflow.com/a/44461910/743658).

2. Define a new dataset format. The model needs the following data to train:

- images: input images, shape [batch_size x H x W x 3];
- labels: ground truth label ids, shape [batch_size x seq_length];
- labels_one_hot: labels in one-hot encoding, shape [batch_size x seq_length x num_char_classes].

Refer to [python/data_provider.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/data_provider.py#L33)
for more details. You can use [python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/datasets/fsns.py)
as an example. A sketch of how these tensors relate follows.
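
As a minimal sketch, assuming TF 1.15 and illustrative shape values, the
one-hot labels can be derived from the label ids with `tf.one_hot`:

```
import tensorflow as tf  # TF 1.15

batch_size, seq_length, num_char_classes = 32, 37, 134

# Input images, already decoded and resized to the dataset's image_shape.
images = tf.placeholder(tf.float32, shape=[batch_size, 150, 600, 3])
# Ground truth character ids for each position in the sequence.
labels = tf.placeholder(tf.int64, shape=[batch_size, seq_length])
# One-hot encoding derived from the ids.
labels_one_hot = tf.one_hot(labels, depth=num_char_classes)
```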

## How to use a pre-trained model

The inference part has not been released yet, but it is straightforward to
implement in Python or C++.

The recommended way is to use the [Serving infrastructure][serving].

To export to SavedModel format:

```
python model_export.py \
  --checkpoint=model.ckpt-399731 \
  --export_dir=/tmp/attention_ocr_export
```
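
Once exported, the model can be served with [TensorFlow Serving][serving] or
loaded in-process. Below is a sketch using `tf.contrib.predictor` (TF 1.x); the
`'images'` and `'predictions'` signature names here are assumptions, so check
the actual names with `saved_model_cli show --dir /tmp/attention_ocr_export --all`:

```
import numpy as np
import tensorflow as tf  # TF 1.15

predict_fn = tf.contrib.predictor.from_saved_model('/tmp/attention_ocr_export')

# NOTE: the signature keys below are hypothetical; verify with saved_model_cli.
batch = np.zeros((1, 150, 600, 3), dtype=np.float32)  # one blank FSNS-sized image
outputs = predict_fn({'images': batch})
print(outputs['predictions'])
```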

Alternatively you can:
1. define a placeholder for images (or use a NumPy array directly);
2. [create a graph](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/eval.py#L60):

```
endpoints = model.create_base(images_placeholder, labels_one_hot=None)
```

3. [load a pretrained model](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/model.py#L494);
4. run computations through the graph:

```
predictions = sess.run(endpoints.predicted_chars,
                       feed_dict={images_placeholder: images_actual_data})
```

5. convert character IDs (predictions) to UTF-8 using the provided charset file, as sketched below.
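
A minimal sketch of step 5, assuming the charset file stores one
`id<TAB>character` mapping per line (as in the FSNS `charset_size=134.txt`,
where the special `<nul>` token marks padding):

```
def read_charset(filename):
  """Loads an id -> character mapping from a tab-separated charset file."""
  charset = {}
  with open(filename, encoding='utf-8') as f:
    for line in f:
      char_id, char = line.rstrip('\n').split('\t', 1)
      charset[int(char_id)] = '' if char == '<nul>' else char
  return charset

# `predictions` has shape [batch_size, seq_length], as produced in step 4.
charset = read_charset('charset_size=134.txt')
texts = [''.join(charset[char_id] for char_id in row) for row in predictions]
```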

Please note that tensor names may change over time and old stored checkpoints can
become unloadable. In many cases such backward-incompatible changes can be
fixed with a [string substitution][1] to update the checkpoint itself, or by using a
custom var_list with [assign_from_checkpoint_fn][2]. For anything
other than a one-time experiment, please use [TensorFlow Serving][serving].

[1]: https://github.com/tensorflow/tensorflow/blob/aaf7adc/tensorflow/contrib/rnn/python/tools/checkpoint_convert.py
[2]: https://www.tensorflow.org/api_docs/python/tf/contrib/framework/assign_from_checkpoint_fn
[serving]: https://www.tensorflow.org/tfx/serving/serving_basic

## Disclaimer

This code is a modified version of the internal model we used for our paper.
It currently reaches 83.79% full sequence accuracy after 400k steps of training.
The main differences between this version and the version used in the paper: for
the paper we used distributed training with 50 GPU (K80) workers (asynchronous
updates), while the provided checkpoint was created using this code after ~6 days of
training on a single GPU (Titan X), reaching 81% after 24 hours of training; in
addition, the coordinate encoding is disabled by default.