## Attention-based Extraction of Structured Information from Street View Imagery

*A TensorFlow model for real-world image text extraction problems.*

This folder contains the code needed to train a new Attention OCR model on the
[FSNS dataset][FSNS] to transcribe street names in France. You can
also use it to train a model on your own data.

More details can be found in our paper:

["Attention-based Extraction of Structured Information from Street View
Imagery"](https://arxiv.org/abs/1704.03549)

## Contacts

Authors:
Zbigniew Wojna <zbigniewwojna@gmail.com>,
Alexander Gorban <gorban@google.com>

Pull requests:
[alexgorban](https://github.com/alexgorban)

## Requirements

1. Install the TensorFlow library ([instructions][TF]). For example:

```
virtualenv --system-site-packages ~/.tensorflow
source ~/.tensorflow/bin/activate
pip install --upgrade pip
pip install --upgrade tensorflow_gpu
```

2. At least 158GB of free disk space to download the FSNS dataset:

```
cd models/attention_ocr/python/datasets
aria2c -c -j 20 -i ../../../street/python/fsns_urls.txt
cd ..
```

3. 16GB of RAM or more; 32GB is recommended.
4. `train.py` works with both CPU and GPU, though using GPU is preferable. It has been tested with a Titan X and with a GTX980.
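
The disk requirement above can be checked before starting the download. The following is a minimal sketch using only the Python standard library. The helper name `has_enough_disk` and the constant `REQUIRED_GB` are illustrative and not part of this repository, and the sketch assumes 1 GB means 10^9 bytes.

```python
import shutil

# 158 GB comes from the requirements list; treating 1 GB as 10**9 bytes
# is an assumption of this sketch.
REQUIRED_GB = 158


def has_enough_disk(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    if not has_enough_disk("."):
        print("Less than %d GB free; the FSNS download may not fit." % REQUIRED_GB)
    # The download command in this README relies on aria2c being installed.
    if shutil.which("aria2c") is None:
        print("aria2c was not found on PATH; install it before downloading.")
```

Run it from the directory where the dataset will be stored, since free space is reported per filesystem.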

[TF]: https://www.tensorflow.org/install/
[FSNS]: https://github.com/tensorflow/models/tree/master/street

## How to use this code

To run all unit tests:

```
python -m unittest discover -p '*_test.py'
```

To train from scratch:

```
python train.py
```

To train a model using pre-trained Inception weights as initialization:

```
wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
tar xf inception_v3_2016_08_28.tar.gz
python train.py --checkpoint_inception=inception_v3.ckpt
```

To fine-tune the Attention OCR model using a checkpoint:

```
wget http://download.tensorflow.org/models/attention_ocr_2017_05_17.tar.gz
tar xf attention_ocr_2017_05_17.tar.gz
python train.py --checkpoint=model.ckpt-399731
```
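
The checkpoint name `model.ckpt-399731` appears to follow the usual TensorFlow convention of appending the global training step to the checkpoint prefix, which would be consistent with the roughly 400k training steps mentioned in the Disclaimer. A small sketch, under that assumption, for reading the step out of such a name (the helper `checkpoint_step` is hypothetical and not part of this code base):

```python
import re


def checkpoint_step(checkpoint_path):
    """Return the trailing step number of a name like 'model.ckpt-399731'.

    Returns None when the name has no '-<digits>' suffix.
    """
    match = re.search(r"-(\d+)$", checkpoint_path)
    return int(match.group(1)) if match else None


print(checkpoint_step("model.ckpt-399731"))  # 399731
print(checkpoint_step("inception_v3.ckpt"))  # None
```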

## Disclaimer

This code is a modified version of the internal model we used for our paper.
Currently it reaches 83.79% full sequence accuracy after 400k steps of training.
There are two main differences between this version and the version used in the
paper. First, for the paper we used distributed training with 50 GPU (K80)
workers (asynchronous updates), whereas the provided checkpoint was created
using this code after ~6 days of training on a single GPU (Titan X); it reached
81% after 24 hours of training. Second, the coordinate encoding is missing
(TODO(alexgorban@)).
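
For readers unfamiliar with the metric: full sequence accuracy is usually understood as the fraction of images whose entire predicted string matches the ground truth exactly, so a single wrong character makes the whole example count as incorrect. The sketch below illustrates that definition in plain Python. It is not the metric implementation used by this code, and the function name and sample strings are made up for illustration.

```python
def full_sequence_accuracy(predictions, targets):
    """Fraction of predictions that match their target string exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical transcriptions; the third one differs by a single character.
predicted = ["Rue de la Paix", "Avenue Foch", "Rue Victor Hugp", "Place Bellecour"]
expected = ["Rue de la Paix", "Avenue Foch", "Rue Victor Hugo", "Place Bellecour"]

print(full_sequence_accuracy(predicted, expected))  # 0.75
```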