## Introduction
PaddleOCR aims to create a rich, leading, and practical set of OCR tools that help users train better models and put them into practice.

**Recent updates**
- 2020.5.30: Model prediction and training now support Windows, and the display of recognition results is optimized
- 2020.5.30: Open-sourced the General Chinese OCR model
- 2020.5.30: Added inference for the Ultra-lightweight Chinese OCR model

## Features
- Ultra-lightweight Chinese OCR model, total model size is only 8.6M
    - A single model recognizes mixed Chinese, English, and digits, as well as vertical text and long text
    - Detection model DB (4.1M) + recognition model CRNN (4.5M)
- Various text detection algorithms: EAST, DB
- Various text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE

### Supported Chinese models list:

|Model Name|Description|Detection Model link|Recognition Model link|
|-|-|-|-|
|chinese_db_crnn_mobile|Ultra-lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|


To try our Chinese OCR online, visit https://www.paddlepaddle.org.cn/hub/scene/ocr

**You can also quickly experience the Ultra-lightweight Chinese OCR and General Chinese OCR models as follows:**

## **Ultra-lightweight Chinese OCR and General Chinese OCR inference**

![](doc/imgs_results/11.jpg)

The picture above shows the result of our Ultra-lightweight Chinese OCR model. For more results, see [Ultra-lightweight Chinese OCR results](#Ultra-lightweight-Chinese-OCR-results) and [General Chinese OCR results](#General-Chinese-OCR-results) at the end of this document.

#### 1. Environment configuration

Please see [Quick installation](./doc/installation.md).

#### 2. Download inference models

#### (1) Download Ultra-lightweight Chinese OCR models
```
mkdir inference && cd inference
# Download the detection part of the Ultra-lightweight Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition part of the Ultra-lightweight Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
cd ..
```
#### (2) Download General Chinese OCR models
```
# -p avoids an error if the inference folder already exists from step (1)
mkdir -p inference && cd inference
# Download the detection part of the General Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar && tar xf ch_det_r50_vd_db_infer.tar
# Download the recognition part of the General Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && tar xf ch_rec_r34_vd_crnn_infer.tar
cd ..
```
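
On systems without `wget` and `tar` (for example, a plain Windows shell), the same download step could be sketched in Python using only the standard library. The archive names are copied from the commands above; the helper names `archive_urls` and `download_and_extract` and the `"mobile"`/`"server"` keys are illustrative assumptions, not part of PaddleOCR.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

BASE_URL = "https://paddleocr.bj.bcebos.com/ch_models"

# Archive names copied from the wget commands above.
# The "mobile"/"server" keys are labels chosen for this sketch.
ARCHIVES = {
    "mobile": ["ch_det_mv3_db_infer.tar", "ch_rec_mv3_crnn_infer.tar"],
    "server": ["ch_det_r50_vd_db_infer.tar", "ch_rec_r34_vd_crnn_infer.tar"],
}


def archive_urls(variant):
    """Return the detection and recognition archive URLs for one model variant."""
    return [f"{BASE_URL}/{name}" for name in ARCHIVES[variant]]


def download_and_extract(url, dest="inference"):
    """Download one archive into dest (if not already there) and unpack it."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / Path(urlparse(url).path).name
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(archive) as tar:
        tar.extractall(dest_dir)
    return archive
```

Calling `download_and_extract(url)` for each entry of `archive_urls("mobile")` should leave the same files under `./inference` as the shell commands in step (1).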

#### 3. Single image and batch image prediction

The following commands run text detection and recognition in sequence. When running prediction, specify the path to a single image or an image folder with the `image_dir` parameter, the path to the detection model with `det_model_dir`, and the path to the recognition model with `rec_model_dir`. The visualized prediction results are saved to the `./inference_results` folder by default.

```
# Set PYTHONPATH environment variable
export PYTHONPATH=.

# Prediction on a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"

# Prediction on a batch of images by specifying image folder path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"

# If you want to use CPU for prediction, you need to set the use_gpu parameter to False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
```

To run inference with the General Chinese OCR model, follow the steps above to download the corresponding models and update the relevant parameters. For example:
```
# Prediction on a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
```
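
When these commands are launched from another script, it can help to assemble the argument list in one place. The sketch below only builds the command; the script path and the four flags are taken from the examples above, while the helper name `build_predict_command` is a hypothetical one introduced here.

```python
def build_predict_command(image_dir, det_model_dir, rec_model_dir, use_gpu=True):
    """Assemble the predict_system.py command line as a list of arguments."""
    cmd = [
        "python3",
        "tools/infer/predict_system.py",
        f"--image_dir={image_dir}",
        f"--det_model_dir={det_model_dir}",
        f"--rec_model_dir={rec_model_dir}",
    ]
    # The examples above only pass use_gpu when switching to CPU prediction.
    if not use_gpu:
        cmd.append("--use_gpu=False")
    return cmd


cmd = build_predict_command(
    image_dir="./doc/imgs/11.jpg",
    det_model_dir="./inference/ch_det_mv3_db/",
    rec_model_dir="./inference/ch_rec_mv3_crnn/",
    use_gpu=False,
)
print(" ".join(cmd))
```

The list can then be passed to `subprocess.run(cmd, check=True, env={**os.environ, "PYTHONPATH": "."})` from the repository root, which mirrors the `export PYTHONPATH=.` step.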

For more text detection and recognition models, please refer to the document [Inference](./doc/inference.md).

## Documentation 
- [Quick installation](./doc/installation.md)
- [Text detection model training/evaluation/prediction](./doc/detection.md)
- [Text recognition model training/evaluation/prediction](./doc/recognition.md)
- [Inference](./doc/inference.md)

## Text detection algorithm

PaddleOCR provides open-source implementations of the following text detection algorithms:
- [x]  EAST([paper](https://arxiv.org/abs/1704.03155))
- [x]  DB([paper](https://arxiv.org/abs/1911.08947))
- [ ]  SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research, coming soon)

On the ICDAR2015 dataset, the text detection results are as follows:

|Model|Backbone|Precision|Recall|Hmean|Download link|
|-|-|-|-|-|-|
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|

* Note: For the training and evaluation of the above DB models, the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. When training with different datasets or models, these two parameters can be adjusted for better results.
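
To give a feel for what these two parameters control, here is a minimal sketch based on the post-processing described in the DB paper, where a shrunk text region is expanded by an offset `D = A * r / L` (`A` is the polygon area, `L` its perimeter, and `r` the unclip ratio). The function names are illustrative, and the score filter is an assumption about how `box_thresh` is typically applied; this is not PaddleOCR's own implementation.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def unclip_distance(points, unclip_ratio=1.5):
    """Offset used to expand a detected region: D = A * r / L."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(mean_score, box_thresh=0.6):
    """Assumed filter: drop boxes whose mean probability is below box_thresh."""
    return mean_score >= box_thresh


# A 100 x 20 pixel text region: area 2000, perimeter 240.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(unclip_distance(box))
print(keep_box(0.55), keep_box(0.72))
```

Under these assumptions, a larger `unclip_ratio` produces looser boxes around each text line, and a higher `box_thresh` discards more low-confidence regions.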

For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/detection.md).

## Text recognition algorithm

PaddleOCR provides open-source implementations of the following text recognition algorithms:
- [x]  CRNN([paper](https://arxiv.org/abs/1507.05717))
- [x]  Rosetta([paper](https://arxiv.org/abs/1910.05085))
- [x]  STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
- [x]  RARE([paper](https://arxiv.org/abs/1603.03915v1))
- [ ]  SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research, coming soon)

Following [DTRB](https://arxiv.org/abs/1904.01906), the above text recognition models are trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:

|Model|Backbone|Avg Accuracy|Module combination|Download link|
|-|-|-|-|-|
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|

For the training guide and use of PaddleOCR text recognition algorithms, please refer to the document [Text recognition model training/evaluation/prediction](./doc/recognition.md).
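
The names in the "Module combination" column of the table above appear to follow the pattern `rec_<backbone>_<transformation>_<sequence>_<prediction>`, which matches the four-stage framework of the DTRB paper. This reading is inferred from the table rather than stated by the project, and the helper below is a hypothetical sketch for splitting such a name.

```python
def parse_module_combination(name):
    """Split a name such as rec_r34_vd_tps_bilstm_attn into its four stages.

    The backbone itself may contain underscores (r34_vd), so the last three
    tokens are taken as transformation, sequence, and prediction modules.
    """
    parts = name.split("_")
    if len(parts) < 5 or parts[0] != "rec":
        raise ValueError(f"unexpected module combination name: {name}")
    transformation, sequence, prediction = parts[-3:]
    return {
        "backbone": "_".join(parts[1:-3]),
        "transformation": transformation,
        "sequence": sequence,
        "prediction": prediction,
    }


print(parse_module_combination("rec_mv3_none_bilstm_ctc"))
print(parse_module_combination("rec_r34_vd_tps_bilstm_attn"))
```

Read this way, Rosetta uses neither a transformation nor a sequence module, CRNN adds a BiLSTM, STAR-Net adds TPS on top of that, and RARE swaps the CTC prediction for attention.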

## End-to-end OCR algorithm
- [ ]  [End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu Self-Research, coming soon)

<a name="Ultra-lightweight Chinese OCR results"></a>
## Ultra-lightweight Chinese OCR results
![](doc/imgs_results/1.jpg)
![](doc/imgs_results/7.jpg)
![](doc/imgs_results/12.jpg)
![](doc/imgs_results/4.jpg)
![](doc/imgs_results/6.jpg)
![](doc/imgs_results/9.jpg)
![](doc/imgs_results/16.png)
![](doc/imgs_results/22.jpg)

<a name="General Chinese OCR results"></a>
## General Chinese OCR results
![](doc/imgs_results/chinese_db_crnn_server/11.jpg)
![](doc/imgs_results/chinese_db_crnn_server/2.jpg)
![](doc/imgs_results/chinese_db_crnn_server/8.jpg)

## FAQ
1. Prediction error: got an unexpected keyword argument 'gradient_clip'

    The installed paddle version is incorrect. Currently, this project only supports paddle 1.7; support for 1.8 is planned for the near future.
    
2. Error when using attention-based recognition model: KeyError: 'predict'

    Inference with recognition models based on attention loss is still being debugged. For Chinese text recognition, we recommend choosing a recognition model based on CTC loss first. In practice, we have also found that recognition models based on attention loss are less effective than those based on CTC loss.
    
3. About inference speed

    When an image contains a lot of text, prediction time increases. You can use `--rec_batch_num` to set a smaller prediction batch size. The default value is 30, which can be changed to 10 or another value.

4. Service deployment and mobile deployment

    Service deployment based on Serving and mobile deployment based on Paddle Lite are expected to be released successively in mid-to-late June. Stay tuned for updates.
    
5. Release time of self-developed algorithm

    Baidu's self-developed algorithms, such as SAST, SRN, and End2End-PSL, will be released in June or July. Please be patient.
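
As a rough illustration of the batch size mentioned in FAQ item 3, the sketch below splits a list of detected text crops into groups of at most `batch_size` items, each of which would go through the recognition model in one pass. It is not the project's code; `split_into_batches` is a hypothetical helper, and only the default of 30 and the alternative of 10 come from the FAQ.

```python
def split_into_batches(items, batch_size=30):
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Pretend the detector found 75 text regions in one image.
crops = list(range(75))
default_batches = split_into_batches(crops)                  # batch size 30
smaller_batches = split_into_batches(crops, batch_size=10)   # batch size 10
print([len(b) for b in default_batches])
print(len(smaller_batches))
```

A smaller batch means more passes, but each pass handles fewer crops at once, which generally lowers peak memory use per pass.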

## Welcome to the PaddleOCR technical exchange group
Add the WeChat account paddlehelp with the remark "OCR", and the assistant will add you to the group.


## References
```
1. EAST:
@inproceedings{zhou2017east,
  title={EAST: an efficient and accurate scene text detector},
  author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
  booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
  pages={5551--5560},
  year={2017}
}

2. DB:
@article{liao2019real,
  title={Real-time Scene Text Detection with Differentiable Binarization},
  author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
  journal={arXiv preprint arXiv:1911.08947},
  year={2019}
}

3. DTRB:
@inproceedings{baek2019wrong,
  title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
  author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={4715--4723},
  year={2019}
}

4. SAST:
@inproceedings{wang2019single,
  title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
  author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
  pages={1277--1285},
  year={2019}
}

5. SRN:
@article{yu2020towards,
  title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks},
  author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui},
  journal={arXiv preprint arXiv:2003.12294},
  year={2020}
}

6. end2end-psl:
@inproceedings{sun2019chinese,
  title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning},
  author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={9086--9095},
  year={2019}
}
```

## License
This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.

## Contribution
We welcome all contributions to PaddleOCR and greatly appreciate your feedback.