## Introduction
PaddleOCR aims to create a rich, leading, and practical OCR tool library to help users train better models and apply them.

**Recent updates**
- 2020.5.30: Model prediction and training now support Windows, and the display of recognition results is optimized
- 2020.5.30: Open-sourced the general Chinese OCR model
- 2020.5.30: Released inference for the Ultra-lightweight Chinese OCR model

## Features
- Ultra-lightweight Chinese OCR model with a total model size of only 8.6M
    - A single model supports mixed Chinese, English, and digit recognition, vertical text recognition, and long text recognition
    - Detection model DB (4.1M) + recognition model CRNN (4.5M)
- Various text detection algorithms: EAST, DB
- Various text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE

### Supported Chinese models list:

|Model Name|Description |Detection Model link|Recognition Model link|
|-|-|-|-|
|chinese_db_crnn_mobile|Ultra-lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|


To try our Chinese OCR online, visit https://www.paddlepaddle.org.cn/hub/scene/ocr

**You can also quickly experience the Ultra-lightweight Chinese OCR and general Chinese OCR models as follows:**

## **Ultra-lightweight Chinese OCR and General Chinese OCR inference**

![](doc/imgs_results/11.jpg)

The image above shows a result from our Ultra-lightweight Chinese OCR model. For more results, see [Ultra-lightweight Chinese OCR results](#超轻量级中文OCR效果展示) and [General Chinese OCR results](#通用中文OCR效果展示) at the end of this document.

#### 1. Environment configuration

Please see [Quick installation](./doc/installation.md)
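
The FAQ in this README states that the project currently supports only Paddle 1.7. As a quick sanity check after installation, a version string can be compared against that requirement. The following is a minimal sketch; the helper names `major_minor` and `is_supported_paddle` are hypothetical and not part of PaddleOCR.

```python
from typing import Tuple

# Supported release line, taken from the FAQ in this README.
SUPPORTED_MAJOR_MINOR = (1, 7)


def major_minor(version: str) -> Tuple[int, int]:
    """Return the (major, minor) part of a version string such as '1.7.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported_paddle(version: str) -> bool:
    """True if the version belongs to the supported 1.7 release line."""
    return major_minor(version) == SUPPORTED_MAJOR_MINOR
```

With Paddle installed, the check would typically be called as `is_supported_paddle(paddle.__version__)` after `import paddle`.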

#### 2. Download inference models

#### (1) Download Ultra-lightweight Chinese OCR models
```
# Create the inference directory if it does not already exist
mkdir -p inference && cd inference
# Download the detection model of the Ultra-lightweight Chinese OCR and extract it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition model of the Ultra-lightweight Chinese OCR and extract it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
cd ..
```
#### (2) Download General Chinese OCR models
```
# -p keeps this step from failing if the inference directory already exists
mkdir -p inference && cd inference
# Download the detection model of the general Chinese OCR and extract it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar && tar xf ch_det_r50_vd_db_infer.tar
# Download the recognition model of the general Chinese OCR and extract it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && tar xf ch_rec_r34_vd_crnn_infer.tar
cd ..
```
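
The same download-and-extract steps can also be scripted, for example on systems without `wget`. The following is a minimal Python sketch and not part of PaddleOCR: the helper names `infer_tar_url` and `download_and_extract` are hypothetical, and the URL pattern is only inferred from the four download links above.

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL taken from the download links above.
BASE_URL = "https://paddleocr.bj.bcebos.com/ch_models"


def infer_tar_url(model_name: str) -> str:
    """Build the inference tar URL (hypothetical helper, pattern inferred from the links above)."""
    return f"{BASE_URL}/{model_name}_infer.tar"


def download_and_extract(model_name: str, dest: str = "inference") -> Path:
    """Download one inference tar into `dest` and unpack it there."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    tar_path = dest_dir / f"{model_name}_infer.tar"
    urllib.request.urlretrieve(infer_tar_url(model_name), tar_path)
    with tarfile.open(tar_path) as tar:
        tar.extractall(dest_dir)  # only unpack archives from a source you trust
    return dest_dir
```

For the general Chinese OCR models, this would be called as `download_and_extract("ch_det_r50_vd_db")` and `download_and_extract("ch_rec_r34_vd_crnn")` from the repository root.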

#### 3. Single image and batch image prediction

The following commands run text detection and recognition in sequence. When running prediction, specify the path of a single image or an image folder with the parameter `image_dir`, the path of the detection inference model with `det_model_dir`, and the path of the recognition inference model with `rec_model_dir`. The visualized recognition results are saved to the `./inference_results` folder by default.

```
# Set PYTHONPATH environment variable
export PYTHONPATH=.

# Predict a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/"  --rec_model_dir="./inference/ch_rec_mv3_crnn/"

# Predict a batch of images by specifying image folder path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/"  --rec_model_dir="./inference/ch_rec_mv3_crnn/"

# If you want to use the CPU for prediction, you need to set the use_gpu parameter to False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/"  --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
```
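
When prediction is launched from another script, the command line above can be assembled programmatically. The sketch below is one possible wrapper, not an official PaddleOCR API: `build_predict_command` and `run_prediction` are hypothetical names, and only the flags shown in the commands above are used.

```python
import os
import subprocess
from typing import List


def build_predict_command(image_dir: str, det_model_dir: str,
                          rec_model_dir: str, use_gpu: bool = True) -> List[str]:
    """Assemble the predict_system.py command line (hypothetical helper)."""
    cmd = [
        "python3", "tools/infer/predict_system.py",
        f"--image_dir={image_dir}",
        f"--det_model_dir={det_model_dir}",
        f"--rec_model_dir={rec_model_dir}",
    ]
    if not use_gpu:
        # Pass the flag only for CPU runs, mirroring the CPU example above.
        cmd.append("--use_gpu=False")
    return cmd


def run_prediction(cmd: List[str]) -> None:
    """Run the command from the repository root with PYTHONPATH set to it."""
    env = {**os.environ, "PYTHONPATH": "."}
    subprocess.run(cmd, check=True, env=env)
```

Passing the arguments as a list avoids shell quoting issues with image paths that contain spaces.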

To run inference with the general Chinese OCR model, follow the steps above to download the corresponding models and update the relevant parameters. For example:
```
# Predict a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/"  --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
```

For more text detection and recognition models, please refer to the document [Inference](./doc/inference.md)

## Documentation 
- [Quick installation](./doc/installation.md)
- [Text detection model training/evaluation/prediction](./doc/detection.md)
- [Text recognition model training/evaluation/prediction](./doc/recognition.md)
- [Inference](./doc/inference.md)

## Text detection algorithm

PaddleOCR open source text detection algorithm list:
- [x]  EAST([paper](https://arxiv.org/abs/1704.03155))
- [x]  DB([paper](https://arxiv.org/abs/1911.08947))
- [ ]  SAST([paper](https://arxiv.org/abs/1908.05498))(developed by Baidu, coming soon)

On the ICDAR2015 public text detection dataset, the detection results are as follows:

|Model|Backbone|precision|recall|Hmean|Download link|
|-|-|-|-|-|-|
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|

* Note: To train and evaluate the DB models above, set the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or models, these two parameters can be adjusted for better results.
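
As a rough illustration of what the two post-processing parameters control, the sketch below follows the description in the DB paper: a shrunk text region with area A and perimeter L is expanded by an offset D = A * unclip_ratio / L, and candidate boxes whose mean score falls below box_thresh are discarded. The helper names are hypothetical, and it is an assumption that PaddleOCR's post-processing matches this description exactly, including the `>=` boundary.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points: List[Point]) -> float:
    """Sum of the edge lengths of a closed polygon."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
    )


def unclip_distance(points: List[Point], unclip_ratio: float = 1.5) -> float:
    """Offset used to expand a shrunk region: D = A * r / L (per the DB paper)."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(mean_score: float, box_thresh: float = 0.6) -> bool:
    """Keep a candidate box only if its mean score reaches box_thresh (assumed boundary)."""
    return mean_score >= box_thresh
```

For a 100 x 20 rectangle, the area is 2000 and the perimeter is 240, so the default ratio of 1.5 gives an offset of 12.5 pixels. A larger unclip_ratio yields looser boxes, and a higher box_thresh drops more low-confidence detections.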

For the training guide and use of PaddleOCR text detection algorithm, please refer to the document [Text detection model training/evaluation/prediction](./doc/detection.md)
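
The Hmean column in the ICDAR2015 table above is the harmonic mean of precision and recall. The short sketch below recomputes it from the listed precision and recall values; the function name `hmean` is only illustrative. Because the table's precision and recall are themselves rounded, a recomputed value can differ from the listed one in the last digit.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from two rows of the table above.
east_r50 = hmean(88.18, 85.51)
db_r50 = hmean(83.79, 80.65)
print(f"EAST ResNet50_vd: {east_r50:.2f}%")
print(f"DB ResNet50_vd:   {db_r50:.2f}%")
```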

## Text recognition algorithm

PaddleOCR open-source text recognition algorithm list:
- [x]  CRNN([paper](https://arxiv.org/abs/1507.05717))
- [x]  Rosetta([paper](https://arxiv.org/abs/1910.05085))
- [x]  STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
- [x]  RARE([paper](https://arxiv.org/abs/1603.03915v1))
- [ ]  SRN([paper](https://arxiv.org/abs/2003.12294))(developed by Baidu, coming soon)

Following [DTRB](https://arxiv.org/abs/1904.01906), the text recognition algorithms above were trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:

|Model|Backbone|Avg Accuracy|Module combination|Download link|
|-|-|-|-|-|
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|

For the training guide and use of the PaddleOCR text recognition algorithms, please refer to the document [Text recognition model training/evaluation/prediction](./doc/recognition.md)
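
The "Module combination" names in the table above appear to follow the pattern `rec_<backbone>_<transform>_<sequence>_<prediction>`, which lines up with the four stages used in DTRB. This reading is inferred from the table entries and is not stated in the source. Under that assumption, a name can be split with a small hypothetical helper:

```python
from typing import NamedTuple


class RecModules(NamedTuple):
    backbone: str    # e.g. "r34_vd" or "mv3"
    transform: str   # e.g. "tps" or "none"
    sequence: str    # e.g. "bilstm" or "none"
    prediction: str  # e.g. "ctc" or "attn"


def parse_module_combination(name: str) -> RecModules:
    """Split a name such as 'rec_r34_vd_tps_bilstm_attn' (assumed naming pattern)."""
    prefix = "rec_"
    if not name.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix!r}: {name!r}")
    # The backbone may itself contain underscores, so split from the right.
    parts = name[len(prefix):].rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"expected four components after the prefix: {name!r}")
    return RecModules(*parts)
```

Read this way, Rosetta and CRNN differ only in the sequence module, while STAR-Net and RARE differ only in the prediction module.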

## End-to-end OCR algorithm
- [ ]  [End2End-PSL](https://arxiv.org/abs/1909.07808)(developed by Baidu, coming soon)

<a name="超轻量级中文OCR效果展示"></a>
## Ultra-lightweight Chinese OCR results
![](doc/imgs_results/1.jpg)
![](doc/imgs_results/7.jpg)
![](doc/imgs_results/12.jpg)
![](doc/imgs_results/4.jpg)
![](doc/imgs_results/6.jpg)
![](doc/imgs_results/9.jpg)
![](doc/imgs_results/16.png)
![](doc/imgs_results/22.jpg)

<a name="通用中文OCR效果展示"></a>
## General Chinese OCR results
![](doc/imgs_results/chinese_db_crnn_server/11.jpg)
![](doc/imgs_results/chinese_db_crnn_server/2.jpg)
![](doc/imgs_results/chinese_db_crnn_server/8.jpg)

## FAQ
1. Prediction error: got an unexpected keyword argument 'gradient_clip'

    The installed Paddle version is incorrect. At present, this project supports only Paddle 1.7; support for 1.8 will be added in the near future.
    
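2. Error when converting an attention-based recognition model: KeyError: 'predict'

    Inference for recognition models based on attention loss is still being debugged. For Chinese text recognition, we recommend choosing recognition models based on CTC loss first. In practice, we have also found that attention-based models perform worse than CTC-based recognition models.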
    
3. About inference speed

    When an image contains a lot of text, prediction time increases. You can use --rec_batch_num to set a smaller prediction batch size. The default value is 30, and it can be changed to 10 or another value.

4. Service deployment and mobile deployment

    A Serving-based service deployment solution and a Paddle Lite-based mobile deployment solution are expected to be released one after the other in mid-to-late June. Stay tuned.
    
5. Release schedule for self-developed algorithms

    The self-developed algorithms SAST, SRN, and End2End-PSL will be released successively in June and July. Stay tuned.
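
Regarding FAQ item 3, the effect of a smaller `--rec_batch_num` can be pictured as splitting the detected text regions into smaller groups before recognition. The sketch below only illustrates that batching idea; `iter_batches` is a hypothetical helper, and how PaddleOCR consumes the flag internally is an assumption.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_num: int = 30) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `batch_num` items."""
    if batch_num <= 0:
        raise ValueError("batch_num must be positive")
    for start in range(0, len(items), batch_num):
        yield list(items[start:start + batch_num])


# 65 detected text regions, represented here by their indices.
regions = list(range(65))
print([len(batch) for batch in iter_batches(regions, 30)])
print([len(batch) for batch in iter_batches(regions, 10)])
```

With the default of 30, the 65 regions form three batches; with 10, they form seven smaller ones.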

## Welcome to the PaddleOCR technical exchange group
Add the WeChat account paddlehelp with the note "OCR", and the assistant will invite you to the group.


## References
```
1. EAST:
@inproceedings{zhou2017east,
  title={EAST: an efficient and accurate scene text detector},
  author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
  booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
  pages={5551--5560},
  year={2017}
}

2. DB:
@article{liao2019real,
  title={Real-time Scene Text Detection with Differentiable Binarization},
  author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
  journal={arXiv preprint arXiv:1911.08947},
  year={2019}
}

3. DTRB:
@inproceedings{baek2019wrong,
  title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
  author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={4715--4723},
  year={2019}
}

4. SAST:
@inproceedings{wang2019single,
  title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
  author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
  pages={1277--1285},
  year={2019}
}

5. SRN:
@article{yu2020towards,
  title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks},
  author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui},
  journal={arXiv preprint arXiv:2003.12294},
  year={2020}
}

6. end2end-psl:
@inproceedings{sun2019chinese,
  title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning},
  author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={9086--9095},
  year={2019}
}
```

## License
This project is released under <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>

## Contribution
We welcome your contribution to PaddleOCR and thank you for your feedback.