Commit 79aec8f2 authored by LDOUBLEV's avatar LDOUBLEV
Browse files

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleOCR into fixocr

parents caaf0bd4 7214a741
......@@ -4,100 +4,42 @@
PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。
**近期更新**
- 2020.7.9 添加支持空格的识别模型,[识别效果](#支持空格的中文OCR效果展示)
- 2020.7.9 添加数据增强、学习率衰减策略,具体参考[配置文件](./doc/doc_ch/config.md)
- 2020.6.8 添加[数据集](./doc/doc_ch/datasets.md),并保持持续更新
- 2020.6.5 支持 `attetnion` 模型导出 `inference_model`
- 2020.6.5 支持单独预测识别时,输出结果得分
- 2020.5.30 提供超轻量级中文OCR在线体验
- 2020.5.30 模型预测、训练支持Windows系统
- [more](./doc/doc_ch/update.md)
## 特性
- 超轻量级中文OCR,总模型仅8.6M
- 超轻量级中文OCR模型,总模型仅8.6M
- 单模型支持中英文数字组合识别、竖排文本识别、长文本识别
- 检测模型DB(4.1M)+识别模型CRNN(4.5M)
- 实用通用中文OCR模型
- 多种预测推理部署方案,包括服务部署和端测部署
- 多种文本检测训练算法,EAST、DB
- 多种文本识别训练算法,Rosetta、CRNN、STAR-Net、RARE
- 可运行于Linux、Windows、MacOS等多种系统
### 支持的中文模型列表:
|模型名称|模型简介|检测模型地址|识别模型地址|
|-|-|-|-|
|chinese_db_crnn_mobile|超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr
**也可以按如下教程快速体验超轻量级中文OCR和通用中文OCR模型。**
## **超轻量级中文OCR以及通用中文OCR体验**
## 快速体验
![](doc/imgs_results/11.jpg)
上图是超轻量级中文OCR模型效果展示,更多效果图请见文末[超轻量级中文OCR效果展示](#超轻量级中文OCR效果展示)[通用中文OCR效果展示](#通用中文OCR效果展示)
#### 1.环境配置
请先参考[快速安装](./doc/doc_ch/installation.md)配置PaddleOCR运行环境。
#### 2.inference模型下载
*windows 环境下如果没有安装wget,下载模型时可将链接复制到浏览器中下载,并解压放置在相应目录下*
#### (1)超轻量级中文OCR模型下载
```
mkdir inference && cd inference
# 下载超轻量级中文OCR模型的检测模型并解压
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# 下载超轻量级中文OCR模型的识别模型并解压
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
cd ..
```
#### (2)通用中文OCR模型下载
```
mkdir inference && cd inference
# 下载通用中文OCR模型的检测模型并解压
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar && tar xf ch_det_r50_vd_db_infer.tar
# 下载通用中文OCR模型的识别模型并解压
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && tar xf ch_rec_r34_vd_crnn_infer.tar
cd ..
```
#### 3.单张图像或者图像集合预测
以下代码实现了文本检测、识别串联推理,在执行预测时,需要通过参数image_dir指定单张图像或者图像集合的路径、参数det_model_dir指定检测inference模型的路径和参数rec_model_dir指定识别inference模型的路径。可视化识别结果默认保存到 ./inference_results 文件夹里面。
```bash
# 预测image_dir指定的单张图像
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# 预测image_dir指定的图像集合
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
上图是超轻量级中文OCR模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)
# 如果想使用CPU进行预测,需设置use_gpu参数为False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
```
- 超轻量级中文OCR在线体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr
通用中文OCR模型的体验可以按照上述步骤下载相应的模型,并且更新相关的参数,示例如下:
```
# 预测image_dir指定的单张图像
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
```
- [中文OCR模型快速使用](./doc/doc_ch/quickstart.md)
更多的文本检测、识别串联推理使用方式请参考文档教程中[基于预测引擎推理](./doc/doc_ch/inference.md)
## 中文OCR模型列表
## 文档教程
- [快速安装](./doc/doc_ch/installation.md)
- [文本检测模型训练/评估/预测](./doc/doc_ch/detection.md)
- [文本识别模型训练/评估/预测](./doc/doc_ch/recognition.md)
- [基于预测引擎推理](./doc/doc_ch/inference.md)
- [数据集](./doc/doc_ch/datasets.md)
- [FAQ](#FAQ)
- [联系我们](#欢迎加入PaddleOCR技术交流群)
- [参考文献](#参考文献)
|模型名称|模型简介|检测模型地址|识别模型地址|支持空格的识别模型地址|
|-|-|-|-|-|
|chinese_db_crnn_mobile|超轻量级中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|chinese_db_crnn_server|通用中文OCR模型|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
## 文本检测算法
## 算法介绍
### 1.文本检测算法
PaddleOCR开源的文本检测算法列表:
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
......@@ -121,9 +63,9 @@ PaddleOCR开源的文本检测算法列表:
* 注: 上述DB模型的训练和评估,需设置后处理参数box_thresh=0.6,unclip_ratio=1.5,使用不同数据集、不同模型训练,可调整这两个参数进行优化
PaddleOCR文本检测算法的训练和使用请参考文档教程中[文本检测模型训练/评估/预测](./doc/doc_ch/detection.md)
PaddleOCR文本检测算法的训练和使用请参考文档教程中[模型训练/评估中的文本检测部分](./doc/doc_ch/detection.md)
## 文本识别算法
### 2.文本识别算法
PaddleOCR开源的文本识别算法列表:
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
......@@ -151,27 +93,49 @@ PaddleOCR开源的文本识别算法列表:
|超轻量中文模型|MobileNetV3|rec_chinese_lite_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|通用中文OCR模型|Resnet34_vd|rec_chinese_common_train.yml|[下载链接](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
PaddleOCR文本识别算法的训练和使用请参考文档教程中[文本识别模型训练/评估/预测](./doc/doc_ch/recognition.md)
PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./doc/doc_ch/recognition.md)
## 端到端OCR算法
### 3.端到端OCR算法
- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(百度自研, comming soon)
## 文档教程
- [快速安装](./doc/doc_ch/installation.md)
- [中文OCR模型快速使用](./doc/doc_ch/quickstart.md)
- 模型训练/评估
- [文本检测](./doc/doc_ch/detection.md)
- [文本识别](./doc/doc_ch/recognition.md)
- [yml参数配置文件介绍](./doc/doc_ch/config.md)
- 预测部署
- [基于Python预测引擎推理](./doc/doc_ch/inference.md)
- 基于C++预测引擎推理(comming soon)
- [服务部署](./doc/doc_ch/serving.md)
- 端侧部署(comming soon)
- [数据集](./doc/doc_ch/datasets.md)
- [FAQ](#FAQ)
- 效果展示
- [超轻量级中文OCR效果展示](#超轻量级中文OCR效果展示)
- [通用中文OCR效果展示](#通用中文OCR效果展示)
- [支持空格的中文OCR效果展示](#支持空格的中文OCR效果展示)
- [技术交流群](#欢迎加入PaddleOCR技术交流群)
- [参考文献](./doc/doc_ch/reference.md)
- [许可证书](#许可证书)
- [贡献代码](#贡献代码)
## 效果展示
<a name="超轻量级中文OCR效果展示"></a>
## 超轻量级中文OCR效果展示
![](doc/imgs_results/1.jpg)
### 1.超轻量级中文OCR效果展示 [more](./doc/doc_ch/visualization.md)
![](doc/imgs_results/7.jpg)
![](doc/imgs_results/12.jpg)
![](doc/imgs_results/4.jpg)
![](doc/imgs_results/6.jpg)
![](doc/imgs_results/9.jpg)
![](doc/imgs_results/16.png)
![](doc/imgs_results/22.jpg)
<a name="通用中文OCR效果展示"></a>
## 通用中文OCR效果展示
### 2.通用中文OCR效果展示 [more](./doc/doc_ch/visualization.md)
![](doc/imgs_results/chinese_db_crnn_server/11.jpg)
![](doc/imgs_results/chinese_db_crnn_server/2.jpg)
![](doc/imgs_results/chinese_db_crnn_server/8.jpg)
<a name="支持空格的中文OCR效果展示"></a>
### 3.支持空格的中文OCR效果展示 [more](./doc/doc_ch/visualization.md)
![](doc/imgs_results/chinese_db_crnn_server/en_paper.jpg)
<a name="FAQ"></a>
## FAQ
......@@ -194,65 +158,11 @@ PaddleOCR文本识别算法的训练和使用请参考文档教程中[文本识
扫描二维码或者加微信:paddlehelp,备注OCR,小助手拉你进群~
<img src="./doc/paddlehelp.jpg" width = "200" height = "200" />
<a name="参考文献"></a>
## 参考文献
```
1. EAST:
@inproceedings{zhou2017east,
title={EAST: an efficient and accurate scene text detector},
author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={5551--5560},
year={2017}
}
2. DB:
@article{liao2019real,
title={Real-time Scene Text Detection with Differentiable Binarization},
author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
journal={arXiv preprint arXiv:1911.08947},
year={2019}
}
3. DTRB:
@inproceedings{baek2019wrong,
title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={4715--4723},
year={2019}
}
4. SAST:
@inproceedings{wang2019single,
title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
pages={1277--1285},
year={2019}
}
5. SRN:
@article{yu2020towards,
title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks},
author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui},
journal={arXiv preprint arXiv:2003.12294},
year={2020}
}
6. end2end-psl:
@inproceedings{sun2019chinese,
title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning},
author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9086--9095},
year={2019}
}
```
<a name="许可证书"></a>
## 许可证书
本项目的发布受<a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>许可认证。
<a name="贡献代码"></a>
## 贡献代码
我们非常欢迎你为PaddleOCR贡献代码,也十分感谢你的反馈。
......
......@@ -3,12 +3,12 @@ English | [简体中文](README.md)
## INTRODUCTION
PaddleOCR aims to create a rich, leading, and practical OCR tools that help users train better models and apply them into practice.
**Recent updates**
**Recent updates**
- 2020.7.9 Add recognition model to support space, [recognition result](#space Chinese OCR results)
- 2020.7.9 Add data auguments and learning rate decay strategies,please read [config](./doc/doc_en/config_en.md)
- 2020.6.8 Add [dataset](./doc/doc_en/datasets_en.md) and keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- 2020.5.30 Provide lightweight Chinese OCR online experience
- 2020.5.30 Model prediction and training supported on Windows system
- [more](./doc/doc_en/update_en.md)
## FEATURES
......@@ -18,12 +18,13 @@ PaddleOCR aims to create a rich, leading, and practical OCR tools that help user
- Various text detection algorithms: EAST, DB
- Various text recognition algorithms: Rosetta, CRNN, STAR-Net, RARE
<a name="Supported-Chinese-model-list"></a>
### Supported Chinese models list:
|Model Name|Description |Detection Model link|Recognition Model link|
|-|-|-|-|
|chinese_db_crnn_mobile|lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
|Model Name|Description |Detection Model link|Recognition Model link| Support for space Recognition Model link|
|-|-|-|-|-|
|chinese_db_crnn_mobile|lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pre-train model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
For testing our Chinese OCR online:https://www.paddlepaddle.org.cn/hub/scene/ocr
......@@ -34,7 +35,7 @@ For testing our Chinese OCR online:https://www.paddlepaddle.org.cn/hub/scene/o
![](doc/imgs_results/11.jpg)
The picture above is the result of our lightweight Chinese OCR model. For more testing results, please see the end of the article [lightweight Chinese OCR results](#lightweight-Chinese-OCR-results) and [General Chinese OCR results](#General-Chinese-OCR-results).
The picture above is the result of our lightweight Chinese OCR model. For more testing results, please see the end of the article [lightweight Chinese OCR results](#lightweight-Chinese-OCR-results) , [General Chinese OCR results](#General-Chinese-OCR-results) and [Support for space Recognition Model](#Space-Chinese-OCR-results).
#### 1. ENVIRONMENT CONFIGURATION
......@@ -45,22 +46,42 @@ Please see [Quick installation](./doc/doc_en/installation_en.md)
#### (1) Download lightweight Chinese OCR models
*If wget is not installed in the windows system, you can copy the link to the browser to download the model. After model downloaded, unzip it and place it in the corresponding directory*
Copy the detection and recognition 'inference model' address in [Chinese model List](#Supported-Chinese-model-list), download and unpack:
```
mkdir inference && cd inference
# Download the detection part of the Chinese OCR and decompress it
wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package}
# Download the recognition part of the Chinese OCR and decompress it
wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package}
cd ..
```
Take lightweight Chinese OCR model as an example:
```
mkdir inference && cd inference
# Download the detection part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
# Download the space-recognized part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar && tar xf ch_rec_mv3_crnn_enhance_infer.tar
cd ..
```
#### (2) Download General Chinese OCR models
After the decompression is completed, the file structure should be as follows:
```
mkdir inference && cd inference
# Download the detection part of the general Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar && tar xf ch_det_r50_vd_db_infer.tar
# Download the recognition part of the generic Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && tar xf ch_rec_r34_vd_crnn_infer.tar
cd ..
|-inference
|-ch_rec_mv3_crnn
|- model
|- params
|-ch_det_mv3_db
|- model
|- params
...
```
#### 3. SINGLE IMAGE AND BATCH PREDICTION
......@@ -85,6 +106,13 @@ To run inference of the Generic Chinese OCR model, follow these steps above to d
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
```
To run inference of the space-Generic Chinese OCR model, follow these steps above to download the corresponding models and update the relevant parameters. Examples are as follows:
```
# Prediction on a single image by specifying image path to image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/"
```
For more text detection and recognition models, please refer to the document [Inference](./doc/doc_en/inference_en.md)
## DOCUMENTATION
......@@ -92,7 +120,9 @@ For more text detection and recognition models, please refer to the document [In
- [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
- [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
- [Inference](./doc/doc_en/inference_en.md)
- [Introduction of yml file](./doc/doc_en/config_en.md)
- [Dataset](./doc/doc_en/datasets_en.md)
- [FAQ]((#FAQ)
## TEXT DETECTION ALGORITHM
......@@ -145,15 +175,15 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
We use [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and cropout 30w traning data from original photos by using position groundtruth and make some calibration needed. In addition, based on the LSVT corpus, 500w synthetic data is generated to train the Chinese model. The related configuration and pre-trained models are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
|lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)|
|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)|
Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
## END-TO-END OCR ALGORITHM
- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu Self-Research, comming soon)
<a name="lightweight Chinese OCR results"></a>
<a name="lightweight-Chinese-OCR-results"></a>
## LIGHTWEIGHT CHINESE OCR RESULTS
![](doc/imgs_results/1.jpg)
![](doc/imgs_results/7.jpg)
......@@ -164,12 +194,24 @@ Please refer to the document for training guide and use of PaddleOCR text recogn
![](doc/imgs_results/16.png)
![](doc/imgs_results/22.jpg)
<a name="General Chinese OCR results"></a>
<a name="General-Chinese-OCR-results"></a>
## General Chinese OCR results
![](doc/imgs_results/chinese_db_crnn_server/11.jpg)
![](doc/imgs_results/chinese_db_crnn_server/2.jpg)
![](doc/imgs_results/chinese_db_crnn_server/8.jpg)
<a name="Space-Chinese-OCR-results"></a>
## space Chinese OCR results
### LIGHTWEIGHT CHINESE OCR RESULTS
![](doc/imgs_results/img_11.jpg)
### General Chinese OCR results
![](doc/imgs_results/chinese_db_crnn_server/en_paper.jpg)
<a name="FAQ"></a>
## FAQ
1. Error when using attention-based recognition model: KeyError: 'predict'
......
......@@ -6,7 +6,8 @@ Global:
print_batch_step: 2
save_model_dir: ./output/det_db/
save_epoch_step: 200
eval_batch_step: 5000
# evaluation is run every 5000 iterations after the 4000th iteration
eval_batch_step: [4000, 5000]
train_batch_size_per_card: 16
test_batch_size_per_card: 16
image_shape: [3, 640, 640]
......@@ -50,4 +51,4 @@ PostProcess:
thresh: 0.3
box_thresh: 0.7
max_candidates: 1000
unclip_ratio: 2.0
\ No newline at end of file
unclip_ratio: 2.0
......@@ -6,7 +6,7 @@ Global:
print_batch_step: 5
save_model_dir: ./output/det_east/
save_epoch_step: 200
eval_batch_step: 5000
eval_batch_step: [5000, 5000]
train_batch_size_per_card: 16
test_batch_size_per_card: 16
image_shape: [3, 512, 512]
......
......@@ -6,7 +6,7 @@ Global:
print_batch_step: 2
save_model_dir: ./output/det_db/
save_epoch_step: 200
eval_batch_step: 5000
eval_batch_step: [5000, 5000]
train_batch_size_per_card: 8
test_batch_size_per_card: 16
image_shape: [3, 640, 640]
......
......@@ -6,7 +6,7 @@ Global:
print_batch_step: 5
save_model_dir: ./output/det_east/
save_epoch_step: 200
eval_batch_step: 5000
eval_batch_step: [5000, 5000]
train_batch_size_per_card: 8
test_batch_size_per_card: 16
image_shape: [3, 512, 512]
......
......@@ -14,6 +14,8 @@ Global:
character_type: ch
character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
loss_type: ctc
distort: false
use_space_char: false
reader_yml: ./configs/rec/rec_chinese_reader.yml
pretrain_weights:
checkpoints:
......
......@@ -14,6 +14,8 @@ Global:
character_type: ch
character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
loss_type: ctc
distort: false
use_space_char: false
reader_yml: ./configs/rec/rec_chinese_reader.yml
pretrain_weights:
checkpoints:
......
......@@ -13,6 +13,7 @@ Global:
max_text_length: 25
character_type: en
loss_type: ctc
distort: true
reader_yml: ./configs/rec/rec_icdar15_reader.yml
pretrain_weights: ./pretrain_models/rec_mv3_none_bilstm_ctc/best_accuracy
checkpoints:
......
{
"modules_info": {
"ocr_det": {
"init_args": {
"version": "1.0.0",
"det_model_dir": "./inference/ch_det_mv3_db/",
"use_gpu": true
},
"predict_args": {
"visualization": false
}
}
}
}
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import ast
import copy
import math
import os
import time
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
from paddlehub.common.logger import logger
from paddlehub.module.module import moduleinfo, runnable, serving
from PIL import Image
import cv2
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub
from tools.infer.utility import draw_boxes, base64_to_cv2
from tools.infer.predict_det import TextDetector
class Config(object):
pass
@moduleinfo(
name="ocr_det",
version="1.0.0",
summary="ocr detection service",
author="paddle-dev",
author_email="paddle-dev@baidu.com",
type="cv/text_recognition")
class OCRDet(hub.Module):
def _initialize(self,
det_model_dir="",
det_algorithm="DB",
use_gpu=False
):
"""
initialize with the necessary elements
"""
self.config = Config()
self.config.use_gpu = use_gpu
if use_gpu:
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
print("use gpu: ", use_gpu)
print("CUDA_VISIBLE_DEVICES: ", _places)
except:
raise RuntimeError(
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
self.config.ir_optim = True
self.config.gpu_mem = 8000
#params for text detector
self.config.det_algorithm = det_algorithm
self.config.det_model_dir = det_model_dir
# self.config.det_model_dir = "./inference/det/"
#DB parmas
self.config.det_db_thresh =0.3
self.config.det_db_box_thresh =0.5
self.config.det_db_unclip_ratio =2.0
#EAST parmas
self.config.det_east_score_thresh = 0.8
self.config.det_east_cover_thresh = 0.1
self.config.det_east_nms_thresh = 0.2
def read_images(self, paths=[]):
images = []
for img_path in paths:
assert os.path.isfile(
img_path), "The {} isn't a valid file.".format(img_path)
img = cv2.imread(img_path)
if img is None:
logger.info("error in loading image:{}".format(img_path))
continue
images.append(img)
return images
def det_text(self,
images=[],
paths=[],
det_max_side_len=960,
draw_img_save='ocr_det_result',
visualization=False):
"""
Get the text box in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
paths (list[str]): The paths of images. If paths not images
use_gpu (bool): Whether to use gpu. Default false.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
box_thresh(float): the threshold of the detected text box's confidence
Returns:
res (list): The result of text detection box and save path of images.
"""
if images != [] and isinstance(images, list) and paths == []:
predicted_data = images
elif images == [] and isinstance(paths, list) and paths != []:
predicted_data = self.read_images(paths)
else:
raise TypeError("The input data is inconsistent with expectations.")
assert predicted_data != [], "There is not any image to be predicted. Please check the input data."
self.config.det_max_side_len = det_max_side_len
text_detector = TextDetector(self.config)
all_results = []
for img in predicted_data:
result = {'save_path': ''}
if img is None:
logger.info("error in loading image")
result['data'] = []
all_results.append(result)
continue
dt_boxes, elapse = text_detector(img)
print("Predict time : ", elapse)
result['data'] = dt_boxes.astype(np.int).tolist()
if visualization:
image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
draw_img = draw_boxes(image, dt_boxes)
draw_img = np.array(draw_img)
if not os.path.exists(draw_img_save):
os.makedirs(draw_img_save)
saved_name = 'ndarray_{}.jpg'.format(time.time())
save_file_path = os.path.join(draw_img_save, saved_name)
cv2.imwrite(save_file_path, draw_img[:, :, ::-1])
print("The visualized image saved in {}".format(save_file_path))
result['save_path'] = save_file_path
all_results.append(result)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.det_text(images_decode, **kwargs)
return results
if __name__ == '__main__':
ocr = OCRDet()
image_path = [
'./doc/imgs/11.jpg',
'./doc/imgs/12.jpg',
]
res = ocr.det_text(paths=image_path, visualization=True)
print(res)
\ No newline at end of file
{
"modules_info": {
"ocr_rec": {
"init_args": {
"version": "1.0.0",
"det_model_dir": "./inference/ch_rec_mv3_crnn/",
"use_gpu": true
},
"predict_args": {
}
}
}
}
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import ast
import copy
import math
import os
import time
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
from paddlehub.common.logger import logger
from paddlehub.module.module import moduleinfo, runnable, serving
from PIL import Image
import cv2
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub
from tools.infer.utility import base64_to_cv2
from tools.infer.predict_rec import TextRecognizer
class Config(object):
pass
@moduleinfo(
name="ocr_rec",
version="1.0.0",
summary="ocr recognition service",
author="paddle-dev",
author_email="paddle-dev@baidu.com",
type="cv/text_recognition")
class OCRRec(hub.Module):
def _initialize(self,
rec_model_dir="",
rec_algorithm="CRNN",
rec_char_dict_path="./ppocr/utils/ppocr_keys_v1.txt",
rec_batch_num=30,
use_gpu=False
):
"""
initialize with the necessary elements
"""
self.config = Config()
self.config.use_gpu = use_gpu
if use_gpu:
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
print("use gpu: ", use_gpu)
print("CUDA_VISIBLE_DEVICES: ", _places)
except:
raise RuntimeError(
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
self.config.ir_optim = True
self.config.gpu_mem = 8000
#params for text recognizer
self.config.rec_algorithm = rec_algorithm
self.config.rec_model_dir = rec_model_dir
# self.config.rec_model_dir = "./inference/rec/"
self.config.rec_image_shape = "3, 32, 320"
self.config.rec_char_type = 'ch'
self.config.rec_batch_num = rec_batch_num
self.config.rec_char_dict_path = rec_char_dict_path
self.config.use_space_char = True
def read_images(self, paths=[]):
images = []
for img_path in paths:
assert os.path.isfile(
img_path), "The {} isn't a valid file.".format(img_path)
img = cv2.imread(img_path)
if img is None:
logger.info("error in loading image:{}".format(img_path))
continue
images.append(img)
return images
def rec_text(self,
images=[],
paths=[]):
"""
Get the text box in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
paths (list[str]): The paths of images. If paths not images
Returns:
res (list): The result of text detection box and save path of images.
"""
if images != [] and isinstance(images, list) and paths == []:
predicted_data = images
elif images == [] and isinstance(paths, list) and paths != []:
predicted_data = self.read_images(paths)
else:
raise TypeError("The input data is inconsistent with expectations.")
assert predicted_data != [], "There is not any image to be predicted. Please check the input data."
text_recognizer = TextRecognizer(self.config)
img_list = []
for img in predicted_data:
if img is None:
continue
img_list.append(img)
try:
rec_res, predict_time = text_recognizer(img_list)
except Exception as e:
print(e)
return []
return rec_res
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.det_text(images_decode, **kwargs)
return results
if __name__ == '__main__':
ocr = OCRRec()
image_path = [
'./doc/imgs_words/ch/word_1.jpg',
'./doc/imgs_words/ch/word_2.jpg',
'./doc/imgs_words/ch/word_3.jpg',
]
res = ocr.rec_text(paths=image_path)
print(res)
\ No newline at end of file
{
"modules_info": {
"ocr_system": {
"init_args": {
"version": "1.0.0",
"det_model_dir": "./inference/ch_det_mv3_db/",
"rec_model_dir": "./inference/ch_rec_mv3_crnn/",
"use_gpu": true
},
"predict_args": {
"visualization": false
}
}
}
}
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import ast
import copy
import math
import os
import time
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
from paddlehub.common.logger import logger
from paddlehub.module.module import moduleinfo, runnable, serving
from PIL import Image
import cv2
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub
from tools.infer.utility import draw_ocr, base64_to_cv2
from tools.infer.predict_system import TextSystem
class Config(object):
pass
@moduleinfo(
name="ocr_system",
version="1.0.0",
summary="ocr system service",
author="paddle-dev",
author_email="paddle-dev@baidu.com",
type="cv/text_recognition")
class OCRSystem(hub.Module):
def _initialize(self,
det_model_dir="",
det_algorithm="DB",
rec_model_dir="",
rec_algorithm="CRNN",
rec_char_dict_path="./ppocr/utils/ppocr_keys_v1.txt",
rec_batch_num=30,
use_gpu=False
):
"""
initialize with the necessary elements
"""
self.config = Config()
self.config.use_gpu = use_gpu
if use_gpu:
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
print("use gpu: ", use_gpu)
print("CUDA_VISIBLE_DEVICES: ", _places)
except:
raise RuntimeError(
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
self.config.ir_optim = True
self.config.gpu_mem = 8000
#params for text detector
self.config.det_algorithm = det_algorithm
self.config.det_model_dir = det_model_dir
# self.config.det_model_dir = "./inference/det/"
#DB parmas
self.config.det_db_thresh =0.3
self.config.det_db_box_thresh =0.5
self.config.det_db_unclip_ratio =2.0
#EAST parmas
self.config.det_east_score_thresh = 0.8
self.config.det_east_cover_thresh = 0.1
self.config.det_east_nms_thresh = 0.2
#params for text recognizer
self.config.rec_algorithm = rec_algorithm
self.config.rec_model_dir = rec_model_dir
# self.config.rec_model_dir = "./inference/rec/"
self.config.rec_image_shape = "3, 32, 320"
self.config.rec_char_type = 'ch'
self.config.rec_batch_num = rec_batch_num
self.config.rec_char_dict_path = rec_char_dict_path
self.config.use_space_char = True
def read_images(self, paths=[]):
images = []
for img_path in paths:
assert os.path.isfile(
img_path), "The {} isn't a valid file.".format(img_path)
img = cv2.imread(img_path)
if img is None:
logger.info("error in loading image:{}".format(img_path))
continue
images.append(img)
return images
def recognize_text(self,
images=[],
paths=[],
det_max_side_len=960,
draw_img_save='ocr_result',
visualization=False,
text_thresh=0.5):
"""
Get the chinese texts in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
paths (list[str]): The paths of images. If paths not images
use_gpu (bool): Whether to use gpu.
batch_size(int): the program deals once with one
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
box_thresh(float): the threshold of the detected text box's confidence
text_thresh(float): the threshold of the recognize chinese texts' confidence
Returns:
res (list): The result of chinese texts and save path of images.
"""
if images != [] and isinstance(images, list) and paths == []:
predicted_data = images
elif images == [] and isinstance(paths, list) and paths != []:
predicted_data = self.read_images(paths)
else:
raise TypeError("The input data is inconsistent with expectations.")
assert predicted_data != [], "There is not any image to be predicted. Please check the input data."
self.config.det_max_side_len = det_max_side_len
text_sys = TextSystem(self.config)
cnt = 0
all_results = []
for img in predicted_data:
result = {'save_path': ''}
if img is None:
logger.info("error in loading image")
result['data'] = []
all_results.append(result)
continue
starttime = time.time()
dt_boxes, rec_res = text_sys(img)
elapse = time.time() - starttime
cnt += 1
print("Predict time of image %d: %.3fs" % (cnt, elapse))
dt_num = len(dt_boxes)
rec_res_final = []
for dno in range(dt_num):
text, score = rec_res[dno]
# if the recognized text confidence score is lower than text_thresh, then drop it
if score >= text_thresh:
# text_str = "%s, %.3f" % (text, score)
# print(text_str)
rec_res_final.append(
{
'text': text,
'confidence': float(score),
'text_box_position': dt_boxes[dno].astype(np.int).tolist()
}
)
result['data'] = rec_res_final
if visualization:
image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
boxes = dt_boxes
txts = [rec_res[i][0] for i in range(len(rec_res))]
scores = [rec_res[i][1] for i in range(len(rec_res))]
draw_img = draw_ocr(image, boxes, txts, scores, draw_txt=True, drop_score=0.5)
if not os.path.exists(draw_img_save):
os.makedirs(draw_img_save)
saved_name = 'ndarray_{}.jpg'.format(time.time())
save_file_path = os.path.join(draw_img_save, saved_name)
cv2.imwrite(save_file_path, draw_img[:, :, ::-1])
print("The visualized image saved in {}".format(save_file_path))
result['save_path'] = save_file_path
all_results.append(result)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
if __name__ == '__main__':
ocr = OCRSystem()
image_path = [
'./doc/imgs/11.jpg',
'./doc/imgs/12.jpg',
]
res = ocr.recognize_text(paths=image_path, visualization=True)
print(res)
\ No newline at end of file
......@@ -22,7 +22,7 @@
| print_batch_step | 设置打印log间隔 | 10 | \ |
| save_model_dir | 设置模型保存路径 | output/{算法名称} | \ |
| save_epoch_step | 设置模型保存间隔 | 3 | \ |
| eval_batch_step | 设置模型评估间隔 | 2000 | \ |
| eval_batch_step | 设置模型评估间隔 | 2000 或 [1000, 2000] | 2000 表示每2000次迭代评估一次,[1000, 2000]表示从1000次迭代开始,每2000次评估一次 |
|train_batch_size_per_card | 设置训练时单卡batch size | 256 | \ |
| test_batch_size_per_card | 设置评估时单卡batch size | 256 | \ |
| image_shape | 设置输入图片尺寸 | [3, 32, 100] | \ |
......@@ -30,6 +30,8 @@
| character_type | 设置字符类型 | ch | en/ch, en时将使用默认dict,ch时使用自定义dict|
| character_dict_path | 设置字典路径 | ./ppocr/utils/ic15_dict.txt | \ |
| loss_type | 设置 loss 类型 | ctc | 支持两种loss: ctc / attention |
| distort | 设置是否使用数据增强 | false | 设置为true时,将在训练时随机进行扰动,支持的扰动操作可阅读[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) |
| use_space_char | 设置是否识别空格 | false | 仅在 character_type=ch 时支持空格 |
| reader_yml | 设置reader配置文件 | ./configs/rec/rec_icdar15_reader.yml | \ |
| pretrain_weights | 加载预训练模型路径 | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | 加载模型参数路径 | None | 用于中断后加载参数继续训练 |
......
......@@ -6,7 +6,7 @@
- [中文文档文字识别](#中文文档文字识别)
- [ICDAR2019-ArT](#ICDAR2019-ArT)
除了开源数据,用户还可使用合成工具自行合成,可参考的合成工具包括[text_renderer](https://github.com/Sanster/text_renderer)[SynthText](https://github.com/ankush-me/SynthText)[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator)等。
除了开源数据,用户还可使用合成工具自行合成,可参考的合成工具包括[text_renderer](https://github.com/Sanster/text_renderer)[SynthText](https://github.com/ankush-me/SynthText)[SynthText_Chinese_version](https://github.com/JarveeLee/SynthText_Chinese_version)[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator)等。
<a name="ICDAR2019-LSVT"></a>
#### 1、ICDAR2019-LSVT
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment