Commit 4c6b03ad authored by Leif's avatar Leif
Browse files

Merge remote-tracking branch 'origin/release/2.5' into release2.5

parents edad6e7c 77331549
......@@ -63,7 +63,6 @@ class DetOp(Op):
dt_boxes_list = self.post_func(det_out, [ratio_list])
dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
out_dict = {"dt_boxes": dt_boxes, "image": self.raw_im}
return out_dict, None, ""
......@@ -86,7 +85,7 @@ class RecOp(Op):
dt_boxes = copy.deepcopy(self.dt_list)
feed_list = []
img_list = []
max_wh_ratio = 0
max_wh_ratio = 320/48.
## Many mini-batchs, the type of feed_data is list.
max_batch_size = 6 # len(dt_boxes)
......@@ -150,7 +149,8 @@ class RecOp(Op):
for i in range(dt_num):
text = rec_list[i]
dt_box = self.dt_list[i]
result_list.append([text, dt_box.tolist()])
if text[1] >= 0.5:
result_list.append([text, dt_box.tolist()])
res = {"result": str(result_list)}
return res, None, ""
......
......@@ -73,4 +73,4 @@ python deploy/slim/quantization/export_model.py -c configs/det/ch_ppocr_v2.0/ch_
The numerical range of the quantized model parameters derived from the above steps is still FP32, but the numerical range of the parameters is int8.
The derived model can be converted through the `opt tool` of PaddleLite.
For quantitative model deployment, please refer to [Mobile terminal model deployment](../../lite/readme_en.md)
For quantitative model deployment, please refer to [Mobile terminal model deployment](../../lite/readme.md)
......@@ -455,6 +455,19 @@ A:以检测中的resnet骨干网络为例,图像输入网络之后,需要
A:可以在命令中加入 --det_db_unclip_ratio ,参数定义位置,这个参数是检测后处理时控制文本框大小的,默认1.6,可以尝试改成2.5或者更大,反之,如果觉得文本框不够紧凑,也可以把该参数调小。
#### Q:PP-OCR文本检测出现明显漏检,该如何调整参数或者训练。
A:以[#issue5851](https://github.com/PaddlePaddle/PaddleOCR/issues/5815)为例,文字出现漏检时,先分析问题原因。
首先,在[后处理处](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/ppocr/postprocess/db_postprocess.py#L177)添加如下代码,可视化模型对文字的分割区域:
```
import cv2
im = np.array(segmentation[0]*255).astype(np.uint8)
cv2.imwrite("db_seg_vis.jpg", im)
```
如果模型在漏检文字区域有分割区域,但是没有生成框,此类情况多为文字太小,文字弯曲,或者某一行开头的文字过大等等。可减小DB后处理参数[det_db_thresh](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L52)[det_db_box_thresh](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L53),或者设置[use_dilation](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L56)为True扩大文字分割区域。
如果模型在漏检文字区域没有分割区域,是模型对此类数据没有训练好,建议用PP-OCR的模型在你的数据场景上finetune。
<a name="27"></a>
### 2.7 模型结构
......@@ -839,3 +852,15 @@ nvidia-smi --lock-gpu-clocks=1590 -i 0
#### Q: 预测时显存爆炸、内存泄漏问题?
**A**: 打开显存/内存优化开关`enable_memory_optim`可以解决该问题,相关代码已合入,[查看详情](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/tools/infer/utility.py#L153)
#### Q: TRT预测报错:InvalidArgumentError: some trt inputs dynamic shape info not set, check the INFO log above for more details.
**A**: PP-OCR的模型采用动态shape预测,因为TRT子图划分问题,需要设置额外参数的shape;
首先,屏蔽此行代码[config.disable_glog_info()](https://github.com/PaddlePaddle/PaddleOCR/blob/767fad23a2b217f775f3c32314ab8b781966671c/tools/infer/utility.py#L306)输出日志,重新预测,根据报错信息中的提示设置某些参数的动态shape。
假设报错信息中:
trt input [lstm_1.tmp_0] dynamic shape info not set, please check and retry.
可以参考[检测模型里设置动态shape的方式](https://github.com/PaddlePaddle/PaddleOCR/blob/8de06d2370e81e0ee1473d17925fb2e05295a0fe/tools/infer/utility.py#L268-L270)设置lstm_1.tmp_0的动态shape,识别模型的动态shape在[这里](https://github.com/PaddlePaddle/PaddleOCR/blob/8de06d2370e81e0ee1473d17925fb2e05295a0fe/tools/infer/utility.py#L275-L277)
如果不清楚lstm_1.tmp_0的shape是多少,可以把inference.pdmodel 放在网页 https://netron.app/ 里可视化,搜索lstm_1.tmp_0 查看该参数的shape信息。
\ No newline at end of file
......@@ -15,8 +15,8 @@
- **数据简介**:publaynet数据集的训练集合中包含35万张图像,验证集合中包含1.1万张图像。总共包含5个类别,分别是: `text, title, list, table, figure`。部分图像以及标注框可视化如下所示。
<div align="center">
<img src="../datasets/publaynet_demo/gt_PMC3724501_00006.jpg" width="500">
<img src="../datasets/publaynet_demo/gt_PMC5086060_00002.jpg" width="500">
<img src="../../datasets/publaynet_demo/gt_PMC3724501_00006.jpg" width="500">
<img src="../../datasets/publaynet_demo/gt_PMC5086060_00002.jpg" width="500">
</div>
- **下载地址**:https://developer.ibm.com/exchanges/data/all/publaynet/
......@@ -30,8 +30,8 @@
- **数据简介**:CDLA据集的训练集合中包含5000张图像,验证集合中包含1000张图像。总共包含10个类别,分别是: `Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation`。部分图像以及标注框可视化如下所示。
<div align="center">
<img src="../datasets/CDLA_demo/val_0633.jpg" width="500">
<img src="../datasets/CDLA_demo/val_0941.jpg" width="500">
<img src="../../datasets/CDLA_demo/val_0633.jpg" width="500">
<img src="../../datasets/CDLA_demo/val_0941.jpg" width="500">
</div>
- **下载地址**:https://github.com/buptlihang/CDLA
......@@ -45,8 +45,8 @@
- **数据简介**:TableBank数据集包含Latex(训练集187199张,验证集7265张,测试集5719张)与Word(训练集73383张,验证集2735张,测试集2281张)两种类别的文档。仅包含`Table` 1个类别。部分图像以及标注框可视化如下所示。
<div align="center">
<img src="../datasets/tablebank_demo/004.png" height="700">
<img src="../datasets/tablebank_demo/005.png" height="700">
<img src="../../datasets/tablebank_demo/004.png" height="700">
<img src="../../datasets/tablebank_demo/005.png" height="700">
</div>
- **下载地址**:https://doc-analysis.github.io/tablebank-page/index.html
......
......@@ -591,8 +591,9 @@ Metric:
#### 2.2.5 检测蒸馏模型finetune
PP-OCRv3检测蒸馏有两种方式:
- 采用ch_PP-OCRv3_det_cml.yml,采用cml蒸馏,同样Teacher模型设置为PaddleOCR提供的模型或者您训练好的大模型
- 采用ch_PP-OCRv3_det_cml.yml,采用CML蒸馏,同样Teacher模型设置为PaddleOCR提供的模型或者您训练好的大模型
- 采用ch_PP-OCRv3_det_dml.yml,采用DML的蒸馏,两个Student模型互蒸馏的方法,在PaddleOCR采用的数据集上相比单独训练Student模型有1%-2%的提升。
> 如果您在自己的场景中没有训练过高精度大模型,或原始PP-OCR模型在您的场景中表现不好,则无法使用CML训练以达到更高精度,更应该采用DML训练
在具体fine-tune时,需要在网络结构的`pretrained`参数中设置要加载的预训练模型。
......
# 《动手学OCR》电子书
《动手学OCR》是PaddleOCR团队携手复旦大学青年研究员陈智能、中国移动研究院视觉领域资深专家黄文辉等产学研同仁,以及OCR开发者共同打造的结合OCR前沿理论与代码实践的教材。主要特色如下:
《动手学OCR》是PaddleOCR团队携手华中科技大学博导/教授,IAPR Fellow 白翔、复旦大学青年研究员陈智能、中国移动研究院视觉领域资深专家黄文辉、中国工商银行大数据人工智能实验室研究员等产学研同仁,以及OCR开发者共同打造的结合OCR前沿理论与代码实践的教材。主要特色如下:
- 覆盖从文本检测识别到文档分析的OCR全栈技术
- 紧密结合理论实践,跨越代码实现鸿沟,并配套教学视频
......
......@@ -30,11 +30,11 @@ PP-OCR系统pipeline如下:
PP-OCR系统在持续迭代优化,目前已发布PP-OCR和PP-OCRv2两个版本:
PP-OCR从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身(如绿框所示),最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术方案 https://arxiv.org/abs/2009.09941
PP-OCR从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身(如绿框所示),最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考[PP-OCR技术报告](https://arxiv.org/abs/2009.09941)
#### PP-OCRv2
PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和[Enhanced CTC loss](./enhanced_ctc_loss.md)损失函数改进(如上图红框所示),进一步在推理速度和预测效果上取得明显提升。更多细节请参考PP-OCRv2[技术报告](https://arxiv.org/abs/2109.03144)
PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和[Enhanced CTC loss](./enhanced_ctc_loss.md)损失函数改进(如上图红框所示),进一步在推理速度和预测效果上取得明显提升。更多细节请参考[PP-OCRv2技术报告](https://arxiv.org/abs/2109.03144)
#### PP-OCRv3
......@@ -48,7 +48,7 @@ PP-OCRv3系统pipeline如下:
<img src="../ppocrv3_framework.png" width="800">
</div>
更多细节请参考PP-OCRv3[技术报告](./PP-OCRv3_introduction.md)
更多细节请参考[PP-OCRv3技术报告](https://arxiv.org/abs/2206.03001v2) 👉[中文简洁版](./PP-OCRv3_introduction.md)
<a name="2"></a>
......
......@@ -101,8 +101,17 @@ cd /path/to/ppocr_img
['韩国小馆', 0.994467]
```
**版本说明**
paddleocr默认使用PP-OCRv3模型(`--ocr_version PP-OCRv3`),如需使用其他版本可通过设置参数`--ocr_version`,具体版本说明如下:
| 版本名称 | 版本说明 |
| --- | --- |
| PP-OCRv3 | 支持中、英文检测和识别,方向分类器,支持多语种识别 |
| PP-OCRv2 | 支持中英文的检测和识别,方向分类器,多语言暂未更新 |
| PP-OCR | 支持中、英文检测和识别,方向分类器,支持多语种识别 |
如需使用2.0模型,请指定参数`--ocr_version PP-OCR`,paddleocr默认使用PP-OCRv3模型(`--ocr_version PP-OCRv3`)。更多whl包使用可参考[whl包文档](./whl.md)
如需新增自己训练的模型,可以在[paddleocr](../../paddleocr.py)中增加模型链接和字段,重新编译即可。
更多whl包使用可参考[whl包文档](./whl.md)
<a name="212"></a>
......
......@@ -550,4 +550,4 @@ inference/en_PP-OCRv3_rec/
Q1: 训练模型转inference 模型之后预测效果不一致?
**A**:此类问题出现较多,问题多是trained model预测时候的预处理、后处理参数和inference model预测的时候的预处理、后处理参数不一致导致的。可以对比训练使用的配置文件中的预处理、后处理和预测时是否存在差异。
**A**:此类问题出现较多,问题多是trained model预测时候的预处理、后处理参数和inference model预测的时候的预处理、后处理参数不一致导致的。可以对比训练使用的配置文件中的预处理、后处理和预测时是否存在差异。更多内容请参考[FAQ](./FAQ.md#210-%E6%A8%A1%E5%9E%8B%E6%95%88%E6%9E%9C%E4%B8%8E%E6%95%88%E6%9E%9C%E4%B8%8D%E4%B8%80%E8%87%B4).
# 社区贡献
# 开源社区
感谢大家长久以来对PaddleOCR的支持和关注,与广大开发者共同构建一个专业、和谐、相互帮助的开源社区是PaddleOCR的目标。本文档展示了已有的社区贡献、对于各类贡献说明、新的机会与流程,希望贡献流程更加高效、路径更加清晰。
......@@ -119,25 +119,30 @@ PaddleOCR非常欢迎社区贡献以PaddleOCR为核心的各种服务、部署
## 附录:社区常规赛积分榜
| 开发者 | 总积分 | 开发者 | 总积分 |
| ------------------------------------------------------- | ------ | ----------------------------------------------------- | ------ |
| [RangeKing](https://github.com/RangeKing) | 220 | [WZMIAOMIAO](https://github.com/WZMIAOMIAO) | 36 |
| [hao6699](https://github.com/hao6699) | 145 | [v3fc](https://github.com/v3fc) | 35 |
| [mymagicpower](https://github.com/mymagicpower) | 140 | [imiyu](https://github.com/imiyu) | 30 |
| [raoyutian](https://github.com/raoyutian) | 90 | [haigang1975](https://github.com/haigang1975) | 29 |
| [sdcb](https://github.com/sdcb) | 80 | [daassh](https://github.com/daassh) | 23 |
| [zhiminzhang0830](https://github.com/zhiminzhang0830) | 70 | [xiaoyangyang2](https://github.com/xiaoyangyang2) | 20 |
| [Lovely-Pig](https://github.com/Lovely-Pig) | 70 | [prettyocean85](https://github.com/prettyocean85) | 20 |
| [livingbody](https://github.com/livingbody) | 70 | [nmusik](https://github.com/nmusik) | 20 |
| [fanruinet](https://github.com/fanruinet) | 70 | [kjf4096](https://github.com/kjf4096) | 20 |
| [bupt906](https://github.com/bupt906) | 60 | [chccc1994](https://github.com/chccc1994) | 20 |
| [edencfc](https://github.com/edencfc) | 57 | [BeyondYourself ](https://github.com/BeyondYourself) | 20 |
| [zhangyingying520](https://github.com/zhangyingying520) | 57 | chenguoqi08161 | 18 |
| [ITerydh](https://github.com/ITerydh) | 55 | [weiwenlan](https://github.com/weiwenlan) | 10 |
| [telppa](https://github.com/telppa) | 40 | [shaoshenchen thinc](https://github.com/shaoshenchen) | 10 |
| sosojust1984 | 40 | [jordan2013](https://github.com/jordan2013) | 10 |
| [redearly123](https://github.com/redearly123) | 40 | [JimEverest](https://github.com/JimEverest) | 10 |
| [OneYearIsEnough](https://github.com/OneYearIsEnough) | 40 | [HustBestCat](https://github.com/HustBestCat) | 10 |
| [Huntersdeng](https://github.com/Huntersdeng) | 40 | | |
| [GreatV](https://github.com/GreatV) | 40 | | |
| CLXK294 | 40 | | |
| 开发者 | 总积分(+新增积分) | 开发者 | 总积分(+新增积分) |
| ---------------------------------------------------------- | ---------------- | ---------------------------------------------------- | --------------- |
| [RangeKing](https://github.com/RangeKing) 🥇 | 222(+2) | CLXK294 | 40 |
| [mymagicpower](https://github.com/mymagicpower) 🥈 | 150(+10) | [telppa](https://github.com/telppa) | 40 |
| [hao6699 ](https://github.com/hao6699) 🥉 | 145 | sosojust1984 | 40 |
| [raoyutian](https://github.com/raoyutian) | 135(+45) | [WZMIAOMIAO](https://github.com/WZMIAOMIAO) | 36 |
| [OneYearIsEnough](https://github.com/OneYearIsEnough) 🏅 ↑12 | 120(+80) | [v3fc](https://github.com/v3fc) | 35 |
| [edencfc](https://github.com/edencfc) ↑5 | 97(+40) | [imiyu](https://github.com/imiyu) | 30 |
| [zhangyingying520](https://github.com/zhangyingying520) ↑5 | 91(+34) | [haigang1975](https://github.com/haigang1975) | 29 |
| [sdcb](https://github.com/sdcb) | 90(+10) | [daassh](https://github.com/daassh) | 23 |
| [xiaxianlei](https://github.com/xiaxianlei) ✨ 🏆 | +81 | [BeyondYourself](https://github.com/BeyondYourself) | 26(+6) |
| [zhiminzhang0830](https://github.com/zhiminzhang0830) | 80(+10) | [xiaoyangyang2](https://github.com/xiaoyangyang2) | 26(+6) |
| [Lovely-Pig](https://github.com/Lovely-Pig) | 70 | [prettyocean85](https://github.com/prettyocean85) | 20 |
| [livingbody](https://github.com/livingbody) | 74(+4) | [nmusik](https://github.com/nmusik) | 20 |
| [fanruinet](https://github.com/fanruinet) | 70 | [kjf4096](https://github.com/kjf4096) | 20 |
| [bupt906](https://github.com/bupt906) | 64(+4) | [chccc1994](https://github.com/chccc1994) | 20 |
| [PeterH0323](https://github.com/PeterH0323) ✨ 🎖 | +55 | chenguoqi08161 | 18 |
| [ITerydh](https://github.com/ITerydh) | 55 | [JimEverest](https://github.com/JimEverest) | 14(+4) |
| [d2623587501](https://github.com/d2623587501) | 51(+6) | [weiwenlan](https://github.com/weiwenlan) | 10 |
| [Wei-JL](https://github.com/Wei-JL) | 46(+6) | [HustBestCat](https://github.com/HustBestCat) | 10 |
| [GreatV](https://github.com/GreatV) | 42(+2) | [shaoshenchen thinc](https://github.com/shaoshenchen) | 10 |
| [fuhengwu2021](https://github.com/fuhengwu2021) ✨ | +40 | [jordan2013](https://github.com/jordan2013) | 10 |
| [manangoel99](https://github.com/manangoel99) ✨ | +40 | | |
| [redearly123](https://github.com/redearly123) | 40 | | |
| [Huntersdeng](https://github.com/Huntersdeng) | 40 | | |
......@@ -4,8 +4,7 @@
- 半自动标注工具[PPOCRLabelv2](../../PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
- OCR产业落地工具集:打通22种训练部署软硬件环境与方式,覆盖企业90%的训练部署环境需求
- 交互式OCR开源电子书[《动手学OCR》](./ocr_book.md),覆盖OCR全栈技术的前沿理论与代码实践,并配套教学视频。
- 2022.5.7 添加对[Weights & Biases](https://docs.wandb.ai/)训练日志记录工具的支持。
- 2021.12.21 《OCR十讲》课程开讲,12月21日起每晚八点半线上授课! 【免费】报名地址:https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 《动手学OCR·十讲》课程开讲,12月21日起每晚八点半线上授课! 【免费】[报名地址](https://aistudio.baidu.com/aistudio/course/introduce/25207)
- 2021.12.21 发布PaddleOCR v2.4。OCR算法新增1种文本检测算法(PSENet),3种文本识别算法(NRTR、SEED、SAR);文档结构化算法新增1种关键信息提取算法(SDMGR),3种DocVQA算法(LayoutLM、LayoutLMv2,LayoutXLM)。
- 2021.9.7 发布PaddleOCR v2.3,发布[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
......@@ -29,7 +28,7 @@
- 2020.7.9 添加支持空格的识别模型,识别效果,预测及训练方式请参考快速开始和文本识别训练相关文档
- 2020.7.9 添加数据增强、学习率衰减策略,具体参考[配置文件](./config.md)
- 2020.6.8 添加[数据集](dataset/datasets.md),并保持持续更新
- 2020.6.5 支持 `attetnion` 模型导出 `inference_model`
- 2020.6.5 支持 `attention` 模型导出 `inference_model`
- 2020.6.5 支持单独预测识别时,输出结果得分
- 2020.5.30 提供超轻量级中文OCR在线体验
- 2020.5.30 模型预测、训练支持Windows系统
......
......@@ -77,7 +77,7 @@ LK-PAN (Large Kernel PAN) is a lightweight [PAN](https://arxiv.org/pdf/1803.0153
**(2) DML: Deep Mutual Learning Strategy for Teacher Model**
[DML](https://arxiv.org/abs/1706.00384)(Collaborative Mutual Learning), as shown in the figure below, can effectively improve the accuracy of the text detection model by learning from each other with two models with the same structure. The DML strategy is adopted in the teacher model training, and the hmean is increased from 85% to 86%. By updating the teacher model of CML in PP-OCRv2 to the above-mentioned higher-precision one, the hmean of the student model can be further improved from 83.2% to 84.3%.
[DML](https://arxiv.org/abs/1706.00384)(Deep Mutual Learning), as shown in the figure below, can effectively improve the accuracy of the text detection model by learning from each other with two models with the same structure. The DML strategy is adopted in the teacher model training, and the hmean is increased from 85% to 86%. By updating the teacher model of CML in PP-OCRv2 to the above-mentioned higher-precision one, the hmean of the student model can be further improved from 83.2% to 84.3%.
<div align="center">
......@@ -101,7 +101,7 @@ Considering that the features of some channels will be suppressed if the convolu
The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). RNN is abandoned in SVTR, and the context information of the text line image is more effectively mined by introducing the Transformers structure, thereby improving the text recognition ability.
The recognition accuracy of SVTR_inty outperforms PP-OCRv2 recognition model by 5.3%, while the prediction speed nearly 11 times slower. It takes nearly 100ms to predict a text line on CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model.
The recognition accuracy of SVTR_tiny outperforms PP-OCRv2 recognition model by 5.3%, while the prediction speed nearly 11 times slower. It takes nearly 100ms to predict a text line on CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model.
<div align="center">
<img src="../ppocr_v3/v3_rec_pipeline.png" width=800>
......
......@@ -29,10 +29,10 @@ PP-OCR pipeline is as follows:
PP-OCR system is in continuous optimization. At present, PP-OCR and PP-OCRv2 have been released:
PP-OCR adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
PP-OCR adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the [PP-OCR technical report](https://arxiv.org/abs/2009.09941).
#### PP-OCRv2
On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (https://arxiv.org/abs/2109.03144).
On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the [PP-OCRv2 technical report](https://arxiv.org/abs/2109.03144).
#### PP-OCRv3
......@@ -46,7 +46,7 @@ PP-OCRv3 pipeline is as follows:
<img src="../ppocrv3_framework.png" width="800">
</div>
For more details, please refer to [PP-OCRv3 technical report](./PP-OCRv3_introduction_en.md).
For more details, please refer to [PP-OCRv3 technical report](https://arxiv.org/abs/2206.03001v2).
<a name="2"></a>
## 2. Features
......
......@@ -119,7 +119,18 @@ If you do not use the provided test image, you can replace the following `--imag
['PAIN', 0.9934559464454651]
```
If you need to use the 2.0 model, please specify the parameter `--ocr_version PP-OCR`, paddleocr uses the PP-OCRv3 model by default(`--ocr_version PP-OCRv3`). More whl package usage can be found in [whl package](./whl_en.md)
**Version**
paddleocr uses the PP-OCRv3 model by default(`--ocr_version PP-OCRv3`). If you want to use other versions, you can set the parameter `--ocr_version`, the specific version description is as follows:
| version name | description |
| --- | --- |
| PP-OCRv3 | support Chinese and English detection and recognition, direction classifier, support multilingual recognition |
| PP-OCRv2 | only supports Chinese and English detection and recognition, direction classifier, multilingual model is not updated |
| PP-OCR | support Chinese and English detection and recognition, direction classifier, support multilingual recognition |
If you want to add your own trained model, you can add model links and keys in [paddleocr](../../paddleocr.py) and recompile.
More whl package usage can be found in [whl package](./whl_en.md)
<a name="212-multi-language-model"></a>
#### 2.1.2 Multi-language Model
......
# Paddleocr Package
# PaddleOCR Package
## 1 Get started quickly
### 1.1 install package
......
......@@ -154,7 +154,13 @@ MODEL_URLS = {
'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar',
'dict_path': './ppocr/utils/ppocr_keys_v1.txt'
}
}
},
'cls': {
'ch': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar',
}
},
},
'PP-OCR': {
'det': {
......@@ -440,7 +446,7 @@ class PaddleOCR(predict_system.TextSystem):
"""
ocr with paddleocr
args:
img: img for ocr, support ndarray, img_path and list or ndarray
img: img for ocr, support ndarray, img_path and list of ndarray
det: use text detection or not. If false, only rec will be exec. Default is True
rec: use text recognition or not. If false, only det will be exec. Default is True
cls: use angle classifier or not. Default is True. If true, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False.
......
......@@ -35,10 +35,12 @@ class CopyPaste(object):
point_num = data['polys'].shape[1]
src_img = data['image']
src_polys = data['polys'].tolist()
src_texts = data['texts']
src_ignores = data['ignore_tags'].tolist()
ext_data = data['ext_data'][0]
ext_image = ext_data['image']
ext_polys = ext_data['polys']
ext_texts = ext_data['texts']
ext_ignores = ext_data['ignore_tags']
indexs = [i for i in range(len(ext_ignores)) if not ext_ignores[i]]
......@@ -53,7 +55,7 @@ class CopyPaste(object):
src_img = cv2.cvtColor(src_img, cv2.COLOR_BGR2RGB)
ext_image = cv2.cvtColor(ext_image, cv2.COLOR_BGR2RGB)
src_img = Image.fromarray(src_img).convert('RGBA')
for poly, tag in zip(select_polys, select_ignores):
for idx, poly, tag in zip(select_idxs, select_polys, select_ignores):
box_img = get_rotate_crop_image(ext_image, poly)
src_img, box = self.paste_img(src_img, box_img, src_polys)
......@@ -62,6 +64,7 @@ class CopyPaste(object):
for _ in range(len(box), point_num):
box.append(box[-1])
src_polys.append(box)
src_texts.append(ext_texts[idx])
src_ignores.append(tag)
src_img = cv2.cvtColor(np.array(src_img), cv2.COLOR_RGB2BGR)
h, w = src_img.shape[:2]
......@@ -70,6 +73,7 @@ class CopyPaste(object):
src_polys[:, :, 1] = np.clip(src_polys[:, :, 1], 0, h)
data['image'] = src_img
data['polys'] = src_polys
data['texts'] = src_texts
data['ignore_tags'] = np.array(src_ignores)
return data
......
......@@ -23,7 +23,7 @@ import string
from shapely.geometry import LineString, Point, Polygon
import json
import copy
from scipy.spatial import distance as dist
from ppocr.utils.logging import get_logger
......@@ -74,9 +74,10 @@ class DetLabelEncode(object):
s = pts.sum(axis=1)
rect[0] = pts[np.argmin(s)]
rect[2] = pts[np.argmax(s)]
diff = np.diff(pts, axis=1)
rect[1] = pts[np.argmin(diff)]
rect[3] = pts[np.argmax(diff)]
tmp = np.delete(pts, (np.argmin(s), np.argmax(s)), axis=0)
diff = np.diff(np.array(tmp), axis=1)
rect[1] = tmp[np.argmin(diff)]
rect[3] = tmp[np.argmax(diff)]
return rect
def expand_points_num(self, boxes):
......@@ -443,7 +444,9 @@ class KieLabelEncode(object):
elif 'key_cls' in ann.keys():
labels.append(ann['key_cls'])
else:
raise ValueError("Cannot found 'key_cls' in ann.keys(), please check your training annotation.")
raise ValueError(
"Cannot found 'key_cls' in ann.keys(), please check your training annotation."
)
edges.append(ann.get('edge', 0))
ann_infos = dict(
image=data['image'],
......
......@@ -33,7 +33,7 @@ class SimpleDataSet(Dataset):
self.delimiter = dataset_config.get('delimiter', '\t')
label_file_list = dataset_config.pop('label_file_list')
data_source_num = len(label_file_list)
ratio_list = dataset_config.get("ratio_list", [1.0])
ratio_list = dataset_config.get("ratio_list", 1.0)
if isinstance(ratio_list, (float, int)):
ratio_list = [float(ratio_list)] * int(data_source_num)
......
......@@ -57,17 +57,27 @@ class CELoss(nn.Layer):
class KLJSLoss(object):
def __init__(self, mode='kl'):
assert mode in ['kl', 'js', 'KL', 'JS'
], "mode can only be one of ['kl', 'js', 'KL', 'JS']"
], "mode can only be one of ['kl', 'KL', 'js', 'JS']"
self.mode = mode
def __call__(self, p1, p2, reduction="mean"):
loss = paddle.multiply(p2, paddle.log((p2 + 1e-5) / (p1 + 1e-5) + 1e-5))
if self.mode.lower() == "js":
if self.mode.lower() == 'kl':
loss = paddle.multiply(p2,
paddle.log((p2 + 1e-5) / (p1 + 1e-5) + 1e-5))
loss += paddle.multiply(
p1, paddle.log((p1 + 1e-5) / (p2 + 1e-5) + 1e-5))
loss *= 0.5
elif self.mode.lower() == "js":
loss = paddle.multiply(
p2, paddle.log((2 * p2 + 1e-5) / (p1 + p2 + 1e-5) + 1e-5))
loss += paddle.multiply(
p1, paddle.log((2 * p1 + 1e-5) / (p1 + p2 + 1e-5) + 1e-5))
loss *= 0.5
else:
raise ValueError(
"The mode.lower() if KLJSLoss should be one of ['kl', 'js']")
if reduction == "mean":
loss = paddle.mean(loss, axis=[1, 2])
elif reduction == "none" or reduction is None:
......@@ -95,7 +105,7 @@ class DMLLoss(nn.Layer):
self.act = None
self.use_log = use_log
self.jskl_loss = KLJSLoss(mode="js")
self.jskl_loss = KLJSLoss(mode="kl")
def _kldiv(self, x, target):
eps = 1.0e-10
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment