Commit 721c76b4 authored by LDOUBLEV's avatar LDOUBLEV
Browse files

fix conflict

parents 98162be4 b77f9ec0
# Add new algorithm # Add New Algorithm
PaddleOCR decomposes an algorithm into the following parts, and modularizes each part to make it more convenient to develop new algorithms. PaddleOCR decomposes an algorithm into the following parts, and modularizes each part to make it more convenient to develop new algorithms.
...@@ -263,7 +263,7 @@ Metric: ...@@ -263,7 +263,7 @@ Metric:
main_indicator: acc main_indicator: acc
``` ```
## 优化器 ## Optimizer
The optimizer is used to train the network. The optimizer also contains network regularization and learning rate decay modules. This part is under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in The optimizer is used to train the network. The optimizer also contains network regularization and learning rate decay modules. This part is under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in
Commonly used optimizer modules such as `Momentum`, `Adam` and `RMSProp`, common regularization modules such as `Linear`, `Cosine`, `Step` and `Piecewise`, and common learning rate decay modules such as `L1Decay` and `L2Decay`. Commonly used optimizer modules such as `Momentum`, `Adam` and `RMSProp`, common regularization modules such as `Linear`, `Cosine`, `Step` and `Piecewise`, and common learning rate decay modules such as `L1Decay` and `L2Decay`.
......
# Two-stage Algorithm
- [1. Algorithm Introduction](#1-algorithm-introduction)
* [1.1 Text Detection Algorithm](#11-text-detection-algorithm)
* [1.2 Text Recognition Algorithm](#12-text-recognition-algorithm)
- [2. Training](#2-training)
- [3. Inference](#3-inference)
<a name="Algorithm_introduction"></a> <a name="Algorithm_introduction"></a>
## Algorithm introduction
## 1. Algorithm Introduction
This tutorial lists the text detection algorithms and text recognition algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v2.0 models list](./models_list_en.md). This tutorial lists the text detection algorithms and text recognition algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v2.0 models list](./models_list_en.md).
...@@ -8,31 +17,32 @@ This tutorial lists the text detection algorithms and text recognition algorithm ...@@ -8,31 +17,32 @@ This tutorial lists the text detection algorithms and text recognition algorithm
- [2. Text Recognition Algorithm](#TEXTRECOGNITIONALGORITHM) - [2. Text Recognition Algorithm](#TEXTRECOGNITIONALGORITHM)
<a name="TEXTDETECTIONALGORITHM"></a> <a name="TEXTDETECTIONALGORITHM"></a>
### 1. Text Detection Algorithm
### 1.1 Text Detection Algorithm
PaddleOCR open source text detection algorithms list: PaddleOCR open source text detection algorithms list:
- [x] EAST([paper](https://arxiv.org/abs/1704.03155)) - [x] EAST([paper](https://arxiv.org/abs/1704.03155))[2]
- [x] DB([paper](https://arxiv.org/abs/1911.08947)) - [x] DB([paper](https://arxiv.org/abs/1911.08947))[1]
- [x] SAST([paper](https://arxiv.org/abs/1908.05498)) - [x] SAST([paper](https://arxiv.org/abs/1908.05498))[4]
- [x] PSE([paper](https://arxiv.org/abs/1903.12473v2)) - [x] PSENet([paper](https://arxiv.org/abs/1903.12473v2)
On the ICDAR2015 dataset, the text detection result is as follows: On the ICDAR2015 dataset, the text detection result is as follows:
|Model|Backbone|precision|recall|Hmean|Download link| |Model|Backbone|Precision|Recall|Hmean|Download link|
| --- | --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- | --- |
|EAST|ResNet50_vd|85.80%|86.71%|86.25%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)| |EAST|ResNet50_vd|85.80%|86.71%|86.25%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)|
|EAST|MobileNetV3|79.42%|80.64%|80.03%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar)| |EAST|MobileNetV3|79.42%|80.64%|80.03%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_east_v2.0_train.tar)|
|DB|ResNet50_vd|86.41%|78.72%|82.38%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)| |DB|ResNet50_vd|86.41%|78.72%|82.38%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
|DB|MobileNetV3|77.29%|73.08%|75.12%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)| |DB|MobileNetV3|77.29%|73.08%|75.12%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
|SAST|ResNet50_vd|91.39%|83.77%|87.42%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)| |SAST|ResNet50_vd|91.39%|83.77%|87.42%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|
|PSE|ResNet50_vd|85.81%|79.53%|82.55%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)| |PSE|ResNet50_vd|85.81%|79.53%|82.55%|[trianed model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)|
|PSE|MobileNetV3|82.20%|70.48%|75.89%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)| |PSE|MobileNetV3|82.20%|70.48%|75.89%|[trianed model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)|
On Total-Text dataset, the text detection result is as follows: On Total-Text dataset, the text detection result is as follows:
|Model|Backbone|precision|recall|Hmean|Download link| |Model|Backbone|Precision|Recall|Hmean|Download link|
| --- | --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- | --- |
|SAST|ResNet50_vd|89.63%|78.44%|83.66%|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)| |SAST|ResNet50_vd|89.63%|78.44%|83.66%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)|
**Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from: **Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from:
* [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi). * [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
...@@ -41,31 +51,41 @@ On Total-Text dataset, the text detection result is as follows: ...@@ -41,31 +51,41 @@ On Total-Text dataset, the text detection result is as follows:
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./detection_en.md) For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./detection_en.md)
<a name="TEXTRECOGNITIONALGORITHM"></a> <a name="TEXTRECOGNITIONALGORITHM"></a>
### 2. Text Recognition Algorithm ### 1.2 Text Recognition Algorithm
PaddleOCR open-source text recognition algorithms list: PaddleOCR open-source text recognition algorithms list:
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) - [x] CRNN([paper](https://arxiv.org/abs/1507.05717))[7]
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085)) - [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))[10]
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)) - [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))[11]
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1)) - [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))[12]
- [x] SRN([paper](https://arxiv.org/abs/2003.12294)) - [x] SRN([paper](https://arxiv.org/abs/2003.12294))[5]
- [x] NRTR([paper](https://arxiv.org/abs/1806.00926v2)) - [x] NRTR([paper](https://arxiv.org/abs/1806.00926v2))[13]
- [x] SAR([paper](https://arxiv.org/abs/1811.00751v2)) - [x] SAR([paper](https://arxiv.org/abs/1811.00751v2))
- [x] SEED([paper](https://arxiv.org/pdf/2005.10977.pdf))
Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow: Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow:
|Model|Backbone|Avg Accuracy|Module combination|Download link| |Model|Backbone|Avg Accuracy|Module combination|Download link|
|---|---|---|---|---| |---|---|---|---|---|
|Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)| |Rosetta|Resnet34_vd|80.9%|rec_r34_vd_none_none_ctc|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_none_ctc_v2.0_train.tar)|
|Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)| |Rosetta|MobileNetV3|78.05%|rec_mv3_none_none_ctc|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_none_ctc_v2.0_train.tar)|
|CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)| |CRNN|Resnet34_vd|82.76%|rec_r34_vd_none_bilstm_ctc|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)|
|CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)| |CRNN|MobileNetV3|79.97%|rec_mv3_none_bilstm_ctc|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar)|
|StarNet|Resnet34_vd|84.44%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_ctc_v2.0_train.tar)| |StarNet|Resnet34_vd|84.44%|rec_r34_vd_tps_bilstm_ctc|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_ctc_v2.0_train.tar)|
|StarNet|MobileNetV3|81.42%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_tps_bilstm_ctc_v2.0_train.tar)| |StarNet|MobileNetV3|81.42%|rec_mv3_tps_bilstm_ctc|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_tps_bilstm_ctc_v2.0_train.tar)|
|RARE|MobileNetV3|82.5%|rec_mv3_tps_bilstm_att |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_tps_bilstm_att_v2.0_train.tar)| |RARE|MobileNetV3|82.5%|rec_mv3_tps_bilstm_att |[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_tps_bilstm_att_v2.0_train.tar)|
|RARE|Resnet34_vd|83.6%|rec_r34_vd_tps_bilstm_att |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)| |RARE|Resnet34_vd|83.6%|rec_r34_vd_tps_bilstm_att |[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)|
|SRN|Resnet50_vd_fpn| 88.52% | rec_r50fpn_vd_none_srn |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)| |SRN|Resnet50_vd_fpn| 88.52% | rec_r50fpn_vd_none_srn |[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)|
|NRTR|NRTR_MTB| 84.3% | rec_mtb_nrtr | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) | |NRTR|NRTR_MTB| 84.3% | rec_mtb_nrtr | [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) |
|SAR|Resnet31| 87.2% | rec_r31_sar | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) | |SAR|Resnet31| 87.2% | rec_r31_sar | [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
|SEED|Aster_Resnet| 85.2% | rec_resnet_stn_bilstm_att | [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_resnet_stn_bilstm_att.tar) |
Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./recognition_en.md)
Please refer to the document for training guide and use of PaddleOCR
## 2. Training
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./detection_en.md). For text recognition algorithms, please refer to [Text recognition model training/evaluation/prediction](./recognition_en.md)
## 3. Inference
Except for the PP-OCR series models of the above models, the other models only support inference based on the Python engine. For details, please refer to [Inference based on Python prediction engine](./inference_en.md)
...@@ -15,7 +15,6 @@ The text line image obtained after text detection is sent to the recognition mod ...@@ -15,7 +15,6 @@ The text line image obtained after text detection is sent to the recognition mod
Example of 0 and 180 degree data samples: Example of 0 and 180 degree data samples:
![](../imgs_results/angle_class_example.jpg) ![](../imgs_results/angle_class_example.jpg)
### DATA PREPARATION
<a name="data-preparation"></a> <a name="data-preparation"></a>
## 2. Data Preparation ## 2. Data Preparation
...@@ -131,8 +130,6 @@ python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/ ...@@ -131,8 +130,6 @@ python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/
<a name="prediction"></a> <a name="prediction"></a>
## 5. Prediction ## 5. Prediction
### PREDICTION
* Training engine prediction * Training engine prediction
Using the model trained by paddleocr, you can quickly get prediction through the following script. Using the model trained by paddleocr, you can quickly get prediction through the following script.
......
...@@ -36,11 +36,10 @@ Take rec_chinese_lite_train_v2.0.yml as an example ...@@ -36,11 +36,10 @@ Take rec_chinese_lite_train_v2.0.yml as an example
| pretrained_model | Set the path of the pre-trained model | ./pretrain_models/CRNN/best_accuracy | \ | | pretrained_model | Set the path of the pre-trained model | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | set model parameter path | None | Used to load parameters after interruption to continue training| | checkpoints | set model parameter path | None | Used to load parameters after interruption to continue training|
| use_visualdl | Set whether to enable visualdl for visual log display | False | [Tutorial](https://www.paddlepaddle.org.cn/paddle/visualdl) | | use_visualdl | Set whether to enable visualdl for visual log display | False | [Tutorial](https://www.paddlepaddle.org.cn/paddle/visualdl) |
| infer_img | Set inference image path or folder path | ./infer_img | \| | infer_img | Set inference image path or folder path | ./infer_img | \||
| character_dict_path | Set dictionary path | ./ppocr/utils/ppocr_keys_v1.txt | \ | | character_dict_path | Set dictionary path | ./ppocr/utils/ppocr_keys_v1.txt | If the character_dict_path is None, model can only recognize number and lower letters |
| max_text_length | Set the maximum length of text | 25 | \ | | max_text_length | Set the maximum length of text | 25 | \ |
| character_type | Set character type | ch | en/ch, the default dict will be used for en, and the custom dict will be used for ch | | use_space_char | Set whether to recognize spaces | True | \| |
| use_space_char | Set whether to recognize spaces | True | Only support in character_type=ch mode |
| label_list | Set the angle supported by the direction classifier | ['0','180'] | Only valid in angle classifier model | | label_list | Set the angle supported by the direction classifier | ['0','180'] | Only valid in angle classifier model |
| save_res_path | Set the save address of the test model results | ./output/det_db/predicts_db.txt | Only valid in the text detection model | | save_res_path | Set the save address of the test model results | ./output/det_db/predicts_db.txt | Only valid in the text detection model |
...@@ -196,7 +195,6 @@ Italian is made up of Latin letters, so after executing the command, you will ge ...@@ -196,7 +195,6 @@ Italian is made up of Latin letters, so after executing the command, you will ge
use_gpu: True use_gpu: True
epoch_num: 500 epoch_num: 500
... ...
character_type: it # language
character_dict_path: {path/of/dict} # path of dict character_dict_path: {path/of/dict} # path of dict
Train: Train:
...@@ -218,18 +216,18 @@ Italian is made up of Latin letters, so after executing the command, you will ge ...@@ -218,18 +216,18 @@ Italian is made up of Latin letters, so after executing the command, you will ge
Currently, the multi-language algorithms supported by PaddleOCR are: Currently, the multi-language algorithms supported by PaddleOCR are:
| Configuration file | Algorithm name | backbone | trans | seq | pred | language | character_type | | Configuration file | Algorithm name | backbone | trans | seq | pred | language |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: | | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: |
| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional | chinese_cht| | rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional |
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English(Case sensitive) | EN | | rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English(Case sensitive) |
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French | french | | rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French |
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German | german | | rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German |
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | japan | | rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean | korean | | rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |
| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin | latin | | rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin |
| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic | ar | | rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic |
| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic | cyrillic | | rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic |
| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari | devanagari | | rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari |
For more supported languages, please refer to : [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations) For more supported languages, please refer to : [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations)
......
...@@ -99,6 +99,18 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o \ ...@@ -99,6 +99,18 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o \
# Set the GPU ID used by the '--gpus' parameter. # Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
# multi-Node, multi-GPU training
# Set the IPs of your nodes used by the '--ips' parameter. Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```
**Note:** For multi-Node multi-GPU training, you need to replace the `ips` value in the preceding command with the address of your machine, and the machines must be able to ping each other. In addition, it requires activating commands separately on multiple machines when we start the training. The command for viewing the IP address of the machine is `ifconfig`.
If you want to further speed up the training, you can use [automatic mixed precision training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_en.html). for single card training, the command is as follows:
```
python3 tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
``` ```
### 2.2 Load Trained Model and Continue Training ### 2.2 Load Trained Model and Continue Training
...@@ -223,6 +235,7 @@ python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./out ...@@ -223,6 +235,7 @@ python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./out
## 5. FAQ ## 5. FAQ
Q1: The prediction results of trained model and inference model are inconsistent? Q1: The prediction results of trained model and inference model are inconsistent?
**A**: Most of the problems are caused by the inconsistency of the pre-processing and post-processing parameters during the prediction of the trained model and the pre-processing and post-processing parameters during the prediction of the inference model. Taking the model trained by the det_mv3_db.yml configuration file as an example, the solution to the problem of inconsistent prediction results between the training model and the inference model is as follows: **A**: Most of the problems are caused by the inconsistency of the pre-processing and post-processing parameters during the prediction of the trained model and the pre-processing and post-processing parameters during the prediction of the inference model. Taking the model trained by the det_mv3_db.yml configuration file as an example, the solution to the problem of inconsistent prediction results between the training model and the inference model is as follows:
- Check whether the [trained model preprocessing](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116) is consistent with the prediction [preprocessing function of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42). When the algorithm is evaluated, the input image size will affect the accuracy. In order to be consistent with the paper, the image is resized to [736, 1280] in the training icdar15 configuration file, but there is only a set of default parameters when the inference model predicts, which will be considered To predict the speed problem, the longest side of the image is limited to 960 for resize by default. The preprocessing function of the training model preprocessing and the inference model is located in [ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147) - Check whether the [trained model preprocessing](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116) is consistent with the prediction [preprocessing function of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42). When the algorithm is evaluated, the input image size will affect the accuracy. In order to be consistent with the paper, the image is resized to [736, 1280] in the training icdar15 configuration file, but there is only a set of default parameters when the inference model predicts, which will be considered To predict the speed problem, the longest side of the image is limited to 960 for resize by default. The preprocessing function of the training model preprocessing and the inference model is located in [ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147)
- Check whether the [post-processing of the trained model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51) is consistent with the [post-processing parameters of the inference](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50). - Check whether the [post-processing of the trained model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51) is consistent with the [post-processing parameters of the inference](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50).
# Environment Preparation # Environment Preparation
Windows and Mac users are recommended to use Anaconda to build a Python environment, and Linux users are recommended to use docker to build a Python environment. If you are familiar with the Python environment, you can skip to step 2 to install PaddlePaddle.
Recommended working environment:
- PaddlePaddle >= 2.0.0 (2.1.2)
- python3.7
- CUDA10.1 / CUDA10.2
- CUDNN 7.6
* [1. Python Environment Setup](#1) * [1. Python Environment Setup](#1)
+ [1.1 Windows](#1.1) + [1.1 Windows](#1.1)
+ [1.2 Mac](#1.2) + [1.2 Mac](#1.2)
+ [1.3 Linux](#1.3) + [1.3 Linux](#1.3)
* [2. Install PaddlePaddle 2.0](#2) * [2. Install PaddlePaddle 2.0](#2)
<a name="1"></a> <a name="1"></a>
## 1. Python Environment Setup ## 1. Python Environment Setup
...@@ -38,7 +47,7 @@ ...@@ -38,7 +47,7 @@
- Check conda to add environment variables and ignore the warning that - Check conda to add environment variables and ignore the warning that
<img src="../install/windows/anaconda_install_env.png" alt="add conda to path" width="500" align="center"/> <img src="../install/windows/anaconda_install_env.png" alt="add conda to path" width="500" align="center"/>
#### 1.1.2 Opening the terminal and creating the conda environment #### 1.1.2 Opening the terminal and creating the conda environment
...@@ -69,7 +78,7 @@ ...@@ -69,7 +78,7 @@
# View the current location of python # View the current location of python
where python where python
``` ```
<img src="../install/windows/conda_list_env.png" alt="create environment" width="600" align="center"/> <img src="../install/windows/conda_list_env.png" alt="create environment" width="600" align="center"/>
The above anaconda environment and python environment are installed The above anaconda environment and python environment are installed
...@@ -133,13 +142,13 @@ The above anaconda environment and python environment are installed ...@@ -133,13 +142,13 @@ The above anaconda environment and python environment are installed
# !!! Contents within this block are managed by 'conda init' !!! # !!! Contents within this block are managed by 'conda init' !!!
__conda_setup="$('/Users/xxx/opt/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)" __conda_setup="$('/Users/xxx/opt/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then if [ $? -eq 0 ]; then
eval "$__conda_setup" eval "$__conda_setup"
else else
if [ -f "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh" ]; then if [ -f "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh" ]; then
. "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh" . "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh"
else else
export PATH="/Users/xxx/opt/anaconda3/bin:$PATH" export PATH="/Users/xxx/opt/anaconda3/bin:$PATH"
fi fi
fi fi
unset __conda_setup unset __conda_setup
# <<< conda initialize <<< # <<< conda initialize <<<
...@@ -197,11 +206,10 @@ Linux users can choose to run either Anaconda or Docker. If you are familiar wit ...@@ -197,11 +206,10 @@ Linux users can choose to run either Anaconda or Docker. If you are familiar wit
- **Download Anaconda**. - **Download Anaconda**.
- Download at: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D - Download at: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D
<img src="../install/linux/anaconda_download.png" akt="anaconda download" width="800" align="center"/> <img src="../install/linux/anaconda_download.png" akt="anaconda download" width="800" align="center"/>
- Select the appropriate version for your operating system - Select the appropriate version for your operating system
- Type `uname -m` in the terminal to check the command set used by your system - Type `uname -m` in the terminal to check the command set used by your system
...@@ -216,12 +224,12 @@ Linux users can choose to run either Anaconda or Docker. If you are familiar wit ...@@ -216,12 +224,12 @@ Linux users can choose to run either Anaconda or Docker. If you are familiar wit
sudo yum install wget # CentOS sudo yum install wget # CentOS
``` ```
```bash ```bash
# Then use wget to download from Tsinghua source # Then use wget to download from Tsinghua source
# If you want to download Anaconda3-2021.05-Linux-x86_64.sh, the download command is as follows # If you want to download Anaconda3-2021.05-Linux-x86_64.sh, the download command is as follows
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh
# If you want to download another version, you need to change the file name after the last 1 / to the version you want to download # If you want to download another version, you need to change the file name after the last 1 / to the version you want to download
``` ```
- To install Anaconda. - To install Anaconda.
- Type `sh Anaconda3-2021.05-Linux-x86_64.sh` at the command line - Type `sh Anaconda3-2021.05-Linux-x86_64.sh` at the command line
...@@ -309,7 +317,18 @@ cd /home/Projects ...@@ -309,7 +317,18 @@ cd /home/Projects
# Create a docker container named ppocr and map the current directory to the /paddle directory of the container # Create a docker container named ppocr and map the current directory to the /paddle directory of the container
# If using CPU, use docker instead of nvidia-docker to create docker # If using CPU, use docker instead of nvidia-docker to create docker
sudo docker run --name ppocr -v $PWD:/paddle --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash sudo docker run --name ppocr -v $PWD:/paddle --network=host -it registry.baidubce.com/paddlepaddle/paddle:2.1.3-gpu-cuda10.2-cudnn7 /bin/bash
# If using GPU, use nvidia-docker to create docker
# docker image registry.baidubce.com/paddlepaddle/paddle:2.1.3-gpu-cuda11.2-cudnn8 is recommended for CUDA11.2 + CUDNN8.
sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=host -it registry.baidubce.com/paddlepaddle/paddle:2.1.3-gpu-cuda10.2-cudnn7 /bin/bash
```
You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get the image that fits your machine.
```
# ctrl+P+Q to exit docker, to re-enter docker using the following command:
sudo docker container exec -it ppocr /bin/bash
``` ```
<a name="2"></a> <a name="2"></a>
...@@ -329,4 +348,3 @@ python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple ...@@ -329,4 +348,3 @@ python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
``` ```
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
...@@ -21,7 +21,7 @@ Next, we first introduce how to convert a trained model into an inference model, ...@@ -21,7 +21,7 @@ Next, we first introduce how to convert a trained model into an inference model,
- [2.2 DB Text Detection Model Inference](#DB_DETECTION) - [2.2 DB Text Detection Model Inference](#DB_DETECTION)
- [2.3 East Text Detection Model Inference](#EAST_DETECTION) - [2.3 East Text Detection Model Inference](#EAST_DETECTION)
- [2.4 Sast Text Detection Model Inference](#SAST_DETECTION) - [2.4 Sast Text Detection Model Inference](#SAST_DETECTION)
- [3. Text Recognition Model Inference](#RECOGNITION_MODEL_INFERENCE) - [3. Text Recognition Model Inference](#RECOGNITION_MODEL_INFERENCE)
- [3.1 Lightweight Chinese Text Recognition Model Reference](#LIGHTWEIGHT_RECOGNITION) - [3.1 Lightweight Chinese Text Recognition Model Reference](#LIGHTWEIGHT_RECOGNITION)
- [3.2 CTC-Based Text Recognition Model Inference](#CTC-BASED_RECOGNITION) - [3.2 CTC-Based Text Recognition Model Inference](#CTC-BASED_RECOGNITION)
...@@ -281,7 +281,7 @@ python3 tools/export_model.py -c configs/det/rec_r34_vd_none_bilstm_ctc.yml -o G ...@@ -281,7 +281,7 @@ python3 tools/export_model.py -c configs/det/rec_r34_vd_none_bilstm_ctc.yml -o G
For CRNN text recognition model inference, execute the following commands: For CRNN text recognition model inference, execute the following commands:
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt"
``` ```
![](../imgs_words_en/word_336.png) ![](../imgs_words_en/word_336.png)
...@@ -314,7 +314,7 @@ with the training, such as: --rec_image_shape="1, 64, 256" ...@@ -314,7 +314,7 @@ with the training, such as: --rec_image_shape="1, 64, 256"
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" \ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" \
--rec_model_dir="./inference/srn/" \ --rec_model_dir="./inference/srn/" \
--rec_image_shape="1, 64, 256" \ --rec_image_shape="1, 64, 256" \
--rec_char_type="en" \ --rec_char_dict_path="./ppocr/utils/ic15_dict.txt" \
--rec_algorithm="SRN" --rec_algorithm="SRN"
``` ```
...@@ -323,7 +323,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png ...@@ -323,7 +323,7 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png
If the text dictionary is modified during training, when using the inference model to predict, you need to specify the dictionary path used by `--rec_char_dict_path`, and set `rec_char_type=ch` If the text dictionary is modified during training, when using the inference model to predict, you need to specify the dictionary path used by `--rec_char_dict_path`, and set `rec_char_type=ch`
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_dict_path="your text dict path"
``` ```
<a name="MULTILINGUAL_MODEL_INFERENCE"></a> <a name="MULTILINGUAL_MODEL_INFERENCE"></a>
...@@ -333,7 +333,7 @@ If you need to predict other language models, when using inference model predict ...@@ -333,7 +333,7 @@ If you need to predict other language models, when using inference model predict
You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/fonts` path, such as Korean recognition: You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/fonts` path, such as Korean recognition:
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
``` ```
![](../imgs_words/korean/1.jpg) ![](../imgs_words/korean/1.jpg)
...@@ -399,7 +399,7 @@ If you want to try other detection algorithms or recognition algorithms, please ...@@ -399,7 +399,7 @@ If you want to try other detection algorithms or recognition algorithms, please
The following command uses the combination of the EAST text detection and STAR-Net text recognition: The following command uses the combination of the EAST text detection and STAR-Net text recognition:
``` ```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt"
``` ```
After executing the command, the recognition result image is as follows: After executing the command, the recognition result image is as follows:
......
# Python Inference for PP-OCR Model Library # Python Inference for PP-OCR Model Zoo
This article introduces the use of the Python inference engine for the PP-OCR model library. The content is in order of text detection, text recognition, direction classifier and the prediction method of the three in series on the CPU and GPU. This article introduces the use of the Python inference engine for the PP-OCR model library. The content is in order of text detection, text recognition, direction classifier and the prediction method of the three in series on the CPU and GPU.
- [Text Detection Model Inference](#DETECTION_MODEL_INFERENCE) - [Text Detection Model Inference](#DETECTION_MODEL_INFERENCE)
- [Text Recognition Model Inference](#RECOGNITION_MODEL_INFERENCE) - [Text Recognition Model Inference](#RECOGNITION_MODEL_INFERENCE)
- [1. Lightweight Chinese Recognition Model Inference](#LIGHTWEIGHT_RECOGNITION) - [1. Lightweight Chinese Recognition Model Inference](#LIGHTWEIGHT_RECOGNITION)
- [2. Multilingaul Model Inference](#MULTILINGUAL_MODEL_INFERENCE) - [2. Multilingaul Model Inference](#MULTILINGUAL_MODEL_INFERENCE)
- [Angle Classification Model Inference](#ANGLE_CLASS_MODEL_INFERENCE) - [Angle Classification Model Inference](#ANGLE_CLASS_MODEL_INFERENCE)
- [Text Detection Angle Classification and Recognition Inference Concatenation](#CONCATENATION) - [Text Detection Angle Classification and Recognition Inference Concatenation](#CONCATENATION)
<a name="DETECTION_MODEL_INFERENCE"></a> <a name="DETECTION_MODEL_INFERENCE"></a>
...@@ -22,10 +19,10 @@ The default configuration is based on the inference setting of the DB text detec ...@@ -22,10 +19,10 @@ The default configuration is based on the inference setting of the DB text detec
``` ```
# download DB text detection inference model # download DB text detection inference model
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar
tar xf ch_ppocr_mobile_v2.0_det_infer.tar tar xf ch_PP-OCRv2_det_infer.tar
# predict # run inference
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_PP-OCRv2_det_infer.tar/"
``` ```
The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with'det_res'. Examples of results are as follows: The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with'det_res'. Examples of results are as follows:
...@@ -42,12 +39,12 @@ Set as `limit_type='min', det_limit_side_len=960`, it means that the shortest si ...@@ -42,12 +39,12 @@ Set as `limit_type='min', det_limit_side_len=960`, it means that the shortest si
If the resolution of the input picture is relatively large and you want to use a larger resolution prediction, you can set det_limit_side_len to the desired value, such as 1216: If the resolution of the input picture is relatively large and you want to use a larger resolution prediction, you can set det_limit_side_len to the desired value, such as 1216:
``` ```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1216 python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --det_limit_type=max --det_limit_side_len=1216
``` ```
If you want to use the CPU for prediction, execute the command as follows If you want to use the CPU for prediction, execute the command as follows
``` ```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --use_gpu=False
``` ```
<a name="RECOGNITION_MODEL_INFERENCE"></a> <a name="RECOGNITION_MODEL_INFERENCE"></a>
...@@ -62,9 +59,10 @@ For lightweight Chinese recognition model inference, you can execute the followi ...@@ -62,9 +59,10 @@ For lightweight Chinese recognition model inference, you can execute the followi
``` ```
# download CRNN text recognition inference model # download CRNN text recognition inference model
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar
tar xf ch_ppocr_mobile_v2.0_rec_infer.tar tar xf ch_PP-OCRv2_rec_infer.tar
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_10.png" --rec_model_dir="ch_ppocr_mobile_v2.0_rec_infer" # run inference
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./ch_PP-OCRv2_rec_infer/"
``` ```
![](../imgs_words_en/word_10.png) ![](../imgs_words_en/word_10.png)
...@@ -78,10 +76,12 @@ Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.9897658) ...@@ -78,10 +76,12 @@ Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.9897658)
<a name="MULTILINGUAL_MODEL_INFERENCE"></a> <a name="MULTILINGUAL_MODEL_INFERENCE"></a>
### 2. Multilingaul Model Inference ### 2. Multilingaul Model Inference
If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results, If you need to predict [other language models](./models_list_en.md#Multilingual), when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results,
You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/fonts` path, such as Korean recognition: You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/fonts` path, such as Korean recognition:
``` ```
wget wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
``` ```
![](../imgs_words/korean/1.jpg) ![](../imgs_words/korean/1.jpg)
...@@ -120,13 +120,13 @@ When performing prediction, you need to specify the path of a single image or a ...@@ -120,13 +120,13 @@ When performing prediction, you need to specify the path of a single image or a
```shell ```shell
# use direction classifier # use direction classifier
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/ch_PP-OCRv2_rec_infer/" --use_angle_cls=true
# not use use direction classifier # not use use direction classifier
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --rec_model_dir="./inference/ch_PP-OCRv2_rec_infer/" --use_angle_cls=false
# use multi-process # use multi-process
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false --use_mp=True --total_process_num=6 python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" --rec_model_dir="./inference/ch_PP-OCRv2_rec_infer/" --use_angle_cls=false --use_mp=True --total_process_num=6
``` ```
......
## OCR model list(V2.1, updated on 2021.9.6) # OCR Model List(V2.1, updated on 2021.9.6)
> **Note** > **Note**
> 1. Compared with the model v2.0, the 2.1 version of the detection model has a improvement in accuracy, and the 2.1 version of the recognition model is optimized in accuracy and CPU speed. > 1. Compared with the model v2.0, the 2.1 version of the detection model has a improvement in accuracy, and the 2.1 version of the recognition model is optimized in accuracy and CPU speed.
> 2. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance. > 2. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance.
...@@ -6,9 +6,9 @@ ...@@ -6,9 +6,9 @@
- [1. Text Detection Model](#Detection) - [1. Text Detection Model](#Detection)
- [2. Text Recognition Model](#Recognition) - [2. Text Recognition Model](#Recognition)
- [Chinese Recognition Model](#Chinese) - [2.1 Chinese Recognition Model](#Chinese)
- [English Recognition Model](#English) - [2.2 English Recognition Model](#English)
- [Multilingual Recognition Model](#Multilingual) - [2.3 Multilingual Recognition Model](#Multilingual)
- [3. Text Angle Classification Model](#Angle) - [3. Text Angle Classification Model](#Angle)
- [4. Paddle-Lite Model](#Paddle-Lite) - [4. Paddle-Lite Model](#Paddle-Lite)
...@@ -25,26 +25,26 @@ Relationship of the above models is as follows. ...@@ -25,26 +25,26 @@ Relationship of the above models is as follows.
![](../imgs_en/model_prod_flow_en.png) ![](../imgs_en/model_prod_flow_en.png)
<a name="Detection"></a> <a name="Detection"></a>
### 1. Text Detection Model ## 1. Text Detection Model
|model name|description|config|model size|download| |model name|description|config|model size|download|
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
|ch_PP-OCRv2_det_slim|slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)| |ch_PP-OCRv2_det_slim|[New] slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)|
|ch_PP-OCRv2_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCR_det_cml.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| |ch_PP-OCRv2_det|[New] Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_det_cml.yml](../../configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_cml.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)|
|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|2.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)| |ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|2.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)|
|ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| |ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|
|ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieved better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| |ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieved better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)|
<a name="Recognition"></a> <a name="Recognition"></a>
### 2. Text Recognition Model ## 2. Text Recognition Model
<a name="Chinese"></a> <a name="Chinese"></a>
#### Chinese Recognition Model ### 2.1 Chinese Recognition Model
|model name|description|config|model size|download| |model name|description|config|model size|download|
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
|ch_PP-OCRv2_rec_slim|Slim qunatization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) | |ch_PP-OCRv2_rec_slim|[New] Slim qunatization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
|ch_PP-OCRv2_rec|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) | |ch_PP-OCRv2_rec|[New] Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | |ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) |
|ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | |ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
|ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | |ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
...@@ -53,7 +53,7 @@ Relationship of the above models is as follows. ...@@ -53,7 +53,7 @@ Relationship of the above models is as follows.
**Note:** The `trained model` is finetuned on the `pre-trained model` with real data and synthsized vertical text data, which achieved better performance in real scene. The `pre-trained model` is directly trained on the full amount of real data and synthsized data, which is more suitable for finetune on your own dataset. **Note:** The `trained model` is finetuned on the `pre-trained model` with real data and synthsized vertical text data, which achieved better performance in real scene. The `pre-trained model` is directly trained on the full amount of real data and synthsized data, which is more suitable for finetune on your own dataset.
<a name="English"></a> <a name="English"></a>
#### English Recognition Model ### 2.2 English Recognition Model
|model name|description|config|model size|download| |model name|description|config|model size|download|
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
...@@ -61,7 +61,7 @@ Relationship of the above models is as follows. ...@@ -61,7 +61,7 @@ Relationship of the above models is as follows.
|en_number_mobile_v2.0_rec|Original lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) | |en_number_mobile_v2.0_rec|Original lightweight model, supporting English and number recognition|[rec_en_number_lite_train.yml](../../configs/rec/multi_language/rec_en_number_lite_train.yml)|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar) |
<a name="Multilingual"></a> <a name="Multilingual"></a>
#### Multilingual Recognition Model(Updating...) ### 2.3 Multilingual Recognition Model(Updating...)
|model name| dict file | description|config|model size|download| |model name| dict file | description|config|model size|download|
| --- | --- | --- |--- | --- | --- | | --- | --- | --- |--- | --- | --- |
...@@ -82,7 +82,7 @@ For more supported languages, please refer to : [Multi-language model](./multi_l ...@@ -82,7 +82,7 @@ For more supported languages, please refer to : [Multi-language model](./multi_l
<a name="Angle"></a> <a name="Angle"></a>
### 3. Text Angle Classification Model ## 3. Text Angle Classification Model
|model name|description|config|model size|download| |model name|description|config|model size|download|
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
...@@ -90,7 +90,7 @@ For more supported languages, please refer to : [Multi-language model](./multi_l ...@@ -90,7 +90,7 @@ For more supported languages, please refer to : [Multi-language model](./multi_l
|ch_ppocr_mobile_v2.0_cls|Original model for text angle classification|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | |ch_ppocr_mobile_v2.0_cls|Original model for text angle classification|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |
<a name="Paddle-Lite"></a> <a name="Paddle-Lite"></a>
### 4. Paddle-Lite Model ## 4. Paddle-Lite Model
|Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch| |Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch|
|---|---|---|---|---|---|---| |---|---|---|---|---|---|---|
|PP-OCRv2|extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer_opt.nb)|v2.9| |PP-OCRv2|extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer_opt.nb)|v2.9|
......
...@@ -24,9 +24,9 @@ The results of detection and recognition are as follows: ...@@ -24,9 +24,9 @@ The results of detection and recognition are as follows:
![](../imgs_results/e2e_res_img293_pgnet.png) ![](../imgs_results/e2e_res_img293_pgnet.png)
![](../imgs_results/e2e_res_img295_pgnet.png) ![](../imgs_results/e2e_res_img295_pgnet.png)
### Performance ### Performance
####Test set: Total Text #### Test set: Total Text
####Test environment: NVIDIA Tesla V100-SXM2-16GB #### Test environment: NVIDIA Tesla V100-SXM2-16GB
|PGNetA|det_precision|det_recall|det_f_score|e2e_precision|e2e_recall|e2e_f_score|FPS|download| |PGNetA|det_precision|det_recall|det_f_score|e2e_precision|e2e_recall|e2e_f_score|FPS|download|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|Paper|85.30|86.80|86.1|-|-|61.7|38.20 (size=640)|-| |Paper|85.30|86.80|86.1|-|-|61.7|38.20 (size=640)|-|
...@@ -36,7 +36,7 @@ The results of detection and recognition are as follows: ...@@ -36,7 +36,7 @@ The results of detection and recognition are as follows:
<a name="Environment_Configuration"></a> <a name="Environment_Configuration"></a>
## 2. Environment Configuration ## 2. Environment Configuration
Please refer to [Quick Installation](./installation_en.md) Configure the PaddleOCR running environment. Please refer to [Operation Environment Preparation](./environment_en.md) to configure PaddleOCR operating environment first, refer to [PaddleOCR Overview and Project Clone](./paddleOCR_overview_en.md) to clone the project
<a name="Quick_Use"></a> <a name="Quick_Use"></a>
## 3. Quick Use ## 3. Quick Use
......
...@@ -53,10 +53,10 @@ If you do not use the provided test image, you can replace the following `--imag ...@@ -53,10 +53,10 @@ If you do not use the provided test image, you can replace the following `--imag
#### 2.1.1 Chinese and English Model #### 2.1.1 Chinese and English Model
* Detection, direction classification and recognition: set the direction classifier parameter`--use_angle_cls true` to recognize vertical text. * Detection, direction classification and recognition: set the parameter`--use_gpu false` to disable the gpu device
```bash ```bash
paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en --use_gpu false
``` ```
Output will be a list, each item contains bounding box, text and recognition confidence Output will be a list, each item contains bounding box, text and recognition confidence
......
...@@ -161,7 +161,7 @@ The current multi-language model is still in the demo stage and will continue to ...@@ -161,7 +161,7 @@ The current multi-language model is still in the demo stage and will continue to
If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) and we will thank you in the Repo. If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) and we will thank you in the Repo.
To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`. To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` .
- Custom dictionary - Custom dictionary
...@@ -172,8 +172,6 @@ If you need to customize dic file, please add character_dict_path field in confi ...@@ -172,8 +172,6 @@ If you need to customize dic file, please add character_dict_path field in confi
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`. If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
**Note: use_space_char only takes effect when character_type=ch**
<a name="TRAINING"></a> <a name="TRAINING"></a>
## 2.Training ## 2.Training
...@@ -250,7 +248,6 @@ Global: ...@@ -250,7 +248,6 @@ Global:
# Add a custom dictionary, such as modify the dictionary, please point the path to the new dictionary # Add a custom dictionary, such as modify the dictionary, please point the path to the new dictionary
character_dict_path: ppocr/utils/ppocr_keys_v1.txt character_dict_path: ppocr/utils/ppocr_keys_v1.txt
# Modify character type # Modify character type
character_type: ch
... ...
# Whether to recognize spaces # Whether to recognize spaces
use_space_char: True use_space_char: True
...@@ -312,18 +309,18 @@ Eval: ...@@ -312,18 +309,18 @@ Eval:
Currently, the multi-language algorithms supported by PaddleOCR are: Currently, the multi-language algorithms supported by PaddleOCR are:
| Configuration file | Algorithm name | backbone | trans | seq | pred | language | character_type | | Configuration file | Algorithm name | backbone | trans | seq | pred | language |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: | | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: |
| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional | chinese_cht| | rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional |
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English(Case sensitive) | EN | | rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English(Case sensitive) |
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French | french | | rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French |
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German | german | | rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German |
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | japan | | rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean | korean | | rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |
| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin | latin | | rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin |
| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic | ar | | rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic |
| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic | cyrillic | | rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic |
| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari | devanagari | | rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari |
For more supported languages, please refer to : [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations) For more supported languages, please refer to : [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations)
...@@ -471,6 +468,3 @@ inference/det_db/ ...@@ -471,6 +468,3 @@ inference/det_db/
``` ```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path" python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path"
``` ```
...@@ -25,11 +25,10 @@ The PaddleOCR model uses configuration files to manage network training and eval ...@@ -25,11 +25,10 @@ The PaddleOCR model uses configuration files to manage network training and eval
For the complete configuration file description, please refer to [Configuration File](./config_en.md) For the complete configuration file description, please refer to [Configuration File](./config_en.md)
<a name="1-basic-concepts"></a> <a name="1-basic-concepts"></a>
# 1. Basic concepts
## 2. Basic Concepts ## 2. Basic Concepts
The following parameters need to be paid attention to when tuning the model: In the process of model training, some hyperparameters need to be manually adjusted to help the model obtain the optimal index at the least loss. Different data volumes may require different hyper-parameters. When you want to finetune your own data or tune the model effect, there are several parameter adjustment strategies for reference:
<a name="11-learning-rate"></a> <a name="11-learning-rate"></a>
### 2.1 Learning Rate ### 2.1 Learning Rate
...@@ -53,7 +52,7 @@ and the learning rate is the same in each stage. ...@@ -53,7 +52,7 @@ and the learning rate is the same in each stage.
warmup_epoch means that in the first 5 epochs, the learning rate will gradually increase from 0 to base_lr. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py). warmup_epoch means that in the first 5 epochs, the learning rate will gradually increase from 0 to base_lr. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
<a name="12-regularization"></a> <a name="12-regularization"></a>
## 1.2 Regularization ### 2.2 Regularization
Regularization can effectively avoid algorithm overfitting. PaddleOCR provides L1 and L2 regularization methods. Regularization can effectively avoid algorithm overfitting. PaddleOCR provides L1 and L2 regularization methods.
L1 and L2 regularization are the most commonly used regularization methods. L1 and L2 regularization are the most commonly used regularization methods.
...@@ -125,7 +124,7 @@ There are several experiences for reference when constructing the data set: ...@@ -125,7 +124,7 @@ There are several experiences for reference when constructing the data set:
<a name="3-faq"></a> <a name="3-faq"></a>
# 3. FAQ ## 4. FAQ
**Q**: How to choose a suitable network input shape when training CRNN recognition? **Q**: How to choose a suitable network input shape when training CRNN recognition?
...@@ -147,3 +146,10 @@ There are several experiences for reference when constructing the data set: ...@@ -147,3 +146,10 @@ There are several experiences for reference when constructing the data set:
A: It is normal for the acc to be 0 at the beginning of the recognition model training, and the indicator will come up after a longer training period. A: It is normal for the acc to be 0 at the beginning of the recognition model training, and the indicator will come up after a longer training period.
***
Click the following links for detailed training tutorial:
- [text detection model training](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/detection.md)
- [text recognition model training](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/recognition.md)
- [text direction classification model training](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/angle_class.md)
...@@ -366,4 +366,6 @@ im_show.save('result.jpg') ...@@ -366,4 +366,6 @@ im_show.save('result.jpg')
| rec | Enable recognition when `ppocr.ocr` func exec | TRUE | | rec | Enable recognition when `ppocr.ocr` func exec | TRUE |
| cls | Enable classification when `ppocr.ocr` func exec((Use use_angle_cls in command line mode to control whether to start classification in the forward direction) | FALSE | | cls | Enable classification when `ppocr.ocr` func exec((Use use_angle_cls in command line mode to control whether to start classification in the forward direction) | FALSE |
| show_log | Whether to print log in det and rec | FALSE | | show_log | Whether to print log in det and rec | FALSE |
| type | Perform ocr or table structuring, the value is selected in ['ocr','structure'] | ocr | | type | Perform ocr or table structuring, the value is selected in ['ocr','structure'] | ocr |
\ No newline at end of file | ocr_version | OCR Model version number, the current model support list is as follows: PP-OCRv2 support Chinese detection and recognition model, PP-OCR support Chinese detection, recognition and direction classifier, multilingual recognition model | PP-OCRv2 |
| structure_version | table structure Model version number, the current model support list is as follows: STRUCTURE support english table structure model | STRUCTURE |
doc/joinus.PNG

194 KB | W: | H:

doc/joinus.PNG

193 KB | W: | H:

doc/joinus.PNG
doc/joinus.PNG
doc/joinus.PNG
doc/joinus.PNG
  • 2-up
  • Swipe
  • Onion skin
...@@ -16,6 +16,9 @@ import os ...@@ -16,6 +16,9 @@ import os
import sys import sys
__dir__ = os.path.dirname(__file__) __dir__ = os.path.dirname(__file__)
import paddle
sys.path.append(os.path.join(__dir__, '')) sys.path.append(os.path.join(__dir__, ''))
import cv2 import cv2
...@@ -29,7 +32,7 @@ from ppocr.utils.logging import get_logger ...@@ -29,7 +32,7 @@ from ppocr.utils.logging import get_logger
logger = get_logger() logger = get_logger()
from ppocr.utils.utility import check_and_read_gif, get_image_file_list from ppocr.utils.utility import check_and_read_gif, get_image_file_list
from ppocr.utils.network import maybe_download, download_with_progressbar, is_link, confirm_model_dir_url from ppocr.utils.network import maybe_download, download_with_progressbar, is_link, confirm_model_dir_url
from tools.infer.utility import draw_ocr, str2bool from tools.infer.utility import draw_ocr, str2bool, check_gpu
from ppstructure.utility import init_args, draw_structure_result from ppstructure.utility import init_args, draw_structure_result
from ppstructure.predict_system import OCRSystem, save_structure_res from ppstructure.predict_system import OCRSystem, save_structure_res
...@@ -39,130 +42,137 @@ __all__ = [ ...@@ -39,130 +42,137 @@ __all__ = [
] ]
SUPPORT_DET_MODEL = ['DB'] SUPPORT_DET_MODEL = ['DB']
VERSION = '2.2.1' VERSION = '2.3.0.2'
SUPPORT_REC_MODEL = ['CRNN'] SUPPORT_REC_MODEL = ['CRNN']
BASE_DIR = os.path.expanduser("~/.paddleocr/") BASE_DIR = os.path.expanduser("~/.paddleocr/")
DEFAULT_MODEL_VERSION = '2.0' DEFAULT_OCR_MODEL_VERSION = 'PP-OCR'
DEFAULT_STRUCTURE_MODEL_VERSION = 'STRUCTURE'
MODEL_URLS = { MODEL_URLS = {
'2.1': { 'OCR': {
'det': { 'PP-OCRv2': {
'ch': { 'det': {
'url': 'ch': {
'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar', 'url':
}, 'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar',
}, },
'rec': {
'ch': {
'url':
'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar',
'dict_path': './ppocr/utils/ppocr_keys_v1.txt'
}
}
},
'2.0': {
'det': {
'ch': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar',
},
'en': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_ppocr_mobile_v2.0_det_infer.tar',
}, },
'structure': { 'rec': {
'url': 'ch': {
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar' 'url':
'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar',
'dict_path': './ppocr/utils/ppocr_keys_v1.txt'
}
} }
}, },
'rec': { DEFAULT_OCR_MODEL_VERSION: {
'ch': { 'det': {
'url': 'ch': {
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar', 'url':
'dict_path': './ppocr/utils/ppocr_keys_v1.txt' 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar',
}, },
'en': { 'en': {
'url': 'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar', 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_ppocr_mobile_v2.0_det_infer.tar',
'dict_path': './ppocr/utils/en_dict.txt' },
}, 'structure': {
'french': { 'url':
'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar'
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar', }
'dict_path': './ppocr/utils/dict/french_dict.txt'
},
'german': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/german_dict.txt'
},
'korean': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/korean_dict.txt'
},
'japan': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/japan_dict.txt'
},
'chinese_cht': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/chinese_cht_dict.txt'
},
'ta': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/ta_dict.txt'
}, },
'te': { 'rec': {
'url': 'ch': {
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_infer.tar', 'url':
'dict_path': './ppocr/utils/dict/te_dict.txt' 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/ppocr_keys_v1.txt'
},
'en': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/en_dict.txt'
},
'french': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/french_dict.txt'
},
'german': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/german_dict.txt'
},
'korean': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/korean_dict.txt'
},
'japan': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/japan_dict.txt'
},
'chinese_cht': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/chinese_cht_dict.txt'
},
'ta': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/ta_dict.txt'
},
'te': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/te_dict.txt'
},
'ka': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/ka_dict.txt'
},
'latin': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/latin_dict.txt'
},
'arabic': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/arabic_dict.txt'
},
'cyrillic': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/cyrillic_dict.txt'
},
'devanagari': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/devanagari_dict.txt'
},
'structure': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar',
'dict_path': 'ppocr/utils/dict/table_dict.txt'
}
}, },
'ka': { 'cls': {
'url': 'ch': {
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_infer.tar', 'url':
'dict_path': './ppocr/utils/dict/ka_dict.txt' 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar',
}
}, },
'latin': { }
'url': },
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_infer.tar', 'STRUCTURE': {
'dict_path': './ppocr/utils/dict/latin_dict.txt' DEFAULT_STRUCTURE_MODEL_VERSION: {
}, 'table': {
'arabic': { 'en': {
'url': 'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_infer.tar', 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar',
'dict_path': './ppocr/utils/dict/arabic_dict.txt' 'dict_path': 'ppocr/utils/dict/table_structure_dict.txt'
}, }
'cyrillic': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/cyrillic_dict.txt'
},
'devanagari': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_infer.tar',
'dict_path': './ppocr/utils/dict/devanagari_dict.txt'
},
'structure': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar',
'dict_path': 'ppocr/utils/dict/table_dict.txt'
}
},
'cls': {
'ch': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar',
}
},
'table': {
'en': {
'url':
'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar',
'dict_path': 'ppocr/utils/dict/table_structure_dict.txt'
} }
} }
} }
...@@ -177,7 +187,20 @@ def parse_args(mMain=True): ...@@ -177,7 +187,20 @@ def parse_args(mMain=True):
parser.add_argument("--det", type=str2bool, default=True) parser.add_argument("--det", type=str2bool, default=True)
parser.add_argument("--rec", type=str2bool, default=True) parser.add_argument("--rec", type=str2bool, default=True)
parser.add_argument("--type", type=str, default='ocr') parser.add_argument("--type", type=str, default='ocr')
parser.add_argument("--version", type=str, default='2.1') parser.add_argument(
"--ocr_version",
type=str,
default='PP-OCRv2',
help='OCR Model version, the current model support list is as follows: '
'1. PP-OCRv2 Support Chinese detection and recognition model. '
'2. PP-OCR support Chinese detection, recognition and direction classifier and multilingual recognition model.'
)
parser.add_argument(
"--structure_version",
type=str,
default='STRUCTURE',
help='Model version, the current model support list is as follows:'
' 1. STRUCTURE Support en table structure model.')
for action in parser._actions: for action in parser._actions:
if action.dest in ['rec_char_dict_path', 'table_char_dict_path']: if action.dest in ['rec_char_dict_path', 'table_char_dict_path']:
...@@ -215,9 +238,9 @@ def parse_lang(lang): ...@@ -215,9 +238,9 @@ def parse_lang(lang):
lang = "cyrillic" lang = "cyrillic"
elif lang in devanagari_lang: elif lang in devanagari_lang:
lang = "devanagari" lang = "devanagari"
assert lang in MODEL_URLS[DEFAULT_MODEL_VERSION][ assert lang in MODEL_URLS['OCR'][DEFAULT_OCR_MODEL_VERSION][
'rec'], 'param lang must in {}, but got {}'.format( 'rec'], 'param lang must in {}, but got {}'.format(
MODEL_URLS[DEFAULT_MODEL_VERSION]['rec'].keys(), lang) MODEL_URLS['OCR'][DEFAULT_OCR_MODEL_VERSION]['rec'].keys(), lang)
if lang == "ch": if lang == "ch":
det_lang = "ch" det_lang = "ch"
elif lang == 'structure': elif lang == 'structure':
...@@ -227,33 +250,41 @@ def parse_lang(lang): ...@@ -227,33 +250,41 @@ def parse_lang(lang):
return lang, det_lang return lang, det_lang
def get_model_config(version, model_type, lang): def get_model_config(type, version, model_type, lang):
if version not in MODEL_URLS: if type == 'OCR':
logger.warning('version {} not in {}, use version {} instead'.format( DEFAULT_MODEL_VERSION = DEFAULT_OCR_MODEL_VERSION
version, MODEL_URLS.keys(), DEFAULT_MODEL_VERSION)) elif type == 'STRUCTURE':
DEFAULT_MODEL_VERSION = DEFAULT_STRUCTURE_MODEL_VERSION
else:
raise NotImplementedError
model_urls = MODEL_URLS[type]
if version not in model_urls:
logger.warning('version {} not in {}, auto switch to version {}'.format(
version, model_urls.keys(), DEFAULT_MODEL_VERSION))
version = DEFAULT_MODEL_VERSION version = DEFAULT_MODEL_VERSION
if model_type not in MODEL_URLS[version]: if model_type not in model_urls[version]:
if model_type in MODEL_URLS[DEFAULT_MODEL_VERSION]: if model_type in model_urls[DEFAULT_MODEL_VERSION]:
logger.warning( logger.warning(
'version {} not support {} models, use version {} instead'. 'version {} not support {} models, auto switch to version {}'.
format(version, model_type, DEFAULT_MODEL_VERSION)) format(version, model_type, DEFAULT_MODEL_VERSION))
version = DEFAULT_MODEL_VERSION version = DEFAULT_MODEL_VERSION
else: else:
logger.error('{} models is not support, we only support {}'.format( logger.error('{} models is not support, we only support {}'.format(
model_type, MODEL_URLS[DEFAULT_MODEL_VERSION].keys())) model_type, model_urls[DEFAULT_MODEL_VERSION].keys()))
sys.exit(-1) sys.exit(-1)
if lang not in MODEL_URLS[version][model_type]: if lang not in model_urls[version][model_type]:
if lang in MODEL_URLS[DEFAULT_MODEL_VERSION][model_type]: if lang in model_urls[DEFAULT_MODEL_VERSION][model_type]:
logger.warning('lang {} is not support in {}, use {} instead'. logger.warning(
format(lang, version, DEFAULT_MODEL_VERSION)) 'lang {} is not support in {}, auto switch to version {}'.
format(lang, version, DEFAULT_MODEL_VERSION))
version = DEFAULT_MODEL_VERSION version = DEFAULT_MODEL_VERSION
else: else:
logger.error( logger.error(
'lang {} is not support, we only support {} for {} models'. 'lang {} is not support, we only support {} for {} models'.
format(lang, MODEL_URLS[DEFAULT_MODEL_VERSION][model_type].keys( format(lang, model_urls[DEFAULT_MODEL_VERSION][model_type].keys(
), model_type)) ), model_type))
sys.exit(-1) sys.exit(-1)
return MODEL_URLS[version][model_type][lang] return model_urls[version][model_type][lang]
class PaddleOCR(predict_system.TextSystem): class PaddleOCR(predict_system.TextSystem):
...@@ -265,23 +296,28 @@ class PaddleOCR(predict_system.TextSystem): ...@@ -265,23 +296,28 @@ class PaddleOCR(predict_system.TextSystem):
""" """
params = parse_args(mMain=False) params = parse_args(mMain=False)
params.__dict__.update(**kwargs) params.__dict__.update(**kwargs)
params.use_gpu = check_gpu(params.use_gpu)
if not params.show_log: if not params.show_log:
logger.setLevel(logging.INFO) logger.setLevel(logging.INFO)
self.use_angle_cls = params.use_angle_cls self.use_angle_cls = params.use_angle_cls
lang, det_lang = parse_lang(params.lang) lang, det_lang = parse_lang(params.lang)
# init model dir # init model dir
det_model_config = get_model_config(params.version, 'det', det_lang) det_model_config = get_model_config('OCR', params.ocr_version, 'det',
det_lang)
params.det_model_dir, det_url = confirm_model_dir_url( params.det_model_dir, det_url = confirm_model_dir_url(
params.det_model_dir, params.det_model_dir,
os.path.join(BASE_DIR, VERSION, 'ocr', 'det', det_lang), os.path.join(BASE_DIR, VERSION, 'ocr', 'det', det_lang),
det_model_config['url']) det_model_config['url'])
rec_model_config = get_model_config(params.version, 'rec', lang) rec_model_config = get_model_config('OCR', params.ocr_version, 'rec',
lang)
params.rec_model_dir, rec_url = confirm_model_dir_url( params.rec_model_dir, rec_url = confirm_model_dir_url(
params.rec_model_dir, params.rec_model_dir,
os.path.join(BASE_DIR, VERSION, 'ocr', 'rec', lang), os.path.join(BASE_DIR, VERSION, 'ocr', 'rec', lang),
rec_model_config['url']) rec_model_config['url'])
cls_model_config = get_model_config(params.version, 'cls', 'ch') cls_model_config = get_model_config('OCR', params.ocr_version, 'cls',
'ch')
params.cls_model_dir, cls_url = confirm_model_dir_url( params.cls_model_dir, cls_url = confirm_model_dir_url(
params.cls_model_dir, params.cls_model_dir,
os.path.join(BASE_DIR, VERSION, 'ocr', 'cls'), os.path.join(BASE_DIR, VERSION, 'ocr', 'cls'),
...@@ -362,22 +398,27 @@ class PPStructure(OCRSystem): ...@@ -362,22 +398,27 @@ class PPStructure(OCRSystem):
def __init__(self, **kwargs): def __init__(self, **kwargs):
params = parse_args(mMain=False) params = parse_args(mMain=False)
params.__dict__.update(**kwargs) params.__dict__.update(**kwargs)
params.use_gpu = check_gpu(params.use_gpu)
if not params.show_log: if not params.show_log:
logger.setLevel(logging.INFO) logger.setLevel(logging.INFO)
lang, det_lang = parse_lang(params.lang) lang, det_lang = parse_lang(params.lang)
# init model dir # init model dir
det_model_config = get_model_config(params.version, 'det', det_lang) det_model_config = get_model_config('OCR', params.ocr_version, 'det',
det_lang)
params.det_model_dir, det_url = confirm_model_dir_url( params.det_model_dir, det_url = confirm_model_dir_url(
params.det_model_dir, params.det_model_dir,
os.path.join(BASE_DIR, VERSION, 'ocr', 'det', det_lang), os.path.join(BASE_DIR, VERSION, 'ocr', 'det', det_lang),
det_model_config['url']) det_model_config['url'])
rec_model_config = get_model_config(params.version, 'rec', lang) rec_model_config = get_model_config('OCR', params.ocr_version, 'rec',
lang)
params.rec_model_dir, rec_url = confirm_model_dir_url( params.rec_model_dir, rec_url = confirm_model_dir_url(
params.rec_model_dir, params.rec_model_dir,
os.path.join(BASE_DIR, VERSION, 'ocr', 'rec', lang), os.path.join(BASE_DIR, VERSION, 'ocr', 'rec', lang),
rec_model_config['url']) rec_model_config['url'])
table_model_config = get_model_config(params.version, 'table', 'en') table_model_config = get_model_config(
'STRUCTURE', params.structure_version, 'table', 'en')
params.table_model_dir, table_url = confirm_model_dir_url( params.table_model_dir, table_url = confirm_model_dir_url(
params.table_model_dir, params.table_model_dir,
os.path.join(BASE_DIR, VERSION, 'ocr', 'table'), os.path.join(BASE_DIR, VERSION, 'ocr', 'table'),
......
...@@ -11,7 +11,10 @@ ...@@ -11,7 +11,10 @@
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. #WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and #See the License for the specific language governing permissions and
#limitations under the License. #limitations under the License.
"""
This code is refered from:
https://github.com/songdejia/EAST/blob/master/data_utils.py
"""
import math import math
import cv2 import cv2
import numpy as np import numpy as np
...@@ -24,10 +27,10 @@ __all__ = ['EASTProcessTrain'] ...@@ -24,10 +27,10 @@ __all__ = ['EASTProcessTrain']
class EASTProcessTrain(object): class EASTProcessTrain(object):
def __init__(self, def __init__(self,
image_shape = [512, 512], image_shape=[512, 512],
background_ratio = 0.125, background_ratio=0.125,
min_crop_side_ratio = 0.1, min_crop_side_ratio=0.1,
min_text_size = 10, min_text_size=10,
**kwargs): **kwargs):
self.input_size = image_shape[1] self.input_size = image_shape[1]
self.random_scale = np.array([0.5, 1, 2.0, 3.0]) self.random_scale = np.array([0.5, 1, 2.0, 3.0])
...@@ -282,12 +285,7 @@ class EASTProcessTrain(object): ...@@ -282,12 +285,7 @@ class EASTProcessTrain(object):
1.0 / max(min(poly_h, poly_w), 1.0) 1.0 / max(min(poly_h, poly_w), 1.0)
return score_map, geo_map, training_mask return score_map, geo_map, training_mask
def crop_area(self, def crop_area(self, im, polys, tags, crop_background=False, max_tries=50):
im,
polys,
tags,
crop_background=False,
max_tries=50):
""" """
make random crop from the input image make random crop from the input image
:param im: :param im:
...@@ -435,5 +433,4 @@ class EASTProcessTrain(object): ...@@ -435,5 +433,4 @@ class EASTProcessTrain(object):
data['score_map'] = score_map data['score_map'] = score_map
data['geo_map'] = geo_map data['geo_map'] = geo_map
data['training_mask'] = training_mask data['training_mask'] = training_mask
# print(im.shape, score_map.shape, geo_map.shape, training_mask.shape) return data
return data
\ No newline at end of file
...@@ -11,6 +11,10 @@ ...@@ -11,6 +11,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
"""
This code is refer from:
https://github.com/WenmuZhou/DBNet.pytorch/blob/master/data_loader/modules/iaa_augment.py
"""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
......
...@@ -22,6 +22,8 @@ import string ...@@ -22,6 +22,8 @@ import string
from shapely.geometry import LineString, Point, Polygon from shapely.geometry import LineString, Point, Polygon
import json import json
from ppocr.utils.logging import get_logger
class ClsLabelEncode(object): class ClsLabelEncode(object):
def __init__(self, label_list, **kwargs): def __init__(self, label_list, **kwargs):
...@@ -93,31 +95,23 @@ class BaseRecLabelEncode(object): ...@@ -93,31 +95,23 @@ class BaseRecLabelEncode(object):
def __init__(self, def __init__(self,
max_text_length, max_text_length,
character_dict_path=None, character_dict_path=None,
character_type='ch',
use_space_char=False): use_space_char=False):
support_character_type = [
'ch', 'en', 'EN_symbol', 'french', 'german', 'japan', 'korean',
'EN', 'it', 'xi', 'pu', 'ru', 'ar', 'ta', 'ug', 'fa', 'ur', 'rs',
'oc', 'rsc', 'bg', 'uk', 'be', 'te', 'ka', 'chinese_cht', 'hi',
'mr', 'ne', 'latin', 'arabic', 'cyrillic', 'devanagari'
]
assert character_type in support_character_type, "Only {} are supported now but get {}".format(
support_character_type, character_type)
self.max_text_len = max_text_length self.max_text_len = max_text_length
self.beg_str = "sos" self.beg_str = "sos"
self.end_str = "eos" self.end_str = "eos"
if character_type == "en": self.lower = False
if character_dict_path is None:
logger = get_logger()
logger.warning(
"The character_dict_path is None, model can only recognize number and lower letters"
)
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str) dict_character = list(self.character_str)
elif character_type == "EN_symbol": self.lower = True
# same with ASTER setting (use 94 char). else:
self.character_str = string.printable[:-6]
dict_character = list(self.character_str)
elif character_type in support_character_type:
self.character_str = "" self.character_str = ""
assert character_dict_path is not None, "character_dict_path should not be None when character_type is {}".format(
character_type)
with open(character_dict_path, "rb") as fin: with open(character_dict_path, "rb") as fin:
lines = fin.readlines() lines = fin.readlines()
for line in lines: for line in lines:
...@@ -126,7 +120,6 @@ class BaseRecLabelEncode(object): ...@@ -126,7 +120,6 @@ class BaseRecLabelEncode(object):
if use_space_char: if use_space_char:
self.character_str += " " self.character_str += " "
dict_character = list(self.character_str) dict_character = list(self.character_str)
self.character_type = character_type
dict_character = self.add_special_char(dict_character) dict_character = self.add_special_char(dict_character)
self.dict = {} self.dict = {}
for i, char in enumerate(dict_character): for i, char in enumerate(dict_character):
...@@ -148,7 +141,7 @@ class BaseRecLabelEncode(object): ...@@ -148,7 +141,7 @@ class BaseRecLabelEncode(object):
""" """
if len(text) == 0 or len(text) > self.max_text_len: if len(text) == 0 or len(text) > self.max_text_len:
return None return None
if self.character_type == "en": if self.lower:
text = text.lower() text = text.lower()
text_list = [] text_list = []
for char in text: for char in text:
...@@ -168,13 +161,11 @@ class NRTRLabelEncode(BaseRecLabelEncode): ...@@ -168,13 +161,11 @@ class NRTRLabelEncode(BaseRecLabelEncode):
def __init__(self, def __init__(self,
max_text_length, max_text_length,
character_dict_path=None, character_dict_path=None,
character_type='EN_symbol',
use_space_char=False, use_space_char=False,
**kwargs): **kwargs):
super(NRTRLabelEncode, super(NRTRLabelEncode, self).__init__(
self).__init__(max_text_length, character_dict_path, max_text_length, character_dict_path, use_space_char)
character_type, use_space_char)
def __call__(self, data): def __call__(self, data):
text = data['label'] text = data['label']
...@@ -201,12 +192,10 @@ class CTCLabelEncode(BaseRecLabelEncode): ...@@ -201,12 +192,10 @@ class CTCLabelEncode(BaseRecLabelEncode):
def __init__(self, def __init__(self,
max_text_length, max_text_length,
character_dict_path=None, character_dict_path=None,
character_type='ch',
use_space_char=False, use_space_char=False,
**kwargs): **kwargs):
super(CTCLabelEncode, super(CTCLabelEncode, self).__init__(
self).__init__(max_text_length, character_dict_path, max_text_length, character_dict_path, use_space_char)
character_type, use_space_char)
def __call__(self, data): def __call__(self, data):
text = data['label'] text = data['label']
...@@ -232,12 +221,10 @@ class E2ELabelEncodeTest(BaseRecLabelEncode): ...@@ -232,12 +221,10 @@ class E2ELabelEncodeTest(BaseRecLabelEncode):
def __init__(self, def __init__(self,
max_text_length, max_text_length,
character_dict_path=None, character_dict_path=None,
character_type='EN',
use_space_char=False, use_space_char=False,
**kwargs): **kwargs):
super(E2ELabelEncodeTest, super(E2ELabelEncodeTest, self).__init__(
self).__init__(max_text_length, character_dict_path, max_text_length, character_dict_path, use_space_char)
character_type, use_space_char)
def __call__(self, data): def __call__(self, data):
import json import json
...@@ -468,12 +455,10 @@ class AttnLabelEncode(BaseRecLabelEncode): ...@@ -468,12 +455,10 @@ class AttnLabelEncode(BaseRecLabelEncode):
def __init__(self, def __init__(self,
max_text_length, max_text_length,
character_dict_path=None, character_dict_path=None,
character_type='ch',
use_space_char=False, use_space_char=False,
**kwargs): **kwargs):
super(AttnLabelEncode, super(AttnLabelEncode, self).__init__(
self).__init__(max_text_length, character_dict_path, max_text_length, character_dict_path, use_space_char)
character_type, use_space_char)
def add_special_char(self, dict_character): def add_special_char(self, dict_character):
self.beg_str = "sos" self.beg_str = "sos"
...@@ -516,12 +501,10 @@ class SEEDLabelEncode(BaseRecLabelEncode): ...@@ -516,12 +501,10 @@ class SEEDLabelEncode(BaseRecLabelEncode):
def __init__(self, def __init__(self,
max_text_length, max_text_length,
character_dict_path=None, character_dict_path=None,
character_type='ch',
use_space_char=False, use_space_char=False,
**kwargs): **kwargs):
super(SEEDLabelEncode, super(SEEDLabelEncode, self).__init__(
self).__init__(max_text_length, character_dict_path, max_text_length, character_dict_path, use_space_char)
character_type, use_space_char)
def add_special_char(self, dict_character): def add_special_char(self, dict_character):
self.end_str = "eos" self.end_str = "eos"
...@@ -548,12 +531,10 @@ class SRNLabelEncode(BaseRecLabelEncode): ...@@ -548,12 +531,10 @@ class SRNLabelEncode(BaseRecLabelEncode):
def __init__(self, def __init__(self,
max_text_length=25, max_text_length=25,
character_dict_path=None, character_dict_path=None,
character_type='en',
use_space_char=False, use_space_char=False,
**kwargs): **kwargs):
super(SRNLabelEncode, super(SRNLabelEncode, self).__init__(
self).__init__(max_text_length, character_dict_path, max_text_length, character_dict_path, use_space_char)
character_type, use_space_char)
def add_special_char(self, dict_character): def add_special_char(self, dict_character):
dict_character = dict_character + [self.beg_str, self.end_str] dict_character = dict_character + [self.beg_str, self.end_str]
...@@ -761,12 +742,10 @@ class SARLabelEncode(BaseRecLabelEncode): ...@@ -761,12 +742,10 @@ class SARLabelEncode(BaseRecLabelEncode):
def __init__(self, def __init__(self,
max_text_length, max_text_length,
character_dict_path=None, character_dict_path=None,
character_type='ch',
use_space_char=False, use_space_char=False,
**kwargs): **kwargs):
super(SARLabelEncode, super(SARLabelEncode, self).__init__(
self).__init__(max_text_length, character_dict_path, max_text_length, character_dict_path, use_space_char)
character_type, use_space_char)
def add_special_char(self, dict_character): def add_special_char(self, dict_character):
beg_end_str = "<BOS/EOS>" beg_end_str = "<BOS/EOS>"
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment