Commit 2ee12253 authored by LDOUBLEV's avatar LDOUBLEV

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleOCR into fixocr

parents bee2b15e 52ae30b0
# Inference based on the Python prediction engine
The inference model (the model saved by `fluid.io.save_inference_model`) is generally a frozen model saved after training is completed, and is mostly used for prediction in deployment.
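For background, the snippet below shows how such a frozen model is produced with `fluid.io.save_inference_model`. This is a minimal sketch with a toy network; the network, variable names, shapes, and paths are illustrative, not PaddleOCR's actual graph:
```python
import paddle.fluid as fluid

# Build a toy network standing in for a trained OCR model.
image = fluid.layers.data(name="image", shape=[3, 32, 320], dtype="float32")
prediction = fluid.layers.fc(input=image, size=10)

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# Freeze the graph for deployment; model_filename/params_filename match
# the `model` / `params` file layout used by PaddleOCR inference models.
fluid.io.save_inference_model(
    dirname="./inference/demo/",
    feeded_var_names=["image"],
    target_vars=[prediction],
    executor=exe,
    model_filename="model",
    params_filename="params")
```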
......@@ -18,7 +18,13 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar &
```
The above model is a DB algorithm trained with MobileNetV3 as the backbone. To convert the trained model into an inference model, just run the following command:
```
# -c Set the yml configuration file of the training algorithm
# -o Set optional parameters
# Global.checkpoints Set the path of the trained model to be converted, without the file suffix .pdmodel, .pdopt or .pdparams
# Global.save_inference_dir Set the directory where the converted inference model is saved
python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy \
Global.save_inference_dir=./inference/det_db/
```
When converting to an inference model, the configuration file used is the same as the configuration file used during training. In addition, you also need to set the `Global.checkpoints` and `Global.save_inference_dir` parameters in the configuration file.
`Global.checkpoints` points to the model parameter file saved during training, and `Global.save_inference_dir` is the directory where the generated inference model is saved.
......@@ -38,6 +44,11 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar
The recognition model is converted to the inference model in the same way as the detection, as follows:
```
# -c Set the yml configuration file of the training algorithm
# -o Set optional parameters
# Global.checkpoints Set the path of the trained model to be converted, without the file suffix .pdmodel, .pdopt or .pdparams
# Global.save_inference_dir Set the directory where the converted inference model is saved
python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints=./ch_lite/rec_mv3_crnn/best_accuracy \
Global.save_inference_dir=./inference/rec_crnn/
```
......@@ -53,7 +64,8 @@ After the conversion is successful, there are two files in the directory:
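The two files are the model structure file and the model parameter file, e.g. (layout matching the inference models shown later in this document):
```
inference/det_db/
  └─  model    # inference model structure file
  └─  params   # inference model parameter file
```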
## TEXT DETECTION MODEL INFERENCE
The following will introduce the lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference settings of the DB text detection model.
Because the EAST and DB algorithms differ greatly, at inference time it is necessary to **adapt to the EAST text detection algorithm by passing in the corresponding parameters**.
### 1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE
......
......@@ -9,6 +9,8 @@ PaddleOCR working environment:
It is recommended to use the Docker image provided by us to run PaddleOCR. For Docker usage, please refer to this [link](https://docs.docker.com/get-started/).
*If you want to directly run the prediction code on Mac or Windows, you can start from step 2.*
1. (Recommended) Prepare a docker environment. The first time you use this image, it will be downloaded automatically. Please be patient.
```
# Switch to the working directory
......@@ -56,6 +58,9 @@ python3 -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsing
# If you have CUDA 10 installed on your machine, please run the following command to install
python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
# If you only have a CPU on your machine, please run the following command to install
python3 -m pip install paddlepaddle==1.7.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
For more software version requirements, please refer to the instructions in the [installation document](https://www.paddlepaddle.org.cn/install/quick).
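To quickly verify the installation, you can print the installed version (a simple sanity check):
```
python3 -c "import paddle; print(paddle.__version__)"
```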
......
# Quick start of Chinese OCR model
## 1. Prepare for the environment
Please refer to [quick installation](./installation_en.md) to configure the PaddleOCR operating environment.
## 2. Inference models
| Name | Introduction | Detection model | Recognition model | Recognition model with space support |
|-|-|-|-|-|
|chinese_db_crnn_mobile| Ultra-lightweight Chinese OCR model |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|chinese_db_crnn_server| Universal Chinese OCR model |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
* If wget is not installed in your Windows environment, you can paste the link into a browser to download the model, then uncompress it and place it in the corresponding directory.
Copy the download address of the `inference model` for detection and recognition in the table above, and uncompress them.
```
mkdir inference && cd inference
# Download the detection model and unzip
wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package}
# Download the recognition model and unzip
wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package}
cd ..
```
Take the ultra-lightweight model as an example:
```
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
cd ..
```
After decompression, the file structure should be as follows:
```
|-inference
|-ch_rec_mv3_crnn
|- model
|- params
|-ch_det_mv3_db
|- model
|- params
...
```
## 3. Single image or image set prediction
* The following code implements the text detection and recognition pipeline. When performing prediction, you need to specify the path of a single image or an image set through the parameter `image_dir`; the parameter `det_model_dir` specifies the path of the detection inference model, and the parameter `rec_model_dir` specifies the path of the recognition inference model. The visualized results are saved to the `./inference_results` folder by default.
```bash
# Predict a single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# Predict imageset specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# If you want to use the CPU for prediction, you need to set the use_gpu parameter to False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
```
- Universal Chinese OCR model
Please follow the above steps to download the corresponding models and update the relevant parameters. An example is as follows.
```
# Predict a single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
```
- Universal Chinese OCR model with space support
Please follow the above steps to download the corresponding models and update the relevant parameters. An example is as follows.
* Note: Please update the source code to the latest version and add the parameter `--use_space_char=True`
```
# Predict a single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/" --use_space_char=True
```
For more cascaded text detection and recognition inference, please refer to the tutorial: [Inference with the Python inference engine](./inference_en.md)
In addition, the tutorial also provides other deployment methods for the Chinese OCR model:
- [Server-side C++ inference](../../deploy/cpp_infer/readme_en.md)
- [Service deployment](./serving_en.md)
- [End-to-end deployment](../../deploy/lite/readme_en.md)
......@@ -96,6 +96,16 @@ You can use them if needed.
- Custom dictionary
If you need to customize the dictionary file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` to point to your dictionary path, and set `character_type` to `ch`.
- Add space category
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `true`.
**Note: use_space_char only takes effect when character_type=ch**
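Putting these recognition options together, the relevant `Global` fields in the yml file look like the following sketch (the dictionary path shown is the default; replace it with your own custom dictionary path if needed):
```
Global:
  ...
  character_type: ch
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt  # or your custom dictionary
  use_space_char: true  # only takes effect when character_type=ch
```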
### TRAINING
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
......@@ -122,6 +132,17 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml
```
- Data Augmentation
PaddleOCR provides a variety of data augmentation methods. If you want to add perturbations during training, please set `distort: true` in the configuration file.
The default perturbation methods are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, and color reverse.
Each perturbation method is applied with a 50% probability during training, as sketched below. For the specific implementation, please refer to [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
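The sketch below mimics that selection rule: each perturbation is applied independently with 50% probability. This is a simplified illustration with two of the listed methods; the real implementations live in img_tools.py:
```python
import random

import cv2
import numpy as np


def gauss_noise(img, sigma=10.0):
    # Add Gaussian noise, one of the default perturbations listed above.
    noise = np.random.normal(0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)


def blur(img):
    # Slight Gaussian blur.
    return cv2.GaussianBlur(img, (5, 5), 1)


def distort(img, ops=(gauss_noise, blur)):
    # Apply each perturbation independently with 50% probability,
    # mirroring the selection rule described above.
    for op in ops:
        if random.random() < 0.5:
            img = op(img)
    return img


demo = distort(np.random.randint(0, 255, (32, 320, 3), dtype=np.uint8))
```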
- Training
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, the model is evaluated every 500 iterations, and the model with the best accuracy is saved to `output/rec_CRNN/best_accuracy` during evaluation.
If the evaluation set is large, evaluation will be time-consuming. It is recommended to reduce the number of evaluations, or to evaluate only after training.
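For example, to evaluate every 2000 iterations instead (a sketch of the relevant field; placement under `Global` follows the default config):
```
Global:
  ...
  eval_batch_step: 2000
```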
......
# REFERENCE
```
1. EAST:
@inproceedings{zhou2017east,
title={EAST: an efficient and accurate scene text detector},
author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={5551--5560},
year={2017}
}
2. DB:
@article{liao2019real,
title={Real-time Scene Text Detection with Differentiable Binarization},
author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
journal={arXiv preprint arXiv:1911.08947},
year={2019}
}
3. DTRB:
@inproceedings{baek2019wrong,
title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={4715--4723},
year={2019}
}
4. SAST:
@inproceedings{wang2019single,
title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
pages={1277--1285},
year={2019}
}
5. SRN:
@article{yu2020towards,
title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks},
author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui},
journal={arXiv preprint arXiv:2003.12294},
year={2020}
}
6. end2end-psl:
@inproceedings{sun2019chinese,
title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning},
author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9086--9095},
year={2019}
}
```
# Service deployment
PaddleOCR provides two service deployment methods:
- Based on **HubServing**: already integrated into PaddleOCR ([code](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/hubserving)). Please follow this tutorial.
- Based on **PaddleServing**: see the PaddleServing official website for details ([demo](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr)). It will also be integrated into PaddleOCR later.
The service deployment directory includes three service packages: detection, recognition, and two-stage series connection. Select the corresponding service package to install and start the service according to your needs. The directory is as follows:
```
deploy/hubserving/
└─ ocr_det detection module service package
└─ ocr_rec recognition module service package
└─ ocr_system two-stage series connection service package
```
Each service package contains the following files. Taking the 2-stage series connection service package as an example, the directory is as follows:
```
deploy/hubserving/ocr_system/
└─ __init__.py Empty file, required
└─ config.json Configuration file, optional, passed in as a parameter when using configuration to start the service
└─ module.py Main module file, required, contains the complete logic of the service
└─ params.py Parameter file, required, including parameters such as model path, pre- and post-processing parameters
```
## Quick start service
The following steps take the 2-stage series service as an example. If only the detection service or recognition service is needed, replace the corresponding file path.
### 1. Prepare the environment
```shell
# Install paddlehub
pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
# Set environment variables
export PYTHONPATH=.
```
### 2. Install Service Module
PaddleOCR provides 3 kinds of service modules. Install the required modules according to your needs. For example:
Install the detection service module:
```shell
hub install deploy/hubserving/ocr_det/
```
Or, install the recognition service module:
```shell
hub install deploy/hubserving/ocr_rec/
```
Or, install the 2-stage series service module:
```shell
hub install deploy/hubserving/ocr_system/
```
### 3. Start service
#### Way 1. Start with command line parameters (CPU only)
**start command:**
```shell
$ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
                    --port XXXX \
                    --use_multiprocess \
                    --workers XXXX
```
**parameters:**
|parameters|usage|
|-|-|
|--modules/-m|PaddleHub Serving pre-installed model, listed in the form of multiple Module==Version key-value pairs<br>*`When Version is not specified, the latest version is selected by default`*|
|--port/-p|Service port, default is 8866|
|--use_multiprocess|Enable concurrent mode, the default is single-process mode, this mode is recommended for multi-core CPU machines<br>*`Windows operating system only supports single-process mode`*|
|--workers|The number of concurrent tasks specified in concurrent mode, the default is `2*cpu_count-1`, where `cpu_count` is the number of CPU cores|
For example, start the 2-stage series service:
```shell
hub serving start -m ocr_system
```
This completes the deployment of a service API, using the default port number 8866.
#### Way 2. Start with a configuration file (CPU, GPU)
**start command:**
```shell
hub serving start --config/-c config.json
```
The format of `config.json` is as follows:
```json
{
"modules_info": {
"ocr_system": {
"init_args": {
"version": "1.0.0",
"use_gpu": true
},
"predict_args": {
}
}
},
"port": 8868,
"use_multiprocess": false,
"workers": 2
}
```
- The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. In particular, **when `use_gpu` is `true`, the service is started using the GPU**.
- The configurable parameters in `predict_args` are consistent with the `predict` function interface in `module.py`.
**Note:**
- When the service is started with the configuration file, other command-line parameters are ignored.
- If you use GPU prediction (that is, `use_gpu` is set to `true`), you need to set the environment variable `CUDA_VISIBLE_DEVICES` before starting the service, e.g. `export CUDA_VISIBLE_DEVICES=0`; otherwise it does not need to be set.
- **`use_gpu` and `use_multiprocess` cannot be `true` at the same time.**
For example, use GPU card No. 3 to start the 2-stage series service:
```shell
export CUDA_VISIBLE_DEVICES=3
hub serving start -c deploy/hubserving/ocr_system/config.json
```
## Send prediction requests
After the service starts, you can use the following command to send a prediction request to obtain the prediction result:
```shell
python tools/test_hubserving.py server_url image_path
```
Two parameters need to be passed to the script:
- **server_url**: the service address, in the format
`http://[ip_address]:[port]/predict/[module_name]`
For example, if the detection, recognition and 2-stage series services are started with the provided configuration files, the respective `server_url` would be:
`http://127.0.0.1:8866/predict/ocr_det`
`http://127.0.0.1:8867/predict/ocr_rec`
`http://127.0.0.1:8868/predict/ocr_system`
- **image_path**: the test image path, which can be a single image path or an image directory path
**E.g.**
```shell
python tools/test_hubserving.py http://127.0.0.1:8868/predict/ocr_system ./doc/imgs/
```
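You can also send the request directly from Python. The sketch below follows the request format used by `tools/test_hubserving.py` (a base64-encoded image list posted as JSON); the URL and image path are examples:
```python
import base64
import json

import requests

url = "http://127.0.0.1:8868/predict/ocr_system"  # 2-stage series service

# Encode the test image as base64, as the service expects.
with open("./doc/imgs/11.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf8")

headers = {"Content-type": "application/json"}
data = {"images": [img_b64]}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json()["results"][0])
```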
## Returned result format
The returned result is a list. Each item in the list is a dict. The dict may contain three fields. The information is as follows:
|field name|data type|description|
|-|-|-|
|text|str|text content|
|confidence|float|text recognition confidence|
|text_region|list|text location coordinates|
The fields returned by different modules are different. For example, the results returned by the text recognition service module do not contain `text_region`. The details are as follows:
|field name/module name|ocr_det|ocr_rec|ocr_system|
|-|-|-|-|
|text||✔|✔|
|confidence||✔|✔|
|text_region|✔||✔|
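For example, a response from `ocr_system` might look like the following (values are illustrative):
```
[
    {
        "text": "PaddleOCR",
        "confidence": 0.9853,
        "text_region": [[10, 6], [220, 6], [220, 48], [10, 48]]
    }
]
```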
**Note:** If you need to add, delete or modify the returned fields, you can modify the file `module.py` of the corresponding module. For the complete process, refer to the user-defined modification service module in the next section.
## User-defined service module modification
If you need to modify the service logic, the following steps are generally required (taking the modification of `ocr_system` as an example):
- 1. Stop service
```shell
hub serving stop --port/-p XXXX
```
- 2. Modify the code in the corresponding files, like `module.py` and `params.py`, according to the actual needs.
For example, if you need to replace the model used by the deployed service, you need to modify model path parameters `det_model_dir` and `rec_model_dir` in `params.py`. Of course, other related parameters may need to be modified at the same time. Please modify and debug according to the actual situation. It is suggested to run `module.py` directly for debugging after modification before starting the service test.
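A minimal sketch of such a model-path change in `params.py` is shown below; only `det_model_dir` and `rec_model_dir` come from the text above, while the enclosing `read_params`/`Config` structure is an assumption for illustration:
```python
class Config(object):
    # Hypothetical container; the real params.py defines many more fields.
    pass


def read_params():
    cfg = Config()
    # Point the service to the newly deployed models.
    cfg.det_model_dir = "./inference/ch_det_mv3_db/"
    cfg.rec_model_dir = "./inference/ch_rec_mv3_crnn/"
    return cfg
```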
- 3. Uninstall old service module
```shell
hub uninstall ocr_system
```
- 4. Install modified service module
```shell
hub install deploy/hubserving/ocr_system/
```
- 5. Restart service
```shell
hub serving start -m ocr_system
```
## Tricks
Here we have sorted out some Chinese OCR training and prediction tricks, which are being updated continuously. You are welcome to contribute more OCR tricks ~
- [Replace Backbone Network](#ReplaceBackboneNetwork)
- [Long Chinese Text Recognition](#LongChineseTextRecognition)
- [Space Recognition](#SpaceRecognition)
<a name="ReplaceBackboneNetwork"></a>
#### 1. Replace Backbone Network
- **Problem Description**
At present, the ResNet_vd series and the MobileNetV3 series are the backbone networks used in PaddleOCR. Will replacing them with other backbone networks help improve accuracy? What should be paid attention to when making the replacement?
- **Tips**
- Whether for text detection or text recognition, the choice of backbone network is a trade-off between prediction accuracy and prediction efficiency. Generally, if a larger backbone network is selected, e.g. ResNet101_vd, detection or recognition will be more accurate, but the time cost increases accordingly. If a smaller backbone network is selected, e.g. MobileNetV3_small_x0_35, prediction is faster, but detection or recognition accuracy drops. Fortunately, the detection or recognition performance of different backbone networks is positively correlated with their performance on the ImageNet-1000 classification task. [**PaddleClas**](https://github.com/PaddlePaddle/PaddleClas/blob/master/README_en.md) has sorted out 23 series of classification network structures, such as ResNet_vd, Res2Net, HRNet, MobileNetV3 and GhostNet. It provides the top-1 classification accuracy, the time cost on GPU (V100 and T4) and CPU (SD 855), and the [**download addresses**](https://paddleclas-en.readthedocs.io/en/latest/models/models_intro_en.html) of 117 pretrained models.
- Similar to the four stages of ResNet, replacing the text detection backbone network means identifying those four stages so as to facilitate the integration of FPN-like object detection heads. In addition, for the text detection problem, a backbone pretrained on ImageNet-1000 can accelerate convergence and improve accuracy.
- In order to replace the backbone network for text recognition, pay attention to where the network strides reduce the width and height. Since the width-to-height ratio is large in Chinese text recognition, the height is reduced less frequently and the width more frequently. You can refer to the [modifications of MobileNetV3](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/modeling/backbones/rec_mobilenet_v3.py) in PaddleOCR.
<a name="LongChineseTextRecognition"></a>
#### 2. Long Chinese Text Recognition
- **Problem Description**
The maximum resolution of the Chinese recognition model during training is [3, 32, 320]. If the text image to be recognized is too long, as shown in the figure below, how should it be handled?
<div align="center">
<img src="../tricks/long_text_examples.jpg" width="600">
</div>
- **Tips**
During training, the training samples are not directly resized to [3, 32, 320]. First, the height is resized to 32 while keeping the aspect ratio; when the resulting width is less than 320, the remainder is padded with zeros. Samples whose width-to-height ratio is larger than 10 are ignored. When predicting a single image, the same procedure is applied, but the maximum aspect ratio is not limited. When predicting a batch of images, the procedure follows training, except that the target width is the longest width among the images in the batch. [Code as follows](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/predict_rec.py):
```
def resize_norm_img(self, img, max_wh_ratio):
    imgC, imgH, imgW = self.rec_image_shape
    assert imgC == img.shape[2]
    if self.character_type == "ch":
        # widen the target width according to the batch's max aspect ratio
        imgW = int(32 * max_wh_ratio)
    h, w = img.shape[:2]
    ratio = w / float(h)
    # resize the height to imgH while keeping the aspect ratio,
    # capping the resized width at imgW
    if math.ceil(imgH * ratio) > imgW:
        resized_w = imgW
    else:
        resized_w = int(math.ceil(imgH * ratio))
    resized_image = cv2.resize(img, (resized_w, imgH))
    resized_image = resized_image.astype('float32')
    # HWC -> CHW, then normalize pixel values to [-1, 1]
    resized_image = resized_image.transpose((2, 0, 1)) / 255
    resized_image -= 0.5
    resized_image /= 0.5
    # pad the width with zeros up to imgW
    padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
    padding_im[:, :, 0:resized_w] = resized_image
    return padding_im
```
<a name="SpaceRecognition"></a>
#### 3. Space Recognition
- **Problem Description**
As shown in the figure below, for mixed Chinese and English scenes, in order to make the recognition results easier to read and use, it is often necessary to recognize the spaces between words. How can this be handled?
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/en_paper.jpg" width="600">
</div>
- **Tips**
There are two possible methods for space recognition. (1) Optimize text detection: to split text at the spaces in the detection results, a text line containing spaces must be divided into multiple segments when labeling the detection data. (2) Optimize text recognition: introduce the space character into the recognition dictionary and label the spaces in the recognition training data. In addition, multiple word lines can be concatenated to synthesize training data containing spaces. PaddleOCR currently uses the second method.
# RECENT UPDATES
- 2020.7.15, Add mobile app demo, supporting both iOS and Android (based on EasyEdge and Paddle Lite)
- 2020.7.15, Improve the deployment ability, add C++ inference and serving deployment. In addition, the benchmarks of the ultra-lightweight Chinese OCR model are provided.
- 2020.7.15, Add several related datasets, data annotation and synthesis tools.
- 2020.7.9 Add a new model to support recognizing the character "space".
- 2020.7.9 Add data augmentation and learning rate decay strategies during training.
- 2020.6.8 Add [datasets](./doc/doc_en/datasets_en.md) and keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- 2020.5.30 Provide Lightweight Chinese OCR online experience
......
# Vertical multi-language OCR dataset
Here we have sorted out the commonly used vertical multi-language OCR datasets, which are being updated continuously. We welcome you to contribute datasets ~
- [Chinese urban license plate dataset](#Chinese urban license plate dataset)
- [Bank credit card dataset](#Bank credit card dataset)
- [Captcha dataset-Captcha](#Captcha dataset-Captcha)
- [multi-language dataset](#multi-language dataset)
<a name="Chinese urban license plate dataset"></a>
## Chinese urban license plate dataset
- **Data source**: [https://github.com/detectRecog/CCPD](https://github.com/detectRecog/CCPD)
- **Data introduction**: It contains more than 250,000 vehicle license plate images with detection and recognition annotations. It covers license plate images in the following different scenes:
* CCPD-Base: General license plate picture
* CCPD-DB: The brightness of license plate area is bright, dark or uneven
* CCPD-FN: The license plate is farther or closer to the camera location
* CCPD-Rotate: License plate includes rotation (horizontal 20\~50 degrees, vertical -10\~10 degrees)
* CCPD-Tilt: License plate includes rotation (horizontal 15\~45 degrees, vertical 15\~45 degrees)
* CCPD-Blur: The license plate contains blurring due to camera lens jitter
* CCPD-Weather: The license plate is photographed on rainy, snowy or foggy days
* CCPD-Challenge: So far, some of the most challenging images in license plate detection and recognition tasks
* CCPD-NP: Pictures of new cars without license plates.
![](../datasets/ccpd_demo.png)
- **Download address**
* Baidu cloud download address (extraction code: hm0U): [https://pan.baidu.com/s/1i5AOjAbtkwb17Zy-NQGqkw](https://pan.baidu.com/s/1i5AOjAbtkwb17Zy-NQGqkw)
* Google drive download address:[https://drive.google.com/file/d/1rdEsCUcIUaYOVRkx5IMTRNA7PcGMmSgc/view](https://drive.google.com/file/d/1rdEsCUcIUaYOVRkx5IMTRNA7PcGMmSgc/view)
<a name="Bank credit card dataset"></a>
## Bank credit card dataset
- **Data source**: [https://www.kesci.com/home/dataset/5954cf1372ead054a5e25870](https://www.kesci.com/home/dataset/5954cf1372ead054a5e25870)
- **Data introduction**: There are three types of training data
* 1. Sample card data of China Merchants Bank: including card image data and annotation data, 618 pictures in total
* 2. Single character data: including pictures and annotation data, 37 pictures in total
* 3. Other bank cards: no further details available, 50 pictures in total
* The demo image is shown below. The annotation information is stored in Excel, and the demo image below is annotated as:
* Top 8 card number: 62257583
* Card type: card of our bank
* End of validity: 07/41
* Chinese phonetic alphabet of card users: MICHAEL
![](../datasets/cmb_demo.jpg)
- **Download address**: [https://cdn.kesci.com/cmb2017-2.zip](https://cdn.kesci.com/cmb2017-2.zip)
<a name="Captcha dataset-Captcha"></a>
## Captcha dataset-Captcha
- **Data source**: [https://github.com/lepture/captcha](https://github.com/lepture/captcha)
- **Data introduction**: This is a toolkit for data synthesis. You can output captcha images according to the input text. Use the toolkit to generate several demo images as follows.
![](../datasets/captcha_demo.png)
- **Download address**: The dataset is generated and has no download address.
<a name="multi-language dataset"></a>
## Multi-language dataset (multi-lingual scene text detection and recognition)
- **Data source**: [https://rrc.cvc.uab.es/?ch=15&com=downloads](https://rrc.cvc.uab.es/?ch=15&com=downloads)
- **Data introduction**: The multi-language detection dataset MLT contains both language recognition and detection tasks.
* In the detection task, the training set contains 10000 images in 10 languages, and each language contains 1000 training images. The test set contains 10000 images.
* In the recognition task, the training set contains 111998 samples.
- **Download address**: The training set is large and can be downloaded in two parts. It can only be downloaded after registering on the website:
[https://rrc.cvc.uab.es/?ch=15&com=downloads](https://rrc.cvc.uab.es/?ch=15&com=downloads)
# Visualization
- [Chinese/English OCR Visualization (space support)](#Space_support)
- [Ultra-lightweight Chinese/English OCR Visualization](#Ultra-lightweight)
- [General Chinese/English OCR Visualization](#General)
<a name="Space_support"></a>
## Chinese/English OCR Visualization (space support)
### Ultra-lightweight Model
<div align="center">
<img src="../imgs_results/img_11.jpg" width="800">
</div>
### General OCR Model
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/en_paper.jpg" width="800">
</div>
<a name="Ultra-lightweight"></a>
## Ultra-lightweight Chinese/English OCR Visualization
<div align="center">
<img src="../imgs_results/1.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/7.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/12.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/4.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/6.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/9.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/16.png" width="800">
</div>
<div align="center">
<img src="../imgs_results/22.jpg" width="800">
</div>
<a name="General"></a>
## General Chinese/English OCR Visualization
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/11.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/2.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/8.jpg" width="800">
</div>
......@@ -16,7 +16,7 @@ import math
import cv2
import numpy as np
import json
import sys
class EASTProcessTrain(object):
def __init__(self, params):
......@@ -78,7 +78,7 @@ class EASTProcessTrain(object):
dst_polys = []
rand_degree_ratio = np.random.rand()
rand_degree_cnt = 1
if 0.333 < rand_degree_ratio < 0.666:
rand_degree_cnt = 2
elif rand_degree_ratio > 0.666:
rand_degree_cnt = 3
......@@ -138,7 +138,7 @@ class EASTProcessTrain(object):
continue
if p_area > 0:
#'poly in wrong direction'
if not tag:
tag = True  # reversed cases should be ignored
poly = poly[(0, 3, 2, 1), :]
validated_polys.append(poly)
......
......@@ -15,6 +15,8 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle.fluid.regularizer import L2Decay
from ppocr.utils.utility import initial_logger
logger = initial_logger()
......@@ -31,6 +33,8 @@ def AdamDecay(params, parameter_list=None):
base_lr = params['base_lr']
beta1 = params['beta1']
beta2 = params['beta2']
l2_decay = params.get("l2_decay", 0.0)
if 'decay' in params:
params = params['decay']
decay_mode = params['function']
......@@ -47,5 +51,6 @@ def AdamDecay(params, parameter_list=None):
learning_rate=base_lr,
beta1=beta1,
beta2=beta2,
regularization=L2Decay(regularization_coeff=l2_decay),
parameter_list=parameter_list)
return optimizer
......@@ -82,7 +82,7 @@ def main():
'fetch_name_list':eval_fetch_name_list,\
'fetch_varname_list':eval_fetch_varname_list}
metrics = eval_det_run(exe, config, eval_info_dict, "eval")
print("Eval result", metrics)
logger.info("Eval result: {}".format(metrics))
else:
reader_type = config['Global']['reader_yml']
if "benchmark" not in reader_type:
......@@ -92,7 +92,7 @@ def main():
'fetch_name_list': eval_fetch_name_list, \
'fetch_varname_list': eval_fetch_varname_list}
metrics = eval_rec_run(exe, config, eval_info_dict, "eval")
print("Eval result:", metrics)
logger.info("Eval result: {}".format(metrics))
else:
eval_info_dict = {'program':eval_program,\
'fetch_name_list':eval_fetch_name_list,\
......
......@@ -75,6 +75,7 @@ def eval_rec_run(exe, config, eval_info_dict, mode):
char_ops, preds, preds_lod, labels, labels_lod, is_remove_duplicate)
total_acc_num += acc_num
total_sample_num += sample_num
logger.info("eval batch id: {}, acc: {}".format(total_batch_num, acc))
total_batch_num += 1
avg_acc = total_acc_num * 1.0 / total_sample_num
metrics = {'avg_acc': avg_acc, "total_acc_num": total_acc_num, \
......
......@@ -70,7 +70,7 @@ def draw_det_res(dt_boxes, config, img, img_name):
def main():
config = program.load_config(FLAGS.config)
program.merge_config(FLAGS.opt)
logger.info(config)
# check if set use_gpu=True in paddlepaddle cpu version
use_gpu = config['Global']['use_gpu']
......
......@@ -84,7 +84,7 @@ def main():
if len(infer_list) == 0:
logger.info("Can not find img in infer_img dir.")
for i in range(max_img_num):
print("infer_img:%s" % infer_list[i])
logger.info("infer_img:%s" % infer_list[i])
img = next(blobs)
predict = exe.run(program=eval_prog,
feed={"image": img},
......@@ -115,9 +115,9 @@ def main():
preds = preds.reshape(-1)
preds_text = char_ops.decode(preds)
print("\t index:", preds)
print("\t word :", preds_text)
print("\t score :", score)
logger.info("\t index: {}".format(preds))
logger.info("\t word : {}".format(preds_text))
logger.info("\t score: {}".format(score))
# save for inference model
target_var = []
......
......@@ -41,19 +41,19 @@ def draw_server_result(image_file, res):
if len(res) == 0:
return np.array(image)
keys = res[0].keys()
if 'text_region' not in keys: # for ocr_rec, draw function is invalid
logger.info("draw function is invalid for ocr_rec!")
return None
elif 'text' not in keys: # for ocr_det
logger.info("draw text boxes only!")
boxes = []
for dno in range(len(res)):
boxes.append(res[dno]['text_region'])
boxes = np.array(boxes)
draw_img = draw_boxes(image, boxes)
return draw_img
else: # for ocr_system
logger.info("draw boxes and texts!")
boxes = []
texts = []
scores = []
......@@ -63,7 +63,8 @@ def draw_server_result(image_file, res):
scores.append(res[dno]['confidence'])
boxes = np.array(boxes)
scores = np.array(scores)
draw_img = draw_ocr(
image, boxes, texts, scores, draw_txt=True, drop_score=0.5)
return draw_img
......@@ -81,13 +82,13 @@ def main(url, image_path):
# send the HTTP request
starttime = time.time()
data = {'images': [cv2_to_base64(img)]}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
elapse = time.time() - starttime
total_time += elapse
print("Predict time of %s: %.3fs" % (image_file, elapse))
logger.info("Predict time of %s: %.3fs" % (image_file, elapse))
res = r.json()["results"][0]
logger.info(res)
if is_visualize:
draw_img = draw_server_result(image_file, res)
......@@ -98,17 +99,18 @@ def main(url, image_path):
cv2.imwrite(
os.path.join(draw_img_save, os.path.basename(image_file)),
draw_img[:, :, ::-1])
print("The visualized image saved in {}".format(
logger.info("The visualized image saved in {}".format(
os.path.join(draw_img_save, os.path.basename(image_file))))
cnt += 1
if cnt % 100 == 0:
print(cnt, "processed")
print("avg time cost: ", float(total_time)/cnt)
logger.info("{} processed".format(cnt))
logger.info("avg time cost: {}".format(float(total_time) / cnt))
if __name__ == '__main__':
if len(sys.argv) != 3:
print("Usage: %s server_url image_path" % sys.argv[0])
logger.info("Usage: %s server_url image_path" % sys.argv[0])
else:
server_url = sys.argv[1]
image_path = sys.argv[2]
main(server_url, image_path)
......@@ -118,7 +118,7 @@ def main():
def test_reader():
config = program.load_config(FLAGS.config)
program.merge_config(FLAGS.opt)
logger.info(config)
train_reader = reader_main(config=config, mode="train")
import time
starttime = time.time()
......@@ -129,7 +129,7 @@ def test_reader():
if count % 1 == 0:
batch_time = time.time() - starttime
starttime = time.time()
print("reader:", count, len(data), batch_time)
logger.info("reader:", count, len(data), batch_time)
except Exception as e:
logger.info(e)
logger.info("finish reader: {}, Success!".format(count))
......