Commit aa59fca5 authored by Leif

Merge remote-tracking branch 'origin/dygraph' into dygraph

parents 12d15752 f01f24c7
@@ -2,63 +2,28 @@
This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.

- [1. Data and Weights Preparation](#1-data-and-weights-preparation)
  - [1.1 Data Preparation](#11-data-preparation)
  - [1.2 Download Pre-trained Model](#12-download-pre-trained-model)
- [2. Training](#2-training)
  * [2.1 Start Training](#21-start-training)
  * [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
  * [2.3 Training with New Backbone](#23-training-with-new-backbone)
  * [2.4 Mixed Precision Training](#24-amp-training)
  * [2.5 Distributed Training](#25-distributed-training)
  * [2.6 Training with Knowledge Distillation](#26)
  * [2.7 Training on other platforms (Windows/macOS/Linux DCU)](#27)
- [3. Evaluation and Test](#3-evaluation-and-test)
  - [3.1 Evaluation](#31-evaluation)
  - [3.2 Test](#32-test)
- [4. Inference](#4-inference)
- [5. FAQ](#5-faq)
## 1. Data and Weights Preparation

### 1.1 Data Preparation

To prepare datasets, refer to [ocr_datasets](./dataset/ocr_datasets_en.md).

The icdar2015 dataset contains a training set of 1000 images and a test set of 500 images, both captured with wearable cameras. The dataset can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads); registration is required for downloading.

After registering and logging in, download the parts marked in the red box in the figure below. Save the content downloaded via `Training Set Images` as the folder `icdar_c4_train_imgs`, and the content downloaded via `Test Set Images` as the folder `ch4_test_images`.
<p align="center">
<img src="../datasets/ic15_location_download.png" align="middle" width = "700"/>
</p>
Decompress the downloaded dataset into the working directory, assuming it is decompressed under `PaddleOCR/train_data/`. In addition, PaddleOCR organizes the many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded with wget:
```shell
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```
After decompressing the dataset and downloading the annotation files, `PaddleOCR/train_data/` contains two folders and two files:
```
/PaddleOCR/train_data/icdar2015/text_localization/
└─ icdar_c4_train_imgs/ Training data of icdar dataset
└─ ch4_test_images/ Testing data of icdar dataset
└─ train_icdar2015_label.txt Training annotation of icdar dataset
└─ test_icdar2015_label.txt Test annotation of icdar dataset
```
The provided annotation file format is as follows, separated by "\t":
```
" Image file name Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
The image annotation after **json.dumps()** encoding is a list containing multiple dictionaries.
The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.
`transcription` represents the text of the current text box. **When its content is "###" it means that the text box is invalid and will be skipped during training.**
If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
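As a small illustration of this format (a sketch, not a PaddleOCR utility; the label path assumes the directory layout shown above), the annotation file can be parsed like this:

```python
import json

def parse_det_label(label_path):
    """Parse a detection label file into (image_path, boxes) pairs."""
    samples = []
    with open(label_path, "r", encoding="utf-8") as f:
        for line in f:
            image_path, annotation = line.rstrip("\n").split("\t", maxsplit=1)
            boxes = json.loads(annotation)
            # Boxes whose transcription is "###" are invalid and are skipped during training.
            boxes = [b for b in boxes if b["transcription"] != "###"]
            samples.append((image_path, boxes))
    return samples

samples = parse_det_label("./train_data/icdar2015/text_localization/train_icdar2015_label.txt")
```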
### 1.2 Download Pre-trained Model
@@ -175,11 +140,44 @@ After adding the four-part modules of the network, you only need to configure th
**NOTE**: More details about replacing the Backbone and other modules can be found in [doc](add_new_algorithm_en.md).

### 2.4 Mixed Precision Training

If you want to speed up your training further, you can use [Auto Mixed Precision Training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html). Taking a single machine and a single GPU as an example, the commands are as follows:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```
### 2.5 Distributed Training
During multi-machine multi-GPU training, use the `--ips` parameter to set the IP addresses of the machines, and the `--gpus` parameter to set the GPU IDs:
```bash
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```
**Note:** When using multi-machine multi-GPU training, you need to replace the `ips` value in the above command with the addresses of your machines, and the machines must be able to ping each other. In addition, training needs to be launched separately on each machine. The command to view a machine's IP address is `ifconfig`.
### 2.6 Training with Knowledge Distillation

Knowledge distillation is supported in PaddleOCR for the text detection training process. For more details, please refer to [doc](./knowledge_distillation_en.md).
### 2.7 Training on other platforms (Windows/macOS/Linux DCU)

- Windows GPU/CPU
The Windows platform is slightly different from the Linux platform:
The Windows platform only supports single-GPU training and inference; specify the GPU for training with `set CUDA_VISIBLE_DEVICES=0`.
On the Windows platform, the DataLoader only supports single-process mode, so you need to set `num_workers` to 0 (see the command sketch after this list).

- macOS
GPU mode is not supported; you need to set `use_gpu` to False in the configuration file, while the rest of the training, evaluation, and prediction commands are exactly the same as for Linux GPU.

- Linux DCU
Running on a DCU device requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`; the rest of the training, evaluation, and prediction commands are exactly the same as for Linux GPU.
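As an illustrative sketch of the settings above (the `-o` overrides shown here are assumptions for convenience; you can equally edit the yml file directly):

```shell
# Windows: single-GPU training only; pick the GPU and keep the DataLoader single-process
set CUDA_VISIBLE_DEVICES=0
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Train.loader.num_workers=0 Eval.loader.num_workers=0

# macOS: no GPU support, so disable the GPU via the config or an override
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.use_gpu=False

# Linux DCU: expose the DCU devices before launching training
export HIP_VISIBLE_DEVICES=0,1,2,3
python3 tools/train.py -c configs/det/det_mv3_db.yml
```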
## 3. Evaluation and Test

### 3.1 Evaluation
# E-book: *Dive Into OCR*
\ No newline at end of file
English | [简体中文](../doc_ch/ppocr_introduction.md)
# PP-OCR
- [1. Introduction](#1)
- [2. Features](#2)
- [3. Benchmark](#3)
- [4. Visualization](#4)
- [5. Tutorial](#5)
- [5.1 Quick start](#51)
- [5.2 Model training / compression / deployment](#52)
- [6. Model zoo](#6)
<a name="1"></a>
## 1. Introduction
PP-OCR is a self-developed practical ultra-lightweight OCR system, which is slimmed and optimized based on the reimplemented [academic algorithms](algorithm_en.md), considering the balance between **accuracy** and **speed**.
PP-OCR is a two-stage OCR system, in which the text detection algorithm is [DB](algorithm_det_db_en.md), and the text recognition algorithm is [CRNN](algorithm_rec_crnn_en.md). Besides, a [text direction classifier](angle_class_en.md) is added between the detection and recognition modules to deal with text in different directions.
PP-OCR pipeline is as follows:
<div align="center">
<img src="../ppocrv2_framework.jpg" width="800">
</div>
The PP-OCR system is under continuous optimization. At present, PP-OCR and PP-OCRv2 have been released:
[1] PP-OCR adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (https://arxiv.org/abs/2109.03144).
<a name="2"></a>
## 2. Features
- Ultra lightweight PP-OCRv2 series models: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
- Ultra lightweight PP-OCR mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
- General PP-OCR server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-lingual recognition: about 80 languages like Korean, Japanese, German, French, etc
<a name="3"></a>
## 3. Benchmark
For the performance comparison between PP-OCR series models, please check the [benchmark](./benchmark_en.md) documentation.
<a name="4"></a>
## 4. Visualization [more](./visualization.md)
<details open>
<summary>PP-OCRv2 English model</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 Chinese model</summary>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
</div>
</details>
<details open>
<summary>PP-OCRv2 Multilingual model</summary>
<div align="center">
<img src="../imgs_results/french_0.jpg" width="800">
<img src="../imgs_results/korean.jpg" width="800">
</div>
</details>
<a name="5"></a>
## 5. Tutorial
<a name="51"></a>
### 5.1 Quick start
- You can also quickly experience the ultra-lightweight OCR: [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr)
- Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems): [Sign in to the website to obtain the QR code for installing the App](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)
- One line of code quick use: [Quick Start](./quickstart_en.md)
<a name="52"></a>
### 5.2 Model training / compression / deployment
For more tutorials, including model training, model compression, deployment, etc., please refer to [tutorials](../../README.md#Tutorials)
<a name="6"></a>
## 6. Model zoo
## PP-OCR Series Model List (Update on September 8th)
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Chinese and English ultra-lightweight PP-OCRv2 model(11.6M) | ch_PP-OCRv2_xx |Mobile & Server|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar)|
| Chinese and English ultra-lightweight PP-OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) |
| Chinese and English general PP-OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) |
For more model downloads (including multiple languages), please refer to [PP-OCR series model downloads](./models_list_en.md).
For a new language request, please refer to [Guideline for new language_requests](../../README.md#language_requests).
# Text Recognition

- [1. Data Preparation](#DATA_PREPARATION)
  * [1.1 Custom Dataset](#Costom_Dataset)
  * [1.2 Dataset Download](#Dataset_download)
  * [1.3 Dictionary](#Dictionary)
  * [1.4 Add Space Category](#Add_space_category)
  * [1.5 Data Augmentation](#Data_Augmentation)
- [2. Training](#TRAINING)
  * [2.1 Start Training](#21-start-training)
  * [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
  * [2.3 Training with New Backbone](#23-training-with-new-backbone)
  * [2.4 Mixed Precision Training](#24-amp-training)
  * [2.5 Distributed Training](#25-distributed-training)
  * [2.6 Training with Knowledge Distillation](#kd)
  * [2.7 Multi-language Training](#Multi_language)
  * [2.8 Training on other platforms (Windows/macOS/Linux DCU)](#28)
- [3. Evaluation and Test](#3-evaluation-and-test)
  * [3.1 Evaluation](#31-evaluation)
  * [3.2 Test](#32-test)
- [4. Inference](#4-inference)
- [5. FAQ](#5-faq)
<a name="DATA_PREPARATION"></a> <a name="DATA_PREPARATION"></a>
## 1. Data Preparation ## 1. Data Preparation
### 1.1 DataSet Preparation
PaddleOCR supports two data formats: To prepare datasets, refer to [ocr_datasets](./dataset/ocr_datasets.md) .
- `LMDB` is used to train data sets stored in lmdb format(LMDBDataSet);
- `general data` is used to train data sets stored in text files(SimpleDataSet):
Please organize the dataset as follows:
The default storage path for training data is `PaddleOCR/train_data`, if you already have a dataset on your disk, just create a soft link to the dataset directory:
```
# linux and mac os
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
# windows
mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
```
<a name="Costom_Dataset"></a>
### 1.1 Custom Dataset
If you want to use your own data for training, please refer to the following to organize your data.
- Training set
It is recommended to put the training images in the same folder, and use a txt file (rec_gt_train.txt) to store the image path and label. The contents of the txt file are as follows:
* Note: By default, the image path and image label are separated by `\t`. Using any other separator will cause a training error.
```
" Image file name Image annotation "
train_data/rec/train/word_001.jpg 简单可依赖
train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
...
```
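For illustration (a sketch with made-up sample data, not a PaddleOCR utility), a label file in this format can be generated like this:

```python
import os

# Hypothetical mapping from image paths (relative to the PaddleOCR root) to labels.
samples = {
    "train_data/rec/train/word_001.jpg": "简单可依赖",
    "train_data/rec/train/word_002.jpg": "用科技让复杂的世界更简单",
}

os.makedirs("train_data/rec", exist_ok=True)
with open("train_data/rec/rec_gt_train.txt", "w", encoding="utf-8") as f:
    for img_path, label in samples.items():
        # The image path and the label must be separated by a tab character.
        f.write(f"{img_path}\t{label}\n")
```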
The final training set should have the following file structure:
```
|-train_data
|-rec
|- rec_gt_train.txt
|- train
|- word_001.png
|- word_002.jpg
|- word_003.jpg
| ...
```
- Test set
Similar to the training set, the test set also needs a folder containing all the images (test) and a rec_gt_test.txt file. The structure of the test set is as follows:
```
|-train_data
|-rec
|-ic15_data
|- rec_gt_test.txt
|- test
|- word_001.jpg
|- word_002.jpg
|- word_003.jpg
| ...
```
<a name="Dataset_download"></a>
### 1.2 Dataset Download
- ICDAR2015
If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads).
You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the LMDB-format datasets required for benchmarking.
If you want to reproduce the SAR paper, you need to download the extra dataset [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg) (extraction code: 627x). Besides, the icdar2013, icdar2015, cocotext, and IIIT5k datasets are also used for training. For specific details, please refer to the SAR paper.
PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
```
# Training set label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test Set Label
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
```
PaddleOCR also provides a data format conversion script, which can convert the labels from the ICDAR official website into a data format supported by PaddleOCR. The conversion tool is in `ppocr/utils/gen_label.py`; here the training set is taken as an example:
```
# convert the official gt to rec_gt_label.txt
python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
```
The data format is as follows, (a) is the original picture, (b) is the Ground Truth text file corresponding to each picture:
![](../datasets/icdar_rec.png)
- Multilingual dataset
The multi-language model training method is the same as for the Chinese model. The training dataset consists of 1 million (100w) synthetic images. A small set of fonts and test data can be downloaded using either of the following two methods.
* [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA), extraction code: frgi.
* [Google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)
<a name="Dictionary"></a> <a name="Dictionary"></a>
### 1.3 Dictionary ### 1.2 Dictionary
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index. Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
@@ -173,11 +80,8 @@ If you need to customize dic file, please add character_dict_path field in confi
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
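For illustration, a minimal sketch of the relevant fields in a recognition yml file (the dictionary path is a placeholder for your own file):

```yaml
Global:
  # dictionary file, one character per line
  character_dict_path: ppocr/utils/your_dict.txt
  # add the space character to the dictionary
  use_space_char: True
```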
<a name="TRAINING"></a>
## 2.Training
<a name="Data_Augmentation"></a> <a name="Data_Augmentation"></a>
### 2.1 Data Augmentation ### 1.5 Data Augmentation
PaddleOCR provides a variety of data augmentation methods. All the augmentation methods are enabled by default. PaddleOCR provides a variety of data augmentation methods. All the augmentation methods are enabled by default.
...@@ -185,11 +89,14 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand ...@@ -185,11 +89,14 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand
Each disturbance method is selected with a 40% probability during the training process. For specific code implementation, please refer to: [rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py) Each disturbance method is selected with a 40% probability during the training process. For specific code implementation, please refer to: [rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
<a name="Training"></a> <a name="TRAINING"></a>
### 2.2 General Training ## 2.Training
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example: PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
<a name="21-start-training"></a>
### 2.1 Start Training
First download the pretrain model, you can download the trained model to finetune on the icdar2015 data: First download the pretrain model, you can download the trained model to finetune on the icdar2015 data:
```
@@ -305,8 +212,99 @@ Eval:
```

**Note that the configuration file for prediction/evaluation must be consistent with the training.**
<a name="22-load-trained-model-and-continue-training"></a>
### 2.2 Load Trained Model and Continue Training
If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded.
For example:
```shell
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints=./your/trained/model
```
**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrained_model`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrained_model` will be loaded.
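For illustration (the paths below are placeholders), if both parameters are passed at the same time, `Global.checkpoints` takes priority:

```shell
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml \
     -o Global.checkpoints=./output/rec/ic15/latest \
        Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train
```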
<a name="23-training-with-new-backbone"></a>
### 2.3 Training with New Backbone
PaddleOCR divides the network into four parts, which live under [ppocr/modeling](../../ppocr/modeling). The data entering the network passes through these four parts in sequence (transforms->backbones->necks->heads).
```bash
├── architectures # Code for building network
├── transforms # Image Transformation Module
├── backbones # Feature extraction module
├── necks # Feature enhancement module
└── heads # Output module
```
If the Backbone to be replaced has a corresponding implementation in PaddleOCR, you can directly modify the parameters in the `Backbone` part of the configuration yml file.
However, if you want to use a new Backbone, an example of replacing the backbones is as follows:
1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
2. Add code in the my_backbone.py file, the sample code is as follows:
```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MyBackbone(nn.Layer):
    def __init__(self, *args, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code
        self.conv = nn.xxxx

    def forward(self, inputs):
        # your network forward
        y = self.conv(inputs)
        return y
```
3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.
After adding the four-part modules of the network, you only need to configure them in the configuration file to use, such as:
```yaml
Backbone:
name: MyBackbone
args1: args1
```
**NOTE**: More details about replacing the Backbone and other modules can be found in [doc](add_new_algorithm_en.md).
<a name="24-amp-training"></a>
### 2.4 Mixed Precision Training
If you want to speed up your training further, you can use [Auto Mixed Precision Training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html). Taking a single machine and a single GPU as an example, the commands are as follows:
```shell
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml \
-o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train \
Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```
<a name="25-distributed-training"></a>
### 2.5 Distributed Training
During multi-machine multi-GPU training, use the `--ips` parameter to set the IP addresses of the machines, and the `--gpus` parameter to set the GPU IDs:
```bash
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml \
-o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train
```
**Note:** When using multi-machine multi-GPU training, you need to replace the `ips` value in the above command with the addresses of your machines, and the machines must be able to ping each other. In addition, training needs to be launched separately on each machine. The command to view a machine's IP address is `ifconfig`.
<a name="kd"></a>
### 2.6 Training with Knowledge Distillation
Knowledge distillation is supported in PaddleOCR for the text recognition training process. For more details, please refer to [doc](./knowledge_distillation_en.md).
<a name="Multi_language"></a> <a name="Multi_language"></a>
### 2.3 Multi-language Training ### 2.7 Multi-language Training
Currently, the multi-language algorithms supported by PaddleOCR are: Currently, the multi-language algorithms supported by PaddleOCR are:
@@ -362,25 +360,35 @@ Eval:
    ...
```
<a name="kd"></a> <a name="28"></a>
### 2.8 Training on other platform(Windows/macOS/Linux DCU)
### 2.4 Training with Knowledge Distillation - Windows GPU/CPU
The Windows platform is slightly different from the Linux platform:
Windows platform only supports `single gpu` training and inference, specify GPU for training `set CUDA_VISIBLE_DEVICES=0`
On the Windows platform, DataLoader only supports single-process mode, so you need to set `num_workers` to 0;
Knowledge distillation is supported in PaddleOCR for text recognition training process. For more details, please refer to [doc](./knowledge_distillation_en.md). - macOS
GPU mode is not supported, you need to set `use_gpu` to False in the configuration file, and the rest of the training evaluation prediction commands are exactly the same as Linux GPU.
- Linux DCU
Running on a DCU device requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`, and the rest of the training and evaluation prediction commands are exactly the same as the Linux GPU.
<a name="EVALUATION"></a> <a name="3-evaluation-and-test"></a>
## 3. Evaluation and Test
## 3. Evalution <a name="31-evaluation"></a>
### 3.1 Evaluation
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file. The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file. The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
```
# GPU evaluation, Global.checkpoints is the weight to be tested
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```
<a name="PREDICTION"></a> <a name="32-test"></a>
## 4. Prediction ### 3.2 Test
Using the model trained by paddleocr, you can quickly get prediction through the following script. Using the model trained by paddleocr, you can quickly get prediction through the following script.
@@ -442,9 +450,14 @@ infer_img: doc/imgs_words/ch/word_1.jpg
result: ('韩国小馆', 0.997218)
```
<a name="Inference"></a> <a name="4-inference"></a>
## 4. Inference
## 5. Convert to Inference Model The inference model (the model saved by `paddle.jit.save`) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment.
The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
Compared with the checkpoints model, the inference model will additionally save the structural information of the model. Therefore, it is easier to deploy because the model structure and model parameters are already solidified in the inference model file, and is suitable for integration with actual systems.
The recognition model is converted to the inference model in the same way as the detection, as follows: The recognition model is converted to the inference model in the same way as the detection, as follows:
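A sketch of the export step, assuming the CRNN config used above and placeholder paths for the trained weights and the output directory:

```shell
# Convert the trained recognition checkpoints into an inference model
python3 tools/export_model.py -c configs/rec/rec_icdar15_train.yml \
     -o Global.pretrained_model=./output/rec/ic15/best_accuracy \
        Global.save_inference_dir=./inference/rec_crnn/
```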
@@ -462,7 +475,7 @@ If you have a model trained on your own dataset with a different dictionary file
After the conversion is successful, there are three files in the model save directory:

```
inference/rec_crnn/
    ├── inference.pdiparams         # The parameter file of recognition inference model
    ├── inference.pdiparams.info    # The parameter information of recognition inference model, which can be ignored
    └── inference.pdmodel           # The program file of recognition model
```
@@ -475,3 +488,10 @@ inference/det_db/

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_dict_path="your text dict path"
```
<a name="5-faq"></a>
## 5. FAQ
Q1: Why are the prediction results inconsistent after the trained model is converted to an inference model?

**A**: This is a common issue. It is mostly caused by preprocessing and postprocessing parameters that differ between prediction with the trained model and prediction with the inference model. You can compare the preprocessing, postprocessing, and prediction settings in the configuration files used for training and inference.
@@ -94,7 +94,7 @@ The current open source models, data sets and magnitudes are as follows:
- Chinese dataset: the LSVT street-view dataset is cropped according to the ground truth and position-calibrated, giving a total of 300k (30w) images. In addition, 5 million (500w) images are synthesized based on the LSVT corpus.
- Multi-language datasets: for each language, 1 million (100w) synthetic images are generated using different corpora and fonts, with ICDAR-MLT used as the validation set.

The public datasets are all open source; users can search and download them by themselves, or refer to [Chinese dataset](dataset/datasets_en.md). Synthetic data is not open source; users can synthesize it themselves with open-source tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc.

<a name="22-vertical-scene"></a>
@@ -19,7 +19,7 @@
- 2020.7.15, Add several related datasets, data annotation and synthesis tools.
- 2020.7.9 Add a new model to support recognizing the character "space".
- 2020.7.9 Add the data augmentation and learning rate decay strategies during training.
- 2020.6.8 Add [datasets](dataset/datasets_en.md) and keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- 2020.5.30 Provide Lightweight Chinese OCR online experience
@@ -47,7 +47,7 @@ __all__ = [
]

SUPPORT_DET_MODEL = ['DB']
VERSION = '2.5'
SUPPORT_REC_MODEL = ['CRNN']
BASE_DIR = os.path.expanduser("~/.paddleocr/")
@@ -442,7 +442,7 @@ class PPStructure(StructureSystem):
        logger.debug(params)
        super().__init__(params)

    def __call__(self, img, return_ocr_result_in_table=False):
        if isinstance(img, str):
            # download net image
            if img.startswith('http'):
@@ -460,7 +460,7 @@ class PPStructure(StructureSystem):
        if isinstance(img, np.ndarray) and len(img.shape) == 2:
            img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)

        res = super().__call__(img, return_ocr_result_in_table)
        return res
@@ -72,6 +72,7 @@ def build_dataloader(config, mode, device, logger, seed=None):
        use_shared_memory = loader_config['use_shared_memory']
    else:
        use_shared_memory = True

    if mode == "Train":
        # Distribute data to multiple cards
        batch_sampler = DistributedBatchSampler(
@@ -56,3 +56,17 @@ class ListCollator(object):
        for idx in to_tensor_idxs:
            data_dict[idx] = paddle.to_tensor(data_dict[idx])
        return list(data_dict.values())


class SSLRotateCollate(object):
    """
    batch: [
        [(4*3xH*W), (4,)]
        [(4*3xH*W), (4,)]
        ...
    ]
    """

    def __call__(self, batch):
        output = [np.concatenate(d, axis=0) for d in zip(*batch)]
        return output
@@ -22,8 +22,9 @@ from .make_shrink_map import MakeShrinkMap
from .random_crop_data import EastRandomCropData, RandomCropImgMask
from .make_pse_gt import MakePseGt

from .rec_img_aug import RecAug, RecConAug, RecResizeImg, ClsResizeImg, \
    SRNRecResizeImg, NRTRRecResizeImg, SARRecResizeImg, PRENResizeImg, SVTRRecResizeImg
from .ssl_img_aug import SSLRotateResize
from .randaugment import RandAugment
from .copy_paste import CopyPaste
from .ColorJitter import ColorJitter
@@ -22,6 +22,7 @@ import numpy as np
import string
from shapely.geometry import LineString, Point, Polygon
import json
import copy

from ppocr.utils.logging import get_logger
@@ -112,14 +113,14 @@ class BaseRecLabelEncode(object):
            dict_character = list(self.character_str)
            self.lower = True
        else:
            self.character_str = []
            with open(character_dict_path, "rb") as fin:
                lines = fin.readlines()
                for line in lines:
                    line = line.decode('utf-8').strip("\n").strip("\r\n")
                    self.character_str.append(line)
            if use_space_char:
                self.character_str.append(" ")
            dict_character = list(self.character_str)
        dict_character = self.add_special_char(dict_character)
        self.dict = {}
@@ -1007,3 +1008,34 @@ class VQATokenLabelEncode(object):
        gt_label.extend([self.label2id_map[("i-" + label).upper()]] *
                        (len(encode_res["input_ids"]) - 1))
        return gt_label
class MultiLabelEncode(BaseRecLabelEncode):
    def __init__(self,
                 max_text_length,
                 character_dict_path=None,
                 use_space_char=False,
                 **kwargs):
        super(MultiLabelEncode, self).__init__(
            max_text_length, character_dict_path, use_space_char)

        self.ctc_encode = CTCLabelEncode(max_text_length, character_dict_path,
                                         use_space_char, **kwargs)
        self.sar_encode = SARLabelEncode(max_text_length, character_dict_path,
                                         use_space_char, **kwargs)

    def __call__(self, data):
        data_ctc = copy.deepcopy(data)
        data_sar = copy.deepcopy(data)
        data_out = dict()
        data_out['img_path'] = data.get('img_path', None)
        data_out['image'] = data['image']
        ctc = self.ctc_encode.__call__(data_ctc)
        sar = self.sar_encode.__call__(data_sar)
        if ctc is None or sar is None:
            return None
        data_out['label_ctc'] = ctc['label']
        data_out['label_sar'] = sar['label']
        data_out['length'] = ctc['length']
        return data_out
@@ -16,6 +16,7 @@ import math
import cv2
import numpy as np
import random
import copy
from PIL import Image

from .text_image_aug import tia_perspective, tia_stretch, tia_distort
@@ -32,13 +33,56 @@ class RecAug(object):
        return data
class RecConAug(object):
    def __init__(self,
                 prob=0.5,
                 image_shape=(32, 320, 3),
                 max_text_length=25,
                 ext_data_num=1,
                 **kwargs):
        self.ext_data_num = ext_data_num
        self.prob = prob
        self.max_text_length = max_text_length
        self.image_shape = image_shape
        self.max_wh_ratio = self.image_shape[1] / self.image_shape[0]

    def merge_ext_data(self, data, ext_data):
        ori_w = round(data['image'].shape[1] / data['image'].shape[0] *
                      self.image_shape[0])
        ext_w = round(ext_data['image'].shape[1] / ext_data['image'].shape[0] *
                      self.image_shape[0])
        data['image'] = cv2.resize(data['image'], (ori_w, self.image_shape[0]))
        ext_data['image'] = cv2.resize(ext_data['image'],
                                       (ext_w, self.image_shape[0]))
        data['image'] = np.concatenate(
            [data['image'], ext_data['image']], axis=1)
        data["label"] += ext_data["label"]
        return data

    def __call__(self, data):
        rnd_num = random.random()
        if rnd_num > self.prob:
            return data
        for idx, ext_data in enumerate(data["ext_data"]):
            if len(data["label"]) + len(ext_data[
                    "label"]) > self.max_text_length:
                break
            concat_ratio = data['image'].shape[1] / data['image'].shape[
                0] + ext_data['image'].shape[1] / ext_data['image'].shape[0]
            if concat_ratio > self.max_wh_ratio:
                break
            data = self.merge_ext_data(data, ext_data)
        data.pop("ext_data")
        return data
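if __name__ == "__main__":
    # Illustrative usage sketch (not part of the original change): two dummy crops
    # are concatenated by RecConAug into one wider sample with the joined label.
    base_img = np.zeros((32, 100, 3), dtype=np.uint8)
    ext_img = np.zeros((32, 80, 3), dtype=np.uint8)
    aug = RecConAug(prob=1.0, image_shape=(32, 320, 3), max_text_length=25)
    sample = {
        "image": base_img,
        "label": "hello",
        "ext_data": [{"image": ext_img, "label": "world"}],
    }
    sample = aug(sample)  # sample["image"] is widened; sample["label"] == "helloworld"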
class ClsResizeImg(object):
    def __init__(self, image_shape, **kwargs):
        self.image_shape = image_shape

    def __call__(self, data):
        img = data['image']
        norm_img, _ = resize_norm_img(img, self.image_shape)
        data['image'] = norm_img
        return data
@@ -98,10 +142,13 @@ class RecResizeImg(object):
    def __call__(self, data):
        img = data['image']
        if self.infer_mode and self.character_dict_path is not None:
            norm_img, valid_ratio = resize_norm_img_chinese(img,
                                                            self.image_shape)
        else:
            norm_img, valid_ratio = resize_norm_img(img, self.image_shape,
                                                    self.padding)
        data['image'] = norm_img
        data['valid_ratio'] = valid_ratio
        return data
@@ -160,6 +207,25 @@ class PRENResizeImg(object):
        return data
class SVTRRecResizeImg(object):
    def __init__(self,
                 image_shape,
                 infer_mode=False,
                 character_dict_path='./ppocr/utils/ppocr_keys_v1.txt',
                 padding=True,
                 **kwargs):
        self.image_shape = image_shape
        self.infer_mode = infer_mode
        self.character_dict_path = character_dict_path
        self.padding = padding

    def __call__(self, data):
        img = data['image']
        norm_img = resize_norm_img_svtr(img, self.image_shape, self.padding)
        data['image'] = norm_img
        return data
def resize_norm_img_sar(img, image_shape, width_downsample_ratio=0.25):
    imgC, imgH, imgW_min, imgW_max = image_shape
    h = img.shape[0]
@@ -220,7 +286,8 @@ def resize_norm_img(img, image_shape, padding=True):
    resized_image /= 0.5
    padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
    padding_im[:, :, 0:resized_w] = resized_image
    valid_ratio = min(1.0, float(resized_w / imgW))
    return padding_im, valid_ratio
def resize_norm_img_chinese(img, image_shape):
@@ -230,7 +297,7 @@ def resize_norm_img_chinese(img, image_shape):
    h, w = img.shape[0], img.shape[1]
    ratio = w * 1.0 / h
    max_wh_ratio = max(max_wh_ratio, ratio)
    imgW = int(imgH * max_wh_ratio)
    if math.ceil(imgH * ratio) > imgW:
        resized_w = imgW
    else:
@@ -246,7 +313,8 @@ def resize_norm_img_chinese(img, image_shape):
    resized_image /= 0.5
    padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
    padding_im[:, :, 0:resized_w] = resized_image
    valid_ratio = min(1.0, float(resized_w / imgW))
    return padding_im, valid_ratio
def resize_norm_img_srn(img, image_shape):
@@ -276,6 +344,58 @@ def resize_norm_img_srn(img, image_shape):
    return np.reshape(img_black, (c, row, col)).astype(np.float32)
def resize_norm_img_svtr(img, image_shape, padding=False):
    imgC, imgH, imgW = image_shape
    h = img.shape[0]
    w = img.shape[1]
    if not padding:
        if h > 2.0 * w:
            image = Image.fromarray(img)
            image1 = image.rotate(90, expand=True)
            image2 = image.rotate(-90, expand=True)
            img1 = np.array(image1)
            img2 = np.array(image2)
        else:
            img1 = copy.deepcopy(img)
            img2 = copy.deepcopy(img)

        resized_image = cv2.resize(
            img, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
        resized_image1 = cv2.resize(
            img1, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
        resized_image2 = cv2.resize(
            img2, (imgW, imgH), interpolation=cv2.INTER_LINEAR)
        resized_w = imgW
    else:
        ratio = w / float(h)
        if math.ceil(imgH * ratio) > imgW:
            resized_w = imgW
        else:
            resized_w = int(math.ceil(imgH * ratio))
        resized_image = cv2.resize(img, (resized_w, imgH))
    resized_image = resized_image.astype('float32')
    resized_image1 = resized_image1.astype('float32')
    resized_image2 = resized_image2.astype('float32')
    if image_shape[0] == 1:
        resized_image = resized_image / 255
        resized_image = resized_image[np.newaxis, :]
    else:
        resized_image = resized_image.transpose((2, 0, 1)) / 255
        resized_image1 = resized_image1.transpose((2, 0, 1)) / 255
        resized_image2 = resized_image2.transpose((2, 0, 1)) / 255
    resized_image -= 0.5
    resized_image /= 0.5
    resized_image1 -= 0.5
    resized_image1 /= 0.5
    resized_image2 -= 0.5
    resized_image2 /= 0.5
    padding_im = np.zeros((3, imgC, imgH, imgW), dtype=np.float32)
    padding_im[0, :, :, 0:resized_w] = resized_image
    padding_im[1, :, :, 0:resized_w] = resized_image1
    padding_im[2, :, :, 0:resized_w] = resized_image2
    return padding_im
def srn_other_inputs(image_shape, num_heads, max_text_length):
    imgC, imgH, imgW = image_shape