"vscode:/vscode.git/clone" did not exist on "dbea8e743cd938de998735c0b22c53baabbe9b9b"
Commit 41a1b292 authored by Leif's avatar Leif
Browse files

Merge remote-tracking branch 'origin/dygraph' into dygraph

parents 9471054e 3d30899b
# Enhanced CTC Loss
In OCR, CRNN is a text recognition algorithm widely used in industry. It uses CTCLoss to compute the network loss in the training phase and CTCDecode to obtain the decoding result in the inference phase. Although CRNN has been proven to deliver reliable recognition results in real business scenarios, there is always demand for higher recognition accuracy. So how can the accuracy of text recognition be improved? Taking CTCLoss as the starting point, this document explores improved fusion schemes for CTCLoss from three different perspectives: Hard Example Mining, Multi-task Learning, and Metric Learning. Based on this exploration, we propose EnhancedCTCLoss, which includes the following three components: Focal-CTC Loss, A-CTC Loss, and C-CTC Loss.
## 1. Focal-CTC Loss
Focal Loss was proposed in the paper "[Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002)". It was originally designed to address the severe imbalance between positive and negative samples in one-stage object detection. Because it reduces the weight of the large number of easy negative samples in training, it can also be understood as a form of hard example mining.
The form of the loss function is as follows:
<div align="center">
<img src="./focal_loss_formula.png" width = "600" />
</div>
Here, y' is the output of the activation function, with a value between 0 and 1. On top of the original cross-entropy loss, Focal Loss adds a modulating factor (1-y')^&gamma; and a balancing factor &alpha;. When &alpha; = 1 and y = 1, the comparison between this loss function and the cross-entropy loss is shown in the following figure:
<div align="center">
<img src="./focal_loss_image.png" width = "600" />
</div>
As can be seen from the figure above, when &gamma; > 0, the modulating factor (1-y')^&gamma; gives a smaller weight to the loss of easy-to-classify samples, making the network pay more attention to difficult and misclassified samples. The factor &gamma; controls the rate at which the weight of easy samples decays: when &gamma; = 0, the loss reduces to the cross-entropy loss, and the larger &gamma; becomes, the stronger the influence of the modulating factor. Experiments found &gamma; = 2 to be optimal. The balancing factor &alpha; compensates for the uneven proportion of positive and negative samples; in the paper, &alpha; is set to 0.25.
For the classic CTC algorithm, suppose that for a certain feature sequence (f<sub>1</sub>, f<sub>2</sub>, ..., f<sub>t</sub>) the probability that the CTC decoding result equals the label is y'; then the probability that the CTC decoding result does not equal the label is (1-y'). It is not difficult to see that the CTCLoss value and y' have the following relationship:
<div align="center">
<img src="./equation_ctcloss.png" width = "250" />
</div>
Combining the idea of Focal Loss, that is, assigning larger weights to hard samples and smaller weights to easy samples, makes the network focus more on mining hard samples and further improves recognition accuracy. We therefore propose Focal-CTC Loss, defined as follows:
<div align="center">
<img src="./equation_focal_ctc.png" width = "500" />
</div>
In our experiments, &gamma; = 2 and &alpha; = 1. For the specific implementation, see [rec_ctc_loss.py](../../ppocr/losses/rec_ctc_loss.py).
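Since CTCLoss = -log(y'), the decoding probability can be recovered as y' = exp(-loss) and the focal weighting applied per sample. Below is a minimal sketch of this idea, not the repository's exact implementation: the class name and tensor layout are illustrative, and `predicts` is assumed to already be in the form `paddle.nn.CTCLoss` expects (shape `[T, N, C]`).

```python
import paddle
from paddle import nn

class FocalCTCLoss(nn.Layer):
    """Sketch of Focal-CTC Loss: per-sample CTC loss re-weighted by (1-y')^gamma."""
    def __init__(self, alpha=1.0, gamma=2.0, blank=0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma
        # reduction='none' keeps one CTC loss value per sample
        self.ctc = nn.CTCLoss(blank=blank, reduction='none')

    def forward(self, predicts, labels, pred_lengths, label_lengths):
        loss = self.ctc(predicts, labels, pred_lengths, label_lengths)
        y_prime = paddle.exp(-loss)  # since CTC loss = -log(y')
        # down-weight easy samples (y' close to 1), emphasize hard ones
        weight = self.alpha * (1.0 - y_prime) ** self.gamma
        return (weight * loss).mean()
```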
## 2. A-CTC Loss
A-CTC Loss is short for CTC Loss + ACE Loss. ACE Loss was proposed in the paper "[Aggregation Cross-Entropy for Sequence Recognition](https://arxiv.org/abs/1904.08364)". Compared with CTCLoss, ACE Loss has the following two advantages:
+ ACE Loss can solve the recognition problem of 2-D text, while CTCLoss can only process 1-D text.
+ ACE Loss is better than CTC loss in both time complexity and space complexity.
The advantages and disadvantages of OCR recognition algorithms summarized in prior work are shown in the following figure:
<div align="center">
<img src="./rec_algo_compare.png" width = "1000" />
</div>
Although ACELoss does handle 2-D predictions, as shown in the figure above, and has advantages in memory usage and inference speed, in practice we found that using ACELoss alone yields worse recognition results than CTCLoss. Consequently, we tried combining the two, with CTCLoss as the mainstay and ACELoss as an auxiliary supervision loss. This attempt achieved better results: on our internal experimental dataset, recognition accuracy improved by about 1% compared to using CTCLoss alone.
A-CTC Loss is defined as follows:
<div align="center">
<img src="./equation_a_ctc.png" width = "300" />
</div>
In our experiments, λ = 0.1. See the ACE loss implementation: [ace_loss.py](../../ppocr/losses/ace_loss.py).
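To illustrate how the two terms combine, here is a rough sketch of the ACE term following the paper's formulation, plus the A-CTC sum. The function names and inputs are assumptions for illustration: `probs` holds softmax outputs of shape [B, T, C], and `label_counts` holds the per-class character counts of each label (shape [B, C], with the blank count equal to T minus the label length).

```python
import paddle

def ace_loss(probs, label_counts):
    # Aggregate class probabilities over the T time steps, then take the
    # cross-entropy against the normalized character counts of the label.
    T = probs.shape[1]
    y_bar = probs.sum(axis=1) / T      # [B, C] aggregated predictions
    n_bar = label_counts / T           # [B, C] normalized counts
    return -(n_bar * paddle.log(y_bar + 1e-10)).sum(axis=1).mean()

def a_ctc_loss(ctc, ace, lam=0.1):
    # A-CTC: CTC as the mainstay, ACE as auxiliary supervision
    return ctc + lam * ace
```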
## 3. C-CTC Loss
C-CTC Loss is short for CTC Loss + Center Loss. Center Loss was proposed in the paper "[A Discriminative Feature Learning Approach for Deep Face Recognition](https://link.springer.com/chapter/10.1007/978-3-319-46478-7_31)". It was first used in face recognition tasks to increase inter-class distance and reduce intra-class distance, and it is an early and still widely used algorithm.
In the task of Chinese OCR recognition, through the analysis of bad cases we found that a major difficulty in Chinese recognition is the large number of similar characters, which are easily confused with one another. This led us to ask whether we could borrow the idea of Metric Learning to increase the class spacing of similar characters and thereby improve recognition accuracy. However, Metric Learning is mainly used in the field of image recognition, where the label of each training sample is a fixed value, whereas OCR recognition is essentially a sequence recognition task with no explicit alignment between features and labels. Therefore, how to combine the two is still a direction worth exploring.
After trying ArcMargin, CosMargin and other methods, we finally found that Center Loss helps further improve recognition accuracy. C-CTC Loss is defined as follows:
<div align="center">
<img src="./equation_c_ctc.png" width = "300" />
</div>
In our experiments, we set λ = 0.25. See the center loss implementation: [center_loss.py](../../ppocr/losses/center_loss.py).
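Conceptually, the Center Loss term penalizes the squared distance between each time step's feature and the center of the class that the step is decoded to. Below is a minimal sketch under assumed inputs (per-step features and argmax indices already flattened across the batch; names are illustrative):

```python
import paddle

def center_loss(features, indices, centers):
    # features: [N, D] inputs to the last FC layer, one row per time step
    # indices:  [N]    argmax class index of each time step
    # centers:  [num_classes, D] learnable center table
    step_centers = paddle.gather(centers, indices)  # center of each step's class
    return 0.5 * ((features - step_centers) ** 2).sum(axis=1).mean()

def c_ctc_loss(ctc, center, lam=0.25):
    return ctc + lam * center
```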
It is worth mentioning that in C-CTC Loss, initializing the centers randomly does not bring significant improvement. Our center initialization method is as follows (a code sketch is given after the list):
+ Train a network N based on the original CTCLoss.
+ Run recognition on the training set and gather the samples that are recognized completely correctly into a set G.
+ Feed each sample in G to the network, perform a forward pass, and record the correspondence between the input of the last FC layer (i.e., the feature) and the result of the argmax computation (i.e., the index).
+ Aggregate the features with the same index and compute their average to obtain the initial center of each character.
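The aggregation step can be sketched as follows; `feature_index_pairs` is an assumed iterable that yields, for each sample in G, the per-step features and their argmax indices:

```python
import numpy as np

def init_centers(feature_index_pairs, num_classes, feat_dim):
    sums = np.zeros((num_classes, feat_dim))
    counts = np.zeros(num_classes)
    for feats, idxs in feature_index_pairs:  # feats: [T, D], idxs: [T]
        for f, i in zip(feats, idxs):
            sums[i] += f
            counts[i] += 1
    counts = np.maximum(counts, 1)           # guard classes never seen in G
    return sums / counts[:, None]            # one initial center per character
```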
Taking the configuration file `configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml` as an example, the center extraction command is as follows:
```
python tools/export_center.py -c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml -o Global.pretrained_model="./output/rec_mobile_pp-OCRv2/best_accuracy"
```
After running, `train_center.pkl` will be generated in the main directory of PaddleOCR.
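For a quick sanity check, the generated file can be inspected with the standard pickle module (the exact layout of the stored centers is defined by `tools/export_center.py`):

```python
import pickle

with open("train_center.pkl", "rb") as f:
    centers = pickle.load(f)  # per-character center vectors
print(type(centers), len(centers))
```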
## 4. Experiment
For the above three solutions, we conducted training and evaluation based on Baidu's internal dataset. The experimental results are shown in the following table:
| algorithm | Focal-CTC | A-CTC | C-CTC |
| :-------- | :-------: | :---: | :---: |
| gain | +0.3% | +0.7% | +1.7% |
Based on the above experimental conclusions, we adopted the C-CTC strategy in PP-OCRv2. It is worth mentioning that because PP-OCRv2 deals with a recognition task covering 6625 Chinese characters, the character set is large and contains many similar characters, so the C-CTC solution brings a significant improvement on this task. If you switch to other OCR recognition tasks, the conclusion may differ. You can try Focal-CTC, A-CTC, C-CTC, and the combined solution EnhancedCTC; we believe they will bring different degrees of improvement.
The unified combined scheme is shown in the following file: [rec_enhanced_ctc_loss.py](../../ppocr/losses/rec_enhanced_ctc_loss.py)
@@ -4,9 +4,9 @@ Windows and Mac users are recommended to use Anaconda to build a Python environm
Recommended working environment:
- PaddlePaddle >= 2.0.0 (2.1.2)
- Python 3.7
- CUDA 10.1 / CUDA 10.2
- cuDNN 7.6
* [1. Python Environment Setup](#1)
+ [1.1 Windows](#1.1)
@@ -25,7 +25,7 @@ Recommended working environment:
#### 1.1.1 Install Anaconda
- Note: To use PaddlePaddle you need to install a Python environment first; here we choose the Python integrated environment Anaconda toolkit
- Anaconda is a common Python package manager
- After installing Anaconda, you can install the Python environment, as well as numpy and other required toolkits.
@@ -44,19 +44,19 @@ Recommended working environment:
<img src="../install/windows/anaconda_install_folder.png" alt="install config" width="500" align=" left"/>
- Check Conda to add environment variables and ignore the warning
<img src="../install/windows/anaconda_install_env.png" alt="add conda to path" width="500" align="center"/>
#### 1.1.2 Opening the terminal and creating the Conda environment
- Open Anaconda Prompt terminal: bottom left Windows Start Menu -> Anaconda3 -> Anaconda Prompt start console
<img src="../install/windows/anaconda_prompt.png" alt="anaconda download" width="300" align="center"/>
- Create a new Conda environment
```shell
# Enter the following command at the command line to create an environment named paddle_env
@@ -70,7 +70,7 @@ Recommended working environment:
<img src="../install/windows/conda_new_env.png" alt="conda create" width="700" align="center"/>
- To activate the Conda environment you just created, enter the following command at the command line.
```shell
# Activate the paddle_env environment
@@ -91,7 +91,7 @@ The above anaconda environment and python environment are installed
#### 1.2.1 Installing Anaconda
- Note: To use PaddlePaddle you need to install a Python environment first; here we choose the Python integrated environment Anaconda toolkit
- Anaconda is a common Python package manager
- After installing Anaconda, you can install the Python environment, as well as numpy and other required toolkits
@@ -108,17 +108,17 @@ The above anaconda environment and python environment are installed
- Just follow the default settings; it will take a while to install
- It is recommended to install a code editor such as VSCode or PyCharm
#### 1.2.2 Open a terminal and create a Conda environment
- Open the terminal
- Press command and spacebar at the same time, type "terminal" in the focus search, double click to enter terminal
- **Add Conda to the environment variables**
- Environment variables are added so that the system can recognize the Conda command
- Open `~/.bash_profile` in the terminal by typing the following command.
@@ -126,7 +126,7 @@ The above anaconda environment and python environment are installed
vim ~/.bash_profile
```
- Add Conda as an environment variable in `~/.bash_profile`.
```shell
# Press i first to enter edit mode
@@ -156,12 +156,12 @@ The above anaconda environment and python environment are installed
- When you are done, press `esc` to exit edit mode, then type `:wq!` and enter to save and exit
- Verify that the Conda command is recognized.
- Enter `source ~/.bash_profile` in the terminal to update the environment variables
- Enter `conda info --envs` in the terminal again; if it shows that there is a base environment, then Conda has been added to the environment variables
- Create a new Conda environment
```shell
# Enter the following command at the command line to create an environment called paddle_env
@@ -175,7 +175,7 @@ The above anaconda environment and python environment are installed
- <img src="../install/mac/conda_create.png" alt="conda_create" width="600" align="center"/>
- To activate the Conda environment you just created, enter the following command at the command line.
```shell
# Activate the paddle_env environment
@@ -198,7 +198,7 @@ Linux users can choose to run either Anaconda or Docker. If you are familiar wit
#### 1.3.1 Anaconda environment configuration
- Note: To use PaddlePaddle you need to install a Python environment first; here we choose the Python integrated environment Anaconda toolkit
- Anaconda is a common Python package manager
- After installing Anaconda, you can install the Python environment, as well as numpy and other required toolkits
@@ -214,9 +214,9 @@ Linux users can choose to run either Anaconda or Docker. If you are familiar wit
- Select the appropriate version for your operating system
- Type `uname -m` in the terminal to check the command set used by your system
- Download method 1: Download locally, then transfer the installation package to the Linux server
- Download method 2: Directly use the Linux command line to download
```shell
# First install wget
@@ -277,12 +277,12 @@ Linux users can choose to run either Anaconda or Docker. If you are familiar wit
- When you are done, press `esc` to exit edit mode, then type `:wq!` and enter to save and exit
- Verify that the Conda command is recognized.
- Enter `source ~/.bash_profile` in the terminal to update the environment variables
- Enter `conda info --envs` in the terminal again; if it shows that there is a base environment, then Conda has been added to the environment variables
- Create a new Conda environment
```shell
# Enter the following command at the command line to create an environment called paddle_env
@@ -296,7 +296,7 @@ Linux users can choose to run either Anaconda or Docker. If you are familiar wit
<img src="../install/linux/conda_create.png" alt="conda_create" width="500" align="center"/>
- To activate the Conda environment you just created, enter the following command at the command line.
```shell
# Activate the paddle_env environment
@@ -335,13 +335,13 @@ sudo docker container exec -it ppocr /bin/bash
## 2. Install PaddlePaddle 2.0
- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- If you have no available GPU on your machine, please run the following command to install the CPU version
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
...
@@ -139,7 +139,7 @@ tar xf ch_ppocr_mobile_v2.0_det_infer.tar
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/"
```
The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_00018069.jpg)
@@ -244,7 +244,7 @@ The visualized text detection results are saved to the `./inference_results` fol
<a name="RECOGNITION_MODEL_INFERENCE"></a>
## 3. Text Recognition Model Inference
The following will introduce the lightweight Chinese recognition model inference, as well as other CTC-based and Attention-based text recognition model inference. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss. In practice, it is also found that the result of the model based on Attention loss is not as good as the one based on CTC loss. In addition, if the character dictionary is modified during training, make sure that you use the same character set during inference. Please check below for details.
<a name="LIGHTWEIGHT_RECOGNITION"></a>
...
@@ -7,7 +7,7 @@ This article introduces the use of the Python inference engine for the PP-OCR mo
- [Text Detection Model Inference](#DETECTION_MODEL_INFERENCE)
- [Text Recognition Model Inference](#RECOGNITION_MODEL_INFERENCE)
- [1. Lightweight Chinese Recognition Model Inference](#LIGHTWEIGHT_RECOGNITION)
- [2. Multilingual Model Inference](#MULTILINGUAL_MODEL_INFERENCE)
- [Angle Classification Model Inference](#ANGLE_CLASS_MODEL_INFERENCE)
- [Text Detection Angle Classification and Recognition Inference Concatenation](#CONCATENATION)
@@ -25,7 +25,7 @@ tar xf ch_PP-OCRv2_det_infer.tar
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_PP-OCRv2_det_infer.tar/"
```
The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_00018069.jpg)
@@ -75,7 +75,7 @@ Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.9897658)
<a name="MULTILINGUAL_MODEL_INFERENCE"></a>
### 2. Multilingual Model Inference
If you need to predict [other language models](./models_list_en.md#Multilingual), when using the inference model for prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results,
you need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/fonts` path, such as Korean recognition:
...
## QUICK INSTALLATION
After testing, PaddleOCR can run on glibc 2.23. You can also test other glibc versions or install glibc 2.23 for the best compatibility.
PaddleOCR working environment:
- PaddlePaddle 2.0.0
- Python 3.7
- glibc 2.23
It is recommended to use the Docker image provided by us to run PaddleOCR. Please refer to the Docker tutorial [link](https://www.runoob.com/docker/docker-tutorial.html/).
*If you want to directly run the prediction code on Mac or Windows, you can start from step 2.*
**1. (Recommended) Prepare a Docker environment. The first time you use this Docker image, it will be downloaded automatically. Please be patient.**
```
# Switch to the working directory
cd /home/Projects
@@ -22,7 +22,7 @@ cd /home/Projects
sudo docker run --name ppocr -v $PWD:/paddle --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
```
With CUDA 10, please run the following command to create a container.
It is recommended to set a shared memory greater than or equal to 32G through the --shm-size parameter:
```
sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
@@ -51,11 +51,11 @@ For more software version requirements, please refer to the instructions in [Ins
# Recommend
git clone https://github.com/PaddlePaddle/PaddleOCR
# If you cannot pull successfully due to network problems, you can switch to the mirror hosted on Gitee:
git clone https://gitee.com/paddlepaddle/PaddleOCR
# Note: The mirror on Gitee may not stay in sync with the latest updates of the GitHub project. There might be a delay of 3-5 days. Please try GitHub first.
```
**4. Install third-party libraries**
@@ -66,6 +66,6 @@ pip3 install -r requirements.txt
If you get the error `OSError: [WinError 126] The specified module could not be found` when installing Shapely on Windows,
please try to download the Shapely whl file from [http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).
Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)
...
@@ -7,13 +7,13 @@ This section contains two parts. Firstly, [PP-OCR Model Download](./models_list_
Let's first understand some basic concepts.
- [Introduction to OCR](#introduction-to-ocr)
* [Basic Concepts of OCR Detection Model](#basic-concepts-of-ocr-detection-model)
* [Basic Concepts of OCR Recognition Model](#basic-concepts-of-ocr-recognition-model)
* [PP-OCR Model](#pp-ocr-model)
## 1. Introduction to OCR
This section briefly introduces the basic concepts of the OCR detection model and recognition model, and introduces PaddleOCR's PP-OCR model.
...
# OCR Model List(V2.1, updated on 2021.9.6)
> **Note**
> 1. Compared with model v2.0, the 2.1 version of the detection model has an improvement in accuracy, and the 2.1 version of the recognition model has optimizations in accuracy and CPU speed.
> 2. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with the static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance.
> 3. All models in this tutorial are ppocr-series models; for more introduction of algorithms and models based on public datasets, you can refer to the [algorithm overview tutorial](./algorithm_overview_en.md).
@@ -18,7 +18,7 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine
|--- | --- | --- |
|inference model|inference.pdmodel、inference.pdiparams|Used for inference based on Paddle inference engine,[detail](./inference_en.md)|
|trained model, pre-trained model|\*.pdparams、\*.pdopt、\*.states |The checkpoints model saved in the training process, which stores the parameters of the model, mostly used for model evaluation and continuous training.|
|slim model|\*.nb| Model compressed by PaddleSlim (a model compression tool using PaddlePaddle), which is suitable for mobile-side deployment scenarios (Paddle-Lite is needed for slim model deployment). |
The relationship of the above models is as follows.
@@ -50,7 +50,7 @@ Relationship of the above models is as follows.
|ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
**Note:** The `trained model` is fine-tuned on the `pre-trained model` with real data and synthesized vertical text data, which achieves better performance in real scenes. The `pre-trained model` is directly trained on the full amount of real data and synthesized data, which is more suitable for fine-tuning on your own dataset.
<a name="English"></a>
### 2.2 English Recognition Model
...
@@ -28,12 +28,12 @@ The multilingual models cover Latin, Arabic, Traditional Chinese, Korean, Japane
This document will briefly introduce how to use the multilingual model.
- [1 Installation](#Install)
- [1.1 Paddle installation](#paddleinstallation)
- [1.2 PaddleOCR package installation](#paddleocr_package_install)
- [2 Quick Use](#Quick_Use)
- [2.1 Command line operation](#Command_line_operation)
- [2.2 Run with Python script](#python_Script_running)
- [3 Custom Training](#Custom_Training)
- [4 Inference and Deployment](#inference)
- [5 Supported languages and abbreviations](#language_abbreviations)
@@ -42,7 +42,7 @@ This document will briefly introduce how to use the multilingual model.
## 1 Installation
<a name="paddle_install"></a>
### 1.1 Paddle installation
```
# cpu
pip install paddlepaddle
@@ -52,7 +52,7 @@ pip install paddlepaddle-gpu
```
<a name="paddleocr_package_install"></a>
### 1.2 PaddleOCR package installation
pip install
@@ -79,8 +79,8 @@ paddleocr -h
* Whole image prediction (detection + recognition)
PaddleOCR currently supports 80 languages, which can be specified by the --lang parameter.
The supported languages are listed in the [table](#language_abbreviations).
``` bash
paddleocr --image_dir doc/imgs_en/254.jpg --lang=en
@@ -90,7 +90,7 @@ paddleocr --image_dir doc/imgs_en/254.jpg --lang=en
<img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600">
</div>
The result is a list. Each item contains a text box, text and recognition confidence:
```text
[('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]]
[('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]]
@@ -110,7 +110,7 @@ paddleocr --image_dir doc/imgs_words_en/word_308.png --det false --lang=en
![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.1/doc/imgs_words_en/word_308.png)
The result is a 2-tuple, which contains the recognition result and recognition confidence:
```text
(0.99879867, 'LITTLE')
@@ -122,7 +122,7 @@ The result is a tuple, which returns the recognition result and recognition conf
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --rec false
```
The result is a list. Each item represents the coordinates of a text box.
```
[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]]
@@ -132,9 +132,9 @@ The result is a list, each item contains only text boxes
```
<a name="python_script_running"></a>
### 2.2 Run with Python script
PPOCR can also run from Python scripts for easy integration into your own code:
* Whole image prediction (detection + recognition)
@@ -167,12 +167,12 @@ Visualization of results:
![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.1/doc/imgs_results/korean.jpg)
PPOCR also supports direction classification. For more detailed usage, please refer to the [whl package instructions](whl_en.md).
<a name="Custom_training"></a>
## 3 Custom training
PPOCR supports using your own data for custom training or fine-tuning; for the recognition model, you can refer to the [French configuration file](../../configs/rec/multi_language/rec_french_lite_train.yml) and
modify the training data path, dictionary and other parameters.
For the specific data preparation and training process, please refer to [Text Detection](../doc_en/detection_en.md) and [Text Recognition](../doc_en/recognition_en.md); for more functions such as predictive deployment,
@@ -183,7 +183,7 @@ For functions such as data annotation, you can read the complete [Document Tutor
## 4 Inference and Deployment
In addition to installing the whl package for quick prediction,
PPOCR also provides a variety of prediction deployment methods.
If necessary, you can read related documents:
- [Python Inference](./inference_en.md)
...
@@ -2,7 +2,7 @@
## 1. PaddleOCR Overview
PaddleOCR contains rich text detection, text recognition and end-to-end algorithms. With experience from real-world scenarios and the industry, PaddleOCR chooses DB and CRNN as the basic detection and recognition models, and proposes a series of models, named PP-OCR, for industrial applications after a series of optimization strategies. The PP-OCR model is aimed at general scenarios and forms a model library of different languages. Based on the capabilities of PP-OCR, PaddleOCR releases the PP-Structure toolkit for document scene tasks, including two major tasks: layout analysis and table recognition. In order to get through the entire process of industrial landing, PaddleOCR provides large-scale data production tools and a variety of prediction deployment tools to help developers quickly turn ideas into reality.
<div align="center">
<img src="../overview_en.png">
@@ -18,11 +18,11 @@ PaddleOCR contains rich text detection, text recognition and end-to-end algorith
# Recommend
git clone https://github.com/PaddlePaddle/PaddleOCR
# If you cannot pull successfully due to network problems, you can switch to the mirror hosted on Gitee:
git clone https://gitee.com/paddlepaddle/PaddleOCR
# Note: The mirror on Gitee may not stay in sync with the latest project on GitHub. There might be a delay of 3-5 days. Please try GitHub first.
```
### **2.2 Install third-party libraries**
@@ -34,6 +34,6 @@ pip3 install -r requirements.txt
If you get the error `OSError: [WinError 126] The specified module could not be found` when installing Shapely on Windows,
please try to download the Shapely whl file from [http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).
Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)
@@ -6,18 +6,18 @@
<a name="Brief_Introduction"></a>
## 1. Brief Introduction
OCR algorithms can be divided into two categories: two-stage algorithms and end-to-end algorithms. A two-stage OCR algorithm is generally divided into two parts, a text detection algorithm and a text recognition algorithm. The text detection algorithm locates the box of the text line in the image, and then the recognition algorithm identifies the content of the text box. An end-to-end OCR algorithm combines text detection and recognition in one algorithm. Its basic idea is to design a model with both a detection unit and a recognition module, share the CNN features of both and train them together. Because one algorithm can complete character recognition, the end-to-end model is smaller and faster.
### Introduction Of PGNet Algorithm
In recent years, end-to-end OCR algorithms have developed well, including the MaskTextSpotter series, TextSnake, TextDragon, the PGNet series and so on. Among these algorithms, PGNet has some advantages over the others:
- PGNet loss is designed to guide training, and no character-level annotations are needed.
- NMS and ROI related operations are not needed, which accelerates prediction.
- The reading order prediction module is proposed.
- A graph refinement module (GRM) is proposed to further improve the performance of model recognition.
- Higher accuracy and faster prediction speed.
For details of the PGNet algorithm, please refer to the [paper](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf). The schematic diagram of the algorithm is as follows:
![](../pgnet_framework.png)
After feature extraction, the input image is sent to four branches: the TBO module for text edge offset prediction, the TCL module for text center-line prediction, the TDO module for text direction offset prediction, and the TCC module for text character classification graph prediction.
The outputs of TBO and TCL yield text detection results after post-processing, and TCL, TDO and TCC are responsible for text recognition.
The results of detection and recognition are as follows:
@@ -40,7 +40,7 @@ Please refer to [Operation Environment Preparation](./environment_en.md) to conf
<a name="Quick_Use"></a>
## 3. Quick Use
### Inference model download
This section takes the trained end-to-end model as an example to quickly use model prediction. First, download the trained end-to-end inference model from the [download address](https://paddleocr.bj.bcebos.com/dygraph_v2.0/pgnet/e2e_server_pgnetA_infer.tar)
```
mkdir inference && cd inference
@@ -131,7 +131,7 @@ python3 tools/train.py -c configs/e2e/e2e_r50_vd_pg.yml -o Optimizer.base_lr=0.0
```
#### Load trained model and continue training
If you would like to load the trained model and continue training, you can specify the parameter `Global.checkpoints` as the model path to be loaded.
```shell
python3 tools/train.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.checkpoints=./your/trained/model
```
...
...@@ -12,15 +12,15 @@ ...@@ -12,15 +12,15 @@
* [4. FAQ](#3-faq) * [4. FAQ](#3-faq)
This article will introduce the basic concepts that need to be mastered during model training and the tuning methods during training. This article will introduce the basic concepts that is necessary for model training and tuning.
At the same time, it will briefly introduce the components of the PaddleOCR model training data and how to prepare the data finetune model in the vertical scene. At the same time, it will briefly introduce the structure of the training data and how to prepare the data to fine-tune model in vertical scenes.
<a name="1-Yml-Configuration"></a> <a name="1-Yml-Configuration"></a>
## 1. Yml Configuration ## 1. Yml Configuration
The PaddleOCR model uses configuration files to manage network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file, and then builds a complete training process to complete the model training. When optimized, the configuration can be completed by modifying the parameters in the configuration file, which is simple to use and convenient to modify. The PaddleOCR uses configuration files to control network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file, and then builds a complete training process to train the model. Fine-tuning can also be completed by modifying the parameters in the configuration file, which is simple and convenient.
For the complete configuration file description, please refer to [Configuration File](./config_en.md) For the complete configuration file description, please refer to [Configuration File](./config_en.md)
...@@ -28,13 +28,13 @@ For the complete configuration file description, please refer to [Configuration ...@@ -28,13 +28,13 @@ For the complete configuration file description, please refer to [Configuration
## 2. Basic Concepts
During model training, some hyper-parameters can be manually adjusted to help the model reach the optimal result at the least cost. Different data volumes may require different hyper-parameters. When you want to fine-tune the model on your own data, there are several parameter adjustment strategies for reference:
<a name="11-learning-rate"></a>
### 2.1 Learning Rate
The learning rate is one of the most important hyper-parameters for training neural networks. It represents the step size with which the gradient moves towards the optimal solution of the loss function in each iteration.
PaddleOCR provides a variety of learning rate update strategies, which can be specified in the configuration file. For example:
```
Optimizer:
  lr:
    name: Piecewise
    ...
    warmup_epoch: 5
```
`Piecewise` stands for piecewise constant decay: different learning rates are specified for different learning stages, and the learning rate stays the same within each stage.
`warmup_epoch: 5` means that in the first 5 epochs, the learning rate gradually increases from 0 to `base_lr`. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
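To make the interaction between warm-up and piecewise decay concrete, here is a minimal sketch in plain Python, independent of PaddleOCR's actual implementation; the boundary epochs and stage values are illustrative assumptions:
```python
def lr_at_epoch(epoch, base_lr=0.001, warmup_epoch=5,
                decay_epochs=(700, 800), values=(0.001, 0.0001, 0.00001)):
    """Illustrative schedule: linear warm-up, then piecewise constant decay."""
    if epoch < warmup_epoch:
        # Warm-up: ramp linearly from 0 up to base_lr over the first epochs.
        return base_lr * epoch / warmup_epoch
    # Piecewise constant: use the value of the stage this epoch falls into.
    for boundary, value in zip(decay_epochs, values):
        if epoch < boundary:
            return value
    return values[-1]

# Epoch 2 is in warm-up (LR 0.0004); epochs 100, 750, 900 hit the three stages.
for e in (2, 100, 750, 900):
    print(e, lr_at_epoch(e))
```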
<a name="12-regularization"></a>
### 2.2 Regularization
Regularization can effectively prevent over-fitting. PaddleOCR provides L1 and L2 regularization, the two most widely used regularization methods. L1 regularization adds a regularization term to the objective function that reduces the sum of the absolute values of the parameters, while L2 regularization adds a term that reduces the sum of the squared parameters.
The configuration method is as follows:
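A minimal sketch of the relevant section, following PaddleOCR's `Optimizer` configuration convention (the `factor` value here is illustrative and should be tuned for your task):
```
Optimizer:
  ...
  regularizer:
    name: L2
    factor: 2.0e-05
```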
The current open-source models, data sets, and magnitudes are as follows:
- Chinese data set: the LSVT street-view data set, with images cropped according to the ground truth and position-calibrated, 30w images in total. In addition, 500w images were synthesized based on the LSVT corpus.
- Small language data sets: 100w synthetic images were generated for each language using different corpora and fonts, with ICDAR-MLT as the validation set.
Among them, the public data sets are all open source; users can search and download them by themselves, or refer to the [Chinese data set](../doc_ch/datasets.md). The synthetic data is not open source, but users can synthesize it themselves with open-source synthesis tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
<a name="22-vertical-scene"></a>
- 2021.8.3 released PaddleOCR v2.2, adding a new structured document analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), which supports layout analysis and table recognition (one-key export of chart images to Excel files).
- 2021.4.8 released the end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf), published in AAAI 2021 (find the tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md)); released multi-language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), supporting recognition of more than 80 languages; especially, the performance of the [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) was optimized.
- 2021.1.21 updated more than 25 multilingual recognition models ([models list](./models_list_en.md)), including English, Chinese, German, French, Japanese, Spanish, Portuguese, Russian, Arabic, and so on. Models for more languages will continue to be updated ([Develop Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048)).
- 2020.12.15 updated the data synthesis tool, i.e., [Style-Text](../../StyleText/README.md), which makes it easy to synthesize a large number of images similar to the target scene images.
- 2020.11.25 updated a new data annotation tool, i.e., [PPOCRLabel](../../PPOCRLabel/README.md), which helps improve labeling efficiency. Moreover, the labeling results can be used directly in training the PP-OCR system.
- 2020.9.22 updated the PP-OCR technical article, https://arxiv.org/abs/2009.09941