Referring to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation results of the above text recognition algorithms (trained on MJSynth and SynthText, and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP and CUTE) are as follows:
...
@@ -55,5 +55,8 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
For the training guide and usage of PaddleOCR text recognition algorithms, please refer to the document [Text recognition model training/evaluation/prediction](./recognition_en.md).
Angle classification is used when the image is not oriented at 0 degrees; in this case, the text lines detected in the image need to be corrected. In the PaddleOCR system, the text line image obtained after text detection is sent to the recognition model after an affine transformation. At this point, only a 0 or 180 degree classification of the text is required, so the built-in PaddleOCR text angle classifier **only supports 0 and 180 degree classification**. If you want to support more angles, you can modify the algorithm yourself.
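As a quick illustration, a single cropped text line can be classified with the standalone script `tools/infer/predict_cls.py`; the image and model paths below are placeholders for your own files:

```bash
# Classify the orientation (0 or 180 degrees) of a single text-line image.
# --cls_model_dir is a placeholder for an exported angle classifier inference model.
python3 tools/infer/predict_cls.py \
    --image_dir="./doc/imgs_words/ch/word_1.jpg" \
    --cls_model_dir="./inference/cls/"
```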
@@ -5,7 +5,8 @@ The inference model (the model saved by `paddle.jit.save`) is generally a solidi
...
The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
Compared with the checkpoints model, the inference model additionally saves the structural information of the model. Therefore, it is easier to deploy, because the model structure and model parameters are already solidified in the inference model file, and it is suitable for integration with actual systems.
For more details, please refer to the document [Classification Framework](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/docs/zh_CN/extension/paddle_mobile_inference.md).
Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, angle classification, and their concatenation based on the inference model.
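As a preview of the conversion step, a minimal sketch using `tools/export_model.py` (the config, checkpoint, and output paths are placeholders, and the option names follow the 2.0-era scripts):

```bash
# Convert a trained recognition checkpoint into an inference model (paths are assumptions).
python3 tools/export_model.py \
    -c configs/rec/rec_icdar15_train.yml \
    -o Global.pretrained_model=./output/rec_CRNN/best_accuracy \
       Global.save_inference_dir=./inference/rec_crnn/
```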
...
@@ -25,6 +26,7 @@ Next, we first introduce how to convert a trained model into an inference model,
- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
    - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
    - [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
    - [3. SRN-BASED TEXT RECOGNITION MODEL INFERENCE](#SRN-BASED_RECOGNITION)
    - [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
    - [5. MULTILINGUAL MODEL INFERENCE](#MULTILINGUAL_MODEL_INFERENCE)
...
@@ -146,7 +148,7 @@ The visual text detection results are saved to the ./inference_results folder by


You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image.
The optional values of `limit_type` are [`max`, `min`], and
`det_limit_side_len` is a positive integer, generally set to a multiple of 32, such as 960.
The default setting of the parameters is `limit_type='max', det_limit_side_len=960`, which means that the longest side of the network input image cannot exceed 960.
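As a sketch, these values can also be overridden when launching detection inference from the command line; the image and model paths are placeholders, and the flags are the `det_`-prefixed variants exposed by the inference scripts:

```bash
# Allow a longer input side (1216 instead of the default 960); paths are assumptions.
python3 tools/infer/predict_det.py \
    --image_dir="./doc/imgs/00018069.jpg" \
    --det_model_dir="./inference/det_db/" \
    --det_limit_type=max \
    --det_limit_side_len=1216
```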
### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
If the text dictionary is modified during training, you need to specify the dictionary path used by `--rec_char_dict_path` when using the inference model to predict, and set `rec_char_type=ch`.
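A minimal sketch of such a call (the recognition model directory and dictionary file are placeholders):

```bash
python3 tools/infer/predict_rec.py \
    --image_dir="./doc/imgs_words/ch/word_1.jpg" \
    --rec_model_dir="./inference/rec_crnn/" \
    --rec_char_type=ch \
    --rec_char_dict_path="./your_dict.txt"
```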
If you need to predict models for other languages, you also need to specify the dictionary path used by `--rec_char_dict_path` when using the inference model to predict. At the same time, in order to get the correct visualization results,
you need to specify the visualization font path through `--vis_font_path`. Fonts for several languages are provided by default under the `doc/fonts` path, for example Korean recognition.
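A possible command, assuming a Korean recognition inference model has been exported to `./inference/korean_rec/`:

```bash
python3 tools/infer/predict_rec.py \
    --image_dir="./doc/imgs_words/korean/1.jpg" \
    --rec_model_dir="./inference/korean_rec/" \
    --rec_char_type=korean \
    --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" \
    --vis_font_path="doc/fonts/korean.ttf"
```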
> **Note**
> 1. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which were trained with the static graph programming paradigm, models 2.0 are the dynamic-graph-trained versions and achieve comparable performance.
> 2. All models in this tutorial are ppocr-series models. For an introduction to more algorithms and models trained on public datasets, please refer to the [algorithm overview tutorial](./algorithm_overview_en.md).
- [1. Text Detection Model](#Detection)
- [2. Text Recognition Model](#Recognition)
...
@@ -12,9 +14,13 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine
|model type|model format|description|
|--- | --- | --- |
|inference model|inference.pdmodel, inference.pdiparams|Used for inference based on the Paddle inference engine, [detail](./inference_en.md)|
|trained model, pre-trained model|\*.pdparams, \*.pdopt, \*.states|The checkpoints model saved during training, which stores the parameters of the model and is mostly used for model evaluation and continued training.|
|slim model|\*.nb|Model compressed by PaddleSlim (a model compression tool based on PaddlePaddle), suitable for mobile-side deployment scenarios (Paddle-Lite is needed for slim model deployment).|
<a name="DATA_PREPARATION"></a>
### 1 DATA PREPARATION
PaddleOCR supports two data formats:
- `LMDB` is used to train datasets stored in LMDB format;
- `general data` is used to train datasets stored in text files.
Please organize the dataset as follows:
The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a soft link to the dataset directory.
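For example (the dataset path is a placeholder for your own location):

```bash
# Link an existing dataset into the default training-data directory.
ln -sf <path/to/your/dataset> ./train_data/dataset
```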
<a name="Costom_Dataset"></a>
#### 1.1 Custom dataset
If you want to use your own data for training, please refer to the following to organize your data.
- Training set
It is recommended to put the training images in the same folder, and use a txt file (rec_gt_train.txt) to store the image path and label. The contents of the txt file are as follows:
* Note: by default, the image path and image label are separated by \t. If you use another separator, it will cause a training error.
```
" Image file name           Image annotation "
train_data/rec/train/word_001.jpg 简单可依赖
train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
```
...
The final training set should have the following file structure:
```
|-train_data
  |-rec
    |- rec_gt_train.txt
    |- train
      |- word_001.png
...
@@ -82,6 +73,7 @@ Similar to the training set, the test set also needs to be provided a folder con
```
|-train_data
  |-rec
    |-ic15_data
      |- rec_gt_test.txt
      |- test
...
@@ -90,8 +82,25 @@ Similar to the training set, the test set also needs to be provided a folder con
        |- word_003.jpg
        | ...
```
<a name="Dataset_download"></a>
#### 1.2 Dataset download
If you do not have a dataset locally, you can download it from the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the LMDB-format dataset required for the benchmark.
If you want to reproduce the paper results of SRN, you need to download the offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotating and perturbing MJSynth and SynthText. Please unzip the data to the {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/ path.
PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
...
@@ -108,6 +117,8 @@ n
In `word_dict.txt`, there is a single character on each line, which maps characters and numeric indices together; e.g. "and" will be mapped to [2 5 1].
PaddleOCR has built-in dictionaries, which can be used on demand.
`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.
`ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters.
...
@@ -123,10 +134,8 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a
`ppocr/utils/dict/en_dict.txt` is an English dictionary with 63 characters.
The current multi-language model is still in the demo stage, and we will continue to optimize the model and add more languages. **You are very welcome to provide us with dictionaries and fonts in other languages**;
if you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) and we will thank you in the Repo.
To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.
...
@@ -136,14 +145,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
If you need to customize the dict file, please add the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` to point to your dictionary path, and set `character_type` to `ch`.
<a name="Add_space_category"></a>
#### 1.4 Add space category
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
**Note: `use_space_char` only takes effect when `character_type=ch`.**
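Besides editing the yml file directly, the field can usually be overridden on the command line with `-o`; a sketch, assuming the Chinese lite config referenced later in this document:

```bash
python3 tools/train.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml \
    -o Global.use_space_char=True Global.character_type=ch
```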
<a name="TRAINING"></a>
### 2 TRAINING
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.
Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
<a name="Training"></a>
#### 2.2 Training
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, evaluation is performed every 500 iterations, and the best-accuracy model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process.
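A minimal sketch of launching training with this configuration (GPU ids are illustrative; outputs go to the `save_model_dir` set in the config):

```bash
# Single-GPU training
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml

# Multi-GPU training via paddle.distributed.launch
python3 -m paddle.distributed.launch --gpus '0,1,2,3' \
    tools/train.py -c configs/rec/rec_icdar15_train.yml
```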
...
@@ -195,6 +204,10 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
For training Chinese data, it is recommended to use [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try the results of other algorithms on the Chinese dataset, please refer to the following instructions to modify the configuration file:
...
@@ -264,7 +277,7 @@ Eval:
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
<a name="Multi_language"></a>
#### 2.3 Multi-language
PaddleOCR currently supports recognition for 26 languages (besides Chinese). A multi-language configuration file template is provided under the path `configs/rec/multi_language`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).
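As a sketch, training a specific language from this template generally means pointing the config at that language's dictionary; the French settings below are illustrative assumptions passed via `-o` overrides:

```bash
python3 tools/train.py -c configs/rec/multi_language/rec_multi_language_lite_train.yml \
    -o Global.character_type=french \
       Global.character_dict_path=ppocr/utils/dict/french_dict.txt
```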
...
@@ -416,7 +429,7 @@ Eval:
```
<a name="EVALUATION"></a>
### 3 EVALUATION
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
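A sketch of running evaluation against a saved checkpoint (the checkpoint path is an assumption):

```bash
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py \
    -c configs/rec/rec_icdar15_train.yml \
    -o Global.checkpoints=./output/rec_CRNN/best_accuracy
```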
pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
...
@@ -12,9 +12,11 @@ build own whl package and install
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
```
## 2 Use
### 2.1 Use by code
The paddleocr whl package will automatically download the ppocr lightweight model as the default model, which can be customized and replaced according to section 3 **Use custom model**.
* detection, angle classification and recognition
```python
from paddleocr import PaddleOCR, draw_ocr
# Paddleocr supports Chinese, English, French, German, Korean and Japanese.
...
@@ -163,7 +165,7 @@ Output will be a list, each item contains classification result and confidence
['0', 0.99999964]
```
### 2.2 Use by command line
show help information
```bash
...
@@ -239,11 +241,11 @@ Output will be a list, each item contains classification result and confidence
['0', 0.99999964]
```
## 3 Use custom model
When the built-in model cannot meet your needs, you need to use your own trained model.
First, refer to the first section of [inference_en.md](./inference_en.md) to convert your det and rec models to inference models, and then use them as follows:
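For example, on the command line the converted models can be passed in explicitly; all directories below are placeholders for your own exported models:

```bash
paddleocr --image_dir ./doc/imgs/11.jpg \
    --det_model_dir ./inference/det_db/ \
    --rec_model_dir ./inference/rec_crnn/ \
    --rec_char_dict_path ./your_dict.txt \
    --use_angle_cls true --cls_model_dir ./inference/cls/
```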