Generally, a more complex model achieves better performance on a task, but it also introduces redundancy. Model pruning is a technique that reduces this redundancy by removing redundant sub-structures from the neural network, so as to reduce the computational complexity of the model and improve inference performance.
This example uses the [pruning APIs](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) provided by PaddleSlim to compress the OCR model.
[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) is an open-source library that integrates model pruning, quantization (including quantization-aware training and offline quantization), distillation, neural architecture search, and many other model compression techniques that are commonly used and leading in the industry.
It is recommended that you read the following pages before this example:
1. [PaddleOCR training methods](../../../doc/doc_ch/quickstart.md)
2. [The demo of pruning](https://github.com/PaddlePaddle/PaddleSlim/blob/release%2F2.0.0/docs/zh_cn/tutorials/pruning/dygraph/filter_pruning.md)
PaddleOCR also provides a series of [models](../../../doc/doc_en/models_list_en.md). Developers can choose the provided models or train their own models according to their needs.
### 3. Pruning sensitivity analysis
After the pre-trained model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each layer, and a sensitivity file named `sen.pickle` is saved. After that, you can load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determine the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
The function returns a dict after loading the sensitivity file. The keys of the dict are the parameter names of each layer, and the value of each key is the pruning sensitivity information of the corresponding layer. For example, pruning 10% of the filters of the layer corresponding to `conv10_expand_weights` would lead to a 0.65% degradation of model performance. The details can be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86)
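To make the structure of the loaded sensitivity data concrete, here is a minimal sketch; it assumes PaddleSlim exposes `load_sensitivities` and `get_ratios_by_loss` under `paddleslim.prune` (check your installed version):

```python
# Minimal sketch, assuming paddleslim.prune provides these helpers.
from paddleslim.prune import load_sensitivities, get_ratios_by_loss

# sen is a dict: {parameter_name: {pruned_ratio: expected_accuracy_loss}}
sen = load_sensitivities("sen.pickle")
print(sen["conv10_expand_weights"])  # e.g. {0.1: 0.0065, ...}

# Pick a pruning ratio per layer so that the expected accuracy loss stays below 0.5%.
ratios = get_ratios_by_loss(sen, loss=0.005)
print(ratios)  # {parameter_name: pruned_ratio}
```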
Enter the PaddleOCR root directory and perform sensitivity analysis on the model with the following command:
**Note:** Additional data, such as icdar2013, icdar2017, COCO-Text, and ArT, were added to the training of the SAST model. Download the English public datasets, organized in the format used by PaddleOCR, from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./detection_en.md)
Angle classification is used in scenes where the image is not at 0 degrees. In such scenes, the text lines detected in the picture need to be corrected. In the PaddleOCR system, the text line image obtained after text detection is sent to the recognition model after an affine transformation. At this point, only 0 and 180 degree classification of the text is required, so the built-in PaddleOCR text angle classifier **only supports 0 and 180 degree classification**. If you want to support more angles, you can modify the algorithm yourself.
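As an illustration only, a hedged sketch of how the 0/180 degree classifier is typically enabled through the `paddleocr` whl package described later in this document (the image path is a placeholder; verify the argument names against your installed version):

```python
from paddleocr import PaddleOCR

# Enable the built-in text angle classifier (0 and 180 degrees only).
ocr = PaddleOCR(use_angle_cls=True, lang="ch")

# cls=True runs the classifier on each detected text line before recognition,
# flipping lines that are predicted to be rotated by 180 degrees.
result = ocr.ocr("your_image.jpg", cls=True)
for line in result:
    print(line)
```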
...

The inference model (the model saved by `paddle.jit.save`) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment.
The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
Compared with the checkpoints model, the inference model additionally saves the structural information of the model. Therefore, it is easier to deploy, because the model structure and model parameters are already solidified in the inference model file, and it is suitable for integration with actual systems.
For more details, please refer to the document [Classification Framework](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/docs/zh_CN/extension/paddle_mobile_inference.md).
Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, angle classification, and their concatenation based on the inference model.
...
The visual text detection results are saved to the ./inference_results folder by default.


You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image. The optional values of `limit_type` are [`max`, `min`], and `det_limit_side_len` is a positive integer, generally set to a multiple of 32, such as 960. The default setting of the parameters is `limit_type='max', det_limit_side_len=960`, which means that the longest side of the network input image cannot exceed 960.
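A hedged sketch of adjusting these limits through the `paddleocr` whl package (assuming it exposes `det_limit_type` and `det_limit_side_len` as keyword arguments; the image path is a placeholder):

```python
from paddleocr import PaddleOCR

# Allow the longest side of the network input image to reach 1216
# (a multiple of 32) instead of the default limit_type='max', det_limit_side_len=960.
ocr = PaddleOCR(det_limit_type="max", det_limit_side_len=1216)
result = ocr.ocr("your_image.jpg")
```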
<aname="DATA_PREPARATION"></a>
<aname="DATA_PREPARATION"></a>
### DATA PREPARATION
PaddleOCR supports two data formats:
- `LMDB` is used to train datasets stored in LMDB format;
- `general data` is used to train datasets stored in text files.
Please organize the dataset as follows:
The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a soft link to the dataset directory:
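For example, a hedged Python sketch that creates such a soft link (both paths are placeholders; on Linux or macOS you would typically just use `ln -s`):

```python
import os

# Link an existing dataset directory into PaddleOCR/train_data (placeholder paths).
os.symlink("/path/to/your/dataset", "train_data/dataset")
```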
<aname="Costom_Dataset"></a>
<aname="Costom_Dataset"></a>
#### 1.1 Custom dataset
If you want to use your own data for training, please refer to the following to organize your data.
- Training set
It is recommended to put the training images in the same folder, and use a txt file (rec_gt_train.txt) to store the image path and label. The contents of the txt file are as follows:
* Note: By default, the image path and image label are separated by `\t`. Using any other separator will cause a training error.
```
" Image file name                  Image annotation "
train_data/rec/train/word_001.jpg   简单可依赖
train_data/rec/train/word_002.jpg   用科技让复杂的世界更简单
```
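As a cross-check of the format above, a small hedged Python sketch that writes such a label file with the tab separator (the sample paths and labels mirror the example above):

```python
# Write (image_path, label) pairs separated by '\t', the separator expected in rec_gt_train.txt.
samples = [
    ("train_data/rec/train/word_001.jpg", "简单可依赖"),
    ("train_data/rec/train/word_002.jpg", "用科技让复杂的世界更简单"),
]

with open("rec_gt_train.txt", "w", encoding="utf-8") as f:
    for image_path, label in samples:
        f.write(f"{image_path}\t{label}\n")
```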
...
The final training set should have the following file structure:
```
|-train_data
  |-rec
    |- rec_gt_train.txt
    |- train
        |- word_001.png
        |- word_002.jpg
        |- word_003.jpg
        | ...
```
- Test set
Similar to the training set, the test set also needs to be provided a folder containing all images (test) and a rec_gt_test.txt:
```
|-train_data
  |-rec
    |-ic15_data
        |- rec_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```
<aname="Dataset_download"></a>
#### 1.2 Dataset download
If you do not have a dataset locally, you can download it from the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the LMDB-format dataset required for benchmarks.
If you want to reproduce the paper results of SRN, you need to download the offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotating and perturbing mjsynth and synthtext. Please unzip the data to the {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/ path.
PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
...
In `word_dict.txt`, each line contains a single character, so characters are mapped to numeric indexes by line number; e.g. "and" will be mapped to [2 5 1]
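To make the mapping concrete, a tiny hedged sketch (the dictionary contents below are illustrative, not the real `word_dict.txt`):

```python
# Illustrative only: a dictionary file has one character per line, and the
# 0-based line number becomes that character's index.
dict_chars = ["l", "d", "a", "b", "c", "n"]  # pretend contents of word_dict.txt

char_to_index = {ch: idx for idx, ch in enumerate(dict_chars)}
print([char_to_index[ch] for ch in "and"])  # [2, 5, 1]
```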
PaddleOCR has built-in dictionaries, which can be used on demand.
`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.
`ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters
...
`ppocr/utils/dict/en_dict.txt` is an English dictionary with 63 characters
You can use them on demand.
The current multi-language model is still in the demo stage; we will continue to optimize the model and add more languages. **You are very welcome to provide us with dictionaries and fonts in other languages.**
If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) and we will thank you in the Repo.
...
If you need to customize a dictionary file, add the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` to point to your dictionary path, and set `character_type` to `ch`.
<aname="Add_space_category"></a>
<aname="Add_space_category"></a>
#### 1.4 Add space category
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
**Note: `use_space_char` only takes effect when `character_type=ch`.**
<aname="TRAINING"></a>
<aname="TRAINING"></a>
### 2 TRAINING
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.
Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
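To picture the selection rule, a hedged illustration (not the actual `img_tools.py` implementation):

```python
import random

def maybe_apply(augmentations, image):
    # Each disturbance is applied independently with probability 0.5,
    # mirroring the behaviour described for `distort: true`.
    for aug in augmentations:
        if random.random() < 0.5:
            image = aug(image)
    return image

# Toy usage with stand-in "augmentations".
print(maybe_apply([lambda x: x + 1, lambda x: x * 2], 3))
```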
<aname="Training"></a>
<aname="Training"></a>
#### 2.2 Training
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iterations and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process.
...
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
<aname="Multi_language"></a>
<aname="Multi_language"></a>
#### 2.3 Multi-language
PaddleOCR currently supports recognition for 26 languages (excluding Chinese). A multi-language configuration file template is provided under the path `configs/rec/multi_language`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).
...
```
<aname="EVALUATION"></a>
<aname="EVALUATION"></a>
### 3 EVALUATION
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
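If you prefer to update that field programmatically, a hedged sketch using PyYAML (the label file path is a placeholder):

```python
import yaml

config_path = "configs/rec/rec_icdar15_train.yml"
with open(config_path, "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# Point evaluation at your own label file (placeholder path).
cfg["Eval"]["dataset"]["label_file_list"] = ["./train_data/rec/rec_gt_test.txt"]

with open(config_path, "w", encoding="utf-8") as f:
    yaml.safe_dump(cfg, f, allow_unicode=True, sort_keys=False)
```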
pip install"paddleocr>=2.0.1"# Recommend to use version 2.0.1+
pip install"paddleocr>=2.0.1"# Recommend to use version 2.0.1+
...
build own whl package and install
```bash
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
```
## 2 Use

### 2.1 Use by code
The paddleocr whl package will automatically download the ppocr lightweight model as the default model, which can be customized and replaced according to section 3 **Use custom model**.
* detection, angle classification and recognition
```python
from paddleocr import PaddleOCR, draw_ocr
# Paddleocr supports Chinese, English, French, German, Korean and Japanese.
...
```

Output will be a list, each item contains classification result and confidence
```
['0', 0.99999964]
```
### 2.2 Use by command line
show help information
```bash
...
```

...

Output will be a list, each item contains classification result and confidence
```
['0', 0.99999964]
```
## 3 Use custom model
When the built-in model cannot meet the needs, you need to use your own trained model.
First, refer to the first section of [inference_en.md](./inference_en.md) to convert your det and rec models to inference models, and then use them as follows:
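A hedged sketch of what such usage typically looks like (all model directories and the dictionary path are placeholders; verify the keyword arguments against your installed paddleocr version):

```python
from paddleocr import PaddleOCR

ocr = PaddleOCR(
    det_model_dir="./inference/det/",      # your detection inference model
    rec_model_dir="./inference/rec/",      # your recognition inference model
    rec_char_dict_path="./your_dict.txt",  # dictionary used when training the rec model
    cls_model_dir="./inference/cls/",      # optional angle classifier
    use_angle_cls=True,
)
result = ocr.ocr("your_image.jpg", cls=True)
for line in result:
    print(line)
```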