"examples/community/latent_consistency_txt2img.py" did not exist on "c18941b01ad0ea6b07d020f353d81153c632a374"
# Inference based on Python prediction engine

The inference model (the model saved by `fluid.io.save_inference_model`) is generally a frozen model saved after training is completed, and is mostly used for prediction in deployment.

The model saved during training is the checkpoint model, which stores the model parameters and is mostly used to resume training.

Compared with the checkpoint model, the inference model additionally saves the structural information of the model. It performs better for prediction in deployment and for accelerated inference, is flexible and convenient, and is suitable for integration with real systems. For more details, please refer to the document [Classification Framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html).

Below, we first describe how to convert a trained model into an inference model, and then cover text detection, text recognition, and the combination of the two based on inference models.

- [CONVERT TRAINING MODEL TO INFERENCE MODEL](#CONVERT)
    - [Convert detection model to inference model](#Convert_detection_model)
    - [Convert recognition model to inference model](#Convert_recognition_model)
    
    
- [TEXT DETECTION MODEL INFERENCE](#DETECTION_MODEL_INFERENCE)
    - [1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE](#LIGHTWEIGHT_DETECTION)
    - [2. DB TEXT DETECTION MODEL INFERENCE](#DB_DETECTION)
    - [3. EAST TEXT DETECTION MODEL INFERENCE](#EAST_DETECTION)
    - [4. SAST TEXT DETECTION MODEL INFERENCE](#SAST_DETECTION)
    
- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
    - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
    - [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
    - [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION)
    - [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
    
    
- [TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION)
    - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_CHINESE_MODEL)
    - [2. OTHER MODELS](#OTHER_MODELS)
    
<a name="CONVERT"></a>
## CONVERT TRAINING MODEL TO INFERENCE MODEL
<a name="Convert_detection_model"></a>
### Convert detection model to inference model

Download the lightweight Chinese detection model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/
```
The above model was trained with the DB algorithm using MobileNetV3 as the backbone. To convert the trained model into an inference model, run the following command:
```
# -c Set the training algorithm yml configuration file
# -o Set optional parameters
#  Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
#  Global.save_inference_dir Set the address where the converted model will be saved.

python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy \
        Global.save_inference_dir=./inference/det_db/
```
When converting to an inference model, use the same configuration file that was used during training. In addition, you need to set the `Global.checkpoints` and `Global.save_inference_dir` parameters (the command above passes them with `-o`).
`Global.checkpoints` points to the model parameter file saved during training, and `Global.save_inference_dir` is the directory where the generated inference model is saved.
After a successful conversion, there are two files in the `save_inference_dir` directory:
```
inference/det_db/
  ├─  model     Program file of the detection inference model
  └─  params    Parameter file of the detection inference model
```
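
The directory listing above can be verified after an export. The following is a minimal sketch, assuming only the two file names shown in the listing; the helper `missing_inference_files` is hypothetical and is not part of the repository's tools.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("model", "params")


def missing_inference_files(inference_dir):
    """Return the expected file names that are absent from inference_dir."""
    root = Path(inference_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means both files are present, for example `missing_inference_files("./inference/det_db/")` after the detection export.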

<a name="Convert_recognition_model"></a>
### Convert recognition model to inference model

Download the lightweight Chinese recognition model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/
```

The recognition model is converted to an inference model in the same way as the detection model:
```
# -c Set the training algorithm yml configuration file
# -o Set optional parameters
#  Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
#  Global.save_inference_dir Set the address where the converted model will be saved.

python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints=./ch_lite/rec_mv3_crnn/best_accuracy \
        Global.save_inference_dir=./inference/rec_crnn/
```
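
The comments in the export command note that `Global.checkpoints` must be given without the `.pdmodel`, `.pdopt` or `.pdparams` suffix. A minimal sketch of deriving that value from a checkpoint file path follows; the helper name `checkpoint_prefix` is an assumption made for illustration.

```python
# Suffixes listed in the comments of the export command above.
CHECKPOINT_SUFFIXES = (".pdmodel", ".pdopt", ".pdparams")


def checkpoint_prefix(path):
    """Strip a known checkpoint suffix so the value suits Global.checkpoints."""
    for suffix in CHECKPOINT_SUFFIXES:
        if path.endswith(suffix):
            return path[: -len(suffix)]
    # Paths without a known suffix are returned unchanged.
    return path
```

For example, `./ch_lite/rec_mv3_crnn/best_accuracy.pdparams` becomes `./ch_lite/rec_mv3_crnn/best_accuracy`, which matches the value used in the command.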

If you have a model trained on your own dataset with a different dictionary file, please make sure that you modify the `character_dict_path` in the configuration file to your dictionary file path.

After the conversion is successful, there are two files in the directory:
```
inference/rec_crnn/
  ├─  model     Program file of the recognition inference model
  └─  params    Parameter file of the recognition inference model
```

<a name="DETECTION_MODEL_INFERENCE"></a>
## TEXT DETECTION MODEL INFERENCE

The following introduces inference for the lightweight Chinese detection model and for the DB, EAST and SAST text detection models. The default configuration is based on the inference settings of the DB text detection model.
Because the EAST and DB algorithms are very different, during inference it is necessary to **adapt to the EAST text detection algorithm by passing in the corresponding parameters**.

<a name="LIGHTWEIGHT_DETECTION"></a>
### 1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE

For lightweight Chinese detection model inference, you can execute the following commands:

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"
```

The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:

![](../imgs_results/det_res_2.jpg)

The parameter `det_max_side_len` sets the maximum side length to which an image is normalized in the detection algorithm. When both the height and width of the image are less than `det_max_side_len`, the original image is used for prediction; otherwise the image is scaled down to this maximum for prediction. The default is `det_max_side_len=960`. If the input image has a relatively high resolution and you want to predict at a larger resolution, you can execute the following command:

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_max_side_len=1200
```
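
The scaling rule for `det_max_side_len` described above can be sketched as follows. This is a simplified illustration under the assumption that the longer side is scaled to the limit and the aspect ratio is kept; the function name `limit_side_len` is hypothetical, and the actual preprocessing may apply further adjustments (for example rounding the sides to a multiple of the network stride) that are not modeled here.

```python
def limit_side_len(height, width, max_side_len=960):
    """Return the (height, width) used for prediction under the rule above."""
    longest = max(height, width)
    if longest <= max_side_len:
        # Small images are used at their original size.
        return height, width
    # Larger images are scaled so that the longer side equals max_side_len.
    ratio = max_side_len / longest
    return int(round(height * ratio)), int(round(width * ratio))
```

With the default limit, a 720x1280 image would be predicted at 540x960, while a 480x640 image is left unchanged.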

If you want to use the CPU for prediction, execute the following command:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
```

<a name="DB_DETECTION"></a>
### 2. DB TEXT DETECTION MODEL INFERENCE

First, convert the model saved in the DB text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), you can use the following command to convert:

```
# Set the yml configuration file of the training algorithm after -c
# The Global.checkpoints parameter sets the address of the training model to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
# The Global.save_inference_dir parameter sets the address where the converted model will be saved.

python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.checkpoints="./models/det_r50_vd_db/best_accuracy" Global.save_inference_dir="./inference/det_db"
```

For DB text detection model inference, you can execute the following command:

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/"
```

The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:

![](../imgs_results/det_res_img_10_db.jpg)

**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly of English scenes, the above model performs very poorly on Chinese text images.

<a name="EAST_DETECTION"></a>
### 3. EAST TEXT DETECTION MODEL INFERENCE

First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), you can use the following command to convert:

```
# Set the yml configuration file of the training algorithm after -c
# The Global.checkpoints parameter sets the address of the training model to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
# The Global.save_inference_dir parameter sets the address where the converted model will be saved.

python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.checkpoints="./models/det_r50_vd_east/best_accuracy" Global.save_inference_dir="./inference/det_east"
```

**For EAST text detection model inference, you need to set the parameter `--det_algorithm="EAST"`**. Run the following command:

```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST"
```

The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:

![](../imgs_results/det_res_img_10_east.jpg)

**Note**: The locality-aware NMS used in EAST post-processing has two versions: Python and C++. The C++ version is significantly faster than the Python version. Because of a compilation version issue with the C++ NMS, the C++ version is called only in a Python 3.5 environment, and the Python version is called in all other cases.


<a name="SAST_DETECTION"></a>
### 4. SAST TEXT DETECTION MODEL INFERENCE
#### (1). Quadrangle text detection model (ICDAR2015)  
First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)), you can use the following command to convert:

```
python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.checkpoints="./models/sast_r50_vd_icdar2015/best_accuracy" Global.save_inference_dir="./inference/det_sast_ic15"
```

**For SAST quadrangle text detection model inference, you need to set the parameter `--det_algorithm="SAST"`**. Run the following command:

```
python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_sast_ic15/"
```

The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:

![](../imgs_results/det_res_img_10_sast.jpg)

#### (2). Curved text detection model (Total-Text)  
First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the Total-Text English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)), you can use the following command to convert:

```
python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.checkpoints="./models/sast_r50_vd_total_text/best_accuracy" Global.save_inference_dir="./inference/det_sast_tt"
```

**For SAST curved text detection model inference, you need to set the parameters `--det_algorithm="SAST"` and `--det_sast_polygon=True`**. Run the following command:

```
python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img623.jpg" --det_model_dir="./inference/det_sast_tt/" --det_sast_polygon=True
```

The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:

![](../imgs_results/det_res_img623_sast.jpg)

**Note**: The locality-aware NMS used in SAST post-processing has two versions: Python and C++. The C++ version is significantly faster than the Python version. Because of a compilation version issue with the C++ NMS, the C++ version is called only in a Python 3.5 environment, and the Python version is called in all other cases.

<a name="RECOGNITION_MODEL_INFERENCE"></a>
## TEXT RECOGNITION MODEL INFERENCE

The following introduces inference for the lightweight Chinese recognition model and for other CTC-based and Attention-based text recognition models. For Chinese text recognition, we recommend choosing a recognition model based on CTC loss; in practice, models based on Attention loss have also been found to perform worse than those based on CTC loss. In addition, if the character dictionary was modified during training, make sure that you use the same character set during inference. See below for details.


<a name="LIGHTWEIGHT_RECOGNITION"></a>
### 1. LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL INFERENCE

For lightweight Chinese recognition model inference, you can execute the following commands:

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/"
```

![](../imgs_words/ch/word_4.jpg)

After executing the command, the prediction results (recognized text and score) of the above image will be printed on the screen.

Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695]
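
If the printed result needs to be consumed by another script, a line in the format shown above can be parsed as in the following sketch. It assumes the output keeps exactly this `Predicts of <path>:[<text>, <score>]` layout; the function `parse_prediction` is hypothetical and not part of `predict_rec.py`.

```python
import ast

PREFIX = "Predicts of "


def parse_prediction(line):
    """Split one printed result line into (image_path, text, score)."""
    if not line.startswith(PREFIX):
        raise ValueError("unexpected line: %r" % line)
    # The image path and the printed list are separated by ":[".
    path, separator, payload = line[len(PREFIX):].partition(":[")
    if not separator:
        raise ValueError("no prediction list found: %r" % line)
    # The payload is a Python-style list literal, so literal_eval can read it.
    text, score = ast.literal_eval("[" + payload)
    return path, text, float(score)
```

Using `ast.literal_eval` rather than `eval` keeps the parsing restricted to plain literals.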


<a name="CTC-BASED_RECOGNITION"></a>
### 2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE

Taking STAR-Net as an example, we introduce the recognition model inference based on CTC loss. CRNN and Rosetta are used in a similar way, by setting the recognition algorithm parameter `rec_algorithm`.

First, convert the model saved during STAR-Net text recognition training into an inference model. Taking the model based on the Resnet34_vd backbone network and trained on MJSynth and SynthText (two synthetic English text recognition datasets) as an example ([model download address](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)), it can be converted as follows:

```
# Set the yml configuration file of the training algorithm after -c
# The Global.checkpoints parameter sets the address of the training model to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
# The Global.save_inference_dir parameter sets the address where the converted model will be saved.

python3 tools/export_model.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.checkpoints="./models/rec_r34_vd_tps_bilstm_ctc/best_accuracy" Global.save_inference_dir="./inference/starnet"
```

For STAR-Net text recognition model inference, execute the following commands:

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
```
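
The `--rec_image_shape` value in the command above is passed as a comma-separated string of channels, height and width. The sketch below shows one way such a string could be turned into integers; `parse_image_shape` is a hypothetical helper and is not necessarily how `predict_rec.py` reads the flag.

```python
def parse_image_shape(value):
    """Parse a 'C, H, W' string such as "3, 32, 100" into a tuple of ints."""
    # int() tolerates the spaces that follow each comma.
    parts = [int(part) for part in value.split(",")]
    if len(parts) != 3:
        raise ValueError("expected three comma-separated integers: %r" % value)
    channels, height, width = parts
    return channels, height, width
```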

<a name="ATTENTION-BASED_RECOGNITION"></a>
### 3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE
![](../imgs_words_en/word_336.png)

After executing the command, the recognition result of the above image is as follows:

Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555]

**Note**: Since the above model follows the [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it differs from the training of the lightweight Chinese recognition model in two respects:

- Image resolution: the above model was trained at an image resolution of [3, 32, 100], whereas our Chinese model is trained at [3, 32, 320] to preserve the recognition quality of long text. The default shape parameter at inference is the resolution used to train the Chinese model, that is [3, 32, 320]. Therefore, when running inference with the above English model, you need to set the shape of the recognition image through the parameter `rec_image_shape`.

- Character list: the experiments in the DTRB paper cover only 26 lowercase English letters and 10 digits, 36 characters in total. All uppercase characters are converted to lowercase, and characters not in the list are ignored and treated as spaces. Therefore, no character dictionary file is used here; instead, the dictionary is generated by the code below, and the parameter `rec_char_type` must be set to `en` (English) during inference.

```
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
```
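
To illustrate how the 36-character list is applied to a label, here is a minimal sketch. It assumes that unknown characters are simply dropped and that indices are plain positions in the list; the function `encode_label` is hypothetical, and a real CTC label encoder typically also reserves an extra index for the blank symbol, which is not modeled here.

```python
CHARACTER_STR = "0123456789abcdefghijklmnopqrstuvwxyz"
# Map each character to its position in the list (assumed indexing).
CHAR_TO_INDEX = {char: index for index, char in enumerate(CHARACTER_STR)}


def encode_label(text):
    """Lowercase the text and keep only characters found in the list."""
    lowered = text.lower()
    return [CHAR_TO_INDEX[char] for char in lowered if char in CHAR_TO_INDEX]
```

For instance, `encode_label("Super!")` lowercases the input, drops the exclamation mark, and returns the positions of `s`, `u`, `p`, `e`, `r`.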

<a name="USING_CUSTOM_CHARACTERS"></a>
### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
If the character dictionary was modified during training, you need to specify the new dictionary path by setting the parameter `rec_char_dict_path` when predicting with your inference model.

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path"
```

<a name="CONCATENATION"></a>
## TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION

<a name="LIGHTWEIGHT_CHINESE_MODEL"></a>
### 1. LIGHTWEIGHT CHINESE MODEL

When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`. The parameter `det_model_dir` specifies the path to the detection inference model, and the parameter `rec_model_dir` specifies the path to the recognition inference model. The visualized recognition results are saved to the `./inference_results` folder by default.

```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"  --rec_model_dir="./inference/rec_crnn/"
```

After executing the command, the recognition result image is as follows:

![](../imgs_results/2.jpg)

<a name="OTHER_MODELS"></a>
### 2. OTHER MODELS

If you want to try other detection or recognition algorithms, refer to the text detection model inference and text recognition model inference sections above, and update the corresponding configuration and model.

**Note: due to a limitation in the rotation logic for detected boxes, the SAST curved text detection model (using the parameter `det_sast_polygon=True`) is not yet supported for model combination.**

The following command uses the combination of the EAST text detection and STAR-Net text recognition:

```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
```
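
When the combined pipeline is launched from a script rather than typed by hand, the same command can be assembled as an argument list. The sketch below only uses the flags that appear in the command above; the helper `build_predict_system_args` is an assumption made for illustration and is not part of the repository.

```python
def build_predict_system_args(image_dir, det_model_dir, rec_model_dir, **options):
    """Build the argument list for tools/infer/predict_system.py."""
    args = [
        "python3",
        "tools/infer/predict_system.py",
        "--image_dir=" + image_dir,
        "--det_model_dir=" + det_model_dir,
        "--rec_model_dir=" + rec_model_dir,
    ]
    # Extra flags such as det_algorithm or rec_char_type are appended as given.
    for name, value in options.items():
        args.append("--{}={}".format(name, value))
    return args


# The EAST + STAR-Net combination from the command above.
east_starnet_args = build_predict_system_args(
    "./doc/imgs_en/img_10.jpg",
    "./inference/det_east/",
    "./inference/starnet/",
    det_algorithm="EAST",
    rec_image_shape="3, 32, 100",
    rec_char_type="en",
)
```

The resulting list can be passed to `subprocess.run(east_starnet_args, check=True)` from the repository root; passing a list avoids shell quoting issues with values that contain spaces, such as the image shape.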

After executing the command, the recognition result image is as follows:

![](../imgs_results/img_10.jpg)