# Text Detection

This section uses the ICDAR2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.

- [1. Data and Weights Preparation](#1-data-and-weights-preparation)
  - [1.1 Data Preparation](#11-data-preparation)
  - [1.2 Download Pre-trained Model](#12-download-pre-trained-model)
- [2. Training](#2-training)
  - [2.1 Start Training](#21-start-training)
  - [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
  - [2.3 Training with New Backbone](#23-training-with-new-backbone)
  - [2.4 Training with knowledge distillation](#24-training-with-knowledge-distillation)
- [3. Evaluation and Test](#3-evaluation-and-test)
  - [3.1 Evaluation](#31-evaluation)
  - [3.2 Test](#32-test)
- [4. Inference](#4-inference)
- [5. FAQ](#5-faq)

## 1. Data and Weights Preparation

### 1.1 Data Preparation

To prepare datasets, refer to [ocr_datasets](./dataset/ocr_datasets_en.md).

### 1.2 Download Pre-trained Model

First, download the pre-trained model. PaddleOCR's detection models currently support three backbones: MobileNetV3, ResNet18_vd, and ResNet50_vd. You can replace the backbone with a model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/ppcls/modeling/architectures) as needed.
The corresponding download links for the backbone pre-trained weights can be found in the [PaddleClas model list](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97).

```shell
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams
# or, download the pre-trained model of ResNet18_vd
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet18_vd_pretrained.pdparams
# or, download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams
```

## 2. Training

### 2.1 Start Training

*If you installed the CPU version of PaddlePaddle, set `Global.use_gpu` to `false` in the configuration file.*
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml  \
         -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```

In the above command, `-c` selects the `configs/det/det_mv3_db.yml` configuration file for training.
For a detailed explanation of the configuration file, please refer to [config](./config_en.md).
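The `-o` option in the command above overrides individual configuration entries through dotted keys such as `Global.pretrained_model`, so the YAML file itself stays unchanged. The sketch below illustrates this kind of nested merge on a plain dictionary. It is a hypothetical helper, not PaddleOCR's own implementation. The name `apply_overrides`, the toy config, and the value-parsing rule are all assumptions. The rule treats lowercase `true`/`false` as booleans, parses Python literals with `ast.literal_eval`, and keeps anything else as a string.

```python
import ast


def parse_value(text):
    """Turn an override value into a bool, number, list, or plain string."""
    # assumption: YAML-style lowercase booleans are accepted
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    try:
        # numbers, lists, quoted strings, etc.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # paths and other bare words stay strings
        return text


def apply_overrides(config, overrides):
    """Apply 'Section.key=value' strings to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        keys = dotted_key.split(".")
        node = config
        # walk down the sections, creating missing ones
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = parse_value(raw_value)
    return config


# toy excerpt of a detection config (not the full det_mv3_db.yml)
config = {
    "Global": {"use_gpu": True, "pretrained_model": None},
    "PostProcess": {"box_thresh": 0.6, "unclip_ratio": 1.5},
}
apply_overrides(config, [
    "Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained",
    "Global.use_gpu=false",
    "PostProcess.unclip_ratio=2.0",
])
print(config["Global"]["use_gpu"], config["PostProcess"]["unclip_ratio"])  # False 2.0
```

Only the keys named on the command line change. Every other entry keeps the value from the YAML file.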

You can also use `-o` to change training parameters without modifying the yml file. For example, to set the training learning rate to 0.0001:
```shell
# single GPU training
python3 tools/train.py -c configs/det/det_mv3_db.yml -o   \
         Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained  \
         Optimizer.lr.learning_rate=0.0001

# multi-GPU training
# Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained

# multi-Node, multi-GPU training
# Set the IPs of your nodes used by the '--ips' parameter. Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```
**Note:** For multi-node multi-GPU training, replace the `ips` value in the preceding command with the IP addresses of your machines, and make sure the machines can ping each other. You also need to run the command separately on each machine to start training. To view a machine's IP address, use `ifconfig`.

To speed up training further, you can use [automatic mixed precision training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_en.html). For single-card training, the command is as follows:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```
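`Global.scale_loss=1024.0` sets the initial loss scale, and `Global.use_dynamic_loss_scaling=True` lets that scale change during training. The mechanism works in four steps:

1. The loss is multiplied by the scale so that small FP16 gradients do not underflow.
2. The gradients are divided by the same factor before the parameter update.
3. The scale shrinks when an overflow is detected.
4. The scale grows after a run of clean steps.

The sketch below shows this policy in plain Python. It is a simplified illustration, not Paddle's implementation. The class name and the growth/shrink parameters (`incr_ratio`, `decr_ratio`, `incr_every_n_steps`) are assumptions.

```python
class DynamicLossScaler:
    """Simplified dynamic loss scaling policy (illustrative only)."""

    def __init__(self, init_scale=1024.0, incr_ratio=2.0, decr_ratio=0.5,
                 incr_every_n_steps=1000):
        self.scale = init_scale
        self.incr_ratio = incr_ratio
        self.decr_ratio = decr_ratio
        self.incr_every_n_steps = incr_every_n_steps
        self.good_steps = 0

    def update(self, found_inf):
        """Update the scale after one step; return True if the step should be applied."""
        if found_inf:
            # overflowed gradients: skip this update and shrink the scale
            self.scale *= self.decr_ratio
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps == self.incr_every_n_steps:
            # enough clean steps in a row: try a larger scale
            self.scale *= self.incr_ratio
            self.good_steps = 0
        return True


# short run: three clean steps, then one overflow
scaler = DynamicLossScaler(init_scale=1024.0, incr_every_n_steps=3)
applied = [scaler.update(found_inf) for found_inf in (False, False, False, True)]
print(applied, scaler.scale)  # [True, True, True, False] 1024.0
```

After three clean steps the scale doubles to 2048. The overflow on the fourth step halves it back to 1024, and that update is skipped.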

### 2.2 Load Trained Model and Continue Training
To load a trained model and continue training, set the parameter `Global.checkpoints` to the path of the model to be loaded.

For example:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` has a higher priority than `Global.pretrained_model`. When both parameters are specified, the model specified by `Global.checkpoints` is loaded. If the model path specified by `Global.checkpoints` is invalid, the model specified by `Global.pretrained_model` is loaded instead.
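The loading priority in the note above can be written as a small decision function. This is a hypothetical sketch that follows the note's description, not PaddleOCR's loader. The function name `select_weights` and the rule that a path is valid when `<path>.pdparams` exists are assumptions.

```python
import os


def select_weights(global_cfg):
    """Return (config key, path) of the weights that would be loaded, or (None, None)."""
    # Global.checkpoints is checked first because it has the higher priority
    for key in ("checkpoints", "pretrained_model"):
        path = global_cfg.get(key)
        # a path counts as valid only if its parameter file exists (assumption)
        if path and os.path.exists(path + ".pdparams"):
            return key, path
    return None, None


print(select_weights({"checkpoints": "./no/such/model", "pretrained_model": None}))  # (None, None)
```

If `checkpoints` points at a missing file, the function falls through to `pretrained_model`, matching the fallback described in the note.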


### 2.3 Training with New Backbone

PaddleOCR divides the network into four parts, whose code is located under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms -> backbones -> necks -> heads).

```bash
├── architectures # Code for building network
├── transforms    # Image Transformation Module
├── backbones     # Feature extraction module
├── necks         # Feature enhancement module
└── heads         # Output module
```
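The fixed order of the four parts can be illustrated with plain Python callables. In PaddleOCR the stages are Paddle layers built from the `Transform`, `Backbone`, `Neck`, and `Head` entries of the `Architecture` section in the configuration. Here they are stand-in functions so the sketch runs without Paddle. The class name `FourStageModel` is hypothetical, and skipping a stage that is left unset is an assumption about how optional parts behave.

```python
class FourStageModel:
    """Illustrative transforms -> backbones -> necks -> heads composition."""

    def __init__(self, transform=None, backbone=None, neck=None, head=None):
        # keep the fixed order and drop stages that are not configured
        self.stages = [s for s in (transform, backbone, neck, head) if s is not None]

    def __call__(self, x):
        # data passes through the stages one after another
        for stage in self.stages:
            x = stage(x)
        return x


# stand-in stages that only record which part touched the data
model = FourStageModel(
    backbone=lambda x: x + ["backbone"],
    neck=lambda x: x + ["neck"],
    head=lambda x: x + ["head"],
)
print(model(["image"]))  # ['image', 'backbone', 'neck', 'head']
```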

If the Backbone to be replaced has a corresponding implementation in PaddleOCR, you can directly modify the parameters in the `Backbone` part of the configuration yml file.

To use a new Backbone that PaddleOCR does not implement, replace it as follows:

1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as `my_backbone.py`.
2. Add your code to `my_backbone.py`. Sample code:

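```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MyBackbone(nn.Layer):
    def __init__(self, in_channels=3, out_channels=64, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code; a single 3x3 convolution is only a placeholder
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, padding=1)
        # the neck is built from the backbone's out_channels attribute;
        # DB necks expect several feature maps, so adapt this to your neck
        self.out_channels = out_channels

    def forward(self, inputs):
        # your network forward
        y = F.relu(self.conv(inputs))
        return y
```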

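3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file (for example, `from .my_backbone import MyBackbone`) and add its name to the list of supported backbones in that file.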

After adding the module, select it in the configuration file, for example:

```yaml
  Backbone:
    name: MyBackbone
    args1: args1
```
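In the configuration above, `name` selects the class and the remaining keys (here the placeholder `args1`) become constructor arguments. This is why the class must be importable from `ppocr/modeling/backbones/__init__.py` (step 3). The sketch below shows such a name-based lookup with a plain dictionary. It is a hypothetical illustration: `BACKBONES`, `build_from_config`, and the stub class are not PaddleOCR APIs.

```python
class MyBackboneStub:
    """Stand-in for the MyBackbone layer; it only records its arguments."""

    def __init__(self, in_channels=3, **kwargs):
        self.in_channels = in_channels
        self.kwargs = kwargs


# name -> class table; adding a backbone means adding an entry here
BACKBONES = {"MyBackbone": MyBackboneStub}


def build_from_config(config):
    """Instantiate the class named by config['name'] with the remaining keys."""
    config = dict(config)  # copy so the caller's dict is not modified
    name = config.pop("name")
    if name not in BACKBONES:
        raise ValueError(f"unknown backbone {name!r}; supported: {sorted(BACKBONES)}")
    return BACKBONES[name](**config)


backbone = build_from_config({"name": "MyBackbone", "args1": "args1"})
print(type(backbone).__name__, backbone.kwargs)  # MyBackboneStub {'args1': 'args1'}
```

Failing loudly on an unknown name makes a missing registration easy to spot.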

**NOTE**: More details about replacing the Backbone and other modules can be found in [doc](add_new_algorithm_en.md).


### 2.4 Training with knowledge distillation

PaddleOCR supports knowledge distillation when training text detection models. For more details, please refer to [doc](./knowledge_distillation_en.md).

## 3. Evaluation and Test

### 3.1 Evaluation

PaddleOCR calculates three metrics to evaluate the performance of the OCR detection task: Precision, Recall, and Hmean (F-score).

Run the following command to calculate the evaluation metrics. The result is saved in the test result file specified by `save_res_path` in the configuration file `det_mv3_db.yml`.

During evaluation, set the post-processing parameters `box_thresh=0.6` and `unclip_ratio=1.5`. If you train on a different dataset or with a different model, adjust these two parameters for better results.

By default, model parameters saved during training are stored in the `Global.save_model_dir` directory. For evaluation, set `Global.checkpoints` to point to the saved parameter file.
```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```

**Note:** `box_thresh` and `unclip_ratio` are parameters of DB post-processing and do not need to be set when evaluating EAST and SAST models.
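The three metrics relate as follows:

- Precision is the fraction of predicted boxes that match a ground-truth box.
- Recall is the fraction of ground-truth boxes that are matched.
- Hmean is their harmonic mean, 2PR / (P + R).

The sketch below computes them with greedy one-to-one matching that requires an IoU above 0.5. It is a simplification, not PaddleOCR's evaluator. It uses axis-aligned boxes `(x1, y1, x2, y2)` instead of polygons and ignores the "do not care" regions of ICDAR2015. The function names are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def detection_metrics(gt_boxes, pred_boxes, iou_thresh=0.5):
    """Greedy one-to-one matching, then (precision, recall, hmean)."""
    matched_gt = set()
    num_matched = 0
    for pred in pred_boxes:
        for i, gt in enumerate(gt_boxes):
            # each ground-truth box can be matched at most once
            if i not in matched_gt and box_iou(pred, gt) > iou_thresh:
                matched_gt.add(i)
                num_matched += 1
                break
    precision = num_matched / len(pred_boxes) if pred_boxes else 0.0
    recall = num_matched / len(gt_boxes) if gt_boxes else 0.0
    hmean = (2 * precision * recall / (precision + recall)
             if precision + recall > 0 else 0.0)
    return precision, recall, hmean


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]  # one hit, one false positive
print(detection_metrics(gts, preds))  # (0.5, 0.5, 0.5)
```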

### 3.2 Test

Test the detection result on a single image:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"
```

When testing a DB model, you can also adjust the post-processing parameters:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"  PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=2.0
```
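In DB post-processing, the two parameters do different jobs:

- `box_thresh` filters candidate boxes by their mean probability on the predicted map.
- `unclip_ratio` controls how far each shrunk text region is expanded back out.

Following the DB paper, the expansion offset is D = A × r / L, where A is the polygon area, L its perimeter, and r the `unclip_ratio`. The value 2.0 in the command above therefore gives larger boxes than 1.5. The sketch below computes D and the `box_thresh` check in plain Python. The function names are hypothetical. The actual polygon offsetting is not reproduced; PaddleOCR's DB post-processing performs it with the `pyclipper` library.

```python
import math


def polygon_area(points):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths, closing the polygon back to the first point."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def unclip_distance(points, unclip_ratio):
    """Expansion offset D = A * r / L for a shrunk text region."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(mean_score, box_thresh):
    """A candidate box is dropped when its mean probability is below box_thresh."""
    return mean_score >= box_thresh


# a 100 x 20 shrunk text region: A = 2000, L = 240
region = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(unclip_distance(region, 1.5), unclip_distance(region, 2.0))  # 12.5 and about 16.67
```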


Test the detection result on all images in the folder:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy"
```

## 4. Inference

The inference model (the model saved by `paddle.jit.save`) is generally a frozen model saved after training is complete, and is mostly used for prediction in deployment.

The model saved during training is the checkpoint model, which stores the model parameters and is mostly used to resume training.

Compared with the checkpoint model, the inference model additionally saves the model structure. Because both the structure and the parameters are fixed in the inference model file, it is easier to deploy and suitable for integration with actual systems.

First, convert the trained DB model to an inference model:
```shell
python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model="./output/det_db/best_accuracy" Global.save_inference_dir="./output/det_db_inference/"
```

Run prediction with the detection inference model:
```shell
python3 tools/infer/predict_det.py --det_algorithm="DB" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
```

For other detection algorithms such as EAST, set the `det_algorithm` parameter accordingly (the default is DB). Point `--det_model_dir` to the inference model exported from a model trained with that algorithm:
```shell
python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
```

## 5. FAQ

Q1: Why are the prediction results of the trained model and the inference model inconsistent?

**A**: Most such problems are caused by inconsistent pre-processing and post-processing parameters between prediction with the trained model and prediction with the inference model. Taking a model trained with the det_mv3_db.yml configuration file as an example, you can resolve inconsistent prediction results as follows:
- Check whether the [trained model preprocessing](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116) is consistent with the [preprocessing function of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42). The input image size affects accuracy during evaluation. To be consistent with the paper, the ICDAR2015 training configuration file resizes images to [736, 1280]. The inference model, however, has only one set of default parameters: for prediction speed, it resizes images by default so that the longest side is at most 960. The preprocessing functions of both the trained model and the inference model are located in [ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147).
- Check whether the [post-processing of the trained model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51) is consistent with the [post-processing parameters of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50).
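The default inference-time resize described in the answer can be sketched as follows, to show how much the input differs from the [736, 1280] training-time size. This is an illustration, not the code in `operators.py`. The function name is hypothetical. Rounding each side to a multiple of 32 is an assumption, meant to keep sizes compatible with the network's downsampling.

```python
def limit_max_side(h, w, limit_side_len=960, stride=32):
    """Scale so the longer side is at most limit_side_len, then snap to a multiple of stride."""
    ratio = 1.0
    if max(h, w) > limit_side_len:
        ratio = float(limit_side_len) / max(h, w)
    resize_h = int(h * ratio)
    resize_w = int(w * ratio)
    # round each side to the nearest multiple of stride, never below one stride
    resize_h = max(int(round(resize_h / stride) * stride), stride)
    resize_w = max(int(round(resize_w / stride) * stride), stride)
    return resize_h, resize_w


# a 720 x 1280 image (the size of ICDAR2015 images)
print(limit_max_side(720, 1280))  # (544, 960)
```

Under these assumptions, inference sees a 544 × 960 input, noticeably smaller than the 736 × 1280 used in the training configuration. This helps explain why the two results can differ.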