# CONTENT

- [1. TEXT DETECTION](#1-text-detection)
  * [1.1 DATA PREPARATION](#11-data-preparation)
  * [1.2 DOWNLOAD PRETRAINED MODEL](#12-download-pretrained-model)
  * [1.3 START TRAINING](#13-start-training)
  * [1.4 LOAD TRAINED MODEL AND CONTINUE TRAINING](#14-load-trained-model-and-continue-training)
  * [1.5 TRAINING WITH NEW BACKBONE](#15-training-with-new-backbone)
  * [1.6 EVALUATION](#16-evaluation)
  * [1.7 TEST](#17-test)
  * [1.8 INFERENCE MODEL PREDICTION](#18-inference-model-prediction)
- [2. FAQ](#2-faq)


# 1. TEXT DETECTION

This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.

## 1.1 DATA PREPARATION
The icdar2015 dataset can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.


After registering and logging in, download the part marked in the red box in the figure below. Save the content downloaded from `Training Set Images` as the folder `icdar_c4_train_imgs`, and the content downloaded from `Test Set Images` as the folder `ch4_test_images`.

<p align="center">
 <img src="./doc/datasets/ic15_location_download.png" align="middle" width = "600"/>
</p>

Decompress the downloaded dataset into the working directory; this guide assumes it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR consolidates the many scattered annotation files into two annotation files, one for training and one for testing, which can be downloaded with wget:
```shell
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```

After decompressing the dataset and downloading the annotation files, the data directory under PaddleOCR/train_data/ contains two folders and two files:
```
/PaddleOCR/train_data/icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         Training data of icdar dataset
  └─ ch4_test_images/             Testing data of icdar dataset
  └─ train_icdar2015_label.txt    Training annotation of icdar dataset
  └─ test_icdar2015_label.txt     Test annotation of icdar dataset
```

The provided annotation file format is as follows, with fields separated by "\t":
```
" Image file name             Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
The image annotation after **json.dumps()** encoding is a list containing multiple dictionaries.

The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.

`transcription` represents the text of the current text box. **When its content is "###" it means that the text box is invalid and will be skipped during training.**
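
The format described above can be read back with a few lines of Python. This is only a sketch of the documented layout, not PaddleOCR's own loader: the helper name `parse_label_line` is hypothetical, and the second box in the sample line (the one transcribed as "###") is made up for illustration.

```python
import json


def parse_label_line(line):
    """Split one annotation line into (image path, list of valid boxes)."""
    # The image path and the JSON annotation are separated by a single tab.
    image_path, annotation = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(annotation)
    # Boxes transcribed as "###" are treated as invalid and dropped.
    valid = [box for box in boxes if box["transcription"] != "###"]
    return image_path, valid


# Sample line: the first box comes from the format example, the second is invented.
line = (
    "ch4_test_images/img_61.jpg\t"
    '[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, '
    '{"transcription": "###", "points": [[0, 0], [10, 0], [10, 10], [0, 10]]}]'
)
image_path, boxes = parse_label_line(line)
print(image_path, len(boxes))
```

Each remaining box keeps its four `points` in the clockwise order described above, starting from the upper left corner.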

If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
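
For a custom dataset, one possible way to produce lines in this format is sketched below. The helper name `make_label_line`, the image path and the box contents are all hypothetical; only the tab separator and the `transcription`/`points` keys come from the format described above.

```python
import json


def make_label_line(image_path, boxes):
    """Build one annotation line from (text, four-point polygon) pairs."""
    annotation = [
        {"transcription": text, "points": points} for text, points in boxes
    ]
    # ensure_ascii=False keeps non-ASCII transcriptions readable in the file.
    return image_path + "\t" + json.dumps(annotation, ensure_ascii=False)


# Hypothetical image with one readable box and one box to be skipped ("###").
label_line = make_label_line(
    "my_train_imgs/img_1.jpg",
    [
        ("HELLO", [[10, 10], [110, 10], [110, 40], [10, 40]]),
        ("###", [[5, 60], [50, 60], [50, 80], [5, 80]]),
    ],
)
print(label_line)
```

Writing one such line per image into a text file gives an annotation file with the same layout as `train_icdar2015_label.txt`.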


## 1.2 DOWNLOAD PRETRAINED MODEL

First download the pretrained model. The detection model of PaddleOCR currently supports 3 backbones: MobileNetV3, ResNet18_vd and ResNet50_vd. You can use a model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/ppcls/modeling/architectures) to replace the backbone according to your needs.
The corresponding download links for the backbone pretrained weights can be found in the [PaddleClas README](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97).

```shell
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams
# or, download the pre-trained model of ResNet18_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams
# or, download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams
```

## 1.3 START TRAINING
*If the CPU version is installed, please set the parameter `use_gpu` to `false` in the configuration.*
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml  \
         -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```

In the above command, `-c` selects the `configs/det/det_mv3_db.yml` configuration file for training.
For a detailed explanation of the configuration file, please refer to [config](./config_en.md).

You can also use `-o` to change training parameters without modifying the yml file. For example, to set the training learning rate to 0.0001:
```shell
# single GPU training
python3 tools/train.py -c configs/det/det_mv3_db.yml -o   \
         Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained  \
         Optimizer.base_lr=0.0001

# multi-GPU training
# Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```
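
Conceptually, each `-o Section.key=value` option replaces one entry of the nested configuration loaded from the yml file. The sketch below illustrates that idea on a plain dictionary; `apply_override` and `parse_value` are hypothetical helpers written for this example and are not PaddleOCR's actual option parser.

```python
import ast


def parse_value(text):
    """Turn '0.0001' into a float; leave paths and other strings untouched."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_override(config, option):
    """Apply one 'Section.key=value' option to a nested config dictionary."""
    key, value = option.split("=", 1)
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(name, {})
    node[leaf] = parse_value(value)
    return config


# Stand-in for the values loaded from the yml file (the numbers are illustrative).
config = {"Global": {"pretrain_weights": None}, "Optimizer": {"base_lr": 0.001}}
apply_override(config, "Optimizer.base_lr=0.0001")
apply_override(
    config,
    "Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained",
)
print(config)
```

Only the named keys change; every other value keeps what the yml file specified.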

## 1.4 LOAD TRAINED MODEL AND CONTINUE TRAINING
If you want to load a trained model and continue training, set the parameter `Global.checkpoints` to the path of the model to load.

For example:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` takes priority over `Global.pretrain_weights`. When both parameters are specified, the model specified by `Global.checkpoints` is loaded first. If the model path specified by `Global.checkpoints` is wrong, the model specified by `Global.pretrain_weights` is loaded instead.
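
The priority rule in the note can be pictured as the small decision function below. It is only an illustration: `select_weights` is a hypothetical name, and the check for a `.pdparams` file next to the given path prefix is an assumption of this sketch rather than PaddleOCR's actual loading code.

```python
import os


def select_weights(checkpoints=None, pretrain_weights=None):
    """Return (source, path) for the weights that would be loaded."""
    # Assumption: a path prefix is valid when '<prefix>.pdparams' exists.
    if checkpoints and os.path.exists(checkpoints + ".pdparams"):
        return "checkpoints", checkpoints
    if pretrain_weights:
        # Fall back when no usable checkpoint path was given.
        return "pretrain_weights", pretrain_weights
    return None, None


choice = select_weights(
    checkpoints="./no/such/dir/model",
    pretrain_weights="./pretrain_models/MobileNetV3_large_x0_5_pretrained",
)
print(choice)
```

With a checkpoint path that does not exist, as in this call, the function falls back to the pretrained weights.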


## 1.5 TRAINING WITH NEW BACKBONE

The network part completes the construction of the network. PaddleOCR divides the network into four parts, which are under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms->backbones->necks->heads).

```bash
├── architectures # Code for building network
├── transforms    # Image Transformation Module
├── backbones     # Feature extraction module
├── necks         # Feature enhancement module
└── heads         # Output module
```

If the Backbone to be replaced has a corresponding implementation in PaddleOCR, you can directly modify the parameters in the `Backbone` part of the configuration yml file.

However, if you want to use a new Backbone, an example of replacing the backbones is as follows:

1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
2. Add code in the my_backbone.py file, the sample code is as follows:

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MyBackbone(nn.Layer):
    def __init__(self, *args, **kwargs):
        super(MyBackbone, self).__init__()
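        # your init code; the 3x3 convolution below is only a placeholder layer
        self.conv = nn.Conv2D(in_channels=3, out_channels=64, kernel_size=3, padding=1)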

    def forward(self, inputs):
        # your network forward
        y = self.conv(inputs)
        return y
```

3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.

After adding the four modules of the network, you only need to reference them in the configuration file, for example:

```yaml
  Backbone:
    name: MyBackbone
    args1: args1
```

**NOTE**: More details about replacing the Backbone and other modules can be found in [doc](add_new_algorithm_en.md).

## 1.6 EVALUATION

PaddleOCR calculates three indicators for evaluating the performance of the OCR detection task: Precision, Recall, and Hmean (F-Score).
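
As a reminder of how the three indicators relate, the sketch below computes them from raw counts using the standard definitions, with Hmean as the harmonic mean of Precision and Recall. The function name and the counts are hypothetical, and the rule for matching detected boxes to ground-truth boxes is outside the scope of this sketch.

```python
def detection_metrics(num_matched, num_detected, num_ground_truth):
    """Precision, Recall and Hmean from box counts."""
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    total = precision + recall
    # Hmean is the harmonic mean of precision and recall.
    hmean = 2 * precision * recall / total if total else 0.0
    return precision, recall, hmean


# Illustrative counts: 80 correct boxes out of 100 detections, 160 labelled boxes.
precision, recall, hmean = detection_metrics(80, 100, 160)
print(precision, recall, hmean)
```

Hmean is high only when both Precision and Recall are high, so it penalizes a model that detects few boxes very accurately or many boxes very loosely.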

Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_mv3_db.yml`.

When evaluating, set the post-processing parameters `box_thresh=0.6` and `unclip_ratio=1.5`. If you train with a different dataset or a different model, adjust these two parameters for better results.
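
To get a feel for what `unclip_ratio` controls, the sketch below assumes the expansion rule commonly described for DB-style post-processing: a shrunk text polygon is expanded outward by an offset of area × ratio / perimeter. The helper names are hypothetical, and PaddleOCR's actual post-processing may differ in detail.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of the polygon."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
    )


def unclip_distance(points, unclip_ratio=1.5):
    """Outward offset applied to the polygon (assumed rule: area * ratio / perimeter)."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


# A hypothetical 100 x 20 pixel text region.
box = [[0, 0], [100, 0], [100, 20], [0, 20]]
print(unclip_distance(box, 1.5), unclip_distance(box, 2.0))
```

Under this assumption a larger `unclip_ratio` produces looser boxes around each text region, which is why the value is worth tuning per dataset.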

The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file.
```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```

* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and do not need to be set when evaluating the EAST and SAST models.

## 1.7 TEST

Test the detection result on a single image:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"
```

When testing the DB model, adjust the post-processing threshold:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"  PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=2.0
```


Test the detection result on all images in the folder:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy"
```

## 1.8 INFERENCE MODEL PREDICTION

The inference model (the model saved by `paddle.jit.save`) is generally a frozen model saved after training is complete, and is mostly used for prediction in deployment.

The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.

Compared with the checkpoints model, the inference model additionally saves the structural information of the model. Because the model structure and model parameters are both frozen into the inference model file, it is easier to deploy and is suitable for integration with actual systems.

First, convert the trained DB model to an inference model:
```shell
python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model="./output/det_db/best_accuracy" Global.save_inference_dir="./output/det_db_inference/"
```

Run prediction with the detection inference model:
```shell
python3 tools/infer/predict_det.py --det_algorithm="DB" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
```

For other detection algorithms, such as EAST, set the `det_algorithm` parameter accordingly (here, to EAST); the default is the DB algorithm:
```shell
python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
```

# 2. FAQ

Q1: Why are the prediction results of the trained model and the inference model inconsistent?
**A**: In most cases, the cause is that the pre-processing and post-processing parameters used when predicting with the trained model differ from those used when predicting with the inference model. Taking the model trained with the det_mv3_db.yml configuration file as an example, check the following:
- Check whether the [trained model preprocessing](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116) is consistent with the [preprocessing function of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42). When the algorithm is evaluated, the input image size affects accuracy. To be consistent with the paper, the image is resized to [736, 1280] in the icdar15 training configuration file, but the inference model has only one set of default parameters at prediction time; for the sake of prediction speed, the longest side of the image is limited to 960 when resizing by default. The preprocessing functions for both the trained model and the inference model are located in [ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147).
- Check whether the [post-processing of the trained model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51) is consistent with the [post-processing parameters of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50).
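
The size mismatch described in the first check can be made concrete with a small calculation. The sketch below limits the longest side to 960 as stated above; rounding each side to a multiple of 32 is an assumption of this example, and `limit_longest_side` is a hypothetical helper rather than the function in `operators.py`.

```python
def limit_longest_side(height, width, limit=960, multiple=32):
    """Resize target when the longest side is capped at `limit`."""
    ratio = 1.0
    if max(height, width) > limit:
        ratio = limit / max(height, width)
    # Assumption: each side is rounded to a multiple of 32 for the network.
    new_h = max(multiple, int(round(height * ratio / multiple) * multiple))
    new_w = max(multiple, int(round(width * ratio / multiple) * multiple))
    return new_h, new_w


# A 720 x 1280 input under the default limit, versus the fixed [736, 1280] used in training.
resized = limit_longest_side(720, 1280)
print(resized, "vs", (736, 1280))
```

Because the two pipelines feed the network differently sized images, the detected boxes can differ even when the weights are identical.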