# TEXT DETECTION

This section uses the ICDAR2015 dataset as an example to introduce training, evaluation, and testing of the detection model in PaddleOCR.

## DATA PREPARATION
The ICDAR2015 dataset can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required before downloading.

Decompress the downloaded dataset into the working directory; this guide assumes it is decompressed under `PaddleOCR/train_data/`. In addition, PaddleOCR merges the many scattered per-image annotation files into two annotation files, one for training and one for testing, which can be downloaded with wget:
```
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```

After decompressing the dataset and downloading the annotation files, `PaddleOCR/train_data/` contains two folders and two files:
```
/PaddleOCR/train_data/icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         Training data of icdar dataset
  └─ ch4_test_images/             Testing data of icdar dataset
  └─ train_icdar2015_label.txt    Training annotation of icdar dataset
  └─ test_icdar2015_label.txt     Test annotation of icdar dataset
```
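
Before training, it can help to confirm that every image referenced in an annotation file actually exists on disk. The following is a minimal Python sketch, assuming the image paths in a label file (such as `ch4_test_images/img_61.jpg`) are relative to the directory that contains the image folders; `find_missing_images` is a hypothetical helper, not part of PaddleOCR.

```python
import os
from typing import List

def find_missing_images(label_file: str, image_root: str) -> List[str]:
    """Return image paths listed in `label_file` that do not exist under `image_root`.

    Each line is expected to look like `<image path>\t<json annotation>`.
    """
    missing = []
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            image_path, _, _ = line.partition("\t")  # first field is the image path
            if not os.path.exists(os.path.join(image_root, image_path)):
                missing.append(image_path)
    return missing
```

For example, calling `find_missing_images(label_file, image_root)` with the training label file and its image root should return an empty list when the data is laid out correctly.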

The provided annotation file has the following format, with the two fields separated by a tab (`\t`):
```
" Image file name             Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
```
The image annotation, encoded with `json.dumps()`, is a list of dictionaries, one per text box. The `points` field of each dictionary holds the (x, y) coordinates of the four corners of the text box, listed clockwise starting from the upper-left corner.
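
As an illustration of this format, the sketch below reads one annotation line with Python's standard `json` module and checks the point order. The helpers `parse_label_line` and `is_clockwise` are hypothetical names, not PaddleOCR's own loader. Because the image y axis points down, a box listed clockwise as seen on screen has a positive signed area under the usual shoelace sum `x1*y2 - x2*y1`.

```python
import json
from typing import List, Tuple

Box = List[List[int]]

def parse_label_line(line: str) -> Tuple[str, List[Box], List[str]]:
    """Split `<image path>\t<json list>` into the path, boxes, and transcriptions."""
    image_path, annotation = line.rstrip("\n").split("\t", 1)
    entries = json.loads(annotation)
    boxes = [entry["points"] for entry in entries]
    texts = [entry["transcription"] for entry in entries]
    return image_path, boxes, texts

def is_clockwise(points: Box) -> bool:
    """True if the points run clockwise in image coordinates (y axis pointing down)."""
    signed_area = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        signed_area += x1 * y2 - x2 * y1
    return signed_area > 0

line = 'ch4_test_images/img_61.jpg\t[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]'
path, boxes, texts = parse_label_line(line)
print(path, texts, is_clockwise(boxes[0]))  # ch4_test_images/img_61.jpg ['MASA'] True
```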

`transcription` holds the text in the current text box; this information is not needed for the text detection task.
To train PaddleOCR on other datasets, build the annotation file in the same format.
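
For a custom dataset, each annotation line can be produced with `json.dumps`, mirroring the format above. This is a minimal sketch, assuming your boxes are already four-point polygons listed clockwise from the upper-left corner; `make_label_line` is a hypothetical helper.

```python
import json
from typing import Sequence

def make_label_line(image_path: str,
                    boxes: Sequence[Sequence[Sequence[int]]],
                    texts: Sequence[str]) -> str:
    """Build one annotation line: `<image path>\t<json.dumps(list of dicts)>`."""
    if len(boxes) != len(texts):
        raise ValueError("boxes and texts must have the same length")
    annotation = [
        {"transcription": text, "points": [list(map(int, point)) for point in box]}
        for box, text in zip(boxes, texts)
    ]
    # ensure_ascii=False keeps non-ASCII transcriptions (e.g. Chinese) readable in the file
    return f"{image_path}\t{json.dumps(annotation, ensure_ascii=False)}"

line = make_label_line("my_images/img_1.jpg",
                       [[[310, 104], [416, 141], [418, 216], [312, 179]]],
                       ["MASA"])
print(line)
```

Writing one such line per image, separated by newlines, yields a label file in the same layout as `train_icdar2015_label.txt`.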


## TRAINING

First, download the pretrained backbone model. The PaddleOCR detection model currently supports two backbones: MobileNetV3 and ResNet50_vd. You can replace the backbone with a model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) as needed.
```shell
cd PaddleOCR/
# Download the pretrained MobileNetV3 backbone
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
# Download the pretrained ResNet50_vd backbone
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar

# Decompress the pretrained model file, taking MobileNetV3 as an example.
# -C sets the output directory; without it, tar treats ./pretrain_models/ as a member name to extract.
tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar -C ./pretrain_models/

# Note: after the backbone weights are decompressed correctly, the folder contains files such as:
# ./pretrain_models/MobileNetV3_large_x0_5_pretrained/
#   └─ conv_last_bn_mean
#   └─ conv_last_bn_offset
#   └─ conv_last_bn_scale
#   └─ conv_last_bn_variance
#   └─ ......
```
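
If the `tar` command is not available (for example on some Windows setups), Python's standard `tarfile` module can perform the same extraction. A minimal sketch, assuming the archive was downloaded to `./pretrain_models/` as above; `extract_pretrained` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path

def extract_pretrained(archive: str, dest: str = "./pretrain_models/") -> None:
    """Extract a downloaded pretrained-weights archive into `dest`."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(path=dest)  # only extract archives from sources you trust
```

Usage: `extract_pretrained("./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar")`.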

**START TRAINING**  
*If you installed the CPU version, set the parameter `use_gpu` to `false` in the configuration file.*
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```

In the above command, `-c` selects the `configs/det/det_mv3_db.yml` configuration file for training.
For a detailed explanation of the configuration file, please refer to [config](./config_en.md).

You can also use `-o` to change training parameters without modifying the yml file. For example, to set the learning rate to 0.0001:
```
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
```
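
Conceptually, each `-o` item is a dotted key path into the nested YAML configuration plus a new value. The sketch below illustrates that idea on a plain Python dictionary; it is a simplified, hypothetical example (`apply_overrides` is not a PaddleOCR function), and the actual parsing in `tools/train.py` may convert values differently.

```python
import ast
from typing import Any, Dict, List

def _parse_value(text: str) -> Any:
    """Convert an override value string into a Python value (simplified)."""
    if text.lower() in ("true", "false"):
        return text.lower() == "true"  # YAML-style booleans
    try:
        return ast.literal_eval(text)  # numbers, lists, quoted strings
    except (ValueError, SyntaxError, TypeError):
        return text  # anything else, e.g. a file path, stays a string

def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply `Section.key=value` overrides to a nested config dict in place."""
    for item in overrides:
        key, _, raw_value = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})  # walk (or create) nested sections
        node[leaf] = _parse_value(raw_value)
    return config

config = {"Global": {"use_gpu": True}, "Optimizer": {"base_lr": 0.001}}
apply_overrides(config, ["Optimizer.base_lr=0.0001", "Global.use_gpu=false"])
print(config["Optimizer"]["base_lr"], config["Global"]["use_gpu"])  # 0.0001 False
```

The point of the design is that only the listed keys change; every other value still comes from the yml file.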

**Load a trained model and continue training**
To load a trained model and continue training, set the parameter `Global.checkpoints` to the path of the model to be loaded.

For example:
```
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` takes priority over `Global.pretrain_weights`: when both parameters are specified, the model specified by `Global.checkpoints` is loaded first. If the model path specified by `Global.checkpoints` is wrong, the model specified by `Global.pretrain_weights` is loaded instead.
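
The loading priority in this note can be summarized as a small decision function. This is a hypothetical sketch (`select_weights` is an illustrative name, and a "wrong" path is approximated as one that does not exist), not the code PaddleOCR runs.

```python
import os
from typing import Callable, Optional

def select_weights(
    checkpoints: Optional[str],
    pretrain_weights: Optional[str],
    path_exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the weights path that would be loaded under the priority rule."""
    if checkpoints and path_exists(checkpoints):
        return checkpoints  # Global.checkpoints wins when it points to a valid model
    return pretrain_weights  # otherwise fall back to Global.pretrain_weights (may be None)
```

Passing `path_exists` as a parameter keeps the rule testable without real model files on disk.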


## EVALUATION

PaddleOCR computes three metrics to evaluate text detection performance: Precision, Recall, and Hmean.
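
As a reminder of how these three numbers relate: Precision is the fraction of predicted boxes that match a ground-truth box, Recall is the fraction of ground-truth boxes that are matched, and Hmean is their harmonic mean. The sketch below computes them from counts. How a prediction is counted as matched (ICDAR-style evaluation typically uses an IoU threshold of 0.5) is an assumption left to the caller, and `detection_metrics` is a hypothetical helper.

```python
from typing import Tuple

def detection_metrics(num_matched: int, num_gt: int, num_det: int) -> Tuple[float, float, float]:
    """Return (precision, recall, hmean) from matched / ground-truth / detected box counts."""
    precision = num_matched / num_det if num_det else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    hmean = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, hmean

# Illustrative counts only, not real evaluation results
p, r, h = detection_metrics(num_matched=80, num_gt=100, num_det=90)
print(f"precision={p:.3f} recall={r:.3f} hmean={h:.3f}")  # precision=0.889 recall=0.800 hmean=0.842
```

Algebraically, Hmean here equals `2 * num_matched / (num_gt + num_det)`, so it drops whenever either missed boxes or false detections increase.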

Run the following command to compute the evaluation metrics. The results are saved to the file specified by `save_res_path` in the configuration file `det_mv3_db.yml`.

When evaluating, set the post-processing parameters `box_thresh=0.6` and `unclip_ratio=1.5`. If you train on a different dataset or with a different model, adjust these two parameters for better results.

```
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
During training, model parameters are saved to the `Global.save_model_dir` directory by default. When evaluating, set `Global.checkpoints` to point to the saved parameter file.

For example:
```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```

* Note: `box_thresh` and `unclip_ratio` are parameters of the DB post-processing and do not need to be set when evaluating the EAST model.
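
To build intuition for these two DB post-processing parameters: `box_thresh` discards a candidate box whose mean probability inside the box is below the threshold, and `unclip_ratio` controls how far a shrunk text region is expanded back out. In the DB paper the expansion offset is D = A × r / L, where A and L are the polygon's area and perimeter and r is the unclip ratio. The pure-Python sketch below computes both quantities; it omits the actual polygon offsetting (which the real post-processing delegates to a polygon-clipping library), and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def polygon_area(points: List[Point]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def polygon_perimeter(points: List[Point]) -> float:
    """Sum of edge lengths, closing the polygon back to the first point."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))

def unclip_distance(points: List[Point], unclip_ratio: float = 1.5) -> float:
    """Offset D = A * r / L used to expand a shrunk text region."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)

def keep_box(scores_in_box: List[float], box_thresh: float = 0.6) -> bool:
    """Keep a candidate box only if its mean probability reaches box_thresh."""
    return sum(scores_in_box) / len(scores_in_box) >= box_thresh

box = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(unclip_distance(box))       # 12.5 for a 100x20 box with ratio 1.5
print(keep_box([0.7, 0.5, 0.8]))  # True (mean is about 0.667)
```

A larger `unclip_ratio` produces looser boxes, and a higher `box_thresh` keeps fewer, more confident boxes.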

## TEST

Test the detection result on a single image:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
```

When testing the DB model, adjust the post-processing thresholds:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```


Test the detection result on all images in the folder:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
```
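
When `TestReader.infer_img` points to a folder, every image inside it is tested. The sketch below shows one way such a path could be expanded into a list of image files. It assumes a fixed set of image extensions, and `IMAGE_EXTS` and `collect_images` are hypothetical names that may not match the filtering actually done in `tools/infer_det.py`.

```python
import os
from typing import List

# Assumed set of image extensions; the real script may accept others.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def collect_images(infer_img: str) -> List[str]:
    """Return [infer_img] for a single file, or the sorted image files in a folder."""
    if os.path.isfile(infer_img):
        return [infer_img]
    return sorted(
        os.path.join(infer_img, name)
        for name in os.listdir(infer_img)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTS
    )
```

For example, `collect_images("./doc/imgs_en/")` would list the images in that folder, while `collect_images("./doc/imgs_en/img_10.jpg")` returns just that one file.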