# Text detection

This section uses the icdar2015 dataset as an example to show how to train, evaluate, and test the detection model in PaddleOCR.

## Data preparation
The icdar2015 dataset can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required before downloading.

Decompress the downloaded dataset into the working directory; the steps below assume it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR consolidates the scattered annotation files into one annotation file for training and one for testing. You can download them with wget:
```
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```

After decompressing the dataset and downloading the annotation files, the dataset directory contains two folders and two files:
```
/PaddleOCR/train_data/icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         Training data of icdar dataset
  └─ ch4_test_images/             Testing data of icdar dataset
  └─ train_icdar2015_label.txt    Training annotation of icdar dataset
  └─ test_icdar2015_label.txt     Test annotation of icdar dataset
```

The label file format provided is:
```
" Image file name                    Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
```
Before json.dumps encoding, the image annotation information is a list of dictionaries, one per text box. In each dictionary, `points` holds the (x, y) coordinates of the four corners of the text box, arranged clockwise starting from the upper-left corner.

`transcription` is the text inside the current text box; the text detection task does not need this information.
If you want to train PaddleOCR on other datasets, you can build the annotation file in the format described above.
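
The label format above can be sketched in Python as follows. This is a minimal illustration, not PaddleOCR code: the helper names `build_label_line` and `parse_label_line` are hypothetical, and the sketch assumes the file name and the JSON string are separated by a single tab. Only the `transcription` and `points` fields shown in the example are used; any other fields are left out.

```python
import json

# Assumption: the image file name and the JSON string are separated by one tab.
SEPARATOR = "\t"


def build_label_line(image_path, boxes):
    """Serialize the annotations of one image into a single label-file line."""
    for box in boxes:
        if len(box["points"]) != 4:
            raise ValueError("each text box needs exactly four (x, y) points")
    return image_path + SEPARATOR + json.dumps(boxes)


def parse_label_line(line):
    """Split a label-file line back into the image path and the list of boxes."""
    image_path, annotation = line.rstrip("\n").split(SEPARATOR, 1)
    return image_path, json.loads(annotation)


boxes = [
    {
        "transcription": "MASA",
        # Four corners, clockwise from the upper-left corner.
        "points": [[310, 104], [416, 141], [418, 216], [312, 179]],
    }
]
line = build_label_line("ch4_test_images/img_61.jpg", boxes)
image_path, parsed = parse_label_line(line)
print(image_path, parsed[0]["points"])
```

Writing one such line per image to a text file yields an annotation file in the same shape as the example above.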


## Quickstart training

First, download the pretrained model. The PaddleOCR detection model currently supports two backbones: MobileNetV3 and ResNet50_vd. You can also replace the backbone with another model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) according to your needs.
```
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
# Download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
```

**Start training**
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```

In the command above, `-c` selects the configuration file used for training, here configs/det/det_mv3_db.yml.
For a detailed explanation of the configuration file, please refer to [link](./doc/config.md).

You can also use the `-o` parameter to change training parameters without modifying the yml file. For example, to set the training learning rate to 0.0001:
```
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
```
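
To illustrate what a dotted override such as `Optimizer.base_lr=0.0001` means, here is a minimal sketch of how `Section.key=value` strings could be merged into a nested configuration dictionary. It is not PaddleOCR's actual implementation: `apply_overrides` is a hypothetical helper, and the starting value of `base_lr` is made up for the example.

```python
import ast


def apply_overrides(config, overrides):
    """Apply "Section.key=value" strings to a nested dict, in place."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Walk down the nested sections, creating them if missing.
            node = node.setdefault(name, {})
        try:
            # Interpret numbers, booleans, lists, and quoted strings.
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Fall back to the raw string, e.g. for unquoted paths.
            value = raw_value
        node[leaf] = value
    return config


# Hypothetical starting configuration, standing in for the loaded yml file.
config = {"Optimizer": {"base_lr": 0.001}}
apply_overrides(config, ["Optimizer.base_lr=0.0001"])
print(config["Optimizer"]["base_lr"])
```

The same dotted notation is used for every `-o` option in the commands of this document, such as `Global.checkpoints` and `PostProcess.box_thresh`.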

## Metric evaluation

PaddleOCR calculates three metrics for OCR detection: Precision, Recall, and Hmean.
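
Hmean is the harmonic mean of Precision and Recall. The sketch below shows how the three values relate, assuming the counts of matched, detected, and ground-truth boxes are already known; how a detection is matched to a ground-truth box is not covered here, and `detection_metrics` is a hypothetical helper rather than a PaddleOCR function.

```python
def detection_metrics(num_matched, num_detected, num_ground_truth):
    """Return (precision, recall, hmean) from box counts."""
    # Precision: fraction of detected boxes that match a ground-truth box.
    precision = num_matched / num_detected if num_detected else 0.0
    # Recall: fraction of ground-truth boxes that were detected.
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    total = precision + recall
    # Hmean: harmonic mean of precision and recall.
    hmean = 2 * precision * recall / total if total else 0.0
    return precision, recall, hmean


# Made-up counts, for illustration only.
precision, recall, hmean = detection_metrics(
    num_matched=80, num_detected=100, num_ground_truth=160
)
print(f"precision={precision:.3f} recall={recall:.3f} hmean={hmean:.3f}")
```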

Run the following command to calculate the evaluation metrics from the test result file specified by `save_res_path` in the configuration file det_mv3_db.yml.

When evaluating, set the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. If you train with a different dataset or a different model, these two parameters can be tuned for better results.

```
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
During training, the model parameters are saved in the Global.save_model_dir directory by default. When evaluating, set Global.checkpoints to point to the saved parameter file.

For example:
```
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```

* Note: box_thresh and unclip_ratio are parameters required for DB post-processing, and do not need to be set when evaluating the EAST model.
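
For intuition about these two parameters, here is a rough sketch. It rests on two assumptions that this document does not state: that `box_thresh` discards candidate boxes whose score falls below it, and that `unclip_ratio` sets how far a detected region is expanded, with an offset of area × unclip_ratio / perimeter as described for DB. All function names are hypothetical, and the actual polygon expansion step is omitted.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    count = len(points)
    for i in range(count):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % count]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    count = len(points)
    return sum(
        math.hypot(
            points[(i + 1) % count][0] - points[i][0],
            points[(i + 1) % count][1] - points[i][1],
        )
        for i in range(count)
    )


def unclip_distance(points, unclip_ratio=1.5):
    """Offset by which a box would be expanded (assumed DB-style rule)."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def filter_by_score(boxes, scores, box_thresh=0.6):
    """Keep only the boxes whose score reaches box_thresh."""
    return [box for box, score in zip(boxes, scores) if score >= box_thresh]


box = [[310, 104], [416, 141], [418, 216], [312, 179]]
print(unclip_distance(box, unclip_ratio=1.5))
print(len(filter_by_score([box, box], [0.9, 0.4], box_thresh=0.6)))
```

Under these assumptions, a larger unclip_ratio yields looser boxes, and a higher box_thresh keeps fewer but more confident boxes.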

## Test detection result

Test the detection result on a single image:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
```

When testing the DB model, you can also adjust the post-processing thresholds:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```


Test the detection results on all images in a folder:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
```
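
The commands above pass either a single file or a folder as `TestReader.infer_img`. The sketch below shows one way such a path could be expanded into a list of images. It is only an illustration: `collect_images` and the extension list are assumptions, not PaddleOCR's own logic, and the demo runs on a temporary folder with empty placeholder files.

```python
import os
import tempfile

# Assumption: which file extensions count as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_images(infer_img):
    """Return a sorted list of image paths for a file or a folder."""
    if os.path.isfile(infer_img):
        return [infer_img]
    if os.path.isdir(infer_img):
        return sorted(
            os.path.join(infer_img, name)
            for name in os.listdir(infer_img)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
    raise FileNotFoundError(f"no such file or folder: {infer_img}")


# Demo on a temporary folder containing two image names and one other file.
with tempfile.TemporaryDirectory() as folder:
    for name in ("img_10.jpg", "img_11.png", "notes.txt"):
        open(os.path.join(folder, name), "wb").close()
    names = [os.path.basename(path) for path in collect_images(folder)]
    print(names)
```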