"builder/manywheel/scripts/install_conda.sh" did not exist on "3df43e8cd83a109c6b4c4c043a1e38c26698ae03"
detection_en.md 6.65 KB
Newer Older
xxxpsyduck's avatar
xxxpsyduck committed
1
# TEXT DETECTION

This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.

## DATA PREPARATION
The icdar2015 dataset can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.

Decompress the downloaded dataset into the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded with wget:
```shell
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```

After decompressing the dataset and downloading the annotation files, PaddleOCR/train_data/ contains two folders and two files:
```
/PaddleOCR/train_data/icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         Training data of icdar dataset
  └─ ch4_test_images/             Testing data of icdar dataset
  └─ train_icdar2015_label.txt    Training annotation of icdar dataset
  └─ test_icdar2015_label.txt     Test annotation of icdar dataset
```
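
As a quick sanity check before training, a short script like the following can verify that the expected folders and annotation files are in place (an illustrative sketch, not part of PaddleOCR; adjust the paths if your layout differs):
```python
import os

# Expected layout after decompression and download (see the listing above)
root = "./train_data/icdar2015/text_localization"
expected = [
    "icdar_c4_train_imgs",
    "ch4_test_images",
    "train_icdar2015_label.txt",
    "test_icdar2015_label.txt",
]

for name in expected:
    path = os.path.join(root, name)
    print(("OK     " if os.path.exists(path) else "MISSING") + ": " + path)
```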

The provided annotation file format is as follows, separated by "\t":
```
" Image file name             Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
The image annotation after **json.dumps()** encoding is a list containing multiple dictionaries.

The `points` in the dictionary are the (x, y) coordinates of the four corners of the text box, arranged clockwise starting from the upper-left corner.

`transcription` is the text of the current text box. **When its content is "###", the text box is invalid and will be skipped during training.**

If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
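For instance, one line of a custom annotation file could be composed with `json.dumps` like this (a minimal sketch; the image path and texts are invented for illustration):
```python
import json

# Two text boxes for one image; points run clockwise from the upper-left corner
labels = [
    {"transcription": "HELLO", "points": [[10, 10], [120, 10], [120, 50], [10, 50]]},
    # An invalid box: "###" marks it to be skipped during training
    {"transcription": "###", "points": [[10, 60], [120, 60], [120, 100], [10, 100]]},
]

# Image file name and annotation are separated by "\t"
line = "my_images/img_001.jpg" + "\t" + json.dumps(labels)
with open("./train_data/my_train_label.txt", "a", encoding="utf-8") as f:
    f.write(line + "\n")
```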


## TRAINING

First download the pretrained model. The detection model of PaddleOCR currently supports 3 backbones, namely MobileNetV3, ResNet18_vd and ResNet50_vd. You can replace the backbone with one of the models in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) according to your needs.
```shell
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
# or, download the pre-trained model of ResNet18_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar
# or, download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar

# Decompress the pretrained model file, taking MobileNetV3 as an example
tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar -C ./pretrain_models/

# Note: After the backbone pretrained weights are decompressed correctly, the folder contains the following files:
./pretrain_models/MobileNetV3_large_x0_5_pretrained/
  └─ conv_last_bn_mean
  └─ conv_last_bn_offset
  └─ conv_last_bn_scale
  └─ conv_last_bn_variance
  └─ ......

```

#### START TRAINING
*If you have installed the CPU version, please set the parameter `use_gpu` to `false` in the configuration file.*
```shell
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml 2>&1 | tee train_det.log
```

In the above command, `-c` selects the `configs/det/det_mv3_db_v1.1.yml` configuration file for training.
For a detailed explanation of the configuration file, please refer to [config](./config_en.md).

You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001:
```shell
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Optimizer.base_lr=0.0001
```

#### LOAD TRAINED MODEL AND CONTINUE TRAINING
If you want to load a trained model and continue training, specify the parameter `Global.checkpoints` as the path of the model to be loaded.

For example:
```shell
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` has a higher priority than `Global.pretrain_weights`, that is, when both parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.


## EVALUATION

PaddleOCR calculates three indicators for evaluating the performance of the OCR detection task: Precision, Recall, and Hmean.
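
Hmean is the harmonic mean of Precision and Recall. As a quick illustration of how the three indicators relate (the counts below are invented):
```python
# Suppose matching detections against ground truth gives these counts
num_correct = 90     # detections matched to a ground-truth box
num_detected = 100   # total detected boxes
num_gt = 120         # total ground-truth boxes

precision = num_correct / num_detected                  # 0.900
recall = num_correct / num_gt                           # 0.750
hmean = 2 * precision * recall / (precision + recall)   # ~0.818
print(precision, recall, hmean)
```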

Run the following command to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_mv3_db_v1.1.yml`.

When evaluating, set the post-processing parameters `box_thresh=0.6` and `unclip_ratio=1.5`. If you train with different datasets or models, these two parameters should be adjusted for better results.

```shell
python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
The model parameters are saved in the `Global.save_model_dir` directory by default during training. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file.

For example:
```shell
python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml  -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```

* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and do not need to be set when evaluating the EAST model.

## TEST

Test the detection result on a single image:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
```

When testing the DB model, adjust the post-processing thresholds:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```


Test the detection result on all images in the folder:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
```