## TEXT ANGLE CLASSIFICATION

### DATA PREPARATION

Please organize the dataset as follows:

The default storage path for training data is `PaddleOCR/train_data/cls`. If you already have a dataset on disk, create a soft link to the dataset directory:

```
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/cls/dataset
```

Please refer to the following to organize your data.

- Training set

First, put the training images in a single folder (`train`), and use a txt file (`cls_gt_train.txt`) to store each image path and its label.

* Note: by default, the image path and the image label are separated by `\t`. Using any other separator will cause a training error.

The labels `0` and `180` indicate that the image is rotated by 0 degrees and 180 degrees, respectively.

```
" Image file name           Image annotation "

train/word_001.jpg   0
train/word_002.jpg   180
```
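
Because a wrong separator only shows up as an error during training, it can help to check the label file beforehand. The following is a minimal sketch, not part of PaddleOCR: the function name `parse_label_lines` is hypothetical, and it assumes only the rules stated above (one `\t` per line, labels `0` or `180`).

```python
from typing import List, Tuple

# The two angle classes described above.
VALID_LABELS = {"0", "180"}


def parse_label_lines(lines: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split label lines into (path, label) pairs and collect problems.

    Hypothetical helper: it only checks the tab separator and the label value.
    """
    samples: List[Tuple[str, str]] = []
    problems: List[str] = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            problems.append(f"line {number}: expected exactly one tab separator")
            continue
        path, label = parts[0].strip(), parts[1].strip()
        if label not in VALID_LABELS:
            problems.append(f"line {number}: unexpected label {label!r}")
            continue
        samples.append((path, label))
    return samples, problems


if __name__ == "__main__":
    demo = [
        "train/word_001.jpg\t0\n",
        "train/word_002.jpg\t180\n",
        "train/word_003.jpg 0\n",  # space instead of tab: reported as a problem
    ]
    samples, problems = parse_label_lines(demo)
    print(samples)
    print(problems)
```

In practice the list of lines would come from reading `cls_gt_train.txt`; any entry in `problems` points at a line to fix before training.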

The final training set should have the following file structure:

```
|-train_data
    |-cls
        |- cls_gt_train.txt
        |- train
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

- Test set

As with the training set, the test set needs a folder containing all images (`test`) and a label file (`cls_gt_test.txt`). The structure of the test set is as follows:

```
|-train_data
    |-cls
        |- cls_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

### TRAINING
Write the paths of the prepared txt file and image folder into the configuration file, under the `label_file_list` and `data_dir` fields of `Train.dataset` and `Eval.dataset`. The absolute path of each image is formed by joining the `data_dir` field with the image name recorded in the txt file.
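
To make the path rule concrete, here is a small sketch of how a label line and `data_dir` could combine into an image path. The helper `resolve_image_path` and the example `data_dir` value are hypothetical and are not PaddleOCR code; the actual loader may differ in details.

```python
import os
from typing import Tuple


def resolve_image_path(data_dir: str, label_line: str) -> Tuple[str, str]:
    """Return (image path, label) for one tab-separated label line."""
    image_name, label = label_line.rstrip("\n").split("\t")
    # The image path is data_dir joined with the name stored in the txt file.
    return os.path.join(data_dir, image_name), label


# Hypothetical value, mirroring the directory layout described earlier.
data_dir = "./train_data/cls"
print(resolve_image_path(data_dir, "train/word_001.jpg\t0"))
```

If the joined path does not exist on disk, the usual cause is that `data_dir` and the names in the label file overlap or leave out a folder level.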

PaddleOCR provides training scripts, evaluation scripts, and prediction scripts.

Start training:

```
# Set PYTHONPATH path
export PYTHONPATH=$PYTHONPATH:.
# GPU training supports both single-card and multi-card setups; specify the card numbers with --gpus.
# Start training. The following command is also written into train.sh; just modify the configuration file path in that file.
python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7'  tools/train.py -c configs/cls/cls_mv3.yml
```

- Data Augmentation

PaddleOCR provides a variety of data augmentation methods. To add perturbations during training, uncomment the `RecAug` and `RandAugment` fields under `Train.dataset.transforms` in the configuration file.

The default perturbation methods are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, color reverse, and RandAugment.

Except for RandAugment, each perturbation method is applied with a 50% probability during training. For the code implementation, please refer to:
[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
[randaugment.py](../../ppocr/data/imaug/randaugment.py)
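
The 50% selection rule can be illustrated with a short sketch. This is not the code in `rec_img_aug.py`: the operations are placeholders that only record their names, and `apply_perturbations` and `make_placeholder_op` are hypothetical helpers. It assumes each perturbation is drawn independently.

```python
import random
from typing import Callable, List, Optional

# Names taken from the list above; the operations themselves are placeholders.
PERTURBATION_NAMES = [
    "cvtColor", "blur", "jitter", "gaussian_noise",
    "random_crop", "perspective", "color_reverse",
]


def make_placeholder_op(name: str) -> Callable[[List[str]], List[str]]:
    """Build a stand-in operation that only records that it was applied."""
    def op(applied: List[str]) -> List[str]:
        return applied + [name]
    return op


def apply_perturbations(sample, ops, prob: float = 0.5,
                        rng: Optional[random.Random] = None):
    """Apply each operation independently with probability `prob`."""
    if rng is None:
        rng = random.Random()
    for op in ops:
        if rng.random() < prob:
            sample = op(sample)
    return sample


ops = [make_placeholder_op(name) for name in PERTURBATION_NAMES]
# A seeded generator makes the selection reproducible between runs.
print(apply_perturbations([], ops, rng=random.Random(0)))
```

With seven independent draws at 50%, a given image receives on average three to four perturbations, and occasionally none at all.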


- Training

PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/cls/cls_mv3.yml` to set the evaluation frequency. By default, the model is evaluated every 1000 iterations. The following files are saved during training:
```bash
├── best_accuracy.pdopt # Optimizer parameters for the best model
├── best_accuracy.pdparams # Parameters of the best model
├── best_accuracy.states # Metric info and epochs of the best model
├── config.yml # Configuration file for this experiment
├── latest.pdopt # Optimizer parameters for the latest model
├── latest.pdparams # Parameters of the latest model
├── latest.states # Metric info and epochs of the latest model
└── train.log # Training log
```

If the evaluation set is large, evaluation will be time-consuming. It is recommended to reduce the number of evaluations, or to evaluate only after training has finished.
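
To estimate how many evaluation passes a run will trigger, the schedule can be sketched as follows. This assumes that evaluation fires whenever the iteration count is a multiple of `eval_batch_step`; the helper `evaluation_steps` is hypothetical, and the actual trainer may apply additional conditions.

```python
from typing import List


def evaluation_steps(total_steps: int, eval_batch_step: int) -> List[int]:
    """Return the iterations at which an evaluation pass would run.

    Assumption: evaluation runs at every multiple of eval_batch_step.
    """
    if eval_batch_step <= 0:
        raise ValueError("eval_batch_step must be positive")
    return list(range(eval_batch_step, total_steps + 1, eval_batch_step))


# Default frequency versus a reduced one, for a hypothetical 5000-iteration run.
print(len(evaluation_steps(5000, 1000)))  # 5 evaluation passes
print(len(evaluation_steps(5000, 2500)))  # 2 evaluation passes
```

Multiplying the number of passes by the time one pass takes on your evaluation set gives a rough idea of the overhead that a given `eval_batch_step` adds.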

**Note that the configuration file for prediction/evaluation must be consistent with the training.**

### EVALUATION

The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/cls/cls_mv3.yml` file.

```
export CUDA_VISIBLE_DEVICES=0
# GPU evaluation; Global.checkpoints is the path to the weights to be evaluated
python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```

### PREDICTION

* Training engine prediction

With a model trained by PaddleOCR, you can quickly get predictions using the following script.

Use `Global.infer_img` to specify the path of the image or folder to predict, and use `Global.pretrained_model` to specify the weights, as in the command below:

```
# Predict the angle of an English word image
python3 tools/infer_cls.py -c configs/cls/cls_mv3.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.load_static_weights=false Global.infer_img=doc/imgs_words_en/word_10.png
```

Input image:

![](../imgs_words_en/word_10.png)

Get the prediction result of the input image:

```
infer_img: doc/imgs_words_en/word_10.png
     result: ('0', 0.9999995)
```
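
The result is a pair of the predicted angle label and its confidence score. One possible way to act on it is sketched below: rotate the image back by 180 degrees only when the classifier is confident. The helper names and the threshold value `0.9` are assumptions made for illustration, and the image is represented as a plain nested list to keep the example self-contained.

```python
from typing import List, Tuple

Image = List[List[int]]


def rotate_180(image: Image) -> Image:
    """Rotate a 2D image by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in image[::-1]]


def correct_orientation(image: Image, result: Tuple[str, float],
                        threshold: float = 0.9) -> Image:
    """Undo a 180-degree rotation when the prediction is confident enough.

    `threshold` is a hypothetical cut-off, not a value taken from this document.
    """
    label, score = result
    if label == "180" and score > threshold:
        return rotate_180(image)
    return image


tiny_image = [[1, 2], [3, 4]]
print(correct_orientation(tiny_image, ("0", 0.9999995)))  # left unchanged
print(correct_orientation(tiny_image, ("180", 0.99)))     # rotated back
```

Leaving low-confidence predictions untouched avoids flipping images that were already upright.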