## Text recognition

### Data preparation

PaddleOCR supports two data formats: `lmdb`, which is used to train on public datasets and to debug algorithms, and `General Data`, which is used to train on your own data.

Please set the dataset as follows:

The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a soft link to the dataset directory:

```
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
```

* Data download

If you do not have a dataset locally, you can download [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) from its official website. You can also follow [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the lmdb-format datasets required by the benchmark.

* Use your own dataset:

If you want to train on your own data, organize it as described below.

- Training set

First put the training images in a single folder (`train_images`), and use a txt file (`rec_gt_train.txt`) to record each image path and its label.

* Note: by default, the image path and the image label must be separated by `\t` (a tab character). Using any other separator will cause a training error.

```
" Image file name           Image annotation "

train_data/train_0001.jpg   简单可依赖
train_data/train_0002.jpg   用科技让复杂的世界更简单
```
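As a sketch of the tab-separated format above, the following Python helpers split each label line on the first tab only, so a label may itself contain spaces. The names `parse_label_line` and `read_label_file` are hypothetical and not part of PaddleOCR; a line without any tab is rejected, mirroring the note that other separators cause training errors.

```python
def parse_label_line(line):
    """Split one label-file line into (image_path, label).

    Only the first tab is treated as the separator, so the label may
    contain spaces. Raises ValueError if the line has no tab at all.
    """
    line = line.rstrip("\r\n")
    if "\t" not in line:
        raise ValueError("expected a tab between path and label: %r" % line)
    image_path, label = line.split("\t", 1)
    return image_path, label


def read_label_file(text):
    """Parse every non-blank line of a label file's contents."""
    return [parse_label_line(ln) for ln in text.splitlines() if ln.strip()]


example = (
    "train_data/train_0001.jpg\t简单可依赖\n"
    "train_data/train_0002.jpg\t用科技让复杂的世界更简单\n"
)
for image_path, label in read_label_file(example):
    print(image_path, "->", label)
```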
PaddleOCR provides label files for training on the icdar2015 dataset, which can be downloaded as follows:

```
# Training set label
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test set label
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
```

The final training set should have the following file structure:

```
|-train_data
    |-ic15_data
        |- rec_gt_train.txt
        |- train
            |- word_001.png
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

- Test set

Similar to the training set, the test set also requires a folder containing all the images (`test`) and a `rec_gt_test.txt` label file. The structure of the test set is as follows:

```
|-train_data
    |-ic15_data
        |- rec_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```
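Before training, it can help to check that every image listed in a label file actually exists. The sketch below assumes each line is `<relative image path>\t<label>` and that the paths are relative to a directory you pass in as `data_root`; `find_missing_images` is a hypothetical helper, so adjust the base directory to match how your label file writes its paths.

```python
import os


def find_missing_images(label_file, data_root):
    """Return image paths listed in label_file that do not exist under data_root.

    Assumes one '<relative image path>\t<label>' entry per line, UTF-8 encoded.
    """
    missing = []
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            rel_path = line.split("\t", 1)[0]
            if not os.path.isfile(os.path.join(data_root, rel_path)):
                missing.append(rel_path)
    return missing
```

For the layout above, and assuming the label paths are relative to `ic15_data`, a call such as `find_missing_images("train_data/ic15_data/rec_gt_test.txt", "train_data/ic15_data")` should return an empty list when the test set is complete.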

- Dictionary

Finally, a dictionary (`{word_dict_name}.txt`) must be provided so that, during training, every character that appears can be mapped to a dictionary index.

Therefore, the dictionary must contain every character that you want the model to recognize correctly. `{word_dict_name}.txt` must be written in the following format and saved with `utf-8` encoding:

```
l
d
a
d
r
n
```

`word_dict.txt` contains a single character per line, which maps each character to a numeric index. With the dictionary above, for example, "and" is mapped to [2 5 1].
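A minimal Python sketch of this mapping, assuming that a character's index is its 0-based line number and that a character listed twice (like `d` above) keeps its first index. Both are assumptions chosen to reproduce the `[2 5 1]` example, not a description of PaddleOCR's internal code; `load_dict`, `load_dict_file`, and `encode` are hypothetical names.

```python
def load_dict(lines):
    """Map each character to its 0-based line index.

    If a character appears on several lines, the first index is kept.
    """
    char_to_idx = {}
    for idx, ch in enumerate(lines):
        char_to_idx.setdefault(ch, idx)
    return char_to_idx


def load_dict_file(path):
    """Read a one-character-per-line dictionary saved as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return load_dict([line.rstrip("\r\n") for line in f])


def encode(text, char_to_idx):
    """Convert text to indices; raises KeyError for any character missing
    from the dictionary, which is why it must cover every character."""
    return [char_to_idx[ch] for ch in text]


example_dict = ["l", "d", "a", "d", "r", "n"]  # the dictionary shown above
print(encode("and", load_dict(example_dict)))  # [2, 5, 1]
```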

`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.

`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters.

You can use them as needed.

To use a custom dictionary, modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.
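If you are building your own dictionary, one way to make sure it covers every character in your labels is to collect those characters from the label text. This is a hypothetical helper sketch, not a PaddleOCR tool: it keeps characters in first-seen order, skips whitespace (an assumption; decide for your own data whether spaces should be recognized), and writes one character per line in UTF-8 as required above.

```python
def build_dict(labels):
    """Return every distinct non-whitespace character in labels,
    in the order it first appears."""
    seen = {}
    for label in labels:
        for ch in label:
            if not ch.isspace():  # assumption: spaces are not recognized
                seen.setdefault(ch, None)
    return list(seen)


def write_dict(chars, path):
    """Write one character per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for ch in chars:
            f.write(ch + "\n")


labels = ["简单可依赖", "用科技让复杂的世界更简单"]
chars = build_dict(labels)
print(len(chars))  # 15
# write_dict(chars, "./ppocr/utils/my_dict.txt")  # hypothetical output path
```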

### Start training

PaddleOCR provides training, evaluation, and prediction scripts. This section uses the CRNN recognition model as an example.

First, download the pretrained model, which you can then fine-tune on the icdar2015 data:

```
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar
# Decompress model parameters
cd pretrain_models
tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar
# Return to the PaddleOCR root directory before training
cd ..
```

Start training:

```
# Set PYTHONPATH path
export PYTHONPATH=$PYTHONPATH:.
# GPU training supports single-card and multi-card training; specify the cards via CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=0,1,2,3
# Train on the icdar15 English data
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml
```

PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency; by default, evaluation runs every 500 iterations. During evaluation, the model with the best accuracy is saved as `output/rec_CRNN/best_accuracy` by default.
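The alternating schedule described above can be pictured as the loop below: train, evaluate every `eval_every` iterations (500 here, matching the documented default), and save a `best_accuracy` checkpoint only when accuracy improves. The hooks `train_step`, `evaluate`, and `save` are hypothetical callables supplied by the caller; this is a conceptual sketch, not PaddleOCR's training loop. It also shows why a large validation set is costly: evaluation runs `total_iters // eval_every` times.

```python
def train_with_periodic_eval(train_step, evaluate, save, total_iters, eval_every=500):
    """Alternate training and evaluation, keeping the best checkpoint.

    train_step(it) runs one iteration, evaluate() returns an accuracy,
    and save(name) stores a checkpoint; all three are hypothetical hooks.
    Returns the best accuracy seen.
    """
    best_acc = float("-inf")
    for it in range(1, total_iters + 1):
        train_step(it)
        if it % eval_every == 0:
            acc = evaluate()
            if acc > best_acc:  # only an improvement triggers a save
                best_acc = acc
                save("best_accuracy")
    return best_acc
```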

If the validation set is large, evaluation will be time-consuming. In that case, it is recommended to reduce the evaluation frequency or to evaluate only after training.

* Tip: You can use the `-c` parameter to select any of the model configurations under `configs/rec/` for training. The recognition algorithms supported by PaddleOCR are:

| Configuration file | Algorithm | Backbone | Transformation | Sequence | Prediction |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: |
| rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc |
| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention |
| rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc |
| rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc |
| rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention |
| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
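Most configuration names in the table follow a `rec_<backbone>_<trans>_<seq>_<pred>.yml` pattern that encodes the table's columns. The sketch below decodes such a name; the pattern is inferred from the table (the backbone may itself contain an underscore, as in `r34_vd`), and names outside it, such as `rec_icdar15_train.yml`, return `None`. `parse_rec_config_name` is a hypothetical helper.

```python
def parse_rec_config_name(filename):
    """Split a name like 'rec_mv3_tps_bilstm_attn.yml' into its parts.

    The last three underscore-separated fields are taken as the
    transformation, sequence, and prediction modules; everything between
    'rec' and those fields is the backbone. Returns None otherwise.
    """
    stem = filename[:-4] if filename.endswith(".yml") else filename
    parts = stem.split("_")
    if len(parts) < 5 or parts[0] != "rec":
        return None
    *backbone, trans, seq, pred = parts[1:]
    if pred not in ("ctc", "attn"):
        return None
    return {"backbone": "_".join(backbone), "trans": trans, "seq": seq, "pred": pred}
```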

For training on Chinese data, `rec_chinese_lite_train.yml` is recommended. If you want to try other algorithms on a Chinese dataset, modify the configuration file as described below.

Take `rec_mv3_none_none_ctc.yml` as an example:
```
Global:
  ...
  # Modify image_shape to fit long text
  image_shape: [3, 32, 320]
  ...
  # Modify character type
  character_type: ch
  # Add a custom dictionary; to use a different dictionary, point this path to the new dictionary file
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
  ...
  # Modify reader type
  reader_yml: ./configs/rec/rec_chinese_reader.yml
  ...

...
```
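To see why `image_shape` is widened for long text, consider an aspect-ratio-preserving resize to height 32 whose width is capped at the configured width. The sketch below is an illustration under that assumption, not PaddleOCR's exact preprocessing, and `resized_width` is a hypothetical helper: a 48×600 text line would need a width of 400, so a cap of 100 squeezes it to a quarter of its natural width, while a cap of 320 keeps most of it.

```python
import math


def resized_width(h, w, target_h=32, max_w=320):
    """Width of an (h, w) image after scaling it to height target_h with
    its aspect ratio preserved, capped at max_w (narrower results would
    be padded up to max_w; wider ones are squeezed)."""
    ratio = w / h
    return min(max_w, math.ceil(target_h * ratio))


print(resized_width(48, 600, max_w=100))  # 100
print(resized_width(48, 600, max_w=320))  # 320
```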
**Note that the configuration file used for prediction/evaluation must be consistent with the one used for training.**

### Evaluation

The evaluation dataset can be changed via the `label_file_path` setting of `EvalReader` in `configs/rec/rec_icdar15_reader.yml`.

```
export CUDA_VISIBLE_DEVICES=0
# GPU evaluation; Global.checkpoints specifies the weights to be tested
python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```
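Options passed with `-o`, such as `Global.checkpoints=...`, use dotted keys that address nested sections of the YAML configuration. The sketch below shows that idea on a plain Python dict; `apply_override` is hypothetical, and unlike this simplified version, which keeps every value as a string, the real tool may convert values to other types.

```python
def apply_override(config, override):
    """Apply one 'Section.key=value' override to a nested dict in place.

    Intermediate sections are created if missing; only the first '='
    separates the key path from the value.
    """
    key_path, value = override.split("=", 1)
    keys = key_path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


cfg = {"Global": {"character_type": "ch"}}
apply_override(cfg, "Global.checkpoints=output/rec_CRNN/best_accuracy")
print(cfg["Global"])
```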

### Prediction

* Training engine prediction

Models trained with PaddleOCR can be used for quick prediction with the following script.

The image to predict is set with `TestReader.infer_img`, and the weights are specified via `-o Global.checkpoints`:

```
# Predict English results
python3 tools/infer_rec.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.png
```

Input image:

![](./imgs_words/en/word_1.png)

The prediction result for the input image:

```
infer_img: doc/imgs_words/en/word_1.png
     index: [19 24 18 23 29]
     word : joint
```
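The `index` line lists character indices and `word` is the text they map to. For a CTC model, a minimal sketch of how such indices can be produced and turned into text is greedy CTC decoding: merge repeated per-frame predictions, drop the blank, then look each index up in the dictionary. This assumes `ppocr/utils/ic15_dict.txt` lists the digits 0 to 9 followed by the letters a to z (consistent with `joint` decoding to `[19 24 18 23 29]`), and uses 36, one past the last character, as a hypothetical blank index; the per-frame sequence in the example is made up for illustration.

```python
IC15_CHARS = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed content/order of ic15_dict.txt


def ctc_greedy_decode(frame_ids, blank):
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks.

    A blank between two identical ids keeps both (e.g. [1, blank, 1] -> [1, 1]).
    """
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out


def ids_to_text(ids, chars=IC15_CHARS):
    """Look each index up in the character list."""
    return "".join(chars[i] for i in ids)


frames = [19, 19, 36, 24, 18, 18, 36, 36, 23, 29, 29]  # hypothetical per-frame argmax
ids = ctc_greedy_decode(frames, blank=36)
print(ids, ids_to_text(ids))  # [19, 24, 18, 23, 29] joint
```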

The configuration file used for prediction must be consistent with the one used for training. For example, if you trained the Chinese model with `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml`, you can use the following command to run prediction with it.

```
# Predict Chinese results
python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg
```

Input image:

![](./imgs_words/ch/word_1.jpg)

The prediction result for the input image:

```
infer_img: doc/imgs_words/ch/word_1.jpg
     index: [2092  177  312 2503]
     word : 韩国小馆
```