# Configuration

- [1. Optional Parameter List](#1-optional-parameter-list)
- [2. Introduction to Global Parameters of Configuration File](#2-introduction-to-global-parameters-of-configuration-file)
- [3. Multilingual Config File Generation](#3-multilingual-config-file-generation)

<a name="1-optional-parameter-list"></a>

## 1. Optional Parameter List

The following parameter list can be viewed through `--help`:

|         FLAG             |     Supported script    |        Use        |      Defaults       |         Note         |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
|          -c              |      ALL       |  Specify the configuration file to use  |  None  |  **Please refer to the parameter introduction below for configuration file usage** |
|          -o              |      ALL       |  Set configuration options  |  None  |  Options set with -o have higher priority than the configuration file selected with -c, e.g. `-o Global.use_gpu=false` |

<a name="2-introduction-to-global-parameters-of-configuration-file"></a>

## 2. Introduction to Global Parameters of Configuration File

Take `rec_chinese_lite_train_v2.0.yml` as an example:

### Global

|         Parameter             |            Use                |      Defaults       |            Note            |
| :----------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      use_gpu             |    Set whether to use GPU           |       true        |                \                 |
|      epoch_num           |    Maximum number of training epochs             |       500        |                \                 |
|      log_smooth_window   |    Log queue length; the median value of the queue is printed each time           |       20          |                \                 |
|      print_batch_step    |    Set print log interval         |       10          |                \                 |
|      save_model_dir      |    Set model save path        |  output/{algorithm_name}  |                \                 |
|      save_epoch_step     |    Set model save interval        |       3           |                \                 |
|      eval_batch_step     |    Set the model evaluation interval        | 2000 or [1000, 2000]        | 2000 means evaluation is run every 2000 iterations; [1000, 2000] means evaluation is run every 2000 iterations starting from the 1000th iteration   |
|      cal_metric_during_train     |    Set whether to evaluate the metric during training; the metric is computed on the current batch only        |       true         |                \                 |
|      load_static_weights     |   Set whether the pre-trained model was saved in static graph mode (currently only required by detection algorithms)        |       true         |                \                 |
|      pretrained_model    |    Set the path of the pre-trained model      |  ./pretrain_models/CRNN/best_accuracy  |  \          |
|      checkpoints         |    Set the model checkpoint path            |       None        |   Used to resume training from saved parameters after an interruption |
|      use_visualdl  |    Set whether to enable visualdl for visual log display |          False        |    [Tutorial](https://www.paddlepaddle.org.cn/paddle/visualdl) |
|      use_wandb     |    Set whether to enable W&B for visual log display      | False | [Documentation](https://docs.wandb.ai/) |
|      infer_img            |    Set inference image path or folder path     |       ./infer_img | \ |
|      character_dict_path |    Set dictionary path            |  ./ppocr/utils/ppocr_keys_v1.txt  | If character_dict_path is None, the model can only recognize digits and lowercase letters |
|      max_text_length     |    Set the maximum length of text        |       25          |                \                 |
|      use_space_char     |    Set whether to recognize spaces             |        True      |          \               |
|      label_list          |    Set the angles supported by the direction classifier       |    ['0','180']    |     Only valid for the angle classifier model |
|      save_res_path          |    Set the save path for test results       |    ./output/det_db/predicts_db.txt    |     Only valid for the text detection model |
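
Assembled from the defaults above, the `Global` section of a recognition config looks roughly like the following sketch (the `save_model_dir` value is illustrative, and unused keys can be omitted):

```yaml
Global:
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_chinese_lite_v2.0  # illustrative path
  save_epoch_step: 3
  eval_batch_step: [1000, 2000]  # evaluate every 2000 iters after the 1000th
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/CRNN/best_accuracy
  checkpoints:                   # leave empty unless resuming training
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
  max_text_length: 25
  use_space_char: true
```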

### Optimizer ([ppocr/optimizer](../../ppocr/optimizer))

|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      name        |         Optimizer class name          |  Adam  |  Currently supports `Momentum`, `Adam`, `RMSProp`, see [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py)  |
|      beta1           |    Set the exponential decay rate for the 1st moment estimates  |       0.9         |               \             |
|      beta2           |    Set the exponential decay rate for the 2nd moment estimates  |     0.999         |               \             |
|      clip_norm           |    The maximum norm value for gradient clipping  |    -         |               \             |
|      **lr**                |         Set the learning rate decay method       |   -    |       \  |
|        name    |      Learning rate decay class name   |         Cosine       | Currently supports `Linear`, `Cosine`, `Step`, `Piecewise`, see [ppocr/optimizer/learning_rate.py](../../ppocr/optimizer/learning_rate.py) |
|        learning_rate      |    Set the base learning rate        |       0.001      |  \        |
|      **regularizer**      |  Set the network regularization method        |       -      | \        |
|        name      |    Regularizer class name      |       L2     |  Currently supports `L1`, `L2`, see [ppocr/optimizer/regularizer.py](../../ppocr/optimizer/regularizer.py)        |
|        factor      |    Regularization coefficient       |       0.00004     |  \        |
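
Combined into a config, these parameters form an `Optimizer` section such as the following sketch, built from the defaults listed above:

```yaml
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: L2
    factor: 0.00004
```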


### Architecture ([ppocr/modeling](../../ppocr/modeling))
In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck and Head.

|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      model_type        |         Network type          |  rec  |  Currently supports `rec`, `det`, `cls`  |
|      algorithm           |    Model name  |       CRNN         |               See [algorithm_overview](./algorithm_overview_en.md) for the support list             |
|      **Transform**           |    Set the transformation method  |       -       |               Currently only supported by recognition algorithms, see [ppocr/modeling/transforms](../../ppocr/modeling/transforms) for details            |
|        name    |      Transformation class name   |         TPS       | Currently supports `TPS` |
|        num_fiducial      |   Number of TPS control points        |       20      |  Ten along the top edge and ten along the bottom       |
|        loc_lr      |    Localization network learning rate        |       0.1      |  \      |
|        model_name      |    Localization network size        |       small      |  Currently supports `small`, `large`       |
|      **Backbone**      |  Set the network backbone class name        |       -      | See [ppocr/modeling/backbones](../../ppocr/modeling/backbones)        |
|        name      |    Backbone class name       |       ResNet     | Currently supports `MobileNetV3`, `ResNet`        |
|        layers      |    ResNet layers       |       34     |  Currently supports 18, 34, 50, 101, 152, 200       |
|        model_name      |    MobileNetV3 network size       |       small     |  Currently supports `small`, `large`       |
|      **Neck**      |  Set the network neck        |       -      | See [ppocr/modeling/necks](../../ppocr/modeling/necks)        |
|        name      |    Neck class name       |       SequenceEncoder     | Currently supports `SequenceEncoder`, `DBFPN`        |
|        encoder_type      |    SequenceEncoder encoder type       |       rnn     |  Currently supports `reshape`, `fc`, `rnn`       |
|        hidden_size      |   Number of internal RNN units       |       48     |  \      |
|        out_channels      |   Number of DBFPN output channels       |       256     |  \      |
|      **Head**      |  Set the network head        |       -      | See [ppocr/modeling/heads](../../ppocr/modeling/heads)        |
|        name      |    Head class name       |       CTCHead     | Currently supports `CTCHead`, `DBHead`, `ClsHead`        |
|        fc_decay      |    CTCHead regularization coefficient       |       0.0004     |  \      |
|        k      |   DBHead binarization coefficient       |       50     |  \      |
|        class_dim      |   Number of ClsHead output classes       |       2     |  \      |
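
For a CRNN-style recognition model, these stages combine into an `Architecture` section like the following sketch (values taken from the defaults above; `Transform` is omitted because CRNN does not use TPS):

```yaml
Architecture:
  model_type: rec
  algorithm: CRNN
  Backbone:
    name: MobileNetV3
    model_name: small
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.0004
```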


### Loss ([ppocr/losses](../../ppocr/losses))

|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      name        |         Loss class name          |  CTCLoss  |  Currently supports `CTCLoss`, `DBLoss`, `ClsLoss`  |
|      balance_loss        |        Whether to balance the number of positive and negative samples in DBLoss (using OHEM)         |  True  |  \  |
|      ohem_ratio        |        The negative-to-positive sample ratio of OHEM in DBLoss         |  3  |  \  |
|      main_loss_type        |        The loss used for shrink_map in DBLoss        |  DiceLoss  |  Currently supports `DiceLoss`, `BCELoss`  |
|      alpha        |        The coefficient of shrink_map_loss in DBLoss       |  5  |  \  |
|      beta        |        The coefficient of threshold_map_loss in DBLoss       |  10  |  \  |
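
A recognition config typically only needs `name: CTCLoss`, while a detection config using `DBLoss` combines the remaining parameters, roughly as in this sketch built from the defaults above:

```yaml
Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3
```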

### PostProcess ([ppocr/postprocess](../../ppocr/postprocess))

|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      name        |         Post-processing class name          |  CTCLabelDecode  |  Currently supports `CTCLabelDecode`, `AttnLabelDecode`, `DBPostProcess`, `ClsPostProcess`  |
|      thresh        |        The binarization threshold for the segmentation map in DBPostProcess         |  0.3  |  \  |
|      box_thresh        |        The threshold for filtering output boxes in DBPostProcess; boxes scoring below it are not output         |  0.7  |  \  |
|      max_candidates        |        The maximum number of text boxes output by DBPostProcess        |  1000  |  \  |
|      unclip_ratio        |        The unclip ratio of text boxes in DBPostProcess       |  2.0  |  \  |
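
For text detection, these parameters combine into a `PostProcess` section such as this sketch using the defaults above (a recognition config would instead use just `name: CTCLabelDecode`):

```yaml
PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.7
  max_candidates: 1000
  unclip_ratio: 2.0
```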

### Metric ([ppocr/metrics](../../ppocr/metrics))

|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      name        |         Metric class name          |  RecMetric  |  Currently supports `DetMetric`, `RecMetric`, `ClsMetric`  |
|      main_indicator        |        Main indicator, used to select the best model        |  acc |  hmean for detection, acc for recognition and classification  |
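
In YAML form, a recognition config's `Metric` section is as short as the following sketch:

```yaml
Metric:
  name: RecMetric
  main_indicator: acc  # hmean would be used with DetMetric for detection
```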

### Dataset ([ppocr/data](../../ppocr/data))

|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      **dataset**        |         Return one sample per iteration          |  -  |  -  |
|      name        |        Dataset class name         |  SimpleDataSet |   Currently supports `SimpleDataSet`, `LMDBDataSet`  |
|      data_dir        |        Image folder path        |  ./train_data |  \  |
|      label_file_list        |        Ground-truth file path         |  ["./train_data/train_list.txt"] | Not required when the dataset is LMDBDataSet   |
|      ratio_list        |        Sampling ratio for each dataset         |  [1.0] | If label_file_list contains two train lists and ratio_list is [0.4,0.6], 40% of samples are drawn from train_list1 and 60% from train_list2 to build the full dataset   |
|      transforms        |        List of methods to transform images and labels         |  [DecodeImage,CTCLabelEncode,RecResizeImg,KeepKeys] |   See [ppocr/data/imaug](../../ppocr/data/imaug)  |
|      **loader**        |        DataLoader settings         |  - | \  |
|      shuffle        |        Whether to shuffle the dataset at each epoch         |  True | \  |
|      batch_size_per_card        |        Per-card batch size during training         |  256 | \  |
|      drop_last        |        Whether to drop the last incomplete mini-batch when the dataset size is not divisible by the batch size        |  True | \  |
|      num_workers        |        Number of subprocesses used to load data; if 0, data is loaded in the main process       |  8 | \  |
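
Putting the `dataset` and `loader` parameters together, a `Train` section looks roughly like this sketch (each transform normally takes its own arguments, omitted here for brevity; the `Eval` section has the same structure):

```yaml
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data
    label_file_list: ["./train_data/train_list.txt"]
    ratio_list: [1.0]
    transforms:          # per-transform arguments omitted for brevity
      - DecodeImage:
      - CTCLabelEncode:
      - RecResizeImg:
      - KeepKeys:
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8
```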

### Weights & Biases ([W&B](../../ppocr/utils/loggers/wandb_logger.py))

|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|          project              |     Project to which the run is logged | uncategorized | \ |
|          name                 |     Alias/name of the run | Randomly generated by wandb | \ |
|          id                   |     ID of the run    | Randomly generated by wandb     | \ |
|          entity               | User or team to which the run is logged         | The logged-in user | \ |
|          save_dir             | Local directory in which all models and other data are saved | wandb | \ |
|          config               | Model configuration | None | \ |
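
Assuming these options live under a top-level `wandb` key, paired with `Global.use_wandb: True`, a minimal sketch might look like this (the project, run, and entity names are hypothetical):

```yaml
Global:
  use_wandb: True

wandb:
  project: my_ocr_project   # hypothetical project name
  name: crnn_baseline       # hypothetical run name
  entity: my_team           # hypothetical user or team
  save_dir: ./wandb_logs
```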


<a name="3-multilingual-config-file-generation"></a>

## 3. Multilingual Config File Generation

PaddleOCR currently supports recognition for 80 languages (besides Chinese). A multi-language configuration file template is
provided under the path `configs/rec/multi_language`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)

There are two ways to create the required configuration file:

1. Automatically generated by script

Script [generate_multi_language_configs.py](../../configs/rec/multi_language/generate_multi_language_configs.py) can help you generate configuration files for multi-language models.

- Take Italian as an example, if your data is prepared in the following format:
    ```
    |-train_data
        |- it_train.txt # train_set label
        |- it_val.txt # val_set label
        |- data
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
    ```

    You can use the default parameters to generate a configuration file:

    ```bash
    # The code needs to be run in the specified directory
    cd PaddleOCR/configs/rec/multi_language/
    # Set the configuration file of the language to be generated through the -l or --language parameter.
    # This command will write the default parameters into the configuration file
    python3 generate_multi_language_configs.py -l it
    ```

- If your data is placed in another location, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters:

    ```bash
    # -l or --language field is required
    # --train to modify the training set
    # --val to modify the validation set
    # --data_dir to modify the data set directory
    # --dict to modify the dict path
    # -o to modify the corresponding default parameters
    cd PaddleOCR/configs/rec/multi_language/
    python3 generate_multi_language_configs.py -l it \
        --train {path/of/train_label.txt} \
        --val {path/of/val_label.txt} \
        --data_dir {train_data/path} \
        --dict {path/of/dict} \
        -o Global.use_gpu=False
    ```
Italian uses the Latin alphabet, so after executing the command you will get `rec_latin_lite_train.yml`.

2. Manually modify the configuration file

   You can also manually modify the following fields in the template:

   ```
    Global:
      use_gpu: True
      epoch_num: 500
      ...
      character_dict_path:  {path/of/dict} # path of dict

   Train:
      dataset:
        name: SimpleDataSet
        data_dir: train_data/ # root directory of training data
        label_file_list: ["./train_data/train_list.txt"] # train label path
      ...

   Eval:
      dataset:
        name: SimpleDataSet
        data_dir: train_data/ # root directory of val data
        label_file_list: ["./train_data/val_list.txt"] # val label path
      ...

   ```


Currently, the multi-language algorithms supported by PaddleOCR are:

| Configuration file |  Algorithm name |   backbone |   trans   |   seq      |     pred     |  language |
| :--------: |  :-------:   | :-------:  |   :-------:   |   :-----:   |  :-----:   | :-----:  |
| rec_chinese_cht_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Chinese traditional  |
| rec_en_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | English (case sensitive)   |
| rec_french_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | French |
| rec_ger_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | German   |
| rec_japan_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Japanese |
| rec_korean_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Korean  |
| rec_latin_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Latin  |
| rec_arabic_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Arabic |
| rec_cyrillic_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Cyrillic   |
| rec_devanagari_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Devanagari  |

For more supported languages, please refer to : [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations)

The multi-language model training method is the same as for the Chinese model. The training set consists of 1 million synthetic images. A small amount of fonts and test data can be downloaded via either of the following links.
* [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA), extraction code: frgi.
* [Google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)