---
comments: true
---

# Add New Algorithm

PaddleOCR decomposes an algorithm into the following parts and modularizes each one, which makes it easier to develop new algorithms.

* Data loading and processing
* Network
* Post-processing
* Loss
* Metric
* Optimizer

The sections below introduce each part in turn and explain how to add the modules a new algorithm requires.

## Data loading and processing

Data loading and processing is composed of separate modules that handle image reading, data augmentation and label production. This part is under [ppocr/data](../../ppocr/data). Each file and folder is explained below:

```bash linenums="1"
ppocr/data/
├── imaug             # Scripts for image reading, data augmentation and label production
│   ├── label_ops.py  # Modules that transform the label
│   ├── operators.py  # Modules that transform the image
│   ├──.....
├── __init__.py
├── lmdb_dataset.py   # The dataset that reads the lmdb
└── simple_dataset.py # The dataset that reads samples saved in the form of `image_path\tgt`
```

PaddleOCR has a large number of built-in image operation modules. For operations that are not built in, you can add your own module with the following steps:

1. Create a new file under the [ppocr/data/imaug](../../ppocr/data/imaug) folder, such as my_module.py.
2. Add code to my_module.py. Sample code:

    ```python linenums="1"
    class MyModule:
        def __init__(self, *args, **kwargs):
            # your init code
            pass

        def __call__(self, data):
            img = data['image']
            label = data['label']
            # your process code

            data['image'] = img
            data['label'] = label
            return data
    ```

3. Import the added module in the [ppocr/data/imaug/\__init\__.py](../../ppocr/data/imaug/__init__.py) file.
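
As a concrete illustration of the template above, here is a minimal sketch of a label-only operator. `LowercaseLabel` and its `strip` parameter are hypothetical names invented for this example and are not part of PaddleOCR. The sketch also assumes that `data['label']` is a plain string at this point in the pipeline.

```python
class LowercaseLabel:
    """Hypothetical operator: normalizes the text label, leaves the image untouched."""

    def __init__(self, strip=True, **kwargs):
        # **kwargs absorbs any extra keys passed in from the config file
        self.strip = strip

    def __call__(self, data):
        label = data['label']
        if self.strip:
            label = label.strip()
        data['label'] = label.lower()
        return data


sample = {'image': b'raw-bytes-placeholder', 'label': '  Hello OCR '}
result = LowercaseLabel()(sample)
print(result['label'])  # prints: hello ocr
```

Because the operator receives and returns the same `data` dict, it can be placed anywhere in the processing list as long as the keys it reads have already been produced by an earlier module.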

The data processing modules are listed in the config file and executed in sequence, in the order in which they appear. For example:

```yaml linenums="1"
# angle class data process
transforms:
  - DecodeImage: # load image
      img_mode: BGR
      channel_first: False
  - MyModule:
      args1: args1
      args2: args2
  - KeepKeys:
      keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
```
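
The sketch below shows one way such a list could be turned into operators and applied in order. It is a simplified stand-in, not PaddleOCR's actual loader: the `REGISTRY` dict, the helpers `create_operators` and `transform`, and the two toy operators are names assumed for this example. It also assumes that an operator may return `None` to signal that a sample should be dropped.

```python
class AddOne:
    """Toy operator: increments the 'image' value (stands in for real image processing)."""

    def __init__(self, **kwargs):
        pass

    def __call__(self, data):
        data['image'] = data['image'] + 1
        return data


class KeepKeys:
    """Toy operator: returns the selected values as a list, in the configured order."""

    def __init__(self, keep_keys, **kwargs):
        self.keep_keys = keep_keys

    def __call__(self, data):
        return [data[key] for key in self.keep_keys]


# Hypothetical name-to-class lookup, assumed here for illustration only.
REGISTRY = {'AddOne': AddOne, 'KeepKeys': KeepKeys}


def create_operators(op_param_list):
    """Build one operator per single-key entry of the `transforms` list."""
    ops = []
    for entry in op_param_list:
        assert isinstance(entry, dict) and len(entry) == 1, 'each entry must have one key'
        name, params = next(iter(entry.items()))
        # An entry without arguments parses to None, so fall back to an empty dict.
        ops.append(REGISTRY[name](**(params or {})))
    return ops


def transform(data, ops):
    """Apply the operators in order; stop early if one of them drops the sample."""
    for op in ops:
        data = op(data)
        if data is None:
            return None
    return data


# The parsed form of a YAML `transforms` list: a list of single-key dicts.
config = [
    {'AddOne': None},
    {'AddOne': None},
    {'KeepKeys': {'keep_keys': ['image', 'label']}},
]
output = transform({'image': 0, 'label': 'text'}, create_operators(config))
print(output)  # prints: [2, 'text']
```

The order of the list matters: `KeepKeys` comes last because it replaces the dict with a plain list, so any operator placed after it could no longer look up `data['image']`.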

## Network

The network part builds the model. PaddleOCR divides the network into four parts, all of which are under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms -> backbones -> necks -> heads).

```bash linenums="1"
├── architectures # Code for building network
├── transforms    # Image Transformation Module
├── backbones     # Feature extraction module
├── necks         # Feature enhancement module
└── heads         # Output module
```

PaddleOCR has built-in modules for commonly used algorithms such as DB, EAST, SAST, CRNN and Attention. For modules that are not built in, you can add your own with the following steps. All four parts are added in the same way; backbones are used as the example here:

1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
2. Add code to my_backbone.py. Sample code:

    ```python linenums="1"
    import paddle
    import paddle.nn as nn
    import paddle.nn.functional as F


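    class MyBackbone(nn.Layer):
        def __init__(self, in_channels=3, **kwargs):
            super().__init__()
            # your init code; a single 3x3 convolution stands in for a real backbone
            self.conv = nn.Conv2D(in_channels, 32, kernel_size=3, padding=1)
            # assumption: later stages read the channel count from this attribute
            self.out_channels = 32

        def forward(self, inputs):
            # your network forward
            y = self.conv(inputs)
            return y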
    ```

3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.

After adding the modules for the four parts of the network, you only need to reference them in the configuration file, for example:

```yaml linenums="1"
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
    name: MyTransform
    args1: args1
    args2: args2
  Backbone:
    name: MyBackbone
    args1: args1
  Neck:
    name: MyNeck
    args1: args1
  Head:
    name: MyHead
    args1: args1
```
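
The following sketch illustrates how a configuration like the one above could be assembled into a chain of stages. Everything in it is an assumption made for illustration: the stub classes, the `MODULES` lookup, the `build_module` and `build_model` helpers, and the convention that each stage receives `in_channels` and exposes `out_channels` so that consecutive stages can be connected. No Paddle layers are used and the `Transform` stage is left out, so the snippet only demonstrates the wiring.

```python
import copy


class MyBackbone:
    def __init__(self, in_channels, width=64, **kwargs):
        self.in_channels = in_channels
        self.out_channels = width


class MyNeck:
    def __init__(self, in_channels, hidden_size=48, **kwargs):
        self.in_channels = in_channels
        self.out_channels = hidden_size


class MyHead:
    def __init__(self, in_channels, num_classes=10, **kwargs):
        self.in_channels = in_channels
        self.out_channels = num_classes


# Hypothetical name-to-class lookup, assumed here for illustration only.
MODULES = {'MyBackbone': MyBackbone, 'MyNeck': MyNeck, 'MyHead': MyHead}


def build_module(config, in_channels):
    """Instantiate the class named by `name`, passing the remaining keys as kwargs."""
    config = copy.deepcopy(config)  # do not mutate the caller's config
    name = config.pop('name')
    return MODULES[name](in_channels=in_channels, **config)


def build_model(arch_config, in_channels=3):
    """Build the stages in order, feeding each stage's output width to the next."""
    stages = []
    for key in ('Backbone', 'Neck', 'Head'):
        if arch_config.get(key) is None:
            continue  # a missing stage is simply skipped
        stage = build_module(arch_config[key], in_channels)
        in_channels = stage.out_channels
        stages.append(stage)
    return stages


architecture = {
    'model_type': 'rec',
    'algorithm': 'CRNN',
    'Backbone': {'name': 'MyBackbone', 'width': 32},
    'Neck': {'name': 'MyNeck'},
    'Head': {'name': 'MyHead', 'num_classes': 37},
}
stages = build_model(architecture)
print([(type(s).__name__, s.in_channels, s.out_channels) for s in stages])
# prints: [('MyBackbone', 3, 32), ('MyNeck', 32, 48), ('MyHead', 48, 37)]
```

Passing the channel count from stage to stage means that swapping in a different backbone does not require editing the neck or head configuration by hand.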

## Post-processing

Post-processing decodes the network output to obtain text boxes or recognized text. This part is under [ppocr/postprocess](../../ppocr/postprocess).
PaddleOCR has built-in post-processing modules for algorithms such as DB, EAST, SAST, CRNN and Attention. For components that are not built in, you can add your own with the following steps:

1. Create a new file under the [ppocr/postprocess](../../ppocr/postprocess) folder, such as my_postprocess.py.
2. Add code to my_postprocess.py. Sample code:

    ```python linenums="1"
    import paddle


    class MyPostProcess:
        def __init__(self, *args, **kwargs):
            # your init code
            pass

        def __call__(self, preds, label=None, *args, **kwargs):
            if isinstance(preds, paddle.Tensor):
                preds = preds.numpy()
            # your preds decode code
            preds = self.decode_preds(preds)
            if label is None:
                return preds
            # your label decode code
            label = self.decode_label(label)
            return preds, label

        def decode_preds(self, preds):
            # your preds decode code (identity placeholder)
            return preds

        def decode_label(self, label):
            # your label decode code (identity placeholder)
            return label
    ```

3. Import the added module in the [ppocr/postprocess/\__init\__.py](../../ppocr/postprocess/__init__.py) file.
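
For a CTC-style recognizer such as CRNN, the decode step typically has to turn per-timestep class indices into text. The following is a minimal sketch of greedy CTC decoding under stated assumptions: index 0 is the blank symbol, `preds` is already a list of per-timestep index lists (argmax has been applied), and the `character` alphabet is made up. The class name `GreedyCTCDecode` is hypothetical, and the code does not reproduce PaddleOCR's own decoder.

```python
class GreedyCTCDecode:
    """Hypothetical post-process: collapse repeated indices, then drop blanks."""

    BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol

    def __init__(self, character='abc', **kwargs):
        # Characters start at index 1 because index 0 is the blank.
        self.table = [''] + list(character)

    def __call__(self, preds, label=None, *args, **kwargs):
        texts = [self.decode(seq) for seq in preds]
        if label is None:
            return texts
        return texts, label

    def decode(self, indices):
        chars = []
        previous = None
        for idx in indices:
            # Emit a character only when the index changes and is not blank.
            if idx != previous and idx != self.BLANK:
                chars.append(self.table[idx])
            previous = idx
        return ''.join(chars)


decoder = GreedyCTCDecode(character='abc')
decoded = decoder([[1, 1, 0, 1, 2, 2, 0, 3]])
print(decoded)  # prints: ['aabc']
```

The blank between the two runs of index 1 is what allows the doubled letter to survive; without it, the repeated indices would collapse into a single `a`.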

After the post-processing module is added, you only need to reference it in the configuration file, for example:

```yaml linenums="1"
PostProcess:
  name: MyPostProcess
  args1: args1
  args2: args2
```

## Loss

The loss function calculates the distance between the network output and the label. This part is under [ppocr/losses](../../ppocr/losses).
PaddleOCR has built-in loss function modules for algorithms such as DB, EAST, SAST, CRNN and Attention. For loss functions that are not built in, you can add your own with the following steps:

1. Create a new file under the [ppocr/losses](../../ppocr/losses) folder, such as my_loss.py.
2. Add code to my_loss.py. Sample code:

    ```python linenums="1"
    import paddle
    from paddle import nn


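    class MyLoss(nn.Layer):
        def __init__(self, **kwargs):
            super().__init__()
            # your init code; cross-entropy is only a placeholder criterion
            self.loss = nn.CrossEntropyLoss()

        def forward(self, predicts, batch):
            # batch comes from the dataloader; index 1 is assumed to hold the label
            label = batch[1]
            # your loss code
            loss = self.loss(input=predicts, label=label)
            return {'loss': loss}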
    ```

3. Import the added module in the [ppocr/losses/\__init\__.py](../../ppocr/losses/__init__.py) file.

After the loss function module is added, you only need to reference it in the configuration file, for example:

```yaml linenums="1"
Loss:
  name: MyLoss
  args1: args1
  args2: args2
```

## Metric

The metric measures the performance of the network on the current batch. This part is under [ppocr/metrics](../../ppocr/metrics). PaddleOCR has built-in evaluation modules for tasks such as detection, classification and recognition. For metrics that are not built in, you can add your own with the following steps:

1. Create a new file under the [ppocr/metrics](../../ppocr/metrics) folder, such as my_metric.py.
2. Add code to my_metric.py. Sample code:

    ```python linenums="1"

    class MyMetric(object):
        def __init__(self, main_indicator='acc', **kwargs):
            # main_indicator is used for select best model
            self.main_indicator = main_indicator
            self.reset()

        def __call__(self, preds, batch, *args, **kwargs):
            # preds is the output of the post-process
            # batch is the output of the dataloader
            labels = batch[1]
            cur_correct_num = 0
            cur_all_num = 0
            # your metric code
            self.correct_num += cur_correct_num
            self.all_num += cur_all_num
            # max(..., 1) avoids a division by zero on an empty batch
            return {'acc': cur_correct_num / max(cur_all_num, 1)}

        def get_metric(self):
            """
            Return the metrics accumulated since the last reset, e.g.
            {'acc': 0.0}
            """
            acc = self.correct_num / max(self.all_num, 1)
            self.reset()
            return {'acc': acc}

        def reset(self):
            # reset metric
            self.correct_num = 0
            self.all_num = 0

    ```

3. Import the added module in the [ppocr/metrics/\__init\__.py](../../ppocr/metrics/__init__.py) file.
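
To show how `__call__`, `get_metric` and `reset` interact across an evaluation loop, here is a self-contained sketch of an exact-match accuracy metric. The class name `ExactMatchMetric`, the `eps` guard against division by zero, and the assumption that both `preds` and `batch[1]` are lists of strings are choices made for this example.

```python
class ExactMatchMetric:
    """Hypothetical metric: fraction of predictions that equal their label exactly."""

    def __init__(self, main_indicator='acc', eps=1e-5, **kwargs):
        self.main_indicator = main_indicator
        self.eps = eps  # keeps the division safe when no samples were seen
        self.reset()

    def __call__(self, preds, batch, *args, **kwargs):
        labels = batch[1]
        correct = sum(1 for pred, target in zip(preds, labels) if pred == target)
        total = len(labels)
        # Accumulate so that get_metric() can report over the whole dataset.
        self.correct_num += correct
        self.all_num += total
        return {'acc': correct / (total + self.eps)}

    def get_metric(self):
        acc = self.correct_num / (self.all_num + self.eps)
        self.reset()
        return {'acc': acc}

    def reset(self):
        self.correct_num = 0
        self.all_num = 0


metric = ExactMatchMetric()
metric(['cat', 'dog'], [None, ['cat', 'cow']])  # 1 of 2 correct
metric(['sun'], [None, ['sun']])                # 1 of 1 correct
result = metric.get_metric()                    # accumulated: 2 of 3 correct
print(round(result['acc'], 3))  # prints: 0.667
```

Each call returns the accuracy of that batch alone, while `get_metric` reports the accuracy over everything seen since the last reset and then clears the counters for the next evaluation run.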

After the metric module is added, you only need to reference it in the configuration file, for example:

```yaml linenums="1"
Metric:
  name: MyMetric
  main_indicator: acc
```

## Optimizer

The optimizer is used to train the network. This part also contains the learning rate decay and network regularization modules, and it is under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in optimizer modules such as `Momentum`, `Adam` and `RMSProp`, learning rate decay modules such as `Linear`, `Cosine`, `Step` and `Piecewise`, and regularization modules such as `L1Decay` and `L2Decay`.
Modules that are not built in can be added with the following steps, taking `optimizer` as an example:

1. Create your own optimizer in the [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py) file. Sample code:

    ```python linenums="1"
    from paddle import optimizer as optim


    class MyOptim(object):
        def __init__(self, learning_rate=0.001, *args, **kwargs):
            self.learning_rate = learning_rate

        def __call__(self, parameters):
            # It is recommended to wrap one of Paddle's built-in optimizers.
            # SGD is only a placeholder here; substitute the optimizer you need.
            opt = optim.SGD(
                learning_rate=self.learning_rate,
                parameters=parameters)
            return opt

    ```

After the optimizer module is added, you only need to reference it in the configuration file, for example:

```yaml linenums="1"
Optimizer:
  name: MyOptim
  args1: args1
  args2: args2
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0
```