.. role:: hidden
    :class: hidden-section

Data Process
=================

In MMPreTrain, data processing is decoupled from the dataset. A dataset
only defines how to get the basic information of each sample from the file
system. This basic information includes the ground-truth label and either the
raw image data or the image path. Data processing consists of data transforms,
data preprocessors and batch augmentations.

- :mod:`Data Transforms <mmpretrain.datasets.transforms>`: Transforms include loading, preprocessing, formatting, etc.
- :mod:`Data Preprocessors <mmpretrain.models.utils.data_preprocessor>`: Preprocessing includes collating, normalization, stacking, channel flipping, etc.

  - :mod:`Batch Augmentations <mmpretrain.models.utils.batch_augments>`: Batch augmentations operate on multiple samples at once, such as Mixup and CutMix.

.. module:: mmpretrain.datasets.transforms

Data Transforms
--------------------

To prepare the input data, we need to apply some transforms to this basic
information. These transforms include loading, preprocessing and
formatting, and a series of data transforms makes up a data pipeline.
Therefore, you can find a ``pipeline`` argument in the dataset configs,
for example:

.. code:: python

    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='RandomResizedCrop', scale=224),
        dict(type='RandomFlip', prob=0.5, direction='horizontal'),
        dict(type='PackInputs'),
    ]

    train_dataloader = dict(
        # ...
        dataset=dict(
            pipeline=train_pipeline,
            # ...
        ),
        # ...
    )

Every item in a pipeline list is one of the data transform classes listed below. If you want to add a custom data transform class, see the tutorial :doc:`Custom Data Pipelines </advanced_guides/pipeline>`.
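
Conceptually, a pipeline passes a sample dict through each transform in order. The
following is a minimal, library-free sketch of that idea, not the actual MMPreTrain
implementation. ``run_pipeline``, ``fake_load`` and ``horizontal_flip`` are
hypothetical names introduced only for illustration.

.. code:: python

    from typing import Callable, Dict, List, Optional

    Sample = Dict[str, object]
    Transform = Callable[[Sample], Optional[Sample]]


    def run_pipeline(sample: Sample,
                     transforms: List[Transform]) -> Optional[Sample]:
        """Apply every transform in order; each one receives the previous output."""
        for transform in transforms:
            sample = transform(sample)
            if sample is None:  # assumption: a transform may drop a sample
                return None
        return sample


    def fake_load(sample: Sample) -> Sample:
        # Hypothetical stand-in for a loading transform: stores a 1x3 "image".
        return {**sample, 'img': [[1, 2, 3]]}


    def horizontal_flip(sample: Sample) -> Sample:
        # Reverse each row, which mirrors the image from left to right.
        return {**sample, 'img': [row[::-1] for row in sample['img']]}


    result = run_pipeline({'img_path': 'demo.jpg', 'gt_label': 0},
                          [fake_load, horizontal_flip])
    print(result['img'])  # [[3, 2, 1]]

Each transform only reads and writes keys of the sample dict, which is why
transforms can be reordered or swapped in the config without code changes.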

.. contents::
   :depth: 1
   :local:
   :backlinks: top

Loading and Formatting
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: data_transform.rst

   LoadImageFromFile
   PackInputs
   PackMultiTaskInputs
   PILToNumpy
   NumpyToPIL
   Transpose
   Collect

Processing and Augmentation
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: data_transform.rst

   Albumentations
   CenterCrop
   ColorJitter
   EfficientNetCenterCrop
   EfficientNetRandomCrop
   Lighting
   Normalize
   RandomCrop
   RandomErasing
   RandomFlip
   RandomGrayscale
   RandomResize
   RandomResizedCrop
   Resize
   ResizeEdge
   BEiTMaskGenerator
   SimMIMMaskGenerator

Composed Augmentation
"""""""""""""""""""""
A composed augmentation is a method that composes a series of data
augmentation transforms, such as ``AutoAugment`` and ``RandAugment``.

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: data_transform.rst

   AutoAugment
   RandAugment

The transforms above are composed from a group of policies built on the random
transforms below:

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: data_transform.rst

   AutoContrast
   Brightness
   ColorTransform
   Contrast
   Cutout
   Equalize
   GaussianBlur
   Invert
   Posterize
   Rotate
   Sharpness
   Shear
   Solarize
   SolarizeAdd
   Translate
   BaseAugTransform

MMCV transforms
^^^^^^^^^^^^^^^

MMCV also provides many transforms that you can use directly in the config files. The full list of transforms can be found in :external+mmcv:doc:`api/transforms`.

Transform Wrapper
^^^^^^^^^^^^^^^^^

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: data_transform.rst

   MultiView

.. module:: mmpretrain.models.utils.data_preprocessor


TorchVision Transforms
^^^^^^^^^^^^^^^^^^^^^^

We also provide all the transforms in TorchVision. You can use them as in the following examples:

**1. Use some TorchVision Augs Surrounded by NumpyToPIL and PILToNumpy (Recommended)**

Add TorchVision Augs surrounded by ``dict(type='NumpyToPIL', to_rgb=True),`` and ``dict(type='PILToNumpy', to_bgr=True),``.

.. code:: python

    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='NumpyToPIL', to_rgb=True),     # from BGR in cv2 to RGB in PIL
        dict(type='torchvision/RandomResizedCrop', size=176),
        dict(type='PILToNumpy', to_bgr=True),     # from RGB in PIL to BGR in cv2
        dict(type='RandomFlip', prob=0.5, direction='horizontal'),
        dict(type='PackInputs'),
    ]

    data_preprocessor = dict(
        num_classes=1000,
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True,                          # convert from BGR to RGB before normalization
    )
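
The channel reordering in the example above can be illustrated with plain NumPy.
This is only a sketch under the assumption that ``to_rgb=True`` and
``to_bgr=True`` each reverse the last (channel) axis. The real transforms also
convert between ``numpy.ndarray`` and ``PIL.Image``, which is omitted here.

.. code:: python

    import numpy as np

    # A 2x2 image that is pure blue in BGR channel order (as loaded by cv2).
    bgr = np.zeros((2, 2, 3), dtype=np.uint8)
    bgr[..., 0] = 255

    rgb = bgr[..., ::-1]    # assumed effect of ``to_rgb=True``
    back = rgb[..., ::-1]   # assumed effect of ``to_bgr=True``

    print(rgb[0, 0].tolist())         # [0, 0, 255]
    print(np.array_equal(back, bgr))  # True

Because the two conversions cancel out, the transforms after ``PILToNumpy`` and
the data preprocessor still receive BGR images, as they would without any
TorchVision transform in the pipeline.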


**2. Use TorchVision Augs and ToTensor&Normalize**

Make sure the ``img`` has been converted from BGR NumPy format to PIL format before it is processed by TorchVision Augs.

.. code:: python

    import torch  # required for ``dtype=torch.float`` below

    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='NumpyToPIL', to_rgb=True),       # from BGR in cv2 to RGB in PIL
        dict(
            type='torchvision/RandomResizedCrop',
            size=176,
            interpolation='bilinear'),            # accepts the interpolation mode as a str
        dict(type='torchvision/RandomHorizontalFlip', p=0.5),
        dict(
            type='torchvision/TrivialAugmentWide',
            interpolation='bilinear'),
        dict(type='torchvision/PILToTensor'),
        dict(type='torchvision/ConvertImageDtype', dtype=torch.float),
        dict(
            type='torchvision/Normalize',
            mean=(0.485, 0.456, 0.406),
            std=(0.229, 0.224, 0.225),
        ),
        dict(type='torchvision/RandomErasing', p=0.1),
        dict(type='PackInputs'),
    ]

    data_preprocessor = dict(num_classes=1000, mean=None, std=None, to_rgb=False)  # Normalize in dataset pipeline
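
The ``mean`` and ``std`` passed to ``torchvision/Normalize`` above look different
from the ones used in ``data_preprocessor`` elsewhere on this page, but they
describe the same statistics on different scales. ``ConvertImageDtype`` rescales
8-bit pixels to ``[0, 1]``, while the data preprocessor is assumed to normalize
pixels in ``[0, 255]``. A small arithmetic check, with the illustrative helper
``normalize``:

.. code:: python

    import math

    # Statistics for inputs scaled to [0, 1] (``torchvision/Normalize``).
    mean_01 = (0.485, 0.456, 0.406)
    std_01 = (0.229, 0.224, 0.225)

    # Statistics for inputs in [0, 255] (``data_preprocessor``).
    mean_255 = (123.675, 116.28, 103.53)
    std_255 = (58.395, 57.12, 57.375)


    def normalize(value, mean, std):
        """Standard per-channel normalization of a single pixel value."""
        return (value - mean) / std


    # Every [0, 255] statistic is the [0, 1] statistic multiplied by 255.
    same_scale = all(
        math.isclose(small * 255, large)
        for small, large in zip(mean_01 + std_01, mean_255 + std_255))

    pixel = 200  # an arbitrary 8-bit intensity in the red channel
    a = normalize(pixel / 255, mean_01[0], std_01[0])
    b = normalize(pixel, mean_255[0], std_255[0])

    print(same_scale)          # True
    print(math.isclose(a, b))  # True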


**3. Use TorchVision Augs Except ToTensor&Normalize**

.. code:: python

    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='NumpyToPIL', to_rgb=True),   # from BGR in cv2 to RGB in PIL
        dict(type='torchvision/RandomResizedCrop', size=176, interpolation='bilinear'),
        dict(type='torchvision/RandomHorizontalFlip', p=0.5),
        dict(type='torchvision/TrivialAugmentWide', interpolation='bilinear'),
        dict(type='PackInputs'),
    ]

    # here the normalization parameters are in RGB order
    data_preprocessor = dict(
        num_classes=1000,
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=False,
    )


Data Preprocessors
------------------

The data preprocessor is also a component that processes the data before it is fed to the neural network.
Compared with the data transforms, the data preprocessor is a module of the classifier,
and it processes a whole batch of data at a time, which means it can use the GPU and batching to accelerate the processing.

The default data preprocessor in MMPreTrain performs the following pre-processing steps:

1. Move data to the target device.
2. Pad inputs to the maximum size of the current batch.
3. Stack inputs into a batch.
4. Convert inputs from BGR to RGB if the shape of the input is (3, H, W).
5. Normalize images with the defined mean and std.
6. Do batch augmentations like Mixup and CutMix during training.
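
Steps 2 to 5 above can be sketched with NumPy as follows. This is an
illustration only, not the MMPreTrain implementation: ``preprocess_batch`` is a
hypothetical helper, zero padding on the bottom and right is an assumption, and
the real preprocessor works on tensors and may order the steps differently.

.. code-block:: python

    import numpy as np


    def preprocess_batch(images, mean, std, to_rgb=True):
        """Pad, stack, reorder channels and normalize (3, H, W) arrays."""
        max_h = max(img.shape[1] for img in images)
        max_w = max(img.shape[2] for img in images)

        padded = []
        for img in images:
            pad_h = max_h - img.shape[1]
            pad_w = max_w - img.shape[2]
            # Step 2: pad bottom and right with zeros (assumed padding value).
            padded.append(np.pad(img, ((0, 0), (0, pad_h), (0, pad_w))))

        # Step 3: stack into one (N, 3, H, W) batch.
        batch = np.stack(padded).astype(np.float32)

        # Step 4: reverse the channel axis to convert BGR to RGB.
        if to_rgb:
            batch = batch[:, ::-1]

        # Step 5: normalize every channel with its own mean and std.
        mean = np.asarray(mean, dtype=np.float32).reshape(1, 3, 1, 1)
        std = np.asarray(std, dtype=np.float32).reshape(1, 3, 1, 1)
        return (batch - mean) / std


    images = [
        np.full((3, 2, 2), 100, dtype=np.uint8),
        np.full((3, 3, 4), 200, dtype=np.uint8),
    ]
    batch = preprocess_batch(
        images,
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375])
    print(batch.shape)  # (2, 3, 3, 4)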

You can configure the data preprocessor through the ``data_preprocessor`` field or the ``model.data_preprocessor`` field in the config file. A typical usage is as follows:

.. code-block:: python

    data_preprocessor = dict(
        # RGB format normalization parameters
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True,    # convert image from BGR to RGB
    )

Or define it in ``model.data_preprocessor`` as follows:

.. code-block:: python

   model = dict(
       backbone=...,
       neck=...,
       head=...,
       data_preprocessor=dict(
           mean=[123.675, 116.28, 103.53],
           std=[58.395, 57.12, 57.375],
           to_rgb=True),
       train_cfg=...,
   )

Note that ``model.data_preprocessor`` has a higher priority than ``data_preprocessor``.
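
One way to picture this priority rule is a ``setdefault`` on the model config:
the top-level field only takes effect when the model does not define its own.
The helper ``resolve_data_preprocessor`` below is hypothetical and only sketches
the rule. It is not the code that the framework actually runs.

.. code-block:: python

    def resolve_data_preprocessor(cfg):
        """Return the preprocessor config that would take effect (sketch)."""
        model = dict(cfg.get('model', {}))
        top_level = cfg.get('data_preprocessor')
        if top_level is not None:
            # Fills the slot only if the model has no data_preprocessor yet.
            model.setdefault('data_preprocessor', top_level)
        return model.get('data_preprocessor')


    cfg = dict(
        data_preprocessor=dict(to_rgb=True),
        model=dict(data_preprocessor=dict(to_rgb=False)),
    )
    print(resolve_data_preprocessor(cfg))  # {'to_rgb': False}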

.. autosummary::
   :toctree: generated
   :nosignatures:

   ClsDataPreprocessor
   SelfSupDataPreprocessor
   TwoNormDataPreprocessor
   VideoDataPreprocessor

.. module:: mmpretrain.models.utils.batch_augments

Batch Augmentations
^^^^^^^^^^^^^^^^^^^^

The batch augmentation is a component of the data preprocessor. It involves multiple samples and mixes them in some way, such as Mixup and CutMix.
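
As an illustration of what "mixing samples" means, here is a minimal NumPy
sketch of the general Mixup idea: every sample is blended with a randomly chosen
partner from the same batch, using a ratio drawn from a Beta distribution. The
function ``mixup`` and its arguments are illustrative names, and the code is not
the MMPreTrain implementation.

.. code-block:: python

    import numpy as np


    def mixup(images, one_hot_labels, alpha, rng):
        """Blend each sample with a random partner from the same batch."""
        lam = rng.beta(alpha, alpha)          # mixing ratio in [0, 1]
        index = rng.permutation(len(images))  # partner index for each sample
        mixed_images = lam * images + (1 - lam) * images[index]
        mixed_labels = lam * one_hot_labels + (1 - lam) * one_hot_labels[index]
        return mixed_images, mixed_labels


    rng = np.random.default_rng(0)
    images = rng.random((4, 3, 8, 8)).astype(np.float32)
    labels = np.eye(10, dtype=np.float32)[[1, 3, 5, 7]]  # one-hot labels
    mixed_images, mixed_labels = mixup(images, labels, alpha=0.8, rng=rng)

    print(mixed_images.shape)  # (4, 3, 8, 8)

The labels are mixed with the same ratio as the images, so each mixed label is
still a valid distribution that sums to 1.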

These augmentations are usually only used during training. Therefore, we use the ``model.train_cfg`` field to configure them in config files.

.. code-block:: python

   model = dict(
       backbone=...,
       neck=...,
       head=...,
       train_cfg=dict(augments=[
           dict(type='Mixup', alpha=0.8),
           dict(type='CutMix', alpha=1.0),
       ]),
   )

You can also specify the probability of each batch augmentation with the ``probs`` field.

.. code-block:: python

   model = dict(
       backbone=...,
       neck=...,
       head=...,
       train_cfg=dict(augments=[
           dict(type='Mixup', alpha=0.8),
           dict(type='CutMix', alpha=1.0),
       ], probs=[0.3, 0.7])
   )
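
The effect of ``probs`` can be pictured as a weighted random choice made once per
batch. The sketch below rests on two assumptions that are not stated on this
page: exactly one augmentation is drawn per batch, and any probability mass left
over (when ``probs`` sums to less than 1) means the batch is left unchanged.
``pick_augment`` is a hypothetical helper.

.. code-block:: python

    import random
    from collections import Counter


    def pick_augment(augments, probs, rng):
        """Draw one augmentation name; ``None`` means no augmentation."""
        choices = list(augments)
        weights = list(probs)
        leftover = 1.0 - sum(weights)
        if leftover > 1e-8:  # assumption: remaining mass skips augmentation
            choices.append(None)
            weights.append(leftover)
        return rng.choices(choices, weights=weights, k=1)[0]


    rng = random.Random(0)
    counts = Counter(
        pick_augment(['Mixup', 'CutMix'], [0.3, 0.7], rng)
        for _ in range(10_000))
    print(counts)  # roughly 30% Mixup and 70% CutMix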

Here is a list of batch augmentations that can be used in MMPreTrain.

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: callable.rst

   Mixup
   CutMix
   ResizeMix