.. _transforms:

Transforming and augmenting images
==================================

.. currentmodule:: torchvision.transforms

Transforms are common image transformations available in the
``torchvision.transforms`` module. They can be chained together using
:class:`Compose`.
Most transform classes have a function equivalent: :ref:`functional
transforms <functional_transforms>` give fine-grained control over the
transformations.
This is useful if you have to build a more complex transformation pipeline
(e.g. in the case of segmentation tasks).
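
As a minimal sketch of chaining with :class:`Compose`, the pipeline below crops,
converts and normalizes a random ``uint8`` tensor image. The image size, crop size
and normalization statistics are illustrative assumptions, not recommended values.

.. code:: python

    import torch
    from torchvision import transforms

    # Chain three transforms; each one receives the output of the previous one.
    pipeline = transforms.Compose([
        transforms.CenterCrop(24),
        transforms.ConvertImageDtype(torch.float32),  # uint8 -> float32, rescaled
        transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ])

    # Hypothetical input: a random 3-channel 32x32 uint8 tensor image.
    image = torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8)
    output = pipeline(image)
    print(output.shape, output.dtype)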

Most transformations accept both `PIL <https://pillow.readthedocs.io>`_ images
and tensor images, although some transformations are PIL-only and some are
tensor-only. The :ref:`conversion_transforms` may be used to convert to and from
PIL images, or for converting dtypes and ranges.

The transformations that accept tensor images also accept batches of tensor
images. A tensor image is a tensor of shape ``(C, H, W)``, where ``C`` is the
number of channels and ``H`` and ``W`` are the image height and width. A batch of
tensor images is a tensor of shape ``(B, C, H, W)``, where ``B`` is the number
of images in the batch.
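
The sketch below illustrates these shapes by applying the same :class:`CenterCrop`
to a single tensor image and to a batch. The sizes are arbitrary assumptions chosen
for the example.

.. code:: python

    import torch
    from torchvision import transforms

    crop = transforms.CenterCrop(16)

    single = torch.rand(3, 32, 32)    # (C, H, W)
    batch = torch.rand(8, 3, 32, 32)  # (B, C, H, W)

    # The leading batch dimension is preserved; only H and W change.
    single_out = crop(single)
    batch_out = crop(batch)
    print(single_out.shape, batch_out.shape)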

The expected range of the values of a tensor image is implicitly defined by
the tensor dtype. Tensor images with a float dtype are expected to have
values in ``[0, 1]``. Tensor images with an integer dtype are expected to
have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
that can be represented in that dtype.
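
As a small illustration of how the dtype determines the value range, the sketch below
converts a ``uint8`` tensor image to ``float32`` with the functional
``convert_image_dtype``. The three pixel values are made up for the example.

.. code:: python

    import torch
    import torchvision.transforms.functional as TF

    # A tiny 1x1x3 uint8 image covering the extremes of the integer range.
    img_uint8 = torch.tensor([[[0, 128, 255]]], dtype=torch.uint8)
    max_uint8 = torch.iinfo(torch.uint8).max  # largest representable value

    # Converting to a float dtype rescales the values to the float range.
    img_float = TF.convert_image_dtype(img_uint8, torch.float32)
    print(max_uint8, img_float.min().item(), img_float.max().item())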

Randomized transformations will apply the same transformation to all the
images of a given batch, but they will produce different transformations
across calls. For reproducible transformations across calls, you may use
:ref:`functional transforms <functional_transforms>`.
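
The following sketch shows both behaviours on a small, deliberately asymmetric batch:
a random flip affects either every image in the batch or none of them, while the
functional ``hflip`` always flips. The batch contents are an assumption made for the
example.

.. code:: python

    import torch
    import torchvision.transforms.functional as TF
    from torchvision import transforms

    # Two 1-channel 2x3 images with distinct, non-symmetric values.
    batch = torch.arange(12, dtype=torch.float32).reshape(2, 1, 2, 3)

    flip = transforms.RandomHorizontalFlip(p=0.5)
    result = flip(batch)

    # One random decision is made per call, so exactly one of these holds.
    all_flipped = torch.equal(result, batch.flip(-1))
    none_flipped = torch.equal(result, batch)

    # The functional version has no randomness: it flips on every call.
    always_flipped = TF.hflip(batch)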

The following examples illustrate the use of the available transforms:

    * :ref:`sphx_glr_auto_examples_plot_transforms.py`

        .. figure:: ../source/auto_examples/images/sphx_glr_plot_transforms_001.png
            :align: center
            :scale: 65%

    * :ref:`sphx_glr_auto_examples_plot_scripted_tensor_transforms.py`

        .. figure:: ../source/auto_examples/images/sphx_glr_plot_scripted_tensor_transforms_001.png
            :align: center
            :scale: 30%

.. warning::

    Since v0.8.0, all random transformations use the default torch random generator to sample their random parameters.
    This is a backward-incompatible change; set the random state as follows:

    .. code:: python

        # Previous versions
        # import random
        # random.seed(12)

        # Now
        import torch
        torch.manual_seed(17)

    Keep in mind that the same seed for the torch random generator and the Python random generator will not
    produce the same results.
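
As a minimal sketch of the seeding described above, re-seeding the torch generator
before each call makes a random transform repeat its choice. The crop size, image and
seed value are arbitrary assumptions.

.. code:: python

    import torch
    from torchvision import transforms

    crop = transforms.RandomCrop(2)
    image = torch.arange(16, dtype=torch.float32).reshape(1, 4, 4)

    # Same seed before each call -> the same crop location is sampled.
    torch.manual_seed(17)
    first = crop(image)
    torch.manual_seed(17)
    second = crop(image)

    print(torch.equal(first, second))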


Transforms scriptability
------------------------

.. TODO: Add note about v2 scriptability (in next PR)

In order to script the transformations, please use ``torch.nn.Sequential`` instead of :class:`Compose`.

.. code:: python

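    import torch
    from torchvision import transforms

    # Use a distinct name so the ``transforms`` module is not shadowed.
    pipeline = torch.nn.Sequential(
        transforms.CenterCrop(10),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    )
    scripted_transforms = torch.jit.script(pipeline)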

Make sure to use only scriptable transformations, i.e. transformations that work with ``torch.Tensor`` and do not require
``lambda`` functions or ``PIL.Image``.

For any custom transformations to be used with ``torch.jit.script``, they should be derived from ``torch.nn.Module``.
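
A minimal sketch of such a custom transformation is shown below. ``AddConstant`` is a
hypothetical class invented for this example, not part of torchvision. TorchScript
needs access to the class source, so the snippet is best run from a script file.

.. code:: python

    import torch
    from torchvision import transforms

    class AddConstant(torch.nn.Module):
        """Hypothetical custom transform: add a constant to a tensor image."""

        def __init__(self, value: float):
            super().__init__()
            self.value = value

        def forward(self, img: torch.Tensor) -> torch.Tensor:
            return img + self.value

    # Combine a built-in scriptable transform with the custom module.
    pipeline = torch.nn.Sequential(
        transforms.CenterCrop(10),
        AddConstant(0.5),
    )
    scripted = torch.jit.script(pipeline)
    out = scripted(torch.zeros(3, 16, 16))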


Geometry
--------

.. autosummary::
    :toctree: generated/
    :template: class.rst

    Resize
    v2.Resize
    v2.ScaleJitter
    v2.RandomShortestSize
    v2.RandomResize
    RandomCrop
    v2.RandomCrop
    RandomResizedCrop
    v2.RandomResizedCrop
    v2.RandomIoUCrop
    CenterCrop
    v2.CenterCrop
    FiveCrop
    v2.FiveCrop
    TenCrop
    v2.TenCrop
    Pad
    v2.Pad
    v2.RandomZoomOut
    RandomRotation
    v2.RandomRotation
    RandomAffine
    v2.RandomAffine
    RandomPerspective
    v2.RandomPerspective
    ElasticTransform
    v2.ElasticTransform
    RandomHorizontalFlip
    v2.RandomHorizontalFlip
    RandomVerticalFlip
    v2.RandomVerticalFlip


Color
-----

.. autosummary::
    :toctree: generated/
    :template: class.rst

    ColorJitter
    v2.ColorJitter
    v2.RandomPhotometricDistort
    Grayscale
    v2.Grayscale
    RandomGrayscale
    v2.RandomGrayscale
    GaussianBlur
    v2.GaussianBlur
    RandomInvert
    v2.RandomInvert
    RandomPosterize
    v2.RandomPosterize
    RandomSolarize
    v2.RandomSolarize
    RandomAdjustSharpness
    v2.RandomAdjustSharpness
    RandomAutocontrast
    v2.RandomAutocontrast
    RandomEqualize
    v2.RandomEqualize

Composition
-----------

.. autosummary::
    :toctree: generated/
    :template: class.rst

    Compose
    v2.Compose
    RandomApply
    v2.RandomApply
    RandomChoice
    v2.RandomChoice
    RandomOrder
    v2.RandomOrder

Miscellaneous
-------------

.. autosummary::
    :toctree: generated/
    :template: class.rst

    LinearTransformation
    v2.LinearTransformation
    Normalize
    v2.Normalize
    RandomErasing
    v2.RandomErasing
    Lambda
    v2.Lambda
    v2.SanitizeBoundingBox
    v2.ClampBoundingBox
    v2.UniformTemporalSubsample

.. _conversion_transforms:

Conversion
----------

.. note::
    Beware, some of these conversion transforms below will scale the values
    while performing the conversion, while some may not do any scaling. By
    scaling, we mean e.g. that a ``uint8`` -> ``float32`` would map the [0,
    255] range into [0, 1] (and vice-versa).
    
.. autosummary::
    :toctree: generated/
    :template: class.rst

    ToPILImage
    v2.ToPILImage
    v2.ToImagePIL
    ToTensor
    v2.ToTensor
    PILToTensor
    v2.PILToTensor
    v2.ToImageTensor
    ConvertImageDtype
    v2.ConvertDtype
    v2.ConvertImageDtype
    v2.ToDtype
    v2.ConvertBoundingBoxFormat
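
To make the difference between scaling and non-scaling conversions concrete, the
sketch below converts the same PIL image with ``ToTensor`` (which scales ``uint8``
values into a float range) and with ``PILToTensor`` (which keeps the raw ``uint8``
values). The solid-colour test image is an assumption made for the example.

.. code:: python

    from PIL import Image
    import torch
    from torchvision import transforms

    # Hypothetical 4x4 RGB image filled with a single colour.
    pil_image = Image.new("RGB", (4, 4), color=(255, 128, 0))

    scaled = transforms.ToTensor()(pil_image)       # float32, values rescaled
    unscaled = transforms.PILToTensor()(pil_image)  # uint8, values unchanged

    print(scaled.dtype, scaled.max().item())
    print(unscaled.dtype, unscaled.max().item())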

Auto-Augmentation
-----------------

`AutoAugment <https://arxiv.org/pdf/1805.09501.pdf>`_ is a common data augmentation technique that can improve the accuracy of image classification models.
Although augmentation policies are tied to the dataset they were learned on, empirical studies show that
ImageNet policies provide significant improvements when applied to other datasets.
TorchVision implements 3 policies, learned on the following datasets: ImageNet, CIFAR10 and SVHN.
The transform can be used standalone or mixed and matched with existing transforms:

.. autosummary::
    :toctree: generated/
    :template: class.rst

    AutoAugmentPolicy
    AutoAugment
    v2.AutoAugment
    RandAugment
    v2.RandAugment
    TrivialAugmentWide
    v2.TrivialAugmentWide
    AugMix
    v2.AugMix
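
As a rough sketch of mixing an auto-augmentation policy with other transforms, the
pipeline below applies the ImageNet policy to a ``uint8`` tensor image and then
converts it to ``float32``. The random input image and its size are assumptions made
for the example.

.. code:: python

    import torch
    from torchvision import transforms
    from torchvision.transforms import AutoAugment, AutoAugmentPolicy

    augment = transforms.Compose([
        AutoAugment(policy=AutoAugmentPolicy.IMAGENET),
        transforms.ConvertImageDtype(torch.float32),
    ])

    # Hypothetical input: a random 3-channel 64x64 uint8 tensor image.
    image = torch.randint(0, 256, (3, 64, 64), dtype=torch.uint8)
    augmented = augment(image)
    print(augmented.shape, augmented.dtype)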

.. _functional_transforms:

Functional Transforms
---------------------

.. currentmodule:: torchvision.transforms.functional

Functional transforms give you fine-grained control of the transformation pipeline.
As opposed to the transformations above, functional transforms don't contain a random number
generator for their parameters.
That means you have to specify/generate all parameters, but the functional transform will give you
reproducible results across calls.

Example:
you can apply a functional transform with the same parameters to multiple images like this:

.. code:: python

    import torchvision.transforms.functional as TF
    import random

    def my_segmentation_transforms(image, segmentation):
        if random.random() > 0.5:
            angle = random.randint(-30, 30)
            image = TF.rotate(image, angle)
            segmentation = TF.rotate(segmentation, angle)
        # more transforms ...
        return image, segmentation


Example:
you can use a functional transform to build transform classes with custom behavior:

.. code:: python

    import torchvision.transforms.functional as TF
    import random

    class MyRotationTransform:
        """Rotate by one of the given angles."""

        def __init__(self, angles):
            self.angles = angles

        def __call__(self, x):
            angle = random.choice(self.angles)
            return TF.rotate(x, angle)

    rotation_transform = MyRotationTransform(angles=[-30, -15, 0, 15, 30])


.. autosummary::
    :toctree: generated/
    :template: function.rst

    adjust_brightness
    adjust_contrast
    adjust_gamma
    adjust_hue
    adjust_saturation
    adjust_sharpness
    affine
    autocontrast
    center_crop
    convert_image_dtype
    crop
    equalize
    erase
    five_crop
    gaussian_blur
    get_dimensions
    get_image_num_channels
    get_image_size
    hflip
    invert
    normalize
    pad
    perspective
    pil_to_tensor
    posterize
    resize
    resized_crop
    rgb_to_grayscale
    rotate
    solarize
    ten_crop
    to_grayscale
    to_pil_image
    to_tensor
    vflip