.. _transforms:

Transforming and augmenting images
==================================

.. currentmodule:: torchvision.transforms


.. note::
    In 0.15, we released a new set of transforms available in the
    ``torchvision.transforms.v2`` namespace, which add support for transforming
    not just images but also bounding boxes, masks, or videos. These transforms
    are fully backward compatible with the current ones, and you'll see them
    documented below with a ``v2.`` prefix. To get started with those new
    transforms, you can check out
    :ref:`sphx_glr_auto_examples_plot_transforms_v2_e2e.py`.
    Note that these transforms are still BETA, and while we don't expect major
    breaking changes in the future, some APIs may still change according to user
    feedback. Please submit any feedback you may have `here
    <https://github.com/pytorch/vision/issues/6753>`_, and you can also check
    out `this issue <https://github.com/pytorch/vision/issues/7319>`_ to learn
    more about the APIs that we suspect might involve future changes.

Transforms are common image transformations available in the
``torchvision.transforms`` module. They can be chained together using
:class:`Compose`.
Most transform classes have a function equivalent: :ref:`functional
transforms <functional_transforms>` give fine-grained control over the
transformations.
This is useful if you have to build a more complex transformation pipeline
(e.g. in the case of segmentation tasks).
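
As a minimal sketch of chaining with :class:`Compose`, the snippet below crops a
tensor image and then converts its dtype. The image is random data and the crop
size is an arbitrary choice made for illustration only.

.. code:: python

    import torch
    from torchvision import transforms

    # Chain two transforms; they are applied in the order listed.
    pipeline = transforms.Compose([
        transforms.CenterCrop(24),
        transforms.ConvertImageDtype(torch.float32),
    ])

    # A stand-in uint8 image of shape (C, H, W); real data would come from a dataset.
    image = torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8)
    out = pipeline(image)
    print(out.shape, out.dtype)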

Most transformations accept both `PIL <https://pillow.readthedocs.io>`_ images
and tensor images, although some transformations are PIL-only and some are
tensor-only. The :ref:`conversion_transforms` may be used to convert to and from
PIL images, or for converting dtypes and ranges.

The transformations that accept tensor images also accept batches of tensor
images. A Tensor Image is a tensor with ``(C, H, W)`` shape, where ``C`` is the
number of channels and ``H`` and ``W`` are the image height and width. A batch of
Tensor Images is a tensor of ``(B, C, H, W)`` shape, where ``B`` is the number
of images in the batch.
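
The following sketch illustrates the two accepted layouts by passing a single
tensor image and a batch through the same transform. The tensors are random
placeholders and the sizes are assumptions chosen for the example.

.. code:: python

    import torch
    from torchvision import transforms

    crop = transforms.CenterCrop(16)

    single = torch.rand(3, 32, 32)     # one image: (C, H, W)
    batch = torch.rand(8, 3, 32, 32)   # eight images: (B, C, H, W)

    single_out = crop(single)
    batch_out = crop(batch)
    print(single_out.shape, batch_out.shape)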

The expected range of the values of a tensor image is implicitly defined by
the tensor dtype. Tensor images with a float dtype are expected to have
values in ``[0, 1]``. Tensor images with an integer dtype are expected to
have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
that can be represented in that dtype.
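
To make the dtype-dependent ranges concrete, the sketch below converts a tiny
``uint8`` image to ``float32`` and back using the functional
``convert_image_dtype``. The pixel values are made up for illustration.

.. code:: python

    import torch
    import torchvision.transforms.functional as TF

    # uint8 images use the full integer range of the dtype, here [0, 255].
    uint8_image = torch.tensor([[[0, 128, 255]]], dtype=torch.uint8)

    # Converting to a float dtype rescales the values into the float range.
    float_image = TF.convert_image_dtype(uint8_image, torch.float32)

    # Converting back rescales into the integer range again.
    round_trip = TF.convert_image_dtype(float_image, torch.uint8)
    print(float_image.min().item(), float_image.max().item())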

Randomized transformations will apply the same transformation to all the
images of a given batch, but they will produce different transformations
across calls. For reproducible transformations across calls, you may use
:ref:`functional transforms <functional_transforms>`.
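
A small sketch of the batch behavior described above: a random flip applied to a
batch either flips every image or leaves every image untouched. The batch
contents are arbitrary and only serve to make a flip detectable.

.. code:: python

    import torch
    from torchvision import transforms

    flip = transforms.RandomHorizontalFlip(p=0.5)

    # Distinct values per pixel, so a horizontal flip changes the tensor.
    batch = torch.arange(4 * 1 * 2 * 3, dtype=torch.float32).reshape(4, 1, 2, 3)
    out = flip(batch)

    # One random decision is made per call and shared by the whole batch.
    all_flipped = torch.equal(out, batch.flip(-1))
    none_flipped = torch.equal(out, batch)
    print(all_flipped, none_flipped)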

The following examples illustrate the use of the available transforms:

    * :ref:`sphx_glr_auto_examples_plot_transforms.py`

        .. figure:: ../source/auto_examples/images/sphx_glr_plot_transforms_001.png
            :align: center
            :scale: 65%

    * :ref:`sphx_glr_auto_examples_plot_scripted_tensor_transforms.py`

        .. figure:: ../source/auto_examples/images/sphx_glr_plot_scripted_tensor_transforms_001.png
            :align: center
            :scale: 30%

.. warning::

    Since v0.8.0, all random transformations use the default torch random generator to sample random parameters.
    This is a backward-compatibility-breaking change, and users should set the random state as follows:

    .. code:: python

        # Previous versions
        # import random
        # random.seed(12)

        # Now
        import torch
        torch.manual_seed(17)

    Keep in mind that the same seed used with the torch random generator and the Python random generator will not
    produce the same results.
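
As an illustration of seeding through torch, the sketch below re-seeds the
default generator before each call so that a random crop picks the same region
twice. The seed value and image are arbitrary assumptions.

.. code:: python

    import torch
    from torchvision import transforms

    crop = transforms.RandomCrop(8)
    image = torch.rand(3, 32, 32)

    # Same seed before each call, so the same crop location is sampled.
    torch.manual_seed(17)
    first = crop(image)
    torch.manual_seed(17)
    second = crop(image)

    print(torch.equal(first, second))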


Transforms scriptability
------------------------

.. TODO: Add note about v2 scriptability (in next PR)

In order to script the transformations, please use ``torch.nn.Sequential`` instead of :class:`Compose`.

.. code:: python

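    import torch
    from torchvision import transforms

    # nn.Sequential (unlike Compose) can be compiled by torch.jit.script
    pipeline = torch.nn.Sequential(
        transforms.CenterCrop(10),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    )
    scripted_transforms = torch.jit.script(pipeline)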

Make sure to use only scriptable transformations, i.e. those that work with ``torch.Tensor`` and do not require
``lambda`` functions or ``PIL.Image``.

For any custom transformations to be used with ``torch.jit.script``, they should be derived from ``torch.nn.Module``.
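
A minimal sketch of such a custom transformation follows. ``AddConstant`` is a
hypothetical class invented for this example, not part of torchvision; it only
shows the ``torch.nn.Module`` shape that ``torch.jit.script`` expects.

.. code:: python

    import torch
    from torch import nn
    from torchvision import transforms

    class AddConstant(nn.Module):
        """Hypothetical custom transform: add a fixed value to every pixel."""

        def __init__(self, value: float):
            super().__init__()
            self.value = value

        def forward(self, img: torch.Tensor) -> torch.Tensor:
            return img + self.value

    # Custom modules can sit next to built-in scriptable transforms.
    pipeline = nn.Sequential(
        transforms.CenterCrop(10),
        AddConstant(0.1),
    )
    scripted = torch.jit.script(pipeline)
    out = scripted(torch.zeros(3, 16, 16))
    print(out.shape)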


Geometry
--------

.. autosummary::
    :toctree: generated/
    :template: class.rst

    Resize
    v2.Resize
    v2.ScaleJitter
    v2.RandomShortestSize
    v2.RandomResize
    RandomCrop
    v2.RandomCrop
    RandomResizedCrop
    v2.RandomResizedCrop
    v2.RandomIoUCrop
    CenterCrop
    v2.CenterCrop
    FiveCrop
    v2.FiveCrop
    TenCrop
    v2.TenCrop
    Pad
    v2.Pad
    v2.RandomZoomOut
    RandomRotation
    v2.RandomRotation
    RandomAffine
    v2.RandomAffine
    RandomPerspective
    v2.RandomPerspective
    ElasticTransform
    v2.ElasticTransform
    RandomHorizontalFlip
    v2.RandomHorizontalFlip
    RandomVerticalFlip
    v2.RandomVerticalFlip


Color
-----

.. autosummary::
    :toctree: generated/
    :template: class.rst

    ColorJitter
    v2.ColorJitter
    v2.RandomPhotometricDistort
    Grayscale
    v2.Grayscale
    RandomGrayscale
    v2.RandomGrayscale
    GaussianBlur
    v2.GaussianBlur
    RandomInvert
    v2.RandomInvert
    RandomPosterize
    v2.RandomPosterize
    RandomSolarize
    v2.RandomSolarize
    RandomAdjustSharpness
    v2.RandomAdjustSharpness
    RandomAutocontrast
    v2.RandomAutocontrast
    RandomEqualize
    v2.RandomEqualize

Composition
-----------

.. autosummary::
    :toctree: generated/
    :template: class.rst

    Compose
    v2.Compose
    RandomApply
    v2.RandomApply
    RandomChoice
    v2.RandomChoice
    RandomOrder
    v2.RandomOrder

Miscellaneous
-------------

.. autosummary::
    :toctree: generated/
    :template: class.rst

    LinearTransformation
    v2.LinearTransformation
    Normalize
    v2.Normalize
    RandomErasing
    v2.RandomErasing
    Lambda
    v2.Lambda
    v2.SanitizeBoundingBoxes
    v2.ClampBoundingBoxes
    v2.UniformTemporalSubsample

.. _conversion_transforms:

Conversion
----------

.. note::
    Beware, some of these conversion transforms below will scale the values
    while performing the conversion, while some may not do any scaling. By
    scaling, we mean e.g. that a ``uint8`` -> ``float32`` would map the [0,
    255] range into [0, 1] (and vice-versa).
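
The difference between scaling and non-scaling conversions can be sketched with
a small solid-color PIL image. The image size and color are arbitrary choices
made for this example.

.. code:: python

    import torch
    from PIL import Image
    from torchvision import transforms

    pil_image = Image.new("RGB", (4, 4), color=(255, 128, 0))

    # ToTensor converts to float32 and scales the values.
    scaled = transforms.ToTensor()(pil_image)

    # PILToTensor keeps the uint8 values as they are, without scaling.
    unscaled = transforms.PILToTensor()(pil_image)

    # ConvertImageDtype performs the scaling as a separate, explicit step.
    converted = transforms.ConvertImageDtype(torch.float32)(unscaled)

    print(scaled.dtype, scaled.max().item())
    print(unscaled.dtype, unscaled.max().item())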
    
.. autosummary::
    :toctree: generated/
    :template: class.rst

    ToPILImage
    v2.ToPILImage
    v2.ToImagePIL
    ToTensor
    v2.ToTensor
    PILToTensor
    v2.PILToTensor
    v2.ToImageTensor
    ConvertImageDtype
    v2.ConvertImageDtype
    v2.ToDtype
    v2.ConvertBoundingBoxFormat

Auto-Augmentation
-----------------

`AutoAugment <https://arxiv.org/pdf/1805.09501.pdf>`_ is a common Data Augmentation technique that can improve the accuracy of Image Classification models.
Though the data augmentation policies are directly linked to their trained dataset, empirical studies show that
ImageNet policies provide significant improvements when applied to other datasets.
In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN.
The new transform can be used standalone or mixed-and-matched with existing transforms:

.. autosummary::
    :toctree: generated/
    :template: class.rst

    AutoAugmentPolicy
    AutoAugment
    v2.AutoAugment
    RandAugment
    v2.RandAugment
    TrivialAugmentWide
    v2.TrivialAugmentWide
    AugMix
    v2.AugMix
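
As a hedged sketch of mixing an auto-augmentation transform with existing
transforms, the snippet below applies the ImageNet policy to a ``uint8`` tensor
image and then converts the result to float. The input is random data and the
pipeline is only one possible arrangement.

.. code:: python

    import torch
    from torchvision import transforms

    augment = transforms.Compose([
        # Use the policy learned on ImageNet.
        transforms.AutoAugment(policy=transforms.AutoAugmentPolicy.IMAGENET),
        transforms.ConvertImageDtype(torch.float32),
    ])

    image = torch.randint(0, 256, (3, 64, 64), dtype=torch.uint8)
    out = augment(image)
    print(out.shape, out.dtype)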

Cutmix - Mixup
--------------

Cutmix and Mixup are special transforms that
are meant to be used on batches rather than on individual images, because they
combine pairs of images together. These can be used after the dataloader,
or as part of a collation function. See
:ref:`sphx_glr_auto_examples_plot_cutmix_mixup.py` for detailed usage examples.
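
To show why these transforms need a whole batch, here is a hand-written sketch
of the Mixup idea in plain torch. ``mixup_batch`` is a hypothetical helper
written for this illustration; it is not the ``v2.Mixup`` implementation and
its signature is an assumption.

.. code:: python

    import torch

    def mixup_batch(images, labels, num_classes, alpha=1.0):
        """Hypothetical helper: blend each image with a shuffled partner."""
        # Mixing weight drawn from a Beta(alpha, alpha) distribution.
        lam = float(torch.distributions.Beta(alpha, alpha).sample())
        # Pair every image with another image from the same batch.
        perm = torch.randperm(images.size(0))
        one_hot = torch.nn.functional.one_hot(labels, num_classes).float()
        mixed_images = lam * images + (1.0 - lam) * images[perm]
        mixed_labels = lam * one_hot + (1.0 - lam) * one_hot[perm]
        return mixed_images, mixed_labels

    images = torch.rand(4, 3, 8, 8)        # a batch, as produced by a dataloader
    labels = torch.tensor([0, 1, 2, 3])
    mixed_images, mixed_labels = mixup_batch(images, labels, num_classes=4)
    print(mixed_images.shape, mixed_labels.shape)

Because each output depends on two inputs, a function like this would run after
the dataloader or inside a collation function rather than per sample.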

.. autosummary::
    :toctree: generated/
    :template: class.rst

    v2.Cutmix
    v2.Mixup

.. _functional_transforms:

Functional Transforms
---------------------

.. currentmodule:: torchvision.transforms.functional


.. note::
    You'll find below the documentation for the existing
    ``torchvision.transforms.functional`` namespace. The
    ``torchvision.transforms.v2.functional`` namespace exists as well and can be
    used! The same functionals are present, so you simply need to change your
    import to rely on the ``v2`` namespace.

Functional transforms give you fine-grained control of the transformation pipeline.
As opposed to the transformations above, functional transforms don't contain a random number
generator for their parameters.
That means you have to specify/generate all parameters, but the functional transform will give you
reproducible results across calls.

Example:
you can apply a functional transform with the same parameters to multiple images like this:

.. code:: python

    import torchvision.transforms.functional as TF
    import random

    def my_segmentation_transforms(image, segmentation):
        if random.random() > 0.5:
            angle = random.randint(-30, 30)
            image = TF.rotate(image, angle)
            segmentation = TF.rotate(segmentation, angle)
        # more transforms ...
        return image, segmentation


Example:
you can use a functional transform to build transform classes with custom behavior:

.. code:: python

    import torchvision.transforms.functional as TF
    import random

    class MyRotationTransform:
        """Rotate by one of the given angles."""

        def __init__(self, angles):
            self.angles = angles

        def __call__(self, x):
            angle = random.choice(self.angles)
            return TF.rotate(x, angle)

    rotation_transform = MyRotationTransform(angles=[-30, -15, 0, 15, 30])


.. autosummary::
    :toctree: generated/
    :template: function.rst

    adjust_brightness
    adjust_contrast
    adjust_gamma
    adjust_hue
    adjust_saturation
    adjust_sharpness
    affine
    autocontrast
    center_crop
    convert_image_dtype
    crop
    equalize
    erase
    five_crop
    gaussian_blur
    get_dimensions
    get_image_num_channels
    get_image_size
    hflip
    invert
    normalize
    pad
    perspective
    pil_to_tensor
    posterize
    resize
    resized_crop
    rgb_to_grayscale
    rotate
    solarize
    ten_crop
    to_grayscale
    to_pil_image
    to_tensor
    vflip