Commit bf491463 authored by limm

add v0.19.1 release

parent e17f5ea2
Quantized ResNet
================
.. currentmodule:: torchvision.models.quantization
The Quantized ResNet model is based on the `Deep Residual Learning for Image Recognition
<https://arxiv.org/abs/1512.03385>`_ paper.
Model builders
--------------
The following model builders can be used to instantiate a quantized ResNet
model, with or without pre-trained weights. All the model builders internally
rely on the ``torchvision.models.quantization.resnet.QuantizableResNet``
base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/quantization/resnet.py>`_
for more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
resnet18
resnet50
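As a quick, hedged sketch (not part of the original page), a quantized model can be instantiated through these builders roughly as follows; the ``ResNet50_QuantizedWeights`` enum, the ``quantize`` flag, and the bundled ``weights.transforms()`` preset are assumed to behave as in recent torchvision releases:

.. code:: python

    import torch
    from torchvision.models.quantization import resnet50, ResNet50_QuantizedWeights

    # Load an int8-quantized ResNet-50 with its pre-trained weights
    weights = ResNet50_QuantizedWeights.DEFAULT
    model = resnet50(weights=weights, quantize=True)
    model.eval()

    # The weights enum bundles the matching preprocessing transforms
    img = torch.randint(0, 256, size=(3, 224, 224), dtype=torch.uint8)
    batch = weights.transforms()(img).unsqueeze(0)
    scores = model(batch).softmax(dim=1)
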
ResNeXt
=======
.. currentmodule:: torchvision.models
The ResNeXt model is based on the `Aggregated Residual Transformations for Deep Neural Networks <https://arxiv.org/abs/1611.05431v2>`__
paper.
Model builders
--------------
The following model builders can be used to instantiate a ResNeXt model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.resnet.ResNet`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
resnext50_32x4d
resnext101_32x8d
resnext101_64x4d
Quantized ResNeXt
=================
.. currentmodule:: torchvision.models.quantization
The quantized ResNeXt model is based on the `Aggregated Residual Transformations for Deep Neural Networks <https://arxiv.org/abs/1611.05431v2>`__
paper.
Model builders
--------------
The following model builders can be used to instantiate a quantized ResNeXt
model, with or without pre-trained weights. All the model builders internally
rely on the ``torchvision.models.quantization.resnet.QuantizableResNet``
base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/quantization/resnet.py>`_
for more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
resnext101_32x8d
resnext101_64x4d
RetinaNet
=========
.. currentmodule:: torchvision.models.detection
The RetinaNet model is based on the `Focal Loss for Dense Object Detection
<https://arxiv.org/abs/1708.02002>`__ paper.
.. betastatus:: detection module
Model builders
--------------
The following model builders can be used to instantiate a RetinaNet model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.detection.retinanet.RetinaNet`` base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
retinanet_resnet50_fpn
retinanet_resnet50_fpn_v2
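For illustration only, a hedged sketch of running inference with one of these builders; it assumes the ``RetinaNet_ResNet50_FPN_V2_Weights`` enum and the usual detection-model calling convention (a list of 3D float tensors in ``[0, 1]``):

.. code:: python

    import torch
    from torchvision.models.detection import retinanet_resnet50_fpn_v2, RetinaNet_ResNet50_FPN_V2_Weights

    weights = RetinaNet_ResNet50_FPN_V2_Weights.DEFAULT
    model = retinanet_resnet50_fpn_v2(weights=weights)
    model.eval()

    # Detection models take a list of 3D float tensors in [0, 1], one per image
    images = [torch.rand(3, 480, 640)]
    predictions = model(images)
    print(predictions[0]["boxes"].shape, predictions[0]["scores"][:5])
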
ShuffleNet V2
=============
.. currentmodule:: torchvision.models
The ShuffleNet V2 model is based on the `ShuffleNet V2: Practical Guidelines for Efficient
CNN Architecture Design <https://arxiv.org/abs/1807.11164>`__ paper.
Model builders
--------------
The following model builders can be used to instantiate a ShuffleNetV2 model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.shufflenetv2.ShuffleNetV2`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/shufflenetv2.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
shufflenet_v2_x0_5
shufflenet_v2_x1_0
shufflenet_v2_x1_5
shufflenet_v2_x2_0
Quantized ShuffleNet V2
=======================
.. currentmodule:: torchvision.models.quantization
The Quantized ShuffleNet V2 model is based on the `ShuffleNet V2: Practical Guidelines for Efficient
CNN Architecture Design <https://arxiv.org/abs/1807.11164>`__ paper.
Model builders
--------------
The following model builders can be used to instantiate a quantized ShuffleNetV2
model, with or without pre-trained weights. All the model builders internally rely
on the ``torchvision.models.quantization.shufflenetv2.QuantizableShuffleNetV2``
base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/quantization/shufflenetv2.py>`_
for more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
shufflenet_v2_x0_5
shufflenet_v2_x1_0
shufflenet_v2_x1_5
shufflenet_v2_x2_0
SqueezeNet
==========
.. currentmodule:: torchvision.models
The SqueezeNet model is based on the `SqueezeNet: AlexNet-level accuracy with
50x fewer parameters and <0.5MB model size <https://arxiv.org/abs/1602.07360>`__
paper.
Model builders
--------------
The following model builders can be used to instantiate a SqueezeNet model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.squeezenet.SqueezeNet`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
squeezenet1_0
squeezenet1_1
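Besides calling the builders directly, models can also be looked up by name through the generic registration helpers; a small hedged sketch, assuming ``get_model`` and ``get_model_weights`` as available in recent torchvision releases:

.. code:: python

    from torchvision.models import get_model, get_model_weights

    # Equivalent to calling the builder directly: squeezenet1_1(weights=...)
    model = get_model("squeezenet1_1", weights="DEFAULT")

    # Enumerate all weight sets registered for this builder
    for weights in get_model_weights("squeezenet1_1"):
        print(weights, weights.meta["num_params"])
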
SSD
===
.. currentmodule:: torchvision.models.detection
The SSD model is based on the `SSD: Single Shot MultiBox Detector
<https://arxiv.org/abs/1512.02325>`__ paper.
.. betastatus:: detection module
Model builders
--------------
The following model builders can be used to instantiate an SSD model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.detection.SSD`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
ssd300_vgg16
SSDlite
=======
.. currentmodule:: torchvision.models.detection
The SSDLite model is based on the `SSD: Single Shot MultiBox Detector
<https://arxiv.org/abs/1512.02325>`__, `Searching for MobileNetV3
<https://arxiv.org/abs/1905.02244>`__ and `MobileNetV2: Inverted Residuals and Linear
Bottlenecks <https://arxiv.org/abs/1801.04381>`__ papers.
.. betastatus:: detection module
Model builders
--------------
The following model builders can be used to instantiate an SSDlite model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.detection.ssd.SSD`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssdlite.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
ssdlite320_mobilenet_v3_large
SwinTransformer
===============
.. currentmodule:: torchvision.models
The SwinTransformer models are based on the `Swin Transformer: Hierarchical Vision
Transformer using Shifted Windows <https://arxiv.org/abs/2103.14030>`__
paper.
SwinTransformer V2 models are based on the `Swin Transformer V2: Scaling Up Capacity
and Resolution <https://openaccess.thecvf.com/content/CVPR2022/papers/Liu_Swin_Transformer_V2_Scaling_Up_Capacity_and_Resolution_CVPR_2022_paper.pdf>`__
paper.
Model builders
--------------
The following model builders can be used to instantiate a SwinTransformer model (original and V2), with or without pre-trained weights.
All the model builders internally rely on the ``torchvision.models.swin_transformer.SwinTransformer``
base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
swin_t
swin_s
swin_b
swin_v2_t
swin_v2_s
swin_v2_b
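As a hedged classification sketch (not from the original page), assuming the ``Swin_V2_T_Weights`` enum, its bundled preprocessing preset, and the ``categories`` metadata found in recent torchvision releases:

.. code:: python

    import torch
    from torchvision.models import swin_v2_t, Swin_V2_T_Weights

    weights = Swin_V2_T_Weights.DEFAULT
    model = swin_v2_t(weights=weights)
    model.eval()

    img = torch.randint(0, 256, size=(3, 256, 256), dtype=torch.uint8)
    batch = weights.transforms()(img).unsqueeze(0)
    class_id = model(batch).softmax(dim=1).argmax(dim=1).item()
    print(weights.meta["categories"][class_id])
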
VGG
===
.. currentmodule:: torchvision.models
The VGG model is based on the `Very Deep Convolutional Networks for Large-Scale
Image Recognition <https://arxiv.org/abs/1409.1556>`_ paper.
Model builders
--------------
The following model builders can be used to instantiate a VGG model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.vgg.VGG`` base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/vgg.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
vgg11
vgg11_bn
vgg13
vgg13_bn
vgg16
vgg16_bn
vgg19
vgg19_bn
Video MViT
==========
.. currentmodule:: torchvision.models.video
The MViT model is based on the
`MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
<https://arxiv.org/abs/2112.01526>`__ and `Multiscale Vision Transformers
<https://arxiv.org/abs/2104.11227>`__ papers.
Model builders
--------------
The following model builders can be used to instantiate an MViT v1 or v2 model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.video.MViT`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/video/mvit.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
mvit_v1_b
mvit_v2_s
Video ResNet
============
.. currentmodule:: torchvision.models.video
The VideoResNet model is based on the `A Closer Look at Spatiotemporal
Convolutions for Action Recognition <https://arxiv.org/abs/1711.11248>`__ paper.
.. betastatus:: video module
Model builders
--------------
The following model builders can be used to instantiate a VideoResNet model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.video.resnet.VideoResNet`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/video/resnet.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
r3d_18
mc3_18
r2plus1d_18
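A minimal hedged sketch of using one of these video builders; it assumes the ``R3D_18_Weights`` enum and the usual clip layout of ``(B, C, T, H, W)``:

.. code:: python

    import torch
    from torchvision.models.video import r3d_18, R3D_18_Weights

    model = r3d_18(weights=R3D_18_Weights.DEFAULT)
    model.eval()

    # Video models expect clips shaped (B, C, T, H, W)
    clip = torch.rand(1, 3, 16, 112, 112)
    scores = model(clip).softmax(dim=1)
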
Video S3D
=========
.. currentmodule:: torchvision.models.video
The S3D model is based on the
`Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
<https://arxiv.org/abs/1712.04851>`__ paper.
Model builders
--------------
The following model builders can be used to instantiate an S3D model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.video.S3D`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/video/s3d.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
s3d
Video SwinTransformer
=====================
.. currentmodule:: torchvision.models.video
The Video SwinTransformer model is based on the `Video Swin Transformer <https://arxiv.org/abs/2106.13230>`__ paper.
.. betastatus:: video module
Model builders
--------------
The following model builders can be used to instantiate a Video SwinTransformer model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.video.swin_transformer.SwinTransformer3d`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/video/swin_transformer.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
swin3d_t
swin3d_s
swin3d_b
VisionTransformer
=================
.. currentmodule:: torchvision.models
The VisionTransformer model is based on the `An Image is Worth 16x16 Words:
Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`_ paper.
Model builders
--------------
The following model builders can be used to instantiate a VisionTransformer model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.vision_transformer.VisionTransformer`` base class.
Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
vit_b_16
vit_b_32
vit_l_16
vit_l_32
vit_h_14
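A hedged fine-tuning sketch (illustrative only): the ``heads.head`` attribute name reflects the current torchvision ``VisionTransformer`` implementation and may change:

.. code:: python

    import torch
    from torchvision.models import vit_b_16

    # Start from random weights and adapt the classification head to 10 classes
    model = vit_b_16(weights=None)
    model.heads.head = torch.nn.Linear(model.heads.head.in_features, 10)
    out = model(torch.rand(1, 3, 224, 224))  # shape (1, 10)
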
Wide ResNet
===========
.. currentmodule:: torchvision.models
The Wide ResNet model is based on the `Wide Residual Networks <https://arxiv.org/abs/1605.07146>`__
paper.
Model builders
--------------
The following model builders can be used to instantiate a Wide ResNet model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.resnet.ResNet`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
wide_resnet50_2
wide_resnet101_2
.. _ops:
Operators
=========
.. currentmodule:: torchvision.ops
:mod:`torchvision.ops` implements operators, losses and layers that are specific for Computer Vision.
.. note::
All operators have native support for TorchScript.
Detection and Segmentation Operators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The below operators perform pre-processing as well as post-processing required in object detection and segmentation models.
.. autosummary::
:toctree: generated/
:template: function.rst
batched_nms
masks_to_boxes
nms
roi_align
roi_pool
ps_roi_align
ps_roi_pool
.. autosummary::
:toctree: generated/
:template: class.rst
FeaturePyramidNetwork
MultiScaleRoIAlign
RoIAlign
RoIPool
PSRoIAlign
PSRoIPool
Box Operators
~~~~~~~~~~~~~
These utility functions perform various operations on bounding boxes.
.. autosummary::
:toctree: generated/
:template: function.rst
box_area
box_convert
box_iou
clip_boxes_to_image
complete_box_iou
distance_box_iou
generalized_box_iou
remove_small_boxes
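For example, ``box_iou`` and ``box_convert`` can be combined as in the following hedged sketch (inputs are made up for illustration):

.. code:: python

    import torch
    from torchvision.ops import box_convert, box_iou

    boxes1 = torch.tensor([[0.0, 0.0, 10.0, 10.0]])   # (x1, y1, x2, y2)
    boxes2 = torch.tensor([[5.0, 5.0, 15.0, 15.0]])
    iou = box_iou(boxes1, boxes2)                      # pairwise IoU matrix, here 1x1

    # Convert from center format (cx, cy, w, h) to corner format (x1, y1, x2, y2)
    xyxy = box_convert(torch.tensor([[5.0, 5.0, 10.0, 10.0]]), in_fmt="cxcywh", out_fmt="xyxy")
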
Losses
~~~~~~
The following vision-specific loss functions are implemented:
.. autosummary::
:toctree: generated/
:template: function.rst
complete_box_iou_loss
distance_box_iou_loss
generalized_box_iou_loss
sigmoid_focal_loss
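A short hedged sketch of how these losses are typically called (the tensors below are dummy values for illustration):

.. code:: python

    import torch
    from torchvision.ops import generalized_box_iou_loss, sigmoid_focal_loss

    logits = torch.randn(8, 4)                          # raw, unnormalized predictions
    targets = torch.randint(0, 2, (8, 4)).float()       # binary targets
    cls_loss = sigmoid_focal_loss(logits, targets, reduction="mean")

    pred_boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
    gt_boxes = torch.tensor([[2.0, 2.0, 12.0, 12.0]])
    box_loss = generalized_box_iou_loss(pred_boxes, gt_boxes, reduction="mean")
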
Layers
~~~~~~
TorchVision provides commonly used building blocks as layers:
.. autosummary::
:toctree: generated/
:template: class.rst
Conv2dNormActivation
Conv3dNormActivation
DeformConv2d
DropBlock2d
DropBlock3d
FrozenBatchNorm2d
MLP
Permute
SqueezeExcitation
StochasticDepth
.. autosummary::
:toctree: generated/
:template: function.rst
deform_conv2d
drop_block2d
drop_block3d
stochastic_depth
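As a hedged sketch, a few of these layers can be composed into a small block; the constructor arguments shown are illustrative and follow the signatures of recent torchvision releases:

.. code:: python

    import torch
    from torchvision.ops import Conv2dNormActivation, SqueezeExcitation, StochasticDepth

    block = torch.nn.Sequential(
        Conv2dNormActivation(3, 32, kernel_size=3, stride=2),      # Conv2d + BatchNorm2d + ReLU
        SqueezeExcitation(input_channels=32, squeeze_channels=8),  # channel-wise attention
        StochasticDepth(p=0.1, mode="row"),                        # randomly drops residual branches
    )
    out = block(torch.rand(1, 3, 224, 224))
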
Training references
===================
On top of the many models, datasets, and image transforms, Torchvision also
provides training reference scripts. These are the scripts that we use to train
the :ref:`models <models>` which are then available with pre-trained weights.
These scripts are not part of the core package and are instead available `on
GitHub <https://github.com/pytorch/vision/tree/main/references>`_. We currently
provide references for
`classification <https://github.com/pytorch/vision/tree/main/references/classification>`_,
`detection <https://github.com/pytorch/vision/tree/main/references/detection>`_,
`segmentation <https://github.com/pytorch/vision/tree/main/references/segmentation>`_,
`similarity learning <https://github.com/pytorch/vision/tree/main/references/similarity>`_,
and `video classification <https://github.com/pytorch/vision/tree/main/references/video_classification>`_.
While these scripts are largely stable, they do not offer backward compatibility
guarantees.
In general, these scripts rely on the latest (not yet released) pytorch version
or the latest torchvision version. This means that to use them, **you might need
to install the latest pytorch and torchvision versions**, with e.g.::

    conda install pytorch torchvision -c pytorch-nightly
If you need to rely on an older stable version of pytorch or torchvision, e.g.
torchvision 0.10, then it's safer to use the scripts from that corresponding
release on GitHub, namely
https://github.com/pytorch/vision/tree/v0.10.0/references.
.. _transforms:
Transforming and augmenting images
==================================
.. currentmodule:: torchvision.transforms
Torchvision supports common computer vision transformations in the
``torchvision.transforms`` and ``torchvision.transforms.v2`` modules. Transforms
can be used to transform or augment data for training or inference of different
tasks (image classification, detection, segmentation, video classification).

.. code:: python

    # Image Classification
    import torch
    from torchvision.transforms import v2

    H, W = 32, 32
    img = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)

    transforms = v2.Compose([
        v2.RandomResizedCrop(size=(224, 224), antialias=True),
        v2.RandomHorizontalFlip(p=0.5),
        v2.ToDtype(torch.float32, scale=True),
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    img = transforms(img)

.. code:: python

    # Detection (re-using imports and transforms from above)
    from torchvision import tv_tensors

    img = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)
    boxes = torch.randint(0, H // 2, size=(3, 4))
    boxes[:, 2:] += boxes[:, :2]
    boxes = tv_tensors.BoundingBoxes(boxes, format="XYXY", canvas_size=(H, W))

    # The same transforms can be used!
    img, boxes = transforms(img, boxes)
    # And you can pass arbitrary input structures
    output_dict = transforms({"image": img, "boxes": boxes})

Transforms are typically passed as the ``transform`` or ``transforms`` argument
to the :ref:`Datasets <datasets>`.
Start here
----------
Whether you're new to Torchvision transforms, or you're already experienced with
them, we encourage you to start with
:ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py` in
order to learn more about what can be done with the new v2 transforms.
Then, browse the sections below on this page for general information and
performance tips. The available transforms and functionals are listed in the
:ref:`API reference <v2_api_ref>`.
More information and tutorials can also be found in our :ref:`example gallery
<gallery>`, e.g. :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`
or :ref:`sphx_glr_auto_examples_transforms_plot_custom_transforms.py`.
.. _conventions:
Supported input types and conventions
-------------------------------------
Most transformations accept both `PIL <https://pillow.readthedocs.io>`_ images
and tensor inputs. Both CPU and CUDA tensors are supported.
The result of both backends (PIL or Tensors) should be very
close. In general, we recommend relying on the tensor backend :ref:`for
performance <transforms_perf>`. The :ref:`conversion transforms
<conversion_transforms>` may be used to convert to and from PIL images, or for
converting dtypes and ranges.
Tensor image are expected to be of shape ``(C, H, W)``, where ``C`` is the
number of channels, and ``H`` and ``W`` refer to height and width. Most
transforms support batched tensor input. A batch of Tensor images is a tensor of
shape ``(N, C, H, W)``, where ``N`` is a number of images in the batch. The
:ref:`v2 <v1_or_v2>` transforms generally accept an arbitrary number of leading
dimensions ``(..., C, H, W)`` and can handle batched images or batched videos.
.. _range_and_dtype:
Dtype and expected value range
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The expected range of the values of a tensor image is implicitly defined by
the tensor dtype. Tensor images with a float dtype are expected to have
values in ``[0, 1]``. Tensor images with an integer dtype are expected to
have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
that can be represented in that dtype. Typically, images of dtype
``torch.uint8`` are expected to have values in ``[0, 255]``.
Use :class:`~torchvision.transforms.v2.ToDtype` to convert both the dtype and
range of the inputs.
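A small hedged sketch of the dtype/range conversion described above, using ``v2.ToDtype`` with ``scale=True``:

.. code:: python

    import torch
    from torchvision.transforms import v2

    img_uint8 = torch.randint(0, 256, size=(3, 224, 224), dtype=torch.uint8)

    # uint8 [0, 255] -> float32 [0.0, 1.0]
    img_float = v2.ToDtype(torch.float32, scale=True)(img_uint8)

    # float32 [0.0, 1.0] -> uint8 [0, 255]
    img_back = v2.ToDtype(torch.uint8, scale=True)(img_float)
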
.. _v1_or_v2:
V1 or V2? Which one should I use?
---------------------------------
**TL;DR** We recommend using the ``torchvision.transforms.v2`` transforms
instead of those in ``torchvision.transforms``. They're faster and they can do
more things. Just change the import and you should be good to go. Moving
forward, new features and improvements will only be considered for the v2
transforms.
In Torchvision 0.15 (March 2023), we released a new set of transforms available
in the ``torchvision.transforms.v2`` namespace. These transforms have a lot of
advantages compared to the v1 ones (in ``torchvision.transforms``):
- They can transform images **but also** bounding boxes, masks, or videos. This
provides support for tasks beyond image classification: detection, segmentation,
video classification, etc. See
:ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py`
and :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`.
- They support more transforms like :class:`~torchvision.transforms.v2.CutMix`
and :class:`~torchvision.transforms.v2.MixUp`. See
:ref:`sphx_glr_auto_examples_transforms_plot_cutmix_mixup.py`.
- They're :ref:`faster <transforms_perf>`.
- They support arbitrary input structures (dicts, lists, tuples, etc.).
- Future improvements and features will be added to the v2 transforms only.
These transforms are **fully backward compatible** with the v1 ones, so if
you're already using transforms from ``torchvision.transforms``, all you need to
do is to update the import to ``torchvision.transforms.v2``. In terms of
output, there might be negligible differences due to implementation differences.
.. _transforms_perf:
Performance considerations
--------------------------
We recommend the following guidelines to get the best performance out of the
transforms:
- Rely on the v2 transforms from ``torchvision.transforms.v2``
- Use tensors instead of PIL images
- Use ``torch.uint8`` dtype, especially for resizing
- Resize with bilinear or bicubic mode
This is what a typical transform pipeline could look like:

.. code:: python

    from torchvision.transforms import v2
    transforms = v2.Compose([
        v2.ToImage(),  # Convert to tensor, only needed if you had a PIL image
        v2.ToDtype(torch.uint8, scale=True),  # optional, most inputs are already uint8 at this point
        # ...
        v2.RandomResizedCrop(size=(224, 224), antialias=True),  # Or Resize(antialias=True)
        # ...
        v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

The above should give you the best performance in a typical training environment
that relies on the :class:`torch.utils.data.DataLoader` with ``num_workers >
0``.
Transforms tend to be sensitive to the input strides / memory format. Some
transforms will be faster with channels-first images while others prefer
channels-last. Like ``torch`` operators, most transforms will preserve the
memory format of the input, but this may not always be respected due to
implementation details. You may want to experiment a bit if you're chasing the
very best performance. Using :func:`torch.compile` on individual transforms may
also help factoring out the memory format variable (e.g. on
:class:`~torchvision.transforms.v2.Normalize`). Note that we're talking about
**memory format**, not :ref:`tensor shape <conventions>`.
Note that resize transforms like :class:`~torchvision.transforms.v2.Resize`
and :class:`~torchvision.transforms.v2.RandomResizedCrop` typically prefer
channels-last input and tend **not** to benefit from :func:`torch.compile` at
this time.
.. _functional_transforms:
Transform classes, functionals, and kernels
-------------------------------------------
Transforms are available as classes like
:class:`~torchvision.transforms.v2.Resize`, but also as functionals like
:func:`~torchvision.transforms.v2.functional.resize` in the
``torchvision.transforms.v2.functional`` namespace.
This is very much like the :mod:`torch.nn` package which defines both classes
and functional equivalents in :mod:`torch.nn.functional`.
The functionals support PIL images, pure tensors, or :ref:`TVTensors
<tv_tensors>`, e.g. both ``resize(image_tensor)`` and ``resize(boxes)`` are
valid.
.. note::
Random transforms like :class:`~torchvision.transforms.v2.RandomCrop` will
randomly sample some parameter each time they're called. Their functional
counterpart (:func:`~torchvision.transforms.v2.functional.crop`) does not do
any kind of random sampling and thus has a slightly different
parametrization. The ``get_params()`` class method of the transforms class
can be used to perform parameter sampling when using the functional APIs.
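For illustration, a hedged sketch of using the functional API to apply the same geometric transform to an image and its segmentation mask; the parameter is sampled by hand, and the shapes below are made up for the example:

.. code:: python

    import random
    import torch
    from torchvision.transforms.v2 import functional as F

    img = torch.randint(0, 256, size=(3, 224, 224), dtype=torch.uint8)
    mask = torch.randint(0, 2, size=(1, 224, 224), dtype=torch.uint8)

    # The functional does no random sampling: we pick the angle ourselves,
    # then apply exactly the same rotation to the image and to the mask.
    angle = random.uniform(-30.0, 30.0)
    img_rotated = F.rotate(img, angle)
    mask_rotated = F.rotate(mask, angle)
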
The ``torchvision.transforms.v2.functional`` namespace also contains what we
call the "kernels". These are the low-level functions that implement the
core functionalities for specific types, e.g. ``resize_bounding_boxes`` or
``resized_crop_mask``. They are public, although not documented. Check the
`code
<https://github.com/pytorch/vision/blob/main/torchvision/transforms/v2/functional/__init__.py>`_
to see which ones are available (note that those starting with a leading
underscore are **not** public!). Kernels are only really useful if you want
:ref:`torchscript support <transforms_torchscript>` for types like bounding
boxes or masks.
.. _transforms_torchscript:
Torchscript support
-------------------
Most transform classes and functionals support torchscript. For composing
transforms, use :class:`torch.nn.Sequential` instead of
:class:`~torchvision.transforms.v2.Compose`:

.. code:: python

    transforms = torch.nn.Sequential(
        CenterCrop(10),
        Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    )
    scripted_transforms = torch.jit.script(transforms)

.. warning::
v2 transforms support torchscript, but if you call ``torch.jit.script()`` on
a v2 **class** transform, you'll actually end up with its (scripted) v1
equivalent. This may lead to slightly different results between the
scripted and eager executions due to implementation differences between v1
and v2.
If you really need torchscript support for the v2 transforms, we recommend
scripting the **functionals** from the
``torchvision.transforms.v2.functional`` namespace to avoid surprises.
Also note that the functionals only support torchscript for pure tensors, which
are always treated as images. If you need torchscript support for other types
like bounding boxes or masks, you can rely on the :ref:`low-level kernels
<functional_transforms>`.
For any custom transformations to be used with ``torch.jit.script``, they should
be derived from ``torch.nn.Module``.
See also: :ref:`sphx_glr_auto_examples_others_plot_scripted_tensor_transforms.py`.
.. _v2_api_ref:
V2 API reference - Recommended
------------------------------
Geometry
^^^^^^^^
Resizing
""""""""
.. autosummary::
:toctree: generated/
:template: class.rst
v2.Resize
v2.ScaleJitter
v2.RandomShortestSize
v2.RandomResize
Functionals
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.resize
Cropping
""""""""
.. autosummary::
:toctree: generated/
:template: class.rst
v2.RandomCrop
v2.RandomResizedCrop
v2.RandomIoUCrop
v2.CenterCrop
v2.FiveCrop
v2.TenCrop
Functionals
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.crop
v2.functional.resized_crop
v2.functional.ten_crop
v2.functional.center_crop
v2.functional.five_crop
Others
""""""
.. autosummary::
:toctree: generated/
:template: class.rst
v2.RandomHorizontalFlip
v2.RandomVerticalFlip
v2.Pad
v2.RandomZoomOut
v2.RandomRotation
v2.RandomAffine
v2.RandomPerspective
v2.ElasticTransform
Functionals
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.horizontal_flip
v2.functional.vertical_flip
v2.functional.pad
v2.functional.rotate
v2.functional.affine
v2.functional.perspective
v2.functional.elastic
Color
^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
v2.ColorJitter
v2.RandomChannelPermutation
v2.RandomPhotometricDistort
v2.Grayscale
v2.RGB
v2.RandomGrayscale
v2.GaussianBlur
v2.GaussianNoise
v2.RandomInvert
v2.RandomPosterize
v2.RandomSolarize
v2.RandomAdjustSharpness
v2.RandomAutocontrast
v2.RandomEqualize
Functionals
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.permute_channels
v2.functional.rgb_to_grayscale
v2.functional.grayscale_to_rgb
v2.functional.to_grayscale
v2.functional.gaussian_blur
v2.functional.gaussian_noise
v2.functional.invert
v2.functional.posterize
v2.functional.solarize
v2.functional.adjust_sharpness
v2.functional.autocontrast
v2.functional.adjust_contrast
v2.functional.equalize
v2.functional.adjust_brightness
v2.functional.adjust_saturation
v2.functional.adjust_hue
v2.functional.adjust_gamma
Composition
^^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
v2.Compose
v2.RandomApply
v2.RandomChoice
v2.RandomOrder
Miscellaneous
^^^^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
v2.LinearTransformation
v2.Normalize
v2.RandomErasing
v2.Lambda
v2.SanitizeBoundingBoxes
v2.ClampBoundingBoxes
v2.UniformTemporalSubsample
v2.JPEG
Functionals
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.normalize
v2.functional.erase
v2.functional.sanitize_bounding_boxes
v2.functional.clamp_bounding_boxes
v2.functional.uniform_temporal_subsample
v2.functional.jpeg
.. _conversion_transforms:
Conversion
^^^^^^^^^^
.. note::
Beware, some of these conversion transforms below will scale the values
while performing the conversion, while some may not do any scaling. By
scaling, we mean e.g. that a ``uint8`` -> ``float32`` would map the [0,
255] range into [0, 1] (and vice-versa). See :ref:`range_and_dtype`.
.. autosummary::
:toctree: generated/
:template: class.rst
v2.ToImage
v2.ToPureTensor
v2.PILToTensor
v2.ToPILImage
v2.ToDtype
v2.ConvertBoundingBoxFormat
Functionals
.. autosummary::
:toctree: generated/
:template: functional.rst
v2.functional.to_image
v2.functional.pil_to_tensor
v2.functional.to_pil_image
v2.functional.to_dtype
v2.functional.convert_bounding_box_format
Deprecated
^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
v2.ToTensor
v2.functional.to_tensor
v2.ConvertImageDtype
v2.functional.convert_image_dtype
Auto-Augmentation
^^^^^^^^^^^^^^^^^
`AutoAugment <https://arxiv.org/pdf/1805.09501.pdf>`_ is a common Data Augmentation technique that can improve the accuracy of Image Classification models.
Though the data augmentation policies are directly linked to their trained dataset, empirical studies show that
ImageNet policies provide significant improvements when applied to other datasets.
In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN.
The new transform can be used standalone or mixed-and-matched with existing transforms:
.. autosummary::
:toctree: generated/
:template: class.rst
v2.AutoAugment
v2.RandAugment
v2.TrivialAugmentWide
v2.AugMix
CutMix - MixUp
^^^^^^^^^^^^^^
CutMix and MixUp are special transforms that
are meant to be used on batches rather than on individual images, because they
are combining pairs of images together. These can be used after the dataloader
(once the samples are batched), or part of a collation function. See
:ref:`sphx_glr_auto_examples_transforms_plot_cutmix_mixup.py` for detailed usage examples.
.. autosummary::
:toctree: generated/
:template: class.rst
v2.CutMix
v2.MixUp
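A hedged usage sketch (batch shapes and ``NUM_CLASSES`` are made up for the example), applying CutMix or MixUp at random to a batch produced by the DataLoader:

.. code:: python

    import torch
    from torchvision.transforms import v2

    NUM_CLASSES = 10
    cutmix_or_mixup = v2.RandomChoice([v2.CutMix(num_classes=NUM_CLASSES),
                                       v2.MixUp(num_classes=NUM_CLASSES)])

    # Typically applied to a full batch, right after the DataLoader
    images = torch.rand(8, 3, 224, 224)
    labels = torch.randint(0, NUM_CLASSES, (8,))
    images, labels = cutmix_or_mixup(images, labels)  # labels become soft, shape (8, NUM_CLASSES)
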
Developer tools
^^^^^^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.register_kernel
V1 API Reference
----------------
Geometry
^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
Resize
RandomCrop
RandomResizedCrop
CenterCrop
FiveCrop
TenCrop
Pad
RandomRotation
RandomAffine
RandomPerspective
ElasticTransform
RandomHorizontalFlip
RandomVerticalFlip
Color
^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
ColorJitter
Grayscale
RandomGrayscale
GaussianBlur
RandomInvert
RandomPosterize
RandomSolarize
RandomAdjustSharpness
RandomAutocontrast
RandomEqualize
Composition
^^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
Compose
RandomApply
RandomChoice
RandomOrder
Miscellaneous
^^^^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
LinearTransformation
Normalize
RandomErasing
Lambda
Conversion
^^^^^^^^^^
.. note::
Beware, some of these conversion transforms below will scale the values
while performing the conversion, while some may not do any scaling. By
scaling, we mean e.g. that a ``uint8`` -> ``float32`` would map the [0,
255] range into [0, 1] (and vice-versa). See :ref:`range_and_dtype`.
.. autosummary::
:toctree: generated/
:template: class.rst
ToPILImage
ToTensor
PILToTensor
ConvertImageDtype
Auto-Augmentation
^^^^^^^^^^^^^^^^^
`AutoAugment <https://arxiv.org/pdf/1805.09501.pdf>`_ is a common Data Augmentation technique that can improve the accuracy of Image Classification models.
Though the data augmentation policies are directly linked to their trained dataset, empirical studies show that
ImageNet policies provide significant improvements when applied to other datasets.
In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN.
The new transform can be used standalone or mixed-and-matched with existing transforms:
.. autosummary::
:toctree: generated/
:template: class.rst
AutoAugmentPolicy
AutoAugment
RandAugment
TrivialAugmentWide
AugMix
Functional Transforms
^^^^^^^^^^^^^^^^^^^^^
.. currentmodule:: torchvision.transforms.functional
.. autosummary::
:toctree: generated/
:template: function.rst
adjust_brightness
adjust_contrast
adjust_gamma
adjust_hue
adjust_saturation
adjust_sharpness
affine
autocontrast
center_crop
convert_image_dtype
crop
equalize
erase
five_crop
gaussian_blur
get_dimensions
get_image_num_channels
get_image_size
hflip
invert
normalize
pad
perspective
pil_to_tensor
posterize
resize
resized_crop
rgb_to_grayscale
rotate
solarize
ten_crop
to_grayscale
to_pil_image
to_tensor
vflip