Commit bf491463 authored by limm

add v0.19.1 release

parent e17f5ea2
Quantized ResNet
================
.. currentmodule:: torchvision.models.quantization
The Quantized ResNet model is based on the `Deep Residual Learning for Image Recognition
<https://arxiv.org/abs/1512.03385>`_ paper.
Model builders
--------------
The following model builders can be used to instantiate a quantized ResNet
model, with or without pre-trained weights. All the model builders internally
rely on the ``torchvision.models.quantization.resnet.QuantizableResNet``
base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/quantization/resnet.py>`_
for more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
resnet18
resnet50
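As a quick, hedged sketch (not part of the original page), a quantized model can be instantiated through these builders roughly as follows; the ``ResNet50_QuantizedWeights`` enum, the ``quantize`` flag, and the bundled ``weights.transforms()`` preset are assumed to behave as in recent torchvision releases:

.. code:: python

    import torch
    from torchvision.models.quantization import resnet50, ResNet50_QuantizedWeights

    # Load an int8-quantized ResNet-50 with its pre-trained weights
    weights = ResNet50_QuantizedWeights.DEFAULT
    model = resnet50(weights=weights, quantize=True)
    model.eval()

    # The weights enum bundles the matching preprocessing transforms
    img = torch.randint(0, 256, size=(3, 224, 224), dtype=torch.uint8)
    batch = weights.transforms()(img).unsqueeze(0)
    scores = model(batch).softmax(dim=1)
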
ResNeXt
=======
.. currentmodule:: torchvision.models
The ResNeXt model is based on the `Aggregated Residual Transformations for Deep Neural Networks <https://arxiv.org/abs/1611.05431v2>`__
paper.
Model builders
--------------
The following model builders can be used to instantiate a ResNeXt model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.resnet.ResNet`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
resnext50_32x4d
resnext101_32x8d
resnext101_64x4d
Quantized ResNeXt
=================
.. currentmodule:: torchvision.models.quantization
The quantized ResNeXt model is based on the `Aggregated Residual Transformations for Deep Neural Networks <https://arxiv.org/abs/1611.05431v2>`__
paper.
Model builders
--------------
The following model builders can be used to instantiate a quantized ResNeXt
model, with or without pre-trained weights. All the model builders internally
rely on the ``torchvision.models.quantization.resnet.QuantizableResNet``
base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/quantization/resnet.py>`_
for more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
resnext101_32x8d
resnext101_64x4d
RetinaNet
=========
.. currentmodule:: torchvision.models.detection
The RetinaNet model is based on the `Focal Loss for Dense Object Detection
<https://arxiv.org/abs/1708.02002>`__ paper.
.. betastatus:: detection module
Model builders
--------------
The following model builders can be used to instantiate a RetinaNet model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.detection.retinanet.RetinaNet`` base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
retinanet_resnet50_fpn
retinanet_resnet50_fpn_v2
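For illustration only, a hedged sketch of running inference with one of these builders; it assumes the ``RetinaNet_ResNet50_FPN_V2_Weights`` enum and the usual detection-model calling convention (a list of 3D float tensors in ``[0, 1]``):

.. code:: python

    import torch
    from torchvision.models.detection import retinanet_resnet50_fpn_v2, RetinaNet_ResNet50_FPN_V2_Weights

    weights = RetinaNet_ResNet50_FPN_V2_Weights.DEFAULT
    model = retinanet_resnet50_fpn_v2(weights=weights)
    model.eval()

    # Detection models take a list of 3D float tensors in [0, 1], one per image
    images = [torch.rand(3, 480, 640)]
    predictions = model(images)
    print(predictions[0]["boxes"].shape, predictions[0]["scores"][:5])
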
ShuffleNet V2
=============
.. currentmodule:: torchvision.models
The ShuffleNet V2 model is based on the `ShuffleNet V2: Practical Guidelines for Efficient
CNN Architecture Design <https://arxiv.org/abs/1807.11164>`__ paper.
Model builders
--------------
The following model builders can be used to instantiate a ShuffleNetV2 model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.shufflenetv2.ShuffleNetV2`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/shufflenetv2.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
shufflenet_v2_x0_5
shufflenet_v2_x1_0
shufflenet_v2_x1_5
shufflenet_v2_x2_0
Quantized ShuffleNet V2
=======================
.. currentmodule:: torchvision.models.quantization
The Quantized ShuffleNet V2 model is based on the `ShuffleNet V2: Practical Guidelines for Efficient
CNN Architecture Design <https://arxiv.org/abs/1807.11164>`__ paper.
Model builders
--------------
The following model builders can be used to instantiate a quantized ShuffleNetV2
model, with or without pre-trained weights. All the model builders internally rely
on the ``torchvision.models.quantization.shufflenetv2.QuantizableShuffleNetV2``
base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/quantization/shufflenetv2.py>`_
for more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
shufflenet_v2_x0_5
shufflenet_v2_x1_0
shufflenet_v2_x1_5
shufflenet_v2_x2_0
SqueezeNet
==========
.. currentmodule:: torchvision.models
The SqueezeNet model is based on the `SqueezeNet: AlexNet-level accuracy with
50x fewer parameters and <0.5MB model size <https://arxiv.org/abs/1602.07360>`__
paper.
Model builders
--------------
The following model builders can be used to instantiate a SqueezeNet model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.squeezenet.SqueezeNet`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
squeezenet1_0
squeezenet1_1
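Besides calling the builders directly, models can also be looked up by name through the generic registration helpers; a small hedged sketch, assuming ``get_model`` and ``get_model_weights`` as available in recent torchvision releases:

.. code:: python

    from torchvision.models import get_model, get_model_weights

    # Equivalent to calling the builder directly: squeezenet1_1(weights=...)
    model = get_model("squeezenet1_1", weights="DEFAULT")

    # Enumerate all weight sets registered for this builder
    for weights in get_model_weights("squeezenet1_1"):
        print(weights, weights.meta["num_params"])
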
SSD
===
.. currentmodule:: torchvision.models.detection
The SSD model is based on the `SSD: Single Shot MultiBox Detector
<https://arxiv.org/abs/1512.02325>`__ paper.
.. betastatus:: detection module
Model builders
--------------
The following model builders can be used to instantiate an SSD model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.detection.SSD`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
ssd300_vgg16
SSDlite
=======
.. currentmodule:: torchvision.models.detection
The SSDLite model is based on the `SSD: Single Shot MultiBox Detector
<https://arxiv.org/abs/1512.02325>`__, `Searching for MobileNetV3
<https://arxiv.org/abs/1905.02244>`__ and `MobileNetV2: Inverted Residuals and Linear
Bottlenecks <https://arxiv.org/abs/1801.04381>`__ papers.
.. betastatus:: detection module
Model builders
--------------
The following model builders can be used to instantiate an SSDlite model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.detection.ssd.SSD`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssdlite.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
ssdlite320_mobilenet_v3_large
SwinTransformer
===============
.. currentmodule:: torchvision.models
The SwinTransformer models are based on the `Swin Transformer: Hierarchical Vision
Transformer using Shifted Windows <https://arxiv.org/abs/2103.14030>`__
paper.
SwinTransformer V2 models are based on the `Swin Transformer V2: Scaling Up Capacity
and Resolution <https://openaccess.thecvf.com/content/CVPR2022/papers/Liu_Swin_Transformer_V2_Scaling_Up_Capacity_and_Resolution_CVPR_2022_paper.pdf>`__
paper.
Model builders
--------------
The following model builders can be used to instantiate a SwinTransformer model (original and V2), with or without pre-trained weights.
All the model builders internally rely on the ``torchvision.models.swin_transformer.SwinTransformer``
base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
swin_t
swin_s
swin_b
swin_v2_t
swin_v2_s
swin_v2_b
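As a hedged classification sketch (not from the original page), assuming the ``Swin_V2_T_Weights`` enum, its bundled preprocessing preset, and the ``categories`` metadata found in recent torchvision releases:

.. code:: python

    import torch
    from torchvision.models import swin_v2_t, Swin_V2_T_Weights

    weights = Swin_V2_T_Weights.DEFAULT
    model = swin_v2_t(weights=weights)
    model.eval()

    img = torch.randint(0, 256, size=(3, 256, 256), dtype=torch.uint8)
    batch = weights.transforms()(img).unsqueeze(0)
    class_id = model(batch).softmax(dim=1).argmax(dim=1).item()
    print(weights.meta["categories"][class_id])
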
VGG
===
.. currentmodule:: torchvision.models
The VGG model is based on the `Very Deep Convolutional Networks for Large-Scale
Image Recognition <https://arxiv.org/abs/1409.1556>`_ paper.
Model builders
--------------
The following model builders can be used to instantiate a VGG model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.vgg.VGG`` base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/vgg.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
vgg11
vgg11_bn
vgg13
vgg13_bn
vgg16
vgg16_bn
vgg19
vgg19_bn
Video MViT
==========
.. currentmodule:: torchvision.models.video
The MViT model is based on the
`MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
<https://arxiv.org/abs/2112.01526>`__ and `Multiscale Vision Transformers
<https://arxiv.org/abs/2104.11227>`__ papers.
Model builders
--------------
The following model builders can be used to instantiate an MViT v1 or v2 model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.video.MViT`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/video/mvit.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
mvit_v1_b
mvit_v2_s
Video ResNet
============
.. currentmodule:: torchvision.models.video
The VideoResNet model is based on the `A Closer Look at Spatiotemporal
Convolutions for Action Recognition <https://arxiv.org/abs/1711.11248>`__ paper.
.. betastatus:: video module
Model builders
--------------
The following model builders can be used to instantiate a VideoResNet model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.video.resnet.VideoResNet`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/video/resnet.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
r3d_18
mc3_18
r2plus1d_18
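A minimal hedged sketch of using one of these video builders; it assumes the ``R3D_18_Weights`` enum and the usual clip layout of ``(B, C, T, H, W)``:

.. code:: python

    import torch
    from torchvision.models.video import r3d_18, R3D_18_Weights

    model = r3d_18(weights=R3D_18_Weights.DEFAULT)
    model.eval()

    # Video models expect clips shaped (B, C, T, H, W)
    clip = torch.rand(1, 3, 16, 112, 112)
    scores = model(clip).softmax(dim=1)
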
Video S3D
=========
.. currentmodule:: torchvision.models.video
The S3D model is based on the
`Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
<https://arxiv.org/abs/1712.04851>`__ paper.
Model builders
--------------
The following model builders can be used to instantiate an S3D model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.video.S3D`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/video/s3d.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
s3d
Video SwinTransformer
=====================
.. currentmodule:: torchvision.models.video
The Video SwinTransformer model is based on the `Video Swin Transformer <https://arxiv.org/abs/2106.13230>`__ paper.
.. betastatus:: video module
Model builders
--------------
The following model builders can be used to instantiate a Video SwinTransformer model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.video.swin_transformer.SwinTransformer3d`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/video/swin_transformer.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
swin3d_t
swin3d_s
swin3d_b
VisionTransformer
=================
.. currentmodule:: torchvision.models
The VisionTransformer model is based on the `An Image is Worth 16x16 Words:
Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`_ paper.
Model builders
--------------
The following model builders can be used to instantiate a VisionTransformer model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.vision_transformer.VisionTransformer`` base class.
Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
vit_b_16
vit_b_32
vit_l_16
vit_l_32
vit_h_14
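A hedged fine-tuning sketch (illustrative only): the ``heads.head`` attribute name reflects the current torchvision ``VisionTransformer`` implementation and may change:

.. code:: python

    import torch
    from torchvision.models import vit_b_16

    # Start from random weights and adapt the classification head to 10 classes
    model = vit_b_16(weights=None)
    model.heads.head = torch.nn.Linear(model.heads.head.in_features, 10)
    out = model(torch.rand(1, 3, 224, 224))  # shape (1, 10)
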
Wide ResNet
===========
.. currentmodule:: torchvision.models
The Wide ResNet model is based on the `Wide Residual Networks <https://arxiv.org/abs/1605.07146>`__
paper.
Model builders
--------------
The following model builders can be used to instantiate a Wide ResNet model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.resnet.ResNet`` base class. Please refer to the `source
code
<https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py>`_ for
more details about this class.
.. autosummary::
:toctree: generated/
:template: function.rst
wide_resnet50_2
wide_resnet101_2
.. _ops:
Operators
=========
.. currentmodule:: torchvision.ops
:mod:`torchvision.ops` implements operators, losses and layers that are specific for Computer Vision.
.. note::
All operators have native support for TorchScript.
Detection and Segmentation Operators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The below operators perform pre-processing as well as post-processing required in object detection and segmentation models.
.. autosummary::
:toctree: generated/
:template: function.rst
batched_nms
masks_to_boxes
nms
roi_align
roi_pool
ps_roi_align
ps_roi_pool
.. autosummary::
:toctree: generated/
:template: class.rst
FeaturePyramidNetwork
MultiScaleRoIAlign
RoIAlign
RoIPool
PSRoIAlign
PSRoIPool
Box Operators
~~~~~~~~~~~~~
These utility functions perform various operations on bounding boxes.
.. autosummary::
:toctree: generated/
:template: function.rst
box_area
box_convert
box_iou
clip_boxes_to_image
complete_box_iou
distance_box_iou
generalized_box_iou
remove_small_boxes
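For example, ``box_iou`` and ``box_convert`` can be combined as in the following hedged sketch (inputs are made up for illustration):

.. code:: python

    import torch
    from torchvision.ops import box_convert, box_iou

    boxes1 = torch.tensor([[0.0, 0.0, 10.0, 10.0]])   # (x1, y1, x2, y2)
    boxes2 = torch.tensor([[5.0, 5.0, 15.0, 15.0]])
    iou = box_iou(boxes1, boxes2)                      # pairwise IoU matrix, here 1x1

    # Convert from center format (cx, cy, w, h) to corner format (x1, y1, x2, y2)
    xyxy = box_convert(torch.tensor([[5.0, 5.0, 10.0, 10.0]]), in_fmt="cxcywh", out_fmt="xyxy")
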
Losses
~~~~~~
The following vision-specific loss functions are implemented:
.. autosummary::
:toctree: generated/
:template: function.rst
complete_box_iou_loss
distance_box_iou_loss
generalized_box_iou_loss
sigmoid_focal_loss
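A short hedged sketch of how these losses are typically called (the tensors below are dummy values for illustration):

.. code:: python

    import torch
    from torchvision.ops import generalized_box_iou_loss, sigmoid_focal_loss

    logits = torch.randn(8, 4)                          # raw, unnormalized predictions
    targets = torch.randint(0, 2, (8, 4)).float()       # binary targets
    cls_loss = sigmoid_focal_loss(logits, targets, reduction="mean")

    pred_boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
    gt_boxes = torch.tensor([[2.0, 2.0, 12.0, 12.0]])
    box_loss = generalized_box_iou_loss(pred_boxes, gt_boxes, reduction="mean")
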
Layers
~~~~~~
TorchVision provides commonly used building blocks as layers:
.. autosummary::
:toctree: generated/
:template: class.rst
Conv2dNormActivation
Conv3dNormActivation
DeformConv2d
DropBlock2d
DropBlock3d
FrozenBatchNorm2d
MLP
Permute
SqueezeExcitation
StochasticDepth
.. autosummary::
:toctree: generated/
:template: function.rst
deform_conv2d
drop_block2d
drop_block3d
stochastic_depth
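As a hedged sketch, a few of these layers can be composed into a small block; the constructor arguments shown are illustrative and follow the signatures of recent torchvision releases:

.. code:: python

    import torch
    from torchvision.ops import Conv2dNormActivation, SqueezeExcitation, StochasticDepth

    block = torch.nn.Sequential(
        Conv2dNormActivation(3, 32, kernel_size=3, stride=2),      # Conv2d + BatchNorm2d + ReLU
        SqueezeExcitation(input_channels=32, squeeze_channels=8),  # channel-wise attention
        StochasticDepth(p=0.1, mode="row"),                        # randomly drops residual branches
    )
    out = block(torch.rand(1, 3, 224, 224))
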
Training references
===================
On top of the many models, datasets, and image transforms, Torchvision also
provides training reference scripts. These are the scripts that we use to train
the :ref:`models <models>` which are then available with pre-trained weights.
These scripts are not part of the core package and are instead available `on
GitHub <https://github.com/pytorch/vision/tree/main/references>`_. We currently
provide references for
`classification <https://github.com/pytorch/vision/tree/main/references/classification>`_,
`detection <https://github.com/pytorch/vision/tree/main/references/detection>`_,
`segmentation <https://github.com/pytorch/vision/tree/main/references/segmentation>`_,
`similarity learning <https://github.com/pytorch/vision/tree/main/references/similarity>`_,
and `video classification <https://github.com/pytorch/vision/tree/main/references/video_classification>`_.
While these scripts are largely stable, they do not offer backward compatibility
guarantees.
In general, these scripts rely on the latest (not yet released) pytorch version
or the latest torchvision version. This means that to use them, **you might need
to install the latest pytorch and torchvision versions**, with e.g.::

    conda install pytorch torchvision -c pytorch-nightly
If you need to rely on an older stable version of pytorch or torchvision, e.g.
torchvision 0.10, then it's safer to use the scripts from that corresponding
release on GitHub, namely
https://github.com/pytorch/vision/tree/v0.10.0/references.
.. _transforms:
Transforming and augmenting images
==================================
.. currentmodule:: torchvision.transforms
Torchvision supports common computer vision transformations in the
``torchvision.transforms`` and ``torchvision.transforms.v2`` modules. Transforms
can be used to transform or augment data for training or inference of different
tasks (image classification, detection, segmentation, video classification).

.. code:: python

    # Image Classification
    import torch
    from torchvision.transforms import v2

    H, W = 32, 32
    img = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)

    transforms = v2.Compose([
        v2.RandomResizedCrop(size=(224, 224), antialias=True),
        v2.RandomHorizontalFlip(p=0.5),
        v2.ToDtype(torch.float32, scale=True),
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    img = transforms(img)

.. code:: python

    # Detection (re-using imports and transforms from above)
    from torchvision import tv_tensors

    img = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)
    boxes = torch.randint(0, H // 2, size=(3, 4))
    boxes[:, 2:] += boxes[:, :2]
    boxes = tv_tensors.BoundingBoxes(boxes, format="XYXY", canvas_size=(H, W))

    # The same transforms can be used!
    img, boxes = transforms(img, boxes)
    # And you can pass arbitrary input structures
    output_dict = transforms({"image": img, "boxes": boxes})

Transforms are typically passed as the ``transform`` or ``transforms`` argument
to the :ref:`Datasets <datasets>`.
Start here
----------
Whether you're new to Torchvision transforms, or you're already experienced with
them, we encourage you to start with
:ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py` in
order to learn more about what can be done with the new v2 transforms.
Then, browse the sections below on this page for general information and
performance tips. The available transforms and functionals are listed in the
:ref:`API reference <v2_api_ref>`.
More information and tutorials can also be found in our :ref:`example gallery
<gallery>`, e.g. :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`
or :ref:`sphx_glr_auto_examples_transforms_plot_custom_transforms.py`.
.. _conventions:
Supported input types and conventions
-------------------------------------
Most transformations accept both `PIL <https://pillow.readthedocs.io>`_ images
and tensor inputs. Both CPU and CUDA tensors are supported.
The result of both backends (PIL or Tensors) should be very
close. In general, we recommend relying on the tensor backend :ref:`for
performance <transforms_perf>`. The :ref:`conversion transforms
<conversion_transforms>` may be used to convert to and from PIL images, or for
converting dtypes and ranges.
Tensor image are expected to be of shape ``(C, H, W)``, where ``C`` is the
number of channels, and ``H`` and ``W`` refer to height and width. Most
transforms support batched tensor input. A batch of Tensor images is a tensor of
shape ``(N, C, H, W)``, where ``N`` is a number of images in the batch. The
:ref:`v2 <v1_or_v2>` transforms generally accept an arbitrary number of leading
dimensions ``(..., C, H, W)`` and can handle batched images or batched videos.
.. _range_and_dtype:
Dtype and expected value range
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The expected range of the values of a tensor image is implicitly defined by
the tensor dtype. Tensor images with a float dtype are expected to have
values in ``[0, 1]``. Tensor images with an integer dtype are expected to
have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
that can be represented in that dtype. Typically, images of dtype
``torch.uint8`` are expected to have values in ``[0, 255]``.
Use :class:`~torchvision.transforms.v2.ToDtype` to convert both the dtype and
range of the inputs.
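A small hedged sketch of the dtype/range conversion described above, using ``v2.ToDtype`` with ``scale=True``:

.. code:: python

    import torch
    from torchvision.transforms import v2

    img_uint8 = torch.randint(0, 256, size=(3, 224, 224), dtype=torch.uint8)

    # uint8 [0, 255] -> float32 [0.0, 1.0]
    img_float = v2.ToDtype(torch.float32, scale=True)(img_uint8)

    # float32 [0.0, 1.0] -> uint8 [0, 255]
    img_back = v2.ToDtype(torch.uint8, scale=True)(img_float)
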
.. _v1_or_v2:
V1 or V2? Which one should I use?
---------------------------------
**TL;DR** We recommend using the ``torchvision.transforms.v2`` transforms
instead of those in ``torchvision.transforms``. They're faster and they can do
more things. Just change the import and you should be good to go. Moving
forward, new features and improvements will only be considered for the v2
transforms.
In Torchvision 0.15 (March 2023), we released a new set of transforms available
in the ``torchvision.transforms.v2`` namespace. These transforms have a lot of
advantages compared to the v1 ones (in ``torchvision.transforms``):
- They can transform images **but also** bounding boxes, masks, or videos. This
provides support for tasks beyond image classification: detection, segmentation,
video classification, etc. See
:ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py`
and :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`.
- They support more transforms like :class:`~torchvision.transforms.v2.CutMix`
and :class:`~torchvision.transforms.v2.MixUp`. See
:ref:`sphx_glr_auto_examples_transforms_plot_cutmix_mixup.py`.
- They're :ref:`faster <transforms_perf>`.
- They support arbitrary input structures (dicts, lists, tuples, etc.).
- Future improvements and features will be added to the v2 transforms only.
These transforms are **fully backward compatible** with the v1 ones, so if
you're already using transforms from ``torchvision.transforms``, all you need to
do is to update the import to ``torchvision.transforms.v2``. In terms of
output, there might be negligible differences due to implementation differences.
.. _transforms_perf:
Performance considerations
--------------------------
We recommend the following guidelines to get the best performance out of the
transforms:
- Rely on the v2 transforms from ``torchvision.transforms.v2``
- Use tensors instead of PIL images
- Use ``torch.uint8`` dtype, especially for resizing
- Resize with bilinear or bicubic mode
This is what a typical transform pipeline could look like:

.. code:: python

    from torchvision.transforms import v2
    transforms = v2.Compose([
        v2.ToImage(),  # Convert to tensor, only needed if you had a PIL image
        v2.ToDtype(torch.uint8, scale=True),  # optional, most inputs are already uint8 at this point
        # ...
        v2.RandomResizedCrop(size=(224, 224), antialias=True),  # Or Resize(antialias=True)
        # ...
        v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

The above should give you the best performance in a typical training environment
that relies on the :class:`torch.utils.data.DataLoader` with ``num_workers >
0``.
Transforms tend to be sensitive to the input strides / memory format. Some
transforms will be faster with channels-first images while others prefer
channels-last. Like ``torch`` operators, most transforms will preserve the
memory format of the input, but this may not always be respected due to
implementation details. You may want to experiment a bit if you're chasing the
very best performance. Using :func:`torch.compile` on individual transforms may
also help factoring out the memory format variable (e.g. on
:class:`~torchvision.transforms.v2.Normalize`). Note that we're talking about
**memory format**, not :ref:`tensor shape <conventions>`.
Note that resize transforms like :class:`~torchvision.transforms.v2.Resize`
and :class:`~torchvision.transforms.v2.RandomResizedCrop` typically prefer
channels-last input and tend **not** to benefit from :func:`torch.compile` at
this time.
.. _functional_transforms:
Transform classes, functionals, and kernels
-------------------------------------------
Transforms are available as classes like
:class:`~torchvision.transforms.v2.Resize`, but also as functionals like
:func:`~torchvision.transforms.v2.functional.resize` in the
``torchvision.transforms.v2.functional`` namespace.
This is very much like the :mod:`torch.nn` package which defines both classes
and functional equivalents in :mod:`torch.nn.functional`.
The functionals support PIL images, pure tensors, or :ref:`TVTensors
<tv_tensors>`, e.g. both ``resize(image_tensor)`` and ``resize(boxes)`` are
valid.
.. note::
Random transforms like :class:`~torchvision.transforms.v2.RandomCrop` will
randomly sample some parameter each time they're called. Their functional
counterpart (:func:`~torchvision.transforms.v2.functional.crop`) does not do
any kind of random sampling and thus has a slightly different
parametrization. The ``get_params()`` class method of the transforms class
can be used to perform parameter sampling when using the functional APIs.
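For illustration, a hedged sketch of using the functional API to apply the same geometric transform to an image and its segmentation mask; the parameter is sampled by hand, and the shapes below are made up for the example:

.. code:: python

    import random
    import torch
    from torchvision.transforms.v2 import functional as F

    img = torch.randint(0, 256, size=(3, 224, 224), dtype=torch.uint8)
    mask = torch.randint(0, 2, size=(1, 224, 224), dtype=torch.uint8)

    # The functional does no random sampling: we pick the angle ourselves,
    # then apply exactly the same rotation to the image and to the mask.
    angle = random.uniform(-30.0, 30.0)
    img_rotated = F.rotate(img, angle)
    mask_rotated = F.rotate(mask, angle)
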
The ``torchvision.transforms.v2.functional`` namespace also contains what we
call the "kernels". These are the low-level functions that implement the
core functionalities for specific types, e.g. ``resize_bounding_boxes`` or
``resized_crop_mask``. They are public, although not documented. Check the
`code
<https://github.com/pytorch/vision/blob/main/torchvision/transforms/v2/functional/__init__.py>`_
to see which ones are available (note that those starting with a leading
underscore are **not** public!). Kernels are only really useful if you want
:ref:`torchscript support <transforms_torchscript>` for types like bounding
boxes or masks.
.. _transforms_torchscript:
Torchscript support
-------------------
Most transform classes and functionals support torchscript. For composing
transforms, use :class:`torch.nn.Sequential` instead of
:class:`~torchvision.transforms.v2.Compose`:

.. code:: python

    transforms = torch.nn.Sequential(
        CenterCrop(10),
        Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    )
    scripted_transforms = torch.jit.script(transforms)

.. warning::
v2 transforms support torchscript, but if you call ``torch.jit.script()`` on
a v2 **class** transform, you'll actually end up with its (scripted) v1
equivalent. This may lead to slightly different results between the
scripted and eager executions due to implementation differences between v1
and v2.
If you really need torchscript support for the v2 transforms, we recommend
scripting the **functionals** from the
``torchvision.transforms.v2.functional`` namespace to avoid surprises.
Also note that the functionals only support torchscript for pure tensors, which
are always treated as images. If you need torchscript support for other types
like bounding boxes or masks, you can rely on the :ref:`low-level kernels
<functional_transforms>`.
For any custom transformations to be used with ``torch.jit.script``, they should
be derived from ``torch.nn.Module``.
See also: :ref:`sphx_glr_auto_examples_others_plot_scripted_tensor_transforms.py`.
.. _v2_api_ref:
V2 API reference - Recommended
------------------------------
Geometry
^^^^^^^^
Resizing
""""""""
.. autosummary::
:toctree: generated/
:template: class.rst
v2.Resize
v2.ScaleJitter
v2.RandomShortestSize
v2.RandomResize
Functionals
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.resize
Cropping
""""""""
.. autosummary::
:toctree: generated/
:template: class.rst
v2.RandomCrop
v2.RandomResizedCrop
v2.RandomIoUCrop
v2.CenterCrop
v2.FiveCrop
v2.TenCrop
Functionals
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.crop
v2.functional.resized_crop
v2.functional.ten_crop
v2.functional.center_crop
v2.functional.five_crop
Others
""""""
.. autosummary::
:toctree: generated/
:template: class.rst
v2.RandomHorizontalFlip
v2.RandomVerticalFlip
v2.Pad
v2.RandomZoomOut
v2.RandomRotation
v2.RandomAffine
v2.RandomPerspective
v2.ElasticTransform
Functionals
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.horizontal_flip
v2.functional.vertical_flip
v2.functional.pad
v2.functional.rotate
v2.functional.affine
v2.functional.perspective
v2.functional.elastic
Color
^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
v2.ColorJitter
v2.RandomChannelPermutation
v2.RandomPhotometricDistort
v2.Grayscale
v2.RGB
v2.RandomGrayscale
v2.GaussianBlur
v2.GaussianNoise
v2.RandomInvert
v2.RandomPosterize
v2.RandomSolarize
v2.RandomAdjustSharpness
v2.RandomAutocontrast
v2.RandomEqualize
Functionals
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.permute_channels
v2.functional.rgb_to_grayscale
v2.functional.grayscale_to_rgb
v2.functional.to_grayscale
v2.functional.gaussian_blur
v2.functional.gaussian_noise
v2.functional.invert
v2.functional.posterize
v2.functional.solarize
v2.functional.adjust_sharpness
v2.functional.autocontrast
v2.functional.adjust_contrast
v2.functional.equalize
v2.functional.adjust_brightness
v2.functional.adjust_saturation
v2.functional.adjust_hue
v2.functional.adjust_gamma
Composition
^^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
v2.Compose
v2.RandomApply
v2.RandomChoice
v2.RandomOrder
Miscellaneous
^^^^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
v2.LinearTransformation
v2.Normalize
v2.RandomErasing
v2.Lambda
v2.SanitizeBoundingBoxes
v2.ClampBoundingBoxes
v2.UniformTemporalSubsample
v2.JPEG
Functionals
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.normalize
v2.functional.erase
v2.functional.sanitize_bounding_boxes
v2.functional.clamp_bounding_boxes
v2.functional.uniform_temporal_subsample
v2.functional.jpeg
.. _conversion_transforms:
Conversion
^^^^^^^^^^
.. note::
Beware, some of these conversion transforms below will scale the values
while performing the conversion, while some may not do any scaling. By
scaling, we mean e.g. that a ``uint8`` -> ``float32`` would map the [0,
255] range into [0, 1] (and vice-versa). See :ref:`range_and_dtype`.
.. autosummary::
:toctree: generated/
:template: class.rst
v2.ToImage
v2.ToPureTensor
v2.PILToTensor
v2.ToPILImage
v2.ToDtype
v2.ConvertBoundingBoxFormat
Functionals
.. autosummary::
:toctree: generated/
:template: functional.rst
v2.functional.to_image
v2.functional.pil_to_tensor
v2.functional.to_pil_image
v2.functional.to_dtype
v2.functional.convert_bounding_box_format
Deprecated
^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
v2.ToTensor
v2.functional.to_tensor
v2.ConvertImageDtype
v2.functional.convert_image_dtype
Auto-Augmentation
^^^^^^^^^^^^^^^^^
`AutoAugment <https://arxiv.org/pdf/1805.09501.pdf>`_ is a common Data Augmentation technique that can improve the accuracy of Image Classification models.
Though the data augmentation policies are directly linked to their trained dataset, empirical studies show that
ImageNet policies provide significant improvements when applied to other datasets.
In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN.
The new transform can be used standalone or mixed-and-matched with existing transforms:
.. autosummary::
:toctree: generated/
:template: class.rst
v2.AutoAugment
v2.RandAugment
v2.TrivialAugmentWide
v2.AugMix
CutMix - MixUp
^^^^^^^^^^^^^^
CutMix and MixUp are special transforms that
are meant to be used on batches rather than on individual images, because they
are combining pairs of images together. These can be used after the dataloader
(once the samples are batched), or part of a collation function. See
:ref:`sphx_glr_auto_examples_transforms_plot_cutmix_mixup.py` for detailed usage examples.
.. autosummary::
:toctree: generated/
:template: class.rst
v2.CutMix
v2.MixUp
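A hedged usage sketch (batch shapes and ``NUM_CLASSES`` are made up for the example), applying CutMix or MixUp at random to a batch produced by the DataLoader:

.. code:: python

    import torch
    from torchvision.transforms import v2

    NUM_CLASSES = 10
    cutmix_or_mixup = v2.RandomChoice([v2.CutMix(num_classes=NUM_CLASSES),
                                       v2.MixUp(num_classes=NUM_CLASSES)])

    # Typically applied to a full batch, right after the DataLoader
    images = torch.rand(8, 3, 224, 224)
    labels = torch.randint(0, NUM_CLASSES, (8,))
    images, labels = cutmix_or_mixup(images, labels)  # labels become soft, shape (8, NUM_CLASSES)
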
Developer tools
^^^^^^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: function.rst
v2.functional.register_kernel
V1 API Reference
----------------
Geometry
^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
Resize
RandomCrop
RandomResizedCrop
CenterCrop
FiveCrop
TenCrop
Pad
RandomRotation
RandomAffine
RandomPerspective
ElasticTransform
RandomHorizontalFlip
RandomVerticalFlip
Color
^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
ColorJitter
Grayscale
RandomGrayscale
GaussianBlur
RandomInvert
RandomPosterize
RandomSolarize
RandomAdjustSharpness
RandomAutocontrast
RandomEqualize
Composition
^^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
Compose
RandomApply
RandomChoice
RandomOrder
Miscellaneous
^^^^^^^^^^^^^
.. autosummary::
:toctree: generated/
:template: class.rst
LinearTransformation
Normalize
RandomErasing
Lambda
Conversion
^^^^^^^^^^
.. note::
Beware, some of these conversion transforms below will scale the values
while performing the conversion, while some may not do any scaling. By
scaling, we mean e.g. that a ``uint8`` -> ``float32`` would map the [0,
255] range into [0, 1] (and vice-versa). See :ref:`range_and_dtype`.
.. autosummary::
:toctree: generated/
:template: class.rst
ToPILImage
ToTensor
PILToTensor
ConvertImageDtype
Auto-Augmentation
^^^^^^^^^^^^^^^^^
`AutoAugment <https://arxiv.org/pdf/1805.09501.pdf>`_ is a common Data Augmentation technique that can improve the accuracy of Image Classification models.
Though the data augmentation policies are directly linked to their trained dataset, empirical studies show that
ImageNet policies provide significant improvements when applied to other datasets.
In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN.
The new transform can be used standalone or mixed-and-matched with existing transforms:
.. autosummary::
:toctree: generated/
:template: class.rst
AutoAugmentPolicy
AutoAugment
RandAugment
TrivialAugmentWide
AugMix
Functional Transforms
^^^^^^^^^^^^^^^^^^^^^
.. currentmodule:: torchvision.transforms.functional
.. autosummary::
:toctree: generated/
:template: function.rst
adjust_brightness
adjust_contrast
adjust_gamma
adjust_hue
adjust_saturation
adjust_sharpness
affine
autocontrast
center_crop
convert_image_dtype
crop
equalize
erase
five_crop
gaussian_blur
get_dimensions
get_image_num_channels
get_image_size
hflip
invert
normalize
pad
perspective
pil_to_tensor
posterize
resize
resized_crop
rgb_to_grayscale
rotate
solarize
ten_crop
to_grayscale
to_pil_image
to_tensor
vflip