add v0.19.1 release

Commit bf491463 authored May 30, 2025 by limm
20 changed files
--- a/docs/source/models/resnet_quant.rst
+++ b/docs/source/models/resnet_quant.rst
+Quantized ResNet
+================
+
+.. currentmodule:: torchvision.models.quantization
+
+The Quantized ResNet model is based on the `Deep Residual Learning for Image Recognition
+<https://arxiv.org/abs/1512.03385>`_ paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a quantized ResNet
+model, with or without pre-trained weights. All the model builders internally
+rely on the ``torchvision.models.quantization.resnet.QuantizableResNet``
+base class. Please refer to the `source code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/quantization/resnet.py>`_
+for more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    resnet18
+    resnet50
--- a/docs/source/models/resnext.rst
+++ b/docs/source/models/resnext.rst
+ResNeXt
+=======
+
+.. currentmodule:: torchvision.models
+
+The ResNeXt model is based on the `Aggregated Residual Transformations for Deep Neural Networks <https://arxiv.org/abs/1611.05431v2>`__
+paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a ResNeXt model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.resnet.ResNet`` base class. Please refer to the `source
+code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    resnext50_32x4d
+    resnext101_32x8d
+    resnext101_64x4d
--- a/docs/source/models/resnext_quant.rst
+++ b/docs/source/models/resnext_quant.rst
+Quantized ResNeXt
+=================
+
+.. currentmodule:: torchvision.models.quantization
+
+The quantized ResNeXt model is based on the `Aggregated Residual Transformations for Deep Neural Networks <https://arxiv.org/abs/1611.05431v2>`__
+paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a quantized ResNeXt
+model, with or without pre-trained weights. All the model builders internally
+rely on the ``torchvision.models.quantization.resnet.QuantizableResNet``
+base class. Please refer to the `source code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/quantization/resnet.py>`_
+for more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    resnext101_32x8d
+    resnext101_64x4d
--- a/docs/source/models/retinanet.rst
+++ b/docs/source/models/retinanet.rst
+RetinaNet
+=========
+
+.. currentmodule:: torchvision.models.detection
+
+The RetinaNet model is based on the `Focal Loss for Dense Object Detection
+<https://arxiv.org/abs/1708.02002>`__ paper.
+
+.. betastatus:: detection module
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a RetinaNet model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.detection.retinanet.RetinaNet`` base class. Please refer to the `source code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    retinanet_resnet50_fpn
+    retinanet_resnet50_fpn_v2
--- a/docs/source/models/shufflenetv2.rst
+++ b/docs/source/models/shufflenetv2.rst
+ShuffleNet V2
+=============
+
+.. currentmodule:: torchvision.models
+
+The ShuffleNet V2 model is based on the `ShuffleNet V2: Practical Guidelines for Efficient
+CNN Architecture Design <https://arxiv.org/abs/1807.11164>`__ paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a ShuffleNetV2 model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.shufflenetv2.ShuffleNetV2`` base class. Please refer to the `source
+code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/shufflenetv2.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    shufflenet_v2_x0_5
+    shufflenet_v2_x1_0
+    shufflenet_v2_x1_5
+    shufflenet_v2_x2_0
--- a/docs/source/models/shufflenetv2_quant.rst
+++ b/docs/source/models/shufflenetv2_quant.rst
+Quantized ShuffleNet V2
+=======================
+
+.. currentmodule:: torchvision.models.quantization
+
+The Quantized ShuffleNet V2 model is based on the `ShuffleNet V2: Practical Guidelines for Efficient
+CNN Architecture Design <https://arxiv.org/abs/1807.11164>`__ paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a quantized ShuffleNetV2
+model, with or without pre-trained weights. All the model builders internally rely
+on the ``torchvision.models.quantization.shufflenetv2.QuantizableShuffleNetV2``
+base class. Please refer to the `source code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/quantization/shufflenetv2.py>`_
+for more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    shufflenet_v2_x0_5
+    shufflenet_v2_x1_0
+    shufflenet_v2_x1_5
+    shufflenet_v2_x2_0
--- a/docs/source/models/squeezenet.rst
+++ b/docs/source/models/squeezenet.rst
+SqueezeNet
+==========
+
+.. currentmodule:: torchvision.models
+
+The SqueezeNet model is based on the `SqueezeNet: AlexNet-level accuracy with
+50x fewer parameters and <0.5MB model size <https://arxiv.org/abs/1602.07360>`__
+paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a SqueezeNet model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.squeezenet.SqueezeNet`` base class. Please refer to the `source
+code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    squeezenet1_0
+    squeezenet1_1
--- a/docs/source/models/ssd.rst
+++ b/docs/source/models/ssd.rst
+SSD
+===
+
+.. currentmodule:: torchvision.models.detection
+
+The SSD model is based on the `SSD: Single Shot MultiBox Detector
+<https://arxiv.org/abs/1512.02325>`__ paper.
+
+.. betastatus:: detection module
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate an SSD model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.detection.SSD`` base class. Please refer to the `source
+code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    ssd300_vgg16
--- a/docs/source/models/ssdlite.rst
+++ b/docs/source/models/ssdlite.rst
+SSDLite
+=======
+
+.. currentmodule:: torchvision.models.detection
+
+The SSDLite model is based on the `SSD: Single Shot MultiBox Detector
+<https://arxiv.org/abs/1512.02325>`__, `Searching for MobileNetV3
+<https://arxiv.org/abs/1905.02244>`__ and `MobileNetV2: Inverted Residuals and Linear
+Bottlenecks <https://arxiv.org/abs/1801.04381>`__ papers.
+
+.. betastatus:: detection module
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate an SSDLite model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.detection.ssd.SSD`` base class. Please refer to the `source
+code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssdlite.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    ssdlite320_mobilenet_v3_large
--- a/docs/source/models/swin_transformer.rst
+++ b/docs/source/models/swin_transformer.rst
+SwinTransformer
+===============
+
+.. currentmodule:: torchvision.models
+
+The SwinTransformer models are based on the `Swin Transformer: Hierarchical Vision
+Transformer using Shifted Windows <https://arxiv.org/abs/2103.14030>`__
+paper.
+SwinTransformer V2 models are based on the `Swin Transformer V2: Scaling Up Capacity
+and Resolution <https://openaccess.thecvf.com/content/CVPR2022/papers/Liu_Swin_Transformer_V2_Scaling_Up_Capacity_and_Resolution_CVPR_2022_paper.pdf>`__
+paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a SwinTransformer model (original or V2), with or without pre-trained weights.
+All the model builders internally rely on the ``torchvision.models.swin_transformer.SwinTransformer``
+base class. Please refer to the `source code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    swin_t
+    swin_s
+    swin_b
+    swin_v2_t
+    swin_v2_s
+    swin_v2_b
--- a/docs/source/models/vgg.rst
+++ b/docs/source/models/vgg.rst
+VGG
+===
+
+.. currentmodule:: torchvision.models
+
+The VGG model is based on the `Very Deep Convolutional Networks for Large-Scale
+Image Recognition <https://arxiv.org/abs/1409.1556>`_ paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a VGG model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.vgg.VGG`` base class. Please refer to the `source code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/vgg.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    vgg11
+    vgg11_bn
+    vgg13
+    vgg13_bn
+    vgg16
+    vgg16_bn
+    vgg19
+    vgg19_bn
--- a/docs/source/models/video_mvit.rst
+++ b/docs/source/models/video_mvit.rst
+Video MViT
+==========
+
+.. currentmodule:: torchvision.models.video
+
+The MViT model is based on the
+`MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
+<https://arxiv.org/abs/2112.01526>`__ and `Multiscale Vision Transformers
+<https://arxiv.org/abs/2104.11227>`__ papers.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate an MViT v1 or v2 model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.video.MViT`` base class. Please refer to the `source
+code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/video/mvit.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    mvit_v1_b
+    mvit_v2_s
--- a/docs/source/models/video_resnet.rst
+++ b/docs/source/models/video_resnet.rst
+Video ResNet
+============
+
+.. currentmodule:: torchvision.models.video
+
+The VideoResNet model is based on the `A Closer Look at Spatiotemporal
+Convolutions for Action Recognition <https://arxiv.org/abs/1711.11248>`__ paper.
+
+.. betastatus:: video module
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a VideoResNet model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.video.resnet.VideoResNet`` base class. Please refer to the `source
+code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/video/resnet.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    r3d_18
+    mc3_18
+    r2plus1d_18
--- a/docs/source/models/video_s3d.rst
+++ b/docs/source/models/video_s3d.rst
+Video S3D
+=========
+
+.. currentmodule:: torchvision.models.video
+
+The S3D model is based on the
+`Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
+<https://arxiv.org/abs/1712.04851>`__ paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate an S3D model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.video.S3D`` base class. Please refer to the `source
+code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/video/s3d.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    s3d
--- a/docs/source/models/video_swin_transformer.rst
+++ b/docs/source/models/video_swin_transformer.rst
+Video SwinTransformer
+=====================
+
+.. currentmodule:: torchvision.models.video
+
+The Video SwinTransformer model is based on the `Video Swin Transformer <https://arxiv.org/abs/2106.13230>`__ paper.
+
+.. betastatus:: video module
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a Video SwinTransformer model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.video.swin_transformer.SwinTransformer3d`` base class. Please refer to the `source
+code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/video/swin_transformer.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    swin3d_t
+    swin3d_s
+    swin3d_b
--- a/docs/source/models/vision_transformer.rst
+++ b/docs/source/models/vision_transformer.rst
+VisionTransformer
+=================
+
+.. currentmodule:: torchvision.models
+
+The VisionTransformer model is based on the `An Image is Worth 16x16 Words:
+Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`_ paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a VisionTransformer model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.vision_transformer.VisionTransformer`` base class.
+Please refer to the `source code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py>`_ for
+more details about this class.
+
+.. autosummary::
+   :toctree: generated/
+   :template: function.rst
+
+   vit_b_16
+   vit_b_32
+   vit_l_16
+   vit_l_32
+   vit_h_14
--- a/docs/source/models/wide_resnet.rst
+++ b/docs/source/models/wide_resnet.rst
+Wide ResNet
+===========
+
+.. currentmodule:: torchvision.models
+
+The Wide ResNet model is based on the `Wide Residual Networks <https://arxiv.org/abs/1605.07146>`__
+paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a Wide ResNet model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.resnet.ResNet`` base class. Please refer to the `source
+code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py>`_ for
+more details about this class.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    wide_resnet50_2
+    wide_resnet101_2
--- a/docs/source/ops.rst
+++ b/docs/source/ops.rst
-torchvision.ops
-===============
+.. _ops:
+
+Operators
+=========

 .. currentmodule:: torchvision.ops

-:mod:`torchvision.ops` implements operators that are specific for Computer Vision.
+:mod:`torchvision.ops` implements operators, losses and layers that are specific to computer vision.

 .. note::
  All operators have native support for TorchScript.


-.. autofunction:: nms
-.. autofunction:: batched_nms
-.. autofunction:: remove_small_boxes
-.. autofunction:: clip_boxes_to_image
-.. autofunction:: box_convert
-.. autofunction:: box_area
-.. autofunction:: box_iou
-.. autofunction:: generalized_box_iou
-.. autofunction:: roi_align
-.. autofunction:: ps_roi_align
-.. autofunction:: roi_pool
-.. autofunction:: ps_roi_pool
-.. autofunction:: deform_conv2d
-.. autofunction:: sigmoid_focal_loss
-
-.. autoclass:: RoIAlign
-.. autoclass:: PSRoIAlign
-.. autoclass:: RoIPool
-.. autoclass:: PSRoIPool
-.. autoclass:: DeformConv2d
-.. autoclass:: MultiScaleRoIAlign
-.. autoclass:: FeaturePyramidNetwork
+Detection and Segmentation Operators
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The operators below perform the pre-processing and post-processing required by object detection and segmentation models.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    batched_nms
+    masks_to_boxes
+    nms
+    roi_align
+    roi_pool
+    ps_roi_align
+    ps_roi_pool
+
+.. autosummary::
+    :toctree: generated/
+    :template: class.rst
+
+    FeaturePyramidNetwork
+    MultiScaleRoIAlign
+    RoIAlign
+    RoIPool
+    PSRoIAlign
+    PSRoIPool
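
To make the post-processing role of ``nms`` concrete, here is a pure-Python sketch of greedy non-maximum suppression. It illustrates the general technique only and is not torchvision's implementation. The helper names ``iou_xyxy`` and ``greedy_nms`` are hypothetical, and boxes are assumed to be ``(x1, y1, x2, y2)`` tuples.

```python
def iou_xyxy(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou_xyxy(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
print(kept)  # [0, 2]: the second box overlaps the first too much
```

The quadratic loop is fine for illustration; a real detector would use the vectorized operator instead.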
+
+
+Box Operators
+~~~~~~~~~~~~~
+
+These utility functions perform various operations on bounding boxes.
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    box_area
+    box_convert
+    box_iou
+    clip_boxes_to_image
+    complete_box_iou
+    distance_box_iou
+    generalized_box_iou
+    remove_small_boxes
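
As a rough illustration of what the area and IoU utilities compute, the sketch below works on plain tuples. It assumes boxes in ``(x1, y1, x2, y2)`` format with ``x1 <= x2`` and ``y1 <= y2``. The names ``area_xyxy`` and ``iou_xyxy`` are hypothetical stand-ins, not the library functions.

```python
def area_xyxy(box):
    """Area of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


def iou_xyxy(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_xyxy(a) + area_xyxy(b) - inter
    return inter / union if union > 0 else 0.0


a = (0, 0, 10, 10)
b = (5, 5, 15, 15)
print(area_xyxy(a))   # 100
print(iou_xyxy(a, b))  # 25 / 175, roughly 0.143
```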
+
+Losses
+~~~~~~
+
+The following vision-specific loss functions are implemented:
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    complete_box_iou_loss
+    distance_box_iou_loss
+    generalized_box_iou_loss
+    sigmoid_focal_loss
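
For intuition about the focal loss, here is a scalar pure-Python sketch of the formula from the focal loss paper for a single logit and a binary target. The function name ``focal_loss_scalar`` is hypothetical, and the defaults ``alpha=0.25`` and ``gamma=2.0`` are assumptions taken from the paper's commonly cited settings.

```python
import math


def focal_loss_scalar(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss for one logit and a binary target (0 or 1)."""
    p = 1.0 / (1.0 + math.exp(-logit))           # sigmoid probability
    p_t = p if target == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor down-weights well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss_scalar(4.0, 1)    # confident and correct
hard = focal_loss_scalar(-4.0, 1)   # confident and wrong
print(easy < hard)
```

With ``gamma=0`` and both class weights equal to 1 this reduces to ordinary binary cross-entropy, which is one way to sanity-check an implementation.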
+
+
+Layers
+~~~~~~
+
+TorchVision provides commonly used building blocks as layers:
+
+.. autosummary::
+    :toctree: generated/
+    :template: class.rst
+
+    Conv2dNormActivation
+    Conv3dNormActivation
+    DeformConv2d
+    DropBlock2d
+    DropBlock3d
+    FrozenBatchNorm2d
+    MLP
+    Permute
+    SqueezeExcitation
+    StochasticDepth
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    deform_conv2d
+    drop_block2d
+    drop_block3d
+    stochastic_depth
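
As an example of what one of these building blocks does, the sketch below mimics the idea of stochastic depth applied per row (per sample) on nested Python lists. It is an illustration under assumptions, not the library code: the name ``stochastic_depth_rows`` and the ``rng`` parameter are hypothetical, and surviving rows are rescaled by ``1 / (1 - p)`` so the expected value is unchanged.

```python
import random


def stochastic_depth_rows(rows, p, training=True, rng=None):
    """Zero whole rows with probability p and rescale the surviving rows."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    # At inference time, or with p == 0, the input passes through unchanged.
    if not training or p == 0.0:
        return [list(row) for row in rows]
    rng = rng or random.Random()
    survival = 1.0 - p
    out = []
    for row in rows:
        if rng.random() < survival:
            out.append([v / survival for v in row])  # keep and rescale
        else:
            out.append([0.0 for _ in row])           # drop the whole row
    return out


batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
dropped = stochastic_depth_rows(batch, p=0.5, rng=random.Random(0))
print(dropped)
```

Each output row is either all zeros or the input row scaled by 2 when ``p=0.5``; which rows survive depends on the random generator.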
--- a/docs/source/training_references.rst
+++ b/docs/source/training_references.rst
+Training references
+===================
+
+On top of the many models, datasets, and image transforms, Torchvision also
+provides training reference scripts. These are the scripts that we use to train
+the :ref:`models <models>` which are then available with pre-trained weights.
+
+These scripts are not part of the core package and are instead available `on
+GitHub <https://github.com/pytorch/vision/tree/main/references>`_. We currently
+provide references for
+`classification <https://github.com/pytorch/vision/tree/main/references/classification>`_,
+`detection <https://github.com/pytorch/vision/tree/main/references/detection>`_,
+`segmentation <https://github.com/pytorch/vision/tree/main/references/segmentation>`_,
+`similarity learning <https://github.com/pytorch/vision/tree/main/references/similarity>`_,
+and `video classification <https://github.com/pytorch/vision/tree/main/references/video_classification>`_.
+
+While these scripts are largely stable, they do not offer backward compatibility
+guarantees.
+
+In general, these scripts rely on the latest (not yet released) PyTorch version
+or the latest torchvision version. This means that to use them, **you might need
+to install the latest PyTorch and torchvision versions**, e.g. with::
+
+    conda install pytorch torchvision -c pytorch-nightly
+
+If you need to rely on an older stable version of PyTorch or torchvision, e.g.
+torchvision 0.10, then it's safer to use the scripts from the corresponding
+release on GitHub, namely
+https://github.com/pytorch/vision/tree/v0.10.0/references.
--- a/docs/source/transforms.rst
+++ b/docs/source/transforms.rst