Unverified Commit 18577d02 authored by Francisco Massa's avatar Francisco Massa Committed by GitHub
Browse files

Add entry in the documentation for video models (#1207)

* Add docs for video models

* Fix docstrings for resnet and vgg
parent 1afef1dc
...@@ -4,8 +4,8 @@ torchvision.models ...@@ -4,8 +4,8 @@ torchvision.models
The models subpackage contains definitions of models for addressing The models subpackage contains definitions of models for addressing
different tasks, including: image classification, pixelwise semantic different tasks, including: image classification, pixelwise semantic
segmentation, object detection, instance segmentation and person segmentation, object detection, instance segmentation, person
keypoint detection. keypoint detection and video classification.
Classification Classification
...@@ -395,3 +395,51 @@ Keypoint R-CNN ...@@ -395,3 +395,51 @@ Keypoint R-CNN
.. autofunction:: torchvision.models.detection.keypointrcnn_resnet50_fpn .. autofunction:: torchvision.models.detection.keypointrcnn_resnet50_fpn
Video classification
====================
We provide models for action recognition pre-trained on Kinetics-400.
They have all been trained with the scripts provided in ``references/video_classification``.
All pre-trained models expect input images normalized in the same way,
i.e. mini-batches of 3-channel RGB videos of shape (3 x T x H x W),
where H and W are expected to be 112, and T is a number of video frames in a clip.
The images have to be loaded in to a range of [0, 1] and then normalized
using ``mean = [0.43216, 0.394666, 0.37645]`` and ``std = [0.22803, 0.22145, 0.216989]``.
.. note::
The normalization parameters are different from the image classification ones, and correspond
to the mean and std from Kinetics-400.
.. note::
For now, normalization code can be found in ``references/video_classification/transforms.py``,
see the ``Normalize`` function there. Note that it differs from standard normalization for
images because it assumes the video is 4d.
Kinetics 1-crop accuracies for clip length 16 (16x112x112)
================================ ============= =============
Network Clip acc@1 Clip acc@5
================================ ============= =============
ResNet 3D 18 52.75 75.45
ResNet MC 18 53.90 76.29
ResNet (2+1)D 57.50 78.81
================================ ============= =============
ResNet 3D
----------
.. autofunction:: torchvision.models.video.r3d_18
ResNet Mixed Convolution
------------------------
.. autofunction:: torchvision.models.video.mc3_18
ResNet (2+1)D
-------------
.. autofunction:: torchvision.models.video.r2plus1d_18
...@@ -221,7 +221,7 @@ def _resnet(arch, block, layers, pretrained, progress, **kwargs): ...@@ -221,7 +221,7 @@ def _resnet(arch, block, layers, pretrained, progress, **kwargs):
def resnet18(pretrained=False, progress=True, **kwargs): def resnet18(pretrained=False, progress=True, **kwargs):
r"""ResNet-18 model from r"""ResNet-18 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>'_ `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
...@@ -233,7 +233,7 @@ def resnet18(pretrained=False, progress=True, **kwargs): ...@@ -233,7 +233,7 @@ def resnet18(pretrained=False, progress=True, **kwargs):
def resnet34(pretrained=False, progress=True, **kwargs): def resnet34(pretrained=False, progress=True, **kwargs):
r"""ResNet-34 model from r"""ResNet-34 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>'_ `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
...@@ -245,7 +245,7 @@ def resnet34(pretrained=False, progress=True, **kwargs): ...@@ -245,7 +245,7 @@ def resnet34(pretrained=False, progress=True, **kwargs):
def resnet50(pretrained=False, progress=True, **kwargs): def resnet50(pretrained=False, progress=True, **kwargs):
r"""ResNet-50 model from r"""ResNet-50 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>'_ `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
...@@ -257,7 +257,7 @@ def resnet50(pretrained=False, progress=True, **kwargs): ...@@ -257,7 +257,7 @@ def resnet50(pretrained=False, progress=True, **kwargs):
def resnet101(pretrained=False, progress=True, **kwargs): def resnet101(pretrained=False, progress=True, **kwargs):
r"""ResNet-101 model from r"""ResNet-101 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>'_ `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
...@@ -269,7 +269,7 @@ def resnet101(pretrained=False, progress=True, **kwargs): ...@@ -269,7 +269,7 @@ def resnet101(pretrained=False, progress=True, **kwargs):
def resnet152(pretrained=False, progress=True, **kwargs): def resnet152(pretrained=False, progress=True, **kwargs):
r"""ResNet-152 model from r"""ResNet-152 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>'_ `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
......
...@@ -97,7 +97,7 @@ def _vgg(arch, cfg, batch_norm, pretrained, progress, **kwargs): ...@@ -97,7 +97,7 @@ def _vgg(arch, cfg, batch_norm, pretrained, progress, **kwargs):
def vgg11(pretrained=False, progress=True, **kwargs): def vgg11(pretrained=False, progress=True, **kwargs):
r"""VGG 11-layer model (configuration "A") from r"""VGG 11-layer model (configuration "A") from
`"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>'_ `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
...@@ -108,7 +108,7 @@ def vgg11(pretrained=False, progress=True, **kwargs): ...@@ -108,7 +108,7 @@ def vgg11(pretrained=False, progress=True, **kwargs):
def vgg11_bn(pretrained=False, progress=True, **kwargs): def vgg11_bn(pretrained=False, progress=True, **kwargs):
r"""VGG 11-layer model (configuration "A") with batch normalization r"""VGG 11-layer model (configuration "A") with batch normalization
`"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>'_ `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
...@@ -119,7 +119,7 @@ def vgg11_bn(pretrained=False, progress=True, **kwargs): ...@@ -119,7 +119,7 @@ def vgg11_bn(pretrained=False, progress=True, **kwargs):
def vgg13(pretrained=False, progress=True, **kwargs): def vgg13(pretrained=False, progress=True, **kwargs):
r"""VGG 13-layer model (configuration "B") r"""VGG 13-layer model (configuration "B")
`"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>'_ `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
...@@ -130,7 +130,7 @@ def vgg13(pretrained=False, progress=True, **kwargs): ...@@ -130,7 +130,7 @@ def vgg13(pretrained=False, progress=True, **kwargs):
def vgg13_bn(pretrained=False, progress=True, **kwargs): def vgg13_bn(pretrained=False, progress=True, **kwargs):
r"""VGG 13-layer model (configuration "B") with batch normalization r"""VGG 13-layer model (configuration "B") with batch normalization
`"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>'_ `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
...@@ -141,7 +141,7 @@ def vgg13_bn(pretrained=False, progress=True, **kwargs): ...@@ -141,7 +141,7 @@ def vgg13_bn(pretrained=False, progress=True, **kwargs):
def vgg16(pretrained=False, progress=True, **kwargs): def vgg16(pretrained=False, progress=True, **kwargs):
r"""VGG 16-layer model (configuration "D") r"""VGG 16-layer model (configuration "D")
`"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>'_ `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
...@@ -152,7 +152,7 @@ def vgg16(pretrained=False, progress=True, **kwargs): ...@@ -152,7 +152,7 @@ def vgg16(pretrained=False, progress=True, **kwargs):
def vgg16_bn(pretrained=False, progress=True, **kwargs): def vgg16_bn(pretrained=False, progress=True, **kwargs):
r"""VGG 16-layer model (configuration "D") with batch normalization r"""VGG 16-layer model (configuration "D") with batch normalization
`"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>'_ `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
...@@ -163,7 +163,7 @@ def vgg16_bn(pretrained=False, progress=True, **kwargs): ...@@ -163,7 +163,7 @@ def vgg16_bn(pretrained=False, progress=True, **kwargs):
def vgg19(pretrained=False, progress=True, **kwargs): def vgg19(pretrained=False, progress=True, **kwargs):
r"""VGG 19-layer model (configuration "E") r"""VGG 19-layer model (configuration "E")
`"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>'_ `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
...@@ -174,7 +174,7 @@ def vgg19(pretrained=False, progress=True, **kwargs): ...@@ -174,7 +174,7 @@ def vgg19(pretrained=False, progress=True, **kwargs):
def vgg19_bn(pretrained=False, progress=True, **kwargs): def vgg19_bn(pretrained=False, progress=True, **kwargs):
r"""VGG 19-layer model (configuration 'E') with batch normalization r"""VGG 19-layer model (configuration 'E') with batch normalization
`"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>'_ `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_
Args: Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet pretrained (bool): If True, returns a model pre-trained on ImageNet
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment