Unverified commit 3a278d70, authored by Nicolas Hug and committed by GitHub

Update to transforms docs (#3646)



* Fixed return docstrings

* Added some refs and corrected some parts

* more refs, and a note about dtypes
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
parent 7f4ae8c6
@@ -4,15 +4,34 @@ torchvision.transforms

.. currentmodule:: torchvision.transforms
Transforms are common image transformations. They can be chained together using :class:`Compose`.
Most transform classes have a function equivalent: :ref:`functional transforms <functional_transforms>`
give fine-grained control over the transformations.
This is useful if you have to build a more complex transformation pipeline
(e.g. in the case of segmentation tasks).
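
For instance, a typical pipeline chains a resize, a crop and a tensor conversion with :class:`Compose`. A minimal sketch; the transform choices and sizes below are illustrative, not prescribed::

    from PIL import Image
    from torchvision import transforms

    pipeline = transforms.Compose([
        transforms.Resize(256),      # resize the shorter side to 256 pixels
        transforms.CenterCrop(224),  # crop the central 224x224 region
        transforms.ToTensor(),       # convert to a float tensor with values in [0, 1]
    ])

    img = Image.new("RGB", (500, 400))  # placeholder image, for illustration only
    out = pipeline(img)                 # tensor of shape (3, 224, 224)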
Most transformations accept both `PIL <https://pillow.readthedocs.io>`_ images
and tensor images, although some transformations are :ref:`PIL-only
<transforms_pil_only>` and some are :ref:`tensor-only
<transforms_tensor_only>`. The :ref:`conversion_transforms` may be used to
convert to and from PIL images.

The transformations that accept tensor images also accept batches of tensor
images. A Tensor Image is a tensor with ``(C, H, W)`` shape, where ``C`` is
the number of channels, and ``H`` and ``W`` are the image height and width. A
batch of Tensor Images is a tensor of ``(B, C, H, W)`` shape, where ``B`` is
the number of images in the batch.
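
For example, the same transform instance can be applied to a single tensor image or to a batch of tensor images; the shapes below are arbitrary example values::

    import torch
    from torchvision import transforms

    blur = transforms.GaussianBlur(kernel_size=5)

    single = torch.rand(3, 64, 64)     # one image, (C, H, W)
    batch = torch.rand(8, 3, 64, 64)   # eight images, (B, C, H, W)

    out_single = blur(single)  # shape (3, 64, 64)
    out_batch = blur(batch)    # shape (8, 3, 64, 64)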
The expected range of the values of a tensor image is implicitly defined by
the tensor dtype. Tensor images with a float dtype are expected to have
values in ``[0, 1)``. Tensor images with an integer dtype are expected to
have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
that can be represented in that dtype.
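
A sketch of what this means in practice, using :class:`ConvertImageDtype` to move between the two conventions; the tensor sizes are arbitrary::

    import torch
    from torchvision import transforms

    # uint8 tensor images use the [0, 255] range ...
    img_uint8 = torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8)

    # ... while float tensor images use the [0, 1) range.
    to_float = transforms.ConvertImageDtype(torch.float32)
    img_float = to_float(img_uint8)  # values rescaled into [0, 1]

    # Converting back rescales to the full integer range.
    to_uint8 = transforms.ConvertImageDtype(torch.uint8)
    img_back = to_uint8(img_float)   # values in [0, 255]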
Randomized transformations will apply the same transformation to all the
images of a given batch, but they will produce different transformations
across calls. For reproducible transformations across calls, you may use
:ref:`functional transforms <functional_transforms>`.
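
For instance, a randomized transform draws its parameters once per call and applies them to the whole batch; the transform and values below are illustrative::

    import torch
    from torchvision import transforms

    rotate = transforms.RandomRotation(degrees=45)

    batch = torch.rand(4, 3, 32, 32)

    out1 = rotate(batch)  # one random angle is drawn and applied to all 4 images
    out2 = rotate(batch)  # a new angle is drawn, so out2 generally differs from out1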
.. warning::
@@ -117,6 +136,8 @@ Transforms on PIL Image and torch.\*Tensor

.. autoclass:: GaussianBlur
    :members:
.. _transforms_pil_only:
Transforms on PIL Image only
----------------------------
@@ -124,6 +145,7 @@ Transforms on PIL Image only

.. autoclass:: RandomOrder
.. _transforms_tensor_only:
Transforms on torch.\*Tensor only
---------------------------------

@@ -139,6 +161,7 @@ Transforms on torch.\*Tensor only

.. autoclass:: ConvertImageDtype
.. _conversion_transforms:
Conversion Transforms
---------------------

@@ -173,13 +196,16 @@ The new transform can be used standalone or mixed-and-matched with existing tran

    :members:
.. _functional_transforms:
Functional Transforms
---------------------

Functional transforms give you fine-grained control of the transformation pipeline.
As opposed to the transformations above, functional transforms don't contain a random number
generator for their parameters.
That means you have to specify/generate all parameters, but the functional transform will give you
reproducible results across calls.

Example:
you can apply a functional transform with the same parameters to multiple images like this:
...
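
A minimal sketch of that pattern, assuming a random crop whose parameters are drawn once and then reused for a paired image and mask; the sizes are arbitrary example values::

    import torch
    import torchvision.transforms as T
    import torchvision.transforms.functional as F

    image = torch.rand(3, 64, 64)
    mask = torch.randint(0, 2, (1, 64, 64), dtype=torch.uint8)

    # Draw the random crop parameters once ...
    i, j, h, w = T.RandomCrop.get_params(image, output_size=(32, 32))

    # ... then apply exactly the same crop to both tensors.
    image_crop = F.crop(image, i, j, h, w)
    mask_crop = F.crop(mask, i, j, h, w)

Because the parameters are explicit, they can be reused for any number of paired inputs, which is what makes the functional API reproducible across calls.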
@@ -1103,9 +1103,9 @@ def to_grayscale(img, num_output_channels=1):

    Returns:
        PIL Image: Grayscale version of the image.

        - if num_output_channels = 1 : returned image is single channel
        - if num_output_channels = 3 : returned image is 3 channel with r = g = b
    """
    if isinstance(img, Image.Image):
        return F_pil.to_grayscale(img, num_output_channels)
@@ -1128,9 +1128,9 @@ def rgb_to_grayscale(img: Tensor, num_output_channels: int = 1) -> Tensor:

    Returns:
        PIL Image or Tensor: Grayscale version of the image.

        - if num_output_channels = 1 : returned image is single channel
        - if num_output_channels = 3 : returned image is 3 channel with r = g = b
    """
    if not isinstance(img, torch.Tensor):
        return F_pil.to_grayscale(img, num_output_channels)
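
A small usage sketch for the tensor code path; the input values are arbitrary::

    import torch
    import torchvision.transforms.functional as F

    img = torch.rand(3, 32, 32)

    gray_1 = F.rgb_to_grayscale(img)                         # shape (1, 32, 32)
    gray_3 = F.rgb_to_grayscale(img, num_output_channels=3)  # shape (3, 32, 32), r == g == b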
@@ -1330,6 +1330,7 @@ def equalize(img: Tensor) -> Tensor:

        img (PIL Image or Tensor): Image on which equalize is applied.
            If img is torch Tensor, it is expected to be in [..., 1 or 3, H, W] format,
            where ... means it can have an arbitrary number of leading dimensions.
            The tensor dtype must be ``torch.uint8`` and values are expected to be in ``[0, 255]``.
            If img is PIL Image, it is expected to be in mode "P", "L" or "RGB".

    Returns:
...
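
An illustrative call on a tensor image, reflecting the ``torch.uint8`` requirement noted above; the input is arbitrary::

    import torch
    import torchvision.transforms.functional as F

    img = torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8)
    out = F.equalize(img)  # still a uint8 tensor of the same shape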
@@ -1464,6 +1464,7 @@ class Grayscale(torch.nn.Module):

    Returns:
        PIL Image: Grayscale version of the input.

        - If ``num_output_channels == 1`` : returned image is single channel
        - If ``num_output_channels == 3`` : returned image is 3 channel with r == g == b
...
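
A brief usage sketch for the class-based transform; the input image is a placeholder::

    from PIL import Image
    from torchvision import transforms

    to_gray = transforms.Grayscale(num_output_channels=3)

    img = Image.new("RGB", (64, 64))  # placeholder image, for illustration only
    gray = to_gray(img)               # still a 3-channel image, with r == g == b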