Unverified Commit 3a278d70 authored by Nicolas Hug, committed by GitHub

Update to transforms docs (#3646)



* Fixed return docstrings

* Added some refs and corrected some parts

* more refs, and a note about dtypes
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
parent 7f4ae8c6
......@@ -4,15 +4,34 @@ torchvision.transforms
.. currentmodule:: torchvision.transforms
Transforms are common image transformations. They can be chained together using :class:`Compose`.
Additionally, there is the :mod:`torchvision.transforms.functional` module.
Functional transforms give fine-grained control over the transformations.
Most transform classes have a function equivalent: :ref:`functional
transforms <functional_transforms>` give fine-grained control over the
transformations.
This is useful if you have to build a more complex transformation pipeline
(e.g. in the case of segmentation tasks).
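For the simple case of chaining transforms, a minimal sketch (the specific transforms and sizes below are illustrative assumptions, not part of this diff)::

    import torchvision.transforms as T

    # Chain a resize, a crop and a tensor conversion into a single callable.
    preprocess = T.Compose([
        T.Resize(256),
        T.CenterCrop(224),
        T.ToTensor(),
    ])
    # `preprocess` can then be applied to a PIL image: tensor_img = preprocess(pil_img)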
All transformations accept PIL Image, Tensor Image or batch of Tensor Images as input. Tensor Image is a tensor with
``(C, H, W)`` shape, where ``C`` is a number of channels, ``H`` and ``W`` are image height and width. Batch of
Tensor Images is a tensor of ``(B, C, H, W)`` shape, where ``B`` is a number of images in the batch. Deterministic or
random transformations applied on the batch of Tensor Images identically transform all the images of the batch.
Most transformations accept both `PIL <https://pillow.readthedocs.io>`_
images and tensor images, although some transformations are :ref:`PIL-only
<transforms_pil_only>` and some are :ref:`tensor-only
<transforms_tensor_only>`. The :ref:`conversion_transforms` may be used to
convert to and from PIL images.
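As a rough sketch of such conversions (assuming ``pil_img`` is an already-loaded PIL image)::

    import torchvision.transforms as T

    to_tensor = T.ToTensor()     # PIL Image -> (C, H, W) float tensor
    to_pil = T.ToPILImage()      # tensor image -> PIL Image

    tensor_img = to_tensor(pil_img)
    pil_again = to_pil(tensor_img)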
The transformations that accept tensor images also accept batches of tensor
images. A Tensor Image is a tensor with ``(C, H, W)`` shape, where ``C`` is the
number of channels, and ``H`` and ``W`` are the image height and width. A batch
of Tensor Images is a tensor of ``(B, C, H, W)`` shape, where ``B`` is the
number of images in the batch.
The expected range of the values of a tensor image is implicitly defined by
the tensor dtype. Tensor images with a float dtype are expected to have
values in ``[0, 1)``. Tensor images with an integer dtype are expected to
have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
that can be represented in that dtype.
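A minimal sketch of these dtype conventions, using :class:`ConvertImageDtype` to move between the integer and float ranges (the image size below is an arbitrary assumption)::

    import torch
    import torchvision.transforms as T

    # uint8 tensor image: values expected in [0, 255] (MAX_DTYPE for torch.uint8).
    int_img = torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8)

    # ConvertImageDtype rescales the values along with the dtype change, so the
    # float result lies in the expected float image range.
    float_img = T.ConvertImageDtype(torch.float32)(int_img)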
Randomized transformations will apply the same transformation to all the
images of a given batch, but they will produce different transformations
across calls. For reproducible transformations across calls, you may use
:ref:`functional transforms <functional_transforms>`.
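A rough sketch of this difference (the batch shape and parameters are illustrative assumptions)::

    import torch
    import torchvision.transforms as T
    import torchvision.transforms.functional as F

    batch = torch.rand(4, 3, 32, 32)   # a batch of 4 tensor images

    # Randomized transform: parameters are drawn once per call and applied to
    # the whole batch, so two calls generally produce different outputs.
    jitter = T.ColorJitter(brightness=0.5)
    out1 = jitter(batch)
    out2 = jitter(batch)   # typically differs from out1

    # Functional transform: parameters are passed explicitly, so repeated
    # calls give reproducible results.
    rot1 = F.rotate(batch, angle=30.0)
    rot2 = F.rotate(batch, angle=30.0)  # identical to rot1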
.. warning::
......@@ -117,6 +136,8 @@ Transforms on PIL Image and torch.\*Tensor
.. autoclass:: GaussianBlur
:members:
.. _transforms_pil_only:
Transforms on PIL Image only
----------------------------
......@@ -124,6 +145,7 @@ Transforms on PIL Image only
.. autoclass:: RandomOrder
.. _transforms_tensor_only:
Transforms on torch.\*Tensor only
---------------------------------
......@@ -139,6 +161,7 @@ Transforms on torch.\*Tensor only
.. autoclass:: ConvertImageDtype
.. _conversion_transforms:
Conversion Transforms
---------------------
......@@ -173,13 +196,16 @@ The new transform can be used standalone or mixed-and-matched with existing tran
:members:
.. _functional_transforms:
Functional Transforms
---------------------
Functional transforms give you fine-grained control of the transformation pipeline.
As opposed to the transformations above, functional transforms don't contain a random number
generator for their parameters.
That means you have to specify/generate all parameters, but you can reuse the functional transform.
That means you have to specify/generate all parameters, but the functional transform will give you
reproducible results across calls.
Example:
you can apply a functional transform with the same parameters to multiple images like this:
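One possible sketch of such a pipeline (the exact example in the docs may differ; the rotation range here is an assumption)::

    import random
    import torchvision.transforms.functional as TF

    def my_segmentation_transforms(image, segmentation):
        # Draw the parameters once, then apply the same functional transform
        # to both the image and its segmentation mask.
        if random.random() > 0.5:
            angle = random.randint(-30, 30)
            image = TF.rotate(image, angle)
            segmentation = TF.rotate(segmentation, angle)
        return image, segmentation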
......
......@@ -1103,9 +1103,9 @@ def to_grayscale(img, num_output_channels=1):
Returns:
PIL Image: Grayscale version of the image.
if num_output_channels = 1 : returned image is single channel
if num_output_channels = 3 : returned image is 3 channel with r = g = b
- if num_output_channels = 1 : returned image is single channel
- if num_output_channels = 3 : returned image is 3 channel with r = g = b
"""
if isinstance(img, Image.Image):
return F_pil.to_grayscale(img, num_output_channels)
......@@ -1128,9 +1128,9 @@ def rgb_to_grayscale(img: Tensor, num_output_channels: int = 1) -> Tensor:
Returns:
PIL Image or Tensor: Grayscale version of the image.
if num_output_channels = 1 : returned image is single channel
if num_output_channels = 3 : returned image is 3 channel with r = g = b
- if num_output_channels = 1 : returned image is single channel
- if num_output_channels = 3 : returned image is 3 channel with r = g = b
"""
if not isinstance(img, torch.Tensor):
return F_pil.to_grayscale(img, num_output_channels)
......@@ -1330,6 +1330,7 @@ def equalize(img: Tensor) -> Tensor:
img (PIL Image or Tensor): Image on which equalize is applied.
If img is torch Tensor, it is expected to be in [..., 1 or 3, H, W] format,
where ... means it can have an arbitrary number of leading dimensions.
The tensor dtype must be ``torch.uint8`` and values are expected to be in ``[0, 255]``.
If img is PIL Image, it is expected to be in mode "P", "L" or "RGB".
Returns:
......
......@@ -1464,6 +1464,7 @@ class Grayscale(torch.nn.Module):
Returns:
PIL Image: Grayscale version of the input.
- If ``num_output_channels == 1`` : returned image is single channel
- If ``num_output_channels == 3`` : returned image is 3 channel with r == g == b
......