Updates to RoI transforms docs (#3645)

* Edited roi transforms docs
* remove incorrect param desc
* Fixed confusion about feature maps according to comment

Commit 23a67b3f authored Apr 20, 2021 by Nicolas Hug, committed by GitHub Apr 20, 2021
4 changed files
--- a/torchvision/ops/ps_roi_align.py
+++ b/torchvision/ops/ps_roi_align.py
@@ -19,23 +19,24 @@ def ps_roi_align(
    mentioned in Light-Head R-CNN.
    Args:
-        input (Tensor[N, C, H, W]): input tensor
+        input (Tensor[N, C, H, W]): The input tensor, i.e. a batch with ``N`` elements. Each element
+            contains ``C`` feature maps of dimensions ``H x W``.
        boxes (Tensor[K, 5] or List[Tensor[L, 4]]): the box coordinates in (x1, y1, x2, y2)
            format where the regions will be taken from.
            The coordinate must satisfy ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
-            If a single Tensor is passed,
+            If a single Tensor is passed, then the first column should
-            then the first column should contain the batch index. If a list of Tensors
+            contain the index of the corresponding element in the batch, i.e. a number in ``[0, N - 1]``.
-            is passed, then each Tensor will correspond to the boxes for an element i
+            If a list of Tensors is passed, then each Tensor will correspond to the boxes for an element i
-            in a batch
+            in the batch.
-        output_size (int or Tuple[int, int]): the size of the output after the cropping
+        output_size (int or Tuple[int, int]): the size of the output (in bins or pixels) after the pooling
-            is performed, as (height, width)
+            is performed, as (height, width).
        spatial_scale (float): a scaling factor that maps the input coordinates to
            the box coordinates. Default: 1.0
        sampling_ratio (int): number of sampling points in the interpolation grid
-            used to compute the output value of each pooled output bin. If > 0
+            used to compute the output value of each pooled output bin. If > 0,
-            then exactly sampling_ratio x sampling_ratio grid points are used.
+            then exactly ``sampling_ratio x sampling_ratio`` sampling points per bin are used. If
-            If <= 0, then an adaptive number of grid points are used (computed as
+            <= 0, then an adaptive number of grid points are used (computed as
-            ceil(roi_width / pooled_w), and likewise for height). Default: -1
+            ``ceil(roi_width / output_width)``, and likewise for height). Default: -1
    Returns:
        Tensor[K, C, output_size[0], output_size[1]]: The pooled RoIs
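
The `sampling_ratio` rule in the docstring above can be sketched in plain Python. This is only an illustration of the documented formula, not torchvision's code: the helper name `sampling_grid` is hypothetical, and it assumes `roi_width` and `roi_height` are the box extents already expressed in feature-map units.

```python
import math


def sampling_grid(roi_width, roi_height, output_size, sampling_ratio=-1):
    """Return (points_along_height, points_along_width) sampled per output bin.

    Hypothetical helper that mirrors the docstring wording only.
    """
    out_h, out_w = output_size
    if sampling_ratio > 0:
        # Fixed grid: sampling_ratio x sampling_ratio points in every bin.
        return sampling_ratio, sampling_ratio
    # Adaptive grid: ceil(roi_width / output_width), and likewise for height.
    return math.ceil(roi_height / out_h), math.ceil(roi_width / out_w)


fixed = sampling_grid(14, 21, (7, 7), sampling_ratio=2)
adaptive = sampling_grid(14, 21, (7, 7))
```

With a positive `sampling_ratio` the grid does not depend on the box size, while the adaptive default grows with the box so that larger regions are sampled more densely.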

--- a/torchvision/ops/ps_roi_pool.py
+++ b/torchvision/ops/ps_roi_pool.py
@@ -18,16 +18,17 @@ def ps_roi_pool(
    described in R-FCN
    Args:
-        input (Tensor[N, C, H, W]): input tensor
+        input (Tensor[N, C, H, W]): The input tensor, i.e. a batch with ``N`` elements. Each element
+            contains ``C`` feature maps of dimensions ``H x W``.
        boxes (Tensor[K, 5] or List[Tensor[L, 4]]): the box coordinates in (x1, y1, x2, y2)
            format where the regions will be taken from.
            The coordinate must satisfy ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
-            If a single Tensor is passed,
+            If a single Tensor is passed, then the first column should
-            then the first column should contain the batch index. If a list of Tensors
+            contain the index of the corresponding element in the batch, i.e. a number in ``[0, N - 1]``.
-            is passed, then each Tensor will correspond to the boxes for an element i
+            If a list of Tensors is passed, then each Tensor will correspond to the boxes for an element i
-            in a batch
+            in the batch.
-        output_size (int or Tuple[int, int]): the size of the output after the cropping
+        output_size (int or Tuple[int, int]): the size of the output (in bins or pixels) after the pooling
-            is performed, as (height, width)
+            is performed, as (height, width).
        spatial_scale (float): a scaling factor that maps the input coordinates to
            the box coordinates. Default: 1.0
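
The two accepted `boxes` layouts describe the same information. A minimal sketch of the correspondence, using plain Python lists in place of tensors so it runs without torch (the helper name `to_single_tensor_rows` is hypothetical; the real operators expect `Tensor` inputs):

```python
def to_single_tensor_rows(boxes_per_element):
    """Flatten a per-element list of (x1, y1, x2, y2) boxes into K rows of 5 values.

    The first value of each row is the index of the batch element the box
    belongs to, i.e. a number in [0, N - 1].
    """
    rows = []
    for batch_index, boxes in enumerate(boxes_per_element):
        for x1, y1, x2, y2 in boxes:
            rows.append([batch_index, x1, y1, x2, y2])
    return rows


# A batch of N = 3 elements: two boxes, no boxes, one box.
boxes_per_element = [
    [(0, 0, 4, 4), (1, 1, 3, 3)],
    [],
    [(2, 2, 6, 6)],
]
rows = to_single_tensor_rows(boxes_per_element)
```

In the list form the batch index is implied by the position of each entry in the list; in the single-tensor form it has to be written explicitly in the first column.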

--- a/torchvision/ops/roi_align.py
+++ b/torchvision/ops/roi_align.py
@@ -17,30 +17,31 @@ def roi_align(
    aligned: bool = False,
 ) -> Tensor:
    """
-    Performs Region of Interest (RoI) Align operator described in Mask R-CNN
+    Performs Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.
    Args:
-        input (Tensor[N, C, H, W]): input tensor
+        input (Tensor[N, C, H, W]): The input tensor, i.e. a batch with ``N`` elements. Each element
+            contains ``C`` feature maps of dimensions ``H x W``.
            If the tensor is quantized, we expect a batch size of ``N == 1``.
        boxes (Tensor[K, 5] or List[Tensor[L, 4]]): the box coordinates in (x1, y1, x2, y2)
            format where the regions will be taken from.
            The coordinate must satisfy ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
-            If a single Tensor is passed,
+            If a single Tensor is passed, then the first column should
-            then the first column should contain the batch index. If a list of Tensors
+            contain the index of the corresponding element in the batch, i.e. a number in ``[0, N - 1]``.
-            is passed, then each Tensor will correspond to the boxes for an element i
+            If a list of Tensors is passed, then each Tensor will correspond to the boxes for an element i
-            in a batch
+            in the batch.
-        output_size (int or Tuple[int, int]): the size of the output after the cropping
+        output_size (int or Tuple[int, int]): the size of the output (in bins or pixels) after the pooling
-            is performed, as (height, width)
+            is performed, as (height, width).
        spatial_scale (float): a scaling factor that maps the input coordinates to
            the box coordinates. Default: 1.0
        sampling_ratio (int): number of sampling points in the interpolation grid
            used to compute the output value of each pooled output bin. If > 0,
-            then exactly sampling_ratio x sampling_ratio grid points are used. If
+            then exactly ``sampling_ratio x sampling_ratio`` sampling points per bin are used. If
            <= 0, then an adaptive number of grid points are used (computed as
-            ceil(roi_width / pooled_w), and likewise for height). Default: -1
+            ``ceil(roi_width / output_width)``, and likewise for height). Default: -1
        aligned (bool): If False, use the legacy implementation.
-            If True, pixel shift it by -0.5 for align more perfectly about two neighboring pixel indices.
+            If True, pixel shift the box coordinates by -0.5 for a better alignment with the two
-            This version in Detectron2
+            neighboring pixel indices. This version is used in Detectron2.
    Returns:
        Tensor[K, C, output_size[0], output_size[1]]: The pooled RoIs.
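
The effect of `spatial_scale` and `aligned` on a box can be sketched as follows. This is a reading of the docstring rather than the operator's actual kernel: the helper name `scaled_box` is hypothetical, and the order of operations (scale first, then shift by 0.5) is an assumption.

```python
def scaled_box(box, spatial_scale=1.0, aligned=False):
    """Map (x1, y1, x2, y2) onto the feature map, optionally shifted by -0.5.

    Hypothetical helper; assumes the shift is applied after scaling.
    """
    offset = 0.5 if aligned else 0.0
    return tuple(coord * spatial_scale - offset for coord in box)


# Feature map at 1/4 of the image resolution.
legacy = scaled_box((8, 16, 24, 32), spatial_scale=0.25)
shifted = scaled_box((8, 16, 24, 32), spatial_scale=0.25, aligned=True)
```

Under this reading, `aligned=True` moves every coordinate by half a pixel so that sampling positions line up with pixel centers instead of pixel corners.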

--- a/torchvision/ops/roi_pool.py
+++ b/torchvision/ops/roi_pool.py
@@ -18,14 +18,15 @@ def roi_pool(
    Performs Region of Interest (RoI) Pool operator described in Fast R-CNN
    Args:
-        input (Tensor[N, C, H, W]): input tensor
+        input (Tensor[N, C, H, W]): The input tensor, i.e. a batch with ``N`` elements. Each element
+            contains ``C`` feature maps of dimensions ``H x W``.
        boxes (Tensor[K, 5] or List[Tensor[L, 4]]): the box coordinates in (x1, y1, x2, y2)
            format where the regions will be taken from.
            The coordinate must satisfy ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
-            If a single Tensor is passed,
+            If a single Tensor is passed, then the first column should
-            then the first column should contain the batch index. If a list of Tensors
+            contain the index of the corresponding element in the batch, i.e. a number in ``[0, N - 1]``.
-            is passed, then each Tensor will correspond to the boxes for an element i
+            If a list of Tensors is passed, then each Tensor will correspond to the boxes for an element i
-            in a batch
+            in the batch.
        output_size (int or Tuple[int, int]): the size of the output after the cropping
            is performed, as (height, width)
        spatial_scale (float): a scaling factor that maps the input coordinates to