Docstrings improvements

Summary: As the title says. This is for the Sphinx documentation.
Reviewed By: HapeMask
Differential Revision: D63440496
fbshipit-source-id: 483fdfc6cbc14ce8f88e6d048553488f1a0f8ed3

Commit b3060c7a authored Sep 26, 2024 by Stanislav Pidhorskyi Committed by Facebook GitHub Bot Sep 26, 2024
Showing with 152 additions and 105 deletions

drtk/edge_grad_estimator.py +76 -41
drtk/interpolate.py +14 -5
drtk/rasterize.py +31 -27
drtk/transform.py +31 -32

--- a/drtk/edge_grad_estimator.py
+++ b/drtk/edge_grad_estimator.py
@@ -24,65 +24,97 @@ def edge_grad_estimator(
    index_img: th.Tensor,
    v_pix_img_hook: Optional[Callable[[th.Tensor], None]] = None,
 ) -> th.Tensor:
-    """
-    Args:
-        v_pix: Pixel-space vertex coordinates with preserved camera-space Z-values.
-            N x V x 3
-
-        vi: face vertex index list tensor
-            V x 3
-
-        bary_img: 3D barycentric coordinate image tensor
-            N x 3 x H x W
-
-        img: The rendered image
-            N x C x H x W
-
-        index_img: index image tensor
-            N x H x W
+    """Makes the rasterized image ``img`` differentiable at visibility discontinuities
+    and backpropagates the gradients to ``v_pix``.
+
+    This function takes a rasterized image ``img`` that is assumed to be differentiable at
+    continuous regions but not at discontinuities. In some cases, ``img`` may not be differentiable
+    at all. For example, if the image is a rendered segmentation mask, it remains constant at
+    continuous regions, making it non-differentiable. However, ``edge_grad_estimator`` can still
+    compute gradients at the discontinuities with respect to ``v_pix``.
+
+    The arguments ``bary_img`` and ``index_img`` must correspond exactly to the rasterized image
+    ``img``. Each pixel in ``img`` should correspond to a fragment originating from the primitive
+    specified by ``index_img`` and it should have barycentric coordinates specified by
+    ``bary_img``. This means that with a small change to ``v_pix``, the pixels in ``img`` should
+    change accordingly. A frequent mistake that violates this condition is applying a mask
+    to the rendered image to exclude unwanted regions, which leads to erroneous gradients.
+
+    The function returns the ``img`` unchanged but with added differentiability at the
+    discontinuities. Note that it is not necessary for the input ``img`` to require gradients,
+    but the returned ``img`` will always require gradients.

-        v_pix_img_hook: a backward hook that will be registered to v_pix_img. Useful for examining
-            generated image space. Default None
+    Args:
+        v_pix (Tensor): Pixel-space vertex coordinates, preserving the original camera-space
+            Z-values. Shape: :math:`(N, V, 3)`.
+        vi (Tensor): Face vertex index list tensor. Shape: :math:`(V, 3)`.
+        bary_img (Tensor): 3D barycentric coordinate image tensor. Shape: :math:`(N, 3, H, W)`.
+        img (Tensor): The rendered image. Shape: :math:`(N, C, H, W)`.
+        index_img (Tensor): Index image tensor. Shape: :math:`(N, H, W)`.
+        v_pix_img_hook (Optional[Callable[[th.Tensor], None]]): An optional backward hook that will
+            be registered to ``v_pix_img``. Useful for examining the generated image space. Default
+            is None.

    Returns:
-        returns the img argument unchanged. Optionally also returns computed
-        v_pix_img. Your loss should use the returned img, even though it is
-        unchanged.
+        Tensor: Returns the input ``img`` unchanged. However, the returned image now has added
+        differentiability at visibility discontinuities. This returned image should be used for
+        computing losses.

    Note:
-        It is important to not spatially modify the rasterized image before passing it to edge_grad_estimator.
-        Any operation as long as it is differentiable is ok after the edge_grad_estimator.
+        It is crucial not to spatially modify the rasterized image before passing it to
+        `edge_grad_estimator`. That stems from the requirement that ``bary_img`` and ``index_img``
+        must correspond exactly to the rasterized image ``img``. That means that the location of all
+        discontinuities is controlled by ``v_pix`` and can be modified by modifying ``v_pix``.

-        Examples of opeartions that can be done before edge_grad_estimator:
+        Operations that are allowed, as long as they are differentiable, include:
            - Pixel-wise MLP
            - Color mapping
            - Color correction, gamma correction
-        If the operation is significantly non-linear, then it is recommended to do it before
-        edge_grad_estimator. All sorts of clipping and clamping (e.g. `x.clamp(min=0.0, max=1.0)`), must be
-        done before edge_grad_estimator.
+            - Anything that would be indistinguishable from processing fragments independently
+              before their values get assigned to pixels of ``img``

-        Examples of operations that are not allowed before edge_grad_estimator:
+        Operations that **must be avoided** before `edge_grad_estimator` include:
            - Gaussian blur
-            - Warping, deformation
-            - Masking, cropping, making holes.
+            - Warping or deformation
+            - Masking, cropping, or introducing holes
+
+        There is, however, no issue with applying them after `edge_grad_estimator`.
+
+        If the operation is highly non-linear, it is recommended to perform it before calling
+        :func:`edge_grad_estimator`.
+        All sorts of clipping and clamping (e.g., `x.clamp(min=0.0, max=1.0)`) must also be done
+        before invoking this function.

-    Usage::
+    Usage Example::

-        from drtk.renderlayer import edge_grad_estimator
+        import torch.nn.functional as thf
+        from drtk import transform, rasterize, render, interpolate, edge_grad_estimator

        ...

-        out = renderlayer(v, tex, campos, camrot, focal, princpt,
-                 output_filters=["index_img", "render", "mask", "bary_img", "v_pix"])
+        v_pix = transform(v, tex, campos, camrot, focal, princpt)
+        index_img = rasterize(v_pix, vi, width=512, height=512)
+        _, bary_img = render(v_pix, vi, index_img)
+        vt_img = interpolate(vt, vti, index_img, bary_img)
+
+        img = thf.grid_sample(
+            tex,
+            vt_img.permute(0, 2, 3, 1),
+            mode="bilinear",
+            padding_mode="border",
+            align_corners=False
+        )
+
+        mask = (index_img != -1)[:, None, :, :]

-        img = out["render"] * out["mask"]
+        img = img * mask

        img = edge_grad_estimator(
-            v_pix=out["v_pix"],
-            vi=rl.vi,
-            bary_img=out["bary_img"],
+            v_pix=v_pix,
+            vi=vi,
+            bary_img=bary_img,
            img=img,
-            index_img=out["index_img"]
+            index_img=index_img
        )

        optim.zero_grad()
@@ -91,7 +123,10 @@ def edge_grad_estimator(
        optim.step()
    """

-    # Could use v_pix_img output from DRTK, but bary_img needs to be detached.
+    # TODO: avoid call to interpolate, use backward kernel of interpolate directly
+    # Doing so will make `edge_grad_estimator` zero-overhead in forward pass
+    # At the moment, value of `v_pix_img` is ignored, and only passed to
+    # edge_grad_estimator so that backward kernel can be called with the computed gradient.
    v_pix_img = interpolate(v_pix, vi, index_img, bary_img.detach())

    img = th.ops.edge_grad_ext.edge_grad_estimator(v_pix, v_pix_img, vi, img, index_img)
@@ -111,7 +146,7 @@ def edge_grad_estimator_ref(
 ) -> th.Tensor:
    """
    Python reference implementation for
-    :func:`drtk.edge_grad_estimator.edge_grad_estimator`.
+    :func:`drtk.edge_grad_estimator`.
    """

    # could use v_pix_img output from DRTK, but bary_img needs to be detached.

--- a/drtk/interpolate.py
+++ b/drtk/interpolate.py
@@ -3,6 +3,11 @@
 # This source code is licensed under the MIT license found in the
 # LICENSE file in the root directory of this source tree.

+"""
+The ``drtk.interpolate`` module provides functions for differentiable interpolation of vertex
+attributes across the fragments, i.e., pixels covered by the primitive.
+"""
+
 import torch as th
 from drtk import interpolate_ext

@@ -18,6 +23,7 @@ def interpolate(
 ) -> th.Tensor:
    """
    Performs a linear interpolation of the vertex attributes given the barycentric coordinates
+
    Args:
        vert_attributes (th.Tensor):  vertex attribute tensor
            N x V x C
@@ -27,12 +33,14 @@ def interpolate(
            N x H x W
        bary_img (th.Tensor): 3D barycentric coordinate image tensor
            N x 3 x H x W
+
    Returns:
        A tensor with interpolated vertex attributes with a shape [N, C, H, W]
-    Note:
-        1. The default of `channels_last` is set to true to make this function backward compatible.
-        Please consider using the argument `channels_last` instead of permuting the result afterward.
-        2. By default, the output is not contiguous. Make sure you cal .contiguous() if that is a requirement.
+
+    .. warning::
+        The returned tensor has only valid values for pixels which have a valid index in ``index_img``.
+        For all other pixels, which had index ``-1`` in ``index_img``, the returned tensor will have non-zero
+        values which should be ignored.
    """
    return th.ops.interpolate_ext.interpolate(vert_attributes, vi, index_img, bary_img)

@@ -44,7 +52,8 @@ def interpolate_ref(
    bary_img: th.Tensor,
 ) -> th.Tensor:
    """
-    A reference implementation for `interpolate`. See the doc string from `interpolate`
+    A reference implementation of :func:`drtk.interpolate` in pure PyTorch.
+    This function is used for tests only, please see :func:`drtk.interpolate` for documentation.
    """

    # Run reference implementation in double precision to get as good reference as possible
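
The interpolation described in the `interpolate` docstring can be sketched for a single pixel in plain Python. This is only an illustration under stated assumptions, not DRTK's implementation: `interpolate_pixel` is a hypothetical helper, it works on nested lists rather than tensors, and it returns `None` for uncovered pixels, whereas the real function leaves values there that should be ignored.

```python
from typing import List, Optional, Sequence


def interpolate_pixel(
    vert_attributes: Sequence[Sequence[float]],  # V x C, attributes per vertex
    vi: Sequence[Sequence[int]],  # one (i0, i1, i2) triple per triangle
    tri_index: int,  # value of index_img at this pixel, -1 if uncovered
    bary: Sequence[float],  # the three barycentric weights at this pixel
) -> Optional[List[float]]:
    """Hypothetical per-pixel version of barycentric attribute interpolation."""
    if tri_index < 0:
        # Illustration choice: signal "no valid value" explicitly.
        return None
    i0, i1, i2 = vi[tri_index]
    a0, a1, a2 = vert_attributes[i0], vert_attributes[i1], vert_attributes[i2]
    # Weighted sum of the three vertex attributes, channel by channel.
    return [
        bary[0] * x0 + bary[1] * x1 + bary[2] * x2
        for x0, x1, x2 in zip(a0, a1, a2)
    ]


attrs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
faces = [[0, 1, 2]]
print(interpolate_pixel(attrs, faces, 0, (0.25, 0.25, 0.5)))
print(interpolate_pixel(attrs, faces, -1, (0.0, 0.0, 0.0)))
```

The second call shows why the warning in the docstring matters: pixels with index `-1` carry no meaningful interpolated value.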

--- a/drtk/rasterize.py
+++ b/drtk/rasterize.py
@@ -24,36 +24,38 @@ def rasterize(
    Rasterizes a mesh defined by v and vi.

    Args:
-        v (th.Tensor):  vertex positions. The first two components are the projected vertex's location (x, y)
-        on the image plane. The coordinates of the top left corner are (-0.5, -0.5), and the coordinates of
-        the bottom right corner are (width - 0.5, height - 0.5). The z component is expected to be in the
-        camera space coordinate frame (before projection).
+        v (th.Tensor):  vertex positions. The first two components are the projected vertex's
+            location (x, y) on the image plane. The coordinates of the top left corner are
+            (-0.5, -0.5), and the coordinates of the bottom right corner are
+            (width - 0.5, height - 0.5). The z component is expected to be in the camera space
+            coordinate frame (before projection).
            N x V x 3

-        vi (th.Tensor): face vertex index list tensor. The most significant nibble of vi is reserved for
-        controlling visibility of the edges in wireframe mode. In non-wireframe mode, content of the most
-        significant nibble of vi will be ignored.
+        vi (th.Tensor): face vertex index list tensor. The most significant nibble of vi is
+            reserved for controlling visibility of the edges in wireframe mode. In non-wireframe
+            mode, content of the most significant nibble of vi will be ignored.
            V x 3

        height (int): height of the image in pixels.

        width (int): width of the image in pixels.

-        wireframe (bool): If False (default), rasterizes triangles. If True, rasterizes lines, where the most
-        significant nibble of vi is reinterpreted as a bit field controlling the visibility of the edges. The
-        least significant bit controls the visibility of the first edge, the second bit controls the
-        visibility of the second edge, and the third bit controls the visibility of the third edge. This
-        limits the maximum number of vertices to 268435455.
+        wireframe (bool): If False (default), rasterizes triangles. If True, rasterizes lines,
+            where the most significant nibble of vi is reinterpreted as a bit field controlling
+            the visibility of the edges. The least significant bit controls the visibility of the
+            first edge, the second bit controls the visibility of the second edge, and the third
+            bit controls the visibility of the third edge. This limits the maximum number of
+            vertices to 268435455.

    Returns:
-        The rasterized image of triangle indices which is represented with an index tensor of a shape
-        [N, H, W] of type int32 that stores a triangle ID for each pixel. If a triangle covers a pixel and is
-        the closest triangle to the camera, then the pixel will have the ID of that triangle. If no triangles
-        cover a pixel, then its ID is -1.
+        The rasterized image of triangle indices which is represented with an index tensor of a
+        shape [N, H, W] of type int32 that stores a triangle ID for each pixel. If a triangle
+        covers a pixel and is the closest triangle to the camera, then the pixel will have the
+        ID of that triangle. If no triangles cover a pixel, then its ID is -1.

    Note:
-        This function is not differentiable. The gradients should be computed with `edge_grad_estimator`
-        instead.
+        This function is not differentiable. The gradients should be computed with
+        :func:`edge_grad_estimator` instead.
    """
    _, index_img = th.ops.rasterize_ext.rasterize(v, vi, height, width, wireframe)
    return index_img
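
The vertex limit of 268435455 quoted in the `wireframe` description follows from reserving the top four bits of a 32-bit index, which leaves 28 bits: 2^28 - 1. A minimal sketch of such a bit layout is shown below. The helper names are hypothetical, and the docstring does not say which element of a face triple carries the flags, so this only demonstrates packing three edge flags into the most significant nibble of one value.

```python
INDEX_BITS = 28  # bits left for the vertex index once a nibble is reserved
INDEX_MASK = (1 << INDEX_BITS) - 1  # 268435455, the limit from the docstring


def pack_edge_flags(vertex_index: int, e0: bool, e1: bool, e2: bool) -> int:
    """Hypothetical helper: store three edge-visibility bits above the index."""
    if not 0 <= vertex_index <= INDEX_MASK:
        raise ValueError("vertex index does not fit in 28 bits")
    # Least significant flag bit = first edge, then second, then third.
    flags = int(e0) | (int(e1) << 1) | (int(e2) << 2)
    return (flags << INDEX_BITS) | vertex_index


def unpack_edge_flags(packed: int):
    """Hypothetical helper: recover the index and the three edge flags."""
    flags = (packed >> INDEX_BITS) & 0xF
    return packed & INDEX_MASK, bool(flags & 1), bool(flags & 2), bool(flags & 4)


packed = pack_edge_flags(5, True, False, True)
print(unpack_edge_flags(packed))
```

With only three flag bits in use, the packed value never sets bit 31, so it still fits in a signed 32-bit integer.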
@@ -68,22 +70,24 @@ def rasterize_with_depth(
    wireframe: bool = False,
 ) -> Tuple[th.Tensor, th.Tensor]:
    """
-    Same as `rasterize` function, additionally returns depth image. Internally it uses the same implementation
-    as the rasterize function which still computes depth but does not return depth.
+    Same as :func:`rasterize` function, additionally returns depth image. Internally it uses the
+    same implementation as the rasterize function which still computes depth but does not return
+    depth.

-    Notes:
+    Note:
        The function is not differentiable, including the depth output.

-    The split is done intentionally to hide the depth image from the user as it is not differentiable which
-    may cause errors if assumed otherwise. Instead, the`barycentrics` function should be used instead to
+    The split is done intentionally to hide the depth image from the user as it is not
+    differentiable which may cause errors if assumed otherwise. The `barycentrics` function
+    should be used instead to
    compute the differentiable version of depth.

-    However, we still provide `rasterize_with_depth` which returns non-differentiable depth which could allow
-    to avoid call to `barycentrics` function when differentiability is not required.
+    However, we still provide `rasterize_with_depth`, which returns non-differentiable depth and
+    makes it possible to avoid a call to the `barycentrics` function when differentiability is not required.

    Returns:
-        The rasterized image of triangle indices of shape [N, H, W] and a depth image of shape [N, H, W].
-        Values in of pixels in the depth image not covered by any pixel are 0.
+        The rasterized image of triangle indices of shape [N, H, W] and a depth image of shape
+        [N, H, W]. Values of pixels in the depth image not covered by any triangle are 0.

    """
    depth_img, index_img = th.ops.rasterize_ext.rasterize(
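
The coordinate convention stated for the `v` argument of `rasterize` (top-left corner at (-0.5, -0.5), bottom-right corner at (width - 0.5, height - 0.5)) places pixel centers at integer coordinates. The sketch below shows one way to map a continuous position to the pixel containing it. `to_pixel_index` is a hypothetical helper, and assigning points that lie exactly on a pixel boundary to the pixel on the right or below is an assumption of this sketch, not a documented fill rule.

```python
import math
from typing import Optional, Tuple


def to_pixel_index(
    x: float, y: float, width: int, height: int
) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: (col, row) of the pixel containing (x, y), or None."""
    # Pixel `col` spans [col - 0.5, col + 0.5) horizontally, so shifting by
    # half a pixel and flooring recovers the column; rows work the same way.
    col = math.floor(x + 0.5)
    row = math.floor(y + 0.5)
    if 0 <= col < width and 0 <= row < height:
        return (col, row)
    return None  # position lies outside the image


print(to_pixel_index(-0.5, -0.5, 4, 3))   # top-left corner of the image
print(to_pixel_index(3.49, 2.49, 4, 3))   # just inside the bottom-right corner
print(to_pixel_index(3.5, 0.0, 4, 3))     # on the right border, outside
```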

--- a/drtk/transform.py
+++ b/drtk/transform.py
@@ -22,38 +22,37 @@ def transform(
    fov: Optional[th.Tensor] = None,
 ) -> th.Tensor:
    """
-    v: Tensor, N x V x 3
-    Batch of vertex positions for vertices in the mesh.
-
-    campos: Tensor, N x 3
-    Camera position.
-
-    camrot: Tensor, N x 3 x 3
-    Camera rotation matrix.
-
-    focal: Tensor, N x 2 x 2
-    Focal length [[fx, 0],
-                  [0, fy]]
-
-    princpt: Tensor, N x 2
-    Principal point [cx, cy]
-
-    K: Tensor, N x 3 x 3
-    Camera intrinsic calibration matrix. Either this or both (focal,
-    princpt) must be provided.
-
-    Rt: Tensor, N x 3 x 4 or N x 4 x 4
-    Camera extrinsic matrix. Either this or both (camrot, campos) must be
-    provided. Camrot is the upper 3x3 of Rt, campos = -R.T @ t.
-
-    distortion_mode: List[str]
-    Names of the distortion modes.
-
-    distortion_coeff: Tensor, N x 4
-    Distortion coefficients.
-
-    fov: Tensor, N x 1
-    Valid field of view of the distortion model.
+    Projects 3D vertex positions onto the image plane of the camera.
+
+    Args:
+        v (th.Tensor):  vertex positions. N x V x 3
+        campos (Tensor): Camera position. N x 3
+        camrot (Tensor): Camera rotation matrix. N x 3 x 3
+        focal (Tensor): Focal length. The upper left 2x2 block of the intrinsic matrix
+            [[f_x, s], [0, f_y]].  N x 2 x 2
+        princpt (Tensor): Camera principal point [cx, cy]. N x 2
+        K (Tensor): Camera intrinsic calibration matrix, N x 3 x 3
+        Rt (Tensor): Camera extrinsic matrix. N x 3 x 4 or N x 4 x 4
+        distortion_mode (List[str]): Names of the distortion modes.
+        distortion_coeff (Tensor): Distortion coefficients. N x 4
+        fov (Tensor): Valid field of view of the distortion model. N x 1
+
+    Returns:
+        Vertex positions projected onto the image plane of the camera. The last dimension still
+        has size 3. The first two components are the x and y coordinates on the image plane,
+        and z is the z component of the vertex positions in the camera frame. The latter is used
+        for depth values that are written to the z-buffer. N x V x 3
+
+    .. warning::
+        You must specify either ``K`` (intrinsic matrix) or both ``focal`` and ``princpt``
+        (focal length and principal point).
+
+        Additionally, you must provide either ``Rt`` (extrinsic matrix) or both ``campos``
+        (camera position) and ``camrot`` (camera rotation).
+
+    .. note::
+        If we split ``Rt`` of shape N x 3 x 4 into ``R`` of shape N x 3 x 3 and ``t`` of
+        shape N x 3 x 1, then: ``camrot`` is ``R``, and ``campos`` is ``-R.T @ t``.

    """