Commit b3060c7a authored by Stanislav Pidhorskyi, committed by Facebook GitHub Bot

Docstrings improvements

Summary: As the title says. This is for the Sphinx documentation.

Reviewed By: HapeMask

Differential Revision: D63440496

fbshipit-source-id: 483fdfc6cbc14ce8f88e6d048553488f1a0f8ed3
parent b0ca8b5c
......@@ -24,65 +24,97 @@ def edge_grad_estimator(
index_img: th.Tensor,
v_pix_img_hook: Optional[Callable[[th.Tensor], None]] = None,
) -> th.Tensor:
"""
Args:
v_pix: Pixel-space vertex coordinates with preserved camera-space Z-values.
N x V x 3
vi: face vertex index list tensor
V x 3
bary_img: 3D barycentric coordinate image tensor
N x 3 x H x W
img: The rendered image
N x C x H x W
index_img: index image tensor
N x H x W
"""Makes the rasterized image ``img`` differentiable at visibility discontinuities
and backpropagates the gradients to ``v_pix``.
This function takes a rasterized image ``img`` that is assumed to be differentiable at
continuous regions but not at discontinuities. In some cases, ``img`` may not be differentiable
at all. For example, if the image is a rendered segmentation mask, it remains constant at
continuous regions, making it non-differentiable. However, ``edge_grad_estimator`` can still
compute gradients at the discontinuities with respect to ``v_pix``.
The arguments ``bary_img`` and ``index_img`` must correspond exactly to the rasterized image
``img``. Each pixel in ``img`` should correspond to a fragment originating from the primitive
specified by ``index_img`` and it should have barycentric coordinates specified by
``bary_img``. This means that with a small change to ``v_pix``, the pixels in ``img`` should
change accordingly. A frequent mistake that violates this condition is applying a mask
to the rendered image to exclude unwanted regions, which leads to erroneous gradients.
The function returns the ``img`` unchanged but with added differentiability at the
discontinuities. Note that it is not necessary for the input ``img`` to require gradients,
but the returned ``img`` will always require gradients.
v_pix_img_hook: a backward hook that will be registered to v_pix_img. Useful for examining
generated image space. Default None
Args:
v_pix (Tensor): Pixel-space vertex coordinates, preserving the original camera-space
Z-values. Shape: :math:`(N, V, 3)`.
vi (Tensor): Face vertex index list tensor. Shape: :math:`(V, 3)`.
bary_img (Tensor): 3D barycentric coordinate image tensor. Shape: :math:`(N, 3, H, W)`.
img (Tensor): The rendered image. Shape: :math:`(N, C, H, W)`.
index_img (Tensor): Index image tensor. Shape: :math:`(N, H, W)`.
v_pix_img_hook (Optional[Callable[[th.Tensor], None]]): An optional backward hook that will
be registered to ``v_pix_img``. Useful for examining the generated image space. Default
is None.
Returns:
returns the img argument unchanged. Optionally also returns computed
v_pix_img. Your loss should use the returned img, even though it is
unchanged.
Tensor: Returns the input ``img`` unchanged. However, the returned image now has added
differentiability at visibility discontinuities. This returned image should be used for
computing losses.
Note:
It is important to not spatially modify the rasterized image before passing it to edge_grad_estimator.
Any operation as long as it is differentiable is ok after the edge_grad_estimator.
It is crucial not to spatially modify the rasterized image before passing it to
`edge_grad_estimator`. That stems from the requirement that ``bary_img`` and ``index_img``
must correspond exactly to the rasterized image ``img``. That means that the location of all
discontinuities is controlled by ``v_pix`` and can be modified by modifying ``v_pix``.
Examples of operations that can be done before edge_grad_estimator:
Operations that are allowed, as long as they are differentiable, include:
- Pixel-wise MLP
- Color mapping
- Color correction, gamma correction
If the operation is significantly non-linear, then it is recommended to do it before
edge_grad_estimator. All sorts of clipping and clamping (e.g. `x.clamp(min=0.0, max=1.0)`), must be
done before edge_grad_estimator.
- Anything that would be indistinguishable from processing fragments independently
before their values get assigned to pixels of ``img``
Examples of operations that are not allowed before edge_grad_estimator:
Operations that **must be avoided** before `edge_grad_estimator` include:
- Gaussian blur
- Warping, deformation
- Masking, cropping, making holes.
- Warping or deformation
- Masking, cropping, or introducing holes
There is, however, no issue with applying them after `edge_grad_estimator`.
If the operation is highly non-linear, it is recommended to perform it before calling
:func:`edge_grad_estimator`.
All sorts of clipping and clamping (e.g., `x.clamp(min=0.0, max=1.0)`) must also be done
before invoking this function.
Usage::
Usage Example::
from drtk.renderlayer import edge_grad_estimator
import torch.nn.functional as thf
from drtk import transform, rasterize, render, interpolate, edge_grad_estimator
...
out = renderlayer(v, tex, campos, camrot, focal, princpt,
output_filters=["index_img", "render", "mask", "bary_img", "v_pix"])
v_pix = transform(v, tex, campos, camrot, focal, princpt)
index_img = rasterize(v_pix, vi, width=512, height=512)
_, bary_img = render(v_pix, vi, index_img)
vt_img = interpolate(vt, vti, index_img, bary_img)
img = thf.grid_sample(
tex,
vt_img.permute(0, 2, 3, 1),
mode="bilinear",
padding_mode="border",
align_corners=False
)
mask = (index_img != -1)[:, None, :, :]
img = out["render"] * out["mask"]
img = img * mask
img = edge_grad_estimator(
v_pix=out["v_pix"],
vi=rl.vi,
bary_img=out["bary_img"],
v_pix=v_pix,
vi=vi,
bary_img=bary_img,
img=img,
index_img=out["index_img"]
index_img=index_img
)
optim.zero_grad()
......@@ -91,7 +123,10 @@ def edge_grad_estimator(
optim.step()
"""
# Could use v_pix_img output from DRTK, but bary_img needs to be detached.
# TODO: avoid call to interpolate, use backward kernel of interpolate directly
# Doing so will make `edge_grad_estimator` zero-overhead in forward pass
# At the moment, value of `v_pix_img` is ignored, and only passed to
# edge_grad_estimator so that backward kernel can be called with the computed gradient.
v_pix_img = interpolate(v_pix, vi, index_img, bary_img.detach())
img = th.ops.edge_grad_ext.edge_grad_estimator(v_pix, v_pix_img, vi, img, index_img)
......@@ -111,7 +146,7 @@ def edge_grad_estimator_ref(
) -> th.Tensor:
"""
Python reference implementation for
:func:`drtk.edge_grad_estimator.edge_grad_estimator`.
:func:`drtk.edge_grad_estimator`.
"""
# could use v_pix_img output from DRTK, but bary_img needs to be detached.
......
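The rule in the docstring above about which operations may precede `edge_grad_estimator` can be illustrated with a toy example. The following is a minimal sketch in plain Python, not DRTK code: `toy_rasterize`, `gamma`, and `box_blur` are hypothetical helpers, and a 1D list of primitive indices stands in for one row of `index_img`. Under those assumptions it shows that a pixel-wise operation gives the same result whether it is applied to fragment values before or after they are written to pixels, while a blur alters pixels next to discontinuities.

```python
# Toy 1D illustration (hypothetical helpers, not part of DRTK).

def toy_rasterize(index_row, prim_values, background=0.0):
    """Give each pixel the value of the primitive covering it (-1 means empty)."""
    return [prim_values[i] if i >= 0 else background for i in index_row]


def gamma(x, g=2.2):
    """Pixel-wise operation: the output depends on a single input value only."""
    return x ** (1.0 / g)


def box_blur(row):
    """Spatial operation: each output pixel mixes its neighbours (border clamped)."""
    n = len(row)
    return [
        (row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]


index_row = [-1, 0, 0, 1, 1, -1]  # stand-in for one row of index_img
prim_values = [0.25, 0.81]        # one value per primitive

# Pixel-wise op: identical result before or after values reach the pixels.
# (The background is 0.0 and gamma(0.0) == 0.0, so empty pixels agree too.)
after = [gamma(p) for p in toy_rasterize(index_row, prim_values)]
before = toy_rasterize(index_row, [gamma(v) for v in prim_values])
pixelwise_commutes = after == before

# Blur: pixels near a discontinuity no longer hold the value of the
# primitive recorded for them in index_row.
img = toy_rasterize(index_row, prim_values)
blurred = box_blur(img)
blur_changes_pixels = any(abs(b - p) > 1e-9 for b, p in zip(blurred, img))
```

After a blur the image no longer matches `index_row`, which is the situation the docstring warns against when such an operation is placed before `edge_grad_estimator`.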
......@@ -3,6 +3,11 @@
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
"""
``drtk.interpolate`` module provides functions for differentiable interpolation of vertex
attributes across the fragments, i.e., pixels covered by the primitive.
"""
import torch as th
from drtk import interpolate_ext
......@@ -18,6 +23,7 @@ def interpolate(
) -> th.Tensor:
"""
Performs a linear interpolation of the vertex attributes given the barycentric coordinates
Args:
vert_attributes (th.Tensor): vertex attribute tensor
N x V x C
......@@ -27,12 +33,14 @@ def interpolate(
N x H x W
bary_img (th.Tensor): 3D barycentric coordinate image tensor
N x 3 x H x W
Returns:
A tensor with interpolated vertex attributes with a shape [N, C, H, W]
Note:
1. The default of `channels_last` is set to true to make this function backward compatible.
Please consider using the argument `channels_last` instead of permuting the result afterward.
2. By default, the output is not contiguous. Make sure you call .contiguous() if that is a requirement.
.. warning::
The returned tensor has only valid values for pixels which have a valid index in ``index_img``.
For all other pixels, which had index ``-1`` in ``index_img``, the returned tensor will have non-zero
values which should be ignored.
"""
return th.ops.interpolate_ext.interpolate(vert_attributes, vi, index_img, bary_img)
......@@ -44,7 +52,8 @@ def interpolate_ref(
bary_img: th.Tensor,
) -> th.Tensor:
"""
A reference implementation for `interpolate`. See the doc string from `interpolate`
A reference implementation of :func:`drtk.interpolate` in pure PyTorch.
This function is used for tests only, please see :func:`drtk.interpolate` for documentation.
"""
# Run reference implementation in double precision to get as good reference as possible
......
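For readers unfamiliar with barycentric interpolation, the per-pixel computation described in the `interpolate` docstring can be sketched in plain Python. This is an illustrative sketch only, not the DRTK kernel or `interpolate_ref`: the helper name `interpolate_pixel` is hypothetical, it handles a single pixel rather than a batch, and it returns zeros for uncovered pixels, whereas the warning above says the real function may leave non-zero values there.

```python
def interpolate_pixel(vert_attributes, vi, tri_index, bary):
    """Blend the attributes of one triangle's vertices with barycentric weights.

    vert_attributes: list of V per-vertex attribute lists (C channels each)
    vi: list of triangles, each a triple of vertex indices
    tri_index: triangle id for this pixel, or -1 if no triangle covers it
    bary: the three barycentric weights for this pixel
    """
    channels = len(vert_attributes[0])
    if tri_index < 0:
        # Assumption of this sketch: uncovered pixels are filled with zeros.
        return [0.0] * channels
    i0, i1, i2 = vi[tri_index]
    return [
        bary[0] * vert_attributes[i0][c]
        + bary[1] * vert_attributes[i1][c]
        + bary[2] * vert_attributes[i2][c]
        for c in range(channels)
    ]


# One triangle with two attribute channels per vertex.
attrs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
faces = [(0, 1, 2)]

covered = interpolate_pixel(attrs, faces, 0, (0.2, 0.3, 0.5))
uncovered = interpolate_pixel(attrs, faces, -1, (0.0, 0.0, 0.0))
```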
......@@ -24,36 +24,38 @@ def rasterize(
Rasterizes a mesh defined by v and vi.
Args:
v (th.Tensor): vertex positions. The first two components are the projected vertex's location (x, y)
on the image plane. The coordinates of the top left corner are (-0.5, -0.5), and the coordinates of
the bottom right corner are (width - 0.5, height - 0.5). The z component is expected to be in the
camera space coordinate frame (before projection).
v (th.Tensor): vertex positions. The first two components are the projected vertex's
location (x, y) on the image plane. The coordinates of the top left corner are
(-0.5, -0.5), and the coordinates of the bottom right corner are
(width - 0.5, height - 0.5). The z component is expected to be in the camera space
coordinate frame (before projection).
N x V x 3
vi (th.Tensor): face vertex index list tensor. The most significant nibble of vi is reserved for
controlling visibility of the edges in wireframe mode. In non-wireframe mode, content of the most
significant nibble of vi will be ignored.
vi (th.Tensor): face vertex index list tensor. The most significant nibble of vi is
reserved for controlling visibility of the edges in wireframe mode. In non-wireframe
mode, content of the most significant nibble of vi will be ignored.
V x 3
height (int): height of the image in pixels.
width (int): width of the image in pixels.
wireframe (bool): If False (default), rasterizes triangles. If True, rasterizes lines, where the most
significant nibble of vi is reinterpreted as a bit field controlling the visibility of the edges. The
least significant bit controls the visibility of the first edge, the second bit controls the
visibility of the second edge, and the third bit controls the visibility of the third edge. This
limits the maximum number of vertices to 268435455.
wireframe (bool): If False (default), rasterizes triangles. If True, rasterizes lines,
where the most significant nibble of vi is reinterpreted as a bit field controlling
the visibility of the edges. The least significant bit controls the visibility of the
first edge, the second bit controls the visibility of the second edge, and the third
bit controls the visibility of the third edge. This limits the maximum number of
vertices to 268435455.
Returns:
The rasterized image of triangle indices which is represented with an index tensor of a shape
[N, H, W] of type int32 that stores a triangle ID for each pixel. If a triangle covers a pixel and is
the closest triangle to the camera, then the pixel will have the ID of that triangle. If no triangles
cover a pixel, then its ID is -1.
The rasterized image of triangle indices which is represented with an index tensor of a
shape [N, H, W] of type int32 that stores a triangle ID for each pixel. If a triangle
covers a pixel and is the closest triangle to the camera, then the pixel will have the
ID of that triangle. If no triangles cover a pixel, then its ID is -1.
Note:
This function is not differentiable. The gradients should be computed with `edge_grad_estimator`
instead.
This function is not differentiable. The gradients should be computed with
:func:`edge_grad_estimator` instead.
"""
_, index_img = th.ops.rasterize_ext.rasterize(v, vi, height, width, wireframe)
return index_img
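The wireframe bit field described in the `rasterize` docstring can be illustrated with plain integer arithmetic. The sketch below assumes 32-bit indices, so that the most significant nibble occupies bits 28 to 31, which is consistent with the stated vertex limit of 268435455. The helpers `pack_vertex_index` and `unpack_vertex_index` are hypothetical and not part of DRTK, and the sketch does not address how DRTK distributes the flags across the three indices of a face.

```python
INDEX_BITS = 28                      # assumption: vi holds 32-bit integers
INDEX_MASK = (1 << INDEX_BITS) - 1   # largest index that leaves the top nibble free


def pack_vertex_index(index, edge_bits=0):
    """Store a 3-bit edge-visibility field above a 28-bit vertex index."""
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("vertex index does not fit in 28 bits")
    if not 0 <= edge_bits <= 0b111:
        raise ValueError("edge_bits must be a 3-bit value")
    return (edge_bits << INDEX_BITS) | index


def unpack_vertex_index(packed):
    """Return (vertex index, (edge0, edge1, edge2) visibility flags)."""
    index = packed & INDEX_MASK
    nibble = (packed >> INDEX_BITS) & 0xF
    # Least significant bit of the nibble -> first edge, and so on.
    edges = tuple(bool((nibble >> k) & 1) for k in range(3))
    return index, edges


packed = pack_vertex_index(5, edge_bits=0b101)  # first and third edge visible
index, edges = unpack_vertex_index(packed)
```

In non-wireframe mode the docstring says the nibble is ignored, which in this sketch corresponds to using only `packed & INDEX_MASK`.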
......@@ -68,22 +70,24 @@ def rasterize_with_depth(
wireframe: bool = False,
) -> Tuple[th.Tensor, th.Tensor]:
"""
Same as `rasterize` function, additionally returns depth image. Internally it uses the same implementation
as the rasterize function which still computes depth but does not return depth.
Same as :func:`rasterize` function, additionally returns depth image. Internally it uses the
same implementation as the rasterize function which still computes depth but does not return
depth.
Notes:
Note:
The function is not differentiable, including the depth output.
The split is done intentionally to hide the depth image from the user as it is not differentiable which
may cause errors if assumed otherwise. The `barycentrics` function should be used instead to
The split is done intentionally to hide the depth image from the user as it is not
differentiable which may cause errors if assumed otherwise. The `barycentrics` function
should be used instead to
compute the differentiable version of depth.
However, we still provide `rasterize_with_depth` which returns non-differentiable depth which could allow
to avoid call to `barycentrics` function when differentiability is not required.
However, we still provide `rasterize_with_depth` which returns non-differentiable depth which
makes it possible to avoid a call to the `barycentrics` function when differentiability is not required.
Returns:
The rasterized image of triangle indices of shape [N, H, W] and a depth image of shape [N, H, W].
Values of pixels in the depth image not covered by any triangle are 0.
The rasterized image of triangle indices of shape [N, H, W] and a depth image of shape
[N, H, W]. Values of pixels in the depth image not covered by any triangle are 0.
"""
depth_img, index_img = th.ops.rasterize_ext.rasterize(
......
......@@ -22,38 +22,37 @@ def transform(
fov: Optional[th.Tensor] = None,
) -> th.Tensor:
"""
v: Tensor, N x V x 3
Batch of vertex positions for vertices in the mesh.
campos: Tensor, N x 3
Camera position.
camrot: Tensor, N x 3 x 3
Camera rotation matrix.
focal: Tensor, N x 2 x 2
Focal length [[fx, 0],
[0, fy]]
princpt: Tensor, N x 2
Principal point [cx, cy]
K: Tensor, N x 3 x 3
Camera intrinsic calibration matrix. Either this or both (focal,
princpt) must be provided.
Rt: Tensor, N x 3 x 4 or N x 4 x 4
Camera extrinsic matrix. Either this or both (camrot, campos) must be
provided. Camrot is the upper 3x3 of Rt, campos = -R.T @ t.
distortion_mode: List[str]
Names of the distortion modes.
distortion_coeff: Tensor, N x 4
Distortion coefficients.
fov: Tensor, N x 1
Valid field of view of the distortion model.
Projects 3D vertex positions onto the image plane of the camera.
Args:
v (th.Tensor): vertex positions. N x V x 3
campos (Tensor): Camera position. N x 3
camrot (Tensor): Camera rotation matrix. N x 3 x 3
focal (Tensor): Focal length. The upper left 2x2 block of the intrinsic matrix
[[f_x, s], [0, f_y]]. N x 2 x 2
princpt (Tensor): Camera principal point [cx, cy]. N x 2
K (Tensor): Camera intrinsic calibration matrix, N x 3 x 3
Rt (Tensor): Camera extrinsic matrix. N x 3 x 4 or N x 4 x 4
distortion_mode (List[str]): Names of the distortion modes.
distortion_coeff (Tensor): Distortion coefficients. N x 4
fov (Tensor): Valid field of view of the distortion model. N x 1
Returns:
Vertex positions projected onto the image plane of the camera. The last dimension still has
size 3. The first two components are the x and y coordinates on the image plane,
and the z is z component of the vertex positions in the camera frame. The latter is used
for depth values that are written to the z-buffer. N x V x 3
.. warning::
You must specify either ``K`` (intrinsic matrix) or both ``focal`` and ``princpt``
(focal length and principal point).
Additionally, you must provide either ``Rt`` (extrinsic matrix) or both ``campos``
(camera position) and ``camrot`` (camera rotation).
.. note::
If we split ``Rt`` of shape N x 3 x 4 into ``R`` of shape N x 3 x 3 and ``t`` of
shape N x 3 x 1, then: ``camrot`` is ``R``, and ``campos`` is ``-R.T @ t``.
"""
......
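The relation between ``Rt`` and the (``camrot``, ``campos``) pair stated in the note above can be checked with a small plain-Python sketch. The helpers below (`split_rt`, `campos_from_rt`, `project`) are hypothetical and not DRTK functions. The projection step additionally assumes a standard pinhole model without distortion, which the docstring does not spell out, so treat it as an illustration rather than a description of what `transform` does internally.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def split_rt(rt):
    """Split a 3x4 (or 4x4) extrinsic matrix into R (3x3) and t (3)."""
    r = [list(row[:3]) for row in rt[:3]]
    t = [row[3] for row in rt[:3]]
    return r, t


def campos_from_rt(rt):
    """campos = -R.T @ t, as stated in the docstring note."""
    r, t = split_rt(rt)
    return [-x for x in matvec(transpose(r), t)]


def project(v, campos, camrot, focal, princpt):
    """Assumed pinhole projection; keeps camera-space z as the third component."""
    x, y, z = matvec(camrot, [a - b for a, b in zip(v, campos)])
    u = matvec(focal, [x / z, y / z])
    return [u[0] + princpt[0], u[1] + princpt[1], z]


# 90 degree rotation about the z axis plus a translation.
rt = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
]
camrot, t = split_rt(rt)
campos = campos_from_rt(rt)

# The camera centre maps to the origin of the camera frame: R @ campos + t == 0.
residual = [a + b for a, b in zip(matvec(camrot, campos), t)]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project(
    [1.0, 2.0, 4.0],
    campos=[0.0, 0.0, 0.0],
    camrot=identity,
    focal=[[100.0, 0.0], [0.0, 100.0]],
    princpt=[50.0, 60.0],
)
```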