4/N Batchify EdgeGrad

Summary:

# Problem

In CT / State Encoding, we expect a scenario where we would like to render a batch of topologies, each with a different number of vertices and triangles.

Currently the only way to support this with DRTK is to iterate over the batch in a for loop and render each topology separately.

In a series of diffs we would like to solve this by making DRTK consume a batch of triangles instead of a single set of triangles. However, we would like to achieve this without significantly affecting the most common single-topology case.

# How do we pass in multiple topologies in a single batch?

We will provide a TopologyBatch structure in xrcia/lib/graphics/structures that provides functionality to create a `Batch x MaxTriangles x 3` and a `Batch x MaxVertices x 3`.

Padded vertices will be 0s and padded triangles will have `MaxVertices - 1` as their value. These will be discarded as degenerate in rasterization / rendering.

# In this diff

- Extend `edge_grad_estimator` to support a batch dimension by default.
- `edge_grad_kernel` will now unsqueeze the batch dimension when using a single topology.
- We access the vertex indices of triangles by walking an additional `batch stride * n` in the triangles data pointer.
- Add an extra condition to check whether the triangles are degenerate; this happens when padding the batch.
- We show that the optimization continues to produce the same results as in D68538236.

Reviewed By: podgorskiy

Differential Revision: D68534639

fbshipit-source-id: 4f0ed24075d71b73b9313ecc61296e9567219b0d
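The padding scheme described in the summary can be sketched in a few lines. This is only an illustration under stated assumptions: the real `TopologyBatch` API is not shown in this commit, so `pad_topologies` and `is_degenerate` are hypothetical names, plain Python lists stand in for tensors, and the repeated-index test is one plausible way to detect the padded triangles, not necessarily the check DRTK uses.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Tri = Tuple[int, int, int]


def pad_topologies(
    vertices: Sequence[Sequence[Vec3]],
    triangles: Sequence[Sequence[Tri]],
) -> Tuple[List[List[Vec3]], List[List[Tri]]]:
    """Pad per-topology lists to Batch x MaxVertices and Batch x MaxTriangles."""
    max_v = max(len(v) for v in vertices)
    max_t = max(len(t) for t in triangles)
    pad_index = max_v - 1  # padded triangles use MaxVertices - 1 for all corners

    # Padded vertices are zeros.
    batch_v = [list(v) + [(0.0, 0.0, 0.0)] * (max_v - len(v)) for v in vertices]
    # Padded triangles repeat the same index three times, so they have zero area.
    batch_t = [
        list(t) + [(pad_index, pad_index, pad_index)] * (max_t - len(t))
        for t in triangles
    ]
    return batch_v, batch_t


def is_degenerate(tri: Tri) -> bool:
    """Assumed check: a triangle with a repeated vertex index is degenerate."""
    a, b, c = tri
    return a == b or b == c or a == c
```

With one triangle in the first topology and two in the second, the first topology receives a single padded triangle `(3, 3, 3)`, which `is_degenerate` flags so it can be skipped.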

Commit 910228e6 authored Jan 23, 2025 by Kishore Venkateshan Committed by Facebook GitHub Bot Jan 23, 2025

Showing with 10 additions and 6 deletions

drtk/edge_grad_estimator.py +4 -1

src/edge_grad/edge_grad_kernel.cu +6 -5

--- a/drtk/edge_grad_estimator.py
+++ b/drtk/edge_grad_estimator.py
@@ -46,7 +46,7 @@ def edge_grad_estimator(
    Args:
        v_pix (Tensor): Pixel-space vertex coordinates, preserving the original camera-space
            Z-values. Shape: :math:`(N, V, 3)`.
-        vi (Tensor): Face vertex index list tensor. Shape: :math:`(V, 3)`.
+        vi (Tensor): Face vertex index list tensor. Shape: :math:`(F, 3)` or :math:`(N, F, 3)`.
        bary_img (Tensor): 3D barycentric coordinate image tensor. Shape: :math:`(N, 3, H, W)`.
        img (Tensor): The rendered image. Shape: :math:`(N, C, H, W)`.
        index_img (Tensor): Index image tensor. Shape: :math:`(N, H, W)`.
@@ -122,6 +122,9 @@ def edge_grad_estimator(
        optim.step()
    """

+    if vi.ndim == 2:
+        vi = vi[None, ...].expand(v_pix.shape[0], -1, -1)
+
    # TODO: avoid call to interpolate, use backward kernel of interpolate directly
    # Doing so will make `edge_grad_estimator` zero-overhead in forward pass
    # At the moment, value of `v_pix_img` is ignored, and only passed to

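The `expand` call in the Python change and the `n * vi_sN` term in the kernel change fit together through strides. A minimal pure-Python sketch of the offset arithmetic follows, assuming element strides as PyTorch reports them; `load_face` is a hypothetical helper for illustration, not part of DRTK. A tensor produced by `expand` has stride 0 along the expanded dimension, so for a shared `(F, 3)` topology every batch element reads the same face data without a copy.

```python
from typing import List, Sequence, Tuple


def load_face(
    storage: Sequence[int],
    strides: Tuple[int, int, int],
    n: int,
    face: int,
) -> List[int]:
    """Read the three vertex indices of `face` in batch element `n`."""
    s_n, s_v, s_f = strides  # mirrors vi_sN, vi_sV, vi_sF in the kernel
    base = n * s_n + face * s_v
    return [storage[base + k * s_f] for k in range(3)]


# Batched topology, shape (N=2, F=2, 3), contiguous: strides (6, 3, 1).
batched = [0, 1, 2, 1, 3, 2, 4, 5, 6, 5, 7, 6]
# Shared topology, shape (F=2, 3) expanded to (2, 2, 3): batch stride is 0.
shared = [0, 1, 2, 1, 3, 2]

face_batched = load_face(batched, (6, 3, 1), n=1, face=1)
face_shared = load_face(shared, (0, 3, 1), n=1, face=1)
```

Here `face_batched` comes from the second topology's own data, while `face_shared` resolves to the same storage as batch element 0 because the batch stride contributes nothing to the offset.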
--- a/src/edge_grad/edge_grad_kernel.cu
+++ b/src/edge_grad/edge_grad_kernel.cu
@@ -176,8 +176,9 @@ __global__ void edge_grad_backward_kernel(
  const index_t grad_v_pix_img_sH = grad_v_pix_img.strides[2];
  const index_t grad_v_pix_img_sW = grad_v_pix_img.strides[3];

-  const index_t vi_sV = vi.strides[0];
-  const index_t vi_sF = vi.strides[1];
+  const index_t vi_sN = vi.strides[0];
+  const index_t vi_sV = vi.strides[1];
+  const index_t vi_sF = vi.strides[2];

  CUDA_KERNEL_LOOP_TYPE(index, nthreads, index_t) {
    const index_t x = index % W;
@@ -211,11 +212,11 @@ __global__ void edge_grad_backward_kernel(
      // vertex indices of triangles of CRD pixels
      // 0,0,0 - if not valid
      const int3 vi_pt_center = load_vec3_if_valid<int32_t, index_t>(
-          vi.data + center_index * vi_sV, vi_sF, c_valid, {0, 0, 0});
+          vi.data + n * vi_sN + center_index * vi_sV, vi_sF, c_valid, {0, 0, 0});
      const int3 vi_pt_right = load_vec3_if_valid<int32_t, index_t>(
-          vi.data + right_index * vi_sV, vi_sF, r_valid, {0, 0, 0});
+          vi.data + n * vi_sN + right_index * vi_sV, vi_sF, r_valid, {0, 0, 0});
      const int3 vi_pt_down = load_vec3_if_valid<int32_t, index_t>(
-          vi.data + down_index * vi_sV, vi_sF, d_valid, {0, 0, 0});
+          vi.data + n * vi_sN + down_index * vi_sV, vi_sF, d_valid, {0, 0, 0});

      // center <-> right differ
      const bool lr_diff = (center_index != right_index);