<li><a href="#differences-between-pytorch-and-tensorflow">Differences between PyTorch and TensorFlow</a><ul>
<li><a href="#manual-opengl-contexts-in-pytorch">Manual OpenGL contexts in PyTorch</a></li>
</ul></li>
...
...
@@ -524,8 +528,8 @@ For 2D textures, the coordinate origin <span class="math inline">(<em>s</em>,
<p>We skirted around a pretty fundamental question in the description of the texturing operation above. In order to determine the proper amount of prefiltering for sampling a texture, we need to know how densely it is being sampled. But how can we know the sampling density when each pixel knows of just a single surface point?</p>
<p>The solution is to track the image-space derivatives of all things leading up to the texture sampling operation. <em>These are not the same thing as the gradients used in the backward pass</em>, even though they both involve differentiation! Consider the barycentrics <span class="math inline">(<em>u</em>, <em>v</em>)</span> produced by the rasterization operation. They change by some amount when moving horizontally or vertically in the image plane. If we denote the image-space coordinates as <span class="math inline">(<em>X</em>, <em>Y</em>)</span>, the image-space derivatives of the barycentrics would be <span class="math inline">∂<em>u</em>/∂<em>X</em></span>, <span class="math inline">∂<em>u</em>/∂<em>Y</em></span>, <span class="math inline">∂<em>v</em>/∂<em>X</em></span>, and <span class="math inline">∂<em>v</em>/∂<em>Y</em></span>. We can organize these into a 2×2 Jacobian matrix that describes the local relationship between <span class="math inline">(<em>u</em>, <em>v</em>)</span> and <span class="math inline">(<em>X</em>, <em>Y</em>)</span>. This matrix is generally different at every pixel. For the purpose of image-space derivatives, the units of <span class="math inline"><em>X</em></span> and <span class="math inline"><em>Y</em></span> are pixels. Hence, <span class="math inline">∂<em>u</em>/∂<em>X</em></span> is the local approximation of how much <span class="math inline"><em>u</em></span> changes when moving a distance of one pixel in the horizontal direction, and so on.</p>
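<p>Written out, the per-pixel Jacobian described above simply collects the four derivatives into one matrix (shown here in LaTeX notation for brevity):</p>
<pre><code>J = \frac{\partial(u, v)}{\partial(X, Y)} =
    \begin{pmatrix}
      \partial u/\partial X &amp; \partial u/\partial Y \\
      \partial v/\partial X &amp; \partial v/\partial Y
    \end{pmatrix}</code></pre>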
<p>Once we know how the barycentrics change w.r.t. pixel position, the interpolation operation can use this to determine how the attributes change w.r.t. pixel position. When attributes are used as texture coordinates, we can therefore tell how the texture sampling position (in texture space) changes when moving around within the pixel (up to a local, linear approximation, that is). This <em>texture footprint</em> tells us the scale on which the texture should be prefiltered. In more practical terms, it tells us which mipmap level(s) to use when sampling the texture.</p>
<p>In nvdiffrast, the rasterization operation can be configured to output the image-space derivatives of the barycentrics in an auxiliary 4-channel output tensor, ordered (<span class="math inline">∂<em>u</em>/∂<em>X</em></span>, <span class="math inline">∂<em>u</em>/∂<em>Y</em></span>, <span class="math inline">∂<em>v</em>/∂<em>X</em></span>, <span class="math inline">∂<em>v</em>/∂<em>Y</em></span>) from channel 0 to 3. The interpolation operation can take this auxiliary tensor as input and compute image-space derivatives of any set of attributes being interpolated. Finally, the texture sampling operation can use the image-space derivatives of the texture coordinates to determine the amount of prefiltering.</p>
<p>There is nothing magic about these image-space derivatives. They are tensors just like, e.g., the texture coordinates themselves; they propagate gradients backwards, and so on. For example, if you want to artificially blur or sharpen the texture when sampling it, you can simply multiply the tensor carrying the image-space derivatives of the texture coordinates <span class="math inline">∂{<em>s</em>, <em>t</em>}/∂{<em>X</em>, <em>Y</em>}</span> by a scalar value before feeding it into the texture sampling operation. This scales the texture footprints and thus adjusts the amount of prefiltering. If your loss function prefers a different level of sharpness, this multiplier will receive a nonzero gradient. <em>Update:</em> Since version 0.2.1, the texture sampling operation also supports a separate mip level bias input that would be better suited for this particular task, but the gist is the same nonetheless.</p>
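<p>To make the data flow concrete, here is a minimal sketch using nvdiffrast's PyTorch API and an existing rasterizer context <code>glctx</code>. The tensor names (<code>pos</code>, <code>tri</code>, <code>uv</code>, <code>uv_tri</code>, <code>tex</code>) and the scalar <code>blur_factor</code> are placeholders, not part of the library:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import torch
import nvdiffrast.torch as dr

# Rasterize; the second output (rast_db) holds (du/dX, du/dY, dv/dX, dv/dY) per pixel.
rast, rast_db = dr.rasterize(glctx, pos, tri, resolution=[512, 512])

# Interpolate texture coordinates and, via rast_db, their image-space derivatives.
uv_pix, uv_da = dr.interpolate(uv, rast, uv_tri, rast_db=rast_db, diff_attrs='all')

# Scale the texture footprints before prefiltered sampling: values above 1 blur,
# values below 1 sharpen. The multiplier receives a gradient in the backward pass.
blur_factor = torch.tensor(1.0, device=pos.device, requires_grad=True)
color = dr.texture(tex, uv_pix, uv_da * blur_factor, filter_mode='linear-mipmap-linear')</code></pre></div>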
<p>One might wonder if it would have been easier to determine the texture footprints simply from the texture coordinates in adjacent pixels, and skip all this derivative rubbish? In easy cases the answer is yes, but silhouettes, occlusions, and discontinuous texture parameterizations would make this approach rather unreliable in practice. Computing the image-space derivatives analytically keeps everything point-like, local, and well-behaved.</p>
<p>It should be noted that computing gradients related to image-space derivatives is somewhat involved and requires additional computation. At the same time, they are often not crucial for the convergence of the training/optimization. Because of this, the primitive operations in nvdiffrast offer options to disable the calculation of these gradients. We're talking about things like <span class="math inline">∂<em>Loss</em>/∂(∂{<em>u</em>, <em>v</em>}/∂{<em>X</em>, <em>Y</em>})</span> that may look second-order-ish, but they're not.</p>
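<p>As a sketch of the kind of options meant here: in the PyTorch API, <code>rasterize()</code> takes a <code>grad_db</code> flag and <code>interpolate()</code> lets you restrict <code>diff_attrs</code> to only the attributes that actually need image-space derivatives. The tensor names below are placeholders:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"># Do not propagate gradients of the barycentric derivatives into pos.
rast, rast_db = dr.rasterize(glctx, pos, tri, resolution=[512, 512], grad_db=False)

# Compute image-space derivatives only for attribute channels 2 and 3
# (e.g. the texture coordinates) instead of all attributes.
attr_pix, attr_da = dr.interpolate(attr, rast, tri, rast_db=rast_db, diff_attrs=[2, 3])</code></pre></div>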
<h3 id="mipmaps-and-texture-dimensions">Mipmaps and texture dimensions</h3>
...
...
@@ -725,6 +729,39 @@ Mip level 5
<p>Nvdiffrast supports computation on multiple GPUs in both PyTorch and TensorFlow. As is the convention in PyTorch, the operations are always executed on the device on which the input tensors reside. All GPU input tensors must reside on the same device, and the output tensors will unsurprisingly end up on that same device. In addition, the rasterization operation requires that its OpenGL context was created for the correct device. In TensorFlow, the OpenGL context is automatically created on the device of the rasterization operation when it is executed for the first time.</p>
<p>On Windows, nvdiffrast implements OpenGL device selection in a way that can be done only once per process — after one context is created, all future ones will end up on the same GPU. Hence you cannot expect to run the rasterization operation on multiple GPUs within the same process. Trying to do so will either cause a crash or incur a significant performance penalty. However, with PyTorch it is common to distribute computation across GPUs by launching a separate process for each GPU, so this is not a huge concern. Note that any OpenGL context created within the same process, even for something like a GUI window, will prevent changing the device later. Therefore, if you want to run the rasterization operation on other than the default GPU, be sure to create its OpenGL context before initializing any other OpenGL-powered libraries.</p>
<p>On Linux everything just works, and you can create rasterizer OpenGL contexts on multiple devices within the same process.</p>
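<p>For example, in a typical one-process-per-GPU PyTorch setup, the context is created for that process's device before any other OpenGL-using library is initialized. The <code>local_rank</code> variable below is a placeholder for however your launcher communicates the device index:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import torch
import nvdiffrast.torch as dr

device = torch.device('cuda', local_rank)

# Create the rasterizer's OpenGL context for this process's GPU first,
# before any other OpenGL context is created in the process (see above).
glctx = dr.RasterizeGLContext(device=device)

# All GPU input tensors must reside on the same device; outputs land there too.
pos, tri = pos.to(device), tri.to(device)
rast, rast_db = dr.rasterize(glctx, pos, tri, resolution=[512, 512])</code></pre></div>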
<h3 id="rendering-multiple-depth-layers">Rendering multiple depth layers</h3>
<p>Sometimes there is a need to render scenes with partially transparent surfaces. In this case, it is not sufficient to find only the surfaces that are closest to the camera, as you may also need to know what lies behind them. For this purpose, nvdiffrast supports <em>depth peeling</em> that lets you extract multiple closest surfaces for each pixel.</p>
<p>With depth peeling, we start by rasterizing the closest surfaces as usual. We then perform a second rasterization pass with the same geometry, but this time we cull all previously rendered surface points at each pixel, effectively extracting the second-closest depth layer. This can be repeated as many times as desired, so that we can extract as many depth layers as we like. See the images below for example results of depth peeling with each depth layer shaded and antialiased.</p>
<divclass="image-parent">
<divclass="image-row">
<divclass="image-caption">
<imgclass="brd"src="img/spot_aa.png"/>
<divclass="caption">
First depth layer
</div>
</div>
<divclass="image-caption">
<imgclass="brd"src="img/spot_peel1.png"/>
<divclass="caption">
Second depth layer
</div>
</div>
<divclass="image-caption">
<imgclass="brd"src="img/spot_peel2.png"/>
<divclass="caption">
Third depth layer
</div>
</div>
</div>
</div>
<p>The API for depth peeling is based on <code>DepthPeeler</code> object that acts as a <ahref="https://docs.python.org/3/reference/datamodel.html#context-managers">context manager</a>, and its <code>rasterize_next_layer</code> method. The first call to <code>rasterize_next_layer</code> is equivalent to calling the traditional <code>rasterize</code> function, and subsequent calls report further depth layers. The arguments for rasterization are specified when instantiating the <code>DepthPeeler</code> object. Concretely, your code might look something like this:</p>
<aclass="sourceLine"id="cb8-4"data-line-number="4"> (process <spanclass="kw">or</span> store the results)</a></code></pre></div>
<p>There is no performance penalty compared to the basic rasterization op if you end up extracting only the first depth layer. In other words, the code above with <code>num_layers=1</code> runs exactly as fast as calling <code>rasterize</code> once.</p>
<p>Depth peeling is only supported in the PyTorch version of nvdiffrast. For implementation reasons, depth peeling reserves the OpenGL context so that other rasterization operations cannot be performed while the peeling is ongoing, i.e., inside the <code>with</code> block. Hence you cannot start a nested depth peeling operation or call <code>rasterize</code> inside the <code>with</code> block, unless you use a different OpenGL context.</p>
<p>For the sake of completeness, let us note the following small caveat: Depth peeling relies on depth values to distinguish surface points from each other. Therefore, culling "previously rendered surface points" actually means culling all surface points at the same or a closer depth than those rendered into the pixel in previous passes. This matters only if you have multiple layers of geometry at matching depths — if your geometry consists of, say, nothing but two exactly overlapping triangles, you will see one of them in the first pass but never see the other one in subsequent passes, as it lies at the exact depth that is already considered done.</p>
<h3 id="differences-between-pytorch-and-tensorflow">Differences between PyTorch and TensorFlow</h3>
<p>Nvdiffrast can be used from PyTorch and from TensorFlow 1.x; the latter may change to TensorFlow 2.x if there is demand. These frameworks operate somewhat differently and that is reflected in the respective APIs. Simplifying a bit, in TensorFlow 1.x you construct a persistent graph out of persistent nodes, and run many batches of data through it. In PyTorch, there is no persistent graph or nodes, but a new, ephemeral graph is constructed for each batch of data and destroyed immediately afterwards. Therefore, there is also no persistent state for the operations. There is the <code>torch.nn.Module</code> abstraction for festooning operations with persistent state, but we do not use it.</p>
<p>As a consequence, things that would be part of persistent state of an nvdiffrast operation in TensorFlow must be stored by the user in PyTorch, and supplied to the operations as needed. In practice, this is a very small difference and amounts to just a couple of lines of code in most cases.</p>
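<p>In practice this means, e.g., creating the rasterizer's OpenGL context once in user code and passing it to the operations on every iteration. A minimal sketch; the loop and tensor names are placeholders:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import nvdiffrast.torch as dr

# Persistent state lives in user code: create the OpenGL context once...
glctx = dr.RasterizeGLContext()

for pos in pos_batches:
    # ...and hand it to the rasterization op on every iteration.
    rast, rast_db = dr.rasterize(glctx, pos, tri, resolution=[256, 256])</code></pre></div>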
...
...
@@ -859,20 +896,25 @@ specified, context will be created on currently active Cuda
device.</td></tr></table><div class="methods">Methods, only available if context was created in manual mode:</div><table class="args"><tr class="arg"><td class="argname">set_context()</td><td class="arg_short">Set (activate) OpenGL context in the current CPU thread.</td></tr><tr class="arg"><td class="argname">release_context()</td><td class="arg_short">Release (deactivate) currently active OpenGL context.</td></tr></table><div class="returns">Returns:<div class="return_description">The newly created OpenGL rasterizer context.</div></div></div>
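<p>As a sketch of how the manual mode methods above might be used (the surrounding tensors are assumed to exist; in the default automatic mode these calls are not needed):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"># Manual mode: the caller activates and releases the OpenGL context explicitly,
# e.g. when rasterizing from a thread of its own choosing.
glctx = dr.RasterizeGLContext(mode='manual')

glctx.set_context()
try:
    rast, rast_db = dr.rasterize(glctx, pos, tri, resolution=[512, 512])
finally:
    glctx.release_context()</code></pre></div>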
<pclass="shortdesc">Rasterize triangles.</p><pclass="longdesc">All input tensors must be contiguous and reside in GPU memory except for
the <code>ranges</code> tensor that, if specified, has to reside in CPU memory. The
output tensors will be contiguous and reside in GPU memory.</p><div class="arguments">Arguments:</div><table class="args"><tr class="arg"><td class="argname">glctx</td><td class="arg_short">OpenGL context of type <code>RasterizeGLContext</code>.</td></tr><tr class="arg"><td class="argname">pos</td><td class="arg_short">Vertex position tensor with dtype <code>torch.float32</code>. To enable range
mode, this tensor should have a 2D shape [num_vertices, 4]. To enable
instanced mode, use a 3D shape [minibatch_size, num_vertices, 4].</td></tr><tr class="arg"><td class="argname">tri</td><td class="arg_short">Triangle tensor with shape [num_triangles, 3] and dtype <code>torch.int32</code>.</td></tr><tr class="arg"><td class="argname">resolution</td><td class="arg_short">Output resolution as integer tuple (height, width).</td></tr><tr class="arg"><td class="argname">ranges</td><td class="arg_short">In range mode, tensor with shape [minibatch_size, 2] and dtype
<code>torch.int32</code>, specifying start indices and counts into <code>tri</code>.
Ignored in instanced mode.</td></tr><tr class="arg"><td class="argname">grad_db</td><td class="arg_short">Propagate gradients of image-space derivatives of barycentrics
into <code>pos</code> in backward pass. Ignored if OpenGL context was
not configured to output image-space derivatives.</td></tr></table><div class="returns">Returns:<div class="return_description">A tuple of two tensors. The first output tensor has shape [minibatch_size,
height, width, 4] and contains the main rasterizer output in order (u, v, z/w,
triangle_id). If the OpenGL context was configured to output image-space
derivatives of barycentrics, the second output tensor will also have shape
[minibatch_size, height, width, 4] and contain said derivatives in order
(du/dX, du/dY, dv/dX, dv/dY). Otherwise it will be an empty tensor with shape
[minibatch_size, height, width, 0].</div></div></div>
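<p>For reference, a minimal range-mode call might look as follows; the vertex counts and the (start, count) pairs in <code>ranges</code> are made-up placeholders:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"># Range mode: a single 2D position buffer shared by the whole minibatch,
# with per-minibatch-element (start, count) pairs indexing into tri.
pos_2d = pos.reshape(-1, 4)                  # [num_vertices, 4] selects range mode
ranges = torch.tensor([[0, 100], [100, 250]], dtype=torch.int32)  # must stay on the CPU

rast, rast_db = dr.rasterize(glctx, pos_2d, tri, resolution=[256, 256], ranges=ranges)</code></pre></div>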
<pclass="shortdesc">Create a depth peeler object for rasterizing multiple depth layers.</p><pclass="longdesc">Arguments are the same as in <code>rasterize()</code>.</p><divclass="returns">Returns:<divclass="return_description">The newly created depth peeler.</div></div></div>
<pclass="shortdesc">Rasterize next depth layer.</p><pclass="longdesc">Operation is equivalent to <code>rasterize()</code> except that previously reported
surface points are culled away.</p><divclass="returns">Returns:<divclass="return_description">A tuple of two tensors as in <code>rasterize()</code>.</div></div></div>
<pclass="shortdesc">Interpolate vertex attributes.</p><pclass="longdesc">All input tensors must be contiguous and reside in GPU memory. The output tensors
will be contiguous and reside in GPU memory.</p><divclass="arguments">Arguments:</div><tableclass="args"><trclass="arg"><tdclass="argname">attr</td><tdclass="arg_short">Attribute tensor with dtype <code>torch.float32</code>.