<li><a href="#differences-between-pytorch-and-tensorflow">Differences between PyTorch and TensorFlow</a><ul>
<li><a href="#manual-opengl-contexts-in-pytorch">Manual OpenGL contexts in PyTorch</a></li>
...
...
<h3 id="linux">Linux</h3>
<p>We recommend running nvdiffrast on <a href="https://www.docker.com/">Docker</a>. To build a Docker image with nvdiffrast and PyTorch 1.6 installed, run:</p>
<p>We recommend using Ubuntu, as some Linux distributions might not have all the required packages available — at least CentOS is reportedly problematic.</p>
<p>To try out some of the provided code examples, run:</p>
<p>Alternatively, if you have all the dependencies taken care of (consult the included Dockerfile for reference), you can install nvdiffrast in your local Python site-packages by running</p>
...
...
<p>Nvdiffrast offers four differentiable rendering primitives: <strong>rasterization</strong>, <strong>interpolation</strong>, <strong>texturing</strong>, and <strong>antialiasing</strong>. The operation of the primitives is described here in a platform-agnostic way. Platform-specific documentation can be found in the API reference section.</p>
<p>In this section we ignore the minibatch axis for clarity and assume a minibatch size of one. However, all operations support minibatches as detailed later.</p>
<h3 id="rasterization">Rasterization</h3>
<p>The rasterization operation takes as inputs a tensor of vertex positions and a tensor of vertex index triplets that specify the triangles. Vertex positions are specified in clip space, i.e., after modelview and projection transformations. Performing these transformations is left as the user's responsibility. In clip space, the view frustum is a cube in homogeneous coordinates where <span class="math inline"><em>x</em>/<em>w</em></span>, <span class="math inline"><em>y</em>/<em>w</em></span>, <span class="math inline"><em>z</em>/<em>w</em></span> are all between -1 and +1.</p>
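As a concrete illustration of the clip-space convention, the sketch below pushes one vertex through an OpenGL-style projection matrix and applies the frustum test. The matrix values and variable names are illustrative assumptions for this example only; they are not part of nvdiffrast's API.

```python
import numpy as np

# Illustrative OpenGL-style perspective projection matrix (90-degree
# vertical field of view, aspect ratio 1, near 0.1, far 100). The
# modelview transform is assumed to be identity here.
n, f = 0.1, 100.0
proj = np.array([
    [1.0, 0.0,  0.0,                0.0],
    [0.0, 1.0,  0.0,                0.0],
    [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
    [0.0, 0.0, -1.0,                0.0],
])

# A point in front of the camera (the camera looks down the -z axis).
p_world = np.array([0.5, -0.25, -2.0, 1.0])

# Clip-space position: this is what the rasterizer takes as input.
x, y, z, w = proj @ p_world

# The point lies inside the view frustum iff x/w, y/w, z/w are all
# between -1 and +1, i.e., |x|, |y|, |z| <= w for positive w.
inside = w > 0 and all(abs(c) <= w for c in (x, y, z))
```

The clip-space positions would be passed to the rasterizer as-is; the division by <span class="math inline"><em>w</em></span> happens inside the rasterization operation.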
<p>The output of the rasterization operation is a 4-channel float32 image with tuple (<span class="math inline"><em>u</em></span>, <span class="math inline"><em>v</em></span>, <span class="math inline"><em>z</em>/<em>w</em></span>, <span class="math inline"><em>triangle_id</em></span>) in each pixel. Values <span class="math inline"><em>u</em></span> and <span class="math inline"><em>v</em></span> are the barycentric coordinates within a triangle: the first vertex in the vertex index triplet obtains <span class="math inline">(<em>u</em>, <em>v</em>) = (1, 0)</span>, the second vertex <span class="math inline">(<em>u</em>, <em>v</em>) = (0, 1)</span>, and the third vertex <span class="math inline">(<em>u</em>, <em>v</em>) = (0, 0)</span>. Normalized depth value <span class="math inline"><em>z</em>/<em>w</em></span> is used later by the antialiasing operation to infer occlusion relations between triangles, and it does not propagate gradients to the vertex position input. Field <span class="math inline"><em>triangle_id</em></span> is the triangle index, offset by one. Pixels where no triangle was rasterized will receive a zero in all channels.</p>
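The barycentric convention can be verified with a few lines of NumPy. The <code>barycentrics</code> helper below is a hypothetical name for illustration, not nvdiffrast API: it solves for (<em>u</em>, <em>v</em>) of a 2D point inside a triangle and reproduces the assignment (1, 0), (0, 1), (0, 0) for the triangle's three vertices.

```python
import numpy as np

def barycentrics(p, a, b, c):
    # Solve p = u*a + v*b + (1 - u - v)*c for (u, v); a, b, c are the
    # triangle's vertices in the order given by the vertex index triplet.
    m = np.column_stack([a - c, b - c])   # 2x2 linear system
    u, v = np.linalg.solve(m, p - c)
    return u, v

a = np.array([0.0, 0.0])
b = np.array([1.0, 0.0])
c = np.array([0.0, 1.0])
```

Evaluating the helper at <code>a</code>, <code>b</code>, and <code>c</code> yields (1, 0), (0, 1), and (0, 0) respectively, matching the convention described above.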
<p>Rasterization is point-sampled, i.e., the geometry is not smoothed, blurred, or made partially transparent in any way, in contrast to some previous differentiable rasterizers. The contents of a pixel always represent a single surface point that is on the closest surface visible along the ray through the pixel center.</p>
<p>Point-sampled coverage does not produce vertex position gradients related to occlusion and visibility effects. This is because the motion of vertices does not change the coverage in a continuous way — a triangle is either rasterized into a pixel or not. In nvdiffrast, the occlusion/visibility related gradients are generated in the antialiasing operation that typically occurs towards the end of the rendering pipeline.</p>
<div class="image-parent">
...
...
<p>Nvdiffrast follows OpenGL's coordinate systems and other conventions. This is partially because we use OpenGL to accelerate the rasterization operation, but mostly so that there is a <a href="https://xkcd.com/927/">single standard to follow</a>.</p>
<ul>
<li>
When rasterizing, the normalized device coordinates — i.e., clip-space coordinates after division by <span class="math inline"><em>w</em></span> — map to the screen so that <span class="math inline"><em>x</em></span> increases towards the right side of the screen, <span class="math inline"><em>y</em></span> increases towards the top of the screen, and <strong><span class="math inline"><em>z</em></span> increases towards the viewer</strong>.
</li>
<li>
<strong>The memory order of image data in OpenGL, and consequently in nvdiffrast, is bottom-up.</strong> This means that row 0 of a tensor containing an image is the bottom row of the texture/image, which is the opposite of the more common scanline order. If you want to keep your image data in the conventional top-down order in your code, but have it logically the right way up inside nvdiffrast, you will need to flip the images vertically when crossing the boundary.
...
...
<p>Nvdiffrast supports computation on multiple GPUs in both PyTorch and TensorFlow. As is the convention in PyTorch, the operations are always executed on the device on which the input tensors reside. All GPU input tensors must reside on the same device, and the output tensors will unsurprisingly end up on that same device. In addition, the rasterization operation requires that its OpenGL context was created for the correct device. In TensorFlow, the OpenGL context is automatically created on the device of the rasterization operation when it is executed for the first time.</p>
<p>On Windows, nvdiffrast implements OpenGL device selection in a way that can be done only once per process — after one context is created, all future ones will end up on the same GPU. Hence you cannot expect to run the rasterization operation on multiple GPUs within the same process. Trying to do so will either cause a crash or incur a significant performance penalty. However, with PyTorch it is common to distribute computation across GPUs by launching a separate process for each GPU, so this is not a huge concern. Note that any OpenGL context created within the same process, even for something like a GUI window, will prevent changing the device later. Therefore, if you want to run the rasterization operation on other than the default GPU, be sure to create its OpenGL context before initializing any other OpenGL-powered libraries.</p>
<p>On Linux everything just works, and you can create rasterizer OpenGL contexts on multiple devices within the same process.</p>
<h4 id="note-on-torch.nn.dataparallel">Note on torch.nn.DataParallel</h4>
<p>PyTorch offers the <code>torch.nn.DataParallel</code> wrapper class for splitting the execution of a minibatch across multiple threads. Unfortunately, this class is fundamentally incompatible with OpenGL-dependent operations, as it spawns a new set of threads at each call (as of PyTorch 1.9.0, at least). Using previously created OpenGL contexts in these new threads, even when taking care not to use the same context in multiple threads, causes the contexts to be migrated between threads, which has resulted in ever-growing GPU memory usage and abysmal GPU utilization. We therefore advise against using <code>torch.nn.DataParallel</code> for rasterization operations that depend on OpenGL contexts.</p>
<p>Notably, <code>torch.nn.DistributedDataParallel</code> spawns subprocesses that are much more persistent. The subprocesses must create their own OpenGL contexts as part of initialization, and as such they do not suffer from this problem.</p>
<p>GitHub issue <a href="https://github.com/NVlabs/nvdiffrast/issues/23">#23</a>, especially <a href="https://github.com/NVlabs/nvdiffrast/issues/23#issuecomment-851577382">this comment</a>, contains further analysis and suggestions for workarounds.</p>
<p>Sometimes there is a need to render scenes with partially transparent surfaces. In this case, it is not sufficient to find only the surfaces that are closest to the camera, as you may also need to know what lies behind them. For this purpose, nvdiffrast supports <em>depth peeling</em> that lets you extract multiple closest surfaces for each pixel.</p>
<p>With depth peeling, we start by rasterizing the closest surfaces as usual. We then perform a second rasterization pass with the same geometry, but this time we cull all previously rendered surface points at each pixel, effectively extracting the second-closest depth layer. This can be repeated as many times as desired, so that we can extract as many depth layers as we like. See the images below for example results of depth peeling with each depth layer shaded and antialiased.</p>
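The peeling loop described above can be sketched per pixel in NumPy. This is only an illustration of the idea, using a hypothetical <code>peel_layers</code> helper; nvdiffrast's actual implementation performs the culling on the GPU as part of rasterization.

```python
import numpy as np

def peel_layers(frag_depth, num_layers):
    # frag_depth: [num_fragments, height, width] array of per-fragment
    # depths, with np.inf where a fragment does not cover a pixel.
    # Returns num_layers depth maps: layer 0 is the closest surface in
    # each pixel, layer 1 the second closest, and so on (np.inf where
    # fewer surfaces exist).
    layers = []
    peeled = np.full(frag_depth.shape[1:], -np.inf)  # depth peeled so far
    for _ in range(num_layers):
        # Cull everything at or in front of the previously peeled depth...
        visible = np.where(frag_depth > peeled, frag_depth, np.inf)
        # ...and keep the closest remaining fragment in each pixel.
        peeled = visible.min(axis=0)
        layers.append(peeled)
    return layers
```

Each iteration corresponds to one rasterization pass in the description above: the first pass returns the ordinary depth buffer, and every subsequent pass sees only the geometry behind the layers already extracted.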
...
...
<p class="shortdesc">Rasterize triangles.</p><p class="longdesc">All input tensors must be contiguous and reside in GPU memory except for
the <code>ranges</code> tensor that, if specified, has to reside in CPU memory. The
output tensors will be contiguous and reside in GPU memory.</p><p class="longdesc">Note: For an unknown reason, on Windows the very first rasterization call using
a newly created OpenGL context may <em>sometimes</em> output a blank buffer. This is a
known bug and has never been observed to affect subsequent calls.</p><div class="arguments">Arguments:</div><table class="args"><tr class="arg"><td class="argname">glctx</td><td class="arg_short">OpenGL context of type <code>RasterizeGLContext</code>.</td></tr><tr class="arg"><td class="argname">pos</td><td class="arg_short">Vertex position tensor with dtype <code>torch.float32</code>. To enable range
mode, this tensor should have a 2D shape [num_vertices, 4]. To enable
instanced mode, use a 3D shape [minibatch_size, num_vertices, 4].</td></tr><tr class="arg"><td class="argname">tri</td><td class="arg_short">Triangle tensor with shape [num_triangles, 3] and dtype <code>torch.int32</code>.</td></tr><tr class="arg"><td class="argname">resolution</td><td class="arg_short">Output resolution as integer tuple (height, width).</td></tr><tr class="arg"><td class="argname">ranges</td><td class="arg_short">In range mode, tensor with shape [minibatch_size, 2] and dtype
<code>torch.int32</code>, specifying start indices and counts into <code>tri</code>.
...
...
centers of the boundary texels. Mode 'zero' virtually extends the texture with
all-zero values in all directions.</td></tr><tr class="arg"><td class="argname">max_mip_level</td><td class="arg_short">If specified, limits the number of mipmaps constructed and used in mipmap-based
filter modes.</td></tr></table><div class="returns">Returns:<div class="return_description">A tensor containing the results of the texture sampling with shape
<p class="shortdesc">Construct a mipmap stack for a texture.</p><p class="longdesc">This function can be used for constructing a mipmap stack for a texture that is known to remain
constant. This avoids reconstructing it every time <code>texture()</code> is called.</p><div class="arguments">Arguments:</div><table class="args"><tr class="arg"><td class="argname">tex</td><td class="arg_short">Texture tensor with the same constraints as in <code>texture()</code>.</td></tr><tr class="arg"><td class="argname">max_mip_level</td><td class="arg_short">If specified, limits the number of mipmaps constructed.</td></tr><tr class="arg"><td class="argname">cube_mode</td><td class="arg_short">Must be set to True if <code>tex</code> specifies a cube map texture.</td></tr></table><div class="returns">Returns:<div class="return_description">An opaque object containing the mipmap stack. This can be supplied in a call to <code>texture()</code>
...
...
GPU memory.</td></tr></table><div class="returns">Returns:<div class="return_description">An opaque object containing the topology hash. This can be supplied in a call to
<code>antialias()</code> in the <code>topology_hash</code> argument.</div></div></div>
<p class="shortdesc">Get current log level.</p><div class="returns">Returns:<div class="return_description">Current log level in nvdiffrast. See <code>set_log_level()</code> for possible values.</div></div></div>