Commit 0c39902a authored by Samuli Laine

Bugfixes to #30, #32, #36

parent a4e7a4db
@@ -318,7 +318,9 @@ div.image-parent {
 <li><a href="#geometry-and-minibatches-range-mode-vs-instanced-mode">Geometry and minibatches: Range mode vs Instanced mode</a></li>
 <li><a href="#image-space-derivatives">Image-space derivatives</a></li>
 <li><a href="#mipmaps-and-texture-dimensions">Mipmaps and texture dimensions</a></li>
-<li><a href="#running-on-multiple-gpus">Running on multiple GPUs</a></li>
+<li><a href="#running-on-multiple-gpus">Running on multiple GPUs</a><ul>
+<li><a href="#note-on-torch.nn.dataparallel">Note on torch.nn.DataParallel</a></li>
+</ul></li>
 <li><a href="#rendering-multiple-depth-layers">Rendering multiple depth layers</a></li>
 <li><a href="#differences-between-pytorch-and-tensorflow">Differences between PyTorch and TensorFlow</a><ul>
 <li><a href="#manual-opengl-contexts-in-pytorch">Manual OpenGL contexts in PyTorch</a></li>
@@ -368,6 +370,7 @@ Examples of things we've done with nvdiffrast
 <h3 id="linux">Linux</h3>
 <p>We recommend running nvdiffrast on <a href="https://www.docker.com/">Docker</a>. To build a Docker image with nvdiffrast and PyTorch 1.6 installed, run:</p>
 <div class="sourceCode" id="cb2"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb2-1" data-line-number="1"><span class="ex">./run_sample.sh</span> --build-container</a></code></pre></div>
+<p>We recommend using Ubuntu, as some Linux distributions might not have all the required packages available — at least CentOS is reportedly problematic.</p>
 <p>To try out some of the provided code examples, run:</p>
 <div class="sourceCode" id="cb3"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb3-1" data-line-number="1"><span class="ex">./run_sample.sh</span> ./samples/torch/cube.py --resolution 32</a></code></pre></div>
 <p>Alternatively, if you have all the dependencies taken care of (consult the included Dockerfile for reference), you can install nvdiffrast in your local Python site-packages by running</p>
@@ -389,8 +392,8 @@ Examples of things we've done with nvdiffrast
 <p>Nvdiffrast offers four differentiable rendering primitives: <strong>rasterization</strong>, <strong>interpolation</strong>, <strong>texturing</strong>, and <strong>antialiasing</strong>. The operation of the primitives is described here in a platform-agnostic way. Platform-specific documentation can be found in the API reference section.</p>
 <p>In this section we ignore the minibatch axis for clarity and assume a minibatch size of one. However, all operations support minibatches as detailed later.</p>
 <h3 id="rasterization">Rasterization</h3>
-<p>The rasterization operation takes as inputs a tensor of vertex positions and a tensor of vertex index triplets that specify the triangles. Vertex positions are specified in NDC (Normalized Device Coordinate) space, i.e., after modelview and projection transformations. Performing these transformations is left as the user's responsibility. In NDC, the view frustum is a cube in homogeneous coordinates where <span class="math inline"><em>x</em>/<em>w</em></span>, <span class="math inline"><em>y</em>/<em>w</em></span>, <span class="math inline"><em>z</em>/<em>w</em></span> are all between -1 and +1.</p>
-<p>The output of the rasterization operation is a 4-channel float32 image with tuple (<span class="math inline"><em>u</em></span>, <span class="math inline"><em>v</em></span>, <span class="math inline"><em>z</em>/<em>w</em></span>, <span class="math inline"><em>t</em><em>r</em><em>i</em><em>a</em><em>n</em><em>g</em><em>l</em><em>e</em>_<em>i</em><em>d</em></span>) in each pixel. Values <span class="math inline"><em>u</em></span> and <span class="math inline"><em>v</em></span> are the barycentric coordinates within a triangle: the first vertex in the vertex index triplet obtains <span class="math inline">(<em>u</em>, <em>v</em>) = (1, 0)</span>, the second vertex <span class="math inline">(<em>u</em>, <em>v</em>) = (0, 1)</span> and the third vertex <span class="math inline">(<em>u</em>, <em>v</em>) = (0, 0)</span>. NDC-space depth value <span class="math inline"><em>z</em>/<em>w</em></span> is used later by the antialiasing operation to infer occlusion relations between triangles, and it does not propagate gradients to the vertex position input. Field <span class="math inline"><em>t</em><em>r</em><em>i</em><em>a</em><em>n</em><em>g</em><em>l</em><em>e</em>_<em>i</em><em>d</em></span> is the triangle index, offset by one. Pixels where no triangle was rasterized will receive a zero in all channels.</p>
+<p>The rasterization operation takes as inputs a tensor of vertex positions and a tensor of vertex index triplets that specify the triangles. Vertex positions are specified in clip space, i.e., after modelview and projection transformations. Performing these transformations is left as the user's responsibility. In clip space, the view frustum is a cube in homogeneous coordinates where <span class="math inline"><em>x</em>/<em>w</em></span>, <span class="math inline"><em>y</em>/<em>w</em></span>, <span class="math inline"><em>z</em>/<em>w</em></span> are all between -1 and +1.</p>
+<p>The output of the rasterization operation is a 4-channel float32 image with tuple (<span class="math inline"><em>u</em></span>, <span class="math inline"><em>v</em></span>, <span class="math inline"><em>z</em>/<em>w</em></span>, <span class="math inline"><em>t</em><em>r</em><em>i</em><em>a</em><em>n</em><em>g</em><em>l</em><em>e</em>_<em>i</em><em>d</em></span>) in each pixel. Values <span class="math inline"><em>u</em></span> and <span class="math inline"><em>v</em></span> are the barycentric coordinates within a triangle: the first vertex in the vertex index triplet obtains <span class="math inline">(<em>u</em>, <em>v</em>) = (1, 0)</span>, the second vertex <span class="math inline">(<em>u</em>, <em>v</em>) = (0, 1)</span> and the third vertex <span class="math inline">(<em>u</em>, <em>v</em>) = (0, 0)</span>. Normalized depth value <span class="math inline"><em>z</em>/<em>w</em></span> is used later by the antialiasing operation to infer occlusion relations between triangles, and it does not propagate gradients to the vertex position input. Field <span class="math inline"><em>t</em><em>r</em><em>i</em><em>a</em><em>n</em><em>g</em><em>l</em><em>e</em>_<em>i</em><em>d</em></span> is the triangle index, offset by one. Pixels where no triangle was rasterized will receive a zero in all channels.</p>
 <p>Rasterization is point-sampled, i.e., the geometry is not smoothed, blurred, or made partially transparent in any way, in contrast to some previous differentiable rasterizers. The contents of a pixel always represent a single surface point that is on the closest surface visible along the ray through the pixel center.</p>
 <p>Point-sampled coverage does not produce vertex position gradients related to occlusion and visibility effects. This is because the motion of vertices does not change the coverage in a continuous way — a triangle is either rasterized into a pixel or not. In nvdiffrast, the occlusion/visibility related gradients are generated in the antialiasing operation that typically occurs towards the end of the rendering pipeline.</p>
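To make these conventions concrete, here is a minimal PyTorch sketch (matching the `rasterize()` reference further down) that rasterizes one hand-placed triangle and pulls apart the output channels; the positions and resolution are chosen purely for illustration:

```python
import torch
import nvdiffrast.torch as dr

glctx = dr.RasterizeGLContext()

# One triangle in clip space; with w = 1, x/w, y/w, z/w lie in [-1, +1].
# The 3D shape [minibatch_size, num_vertices, 4] selects instanced mode.
pos = torch.tensor([[[-0.8, -0.8, 0.0, 1.0],
                     [ 0.8, -0.8, 0.0, 1.0],
                     [ 0.0,  0.8, 0.0, 1.0]]], dtype=torch.float32, device='cuda')
tri = torch.tensor([[0, 1, 2]], dtype=torch.int32, device='cuda')

rast, _ = dr.rasterize(glctx, pos, tri, resolution=(256, 256))

u, v = rast[..., 0], rast[..., 1]  # barycentric coordinates
zw   = rast[..., 2]                # z/w, consumed later by antialias()
mask = rast[..., 3] > 0            # triangle_id is offset by one; 0 = empty pixel
```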
<div class="image-parent"> <div class="image-parent">
...@@ -510,7 +513,7 @@ Rendered in 4×4 higher resolution and downsampled ...@@ -510,7 +513,7 @@ Rendered in 4×4 higher resolution and downsampled
<p>Nvdiffrast follows OpenGL's coordinate systems and other conventions. This is partially because we use OpenGL to accelerate the rasterization operation, but mostly so that there is a <a href="https://xkcd.com/927/">single standard to follow</a>.</p> <p>Nvdiffrast follows OpenGL's coordinate systems and other conventions. This is partially because we use OpenGL to accelerate the rasterization operation, but mostly so that there is a <a href="https://xkcd.com/927/">single standard to follow</a>.</p>
<ul> <ul>
<li> <li>
The NDC coordinate system, used for specifying vertex positions in rasterization, maps to screen so that <span class="math inline"><em>x</em></span> increases towards right side of screen, <span class="math inline"><em>y</em></span> increases towards top of screen, and <strong><span class="math inline"><em>z</em></span> increases towards the viewer</strong>. When rasterizing, the normalized device coordinates — i.e., clip-space coordinates after division by <span class="math inline"><em>w</em></span> map to screen so that <span class="math inline"><em>x</em></span> increases towards right side of screen, <span class="math inline"><em>y</em></span> increases towards top of screen, and <strong><span class="math inline"><em>z</em></span> increases towards the viewer</strong>.
</li> </li>
<li> <li>
<strong>The memory order of image data in OpenGL, and consequently in nvdiffrast, is bottom-up.</strong> This means that row 0 of a tensor containing an image is the bottom row of the texture/image, which is the opposite of the more common scanline order. If you want to keep your image data in the conventional top-down order in your code, but have it logically the right way up inside nvdiffrast, you will need to flip the images vertically when crossing the boundary. <strong>The memory order of image data in OpenGL, and consequently in nvdiffrast, is bottom-up.</strong> This means that row 0 of a tensor containing an image is the bottom row of the texture/image, which is the opposite of the more common scanline order. If you want to keep your image data in the conventional top-down order in your code, but have it logically the right way up inside nvdiffrast, you will need to flip the images vertically when crossing the boundary.
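For instance, the vertical flip at the boundary is a one-liner in PyTorch; a minimal sketch assuming the usual [minibatch_size, height, width, channels] tensor layout:

```python
import torch

img = torch.rand(1, 256, 256, 3)    # top-down scanline order in your own code
img_gl = torch.flip(img, dims=[1])  # bottom-up order, as nvdiffrast expects
```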
@@ -729,6 +732,10 @@ Mip level 5
 <p>Nvdiffrast supports computation on multiple GPUs in both PyTorch and TensorFlow. As is the convention in PyTorch, the operations are always executed on the device on which the input tensors reside. All GPU input tensors must reside on the same device, and the output tensors will unsurprisingly end up on that same device. In addition, the rasterization operation requires that its OpenGL context was created for the correct device. In TensorFlow, the OpenGL context is automatically created on the device of the rasterization operation when it is executed for the first time.</p>
 <p>On Windows, nvdiffrast implements OpenGL device selection in a way that can be done only once per process — after one context is created, all future ones will end up on the same GPU. Hence you cannot expect to run the rasterization operation on multiple GPUs within the same process. Trying to do so will either cause a crash or incur a significant performance penalty. However, with PyTorch it is common to distribute computation across GPUs by launching a separate process for each GPU, so this is not a huge concern. Note that any OpenGL context created within the same process, even for something like a GUI window, will prevent changing the device later. Therefore, if you want to run the rasterization operation on other than the default GPU, be sure to create its OpenGL context before initializing any other OpenGL-powered libraries.</p>
 <p>On Linux everything just works, and you can create rasterizer OpenGL contexts on multiple devices within the same process.</p>
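As a sketch of the multi-device case on Linux: create one rasterizer context per GPU and keep each context paired with input tensors on its own device. Creating the context while the target CUDA device is active is one way to select the device; depending on your nvdiffrast version the context constructor may also accept an explicit device argument, so check the API reference of your version.

```python
import torch
import nvdiffrast.torch as dr

# One OpenGL context per GPU, each created while its device is active.
contexts = []
for dev in range(torch.cuda.device_count()):
    with torch.cuda.device(dev):
        contexts.append(dr.RasterizeGLContext())

# Tensors residing on device i must be rasterized with contexts[i].
```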
<h4 id="note-on-torch.nn.dataparallel">Note on torch.nn.DataParallel</h4>
<p>PyTorch offers <code>torch.nn.DataParallel</code> wrapper class for splitting the execution of a minibatch across multiple threads. Unfortunately, this class is fundamentally incompatible with OpenGL-dependent operations, as it spawns a new set of threads at each call (as of PyTorch 1.9.0, at least). Using previously created OpenGL contexts in these new threads, even if taking care to not use the same context in multiple threads, causes them to be migrated around and this has resulted in ever-growing GPU memory usage and abysmal GPU utilization. Therefore, we advise against using <code>torch.nn.DataParallel</code> for rasterization operations that depend on the OpenGL contexts.</p>
<p>Notably, <code>torch.nn.DistributedDataParallel</code> spawns subprocesses that are much more persistent. The subprocesses must create their own OpenGL contexts as part of initialization, and as such they do not suffer from this problem.</p>
<p>GitHub issue <a href="https://github.com/NVlabs/nvdiffrast/issues/23">#23</a>, especially <a href="https://github.com/NVlabs/nvdiffrast/issues/23#issuecomment-851577382">this comment</a>, contains further analysis and suggestions for workarounds.</p>
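A minimal sketch of the process-per-GPU pattern this note suggests, with hypothetical worker scaffolding around the nvdiffrast calls: each spawned process pins itself to one device and creates its own OpenGL context during initialization, which is exactly what <code>torch.nn.DistributedDataParallel</code> workers can do.

```python
import torch
import torch.multiprocessing as mp
import nvdiffrast.torch as dr

def worker(rank):
    torch.cuda.set_device(rank)
    glctx = dr.RasterizeGLContext()  # one context, owned by this process only
    # ... build the model, wrap it in DistributedDataParallel, train ...

if __name__ == '__main__':
    mp.spawn(worker, nprocs=torch.cuda.device_count())
```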
<h3 id="rendering-multiple-depth-layers">Rendering multiple depth layers</h3> <h3 id="rendering-multiple-depth-layers">Rendering multiple depth layers</h3>
<p>Sometimes there is a need to render scenes with partially transparent surfaces. In this case, it is not sufficient to find only the surfaces that are closest to the camera, as you may also need to know what lies behind them. For this purpose, nvdiffrast supports <em>depth peeling</em> that lets you extract multiple closest surfaces for each pixel.</p> <p>Sometimes there is a need to render scenes with partially transparent surfaces. In this case, it is not sufficient to find only the surfaces that are closest to the camera, as you may also need to know what lies behind them. For this purpose, nvdiffrast supports <em>depth peeling</em> that lets you extract multiple closest surfaces for each pixel.</p>
<p>With depth peeling, we start by rasterizing the closest surfaces as usual. We then perform a second rasterization pass with the same geometry, but this time we cull all previously rendered surface points at each pixel, effectively extracting the second-closest depth layer. This can be repeated as many times as desired, so that we can extract as many depth layers as we like. See the images below for example results of depth peeling with each depth layer shaded and antialiased.</p> <p>With depth peeling, we start by rasterizing the closest surfaces as usual. We then perform a second rasterization pass with the same geometry, but this time we cull all previously rendered surface points at each pixel, effectively extracting the second-closest depth layer. This can be repeated as many times as desired, so that we can extract as many depth layers as we like. See the images below for example results of depth peeling with each depth layer shaded and antialiased.</p>
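In the PyTorch API this loop is driven through a depth peeler object; below is a sketch assuming the <code>DepthPeeler</code> context manager, where each <code>rasterize_next_layer()</code> call returns the rasterizer output for the next-closest layer. Treat the exact names as assumptions and confirm against the API reference of your version.

```python
import nvdiffrast.torch as dr

# glctx, pos, tri as in a normal rasterize() call; num_layers chosen by the user.
num_layers = 3
layers = []
with dr.DepthPeeler(glctx, pos, tri, resolution=(256, 256)) as peeler:
    for _ in range(num_layers):
        rast, rast_db = peeler.rasterize_next_layer()
        layers.append(rast)  # layer 0 is the closest surface, layer 1 the next, ...
```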
@@ -897,7 +904,9 @@ device.</td></tr></table><div class="methods">Methods, only available if context
 <div class="apifunc"><h4><code>nvdiffrast.torch.rasterize(<em>glctx</em>, <em>pos</em>, <em>tri</em>, <em>resolution</em>, <em>ranges</em>=<span class="defarg">None</span>, <em>grad_db</em>=<span class="defarg">True</span>)</code>&nbsp;<span class="sym_function">Function</span></h4>
 <p class="shortdesc">Rasterize triangles.</p><p class="longdesc">All input tensors must be contiguous and reside in GPU memory except for
 the <code>ranges</code> tensor that, if specified, has to reside in CPU memory. The
-output tensors will be contiguous and reside in GPU memory.</p><div class="arguments">Arguments:</div><table class="args"><tr class="arg"><td class="argname">glctx</td><td class="arg_short">OpenGL context of type <code>RasterizeGLContext</code>.</td></tr><tr class="arg"><td class="argname">pos</td><td class="arg_short">Vertex position tensor with dtype <code>torch.float32</code>. To enable range
+output tensors will be contiguous and reside in GPU memory.</p><p class="longdesc">Note: For an unknown reason, on Windows the very first rasterization call using
+a newly created OpenGL context may *sometimes* output a blank buffer. This is a
+known bug and has never been observed to affect subsequent calls.</p><div class="arguments">Arguments:</div><table class="args"><tr class="arg"><td class="argname">glctx</td><td class="arg_short">OpenGL context of type <code>RasterizeGLContext</code>.</td></tr><tr class="arg"><td class="argname">pos</td><td class="arg_short">Vertex position tensor with dtype <code>torch.float32</code>. To enable range
 mode, this tensor should have a 2D shape [num_vertices, 4]. To enable
 instanced mode, use a 3D shape [minibatch_size, num_vertices, 4].</td></tr><tr class="arg"><td class="argname">tri</td><td class="arg_short">Triangle tensor with shape [num_triangles, 3] and dtype <code>torch.int32</code>.</td></tr><tr class="arg"><td class="argname">resolution</td><td class="arg_short">Output resolution as integer tuple (height, width).</td></tr><tr class="arg"><td class="argname">ranges</td><td class="arg_short">In range mode, tensor with shape [minibatch_size, 2] and dtype
 <code>torch.int32</code>, specifying start indices and counts into <code>tri</code>.
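To illustrate range mode versus instanced mode: a 2D `pos` tensor plus a CPU-side `ranges` tensor renders a different triangle span for each minibatch entry. A small sketch with placeholder geometry:

```python
import torch
import nvdiffrast.torch as dr

glctx = dr.RasterizeGLContext()

pos = torch.rand(100, 4, device='cuda')                                 # 2D shape selects range mode
tri = torch.randint(0, 100, (60, 3), dtype=torch.int32, device='cuda')  # placeholder topology

# Row k gives (start, count) into tri: entry 0 draws triangles [0, 40),
# entry 1 draws triangles [40, 60). Note: ranges must reside in CPU memory.
ranges = torch.tensor([[0, 40], [40, 20]], dtype=torch.int32)

rast, _ = dr.rasterize(glctx, pos, tri, resolution=(128, 128), ranges=ranges)
# rast: [2, 128, 128, 4], one output layer per range.
```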
@@ -962,7 +971,8 @@ part of texture coordinates. Mode 'clamp' clamps texture coordinates to the
 centers of the boundary texels. Mode 'zero' virtually extends the texture with
 all-zero values in all directions.</td></tr><tr class="arg"><td class="argname">max_mip_level</td><td class="arg_short">If specified, limits the number of mipmaps constructed and used in mipmap-based
 filter modes.</td></tr></table><div class="returns">Returns:<div class="return_description">A tensor containing the results of the texture sampling with shape
-[minibatch_size, height, width, tex_channels].</div></div></div>
+[minibatch_size, height, width, tex_channels]. Cube map fetches with invalid uv coordinates
+(e.g., zero vectors) output all zeros and do not propagate gradients.</div></div></div>
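A short sketch of the new invalid-uv behavior with a cube map, where the lookup is a per-pixel 3D direction; the tensor contents here are placeholders, and the cube map shape follows the `texture()` reference:

```python
import torch
import nvdiffrast.torch as dr

tex = torch.rand(1, 6, 64, 64, 3, device='cuda')  # cube map: [minibatch, 6, H, W, channels]
uv  = torch.randn(1, 32, 32, 3, device='cuda')    # 3D lookup directions
uv[0, 0, 0] = 0.0                                 # zero vector: an invalid direction

color = dr.texture(tex, uv, filter_mode='linear', boundary_mode='cube')
# color[0, 0, 0] is all zeros, and no gradients flow through that fetch.
```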
<div class="apifunc"><h4><code>nvdiffrast.torch.texture_construct_mip(<em>tex</em>, <em>max_mip_level</em>=<span class="defarg">None</span>, <em>cube_mode</em>=<span class="defarg">False</span>)</code>&nbsp;<span class="sym_function">Function</span></h4> <div class="apifunc"><h4><code>nvdiffrast.torch.texture_construct_mip(<em>tex</em>, <em>max_mip_level</em>=<span class="defarg">None</span>, <em>cube_mode</em>=<span class="defarg">False</span>)</code>&nbsp;<span class="sym_function">Function</span></h4>
<p class="shortdesc">Construct a mipmap stack for a texture.</p><p class="longdesc">This function can be used for constructing a mipmap stack for a texture that is known to remain <p class="shortdesc">Construct a mipmap stack for a texture.</p><p class="longdesc">This function can be used for constructing a mipmap stack for a texture that is known to remain
constant. This avoids reconstructing it every time <code>texture()</code> is called.</p><div class="arguments">Arguments:</div><table class="args"><tr class="arg"><td class="argname">tex</td><td class="arg_short">Texture tensor with the same constraints as in <code>texture()</code>.</td></tr><tr class="arg"><td class="argname">max_mip_level</td><td class="arg_short">If specified, limits the number of mipmaps constructed.</td></tr><tr class="arg"><td class="argname">cube_mode</td><td class="arg_short">Must be set to True if <code>tex</code> specifies a cube map texture.</td></tr></table><div class="returns">Returns:<div class="return_description">An opaque object containing the mipmap stack. This can be supplied in a call to <code>texture()</code> constant. This avoids reconstructing it every time <code>texture()</code> is called.</p><div class="arguments">Arguments:</div><table class="args"><tr class="arg"><td class="argname">tex</td><td class="arg_short">Texture tensor with the same constraints as in <code>texture()</code>.</td></tr><tr class="arg"><td class="argname">max_mip_level</td><td class="arg_short">If specified, limits the number of mipmaps constructed.</td></tr><tr class="arg"><td class="argname">cube_mode</td><td class="arg_short">Must be set to True if <code>tex</code> specifies a cube map texture.</td></tr></table><div class="returns">Returns:<div class="return_description">An opaque object containing the mipmap stack. This can be supplied in a call to <code>texture()</code>
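Usage sketch for the constant-texture case: build the stack once, then pass it to `texture()` through the `mip` argument on every call (tensor contents are placeholders):

```python
import torch
import nvdiffrast.torch as dr

tex   = torch.rand(1, 256, 256, 3, device='cuda')
uv    = torch.rand(1, 128, 128, 2, device='cuda')
uv_da = torch.rand(1, 128, 128, 4, device='cuda') * 0.01  # image-space derivatives

mip = dr.texture_construct_mip(tex)  # constructed once, reused across calls

color = dr.texture(tex, uv, uv_da=uv_da, mip=mip, filter_mode='linear-mipmap-linear')
```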
@@ -977,7 +987,7 @@ known to remain constant. This avoids reconstructing it every time <code>antiali
 GPU memory.</td></tr></table><div class="returns">Returns:<div class="return_description">An opaque object containing the topology hash. This can be supplied in a call to
 <code>antialias()</code> in the <code>topology_hash</code> argument.</div></div></div>
 <div class="apifunc"><h4><code>nvdiffrast.torch.get_log_level(<em></em>)</code>&nbsp;<span class="sym_function">Function</span></h4>
-<p class="shortdesc">Get current log level.</p><p class="longdesc"></p><div class="returns">Returns:<div class="return_description">Current log level in nvdiffrast. See <code>set_log_level()</code> for possible values.</div></div></div>
+<p class="shortdesc">Get current log level.</p><div class="returns">Returns:<div class="return_description">Current log level in nvdiffrast. See <code>set_log_level()</code> for possible values.</div></div></div>
 <div class="apifunc"><h4><code>nvdiffrast.torch.set_log_level(<em>level</em>)</code>&nbsp;<span class="sym_function">Function</span></h4>
 <p class="shortdesc">Set log level.</p><p class="longdesc">Log levels follow the convention on the C++ side of Torch:
 0 = Info,
...
@@ -6,4 +6,4 @@
 # distribution of this software and related documentation without an express
 # license agreement from NVIDIA CORPORATION is strictly prohibited.
-__version__ = '0.2.5'
+__version__ = '0.2.6'
@@ -29,6 +29,18 @@ static int ROUND_UP_BITS(uint32_t x, uint32_t y)
     return (x | m) + 1u;
 }
 
+//------------------------------------------------------------------------
+// Draw command struct used by rasterizer.
+
+struct GLDrawCmd
+{
+    uint32_t count;
+    uint32_t instanceCount;
+    uint32_t firstIndex;
+    uint32_t baseVertex;
+    uint32_t baseInstance;
+};
+
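For context, the field layout above matches the `DrawElementsIndirectCommand` structure that `glMultiDrawElementsIndirect` consumes; the render function below fills one such command per output layer and submits them all in a single indirect draw call.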
 //------------------------------------------------------------------------
 // GL helpers.
@@ -386,14 +398,6 @@ void rasterizeResizeBuffers(NVDR_CTX_ARGS, RasterizeGLState& s, int posCount, in
         for (int i=0; i < num_outputs; i++)
             NVDR_CHECK_CUDA_ERROR(cudaGraphicsGLRegisterImage(&s.cudaColorBuffer[i], s.glColorBuffer[i], GL_TEXTURE_3D, cudaGraphicsRegisterFlagsReadOnly));
     }
-
-    // Resize range arrays?
-    if ((unsigned int)depth > s.drawCmdBuffer.size())
-    {
-        int newSize = (depth > 64) ? ROUND_UP_BITS(depth, 1) : 64;
-        LOG(INFO) << "Increasing range array size to " << newSize << " elements";
-        s.drawCmdBuffer.resize(newSize);
-    }
 }
 
 void rasterizeRender(NVDR_CTX_ARGS, RasterizeGLState& s, cudaStream_t stream, const float* posPtr, int posCount, int vtxPerInstance, const int32_t* triPtr, int triCount, const int32_t* rangesPtr, int width, int height, int depth, int peeling_idx)
@@ -487,6 +491,9 @@ void rasterizeRender(NVDR_CTX_ARGS, RasterizeGLState& s, cudaStream_t stream, co
     }
     else
     {
+        // Populate a buffer for draw commands and execute it.
+        std::vector<GLDrawCmd> drawCmdBuffer(depth);
+
         if (!rangesPtr)
         {
             // Fill in range array to instantiate the same triangles for each output layer.
@@ -494,7 +501,7 @@ void rasterizeRender(NVDR_CTX_ARGS, RasterizeGLState& s, cudaStream_t stream, co
             // the first dimension in addressing the triangle array.
             for (int i=0; i < depth; i++)
             {
-                GLDrawCmd& cmd = s.drawCmdBuffer[i];
+                GLDrawCmd& cmd = drawCmdBuffer[i];
                 cmd.firstIndex = 0;
                 cmd.count = triCount;
                 cmd.baseVertex = vtxPerInstance * i;
@@ -509,7 +516,7 @@ void rasterizeRender(NVDR_CTX_ARGS, RasterizeGLState& s, cudaStream_t stream, co
             // the first dimension in addressing the triangle array.
             for (int i=0, j=0; i < depth; i++)
             {
-                GLDrawCmd& cmd = s.drawCmdBuffer[i];
+                GLDrawCmd& cmd = drawCmdBuffer[i];
                 int first = rangesPtr[j++];
                 int count = rangesPtr[j++];
                 NVDR_CHECK(first >= 0 && count >= 0, "range contains negative values");
@@ -523,7 +530,7 @@ void rasterizeRender(NVDR_CTX_ARGS, RasterizeGLState& s, cudaStream_t stream, co
         }
 
         // Draw!
-        NVDR_CHECK_GL_ERROR(glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, &s.drawCmdBuffer[0], depth, sizeof(GLDrawCmd)));
+        NVDR_CHECK_GL_ERROR(glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, &drawCmdBuffer[0], depth, sizeof(GLDrawCmd)));
     }
 }
@@ -557,4 +564,36 @@ void rasterizeCopyResults(NVDR_CTX_ARGS, RasterizeGLState& s, cudaStream_t strea
     NVDR_CHECK_CUDA_ERROR(cudaGraphicsUnmapResources(num_outputs, s.cudaColorBuffer, stream));
 }
 
+void rasterizeReleaseBuffers(NVDR_CTX_ARGS, RasterizeGLState& s)
+{
+    int num_outputs = s.enableDB ? 2 : 1;
+
+    if (s.cudaPosBuffer)
+    {
+        NVDR_CHECK_CUDA_ERROR(cudaGraphicsUnregisterResource(s.cudaPosBuffer));
+        s.cudaPosBuffer = 0;
+    }
+
+    if (s.cudaTriBuffer)
+    {
+        NVDR_CHECK_CUDA_ERROR(cudaGraphicsUnregisterResource(s.cudaTriBuffer));
+        s.cudaTriBuffer = 0;
+    }
+
+    for (int i=0; i < num_outputs; i++)
+    {
+        if (s.cudaColorBuffer[i])
+        {
+            NVDR_CHECK_CUDA_ERROR(cudaGraphicsUnregisterResource(s.cudaColorBuffer[i]));
+            s.cudaColorBuffer[i] = 0;
+        }
+    }
+
+    if (s.cudaPrevOutBuffer)
+    {
+        NVDR_CHECK_CUDA_ERROR(cudaGraphicsUnregisterResource(s.cudaPrevOutBuffer));
+        s.cudaPrevOutBuffer = 0;
+    }
+}
+
 //------------------------------------------------------------------------
@@ -41,22 +41,10 @@ struct RasterizeGradParams
 #include "framework.h"
 #include "glutil.h"
 
-//------------------------------------------------------------------------
-// Draw command struct used by rasterizer.
-
-struct GLDrawCmd
-{
-    uint32_t count;
-    uint32_t instanceCount;
-    uint32_t firstIndex;
-    uint32_t baseVertex;
-    uint32_t baseInstance;
-};
-
 //------------------------------------------------------------------------
 // OpenGL-related persistent state for forward op.
 
-struct RasterizeGLState
+struct RasterizeGLState // Must be initializable by memset to zero.
 {
     int width; // Allocated frame buffer width.
     int height; // Allocated frame buffer height.
@@ -81,7 +69,6 @@ struct RasterizeGLState
     cudaGraphicsResource_t cudaPrevOutBuffer;
     cudaGraphicsResource_t cudaPosBuffer;
     cudaGraphicsResource_t cudaTriBuffer;
-    std::vector<GLDrawCmd> drawCmdBuffer;
     int enableDB;
 };
@@ -92,6 +79,7 @@ void rasterizeInitGLContext(NVDR_CTX_ARGS, RasterizeGLState& s, int cudaDeviceId
 void rasterizeResizeBuffers(NVDR_CTX_ARGS, RasterizeGLState& s, int posCount, int triCount, int width, int height, int depth);
 void rasterizeRender(NVDR_CTX_ARGS, RasterizeGLState& s, cudaStream_t stream, const float* posPtr, int posCount, int vtxPerInstance, const int32_t* triPtr, int triCount, const int32_t* rangesPtr, int width, int height, int depth, int peeling_idx);
 void rasterizeCopyResults(NVDR_CTX_ARGS, RasterizeGLState& s, cudaStream_t stream, float** outputPtr, int width, int height, int depth);
+void rasterizeReleaseBuffers(NVDR_CTX_ARGS, RasterizeGLState& s);
 
 //------------------------------------------------------------------------
 #endif // !(defined(NVDR_TORCH) && defined(__CUDACC__))
@@ -18,6 +18,8 @@ static __device__ __forceinline__ void accum_from_mem(float* a, int s, float4 b,
 static __device__ __forceinline__ void accum_to_mem(float& a, float* b, int s) { a += b[0]; }
 static __device__ __forceinline__ void accum_to_mem(float2& a, float* b, int s) { float2 v = a; v.x += b[0]; v.y += b[s]; a = v; }
 static __device__ __forceinline__ void accum_to_mem(float4& a, float* b, int s) { float4 v = a; v.x += b[0]; v.y += b[s]; v.z += b[2*s]; v.w += b[3*s]; a = v; }
+static __device__ __forceinline__ bool isfinite_vec3(const float3& a) { return isfinite(a.x) && isfinite(a.y) && isfinite(a.z); }
+static __device__ __forceinline__ bool isfinite_vec4(const float4& a) { return isfinite(a.x) && isfinite(a.y) && isfinite(a.z) && isfinite(a.w); }
 template<class T> static __device__ __forceinline__ T lerp (const T& a, const T& b, float c) { return a + c * (b - a); }
 template<class T> static __device__ __forceinline__ T bilerp(const T& a, const T& b, const T& c, const T& d, const float2& e) { return lerp(lerp(a, b, e.x), lerp(c, d, e.x), e.y); }
@@ -110,6 +112,8 @@ static __device__ __forceinline__ int indexCubeMap(float& x, float& y, float z)
     float m1 = (idx != 2) ? -m : m;
     x = x * m0 + .5;
     y = y * m1 + .5;
+    if (!isfinite(x) || !isfinite(y))
+        return -1; // Invalid uv.
     x = fminf(fmaxf(x, 0.f), 1.f);
     y = fminf(fmaxf(y, 0.f), 1.f);
     return idx;
@@ -137,7 +141,10 @@ static __device__ __forceinline__ float3 indexCubeMapGrad(float3 uv, float gu, f
     float gy = (idx & 0x0c) ? gl : -gv;
     float gz = (idx & 0x30) ? gl : (idx & 0x03) ? gu : gv;
     gz = (idx & 0x09) ? -gz : gz;
-    return make_float3(gx, gy, gz) * (m * .5f);
+    float3 res = make_float3(gx, gy, gz) * (m * .5f);
+    if (!isfinite_vec3(res))
+        return make_float3(0.f, 0.f, 0.f); // Invalid uv.
+    return res;
 }
 
 // Based on dL/d(d{s,t}/s{X,Y}), compute dL/d(d{x,y,z}/d{X,Y}). This is just two
@@ -171,6 +178,11 @@ static __device__ __forceinline__ void indexCubeMapGrad4(float3 uv, float4 dw, f
     }
     g0 = make_float3(gx0, gy0, gz0) * (m * .5f);
     g1 = make_float3(gx1, gy1, gz1) * (m * .5f);
+    if (!isfinite_vec3(g0) || !isfinite_vec3(g1))
+    {
+        g0 = make_float3(0.f, 0.f, 0.f); // Invalid uv.
+        g1 = make_float3(0.f, 0.f, 0.f);
+    }
 }
 
 // Compute d{s,t}/d{X,Y} based on d{x,y,z}/d{X,Y} at a given 3D lookup vector.
@@ -197,27 +209,33 @@ static __device__ __forceinline__ float4 indexCubeMapGradST(float3 uv, float3 dv
     gu *= (idx & 0x34) ? -mm : mm;
     gv *= (idx & 0x2e) ? -mm : mm;
 
+    float4 res;
+
     if (idx & 0x03)
     {
-        return make_float4(gu * dvdX.x + dm * dvdX.z,
-                           gu * dvdY.x + dm * dvdY.z,
-                           gv * dvdX.x - dm * dvdX.y,
-                           gv * dvdY.x - dm * dvdY.y);
+        res = make_float4(gu * dvdX.x + dm * dvdX.z,
+                          gu * dvdY.x + dm * dvdY.z,
+                          gv * dvdX.x - dm * dvdX.y,
+                          gv * dvdY.x - dm * dvdY.y);
     }
     else if (idx & 0x0c)
     {
-        return make_float4(gu * dvdX.y + dm * dvdX.x,
-                           gu * dvdY.y + dm * dvdY.x,
-                           gv * dvdX.y + dm * dvdX.z,
-                           gv * dvdY.y + dm * dvdY.z);
+        res = make_float4(gu * dvdX.y + dm * dvdX.x,
+                          gu * dvdY.y + dm * dvdY.x,
+                          gv * dvdX.y + dm * dvdX.z,
+                          gv * dvdY.y + dm * dvdY.z);
     }
     else // (idx & 0x30)
     {
-        return make_float4(gu * dvdX.z + copysignf(dm, c) * dvdX.x,
-                           gu * dvdY.z + copysignf(dm, c) * dvdY.x,
-                           gv * dvdX.z - dm * dvdX.y,
-                           gv * dvdY.z - dm * dvdY.y);
+        res = make_float4(gu * dvdX.z + copysignf(dm, c) * dvdX.x,
+                          gu * dvdY.z + copysignf(dm, c) * dvdY.x,
+                          gv * dvdX.z - dm * dvdX.y,
+                          gv * dvdY.z - dm * dvdY.y);
     }
+
+    if (!isfinite_vec4(res))
+        return make_float4(0.f, 0.f, 0.f, 0.f);
+    return res;
 }
 
 // Compute d(d{s,t}/d{X,Y})/d{x,y,z}, i.e., how the pixel derivatives of 2D face
@@ -313,7 +331,10 @@ static __device__ __forceinline__ int indexTextureNearest(const TextureKernelPar
     if (CUBE_MODE)
     {
         // No wrap. Fold face index into tz right away.
-        tz = 6 * tz + indexCubeMap(u, v, uv.z); // Rewrites u, v.
+        int idx = indexCubeMap(u, v, uv.z); // Rewrites u, v.
+        if (idx < 0)
+            return -1; // Invalid uv.
+        tz = 6 * tz + idx;
     }
     else
     {
@@ -364,6 +385,11 @@ static __device__ __forceinline__ float2 indexTextureLinear(const TextureKernelP
     {
         // Neither clamp or wrap.
         face = indexCubeMap(u, v, uv.z); // Rewrites u, v.
+        if (face < 0)
+        {
+            tcOut.x = tcOut.y = tcOut.z = tcOut.w = -1; // Invalid uv.
+            return make_float2(0.f, 0.f);
+        }
         u = u * (float)w - 0.5f;
         v = v * (float)h - 0.5f;
     }
@@ -511,14 +537,16 @@ static __device__ __forceinline__ void calculateMipLevel(int& level0, int& level
         float d_f_ddtdX = vscl * (dtdx * (l2aw + AB) + dtdy * Cw);
         float d_f_ddtdY = vscl * (dtdy * (l2aw - AB) + dtdx * Cw);
 
-        *pdw = make_float4(d_f_ddsdX, d_f_ddsdY, d_f_ddtdX, d_f_ddtdY);
+        float4 d_f_dw = make_float4(d_f_ddsdX, d_f_ddsdY, d_f_ddtdX, d_f_ddtdY);
+        if (!CUBE_MODE)
+            *pdw = isfinite_vec4(d_f_dw) ? d_f_dw : make_float4(0.f, 0.f, 0.f, 0.f);
 
         // In cube maps, there is also a texture coordinate vs. mip level gradient.
+        // Only output nonzero vectors if both are free of inf/Nan garbage.
         if (CUBE_MODE)
         {
             float4 dx, dy, dz;
             indexCubeMapGrad2(uv, dvdX, dvdY, dx, dy, dz);
             float3 d_dsdX_dv = make_float3(dx.x, dy.x, dz.x);
             float3 d_dsdY_dv = make_float3(dx.y, dy.y, dz.y);
             float3 d_dtdX_dv = make_float3(dx.z, dy.z, dz.z);
@@ -530,12 +558,14 @@ static __device__ __forceinline__ void calculateMipLevel(int& level0, int& level
             d_f_dv += d_dtdX_dv * d_f_ddtdX;
             d_f_dv += d_dtdY_dv * d_f_ddtdY;
 
-            *pdfdv = d_f_dv;
+            bool finite = isfinite_vec4(d_f_dw) && isfinite_vec3(d_f_dv);
+            *pdw = finite ? d_f_dw : make_float4(0.f, 0.f, 0.f, 0.f);
+            *pdfdv = finite ? d_f_dv : make_float3(0.f, 0.f, 0.f);
         }
     }
 
     // Finally, calculate mip level.
-    flevel = .5f * __log2f(lenMajorSqr);
+    flevel = .5f * __log2f(lenMajorSqr); // May be inf/NaN, but clamp fixes it.
 }
 
 // Bias the mip level and clamp.
@@ -560,6 +590,7 @@ static __device__ __forceinline__ void calculateMipLevel(int& level0, int& level
 template<class T>
 static __device__ __forceinline__ void fetchQuad(T& a00, T& a10, T& a01, T& a11, const float* pIn, int4 tc, bool corner)
 {
+    // For invalid cube map uv, tc will be all negative, and all texel values will be zero.
     if (corner)
     {
         T avg = zero_value<T>();
@@ -584,6 +615,7 @@ static __device__ __forceinline__ void fetchQuad(T& a00, T& a10, T& a01, T& a11,
 static __device__ __forceinline__ void accumQuad(float4 c, float* pOut, int level, int4 tc, bool corner, CA_TEMP_PARAM)
 {
+    // For invalid cube map uv, tc will be all negative, and no accumulation will take place.
     if (corner)
     {
         float cb;
...
@@ -179,7 +179,7 @@ class _rasterize_func(torch.autograd.Function):
     @staticmethod
     def backward(ctx, dy, ddb):
-        pos, tri, out = ctx.saved_variables
+        pos, tri, out = ctx.saved_tensors
         if ctx.saved_grad_db:
            g_pos = _get_plugin().rasterize_grad_db(pos, tri, out, dy, ddb)
         else:
@@ -194,6 +194,10 @@ def rasterize(glctx, pos, tri, resolution, ranges=None, grad_db=True):
     the `ranges` tensor that, if specified, has to reside in CPU memory. The
     output tensors will be contiguous and reside in GPU memory.
 
+    Note: For an unknown reason, on Windows the very first rasterization call using
+    a newly created OpenGL context may *sometimes* output a blank buffer. This is a
+    known bug and has never been observed to affect subsequent calls.
+
     Args:
         glctx: OpenGL context of type `RasterizeGLContext`.
         pos: Vertex position tensor with dtype `torch.float32`. To enable range
@@ -321,7 +325,7 @@ class _interpolate_func_da(torch.autograd.Function):
     @staticmethod
     def backward(ctx, dy, dda):
-        attr, rast, tri, rast_db = ctx.saved_variables
+        attr, rast, tri, rast_db = ctx.saved_tensors
         diff_attrs_all, diff_attrs_list = ctx.saved_misc
         g_attr, g_rast, g_rast_db = _get_plugin().interpolate_grad_da(attr, rast, tri, dy, rast_db, dda, diff_attrs_all, diff_attrs_list)
         return g_attr, g_rast, None, g_rast_db, None, None
@@ -336,7 +340,7 @@ class _interpolate_func(torch.autograd.Function):
     @staticmethod
     def backward(ctx, dy, _):
-        attr, rast, tri = ctx.saved_variables
+        attr, rast, tri = ctx.saved_tensors
         g_attr, g_rast = _get_plugin().interpolate_grad(attr, rast, tri, dy)
         return g_attr, g_rast, None
@@ -415,7 +419,7 @@ class _texture_func_mip(torch.autograd.Function):
     @staticmethod
     def backward(ctx, dy):
-        tex, uv, uv_da, mip_level_bias, *mip_stack = ctx.saved_variables
+        tex, uv, uv_da, mip_level_bias, *mip_stack = ctx.saved_tensors
         filter_mode, mip_wrapper, filter_mode_enum, boundary_mode_enum = ctx.saved_misc
         if filter_mode == 'linear-mipmap-linear':
             g_tex, g_uv, g_uv_da, g_mip_level_bias, g_mip_stack = _get_plugin().texture_grad_linear_mipmap_linear(tex, uv, dy, uv_da, mip_level_bias, mip_wrapper, mip_stack, filter_mode_enum, boundary_mode_enum)
@@ -435,7 +439,7 @@ class _texture_func(torch.autograd.Function):
     @staticmethod
     def backward(ctx, dy):
-        tex, uv = ctx.saved_variables
+        tex, uv = ctx.saved_tensors
         filter_mode, filter_mode_enum, boundary_mode_enum = ctx.saved_misc
         if filter_mode == 'linear':
             g_tex, g_uv = _get_plugin().texture_grad_linear(tex, uv, dy, filter_mode_enum, boundary_mode_enum)
@@ -490,7 +494,8 @@ def texture(tex, uv, uv_da=None, mip_level_bias=None, mip=None, filter_mode='aut
     Returns:
         A tensor containing the results of the texture sampling with shape
-        [minibatch_size, height, width, tex_channels].
+        [minibatch_size, height, width, tex_channels]. Cube map fetches with invalid uv coordinates
+        (e.g., zero vectors) output all zeros and do not propagate gradients.
     """
 
     # Default filter mode.
@@ -580,7 +585,7 @@ class _antialias_func(torch.autograd.Function):
     @staticmethod
     def backward(ctx, dy):
-        color, rast, pos, tri = ctx.saved_variables
+        color, rast, pos, tri = ctx.saved_tensors
         pos_gradient_boost, work_buffer = ctx.saved_misc
         g_color, g_pos = _get_plugin().antialias_grad(color, rast, pos, tri, dy, work_buffer)
         if pos_gradient_boost != 1.0:
...
@@ -34,6 +34,9 @@ RasterizeGLStateWrapper::RasterizeGLStateWrapper(bool enableDB, bool automatic_,
 RasterizeGLStateWrapper::~RasterizeGLStateWrapper(void)
 {
+    setGLContext(pState->glctx);
+    rasterizeReleaseBuffers(NVDR_CTX_PARAMS, *pState);
+    releaseGLContext();
     destroyGLContext(pState->glctx);
     delete pState;
 }
...