@@ -513,7 +513,7 @@ Rendered in 4×4 higher resolution and downsampled
<p>Nvdiffrast follows OpenGL's coordinate systems and other conventions. This is partially because we use OpenGL to accelerate the rasterization operation, but mostly so that there is a <a href="https://xkcd.com/927/">single standard to follow</a>.</p>
<ul>
<li>
In OpenGL convention, the perspective projection matrix (as implemented in, e.g., <a href="https://github.com/NVlabs/nvdiffrast/blob/main/samples/torch/util.py#L16-L20"><code>utils.projection()</code></a> in our samples and <a href="https://www.khronos.org/registry/OpenGL-Refpages/gl2.1/xhtml/glFrustum.xml"><code>glFrustum()</code></a> in OpenGL) treats the view-space <span class="math inline"><em>z</em></span> as increasing towards the viewer. However, <em>after</em> multiplication by the perspective projection matrix, the homogeneous <a href="https://en.wikipedia.org/wiki/Clip_coordinates">clip-space</a> coordinate <span class="math inline"><em>z</em></span>/<span class="math inline"><em>w</em></span> increases away from the viewer. Hence, a larger depth value in the rasterizer output tensor also corresponds to a surface further away from the viewer. A code sketch after this list illustrates this projection convention together with the image row-order convention below.
</li>
<li>
<strong>The memory order of image data in OpenGL, and consequently in nvdiffrast, is bottom-up.</strong> This means that row 0 of a tensor containing an image is the bottom row of the texture/image, which is the opposite of the more common scanline order. If you want to keep your image data in the conventional top-down order in your code, but have it logically the right way up inside nvdiffrast, you will need to flip the images vertically when crossing the boundary.
...
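<p>As a concrete illustration of the conventions above, here is a minimal sketch, assuming a symmetric view frustum: a perspective projection matrix in the <code>glFrustum()</code> convention, followed by the vertical flip needed to move image data between the conventional top-down row order and nvdiffrast's bottom-up order. The <code>projection()</code> helper and its parameters are written for this example only and are not part of the nvdiffrast API.</p>
<pre><code>import math
import torch

def projection(fov_x=math.radians(45.0), aspect=1.0, near=0.1, far=100.0):
    # Symmetric view frustum in the glFrustum() convention: view-space z
    # increases towards the viewer, while clip-space z/w increases away
    # from the viewer after the perspective division.
    x = near * math.tan(fov_x * 0.5)  # frustum half-width at the near plane
    y = x / aspect                    # frustum half-height at the near plane
    return torch.tensor([
        [near / x, 0.0,       0.0,                          0.0],
        [0.0,      near / y,  0.0,                          0.0],
        [0.0,      0.0,      -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0,      0.0,      -1.0,                          0.0]], dtype=torch.float32)

# Because nvdiffrast stores images bottom-up, a tensor holding an image in the
# usual top-down scanline order must be flipped vertically at the boundary.
# For an [N, H, W, C] tensor this is a flip along the height dimension:
img_topdown  = torch.rand(1, 256, 256, 3)         # row 0 = top of image
img_bottomup = torch.flip(img_topdown, dims=[1])  # row 0 = bottom of image</code></pre>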
@@ -782,6 +782,7 @@ Third depth layer
<p>In manual mode, the user assumes the responsibility of setting and releasing the OpenGL context. Most of the time, if you don't have any other libraries that would be using OpenGL, you can just set the context once after having created it and keep it set until the program exits. However, keep in mind that the active OpenGL context is a thread-local resource, so it needs to be set in the same CPU thread as it will be used, and it cannot be set simultaneously in multiple CPU threads.</p>
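<p>As a minimal sketch of manual mode under the single-threaded scenario described above, the context is set once after creation and released before the program exits; the device argument is shown only for illustration:</p>
<pre><code>import nvdiffrast.torch as dr

# Create the context in manual mode and activate it in this CPU thread.
glctx = dr.RasterizeGLContext(mode='manual', device='cuda:0')
glctx.set_context()

# ... all rasterization calls made from this thread go here, e.g.:
# rast, _ = dr.rasterize(glctx, pos, tri, resolution=[512, 512])

# Deactivate the context once this thread is done with OpenGL.
glctx.release_context()</code></pre>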
<h2 id="samples">Samples</h2>
<p>Nvdiffrast comes with a set of samples that were crafted to support the research paper. Each sample is available in both PyTorch and TensorFlow versions. Details such as command-line parameters, logging format, etc., may not be identical between the versions, and generally the PyTorch versions should be considered definitive. The command-line examples below are for the PyTorch versions.</p>
<p>Enabling interactive display using the <code>--display-interval</code> parameter works on Windows but is likely to fail on Linux. Our Dockerfile is set up to support headless rendering only, and thus cannot show an interactive result window.</p>
<h3 id="triangle.py">triangle.py</h3>
<p>This is a minimal sample that renders a triangle and saves the resulting image into a file (<code>tri.png</code>) in the current directory. Running this should be the first step to verify that you have everything set up correctly. Rendering is done using the rasterization and interpolation operations, so getting the correct output image means that both OpenGL and CUDA are working as intended under the hood.</p>
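<p>For orientation, the essence of the sample can be sketched as below. This is a simplified, hypothetical version rather than the actual sample code; it assumes only the <code>rasterize()</code> and <code>interpolate()</code> operations and uses <code>imageio</code> for writing the output file:</p>
<pre><code>import imageio
import torch
import nvdiffrast.torch as dr

# One triangle with per-vertex colors; positions are clip-space [x, y, z, w].
pos = torch.tensor([[[-0.8, -0.8, 0.0, 1.0],
                     [ 0.8, -0.8, 0.0, 1.0],
                     [ 0.0,  0.8, 0.0, 1.0]]], dtype=torch.float32, device='cuda')
col = torch.tensor([[[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]]], dtype=torch.float32, device='cuda')
tri = torch.tensor([[0, 1, 2]], dtype=torch.int32, device='cuda')

glctx = dr.RasterizeGLContext()
rast, _ = dr.rasterize(glctx, pos, tri, resolution=[256, 256])
out, _ = dr.interpolate(col, rast, tri)

# Flip from nvdiffrast's bottom-up row order to top-down before saving.
img = out[0].flip(0).clamp(0, 1).cpu().numpy()
imageio.imsave('tri.png', (img * 255).astype('uint8'))</code></pre>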
<p>Example command line:</p>
...
@@ -895,9 +896,16 @@ Interactive view of pose.py
<h2 id="pytorch-api-reference">PyTorch API reference</h2>
<pclass="shortdesc">Create a new OpenGL rasterizer context.</p><pclass="longdesc">Creating an OpenGL context is a slow operation so you should reuse the same
<pclass="shortdesc">Create a new OpenGL rasterizer context.</p><pclass="longdesc">Creating an OpenGL context is a slow operation so you should usually reuse the same
context in all calls to <code>rasterize()</code> on the same CPU thread. The OpenGL context
context in all calls to <code>rasterize()</code> on the same CPU thread. The OpenGL context
is deleted when the object is destroyed.</p><divclass="arguments">Arguments:</div><tableclass="args"><trclass="arg"><tdclass="argname">output_db</td><tdclass="arg_short">Compute and output image-space derivates of barycentrics.</td></tr><trclass="arg"><tdclass="argname">mode</td><tdclass="arg_short">OpenGL context handling mode. Valid values are 'manual' and 'automatic'.</td></tr><trclass="arg"><tdclass="argname">device</td><tdclass="arg_short">Cuda device on which the context is created. Type can be
is deleted when the object is destroyed.</p><pclass="longdesc">Side note: When using the OpenGL context in a rasterization operation, the
context's internal framebuffer object is automatically enlarged to accommodate the
rasterization operation's output shape, but it is never shrunk in size until the
context is destroyed. Thus, if you need to rasterize, say, deep low-resolution
tensors and also shallow high-resolution tensors, you can conserve GPU memory by
creating two separate OpenGL contexts for these tasks. In this scenario, using the
same OpenGL context for both tasks would end up reserving GPU memory for a deep,
high-resolution output tensor.</p><divclass="arguments">Arguments:</div><tableclass="args"><trclass="arg"><tdclass="argname">output_db</td><tdclass="arg_short">Compute and output image-space derivates of barycentrics.</td></tr><trclass="arg"><tdclass="argname">mode</td><tdclass="arg_short">OpenGL context handling mode. Valid values are 'manual' and 'automatic'.</td></tr><trclass="arg"><tdclass="argname">device</td><tdclass="arg_short">Cuda device on which the context is created. Type can be
<code>torch.device</code>, string (e.g., <code>'cuda:1'</code>), or int. If not
specified, context will be created on currently active Cuda
device.</td></tr></table><div class="methods">Methods, only available if context was created in manual mode:</div><table class="args"><tr class="arg"><td class="argname">set_context()</td><td class="arg_short">Set (activate) OpenGL context in the current CPU thread.</td></tr><tr class="arg"><td class="argname">release_context()</td><td class="arg_short">Release (deactivate) currently active OpenGL context.</td></tr></table><div class="returns">Returns:<div class="return_description">The newly created OpenGL rasterizer context.</div></div></div>
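<p>The side note above can be illustrated with a short sketch; the shapes below are hypothetical, chosen only to show the two-context pattern for conserving GPU memory:</p>
<pre><code>import nvdiffrast.torch as dr

# Separate contexts for deep low-resolution batches and shallow
# high-resolution batches. Each context's internal framebuffer grows only
# to the size its own workload needs, whereas a single shared context
# would grow to a deep *and* high-resolution worst case.
glctx_deep    = dr.RasterizeGLContext()  # e.g., minibatch 64 at 128x128
glctx_shallow = dr.RasterizeGLContext()  # e.g., minibatch 1 at 2048x2048

# rast_lo, _ = dr.rasterize(glctx_deep,    pos_batch,  tri, resolution=[128, 128])
# rast_hi, _ = dr.rasterize(glctx_shallow, pos_single, tri, resolution=[2048, 2048])</code></pre>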
...
@@ -1002,9 +1010,9 @@ severity will be silent.</td></tr></table></div>
<p>This work is made available under the <a href="https://github.com/NVlabs/nvdiffrast/blob/main/LICENSE.txt">Nvidia Source Code License</a>.</p>
<p>For business inquiries, please visit our website and submit the form: <a href="https://www.nvidia.com/en-us/research/inquiries/">NVIDIA Research Licensing</a>.</p>
<p>We do not currently accept outside contributions in the form of pull requests.</p>
<p>The environment map stored as part of <code>samples/data/envphong.npz</code> is derived from a Wave Engine <a href="https://github.com/WaveEngine/Samples-2.5/tree/master/Materials/EnvironmentMap/Content/Assets/CubeMap.cubemap">sample material</a> originally shared under the <a href="https://github.com/WaveEngine/Samples-2.5/blob/master/LICENSE.md">MIT License</a>. The mesh and texture stored as part of <code>samples/data/earth.npz</code> are derived from the <a href="https://www.turbosquid.com/3d-models/3d-realistic-earth-photorealistic-2k-1279125">3D Earth Photorealistic 2K</a> model originally made available under the <a href="https://blog.turbosquid.com/turbosquid-3d-model-license/#3d-model-license">TurboSquid 3D Model License</a>.</p>
<h2 id="citation">Citation</h2>
<pre><code>@article{Laine2020diffrast,
  title = {Modular Primitives for High-Performance Differentiable Rendering},