Unverified Commit 5637cc9a authored by Ruilong Li (李瑞龙), committed by GitHub

Optimize examples for better performance (#59)

* zeros -> empty in cuda

* disable cuda hint if it has already been built

* update examples with better perf

* bump version to 0.2.0

* update readme

* update index and readme

* clean up doc
parent 3d958321
......@@ -4,7 +4,20 @@
https://www.nerfacc.com/
NerfAcc is a PyTorch Nerf acceleration toolbox for both training and inference. It focuses on
efficient volumetric rendering of radiance fields, which is universal and plug-and-play for most NeRFs.
Using NerfAcc,
- The `vanilla NeRF` model with 8-layer MLPs can be trained to *better quality* (+~0.5 PSNR) \
in *1 hour* rather than *1~2 days* as in the paper.
- The `Instant-NGP NeRF` model can be trained to *better quality* (+~0.7 PSNR) with *9/10th* of \
the training time (4.5 minutes) compared to the official pure-CUDA implementation.
- The `D-NeRF` model for *dynamic* objects can also be trained in *1 hour* \
rather than *2 days* as in the paper, and with *better quality* (+~0.5 PSNR).
- Both *bounded* and *unbounded* scenes are supported.

**And it is a pure Python interface with flexible APIs!**
## Installation
......@@ -12,31 +25,98 @@ This is a **tiny** toolbox for **accelerating** NeRF training & rendering using
pip install nerfacc
```
## Usage
The idea of NerfAcc is to perform efficient ray marching and volumetric rendering. So NerfAcc can work with any user-defined radiance field. To plug the NerfAcc rendering pipeline into your code and enjoy the acceleration, you only need to define two functions with your radiance field.
- `sigma_fn`: Compute density at each sample. It will be used by `nerfacc.ray_marching()` to skip the empty and occluded space during ray marching, which is where the major speedup comes from.
- `rgb_sigma_fn`: Compute color and density at each sample. It will be used by `nerfacc.rendering()` to conduct differentiable volumetric rendering. This function will receive gradients to update your network.
A simple example:
``` python
import torch
import torch.nn.functional as F
from typing import Tuple
from torch import Tensor

import nerfacc

radiance_field = ...  # network: a NeRF model
optimizer = ...  # network optimizer
rays_o: Tensor = ...  # ray origins. (n_rays, 3)
rays_d: Tensor = ...  # ray normalized directions. (n_rays, 3)
color_gt: Tensor = ...  # ground-truth pixel colors. (n_rays, 3)

def sigma_fn(
    t_starts: Tensor, t_ends: Tensor, ray_indices: Tensor
) -> Tensor:
    """Query density values from a user-defined radiance field.
    :params t_starts: Start of the sample interval along the ray. (n_samples, 1).
    :params t_ends: End of the sample interval along the ray. (n_samples, 1).
    :params ray_indices: Ray indices that each sample belongs to. (n_samples,).
    :returns: The post-activation density values. (n_samples, 1).
    """
    t_origins = rays_o[ray_indices]  # (n_samples, 3)
    t_dirs = rays_d[ray_indices]  # (n_samples, 3)
    # evaluate the field at the midpoint of each sample interval
    positions = t_origins + t_dirs * (t_starts + t_ends) / 2.0
    sigmas = radiance_field.query_density(positions)
    return sigmas  # (n_samples, 1)

def rgb_sigma_fn(
    t_starts: Tensor, t_ends: Tensor, ray_indices: Tensor
) -> Tuple[Tensor, Tensor]:
    """Query rgb and density values from a user-defined radiance field.
    :params t_starts: Start of the sample interval along the ray. (n_samples, 1).
    :params t_ends: End of the sample interval along the ray. (n_samples, 1).
    :params ray_indices: Ray indices that each sample belongs to. (n_samples,).
    :returns: The post-activation rgb and density values.
        (n_samples, 3), (n_samples, 1).
    """
    t_origins = rays_o[ray_indices]  # (n_samples, 3)
    t_dirs = rays_d[ray_indices]  # (n_samples, 3)
    positions = t_origins + t_dirs * (t_starts + t_ends) / 2.0
    rgbs, sigmas = radiance_field(positions, condition=t_dirs)
    return rgbs, sigmas  # (n_samples, 3), (n_samples, 1)

# Efficient Raymarching: Skip empty and occluded space, pack samples from all rays.
# packed_info: (n_rays, 2). t_starts: (n_samples, 1). t_ends: (n_samples, 1).
packed_info, t_starts, t_ends = nerfacc.ray_marching(
    rays_o, rays_d, sigma_fn=sigma_fn, near_plane=0.2, far_plane=1.0,
    early_stop_eps=1e-4, alpha_thre=1e-2,
)

# Differentiable Volumetric Rendering.
# colors: (n_rays, 3). opacity: (n_rays, 1). depth: (n_rays, 1).
color, opacity, depth = nerfacc.rendering(rgb_sigma_fn, packed_info, t_starts, t_ends)

# Optimize the radiance field.
optimizer.zero_grad()
loss = F.mse_loss(color, color_gt)
loss.backward()
optimizer.step()
```
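If you need the per-sample ray indices outside of those two callbacks (say, for a custom per-ray loss), the packed samples can be mapped back to their rays. A minimal sketch using `nerfacc.unpack_info`, the same helper that produces the `ray_indices` passed to your callbacks:

``` python
# packed_info stores (start, count) per ray; unpack_info expands it into
# one ray index per packed sample.
ray_indices = nerfacc.unpack_info(packed_info)  # (n_samples,)
sample_origins = rays_o[ray_indices.long()]     # (n_samples, 3)
```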
## Examples:
Before running the example scripts, please check each script for the dataset it requires, and
download that dataset first.
``` bash
# Instant-NGP NeRF in 4.5 minutes with better performance!
# See results here: https://www.nerfacc.com/en/latest/examples/ngp.html
python examples/train_ngp_nerf.py --train_split trainval --scene lego
```
``` bash
# Vanilla MLP NeRF in 1 hour with better performance!
# See results here: https://www.nerfacc.com/en/latest/examples/vanilla.html
python examples/train_mlp_nerf.py --train_split train --scene lego
```
```bash
# D-NeRF for Dynamic objects in 1 hour with better performance!
# See results here: https://www.nerfacc.com/en/latest/examples/dnerf.html
python examples/train_mlp_dnerf.py --train_split train --scene lego
```
```bash
# Instant-NGP on unbounded scenes in 20 minutes!
# See results here: https://www.nerfacc.com/en/latest/examples/unbounded.html
python examples/train_ngp_nerf.py --train_split train --scene garden --auto_aabb --unbounded --cone_angle=0.004
```
......@@ -9,7 +9,7 @@ Benchmarks
Here we trained an 8-layer MLP for the radiance field and a 4-layer MLP for the warping field
(similar to the T-Nerf model in the `D-Nerf`_ paper) on the `D-Nerf dataset`_. We used the train
split for training and the test split for evaluation. Our experiments are conducted on a
single NVIDIA TITAN RTX GPU. The training memory footprint is about 11GB.
.. note::
......@@ -19,12 +19,12 @@ single NVIDIA TITAN RTX GPU.
It is not optimal but still makes the rendering very efficient.
+----------------------+----------+---------+-------+---------+-------+--------+---------+-------+-------+
| PSNR                 | bouncing | hell    | hook  | jumping | lego  | mutant | standup | trex  | MEAN  |
|                      | balls    | warrior |       | jacks   |       |        |         |       |       |
+======================+==========+=========+=======+=========+=======+========+=========+=======+=======+
| D-Nerf (~ days)      | 38.93    | 25.02   | 29.25 | 32.80   | 21.64 | 31.29  | 32.79   | 31.75 | 30.43 |
+----------------------+----------+---------+-------+---------+-------+--------+---------+-------+-------+
| Ours (~ 50min)       | 39.60    | 22.41   | 30.64 | 29.79   | 24.75 | 35.20  | 34.50   | 31.83 | 31.09 |
+----------------------+----------+---------+-------+---------+-------+--------+---------+-------+-------+
| Ours (Training time) | 45min    | 49min   | 51min | 46min   | 53min | 57min  | 49min   | 46min | 50min |
+----------------------+----------+---------+-------+---------+-------+--------+---------+-------+-------+
......
......@@ -7,6 +7,7 @@ See code `examples/train_ngp_nerf.py` at our `github repository`_ for details.
Benchmarks
------------
*updated on 2022-10-08*
Here we trained an `Instant-NGP Nerf`_ model on the `Nerf-Synthetic dataset`_. We follow the same
settings as the Instant-NGP paper, which uses the trainval split for training and the test split for
......@@ -18,16 +19,16 @@ memory footprint is about 3GB.
The Instant-NGP paper makes use of the alpha channel in the images to apply random background
augmentation during training, whereas we use only the RGB values with a constant white background.
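A rough sketch of the two strategies (the tensor names and the presence of an alpha
channel are assumptions for illustration, not the script's actual variables):

.. code-block:: python

    # rgb: (N, 3) foreground color, alpha: (N, 1) coverage, both in [0, 1].
    # random background augmentation, as in the Instant-NGP paper:
    bkgd = torch.rand(3, device=rgb.device)
    rgb_train = rgb * alpha + bkgd * (1.0 - alpha)
    # constant white background, as used for these benchmarks:
    rgb_train = rgb * alpha + (1.0 - alpha)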
+----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
| PSNR                 | Lego  | Mic   |Materials| Chair |Hotdog | Ficus | Drums | Ship  | MEAN  |
+======================+=======+=======+=========+=======+=======+=======+=======+=======+=======+
| Instant-NGP (5min)   | 36.39 | 36.22 | 29.78   | 35.00 | 37.40 | 33.51 | 26.02 | 31.10 | 33.18 |
+----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
| Ours (~4.5min)       | 36.82 | 37.61 | 30.18   | 36.13 | 38.11 | 34.48 | 26.62 | 31.37 | 33.92 |
+----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
| Ours (Training time) | 288s  | 259s  | 256s    | 324s  | 288s  | 245s  | 262s  | 257s  | 272s  |
+----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
.. _`Instant-NGP Nerf`: https://arxiv.org/abs/2201.05989
.. _`github repository`: https://github.com/KAIR-BAIR/nerfacc/
......
......@@ -5,10 +5,11 @@ See code `examples/train_ngp_nerf.py` at our `github repository`_ for details.
Benchmarks
------------
*updated on 2022-10-08*
Here we trained an `Instant-NGP Nerf`_ on the `MipNerf360`_ dataset. We used the train
split for training and the test split for evaluation. Our experiments are conducted on a
single NVIDIA TITAN RTX GPU. The training memory footprint is about 6-9GB.
The main difference between working with unbounded scenes and bounded scenes is that
a contraction method is needed to map the infinite space into a finite :ref:`Occupancy Grid`.
......@@ -23,18 +24,18 @@ that takes from `MipNerf360`_.
show how to use the library, we didn't want to make it too complicated.
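A minimal sketch of such a contraction, in the spirit of the MipNerf360-style
``contract_to_unisphere`` used by the example code (this standalone helper is an
illustration, not the library's exact function):

.. code-block:: python

    import torch

    def contract_unisphere(x: torch.Tensor) -> torch.Tensor:
        # identity inside the unit sphere; points outside map to
        # (2 - 1/|x|) * x/|x|, so all of space lands in a ball of radius 2.
        norm = x.norm(dim=-1, keepdim=True).clamp(min=1e-6)
        return torch.where(norm <= 1.0, x, (2.0 - 1.0 / norm) * (x / norm))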
+----------------------+-------+-------+-------+-------+-------+-------+-------+-------+
| PSNR                 |Garden |Bicycle|Bonsai |Counter|Kitchen| Room  | Stump | MEAN  |
+======================+=======+=======+=======+=======+=======+=======+=======+=======+
| Nerf++ (~days)       | 24.32 | 22.64 | 29.15 | 26.38 | 27.80 | 28.87 | 24.34 | 26.21 |
+----------------------+-------+-------+-------+-------+-------+-------+-------+-------+
| MipNerf360 (~days)   | 26.98 | 24.37 | 33.46 | 29.55 | 32.23 | 31.63 | 28.65 | 29.55 |
+----------------------+-------+-------+-------+-------+-------+-------+-------+-------+
| Ours (~20 mins)      | 25.41 | 22.97 | 30.71 | 27.34 | 30.32 | 31.00 | 23.43 | 27.31 |
+----------------------+-------+-------+-------+-------+-------+-------+-------+-------+
| Ours (Training time) | 25min | 17min | 19min | 23min | 28min | 20min | 17min | 21min |
+----------------------+-------+-------+-------+-------+-------+-------+-------+-------+
.. _`Instant-NGP Nerf`: https://arxiv.org/abs/2201.05989
.. _`MipNerf360`: https://arxiv.org/abs/2111.12077
......
......@@ -8,7 +8,7 @@ Benchmarks
Here we trained an 8-layer MLP for the radiance field as in the `vanilla Nerf`_ paper. We used the
train split for training and the test split for evaluation as in the Nerf paper. Our experiments are
conducted on a single NVIDIA TITAN RTX GPU. The training memory footprint is about 10GB.
.. note::
The vanilla Nerf paper uses two MLPs for coarse-to-fine sampling. Instead here we only use a
......@@ -17,16 +17,16 @@ conducted on a single NVIDIA TITAN RTX GPU.
so we can simply increase the number of samples with a single MLP to achieve the same goal
as coarse-to-fine sampling, without runtime or memory issues.
+----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
| PSNR                 | Lego  | Mic   |Materials| Chair |Hotdog | Ficus | Drums | Ship  | MEAN  |
+======================+=======+=======+=========+=======+=======+=======+=======+=======+=======+
| NeRF (~ days)        | 32.54 | 32.91 | 29.62   | 33.00 | 36.18 | 30.13 | 25.01 | 28.65 | 31.00 |
+----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
| Ours (~ 50min)       | 33.69 | 33.76 | 29.73   | 33.32 | 35.80 | 32.52 | 25.39 | 28.18 | 31.55 |
+----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
| Ours (Training time) | 58min | 53min | 46min   | 62min | 56min | 42min | 52min | 49min | 52min |
+----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
.. _`github repository`: https://github.com/KAIR-BAIR/nerfacc/
.. _`vanilla Nerf`: https://arxiv.org/abs/2003.08934
NerfAcc Documentation
===================================
NerfAcc is a PyTorch Nerf acceleration toolbox for both training and inference. It focuses on
efficient volumetric rendering of radiance fields, which is universal and plug-and-play for most NeRFs.
Using NerfAcc,
- The `vanilla Nerf`_ model with 8-layer MLPs can be trained to *better quality* (+~0.5 PSNR) \
in *1 hour* rather than *1~2 days* as in the paper.
- The `Instant-NGP Nerf`_ model can be trained to *better quality* (+~0.7 PSNR) with *9/10th* of \
the training time (4.5 minutes) compared to the official pure-CUDA implementation.
- The `D-Nerf`_ model for *dynamic* objects can also be trained in *1 hour* \
rather than *2 days* as in the paper, and with *better quality* (+~0.5 PSNR).
- Both *bounded* and *unbounded* scenes are supported.

**And it is a pure Python interface with flexible APIs!**
| Github: https://github.com/KAIR-BAIR/nerfacc
| Authors: `Ruilong Li`_, `Matthew Tancik`_, `Angjoo Kanazawa`_
.. note::
This repo focuses on the single-scene setting. Generalizable Nerfs across
multiple scenes are currently out of the scope of this repo, but you may still find
some useful tricks here. :)
Installation:
-------------
......@@ -28,6 +33,82 @@ Installation:
$ pip install nerfacc
Usage:
-------------
The idea of NerfAcc is to perform efficient ray marching and volumetric rendering.
So NerfAcc can work with any user-defined radiance field. To plug the NerfAcc rendering
pipeline into your code and enjoy the acceleration, you only need to define two functions
with your radiance field.
- `sigma_fn`: Compute density at each sample. It will be used by :func:`nerfacc.ray_marching` to skip the empty and occluded space during ray marching, which is where the major speedup comes from.
- `rgb_sigma_fn`: Compute color and density at each sample. It will be used by :func:`nerfacc.rendering` to conduct differentiable volumetric rendering. This function will receive gradients to update your network.
A simple example:
.. code-block:: python

    import torch
    import torch.nn.functional as F
    from typing import Tuple
    from torch import Tensor

    import nerfacc

    radiance_field = ...  # network: a NeRF model
    optimizer = ...  # network optimizer
    rays_o: Tensor = ...  # ray origins. (n_rays, 3)
    rays_d: Tensor = ...  # ray normalized directions. (n_rays, 3)
    color_gt: Tensor = ...  # ground-truth pixel colors. (n_rays, 3)

    def sigma_fn(
        t_starts: Tensor, t_ends: Tensor, ray_indices: Tensor
    ) -> Tensor:
        """Query density values from a user-defined radiance field.
        :params t_starts: Start of the sample interval along the ray. (n_samples, 1).
        :params t_ends: End of the sample interval along the ray. (n_samples, 1).
        :params ray_indices: Ray indices that each sample belongs to. (n_samples,).
        :returns: The post-activation density values. (n_samples, 1).
        """
        t_origins = rays_o[ray_indices]  # (n_samples, 3)
        t_dirs = rays_d[ray_indices]  # (n_samples, 3)
        # evaluate the field at the midpoint of each sample interval
        positions = t_origins + t_dirs * (t_starts + t_ends) / 2.0
        sigmas = radiance_field.query_density(positions)
        return sigmas  # (n_samples, 1)

    def rgb_sigma_fn(
        t_starts: Tensor, t_ends: Tensor, ray_indices: Tensor
    ) -> Tuple[Tensor, Tensor]:
        """Query rgb and density values from a user-defined radiance field.
        :params t_starts: Start of the sample interval along the ray. (n_samples, 1).
        :params t_ends: End of the sample interval along the ray. (n_samples, 1).
        :params ray_indices: Ray indices that each sample belongs to. (n_samples,).
        :returns: The post-activation rgb and density values.
            (n_samples, 3), (n_samples, 1).
        """
        t_origins = rays_o[ray_indices]  # (n_samples, 3)
        t_dirs = rays_d[ray_indices]  # (n_samples, 3)
        positions = t_origins + t_dirs * (t_starts + t_ends) / 2.0
        rgbs, sigmas = radiance_field(positions, condition=t_dirs)
        return rgbs, sigmas  # (n_samples, 3), (n_samples, 1)

    # Efficient Raymarching: Skip empty and occluded space, pack samples from all rays.
    # packed_info: (n_rays, 2). t_starts: (n_samples, 1). t_ends: (n_samples, 1).
    packed_info, t_starts, t_ends = nerfacc.ray_marching(
        rays_o, rays_d, sigma_fn=sigma_fn, near_plane=0.2, far_plane=1.0,
        early_stop_eps=1e-4, alpha_thre=1e-2,
    )

    # Differentiable Volumetric Rendering.
    # colors: (n_rays, 3). opacity: (n_rays, 1). depth: (n_rays, 1).
    color, opacity, depth = nerfacc.rendering(rgb_sigma_fn, packed_info, t_starts, t_ends)

    # Optimize the radiance field.
    optimizer.zero_grad()
    loss = F.mse_loss(color, color_gt)
    loss.backward()
    optimizer.step()
Links:
-------------
.. toctree::
:glob:
:maxdepth: 1
......@@ -53,4 +134,9 @@ Installation:
.. _`Instant-NGP Nerf`: https://arxiv.org/abs/2201.05989
.. _`D-Nerf`: https://arxiv.org/abs/2011.13961
.. _`MipNerf360`: https://arxiv.org/abs/2111.12077
.. _`pixel-Nerf`: https://arxiv.org/abs/2012.02190
.. _`Nerf++`: https://arxiv.org/abs/2010.07492
.. _`Ruilong Li`: https://www.liruilong.cn/
.. _`Matthew Tancik`: https://www.matthewtancik.com/
.. _`Angjoo Kanazawa`: https://people.eecs.berkeley.edu/~kanazawa/
......@@ -248,8 +248,8 @@ class VanillaNeRFRadianceField(nn.Module):
class DNeRFRadianceField(nn.Module):
def __init__(self) -> None:
super().__init__()
self.posi_encoder = SinusoidalEncoder(3, 0, 4, True)
self.time_encoder = SinusoidalEncoder(1, 0, 4, True)
self.warp = MLP(
input_dim=self.posi_encoder.latent_dim
+ self.time_encoder.latent_dim,
......
......@@ -141,17 +141,6 @@ class NGPradianceField(torch.nn.Module):
},
)
def query_opacity(self, x, step_size):
density = self.query_density(x)
if self.unbounded:
# NOTE: In principle, we should use the following formula to scale
# up the step size, but in practice, it is somehow not helpful.
# derivative = contract_to_unisphere(x, self.aabb, derivative=True)
# step_size = step_size / derivative.norm(dim=-1, keepdim=True)
pass
opacity = density * step_size
return opacity
def query_density(self, x, return_feat: bool = False):
if self.unbounded:
x = contract_to_unisphere(x, self.aabb)
......
......@@ -2,4 +2,5 @@ git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
opencv-python
imageio
numpy
tqdm
scipy
......@@ -76,7 +76,7 @@ if __name__ == "__main__":
).item()
# setup the radiance field we want to train.
max_steps = 30000
grad_scaler = torch.cuda.amp.GradScaler(1)
radiance_field = DNeRFRadianceField().to(device)
optimizer = torch.optim.Adam(radiance_field.parameters(), lr=5e-4)
......@@ -156,9 +156,12 @@ if __name__ == "__main__":
render_step_size=render_step_size,
render_bkgd=render_bkgd,
cone_angle=args.cone_angle,
alpha_thre=0.01 if step > 1000 else 0.00,
# dnerf options
timestamps=timestamps,
)
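# if ray marching produced no samples (e.g. everything was pruned by the
# occupancy grid), skip this iteration: there is nothing to render or optimize.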
if n_rendering_samples == 0:
continue
# dynamic batch size for rays to keep sample batch size constant.
num_rays = len(pixels)
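# a sketch of the rescaling this refers to; target_sample_batch_size and
# update_num_rays are assumed names, not necessarily the script's exact ones:
# num_rays = int(num_rays * target_sample_batch_size / max(n_rendering_samples, 1))
# train_dataset.update_num_rays(num_rays)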
......@@ -213,6 +216,7 @@ if __name__ == "__main__":
render_step_size=render_step_size,
render_bkgd=render_bkgd,
cone_angle=args.cone_angle,
alpha_thre=0.01,
# test options
test_chunk_size=args.test_chunk_size,
# dnerf options
......
......@@ -186,6 +186,8 @@ if __name__ == "__main__":
render_bkgd=render_bkgd,
cone_angle=args.cone_angle,
)
if n_rendering_samples == 0:
continue
# dynamic batch size for rays to keep sample batch size constant.
num_rays = len(pixels)
......
......@@ -140,6 +140,7 @@ if __name__ == "__main__":
near_plane = 0.2
far_plane = 1e4
render_step_size = 1e-2
alpha_thre = 1e-2
else:
contraction_type = ContractionType.AABB
scene_aabb = torch.tensor(args.aabb, dtype=torch.float32, device=device)
......@@ -150,9 +151,10 @@ if __name__ == "__main__":
* math.sqrt(3)
/ render_n_samples
).item()
alpha_thre = 0.0
# setup the radiance field we want to train.
max_steps = 20000
grad_scaler = torch.cuda.amp.GradScaler(2**10)
radiance_field = NGPradianceField(
aabb=args.aabb,
......@@ -185,13 +187,33 @@ if __name__ == "__main__":
rays = data["rays"]
pixels = data["pixels"]
def occ_eval_fn(x):
if args.cone_angle > 0.0:
# randomly sample a camera for computing step size.
camera_ids = torch.randint(
0, len(train_dataset), (x.shape[0],), device=device
)
origins = train_dataset.camtoworlds[camera_ids, :3, -1]
t = (origins - x).norm(dim=-1, keepdim=True)
# compute actual step size used in marching, based on the distance to the camera.
step_size = torch.clamp(
t * args.cone_angle, min=render_step_size
)
# filter out the points that are outside the near/far range.
if (near_plane is not None) and (far_plane is not None):
step_size = torch.where(
(t > near_plane) & (t < far_plane),
step_size,
torch.zeros_like(step_size),
)
else:
step_size = render_step_size
# compute occupancy
density = radiance_field.query_density(x)
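# over a short step, opacity = 1 - exp(-density * step_size) ≈ density * step_size,
# so the product below is used directly as the occupancy estimate.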
return density * step_size
# update occupancy grid
occupancy_grid.every_n_step(step=step, occ_eval_fn=occ_eval_fn)
# render
rgb, acc, depth, n_rendering_samples = render_image(
......@@ -205,7 +227,10 @@ if __name__ == "__main__":
render_step_size=render_step_size,
render_bkgd=render_bkgd,
cone_angle=args.cone_angle,
alpha_thre=alpha_thre,
)
if n_rendering_samples == 0:
continue
# dynamic batch size for rays to keep sample batch size constant.
num_rays = len(pixels)
......@@ -254,11 +279,12 @@ if __name__ == "__main__":
rays,
scene_aabb,
# rendering options
near_plane=near_plane,
far_plane=far_plane,
render_step_size=render_step_size,
render_bkgd=render_bkgd,
cone_angle=args.cone_angle,
alpha_thre=alpha_thre,
# test options
test_chunk_size=args.test_chunk_size,
)
......
......@@ -30,6 +30,7 @@ def render_image(
render_step_size: float = 1e-3,
render_bkgd: Optional[torch.Tensor] = None,
cone_angle: float = 0.0,
alpha_thre: float = 0.0,
# test options
test_chunk_size: int = 8192,
# only useful for dnerf
......@@ -95,6 +96,7 @@ def render_image(
render_step_size=render_step_size,
stratified=radiance_field.training,
cone_angle=cone_angle,
alpha_thre=alpha_thre,
)
rgb, opacity, depth = rendering(
rgb_sigma_fn,
......
......@@ -50,4 +50,5 @@ __all__ = [
"unpack_info",
"ray_resampling",
"loss_distortion",
"unpack_to_ray_indices",
]
......@@ -7,7 +7,7 @@ import os
from subprocess import DEVNULL, call
from rich.console import Console
from torch.utils.cpp_extension import _get_build_directory, load
PATH = os.path.dirname(os.path.abspath(__file__))
......@@ -21,21 +21,32 @@ def cuda_toolkit_available():
return False
def load_extention(name: str):
return load(
name=name,
sources=glob.glob(os.path.join(PATH, "csrc/*.cu")),
extra_cflags=["-O3"],
extra_cuda_cflags=["-O3"],
)
_C = None
name = "nerfacc_cuda"
if os.listdir(_get_build_directory(name, verbose=False)) != []:
# If the build exists, we assume the extension has been built
# and we can load it.
_C = load_extention(name)
else:
# First time to build the extension
if cuda_toolkit_available():
with Console().status(
"[bold yellow]NerfAcc: Setting up CUDA (This may take a few minutes the first time)",
spinner="bouncingBall",
):
_C = load_extention(name)
else:
Console().print(
"[yellow]NerfAcc: No CUDA toolkit found. NerfAcc will be disabled.[/yellow]"
)
__all__ = ["_C"]
......@@ -73,7 +73,7 @@ torch::Tensor contract(
const int threads = 256;
const int blocks = CUDA_N_BLOCKS_NEEDED(n_samples, threads);
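// NOTE: empty() skips zero-initialization (unlike zeros()); this is safe here
// because contract_kernel below writes every one of the n_samples output rows.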
torch::Tensor out_samples = torch::empty({n_samples, 3}, samples.options());
contract_kernel<<<blocks, threads, 0, at::cuda::getCurrentCUDAStream()>>>(
n_samples,
......@@ -99,7 +99,7 @@ torch::Tensor contract_inv(
const int threads = 256;
const int blocks = CUDA_N_BLOCKS_NEEDED(n_samples, threads);
torch::Tensor out_samples = torch::empty({n_samples, 3}, samples.options());
contract_inv_kernel<<<blocks, threads, 0, at::cuda::getCurrentCUDAStream()>>>(
n_samples,
......
......@@ -91,7 +91,7 @@ torch::Tensor unpack_info(const torch::Tensor packed_info)
const int blocks = CUDA_N_BLOCKS_NEEDED(n_rays, threads);
int n_samples = packed_info[n_rays - 1].sum(0).item<int>();
torch::Tensor ray_indices = torch::empty(
{n_samples}, packed_info.options().dtype(torch::kInt32));
unpack_info_kernel<<<blocks, threads, 0, at::cuda::getCurrentCUDAStream()>>>(
......
......@@ -223,7 +223,7 @@ std::vector<torch::Tensor> ray_marching(
const int blocks = CUDA_N_BLOCKS_NEEDED(n_rays, threads);
// helper counter
torch::Tensor num_steps = torch::empty(
{n_rays}, rays_o.options().dtype(torch::kInt32));
// count number of samples per ray
......@@ -253,8 +253,8 @@ std::vector<torch::Tensor> ray_marching(
// output samples starts and ends
int total_steps = cum_steps[cum_steps.size(0) - 1].item<int>();
torch::Tensor t_starts = torch::empty({total_steps, 1}, rays_o.options());
torch::Tensor t_ends = torch::empty({total_steps, 1}, rays_o.options());
ray_marching_kernel<<<blocks, threads, 0, at::cuda::getCurrentCUDAStream()>>>(
// rays
......@@ -328,7 +328,7 @@ torch::Tensor grid_query(
const int threads = 256;
const int blocks = CUDA_N_BLOCKS_NEEDED(n_samples, threads);
torch::Tensor occs = torch::empty({n_samples}, grid_value.options());
AT_DISPATCH_FLOATING_TYPES_AND(
at::ScalarType::Bool,
......
......@@ -187,7 +187,7 @@ def ray_marching(
if sigma_fn is not None:
# Query sigma without gradients
ray_indices = unpack_info(packed_info)
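# unpack_info yields int32 indices; .long() casts to int64, which PyTorch
# advanced indexing (e.g. rays_o[ray_indices] inside sigma_fn) expects.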
sigmas = sigma_fn(t_starts, t_ends, ray_indices.long())
assert (
sigmas.shape == t_starts.shape
), "sigmas must have shape of (N, 1)! Got {}".format(sigmas.shape)
......
......@@ -80,7 +80,7 @@ def rendering(
ray_indices = unpack_info(packed_info)
# Query sigma and color with gradients
rgbs, sigmas = rgb_sigma_fn(t_starts, t_ends, ray_indices.long())
assert rgbs.shape[-1] == 3, "rgbs must have 3 channels, got {}".format(
rgbs.shape
)
......