Unverified commit 5637cc9a authored by Ruilong Li (李瑞龙), committed by GitHub

Optimize examples for better performance (#59)

* zeros -> empty in cuda

* disable cuda hint if it has already been built

* update examples with better perf

* bump version to 0.2.0

* update readme

* update index and readme

* clean up doc
parent 3d958321
@@ -4,7 +4,20 @@
 https://www.nerfacc.com/

-This is a **tiny** toolbox for **accelerating** NeRF training & rendering using PyTorch CUDA extensions. Plug-and-play for most of the NeRFs!
+NerfAcc is a PyTorch NeRF acceleration toolbox for both training and inference. It focuses on
+efficient volumetric rendering of radiance fields, which is universal and plug-and-play for most NeRFs.
+
+Using NerfAcc,
+
+- The `vanilla NeRF` model with 8-layer MLPs can be trained to *better quality* (+~0.5 PSNR) \
+  in *1 hour* rather than *1~2 days* as in the paper.
+- The `Instant-NGP NeRF` model can be trained to *better quality* (+~0.7 PSNR) with *9/10th* of \
+  the training time (4.5 minutes) compared to the official pure-CUDA implementation.
+- The `D-NeRF` model for *dynamic* objects can also be trained in *1 hour* \
+  rather than *2 days* as in the paper, and with *better quality* (+~0.5 PSNR).
+- Both *bounded* and *unbounded* scenes are supported.
+
+**And it is a pure Python interface with flexible APIs!**

 ## Installation

@@ -12,31 +25,98 @@
 pip install nerfacc
 ```
+## Usage
+
+The idea of NerfAcc is to perform efficient ray marching and volumetric rendering. So NerfAcc can work with any user-defined radiance field. To plug the NerfAcc rendering pipeline into your code and enjoy the acceleration, you only need to define two functions with your radiance field.
+
+- `sigma_fn`: Computes density at each sample. It is used by `nerfacc.ray_marching()` to skip the empty and occluded space during ray marching, which is where the major speedup comes from.
+- `rgb_sigma_fn`: Computes color and density at each sample. It is used by `nerfacc.rendering()` to conduct differentiable volumetric rendering. This function will receive gradients to update your network.
+
+A simple example looks like this:
+
+``` python
+from typing import Tuple
+
+import torch
+import torch.nn.functional as F
+from torch import Tensor
+import nerfacc
+
+radiance_field = ...  # network: a NeRF model
+optimizer = ...  # network optimizer
+rays_o: Tensor = ...  # ray origins. (n_rays, 3)
+rays_d: Tensor = ...  # ray normalized directions. (n_rays, 3)
+color_gt: Tensor = ...  # ground-truth pixel colors. (n_rays, 3)
+
+def sigma_fn(
+    t_starts: Tensor, t_ends: Tensor, ray_indices: Tensor
+) -> Tensor:
+    """Query density values from a user-defined radiance field.
+
+    :param t_starts: Start of the sample interval along the ray. (n_samples, 1).
+    :param t_ends: End of the sample interval along the ray. (n_samples, 1).
+    :param ray_indices: Ray index that each sample belongs to. (n_samples,).
+    :return: The post-activation density values. (n_samples, 1).
+    """
+    t_origins = rays_o[ray_indices]  # (n_samples, 3)
+    t_dirs = rays_d[ray_indices]  # (n_samples, 3)
+    positions = t_origins + t_dirs * (t_starts + t_ends) / 2.0
+    sigmas = radiance_field.query_density(positions)
+    return sigmas  # (n_samples, 1)
+
+def rgb_sigma_fn(
+    t_starts: Tensor, t_ends: Tensor, ray_indices: Tensor
+) -> Tuple[Tensor, Tensor]:
+    """Query RGB and density values from a user-defined radiance field.
+
+    :param t_starts: Start of the sample interval along the ray. (n_samples, 1).
+    :param t_ends: End of the sample interval along the ray. (n_samples, 1).
+    :param ray_indices: Ray index that each sample belongs to. (n_samples,).
+    :return: The post-activation RGB and density values.
+        (n_samples, 3), (n_samples, 1).
+    """
+    t_origins = rays_o[ray_indices]  # (n_samples, 3)
+    t_dirs = rays_d[ray_indices]  # (n_samples, 3)
+    positions = t_origins + t_dirs * (t_starts + t_ends) / 2.0
+    rgbs, sigmas = radiance_field(positions, condition=t_dirs)
+    return rgbs, sigmas  # (n_samples, 3), (n_samples, 1)
+
+# Efficient ray marching: skip empty and occluded space, pack samples from all rays.
+# packed_info: (n_rays, 2). t_starts: (n_samples, 1). t_ends: (n_samples, 1).
+packed_info, t_starts, t_ends = nerfacc.ray_marching(
+    rays_o, rays_d, sigma_fn=sigma_fn, near_plane=0.2, far_plane=1.0,
+    early_stop_eps=1e-4, alpha_thre=1e-2,
+)
+
+# Differentiable volumetric rendering.
+# color: (n_rays, 3). opacity: (n_rays, 1). depth: (n_rays, 1).
+color, opacity, depth = nerfacc.rendering(rgb_sigma_fn, packed_info, t_starts, t_ends)
+
+# Optimize the radiance field.
+optimizer.zero_grad()
+loss = F.mse_loss(color, color_gt)
+loss.backward()
+optimizer.step()
+```
 ## Examples:

 Before running those example scripts, please check which dataset each script requires, and download
 the dataset first.

 ``` bash
-# Instant-NGP NeRF in 4.5 minutes.
+# Instant-NGP NeRF in 4.5 minutes with better performance!
 # See results here: https://www.nerfacc.com/en/latest/examples/ngp.html
 python examples/train_ngp_nerf.py --train_split trainval --scene lego
 ```

 ``` bash
-# Vanilla MLP NeRF in 1 hour.
+# Vanilla MLP NeRF in 1 hour with better performance!
 # See results here: https://www.nerfacc.com/en/latest/examples/vanilla.html
 python examples/train_mlp_nerf.py --train_split train --scene lego
 ```

 ```bash
-# T-NeRF for dynamic objects in 1 hour.
+# D-NeRF for dynamic objects in 1 hour with better performance!
 # See results here: https://www.nerfacc.com/en/latest/examples/dnerf.html
 python examples/train_mlp_dnerf.py --train_split train --scene lego
 ```

 ```bash
-# Unbounded scene in 1 hour.
+# Instant-NGP on unbounded scenes in 20 minutes!
 # See results here: https://www.nerfacc.com/en/latest/examples/unbounded.html
 python examples/train_ngp_nerf.py --train_split train --scene garden --auto_aabb --unbounded --cone_angle=0.004
 ```
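A note on the packed layout in the Usage example above: `ray_marching()` returns flattened samples from all rays, and `packed_info` records per ray where its samples start and how many there are. A minimal sketch of recovering per-sample ray associations with `nerfacc.unpack_info` (exported by the package, as the `__all__` change further down shows; `rays_o` and `packed_info` are assumed to come from the snippet above):

``` python
import nerfacc

# packed_info: (n_rays, 2) integer tensor describing the packed samples.
ray_indices = nerfacc.unpack_info(packed_info)  # (n_samples,), int32
# PyTorch advanced indexing needs int64 indices, hence the cast.
t_origins = rays_o[ray_indices.long()]          # (n_samples, 3)
```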
@@ -9,7 +9,7 @@ Benchmarks

 Here we trained an 8-layer MLP for the radiance field and a 4-layer MLP for the warping field
 (similar to the T-Nerf model in the `D-Nerf`_ paper) on the `D-Nerf dataset`_. We used the train
 split for training and the test split for evaluation. Our experiments are conducted on a
-single NVIDIA TITAN RTX GPU.
+single NVIDIA TITAN RTX GPU. The training memory footprint is about 11GB.

 .. note::

@@ -19,12 +19,12 @@ single NVIDIA TITAN RTX GPU.
 It is not optimal but still makes the rendering very efficient.

 +----------------------+----------+---------+-------+---------+-------+--------+---------+-------+-------+
-|                      | bouncing | hell    | hook  | jumping | lego  | mutant | standup | trex  | AVG   |
+| PSNR                 | bouncing | hell    | hook  | jumping | lego  | mutant | standup | trex  | MEAN  |
 |                      | balls    | warrior |       | jacks   |       |        |         |       |       |
 +======================+==========+=========+=======+=========+=======+========+=========+=======+=======+
-| D-Nerf (PSNR: ~2day) | 38.93    | 25.02   | 29.25 | 32.80   | 21.64 | 31.29  | 32.79   | 31.75 | 30.43 |
+| D-Nerf (~ days)      | 38.93    | 25.02   | 29.25 | 32.80   | 21.64 | 31.29  | 32.79   | 31.75 | 30.43 |
 +----------------------+----------+---------+-------+---------+-------+--------+---------+-------+-------+
-| Ours (PSNR: ~50min)  | 39.60    | 22.41   | 30.64 | 29.79   | 24.75 | 35.20  | 34.50   | 31.83 | 31.09 |
+| Ours (~ 50min)       | 39.60    | 22.41   | 30.64 | 29.79   | 24.75 | 35.20  | 34.50   | 31.83 | 31.09 |
 +----------------------+----------+---------+-------+---------+-------+--------+---------+-------+-------+
 | Ours (Training time) | 45min    | 49min   | 51min | 46min   | 53min | 57min  | 49min   | 46min | 50min |
 +----------------------+----------+---------+-------+---------+-------+--------+---------+-------+-------+
......
@@ -7,6 +7,7 @@ See code `examples/train_ngp_nerf.py` at our `github repository`_ for details.

 Benchmarks
 ------------
+*updated on 2022-10-08*

 Here we trained an `Instant-NGP Nerf`_ model on the `Nerf-Synthetic dataset`_. We follow the same
 settings as the Instant-NGP paper, which uses the trainval split for training and the test split for

@@ -18,16 +19,16 @@ memory footprint is about 3GB.
 The Instant-NGP paper makes use of the alpha channel in the images to apply random background
 augmentation during training. Yet we only use RGB values with a constant white background.

 +----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
-|                      | Lego  | Mic   |Materials| Chair |Hotdog | Ficus | Drums | Ship  | AVG   |
+| PSNR                 | Lego  | Mic   |Materials| Chair |Hotdog | Ficus | Drums | Ship  | MEAN  |
 |                      |       |       |         |       |       |       |       |       |       |
 +======================+=======+=======+=========+=======+=======+=======+=======+=======+=======+
 | Instant-NGP (5min)   | 36.39 | 36.22 | 29.78   | 35.00 | 37.40 | 33.51 | 26.02 | 31.10 | 33.18 |
 +----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
-| Ours (PSNR: 4.5min)  | 36.71 | 36.78 | 29.06   | 36.10 | 37.88 | 32.07 | 25.83 | 31.39 | 33.23 |
+| Ours (~4.5min)       | 36.82 | 37.61 | 30.18   | 36.13 | 38.11 | 34.48 | 26.62 | 31.37 | 33.92 |
 +----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
-| Ours (Training time) | 286s  | 251s  | 250s    | 311s  | 275s  | 254s  | 249s  | 255s  | 266s  |
+| Ours (Training time) | 288s  | 259s  | 256s    | 324s  | 288s  | 245s  | 262s  | 257s  | 272s  |
 +----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
 .. _`Instant-NGP Nerf`: https://arxiv.org/abs/2201.05989
 .. _`github repository`: https://github.com/KAIR-BAIR/nerfacc/
......
@@ -5,10 +5,11 @@ See code `examples/train_ngp_nerf.py` at our `github repository`_ for details.

 Benchmarks
 ------------
+*updated on 2022-10-08*

 Here we trained an `Instant-NGP Nerf`_ on the `MipNerf360`_ dataset. We used the train
 split for training and the test split for evaluation. Our experiments are conducted on a
-single NVIDIA TITAN RTX GPU.
+single NVIDIA TITAN RTX GPU. The training memory footprint is about 6-9GB.

 The main difference between working with unbounded scenes and bounded scenes is that
 a contraction method is needed to map the infinite space to a finite :ref:`Occupancy Grid`.

@@ -23,18 +24,18 @@ that takes from `MipNerf360`_.
 show how to use the library, we didn't want to make it too complicated.

-+----------------------+-------+-------+-------+-------+-------+-------+-------+
-|                      |Garden |Bicycle|Bonsai |Counter|Kitchen| Room  | Stump |
-+======================+=======+=======+=======+=======+=======+=======+=======+
-| Nerf++ (PSNR: ~days) | 24.32 | 22.64 | 29.15 | 26.38 | 27.80 | 28.87 | 24.34 |
-+----------------------+-------+-------+-------+-------+-------+-------+-------+
-|MipNerf360 (PSNR:~days)| 26.98 | 24.37 | 33.46 | 29.55 | 32.23 | 31.63 | 28.65 |
-+----------------------+-------+-------+-------+-------+-------+-------+-------+
-| Ours (PSNR: ~1hr)    | 25.41 | 22.89 | 27.35 | 23.15 | 27.74 | 30.66 | 21.83 |
-+----------------------+-------+-------+-------+-------+-------+-------+-------+
-| Ours (Training time) | 40min | 35min | 47min | 39min | 60min | 41min | 28min |
-+----------------------+-------+-------+-------+-------+-------+-------+-------+
++----------------------+-------+-------+-------+-------+-------+-------+-------+-------+
+| PSNR                 |Garden |Bicycle|Bonsai |Counter|Kitchen| Room  | Stump | MEAN  |
++======================+=======+=======+=======+=======+=======+=======+=======+=======+
+| Nerf++ (~days)       | 24.32 | 22.64 | 29.15 | 26.38 | 27.80 | 28.87 | 24.34 | 26.21 |
++----------------------+-------+-------+-------+-------+-------+-------+-------+-------+
+| MipNerf360 (~days)   | 26.98 | 24.37 | 33.46 | 29.55 | 32.23 | 31.63 | 28.65 | 29.55 |
++----------------------+-------+-------+-------+-------+-------+-------+-------+-------+
+| Ours (~20 mins)      | 25.41 | 22.97 | 30.71 | 27.34 | 30.32 | 31.00 | 23.43 | 27.31 |
++----------------------+-------+-------+-------+-------+-------+-------+-------+-------+
+| Ours (Training time) | 25min | 17min | 19min | 23min | 28min | 20min | 17min | 21min |
++----------------------+-------+-------+-------+-------+-------+-------+-------+-------+
 .. _`Instant-NGP Nerf`: https://arxiv.org/abs/2201.05989
 .. _`MipNerf360`: https://arxiv.org/abs/2111.12077
......
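For context on the contraction mentioned above: the scheme taken from `MipNerf360`_ keeps points inside the unit sphere unchanged and squashes all remaining space into a sphere of radius 2. The exact normalization into the grid used by nerfacc's `contract_to_unisphere` may differ; this is the paper's formula:

``` latex
\mathrm{contract}(\mathbf{x}) =
\begin{cases}
\mathbf{x}, & \lVert \mathbf{x} \rVert \le 1 \\[4pt]
\left(2 - \dfrac{1}{\lVert \mathbf{x} \rVert}\right)
\dfrac{\mathbf{x}}{\lVert \mathbf{x} \rVert}, & \lVert \mathbf{x} \rVert > 1
\end{cases}
```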
@@ -8,7 +8,7 @@ Benchmarks

 Here we trained an 8-layer MLP for the radiance field as in the `vanilla Nerf`_. We used the
 train split for training and the test split for evaluation as in the Nerf paper. Our experiments are
-conducted on a single NVIDIA TITAN RTX GPU.
+conducted on a single NVIDIA TITAN RTX GPU. The training memory footprint is about 10GB.

 .. note::
    The vanilla Nerf paper uses two MLPs for coarse-to-fine sampling. Instead, here we only use a

@@ -17,16 +17,16 @@ conducted on a single NVIDIA TITAN RTX GPU.
 so we can simply increase the number of samples with a single MLP to achieve the same goal
 as the coarse-to-fine sampling, without runtime or memory issues.

 +----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
-|                      | Lego  | Mic   |Materials| Chair |Hotdog | Ficus | Drums | Ship  | AVG   |
+| PSNR                 | Lego  | Mic   |Materials| Chair |Hotdog | Ficus | Drums | Ship  | MEAN  |
 |                      |       |       |         |       |       |       |       |       |       |
 +======================+=======+=======+=========+=======+=======+=======+=======+=======+=======+
-| NeRF (PSNR: ~days)   | 32.54 | 32.91 | 29.62   | 33.00 | 36.18 | 30.13 | 25.01 | 28.65 | 31.00 |
+| NeRF (~ days)        | 32.54 | 32.91 | 29.62   | 33.00 | 36.18 | 30.13 | 25.01 | 28.65 | 31.00 |
 +----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
-| Ours (PSNR: ~50min)  | 33.69 | 33.76 | 29.73   | 33.32 | 35.80 | 32.52 | 25.39 | 28.18 | 31.55 |
+| Ours (~ 50min)       | 33.69 | 33.76 | 29.73   | 33.32 | 35.80 | 32.52 | 25.39 | 28.18 | 31.55 |
 +----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
 | Ours (Training time) | 58min | 53min | 46min   | 62min | 56min | 42min | 52min | 49min | 52min |
 +----------------------+-------+-------+---------+-------+-------+-------+-------+-------+-------+
 .. _`github repository`: https://github.com/KAIR-BAIR/nerfacc/
 .. _`vanilla Nerf`: https://arxiv.org/abs/2003.08934
 NerfAcc Documentation
 ===================================

-NerfAcc is a PyTorch Nerf acceleration toolbox for both training and inference.
+NerfAcc is a PyTorch Nerf acceleration toolbox for both training and inference. It focuses on
+efficient volumetric rendering of radiance fields, which is universal and plug-and-play for most NeRFs.

 Using NerfAcc,

 - The `vanilla Nerf`_ model with 8-layer MLPs can be trained to *better quality* (+~0.5 PSNR) \
   in *1 hour* rather than *1~2 days* as in the paper.
-- The `Instant-NGP Nerf`_ model can be trained to *equal quality* with *9/10th* of the training time (4.5 minutes) \
-  comparing to the official pure-CUDA implementation.
+- The `Instant-NGP Nerf`_ model can be trained to *better quality* (+~0.7 PSNR) with *9/10th* of \
+  the training time (4.5 minutes) compared to the official pure-CUDA implementation.
 - The `D-Nerf`_ model for *dynamic* objects can also be trained in *1 hour* \
   rather than *2 days* as in the paper, and with *better quality* (+~0.5 PSNR).
-- Both the *bounded* and *unbounded* scenes are supported.
+- Both *bounded* and *unbounded* scenes are supported.

-*And it is pure python interface with flexible apis!*
+**And it is a pure Python interface with flexible APIs!**
+
+| Github: https://github.com/KAIR-BAIR/nerfacc
+| Authors: `Ruilong Li`_, `Matthew Tancik`_, `Angjoo Kanazawa`_

 .. note::
-   This repo is focusing on the single scene situation. Generalizable Nerfs across \
+   This repo focuses on the single-scene situation. Generalizable NeRFs across
    multiple scenes are currently out of the scope of this repo. But you may still find
    some useful tricks in this repo. :)

 Installation:
 -------------

@@ -28,6 +33,82 @@ Installation:
 $ pip install nerfacc
+Usage:
+-------------
+
+The idea of NerfAcc is to perform efficient ray marching and volumetric rendering.
+So NerfAcc can work with any user-defined radiance field. To plug the NerfAcc rendering
+pipeline into your code and enjoy the acceleration, you only need to define two functions
+with your radiance field.
+
+- `sigma_fn`: Computes density at each sample. It is used by :func:`nerfacc.ray_marching` to skip the empty and occluded space during ray marching, which is where the major speedup comes from.
+- `rgb_sigma_fn`: Computes color and density at each sample. It is used by :func:`nerfacc.rendering` to conduct differentiable volumetric rendering. This function will receive gradients to update your network.
+
+A simple example looks like this:
+
+.. code-block:: python
+
+   from typing import Tuple
+
+   import torch
+   import torch.nn.functional as F
+   from torch import Tensor
+   import nerfacc
+
+   radiance_field = ...  # network: a NeRF model
+   optimizer = ...  # network optimizer
+   rays_o: Tensor = ...  # ray origins. (n_rays, 3)
+   rays_d: Tensor = ...  # ray normalized directions. (n_rays, 3)
+   color_gt: Tensor = ...  # ground-truth pixel colors. (n_rays, 3)
+
+   def sigma_fn(
+       t_starts: Tensor, t_ends: Tensor, ray_indices: Tensor
+   ) -> Tensor:
+       """Query density values from a user-defined radiance field.
+
+       :param t_starts: Start of the sample interval along the ray. (n_samples, 1).
+       :param t_ends: End of the sample interval along the ray. (n_samples, 1).
+       :param ray_indices: Ray index that each sample belongs to. (n_samples,).
+       :return: The post-activation density values. (n_samples, 1).
+       """
+       t_origins = rays_o[ray_indices]  # (n_samples, 3)
+       t_dirs = rays_d[ray_indices]  # (n_samples, 3)
+       positions = t_origins + t_dirs * (t_starts + t_ends) / 2.0
+       sigmas = radiance_field.query_density(positions)
+       return sigmas  # (n_samples, 1)
+
+   def rgb_sigma_fn(
+       t_starts: Tensor, t_ends: Tensor, ray_indices: Tensor
+   ) -> Tuple[Tensor, Tensor]:
+       """Query RGB and density values from a user-defined radiance field.
+
+       :param t_starts: Start of the sample interval along the ray. (n_samples, 1).
+       :param t_ends: End of the sample interval along the ray. (n_samples, 1).
+       :param ray_indices: Ray index that each sample belongs to. (n_samples,).
+       :return: The post-activation RGB and density values.
+           (n_samples, 3), (n_samples, 1).
+       """
+       t_origins = rays_o[ray_indices]  # (n_samples, 3)
+       t_dirs = rays_d[ray_indices]  # (n_samples, 3)
+       positions = t_origins + t_dirs * (t_starts + t_ends) / 2.0
+       rgbs, sigmas = radiance_field(positions, condition=t_dirs)
+       return rgbs, sigmas  # (n_samples, 3), (n_samples, 1)
+
+   # Efficient ray marching: skip empty and occluded space, pack samples from all rays.
+   # packed_info: (n_rays, 2). t_starts: (n_samples, 1). t_ends: (n_samples, 1).
+   packed_info, t_starts, t_ends = nerfacc.ray_marching(
+       rays_o, rays_d, sigma_fn=sigma_fn, near_plane=0.2, far_plane=1.0,
+       early_stop_eps=1e-4, alpha_thre=1e-2,
+   )
+
+   # Differentiable volumetric rendering.
+   # color: (n_rays, 3). opacity: (n_rays, 1). depth: (n_rays, 1).
+   color, opacity, depth = nerfacc.rendering(rgb_sigma_fn, packed_info, t_starts, t_ends)
+
+   # Optimize the radiance field.
+   optimizer.zero_grad()
+   loss = F.mse_loss(color, color_gt)
+   loss.backward()
+   optimizer.step()
+
+Links:
+-------------
 .. toctree::
    :glob:
    :maxdepth: 1
@@ -54,3 +135,8 @@ Installation:

 .. _`D-Nerf`: https://arxiv.org/abs/2011.13961
 .. _`MipNerf360`: https://arxiv.org/abs/2111.12077
 .. _`pixel-Nerf`: https://arxiv.org/abs/2012.02190
+.. _`Nerf++`: https://arxiv.org/abs/2010.07492
+.. _`Ruilong Li`: https://www.liruilong.cn/
+.. _`Matthew Tancik`: https://www.matthewtancik.com/
+.. _`Angjoo Kanazawa`: https://people.eecs.berkeley.edu/~kanazawa/
\ No newline at end of file
@@ -248,8 +248,8 @@ class VanillaNeRFRadianceField(nn.Module):
 class DNeRFRadianceField(nn.Module):
     def __init__(self) -> None:
         super().__init__()
-        self.posi_encoder = SinusoidalEncoder(3, 0, 0, True)
-        self.time_encoder = SinusoidalEncoder(1, 0, 0, True)
+        self.posi_encoder = SinusoidalEncoder(3, 0, 4, True)
+        self.time_encoder = SinusoidalEncoder(1, 0, 4, True)
         self.warp = MLP(
             input_dim=self.posi_encoder.latent_dim
             + self.time_encoder.latent_dim,
......
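The 0 -> 4 change above matters because with `max_deg == min_deg == 0` the encoder emits no frequency bands at all, so the warp MLP only ever saw raw coordinates. A plausible sketch of what `SinusoidalEncoder(x_dim, min_deg, max_deg, use_identity)` computes — an assumption inferred from the call sites and the `latent_dim` usage, not the repo's exact code:

``` python
import math
import torch

def sinusoidal_encode(x, min_deg, max_deg, use_identity=True):
    """Hypothetical stand-in for SinusoidalEncoder: sin/cos features at
    frequencies 2^min_deg ... 2^(max_deg - 1). With min_deg == max_deg == 0
    (the old code), no bands are produced and the output is just x."""
    scales = 2.0 ** torch.arange(min_deg, max_deg, dtype=x.dtype)
    if scales.numel() == 0:
        return x if use_identity else x[..., :0]
    # (..., 1, D) * (L, 1) -> (..., L, D) -> flatten bands into (..., L*D)
    xb = (x[..., None, :] * scales[:, None]).reshape(*x.shape[:-1], -1)
    # sin at phase 0 and phase pi/2 (i.e. cos) for every band
    enc = torch.sin(torch.cat([xb, xb + 0.5 * math.pi], dim=-1))
    return torch.cat([x, enc], dim=-1) if use_identity else enc

# e.g. SinusoidalEncoder(3, 0, 4, True): 3 + 3 * 4 * 2 = 27 output dims.
feats = sinusoidal_encode(torch.rand(8, 3), 0, 4)  # (8, 27)
```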
@@ -141,17 +141,6 @@ class NGPradianceField(torch.nn.Module):
             },
         )

-    def query_opacity(self, x, step_size):
-        density = self.query_density(x)
-        if self.unbounded:
-            # NOTE: In principle, we should use the following formula to scale
-            # up the step size, but in practice, it is somehow not helpful.
-            # derivitive = contract_to_unisphere(x, self.aabb, derivative=True)
-            # step_size = step_size / derivitive.norm(dim=-1, keepdim=True)
-            pass
-        opacity = density * step_size
-        return opacity
-
     def query_density(self, x, return_feat: bool = False):
         if self.unbounded:
             x = contract_to_unisphere(x, self.aabb)
......
@@ -3,3 +3,4 @@ opencv-python
 imageio
 numpy
 tqdm
+scipy
\ No newline at end of file
@@ -76,7 +76,7 @@ if __name__ == "__main__":
     ).item()

     # setup the radiance field we want to train.
-    max_steps = 50000
+    max_steps = 30000
     grad_scaler = torch.cuda.amp.GradScaler(1)
     radiance_field = DNeRFRadianceField().to(device)
     optimizer = torch.optim.Adam(radiance_field.parameters(), lr=5e-4)

@@ -156,9 +156,12 @@ if __name__ == "__main__":
                 render_step_size=render_step_size,
                 render_bkgd=render_bkgd,
                 cone_angle=args.cone_angle,
+                alpha_thre=0.01 if step > 1000 else 0.00,
                 # dnerf options
                 timestamps=timestamps,
             )
+            if n_rendering_samples == 0:
+                continue

             # dynamic batch size for rays to keep sample batch size constant.
             num_rays = len(pixels)

@@ -213,6 +216,7 @@ if __name__ == "__main__":
                         render_step_size=render_step_size,
                         render_bkgd=render_bkgd,
                         cone_angle=args.cone_angle,
+                        alpha_thre=0.01,
                         # test options
                         test_chunk_size=args.test_chunk_size,
                         # dnerf options
......
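The `n_rendering_samples == 0` guard added above also protects the "dynamic batch size" step that follows it, which divides by `n_rendering_samples`: the examples rescale the ray batch so the total number of marched samples per iteration stays near a fixed target. A minimal sketch of that rescaling (names such as `target_sample_batch_size` and `train_dataset.update_num_rays` are illustrative, inferred from the example scripts rather than quoted from them):

``` python
# After rendering `num_rays` rays that produced `n_rendering_samples`
# samples, pick the next batch size so that rays * (samples per ray)
# stays near the target sample budget.
num_rays = len(pixels)
num_rays = int(
    num_rays * (target_sample_batch_size / float(n_rendering_samples))
)
train_dataset.update_num_rays(num_rays)  # illustrative dataset hook
```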
@@ -186,6 +186,8 @@ if __name__ == "__main__":
                 render_bkgd=render_bkgd,
                 cone_angle=args.cone_angle,
             )
+            if n_rendering_samples == 0:
+                continue

             # dynamic batch size for rays to keep sample batch size constant.
             num_rays = len(pixels)
......
@@ -140,6 +140,7 @@ if __name__ == "__main__":
         near_plane = 0.2
         far_plane = 1e4
         render_step_size = 1e-2
+        alpha_thre = 1e-2
     else:
         contraction_type = ContractionType.AABB
         scene_aabb = torch.tensor(args.aabb, dtype=torch.float32, device=device)

@@ -150,9 +151,10 @@ if __name__ == "__main__":
             * math.sqrt(3)
             / render_n_samples
         ).item()
+        alpha_thre = 0.0

     # setup the radiance field we want to train.
-    max_steps = 40000 if args.unbounded else 20000
+    max_steps = 20000
     grad_scaler = torch.cuda.amp.GradScaler(2**10)
     radiance_field = NGPradianceField(
         aabb=args.aabb,

@@ -185,13 +187,33 @@ if __name__ == "__main__":
             rays = data["rays"]
             pixels = data["pixels"]

-            # update occupancy grid
-            occupancy_grid.every_n_step(
-                step=step,
-                occ_eval_fn=lambda x: radiance_field.query_opacity(
-                    x, render_step_size
-                ),
-            )
+            def occ_eval_fn(x):
+                if args.cone_angle > 0.0:
+                    # randomly sample a camera for computing step size.
+                    camera_ids = torch.randint(
+                        0, len(train_dataset), (x.shape[0],), device=device
+                    )
+                    origins = train_dataset.camtoworlds[camera_ids, :3, -1]
+                    t = (origins - x).norm(dim=-1, keepdim=True)
+                    # compute the actual step size used in marching, based on the distance to the camera.
+                    step_size = torch.clamp(
+                        t * args.cone_angle, min=render_step_size
+                    )
+                    # filter out the points that are not within the near and far planes.
+                    if (near_plane is not None) and (far_plane is not None):
+                        step_size = torch.where(
+                            (t > near_plane) & (t < far_plane),
+                            step_size,
+                            torch.zeros_like(step_size),
+                        )
+                else:
+                    step_size = render_step_size
+                # compute occupancy
+                density = radiance_field.query_density(x)
+                return density * step_size
+
+            # update occupancy grid
+            occupancy_grid.every_n_step(step=step, occ_eval_fn=occ_eval_fn)

             # render
             rgb, acc, depth, n_rendering_samples = render_image(

@@ -205,7 +227,10 @@ if __name__ == "__main__":
                 render_step_size=render_step_size,
                 render_bkgd=render_bkgd,
                 cone_angle=args.cone_angle,
+                alpha_thre=alpha_thre,
             )
+            if n_rendering_samples == 0:
+                continue

             # dynamic batch size for rays to keep sample batch size constant.
             num_rays = len(pixels)

@@ -254,11 +279,12 @@ if __name__ == "__main__":
                         rays,
                         scene_aabb,
                         # rendering options
-                        near_plane=None,
-                        far_plane=None,
+                        near_plane=near_plane,
+                        far_plane=far_plane,
                         render_step_size=render_step_size,
                         render_bkgd=render_bkgd,
                         cone_angle=args.cone_angle,
+                        alpha_thre=alpha_thre,
                         # test options
                         test_chunk_size=args.test_chunk_size,
                     )
......
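Why `occ_eval_fn` above returns `density * step_size`: in standard volume rendering, the opacity contributed by one marching segment of length δ with density σ is

``` latex
\alpha_i = 1 - \exp(-\sigma_i \,\delta_i) \approx \sigma_i \,\delta_i
\qquad (\sigma_i \delta_i \ll 1)
```

so density times the local step size is a first-order estimate of per-step opacity, which is the quantity the occupancy grid prunes against thresholds like `alpha_thre`.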
@@ -30,6 +30,7 @@ def render_image(
     render_step_size: float = 1e-3,
     render_bkgd: Optional[torch.Tensor] = None,
     cone_angle: float = 0.0,
+    alpha_thre: float = 0.0,
     # test options
     test_chunk_size: int = 8192,
     # only useful for dnerf

@@ -95,6 +96,7 @@ def render_image(
             render_step_size=render_step_size,
             stratified=radiance_field.training,
             cone_angle=cone_angle,
+            alpha_thre=alpha_thre,
         )
         rgb, opacity, depth = rendering(
             rgb_sigma_fn,
......
@@ -50,4 +50,5 @@ __all__ = [
     "unpack_info",
     "ray_resampling",
     "loss_distortion",
+    "unpack_to_ray_indices",
 ]
@@ -7,7 +7,7 @@ import os
 from subprocess import DEVNULL, call

 from rich.console import Console
-from torch.utils.cpp_extension import load
+from torch.utils.cpp_extension import _get_build_directory, load

 PATH = os.path.dirname(os.path.abspath(__file__))

@@ -21,21 +21,32 @@ def cuda_toolkit_available():
     return False

-_C = None
-if cuda_toolkit_available():
-    console = Console()
-    with console.status(
-        "[bold yellow]Setting up CUDA (This may take a few minutes the first time)",
-        spinner="bouncingBall",
-    ):
-        _C = load(
-            name="nerfacc_cuda",
-            sources=glob.glob(os.path.join(PATH, "csrc/*.cu")),
-            extra_cflags=["-O3"],
-            extra_cuda_cflags=["-O3"],
-        )
-else:
-    console = Console()
-    console.print("[bold red]No CUDA toolkit found. NerfAcc will be disabled.")
+def load_extention(name: str):
+    return load(
+        name=name,
+        sources=glob.glob(os.path.join(PATH, "csrc/*.cu")),
+        extra_cflags=["-O3"],
+        extra_cuda_cflags=["-O3"],
+    )
+
+_C = None
+name = "nerfacc_cuda"
+if os.listdir(_get_build_directory(name, verbose=False)) != []:
+    # If the build directory is not empty, we assume the extension
+    # has already been built and we can load it directly.
+    _C = load_extention(name)
+else:
+    # First time building the extension.
+    if cuda_toolkit_available():
+        with Console().status(
+            "[bold yellow]NerfAcc: Setting up CUDA (This may take a few minutes the first time)",
+            spinner="bouncingBall",
+        ):
+            _C = load_extention(name)
+    else:
+        Console().print(
+            "[yellow]NerfAcc: No CUDA toolkit found. NerfAcc will be disabled.[/yellow]"
+        )

 __all__ = ["_C"]
@@ -73,7 +73,7 @@ torch::Tensor contract(
     const int threads = 256;
     const int blocks = CUDA_N_BLOCKS_NEEDED(n_samples, threads);

-    torch::Tensor out_samples = torch::zeros({n_samples, 3}, samples.options());
+    torch::Tensor out_samples = torch::empty({n_samples, 3}, samples.options());

     contract_kernel<<<blocks, threads, 0, at::cuda::getCurrentCUDAStream()>>>(
         n_samples,

@@ -99,7 +99,7 @@ torch::Tensor contract_inv(
     const int threads = 256;
     const int blocks = CUDA_N_BLOCKS_NEEDED(n_samples, threads);

-    torch::Tensor out_samples = torch::zeros({n_samples, 3}, samples.options());
+    torch::Tensor out_samples = torch::empty({n_samples, 3}, samples.options());

     contract_inv_kernel<<<blocks, threads, 0, at::cuda::getCurrentCUDAStream()>>>(
         n_samples,
......
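The `zeros -> empty` switch in this commit (here and in the kernels below) drops a redundant fill pass: `torch::zeros` allocates and then launches an extra kernel to write zeros, while `torch::empty` only allocates. This is safe whenever the subsequent kernel writes every element of the buffer before anything reads it, as these kernels do. A Python-level illustration of the same trade-off:

``` python
import torch

n_samples = 4096
# zeros: allocation plus an extra fill kernel on the CUDA stream.
buf_a = torch.zeros(n_samples, 3, device="cuda")
# empty: allocation only; contents are uninitialized until written.
buf_b = torch.empty(n_samples, 3, device="cuda")
buf_b.fill_(1.0)  # stand-in for the kernel that overwrites every element
```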
@@ -91,7 +91,7 @@ torch::Tensor unpack_info(const torch::Tensor packed_info)
     const int blocks = CUDA_N_BLOCKS_NEEDED(n_rays, threads);
     int n_samples = packed_info[n_rays - 1].sum(0).item<int>();

-    torch::Tensor ray_indices = torch::zeros(
+    torch::Tensor ray_indices = torch::empty(
         {n_samples}, packed_info.options().dtype(torch::kInt32));

     unpack_info_kernel<<<blocks, threads, 0, at::cuda::getCurrentCUDAStream()>>>(
......
@@ -223,7 +223,7 @@ std::vector<torch::Tensor> ray_marching(
     const int blocks = CUDA_N_BLOCKS_NEEDED(n_rays, threads);

     // helper counter
-    torch::Tensor num_steps = torch::zeros(
+    torch::Tensor num_steps = torch::empty(
         {n_rays}, rays_o.options().dtype(torch::kInt32));

     // count number of samples per ray

@@ -253,8 +253,8 @@ std::vector<torch::Tensor> ray_marching(
     // output samples starts and ends
     int total_steps = cum_steps[cum_steps.size(0) - 1].item<int>();
-    torch::Tensor t_starts = torch::zeros({total_steps, 1}, rays_o.options());
-    torch::Tensor t_ends = torch::zeros({total_steps, 1}, rays_o.options());
+    torch::Tensor t_starts = torch::empty({total_steps, 1}, rays_o.options());
+    torch::Tensor t_ends = torch::empty({total_steps, 1}, rays_o.options());

     ray_marching_kernel<<<blocks, threads, 0, at::cuda::getCurrentCUDAStream()>>>(
         // rays

@@ -328,7 +328,7 @@ torch::Tensor grid_query(
     const int threads = 256;
     const int blocks = CUDA_N_BLOCKS_NEEDED(n_samples, threads);

-    torch::Tensor occs = torch::zeros({n_samples}, grid_value.options());
+    torch::Tensor occs = torch::empty({n_samples}, grid_value.options());

     AT_DISPATCH_FLOATING_TYPES_AND(
         at::ScalarType::Bool,
......
@@ -187,7 +187,7 @@ def ray_marching(
     if sigma_fn is not None:
         # Query sigma without gradients
         ray_indices = unpack_info(packed_info)
-        sigmas = sigma_fn(t_starts, t_ends, ray_indices)
+        sigmas = sigma_fn(t_starts, t_ends, ray_indices.long())
         assert (
             sigmas.shape == t_starts.shape
         ), "sigmas must have shape of (N, 1)! Got {}".format(sigmas.shape)
......
@@ -80,7 +80,7 @@ def rendering(
     ray_indices = unpack_info(packed_info)

     # Query sigma and color with gradients
-    rgbs, sigmas = rgb_sigma_fn(t_starts, t_ends, ray_indices)
+    rgbs, sigmas = rgb_sigma_fn(t_starts, t_ends, ray_indices.long())
     assert rgbs.shape[-1] == 3, "rgbs must have 3 channels, got {}".format(
         rgbs.shape
     )
......
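The `.long()` casts in these two hunks are needed because the CUDA `unpack_info` above now allocates its output as `torch::kInt32`, while PyTorch advanced indexing — as used inside `sigma_fn` / `rgb_sigma_fn` (`rays_o[ray_indices]`) — only accepts int64, bool, or byte index tensors:

``` python
import torch

rays_o = torch.rand(8, 3)
idx = torch.tensor([0, 2, 2, 5], dtype=torch.int32)
# rays_o[idx] raises IndexError: tensors used as indices must be
# long, byte or bool tensors.
sampled = rays_o[idx.long()]  # OK: (4, 3)
```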