"vscode:/vscode.git/clone" did not exist on "f71c4308c62b97658bdef8dd516ae958b613f38f"
Unverified commit 4d77b4c8, authored by Jingwei Zhang and committed by GitHub

[Feature] Support BEVFusion in `projects/` (#2236)

* add bevfusion models

* refactor

* build successfully

* update ImageAug3D

* support inference

* update the format of final bboxes

* add new loading func

* align test precision

* polish docstring

* refactor transformer decoder

* polish code

* fix table in readme

* fix table in readme

* fix table in readme

* update pre-commit-config

* minor changes

* revert the changes of file_client_args in LoadAnnotation3D

* remove unnecessary functions in BEVFusion

* fix loading bug

* fix docstring
parent c6a8eb1f
@@ -193,10 +193,10 @@ class LoadMultiViewImageFromFiles(BaseTransform):
# unravel to list, see `DefaultFormatBundle` in formating.py
# which will transpose each image separately and then stack into array
results['img'] = [img[..., i] for i in range(img.shape[-1])]
- results['img_shape'] = img.shape
- results['ori_shape'] = img.shape
+ results['img_shape'] = img.shape[:2]
+ results['ori_shape'] = img.shape[:2]
# Set initial values for default meta_keys
- results['pad_shape'] = img.shape
+ results['pad_shape'] = img.shape[:2]
if self.set_default_scale:
results['scale_factor'] = 1.0
num_channels = 1 if len(img.shape) < 3 else img.shape[2]
@@ -297,9 +297,13 @@ class LoadPointsFromMultiSweeps(BaseTransform):
test_mode: bool = False) -> None:
self.load_dim = load_dim
self.sweeps_num = sweeps_num
+ if isinstance(use_dim, int):
+     use_dim = list(range(use_dim))
+ assert max(use_dim) < load_dim, \
+     f'Expect all used dimensions < {load_dim}, got {use_dim}'
self.use_dim = use_dim
self.file_client_args = file_client_args.copy()
- self.file_client = None
+ self.file_client = mmengine.FileClient(**self.file_client_args)
self.pad_empty_sweeps = pad_empty_sweeps
self.remove_close = remove_close
self.test_mode = test_mode
@@ -761,6 +765,7 @@ class LoadAnnotations3D(LoadAnnotations):
self.with_mask_3d = with_mask_3d
self.with_seg_3d = with_seg_3d
self.seg_3d_dtype = seg_3d_dtype
+ self.file_client = None

def _load_bboxes_3d(self, results: dict) -> dict:
"""Private function to move the 3D bounding box annotation from
@@ -6,11 +6,12 @@ from .misc import replace_ceph_backend
from .setup_env import register_all_modules, setup_multi_processes
from .typing_utils import (ConfigType, InstanceList, MultiConfig,
OptConfigType, OptInstanceList, OptMultiConfig,
- OptSamplingResultList)
+ OptSampleList, OptSamplingResultList)
__all__ = [
'collect_env', 'setup_multi_processes', 'compat_cfg',
'register_all_modules', 'array_converter', 'ArrayConverter', 'ConfigType',
'OptConfigType', 'MultiConfig', 'OptMultiConfig', 'InstanceList',
- 'OptInstanceList', 'OptSamplingResultList', 'replace_ceph_backend'
+ 'OptInstanceList', 'OptSamplingResultList', 'replace_ceph_backend',
+ 'OptSampleList'
]
@@ -23,3 +23,4 @@ SamplingResultList = List[SamplingResult]
OptSamplingResultList = Optional[SamplingResultList]
SampleList = List[Det3DDataSample]
+ OptSampleList = Optional[SampleList]
# BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
> [BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation](https://arxiv.org/abs/2205.13542)
<!-- [ALGORITHM] -->
## Abstract
Multi-sensor fusion is essential for an accurate and reliable autonomous driving system. Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with camera features. However, the camera-to-LiDAR projection throws away the semantic density of camera features, hindering the effectiveness of such methods, especially for semantic-oriented tasks (such as 3D scene segmentation). In this paper, we break this deeply-rooted convention with BEVFusion, an efficient and generic multi-task multi-sensor fusion framework. It unifies multi-modal features in the shared bird's-eye view (BEV) representation space, which nicely preserves both geometric and semantic information. To achieve this, we diagnose and lift key efficiency bottlenecks in the view transformation with optimized BEV pooling, reducing latency by more than 40x. BEVFusion is fundamentally task-agnostic and seamlessly supports different 3D perception tasks with almost no architectural changes. It establishes the new state of the art on nuScenes, achieving 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower computation cost. Code to reproduce our
results is available at https://github.com/mit-han-lab/bevfusion.
<div align=center>
<img src="https://user-images.githubusercontent.com/34888372/215313913-4b43f8a1-e2e2-49ba-b631-992155351922.png" width="800"/>
</div>
## Introduction
We implement BEVFusion and provide the results and pretrained checkpoints on the NuScenes dataset.
## Usage
<!-- For a typical model, this section should contain the commands for training and testing. You are also suggested to dump your environment specification to env.yml by `conda env export > env.yml`. -->
### Compiling operations on CUDA
**Note** that the voxelization OP in the original implementation of `BEVFusion` is different from the one in MMCV. If you want to use the original pretrained model [here](https://github.com/mit-han-lab/bevfusion/blob/main/README.md), you need to compile and use the original voxelization OP shipped with this project:
```bash
python projects/BEVFusion/setup.py develop
```
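If the compilation succeeds, the custom ops should be importable from Python. As an optional sanity check (a sketch that assumes the ops live under `projects/BEVFusion/bevfusion/ops`, matching the imports in this project; adjust the path if your checkout differs):

```bash
python -c "from projects.BEVFusion.bevfusion.ops import Voxelization, bev_pool; print('BEVFusion ops built successfully')"
```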
### Training commands
In MMDetection3D's root directory, run the following command to train the model:
```bash
python tools/train.py projects/BEVFusion/configs/bevfusion_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py
```
For multi-GPU training, run:
```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=${NUM_GPUS} --master_port=29506 --master_addr="127.0.0.1" tools/train.py projects/BEVFusion/configs/bevfusion_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py
```
### Testing commands
In MMDetection3D's root directory, run the following command to test the model:
```bash
python tools/test.py projects/BEVFusion/configs/bevfusion_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py ${CHECKPOINT_PATH}
```
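For multi-GPU testing, the same `torch.distributed.launch` pattern as in the training section should work (a sketch; it additionally assumes MMDetection3D's standard `--launcher pytorch` option):

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=${NUM_GPUS} --master_port=29506 --master_addr="127.0.0.1" tools/test.py projects/BEVFusion/configs/bevfusion_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py ${CHECKPOINT_PATH} --launcher pytorch
```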
## Results and models
### NuScenes
| Backbone | Voxel type (voxel size) | NMS | Mem (GB) | Inf time (fps) | NDS | mAP | Download |
| :-----------------------------------------------------------------------------: | :---------------------: | :-: | :------: | :------------: | :---: | :---: | :------------------------------------------------------------------------------------------------------: |
| [SECFPN](./configs/bevfusion_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py) | voxel (0.075) | × | - | - | 71.62 | 68.77 | [converted_model](https://drive.google.com/file/d/1QkvbYDk4G2d6SZoeJqish13qSyXA4lp3/view?usp=share_link) |
## Citation
```latex
@inproceedings{liu2022bevfusion,
title={BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation},
author={Liu, Zhijian and Tang, Haotian and Amini, Alexander and Yang, Xingyu and Mao, Huizi and Rus, Daniela and Han, Song},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2023}
}
```
## Checklist
<!-- Here is a checklist illustrating a usual development workflow of a successful project, and also serves as an overview of this project's progress. The PIC (person in charge) or contributors of this project should check all the items that they believe have been finished, which will further be verified by codebase maintainers via a PR.
OpenMMLab's maintainer will review the code to ensure the project's quality. Reaching the first milestone means that this project satisfies the minimum requirement of being merged into 'projects/'. But this project is only eligible to become a part of the core package upon attaining the last milestone.
Note that keeping this section up-to-date is crucial not only for this project's developers but also for the entire community, since there might be some other contributors joining this project and deciding their starting point from this list. It also helps maintainers accurately estimate time and effort on further code polishing, if needed.
A project does not necessarily have to be finished in a single PR, but it's essential for the project to at least reach the first milestone in its very first PR. -->
- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.
- [x] Finish the code
<!-- The code's design shall follow existing interfaces and convention. For example, each model component should be registered into `mmdet3d.registry.MODELS` and configurable via a config file. -->
- [x] Basic docstrings & proper citation
<!-- Each major object should contain a docstring, describing its functionality and arguments. If you have adapted the code from other open-source projects, don't forget to cite the source project in docstring and make sure your behavior is not against its license. Typically, we do not accept any code snippet under GPL license. [A Short Guide to Open Source Licenses](https://medium.com/nationwide-technology/a-short-guide-to-open-source-licenses-cf5b1c329edd) -->
- [x] Test-time correctness
<!-- If you are reproducing the result from a paper, make sure your model's inference-time performance matches that in the original paper. The weights usually could be obtained by simply renaming the keys in the official pre-trained weights. This test could be skipped though, if you are able to prove the training-time correctness and check the second milestone. -->
- [x] A full README
<!-- As this template does. -->
- [ ] Milestone 2: Indicates a successful model implementation.
- [ ] Training-time correctness
<!-- If you are reproducing the result from a paper, checking this item means that you should have trained your model from scratch based on the original paper's specification and verified that the final result matches the report within a minor error range. -->
- [ ] Milestone 3: Good to be a part of our core package!
- [ ] Type hints and docstrings
<!-- Ideally *all* the methods should have [type hints](https://www.pythontutorial.net/python-basics/python-type-hints/) and [docstrings](https://google.github.io/styleguide/pyguide.html#381-docstrings). [Example](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/models/detectors/fcos_mono3d.py) -->
- [ ] Unit tests
<!-- Unit tests for each module are required. [Example](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/tests/test_models/test_dense_heads/test_fcos_mono3d_head.py) -->
- [ ] Code polishing
<!-- Refactor your code according to reviewer's comment. -->
- [ ] Metafile.yml
<!-- It will be parsed by MIM and Inferencer. [Example](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/fcos3d/metafile.yml) -->
- [ ] Move your modules into the core package following the codebase's file hierarchy structure.
<!-- In particular, you may have to refactor this README into a standard one. [Example](/configs/textdet/dbnet/README.md) -->
from .bevfusion import BEVFusion
from .bevfusion_necks import GeneralizedLSSFPN
from .depth_lss import DepthLSSTransform
from .loading import BEVLoadMultiViewImageFromFiles
from .sparse_encoder import BEVFusionSparseEncoder
from .transformer import TransformerDecoderLayer
from .transforms_3d import GridMask, ImageAug3D
from .transfusion_head import ConvFuser, TransFusionHead
from .utils import (BBoxBEVL1Cost, HeuristicAssigner3D, HungarianAssigner3D,
IoU3DCost)
__all__ = [
'BEVFusion', 'TransFusionHead', 'ConvFuser', 'ImageAug3D', 'GridMask',
'GeneralizedLSSFPN', 'HungarianAssigner3D', 'BBoxBEVL1Cost', 'IoU3DCost',
'HeuristicAssigner3D', 'DepthLSSTransform',
'BEVLoadMultiViewImageFromFiles', 'BEVFusionSparseEncoder',
'TransformerDecoderLayer'
]
from typing import Dict, List, Optional
import numpy as np
import torch
from torch import Tensor
from torch.nn import functional as F
from mmdet3d.models import Base3DDetector
from mmdet3d.registry import MODELS
from mmdet3d.structures import Det3DDataSample
from mmdet3d.utils import OptConfigType, OptMultiConfig, OptSampleList
from .ops import Voxelization
@MODELS.register_module()
class BEVFusion(Base3DDetector):
def __init__(
self,
data_preprocessor: OptConfigType = None,
pts_voxel_encoder: Optional[dict] = None,
pts_middle_encoder: Optional[dict] = None,
fusion_layer: Optional[dict] = None,
img_backbone: Optional[dict] = None,
pts_backbone: Optional[dict] = None,
vtransform: Optional[dict] = None,
img_neck: Optional[dict] = None,
pts_neck: Optional[dict] = None,
bbox_head: Optional[dict] = None,
init_cfg: OptMultiConfig = None,
seg_head: Optional[dict] = None,
**kwargs,
) -> None:
voxelize_cfg = data_preprocessor.pop('voxelize_cfg')
super().__init__(
data_preprocessor=data_preprocessor, init_cfg=init_cfg)
self.voxelize_reduce = voxelize_cfg.pop('voxelize_reduce')
self.pts_voxel_layer = Voxelization(**voxelize_cfg)
self.pts_voxel_encoder = MODELS.build(pts_voxel_encoder)
self.img_backbone = MODELS.build(img_backbone)
self.img_neck = MODELS.build(img_neck)
self.vtransform = MODELS.build(vtransform)
self.pts_middle_encoder = MODELS.build(pts_middle_encoder)
self.fusion_layer = MODELS.build(fusion_layer)
self.pts_backbone = MODELS.build(pts_backbone)
self.pts_neck = MODELS.build(pts_neck)
self.bbox_head = MODELS.build(bbox_head)
# hard-coded here because we use the checkpoint converted from the original
# implementation of `BEVFusion`
self.use_converted_checkpoint = True
self.init_weights()
def _forward(self,
batch_inputs: Tensor,
batch_data_samples: OptSampleList = None):
"""Network forward process.
Usually includes backbone, neck and head forward without any post-
processing.
"""
pass
def init_weights(self) -> None:
if self.img_backbone is not None:
self.img_backbone.init_weights()
@property
def with_bbox_head(self):
"""bool: Whether the detector has a box head."""
return hasattr(self, 'bbox_head') and self.bbox_head is not None
@property
def with_seg_head(self):
"""bool: Whether the detector has a segmentation head.
"""
return hasattr(self, 'seg_head') and self.seg_head is not None
def extract_img_feat(
self,
x,
points,
lidar2image,
camera_intrinsics,
camera2lidar,
img_aug_matrix,
lidar_aug_matrix,
img_metas,
) -> torch.Tensor:
B, N, C, H, W = x.size()
x = x.view(B * N, C, H, W)
x = self.img_backbone(x)
x = self.img_neck(x)
if not isinstance(x, torch.Tensor):
x = x[0]
BN, C, H, W = x.size()
x = x.view(B, int(BN / B), C, H, W)
x = self.vtransform(
x,
points,
lidar2image,
camera_intrinsics,
camera2lidar,
img_aug_matrix,
lidar_aug_matrix,
img_metas,
)
return x
def extract_pts_feat(self, batch_inputs_dict) -> torch.Tensor:
points = batch_inputs_dict['points']
feats, coords, sizes = self.voxelize(points)
batch_size = coords[-1, 0] + 1
x = self.pts_middle_encoder(feats, coords, batch_size)
return x
@torch.no_grad()
def voxelize(self, points):
feats, coords, sizes = [], [], []
for k, res in enumerate(points):
ret = self.pts_voxel_layer(res)
if len(ret) == 3:
# hard voxelize
f, c, n = ret
else:
assert len(ret) == 2
f, c = ret
n = None
feats.append(f)
coords.append(F.pad(c, (1, 0), mode='constant', value=k))
if n is not None:
sizes.append(n)
feats = torch.cat(feats, dim=0)
coords = torch.cat(coords, dim=0)
if len(sizes) > 0:
sizes = torch.cat(sizes, dim=0)
if self.voxelize_reduce:
feats = feats.sum(
dim=1, keepdim=False) / sizes.type_as(feats).view(-1, 1)
feats = feats.contiguous()
return feats, coords, sizes
def predict(self, batch_inputs_dict: Dict[str, Optional[Tensor]],
batch_data_samples: List[Det3DDataSample],
**kwargs) -> List[Det3DDataSample]:
"""Forward of testing.
Args:
batch_inputs_dict (dict): The model input dict which include
'points' keys.
- points (list[torch.Tensor]): Point cloud of each sample.
batch_data_samples (List[:obj:`Det3DDataSample`]): The Data
Samples. It usually includes information such as
`gt_instance_3d`.
Returns:
list[:obj:`Det3DDataSample`]: Detection results of the
input sample. Each Det3DDataSample usually contain
'pred_instances_3d'. And the ``pred_instances_3d`` usually
contains following keys.
- scores_3d (Tensor): Classification scores, has a shape
(num_instances, )
- labels_3d (Tensor): Labels of bboxes, has a shape
(num_instances, ).
- bbox_3d (:obj:`BaseInstance3DBoxes`): Prediction of bboxes,
contains a tensor with shape (num_instances, 7).
"""
batch_input_metas = [item.metainfo for item in batch_data_samples]
feats = self.extract_feat(batch_inputs_dict, batch_input_metas)
if self.with_bbox_head:
outputs = self.bbox_head.predict(feats, batch_input_metas)
if self.use_converted_checkpoint:
outputs[0]['bboxes_3d'].tensor[:, 6] = -outputs[0][
'bboxes_3d'].tensor[:, 6] - np.pi / 2
outputs[0]['bboxes_3d'].tensor[:, 3:5] = outputs[0][
'bboxes_3d'].tensor[:, [4, 3]]
res = self.add_pred_to_datasample(batch_data_samples, outputs)
return res
def extract_feat(
self,
batch_inputs_dict,
batch_input_metas,
**kwargs,
):
imgs = batch_inputs_dict.get('imgs', None)
points = batch_inputs_dict.get('points', None)
lidar2image, camera_intrinsics, camera2lidar = [], [], []
img_aug_matrix, lidar_aug_matrix = [], []
for i, meta in enumerate(batch_input_metas):
lidar2image.append(meta['lidar2img'])
camera_intrinsics.append(meta['cam2img'])
camera2lidar.append(meta['cam2lidar'])
img_aug_matrix.append(meta.get('img_aug_matrix', np.eye(4)))
lidar_aug_matrix.append(meta.get('lidar_aug_matrix', np.eye(4)))
lidar2image = imgs.new_tensor(np.asarray(lidar2image))
camera_intrinsics = imgs.new_tensor(np.array(camera_intrinsics))
camera2lidar = imgs.new_tensor(np.asarray(camera2lidar))
img_aug_matrix = imgs.new_tensor(np.asarray(img_aug_matrix))
lidar_aug_matrix = imgs.new_tensor(np.asarray(lidar_aug_matrix))
img_feature = self.extract_img_feat(imgs, points, lidar2image,
camera_intrinsics, camera2lidar,
img_aug_matrix, lidar_aug_matrix,
batch_input_metas)
pts_feature = self.extract_pts_feat(batch_inputs_dict)
features = [img_feature, pts_feature]
if self.fusion_layer is not None:
x = self.fusion_layer(features)
else:
assert len(features) == 1, features
x = features[0]
x = self.pts_backbone(x)
x = self.pts_neck(x)
return x
def loss(self, batch_inputs_dict: Dict[str, Optional[Tensor]],
batch_data_samples: List[Det3DDataSample],
**kwargs) -> List[Det3DDataSample]:
pass
# modified from https://github.com/mit-han-lab/bevfusion
import torch
import torch.nn as nn
import torch.nn.functional as F
from mmcv.cnn import ConvModule
from mmengine.model import BaseModule
from mmdet3d.registry import MODELS
@MODELS.register_module()
class GeneralizedLSSFPN(BaseModule):
def __init__(
self,
in_channels,
out_channels,
num_outs,
start_level=0,
end_level=-1,
no_norm_on_lateral=False,
conv_cfg=None,
norm_cfg=dict(type='BN2d'),
act_cfg=dict(type='ReLU'),
upsample_cfg=dict(mode='bilinear', align_corners=True),
) -> None:
super().__init__()
assert isinstance(in_channels, list)
self.in_channels = in_channels
self.out_channels = out_channels
self.num_ins = len(in_channels)
self.num_outs = num_outs
self.no_norm_on_lateral = no_norm_on_lateral
self.fp16_enabled = False
self.upsample_cfg = upsample_cfg.copy()
if end_level == -1:
self.backbone_end_level = self.num_ins - 1
# assert num_outs >= self.num_ins - start_level
else:
# if end_level < inputs, no extra level is allowed
self.backbone_end_level = end_level
assert end_level <= len(in_channels)
assert num_outs == end_level - start_level
self.start_level = start_level
self.end_level = end_level
self.lateral_convs = nn.ModuleList()
self.fpn_convs = nn.ModuleList()
for i in range(self.start_level, self.backbone_end_level):
l_conv = ConvModule(
in_channels[i] +
(in_channels[i + 1] if i == self.backbone_end_level -
1 else out_channels),
out_channels,
1,
conv_cfg=conv_cfg,
norm_cfg=norm_cfg if not self.no_norm_on_lateral else None,
act_cfg=act_cfg,
inplace=False,
)
fpn_conv = ConvModule(
out_channels,
out_channels,
3,
padding=1,
conv_cfg=conv_cfg,
norm_cfg=norm_cfg,
act_cfg=act_cfg,
inplace=False,
)
self.lateral_convs.append(l_conv)
self.fpn_convs.append(fpn_conv)
def forward(self, inputs):
"""Forward function."""
# upsample -> cat -> conv1x1 -> conv3x3
assert len(inputs) == len(self.in_channels)
# build laterals
laterals = [inputs[i + self.start_level] for i in range(len(inputs))]
# build top-down path
used_backbone_levels = len(laterals) - 1
for i in range(used_backbone_levels - 1, -1, -1):
x = F.interpolate(
laterals[i + 1],
size=laterals[i].shape[2:],
**self.upsample_cfg,
)
laterals[i] = torch.cat([laterals[i], x], dim=1)
laterals[i] = self.lateral_convs[i](laterals[i])
laterals[i] = self.fpn_convs[i](laterals[i])
# build outputs
outs = [laterals[i] for i in range(used_backbone_levels)]
return tuple(outs)
# modified from https://github.com/mit-han-lab/bevfusion
from typing import Tuple
import torch
from torch import nn
from mmdet3d.registry import MODELS
from .ops import bev_pool
def gen_dx_bx(xbound, ybound, zbound):
dx = torch.Tensor([row[2] for row in [xbound, ybound, zbound]])
bx = torch.Tensor(
[row[0] + row[2] / 2.0 for row in [xbound, ybound, zbound]])
nx = torch.LongTensor([(row[1] - row[0]) / row[2]
for row in [xbound, ybound, zbound]])
return dx, bx, nx
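# Illustrative example (not part of the original file; the bounds below are
# typical BEVFusion-style nuScenes settings, assumed only for demonstration):
#   xbound = [-54.0, 54.0, 0.3], ybound = [-54.0, 54.0, 0.3],
#   zbound = [-10.0, 10.0, 20.0]
# gen_dx_bx(xbound, ybound, zbound) then returns
#   dx = [0.3, 0.3, 20.0]       # cell size along x / y / z
#   bx = [-53.85, -53.85, 0.0]  # center of the first cell along each axis
#   nx = [360, 360, 1]          # number of cells along each axis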
class BaseTransform(nn.Module):
def __init__(
self,
in_channels: int,
out_channels: int,
image_size: Tuple[int, int],
feature_size: Tuple[int, int],
xbound: Tuple[float, float, float],
ybound: Tuple[float, float, float],
zbound: Tuple[float, float, float],
dbound: Tuple[float, float, float],
) -> None:
super().__init__()
self.in_channels = in_channels
self.image_size = image_size
self.feature_size = feature_size
self.xbound = xbound
self.ybound = ybound
self.zbound = zbound
self.dbound = dbound
dx, bx, nx = gen_dx_bx(self.xbound, self.ybound, self.zbound)
self.dx = nn.Parameter(dx, requires_grad=False)
self.bx = nn.Parameter(bx, requires_grad=False)
self.nx = nn.Parameter(nx, requires_grad=False)
self.C = out_channels
self.frustum = self.create_frustum()
self.D = self.frustum.shape[0]
self.fp16_enabled = False
def create_frustum(self):
iH, iW = self.image_size
fH, fW = self.feature_size
ds = (
torch.arange(*self.dbound,
dtype=torch.float).view(-1, 1, 1).expand(-1, fH, fW))
D, _, _ = ds.shape
xs = (
torch.linspace(0, iW - 1, fW,
dtype=torch.float).view(1, 1, fW).expand(D, fH, fW))
ys = (
torch.linspace(0, iH - 1, fH,
dtype=torch.float).view(1, fH, 1).expand(D, fH, fW))
frustum = torch.stack((xs, ys, ds), -1)
return nn.Parameter(frustum, requires_grad=False)
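# Illustrative note (not part of the original file): assuming, e.g.,
# dbound = [1.0, 60.0, 0.5] and feature_size = (fH, fW), the frustum has
# shape (D, fH, fW, 3) with D = len(torch.arange(1.0, 60.0, 0.5)) = 118,
# and the last dimension stores (pixel_x, pixel_y, depth) for every
# feature-map location at every candidate depth.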
def get_geometry(
self,
camera2lidar_rots,
camera2lidar_trans,
intrins,
post_rots,
post_trans,
**kwargs,
):
B, N, _ = camera2lidar_trans.shape
# undo post-transformation
# B x N x D x H x W x 3
points = self.frustum - post_trans.view(B, N, 1, 1, 1, 3)
points = (
torch.inverse(post_rots).view(B, N, 1, 1, 1, 3,
3).matmul(points.unsqueeze(-1)))
# cam_to_lidar
points = torch.cat(
(
points[:, :, :, :, :, :2] * points[:, :, :, :, :, 2:3],
points[:, :, :, :, :, 2:3],
),
5,
)
combine = camera2lidar_rots.matmul(torch.inverse(intrins))
points = combine.view(B, N, 1, 1, 1, 3, 3).matmul(points).squeeze(-1)
points += camera2lidar_trans.view(B, N, 1, 1, 1, 3)
if 'extra_rots' in kwargs:
extra_rots = kwargs['extra_rots']
points = (
extra_rots.view(B, 1, 1, 1, 1, 3,
3).repeat(1, N, 1, 1, 1, 1, 1).matmul(
points.unsqueeze(-1)).squeeze(-1))
if 'extra_trans' in kwargs:
extra_trans = kwargs['extra_trans']
points += extra_trans.view(B, 1, 1, 1, 1,
3).repeat(1, N, 1, 1, 1, 1)
return points
def get_cam_feats(self, x):
raise NotImplementedError
def bev_pool(self, geom_feats, x):
B, N, D, H, W, C = x.shape
Nprime = B * N * D * H * W
# flatten x
x = x.reshape(Nprime, C)
# flatten indices
geom_feats = ((geom_feats - (self.bx - self.dx / 2.0)) /
self.dx).long()
geom_feats = geom_feats.view(Nprime, 3)
batch_ix = torch.cat([
torch.full([Nprime // B, 1], ix, device=x.device, dtype=torch.long)
for ix in range(B)
])
geom_feats = torch.cat((geom_feats, batch_ix), 1)
# filter out points that are outside box
kept = ((geom_feats[:, 0] >= 0)
& (geom_feats[:, 0] < self.nx[0])
& (geom_feats[:, 1] >= 0)
& (geom_feats[:, 1] < self.nx[1])
& (geom_feats[:, 2] >= 0)
& (geom_feats[:, 2] < self.nx[2]))
x = x[kept]
geom_feats = geom_feats[kept]
x = bev_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])
# collapse Z
final = torch.cat(x.unbind(dim=2), 1)
return final
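# Illustrative note (not part of the original file): `bev_pool` returns a
# tensor of shape (B, C, nz, nx, ny); `x.unbind(dim=2)` followed by
# `torch.cat` along dim 1 collapses the vertical bins into channels, so
# `final` has shape (B, C * nz, nx, ny).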
def forward(
self,
img,
points,
lidar2image,
camera_intrinsics,
camera2lidar,
img_aug_matrix,
lidar_aug_matrix,
**kwargs,
):
intrins = camera_intrinsics[..., :3, :3]
post_rots = img_aug_matrix[..., :3, :3]
post_trans = img_aug_matrix[..., :3, 3]
camera2lidar_rots = camera2lidar[..., :3, :3]
camera2lidar_trans = camera2lidar[..., :3, 3]
extra_rots = lidar_aug_matrix[..., :3, :3]
extra_trans = lidar_aug_matrix[..., :3, 3]
geom = self.get_geometry(
camera2lidar_rots,
camera2lidar_trans,
intrins,
post_rots,
post_trans,
extra_rots=extra_rots,
extra_trans=extra_trans,
)
x = self.get_cam_feats(img)
x = self.bev_pool(geom, x)
return x
class BaseDepthTransform(BaseTransform):
def forward(
self,
img,
points,
lidar2image,
cam_intrinsic,
camera2lidar,
img_aug_matrix,
lidar_aug_matrix,
metas,
**kwargs,
):
intrins = cam_intrinsic[..., :3, :3]
post_rots = img_aug_matrix[..., :3, :3]
post_trans = img_aug_matrix[..., :3, 3]
camera2lidar_rots = camera2lidar[..., :3, :3]
camera2lidar_trans = camera2lidar[..., :3, 3]
# print(img.shape, self.image_size, self.feature_size)
batch_size = len(points)
depth = torch.zeros(batch_size, img.shape[1], 1,
*self.image_size).to(points[0].device)
for b in range(batch_size):
cur_coords = points[b][:, :3]
cur_img_aug_matrix = img_aug_matrix[b]
cur_lidar_aug_matrix = lidar_aug_matrix[b]
cur_lidar2image = lidar2image[b]
# inverse aug
cur_coords -= cur_lidar_aug_matrix[:3, 3]
cur_coords = torch.inverse(cur_lidar_aug_matrix[:3, :3]).matmul(
cur_coords.transpose(1, 0))
# lidar2image
cur_coords = cur_lidar2image[:, :3, :3].matmul(cur_coords)
cur_coords += cur_lidar2image[:, :3, 3].reshape(-1, 3, 1)
# get 2d coords
dist = cur_coords[:, 2, :]
cur_coords[:, 2, :] = torch.clamp(cur_coords[:, 2, :], 1e-5, 1e5)
cur_coords[:, :2, :] /= cur_coords[:, 2:3, :]
# imgaug
cur_coords = cur_img_aug_matrix[:, :3, :3].matmul(cur_coords)
cur_coords += cur_img_aug_matrix[:, :3, 3].reshape(-1, 3, 1)
cur_coords = cur_coords[:, :2, :].transpose(1, 2)
# normalize coords for grid sample
cur_coords = cur_coords[..., [1, 0]]
on_img = ((cur_coords[..., 0] < self.image_size[0])
& (cur_coords[..., 0] >= 0)
& (cur_coords[..., 1] < self.image_size[1])
& (cur_coords[..., 1] >= 0))
for c in range(on_img.shape[0]):
masked_coords = cur_coords[c, on_img[c]].long()
masked_dist = dist[c, on_img[c]]
depth[b, c, 0, masked_coords[:, 0],
masked_coords[:, 1]] = masked_dist
extra_rots = lidar_aug_matrix[..., :3, :3]
extra_trans = lidar_aug_matrix[..., :3, 3]
geom = self.get_geometry(
camera2lidar_rots,
camera2lidar_trans,
intrins,
post_rots,
post_trans,
extra_rots=extra_rots,
extra_trans=extra_trans,
)
x = self.get_cam_feats(img, depth)
x = self.bev_pool(geom, x)
return x
@MODELS.register_module()
class DepthLSSTransform(BaseDepthTransform):
def __init__(
self,
in_channels: int,
out_channels: int,
image_size: Tuple[int, int],
feature_size: Tuple[int, int],
xbound: Tuple[float, float, float],
ybound: Tuple[float, float, float],
zbound: Tuple[float, float, float],
dbound: Tuple[float, float, float],
downsample: int = 1,
) -> None:
super().__init__(
in_channels=in_channels,
out_channels=out_channels,
image_size=image_size,
feature_size=feature_size,
xbound=xbound,
ybound=ybound,
zbound=zbound,
dbound=dbound,
)
self.dtransform = nn.Sequential(
nn.Conv2d(1, 8, 1),
nn.BatchNorm2d(8),
nn.ReLU(True),
nn.Conv2d(8, 32, 5, stride=4, padding=2),
nn.BatchNorm2d(32),
nn.ReLU(True),
nn.Conv2d(32, 64, 5, stride=2, padding=2),
nn.BatchNorm2d(64),
nn.ReLU(True),
)
self.depthnet = nn.Sequential(
nn.Conv2d(in_channels + 64, in_channels, 3, padding=1),
nn.BatchNorm2d(in_channels),
nn.ReLU(True),
nn.Conv2d(in_channels, in_channels, 3, padding=1),
nn.BatchNorm2d(in_channels),
nn.ReLU(True),
nn.Conv2d(in_channels, self.D + self.C, 1),
)
if downsample > 1:
assert downsample == 2, downsample
self.downsample = nn.Sequential(
nn.Conv2d(
out_channels, out_channels, 3, padding=1, bias=False),
nn.BatchNorm2d(out_channels),
nn.ReLU(True),
nn.Conv2d(
out_channels,
out_channels,
3,
stride=downsample,
padding=1,
bias=False,
),
nn.BatchNorm2d(out_channels),
nn.ReLU(True),
nn.Conv2d(
out_channels, out_channels, 3, padding=1, bias=False),
nn.BatchNorm2d(out_channels),
nn.ReLU(True),
)
else:
self.downsample = nn.Identity()
def get_cam_feats(self, x, d):
B, N, C, fH, fW = x.shape
d = d.view(B * N, *d.shape[2:])
x = x.view(B * N, C, fH, fW)
d = self.dtransform(d)
x = torch.cat([d, x], dim=1)
x = self.depthnet(x)
depth = x[:, :self.D].softmax(dim=1)
x = depth.unsqueeze(1) * x[:, self.D:(self.D + self.C)].unsqueeze(2)
x = x.view(B, N, self.C, self.D, fH, fW)
x = x.permute(0, 1, 3, 4, 5, 2)
return x
def forward(self, *args, **kwargs):
x = super().forward(*args, **kwargs)
x = self.downsample(x)
return x
# Copyright (c) OpenMMLab. All rights reserved.
import copy
from typing import Optional
import mmcv
import mmengine
import numpy as np
from mmdet3d.datasets.transforms import LoadMultiViewImageFromFiles
from mmdet3d.registry import TRANSFORMS
@TRANSFORMS.register_module()
class BEVLoadMultiViewImageFromFiles(LoadMultiViewImageFromFiles):
"""Load multi channel images from a list of separate channel files.
``BEVLoadMultiViewImageFromFiles`` adds the following keys for the
convenience of view transforms in the forward:
- 'cam2lidar'
- 'lidar2img'
Args:
to_float32 (bool): Whether to convert the img to float32.
Defaults to False.
color_type (str): Color type of the file. Defaults to 'unchanged'.
file_client_args (dict): Arguments to instantiate a FileClient.
See :class:`mmengine.fileio.FileClient` for details.
Defaults to dict(backend='disk').
num_views (int): Number of views in a frame. Defaults to 5.
num_ref_frames (int): Number of reference frames to load. Defaults to -1.
test_mode (bool): Whether it is test mode during loading. Defaults to False.
set_default_scale (bool): Whether to set default scale.
Defaults to True.
"""
def transform(self, results: dict) -> Optional[dict]:
"""Call function to load multi-view image from files.
Args:
results (dict): Result dict containing multi-view image filenames.
Returns:
dict: The result dict containing the multi-view image data.
Added keys and values are described below.
- filename (str): Multi-view image filenames.
- img (np.ndarray): Multi-view image arrays.
- img_shape (tuple[int]): Shape of multi-view image arrays.
- ori_shape (tuple[int]): Shape of original image arrays.
- pad_shape (tuple[int]): Shape of padded image arrays.
- scale_factor (float): Scale factor.
- img_norm_cfg (dict): Normalization configuration of images.
"""
# TODO: consider split the multi-sweep part out of this pipeline
# Derive the mask and transform for loading of multi-sweep data
if self.num_ref_frames > 0:
# init choice with the current frame
init_choice = np.array([0], dtype=np.int64)
num_frames = len(results['img_filename']) // self.num_views - 1
if num_frames == 0: # no previous frame, then copy cur frames
choices = np.random.choice(
1, self.num_ref_frames, replace=True)
elif num_frames >= self.num_ref_frames:
# NOTE: suppose the info is saved following the order
# from latest to earlier frames
if self.test_mode:
choices = np.arange(num_frames - self.num_ref_frames,
num_frames) + 1
# NOTE: +1 is for selecting previous frames
else:
choices = np.random.choice(
num_frames, self.num_ref_frames, replace=False) + 1
elif num_frames > 0 and num_frames < self.num_ref_frames:
if self.test_mode:
base_choices = np.arange(num_frames) + 1
random_choices = np.random.choice(
num_frames,
self.num_ref_frames - num_frames,
replace=True) + 1
choices = np.concatenate([base_choices, random_choices])
else:
choices = np.random.choice(
num_frames, self.num_ref_frames, replace=True) + 1
else:
raise NotImplementedError
choices = np.concatenate([init_choice, choices])
select_filename = []
for choice in choices:
select_filename += results['img_filename'][choice *
self.num_views:
(choice + 1) *
self.num_views]
results['img_filename'] = select_filename
for key in ['cam2img', 'lidar2cam']:
if key in results:
select_results = []
for choice in choices:
select_results += results[key][choice *
self.num_views:(choice +
1) *
self.num_views]
results[key] = select_results
for key in ['ego2global']:
if key in results:
select_results = []
for choice in choices:
select_results += [results[key][choice]]
results[key] = select_results
# Transform lidar2cam to
# [cur_lidar]2[prev_img] and [cur_lidar]2[prev_cam]
for key in ['lidar2cam']:
if key in results:
# only change matrices of previous frames
for choice_idx in range(1, len(choices)):
pad_prev_ego2global = np.eye(4)
prev_ego2global = results['ego2global'][choice_idx]
pad_prev_ego2global[:prev_ego2global.
shape[0], :prev_ego2global.
shape[1]] = prev_ego2global
pad_cur_ego2global = np.eye(4)
cur_ego2global = results['ego2global'][0]
pad_cur_ego2global[:cur_ego2global.
shape[0], :cur_ego2global.
shape[1]] = cur_ego2global
cur2prev = np.linalg.inv(pad_prev_ego2global).dot(
pad_cur_ego2global)
for result_idx in range(choice_idx * self.num_views,
(choice_idx + 1) *
self.num_views):
results[key][result_idx] = \
results[key][result_idx].dot(cur2prev)
# Support multi-view images with different shapes
# TODO: record the origin shape and padded shape
filename, cam2img, lidar2cam, cam2lidar, lidar2img = [], [], [], [], []
for _, cam_item in results['images'].items():
filename.append(cam_item['img_path'])
lidar2cam.append(cam_item['lidar2cam'])
lidar2cam_array = np.array(cam_item['lidar2cam']).astype(
np.float32)
lidar2cam_rot = lidar2cam_array[:3, :3]
lidar2cam_trans = lidar2cam_array[:3, 3:4]
camera2lidar = np.eye(4)
camera2lidar[:3, :3] = lidar2cam_rot.T
camera2lidar[:3, 3:4] = -1 * np.matmul(
lidar2cam_rot.T, lidar2cam_trans.reshape(3, 1))
cam2lidar.append(camera2lidar)
cam2img_array = np.eye(4).astype(np.float32)
cam2img_array[:3, :3] = np.array(cam_item['cam2img']).astype(
np.float32)
cam2img.append(cam2img_array)
lidar2img.append(cam2img_array @ lidar2cam_array)
results['img_path'] = filename
results['cam2img'] = np.stack(cam2img, axis=0)
results['lidar2cam'] = np.stack(lidar2cam, axis=0)
results['cam2lidar'] = np.stack(cam2lidar, axis=0)
results['lidar2img'] = np.stack(lidar2img, axis=0)
results['ori_cam2img'] = copy.deepcopy(results['cam2img'])
if self.file_client is None:
self.file_client = mmengine.FileClient(**self.file_client_args)
# img is of shape (h, w, c, num_views)
# h and w can be different for different views
img_bytes = [self.file_client.get(name) for name in filename]
imgs = [
mmcv.imfrombytes(
img_byte,
flag=self.color_type,
backend='pillow',
channel_order='rgb') for img_byte in img_bytes
]
# handle the image with different shape
img_shapes = np.stack([img.shape for img in imgs], axis=0)
img_shape_max = np.max(img_shapes, axis=0)
img_shape_min = np.min(img_shapes, axis=0)
assert img_shape_min[-1] == img_shape_max[-1]
if not np.all(img_shape_max == img_shape_min):
pad_shape = img_shape_max[:2]
else:
pad_shape = None
if pad_shape is not None:
imgs = [
mmcv.impad(img, shape=pad_shape, pad_val=0) for img in imgs
]
img = np.stack(imgs, axis=-1)
if self.to_float32:
img = img.astype(np.float32)
results['filename'] = filename
# unravel to list, see `DefaultFormatBundle` in formating.py
# which will transpose each image separately and then stack into array
results['img'] = [img[..., i] for i in range(img.shape[-1])]
results['img_shape'] = img.shape[:2]
results['ori_shape'] = img.shape[:2]
# Set initial values for default meta_keys
results['pad_shape'] = img.shape[:2]
if self.set_default_scale:
results['scale_factor'] = 1.0
num_channels = 1 if len(img.shape) < 3 else img.shape[2]
results['img_norm_cfg'] = dict(
mean=np.zeros(num_channels, dtype=np.float32),
std=np.ones(num_channels, dtype=np.float32),
to_rgb=False)
results['num_views'] = self.num_views
results['num_ref_frames'] = self.num_ref_frames
return results
from .bev_pool import bev_pool
from .voxel import DynamicScatter, Voxelization, dynamic_scatter, voxelization
__all__ = [
'bev_pool', 'Voxelization', 'voxelization', 'dynamic_scatter',
'DynamicScatter'
]
from .bev_pool import bev_pool
__all__ = ['bev_pool']
import torch
from . import bev_pool_ext
class QuickCumsum(torch.autograd.Function):
@staticmethod
def forward(ctx, x, geom_feats, ranks):
x = x.cumsum(0)
kept = torch.ones(x.shape[0], device=x.device, dtype=torch.bool)
kept[:-1] = ranks[1:] != ranks[:-1]
x, geom_feats = x[kept], geom_feats[kept]
x = torch.cat((x[:1], x[1:] - x[:-1]))
# save kept for backward
ctx.save_for_backward(kept)
# no gradient for geom_feats
ctx.mark_non_differentiable(geom_feats)
return x, geom_feats
@staticmethod
def backward(ctx, gradx, gradgeom):
(kept, ) = ctx.saved_tensors
back = torch.cumsum(kept, 0)
back[kept] -= 1
val = gradx[back]
return val, None, None
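# Illustrative note (not part of the original file): QuickCumsum implements
# the "cumsum trick" -- a running sum over all points followed by
# differencing at the boundaries of equal `ranks` yields one summed feature
# per voxel. For example, with ranks = [0, 0, 1, 1, 1, 2] and
# x = [a, b, c, d, e, f]:
#   kept = [F, T, F, F, T, T]            # last entry of each rank group
#   x.cumsum(0)[kept] = [a+b, a+b+c+d+e, a+b+c+d+e+f]
#   after differencing -> [a+b, c+d+e, f]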
class QuickCumsumCuda(torch.autograd.Function):
@staticmethod
def forward(ctx, x, geom_feats, ranks, B, D, H, W):
kept = torch.ones(x.shape[0], device=x.device, dtype=torch.bool)
kept[1:] = ranks[1:] != ranks[:-1]
interval_starts = torch.where(kept)[0].int()
interval_lengths = torch.zeros_like(interval_starts)
interval_lengths[:-1] = interval_starts[1:] - interval_starts[:-1]
interval_lengths[-1] = x.shape[0] - interval_starts[-1]
geom_feats = geom_feats.int()
out = bev_pool_ext.bev_pool_forward(
x,
geom_feats,
interval_lengths,
interval_starts,
B,
D,
H,
W,
)
ctx.save_for_backward(interval_starts, interval_lengths, geom_feats)
ctx.saved_shapes = B, D, H, W
return out
@staticmethod
def backward(ctx, out_grad):
interval_starts, interval_lengths, geom_feats = ctx.saved_tensors
B, D, H, W = ctx.saved_shapes
out_grad = out_grad.contiguous()
x_grad = bev_pool_ext.bev_pool_backward(
out_grad,
geom_feats,
interval_lengths,
interval_starts,
B,
D,
H,
W,
)
return x_grad, None, None, None, None, None, None
def bev_pool(feats, coords, B, D, H, W):
assert feats.shape[0] == coords.shape[0]
ranks = (
coords[:, 0] * (W * D * B) + coords[:, 1] * (D * B) +
coords[:, 2] * B + coords[:, 3])
indices = ranks.argsort()
feats, coords, ranks = feats[indices], coords[indices], ranks[indices]
x = QuickCumsumCuda.apply(feats, coords, ranks, B, D, H, W)
x = x.permute(0, 4, 1, 2, 3).contiguous()
return x
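# Shape sketch for the wrapper above (illustrative, not part of the original
# file; it assumes the `bev_pool_ext` CUDA extension has been built):
#   feats:  (N, C) float features, one row per frustum point
#   coords: (N, 4) int indices, columns (x, y, z, batch) into the BEV grid
#   bev_pool(feats, coords, B, D, H, W) -> (B, C, D, H, W), where features of
#   points that share the same (x, y, z, batch) cell are summed.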
#include <torch/torch.h>
#include <c10/cuda/CUDAGuard.h>
// CUDA function declarations
void bev_pool(int b, int d, int h, int w, int n, int c, int n_intervals, const float* x,
const int* geom_feats, const int* interval_starts, const int* interval_lengths, float* out);
void bev_pool_grad(int b, int d, int h, int w, int n, int c, int n_intervals, const float* out_grad,
const int* geom_feats, const int* interval_starts, const int* interval_lengths, float* x_grad);
/*
Function: pillar pooling (forward, cuda)
Args:
x : input features, FloatTensor[n, c]
geom_feats : input coordinates, IntTensor[n, 4]
interval_lengths : number of points in each pooled interval, IntTensor[n_intervals]
interval_starts : starting position of each pooled interval, IntTensor[n_intervals]
Return:
out : output features, FloatTensor[b, d, h, w, c]
*/
at::Tensor bev_pool_forward(
const at::Tensor _x,
const at::Tensor _geom_feats,
const at::Tensor _interval_lengths,
const at::Tensor _interval_starts,
int b, int d, int h, int w
) {
int n = _x.size(0);
int c = _x.size(1);
int n_intervals = _interval_lengths.size(0);
const at::cuda::OptionalCUDAGuard device_guard(device_of(_x));
const float* x = _x.data_ptr<float>();
const int* geom_feats = _geom_feats.data_ptr<int>();
const int* interval_lengths = _interval_lengths.data_ptr<int>();
const int* interval_starts = _interval_starts.data_ptr<int>();
auto options =
torch::TensorOptions().dtype(_x.dtype()).device(_x.device());
at::Tensor _out = torch::zeros({b, d, h, w, c}, options);
float* out = _out.data_ptr<float>();
bev_pool(
b, d, h, w, n, c, n_intervals, x,
geom_feats, interval_starts, interval_lengths, out
);
return _out;
}
/*
Function: pillar pooling (backward, cuda)
Args:
out_grad : gradient of the output features, FloatTensor[b, d, h, w, c]
geom_feats : input coordinates, IntTensor[n, 4]
interval_lengths : number of points in each pooled interval, IntTensor[n_intervals]
interval_starts : starting position of each pooled interval, IntTensor[n_intervals]
Return:
x_grad : gradient of the input features, FloatTensor[n, c]
*/
at::Tensor bev_pool_backward(
const at::Tensor _out_grad,
const at::Tensor _geom_feats,
const at::Tensor _interval_lengths,
const at::Tensor _interval_starts,
int b, int d, int h, int w
) {
int n = _geom_feats.size(0);
int c = _out_grad.size(4);
int n_intervals = _interval_lengths.size(0);
const at::cuda::OptionalCUDAGuard device_guard(device_of(_out_grad));
const float* out_grad = _out_grad.data_ptr<float>();
const int* geom_feats = _geom_feats.data_ptr<int>();
const int* interval_lengths = _interval_lengths.data_ptr<int>();
const int* interval_starts = _interval_starts.data_ptr<int>();
auto options =
torch::TensorOptions().dtype(_out_grad.dtype()).device(_out_grad.device());
at::Tensor _x_grad = torch::zeros({n, c}, options);
float* x_grad = _x_grad.data_ptr<float>();
bev_pool_grad(
b, d, h, w, n, c, n_intervals, out_grad,
geom_feats, interval_starts, interval_lengths, x_grad
);
return _x_grad;
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def("bev_pool_forward", &bev_pool_forward,
"bev_pool_forward");
m.def("bev_pool_backward", &bev_pool_backward,
"bev_pool_backward");
}
#include <stdio.h>
#include <stdlib.h>
/*
Function: pillar pooling
Args:
b : batch size
d : depth of the feature map
h : height of pooled feature map
w : width of pooled feature map
n : number of input points
c : number of channels
n_intervals : number of unique points
x : input features, FloatTensor[n, c]
geom_feats : input coordinates, IntTensor[n, 4]
interval_lengths : number of points in each pooled interval, IntTensor[n_intervals]
interval_starts : starting position of each pooled interval, IntTensor[n_intervals]
out : output features, FloatTensor[b, d, h, w, c]
*/
__global__ void bev_pool_kernel(int b, int d, int h, int w, int n, int c, int n_intervals,
const float *__restrict__ x,
const int *__restrict__ geom_feats,
const int *__restrict__ interval_starts,
const int *__restrict__ interval_lengths,
float* __restrict__ out) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int index = idx / c;
int cur_c = idx % c;
if (index >= n_intervals) return;
int interval_start = interval_starts[index];
int interval_length = interval_lengths[index];
const int* cur_geom_feats = geom_feats + interval_start * 4;
const float* cur_x = x + interval_start * c + cur_c;
float* cur_out = out + cur_geom_feats[3] * d * h * w * c +
cur_geom_feats[2] * h * w * c + cur_geom_feats[0] * w * c +
cur_geom_feats[1] * c + cur_c;
float psum = 0;
for(int i = 0; i < interval_length; i++){
psum += cur_x[i * c];
}
*cur_out = psum;
}
/*
Function: pillar pooling backward
Args:
b : batch size
d : depth of the feature map
h : height of pooled feature map
w : width of pooled feature map
n : number of input points
c : number of channels
n_intervals : number of unique points
out_grad : gradient of the BEV fmap from top, FloatTensor[b, d, h, w, c]
geom_feats : input coordinates, IntTensor[n, 4]
interval_lengths : number of points in each pooled interval, IntTensor[n_intervals]
interval_starts : starting position of each pooled interval, IntTensor[n_intervals]
x_grad : gradient of the image fmap, FloatTensor
*/
__global__ void bev_pool_grad_kernel(int b, int d, int h, int w, int n, int c, int n_intervals,
const float *__restrict__ out_grad,
const int *__restrict__ geom_feats,
const int *__restrict__ interval_starts,
const int *__restrict__ interval_lengths,
float* __restrict__ x_grad) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int index = idx / c;
int cur_c = idx % c;
if (index >= n_intervals) return;
int interval_start = interval_starts[index];
int interval_length = interval_lengths[index];
const int* cur_geom_feats = geom_feats + interval_start * 4;
float* cur_x_grad = x_grad + interval_start * c + cur_c;
const float* cur_out_grad = out_grad + cur_geom_feats[3] * d * h * w * c +
cur_geom_feats[2] * h * w * c + cur_geom_feats[0] * w * c +
cur_geom_feats[1] * c + cur_c;
for(int i = 0; i < interval_length; i++){
cur_x_grad[i * c] = *cur_out_grad;
}
}
void bev_pool(int b, int d, int h, int w, int n, int c, int n_intervals, const float* x,
const int* geom_feats, const int* interval_starts, const int* interval_lengths, float* out) {
bev_pool_kernel<<<(int)ceil(((double)n_intervals * c / 256)), 256>>>(
b, d, h, w, n, c, n_intervals, x, geom_feats, interval_starts, interval_lengths, out
);
}
void bev_pool_grad(int b, int d, int h, int w, int n, int c, int n_intervals, const float* out_grad,
const int* geom_feats, const int* interval_starts, const int* interval_lengths, float* x_grad) {
bev_pool_grad_kernel<<<(int)ceil(((double)n_intervals * c / 256)), 256>>>(
b, d, h, w, n, c, n_intervals, out_grad, geom_feats, interval_starts, interval_lengths, x_grad
);
}
from .scatter_points import DynamicScatter, dynamic_scatter
from .voxelize import Voxelization, voxelization
__all__ = ['Voxelization', 'voxelization', 'dynamic_scatter', 'DynamicScatter']
import torch
from torch import nn
from torch.autograd import Function
from .voxel_layer import (dynamic_point_to_voxel_backward,
dynamic_point_to_voxel_forward)
class _dynamic_scatter(Function):
@staticmethod
def forward(ctx, feats, coors, reduce_type='max'):
"""convert kitti points(N, >=3) to voxels.
Args:
feats: [N, C] float tensor. points features to be reduced
into voxels.
coors: [N, ndim] int tensor. corresponding voxel coordinates
(specifically multi-dim voxel index) of each points.
reduce_type: str. reduce op. support 'max', 'sum' and 'mean'
Returns:
tuple
voxel_feats: [M, C] float tensor. reduced features. input features
that shares the same voxel coordinates are reduced to one row
coordinates: [M, ndim] int tensor, voxel coordinates.
"""
results = dynamic_point_to_voxel_forward(feats, coors, reduce_type)
(voxel_feats, voxel_coors, point2voxel_map,
voxel_points_count) = results
ctx.reduce_type = reduce_type
ctx.save_for_backward(feats, voxel_feats, point2voxel_map,
voxel_points_count)
ctx.mark_non_differentiable(voxel_coors)
return voxel_feats, voxel_coors
@staticmethod
def backward(ctx, grad_voxel_feats, grad_voxel_coors=None):
(feats, voxel_feats, point2voxel_map,
voxel_points_count) = ctx.saved_tensors
grad_feats = torch.zeros_like(feats)
# TODO: whether to use index put or use cuda_backward
# To use index put, need point to voxel index
dynamic_point_to_voxel_backward(
grad_feats,
grad_voxel_feats.contiguous(),
feats,
voxel_feats,
point2voxel_map,
voxel_points_count,
ctx.reduce_type,
)
return grad_feats, None, None
dynamic_scatter = _dynamic_scatter.apply
class DynamicScatter(nn.Module):
def __init__(self, voxel_size, point_cloud_range, average_points: bool):
super(DynamicScatter, self).__init__()
"""Scatters points into voxels, used in the voxel encoder with
dynamic voxelization
**Note**: The CPU and GPU implementations give the same output but
have small numerical differences after summation and division (e.g., 5e-7).
Args:
voxel_size (list): [x, y, z] size of a voxel along the three
dimensions.
point_cloud_range (list):
[x_min, y_min, z_min, x_max, y_max, z_max]
average_points (bool): whether to use average pooling when
scattering points into a voxel.
"""
self.voxel_size = voxel_size
self.point_cloud_range = point_cloud_range
self.average_points = average_points
def forward_single(self, points, coors):
reduce = 'mean' if self.average_points else 'max'
return dynamic_scatter(points.contiguous(), coors.contiguous(), reduce)
def forward(self, points, coors):
"""
Args:
input: NC points
"""
if coors.size(-1) == 3:
return self.forward_single(points, coors)
else:
batch_size = coors[-1, 0] + 1
voxels, voxel_coors = [], []
for i in range(batch_size):
inds = torch.where(coors[:, 0] == i)
voxel, voxel_coor = self.forward_single(
points[inds], coors[inds][:, 1:])
coor_pad = nn.functional.pad(
voxel_coor, (1, 0), mode='constant', value=i)
voxel_coors.append(coor_pad)
voxels.append(voxel)
features = torch.cat(voxels, dim=0)
feature_coors = torch.cat(voxel_coors, dim=0)
return features, feature_coors
def __repr__(self):
tmpstr = self.__class__.__name__ + '('
tmpstr += 'voxel_size=' + str(self.voxel_size)
tmpstr += ', point_cloud_range=' + str(self.point_cloud_range)
tmpstr += ', average_points=' + str(self.average_points)
tmpstr += ')'
return tmpstr
#include <ATen/TensorUtils.h>
#include <torch/extension.h>
// #include "voxelization.h"
namespace {
template <typename T_int>
void determin_max_points_kernel(
torch::TensorAccessor<T_int, 2> coor,
torch::TensorAccessor<T_int, 1> point_to_voxelidx,
torch::TensorAccessor<T_int, 1> num_points_per_voxel,
torch::TensorAccessor<T_int, 3> coor_to_voxelidx, int& voxel_num,
int& max_points, const int num_points) {
int voxelidx, num;
for (int i = 0; i < num_points; ++i) {
if (coor[i][0] == -1) continue;
voxelidx = coor_to_voxelidx[coor[i][0]][coor[i][1]][coor[i][2]];
// record voxel
if (voxelidx == -1) {
voxelidx = voxel_num;
voxel_num += 1;
coor_to_voxelidx[coor[i][0]][coor[i][1]][coor[i][2]] = voxelidx;
}
// put points into voxel
num = num_points_per_voxel[voxelidx];
point_to_voxelidx[i] = num;
num_points_per_voxel[voxelidx] += 1;
// update max points per voxel
max_points = std::max(max_points, num + 1);
}
return;
}
template <typename T, typename T_int>
void scatter_point_to_voxel_kernel(
const torch::TensorAccessor<T, 2> points,
torch::TensorAccessor<T_int, 2> coor,
torch::TensorAccessor<T_int, 1> point_to_voxelidx,
torch::TensorAccessor<T_int, 3> coor_to_voxelidx,
torch::TensorAccessor<T, 3> voxels,
torch::TensorAccessor<T_int, 2> voxel_coors, const int num_features,
const int num_points, const int NDim) {
for (int i = 0; i < num_points; ++i) {
int num = point_to_voxelidx[i];
int voxelidx = coor_to_voxelidx[coor[i][0]][coor[i][1]][coor[i][2]];
for (int k = 0; k < num_features; ++k) {
voxels[voxelidx][num][k] = points[i][k];
}
for (int k = 0; k < NDim; ++k) {
voxel_coors[voxelidx][k] = coor[i][k];
}
}
}
} // namespace
namespace voxelization {
std::vector<at::Tensor> dynamic_point_to_voxel_cpu(
const at::Tensor& points, const at::Tensor& voxel_mapping,
const std::vector<float> voxel_size, const std::vector<float> coors_range) {
// the current version takes about 0.02s-0.03s per frame on CPU
// check device
AT_ASSERTM(points.device().is_cpu(), "points must be a CPU tensor");
const int NDim = voxel_mapping.size(1);
const int num_points = points.size(0);
const int num_features = points.size(1);
std::vector<int> grid_size(NDim);
for (int i = 0; i < NDim; ++i) {
grid_size[i] =
round((coors_range[NDim + i] - coors_range[i]) / voxel_size[i]);
}
at::Tensor num_points_per_voxel = at::zeros(
{
num_points,
},
voxel_mapping.options());
at::Tensor coor_to_voxelidx = -at::ones(
{grid_size[2], grid_size[1], grid_size[0]}, voxel_mapping.options());
at::Tensor point_to_voxelidx = -at::ones(
{
num_points,
},
voxel_mapping.options());
int voxel_num = 0;
int max_points = 0;
AT_DISPATCH_ALL_TYPES(voxel_mapping.scalar_type(), "determin_max_point", [&] {
determin_max_points_kernel<scalar_t>(
voxel_mapping.accessor<scalar_t, 2>(),
point_to_voxelidx.accessor<scalar_t, 1>(),
num_points_per_voxel.accessor<scalar_t, 1>(),
coor_to_voxelidx.accessor<scalar_t, 3>(), voxel_num, max_points,
num_points);
});
at::Tensor voxels =
at::zeros({voxel_num, max_points, num_features}, points.options());
at::Tensor voxel_coors =
at::zeros({voxel_num, NDim}, points.options().dtype(at::kInt));
AT_DISPATCH_ALL_TYPES(points.scalar_type(), "scatter_point_to_voxel", [&] {
scatter_point_to_voxel_kernel<scalar_t, int>(
points.accessor<scalar_t, 2>(), voxel_mapping.accessor<int, 2>(),
point_to_voxelidx.accessor<int, 1>(),
coor_to_voxelidx.accessor<int, 3>(), voxels.accessor<scalar_t, 3>(),
voxel_coors.accessor<int, 2>(), num_features, num_points, NDim);
});
at::Tensor num_points_per_voxel_out =
num_points_per_voxel.slice(/*dim=*/0, /*start=*/0, /*end=*/voxel_num);
return {voxels, voxel_coors, num_points_per_voxel_out};
}
} // namespace voxelization
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <torch/types.h>
#include <ATen/cuda/CUDAApplyUtils.cuh>
typedef enum { SUM = 0, MEAN = 1, MAX = 2 } reduce_t;
#define CHECK_CUDA(x) \
TORCH_CHECK(x.device().is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) \
TORCH_CHECK(x.is_contiguous(), #x " must be contiguous")
#define CHECK_INPUT(x) \
CHECK_CUDA(x); \
CHECK_CONTIGUOUS(x)
namespace {
int const threadsPerBlock = 512;
int const maxGridDim = 50000;
} // namespace
__device__ __forceinline__ static void reduceMax(float *address, float val) {
int *address_as_i = reinterpret_cast<int *>(address);
int old = *address_as_i, assumed;
do {
assumed = old;
old = atomicCAS(address_as_i, assumed,
__float_as_int(fmaxf(val, __int_as_float(assumed))));
} while (assumed != old || __int_as_float(old) < val);
}
__device__ __forceinline__ static void reduceMax(double *address, double val) {
unsigned long long *address_as_ull =
reinterpret_cast<unsigned long long *>(address);
unsigned long long old = *address_as_ull, assumed;
do {
assumed = old;
old = atomicCAS(
address_as_ull, assumed,
__double_as_longlong(fmax(val, __longlong_as_double(assumed))));
} while (assumed != old || __longlong_as_double(old) < val);
}
// get rid of meaningless warnings when compiling host code
#ifdef __CUDA_ARCH__
__device__ __forceinline__ static void reduceAdd(float *address, float val) {
#if (__CUDA_ARCH__ < 200)
#warning \
"compute capability lower than 2.x. fall back to use CAS version of atomicAdd for float32"
int *address_as_i = reinterpret_cast<int *>(address);
int old = *address_as_i, assumed;
do {
assumed = old;
old = atomicCAS(address_as_i, assumed,
__float_as_int(val + __int_as_float(assumed)));
} while (assumed != old);
#else
atomicAdd(address, val);
#endif
}
__device__ __forceinline__ static void reduceAdd(double *address, double val) {
#if (__CUDA_ARCH__ < 600)
#warning \
"compute capability lower than 6.x. fall back to use CAS version of atomicAdd for float64"
unsigned long long *address_as_ull =
reinterpret_cast<unsigned long long *>(address);
unsigned long long old = *address_as_ull, assumed;
do {
assumed = old;
old = atomicCAS(address_as_ull, assumed,
__double_as_longlong(val + __longlong_as_double(assumed)));
} while (assumed != old);
#else
atomicAdd(address, val);
#endif
}
#endif
template <typename T>
__global__ void
feats_reduce_kernel(const T *feats, const int32_t *coors_map,
T *reduced_feats, // shall be 0 at initialization
const int num_input, const int num_feats,
const reduce_t reduce_type) {
for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < num_input;
x += gridDim.x * blockDim.x) {
int32_t reduce_to = coors_map[x];
if (reduce_to == -1) continue;
const T *feats_offset = feats + x * num_feats;
T *reduced_feats_offset = reduced_feats + reduce_to * num_feats;
if (reduce_type == reduce_t::MAX) {
for (int i = 0; i < num_feats; i++) {
reduceMax(&reduced_feats_offset[i], feats_offset[i]);
}
} else {
for (int i = 0; i < num_feats; i++) {
reduceAdd(&reduced_feats_offset[i], feats_offset[i]);
}
}
}
}
template <typename T>
__global__ void add_reduce_traceback_grad_kernel(
T *grad_feats, const T *grad_reduced_feats, const int32_t *coors_map,
const int32_t *reduce_count, const int num_input, const int num_feats,
const reduce_t reduce_type) {
for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < num_input;
x += gridDim.x * blockDim.x) {
int32_t reduce_to = coors_map[x];
if (reduce_to == -1) {
continue;
}
const int input_offset = x * num_feats;
T *grad_feats_offset = grad_feats + input_offset;
const int reduced_offset = reduce_to * num_feats;
const T *grad_reduced_feats_offset = grad_reduced_feats + reduced_offset;
if (reduce_type == reduce_t::SUM) {
for (int i = 0; i < num_feats; i++) {
grad_feats_offset[i] = grad_reduced_feats_offset[i];
}
} else if (reduce_type == reduce_t::MEAN) {
for (int i = 0; i < num_feats; i++) {
grad_feats_offset[i] = grad_reduced_feats_offset[i] /
static_cast<T>(reduce_count[reduce_to]);
}
}
}
}
template <typename T>
__global__ void max_reduce_traceback_scatter_idx_kernel(
const T *feats, const T *reduced_feats, int32_t *reduce_from,
const int32_t *coors_map, const int num_input, const int num_feats) {
for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < num_input;
x += gridDim.x * blockDim.x) {
int32_t reduce_to = coors_map[x];
const int input_offset = x * num_feats;
const T *feats_offset = feats + input_offset;
if (reduce_to == -1) {
continue;
}
const int reduced_offset = reduce_to * num_feats;
const T *reduced_feats_offset = reduced_feats + reduced_offset;
int32_t *reduce_from_offset = reduce_from + reduced_offset;
for (int i = 0; i < num_feats; i++) {
if (feats_offset[i] == reduced_feats_offset[i]) {
atomicMin(&reduce_from_offset[i], static_cast<int32_t>(x));
}
}
}
}
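// Second stage of the MAX backward pass: route each voxel channel's gradient to
// the single argmax point recorded in `reduce_from`.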
template <typename T>
__global__ void max_reduce_scatter_grad_kernel(T *grad_feats,
const T *grad_reduced_feats,
const int32_t *reduce_from,
const int num_reduced,
const int num_feats) {
for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < num_reduced;
x += gridDim.x * blockDim.x) {
const int reduced_offset = x * num_feats;
const int32_t *scatter_to_offset = reduce_from + reduced_offset;
const T *grad_reduced_feats_offset = grad_reduced_feats + reduced_offset;
for (int i = 0; i < num_feats; i++) {
grad_feats[scatter_to_offset[i] * num_feats + i] =
grad_reduced_feats_offset[i];
}
}
}
namespace voxelization {
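// Forward pass of dynamic scatter on GPU: mark coordinates with any negative
// component as invalid, deduplicate the rest with at::unique_dim, then reduce
// point features into the resulting voxels.
// Returns {reduced_feats, out_coors, coors_map, reduce_count}.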
std::vector<at::Tensor> dynamic_point_to_voxel_forward_gpu(
const at::Tensor &feats, const at::Tensor &coors,
const reduce_t reduce_type) {
CHECK_INPUT(feats);
CHECK_INPUT(coors);
const int num_input = feats.size(0);
const int num_feats = feats.size(1);
if (num_input == 0)
return {feats.clone().detach(),
coors.clone().detach(),
coors.new_empty({0}, torch::kInt32),
coors.new_empty({0}, torch::kInt32)};
at::Tensor out_coors;
at::Tensor coors_map;
at::Tensor reduce_count;
auto coors_clean = coors.masked_fill(coors.lt(0).any(-1, true), -1);
std::tie(out_coors, coors_map, reduce_count) =
at::unique_dim(coors_clean, 0, true, true, true);
if (out_coors.index({0, 0}).lt(0).item<bool>()) {
    // the first element of out_coors is (-1, -1, -1) (invalid voxel) and should be removed
out_coors = out_coors.slice(0, 1);
reduce_count = reduce_count.slice(0, 1);
coors_map = coors_map - 1;
}
coors_map = coors_map.to(torch::kInt32);
reduce_count = reduce_count.to(torch::kInt32);
auto reduced_feats =
at::empty({out_coors.size(0), num_feats}, feats.options());
AT_DISPATCH_FLOATING_TYPES(
feats.scalar_type(), "feats_reduce_kernel", ([&] {
if (reduce_type == reduce_t::MAX)
reduced_feats.fill_(-std::numeric_limits<scalar_t>::infinity());
else
reduced_feats.fill_(static_cast<scalar_t>(0));
dim3 blocks(std::min(at::cuda::ATenCeilDiv(num_input, threadsPerBlock),
maxGridDim));
dim3 threads(threadsPerBlock);
feats_reduce_kernel<<<blocks, threads>>>(
feats.data_ptr<scalar_t>(), coors_map.data_ptr<int32_t>(),
reduced_feats.data_ptr<scalar_t>(), num_input, num_feats, reduce_type);
if (reduce_type == reduce_t::MEAN)
reduced_feats /= reduce_count.unsqueeze(-1).to(reduced_feats.dtype());
}));
AT_CUDA_CHECK(cudaGetLastError());
return {reduced_feats, out_coors, coors_map, reduce_count};
}
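// Backward pass of dynamic scatter on GPU: for SUM / MEAN the voxel gradients
// are traced back to every contributing point; for MAX an argmax index map is
// built first and the gradient is scattered only to the winning point of each
// channel.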
void dynamic_point_to_voxel_backward_gpu(at::Tensor &grad_feats,
const at::Tensor &grad_reduced_feats,
const at::Tensor &feats,
const at::Tensor &reduced_feats,
const at::Tensor &coors_map,
const at::Tensor &reduce_count,
const reduce_t reduce_type) {
CHECK_INPUT(grad_feats);
CHECK_INPUT(grad_reduced_feats);
CHECK_INPUT(feats);
CHECK_INPUT(reduced_feats);
CHECK_INPUT(coors_map);
CHECK_INPUT(reduce_count);
const int num_input = feats.size(0);
const int num_reduced = reduced_feats.size(0);
const int num_feats = feats.size(1);
grad_feats.fill_(0);
// copy voxel grad to points
if (num_input == 0 || num_reduced == 0) return;
if (reduce_type == reduce_t::MEAN || reduce_type == reduce_t::SUM) {
AT_DISPATCH_FLOATING_TYPES(
grad_reduced_feats.scalar_type(), "add_reduce_traceback_grad_kernel",
([&] {
dim3 blocks(std::min(
at::cuda::ATenCeilDiv(num_input, threadsPerBlock), maxGridDim));
dim3 threads(threadsPerBlock);
add_reduce_traceback_grad_kernel<<<blocks, threads>>>(
grad_feats.data_ptr<scalar_t>(),
grad_reduced_feats.data_ptr<scalar_t>(),
coors_map.data_ptr<int32_t>(), reduce_count.data_ptr<int32_t>(),
num_input, num_feats, reduce_type);
}));
AT_CUDA_CHECK(cudaGetLastError());
} else {
auto reduce_from = at::full({num_reduced, num_feats}, num_input,
coors_map.options().dtype(torch::kInt32));
AT_DISPATCH_FLOATING_TYPES(
grad_reduced_feats.scalar_type(),
"max_reduce_traceback_scatter_idx_kernel", ([&] {
dim3 blocks(std::min(
at::cuda::ATenCeilDiv(num_input, threadsPerBlock), maxGridDim));
dim3 threads(threadsPerBlock);
max_reduce_traceback_scatter_idx_kernel<<<blocks, threads>>>(
feats.data_ptr<scalar_t>(), reduced_feats.data_ptr<scalar_t>(),
reduce_from.data_ptr<int32_t>(), coors_map.data_ptr<int32_t>(),
num_input, num_feats);
}));
AT_CUDA_CHECK(cudaGetLastError());
AT_DISPATCH_FLOATING_TYPES(
grad_reduced_feats.scalar_type(),
"max_reduce_traceback_scatter_idx_kernel", ([&] {
dim3 blocks(std::min(
at::cuda::ATenCeilDiv(num_reduced, threadsPerBlock), maxGridDim));
dim3 threads(threadsPerBlock);
max_reduce_scatter_grad_kernel<<<blocks, threads>>>(
grad_feats.data_ptr<scalar_t>(),
grad_reduced_feats.data_ptr<scalar_t>(),
reduce_from.data_ptr<int32_t>(), num_reduced, num_feats);
}));
AT_CUDA_CHECK(cudaGetLastError());
}
return;
}
} // namespace voxelization
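// pybind11 bindings (a separate extension source file): expose the voxelization
// and dynamic scatter ops declared in voxelization.h to Python.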
#include <torch/extension.h>
#include "voxelization.h"
namespace voxelization {
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def("hard_voxelize", &hard_voxelize, "hard voxelize");
m.def("dynamic_voxelize", &dynamic_voxelize, "dynamic voxelization");
m.def("dynamic_point_to_voxel_forward", &dynamic_point_to_voxel_forward, "dynamic point to voxel forward");
m.def("dynamic_point_to_voxel_backward", &dynamic_point_to_voxel_backward, "dynamic point to voxel backward");
}
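// A minimal usage sketch from Python, assuming the extension has been built
// under the hypothetical module name `voxel_ext`:
//
//   import torch, voxel_ext
//   feats = torch.rand(1000, 4, device='cuda')
//   coors = torch.randint(0, 100, (1000, 3), dtype=torch.int32, device='cuda')
//   reduced, out_coors, coors_map, count = \
//       voxel_ext.dynamic_point_to_voxel_forward(feats, coors, 'mean')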
} // namespace voxelization
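// Shared header (included above as "voxelization.h"): declares the CPU / CUDA
// implementations and the device-dispatching wrappers bound to Python.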
#pragma once
#include <torch/extension.h>
typedef enum { SUM = 0, MEAN = 1, MAX = 2 } reduce_t;
namespace voxelization {
int hard_voxelize_cpu(const at::Tensor &points, at::Tensor &voxels,
at::Tensor &coors, at::Tensor &num_points_per_voxel,
const std::vector<float> voxel_size,
const std::vector<float> coors_range,
const int max_points, const int max_voxels,
const int NDim = 3);
void dynamic_voxelize_cpu(const at::Tensor &points, at::Tensor &coors,
const std::vector<float> voxel_size,
const std::vector<float> coors_range,
const int NDim = 3);
std::vector<at::Tensor> dynamic_point_to_voxel_cpu(
const at::Tensor &points, const at::Tensor &voxel_mapping,
const std::vector<float> voxel_size, const std::vector<float> coors_range);
#ifdef WITH_CUDA
int hard_voxelize_gpu(const at::Tensor &points, at::Tensor &voxels,
at::Tensor &coors, at::Tensor &num_points_per_voxel,
const std::vector<float> voxel_size,
const std::vector<float> coors_range,
const int max_points, const int max_voxels,
const int NDim = 3);
int nondisterministic_hard_voxelize_gpu(const at::Tensor &points, at::Tensor &voxels,
at::Tensor &coors, at::Tensor &num_points_per_voxel,
const std::vector<float> voxel_size,
const std::vector<float> coors_range,
const int max_points, const int max_voxels,
const int NDim = 3);
void dynamic_voxelize_gpu(const at::Tensor &points, at::Tensor &coors,
const std::vector<float> voxel_size,
const std::vector<float> coors_range,
const int NDim = 3);
std::vector<torch::Tensor> dynamic_point_to_voxel_forward_gpu(const torch::Tensor &feats,
const torch::Tensor &coors,
const reduce_t reduce_type);
void dynamic_point_to_voxel_backward_gpu(torch::Tensor &grad_feats,
const torch::Tensor &grad_reduced_feats,
const torch::Tensor &feats,
const torch::Tensor &reduced_feats,
const torch::Tensor &coors_idx,
const torch::Tensor &reduce_count,
const reduce_t reduce_type);
#endif
// Interface for Python
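// hard_voxelize dispatches to the deterministic or non-deterministic CUDA
// implementation when the input lives on the GPU, otherwise to the CPU path.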
inline int hard_voxelize(const at::Tensor &points, at::Tensor &voxels,
at::Tensor &coors, at::Tensor &num_points_per_voxel,
const std::vector<float> voxel_size,
const std::vector<float> coors_range,
const int max_points, const int max_voxels,
const int NDim = 3, const bool deterministic = true) {
if (points.device().is_cuda()) {
#ifdef WITH_CUDA
if (deterministic) {
return hard_voxelize_gpu(points, voxels, coors, num_points_per_voxel,
voxel_size, coors_range, max_points, max_voxels,
NDim);
}
return nondisterministic_hard_voxelize_gpu(points, voxels, coors, num_points_per_voxel,
voxel_size, coors_range, max_points, max_voxels,
NDim);
#else
AT_ERROR("Not compiled with GPU support");
#endif
}
return hard_voxelize_cpu(points, voxels, coors, num_points_per_voxel,
voxel_size, coors_range, max_points, max_voxels,
NDim);
}
inline void dynamic_voxelize(const at::Tensor &points, at::Tensor &coors,
const std::vector<float> voxel_size,
const std::vector<float> coors_range,
const int NDim = 3) {
if (points.device().is_cuda()) {
#ifdef WITH_CUDA
return dynamic_voxelize_gpu(points, coors, voxel_size, coors_range, NDim);
#else
AT_ERROR("Not compiled with GPU support");
#endif
}
return dynamic_voxelize_cpu(points, coors, voxel_size, coors_range, NDim);
}
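// Map the reduce mode string used on the Python side to the reduce_t enum.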
inline reduce_t convert_reduce_type(const std::string &reduce_type) {
if (reduce_type == "max")
return reduce_t::MAX;
else if (reduce_type == "sum")
return reduce_t::SUM;
else if (reduce_type == "mean")
return reduce_t::MEAN;
  else TORCH_CHECK(false, "unsupported reduce type: " + reduce_type);
return reduce_t::SUM;
}
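// Dynamic scatter wrappers: CUDA-only for now; the CPU branch raises until a
// CPU implementation is added.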
inline std::vector<torch::Tensor> dynamic_point_to_voxel_forward(const torch::Tensor &feats,
const torch::Tensor &coors,
const std::string &reduce_type) {
if (feats.device().is_cuda()) {
#ifdef WITH_CUDA
return dynamic_point_to_voxel_forward_gpu(feats, coors, convert_reduce_type(reduce_type));
#else
TORCH_CHECK(false, "Not compiled with GPU support");
#endif
}
TORCH_CHECK(false, "do not support cpu yet");
return std::vector<torch::Tensor>();
}
inline void dynamic_point_to_voxel_backward(torch::Tensor &grad_feats,
const torch::Tensor &grad_reduced_feats,
const torch::Tensor &feats,
const torch::Tensor &reduced_feats,
const torch::Tensor &coors_idx,
const torch::Tensor &reduce_count,
const std::string &reduce_type) {
if (grad_feats.device().is_cuda()) {
#ifdef WITH_CUDA
dynamic_point_to_voxel_backward_gpu(
grad_feats, grad_reduced_feats, feats, reduced_feats, coors_idx, reduce_count,
convert_reduce_type(reduce_type));
return;
#else
TORCH_CHECK(false, "Not compiled with GPU support");
#endif
}
TORCH_CHECK(false, "do not support cpu yet");
}
} // namespace voxelization