Unverified Commit 83954d03 authored by yukang, committed by GitHub

Add support for VoxelNeXt (#1309)

* VoxelNeXt
parent 31f6758a
......@@ -22,6 +22,8 @@ It is also the official code release of [`[PointRCNN]`](https://arxiv.org/abs/18
## Changelog
[2023-04-02] Added support for [`VoxelNeXt`](https://github.com/dvlab-research/VoxelNeXt) on the nuScenes, Waymo, and Argoverse2 datasets. It is a fully sparse 3D object detection network, built purely on sparse CNNs, that predicts 3D objects directly from voxels.
[2022-09-02] **NEW:** Update `OpenPCDet` to v0.6.0:
* Official code release of [MPPNet](https://arxiv.org/abs/2205.05979) for temporal 3D object detection, which supports long-term multi-frame 3D object detection and ranked 1st on the [3D detection leaderboard](https://waymo.com/open/challenges/2020/3d-detection) of the Waymo Open Dataset as of Sept. 2, 2022. On the validation set, MPPNet achieves 74.96%, 75.06% and 74.52% mAPH@Level_2 for the vehicle, pedestrian and cyclist classes, respectively (see the [guideline](docs/guidelines_of_approaches/mppnet.md) on how to train/test with MPPNet).
* Support multi-frame training/testing on Waymo Open Dataset (see the [change log](docs/changelog.md) for more details on how to process data).
......@@ -172,7 +174,6 @@ By default, all models are trained with **a single frame** of **20% data (~32k f
| [PV-RCNN++](tools/cfgs/waymo_models/pv_rcnn_plusplus.yaml) | 77.82/77.32| 69.07/68.62| 77.99/71.36| 69.92/63.74| 71.80/70.71| 69.31/68.26|
| [PV-RCNN++ (ResNet)](tools/cfgs/waymo_models/pv_rcnn_plusplus_resnet.yaml) |77.61/77.14| 69.18/68.75| 79.42/73.31| 70.88/65.21| 72.50/71.39| 69.84/68.77|
Here we also provide the performance of several models trained on the full training set (refer to the [PV-RCNN++](https://arxiv.org/abs/2102.00463) paper):
| Performance@(train with 100\% Data) | Vec_L1 | Vec_L2 | Ped_L1 | Ped_L2 | Cyc_L1 | Cyc_L2 |
......@@ -180,6 +181,7 @@ Here we also provide the performance of several models trained on the full train
| [SECOND](tools/cfgs/waymo_models/second.yaml) | 72.27/71.69 | 63.85/63.33 | 68.70/58.18 | 60.72/51.31 | 60.62/59.28 | 58.34/57.05 |
| [CenterPoint-Pillar](tools/cfgs/waymo_models/centerpoint_pillar_1x.yaml)| 73.37/72.86 | 65.09/64.62 | 75.35/65.11 | 67.61/58.25 | 67.76/66.22 | 65.25/63.77 |
| [Part-A2-Anchor](tools/cfgs/waymo_models/PartA2.yaml) | 77.05/76.51 | 68.47/67.97 | 75.24/66.87 | 66.18/58.62 | 68.60/67.36 | 66.13/64.93 |
| [VoxelNeXt-2D](tools/cfgs/waymo_models/voxelnext2d_ioubranch.yaml) | 77.94/77.47 |69.68/69.25 |80.24/73.47 |72.23/65.88 |73.33/72.20 |70.66/69.56 |
| [PV-RCNN (CenterHead)](tools/cfgs/waymo_models/pv_rcnn_with_centerhead_rpn.yaml) | 78.00/77.50 | 69.43/68.98 | 79.21/73.03 | 70.42/64.72 | 71.46/70.27 | 68.95/67.79 |
| [PV-RCNN++](tools/cfgs/waymo_models/pv_rcnn_plusplus.yaml) | 79.10/78.63 | 70.34/69.91 | 80.62/74.62 | 71.86/66.30 | 73.49/72.38 | 70.70/69.62 |
| [PV-RCNN++ (ResNet)](tools/cfgs/waymo_models/pv_rcnn_plusplus_resnet.yaml) | 79.25/78.78 | 70.61/70.18 | 81.83/76.28 | 73.17/68.00 | 73.72/72.66 | 71.21/70.19 |
......@@ -199,12 +201,13 @@ but you could easily achieve similar performance by training with the default co
All models are trained with 8 GTX 1080Ti GPUs and are available for download.
| | mATE | mASE | mAOE | mAVE | mAAE | mAP | NDS | download |
|----------------------------------------------------------------------------------------------------|-------:|:------:|:------:|:-----:|:-----:|:-----:|:------:|:--------------------------------------------------------------------------------------------------:|
| [PointPillar-MultiHead](tools/cfgs/nuscenes_models/cbgs_pp_multihead.yaml) | 33.87 | 26.00 | 32.07 | 28.74 | 20.15 | 44.63 | 58.23 | [model-23M](https://drive.google.com/file/d/1p-501mTWsq0G9RzroTWSXreIMyTUUpBM/view?usp=sharing) |
| [SECOND-MultiHead (CBGS)](tools/cfgs/nuscenes_models/cbgs_second_multihead.yaml) | 31.15 | 25.51 | 26.64 | 26.26 | 20.46 | 50.59 | 62.29 | [model-35M](https://drive.google.com/file/d/1bNzcOnE3u9iooBFMk2xK7HqhdeQ_nwTq/view?usp=sharing) |
| [CenterPoint-PointPillar](tools/cfgs/nuscenes_models/cbgs_dyn_pp_centerpoint.yaml) | 31.13 | 26.04 | 42.92 | 23.90 | 19.14 | 50.03 | 60.70 | [model-23M](https://drive.google.com/file/d/1UvGm6mROMyJzeSRu7OD1leU_YWoAZG7v/view?usp=sharing) |
| [CenterPoint (voxel_size=0.1)](tools/cfgs/nuscenes_models/cbgs_voxel01_res3d_centerpoint.yaml) | 30.11 | 25.55 | 38.28 | 21.94 | 18.87 | 56.03 | 64.54 | [model-34M](https://drive.google.com/file/d/1Cz-J1c3dw7JAWc25KRG1XQj8yCaOlexQ/view?usp=sharing) |
| [CenterPoint (voxel_size=0.075)](tools/cfgs/nuscenes_models/cbgs_voxel0075_res3d_centerpoint.yaml) | 28.80 | 25.43 | 37.27 | 21.55 | 18.24 | 59.22 | 66.48 | [model-34M](https://drive.google.com/file/d/1XOHAWm1MPkCKr1gqmc3TWi5AYZgPsgxU/view?usp=sharing) |
| [VoxelNeXt (voxel_size=0.075)](tools/cfgs/nuscenes_models/cbgs_voxel0075_voxelnext.yaml) | 30.11 | 25.23 | 40.57 | 21.69 | 18.56 | 60.53 | 66.65 | [model-31M](https://drive.google.com/file/d/1IV7e7G9X-61KXSjMGtQo579pzDNbhwvf/view?usp=share_link) |
### ONCE 3D Object Detection Baselines
......@@ -218,6 +221,14 @@ All models are trained with 8 GPUs.
| [PV-RCNN](tools/cfgs/once_models/pv_rcnn.yaml) | 77.77 | 23.50 | 59.37 | 53.55 |
| [CenterPoint](tools/cfgs/once_models/centerpoint.yaml) | 78.02 | 49.74 | 67.22 | 64.99 |
### Argoverse2 3D Object Detection Baselines
All models are trained with 4 GPUs.
| | mAP | download |
|---------------------------------------------------------|:----:|:--------------------------------------------------------------------------------------------------:|
| [VoxelNeXt](tools/cfgs/argo2_models/cbgs_voxel01_voxelnext.yaml) | 30.0 | [model-30M](https://drive.google.com/file/d/1zr-it1ERJzLQ3a3hP060z_EQqS_RkNaC/view?usp=share_link) |
| [VoxelNeXt-K3](tools/cfgs/argo2_models/cbgs_voxel01_voxelnext_headkernel3.yaml) | 30.7 | [model-45M](https://drive.google.com/file/d/1NrYRsiKbuWyL8jE4SY27IHpFMY9K0o__/view?usp=share_link) |
### Other datasets
Contributions that add support for other datasets are welcome; please submit a pull request.
......
......@@ -12,6 +12,7 @@ from .waymo.waymo_dataset import WaymoDataset
from .pandaset.pandaset_dataset import PandasetDataset
from .lyft.lyft_dataset import LyftDataset
from .once.once_dataset import ONCEDataset
from .argo2.argo2_dataset import Argo2Dataset
from .custom.custom_dataset import CustomDataset
__all__ = {
......@@ -22,7 +23,8 @@ __all__ = {
'PandasetDataset': PandasetDataset,
'LyftDataset': LyftDataset,
'ONCEDataset': ONCEDataset,
'CustomDataset': CustomDataset,
'Argo2Dataset': Argo2Dataset
}
......
import copy
import pickle
import torch
import numpy as np
from ..dataset import DatasetTemplate
from .argo2_utils.so3 import yaw_to_quat
from .argo2_utils.constants import LABEL_ATTR
from os import path as osp
from pathlib import Path
class Argo2Dataset(DatasetTemplate):
def __init__(self, dataset_cfg, class_names, training=True, root_path=None, logger=None):
"""
Args:
root_path:
dataset_cfg:
class_names:
training:
logger:
"""
super().__init__(
dataset_cfg=dataset_cfg, class_names=class_names, training=training, root_path=root_path, logger=logger
)
self.split = self.dataset_cfg.DATA_SPLIT[self.mode]
self.root_split_path = self.root_path / ('training' if self.split != 'test' else 'testing')
split_dir = self.root_path / 'ImageSets' / (self.split + '.txt')
self.sample_id_list = [x.strip() for x in open(split_dir).readlines()] if split_dir.exists() else None
self.kitti_infos = []
self.include_kitti_data(self.mode)
def include_kitti_data(self, mode):
if self.logger is not None:
self.logger.info('Loading Argoverse2 dataset')
kitti_infos = []
for info_path in self.dataset_cfg.INFO_PATH[mode]:
info_path = self.root_path / info_path
if not info_path.exists():
continue
with open(info_path, 'rb') as f:
infos = pickle.load(f)
kitti_infos.extend(infos)
self.kitti_infos.extend(kitti_infos)
if self.logger is not None:
self.logger.info('Total samples for Argo2 dataset: %d' % (len(kitti_infos)))
def set_split(self, split):
super().__init__(
dataset_cfg=self.dataset_cfg, class_names=self.class_names, training=self.training, root_path=self.root_path, logger=self.logger
)
self.split = split
self.root_split_path = self.root_path / ('training' if self.split != 'test' else 'testing')
split_dir = self.root_path / 'ImageSets' / (self.split + '.txt')
self.sample_id_list = [x.strip() for x in open(split_dir).readlines()] if split_dir.exists() else None
def get_lidar(self, idx):
lidar_file = self.root_split_path / 'velodyne' / ('%s.bin' % idx)
assert lidar_file.exists()
return np.fromfile(str(lidar_file), dtype=np.float32).reshape(-1, 4)
@staticmethod
def generate_prediction_dicts(batch_dict, pred_dicts, class_names, output_path=None):
"""
Args:
batch_dict:
frame_id:
pred_dicts: list of pred_dicts
pred_boxes: (N, 7), Tensor
pred_scores: (N), Tensor
pred_labels: (N), Tensor
class_names:
output_path:
Returns:
"""
def get_template_prediction(num_samples):
ret_dict = {
'name': np.zeros(num_samples), 'truncated': np.zeros(num_samples),
'occluded': np.zeros(num_samples), 'alpha': np.zeros(num_samples),
'bbox': np.zeros([num_samples, 4]), 'dimensions': np.zeros([num_samples, 3]),
'location': np.zeros([num_samples, 3]), 'rotation_y': np.zeros(num_samples),
'score': np.zeros(num_samples), 'boxes_lidar': np.zeros([num_samples, 7])
}
return ret_dict
def generate_single_sample_dict(batch_index, box_dict):
pred_scores = box_dict['pred_scores'].cpu().numpy()
pred_boxes = box_dict['pred_boxes'].cpu().numpy()
pred_labels = box_dict['pred_labels'].cpu().numpy()
pred_dict = get_template_prediction(pred_scores.shape[0])
if pred_scores.shape[0] == 0:
return pred_dict
pred_boxes_img = pred_boxes
pred_boxes_camera = pred_boxes
pred_dict['name'] = np.array(class_names)[pred_labels - 1]
pred_dict['alpha'] = -np.arctan2(-pred_boxes[:, 1], pred_boxes[:, 0]) + pred_boxes_camera[:, 6]
pred_dict['bbox'] = pred_boxes_img
pred_dict['dimensions'] = pred_boxes_camera[:, 3:6]
pred_dict['location'] = pred_boxes_camera[:, 0:3]
pred_dict['rotation_y'] = pred_boxes_camera[:, 6]
pred_dict['score'] = pred_scores
pred_dict['boxes_lidar'] = pred_boxes
return pred_dict
annos = []
for index, box_dict in enumerate(pred_dicts):
frame_id = batch_dict['frame_id'][index]
single_pred_dict = generate_single_sample_dict(index, box_dict)
single_pred_dict['frame_id'] = frame_id
annos.append(single_pred_dict)
if output_path is not None:
cur_det_file = output_path / ('%s.txt' % frame_id)
with open(cur_det_file, 'w') as f:
bbox = single_pred_dict['bbox']
loc = single_pred_dict['location']
dims = single_pred_dict['dimensions'] # lhw -> hwl
for idx in range(len(bbox)):
print('%s -1 -1 %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f'
% (single_pred_dict['name'][idx], single_pred_dict['alpha'][idx],
bbox[idx][0], bbox[idx][1], bbox[idx][2], bbox[idx][3],
dims[idx][1], dims[idx][2], dims[idx][0], loc[idx][0],
loc[idx][1], loc[idx][2], single_pred_dict['rotation_y'][idx],
single_pred_dict['score'][idx]), file=f)
return annos
def __len__(self):
if self._merge_all_iters_to_one_epoch:
return len(self.kitti_infos) * self.total_epochs
return len(self.kitti_infos)
def __getitem__(self, index):
# index = 4
if self._merge_all_iters_to_one_epoch:
index = index % len(self.kitti_infos)
info = copy.deepcopy(self.kitti_infos[index])
sample_idx = Path(info['point_cloud']['velodyne_path']).stem  # note: rstrip('.bin') would strip a character set, not the suffix
calib = None
get_item_list = self.dataset_cfg.get('GET_ITEM_LIST', ['points'])
input_dict = {
'frame_id': sample_idx,
'calib': calib,
}
if 'annos' in info:
annos = info['annos']
loc, dims, rots = annos['location'], annos['dimensions'], annos['rotation_y']
gt_names = annos['name']
gt_bboxes_3d = np.concatenate([loc, dims, rots[..., np.newaxis]], axis=1).astype(np.float32)
input_dict.update({
'gt_names': gt_names,
'gt_boxes': gt_bboxes_3d
})
if "points" in get_item_list:
points = self.get_lidar(sample_idx)
input_dict['points'] = points
input_dict['calib'] = calib
data_dict = self.prepare_data(data_dict=input_dict)
return data_dict
def format_results(self,
outputs,
class_names,
pklfile_prefix=None,
submission_prefix=None,
):
"""Format the results to .feather file with argo2 format.
Args:
outputs (list[dict]): Testing results of the dataset.
pklfile_prefix (str | None): The prefix of pkl files. It includes
the file path and the prefix of filename, e.g., "a/b/prefix".
If not specified, a temp file will be created. Default: None.
submission_prefix (str | None): The prefix of submitted files. It
includes the file path and the prefix of filename, e.g.,
"a/b/prefix". If not specified, a temp file will be created.
Default: None.
Returns:
pd.DataFrame: Detections in Argoverse 2 format, indexed by
(log_id, timestamp_ns).
"""
import pandas as pd
assert len(self.kitti_infos) == len(outputs)
num_samples = len(outputs)
print('\nGot {} samples'.format(num_samples))
serialized_dts_list = []
print('\nConvert predictions to Argoverse 2 format')
for i in range(num_samples):
out_i = outputs[i]
log_id, ts = self.kitti_infos[i]['uuid'].split('/')
track_uuid = None
#cat_id = out_i['labels_3d'].numpy().tolist()
#category = [class_names[i].upper() for i in cat_id]
category = [class_name.upper() for class_name in out_i['name']]
serialized_dts = pd.DataFrame(
self.lidar_box_to_argo2(out_i['bbox']).numpy(), columns=list(LABEL_ATTR)
)
serialized_dts["score"] = out_i['score']
serialized_dts["log_id"] = log_id
serialized_dts["timestamp_ns"] = int(ts)
serialized_dts["category"] = category
serialized_dts_list.append(serialized_dts)
dts = (
pd.concat(serialized_dts_list)
.set_index(["log_id", "timestamp_ns"])
.sort_index()
)
dts = dts.sort_values("score", ascending=False).reset_index()
if pklfile_prefix is not None:
if not pklfile_prefix.endswith('.feather'):
pklfile_prefix = f'{pklfile_prefix}.feather'
dts.to_feather(pklfile_prefix)
print(f'Result is saved to {pklfile_prefix}.')
dts = dts.set_index(["log_id", "timestamp_ns"]).sort_index()
return dts
def lidar_box_to_argo2(self, boxes):
boxes = torch.Tensor(boxes)
cnt_xyz = boxes[:, :3]
#cnt_xyz[:, 2] += boxes[:, 5] * 0.5
lwh = boxes[:, [4, 3, 5]]
#yaw = -boxes[:, 6] - np.pi/2
yaw = boxes[:, 6] #- np.pi/2
yaw = -yaw - 0.5 * np.pi
while (yaw < -np.pi).any():
yaw[yaw < -np.pi] += 2 * np.pi
while (yaw > np.pi).any():
yaw[yaw > np.pi] -= 2 * np.pi
quat = yaw_to_quat(yaw)
argo_cuboid = torch.cat([cnt_xyz, lwh, quat], dim=1)
return argo_cuboid
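For reference, a minimal sketch of the heading remap above on one hypothetical box (the numbers are illustrative only):

```python
import numpy as np
import torch

# One OpenPCDet LiDAR box: [x, y, z, dx, dy, dz, yaw]
box = torch.tensor([[10.0, 5.0, -1.0, 4.0, 2.0, 1.5, 2.0]])

# Heading remap used by lidar_box_to_argo2, then wrap into (-pi, pi]
yaw = -box[:, 6] - 0.5 * np.pi     # tensor([-3.5708])
yaw[yaw < -np.pi] += 2 * np.pi     # tensor([2.7124])
```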
def evaluation(self,
results,
class_names,
eval_metric='waymo',
logger=None,
pklfile_prefix=None,
submission_prefix=None,
show=False,
output_path=None,
pipeline=None):
"""Evaluation in KITTI protocol.
Args:
results (list[dict]): Testing results of the dataset.
metric (str | list[str]): Metrics to be evaluated.
Default: 'waymo'. Another supported metric is 'kitti'.
logger (logging.Logger | str | None): Logger used for printing
related information during evaluation. Default: None.
pklfile_prefix (str | None): The prefix of pkl files. It includes
the file path and the prefix of filename, e.g., "a/b/prefix".
If not specified, a temp file will be created. Default: None.
submission_prefix (str | None): The prefix of submission datas.
If not specified, the submission data will not be generated.
show (bool): Whether to visualize.
Default: False.
out_dir (str): Path to save the visualization results.
Default: None.
pipeline (list[dict], optional): raw data loading for showing.
Default: None.
Returns:
dict[str: float]: results of each evaluation metric
"""
from av2.evaluation.detection.constants import CompetitionCategories
from av2.evaluation.detection.utils import DetectionCfg
from av2.evaluation.detection.eval import evaluate
from av2.utils.io import read_feather
dts = self.format_results(results, class_names, pklfile_prefix, submission_prefix)
argo2_root = "../data/argo2/"
val_anno_path = osp.join(argo2_root, 'val_anno.feather')
gts = read_feather(val_anno_path)
gts = gts.set_index(["log_id", "timestamp_ns"]).sort_values("category")
valid_uuids_gts = gts.index.tolist()
valid_uuids_dts = dts.index.tolist()
valid_uuids = set(valid_uuids_gts) & set(valid_uuids_dts)
gts = gts.loc[list(valid_uuids)].sort_index()
categories = set(x.value for x in CompetitionCategories)
categories &= set(gts["category"].unique().tolist())
split = 'val'
dataset_dir = Path(argo2_root) / 'sensor' / split
cfg = DetectionCfg(
dataset_dir=dataset_dir,
categories=tuple(sorted(categories)),
#split=split,
max_range_m=200.0,
eval_only_roi_instances=True,
)
# Evaluate using Argoverse detection API.
eval_dts, eval_gts, metrics = evaluate(
dts.reset_index(), gts.reset_index(), cfg
)
valid_categories = sorted(categories) + ["AVERAGE_METRICS"]
ap_dict = {}
for index, row in metrics.iterrows():
ap_dict[index] = row.to_json()
return metrics.loc[valid_categories], ap_dict
LABEL_ATTR = (
"tx_m",
"ty_m",
"tz_m",
"length_m",
"width_m",
"height_m",
"qw",
"qx",
"qy",
"qz",
)
"""SO(3) group transformations."""
import kornia.geometry.conversions as C
import torch
from torch import Tensor
from math import pi as PI
@torch.jit.script
def quat_to_mat(quat_wxyz: Tensor) -> Tensor:
"""Convert scalar first quaternion to rotation matrix.
Args:
quat_wxyz: (...,4) Scalar first quaternions.
Returns:
(...,3,3) 3D rotation matrices.
"""
return C.quaternion_to_rotation_matrix(
quat_wxyz, order=C.QuaternionCoeffOrder.WXYZ
)
# @torch.jit.script
def mat_to_quat(mat: Tensor) -> Tensor:
"""Convert rotation matrix to scalar first quaternion.
Args:
mat: (...,3,3) 3D rotation matrices.
Returns:
(...,4) Scalar first quaternions.
"""
return C.rotation_matrix_to_quaternion(
mat, order=C.QuaternionCoeffOrder.WXYZ
)
@torch.jit.script
def quat_to_xyz(
quat_wxyz: Tensor, singularity_value: float = PI / 2
) -> Tensor:
"""Convert scalar first quaternion to Tait-Bryan angles.
Reference:
https://en.wikipedia.org/wiki/Conversion_between_quaternions_and_Euler_angles#Source_code_2
Args:
quat_wxyz: (...,4) Scalar first quaternions.
singularity_value: Value that's set at the singularities.
Returns:
(...,3) The Tait-Bryan angles --- roll, pitch, and yaw.
"""
qw = quat_wxyz[..., 0]
qx = quat_wxyz[..., 1]
qy = quat_wxyz[..., 2]
qz = quat_wxyz[..., 3]
# roll (x-axis rotation)
sinr_cosp = 2 * (qw * qx + qy * qz)
cosr_cosp = 1 - 2 * (qx * qx + qy * qy)
roll = torch.atan2(sinr_cosp, cosr_cosp)
# pitch (y-axis rotation)
pitch = 2 * (qw * qy - qz * qx)
is_out_of_range = torch.abs(pitch) >= 1
pitch[is_out_of_range] = torch.copysign(
torch.as_tensor(singularity_value), pitch[is_out_of_range]
)
pitch[~is_out_of_range] = torch.asin(pitch[~is_out_of_range])
# yaw (z-axis rotation)
siny_cosp = 2 * (qw * qz + qx * qy)
cosy_cosp = 1 - 2 * (qy * qy + qz * qz)
yaw = torch.atan2(siny_cosp, cosy_cosp)
xyz = torch.stack([roll, pitch, yaw], dim=-1)
return xyz
@torch.jit.script
def quat_to_yaw(quat_wxyz: Tensor) -> Tensor:
"""Convert scalar first quaternion to yaw (rotation about vertical axis).
Reference:
https://en.wikipedia.org/wiki/Conversion_between_quaternions_and_Euler_angles#Source_code_2
Args:
quat_wxyz: (...,4) Scalar first quaternions.
Returns:
(...,) The rotation about the z-axis in radians.
"""
xyz = quat_to_xyz(quat_wxyz)
yaw_rad: Tensor = xyz[..., -1]
return yaw_rad
@torch.jit.script
def xyz_to_quat(xyz_rad: Tensor) -> Tensor:
"""Convert euler angles (xyz - pitch, roll, yaw) to scalar first quaternions.
Args:
xyz_rad: (...,3) Tensor of roll, pitch, and yaw in radians.
Returns:
(...,4) Scalar first quaternions (wxyz).
"""
x_rad = xyz_rad[..., 0]
y_rad = xyz_rad[..., 1]
z_rad = xyz_rad[..., 2]
cy = torch.cos(z_rad * 0.5)
sy = torch.sin(z_rad * 0.5)
cp = torch.cos(y_rad * 0.5)
sp = torch.sin(y_rad * 0.5)
cr = torch.cos(x_rad * 0.5)
sr = torch.sin(x_rad * 0.5)
qw = cr * cp * cy + sr * sp * sy
qx = sr * cp * cy - cr * sp * sy
qy = cr * sp * cy + sr * cp * sy
qz = cr * cp * sy - sr * sp * cy
quat_wxyz = torch.stack([qw, qx, qy, qz], dim=-1)
return quat_wxyz
@torch.jit.script
def yaw_to_quat(yaw_rad: Tensor) -> Tensor:
"""Convert yaw (rotation about the vertical axis) to scalar first quaternions.
Args:
yaw_rad: (...,1) Rotations about the z-axis.
Returns:
(...,4) scalar first quaternions (wxyz).
"""
xyz_rad = torch.zeros_like(yaw_rad)[..., None].repeat_interleave(3, dim=-1)
xyz_rad[..., -1] = yaw_rad
quat_wxyz: Tensor = xyz_to_quat(xyz_rad)
return quat_wxyz
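A quick round-trip sanity check, assuming `yaw_to_quat` and `quat_to_yaw` from this module are importable:

```python
import torch
# from pcdet.datasets.argo2.argo2_utils.so3 import yaw_to_quat, quat_to_yaw  # assumed module path

yaw = torch.tensor([0.0, 0.5, -2.0])
quat = yaw_to_quat(yaw)                              # (3, 4) scalar-first (w, x, y, z)
assert torch.allclose(quat_to_yaw(quat), yaw, atol=1e-5)
```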
......@@ -199,13 +199,19 @@ class DatasetTemplate(torch_data.Dataset):
data_dict[key].append(val)
batch_size = len(batch_list)
ret = {}
batch_size_ratio = 1
for key, val in data_dict.items():
try:
if key in ['voxels', 'voxel_num_points']:
if isinstance(val[0], list):
batch_size_ratio = len(val[0])
val = [i for item in val for i in item]
ret[key] = np.concatenate(val, axis=0)
elif key in ['points', 'voxel_coords']:
coors = []
if isinstance(val[0], list):
val = [i for item in val for i in item]
for i, coor in enumerate(val):
coor_pad = np.pad(coor, ((0, 0), (1, 0)), mode='constant', constant_values=i)
coors.append(coor_pad)
......@@ -287,5 +293,5 @@ class DatasetTemplate(torch_data.Dataset):
print('Error in collate_batch: key=%s' % key)
raise TypeError
ret['batch_size'] = batch_size * batch_size_ratio
return ret
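The `batch_size_ratio` bookkeeping above exists for the double-flip path: each sample then carries a list of four voxel sets, so the collated batch grows accordingly. A small sketch under that assumption:

```python
import numpy as np

# Two samples, each with 4 flip variants of its voxel coords (toy arrays)
val = [[np.zeros((5, 3)) for _ in range(4)],
       [np.zeros((7, 3)) for _ in range(4)]]
batch_size_ratio = len(val[0])                           # 4
flat = [coords for sample in val for coords in sample]   # flattened to 8 coord arrays
batch_size = len(val) * batch_size_ratio                 # 2 * 4 = 8
# VoxelNeXtHead.forward later restores the true batch size via
# data_dict['batch_size'] // 4 before decoding boxes.
```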
......@@ -113,6 +113,22 @@ class DataProcessor(object):
return data_dict
def double_flip(self, points):
# y flip
points_yflip = points.copy()
points_yflip[:, 1] = -points_yflip[:, 1]
# x flip
points_xflip = points.copy()
points_xflip[:, 0] = -points_xflip[:, 0]
# x y flip
points_xyflip = points.copy()
points_xyflip[:, 0] = -points_xyflip[:, 0]
points_xyflip[:, 1] = -points_xyflip[:, 1]
return points_yflip, points_xflip, points_xyflip
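A hypothetical standalone check of the flip logic, with `processor` standing in for a `DataProcessor` instance:

```python
import numpy as np

pts = np.array([[1.0, 2.0, 0.5, 0.3]])        # (x, y, z, intensity)
y_f, x_f, xy_f = processor.double_flip(pts)   # processor: assumed DataProcessor
# y_f[:, :2]  -> [[ 1., -2.]]
# x_f[:, :2]  -> [[-1.,  2.]]
# xy_f[:, :2] -> [[-1., -2.]]
```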
def transform_points_to_voxels(self, data_dict=None, config=None):
if data_dict is None:
grid_size = (self.point_cloud_range[3:6] - self.point_cloud_range[0:3]) / np.array(config.VOXEL_SIZE)
......@@ -138,6 +154,25 @@ class DataProcessor(object):
if not data_dict['use_lead_xyz']:
voxels = voxels[..., 3:] # remove xyz in voxels(N, 3)
if config.get('DOUBLE_FLIP', False):
voxels_list, voxel_coords_list, voxel_num_points_list = [voxels], [coordinates], [num_points]
points_yflip, points_xflip, points_xyflip = self.double_flip(points)
points_list = [points_yflip, points_xflip, points_xyflip]
keys = ['yflip', 'xflip', 'xyflip']
for i, key in enumerate(keys):
voxel_output = self.voxel_generator.generate(points_list[i])
voxels, coordinates, num_points = voxel_output
if not data_dict['use_lead_xyz']:
voxels = voxels[..., 3:]
voxels_list.append(voxels)
voxel_coords_list.append(coordinates)
voxel_num_points_list.append(num_points)
data_dict['voxels'] = voxels_list
data_dict['voxel_coords'] = voxel_coords_list
data_dict['voxel_num_points'] = voxel_num_points_list
else:
data_dict['voxels'] = voxels
data_dict['voxel_coords'] = coordinates
data_dict['voxel_num_points'] = num_points
......
......@@ -2,6 +2,7 @@ from .pointnet2_backbone import PointNet2Backbone, PointNet2MSG
from .spconv_backbone import VoxelBackBone8x, VoxelResBackBone8x
from .spconv_backbone_2d import PillarBackBone8x, PillarRes18BackBone8x
from .spconv_backbone_focal import VoxelBackBone8xFocal
from .spconv_backbone_voxelnext import VoxelResBackBone8xVoxelNeXt
from .spconv_unet import UNetV2
__all__ = {
......@@ -11,6 +12,7 @@ __all__ = {
'PointNet2MSG': PointNet2MSG,
'VoxelResBackBone8x': VoxelResBackBone8x,
'VoxelBackBone8xFocal': VoxelBackBone8xFocal,
'VoxelResBackBone8xVoxelNeXt': VoxelResBackBone8xVoxelNeXt,
'PillarBackBone8x': PillarBackBone8x,
'PillarRes18BackBone8x': PillarRes18BackBone8x
}
from functools import partial
import torch
import torch.nn as nn
from ...utils.spconv_utils import replace_feature, spconv
def post_act_block(in_channels, out_channels, kernel_size, indice_key=None, stride=1, padding=0,
conv_type='subm', norm_fn=None):
if conv_type == 'subm':
conv = spconv.SubMConv3d(in_channels, out_channels, kernel_size, bias=False, indice_key=indice_key)
elif conv_type == 'spconv':
conv = spconv.SparseConv3d(in_channels, out_channels, kernel_size, stride=stride, padding=padding,
bias=False, indice_key=indice_key)
elif conv_type == 'inverseconv':
conv = spconv.SparseInverseConv3d(in_channels, out_channels, kernel_size, indice_key=indice_key, bias=False)
else:
raise NotImplementedError
m = spconv.SparseSequential(
conv,
norm_fn(out_channels),
nn.ReLU(),
)
return m
class SparseBasicBlock(spconv.SparseModule):
expansion = 1
def __init__(self, inplanes, planes, stride=1, norm_fn=None, downsample=None, indice_key=None):
super(SparseBasicBlock, self).__init__()
assert norm_fn is not None
bias = norm_fn is not None
self.conv1 = spconv.SubMConv3d(
inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=bias, indice_key=indice_key
)
self.bn1 = norm_fn(planes)
self.relu = nn.ReLU()
self.conv2 = spconv.SubMConv3d(
planes, planes, kernel_size=3, stride=stride, padding=1, bias=bias, indice_key=indice_key
)
self.bn2 = norm_fn(planes)
self.downsample = downsample
self.stride = stride
def forward(self, x):
identity = x
out = self.conv1(x)
out = replace_feature(out, self.bn1(out.features))
out = replace_feature(out, self.relu(out.features))
out = self.conv2(out)
out = replace_feature(out, self.bn2(out.features))
if self.downsample is not None:
identity = self.downsample(x)
out = replace_feature(out, out.features + identity.features)
out = replace_feature(out, self.relu(out.features))
return out
class VoxelResBackBone8xVoxelNeXt(nn.Module):
def __init__(self, model_cfg, input_channels, grid_size, **kwargs):
super().__init__()
self.model_cfg = model_cfg
norm_fn = partial(nn.BatchNorm1d, eps=1e-3, momentum=0.01)
spconv_kernel_sizes = model_cfg.get('SPCONV_KERNEL_SIZES', [3, 3, 3, 3])
channels = model_cfg.get('CHANNELS', [16, 32, 64, 128, 128])
out_channel = model_cfg.get('OUT_CHANNEL', 128)
self.sparse_shape = grid_size[::-1] + [1, 0, 0]
self.conv_input = spconv.SparseSequential(
spconv.SubMConv3d(input_channels, channels[0], 3, padding=1, bias=False, indice_key='subm1'),
norm_fn(channels[0]),
nn.ReLU(),
)
block = post_act_block
self.conv1 = spconv.SparseSequential(
SparseBasicBlock(channels[0], channels[0], norm_fn=norm_fn, indice_key='res1'),
SparseBasicBlock(channels[0], channels[0], norm_fn=norm_fn, indice_key='res1'),
)
self.conv2 = spconv.SparseSequential(
# [1600, 1408, 41] <- [800, 704, 21]
block(channels[0], channels[1], spconv_kernel_sizes[0], norm_fn=norm_fn, stride=2, padding=int(spconv_kernel_sizes[0]//2), indice_key='spconv2', conv_type='spconv'),
SparseBasicBlock(channels[1], channels[1], norm_fn=norm_fn, indice_key='res2'),
SparseBasicBlock(channels[1], channels[1], norm_fn=norm_fn, indice_key='res2'),
)
self.conv3 = spconv.SparseSequential(
# [800, 704, 21] <- [400, 352, 11]
block(channels[1], channels[2], spconv_kernel_sizes[1], norm_fn=norm_fn, stride=2, padding=int(spconv_kernel_sizes[1]//2), indice_key='spconv3', conv_type='spconv'),
SparseBasicBlock(channels[2], channels[2], norm_fn=norm_fn, indice_key='res3'),
SparseBasicBlock(channels[2], channels[2], norm_fn=norm_fn, indice_key='res3'),
)
self.conv4 = spconv.SparseSequential(
# [400, 352, 11] <- [200, 176, 6]
block(channels[2], channels[3], spconv_kernel_sizes[2], norm_fn=norm_fn, stride=2, padding=int(spconv_kernel_sizes[2]//2), indice_key='spconv4', conv_type='spconv'),
SparseBasicBlock(channels[3], channels[3], norm_fn=norm_fn, indice_key='res4'),
SparseBasicBlock(channels[3], channels[3], norm_fn=norm_fn, indice_key='res4'),
)
self.conv5 = spconv.SparseSequential(
# [200, 176, 6] <- [100, 88, 3]
block(channels[3], channels[4], spconv_kernel_sizes[3], norm_fn=norm_fn, stride=2, padding=int(spconv_kernel_sizes[3]//2), indice_key='spconv5', conv_type='spconv'),
SparseBasicBlock(channels[4], channels[4], norm_fn=norm_fn, indice_key='res5'),
SparseBasicBlock(channels[4], channels[4], norm_fn=norm_fn, indice_key='res5'),
)
self.conv6 = spconv.SparseSequential(
# [100, 88, 3] <- [50, 44, 2]
block(channels[4], channels[4], spconv_kernel_sizes[3], norm_fn=norm_fn, stride=2, padding=int(spconv_kernel_sizes[3]//2), indice_key='spconv6', conv_type='spconv'),
SparseBasicBlock(channels[4], channels[4], norm_fn=norm_fn, indice_key='res6'),
SparseBasicBlock(channels[4], channels[4], norm_fn=norm_fn, indice_key='res6'),
)
self.conv_out = spconv.SparseSequential(
# 2D stride-1 conv on the BEV-collapsed tensor: channels[3] -> out_channel
spconv.SparseConv2d(channels[3], out_channel, 3, stride=1, padding=1, bias=False, indice_key='spconv_down2'),
norm_fn(out_channel),
nn.ReLU(),
)
self.shared_conv = spconv.SparseSequential(
spconv.SubMConv2d(out_channel, out_channel, 3, stride=1, padding=1, bias=True),
nn.BatchNorm1d(out_channel),
nn.ReLU(True),
)
self.forward_ret_dict = {}
self.num_point_features = out_channel
self.backbone_channels = {
'x_conv1': channels[0],
'x_conv2': channels[1],
'x_conv3': channels[2],
'x_conv4': channels[3]
}
def bev_out(self, x_conv):
features_cat = x_conv.features
indices_cat = x_conv.indices[:, [0, 2, 3]]
spatial_shape = x_conv.spatial_shape[1:]
indices_unique, _inv = torch.unique(indices_cat, dim=0, return_inverse=True)
features_unique = features_cat.new_zeros((indices_unique.shape[0], features_cat.shape[1]))
features_unique.index_add_(0, _inv, features_cat)
x_out = spconv.SparseConvTensor(
features=features_unique,
indices=indices_unique,
spatial_shape=spatial_shape,
batch_size=x_conv.batch_size
)
return x_out
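The height compression in `bev_out` merges voxels that share the same (batch, y, x) cell by summing their features; a minimal sketch of that mechanism:

```python
import torch

indices_cat = torch.tensor([[0, 3, 5],
                            [0, 3, 5],    # same BEV cell as the row above
                            [0, 1, 2]])
features_cat = torch.tensor([[1.0], [2.0], [5.0]])

indices_unique, inv = torch.unique(indices_cat, dim=0, return_inverse=True)
features_unique = features_cat.new_zeros((indices_unique.shape[0], 1))
features_unique.index_add_(0, inv, features_cat)
# indices_unique: [[0, 1, 2], [0, 3, 5]]; features_unique: [[5.], [3.]]
```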
def forward(self, batch_dict):
"""
Args:
batch_dict:
batch_size: int
vfe_features: (num_voxels, C)
voxel_coords: (num_voxels, 4), [batch_idx, z_idx, y_idx, x_idx]
Returns:
batch_dict:
encoded_spconv_tensor: sparse tensor
"""
voxel_features, voxel_coords = batch_dict['voxel_features'], batch_dict['voxel_coords']
batch_size = batch_dict['batch_size']
input_sp_tensor = spconv.SparseConvTensor(
features=voxel_features,
indices=voxel_coords.int(),
spatial_shape=self.sparse_shape,
batch_size=batch_size
)
x = self.conv_input(input_sp_tensor)
x_conv1 = self.conv1(x)
x_conv2 = self.conv2(x_conv1)
x_conv3 = self.conv3(x_conv2)
x_conv4 = self.conv4(x_conv3)
x_conv5 = self.conv5(x_conv4)
x_conv6 = self.conv6(x_conv5)
x_conv5.indices[:, 1:] *= 2
x_conv6.indices[:, 1:] *= 4
x_conv4 = x_conv4.replace_feature(torch.cat([x_conv4.features, x_conv5.features, x_conv6.features]))
x_conv4.indices = torch.cat([x_conv4.indices, x_conv5.indices, x_conv6.indices])
out = self.bev_out(x_conv4)
out = self.conv_out(out)
out = self.shared_conv(out)
batch_dict.update({
'encoded_spconv_tensor': out,
'encoded_spconv_tensor_stride': 8
})
batch_dict.update({
'multi_scale_3d_features': {
'x_conv1': x_conv1,
'x_conv2': x_conv2,
'x_conv3': x_conv3,
'x_conv4': x_conv4,
}
})
batch_dict.update({
'multi_scale_3d_strides': {
'x_conv1': 1,
'x_conv2': 2,
'x_conv3': 4,
'x_conv4': 8,
}
})
return batch_dict
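The index rescaling in `forward` aligns the stride-16 (`x_conv5`) and stride-32 (`x_conv6`) voxels onto the stride-8 grid of `x_conv4`, so `bev_out` can merge all three scales; for example:

```python
import torch

idx5 = torch.tensor([[0, 1, 3, 5]])   # (batch, z, y, x) of a stride-16 voxel
idx6 = torch.tensor([[0, 0, 2, 4]])   # a stride-32 voxel
idx5[:, 1:] *= 2                      # -> [[0, 2, 6, 10]] on the stride-8 grid
idx6[:, 1:] *= 4                      # -> [[0, 0, 8, 16]] on the stride-8 grid
```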
......@@ -5,6 +5,7 @@ from .point_head_box import PointHeadBox
from .point_head_simple import PointHeadSimple
from .point_intra_part_head import PointIntraPartOffsetHead
from .center_head import CenterHead
from .voxelnext_head import VoxelNeXtHead
__all__ = {
'AnchorHeadTemplate': AnchorHeadTemplate,
......@@ -13,5 +14,6 @@ __all__ = {
'PointHeadSimple': PointHeadSimple,
'PointHeadBox': PointHeadBox,
'AnchorHeadMulti': AnchorHeadMulti,
'CenterHead': CenterHead,
'VoxelNeXtHead': VoxelNeXtHead,
}
import numpy as np
import torch
import torch.nn as nn
from torch.nn.init import kaiming_normal_
from ..model_utils import centernet_utils
from ..model_utils import model_nms_utils
from ...utils import loss_utils
import spconv.pytorch as spconv
import copy
from easydict import EasyDict
class SeparateHead(nn.Module):
def __init__(self, input_channels, sep_head_dict, kernel_size, init_bias=-2.19, use_bias=False):
super().__init__()
self.sep_head_dict = sep_head_dict
for cur_name in self.sep_head_dict:
output_channels = self.sep_head_dict[cur_name]['out_channels']
num_conv = self.sep_head_dict[cur_name]['num_conv']
fc_list = []
for k in range(num_conv - 1):
fc_list.append(spconv.SparseSequential(
spconv.SubMConv2d(input_channels, input_channels, kernel_size, padding=int(kernel_size//2), bias=use_bias, indice_key=cur_name),
nn.BatchNorm1d(input_channels),
nn.ReLU()
))
fc_list.append(spconv.SubMConv2d(input_channels, output_channels, 1, bias=True, indice_key=cur_name+'out'))
fc = nn.Sequential(*fc_list)
if 'hm' in cur_name:
fc[-1].bias.data.fill_(init_bias)
else:
for m in fc.modules():
if isinstance(m, spconv.SubMConv2d):
kaiming_normal_(m.weight.data)
if hasattr(m, "bias") and m.bias is not None:
nn.init.constant_(m.bias, 0)
self.__setattr__(cur_name, fc)
def forward(self, x):
ret_dict = {}
for cur_name in self.sep_head_dict:
ret_dict[cur_name] = self.__getattr__(cur_name)(x).features
return ret_dict
class VoxelNeXtHead(nn.Module):
def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range, voxel_size,
predict_boxes_when_training=False):
super().__init__()
self.model_cfg = model_cfg
self.num_class = num_class
self.grid_size = grid_size
self.point_cloud_range = torch.Tensor(point_cloud_range).cuda()
self.voxel_size = torch.Tensor(voxel_size).cuda()
self.feature_map_stride = self.model_cfg.TARGET_ASSIGNER_CONFIG.get('FEATURE_MAP_STRIDE', None)
self.class_names = class_names
self.class_names_each_head = []
self.class_id_mapping_each_head = []
self.gaussian_ratio = self.model_cfg.get('GAUSSIAN_RATIO', 1)
self.gaussian_type = self.model_cfg.get('GAUSSIAN_TYPE', ['nearst', 'gt_center'])  # 'nearst' (sic) is the key expected by existing configs
# The iou branch is only used for Waymo dataset
self.iou_branch = self.model_cfg.get('IOU_BRANCH', False)
if self.iou_branch:
self.rectifier = self.model_cfg.get('RECTIFIER')
nms_configs = self.model_cfg.POST_PROCESSING.NMS_CONFIG
self.nms_configs = [EasyDict(NMS_TYPE=nms_configs.NMS_TYPE,
NMS_THRESH=nms_configs.NMS_THRESH[i],
NMS_PRE_MAXSIZE=nms_configs.NMS_PRE_MAXSIZE[i],
NMS_POST_MAXSIZE=nms_configs.NMS_POST_MAXSIZE[i]) for i in range(num_class)]
self.double_flip = self.model_cfg.get('DOUBLE_FLIP', False)
for cur_class_names in self.model_cfg.CLASS_NAMES_EACH_HEAD:
self.class_names_each_head.append([x for x in cur_class_names if x in class_names])
cur_class_id_mapping = torch.from_numpy(np.array(
[self.class_names.index(x) for x in cur_class_names if x in class_names]
)).cuda()
self.class_id_mapping_each_head.append(cur_class_id_mapping)
total_classes = sum([len(x) for x in self.class_names_each_head])
assert total_classes == len(self.class_names), f'class_names_each_head={self.class_names_each_head}'
kernel_size_head = self.model_cfg.get('KERNEL_SIZE_HEAD', 3)
self.heads_list = nn.ModuleList()
self.separate_head_cfg = self.model_cfg.SEPARATE_HEAD_CFG
for idx, cur_class_names in enumerate(self.class_names_each_head):
cur_head_dict = copy.deepcopy(self.separate_head_cfg.HEAD_DICT)
cur_head_dict['hm'] = dict(out_channels=len(cur_class_names), num_conv=self.model_cfg.NUM_HM_CONV)
self.heads_list.append(
SeparateHead(
input_channels=self.model_cfg.get('SHARED_CONV_CHANNEL', 128),
sep_head_dict=cur_head_dict,
kernel_size=kernel_size_head,
init_bias=-2.19,
use_bias=self.model_cfg.get('USE_BIAS_BEFORE_NORM', False),
)
)
self.predict_boxes_when_training = predict_boxes_when_training
self.forward_ret_dict = {}
self.build_losses()
def build_losses(self):
self.add_module('hm_loss_func', loss_utils.FocalLossSparse())
self.add_module('reg_loss_func', loss_utils.RegLossSparse())
if self.iou_branch:
self.add_module('crit_iou', loss_utils.IouLossSparse())
self.add_module('crit_iou_reg', loss_utils.IouRegLossSparse())
def assign_targets(self, gt_boxes, num_voxels, spatial_indices, spatial_shape):
"""
Args:
gt_boxes: (B, M, 8)
Returns:
"""
target_assigner_cfg = self.model_cfg.TARGET_ASSIGNER_CONFIG
batch_size = gt_boxes.shape[0]
ret_dict = {
'heatmaps': [],
'target_boxes': [],
'inds': [],
'masks': [],
'heatmap_masks': [],
'gt_boxes': []
}
all_names = np.array(['bg', *self.class_names])
for idx, cur_class_names in enumerate(self.class_names_each_head):
heatmap_list, target_boxes_list, inds_list, masks_list, gt_boxes_list = [], [], [], [], []
for bs_idx in range(batch_size):
cur_gt_boxes = gt_boxes[bs_idx]
gt_class_names = all_names[cur_gt_boxes[:, -1].cpu().long().numpy()]
gt_boxes_single_head = []
for idx, name in enumerate(gt_class_names):
if name not in cur_class_names:
continue
temp_box = cur_gt_boxes[idx]
temp_box[-1] = cur_class_names.index(name) + 1
gt_boxes_single_head.append(temp_box[None, :])
if len(gt_boxes_single_head) == 0:
gt_boxes_single_head = cur_gt_boxes[:0, :]
else:
gt_boxes_single_head = torch.cat(gt_boxes_single_head, dim=0)
heatmap, ret_boxes, inds, mask = self.assign_target_of_single_head(
num_classes=len(cur_class_names), gt_boxes=gt_boxes_single_head,
num_voxels=num_voxels[bs_idx], spatial_indices=spatial_indices[bs_idx],
spatial_shape=spatial_shape,
feature_map_stride=target_assigner_cfg.FEATURE_MAP_STRIDE,
num_max_objs=target_assigner_cfg.NUM_MAX_OBJS,
gaussian_overlap=target_assigner_cfg.GAUSSIAN_OVERLAP,
min_radius=target_assigner_cfg.MIN_RADIUS,
)
heatmap_list.append(heatmap.to(gt_boxes_single_head.device))
target_boxes_list.append(ret_boxes.to(gt_boxes_single_head.device))
inds_list.append(inds.to(gt_boxes_single_head.device))
masks_list.append(mask.to(gt_boxes_single_head.device))
gt_boxes_list.append(gt_boxes_single_head[:, :-1])
ret_dict['heatmaps'].append(torch.cat(heatmap_list, dim=1).permute(1, 0))
ret_dict['target_boxes'].append(torch.stack(target_boxes_list, dim=0))
ret_dict['inds'].append(torch.stack(inds_list, dim=0))
ret_dict['masks'].append(torch.stack(masks_list, dim=0))
ret_dict['gt_boxes'].append(gt_boxes_list)
return ret_dict
def distance(self, voxel_indices, center):
distances = ((voxel_indices - center.unsqueeze(0))**2).sum(-1)
return distances
def assign_target_of_single_head(
self, num_classes, gt_boxes, num_voxels, spatial_indices, spatial_shape, feature_map_stride, num_max_objs=500,
gaussian_overlap=0.1, min_radius=2
):
"""
Args:
gt_boxes: (N, 8)
feature_map_size: (2), [x, y]
Returns:
"""
heatmap = gt_boxes.new_zeros(num_classes, num_voxels)
ret_boxes = gt_boxes.new_zeros((num_max_objs, gt_boxes.shape[-1] - 1 + 1))
inds = gt_boxes.new_zeros(num_max_objs).long()
mask = gt_boxes.new_zeros(num_max_objs).long()
x, y, z = gt_boxes[:, 0], gt_boxes[:, 1], gt_boxes[:, 2]
coord_x = (x - self.point_cloud_range[0]) / self.voxel_size[0] / feature_map_stride
coord_y = (y - self.point_cloud_range[1]) / self.voxel_size[1] / feature_map_stride
coord_x = torch.clamp(coord_x, min=0, max=spatial_shape[1] - 0.5) # bugfixed: 1e-6 does not work for center.int()
coord_y = torch.clamp(coord_y, min=0, max=spatial_shape[0] - 0.5)
center = torch.cat((coord_x[:, None], coord_y[:, None]), dim=-1)
center_int = center.int()
center_int_float = center_int.float()
dx, dy, dz = gt_boxes[:, 3], gt_boxes[:, 4], gt_boxes[:, 5]
dx = dx / self.voxel_size[0] / feature_map_stride
dy = dy / self.voxel_size[1] / feature_map_stride
radius = centernet_utils.gaussian_radius(dx, dy, min_overlap=gaussian_overlap)
radius = torch.clamp_min(radius.int(), min=min_radius)
for k in range(min(num_max_objs, gt_boxes.shape[0])):
if dx[k] <= 0 or dy[k] <= 0:
continue
if not (0 <= center_int[k][0] <= spatial_shape[1] and 0 <= center_int[k][1] <= spatial_shape[0]):
continue
cur_class_id = (gt_boxes[k, -1] - 1).long()
distance = self.distance(spatial_indices, center[k])
inds[k] = distance.argmin()
mask[k] = 1
if 'gt_center' in self.gaussian_type:
centernet_utils.draw_gaussian_to_heatmap_voxels(heatmap[cur_class_id], distance, radius[k].item() * self.gaussian_ratio)
if 'nearst' in self.gaussian_type:
centernet_utils.draw_gaussian_to_heatmap_voxels(heatmap[cur_class_id], self.distance(spatial_indices, spatial_indices[inds[k]]), radius[k].item() * self.gaussian_ratio)
ret_boxes[k, 0:2] = center[k] - spatial_indices[inds[k]][:2]
ret_boxes[k, 2] = z[k]
ret_boxes[k, 3:6] = gt_boxes[k, 3:6].log()
ret_boxes[k, 6] = torch.cos(gt_boxes[k, 6])
ret_boxes[k, 7] = torch.sin(gt_boxes[k, 6])
if gt_boxes.shape[1] > 8:
ret_boxes[k, 8:] = gt_boxes[k, 7:-1]
return heatmap, ret_boxes, inds, mask
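To make the encoding above concrete, here is a hypothetical single-box target with `voxel_size=0.1`, `feature_map_stride=8`, a point-cloud range starting at 0, and the nearest active voxel at BEV cell (50, 50); all numbers are illustrative:

```python
import torch

x, y, z, dx, dy, dz, yaw = 41.3, 40.2, -1.0, 4.0, 2.0, 1.5, 0.3
coord = torch.tensor([x / 0.1 / 8, y / 0.1 / 8])     # (51.625, 50.25)
nearest = torch.tensor([50.0, 50.0])

target = torch.empty(8)
target[0:2] = coord - nearest                        # sub-voxel center offset
target[2] = z                                        # absolute height
target[3:6] = torch.log(torch.tensor([dx, dy, dz]))  # log-scaled extents
target[6] = torch.cos(torch.tensor(yaw))             # heading as (cos, sin)
target[7] = torch.sin(torch.tensor(yaw))
```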
def sigmoid(self, x):
y = torch.clamp(x.sigmoid(), min=1e-4, max=1 - 1e-4)
return y
def get_loss(self):
pred_dicts = self.forward_ret_dict['pred_dicts']
target_dicts = self.forward_ret_dict['target_dicts']
batch_index = self.forward_ret_dict['batch_index']
tb_dict = {}
loss = 0
batch_indices = self.forward_ret_dict['voxel_indices'][:, 0]
spatial_indices = self.forward_ret_dict['voxel_indices'][:, 1:]
for idx, pred_dict in enumerate(pred_dicts):
pred_dict['hm'] = self.sigmoid(pred_dict['hm'])
hm_loss = self.hm_loss_func(pred_dict['hm'], target_dicts['heatmaps'][idx])
hm_loss *= self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['cls_weight']
target_boxes = target_dicts['target_boxes'][idx]
pred_boxes = torch.cat([pred_dict[head_name] for head_name in self.separate_head_cfg.HEAD_ORDER], dim=1)
reg_loss = self.reg_loss_func(
pred_boxes, target_dicts['masks'][idx], target_dicts['inds'][idx], target_boxes, batch_index
)
loc_loss = (reg_loss * reg_loss.new_tensor(self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['code_weights'])).sum()
loc_loss = loc_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']
tb_dict['hm_loss_head_%d' % idx] = hm_loss.item()
tb_dict['loc_loss_head_%d' % idx] = loc_loss.item()
if self.iou_branch:
batch_box_preds = self._get_predicted_boxes(pred_dict, spatial_indices)
pred_boxes_for_iou = batch_box_preds.detach()
iou_loss = self.crit_iou(pred_dict['iou'], target_dicts['masks'][idx], target_dicts['inds'][idx],
pred_boxes_for_iou, target_dicts['gt_boxes'][idx], batch_indices)
iou_reg_loss = self.crit_iou_reg(batch_box_preds, target_dicts['masks'][idx], target_dicts['inds'][idx],
target_dicts['gt_boxes'][idx], batch_indices)
iou_weight = self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['iou_weight'] if 'iou_weight' in self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS else self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']
iou_reg_loss = iou_reg_loss * iou_weight #self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']
loss += (hm_loss + loc_loss + iou_loss + iou_reg_loss)
tb_dict['iou_loss_head_%d' % idx] = iou_loss.item()
tb_dict['iou_reg_loss_head_%d' % idx] = iou_reg_loss.item()
else:
loss += hm_loss + loc_loss
tb_dict['rpn_loss'] = loss.item()
return loss, tb_dict
def _get_predicted_boxes(self, pred_dict, spatial_indices):
center = pred_dict['center']
center_z = pred_dict['center_z']
#dim = pred_dict['dim'].exp()
dim = torch.exp(torch.clamp(pred_dict['dim'], min=-5, max=5))
rot_cos = pred_dict['rot'][:, 0].unsqueeze(dim=1)
rot_sin = pred_dict['rot'][:, 1].unsqueeze(dim=1)
angle = torch.atan2(rot_sin, rot_cos)
xs = (spatial_indices[:, 1:2] + center[:, 0:1]) * self.feature_map_stride * self.voxel_size[0] + self.point_cloud_range[0]
ys = (spatial_indices[:, 0:1] + center[:, 1:2]) * self.feature_map_stride * self.voxel_size[1] + self.point_cloud_range[1]
box_part_list = [xs, ys, center_z, dim, angle]
pred_box = torch.cat((box_part_list), dim=-1)
return pred_box
def rotate_class_specific_nms_iou(self, boxes, scores, iou_preds, labels, rectifier, nms_configs):
"""
:param boxes: (N, 5) [x, y, z, l, w, h, theta]
:param scores: (N)
:param thresh:
:return:
"""
assert isinstance(rectifier, list)
box_preds_list, scores_list, labels_list = [], [], []
for cls in range(self.num_class):
mask = labels == cls
boxes_cls = boxes[mask]
scores_cls = torch.pow(scores[mask], 1 - rectifier[cls]) * torch.pow(iou_preds[mask].squeeze(-1), rectifier[cls])
labels_cls = labels[mask]
selected, selected_scores = model_nms_utils.class_agnostic_nms(box_scores=scores_cls, box_preds=boxes_cls,
nms_config=nms_configs[cls], score_thresh=None)
box_preds_list.append(boxes_cls[selected])
scores_list.append(scores_cls[selected])
labels_list.append(labels_cls[selected])
return torch.cat(box_preds_list, dim=0), torch.cat(scores_list, dim=0), torch.cat(labels_list, dim=0)
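The IoU rectification above blends the classification score with the predicted IoU as score^(1-r) * iou^r, with a per-class rectifier r; a quick numeric check (values illustrative):

```python
import torch

score, iou, r = torch.tensor(0.9), torch.tensor(0.6), 0.68
rectified = score ** (1 - r) * iou ** r   # ~= 0.68
```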
def merge_double_flip(self, pred_dict, batch_size, voxel_indices, spatial_shape):
# spatial_shape (Z, Y, X)
pred_dict['hm'] = pred_dict['hm'].sigmoid()
pred_dict['dim'] = pred_dict['dim'].exp()
batch_indices = voxel_indices[:, 0]
spatial_indices = voxel_indices[:, 1:]
pred_dict_ = {k: [] for k in pred_dict.keys()}
counts = []
spatial_indices_ = []
for bs_idx in range(batch_size):
spatial_indices_batch = []
pred_dict_batch = {k: [] for k in pred_dict.keys()}
for i in range(4):
bs_indices = batch_indices == (bs_idx * 4 + i)
if i in [1, 3]:
spatial_indices[bs_indices, 0] = spatial_shape[0] - spatial_indices[bs_indices, 0]
if i in [2, 3]:
spatial_indices[bs_indices, 1] = spatial_shape[1] - spatial_indices[bs_indices, 1]
if i == 1:
pred_dict['center'][bs_indices, 1] = - pred_dict['center'][bs_indices, 1]
pred_dict['rot'][bs_indices, 1] *= -1
pred_dict['vel'][bs_indices, 1] *= -1
if i == 2:
pred_dict['center'][bs_indices, 0] = - pred_dict['center'][bs_indices, 0]
pred_dict['rot'][bs_indices, 0] *= -1
pred_dict['vel'][bs_indices, 0] *= -1
if i == 3:
pred_dict['center'][bs_indices, 0] = - pred_dict['center'][bs_indices, 0]
pred_dict['center'][bs_indices, 1] = - pred_dict['center'][bs_indices, 1]
pred_dict['rot'][bs_indices, 1] *= -1
pred_dict['rot'][bs_indices, 0] *= -1
pred_dict['vel'][bs_indices] *= -1
spatial_indices_batch.append(spatial_indices[bs_indices])
for k in pred_dict.keys():
pred_dict_batch[k].append(pred_dict[k][bs_indices])
spatial_indices_batch = torch.cat(spatial_indices_batch)
spatial_indices_unique, _inv, count = torch.unique(spatial_indices_batch, dim=0, return_inverse=True,
return_counts=True)
spatial_indices_.append(spatial_indices_unique)
counts.append(count)
for k in pred_dict.keys():
pred_dict_batch[k] = torch.cat(pred_dict_batch[k])
features_unique = pred_dict_batch[k].new_zeros(
(spatial_indices_unique.shape[0], pred_dict_batch[k].shape[1]))
features_unique.index_add_(0, _inv, pred_dict_batch[k])
pred_dict_[k].append(features_unique)
for k in pred_dict.keys():
pred_dict_[k] = torch.cat(pred_dict_[k])
counts = torch.cat(counts).unsqueeze(-1).float()
voxel_indices_ = torch.cat([torch.cat(
[torch.full((indices.shape[0], 1), i, device=indices.device, dtype=indices.dtype), indices], dim=1
) for i, indices in enumerate(spatial_indices_)])
batch_hm = pred_dict_['hm']
batch_center = pred_dict_['center']
batch_center_z = pred_dict_['center_z']
batch_dim = pred_dict_['dim']
batch_rot_cos = pred_dict_['rot'][:, 0].unsqueeze(dim=1)
batch_rot_sin = pred_dict_['rot'][:, 1].unsqueeze(dim=1)
batch_vel = pred_dict_['vel'] if 'vel' in self.separate_head_cfg.HEAD_ORDER else None
batch_hm /= counts
batch_center /= counts
batch_center_z /= counts
batch_dim /= counts
batch_rot_cos /= counts
batch_rot_sin /= counts
if batch_vel is not None:
batch_vel /= counts
return batch_hm, batch_center, batch_center_z, batch_dim, batch_rot_cos, batch_rot_sin, batch_vel, None, voxel_indices_
def generate_predicted_boxes(self, batch_size, pred_dicts, voxel_indices, spatial_shape):
post_process_cfg = self.model_cfg.POST_PROCESSING
post_center_limit_range = torch.tensor(post_process_cfg.POST_CENTER_LIMIT_RANGE).cuda().float()
ret_dict = [{
'pred_boxes': [],
'pred_scores': [],
'pred_labels': [],
'pred_ious': [],
} for k in range(batch_size)]
for idx, pred_dict in enumerate(pred_dicts):
if self.double_flip:
batch_hm, batch_center, batch_center_z, batch_dim, batch_rot_cos, batch_rot_sin, batch_vel, batch_iou, voxel_indices_ = \
self.merge_double_flip(pred_dict, batch_size, voxel_indices.clone(), spatial_shape)
else:
batch_hm = pred_dict['hm'].sigmoid()
batch_center = pred_dict['center']
batch_center_z = pred_dict['center_z']
batch_dim = pred_dict['dim'].exp()
batch_rot_cos = pred_dict['rot'][:, 0].unsqueeze(dim=1)
batch_rot_sin = pred_dict['rot'][:, 1].unsqueeze(dim=1)
batch_iou = (pred_dict['iou'] + 1) * 0.5 if self.iou_branch else None
batch_vel = pred_dict['vel'] if 'vel' in self.separate_head_cfg.HEAD_ORDER else None
voxel_indices_ = voxel_indices
final_pred_dicts = centernet_utils.decode_bbox_from_voxels_nuscenes(
batch_size=batch_size, indices=voxel_indices_,
obj=batch_hm,
rot_cos=batch_rot_cos,
rot_sin=batch_rot_sin,
center=batch_center, center_z=batch_center_z,
dim=batch_dim, vel=batch_vel, iou=batch_iou,
point_cloud_range=self.point_cloud_range, voxel_size=self.voxel_size,
feature_map_stride=self.feature_map_stride,
K=post_process_cfg.MAX_OBJ_PER_SAMPLE,
#circle_nms=(post_process_cfg.NMS_CONFIG.NMS_TYPE == 'circle_nms'),
score_thresh=post_process_cfg.SCORE_THRESH,
post_center_limit_range=post_center_limit_range
)
for k, final_dict in enumerate(final_pred_dicts):
final_dict['pred_labels'] = self.class_id_mapping_each_head[idx][final_dict['pred_labels'].long()]
if not self.iou_branch:
selected, selected_scores = model_nms_utils.class_agnostic_nms(
box_scores=final_dict['pred_scores'], box_preds=final_dict['pred_boxes'],
nms_config=post_process_cfg.NMS_CONFIG,
score_thresh=None
)
final_dict['pred_boxes'] = final_dict['pred_boxes'][selected]
final_dict['pred_scores'] = selected_scores
final_dict['pred_labels'] = final_dict['pred_labels'][selected]
ret_dict[k]['pred_boxes'].append(final_dict['pred_boxes'])
ret_dict[k]['pred_scores'].append(final_dict['pred_scores'])
ret_dict[k]['pred_labels'].append(final_dict['pred_labels'])
ret_dict[k]['pred_ious'].append(final_dict['pred_ious'])
for k in range(batch_size):
pred_boxes = torch.cat(ret_dict[k]['pred_boxes'], dim=0)
pred_scores = torch.cat(ret_dict[k]['pred_scores'], dim=0)
pred_labels = torch.cat(ret_dict[k]['pred_labels'], dim=0)
if self.iou_branch:
pred_ious = torch.cat(ret_dict[k]['pred_ious'], dim=0)
pred_boxes, pred_scores, pred_labels = self.rotate_class_specific_nms_iou(pred_boxes, pred_scores, pred_ious, pred_labels, self.rectifier, self.nms_configs)
ret_dict[k]['pred_boxes'] = pred_boxes
ret_dict[k]['pred_scores'] = pred_scores
ret_dict[k]['pred_labels'] = pred_labels + 1
return ret_dict
@staticmethod
def reorder_rois_for_refining(batch_size, pred_dicts):
num_max_rois = max([len(cur_dict['pred_boxes']) for cur_dict in pred_dicts])
num_max_rois = max(1, num_max_rois)  # keep at least one fake RoI to avoid empty-tensor errors
pred_boxes = pred_dicts[0]['pred_boxes']
rois = pred_boxes.new_zeros((batch_size, num_max_rois, pred_boxes.shape[-1]))
roi_scores = pred_boxes.new_zeros((batch_size, num_max_rois))
roi_labels = pred_boxes.new_zeros((batch_size, num_max_rois)).long()
for bs_idx in range(batch_size):
num_boxes = len(pred_dicts[bs_idx]['pred_boxes'])
rois[bs_idx, :num_boxes, :] = pred_dicts[bs_idx]['pred_boxes']
roi_scores[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_scores']
roi_labels[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_labels']
return rois, roi_scores, roi_labels
def _get_voxel_infos(self, x):
spatial_shape = x.spatial_shape
voxel_indices = x.indices
spatial_indices = []
num_voxels = []
batch_size = x.batch_size
batch_index = voxel_indices[:, 0]
for bs_idx in range(batch_size):
batch_inds = batch_index==bs_idx
spatial_indices.append(voxel_indices[batch_inds][:, [2, 1]])
num_voxels.append(batch_inds.sum())
return spatial_shape, batch_index, voxel_indices, spatial_indices, num_voxels
def forward(self, data_dict):
x = data_dict['encoded_spconv_tensor']
spatial_shape, batch_index, voxel_indices, spatial_indices, num_voxels = self._get_voxel_infos(x)
self.forward_ret_dict['batch_index'] = batch_index
pred_dicts = []
for head in self.heads_list:
pred_dicts.append(head(x))
if self.training:
target_dict = self.assign_targets(
data_dict['gt_boxes'], num_voxels, spatial_indices, spatial_shape
)
self.forward_ret_dict['target_dicts'] = target_dict
self.forward_ret_dict['pred_dicts'] = pred_dicts
self.forward_ret_dict['voxel_indices'] = voxel_indices
if not self.training or self.predict_boxes_when_training:
if self.double_flip:
data_dict['batch_size'] = data_dict['batch_size'] // 4
pred_dicts = self.generate_predicted_boxes(
data_dict['batch_size'],
pred_dicts, voxel_indices, spatial_shape
)
if self.predict_boxes_when_training:
rois, roi_scores, roi_labels = self.reorder_rois_for_refining(data_dict['batch_size'], pred_dicts)
data_dict['rois'] = rois
data_dict['roi_scores'] = roi_scores
data_dict['roi_labels'] = roi_labels
data_dict['has_class_labels'] = True
else:
data_dict['final_box_dicts'] = pred_dicts
return data_dict
......@@ -12,6 +12,7 @@ from .pv_rcnn_plusplus import PVRCNNPlusPlus
from .mppnet import MPPNet
from .mppnet_e2e import MPPNetE2E
from .pillarnet import PillarNet
from .voxelnext import VoxelNeXt
__all__ = {
'Detector3DTemplate': Detector3DTemplate,
......@@ -28,7 +29,8 @@ __all__ = {
'PVRCNNPlusPlus': PVRCNNPlusPlus,
'MPPNet': MPPNet,
'MPPNetE2E': MPPNetE2E,
'PillarNet': PillarNet,
'VoxelNeXt': VoxelNeXt
}
......
......@@ -127,7 +127,7 @@ class Detector3DTemplate(nn.Module):
return None, model_info_dict
dense_head_module = dense_heads.__all__[self.model_cfg.DENSE_HEAD.NAME](
model_cfg=self.model_cfg.DENSE_HEAD,
input_channels=model_info_dict['num_bev_features'] if 'num_bev_features' in model_info_dict else self.model_cfg.DENSE_HEAD.INPUT_FEATURES,
num_class=self.num_class if not self.model_cfg.DENSE_HEAD.CLASS_AGNOSTIC else 1,
class_names=self.class_names,
grid_size=model_info_dict['grid_size'],
......
from .detector3d_template import Detector3DTemplate
class VoxelNeXt(Detector3DTemplate):
def __init__(self, model_cfg, num_class, dataset):
super().__init__(model_cfg=model_cfg, num_class=num_class, dataset=dataset)
self.module_list = self.build_networks()
def forward(self, batch_dict):
for cur_module in self.module_list:
batch_dict = cur_module(batch_dict)
if self.training:
loss, tb_dict, disp_dict = self.get_training_loss()
ret_dict = {
'loss': loss
}
return ret_dict, tb_dict, disp_dict
else:
pred_dicts, recall_dicts = self.post_processing(batch_dict)
return pred_dicts, recall_dicts
def get_training_loss(self):
disp_dict = {}
loss, tb_dict = self.dense_head.get_loss()
return loss, tb_dict, disp_dict
def post_processing(self, batch_dict):
post_process_cfg = self.model_cfg.POST_PROCESSING
batch_size = batch_dict['batch_size']
final_pred_dict = batch_dict['final_box_dicts']
recall_dict = {}
for index in range(batch_size):
pred_boxes = final_pred_dict[index]['pred_boxes']
recall_dict = self.generate_recall_record(
box_preds=pred_boxes,
recall_dict=recall_dict, batch_index=index, data_dict=batch_dict,
thresh_list=post_process_cfg.RECALL_THRESH_LIST
)
return final_pred_dict, recall_dict
......@@ -77,6 +77,25 @@ def _nms(heat, kernel=3):
return heat * keep
def gaussian3D(shape, sigma=1):
m, n = [(ss - 1.) / 2. for ss in shape]
y, x = np.ogrid[-m:m + 1, -n:n + 1]
h = np.exp(-(x * x + y * y) / (2 * sigma * sigma))
h[h < np.finfo(h.dtype).eps * h.max()] = 0
return h
def draw_gaussian_to_heatmap_voxels(heatmap, distances, radius, k=1):
diameter = 2 * radius + 1
sigma = diameter / 6
masked_gaussian = torch.exp(- distances / (2 * sigma * sigma))
torch.max(heatmap, masked_gaussian, out=heatmap)
return heatmap
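A small numeric sketch of the kernel above: with radius 2, sigma = (2*2+1)/6, and the gaussian of the squared voxel distances is max-merged into the heatmap:

```python
import torch

heatmap = torch.zeros(4)
distances = torch.tensor([0.0, 1.0, 4.0, 9.0])   # squared distances to the center
sigma = (2 * 2 + 1) / 6                          # radius = 2
torch.max(heatmap, torch.exp(-distances / (2 * sigma * sigma)), out=heatmap)
# heatmap ≈ [1.000, 0.487, 0.056, 0.002]
```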
@numba.jit(nopython=True)
def circle_nms(dets, thresh):
x1 = dets[:, 0]
......@@ -214,3 +233,116 @@ def decode_bbox_from_heatmap(heatmap, rot_cos, rot_sin, center, center_z, dim,
'pred_labels': cur_labels
})
return ret_pred_dicts
def _topk_1d(scores, batch_size, batch_idx, obj, K=40, nuscenes=False):
# scores: (N, num_classes)
topk_score_list = []
topk_inds_list = []
topk_classes_list = []
for bs_idx in range(batch_size):
batch_inds = batch_idx==bs_idx
if obj.shape[-1] == 1 and not nuscenes:
score = scores[batch_inds].permute(1, 0)
topk_scores, topk_inds = torch.topk(score, K)
topk_score, topk_ind = torch.topk(obj[topk_inds.view(-1)].squeeze(-1), K) #torch.topk(topk_scores.view(-1), K)
else:
score = obj[batch_inds].permute(1, 0)
topk_scores, topk_inds = torch.topk(score, min(K, score.shape[-1]))
topk_score, topk_ind = torch.topk(topk_scores.view(-1), min(K, topk_scores.view(-1).shape[-1]))
#topk_score, topk_ind = torch.topk(score.reshape(-1), K)
topk_classes = (topk_ind // K).int()
topk_inds = topk_inds.view(-1).gather(0, topk_ind)
#print('topk_inds', topk_inds)
if obj is not None and obj.shape[-1] == 1:
topk_score_list.append(obj[batch_inds][topk_inds])
else:
topk_score_list.append(topk_score)
topk_inds_list.append(topk_inds)
topk_classes_list.append(topk_classes)
topk_score = torch.stack(topk_score_list)
topk_inds = torch.stack(topk_inds_list)
topk_classes = torch.stack(topk_classes_list)
return topk_score, topk_inds, topk_classes
def gather_feat_idx(feats, inds, batch_size, batch_idx):
feats_list = []
dim = feats.size(-1)
_inds = inds.unsqueeze(-1).expand(inds.size(0), inds.size(1), dim)
for bs_idx in range(batch_size):
batch_inds = batch_idx==bs_idx
feat = feats[batch_inds]
feats_list.append(feat.gather(0, _inds[bs_idx]))
feats = torch.stack(feats_list)
return feats
def decode_bbox_from_voxels_nuscenes(batch_size, indices, obj, rot_cos, rot_sin,
center, center_z, dim, vel=None, iou=None, point_cloud_range=None, voxel_size=None, voxels_3d=None,
feature_map_stride=None, K=100, score_thresh=None, post_center_limit_range=None, add_features=None):
batch_idx = indices[:, 0]
spatial_indices = indices[:, 1:]
scores, inds, class_ids = _topk_1d(None, batch_size, batch_idx, obj, K=K, nuscenes=True)
center = gather_feat_idx(center, inds, batch_size, batch_idx)
rot_sin = gather_feat_idx(rot_sin, inds, batch_size, batch_idx)
rot_cos = gather_feat_idx(rot_cos, inds, batch_size, batch_idx)
center_z = gather_feat_idx(center_z, inds, batch_size, batch_idx)
dim = gather_feat_idx(dim, inds, batch_size, batch_idx)
spatial_indices = gather_feat_idx(spatial_indices, inds, batch_size, batch_idx)
if add_features is not None:
add_features = [gather_feat_idx(add_feature, inds, batch_size, batch_idx) for add_feature in add_features]
if not isinstance(feature_map_stride, int):
feature_map_stride = gather_feat_idx(feature_map_stride.unsqueeze(-1), inds, batch_size, batch_idx)
angle = torch.atan2(rot_sin, rot_cos)
xs = (spatial_indices[:, :, -1:] + center[:, :, 0:1]) * feature_map_stride * voxel_size[0] + point_cloud_range[0]
ys = (spatial_indices[:, :, -2:-1] + center[:, :, 1:2]) * feature_map_stride * voxel_size[1] + point_cloud_range[1]
#zs = (spatial_indices[:, :, 0:1]) * feature_map_stride * voxel_size[2] + point_cloud_range[2] + center_z
box_part_list = [xs, ys, center_z, dim, angle]
if vel is not None:
vel = gather_feat_idx(vel, inds, batch_size, batch_idx)
box_part_list.append(vel)
if iou is not None:
iou = gather_feat_idx(iou, inds, batch_size, batch_idx)
iou = torch.clamp(iou, min=0, max=1.)
final_box_preds = torch.cat((box_part_list), dim=-1)
final_scores = scores.view(batch_size, K)
final_class_ids = class_ids.view(batch_size, K)
if add_features is not None:
add_features = [add_feature.view(batch_size, K, add_feature.shape[-1]) for add_feature in add_features]
assert post_center_limit_range is not None
mask = (final_box_preds[..., :3] >= post_center_limit_range[:3]).all(2)
mask &= (final_box_preds[..., :3] <= post_center_limit_range[3:]).all(2)
if score_thresh is not None:
mask &= (final_scores > score_thresh)
ret_pred_dicts = []
for k in range(batch_size):
cur_mask = mask[k]
cur_boxes = final_box_preds[k, cur_mask]
cur_scores = final_scores[k, cur_mask]
cur_labels = final_class_ids[k, cur_mask]
cur_add_features = [add_feature[k, cur_mask] for add_feature in add_features] if add_features is not None else None
cur_iou = iou[k, cur_mask] if iou is not None else None
ret_pred_dicts.append({
'pred_boxes': cur_boxes,
'pred_scores': cur_scores,
'pred_labels': cur_labels,
'pred_ious': cur_iou,
'add_features': cur_add_features,
})
return ret_pred_dicts
......@@ -80,6 +80,42 @@ def boxes_iou3d_gpu(boxes_a, boxes_b):
return iou3d
def boxes_aligned_iou3d_gpu(boxes_a, boxes_b):
"""
Args:
boxes_a: (N, 7) [x, y, z, dx, dy, dz, heading]
boxes_b: (N, 7) [x, y, z, dx, dy, dz, heading]
Returns:
ans_iou: (N,)
"""
assert boxes_a.shape[0] == boxes_b.shape[0]
assert boxes_a.shape[1] == boxes_b.shape[1] == 7
# height overlap
boxes_a_height_max = (boxes_a[:, 2] + boxes_a[:, 5] / 2).view(-1, 1)
boxes_a_height_min = (boxes_a[:, 2] - boxes_a[:, 5] / 2).view(-1, 1)
boxes_b_height_max = (boxes_b[:, 2] + boxes_b[:, 5] / 2).view(-1, 1)
boxes_b_height_min = (boxes_b[:, 2] - boxes_b[:, 5] / 2).view(-1, 1)
# bev overlap
overlaps_bev = torch.cuda.FloatTensor(torch.Size((boxes_a.shape[0], 1))).zero_()  # (N, 1)
iou3d_nms_cuda.boxes_aligned_overlap_bev_gpu(boxes_a.contiguous(), boxes_b.contiguous(), overlaps_bev)
max_of_min = torch.max(boxes_a_height_min, boxes_b_height_min)
min_of_max = torch.min(boxes_a_height_max, boxes_b_height_max)
overlaps_h = torch.clamp(min_of_max - max_of_min, min=0)
# 3d iou
overlaps_3d = overlaps_bev * overlaps_h
vol_a = (boxes_a[:, 3] * boxes_a[:, 4] * boxes_a[:, 5]).view(-1, 1)
vol_b = (boxes_b[:, 3] * boxes_b[:, 4] * boxes_b[:, 5]).view(-1, 1)
iou3d = overlaps_3d / torch.clamp(vol_a + vol_b - overlaps_3d, min=1e-6)
return iou3d
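A hypothetical sanity check for the aligned pairwise IoU above (requires a CUDA build of the ops): identical boxes give IoU 1, and shifting one box up by its full height removes all vertical overlap:

```python
import torch

a = torch.tensor([[0., 0., 0., 4., 2., 1.5, 0.]]).cuda()
print(boxes_aligned_iou3d_gpu(a, a))    # tensor([[1.]])
b = a.clone()
b[:, 2] += 1.5                          # raise by dz -> no height overlap
print(boxes_aligned_iou3d_gpu(a, b))    # tensor([[0.]])
```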
def nms_gpu(boxes, scores, thresh, pre_maxsize=None, **kwargs):
"""
......
......@@ -250,3 +250,24 @@ int boxes_iou_bev_cpu(at::Tensor boxes_a_tensor, at::Tensor boxes_b_tensor, at::
}
return 1;
}
int boxes_aligned_iou_bev_cpu(at::Tensor boxes_a_tensor, at::Tensor boxes_b_tensor, at::Tensor ans_iou_tensor){
// params boxes_a_tensor: (N, 7) [x, y, z, dx, dy, dz, heading]
// params boxes_b_tensor: (N, 7) [x, y, z, dx, dy, dz, heading]
// params ans_iou_tensor: (N, 1)
CHECK_CONTIGUOUS(boxes_a_tensor);
CHECK_CONTIGUOUS(boxes_b_tensor);
int num_boxes = boxes_a_tensor.size(0);
int num_boxes_b = boxes_b_tensor.size(0);
assert(num_boxes == num_boxes_b);
const float *boxes_a = boxes_a_tensor.data<float>();
const float *boxes_b = boxes_b_tensor.data<float>();
float *ans_iou = ans_iou_tensor.data<float>();
for (int i = 0; i < num_boxes; i++){
ans_iou[i] = iou_bev(boxes_a + i * 7, boxes_b + i * 7);
}
return 1;
}