Unverified Commit 02ac3e17 authored by Shaoshuai Shi, committed by GitHub

Support multi-modal 3D detection on NuScenes #1339

Add support for multi-modal NuScenes Detection
parents ad9c25c0 fcfa0773
...@@ -10,6 +10,7 @@ It is also the official code release of [`[PointRCNN]`](https://arxiv.org/abs/18
* `OpenPCDet` has been updated to `v0.6.0` (Sep. 2022).
* The code of PV-RCNN++ has been supported.
* The code of MPPNet has been supported.
* The multi-modal 3D detection approaches on Nuscenes have been supported.
## Overview
- [Changelog](#changelog)
...@@ -22,10 +23,15 @@ It is also the official code release of [`[PointRCNN]`](https://arxiv.org/abs/18
## Changelog
[2023-05-13] **NEW:** Added support for the multi-modal 3D object detection models on Nuscenes dataset.
* Support multi-modal Nuscenes detection (See the [GETTING_STARTED.md](docs/GETTING_STARTED.md) to process data).
* Support [TransFusion-Lidar](https://arxiv.org/abs/2203.11496) head, which achieves 69.43% NDS on the Nuscenes validation dataset.
* Support [`BEVFusion`](https://arxiv.org/abs/2205.13542), which fuses multi-modal information on BEV space and reaches 70.98% NDS on Nuscenes validation dataset. (see the [guideline](docs/guidelines_of_approaches/bevfusion.md) on how to train/test with BEVFusion).
[2023-04-02] Added support for [`VoxelNeXt`](https://arxiv.org/abs/2303.11301) on Nuscenes, Waymo, and Argoverse2 datasets. It is a fully sparse 3D object detection network, which is a clean sparse CNNs network and predicts 3D objects directly upon voxels.
[2022-09-02] **NEW:** Update `OpenPCDet` to v0.6.0:
* Official code release of [`MPPNet`](https://arxiv.org/abs/2205.05979) for temporal 3D object detection, which supports long-term multi-frame 3D object detection and ranks 1st place on the [3D detection leaderboard](https://waymo.com/open/challenges/2020/3d-detection) of Waymo Open Dataset on Sept. 2nd, 2022. On the validation set, MPPNet achieves 74.96%, 75.06% and 74.52% for the vehicle, pedestrian and cyclist classes in terms of mAPH@Level_2. (See the [guideline](docs/guidelines_of_approaches/mppnet.md) on how to train/test with MPPNet.)
* Support multi-frame training/testing on Waymo Open Dataset (see the [change log](docs/changelog.md) for more details on how to process data).
* Support saving training details (e.g., loss, iter, epoch) to file (the previous tqdm progress bar is still supported by using `--use_tqdm_to_record`). Please run `pip install gpustat` if you also want to log GPU-related information.
* Support saving the latest model every 5 minutes, so you can restore model training from the latest status instead of the previous epoch.
...@@ -38,10 +44,10 @@ It is also the official code release of [`[PointRCNN]`](https://arxiv.org/abs/18
[2022-02-07] Added support for Centerpoint models on Nuscenes Dataset.
[2022-01-14] Added support for dynamic pillar voxelization, following the implementation proposed in [`H^23D R-CNN`](https://arxiv.org/abs/2107.14391) with unique operation and [`torch_scatter`](https://github.com/rusty1s/pytorch_scatter) package.
[2022-01-05] **NEW:** Update `OpenPCDet` to v0.5.2:
* The code of [`PV-RCNN++`](https://arxiv.org/abs/2102.00463) has been released to this repo, with higher performance, faster training/inference speed and less memory consumption than PV-RCNN.
* Add performance of several models trained with full training set of [Waymo Open Dataset](#waymo-open-dataset-baselines).
* Support Lyft dataset, see the pull request [here](https://github.com/open-mmlab/OpenPCDet/pull/720).
...@@ -199,7 +205,7 @@ We could not provide the above pretrained models due to [Waymo Dataset License A
but you could easily achieve similar performance by training with the default configs.
### NuScenes 3D Object Detection Baselines
All models are trained with 8 GPUs and are available for download. For training BEVFusion, please refer to the [guideline](docs/guidelines_of_approaches/bevfusion.md).
| | mATE | mASE | mAOE | mAVE | mAAE | mAP | NDS | download |
|----------------------------------------------------------------------------------------------------|-------:|:------:|:------:|:-----:|:-----:|:-----:|:------:|:--------------------------------------------------------------------------------------------------:|
...@@ -209,7 +215,10 @@ All models are trained with 8 GTX 1080Ti GPUs and are available for download.
| [CenterPoint (voxel_size=0.1)](tools/cfgs/nuscenes_models/cbgs_voxel01_res3d_centerpoint.yaml) | 30.11 | 25.55 | 38.28 | 21.94 | 18.87 | 56.03 | 64.54 | [model-34M](https://drive.google.com/file/d/1Cz-J1c3dw7JAWc25KRG1XQj8yCaOlexQ/view?usp=sharing) |
| [CenterPoint (voxel_size=0.075)](tools/cfgs/nuscenes_models/cbgs_voxel0075_res3d_centerpoint.yaml) | 28.80 | 25.43 | 37.27 | 21.55 | 18.24 | 59.22 | 66.48 | [model-34M](https://drive.google.com/file/d/1XOHAWm1MPkCKr1gqmc3TWi5AYZgPsgxU/view?usp=sharing) |
| [VoxelNeXt (voxel_size=0.075)](tools/cfgs/nuscenes_models/cbgs_voxel0075_voxelnext.yaml) | 30.11 | 25.23 | 40.57 | 21.69 | 18.56 | 60.53 | 66.65 | [model-31M](https://drive.google.com/file/d/1IV7e7G9X-61KXSjMGtQo579pzDNbhwvf/view?usp=share_link) |
| [TransFusion-L*](tools/cfgs/nuscenes_models/transfusion_lidar.yaml) | 27.96 | 25.37 | 29.35 | 27.31 | 18.55 | 64.58 | 69.43 | [model-32M](https://drive.google.com/file/d/1cuZ2qdDnxSwTCsiXWwbqCGF-uoazTXbz/view?usp=share_link) |
| [BEVFusion](tools/cfgs/nuscenes_models/bevfusion.yaml) | 28.03 | 25.43 | 30.19 | 26.76 | 18.48 | 67.75 | 70.98 | [model-157M](https://drive.google.com/file/d/1X50b-8immqlqD8VPAUkSKI0Ls-4k37g9/view?usp=share_link) |
*: Use the fade strategy, which disables data augmentations in the last several epochs during training.
### ONCE 3D Object Detection Baselines
All models are trained with 8 GPUs.
......
...@@ -53,9 +53,16 @@ pip install nuscenes-devkit==1.0.5
* Generate the data infos by running the following command (it may take several hours):
```python
# for lidar-only setting
python -m pcdet.datasets.nuscenes.nuscenes_dataset --func create_nuscenes_infos \
--cfg_file tools/cfgs/dataset_configs/nuscenes_dataset.yaml \
--version v1.0-trainval
# for multi-modal setting
python -m pcdet.datasets.nuscenes.nuscenes_dataset --func create_nuscenes_infos \
--cfg_file tools/cfgs/dataset_configs/nuscenes_dataset.yaml \
--version v1.0-trainval \
--with_cam
```
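After the multi-modal infos are generated, it can be useful to sanity-check that the camera entries were written. A minimal sketch (the info path below assumes the default `MAX_SWEEPS=10` setting and output location; adjust it to your setup):
```python
import pickle
from pathlib import Path

# assumed default output location; change this if your infos were saved elsewhere
info_path = Path('data/nuscenes/v1.0-trainval/nuscenes_infos_10sweeps_train.pkl')
with open(info_path, 'rb') as f:
    infos = pickle.load(f)

print(len(infos), 'samples')
# Infos created with --with_cam should carry a per-sample 'cams' dict with one
# entry per camera (CAM_FRONT, CAM_BACK, ...), each holding paths and poses.
print('cams' in infos[0], list(infos[0].get('cams', {}).keys()))
```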
### Waymo Open Dataset
......
## Installation
Please refer to [INSTALL.md](../INSTALL.md) for the installation of `OpenPCDet`.
* We recommend checking your pillow version and using pillow==8.4.0 to avoid a bug in BEV pooling.
## Data Preparation
Please refer to [GETTING_STARTED.md](../GETTING_STARTED.md) to process the multi-modal Nuscenes Dataset.
## Training
1. Train the lidar branch for BEVFusion:
```shell
bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file cfgs/nuscenes_models/transfusion_lidar.yaml
```
The checkpoint will be saved in ../output/nuscenes_models/transfusion_lidar/default/ckpt, or you can download the pretrained checkpoint directly from [here](https://drive.google.com/file/d/1cuZ2qdDnxSwTCsiXWwbqCGF-uoazTXbz/view?usp=share_link).
2. To train BEVFusion, you need to download the pretrained parameters for the image backbone from [here](https://drive.google.com/file/d/1v74WCt4_5ubjO7PciA5T0xhQc9bz_jZu/view?usp=share_link), and specify the path in the [config](../../tools/cfgs/nuscenes_models/bevfusion.yaml#L88). Then run the following command:
```shell
bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file cfgs/nuscenes_models/bevfusion.yaml \
--pretrained_model path_to_pretrained_lidar_branch_ckpt
```
## Evaluation
* Test with a pretrained model:
```shell
bash scripts/dist_test.sh ${NUM_GPUS} --cfg_file cfgs/nuscenes_models/bevfusion.yaml \
--ckpt ../output/cfgs/nuscenes_models/bevfusion/default/ckpt/checkpoint_epoch_6.pth
```
## Performance
All models are trained with spconv 1.0, but you can directly load them for testing regardless of the spconv version.
| | mATE | mASE | mAOE | mAVE | mAAE | mAP | NDS | download |
|----------------------------------------------------------------------------------------------------|-------:|:------:|:------:|:-----:|:-----:|:-----:|:------:|:--------------------------------------------------------------------------------------------------:|
| [TransFusion-L](../../tools/cfgs/nuscenes_models/transfusion_lidar.yaml) | 27.96 | 25.37 | 29.35 | 27.31 | 18.55 | 64.58 | 69.43 | [model-32M](https://drive.google.com/file/d/1cuZ2qdDnxSwTCsiXWwbqCGF-uoazTXbz/view?usp=share_link) |
| [BEVFusion](../../tools/cfgs/nuscenes_models/bevfusion.yaml) | 28.03 | 25.43 | 30.19 | 26.76 | 18.48 | 67.75 | 70.98 | [model-157M](https://drive.google.com/file/d/1X50b-8immqlqD8VPAUkSKI0Ls-4k37g9/view?usp=share_link) |
from functools import partial
import numpy as np
from PIL import Image
from ...utils import common_utils
from . import augmentor_utils, database_sampler
...@@ -23,6 +24,18 @@ class DataAugmentor(object):
cur_augmentor = getattr(self, cur_cfg.NAME)(config=cur_cfg)
self.data_augmentor_queue.append(cur_augmentor)
def disable_augmentation(self, augmentor_configs):
self.data_augmentor_queue = []
aug_config_list = augmentor_configs if isinstance(augmentor_configs, list) \
else augmentor_configs.AUG_CONFIG_LIST
for cur_cfg in aug_config_list:
if not isinstance(augmentor_configs, list):
if cur_cfg.NAME in augmentor_configs.DISABLE_AUG_LIST:
continue
cur_augmentor = getattr(self, cur_cfg.NAME)(config=cur_cfg)
self.data_augmentor_queue.append(cur_augmentor)
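# Illustrative sketch of how disable_augmentation supports the "fade strategy"
# mentioned in the README: late in training, rebuild the augmentor queue while
# skipping everything listed in DISABLE_AUG_LIST. The config below is made up;
# the real values come from the model yaml, and the call is normally issued by
# the training loop rather than user code.
from easydict import EasyDict

fade_cfg = EasyDict({
    'DISABLE_AUG_LIST': ['gt_sampling', 'random_world_flip'],
    'AUG_CONFIG_LIST': [
        EasyDict({'NAME': 'gt_sampling'}),
        EasyDict({'NAME': 'random_world_flip', 'ALONG_AXIS_LIST': ['x', 'y']}),
        EasyDict({'NAME': 'random_world_rotation', 'WORLD_ROT_ANGLE': [-0.785, 0.785]}),
    ],
})
# e.g. once cur_epoch reaches total_epochs - num_fade_epochs (hypothetical names):
#     train_set.data_augmentor.disable_augmentation(fade_cfg)
# afterwards only 'random_world_rotation' remains in data_augmentor_queue.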
def gt_sampling(self, config=None):
db_sampler = database_sampler.DataBaseSampler(
root_path=self.root_path,
...@@ -139,6 +152,7 @@ class DataAugmentor(object):
data_dict['gt_boxes'] = gt_boxes
data_dict['points'] = points
data_dict['noise_translate'] = noise_translate
return data_dict
def random_local_translation(self, data_dict=None, config=None):
...@@ -251,6 +265,28 @@ class DataAugmentor(object):
data_dict['points'] = points
return data_dict
def imgaug(self, data_dict=None, config=None):
if data_dict is None:
return partial(self.imgaug, config=config)
imgs = data_dict["camera_imgs"]
img_process_infos = data_dict['img_process_infos']
new_imgs = []
for img, img_process_info in zip(imgs, img_process_infos):
flip = False
if config.RAND_FLIP and np.random.choice([0, 1]):
flip = True
rotate = np.random.uniform(*config.ROT_LIM)
# aug images
if flip:
img = img.transpose(method=Image.FLIP_LEFT_RIGHT)
img = img.rotate(rotate)
img_process_info[2] = flip
img_process_info[3] = rotate
new_imgs.append(img)
data_dict["camera_imgs"] = new_imgs
return data_dict
def forward(self, data_dict):
"""
Args:
......
...@@ -2,6 +2,7 @@ from collections import defaultdict
from pathlib import Path
import numpy as np
import torch
import torch.utils.data as torch_data
from ..utils import common_utils
...@@ -130,6 +131,30 @@ class DatasetTemplate(torch_data.Dataset):
"""
raise NotImplementedError
def set_lidar_aug_matrix(self, data_dict):
"""
Get the lidar augmentation matrix (4 x 4), which is used to recover the original point coordinates.
"""
lidar_aug_matrix = np.eye(4)
if 'flip_y' in data_dict.keys():
flip_x = data_dict['flip_x']
flip_y = data_dict['flip_y']
if flip_x:
lidar_aug_matrix[:3,:3] = np.array([[1, 0, 0], [0, -1, 0], [0, 0, 1]]) @ lidar_aug_matrix[:3,:3]
if flip_y:
lidar_aug_matrix[:3,:3] = np.array([[-1, 0, 0], [0, 1, 0], [0, 0, 1]]) @ lidar_aug_matrix[:3,:3]
if 'noise_rot' in data_dict.keys():
noise_rot = data_dict['noise_rot']
lidar_aug_matrix[:3,:3] = common_utils.angle2matrix(torch.tensor(noise_rot)) @ lidar_aug_matrix[:3,:3]
if 'noise_scale' in data_dict.keys():
noise_scale = data_dict['noise_scale']
lidar_aug_matrix[:3,:3] *= noise_scale
if 'noise_translate' in data_dict.keys():
noise_translate = data_dict['noise_translate']
lidar_aug_matrix[:3,3:4] = noise_translate.T
data_dict['lidar_aug_matrix'] = lidar_aug_matrix
return data_dict
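# Standalone numeric sketch of what set_lidar_aug_matrix encodes: the applied
# point-cloud augmentations collapsed into one 4x4 matrix, whose inverse maps
# augmented points back to the original frame. Noise values below are made up.
import numpy as np

noise_rot, noise_scale = 0.2, 1.05
noise_translate = np.array([[0.5, -0.3, 0.1]])

M = np.eye(4)
M[:3, :3] = np.array([[1, 0, 0], [0, -1, 0], [0, 0, 1]]) @ M[:3, :3]  # flip_x (y -> -y)
c, s = np.cos(noise_rot), np.sin(noise_rot)
M[:3, :3] = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]]) @ M[:3, :3]  # world rotation
M[:3, :3] *= noise_scale                                              # world scaling
M[:3, 3:4] = noise_translate.T                                        # world translation

p_orig = np.array([10.0, 4.0, -1.5])
p_aug = M[:3, :3] @ p_orig + M[:3, 3]                # original -> augmented
p_back = np.linalg.inv(M) @ np.append(p_aug, 1.0)    # augmented -> original
assert np.allclose(p_back[:3], p_orig)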
def prepare_data(self, data_dict):
"""
Args:
...@@ -165,6 +190,7 @@ class DatasetTemplate(torch_data.Dataset):
)
if 'calib' in data_dict:
data_dict['calib'] = calib
data_dict = self.set_lidar_aug_matrix(data_dict)
if data_dict.get('gt_boxes', None) is not None:
selected = common_utils.keep_arrays_by_name(data_dict['gt_names'], self.class_names)
data_dict['gt_boxes'] = data_dict['gt_boxes'][selected]
...@@ -287,6 +313,8 @@ class DatasetTemplate(torch_data.Dataset):
constant_values=pad_value)
points.append(points_pad)
ret[key] = np.stack(points, axis=0)
elif key in ['camera_imgs']:
ret[key] = torch.stack([torch.stack(imgs,dim=0) for imgs in val],dim=0)
else:
ret[key] = np.stack(val, axis=0)
except:
......
...@@ -8,6 +8,8 @@ from tqdm import tqdm
from ...ops.roiaware_pool3d import roiaware_pool3d_utils
from ...utils import common_utils
from ..dataset import DatasetTemplate
from pyquaternion import Quaternion
from PIL import Image
class NuScenesDataset(DatasetTemplate):
...@@ -17,6 +19,13 @@ class NuScenesDataset(DatasetTemplate):
dataset_cfg=dataset_cfg, class_names=class_names, training=training, root_path=root_path, logger=logger
)
self.infos = []
self.camera_config = self.dataset_cfg.get('CAMERA_CONFIG', None)
if self.camera_config is not None:
self.use_camera = self.camera_config.get('USE_CAMERA', True)
self.camera_image_config = self.camera_config.IMAGE
else:
self.use_camera = False
self.include_nuscenes_data(self.mode)
if self.training and self.dataset_cfg.get('BALANCED_RESAMPLING', False):
self.infos = self.balanced_infos_resampling(self.infos)
...@@ -108,6 +117,98 @@ class NuScenesDataset(DatasetTemplate):
points = np.concatenate((points, times), axis=1)
return points
def crop_image(self, input_dict):
W, H = input_dict["ori_shape"]
imgs = input_dict["camera_imgs"]
img_process_infos = []
crop_images = []
for img in imgs:
if self.training:
fH, fW = self.camera_image_config.FINAL_DIM
resize_lim = self.camera_image_config.RESIZE_LIM_TRAIN
resize = np.random.uniform(*resize_lim)
resize_dims = (int(W * resize), int(H * resize))
newW, newH = resize_dims
crop_h = newH - fH
crop_w = int(np.random.uniform(0, max(0, newW - fW)))
crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
else:
fH, fW = self.camera_image_config.FINAL_DIM
resize_lim = self.camera_image_config.RESIZE_LIM_TEST
resize = np.mean(resize_lim)
resize_dims = (int(W * resize), int(H * resize))
newW, newH = resize_dims
crop_h = newH - fH
crop_w = int(max(0, newW - fW) / 2)
crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
# resize and crop image
img = img.resize(resize_dims)
img = img.crop(crop)
crop_images.append(img)
img_process_infos.append([resize, crop, False, 0])
input_dict['img_process_infos'] = img_process_infos
input_dict['camera_imgs'] = crop_images
return input_dict
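# Worked example of the eval-time branch above. The config numbers are
# assumptions for illustration (FINAL_DIM=[256, 704], RESIZE_LIM_TEST=[0.5, 0.5]),
# not values read from the committed yaml.
import numpy as np

W, H = 1600, 900                                  # native NuScenes image size
fH, fW = 256, 704
resize = float(np.mean([0.5, 0.5]))               # eval uses the midpoint of RESIZE_LIM_TEST
newW, newH = int(W * resize), int(H * resize)     # 800, 450
crop_h = newH - fH                                # 194 -> drop the top of the image
crop_w = int(max(0, newW - fW) / 2)               # 48  -> center the horizontal crop
crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
print(crop)                                       # (48, 194, 752, 450): a 704x256 crop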
def load_camera_info(self, input_dict, info):
input_dict["image_paths"] = []
input_dict["lidar2camera"] = []
input_dict["lidar2image"] = []
input_dict["camera2ego"] = []
input_dict["camera_intrinsics"] = []
input_dict["camera2lidar"] = []
for _, camera_info in info["cams"].items():
input_dict["image_paths"].append(camera_info["data_path"])
# lidar to camera transform
lidar2camera_r = np.linalg.inv(camera_info["sensor2lidar_rotation"])
lidar2camera_t = (
camera_info["sensor2lidar_translation"] @ lidar2camera_r.T
)
lidar2camera_rt = np.eye(4).astype(np.float32)
lidar2camera_rt[:3, :3] = lidar2camera_r.T
lidar2camera_rt[3, :3] = -lidar2camera_t
input_dict["lidar2camera"].append(lidar2camera_rt.T)
# camera intrinsics
camera_intrinsics = np.eye(4).astype(np.float32)
camera_intrinsics[:3, :3] = camera_info["camera_intrinsics"]
input_dict["camera_intrinsics"].append(camera_intrinsics)
# lidar to image transform
lidar2image = camera_intrinsics @ lidar2camera_rt.T
input_dict["lidar2image"].append(lidar2image)
# camera to ego transform
camera2ego = np.eye(4).astype(np.float32)
camera2ego[:3, :3] = Quaternion(
camera_info["sensor2ego_rotation"]
).rotation_matrix
camera2ego[:3, 3] = camera_info["sensor2ego_translation"]
input_dict["camera2ego"].append(camera2ego)
# camera to lidar transform
camera2lidar = np.eye(4).astype(np.float32)
camera2lidar[:3, :3] = camera_info["sensor2lidar_rotation"]
camera2lidar[:3, 3] = camera_info["sensor2lidar_translation"]
input_dict["camera2lidar"].append(camera2lidar)
# read image
filename = input_dict["image_paths"]
images = []
for name in filename:
images.append(Image.open(str(self.root_path / name)))
input_dict["camera_imgs"] = images
input_dict["ori_shape"] = images[0].size
# resize and crop image
input_dict = self.crop_image(input_dict)
return input_dict
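# Projection sketch: the 4x4 'lidar2image' assembled above maps a homogeneous
# lidar-frame point to pixel coordinates via a perspective divide. The matrices
# below are synthetic (identity extrinsics, rough NuScenes-like intrinsics).
import numpy as np

lidar2camera = np.eye(4, dtype=np.float32)
camera_intrinsics = np.eye(4, dtype=np.float32)
camera_intrinsics[:3, :3] = np.array([[1266.0, 0.0, 816.0],
                                      [0.0, 1266.0, 491.0],
                                      [0.0, 0.0, 1.0]])
lidar2image = camera_intrinsics @ lidar2camera

p_lidar = np.array([2.0, 1.0, 20.0, 1.0])          # homogeneous point, depth along camera z
uvd = lidar2image @ p_lidar
u, v = uvd[0] / uvd[2], uvd[1] / uvd[2]            # divide by depth
print(u, v)                                        # pixel position in the full-resolution image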
def __len__(self):
if self._merge_all_iters_to_one_epoch:
return len(self.infos) * self.total_epochs
...@@ -137,6 +238,8 @@ class NuScenesDataset(DatasetTemplate):
'gt_names': info['gt_names'] if mask is None else info['gt_names'][mask],
'gt_boxes': info['gt_boxes'] if mask is None else info['gt_boxes'][mask]
})
if self.use_camera:
input_dict = self.load_camera_info(input_dict, info)
data_dict = self.prepare_data(data_dict=input_dict)
...@@ -251,7 +354,7 @@ class NuScenesDataset(DatasetTemplate):
pickle.dump(all_db_infos, f)
def create_nuscenes_info(version, data_path, save_path, max_sweeps=10, with_cam=False):
from nuscenes.nuscenes import NuScenes
from nuscenes.utils import splits
from . import nuscenes_utils
...@@ -283,7 +386,7 @@ def create_nuscenes_info(version, data_path, save_path, max_sweeps=10):
train_nusc_infos, val_nusc_infos = nuscenes_utils.fill_trainval_infos(
data_path=data_path, nusc=nusc, train_scenes=train_scenes, val_scenes=val_scenes,
test='test' in version, max_sweeps=max_sweeps, with_cam=with_cam
)
if version == 'v1.0-test':
...@@ -308,6 +411,7 @@ if __name__ == '__main__':
parser.add_argument('--cfg_file', type=str, default=None, help='specify the config of dataset')
parser.add_argument('--func', type=str, default='create_nuscenes_infos', help='')
parser.add_argument('--version', type=str, default='v1.0-trainval', help='')
parser.add_argument('--with_cam', action='store_true', default=False, help='use camera or not')
args = parser.parse_args()
if args.func == 'create_nuscenes_infos':
...@@ -319,6 +423,7 @@ if __name__ == '__main__':
data_path=ROOT_DIR / 'data' / 'nuscenes',
save_path=ROOT_DIR / 'data' / 'nuscenes',
max_sweeps=dataset_cfg.MAX_SWEEPS,
with_cam=args.with_cam
)
nuscenes_dataset = NuScenesDataset(
......
...@@ -247,9 +247,69 @@ def quaternion_yaw(q: Quaternion) -> float:
yaw = np.arctan2(v[1], v[0])
return yaw
def obtain_sensor2top(
nusc, sensor_token, l2e_t, l2e_r_mat, e2g_t, e2g_r_mat, sensor_type="lidar"
):
"""Obtain the info with RT matric from general sensor to Top LiDAR.
def fill_trainval_infos(data_path, nusc, train_scenes, val_scenes, test=False, max_sweeps=10): Args:
nusc (class): Dataset class in the nuScenes dataset.
sensor_token (str): Sample data token corresponding to the
specific sensor type.
l2e_t (np.ndarray): Translation from lidar to ego in shape (1, 3).
l2e_r_mat (np.ndarray): Rotation matrix from lidar to ego
in shape (3, 3).
e2g_t (np.ndarray): Translation from ego to global in shape (1, 3).
e2g_r_mat (np.ndarray): Rotation matrix from ego to global
in shape (3, 3).
sensor_type (str): Sensor to calibrate. Default: 'lidar'.
Returns:
sweep (dict): Sweep information after transformation.
"""
sd_rec = nusc.get("sample_data", sensor_token)
cs_record = nusc.get("calibrated_sensor", sd_rec["calibrated_sensor_token"])
pose_record = nusc.get("ego_pose", sd_rec["ego_pose_token"])
data_path = str(nusc.get_sample_data_path(sd_rec["token"]))
# if os.getcwd() in data_path: # path from lyftdataset is absolute path
# data_path = data_path.split(f"{os.getcwd()}/")[-1] # relative path
sweep = {
"data_path": data_path,
"type": sensor_type,
"sample_data_token": sd_rec["token"],
"sensor2ego_translation": cs_record["translation"],
"sensor2ego_rotation": cs_record["rotation"],
"ego2global_translation": pose_record["translation"],
"ego2global_rotation": pose_record["rotation"],
"timestamp": sd_rec["timestamp"],
}
l2e_r_s = sweep["sensor2ego_rotation"]
l2e_t_s = sweep["sensor2ego_translation"]
e2g_r_s = sweep["ego2global_rotation"]
e2g_t_s = sweep["ego2global_translation"]
# obtain the RT from sensor to Top LiDAR
# sweep->ego->global->ego'->lidar
l2e_r_s_mat = Quaternion(l2e_r_s).rotation_matrix
e2g_r_s_mat = Quaternion(e2g_r_s).rotation_matrix
R = (l2e_r_s_mat.T @ e2g_r_s_mat.T) @ (
np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T
)
T = (l2e_t_s @ e2g_r_s_mat.T + e2g_t_s) @ (
np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T
)
T -= (
e2g_t @ (np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T)
+ l2e_t @ np.linalg.inv(l2e_r_mat).T
).squeeze(0)
sweep["sensor2lidar_rotation"] = R.T # points @ R.T + T
sweep["sensor2lidar_translation"] = T
return sweep
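# Numerical sanity check of the closed-form R, T above: composing the chain
# sensor -> ego' -> global -> ego -> lidar step by step gives the same result
# as p_sensor @ R + T. All poses below are synthetic.
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

l2e_r_mat, l2e_t = rot_z(0.10), np.array([0.9, 0.0, 1.8])       # ref lidar -> ref ego
e2g_r_mat, e2g_t = rot_z(0.70), np.array([100.0, 50.0, 0.0])    # ref ego -> global
l2e_r_s_mat, l2e_t_s = rot_z(-0.30), np.array([1.5, 0.2, 1.6])  # sensor -> its own ego
e2g_r_s_mat, e2g_t_s = rot_z(0.72), np.array([101.0, 50.5, 0.0])

inv = np.linalg.inv
R = (l2e_r_s_mat.T @ e2g_r_s_mat.T) @ (inv(e2g_r_mat).T @ inv(l2e_r_mat).T)
T = (l2e_t_s @ e2g_r_s_mat.T + e2g_t_s) @ (inv(e2g_r_mat).T @ inv(l2e_r_mat).T)
T -= e2g_t @ (inv(e2g_r_mat).T @ inv(l2e_r_mat).T) + l2e_t @ inv(l2e_r_mat).T

p_s = np.array([3.0, -2.0, 0.5])
p_g = (p_s @ l2e_r_s_mat.T + l2e_t_s) @ e2g_r_s_mat.T + e2g_t_s      # sensor -> global
p_l = ((p_g - e2g_t) @ inv(e2g_r_mat).T - l2e_t) @ inv(l2e_r_mat).T  # global -> ref lidar
assert np.allclose(p_s @ R + T, p_l)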
def fill_trainval_infos(data_path, nusc, train_scenes, val_scenes, test=False, max_sweeps=10, with_cam=False):
train_nusc_infos = []
val_nusc_infos = []
progress_bar = tqdm.tqdm(total=len(nusc.sample), desc='create_info', dynamic_ncols=True)
...@@ -291,6 +351,34 @@ def fill_trainval_infos(data_path, nusc, train_scenes, val_scenes, test=False, m
'car_from_global': car_from_global,
'timestamp': ref_time,
}
if with_cam:
info['cams'] = dict()
l2e_r = ref_cs_rec["rotation"]
l2e_t = ref_cs_rec["translation"],  # trailing comma wraps the list in a tuple so it broadcasts as shape (1, 3) inside obtain_sensor2top (needed for the .squeeze(0) there)
e2g_r = ref_pose_rec["rotation"]
e2g_t = ref_pose_rec["translation"]
l2e_r_mat = Quaternion(l2e_r).rotation_matrix
e2g_r_mat = Quaternion(e2g_r).rotation_matrix
# obtain information for the 6 camera images of each frame
camera_types = [
"CAM_FRONT",
"CAM_FRONT_RIGHT",
"CAM_FRONT_LEFT",
"CAM_BACK",
"CAM_BACK_LEFT",
"CAM_BACK_RIGHT",
]
for cam in camera_types:
cam_token = sample["data"][cam]
cam_path, _, camera_intrinsics = nusc.get_sample_data(cam_token)
cam_info = obtain_sensor2top(
nusc, cam_token, l2e_t, l2e_r_mat, e2g_t, e2g_r_mat, cam
)
cam_info['data_path'] = Path(cam_info['data_path']).relative_to(data_path).__str__()
cam_info.update(camera_intrinsics=camera_intrinsics)
info["cams"].update({cam: cam_info})
sample_data_token = sample['data'][chan]
curr_sd_rec = nusc.get('sample_data', sample_data_token)
......
...@@ -2,7 +2,8 @@ from functools import partial
import numpy as np
from skimage import transform
import torch
import torchvision
from ...utils import box_utils, common_utils
tv = None
...@@ -228,6 +229,56 @@ class DataProcessor(object):
factors=(self.depth_downsample_factor, self.depth_downsample_factor)
)
return data_dict
def image_normalize(self, data_dict=None, config=None):
if data_dict is None:
return partial(self.image_normalize, config=config)
mean = config.mean
std = config.std
compose = torchvision.transforms.Compose(
[
torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize(mean=mean, std=std),
]
)
data_dict["camera_imgs"] = [compose(img) for img in data_dict["camera_imgs"]]
return data_dict
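# Minimal usage sketch of the same ToTensor + Normalize pipeline on a dummy
# image. The mean/std values are common ImageNet statistics and are only an
# assumption about what the config provides.
import torchvision
from PIL import Image

compose = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225]),
])
dummy = Image.new('RGB', (704, 256))          # (width, height)
print(compose(dummy).shape)                   # torch.Size([3, 256, 704])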
def image_calibrate(self,data_dict=None, config=None):
if data_dict is None:
return partial(self.image_calibrate, config=config)
img_process_infos = data_dict['img_process_infos']
transforms = []
for img_process_info in img_process_infos:
resize, crop, flip, rotate = img_process_info
rotation = torch.eye(2)
translation = torch.zeros(2)
# post-homography transformation
rotation *= resize
translation -= torch.Tensor(crop[:2])
if flip:
A = torch.Tensor([[-1, 0], [0, 1]])
b = torch.Tensor([crop[2] - crop[0], 0])
rotation = A.matmul(rotation)
translation = A.matmul(translation) + b
theta = rotate / 180 * np.pi
A = torch.Tensor(
[
[np.cos(theta), np.sin(theta)],
[-np.sin(theta), np.cos(theta)],
]
)
b = torch.Tensor([crop[2] - crop[0], crop[3] - crop[1]]) / 2
b = A.matmul(-b) + b
rotation = A.matmul(rotation)
translation = A.matmul(translation) + b
transform = torch.eye(4)
transform[:2, :2] = rotation
transform[:2, 3] = translation
transforms.append(transform.numpy())
data_dict["img_aug_matrix"] = transforms
return data_dict
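# Sketch of what 'img_aug_matrix' stores: a 4x4 mapping original-image pixel
# coordinates to coordinates in the resized/cropped image (flip and rotation
# disabled for clarity; the resize/crop values are illustrative).
import numpy as np
import torch

resize, crop = 0.5, (48, 194, 752, 450)
transform = torch.eye(4)
transform[:2, :2] = torch.eye(2) * resize
transform[:2, 3] = -torch.Tensor(crop[:2])
M = transform.numpy()

u, v = 800.0, 450.0                                # pixel in the original 1600x900 image
uv_aug = M[:2, :2] @ np.array([u, v]) + M[:2, 3]
print(uv_aug)                                      # [352. 31.] = [u*0.5 - 48, v*0.5 - 194]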
def forward(self, data_dict):
"""
......
...@@ -22,9 +22,11 @@ def build_network(model_cfg, num_class, dataset):
def load_data_to_gpu(batch_dict):
for key, val in batch_dict.items():
if key == 'camera_imgs':
batch_dict[key] = val.cuda()
elif not isinstance(val, np.ndarray):
continue
elif key in ['frame_id', 'metadata', 'calib', 'image_paths','ori_shape','img_process_infos']:
continue
elif key in ['images']:
batch_dict[key] = kornia.image_to_tensor(val).float().cuda().contiguous()
......
...@@ -46,7 +46,7 @@ class BaseBEVBackbone(nn.Module):
self.blocks.append(nn.Sequential(*cur_layers))
if len(upsample_strides) > 0:
stride = upsample_strides[idx]
if stride > 1 or (stride == 1 and not self.model_cfg.get('USE_CONV_FOR_NO_STRIDE', False)):
self.deblocks.append(nn.Sequential(
nn.ConvTranspose2d(
num_filters[idx], num_upsample_filters[idx],
......
from .convfuser import ConvFuser
__all__ = {
'ConvFuser':ConvFuser
}
import torch
from torch import nn
class ConvFuser(nn.Module):
def __init__(self,model_cfg) -> None:
super().__init__()
self.model_cfg = model_cfg
in_channel = self.model_cfg.IN_CHANNEL
out_channel = self.model_cfg.OUT_CHANNEL
self.conv = nn.Sequential(
nn.Conv2d(in_channel, out_channel, 3, padding=1, bias=False),
nn.BatchNorm2d(out_channel),
nn.ReLU(True)
)
def forward(self,batch_dict):
"""
Args:
batch_dict:
spatial_features_img (tensor): Bev features from image modality
spatial_features (tensor): Bev features from lidar modality
Returns:
batch_dict:
spatial_features (tensor): Bev features after multi-modal fusion
"""
img_bev = batch_dict['spatial_features_img']
lidar_bev = batch_dict['spatial_features']
cat_bev = torch.cat([img_bev,lidar_bev],dim=1)
mm_bev = self.conv(cat_bev)
batch_dict['spatial_features'] = mm_bev
return batch_dict
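# Minimal usage sketch for the fuser above with dummy BEV feature maps; the
# channel sizes and spatial resolution are illustrative, the real ones come
# from the BEVFusion model config.
import torch
from easydict import EasyDict

fuser = ConvFuser(EasyDict(IN_CHANNEL=80 + 256, OUT_CHANNEL=256))
batch_dict = {
    'spatial_features_img': torch.randn(2, 80, 180, 180),   # camera BEV features
    'spatial_features': torch.randn(2, 256, 180, 180),      # lidar BEV features
}
print(fuser(batch_dict)['spatial_features'].shape)          # torch.Size([2, 256, 180, 180])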
...@@ -30,11 +30,12 @@ def post_act_block(in_channels, out_channels, kernel_size, indice_key=None, stri
class SparseBasicBlock(spconv.SparseModule):
expansion = 1
def __init__(self, inplanes, planes, stride=1, bias=None, norm_fn=None, downsample=None, indice_key=None):
super(SparseBasicBlock, self).__init__()
assert norm_fn is not None
if bias is None:
bias = norm_fn is not None
self.conv1 = spconv.SubMConv3d(
inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=bias, indice_key=indice_key
)
...@@ -184,6 +185,7 @@ class VoxelResBackBone8x(nn.Module):
def __init__(self, model_cfg, input_channels, grid_size, **kwargs):
super().__init__()
self.model_cfg = model_cfg
use_bias = self.model_cfg.get('USE_BIAS', None)
norm_fn = partial(nn.BatchNorm1d, eps=1e-3, momentum=0.01)
self.sparse_shape = grid_size[::-1] + [1, 0, 0]
...@@ -196,29 +198,29 @@ class VoxelResBackBone8x(nn.Module):
block = post_act_block
self.conv1 = spconv.SparseSequential(
SparseBasicBlock(16, 16, bias=use_bias, norm_fn=norm_fn, indice_key='res1'),
SparseBasicBlock(16, 16, bias=use_bias, norm_fn=norm_fn, indice_key='res1'),
)
self.conv2 = spconv.SparseSequential(
# [1600, 1408, 41] <- [800, 704, 21]
block(16, 32, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv2', conv_type='spconv'),
SparseBasicBlock(32, 32, bias=use_bias, norm_fn=norm_fn, indice_key='res2'),
SparseBasicBlock(32, 32, bias=use_bias, norm_fn=norm_fn, indice_key='res2'),
)
self.conv3 = spconv.SparseSequential(
# [800, 704, 21] <- [400, 352, 11]
block(32, 64, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv3', conv_type='spconv'),
SparseBasicBlock(64, 64, bias=use_bias, norm_fn=norm_fn, indice_key='res3'),
SparseBasicBlock(64, 64, bias=use_bias, norm_fn=norm_fn, indice_key='res3'),
)
self.conv4 = spconv.SparseSequential(
# [400, 352, 11] <- [200, 176, 5]
block(64, 128, 3, norm_fn=norm_fn, stride=2, padding=(0, 1, 1), indice_key='spconv4', conv_type='spconv'),
SparseBasicBlock(128, 128, bias=use_bias, norm_fn=norm_fn, indice_key='res4'),
SparseBasicBlock(128, 128, bias=use_bias, norm_fn=norm_fn, indice_key='res4'),
)
last_pad = 0
......
from .swin import SwinTransformer
__all__ = {
'SwinTransformer':SwinTransformer,
}
from .generalized_lss import GeneralizedLSSFPN
__all__ = {
'GeneralizedLSSFPN':GeneralizedLSSFPN,
}
import torch
import torch.nn as nn
import torch.nn.functional as F
from ...model_utils.basic_block_2d import BasicBlock2D
class GeneralizedLSSFPN(nn.Module):
"""
This module implements FPN, which creates pyramid features built on top of some input feature maps.
This code is adapted from https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/necks/fpn.py with minimal modifications.
"""
def __init__(self, model_cfg):
super().__init__()
self.model_cfg = model_cfg
in_channels = self.model_cfg.IN_CHANNELS
out_channels = self.model_cfg.OUT_CHANNELS
num_ins = len(in_channels)
num_outs = self.model_cfg.NUM_OUTS
start_level = self.model_cfg.START_LEVEL
end_level = self.model_cfg.END_LEVEL
self.in_channels = in_channels
if end_level == -1:
self.backbone_end_level = num_ins - 1
else:
self.backbone_end_level = end_level
assert end_level <= len(in_channels)
assert num_outs == end_level - start_level
self.start_level = start_level
self.end_level = end_level
self.lateral_convs = nn.ModuleList()
self.fpn_convs = nn.ModuleList()
for i in range(self.start_level, self.backbone_end_level):
l_conv = BasicBlock2D(
in_channels[i] + (in_channels[i + 1] if i == self.backbone_end_level - 1 else out_channels),
out_channels, kernel_size=1, bias = False
)
fpn_conv = BasicBlock2D(out_channels,out_channels, kernel_size=3, padding=1, bias = False)
self.lateral_convs.append(l_conv)
self.fpn_convs.append(fpn_conv)
def forward(self, batch_dict):
"""
Args:
batch_dict:
image_features (list[tensor]): Multi-stage features from image backbone.
Returns:
batch_dict:
image_fpn (list(tensor)): FPN features.
"""
# upsample -> cat -> conv1x1 -> conv3x3
inputs = batch_dict['image_features']
assert len(inputs) == len(self.in_channels)
# build laterals
laterals = [inputs[i + self.start_level] for i in range(len(inputs))]
# build top-down path
used_backbone_levels = len(laterals) - 1
for i in range(used_backbone_levels - 1, -1, -1):
x = F.interpolate(
laterals[i + 1],
size=laterals[i].shape[2:],
mode='bilinear', align_corners=False,
)
laterals[i] = torch.cat([laterals[i], x], dim=1)
laterals[i] = self.lateral_convs[i](laterals[i])
laterals[i] = self.fpn_convs[i](laterals[i])
# build outputs
outs = [laterals[i] for i in range(used_backbone_levels)]
batch_dict['image_fpn'] = tuple(outs)
return batch_dict
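# Shape walkthrough for the top-down fusion above, using an illustrative config
# (three backbone stages with 192/384/768 channels fused down to 256; the real
# values come from the model yaml). Assumes the GeneralizedLSSFPN class above
# is importable in the current scope.
import torch
from easydict import EasyDict

cfg = EasyDict(IN_CHANNELS=[192, 384, 768], OUT_CHANNELS=256,
               NUM_OUTS=2, START_LEVEL=0, END_LEVEL=2)
fpn = GeneralizedLSSFPN(cfg)
batch_dict = {'image_features': [
    torch.randn(6, 192, 32, 88),   # stride-8 map of a 256x704 input
    torch.randn(6, 384, 16, 44),   # stride-16
    torch.randn(6, 768, 8, 22),    # stride-32
]}
outs = fpn(batch_dict)['image_fpn']
print([tuple(o.shape) for o in outs])   # [(6, 256, 32, 88), (6, 256, 16, 44)]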
...@@ -6,6 +6,7 @@ from .point_head_simple import PointHeadSimple
from .point_intra_part_head import PointIntraPartOffsetHead
from .center_head import CenterHead
from .voxelnext_head import VoxelNeXtHead
from .transfusion_head import TransFusionHead
__all__ = {
'AnchorHeadTemplate': AnchorHeadTemplate,
...@@ -16,4 +17,5 @@ __all__ = {
'AnchorHeadMulti': AnchorHeadMulti,
'CenterHead': CenterHead,
'VoxelNeXtHead': VoxelNeXtHead,
'TransFusionHead': TransFusionHead,
}
import torch
from scipy.optimize import linear_sum_assignment
from pcdet.ops.iou3d_nms import iou3d_nms_cuda
def height_overlaps(boxes1, boxes2):
"""
Calculate height overlaps of two boxes.
"""
boxes1_top_height = (boxes1[:,2]+ boxes1[:,5]).view(-1, 1)
boxes1_bottom_height = boxes1[:,2].view(-1, 1)
boxes2_top_height = (boxes2[:,2]+boxes2[:,5]).view(1, -1)
boxes2_bottom_height = boxes2[:,2].view(1, -1)
heighest_of_bottom = torch.max(boxes1_bottom_height, boxes2_bottom_height)
lowest_of_top = torch.min(boxes1_top_height, boxes2_top_height)
overlaps_h = torch.clamp(lowest_of_top - heighest_of_bottom, min=0)
return overlaps_h
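# Worked example for height_overlaps: box 1 spans z in [0, 2] and box 2 spans
# z in [1, 4], so their vertical overlap is 1. Columns follow the layout the
# function above assumes (index 2 = bottom z, index 5 = height).
import torch

b1 = torch.tensor([[0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0]])
b2 = torch.tensor([[0.5, 0.0, 1.0, 4.0, 2.0, 3.0, 0.0]])
print(height_overlaps(b1, b2))   # tensor([[1.]])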
def overlaps(boxes1, boxes2):
"""
Calculate 3D overlaps of two boxes.
"""
rows = len(boxes1)
cols = len(boxes2)
if rows * cols == 0:
return boxes1.new(rows, cols)
# height overlap
overlaps_h = height_overlaps(boxes1, boxes2)
boxes1_bev = boxes1[:,:7]
boxes2_bev = boxes2[:,:7]
# bev overlap
overlaps_bev = boxes1_bev.new_zeros(
(boxes1_bev.shape[0], boxes2_bev.shape[0])
).cuda() # (N, M)
iou3d_nms_cuda.boxes_overlap_bev_gpu(
boxes1_bev.contiguous().cuda(), boxes2_bev.contiguous().cuda(), overlaps_bev
)
# 3d overlaps
overlaps_3d = overlaps_bev.to(boxes1.device) * overlaps_h
volume1 = (boxes1[:, 3] * boxes1[:, 4] * boxes1[:, 5]).view(-1, 1)
volume2 = (boxes2[:, 3] * boxes2[:, 4] * boxes2[:, 5]).view(1, -1)
iou3d = overlaps_3d / torch.clamp(volume1 + volume2 - overlaps_3d, min=1e-8)
return iou3d
class HungarianAssigner3D:
def __init__(self, cls_cost, reg_cost, iou_cost):
self.cls_cost = cls_cost
self.reg_cost = reg_cost
self.iou_cost = iou_cost
def focal_loss_cost(self, cls_pred, gt_labels):
weight = self.cls_cost.get('weight', 0.15)
alpha = self.cls_cost.get('alpha', 0.25)
gamma = self.cls_cost.get('gamma', 2.0)
eps = self.cls_cost.get('eps', 1e-12)
cls_pred = cls_pred.sigmoid()
neg_cost = -(1 - cls_pred + eps).log() * (
1 - alpha) * cls_pred.pow(gamma)
pos_cost = -(cls_pred + eps).log() * alpha * (
1 - cls_pred).pow(gamma)
cls_cost = pos_cost[:, gt_labels] - neg_cost[:, gt_labels]
return cls_cost * weight
def bevbox_cost(self, bboxes, gt_bboxes, point_cloud_range):
weight = self.reg_cost.get('weight', 0.25)
pc_start = bboxes.new(point_cloud_range[0:2])
pc_range = bboxes.new(point_cloud_range[3:5]) - bboxes.new(point_cloud_range[0:2])
# normalize the box center to [0, 1]
normalized_bboxes_xy = (bboxes[:, :2] - pc_start) / pc_range
normalized_gt_bboxes_xy = (gt_bboxes[:, :2] - pc_start) / pc_range
reg_cost = torch.cdist(normalized_bboxes_xy, normalized_gt_bboxes_xy, p=1)
return reg_cost * weight
def iou3d_cost(self, bboxes, gt_bboxes):
iou = overlaps(bboxes, gt_bboxes)
weight = self.iou_cost.get('weight', 0.25)
iou_cost = - iou
return iou_cost * weight, iou
def assign(self, bboxes, gt_bboxes, gt_labels, cls_pred, point_cloud_range):
num_gts, num_bboxes = gt_bboxes.size(0), bboxes.size(0)
# 1. assign -1 by default
assigned_gt_inds = bboxes.new_full((num_bboxes,), -1, dtype=torch.long)
assigned_labels = bboxes.new_full((num_bboxes,), -1, dtype=torch.long)
if num_gts == 0 or num_bboxes == 0:
# No ground truth or boxes, return empty assignment
max_overlaps = bboxes.new_zeros((num_bboxes, ))
if num_gts == 0:
# No ground truth, assign all to background
assigned_gt_inds[:] = 0
return assigned_gt_inds, max_overlaps
# 2. compute the weighted costs
cls_cost = self.focal_loss_cost(cls_pred[0].T, gt_labels)
reg_cost = self.bevbox_cost(bboxes, gt_bboxes, point_cloud_range)
iou_cost, iou = self.iou3d_cost(bboxes, gt_bboxes)
# weighted sum of above three costs
cost = cls_cost + reg_cost + iou_cost
# 3. do Hungarian matching on CPU using linear_sum_assignment
cost = cost.detach().cpu()
matched_row_inds, matched_col_inds = linear_sum_assignment(cost)
matched_row_inds = torch.from_numpy(matched_row_inds).to(bboxes.device)
matched_col_inds = torch.from_numpy(matched_col_inds).to(bboxes.device)
# 4. assign backgrounds and foregrounds
# assign all indices to backgrounds first
assigned_gt_inds[:] = 0
# assign foregrounds based on matching results
assigned_gt_inds[matched_row_inds] = matched_col_inds + 1
assigned_labels[matched_row_inds] = gt_labels[matched_col_inds]
max_overlaps = torch.zeros_like(iou.max(1).values)
max_overlaps[matched_row_inds] = iou[matched_row_inds, matched_col_inds]
return assigned_gt_inds, max_overlaps
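# Toy illustration of the matching step only: the full assign() needs the
# compiled iou3d_nms_cuda op and a GPU, but the Hungarian solve itself is a
# plain CPU call on the summed cost matrix.
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[0.9, 0.1, 0.8],    # rows: 4 predicted boxes
                 [0.2, 0.7, 0.6],    # cols: 3 ground-truth boxes
                 [0.5, 0.4, 0.05],
                 [0.3, 0.9, 0.7]])
rows, cols = linear_sum_assignment(cost)
print(rows, cols)   # [0 1 2] [1 0 2]; prediction 3 stays assigned to background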