Unverified commit aaf9cbeb authored by Cody Reading, committed by GitHub

Support Monocular 3D Detector CaDDN (#538)

* Added CaDDN detector and support for image, depth map, and 2D GT box
dataloading

* Moved image flip augmentation to augmentor_utils

* Updated default get item list to include points

* Moved utils functions into transform_utils

* Combined FFE + F2V into ImageVFE, renamed FFE to FFN, moved depth downsample into data_processor

* Updated README with updated CaDDN weights

* Updated comments for image vfe
parent e3bec15f
@@ -13,3 +13,5 @@ venv/
*.pkl
*.zip
*.bin
output
version.py
\ No newline at end of file
@@ -18,6 +18,8 @@ It is also the official code release of [`[PointRCNN]`](https://arxiv.org/abs/18
## Changelog
[2021-05-14] Added support for the monocular 3D object detection model [`CaDDN`](#KITTI-3D-Object-Detection-Baselines)
[2020-11-27] **Bug fixed:** Please re-prepare the validation infos of Waymo dataset (version 1.2) if you would like to
use our provided Waymo evaluation tool (see [PR](https://github.com/open-mmlab/OpenPCDet/pull/383)).
Note that you do not need to re-prepare the training data and ground-truth database.
@@ -104,6 +106,7 @@ Selected supported methods are shown in the below table. The results are the 3D
| [Part-A^2-Free](tools/cfgs/kitti_models/PartA2_free.yaml) | ~3.8 hours| 78.72 | 65.99 | 74.29 | [model-226M](https://drive.google.com/file/d/1lcUUxF8mJgZ_e-tZhP1XNQtTBuC-R0zr/view?usp=sharing) |
| [Part-A^2-Anchor](tools/cfgs/kitti_models/PartA2.yaml) | ~4.3 hours| 79.40 | 60.05 | 69.90 | [model-244M](https://drive.google.com/file/d/10GK1aCkLqxGNeX3lVu8cLZyE0G8002hY/view?usp=sharing) |
| [PV-RCNN](tools/cfgs/kitti_models/pv_rcnn.yaml) | ~5 hours| 83.61 | 57.90 | 70.47 | [model-50M](https://drive.google.com/file/d/1lIOq4Hxr0W3qsX83ilQv0nk1Cls6KAr-/view?usp=sharing) |
| [CaDDN](tools/cfgs/kitti_models/CaDDN.yaml) |~15 hours| 21.38 | 13.02 | 9.76 | [model-774M](https://drive.google.com/file/d/1OQTO2PtXT8GGr35W9m2GZGuqgb6fyU1V/view?usp=sharing) |
### NuScenes 3D Object Detection Baselines
All models are trained with 8 GTX 1080Ti GPUs and are available for download.
...
@@ -9,6 +9,7 @@ Currently we provide the dataloader of KITTI dataset and NuScenes dataset, and t
### KITTI Dataset
* Please download the official [KITTI 3D object detection](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) dataset and organize the downloaded files as follows (the road planes could be downloaded from [[road plane]](https://drive.google.com/file/d/1d5mq0RXRnvHPVeKx6Q612z0YRO1t2wAp/view?usp=sharing), which are optional for data augmentation in the training):
* If you would like to train [CaDDN](../tools/cfgs/kitti_models/CaDDN.yaml), download the precomputed [depth maps](https://drive.google.com/file/d/1qFZux7KC_gJ0UHEg-qGJKqteE9Ivojin/view?usp=sharing) for the KITTI training set.
* NOTE: if you already have the data infos from `pcdet v0.1`, you can choose to use the old infos and set the DATABASE_WITH_FAKELIDAR option in tools/cfgs/dataset_configs/kitti_dataset.yaml as True. The second choice is that you can create the infos and gt database again and leave the config unchanged.
```
@@ -17,7 +18,7 @@ OpenPCDet
│ ├── kitti
│ │ │── ImageSets
│ │ │── training
│ │ │ ├──calib & velodyne & label_2 & image_2 & (optional: planes) & (optional: depth_2)
│ │ │── testing
│ │ │ ├──calib & velodyne & image_2
├── pcdet
@@ -94,6 +95,17 @@ python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos \
Note that you do not need to install `waymo-open-dataset` if you have already processed the data before and do not need to evaluate with official Waymo Metrics.
## Pretrained Models
If you would like to train [CaDDN](../tools/cfgs/kitti_models/CaDDN.yaml), download the pretrained [DeepLabV3 model](https://download.pytorch.org/models/deeplabv3_resnet101_coco-586e9e4e.pth) and place it within the `checkpoints` directory:
```
OpenPCDet
├── checkpoints
│ ├── deeplabv3_resnet101_coco-586e9e4e.pth
├── data
├── pcdet
├── tools
```
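If you prefer to script the download, here is a minimal sketch (not part of the repo) using `torch.hub`, which stores the checkpoint under `checkpoints/` with its URL-derived file name:
```
import torch

# Downloads deeplabv3_resnet101_coco-586e9e4e.pth into ./checkpoints/
url = "https://download.pytorch.org/models/deeplabv3_resnet101_coco-586e9e4e.pth"
state_dict = torch.hub.load_state_dict_from_url(url, model_dir="checkpoints")
print(len(state_dict), "tensors in checkpoint")
```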
## Training & Testing
...
import copy
import numpy as np
from ...utils import common_utils
@@ -76,3 +77,42 @@ def global_scaling(gt_boxes, points, scale_range):
points[:, :3] *= noise_scale
gt_boxes[:, :6] *= noise_scale
return gt_boxes, points
def random_image_flip_horizontal(image, depth_map, gt_boxes, calib):
"""
Performs random horizontal flip augmentation
Args:
image: (H_image, W_image, 3), Image
depth_map: (H_depth, W_depth), Depth map
gt_boxes: (N, 7), 3D box labels in LiDAR coordinates [x, y, z, w, l, h, ry]
calib: calibration.Calibration, Calibration object
Returns:
aug_image: (H_image, W_image, 3), Augmented image
aug_depth_map: (H_depth, W_depth), Augmented depth map
aug_gt_boxes: (N, 7), Augmented 3D box labels in LiDAR coordinates [x, y, z, w, l, h, ry]
"""
# Randomly augment with 50% chance
enable = np.random.choice([False, True], replace=False, p=[0.5, 0.5])
if enable:
# Flip images
aug_image = np.fliplr(image)
aug_depth_map = np.fliplr(depth_map)
# Flip 3D gt_boxes by flipping the centroids in image space
aug_gt_boxes = copy.copy(gt_boxes)
locations = aug_gt_boxes[:, :3]
img_pts, img_depth = calib.lidar_to_img(locations)
W = image.shape[1]
img_pts[:, 0] = W - img_pts[:, 0]
pts_rect = calib.img_to_rect(u=img_pts[:, 0], v=img_pts[:, 1], depth_rect=img_depth)
pts_lidar = calib.rect_to_lidar(pts_rect)
aug_gt_boxes[:, :3] = pts_lidar
aug_gt_boxes[:, 6] = -1 * aug_gt_boxes[:, 6]
else:
aug_image = image
aug_depth_map = depth_map
aug_gt_boxes = gt_boxes
return aug_image, aug_depth_map, aug_gt_boxes
\ No newline at end of file
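# --- Editor's aside: hedged sketch, not part of the commit ---
# Why flipping boxes via image space works: mirroring the u-coordinate and
# unprojecting at the same depth mirrors the camera-frame x-coordinate.
# Hypothetical pinhole intrinsics below, not the repo's calibration.Calibration.
fu, cu, W = 721.5, 620.5, 1242.0     # focal length, principal point, image width
x, z = 2.0, 20.0                     # centroid in the camera frame
u = fu * x / z + cu                  # project to image
x_flipped = ((W - u) - cu) * z / fu  # mirror u, then unproject at the same depth
print(x_flipped)                     # ~ -2.0 when cu ~ W / 2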
@@ -78,6 +78,25 @@ class DataAugmentor(object):
data_dict['points'] = points
return data_dict
def random_image_flip(self, data_dict=None, config=None):
if data_dict is None:
return partial(self.random_image_flip, config=config)
images = data_dict["images"]
depth_maps = data_dict["depth_maps"]
gt_boxes = data_dict['gt_boxes']
gt_boxes2d = data_dict["gt_boxes2d"]
calib = data_dict["calib"]
for cur_axis in config['ALONG_AXIS_LIST']:
assert cur_axis in ['horizontal']
images, depth_maps, gt_boxes = getattr(augmentor_utils, 'random_image_flip_%s' % cur_axis)(
images, depth_maps, gt_boxes, calib,
)
data_dict['images'] = images
data_dict['depth_maps'] = depth_maps
data_dict['gt_boxes'] = gt_boxes
return data_dict
def forward(self, data_dict):
"""
Args:
@@ -103,5 +122,8 @@ class DataAugmentor(object):
gt_boxes_mask = data_dict['gt_boxes_mask']
data_dict['gt_boxes'] = data_dict['gt_boxes'][gt_boxes_mask]
data_dict['gt_names'] = data_dict['gt_names'][gt_boxes_mask]
if 'gt_boxes2d' in data_dict:
data_dict['gt_boxes2d'] = data_dict['gt_boxes2d'][gt_boxes_mask]
data_dict.pop('gt_boxes_mask')
return data_dict
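# --- Editor's aside: hedged sketch, not part of the commit ---
# The augmentor queue is built by calling each method with data_dict=None, which
# returns a partial bound to its config; forward() later applies the partials.
from functools import partial

class _Demo:
    def step(self, data_dict=None, config=None):
        if data_dict is None:                           # build time
            return partial(self.step, config=config)
        data_dict['along'] = config['ALONG_AXIS_LIST']  # call time
        return data_dict

queued = _Demo().step(config={'ALONG_AXIS_LIST': ['horizontal']})
print(queued({'frame_id': '000000'}))  # {'frame_id': '000000', 'along': ['horizontal']}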
@@ -39,6 +39,11 @@ class DatasetTemplate(torch_data.Dataset):
self.total_epochs = 0
self._merge_all_iters_to_one_epoch = False
if hasattr(self.data_processor, "depth_downsample_factor"):
self.depth_downsample_factor = self.data_processor.depth_downsample_factor
else:
self.depth_downsample_factor = None
@property
def mode(self):
return 'train' if self.training else 'test'
@@ -97,7 +102,7 @@ class DatasetTemplate(torch_data.Dataset):
"""
Args:
data_dict:
points: optional, (N, 3 + C_in)
gt_boxes: optional, (N, 7 + C) [x, y, z, dx, dy, dz, heading, ...]
gt_names: optional, (N), string
...
@@ -133,6 +138,10 @@ class DatasetTemplate(torch_data.Dataset):
gt_boxes = np.concatenate((data_dict['gt_boxes'], gt_classes.reshape(-1, 1).astype(np.float32)), axis=1)
data_dict['gt_boxes'] = gt_boxes
if data_dict.get('gt_boxes2d', None) is not None:
data_dict['gt_boxes2d'] = data_dict['gt_boxes2d'][selected]
if data_dict.get('points', None) is not None:
data_dict = self.point_feature_encoder.forward(data_dict)
data_dict = self.data_processor.forward(
@@ -172,6 +181,43 @@ class DatasetTemplate(torch_data.Dataset):
for k in range(batch_size):
batch_gt_boxes3d[k, :val[k].__len__(), :] = val[k]
ret[key] = batch_gt_boxes3d
elif key in ['gt_boxes2d']:
max_boxes = max([len(x) for x in val])
batch_boxes2d = np.zeros((batch_size, max_boxes, val[0].shape[-1]), dtype=np.float32)
for k in range(batch_size):
if val[k].size > 0:
batch_boxes2d[k, :val[k].__len__(), :] = val[k]
ret[key] = batch_boxes2d
elif key in ["images", "depth_maps"]:
# Get largest image size (H, W)
max_h = 0
max_w = 0
for image in val:
max_h = max(max_h, image.shape[0])
max_w = max(max_w, image.shape[1])
# Change size of images
images = []
for image in val:
pad_h = common_utils.get_pad_params(desired_size=max_h, cur_size=image.shape[0])
pad_w = common_utils.get_pad_params(desired_size=max_w, cur_size=image.shape[1])
pad_width = (pad_h, pad_w)
# Pad with nan, to be replaced later in the pipeline.
pad_value = np.nan
if key == "images":
pad_width = (pad_h, pad_w, (0, 0))
elif key == "depth_maps":
pad_width = (pad_h, pad_w)
image_pad = np.pad(image,
pad_width=pad_width,
mode='constant',
constant_values=pad_value)
images.append(image_pad)
ret[key] = np.stack(images, axis=0)
else:
ret[key] = np.stack(val, axis=0)
except:
...
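# --- Editor's aside: hedged sketch, not part of the commit ---
# What the image branch above does: pad every image in the batch to the largest
# (H, W) with NaN so later stages can recognize padded pixels, then stack.
# get_pad_params presumably returns a (pad_before, pad_after) tuple per axis.
import numpy as np

imgs = [np.ones((370, 1224, 3), np.float32), np.ones((375, 1242, 3), np.float32)]
max_h = max(im.shape[0] for im in imgs)
max_w = max(im.shape[1] for im in imgs)
padded = [np.pad(im, ((0, max_h - im.shape[0]), (0, max_w - im.shape[1]), (0, 0)),
                 mode='constant', constant_values=np.nan) for im in imgs]
print(np.stack(padded, axis=0).shape)  # (2, 375, 1242, 3)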
@@ -4,6 +4,7 @@ import pickle
import numpy as np
from skimage import io
from . import kitti_utils
from ...ops.roiaware_pool3d import roiaware_pool3d_utils
from ...utils import box_utils, calibration_kitti, common_utils, object3d_kitti
from ..dataset import DatasetTemplate
@@ -64,6 +65,21 @@ class KittiDataset(DatasetTemplate):
assert lidar_file.exists()
return np.fromfile(str(lidar_file), dtype=np.float32).reshape(-1, 4)
def get_image(self, idx):
"""
Loads image for a sample
Args:
idx: str, Sample index
Returns:
image: (H, W, 3), RGB Image
"""
img_file = self.root_split_path / 'image_2' / ('%s.png' % idx)
assert img_file.exists()
image = io.imread(img_file)
image = image.astype(np.float32)
image /= 255.0
return image
def get_image_shape(self, idx):
img_file = self.root_split_path / 'image_2' / ('%s.png' % idx)
assert img_file.exists()
@@ -74,6 +90,21 @@ class KittiDataset(DatasetTemplate):
assert label_file.exists()
return object3d_kitti.get_objects_from_label(label_file)
def get_depth_map(self, idx):
"""
Loads depth map for a sample
Args:
idx: str, Sample index
Returns:
depth: (H, W), Depth map
"""
depth_file = self.root_split_path / 'depth_2' / ('%s.png' % idx)
assert depth_file.exists()
depth = io.imread(depth_file)
depth = depth.astype(np.float32)
depth /= 256.0
return depth
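# Editor's note (hedged): KITTI-style depth PNGs are uint16 with 1/256 m
# resolution, so the inverse of the load above is roughly
# io.imsave(path, (depth_m * 256.0).astype(np.uint16)); zero-valued pixels
# mean "no ground-truth depth" and decode to 0.0 here.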
def get_calib(self, idx):
calib_file = self.root_split_path / 'calib' / ('%s.txt' % idx)
assert calib_file.exists()
@@ -277,7 +308,7 @@ class KittiDataset(DatasetTemplate):
return pred_dict
calib = batch_dict['calib'][batch_index]
image_shape = batch_dict['image_shape'][batch_index].cpu().numpy()
pred_boxes_camera = box_utils.boxes3d_lidar_to_kitti_camera(pred_boxes, calib)
pred_boxes_img = box_utils.boxes3d_kitti_camera_to_imageboxes(
pred_boxes_camera, calib, image_shape=image_shape
@@ -345,18 +376,11 @@ class KittiDataset(DatasetTemplate):
info = copy.deepcopy(self.kitti_infos[index])
sample_idx = info['point_cloud']['lidar_idx']
img_shape = info['image']['image_shape']
calib = self.get_calib(sample_idx)
get_item_list = self.dataset_cfg.get('GET_ITEM_LIST', ['points'])
input_dict = {
'frame_id': sample_idx,
'calib': calib,
}
@@ -373,10 +397,30 @@ class KittiDataset(DatasetTemplate):
'gt_names': gt_names,
'gt_boxes': gt_boxes_lidar
})
if "gt_boxes2d" in get_item_list:
input_dict['gt_boxes2d'] = annos["bbox"]
road_plane = self.get_road_plane(sample_idx)
if road_plane is not None:
input_dict['road_plane'] = road_plane
if "points" in get_item_list:
points = self.get_lidar(sample_idx)
if self.dataset_cfg.FOV_POINTS_ONLY:
pts_rect = calib.lidar_to_rect(points[:, 0:3])
fov_flag = self.get_fov_flag(pts_rect, img_shape, calib)
points = points[fov_flag]
input_dict['points'] = points
if "images" in get_item_list:
input_dict['images'] = self.get_image(sample_idx)
if "depth_maps" in get_item_list:
input_dict['depth_maps'] = self.get_depth_map(sample_idx)
if "calib_matricies" in get_item_list:
input_dict["trans_lidar_to_cam"], input_dict["trans_cam_to_img"] = kitti_utils.calib_to_matricies(calib)
data_dict = self.prepare_data(data_dict=input_dict)
data_dict['image_shape'] = img_shape
...
@@ -42,3 +42,20 @@ def transform_annotations_to_kitti_format(annos, map_name_to_kitti=None, info_wi
anno['rotation_y'] = anno['alpha'] = np.zeros(0)
return annos
def calib_to_matricies(calib):
"""
Converts calibration object to transformation matrices
Args:
calib: calibration.Calibration, Calibration object
Returns:
V2R: (4, 4), Lidar to rectified camera transformation matrix
P2: (3, 4), Camera projection matrix
"""
V2C = np.vstack((calib.V2C, np.array([0, 0, 0, 1], dtype=np.float32))) # (4, 4)
R0 = np.hstack((calib.R0, np.zeros((3, 1), dtype=np.float32))) # (3, 4)
R0 = np.vstack((R0, np.array([0, 0, 0, 1], dtype=np.float32))) # (4, 4)
V2R = R0 @ V2C
P2 = calib.P2
return V2R, P2
\ No newline at end of file
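# --- Editor's aside: hedged sketch, not part of the commit ---
# Numeric check of calib_to_matricies with an illustrative (non-KITTI) calib:
# V2R composes rectification after the lidar->camera extrinsics; P2 then projects.
import numpy as np

class _FakeCalib:  # hypothetical stand-in for calibration.Calibration
    V2C = np.eye(3, 4, dtype=np.float32)  # (3, 4) lidar -> camera
    R0 = np.eye(3, dtype=np.float32)      # (3, 3) rectification
    P2 = np.array([[700., 0., 621., 0.],
                   [0., 700., 187., 0.],
                   [0., 0., 1., 0.]], dtype=np.float32)  # (3, 4) projection

V2R, P2 = calib_to_matricies(_FakeCalib())
print(V2R.shape, P2.shape)  # (4, 4) (3, 4)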
from functools import partial
import numpy as np
from skimage import transform
from ...utils import box_utils, common_utils
@@ -19,8 +20,11 @@ class DataProcessor(object):
def mask_points_and_boxes_outside_range(self, data_dict=None, config=None):
if data_dict is None:
return partial(self.mask_points_and_boxes_outside_range, config=config)
if data_dict.get('points', None) is not None:
mask = common_utils.mask_points_by_range(data_dict['points'], self.point_cloud_range)
data_dict['points'] = data_dict['points'][mask]
if data_dict.get('gt_boxes', None) is not None and config.REMOVE_OUTSIDE_BOXES and self.training:
mask = box_utils.mask_boxes_outside_range_numpy(
data_dict['gt_boxes'], self.point_cloud_range, min_num_corners=config.get('min_num_corners', 1)
@@ -106,6 +110,25 @@ class DataProcessor(object):
data_dict['points'] = points[choice]
return data_dict
def calculate_grid_size(self, data_dict=None, config=None):
if data_dict is None:
grid_size = (self.point_cloud_range[3:6] - self.point_cloud_range[0:3]) / np.array(config.VOXEL_SIZE)
self.grid_size = np.round(grid_size).astype(np.int64)
self.voxel_size = config.VOXEL_SIZE
return partial(self.calculate_grid_size, config=config)
return data_dict
def downsample_depth_map(self, data_dict=None, config=None):
if data_dict is None:
self.depth_downsample_factor = config.DOWNSAMPLE_FACTOR
return partial(self.downsample_depth_map, config=config)
data_dict['depth_maps'] = transform.downscale_local_mean(
image=data_dict['depth_maps'],
factors=(self.depth_downsample_factor, self.depth_downsample_factor)
)
return data_dict
def forward(self, data_dict):
"""
Args:
...
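# --- Editor's aside: hedged sketch, not part of the commit ---
# downscale_local_mean block-averages the depth map; with DOWNSAMPLE_FACTOR = 4
# a (376, 1248) map becomes (94, 312). Non-divisible sizes are zero-padded first.
import numpy as np
from skimage import transform

depth = np.random.rand(376, 1248).astype(np.float32)
print(transform.downscale_local_mean(depth, factors=(4, 4)).shape)  # (94, 312)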
@@ -2,6 +2,7 @@ from collections import namedtuple
import numpy as np
import torch
import kornia
from .detectors import build_detector
@@ -17,8 +18,13 @@ def load_data_to_gpu(batch_dict):
for key, val in batch_dict.items():
if not isinstance(val, np.ndarray):
continue
elif key in ['frame_id', 'metadata', 'calib']:
continue
elif key in ['images']:
batch_dict[key] = kornia.image_to_tensor(val).float().cuda().contiguous()
elif key in ['image_shape']:
batch_dict[key] = torch.from_numpy(val).int().cuda()
else:
batch_dict[key] = torch.from_numpy(val).float().cuda()
...
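# --- Editor's aside: hedged sketch, not part of the commit ---
# Why 'images' gets its own branch: kornia.image_to_tensor reorders the batched
# HWC numpy layout from the dataloader into the BCHW layout conv nets expect.
import numpy as np
import kornia

batch_images = np.zeros((2, 375, 1242, 3), dtype=np.float32)  # (B, H, W, C)
print(kornia.image_to_tensor(batch_images).shape)             # (2, 3, 375, 1242)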
from .height_compression import HeightCompression
from .pointpillar_scatter import PointPillarScatter
from .conv2d_collapse import Conv2DCollapse
__all__ = {
'HeightCompression': HeightCompression,
'PointPillarScatter': PointPillarScatter,
'Conv2DCollapse': Conv2DCollapse
}
import torch
import torch.nn as nn
from pcdet.models.model_utils.basic_block_2d import BasicBlock2D
class Conv2DCollapse(nn.Module):
def __init__(self, model_cfg, grid_size):
"""
Initializes 2D convolution collapse module
Args:
model_cfg: EasyDict, Model configuration
grid_size: (X, Y, Z) Voxel grid size
"""
super().__init__()
self.model_cfg = model_cfg
self.num_heights = grid_size[-1]
self.num_bev_features = self.model_cfg.NUM_BEV_FEATURES
self.block = BasicBlock2D(in_channels=self.num_bev_features * self.num_heights,
out_channels=self.num_bev_features,
**self.model_cfg.ARGS)
def forward(self, batch_dict):
"""
Collapses voxel features to BEV via concatenation and channel reduction
Args:
batch_dict:
voxel_features: (B, C, Z, Y, X), Voxel feature representation
Returns:
batch_dict:
spatial_features: (B, C, Y, X), BEV feature representation
"""
voxel_features = batch_dict["voxel_features"]
bev_features = voxel_features.flatten(start_dim=1, end_dim=2) # (B, C, Z, Y, X) -> (B, C*Z, Y, X)
bev_features = self.block(bev_features) # (B, C*Z, Y, X) -> (B, C, Y, X)
batch_dict["spatial_features"] = bev_features
return batch_dict
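# --- Editor's aside: hedged sketch, not part of the commit ---
# Shape walk-through of the collapse: a 1x1 Conv2d stands in for BasicBlock2D.
import torch

B, C, Z, Y, X = 2, 64, 5, 94, 312
voxel_features = torch.zeros(B, C, Z, Y, X)
bev_in = voxel_features.flatten(start_dim=1, end_dim=2)  # (B, C*Z, Y, X)
bev_out = torch.nn.Conv2d(C * Z, C, kernel_size=1)(bev_in)
print(bev_in.shape, bev_out.shape)  # (2, 320, 94, 312) (2, 64, 94, 312)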
from .mean_vfe import MeanVFE
from .pillar_vfe import PillarVFE
from .image_vfe import ImageVFE
from .vfe_template import VFETemplate
__all__ = {
'VFETemplate': VFETemplate,
'MeanVFE': MeanVFE,
'PillarVFE': PillarVFE,
'ImageVFE': ImageVFE
}
import torch
from .vfe_template import VFETemplate
from .image_vfe_modules import ffn, f2v
class ImageVFE(VFETemplate):
def __init__(self, model_cfg, grid_size, point_cloud_range, depth_downsample_factor, **kwargs):
super().__init__(model_cfg=model_cfg)
self.grid_size = grid_size
self.pc_range = point_cloud_range
self.downsample_factor = depth_downsample_factor
self.module_topology = [
'ffn', 'f2v'
]
self.build_modules()
def build_modules(self):
"""
Builds modules
"""
for module_name in self.module_topology:
module = getattr(self, 'build_%s' % module_name)()
self.add_module(module_name, module)
def build_ffn(self):
"""
Builds frustum feature network
Returns:
ffn_module: nn.Module, Frustum feature network
"""
ffn_module = ffn.__all__[self.model_cfg.FFN.NAME](
model_cfg=self.model_cfg.FFN,
downsample_factor=self.downsample_factor
)
self.disc_cfg = ffn_module.disc_cfg
return ffn_module
def build_f2v(self):
"""
Builds frustum to voxel transformation
Returns:
f2v_module: nn.Module, Frustum to voxel transformation
"""
f2v_module = f2v.__all__[self.model_cfg.F2V.NAME](
model_cfg=self.model_cfg.F2V,
grid_size=self.grid_size,
pc_range=self.pc_range,
disc_cfg=self.disc_cfg
)
return f2v_module
def get_output_feature_dim(self):
"""
Gets number of output channels
Returns:
out_feature_dim: int, Number of output channels
"""
out_feature_dim = self.ffn.get_output_feature_dim()
return out_feature_dim
def forward(self, batch_dict, **kwargs):
"""
Args:
batch_dict:
images: (N, 3, H_in, W_in), Input images
**kwargs:
Returns:
batch_dict:
voxel_features: (B, C, Z, Y, X), Image voxel features
"""
batch_dict = self.ffn(batch_dict)
batch_dict = self.f2v(batch_dict)
return batch_dict
def get_loss(self):
"""
Gets DDN loss
Returns:
loss: (1), Depth distribution network loss
tb_dict: dict[float], All losses to log in tensorboard
"""
loss, tb_dict = self.ffn.get_loss()
return loss, tb_dict
from .frustum_to_voxel import FrustumToVoxel
__all__ = {
'FrustumToVoxel': FrustumToVoxel
}
import torch
import torch.nn as nn
import kornia
from pcdet.utils import transform_utils
class FrustumGridGenerator(nn.Module):
def __init__(self, grid_size, pc_range, disc_cfg):
"""
Initializes Grid Generator for frustum features
Args:
grid_size: [X, Y, Z], Voxel grid size
pc_range: [x_min, y_min, z_min, x_max, y_max, z_max], Voxelization point cloud range (m)
disc_cfg: EasyDict, Depth discretization configuration
"""
super().__init__()
self.dtype = torch.float32
self.grid_size = torch.as_tensor(grid_size)
self.pc_range = pc_range
self.out_of_bounds_val = -2
self.disc_cfg = disc_cfg
# Calculate voxel size
pc_range = torch.as_tensor(pc_range).reshape(2, 3)
self.pc_min = pc_range[0]
self.pc_max = pc_range[1]
self.voxel_size = (self.pc_max - self.pc_min) / self.grid_size
# Create voxel grid
self.depth, self.width, self.height = self.grid_size.int()
self.voxel_grid = kornia.utils.create_meshgrid3d(depth=self.depth,
height=self.height,
width=self.width,
normalized_coordinates=False)
self.voxel_grid = self.voxel_grid.permute(0, 1, 3, 2, 4) # XZY-> XYZ
# Add offsets to center of voxel
self.voxel_grid += 0.5
self.grid_to_lidar = self.grid_to_lidar_unproject(pc_min=self.pc_min,
voxel_size=self.voxel_size)
def grid_to_lidar_unproject(self, pc_min, voxel_size):
"""
Calculate grid to LiDAR unprojection for each plane
Args:
pc_min: [x_min, y_min, z_min], Minimum of point cloud range (m)
voxel_size: [x, y, z], Size of each voxel (m)
Returns:
unproject: (4, 4), Voxel grid to LiDAR unprojection matrix
"""
x_size, y_size, z_size = voxel_size
x_min, y_min, z_min = pc_min
unproject = torch.tensor([[x_size, 0, 0, x_min],
[0, y_size, 0, y_min],
[0, 0, z_size, z_min],
[0, 0, 0, 1]],
dtype=self.dtype) # (4, 4)
return unproject
def transform_grid(self, voxel_grid, grid_to_lidar, lidar_to_cam, cam_to_img):
"""
Transforms voxel sampling grid into frustum sampling grid
Args:
voxel_grid: (B, X, Y, Z, 3), Voxel sampling grid
grid_to_lidar: (4, 4), Voxel grid to LiDAR unprojection matrix
lidar_to_cam: (B, 4, 4), LiDAR to camera frame transformation
cam_to_img: (B, 3, 4), Camera projection matrix
Returns:
frustum_grid: (B, X, Y, Z, 3), Frustum sampling grid
"""
B = lidar_to_cam.shape[0]
# Create transformation matrices
V_G = grid_to_lidar # Voxel Grid -> LiDAR (4, 4)
C_V = lidar_to_cam # LiDAR -> Camera (B, 4, 4)
I_C = cam_to_img # Camera -> Image (B, 3, 4)
trans = C_V @ V_G
# Reshape to match dimensions
trans = trans.reshape(B, 1, 1, 4, 4)
voxel_grid = voxel_grid.repeat_interleave(repeats=B, dim=0)
# Transform to camera frame
camera_grid = kornia.transform_points(trans_01=trans, points_1=voxel_grid)
# Project to image
I_C = I_C.reshape(B, 1, 1, 3, 4)
image_grid, image_depths = transform_utils.project_to_image(project=I_C, points=camera_grid)
# Convert depths to depth bins
image_depths = transform_utils.bin_depths(depth_map=image_depths, **self.disc_cfg)
# Stack to form frustum grid
image_depths = image_depths.unsqueeze(-1)
frustum_grid = torch.cat((image_grid, image_depths), dim=-1)
return frustum_grid
def forward(self, lidar_to_cam, cam_to_img, image_shape):
"""
Generates sampling grid for frustum features
Args:
lidar_to_cam: (B, 4, 4), LiDAR to camera frame transformation
cam_to_img: (B, 3, 4), Camera projection matrix
image_shape: (B, 2), Image shape [H, W]
Returns:
frustum_grid: (B, X, Y, Z, 3), Sampling grids for frustum features
"""
frustum_grid = self.transform_grid(voxel_grid=self.voxel_grid.to(lidar_to_cam.device),
grid_to_lidar=self.grid_to_lidar.to(lidar_to_cam.device),
lidar_to_cam=lidar_to_cam,
cam_to_img=cam_to_img)
# Normalize grid
image_shape, _ = torch.max(image_shape, dim=0)
image_depth = torch.tensor([self.disc_cfg["num_bins"]],
device=image_shape.device,
dtype=image_shape.dtype)
frustum_shape = torch.cat((image_depth, image_shape))
frustum_grid = transform_utils.normalize_coords(coords=frustum_grid, shape=frustum_shape)
# Replace any NaNs or infinites with out of bounds
mask = ~torch.isfinite(frustum_grid)
frustum_grid[mask] = self.out_of_bounds_val
return frustum_grid
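# --- Editor's aside: hedged sketch, not part of the commit ---
# bin_depths lives in pcdet/utils/transform_utils.py (not shown in this diff).
# CaDDN's LID discretization grows bin width linearly with depth; inverting the
# cumulative bin edges gives the index formula below. Details may differ from
# the actual implementation.
import torch

def lid_bin_index(depth, depth_min, depth_max, num_bins):
    bin_size = 2 * (depth_max - depth_min) / (num_bins * (1 + num_bins))
    return -0.5 + 0.5 * torch.sqrt(1 + 8 * (depth - depth_min) / bin_size)

d = torch.tensor([2.0, 20.0, 46.8])
print(lid_bin_index(d, depth_min=2.0, depth_max=46.8, num_bins=80))  # ~[0.0, 50.5, 80.0]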
import torch
import torch.nn as nn
from .frustum_grid_generator import FrustumGridGenerator
from .sampler import Sampler
class FrustumToVoxel(nn.Module):
def __init__(self, model_cfg, grid_size, pc_range, disc_cfg):
"""
Initializes module to transform frustum features to voxel features via 3D transformation and sampling
Args:
model_cfg: EasyDict, Module configuration
grid_size: [X, Y, Z], Voxel grid size
pc_range: [x_min, y_min, z_min, x_max, y_max, z_max], Voxelization point cloud range (m)
disc_cfg: EasyDict, Depth discretization configuration
"""
super().__init__()
self.model_cfg = model_cfg
self.grid_size = grid_size
self.pc_range = pc_range
self.disc_cfg = disc_cfg
self.grid_generator = FrustumGridGenerator(grid_size=grid_size,
pc_range=pc_range,
disc_cfg=disc_cfg)
self.sampler = Sampler(**model_cfg.SAMPLER)
def forward(self, batch_dict):
"""
Generates voxel features via 3D transformation and sampling
Args:
batch_dict:
frustum_features: (B, C, D, H_image, W_image), Image frustum features
lidar_to_cam: (B, 4, 4), LiDAR to camera frame transformation
cam_to_img: (B, 3, 4), Camera projection matrix
image_shape: (B, 2), Image shape [H, W]
Returns:
batch_dict:
voxel_features: (B, C, Z, Y, X), Image voxel features
"""
# Generate sampling grid for frustum volume
grid = self.grid_generator(lidar_to_cam=batch_dict["trans_lidar_to_cam"],
cam_to_img=batch_dict["trans_cam_to_img"],
image_shape=batch_dict["image_shape"]) # (B, X, Y, Z, 3)
# Sample frustum volume to generate voxel volume
voxel_features = self.sampler(input_features=batch_dict["frustum_features"],
grid=grid) # (B, C, X, Y, Z)
# (B, C, X, Y, Z) -> (B, C, Z, Y, X)
voxel_features = voxel_features.permute(0, 1, 4, 3, 2)
batch_dict["voxel_features"] = voxel_features
return batch_dict
import torch
import torch.nn as nn
import torch.nn.functional as F
class Sampler(nn.Module):
def __init__(self, mode="bilinear", padding_mode="zeros"):
"""
Initializes module
Args:
mode: string, Sampling mode [bilinear/nearest]
padding_mode: string, Padding mode for outside grid values [zeros/border/reflection]
"""
super().__init__()
self.mode = mode
self.padding_mode = padding_mode
def forward(self, input_features, grid):
"""
Samples input using sampling grid
Args:
input_features: (B, C, D, H, W), Input frustum features
grid: (B, X, Y, Z, 3), Sampling grids for input features
Returns:
output_features: (B, C, X, Y, Z), Output voxel features
"""
# Sample from grid
output = F.grid_sample(input=input_features, grid=grid, mode=self.mode, padding_mode=self.padding_mode)
return output
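# --- Editor's aside: hedged sketch, not part of the commit ---
# With 5D inputs, F.grid_sample maps input (B, C, D, H, W) plus a grid of
# normalized (x, y, z) coordinates shaped (B, X, Y, Z, 3) to (B, C, X, Y, Z),
# which FrustumToVoxel then permutes to (B, C, Z, Y, X).
import torch
import torch.nn.functional as F

frustum = torch.rand(1, 32, 80, 47, 156)               # (B, C, D, H, W)
grid = torch.rand(1, 280, 280, 20, 3) * 2 - 1          # normalized to [-1, 1]
print(F.grid_sample(frustum, grid, align_corners=False).shape)  # (1, 32, 280, 280, 20)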
from .depth_ffn import DepthFFN
__all__ = {
'DepthFFN': DepthFFN
}
from .ddn_deeplabv3 import DDNDeepLabV3
__all__ = {
'DDNDeepLabV3': DDNDeepLabV3
}