Unverified commit f268ba42 authored by ChaimZhu, committed by GitHub

[Feature] Add SMOKE Head and Coder(v1.0.0.dev0) (#959)



* [Refactor] Main code modification for coordinate system refactor (#677)

* [Enhance] Add script for data update (#774)

* Fixed wrong config paths and fixed a bug in test

* Fixed metafile

* Coord sys refactor (main code)

* Update test_waymo_dataset.py

* Manually resolve conflict

* Removed unused lines and fixed imports

* remove coord2box and box2coord

* update dir_limit_offset

* Some minor improvements

* Removed some \s in comments

* Revert a change

* Change Box3DMode to Coord3DMode where points are converted

* Fix points_in_bbox function

* Fix Imvoxelnet config

* Revert adding a line

* Fix rotation bug when batch size is 0

* Keep sign of dir_scores as before

* Fix several comments

* Add a comment

* Fix docstring

* Add data update scripts

* Fix comments

* fix import (#839)

* [Enhance]  refactor  iou_neg_piecewise_sampler.py (#842)

* [Refactor] Main code modification for coordinate system refactor (#677)

* [Enhance] Add script for data update (#774)

* Fixed wrong config paths and fixed a bug in test

* Fixed metafile

* Coord sys refactor (main code)

* Update test_waymo_dataset.py

* Manually resolve conflict

* Removed unused lines and fixed imports

* remove coord2box and box2coord

* update dir_limit_offset

* Some minor improvements

* Removed some \s in comments

* Revert a change

* Change Box3DMode to Coord3DMode where points are converted

* Fix points_in_bbox function

* Fix Imvoxelnet config

* Revert adding a line

* Fix rotation bug when batch size is 0

* Keep sign of dir_scores as before

* Fix several comments

* Add a comment

* Fix docstring

* Add data update scripts

* Fix comments

* fix import

* refactor iou_neg_piecewise_sampler.py

* add docstring

* modify docstring
Co-authored-by: Yezhen Cong <52420115+THU17cyz@users.noreply.github.com>
Co-authored-by: THU17cyz <congyezhen71@hotmail.com>

* [Feature] Add roipooling cuda ops (#843)

* [Refactor] Main code modification for coordinate system refactor (#677)

* [Enhance] Add script for data update (#774)

* Fixed wrong config paths and fixed a bug in test

* Fixed metafile

* Coord sys refactor (main code)

* Update test_waymo_dataset.py

* Manually resolve conflict

* Removed unused lines and fixed imports

* remove coord2box and box2coord

* update dir_limit_offset

* Some minor improvements

* Removed some \s in comments

* Revert a change

* Change Box3DMode to Coord3DMode where points are converted

* Fix points_in_bbox function

* Fix Imvoxelnet config

* Revert adding a line

* Fix rotation bug when batch size is 0

* Keep sign of dir_scores as before

* Fix several comments

* Add a comment

* Fix docstring

* Add data update scripts

* Fix comments

* fix import

* add roipooling cuda ops

* add roi extractor

* add test_roi_extractor unittest

* Modify setup.py to install roipooling ops

* modify docstring

* remove enlarge bbox in roipoint pooling

* add_roipooling_ops

* modify docstring
Co-authored-by: Yezhen Cong <52420115+THU17cyz@users.noreply.github.com>
Co-authored-by: THU17cyz <congyezhen71@hotmail.com>

* [Refactor] Refactor code structure and docstrings (#803)

* refactor points_in_boxes

* Merge same functions of three boxes

* More docstring fixes and unify x/y/z size

* Add "optional" and fix "Default"

* Add "optional" and fix "Default"

* Add "optional" and fix "Default"

* Add "optional" and fix "Default"

* Add "optional" and fix "Default"

* Remove None in function param type

* Fix unittest

* Add comments for NMS functions

* Merge methods of Points

* Add unittest

* Add optional and default value

* Fix box conversion and add unittest

* Fix comments

* Add unit test

* Indent

* Fix CI

* Remove useless \\

* Remove useless \\

* Remove useless \\

* Remove useless \\

* Remove useless \\

* Add unit test for box bev

* More unit tests and refine docstrings in box_np_ops

* Fix comment

* Add deprecation warning

* [Feature] PointXYZWHLRBBoxCoder (#856)

* support PointBasedBoxCoder

* fix unittest bug

* support unittest in gpu

* support unittest in gpu

* modified docstring

* add args

* add args

* [Enhance] Change Groupfree3D config (#855)

* All mods

* PointSample

* PointSample

* [Doc] Add tutorials/data_pipeline Chinese version (#827)

* [Doc] Add tutorials/data_pipeline Chinese version

* refine doc

* Use the absolute link

* Use the absolute link
Co-authored-by: Tai-Wang <tab_wang@outlook.com>

* [Doc] Add Chinese doc for `scannet_det.md` (#836)

* Part

* Complete

* Fix comments

* Fix comments

* [Doc] Add Chinese doc for `waymo_det.md` (#859)

* Add complete translation

* Refinements

* Fix comments

* Fix a minor typo
Co-authored-by: Tai-Wang <tab_wang@outlook.com>

* Remove 2D annotations on Lyft (#867)

* Add header for files (#869)

* Add header for files

* Add header for files

* Add header for files

* Add header for files

* [fix] fix typos (#872)

* Fix 3 unworking configs (#882)

* [Fix] Fix `index.rst` for Chinese docs (#873)

* Fix index.rst for zh docs

* Change switch language

* [Fix] Centerpoint head nested list transpose  (#879)

* FIX Transpose nested lists without Numpy

* Removed unused Numpy import

* [Enhance] Update PointFusion (#791)

* update point fusion

* remove LIDAR hardcode

* move get_proj_mat_by_coord_type to utils

* fix lint

* remove todo

* fix lint

* [Doc] Add nuscenes_det.md Chinese version (#854)

* add nus chinese doc

* add nuScenes Chinese doc

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo

* [Fix] Fix RegNet pretrained weight loading (#889)

* Fix regnet pretrained weight loading

* Remove unused file

* Fix centerpoint tta (#892)

* [Enhance] Add benchmark regression script (#808)

* Initial commit

* [Feature] Support DGCNN (v1.0.0.dev0) (#896)

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* fix typo

* fix typo

* fix typo

* del gf&fa registry (wo reuse pointnet module)

* fix typo

* add benchmark and add copyright header (for DGCNN only)

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo

* support dgcnn

* Change cam rot_3d_in_axis (#906)

* [Doc] Add coord sys tutorial pic and change links to dev branch (#912)

* Modify link branch and add pic

* Fix pic

* [Feature] add kitti AP40 evaluation metric (v1.0.0.dev0) (#927)

* Add citation (#901)

* [Feature] Add python3.9 in CI (#900)

* Add python3.9 in CI

* Add python3.9 in CI

* Bump to v0.17.0 (#898)

* Update README.md

* Update README_zh-CN.md

* Update version.py

* Update getting_started.md

* Update getting_started.md

* Update changelog.md

* Remove "recent" in the news

* Remove "recent" in the news

* Fix comments

* [Docs] Fix the version of sphinx (#902)

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* add AP40

* add unit test

* add unit test

* separate AP11 and AP40

* fix some typos
Co-authored-by: dingchang <hudingchang.vendor@sensetime.com>
Co-authored-by: Tai-Wang <tab_wang@outlook.com>

* [Feature] add smoke backbone neck (#939)

* add smoke detector and its backbone and neck

* typo fix

* fix typo

* add docstring

* fix typo

* fix comments

* fix comments

* fix comments

* fix typo

* fix typo

* fix

* fix typo

* fix docstring

* refine feature

* fix typo

* use Basemodule in Neck

* [Refactor] Refactor the transformation from image to camera coordinates (#938)

* Refactor points_img2cam

* Refine docstring

* Support array converter and add unit tests

* [Feature] FCOS3D BBox Coder (#940)

* FCOS3D BBox Coder

* Add unit tests

* Change the value from long to float/double

* Rename bbox_out as bbox

* Add comments to forward returns

* [Feature] PGD BBox Coder (#948)

* Support PGD BBox Coder

* Refine docstring

* [Feature] Support Uncertain L1 Loss (#950)

* Add uncertain l1 loss and its unit tests

* Remove mmcv.jit and refine docstrings

* [Fix] Fix visualization in KITTI dataset (#956)

* fix bug to support kitti vis

* fix

* add smoke head

* fix typo

* fix code

* fix

* fix typo

* add detector

* delete detector

* fix test_heads

* add coder test

* fix head

* fix bugs in smoke head

* refine kitti_mono3d data config

* remove cuda is available

* fix docstring

* fix unit test

* fix typo

* refine

* add affine aug

* fix rebase typo

* fix docs

* change cam_intrinsics to cam2imgs

* fix typos

* fix lint

* fix bugs

* fix last typos

* fix typos

* add docstrings for trans_mats
Co-authored-by: Yezhen Cong <52420115+THU17cyz@users.noreply.github.com>
Co-authored-by: Xi Liu <75658786+xiliu8006@users.noreply.github.com>
Co-authored-by: THU17cyz <congyezhen71@hotmail.com>
Co-authored-by: Wenhao Wu <79644370+wHao-Wu@users.noreply.github.com>
Co-authored-by: Tai-Wang <tab_wang@outlook.com>
Co-authored-by: dingchang <hudingchang.vendor@sensetime.com>
Co-authored-by: 谢恩泽 <Johnny_ez@163.com>
Co-authored-by: Robin Karlsson <34254153+robin-karlsson0@users.noreply.github.com>
Co-authored-by: Danila Rukhovich <danrukh@gmail.com>
parent e93a77f0
@@ -8,9 +8,10 @@ from .groupfree3d_bbox_coder import GroupFree3DBBoxCoder
 from .partial_bin_based_bbox_coder import PartialBinBasedBBoxCoder
 from .pgd_bbox_coder import PGDBBoxCoder
 from .point_xyzwhlr_bbox_coder import PointXYZWHLRBBoxCoder
+from .smoke_bbox_coder import SMOKECoder

 __all__ = [
     'build_bbox_coder', 'DeltaXYZWLHRBBoxCoder', 'PartialBinBasedBBoxCoder',
     'CenterPointBBoxCoder', 'AnchorFreeBBoxCoder', 'GroupFree3DBBoxCoder',
-    'PointXYZWHLRBBoxCoder', 'FCOS3DBBoxCoder', 'PGDBBoxCoder'
+    'PointXYZWHLRBBoxCoder', 'FCOS3DBBoxCoder', 'PGDBBoxCoder', 'SMOKECoder'
 ]
import numpy as np
import torch
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
@BBOX_CODERS.register_module()
class SMOKECoder(BaseBBoxCoder):
"""Bbox Coder for SMOKE.
Args:
        base_depth (tuple[float]): Depth references for decoding box depth.
        base_dims (tuple[tuple[float]]): Dimension references [l, h, w]
            for decoding box dimensions, one per category.
        code_size (int): The dimension of boxes to be encoded.
"""
def __init__(self, base_depth, base_dims, code_size):
super(SMOKECoder, self).__init__()
self.base_depth = base_depth
self.base_dims = base_dims
self.bbox_code_size = code_size
def encode(self, locations, dimensions, orientations, input_metas):
"""Encode CameraInstance3DBoxes by locations, dimemsions, orientations.
Args:
locations (Tensor): Center location for 3D boxes.
(N, 3)
dimensions (Tensor): Dimensions for 3D boxes.
shape (N, 3)
orientations (Tensor): Orientations for 3D boxes.
shape (N, 1)
input_metas (list[dict]): Meta information of each image, e.g.,
image size, scaling factor, etc.
        Returns:
:obj:`CameraInstance3DBoxes`: 3D bboxes of batch images,
shape (N, bbox_code_size).
"""
bboxes = torch.cat((locations, dimensions, orientations), dim=1)
        assert bboxes.shape[1] == self.bbox_code_size, \
            'bboxes shape does not match the bbox_code_size.'
batch_bboxes = input_metas[0]['box_type_3d'](
bboxes, box_dim=self.bbox_code_size)
return batch_bboxes
def decode(self,
reg,
points,
labels,
cam2imgs,
trans_mats,
locations=None):
"""Decode regression into locations, dimemsions, orientations.
Args:
            reg (Tensor): Batch regression for each predicted center2d point.
                shape: (batch * K (max_objs), C)
            points (Tensor): Batch projected bbox centers on the image plane.
                shape: (batch * K (max_objs), 2)
            labels (Tensor): Batch predicted class label for each predicted
                center2d point.
                shape: (batch, K (max_objs))
            cam2imgs (Tensor): Batch images' camera intrinsic matrices.
                shape: (batch, 4, 4)
            trans_mats (Tensor): Transformation matrix from original image
                to feature map.
                shape: (batch, 3, 3)
            locations (None | Tensor): If None, the network output is
                decoded for inference; otherwise the given ground truth 3D
                bbox locations are used during training.
                shape: (batch * K (max_objs), 3)
        Returns:
            tuple(Tensor): The tuple has components below:
            - locations (Tensor): Centers of 3D boxes.
                shape: (batch * K (max_objs), 3)
            - dimensions (Tensor): Dimensions of 3D boxes.
                shape: (batch * K (max_objs), 3)
            - orientations (Tensor): Orientations of 3D boxes.
                shape: (batch * K (max_objs), 1)
        """
"""
depth_offsets = reg[:, 0]
centers2d_offsets = reg[:, 1:3]
dimensions_offsets = reg[:, 3:6]
orientations = reg[:, 6:8]
depths = self._decode_depth(depth_offsets)
        # get the 3D bounding box center locations
pred_locations = self._decode_location(points, centers2d_offsets,
depths, cam2imgs, trans_mats)
pred_dimensions = self._decode_dimension(labels, dimensions_offsets)
if locations is None:
pred_orientations = self._decode_orientation(
orientations, pred_locations)
else:
pred_orientations = self._decode_orientation(
orientations, locations)
return pred_locations, pred_dimensions, pred_orientations
def _decode_depth(self, depth_offsets):
"""Transform depth offset to depth."""
base_depth = depth_offsets.new_tensor(self.base_depth)
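        # Recover metric depth from the normalized offset:
        # z = mu + offset * sigma, with (mu, sigma) = self.base_depth.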
depths = depth_offsets * base_depth[1] + base_depth[0]
return depths
def _decode_location(self, points, centers2d_offsets, depths, cam2imgs,
trans_mats):
"""Retrieve objects location in camera coordinate based on projected
points.
Args:
points (Tensor): Projected points on feature map in (x, y)
shape: (batch * K, 2)
centers2d_offset (Tensor): Project points offset in
(delta_x, delta_y). shape: (batch * K, 2)
depths (Tensor): Object depth z.
shape: (batch * K)
cam2imgs (Tensor): Batch camera intrinsics matrix.
shape: (batch, 4, 4)
trans_mats (Tensor): transformation matrix from original image
to feature map.
shape: (batch, 3, 3)
"""
# number of points
N = centers2d_offsets.shape[0]
# batch_size
N_batch = cam2imgs.shape[0]
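        # Map each of the batch * K points to its image index so that the
        # matching inverse intrinsic / affine matrices can be gathered.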
batch_id = torch.arange(N_batch).unsqueeze(1)
obj_id = batch_id.repeat(1, N // N_batch).flatten()
trans_mats_inv = trans_mats.inverse()[obj_id]
cam2imgs_inv = cam2imgs.inverse()[obj_id]
centers2d = points + centers2d_offsets
centers2d_extend = torch.cat((centers2d, centers2d.new_ones(N, 1)),
dim=1)
        # expand projected points to [N, 3, 1]
        centers2d_extend = centers2d_extend.unsqueeze(-1)
        # transform projected points back onto the original image
        centers2d_img = torch.matmul(trans_mats_inv, centers2d_extend)
centers2d_img = centers2d_img * depths.view(N, -1, 1)
centers2d_img_extend = torch.cat(
(centers2d_img, centers2d.new_ones(N, 1, 1)), dim=1)
locations = torch.matmul(cam2imgs_inv, centers2d_img_extend).squeeze(2)
return locations[:, :3]
def _decode_dimension(self, labels, dims_offset):
"""Transform dimension offsets to dimension according to its category.
Args:
labels (Tensor): Each points' category id.
shape: (N, K)
dims_offset (Tensor): Dimension offsets.
shape: (N, 3)
"""
labels = labels.flatten().long()
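        # Scale the per-class reference size by exp(offset); the exponential
        # keeps the decoded dimensions strictly positive.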
base_dims = dims_offset.new_tensor(self.base_dims)
dims_select = base_dims[labels, :]
dimensions = dims_offset.exp() * dims_select
return dimensions
def _decode_orientation(self, ori_vector, locations):
"""Retrieve object orientation.
Args:
ori_vector (Tensor): Local orientation in [sin, cos] format.
shape: (N, 2)
locations (Tensor): Object location.
shape: (N, 3)
        Returns:
            Tensor: yaw (orientation). Note that the yaw's
                range is [-np.pi, np.pi].
                shape: (N, 1)
"""
assert len(ori_vector) == len(locations)
locations = locations.view(-1, 3)
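        # rays: azimuth of the viewing ray to each object center;
        # alphas: predicted local observation angle from the (sin, cos) pair.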
rays = torch.atan(locations[:, 0] / (locations[:, 2] + 1e-7))
alphas = torch.atan(ori_vector[:, 0] / (ori_vector[:, 1] + 1e-7))
        # get indices where the cosine value is positive / negative.
cos_pos_inds = (ori_vector[:, 1] >= 0).nonzero()
cos_neg_inds = (ori_vector[:, 1] < 0).nonzero()
alphas[cos_pos_inds] -= np.pi / 2
alphas[cos_neg_inds] += np.pi / 2
        # retrieve the object's rotation-y (yaw) angle.
yaws = alphas + rays
larger_inds = (yaws > np.pi).nonzero()
small_inds = (yaws < -np.pi).nonzero()
if len(larger_inds) != 0:
yaws[larger_inds] -= 2 * np.pi
if len(small_inds) != 0:
yaws[small_inds] += 2 * np.pi
yaws = yaws.unsqueeze(-1)
return yaws
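A minimal usage sketch of SMOKECoder (mirroring the unit test at the end of this commit; the random tensors are placeholders for real network outputs, not meaningful data):

import torch

from mmdet3d.core.bbox import CameraInstance3DBoxes
from mmdet.core import build_bbox_coder

# build the coder from a config dict, as done in the unit test below
bbox_coder = build_bbox_coder(
    dict(
        type='SMOKECoder',
        base_depth=(28.01, 16.32),
        base_dims=((0.88, 1.73, 0.67), (1.78, 1.70, 0.58),
                   (3.88, 1.63, 1.53)),
        code_size=7))
regression = torch.rand([200, 8])   # (batch * K, C) regression output
points = torch.rand([200, 2])       # projected centers on the feature map
labels = torch.ones([2, 100])       # (batch, K) predicted class labels
cam2imgs = torch.rand([2, 4, 4])    # camera intrinsic matrices
trans_mats = torch.rand([2, 3, 3])  # image-to-feature-map transforms
locations, dimensions, orientations = bbox_coder.decode(
    regression, points, labels, cam2imgs, trans_mats)
img_metas = [dict(box_type_3d=CameraInstance3DBoxes) for _ in range(2)]
bboxes = bbox_coder.encode(locations, dimensions, orientations, img_metas)
assert bboxes.tensor.shape == torch.Size([200, 7])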
@@ -9,6 +9,7 @@ from .free_anchor3d_head import FreeAnchor3DHead
 from .groupfree3d_head import GroupFree3DHead
 from .parta2_rpn_head import PartA2RPNHead
 from .shape_aware_head import ShapeAwareHead
+from .smoke_mono3d_head import SMOKEMono3DHead
 from .ssd_3d_head import SSD3DHead
 from .vote_head import VoteHead
@@ -16,5 +17,5 @@ __all__ = [
     'Anchor3DHead', 'FreeAnchor3DHead', 'PartA2RPNHead', 'VoteHead',
     'SSD3DHead', 'BaseConvBboxHead', 'CenterHead', 'ShapeAwareHead',
     'BaseMono3DDenseHead', 'AnchorFreeMono3DHead', 'FCOSMono3DHead',
-    'GroupFree3DHead'
+    'GroupFree3DHead', 'SMOKEMono3DHead'
 ]
import torch
from torch.nn import functional as F
from mmdet.core import multi_apply
from mmdet.core.bbox.builder import build_bbox_coder
from mmdet.models.builder import HEADS
from mmdet.models.utils import gaussian_radius, gen_gaussian_target
from mmdet.models.utils.gaussian_target import (get_local_maximum,
get_topk_from_heatmap,
transpose_and_gather_feat)
from .anchor_free_mono3d_head import AnchorFreeMono3DHead
@HEADS.register_module()
class SMOKEMono3DHead(AnchorFreeMono3DHead):
r"""Anchor-free head used in `SMOKE <https://arxiv.org/abs/2002.10111>`_
.. code-block:: none
/-----> 3*3 conv -----> 1*1 conv -----> cls
feature
\-----> 3*3 conv -----> 1*1 conv -----> reg
Args:
num_classes (int): Number of categories excluding the background
category.
in_channels (int): Number of channels in the input feature map.
        dim_channel (list[int]): Indices of dimension offset predictions
            in the regression heatmap channels.
        ori_channel (list[int]): Indices of orientation offset predictions
            in the regression heatmap channels.
        bbox_coder (dict): Config of the bbox coder for encoding and
            decoding boxes.
        loss_cls (dict, optional): Config of classification loss.
            Default: loss_cls=dict(type='GaussianFocalLoss', loss_weight=1.0).
        loss_bbox (dict, optional): Config of localization loss.
            Default: loss_bbox=dict(type='L1Loss', loss_weight=0.1).
        loss_dir (dict, optional): Config of direction classification loss.
            In SMOKE, Default: None.
        loss_attr (dict, optional): Config of attribute classification loss.
            In SMOKE, Default: None.
        norm_cfg (dict): Dictionary to construct and config norm layer.
            Default: norm_cfg=dict(type='GN', num_groups=32,
            requires_grad=True).
        init_cfg (dict): Initialization config dict. Default: None.
""" # noqa: E501
def __init__(self,
num_classes,
in_channels,
dim_channel,
ori_channel,
bbox_coder,
                 loss_cls=dict(type='GaussianFocalLoss', loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=0.1),
loss_dir=None,
loss_attr=None,
norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
init_cfg=None,
**kwargs):
super().__init__(
num_classes,
in_channels,
loss_cls=loss_cls,
loss_bbox=loss_bbox,
loss_dir=loss_dir,
loss_attr=loss_attr,
norm_cfg=norm_cfg,
init_cfg=init_cfg,
**kwargs)
self.dim_channel = dim_channel
self.ori_channel = ori_channel
self.bbox_coder = build_bbox_coder(bbox_coder)
def forward(self, feats):
"""Forward features from the upstream network.
Args:
feats (tuple[Tensor]): Features from the upstream network, each is
a 4D-tensor.
Returns:
tuple:
cls_scores (list[Tensor]): Box scores for each scale level,
each is a 4D-tensor, the channel number is
num_points * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for each scale
level, each is a 4D-tensor, the channel number is
num_points * bbox_code_size.
"""
return multi_apply(self.forward_single, feats)
def forward_single(self, x):
"""Forward features of a single scale level.
Args:
x (Tensor): Input feature map.
Returns:
tuple: Scores for each class, bbox of input feature maps.
"""
cls_score, bbox_pred, dir_cls_pred, attr_pred, cls_feat, reg_feat = \
super().forward_single(x)
        cls_score = cls_score.sigmoid()  # map raw scores into (0, 1)
cls_score = cls_score.clamp(min=1e-4, max=1 - 1e-4)
# (N, C, H, W)
offset_dims = bbox_pred[:, self.dim_channel, ...]
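        # squash dimension offsets into (-0.5, 0.5) with a shifted sigmoid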
bbox_pred[:, self.dim_channel, ...] = offset_dims.sigmoid() - 0.5
# (N, C, H, W)
vector_ori = bbox_pred[:, self.ori_channel, ...]
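        # L2-normalize the (sin, cos) pair so it lies on the unit circle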
bbox_pred[:, self.ori_channel, ...] = F.normalize(vector_ori)
return cls_score, bbox_pred
def get_bboxes(self, cls_scores, bbox_preds, img_metas, rescale=None):
"""Generate bboxes from bbox head predictions.
Args:
cls_scores (list[Tensor]): Box scores for each scale level.
bbox_preds (list[Tensor]): Box regression for each scale.
img_metas (list[dict]): Meta information of each image, e.g.,
image size, scaling factor, etc.
rescale (bool): If True, return boxes in original image space.
Returns:
list[tuple[:obj:`CameraInstance3DBoxes`, Tensor, Tensor, None]]:
                Each item in result_list is a 4-tuple of
                (bboxes, scores, labels, attrs).
"""
assert len(cls_scores) == len(bbox_preds) == 1
cam2imgs = torch.stack([
cls_scores[0].new_tensor(img_meta['cam_intrinsic'])
for img_meta in img_metas
])
trans_mats = torch.stack([
cls_scores[0].new_tensor(img_meta['trans_mat'])
for img_meta in img_metas
])
batch_bboxes, batch_scores, batch_topk_labels = self.decode_heatmap(
cls_scores[0],
bbox_preds[0],
img_metas,
cam2imgs=cam2imgs,
trans_mats=trans_mats,
topk=100,
kernel=3)
result_list = []
for img_id in range(len(img_metas)):
bboxes = batch_bboxes[img_id]
scores = batch_scores[img_id]
labels = batch_topk_labels[img_id]
keep_idx = scores > 0.25
bboxes = bboxes[keep_idx]
scores = scores[keep_idx]
labels = labels[keep_idx]
bboxes = img_metas[img_id]['box_type_3d'](
bboxes, box_dim=self.bbox_code_size, origin=(0.5, 0.5, 0.5))
attrs = None
result_list.append((bboxes, scores, labels, attrs))
return result_list
def decode_heatmap(self,
cls_score,
reg_pred,
img_metas,
cam2imgs,
trans_mats,
topk=100,
kernel=3):
"""Transform outputs into detections raw bbox predictions.
Args:
class_score (Tensor): Center predict heatmap,
shape (B, num_classes, H, W).
reg_pred (Tensor): Box regression map.
shape (B, channel, H , W).
img_metas (List[dict]): Meta information of each image, e.g.,
image size, scaling factor, etc.
cam2imgs (Tensor): Camera intrinsic matrixs.
shape (B, 4, 4)
trans_mats (Tensor): Transformation matrix from original image
to feature map.
shape: (batch, 3, 3)
topk (int): Get top k center keypoints from heatmap. Default 100.
kernel (int): Max pooling kernel for extract local maximum pixels.
Default 3.
Returns:
            tuple[torch.Tensor]: Decoded output of SMOKEMono3DHead, containing
the following Tensors:
- batch_bboxes (Tensor): Coords of each 3D box.
shape (B, k, 7)
- batch_scores (Tensor): Scores of each 3D box.
shape (B, k)
- batch_topk_labels (Tensor): Categories of each 3D box.
shape (B, k)
"""
img_h, img_w = img_metas[0]['pad_shape'][:2]
bs, _, feat_h, feat_w = cls_score.shape
center_heatmap_pred = get_local_maximum(cls_score, kernel=kernel)
*batch_dets, topk_ys, topk_xs = get_topk_from_heatmap(
center_heatmap_pred, k=topk)
batch_scores, batch_index, batch_topk_labels = batch_dets
regression = transpose_and_gather_feat(reg_pred, batch_index)
regression = regression.view(-1, 8)
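        # The 8 regression channels are: depth offset (1), center2d offset
        # (2), dimension offsets (3) and orientation (sin, cos) (2).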
points = torch.cat([topk_xs.view(-1, 1),
topk_ys.view(-1, 1).float()],
dim=1)
locations, dimensions, orientations = self.bbox_coder.decode(
regression, points, batch_topk_labels, cam2imgs, trans_mats)
batch_bboxes = torch.cat((locations, dimensions, orientations), dim=1)
batch_bboxes = batch_bboxes.view(bs, -1, self.bbox_code_size)
return batch_bboxes, batch_scores, batch_topk_labels
def get_predictions(self, labels3d, centers2d, gt_locations, gt_dimensions,
gt_orientations, indexs, img_metas, pred_reg):
"""Prepare predictions for computing loss.
Args:
            labels3d (Tensor): Labels of each 3D box.
                shape (B, max_objs, )
            centers2d (Tensor): Coords of each projected 3D box
                center on image. shape (B * max_objs, 2)
            gt_locations (Tensor): Coords of each 3D box's location.
                shape (B * max_objs, 3)
            gt_dimensions (Tensor): Dimensions of each 3D box.
                shape (N, 3)
            gt_orientations (Tensor): Orientation (yaw) of each 3D box.
                shape (N, 1)
            indexs (Tensor): Indices indicating the existence of each
                3D box. shape (B * max_objs, )
            img_metas (list[dict]): Meta information of each image,
                e.g., image size, scaling factor, etc.
            pred_reg (Tensor): Box regression map.
                shape (B, channel, H, W).
        Returns:
            dict: The dict has components below:
            - bbox3d_yaws (:obj:`CameraInstance3DBoxes`):
                bbox calculated using pred orientations.
            - bbox3d_dims (:obj:`CameraInstance3DBoxes`):
                bbox calculated using pred dimensions.
            - bbox3d_locs (:obj:`CameraInstance3DBoxes`):
                bbox calculated using pred locations.
        """
batch, channel = pred_reg.shape[0], pred_reg.shape[1]
w = pred_reg.shape[3]
cam2imgs = torch.stack([
gt_locations.new_tensor(img_meta['cam_intrinsic'])
for img_meta in img_metas
])
trans_mats = torch.stack([
gt_locations.new_tensor(img_meta['trans_mat'])
for img_meta in img_metas
])
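        # Convert each (x, y) center into a flat index over the feature map
        # so the corresponding regression vectors can be gathered.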
centers2d_inds = centers2d[:, 1] * w + centers2d[:, 0]
centers2d_inds = centers2d_inds.view(batch, -1)
pred_regression = transpose_and_gather_feat(pred_reg, centers2d_inds)
pred_regression_pois = pred_regression.view(-1, channel)
locations, dimensions, orientations = self.bbox_coder.decode(
pred_regression_pois, centers2d, labels3d, cam2imgs, trans_mats,
gt_locations)
locations, dimensions, orientations = locations[indexs], dimensions[
indexs], orientations[indexs]
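        # Shift the decoded center down by half the box height: the coder
        # predicts the geometric center, while CameraInstance3DBoxes uses a
        # bottom-center origin by default.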
locations[:, 1] += dimensions[:, 1] / 2
gt_locations = gt_locations[indexs]
assert len(locations) == len(gt_locations)
assert len(dimensions) == len(gt_dimensions)
assert len(orientations) == len(gt_orientations)
bbox3d_yaws = self.bbox_coder.encode(gt_locations, gt_dimensions,
orientations, img_metas)
bbox3d_dims = self.bbox_coder.encode(gt_locations, dimensions,
gt_orientations, img_metas)
bbox3d_locs = self.bbox_coder.encode(locations, gt_dimensions,
gt_orientations, img_metas)
pred_bboxes = dict(ori=bbox3d_yaws, dim=bbox3d_dims, loc=bbox3d_locs)
return pred_bboxes
def get_targets(self, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d,
centers2d, feat_shape, img_shape, img_metas):
"""Get training targets for batch images.
Args:
gt_bboxes (list[Tensor]): Ground truth bboxes of each image,
shape (num_gt, 4).
gt_labels (list[Tensor]): Ground truth labels of each box,
shape (num_gt,).
gt_bboxes_3d (list[:obj:`CameraInstance3DBoxes`]): 3D Ground
truth bboxes of each image,
shape (num_gt, bbox_code_size).
gt_labels_3d (list[Tensor]): 3D Ground truth labels of each
box, shape (num_gt,).
centers2d (list[Tensor]): Projected 3D centers onto 2D image,
shape (num_gt, 2).
feat_shape (tuple[int]): Feature map shape with value,
shape (B, _, H, W).
img_shape (tuple[int]): Image shape in [h, w] format.
img_metas (list[dict]): Meta information of each image, e.g.,
image size, scaling factor, etc.
        Returns:
            tuple[Tensor, dict]: The Tensor value is the targets of
                center heatmap, the dict has components below:
            - gt_centers2d (Tensor): Coords of each projected 3D box
                center on image. shape (B * max_objs, 2)
            - gt_labels3d (Tensor): Labels of each 3D box.
                shape (B, max_objs, )
            - indexs (Tensor): Indices indicating the existence of each
                3D box. shape (B * max_objs, )
            - reg_indexs (Tensor): Indices of boxes from images without
                affine augmentation. shape (B * max_objs, )
            - gt_locs (Tensor): Coords of each 3D box's location.
                shape (N, 3)
            - gt_dims (Tensor): Dimensions of each 3D box.
                shape (N, 3)
            - gt_yaws (Tensor): Orientation (yaw) of each 3D box.
                shape (N, 1)
            - gt_cors (Tensor): Coords of the corners of each 3D box.
                shape (N, 8, 3)
        """
reg_mask = torch.stack([
gt_bboxes[0].new_tensor(
not img_meta['affine_aug'], dtype=torch.bool)
for img_meta in img_metas
])
img_h, img_w = img_shape[:2]
bs, _, feat_h, feat_w = feat_shape
width_ratio = float(feat_w / img_w) # 1/4
height_ratio = float(feat_h / img_h) # 1/4
assert width_ratio == height_ratio
center_heatmap_target = gt_bboxes[-1].new_zeros(
[bs, self.num_classes, feat_h, feat_w])
gt_centers2d = centers2d.copy()
for batch_id in range(bs):
gt_bbox = gt_bboxes[batch_id]
gt_label = gt_labels[batch_id]
# project centers2d from input image to feat map
gt_center2d = gt_centers2d[batch_id] * width_ratio
for j, center in enumerate(gt_center2d):
center_x_int, center_y_int = center.int()
scale_box_h = (gt_bbox[j][3] - gt_bbox[j][1]) * height_ratio
scale_box_w = (gt_bbox[j][2] - gt_bbox[j][0]) * width_ratio
radius = gaussian_radius([scale_box_h, scale_box_w],
min_overlap=0.7)
radius = max(0, int(radius))
ind = gt_label[j]
gen_gaussian_target(center_heatmap_target[batch_id, ind],
[center_x_int, center_y_int], radius)
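        # Normalize the classification loss by the number of heatmap peaks,
        # i.e. the number of annotated centers.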
avg_factor = max(1, center_heatmap_target.eq(1).sum())
num_ctrs = [center2d.shape[0] for center2d in centers2d]
max_objs = max(num_ctrs)
reg_inds = torch.cat(
[reg_mask[i].repeat(num_ctrs[i]) for i in range(bs)])
inds = torch.zeros((bs, max_objs),
dtype=torch.bool).to(centers2d[0].device)
        # move gt 3D bboxes onto the same device as centers2d
gt_bboxes_3d = [
gt_bbox_3d.to(centers2d[0].device) for gt_bbox_3d in gt_bboxes_3d
]
batch_centers2d = centers2d[0].new_zeros((bs, max_objs, 2))
batch_labels_3d = gt_labels_3d[0].new_zeros((bs, max_objs))
batch_gt_locations = \
gt_bboxes_3d[0].tensor.new_zeros((bs, max_objs, 3))
for i in range(bs):
inds[i, :num_ctrs[i]] = 1
batch_centers2d[i, :num_ctrs[i]] = centers2d[i]
batch_labels_3d[i, :num_ctrs[i]] = gt_labels_3d[i]
batch_gt_locations[i, :num_ctrs[i]] = \
gt_bboxes_3d[i].tensor[:, :3]
inds = inds.flatten()
batch_centers2d = batch_centers2d.view(-1, 2) * width_ratio
batch_gt_locations = batch_gt_locations.view(-1, 3)
        # filter out empty images that have no gt_bboxes_3d
gt_bboxes_3d = [
gt_bbox_3d for gt_bbox_3d in gt_bboxes_3d
if gt_bbox_3d.tensor.shape[0] > 0
]
gt_dimensions = torch.cat(
[gt_bbox_3d.tensor[:, 3:6] for gt_bbox_3d in gt_bboxes_3d])
gt_orientations = torch.cat([
gt_bbox_3d.tensor[:, 6].unsqueeze(-1)
for gt_bbox_3d in gt_bboxes_3d
])
gt_corners = torch.cat(
[gt_bbox_3d.corners for gt_bbox_3d in gt_bboxes_3d])
target_labels = dict(
gt_centers2d=batch_centers2d.long(),
gt_labels3d=batch_labels_3d,
indexs=inds,
reg_indexs=reg_inds,
gt_locs=batch_gt_locations,
gt_dims=gt_dimensions,
gt_yaws=gt_orientations,
gt_cors=gt_corners)
return center_heatmap_target, avg_factor, target_labels
def loss(self,
cls_scores,
bbox_preds,
gt_bboxes,
gt_labels,
gt_bboxes_3d,
gt_labels_3d,
centers2d,
depths,
attr_labels,
img_metas,
gt_bboxes_ignore=None):
"""Compute loss of the head.
        Args:
            cls_scores (list[Tensor]): Box scores for each scale level.
                shape (B, num_classes, H, W).
            bbox_preds (list[Tensor]): Box regression maps, each a
                4D-tensor whose channel number is the regression dim.
                shape (B, channel, H, W).
            gt_bboxes (list[Tensor]): Ground truth bboxes for each image.
                shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
            gt_labels (list[Tensor]): Class indices corresponding to each box.
                shape (num_gts, ).
            gt_bboxes_3d (list[:obj:`CameraInstance3DBoxes`]): Ground truth
                3D boxes for each image.
            gt_labels_3d (list[Tensor]): Same as gt_labels.
            centers2d (list[Tensor]): 2D centers on the image.
                shape (num_gts, 2).
            depths (list[Tensor]): Depth ground truth.
                shape (num_gts, ).
            attr_labels (list[Tensor]): Attribute indices of each box.
                In KITTI it is None.
            img_metas (list[dict]): Meta information of each image, e.g.,
                image size, scaling factor, etc.
            gt_bboxes_ignore (None | list[Tensor]): Specify which bounding
                boxes can be ignored when computing the loss.
                Default: None.
Returns:
dict[str, Tensor]: A dictionary of loss components.
"""
assert len(cls_scores) == len(bbox_preds) == 1
assert attr_labels is None
assert gt_bboxes_ignore is None
center2d_heatmap = cls_scores[0]
pred_reg = bbox_preds[0]
center2d_heatmap_target, avg_factor, target_labels = \
self.get_targets(gt_bboxes, gt_labels, gt_bboxes_3d,
gt_labels_3d, centers2d,
center2d_heatmap.shape,
img_metas[0]['pad_shape'],
img_metas)
pred_bboxes = self.get_predictions(
labels3d=target_labels['gt_labels3d'],
centers2d=target_labels['gt_centers2d'],
gt_locations=target_labels['gt_locs'],
gt_dimensions=target_labels['gt_dims'],
gt_orientations=target_labels['gt_yaws'],
indexs=target_labels['indexs'],
img_metas=img_metas,
pred_reg=pred_reg)
loss_cls = self.loss_cls(
center2d_heatmap, center2d_heatmap_target, avg_factor=avg_factor)
reg_inds = target_labels['reg_indexs']
loss_bbox_oris = self.loss_bbox(
pred_bboxes['ori'].corners[reg_inds, ...],
target_labels['gt_cors'][reg_inds, ...])
loss_bbox_dims = self.loss_bbox(
pred_bboxes['dim'].corners[reg_inds, ...],
target_labels['gt_cors'][reg_inds, ...])
loss_bbox_locs = self.loss_bbox(
pred_bboxes['loc'].corners[reg_inds, ...],
target_labels['gt_cors'][reg_inds, ...])
loss_bbox = loss_bbox_dims + loss_bbox_locs + loss_bbox_oris
loss_dict = dict(loss_cls=loss_cls, loss_bbox=loss_bbox)
return loss_dict
@@ -358,6 +358,103 @@ def test_vote_head():
     assert results[0][2].shape[0] >= 0
def test_smoke_mono3d_head():
head_cfg = dict(
type='SMOKEMono3DHead',
num_classes=3,
in_channels=64,
dim_channel=[3, 4, 5],
ori_channel=[6, 7],
stacked_convs=0,
feat_channels=64,
use_direction_classifier=False,
diff_rad_by_sin=False,
pred_attrs=False,
pred_velo=False,
dir_offset=0,
strides=None,
group_reg_dims=(8, ),
cls_branch=(256, ),
reg_branch=((256, ), ),
num_attrs=0,
bbox_code_size=7,
dir_branch=(),
attr_branch=(),
bbox_coder=dict(
type='SMOKECoder',
base_depth=(28.01, 16.32),
base_dims=((0.88, 1.73, 0.67), (1.78, 1.70, 0.58), (3.88, 1.63,
1.53)),
code_size=7),
loss_cls=dict(type='GaussianFocalLoss', loss_weight=1.0),
loss_bbox=dict(type='L1Loss', reduction='sum', loss_weight=1 / 300),
loss_dir=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_attr=None,
conv_bias=True,
dcn_on_last_conv=False)
self = build_head(head_cfg)
feats = [torch.rand([2, 64, 32, 32], dtype=torch.float32)]
# test forward
ret_dict = self(feats)
assert len(ret_dict) == 2
assert len(ret_dict[0]) == 1
assert ret_dict[0][0].shape == torch.Size([2, 3, 32, 32])
assert ret_dict[1][0].shape == torch.Size([2, 8, 32, 32])
# test loss
gt_bboxes = [
torch.Tensor([[1.0, 2.0, 20.0, 40.0], [45.0, 50.0, 80.0, 70.1],
[34.0, 39.0, 65.0, 64.0]]),
torch.Tensor([[11.0, 22.0, 29.0, 31.0], [41.0, 55.0, 60.0, 99.0],
[29.0, 29.0, 65.0, 56.0]])
]
gt_bboxes_3d = [
CameraInstance3DBoxes(torch.rand([3, 7]), box_dim=7),
CameraInstance3DBoxes(torch.rand([3, 7]), box_dim=7)
]
gt_labels = [torch.randint(0, 3, [3]) for i in range(2)]
gt_labels_3d = gt_labels
centers2d = [torch.randint(0, 60, (3, 2)), torch.randint(0, 40, (3, 2))]
depths = [
torch.rand([3], dtype=torch.float32),
torch.rand([3], dtype=torch.float32)
]
attr_labels = None
img_metas = [
dict(
cam_intrinsic=[[
1260.8474446004698, 0.0, 807.968244525554, 40.1111
], [0.0, 1260.8474446004698, 495.3344268742088, 2.34422],
[0.0, 0.0, 1.0, 0.00333333], [0.0, 0.0, 0.0, 1.0]],
scale_factor=np.array([1., 1., 1., 1.], dtype=np.float32),
pad_shape=[128, 128],
trans_mat=np.array([[0.25, 0., 0.], [0., 0.25, 0], [0., 0., 1.]],
dtype=np.float32),
affine_aug=False,
box_type_3d=CameraInstance3DBoxes) for i in range(2)
]
losses = self.loss(*ret_dict, gt_bboxes, gt_labels, gt_bboxes_3d,
gt_labels_3d, centers2d, depths, attr_labels, img_metas)
assert losses['loss_cls'] >= 0
assert losses['loss_bbox'] >= 0
# test get_boxes
results = self.get_bboxes(*ret_dict, img_metas)
assert len(results) == 2
assert len(results[0]) == 4
assert results[0][0].tensor.shape == torch.Size([100, 7])
assert results[0][1].shape == torch.Size([100])
assert results[0][2].shape == torch.Size([100])
assert results[0][3] is None
def test_parta2_bbox_head():
    parta2_bbox_head_cfg = _get_parta2_bbox_head_cfg(
        './parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-3class.py')
@@ -3,7 +3,8 @@ import torch
 from mmcv.cnn import Scale
 from torch import nn as nn

-from mmdet3d.core.bbox import DepthInstance3DBoxes, LiDARInstance3DBoxes
+from mmdet3d.core.bbox import (CameraInstance3DBoxes, DepthInstance3DBoxes,
+                               LiDARInstance3DBoxes)
 from mmdet.core import build_bbox_coder
@@ -564,3 +565,34 @@ def test_pgd_bbox_coder():
     ])
     assert torch.allclose(
         loguniform_prob_depth_preds, expected_preds, atol=1e-3)
def test_smoke_bbox_coder():
bbox_coder_cfg = dict(
type='SMOKECoder',
base_depth=(28.01, 16.32),
base_dims=((3.88, 1.63, 1.53), (1.78, 1.70, 0.58), (0.88, 1.73, 0.67)),
code_size=7)
bbox_coder = build_bbox_coder(bbox_coder_cfg)
regression = torch.rand([200, 8])
points = torch.rand([200, 2])
labels = torch.ones([2, 100])
cam2imgs = torch.rand([2, 4, 4])
trans_mats = torch.rand([2, 3, 3])
img_metas = [dict(box_type_3d=CameraInstance3DBoxes) for i in range(2)]
locations, dimensions, orientations = bbox_coder.decode(
regression, points, labels, cam2imgs, trans_mats)
assert locations.shape == torch.Size([200, 3])
assert dimensions.shape == torch.Size([200, 3])
assert orientations.shape == torch.Size([200, 1])
bboxes = bbox_coder.encode(locations, dimensions, orientations, img_metas)
assert bboxes.tensor.shape == torch.Size([200, 7])
    # specifically designed to test the special cases of the
    # orientation decode function.
ori_vector = torch.tensor([[-0.9, -0.01], [-0.9, 0.01]])
locations = torch.tensor([[15., 2., 1.], [15., 2., -1.]])
orientations = bbox_coder._decode_orientation(ori_vector, locations)
assert orientations.shape == torch.Size([2, 1])