Unverified Commit f268ba42 authored by ChaimZhu, committed by GitHub

[Feature] Add SMOKE Head and Coder (v1.0.0.dev0) (#959)



* [Refactor] Main code modification for coordinate system refactor (#677)

* [Enhance] Add script for data update (#774)

* Fixed wrong config paths and fixed a bug in test

* Fixed metafile

* Coord sys refactor (main code)

* Update test_waymo_dataset.py

* Manually resolve conflict

* Removed unused lines and fixed imports

* remove coord2box and box2coord

* update dir_limit_offset

* Some minor improvements

* Removed some \s in comments

* Revert a change

* Change Box3DMode to Coord3DMode where points are converted

* Fix points_in_bbox function

* Fix Imvoxelnet config

* Revert adding a line

* Fix rotation bug when batch size is 0

* Keep sign of dir_scores as before

* Fix several comments

* Add a comment

* Fix docstring

* Add data update scripts

* Fix comments

* fix import (#839)

* [Enhance]  refactor  iou_neg_piecewise_sampler.py (#842)

* refactor iou_neg_piecewise_sampler.py

* add docstring

* modify docstring
Co-authored-by: Yezhen Cong <52420115+THU17cyz@users.noreply.github.com>
Co-authored-by: THU17cyz <congyezhen71@hotmail.com>

* [Feature] Add roipooling cuda ops (#843)

* add roipooling cuda ops

* add roi extractor

* add test_roi_extractor unittest

* Modify setup.py to install roipooling ops

* modify docstring

* remove enlarge bbox in roipoint pooling

* add_roipooling_ops

* modify docstring
Co-authored-by: Yezhen Cong <52420115+THU17cyz@users.noreply.github.com>
Co-authored-by: THU17cyz <congyezhen71@hotmail.com>

* [Refactor] Refactor code structure and docstrings (#803)

* refactor points_in_boxes

* Merge same functions of three boxes

* More docstring fixes and unify x/y/z size

* Add "optional" and fix "Default"

* Add "optional" and fix "Default"

* Add "optional" and fix "Default"

* Add "optional" and fix "Default"

* Add "optional" and fix "Default"

* Remove None in function param type

* Fix unittest

* Add comments for NMS functions

* Merge methods of Points

* Add unittest

* Add optional and default value

* Fix box conversion and add unittest

* Fix comments

* Add unit test

* Indent

* Fix CI

* Remove useless \\

* Remove useless \\

* Remove useless \\

* Remove useless \\

* Remove useless \\

* Add unit test for box bev

* More unit tests and refine docstrings in box_np_ops

* Fix comment

* Add deprecation warning

* [Feature] PointXYZWHLRBBoxCoder (#856)

* support PointBasedBoxCoder

* fix unittest bug

* support unittest in gpu

* support unittest in gpu

* modified docstring

* add args

* add args

* [Enhance] Change Groupfree3D config (#855)

* All mods

* PointSample

* PointSample

* [Doc] Add tutorials/data_pipeline Chinese version (#827)

* [Doc] Add tutorials/data_pipeline Chinese version

* refine doc

* Use the absolute link

* Use the absolute link
Co-authored-by: Tai-Wang <tab_wang@outlook.com>

* [Doc] Add Chinese doc for `scannet_det.md` (#836)

* Part

* Complete

* Fix comments

* Fix comments

* [Doc] Add Chinese doc for `waymo_det.md` (#859)

* Add complete translation

* Refinements

* Fix comments

* Fix a minor typo
Co-authored-by: Tai-Wang <tab_wang@outlook.com>

* Remove 2D annotations on Lyft (#867)

* Add header for files (#869)

* Add header for files

* Add header for files

* Add header for files

* Add header for files

* [fix] fix typos (#872)

* Fix 3 unworking configs (#882)

* [Fix] Fix `index.rst` for Chinese docs (#873)

* Fix index.rst for zh docs

* Change switch language

* [Fix] Centerpoint head nested list transpose  (#879)

* FIX Transpose nested lists without Numpy

* Removed unused Numpy import

* [Enhance] Update PointFusion (#791)

* update point fusion

* remove LIDAR hardcode

* move get_proj_mat_by_coord_type to utils

* fix lint

* remove todo

* fix lint

* [Doc] Add nuscenes_det.md Chinese version (#854)

* add nus chinese doc

* add nuScenes Chinese doc

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo

* [Fix] Fix RegNet pretrained weight loading (#889)

* Fix regnet pretrained weight loading

* Remove unused file

* Fix centerpoint tta (#892)

* [Enhance] Add benchmark regression script (#808)

* Initial commit

* [Feature] Support DGCNN (v1.0.0.dev0) (#896)

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* support dgcnn

* fix typo

* fix typo

* fix typo

* del gf&fa registry (wo reuse pointnet module)

* fix typo

* add benchmark and add copyright header (for DGCNN only)

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo

* support dgcnn

* Change cam rot_3d_in_axis (#906)

* [Doc] Add coord sys tutorial pic and change links to dev branch (#912)

* Modify link branch and add pic

* Fix pic

* [Feature] add kitti AP40 evaluation metric (v1.0.0.dev0) (#927)

* Add citation (#901)

* [Feature] Add python3.9 in CI (#900)

* Add python3.9 in CI

* Add python3.9 in CI

* Bump to v0.17.0 (#898)

* Update README.md

* Update README_zh-CN.md

* Update version.py

* Update getting_started.md

* Update getting_started.md

* Update changelog.md

* Remove "recent" in the news

* Remove "recent" in the news

* Fix comments

* [Docs] Fix the version of sphinx (#902)

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* Fix sphinx version

* add AP40

* add unit test

* add unit test

* separate AP11 and AP40

* fix some typos
Co-authored-by: dingchang <hudingchang.vendor@sensetime.com>
Co-authored-by: Tai-Wang <tab_wang@outlook.com>

* [Feature] add smoke backbone neck (#939)

* add smoke detector and its backbone and neck

* typo fix

* fix typo

* add docstring

* fix typo

* fix comments

* fix comments

* fix comments

* fix typo

* fix typo

* fix

* fix typo

* fix docstring

* refine feature

* fix typo

* use Basemodule in Neck

* [Refactor] Refactor the transformation from image to camera coordinates (#938)

* Refactor points_img2cam

* Refine docstring

* Support array converter and add unit tests

* [Feature] FCOS3D BBox Coder (#940)

* FCOS3D BBox Coder

* Add unit tests

* Change the value from long to float/double

* Rename bbox_out as bbox

* Add comments to forward returns

* [Feature] PGD BBox Coder (#948)

* Support PGD BBox Coder

* Refine docstring

* [Feature] Support Uncertain L1 Loss (#950)

* Add uncertain l1 loss and its unit tests

* Remove mmcv.jit and refine docstrings

* [Fix] Fix visualization in KITTI dataset (#956)

* fix bug to support kitti vis

* fix

* add smoke head

* fix typo

* fix code

* fix

* fix typo

* add detector

* delete detector

* fix test_heads

* add coder test

* fix head

* fix bugs in smoke head

* refine kitti_mono3d data config

* remove cuda is available

* fix docstring

* fix unit test

* fix typo

* refine

* add affine aug

* fix rebase typo

* fix docs

* change cam_intrinsics to cam2imgs

* fix typos

* fix lint

* fix bugs

* fix las typos

* fix typos

* add dosctrings for trans_mats
Co-authored-by: Yezhen Cong <52420115+THU17cyz@users.noreply.github.com>
Co-authored-by: Xi Liu <75658786+xiliu8006@users.noreply.github.com>
Co-authored-by: THU17cyz <congyezhen71@hotmail.com>
Co-authored-by: Wenhao Wu <79644370+wHao-Wu@users.noreply.github.com>
Co-authored-by: Tai-Wang <tab_wang@outlook.com>
Co-authored-by: dingchang <hudingchang.vendor@sensetime.com>
Co-authored-by: 谢恩泽 <Johnny_ez@163.com>
Co-authored-by: Robin Karlsson <34254153+robin-karlsson0@users.noreply.github.com>
Co-authored-by: Danila Rukhovich <danrukh@gmail.com>
parent e93a77f0
......@@ -8,9 +8,10 @@ from .groupfree3d_bbox_coder import GroupFree3DBBoxCoder
from .partial_bin_based_bbox_coder import PartialBinBasedBBoxCoder
from .pgd_bbox_coder import PGDBBoxCoder
from .point_xyzwhlr_bbox_coder import PointXYZWHLRBBoxCoder
from .smoke_bbox_coder import SMOKECoder
__all__ = [
'build_bbox_coder', 'DeltaXYZWLHRBBoxCoder', 'PartialBinBasedBBoxCoder',
'CenterPointBBoxCoder', 'AnchorFreeBBoxCoder', 'GroupFree3DBBoxCoder',
'PointXYZWHLRBBoxCoder', 'FCOS3DBBoxCoder', 'PGDBBoxCoder'
'PointXYZWHLRBBoxCoder', 'FCOS3DBBoxCoder', 'PGDBBoxCoder', 'SMOKECoder'
]
import numpy as np
import torch
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
@BBOX_CODERS.register_module()
class SMOKECoder(BaseBBoxCoder):
"""Bbox Coder for SMOKE.
Args:
base_depth (tuple[float]): Depth references for decode box depth.
base_dims (tuple[tuple[float]]): Dimension references [l, h, w]
for decode box dimension for each category.
code_size (int): The dimension of boxes to be encoded.
"""
def __init__(self, base_depth, base_dims, code_size):
super(SMOKECoder, self).__init__()
self.base_depth = base_depth
self.base_dims = base_dims
self.bbox_code_size = code_size
def encode(self, locations, dimensions, orientations, input_metas):
"""Encode CameraInstance3DBoxes by locations, dimemsions, orientations.
Args:
locations (Tensor): Center location for 3D boxes.
(N, 3)
dimensions (Tensor): Dimensions for 3D boxes.
shape (N, 3)
orientations (Tensor): Orientations for 3D boxes.
shape (N, 1)
input_metas (list[dict]): Meta information of each image, e.g.,
image size, scaling factor, etc.
Return:
:obj:`CameraInstance3DBoxes`: 3D bboxes of batch images,
shape (N, bbox_code_size).
"""
bboxes = torch.cat((locations, dimensions, orientations), dim=1)
        assert bboxes.shape[1] == self.bbox_code_size, \
            'bboxes shape does not match the bbox_code_size.'
batch_bboxes = input_metas[0]['box_type_3d'](
bboxes, box_dim=self.bbox_code_size)
return batch_bboxes
def decode(self,
reg,
points,
labels,
cam2imgs,
trans_mats,
locations=None):
"""Decode regression into locations, dimemsions, orientations.
Args:
reg (Tensor): Batch regression for each predict center2d point.
shape: (batch * K (max_objs), C)
points(Tensor): Batch projected bbox centers on image plane.
shape: (batch * K (max_objs) , 2)
labels (Tensor): Batch predict class label for each predict
center2d point.
shape: (batch, K (max_objs))
cam2imgs (Tensor): Batch images' camera intrinsic matrix.
shape: (batch, 4, 4)
trans_mats (Tensor): transformation matrix from original image
to feature map.
shape: (batch, 3, 3)
locations (None | Tensor): if locations is None, this function
is used to decode while inference, otherwise, it's used while
training using the ground truth 3d bbox locations.
shape: (batch * K (max_objs), 3)
Return:
tuple(Tensor): The tuple has components below:
- locations (Tensor): Centers of 3D boxes.
shape: (batch * K (max_objs), 3)
- dimensions (Tensor): Dimensions of 3D boxes.
shpae: (batch * K (max_objs), 3)
- orientations (Tensor): Orientations of 3D
boxes.
shape: (batch * K (max_objs), 1)
"""
depth_offsets = reg[:, 0]
centers2d_offsets = reg[:, 1:3]
dimensions_offsets = reg[:, 3:6]
orientations = reg[:, 6:8]
depths = self._decode_depth(depth_offsets)
        # get the 3D bounding box's center location
pred_locations = self._decode_location(points, centers2d_offsets,
depths, cam2imgs, trans_mats)
pred_dimensions = self._decode_dimension(labels, dimensions_offsets)
if locations is None:
pred_orientations = self._decode_orientation(
orientations, pred_locations)
else:
pred_orientations = self._decode_orientation(
orientations, locations)
return pred_locations, pred_dimensions, pred_orientations
def _decode_depth(self, depth_offsets):
"""Transform depth offset to depth."""
base_depth = depth_offsets.new_tensor(self.base_depth)
depths = depth_offsets * base_depth[1] + base_depth[0]
return depths
def _decode_location(self, points, centers2d_offsets, depths, cam2imgs,
trans_mats):
"""Retrieve objects location in camera coordinate based on projected
points.
Args:
points (Tensor): Projected points on feature map in (x, y)
shape: (batch * K, 2)
centers2d_offset (Tensor): Project points offset in
(delta_x, delta_y). shape: (batch * K, 2)
depths (Tensor): Object depth z.
shape: (batch * K)
cam2imgs (Tensor): Batch camera intrinsics matrix.
shape: (batch, 4, 4)
trans_mats (Tensor): transformation matrix from original image
to feature map.
shape: (batch, 3, 3)
"""
# number of points
N = centers2d_offsets.shape[0]
# batch_size
N_batch = cam2imgs.shape[0]
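        # map each flattened object index (batch * K) back to its batch
        # index so per-image inverse matrices can be gathered per object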
batch_id = torch.arange(N_batch).unsqueeze(1)
obj_id = batch_id.repeat(1, N // N_batch).flatten()
trans_mats_inv = trans_mats.inverse()[obj_id]
cam2imgs_inv = cam2imgs.inverse()[obj_id]
centers2d = points + centers2d_offsets
centers2d_extend = torch.cat((centers2d, centers2d.new_ones(N, 1)),
dim=1)
        # expand the projected points to shape [N, 3, 1]
        centers2d_extend = centers2d_extend.unsqueeze(-1)
        # transform the projected points back onto the original image
centers2d_img = torch.matmul(trans_mats_inv, centers2d_extend)
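        # scale the homogeneous pixel coordinates by depth so that applying
        # the inverse intrinsics recovers metric camera coordinates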
centers2d_img = centers2d_img * depths.view(N, -1, 1)
centers2d_img_extend = torch.cat(
(centers2d_img, centers2d.new_ones(N, 1, 1)), dim=1)
locations = torch.matmul(cam2imgs_inv, centers2d_img_extend).squeeze(2)
return locations[:, :3]
def _decode_dimension(self, labels, dims_offset):
"""Transform dimension offsets to dimension according to its category.
Args:
labels (Tensor): Each points' category id.
shape: (N, K)
dims_offset (Tensor): Dimension offsets.
shape: (N, 3)
"""
labels = labels.flatten().long()
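        # select the per-class [l, h, w] reference dimensions; the predicted
        # offsets act multiplicatively via exp()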
base_dims = dims_offset.new_tensor(self.base_dims)
dims_select = base_dims[labels, :]
dimensions = dims_offset.exp() * dims_select
return dimensions
def _decode_orientation(self, ori_vector, locations):
"""Retrieve object orientation.
Args:
ori_vector (Tensor): Local orientation in [sin, cos] format.
shape: (N, 2)
locations (Tensor): Object location.
shape: (N, 3)
Return:
Tensor: yaw(Orientation). Notice that the yaw's
range is [-np.pi, np.pi].
shape:(N, 1)
"""
assert len(ori_vector) == len(locations)
locations = locations.view(-1, 3)
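        # rays: viewing angle towards each object center in the camera x-z
        # plane; the global yaw is recovered as alpha (local angle) + ray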
rays = torch.atan(locations[:, 0] / (locations[:, 2] + 1e-7))
alphas = torch.atan(ori_vector[:, 0] / (ori_vector[:, 1] + 1e-7))
        # get the indices of positive and negative cosine values
cos_pos_inds = (ori_vector[:, 1] >= 0).nonzero()
cos_neg_inds = (ori_vector[:, 1] < 0).nonzero()
alphas[cos_pos_inds] -= np.pi / 2
alphas[cos_neg_inds] += np.pi / 2
        # retrieve the object rotation_y angle
yaws = alphas + rays
larger_inds = (yaws > np.pi).nonzero()
small_inds = (yaws < -np.pi).nonzero()
if len(larger_inds) != 0:
yaws[larger_inds] -= 2 * np.pi
if len(small_inds) != 0:
yaws[small_inds] += 2 * np.pi
yaws = yaws.unsqueeze(-1)
return yaws
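# Usage sketch (illustrative only; shapes and reference values mirror the
# unit test in test_smoke_bbox_coder rather than any canonical config, and
# `img_metas` is a hypothetical meta list providing `box_type_3d`):
#
#   coder = SMOKECoder(base_depth=(28.01, 16.32),
#                      base_dims=((3.88, 1.63, 1.53),),
#                      code_size=7)
#   reg = torch.rand(100, 8)       # (batch * K, C) gathered regressions
#   points = torch.rand(100, 2)    # projected centers on the feature map
#   labels = torch.zeros(1, 100)   # per-point category ids
#   cam2imgs = torch.eye(4).unsqueeze(0)    # (batch, 4, 4) intrinsics
#   trans_mats = torch.eye(3).unsqueeze(0)  # (batch, 3, 3) img -> feat
#   locs, dims, oris = coder.decode(reg, points, labels, cam2imgs,
#                                   trans_mats)
#   bboxes = coder.encode(locs, dims, oris, img_metas)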
......@@ -9,6 +9,7 @@ from .free_anchor3d_head import FreeAnchor3DHead
from .groupfree3d_head import GroupFree3DHead
from .parta2_rpn_head import PartA2RPNHead
from .shape_aware_head import ShapeAwareHead
from .smoke_mono3d_head import SMOKEMono3DHead
from .ssd_3d_head import SSD3DHead
from .vote_head import VoteHead
......@@ -16,5 +17,5 @@ __all__ = [
'Anchor3DHead', 'FreeAnchor3DHead', 'PartA2RPNHead', 'VoteHead',
'SSD3DHead', 'BaseConvBboxHead', 'CenterHead', 'ShapeAwareHead',
'BaseMono3DDenseHead', 'AnchorFreeMono3DHead', 'FCOSMono3DHead',
'GroupFree3DHead'
'GroupFree3DHead', 'SMOKEMono3DHead'
]
import torch
from torch.nn import functional as F
from mmdet.core import multi_apply
from mmdet.core.bbox.builder import build_bbox_coder
from mmdet.models.builder import HEADS
from mmdet.models.utils import gaussian_radius, gen_gaussian_target
from mmdet.models.utils.gaussian_target import (get_local_maximum,
get_topk_from_heatmap,
transpose_and_gather_feat)
from .anchor_free_mono3d_head import AnchorFreeMono3DHead
@HEADS.register_module()
class SMOKEMono3DHead(AnchorFreeMono3DHead):
r"""Anchor-free head used in `SMOKE <https://arxiv.org/abs/2002.10111>`_
.. code-block:: none
/-----> 3*3 conv -----> 1*1 conv -----> cls
feature
\-----> 3*3 conv -----> 1*1 conv -----> reg
    Args:
        num_classes (int): Number of categories excluding the background
            category.
        in_channels (int): Number of channels in the input feature map.
        dim_channel (list[int]): Indices of dimension offset predictions
            in the regression heatmap channels.
        ori_channel (list[int]): Indices of orientation offset predictions
            in the regression heatmap channels.
        bbox_coder (dict): Config of the bbox coder for encoding and
            decoding boxes.
        loss_cls (dict, optional): Config of classification loss.
            Default: loss_cls=dict(type='GaussianFocalLoss', loss_weight=1.0).
        loss_bbox (dict, optional): Config of localization loss.
            Default: loss_bbox=dict(type='L1Loss', loss_weight=0.1).
        loss_dir (dict, optional): Config of direction classification loss.
            Default: None for SMOKE.
        loss_attr (dict, optional): Config of attribute classification loss.
            Default: None for SMOKE.
        norm_cfg (dict): Dictionary to construct and config the norm layer.
            Default: norm_cfg=dict(type='GN', num_groups=32,
            requires_grad=True).
        init_cfg (dict): Initialization config dict. Default: None.
    """  # noqa: E501
def __init__(self,
num_classes,
in_channels,
dim_channel,
ori_channel,
bbox_coder,
                 loss_cls=dict(type='GaussianFocalLoss', loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=0.1),
loss_dir=None,
loss_attr=None,
norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
init_cfg=None,
**kwargs):
super().__init__(
num_classes,
in_channels,
loss_cls=loss_cls,
loss_bbox=loss_bbox,
loss_dir=loss_dir,
loss_attr=loss_attr,
norm_cfg=norm_cfg,
init_cfg=init_cfg,
**kwargs)
self.dim_channel = dim_channel
self.ori_channel = ori_channel
self.bbox_coder = build_bbox_coder(bbox_coder)
def forward(self, feats):
"""Forward features from the upstream network.
Args:
feats (tuple[Tensor]): Features from the upstream network, each is
a 4D-tensor.
Returns:
tuple:
cls_scores (list[Tensor]): Box scores for each scale level,
each is a 4D-tensor, the channel number is
num_points * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for each scale
level, each is a 4D-tensor, the channel number is
num_points * bbox_code_size.
"""
return multi_apply(self.forward_single, feats)
def forward_single(self, x):
"""Forward features of a single scale level.
Args:
x (Tensor): Input feature map.
Returns:
            tuple: Scores for each class and bbox predictions of the input
                feature map.
"""
cls_score, bbox_pred, dir_cls_pred, attr_pred, cls_feat, reg_feat = \
super().forward_single(x)
cls_score = cls_score.sigmoid() # turn to 0-1
cls_score = cls_score.clamp(min=1e-4, max=1 - 1e-4)
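        # constrain the dimension offsets to (-0.5, 0.5) and normalize the
        # orientation channels to a unit [sin, cos] vector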
# (N, C, H, W)
offset_dims = bbox_pred[:, self.dim_channel, ...]
bbox_pred[:, self.dim_channel, ...] = offset_dims.sigmoid() - 0.5
# (N, C, H, W)
vector_ori = bbox_pred[:, self.ori_channel, ...]
bbox_pred[:, self.ori_channel, ...] = F.normalize(vector_ori)
return cls_score, bbox_pred
def get_bboxes(self, cls_scores, bbox_preds, img_metas, rescale=None):
"""Generate bboxes from bbox head predictions.
Args:
cls_scores (list[Tensor]): Box scores for each scale level.
bbox_preds (list[Tensor]): Box regression for each scale.
img_metas (list[dict]): Meta information of each image, e.g.,
image size, scaling factor, etc.
rescale (bool): If True, return boxes in original image space.
Returns:
list[tuple[:obj:`CameraInstance3DBoxes`, Tensor, Tensor, None]]:
                Each item in result_list is a 4-tuple of
                (bboxes, scores, labels, attrs).
"""
assert len(cls_scores) == len(bbox_preds) == 1
cam2imgs = torch.stack([
cls_scores[0].new_tensor(img_meta['cam_intrinsic'])
for img_meta in img_metas
])
trans_mats = torch.stack([
cls_scores[0].new_tensor(img_meta['trans_mat'])
for img_meta in img_metas
])
batch_bboxes, batch_scores, batch_topk_labels = self.decode_heatmap(
cls_scores[0],
bbox_preds[0],
img_metas,
cam2imgs=cam2imgs,
trans_mats=trans_mats,
topk=100,
kernel=3)
result_list = []
for img_id in range(len(img_metas)):
bboxes = batch_bboxes[img_id]
scores = batch_scores[img_id]
labels = batch_topk_labels[img_id]
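            # keep detections above a fixed score threshold; no NMS is
            # applied in this head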
keep_idx = scores > 0.25
bboxes = bboxes[keep_idx]
scores = scores[keep_idx]
labels = labels[keep_idx]
bboxes = img_metas[img_id]['box_type_3d'](
bboxes, box_dim=self.bbox_code_size, origin=(0.5, 0.5, 0.5))
attrs = None
result_list.append((bboxes, scores, labels, attrs))
return result_list
def decode_heatmap(self,
cls_score,
reg_pred,
img_metas,
cam2imgs,
trans_mats,
topk=100,
kernel=3):
"""Transform outputs into detections raw bbox predictions.
Args:
class_score (Tensor): Center predict heatmap,
shape (B, num_classes, H, W).
reg_pred (Tensor): Box regression map.
shape (B, channel, H , W).
img_metas (List[dict]): Meta information of each image, e.g.,
image size, scaling factor, etc.
cam2imgs (Tensor): Camera intrinsic matrixs.
shape (B, 4, 4)
trans_mats (Tensor): Transformation matrix from original image
to feature map.
shape: (batch, 3, 3)
topk (int): Get top k center keypoints from heatmap. Default 100.
kernel (int): Max pooling kernel for extract local maximum pixels.
Default 3.
Returns:
tuple[torch.Tensor]: Decoded output of SMOKEHead, containing
the following Tensors:
- batch_bboxes (Tensor): Coords of each 3D box.
shape (B, k, 7)
- batch_scores (Tensor): Scores of each 3D box.
shape (B, k)
- batch_topk_labels (Tensor): Categories of each 3D box.
shape (B, k)
"""
img_h, img_w = img_metas[0]['pad_shape'][:2]
bs, _, feat_h, feat_w = cls_score.shape
center_heatmap_pred = get_local_maximum(cls_score, kernel=kernel)
*batch_dets, topk_ys, topk_xs = get_topk_from_heatmap(
center_heatmap_pred, k=topk)
batch_scores, batch_index, batch_topk_labels = batch_dets
regression = transpose_and_gather_feat(reg_pred, batch_index)
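        # the 8 regression channels are laid out as: depth offset (1),
        # center2d offset (2), dimension offsets (3), orientation
        # [sin, cos] (2); see SMOKECoder.decode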
regression = regression.view(-1, 8)
points = torch.cat([topk_xs.view(-1, 1),
topk_ys.view(-1, 1).float()],
dim=1)
locations, dimensions, orientations = self.bbox_coder.decode(
regression, points, batch_topk_labels, cam2imgs, trans_mats)
batch_bboxes = torch.cat((locations, dimensions, orientations), dim=1)
batch_bboxes = batch_bboxes.view(bs, -1, self.bbox_code_size)
return batch_bboxes, batch_scores, batch_topk_labels
def get_predictions(self, labels3d, centers2d, gt_locations, gt_dimensions,
gt_orientations, indexs, img_metas, pred_reg):
"""Prepare predictions for computing loss.
Args:
labels3d (Tensor): Labels of each 3D box.
shpae (B, max_objs, )
centers2d (Tensor): Coords of each projected 3D box
center on image. shape (B * max_objs, 2)
gt_locations (Tensor): Coords of each 3D box's location.
shape (B * max_objs, 3)
gt_dimensions (Tensor): Dimensions of each 3D box.
shape (N, 3)
gt_orientations (Tensor): Orientation(yaw) of each 3D box.
shape (N, 1)
indexs (Tensor): Indexs of the existence of the 3D box.
shape (B * max_objs, )
img_metas (list[dict]): Meta information of each image,
e.g., image size, scaling factor, etc.
pre_reg (Tensor): Box regression map.
shape (B, channel, H , W).
Returns:
dict: the dict has components below:
- bbox3d_yaws (:obj:`CameraInstance3DBoxes`):
bbox calculated using pred orientations.
- bbox3d_dims (:obj:`CameraInstance3DBoxes`):
bbox calculated using pred dimentions.
- bbox3d_locs (:obj:`CameraInstance3DBoxes`):
bbox calculated using pred locations.
"""
batch, channel = pred_reg.shape[0], pred_reg.shape[1]
w = pred_reg.shape[3]
cam2imgs = torch.stack([
gt_locations.new_tensor(img_meta['cam_intrinsic'])
for img_meta in img_metas
])
trans_mats = torch.stack([
gt_locations.new_tensor(img_meta['trans_mat'])
for img_meta in img_metas
])
centers2d_inds = centers2d[:, 1] * w + centers2d[:, 0]
centers2d_inds = centers2d_inds.view(batch, -1)
pred_regression = transpose_and_gather_feat(pred_reg, centers2d_inds)
pred_regression_pois = pred_regression.view(-1, channel)
locations, dimensions, orientations = self.bbox_coder.decode(
pred_regression_pois, centers2d, labels3d, cam2imgs, trans_mats,
gt_locations)
locations, dimensions, orientations = locations[indexs], dimensions[
indexs], orientations[indexs]
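        # shift the decoded gravity center down by h / 2 to the bottom
        # center expected by CameraInstance3DBoxes (origin (0.5, 1.0, 0.5))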
locations[:, 1] += dimensions[:, 1] / 2
gt_locations = gt_locations[indexs]
assert len(locations) == len(gt_locations)
assert len(dimensions) == len(gt_dimensions)
assert len(orientations) == len(gt_orientations)
bbox3d_yaws = self.bbox_coder.encode(gt_locations, gt_dimensions,
orientations, img_metas)
bbox3d_dims = self.bbox_coder.encode(gt_locations, dimensions,
gt_orientations, img_metas)
bbox3d_locs = self.bbox_coder.encode(locations, gt_dimensions,
gt_orientations, img_metas)
pred_bboxes = dict(ori=bbox3d_yaws, dim=bbox3d_dims, loc=bbox3d_locs)
return pred_bboxes
def get_targets(self, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d,
centers2d, feat_shape, img_shape, img_metas):
"""Get training targets for batch images.
``
Args:
gt_bboxes (list[Tensor]): Ground truth bboxes of each image,
shape (num_gt, 4).
gt_labels (list[Tensor]): Ground truth labels of each box,
shape (num_gt,).
gt_bboxes_3d (list[:obj:`CameraInstance3DBoxes`]): 3D Ground
truth bboxes of each image,
shape (num_gt, bbox_code_size).
gt_labels_3d (list[Tensor]): 3D Ground truth labels of each
box, shape (num_gt,).
centers2d (list[Tensor]): Projected 3D centers onto 2D image,
shape (num_gt, 2).
feat_shape (tuple[int]): Feature map shape with value,
shape (B, _, H, W).
img_shape (tuple[int]): Image shape in [h, w] format.
img_metas (list[dict]): Meta information of each image, e.g.,
image size, scaling factor, etc.
Returns:
tuple[Tensor, dict]: The Tensor value is the targets of
center heatmap, the dict has components below:
- gt_centers2d (Tensor): Coords of each projected 3D box
center on image. shape (B * max_objs, 2)
- gt_labels3d (Tensor): Labels of each 3D box.
shpae (B, max_objs, )
- indexs (Tensor): Indexs of the existence of the 3D box.
shape (B * max_objs, )
- affine_indexs (Tensor): Indexs of the affine of the 3D box.
shape (N, )
- gt_locs (Tensor): Coords of each 3D box's location.
shape (N, 3)
- gt_dims (Tensor): Dimensions of each 3D box.
shape (N, 3)
- gt_yaws (Tensor): Orientation(yaw) of each 3D box.
shape (N, 1)
- gt_cors (Tensor): Coords of the corners of each 3D box.
shape (N, 8, 3)
"""
reg_mask = torch.stack([
gt_bboxes[0].new_tensor(
not img_meta['affine_aug'], dtype=torch.bool)
for img_meta in img_metas
])
img_h, img_w = img_shape[:2]
bs, _, feat_h, feat_w = feat_shape
width_ratio = float(feat_w / img_w) # 1/4
height_ratio = float(feat_h / img_h) # 1/4
assert width_ratio == height_ratio
center_heatmap_target = gt_bboxes[-1].new_zeros(
[bs, self.num_classes, feat_h, feat_w])
gt_centers2d = centers2d.copy()
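        # CenterNet-style targets: splat a 2D gaussian at each projected 3D
        # center, with radius derived from the scaled 2D box size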
for batch_id in range(bs):
gt_bbox = gt_bboxes[batch_id]
gt_label = gt_labels[batch_id]
# project centers2d from input image to feat map
gt_center2d = gt_centers2d[batch_id] * width_ratio
for j, center in enumerate(gt_center2d):
center_x_int, center_y_int = center.int()
scale_box_h = (gt_bbox[j][3] - gt_bbox[j][1]) * height_ratio
scale_box_w = (gt_bbox[j][2] - gt_bbox[j][0]) * width_ratio
radius = gaussian_radius([scale_box_h, scale_box_w],
min_overlap=0.7)
radius = max(0, int(radius))
ind = gt_label[j]
gen_gaussian_target(center_heatmap_target[batch_id, ind],
[center_x_int, center_y_int], radius)
avg_factor = max(1, center_heatmap_target.eq(1).sum())
num_ctrs = [center2d.shape[0] for center2d in centers2d]
max_objs = max(num_ctrs)
reg_inds = torch.cat(
[reg_mask[i].repeat(num_ctrs[i]) for i in range(bs)])
inds = torch.zeros((bs, max_objs),
dtype=torch.bool).to(centers2d[0].device)
# put gt 3d bboxes to gpu
gt_bboxes_3d = [
gt_bbox_3d.to(centers2d[0].device) for gt_bbox_3d in gt_bboxes_3d
]
batch_centers2d = centers2d[0].new_zeros((bs, max_objs, 2))
batch_labels_3d = gt_labels_3d[0].new_zeros((bs, max_objs))
batch_gt_locations = \
gt_bboxes_3d[0].tensor.new_zeros((bs, max_objs, 3))
for i in range(bs):
inds[i, :num_ctrs[i]] = 1
batch_centers2d[i, :num_ctrs[i]] = centers2d[i]
batch_labels_3d[i, :num_ctrs[i]] = gt_labels_3d[i]
batch_gt_locations[i, :num_ctrs[i]] = \
gt_bboxes_3d[i].tensor[:, :3]
inds = inds.flatten()
batch_centers2d = batch_centers2d.view(-1, 2) * width_ratio
batch_gt_locations = batch_gt_locations.view(-1, 3)
        # filter out empty images that have no gt_bboxes_3d
gt_bboxes_3d = [
gt_bbox_3d for gt_bbox_3d in gt_bboxes_3d
if gt_bbox_3d.tensor.shape[0] > 0
]
gt_dimensions = torch.cat(
[gt_bbox_3d.tensor[:, 3:6] for gt_bbox_3d in gt_bboxes_3d])
gt_orientations = torch.cat([
gt_bbox_3d.tensor[:, 6].unsqueeze(-1)
for gt_bbox_3d in gt_bboxes_3d
])
gt_corners = torch.cat(
[gt_bbox_3d.corners for gt_bbox_3d in gt_bboxes_3d])
target_labels = dict(
gt_centers2d=batch_centers2d.long(),
gt_labels3d=batch_labels_3d,
indexs=inds,
reg_indexs=reg_inds,
gt_locs=batch_gt_locations,
gt_dims=gt_dimensions,
gt_yaws=gt_orientations,
gt_cors=gt_corners)
return center_heatmap_target, avg_factor, target_labels
def loss(self,
cls_scores,
bbox_preds,
gt_bboxes,
gt_labels,
gt_bboxes_3d,
gt_labels_3d,
centers2d,
depths,
attr_labels,
img_metas,
gt_bboxes_ignore=None):
"""Compute loss of the head.
Args:
cls_scores (list[Tensor]): Box scores for each scale level.
shape (num_gt, 4).
bbox_preds (list[Tensor]): Box dims is a 4D-tensor, the channel
number is bbox_code_size.
shape (B, 7, H, W).
gt_bboxes (list[Tensor]): Ground truth bboxes for each image.
shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]): Class indices corresponding to each box.
shape (num_gts, ).
gt_bboxes_3d (list[:obj:`CameraInstance3DBoxes`]): 3D boxes ground
truth. it is the flipped gt_bboxes
gt_labels_3d (list[Tensor]): Same as gt_labels.
centers2d (list[Tensor]): 2D centers on the image.
shape (num_gts, 2).
depths (list[Tensor]): Depth ground truth.
shape (num_gts, ).
attr_labels (list[Tensor]): Attributes indices of each box.
In kitti it's None.
img_metas (list[dict]): Meta information of each image, e.g.,
image size, scaling factor, etc.
gt_bboxes_ignore (None | list[Tensor]): Specify which bounding
boxes can be ignored when computing the loss.
Default: None.
Returns:
dict[str, Tensor]: A dictionary of loss components.
"""
assert len(cls_scores) == len(bbox_preds) == 1
assert attr_labels is None
assert gt_bboxes_ignore is None
center2d_heatmap = cls_scores[0]
pred_reg = bbox_preds[0]
center2d_heatmap_target, avg_factor, target_labels = \
self.get_targets(gt_bboxes, gt_labels, gt_bboxes_3d,
gt_labels_3d, centers2d,
center2d_heatmap.shape,
img_metas[0]['pad_shape'],
img_metas)
pred_bboxes = self.get_predictions(
labels3d=target_labels['gt_labels3d'],
centers2d=target_labels['gt_centers2d'],
gt_locations=target_labels['gt_locs'],
gt_dimensions=target_labels['gt_dims'],
gt_orientations=target_labels['gt_yaws'],
indexs=target_labels['indexs'],
img_metas=img_metas,
pred_reg=pred_reg)
loss_cls = self.loss_cls(
center2d_heatmap, center2d_heatmap_target, avg_factor=avg_factor)
reg_inds = target_labels['reg_indexs']
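        # disentangled corner losses: each term swaps in one predicted
        # component (orientation / dimensions / location) while keeping the
        # other two at ground truth, and compares the resulting 3D box
        # corners with the ground truth corners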
loss_bbox_oris = self.loss_bbox(
pred_bboxes['ori'].corners[reg_inds, ...],
target_labels['gt_cors'][reg_inds, ...])
loss_bbox_dims = self.loss_bbox(
pred_bboxes['dim'].corners[reg_inds, ...],
target_labels['gt_cors'][reg_inds, ...])
loss_bbox_locs = self.loss_bbox(
pred_bboxes['loc'].corners[reg_inds, ...],
target_labels['gt_cors'][reg_inds, ...])
loss_bbox = loss_bbox_dims + loss_bbox_locs + loss_bbox_oris
loss_dict = dict(loss_cls=loss_cls, loss_bbox=loss_bbox)
return loss_dict
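# Usage sketch (illustrative only; the argument values mirror the unit test
# below rather than the canonical KITTI SMOKE config, and `feat` /
# `img_metas` are hypothetical inputs):
#
#   head = SMOKEMono3DHead(
#       num_classes=3, in_channels=64, dim_channel=[3, 4, 5],
#       ori_channel=[6, 7],
#       bbox_coder=dict(type='SMOKECoder', base_depth=(28.01, 16.32),
#                       base_dims=((0.88, 1.73, 0.67),), code_size=7),
#       ...)
#   cls_scores, bbox_preds = head([feat])  # SMOKE uses one feature level
#   results = head.get_bboxes(cls_scores, bbox_preds, img_metas)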
......@@ -358,6 +358,103 @@ def test_vote_head():
assert results[0][2].shape[0] >= 0
def test_smoke_mono3d_head():
head_cfg = dict(
type='SMOKEMono3DHead',
num_classes=3,
in_channels=64,
dim_channel=[3, 4, 5],
ori_channel=[6, 7],
stacked_convs=0,
feat_channels=64,
use_direction_classifier=False,
diff_rad_by_sin=False,
pred_attrs=False,
pred_velo=False,
dir_offset=0,
strides=None,
group_reg_dims=(8, ),
cls_branch=(256, ),
reg_branch=((256, ), ),
num_attrs=0,
bbox_code_size=7,
dir_branch=(),
attr_branch=(),
bbox_coder=dict(
type='SMOKECoder',
base_depth=(28.01, 16.32),
base_dims=((0.88, 1.73, 0.67), (1.78, 1.70, 0.58), (3.88, 1.63,
1.53)),
code_size=7),
loss_cls=dict(type='GaussianFocalLoss', loss_weight=1.0),
loss_bbox=dict(type='L1Loss', reduction='sum', loss_weight=1 / 300),
loss_dir=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_attr=None,
conv_bias=True,
dcn_on_last_conv=False)
self = build_head(head_cfg)
feats = [torch.rand([2, 64, 32, 32], dtype=torch.float32)]
# test forward
ret_dict = self(feats)
assert len(ret_dict) == 2
assert len(ret_dict[0]) == 1
assert ret_dict[0][0].shape == torch.Size([2, 3, 32, 32])
assert ret_dict[1][0].shape == torch.Size([2, 8, 32, 32])
# test loss
gt_bboxes = [
torch.Tensor([[1.0, 2.0, 20.0, 40.0], [45.0, 50.0, 80.0, 70.1],
[34.0, 39.0, 65.0, 64.0]]),
torch.Tensor([[11.0, 22.0, 29.0, 31.0], [41.0, 55.0, 60.0, 99.0],
[29.0, 29.0, 65.0, 56.0]])
]
gt_bboxes_3d = [
CameraInstance3DBoxes(torch.rand([3, 7]), box_dim=7),
CameraInstance3DBoxes(torch.rand([3, 7]), box_dim=7)
]
gt_labels = [torch.randint(0, 3, [3]) for i in range(2)]
gt_labels_3d = gt_labels
centers2d = [torch.randint(0, 60, (3, 2)), torch.randint(0, 40, (3, 2))]
depths = [
torch.rand([3], dtype=torch.float32),
torch.rand([3], dtype=torch.float32)
]
attr_labels = None
img_metas = [
dict(
cam_intrinsic=[[
1260.8474446004698, 0.0, 807.968244525554, 40.1111
], [0.0, 1260.8474446004698, 495.3344268742088, 2.34422],
[0.0, 0.0, 1.0, 0.00333333], [0.0, 0.0, 0.0, 1.0]],
scale_factor=np.array([1., 1., 1., 1.], dtype=np.float32),
pad_shape=[128, 128],
trans_mat=np.array([[0.25, 0., 0.], [0., 0.25, 0], [0., 0., 1.]],
dtype=np.float32),
affine_aug=False,
box_type_3d=CameraInstance3DBoxes) for i in range(2)
]
losses = self.loss(*ret_dict, gt_bboxes, gt_labels, gt_bboxes_3d,
gt_labels_3d, centers2d, depths, attr_labels, img_metas)
assert losses['loss_cls'] >= 0
assert losses['loss_bbox'] >= 0
    # test get_bboxes
results = self.get_bboxes(*ret_dict, img_metas)
assert len(results) == 2
assert len(results[0]) == 4
assert results[0][0].tensor.shape == torch.Size([100, 7])
assert results[0][1].shape == torch.Size([100])
assert results[0][2].shape == torch.Size([100])
assert results[0][3] is None
def test_parta2_bbox_head():
parta2_bbox_head_cfg = _get_parta2_bbox_head_cfg(
'./parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-3class.py')
......
......@@ -3,7 +3,8 @@ import torch
from mmcv.cnn import Scale
from torch import nn as nn
from mmdet3d.core.bbox import DepthInstance3DBoxes, LiDARInstance3DBoxes
from mmdet3d.core.bbox import (CameraInstance3DBoxes, DepthInstance3DBoxes,
LiDARInstance3DBoxes)
from mmdet.core import build_bbox_coder
......@@ -564,3 +565,34 @@ def test_pgd_bbox_coder():
])
assert torch.allclose(
loguniform_prob_depth_preds, expected_preds, atol=1e-3)
def test_smoke_bbox_coder():
bbox_coder_cfg = dict(
type='SMOKECoder',
base_depth=(28.01, 16.32),
base_dims=((3.88, 1.63, 1.53), (1.78, 1.70, 0.58), (0.88, 1.73, 0.67)),
code_size=7)
bbox_coder = build_bbox_coder(bbox_coder_cfg)
regression = torch.rand([200, 8])
points = torch.rand([200, 2])
labels = torch.ones([2, 100])
cam2imgs = torch.rand([2, 4, 4])
trans_mats = torch.rand([2, 3, 3])
img_metas = [dict(box_type_3d=CameraInstance3DBoxes) for i in range(2)]
locations, dimensions, orientations = bbox_coder.decode(
regression, points, labels, cam2imgs, trans_mats)
assert locations.shape == torch.Size([200, 3])
assert dimensions.shape == torch.Size([200, 3])
assert orientations.shape == torch.Size([200, 1])
bboxes = bbox_coder.encode(locations, dimensions, orientations, img_metas)
assert bboxes.tensor.shape == torch.Size([200, 7])
    # specifically designed to test the orientation decoding function's
    # special cases
ori_vector = torch.tensor([[-0.9, -0.01], [-0.9, 0.01]])
locations = torch.tensor([[15., 2., 1.], [15., 2., -1.]])
orientations = bbox_coder._decode_orientation(ori_vector, locations)
assert orientations.shape == torch.Size([2, 1])