Unverified commit 32a4328b authored by Wenwei Zhang, committed by GitHub

Bump version to V1.0.0rc0

parents 86cc487c a8817998
......@@ -19,12 +19,6 @@
**Note**: We have fully supported pycocotools since version 0.13.0.
- If you face the following issue and your environment contains numba == 0.48.0 and numpy >= 1.20.0:
  ``TypeError: expected dtype object, got 'numpy.dtype[bool_]'``
  please downgrade numpy to < 1.20.0 or install numba == 0.48 from source. This is because numpy == 1.20.0 changed its API so that calling `np.dtype` produces a subclass. Please refer to [here](https://github.com/numba/numba/issues/6041) for more details.
- If you face the following issue when importing pycocotools-related packages:
  ``ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject``
......
......@@ -10,6 +10,7 @@
| MMDetection3D version | MMDetection version | MMSegmentation version | MMCV version |
|:-------------------:|:-------------------:|:-------------------:|:-------------------:|
| master | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| v1.0.0rc0 | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| 0.18.1 | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| 0.18.0 | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| 0.17.3 | mmdet>=2.14.0, <=3.0.0| mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0|
......
......@@ -75,3 +75,31 @@
### ImVoxelNet
Please refer to [ImVoxelNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/imvoxelnet) for more details. We provide results on the KITTI dataset.
### PAConv
Please refer to [PAConv](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/paconv) for more details. We provide results on the S3DIS dataset.
### DGCNN
Please refer to [DGCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/dgcnn) for more details. We provide results on the S3DIS dataset.
### SMOKE
Please refer to [SMOKE](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/smoke) for more details. We provide results on the KITTI dataset.
### PGD
Please refer to [PGD](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pgd) for more details. We provide results on the KITTI and nuScenes datasets.
### PointRCNN
Please refer to [PointRCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/point_rcnn) for more details. We provide results on the KITTI dataset.
### MonoFlex
Please refer to [MonoFlex](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/monoflex) for more details. We provide results on the KITTI dataset.
### Mixed Precision (FP16) Training
Please refer to the [Mixed Precision (FP16) Training on PointPillars](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pointpillars/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d.py) example for more details.
#!/usr/bin/env python
import functools as func
import glob
import numpy as np
import re
from os import path as osp
import numpy as np
url_prefix = 'https://github.com/open-mmlab/mmdetection3d/blob/master/'
files = sorted(glob.glob('../configs/*/README.md'))
......
......@@ -6,3 +6,4 @@
data_pipeline.md
customize_models.md
customize_runtime.md
coord_sys_tutorial.md
......@@ -71,7 +71,7 @@ python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --show --show-dir ${SHOW_DIR}
python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --eval 'mAP' --eval-options 'show=True' 'out_dir=${SHOW_DIR}'
```
After running this command, you will get the input data, the network outputs visualized on the input, and the ground-truth labels in `${SHOW_DIR}` (e.g. `***_points.obj`, `***_pred.obj`, `***_gt.obj`, `***_img.png`, `***_pred.png` for multi-modality detection tasks). When `show` is enabled, [Open3D](http://www.open3d.org/) will be used to visualize the results online. You need to set `show=False` when running the test on a remote server without a GUI.
After running this command, you will get the input data, the network outputs visualized on the input, and the ground-truth labels in `${SHOW_DIR}` (e.g. `***_points.obj`, `***_pred.obj`, `***_gt.obj`, `***_img.png`, `***_pred.png` for multi-modality detection tasks). When `show` is enabled, [Open3D](http://www.open3d.org/) will be used to visualize the results online. Online visualization is not possible when running the test on a remote server without a GUI; in that case you can set `show=False` to save the output results in `${SHOW_DIR}`.
For offline visualization, you have two options.
To visualize the results with the `Open3D` backend, you can run the following command
......@@ -97,6 +97,12 @@ python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py -
**Note**: Once `--output-dir` is specified, the images of the user-specified views will be saved when pressing `_ESC_` in the open3d window. If you do not have a monitor, you can remove the `--online` flag to only save the visualization results and browse them offline.
To verify the consistency of the data and the effect of data augmentation, you can also add the `--aug` flag to visualize the augmented data with the following command:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task det --aug --output-dir ${OUTPUT_DIR} --online
```
If you also want to show 2D images with the projected 3D bounding boxes, you need to find a config file that supports multi-modality data loading and then change the `--task` argument to `multi_modality-det`. An example is shown below
```shell
......@@ -123,6 +129,64 @@ python tools/misc/browse_dataset.py configs/_base_/datasets/nus-mono3d.py --task
&emsp;
# Model Deployment
**Note**: This tool is still experimental. For now only SECOND is supported to be served with [`TorchServe`](https://pytorch.org/serve/). We will support more models in the future.
To serve an `MMDetection3D` model with [`TorchServe`](https://pytorch.org/serve/), you can follow the steps below:
## 1. Convert the model from MMDetection3D to TorchServe
```shell
python tools/deployment/mmdet3d2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
--output-folder ${MODEL_STORE} \
--model-name ${MODEL_NAME}
```
**Note**: ${MODEL_STORE} needs to be an absolute path to a folder.
## 2. Build the `mmdet3d-serve` docker image
```shell
docker build -t mmdet3d-serve:latest docker/serve/
```
## 3. Run `mmdet3d-serve`
Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).
In order to run on a GPU, you need to install [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). You can omit the `--gpus` argument to run on a CPU instead.
Example:
```shell
docker run --rm \
--cpus 8 \
--gpus device=0 \
-p8080:8080 -p8081:8081 -p8082:8082 \
--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \
mmdet3d-serve:latest
```
[Read the docs](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md/) about the Inference (8080), Management (8081) and Metrics (8082) APIs.
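As a quick sanity check that the container is up, the minimal sketch below (assuming the server runs on localhost with the default ports and that the `requests` package is available) queries the standard TorchServe health and management endpoints:
```python
import requests

# Health check against the Inference API (port 8080); TorchServe answers
# with {"status": "Healthy"} once the server is ready.
print(requests.get('http://127.0.0.1:8080/ping').json())

# List the models registered with the Management API (port 8081); the model
# converted in step 1 should appear here.
print(requests.get('http://127.0.0.1:8081/models').json())
```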
## 4. Test deployment
You can use `test_torchserver.py` to test the deployed model and compare the results from TorchServe and PyTorch.
```shell
python tools/deployment/test_torchserver.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${MODEL_NAME}
[--inference-addr ${INFERENCE_ADDR}] [--device ${DEVICE}] [--score-thr ${SCORE_THR}]
```
Example:
```shell
python tools/deployment/test_torchserver.py demo/data/kitti/kitti_000008.bin configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth second
```
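If you prefer to call the REST endpoint directly instead of using `test_torchserver.py`, a minimal sketch (assuming the model was registered under the name `second` and the server from step 3 is running on localhost) looks like this:
```python
import requests

# Send the raw KITTI point cloud binary to the TorchServe Inference API.
# The model name in the URL must match ${MODEL_NAME} used during conversion.
with open('demo/data/kitti/kitti_000008.bin', 'rb') as f:
    response = requests.post(
        'http://127.0.0.1:8080/predictions/second', data=f)
print(response.json())
```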
&emsp;
# Model Complexity
You can use `tools/analysis_tools/get_flops.py` in MMDetection, a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch), to compute the FLOPs and parameters of a given model.
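If you would rather estimate the complexity from Python directly, the sketch below (shown on a toy `torch.nn` module; for a real detector the `get_flops.py` script takes care of building the model from its config) uses mmcv's `get_model_complexity_info` helper:
```python
import torch.nn as nn
from mmcv.cnn import get_model_complexity_info

# A toy network standing in for a detector backbone.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1))

# Returns human-readable strings for FLOPs and the number of parameters.
flops, params = get_model_complexity_info(model, (3, 224, 224))
print(f'FLOPs: {flops}, Params: {params}')
```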
......
......@@ -4,10 +4,11 @@ from .inference import (convert_SyncBN, inference_detector,
inference_multi_modality_detector, inference_segmentor,
init_model, show_result_meshlab)
from .test import single_gpu_test
from .train import train_model
from .train import init_random_seed, train_model
__all__ = [
'inference_detector', 'init_model', 'single_gpu_test',
'inference_mono_3d_detector', 'show_result_meshlab', 'convert_SyncBN',
'train_model', 'inference_multi_modality_detector', 'inference_segmentor'
'train_model', 'inference_multi_modality_detector', 'inference_segmentor',
'init_random_seed'
]
# Copyright (c) OpenMMLab. All rights reserved.
import re
from copy import deepcopy
from os import path as osp
import mmcv
import numpy as np
import re
import torch
from copy import deepcopy
from mmcv.parallel import collate, scatter
from mmcv.runner import load_checkpoint
from os import path as osp
from mmdet3d.core import (Box3DMode, CameraInstance3DBoxes,
from mmdet3d.core import (Box3DMode, CameraInstance3DBoxes, Coord3DMode,
DepthInstance3DBoxes, LiDARInstance3DBoxes,
show_multi_modality_result, show_result,
show_seg_result)
......@@ -83,26 +84,53 @@ def inference_detector(model, pcd):
"""
cfg = model.cfg
device = next(model.parameters()).device # model device
if not isinstance(pcd, str):
cfg = cfg.copy()
# set loading pipeline type
cfg.data.test.pipeline[0].type = 'LoadPointsFromDict'
# build the data pipeline
test_pipeline = deepcopy(cfg.data.test.pipeline)
test_pipeline = Compose(test_pipeline)
box_type_3d, box_mode_3d = get_box_type(cfg.data.test.box_type_3d)
data = dict(
pts_filename=pcd,
box_type_3d=box_type_3d,
box_mode_3d=box_mode_3d,
# for ScanNet demo we need axis_align_matrix
ann_info=dict(axis_align_matrix=np.eye(4)),
sweeps=[],
# set timestamp = 0
timestamp=[0],
img_fields=[],
bbox3d_fields=[],
pts_mask_fields=[],
pts_seg_fields=[],
bbox_fields=[],
mask_fields=[],
seg_fields=[])
if isinstance(pcd, str):
# load from point clouds file
data = dict(
pts_filename=pcd,
box_type_3d=box_type_3d,
box_mode_3d=box_mode_3d,
# for ScanNet demo we need axis_align_matrix
ann_info=dict(axis_align_matrix=np.eye(4)),
sweeps=[],
# set timestamp = 0
timestamp=[0],
img_fields=[],
bbox3d_fields=[],
pts_mask_fields=[],
pts_seg_fields=[],
bbox_fields=[],
mask_fields=[],
seg_fields=[])
else:
# load from http
data = dict(
points=pcd,
box_type_3d=box_type_3d,
box_mode_3d=box_mode_3d,
# for ScanNet demo we need axis_align_matrix
ann_info=dict(axis_align_matrix=np.eye(4)),
sweeps=[],
# set timestamp = 0
timestamp=[0],
img_fields=[],
bbox3d_fields=[],
pts_mask_fields=[],
pts_seg_fields=[],
bbox_fields=[],
mask_fields=[],
seg_fields=[])
data = test_pipeline(data)
data = collate([data], samples_per_gpu=1)
if next(model.parameters()).is_cuda:
......@@ -317,8 +345,7 @@ def show_det_result_meshlab(data,
# for now we convert points into depth mode
box_mode = data['img_metas'][0][0]['box_mode_3d']
if box_mode != Box3DMode.DEPTH:
points = points[..., [1, 0, 2]]
points[..., 0] *= -1
points = Coord3DMode.convert(points, box_mode, Coord3DMode.DEPTH)
show_bboxes = Box3DMode.convert(pred_bboxes, box_mode, Box3DMode.DEPTH)
else:
show_bboxes = deepcopy(pred_bboxes)
......@@ -462,15 +489,17 @@ def show_result_meshlab(data,
data (dict): Contain data from pipeline.
result (dict): Predicted result from model.
out_dir (str): Directory to save visualized result.
score_thr (float): Minimum score of bboxes to be shown. Default: 0.0
show (bool): Visualize the results online. Defaults to False.
snapshot (bool): Whether to save the online results. Defaults to False.
task (str): Distinguish which task result to visualize. Currently we
support 3D detection, multi-modality detection and 3D segmentation.
Defaults to 'det'.
palette (list[list[int]]] | np.ndarray | None): The palette of
segmentation map. If None is given, random palette will be
generated. Defaults to None.
score_thr (float, optional): Minimum score of bboxes to be shown.
Default: 0.0
show (bool, optional): Visualize the results online. Defaults to False.
snapshot (bool, optional): Whether to save the online results.
Defaults to False.
task (str, optional): Distinguish which task result to visualize.
Currently we support 3D detection, multi-modality detection and
3D segmentation. Defaults to 'det'.
palette (list[list[int]]] | np.ndarray, optional): The palette
of segmentation map. If None is given, random palette will be
generated. Defaults to None.
"""
assert task in ['det', 'multi_modality-det', 'seg', 'mono-det'], \
f'unsupported visualization task {task}'
......
# Copyright (c) OpenMMLab. All rights reserved.
from os import path as osp
import mmcv
import torch
from mmcv.image import tensor2imgs
from os import path as osp
from mmdet3d.models import (Base3DDetector, Base3DSegmentor,
SingleStageMono3DDetector)
......@@ -22,9 +23,9 @@ def single_gpu_test(model,
Args:
model (nn.Module): Model to be tested.
data_loader (nn.Dataloader): Pytorch data loader.
show (bool): Whether to save visualization results.
show (bool, optional): Whether to save visualization results.
Default: True.
out_dir (str): The path to save visualization results.
out_dir (str, optional): The path to save visualization results.
Default: None.
Returns:
......
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from mmcv.runner import get_dist_info
from torch import distributed as dist
from mmdet.apis import train_detector
from mmseg.apis import train_segmentor
def init_random_seed(seed=None, device='cuda'):
"""Initialize random seed.
If the seed is not set, the seed will be automatically randomized,
and then broadcast to all processes to prevent some potential bugs.
Args:
seed (int, optional): The seed. Default to None.
device (str, optional): The device where the seed will be put on.
Default to 'cuda'.
Returns:
int: Seed to be used.
"""
if seed is not None:
return seed
# Make sure all ranks share the same random seed to prevent
# some potential bugs. Please refer to
# https://github.com/open-mmlab/mmdetection/issues/6339
rank, world_size = get_dist_info()
seed = np.random.randint(2**31)
if world_size == 1:
return seed
if rank == 0:
random_num = torch.tensor(seed, dtype=torch.int32, device=device)
else:
random_num = torch.tensor(0, dtype=torch.int32, device=device)
dist.broadcast(random_num, src=0)
return random_num.item()
def train_model(model,
dataset,
cfg,
......
......@@ -19,20 +19,26 @@ class Anchor3DRangeGenerator(object):
ranges (list[list[float]]): Ranges of different anchors.
The ranges are the same across different feature levels. But may
vary for different anchor sizes if size_per_range is True.
sizes (list[list[float]]): 3D sizes of anchors.
scales (list[int]): Scales of anchors in different feature levels.
rotations (list[float]): Rotations of anchors in a feature grid.
custom_values (tuple[float]): Customized values of that anchor. For
example, in nuScenes the anchors have velocities.
reshape_out (bool): Whether to reshape the output into (N x 4).
size_per_range: Whether to use separate ranges for different sizes.
If size_per_range is True, the ranges should have the same length
as the sizes, if not, it will be duplicated.
sizes (list[list[float]], optional): 3D sizes of anchors.
Defaults to [[3.9, 1.6, 1.56]].
scales (list[int], optional): Scales of anchors in different feature
levels. Defaults to [1].
rotations (list[float], optional): Rotations of anchors in a feature
grid. Defaults to [0, 1.5707963].
custom_values (tuple[float], optional): Customized values of that
anchor. For example, in nuScenes the anchors have velocities.
Defaults to ().
reshape_out (bool, optional): Whether to reshape the output into
(N x 4). Defaults to True.
size_per_range (bool, optional): Whether to use separate ranges for
different sizes. If size_per_range is True, the ranges should have
the same length as the sizes, if not, it will be duplicated.
Defaults to True.
"""
def __init__(self,
ranges,
sizes=[[1.6, 3.9, 1.56]],
sizes=[[3.9, 1.6, 1.56]],
scales=[1],
rotations=[0, 1.5707963],
custom_values=(),
......@@ -86,13 +92,14 @@ class Anchor3DRangeGenerator(object):
Args:
featmap_sizes (list[tuple]): List of feature map sizes in
multiple feature levels.
device (str): Device where the anchors will be put on.
device (str, optional): Device where the anchors will be put on.
Defaults to 'cuda'.
Returns:
list[torch.Tensor]: Anchors in multiple feature levels. \
The sizes of each tensor should be [N, 4], where \
N = width * height * num_base_anchors, width and height \
are the sizes of the corresponding feature lavel, \
list[torch.Tensor]: Anchors in multiple feature levels.
The sizes of each tensor should be [N, 4], where
N = width * height * num_base_anchors, width and height
are the sizes of the corresponding feature level,
num_base_anchors is the number of anchors for that level.
"""
assert self.num_levels == len(featmap_sizes)
......@@ -149,7 +156,7 @@ class Anchor3DRangeGenerator(object):
feature_size,
anchor_range,
scale=1,
sizes=[[1.6, 3.9, 1.56]],
sizes=[[3.9, 1.6, 1.56]],
rotations=[0, 1.5707963],
device='cuda'):
"""Generate anchors in a single range.
......@@ -161,14 +168,18 @@ class Anchor3DRangeGenerator(object):
shape [6]. The order is consistent with that of anchors, i.e.,
(x_min, y_min, z_min, x_max, y_max, z_max).
scale (float | int, optional): The scale factor of anchors.
sizes (list[list] | np.ndarray | torch.Tensor): Anchor size with
shape [N, 3], in order of x, y, z.
rotations (list[float] | np.ndarray | torch.Tensor): Rotations of
anchors in a single feature grid.
Defaults to 1.
sizes (list[list] | np.ndarray | torch.Tensor, optional):
Anchor size with shape [N, 3], in order of x, y, z.
Defaults to [[3.9, 1.6, 1.56]].
rotations (list[float] | np.ndarray | torch.Tensor, optional):
Rotations of anchors in a single feature grid.
Defaults to [0, 1.5707963].
device (str): Devices that the anchors will be put on.
Defaults to 'cuda'.
Returns:
torch.Tensor: Anchors with shape \
torch.Tensor: Anchors with shape
[*feature_size, num_sizes, num_rots, 7].
"""
if len(feature_size) == 2:
......@@ -231,10 +242,10 @@ class AlignedAnchor3DRangeGenerator(Anchor3DRangeGenerator):
up corner to distribute anchors.
Args:
anchor_corner (bool): Whether to align with the corner of the voxel
grid. By default it is False and the anchor's center will be
align_corner (bool, optional): Whether to align with the corner of the
voxel grid. By default it is False and the anchor's center will be
the same as the corresponding voxel's center, which is also the
center of the corresponding feature grid.
center of the corresponding feature grid. Defaults to False.
"""
def __init__(self, align_corner=False, **kwargs):
......@@ -245,7 +256,7 @@ class AlignedAnchor3DRangeGenerator(Anchor3DRangeGenerator):
feature_size,
anchor_range,
scale,
sizes=[[1.6, 3.9, 1.56]],
sizes=[[3.9, 1.6, 1.56]],
rotations=[0, 1.5707963],
device='cuda'):
"""Generate anchors in a single range.
......@@ -256,15 +267,18 @@ class AlignedAnchor3DRangeGenerator(Anchor3DRangeGenerator):
anchor_range (torch.Tensor | list[float]): Range of anchors with
shape [6]. The order is consistent with that of anchors, i.e.,
(x_min, y_min, z_min, x_max, y_max, z_max).
scale (float | int, optional): The scale factor of anchors.
sizes (list[list] | np.ndarray | torch.Tensor): Anchor size with
shape [N, 3], in order of x, y, z.
rotations (list[float] | np.ndarray | torch.Tensor): Rotations of
anchors in a single feature grid.
device (str): Devices that the anchors will be put on.
scale (float | int): The scale factor of anchors.
sizes (list[list] | np.ndarray | torch.Tensor, optional):
Anchor size with shape [N, 3], in order of x, y, z.
Defaults to [[3.9, 1.6, 1.56]].
rotations (list[float] | np.ndarray | torch.Tensor, optional):
Rotations of anchors in a single feature grid.
Defaults to [0, 1.5707963].
device (str, optional): Devices that the anchors will be put on.
Defaults to 'cuda'.
Returns:
torch.Tensor: Anchors with shape \
torch.Tensor: Anchors with shape
[*feature_size, num_sizes, num_rots, 7].
"""
if len(feature_size) == 2:
......@@ -334,7 +348,7 @@ class AlignedAnchor3DRangeGeneratorPerCls(AlignedAnchor3DRangeGenerator):
Note that feature maps of different classes may be different.
Args:
kwargs (dict): Arguments are the same as those in \
kwargs (dict): Arguments are the same as those in
:class:`AlignedAnchor3DRangeGenerator`.
"""
......@@ -347,15 +361,16 @@ class AlignedAnchor3DRangeGeneratorPerCls(AlignedAnchor3DRangeGenerator):
"""Generate grid anchors in multiple feature levels.
Args:
featmap_sizes (list[tuple]): List of feature map sizes for \
featmap_sizes (list[tuple]): List of feature map sizes for
different classes in a single feature level.
device (str): Device where the anchors will be put on.
device (str, optional): Device where the anchors will be put on.
Defaults to 'cuda'.
Returns:
list[list[torch.Tensor]]: Anchors in multiple feature levels. \
Note that in this anchor generator, we currently only \
support single feature level. The sizes of each tensor \
should be [num_sizes/ranges*num_rots*featmap_size, \
list[list[torch.Tensor]]: Anchors in multiple feature levels.
Note that in this anchor generator, we currently only
support single feature level. The sizes of each tensor
should be [num_sizes/ranges*num_rots*featmap_size,
box_code_size].
"""
multi_level_anchors = []
......@@ -371,7 +386,7 @@ class AlignedAnchor3DRangeGeneratorPerCls(AlignedAnchor3DRangeGenerator):
This function is usually called by method ``self.grid_anchors``.
Args:
featmap_sizes (list[tuple]): List of feature map sizes for \
featmap_sizes (list[tuple]): List of feature map sizes for
different classes in a single feature level.
scale (float): Scale factor of the anchors in the current level.
device (str, optional): Device the tensor will be put on.
......
......@@ -12,7 +12,8 @@ from .samplers import (BaseSampler, CombinedSampler,
from .structures import (BaseInstance3DBoxes, Box3DMode, CameraInstance3DBoxes,
Coord3DMode, DepthInstance3DBoxes,
LiDARInstance3DBoxes, get_box_type, limit_period,
mono_cam_box2vis, points_cam2img, xywhr2xyxyr)
mono_cam_box2vis, points_cam2img, points_img2cam,
xywhr2xyxyr)
from .transforms import bbox3d2result, bbox3d2roi, bbox3d_mapping_back
__all__ = [
......@@ -25,5 +26,5 @@ __all__ = [
'LiDARInstance3DBoxes', 'CameraInstance3DBoxes', 'bbox3d2roi',
'bbox3d2result', 'DepthInstance3DBoxes', 'BaseInstance3DBoxes',
'bbox3d_mapping_back', 'xywhr2xyxyr', 'limit_period', 'points_cam2img',
'get_box_type', 'Coord3DMode', 'mono_cam_box2vis'
'points_img2cam', 'get_box_type', 'Coord3DMode', 'mono_cam_box2vis'
]
# Copyright (c) OpenMMLab. All rights reserved.
# TODO: clean the functions in this file and move the APIs into box structures
# in the future
# NOTICE: All functions in this file are valid for LiDAR or depth boxes only
# if we use default parameters.
import numba
import numpy as np
from .structures.utils import limit_period, points_cam2img, rotation_3d_in_axis
def camera_to_lidar(points, r_rect, velo2cam):
"""Convert points in camera coordinate to lidar coordinate.
Note:
This function is for KITTI only.
Args:
points (np.ndarray, shape=[N, 3]): Points in camera coordinate.
r_rect (np.ndarray, shape=[4, 4]): Matrix to project points in
......@@ -27,7 +34,10 @@ def camera_to_lidar(points, r_rect, velo2cam):
def box_camera_to_lidar(data, r_rect, velo2cam):
"""Covert boxes in camera coordinate to lidar coordinate.
"""Convert boxes in camera coordinate to lidar coordinate.
Note:
This function is for KITTI only.
Args:
data (np.ndarray, shape=[N, 7]): Boxes in camera coordinate.
......@@ -40,10 +50,13 @@ def box_camera_to_lidar(data, r_rect, velo2cam):
np.ndarray, shape=[N, 3]: Boxes in lidar coordinate.
"""
xyz = data[:, 0:3]
l, h, w = data[:, 3:4], data[:, 4:5], data[:, 5:6]
x_size, y_size, z_size = data[:, 3:4], data[:, 4:5], data[:, 5:6]
r = data[:, 6:7]
xyz_lidar = camera_to_lidar(xyz, r_rect, velo2cam)
return np.concatenate([xyz_lidar, w, l, h, r], axis=1)
# yaw and dims also needs to be converted
r_new = -r - np.pi / 2
r_new = limit_period(r_new, period=np.pi * 2)
return np.concatenate([xyz_lidar, x_size, z_size, y_size, r_new], axis=1)
def corners_nd(dims, origin=0.5):
......@@ -80,26 +93,9 @@ def corners_nd(dims, origin=0.5):
return corners
def rotation_2d(points, angles):
"""Rotation 2d points based on origin point clockwise when angle positive.
Args:
points (np.ndarray): Points to be rotated with shape \
(N, point_size, 2).
angles (np.ndarray): Rotation angle with shape (N).
Returns:
np.ndarray: Same shape as points.
"""
rot_sin = np.sin(angles)
rot_cos = np.cos(angles)
rot_mat_T = np.stack([[rot_cos, -rot_sin], [rot_sin, rot_cos]])
return np.einsum('aij,jka->aik', points, rot_mat_T)
def center_to_corner_box2d(centers, dims, angles=None, origin=0.5):
"""Convert kitti locations, dimensions and angles to corners.
format: center(xy), dims(xy), angles(clockwise when positive)
format: center(xy), dims(xy), angles(counterclockwise when positive)
Args:
centers (np.ndarray): Locations in kitti label file with shape (N, 2).
......@@ -118,7 +114,7 @@ def center_to_corner_box2d(centers, dims, angles=None, origin=0.5):
corners = corners_nd(dims, origin=origin)
# corners: [N, 4, 2]
if angles is not None:
corners = rotation_2d(corners, angles)
corners = rotation_3d_in_axis(corners, angles)
corners += centers.reshape([-1, 1, 2])
return corners
......@@ -172,37 +168,6 @@ def depth_to_lidar_points(depth, trunc_pixel, P2, r_rect, velo2cam):
return lidar_points
def rotation_3d_in_axis(points, angles, axis=0):
"""Rotate points in specific axis.
Args:
points (np.ndarray, shape=[N, point_size, 3]]):
angles (np.ndarray, shape=[N]]):
axis (int, optional): Axis to rotate at. Defaults to 0.
Returns:
np.ndarray: Rotated points.
"""
# points: [N, point_size, 3]
rot_sin = np.sin(angles)
rot_cos = np.cos(angles)
ones = np.ones_like(rot_cos)
zeros = np.zeros_like(rot_cos)
if axis == 1:
rot_mat_T = np.stack([[rot_cos, zeros, -rot_sin], [zeros, ones, zeros],
[rot_sin, zeros, rot_cos]])
elif axis == 2 or axis == -1:
rot_mat_T = np.stack([[rot_cos, -rot_sin, zeros],
[rot_sin, rot_cos, zeros], [zeros, zeros, ones]])
elif axis == 0:
rot_mat_T = np.stack([[zeros, rot_cos, -rot_sin],
[zeros, rot_sin, rot_cos], [ones, zeros, zeros]])
else:
raise ValueError('axis should in range')
return np.einsum('aij,jka->aik', points, rot_mat_T)
def center_to_corner_box3d(centers,
dims,
angles=None,
......@@ -225,7 +190,7 @@ def center_to_corner_box3d(centers,
np.ndarray: Corners with the shape of (N, 8, 3).
"""
# 'length' in kitti format is in x axis.
# yzx(hwl)(kitti label file)<->xyz(lhw)(camera)<->z(-x)(-y)(wlh)(lidar)
# yzx(hwl)(kitti label file)<->xyz(lhw)(camera)<->z(-x)(-y)(lwh)(lidar)
# center in kitti format is [0.5, 1.0, 0.5] in xyz.
corners = corners_nd(dims, origin=origin)
# corners: [N, 8, 3]
......@@ -259,8 +224,8 @@ def box2d_to_corner_jit(boxes):
rot_sin = np.sin(boxes[i, -1])
rot_cos = np.cos(boxes[i, -1])
rot_mat_T[0, 0] = rot_cos
rot_mat_T[0, 1] = -rot_sin
rot_mat_T[1, 0] = rot_sin
rot_mat_T[0, 1] = rot_sin
rot_mat_T[1, 0] = -rot_sin
rot_mat_T[1, 1] = rot_cos
box_corners[i] = corners[i] @ rot_mat_T + boxes[i, :2]
return box_corners
......@@ -327,15 +292,15 @@ def rotation_points_single_angle(points, angle, axis=0):
rot_cos = np.cos(angle)
if axis == 1:
rot_mat_T = np.array(
[[rot_cos, 0, -rot_sin], [0, 1, 0], [rot_sin, 0, rot_cos]],
[[rot_cos, 0, rot_sin], [0, 1, 0], [-rot_sin, 0, rot_cos]],
dtype=points.dtype)
elif axis == 2 or axis == -1:
rot_mat_T = np.array(
[[rot_cos, -rot_sin, 0], [rot_sin, rot_cos, 0], [0, 0, 1]],
[[rot_cos, rot_sin, 0], [-rot_sin, rot_cos, 0], [0, 0, 1]],
dtype=points.dtype)
elif axis == 0:
rot_mat_T = np.array(
[[1, 0, 0], [0, rot_cos, -rot_sin], [0, rot_sin, rot_cos]],
[[1, 0, 0], [0, rot_cos, rot_sin], [0, -rot_sin, rot_cos]],
dtype=points.dtype)
else:
raise ValueError('axis should in range')
......@@ -343,44 +308,6 @@ def rotation_points_single_angle(points, angle, axis=0):
return points @ rot_mat_T, rot_mat_T
def points_cam2img(points_3d, proj_mat, with_depth=False):
"""Project points in camera coordinates to image coordinates.
Args:
points_3d (np.ndarray): Points in shape (N, 3)
proj_mat (np.ndarray): Transformation matrix between coordinates.
with_depth (bool, optional): Whether to keep depth in the output.
Defaults to False.
Returns:
np.ndarray: Points in image coordinates with shape [N, 2].
"""
points_shape = list(points_3d.shape)
points_shape[-1] = 1
assert len(proj_mat.shape) == 2, 'The dimension of the projection'\
f' matrix should be 2 instead of {len(proj_mat.shape)}.'
d1, d2 = proj_mat.shape[:2]
assert (d1 == 3 and d2 == 3) or (d1 == 3 and d2 == 4) or (
d1 == 4 and d2 == 4), 'The shape of the projection matrix'\
f' ({d1}*{d2}) is not supported.'
if d1 == 3:
proj_mat_expanded = np.eye(4, dtype=proj_mat.dtype)
proj_mat_expanded[:d1, :d2] = proj_mat
proj_mat = proj_mat_expanded
points_4 = np.concatenate([points_3d, np.ones(points_shape)], axis=-1)
point_2d = points_4 @ proj_mat.T
point_2d_res = point_2d[..., :2] / point_2d[..., 2:3]
if with_depth:
points_2d_depth = np.concatenate([point_2d_res, point_2d[..., 2:3]],
axis=-1)
return points_2d_depth
return point_2d_res
def box3d_to_bbox(box3d, P2):
"""Convert box3d in camera coordinates to bbox in image coordinates.
......@@ -424,7 +351,10 @@ def corner_to_surfaces_3d(corners):
def points_in_rbbox(points, rbbox, z_axis=2, origin=(0.5, 0.5, 0)):
"""Check points in rotated bbox and return indicces.
"""Check points in rotated bbox and return indices.
Note:
This function is for counterclockwise boxes.
Args:
points (np.ndarray, shape=[N, 3+dim]): Points to query.
......@@ -461,25 +391,9 @@ def minmax_to_corner_2d(minmax_box):
return center_to_corner_box2d(center, dims, origin=0.0)
def limit_period(val, offset=0.5, period=np.pi):
"""Limit the value into a period for periodic function.
Args:
val (np.ndarray): The value to be converted.
offset (float, optional): Offset to set the value range. \
Defaults to 0.5.
period (float, optional): Period of the value. Defaults to np.pi.
Returns:
torch.Tensor: Value in the range of \
[-offset * period, (1-offset) * period]
"""
return val - np.floor(val / period + offset) * period
def create_anchors_3d_range(feature_size,
anchor_range,
sizes=((1.6, 3.9, 1.56), ),
sizes=((3.9, 1.6, 1.56), ),
rotations=(0, np.pi / 2),
dtype=np.float32):
"""Create anchors 3d by range.
......@@ -492,14 +406,14 @@ def create_anchors_3d_range(feature_size,
(x_min, y_min, z_min, x_max, y_max, z_max).
sizes (list[list] | np.ndarray | torch.Tensor, optional):
Anchor size with shape [N, 3], in order of x, y, z.
Defaults to ((1.6, 3.9, 1.56), ).
Defaults to ((3.9, 1.6, 1.56), ).
rotations (list[float] | np.ndarray | torch.Tensor, optional):
Rotations of anchors in a single feature grid.
Defaults to (0, np.pi / 2).
dtype (type, optional): Data type. Default to np.float32.
dtype (type, optional): Data type. Defaults to np.float32.
Returns:
np.ndarray: Range based anchors with shape of \
np.ndarray: Range based anchors with shape of
(*feature_size, num_sizes, num_rots, 7).
"""
anchor_range = np.array(anchor_range, dtype)
......@@ -550,11 +464,11 @@ def rbbox2d_to_near_bbox(rbboxes):
"""convert rotated bbox to nearest 'standing' or 'lying' bbox.
Args:
rbboxes (np.ndarray): Rotated bboxes with shape of \
rbboxes (np.ndarray): Rotated bboxes with shape of
(N, 5(x, y, xdim, ydim, rad)).
Returns:
np.ndarray: Bounding boxes with the shpae of
np.ndarray: Bounding boxes with the shape of
(N, 4(xmin, ymin, xmax, ymax)).
"""
rots = rbboxes[..., -1]
......@@ -570,6 +484,9 @@ def iou_jit(boxes, query_boxes, mode='iou', eps=0.0):
"""Calculate box iou. Note that jit version runs ~10x faster than the
box_overlaps function in mmdet3d.core.evaluation.
Note:
This function is for counterclockwise boxes.
Args:
boxes (np.ndarray): Input bounding boxes with shape of (N, 4).
query_boxes (np.ndarray): Query boxes with shape of (K, 4).
......@@ -607,7 +524,10 @@ def iou_jit(boxes, query_boxes, mode='iou', eps=0.0):
def projection_matrix_to_CRT_kitti(proj):
"""Split projection matrix of kitti.
"""Split projection matrix of KITTI.
Note:
This function is for KITTI only.
P = C @ [R|T]
C is upper triangular matrix, so we need to inverse CR and use QR
......@@ -633,6 +553,9 @@ def projection_matrix_to_CRT_kitti(proj):
def remove_outside_points(points, rect, Trv2c, P2, image_shape):
"""Remove points which are outside of image.
Note:
This function is for KITTI only.
Args:
points (np.ndarray, shape=[N, 3+dims]): Total points.
rect (np.ndarray, shape=[4, 4]): Matrix to project points in
......@@ -782,8 +705,8 @@ def points_in_convex_polygon_3d_jit(points,
normal_vec, d, num_surfaces)
@numba.jit
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
@numba.njit
def points_in_convex_polygon_jit(points, polygon, clockwise=False):
"""Check points is in 2d convex polygons. True when point in polygon.
Args:
......@@ -800,14 +723,16 @@ def points_in_convex_polygon_jit(points, polygon, clockwise=True):
num_points_of_polygon = polygon.shape[1]
num_points = points.shape[0]
num_polygons = polygon.shape[0]
# if clockwise:
# vec1 = polygon - polygon[:, [num_points_of_polygon - 1] +
# list(range(num_points_of_polygon - 1)), :]
# else:
# vec1 = polygon[:, [num_points_of_polygon - 1] +
# list(range(num_points_of_polygon - 1)), :] - polygon
# vec1: [num_polygon, num_points_of_polygon, 2]
vec1 = np.zeros((2), dtype=polygon.dtype)
# vec for all the polygons
if clockwise:
vec1 = polygon - polygon[:,
np.array([num_points_of_polygon - 1] + list(
range(num_points_of_polygon - 1))), :]
else:
vec1 = polygon[:,
np.array([num_points_of_polygon - 1] +
list(range(num_points_of_polygon -
1))), :] - polygon
ret = np.zeros((num_points, num_polygons), dtype=np.bool_)
success = True
cross = 0.0
......@@ -815,12 +740,9 @@ def points_in_convex_polygon_jit(points, polygon, clockwise=True):
for j in range(num_polygons):
success = True
for k in range(num_points_of_polygon):
if clockwise:
vec1 = polygon[j, k] - polygon[j, k - 1]
else:
vec1 = polygon[j, k - 1] - polygon[j, k]
cross = vec1[1] * (polygon[j, k, 0] - points[i, 0])
cross -= vec1[0] * (polygon[j, k, 1] - points[i, 1])
vec = vec1[j, k]
cross = vec[1] * (polygon[j, k, 0] - points[i, 0])
cross -= vec[0] * (polygon[j, k, 1] - points[i, 1])
if cross >= 0:
success = False
break
......@@ -839,10 +761,13 @@ def boxes3d_to_corners3d_lidar(boxes3d, bottom_center=True):
|/ |/
2 -------- 1
Note:
This function is for LiDAR boxes only.
Args:
boxes3d (np.ndarray): Boxes with shape of (N, 7)
[x, y, z, w, l, h, ry] in LiDAR coords, see the definition of ry
in KITTI dataset.
[x, y, z, x_size, y_size, z_size, ry] in LiDAR coords,
see the definition of ry in KITTI dataset.
bottom_center (bool, optional): Whether z is on the bottom center
of object. Defaults to True.
......@@ -850,19 +775,25 @@ def boxes3d_to_corners3d_lidar(boxes3d, bottom_center=True):
np.ndarray: Box corners with the shape of [N, 8, 3].
"""
boxes_num = boxes3d.shape[0]
w, l, h = boxes3d[:, 3], boxes3d[:, 4], boxes3d[:, 5]
x_corners = np.array(
[w / 2., -w / 2., -w / 2., w / 2., w / 2., -w / 2., -w / 2., w / 2.],
dtype=np.float32).T
y_corners = np.array(
[-l / 2., -l / 2., l / 2., l / 2., -l / 2., -l / 2., l / 2., l / 2.],
dtype=np.float32).T
x_size, y_size, z_size = boxes3d[:, 3], boxes3d[:, 4], boxes3d[:, 5]
x_corners = np.array([
x_size / 2., -x_size / 2., -x_size / 2., x_size / 2., x_size / 2.,
-x_size / 2., -x_size / 2., x_size / 2.
],
dtype=np.float32).T
y_corners = np.array([
-y_size / 2., -y_size / 2., y_size / 2., y_size / 2., -y_size / 2.,
-y_size / 2., y_size / 2., y_size / 2.
],
dtype=np.float32).T
if bottom_center:
z_corners = np.zeros((boxes_num, 8), dtype=np.float32)
z_corners[:, 4:8] = h.reshape(boxes_num, 1).repeat(4, axis=1) # (N, 8)
z_corners[:, 4:8] = z_size.reshape(boxes_num, 1).repeat(
4, axis=1) # (N, 8)
else:
z_corners = np.array([
-h / 2., -h / 2., -h / 2., -h / 2., h / 2., h / 2., h / 2., h / 2.
-z_size / 2., -z_size / 2., -z_size / 2., -z_size / 2.,
z_size / 2., z_size / 2., z_size / 2., z_size / 2.
],
dtype=np.float32).T
......@@ -870,9 +801,9 @@ def boxes3d_to_corners3d_lidar(boxes3d, bottom_center=True):
zeros, ones = np.zeros(
ry.size, dtype=np.float32), np.ones(
ry.size, dtype=np.float32)
rot_list = np.array([[np.cos(ry), -np.sin(ry), zeros],
[np.sin(ry), np.cos(ry), zeros], [zeros, zeros,
ones]]) # (3, 3, N)
rot_list = np.array([[np.cos(ry), np.sin(ry), zeros],
[-np.sin(ry), np.cos(ry), zeros],
[zeros, zeros, ones]]) # (3, 3, N)
R_list = np.transpose(rot_list, (2, 0, 1)) # (N, 3, 3)
temp_corners = np.concatenate((x_corners.reshape(
......
......@@ -3,10 +3,17 @@ from mmdet.core.bbox import build_bbox_coder
from .anchor_free_bbox_coder import AnchorFreeBBoxCoder
from .centerpoint_bbox_coders import CenterPointBBoxCoder
from .delta_xyzwhlr_bbox_coder import DeltaXYZWLHRBBoxCoder
from .fcos3d_bbox_coder import FCOS3DBBoxCoder
from .groupfree3d_bbox_coder import GroupFree3DBBoxCoder
from .monoflex_bbox_coder import MonoFlexCoder
from .partial_bin_based_bbox_coder import PartialBinBasedBBoxCoder
from .pgd_bbox_coder import PGDBBoxCoder
from .point_xyzwhlr_bbox_coder import PointXYZWHLRBBoxCoder
from .smoke_bbox_coder import SMOKECoder
__all__ = [
'build_bbox_coder', 'DeltaXYZWLHRBBoxCoder', 'PartialBinBasedBBoxCoder',
'CenterPointBBoxCoder', 'AnchorFreeBBoxCoder', 'GroupFree3DBBoxCoder'
'CenterPointBBoxCoder', 'AnchorFreeBBoxCoder', 'GroupFree3DBBoxCoder',
'PointXYZWHLRBBoxCoder', 'FCOS3DBBoxCoder', 'PGDBBoxCoder', 'SMOKECoder',
'MonoFlexCoder'
]
......@@ -25,7 +25,7 @@ class AnchorFreeBBoxCoder(PartialBinBasedBBoxCoder):
"""Encode ground truth to prediction targets.
Args:
gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes \
gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes
with shape (n, 7).
gt_labels_3d (torch.Tensor): Ground truth classes.
......
......@@ -13,12 +13,12 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
pc_range (list[float]): Range of point cloud.
out_size_factor (int): Downsample factor of the model.
voxel_size (list[float]): Size of voxel.
post_center_range (list[float]): Limit of the center.
post_center_range (list[float], optional): Limit of the center.
Default: None.
max_num (int): Max number to be kept. Default: 100.
score_threshold (float): Threshold to filter boxes based on score.
Default: None.
code_size (int): Code size of bboxes. Default: 9
max_num (int, optional): Max number to be kept. Default: 100.
score_threshold (float, optional): Threshold to filter boxes
based on score. Default: None.
code_size (int, optional): Code size of bboxes. Default: 9
"""
def __init__(self,
......@@ -45,7 +45,8 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
feats (torch.Tensor): Features to be transposed and gathered
with the shape of [B, 2, W, H].
inds (torch.Tensor): Indexes with the shape of [B, N].
feat_masks (torch.Tensor): Mask of the feats. Default: None.
feat_masks (torch.Tensor, optional): Mask of the feats.
Default: None.
Returns:
torch.Tensor: Gathered feats.
......@@ -64,7 +65,7 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
Args:
scores (torch.Tensor): scores with the shape of [B, N, W, H].
K (int): Number to be kept. Defaults to 80.
K (int, optional): Number to be kept. Defaults to 80.
Returns:
tuple[torch.Tensor]
......@@ -135,9 +136,9 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
dim (torch.Tensor): Dim of the boxes with the shape of
[B, 1, W, H].
vel (torch.Tensor): Velocity with the shape of [B, 1, W, H].
reg (torch.Tensor): Regression value of the boxes in 2D with
the shape of [B, 2, W, H]. Default: None.
task_id (int): Index of task. Default: -1.
reg (torch.Tensor, optional): Regression value of the boxes in
2D with the shape of [B, 2, W, H]. Default: None.
task_id (int, optional): Index of task. Default: -1.
Returns:
list[dict]: Decoded boxes.
......
......@@ -19,9 +19,9 @@ class DeltaXYZWLHRBBoxCoder(BaseBBoxCoder):
@staticmethod
def encode(src_boxes, dst_boxes):
"""Get box regression transformation deltas (dx, dy, dz, dw, dh, dl,
dr, dv*) that can be used to transform the `src_boxes` into the
`target_boxes`.
"""Get box regression transformation deltas (dx, dy, dz, dx_size,
dy_size, dz_size, dr, dv*) that can be used to transform the
`src_boxes` into the `target_boxes`.
Args:
src_boxes (torch.Tensor): source boxes, e.g., object proposals.
......@@ -56,13 +56,13 @@ class DeltaXYZWLHRBBoxCoder(BaseBBoxCoder):
@staticmethod
def decode(anchors, deltas):
"""Apply transformation `deltas` (dx, dy, dz, dw, dh, dl, dr, dv*) to
`boxes`.
"""Apply transformation `deltas` (dx, dy, dz, dx_size, dy_size,
dz_size, dr, dv*) to `boxes`.
Args:
anchors (torch.Tensor): Parameters of anchors with shape (N, 7).
deltas (torch.Tensor): Encoded boxes with shape
(N, 7+n) [x, y, z, w, l, h, r, velo*].
(N, 7+n) [x, y, z, x_size, y_size, z_size, r, velo*].
Returns:
torch.Tensor: Decoded boxes.
......
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
from ..structures import limit_period
@BBOX_CODERS.register_module()
class FCOS3DBBoxCoder(BaseBBoxCoder):
"""Bounding box coder for FCOS3D.
Args:
base_depths (tuple[tuple[float]]): Depth references for decoding box
depth. Defaults to None.
base_dims (tuple[tuple[float]]): Dimension references for decoding box
dimensions. Defaults to None.
code_size (int): The dimension of boxes to be encoded. Defaults to 7.
norm_on_bbox (bool): Whether to apply normalization on the bounding
box 2D attributes. Defaults to True.
"""
def __init__(self,
base_depths=None,
base_dims=None,
code_size=7,
norm_on_bbox=True):
super(FCOS3DBBoxCoder, self).__init__()
self.base_depths = base_depths
self.base_dims = base_dims
self.bbox_code_size = code_size
self.norm_on_bbox = norm_on_bbox
def encode(self, gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels):
# TODO: refactor the encoder in the FCOS3D and PGD head
pass
def decode(self, bbox, scale, stride, training, cls_score=None):
"""Decode regressed results into 3D predictions.
Note that offsets are not transformed to the projected 3D centers.
Args:
bbox (torch.Tensor): Raw bounding box predictions in shape
[N, C, H, W].
scale (tuple[`Scale`]): Learnable scale parameters.
stride (int): Stride for a specific feature level.
training (bool): Whether the decoding is in the training
procedure.
cls_score (torch.Tensor): Classification score map for deciding
which base depth or dim is used. Defaults to None.
Returns:
torch.Tensor: Decoded boxes.
"""
# scale the bbox of different level
# only apply to offset, depth and size prediction
scale_offset, scale_depth, scale_size = scale[0:3]
clone_bbox = bbox.clone()
bbox[:, :2] = scale_offset(clone_bbox[:, :2]).float()
bbox[:, 2] = scale_depth(clone_bbox[:, 2]).float()
bbox[:, 3:6] = scale_size(clone_bbox[:, 3:6]).float()
if self.base_depths is None:
bbox[:, 2] = bbox[:, 2].exp()
elif len(self.base_depths) == 1: # only single prior
mean = self.base_depths[0][0]
std = self.base_depths[0][1]
bbox[:, 2] = mean + bbox.clone()[:, 2] * std
else: # multi-class priors
assert len(self.base_depths) == cls_score.shape[1], \
'The number of multi-class depth priors should be equal to ' \
'the number of categories.'
indices = cls_score.max(dim=1)[1]
depth_priors = cls_score.new_tensor(
self.base_depths)[indices, :].permute(0, 3, 1, 2)
mean = depth_priors[:, 0]
std = depth_priors[:, 1]
bbox[:, 2] = mean + bbox.clone()[:, 2] * std
bbox[:, 3:6] = bbox[:, 3:6].exp()
if self.base_dims is not None:
assert len(self.base_dims) == cls_score.shape[1], \
'The number of anchor sizes should be equal to the number ' \
'of categories.'
indices = cls_score.max(dim=1)[1]
size_priors = cls_score.new_tensor(
self.base_dims)[indices, :].permute(0, 3, 1, 2)
bbox[:, 3:6] = size_priors * bbox.clone()[:, 3:6]
assert self.norm_on_bbox is True, 'Setting norm_on_bbox to False '\
'has not been thoroughly tested for FCOS3D.'
if self.norm_on_bbox:
if not training:
# Note that this line is conducted only when testing
bbox[:, :2] *= stride
return bbox
@staticmethod
def decode_yaw(bbox, centers2d, dir_cls, dir_offset, cam2img):
"""Decode yaw angle and change it from local to global.i.
Args:
bbox (torch.Tensor): Bounding box predictions in shape
[N, C] with yaws to be decoded.
centers2d (torch.Tensor): Projected 3D-center on the image planes
corresponding to the box predictions.
dir_cls (torch.Tensor): Predicted direction classes.
dir_offset (float): Direction offset before dividing all the
directions into several classes.
cam2img (torch.Tensor): Camera intrinsic matrix in shape [4, 4].
Returns:
torch.Tensor: Bounding boxes with decoded yaws.
"""
if bbox.shape[0] > 0:
dir_rot = limit_period(bbox[..., 6] - dir_offset, 0, np.pi)
bbox[..., 6] = \
dir_rot + dir_offset + np.pi * dir_cls.to(bbox.dtype)
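# Adding the viewing-ray angle of the projected 3D center converts the
# local (allocentric) yaw decoded above into the global (egocentric) yaw.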
bbox[:, 6] = torch.atan2(centers2d[:, 0] - cam2img[0, 2],
cam2img[0, 0]) + bbox[:, 6]
return bbox
......@@ -14,9 +14,10 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
num_dir_bins (int): Number of bins to encode direction angle.
num_sizes (int): Number of size clusters.
mean_sizes (list[list[int]]): Mean size of bboxes in each class.
with_rot (bool): Whether the bbox is with rotation. Defaults to True.
size_cls_agnostic (bool): Whether the predicted size is class-agnostic.
with_rot (bool, optional): Whether the bbox is with rotation.
Defaults to True.
size_cls_agnostic (bool, optional): Whether the predicted size is
class-agnostic. Defaults to True.
"""
def __init__(self,
......@@ -36,7 +37,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
"""Encode ground truth to prediction targets.
Args:
gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes \
gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes
with shape (n, 7).
gt_labels_3d (torch.Tensor): Ground truth classes.
......@@ -76,7 +77,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
- size_class: predicted bbox size class.
- size_res: predicted bbox size residual.
- size: predicted class-agnostic bbox size
prefix (str): Decode predictions with specific prefix.
prefix (str, optional): Decode predictions with specific prefix.
Defaults to ''.
Returns:
......@@ -122,7 +123,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
cls_preds (torch.Tensor): Class predicted features to split.
reg_preds (torch.Tensor): Regression predicted features to split.
base_xyz (torch.Tensor): Coordinates of points.
prefix (str): Decode predictions with specific prefix.
prefix (str, optional): Decode predictions with specific prefix.
Defaults to ''.
Returns:
......
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from torch.nn import functional as F
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
@BBOX_CODERS.register_module()
class MonoFlexCoder(BaseBBoxCoder):
"""Bbox Coder for MonoFlex.
Args:
depth_mode (str): The mode for depth calculation.
Available options are "linear", "inv_sigmoid", and "exp".
base_depth (tuple[float]): References for decoding box depth.
depth_range (list): Depth range of predicted depth.
combine_depth (bool): Whether to use combined depth (direct depth
and depth from keypoints) or use direct depth only.
uncertainty_range (list): Uncertainty range of predicted depth.
base_dims (tuple[tuple[float]]): Mean and standard deviation of bbox
dimensions [l, h, w] for each category, used to decode the dimensions.
dims_mode (str): The mode for dimension calculation.
Available options are "linear" and "exp".
multibin (bool): Whether to use multibin representation.
num_dir_bins (int): Number of bins to encode the
direction angle.
bin_centers (list[float]): Local yaw centers while using multibin
representations.
bin_margin (float): Margin of multibin representations.
code_size (int): The dimension of boxes to be encoded.
eps (float, optional): A value added to the denominator for numerical
stability. Default 1e-3.
"""
def __init__(self,
depth_mode,
base_depth,
depth_range,
combine_depth,
uncertainty_range,
base_dims,
dims_mode,
multibin,
num_dir_bins,
bin_centers,
bin_margin,
code_size,
eps=1e-3):
super(MonoFlexCoder, self).__init__()
# depth related
self.depth_mode = depth_mode
self.base_depth = base_depth
self.depth_range = depth_range
self.combine_depth = combine_depth
self.uncertainty_range = uncertainty_range
# dimensions related
self.base_dims = base_dims
self.dims_mode = dims_mode
# orientation related
self.multibin = multibin
self.num_dir_bins = num_dir_bins
self.bin_centers = bin_centers
self.bin_margin = bin_margin
# output related
self.bbox_code_size = code_size
self.eps = eps
def encode(self, gt_bboxes_3d):
"""Encode ground truth to prediction targets.
Args:
gt_bboxes_3d (`BaseInstance3DBoxes`): Ground truth 3D bboxes.
shape: (N, 7).
Returns:
torch.Tensor: Targets of orientations.
"""
local_yaw = gt_bboxes_3d.local_yaw
# encode local yaw (-pi ~ pi) to multibin format
encode_local_yaw = local_yaw.new_zeros(
[local_yaw.shape[0], self.num_dir_bins * 2])
bin_size = 2 * np.pi / self.num_dir_bins
margin_size = bin_size * self.bin_margin
bin_centers = local_yaw.new_tensor(self.bin_centers)
range_size = bin_size / 2 + margin_size
offsets = local_yaw.unsqueeze(1) - bin_centers.unsqueeze(0)
offsets[offsets > np.pi] = offsets[offsets > np.pi] - 2 * np.pi
offsets[offsets < -np.pi] = offsets[offsets < -np.pi] + 2 * np.pi
for i in range(self.num_dir_bins):
offset = offsets[:, i]
inds = abs(offset) < range_size
encode_local_yaw[inds, i] = 1
encode_local_yaw[inds, i + self.num_dir_bins] = offset[inds]
orientation_target = encode_local_yaw
return orientation_target
def decode(self, bbox, base_centers2d, labels, downsample_ratio, cam2imgs):
"""Decode bounding box regression into 3D predictions.
Args:
bbox (Tensor): Raw bounding box predictions for each predicted
center2d point.
shape: (N, C)
base_centers2d (torch.Tensor): Base centers2d for 3D bboxes.
shape: (N, 2).
labels (Tensor): Predicted class label for each predicted
center2d point.
shape: (N, )
downsample_ratio (int): The stride of feature map.
cam2imgs (Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
Return:
dict: The 3D prediction dict decoded from regression map.
The dict has the components below:
- bboxes2d (torch.Tensor): Decoded [x1, y1, x2, y2] format
2D bboxes.
- dimensions (torch.Tensor): Decoded dimensions for each
object.
- offsets2d (torch.Tensor): Offsets between base centers2d
and real centers2d.
- direct_depth (torch.Tensor): Decoded directly regressed
depth.
- keypoints2d (torch.Tensor): Keypoints of each projected
3D box on image.
- keypoints_depth (torch.Tensor): Decoded depth from keypoints.
- combined_depth (torch.Tensor): Combined depth using direct
depth and keypoints depth with depth uncertainty.
- orientations (torch.Tensor): Multibin format orientations
(local yaw) for each object.
"""
# 4 dimensions for FCOS style regression
pred_bboxes2d = bbox[:, 0:4]
# change FCOS style to [x1, y1, x2, y2] format for IOU Loss
pred_bboxes2d = self.decode_bboxes2d(pred_bboxes2d, base_centers2d)
# 2 dimensions for projected centers2d offsets
pred_offsets2d = bbox[:, 4:6]
# 3 dimensions for 3D bbox dimensions offsets
pred_dimensions_offsets3d = bbox[:, 29:32]
# the first 8 dimensions are for orientation bin classification
# and the second 8 dimensions are for orientation offsets.
pred_orientations = torch.cat((bbox[:, 32:40], bbox[:, 40:48]), dim=1)
# 3 dimensions for the uncertainties of the solved depths from
# groups of keypoints
pred_keypoints_depth_uncertainty = bbox[:, 26:29]
# 1 dimension for the uncertainty of directly regressed depth
pred_direct_depth_uncertainty = bbox[:, 49:50].squeeze(-1)
# 2-dimensional offsets for each of the 10 keypoints (8 corners + top/bottom centers)
pred_keypoints2d = bbox[:, 6:26].reshape(-1, 10, 2)
# 1 dimension for depth offsets
pred_direct_depth_offsets = bbox[:, 48:49].squeeze(-1)
# decode the pred residual dimensions to real dimensions
pred_dimensions = self.decode_dims(labels, pred_dimensions_offsets3d)
pred_direct_depth = self.decode_direct_depth(pred_direct_depth_offsets)
pred_keypoints_depth = self.keypoints2depth(pred_keypoints2d,
pred_dimensions, cam2imgs,
downsample_ratio)
pred_direct_depth_uncertainty = torch.clamp(
pred_direct_depth_uncertainty, self.uncertainty_range[0],
self.uncertainty_range[1])
pred_keypoints_depth_uncertainty = torch.clamp(
pred_keypoints_depth_uncertainty, self.uncertainty_range[0],
self.uncertainty_range[1])
if self.combine_depth:
pred_depth_uncertainty = torch.cat(
(pred_direct_depth_uncertainty.unsqueeze(-1),
pred_keypoints_depth_uncertainty),
dim=1).exp()
pred_depth = torch.cat(
(pred_direct_depth.unsqueeze(-1), pred_keypoints_depth), dim=1)
pred_combined_depth = \
self.combine_depths(pred_depth, pred_depth_uncertainty)
else:
pred_combined_depth = None
preds = dict(
bboxes2d=pred_bboxes2d,
dimensions=pred_dimensions,
offsets2d=pred_offsets2d,
keypoints2d=pred_keypoints2d,
orientations=pred_orientations,
direct_depth=pred_direct_depth,
keypoints_depth=pred_keypoints_depth,
combined_depth=pred_combined_depth,
direct_depth_uncertainty=pred_direct_depth_uncertainty,
keypoints_depth_uncertainty=pred_keypoints_depth_uncertainty,
)
return preds
def decode_direct_depth(self, depth_offsets):
"""Transform depth offset to directly regressed depth.
Args:
depth_offsets (torch.Tensor): Predicted depth offsets.
shape: (N, )
Return:
torch.Tensor: Directly regressed depth.
shape: (N, )
"""
if self.depth_mode == 'exp':
direct_depth = depth_offsets.exp()
elif self.depth_mode == 'linear':
base_depth = depth_offsets.new_tensor(self.base_depth)
direct_depth = depth_offsets * base_depth[1] + base_depth[0]
elif self.depth_mode == 'inv_sigmoid':
direct_depth = 1 / torch.sigmoid(depth_offsets) - 1
else:
raise ValueError
if self.depth_range is not None:
direct_depth = torch.clamp(
direct_depth, min=self.depth_range[0], max=self.depth_range[1])
return direct_depth
def decode_location(self,
base_centers2d,
offsets2d,
depths,
cam2imgs,
downsample_ratio,
pad_mode='default'):
"""Retrieve object location.
Args:
base_centers2d (torch.Tensor): predicted base centers2d.
shape: (N, 2)
offsets2d (torch.Tensor): The offsets between real centers2d
and base centers2d.
shape: (N , 2)
depths (torch.Tensor): Depths of objects.
shape: (N, )
cam2imgs (torch.Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
downsample_ratio (int): The stride of feature map.
pad_mode (str, optional): Padding mode used in
training data augmentation.
Return:
tuple(torch.Tensor): Centers of 3D boxes.
shape: (N, 3)
"""
N = cam2imgs.shape[0]
# (N, 4, 4)
cam2imgs_inv = cam2imgs.inverse()
if pad_mode == 'default':
centers2d_img = (base_centers2d + offsets2d) * downsample_ratio
else:
raise NotImplementedError
# (N, 3)
centers2d_img = \
torch.cat((centers2d_img, depths.unsqueeze(-1)), dim=1)
# (N, 4, 1)
centers2d_extend = \
torch.cat((centers2d_img, centers2d_img.new_ones(N, 1)),
dim=1).unsqueeze(-1)
locations = torch.matmul(cam2imgs_inv, centers2d_extend).squeeze(-1)
return locations[:, :3]
def keypoints2depth(self,
keypoints2d,
dimensions,
cam2imgs,
downsample_ratio=4,
group0_index=[(7, 3), (0, 4)],
group1_index=[(2, 6), (1, 5)]):
"""Decode depth form three groups of keypoints and geometry projection
model. 2D keypoints inlucding 8 coreners and top/bottom centers will be
divided into three groups which will be used to calculate three depths
of object.
.. code-block:: none
Group center keypoints:
+ --------------- +
/| top center /|
/ | . / |
/ | | / |
+ ---------|----- + +
| / | | /
| / . | /
|/ bottom center |/
+ --------------- +
Group 0 keypoints:
0
+ -------------- +
/| /|
/ | / |
/ | 5/ |
+ -------------- + +
| /3 | /
| / | /
|/ |/
+ -------------- + 6
Group 1 keypoints:
4
+ -------------- +
/| /|
/ | / |
/ | / |
1 + -------------- + + 7
| / | /
| / | /
|/ |/
2 + -------------- +
Args:
keypoints2d (torch.Tensor): Keypoints of objects.
8 vertices + top/bottom center.
shape: (N, 10, 2)
dimensions (torch.Tensor): Dimensions of objects.
shape: (N, 3)
cam2imgs (torch.Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
downsample_ratio (int, optional): The stride of the feature map.
Defaults to 4.
group0_index (list[tuple[int]], optional): Keypoint index pairs of
group 0 used to calculate the depth.
Defaults to [(7, 3), (0, 4)].
group1_index (list[tuple[int]], optional): Keypoint index pairs of
group 1 used to calculate the depth.
Defaults to [(2, 6), (1, 5)].
Return:
tuple(torch.Tensor): Depth computed from three groups of
keypoints (top/bottom, group0, group1)
shape: (N, 3)
"""
pred_height_3d = dimensions[:, 1].clone()
f_u = cam2imgs[:, 0, 0]
center_height = keypoints2d[:, -2, 1] - keypoints2d[:, -1, 1]
corner_group0_height = keypoints2d[:, group0_index[0], 1] \
- keypoints2d[:, group0_index[1], 1]
corner_group1_height = keypoints2d[:, group1_index[0], 1] \
- keypoints2d[:, group1_index[1], 1]
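# Pinhole geometry: an object of real height H at depth Z spans
# h = f_u * H / Z pixels in the image, so Z = f_u * H / h. The keypoint
# heights are measured on the feature map, hence the downsample_ratio factor.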
center_depth = f_u * pred_height_3d / (
F.relu(center_height) * downsample_ratio + self.eps)
corner_group0_depth = (f_u * pred_height_3d).unsqueeze(-1) / (
F.relu(corner_group0_height) * downsample_ratio + self.eps)
corner_group1_depth = (f_u * pred_height_3d).unsqueeze(-1) / (
F.relu(corner_group1_height) * downsample_ratio + self.eps)
corner_group0_depth = corner_group0_depth.mean(dim=1)
corner_group1_depth = corner_group1_depth.mean(dim=1)
keypoints_depth = torch.stack(
(center_depth, corner_group0_depth, corner_group1_depth), dim=1)
keypoints_depth = torch.clamp(
keypoints_depth, min=self.depth_range[0], max=self.depth_range[1])
return keypoints_depth
def decode_dims(self, labels, dims_offset):
"""Retrieve object dimensions.
Args:
labels (torch.Tensor): Each points' category id.
shape: (N, K)
dims_offset (torch.Tensor): Dimension offsets.
shape: (N, 3)
Returns:
torch.Tensor: Shape (N, 3)
"""
if self.dims_mode == 'exp':
dims_offset = dims_offset.exp()
elif self.dims_mode == 'linear':
labels = labels.long()
base_dims = dims_offset.new_tensor(self.base_dims)
dims_mean = base_dims[:, :3]
dims_std = base_dims[:, 3:6]
cls_dimension_mean = dims_mean[labels, :]
cls_dimension_std = dims_std[labels, :]
dimensions = dims_offset * cls_dimension_mean + cls_dimension_std
else:
raise ValueError
return dimensions
def decode_orientation(self, ori_vector, locations):
"""Retrieve object orientation.
Args:
ori_vector (torch.Tensor): Local orientation vector
in [axis_cls, head_cls, sin, cos] format.
shape: (N, num_dir_bins * 4)
locations (torch.Tensor): Object location.
shape: (N, 3)
Returns:
tuple[torch.Tensor]: yaws and local yaws of 3d bboxes.
"""
if self.multibin:
pred_bin_cls = ori_vector[:, :self.num_dir_bins * 2].view(
-1, self.num_dir_bins, 2)
pred_bin_cls = pred_bin_cls.softmax(dim=2)[..., 1]
orientations = ori_vector.new_zeros(ori_vector.shape[0])
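# For each sample, pick the direction bin with the highest confidence and
# recover its residual angle from the predicted (sin, cos) pair below.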
for i in range(self.num_dir_bins):
mask_i = (pred_bin_cls.argmax(dim=1) == i)
start_bin = self.num_dir_bins * 2 + i * 2
end_bin = start_bin + 2
pred_bin_offset = ori_vector[mask_i, start_bin:end_bin]
orientations[mask_i] = pred_bin_offset[:, 0].atan2(
pred_bin_offset[:, 1]) + self.bin_centers[i]
else:
axis_cls = ori_vector[:, :2].softmax(dim=1)
axis_cls = axis_cls[:, 0] < axis_cls[:, 1]
head_cls = ori_vector[:, 2:4].softmax(dim=1)
head_cls = head_cls[:, 0] < head_cls[:, 1]
# cls axis
orientations = self.bin_centers[axis_cls + head_cls * 2]
sin_cos_offset = F.normalize(ori_vector[:, 4:])
orientations += sin_cos_offset[:, 0].atan2(sin_cos_offset[:, 1])
locations = locations.view(-1, 3)
rays = locations[:, 0].atan2(locations[:, 2])
local_yaws = orientations
yaws = local_yaws + rays
larger_idx = (yaws > np.pi).nonzero(as_tuple=False)
small_idx = (yaws < -np.pi).nonzero(as_tuple=False)
if len(larger_idx) != 0:
yaws[larger_idx] -= 2 * np.pi
if len(small_idx) != 0:
yaws[small_idx] += 2 * np.pi
larger_idx = (local_yaws > np.pi).nonzero(as_tuple=False)
small_idx = (local_yaws < -np.pi).nonzero(as_tuple=False)
if len(larger_idx) != 0:
local_yaws[larger_idx] -= 2 * np.pi
if len(small_idx) != 0:
local_yaws[small_idx] += 2 * np.pi
return yaws, local_yaws
def decode_bboxes2d(self, reg_bboxes2d, base_centers2d):
"""Retrieve [x1, y1, x2, y2] format 2D bboxes.
Args:
reg_bboxes2d (torch.Tensor): Predicted FCOS style
2D bboxes.
shape: (N, 4)
base_centers2d (torch.Tensor): predicted base centers2d.
shape: (N, 2)
Returns:
torch.Tensor: [x1, y1, x2, y2] format 2D bboxes.
"""
centers_x = base_centers2d[:, 0]
centers_y = base_centers2d[:, 1]
xs_min = centers_x - reg_bboxes2d[..., 0]
ys_min = centers_y - reg_bboxes2d[..., 1]
xs_max = centers_x + reg_bboxes2d[..., 2]
ys_max = centers_y + reg_bboxes2d[..., 3]
bboxes2d = torch.stack([xs_min, ys_min, xs_max, ys_max], dim=-1)
return bboxes2d
def combine_depths(self, depth, depth_uncertainty):
"""Combine all the prediced depths with depth uncertainty.
Args:
depth (torch.Tensor): Predicted depths of each object.
shape: (N, 4)
depth_uncertainty (torch.Tensor): Depth uncertainty for
each depth of each object.
shape: (N, 4)
Returns:
torch.Tensor: Combined depth.
"""
uncertainty_weights = 1 / depth_uncertainty
uncertainty_weights = \
uncertainty_weights / \
uncertainty_weights.sum(dim=1, keepdim=True)
combined_depth = torch.sum(depth * uncertainty_weights, dim=1)
return combined_depth