Unverified commit 32a4328b, authored by Wenwei Zhang and committed by GitHub

Bump version to V1.0.0rc0

parents 86cc487c a8817998
...@@ -19,12 +19,6 @@
**Note**: We have fully supported pycocotools since version 0.13.0.
- If you face the following issue and your environment contains numba == 0.48.0 and numpy >= 1.20.0:
``TypeError: expected dtype object, got 'numpy.dtype[bool_]'``
please downgrade numpy to < 1.20.0 or install numba == 0.48 from source, because numpy == 1.20.0 changed its API so that calling `np.dtype` produces a subclass. Please refer to [here](https://github.com/numba/numba/issues/6041) for more details.
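For example, the first workaround is a simple version pin (a sketch assuming a pip-managed environment):
```shell
# Pin numpy below 1.20 so that numba 0.48.0 keeps working
pip install "numpy<1.20.0"
```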
- If you face the following issue when importing pycocotools-related packages:
``ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject``
......
...@@ -10,6 +10,7 @@
| MMDetection3D version | MMDetection version | MMSegmentation version | MMCV version |
|:-------------------:|:-------------------:|:-------------------:|:-------------------:|
| master | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| v1.0.0rc0 | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| 0.18.1 | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| 0.18.0 | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| 0.17.3 | mmdet>=2.14.0, <=3.0.0| mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0|
......
...@@ -75,3 +75,31 @@
### ImVoxelNet
Please refer to [ImVoxelNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/imvoxelnet) for more details. We provide the corresponding results on the KITTI dataset.
### PAConv
Please refer to [PAConv](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/paconv) for more details. We provide the corresponding results on the S3DIS dataset.
### DGCNN
Please refer to [DGCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/dgcnn) for more details. We provide the corresponding results on the S3DIS dataset.
### SMOKE
Please refer to [SMOKE](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/smoke) for more details. We provide the corresponding results on the KITTI dataset.
### PGD
Please refer to [PGD](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pgd) for more details. We provide the corresponding results on the KITTI and nuScenes datasets.
### PointRCNN
Please refer to [PointRCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/point_rcnn) for more details. We provide the corresponding results on the KITTI dataset.
### MonoFlex
Please refer to [MonoFlex](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/monoflex) for more details. We provide the corresponding results on the KITTI dataset.
### Mixed Precision (FP16) Training
Please refer to the example of [Mixed Precision (FP16) Training on PointPillars](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pointpillars/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d.py) for more details.
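For reference, launching a run with that FP16 config follows the usual training entry point (a sketch; adjust the config path to your checkout):
```shell
# Train PointPillars with the FP16 config referenced above (single GPU)
python tools/train.py configs/pointpillars/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d.py
```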
#!/usr/bin/env python
import functools as func
import glob
import numpy as np
import re
from os import path as osp

url_prefix = 'https://github.com/open-mmlab/mmdetection3d/blob/master/'

files = sorted(glob.glob('../configs/*/README.md'))
......
...@@ -6,3 +6,4 @@
data_pipeline.md
customize_models.md
customize_runtime.md
coord_sys_tutorial.md
...@@ -71,7 +71,7 @@ python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --show --show-dir ${SHOW_DIR}
python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --eval 'mAP' --eval-options 'show=True' 'out_dir=${SHOW_DIR}'
```
After running this command, you will get the input data, the network outputs and ground-truth labels visualized on the input (e.g. `***_points.obj`, `***_pred.obj`, `***_gt.obj`, `***_img.png`, `***_pred.png` in multi-modality detection tasks) in `${SHOW_DIR}`. When `show` is enabled, [Open3D](http://www.open3d.org/) is used to visualize the results online. When you run the test on a remote server without a GUI, online visualization is not possible; in that case you can set `show=False` to save the output results in `${SHOW_DIR}`.
As for offline visualization, you have two options.
To visualize the results with the `Open3D` backend, you can run the following command
...@@ -97,6 +97,12 @@ python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py -
**Note**: Once `--output-dir` is specified, the images of the views specified by users will be saved when pressing `_ESC_` in the open3d window. If you don't have a monitor, you can remove the `--online` flag to only save the visualization results and browse them offline.
To verify the consistency of the data and the effect of data augmentation, you can also add the `--aug` flag to visualize the augmented data with the following command:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task det --aug --output-dir ${OUTPUT_DIR} --online
```
If you also want to show 2D images with the projected 3D bounding boxes, you need to find a config that supports multi-modality data loading, and then change the `--task` argument to `multi_modality-det`. An example is shown below
```shell
...@@ -123,6 +129,64 @@ python tools/misc/browse_dataset.py configs/_base_/datasets/nus-mono3d.py --task
&emsp;
# Model Deployment
**Note**: This tool is still experimental. For now, only SECOND supports being deployed with [`TorchServe`](https://pytorch.org/serve/). We will support more models in the future.
To deploy an `MMDetection3D` model with [`TorchServe`](https://pytorch.org/serve/), you can follow the steps below:
## 1. Convert the model from MMDetection3D to TorchServe
```shell
python tools/deployment/mmdet3d2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
--output-folder ${MODEL_STORE} \
--model-name ${MODEL_NAME}
```
**Note**: `${MODEL_STORE}` needs to be an absolute path to a folder.
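For example (a hypothetical path; any writable directory works as long as the path is absolute):
```shell
# Create the model store and export it as an absolute path
mkdir -p /home/user/model-store
export MODEL_STORE=/home/user/model-store
```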
## 2. Build the `mmdet3d-serve` docker image
```shell
docker build -t mmdet3d-serve:latest docker/serve/
```
## 3. Run `mmdet3d-serve`
Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).
In order to run on a GPU, you need to install [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). You can omit the `--gpus` argument in order to run on the CPU.
Example:
```shell
docker run --rm \
--cpus 8 \
--gpus device=0 \
-p8080:8080 -p8081:8081 -p8082:8082 \
--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \
mmdet3d-serve:latest
```
[Read the docs](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md/) about the Inference (8080), Management (8081) and Metrics (8082) APIs.
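As a quick sanity check, you can also query the Inference API directly through TorchServe's standard REST endpoint (a sketch; the demo point cloud path is the one used in the test example below):
```shell
# Send a point cloud to the deployed model and print the JSON response
curl -X POST http://127.0.0.1:8080/predictions/${MODEL_NAME} -T demo/data/kitti/kitti_000008.bin
```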
## 4. Test deployment
You can use `test_torchserver.py` to test the deployed model and compare the results of TorchServe and PyTorch.
```shell
python tools/deployment/test_torchserver.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${MODEL_NAME}
[--inference-addr ${INFERENCE_ADDR}] [--device ${DEVICE}] [--score-thr ${SCORE_THR}]
```
Example:
```shell
python tools/deployment/test_torchserver.py demo/data/kitti/kitti_000008.bin configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth second
```
&emsp;
# Model Complexity
You can use `tools/analysis_tools/get_flops.py` in MMDetection, a script based on [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch), to compute the FLOPs and params of a given model.
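A typical invocation looks like the following (a sketch; `--shape` is optional and the input shape to pass depends on the model):
```shell
# Compute FLOPs and params for a given config
python tools/analysis_tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
```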
......
...@@ -4,10 +4,11 @@ from .inference import (convert_SyncBN, inference_detector,
                        inference_multi_modality_detector, inference_segmentor,
                        init_model, show_result_meshlab)
from .test import single_gpu_test
from .train import init_random_seed, train_model

__all__ = [
    'inference_detector', 'init_model', 'single_gpu_test',
    'inference_mono_3d_detector', 'show_result_meshlab', 'convert_SyncBN',
    'train_model', 'inference_multi_modality_detector', 'inference_segmentor',
    'init_random_seed'
]
# Copyright (c) OpenMMLab. All rights reserved.
import re
from copy import deepcopy
from os import path as osp

import mmcv
import numpy as np
import torch
from mmcv.parallel import collate, scatter
from mmcv.runner import load_checkpoint

from mmdet3d.core import (Box3DMode, CameraInstance3DBoxes, Coord3DMode,
                          DepthInstance3DBoxes, LiDARInstance3DBoxes,
                          show_multi_modality_result, show_result,
                          show_seg_result)
...@@ -83,26 +84,53 @@ def inference_detector(model, pcd):
    """
    cfg = model.cfg
    device = next(model.parameters()).device  # model device

    if not isinstance(pcd, str):
        cfg = cfg.copy()
        # set loading pipeline type
        cfg.data.test.pipeline[0].type = 'LoadPointsFromDict'

    # build the data pipeline
    test_pipeline = deepcopy(cfg.data.test.pipeline)
    test_pipeline = Compose(test_pipeline)
    box_type_3d, box_mode_3d = get_box_type(cfg.data.test.box_type_3d)

    if isinstance(pcd, str):
        # load from point clouds file
        data = dict(
            pts_filename=pcd,
            box_type_3d=box_type_3d,
            box_mode_3d=box_mode_3d,
            # for ScanNet demo we need axis_align_matrix
            ann_info=dict(axis_align_matrix=np.eye(4)),
            sweeps=[],
            # set timestamp = 0
            timestamp=[0],
            img_fields=[],
            bbox3d_fields=[],
            pts_mask_fields=[],
            pts_seg_fields=[],
            bbox_fields=[],
            mask_fields=[],
            seg_fields=[])
    else:
        # load from http
        data = dict(
            points=pcd,
            box_type_3d=box_type_3d,
            box_mode_3d=box_mode_3d,
            # for ScanNet demo we need axis_align_matrix
            ann_info=dict(axis_align_matrix=np.eye(4)),
            sweeps=[],
            # set timestamp = 0
            timestamp=[0],
            img_fields=[],
            bbox3d_fields=[],
            pts_mask_fields=[],
            pts_seg_fields=[],
            bbox_fields=[],
            mask_fields=[],
            seg_fields=[])
    data = test_pipeline(data)
    data = collate([data], samples_per_gpu=1)
    if next(model.parameters()).is_cuda:
...@@ -317,8 +345,7 @@ def show_det_result_meshlab(data,
    # for now we convert points into depth mode
    box_mode = data['img_metas'][0][0]['box_mode_3d']
    if box_mode != Box3DMode.DEPTH:
        points = Coord3DMode.convert(points, box_mode, Coord3DMode.DEPTH)
        show_bboxes = Box3DMode.convert(pred_bboxes, box_mode, Box3DMode.DEPTH)
    else:
        show_bboxes = deepcopy(pred_bboxes)
...@@ -462,15 +489,17 @@ def show_result_meshlab(data,
        data (dict): Contain data from pipeline.
        result (dict): Predicted result from model.
        out_dir (str): Directory to save visualized result.
        score_thr (float, optional): Minimum score of bboxes to be shown.
            Default: 0.0
        show (bool, optional): Visualize the results online. Defaults to False.
        snapshot (bool, optional): Whether to save the online results.
            Defaults to False.
        task (str, optional): Distinguish which task result to visualize.
            Currently we support 3D detection, multi-modality detection and
            3D segmentation. Defaults to 'det'.
        palette (list[list[int]] | np.ndarray, optional): The palette
            of segmentation map. If None is given, random palette will be
            generated. Defaults to None.
    """
    assert task in ['det', 'multi_modality-det', 'seg', 'mono-det'], \
        f'unsupported visualization task {task}'
......
# Copyright (c) OpenMMLab. All rights reserved.
from os import path as osp

import mmcv
import torch
from mmcv.image import tensor2imgs

from mmdet3d.models import (Base3DDetector, Base3DSegmentor,
                            SingleStageMono3DDetector)
...@@ -22,9 +23,9 @@ def single_gpu_test(model,
    Args:
        model (nn.Module): Model to be tested.
        data_loader (nn.Dataloader): Pytorch data loader.
        show (bool, optional): Whether to save visualization results.
            Default: True.
        out_dir (str, optional): The path to save visualization results.
            Default: None.
    Returns:
......
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from mmcv.runner import get_dist_info
from torch import distributed as dist

from mmdet.apis import train_detector
from mmseg.apis import train_segmentor


def init_random_seed(seed=None, device='cuda'):
    """Initialize random seed.

    If the seed is not set, the seed will be automatically randomized,
    and then broadcast to all processes to prevent some potential bugs.

    Args:
        seed (int, optional): The seed. Default to None.
        device (str, optional): The device where the seed will be put on.
            Default to 'cuda'.

    Returns:
        int: Seed to be used.
    """
    if seed is not None:
        return seed

    # Make sure all ranks share the same random seed to prevent
    # some potential bugs. Please refer to
    # https://github.com/open-mmlab/mmdetection/issues/6339
    rank, world_size = get_dist_info()
    seed = np.random.randint(2**31)
    if world_size == 1:
        return seed

    if rank == 0:
        random_num = torch.tensor(seed, dtype=torch.int32, device=device)
    else:
        random_num = torch.tensor(0, dtype=torch.int32, device=device)
    dist.broadcast(random_num, src=0)
    return random_num.item()


def train_model(model,
                dataset,
                cfg,
......
...@@ -19,20 +19,26 @@ class Anchor3DRangeGenerator(object):
        ranges (list[list[float]]): Ranges of different anchors.
            The ranges are the same across different feature levels. But may
            vary for different anchor sizes if size_per_range is True.
        sizes (list[list[float]], optional): 3D sizes of anchors.
            Defaults to [[3.9, 1.6, 1.56]].
        scales (list[int], optional): Scales of anchors in different feature
            levels. Defaults to [1].
        rotations (list[float], optional): Rotations of anchors in a feature
            grid. Defaults to [0, 1.5707963].
        custom_values (tuple[float], optional): Customized values of that
            anchor. For example, in nuScenes the anchors have velocities.
            Defaults to ().
        reshape_out (bool, optional): Whether to reshape the output into
            (N x 4). Defaults to True.
        size_per_range (bool, optional): Whether to use separate ranges for
            different sizes. If size_per_range is True, the ranges should have
            the same length as the sizes, if not, it will be duplicated.
            Defaults to True.
    """

    def __init__(self,
                 ranges,
                 sizes=[[3.9, 1.6, 1.56]],
                 scales=[1],
                 rotations=[0, 1.5707963],
                 custom_values=(),
...@@ -86,13 +92,14 @@ class Anchor3DRangeGenerator(object):
        Args:
            featmap_sizes (list[tuple]): List of feature map sizes in
                multiple feature levels.
            device (str, optional): Device where the anchors will be put on.
                Defaults to 'cuda'.

        Returns:
            list[torch.Tensor]: Anchors in multiple feature levels.
                The sizes of each tensor should be [N, 4], where
                N = width * height * num_base_anchors, width and height
                are the sizes of the corresponding feature level,
                num_base_anchors is the number of anchors for that level.
        """
        assert self.num_levels == len(featmap_sizes)
...@@ -149,7 +156,7 @@ class Anchor3DRangeGenerator(object):
                              feature_size,
                              anchor_range,
                              scale=1,
                              sizes=[[3.9, 1.6, 1.56]],
                              rotations=[0, 1.5707963],
                              device='cuda'):
        """Generate anchors in a single range.
...@@ -161,14 +168,18 @@ class Anchor3DRangeGenerator(object):
                shape [6]. The order is consistent with that of anchors, i.e.,
                (x_min, y_min, z_min, x_max, y_max, z_max).
            scale (float | int, optional): The scale factor of anchors.
                Defaults to 1.
            sizes (list[list] | np.ndarray | torch.Tensor, optional):
                Anchor size with shape [N, 3], in order of x, y, z.
                Defaults to [[3.9, 1.6, 1.56]].
            rotations (list[float] | np.ndarray | torch.Tensor, optional):
                Rotations of anchors in a single feature grid.
                Defaults to [0, 1.5707963].
            device (str): Devices that the anchors will be put on.
                Defaults to 'cuda'.

        Returns:
            torch.Tensor: Anchors with shape
                [*feature_size, num_sizes, num_rots, 7].
        """
        if len(feature_size) == 2:
...@@ -231,10 +242,10 @@ class AlignedAnchor3DRangeGenerator(Anchor3DRangeGenerator):
    up corner to distribute anchors.

    Args:
        anchor_corner (bool, optional): Whether to align with the corner of the
            voxel grid. By default it is False and the anchor's center will be
            the same as the corresponding voxel's center, which is also the
            center of the corresponding feature grid. Defaults to False.
    """

    def __init__(self, align_corner=False, **kwargs):
...@@ -245,7 +256,7 @@ class AlignedAnchor3DRangeGenerator(Anchor3DRangeGenerator):
                              feature_size,
                              anchor_range,
                              scale,
                              sizes=[[3.9, 1.6, 1.56]],
                              rotations=[0, 1.5707963],
                              device='cuda'):
        """Generate anchors in a single range.
...@@ -256,15 +267,18 @@ class AlignedAnchor3DRangeGenerator(Anchor3DRangeGenerator):
            anchor_range (torch.Tensor | list[float]): Range of anchors with
                shape [6]. The order is consistent with that of anchors, i.e.,
                (x_min, y_min, z_min, x_max, y_max, z_max).
            scale (float | int): The scale factor of anchors.
            sizes (list[list] | np.ndarray | torch.Tensor, optional):
                Anchor size with shape [N, 3], in order of x, y, z.
                Defaults to [[3.9, 1.6, 1.56]].
            rotations (list[float] | np.ndarray | torch.Tensor, optional):
                Rotations of anchors in a single feature grid.
                Defaults to [0, 1.5707963].
            device (str, optional): Devices that the anchors will be put on.
                Defaults to 'cuda'.

        Returns:
            torch.Tensor: Anchors with shape
                [*feature_size, num_sizes, num_rots, 7].
        """
        if len(feature_size) == 2:
...@@ -334,7 +348,7 @@ class AlignedAnchor3DRangeGeneratorPerCls(AlignedAnchor3DRangeGenerator):
    Note that feature maps of different classes may be different.

    Args:
        kwargs (dict): Arguments are the same as those in
            :class:`AlignedAnchor3DRangeGenerator`.
    """
...@@ -347,15 +361,16 @@ class AlignedAnchor3DRangeGeneratorPerCls(AlignedAnchor3DRangeGenerator):
        """Generate grid anchors in multiple feature levels.

        Args:
            featmap_sizes (list[tuple]): List of feature map sizes for
                different classes in a single feature level.
            device (str, optional): Device where the anchors will be put on.
                Defaults to 'cuda'.

        Returns:
            list[list[torch.Tensor]]: Anchors in multiple feature levels.
                Note that in this anchor generator, we currently only
                support single feature level. The sizes of each tensor
                should be [num_sizes/ranges*num_rots*featmap_size,
                box_code_size].
        """
        multi_level_anchors = []
...@@ -371,7 +386,7 @@ class AlignedAnchor3DRangeGeneratorPerCls(AlignedAnchor3DRangeGenerator):
        This function is usually called by method ``self.grid_anchors``.

        Args:
            featmap_sizes (list[tuple]): List of feature map sizes for
                different classes in a single feature level.
            scale (float): Scale factor of the anchors in the current level.
            device (str, optional): Device the tensor will be put on.
......
...@@ -12,7 +12,8 @@ from .samplers import (BaseSampler, CombinedSampler,
from .structures import (BaseInstance3DBoxes, Box3DMode, CameraInstance3DBoxes,
                         Coord3DMode, DepthInstance3DBoxes,
                         LiDARInstance3DBoxes, get_box_type, limit_period,
                         mono_cam_box2vis, points_cam2img, points_img2cam,
                         xywhr2xyxyr)
from .transforms import bbox3d2result, bbox3d2roi, bbox3d_mapping_back

__all__ = [
...@@ -25,5 +26,5 @@ __all__ = [
    'LiDARInstance3DBoxes', 'CameraInstance3DBoxes', 'bbox3d2roi',
    'bbox3d2result', 'DepthInstance3DBoxes', 'BaseInstance3DBoxes',
    'bbox3d_mapping_back', 'xywhr2xyxyr', 'limit_period', 'points_cam2img',
    'points_img2cam', 'get_box_type', 'Coord3DMode', 'mono_cam_box2vis'
]
# Copyright (c) OpenMMLab. All rights reserved.
# TODO: clean the functions in this file and move the APIs into box structures
# in the future
# NOTICE: All functions in this file are valid for LiDAR or depth boxes only
# if we use default parameters.
import numba
import numpy as np

from .structures.utils import limit_period, points_cam2img, rotation_3d_in_axis


def camera_to_lidar(points, r_rect, velo2cam):
    """Convert points in camera coordinate to lidar coordinate.

    Note:
        This function is for KITTI only.

    Args:
        points (np.ndarray, shape=[N, 3]): Points in camera coordinate.
        r_rect (np.ndarray, shape=[4, 4]): Matrix to project points in
...@@ -27,7 +34,10 @@ def camera_to_lidar(points, r_rect, velo2cam):


def box_camera_to_lidar(data, r_rect, velo2cam):
    """Convert boxes in camera coordinate to lidar coordinate.

    Note:
        This function is for KITTI only.

    Args:
        data (np.ndarray, shape=[N, 7]): Boxes in camera coordinate.
...@@ -40,10 +50,13 @@ def box_camera_to_lidar(data, r_rect, velo2cam):
        np.ndarray, shape=[N, 3]: Boxes in lidar coordinate.
    """
    xyz = data[:, 0:3]
    x_size, y_size, z_size = data[:, 3:4], data[:, 4:5], data[:, 5:6]
    r = data[:, 6:7]
    xyz_lidar = camera_to_lidar(xyz, r_rect, velo2cam)
    # yaw and dims also need to be converted
    r_new = -r - np.pi / 2
    r_new = limit_period(r_new, period=np.pi * 2)
    return np.concatenate([xyz_lidar, x_size, z_size, y_size, r_new], axis=1)


def corners_nd(dims, origin=0.5):
...@@ -80,26 +93,9 @@ def corners_nd(dims, origin=0.5):
    return corners
def rotation_2d(points, angles):
    """Rotation 2d points based on origin point clockwise when angle positive.

    Args:
        points (np.ndarray): Points to be rotated with shape \
            (N, point_size, 2).
        angles (np.ndarray): Rotation angle with shape (N).

    Returns:
        np.ndarray: Same shape as points.
    """
    rot_sin = np.sin(angles)
    rot_cos = np.cos(angles)
    rot_mat_T = np.stack([[rot_cos, -rot_sin], [rot_sin, rot_cos]])
    return np.einsum('aij,jka->aik', points, rot_mat_T)


def center_to_corner_box2d(centers, dims, angles=None, origin=0.5):
    """Convert kitti locations, dimensions and angles to corners.
    format: center(xy), dims(xy), angles(counterclockwise when positive)

    Args:
        centers (np.ndarray): Locations in kitti label file with shape (N, 2).
...@@ -118,7 +114,7 @@ def center_to_corner_box2d(centers, dims, angles=None, origin=0.5):
    corners = corners_nd(dims, origin=origin)
    # corners: [N, 4, 2]
    if angles is not None:
        corners = rotation_3d_in_axis(corners, angles)
    corners += centers.reshape([-1, 1, 2])
    return corners
...@@ -172,37 +168,6 @@ def depth_to_lidar_points(depth, trunc_pixel, P2, r_rect, velo2cam):
    return lidar_points


def rotation_3d_in_axis(points, angles, axis=0):
    """Rotate points in specific axis.

    Args:
        points (np.ndarray, shape=[N, point_size, 3]]):
        angles (np.ndarray, shape=[N]]):
        axis (int, optional): Axis to rotate at. Defaults to 0.

    Returns:
        np.ndarray: Rotated points.
    """
    # points: [N, point_size, 3]
    rot_sin = np.sin(angles)
    rot_cos = np.cos(angles)
    ones = np.ones_like(rot_cos)
    zeros = np.zeros_like(rot_cos)
    if axis == 1:
        rot_mat_T = np.stack([[rot_cos, zeros, -rot_sin], [zeros, ones, zeros],
                              [rot_sin, zeros, rot_cos]])
    elif axis == 2 or axis == -1:
        rot_mat_T = np.stack([[rot_cos, -rot_sin, zeros],
                              [rot_sin, rot_cos, zeros], [zeros, zeros, ones]])
    elif axis == 0:
        rot_mat_T = np.stack([[zeros, rot_cos, -rot_sin],
                              [zeros, rot_sin, rot_cos], [ones, zeros, zeros]])
    else:
        raise ValueError('axis should in range')
    return np.einsum('aij,jka->aik', points, rot_mat_T)


def center_to_corner_box3d(centers,
                           dims,
                           angles=None,
...@@ -225,7 +190,7 @@ def center_to_corner_box3d(centers,
        np.ndarray: Corners with the shape of (N, 8, 3).
    """
    # 'length' in kitti format is in x axis.
    # yzx(hwl)(kitti label file)<->xyz(lhw)(camera)<->z(-x)(-y)(lwh)(lidar)
    # center in kitti format is [0.5, 1.0, 0.5] in xyz.
    corners = corners_nd(dims, origin=origin)
    # corners: [N, 8, 3]
...@@ -259,8 +224,8 @@ def box2d_to_corner_jit(boxes):
        rot_sin = np.sin(boxes[i, -1])
        rot_cos = np.cos(boxes[i, -1])
        rot_mat_T[0, 0] = rot_cos
        rot_mat_T[0, 1] = rot_sin
        rot_mat_T[1, 0] = -rot_sin
        rot_mat_T[1, 1] = rot_cos
        box_corners[i] = corners[i] @ rot_mat_T + boxes[i, :2]
    return box_corners
...@@ -327,15 +292,15 @@ def rotation_points_single_angle(points, angle, axis=0):
    rot_cos = np.cos(angle)
    if axis == 1:
        rot_mat_T = np.array(
            [[rot_cos, 0, rot_sin], [0, 1, 0], [-rot_sin, 0, rot_cos]],
            dtype=points.dtype)
    elif axis == 2 or axis == -1:
        rot_mat_T = np.array(
            [[rot_cos, rot_sin, 0], [-rot_sin, rot_cos, 0], [0, 0, 1]],
            dtype=points.dtype)
    elif axis == 0:
        rot_mat_T = np.array(
            [[1, 0, 0], [0, rot_cos, rot_sin], [0, -rot_sin, rot_cos]],
            dtype=points.dtype)
    else:
        raise ValueError('axis should in range')
...@@ -343,44 +308,6 @@ def rotation_points_single_angle(points, angle, axis=0):
    return points @ rot_mat_T, rot_mat_T
def points_cam2img(points_3d, proj_mat, with_depth=False):
    """Project points in camera coordinates to image coordinates.

    Args:
        points_3d (np.ndarray): Points in shape (N, 3)
        proj_mat (np.ndarray): Transformation matrix between coordinates.
        with_depth (bool, optional): Whether to keep depth in the output.
            Defaults to False.

    Returns:
        np.ndarray: Points in image coordinates with shape [N, 2].
    """
    points_shape = list(points_3d.shape)
    points_shape[-1] = 1

    assert len(proj_mat.shape) == 2, 'The dimension of the projection'\
        f' matrix should be 2 instead of {len(proj_mat.shape)}.'
    d1, d2 = proj_mat.shape[:2]
    assert (d1 == 3 and d2 == 3) or (d1 == 3 and d2 == 4) or (
        d1 == 4 and d2 == 4), 'The shape of the projection matrix'\
        f' ({d1}*{d2}) is not supported.'
    if d1 == 3:
        proj_mat_expanded = np.eye(4, dtype=proj_mat.dtype)
        proj_mat_expanded[:d1, :d2] = proj_mat
        proj_mat = proj_mat_expanded

    points_4 = np.concatenate([points_3d, np.ones(points_shape)], axis=-1)
    point_2d = points_4 @ proj_mat.T
    point_2d_res = point_2d[..., :2] / point_2d[..., 2:3]

    if with_depth:
        points_2d_depth = np.concatenate([point_2d_res, point_2d[..., 2:3]],
                                         axis=-1)
        return points_2d_depth

    return point_2d_res


def box3d_to_bbox(box3d, P2):
    """Convert box3d in camera coordinates to bbox in image coordinates.
...@@ -424,7 +351,10 @@ def corner_to_surfaces_3d(corners):


def points_in_rbbox(points, rbbox, z_axis=2, origin=(0.5, 0.5, 0)):
    """Check points in rotated bbox and return indices.

    Note:
        This function is for counterclockwise boxes.

    Args:
        points (np.ndarray, shape=[N, 3+dim]): Points to query.
...@@ -461,25 +391,9 @@ def minmax_to_corner_2d(minmax_box):
    return center_to_corner_box2d(center, dims, origin=0.0)
def limit_period(val, offset=0.5, period=np.pi):
    """Limit the value into a period for periodic function.

    Args:
        val (np.ndarray): The value to be converted.
        offset (float, optional): Offset to set the value range. \
            Defaults to 0.5.
        period (float, optional): Period of the value. Defaults to np.pi.

    Returns:
        torch.Tensor: Value in the range of \
            [-offset * period, (1-offset) * period]
    """
    return val - np.floor(val / period + offset) * period


def create_anchors_3d_range(feature_size,
                            anchor_range,
                            sizes=((3.9, 1.6, 1.56), ),
                            rotations=(0, np.pi / 2),
                            dtype=np.float32):
    """Create anchors 3d by range.
...@@ -492,14 +406,14 @@ def create_anchors_3d_range(feature_size,
            (x_min, y_min, z_min, x_max, y_max, z_max).
        sizes (list[list] | np.ndarray | torch.Tensor, optional):
            Anchor size with shape [N, 3], in order of x, y, z.
            Defaults to ((3.9, 1.6, 1.56), ).
        rotations (list[float] | np.ndarray | torch.Tensor, optional):
            Rotations of anchors in a single feature grid.
            Defaults to (0, np.pi / 2).
        dtype (type, optional): Data type. Defaults to np.float32.

    Returns:
        np.ndarray: Range based anchors with shape of
            (*feature_size, num_sizes, num_rots, 7).
    """
    anchor_range = np.array(anchor_range, dtype)
...@@ -550,11 +464,11 @@ def rbbox2d_to_near_bbox(rbboxes):
    """convert rotated bbox to nearest 'standing' or 'lying' bbox.

    Args:
        rbboxes (np.ndarray): Rotated bboxes with shape of
            (N, 5(x, y, xdim, ydim, rad)).

    Returns:
        np.ndarray: Bounding boxes with the shape of
            (N, 4(xmin, ymin, xmax, ymax)).
    """
    rots = rbboxes[..., -1]
...@@ -570,6 +484,9 @@ def iou_jit(boxes, query_boxes, mode='iou', eps=0.0):
    """Calculate box iou. Note that jit version runs ~10x faster than the
    box_overlaps function in mmdet3d.core.evaluation.

    Note:
        This function is for counterclockwise boxes.

    Args:
        boxes (np.ndarray): Input bounding boxes with shape of (N, 4).
        query_boxes (np.ndarray): Query boxes with shape of (K, 4).
...@@ -607,7 +524,10 @@ def iou_jit(boxes, query_boxes, mode='iou', eps=0.0):


def projection_matrix_to_CRT_kitti(proj):
    """Split projection matrix of KITTI.

    Note:
        This function is for KITTI only.

    P = C @ [R|T]
    C is upper triangular matrix, so we need to inverse CR and use QR
...@@ -633,6 +553,9 @@ def projection_matrix_to_CRT_kitti(proj):


def remove_outside_points(points, rect, Trv2c, P2, image_shape):
    """Remove points which are outside of image.

    Note:
        This function is for KITTI only.

    Args:
        points (np.ndarray, shape=[N, 3+dims]): Total points.
        rect (np.ndarray, shape=[4, 4]): Matrix to project points in
...@@ -782,8 +705,8 @@ def points_in_convex_polygon_3d_jit(points,
                                            normal_vec, d, num_surfaces)


@numba.njit
def points_in_convex_polygon_jit(points, polygon, clockwise=False):
    """Check points is in 2d convex polygons. True when point in polygon.

    Args:
...@@ -800,14 +723,16 @@ def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    num_points_of_polygon = polygon.shape[1]
    num_points = points.shape[0]
    num_polygons = polygon.shape[0]
    # vec for all the polygons
    if clockwise:
        vec1 = polygon - polygon[:,
                                 np.array([num_points_of_polygon - 1] + list(
                                     range(num_points_of_polygon - 1))), :]
    else:
        vec1 = polygon[:,
                       np.array([num_points_of_polygon - 1] +
                                list(range(num_points_of_polygon -
                                           1))), :] - polygon
    ret = np.zeros((num_points, num_polygons), dtype=np.bool_)
    success = True
    cross = 0.0
...@@ -815,12 +740,9 @@ def points_in_convex_polygon_jit(points, polygon, clockwise=True):
        for j in range(num_polygons):
            success = True
            for k in range(num_points_of_polygon):
                vec = vec1[j, k]
                cross = vec[1] * (polygon[j, k, 0] - points[i, 0])
                cross -= vec[0] * (polygon[j, k, 1] - points[i, 1])
                if cross >= 0:
                    success = False
                    break
...@@ -839,10 +761,13 @@ def boxes3d_to_corners3d_lidar(boxes3d, bottom_center=True):
        |/         |/
        2 -------- 1

    Note:
        This function is for LiDAR boxes only.

    Args:
        boxes3d (np.ndarray): Boxes with shape of (N, 7)
            [x, y, z, x_size, y_size, z_size, ry] in LiDAR coords,
            see the definition of ry in KITTI dataset.
        bottom_center (bool, optional): Whether z is on the bottom center
            of object. Defaults to True.
...@@ -850,19 +775,25 @@ def boxes3d_to_corners3d_lidar(boxes3d, bottom_center=True):
        np.ndarray: Box corners with the shape of [N, 8, 3].
    """
    boxes_num = boxes3d.shape[0]
    x_size, y_size, z_size = boxes3d[:, 3], boxes3d[:, 4], boxes3d[:, 5]
    x_corners = np.array([
        x_size / 2., -x_size / 2., -x_size / 2., x_size / 2., x_size / 2.,
        -x_size / 2., -x_size / 2., x_size / 2.
    ],
                         dtype=np.float32).T
    y_corners = np.array([
        -y_size / 2., -y_size / 2., y_size / 2., y_size / 2., -y_size / 2.,
        -y_size / 2., y_size / 2., y_size / 2.
    ],
                         dtype=np.float32).T
    if bottom_center:
        z_corners = np.zeros((boxes_num, 8), dtype=np.float32)
        z_corners[:, 4:8] = z_size.reshape(boxes_num, 1).repeat(
            4, axis=1)  # (N, 8)
    else:
        z_corners = np.array([
            -z_size / 2., -z_size / 2., -z_size / 2., -z_size / 2.,
            z_size / 2., z_size / 2., z_size / 2., z_size / 2.
        ],
                             dtype=np.float32).T
...@@ -870,9 +801,9 @@ def boxes3d_to_corners3d_lidar(boxes3d, bottom_center=True):
    zeros, ones = np.zeros(
        ry.size, dtype=np.float32), np.ones(
            ry.size, dtype=np.float32)
    rot_list = np.array([[np.cos(ry), np.sin(ry), zeros],
                         [-np.sin(ry), np.cos(ry), zeros],
                         [zeros, zeros, ones]])  # (3, 3, N)
    R_list = np.transpose(rot_list, (2, 0, 1))  # (N, 3, 3)
    temp_corners = np.concatenate((x_corners.reshape(
......
...@@ -3,10 +3,17 @@ from mmdet.core.bbox import build_bbox_coder
from .anchor_free_bbox_coder import AnchorFreeBBoxCoder
from .centerpoint_bbox_coders import CenterPointBBoxCoder
from .delta_xyzwhlr_bbox_coder import DeltaXYZWLHRBBoxCoder
from .fcos3d_bbox_coder import FCOS3DBBoxCoder
from .groupfree3d_bbox_coder import GroupFree3DBBoxCoder
from .monoflex_bbox_coder import MonoFlexCoder
from .partial_bin_based_bbox_coder import PartialBinBasedBBoxCoder
from .pgd_bbox_coder import PGDBBoxCoder
from .point_xyzwhlr_bbox_coder import PointXYZWHLRBBoxCoder
from .smoke_bbox_coder import SMOKECoder

__all__ = [
    'build_bbox_coder', 'DeltaXYZWLHRBBoxCoder', 'PartialBinBasedBBoxCoder',
    'CenterPointBBoxCoder', 'AnchorFreeBBoxCoder', 'GroupFree3DBBoxCoder',
    'PointXYZWHLRBBoxCoder', 'FCOS3DBBoxCoder', 'PGDBBoxCoder', 'SMOKECoder',
    'MonoFlexCoder'
]
...@@ -25,7 +25,7 @@ class AnchorFreeBBoxCoder(PartialBinBasedBBoxCoder):
        """Encode ground truth to prediction targets.

        Args:
            gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes
                with shape (n, 7).
            gt_labels_3d (torch.Tensor): Ground truth classes.
......
...@@ -13,12 +13,12 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
        pc_range (list[float]): Range of point cloud.
        out_size_factor (int): Downsample factor of the model.
        voxel_size (list[float]): Size of voxel.
        post_center_range (list[float], optional): Limit of the center.
            Default: None.
        max_num (int, optional): Max number to be kept. Default: 100.
        score_threshold (float, optional): Threshold to filter boxes
            based on score. Default: None.
        code_size (int, optional): Code size of bboxes. Default: 9
    """

    def __init__(self,
...@@ -45,7 +45,8 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
            feats (torch.Tensor): Features to be transposed and gathered
                with the shape of [B, 2, W, H].
            inds (torch.Tensor): Indexes with the shape of [B, N].
            feat_masks (torch.Tensor, optional): Mask of the feats.
                Default: None.

        Returns:
            torch.Tensor: Gathered feats.
...@@ -64,7 +65,7 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
        Args:
            scores (torch.Tensor): scores with the shape of [B, N, W, H].
            K (int, optional): Number to be kept. Defaults to 80.

        Returns:
            tuple[torch.Tensor]
...@@ -135,9 +136,9 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
            dim (torch.Tensor): Dim of the boxes with the shape of
                [B, 1, W, H].
            vel (torch.Tensor): Velocity with the shape of [B, 1, W, H].
            reg (torch.Tensor, optional): Regression value of the boxes in
                2D with the shape of [B, 2, W, H]. Default: None.
            task_id (int, optional): Index of task. Default: -1.

        Returns:
            list[dict]: Decoded boxes.
......
...@@ -19,9 +19,9 @@ class DeltaXYZWLHRBBoxCoder(BaseBBoxCoder):
    @staticmethod
    def encode(src_boxes, dst_boxes):
        """Get box regression transformation deltas (dx, dy, dz, dx_size,
        dy_size, dz_size, dr, dv*) that can be used to transform the
        `src_boxes` into the `target_boxes`.

        Args:
            src_boxes (torch.Tensor): source boxes, e.g., object proposals.
...@@ -56,13 +56,13 @@ class DeltaXYZWLHRBBoxCoder(BaseBBoxCoder):
    @staticmethod
    def decode(anchors, deltas):
        """Apply transformation `deltas` (dx, dy, dz, dx_size, dy_size,
        dz_size, dr, dv*) to `boxes`.

        Args:
            anchors (torch.Tensor): Parameters of anchors with shape (N, 7).
            deltas (torch.Tensor): Encoded boxes with shape
                (N, 7+n) [x, y, z, x_size, y_size, z_size, r, velo*].

        Returns:
            torch.Tensor: Decoded boxes.
......
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
from ..structures import limit_period
@BBOX_CODERS.register_module()
class FCOS3DBBoxCoder(BaseBBoxCoder):
"""Bounding box coder for FCOS3D.
Args:
base_depths (tuple[tuple[float]]): Depth references for decode box
depth. Defaults to None.
base_dims (tuple[tuple[float]]): Dimension references for decode box
dimension. Defaults to None.
code_size (int): The dimension of boxes to be encoded. Defaults to 7.
norm_on_bbox (bool): Whether to apply normalization on the bounding
box 2D attributes. Defaults to True.
"""
def __init__(self,
base_depths=None,
base_dims=None,
code_size=7,
norm_on_bbox=True):
super(FCOS3DBBoxCoder, self).__init__()
self.base_depths = base_depths
self.base_dims = base_dims
self.bbox_code_size = code_size
self.norm_on_bbox = norm_on_bbox
def encode(self, gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels):
# TODO: refactor the encoder in the FCOS3D and PGD head
pass
def decode(self, bbox, scale, stride, training, cls_score=None):
"""Decode regressed results into 3D predictions.
Note that offsets are not transformed to the projected 3D centers.
Args:
bbox (torch.Tensor): Raw bounding box predictions in shape
[N, C, H, W].
scale (tuple[`Scale`]): Learnable scale parameters.
stride (int): Stride for a specific feature level.
training (bool): Whether the decoding is in the training
procedure.
cls_score (torch.Tensor): Classification score map for deciding
which base depth or dim is used. Defaults to None.
Returns:
torch.Tensor: Decoded boxes.
"""
# scale the bbox of different level
# only apply to offset, depth and size prediction
scale_offset, scale_depth, scale_size = scale[0:3]
clone_bbox = bbox.clone()
bbox[:, :2] = scale_offset(clone_bbox[:, :2]).float()
bbox[:, 2] = scale_depth(clone_bbox[:, 2]).float()
bbox[:, 3:6] = scale_size(clone_bbox[:, 3:6]).float()
if self.base_depths is None:
bbox[:, 2] = bbox[:, 2].exp()
elif len(self.base_depths) == 1: # only single prior
mean = self.base_depths[0][0]
std = self.base_depths[0][1]
bbox[:, 2] = mean + bbox.clone()[:, 2] * std
else: # multi-class priors
assert len(self.base_depths) == cls_score.shape[1], \
'The number of multi-class depth priors should be equal to ' \
'the number of categories.'
indices = cls_score.max(dim=1)[1]
depth_priors = cls_score.new_tensor(
self.base_depths)[indices, :].permute(0, 3, 1, 2)
mean = depth_priors[:, 0]
std = depth_priors[:, 1]
bbox[:, 2] = mean + bbox.clone()[:, 2] * std
bbox[:, 3:6] = bbox[:, 3:6].exp()
if self.base_dims is not None:
assert len(self.base_dims) == cls_score.shape[1], \
'The number of anchor sizes should be equal to the number ' \
'of categories.'
indices = cls_score.max(dim=1)[1]
size_priors = cls_score.new_tensor(
self.base_dims)[indices, :].permute(0, 3, 1, 2)
bbox[:, 3:6] = size_priors * bbox.clone()[:, 3:6]
assert self.norm_on_bbox is True, 'Setting norm_on_bbox to False '\
'has not been thoroughly tested for FCOS3D.'
if self.norm_on_bbox:
if not training:
# Note that this rescaling by the stride is applied only at test time
bbox[:, :2] *= stride
return bbox
@staticmethod
def decode_yaw(bbox, centers2d, dir_cls, dir_offset, cam2img):
"""Decode yaw angle and change it from local to global.i.
Args:
bbox (torch.Tensor): Bounding box predictions in shape
[N, C] with yaws to be decoded.
centers2d (torch.Tensor): Projected 3D-center on the image planes
corresponding to the box predictions.
dir_cls (torch.Tensor): Predicted direction classes.
dir_offset (float): Direction offset before dividing all the
directions into several classes.
cam2img (torch.Tensor): Camera intrinsic matrix in shape [4, 4].
Returns:
torch.Tensor: Bounding boxes with decoded yaws.
"""
if bbox.shape[0] > 0:
dir_rot = limit_period(bbox[..., 6] - dir_offset, 0, np.pi)
bbox[..., 6] = \
dir_rot + dir_offset + np.pi * dir_cls.to(bbox.dtype)
bbox[:, 6] = torch.atan2(centers2d[:, 0] - cam2img[0, 2],
cam2img[0, 0]) + bbox[:, 6]
return bbox
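# --- Illustrative note (not part of the file; values below are assumed) ---
# A minimal sketch of how the decoding above behaves with a single depth
# prior, e.g. base_depths=((28.0, 13.5),): the raw depth channel is decoded
# as depth = 28.0 + pred * 13.5, the size channels as exp(pred) (optionally
# rescaled by the per-class base_dims), and with norm_on_bbox=True the 2D
# offsets are multiplied by the feature-level stride only at test time.
# decode_yaw then converts the regressed local yaw into a global yaw by
# adding atan2(u - cx, fx) computed from the projected 2D center and the
# camera intrinsics.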
...@@ -14,9 +14,10 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder): ...@@ -14,9 +14,10 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
num_dir_bins (int): Number of bins to encode direction angle. num_dir_bins (int): Number of bins to encode direction angle.
num_sizes (int): Number of size clusters. num_sizes (int): Number of size clusters.
mean_sizes (list[list[int]]): Mean size of bboxes in each class. mean_sizes (list[list[int]]): Mean size of bboxes in each class.
with_rot (bool): Whether the bbox is with rotation. Defaults to True. with_rot (bool, optional): Whether the bbox is with rotation.
size_cls_agnostic (bool): Whether the predicted size is class-agnostic.
Defaults to True. Defaults to True.
size_cls_agnostic (bool, optional): Whether the predicted size is
class-agnostic. Defaults to True.
""" """
def __init__(self, def __init__(self,
...@@ -36,7 +37,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder): ...@@ -36,7 +37,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
"""Encode ground truth to prediction targets. """Encode ground truth to prediction targets.
Args: Args:
gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes \ gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes
with shape (n, 7). with shape (n, 7).
gt_labels_3d (torch.Tensor): Ground truth classes. gt_labels_3d (torch.Tensor): Ground truth classes.
...@@ -76,7 +77,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder): ...@@ -76,7 +77,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
- size_class: predicted bbox size class. - size_class: predicted bbox size class.
- size_res: predicted bbox size residual. - size_res: predicted bbox size residual.
- size: predicted class-agnostic bbox size - size: predicted class-agnostic bbox size
prefix (str): Decode predictions with specific prefix. prefix (str, optional): Decode predictions with specific prefix.
Defaults to ''. Defaults to ''.
Returns: Returns:
...@@ -122,7 +123,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder): ...@@ -122,7 +123,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
cls_preds (torch.Tensor): Class predicted features to split. cls_preds (torch.Tensor): Class predicted features to split.
reg_preds (torch.Tensor): Regression predicted features to split. reg_preds (torch.Tensor): Regression predicted features to split.
base_xyz (torch.Tensor): Coordinates of points. base_xyz (torch.Tensor): Coordinates of points.
prefix (str): Decode predictions with specific prefix. prefix (str, optional): Decode predictions with specific prefix.
Defaults to ''. Defaults to ''.
Returns: Returns:
......
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from torch.nn import functional as F
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
@BBOX_CODERS.register_module()
class MonoFlexCoder(BaseBBoxCoder):
"""Bbox Coder for MonoFlex.
Args:
depth_mode (str): The mode for depth calculation.
Available options are "linear", "inv_sigmoid", and "exp".
base_depth (tuple[float]): References for decoding box depth.
depth_range (list): Depth range of predicted depth.
combine_depth (bool): Whether to use combined depth (direct depth
and depth from keypoints) or use direct depth only.
uncertainty_range (list): Uncertainty range of predicted depth.
base_dims (tuple[tuple[float]]): Mean and standard deviation of the
decoded bbox dimensions [l, h, w] for each category.
dims_mode (str): The mode for dimension calculation.
Available options are "linear" and "exp".
multibin (bool): Whether to use multibin representation.
num_dir_bins (int): Number of bins to encode
the direction angle.
bin_centers (list[float]): Local yaw centers while using multibin
representations.
bin_margin (float): Margin of multibin representations.
code_size (int): The dimension of boxes to be encoded.
eps (float, optional): A value added to the denominator for numerical
stability. Default 1e-3.
"""
def __init__(self,
depth_mode,
base_depth,
depth_range,
combine_depth,
uncertainty_range,
base_dims,
dims_mode,
multibin,
num_dir_bins,
bin_centers,
bin_margin,
code_size,
eps=1e-3):
super(MonoFlexCoder, self).__init__()
# depth related
self.depth_mode = depth_mode
self.base_depth = base_depth
self.depth_range = depth_range
self.combine_depth = combine_depth
self.uncertainty_range = uncertainty_range
# dimensions related
self.base_dims = base_dims
self.dims_mode = dims_mode
# orientation related
self.multibin = multibin
self.num_dir_bins = num_dir_bins
self.bin_centers = bin_centers
self.bin_margin = bin_margin
# output related
self.bbox_code_size = code_size
self.eps = eps
def encode(self, gt_bboxes_3d):
"""Encode ground truth to prediction targets.
Args:
gt_bboxes_3d (`BaseInstance3DBoxes`): Ground truth 3D bboxes.
shape: (N, 7).
Returns:
torch.Tensor: Targets of orientations.
"""
local_yaw = gt_bboxes_3d.local_yaw
# encode local yaw (-pi ~ pi) to multibin format
encode_local_yaw = local_yaw.new_zeros(
[local_yaw.shape[0], self.num_dir_bins * 2])
bin_size = 2 * np.pi / self.num_dir_bins
margin_size = bin_size * self.bin_margin
bin_centers = local_yaw.new_tensor(self.bin_centers)
range_size = bin_size / 2 + margin_size
offsets = local_yaw.unsqueeze(1) - bin_centers.unsqueeze(0)
offsets[offsets > np.pi] = offsets[offsets > np.pi] - 2 * np.pi
offsets[offsets < -np.pi] = offsets[offsets < -np.pi] + 2 * np.pi
for i in range(self.num_dir_bins):
offset = offsets[:, i]
inds = abs(offset) < range_size
encode_local_yaw[inds, i] = 1
encode_local_yaw[inds, i + self.num_dir_bins] = offset[inds]
orientation_target = encode_local_yaw
return orientation_target
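# Illustrative worked example (assumed values, for reading convenience only):
# with num_dir_bins=4, bin_centers=[0, pi/2, pi, -pi/2] and bin_margin=1/3,
# bin_size = pi/2 and range_size = pi/4 + pi/6 ~= 1.31 rad, so a local yaw of
# 0.6 rad falls within the bins centered at 0 (offset 0.6) and at pi/2
# (offset ~= -0.97); the target sets those two classification entries to 1
# and stores the corresponding offsets in the second half of the vector.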
def decode(self, bbox, base_centers2d, labels, downsample_ratio, cam2imgs):
"""Decode bounding box regression into 3D predictions.
Args:
bbox (Tensor): Raw bounding box predictions for each
predicted center2d point.
shape: (N, C)
base_centers2d (torch.Tensor): Base centers2d for 3D bboxes.
shape: (N, 2).
labels (Tensor): Predicted class label for each predicted
center2d point.
shape: (N, )
downsample_ratio (int): The stride of feature map.
cam2imgs (Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
Return:
dict: The 3D prediction dict decoded from the regression map;
it contains the components below:
- bboxes2d (torch.Tensor): Decoded [x1, y1, x2, y2] format
2D bboxes.
- dimensions (torch.Tensor): Decoded dimensions for each
object.
- offsets2d (torch.Tensor): Offsets between base centers2d
and real centers2d.
- direct_depth (torch.Tensor): Decoded directly regressed
depth.
- keypoints2d (torch.Tensor): Keypoints of each projected
3D box on image.
- keypoints_depth (torch.Tensor): Decoded depth from keypoints.
- combined_depth (torch.Tensor): Combined depth using direct
depth and keypoints depth with depth uncertainty.
- orientations (torch.Tensor): Multibin format orientations
(local yaw) for each object.
"""
# 4 dimensions for FCOS style regression
pred_bboxes2d = bbox[:, 0:4]
# change FCOS style to [x1, y1, x2, y2] format for IOU Loss
pred_bboxes2d = self.decode_bboxes2d(pred_bboxes2d, base_centers2d)
# 2 dimensions for projected centers2d offsets
pred_offsets2d = bbox[:, 4:6]
# 3 dimensions for 3D bbox dimensions offsets
pred_dimensions_offsets3d = bbox[:, 29:32]
# the first 8 dimensions are for orientation bin classification
# and the second 8 dimensions are for orientation offsets.
pred_orientations = torch.cat((bbox[:, 32:40], bbox[:, 40:48]), dim=1)
# 3 dimensions for the uncertainties of the solved depths from
# groups of keypoints
pred_keypoints_depth_uncertainty = bbox[:, 26:29]
# 1 dimension for the uncertainty of directly regressed depth
pred_direct_depth_uncertainty = bbox[:, 49:50].squeeze(-1)
# 2 offsets (u, v) for each of the 10 keypoints (8 corners + top/bottom centers)
pred_keypoints2d = bbox[:, 6:26].reshape(-1, 10, 2)
# 1 dimension for depth offsets
pred_direct_depth_offsets = bbox[:, 48:49].squeeze(-1)
# decode the pred residual dimensions to real dimensions
pred_dimensions = self.decode_dims(labels, pred_dimensions_offsets3d)
pred_direct_depth = self.decode_direct_depth(pred_direct_depth_offsets)
pred_keypoints_depth = self.keypoints2depth(pred_keypoints2d,
pred_dimensions, cam2imgs,
downsample_ratio)
pred_direct_depth_uncertainty = torch.clamp(
pred_direct_depth_uncertainty, self.uncertainty_range[0],
self.uncertainty_range[1])
pred_keypoints_depth_uncertainty = torch.clamp(
pred_keypoints_depth_uncertainty, self.uncertainty_range[0],
self.uncertainty_range[1])
if self.combine_depth:
pred_depth_uncertainty = torch.cat(
(pred_direct_depth_uncertainty.unsqueeze(-1),
pred_keypoints_depth_uncertainty),
dim=1).exp()
pred_depth = torch.cat(
(pred_direct_depth.unsqueeze(-1), pred_keypoints_depth), dim=1)
pred_combined_depth = \
self.combine_depths(pred_depth, pred_depth_uncertainty)
else:
pred_combined_depth = None
preds = dict(
bboxes2d=pred_bboxes2d,
dimensions=pred_dimensions,
offsets2d=pred_offsets2d,
keypoints2d=pred_keypoints2d,
orientations=pred_orientations,
direct_depth=pred_direct_depth,
keypoints_depth=pred_keypoints_depth,
combined_depth=pred_combined_depth,
direct_depth_uncertainty=pred_direct_depth_uncertainty,
keypoints_depth_uncertainty=pred_keypoints_depth_uncertainty,
)
return preds
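# For reference, the channel layout of `bbox` consumed above is:
#   [0:4]   FCOS-style 2D box distances    [4:6]   projected center offsets
#   [6:26]  10 keypoints x (u, v)          [26:29] keypoint-depth uncertainties
#   [29:32] dimension offsets              [32:48] multibin orientation (8 cls + 8 offsets)
#   [48:49] direct depth offset            [49:50] direct-depth uncertainty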
def decode_direct_depth(self, depth_offsets):
"""Transform depth offset to directly regressed depth.
Args:
depth_offsets (torch.Tensor): Predicted depth offsets.
shape: (N, )
Return:
torch.Tensor: Directly regressed depth.
shape: (N, )
"""
if self.depth_mode == 'exp':
direct_depth = depth_offsets.exp()
elif self.depth_mode == 'linear':
base_depth = depth_offsets.new_tensor(self.base_depth)
direct_depth = depth_offsets * base_depth[1] + base_depth[0]
elif self.depth_mode == 'inv_sigmoid':
direct_depth = 1 / torch.sigmoid(depth_offsets) - 1
else:
raise ValueError
if self.depth_range is not None:
direct_depth = torch.clamp(
direct_depth, min=self.depth_range[0], max=self.depth_range[1])
return direct_depth
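# Written out, the three depth decodings above are:
#   'exp':         depth = exp(offset)
#   'linear':      depth = offset * base_depth[1] + base_depth[0]
#   'inv_sigmoid': depth = 1 / sigmoid(offset) - 1
# each optionally clamped into `depth_range` afterwards.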
def decode_location(self,
base_centers2d,
offsets2d,
depths,
cam2imgs,
downsample_ratio,
pad_mode='default'):
"""Retrieve object location.
Args:
base_centers2d (torch.Tensor): predicted base centers2d.
shape: (N, 2)
offsets2d (torch.Tensor): The offsets between real centers2d
and base centers2d.
shape: (N , 2)
depths (torch.Tensor): Depths of objects.
shape: (N, )
cam2imgs (torch.Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
downsample_ratio (int): The stride of feature map.
pad_mode (str, optional): Padding mode used in
training data augmentation.
Return:
torch.Tensor: Centers of 3D boxes.
shape: (N, 3)
"""
N = cam2imgs.shape[0]
# (N, 4, 4)
cam2imgs_inv = cam2imgs.inverse()
if pad_mode == 'default':
centers2d_img = (base_centers2d + offsets2d) * downsample_ratio
else:
raise NotImplementedError
# (N, 3): homogeneous image coordinates scaled by depth, [u*z, v*z, z]
centers2d_img = \
torch.cat((centers2d_img * depths.unsqueeze(-1), depths.unsqueeze(-1)), dim=1)
# (N, 4, 1)
centers2d_extend = \
torch.cat((centers2d_img, centers2d_img.new_ones(N, 1)),
dim=1).unsqueeze(-1)
locations = torch.matmul(cam2imgs_inv, centers2d_extend).squeeze(-1)
return locations[:, :3]
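# Illustrative worked example of the pinhole relation recovered here (assumed
# ideal intrinsics, no principal-point offset in y): with fx = 720, cx = 620,
# a decoded center u = 700 px and depth z = 30 m, the lateral position is
# x = (u - cx) * z / fx = 80 * 30 / 720 ~= 3.33 m.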
def keypoints2depth(self,
keypoints2d,
dimensions,
cam2imgs,
downsample_ratio=4,
group0_index=[(7, 3), (0, 4)],
group1_index=[(2, 6), (1, 5)]):
"""Decode depth form three groups of keypoints and geometry projection
model. 2D keypoints inlucding 8 coreners and top/bottom centers will be
divided into three groups which will be used to calculate three depths
of object.
.. code-block:: none
Group center keypoints:
+ --------------- +
/| top center /|
/ | . / |
/ | | / |
+ ---------|----- + +
| / | | /
| / . | /
|/ bottom center |/
+ --------------- +
Group 0 keypoints:
0
+ -------------- +
/| /|
/ | / |
/ | 5/ |
+ -------------- + +
| /3 | /
| / | /
|/ |/
+ -------------- + 6
Group 1 keypoints:
4
+ -------------- +
/| /|
/ | / |
/ | / |
1 + -------------- + + 7
| / | /
| / | /
|/ |/
2 + -------------- +
Args:
keypoints2d (torch.Tensor): Keypoints of objects.
8 vertices + top/bottom center.
shape: (N, 10, 2)
dimensions (torch.Tensor): Dimensions of objects.
shape: (N, 3)
cam2imgs (torch.Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
downsample_ratio (int, optional): The stride of the feature map.
Defaults to 4.
group0_index (list[tuple[int]], optional): Index pairs of the keypoints
in group 0 used to calculate the depth.
Defaults to [(7, 3), (0, 4)].
group1_index (list[tuple[int]], optional): Index pairs of the keypoints
in group 1 used to calculate the depth.
Defaults to [(2, 6), (1, 5)].
Return:
tuple(torch.Tensor): Depth computed from three groups of
keypoints (top/bottom, group0, group1)
shape: (N, 3)
"""
pred_height_3d = dimensions[:, 1].clone()
f_u = cam2imgs[:, 0, 0]
center_height = keypoints2d[:, -2, 1] - keypoints2d[:, -1, 1]
corner_group0_height = keypoints2d[:, group0_index[0], 1] \
- keypoints2d[:, group0_index[1], 1]
corner_group1_height = keypoints2d[:, group1_index[0], 1] \
- keypoints2d[:, group1_index[1], 1]
center_depth = f_u * pred_height_3d / (
F.relu(center_height) * downsample_ratio + self.eps)
corner_group0_depth = (f_u * pred_height_3d).unsqueeze(-1) / (
F.relu(corner_group0_height) * downsample_ratio + self.eps)
corner_group1_depth = (f_u * pred_height_3d).unsqueeze(-1) / (
F.relu(corner_group1_height) * downsample_ratio + self.eps)
corner_group0_depth = corner_group0_depth.mean(dim=1)
corner_group1_depth = corner_group1_depth.mean(dim=1)
keypoints_depth = torch.stack(
(center_depth, corner_group0_depth, corner_group1_depth), dim=1)
keypoints_depth = torch.clamp(
keypoints_depth, min=self.depth_range[0], max=self.depth_range[1])
return keypoints_depth
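# Illustrative worked example (assumed values): with fu = 720, a predicted
# 3D height of 1.5 m and a keypoint height gap of 9 px on a stride-4 feature
# map, the solved depth is 720 * 1.5 / (9 * 4) = 30 m; the three group
# estimates are stacked and later fused with the direct depth by
# `combine_depths` when combine_depth is enabled.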
def decode_dims(self, labels, dims_offset):
"""Retrieve object dimensions.
Args:
labels (torch.Tensor): Category id of each object.
shape: (N, )
dims_offset (torch.Tensor): Dimension offsets.
shape: (N, 3)
Returns:
torch.Tensor: Decoded dimensions with shape (N, 3).
"""
if self.dims_mode == 'exp':
dims_offset = dims_offset.exp()
elif self.dims_mode == 'linear':
labels = labels.long()
base_dims = dims_offset.new_tensor(self.base_dims)
dims_mean = base_dims[:, :3]
dims_std = base_dims[:, 3:6]
cls_dimension_mean = dims_mean[labels, :]
cls_dimension_std = dims_std[labels, :]
dimensions = dims_offset * cls_dimension_mean + cls_dimension_std
else:
raise ValueError
return dimensions
def decode_orientation(self, ori_vector, locations):
"""Retrieve object orientation.
Args:
ori_vector (torch.Tensor): Local orientation vector
in [axis_cls, head_cls, sin, cos] format.
shape: (N, num_dir_bins * 4)
locations (torch.Tensor): Object location.
shape: (N, 3)
Returns:
tuple[torch.Tensor]: yaws and local yaws of 3d bboxes.
"""
if self.multibin:
pred_bin_cls = ori_vector[:, :self.num_dir_bins * 2].view(
-1, self.num_dir_bins, 2)
pred_bin_cls = pred_bin_cls.softmax(dim=2)[..., 1]
orientations = ori_vector.new_zeros(ori_vector.shape[0])
for i in range(self.num_dir_bins):
mask_i = (pred_bin_cls.argmax(dim=1) == i)
start_bin = self.num_dir_bins * 2 + i * 2
end_bin = start_bin + 2
pred_bin_offset = ori_vector[mask_i, start_bin:end_bin]
orientations[mask_i] = pred_bin_offset[:, 0].atan2(
pred_bin_offset[:, 1]) + self.bin_centers[i]
else:
axis_cls = ori_vector[:, :2].softmax(dim=1)
axis_cls = axis_cls[:, 0] < axis_cls[:, 1]
head_cls = ori_vector[:, 2:4].softmax(dim=1)
head_cls = head_cls[:, 0] < head_cls[:, 1]
# pick the coarse bin center from the axis/heading classification
bin_centers = ori_vector.new_tensor(self.bin_centers)
orientations = bin_centers[axis_cls.long() + head_cls.long() * 2]
sin_cos_offset = F.normalize(ori_vector[:, 4:])
orientations += sin_cos_offset[:, 0].atan2(sin_cos_offset[:, 1])
locations = locations.view(-1, 3)
rays = locations[:, 0].atan2(locations[:, 2])
local_yaws = orientations
yaws = local_yaws + rays
larger_idx = (yaws > np.pi).nonzero(as_tuple=False)
small_idx = (yaws < -np.pi).nonzero(as_tuple=False)
if len(larger_idx) != 0:
yaws[larger_idx] -= 2 * np.pi
if len(small_idx) != 0:
yaws[small_idx] += 2 * np.pi
larger_idx = (local_yaws > np.pi).nonzero(as_tuple=False)
small_idx = (local_yaws < -np.pi).nonzero(as_tuple=False)
if len(larger_idx) != 0:
local_yaws[larger_idx] -= 2 * np.pi
if len(small_idx) != 0:
local_yaws[small_idx] += 2 * np.pi
return yaws, local_yaws
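# Illustrative note: the global yaw is obtained as
#   yaw = local_yaw + atan2(x, z),
# i.e. the viewing-ray angle of the object location is added to the decoded
# local (observation) angle, and both angles are wrapped back into (-pi, pi].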
def decode_bboxes2d(self, reg_bboxes2d, base_centers2d):
"""Retrieve [x1, y1, x2, y2] format 2D bboxes.
Args:
reg_bboxes2d (torch.Tensor): Predicted FCOS style
2D bboxes.
shape: (N, 4)
base_centers2d (torch.Tensor): predicted base centers2d.
shape: (N, 2)
Returns:
torch.Tensor: [x1, y1, x2, y2] format 2D bboxes.
"""
centers_x = base_centers2d[:, 0]
centers_y = base_centers2d[:, 1]
xs_min = centers_x - reg_bboxes2d[..., 0]
ys_min = centers_y - reg_bboxes2d[..., 1]
xs_max = centers_x + reg_bboxes2d[..., 2]
ys_max = centers_y + reg_bboxes2d[..., 3]
bboxes2d = torch.stack([xs_min, ys_min, xs_max, ys_max], dim=-1)
return bboxes2d
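# Illustrative example (assumed numbers): a base center at (100, 50) with
# regressed distances (l, t, r, b) = (10, 5, 12, 8) yields the 2D box
# [90, 45, 112, 58] in [x1, y1, x2, y2] format.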
def combine_depths(self, depth, depth_uncertainty):
"""Combine all the prediced depths with depth uncertainty.
Args:
depth (torch.Tensor): Predicted depths of each object.
shape: (N, 4)
depth_uncertainty (torch.Tensor): Depth uncertainty for
each depth of each object.
shape: (N, 4)
Returns:
torch.Tensor: Combined depth.
"""
uncertainty_weights = 1 / depth_uncertainty
uncertainty_weights = \
uncertainty_weights / \
uncertainty_weights.sum(dim=1, keepdim=True)
combined_depth = torch.sum(depth * uncertainty_weights, dim=1)
return combined_depth
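# --- Illustrative usage sketch (not part of the file; all values assumed) ---
# A minimal example of the inverse-uncertainty weighting performed by
# `combine_depths` for a single object with one direct depth and three
# keypoint depths:
#   import torch
#   depth = torch.tensor([[28.0, 30.5, 29.0, 31.0]])      # (N, 4)
#   uncertainty = torch.tensor([[0.5, 1.0, 2.0, 1.5]])    # (N, 4), already exp'ed
#   weights = 1 / uncertainty
#   weights = weights / weights.sum(dim=1, keepdim=True)
#   combined = (depth * weights).sum(dim=1)                # weighted mean, ~29.2 m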