Commit ba3cd005 authored by 雍大凯's avatar 雍大凯
Browse files

将子模块转换为普通目录

parent d2b71343
# 教程 8: MMDet3D 模型部署
为了满足在实际使用过程中遇到的算法模型的速度需求,通常我们会将训练好的模型部署到各种推理后端上。 [MMDeploy](https://github.com/open-mmlab/mmdeploy) 是 OpenMMLab 系列算法库的部署框架,现在 MMDeploy 已经支持了 MMDetection3D,我们可以通过 MMDeploy 将训练好的模型部署到各种推理后端上。
## 准备
### 安装 MMDeploy
```bash
git clone -b master git@github.com:open-mmlab/mmdeploy.git
cd mmdeploy
git submodule update --init --recursive
```
### 安装推理后端编译自定义算子
根据 MMDeploy 的文档选择安装推理后端并编译自定义算子,目前 MMDet3D 模型支持了的推理后端有 [OnnxRuntime](https://mmdeploy.readthedocs.io/en/latest/backends/onnxruntime.html)[TensorRT](https://mmdeploy.readthedocs.io/en/latest/backends/tensorrt.html)[OpenVINO](https://mmdeploy.readthedocs.io/en/latest/backends/openvino.html)
## 模型导出
将 MMDet3D 训练好的 Pytorch 模型转换成 ONNX 模型文件和推理后端所需要的模型文件。你可以参考 MMDeploy 的文档 [how_to_convert_model.md](https://github.com/open-mmlab/mmdeploy/blob/master/docs/zh_cn/tutorials/how_to_convert_model.md)
```bash
python ./tools/deploy.py \
${DEPLOY_CFG_PATH} \
${MODEL_CFG_PATH} \
${MODEL_CHECKPOINT_PATH} \
${INPUT_IMG} \
--test-img ${TEST_IMG} \
--work-dir ${WORK_DIR} \
--calib-dataset-cfg ${CALIB_DATA_CFG} \
--device ${DEVICE} \
--log-level INFO \
--show \
--dump-info
```
### 参数描述
- `deploy_cfg` : MMDeploy 代码库中用于部署的配置文件路径。
- `model_cfg` : OpenMMLab 系列代码库中使用的模型配置文件路径。
- `checkpoint` : OpenMMLab 系列代码库的模型文件路径。
- `img` : 用于模型转换时使用的点云文件或图像文件路径。
- `--test-img` : 用于测试模型的图像文件路径。如果没有指定,将设置成 `None`
- `--work-dir` : 工作目录,用来保存日志和模型文件。
- `--calib-dataset-cfg` : 此参数只在 int8 模式下生效,用于校准数据集配置文件。如果没有指定,将被设置成 `None`,并使用模型配置文件中的 'val' 数据集进行校准。
- `--device` : 用于模型转换的设备。如果没有指定,将被设置成 cpu。
- `--log-level` : 设置日记的等级,选项包括 `'CRITICAL','FATAL','ERROR','WARN','WARNING','INFO','DEBUG','NOTSET'`。如果没有指定,将被设置成 INFO。
- `--show` : 是否显示检测的结果。
- `--dump-info` : 是否输出 SDK 信息。
### 示例
```bash
cd mmdeploy
python tools/deploy.py \
configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic-kitti.py \
${$MMDET3D_DIR}/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py \
${$MMDET3D_DIR}/checkpoints/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class_20200620_230421-aa0f3adb.pth \
${$MMDET3D_DIR}/demo/data/kitti/kitti_000008.bin \
--work-dir work-dir \
--device cuda:0 \
--show
```
## 模型推理
现在你可以使用推理后端提供的 API 进行模型推理。但是,如果你想立即测试模型怎么办?我们为您准备了一些推理后端的封装。
```python
from mmdeploy.apis import inference_model
result = inference_model(model_cfg, deploy_cfg, backend_files, img=img, device=device)
```
`inference_model` 将创建一个推理后端的模块并为你进行推理。推理结果与模型的 OpenMMLab 代码库具有相同的格式。
## 测试模型(可选)
可以测试部署在推理后端上的模型的精度和速度。你可以参考 [how to measure performance of models](https://mmdeploy.readthedocs.io/en/latest/tutorials/how_to_measure_performance_of_models.html)
```bash
python tools/test.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
--model ${BACKEND_MODEL_FILES} \
[--out ${OUTPUT_PKL_FILE}] \
[--format-only] \
[--metrics ${METRICS}] \
[--show] \
[--show-dir ${OUTPUT_IMAGE_DIR}] \
[--show-score-thr ${SHOW_SCORE_THR}] \
--device ${DEVICE} \
[--cfg-options ${CFG_OPTIONS}] \
[--metric-options ${METRIC_OPTIONS}] \
[--log2file work_dirs/output.txt]
```
### 示例
```bash
cd mmdeploy
python tools/test.py \
configs/mmdet3d/voxel-detection/voxel-detection_onnxruntime_dynamic.py \
${MMDET3D_DIR}/configs/centerpoint/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus.py \
--model work-dir/end2end.onnx \
--metrics bbox \
--device cpu
```
## 支持模型列表
| Model | TorchScript | OnnxRuntime | TensorRT | NCNN | PPLNN | OpenVINO | Model config |
| -------------------- | :---------: | :---------: | :------: | :--: | :---: | :------: | -------------------------------------------------------------------------------------- |
| PointPillars | ? | Y | Y | N | N | Y | [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars) |
| CenterPoint (pillar) | ? | Y | Y | N | N | Y | [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/centerpoint) |
## 注意
- MMDeploy 的版本需要 >= 0.4.0。
- 目前 CenterPoint 仅支持了 pillar 版本的。
我们在 `tools/` 文件夹路径下提供了许多有用的工具。
# 日志分析
给定一个训练的日志文件,您可以绘制出 loss/mAP 曲线。首先需要运行 `pip install seaborn` 安装依赖包。
![loss曲线图](../../resources/loss_curve.png)
```shell
python tools/analysis_tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}] [--mode ${MODE}] [--interval ${INTERVAL}]
```
**注意**: 如果您想绘制的指标是在验证阶段计算得到的,您需要添加一个标志 `--mode eval` ,如果您每经过一个 `${INTERVAL}` 的间隔进行评估,您需要增加一个参数 `--interval ${INTERVAL}`
示例:
- 绘制出某次运行的分类 loss。
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
```
- 绘制出某次运行的分类和回归 loss,并且保存图片为 pdf 格式。
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf
```
- 在同一张图片中比较两次运行的 bbox mAP。
```shell
# 根据 Car_3D_moderate_strict 在 KITTI 上评估 PartA2 和 second。
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/PartA2.log.json tools/logs/second.log.json --keys KITTI/Car_3D_moderate_strict --legend PartA2 second --mode eval --interval 1
# 根据 Car_3D_moderate_strict 在 KITTI 上分别对车和 3 类评估 PointPillars。
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/pp-3class.log.json tools/logs/pp.log.json --keys KITTI/Car_3D_moderate_strict --legend pp-3class pp --mode eval --interval 2
```
您也能计算平均训练速度。
```shell
python tools/analysis_tools/analyze_logs.py cal_train_time log.json [--include-outliers]
```
预期输出应该如下所示。
```
-----Analyze train time of work_dirs/some_exp/20190611_192040.log.json-----
slowest epoch 11, average time is 1.2024
fastest epoch 1, average time is 1.1909
time std over epochs is 0.0028
average iter time: 1.1959 s/iter
```
 
# 可视化
## 结果
为了观察模型的预测结果,您可以运行下面的指令
```bash
python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --show --show-dir ${SHOW_DIR}
```
在运行这个指令后,所有的绘制结果包括输入数据,以及在输入数据基础上可视化的网络输出和真值(例如: 3D 单模态检测任务中的 `***_points.obj``***_pred.obj`),将会被保存在 `${SHOW_DIR}`
要在评估期间看见预测结果,您可以运行下面的指令
```bash
python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --eval 'mAP' --eval-options 'show=True' 'out_dir=${SHOW_DIR}'
```
在运行这个指令后,您将会在 `${SHOW_DIR}` 获得输入数据、可视化在输入上的网络输出和真值标签(例如:在多模态检测任务中的`***_points.obj``***_pred.obj``***_gt.obj``***_img.png``***_pred.png` )。当 `show` 被激活,[Open3D](http://www.open3d.org/) 将会被用来在线可视化结果。当您在没有 GUI 的远程服务器上运行测试的时候,无法进行在线可视化,您可以设定 `show=False` 将输出结果保存在 `{SHOW_DIR}`
至于离线可视化,您将有两个选择。
利用 `Open3D` 后端可视化结果,您可以运行下面的指令
```bash
python tools/misc/visualize_results.py ${CONFIG_FILE} --result ${RESULTS_PATH} --show-dir ${SHOW_DIR}
```
![](../../resources/open3d_visual.*)
或者您可以使用 3D 可视化软件,例如 [MeshLab](http://www.meshlab.net/) 来打开这些在 `${SHOW_DIR}` 目录下的文件,从而查看 3D 检测输出。具体来说,打开 `***_points.obj` 查看输入点云,打开 `***_pred.obj` 查看预测的 3D 边界框。这允许推理和结果生成在远程服务器中完成,用户可以使用 GUI 在他们的主机上打开它们。
**注意**:可视化接口有一些不稳定,我们将计划和 MMDetection 一起重构这一部分。
## 数据集
我们也提供脚本用来可视化数据集,而无需推理。您可以使用 `tools/misc/browse_dataset.py` 来在线显示载入的数据和真值标签,并且保存进磁盘。现在我们支持所有数据集上的单模态 3D 检测和 3D 分割,支持 KITTI 和 SUN RGB-D 数据集上的多模态 3D 检测,同时支持 nuScenes 数据集上的单目 3D 检测。为了浏览 KITTI 数据集,您可以运行下面的指令
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task det --output-dir ${OUTPUT_DIR} --online
```
**注意**:一旦指定 `--output-dir` ,当按下 open3d 窗口的 `_ESC_`,用户指定的视图图像将被保存。如果您没有显示器,您可以移除 `--online` 标志,从而仅仅保存可视化结果并且进行离线浏览。
为了验证数据的一致性和数据增强的效果,您还可以使用以下命令添加 `--aug` 标志来可视化数据增强后的数据:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task det --aug --output-dir ${OUTPUT_DIR} --online
```
如果您还想显示 2D 图像以及投影的 3D 边界框,则需要找到支持多模态数据加载的配置文件,然后将 `--task` 参数更改为 `multi_modality-det`。一个例子如下所示
```shell
python tools/misc/browse_dataset.py configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py --task multi_modality-det --output-dir ${OUTPUT_DIR} --online
```
![](../../resources/browse_dataset_multi_modality.png)
您可以简单的使用不同的配置文件,浏览不同的数据集,例如:在 3D 语义分割任务中可视化 ScanNet 数据集
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/scannet_seg-3d-20class.py --task seg --output-dir ${OUTPUT_DIR} --online
```
![](../../resources/browse_dataset_seg.png)
在单目 3D 检测任务中浏览 nuScenes 数据集
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/nus-mono3d.py --task mono-det --output-dir ${OUTPUT_DIR} --online
```
![](../../resources/browse_dataset_mono.png)
 
# 模型部署
**Note**: 此工具仍然处于试验阶段,目前只有 SECOND 支持用 [`TorchServe`](https://pytorch.org/serve/) 部署,我们将会在未来支持更多的模型。
为了使用 [`TorchServe`](https://pytorch.org/serve/) 部署 `MMDetection3D` 模型,您可以遵循以下步骤:
## 1. 将模型从 MMDetection3D 转换到 TorchServe
```shell
python tools/deployment/mmdet3d2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
--output-folder ${MODEL_STORE} \
--model-name ${MODEL_NAME}
```
**Note**: ${MODEL_STORE} 需要为文件夹的绝对路径。
## 2. 构建 `mmdet3d-serve` 镜像
```shell
docker build -t mmdet3d-serve:latest docker/serve/
```
## 3. 运行 `mmdet3d-serve`
查看官网文档来 [使用 docker 运行 TorchServe](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment)
为了在 GPU 上运行,您需要安装 [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)。您可以忽略 `--gpus` 参数,从而在 CPU 上运行。
例子:
```shell
docker run --rm \
--cpus 8 \
--gpus device=0 \
-p8080:8080 -p8081:8081 -p8082:8082 \
--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \
mmdet3d-serve:latest
```
[阅读文档](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md/) 关于 Inference (8080), Management (8081) and Metrics (8082) 接口。
## 4. 测试部署
您可以使用 `test_torchserver.py` 进行部署, 同时比较 torchserver 和 pytorch 的结果。
```shell
python tools/deployment/test_torchserver.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${MODEL_NAME}
[--inference-addr ${INFERENCE_ADDR}] [--device ${DEVICE}] [--score-thr ${SCORE_THR}]
```
例子:
```shell
python tools/deployment/test_torchserver.py demo/data/kitti/kitti_000008.bin configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth second
```
 
# 模型复杂度
您可以使用 MMDetection 中的 `tools/analysis_tools/get_flops.py` 这个脚本文件,基于 [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) 计算一个给定模型的计算量 (FLOPS) 和参数量 (params)。
```shell
python tools/analysis_tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
```
您将会得到如下的结果:
```text
==============================
Input shape: (4000, 4)
Flops: 5.78 GFLOPs
Params: 953.83 k
==============================
```
**注意**: 此工具仍然处于试验阶段,我们不能保证数值是绝对正确的。您可以将结果用于简单的比较,但在写技术文档报告或者论文之前您需要再次确认一下。
1. 计算量 (FLOPs) 和输入形状有关,但是参数量 (params) 则和输入形状无关。默认的输入形状为 (1, 40000, 4)。
2. 一些运算操作不计入计算量 (FLOPs),比如说像GN和定制的运算操作,详细细节请参考 [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py)。
3. 我们现在仅仅支持单模态输入(点云或者图片)的单阶段模型的计算量 (FLOPs) 计算,我们将会在未来支持两阶段和多模态模型的计算。
 
# 模型转换
## RegNet 模型转换到 MMDetection
`tools/model_converters/regnet2mmdet.py` 将 pycls 预训练 RegNet 模型中的键转换为 MMDetection 风格。
```shell
python tools/model_converters/regnet2mmdet.py ${SRC} ${DST} [-h]
```
## Detectron ResNet 转换到 Pytorch
MMDetection 中的 `tools/detectron2pytorch.py` 能够把原始的 detectron 中预训练的 ResNet 模型的键转换为 PyTorch 风格。
```shell
python tools/detectron2pytorch.py ${SRC} ${DST} ${DEPTH} [-h]
```
## 准备要发布的模型
`tools/model_converters/publish_model.py` 帮助用户准备他们用于发布的模型。
在您上传一个模型到云服务器 (AWS) 之前,您需要做以下几步:
1. 将模型权重转换为 CPU 张量
2. 删除记录优化器状态 (optimizer states) 的相关信息
3. 计算检查点 (checkpoint) 文件的哈希编码 (hash id) 并且把哈希编码加到文件名里
```shell
python tools/model_converters/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
```
例如,
```shell
python tools/model_converters/publish_model.py work_dirs/faster_rcnn/latest.pth faster_rcnn_r50_fpn_1x_20190801.pth
```
最终的输出文件名将会是 `faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth`。
 
# 数据集转换
`tools/data_converter/` 包含转换数据集为其他格式的一些工具。其中大多数转换数据集为基于 pickle 的信息文件,比如 KITTI,nuscense 和 lyft。Waymo 转换器被用来重新组织 waymo 原始数据为 KITTI 风格。用户能够参考它们了解我们转换数据格式的方法。将它们修改为 nuImages 转换器等脚本也很方便。
为了转换 nuImages 数据集为 COCO 格式,请使用下面的指令:
```shell
python -u tools/data_converter/nuimage_converter.py --data-root ${DATA_ROOT} --version ${VERSIONS} \
--out-dir ${OUT_DIR} --nproc ${NUM_WORKERS} --extra-tag ${TAG}
```
- `--data-root`: 数据集的根目录,默认为 `./data/nuimages`。
- `--version`: 数据集的版本,默认为 `v1.0-mini`。要获取完整数据集,请使用 `--version v1.0-train v1.0-val v1.0-mini`。
- `--out-dir`: 注释和语义掩码的输出目录,默认为 `./data/nuimages/annotations/`。
- `--nproc`: 数据准备的进程数,默认为 `4`。由于图片是并行处理的,更大的进程数目能够减少准备时间。
- `--extra-tag`: 注释的额外标签,默认为 `nuimages`。这可用于将不同时间处理的不同注释分开以供研究。
更多的数据准备细节参考 [doc](https://mmdetection3d.readthedocs.io/zh_CN/latest/data_preparation.html),nuImages 数据集的细节参考 [README](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/nuimages/README.md/)。
 
# 其他内容
## 打印完整的配置文件
`tools/misc/print_config.py` 逐字打印整个配置文件,展开所有的导入。
```shell
python tools/misc/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}]
```
# Copyright (c) OpenMMLab. All rights reserved.
import mmcv
import mmdet
import mmseg
from .version import __version__, short_version
def digit_version(version_str):
digit_version = []
for x in version_str.split('.'):
if x.isdigit():
digit_version.append(int(x))
elif x.find('rc') != -1:
patch_version = x.split('rc')
digit_version.append(int(patch_version[0]) - 1)
digit_version.append(int(patch_version[1]))
return digit_version
mmcv_minimum_version = '1.5.2'
mmcv_maximum_version = '1.7.0'
mmcv_version = digit_version(mmcv.__version__)
assert (mmcv_version >= digit_version(mmcv_minimum_version)
and mmcv_version <= digit_version(mmcv_maximum_version)), \
f'MMCV=={mmcv.__version__} is used but incompatible. ' \
f'Please install mmcv>={mmcv_minimum_version}, <={mmcv_maximum_version}.'
mmdet_minimum_version = '2.24.0'
mmdet_maximum_version = '3.0.0'
mmdet_version = digit_version(mmdet.__version__)
assert (mmdet_version >= digit_version(mmdet_minimum_version)
and mmdet_version <= digit_version(mmdet_maximum_version)), \
f'MMDET=={mmdet.__version__} is used but incompatible. ' \
f'Please install mmdet>={mmdet_minimum_version}, ' \
f'<={mmdet_maximum_version}.'
mmseg_minimum_version = '0.20.0'
mmseg_maximum_version = '1.0.0'
mmseg_version = digit_version(mmseg.__version__)
assert (mmseg_version >= digit_version(mmseg_minimum_version)
and mmseg_version <= digit_version(mmseg_maximum_version)), \
f'MMSEG=={mmseg.__version__} is used but incompatible. ' \
f'Please install mmseg>={mmseg_minimum_version}, ' \
f'<={mmseg_maximum_version}.'
__all__ = ['__version__', 'short_version']
# Copyright (c) OpenMMLab. All rights reserved.
from .inference import (convert_SyncBN, inference_detector,
inference_mono_3d_detector,
inference_multi_modality_detector, inference_segmentor,
init_model, show_result_meshlab)
from .test import single_gpu_test
from .train import init_random_seed, train_model
__all__ = [
'inference_detector', 'init_model', 'single_gpu_test',
'inference_mono_3d_detector', 'show_result_meshlab', 'convert_SyncBN',
'train_model', 'inference_multi_modality_detector', 'inference_segmentor',
'init_random_seed'
]
# Copyright (c) OpenMMLab. All rights reserved.
import re
from copy import deepcopy
from os import path as osp
import mmcv
import numpy as np
import torch
from mmcv.parallel import collate, scatter
from mmcv.runner import load_checkpoint
from mmdet3d.core import (Box3DMode, CameraInstance3DBoxes, Coord3DMode,
DepthInstance3DBoxes, LiDARInstance3DBoxes,
show_multi_modality_result, show_result,
show_seg_result)
from mmdet3d.core.bbox import get_box_type
from mmdet3d.datasets.pipelines import Compose
from mmdet3d.models import build_model
from mmdet3d.utils import get_root_logger
def convert_SyncBN(config):
"""Convert config's naiveSyncBN to BN.
Args:
config (str or :obj:`mmcv.Config`): Config file path or the config
object.
"""
if isinstance(config, dict):
for item in config:
if item == 'norm_cfg':
config[item]['type'] = config[item]['type']. \
replace('naiveSyncBN', 'BN')
else:
convert_SyncBN(config[item])
def init_model(config, checkpoint=None, device='cuda:0'):
"""Initialize a model from config file, which could be a 3D detector or a
3D segmentor.
Args:
config (str or :obj:`mmcv.Config`): Config file path or the config
object.
checkpoint (str, optional): Checkpoint path. If left as None, the model
will not load any weights.
device (str): Device to use.
Returns:
nn.Module: The constructed detector.
"""
if isinstance(config, str):
config = mmcv.Config.fromfile(config)
elif not isinstance(config, mmcv.Config):
raise TypeError('config must be a filename or Config object, '
f'but got {type(config)}')
config.model.pretrained = None
convert_SyncBN(config.model)
config.model.train_cfg = None
model = build_model(config.model, test_cfg=config.get('test_cfg'))
if checkpoint is not None:
checkpoint = load_checkpoint(model, checkpoint, map_location='cpu')
if 'CLASSES' in checkpoint['meta']:
model.CLASSES = checkpoint['meta']['CLASSES']
else:
model.CLASSES = config.class_names
if 'PALETTE' in checkpoint['meta']: # 3D Segmentor
model.PALETTE = checkpoint['meta']['PALETTE']
model.cfg = config # save the config in the model for convenience
if device != 'cpu':
torch.cuda.set_device(device)
else:
logger = get_root_logger()
logger.warning('Don\'t suggest using CPU device. '
'Some functions are not supported for now.')
model.to(device)
model.eval()
return model
def inference_detector(model, pcd):
"""Inference point cloud with the detector.
Args:
model (nn.Module): The loaded detector.
pcd (str): Point cloud files.
Returns:
tuple: Predicted results and data from pipeline.
"""
cfg = model.cfg
device = next(model.parameters()).device # model device
if not isinstance(pcd, str):
cfg = cfg.copy()
# set loading pipeline type
cfg.data.test.pipeline[0].type = 'LoadPointsFromDict'
# build the data pipeline
test_pipeline = deepcopy(cfg.data.test.pipeline)
test_pipeline = Compose(test_pipeline)
box_type_3d, box_mode_3d = get_box_type(cfg.data.test.box_type_3d)
if isinstance(pcd, str):
# load from point clouds file
data = dict(
pts_filename=pcd,
box_type_3d=box_type_3d,
box_mode_3d=box_mode_3d,
# for ScanNet demo we need axis_align_matrix
ann_info=dict(axis_align_matrix=np.eye(4)),
sweeps=[],
# set timestamp = 0
timestamp=[0],
img_fields=[],
bbox3d_fields=[],
pts_mask_fields=[],
pts_seg_fields=[],
bbox_fields=[],
mask_fields=[],
seg_fields=[])
else:
# load from http
data = dict(
points=pcd,
box_type_3d=box_type_3d,
box_mode_3d=box_mode_3d,
# for ScanNet demo we need axis_align_matrix
ann_info=dict(axis_align_matrix=np.eye(4)),
sweeps=[],
# set timestamp = 0
timestamp=[0],
img_fields=[],
bbox3d_fields=[],
pts_mask_fields=[],
pts_seg_fields=[],
bbox_fields=[],
mask_fields=[],
seg_fields=[])
data = test_pipeline(data)
data = collate([data], samples_per_gpu=1)
if next(model.parameters()).is_cuda:
# scatter to specified GPU
data = scatter(data, [device.index])[0]
else:
# this is a workaround to avoid the bug of MMDataParallel
data['img_metas'] = data['img_metas'][0].data
data['points'] = data['points'][0].data
# forward the model
with torch.no_grad():
result = model(return_loss=False, rescale=True, **data)
return result, data
def inference_multi_modality_detector(model, pcd, image, ann_file):
"""Inference point cloud with the multi-modality detector.
Args:
model (nn.Module): The loaded detector.
pcd (str): Point cloud files.
image (str): Image files.
ann_file (str): Annotation files.
Returns:
tuple: Predicted results and data from pipeline.
"""
cfg = model.cfg
device = next(model.parameters()).device # model device
# build the data pipeline
test_pipeline = deepcopy(cfg.data.test.pipeline)
test_pipeline = Compose(test_pipeline)
box_type_3d, box_mode_3d = get_box_type(cfg.data.test.box_type_3d)
# get data info containing calib
data_infos = mmcv.load(ann_file)
image_idx = int(re.findall(r'\d+', image)[-1]) # xxx/sunrgbd_000017.jpg
for x in data_infos:
if int(x['image']['image_idx']) != image_idx:
continue
info = x
break
data = dict(
pts_filename=pcd,
img_prefix=osp.dirname(image),
img_info=dict(filename=osp.basename(image)),
box_type_3d=box_type_3d,
box_mode_3d=box_mode_3d,
img_fields=[],
bbox3d_fields=[],
pts_mask_fields=[],
pts_seg_fields=[],
bbox_fields=[],
mask_fields=[],
seg_fields=[])
data = test_pipeline(data)
# TODO: this code is dataset-specific. Move lidar2img and
# depth2img to .pkl annotations in the future.
# LiDAR to image conversion
if box_mode_3d == Box3DMode.LIDAR:
rect = info['calib']['R0_rect'].astype(np.float32)
Trv2c = info['calib']['Tr_velo_to_cam'].astype(np.float32)
P2 = info['calib']['P2'].astype(np.float32)
lidar2img = P2 @ rect @ Trv2c
data['img_metas'][0].data['lidar2img'] = lidar2img
# Depth to image conversion
elif box_mode_3d == Box3DMode.DEPTH:
rt_mat = info['calib']['Rt']
# follow Coord3DMode.convert_point
rt_mat = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]
]) @ rt_mat.transpose(1, 0)
depth2img = info['calib']['K'] @ rt_mat
data['img_metas'][0].data['depth2img'] = depth2img
data = collate([data], samples_per_gpu=1)
if next(model.parameters()).is_cuda:
# scatter to specified GPU
data = scatter(data, [device.index])[0]
else:
# this is a workaround to avoid the bug of MMDataParallel
data['img_metas'] = data['img_metas'][0].data
data['points'] = data['points'][0].data
data['img'] = data['img'][0].data
# forward the model
with torch.no_grad():
result = model(return_loss=False, rescale=True, **data)
return result, data
def inference_mono_3d_detector(model, image, ann_file):
"""Inference image with the monocular 3D detector.
Args:
model (nn.Module): The loaded detector.
image (str): Image files.
ann_file (str): Annotation files.
Returns:
tuple: Predicted results and data from pipeline.
"""
cfg = model.cfg
device = next(model.parameters()).device # model device
# build the data pipeline
test_pipeline = deepcopy(cfg.data.test.pipeline)
test_pipeline = Compose(test_pipeline)
box_type_3d, box_mode_3d = get_box_type(cfg.data.test.box_type_3d)
# get data info containing calib
data_infos = mmcv.load(ann_file)
# find the info corresponding to this image
for x in data_infos['images']:
if osp.basename(x['file_name']) != osp.basename(image):
continue
img_info = x
break
data = dict(
img_prefix=osp.dirname(image),
img_info=dict(filename=osp.basename(image)),
box_type_3d=box_type_3d,
box_mode_3d=box_mode_3d,
img_fields=[],
bbox3d_fields=[],
pts_mask_fields=[],
pts_seg_fields=[],
bbox_fields=[],
mask_fields=[],
seg_fields=[])
# camera points to image conversion
if box_mode_3d == Box3DMode.CAM:
data['img_info'].update(dict(cam_intrinsic=img_info['cam_intrinsic']))
data = test_pipeline(data)
data = collate([data], samples_per_gpu=1)
if next(model.parameters()).is_cuda:
# scatter to specified GPU
data = scatter(data, [device.index])[0]
else:
# this is a workaround to avoid the bug of MMDataParallel
data['img_metas'] = data['img_metas'][0].data
data['img'] = data['img'][0].data
# forward the model
with torch.no_grad():
result = model(return_loss=False, rescale=True, **data)
return result, data
def inference_segmentor(model, pcd):
"""Inference point cloud with the segmentor.
Args:
model (nn.Module): The loaded segmentor.
pcd (str): Point cloud files.
Returns:
tuple: Predicted results and data from pipeline.
"""
cfg = model.cfg
device = next(model.parameters()).device # model device
# build the data pipeline
test_pipeline = deepcopy(cfg.data.test.pipeline)
test_pipeline = Compose(test_pipeline)
data = dict(
pts_filename=pcd,
img_fields=[],
bbox3d_fields=[],
pts_mask_fields=[],
pts_seg_fields=[],
bbox_fields=[],
mask_fields=[],
seg_fields=[])
data = test_pipeline(data)
data = collate([data], samples_per_gpu=1)
if next(model.parameters()).is_cuda:
# scatter to specified GPU
data = scatter(data, [device.index])[0]
else:
# this is a workaround to avoid the bug of MMDataParallel
data['img_metas'] = data['img_metas'][0].data
data['points'] = data['points'][0].data
# forward the model
with torch.no_grad():
result = model(return_loss=False, rescale=True, **data)
return result, data
def show_det_result_meshlab(data,
result,
out_dir,
score_thr=0.0,
show=False,
snapshot=False):
"""Show 3D detection result by meshlab."""
points = data['points'][0][0].cpu().numpy()
pts_filename = data['img_metas'][0][0]['pts_filename']
file_name = osp.split(pts_filename)[-1].split('.')[0]
if 'pts_bbox' in result[0].keys():
pred_bboxes = result[0]['pts_bbox']['boxes_3d'].tensor.numpy()
pred_scores = result[0]['pts_bbox']['scores_3d'].numpy()
else:
pred_bboxes = result[0]['boxes_3d'].tensor.numpy()
pred_scores = result[0]['scores_3d'].numpy()
# filter out low score bboxes for visualization
if score_thr > 0:
inds = pred_scores > score_thr
pred_bboxes = pred_bboxes[inds]
# for now we convert points into depth mode
box_mode = data['img_metas'][0][0]['box_mode_3d']
if box_mode != Box3DMode.DEPTH:
points = Coord3DMode.convert(points, box_mode, Coord3DMode.DEPTH)
show_bboxes = Box3DMode.convert(pred_bboxes, box_mode, Box3DMode.DEPTH)
else:
show_bboxes = deepcopy(pred_bboxes)
show_result(
points,
None,
show_bboxes,
out_dir,
file_name,
show=show,
snapshot=snapshot)
return file_name
def show_seg_result_meshlab(data,
result,
out_dir,
palette,
show=False,
snapshot=False):
"""Show 3D segmentation result by meshlab."""
points = data['points'][0][0].cpu().numpy()
pts_filename = data['img_metas'][0][0]['pts_filename']
file_name = osp.split(pts_filename)[-1].split('.')[0]
pred_seg = result[0]['semantic_mask'].numpy()
if palette is None:
# generate random color map
max_idx = pred_seg.max()
palette = np.random.randint(0, 256, size=(max_idx + 1, 3))
palette = np.array(palette).astype(np.int)
show_seg_result(
points,
None,
pred_seg,
out_dir,
file_name,
palette=palette,
show=show,
snapshot=snapshot)
return file_name
def show_proj_det_result_meshlab(data,
result,
out_dir,
score_thr=0.0,
show=False,
snapshot=False):
"""Show result of projecting 3D bbox to 2D image by meshlab."""
assert 'img' in data.keys(), 'image data is not provided for visualization'
img_filename = data['img_metas'][0][0]['filename']
file_name = osp.split(img_filename)[-1].split('.')[0]
# read from file because img in data_dict has undergone pipeline transform
img = mmcv.imread(img_filename)
if 'pts_bbox' in result[0].keys():
result[0] = result[0]['pts_bbox']
elif 'img_bbox' in result[0].keys():
result[0] = result[0]['img_bbox']
pred_bboxes = result[0]['boxes_3d'].tensor.numpy()
pred_scores = result[0]['scores_3d'].numpy()
# filter out low score bboxes for visualization
if score_thr > 0:
inds = pred_scores > score_thr
pred_bboxes = pred_bboxes[inds]
box_mode = data['img_metas'][0][0]['box_mode_3d']
if box_mode == Box3DMode.LIDAR:
if 'lidar2img' not in data['img_metas'][0][0]:
raise NotImplementedError(
'LiDAR to image transformation matrix is not provided')
show_bboxes = LiDARInstance3DBoxes(pred_bboxes, origin=(0.5, 0.5, 0))
show_multi_modality_result(
img,
None,
show_bboxes,
data['img_metas'][0][0]['lidar2img'],
out_dir,
file_name,
box_mode='lidar',
show=show)
elif box_mode == Box3DMode.DEPTH:
show_bboxes = DepthInstance3DBoxes(pred_bboxes, origin=(0.5, 0.5, 0))
show_multi_modality_result(
img,
None,
show_bboxes,
None,
out_dir,
file_name,
box_mode='depth',
img_metas=data['img_metas'][0][0],
show=show)
elif box_mode == Box3DMode.CAM:
if 'cam2img' not in data['img_metas'][0][0]:
raise NotImplementedError(
'camera intrinsic matrix is not provided')
show_bboxes = CameraInstance3DBoxes(
pred_bboxes, box_dim=pred_bboxes.shape[-1], origin=(0.5, 1.0, 0.5))
show_multi_modality_result(
img,
None,
show_bboxes,
data['img_metas'][0][0]['cam2img'],
out_dir,
file_name,
box_mode='camera',
show=show)
else:
raise NotImplementedError(
f'visualization of {box_mode} bbox is not supported')
return file_name
def show_result_meshlab(data,
result,
out_dir,
score_thr=0.0,
show=False,
snapshot=False,
task='det',
palette=None):
"""Show result by meshlab.
Args:
data (dict): Contain data from pipeline.
result (dict): Predicted result from model.
out_dir (str): Directory to save visualized result.
score_thr (float, optional): Minimum score of bboxes to be shown.
Default: 0.0
show (bool, optional): Visualize the results online. Defaults to False.
snapshot (bool, optional): Whether to save the online results.
Defaults to False.
task (str, optional): Distinguish which task result to visualize.
Currently we support 3D detection, multi-modality detection and
3D segmentation. Defaults to 'det'.
palette (list[list[int]]] | np.ndarray, optional): The palette
of segmentation map. If None is given, random palette will be
generated. Defaults to None.
"""
assert task in ['det', 'multi_modality-det', 'seg', 'mono-det'], \
f'unsupported visualization task {task}'
assert out_dir is not None, 'Expect out_dir, got none.'
if task in ['det', 'multi_modality-det']:
file_name = show_det_result_meshlab(data, result, out_dir, score_thr,
show, snapshot)
if task in ['seg']:
file_name = show_seg_result_meshlab(data, result, out_dir, palette,
show, snapshot)
if task in ['multi_modality-det', 'mono-det']:
file_name = show_proj_det_result_meshlab(data, result, out_dir,
score_thr, show, snapshot)
return out_dir, file_name
# Copyright (c) OpenMMLab. All rights reserved.
from os import path as osp
import mmcv
import torch
from mmcv.image import tensor2imgs
from mmdet3d.models import (Base3DDetector, Base3DSegmentor,
SingleStageMono3DDetector)
def single_gpu_test(model,
data_loader,
show=False,
out_dir=None,
show_score_thr=0.3):
"""Test model with single gpu.
This method tests model with single gpu and gives the 'show' option.
By setting ``show=True``, it saves the visualization results under
``out_dir``.
Args:
model (nn.Module): Model to be tested.
data_loader (nn.Dataloader): Pytorch data loader.
show (bool, optional): Whether to save viualization results.
Default: True.
out_dir (str, optional): The path to save visualization results.
Default: None.
Returns:
list[dict]: The prediction results.
"""
model.eval()
results = []
dataset = data_loader.dataset
prog_bar = mmcv.ProgressBar(len(dataset))
for i, data in enumerate(data_loader):
with torch.no_grad():
result = model(return_loss=False, rescale=True, **data)
if show:
# Visualize the results of MMDetection3D model
# 'show_results' is MMdetection3D visualization API
models_3d = (Base3DDetector, Base3DSegmentor,
SingleStageMono3DDetector)
if isinstance(model.module, models_3d):
model.module.show_results(
data,
result,
out_dir=out_dir,
show=show,
score_thr=show_score_thr)
# Visualize the results of MMDetection model
# 'show_result' is MMdetection visualization API
else:
batch_size = len(result)
if batch_size == 1 and isinstance(data['img'][0],
torch.Tensor):
img_tensor = data['img'][0]
else:
img_tensor = data['img'][0].data[0]
img_metas = data['img_metas'][0].data[0]
imgs = tensor2imgs(img_tensor, **img_metas[0]['img_norm_cfg'])
assert len(imgs) == len(img_metas)
for i, (img, img_meta) in enumerate(zip(imgs, img_metas)):
h, w, _ = img_meta['img_shape']
img_show = img[:h, :w, :]
ori_h, ori_w = img_meta['ori_shape'][:-1]
img_show = mmcv.imresize(img_show, (ori_w, ori_h))
if out_dir:
out_file = osp.join(out_dir, img_meta['ori_filename'])
else:
out_file = None
model.module.show_result(
img_show,
result[i],
show=show,
out_file=out_file,
score_thr=show_score_thr)
results.extend(result)
batch_size = len(result)
for _ in range(batch_size):
prog_bar.update()
return results
# Copyright (c) OpenMMLab. All rights reserved.
import random
import warnings
import numpy as np
import torch
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
from mmcv.runner import (HOOKS, DistSamplerSeedHook, EpochBasedRunner,
Fp16OptimizerHook, OptimizerHook, build_optimizer,
build_runner, get_dist_info)
from mmcv.utils import build_from_cfg
from torch import distributed as dist
from mmdet3d.datasets import build_dataset
from mmdet3d.utils import find_latest_checkpoint
from mmdet.core import DistEvalHook as MMDET_DistEvalHook
from mmdet.core import EvalHook as MMDET_EvalHook
from mmdet.datasets import build_dataloader as build_mmdet_dataloader
from mmdet.datasets import replace_ImageToTensor
from mmdet.utils import get_root_logger as get_mmdet_root_logger
from mmseg.core import DistEvalHook as MMSEG_DistEvalHook
from mmseg.core import EvalHook as MMSEG_EvalHook
from mmseg.datasets import build_dataloader as build_mmseg_dataloader
from mmseg.utils import get_root_logger as get_mmseg_root_logger
def init_random_seed(seed=None, device='cuda'):
"""Initialize random seed.
If the seed is not set, the seed will be automatically randomized,
and then broadcast to all processes to prevent some potential bugs.
Args:
seed (int, optional): The seed. Default to None.
device (str, optional): The device where the seed will be put on.
Default to 'cuda'.
Returns:
int: Seed to be used.
"""
if seed is not None:
return seed
# Make sure all ranks share the same random seed to prevent
# some potential bugs. Please refer to
# https://github.com/open-mmlab/mmdetection/issues/6339
rank, world_size = get_dist_info()
seed = np.random.randint(2**31)
if world_size == 1:
return seed
if rank == 0:
random_num = torch.tensor(seed, dtype=torch.int32, device=device)
else:
random_num = torch.tensor(0, dtype=torch.int32, device=device)
dist.broadcast(random_num, src=0)
return random_num.item()
def set_random_seed(seed, deterministic=False):
"""Set random seed.
Args:
seed (int): Seed to be used.
deterministic (bool): Whether to set the deterministic option for
CUDNN backend, i.e., set `torch.backends.cudnn.deterministic`
to True and `torch.backends.cudnn.benchmark` to False.
Default: False.
"""
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
if deterministic:
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
def train_segmentor(model,
dataset,
cfg,
distributed=False,
validate=False,
timestamp=None,
meta=None):
"""Launch segmentor training."""
logger = get_mmseg_root_logger(cfg.log_level)
# prepare data loaders
dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset]
data_loaders = [
build_mmseg_dataloader(
ds,
cfg.data.samples_per_gpu,
cfg.data.workers_per_gpu,
# cfg.gpus will be ignored if distributed
len(cfg.gpu_ids),
dist=distributed,
seed=cfg.seed,
drop_last=True) for ds in dataset
]
# put model on gpus
if distributed:
find_unused_parameters = cfg.get('find_unused_parameters', False)
# Sets the `find_unused_parameters` parameter in
# torch.nn.parallel.DistributedDataParallel
#model.to(memory_format=torch.channels_last)
model = MMDistributedDataParallel(
model.cuda(),
device_ids=[torch.cuda.current_device()],
broadcast_buffers=False,
find_unused_parameters=find_unused_parameters)
else:
model = MMDataParallel(
model.cuda(cfg.gpu_ids[0]), device_ids=cfg.gpu_ids)
# build runner
optimizer = build_optimizer(model, cfg.optimizer)
if cfg.get('runner') is None:
cfg.runner = {'type': 'IterBasedRunner', 'max_iters': cfg.total_iters}
warnings.warn(
'config is now expected to have a `runner` section, '
'please set `runner` in your config.', UserWarning)
runner = build_runner(
cfg.runner,
default_args=dict(
model=model,
batch_processor=None,
optimizer=optimizer,
work_dir=cfg.work_dir,
logger=logger,
meta=meta))
# register hooks
runner.register_training_hooks(cfg.lr_config, cfg.optimizer_config,
cfg.checkpoint_config, cfg.log_config,
cfg.get('momentum_config', None))
# an ugly walkaround to make the .log and .log.json filenames the same
runner.timestamp = timestamp
# register eval hooks
if validate:
val_dataset = build_dataset(cfg.data.val, dict(test_mode=True))
val_dataloader = build_mmseg_dataloader(
val_dataset,
samples_per_gpu=1,
workers_per_gpu=cfg.data.workers_per_gpu,
dist=distributed,
shuffle=False)
eval_cfg = cfg.get('evaluation', {})
eval_cfg['by_epoch'] = cfg.runner['type'] != 'IterBasedRunner'
eval_hook = MMSEG_DistEvalHook if distributed else MMSEG_EvalHook
# In this PR (https://github.com/open-mmlab/mmcv/pull/1193), the
# priority of IterTimerHook has been modified from 'NORMAL' to 'LOW'.
runner.register_hook(
eval_hook(val_dataloader, **eval_cfg), priority='LOW')
# user-defined hooks
if cfg.get('custom_hooks', None):
custom_hooks = cfg.custom_hooks
assert isinstance(custom_hooks, list), \
f'custom_hooks expect list type, but got {type(custom_hooks)}'
for hook_cfg in cfg.custom_hooks:
assert isinstance(hook_cfg, dict), \
'Each item in custom_hooks expects dict type, but got ' \
f'{type(hook_cfg)}'
hook_cfg = hook_cfg.copy()
priority = hook_cfg.pop('priority', 'NORMAL')
hook = build_from_cfg(hook_cfg, HOOKS)
runner.register_hook(hook, priority=priority)
if cfg.resume_from:
runner.resume(cfg.resume_from)
elif cfg.load_from:
runner.load_checkpoint(cfg.load_from)
runner.run(data_loaders, cfg.workflow)
def train_detector(model,
dataset,
cfg,
distributed=False,
validate=False,
timestamp=None,
meta=None):
logger = get_mmdet_root_logger(log_level=cfg.log_level)
# prepare data loaders
dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset]
if 'imgs_per_gpu' in cfg.data:
logger.warning('"imgs_per_gpu" is deprecated in MMDet V2.0. '
'Please use "samples_per_gpu" instead')
if 'samples_per_gpu' in cfg.data:
logger.warning(
f'Got "imgs_per_gpu"={cfg.data.imgs_per_gpu} and '
f'"samples_per_gpu"={cfg.data.samples_per_gpu}, "imgs_per_gpu"'
f'={cfg.data.imgs_per_gpu} is used in this experiments')
else:
logger.warning(
'Automatically set "samples_per_gpu"="imgs_per_gpu"='
f'{cfg.data.imgs_per_gpu} in this experiments')
cfg.data.samples_per_gpu = cfg.data.imgs_per_gpu
runner_type = 'EpochBasedRunner' if 'runner' not in cfg else cfg.runner[
'type']
data_loaders = [
build_mmdet_dataloader(
ds,
cfg.data.samples_per_gpu,
cfg.data.workers_per_gpu,
# `num_gpus` will be ignored if distributed
num_gpus=len(cfg.gpu_ids),
dist=distributed,
seed=cfg.seed,
runner_type=runner_type,
pin_memory=True,
persistent_workers=True)
#persistent_workers=cfg.data.get('persistent_workers', False))
for ds in dataset
]
# put model on gpus
#model.to(memory_format=torch.channels_last)
if distributed:
find_unused_parameters = cfg.get('find_unused_parameters', False)
# Sets the `find_unused_parameters` parameter in
# torch.nn.parallel.DistributedDataParallel
model = MMDistributedDataParallel(
model.cuda(),
device_ids=[torch.cuda.current_device()],
broadcast_buffers=False,
find_unused_parameters=find_unused_parameters)
else:
model = MMDataParallel(
model.cuda(cfg.gpu_ids[0]), device_ids=cfg.gpu_ids)
# build runner
optimizer = build_optimizer(model, cfg.optimizer)
if 'runner' not in cfg:
cfg.runner = {
'type': 'EpochBasedRunner',
'max_epochs': cfg.total_epochs
}
warnings.warn(
'config is now expected to have a `runner` section, '
'please set `runner` in your config.', UserWarning)
else:
if 'total_epochs' in cfg:
assert cfg.total_epochs == cfg.runner.max_epochs
runner = build_runner(
cfg.runner,
default_args=dict(
model=model,
optimizer=optimizer,
work_dir=cfg.work_dir,
logger=logger,
meta=meta))
# an ugly workaround to make .log and .log.json filenames the same
runner.timestamp = timestamp
# fp16 setting
fp16_cfg = cfg.get('fp16', None)
if fp16_cfg is not None:
optimizer_config = Fp16OptimizerHook(
**cfg.optimizer_config, **fp16_cfg, distributed=distributed)
elif distributed and 'type' not in cfg.optimizer_config:
optimizer_config = OptimizerHook(**cfg.optimizer_config)
else:
optimizer_config = cfg.optimizer_config
# register hooks
runner.register_training_hooks(
cfg.lr_config,
optimizer_config,
cfg.checkpoint_config,
cfg.log_config,
cfg.get('momentum_config', None),
custom_hooks_config=cfg.get('custom_hooks', None))
if distributed:
if isinstance(runner, EpochBasedRunner):
runner.register_hook(DistSamplerSeedHook())
# register eval hooks
if validate:
# Support batch_size > 1 in validation
val_samples_per_gpu = cfg.data.val.pop('samples_per_gpu', 1)
if val_samples_per_gpu > 1:
# Replace 'ImageToTensor' to 'DefaultFormatBundle'
cfg.data.val.pipeline = replace_ImageToTensor(
cfg.data.val.pipeline)
val_dataset = build_dataset(cfg.data.val, dict(test_mode=True))
val_dataloader = build_mmdet_dataloader(
val_dataset,
samples_per_gpu=val_samples_per_gpu,
workers_per_gpu=cfg.data.workers_per_gpu,
dist=distributed,
shuffle=False)
eval_cfg = cfg.get('evaluation', {})
eval_cfg['by_epoch'] = cfg.runner['type'] != 'IterBasedRunner'
eval_hook = MMDET_DistEvalHook if distributed else MMDET_EvalHook
# In this PR (https://github.com/open-mmlab/mmcv/pull/1193), the
# priority of IterTimerHook has been modified from 'NORMAL' to 'LOW'.
runner.register_hook(
eval_hook(val_dataloader, **eval_cfg), priority='LOW')
resume_from = None
if cfg.resume_from is None and cfg.get('auto_resume'):
resume_from = find_latest_checkpoint(cfg.work_dir)
if resume_from is not None:
cfg.resume_from = resume_from
if cfg.resume_from:
runner.resume(cfg.resume_from)
elif cfg.load_from:
runner.load_checkpoint(cfg.load_from)
runner.run(data_loaders, cfg.workflow)
def train_model(model,
dataset,
cfg,
distributed=False,
validate=False,
timestamp=None,
meta=None):
"""A function wrapper for launching model training according to cfg.
Because we need different eval_hook in runner. Should be deprecated in the
future.
"""
if cfg.model.type in ['EncoderDecoder3D']:
train_segmentor(
model,
dataset,
cfg,
distributed=distributed,
validate=validate,
timestamp=timestamp,
meta=meta)
else:
train_detector(
model,
dataset,
cfg,
distributed=distributed,
validate=validate,
timestamp=timestamp,
meta=meta)
# Copyright (c) OpenMMLab. All rights reserved.
from .anchor import * # noqa: F401, F403
from .bbox import * # noqa: F401, F403
from .evaluation import * # noqa: F401, F403
from .points import * # noqa: F401, F403
from .post_processing import * # noqa: F401, F403
from .utils import * # noqa: F401, F403
from .visualizer import * # noqa: F401, F403
from .voxel import * # noqa: F401, F403
# Copyright (c) OpenMMLab. All rights reserved.
from mmdet.core.anchor import build_prior_generator
from .anchor_3d_generator import (AlignedAnchor3DRangeGenerator,
AlignedAnchor3DRangeGeneratorPerCls,
Anchor3DRangeGenerator)
__all__ = [
'AlignedAnchor3DRangeGenerator', 'Anchor3DRangeGenerator',
'build_prior_generator', 'AlignedAnchor3DRangeGeneratorPerCls'
]
# Copyright (c) OpenMMLab. All rights reserved.
import mmcv
import torch
from mmdet.core.anchor import ANCHOR_GENERATORS
@ANCHOR_GENERATORS.register_module()
class Anchor3DRangeGenerator(object):
"""3D Anchor Generator by range.
This anchor generator generates anchors by the given range in different
feature levels.
Due the convention in 3D detection, different anchor sizes are related to
different ranges for different categories. However we find this setting
does not effect the performance much in some datasets, e.g., nuScenes.
Args:
ranges (list[list[float]]): Ranges of different anchors.
The ranges are the same across different feature levels. But may
vary for different anchor sizes if size_per_range is True.
sizes (list[list[float]], optional): 3D sizes of anchors.
Defaults to [[3.9, 1.6, 1.56]].
scales (list[int], optional): Scales of anchors in different feature
levels. Defaults to [1].
rotations (list[float], optional): Rotations of anchors in a feature
grid. Defaults to [0, 1.5707963].
custom_values (tuple[float], optional): Customized values of that
anchor. For example, in nuScenes the anchors have velocities.
Defaults to ().
reshape_out (bool, optional): Whether to reshape the output into
(N x 4). Defaults to True.
size_per_range (bool, optional): Whether to use separate ranges for
different sizes. If size_per_range is True, the ranges should have
the same length as the sizes, if not, it will be duplicated.
Defaults to True.
"""
def __init__(self,
ranges,
sizes=[[3.9, 1.6, 1.56]],
scales=[1],
rotations=[0, 1.5707963],
custom_values=(),
reshape_out=True,
size_per_range=True):
assert mmcv.is_list_of(ranges, list)
if size_per_range:
if len(sizes) != len(ranges):
assert len(ranges) == 1
ranges = ranges * len(sizes)
assert len(ranges) == len(sizes)
else:
assert len(ranges) == 1
assert mmcv.is_list_of(sizes, list)
assert isinstance(scales, list)
self.sizes = sizes
self.scales = scales
self.ranges = ranges
self.rotations = rotations
self.custom_values = custom_values
self.cached_anchors = None
self.reshape_out = reshape_out
self.size_per_range = size_per_range
def __repr__(self):
s = self.__class__.__name__ + '('
s += f'anchor_range={self.ranges},\n'
s += f'scales={self.scales},\n'
s += f'sizes={self.sizes},\n'
s += f'rotations={self.rotations},\n'
s += f'reshape_out={self.reshape_out},\n'
s += f'size_per_range={self.size_per_range})'
return s
@property
def num_base_anchors(self):
"""list[int]: Total number of base anchors in a feature grid."""
num_rot = len(self.rotations)
num_size = torch.tensor(self.sizes).reshape(-1, 3).size(0)
return num_rot * num_size
@property
def num_levels(self):
"""int: Number of feature levels that the generator is applied to."""
return len(self.scales)
def grid_anchors(self, featmap_sizes, device='cuda'):
"""Generate grid anchors in multiple feature levels.
Args:
featmap_sizes (list[tuple]): List of feature map sizes in
multiple feature levels.
device (str, optional): Device where the anchors will be put on.
Defaults to 'cuda'.
Returns:
list[torch.Tensor]: Anchors in multiple feature levels.
The sizes of each tensor should be [N, 4], where
N = width * height * num_base_anchors, width and height
are the sizes of the corresponding feature level,
num_base_anchors is the number of anchors for that level.
"""
assert self.num_levels == len(featmap_sizes)
multi_level_anchors = []
for i in range(self.num_levels):
anchors = self.single_level_grid_anchors(
featmap_sizes[i], self.scales[i], device=device)
if self.reshape_out:
anchors = anchors.reshape(-1, anchors.size(-1))
multi_level_anchors.append(anchors)
return multi_level_anchors
def single_level_grid_anchors(self, featmap_size, scale, device='cuda'):
"""Generate grid anchors of a single level feature map.
This function is usually called by method ``self.grid_anchors``.
Args:
featmap_size (tuple[int]): Size of the feature map.
scale (float): Scale factor of the anchors in the current level.
device (str, optional): Device the tensor will be put on.
Defaults to 'cuda'.
Returns:
torch.Tensor: Anchors in the overall feature map.
"""
# We reimplement the anchor generator using torch in cuda
# torch: 0.6975 s for 1000 times
# numpy: 4.3345 s for 1000 times
# which is ~5 times faster than the numpy implementation
if not self.size_per_range:
return self.anchors_single_range(
featmap_size,
self.ranges[0],
scale,
self.sizes,
self.rotations,
device=device)
mr_anchors = []
for anchor_range, anchor_size in zip(self.ranges, self.sizes):
mr_anchors.append(
self.anchors_single_range(
featmap_size,
anchor_range,
scale,
anchor_size,
self.rotations,
device=device))
mr_anchors = torch.cat(mr_anchors, dim=-3)
return mr_anchors
def anchors_single_range(self,
feature_size,
anchor_range,
scale=1,
sizes=[[3.9, 1.6, 1.56]],
rotations=[0, 1.5707963],
device='cuda'):
"""Generate anchors in a single range.
Args:
feature_size (list[float] | tuple[float]): Feature map size. It is
either a list of a tuple of [D, H, W](in order of z, y, and x).
anchor_range (torch.Tensor | list[float]): Range of anchors with
shape [6]. The order is consistent with that of anchors, i.e.,
(x_min, y_min, z_min, x_max, y_max, z_max).
scale (float | int, optional): The scale factor of anchors.
Defaults to 1.
sizes (list[list] | np.ndarray | torch.Tensor, optional):
Anchor size with shape [N, 3], in order of x, y, z.
Defaults to [[3.9, 1.6, 1.56]].
rotations (list[float] | np.ndarray | torch.Tensor, optional):
Rotations of anchors in a single feature grid.
Defaults to [0, 1.5707963].
device (str): Devices that the anchors will be put on.
Defaults to 'cuda'.
Returns:
torch.Tensor: Anchors with shape
[*feature_size, num_sizes, num_rots, 7].
"""
if len(feature_size) == 2:
feature_size = [1, feature_size[0], feature_size[1]]
anchor_range = torch.tensor(anchor_range, device=device)
z_centers = torch.linspace(
anchor_range[2], anchor_range[5], feature_size[0], device=device)
y_centers = torch.linspace(
anchor_range[1], anchor_range[4], feature_size[1], device=device)
x_centers = torch.linspace(
anchor_range[0], anchor_range[3], feature_size[2], device=device)
sizes = torch.tensor(sizes, device=device).reshape(-1, 3) * scale
rotations = torch.tensor(rotations, device=device)
# torch.meshgrid default behavior is 'id', np's default is 'xy'
rets = torch.meshgrid(x_centers, y_centers, z_centers, rotations)
# torch.meshgrid returns a tuple rather than list
rets = list(rets)
tile_shape = [1] * 5
tile_shape[-2] = int(sizes.shape[0])
for i in range(len(rets)):
rets[i] = rets[i].unsqueeze(-2).repeat(tile_shape).unsqueeze(-1)
sizes = sizes.reshape([1, 1, 1, -1, 1, 3])
tile_size_shape = list(rets[0].shape)
tile_size_shape[3] = 1
sizes = sizes.repeat(tile_size_shape)
rets.insert(3, sizes)
ret = torch.cat(rets, dim=-1).permute([2, 1, 0, 3, 4, 5])
# [1, 200, 176, N, 2, 7] for kitti after permute
if len(self.custom_values) > 0:
custom_ndim = len(self.custom_values)
custom = ret.new_zeros([*ret.shape[:-1], custom_ndim])
# custom[:] = self.custom_values
ret = torch.cat([ret, custom], dim=-1)
# [1, 200, 176, N, 2, 9] for nus dataset after permute
return ret
@ANCHOR_GENERATORS.register_module()
class AlignedAnchor3DRangeGenerator(Anchor3DRangeGenerator):
"""Aligned 3D Anchor Generator by range.
This anchor generator uses a different manner to generate the positions
of anchors' centers from :class:`Anchor3DRangeGenerator`.
Note:
The `align` means that the anchor's center is aligned with the voxel
grid, which is also the feature grid. The previous implementation of
:class:`Anchor3DRangeGenerator` does not generate the anchors' center
according to the voxel grid. Rather, it generates the center by
uniformly distributing the anchors inside the minimum and maximum
anchor ranges according to the feature map sizes.
However, this makes the anchors center does not match the feature grid.
The :class:`AlignedAnchor3DRangeGenerator` add + 1 when using the
feature map sizes to obtain the corners of the voxel grid. Then it
shifts the coordinates to the center of voxel grid and use the left
up corner to distribute anchors.
Args:
anchor_corner (bool, optional): Whether to align with the corner of the
voxel grid. By default it is False and the anchor's center will be
the same as the corresponding voxel's center, which is also the
center of the corresponding greature grid. Defaults to False.
"""
def __init__(self, align_corner=False, **kwargs):
super(AlignedAnchor3DRangeGenerator, self).__init__(**kwargs)
self.align_corner = align_corner
def anchors_single_range(self,
feature_size,
anchor_range,
scale,
sizes=[[3.9, 1.6, 1.56]],
rotations=[0, 1.5707963],
device='cuda'):
"""Generate anchors in a single range.
Args:
feature_size (list[float] | tuple[float]): Feature map size. It is
either a list of a tuple of [D, H, W](in order of z, y, and x).
anchor_range (torch.Tensor | list[float]): Range of anchors with
shape [6]. The order is consistent with that of anchors, i.e.,
(x_min, y_min, z_min, x_max, y_max, z_max).
scale (float | int): The scale factor of anchors.
sizes (list[list] | np.ndarray | torch.Tensor, optional):
Anchor size with shape [N, 3], in order of x, y, z.
Defaults to [[3.9, 1.6, 1.56]].
rotations (list[float] | np.ndarray | torch.Tensor, optional):
Rotations of anchors in a single feature grid.
Defaults to [0, 1.5707963].
device (str, optional): Devices that the anchors will be put on.
Defaults to 'cuda'.
Returns:
torch.Tensor: Anchors with shape
[*feature_size, num_sizes, num_rots, 7].
"""
if len(feature_size) == 2:
feature_size = [1, feature_size[0], feature_size[1]]
anchor_range = torch.tensor(anchor_range, device=device)
z_centers = torch.linspace(
anchor_range[2],
anchor_range[5],
feature_size[0] + 1,
device=device)
y_centers = torch.linspace(
anchor_range[1],
anchor_range[4],
feature_size[1] + 1,
device=device)
x_centers = torch.linspace(
anchor_range[0],
anchor_range[3],
feature_size[2] + 1,
device=device)
sizes = torch.tensor(sizes, device=device).reshape(-1, 3) * scale
rotations = torch.tensor(rotations, device=device)
# shift the anchor center
if not self.align_corner:
z_shift = (z_centers[1] - z_centers[0]) / 2
y_shift = (y_centers[1] - y_centers[0]) / 2
x_shift = (x_centers[1] - x_centers[0]) / 2
z_centers += z_shift
y_centers += y_shift
x_centers += x_shift
# torch.meshgrid default behavior is 'id', np's default is 'xy'
rets = torch.meshgrid(x_centers[:feature_size[2]],
y_centers[:feature_size[1]],
z_centers[:feature_size[0]], rotations)
# torch.meshgrid returns a tuple rather than list
rets = list(rets)
tile_shape = [1] * 5
tile_shape[-2] = int(sizes.shape[0])
for i in range(len(rets)):
rets[i] = rets[i].unsqueeze(-2).repeat(tile_shape).unsqueeze(-1)
sizes = sizes.reshape([1, 1, 1, -1, 1, 3])
tile_size_shape = list(rets[0].shape)
tile_size_shape[3] = 1
sizes = sizes.repeat(tile_size_shape)
rets.insert(3, sizes)
ret = torch.cat(rets, dim=-1).permute([2, 1, 0, 3, 4, 5])
if len(self.custom_values) > 0:
custom_ndim = len(self.custom_values)
custom = ret.new_zeros([*ret.shape[:-1], custom_ndim])
# TODO: check the support of custom values
# custom[:] = self.custom_values
ret = torch.cat([ret, custom], dim=-1)
return ret
@ANCHOR_GENERATORS.register_module()
class AlignedAnchor3DRangeGeneratorPerCls(AlignedAnchor3DRangeGenerator):
"""3D Anchor Generator by range for per class.
This anchor generator generates anchors by the given range for per class.
Note that feature maps of different classes may be different.
Args:
kwargs (dict): Arguments are the same as those in
:class:`AlignedAnchor3DRangeGenerator`.
"""
def __init__(self, **kwargs):
super(AlignedAnchor3DRangeGeneratorPerCls, self).__init__(**kwargs)
assert len(self.scales) == 1, 'Multi-scale feature map levels are' + \
' not supported currently in this kind of anchor generator.'
def grid_anchors(self, featmap_sizes, device='cuda'):
"""Generate grid anchors in multiple feature levels.
Args:
featmap_sizes (list[tuple]): List of feature map sizes for
different classes in a single feature level.
device (str, optional): Device where the anchors will be put on.
Defaults to 'cuda'.
Returns:
list[list[torch.Tensor]]: Anchors in multiple feature levels.
Note that in this anchor generator, we currently only
support single feature level. The sizes of each tensor
should be [num_sizes/ranges*num_rots*featmap_size,
box_code_size].
"""
multi_level_anchors = []
anchors = self.multi_cls_grid_anchors(
featmap_sizes, self.scales[0], device=device)
multi_level_anchors.append(anchors)
return multi_level_anchors
def multi_cls_grid_anchors(self, featmap_sizes, scale, device='cuda'):
"""Generate grid anchors of a single level feature map for multi-class
with different feature map sizes.
This function is usually called by method ``self.grid_anchors``.
Args:
featmap_sizes (list[tuple]): List of feature map sizes for
different classes in a single feature level.
scale (float): Scale factor of the anchors in the current level.
device (str, optional): Device the tensor will be put on.
Defaults to 'cuda'.
Returns:
torch.Tensor: Anchors in the overall feature map.
"""
assert len(featmap_sizes) == len(self.sizes) == len(self.ranges), \
'The number of different feature map sizes anchor sizes and ' + \
'ranges should be the same.'
multi_cls_anchors = []
for i in range(len(featmap_sizes)):
anchors = self.anchors_single_range(
featmap_sizes[i],
self.ranges[i],
scale,
self.sizes[i],
self.rotations,
device=device)
# [*featmap_size, num_sizes/ranges, num_rots, box_code_size]
ndim = len(featmap_sizes[i])
anchors = anchors.view(*featmap_sizes[i], -1, anchors.size(-1))
# [*featmap_size, num_sizes/ranges*num_rots, box_code_size]
anchors = anchors.permute(ndim, *range(0, ndim), ndim + 1)
# [num_sizes/ranges*num_rots, *featmap_size, box_code_size]
multi_cls_anchors.append(anchors.reshape(-1, anchors.size(-1)))
# [num_sizes/ranges*num_rots*featmap_size, box_code_size]
return multi_cls_anchors
# Copyright (c) OpenMMLab. All rights reserved.
from .assigners import AssignResult, BaseAssigner, MaxIoUAssigner
from .coders import DeltaXYZWLHRBBoxCoder
# from .bbox_target import bbox_target
from .iou_calculators import (AxisAlignedBboxOverlaps3D, BboxOverlaps3D,
BboxOverlapsNearest3D,
axis_aligned_bbox_overlaps_3d, bbox_overlaps_3d,
bbox_overlaps_nearest_3d)
from .samplers import (BaseSampler, CombinedSampler,
InstanceBalancedPosSampler, IoUBalancedNegSampler,
PseudoSampler, RandomSampler, SamplingResult)
from .structures import (BaseInstance3DBoxes, Box3DMode, CameraInstance3DBoxes,
Coord3DMode, DepthInstance3DBoxes,
LiDARInstance3DBoxes, get_box_type, limit_period,
mono_cam_box2vis, points_cam2img, points_img2cam,
xywhr2xyxyr)
from .transforms import bbox3d2result, bbox3d2roi, bbox3d_mapping_back
__all__ = [
'BaseSampler', 'AssignResult', 'BaseAssigner', 'MaxIoUAssigner',
'PseudoSampler', 'RandomSampler', 'InstanceBalancedPosSampler',
'IoUBalancedNegSampler', 'CombinedSampler', 'SamplingResult',
'DeltaXYZWLHRBBoxCoder', 'BboxOverlapsNearest3D', 'BboxOverlaps3D',
'bbox_overlaps_nearest_3d', 'bbox_overlaps_3d',
'AxisAlignedBboxOverlaps3D', 'axis_aligned_bbox_overlaps_3d', 'Box3DMode',
'LiDARInstance3DBoxes', 'CameraInstance3DBoxes', 'bbox3d2roi',
'bbox3d2result', 'DepthInstance3DBoxes', 'BaseInstance3DBoxes',
'bbox3d_mapping_back', 'xywhr2xyxyr', 'limit_period', 'points_cam2img',
'points_img2cam', 'get_box_type', 'Coord3DMode', 'mono_cam_box2vis'
]
# Copyright (c) OpenMMLab. All rights reserved.
from mmdet.core.bbox import AssignResult, BaseAssigner, MaxIoUAssigner
__all__ = ['BaseAssigner', 'MaxIoUAssigner', 'AssignResult']
# Copyright (c) OpenMMLab. All rights reserved.
# TODO: clean the functions in this file and move the APIs into box structures
# in the future
# NOTICE: All functions in this file are valid for LiDAR or depth boxes only
# if we use default parameters.
import numba
import numpy as np
from .structures.utils import limit_period, points_cam2img, rotation_3d_in_axis
def camera_to_lidar(points, r_rect, velo2cam):
"""Convert points in camera coordinate to lidar coordinate.
Note:
This function is for KITTI only.
Args:
points (np.ndarray, shape=[N, 3]): Points in camera coordinate.
r_rect (np.ndarray, shape=[4, 4]): Matrix to project points in
specific camera coordinate (e.g. CAM2) to CAM0.
velo2cam (np.ndarray, shape=[4, 4]): Matrix to project points in
camera coordinate to lidar coordinate.
Returns:
np.ndarray, shape=[N, 3]: Points in lidar coordinate.
"""
points_shape = list(points.shape[0:-1])
if points.shape[-1] == 3:
points = np.concatenate([points, np.ones(points_shape + [1])], axis=-1)
lidar_points = points @ np.linalg.inv((r_rect @ velo2cam).T)
return lidar_points[..., :3]
def box_camera_to_lidar(data, r_rect, velo2cam):
"""Convert boxes in camera coordinate to lidar coordinate.
Note:
This function is for KITTI only.
Args:
data (np.ndarray, shape=[N, 7]): Boxes in camera coordinate.
r_rect (np.ndarray, shape=[4, 4]): Matrix to project points in
specific camera coordinate (e.g. CAM2) to CAM0.
velo2cam (np.ndarray, shape=[4, 4]): Matrix to project points in
camera coordinate to lidar coordinate.
Returns:
np.ndarray, shape=[N, 3]: Boxes in lidar coordinate.
"""
xyz = data[:, 0:3]
x_size, y_size, z_size = data[:, 3:4], data[:, 4:5], data[:, 5:6]
r = data[:, 6:7]
xyz_lidar = camera_to_lidar(xyz, r_rect, velo2cam)
# yaw and dims also needs to be converted
r_new = -r - np.pi / 2
r_new = limit_period(r_new, period=np.pi * 2)
return np.concatenate([xyz_lidar, x_size, z_size, y_size, r_new], axis=1)
def corners_nd(dims, origin=0.5):
"""Generate relative box corners based on length per dim and origin point.
Args:
dims (np.ndarray, shape=[N, ndim]): Array of length per dim
origin (list or array or float, optional): origin point relate to
smallest point. Defaults to 0.5
Returns:
np.ndarray, shape=[N, 2 ** ndim, ndim]: Returned corners.
point layout example: (2d) x0y0, x0y1, x1y0, x1y1;
(3d) x0y0z0, x0y0z1, x0y1z0, x0y1z1, x1y0z0, x1y0z1, x1y1z0, x1y1z1
where x0 < x1, y0 < y1, z0 < z1.
"""
ndim = int(dims.shape[1])
corners_norm = np.stack(
np.unravel_index(np.arange(2**ndim), [2] * ndim),
axis=1).astype(dims.dtype)
# now corners_norm has format: (2d) x0y0, x0y1, x1y0, x1y1
# (3d) x0y0z0, x0y0z1, x0y1z0, x0y1z1, x1y0z0, x1y0z1, x1y1z0, x1y1z1
# so need to convert to a format which is convenient to do other computing.
# for 2d boxes, format is clockwise start with minimum point
# for 3d boxes, please draw lines by your hand.
if ndim == 2:
# generate clockwise box corners
corners_norm = corners_norm[[0, 1, 3, 2]]
elif ndim == 3:
corners_norm = corners_norm[[0, 1, 3, 2, 4, 5, 7, 6]]
corners_norm = corners_norm - np.array(origin, dtype=dims.dtype)
corners = dims.reshape([-1, 1, ndim]) * corners_norm.reshape(
[1, 2**ndim, ndim])
return corners
def center_to_corner_box2d(centers, dims, angles=None, origin=0.5):
"""Convert kitti locations, dimensions and angles to corners.
format: center(xy), dims(xy), angles(counterclockwise when positive)
Args:
centers (np.ndarray): Locations in kitti label file with shape (N, 2).
dims (np.ndarray): Dimensions in kitti label file with shape (N, 2).
angles (np.ndarray, optional): Rotation_y in kitti label file with
shape (N). Defaults to None.
origin (list or array or float, optional): origin point relate to
smallest point. Defaults to 0.5.
Returns:
np.ndarray: Corners with the shape of (N, 4, 2).
"""
# 'length' in kitti format is in x axis.
# xyz(hwl)(kitti label file)<->xyz(lhw)(camera)<->z(-x)(-y)(wlh)(lidar)
# center in kitti format is [0.5, 1.0, 0.5] in xyz.
corners = corners_nd(dims, origin=origin)
# corners: [N, 4, 2]
if angles is not None:
corners = rotation_3d_in_axis(corners, angles)
corners += centers.reshape([-1, 1, 2])
return corners
@numba.jit(nopython=True)
def depth_to_points(depth, trunc_pixel):
"""Convert depth map to points.
Args:
depth (np.array, shape=[H, W]): Depth map which
the row of [0~`trunc_pixel`] are truncated.
trunc_pixel (int): The number of truncated row.
Returns:
np.ndarray: Points in camera coordinates.
"""
num_pts = np.sum(depth[trunc_pixel:, ] > 0.1)
points = np.zeros((num_pts, 3), dtype=depth.dtype)
x = np.array([0, 0, 1], dtype=depth.dtype)
k = 0
for i in range(trunc_pixel, depth.shape[0]):
for j in range(depth.shape[1]):
if depth[i, j] > 0.1:
x = np.array([j, i, 1], dtype=depth.dtype)
points[k] = x * depth[i, j]
k += 1
return points
def depth_to_lidar_points(depth, trunc_pixel, P2, r_rect, velo2cam):
"""Convert depth map to points in lidar coordinate.
Args:
depth (np.array, shape=[H, W]): Depth map which
the row of [0~`trunc_pixel`] are truncated.
trunc_pixel (int): The number of truncated row.
P2 (p.array, shape=[4, 4]): Intrinsics of Camera2.
r_rect (np.ndarray, shape=[4, 4]): Matrix to project points in
specific camera coordinate (e.g. CAM2) to CAM0.
velo2cam (np.ndarray, shape=[4, 4]): Matrix to project points in
camera coordinate to lidar coordinate.
Returns:
np.ndarray: Points in lidar coordinates.
"""
pts = depth_to_points(depth, trunc_pixel)
points_shape = list(pts.shape[0:-1])
points = np.concatenate([pts, np.ones(points_shape + [1])], axis=-1)
points = points @ np.linalg.inv(P2.T)
lidar_points = camera_to_lidar(points, r_rect, velo2cam)
return lidar_points
def center_to_corner_box3d(centers,
dims,
angles=None,
origin=(0.5, 1.0, 0.5),
axis=1):
"""Convert kitti locations, dimensions and angles to corners.
Args:
centers (np.ndarray): Locations in kitti label file with shape (N, 3).
dims (np.ndarray): Dimensions in kitti label file with shape (N, 3).
angles (np.ndarray, optional): Rotation_y in kitti label file with
shape (N). Defaults to None.
origin (list or array or float, optional): Origin point relate to
smallest point. Use (0.5, 1.0, 0.5) in camera and (0.5, 0.5, 0)
in lidar. Defaults to (0.5, 1.0, 0.5).
axis (int, optional): Rotation axis. 1 for camera and 2 for lidar.
Defaults to 1.
Returns:
np.ndarray: Corners with the shape of (N, 8, 3).
"""
# 'length' in kitti format is in x axis.
# yzx(hwl)(kitti label file)<->xyz(lhw)(camera)<->z(-x)(-y)(lwh)(lidar)
# center in kitti format is [0.5, 1.0, 0.5] in xyz.
corners = corners_nd(dims, origin=origin)
# corners: [N, 8, 3]
if angles is not None:
corners = rotation_3d_in_axis(corners, angles, axis=axis)
corners += centers.reshape([-1, 1, 3])
return corners
@numba.jit(nopython=True)
def box2d_to_corner_jit(boxes):
"""Convert box2d to corner.
Args:
boxes (np.ndarray, shape=[N, 5]): Boxes2d with rotation.
Returns:
box_corners (np.ndarray, shape=[N, 4, 2]): Box corners.
"""
num_box = boxes.shape[0]
corners_norm = np.zeros((4, 2), dtype=boxes.dtype)
corners_norm[1, 1] = 1.0
corners_norm[2] = 1.0
corners_norm[3, 0] = 1.0
corners_norm -= np.array([0.5, 0.5], dtype=boxes.dtype)
corners = boxes.reshape(num_box, 1, 5)[:, :, 2:4] * corners_norm.reshape(
1, 4, 2)
rot_mat_T = np.zeros((2, 2), dtype=boxes.dtype)
box_corners = np.zeros((num_box, 4, 2), dtype=boxes.dtype)
for i in range(num_box):
rot_sin = np.sin(boxes[i, -1])
rot_cos = np.cos(boxes[i, -1])
rot_mat_T[0, 0] = rot_cos
rot_mat_T[0, 1] = rot_sin
rot_mat_T[1, 0] = -rot_sin
rot_mat_T[1, 1] = rot_cos
box_corners[i] = corners[i] @ rot_mat_T + boxes[i, :2]
return box_corners
@numba.njit
def corner_to_standup_nd_jit(boxes_corner):
"""Convert boxes_corner to aligned (min-max) boxes.
Args:
boxes_corner (np.ndarray, shape=[N, 2**dim, dim]): Boxes corners.
Returns:
np.ndarray, shape=[N, dim*2]: Aligned (min-max) boxes.
"""
num_boxes = boxes_corner.shape[0]
ndim = boxes_corner.shape[-1]
result = np.zeros((num_boxes, ndim * 2), dtype=boxes_corner.dtype)
for i in range(num_boxes):
for j in range(ndim):
result[i, j] = np.min(boxes_corner[i, :, j])
for j in range(ndim):
result[i, j + ndim] = np.max(boxes_corner[i, :, j])
return result
@numba.jit(nopython=True)
def corner_to_surfaces_3d_jit(corners):
"""Convert 3d box corners from corner function above to surfaces that
normal vectors all direct to internal.
Args:
corners (np.ndarray): 3d box corners with the shape of (N, 8, 3).
Returns:
np.ndarray: Surfaces with the shape of (N, 6, 4, 3).
"""
# box_corners: [N, 8, 3], must from corner functions in this module
num_boxes = corners.shape[0]
surfaces = np.zeros((num_boxes, 6, 4, 3), dtype=corners.dtype)
corner_idxes = np.array([
0, 1, 2, 3, 7, 6, 5, 4, 0, 3, 7, 4, 1, 5, 6, 2, 0, 4, 5, 1, 3, 2, 6, 7
]).reshape(6, 4)
for i in range(num_boxes):
for j in range(6):
for k in range(4):
surfaces[i, j, k] = corners[i, corner_idxes[j, k]]
return surfaces
def rotation_points_single_angle(points, angle, axis=0):
"""Rotate points with a single angle.
Args:
points (np.ndarray, shape=[N, 3]]):
angle (np.ndarray, shape=[1]]):
axis (int, optional): Axis to rotate at. Defaults to 0.
Returns:
np.ndarray: Rotated points.
"""
# points: [N, 3]
rot_sin = np.sin(angle)
rot_cos = np.cos(angle)
if axis == 1:
rot_mat_T = np.array(
[[rot_cos, 0, rot_sin], [0, 1, 0], [-rot_sin, 0, rot_cos]],
dtype=points.dtype)
elif axis == 2 or axis == -1:
rot_mat_T = np.array(
[[rot_cos, rot_sin, 0], [-rot_sin, rot_cos, 0], [0, 0, 1]],
dtype=points.dtype)
elif axis == 0:
rot_mat_T = np.array(
[[1, 0, 0], [0, rot_cos, rot_sin], [0, -rot_sin, rot_cos]],
dtype=points.dtype)
else:
raise ValueError('axis should in range')
return points @ rot_mat_T, rot_mat_T
def box3d_to_bbox(box3d, P2):
"""Convert box3d in camera coordinates to bbox in image coordinates.
Args:
box3d (np.ndarray, shape=[N, 7]): Boxes in camera coordinate.
P2 (np.array, shape=[4, 4]): Intrinsics of Camera2.
Returns:
np.ndarray, shape=[N, 4]: Boxes 2d in image coordinates.
"""
box_corners = center_to_corner_box3d(
box3d[:, :3], box3d[:, 3:6], box3d[:, 6], [0.5, 1.0, 0.5], axis=1)
box_corners_in_image = points_cam2img(box_corners, P2)
# box_corners_in_image: [N, 8, 2]
minxy = np.min(box_corners_in_image, axis=1)
maxxy = np.max(box_corners_in_image, axis=1)
bbox = np.concatenate([minxy, maxxy], axis=1)
return bbox
def corner_to_surfaces_3d(corners):
"""convert 3d box corners from corner function above to surfaces that
normal vectors all direct to internal.
Args:
corners (np.ndarray): 3D box corners with shape of (N, 8, 3).
Returns:
np.ndarray: Surfaces with the shape of (N, 6, 4, 3).
"""
# box_corners: [N, 8, 3], must from corner functions in this module
surfaces = np.array([
[corners[:, 0], corners[:, 1], corners[:, 2], corners[:, 3]],
[corners[:, 7], corners[:, 6], corners[:, 5], corners[:, 4]],
[corners[:, 0], corners[:, 3], corners[:, 7], corners[:, 4]],
[corners[:, 1], corners[:, 5], corners[:, 6], corners[:, 2]],
[corners[:, 0], corners[:, 4], corners[:, 5], corners[:, 1]],
[corners[:, 3], corners[:, 2], corners[:, 6], corners[:, 7]],
]).transpose([2, 0, 1, 3])
return surfaces
def points_in_rbbox(points, rbbox, z_axis=2, origin=(0.5, 0.5, 0)):
"""Check points in rotated bbox and return indices.
Note:
This function is for counterclockwise boxes.
Args:
points (np.ndarray, shape=[N, 3+dim]): Points to query.
rbbox (np.ndarray, shape=[M, 7]): Boxes3d with rotation.
z_axis (int, optional): Indicate which axis is height.
Defaults to 2.
origin (tuple[int], optional): Indicate the position of
box center. Defaults to (0.5, 0.5, 0).
Returns:
np.ndarray, shape=[N, M]: Indices of points in each box.
"""
# TODO: this function is different from PointCloud3D, be careful
# when start to use nuscene, check the input
rbbox_corners = center_to_corner_box3d(
rbbox[:, :3], rbbox[:, 3:6], rbbox[:, 6], origin=origin, axis=z_axis)
surfaces = corner_to_surfaces_3d(rbbox_corners)
indices = points_in_convex_polygon_3d_jit(points[:, :3], surfaces)
return indices
def minmax_to_corner_2d(minmax_box):
"""Convert minmax box to corners2d.
Args:
minmax_box (np.ndarray, shape=[N, dims]): minmax boxes.
Returns:
np.ndarray: 2d corners of boxes
"""
ndim = minmax_box.shape[-1] // 2
center = minmax_box[..., :ndim]
dims = minmax_box[..., ndim:] - center
return center_to_corner_box2d(center, dims, origin=0.0)
def create_anchors_3d_range(feature_size,
anchor_range,
sizes=((3.9, 1.6, 1.56), ),
rotations=(0, np.pi / 2),
dtype=np.float32):
"""Create anchors 3d by range.
Args:
feature_size (list[float] | tuple[float]): Feature map size. It is
either a list of a tuple of [D, H, W](in order of z, y, and x).
anchor_range (torch.Tensor | list[float]): Range of anchors with
shape [6]. The order is consistent with that of anchors, i.e.,
(x_min, y_min, z_min, x_max, y_max, z_max).
sizes (list[list] | np.ndarray | torch.Tensor, optional):
Anchor size with shape [N, 3], in order of x, y, z.
Defaults to ((3.9, 1.6, 1.56), ).
rotations (list[float] | np.ndarray | torch.Tensor, optional):
Rotations of anchors in a single feature grid.
Defaults to (0, np.pi / 2).
dtype (type, optional): Data type. Defaults to np.float32.
Returns:
np.ndarray: Range based anchors with shape of
(*feature_size, num_sizes, num_rots, 7).
"""
anchor_range = np.array(anchor_range, dtype)
z_centers = np.linspace(
anchor_range[2], anchor_range[5], feature_size[0], dtype=dtype)
y_centers = np.linspace(
anchor_range[1], anchor_range[4], feature_size[1], dtype=dtype)
x_centers = np.linspace(
anchor_range[0], anchor_range[3], feature_size[2], dtype=dtype)
sizes = np.reshape(np.array(sizes, dtype=dtype), [-1, 3])
rotations = np.array(rotations, dtype=dtype)
rets = np.meshgrid(
x_centers, y_centers, z_centers, rotations, indexing='ij')
tile_shape = [1] * 5
tile_shape[-2] = int(sizes.shape[0])
for i in range(len(rets)):
rets[i] = np.tile(rets[i][..., np.newaxis, :], tile_shape)
rets[i] = rets[i][..., np.newaxis] # for concat
sizes = np.reshape(sizes, [1, 1, 1, -1, 1, 3])
tile_size_shape = list(rets[0].shape)
tile_size_shape[3] = 1
sizes = np.tile(sizes, tile_size_shape)
rets.insert(3, sizes)
ret = np.concatenate(rets, axis=-1)
return np.transpose(ret, [2, 1, 0, 3, 4, 5])
def center_to_minmax_2d(centers, dims, origin=0.5):
"""Center to minmax.
Args:
centers (np.ndarray): Center points.
dims (np.ndarray): Dimensions.
origin (list or array or float, optional): Origin point relate
to smallest point. Defaults to 0.5.
Returns:
np.ndarray: Minmax points.
"""
if origin == 0.5:
return np.concatenate([centers - dims / 2, centers + dims / 2],
axis=-1)
corners = center_to_corner_box2d(centers, dims, origin=origin)
return corners[:, [0, 2]].reshape([-1, 4])
def rbbox2d_to_near_bbox(rbboxes):
"""convert rotated bbox to nearest 'standing' or 'lying' bbox.
Args:
rbboxes (np.ndarray): Rotated bboxes with shape of
(N, 5(x, y, xdim, ydim, rad)).
Returns:
np.ndarray: Bounding boxes with the shape of
(N, 4(xmin, ymin, xmax, ymax)).
"""
rots = rbboxes[..., -1]
rots_0_pi_div_2 = np.abs(limit_period(rots, 0.5, np.pi))
cond = (rots_0_pi_div_2 > np.pi / 4)[..., np.newaxis]
bboxes_center = np.where(cond, rbboxes[:, [0, 1, 3, 2]], rbboxes[:, :4])
bboxes = center_to_minmax_2d(bboxes_center[:, :2], bboxes_center[:, 2:])
return bboxes
@numba.jit(nopython=True)
def iou_jit(boxes, query_boxes, mode='iou', eps=0.0):
"""Calculate box iou. Note that jit version runs ~10x faster than the
box_overlaps function in mmdet3d.core.evaluation.
Note:
This function is for counterclockwise boxes.
Args:
boxes (np.ndarray): Input bounding boxes with shape of (N, 4).
query_boxes (np.ndarray): Query boxes with shape of (K, 4).
mode (str, optional): IoU mode. Defaults to 'iou'.
eps (float, optional): Value added to denominator. Defaults to 0.
Returns:
np.ndarray: Overlap between boxes and query_boxes
with the shape of [N, K].
"""
N = boxes.shape[0]
K = query_boxes.shape[0]
overlaps = np.zeros((N, K), dtype=boxes.dtype)
for k in range(K):
box_area = ((query_boxes[k, 2] - query_boxes[k, 0] + eps) *
(query_boxes[k, 3] - query_boxes[k, 1] + eps))
for n in range(N):
iw = (
min(boxes[n, 2], query_boxes[k, 2]) -
max(boxes[n, 0], query_boxes[k, 0]) + eps)
if iw > 0:
ih = (
min(boxes[n, 3], query_boxes[k, 3]) -
max(boxes[n, 1], query_boxes[k, 1]) + eps)
if ih > 0:
if mode == 'iou':
ua = ((boxes[n, 2] - boxes[n, 0] + eps) *
(boxes[n, 3] - boxes[n, 1] + eps) + box_area -
iw * ih)
else:
ua = ((boxes[n, 2] - boxes[n, 0] + eps) *
(boxes[n, 3] - boxes[n, 1] + eps))
overlaps[n, k] = iw * ih / ua
return overlaps
def projection_matrix_to_CRT_kitti(proj):
"""Split projection matrix of KITTI.
Note:
This function is for KITTI only.
P = C @ [R|T]
C is upper triangular matrix, so we need to inverse CR and use QR
stable for all kitti camera projection matrix.
Args:
proj (p.array, shape=[4, 4]): Intrinsics of camera.
Returns:
tuple[np.ndarray]: Splited matrix of C, R and T.
"""
CR = proj[0:3, 0:3]
CT = proj[0:3, 3]
RinvCinv = np.linalg.inv(CR)
Rinv, Cinv = np.linalg.qr(RinvCinv)
C = np.linalg.inv(Cinv)
R = np.linalg.inv(Rinv)
T = Cinv @ CT
return C, R, T
def remove_outside_points(points, rect, Trv2c, P2, image_shape):
"""Remove points which are outside of image.
Note:
This function is for KITTI only.
Args:
points (np.ndarray, shape=[N, 3+dims]): Total points.
rect (np.ndarray, shape=[4, 4]): Matrix to project points in
specific camera coordinate (e.g. CAM2) to CAM0.
Trv2c (np.ndarray, shape=[4, 4]): Matrix to project points in
camera coordinate to lidar coordinate.
P2 (p.array, shape=[4, 4]): Intrinsics of Camera2.
image_shape (list[int]): Shape of image.
Returns:
np.ndarray, shape=[N, 3+dims]: Filtered points.
"""
# 5x faster than remove_outside_points_v1(2ms vs 10ms)
C, R, T = projection_matrix_to_CRT_kitti(P2)
image_bbox = [0, 0, image_shape[1], image_shape[0]]
frustum = get_frustum(image_bbox, C)
frustum -= T
frustum = np.linalg.inv(R) @ frustum.T
frustum = camera_to_lidar(frustum.T, rect, Trv2c)
frustum_surfaces = corner_to_surfaces_3d_jit(frustum[np.newaxis, ...])
indices = points_in_convex_polygon_3d_jit(points[:, :3], frustum_surfaces)
points = points[indices.reshape([-1])]
return points
def get_frustum(bbox_image, C, near_clip=0.001, far_clip=100):
"""Get frustum corners in camera coordinates.
Args:
bbox_image (list[int]): box in image coordinates.
C (np.ndarray): Intrinsics.
near_clip (float, optional): Nearest distance of frustum.
Defaults to 0.001.
far_clip (float, optional): Farthest distance of frustum.
Defaults to 100.
Returns:
np.ndarray, shape=[8, 3]: coordinates of frustum corners.
"""
fku = C[0, 0]
fkv = -C[1, 1]
u0v0 = C[0:2, 2]
z_points = np.array(
[near_clip] * 4 + [far_clip] * 4, dtype=C.dtype)[:, np.newaxis]
b = bbox_image
box_corners = np.array(
[[b[0], b[1]], [b[0], b[3]], [b[2], b[3]], [b[2], b[1]]],
dtype=C.dtype)
near_box_corners = (box_corners - u0v0) / np.array(
[fku / near_clip, -fkv / near_clip], dtype=C.dtype)
far_box_corners = (box_corners - u0v0) / np.array(
[fku / far_clip, -fkv / far_clip], dtype=C.dtype)
ret_xy = np.concatenate([near_box_corners, far_box_corners],
axis=0) # [8, 2]
ret_xyz = np.concatenate([ret_xy, z_points], axis=1)
return ret_xyz
def surface_equ_3d(polygon_surfaces):
"""
Args:
polygon_surfaces (np.ndarray): Polygon surfaces with shape of
[num_polygon, max_num_surfaces, max_num_points_of_surface, 3].
All surfaces' normal vector must direct to internal.
Max_num_points_of_surface must at least 3.
Returns:
tuple: normal vector and its direction.
"""
# return [a, b, c], d in ax+by+cz+d=0
# polygon_surfaces: [num_polygon, num_surfaces, num_points_of_polygon, 3]
surface_vec = polygon_surfaces[:, :, :2, :] - \
polygon_surfaces[:, :, 1:3, :]
# normal_vec: [..., 3]
normal_vec = np.cross(surface_vec[:, :, 0, :], surface_vec[:, :, 1, :])
# print(normal_vec.shape, points[..., 0, :].shape)
# d = -np.inner(normal_vec, points[..., 0, :])
d = np.einsum('aij, aij->ai', normal_vec, polygon_surfaces[:, :, 0, :])
return normal_vec, -d
@numba.njit
def _points_in_convex_polygon_3d_jit(points, polygon_surfaces, normal_vec, d,
num_surfaces):
"""
Args:
points (np.ndarray): Input points with shape of (num_points, 3).
polygon_surfaces (np.ndarray): Polygon surfaces with shape of
(num_polygon, max_num_surfaces, max_num_points_of_surface, 3).
All surfaces' normal vector must direct to internal.
Max_num_points_of_surface must at least 3.
normal_vec (np.ndarray): Normal vector of polygon_surfaces.
d (int): Directions of normal vector.
num_surfaces (np.ndarray): Number of surfaces a polygon contains
shape of (num_polygon).
Returns:
np.ndarray: Result matrix with the shape of [num_points, num_polygon].
"""
max_num_surfaces, max_num_points_of_surface = polygon_surfaces.shape[1:3]
num_points = points.shape[0]
num_polygons = polygon_surfaces.shape[0]
ret = np.ones((num_points, num_polygons), dtype=np.bool_)
sign = 0.0
for i in range(num_points):
for j in range(num_polygons):
for k in range(max_num_surfaces):
if k > num_surfaces[j]:
break
sign = (
points[i, 0] * normal_vec[j, k, 0] +
points[i, 1] * normal_vec[j, k, 1] +
points[i, 2] * normal_vec[j, k, 2] + d[j, k])
if sign >= 0:
ret[i, j] = False
break
return ret
def points_in_convex_polygon_3d_jit(points,
polygon_surfaces,
num_surfaces=None):
"""Check points is in 3d convex polygons.
Args:
points (np.ndarray): Input points with shape of (num_points, 3).
polygon_surfaces (np.ndarray): Polygon surfaces with shape of
(num_polygon, max_num_surfaces, max_num_points_of_surface, 3).
All surfaces' normal vector must direct to internal.
Max_num_points_of_surface must at least 3.
num_surfaces (np.ndarray, optional): Number of surfaces a polygon
contains shape of (num_polygon). Defaults to None.
Returns:
np.ndarray: Result matrix with the shape of [num_points, num_polygon].
"""
max_num_surfaces, max_num_points_of_surface = polygon_surfaces.shape[1:3]
# num_points = points.shape[0]
num_polygons = polygon_surfaces.shape[0]
if num_surfaces is None:
num_surfaces = np.full((num_polygons, ), 9999999, dtype=np.int64)
normal_vec, d = surface_equ_3d(polygon_surfaces[:, :, :3, :])
# normal_vec: [num_polygon, max_num_surfaces, 3]
# d: [num_polygon, max_num_surfaces]
return _points_in_convex_polygon_3d_jit(points, polygon_surfaces,
normal_vec, d, num_surfaces)
@numba.njit
def points_in_convex_polygon_jit(points, polygon, clockwise=False):
"""Check points is in 2d convex polygons. True when point in polygon.
Args:
points (np.ndarray): Input points with the shape of [num_points, 2].
polygon (np.ndarray): Input polygon with the shape of
[num_polygon, num_points_of_polygon, 2].
clockwise (bool, optional): Indicate polygon is clockwise. Defaults
to True.
Returns:
np.ndarray: Result matrix with the shape of [num_points, num_polygon].
"""
# first convert polygon to directed lines
num_points_of_polygon = polygon.shape[1]
num_points = points.shape[0]
num_polygons = polygon.shape[0]
# vec for all the polygons
if clockwise:
vec1 = polygon - polygon[:,
np.array([num_points_of_polygon - 1] + list(
range(num_points_of_polygon - 1))), :]
else:
vec1 = polygon[:,
np.array([num_points_of_polygon - 1] +
list(range(num_points_of_polygon -
1))), :] - polygon
ret = np.zeros((num_points, num_polygons), dtype=np.bool_)
success = True
cross = 0.0
for i in range(num_points):
for j in range(num_polygons):
success = True
for k in range(num_points_of_polygon):
vec = vec1[j, k]
cross = vec[1] * (polygon[j, k, 0] - points[i, 0])
cross -= vec[0] * (polygon[j, k, 1] - points[i, 1])
if cross >= 0:
success = False
break
ret[i, j] = success
return ret
def boxes3d_to_corners3d_lidar(boxes3d, bottom_center=True):
"""Convert kitti center boxes to corners.
7 -------- 4
/| /|
6 -------- 5 .
| | | |
. 3 -------- 0
|/ |/
2 -------- 1
Note:
This function is for LiDAR boxes only.
Args:
boxes3d (np.ndarray): Boxes with shape of (N, 7)
[x, y, z, x_size, y_size, z_size, ry] in LiDAR coords,
see the definition of ry in KITTI dataset.
bottom_center (bool, optional): Whether z is on the bottom center
of object. Defaults to True.
Returns:
np.ndarray: Box corners with the shape of [N, 8, 3].
"""
boxes_num = boxes3d.shape[0]
x_size, y_size, z_size = boxes3d[:, 3], boxes3d[:, 4], boxes3d[:, 5]
x_corners = np.array([
x_size / 2., -x_size / 2., -x_size / 2., x_size / 2., x_size / 2.,
-x_size / 2., -x_size / 2., x_size / 2.
],
dtype=np.float32).T
y_corners = np.array([
-y_size / 2., -y_size / 2., y_size / 2., y_size / 2., -y_size / 2.,
-y_size / 2., y_size / 2., y_size / 2.
],
dtype=np.float32).T
if bottom_center:
z_corners = np.zeros((boxes_num, 8), dtype=np.float32)
z_corners[:, 4:8] = z_size.reshape(boxes_num, 1).repeat(
4, axis=1) # (N, 8)
else:
z_corners = np.array([
-z_size / 2., -z_size / 2., -z_size / 2., -z_size / 2.,
z_size / 2., z_size / 2., z_size / 2., z_size / 2.
],
dtype=np.float32).T
ry = boxes3d[:, 6]
zeros, ones = np.zeros(
ry.size, dtype=np.float32), np.ones(
ry.size, dtype=np.float32)
rot_list = np.array([[np.cos(ry), np.sin(ry), zeros],
[-np.sin(ry), np.cos(ry), zeros],
[zeros, zeros, ones]]) # (3, 3, N)
R_list = np.transpose(rot_list, (2, 0, 1)) # (N, 3, 3)
temp_corners = np.concatenate((x_corners.reshape(
-1, 8, 1), y_corners.reshape(-1, 8, 1), z_corners.reshape(-1, 8, 1)),
axis=2) # (N, 8, 3)
rotated_corners = np.matmul(temp_corners, R_list) # (N, 8, 3)
x_corners = rotated_corners[:, :, 0]
y_corners = rotated_corners[:, :, 1]
z_corners = rotated_corners[:, :, 2]
x_loc, y_loc, z_loc = boxes3d[:, 0], boxes3d[:, 1], boxes3d[:, 2]
x = x_loc.reshape(-1, 1) + x_corners.reshape(-1, 8)
y = y_loc.reshape(-1, 1) + y_corners.reshape(-1, 8)
z = z_loc.reshape(-1, 1) + z_corners.reshape(-1, 8)
corners = np.concatenate(
(x.reshape(-1, 8, 1), y.reshape(-1, 8, 1), z.reshape(-1, 8, 1)),
axis=2)
return corners.astype(np.float32)
# Copyright (c) OpenMMLab. All rights reserved.
from mmdet.core.bbox import build_bbox_coder
from .anchor_free_bbox_coder import AnchorFreeBBoxCoder
from .centerpoint_bbox_coders import CenterPointBBoxCoder
from .delta_xyzwhlr_bbox_coder import DeltaXYZWLHRBBoxCoder
from .fcos3d_bbox_coder import FCOS3DBBoxCoder
from .groupfree3d_bbox_coder import GroupFree3DBBoxCoder
from .monoflex_bbox_coder import MonoFlexCoder
from .partial_bin_based_bbox_coder import PartialBinBasedBBoxCoder
from .pgd_bbox_coder import PGDBBoxCoder
from .point_xyzwhlr_bbox_coder import PointXYZWHLRBBoxCoder
from .smoke_bbox_coder import SMOKECoder
__all__ = [
'build_bbox_coder', 'DeltaXYZWLHRBBoxCoder', 'PartialBinBasedBBoxCoder',
'CenterPointBBoxCoder', 'AnchorFreeBBoxCoder', 'GroupFree3DBBoxCoder',
'PointXYZWHLRBBoxCoder', 'FCOS3DBBoxCoder', 'PGDBBoxCoder', 'SMOKECoder',
'MonoFlexCoder'
]
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from mmdet.core.bbox.builder import BBOX_CODERS
from .partial_bin_based_bbox_coder import PartialBinBasedBBoxCoder
@BBOX_CODERS.register_module()
class AnchorFreeBBoxCoder(PartialBinBasedBBoxCoder):
"""Anchor free bbox coder for 3D boxes.
Args:
num_dir_bins (int): Number of bins to encode direction angle.
with_rot (bool): Whether the bbox is with rotation.
"""
def __init__(self, num_dir_bins, with_rot=True):
super(AnchorFreeBBoxCoder, self).__init__(
num_dir_bins, 0, [], with_rot=with_rot)
self.num_dir_bins = num_dir_bins
self.with_rot = with_rot
def encode(self, gt_bboxes_3d, gt_labels_3d):
"""Encode ground truth to prediction targets.
Args:
gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes
with shape (n, 7).
gt_labels_3d (torch.Tensor): Ground truth classes.
Returns:
tuple: Targets of center, size and direction.
"""
# generate center target
center_target = gt_bboxes_3d.gravity_center
# generate bbox size target
size_res_target = gt_bboxes_3d.dims / 2
# generate dir target
box_num = gt_labels_3d.shape[0]
if self.with_rot:
(dir_class_target,
dir_res_target) = self.angle2class(gt_bboxes_3d.yaw)
dir_res_target /= (2 * np.pi / self.num_dir_bins)
else:
dir_class_target = gt_labels_3d.new_zeros(box_num)
dir_res_target = gt_bboxes_3d.tensor.new_zeros(box_num)
return (center_target, size_res_target, dir_class_target,
dir_res_target)
def decode(self, bbox_out):
"""Decode predicted parts to bbox3d.
Args:
bbox_out (dict): Predictions from model, should contain keys below.
- center: predicted bottom center of bboxes.
- dir_class: predicted bbox direction class.
- dir_res: predicted bbox direction residual.
- size: predicted bbox size.
Returns:
torch.Tensor: Decoded bbox3d with shape (batch, n, 7).
"""
center = bbox_out['center']
batch_size, num_proposal = center.shape[:2]
# decode heading angle
if self.with_rot:
dir_class = torch.argmax(bbox_out['dir_class'], -1)
dir_res = torch.gather(bbox_out['dir_res'], 2,
dir_class.unsqueeze(-1))
dir_res.squeeze_(2)
dir_angle = self.class2angle(dir_class, dir_res).reshape(
batch_size, num_proposal, 1)
else:
dir_angle = center.new_zeros(batch_size, num_proposal, 1)
# decode bbox size
bbox_size = torch.clamp(bbox_out['size'] * 2, min=0.1)
bbox3d = torch.cat([center, bbox_size, dir_angle], dim=-1)
return bbox3d
def split_pred(self, cls_preds, reg_preds, base_xyz):
"""Split predicted features to specific parts.
Args:
cls_preds (torch.Tensor): Class predicted features to split.
reg_preds (torch.Tensor): Regression predicted features to split.
base_xyz (torch.Tensor): Coordinates of points.
Returns:
dict[str, torch.Tensor]: Split results.
"""
results = {}
results['obj_scores'] = cls_preds
start, end = 0, 0
reg_preds_trans = reg_preds.transpose(2, 1)
# decode center
end += 3
# (batch_size, num_proposal, 3)
results['center_offset'] = reg_preds_trans[..., start:end]
results['center'] = base_xyz.detach() + reg_preds_trans[..., start:end]
start = end
# decode center
end += 3
# (batch_size, num_proposal, 3)
results['size'] = reg_preds_trans[..., start:end]
start = end
# decode direction
end += self.num_dir_bins
results['dir_class'] = reg_preds_trans[..., start:end]
start = end
end += self.num_dir_bins
dir_res_norm = reg_preds_trans[..., start:end]
start = end
results['dir_res_norm'] = dir_res_norm
results['dir_res'] = dir_res_norm * (2 * np.pi / self.num_dir_bins)
return results
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
@BBOX_CODERS.register_module()
class CenterPointBBoxCoder(BaseBBoxCoder):
"""Bbox coder for CenterPoint.
Args:
pc_range (list[float]): Range of point cloud.
out_size_factor (int): Downsample factor of the model.
voxel_size (list[float]): Size of voxel.
post_center_range (list[float], optional): Limit of the center.
Default: None.
max_num (int, optional): Max number to be kept. Default: 100.
score_threshold (float, optional): Threshold to filter boxes
based on score. Default: None.
code_size (int, optional): Code size of bboxes. Default: 9
"""
def __init__(self,
pc_range,
out_size_factor,
voxel_size,
post_center_range=None,
max_num=100,
score_threshold=None,
code_size=9):
self.pc_range = pc_range
self.out_size_factor = out_size_factor
self.voxel_size = voxel_size
self.post_center_range = post_center_range
self.max_num = max_num
self.score_threshold = score_threshold
self.code_size = code_size
def _gather_feat(self, feats, inds, feat_masks=None):
"""Given feats and indexes, returns the gathered feats.
Args:
feats (torch.Tensor): Features to be transposed and gathered
with the shape of [B, 2, W, H].
inds (torch.Tensor): Indexes with the shape of [B, N].
feat_masks (torch.Tensor, optional): Mask of the feats.
Default: None.
Returns:
torch.Tensor: Gathered feats.
"""
dim = feats.size(2)
inds = inds.unsqueeze(2).expand(inds.size(0), inds.size(1), dim)
feats = feats.gather(1, inds)
if feat_masks is not None:
feat_masks = feat_masks.unsqueeze(2).expand_as(feats)
feats = feats[feat_masks]
feats = feats.view(-1, dim)
return feats
def _topk(self, scores, K=80):
"""Get indexes based on scores.
Args:
scores (torch.Tensor): scores with the shape of [B, N, W, H].
K (int, optional): Number to be kept. Defaults to 80.
Returns:
tuple[torch.Tensor]
torch.Tensor: Selected scores with the shape of [B, K].
torch.Tensor: Selected indexes with the shape of [B, K].
torch.Tensor: Selected classes with the shape of [B, K].
torch.Tensor: Selected y coord with the shape of [B, K].
torch.Tensor: Selected x coord with the shape of [B, K].
"""
batch, cat, height, width = scores.size()
topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K)
topk_inds = topk_inds % (height * width)
topk_ys = (topk_inds.float() /
torch.tensor(width, dtype=torch.float)).int().float()
topk_xs = (topk_inds % width).int().float()
topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K)
topk_clses = (topk_ind / torch.tensor(K, dtype=torch.float)).int()
topk_inds = self._gather_feat(topk_inds.view(batch, -1, 1),
topk_ind).view(batch, K)
topk_ys = self._gather_feat(topk_ys.view(batch, -1, 1),
topk_ind).view(batch, K)
topk_xs = self._gather_feat(topk_xs.view(batch, -1, 1),
topk_ind).view(batch, K)
return topk_score, topk_inds, topk_clses, topk_ys, topk_xs
def _transpose_and_gather_feat(self, feat, ind):
"""Given feats and indexes, returns the transposed and gathered feats.
Args:
feat (torch.Tensor): Features to be transposed and gathered
with the shape of [B, 2, W, H].
ind (torch.Tensor): Indexes with the shape of [B, N].
Returns:
torch.Tensor: Transposed and gathered feats.
"""
feat = feat.permute(0, 2, 3, 1).contiguous()
feat = feat.view(feat.size(0), -1, feat.size(3))
feat = self._gather_feat(feat, ind)
return feat
def encode(self):
pass
def decode(self,
heat,
rot_sine,
rot_cosine,
hei,
dim,
vel,
reg=None,
task_id=-1):
"""Decode bboxes.
Args:
heat (torch.Tensor): Heatmap with the shape of [B, N, W, H].
rot_sine (torch.Tensor): Sine of rotation with the shape of
[B, 1, W, H].
rot_cosine (torch.Tensor): Cosine of rotation with the shape of
[B, 1, W, H].
hei (torch.Tensor): Height of the boxes with the shape
of [B, 1, W, H].
dim (torch.Tensor): Dim of the boxes with the shape of
[B, 1, W, H].
vel (torch.Tensor): Velocity with the shape of [B, 1, W, H].
reg (torch.Tensor, optional): Regression value of the boxes in
2D with the shape of [B, 2, W, H]. Default: None.
task_id (int, optional): Index of task. Default: -1.
Returns:
list[dict]: Decoded boxes.
"""
batch, cat, _, _ = heat.size()
scores, inds, clses, ys, xs = self._topk(heat, K=self.max_num)
if reg is not None:
reg = self._transpose_and_gather_feat(reg, inds)
reg = reg.view(batch, self.max_num, 2)
xs = xs.view(batch, self.max_num, 1) + reg[:, :, 0:1]
ys = ys.view(batch, self.max_num, 1) + reg[:, :, 1:2]
else:
xs = xs.view(batch, self.max_num, 1) + 0.5
ys = ys.view(batch, self.max_num, 1) + 0.5
# rotation value and direction label
rot_sine = self._transpose_and_gather_feat(rot_sine, inds)
rot_sine = rot_sine.view(batch, self.max_num, 1)
rot_cosine = self._transpose_and_gather_feat(rot_cosine, inds)
rot_cosine = rot_cosine.view(batch, self.max_num, 1)
rot = torch.atan2(rot_sine, rot_cosine)
# height in the bev
hei = self._transpose_and_gather_feat(hei, inds)
hei = hei.view(batch, self.max_num, 1)
# dim of the box
dim = self._transpose_and_gather_feat(dim, inds)
dim = dim.view(batch, self.max_num, 3)
# class label
clses = clses.view(batch, self.max_num).float()
scores = scores.view(batch, self.max_num)
xs = xs.view(
batch, self.max_num,
1) * self.out_size_factor * self.voxel_size[0] + self.pc_range[0]
ys = ys.view(
batch, self.max_num,
1) * self.out_size_factor * self.voxel_size[1] + self.pc_range[1]
if vel is None: # KITTI FORMAT
final_box_preds = torch.cat([xs, ys, hei, dim, rot], dim=2)
else: # exist velocity, nuscene format
vel = self._transpose_and_gather_feat(vel, inds)
vel = vel.view(batch, self.max_num, 2)
final_box_preds = torch.cat([xs, ys, hei, dim, rot, vel], dim=2)
final_scores = scores
final_preds = clses
# use score threshold
if self.score_threshold is not None:
thresh_mask = final_scores > self.score_threshold
if self.post_center_range is not None:
self.post_center_range = torch.tensor(
self.post_center_range, device=heat.device)
mask = (final_box_preds[..., :3] >=
self.post_center_range[:3]).all(2)
mask &= (final_box_preds[..., :3] <=
self.post_center_range[3:]).all(2)
predictions_dicts = []
for i in range(batch):
cmask = mask[i, :]
if self.score_threshold:
cmask &= thresh_mask[i]
boxes3d = final_box_preds[i, cmask]
scores = final_scores[i, cmask]
labels = final_preds[i, cmask]
predictions_dict = {
'bboxes': boxes3d,
'scores': scores,
'labels': labels
}
predictions_dicts.append(predictions_dict)
else:
raise NotImplementedError(
'Need to reorganize output as a batch, only '
'support post_center_range is not None for now!')
return predictions_dicts
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
@BBOX_CODERS.register_module()
class DeltaXYZWLHRBBoxCoder(BaseBBoxCoder):
"""Bbox Coder for 3D boxes.
Args:
code_size (int): The dimension of boxes to be encoded.
"""
def __init__(self, code_size=7):
super(DeltaXYZWLHRBBoxCoder, self).__init__()
self.code_size = code_size
@staticmethod
def encode(src_boxes, dst_boxes):
"""Get box regression transformation deltas (dx, dy, dz, dx_size,
dy_size, dz_size, dr, dv*) that can be used to transform the
`src_boxes` into the `target_boxes`.
Args:
src_boxes (torch.Tensor): source boxes, e.g., object proposals.
dst_boxes (torch.Tensor): target of the transformation, e.g.,
ground-truth boxes.
Returns:
torch.Tensor: Box transformation deltas.
"""
box_ndim = src_boxes.shape[-1]
cas, cgs, cts = [], [], []
if box_ndim > 7:
xa, ya, za, wa, la, ha, ra, *cas = torch.split(
src_boxes, 1, dim=-1)
xg, yg, zg, wg, lg, hg, rg, *cgs = torch.split(
dst_boxes, 1, dim=-1)
cts = [g - a for g, a in zip(cgs, cas)]
else:
xa, ya, za, wa, la, ha, ra = torch.split(src_boxes, 1, dim=-1)
xg, yg, zg, wg, lg, hg, rg = torch.split(dst_boxes, 1, dim=-1)
za = za + ha / 2
zg = zg + hg / 2
diagonal = torch.sqrt(la**2 + wa**2)
xt = (xg - xa) / diagonal
yt = (yg - ya) / diagonal
zt = (zg - za) / ha
lt = torch.log(lg / la)
wt = torch.log(wg / wa)
ht = torch.log(hg / ha)
rt = rg - ra
return torch.cat([xt, yt, zt, wt, lt, ht, rt, *cts], dim=-1)
@staticmethod
def decode(anchors, deltas):
"""Apply transformation `deltas` (dx, dy, dz, dx_size, dy_size,
dz_size, dr, dv*) to `boxes`.
Args:
anchors (torch.Tensor): Parameters of anchors with shape (N, 7).
deltas (torch.Tensor): Encoded boxes with shape
(N, 7+n) [x, y, z, x_size, y_size, z_size, r, velo*].
Returns:
torch.Tensor: Decoded boxes.
"""
cas, cts = [], []
box_ndim = anchors.shape[-1]
if box_ndim > 7:
xa, ya, za, wa, la, ha, ra, *cas = torch.split(anchors, 1, dim=-1)
xt, yt, zt, wt, lt, ht, rt, *cts = torch.split(deltas, 1, dim=-1)
else:
xa, ya, za, wa, la, ha, ra = torch.split(anchors, 1, dim=-1)
xt, yt, zt, wt, lt, ht, rt = torch.split(deltas, 1, dim=-1)
za = za + ha / 2
diagonal = torch.sqrt(la**2 + wa**2)
xg = xt * diagonal + xa
yg = yt * diagonal + ya
zg = zt * ha + za
lg = torch.exp(lt) * la
wg = torch.exp(wt) * wa
hg = torch.exp(ht) * ha
rg = rt + ra
zg = zg - hg / 2
cgs = [t + a for t, a in zip(cts, cas)]
return torch.cat([xg, yg, zg, wg, lg, hg, rg, *cgs], dim=-1)
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
from ..structures import limit_period
@BBOX_CODERS.register_module()
class FCOS3DBBoxCoder(BaseBBoxCoder):
"""Bounding box coder for FCOS3D.
Args:
base_depths (tuple[tuple[float]]): Depth references for decode box
depth. Defaults to None.
base_dims (tuple[tuple[float]]): Dimension references for decode box
dimension. Defaults to None.
code_size (int): The dimension of boxes to be encoded. Defaults to 7.
norm_on_bbox (bool): Whether to apply normalization on the bounding
box 2D attributes. Defaults to True.
"""
def __init__(self,
base_depths=None,
base_dims=None,
code_size=7,
norm_on_bbox=True):
super(FCOS3DBBoxCoder, self).__init__()
self.base_depths = base_depths
self.base_dims = base_dims
self.bbox_code_size = code_size
self.norm_on_bbox = norm_on_bbox
def encode(self, gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels):
# TODO: refactor the encoder in the FCOS3D and PGD head
pass
def decode(self, bbox, scale, stride, training, cls_score=None):
"""Decode regressed results into 3D predictions.
Note that offsets are not transformed to the projected 3D centers.
Args:
bbox (torch.Tensor): Raw bounding box predictions in shape
[N, C, H, W].
scale (tuple[`Scale`]): Learnable scale parameters.
stride (int): Stride for a specific feature level.
training (bool): Whether the decoding is in the training
procedure.
cls_score (torch.Tensor): Classification score map for deciding
which base depth or dim is used. Defaults to None.
Returns:
torch.Tensor: Decoded boxes.
"""
# scale the bbox of different level
# only apply to offset, depth and size prediction
scale_offset, scale_depth, scale_size = scale[0:3]
clone_bbox = bbox.clone()
bbox[:, :2] = scale_offset(clone_bbox[:, :2]).float()
bbox[:, 2] = scale_depth(clone_bbox[:, 2]).float()
bbox[:, 3:6] = scale_size(clone_bbox[:, 3:6]).float()
if self.base_depths is None:
bbox[:, 2] = bbox[:, 2].exp()
elif len(self.base_depths) == 1: # only single prior
mean = self.base_depths[0][0]
std = self.base_depths[0][1]
bbox[:, 2] = mean + bbox.clone()[:, 2] * std
else: # multi-class priors
assert len(self.base_depths) == cls_score.shape[1], \
'The number of multi-class depth priors should be equal to ' \
'the number of categories.'
indices = cls_score.max(dim=1)[1]
depth_priors = cls_score.new_tensor(
self.base_depths)[indices, :].permute(0, 3, 1, 2)
mean = depth_priors[:, 0]
std = depth_priors[:, 1]
bbox[:, 2] = mean + bbox.clone()[:, 2] * std
bbox[:, 3:6] = bbox[:, 3:6].exp()
if self.base_dims is not None:
assert len(self.base_dims) == cls_score.shape[1], \
'The number of anchor sizes should be equal to the number ' \
'of categories.'
indices = cls_score.max(dim=1)[1]
size_priors = cls_score.new_tensor(
self.base_dims)[indices, :].permute(0, 3, 1, 2)
bbox[:, 3:6] = size_priors * bbox.clone()[:, 3:6]
assert self.norm_on_bbox is True, 'Setting norm_on_bbox to False '\
'has not been thoroughly tested for FCOS3D.'
if self.norm_on_bbox:
if not training:
# Note that this line is conducted only when testing
bbox[:, :2] *= stride
return bbox
@staticmethod
def decode_yaw(bbox, centers2d, dir_cls, dir_offset, cam2img):
"""Decode yaw angle and change it from local to global.i.
Args:
bbox (torch.Tensor): Bounding box predictions in shape
[N, C] with yaws to be decoded.
centers2d (torch.Tensor): Projected 3D-center on the image planes
corresponding to the box predictions.
dir_cls (torch.Tensor): Predicted direction classes.
dir_offset (float): Direction offset before dividing all the
directions into several classes.
cam2img (torch.Tensor): Camera intrinsic matrix in shape [4, 4].
Returns:
torch.Tensor: Bounding boxes with decoded yaws.
"""
if bbox.shape[0] > 0:
dir_rot = limit_period(bbox[..., 6] - dir_offset, 0, np.pi)
bbox[..., 6] = \
dir_rot + dir_offset + np.pi * dir_cls.to(bbox.dtype)
bbox[:, 6] = torch.atan2(centers2d[:, 0] - cam2img[0, 2],
cam2img[0, 0]) + bbox[:, 6]
return bbox
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from mmdet.core.bbox.builder import BBOX_CODERS
from .partial_bin_based_bbox_coder import PartialBinBasedBBoxCoder
@BBOX_CODERS.register_module()
class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
"""Modified partial bin based bbox coder for GroupFree3D.
Args:
num_dir_bins (int): Number of bins to encode direction angle.
num_sizes (int): Number of size clusters.
mean_sizes (list[list[int]]): Mean size of bboxes in each class.
with_rot (bool, optional): Whether the bbox is with rotation.
Defaults to True.
size_cls_agnostic (bool, optional): Whether the predicted size is
class-agnostic. Defaults to True.
"""
def __init__(self,
num_dir_bins,
num_sizes,
mean_sizes,
with_rot=True,
size_cls_agnostic=True):
super(GroupFree3DBBoxCoder, self).__init__(
num_dir_bins=num_dir_bins,
num_sizes=num_sizes,
mean_sizes=mean_sizes,
with_rot=with_rot)
self.size_cls_agnostic = size_cls_agnostic
def encode(self, gt_bboxes_3d, gt_labels_3d):
"""Encode ground truth to prediction targets.
Args:
gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes
with shape (n, 7).
gt_labels_3d (torch.Tensor): Ground truth classes.
Returns:
tuple: Targets of center, size and direction.
"""
# generate center target
center_target = gt_bboxes_3d.gravity_center
# generate bbox size target
size_target = gt_bboxes_3d.dims
size_class_target = gt_labels_3d
size_res_target = gt_bboxes_3d.dims - gt_bboxes_3d.tensor.new_tensor(
self.mean_sizes)[size_class_target]
# generate dir target
box_num = gt_labels_3d.shape[0]
if self.with_rot:
(dir_class_target,
dir_res_target) = self.angle2class(gt_bboxes_3d.yaw)
else:
dir_class_target = gt_labels_3d.new_zeros(box_num)
dir_res_target = gt_bboxes_3d.tensor.new_zeros(box_num)
return (center_target, size_target, size_class_target, size_res_target,
dir_class_target, dir_res_target)
def decode(self, bbox_out, prefix=''):
"""Decode predicted parts to bbox3d.
Args:
bbox_out (dict): Predictions from model, should contain keys below.
- center: predicted bottom center of bboxes.
- dir_class: predicted bbox direction class.
- dir_res: predicted bbox direction residual.
- size_class: predicted bbox size class.
- size_res: predicted bbox size residual.
- size: predicted class-agnostic bbox size
prefix (str, optional): Decode predictions with specific prefix.
Defaults to ''.
Returns:
torch.Tensor: Decoded bbox3d with shape (batch, n, 7).
"""
center = bbox_out[f'{prefix}center']
batch_size, num_proposal = center.shape[:2]
# decode heading angle
if self.with_rot:
dir_class = torch.argmax(bbox_out[f'{prefix}dir_class'], -1)
dir_res = torch.gather(bbox_out[f'{prefix}dir_res'], 2,
dir_class.unsqueeze(-1))
dir_res.squeeze_(2)
dir_angle = self.class2angle(dir_class, dir_res).reshape(
batch_size, num_proposal, 1)
else:
dir_angle = center.new_zeros(batch_size, num_proposal, 1)
# decode bbox size
if self.size_cls_agnostic:
bbox_size = bbox_out[f'{prefix}size'].reshape(
batch_size, num_proposal, 3)
else:
size_class = torch.argmax(
bbox_out[f'{prefix}size_class'], -1, keepdim=True)
size_res = torch.gather(
bbox_out[f'{prefix}size_res'], 2,
size_class.unsqueeze(-1).repeat(1, 1, 1, 3))
mean_sizes = center.new_tensor(self.mean_sizes)
size_base = torch.index_select(mean_sizes, 0,
size_class.reshape(-1))
bbox_size = size_base.reshape(batch_size, num_proposal,
-1) + size_res.squeeze(2)
bbox3d = torch.cat([center, bbox_size, dir_angle], dim=-1)
return bbox3d
def split_pred(self, cls_preds, reg_preds, base_xyz, prefix=''):
"""Split predicted features to specific parts.
Args:
cls_preds (torch.Tensor): Class predicted features to split.
reg_preds (torch.Tensor): Regression predicted features to split.
base_xyz (torch.Tensor): Coordinates of points.
prefix (str, optional): Decode predictions with specific prefix.
Defaults to ''.
Returns:
dict[str, torch.Tensor]: Split results.
"""
results = {}
start, end = 0, 0
cls_preds_trans = cls_preds.transpose(2, 1)
reg_preds_trans = reg_preds.transpose(2, 1)
# decode center
end += 3
# (batch_size, num_proposal, 3)
results[f'{prefix}center_residual'] = \
reg_preds_trans[..., start:end].contiguous()
results[f'{prefix}center'] = base_xyz + \
reg_preds_trans[..., start:end].contiguous()
start = end
# decode direction
end += self.num_dir_bins
results[f'{prefix}dir_class'] = \
reg_preds_trans[..., start:end].contiguous()
start = end
end += self.num_dir_bins
dir_res_norm = reg_preds_trans[..., start:end].contiguous()
start = end
results[f'{prefix}dir_res_norm'] = dir_res_norm
results[f'{prefix}dir_res'] = dir_res_norm * (
np.pi / self.num_dir_bins)
# decode size
if self.size_cls_agnostic:
end += 3
results[f'{prefix}size'] = \
reg_preds_trans[..., start:end].contiguous()
else:
end += self.num_sizes
results[f'{prefix}size_class'] = reg_preds_trans[
..., start:end].contiguous()
start = end
end += self.num_sizes * 3
size_res_norm = reg_preds_trans[..., start:end]
batch_size, num_proposal = reg_preds_trans.shape[:2]
size_res_norm = size_res_norm.view(
[batch_size, num_proposal, self.num_sizes, 3])
start = end
results[f'{prefix}size_res_norm'] = size_res_norm.contiguous()
mean_sizes = reg_preds.new_tensor(self.mean_sizes)
results[f'{prefix}size_res'] = (
size_res_norm * mean_sizes.unsqueeze(0).unsqueeze(0))
# decode objectness score
# Group-Free-3D objectness output shape (batch, proposal, 1)
results[f'{prefix}obj_scores'] = cls_preds_trans[..., :1].contiguous()
# decode semantic score
results[f'{prefix}sem_scores'] = cls_preds_trans[..., 1:].contiguous()
return results
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from torch.nn import functional as F
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
@BBOX_CODERS.register_module()
class MonoFlexCoder(BaseBBoxCoder):
"""Bbox Coder for MonoFlex.
Args:
depth_mode (str): The mode for depth calculation.
Available options are "linear", "inv_sigmoid", and "exp".
base_depth (tuple[float]): References for decoding box depth.
depth_range (list): Depth range of predicted depth.
combine_depth (bool): Whether to use combined depth (direct depth
and depth from keypoints) or use direct depth only.
uncertainty_range (list): Uncertainty range of predicted depth.
base_dims (tuple[tuple[float]]): Dimensions mean and std of decode bbox
dimensions [l, h, w] for each category.
dims_mode (str): The mode for dimension calculation.
Available options are "linear" and "exp".
multibin (bool): Whether to use multibin representation.
num_dir_bins (int): Number of Number of bins to encode
direction angle.
bin_centers (list[float]): Local yaw centers while using multibin
representations.
bin_margin (float): Margin of multibin representations.
code_size (int): The dimension of boxes to be encoded.
eps (float, optional): A value added to the denominator for numerical
stability. Default 1e-3.
"""
def __init__(self,
depth_mode,
base_depth,
depth_range,
combine_depth,
uncertainty_range,
base_dims,
dims_mode,
multibin,
num_dir_bins,
bin_centers,
bin_margin,
code_size,
eps=1e-3):
super(MonoFlexCoder, self).__init__()
# depth related
self.depth_mode = depth_mode
self.base_depth = base_depth
self.depth_range = depth_range
self.combine_depth = combine_depth
self.uncertainty_range = uncertainty_range
# dimensions related
self.base_dims = base_dims
self.dims_mode = dims_mode
# orientation related
self.multibin = multibin
self.num_dir_bins = num_dir_bins
self.bin_centers = bin_centers
self.bin_margin = bin_margin
# output related
self.bbox_code_size = code_size
self.eps = eps
def encode(self, gt_bboxes_3d):
"""Encode ground truth to prediction targets.
Args:
gt_bboxes_3d (`BaseInstance3DBoxes`): Ground truth 3D bboxes.
shape: (N, 7).
Returns:
torch.Tensor: Targets of orientations.
"""
local_yaw = gt_bboxes_3d.local_yaw
# encode local yaw (-pi ~ pi) to multibin format
encode_local_yaw = local_yaw.new_zeros(
[local_yaw.shape[0], self.num_dir_bins * 2])
bin_size = 2 * np.pi / self.num_dir_bins
margin_size = bin_size * self.bin_margin
bin_centers = local_yaw.new_tensor(self.bin_centers)
range_size = bin_size / 2 + margin_size
offsets = local_yaw.unsqueeze(1) - bin_centers.unsqueeze(0)
offsets[offsets > np.pi] = offsets[offsets > np.pi] - 2 * np.pi
offsets[offsets < -np.pi] = offsets[offsets < -np.pi] + 2 * np.pi
for i in range(self.num_dir_bins):
offset = offsets[:, i]
inds = abs(offset) < range_size
encode_local_yaw[inds, i] = 1
encode_local_yaw[inds, i + self.num_dir_bins] = offset[inds]
orientation_target = encode_local_yaw
return orientation_target
def decode(self, bbox, base_centers2d, labels, downsample_ratio, cam2imgs):
"""Decode bounding box regression into 3D predictions.
Args:
bbox (Tensor): Raw bounding box predictions for each
predict center2d point.
shape: (N, C)
base_centers2d (torch.Tensor): Base centers2d for 3D bboxes.
shape: (N, 2).
labels (Tensor): Batch predict class label for each predict
center2d point.
shape: (N, )
downsample_ratio (int): The stride of feature map.
cam2imgs (Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
Return:
dict: The 3D prediction dict decoded from regression map.
the dict has components below:
- bboxes2d (torch.Tensor): Decoded [x1, y1, x2, y2] format
2D bboxes.
- dimensions (torch.Tensor): Decoded dimensions for each
object.
- offsets2d (torch.Tenosr): Offsets between base centers2d
and real centers2d.
- direct_depth (torch.Tensor): Decoded directly regressed
depth.
- keypoints2d (torch.Tensor): Keypoints of each projected
3D box on image.
- keypoints_depth (torch.Tensor): Decoded depth from keypoints.
- combined_depth (torch.Tensor): Combined depth using direct
depth and keypoints depth with depth uncertainty.
- orientations (torch.Tensor): Multibin format orientations
(local yaw) for each objects.
"""
# 4 dimensions for FCOS style regression
pred_bboxes2d = bbox[:, 0:4]
# change FCOS style to [x1, y1, x2, y2] format for IOU Loss
pred_bboxes2d = self.decode_bboxes2d(pred_bboxes2d, base_centers2d)
# 2 dimensions for projected centers2d offsets
pred_offsets2d = bbox[:, 4:6]
# 3 dimensions for 3D bbox dimensions offsets
pred_dimensions_offsets3d = bbox[:, 29:32]
# the first 8 dimensions are for orientation bin classification
# and the second 8 dimensions are for orientation offsets.
pred_orientations = torch.cat((bbox[:, 32:40], bbox[:, 40:48]), dim=1)
# 3 dimensions for the uncertainties of the solved depths from
# groups of keypoints
pred_keypoints_depth_uncertainty = bbox[:, 26:29]
# 1 dimension for the uncertainty of directly regressed depth
pred_direct_depth_uncertainty = bbox[:, 49:50].squeeze(-1)
# 2 dimension of offsets x keypoints (8 corners + top/bottom center)
pred_keypoints2d = bbox[:, 6:26].reshape(-1, 10, 2)
# 1 dimension for depth offsets
pred_direct_depth_offsets = bbox[:, 48:49].squeeze(-1)
# decode the pred residual dimensions to real dimensions
pred_dimensions = self.decode_dims(labels, pred_dimensions_offsets3d)
pred_direct_depth = self.decode_direct_depth(pred_direct_depth_offsets)
pred_keypoints_depth = self.keypoints2depth(pred_keypoints2d,
pred_dimensions, cam2imgs,
downsample_ratio)
pred_direct_depth_uncertainty = torch.clamp(
pred_direct_depth_uncertainty, self.uncertainty_range[0],
self.uncertainty_range[1])
pred_keypoints_depth_uncertainty = torch.clamp(
pred_keypoints_depth_uncertainty, self.uncertainty_range[0],
self.uncertainty_range[1])
if self.combine_depth:
pred_depth_uncertainty = torch.cat(
(pred_direct_depth_uncertainty.unsqueeze(-1),
pred_keypoints_depth_uncertainty),
dim=1).exp()
pred_depth = torch.cat(
(pred_direct_depth.unsqueeze(-1), pred_keypoints_depth), dim=1)
pred_combined_depth = \
self.combine_depths(pred_depth, pred_depth_uncertainty)
else:
pred_combined_depth = None
preds = dict(
bboxes2d=pred_bboxes2d,
dimensions=pred_dimensions,
offsets2d=pred_offsets2d,
keypoints2d=pred_keypoints2d,
orientations=pred_orientations,
direct_depth=pred_direct_depth,
keypoints_depth=pred_keypoints_depth,
combined_depth=pred_combined_depth,
direct_depth_uncertainty=pred_direct_depth_uncertainty,
keypoints_depth_uncertainty=pred_keypoints_depth_uncertainty,
)
return preds
def decode_direct_depth(self, depth_offsets):
"""Transform depth offset to directly regressed depth.
Args:
depth_offsets (torch.Tensor): Predicted depth offsets.
shape: (N, )
Return:
torch.Tensor: Directly regressed depth.
shape: (N, )
"""
if self.depth_mode == 'exp':
direct_depth = depth_offsets.exp()
elif self.depth_mode == 'linear':
base_depth = depth_offsets.new_tensor(self.base_depth)
direct_depth = depth_offsets * base_depth[1] + base_depth[0]
elif self.depth_mode == 'inv_sigmoid':
direct_depth = 1 / torch.sigmoid(depth_offsets) - 1
else:
raise ValueError
if self.depth_range is not None:
direct_depth = torch.clamp(
direct_depth, min=self.depth_range[0], max=self.depth_range[1])
return direct_depth
def decode_location(self,
base_centers2d,
offsets2d,
depths,
cam2imgs,
downsample_ratio,
pad_mode='default'):
"""Retrieve object location.
Args:
base_centers2d (torch.Tensor): predicted base centers2d.
shape: (N, 2)
offsets2d (torch.Tensor): The offsets between real centers2d
and base centers2d.
shape: (N , 2)
depths (torch.Tensor): Depths of objects.
shape: (N, )
cam2imgs (torch.Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
downsample_ratio (int): The stride of feature map.
pad_mode (str, optional): Padding mode used in
training data augmentation.
Return:
tuple(torch.Tensor): Centers of 3D boxes.
shape: (N, 3)
"""
N = cam2imgs.shape[0]
# (N, 4, 4)
cam2imgs_inv = cam2imgs.inverse()
if pad_mode == 'default':
centers2d_img = (base_centers2d + offsets2d) * downsample_ratio
else:
raise NotImplementedError
# (N, 3)
centers2d_img = \
torch.cat((centers2d_img, depths.unsqueeze(-1)), dim=1)
# (N, 4, 1)
centers2d_extend = \
torch.cat((centers2d_img, centers2d_img.new_ones(N, 1)),
dim=1).unsqueeze(-1)
locations = torch.matmul(cam2imgs_inv, centers2d_extend).squeeze(-1)
return locations[:, :3]
def keypoints2depth(self,
keypoints2d,
dimensions,
cam2imgs,
downsample_ratio=4,
group0_index=[(7, 3), (0, 4)],
group1_index=[(2, 6), (1, 5)]):
"""Decode depth form three groups of keypoints and geometry projection
model. 2D keypoints inlucding 8 coreners and top/bottom centers will be
divided into three groups which will be used to calculate three depths
of object.
.. code-block:: none
Group center keypoints:
+ --------------- +
/| top center /|
/ | . / |
/ | | / |
+ ---------|----- + +
| / | | /
| / . | /
|/ bottom center |/
+ --------------- +
Group 0 keypoints:
0
+ -------------- +
/| /|
/ | / |
/ | 5/ |
+ -------------- + +
| /3 | /
| / | /
|/ |/
+ -------------- + 6
Group 1 keypoints:
4
+ -------------- +
/| /|
/ | / |
/ | / |
1 + -------------- + + 7
| / | /
| / | /
|/ |/
2 + -------------- +
Args:
keypoints2d (torch.Tensor): Keypoints of objects.
8 vertices + top/bottom center.
shape: (N, 10, 2)
dimensions (torch.Tensor): Dimensions of objetcts.
shape: (N, 3)
cam2imgs (torch.Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
downsample_ratio (int, opitonal): The stride of feature map.
Defaults: 4.
group0_index(list[tuple[int]], optional): Keypoints group 0
of index to calculate the depth.
Defaults: [0, 3, 4, 7].
group1_index(list[tuple[int]], optional): Keypoints group 1
of index to calculate the depth.
Defaults: [1, 2, 5, 6]
Return:
tuple(torch.Tensor): Depth computed from three groups of
keypoints (top/bottom, group0, group1)
shape: (N, 3)
"""
pred_height_3d = dimensions[:, 1].clone()
f_u = cam2imgs[:, 0, 0]
center_height = keypoints2d[:, -2, 1] - keypoints2d[:, -1, 1]
corner_group0_height = keypoints2d[:, group0_index[0], 1] \
- keypoints2d[:, group0_index[1], 1]
corner_group1_height = keypoints2d[:, group1_index[0], 1] \
- keypoints2d[:, group1_index[1], 1]
center_depth = f_u * pred_height_3d / (
F.relu(center_height) * downsample_ratio + self.eps)
corner_group0_depth = (f_u * pred_height_3d).unsqueeze(-1) / (
F.relu(corner_group0_height) * downsample_ratio + self.eps)
corner_group1_depth = (f_u * pred_height_3d).unsqueeze(-1) / (
F.relu(corner_group1_height) * downsample_ratio + self.eps)
corner_group0_depth = corner_group0_depth.mean(dim=1)
corner_group1_depth = corner_group1_depth.mean(dim=1)
keypoints_depth = torch.stack(
(center_depth, corner_group0_depth, corner_group1_depth), dim=1)
keypoints_depth = torch.clamp(
keypoints_depth, min=self.depth_range[0], max=self.depth_range[1])
return keypoints_depth
def decode_dims(self, labels, dims_offset):
"""Retrieve object dimensions.
Args:
labels (torch.Tensor): Each points' category id.
shape: (N, K)
dims_offset (torch.Tensor): Dimension offsets.
shape: (N, 3)
Returns:
torch.Tensor: Shape (N, 3)
"""
if self.dims_mode == 'exp':
dims_offset = dims_offset.exp()
elif self.dims_mode == 'linear':
labels = labels.long()
base_dims = dims_offset.new_tensor(self.base_dims)
dims_mean = base_dims[:, :3]
dims_std = base_dims[:, 3:6]
cls_dimension_mean = dims_mean[labels, :]
cls_dimension_std = dims_std[labels, :]
dimensions = dims_offset * cls_dimension_mean + cls_dimension_std
else:
raise ValueError
return dimensions
def decode_orientation(self, ori_vector, locations):
"""Retrieve object orientation.
Args:
ori_vector (torch.Tensor): Local orientation vector
in [axis_cls, head_cls, sin, cos] format.
shape: (N, num_dir_bins * 4)
locations (torch.Tensor): Object location.
shape: (N, 3)
Returns:
tuple[torch.Tensor]: yaws and local yaws of 3d bboxes.
"""
if self.multibin:
pred_bin_cls = ori_vector[:, :self.num_dir_bins * 2].view(
-1, self.num_dir_bins, 2)
pred_bin_cls = pred_bin_cls.softmax(dim=2)[..., 1]
orientations = ori_vector.new_zeros(ori_vector.shape[0])
for i in range(self.num_dir_bins):
mask_i = (pred_bin_cls.argmax(dim=1) == i)
start_bin = self.num_dir_bins * 2 + i * 2
end_bin = start_bin + 2
pred_bin_offset = ori_vector[mask_i, start_bin:end_bin]
orientations[mask_i] = pred_bin_offset[:, 0].atan2(
pred_bin_offset[:, 1]) + self.bin_centers[i]
else:
axis_cls = ori_vector[:, :2].softmax(dim=1)
axis_cls = axis_cls[:, 0] < axis_cls[:, 1]
head_cls = ori_vector[:, 2:4].softmax(dim=1)
head_cls = head_cls[:, 0] < head_cls[:, 1]
# cls axis
orientations = self.bin_centers[axis_cls + head_cls * 2]
sin_cos_offset = F.normalize(ori_vector[:, 4:])
orientations += sin_cos_offset[:, 0].atan(sin_cos_offset[:, 1])
locations = locations.view(-1, 3)
rays = locations[:, 0].atan2(locations[:, 2])
local_yaws = orientations
yaws = local_yaws + rays
larger_idx = (yaws > np.pi).nonzero(as_tuple=False)
small_idx = (yaws < -np.pi).nonzero(as_tuple=False)
if len(larger_idx) != 0:
yaws[larger_idx] -= 2 * np.pi
if len(small_idx) != 0:
yaws[small_idx] += 2 * np.pi
larger_idx = (local_yaws > np.pi).nonzero(as_tuple=False)
small_idx = (local_yaws < -np.pi).nonzero(as_tuple=False)
if len(larger_idx) != 0:
local_yaws[larger_idx] -= 2 * np.pi
if len(small_idx) != 0:
local_yaws[small_idx] += 2 * np.pi
return yaws, local_yaws
def decode_bboxes2d(self, reg_bboxes2d, base_centers2d):
"""Retrieve [x1, y1, x2, y2] format 2D bboxes.
Args:
reg_bboxes2d (torch.Tensor): Predicted FCOS style
2D bboxes.
shape: (N, 4)
base_centers2d (torch.Tensor): predicted base centers2d.
shape: (N, 2)
Returns:
torch.Tenosr: [x1, y1, x2, y2] format 2D bboxes.
"""
centers_x = base_centers2d[:, 0]
centers_y = base_centers2d[:, 1]
xs_min = centers_x - reg_bboxes2d[..., 0]
ys_min = centers_y - reg_bboxes2d[..., 1]
xs_max = centers_x + reg_bboxes2d[..., 2]
ys_max = centers_y + reg_bboxes2d[..., 3]
bboxes2d = torch.stack([xs_min, ys_min, xs_max, ys_max], dim=-1)
return bboxes2d
def combine_depths(self, depth, depth_uncertainty):
"""Combine all the prediced depths with depth uncertainty.
Args:
depth (torch.Tensor): Predicted depths of each object.
2D bboxes.
shape: (N, 4)
depth_uncertainty (torch.Tensor): Depth uncertainty for
each depth of each object.
shape: (N, 4)
Returns:
torch.Tenosr: combined depth.
"""
uncertainty_weights = 1 / depth_uncertainty
uncertainty_weights = \
uncertainty_weights / \
uncertainty_weights.sum(dim=1, keepdim=True)
combined_depth = torch.sum(depth * uncertainty_weights, dim=1)
return combined_depth
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment