[Fix] Fix demo and visualization (#2453)

* fix demo and visualization
* update second checkpoint and link
* rename link about scannet config
* fix amp config and add lidar_seg in vis_hook
* revert main to master in changelog_v1.0.x

Commit 5ea7fa1b, authored Apr 25, 2023 by Jingwei Zhang; committed by GitHub on Apr 25, 2023
12 changed files
--- a/docs/zh_cn/advanced_guides/supported_tasks/vision_det3d.md
+++ b/docs/zh_cn/advanced_guides/supported_tasks/vision_det3d.md
@@ -58,7 +58,7 @@ mmdetection3d
 ./tools/dist_train.sh fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune.py 8
 ```

-通过先前的脚本训练好一个基准模型后，请记得相应的修改[此处](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune.py#L8)的路径。
+通过先前的脚本训练好一个基准模型后，请记得相应的修改[此处](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune.py#L8)的路径。

 ## 定量评估

@@ -101,7 +101,7 @@ barrier 0.466   0.581   0.269   0.169   nan     nan

 ## 测试与提交

-如果你只想在在线基准上进行推理或测试模型性能，你需要将之前评估脚本中的 `--eval mAP` 替换成 `--format-only`，并在需要的情况下指定 `jsonfile_prefix`，例如，添加选项 `--eval-options jsonfile_prefix=work_dirs/fcos3d/test_submission`。请确保配置文件中的[测试信息](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/datasets/nus-mono3d.py#L93)由验证集相应地改为测试集。
+如果你只想在在线基准上进行推理或测试模型性能，你需要将之前评估脚本中的 `--eval mAP` 替换成 `--format-only`，并在需要的情况下指定 `jsonfile_prefix`，例如，添加选项 `--eval-options jsonfile_prefix=work_dirs/fcos3d/test_submission`。请确保配置文件中的[测试信息](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/_base_/datasets/nus-mono3d.py#L93)由验证集相应地改为测试集。

 在生成结果后，你可以压缩文件夹并上传至 nuScenes 3D 检测挑战的 evalAI 评估服务器上。


--- a/docs/zh_cn/model_zoo.md
+++ b/docs/zh_cn/model_zoo.md
@@ -10,104 +10,104 @@

 ### SECOND

-请参考 [SECOND](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/second) 获取更多的细节，我们在 KITTI 和 Waymo 数据集上都给出了相应的基准结果。
+请参考 [SECOND](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/second) 获取更多的细节，我们在 KITTI 和 Waymo 数据集上都给出了相应的基准结果。

 ### PointPillars

-请参考 [PointPillars](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars) 获取更多细节，我们在 KITTI 、nuScenes 、Lyft 、Waymo 数据集上给出了相应的基准结果。
+请参考 [PointPillars](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/pointpillars) 获取更多细节，我们在 KITTI 、nuScenes 、Lyft 、Waymo 数据集上给出了相应的基准结果。

 ### Part-A2

-请参考 [Part-A2](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/parta2) 获取更多细节。
+请参考 [Part-A2](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/parta2) 获取更多细节。

 ### VoteNet

-请参考 [VoteNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/votenet) 获取更多细节，我们在 ScanNet 和 SUNRGBD 数据集上给出了相应的基准结果。
+请参考 [VoteNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/votenet) 获取更多细节，我们在 ScanNet 和 SUNRGBD 数据集上给出了相应的基准结果。

 ### Dynamic Voxelization

-请参考 [Dynamic Voxelization](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/dynamic_voxelization) 获取更多细节。
+请参考 [Dynamic Voxelization](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/dynamic_voxelization) 获取更多细节。

 ### MVXNet

-请参考 [MVXNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/mvxnet) 获取更多细节。
+请参考 [MVXNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/mvxnet) 获取更多细节。

 ### RegNetX

-请参考 [RegNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/regnet) 获取更多细节，我们将 pointpillars 的主干网络替换成 RegNetX，并在 nuScenes 和 Lyft 数据集上给出了相应的基准结果。
+请参考 [RegNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/regnet) 获取更多细节，我们将 pointpillars 的主干网络替换成 RegNetX，并在 nuScenes 和 Lyft 数据集上给出了相应的基准结果。

 ### nuImages

-我们在 [nuImages 数据集](https://www.nuscenes.org/nuimages) 上也提供基准模型，请参考 [nuImages](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/nuimages) 获取更多细节，我们在该数据集上提供 Mask R-CNN ， Cascade Mask R-CNN 和 HTC 的结果。
+我们在 [nuImages 数据集](https://www.nuscenes.org/nuimages) 上也提供基准模型，请参考 [nuImages](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/nuimages) 获取更多细节，我们在该数据集上提供 Mask R-CNN ， Cascade Mask R-CNN 和 HTC 的结果。

 ### H3DNet

-请参考 [H3DNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/h3dnet) 获取更多细节。
+请参考 [H3DNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/h3dnet) 获取更多细节。

 ### 3DSSD

-请参考 [3DSSD](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/3dssd) 获取更多细节。
+请参考 [3DSSD](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/3dssd) 获取更多细节。

 ### CenterPoint

-请参考 [CenterPoint](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/centerpoint) 获取更多细节。
+请参考 [CenterPoint](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/centerpoint) 获取更多细节。

 ### SSN

-请参考 [SSN](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/ssn) 获取更多细节，我们将 pointpillars 中的检测头替换成 SSN 模型中所使用的 ‘shape-aware grouping heads’，并在 nuScenes 和 Lyft 数据集上给出了相应的基准结果。
+请参考 [SSN](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/ssn) 获取更多细节，我们将 pointpillars 中的检测头替换成 SSN 模型中所使用的 ‘shape-aware grouping heads’，并在 nuScenes 和 Lyft 数据集上给出了相应的基准结果。

 ### ImVoteNet

-请参考 [ImVoteNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/imvotenet) 获取更多细节，我们在 SUNRGBD 数据集上给出了相应的结果。
+请参考 [ImVoteNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/imvotenet) 获取更多细节，我们在 SUNRGBD 数据集上给出了相应的结果。

 ### FCOS3D

-请参考 [FCOS3D](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/fcos3d) 获取更多细节，我们在 nuScenes 数据集上给出了相应的结果。
+请参考 [FCOS3D](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/fcos3d) 获取更多细节，我们在 nuScenes 数据集上给出了相应的结果。

 ### PointNet++

-请参考 [PointNet++](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointnet2) 获取更多细节，我们在 ScanNet 和 S3DIS 数据集上给出了相应的结果。
+请参考 [PointNet++](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/pointnet2) 获取更多细节，我们在 ScanNet 和 S3DIS 数据集上给出了相应的结果。

 ### Group-Free-3D

-请参考 [Group-Free-3D](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/groupfree3d) 获取更多细节，我们在 ScanNet 数据集上给出了相应的结果。
+请参考 [Group-Free-3D](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/groupfree3d) 获取更多细节，我们在 ScanNet 数据集上给出了相应的结果。

 ### ImVoxelNet

-请参考 [ImVoxelNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/imvoxelnet) 获取更多细节，我们在 KITTI 数据集上给出了相应的结果。
+请参考 [ImVoxelNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/imvoxelnet) 获取更多细节，我们在 KITTI 数据集上给出了相应的结果。

 ### PAConv

-请参考 [PAConv](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/paconv) 获取更多细节，我们在 S3DIS 数据集上给出了相应的结果。
+请参考 [PAConv](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/paconv) 获取更多细节，我们在 S3DIS 数据集上给出了相应的结果。

 ### DGCNN

-请参考 [DGCNN](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/dgcnn) 获取更多细节，我们在 S3DIS 数据集上给出了相应的结果。
+请参考 [DGCNN](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/dgcnn) 获取更多细节，我们在 S3DIS 数据集上给出了相应的结果。

 ### SMOKE

-请参考 [SMOKE](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/smoke) 获取更多细节，我们在 KITTI 数据集上给出了相应的结果。
+请参考 [SMOKE](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/smoke) 获取更多细节，我们在 KITTI 数据集上给出了相应的结果。

 ### PGD

-请参考 [PGD](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pgd) 获取更多细节，我们在 KITTI 和 nuScenes 数据集上给出了相应的结果。
+请参考 [PGD](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/pgd) 获取更多细节，我们在 KITTI 和 nuScenes 数据集上给出了相应的结果。

 ### PointRCNN

-请参考 [PointRCNN](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/point_rcnn) 获取更多细节，我们在 KITTI 数据集上给出了相应的结果。
+请参考 [PointRCNN](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/point_rcnn) 获取更多细节，我们在 KITTI 数据集上给出了相应的结果。

 ### MonoFlex

-请参考 [MonoFlex](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/monoflex) 获取更多细节，我们在 KITTI 数据集上给出了相应的结果。
+请参考 [MonoFlex](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/monoflex) 获取更多细节，我们在 KITTI 数据集上给出了相应的结果。

 ### SA-SSD

-请参考 [SA-SSD](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/sassd) 获取更多的细节，我们在 KITTI 数据集上给出了相应的基准结果。
+请参考 [SA-SSD](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/sassd) 获取更多的细节，我们在 KITTI 数据集上给出了相应的基准结果。

 ### FCAF3D

-请参考 [FCAF3D](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/fcaf3d) 获取更多的细节，我们在 ScanNet, S3DIS 和 SUN RGB-D 数据集上给出了相应的基准结果。
+请参考 [FCAF3D](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/fcaf3d) 获取更多的细节，我们在 ScanNet, S3DIS 和 SUN RGB-D 数据集上给出了相应的基准结果。

 ### Mixed Precision (FP16) Training

-细节请参考 [Mixed Precision (FP16) Training 在 PointPillars 训练的样例](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d.py)。
+细节请参考 [Mixed Precision (FP16) Training 在 PointPillars 训练的样例](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/pointpillars/pointpillars_hv_fpn_sbn-all_8xb2-amp-2x_nus-3d.py)。
--- a/docs/zh_cn/notes/benchmarks.md
+++ b/docs/zh_cn/notes/benchmarks.md
@@ -25,7 +25,7 @@

 ### 为了计算速度所做的修改

-- __MMDetection3D__：我们尝试使用与其他代码库中尽可能相同的配置，具体配置细节见 [基准测试配置](https://github.com/open-mmlab/MMDetection3D/blob/master/configs/benchmark)。
+- __MMDetection3D__：我们尝试使用与其他代码库中尽可能相同的配置，具体配置细节见 [基准测试配置](https://github.com/open-mmlab/MMDetection3D/blob/main/configs/benchmark)。

 - __Det3D__：为了与 Det3D 进行比较，我们使用了 commit [519251e](https://github.com/poodarchu/Det3D/tree/519251e72a5c1fdd58972eabeac67808676b9bb7) 所对应的代码版本。


--- a/docs/zh_cn/user_guides/inference.md
+++ b/docs/zh_cn/user_guides/inference.md
@@ -8,7 +8,7 @@

 ### 3D 检测

-#### 单模态样例
+#### 点云样例

 在点云数据上测试 3D 检测器，运行：

@@ -18,59 +18,65 @@ python demo/pcd_demo.py ${PCD_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device

 点云和预测 3D 框的可视化结果会被保存在 `${OUT_DIR}/PCD_NAME`，它可以使用 [MeshLab](http://www.meshlab.net/) 打开。注意如果你设置了 `--show`，通过 [Open3D](http://www.open3d.org/) 可以在线显示预测结果。

-在 KITTI 数据上测试 [SECOND](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/second) 模型：
+在 KITTI 数据上测试 [PointPillars 模型](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth)：

 ```shell
-python demo/pcd_demo.py demo/data/kitti/000008.bin configs/second/second_hv-secfpn_8xb6-80e_kitti-3d-car.py checkpoints/second_hv-secfpn_8xb6-80e_kitti-3d-car_20200620_230238-393f000c.pth
+python demo/pcd_demo.py demo/data/kitti/000008.bin configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py ${CHECKPOINT_FILE} --show
 ```

-在 SUN RGB-D 数据上测试 [VoteNet](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/votenet) 模型：
+在 SUN RGB-D 数据上测试 [VoteNet 模型](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/votenet/votenet_16x8_sunrgbd-3d-10class/votenet_16x8_sunrgbd-3d-10class_20210820_162823-bf11f014.pth)：

 ```shell
-python demo/pcd_demo.py demo/data/sunrgbd/sunrgbd_000017.bin configs/votenet/votenet_8xb16_sunrgbd-3d.py checkpoints/votenet_8xb16_sunrgbd-3d_20200620_230238-4483c0c0.pth
+python demo/pcd_demo.py demo/data/sunrgbd/sunrgbd_000017.bin configs/votenet/votenet_8xb16_sunrgbd-3d.py ${CHECKPOINT_FILE} --show
 ```

-如果你正在使用的 mmdetection3d 版本 >= 0.6.0，记住转换 VoteNet 的模型权重文件，查看 [README](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/votenet/README.md/) 来获取转换模型权重文件的详细说明。
+#### 单目 3D 样例

-#### 多模态样例
-
-在多模态数据（通常是点云和图像）上测试 3D 检测器，运行：
+在图像数据上测试单目 3D 检测器，运行：

 ```shell
-python demo/multi_modality_demo.py ${PCD_FILE} ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}] [--show]
+python demo/mono_det_demo.py ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--out-dir ${OUT_DIR}] [--show]
 ```

-`ANNOTATION_FILE` 需要提供 3D 到 2D 的仿射矩阵，可视化结果会被保存在 `${OUT_DIR}/PCD_NAME`，其中包括点云、图像、预测的 3D 框以及它们在图像上的投影。
+`ANNOTATION_FILE` 需要提供 3D 到 2D 的仿射矩阵（相机内参矩阵），可视化结果会被保存在 `${OUT_DIR}/PCD_NAME`，其中包括图像以及预测 3D 框在图像上的投影。

-在 KITTI 数据上测试 [MVX-Net](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/mvxnet) 模型：
+在 KITTI 数据上测试 [PGD 模型](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pgd/pgd_r101_caffe_fpn_gn-head_3x4_4x_kitti-mono3d/pgd_r101_caffe_fpn_gn-head_3x4_4x_kitti-mono3d_20211022_102608-8a97533b.pth)：

 ```shell
-python demo/multi_modality_demo.py demo/data/kitti/000008.bin demo/data/kitti/000008.png demo/data/kitti/000008.pkl configs/mvxnet/mvx_fpn-dv-second-secfpn_8xb2-80e_kitti-3d-3class.py checkpoints/mvx_fpn-dv-second-secfpn_8xb2-80e_kitti-3d-3class_20200621_003904-10140f2d.pth
+python demo/mono_det_demo.py demo/data/kitti/000008.png demo/data/kitti/000008.pkl  configs/pgd/pgd_r101-caffe_fpn_head-gn_4xb3-4x_kitti-mono3d.py ${CHECKPOINT_FILE}  --show --cam-type CAM2 --score-thr 8
 ```

-在 SUN RGB-D 数据上测试 [ImVoteNet](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/imvotenet) 模型：
+**注意**： PGD 方法的预测框分数并不是在 (0, 1) 之间
+
+在 nuScenes 数据上测试 [FCOS3D 模型](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune_20210717_095645-8d806dc2.pth)：

 ```shell
-python demo/multi_modality_demo.py demo/data/sunrgbd/sunrgbd_000017.bin demo/data/sunrgbd/sunrgbd_000017.jpg demo/data/sunrgbd/sunrgbd_000017_infos.pkl configs/imvotenet/imvotenet_stage2_8xb16_sunrgbd.py checkpoints/imvotenet_stage2_8xb16_sunrgbd_20210323_184021-d44dcb66.pth
+python demo/mono_det_demo.py demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525.jpg demo/data/nuscenes/n015-2018-07-24-11-22-45+0800.pkl  configs/fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d_finetune.py ${CHECKPOINT_FILE}  --show --cam-type CAM_BACK
 ```

-### 单目 3D 检测
+**注意**： 当对翻转图像可视化单目 3D 检测结果是，相机内参矩阵也应该相应修改。在 PR [#744](https://github.com/open-mmlab/mmdetection3d/pull/744) 中可以了解更多细节和示例。

-在图像数据上测试单目 3D 检测器，运行：
+#### 多模态样例
+
+在多模态数据（通常是点云和图像）上测试 3D 检测器，运行：

 ```shell
-python demo/mono_det_demo.py ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--out-dir ${OUT_DIR}] [--show]
+python demo/multi_modality_demo.py ${PCD_FILE} ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}] [--show]
 ```

-`ANNOTATION_FILE` 需要提供 3D 到 2D 的仿射矩阵（相机内参矩阵），可视化结果会被保存在 `${OUT_DIR}/PCD_NAME`，其中包括图像以及预测 3D 框在图像上的投影。
+`ANNOTATION_FILE` 需要提供 3D 到 2D 的仿射矩阵，可视化结果会被保存在 `${OUT_DIR}/PCD_NAME`，其中包括点云、图像、预测的 3D 框以及它们在图像上的投影。

-在 nuScenes 数据上测试 [FCOS3D](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/fcos3d) 模型：
+在 KITTI 数据上测试 [MVX-Net](https://github.com/open-mmlab/mmdetection3d/tree/main/configs/mvxnet) 模型：

 ```shell
-python demo/mono_det_demo.py demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525.jpg demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525.pkl configs/fcos3d/fcos3d_r101-caffe-dcn-fpn-head-gn_8xb2-1x_nus-mono3d_finetune.py checkpoints/fcos3d_r101-caffe-dcn-fpn-head-gn_8xb2-1x_nus-mono3d_finetune_20210717_095645-8d806dc2.pth
+python demo/multi_modality_demo.py demo/data/kitti/000008.bin demo/data/kitti/000008.png demo/data/kitti/000008.pkl configs/mvxnet/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class.py ${CHECKPOINT_FILE} --cam-type CAM2 --show
 ```

-注意当对翻转图像可视化单目 3D 检测结果是，相机内参矩阵也应该相应修改。在 PR [#744](https://github.com/open-mmlab/mmdetection3d/pull/744) 中可以了解更多细节和示例。
+在 SUN RGB-D 数据上测试 [ImVoteNet 模型](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvotenet/imvotenet_stage2_16x8_sunrgbd-3d-10class/imvotenet_stage2_16x8_sunrgbd-3d-10class_20210819_192851-1bcd1b97.pth)：
+
+```shell
+python demo/multi_modality_demo.py demo/data/sunrgbd/000017.bin demo/data/sunrgbd/000017.jpg demo/data/sunrgbd/sunrgbd_000017_infos.pkl configs/imvotenet/imvotenet_stage2_8xb16_sunrgbd-3d.py ${CHECKPOINT_FILE} --cam-type CAM0 --show --score-thr 0.6
+```

 ### 3D 分割

@@ -82,8 +88,8 @@ python demo/pc_seg_demo.py ${PCD_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--devi

 可视化结果会被保存在 `${OUT_DIR}/PCD_NAME`，其中包括点云以及预测的 3D 分割掩码。

-在 ScanNet 数据上测试 [PointNet++ (SSG)](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/pointnet2) 模型：
+在 ScanNet 数据上测试 [PointNet++ (SSG) 模型](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointnet2/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class_20210514_143644-ee73704a.pth)：

 ```shell
-python demo/pc_seg_demo.py demo/data/scannet/scene0000_00.bin configs/pointnet2/pointnet2_ssg_2xb16-cosine-200e_scannet-seg.py checkpoints/pointnet2_ssg_2xb16-cosine-200e_scannet-seg_20210514_143644-ee73704a.pth
+python demo/pcd_seg_demo.py demo/data/scannet/scene0000_00.bin configs/pointnet2/pointnet2_ssg_2xb16-cosine-200e_scannet-seg.py ${CHECKPOINT_FILE} --show
 ```
--- a/docs/zh_cn/user_guides/useful_tools.md
+++ b/docs/zh_cn/user_guides/useful_tools.md
@@ -196,7 +196,7 @@ python -u tools/dataset_converters/nuimage_converter.py --data-root ${DATA_ROOT}
 - `--nproc`: 数据准备的进程数，默认为 `4`。由于图片是并行处理的，更大的进程数目能够减少准备时间。
 - `--extra-tag`: 注释的额外标签，默认为 `nuimages`。这可用于将不同时间处理的不同注释分开以供研究。

-更多的数据准备细节参考 [doc](https://mmdetection3d.readthedocs.io/zh_CN/latest/data_preparation.html)，nuImages 数据集的细节参考 [README](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/nuimages/README.md/)。
+更多的数据准备细节参考 [doc](https://mmdetection3d.readthedocs.io/zh_CN/latest/data_preparation.html)，nuImages 数据集的细节参考 [README](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/nuimages/README.md/)。

 &#8195;


--- a/docs/zh_cn/user_guides/visualization.md
+++ b/docs/zh_cn/user_guides/visualization.md
@@ -190,7 +190,7 @@ python tools/misc/browse_dataset.py configs/mvxnet/dv_mvx-fpn_second_secfpn_adam
 你可以使用不同的配置浏览不同的数据集，例如在 3D 语义分割任务中可视化 ScanNet 数据集：

 ```shell
-python tools/misc/browse_dataset.py configs/_base_/datasets/scannet_seg-3d-20class.py --task lidar_seg --output-dir ${OUTPUT_DIR} --online
+python tools/misc/browse_dataset.py configs/_base_/datasets/scannet-seg.py --task lidar_seg --output-dir ${OUTPUT_DIR} --online
 ```

 ![](../../../resources/browse_dataset_seg.png)

--- a/mmdet3d/apis/inference.py
+++ b/mmdet3d/apis/inference.py
@@ -178,8 +178,8 @@ def inference_multi_modality_detector(model: nn.Module,
                                      ann_file: Union[str, Sequence[str]],
                                      cam_type: str = 'CAM2'):
    """Inference point cloud with the multi-modality detector. Now we only
-    support multi-modality detector for KITTI dataset since the multi-view
-    image loading is not supported yet in this inference function.
+    support multi-modality detector for KITTI and SUNRGBD datasets since the
+    multi-view image loading is not supported yet in this inference function.

    Args:
        model (nn.Module): The loaded detector.
@@ -198,8 +198,6 @@ def inference_multi_modality_detector(model: nn.Module,
        If pcds is a list or tuple, the same length list type results
        will be returned, otherwise return the detection results directly.
    """
-
-    # TODO: We will support
    if isinstance(pcds, (list, tuple)):
        is_batch = True
        assert isinstance(imgs, (list, tuple))
@@ -229,9 +227,6 @@ def inference_multi_modality_detector(model: nn.Module,
        if osp.basename(img_path) != osp.basename(img):
            raise ValueError(f'the info file of {img_path} is not provided.')

-        data_info['images'][cam_type]['img_path'] = img
-        cam2img = np.array(data_info['images'][cam_type]['cam2img'])
-
        # TODO: check the name consistency of
        # image file and point cloud file
        # TODO: support multi-view image loading
@@ -239,8 +234,14 @@ def inference_multi_modality_detector(model: nn.Module,
            lidar_points=dict(lidar_path=pcd),
            img_path=img,
            box_type_3d=box_type_3d,
-            box_mode_3d=box_mode_3d,
-            cam2img=cam2img)
+            box_mode_3d=box_mode_3d)
+
+        data_info['images'][cam_type]['img_path'] = img
+        if 'cam2img' in data_info['images'][cam_type]:
+            # The data annotation in the SUNRGBD dataset does not contain
+            # `cam2img`
+            data_['cam2img'] = np.array(
+                data_info['images'][cam_type]['cam2img'])

        # LiDAR to image conversion for KITTI dataset
        if box_mode_3d == Box3DMode.LIDAR:
@@ -314,8 +315,10 @@ def inference_mono_3d_detector(model: nn.Module,

        # replace the img_path in data_info with img
        data_info['images'][cam_type]['img_path'] = img
+        # avoid data_info['images'] having multiple keys about camera views.
+        mono_img_info = {f'{cam_type}': data_info['images'][cam_type]}
        data_ = dict(
-            images=data_info['images'],
+            images=mono_img_info,
            box_type_3d=box_type_3d,
            box_mode_3d=box_mode_3d)
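
The two hunks above make `cam2img` optional and pass only the requested camera view to the pipeline. The sketch below restates that behaviour on plain dictionaries. The helper names `select_single_view` and `optional_cam2img` are hypothetical and not part of mmdet3d. Unlike the diff, the sketch copies the view instead of mutating `data_info` in place, and it skips the `np.array` conversion so that it has no dependencies.

```python
from copy import deepcopy
from typing import Any, Dict, Optional


def select_single_view(data_info: Dict[str, Any], cam_type: str,
                       img: str) -> Dict[str, Any]:
    """Return an `images` dict holding only `cam_type`, pointed at `img`.

    Hypothetical helper: mirrors the `mono_img_info` construction in the
    diff, but works on a copy so the caller's annotation is left untouched.
    """
    images = data_info['images']
    if cam_type not in images:
        raise KeyError(f'camera view {cam_type!r} not found in annotation')
    view = deepcopy(images[cam_type])
    view['img_path'] = img
    return {cam_type: view}


def optional_cam2img(data_info: Dict[str, Any],
                     cam_type: str) -> Optional[Any]:
    """Return `cam2img` for the view, or None when the annotation lacks it."""
    return data_info['images'][cam_type].get('cam2img')
```

With an annotation that has two views, `select_single_view(info, 'CAM2', 'new.png')` yields a dict whose only key is `'CAM2'`, and `optional_cam2img` returns `None` for a view that stores no intrinsics.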


--- a/mmdet3d/datasets/transforms/formating.py
+++ b/mmdet3d/datasets/transforms/formating.py
@@ -195,11 +195,29 @@ class Pack3DDetInputs(BaseTransform):
        gt_instances = InstanceData()
        gt_pts_seg = PointData()

-        img_metas = {}
+        data_metas = {}
        for key in self.meta_keys:
            if key in results:
-                img_metas[key] = results[key]
-        data_sample.set_metainfo(img_metas)
+                data_metas[key] = results[key]
+            elif 'images' in results:
+                if len(results['images'].keys()) == 1:
+                    cam_type = list(results['images'].keys())[0]
+                    # single-view image
+                    if key in results['images'][cam_type]:
+                        data_metas[key] = results['images'][cam_type][key]
+                else:
+                    # multi-view image
+                    img_metas = []
+                    cam_types = list(results['images'].keys())
+                    for cam_type in cam_types:
+                        if key in results['images'][cam_type]:
+                            img_metas.append(results['images'][cam_type][key])
+                    if len(img_metas) > 0:
+                        data_metas[key] = img_metas
+            elif 'lidar_points' in results:
+                if key in results['lidar_points']:
+                    data_metas[key] = results['lidar_points'][key]
+        data_sample.set_metainfo(data_metas)

        inputs = {}
        for key in self.keys:
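
The lookup order introduced in `Pack3DDetInputs` can be read as a three-step fallback: the top level of `results`, then `results['images']`, then `results['lidar_points']`. The sketch below reproduces that order as a standalone function on plain dictionaries. `collect_metas` is a hypothetical name, and the sketch leaves out `InstanceData`, `PointData` and `set_metainfo`.

```python
from typing import Any, Dict, Iterable


def collect_metas(results: Dict[str, Any],
                  meta_keys: Iterable[str]) -> Dict[str, Any]:
    """Resolve each meta key with the same fallback order as the diff."""
    data_metas: Dict[str, Any] = {}
    for key in meta_keys:
        if key in results:
            # 1) the key is stored at the top level
            data_metas[key] = results[key]
        elif 'images' in results:
            views = results['images']
            if len(views) == 1:
                # 2a) single-view image: store the value directly
                (view,) = views.values()
                if key in view:
                    data_metas[key] = view[key]
            else:
                # 2b) multi-view image: store one value per view that has it
                values = [v[key] for v in views.values() if key in v]
                if values:
                    data_metas[key] = values
        elif 'lidar_points' in results:
            # 3) reached only when `results` has no 'images' entry
            if key in results['lidar_points']:
                data_metas[key] = results['lidar_points'][key]
    return data_metas
```

Because the branches are chained with `elif`, `lidar_points` is consulted only when `results` has no `images` entry. For a sample that carries both, a key stored only under `lidar_points` is not picked up by this fallback.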

--- a/mmdet3d/datasets/transforms/loading.py
+++ b/mmdet3d/datasets/transforms/loading.py
@@ -250,7 +250,7 @@ class LoadImageFromFileMono3D(LoadImageFromFile):
            results['cam2img'] = results['images'][camera_type]['cam2img']
        else:
            raise NotImplementedError(
-                'Currently we only support load image from kitti and'
+                'Currently we only support load image from kitti and '
                'nuscenes datasets')

        try:

--- a/mmdet3d/engine/hooks/visualization_hook.py
+++ b/mmdet3d/engine/hooks/visualization_hook.py
@@ -97,13 +97,18 @@ class Det3DVisualizationHook(Hook):
        data_input = dict()

        # Visualize only the first data
-        if 'img_path' in outputs[0]:
+        if self.vis_task in [
+                'mono_det', 'multi-view_det', 'multi-modality_det'
+        ]:
+            assert 'img_path' in outputs[0], 'img_path is not in outputs[0]'
            img_path = outputs[0].img_path
            img_bytes = get(img_path, backend_args=self.backend_args)
            img = mmcv.imfrombytes(img_bytes, channel_order='rgb')
            data_input['img'] = img

-        if 'lidar_path' in outputs[0]:
+        if self.vis_task in ['lidar_det', 'multi-modality_det', 'lidar_seg']:
+            assert 'lidar_path' in outputs[
+                0], 'lidar_path is not in outputs[0]'
            lidar_path = outputs[0].lidar_path
            num_pts_feats = outputs[0].num_pts_feats
            pts_bytes = get(lidar_path, backend_args=self.backend_args)
@@ -145,24 +150,39 @@ class Det3DVisualizationHook(Hook):
            self._test_index += 1

            data_input = dict()
-            if 'img_path' in data_sample:
+            assert 'img_path' in data_sample or 'lidar_path' in data_sample, \
+                "'data_sample' must contain 'img_path' or 'lidar_path'"
+
+            out_file = o3d_save_path = None
+
+            if self.vis_task in [
+                    'mono_det', 'multi-view_det', 'multi-modality_det'
+            ]:
+                assert 'img_path' in data_sample, \
+                    'img_path is not in data_sample'
                img_path = data_sample.img_path
                img_bytes = get(img_path, backend_args=self.backend_args)
                img = mmcv.imfrombytes(img_bytes, channel_order='rgb')
                data_input['img'] = img
-
-            if 'lidar_path' in data_sample:
+                if self.test_out_dir is not None:
+                    out_file = osp.basename(img_path)
+                    out_file = osp.join(self.test_out_dir, out_file)
+
+            if self.vis_task in [
+                    'lidar_det', 'multi-modality_det', 'lidar_seg'
+            ]:
+                assert 'lidar_path' in data_sample, \
+                    'lidar_path is not in data_sample'
                lidar_path = data_sample.lidar_path
                num_pts_feats = data_sample.num_pts_feats
                pts_bytes = get(lidar_path, backend_args=self.backend_args)
                points = np.frombuffer(pts_bytes, dtype=np.float32)
                points = points.reshape(-1, num_pts_feats)
                data_input['points'] = points
-
-            out_file = None
-            if self.test_out_dir is not None:
-                out_file = osp.basename(img_path)
-                out_file = osp.join(self.test_out_dir, out_file)
+                if self.test_out_dir is not None:
+                    o3d_save_path = osp.basename(lidar_path).split(
+                        '.')[0] + '.png'
+                    o3d_save_path = osp.join(self.test_out_dir, o3d_save_path)

            self._visualizer.add_datasample(
                'test sample',
@@ -173,4 +193,5 @@ class Det3DVisualizationHook(Hook):
                wait_time=self.wait_time,
                pred_score_thr=self.score_thr,
                out_file=out_file,
+                o3d_save_path=o3d_save_path,
                step=self._test_index)
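
The hook now decides which inputs to load from `vis_task` rather than from whichever keys happen to be present, and it derives a separate Open3D screenshot path from the point cloud file name. Below is a minimal sketch of that path logic, assuming the task names shown in the diff. `output_paths` is a hypothetical helper, not a method of `Det3DVisualizationHook`.

```python
import os.path as osp
from typing import Optional, Tuple

# Task groups copied from the diff.
IMAGE_TASKS = ('mono_det', 'multi-view_det', 'multi-modality_det')
LIDAR_TASKS = ('lidar_det', 'multi-modality_det', 'lidar_seg')


def output_paths(vis_task: str,
                 test_out_dir: Optional[str],
                 img_path: Optional[str] = None,
                 lidar_path: Optional[str] = None
                 ) -> Tuple[Optional[str], Optional[str]]:
    """Return `(out_file, o3d_save_path)` for one test sample."""
    out_file = o3d_save_path = None
    if vis_task in IMAGE_TASKS:
        assert img_path is not None, 'img_path is required for image tasks'
        if test_out_dir is not None:
            out_file = osp.join(test_out_dir, osp.basename(img_path))
    if vis_task in LIDAR_TASKS:
        assert lidar_path is not None, 'lidar_path is required for lidar tasks'
        if test_out_dir is not None:
            # Same rule as the diff: keep the text before the first dot.
            stem = osp.basename(lidar_path).split('.')[0]
            o3d_save_path = osp.join(test_out_dir, stem + '.png')
    return out_file, o3d_save_path
```

A `multi-modality_det` sample gets both paths, while `lidar_seg` gets only the screenshot path. Because of `split('.')[0]`, a file named `a.b.bin` would be saved as `a.png`.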
--- a/mmdet3d/visualization/local_visualizer.py
+++ b/mmdet3d/visualization/local_visualizer.py
@@ -664,6 +664,9 @@ class Det3DLocalVisualizer(DetLocalVisualizer):
        if hasattr(self, 'o3d_vis'):
            self.o3d_vis.run()
            if save_path is not None:
+                if not (save_path.endswith('.png')
+                        or save_path.endswith('.jpg')):
+                    save_path += '.png'
                self.o3d_vis.capture_screen_image(save_path)
            self.o3d_vis.destroy_window()
            self._clear_o3d_vis()
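
The added check appends `.png` whenever the save path does not already end in `.png` or `.jpg`. A standalone sketch of the same rule follows. `ensure_image_suffix` is a hypothetical name, and the comparison is case-sensitive, as in the diff.

```python
def ensure_image_suffix(save_path: str) -> str:
    """Append '.png' unless the path already ends with '.png' or '.jpg'."""
    if not save_path.endswith(('.png', '.jpg')):
        save_path += '.png'
    return save_path
```

Under this rule `screenshot` becomes `screenshot.png` and `frame.jpg` is left alone. An upper-case suffix such as `frame.PNG` is not recognised and would become `frame.PNG.png`.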

--- a/tools/test.py
+++ b/tools/test.py
@@ -28,6 +28,8 @@ def parse_args():
        help='directory where painted images will be saved. '
        'If specified, it will be automatically saved '
        'to the work_dir/timestamp/show_dir')
+    parser.add_argument(
+        '--score-thr', type=float, default=0.1, help='bbox score threshold')
    parser.add_argument(
        '--task',
        type=str,
@@ -73,7 +75,15 @@ def trigger_visualization_hook(cfg, args):
            visualization_hook['wait_time'] = args.wait_time
        if args.show_dir:
            visualization_hook['test_out_dir'] = args.show_dir
+        all_task_choices = [
+            'mono_det', 'multi-view_det', 'lidar_det', 'lidar_seg',
+            'multi-modality_det'
+        ]
+        assert args.task in all_task_choices, 'You must set '\
+            f"'--task' in {all_task_choices} in the command " \
+            'if you want to use visualization hook'
        visualization_hook['vis_task'] = args.task
+        visualization_hook['score_thr'] = args.score_thr
    else:
        raise RuntimeError(
            'VisualizationHook must be included in default_hooks.'
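
The `tools/test.py` change adds a `--score-thr` option and requires `--task` to be one of the known visualization tasks before the hook is configured. The sketch below isolates that behaviour. The `default=None` on `--task` and the helper `visualization_overrides` are assumptions made for illustration. The sketch also raises `ValueError` where the diff uses `assert`, so the check survives `python -O`.

```python
import argparse
from typing import Dict, List, Optional, Union

# Task names copied from the diff.
ALL_TASK_CHOICES = [
    'mono_det', 'multi-view_det', 'lidar_det', 'lidar_seg',
    'multi-modality_det'
]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description='visualization options')
    parser.add_argument(
        '--score-thr', type=float, default=0.1, help='bbox score threshold')
    # Assumption: no default task, so the check below can detect omission.
    parser.add_argument('--task', type=str, default=None)
    return parser.parse_args(argv)


def visualization_overrides(
        args: argparse.Namespace) -> Dict[str, Union[str, float]]:
    """Return the keys the diff writes into the visualization hook config."""
    if args.task not in ALL_TASK_CHOICES:
        raise ValueError(
            f"You must set '--task' in {ALL_TASK_CHOICES} in the command "
            'if you want to use visualization hook')
    return {'vis_task': args.task, 'score_thr': args.score_thr}
```

Calling `visualization_overrides(parse_args(['--task', 'lidar_seg']))` returns the task together with the default threshold of `0.1`. Leaving out `--task` raises the error before any visualization starts.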