The data preparation pipeline and the dataset are decomposed. Usually a dataset defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict.
A pipeline consists of a sequence of operations. Each operation takes a dict as input and also outputs a dict for the next transform.
We present a classical pipeline in the following figure. The blue blocks are pipeline operations. As the pipeline proceeds, each operation can add new keys (marked as green) to the result dict or update the existing keys (marked as orange).

The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.
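As a minimal sketch of this dict-in/dict-out contract, a custom operation could look like the following. The key names `points` and `num_points` are illustrative, and the registry decorator used by the real transforms is omitted:

```python
import numpy as np


class RandomJitterPoints:
    """Toy pipeline operation: reads `points`, updates it, and adds a new key.

    Illustrative only; real transforms are registered in the TRANSFORMS
    registry and operate on richer result dicts.
    """

    def __init__(self, sigma=0.01):
        self.sigma = sigma

    def __call__(self, results):
        points = results['points']                      # existing key
        noise = np.random.normal(0.0, self.sigma, points.shape)
        results['points'] = points + noise              # update existing key
        results['num_points'] = points.shape[0]         # add a new key
        return results                                  # dict passed to the next transform


# Minimal usage: chain operations just like a pipeline does.
pipeline = [RandomJitterPoints(sigma=0.02)]
results = {'points': np.zeros((4, 3), dtype=np.float32)}
for op in pipeline:
    results = op(results)
```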
In an environment using slurm, users may run the following command instead:
```bash
sh tools/create_data.sh <partition> kitti
```
### Waymo
Download Waymo open dataset V1.2 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put `.tfrecord` files into corresponding folders in `data/waymo/waymo_format/` and put the data split `.txt` files into `data/waymo/kitti_format/ImageSets`. Download the ground truth `.bin` file for the validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put it into `data/waymo/waymo_format/`. A tip is that you can use `gsutil` to download the large-scale dataset with commands. You can take this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) as an example for more details. Subsequently, prepare Waymo data by running:
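The exact conversion command is not reproduced in this excerpt; a hedged sketch, assuming the standard `tools/create_data.py` entry point (verify the flags and adjust `--workers` for your machine):

```shell
python tools/create_data.py waymo --root-path ./data/waymo --out-dir ./data/waymo --workers 64 --extra-tag waymo
```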
Note that:
- If your local disk does not have enough space for saving converted data, you can change the `--out-dir` to anywhere else. Just remember to create folders and prepare data there in advance and link them back to `data/waymo/kitti_format` after the data conversion.
- If you want faster evaluation on Waymo, you can download the preprocessed [metainfo](https://download.openmmlab.com/mmdetection3d/data/waymo/idx2metainfo.pkl) containing `contextname` and `timestamp` to the directory `data/waymo/waymo_format/`. Then, the dataset config is modified like the following:
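A hedged sketch of that modification, assuming the `WaymoMetric` evaluator accepts an `idx2metainfo` path (the exact argument name and surrounding fields may differ between releases, so compare with the Waymo config shipped in your version):

```python
val_evaluator = dict(
    type='WaymoMetric',
    # Placeholder paths following the data layout described above.
    ann_file='./data/waymo/kitti_format/waymo_infos_val.pkl',
    waymo_bin_file='./data/waymo/waymo_format/gt.bin',
    # Assumed key: points the metric at the downloaded metainfo file.
    idx2metainfo='./data/waymo/waymo_format/idx2metainfo.pkl')
test_evaluator = val_evaluator
```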
Note that we follow the original folder names for clear organization. Please rename the raw folders as shown above. Also note that the second command serves the purpose of fixing a corrupted lidar data file. Please refer to the [discussion](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/discussion/110000) for more details.
### S3DIS, ScanNet and SUN RGB-D
To prepare S3DIS data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/s3dis/README.md).
To prepare ScanNet data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/scannet/README.md).
To prepare SUN RGB-D data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/sunrgbd/README.md).
### Customized Datasets
...
...
If you created data infos with mmdetection3d v1.0.0rc1-v1.0.0rc4 before and now want to use the newest v1.1.0 mmdetection3d, you need to update the data info files.
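The update is usually done with the converter script under `tools/dataset_converters/`; a hedged sketch for KITTI (verify the script name and flags in your checkout, and repeat for each info file):

```shell
python tools/dataset_converters/update_infos_to_v2.py --dataset kitti --pkl-path ./data/kitti/kitti_infos_train.pkl --out-dir ./data/kitti
```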
We provide scripts for multi-modality/single-modality (LiDAR-based/vision-based), indoor/outdoor 3D detection and 3D semantic segmentation demos. The pre-trained models can be downloaded from the [model zoo](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/en/model_zoo.md). We provide pre-processed sample data from the KITTI, SUN RGB-D, nuScenes and ScanNet datasets. You can use any other data following our pre-processing steps.
- `type`: The name of the corresponding metric, usually associated with the dataset.
- `ann_file`: The path of the annotation file.
- `pklfile_prefix`: An optional argument. The filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- `submission_prefix`: An optional argument. If specified, the results will be saved to a file which you can then upload for the official evaluation.
Examples:
...
...
After running this command, you will obtain the input data, the output of networks and ground-truth labels visualized on the input (e.g. `***_gt.png` and `***_pred.png` in the multi-modality detection task and vision-based detection task) in `${SHOW_DIR}`. When `show` is enabled, [Open3D](http://www.open3d.org/) will be used to visualize the results online. If you are running the test on a remote server without a GUI, online visualization is not supported. You can download the `results.pkl` from the remote server and visualize the prediction results offline on your local machine.
To visualize the results with `Open3D` backend offline, you can run the following command:
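The command itself is not included in this excerpt; a hedged sketch, assuming the `tools/misc/visualize_results.py` script and its `--result`/`--show-dir` flags from earlier releases:

```shell
python tools/misc/visualize_results.py ${CONFIG_FILE} --result ${RESULTS_PATH} --show-dir ${SHOW_DIR}
```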
## Dataset
We also provide scripts to visualize the dataset without inference. You can use `tools/misc/browse_dataset.py` to show loaded data and ground-truth online and save them on the disk. Currently we support single-modality 3D detection and 3D segmentation on all the datasets, multi-modality 3D detection on KITTI and SUN RGB-D, as well as monocular 3D detection on nuScenes. To browse the KITTI dataset, you can run the following command:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task det --output-dir ${OUTPUT_DIR}
```
To verify the data consistency and the effect of data augmentation, you can also add the `--aug` flag to visualize the data after data augmentation using the command below:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task det --aug --output-dir ${OUTPUT_DIR}
```
If you also want to show 2D images with 3D bounding boxes projected onto them, you need to find a config that supports multi-modality data loading, and then change the `--task` argument to `multi-modality_det`. An example is shown below:
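A sketch of such a command; `${CONFIG_FILE}` is a placeholder for any config whose pipeline loads both point clouds and images (e.g. an MVX-Net KITTI config):

```shell
python tools/misc/browse_dataset.py ${CONFIG_FILE} --task multi-modality_det --output-dir ${OUTPUT_DIR}
```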
**Note**: This tutorial currently only covers LiDAR-based and multi-modality 3D detection methods. Content related to monocular (single-image-based) 3D detection will be added later.
## Data Preparation
You can download KITTI 3D detection data [HERE](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) and unzip all zip files. In addition, the road plane information can be downloaded [HERE](https://download.openmmlab.com/mmdetection3d/data/train_planes.zip); it is optional during training and is used to improve model performance. The road planes are generated by [AVOD](https://github.com/kujason/avod); please refer to [HERE](https://github.com/kujason/avod/issues/19) for more details.
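After downloading and extracting the data, the info files are typically generated with `tools/create_data.py`; a hedged sketch (verify the flags against your version):

```shell
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```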
KITTI officially uses mean Average Precision (mAP) and Average Orientation Similarity (AOS) to evaluate 3D object detection performance; please refer to the [official website](http://www.cvlibs.net/datasets/kitti/eval_3dobject.php) and the [original paper](http://www.cvlibs.net/publications/Geiger2012CVPR.pdf) for more details.
Lyft proposes a more strict metric for evaluating the predicted 3D bounding boxes. The basic criterion to judge whether a predicted box is a true positive is the same as KITTI, i.e. the 3D IoU. However, Lyft adopts a way similar to COCO to compute the mean average precision: it averages the precision under different thresholds of 3D IoU from 0.5 to 0.95. Actually, an overlap larger than 0.7 in 3D IoU is already a strict criterion for 3D detection methods, so the overall performance tends to be lower. Compared with other datasets, the imbalanced annotations across categories on Lyft is another important reason for the relatively low results. Please refer to its [official website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/overview/evaluation) for more details about the definition of this metric.
The official protocol is adopted to evaluate on Lyft; an example of the evaluation results is shown below:
...
...
- info\['instances'\]\[i\]\['num_lidar_pts'\]: Number of LiDAR points included in each 3D bounding box.
- info\['instances'\]\[i\]\['num_radar_pts'\]: Number of radar points included in each 3D bounding box.
- info\['instances'\]\[i\]\['bbox_3d_isvalid'\]: Whether each bounding box is valid. In general, we only take 3D boxes that include at least one LiDAR or radar point as valid boxes.
- info\['cam_instances'\]: It is a dict containing the following keys: `'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_BACK'`, `'CAM_BACK_LEFT'`, `'CAM_BACK_RIGHT'`. For vision-based 3D object detection tasks, we split the 3D annotations of the whole scene according to the camera they belong to. For the i-th instance, we have:
The preparation of ScanNet 3D semantic segmentation data is similar to that of the 3D detection task. Please refer to [this document](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/zh_cn/advanced_guides/datasets/scannet_det.md#%E6%95%B0%E6%8D%AE%E9%9B%86%E5%87%86%E5%A4%87) for more details.
If you want to generate the bin file and submit it to the server, you need to specify `submission_prefix` in the `test_evaluator` of the config file before running the test command.
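A hedged sketch of that change; the metric type and paths are placeholders, and other required arguments of the metric are omitted (the available arguments follow the evaluator argument list described earlier in this document):

```python
test_evaluator = dict(
    type='WaymoMetric',  # metric associated with the dataset
    ann_file='./data/waymo/kitti_format/waymo_infos_val.pkl',  # path of the annotation file
    # Results are saved under this prefix so the bin file can be uploaded for official evaluation.
    submission_prefix='./results/waymo/my_submission')
```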
After generating the bin file, you can simply build the binary file `create_submission` and create a submission file following the [instructions](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/). Here are some examples:
If you follow our best practices, installing the CUDA runtime libraries is enough, because no CUDA code will be compiled locally. However, if you want to compile MMCV from source or develop other CUDA operators, you need to install the complete CUDA toolkit from NVIDIA's [website](https://developer.nvidia.com/cuda-downloads), and its version should match the CUDA version of PyTorch, i.e. the cudatoolkit version specified in the `conda install` command.
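For example, pinning the CUDA runtime when installing PyTorch through conda (the version numbers below are illustrative; follow the official PyTorch instructions for your environment):

```shell
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
```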
## Customized Installation
### CUDA Versions
When installing PyTorch, you need to specify the version of CUDA. If you are not sure which one to choose, follow our recommendations below:
- For Ampere-based NVIDIA GPUs, such as the GeForce 30 series and NVIDIA A100, CUDA 11 is a must.
- For older NVIDIA GPUs, CUDA 11 is backward compatible, but CUDA 10.2 offers better compatibility and is more lightweight.
### Install MMEngine without MIM