Unverified Commit 76e351a7 authored by Wenwei Zhang, committed by GitHub

Release v1.0.0rc2

parents 5111eda8 4422eaab
...@@ -6,21 +6,21 @@ name: build
on:
  push:
    paths-ignore:
      - ".dev_scripts/**"
      - ".github/**.md"
      - "demo/**"
      - "docker/**"
      - "tools/**"
  pull_request:
    paths-ignore:
      - ".dev_scripts/**"
      - ".github/**.md"
      - "demo/**"
      - "docker/**"
      - "tools/**"
      - "docs/**"
      - "docs_zh-CN/**"

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
...@@ -67,6 +67,10 @@ jobs:
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Fetch GPG keys
      run: |
        apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
        apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub
    - name: Install system dependencies
      run: |
        apt-get update && apt-get install -y ffmpeg libsm6 git ninja-build libglib2.0-0 libxrender-dev python${{matrix.python-version}}-dev
...@@ -25,7 +25,7 @@
[![license](https://img.shields.io/github/license/open-mmlab/mmdetection3d.svg)](https://github.com/open-mmlab/mmdetection3d/blob/master/LICENSE)

**News**: We released the codebase v1.0.0rc2.

Note: We are going through a large refactoring to provide simpler and more unified usage of many modules.

...@@ -85,7 +85,14 @@ This project is released under the [Apache 2.0 license](LICENSE).

## Changelog

v1.0.0rc2 was released on 1/5/2022.
- Support [spconv 2.0](https://github.com/traveller59/spconv)
- Support [MinkowskiEngine](https://github.com/NVIDIA/MinkowskiEngine) with MinkResNet
- Support training models on custom datasets with only point clouds
- Update Registry to distinguish the scope of built functions
- Replace mmcv.iou3d with a set of bird's-eye-view (BEV) operators to unify the operations of rotated boxes

Please refer to [changelog.md](docs/en/changelog.md) for details and release history.

## Benchmark and model zoo
...@@ -25,7 +25,7 @@
[![license](https://img.shields.io/github/license/open-mmlab/mmdetection3d.svg)](https://github.com/open-mmlab/mmdetection3d/blob/master/LICENSE)

**News**: We released version v1.0.0rc2.

Note: We are going through a large refactoring to provide simpler and more unified usage of many modules.

...@@ -85,7 +85,14 @@ MMDetection3D is an open source PyTorch-based object detection toolbox, the next generation

## Changelog

The latest version v1.0.0rc2 was released on 2022.5.1.

- Support [spconv 2.0](https://github.com/traveller59/spconv)
- Support MinkResNet based on [MinkowskiEngine](https://github.com/NVIDIA/MinkowskiEngine)
- Support training models on custom datasets with only point clouds
- Update Registry to distinguish the scope of built functions
- Replace mmcv.iou3d with a set of bird's-eye-view (BEV) operators to unify the operations of rotated boxes

For more details and the release history, please refer to the [changelog](docs/zh_cn/changelog.md).

## Benchmark and model zoo
...@@ -39,7 +39,7 @@ We implement PointPillars and provide the results and checkpoints on KITTI, nuSc
| Backbone | Lr schd | Mem (GB) | Inf time (fps) | Private Score | Public Score | Download |
| :---------: | :-----: | :------: | :------------: | :----: | :----: | :------: |
|[SECFPN](./hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d.py)|2x|12.2||13.8|14.1|[model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d_20210829_100455-82b81c39.pth) | [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d_20210829_100455.log.json)|
|[FPN](./hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d.py)|2x|9.2||14.8|15.0|[model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pointpillars/hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d_20210822_095429-0b3d6196.pth) | [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pointpillars/hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d_20210822_095429.log.json)|

### Waymo
...@@ -105,7 +105,7 @@ Assume that you have already downloaded the checkpoints to the directory `checkp
**Notice**: To generate submissions on Lyft, `csv_savepath` must be given in the `--eval-options`. After generating the csv file, you can make a submission with the Kaggle commands given on the [website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/submit).

Note that in the [config of Lyft dataset](../../configs/_base_/datasets/lyft-3d.py), the value of the `ann_file` keyword in `test` is `data_root + 'lyft_infos_test.pkl'`, which is the official test set of Lyft without annotations. To test on the validation set, please change this to `data_root + 'lyft_infos_val.pkl'`.

8. Test PointPillars on Waymo with 8 GPUs, and evaluate the mAP with Waymo metrics.

## Changelog
### v1.0.0rc2 (1/5/2022)
#### Highlights
- Support spconv 2.0
- Support MinkowskiEngine with MinkResNet
- Support training models on custom datasets with only point clouds
- Update Registry to distinguish the scope of built functions
- Replace mmcv.iou3d with a set of bird's-eye-view (BEV) operators to unify the operations of rotated boxes
#### New Features
- Add loader arguments in the configuration files (#1388)
- Support [spconv 2.0](https://github.com/traveller59/spconv) when the package is installed. Users can still use spconv 1.x in MMCV with CUDA 9.0 (at the cost of more memory) without losing the compatibility of model weights between the two versions (#1421)
- Support MinkowskiEngine with MinkResNet (#1422)
#### Improvements
- Add the documentation for model deployment (#1373, #1436)
- Add Chinese documentation for
  - Speed benchmark (#1379)
  - LiDAR-based 3D detection (#1368)
  - LiDAR 3D segmentation (#1420)
  - Coordinate system refactoring (#1384)
- Support training models on custom datasets with only point clouds (#1393)
- Replace mmcv.iou3d with a set of bird's-eye-view (BEV) operators to unify the operations of rotated boxes (#1403, #1418)
- Update Registry to distinguish the scope of building functions (#1412, #1443)
- Replace recommonmark with myst_parser for documentation rendering (#1414)
#### Bug Fixes
- Fix the show pipeline in [browse_dataset.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/misc/browse_dataset.py) (#1376)
- Fix missing `__init__.py` files after coordinate system refactoring (#1383)
- Fix the incorrect yaw in the visualization caused by coordinate system refactoring (#1407)
- Fix `NaiveSyncBatchNorm1d` and `NaiveSyncBatchNorm2d` to support non-distributed cases and more general inputs (#1435)
#### Contributors
A total of 11 developers contributed to this release.
@ZCMax, @ZwwWayne, @Tai-Wang, @VVsssssk, @HanaRo, @JoeyforJoy, @ansonlcy, @filaPro, @jshilong, @Xiangxu-0103, @deleomike
### v1.0.0rc1 (1/4/2022)

#### Compatibility

...@@ -72,7 +114,7 @@

A total of 9 developers contributed to this release.

@ZCMax, @ZwwWayne, @wHao-Wu, @Tai-Wang, @wangruohui, @zjwzcx, @Xiangxu-0103, @EdAyers, @hongye-dev, @zhanggefan

### v1.0.0rc0 (18/2/2022)
...@@ -48,7 +48,7 @@ extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode',
    'myst_parser',
    'sphinx_markdown_tables',
    'sphinx.ext.autosectionlabel',
    'sphinx_copybutton',
...@@ -11,28 +11,29 @@
The required versions of MMCV, MMDetection and MMSegmentation for different versions of MMDetection3D are as below. Please install the correct version of MMCV, MMDetection and MMSegmentation to avoid installation issues.

| MMDetection3D version | MMDetection version | MMSegmentation version | MMCV version |
| :-------------------: | :---------------------: | :--------------------: | :------------------------: |
| master | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.7.0 |
| v1.0.0rc2 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.7.0 |
| v1.0.0rc1 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.5.0 |
| v1.0.0rc0 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.18.1 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.18.0 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.17.3 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.2 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.1 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.16.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.15.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.14.0 | mmdet>=2.10.0, <=2.11.0 | mmseg==0.14.0 | mmcv-full>=1.3.1, <=1.4.0 |
| 0.13.0 | mmdet>=2.10.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.4.0 |
| 0.12.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.4.0 |
| 0.11.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.10.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.9.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.8.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.1.5, <=1.3.0 |
| 0.7.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.1.5, <=1.3.0 |
| 0.6.0 | mmdet>=2.4.0, <=2.11.0 | Not required | mmcv-full>=1.1.3, <=1.2.0 |
| 0.5.0 | 2.3.0 | Not required | mmcv-full==1.0.5 |
# Installation

...@@ -76,14 +77,14 @@ You can check the supported CUDA version for precompiled packages on the [PyTorc
`E.g. 1` If you have CUDA 10.1 installed under `/usr/local/cuda` and would like to install
PyTorch 1.5, you need to install the prebuilt PyTorch with CUDA 10.1.

```shell
conda install pytorch==1.5.0 cudatoolkit=10.1 torchvision==0.6.0 -c pytorch
```

`E.g. 2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install
PyTorch 1.3.1, you need to install the prebuilt PyTorch with CUDA 9.2.

```shell
conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch
```
...@@ -192,6 +193,26 @@ you can install it before installing MMCV.
4. Some dependencies are optional. Simply running `pip install -v -e .` will only install the minimum runtime requirements. To use optional dependencies like `albumentations` and `imagecorruptions` either install them manually with `pip install -r requirements/optional.txt` or specify desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`). Valid keys for the extras field are: `all`, `tests`, `build`, and `optional`.

We now support spconv 2.0. If spconv 2.0 is installed, the code will use it first, which consumes less GPU memory than the default mmcv sparse convolution. Users can install spconv 2.0 with the following commands:
```bash
pip install cumm-cuxxx
pip install spconv-cuxxx
```
where `xxx` is the CUDA version in your environment.

For example, with CUDA 10.2 the command is `pip install cumm-cu102 && pip install spconv-cu102`.

Supported CUDA versions include 10.2, 11.1, 11.3, and 11.4. Users can also install spconv by building it from source. For more details please refer to [spconv v2.x](https://github.com/traveller59/spconv).
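As a quick sanity check after installation, the snippet below simply reports whether a spconv package is importable; it mirrors the fallback behavior described above and is not an MMDetection3D API.

```python
# Hedged sanity check: does this environment provide spconv?
try:
    import spconv
    version = getattr(spconv, '__version__', 'unknown')
    print(f'spconv {version} found; it will be preferred for sparse convolution')
except ImportError:
    print('spconv not installed; the mmcv sparse convolution ops will be used instead')
```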
We also support Minkowski Engine as a sparse convolution backend. If necessary, please follow the original [installation guide](https://github.com/NVIDIA/MinkowskiEngine#installation) or use `pip`:
```shell
conda install openblas-devel -c anaconda
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps --install-option="--blas_include_dirs=/opt/conda/include" --install-option="--blas=openblas"
```
5. The code cannot currently be built for a CPU-only environment (where CUDA is not available).

## Another option: Docker Image
...@@ -47,7 +47,7 @@ The definition of coordinate systems in this tutorial is actually **more than ju
The illustration of the three coordinate systems is shown below:

![](https://raw.githubusercontent.com/open-mmlab/mmdetection3d/master/resources/coord_sys_all.png)

The three figures above are the 3D coordinate systems while the three figures below are the bird's eye view.

...@@ -132,7 +132,7 @@ __|____|____|____|_________\ x right
### KITTI

The raw annotation of KITTI is under the camera coordinate system, see [get_label_anno](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/data_converter/kitti_data_utils.py). In MMDetection3D, to train LiDAR-based models on KITTI, the data is first converted from the camera coordinate system to the LiDAR coordinate system, see [get_ann_info](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/kitti_dataset.py). For training vision-based models, the data is kept in the camera coordinate system.

In SECOND, the LiDAR coordinate system for a box is defined as follows (a bird's eye view):

...@@ -153,7 +153,7 @@ We use the KITTI-format data of Waymo dataset. Therefore, KITTI and Waymo also s
### NuScenes

NuScenes provides a toolkit for evaluation, in which each box is wrapped into a `Box` instance. The coordinate system of `Box` is different from our LiDAR coordinate system in that the first two elements of the box dimension correspond to ``$$`(dy, dx)`$$``, or ``$$`(w, l)`$$``, respectively, instead of the reverse. For more details, please refer to the NuScenes [tutorial](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/nuscenes_det.md#notes).

Readers may refer to the [NuScenes development kit](https://github.com/nutonomy/nuscenes-devkit/tree/master/python-sdk/nuscenes/eval/detection) for the definition of a [NuScenes box](https://github.com/nutonomy/nuscenes-devkit/blob/2c6a752319f23910d5f55cc995abc547a9e54142/python-sdk/nuscenes/utils/data_classes.py#L457) and implementation of [NuScenes evaluation](https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/eval/detection/evaluate.py).
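To make the dimension-order difference concrete, here is a hedged one-liner: our LiDAR boxes store dimensions as ``(dx, dy, dz) = (l, w, h)``, while the NuScenes `Box` stores ``(w, l, h)``, so converting is just an index swap (the values below are illustrative).

```python
import numpy as np

our_dims = np.array([4.2, 1.8, 1.6])  # (l, w, h) in our LiDAR convention (illustrative values)
nuscenes_wlh = our_dims[[1, 0, 2]]    # (w, l, h) order used by the NuScenes devkit Box
```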
...@@ -171,7 +171,7 @@ The raw data of ScanNet is not point cloud but mesh. The sampled point cloud dat
The raw data of SUN RGB-D is not point cloud but RGB-D images. By back projection, we obtain the corresponding point cloud for each image, which is under our Depth coordinate system. However, the annotation is not under our system and thus needs conversion.

For the conversion from raw annotation to annotation under our Depth coordinate system, please refer to [sunrgbd_data_utils.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/data_converter/sunrgbd_data_utils.py).

### S3DIS
...@@ -199,25 +199,25 @@ Finally, the yaw angle should also be converted:
- ``$$`r_{LiDAR}=-\frac{\pi}{2}-r_{camera}`$$``

See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/box_3d_mode.py) for more details.
### Bird's Eye View

The BEV of a camera coordinate system box is ``$$`(x, z, dx, dz, -r)`$$`` if the 3D box is ``$$`(x, y, z, dx, dy, dz, r)`$$``. The inversion of the sign of the yaw angle is because the positive direction of the gravity axis of the Camera coordinate system points to the ground.

See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.
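A minimal sketch of the projection just described, assuming a box given as a plain tuple rather than MMDetection3D's box structures:

```python
import numpy as np

def cam_box_to_bev(box):
    """Project a camera-coordinate box (x, y, z, dx, dy, dz, r) to its BEV form (x, z, dx, dz, -r)."""
    x, y, z, dx, dy, dz, r = box
    return np.array([x, z, dx, dz, -r])

print(cam_box_to_bev((1.0, 1.5, 10.0, 4.0, 1.6, 1.8, 0.3)))  # -> [ 1.  10.   4.   1.8 -0.3]
```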
### Rotation of boxes

We set the rotation of all kinds of boxes to be counter-clockwise about the gravity axis. Therefore, to rotate a 3D box, we first calculate the new box center and then add the rotation angle to the yaw angle.

See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.
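The following sketch illustrates this two-step rule for a LiDAR box rotated counter-clockwise about the gravity (z) axis through the origin; it is a plain-NumPy illustration, not the library implementation.

```python
import numpy as np

def rotate_box(box, angle):
    """Rotate an (x, y, z, dx, dy, dz, yaw) box counter-clockwise by `angle` about the z axis."""
    x, y, z, dx, dy, dz, yaw = box
    c, s = np.cos(angle), np.sin(angle)
    # step 1: calculate the new box center
    x_new, y_new = c * x - s * y, s * x + c * y
    # step 2: add the rotation angle to the yaw angle
    return np.array([x_new, y_new, z, dx, dy, dz, yaw + angle])
```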
## Common FAQ

#### Q1: Are the box related ops universal to all coordinate system types?

No. For example, the [RoI-Aware Pooling ops](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/roiaware_pool3d.py) are applicable to boxes under the Depth or LiDAR coordinate system only. The evaluation functions for the KITTI dataset [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/kitti_utils) are only applicable to boxes under the Camera coordinate system, since the rotation is clockwise if viewed from above.

For each box related op, we have marked the type of boxes to which we can apply the op.
...@@ -8,3 +8,4 @@
customize_runtime.md
coord_sys_tutorial.md
backends_support.md
pure_point_cloud_dataset.md
# Tutorial 8: Use Pure Point Cloud Dataset
## Data Pre-Processing
### Convert point cloud format

Currently, we only support training and inference on point clouds in bin format, so before training on your own dataset you need to convert your point clouds into bin files. Common point cloud data formats include pcd and las; we list some open-source tools for reference, and a conversion sketch follows the list.

1. Convert pcd to bin: https://github.com/leofansq/Tools_RosBag2KITTI
2. Convert las to bin: the common conversion path is las -> pcd -> bin, and the conversion from las to pcd can be achieved with [this tool](https://github.com/Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor).
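Below is a minimal pcd-to-bin sketch assuming Open3D is installed and the pcd file stores only xyz coordinates; if your pcd carries intensity, it has to be extracted separately, since `read_point_cloud` typically exposes only points, colors, and normals. The file names are placeholders.

```python
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud('cloud.pcd')            # assumed input file
points = np.asarray(pcd.points, dtype=np.float32)     # (N, 3) xyz
# pad a zero intensity channel if the training config expects 4-dim points
points = np.hstack([points, np.zeros((points.shape[0], 1), dtype=np.float32)])
points.tofile('cloud.bin')                            # KITTI-style flat float32 binary
```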
### Point cloud annotation
MMDetection3D does not support point cloud annotation. Some open-source annotation tools are listed for reference:

- [SUSTechPOINTS](https://github.com/naurril/SUSTechPOINTS)
- [LATTE](https://github.com/bernwang/latte)

Besides, we improved [LATTE](https://github.com/bernwang/latte) for easier use. More details can be found [here](https://arxiv.org/abs/2011.10174).
## Support new data format
To support a new data format, you can either convert it to an existing format or directly to the middle format. You can also choose to do the conversion offline (with a script before training) or online (by implementing a new dataset class that converts the data during training).
### Reorganize new data formats to existing format
If your dataset only contains point cloud files and 3D bounding box annotations, without calibration files, we recommend converting it into the basic format. Annotation files in the basic format have the following necessary keys:
```python
[
{'sample_idx':
'lidar_points': {'lidar_path': velodyne_path,
....
},
'annos': {'box_type_3d': (str) 'LiDAR/Camera/Depth'
'gt_bboxes_3d': <np.ndarray> (n, 7)
'gt_names': [list]
....
}
'calib': { .....}
'images': { .....}
}
]
```
In MMDetection3D, for data that is inconvenient to read directly online, we recommend converting it into the basic format above and doing the conversion offline; after the conversion you only need to modify the data annotation paths and the classes in the config. A minimal dumping sketch is shown below.
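As an illustration of such an offline conversion, the sketch below dumps a single-frame list in the basic format with `mmcv.dump`; the file names and box values are made up.

```python
import numpy as np
import mmcv  # mmcv.dump writes the pickle; the standard pickle module works as well

infos = [dict(
    sample_idx=0,
    lidar_points=dict(lidar_path='training/000000.bin'),  # hypothetical path
    annos=dict(
        box_type_3d='LiDAR',
        gt_bboxes_3d=np.zeros((1, 7), dtype=np.float32),   # (n, 7) boxes, dummy values
        gt_names=['car']))]
mmcv.dump(infos, 'annotation.pkl')
```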
To use data that shares a similar format with an existing dataset, e.g., Lyft, which has a similar format to the nuScenes dataset, we recommend directly implementing a new data converter and a dataset class to convert and load the data, respectively. In this procedure, the code can inherit from the existing dataset classes to reuse code.
### Reorganize new data format to middle format
There is another way if users do not want to convert the annotation format to an existing one.
Actually, we convert all the supported datasets into pickle files, which summarize useful information for model training and inference.
The annotation of a dataset is a list of dicts, where each dict corresponds to one frame.
A basic example (used in KITTI) is as follows. A frame consists of several keys, like `image`, `point_cloud`, `calib` and `annos`.
As long as we can read data directly according to this information, the organization of raw data can also differ from that of existing datasets.
With this design, we provide an alternative choice for customizing datasets.
```python
[
{'image': {'image_idx': 0, 'image_path': 'training/image_2/000000.png', 'image_shape': array([ 370, 1224], dtype=int32)},
'point_cloud': {'num_features': 4, 'velodyne_path': 'training/velodyne/000000.bin'},
'calib': {'P0': array([[707.0493, 0. , 604.0814, 0. ],
[ 0. , 707.0493, 180.5066, 0. ],
[ 0. , 0. , 1. , 0. ],
[ 0. , 0. , 0. , 1. ]]),
'P1': array([[ 707.0493, 0. , 604.0814, -379.7842],
[ 0. , 707.0493, 180.5066, 0. ],
[ 0. , 0. , 1. , 0. ],
[ 0. , 0. , 0. , 1. ]]),
'P2': array([[ 7.070493e+02, 0.000000e+00, 6.040814e+02, 4.575831e+01],
[ 0.000000e+00, 7.070493e+02, 1.805066e+02, -3.454157e-01],
[ 0.000000e+00, 0.000000e+00, 1.000000e+00, 4.981016e-03],
[ 0.000000e+00, 0.000000e+00, 0.000000e+00, 1.000000e+00]]),
'P3': array([[ 7.070493e+02, 0.000000e+00, 6.040814e+02, -3.341081e+02],
[ 0.000000e+00, 7.070493e+02, 1.805066e+02, 2.330660e+00],
[ 0.000000e+00, 0.000000e+00, 1.000000e+00, 3.201153e-03],
[ 0.000000e+00, 0.000000e+00, 0.000000e+00, 1.000000e+00]]),
'R0_rect': array([[ 0.9999128 , 0.01009263, -0.00851193, 0. ],
[-0.01012729, 0.9999406 , -0.00403767, 0. ],
[ 0.00847068, 0.00412352, 0.9999556 , 0. ],
[ 0. , 0. , 0. , 1. ]]),
'Tr_velo_to_cam': array([[ 0.00692796, -0.9999722 , -0.00275783, -0.02457729],
[-0.00116298, 0.00274984, -0.9999955 , -0.06127237],
[ 0.9999753 , 0.00693114, -0.0011439 , -0.3321029 ],
[ 0. , 0. , 0. , 1. ]]),
'Tr_imu_to_velo': array([[ 9.999976e-01, 7.553071e-04, -2.035826e-03, -8.086759e-01],
[-7.854027e-04, 9.998898e-01, -1.482298e-02, 3.195559e-01],
[ 2.024406e-03, 1.482454e-02, 9.998881e-01, -7.997231e-01],
[ 0.000000e+00, 0.000000e+00, 0.000000e+00, 1.000000e+00]])},
'annos': {'name': array(['Pedestrian'], dtype='<U10'), 'truncated': array([0.]), 'occluded': array([0]), 'alpha': array([-0.2]), 'bbox': array([[712.4 , 143. , 810.73, 307.92]]), 'dimensions': array([[1.2 , 1.89, 0.48]]), 'location': array([[1.84, 1.47, 8.41]]), 'rotation_y': array([0.01]), 'score': array([0.]), 'index': array([0], dtype=int32), 'group_ids': array([0], dtype=int32), 'difficulty': array([0], dtype=int32), 'num_points_in_gt': array([377], dtype=int32)}}
...
]
```
On top of this, you can write a new dataset class inherited from `Custom3DDataset` and override the related methods,
like [KittiDataset](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/kitti_dataset.py) and [ScanNetDataset](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/scannet_dataset.py).
### An example of customized dataset
Here we provide an example of a customized dataset.
Assume the annotation has been reorganized into a list of dicts in pickle files following the basic format.
The bounding box annotations are stored in `annotation.pkl` as follows:
```
{'sample_idx': 120,
'lidar_points': {'lidar_path': 'training/000004.bin'},
 'annos': {'box_type_3d': 'LiDAR',
           'gt_bboxes_3d': array([[ 1.48129511,  3.52074146, 1.85652947, 1.74445975, 0.23195696, 0.57235193, -0.25525],
                                  [ 2.90395617, -3.48033905, 1.52682471, 0.66077662, 0.17072392, 0.67153597,  2.23145]]),
'gt_names': ['car', 'pedestrian']
}
}
```
If the pkl file only contains the necessary keys, you can directly use `Custom3DDataset` for training. In the config, modify the dataset as follows:
```python
dataset_A_train = dict(
type='Custom3DDataset',
ann_file = 'annotation.pkl',
pipeline=train_pipeline
)
```
Otherwise, you need to create a new dataset in `mmdet3d/datasets/my_dataset.py` to load the data and override the `get_ann_info` method.
```python
import numpy as np
from os import path as osp
from mmdet3d.core import show_result
from mmdet3d.core.bbox import DepthInstance3DBoxes
from mmdet.datasets import DATASETS
from .custom_3d import Custom3DDataset
@DATASETS.register_module()
class MyDataset(Custom3DDataset):
CLASSES = ('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window',
'bookshelf', 'picture', 'counter', 'desk', 'curtain',
'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub',
'garbagebin')
def __init__(self,
data_root,
ann_file,
pipeline=None,
classes=None,
modality=None,
box_type_3d='Depth',
filter_empty_gt=True,
test_mode=False):
super().__init__(
data_root=data_root,
ann_file=ann_file,
pipeline=pipeline,
classes=classes,
modality=modality,
box_type_3d=box_type_3d,
filter_empty_gt=filter_empty_gt,
test_mode=test_mode)
def get_ann_info(self, index):
# Use index to get the annos, thus the evalhook could also use this api
info = self.data_infos[index]
if info['annos']['gt_num'] != 0:
gt_bboxes_3d = info['annos']['gt_boxes_upright_depth'].astype(
np.float32) # k, 6
gt_labels_3d = info['annos']['class'].astype(np.int64)
else:
gt_bboxes_3d = np.zeros((0, 6), dtype=np.float32)
gt_labels_3d = np.zeros((0, ), dtype=np.int64)
# to target box structure
gt_bboxes_3d = DepthInstance3DBoxes(
gt_bboxes_3d,
box_dim=gt_bboxes_3d.shape[-1],
with_yaw=False,
origin=(0.5, 0.5, 0.5)).convert_to(self.box_mode_3d)
pts_instance_mask_path = osp.join(self.data_root,
info['pts_instance_mask_path'])
pts_semantic_mask_path = osp.join(self.data_root,
info['pts_semantic_mask_path'])
anns_results = dict(
gt_bboxes_3d=gt_bboxes_3d,
gt_labels_3d=gt_labels_3d,
pts_instance_mask_path=pts_instance_mask_path,
pts_semantic_mask_path=pts_semantic_mask_path)
return anns_results
```
Then in the config, to use `MyDataset`, you can modify the config as follows:
```python
dataset_A_train = dict(
type='MyDataset',
ann_file = 'annotation.pkl',
pipeline=train_pipeline
)
```
## Customize datasets by dataset wrappers
MMDetection3D also supports many dataset wrappers to mix the dataset or modify the dataset distribution for training like MMDetection.
Currently it supports to three dataset wrappers as below:
- `RepeatDataset`: simply repeat the whole dataset.
- `ClassBalancedDataset`: repeat dataset in a class balanced manner.
- `ConcatDataset`: concat datasets.
### Repeat dataset
We use `RepeatDataset` as a wrapper to repeat the dataset. For example, suppose the original dataset is `Dataset_A`; to repeat it, the config looks like the following:
```python
dataset_A_train = dict(
type='RepeatDataset',
times=N,
dataset=dict( # This is the original config of Dataset_A
type='Dataset_A',
...
pipeline=train_pipeline
)
)
```
### Class balanced dataset
We use `ClassBalancedDataset` as a wrapper to repeat the dataset based on category
frequency. The dataset to repeat needs to implement the method `self.get_cat_ids(idx)`
to support `ClassBalancedDataset`, as sketched below.
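A hedged sketch of that hook, assuming your dataset class stores annotations in the basic-format keys shown earlier (`self.data_infos`, `annos`, `gt_names`); adapt the lookup to your own annotation layout.

```python
def get_cat_ids(self, idx):
    """Return the category ids present in the idx-th sample (used by ClassBalancedDataset)."""
    info = self.data_infos[idx]
    return [self.CLASSES.index(name)
            for name in info['annos']['gt_names'] if name in self.CLASSES]
```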
For example, to repeat `Dataset_A` with `oversample_thr=1e-3`, the config looks like the following
```python
dataset_A_train = dict(
type='ClassBalancedDataset',
oversample_thr=1e-3,
dataset=dict( # This is the original config of Dataset_A
type='Dataset_A',
...
pipeline=train_pipeline
)
)
```
You may refer to [source code](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/dataset_wrappers.py) for details.
### Concatenate dataset
There are three ways to concatenate the dataset.
1. If the datasets you want to concatenate are of the same type but with different annotation files, you can concatenate the dataset configs as follows.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
pipeline=train_pipeline
)
```
If the concatenated dataset is used for testing or evaluation, this manner supports evaluating each dataset separately. To test the concatenated dataset as a whole, you can set `separate_eval=False` as below.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
separate_eval=False,
pipeline=train_pipeline
)
```
2. If the datasets you want to concatenate are of different types, you can concatenate the dataset configs as follows.
```python
dataset_A_train = dict()
dataset_B_train = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train = [
dataset_A_train,
dataset_B_train
],
val = dataset_A_val,
test = dataset_A_test
)
```
If the concatenated dataset is used for testing or evaluation, this manner also supports evaluating each dataset separately.

3. We also support defining `ConcatDataset` explicitly, as follows.
```python
dataset_A_val = dict()
dataset_B_val = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dataset_A_train,
val=dict(
type='ConcatDataset',
datasets=[dataset_A_val, dataset_B_val],
separate_eval=False))
```
This manner allows users to evaluate all the datasets as a single one by setting `separate_eval=False`.
**Note:**
1. The option `separate_eval=False` assumes the datasets use `self.data_infos` during evaluation. Therefore, COCO datasets do not support this behavior since COCO datasets do not fully rely on `self.data_infos` for evaluation. Combining different types of datasets and evaluating them as a whole is not tested and thus not suggested.
2. Evaluating `ClassBalancedDataset` and `RepeatDataset` is not supported, so evaluating concatenated datasets of these types is not supported either.
A more complex example that repeats `Dataset_A` and `Dataset_B` N and M times, respectively, and then concatenates the repeated datasets is as follows.
```python
dataset_A_train = dict(
type='RepeatDataset',
times=N,
dataset=dict(
type='Dataset_A',
...
pipeline=train_pipeline
)
)
dataset_A_val = dict(
...
pipeline=test_pipeline
)
dataset_A_test = dict(
...
pipeline=test_pipeline
)
dataset_B_train = dict(
type='RepeatDataset',
times=M,
dataset=dict(
type='Dataset_B',
...
pipeline=train_pipeline
)
)
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train = [
dataset_A_train,
dataset_B_train
],
val = dataset_A_val,
test = dataset_A_test
)
```
## Modify Dataset Classes
With existing dataset types, we can modify their class names to train on a subset of the annotations.
For example, if you want to train only three classes of the current dataset,
you can modify the classes of the dataset.
The dataset will automatically filter out the ground truth boxes of the other classes.
```python
classes = ('person', 'bicycle', 'car')
data = dict(
train=dict(classes=classes),
val=dict(classes=classes),
test=dict(classes=classes))
```
MMDetection V2.0 also supports reading the classes from a file, which is common in real applications.
For example, assume `classes.txt` contains the class names as follows.
```
person
bicycle
car
```
Users can set the classes as a file path; the dataset will load it and convert it to a list automatically.
```python
classes = 'path/to/classes.txt'
data = dict(
train=dict(classes=classes),
val=dict(classes=classes),
test=dict(classes=classes))
```
## Loading Point Clouds Adjustment
Generally speaking, the most basic bin data contains (x, y, z) information, and some data also includes intensity, elongation (point cloud elongation), and timestamp, so the point cloud dimension ranges from 3 to 6. In MMDetection3D, you need to adjust some settings in the config when training on a customized dataset:
```python
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
    # adjust according to the dimension of
    # the point cloud in your own dataset
    load_dim=3,
    # dimensions actually used; you can also specify
    # particular dimensions in list format
    use_dim=3),
```
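For instance, assuming a 6-dim cloud ordered as (x, y, z, intensity, elongation, timestamp), you could load all six dimensions but consume only the coordinates and intensity; the ordering here is an assumption about your own bin files.

```python
dict(
    type='LoadPointsFromFile',
    coord_type='LIDAR',
    load_dim=6,             # the bin file stores 6 values per point (assumed layout)
    use_dim=[0, 1, 2, 3]),  # keep x, y, z and intensity only
```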
## Training Setting Adjustment
To avoid problems during training and to improve the performance of the model on a custom dataset, some training settings need to be adjusted according to the dataset.
### Adjust Point Cloud Range and Annotations in Config
For example, we can adjust `point_cloud_range` in the config file to change the training point cloud range. In the KITTI dataset, `point_cloud_range` is set to `[0, -39.68, -3, 69.12, 39.68, 1]`.
After setting the point cloud range, `PointsRangeFilter` is used to filter the point cloud and its masks (semantic and instance), and `ObjectRangeFilter` is used to filter the 3D bounding boxes.
```python
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
```
### Adjust Voxel Size in Config
Here you can refer to the settings of the existing datasets. Theoretically, `voxel_size` is linked to the setting of `point_cloud_range`: setting a smaller `voxel_size` increases the number of voxels and the corresponding memory consumption. In addition, note the following:

If `point_cloud_range` and `voxel_size` are set to `[0, -40, -3, 70.4, 40, 1]` and `[0.05, 0.05, 0.1]` respectively, then the shape of the intermediate feature map is `[(1-(-3))/0.1+1, (40-(-40))/0.05, (70.4-0)/0.05] = [41, 1600, 1408]`, as checked in the sketch below. For more details, refer to this [issue](https://github.com/open-mmlab/mmdetection3d/issues/382).
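The arithmetic can be sanity-checked with a few lines (illustrative only; the actual grid computation lives in the voxelization code):

```python
point_cloud_range = [0, -40, -3, 70.4, 40, 1]   # (x_min, y_min, z_min, x_max, y_max, z_max)
voxel_size = [0.05, 0.05, 0.1]

grid_x = (point_cloud_range[3] - point_cloud_range[0]) / voxel_size[0]       # 1408
grid_y = (point_cloud_range[4] - point_cloud_range[1]) / voxel_size[1]       # 1600
grid_z = (point_cloud_range[5] - point_cloud_range[2]) / voxel_size[2] + 1   # 41
print([int(grid_z), int(grid_y), int(grid_x)])  # [41, 1600, 1408]
```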
### Adjust Anchor Range and Size in Config
```python
anchor_generator=dict(
type='Anchor3DRangeGenerator',
ranges=[
[0, -40.0, -0.6, 70.4, 40.0, -0.6],
[0, -40.0, -0.6, 70.4, 40.0, -0.6],
[0, -40.0, -1.78, 70.4, 40.0, -1.78],
],
sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
rotations=[0, 1.57],
reshape_out=False),
```
The `anchor_range` is generally adjusted according to the dataset. Note that the `z` value needs to be adjusted according to the position of the point cloud; please refer to this [issue](https://github.com/open-mmlab/mmdetection3d/issues/986).

Regarding `anchor_size`, it is usually best to compute the average length, width, and height of objects over the entire training dataset and use them as the anchor sizes, as sketched below.
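A hedged sketch of that statistic, assuming you have collected per-class arrays of ground-truth boxes in the (x, y, z, dx, dy, dz, yaw) layout used above; the dict name is illustrative.

```python
import numpy as np

def mean_anchor_sizes(gt_boxes_per_class):
    """Map class name -> mean (dx, dy, dz) over all of its ground-truth boxes."""
    return {name: boxes[:, 3:6].mean(axis=0).round(2).tolist()
            for name, boxes in gt_boxes_per_class.items()}

# e.g. {'car': (N, 7) array} -> {'car': [3.9, 1.6, 1.56]}
```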
**Note** (related to MMDetection):
- Before MMDetection v2.5.0, the dataset will filter out the empty GT images automatically if the classes are set and there is no way to disable that through config. This is an undesirable behavior and introduces confusion because if the classes are not set, the dataset only filters the empty GT images when `filter_empty_gt=True` and `test_mode=False`. After MMDetection v2.5.0, we decouple the image filtering process and the classes modification, i.e., the dataset will only filter empty GT images when `filter_empty_gt=True` and `test_mode=False`, no matter whether the classes are set. Thus, setting the classes only influences the annotations of classes used for training and users could decide whether to filter empty GT images by themselves.
- Since the middle format only has box labels and does not contain the class names, when using `CustomDataset`, users cannot filter out images with empty GT through the config and can only do this offline.
- The features for setting dataset classes and dataset filtering will be refactored to be more user-friendly in the future (depends on the progress).
...@@ -105,7 +105,7 @@ python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [-
**Notice**: To generate submission results for the Lyft dataset, `csv_savepath` must be specified in `--eval-options`. After the csv file is generated, you can submit the results with the Kaggle commands given on the [website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/submit).

Note that in the [config of the Lyft dataset](../../configs/_base_/datasets/lyft-3d.py), the value of `ann_file` in `test` is `data_root + 'lyft_infos_test.pkl'`, which is the official Lyft test set without annotations. To test on the validation set, please change it to `data_root + 'lyft_infos_val.pkl'`.

8. Test PointPillars on Waymo with 8 GPUs, and compute the mAP with the Waymo metrics.
...@@ -77,7 +77,7 @@ The official KITTI object detection development [toolkit](https://s3.eu-central-1.amazo
The second step is to prepare config files so that the dataset can be read and used. Besides, hyperparameters usually need to be tuned to obtain decent performance in 3D detection.

Suppose we want to use PointPillars for 3-class 3D object detection (vehicle, cyclist, pedestrian) on the Waymo dataset. Following the KITTI dataset [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/datasets/kitti-3d-3class.py), model [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/models/hv_pointpillars_secfpn_kitti.py) and [overall config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py), we need to prepare a [dataset config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/datasets/waymoD5-3d-3class.py) and a [model config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/models/hv_pointpillars_secfpn_waymo.py), and combine them into an [overall config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-3class.py).

## Train a new model
# Benchmark

Here we benchmark the training and testing speed of models in MMDetection3D and other open-source 3D object detection codebases.

## Settings

* Hardware: 8 NVIDIA Tesla V100 (32G) GPUs, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
* Software: Python 3.7, CUDA 10.1, cuDNN 7.6.5, PyTorch 1.3, numba 0.48.0.
* Models: since different codebases implement different sets of models, we chose SECOND, PointPillars, Part-A2, and VoteNet for the benchmark and compared each with the corresponding implementation in the other codebases.
* Metric: we use the average throughput over the whole training run as the metric, skipping the first 50 iterations of each epoch to eliminate the warm-up effect.

## Main results

For training speed (samples/s), we compare MMDetection3D with the other codebases that implement the same models. The results are shown below; larger numbers indicate faster training. Models not supported by a codebase are marked with `×`.
| Model | MMDetection3D | OpenPCDet | votenet | Det3D |
| :-----------------: | :-----------: | :-------: | :-----: | :---: |
| VoteNet | 358 | × | 77 | × |
| PointPillars-car | 141 | × | × | 140 |
| PointPillars-3class | 107 | 44 | × | × |
| SECOND | 40 | 30 | × | × |
| Part-A2 | 17 | 14 | × | × |
## Testing details

### Modifications for calculating the training speed

* __MMDetection3D__: we try to use configurations as identical as possible to those of the other codebases; see the [benchmark configurations](https://github.com/open-mmlab/MMDetection3D/blob/master/configs/benchmark) for details.
* __Det3D__: for comparison with Det3D, we use the code at commit [519251e](https://github.com/poodarchu/Det3D/tree/519251e72a5c1fdd58972eabeac67808676b9bb7).
* __OpenPCDet__: for comparison with OpenPCDet, we use the code at commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2).

To calculate the training speed, we add code that records the running time to `./tools/train_utils/train_utils.py`. We compute the training speed for each epoch and report the average speed over all epochs.
<details>
<summary>
(The exact modifications made to benchmark with the same method - click to expand)
</summary>
```diff
diff --git a/tools/train_utils/train_utils.py b/tools/train_utils/train_utils.py
index 91f21dd..021359d 100644
--- a/tools/train_utils/train_utils.py
+++ b/tools/train_utils/train_utils.py
@@ -2,6 +2,7 @@ import torch
import os
import glob
import tqdm
+import datetime
from torch.nn.utils import clip_grad_norm_
@@ -13,7 +14,10 @@ def train_one_epoch(model, optimizer, train_loader, model_func, lr_scheduler, ac
if rank == 0:
pbar = tqdm.tqdm(total=total_it_each_epoch, leave=leave_pbar, desc='train', dynamic_ncols=True)
+ start_time = None
for cur_it in range(total_it_each_epoch):
+ if cur_it > 49 and start_time is None:
+ start_time = datetime.datetime.now()
try:
batch = next(dataloader_iter)
except StopIteration:
@@ -55,9 +59,11 @@ def train_one_epoch(model, optimizer, train_loader, model_func, lr_scheduler, ac
tb_log.add_scalar('learning_rate', cur_lr, accumulated_iter)
for key, val in tb_dict.items():
tb_log.add_scalar('train_' + key, val, accumulated_iter)
+ endtime = datetime.datetime.now()
+ speed = (endtime - start_time).seconds / (total_it_each_epoch - 50)
if rank == 0:
pbar.close()
- return accumulated_iter
+ return accumulated_iter, speed
def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_cfg,
@@ -65,6 +71,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
lr_warmup_scheduler=None, ckpt_save_interval=1, max_ckpt_save_num=50,
merge_all_iters_to_one_epoch=False):
accumulated_iter = start_iter
+ speeds = []
with tqdm.trange(start_epoch, total_epochs, desc='epochs', dynamic_ncols=True, leave=(rank == 0)) as tbar:
total_it_each_epoch = len(train_loader)
if merge_all_iters_to_one_epoch:
@@ -82,7 +89,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
cur_scheduler = lr_warmup_scheduler
else:
cur_scheduler = lr_scheduler
- accumulated_iter = train_one_epoch(
+ accumulated_iter, speed = train_one_epoch(
model, optimizer, train_loader, model_func,
lr_scheduler=cur_scheduler,
accumulated_iter=accumulated_iter, optim_cfg=optim_cfg,
@@ -91,7 +98,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
total_it_each_epoch=total_it_each_epoch,
dataloader_iter=dataloader_iter
)
-
+ speeds.append(speed)
# save trained model
trained_epoch = cur_epoch + 1
if trained_epoch % ckpt_save_interval == 0 and rank == 0:
@@ -107,6 +114,8 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
save_checkpoint(
checkpoint_state(model, optimizer, trained_epoch, accumulated_iter), filename=ckpt_name,
)
+ print(speed)
+ print(f'*******{sum(speeds) / len(speeds)}******')
def model_state_to_cpu(model_state):
```
</details>
### VoteNet
* __MMDetection3D__: with release v0.1.0, run the following command:
```bash
./tools/dist_train.sh configs/votenet/votenet_16x8_sunrgbd-3d-10class.py 8 --no-validate
```
* __votenet__: at commit [2f6d6d3](https://github.com/facebookresearch/votenet/tree/2f6d6d36ff98d96901182e935afe48ccee82d566), run the following command:
```bash
python train.py --dataset sunrgbd --batch_size 16
```
Then evaluate the test speed by running the following command:
```bash
python eval.py --dataset sunrgbd --checkpoint_path log_sunrgbd/checkpoint.tar --batch_size 1 --dump_dir eval_sunrgbd --cluster_sampling seed_fps --use_3d_nms --use_cls_nms --per_class_proposal
```
Note that we modified `eval.py` to calculate the inference speed.
<details>
<summary>
(The exact modifications made to benchmark the same models - click to expand)
</summary>
```diff
diff --git a/eval.py b/eval.py
index c0b2886..04921e9 100644
--- a/eval.py
+++ b/eval.py
@@ -10,6 +10,7 @@ import os
import sys
import numpy as np
from datetime import datetime
+import time
import argparse
import importlib
import torch
@@ -28,7 +29,7 @@ parser.add_argument('--checkpoint_path', default=None, help='Model checkpoint pa
parser.add_argument('--dump_dir', default=None, help='Dump dir to save sample outputs [default: None]')
parser.add_argument('--num_point', type=int, default=20000, help='Point Number [default: 20000]')
parser.add_argument('--num_target', type=int, default=256, help='Point Number [default: 256]')
-parser.add_argument('--batch_size', type=int, default=8, help='Batch Size during training [default: 8]')
+parser.add_argument('--batch_size', type=int, default=1, help='Batch Size during training [default: 8]')
parser.add_argument('--vote_factor', type=int, default=1, help='Number of votes generated from each seed [default: 1]')
parser.add_argument('--cluster_sampling', default='vote_fps', help='Sampling strategy for vote clusters: vote_fps, seed_fps, random [default: vote_fps]')
parser.add_argument('--ap_iou_thresholds', default='0.25,0.5', help='A list of AP IoU thresholds [default: 0.25,0.5]')
@@ -132,6 +133,7 @@ CONFIG_DICT = {'remove_empty_box': (not FLAGS.faster_eval), 'use_3d_nms': FLAGS.
# ------------------------------------------------------------------------- GLOBAL CONFIG END
def evaluate_one_epoch():
+ time_list = list()
stat_dict = {}
ap_calculator_list = [APCalculator(iou_thresh, DATASET_CONFIG.class2type) \
for iou_thresh in AP_IOU_THRESHOLDS]
@@ -144,6 +146,8 @@ def evaluate_one_epoch():
# Forward pass
inputs = {'point_clouds': batch_data_label['point_clouds']}
+ torch.cuda.synchronize()
+ start_time = time.perf_counter()
with torch.no_grad():
end_points = net(inputs)
@@ -161,6 +165,12 @@ def evaluate_one_epoch():
batch_pred_map_cls = parse_predictions(end_points, CONFIG_DICT)
batch_gt_map_cls = parse_groundtruths(end_points, CONFIG_DICT)
+ torch.cuda.synchronize()
+ elapsed = time.perf_counter() - start_time
+ time_list.append(elapsed)
+
+ if len(time_list) == 200:
+ print("average inference time: %4f"%(sum(time_list[5:])/len(time_list[5:])))
for ap_calculator in ap_calculator_list:
ap_calculator.step(batch_pred_map_cls, batch_gt_map_cls)
```
</details>

### PointPillars-car
* __MMDetection3D__: with release v0.1.0, run the following command:
```bash
./tools/dist_train.sh configs/benchmark/hv_pointpillars_secfpn_3x8_100e_det3d_kitti-3d-car.py 8 --no-validate
```
* __Det3D__: at commit [519251e](https://github.com/poodarchu/Det3D/tree/519251e72a5c1fdd58972eabeac67808676b9bb7), use `kitti_point_pillars_mghead_syncbn.py` and run the following command:
```bash
./tools/scripts/train.sh --launcher=slurm --gpus=8
```
Note that we modified `train.sh` to train PointPillars.
<details>
<summary>
(The exact modifications made to benchmark the same models - click to expand)
</summary>
```diff
diff --git a/tools/scripts/train.sh b/tools/scripts/train.sh
index 3a93f95..461e0ea 100755
--- a/tools/scripts/train.sh
+++ b/tools/scripts/train.sh
@@ -16,9 +16,9 @@ then
fi
 # Voxelnet
-python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/kitti_car_vfev3_spmiddlefhd_rpn1_mghead_syncbn.py --work_dir=$SECOND_WORK_DIR
+# python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/kitti_car_vfev3_spmiddlefhd_rpn1_mghead_syncbn.py --work_dir=$SECOND_WORK_DIR
 # python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/cbgs/configs/nusc_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py --work_dir=$NUSC_CBGS_WORK_DIR
 # python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/lyft_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py --work_dir=$LYFT_CBGS_WORK_DIR

 # PointPillars
-# python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py ./examples/point_pillars/configs/original_pp_mghead_syncbn_kitti.py --work_dir=$PP_WORK_DIR
+python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py ./examples/point_pillars/configs/kitti_point_pillars_mghead_syncbn.py
```
</details>
### PointPillars-3class
* __MMDetection3D__: With release v0.1.0, run the following command:
```bash
./tools/dist_train.sh configs/benchmark/hv_pointpillars_secfpn_4x8_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
* __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), run the following command:
```bash
cd tools
sh scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/pointpillar.yaml --batch_size 32 --workers 32 --epochs 80
```
### SECOND
The SECOND in this benchmark refers to [SECONDv1.5](https://github.com/traveller59/second.pytorch/blob/master/second/configs/all.fhd.config), which was first implemented in [second.Pytorch](https://github.com/traveller59/second.pytorch). Since the SECOND implementation in Det3D uses its self-implemented Multi-Group Head, its speed cannot be compared with the other codebases.
* __MMDetection3D__: With release v0.1.0, run the following command:
```bash
./tools/dist_train.sh configs/benchmark/hv_second_secfpn_4x8_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
* __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), run the following command:
```bash
cd tools
sh ./scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/second.yaml --batch_size 32 --workers 32 --epochs 80
```
### Part-A2
* __MMDetection3D__: With release v0.1.0, run the following command:
```bash
./tools/dist_train.sh configs/benchmark/hv_PartA2_secfpn_4x8_cyclic_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
* __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), run the following command to train the model:
```bash
cd tools
sh ./scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/PartA2.yaml --batch_size 32 --workers 32 --epochs 80
```
@@ -48,7 +48,7 @@ extensions = [
     'sphinx.ext.autodoc',
     'sphinx.ext.napoleon',
     'sphinx.ext.viewcode',
-    'recommonmark',
+    'myst_parser',
     'sphinx_markdown_tables',
     'sphinx.ext.autosectionlabel',
     'sphinx_copybutton',
@@ -9,7 +9,8 @@
 | MMDetection3D version | MMDetection version    | MMSegmentation version | MMCV version               |
 |:---------------------:|:----------------------:|:----------------------:|:--------------------------:|
-| master                | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.5.0  |
+| master                | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.7.0  |
+| v1.0.0rc2             | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.7.0  |
 | v1.0.0rc1             | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.5.0  |
 | v1.0.0rc0             | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
 | 0.18.1                | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
# LiDAR-Based 3D Detection

LiDAR-based 3D detection is one of the most basic tasks supported in MMDetection3D. For a given model, the input is any number of points with features collected by LiDAR, and the output is the 3D bounding box and category label of each object of interest. Next, taking PointPillars on the KITTI dataset as an example, we show how to prepare the data, how to train and test a model on a standard 3D detection benchmark, and how to visualize and validate the results.

## Data Preparation

To begin with, we need to download the raw data and reorganize it into the standard format as introduced in the [documentation](https://mmdetection3d.readthedocs.io/zh_CN/latest/data_preparation.html). Note that for the KITTI dataset, extra txt files are needed for the data splits.

Since the raw data of different datasets is organized differently, we typically need to collect the useful information into .pkl or .json files. After the raw data is prepared, we run the script `create_data.py` to generate the infos for each dataset. For example, for KITTI we need to run:
```
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```
Afterwards, the related folder structure should be as follows:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── kitti
│ │ ├── ImageSets
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ │ ├── kitti_gt_database
│ │ ├── kitti_infos_train.pkl
│ │ ├── kitti_infos_trainval.pkl
│ │ ├── kitti_infos_val.pkl
│ │ ├── kitti_infos_test.pkl
│ │ ├── kitti_dbinfos_train.pkl
```
## Training

Next, we train PointPillars with the provided config file. When training with different GPU setups, you can basically follow the example scripts in this [tutorial](https://mmdetection3d.readthedocs.io/zh_CN/latest/1_exist_data_model.html). Suppose we run distributed training on a machine with 8 GPUs:
```
./tools/dist_train.sh configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py 8
```
Note that `6x8` in the config name means the model is trained with 8 GPUs and 6 samples on each GPU. If your customized setup differs, you may need to adjust the learning rate accordingly; the basic rule follows this [paper](https://arxiv.org/abs/1706.02677), as sketched below.
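For instance, a rough sketch of such an adjustment (the 4-GPU setup and the learning-rate value are illustrative and assume the base lr of 0.0018 from the default cyclic schedule; `--cfg-options` is the standard override mechanism of the training scripts):

```
# Illustrative: train on 4 GPUs instead of 8 and halve the learning rate
# following the linear scaling rule (48 -> 24 total samples per iteration).
./tools/dist_train.sh configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py 4 \
    --cfg-options optimizer.lr=0.0009
```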
## Quantitative Evaluation

During training, the model is evaluated periodically according to the `evaluation = dict(interval=xxx)` setting in the config. We support the official evaluation protocols of different datasets. For KITTI, the metric is mean average precision (mAP) over the 3 categories, with Intersection over Union (IoU) thresholds of 0.5/0.7. The evaluation results are printed to the terminal as follows:
```
Car AP@0.70, 0.70, 0.70:
bbox AP:98.1839, 89.7606, 88.7837
bev AP:89.6905, 87.4570, 85.4865
3d AP:87.4561, 76.7569, 74.1302
aos AP:97.70, 88.73, 87.34
Car AP@0.70, 0.50, 0.50:
bbox AP:98.1839, 89.7606, 88.7837
bev AP:98.4400, 90.1218, 89.6270
3d AP:98.3329, 90.0209, 89.4035
aos AP:97.70, 88.73, 87.34
```
To evaluate a specific model checkpoint, you can simply run:
```
./tools/dist_test.sh configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py \
work_dirs/pointpillars/latest.pth --eval mAP
```
## Testing and Submission

If you only want to run inference on the online benchmark or test the model performance, simply replace `--eval mAP` in the evaluation script above with `--format-only`, and, if needed, specify `pklfile_prefix` and `submission_prefix`, e.g. by adding the option `--eval-options submission_prefix=work_dirs/pointpillars/test_submission`. Make sure the [test info](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/datasets/kitti-3d-3class.py#L131) in the config corresponds to the test set instead of the validation set. After the results are generated, you can compress the folder and upload it to the KITTI evaluation server.
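Putting this together, a complete submission command looks roughly as follows (the checkpoint path and the prefix are placeholders for your own work_dir):

```
# Generate KITTI test-set submission files instead of computing mAP.
./tools/dist_test.sh configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py \
    work_dirs/pointpillars/latest.pth --format-only \
    --eval-options submission_prefix=work_dirs/pointpillars/test_submission
```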
## Qualitative Validation

MMDetection3D also provides versatile visualization tools, so that we can get an intuitive feeling of the predictions of a trained model. You can add the `--eval-options 'show=True' 'out_dir=${SHOW_DIR}'` option to visualize the detection results online during evaluation, or use `tools/misc/visualize_results.py` for offline visualization. In addition, we provide the script `tools/misc/browse_dataset.py` to visualize the dataset without inference. See the [visualization documentation](https://mmdetection3d.readthedocs.io/zh_CN/latest/useful_tools.html#id2) for more details.
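As a sketch, online visualization during evaluation can be invoked like this (the output directory is a placeholder, and `show=True` requires a GUI-capable environment):

```
# Save visualized detection results to out_dir while evaluating.
./tools/dist_test.sh configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py \
    work_dirs/pointpillars/latest.pth --eval mAP \
    --eval-options 'show=True' 'out_dir=work_dirs/pointpillars/show_results'
```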
# LiDAR-Based 3D Semantic Segmentation

LiDAR-based 3D semantic segmentation is one of the most basic tasks supported in MMDetection3D. It expects the given model to take any number of feature points collected by LiDAR as input, and to predict the semantic label of every input point. Next, taking PointNet++ (SSG) on the ScanNet dataset as an example, we show how to prepare the data, train and test a model on a standard 3D semantic segmentation benchmark, and visualize and validate the results.

## Data Preparation

First, we need to download the raw data from the ScanNet [official website](http://kaldir.vc.in.tum.de/scannet_benchmark/documentation).

Since the raw data of different datasets is organized differently, we typically need to collect the useful information into pkl or json files.

So after all the raw data is prepared, we can follow the instructions in the [ScanNet README](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md/) to generate the data infos, as sketched below.
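The generation typically consists of extracting the per-scan data and then creating the info files; the authoritative steps live in the README, so treat the following as an illustrative sketch:

```
# Extract point clouds, instance/semantic masks and boxes for each scan
# (see data/scannet/README.md for the authoritative steps).
cd data/scannet
python batch_load_scannet_data.py
cd ../..
# Generate the .pkl info files, following the common create_data.py pattern.
python tools/create_data.py scannet --root-path ./data/scannet \
    --out-dir ./data/scannet --extra-tag scannet
```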
Afterwards, the related folder structure should be as follows:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── scannet
│ │ ├── batch_load_scannet_data.py
│ │ ├── load_scannet_data.py
│ │ ├── scannet_utils.py
│ │ ├── README.md
│ │ ├── scans
│ │ ├── scans_test
│ │ ├── scannet_instance_data
│ │ ├── points
│ │ ├── instance_mask
│ │ ├── semantic_mask
│ │ ├── seg_info
│ │ │ ├── train_label_weight.npy
│ │ │ ├── train_resampled_scene_idxs.npy
│ │ │ ├── val_label_weight.npy
│ │ │ ├── val_resampled_scene_idxs.npy
│ │ ├── scannet_infos_train.pkl
│ │ ├── scannet_infos_val.pkl
│ │ ├── scannet_infos_test.pkl
```
## Training

Next, we train the PointNet++ (SSG) model with the provided config file. When training with different GPU setups, you can basically follow the example scripts in this [tutorial](https://mmdetection3d.readthedocs.io/zh_CN/latest/1_exist_data_model.html#inference-with-existing-models). Suppose we run distributed training on a machine with 2 GPUs:
```
./tools/dist_train.sh configs/pointnet2/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class.py 2
```
Note that `16x2` in the config name means the model is trained with 2 GPUs and 16 samples on each GPU. If your customized setup differs, you may need to adjust the learning rate accordingly; the basic rule can be found [here](https://arxiv.org/abs/1706.02677).

## Quantitative Evaluation

During training, the model checkpoints are evaluated periodically according to the `evaluation = dict(interval=xxx)` setting in the config. We support the official evaluation protocols of different datasets. For ScanNet, the model is evaluated by mean Intersection over Union (mIoU) over the 20 categories. The evaluation results are printed to the terminal as follows:
```
+---------+--------+--------+---------+--------+--------+--------+--------+--------+--------+-----------+---------+---------+--------+---------+--------------+----------------+--------+--------+---------+----------------+--------+--------+---------+
| classes | wall | floor | cabinet | bed | chair | sofa | table | door | window | bookshelf | picture | counter | desk | curtain | refrigerator | showercurtrain | toilet | sink | bathtub | otherfurniture | miou | acc | acc_cls |
+---------+--------+--------+---------+--------+--------+--------+--------+--------+--------+-----------+---------+---------+--------+---------+--------------+----------------+--------+--------+---------+----------------+--------+--------+---------+
| results | 0.7257 | 0.9373 | 0.4625 | 0.6613 | 0.7707 | 0.5562 | 0.5864 | 0.4010 | 0.4558 | 0.7011 | 0.2500 | 0.4645 | 0.4540 | 0.5399 | 0.2802 | 0.3488 | 0.7359 | 0.4971 | 0.6922 | 0.3681 | 0.5444 | 0.8118 | 0.6695 |
+---------+--------+--------+---------+--------+--------+--------+--------+--------+--------+-----------+---------+---------+--------+---------+--------------+----------------+--------+--------+---------+----------------+--------+--------+---------+
```
Besides, after training is finished, you can also evaluate a specific model checkpoint by simply running:
```
./tools/dist_test.sh configs/pointnet2/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class.py \
work_dirs/pointnet2_ssg/latest.pth --eval mIoU
```
## Testing and Submission

If you only want to run inference on the online benchmark or test the model performance, replace `--eval mIoU` in the evaluation script above with `--format-only`, and change `ann_file=data_root + 'scannet_infos_val.pkl'` to `ann_file=data_root + 'scannet_infos_test.pkl'` in the ScanNet dataset [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/datasets/scannet_seg-3d-20class.py#L126). Remember to specify `txt_prefix` as the directory for saving the test results, e.g. by adding the option `--eval-options txt_prefix=work_dirs/pointnet2_ssg/test_submission`. After the results are generated, you can compress the folder and upload it to the [ScanNet evaluation server](http://kaldir.vc.in.tum.de/scannet_benchmark/semantic_label_3d).
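A complete submission command then looks roughly as follows (the checkpoint path and `txt_prefix` are placeholders):

```
# Generate ScanNet test-set submission files instead of computing mIoU.
./tools/dist_test.sh configs/pointnet2/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class.py \
    work_dirs/pointnet2_ssg/latest.pth --format-only \
    --eval-options txt_prefix=work_dirs/pointnet2_ssg/test_submission
```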
## Qualitative Validation

MMDetection3D also provides versatile visualization tools, so that we can get an intuitive feeling of the segmentation results predicted by a trained model. You can set `--eval-options 'show=True' 'out_dir=${SHOW_DIR}'` to visualize the segmentation results online during evaluation, or use `tools/misc/visualize_results.py` for offline visualization. Besides, we provide the script `tools/misc/browse_dataset.py` to visualize the dataset without inference. See the [visualization documentation](https://mmdetection3d.readthedocs.io/zh_CN/latest/useful_tools.html#visualization) for more details.
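For example, browsing the ground truth of the dataset without inference can be sketched as follows (the `--task` and `--output-dir` flags follow the v1.0-era tool and the output path is a placeholder):

```
# Visualize ground-truth segmentation of ScanNet scenes offline.
python tools/misc/browse_dataset.py configs/_base_/datasets/scannet_seg-3d-20class.py \
    --task seg --output-dir work_dirs/browse_scannet
```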
# Tutorial 7: Backend Support

We support different file client backends: disk, Ceph, LMDB, etc. Below is an example of how to modify a config so that data is loaded from and saved to Ceph.

## Load data and annotation files from Ceph
# Tutorial 6: Coordinate Systems

## Overview

MMDetection3D uses three different coordinate systems. The existence of different coordinate systems in 3D object detection is necessary, because the various 3D data collection devices, such as LiDARs and depth cameras, use inconsistent coordinate systems, and different 3D datasets also follow different data formats. Early works, such as SECOND and VoteNet, convert the raw data into another format, forming conventions that some later works also follow, which makes the conversion between coordinate systems even more complicated.

Despite the variety of datasets and devices, by summarizing the line of work on 3D object detection, we can roughly categorize the coordinate systems into three:

- Camera coordinate system -- the coordinate system of most cameras, in which the positive direction of the y axis points to the ground, the positive direction of the x axis points to the right, and the positive direction of the z axis points to the front.
```
           up  z front
            |    ^
            |   /
            |  /
            | /
            |/
left ------ 0 ------> x right
            |
            |
            |
            |
            v
         y down
```
- LiDAR coordinate system -- the coordinate system of many LiDARs, in which the negative direction of the z axis points to the ground, the positive direction of the x axis points to the front, and the positive direction of the y axis points to the left.
```
  z up  x front
   ^     ^
   |    /
   |   /
   |  /
   | /
   |/
y left <------ 0 ------ right
```
- Depth coordinate system -- the coordinate system used by VoteNet, H3DNet, etc., in which the negative direction of the z axis points to the ground, the positive direction of the x axis points to the right, and the positive direction of the y axis points to the front.
```
  z up  y front
   ^     ^
   |    /
   |   /
   |  /
   | /
   |/
left ------ 0 ------> x right
```
The definition of coordinate systems in this tutorial is actually **more than just defining the three axes**. For a box in the form of ``$$`(x, y, z, dx, dy, dz, r)`$$``, our coordinate systems also define how to interpret the box dimensions ``$$`(dx, dy, dz)`$$`` and the yaw angle ``$$`r`$$``.

An illustration of the three coordinate systems is shown below:
![](https://raw.githubusercontent.com/open-mmlab/mmdetection3d/master/resources/coord_sys_all.png)
The upper three figures are the 3D coordinate systems, and the lower three are the bird's eye views.

We will stick to the three coordinate systems defined in this tutorial from now on.

## Definition of the yaw angle

Please refer to [Wikipedia](https://en.wikipedia.org/wiki/Euler_angles#Tait%E2%80%93Bryan_angles) for the standard definition of the yaw angle. In object detection, we choose an axis as the gravity axis and pick a reference direction in the plane ``$$`\Pi`$$`` perpendicular to the gravity axis; then the reference direction has a yaw of 0, and other directions in ``$$`\Pi`$$`` have non-zero yaws depending on their angle with the reference direction.

Currently, for all supported datasets, the annotations do not include pitch and roll, which means we only need to consider the yaw when predicting boxes and computing the overlap between boxes.

In MMDetection3D, all coordinate systems are right-handed, which means the yaw increases counter-clockwise when viewed from the negative direction of the gravity axis (i.e. with the positive direction of the axis pointing at one's eyes).

The figure below shows that, in such a right-handed coordinate system, if we take the positive direction of the x axis as the reference direction, the yaw of the positive direction of the y axis is ``$$`\frac{\pi}{2}`$$``.
```
          z up  y front (yaw=0.5*pi)
           ^     ^
           |    /
           |   /
           |  /
           | /
           |/
left (yaw=pi) ------ 0 ------> x right (yaw=0)
```
For a box, the value of its yaw equals its direction minus the reference direction. In all three coordinate systems in MMDetection3D, the reference direction is always the positive direction of the x axis, so the direction of a box with a yaw of 0 is defined to be parallel to the x axis. The definition of a box's yaw is illustrated in the figure below.
```
  y front
  ^      direction of box (yaw=0.5*pi)
 /|\        ^
  |        /|\
  |     ____|____
  |    |    |    |
  |    |    |    |
__|____|____|____|______\ x right
  |    |    |    |      /
  |    |    |    |
  |    |____|____|
  |
```
## Definition of the box dimensions

The definition of the box dimensions cannot be decoupled from the definition of the yaw. In the previous section, we mentioned that the direction of a box with a yaw of 0 is defined to be parallel to the x axis. Then naturally, the dimension of the box corresponding to the x axis should be ``$$`dx`$$``. However, this is not always the case in some datasets (we will address this issue later).

The following figures show the correspondence between the x axis and ``$$`dx`$$``, and between the y axis and ``$$`dy`$$``.
```
  y front
  ^      direction of box (yaw=0.5*pi)
 /|\        ^
  |        /|\
  |     ____|____
  |    |    |    |
  |    |    |    |   dx
__|____|____|____|______\ x right
  |    |    |    |      /
  |    |    |    |
  |    |____|____|
  |         dy
```
Note that the direction of the box is always parallel to the ``$$`dx`$$`` edge.
```
  y front
  ^          _________
 /|\        |    |    |
  |         |    |    |
  |         |    |    |  dy
  |         |____|____|____\  direction of box (yaw=0)
  |         |    |    |    /
__|_________|____|____|________\ x right
  |         |    |    |        /
  |         |____|____|
  |              dx
  |
```
## Relation to the raw coordinate systems of the supported datasets

### KITTI

The raw annotations of KITTI are under the camera coordinate system; see [get_label_anno](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/data_converter/kitti_data_utils.py). In MMDetection3D, to train LiDAR-based models on KITTI, the data is first converted from the camera coordinate system to the LiDAR coordinate system; see [get_ann_info](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/kitti_dataset.py). For training vision-based models, the data is kept in the camera coordinate system.

In SECOND, the LiDAR coordinate system of a box is defined as follows (bird's eye view):
![](https://raw.githubusercontent.com/traveller59/second.pytorch/master/images/kittibox.png)
For each box, the dimensions are ``$$`(w, l, h)`$$``, and the reference direction of the yaw is the positive direction of the y axis. See the [repo](https://github.com/traveller59/second.pytorch#concepts) for more details.

Our LiDAR coordinate system has two changes:

- The yaw angle is defined to be right-handed instead of left-handed, for consistency;
- The box dimensions are ``$$`(l, w, h)`$$`` instead of ``$$`(w, l, h)`$$``, since ``$$`w`$$`` corresponds to ``$$`dy`$$`` and ``$$`l`$$`` corresponds to ``$$`dx`$$`` in KITTI.
### Waymo

We use the KITTI-format data of the Waymo dataset, so KITTI and Waymo also share the same coordinate system in our implementation.

### NuScenes

NuScenes provides an evaluation toolkit in which each box is wrapped as a `Box` instance. The coordinate system of `Box` is different from our LiDAR coordinate system: in the `Box` coordinate system, the first two elements of the box dimensions correspond to ``$$`(dy, dx)`$$``, or ``$$`(w, l)`$$``, respectively, which is the opposite of our representation. See the NuScenes [tutorial](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/zh_cn/datasets/nuscenes_det.md#notes) for more details.

Readers may refer to the [NuScenes devkit](https://github.com/nutonomy/nuscenes-devkit/tree/master/python-sdk/nuscenes/eval/detection) for the definition of a [NuScenes box](https://github.com/nutonomy/nuscenes-devkit/blob/2c6a752319f23910d5f55cc995abc547a9e54142/python-sdk/nuscenes/utils/data_classes.py#L457) and the [NuScenes evaluation](https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/eval/detection/evaluate.py) process.

### Lyft

Lyft shares the same data format as NuScenes as far as coordinate systems are concerned.

Please refer to the [official website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data) for more information.

### ScanNet

The raw data of ScanNet is not point clouds but meshes, which need to be sampled in our depth coordinate system to obtain point cloud data. For the ScanNet detection task, the box annotations are axis-aligned and the yaw is always zero, so the direction of the yaw in our depth coordinate system makes no difference for ScanNet.

### SUN RGB-D

The raw data of SUN RGB-D is not point clouds but RGB-D images. By back-projection, we obtain the point cloud corresponding to each image, which is in our depth coordinate system. However, the annotations of the dataset are not in our system and need to be converted.

For the conversion from the raw annotations to annotations in our depth coordinate system, see [sunrgbd_data_utils.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/data_converter/sunrgbd_data_utils.py).

### S3DIS

S3DIS shares the same coordinate system as ScanNet in our implementation. However, S3DIS is a segmentation-only dataset, so no annotation is coordinate-system sensitive.

## Examples

### Box conversion (between different coordinate systems)

Take the conversion between the camera coordinate system and the LiDAR coordinate system as an example:

First, for points and box centers, the coordinates before and after the conversion satisfy the following relations:
- ``$$`x_{LiDAR}=z_{camera}`$$``
- ``$$`y_{LiDAR}=-x_{camera}`$$``
- ``$$`z_{LiDAR}=-y_{camera}`$$``
Then, the box dimensions before and after the conversion satisfy:
- ``$$`dx_{LiDAR}=dx_{camera}`$$``
- ``$$`dy_{LiDAR}=dz_{camera}`$$``
- ``$$`dz_{LiDAR}=dy_{camera}`$$``
Finally, the yaw should also be converted:
- ``$$`r_{LiDAR}=-\frac{\pi}{2}-r_{camera}`$$``
See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/box_3d_mode.py) for more details.

### Bird's eye view

For a 3D box ``$$`(x, y, z, dx, dy, dz, r)`$$`` in the camera coordinate system, its bird's eye view is ``$$`(x, z, dx, dz, -r)`$$``. The sign of the yaw is flipped because the positive direction of the gravity axis of the camera coordinate system points to the ground.

See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.
### Rotation of boxes

We set the rotation of all kinds of boxes to be counter-clockwise about the gravity axis. Therefore, to rotate a 3D box, we first compute the new box center and then add the rotation angle to the yaw.

See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.

## FAQ

#### Q1: Are the box-related ops universal to all coordinate system types?

No. For example, [the op for RoI-Aware Pooling](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/roiaware_pool3d.py) only works for boxes in the depth or LiDAR coordinate system, and the evaluation functions for the KITTI dataset [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/kitti_utils) only work for boxes in the camera coordinate system, since there the rotation is clockwise when viewed from above.

For each box-related op, we have annotated the box types to which it applies.

#### Q2: In every coordinate system, do the three axes point exactly to the right, the front, and the ground, respectively?

No. For example, in KITTI we need a calibration matrix when converting from the camera coordinate system to the LiDAR coordinate system.

#### Q3: How does a phase difference of ``$$`2\pi`$$`` in the yaw of a box affect evaluation?

For IoU computation, two boxes whose yaws differ by ``$$`2\pi`$$`` are identical, so the evaluation is unaffected.

For angle prediction evaluation, such as the NDS metric of NuScenes and the AOS metric of KITTI, the angles of the predicted boxes are normalized first, so a phase difference of ``$$`2\pi`$$`` does not change the result.

#### Q4: How does a phase difference of ``$$`\pi`$$`` in the yaw of a box affect evaluation?

For IoU computation, two boxes whose yaws differ by ``$$`\pi`$$`` are identical, so the evaluation is unaffected.

However, for angle prediction evaluation, this means completely opposite directions.

Consider a car: the yaw is the angle between the direction of the car front and the positive direction of the x axis. If we add ``$$`\pi`$$`` to this angle, the car front becomes the car rear.

For some categories, such as barriers, there is no difference between the front and the rear, so a phase difference of ``$$`\pi`$$`` does not affect the angle prediction score.