# Get Started
## Prerequisites
In this section, we demonstrate how to prepare an environment with PyTorch.
MMDetection3D works on Linux, Windows (experimental support) and macOS. It requires Python 3.7+, CUDA 10.0+, and PyTorch 1.8+.
```{note}
If you are experienced with PyTorch and have already installed it, just skip this part and jump to the [next section](#installation). Otherwise, you can follow these steps for the preparation.
```
**Step 0.** Download and install Miniconda from the [official website](https://docs.conda.io/en/latest/miniconda.html).
**Step 1.** Create a conda environment and activate it.
```shell
conda create --name openmmlab python=3.8 -y
conda activate openmmlab
```
**Step 2.** Install PyTorch following [official instructions](https://pytorch.org/get-started/locally/), e.g.
On GPU platforms:
```shell
conda install pytorch torchvision -c pytorch
```
On CPU platforms:
```shell
conda install pytorch torchvision cpuonly -c pytorch
```
## Installation
We recommend that users follow our best practices to install MMDetection3D. However, the whole process is highly customizable. See [Customize Installation](#customize-installation) section for more information.
### Best Practices
**Step 0.** Install [MMEngine](https://github.com/open-mmlab/mmengine), [MMCV](https://github.com/open-mmlab/mmcv) and [MMDetection](https://github.com/open-mmlab/mmdetection) using [MIM](https://github.com/open-mmlab/mim).
```shell
pip install -U openmim
mim install mmengine
mim install 'mmcv>=2.0.0rc4'
mim install 'mmdet>=3.0.0'
```
**Note**: In MMCV v2.x, `mmcv-full` is renamed to `mmcv`. If you want to install `mmcv` without CUDA ops, you can use `mim install "mmcv-lite>=2.0.0rc4"` to install the lite version.
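You can check which MMCV build actually ended up in your environment with a one-liner (this only reads the installed version and assumes nothing beyond `mmcv` being importable):
```shell
python -c "import mmcv; print(mmcv.__version__)"
```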
**Step 1.** Install MMDetection3D.
Case a: If you develop and run mmdet3d directly, install it from source:
```shell
git clone https://github.com/open-mmlab/mmdetection3d.git -b dev-1.x
# "-b dev-1.x" means checkout to the `dev-1.x` branch.
cd mmdetection3d
pip install -v -e .
# "-v" means verbose, or more output
# "-e" means installing a project in edtiable mode,
# thus any local modifications made to the code will take effect without reinstallation.
```
Case b: If you use mmdet3d as a dependency or third-party package, install it with MIM:
```shell
mim install "mmdet3d>=1.1.0"
```
Note:
1. If you would like to use `opencv-python-headless` instead of `opencv-python`,
you can install it before installing MMCV.
2. Some dependencies are optional. Simply running `pip install -v -e .` will only install the minimum runtime requirements. To use optional dependencies like `albumentations` and `imagecorruptions`, either install them manually with `pip install -r requirements/optional.txt` or specify desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`). Valid keys for the extras field are: `all`, `tests`, `build`, and `optional`.
3. We support `spconv 2.0`. If it is installed, the code will use it first, which takes up less GPU memory than the default `mmcv spconv`. Users can use the following commands to install `spconv 2.0`:
```shell
pip install cumm-cuxxx
pip install spconv-cuxxx
```
Here `xxx` is the CUDA version of your environment.
For example, with CUDA 10.2 the commands are `pip install cumm-cu102 && pip install spconv-cu102`.
Supported CUDA versions include 10.2, 11.1, 11.3, and 11.4. Users can also install it by building from source. For more details please refer to [spconv v2.x](https://github.com/traveller59/spconv).
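If you are unsure which CUDA version to pick for the wheel name, you can query the CUDA version your installed PyTorch was built with:
```shell
# prints e.g. 10.2, which maps to the cumm-cu102 / spconv-cu102 wheels
python -c "import torch; print(torch.version.cuda)"
```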
4. We also support `Minkowski Engine` as a sparse convolution backend. If necessary, please follow the original [installation guide](https://github.com/NVIDIA/MinkowskiEngine#installation) or use `pip` to install it:
```shell
conda install openblas-devel -c anaconda
export CPLUS_INCLUDE_PATH=CPLUS_INCLUDE_PATH:${YOUR_CONDA_ENVS_DIR}/include
# replace ${YOUR_CONDA_ENVS_DIR} with your conda environment path, e.g. `/home/username/anaconda3/envs/openmmlab`.
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps --install-option="--blas_include_dirs=/opt/conda/include" --install-option="--blas=openblas"
```
5. We also support `Torchsparse` as a sparse convolution backend. If necessary, please follow the original [installation guide](https://github.com/mit-han-lab/torchsparse#installation) or use `pip` to install it:
```shell
sudo apt-get install libsparsehash-dev
pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0
```
or skip the sudo installation with the following commands:
```shell
conda install -c bioconda sparsehash
export CPLUS_INCLUDE_PATH=CPLUS_INCLUDE_PATH:${YOUR_CONDA_ENVS_DIR}/include
# replace ${YOUR_CONDA_ENVS_DIR} with your conda environment path, e.g. `/home/username/anaconda3/envs/openmmlab`.
pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0
```
6. The code cannot currently be built for a CPU-only environment (where CUDA isn't available).
### Verify the Installation
To verify whether MMDetection3D is installed correctly, we provide some sample codes to run an inference demo.
**Step 1.** We need to download config and checkpoint files.
```shell
mim download mmdet3d --config pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car --dest .
```
The download will take several seconds or more, depending on your network environment. When it is done, you will find two files `pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py` and `hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth` in your current folder.
**Step 2.** Verify the inference demo.
Case a: If you install MMDetection3D from source, just run the following command.
```shell
python demo/pcd_demo.py demo/data/kitti/000008.bin pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth --show
```
You will see a visualizer interface with the point cloud, where bounding boxes are plotted on the cars.
**Note**:
If you install MMDetection3D on a remote server without a display device, you can leave out the `--show` argument. The demo will still save the predictions to the `outputs/pred/000008.json` file.
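If you want to inspect those saved predictions afterwards, a minimal sketch is below (the exact keys in the JSON depend on the demo version, so treat the printed key list as the source of truth):
```python
import json

# path matches the demo output mentioned above
with open('outputs/pred/000008.json') as f:
    pred = json.load(f)
print(list(pred))  # top-level keys, e.g. the boxes, labels and scores saved by the demo
```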
**Note**:
If you want to input a `.ply` file, you can use the following function to convert it to `.bin` format, then run the demo with the converted `.bin` file.
Note that you need to install `pandas` and `plyfile` before using this script. This function can also be used to preprocess `.ply` data for training.
```python
import numpy as np
import pandas as pd
from plyfile import PlyData


def convert_ply(input_path, output_path):
    plydata = PlyData.read(input_path)  # read file
    data = plydata.elements[0].data  # read data
    data_pd = pd.DataFrame(data)  # convert to DataFrame
    # np.float was removed in recent NumPy versions, so use np.float64
    data_np = np.zeros(data_pd.shape, dtype=np.float64)  # initialize array to store data
    property_names = data[0].dtype.names  # read names of properties
    for i, name in enumerate(property_names):  # read data by property
        data_np[:, i] = data_pd[name]
    data_np.astype(np.float32).tofile(output_path)
```
Examples:
```python
convert_ply('./test.ply', './test.bin')
```
If you have point clouds in other formats (`.off`, `.obj`, etc.), you can use `trimesh` to convert them into `.ply`.
```python
import trimesh


def to_ply(input_path, output_path, original_type):
    mesh = trimesh.load(input_path, file_type=original_type)  # read file
    mesh.export(output_path, file_type='ply')  # convert to ply
```
Examples:
```python
to_ply('./test.obj', './test.ply', 'obj')
```
Case b: If you install MMDetection3D with MIM, open your Python interpreter and copy and paste the following code.
```python
from mmdet3d.apis import init_model, inference_detector
config_file = 'pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py'
checkpoint_file = 'hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth'
model = init_model(config_file, checkpoint_file)
inference_detector(model, 'demo/data/kitti/000008.bin')
```
You will see a list of `Det3DDataSample`, with the predictions in `pred_instances_3d`: the detected bounding boxes, labels, and scores.
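As a sketch of reading those predictions (attribute names follow the `Det3DDataSample` convention; the exact return structure of `inference_detector` can vary slightly between versions):
```python
# Continuing from the snippet above.
result = inference_detector(model, 'demo/data/kitti/000008.bin')
# Some versions return (result, data) or a list; normalize to a single sample.
sample = result[0] if isinstance(result, (list, tuple)) else result
instances = sample.pred_instances_3d
print(instances.bboxes_3d)  # detected 3D bounding boxes
print(instances.labels_3d)  # predicted class indices
print(instances.scores_3d)  # confidence scores
```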
### Customize Installation
#### CUDA Versions
When installing PyTorch, you need to specify the version of CUDA. If you are not clear on which to choose, follow our recommendations:
- For Ampere-based NVIDIA GPUs, such as GeForce 30 series and NVIDIA A100, CUDA 11 is a must.
- For older NVIDIA GPUs, CUDA 11 is backward compatible, but CUDA 10.2 offers better compatibility and is more lightweight.
Please make sure the GPU driver satisfies the minimum version requirements. See [this table](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions) for more information.
```{note}
Installing CUDA runtime libraries is enough if you follow our best practices, because no CUDA code will be compiled locally. However, if you hope to compile MMCV from source or develop other CUDA operators, you need to install the complete CUDA toolkit from NVIDIA's [website](https://developer.nvidia.com/cuda-downloads), and its version should match the CUDA version of PyTorch, i.e., the specified version of cudatoolkit in the `conda install` command.
```
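To check the installed driver and the CUDA version your PyTorch build targets, you can run:
```shell
nvidia-smi  # shows the driver version and the highest CUDA version it supports
python -c "import torch; print(torch.version.cuda)"  # CUDA version PyTorch was built with
```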
#### Install MMEngine without MIM
To install MMEngine with pip instead of MIM, please follow [MMEngine installation guides](https://mmengine.readthedocs.io/en/latest/get_started/installation.html).
For example, you can install MMEngine by the following command:
```shell
pip install mmengine
```
#### Install MMCV without MIM
MMCV contains C++ and CUDA extensions, thus depending on PyTorch in a complex way. MIM solves such dependencies automatically and makes the installation easier. However, it is not a must.
To install MMCV with pip instead of MIM, please follow [MMCV installation guides](https://mmcv.readthedocs.io/en/2.x/get_started/installation.html). This requires manually specifying a find-url based on PyTorch version and its CUDA version.
For example, the following command installs MMCV built for PyTorch 1.12.x and CUDA 11.6:
```shell
pip install "mmcv>=2.0.0rc4" -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html
```
#### Install on Google Colab
[Google Colab](https://colab.research.google.com/) usually has PyTorch installed, thus we only need to install MMEngine, MMCV, MMDetection, and MMDetection3D with the following commands.
**Step 1.** Install [MMEngine](https://github.com/open-mmlab/mmengine), [MMCV](https://github.com/open-mmlab/mmcv) and [MMDetection](https://github.com/open-mmlab/mmdetection) using [MIM](https://github.com/open-mmlab/mim).
```shell
!pip3 install openmim
!mim install mmengine
!mim install "mmcv>=2.0.0rc4,<2.1.0"
!mim install "mmdet>=3.0.0,<3.1.0"
```
**Step 2.** Install MMDetection3D from source.
```shell
!git clone https://github.com/open-mmlab/mmdetection3d.git -b dev-1.x
%cd mmdetection3d
!pip install -e .
```
**Step 3.** Verification.
```python
import mmdet3d
print(mmdet3d.__version__)
# Example output: 1.1.0, or another version.
```
```{note}
Within Jupyter, the exclamation mark `!` is used to call external executables and `%cd` is a [magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd) to change the current working directory of Python.
```
#### Using MMDetection3D with Docker
We provide a [Dockerfile](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docker/Dockerfile) to build an image. Ensure that your [Docker version](https://docs.docker.com/engine/install/) is >= 19.03.
```shell
# build an image with PyTorch 1.9, CUDA 11.1
# If you prefer other versions, just modify the Dockerfile
docker build -t mmdetection3d docker/
```
Run it with:
```shell
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection3d/data mmdetection3d
```
### Troubleshooting
If you encounter some issues during installation, please first check the [FAQ](notes/faq.md) page.
You may [open an issue](https://github.com/open-mmlab/mmdetection3d/issues/new/choose) on GitHub if no solution is found.
### Use Multiple Versions of MMDetection3D in Development
Training and testing scripts have already modified the `PYTHONPATH` to ensure that they use their own versions of MMDetection3D.
To use the default version of MMDetection3D installed in your environment instead, you can remove the following line from the related scripts:
```shell
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
```
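To confirm which copy of MMDetection3D a given shell actually resolves to, you can check both the version and the import path:
```shell
python -c "import mmdet3d; print(mmdet3d.__version__, mmdet3d.__file__)"
```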
Welcome to MMDetection3D's documentation!
==========================================
.. toctree::
   :maxdepth: 1
   :caption: Get Started

   get_started.md

.. toctree::
   :maxdepth: 2
   :caption: User Guides

   user_guides/index.rst

.. toctree::
   :maxdepth: 2
   :caption: Advanced Guides

   advanced_guides/index.rst

.. toctree::
   :maxdepth: 1
   :caption: Migrating from MMDetection3D 1.0

   migration.md

.. toctree::
   :maxdepth: 1
   :caption: API Reference

   api.rst

.. toctree::
   :maxdepth: 1
   :caption: Model Zoo

   model_zoo.md

.. toctree::
   :maxdepth: 1
   :caption: Notes

   notes/index.rst

.. toctree::
   :caption: Switch Language

   switch_language.md
Indices and tables
==================
* :ref:`genindex`
* :ref:`search`
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
Along with the release of OpenMMLab 2.0, MMDetection3D (namely MMDet3D) 1.1 made many significant changes, resulting in less redundant, more efficient code and a more consistent overall design. These changes break backward compatibility. Therefore, we prepared this migration guide to make the transition as smooth as possible so that all users can enjoy the productivity benefits of the new MMDet3D and the entire OpenMMLab 2.0 ecosystem.
## Environment
MMDet3D 1.1 depends on the new foundational library [MMEngine](https://github.com/open-mmlab/mmengine) for training deep learning models, and therefore has an entirely different dependency chain compared with MMDet3D 1.0. Even if you already have a well-rounded MMDet3D 1.0 / 0.x environment, you still need to create a new Python environment for MMDet3D 1.1. We provide a detailed [installation guide](./get_started.md) for reference.
The configuration files in our new version have many modifications because of the differences between MMCV 1.x and MMEngine. The guides for migrating from MMCV to MMEngine can be found [here](https://github.com/open-mmlab/mmengine/tree/main/docs/en/migration).
We have renamed the remote branches in MMDet3D 1.1 (`1.1` to `main`, `master` to `1.0`, and `dev` to `dev-1.0`). If your local branches are not aligned with the branches of the remote repo, you can use the following commands to resolve it:
```shell
git fetch origin
git checkout main
git branch main_backup # backup your main branch
git reset --hard origin/main
```
## Dataset
You should update the annotation files generated in the 1.0 version, since some keywords and annotation structures have changed in MMDet3D 1.1. Taking KITTI as an example, the update script is as follows:
```shell
python tools/dataset_converters/update_infos_to_v2.py \
    --dataset kitti \
    --pkl-path ./data/kitti/kitti_infos_train.pkl \
    --out-dir ./kitti_v2/
```
If your annotation files were generated in the 0.x version, you should first update them to the 1.0 version using this [script](../../tools/update_data_coords.py). Alternatively, you can re-generate annotation files from scratch using this [script](../../tools/create_data.py).
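For illustration, regenerating KITTI annotation files from scratch typically looks like the following (the paths and `--extra-tag` value are the usual KITTI defaults and may need adjusting for your layout):
```shell
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```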
## Model
MMDet3D 1.1 supports loading weights trained on the old version (the 1.0 version). For models that are important or frequently used, we have thoroughly verified their precision in the 1.1 version. Especially for models that may suffer from potential performance drops or training bugs in the old version, such as [CenterPoint](https://github.com/open-mmlab/mmdetection3d/issues/2390), we have checked them and ensured correct precision in the new version. If you encounter any problem, please feel free to raise an [issue](https://github.com/open-mmlab/mmdetection3d/issues). Additionally, we have added some of the latest SOTA methods in our [package](../../configs/) and [projects](../../projects/), making MMDet3D 1.1 a highly recommended choice for implementing your project.
# Model Zoo
## Common settings
- We use distributed training.
- For fair comparison with other codebases, we report the GPU memory as the maximum value of `torch.cuda.max_memory_allocated()` for all 8 GPUs (a minimal sketch of this measurement follows after this list). Note that this value is usually less than what `nvidia-smi` shows.
- We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time. Results are obtained with the script [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/analysis_tools/benchmark.py) which computes the average time on 2000 images.
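For reference, a minimal sketch of how such a peak-memory figure can be read out in PyTorch (illustrative only, not the actual benchmarking code):
```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run a forward pass or a training iteration here ...
peak_bytes = torch.cuda.max_memory_allocated()
print(f'{peak_bytes / 1024 ** 2:.1f} MiB')  # peak allocated memory on the current GPU
```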
## Baselines
### SECOND
Please refer to [SECOND](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/second) for details. We provide SECOND baselines on KITTI and Waymo datasets.
### PointPillars
Please refer to [PointPillars](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/pointpillars) for details. We provide PointPillars baselines on the KITTI, nuScenes, Lyft, and Waymo datasets.
### Part-A2
Please refer to [Part-A2](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/parta2) for details.
### VoteNet
Please refer to [VoteNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/votenet) for details. We provide VoteNet baselines on ScanNet and SUNRGBD datasets.
### Dynamic Voxelization
Please refer to [Dynamic Voxelization](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/dynamic_voxelization) for details.
### MVXNet
Please refer to [MVXNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/mvxnet) for details.
### RegNetX
Please refer to [RegNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/regnet) for details. We currently provide PointPillars baselines with RegNetX backbones on the nuScenes and Lyft datasets.
### nuImages
We also support baseline models on [nuImages dataset](https://www.nuscenes.org/nuimages). Please refer to [nuImages](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/nuimages) for details. We report Mask R-CNN, Cascade Mask R-CNN and HTC results currently.
### H3DNet
Please refer to [H3DNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/h3dnet) for details.
### 3DSSD
Please refer to [3DSSD](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/3dssd) for details.
### CenterPoint
Please refer to [CenterPoint](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/centerpoint) for details.
### SSN
Please refer to [SSN](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/ssn) for details. We currently provide PointPillars with the shape-aware grouping heads used in SSN on the nuScenes and Lyft datasets.
### ImVoteNet
Please refer to [ImVoteNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/imvotenet) for details. We provide ImVoteNet baselines on SUNRGBD dataset.
### FCOS3D
Please refer to [FCOS3D](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/fcos3d) for details. We provide FCOS3D baselines on the nuScenes dataset.
### PointNet++
Please refer to [PointNet++](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/pointnet2) for details. We provide PointNet++ baselines on ScanNet and S3DIS datasets.
### Group-Free-3D
Please refer to [Group-Free-3D](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/groupfree3d) for details. We provide Group-Free-3D baselines on ScanNet dataset.
### ImVoxelNet
Please refer to [ImVoxelNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/imvoxelnet) for details. We provide ImVoxelNet baselines on KITTI dataset.
### PAConv
Please refer to [PAConv](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/paconv) for details. We provide PAConv baselines on S3DIS dataset.
### DGCNN
Please refer to [DGCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/dgcnn) for details. We provide DGCNN baselines on S3DIS dataset.
### SMOKE
Please refer to [SMOKE](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/smoke) for details. We provide SMOKE baselines on KITTI dataset.
### PGD
Please refer to [PGD](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pgd) for details. We provide PGD baselines on the KITTI and nuScenes datasets.
### PointRCNN
Please refer to [PointRCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/point_rcnn) for details. We provide PointRCNN baselines on KITTI dataset.
### MonoFlex
Please refer to [MonoFlex](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/monoflex) for details. We provide MonoFlex baselines on KITTI dataset.
### SA-SSD
Please refer to [SA-SSD](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/sassd) for details. We provide SA-SSD baselines on the KITTI dataset.
### FCAF3D
Please refer to [FCAF3D](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/fcaf3d) for details. We provide FCAF3D baselines on the ScanNet, S3DIS, and SUN RGB-D datasets.
### PV-RCNN
Please refer to [PV-RCNN](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/pv_rcnn) for details. We provide PV-RCNN baselines on the KITTI dataset.
### BEVFusion
Please refer to [BEVFusion](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/BEVFusion) for details. We provide BEVFusion baselines on the nuScenes dataset.
### CenterFormer
Please refer to [CenterFormer](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/CenterFormer) for details. We provide CenterFormer baselines on the Waymo dataset.
### TR3D
Please refer to [TR3D](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/TR3D) for details. We provide TR3D baselines on the ScanNet, SUN RGB-D and S3DIS datasets.
### DETR3D
Please refer to [DETR3D](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/DETR3D) for details. We provide DETR3D baselines on the nuScenes dataset.
### PETR
Please refer to [PETR](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/PETR) for details. We provide PETR baselines on the nuScenes dataset.
### TPVFormer
Please refer to [TPVFormer](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/TPVFormer) for details. We provide TPVFormer baselines on the nuScenes dataset.
### Mixed Precision (FP16) Training
Please refer to [Mixed Precision (FP16) Training on PointPillars](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pointpillars/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d.py) for details.
# Benchmarks
Here we benchmark the training and testing speed of models in MMDetection3D,
with some other open source 3D detection codebases.
## Settings
- Hardware: 8 NVIDIA Tesla V100 (32G) GPUs, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
- Software: Python 3.7, CUDA 10.1, cuDNN 7.6.5, PyTorch 1.3, numba 0.48.0.
- Model: Since the other codebases implement different models, we compare the corresponding models, including SECOND, PointPillars, Part-A2, and VoteNet, with them separately.
- Metrics: We use the average throughput in iterations of the entire training run and skip the first 50 iterations of each epoch to exclude GPU warm-up time.
## Main Results
We compare the training speed (samples/s) with other codebases that implement similar models. The results are shown below; the larger the number, the faster the training. Models not supported by other codebases are marked with `×`.
| Methods | MMDetection3D | OpenPCDet | votenet | Det3D |
| :-----------------: | :-----------: | :-------: | :-----: | :---: |
| VoteNet | 358 | × | 77 | × |
| PointPillars-car | 141 | × | × | 140 |
| PointPillars-3class | 107 | 44 | × | × |
| SECOND | 40 | 30 | × | × |
| Part-A2 | 17 | 14 | × | × |
## Details of Comparison
### Modification for Calculating Speed
- __MMDetection3D__: We use settings as similar as possible to those of the other codebases, via our [benchmark configs](https://github.com/open-mmlab/MMDetection3D/blob/main/configs/benchmark).
- __Det3D__: For comparison with Det3D, we use the commit [519251e](https://github.com/poodarchu/Det3D/tree/519251e72a5c1fdd58972eabeac67808676b9bb7).
- __OpenPCDet__: For comparison with OpenPCDet, we use the commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2).
For training speed, we add code to record the running time in the file `./tools/train_utils/train_utils.py`. We calculate the speed of each epoch, and report the average speed of all the epochs.
<details>
<summary>
(diff to make it use the same method for benchmarking speed - click to expand)
</summary>
```diff
diff --git a/tools/train_utils/train_utils.py b/tools/train_utils/train_utils.py
index 91f21dd..021359d 100644
--- a/tools/train_utils/train_utils.py
+++ b/tools/train_utils/train_utils.py
@@ -2,6 +2,7 @@ import torch
 import os
 import glob
 import tqdm
+import datetime
 from torch.nn.utils import clip_grad_norm_


@@ -13,7 +14,10 @@ def train_one_epoch(model, optimizer, train_loader, model_func, lr_scheduler, ac
     if rank == 0:
         pbar = tqdm.tqdm(total=total_it_each_epoch, leave=leave_pbar, desc='train', dynamic_ncols=True)

+    start_time = None
     for cur_it in range(total_it_each_epoch):
+        if cur_it > 49 and start_time is None:
+            start_time = datetime.datetime.now()
         try:
             batch = next(dataloader_iter)
         except StopIteration:
@@ -55,9 +59,11 @@ def train_one_epoch(model, optimizer, train_loader, model_func, lr_scheduler, ac
                 tb_log.add_scalar('learning_rate', cur_lr, accumulated_iter)
                 for key, val in tb_dict.items():
                     tb_log.add_scalar('train_' + key, val, accumulated_iter)
+    endtime = datetime.datetime.now()
+    speed = (endtime - start_time).seconds / (total_it_each_epoch - 50)
     if rank == 0:
         pbar.close()
-    return accumulated_iter
+    return accumulated_iter, speed


 def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_cfg,
@@ -65,6 +71,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
                 lr_warmup_scheduler=None, ckpt_save_interval=1, max_ckpt_save_num=50,
                 merge_all_iters_to_one_epoch=False):
     accumulated_iter = start_iter
+    speeds = []
     with tqdm.trange(start_epoch, total_epochs, desc='epochs', dynamic_ncols=True, leave=(rank == 0)) as tbar:
         total_it_each_epoch = len(train_loader)
         if merge_all_iters_to_one_epoch:
@@ -82,7 +89,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
                 cur_scheduler = lr_warmup_scheduler
             else:
                 cur_scheduler = lr_scheduler
-            accumulated_iter = train_one_epoch(
+            accumulated_iter, speed = train_one_epoch(
                 model, optimizer, train_loader, model_func,
                 lr_scheduler=cur_scheduler,
                 accumulated_iter=accumulated_iter, optim_cfg=optim_cfg,
@@ -91,7 +98,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
                 total_it_each_epoch=total_it_each_epoch,
                 dataloader_iter=dataloader_iter
            )
-
+            speeds.append(speed)
             # save trained model
             trained_epoch = cur_epoch + 1
             if trained_epoch % ckpt_save_interval == 0 and rank == 0:
@@ -107,6 +114,8 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
                 save_checkpoint(
                     checkpoint_state(model, optimizer, trained_epoch, accumulated_iter), filename=ckpt_name,
                 )
+    print(speed)
+    print(f'*******{sum(speeds) / len(speeds)}******')


 def model_state_to_cpu(model_state):
```
</details>
### VoteNet
- __MMDetection3D__: With release v0.1.0, run
```bash
./tools/dist_train.sh configs/votenet/votenet_8xb16_sunrgbd-3d.py 8 --no-validate
```
- __votenet__: At commit [2f6d6d3](https://github.com/facebookresearch/votenet/tree/2f6d6d36ff98d96901182e935afe48ccee82d566), run
```bash
python train.py --dataset sunrgbd --batch_size 16
```
Then benchmark the test speed by running
```bash
python eval.py --dataset sunrgbd --checkpoint_path log_sunrgbd/checkpoint.tar --batch_size 1 --dump_dir eval_sunrgbd --cluster_sampling seed_fps --use_3d_nms --use_cls_nms --per_class_proposal
```
Note that `eval.py` is modified to compute inference time.
<details>
<summary>
(diff to benchmark the similar models - click to expand)
</summary>
```diff
diff --git a/eval.py b/eval.py
index c0b2886..04921e9 100644
--- a/eval.py
+++ b/eval.py
@@ -10,6 +10,7 @@ import os
 import sys
 import numpy as np
 from datetime import datetime
+import time
 import argparse
 import importlib
 import torch
@@ -28,7 +29,7 @@ parser.add_argument('--checkpoint_path', default=None, help='Model checkpoint pa
 parser.add_argument('--dump_dir', default=None, help='Dump dir to save sample outputs [default: None]')
 parser.add_argument('--num_point', type=int, default=20000, help='Point Number [default: 20000]')
 parser.add_argument('--num_target', type=int, default=256, help='Point Number [default: 256]')
-parser.add_argument('--batch_size', type=int, default=8, help='Batch Size during training [default: 8]')
+parser.add_argument('--batch_size', type=int, default=1, help='Batch Size during training [default: 8]')
 parser.add_argument('--vote_factor', type=int, default=1, help='Number of votes generated from each seed [default: 1]')
 parser.add_argument('--cluster_sampling', default='vote_fps', help='Sampling strategy for vote clusters: vote_fps, seed_fps, random [default: vote_fps]')
 parser.add_argument('--ap_iou_thresholds', default='0.25,0.5', help='A list of AP IoU thresholds [default: 0.25,0.5]')
@@ -132,6 +133,7 @@ CONFIG_DICT = {'remove_empty_box': (not FLAGS.faster_eval), 'use_3d_nms': FLAGS.
 # ------------------------------------------------------------------------- GLOBAL CONFIG END

 def evaluate_one_epoch():
+    time_list = list()
     stat_dict = {}
     ap_calculator_list = [APCalculator(iou_thresh, DATASET_CONFIG.class2type) \
         for iou_thresh in AP_IOU_THRESHOLDS]
@@ -144,6 +146,8 @@ def evaluate_one_epoch():

         # Forward pass
         inputs = {'point_clouds': batch_data_label['point_clouds']}
+        torch.cuda.synchronize()
+        start_time = time.perf_counter()
         with torch.no_grad():
             end_points = net(inputs)

@@ -161,6 +165,12 @@ def evaluate_one_epoch():
         batch_pred_map_cls = parse_predictions(end_points, CONFIG_DICT)
         batch_gt_map_cls = parse_groundtruths(end_points, CONFIG_DICT)
+        torch.cuda.synchronize()
+        elapsed = time.perf_counter() - start_time
+        time_list.append(elapsed)
+
+        if len(time_list) == 200:
+            print("average inference time: %4f" % (sum(time_list[5:]) / len(time_list[5:])))
         for ap_calculator in ap_calculator_list:
             ap_calculator.step(batch_pred_map_cls, batch_gt_map_cls)
```
### PointPillars-car
- __MMDetection3D__: With release v0.1.0, run
```bash
./tools/dist_train.sh configs/benchmark/hv_pointpillars_secfpn_3x8_100e_det3d_kitti-3d-car.py 8 --no-validate
```
- __Det3D__: At commit [519251e](https://github.com/poodarchu/Det3D/tree/519251e72a5c1fdd58972eabeac67808676b9bb7), use `kitti_point_pillars_mghead_syncbn.py` and run
```bash
./tools/scripts/train.sh --launcher=slurm --gpus=8
```
Note that the config in `train.sh` is modified to train PointPillars.
<details>
<summary>
(diff to benchmark the similar models - click to expand)
</summary>
```diff
diff --git a/tools/scripts/train.sh b/tools/scripts/train.sh
index 3a93f95..461e0ea 100755
--- a/tools/scripts/train.sh
+++ b/tools/scripts/train.sh
@@ -16,9 +16,9 @@ then
 fi

 # Voxelnet
-python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/ kitti_car_vfev3_spmiddlefhd_rpn1_mghead_syncbn.py --work_dir=$SECOND_WORK_DIR
+# python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/ kitti_car_vfev3_spmiddlefhd_rpn1_mghead_syncbn.py --work_dir=$SECOND_WORK_DIR
 # python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/cbgs/configs/ nusc_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py --work_dir=$NUSC_CBGS_WORK_DIR
 # python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/ lyft_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py --work_dir=$LYFT_CBGS_WORK_DIR

 # PointPillars
-# python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py ./examples/point_pillars/configs/ original_pp_mghead_syncbn_kitti.py --work_dir=$PP_WORK_DIR
+python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py ./examples/point_pillars/configs/ kitti_point_pillars_mghead_syncbn.py
```
</details>
### PointPillars-3class
- __MMDetection3D__: With release v0.1.0, run
```bash
./tools/dist_train.sh configs/benchmark/hv_pointpillars_secfpn_4x8_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
- __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), run
```bash
cd tools
sh scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/pointpillar.yaml --batch_size 32 --workers 32 --epochs 80
```
### SECOND
For SECOND, we mean [SECONDv1.5](https://github.com/traveller59/second.pytorch/blob/master/second/configs/all.fhd.config), which was first implemented in [second.pytorch](https://github.com/traveller59/second.pytorch). Det3D's implementation of SECOND uses its self-implemented Multi-Group Head, so its speed is not comparable with the other codebases.
- __MMDetection3D__: With release v0.1.0, run
```bash
./tools/dist_train.sh configs/benchmark/hv_second_secfpn_4x8_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
- __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), run
```bash
cd tools
sh ./scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/second.yaml --batch_size 32 --workers 32 --epochs 80
```
### Part-A2
- __MMDetection3D__: With release v0.1.0, run
```bash
./tools/dist_train.sh configs/benchmark/hv_PartA2_secfpn_4x8_cyclic_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
- __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), train the model by running
```bash
cd tools
sh ./scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/PartA2.yaml --batch_size 32 --workers 32 --epochs 80
```
# Changelog of v1.1
### v1.4.0 (8/1/2024)
#### Highlights
- Refactor Waymo dataset (#2836)
- Support the training of [DSVT](https://arxiv.org/abs/2301.06051) in `projects` (#2738)
- Support [Nerf-Det](https://arxiv.org/abs/2307.14620) in `projects` (#2732)
#### New Features
- Support the training of [DSVT](https://arxiv.org/abs/2301.06051) in `projects` (#2738)
- Support [Nerf-Det](https://arxiv.org/abs/2307.14620) in `projects` (#2732)
- Support [MV-FCOS3D++](https://arxiv.org/abs/2207.12716)
- Refactor Waymo dataset (#2836)
#### Improvements
- Support [PGD](https://arxiv.org/abs/2107.14160) (front-of-view / multi-view) on Waymo dataset (#2835)
- Release new [Waymo-mini](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_mini.tar.gz) for verifying some methods or debugging quickly (#2835)
#### Bug Fixes
- Fix some wrong configs of MinkUNet and SPVCNN (#2854)
- Fix incorrect number of arguments in PETR (#2800)
- Delete unused files in `mmdet3d/configs` (#2773)
#### Contributors
A total of 5 developers contributed to this release.
@sunjiahao1999, @WendellZ524, @Yanyirong, @JingweiZhang12, @Tai-Wang
### v1.3.0 (18/10/2023)
#### Highlights
- Support [CENet](https://arxiv.org/abs/2207.12691) in `projects` (#2619)
- Enhance demos with new 3D inferencers (#2763)
#### New Features
- Support [CENet](https://arxiv.org/abs/2207.12691) in `projects` (#2619)
#### Improvements
- Enhance demos with new 3D inferencers (#2763)
- Add BEV-based detection pipeline in nuScenes dataset tutorial (#2672)
- Add the new config type of Cylinder3D in `mmdet3d/configs` (#2681)
- Update [New Config Type](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#a-pure-python-style-configuration-file-beta) (#2655)
- Update the QR code in README.md (#2703)
#### Bug Fixes
- Fix the download script of nuScenes dataset (#2660)
- Fix circleCI and GitHub workflow configuration (#2652)
- Fix the version of Open3D in requirements (#2633)
- Fix unused files in `mmdet3d/configs` (#2773)
- Fix support devices in FreeAnchor3DHead (#2769)
- Fix readthedocs building and link (#2739, #2650)
- Fix the pitch angle bug in LaserMix (#2710)
#### Contributors
A total of 6 developers contributed to this release.
@sunjiahao1999, @Xiangxu-0103, @ZhaoCake, @LRJKD, @crazysteeaam, @wep21, @zhiqwang
### v1.2.0 (4/7/2023)
#### Highlights
- Support [New Config Type](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#a-pure-python-style-configuration-file-beta) in `mmdet3d/config` (#2608)
- Support the inference of [DSVT](https://arxiv.org/abs/2301.06051) in `projects` (#2606)
- Support downloading datasets from [OpenDataLab](https://opendatalab.com/) using `mim` (#2593)
#### New Features
- Support [New Config Type](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#a-pure-python-style-configuration-file-beta) in `mmdet3d/config` (#2608)
- Support the inference of [DSVT](https://arxiv.org/abs/2301.06051) in `projects` (#2606)
- Support downloading datasets from [OpenDataLab](https://opendatalab.com/) using `mim` (#2593)
#### Improvements
- Enhanced visualization in interactive form (#2611)
- Update README.md and Model Zoo (#2599, #2600)
- Speed up S3DIS data preparation (#2585)
#### Bug Fixes
- Remove PointRCNN in benchmark training (#2610)
- Fix wrong indoor detection visualization (#2625)
- Fix MinkUNet download link (#2590)
- Fix the formula in the `readthedocs` (#2580)
#### Contributors
A total of 5 developers contributed to this release.
@sunjiahao1999, @Xiangxu-0103, @JingweiZhang12, @col14m, @zhulf0804
### v1.1.1 (30/5/2023)
#### Highlights
- Support [TPVFormer](https://arxiv.org/pdf/2302.07817.pdf) in `projects` (#2399, #2517, #2535)
- Support the training of BEVFusion in `projects` (#2546)
- Support lidar-based 3D semantic segmentation benchmark (#2530, #2559)
#### New Features
- Support [TPVFormer](https://arxiv.org/pdf/2302.07817.pdf) in `projects` (#2399, #2517, #2535)
- Support the training of BEVFusion in `projects` (#2558)
- Support lidar-based 3D Semantic Segmentation Benchmark (#2530, #2559)
- Support test-time augmentation for Segmentor (#2382)
- Support `Minkowski ConvModule` and `Residual` Block (#2528)
- Support the visualization of multi-view images in multi-modal methods (#2453)
#### Improvements
- Upload checkpoints and training log of PETR (#2555)
- Replace `np.float` with the default `float` in segmentation evaluation (#2527)
- Add docs of converting SemanticKITTI datasets (#2515)
- Support different colors for different classes in visualization (#2500)
- Support tensor-like operations for `BaseInstance3DBoxes` and `BasePoint`
- Add information of LiDAR Segmentation in NuScenes annotation files
- Provide annotation files of datasets generated offline (#2457)
- Refactor document structure (#2429)
- Complete typehints and docstring (#2396, #2457, #2468, #2464, #2485)
#### Bug Fixes
- Fix the bug of abnormal loss when training SECOND in automatic mixed precision (AMP) mode (#2452)
- Add a warning in function `post_process_coords` in mmdet3d/dataset/convert_utils.py (#2557)
- Fix invalid configs (#2477, #2536)
- Fix bugs of unit test (#2466)
- Update `local-rank` argument in test.py for pytorch 2.0 (#2469)
- Fix docker file (#2451)
- Fix demo and visualization (#2453)
- Fix SUN RGB-D data converter (#2440)
- Fix readthedocs building (#2459, #2419, #2505, #2396)
- Fix CI (#2445)
- Fix the version error of `torch` in github merge stage test (#2424)
- Loosen the version restriction of `numba` (#2416)
#### Contributors
A total of 10 developers contributed to this release.
@sunjiahao1999, @Xiangxu-0103, @JingweiZhang12, @chriscarving, @jaan1729, @pd-michaelstanley, @filaPro, @kabouzeid, @A-new-b, @lbin
### v1.1.0 (6/4/2023)
#### Highlights
- Support [Cylinder3D](https://arxiv.org/pdf/2011.10033.pdf) (#2291, #2344, #2350)
- Support [MinkUnet](https://arxiv.org/abs/1904.08755) (#2294, #2358)
- Support [SPVCNN](https://arxiv.org/abs/2007.16100) (#2320, #2372)
- Support [TR3D](https://arxiv.org/abs/2302.02858) detector in `projects` (#2274)
- Support the inference of [BEVFusion](https://arxiv.org/abs/2205.13542) in `projects` (#2175)
- Support [DETR3D](https://arxiv.org/abs/2110.06922) in `projects` (#2173)
#### New Features
- Support [Cylinder3D](https://arxiv.org/pdf/2011.10033.pdf) (#2291, #2344, #2350)
- Support [MinkUnet](https://arxiv.org/abs/1904.08755) (#2294, #2358)
- Support [SPVCNN](https://arxiv.org/abs/2007.16100) (#2320, #2372)
- Support [TR3D](https://arxiv.org/abs/2302.02858) detector in `projects` (#2274)
- Support the inference of [BEVFusion](https://arxiv.org/abs/2205.13542) in `projects` (#2175)
- Support [DETR3D](https://arxiv.org/abs/2110.06922) in `projects` (#2173)
- Support PolarMix and LaserMix augmentation (#2265, #2302)
- Support loading annotation of panoptic segmentation (#2223)
- Support panoptic segmentation metric (#2230)
- Add inferencer for LiDAR-based, monocular and multi-modality 3D detection (#2208, #2190, #2342)
- Add inferencer for LiDAR-based segmentation (#2304)
#### Improvements
- Support `lazy_init` for CBGSDataset (#2271)
- Support generating annotation files for test set on Waymo (#2180)
- Enhance the support for SemanticKITTI (#2253, #2323)
- File I/O migration and reconstruction (#2319)
- Support `format_only` option for Lyft, NuScenes and Waymo datasets (#2333, #2151)
- Replace `np.transpose` with `torch.permute` to speed up (#2277)
- Allow setting local-rank for pytorch 2.0 (#2387)
#### Bug Fixes
- Fix the problem of reversal of length and width when drawing heatmap in CenterFormer (#2362)
- Deprecate old type alias due to the new version of numpy (#2339)
- Loosen `trimesh` version requirements to fix numpy random state (#2340)
- Fix the device mismatch error in CenterPoint (#2308)
- Fix bug of visualization when there are no bboxes (#2231)
- Fix bug of counting ignore index in IOU in segmentation evaluation (#2229)
#### Contributors
A total of 14 developers contributed to this release.
@ZLTJohn, @SekiroRong, @shufanwu, @vansin, @triple-Mu, @404Vector, @filaPro, @sunjiahao1999, @Ginray, @Xiangxu-0103, @JingweiZhang12, @DezeZhao, @ZCMax, @roger-lcc
### v1.1.0rc3 (7/1/2023)
#### Highlights
- Support [CenterFormer](https://arxiv.org/abs/2209.05588) in `projects` (#2175)
- Support [PETR](https://arxiv.org/abs/2203.05625) in `projects` (#2173)
#### New Features
- Support [CenterFormer](https://arxiv.org/abs/2209.05588) in `projects` (#2175)
- Support [PETR](https://arxiv.org/abs/2203.05625) in `projects` (#2173)
- Refactor ImVoxelNet on SUN RGB-D into mmdet3d v1.1 (#2141)
#### Improvements
- Remove legacy builder.py (#2061)
- Update `customize_dataset` documentation (#2153)
- Update tutorial of LiDAR-based detection (#2120)
#### Bug Fixes
- Fix the configs of FCOS3D and PGD (#2191)
- Fix numpy's `ValueError` in update_infos_to_v2.py (#2162)
- Fix parameter missing in Det3DVisualizationHook (#2118)
- Fix memory overflow in the rotated box IoU calculation (#2134)
- Fix lidar2cam error in update_infos_to_v2.py for nus and lyft dataset (#2110)
- Fix error of data type in Waymo metrics (#2109)
- Update `bbox_3d` information in `cam_instances` for mono3d detection task (#2046)
- Fix label saving of Waymo dataset (#2096)
#### Contributors
A total of 10 developers contributed to this release.
@SekiroRong, @ZLTJohn, @vansin, @shanmo, @VVsssssk, @ZCMax, @Xiangxu-0103, @JingweiZhang12, @Tai-Wang, @lianqing11
### v1.1.0rc2 (2/12/2022)
#### Highlights
- Support [PV-RCNN](https://arxiv.org/abs/1912.13192)
- Speed up evaluation on Waymo dataset
#### New Features
- Support [PV-RCNN](https://arxiv.org/abs/1912.13192) (#1597, #2045)
- Speed up evaluation on Waymo dataset (#2008)
- Refactor FCAF3D into the framework of mmdet3d v1.1 (#1945)
- Refactor S3DIS dataset into the framework of mmdet3d v1.1 (#1984)
- Add `Projects/` folder and the first example project (#2042)
#### Improvements
- Rename `CLASSES` and `PALETTE` to `classes` and `palette` respectively (#1932)
- Update `metainfo` in pkl files and add `categories` into metainfo (#1934)
- Show instance statistics before and after through the pipeline (#1863)
- Add configs of DGCNN for different testing areas (#1967)
- Move testing utils from `tests/utils/` to `mmdet3d/testing/` (#2012)
- Add typehint for code in `models/layers/` (#2014)
- Refine documentation (#1891, #1994)
- Refine voxelization for better speed (#2062)
#### Bug Fixes
- Fix loop visualization error about point cloud (#1914)
- Fix image conversion of Waymo to avoid information loss (#1979)
- Fix evaluation on KITTI testset (#2005)
- Fix sampling bug in `IoUNegPiecewiseSampler` (#2017)
- Fix point cloud range in CenterPoint (#1998)
- Fix some loading bugs and support FOV-image-based mode on Waymo dataset (#1942)
- Fix dataset conversion utils (#1923, #2040, #1971)
- Update metafiles in all the configs (#2006)
#### Contributors
A total of 12 developers contributed to this release.
@vavanade, @oyel, @thinkthinking, @PeterH0323, @274869388, @cxiang26, @lianqing11, @VVsssssk, @ZCMax, @Xiangxu-0103, @JingweiZhang12, @Tai-Wang
### v1.1.0rc1 (11/10/2022)
#### Highlights
- Support a camera-only 3D detection baseline on Waymo, [MV-FCOS3D++](https://arxiv.org/abs/2207.12716)
#### New Features
- Support a camera-only 3D detection baseline on Waymo, [MV-FCOS3D++](https://arxiv.org/abs/2207.12716), with new evaluation metrics and transformations (#1716)
- Refactor PointRCNN in the framework of mmdet3d v1.1 (#1819)
#### Improvements
- Add `auto_scale_lr` in config to support training with auto-scale learning rates (#1807)
- Fix CI (#1813, #1865, #1877)
- Update `browse_dataset.py` script (#1817)
- Update SUN RGB-D and Lyft datasets documentation (#1833)
- Rename `convert_to_datasample` to `add_pred_to_datasample` in detectors (#1843)
- Update customized dataset documentation (#1845)
- Update `Det3DLocalVisualization` and visualization documentation (#1857)
- Add the code of generating `cam_sync_labels` for Waymo dataset (#1870)
- Update dataset transforms typehints (#1875)
#### Bug Fixes
- Fix missing registration of models in [setup_env.py](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/utils/setup_env.py) (#1808)
- Fix the data base sampler bugs when using the ground plane data (#1812)
- Add output directory existing check during visualization (#1828)
- Fix bugs of nuScenes dataset for monocular 3D detection (#1837)
- Fix visualization hook to support the visualization of different data modalities (#1839)
- Fix monocular 3D detection demo (#1864)
- Fix the lack of `num_pts_feats` key in nuscenes dataset and complete docstring (#1882)
#### Contributors
A total of 10 developers contributed to this release.
@ZwwWayne, @Tai-Wang, @lianqing11, @VVsssssk, @ZCMax, @Xiangxu-0103, @JingweiZhang12, @tpoisonooo, @ice-tong, @jshilong
### v1.1.0rc0 (1/9/2022)
We are excited to announce the release of MMDetection3D 1.1.0rc0.
MMDet3D 1.1.0rc0 is the first version of MMDetection3D 1.1, a part of the OpenMMLab 2.0 projects.
Built upon the new [training engine](https://github.com/open-mmlab/mmengine) and [MMDet 3.x](https://github.com/open-mmlab/mmdetection/tree/3.x),
MMDet3D 1.1 unifies the interfaces of dataset, models, evaluation, and visualization with faster training and testing speed.
It also provides a standard data protocol for different datasets, modalities, and tasks for 3D perception.
We will support more strong baselines in future releases, together with our latest exploration of camera-only 3D detection from videos.
### Highlights
1. **New engines**. MMDet3D 1.1 is based on [MMEngine](https://github.com/open-mmlab/mmengine) and [MMDet 3.x](https://github.com/open-mmlab/mmdetection/tree/3.x), which provides a universal and powerful runner that allows more flexible customizations and significantly simplifies the entry points of high-level interfaces.
2. **Unified interfaces**. As a part of the OpenMMLab 2.0 projects, MMDet3D 1.1 unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in those interfaces and logic to allow the emergence of multi-task/modality algorithms.
3. **Standard data protocol for all the datasets, modalities, and tasks for 3D perception**. Based on the unified base datasets inherited from MMEngine, we also design a standard data protocol that defines and unifies the common keys across different datasets, tasks, and modalities. It significantly simplifies the usage of multiple datasets and data modalities for multi-task frameworks and eases dataset customization. Please refer to the [documentation of customized datasets](../advanced_guides/customize_dataset.md) for details.
4. **Strong baselines**. We will release strong baselines of many popular models to enable fair comparisons among state-of-the-art models.
5. **More documentation and tutorials**. We add a bunch of documentation and tutorials to help users get started more smoothly. Read it [here](https://mmdetection3d.readthedocs.io/en/1.1/).
### Breaking Changes
MMDet3D 1.1 has undergone significant changes to have better design, higher efficiency, more flexibility, and more unified interfaces.
Besides the changes of API, we briefly list the major breaking changes in this section.
We will update the [migration guide](../migration.md) to provide complete details and migration instructions.
Users can also refer to the [compatibility documentation](./compatibility.md) and [API doc](https://mmdetection3d.readthedocs.io/en/1.1/) for more details.
#### Dependencies
- MMDet3D 1.1 runs on PyTorch>=1.6. We have deprecated the support of PyTorch 1.5 to embrace the mixed precision training and other new features since PyTorch 1.6. Some models can still run on PyTorch 1.5, but the full functionality of MMDet3D 1.1 is not guaranteed.
- MMDet3D 1.1 relies on MMEngine to run. MMEngine is a new foundational library for training deep learning models in OpenMMLab and is widely depended on by OpenMMLab 2.0 projects. The dependencies for file IO and training are migrated from MMCV 1.x to MMEngine.
- MMDet3D 1.1 relies on MMCV>=2.0.0rc0. Although MMCV no longer maintains the training functionalities since 2.0.0rc0, MMDet3D 1.1 relies on the data transforms, CUDA operators, and image processing interfaces in MMCV. Note that the package `mmcv` is the version that provides pre-built CUDA operators and `mmcv-lite` does not since MMCV 2.0.0rc0, while `mmcv-full` has been deprecated since 2.0.0rc0.
- MMDet3D 1.1 is based on MMDet 3.x, which is also a part of OpenMMLab 2.0 projects.
#### Training and testing
- MMDet3D 1.1 uses Runner in [MMEngine](https://github.com/open-mmlab/mmengine) rather than that in MMCV. The new Runner implements and unifies the building logic of dataset, model, evaluation, and visualizer. Therefore, MMDet3D 1.1 no longer relies on the building logic of those modules in `mmdet3d.train.apis` and `tools/train.py`. That code has been migrated into [MMEngine](https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/runner.py). Please refer to the [migration guide of Runner in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/runner.html) for more details.
- The Runner in MMEngine also supports testing and validation. The testing scripts are also simplified and follow similar logic to the training scripts when building the runner.
- The execution points of hooks in the new Runner have been enriched to allow more flexible customization. Please refer to the [migration guide of Hook in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/hook.html) for more details.
- Learning rate and momentum scheduling has been migrated from Hook to [Parameter Scheduler in MMEngine](https://mmengine.readthedocs.io/en/latest/tutorials/param_scheduler.html). Please refer to the [migration guide of Parameter Scheduler in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/param_scheduler.html) for more details.
#### Configs
- The [Runner in MMEngine](https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/runner.py) uses a different config structure to ease the understanding of the components in runner. Users can read the [config example of MMDet3D 1.1](../user_guides/config.md) or refer to the [migration guide in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/runner.html) for migration details.
- The file names of configs and models are also refactored to follow the new rules unified across OpenMMLab 2.0 projects. The names of checkpoints are not updated for now as there is no BC-breaking of model weights between MMDet3D 1.1 and 1.0.x. We will progressively replace all the model weights by those trained in MMDet3D 1.1. Please refer to the [user guides of config](../user_guides/config.md) for more details.
#### Dataset
The Dataset classes implemented in MMDet3D 1.1 all inherit from `Det3DDataset` and `Seg3DDataset`, which in turn inherit from the [BaseDataset in MMEngine](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/basedataset.html). In addition to the changes of interfaces, there are several changes to Dataset in MMDet3D 1.1.
- All the datasets support serializing the internal data list to reduce memory usage when multiple workers are used for data loading.
- The internal data structure in the dataset is changed to be self-contained (without losing information like class names in MMDet3D 1.0.x) while keeping simplicity.
- Common keys across different datasets and data modalities are defined and all the info files are unified into a standard protocol.
- The evaluation functionality of each dataset has been removed from the dataset so that specific evaluation metrics like KITTI AP can be used to evaluate predictions on other datasets.
#### Data Transforms
The data transforms in MMDet3D 1.1 all inherit from `BaseTransform` in MMCV>=2.0.0rc0, which defines a new convention in OpenMMLab 2.0 projects.
Besides the interface changes, there are several changes listed below:
- The functionality of some data transforms (e.g., `Resize`) is decomposed into several transforms to simplify and clarify their usage.
- The format of data dict processed by each data transform is changed according to the new data structure of dataset.
- Some inefficient data transforms (e.g., normalization and padding) are moved into data preprocessor of model to improve data loading and training speed.
- The same data transforms in different OpenMMLab 2.0 libraries have the same augmentation implementation and logic given the same arguments, i.e., `Resize` in MMDet 3.x and MMSeg 1.x will resize the image in the exact same manner given the same arguments.
#### Model
The models in MMDet3D 1.1 all inherit from `BaseModel` in MMEngine, which defines a new convention for models in OpenMMLab 2.0 projects.
Users can refer to [the tutorial of model in MMEngine](https://mmengine.readthedocs.io/en/latest/tutorials/model.html) for more details.
Accordingly, there are several changes as follows:
- The model interfaces, including the input and output formats, are significantly simplified and unified following the new convention in MMDet3D 1.1.
Specifically, all the input data in training and testing are packed into `inputs` and `data_samples`, where `inputs` contains model inputs such as a dict containing a list of image tensors and the point cloud data, and `data_samples` contains other information about the current data sample, such as ground truths, region proposals, and model predictions. In this way, different tasks in MMDet3D 1.1 can share the same input arguments, which makes the models more general and suitable for multi-task learning and some flexible training paradigms like semi-supervised learning.
- The model has a data preprocessor module, which is used to preprocess the model's input data. In MMDet3D 1.1, the data preprocessor usually does the necessary steps to form the input images into a batch, such as padding. It can also serve as a place for some special data augmentations or more efficient data transformations like normalization.
- The internal logic of the model has been changed. In MMDet3D 1.0.x, the model uses `forward_train`, `forward_test`, `simple_test`, and `aug_test` to deal with different forward logics. In MMDet3D 1.1 and OpenMMLab 2.0, the forward function has three modes: 'loss', 'predict', and 'tensor' for training, inference, and tracing or other purposes, respectively.
The forward function calls `self.loss`, `self.predict`, and `self._forward` given the modes 'loss', 'predict', and 'tensor', respectively.
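Schematically, this dispatch looks like the following; this is a simplified sketch, not the verbatim `BaseModel` implementation in MMEngine, and assumes the concrete detector implements `loss`, `predict`, and `_forward`:

```python
from torch import nn


class ToyDet3DModel(nn.Module):  # schematic stand-in for a BaseModel subclass
    def forward(self, inputs, data_samples=None, mode='tensor'):
        if mode == 'loss':       # training: return a dict of loss terms
            return self.loss(inputs, data_samples)
        elif mode == 'predict':  # inference: return post-processed data samples
            return self.predict(inputs, data_samples)
        elif mode == 'tensor':   # tracing/debugging: return raw network outputs
            return self._forward(inputs, data_samples)
        raise RuntimeError(f'Invalid mode "{mode}".')
```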
#### Evaluation
The evaluation in MMDet3D 1.0.x is strictly bound to the dataset. In contrast, MMDet3D 1.1 decouples the evaluation from the dataset, so that all the detection datasets can be evaluated with KITTI AP and other metrics implemented in MMDet3D 1.1.
MMDet3D 1.1 mainly implements corresponding metrics for each dataset, which are managed by [Evaluator](https://mmengine.readthedocs.io/en/latest/design/evaluator.html) to complete the evaluation.
Users can build an evaluator in MMDet3D 1.1 to conduct offline evaluation, i.e., evaluate predictions that may not be produced in MMDet3D 1.1, as long as the dataset and the predictions follow the dataset conventions. More details can be found in the [tutorial in MMEngine](https://mmengine.readthedocs.io/en/latest/tutorials/evaluation.html).
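A minimal offline-evaluation sketch, assuming the KITTI metric; the annotation file path is a placeholder, and `dataset`/`predictions` are assumed to be built elsewhere:

```python
from mmdet3d.evaluation import KittiMetric
from mmengine.evaluator import Evaluator

# a standalone evaluator decoupled from the dataset;
# the ann_file path is a placeholder
evaluator = Evaluator(KittiMetric(ann_file='data/kitti/kitti_infos_val.pkl'))
# assuming `dataset` and `predictions` were built/loaded elsewhere:
# evaluator.dataset_meta = dataset.metainfo
# evaluator.process(data_samples=predictions)
# results = evaluator.evaluate(size=len(dataset))
```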
#### Visualization
The visualization functions of MMDet3D 1.0.x are removed. Instead, in OpenMMLab 2.0 projects, we use [Visualizer](https://mmengine.readthedocs.io/en/latest/design/visualization.html) to visualize data. MMDet3D 1.1 implements `Det3DLocalVisualizer` to allow visualization of 2D and 3D data, ground truths, model predictions, feature maps, etc., at any place. It also supports sending the visualization data to any external visualization backend such as TensorBoard.
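A small usage sketch, assuming open3d is available; the backend configuration and the random points are illustrative:

```python
import numpy as np
from mmdet3d.visualization import Det3DLocalVisualizer

# send visualization results to TensorBoard as well as local files
visualizer = Det3DLocalVisualizer(
    vis_backends=[dict(type='TensorboardVisBackend')], save_dir='vis_out')
points = np.random.rand(1024, 3).astype(np.float32)  # dummy point cloud
visualizer.set_points(points)  # draw the raw points in a 3D view
visualizer.show()
```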
### Planned changes
We list several planned changes of MMDet3D 1.1.0rc0 so that the community can more comprehensively understand the progress of MMDet3D 1.1. Feel free to create a PR, issue, or discussion if you are interested, have any suggestions or feedback, or want to participate.
1. Test-time augmentation: supported in MMDet3D 1.0.x, it is not implemented in this version due to the limited time slot. We will support it in the following releases with a new and simplified design.
2. Inference interfaces: a unified inference interface will be supported in the future to ease the use of released models.
3. Interfaces of useful tools that can be used in notebooks: more useful tools implemented in the `tools` directory will get Python interfaces so that they can be used through notebooks and in downstream libraries.
4. Documentation: we will add more design docs, tutorials, and migration guidance so that the community can dive deep into our new design, participate in future development, and smoothly migrate downstream libraries to MMDet3D 1.1.
5. Wandb visualization: MMDet 2.x supports Wandb visualization since v2.25.0, which has not been migrated to MMDet 3.x yet. Since Wandb provides strong visualization and experiment management capabilities, a `DetWandbVisualizer` and possibly a hook are planned to fully migrate those functionalities from MMDet 2.x, and a `Det3DWandbVisualizer` will be supported in MMDet3D 1.1 accordingly.
6. Support for recent new features added in MMDet3D 1.0.x and our recent exploration of camera-only 3D detection from videos: we will refactor these models and support them with benchmarks and models soon.
# Changelog of v1.0.x
### v1.0.0 (6/4/2023)
#### Improvements
- Add BN in FPN to avoid NaN loss in MVXNet (#2282)
- Update `s3dis_data_utils.py` (#2232)
#### Bug Fixes
- Fix precision error when using mixed precision on CenterPoint (#2341)
- Replace `np.transpose` with `torch.permute` to speed up (#2273)
- Update links of SECOND checkpoints (#2185)
#### Contributors
A total of 7 developers contributed to this release.
@JingweiZhang12, @ZCMax, @Xiangxu-0103, @vansinhu, @cs1488, @sunjiahao1999, @Ginray
### v1.0.0rc7 (7/1/2023)
#### Improvements
- Support training and testing on MLU (#2167)
#### Contributors
A total of 1 developer contributed to this release.
@mengpenghui
### v1.0.0rc6 (2/12/2022)
#### New Features
- Add `Projects/` folder and the first example project (#2082)
#### Improvements
- Update Waymo converter to save storage space (#1759)
- Update model link and performance of CenterPoint (#1916)
#### Bug Fixes
- Fix GPU memory occupancy problem in PointRCNN (#1928)
- Fix sampling bug in `IoUNegPiecewiseSampler` (#2018)
#### Contributors
A total of 7 developers contributed to this release.
@oyel, @zzj403, @VVsssssk, @Tai-Wang, @tpoisonooo, @JingweiZhang12, @ZCMax
### v1.0.0rc5 (11/10/2022)
#### New Features
- Support ImVoxelNet on SUN RGB-D (#1738)
#### Improvements
- Fix the cross-codebase reference problem in metafile README (#1644)
- Update the Chinese documentation about getting started (#1715)
- Fix docs link and add docs link checker (#1811)
#### Bug Fixes
- Fix a visualization bug that is potentially triggered by empty prediction labels (#1725)
- Fix point cloud segmentation visualization bug due to wrong parameter passing (#1858)
- Fix NaN loss bug during PointRCNN training (#1874)
#### Contributors
A total of 9 developers contributed to this release.
@ZwwWayne, @Tai-Wang, @filaPro, @VVsssssk, @ZCMax, @Xiangxu-0103, @holtvogt, @tpoisonooo, @lianqing01
### v1.0.0rc4 (8/8/2022)
#### Highlights
- Support [FCAF3D](https://arxiv.org/pdf/2112.00322.pdf)
#### New Features
- Support [FCAF3D](https://arxiv.org/pdf/2112.00322.pdf) (#1547)
- Add the transformation to support multi-camera 3D object detection (#1580)
- Support lift-splat-shoot view transformer (#1598)
#### Improvements
- Remove the limitation of the maximum number of points during SUN RGB-D preprocessing (#1555)
- Support circle CI (#1647)
- Add mim to extras_require in setup.py (#1560, #1574)
- Update dockerfile package version (#1697)
#### Bug Fixes
- Flip yaw angle for DepthInstance3DBoxes.overlaps (#1548, #1556)
- Fix DGCNN configs (#1587)
- Fix bbox head not registered bug (#1625)
- Fix missing objects in S3DIS preprocessing (#1665)
- Fix spconv2.0 model loading bug (#1699)
#### Contributors
A total of 9 developers contributed to this release.
@Tai-Wang, @ZwwWayne, @filaPro, @lianqing11, @ZCMax, @HuangJunJie2017, @Xiangxu-0103, @ChonghaoSima, @VVsssssk
### v1.0.0rc3 (8/6/2022)
#### Highlights
- Support [SA-SSD](https://openaccess.thecvf.com/content_CVPR_2020/papers/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.pdf)
#### New Features
- Support [SA-SSD](https://openaccess.thecvf.com/content_CVPR_2020/papers/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.pdf) (#1337)
#### Improvements
- Add Chinese documentation for vision-only 3D detection (#1438)
- Update CenterPoint pretrained models that are compatible with refactored coordinate systems (#1450)
- Configure myst-parser to parse anchor tag in the documentation (#1488)
- Replace markdownlint with mdformat for avoiding installing ruby (#1489)
- Add missing `gt_names` when getting annotation info in Custom3DDataset (#1519)
- Support S3DIS full ceph training (#1542)
- Rewrite the installation and FAQ documentation (#1545)
#### Bug Fixes
- Fix the incorrect registry name when building RoI extractors (#1460)
- Fix the potential problems caused by the registry scope update when composing pipelines (#1466) and using CocoDataset (#1536)
- Fix the missing selection with `order` in the [box3d_nms](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/post_processing/box3d_nms.py) introduced by [#1403](https://github.com/open-mmlab/mmdetection3d/pull/1403) (#1479)
- Update the [PointPillars config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car.py) to make it consistent with the log (#1486)
- Fix heading anchor in documentation (#1490)
- Fix the compatibility of mmcv in the dockerfile (#1508)
- Make overwrite_spconv packaged when building whl (#1516)
- Fix the requirement of mmcv and mmdet (#1537)
- Update configs of PartA2 and support its compatibility with spconv 2.0 (#1538)
#### Contributors
A total of 13 developers contributed to this release.
@Xiangxu-0103, @ZCMax, @jshilong, @filaPro, @atinfinity, @Tai-Wang, @wenbo-yu, @yi-chen-isuzu, @ZwwWayne, @wchen61, @VVsssssk, @AlexPasqua, @lianqing11
### v1.0.0rc2 (1/5/2022)
#### Highlights
- Support spconv 2.0
- Support MinkowskiEngine with MinkResNet
- Support training models on custom datasets with only point clouds
- Update Registry to distinguish the scope of built functions
- Replace mmcv.iou3d with a set of bird-eye-view (BEV) operators to unify the operations of rotated boxes
#### New Features
- Add loader arguments in the configuration files (#1388)
- Support [spconv 2.0](https://github.com/traveller59/spconv) when the package is installed. Users can still use spconv 1.x in MMCV with CUDA 9.0 (only cost more memory) without losing the compatibility of model weights between two versions (#1421)
- Support MinkowskiEngine with MinkResNet (#1422)
#### Improvements
- Add the documentation for model deployment (#1373, #1436)
- Add Chinese documentation of
- Speed benchmark (#1379)
- LiDAR-based 3D detection (#1368)
- LiDAR 3D segmentation (#1420)
- Coordinate system refactoring (#1384)
- Support training models on custom datasets with only point clouds (#1393)
- Replace mmcv.iou3d with a set of bird-eye-view (BEV) operators to unify the operations of rotated boxes (#1403, #1418)
- Update Registry to distinguish the scope of building functions (#1412, #1443)
- Replace recommonmark with myst_parser for documentation rendering (#1414)
#### Bug Fixes
- Fix the show pipeline in the [browse_dataset.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/misc/browse_dataset.py) (#1376)
- Fix missing `__init__` files after coordinate system refactoring (#1383)
- Fix the incorrect yaw in the visualization caused by coordinate system refactoring (#1407)
- Fix `NaiveSyncBatchNorm1d` and `NaiveSyncBatchNorm2d` to support non-distributed cases and more general inputs (#1435)
#### Contributors
A total of 11 developers contributed to this release.
@ZCMax, @ZwwWayne, @Tai-Wang, @VVsssssk, @HanaRo, @JoeyforJoy, @ansonlcy, @filaPro, @jshilong, @Xiangxu-0103, @deleomike
### v1.0.0rc1 (1/4/2022)
#### Compatibility
- We migrate all the mmdet3d ops to mmcv and do not need to compile them when installing mmdet3d.
- To fix the imprecise timestamp and optimize its saving method, we reformat the point cloud data during Waymo data conversion. The data conversion time is also optimized significantly by supporting parallel processing. Please re-generate KITTI format Waymo data if necessary. See more details in the [compatibility documentation](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/compatibility.md).
- We update some of the model checkpoints after the refactor of coordinate systems. Please stay tuned for the release of the remaining model checkpoints.
|               | Fully Updated | Partially Updated | In Progress | No Influence |
| ------------- | :-----------: | :---------------: | :---------: | :-----------: |
| SECOND | | ✓ | | |
| PointPillars | | ✓ | | |
| FreeAnchor | ✓ | | | |
| VoteNet | ✓ | | | |
| H3DNet | ✓ | | | |
| 3DSSD | | ✓ | | |
| Part-A2 | ✓ | | | |
| MVXNet | ✓ | | | |
| CenterPoint | | | ✓ | |
| SSN | ✓ | | | |
| ImVoteNet | ✓ | | | |
| FCOS3D | | | | ✓ |
| PointNet++ | | | | ✓ |
| Group-Free-3D | | | | ✓ |
| ImVoxelNet | ✓ | | | |
| PAConv | | | | ✓ |
| DGCNN | | | | ✓ |
| SMOKE | | | | ✓ |
| PGD | | | | ✓ |
| MonoFlex | | | | ✓ |
#### Highlights
- Migrate all the mmdet3d ops to mmcv
- Support parallel Waymo data converter
- Add ScanNet instance segmentation dataset with metrics
- Better compatibility for windows with CI support, op migration and bug fixes
- Support loading annotations from Ceph
#### New Features
- Add ScanNet instance segmentation dataset with metrics (#1230)
- Support different random seeds for different ranks (#1321)
- Support loading annotations from Ceph (#1325)
- Support resuming from the latest checkpoint automatically (#1329)
- Add windows CI (#1345)
#### Improvements
- Update the table format and OpenMMLab project orders in [README.md](https://github.com/open-mmlab/mmdetection3d/blob/master/README.md) (#1272, #1283)
- Migrate all the mmdet3d ops to mmcv (#1240, #1286, #1290, #1333)
- Add `with_plane` flag in the KITTI data conversion (#1278)
- Update instructions and links in the documentation (#1300, #1309, #1319)
- Support parallel Waymo dataset converter and ground truth database generator (#1327)
- Add quick installation commands to [getting_started.md](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/getting_started.md) (#1366)
#### Bug Fixes
- Update nuimages configs to use new nms config style (#1258)
- Fix the usage of `np.long` for Windows compatibility (#1270)
- Fix the incorrect indexing in `BasePoints` (#1274)
- Fix the incorrect indexing in the [pillar_scatter.forward_single](https://github.com/open-mmlab/mmdetection3d/blob/dev/mmdet3d/models/middle_encoders/pillar_scatter.py#L38) (#1280)
- Fix unit tests that use GPUs (#1301)
- Fix incorrect feature dimensions in `DynamicPillarFeatureNet` caused by previous upgrading of `PillarFeatureNet` (#1302)
- Remove the `CameraPoints` constraint in `PointSample` (#1314)
- Fix imprecise timestamps saving of Waymo dataset (#1327)
#### Contributors
A total of 10 developers contributed to this release.
@ZCMax, @ZwwWayne, @wHao-Wu, @Tai-Wang, @wangruohui, @zjwzcx, @Xiangxu-0103, @EdAyers, @hongye-dev, @zhanggefan
### v1.0.0rc0 (18/2/2022)
#### Compatibility
- We refactor our three coordinate systems to make their rotation directions and origins more consistent, and further remove unnecessary hacks in different datasets and models. Therefore, please re-generate data infos or convert the old version to the new one with our provided scripts. We will also provide updated checkpoints in the next version. Please refer to the [compatibility documentation](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0.dev0/docs/en/compatibility.md) for more details.
- Unify the camera keys for consistent transformation between coordinate systems on different datasets. The modification changes the key names to `lidar2img`, `depth2img`, `cam2img`, etc., for easier understanding. Customized codes using legacy keys may be influenced.
- The next release will begin to move files of CUDA ops to [MMCV](https://github.com/open-mmlab/mmcv). It will influence the way to import related functions. We will not break the compatibility but will raise a warning first and please prepare to migrate it.
#### Highlights
- Support new monocular 3D detectors: [PGD](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pgd), [SMOKE](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/smoke), [MonoFlex](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/monoflex)
- Support a new LiDAR-based detector: [PointRCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/point_rcnn)
- Support a new backbone: [DGCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/dgcnn)
- Support 3D object detection on the S3DIS dataset
- Support compilation on Windows
- Full benchmark for PAConv on S3DIS
- Further enhancement for documentation, especially on the Chinese documentation
#### New Features
- Support 3D object detection on the S3DIS dataset (#835)
- Support PointRCNN (#842, #843, #856, #974, #1022, #1109, #1125)
- Support DGCNN (#896)
- Support PGD (#938, #940, #948, #950, #964, #1014, #1065, #1070, #1157)
- Support SMOKE (#939, #955, #959, #975, #988, #999, #1029)
- Support MonoFlex (#1026, #1044, #1114, #1115, #1183)
- Support CPU Training (#1196)
#### Improvements
- Support point sampling based on distance metric (#667, #840)
- Refactor coordinate systems (#677, #774, #803, #899, #906, #912, #968, #1001)
- Unify camera keys in PointFusion and transformations between different systems (#791, #805)
- Refine documentation (#792, #827, #829, #836, #849, #854, #859, #1111, #1113, #1116, #1121, #1132, #1135, #1185, #1193, #1226)
- Add a script to support benchmark regression (#808)
- Benchmark PAConvCUDA on S3DIS (#847)
- Support to download pdf and epub documentation (#850)
- Change the `repeat` setting in Group-Free-3D configs to reduce training epochs (#855)
- Support KITTI AP40 evaluation metric (#927)
- Add the mmdet3d2torchserve tool for SECOND (#977)
- Add code-spell pre-commit hook and fix typos (#995)
- Support the latest numba version (#1043)
- Set a default seed to use when the random seed is not specified (#1072)
- Distribute mix-precision models to each algorithm folder (#1074)
- Add abstract and a representative figure for each algorithm (#1086)
- Upgrade pre-commit hook (#1088, #1217)
- Support augmented data and ground truth visualization (#1092)
- Add local yaw property for `CameraInstance3DBoxes` (#1130)
- Lock the required numba version to 0.53.0 (#1159)
- Support the usage of plane information for KITTI dataset (#1162)
- Deprecate the support for "python setup.py test" (#1164)
- Reduce the number of multi-process threads to accelerate training (#1168)
- Support 3D flip augmentation for semantic segmentation (#1181)
- Update README format for each model (#1195)
#### Bug Fixes
- Fix compiling errors on Windows (#766)
- Fix the deprecated nms setting in the ImVoteNet config (#828)
- Use the latest `wrap_fp16_model` import from mmcv (#861)
- Remove 2D annotations generation on Lyft (#867)
- Update index files for the Chinese documentation to be consistent with the English version (#873)
- Fix the nested list transpose in the CenterPoint head (#879)
- Fix deprecated pretrained model loading for RegNet (#889)
- Fix the incorrect dimension indices of rotations and testing config in the CenterPoint test time augmentation (#892)
- Fix and improve visualization tools (#956, #1066, #1073)
- Fix PointPillars FLOPs calculation error (#1075)
- Fix missing dimension information in the SUN RGB-D data generation (#1120)
- Fix incorrect anchor range settings in the PointPillars [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/models/hv_pointpillars_secfpn_kitti.py) for KITTI (#1163)
- Fix incorrect model information in the RegNet metafile (#1184)
- Fix bugs in non-distributed multi-gpu training and testing (#1197)
- Fix a potential assertion error when generating corners from an empty box (#1212)
- Upgrade bazel version according to the requirement of Waymo Devkit (#1223)
#### Contributors
A total of 12 developers contributed to this release.
@THU17cyz, @wHao-Wu, @wangruohui, @Wuziyi616, @filaPro, @ZwwWayne, @Tai-Wang, @DCNSW, @xieenze, @robin-karlsson0, @ZCMax, @Otteri
### v0.18.1 (1/2/2022)
#### Improvements
- Support Flip3D augmentation in semantic segmentation task (#1182)
- Update regnet metafile (#1184)
- Add point cloud annotation tools introduction in FAQ (#1185)
- Add missing explanations of `cam_intrinsic` in the nuScenes dataset doc (#1193)
#### Bug Fixes
- Deprecate the support for "python setup.py test" (#1164)
- Fix the rotation matrix while rotation axis=0 (#1182)
- Fix the bug in non-distributed multi-gpu training/testing (#1197)
- Fix a potential bug when generating corners for empty bounding boxes (#1212)
#### Contributors
A total of 4 developers contributed to this release.
@ZwwWayne, @ZCMax, @Tai-Wang, @wHao-Wu
### v0.18.0 (1/1/2022)
#### Highlights
- Update the required minimum version of mmdet and mmseg
#### Improvements
- Use the official markdownlint hook and add codespell hook for pre-committing (#1088)
- Improve CI operation (#1095, #1102, #1103)
- Use shared menu content from OpenMMLab's theme and remove duplicated contents from config (#1111)
- Refactor the structure of documentation (#1113, #1121)
- Update the required minimum version of mmdet and mmseg (#1147)
#### Bug Fixes
- Fix symlink failure on Windows (#1096)
- Fix the upper bound of mmcv version in the mminstall requirements (#1104)
- Fix API documentation compilation and mmcv build errors (#1116)
- Fix figure links and pdf documentation compilation (#1132, #1135)
#### Contributors
A total of 4 developers contributed to this release.
@ZwwWayne, @ZCMax, @Tai-Wang, @wHao-Wu
### v0.17.3 (1/12/2021)
#### Improvements
- Change the default show value to `False` in show_result function to avoid unnecessary errors (#1034)
- Improve the visualization of detection results with colorized points in [single_gpu_test](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/apis/test.py#L11) (#1050)
- Clean unnecessary custom_imports in entrypoints (#1068)
#### Bug Fixes
- Update mmcv version in the Dockerfile (#1036)
- Fix the memory-leak problem when loading checkpoints in [init_model](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/apis/inference.py#L36) (#1045)
- Fix incorrect velocity indexing when formatting boxes on nuScenes (#1049)
- Explicitly set cuda device ID in [init_model](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/apis/inference.py#L36) to avoid memory allocation on unexpected devices (#1056)
- Fix PointPillars FLOPs calculation error (#1076)
#### Contributors
A total of 5 developers contributed to this release.
@wHao-Wu, @Tai-Wang, @ZCMax, @MilkClouds, @aldakata
### v0.17.2 (1/11/2021)
#### Improvements
- Update Group-Free-3D and FCOS3D bibtex (#985)
- Update the solutions for incompatibility of pycocotools in the FAQ (#993)
- Add Chinese documentation for the KITTI (#1003) and Lyft (#1010) dataset tutorial
- Add the H3DNet checkpoint converter for incompatible keys (#1007)
#### Bug Fixes
- Update mmdetection and mmsegmentation version in the Dockerfile (#992)
- Fix links in the Chinese documentation (#1015)
#### Contributors
A total of 4 developers contributed to this release.
@Tai-Wang, @wHao-Wu, @ZwwWayne, @ZCMax
### v0.17.1 (1/10/2021)
#### Highlights
- Support a faster but non-deterministic version of hard voxelization
- Completion of dataset tutorials and the Chinese documentation
- Improved the aesthetics of the documentation format
#### Improvements
- Add Chinese documentation for training on customized datasets and designing customized models (#729, #820)
- Support a faster but non-deterministic version of hard voxelization (#904)
- Update paper titles and code details for metafiles (#917)
- Add a tutorial for KITTI dataset (#953)
- Use Pytorch sphinx theme to improve the format of documentation (#958)
- Use the docker to accelerate CI (#971)
#### Bug Fixes
- Fix the sphinx version used in the documentation (#902)
- Fix a dynamic scatter bug that discards the first voxel by mistake when all input points are valid (#915)
- Fix the inconsistent variable names used in the [unit test](https://github.com/open-mmlab/mmdetection3d/blob/master/tests/test_models/test_voxel_encoder/test_voxel_generator.py) for voxel generator (#919)
- Upgrade to use `build_prior_generator` to replace the legacy `build_anchor_generator` (#941)
- Fix a minor bug caused by a too small difference set in the FreeAnchor Head (#944)
#### Contributors
A total of 8 developers contributed to this release.
@DCNSW, @zhanggefan, @mickeyouyou, @ZCMax, @wHao-Wu, @tojimahammatov, @xiliu8006, @Tai-Wang
### v0.17.0 (1/9/2021)
#### Compatibility
- Unify the camera keys for consistent transformation between coordinate systems on different datasets. The modification changes the key names to `lidar2img`, `depth2img`, `cam2img`, etc., for easier understanding. Customized codes using legacy keys may be influenced.
- The next release will begin to move files of CUDA ops to [MMCV](https://github.com/open-mmlab/mmcv). It will influence the way to import related functions. We will not break the compatibility but will raise a warning first and please prepare to migrate it.
#### Highlights
- Support 3D object detection on the S3DIS dataset
- Support compilation on Windows
- Full benchmark for PAConv on S3DIS
- Further enhancement for documentation, especially on the Chinese documentation
#### New Features
- Support 3D object detection on the S3DIS dataset (#835)
#### Improvements
- Support point sampling based on distance metric (#667, #840)
- Update PointFusion to support unified camera keys (#791)
- Add Chinese documentation for customized dataset (#792), data pipeline (#827), customized runtime (#829), 3D Detection on ScanNet (#836), nuScenes (#854) and Waymo (#859)
- Unify camera keys used in transformation between different systems (#805)
- Add a script to support benchmark regression (#808)
- Benchmark PAConvCUDA on S3DIS (#847)
- Add a tutorial for 3D detection on the Lyft dataset (#849)
- Support to download pdf and epub documentation (#850)
- Change the `repeat` setting in Group-Free-3D configs to reduce training epochs (#855)
#### Bug Fixes
- Fix compiling errors on Windows (#766)
- Fix the deprecated nms setting in the ImVoteNet config (#828)
- Use the latest `wrap_fp16_model` import from mmcv (#861)
- Remove 2D annotations generation on Lyft (#867)
- Update index files for the Chinese documentation to be consistent with the English version (#873)
- Fix the nested list transpose in the CenterPoint head (#879)
- Fix deprecated pretrained model loading for RegNet (#889)
#### Contributors
A total of 11 developers contributed to this release.
@THU17cyz, @wHao-Wu, @wangruohui, @Wuziyi616, @filaPro, @ZwwWayne, @Tai-Wang, @DCNSW, @xieenze, @robin-karlsson0, @ZCMax
### v0.16.0 (1/8/2021)
#### Compatibility
- Remove the rotation and dimension hack in the monocular 3D detection on nuScenes by applying corresponding transformation in the pre-processing and post-processing. The modification only influences nuScenes coco-style json files. Please re-run the data preparation scripts if necessary. See more details in the PR #744.
- Add a new pre-processing module for the ScanNet dataset in order to support multi-view detectors. Please run the updated scripts to extract the RGB data and its annotations. See more details in the PR #696.
#### Highlights
- Support to use [MIM](https://github.com/open-mmlab/mim) with pip installation
- Support PAConv [models and benchmarks](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/paconv) on S3DIS
- Enhance the documentation especially on dataset tutorials
#### New Features
- Support RGB images on ScanNet for multi-view detectors (#696)
- Support FLOPs and number of parameters calculation (#736)
- Support to use [MIM](https://github.com/open-mmlab/mim) with pip installation (#782)
- Support PAConv models and benchmarks on the S3DIS dataset (#783, #809)
#### Improvements
- Refactor Group-Free-3D to make it inherit BaseModule from MMCV (#704)
- Modify the initialization methods of FCOS3D to be consistent with the refactored approach (#705)
- Benchmark the Group-Free-3D [models](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/groupfree3d) on ScanNet (#710)
- Add Chinese documentation for Getting Started (#725), FAQ (#730), Model Zoo (#735), Demo (#745), Quick Run (#746), Data Preparation (#787) and Configs (#788)
- Add documentation for semantic segmentation on ScanNet and S3DIS (#743, #747, #806, #807)
- Add a parameter `max_keep_ckpts` to limit the maximum number of saved Group-Free-3D checkpoints (#765)
- Add documentation for 3D detection on SUN RGB-D and nuScenes (#770, #793)
- Remove mmpycocotools in the Dockerfile (#785)
#### Bug Fixes
- Fix versions of OpenMMLab dependencies (#708)
- Convert `rt_mat` to `torch.Tensor` in coordinate transformation for compatibility (#709)
- Fix the `bev_range` initialization in `ObjectRangeFilter` according to the `gt_bboxes_3d` type (#717)
- Fix Chinese documentation and incorrect doc format due to the incompatible Sphinx version (#718)
- Fix a potential bug when setting `interval == 1` in [analyze_logs.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/analysis_tools/analyze_logs.py) (#720)
- Update the structure of Chinese documentation (#722)
- Fix FCOS3D FPN BC-Breaking caused by the code refactoring in MMDetection (#739)
- Fix wrong `in_channels` when `with_distance=True` in the [Dynamic VFE Layers](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/models/voxel_encoders/voxel_encoder.py#L87) (#749)
- Fix the dimension and yaw hack of FCOS3D on nuScenes (#744, #794, #795, #818)
- Fix the missing default `bbox_mode` in the `show_multi_modality_result` (#825)
#### Contributors
A total of 12 developers contributed to this release.
@yinchimaoliang, @gopi231091, @filaPro, @ZwwWayne, @ZCMax, @hjin2902, @wHao-Wu, @Wuziyi616, @xiliu8006, @THU17cyz, @DCNSW, @Tai-Wang
### v0.15.0 (1/7/2021)
#### Compatibility
In order to fix the problem that the priority of EvalHook was too low, all hook priorities were re-adjusted in MMCV 1.3.8, so MMDetection 2.14.0 needs to rely on MMCV v1.3.8 or later. For related information, please refer to [#1120](https://github.com/open-mmlab/mmcv/pull/1120); for related issues, please refer to [#5343](https://github.com/open-mmlab/mmdetection/issues/5343).
#### Highlights
- Support [PAConv](https://arxiv.org/abs/2103.14635)
- Support monocular/multi-view 3D detector [ImVoxelNet](https://arxiv.org/abs/2106.01178) on KITTI
- Support Transformer-based 3D detection method [Group-Free-3D](https://arxiv.org/abs/2104.00678) on ScanNet
- Add documentation for tasks including LiDAR-based 3D detection, vision-only 3D detection and point-based 3D semantic segmentation
- Add dataset documents like ScanNet
#### New Features
- Support Group-Free-3D on ScanNet (#539)
- Support PAConv modules (#598, #599)
- Support ImVoxelNet on KITTI (#627, #654)
#### Improvements
- Add unit tests for pipeline functions `LoadImageFromFileMono3D`, `ObjectNameFilter` and `ObjectRangeFilter` (#615)
- Enhance [IndoorPatchPointSample](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/pipelines/transforms_3d.py) (#617)
- Refactor model initialization methods based on MMCV (#622)
- Add Chinese docs (#629)
- Add documentation for LiDAR-based 3D detection (#642)
- Unify intrinsic and extrinsic matrices for all datasets (#653)
- Add documentation for point-based 3D semantic segmentation (#663)
- Add documentation of ScanNet for 3D detection (#664)
- Refine docs for tutorials (#666)
- Add documentation for vision-only 3D detection (#669)
- Refine docs for Quick Run and Useful Tools (#686)
#### Bug Fixes
- Fix the bug of [BackgroundPointsFilter](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/pipelines/transforms_3d.py) using the bottom center of ground truth (#609)
- Fix [LoadMultiViewImageFromFiles](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/pipelines/loading.py) to unravel stacked multi-view images to list to be consistent with DefaultFormatBundle (#611)
- Fix the potential bug in [analyze_logs](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/analysis_tools/analyze_logs.py) when the training resumes from a checkpoint or is stopped before evaluation (#634)
- Fix test commands in docs and make some refinements (#635)
- Fix wrong config paths in unit tests (#641)
### v0.14.0 (1/6/2021)
#### Highlights
- Support the point cloud segmentation method [PointNet++](https://arxiv.org/abs/1706.02413)
#### New Features
- Support PointNet++ (#479, #528, #532, #541)
- Support RandomJitterPoints transform for point cloud segmentation (#584)
- Support RandomDropPointsColor transform for point cloud segmentation (#585)
#### Improvements
- Move the point alignment of ScanNet from data pre-processing to pipeline (#439, #470)
- Add compatibility document to provide detailed descriptions of BC-breaking changes (#504)
- Add MMSegmentation installation requirement (#535)
- Support points rotation even without bounding box in GlobalRotScaleTrans for point cloud segmentation (#540)
- Support visualization of detection results and dataset browse for nuScenes Mono-3D dataset (#542, #582)
- Support faster implementation of KNN (#586)
- Support RegNetX models on Lyft dataset (#589)
- Remove a useless parameter `label_weight` from segmentation datasets including `Custom3DSegDataset`, `ScanNetSegDataset` and `S3DISSegDataset` (#607)
#### Bug Fixes
- Fix a corrupted lidar data file in Lyft dataset in [data_preparation](https://github.com/open-mmlab/mmdetection3d/tree/master/docs/data_preparation.md) (#546)
- Fix evaluation bugs in nuScenes and Lyft dataset (#549)
- Fix converting points between coordinates with specific transformation matrix in the [coord_3d_mode.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/coord_3d_mode.py) (#556)
- Support PointPillars models on Lyft dataset (#578)
- Fix the bug of demo with pre-trained VoteNet model on ScanNet (#600)
### v0.13.0 (1/5/2021)
#### Highlights
- Support a monocular 3D detection method [FCOS3D](https://arxiv.org/abs/2104.10956)
- Support ScanNet and S3DIS semantic segmentation dataset
- Enhancement of visualization tools for dataset browsing and demos, including support of visualization for multi-modality data and point cloud segmentation.
#### New Features
- Support ScanNet semantic segmentation dataset (#390)
- Support monocular 3D detection on nuScenes (#392)
- Support multi-modality visualization (#405)
- Support nuimages visualization (#408)
- Support monocular 3D detection on KITTI (#415)
- Support online visualization of semantic segmentation results (#416)
- Support ScanNet test results submission to online benchmark (#418)
- Support S3DIS data pre-processing and dataset class (#433)
- Support FCOS3D (#436, #442, #482, #484)
- Support dataset browse for multiple types of datasets (#467)
- Add paper-with-code (PWC) metafiles for each model in the model zoo (#485)
#### Improvements
- Support dataset browsing for SUNRGBD, ScanNet or KITTI points and detection results (#367)
- Add the pipeline to load data using file client (#430)
- Support to customize the type of runner (#437)
- Make pipeline functions process points and masks simultaneously when sampling points (#444)
- Add waymo unit tests (#455)
- Split the visualization of projecting points onto image from that for only points (#480)
- Efficient implementation of PointSegClassMapping (#489)
- Use the new model registry from mmcv (#495)
#### Bug Fixes
- Fix Pytorch 1.8 Compilation issue in the [scatter_points_cuda.cu](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/voxel/src/scatter_points_cuda.cu) (#404)
- Fix [dynamic_scatter](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/voxel/src/scatter_points_cuda.cu) errors triggered by empty point input (#417)
- Fix the bug of missing points caused by using break incorrectly in the voxelization (#423)
- Fix the missing `coord_type` in the waymo dataset [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/datasets/waymoD5-3d-3class.py) (#441)
- Fix errors in four unittest functions of [configs](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d.py), [test_detectors.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tests/test_models/test_detectors.py), [test_heads.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tests/test_models/test_heads/test_heads.py) (#453)
- Fix 3DSSD training errors and simplify configs (#462)
- Clamp 3D votes projections to image boundaries in ImVoteNet (#463)
- Update out-of-date names of pipelines in the [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/benchmark/hv_pointpillars_secfpn_3x8_100e_det3d_kitti-3d-car.py) of pointpillars benchmark (#474)
- Fix the lack of a placeholder when unpacking RPN targets in the [h3d_bbox_head.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/models/roi_heads/bbox_heads/h3d_bbox_head.py) (#508)
- Fix the incorrect value of `K` when creating pickle files for SUN RGB-D (#511)
### v0.12.0 (1/4/2021)
#### Highlights
- Support a new multi-modality method [ImVoteNet](https://arxiv.org/abs/2001.10692).
- Support PyTorch 1.7 and 1.8
- Refactor the structure of tools and [train.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/train.py)/[test.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/test.py)
#### New Features
- Support LiDAR-based semantic segmentation metrics (#332)
- Support [ImVoteNet](https://arxiv.org/abs/2001.10692) (#352, #384)
- Support the KNN GPU operation (#360, #371)
#### Improvements
- Add FAQ for common problems in the documentation (#333)
- Refactor the structure of tools (#339)
- Refactor [train.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/train.py) and [test.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/test.py) (#343)
- Support demo on nuScenes (#353)
- Add 3DSSD checkpoints (#359)
- Update the Bibtex of CenterPoint (#368)
- Add citation format and reference to other OpenMMLab projects in the README (#374)
- Upgrade the mmcv version requirements (#376)
- Add numba and numpy version requirements in FAQ (#379)
- Avoid unnecessary for-loop execution of vfe layer creation (#389)
- Update SUNRGBD dataset documentation to stress the requirements for training ImVoteNet (#391)
- Modify vote head to support 3DSSD (#396)
#### Bug Fixes
- Fix missing keys `coord_type` in database sampler config (#345)
- Rename H3DNet configs (#349)
- Fix CI by using ubuntu 18.04 in github workflow (#350)
- Add assertions to avoid 4-dim points being input to [points_in_boxes](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/roiaware_pool3d/points_in_boxes.py) (#357)
- Fix the SECOND results on Waymo in the corresponding [README](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/second) (#363)
- Fix the incorrect adopted pipeline when adding val to workflow (#370)
- Fix a potential bug when indices used in the backwarding in ThreeNN (#377)
- Fix a compilation error triggered by [scatter_points_cuda.cu](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/voxel/src/scatter_points_cuda.cu) in PyTorch 1.7 (#393)
### v0.11.0 (1/3/2021)
#### Highlights
- Support more friendly visualization interfaces based on open3d
- Support a faster and more memory-efficient implementation of DynamicScatter
- Refactor unit tests and details of configs
#### New Features
- Support new visualization methods based on open3d (#284, #323)
#### Improvements
- Refactor unit tests (#303)
- Move the key `train_cfg` and `test_cfg` into the model configs (#307)
- Update [README](https://github.com/open-mmlab/mmdetection3d/blob/master/README.md/) with [Chinese version](https://github.com/open-mmlab/mmdetection3d/blob/master/README_zh-CN.md/) and [instructions for getting started](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/getting_started.md/). (#310, #316)
- Support a faster and more memory-efficient implementation of DynamicScatter (#318, #326)
#### Bug Fixes
- Fix an unsupported bias setting in the unit test for centerpoint head (#304)
- Fix errors due to typos in the centerpoint head (#308)
- Fix a minor bug in [points_in_boxes.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/roiaware_pool3d/points_in_boxes.py) when tensors are not in the same device. (#317)
- Fix warning of deprecated usages of nonzero during training with PyTorch 1.6 (#330)
### v0.10.0 (1/2/2021)
#### Highlights
- Preliminary release of API for SemanticKITTI dataset.
- Documentation and demo enhancement for better user experience.
- Fix a number of underlying minor bugs and add some corresponding important unit tests.
#### New Features
- Support SemanticKITTI dataset preliminarily (#287)
#### Improvements
- Add tag to README in configurations for specifying different uses (#262)
- Update instructions for evaluation metrics in the documentation (#265)
- Add nuImages entry in [README.md](https://github.com/open-mmlab/mmdetection3d/blob/master/README.md/) and gif demo (#266, #268)
- Add unit test for voxelization (#275)
#### Bug Fixes
- Fixed the issue of unpacking size in [furthest_point_sample.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/furthest_point_sample/furthest_point_sample.py) (#248)
- Fix bugs for 3DSSD triggered by empty ground truths (#258)
- Remove models without checkpoints in model zoo statistics of documentation (#259)
- Fix some unclear installation instructions in [getting_started.md](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/getting_started.md/) (#269)
- Fix relative paths/links in the documentation (#271)
- Fix a minor bug in [scatter_points_cuda.cu](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/voxel/src/scatter_points_cuda.cu) when num_features != 4 (#275)
- Fix the bug about missing text files when testing on KITTI (#278)
- Fix issues caused by inplace modification of tensors in `BaseInstance3DBoxes` (#283)
- Fix log analysis for evaluation and adjust the documentation accordingly (#285)
### v0.9.0 (31/12/2020)
#### Highlights
- Documentation refactoring with better structure, especially about how to implement new models and customized datasets.
- More compatible with refactored point structure by bug fixes in ground truth sampling.
#### Improvements
- Documentation refactoring (#242)
#### Bug Fixes
- Fix point structure related bugs in ground truth sampling (#211)
- Fix loading points in ground truth sampling augmentation on nuScenes (#221)
- Fix channel setting in the SeparateHead of CenterPoint (#228)
- Fix evaluation for indoors 3D detection in case of less classes in prediction (#231)
- Remove unreachable lines in nuScenes data converter (#235)
- Minor adjustments of numpy implementation for perspective projection and prediction filtering criterion in KITTI evaluation (#241)
### v0.8.0 (30/11/2020)
#### Highlights
- Refactor points structure with more constructive and clearer implementation.
- Support axis-aligned IoU loss for VoteNet with better performance.
- Update and enhance [SECOND](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/second) benchmark on Waymo.
#### New Features
- Support axis-aligned IoU loss for VoteNet. (#194)
- Support points structure for consistent processing of all the point related representation. (#196, #204)
#### Improvements
- Enhance [SECOND](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/second) benchmark on Waymo with stronger baselines. (#205)
- Add model zoo statistics and polish the documentation. (#201)
### v0.7.0 (1/11/2020)
#### Highlights
- Support a new method [SSN](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123700579.pdf) with benchmarks on nuScenes and Lyft datasets.
- Update benchmarks for SECOND on Waymo, CenterPoint with TTA on nuScenes and models with mixed precision training on KITTI and nuScenes.
- Support semantic segmentation on nuImages and provide [HTC](https://arxiv.org/abs/1901.07518) models with configurations and performance for reference.
#### New Features
- Modify the primitive head to support the setting on the SUN RGB-D dataset (#136)
- Support semantic segmentation and [HTC](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/nuimages) with models for reference on nuImages dataset (#155)
- Support [SSN](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/ssn) on nuScenes and Lyft datasets (#147, #174, #166, #182)
- Support double flip for test time augmentation of CenterPoint with updated benchmark (#143)
#### Improvements
- Update [SECOND](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/second) benchmark with configurations for reference on Waymo (#166)
- Delete checkpoints on Waymo to comply with its specific license agreement (#180)
- Update models and instructions with [mixed precision training](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/fp16) on KITTI and nuScenes (#178)
#### Bug Fixes
- Fix incorrect code weights in anchor3d_head when introducing mixed precision training (#173)
- Fix the incorrect label mapping on nuImages dataset (#155)
### v0.6.1 (11/10/2020)
#### Highlights
- Support mixed precision training of voxel-based methods
- Support docker with PyTorch 1.6.0
- Update baseline configs and results ([CenterPoint](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/centerpoint) on nuScenes and [PointPillars](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/pointpillars) on Waymo with full dataset)
- Switch model zoo to download.openmmlab.com
#### New Features
- Support dataset pipeline `VoxelBasedPointSampler` to sample multi-sweep points based on voxelization. (#125)
- Support mixed precision training of voxel-based methods (#132)
- Support docker with PyTorch 1.6.0 (#160)
#### Improvements
- Reduce requirements for the case exclusive of Waymo (#121)
- Switch model zoo to download.openmmlab.com (#126)
- Update docs related to Waymo (#128)
- Add version assertion in the [init file](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/__init__.py) (#129)
- Add evaluation interval setting for CenterPoint (#131)
- Add unit test for CenterPoint (#133)
- Update [PointPillars](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/pointpillars) baselines on Waymo with full dataset (#142)
- Update [CenterPoint](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/centerpoint) results with models and logs (#154)
#### Bug Fixes
- Fix a bug of visualization in multi-batch case (#120)
- Fix bugs in dcn unit test (#130)
- Fix dcn bias bug in centerpoint (#137)
- Fix dataset mapping in the evaluation of nuScenes mini dataset (#140)
- Fix origin initialization in `CameraInstance3DBoxes` (#148, #150)
- Correct documentation link in the getting_started.md (#159)
- Fix model save path bug in gather_models.py (#153)
- Fix image padding shape bug in `PointFusion` (#162)
### v0.6.0 (20/9/2020)
#### Highlights
- Support new methods [H3DNet](https://arxiv.org/abs/2006.05682), [3DSSD](https://arxiv.org/abs/2002.10187), [CenterPoint](https://arxiv.org/abs/2006.11275).
- Support new dataset [Waymo](https://waymo.com/open/) (with PointPillars baselines) and [nuImages](https://www.nuscenes.org/nuimages) (with Mask R-CNN and Cascade Mask R-CNN baselines).
- Support Batch Inference
- Support Pytorch 1.6
- Start to publish `mmdet3d` package to PyPI since v0.5.0. You can use mmdet3d through `pip install mmdet3d`.
#### Backwards Incompatible Changes
- Support Batch Inference (#95, #103, #116): MMDetection3D v0.6.0 migrates to support batch inference based on MMDetection >= v2.4.0. This change influences all the test APIs in MMDetection3D and downstream codebases.
- Start to use collect environment function from MMCV (#113): MMDetection3D v0.6.0 migrates to use `collect_env` function in MMCV.
`get_compiler_version` and `get_compiling_cuda_version` compiled in `mmdet3d.ops.utils` are removed. Please import these two functions from `mmcv.ops`.
#### New Features
- Support [nuImages](https://www.nuscenes.org/nuimages) dataset by converting them into coco format and release Mask R-CNN and Cascade Mask R-CNN baseline models (#91, #94)
- Support to publish to PyPI in github-action (#17, #19, #25, #39, #40)
- Support CBGSDataset and make it generally applicable to all the supported datasets (#75, #94)
- Support [H3DNet](https://arxiv.org/abs/2006.05682) and release models on ScanNet dataset (#53, #58, #105)
- Support Fusion Point Sampling used in [3DSSD](https://arxiv.org/abs/2002.10187) (#66)
- Add `BackgroundPointsFilter` to filter background points in data pipeline (#84)
- Support pointnet2 with multi-scale grouping in backbone and refactor pointnets (#82)
- Support dilated ball query used in [3DSSD](https://arxiv.org/abs/2002.10187) (#96)
- Support [3DSSD](https://arxiv.org/abs/2002.10187) and release models on KITTI dataset (#83, #100, #104)
- Support [CenterPoint](https://arxiv.org/abs/2006.11275) and release models on nuScenes dataset (#49, #92)
- Support [Waymo](https://waymo.com/open/) dataset and release PointPillars baseline models (#118)
- Allow `LoadPointsFromMultiSweeps` to pad empty sweeps and select multiple sweeps randomly (#67)
#### Improvements
- Fix all warnings and bugs in PyTorch 1.6.0 (#70, #72)
- Update issue templates (#43)
- Update unit tests (#20, #24, #30)
- Update documentation for using `ply` format point cloud data (#41)
- Use points loader to load point cloud data in ground truth (GT) samplers (#87)
- Unify version file of OpenMMLab projects by using `version.py` (#112)
- Remove unnecessary data preprocessing commands of SUN RGB-D dataset (#110)
#### Bug Fixes
- Rename CosineAnealing to CosineAnnealing (#57)
- Fix device inconsistent bug in 3D IoU computation (#69)
- Fix a minor bug in json2csv of lyft dataset (#78)
- Add missed test data for pointnet modules (#85)
- Fix `use_valid_flag` bug in `CustomDataset` (#106)
### v0.5.0 (9/7/2020)
MMDetection3D is released.
# Compatibility
## v1.1.0rc0
### OpenMMLab v2.0 Refactoring
In this version, we did a large-scale refactoring based on MMEngine to achieve unified data elements, model interfaces, visualizers, evaluators and other runtime modules across different datasets, tasks and even codebases. A brief summary of this refactoring is as follows:
- Data Element:
- We add [`Det3DDataSample`](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/structures/det3d_data_sample.py) as the common data element passing through datasets and models. It inherits from [`DetDataSample`](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/structures/det_data_sample.py) in MMDetection and implements `InstanceData`, `PixelData`, and `LabelData`, which inherit from `BaseDataElement` in MMEngine, to represent different types of ground truth labels or predictions.
- Datasets:
- We add [`Det3DDataset`](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/datasets/det3d_dataset.py) and [`Seg3DDataset`](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/datasets/seg3d_dataset.py) as the base datasets to inherit from the unified `BaseDataset` in MMEngine. They implement most functions that are commonly used across different datasets and simplify the info loading/processing in the current datasets. Re-defined input arguments and functions can be most re-used in different datasets, which are important for the implementation of customized datasets.
- We define the common keys across different datasets and unify all the info files with a standard protocol. The same information is clearer for users because it shares the same keys across different dataset infos. Besides, for different settings, such as camera-only and LiDAR-only methods, we no longer need different info formats (like the previous pkl and json files). We can just revise `parse_data_info` to read the necessary information from a complete info file.
- We add `train_dataloader`, `val_dataloader` and `test_dataloader` to replace the original `data` field in the config. This simplifies the nesting levels of data-related fields (see the config sketch after this list).
- Data Transforms
- Based on the basic transforms and wrappers re-implemented and simplified in the latest MMCV, we refactor data transforms to inherit from them.
- We also adjust the implementation of current data pipelines to make them compatible with our latest data protocol.
- Normalization, padding of images, and voxelization operations are moved to the data preprocessor.
- `DefaultFormatBundle3D` and `Collect3D` are replaced with `Pack3DDetInputs` to pack the data into the element format as model input.
- Models
- Unify the model interface as `inputs`, `data_samples`, `return_loss=False`
- The basic pre-processing before model forward includes: 1) convert input from CPU to GPU tensors; 2) padding images; 3) normalize images; 4) voxelization.
- Return a `loss_dict` during training and a `list[data_sample]` during inference
- Simplify function interfaces in the models
- Add `preprocess_cfg` in the model configs for pre-processing
- Visualizer
- Design a unified visualizer, [`Det3DLocalVisualizer`](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/visualization/local_visualizer.py), based on MMEngine for different 3D tasks and settings
- Support browsing dataset and visualization hooks based on the [`Det3DLocalVisualizer`](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/visualization/local_visualizer.py)
- Evaluator
- Decouple evaluators from datasets to make them more flexible: the evaluation code of each dataset is implemented as a dedicated metric class
- Add evaluator information to the current dataset configs
- Registry
- Refactor all the registries to inherit from root registries in MMEngine
- When using modules from other codebases, it is necessary to specify the registry scope, such as `mmdet.ResNet`
- Others: Refactor logging, hooks, scheduler, runner and other runtime configs based on MMEngine
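As a hedged illustration of the new data-related fields mentioned in the Datasets item above, a config may look like the following sketch; the dataset type, paths, and pipeline contents are placeholders:

```python
train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='KittiDataset',
        data_root='data/kitti/',
        ann_file='kitti_infos_train.pkl',
        pipeline=[]))  # the training transforms go here
# val_dataloader and test_dataloader are defined analogously
```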
## v1.0.0rc1
### Operators Migration
We have adopted CUDA operators compiled from [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/__init__.py) and removed all the CUDA operators in mmdet3d. We no longer need to compile the CUDA operators in mmdet3d.
### Waymo dataset converter refactoring
In this version we did a major code refactoring that boosted the performance of Waymo dataset conversion by multiprocessing.
Meanwhile, we also fixed the imprecise timestamp saving issue in Waymo dataset conversion. This change introduces the following backward compatibility breaks:
- The point cloud .bin files of the Waymo dataset need to be regenerated.
In the .bin files each point occupies 6 `float32` and the meaning of the last `float32` has changed from **imprecise timestamps** to **range frame offset**.
The **range frame offset** for each point is calculated as `ri * h * w + row * w + col` if the point is from the **TOP** lidar, or `-1` otherwise.
`h` and `w` denote the height and width of the TOP lidar's range frame.
`ri`, `row`, and `col` denote the return index and the row and column of the range frame where each point locates (a small decoding sketch follows after this list).
The following tables show the difference across the change:
Before
| Element offset (float32) | 0 | 1 | 2 | 3 | 4 | 5 |
| ------------------------ | :-: | :-: | :-: | :-------: | :--------: | :---------------------: |
| Bytes offset | 0 | 4 | 8 | 12 | 16 | 20 |
| Meaning | x | y | z | intensity | elongation | **imprecise timestamp** |
After
| Element offset (float32) | 0 | 1 | 2 | 3 | 4 | 5 |
| ------------------------ | :-: | :-: | :-: | :-------: | :--------: | :--------------------: |
| Bytes offset | 0 | 4 | 8 | 12 | 16 | 20 |
| Meaning | x | y | z | intensity | elongation | **range frame offset** |
- The objects' point cloud .bin files in the GT-database of the Waymo dataset need to be regenerated, because we also dumped the range frame offset for each point into them.
The following tables show the difference across the change:
Before
| Element offset (float32) | 0 | 1 | 2 | 3 | 4 |
| ------------------------ | :-: | :-: | :-: | :-------: | :--------: |
| Bytes offset | 0 | 4 | 8 | 12 | 16 |
| Meaning | x | y | z | intensity | elongation |
After
| Element offset (float32) | 0 | 1 | 2 | 3 | 4 | 5 |
| ------------------------ | :-: | :-: | :-: | :-------: | :--------: | :--------------------: |
| Bytes offset | 0 | 4 | 8 | 12 | 16 | 20 |
| Meaning | x | y | z | intensity | elongation | **range frame offset** |
- Any configuration that uses the Waymo dataset with GT Augmentation should change `db_sampler.points_loader.load_dim` from `5` to `6`.
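A small sketch of decoding the new per-point range frame offset back into `(ri, row, col)`; the range frame size of 64 x 2650 used as the default for the TOP lidar is an assumption here:

```python
def decode_range_frame_offset(offset: int, h: int = 64, w: int = 2650):
    """Invert `offset = ri * h * w + row * w + col` for TOP-lidar points."""
    if offset < 0:  # -1 marks points that are not from the TOP lidar
        return None
    ri, remainder = divmod(int(offset), h * w)
    row, col = divmod(remainder, w)
    return ri, row, col
```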
## v1.0.0rc0
### Coordinate system refactoring
In this version, we did a major code refactoring which improved the consistency among the three coordinate systems (and corresponding box representation), LiDAR, Camera, and Depth. A brief summary for this refactoring is as follows:
- The three coordinate systems are all right-handed now (which means the yaw angle increases in the counterclockwise direction).
- The LiDAR system `(x_size, y_size, z_size)` corresponds to `(l, w, h)` instead of `(w, l, h)`. This is more natural since `l` is parallel with the direction where the yaw angle is zero, and we prefer using the positive direction of the `x` axis as that direction, which is exactly how we define yaw angle in Depth and Camera coordinate systems.
- The APIs for box-related operations are improved and now are more user-friendly.
#### ***NOTICE!!***
Since definitions of box representation have changed, the annotation data of most datasets require updating:
- SUN RGB-D: Yaw angles in the annotation should be reversed.
- KITTI: For LiDAR boxes in GT databases, (x_size, y_size, z_size, yaw) out of (x, y, z, x_size, y_size, z_size, yaw) should be converted from the old LiDAR coordinate system to the new one. The training/validation data annotations should be left unchanged since they are under the Camera coordinate system, which is unmodified after the refactoring.
- Waymo: Same as KITTI.
- nuScenes: For LiDAR boxes in training/validation data and GT databases, (x_size, y_size, z_size, yaw) out of (x, y, z, x_size, y_size, z_size, yaw) should be converted.
- Lyft: Same as nuScenes.
Please regenerate the data annotation/GT database files or use [`update_data_coords.py`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/tools/update_data_coords.py) to update the data.
To use boxes under Depth and LiDAR coordinate systems, or to convert boxes between different coordinate systems, users should be aware of the difference between the old and new definitions. For example, the rotation, flipping, and bev functions of [`DepthInstance3DBoxes`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/depth_box3d.py) and [`LiDARInstance3DBoxes`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/lidar_box3d.py) and box conversion [functions](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/box_3d_mode.py) have all been reimplemented in the refactoring.
Consequently, functions like [`output_to_lyft_box`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/datasets/lyft_dataset.py) undergo small modifications to adapt to the new LiDAR/Depth boxes.
Since the LiDAR system `(x_size, y_size, z_size)` now corresponds to `(l, w, h)` instead of `(w, l, h)`, the anchor sizes for LiDAR boxes have also changed, e.g., from `[1.6, 3.9, 1.56]` to `[3.9, 1.6, 1.56]`.
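For instance, a hypothetical excerpt of a car anchor generator under the new definition (the values mirror the KITTI car anchor used elsewhere in these docs; only the ordering of `sizes` is the point here):
```python
anchor_generator = dict(
    type='AlignedAnchor3DRangeGenerator',
    ranges=[[0, -39.68, -1.78, 69.12, 39.68, -1.78]],
    # (x_size, y_size, z_size) = (l, w, h); formerly written [1.6, 3.9, 1.56]
    sizes=[[3.9, 1.6, 1.56]],
    rotations=[0, 1.57])
```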
Functions that only involve points are generally unaffected unless they rely on refactored utility functions such as `rotation_3d_in_axis`.
#### Other BC-breaking or new features:
- `array_converter`: Please refer to [array_converter.py](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/utils/array_converter.py). Functions wrapped with `array_converter` can convert array-like input types of `torch.Tensor`, `np.ndarray`, and `list/tuple/float` to `torch.Tensor` to be processed in a unified PyTorch pipeline. The result may finally be converted back to the input type. Most functions in [utils.py](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/utils.py) are wrapped with `array_converter`.
- [`points_in_boxes`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/base_box3d.py) and [`points_in_boxes_batch`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/base_box3d.py) will be deprecated soon. They are renamed to `points_in_boxes_part` and `points_in_boxes_all` respectively, with more detailed docstrings. The major difference between the two functions is that if a point is enclosed by multiple boxes, `points_in_boxes_part` will only return the index of the first enclosing box, while `points_in_boxes_all` will return all the indices of the enclosing boxes (a toy call is sketched after this list).
- `rotation_3d_in_axis`: Please refer to [utils.py](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/utils.py). Now this function supports multiple input types and more options. The function with the same name in [box_np_ops.py](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/box_np_ops.py) is deleted since we do not need another function to handle NumPy data. `rotation_2d`, `points_cam2img`, and `limit_period` in box_np_ops.py are also deleted for the same reason.
- `bev` method of [`CameraInstance3DBoxes`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/cam_box3d.py): Changed it to be consistent with the definition of bev in Depth and LiDAR coordinate systems.
- Data augmentation utils in [data_augment_utils.py](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/datasets/pipelines/data_augment_utils.py) now follow the rules of a right-handed system.
- We do not need the yaw hacking in KITTI anymore after refining [`get_direction_target`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/models/dense_heads/train_mixins.py). Interested users may refer to PR [#677](https://github.com/open-mmlab/mmdetection3d/pull/677).
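A toy illustration of the `points_in_boxes_part`/`points_in_boxes_all` difference (a sketch only: the import path matches v1.0.0rc0, and these ops typically expect CUDA tensors):
```python
import torch

from mmdet3d.core.bbox import LiDARInstance3DBoxes

# Two overlapping boxes (x, y, z, x_size, y_size, z_size, yaw) and one point
# inside both; LiDAR boxes use a bottom-center origin.
boxes = LiDARInstance3DBoxes(
    torch.tensor([[0., 0., 0., 4., 2., 2., 0.],
                  [0., 0., 0., 2., 2., 2., 0.]]).cuda())
points = torch.tensor([[0.1, 0.1, 0.9]]).cuda()

part = boxes.points_in_boxes_part(points)  # index of the first enclosing box
all_ = boxes.points_in_boxes_all(points)   # indicators over all enclosing boxes
```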
## 0.16.0
### Returned values of `QueryAndGroup` operation
We modified the returned `grouped_xyz` value of the `QueryAndGroup` operation to support the PAConv segmentor. Originally, `grouped_xyz` was centered by subtracting the grouping centers, i.e., it represented the relative positions of the grouped points. Now the subtraction is no longer performed, and the returned `grouped_xyz` stands for the absolute coordinates of these points, as illustrated below.
Note that the other returned variables of `QueryAndGroup`, such as `new_features`, `unique_cnt` and `grouped_idx`, are not affected.
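The change can be summarized with plain tensors (the names and shapes below are illustrative only, not the real op's signature):
```python
import torch

center = torch.rand(2, 3, 16, 1)      # grouping centers (B, 3, npoint, 1)
neighbors = torch.rand(2, 3, 16, 32)  # grouped points (B, 3, npoint, nsample)

grouped_xyz_old = neighbors - center  # before 0.16.0: relative positions
grouped_xyz_new = neighbors           # since 0.16.0: absolute coordinates
```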
### NuScenes coco-style data pre-processing
We remove the rotation and dimension hack in the monocular 3D detection on nuScenes. Specifically, we transform the rotation and dimension of boxes defined by nuScenes devkit to the coordinate system of our `CameraInstance3DBoxes` in the pre-processing and transform them back in the post-processing. In this way, we can remove the corresponding [hack](https://github.com/open-mmlab/mmdetection3d/pull/744/files#diff-5bee5062bd84e6fa25a2fdd71353f6f283dfdc4a66a0316c3b1ca26078c978b6L165) used in the visualization tools. The modification also guarantees the correctness of all the operations based on our `CameraInstance3DBoxes` (such as NMS and flip augmentation) when training monocular 3D detectors.
The modification only influences nuScenes coco-style json files. Please re-run the nuScenes data preparation script if necessary. See more details in the PR [#744](https://github.com/open-mmlab/mmdetection3d/pull/744).
### ScanNet dataset for ImVoxelNet
We adopt a new pre-processing procedure for the ScanNet dataset in order to support ImVoxelNet, which is a multi-view method requiring image data. In previous versions of MMDetection3D, the ScanNet dataset was only used for point-cloud-based 3D detection and segmentation methods. We plan to add ImVoxelNet to our model zoo, and thus have updated ScanNet correspondingly by adding image-related pre-processing steps. Specifically, we made these changes:
- Add [script](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/extract_posed_images.py) for extracting RGB data.
- Update the [script](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/dataset_converters/scannet_data_utils.py) for annotation creation.
- Add instructions in the documents on preparing image data.
Please refer to the ScanNet [README.md](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md/) for more details.
## 0.15.0
### MMCV Version
In order to fix the problem that the priority of EvalHook was too low, all hook priorities were re-adjusted in MMCV 1.3.8, so MMDetection 2.14.0 needs to rely on the latest MMCV 1.3.8 version. For related information, please refer to [#1120](https://github.com/open-mmlab/mmcv/pull/1120); for related issues, please refer to [#5343](https://github.com/open-mmlab/mmdetection/issues/5343).
### Unified parameter initialization
To unify parameter initialization in OpenMMLab projects, MMCV supports `BaseModule`, which accepts an `init_cfg` to allow the modules' parameters to be initialized in a flexible and unified manner. Now users need to explicitly call `model.init_weights()` in the training script to initialize the model (as in [here](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/train.py#L183)); previously this was handled by the detector. Please refer to PR [#622](https://github.com/open-mmlab/mmdetection3d/pull/622) for details.
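In short, training scripts now contain something like the following sketch (the config path is only an example):
```python
from mmcv import Config

from mmdet3d.models import build_model

cfg = Config.fromfile('configs/votenet/votenet_8x8_scannet-3d-18class.py')
model = build_model(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
model.init_weights()  # must now be called explicitly before training
```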
### BackgroundPointsFilter
We modified the dataset augmentation function `BackgroundPointsFilter` ([here](https://github.com/open-mmlab/mmdetection3d/blob/v0.15.0/mmdet3d/datasets/pipelines/transforms_3d.py#L1132)). In previous versions of MMDetection3D, `BackgroundPointsFilter` changed the bottom center of `gt_bboxes_3d` to the gravity center. In MMDetection3D 0.15.0, `BackgroundPointsFilter` no longer changes it. Please refer to PR [#609](https://github.com/open-mmlab/mmdetection3d/pull/609) for details.
### Enhance `IndoorPatchPointSample` transform
We enhance the pipeline function `IndoorPatchPointSample` used in the point cloud segmentation task by adding more choices for patch selection. Also, we plan to remove the unused parameter `sample_rate` in the future. Please modify your code as well as the config files accordingly if you use this transform.
## 0.14.0
### Dataset class for 3D segmentation task
We remove the useless parameter `label_weight` from segmentation datasets, including `Custom3DSegDataset`, `ScanNetSegDataset` and `S3DISSegDataset`, since this weight is handled in the loss function of the model class. Please modify your code as well as the config files accordingly if you use or inherit from these classes.
### ScanNet data pre-processing
We adopt new pre-processing and conversion steps for the ScanNet dataset. In previous versions of MMDetection3D, the ScanNet dataset was only used for the 3D detection task, where we trained on the training set and tested on the validation set. In MMDetection3D 0.14.0, we further support the 3D segmentation task on ScanNet, which includes online benchmarking on the test set. Since the alignment matrix is not provided for the test set data, we abandon the alignment of points in the data generation steps to support both tasks. Besides, as 3D segmentation requires per-point prediction, we also remove the down-sampling step in data generation.
- In the new ScanNet processing scripts, we save the unaligned points for all of the training, validation and test sets. For the train and val sets with annotations, we also store the `axis_align_matrix` in the data infos. For ground-truth bounding boxes, we store boxes in both aligned and unaligned coordinates, with the keys `gt_boxes_upright_depth` and `unaligned_gt_boxes_upright_depth` respectively in the data infos.
- In `ScanNetDataset`, we now load the `axis_align_matrix` as a part of the data annotations. If it is not contained in old data infos, we will use the identity matrix for compatibility. We also add a transform function `GlobalAlignment` in the ScanNet detection data pipeline to align the points.
- Since the aligned boxes share the same key as in old data infos, we do not need to modify the code related to it. But do remember that they are not in the same coordinate system as the saved points.
- There is a `PointSample` transform in the data pipelines for the ScanNet detection task which down-samples points, so removing down-sampling from data generation does not affect the code.
We have trained a [VoteNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/votenet/votenet_8x8_scannet-3d-18class.py) model on the newly processed ScanNet dataset and got similar benchmark results. In order to prepare ScanNet data for both detection and segmentation tasks, please re-run the new pre-processing scripts following the ScanNet [README.md](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md/).
## 0.12.0
### SUNRGBD dataset for ImVoteNet
We adopt a new pre-processing procedure for the SUNRGBD dataset in order to support ImVoteNet, which is a multi-modality method requiring both image and point cloud data. In previous versions of MMDetection3D, SUNRGBD dataset was only used for point cloud based 3D detection methods. In MMDetection3D 0.12.0, we add ImVoteNet to our model zoo, thus updating SUNRGBD correspondingly by adding image-related pre-processing steps. Specifically, we made these changes:
- Fix a bug in the image file path in meta data.
- Convert calibration matrices from double to float to avoid type mismatch in further operations.
- Add instructions in the documents on preparing image data.
Please refer to the SUNRGBD [README.md](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md/) for more details.
## 0.6.0
### VoteNet and H3DNet model structure update
In MMDetection3D 0.6.0, we updated the model structures of VoteNet and H3DNet; therefore, model checkpoints generated by MMDetection3D \< 0.6.0 should first be converted to a format compatible with the latest structures via [convert_votenet_checkpoints.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/model_converters/convert_votenet_checkpoints.py) and [convert_h3dnet_checkpoints.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/model_converters/convert_h3dnet_checkpoints.py). For more details, please refer to the VoteNet [README.md](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/votenet/README.md/) and H3DNet [README.md](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/h3dnet/README.md/).
# Contribution Guide
OpenMMLab welcomes everyone who is interested in contributing to our projects and accepts contributions in the form of PRs.
## What is PR
`PR` is the abbreviation of `Pull Request`. Here's the definition of a `PR` in the [official documentation](https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) of GitHub.
```
Pull requests let you tell others about changes you have pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the base branch.
```
## Basic Workflow
1. Get the most recent codebase
2. Checkout a new branch from the `dev-1.x` or `dev` branch, depending on the version of the codebase you want to contribute to. The main difference between `dev-1.x` and `dev` is that `dev-1.x` additionally depends on MMEngine and is the main branch we maintain. We strongly recommend creating pull requests based on the more advanced `dev-1.x` branch.
3. Commit your changes ([Don't forget to use pre-commit hooks!](#3-commit-your-changes))
4. Push your changes and create a PR
5. Discuss and review your code
6. Merge your branch to `dev-1.x` / `dev` branch
## Procedures in detail
### 1. Get the most recent codebase
- When you work on your first PR
Fork the OpenMMLab repository: click the **fork** button at the top right corner of the GitHub page
![avatar](https://user-images.githubusercontent.com/34888372/224920532-dc11f696-1175-436a-8c0f-1966f5ca33d1.png)
Clone the forked repository to your local machine
```bash
git clone git@github.com:XXX/mmdetection3d.git
```
Add the source repository as upstream
```bash
git remote add upstream git@github.com:open-mmlab/mmdetection3d
```
- After your first PR
Checkout the latest branch of the local repository and pull the latest branch of the source repository. Here we assume that you are working on the `dev-1.x` branch.
```bash
git checkout dev-1.x
git pull upstream dev-1.x
```
### 2. Checkout a new branch from the `dev-1.x` / `dev` branch
```bash
git checkout -b branchname
```
```{tip}
To keep the commit history clear, we strongly recommend you check out the `dev-1.x` branch before creating a new branch.
```
### 3. Commit your changes
- If you are a first-time contributor, please install and initialize pre-commit hooks from the repository root directory first.
```bash
pip install -U pre-commit
pre-commit install
```
- Commit your changes as usual. Pre-commit hooks will be triggered to stylize your code before each commit.
```bash
# coding
git add [files]
git commit -m 'messages'
```
```{note}
Sometimes your code may be changed by pre-commit hooks. In this case, please remember to re-stage the modified files and commit again.
```
### 4. Push your changes to the forked repository and create a PR
- Push the branch to your forked remote repository
```bash
git push origin branchname
```
- Create a PR
![avatar](https://user-images.githubusercontent.com/34888372/224922548-69455db9-68d1-4d92-a007-afcd2814b1c1.png)
- Revise the PR message template to describe your motivation and the modifications made in this PR. You can also link the related issue to the PR manually in the PR message (for more information, check out the [official guidance](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)).
- Specifically, if you are contributing to `dev-1.x`, you will have to change the base branch of the PR to `dev-1.x` in the PR page, since the default base branch is `master`.
![avatar](https://user-images.githubusercontent.com/34888372/224923009-1d611a30-0bfc-4fe5-93a2-96cc88a18886.png)
- You can also ask a specific person to review the changes you've proposed.
### 5. Discuss and review your code
- Modify your code according to reviewers' suggestions and then push your changes.
### 6. Merge your branch to the `dev-1.x` / `dev` branch and delete the branch
- After the PR is merged by the maintainer, you can delete the branch you created in your forked repository.
```bash
git branch -d branchname # delete local branch
git push origin --delete branchname # delete remote branch
```
## PR Specs
1. Use [pre-commit](https://pre-commit.com) hook to avoid issues of code style
2. One short-time branch should be matched with only one PR
3. Accomplish one detailed change in one PR. Avoid large PRs
- Bad: Support Faster R-CNN
- Acceptable: Add a box head to Faster R-CNN
- Good: Add a parameter to box head to support custom conv-layer number
4. Provide clear and significant commit messages
5. Provide clear and meaningful PR descriptions
- The task name should be clarified in the title. The general format is: \[Prefix\] Short description of the PR (Suffix)
- Prefix: \[Feature\] for adding a new feature, \[Fix\] for fixing a bug, \[Docs\] for document-related changes, \[WIP\] for work in progress (which will temporarily not be reviewed)
- Introduce the main changes, results, and influences on other modules in the short description
- Associate related issues and pull requests with a milestone
# FAQ
We list some potential issues encountered by users and developers, along with their corresponding solutions. Feel free to enrich the list if you find any frequent issues and contribute your solutions to them. If you have any trouble with environment configuration, model training, etc., please create an issue using the [provided templates](https://github.com/open-mmlab/mmdetection3d/blob/master/.github/ISSUE_TEMPLATE/error-report.md) and fill in all required information in the template.
## MMEngine/MMCV/MMDet/MMDet3D Installation
- Compatibility issue between MMEngine, MMCV, MMDetection and MMDetection3D; "ConvWS is already registered in conv layer"; "AssertionError: MMCV==xxx is used but incompatible. Please install mmcv>=xxx, \<=xxx."
- The required versions of MMEngine, MMCV and MMDetection for different versions of MMDetection3D are as below. Please install the correct version of MMEngine, MMCV and MMDetection to avoid installation issues.
| MMDetection3D version | MMEngine version | MMCV version | MMDetection version |
| --------------------- | :----------------------: | :---------------------: | :----------------------: |
| main | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.2.0 | mmdet>=3.0.0rc5, \<3.4.0 |
| v1.4.0 | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.2.0 | mmdet>=3.0.0rc5, \<3.4.0 |
| v1.3.0 | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.2.0 | mmdet>=3.0.0rc5, \<3.3.0 |
| v1.2.0 | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.1.0 | mmdet>=3.0.0, \<3.2.0 |
| v1.1.1 | mmengine>=0.7.1, \<1.0.0 | mmcv>=2.0.0rc4, \<2.1.0 | mmdet>=3.0.0, \<3.1.0 |
**Note:** If you want to install mmdet3d-v1.0.0rcx, the table of compatible MMDetection, MMSegmentation and MMCV versions can be found [here](https://mmdetection3d.readthedocs.io/en/latest/faq.html#mmcv-mmdet-mmdet3d-installation). Please choose the correct versions of MMCV, MMDetection and MMSegmentation to avoid installation issues.
- If you face the error shown below when importing open3d:
`OSError: /lib/x86_64-linux-gnu/libm.so.6: version 'GLIBC_2.27' not found`
please downgrade open3d to 0.9.0.0, because the latest open3d requires 'GLIBC_2.27', which is available in Ubuntu 18.04 but not in Ubuntu 16.04.
- If you face an error when importing pycocotools, it is because nuscenes-devkit installs pycocotools while mmdet relies on mmpycocotools. The current workaround is as below. We will migrate to pycocotools in the future.
```shell
pip uninstall pycocotools mmpycocotools
pip install mmpycocotools
```
**NOTE**: We have migrated to use pycocotools in mmdet3d >= 0.13.0.
- If you face the error shown below when importing pycocotools:
`ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject`
please downgrade pycocotools to 2.0.1 because of the incompatibility between the newest pycocotools and numpy \< 1.20.0. Alternatively, you can compile and install the latest pycocotools from source as below:
`pip install -e "git+https://github.com/cocodataset/cocoapi#egg=pycocotools&subdirectory=PythonAPI"`
or
`pip install -e "git+https://github.com/ppwwyyxx/cocoapi#egg=pycocotools&subdirectory=PythonAPI"`
- If you face some errors about numba in a CUDA 9.0 environment, you should check the version of numba. In a CUDA 9.0 environment, higher versions of numba are not supported, and we suggest installing numba==0.53.0.
## How to annotate point cloud?
MMDetection3D does not support point cloud annotation. Some open-source annotation tools are offered for reference:
- [SUSTechPOINTS](https://github.com/naurril/SUSTechPOINTS)
- [LATTE](https://github.com/bernwang/latte)
Besides, we improved [LATTE](https://github.com/bernwang/latte) for easier use. More details can be found [here](https://arxiv.org/abs/2011.10174).
.. toctree::
:maxdepth: 1
benchmarks.md
changelog_v1.0.x.md
changelog.md
compatibility.md
faq.md
contribution_guides.md
#!/usr/bin/env python
# Collect model zoo statistics from configs/*/README.md and write them
# to modelzoo_statistics.md.
import functools as func
import glob
import re
from os import path as osp

import numpy as np

# The trailing slash is needed so that '../../configs/...' maps to a valid URL.
url_prefix = 'https://github.com/open-mmlab/mmdetection3d/blob/main/'

files = sorted(glob.glob('../../configs/*/README.md'))

stats = []
titles = []
num_ckpts = 0

for f in files:
    url = osp.dirname(f.replace('../../', url_prefix))

    with open(f, 'r') as content_file:
        content = content_file.read()

    # The first line of each README is the paper title.
    title = content.split('\n')[0].replace('# ', '').strip()
    ckpts = set(x.lower().strip()
                for x in re.findall(r'\[model\]\((https?.*)\)', content))

    if len(ckpts) == 0:
        continue

    # The paper type is the first bracketed all-caps tag, e.g. [ALGORITHM].
    _papertype = [x for x in re.findall(r'\[([A-Z]+)\]', content)]
    assert len(_papertype) > 0
    papertype = _papertype[0]

    paper = set([(papertype, title)])

    titles.append(title)
    num_ckpts += len(ckpts)
    statsmsg = f"""
\t* [{papertype}] [{title}]({url}) ({len(ckpts)} ckpts)
"""
    stats.append((paper, ckpts, statsmsg))

allpapers = func.reduce(lambda a, b: a.union(b), [p for p, _, _ in stats])
msglist = '\n'.join(x for _, _, x in stats)

papertypes, papercounts = np.unique([t for t, _ in allpapers],
                                    return_counts=True)
countstr = '\n'.join(
    [f'   - {t}: {c}' for t, c in zip(papertypes, papercounts)])

modelzoo = f"""
# Model Zoo Statistics

* Number of papers: {len(set(titles))}
{countstr}

* Number of checkpoints: {num_ckpts}
{msglist}
"""

with open('modelzoo_statistics.md', 'w') as f:
    f.write(modelzoo)
# Backends Support
We support different file client backends: Disk, Ceph and LMDB, etc. Here is an example of how to modify configs for Ceph-based data loading and saving.
## Load data and annotations from Ceph
We support loading data and generated annotation info files (pkl and json) from Ceph:
```python
# set file client backends as Ceph
backend_args = dict(
backend='petrel',
path_mapping=dict({
'./data/nuscenes/':
's3://openmmlab/datasets/detection3d/nuscenes/', # replace the path with your data path on Ceph
'data/nuscenes/':
's3://openmmlab/datasets/detection3d/nuscenes/' # replace the path with your data path on Ceph
}))
db_sampler = dict(
data_root=data_root,
info_path=data_root + 'kitti_dbinfos_train.pkl',
rate=1.0,
prepare=dict(filter_by_difficulty=[-1], filter_by_min_points=dict(Car=5)),
sample_groups=dict(Car=15),
classes=class_names,
# set file client for points loader to load training data
points_loader=dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4,
use_dim=4,
backend_args=backend_args),
# set file client for data base sampler to load db info file
backend_args=backend_args)
train_pipeline = [
# set file client for loading training data
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4, backend_args=backend_args),
# set file client for loading training data annotations
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, backend_args=backend_args),
dict(type='ObjectSample', db_sampler=db_sampler),
dict(
type='ObjectNoise',
num_try=100,
translation_std=[0.25, 0.25, 0.25],
global_rot_range=[0.0, 0.0],
rot_range=[-0.15707963267, 0.15707963267]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
# set file client for loading validation/testing data
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4, backend_args=backend_args),
dict(
type='MultiScaleFlipAug3D',
img_scale=(1333, 800),
pts_scale_ratio=1,
flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(
type='DefaultFormatBundle3D',
class_names=class_names,
with_label=False),
dict(type='Collect3D', keys=['points'])
])
]
data = dict(
# set file client for loading training info files (.pkl)
train=dict(
type='RepeatDataset',
times=2,
dataset=dict(pipeline=train_pipeline, classes=class_names, backend_args=backend_args)),
# set file client for loading validation info files (.pkl)
    val=dict(pipeline=test_pipeline, classes=class_names, backend_args=backend_args),
# set file client for loading testing info files (.pkl)
test=dict(pipeline=test_pipeline, classes=class_names, backend_args=backend_args))
```
## Load pretrained model from Ceph
```python
model = dict(
pts_backbone=dict(
_delete_=True,
type='NoStemRegNet',
arch='regnetx_1.6gf',
init_cfg=dict(
type='Pretrained', checkpoint='s3://openmmlab/checkpoints/mmdetection3d/regnetx_1.6gf'), # replace the path with your pretrained model path on Ceph
...
```
## Load checkpoint from Ceph
```python
# replace the path with your checkpoint path on Ceph
load_from = 's3://openmmlab/checkpoints/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20200620_230614-77663cd6.pth'
resume_from = None
workflow = [('train', 1)]
```
## Save checkpoint into Ceph
```python
# checkpoint saving
# replace the path with your checkpoint saving path on Ceph
checkpoint_config = dict(interval=1, max_keep_ckpts=2, out_dir='s3://openmmlab/mmdetection3d')
```
## EvalHook saves the best checkpoint into Ceph
```python
# replace the path with your checkpoint saving path on Ceph
evaluation = dict(interval=1, save_best='bbox', out_dir='s3://openmmlab/mmdetection3d')
```
## Save the training log into Ceph
The training log will be backed up to the specified Ceph path after training.
```python
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook', out_dir='s3://openmmlab/mmdetection3d'),
])
```
You can also delete the local training log after backing up to the specified Ceph path by setting `keep_local = False`.
```python
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook', out_dir='s3://openmmlab/mmdetection3d', keep_local=False),
])
```
# Learn about Configs
MMDetection3D and other OpenMMLab repositories use [MMEngine's config system](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html). It has a modular and inheritance design, which is convenient to conduct various experiments.
## Config file content
MMDetection3D uses a modular design: all modules with different functions can be configured through the config. Taking PointPillars as an example, we will introduce each field in the config according to different functional modules.
### Model config
In MMDetection3D's config, we use `model` to set up detection algorithm components. In addition to neural network components such as `voxel_encoder`, `backbone`, etc., it also requires `data_preprocessor`, `train_cfg`, and `test_cfg`. `data_preprocessor` is responsible for processing a batch of data output by the dataloader. `train_cfg` and `test_cfg` in the model config are the training and testing hyperparameters of the components.
```python
model = dict(
type='VoxelNet',
data_preprocessor=dict(
type='Det3DDataPreprocessor',
voxel=True,
voxel_layer=dict(
max_num_points=32,
point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1],
voxel_size=[0.16, 0.16, 4],
max_voxels=(16000, 40000))),
voxel_encoder=dict(
type='PillarFeatureNet',
in_channels=4,
feat_channels=[64],
with_distance=False,
voxel_size=[0.16, 0.16, 4],
point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
middle_encoder=dict(
type='PointPillarsScatter', in_channels=64, output_shape=[496, 432]),
backbone=dict(
type='SECOND',
in_channels=64,
layer_nums=[3, 5, 5],
layer_strides=[2, 2, 2],
out_channels=[64, 128, 256]),
neck=dict(
type='SECONDFPN',
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
bbox_head=dict(
type='Anchor3DHead',
num_classes=3,
in_channels=384,
feat_channels=384,
use_direction_classifier=True,
assign_per_class=True,
anchor_generator=dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -1.78, 69.12, 39.68, -1.78]],
sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
rotations=[0, 1.57],
reshape_out=False),
diff_rad_by_sin=True,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
loss_cls=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(
type='mmdet.SmoothL1Loss',
beta=0.1111111111111111,
loss_weight=2.0),
loss_dir=dict(
type='mmdet.CrossEntropyLoss', use_sigmoid=False,
loss_weight=0.2)),
train_cfg=dict(
assigner=[
dict(
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict(
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict(
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.6,
neg_iou_thr=0.45,
min_pos_iou=0.45,
ignore_iof_thr=-1)
],
allowed_border=0,
pos_weight=-1,
debug=False),
test_cfg=dict(
use_rotate_nms=True,
nms_across_levels=False,
nms_thr=0.01,
score_thr=0.1,
min_bbox_size=0,
nms_pre=100,
max_num=50))
```
### Dataset and evaluator config
[Dataloaders](https://pytorch.org/docs/stable/data.html?highlight=data%20loader#torch.utils.data.DataLoader) are required for the training, validation, and testing of the [runner](https://mmengine.readthedocs.io/en/latest/tutorials/runner.html). The dataset and data pipeline need to be set to build the dataloader. Due to the complexity of this part, we use intermediate variables to simplify the writing of dataloader configs.
```python
dataset_type = 'KittiDataset'
data_root = 'data/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]
input_modality = dict(use_lidar=True, use_camera=False)
metainfo = dict(classes=class_names)
db_sampler = dict(
data_root=data_root,
info_path=data_root + 'kitti_dbinfos_train.pkl',
rate=1.0,
prepare=dict(
filter_by_difficulty=[-1],
filter_by_min_points=dict(Car=5, Pedestrian=5, Cyclist=5)),
classes=class_names,
sample_groups=dict(Car=15, Pedestrian=15, Cyclist=15),
points_loader=dict(
type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4))
train_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(type='ObjectSample', db_sampler=db_sampler, use_ground_plane=True),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_labels_3d', 'gt_bboxes_3d'])
]
test_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(
type='MultiScaleFlipAug3D',
img_scale=(1333, 800),
pts_scale_ratio=1,
flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range)
]),
dict(type='Pack3DDetInputs', keys=['points'])
]
eval_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(type='Pack3DDetInputs', keys=['points'])
]
train_dataloader = dict(
batch_size=6,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type='RepeatDataset',
times=2,
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='kitti_infos_train.pkl',
data_prefix=dict(pts='training/velodyne_reduced'),
pipeline=train_pipeline,
modality=input_modality,
test_mode=False,
metainfo=metainfo,
box_type_3d='LiDAR')))
val_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='training/velodyne_reduced'),
ann_file='kitti_infos_val.pkl',
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
test_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='training/velodyne_reduced'),
ann_file='kitti_infos_val.pkl',
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
```
[Evaluators](https://mmengine.readthedocs.io/en/latest/tutorials/evaluation.html) are used to compute the metrics of the trained model on the validation and testing datasets. The config of evaluators consists of one or a list of metric configs:
```python
val_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'kitti_infos_val.pkl',
metric='bbox')
test_evaluator = val_evaluator
```
Since the test dataset has no annotation files, the test_dataloader and test_evaluator configs in MMDetection3D are generally the same as the val ones. If you want to save detection results on the test dataset, you can write the config like this:
```python
# inference on test dataset and
# format the output results for submission.
test_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='testing/velodyne_reduced'),
ann_file='kitti_infos_test.pkl',
load_eval_anns=False,
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
test_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'kitti_infos_test.pkl',
metric='bbox',
format_only=True,
submission_prefix='results/kitti-3class/kitti_results')
```
### Training and testing config
MMEngine's runner uses Loop to control the training, validation, and testing processes.
Users can set the maximum training epochs and validation intervals with these fields:
```python
train_cfg = dict(
type='EpochBasedTrainLoop',
max_epochs=80,
val_interval=2)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
```
### Optimization config
`optim_wrapper` is the field to configure optimization-related settings. The optimizer wrapper not only provides the functions of the optimizer, but also supports functions such as gradient clipping, mixed precision training, etc. Find more in [optimizer wrapper tutorial](https://mmengine.readthedocs.io/en/latest/tutorials/optim_wrapper.html).
```python
optim_wrapper = dict( # Optimizer wrapper config
type='OptimWrapper', # Optimizer wrapper type, switch to AmpOptimWrapper to enable mixed precision training.
optimizer=dict( # Optimizer config. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms
type='AdamW', lr=0.001, betas=(0.95, 0.99), weight_decay=0.01),
clip_grad=dict(max_norm=35, norm_type=2)) # Gradient clip option. Set None to disable gradient clip. Find usage in https://mmengine.readthedocs.io/en/latest/tutorials/optim_wrapper.html
```
`param_scheduler` is a field that configures methods of adjusting optimization hyperparameters such as learning rate and momentum. Users can combine multiple schedulers to create a desired parameter adjustment strategy. Find more in [parameter scheduler tutorial](https://mmengine.readthedocs.io/en/latest/tutorials/param_scheduler.html) and [parameter scheduler API documents](https://mmengine.readthedocs.io/en/latest/api/optim.html#scheduler).
```python
param_scheduler = [
dict(
type='CosineAnnealingLR',
T_max=32,
eta_min=0.01,
begin=0,
end=32,
by_epoch=True,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
T_max=48,
eta_min=1.0000000000000001e-07,
begin=32,
end=80,
by_epoch=True,
convert_to_iter_based=True),
dict(
type='CosineAnnealingMomentum',
T_max=32,
eta_min=0.8947368421052632,
begin=0,
end=32,
by_epoch=True,
convert_to_iter_based=True),
dict(
type='CosineAnnealingMomentum',
T_max=48,
eta_min=1,
begin=32,
end=80,
by_epoch=True,
convert_to_iter_based=True),
]
```
### Hook config
Users can attach hooks to training, validation, and testing loops to insert some operations during running. There are two different hook fields: one is `default_hooks` and the other is `custom_hooks`.
`default_hooks` is a dict of hook configs for the hooks that are required at runtime. They have default priorities, which should not be modified. If not set, the runner will use the default values. To disable a default hook, users can set its config to `None`.
```python
default_hooks = dict(
timer=dict(type='IterTimerHook'),
logger=dict(type='LoggerHook', interval=50),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', interval=-1),
sampler_seed=dict(type='DistSamplerSeedHook'),
visualization=dict(type='Det3DVisualizationHook'))
```
`custom_hooks` is a list of all other hook configs. Users can develop their own hooks and insert them in this field.
```python
custom_hooks = []
```
### Runtime config
```python
default_scope = 'mmdet3d' # The default registry scope to find modules. Refer to https://mmengine.readthedocs.io/en/latest/advanced_tutorials/registry.html
env_cfg = dict(
cudnn_benchmark=False, # Whether to enable cudnn benchmark
mp_cfg=dict( # Multi-processing config
        mp_start_method='fork', # Use fork to start multi-processing threads. 'fork' is usually faster than 'spawn' but may be unsafe. See discussion in https://github.com/pytorch/pytorch/issues/1355
opencv_num_threads=0), # Disable opencv multi-threads to avoid system being overloaded
dist_cfg=dict(backend='nccl')) # Distribution configs
vis_backends = [dict(type='LocalVisBackend')] # Visualization backends. Refer to https://mmengine.readthedocs.io/en/latest/advanced_tutorials/visualization.html
visualizer = dict(
type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
log_processor = dict(
type='LogProcessor', # Log processor to process runtime logs
window_size=50, # Smooth interval of log values
by_epoch=True) # Whether to format logs with epoch type. Should be consistent with the train loop's type.
log_level = 'INFO' # The level of logging.
load_from = None # Load model checkpoint as a pre-trained model from a given path. This will not resume training.
resume = False # Whether to resume from the checkpoint defined in `load_from`. If `load_from` is None, it will resume the latest checkpoint in the `work_dir`.
```
## Config file inheritance
There are 4 basic component types under `configs/_base_`: dataset, model, schedule, and default_runtime.
Many methods can be easily constructed with one of these components, such as SECOND, PointPillars, PartA2, and VoteNet.
The configs that are composed of components from `_base_` are called _primitive_.
For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.
For easy understanding, we recommend contributors inherit from existing methods.
For example, if some modification is made based on PointPillars, users may first inherit the basic PointPillars structure by specifying `_base_ = '../pointpillars/pointpillars_hv_fpn_sbn-all_8xb4-2x_nus-3d.py'`, then modify the necessary fields in the config files.
If you are building an entirely new method that does not share its structure with any existing method, you may create a folder `xxx_rcnn` under `configs`.
Please refer to [MMEngine config tutorial](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html) for detailed documentation.
By setting the `_base_` field, we can set which files the current configuration file inherits from.
When `_base_` is a string of a file path, it means inheriting the contents from one config file.
```python
_base_ = './pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py'
```
When `_base_` is a list of multiple file paths, it means inheriting from multiple files.
```python
_base_ = [
'../_base_/models/pointpillars_hv_secfpn_kitti.py',
'../_base_/datasets/kitti-3d-3class.py',
'../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
]
```
If you wish to inspect the config file, you may run `python tools/misc/print_config.py /PATH/TO/CONFIG` to see the complete config.
### Ignore some fields in the base configs
Sometimes, you may set `_delete_=True` to ignore some of the fields in base configs.
You may refer to [MMEngine config tutorial](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html) for a simple illustration.
In MMDetection3D, for example, suppose you want to change the neck of PointPillars, which has the following config:
```python
model = dict(
type='MVXFasterRCNN',
data_preprocessor=dict(voxel_layer=dict(...)),
pts_voxel_encoder=dict(...),
pts_middle_encoder=dict(...),
pts_backbone=dict(...),
pts_neck=dict(
type='FPN',
norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
act_cfg=dict(type='ReLU'),
in_channels=[64, 128, 256],
out_channels=256,
start_level=0,
num_outs=3),
pts_bbox_head=dict(...))
```
`FPN` and `SECONDFPN` use different keywords for construction, so the old keys need to be deleted when replacing the neck:
```python
_base_ = '../_base_/models/pointpillars_hv_fpn_nus.py'
model = dict(
pts_neck=dict(
_delete_=True,
type='SECONDFPN',
norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
pts_bbox_head=dict(...))
```
Setting `_delete_=True` replaces all old keys in the `pts_neck` field with the new keys.
### Use intermediate variables in configs
Some intermediate variables are used in the config files, like `train_pipeline`/`test_pipeline` in datasets.
It's worth noting that when modifying intermediate variables in the child configs, users need to pass the intermediate variables into the corresponding fields again.
For example, suppose we would like to use a multi-scale strategy to train and test PointPillars; `train_pipeline`/`test_pipeline` are the intermediate variables we would like to modify.
```python
_base_ = './nus-3d.py'
train_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
backend_args=backend_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
backend_args=backend_args),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_labels_3d', 'gt_bboxes_3d'])
]
test_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
backend_args=backend_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
backend_args=backend_args),
dict(
type='MultiScaleFlipAug3D',
img_scale=(1333, 800),
pts_scale_ratio=[0.95, 1.0, 1.05],
flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range)
]),
dict(type='Pack3DDetInputs', keys=['points'])
]
train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
val_dataloader = dict(dataset=dict(pipeline=test_pipeline))
test_dataloader = dict(dataset=dict(pipeline=test_pipeline))
```
We first define the new `train_pipeline`/`test_pipeline` and pass them into dataloader fields.
### Reuse variables in \_base\_ file
If users want to reuse the variables in the base file, they can get a copy of the corresponding variable by using `{{_base_.xxx}}`. E.g.:
```python
_base_ = './pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py'
a = {{_base_.model}} # variable `a` is equal to the `model` defined in `_base_`
```
## Modify config through script arguments
When submitting jobs using `tools/train.py` or `tools/test.py`, you may specify `--cfg-options` to modify the config in place. A combined example is given after the following list.
- Update config keys of dict chains
The config options can be specified following the order of the dict keys in the original config.
For example, `--cfg-options model.backbone.norm_eval=False` changes all BN modules in the model backbone to `train` mode.
- Update keys inside a list of configs
Some config dicts are composed as a list in your config. For example, the training pipeline `train_dataloader.dataset.pipeline` is normally a list,
e.g. `[dict(type='LoadPointsFromFile'), ...]`. If you want to change `'LoadPointsFromFile'` to `'LoadPointsFromDict'` in the pipeline,
you may specify `--cfg-options train_dataloader.dataset.pipeline.0.type=LoadPointsFromDict`.
- Update values of list/tuple
Some config values are lists or tuples. For example, the config file normally sets `model.data_preprocessor.mean=[123.675, 116.28, 103.53]`. If you want to
change the mean values, you may specify `--cfg-options model.data_preprocessor.mean="[127,127,127]"`. Note that the quotation mark `"` is necessary to
support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
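Putting the three kinds of overrides together, a training command might look like the following (the config file and values are placeholders):
```shell
python tools/train.py configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py \
    --cfg-options model.backbone.norm_eval=False \
    train_dataloader.dataset.pipeline.0.type=LoadPointsFromDict \
    model.data_preprocessor.mean="[127,127,127]"
```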
## Config Name Style
We follow the below style to name config files. Contributors are advised to follow the same style.
```
{algorithm name}_{model component names [component1]_[component2]_[...]}_{training settings}_{training dataset information}_{testing dataset information}.py
```
The file name is divided into five parts. All parts and components are connected with `_`, and words within each part or component should be connected with `-`.
- `{algorithm name}`: The name of the algorithm. It can be a detector name such as `pointpillars`, `fcos3d`, etc.
- `{model component names}`: Names of the components used in the algorithm such as voxel_encoder, backbone, neck, etc. For example, `second_secfpn_head-dcn-circlenms` means using SECOND's SparseEncoder, SECONDFPN and a detection head with DCN and circle NMS.
- `{training settings}`: Information of training settings such as batch size, augmentations, loss trick, scheduler, and epochs/iterations. For example: `8xb4-tta-cyclic-20e` means using 8 GPUs x 4 samples per GPU, test-time augmentation, a cyclic annealing learning rate, and training for 20 epochs.
Some abbreviations:
- `{gpu x batch_per_gpu}`: GPUs and samples per GPU. `bN` indicates N samples per GPU. E.g. `4xb4` is the short form of 4 GPUs x 4 samples per GPU.
- `{schedule}`: training schedule, options are `schedule-2x`, `schedule-3x`, `cyclic-20e`, etc.
`schedule-2x` and `schedule-3x` mean 24 epochs and 36 epochs respectively.
`cyclic-20e` means a cyclic schedule of 20 epochs.
- `{training dataset information}`: Training dataset names like `kitti-3d-3class`, `nus-3d`, `s3dis-seg`, `scannet-seg`, `waymoD5-3d-car`. Here `3d` means dataset used for 3D object detection, and `seg` means dataset used for point cloud segmentation.
- `{testing dataset information}` (optional): Testing dataset name for models trained on one dataset but tested on another. If not mentioned, it means the model was trained and tested on the same dataset type.
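For example, the KITTI config referenced earlier in this tutorial roughly decomposes as follows (`hv` stands for hard voxelization):
```
pointpillars _ hv_secfpn _ 8xb6-160e _ kitti-3d-3class .py
{algorithm     {model      {training   {training dataset
 name}          components} settings}   information}
```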
# Coordinate System
## Overview
MMDetection3D uses three different coordinate systems. Different coordinate systems are inevitable in the field of 3D object detection, because the coordinate systems of various 3D data collection devices, such as LiDARs and depth cameras, are not consistent, and different 3D datasets also follow different data formats. Early works, such as SECOND and VoteNet, convert the raw data to another format, forming conventions that some later works also follow, making the conversion between coordinate systems even more complicated.
Despite the variety of datasets and equipment, by summarizing the line of works on 3D object detection we can roughly categorize coordinate systems into three:
- Camera coordinate system -- the coordinate system of most cameras, in which the positive direction of the y-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the z-axis points to the front.
```
up z front
| ^
| /
| /
| /
|/
left ------ 0 ------> x right
|
|
|
|
v
y down
```
- LiDAR coordinate system -- the coordinate system of many LiDARs, in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the front, and the positive direction of the y-axis points to the left.
```
z up x front
^ ^
| /
| /
| /
|/
y left <------ 0 ------ right
```
- Depth coordinate system -- the coordinate system used by VoteNet, H3DNet, etc., in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the y-axis points to the front.
```
z up y front
^ ^
| /
| /
| /
|/
left ------ 0 ------> x right
```
The definition of coordinate systems in this tutorial is actually **more than just defining the three axes**. For a box in the form of $(x, y, z, dx, dy, dz, r)$, our coordinate systems also define how to interpret the box dimensions $(dx, dy, dz)$ and the yaw angle $r$.
The illustration of the three coordinate systems is shown below:
![](https://raw.githubusercontent.com/open-mmlab/mmdetection3d/master/resources/coord_sys_all.png)
The three figures above are the 3D coordinate systems while the three figures below are the bird's eye view.
We will stick to the three coordinate systems defined in this tutorial in the future.
## Definition of the yaw angle
Please refer to [wikipedia](https://en.wikipedia.org/wiki/Euler_angles#Tait%E2%80%93Bryan_angles) for the standard definition of the yaw angle. In object detection, we choose an axis as the gravity axis and a reference direction on the plane $\\Pi$ perpendicular to the gravity axis; then the reference direction has a yaw angle of 0, and other directions on $\\Pi$ have non-zero yaw angles depending on their angle with the reference direction.
Currently, for all supported datasets, annotations do not include pitch angle and roll angle, which means we need only consider the yaw angle when predicting boxes and calculating overlap between boxes.
In MMDetection3D, all three coordinate systems are right-handed coordinate systems, which means the ascending direction of the yaw angle is counter-clockwise if viewed from the negative direction of the gravity axis (the axis is pointing at one's eyes).
The figure below shows that, in this right-handed coordinate system, if we set the positive direction of the x-axis as a reference direction, then the positive direction of the y-axis has a yaw angle of $\\frac{\\pi}{2}$.
```
z up y front (yaw=0.5*pi)
^ ^
| /
| /
| /
|/
left (yaw=pi) ------ 0 ------> x right (yaw=0)
```
For a box, the value of its yaw angle equals its direction minus a reference direction. In all three coordinate systems in MMDetection3D, the reference direction is always the positive direction of the x-axis, while the direction of a box is defined to be parallel with the x-axis if its yaw angle is 0. The definition of the yaw angle of a box is illustrated in the figure below.
```
y front
^ box direction (yaw=0.5*pi)
/|\ ^
| /|\
| ____|____
| | | |
| | | |
__|____|____|____|______\ x right
| | | | /
| | | |
| |____|____|
|
```
## Definition of the box dimensions
The definition of the box dimensions cannot be disentangled from the definition of the yaw angle. In the previous section, we said that the direction of a box is defined to be parallel with the x-axis if its yaw angle is 0. Then naturally, the dimension of a box which corresponds to the x-axis should be $dx$. However, this is not always the case in some datasets (we will address that later).
The following figures show the meaning of the correspondence between the x-axis and $dx$, and between the y-axis and $dy$.
```
y front
^ box direction (yaw=0.5*pi)
/|\ ^
| /|\
| ____|____
| | | |
| | | | dx
__|____|____|____|______\ x right
| | | | /
| | | |
| |____|____|
| dy
```
Note that the box direction is always parallel with the edge $dx$.
```
y front
^ _________
/|\ | | |
| | | |
| | | | dy
| |____|____|____\ box direction (yaw=0)
| | | | /
__|____|____|____|_________\ x right
| | | | /
| |____|____|
| dx
|
```
## Relation with raw coordinate systems of supported datasets
### KITTI
The raw annotation of KITTI is under camera coordinate system, see [get_label_anno](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/dataset_converters/kitti_data_utils.py). In MMDetection3D, to train LiDAR-based models on KITTI, the data is first converted from camera coordinate system to LiDAR coordinate system, see [get_ann_info](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/kitti_dataset.py). For training vision-based models, the data is kept in the camera coordinate system.
In SECOND, the LiDAR coordinate system for a box is defined as follows (a bird's eye view):
![](https://raw.githubusercontent.com/traveller59/second.pytorch/master/images/kittibox.png)
For each box, the dimensions are $(w, l, h)$, and the reference direction for the yaw angle is the positive direction of the y axis. For more details, refer to the [repo](https://github.com/traveller59/second.pytorch#concepts).
Our LiDAR coordinate system has two changes:
- The yaw angle is defined to be right-handed instead of left-handed for consistency;
- The box dimensions are $(l, w, h)$ instead of $(w, l, h)$, since $w$ corresponds to $dy$ and $l$ corresponds to $dx$ in KITTI.
### Waymo
We use the KITTI-format data of the Waymo dataset. Therefore, KITTI and Waymo also share the same coordinate system in our implementation.
### NuScenes
NuScenes provides a toolkit for evaluation, in which each box is wrapped into a `Box` instance. The coordinate system of `Box` is different from our LiDAR coordinate system in that the first two elements of the box dimension correspond to $(dy, dx)$, or $(w, l)$, respectively, instead of the reverse. For more details, please refer to the NuScenes [tutorial](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/nuscenes_det.md#notes).
Readers may refer to the [NuScenes development kit](https://github.com/nutonomy/nuscenes-devkit/tree/master/python-sdk/nuscenes/eval/detection) for the definition of a [NuScenes box](https://github.com/nutonomy/nuscenes-devkit/blob/2c6a752319f23910d5f55cc995abc547a9e54142/python-sdk/nuscenes/utils/data_classes.py#L457) and implementation of [NuScenes evaluation](https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/eval/detection/evaluate.py).
### Lyft
Lyft shares the same data format with NuScenes as far as the coordinate system is concerned.
Please refer to the [official website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data) for more information.
### ScanNet
The raw data of ScanNet is not point clouds but meshes. The sampled point cloud data is under our Depth coordinate system. For the ScanNet detection task, the box annotations are axis-aligned and the yaw angle is always zero, so the definition of the yaw angle in our Depth coordinate system makes no difference for ScanNet.
### SUN RGB-D
The raw data of SUN RGB-D is not point cloud but RGB-D image. By back projection, we obtain the corresponding point cloud for each image, which is under our Depth coordinate system. However, the annotation is not under our system and thus needs conversion.
For the conversion from raw annotation to annotation under our Depth coordinate system, please refer to [sunrgbd_data_utils.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/dataset_converters/sunrgbd_data_utils.py).
### S3DIS
S3DIS shares the same coordinate system as ScanNet in our implementation. However, S3DIS is a segmentation-task-only dataset, so none of its annotations are sensitive to the coordinate system.
## Examples
### Box conversion (between different coordinate systems)
Take the conversion between our Camera coordinate system and LiDAR coordinate system as an example:
First, for points and box centers, the coordinates before and after the conversion satisfy the following relationship:
- $x\_{LiDAR}=z\_{camera}$
- $y\_{LiDAR}=-x\_{camera}$
- $z\_{LiDAR}=-y\_{camera}$
Then, the box dimensions before and after the conversion satisfy the following relationship:
- $dx\_{LiDAR}=dx\_{camera}$
- $dy\_{LiDAR}=dz\_{camera}$
- $dz\_{LiDAR}=dy\_{camera}$
Finally, the yaw angle should also be converted:
- $r\_{LiDAR}=-\\frac{\\pi}{2}-r\_{camera}$
See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/box_3d_mode.py) for more details.
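The conversion is exposed through `Box3DMode`. A minimal sketch, assuming the `mmdet3d.core.bbox` import path linked above and the default camera-to-LiDAR extrinsic:
```python
import torch
from mmdet3d.core.bbox import Box3DMode, CameraInstance3DBoxes

# A camera-coordinate box (x, y, z, dx, dy, dz, r); the values are made up.
cam_boxes = CameraInstance3DBoxes(
    torch.tensor([[1.0, 1.5, 10.0, 4.0, 1.5, 1.6, 0.3]]))

# With rt_mat=None, the default extrinsic encodes exactly the axis
# permutation listed above.
lidar_boxes = cam_boxes.convert_to(Box3DMode.LIDAR)
```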
### Bird's Eye View
The BEV of a camera coordinate system box is $(x, z, dx, dz, -r)$ if the 3D box is $(x, y, z, dx, dy, dz, r)$. The inversion of the sign of the yaw angle is because the positive direction of the gravity axis of the Camera coordinate system points to the ground.
See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.
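For example, the `bev` property of a camera-coordinate box returns exactly this 5-tuple (a sketch under the same import-path assumption as above):
```python
import torch
from mmdet3d.core.bbox import CameraInstance3DBoxes

cam_boxes = CameraInstance3DBoxes(
    torch.tensor([[1.0, 1.5, 10.0, 4.0, 1.5, 1.6, 0.3]]))
# (x, z, dx, dz, -r): note the sign flip of the yaw angle.
print(cam_boxes.bev)  # tensor([[1.0000, 10.0000, 4.0000, 1.6000, -0.3000]])
```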
### Rotation of boxes
We set the rotation of all kinds of boxes to be counter-clockwise about the gravity axis. Therefore, to rotate a 3D box we first calculate the new box center, and then we add the rotation angle to the yaw angle.
See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.
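A short sketch of rotating a LiDAR box about the gravity axis (same import-path assumption as above; the values are made up):
```python
import math

import torch
from mmdet3d.core.bbox import LiDARInstance3DBoxes

boxes = LiDARInstance3DBoxes(
    torch.tensor([[10.0, 2.0, -1.0, 4.0, 1.6, 1.5, 0.0]]))
# Rotate in place: the box center is rotated counter-clockwise about the
# z-axis and the rotation angle is added to the yaw angle, as described above.
boxes.rotate(0.5 * math.pi)
```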
## Common FAQ
#### Q1: Are the box-related ops universal to all coordinate system types?
No. For example, the [RoI-Aware Pooling ops](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/roiaware_pool3d.py) are applicable only to boxes under the Depth or LiDAR coordinate systems. The evaluation functions for the KITTI dataset [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/kitti_utils) are applicable only to boxes under the Camera coordinate system, since the rotation is clockwise when viewed from above.
For each box-related op, we have marked the box types to which it can be applied.
#### Q2: In every coordinate system, do the three axes point exactly to the right, the front, and the ground, respectively?
No. For example, in KITTI, we need a calibration matrix when converting from the Camera coordinate system to the LiDAR coordinate system.
#### Q3: How does a phase difference of $2\\pi$ in the yaw angle of a box affect evaluation?
For IoU calculation, a phase difference of $2\\pi$ in the yaw angle will result in the same box, thus not affecting evaluation.
For angle prediction evaluation, such as the NDS metric in NuScenes and the AOS metric in KITTI, the angle of predicted boxes will first be standardized, so the phase difference of $2\\pi$ will not change the result.
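This standardization can be reproduced with the `limit_period` helper in `mmdet3d/core/bbox/structures/utils.py` (a minimal sketch):
```python
import numpy as np
import torch
from mmdet3d.core.bbox.structures.utils import limit_period

yaw = torch.tensor([0.3, 0.3 + 2 * np.pi])
# Limit both angles to the period [-pi, pi): they map to the same value.
print(limit_period(yaw, offset=0.5, period=2 * np.pi))
# tensor([0.3000, 0.3000])
```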
#### Q4: How does a phase difference of $\\pi$ in the yaw angle of a box affect evaluation?
For IoU calculation, a phase difference of $\\pi$ in the yaw angle will result in the same box, thus not affecting evaluation.
However, for angle prediction evaluation, this will result in the exact opposite direction.
Just think about a car. The yaw angle is the angle between the direction of the car front and the positive direction of the x-axis. If we add $\\pi$ to this angle, the car front will become the car rear.
For categories such as barriers, the front and the rear make no difference, so a phase difference of $\\pi$ does not affect the angle prediction score.
# Customize Data Pipelines
## Design of Data pipelines
Following typical conventions, we use `Dataset` and `DataLoader` for data loading with multiple workers. `Dataset` returns a dict of data items corresponding to the arguments of the models' forward method.
Since the data in object detection may not be of the same size (number of points, ground-truth bounding boxes, etc.), we introduce a new `DataContainer` type in MMCV to help collect and distribute data of different sizes.
See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
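For instance, a variable-length point cloud can be wrapped so that the collate function knows not to stack it into a regular batch tensor. A minimal sketch against the mmcv 1.x `mmcv.parallel` API:
```python
import torch
from mmcv.parallel import DataContainer as DC

# Point clouds differ in length across samples, so they are carried through
# collation as DataContainers instead of being stacked into one tensor.
points = DC(torch.rand(12345, 4), stack=False)
print(points.data.shape)  # torch.Size([12345, 4])
```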
The data preparation pipeline and the dataset are decoupled. Usually a dataset defines how to process the annotations, and a data pipeline defines all the steps to prepare a data dict.
A pipeline consists of a sequence of operations. Each operation takes a dict as input and outputs a dict for the next transform.
We present a classical pipeline in the following figure. The blue blocks are pipeline operations. As the pipeline proceeds, each operator can add new keys (marked green) to the result dict or update existing keys (marked orange).
![](../../../resources/data_pipeline.png)
The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.
Here is a pipeline example for PointPillars:
```python
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        backend_args=backend_args),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.3925, 0.3925],
        scale_ratio_range=[0.95, 1.05],
        translation_std=[0, 0, 0]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        backend_args=backend_args),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        pts_scale_ratio=1.0,
        flip=False,
        pcd_horizontal_flip=False,
        pcd_vertical_flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]
```
For each operation, we list the related dict fields that are added/updated/removed.
### Data loading
`LoadPointsFromFile`
- add: points
`LoadPointsFromMultiSweeps`
- update: points
`LoadAnnotations3D`
- add: gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels, pts_instance_mask, pts_semantic_mask, bbox3d_fields, pts_mask_fields, pts_seg_fields
### Pre-processing
`GlobalRotScaleTrans`
- add: pcd_trans, pcd_rotation, pcd_scale_factor
- update: points, \*bbox3d_fields
`RandomFlip3D`
- add: flip, pcd_horizontal_flip, pcd_vertical_flip
- update: points, \*bbox3d_fields
`PointsRangeFilter`
- update: points
`ObjectRangeFilter`
- update: gt_bboxes_3d, gt_labels_3d
`ObjectNameFilter`
- update: gt_bboxes_3d, gt_labels_3d
`PointShuffle`
- update: points
### Formatting
`DefaultFormatBundle3D`
- update: points, gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels
`Collect3D`
- add: img_meta (the keys of img_meta are specified by `meta_keys`)
- remove: all other keys except for those specified by `keys`
### Test time augmentation
`MultiScaleFlipAug`
- update: scale, pcd_scale_factor, flip, flip_direction, pcd_horizontal_flip, pcd_vertical_flip with a list of augmented data under these specific parameters
## Extend and use custom pipelines
1. Write a new pipeline in any file, e.g., `my_pipeline.py`. It takes a dict as input and returns a dict.
```python
from mmdet.datasets import PIPELINES


@PIPELINES.register_module()
class MyTransform:

    def __call__(self, results):
        results['dummy'] = True
        return results
```
2. Import the new class.
```python
from .my_pipeline import MyTransform
```
3. Use it in config files.
```python
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        backend_args=backend_args),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.3925, 0.3925],
        scale_ratio_range=[0.95, 1.05],
        translation_std=[0, 0, 0]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='MyTransform'),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
# Dataset Preparation
## Before Preparation
It is recommended to symlink the dataset root to `$MMDETECTION3D/data`.
If your folder structure is different from the following, you may need to change the corresponding paths in config files (see the config sketch after the tree).
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── nuscenes
│ │ ├── maps
│ │ ├── samples
│ │ ├── sweeps
│ │ ├── v1.0-test
│   │   ├── v1.0-trainval
│ ├── kitti
│ │ ├── ImageSets
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ ├── waymo
│ │ ├── waymo_format
│ │ │ ├── training
│ │ │ ├── validation
│ │ │ ├── testing
│ │ │ ├── gt.bin
│ │ ├── kitti_format
│ │ │ ├── ImageSets
│ ├── lyft
│ │ ├── v1.01-train
│ │ │ ├── v1.01-train (train_data)
│ │ │ ├── lidar (train_lidar)
│ │ │ ├── images (train_images)
│ │ │ ├── maps (train_maps)
│ │ ├── v1.01-test
│ │ │ ├── v1.01-test (test_data)
│ │ │ ├── lidar (test_lidar)
│ │ │ ├── images (test_images)
│ │ │ ├── maps (test_maps)
│ │ ├── train.txt
│ │ ├── val.txt
│ │ ├── test.txt
│ │ ├── sample_submission.csv
│ ├── s3dis
│ │ ├── meta_data
│ │ ├── Stanford3dDataset_v1.2_Aligned_Version
│ │ ├── collect_indoor3d_data.py
│ │ ├── indoor3d_util.py
│ │ ├── README.md
│ ├── scannet
│ │ ├── meta_data
│ │ ├── scans
│ │ ├── scans_test
│ │ ├── batch_load_scannet_data.py
│ │ ├── load_scannet_data.py
│ │ ├── scannet_utils.py
│ │ ├── README.md
│ ├── sunrgbd
│ │ ├── OFFICIAL_SUNRGBD
│ │ ├── matlab
│ │ ├── sunrgbd_data.py
│ │ ├── sunrgbd_utils.py
│ │ ├── README.md
│ ├── semantickitti
│ │ ├── sequences
│ │ │ ├── 00
│   │   │   │   ├── labels
│   │   │   │   ├── velodyne
│ │ │ ├── 01
│ │ │ ├── ..
│ │ │ ├── 22
```
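If your data lives elsewhere, the usual fix is to edit the `data_root` variable near the top of the dataset configs. A hypothetical excerpt:
```python
# Hypothetical config excerpt: `data_root` is the variable that the dataset
# settings reference; change it if your data is not under $MMDETECTION3D/data.
data_root = 'data/kitti/'  # default location under the repo root
# data_root = '/mnt/datasets/kitti/'  # or an absolute path elsewhere
```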
## Download and Data Preparation
### KITTI
1. Download KITTI 3D detection data [HERE](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). Alternatively, you
can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The command scripts are the following:
```bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in to OpenDataLab. Note that you should register an account on [OpenDataLab](https://opendatalab.com/) beforehand.
pip install odl
odl login
# download and preprocess by MIM
mim download mmdet3d --dataset kitti
```
2. Prepare KITTI data splits by running:
```bash
mkdir ./data/kitti/ && mkdir ./data/kitti/ImageSets
# Download data split
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt
```
3. Generate info files by running:
```bash
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```
In an environment using Slurm, users may run the following command instead:
```bash
sh tools/create_data.sh <partition> kitti
```
**Tips**:
- **Ready-made Annotations**. We have also provided KITTI data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/kitti/`. However, if you want to use the `ObjectSample` augmentation in LiDAR-based detection methods, you should additionally generate ground truth database files and annotations.
```bash
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti --only-gt-database
```
### Waymo
Download Waymo open dataset V1.4.1 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put the `.tfrecord` files into the corresponding folders in `data/waymo/waymo_format/` and the data split `.txt` files into `data/waymo/kitti_format/ImageSets`. Download the ground truth `.bin` file for the validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put it into `data/waymo/waymo_format/`. Tip: you can use `gsutil` to download the large-scale dataset from the command line; take this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) as an example for more details. Subsequently, prepare Waymo data by running:
```bash
# TF_CPP_MIN_LOG_LEVEL=3 will disable all logging output from TensorFlow.
# The number of `--workers` depends on the maximum number of cores in your CPU.
TF_CPP_MIN_LOG_LEVEL=3 python tools/create_data.py waymo --root-path ./data/waymo --out-dir ./data/waymo --workers 128 --extra-tag waymo --version v1.4
```
Note that:
- In case the preprocessing of the Waymo dataset is slow or blocked, consider reducing the value of `--workers`. If this doesn't resolve the issue, you could set `--workers` to 0 to disable multiprocessing.
- If your local disk does not have enough space for saving converted data, you can change the `--out-dir` to anywhere else. Just remember to create folders and prepare data there in advance and link them back to `data/waymo/kitti_format` after the data conversion.
**Tips**:
- **Ready-made Annotations**. We have provided the annotation files generated offline [here](#summary-of-annotation-files). However, you still need to convert the original Waymo data to KITTI-format data yourself.
- **Waymo-mini**. If you just want to use part of the Waymo dataset to verify some methods or debug quickly, you could use our provided [Waymo-mini](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_mini.tar.gz), which contains only two segments from the train split and one segment from the val split of the original dataset. All the images, point clouds and annotations in this compressed file have been processed offline, so you can directly download and unzip it:
```bash
tar -xzvf waymo_mini.tar.gz -C ./data/waymo_mini
```
### NuScenes
1. Download nuScenes V1.0 full dataset data [HERE](https://www.nuscenes.org/download). Alternatively, you
can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The downloading and unzipping command scripts are the following:
```bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in to OpenDataLab. Note that you should register an account on [OpenDataLab](https://opendatalab.com/) beforehand.
pip install odl
odl login
# download and preprocess by MIM
mim download mmdet3d --dataset nuscenes
```
2. Prepare nuscenes data by running:
```bash
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
```
**Tips**:
- **Ready-made Annotations**. We have also provided NuScenes data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/nuscenes/`. However, if you want to use the `ObjectSample` augmentation in LiDAR-based detection methods, you should additionally generate ground truth database files and annotations.
```bash
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --only-gt-database
```
### Lyft
Download Lyft 3D detection data [HERE](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data). Prepare Lyft data by running:
```bash
python tools/create_data.py lyft --root-path ./data/lyft --out-dir ./data/lyft --extra-tag lyft --version v1.01
python tools/dataset_converters/lyft_data_fixer.py --version v1.01 --root-folder ./data/lyft
```
Note that we follow the original folder names for clear organization. Please rename the raw folders as shown above. Also note that the second command fixes a corrupted LiDAR data file. Please refer to the [discussion](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/discussion/110000) for more details.
### SemanticKITTI
1. Download SemanticKITTI dataset [HERE](http://semantic-kitti.org/dataset.html#download) and unzip all zip files. Alternatively, you
can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The downloading and unzipping command scripts are the following:
```bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in to OpenDataLab. Note that you should register an account on [OpenDataLab](https://opendatalab.com/) beforehand.
pip install odl
odl login
# download and preprocess by MIM
mim download mmdet3d --dataset semantickitti
```
2. Generate info files by running:
```bash
python ./tools/create_data.py semantickitti --root-path ./data/semantickitti --out-dir ./data/semantickitti --extra-tag semantickitti
```
**Tips**:
- **Ready-made Annotations**. We have also provided SemanticKITTI data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/semantickitti/`.
### S3DIS, ScanNet and SUN RGB-D
To prepare S3DIS data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/s3dis/README.md).
To prepare ScanNet data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/scannet/README.md).
To prepare SUN RGB-D data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/sunrgbd/README.md).
**Tips**: For the S3DIS, ScanNet and SUN RGB-D datasets, we have also provided data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/${DATASET}/`. However, you still need to generate the point cloud files and the semantic/instance mask files (if the dataset provides them) yourself.
### Customized Datasets
For using custom datasets, please refer to [Customize Datasets](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/en/advanced_guides/customize_dataset.md).
### Update data infos
If you created data infos with mmdetection3d v1.0.0rc1-v1.0.0rc4 before, and now want to use the newest v1.1.0 mmdetection3d, you need to update the data info files.
```bash
python tools/dataset_converters/update_infos_to_v2.py --dataset ${DATA_SET} --pkl-path ${PKL_PATH} --out-dir ${OUT_DIR}
```
- `--dataset` : Name of dataset.
- `--pkl-path` : Specify the data infos pkl file path.
- `--out-dir` : Output directory of the data infos pkl file.
Example:
```bash
python tools/dataset_converters/update_infos_to_v2.py --dataset kitti --pkl-path ./data/kitti/kitti_infos_trainval.pkl --out-dir ./data/kitti
```
### Summary of annotation files
We provide ready-made annotation files generated offline for reference. You can directly use these files for convenience.
| Dataset | Train annotation file | Val annotation file | Test information file |
| :-------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| KITTI | [kitti_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/kitti/kitti_infos_train.pkl) | [kitti_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/kitti/kitti_infos_val.pkl) | [kitti_infos_test.pkl](https://download.openmmlab.com/mmdetection3d/data/kitti/kitti_infos_test.pkl) |
| NuScenes | [nuscenes_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_infos_train.pkl) [nuscenes_mini_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_mini_infos_train.pkl) | [nuscenes_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_infos_val.pkl) [nuscenes_mini_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_mini_infos_val.pkl) | |
| Waymo | [waymo_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_infos_train.pkl) | [waymo_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_infos_val.pkl) | [waymo_infos_test.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo/waymo_infos_test.pkl) [waymo_infos_test_cam_only.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_infos_test_cam_only.pkl) |
| [Waymo-mini](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_mini.tar.gz) | | | |
| SUN RGB-D | [sunrgbd_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/sunrgbd/sunrgbd_infos_train.pkl) | [sunrgbd_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/sunrgbd/sunrgbd_infos_val.pkl) | |
| ScanNet | [scannet_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/scannet/scannet_infos_train.pkl) | [scannet_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/scannet/scannet_infos_val.pkl) | [scannet_infos_test.pkl](https://download.openmmlab.com/mmdetection3d/data/scannet/scannet_infos_test.pkl) |
| SemanticKITTI | [semantickitti_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/semantickitti/semantickitti_infos_train.pkl) | [semantickitti_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/semantickitti/semantickitti_infos_val.pkl) | [semantickitti_infos_test.pkl](https://download.openmmlab.com/mmdetection3d/data/semantickitti/semantickitti_infos_test.pkl) |
Train & Test
**************
.. toctree::
:maxdepth: 1
config.md
coord_sys_tutorial.md
dataset_prepare.md
data_pipeline.md
train_test.md
inference.md
new_data_model.md
Useful Tools
************
.. toctree::
:maxdepth: 1
useful_tools.md
visualization.md
backends_support.md
model_deployment.md