For the overall process, please refer to the [README](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md) page for SUN RGB-D.
### Download SUN RGB-D data and toolbox
Download SUNRGBD data [HERE](http://rgbd.cs.princeton.edu/data/). Then, move `SUNRGBD.zip`, `SUNRGBDMeta2DBB_v2.mat`, `SUNRGBDMeta3DBB_v2.mat` and `SUNRGBDtoolbox.zip` to the `OFFICIAL_SUNRGBD` folder and unzip the zip files.
The directory structure before data preparation should be as below:
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
```
### Extract data and annotations for 3D detection from raw data
Extract SUN RGB-D annotation data from raw annotation data by running (this requires MATLAB installed on your machine):
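The following commands are a sketch that assumes MATLAB is available on your `PATH`; they simply invoke the scripts under the `matlab/` folder:
```bash
matlab -nosplash -nodesktop -r 'extract_split;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v2;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v1;quit;'
```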
The two extraction scripts (`extract_rgbd_data_v1.m` and `extract_rgbd_data_v2.m`) call functions such as `read3dPoints` from the [toolbox](https://rgbd.cs.princeton.edu/data/SUNRGBDtoolbox.zip) provided by SUN RGB-D.
The directory structure after extraction should be as follows.
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
├── sunrgbd_trainval
│ ├── calib
│ ├── depth
│ ├── image
│ ├── label
│ ├── label_v1
│ ├── seg_label
│ ├── train_data_idx.txt
│ ├── val_data_idx.txt
```
Each of the following folders contains 5285 train files and 5050 val files in total:
- `calib`: Camera calibration information in `.txt`
- `depth`: Point cloud saved in `.mat` (xyz+rgb)
- `image`: Image data in `.jpg`
- `label`: Detection annotation data in `.txt` (version 2)
- `label_v1`: Detection annotation data in `.txt` (version 1)
- `seg_label`: Segmentation annotation data in `.txt`
Currently, we use v1 data for training and testing, so the version 2 labels are unused.
### Create dataset
Please run the command below to create the dataset.
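For example (assuming the dataset is placed under `./data/sunrgbd`):
```bash
python tools/create_data.py sunrgbd --root-path ./data/sunrgbd --out-dir ./data/sunrgbd --extra-tag sunrgbd
```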
The above point cloud data are further saved in `.bin` format. Meanwhile, `.pkl` info files are also generated to store annotations and metadata.
The directory structure after processing should be as follows.
```
sunrgbd
├── README.md
├── matlab
│ ├── ...
├── OFFICIAL_SUNRGBD
│ ├── ...
├── sunrgbd_trainval
│ ├── ...
├── points
├── sunrgbd_infos_train.pkl
├── sunrgbd_infos_val.pkl
```
- `points/xxxxxx.bin`: The point cloud data after downsampling.
- `sunrgbd_infos_train.pkl`: The train data infos; the detailed info of each scene is as follows:
- info\['lidar_points'\]: A dict containing all information related to the lidar points.
- info\['lidar_points'\]\['num_pts_feats'\]: The feature dimension of point.
- info\['lidar_points'\]\['lidar_path'\]: The filename of the lidar point cloud data.
- info\['images'\]: A dict containing all information related to the image data.
- info\['images'\]\['CAM0'\]\['img_path'\]: The filename of the image.
- info\['images'\]\['CAM0'\]\['depth2img'\]: Transformation matrix from depth to image with shape (4, 4).
- info\['images'\]\['CAM0'\]\['height'\]: The height of image.
- info\['images'\]\['CAM0'\]\['width'\]: The width of image.
- info\['instances'\]: A list of dicts containing all annotations of this frame. Each dict corresponds to the annotations of a single instance. For the i-th instance:
- info\['instances'\]\[i\]\['bbox_3d'\]: List of 7 numbers representing the 3D bounding box in depth coordinate system.
- info\['instances'\]\[i\]\['bbox'\]: List of 4 numbers representing the 2D bounding box of the instance, in (x1, y1, x2, y2) order.
- info\['instances'\]\[i\]\['bbox_label_3d'\]: An int indicating the 3D label of the instance; -1 indicates the ignore class.
- info\['instances'\]\[i\]\['bbox_label'\]: An int indicating the 2D label of the instance; -1 indicates the ignore class.
- `sunrgbd_infos_val.pkl`: The val data infos, which shares the same format as `sunrgbd_infos_train.pkl`.
## Train pipeline
A typical train pipeline of SUN RGB-D for point cloud only 3D detection is as follows.
```python
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='DEPTH',
        shift_height=True,
        load_dim=6,
        use_dim=[0, 1, 2]),
    dict(type='LoadAnnotations3D'),
    dict(
        type='RandomFlip3D',
        sync_2d=False,
        flip_ratio_bev_horizontal=0.5,
    ),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.523599, 0.523599],
        scale_ratio_range=[0.85, 1.15],
        shift_height=True),
    dict(type='PointSample', num_points=20000),
    dict(
        type='Pack3DDetInputs',
        keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
Data augmentation for point clouds:
- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of \[-30, 30\] (degrees) for SUN RGB-D; then scale the input point cloud, usually in the range of \[0.85, 1.15\] for SUN RGB-D; finally translate the input point cloud, usually by 0 for SUN RGB-D (which means no translation).
- `PointSample`: downsample the input point cloud.
A typical train pipeline of SUN RGB-D for multi-modality (point cloud and image) 3D detection is as follows.
- `Resize`: resize the input image, where `keep_ratio=True` means the aspect ratio of the image is kept unchanged.
- `RandomFlip`: randomly flip the input image.
The image augmentation functions are implemented in [MMDetection](https://github.com/open-mmlab/mmdetection/tree/dev-3.x/mmdet/datasets/transforms).
## Metrics
Same as ScanNet, typically mean Average Precision (mAP) is used for evaluation on SUN RGB-D, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function to compute precision and recall for 3D object detection for multiple classes is called. Please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/functional/indoor_eval.py) for more details.
Since SUN RGB-D consists of image data, detection on image data is also feasible. For instance, in ImVoteNet, we first train an image detector, and we also use mAP for evaluation, e.g. `mAP@0.5`. We use the `eval_map` function from [MMDetection](https://github.com/open-mmlab/mmdetection) to calculate mAP.
This page provides specific tutorials about the usage of MMDetection3D for the Waymo dataset.
## Prepare dataset
Before preparing the Waymo dataset, if you only installed the requirements in `requirements/build.txt` and `requirements/runtime.txt` before, please first install the official package for this dataset by running
```
pip install waymo-open-dataset-tf-2-6-0
```
or
```
pip install -r requirements/optional.txt
```
Like the general way to prepare datasets, it is recommended to symlink the dataset root to `$MMDETECTION3D/data`.
Since the original Waymo data format is based on `tfrecord`, we need to preprocess the raw data for convenient usage in the training and evaluation procedure. Our approach is to convert them into KITTI format.
The folder structure should be organized as follows before our processing.
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── waymo
│ │ ├── waymo_format
│ │ │ ├── training
│ │ │ ├── validation
│ │ │ ├── testing
│ │ │ ├── gt.bin
│ │ │ ├── cam_gt.bin
│ │ │ ├── fov_gt.bin
│ │ ├── kitti_format
│ │ │ ├── ImageSets
```
You can download Waymo open dataset V1.4 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put the `tfrecord` files into the corresponding folders in `data/waymo/waymo_format/` and put the data split txt files into `data/waymo/kitti_format/ImageSets`. Download the ground truth bin files for the validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put them into `data/waymo/waymo_format/`. A tip is that you can use `gsutil` to download the large-scale dataset with commands. You can take this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) as an example for more details. Subsequently, prepare Waymo data by running
```bash
# TF_CPP_MIN_LOG_LEVEL=3 will disable all logging output from TensorFlow.
# The number of `--workers` depends on the maximum number of cores in your CPU.
TF_CPP_MIN_LOG_LEVEL=3 python tools/create_data.py waymo --root-path ./data/waymo --out-dir ./data/waymo --workers 128 --extra-tag waymo
```
Note that if your local disk does not have enough space for saving the converted data, you can change `--out-dir` to anywhere else. Just remember to create folders and prepare data there in advance and link them back to `data/waymo/kitti_format` after the data conversion.
After the data conversion, the folder structure and info files should be organized as below.
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── waymo
│ │ ├── waymo_format
│ │ │ ├── training
│ │ │ ├── validation
│ │ │ ├── testing
│ │ │ ├── gt.bin
│ │ │ ├── cam_gt.bin
│ │ │ ├── fov_gt.bin
│ │ ├── kitti_format
│ │ │ ├── ImageSets
│ │ │ ├── training
│ │ │ │ ├── image_0
│ │ │ │ ├── image_1
│ │ │ │ ├── image_2
│ │ │ │ ├── image_3
│ │ │ │ ├── image_4
│ │ │ │ ├── velodyne
│ │ │ ├── testing
│ │ │ │ ├── (the same as training)
│ │ │ ├── waymo_gt_database
│ │ │ ├── waymo_infos_trainval.pkl
│ │ │ ├── waymo_infos_train.pkl
│ │ │ ├── waymo_infos_val.pkl
│ │ │ ├── waymo_infos_test.pkl
│ │ │ ├── waymo_dbinfos_train.pkl
```
- `kitti_format/training/image_{0-4}/{a}{bbb}{ccc}.jpg`: Since there are several cameras, we store the corresponding images. We use a naming convention `{a}{bbb}{ccc}` for the data of each frame, where `a` is the prefix for different splits (`0` for training, `1` for validation and `2` for testing), `bbb` is the segment index and `ccc` is the frame index. You can easily locate the required frame according to this naming rule. We gather the data for training and validation together as in KITTI and store the indices for different sets in the `ImageSets` files.
- `kitti_format/training/velodyne/{a}{bbb}{ccc}.bin`: point cloud data for each frame.
- `kitti_format/waymo_gt_database/xxx_{Car/Pedestrian/Cyclist}_x.bin`: point cloud data included in each 3D bounding box of the training dataset. These point clouds will be used in data augmentation, e.g. `ObjectSample`. `xxx` is the index of the training sample and `x` is the index of the object in this frame.
- `kitti_format/waymo_infos_train.pkl`: training dataset information, a dict containing two keys: `metainfo` and `data_list`. `metainfo` contains the basic information of the dataset itself, such as `dataset`, `version` and `info_version`, while `data_list` is a list of dicts, each of which (hereinafter referred to as `info`) contains all the detailed information of a single sample as follows:
- info\['sample_idx'\]: The index of this sample in the whole dataset.
- info\['ego2global'\]: The transformation matrix from the ego vehicle to global coordinates. (4x4 list).
- info\['timestamp'\]: Timestamp of the sample data.
- info\['context_name'\]: The context name of the sample, indicating which `*.tfrecord` segment it was extracted from.
- info\['lidar_points'\]: A dict containing all the information related to the lidar points.
- info\['lidar_points'\]\['lidar_path'\]: The filename of the lidar point cloud data.
- info\['lidar_points'\]\['num_pts_feats'\]: The feature dimension of point.
- info\['lidar_sweeps'\]: A list containing sweeps information of the lidar.
- info\['lidar_sweeps'\]\[i\]\['lidar_points'\]\['lidar_path'\]: The lidar data path of i-th sweep.
- info\['lidar_sweeps'\]\[i\]\['ego2global'\]: The transformation matrix from the ego vehicle to global coordinates. (4x4 list)
- info\['lidar_sweeps'\]\[i\]\['timestamp'\]: Timestamp of the sweep data.
- info\['images'\]: A dict containing five keys corresponding to each camera: `'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_SIDE_LEFT'`, `'CAM_SIDE_RIGHT'`. Each dict contains all data information related to the corresponding camera.
- info\['images'\]\['CAM_XXX'\]\['img_path'\]: The filename of the image.
- info\['images'\]\['CAM_XXX'\]\['height'\]: The height of the image.
- info\['images'\]\['CAM_XXX'\]\['width'\]: The width of the image.
- info\['images'\]\['CAM_XXX'\]\['cam2img'\]: The transformation matrix recording the intrinsic parameters when projecting 3D points to each image plane. (4x4 list)
- info\['images'\]\['CAM_XXX'\]\['lidar2cam'\]: The transformation matrix from lidar sensor to this camera. (4x4 list)
- info\['images'\]\['CAM_XXX'\]\['lidar2img'\]: The transformation matrix from lidar sensor to each image plane. (4x4 list)
- info\['image_sweeps'\]: A list containing sweeps information of images.
- info\['image_sweeps'\]\[i\]\['images'\]\['CAM_XXX'\]\['img_path'\]: The image path of i-th sweep.
- info\['image_sweeps'\]\[i\]\['ego2global'\]: The transformation matrix from the ego vehicle to global coordinates. (4x4 list)
- info\['image_sweeps'\]\[i\]\['timestamp'\]: Timestamp of the sweep data.
- info\['instances'\]: It is a list of dicts. Each dict contains all annotation information of a single instance. For the i-th instance:
- info\['instances'\]\[i\]\['bbox_3d'\]: List of 7 numbers representing the 3D bounding box of the instance, in (x, y, z, l, w, h, yaw) order.
- info\['instances'\]\[i\]\['bbox'\]: List of 4 numbers representing the 2D bounding box of the instance, in (x1, y1, x2, y2) order. (some instances may not have a corresponding 2D bounding box)
- info\['instances'\]\[i\]\['bbox_label_3d'\]: An int indicating the label of the instance; -1 indicates ignore.
- info\['instances'\]\[i\]\['bbox_label'\]: An int indicating the label of the instance; -1 indicates ignore.
- info\['instances'\]\[i\]\['num_lidar_pts'\]: Number of lidar points included in each 3D bounding box.
- info\['instances'\]\[i\]\['camera_id'\]: The index of the most visible camera for this instance.
- info\['instances'\]\[i\]\['group_id'\]: The index of this instance in this sample.
- info\['cam_sync_instances'\]: It is a list of dicts. Each dict contains all annotation information of a single instance. Its format is the same as that of \['instances'\]. However, \['cam_sync_instances'\] is used only for the multi-view camera-based 3D object detection task.
- info\['cam_instances'\]: It is a dict containing keys `'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_SIDE_LEFT'`, `'CAM_SIDE_RIGHT'`. For monocular camera-based 3D Object Detection task, we split 3D annotations of the whole scenes according to the camera they belong to. For the i-th instance:
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_3d'\]: List of 7 numbers representing the 3D bounding box of the instance, in (x, y, z, l, h, w, yaw) order.
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox'\]: 2D bounding box annotation (exterior rectangle of the projected 3D box), a list arranged as \[x1, y1, x2, y2\].
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_label_3d'\]: Label of instance.
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_label'\]: Label of instance.
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['center_2d'\]: Projected center location on the image, a list of shape (2,).
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['depth'\]: The depth of projected center.
## Training
Considering there are many similar frames in the original dataset, we can basically use a subset to train our model. In our preliminary baselines, we load one frame every five frames, and thanks to our hyperparameter settings and data augmentation, we obtain a better result compared with the performance reported in the original dataset [paper](https://arxiv.org/pdf/1912.04838.pdf). For more details about the configuration and performance, please refer to the README.md in `configs/pointpillars/`. A more complete benchmark based on other settings and methods is coming soon.
## Evaluation
For evaluation on Waymo, please follow the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/r1.3/docs/quick_start.md) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`. Basically, you can follow the commands below to install `bazel` and build the file.
`pklfile_prefix` should be set in the `test_evaluator` of the configuration if you need to generate the bin file, so you can append `--cfg-options "test_evaluator.pklfile_prefix=xxxx"` to the command if you want to do so.
**Notice**:
1. Sometimes when using `bazel` to build `compute_detection_metrics_main`, an error `'round' is not a member of 'std'` may appear. We just need to remove the `std::` before `round` in that file.
2. Considering it takes quite a long time to evaluate once, we recommend evaluating only once at the end of model training.
3. To use TensorFlow with CUDA 9, it is recommended to compile it from source. Apart from official tutorials, you can refer to this [link](https://github.com/SmileTM/Tensorflow2.X-GPU-CUDA9.0) for possibly suitable precompiled packages and useful information for compiling it from source.
## Testing and making a submission
An example of testing PointPillars on Waymo with 8 GPUs, generating the bin files and making a submission to the leaderboard is sketched below.
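A minimal sketch with the generic distributed test script (substitute your own config and checkpoint; the Waymo PointPillars config under `configs/pointpillars/` is one option):
```shell
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} 8
```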
`submission_prefix` should be set in the `test_evaluator` of the configuration before you run the test command if you want to generate the bin files and make a submission to the leaderboard.
After generating the bin file, you can simply build the binary file `create_submission` and use it to create a submission file by following the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/). Basically, here are some example commands.
```shell
tar cvf results/waymo-car/submission/my_model.tar results/waymo-car/submission/my_model/
gzip results/waymo-car/submission/my_model.tar
```
For evaluation on the validation set with the eval server, you can also use the same way to generate a submission. Make sure you change the fields in `submission.txtpb` before running the command above.
Currently, we only support training and inference with point clouds in bin format. Before training on your own datasets, you need to convert your point cloud files into bin format. Common point cloud data formats include pcd and las; we provide some open-source tools for reference.
1. Convert pcd to bin: https://github.com/leofansq/Tools_RosBag2KITTI
2. Convert las to bin: The common conversion path is las -> pcd -> bin, and the conversion from las -> pcd can be achieved through [this tool](https://github.com/Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor).
### Point cloud annotation
MMDetection3D does not support point cloud annotation. Some open-source annotation tools are offered for reference:
Besides, we improved [LATTE](https://github.com/bernwang/latte) for better usage. More details can be found [here](https://arxiv.org/abs/2011.10174).
## Support new data format
To support a new data format, you can either convert them to existing formats or directly convert them to the middle format. You could also choose to convert them offline (before training by a script) or online (implement a new dataset and do the conversion at training).
### Reorganize new data formats to existing format
If your dataset only contains point cloud files and 3D bounding box annotations, without calibration files, we recommend converting it into the basic format. The annotation files in the basic format have the following necessary keys:
```python
[
    {'sample_idx':
     'lidar_points': {'lidar_path': velodyne_path,
                      ....
                      },
     'annos': {'box_type_3d': (str)'LiDAR/Camera/Depth',
               'gt_bboxes_3d': <np.ndarray> (n, 7),
               'gt_names': [list],
               ....
               },
     'calib': {.....},
     'images': {.....}
    }
]
```
In MMDetection3D, for data that is inconvenient to read directly online, we recommend converting it into the basic format as above and doing the conversion offline; thus you only need to modify the config's data annotation paths and classes after the conversion.
To use data that share a similar format as the existing datasets, e.g., Lyft has a similar format as the nuScenes dataset, we recommend directly implementing a new data converter and a dataset class to convert the data and load the data, respectively. In this procedure, the code can inherit from the existing dataset classes to reuse the code.
### Reorganize new data format to middle format
There is also a way if users do not want to convert the annotation format to existing formats.
Actually, we convert all the supported datasets into pickle files, which summarize useful information for model training and inference.
The annotation of a dataset is a list of dict, each dict corresponds to a frame.
A basic example (used in KITTI) is as follows. A frame consists of several keys, like `image`, `point_cloud`, `calib` and `annos`.
As long as we can directly read data according to this information, the organization of raw data can also be different from existing ones.
With this design, we provide an alternative choice for customizing datasets.
On top of this you can write a new Dataset class inherited from `Custom3DDataset`, and overwrite related methods,
like [KittiDataset](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/kitti_dataset.py) and [ScanNetDataset](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/scannet_dataset.py).
### An example of customized dataset
Here we provide an example of a customized dataset.
Assume the annotations have been reorganized into a list of dicts in pickle files following the basic format.
The bounding box annotations are stored in `annotation.pkl` as the following.
Then in the config, to use `MyDataset` you can modify the config as the following
```python
dataset_A_train = dict(
    type='MyDataset',
    ann_file='annotation.pkl',
    pipeline=train_pipeline
)
```
## Customize datasets by dataset wrappers
MMDetection3D also supports many dataset wrappers to mix the dataset or modify the dataset distribution for training like MMDetection.
Currently it supports three dataset wrappers as below:
- `RepeatDataset`: simply repeat the whole dataset.
- `ClassBalancedDataset`: repeat the dataset in a class balanced manner.
- `ConcatDataset`: concat datasets.
### Repeat dataset
We use `RepeatDataset` as a wrapper to repeat the dataset. For example, suppose the original dataset is `Dataset_A`; to repeat it, the config looks like the following
```python
dataset_A_train = dict(
    type='RepeatDataset',
    times=N,
    dataset=dict(  # This is the original config of Dataset_A
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
```
### Class balanced dataset
We use `ClassBalancedDataset` as a wrapper to repeat the dataset based on category
frequency. The dataset to repeat needs to implement the method `self.get_cat_ids(idx)`
to support `ClassBalancedDataset`.
For example, to repeat `Dataset_A` with `oversample_thr=1e-3`, the config looks like the following
```python
dataset_A_train = dict(
    type='ClassBalancedDataset',
    oversample_thr=1e-3,
    dataset=dict(  # This is the original config of Dataset_A
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
```
You may refer to [source code](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/dataset_wrappers.py) for details.
### Concatenate dataset
There are three ways to concatenate the dataset.
1. If the datasets you want to concatenate are in the same type with different annotation files, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict(
    type='Dataset_A',
    ann_file=['anno_file_1', 'anno_file_2'],
    pipeline=train_pipeline
)
```
If the concatenated dataset is used for test or evaluation, this manner supports evaluating each dataset separately. To test the concatenated dataset as a whole, you can set `separate_eval=False` as below.
```python
dataset_A_train = dict(
    type='Dataset_A',
    ann_file=['anno_file_1', 'anno_file_2'],
    separate_eval=False,
    pipeline=train_pipeline
)
```
2. In case the datasets you want to concatenate are of different types, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict()
dataset_B_train = dict()
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=[
        dataset_A_train,
        dataset_B_train
    ],
    val=dataset_A_val,
    test=dataset_A_test
)
```
If the concatenated dataset is used for test or evaluation, this manner also supports evaluating each dataset separately.
3. We also support defining `ConcatDataset` explicitly as the following.
```python
dataset_A_val = dict()
dataset_B_val = dict()
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dataset_A_train,
    val=dict(
        type='ConcatDataset',
        datasets=[dataset_A_val, dataset_B_val],
        separate_eval=False))
```
This manner allows users to evaluate all the datasets as a single one by setting `separate_eval=False`.
**Note:**
1. The option `separate_eval=False` assumes the datasets use `self.data_infos` during evaluation. Therefore, COCO datasets do not support this behavior since they do not fully rely on `self.data_infos` for evaluation. Combining different types of datasets and evaluating them as a whole has not been tested and is therefore not suggested.
2. Evaluating `ClassBalancedDataset` and `RepeatDataset` is not supported thus evaluating concatenated datasets of these types is also not supported.
A more complex example that repeats `Dataset_A` and `Dataset_B` N and M times, respectively, and then concatenates the repeated datasets is as follows.
```python
dataset_A_train = dict(
    type='RepeatDataset',
    times=N,
    dataset=dict(
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
dataset_A_val = dict(
    ...
    pipeline=test_pipeline
)
dataset_A_test = dict(
    ...
    pipeline=test_pipeline
)
dataset_B_train = dict(
    type='RepeatDataset',
    times=M,
    dataset=dict(
        type='Dataset_B',
        ...
        pipeline=train_pipeline
    )
)
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=[
        dataset_A_train,
        dataset_B_train
    ],
    val=dataset_A_val,
    test=dataset_A_test
)
```
## Modify Dataset Classes
With existing dataset types, we can modify their class names to train on a subset of the annotations.
For example, if you want to train only three classes of the current dataset,
you can modify the classes of the dataset.
The dataset will filter out the ground truth boxes of other classes automatically.
```python
classes = ('person', 'bicycle', 'car')
data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes))
```
MMDetection V2.0 also supports reading the classes from a file, which is common in real applications.
For example, assume the `classes.txt` contains the name of classes as the following.
```
person
bicycle
car
```
Users can set the classes as a file path; the dataset will load it and convert it into a list automatically.
```python
classes = 'path/to/classes.txt'
data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes))
```
## Loading Point Clouds Adjustment
Generally speaking, the most basic bin data contain (x, y, z) information, and some also include intensity, elongation (point cloud elongation) and timestamp, so the point cloud dimension ranges from 3 to 6. In MMDetection3D, you need to adjust some settings in the config when training on a customized dataset:
```python
dict(
    type='LoadPointsFromFile',
    coord_type='LIDAR',
    # Adjust according to the dimension of the
    # point cloud in your own dataset.
    load_dim=3,
    # The dimensions actually used; you can also specify
    # particular dimensions in list format.
    use_dim=3),
```
## Training Setting Adjustment
In order to avoid some problems in the training process and improve the performance of the model on the custom dataset, some training settings need to be adjusted according to the dataset.
### Adjust Point Cloud Range and Annotations in Config
For example, we can adjust `point_cloud_range` in config file to change training point cloud range. In KITTI dataset, the `point_cloud_range` is set to be `[0, -39.68, -3, 69.12, 39.68, 1]`.
By setting the point cloud range, `PointsRangeFilter` is used to filter the point cloud and its masks (semantic and instance), and `ObjectRangeFilter` is used to filter the 3D bounding boxes.
Here you can refer to the settings of the existing datasets. Theoretically, `voxel_size` is linked to the setting of `point_cloud_range`: setting a smaller `voxel_size` will increase the number of voxels and the corresponding memory consumption. In addition, the following issue needs to be noted:
If the `point_cloud_range` and `voxel_size` are set to be `[0, -40, -3, 70.4, 40, 1]` and `[0.05, 0.05, 0.1]` respectively, then the shape of the intermediate feature map should be `[(1-(-3))/0.1+1, (40-(-40))/0.05, (70.4-0)/0.05]=[41, 1600, 1408]`. More details can be found in this [issue](https://github.com/open-mmlab/mmdetection3d/issues/382).
Regarding the setting of `anchor_range`, it is generally adjusted according to the dataset. Note that the `z` value needs to be adjusted according to the position of the point cloud; please refer to this [issue](https://github.com/open-mmlab/mmdetection3d/issues/986).
Regarding the setting of `anchor_size`, it is usually necessary to compute the average length, width and height of objects in the entire training dataset and use them as the `anchor_size` to obtain the best results.
**Note** (related to MMDetection):
- Before MMDetection v2.5.0, the dataset will filter out the empty GT images automatically if the classes are set and there is no way to disable that through config. This is an undesirable behavior and introduces confusion because if the classes are not set, the dataset only filters the empty GT images when `filter_empty_gt=True` and `test_mode=False`. After MMDetection v2.5.0, we decouple the image filtering process and the classes modification, i.e., the dataset will only filter empty GT images when `filter_empty_gt=True` and `test_mode=False`, no matter whether the classes are set. Thus, setting the classes only influences the annotations of classes used for training and users could decide whether to filter empty GT images by themselves.
- Since the middle format only has box labels and does not contain the class names, when using `CustomDataset`, users cannot filter out the empty GT images through configs but only do this offline.
- The features for setting dataset classes and dataset filtering will be refactored to be more user-friendly in the future (depends on the progress).
LiDAR-based 3D detection is one of the most basic tasks supported in MMDetection3D.
It expects the given model to take any number of points with features collected by LiDAR as input, and predict the 3D bounding boxes and category labels for each object of interest.
Next, taking PointPillars on the KITTI dataset as an example, we will show how to prepare data, train and test a model on a standard 3D detection benchmark, and how to visualize and validate the results.
## Data Preparation
To begin with, we need to download the raw data and reorganize the data in a standard way presented in the [doc for data preparation](https://mmdetection3d.readthedocs.io/en/dev-1.x/user_guides/dataset_prepare.html).
Note that for KITTI, we need extra `.txt` files for data splits.
Due to different ways of organizing the raw data in different datasets, we typically need to collect the useful data information with a `.pkl` file.
So after getting all the raw data ready, we need to run the scripts provided in `create_data.py` for different datasets to generate data infos.
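For example, for KITTI (assuming the data is linked under `./data/kitti`):
```bash
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```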
Afterwards, the related folder structure should be as follows:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── kitti
│ │ ├── ImageSets
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced
│ │ ├── kitti_gt_database
│ │ ├── kitti_infos_train.pkl
│ │ ├── kitti_infos_trainval.pkl
│ │ ├── kitti_infos_val.pkl
│ │ ├── kitti_infos_test.pkl
│ │ ├── kitti_dbinfos_train.pkl
```
## Training
Then let us train a model with provided configs for PointPillars.
You can basically follow the examples provided in this [tutorial](https://mmdetection3d.readthedocs.io/en/dev-1.x/user_guides/train_test.html) when training with different GPU settings.
Suppose we use 8 GPUs on a single machine with distributed training:
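A sketch of the corresponding command, assuming the 3-class PointPillars config whose name is discussed below:
```shell
./tools/dist_train.sh configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py 8
```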
Note that `8xb6` in the config name means the training is conducted with 8 GPUs and 6 samples on each GPU.
If your customized setting is different from this, sometimes you need to adjust the learning rate accordingly.
A basic rule can be referred to [here](https://arxiv.org/abs/1706.02677). We have supported `--auto-scale-lr` to
enable automatically scaling LR.
## Quantitative Evaluation
During training, the model checkpoints will be evaluated regularly according to the setting of `train_cfg = dict(val_interval=xxx)` in the config.
We support official evaluation protocols for different datasets.
For KITTI, the model will be evaluated with mean average precision (mAP) with Intersection over Union (IoU) thresholds 0.5/0.7 for 3 categories respectively.
The evaluation results will be printed to the command line like:
```
Car AP@0.70, 0.70, 0.70:
bbox AP:98.1839, 89.7606, 88.7837
bev AP:89.6905, 87.4570, 85.4865
3d AP:87.4561, 76.7569, 74.1302
aos AP:97.70, 88.73, 87.34
Car AP@0.70, 0.50, 0.50:
bbox AP:98.1839, 89.7606, 88.7837
bev AP:98.4400, 90.1218, 89.6270
3d AP:98.3329, 90.0209, 89.4035
aos AP:97.70, 88.73, 87.34
```
In addition, you can also evaluate a specific model checkpoint after training is finished. Simply run scripts like the following:
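For example (replace `${CHECKPOINT_FILE}` with the path to your trained checkpoint):
```shell
./tools/dist_test.sh configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py ${CHECKPOINT_FILE} 8
```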
If you would like to only conduct inference or test the model performance on the online benchmark,
you need to specify the `submission_prefix` for the corresponding evaluator,
e.g., add `test_evaluator = dict(type='KittiMetric', ann_file=data_root + 'kitti_infos_test.pkl', format_only=True, pklfile_prefix='results/kitti-3class/kitti_results', submission_prefix='results/kitti-3class/kitti_results')` in the configuration, and then you can get the results file.
Please make sure the `data_prefix` and `ann_file` in the [info for testing](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/datasets/kitti-3d-3class.py#L117) in the config correspond to the test set instead of the validation set.
After generating the results, you can basically compress the folder and upload it to the KITTI evaluation server.
## Qualitative Validation
MMDetection3D also provides versatile tools for visualization such that we can have an intuitive feeling of the detection results predicted by our trained models.
You can either set the `--show` option to visualize the detection results online during evaluation,
or use `tools/misc/visualize_results.py` for offline visualization.
Besides, we also provide the script `tools/misc/browse_dataset.py` to visualize the dataset without inference.
Please refer to the [doc for visualization](https://mmdetection3d.readthedocs.io/en/dev-1.x/user_guides/visualization.html) for more details.
LiDAR-based 3D semantic segmentation is one of the most basic tasks supported in MMDetection3D.
It expects the given model to take any number of points with features collected by LiDAR as input, and predict the semantic labels for each input point.
Next, taking PointNet++ (SSG) on the ScanNet dataset as an example, we will show how to prepare data, train and test a model on a standard 3D semantic segmentation benchmark, and how to visualize and validate the results.
## Data Preparation
To begin with, we need to download the raw data from ScanNet's [official website](http://kaldir.vc.in.tum.de/scannet_benchmark/documentation).
Due to different ways of organizing the raw data in different datasets, we typically need to collect the useful data information with a .pkl or .json file.
So after getting all the raw data ready, we can follow the instructions presented in [ScanNet README doc](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md/) to generate data infos.
Afterwards, the related folder structure should be as follows:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── scannet
│ │ ├── batch_load_scannet_data.py
│ │ ├── load_scannet_data.py
│ │ ├── scannet_utils.py
│ │ ├── README.md
│ │ ├── scans
│ │ ├── scans_test
│ │ ├── scannet_instance_data
│ │ ├── points
│ │ ├── instance_mask
│ │ ├── semantic_mask
│ │ ├── seg_info
│ │ │ ├── train_label_weight.npy
│ │ │ ├── train_resampled_scene_idxs.npy
│ │ │ ├── val_label_weight.npy
│ │ │ ├── val_resampled_scene_idxs.npy
│ │ ├── scannet_infos_train.pkl
│ │ ├── scannet_infos_val.pkl
│ │ ├── scannet_infos_test.pkl
```
## Training
Then let us train a model with provided configs for PointNet++ (SSG).
You can basically follow this [tutorial](https://mmdetection3d.readthedocs.io/en/latest/1_exist_data_model.html#inference-with-existing-models) for sample scripts when training with different GPU settings.
Suppose we use 2 GPUs on a single machine with distributed training:
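A sketch of the command (the config name here is an assumption following the `2xb16` naming convention in `configs/pointnet2/`):
```shell
./tools/dist_train.sh configs/pointnet2/pointnet2_ssg_2xb16-cosine-200e_scannet-seg.py 2
```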
If you would like to only conduct inference or test the model performance on the online benchmark,
you should change `ann_file='scannet_infos_val.pkl'` to `ann_file='scannet_infos_test.pkl'` in the
ScanNet dataset's [config](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/datasets/scannet-seg.py#L129). Remember to
specify the `submission_prefix` in the `test_evaluator`,
e.g., adding `test_evaluator = dict(type='SegMetric', submission_prefix='work_dirs/pointnet2_ssg/test_submission')`, or just adding `--cfg-options test_evaluator.submission_prefix=work_dirs/pointnet2_ssg/test_submission` at the end of the command.
After generating the results, you can basically compress the folder and upload to the [ScanNet evaluation server](http://kaldir.vc.in.tum.de/scannet_benchmark/semantic_label_3d).
## Qualitative Validation
MMDetection3D also provides versatile tools for visualization: you can use `tools/misc/visualize_results.py` with the results pkl file for offline visualization, or add `--show` at the end of the test command for online visualization.
Besides, we also provide the script `tools/misc/browse_dataset.py` to visualize the dataset without inference.
Please refer to the [doc for visualization](https://mmdetection3d.readthedocs.io/en/latest/useful_tools.html#visualization) for more details.
Vision-based 3D detection refers to the 3D detection solutions based on vision-only input, such as monocular, binocular, and multi-view image based 3D detection.
Currently, we only support monocular and multi-view 3D detection methods. Other approaches should also be compatible with our framework and will be supported in the future.
It expects the given model to take any number of images as input, and predict the 3D bounding boxes and category labels for each object of interest.
Taking FCOS3D on the nuScenes dataset as an example, we will show how to prepare data, train and test a model on a standard 3D detection benchmark, and how to visualize and validate the results.
## Data Preparation
To begin with, we need to download the raw data and reorganize the data in a standard way presented in the [doc for data preparation](https://mmdetection3d.readthedocs.io/en/latest/data_preparation.html).
Due to different ways of organizing the raw data in different datasets, we typically need to collect the useful data information with a .pkl or .json file.
So after getting all the raw data ready, we need to run the scripts provided in the `create_data.py` for different datasets to generate data infos.
Afterwards, the related folder structure should be as follows:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── nuscenes
│ │ ├── maps
│ │ ├── samples
│ │ ├── sweeps
│ │ ├── v1.0-test
│ │ ├── v1.0-trainval
│ │ ├── nuscenes_database
│ │ ├── nuscenes_infos_train.pkl
│ │ ├── nuscenes_infos_trainval.pkl
│ │ ├── nuscenes_infos_val.pkl
│ │ ├── nuscenes_infos_test.pkl
│ │ ├── nuscenes_dbinfos_train.pkl
```
## Training
Then let us train a model with provided configs for FCOS3D. The basic script is the same as other models.
You can basically follow the examples provided in this [tutorial](https://mmdetection3d.readthedocs.io/en/latest/1_exist_data_model.html#inference-with-existing-models) when training with different GPU settings.
Suppose we use 8 GPUs on a single machine with distributed training:
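A sketch of the command (the config name is inferred from the finetune config referenced below):
```shell
./tools/dist_train.sh configs/fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d.py 8
```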
Note that `8xb2` in the config name means the training is conducted with 8 GPUs and 2 data samples on each GPU.
If your customized setting is different from this, you should add `--auto-scale-lr` to enable automatic learning rate scaling. A basic rule can be referred to [here](https://arxiv.org/abs/1706.02677).
We can also achieve better performance with finetuned FCOS3D by running:
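A sketch using the finetune config referenced below:
```shell
./tools/dist_train.sh configs/fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d_finetune.py 8
```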
After training a baseline model with the previous script,
please remember to modify the path [here](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d_finetune.py#L8) correspondingly.
## Quantitative Evaluation
During training, the model checkpoints will be evaluated regularly according to the setting of `train_cfg = dict(val_interval=xxx)` in the config.
We support official evaluation protocols for different datasets.
Since the output format is the same as that of 3D detection based on other modalities, the evaluation methods are also the same.
For nuScenes, the model will be evaluated with distance-based mean AP (mAP) and NuScenes Detection Score (NDS) for 10 categories respectively.
The evaluation results will be printed to the command line like:
If you would like to only conduct inference or test the model performance on the online benchmark,
you just need to specify the `jsonfile_prefix` for the corresponding evaluator,
e.g., add `test_evaluator = dict(type='NuscenesMetric', jsonfile_prefix='work_dirs/fcos3d/test_submission')` in the configuration, and then you can get the results file.
Please make sure the `data_prefix` and `ann_file` in the [info for testing](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/datasets/nus-mono3d.py#L93) in the config correspond to the test set instead of the validation set.
After generating the results, you can basically compress the folder and upload it to the evalAI evaluation server for the nuScenes 3D detection challenge.
## Qualitative Validation
MMDetection3D also provides versatile tools for visualization such that we can have an intuitive feeling of the detection results predicted by our trained models.
You can either set the `--eval-options 'show=True' 'out_dir=${SHOW_DIR}'` option to visualize the detection results online during evaluation,
or use `tools/misc/visualize_results.py` for offline visualization.
Besides, we also provide the script `tools/misc/browse_dataset.py` to visualize the dataset without inference.
Please refer to the [doc for visualization](https://mmdetection3d.readthedocs.io/en/latest/useful_tools.html#visualization) for more details.
Note that currently we only support visualization on images for vision-only methods.
Visualization in the perspective view and bird's-eye view (BEV) will be integrated in the future.
In this section, we demonstrate how to prepare an environment with PyTorch.
MMDetection3D works on Linux, Windows (experimental support) and macOS. It requires Python 3.7+, CUDA 10.0+, and PyTorch 1.8+.
```{note}
If you are experienced with PyTorch and have already installed it, just skip this part and jump to the [next section](#installation). Otherwise, you can follow these steps for the preparation.
```
**Step 0.** Download and install Miniconda from the [official website](https://docs.conda.io/en/latest/miniconda.html).
**Step 1.** Create a conda environment and activate it.
```shell
conda create --name openmmlab python=3.8 -y
conda activate openmmlab
```
**Step 2.** Install PyTorch following [official instructions](https://pytorch.org/get-started/locally/), e.g.
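For example, on GPU platforms (a sketch; pick the exact command matching your CUDA version from the PyTorch website):
```shell
conda install pytorch torchvision -c pytorch
```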
We recommend that users follow our best practices to install MMDetection3D. However, the whole process is highly customizable. See [Customize Installation](#customize-installation) section for more information.
### Best Practices
**Step 0.** Install [MMEngine](https://github.com/open-mmlab/mmengine), [MMCV](https://github.com/open-mmlab/mmcv) and [MMDetection](https://github.com/open-mmlab/mmdetection) using [MIM](https://github.com/open-mmlab/mim).
```shell
pip install -U openmim
mim install mmengine
mim install 'mmcv>=2.0.0rc4'
mim install 'mmdet>=3.0.0'
```
**Note**: In MMCV v2.x, `mmcv-full` is renamed to `mmcv`. If you want to install `mmcv` without CUDA ops, you can use `mim install "mmcv-lite>=2.0.0rc4"` to install the lite version.
**Step 1.** Install MMDetection3D.
Case a: If you develop and run mmdet3d directly, install it from source:
```shell
git clone https://github.com/open-mmlab/mmdetection3d.git -b dev-1.x
# "-b dev-1.x" means checkout to the `dev-1.x` branch.
cd mmdetection3d
pip install -v -e .
# "-v" means verbose, or more output
# "-e" means installing a project in editable mode,
# thus any local modifications made to the code will take effect without reinstallation.
```
Case b: If you use mmdet3d as a dependency or third-party package, install it with MIM:
```shell
mim install"mmdet3d>=1.1.0"
```
Note:
1. If you would like to use `opencv-python-headless` instead of `opencv-python`,
you can install it before installing MMCV.
2. Some dependencies are optional. Simply running `pip install -v -e .` will only install the minimum runtime requirements. To use optional dependencies like `albumentations` and `imagecorruptions` either install them manually with `pip install -r requirements/optional.txt` or specify desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`). Valid keys for the extras field are: `all`, `tests`, `build`, and `optional`.
We have supported `spconv 2.0`. If the user has installed `spconv 2.0`, the code will use `spconv 2.0` first, which will take up less GPU memory than using the default `mmcv spconv`. Users can use the following commands to install `spconv 2.0`:
```shell
pip install cumm-cuxxx
pip install spconv-cuxxx
```
Where `xxx` is the CUDA version in the environment.
For example, using CUDA 10.2, the command will be `pip install cumm-cu102 && pip install spconv-cu102`.
Supported CUDA versions include 10.2, 11.1, 11.3, and 11.4. Users can also install it by building from the source. For more details please refer to [spconv v2.x](https://github.com/traveller59/spconv).
We also support `Minkowski Engine` as a sparse convolution backend. If necessary, please follow the original [installation guide](https://github.com/NVIDIA/MinkowskiEngine#installation) or use `pip` to install it.
We also support `Torchsparse` as a sparse convolution backend. If necessary, please follow the original [installation guide](https://github.com/mit-han-lab/torchsparse#installation) or use `pip` to install it.
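**Step 1.** To verify the installation, download the config and checkpoint files for the demo, e.g. with MIM (a sketch; the config name matches the files mentioned below):
```shell
mim download mmdet3d --config pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car --dest .
```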
The downloading will take several seconds or more, depending on your network environment. When it is done, you will find two files `pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py` and `hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth` in your current folder.
**Step 2.** Verify the inference demo.
Case a: If you install MMDetection3D from source, just run the following command.
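A sketch using the files downloaded above:
```shell
python demo/pcd_demo.py demo/data/kitti/000008.bin pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth --show
```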
You will see a visualizer interface with point cloud, where bounding boxes are plotted on cars.
**Note**:
If you install MMDetection3D on a remote server without display device, you can leave out the `--show` argument. Demo will still save the predictions to `outputs/pred/000008.json` file.
**Note**:
If you want to input a `.ply` file, you can use the following function to convert it to `.bin` format. Then you can use the converted `.bin` file to run the demo.
Note that you need to install `pandas` and `plyfile` before using this script. This function can also be used for data preprocessing when training with `.ply` data.
```python
import numpy as np
import pandas as pd
from plyfile import PlyData


def convert_ply(input_path, output_path):
    plydata = PlyData.read(input_path)  # read file
    data = plydata.elements[0].data  # read data
    data_pd = pd.DataFrame(data)  # convert to DataFrame
    data_np = np.zeros(data_pd.shape, dtype=np.float64)  # initialize array to store data
    property_names = data[0].dtype.names  # read names of properties
    for i, name in enumerate(property_names):  # read data by property
        data_np[:, i] = data_pd[name]
    data_np.astype(np.float32).tofile(output_path)
```
Examples:
```python
convert_ply('./test.ply', './test.bin')
```
If you have point clouds in other formats (`.off`, `.obj`, etc.), you can use `trimesh` to convert them into `.ply`.
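A minimal sketch with `trimesh` (the helper name `to_ply` is ours for illustration):
```python
import trimesh


def to_ply(input_path, output_path, original_type):
    mesh = trimesh.load(input_path, file_type=original_type)  # read file
    mesh.export(output_path, file_type='ply')  # convert to ply
```
For example: `to_ply('./test.obj', './test.ply', 'obj')`.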
You will see a list of `Det3DDataSample`, and the predictions are in the `pred_instances_3d`, indicating the detected bounding boxes, labels, and scores.
### Customize Installation
#### CUDA Versions
When installing PyTorch, you need to specify the version of CUDA. If you are not clear on which to choose, follow our recommendations:
- For Ampere-based NVIDIA GPUs, such as GeForce 30 series and NVIDIA A100, CUDA 11 is a must.
- For older NVIDIA GPUs, CUDA 11 is backward compatible, but CUDA 10.2 offers better compatibility and is more lightweight.
Please make sure the GPU driver satisfies the minimum version requirements. See [this table](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions) for more information.
```{note}
Installing CUDA runtime libraries is enough if you follow our best practices, because no CUDA code will be compiled locally. However if you hope to compile MMCV from source or develop other CUDA operators, you need to install the complete CUDA toolkit from NVIDIA's [website](https://developer.nvidia.com/cuda-downloads), and its version should match the CUDA version of PyTorch. i.e., the specified version of cudatoolkit in `conda install` command.
```
#### Install MMEngine without MIM
To install MMEngine with pip instead of MIM, please follow [MMEngine installation guides](https://mmengine.readthedocs.io/en/latest/get_started/installation.html).
For example, you can install MMEngine by the following command:
```shell
pip install mmengine
```
#### Install MMCV without MIM
MMCV contains C++ and CUDA extensions, thus depending on PyTorch in a complex way. MIM solves such dependencies automatically and makes the installation easier. However, it is not a must.
To install MMCV with pip instead of MIM, please follow [MMCV installation guides](https://mmcv.readthedocs.io/en/2.x/get_started/installation.html). This requires manually specifying a find-url based on PyTorch version and its CUDA version.
For example, the following command installs MMCV built for PyTorch 1.12.x and CUDA 11.6:
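A sketch (adjust the find-url to your own PyTorch and CUDA versions):
```shell
pip install "mmcv>=2.0.0rc4" -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html
```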
[Google Colab](https://colab.research.google.com/) usually has PyTorch installed, thus we only need to install MMEngine, MMCV, MMDetection, and MMDetection3D with the following commands.
**Step 1.** Install [MMEngine](https://github.com/open-mmlab/mmengine), [MMCV](https://github.com/open-mmlab/mmcv) and [MMDetection](https://github.com/open-mmlab/mmdetection) using [MIM](https://github.com/open-mmlab/mim).
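A sketch mirroring the best practices above, with the Jupyter `!` prefix:
```shell
!pip install -U openmim
!mim install mmengine
!mim install 'mmcv>=2.0.0rc4'
!mim install 'mmdet>=3.0.0'
```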
```{note}
Within Jupyter, the exclamation mark `!` is used to call external executables and `%cd` is a [magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd) to change the current working directory of Python.
```
#### Using MMDetection3D with Docker
We provide a [Dockerfile](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docker/Dockerfile) to build an image. Ensure that your [docker version](https://docs.docker.com/engine/install/) >= 19.03.
```shell
# build an image with PyTorch 1.9, CUDA 11.1
# If you prefer other versions, just modify the Dockerfile
docker build -t mmdetection3d docker/
```
Run it with:
```shell
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection3d/data mmdetection3d
```
### Troubleshooting
If you have some issues during the installation, please first view the [FAQ](notes/faq.md) page.
You may [open an issue](https://github.com/open-mmlab/mmdetection3d/issues/new/choose) on GitHub if no solution is found.
### Use Multiple Versions of MMDetection3D in Development
Training and testing scripts already modify the `PYTHONPATH` to ensure that they use their own versions of MMDetection3D.
To install the default version of MMDetection3D in your environment, you can exclude the following code in the related scripts:
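i.e. the line that prepends the repo root to `PYTHONPATH`, which typically looks like:
```shell
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
```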
Along with the release of OpenMMLab 2.0, MMDetection3D (namely MMDet3D) 1.1 made many significant changes, resulting in less redundant, more efficient code and a more consistent overall design. These changes break backward compatibility. Therefore, we prepared this migration guide to make the transition as smooth as possible so that all users can enjoy the productivity benefits of the new MMDet3D and the entire OpenMMLab 2.0 ecosystem.
## Environment
MMDet3D 1.1 depends on the new foundational library [MMEngine](https://github.com/open-mmlab/mmengine) for training deep learning models, and therefore has an entirely different dependency chain compared with MMDet3D 1.0. Even if you have a well-rounded MMDet3D 1.0 / 0.x environment before, you still need to create a new Python environment for MMDet3D 1.1. We provide a detailed [installation guide](./get_started.md) for reference.
The configuration files in our new version have a lot of modifications because of the differences between MMCV 1.x and MMEngine. The guides for migration from MMCV to MMEngine can be seen [here](https://github.com/open-mmlab/mmengine/tree/main/docs/en/migration).
We have renamed the names of the remote branches in MMDet3D 1.1 (renaming 1.1 to main, master to 1.0, and dev to dev-1.0). If your local branches in the git system are not aligned with branches of the remote repo, you can use the following commands to resolve it:
```
git fetch origin
git checkout main
git branch main_backup # backup your main branch
git reset --hard origin/main
```
## Dataset
You should update the annotation files generated in the 1.0 version, since some keywords and structures of the annotations have changed in MMDet3D 1.1. Taking KITTI as an example, the update script is as follows:
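A sketch with the converter script (the `--pkl-path` below assumes the default KITTI info location; point it at the file you want to update):
```shell
python tools/dataset_converters/update_infos_to_v2.py --dataset kitti --pkl-path ./data/kitti/kitti_infos_train.pkl --out-dir ./data/kitti
```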
If your annotation files are generated in the 0.x version, you should first update them to 1.0 version using this [script](../../tools/update_data_coords.py). Alternatively, you can re-generate annotation files from scratch using this [script](../../tools/create_data.py).
## Model
MMDet3D 1.1 supports loading weights trained on the old version (1.0 version). For models that are important or frequently used, we have thoroughly verified their precisions in the 1.1 version. Especially for some models that may experience potential performance drop or training bugs in the old version, such as [centerpoint](https://github.com/open-mmlab/mmdetection3d/issues/2390), we have checked them and ensured the right precision in the new version. If you encounter any problem, please feel free to raise an [issue](https://github.com/open-mmlab/mmdetection3d/issues). Additionally, we have added some of the latest SOTA methods in our [package](../../configs/) and [projects](../../projects/), making MMDet3D 1.1 a highly recommended choice for implementing your project.
- For fair comparison with other codebases, we report the GPU memory as the maximum value of `torch.cuda.max_memory_allocated()` for all 8 GPUs. Note that this value is usually less than what `nvidia-smi` shows.
- We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time. Results are obtained with the script [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/analysis_tools/benchmark.py) which computes the average time on 2000 images.
## Baselines
### SECOND
Please refer to [SECOND](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/second) for details. We provide SECOND baselines on KITTI and Waymo datasets.
### PointPillars
Please refer to [PointPillars](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/pointpillars) for details. We provide pointpillars baselines on KITTI, nuScenes, Lyft, and Waymo datasets.
### Part-A2
Please refer to [Part-A2](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/parta2) for details.
### VoteNet
Please refer to [VoteNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/votenet) for details. We provide VoteNet baselines on ScanNet and SUNRGBD datasets.
### Dynamic Voxelization
Please refer to [Dynamic Voxelization](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/dynamic_voxelization) for details.
### MVXNet
Please refer to [MVXNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/mvxnet) for details.
### RegNetX
Please refer to [RegNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/regnet) for details. We provide pointpillars baselines with RegNetX backbones on nuScenes and Lyft datasets currently.
### nuImages
We also support baseline models on [nuImages dataset](https://www.nuscenes.org/nuimages). Please refer to [nuImages](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/nuimages) for details. We report Mask R-CNN, Cascade Mask R-CNN and HTC results currently.
### H3DNet
Please refer to [H3DNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/h3dnet) for details.
### 3DSSD
Please refer to [3DSSD](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/3dssd) for details.
### CenterPoint
Please refer to [CenterPoint](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/centerpoint) for details.
### SSN
Please refer to [SSN](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/ssn) for details. We currently provide PointPillars with the shape-aware grouping heads used in SSN on the nuScenes and Lyft datasets.
### ImVoteNet
Please refer to [ImVoteNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/imvotenet) for details. We provide ImVoteNet baselines on the SUN RGB-D dataset.
### FCOS3D
Please refer to [FCOS3D](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/fcos3d) for details. We provide FCOS3D baselines on the nuScenes dataset.
### PointNet++
Please refer to [PointNet++](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/pointnet2) for details. We provide PointNet++ baselines on the ScanNet and S3DIS datasets.
### Group-Free-3D
Please refer to [Group-Free-3D](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/groupfree3d) for details. We provide Group-Free-3D baselines on the ScanNet dataset.
### ImVoxelNet
Please refer to [ImVoxelNet](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/imvoxelnet) for details. We provide ImVoxelNet baselines on the KITTI dataset.
### PAConv
Please refer to [PAConv](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/paconv) for details. We provide PAConv baselines on the S3DIS dataset.
### DGCNN
Please refer to [DGCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/dgcnn) for details. We provide DGCNN baselines on the S3DIS dataset.
### SMOKE
Please refer to [SMOKE](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/smoke) for details. We provide SMOKE baselines on the KITTI dataset.
### PGD
Please refer to [PGD](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pgd) for details. We provide PGD baselines on the KITTI and nuScenes datasets.
### PointRCNN
Please refer to [PointRCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/point_rcnn) for details. We provide PointRCNN baselines on the KITTI dataset.
### MonoFlex
Please refer to [MonoFlex](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/monoflex) for details. We provide MonoFlex baselines on the KITTI dataset.
### SA-SSD
Please refer to [SA-SSD](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/sassd) for details. We provide SA-SSD baselines on the KITTI dataset.
### FCAF3D
Please refer to [FCAF3D](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/fcaf3d) for details. We provide FCAF3D baselines on the ScanNet, S3DIS, and SUN RGB-D datasets.
### PV-RCNN
Please refer to [PV-RCNN](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/pv_rcnn) for details. We provide PV-RCNN baselines on the KITTI dataset.
### BEVFusion
Please refer to [BEVFusion](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/BEVFusion) for details. We provide BEVFusion baselines on the nuScenes dataset.
### CenterFormer
Please refer to [CenterFormer](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/CenterFormer) for details. We provide CenterFormer baselines on the Waymo dataset.
### TR3D
Please refer to [TR3D](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/TR3D) for details. We provide TR3D baselines on the ScanNet, SUN RGB-D, and S3DIS datasets.
### DETR3D
Please refer to [DETR3D](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/DETR3D) for details. We provide DETR3D baselines on the nuScenes dataset.
### PETR
Please refer to [PETR](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/PETR) for details. We provide PETR baselines on the nuScenes dataset.
### TPVFormer
Please refer to [TPVFormer](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/projects/TPVFormer) for details. We provide TPVFormer baselines on the nuScenes dataset.
### Mixed Precision (FP16) Training
Please refer to [Mixed Precision (FP16) Training on PointPillars](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pointpillars/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d.py) for details.
- Model: Since the other codebases implement different sets of models, we compare the corresponding models, including SECOND, PointPillars, Part-A2, and VoteNet, with each of them separately.
- Metrics: We use the average throughput in iterations over the entire training run and skip the first 50 iterations of each epoch to exclude GPU warm-up time.
## Main Results
We compare the training speed (samples/s) with other codebases that implement similar models. The results are shown below; the larger the numbers in the table, the faster the training process. Models that are not supported by other codebases are marked with `×`.
- __MMDetection3D__: We use settings as similar as possible to those of the other codebases via the [benchmark configs](https://github.com/open-mmlab/MMDetection3D/blob/main/configs/benchmark).
- __Det3D__: For comparison with Det3D, we use the commit [519251e](https://github.com/poodarchu/Det3D/tree/519251e72a5c1fdd58972eabeac67808676b9bb7).
- __OpenPCDet__: For comparison with OpenPCDet, we use the commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2).
For training speed, we add code to record the running time in the file `./tools/train_utils/train_utils.py`. We calculate the speed of each epoch and report the average speed over all the epochs.
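For illustration, here is a hedged sketch of the throughput computation described above; the recorded per-iteration times and batch size are placeholders supplied by the training loop.

```python
def epoch_speed(iter_times, batch_size, warmup=50):
    """Average throughput (samples/s) of one epoch, skipping the
    first `warmup` iterations to exclude GPU warm-up time."""
    measured = iter_times[warmup:]
    return batch_size * len(measured) / sum(measured)

def average_speed(per_epoch_times, batch_size):
    """Average the per-epoch speeds over the whole training run."""
    speeds = [epoch_speed(times, batch_size) for times in per_epoch_times]
    return sum(speeds) / len(speeds)
```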
We applied small diffs to the other codebases so that they use the same method for benchmarking speed:
- __Det3D__: At commit [519251e](https://github.com/poodarchu/Det3D/tree/519251e72a5c1fdd58972eabeac67808676b9bb7), we use `kitti_point_pillars_mghead_syncbn.py` for training. Note that for SECOND, we mean [SECONDv1.5](https://github.com/traveller59/second.pytorch/blob/master/second/configs/all.fhd.config), which was first implemented in [second.pytorch](https://github.com/traveller59/second.pytorch). Det3D's implementation of SECOND uses its self-implemented Multi-Group Head, so its speed is not directly comparable with the other codebases.
- __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), we train the model by running its training script.
#### Highlights
- Support the training of [DSVT](https://arxiv.org/abs/2301.06051) in `projects` (#2738)
- Support [Nerf-Det](https://arxiv.org/abs/2307.14620) in `projects` (#2732)
#### New Features
- Support the training of [DSVT](https://arxiv.org/abs/2301.06051) in `projects` (#2738)
- Support [Nerf-Det](https://arxiv.org/abs/2307.14620) in `projects` (#2732)
- Support [MV-FCOS3D++](https://arxiv.org/abs/2207.12716)
- Refactor Waymo dataset (#2836)
#### Improvements
- Support [PGD](https://arxiv.org/abs/2107.14160) (front-of-view / multi-view) on Waymo dataset (#2835)
- Release new [Waymo-mini](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_mini.tar.gz) for verifying some methods or debugging quickly (#2835)
#### Bug Fixes
- Fix some wrong configs of MinkUNet and SPVCNN (#2854)
- Fix incorrect number of arguments in PETR (#2800)
- Delete unused files in `mmdet3d/configs` (#2773)
#### Contributors
A total of 5 developers contributed to this release.
#### Highlights
- Support [New Config Type](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#a-pure-python-style-configuration-file-beta) in `mmdet3d/config` (#2608)
- Support the inference of [DSVT](https://arxiv.org/abs/2301.06051) in `projects` (#2606)
- Support downloading datasets from [OpenDataLab](https://opendatalab.com/) using `mim` (#2593)
#### New Features
- Support [New Config Type](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#a-pure-python-style-configuration-file-beta) in `mmdet3d/config` (#2608)
- Support the inference of [DSVT](https://arxiv.org/abs/2301.06051) in `projects` (#2606)
- Support downloading datasets from [OpenDataLab](https://opendatalab.com/) using `mim` (#2593)
#### Improvements
- Enhanced visualization in interactive form (#2611)
#### Highlights
- Support a camera-only 3D detection baseline on Waymo, [MV-FCOS3D++](https://arxiv.org/abs/2207.12716)
#### New Features
- Support a camera-only 3D detection baseline on Waymo, [MV-FCOS3D++](https://arxiv.org/abs/2207.12716), with new evaluation metrics and transformations (#1716)
- Refactor PointRCNN in the framework of mmdet3d v1.1 (#1819)
#### Improvements
- Add `auto_scale_lr` in config to support training with auto-scale learning rates (#1807)
- Fix CI (#1813, #1865, #1877)
- Update `browse_dataset.py` script (#1817)
- Update SUN RGB-D and Lyft datasets documentation (#1833)
- Rename `convert_to_datasample` to `add_pred_to_datasample` in detectors (#1843)
- Update customized dataset documentation (#1845)
- Update `Det3DLocalVisualization` and visualization documentation (#1857)
- Add the code of generating `cam_sync_labels` for Waymo dataset (#1870)
- Update dataset transforms typehints (#1875)
#### Bug Fixes
- Fix missing registration of models in [setup_env.py](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/utils/setup_env.py)(#1808)
- Fix the data base sampler bugs when using the ground plane data (#1812)
- Add output directory existing check during visualization (#1828)
- Fix bugs of nuScenes dataset for monocular 3D detection (#1837)
- Fix visualization hook to support the visualization of different data modalities (#1839)
- Fix monocular 3D detection demo (#1864)
- Fix the lack of `num_pts_feats` key in nuscenes dataset and complete docstring (#1882)
#### Contributors
A total of 10 developers contributed to this release.
We are excited to announce the release of MMDetection3D 1.1.0rc0.
MMDet3D 1.1.0rc0 is the first version of MMDetection3D 1.1, a part of the OpenMMLab 2.0 projects.
Built upon the new [training engine](https://github.com/open-mmlab/mmengine) and [MMDet 3.x](https://github.com/open-mmlab/mmdetection/tree/3.x),
MMDet3D 1.1 unifies the interfaces of dataset, models, evaluation, and visualization with faster training and testing speed.
It also provides a standard data protocol for different datasets, modalities, and tasks for 3D perception.
We will support more strong baselines in future releases, together with our latest exploration of camera-only 3D detection from videos.
### Highlights
1. **New engines**. MMDet3D 1.1 is based on [MMEngine](https://github.com/open-mmlab/mmengine) and [MMDet 3.x](https://github.com/open-mmlab/mmdetection/tree/3.x), which provides a universal and powerful runner that allows more flexible customizations and significantly simplifies the entry points of high-level interfaces.
2. **Unified interfaces**. As a part of the OpenMMLab 2.0 projects, MMDet3D 1.1 unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in those interfaces and logic to allow the emergence of multi-task/modality algorithms.
3. **Standard data protocol for all the datasets, modalities, and tasks for 3D perception**. Based on the unified base datasets inherited from MMEngine, we also design a standard data protocol that defines and unifies the common keys across different datasets, tasks, and modalities. It significantly simplifies the usage of multiple datasets and data modalities for multi-task frameworks and eases dataset customization. Please refer to the [documentation of customized datasets](../advanced_guides/customize_dataset.md) for details.
4. **Strong baselines**. We will release strong baselines of many popular models to enable fair comparisons among state-of-the-art models.
5. **More documentation and tutorials**. We add a bunch of documentation and tutorials to help users get started more smoothly. Read it [here](https://mmdetection3d.readthedocs.io/en/1.1/).
### Breaking Changes
MMDet3D 1.1 has undergone significant changes to have better design, higher efficiency, more flexibility, and more unified interfaces.
Besides the changes of API, we briefly list the major breaking changes in this section.
We will update the [migration guide](../migration.md) to provide complete details and migration instructions.
Users can also refer to the [compatibility documentation](./compatibility.md) and [API doc](https://mmdetection3d.readthedocs.io/en/1.1/) for more details.
#### Dependencies
- MMDet3D 1.1 runs on PyTorch>=1.6. We have deprecated the support of PyTorch 1.5 to embrace the mixed precision training and other new features since PyTorch 1.6. Some models can still run on PyTorch 1.5, but the full functionality of MMDet3D 1.1 is not guaranteed.
- MMDet3D 1.1 relies on MMEngine to run. MMEngine is a new foundational library for training deep learning models in OpenMMLab and is widely depended on by OpenMMLab 2.0 projects. The dependencies of file IO and training are migrated from MMCV 1.x to MMEngine.
- MMDet3D 1.1 relies on MMCV>=2.0.0rc0. Although MMCV no longer maintains the training functionalities since 2.0.0rc0, MMDet3D 1.1 relies on the data transforms, CUDA operators, and image processing interfaces in MMCV. Note that since MMCV 2.0.0rc0, the package `mmcv` provides pre-built CUDA operators while `mmcv-lite` does not, and `mmcv-full` has been deprecated.
- MMDet3D 1.1 is based on MMDet 3.x, which is also a part of OpenMMLab 2.0 projects.
#### Training and testing
- MMDet3D 1.1 uses the Runner in [MMEngine](https://github.com/open-mmlab/mmengine) rather than that in MMCV. The new Runner implements and unifies the building logic of the dataset, model, evaluation, and visualizer. Therefore, MMDet3D 1.1 no longer relies on the building logic of those modules in `mmdet3d.train.apis` and `tools/train.py`. That code has been migrated into [MMEngine](https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/runner.py). Please refer to the [migration guide of Runner in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/runner.html) for more details.
- The Runner in MMEngine also supports testing and validation. The testing scripts are also simplified, and they use similar logic to the training scripts to build the runner.
- The execution points of hooks in the new Runner have been enriched to allow more flexible customization. Please refer to the [migration guide of Hook in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/hook.html) for more details.
- Learning rate and momentum scheduling has been migrated from Hook to [Parameter Scheduler in MMEngine](https://mmengine.readthedocs.io/en/latest/tutorials/param_scheduler.html). Please refer to the [migration guide of Parameter Scheduler in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/param_scheduler.html) for more details.
#### Configs
- The [Runner in MMEngine](https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/runner.py) uses a different config structure to ease the understanding of the components in runner. Users can read the [config example of MMDet3D 1.1](../user_guides/config.md) or refer to the [migration guide in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/runner.html) for migration details.
- The file names of configs and models are also refactored to follow the new rules unified across OpenMMLab 2.0 projects. The names of checkpoints are not updated for now as there is no BC-breaking of model weights between MMDet3D 1.1 and 1.0.x. We will progressively replace all the model weights by those trained in MMDet3D 1.1. Please refer to the [user guides of config](../user_guides/config.md) for more details.
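As a brief, hedged illustration of the new structure, the snippet below shows the runner-related fields of a typical MMDet3D 1.1 config; the concrete values are placeholders rather than a recommended recipe.

```python
# Training, validation, and testing loops are configured explicitly.
train_cfg = dict(by_epoch=True, max_epochs=12, val_interval=1)
val_cfg = dict()
test_cfg = dict()

# The optimizer is wrapped by an OptimWrapper from MMEngine.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=1e-4, weight_decay=0.01))

# Learning rate scheduling moves from hooks to parameter schedulers.
param_scheduler = [
    dict(type='MultiStepLR', begin=0, end=12, by_epoch=True,
         milestones=[8, 11], gamma=0.1)
]
```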
#### Dataset
The Dataset classes implemented in MMDet3D 1.1 all inherit from `Det3DDataset` or `Seg3DDataset`, which in turn inherit from the [BaseDataset in MMEngine](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/basedataset.html). In addition to the changes of interfaces, there are several changes to the datasets in MMDet3D 1.1.
- All the datasets support serializing the internal data list to reduce memory usage when multiple workers are built for data loading.
- The internal data structure in the dataset is changed to be self-contained (without losing information like class names in MMDet3D 1.0.x) while keeping simplicity.
- Common keys across different datasets and data modalities are defined, and all the info files are unified into a standard protocol (see the sketch below).
- The evaluation functionality of each dataset has been removed from the dataset classes so that specific evaluation metrics like KITTI AP can be used to evaluate predictions on other datasets.
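As a hedged illustration of the standard protocol, the snippet below loads a converted info file and reads the unified keys; the file path is a placeholder, and the available keys depend on the dataset.

```python
import mmengine

# Converted info files are dicts with 'metainfo' (dataset-level
# information) and 'data_list' (one dict per sample).
infos = mmengine.load('data/kitti/kitti_infos_train.pkl')  # placeholder path
print(infos['metainfo'])

sample = infos['data_list'][0]
print(sample['lidar_points'])   # unified point cloud information
print(sample['instances'][0])   # unified per-instance annotations
```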
#### Data Transforms
The data transforms in MMDet3D 1.1 all inherit from `BaseTransform` in MMCV>=2.0.0rc0, which defines a new convention in OpenMMLab 2.0 projects; a short sketch of the new convention follows the list below.
Besides the interface changes, there are several changes listed below:
- The functionality of some data transforms (e.g., `Resize`) is decomposed into several transforms to simplify and clarify their usage.
- The format of the data dict processed by each data transform is changed according to the new data structure of the dataset.
- Some inefficient data transforms (e.g., normalization and padding) are moved into the data preprocessor of the model to improve data loading and training speed.
- The same data transforms in different OpenMMLab 2.0 libraries have the same augmentation implementation and logic given the same arguments, i.e., `Resize` in MMDet 3.x and MMSeg 1.x will resize the image in the exact same manner given the same arguments.
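A hedged sketch of the convention: a custom transform inherits `BaseTransform` from MMCV and implements a single `transform` method that consumes and returns the results dict. The transform below is a toy example; the `points` key and its tensor layout are assumptions for illustration.

```python
from mmcv.transforms import BaseTransform

class ClampPointIntensity(BaseTransform):
    """Toy transform following the MMCV 2.x `BaseTransform` convention."""

    def __init__(self, max_value: float = 1.0):
        self.max_value = max_value

    def transform(self, results: dict) -> dict:
        # `results` is the data dict defined by the standard protocol;
        # a real transform would read and write its documented keys.
        points = results['points']
        points.tensor[:, 3:] = points.tensor[:, 3:].clamp(max=self.max_value)
        results['points'] = points
        return results
```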
#### Model
The models in MMDet3D 1.1 all inherit from `BaseModel` in MMEngine, which defines a new convention for models in OpenMMLab 2.0 projects.
Users can refer to [the tutorial of model in MMEngine](https://mmengine.readthedocs.io/en/latest/tutorials/model.html) for more details.
Accordingly, there are several changes as the following:
- The model interfaces, including the input and output formats, are significantly simplified and unified following the new convention in MMDet3D 1.1.
Specifically, all the input data in training and testing are packed into `inputs` and `data_samples`, where `inputs` contains model inputs such as a dict containing a list of image tensors and the point cloud data, and `data_samples` contains other information of the current data sample such as ground truths, region proposals, and model predictions. In this way, different tasks in MMDet3D 1.1 can share the same input arguments, which makes the models more general and suitable for multi-task learning and some flexible training paradigms like semi-supervised learning.
- The model has a data preprocessor module, which is used to pre-process the input data of the model. In MMDet3D 1.1, the data preprocessor usually does the necessary steps to form the input images into a batch, such as padding. It can also serve as a place for some special data augmentations or more efficient data transformations like normalization.
- The internal logic of the model has been changed. In MMDet3D 1.0.x, the model uses `forward_train`, `forward_test`, `simple_test`, and `aug_test` to deal with different model forward logics. In MMDet3D 1.1 and OpenMMLab 2.0, the forward function has three modes: 'loss', 'predict', and 'tensor' for training, inference, and tracing or other purposes, respectively.
The forward function calls `self.loss`, `self.predict`, and `self._forward` given the modes 'loss', 'predict', and 'tensor', respectively.
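A hedged sketch of the three forward modes, following the `BaseModel` convention in MMEngine; the toy network and the handling of `data_samples` are placeholders.

```python
import torch
from mmengine.model import BaseModel

class ToyDetector(BaseModel):
    """Illustrates the 'loss' / 'predict' / 'tensor' forward modes."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(4, 2)  # placeholder backbone

    def forward(self, inputs, data_samples=None, mode='tensor'):
        feats = self.net(inputs)
        if mode == 'loss':
            # Training: return a dict of losses.
            return {'loss': feats.abs().mean()}
        if mode == 'predict':
            # Inference: attach predictions to the data samples.
            return data_samples
        # mode == 'tensor': raw outputs for tracing or debugging.
        return feats
```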
#### Evaluation
The evaluation in MMDet3D 1.0.x strictly binds with the dataset. In contrast, MMDet3D 1.1 decomposes the evaluation from the dataset so that all the detection datasets can be evaluated with KITTI AP and other metrics implemented in MMDet3D 1.1.
MMDet3D 1.1 mainly implements corresponding metrics for each dataset, which are manipulated by the [Evaluator](https://mmengine.readthedocs.io/en/latest/design/evaluator.html) to complete the evaluation.
Users can build an evaluator in MMDet3D 1.1 to conduct offline evaluation, i.e., evaluate predictions that may not be produced in MMDet3D 1.1, as long as the dataset and the predictions follow the dataset conventions. More details can be found in the [tutorial in MMEngine](https://mmengine.readthedocs.io/en/latest/tutorials/evaluation.html).
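A hedged sketch of offline evaluation with the MMEngine `Evaluator`; the metric type, the file paths, and the saved prediction format are assumptions for illustration.

```python
from mmengine.evaluator import Evaluator
from mmengine.fileio import load

# Build an evaluator from a metric config. 'KittiMetric' and the
# annotation file are placeholders for a concrete metric and dataset.
evaluator = Evaluator(dict(type='KittiMetric',
                           ann_file='data/kitti/kitti_infos_val.pkl'))

# Predictions saved beforehand (e.g. by the test script) can be
# evaluated again without running the model.
predictions = load('results.pkl')
metrics = evaluator.offline_evaluate(predictions)
```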
#### Visualization
The functions of visualization in MMDet3D 1.0.x are removed. Instead, in OpenMMLab 2.0 projects, we use [Visualizer](https://mmengine.readthedocs.io/en/latest/design/visualization.html) to visualize data. MMDet3D 1.1 implements `Det3DLocalVisualizer` to allow visualization of 2D and 3D data, ground truths, model predictions, feature maps, etc., at any place. It also supports sending the visualization data to external visualization backends such as TensorBoard.
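For example, a hedged config sketch that wires `Det3DLocalVisualizer` to a TensorBoard backend in addition to local storage; the backend choice is illustrative.

```python
# Visualization backends decide where drawn results are written;
# TensorboardVisBackend additionally logs them as event files.
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='Det3DLocalVisualizer',
    vis_backends=vis_backends,
    name='visualizer')
```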
### Planned changes
We list several planned changes of MMDet3D 1.1.0rc0 so that the community can more comprehensively know the progress of MMDet3D 1.1. Feel free to create a PR, issue, or discussion if you are interested, have any suggestions or feedback, or want to participate.
1. Test-time augmentation: TTA is supported in MMDet3D 1.0.x but is not implemented in this version due to the limited time slot. We will support it in the following releases with a new and simplified design.
2. Inference interfaces: a unified inference interface will be supported in the future to ease the use of released models.
3. Interfaces of useful tools that can be used in notebooks: more useful tools implemented in the `tools` directory will have their Python interfaces so that they can be used through notebooks and in downstream libraries.
4. Documentation: we will add more design docs, tutorials, and migration guidance so that the community can deep dive into our new design, participate in future development, and smoothly migrate downstream libraries to MMDet3D 1.1.
5. Wandb visualization: MMDet 2.x supports data visualization in Wandb since v2.25.0, which has not been migrated to MMDet 3.x yet. Since Wandb provides strong visualization and experiment management capabilities, a `DetWandbVisualizer` and maybe a hook are planned to fully migrate those functionalities from MMDet 2.x, and a `Det3DWandbVisualizer` will be supported in MMDet3D 1.1 accordingly.
6. Recent new features added in MMDet3D 1.0.x and our recent exploration of camera-only 3D detection from videos: we will refactor these models and support them with benchmarks and models soon.
#### Highlights
- Support [SA-SSD](https://openaccess.thecvf.com/content_CVPR_2020/papers/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.pdf)
#### New Features
- Support [SA-SSD](https://openaccess.thecvf.com/content_CVPR_2020/papers/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.pdf)(#1337)
#### Improvements
- Add Chinese documentation for vision-only 3D detection (#1438)
- Update CenterPoint pretrained models that are compatible with refactored coordinate systems (#1450)
- Configure myst-parser to parse anchor tag in the documentation (#1488)
- Replace markdownlint with mdformat for avoiding installing ruby (#1489)
- Add missing `gt_names` when getting annotation info in Custom3DDataset (#1519)
- Support S3DIS full ceph training (#1542)
- Rewrite the installation and FAQ documentation (#1545)
#### Bug Fixes
- Fix the incorrect registry name when building RoI extractors (#1460)
- Fix the potential problems caused by the registry scope update when composing pipelines (#1466) and using CocoDataset (#1536)
- Fix the missing selection with `order` in the [box3d_nms](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/post_processing/box3d_nms.py) introduced by [#1403](https://github.com/open-mmlab/mmdetection3d/pull/1403)(#1479)
- Update the [PointPillars config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car.py) to make it consistent with the log (#1486)
- Fix heading anchor in documentation (#1490)
- Fix the compatibility of mmcv in the dockerfile (#1508)
- Make overwrite_spconv packaged when building whl (#1516)
- Fix the requirement of mmcv and mmdet (#1537)
- Update configs of PartA2 and support its compatibility with spconv 2.0 (#1538)
#### Contributors
A total of 13 developers contributed to this release.
#### Highlights
- Support training models on custom datasets with only point clouds
- Update Registry to distinguish the scope of built functions
- Replace mmcv.iou3d with a set of bird-eye-view (BEV) operators to unify the operations of rotated boxes
#### New Features
- Add loader arguments in the configuration files (#1388)
- Support [spconv 2.0](https://github.com/traveller59/spconv) when the package is installed. Users can still use spconv 1.x in MMCV with CUDA 9.0 (at the cost of more memory) without losing the compatibility of model weights between the two versions (#1421)
- Support MinkowskiEngine with MinkResNet (#1422)
#### Improvements
- Add the documentation for model deployment (#1373, #1436)
- Add Chinese documentation of
  - Speed benchmark (#1379)
  - LiDAR-based 3D detection (#1368)
  - LiDAR 3D segmentation (#1420)
  - Coordinate system refactoring (#1384)
- Support training models on custom datasets with only point clouds (#1393)
- Replace mmcv.iou3d with a set of bird-eye-view (BEV) operators to unify the operations of rotated boxes (#1403, #1418)
- Update Registry to distinguish the scope of building functions (#1412, #1443)
- Replace recommonmark with myst_parser for documentation rendering (#1414)
#### Bug Fixes
- Fix the show pipeline in the [browse_dataset.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/misc/browse_dataset.py)(#1376)
- Fix missing `__init__` files after coordinate system refactoring (#1383)
- Fix the incorrect yaw in the visualization caused by coordinate system refactoring (#1407)
- Fix `NaiveSyncBatchNorm1d` and `NaiveSyncBatchNorm2d` to support non-distributed cases and more general inputs (#1435)
#### Contributors
A total of 11 developers contributed to this release.
#### Compatibility
- We migrate all the mmdet3d ops to mmcv, so they no longer need to be compiled when installing mmdet3d.
- To fix the imprecise timestamp and optimize its saving method, we reformat the point cloud data during Waymo data conversion. The data conversion time is also optimized significantly by supporting parallel processing. Please re-generate KITTI format Waymo data if necessary. See more details in the [compatibility documentation](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/compatibility.md).
- We update some of the model checkpoints after the refactor of coordinate systems. Please stay tuned for the release of the remaining model checkpoints.
#### Highlights
- Add ScanNet instance segmentation dataset with metrics
- Better compatibility for windows with CI support, op migration and bug fixes
- Support loading annotations from Ceph
#### New Features
- Add ScanNet instance segmentation dataset with metrics (#1230)
- Support different random seeds for different ranks (#1321)
- Support loading annotations from Ceph (#1325)
- Support resuming from the latest checkpoint automatically (#1329)
- Add windows CI (#1345)
#### Improvements
- Update the table format and OpenMMLab project orders in [README.md](https://github.com/open-mmlab/mmdetection3d/blob/master/README.md)(#1272, #1283)
- Migrate all the mmdet3d ops to mmcv (#1240, #1286, #1290, #1333)
- Add `with_plane` flag in the KITTI data conversion (#1278)
- Update instructions and links in the documentation (#1300, #1309, #1319)
- Support parallel Waymo dataset converter and ground truth database generator (#1327)
- Add quick installation commands to [getting_started.md](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/getting_started.md)(#1366)
#### Bug Fixes
- Update nuimages configs to use new nms config style (#1258)
- Fix the usage of np.long for windows compatibility (#1270)
- Fix the incorrect indexing in `BasePoints` (#1274)
- Fix the incorrect indexing in the [pillar_scatter.forward_single](https://github.com/open-mmlab/mmdetection3d/blob/dev/mmdet3d/models/middle_encoders/pillar_scatter.py#L38)(#1280)
- Fix unit tests that use GPUs (#1301)
- Fix incorrect feature dimensions in `DynamicPillarFeatureNet` caused by previous upgrading of `PillarFeatureNet` (#1302)
- Remove the `CameraPoints` constraint in `PointSample` (#1314)
- Fix imprecise timestamps saving of Waymo dataset (#1327)
#### Contributors
A total of 9 developers contributed to this release.
#### Compatibility
- We refactor our three coordinate systems to make their rotation directions and origins more consistent, and further remove unnecessary hacks in different datasets and models. Therefore, please re-generate data infos or convert the old version to the new one with our provided scripts. We will also provide updated checkpoints in the next version. Please refer to the [compatibility documentation](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0.dev0/docs/en/compatibility.md) for more details.
- Unify the camera keys for consistent transformation between coordinate systems on different datasets. The modification changes the key names to `lidar2img`, `depth2img`, `cam2img`, etc., for easier understanding. Customized codes using legacy keys may be influenced.
- The next release will begin to move files of CUDA ops to [MMCV](https://github.com/open-mmlab/mmcv). It will influence the way to import related functions. We will not break the compatibility but will raise a warning first and please prepare to migrate it.
#### Highlights
- Support new monocular 3D detectors: [PGD](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pgd), [SMOKE](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/smoke), [MonoFlex](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/monoflex)
- Support a new LiDAR-based detector: [PointRCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/point_rcnn)
- Support a new backbone: [DGCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/dgcnn)
- Support 3D object detection on the S3DIS dataset
- Support compilation on Windows
- Full benchmark for PAConv on S3DIS
- Further enhancement for documentation, especially on the Chinese documentation
#### New Features
- Support 3D object detection on the S3DIS dataset (#835)
- Support PointRCNN (#842, #843, #856, #974, #1022, #1109, #1125)
#### Bug Fixes
- Fix missing dimension information in the SUN RGB-D data generation (#1120)
- Fix incorrect anchor range settings in the PointPillars [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/models/hv_pointpillars_secfpn_kitti.py) for KITTI (#1163)
- Fix incorrect model information in the RegNet metafile (#1184)
- Fix bugs in non-distributed multi-gpu training and testing (#1197)
- Fix a potential assertion error when generating corners from an empty box (#1212)
- Upgrade bazel version according to the requirement of Waymo Devkit (#1223)
#### Contributors
A total of 12 developers contributed to this release.
#### Improvements
- Support Flip3D augmentation in semantic segmentation task (#1182)
- Update regnet metafile (#1184)
- Add point cloud annotation tools introduction in FAQ (#1185)
- Add missing explanations of `cam_intrinsic` in the nuScenes dataset doc (#1193)
#### Bug Fixes
- Deprecate the support for "python setup.py test" (#1164)
- Fix the rotation matrix while rotation axis=0 (#1182)
- Fix the bug in non-distributed multi-gpu training/testing (#1197)
- Fix a potential bug when generating corners for empty bounding boxes (#1212)
#### Contributors
A total of 4 developers contributed to this release.
@ZwwWayne, @ZCMax, @Tai-Wang, @wHao-Wu
### v0.18.0 (1/1/2022)
#### Highlights
- Update the required minimum version of mmdet and mmseg
#### Improvements
- Use the official markdownlint hook and add codespell hook for pre-committing (#1088)
- Improve CI operation (#1095, #1102, #1103)
- Use shared menu content from OpenMMLab's theme and remove duplicated contents from config (#1111)
- Refactor the structure of documentation (#1113, #1121)
- Update the required minimum version of mmdet and mmseg (#1147)
#### Bug Fixes
- Fix symlink failure on Windows (#1096)
- Fix the upper bound of mmcv version in the mminstall requirements (#1104)
- Fix API documentation compilation and mmcv build errors (#1116)
- Fix figure links and pdf documentation compilation (#1132, #1135)
#### Contributors
A total of 4 developers contributed to this release.
@ZwwWayne, @ZCMax, @Tai-Wang, @wHao-Wu
### v0.17.3 (1/12/2021)
#### Improvements
- Change the default show value to `False` in show_result function to avoid unnecessary errors (#1034)
- Improve the visualization of detection results with colorized points in [single_gpu_test](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/apis/test.py#L11)(#1050)
- Clean unnecessary custom_imports in entrypoints (#1068)
#### Bug Fixes
- Update mmcv version in the Dockerfile (#1036)
- Fix the memory-leak problem when loading checkpoints in [init_model](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/apis/inference.py#L36)(#1045)
- Fix incorrect velocity indexing when formatting boxes on nuScenes (#1049)
- Explicitly set cuda device ID in [init_model](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/apis/inference.py#L36) to avoid memory allocation on unexpected devices (#1056)
#### Improvements
- Update the solutions for incompatibility of pycocotools in the FAQ (#993)
- Add Chinese documentation for the KITTI (#1003) and Lyft (#1010) dataset tutorial
- Add the H3DNet checkpoint converter for incompatible keys (#1007)
#### Bug Fixes
- Update mmdetection and mmsegmentation version in the Dockerfile (#992)
- Fix links in the Chinese documentation (#1015)
#### Contributors
A total of 4 developers contributed to this release.
@Tai-Wang, @wHao-Wu, @ZwwWayne, @ZCMax
### v0.17.1 (1/10/2021)
#### Highlights
- Support a faster but non-deterministic version of hard voxelization
- Completion of dataset tutorials and the Chinese documentation
- Improved the aesthetics of the documentation format
#### Improvements
- Add Chinese documentation for training on customized datasets and designing customized models (#729, #820)
- Support a faster but non-deterministic version of hard voxelization (#904)
- Update paper titles and code details for metafiles (#917)
- Add a tutorial for KITTI dataset (#953)
- Use Pytorch sphinx theme to improve the format of documentation (#958)
- Use the docker to accelerate CI (#971)
#### Bug Fixes
- Fix the sphinx version used in the documentation (#902)
- Fix a dynamic scatter bug that discards the first voxel by mistake when all input points are valid (#915)
- Fix the inconsistent variable names used in the [unit test](https://github.com/open-mmlab/mmdetection3d/blob/master/tests/test_models/test_voxel_encoder/test_voxel_generator.py) for voxel generator (#919)
- Upgrade to use `build_prior_generator` to replace the legacy `build_anchor_generator` (#941)
- Fix a minor bug caused by a too small difference set in the FreeAnchor Head (#944)
#### Contributors
A total of 8 developers contributed to this release.
#### Compatibility
- Unify the camera keys for consistent transformation between coordinate systems on different datasets. The modification changes the key names to `lidar2img`, `depth2img`, `cam2img`, etc., for easier understanding. Customized codes using legacy keys may be influenced.
- The next release will begin to move files of CUDA ops to [MMCV](https://github.com/open-mmlab/mmcv). It will influence the way to import related functions. We will not break the compatibility but will raise a warning first and please prepare to migrate it.
#### Highlights
- Support 3D object detection on the S3DIS dataset
- Support compilation on Windows
- Full benchmark for PAConv on S3DIS
- Further enhancement for documentation, especially on the Chinese documentation
#### New Features
- Support 3D object detection on the S3DIS dataset (#835)
#### Improvements
- Support point sampling based on distance metric (#667, #840)
- Update PointFusion to support unified camera keys (#791)
- Add Chinese documentation for customized dataset (#792), data pipeline (#827), customized runtime (#829), 3D Detection on ScanNet (#836), nuScenes (#854) and Waymo (#859)
- Unify camera keys used in transformation between different systems (#805)
- Add a script to support benchmark regression (#808)
- Benchmark PAConvCUDA on S3DIS (#847)
- Add a tutorial for 3D detection on the Lyft dataset (#849)
- Support to download pdf and epub documentation (#850)
- Change the `repeat` setting in Group-Free-3D configs to reduce training epochs (#855)
#### Bug Fixes
- Fix compiling errors on Windows (#766)
- Fix the deprecated nms setting in the ImVoteNet config (#828)
- Use the latest `wrap_fp16_model` import from mmcv (#861)
- Remove 2D annotations generation on Lyft (#867)
- Update index files for the Chinese documentation to be consistent with the English version (#873)
- Fix the nested list transpose in the CenterPoint head (#879)
- Fix deprecated pretrained model loading for RegNet (#889)
#### Contributors
A total of 11 developers contributed to this release.
#### Compatibility
- Remove the rotation and dimension hack in the monocular 3D detection on nuScenes by applying corresponding transformation in the pre-processing and post-processing. The modification only influences nuScenes coco-style json files. Please re-run the data preparation scripts if necessary. See more details in the PR #744.
- Add a new pre-processing module for the ScanNet dataset in order to support multi-view detectors. Please run the updated scripts to extract the RGB data and its annotations. See more details in the PR #696.
#### Highlights
- Support to use [MIM](https://github.com/open-mmlab/mim) with pip installation
- Support PAConv [models and benchmarks](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/paconv) on S3DIS
- Enhance the documentation especially on dataset tutorials
#### New Features
- Support RGB images on ScanNet for multi-view detectors (#696)
- Support FLOPs and number of parameters calculation (#736)
- Support to use [MIM](https://github.com/open-mmlab/mim) with pip installation (#782)
- Support PAConv models and benchmarks on the S3DIS dataset (#783, #809)
#### Improvements
- Refactor Group-Free-3D to make it inherit BaseModule from MMCV (#704)
- Modify the initialization methods of FCOS3D to be consistent with the refactored approach (#705)
- Benchmark the Group-Free-3D [models](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/groupfree3d) on ScanNet (#710)
- Add Chinese documentation for Getting Started (#725), FAQ (#730), Model Zoo (#735), Demo (#745), Quick Run (#746), Data Preparation (#787) and Configs (#788)
- Add documentation for semantic segmentation on ScanNet and S3DIS (#743, #747, #806, #807)
- Add a parameter `max_keep_ckpts` to limit the maximum number of saved Group-Free-3D checkpoints (#765)
- Add documentation for 3D detection on SUN RGB-D and nuScenes (#770, #793)
- Remove mmpycocotools in the Dockerfile (#785)
#### Bug Fixes
- Fix versions of OpenMMLab dependencies (#708)
- Convert `rt_mat` to `torch.Tensor` in coordinate transformation for compatibility (#709)
- Fix the `bev_range` initialization in `ObjectRangeFilter` according to the `gt_bboxes_3d` type (#717)
- Fix Chinese documentation and incorrect doc format due to the incompatible Sphinx version (#718)
- Fix a potential bug when setting `interval == 1` in [analyze_logs.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/analysis_tools/analyze_logs.py)(#720)
- Update the structure of Chinese documentation (#722)
- Fix FCOS3D FPN BC-Breaking caused by the code refactoring in MMDetection (#739)
- Fix wrong `in_channels` when `with_distance=True` in the [Dynamic VFE Layers](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/models/voxel_encoders/voxel_encoder.py#L87)(#749)
- Fix the dimension and yaw hack of FCOS3D on nuScenes (#744, #794, #795, #818)
- Fix the missing default `bbox_mode` in the `show_multi_modality_result` (#825)
#### Contributors
A total of 12 developers contributed to this release.
#### Compatibility
In order to fix the problem that the priority of EvalHook is too low, all hook priorities have been re-adjusted in 1.3.8, so MMDetection 2.14.0 needs to rely on the latest MMCV 1.3.8 version. For related information, please refer to [#1120](https://github.com/open-mmlab/mmcv/pull/1120); for related issues, please refer to [#5343](https://github.com/open-mmlab/mmdetection/issues/5343).
#### Highlights
- Support [PAConv](https://arxiv.org/abs/2103.14635)
- Support monocular/multi-view 3D detector [ImVoxelNet](https://arxiv.org/abs/2106.01178) on KITTI
- Support Transformer-based 3D detection method [Group-Free-3D](https://arxiv.org/abs/2104.00678) on ScanNet
- Add documentation for tasks including LiDAR-based 3D detection, vision-only 3D detection and point-based 3D semantic segmentation
- Add dataset documentation such as ScanNet
#### New Features
- Support Group-Free-3D on ScanNet (#539)
- Support PAConv modules (#598, #599)
- Support ImVoxelNet on KITTI (#627, #654)
#### Improvements
- Add unit tests for pipeline functions `LoadImageFromFileMono3D`, `ObjectNameFilter` and `ObjectRangeFilter` (#615)
- Refactor model initialization methods based on MMCV (#622)
- Add Chinese docs (#629)
- Add documentation for LiDAR-based 3D detection (#642)
- Unify intrinsic and extrinsic matrices for all datasets (#653)
- Add documentation for point-based 3D semantic segmentation (#663)
- Add documentation of ScanNet for 3D detection (#664)
- Refine docs for tutorials (#666)
- Add documentation for vision-only 3D detection (#669)
- Refine docs for Quick Run and Useful Tools (#686)
#### Bug Fixes
- Fix the bug of [BackgroundPointsFilter](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/pipelines/transforms_3d.py) using the bottom center of ground truth (#609)
- Fix [LoadMultiViewImageFromFiles](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/pipelines/loading.py) to unravel stacked multi-view images into a list to be consistent with DefaultFormatBundle (#611)
- Fix the potential bug in [analyze_logs](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/analysis_tools/analyze_logs.py) when the training resumes from a checkpoint or is stopped before evaluation (#634)
- Fix test commands in docs and make some refinements (#635)
- Fix wrong config paths in unit tests (#641)
### v0.14.0 (1/6/2021)
#### Highlights
- Support the point cloud segmentation method [PointNet++](https://arxiv.org/abs/1706.02413)
#### New Features
- Support PointNet++ (#479, #528, #532, #541)
- Support RandomJitterPoints transform for point cloud segmentation (#584)
- Support RandomDropPointsColor transform for point cloud segmentation (#585)
#### Improvements
- Move the point alignment of ScanNet from data pre-processing to pipeline (#439, #470)
- Add compatibility document to provide detailed descriptions of BC-breaking changes (#504)
- Support points rotation even without bounding box in GlobalRotScaleTrans for point cloud segmentation (#540)
- Support visualization of detection results and dataset browse for nuScenes Mono-3D dataset (#542, #582)
- Support faster implementation of KNN (#586)
- Support RegNetX models on Lyft dataset (#589)
- Remove a useless parameter `label_weight` from segmentation datasets including `Custom3DSegDataset`, `ScanNetSegDataset` and `S3DISSegDataset` (#607)
#### Bug Fixes
- Fix a corrupted lidar data file in Lyft dataset in [data_preparation](https://github.com/open-mmlab/mmdetection3d/tree/master/docs/data_preparation.md)(#546)
- Fix evaluation bugs in nuScenes and Lyft dataset (#549)
- Fix converting points between coordinates with specific transformation matrix in the [coord_3d_mode.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/coord_3d_mode.py)(#556)
- Support PointPillars models on Lyft dataset (#578)
- Fix the bug of demo with pre-trained VoteNet model on ScanNet (#600)
### v0.13.0 (1/5/2021)
#### Highlights
- Support a monocular 3D detection method [FCOS3D](https://arxiv.org/abs/2104.10956)
- Support ScanNet and S3DIS semantic segmentation dataset
- Enhancement of visualization tools for dataset browsing and demos, including support of visualization for multi-modality data and point cloud segmentation.
#### New Features
- Support ScanNet semantic segmentation dataset (#390)
- Support monocular 3D detection on nuScenes (#392)
- Support multi-modality visualization (#405)
- Support nuimages visualization (#408)
- Support monocular 3D detection on KITTI (#415)
- Support online visualization of semantic segmentation results (#416)
- Support ScanNet test results submission to online benchmark (#418)
- Support S3DIS data pre-processing and dataset class (#433)
- Support FCOS3D (#436, #442, #482, #484)
- Support dataset browse for multiple types of datasets (#467)
- Add paper-with-code (PWC) metafiles for each model in the model zoo (#485)
#### Improvements
- Support dataset browsing for SUNRGBD, ScanNet or KITTI points and detection results (#367)
- Add the pipeline to load data using file client (#430)
- Support to customize the type of runner (#437)
- Make pipeline functions process points and masks simultaneously when sampling points (#444)
- Add waymo unit tests (#455)
- Split the visualization of projecting points onto image from that for only points (#480)
- Efficient implementation of PointSegClassMapping (#489)
- Use the new model registry from mmcv (#495)
#### Bug Fixes
- Fix Pytorch 1.8 Compilation issue in the [scatter_points_cuda.cu](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/voxel/src/scatter_points_cuda.cu)(#404)
- Fix [dynamic_scatter](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/voxel/src/scatter_points_cuda.cu) errors triggered by empty point input (#417)
- Fix the bug of missing points caused by using break incorrectly in the voxelization (#423)
- Fix the missing `coord_type` in the waymo dataset [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/datasets/waymoD5-3d-3class.py)(#441)
- Fix errors in four unittest functions of [configs](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d.py), [test_detectors.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tests/test_models/test_detectors.py), [test_heads.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tests/test_models/test_heads/test_heads.py)(#453)
- Fix 3DSSD training errors and simplify configs (#462)
- Clamp 3D votes projections to image boundaries in ImVoteNet (#463)
- Update out-of-date names of pipelines in the [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/benchmark/hv_pointpillars_secfpn_3x8_100e_det3d_kitti-3d-car.py) of pointpillars benchmark (#474)
- Fix the lack of a placeholder when unpacking RPN targets in the [h3d_bbox_head.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/models/roi_heads/bbox_heads/h3d_bbox_head.py)(#508)
- Fix the incorrect value of `K` when creating pickle files for SUN RGB-D (#511)
### v0.12.0 (1/4/2021)
#### Highlights
- Support a new multi-modality method [ImVoteNet](https://arxiv.org/abs/2001.10692).
- Support PyTorch 1.7 and 1.8
- Refactor the structure of tools and [train.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/train.py)/[test.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/test.py)
#### New Features
- Support LiDAR-based semantic segmentation metrics (#332)
- Support [ImVoteNet](https://arxiv.org/abs/2001.10692)(#352, #384)
- Support the KNN GPU operation (#360, #371)
#### Improvements
- Add FAQ for common problems in the documentation (#333)
- Refactor the structure of tools (#339)
- Refactor [train.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/train.py) and [test.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/test.py)(#343)
- Support demo on nuScenes (#353)
- Add 3DSSD checkpoints (#359)
- Update the Bibtex of CenterPoint (#368)
- Add citation format and reference to other OpenMMLab projects in the README (#374)
- Upgrade the mmcv version requirements (#376)
- Add numba and numpy version requirements in FAQ (#379)
- Avoid unnecessary for-loop execution of vfe layer creation (#389)
- Update SUNRGBD dataset documentation to stress the requirements for training ImVoteNet (#391)
- Modify vote head to support 3DSSD (#396)
#### Bug Fixes
- Fix missing keys `coord_type` in database sampler config (#345)
- Rename H3DNet configs (#349)
- Fix CI by using ubuntu 18.04 in github workflow (#350)
- Add assertions to avoid 4-dim points being input to [points_in_boxes](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/roiaware_pool3d/points_in_boxes.py)(#357)
- Fix the SECOND results on Waymo in the corresponding [README](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/second)(#363)
- Fix the incorrect adopted pipeline when adding val to workflow (#370)
- Fix a potential bug when indices used in the backwarding in ThreeNN (#377)
- Fix a compilation error triggered by [scatter_points_cuda.cu](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/voxel/src/scatter_points_cuda.cu) in PyTorch 1.7 (#393)
### v0.11.0 (1/3/2021)
#### Highlights
- Support more friendly visualization interfaces based on open3d
- Support a faster and more memory-efficient implementation of DynamicScatter
- Refactor unit tests and details of configs
#### New Features
- Support new visualization methods based on open3d (#284, #323)
#### Improvements
- Refactor unit tests (#303)
- Move the keys `train_cfg` and `test_cfg` into the model configs (#307)
- Update [README](https://github.com/open-mmlab/mmdetection3d/blob/master/README.md/) with [Chinese version](https://github.com/open-mmlab/mmdetection3d/blob/master/README_zh-CN.md/) and [instructions for getting started](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/getting_started.md/). (#310, #316)
- Support a faster and more memory-efficient implementation of DynamicScatter (#318, #326)
#### Bug Fixes
- Fix an unsupported bias setting in the unit test for centerpoint head (#304)
- Fix errors due to typos in the centerpoint head (#308)
- Fix a minor bug in [points_in_boxes.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/roiaware_pool3d/points_in_boxes.py) when tensors are not in the same device. (#317)
- Fix warning of deprecated usages of nonzero during training with PyTorch 1.6 (#330)
### v0.10.0 (1/2/2021)
#### Highlights
- Preliminary release of API for SemanticKITTI dataset.
- Documentation and demo enhancement for better user experience.
- Fix a number of underlying minor bugs and add some corresponding important unit tests.
#### New Features
- Support SemanticKITTI dataset preliminarily (#287)
#### Improvements
- Add tag to README in configurations for specifying different uses (#262)
- Update instructions for evaluation metrics in the documentation (#265)
- Add nuImages entry in [README.md](https://github.com/open-mmlab/mmdetection3d/blob/master/README.md/) and gif demo (#266, #268)
- Add unit test for voxelization (#275)
#### Bug Fixes
- Fix the issue of unpacking size in [furthest_point_sample.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/furthest_point_sample/furthest_point_sample.py)(#248)
- Fix bugs for 3DSSD triggered by empty ground truths (#258)
- Remove models without checkpoints in model zoo statistics of documentation (#259)
- Fix some unclear installation instructions in [getting_started.md](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/getting_started.md/)(#269)
- Fix relative paths/links in the documentation (#271)
- Fix a minor bug in [scatter_points_cuda.cu](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/voxel/src/scatter_points_cuda.cu) when num_features != 4 (#275)
- Fix the bug about missing text files when testing on KITTI (#278)
- Fix issues caused by inplace modification of tensors in `BaseInstance3DBoxes` (#283)
- Fix log analysis for evaluation and adjust the documentation accordingly (#285)
### v0.9.0 (31/12/2020)
#### Highlights
- Documentation refactoring with better structure, especially about how to implement new models and customized datasets.
- More compatible with refactored point structure by bug fixes in ground truth sampling.
#### Improvements
- Documentation refactoring (#242)
#### Bug Fixes
- Fix point structure related bugs in ground truth sampling (#211)
- Fix loading points in ground truth sampling augmentation on nuScenes (#221)
- Fix channel setting in the SeparateHead of CenterPoint (#228)
- Fix evaluation for indoors 3D detection in case of less classes in prediction (#231)
- Remove unreachable lines in nuScenes data converter (#235)
- Minor adjustments of numpy implementation for perspective projection and prediction filtering criterion in KITTI evaluation (#241)
### v0.8.0 (30/11/2020)
#### Highlights
- Refactor points structure with more constructive and clearer implementation.
- Support axis-aligned IoU loss for VoteNet with better performance.
- Update and enhance [SECOND](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/second) benchmark on Waymo.
#### New Features
- Support axis-aligned IoU loss for VoteNet. (#194)
- Support points structure for consistent processing of all the point related representation. (#196, #204)
#### Improvements
- Enhance [SECOND](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/second) benchmark on Waymo with stronger baselines. (#205)
- Add model zoo statistics and polish the documentation. (#201)
### v0.7.0 (1/11/2020)
#### Highlights
- Support a new method [SSN](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123700579.pdf) with benchmarks on nuScenes and Lyft datasets.
- Update benchmarks for SECOND on Waymo, CenterPoint with TTA on nuScenes and models with mixed precision training on KITTI and nuScenes.
- Support semantic segmentation on nuImages and provide [HTC](https://arxiv.org/abs/1901.07518) models with configurations and performance for reference.
#### New Features
- Modify the primitive head to support the setting on the SUN RGB-D dataset (#136)
- Support semantic segmentation and [HTC](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/nuimages) with models for reference on nuImages dataset (#155)
- Support [SSN](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/ssn) on nuScenes and Lyft datasets (#147, #174, #166, #182)
- Support double flip for test time augmentation of CenterPoint with updated benchmark (#143)
#### Improvements
- Update [SECOND](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/second) benchmark with configurations for reference on Waymo (#166)
- Delete checkpoints on Waymo to comply with its specific license agreement (#180)
- Update models and instructions with [mixed precision training](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/fp16) on KITTI and nuScenes (#178)
#### Bug Fixes
- Fix incorrect code weights in anchor3d_head when introducing mixed precision training (#173)
- Fix the incorrect label mapping on nuImages dataset (#155)
### v0.6.1 (11/10/2020)
#### Highlights
- Support mixed precision training of voxel-based methods
- Support docker with PyTorch 1.6.0
- Update baseline configs and results ([CenterPoint](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/centerpoint) on nuScenes and [PointPillars](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/pointpillars) on Waymo with full dataset)
- Switch model zoo to download.openmmlab.com
#### New Features
- Support dataset pipeline `VoxelBasedPointSampler` to sample multi-sweep points based on voxelization. (#125)
- Support mixed precision training of voxel-based methods (#132)
- Support docker with PyTorch 1.6.0 (#160)
#### Improvements
- Reduce requirements for the case exclusive of Waymo (#121)
- Switch model zoo to download.openmmlab.com (#126)
- Update docs related to Waymo (#128)
- Add version assertion in the [init file](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/__init__.py)(#129)
- Add evaluation interval setting for CenterPoint (#131)
- Add unit test for CenterPoint (#133)
- Update [PointPillars](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/pointpillars) baselines on Waymo with full dataset (#142)
- Update [CenterPoint](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/centerpoint) results with models and logs (#154)
#### Bug Fixes
- Fix a bug of visualization in multi-batch case (#120)
- Fix bugs in dcn unit test (#130)
- Fix dcn bias bug in centerpoint (#137)
- Fix dataset mapping in the evaluation of nuScenes mini dataset (#140)
- Fix origin initialization in `CameraInstance3DBoxes` (#148, #150)
- Correct documentation link in the getting_started.md (#159)
- Fix model save path bug in gather_models.py (#153)
- Fix image padding shape bug in `PointFusion` (#162)
### v0.6.0 (20/9/2020)
#### Highlights
- Support new methods [H3DNet](https://arxiv.org/abs/2006.05682), [3DSSD](https://arxiv.org/abs/2002.10187), [CenterPoint](https://arxiv.org/abs/2006.11275).
- Support new datasets [Waymo](https://waymo.com/open/) (with PointPillars baselines) and [nuImages](https://www.nuscenes.org/nuimages) (with Mask R-CNN and Cascade Mask R-CNN baselines).
- Support Batch Inference
- Support PyTorch 1.6
- Start to publish the `mmdet3d` package to PyPI since v0.5.0; you can install mmdet3d via `pip install mmdet3d`.
#### Backwards Incompatible Changes
- Support Batch Inference (#95, #103, #116): MMDetection3D v0.6.0 migrates to support batch inference based on MMDetection >= v2.4.0. This change influences all the test APIs in MMDetection3D and downstream codebases.
- Start to use the collect environment function from MMCV (#113): MMDetection3D v0.6.0 migrates to use the `collect_env` function in MMCV.
  `get_compiler_version` and `get_compiling_cuda_version`, previously compiled in `mmdet3d.ops.utils`, have been removed. Please import these two functions from `mmcv.ops`.
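For example, code that previously imported these utilities from `mmdet3d.ops` should now read:

```python
from mmcv.ops import get_compiler_version, get_compiling_cuda_version

print(get_compiler_version())
print(get_compiling_cuda_version())
```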
#### New Features
- Support the [nuImages](https://www.nuscenes.org/nuimages) dataset by converting it into COCO format, and release Mask R-CNN and Cascade Mask R-CNN baseline models (#91, #94)
- Support publishing to PyPI via GitHub Actions (#17, #19, #25, #39, #40)
- Support CBGSDataset and make it generally applicable to all the supported datasets (#75, #94)
- Support [H3DNet](https://arxiv.org/abs/2006.05682) and release models on ScanNet dataset (#53, #58, #105)
- Support Fusion Point Sampling used in [3DSSD](https://arxiv.org/abs/2002.10187) (#66)
- Add `BackgroundPointsFilter` to filter background points in data pipeline (#84)
- Support pointnet2 with multi-scale grouping in backbone and refactor pointnets (#82)
- Support dilated ball query used in [3DSSD](https://arxiv.org/abs/2002.10187) (#96)
- Support [3DSSD](https://arxiv.org/abs/2002.10187) and release models on KITTI dataset (#83, #100, #104)
- Support [CenterPoint](https://arxiv.org/abs/2006.11275) and release models on nuScenes dataset (#49, #92)
- Support [Waymo](https://waymo.com/open/) dataset and release PointPillars baseline models (#118)
- Allow `LoadPointsFromMultiSweeps` to pad empty sweeps and select multiple sweeps randomly (#67)
#### Improvements
- Fix all warnings and bugs in PyTorch 1.6.0 (#70, #72)
- Update issue templates (#43)
- Update unit tests (#20, #24, #30)
- Update documentation for using `ply` format point cloud data (#41)
- Use points loader to load point cloud data in ground truth (GT) samplers (#87)
- Unify version file of OpenMMLab projects by using `version.py` (#112)
- Remove unnecessary data preprocessing commands of SUN RGB-D dataset (#110)
#### Bug Fixes
- Rename CosineAnealing to CosineAnnealing (#57)
- Fix device inconsistent bug in 3D IoU computation (#69)
- Fix a minor bug in json2csv of the Lyft dataset (#78)
- Add missed test data for pointnet modules (#85)
- Fix `use_valid_flag` bug in `CustomDataset` (#106)
In this version, we carried out a large refactoring based on MMEngine to achieve unified data elements, model interfaces, visualizers, evaluators and other runtime modules across different datasets, tasks and even codebases. A brief summary of this refactoring is as follows:
- Data Element:
- We add [`Det3DDataSample`](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/structures/det3d_data_sample.py) as the common data element passed through datasets and models. It inherits from [`DetDataSample`](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/structures/det_data_sample.py) in MMDetection and uses `InstanceData`, `PixelData` and `LabelData`, which inherit from `BaseDataElement` in MMEngine, to represent different types of ground truth labels or predictions.
- Datasets:
- We add [`Det3DDataset`](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/datasets/det3d_dataset.py) and [`Seg3DDataset`](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/datasets/seg3d_dataset.py) as the base datasets, which inherit from the unified `BaseDataset` in MMEngine. They implement most functions that are commonly used across different datasets and simplify info loading/processing in the current datasets. The re-defined input arguments and functions can mostly be re-used across different datasets, which is important for implementing customized datasets.
- We define the common keys across different datasets and unify all the info files with a standard protocol. This is clearer for users because the same information shares the same key across different dataset infos. Besides, for different settings, such as camera-only and LiDAR-only methods, we no longer need different info formats (like the previous pkl and json files); we can simply revise `parse_data_info` to read the necessary information from a complete info file.
- We add `train_dataloader`, `val_dataloader` and `test_dataloader` to replace the original `data` field in the config. This simplifies the nesting levels of data-related fields.
- Data Transforms
- Based on the basic transforms and wrappers re-implemented and simplified in the latest MMCV, we refactor data transforms to inherit from them.
- We also adjust the implementation of current data pipelines to make them compatible with our latest data protocol.
- Normalization, padding of images and voxelization operations are moved to the data preprocessing module.
- `DefaultFormatBundle3D` and `Collect3D` are replaced with `PackDet3DInputs`, which packs the data into the element format used as model input.
- Models
- Unify the model interface as `inputs`, `data_samples`, `return_loss=False`
- The basic pre-processing before the model forward pass includes: 1) converting inputs from CPU to GPU tensors; 2) padding images; 3) normalizing images; 4) voxelization
- Return a `loss_dict` during training and a `list[data_sample]` during inference
- Simplify function interfaces in the models
- Add `preprocess_cfg` in the model configs for pre-processing
- Visualizer
- Design a unified visualizer, [`Det3DLocalVisualizer`](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/visualization/local_visualizer.py), based on MMEngine for different 3D tasks and settings
- Support browsing dataset and visualization hooks based on the [`Det3DLocalVisualizer`](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/visualization/local_visualizer.py)
- Evaluator
- Decouple evaluators from datasets to make them more flexible: the evaluation code of each dataset is now implemented in a dedicated metric class
- Add evaluator information to the current dataset configs
- Registry
- Refactor all the registries to inherit from root registries in MMEngine
- When using modules from other codebases, it is necessary to specify the registry scope, such as `mmdet.ResNet` (see the config sketch after this list)
- Others: Refactor logging, hooks, scheduler, runner and other runtime configs based on MMEngine
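As an example of the scoped registry syntax mentioned above, a config fragment that borrows a backbone from MMDetection might look like this (a hypothetical fragment; the surrounding model config is omitted):

```python
# The `mmdet.` prefix tells the registry to resolve ResNet in MMDetection's scope.
img_backbone = dict(
    type='mmdet.ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
)
```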
## v1.0.0rc1
### Operators Migration
We have adopted CUDA operators compiled from [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/__init__.py) and removed all the CUDA operators in mmdet3d. We now do not need to compile the CUDA operators in mmdet3d anymore.
### Waymo dataset converter refactoring
In this version we did a major code refactoring that boosted the performance of Waymo dataset conversion via multiprocessing.
Meanwhile, we also fixed an issue where imprecise timestamps were saved during Waymo dataset conversion. This change introduces the following backward compatibility breaks:
- The point cloud `.bin` files of the Waymo dataset need to be regenerated.
  In the `.bin` files each point occupies six `float32` values, and the meaning of the last `float32` has changed from **imprecise timestamp** to **range frame offset**.
  The **range frame offset** for each point is calculated as `ri * h * w + row * w + col` if the point comes from the **TOP** lidar, and is `-1` otherwise (see the sketch after this list).
  Here `h` and `w` denote the height and width of the TOP lidar's range frame, while `ri`, `row` and `col` denote the return index and the row and column of the range frame cell where the point is located.
  The following table shows the point format before and after the change:

  | Field  |  0  |  1  |  2  |     3     |     4      |            5             |
  | :----- | :-: | :-: | :-: | :-------: | :--------: | :----------------------: |
  | Before |  x  |  y  |  z  | intensity | elongation | **imprecise timestamp**  |
  | After  |  x  |  y  |  z  | intensity | elongation | **range frame offset**   |
- The point cloud `.bin` files in the GT database of the Waymo dataset need to be regenerated as well, because the range frame offset is now also dumped for each object point.
  The following table shows the format before and after the change:

  | Field  |  0  |  1  |  2  |     3     |     4      |           5            |
  | :----- | :-: | :-: | :-: | :-------: | :--------: | :--------------------: |
  | Before |  x  |  y  |  z  | intensity | elongation |           –            |
  | After  |  x  |  y  |  z  | intensity | elongation | **range frame offset** |
- Any configuration that uses the Waymo dataset with GT augmentation should change `db_sampler.points_loader.load_dim` from `5` to `6`.
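For illustration, here is a minimal Python sketch of how the **range frame offset** described above encodes and decodes a range-image position. It is not the converter's actual code, and the range frame size used below is only an example value:

```python
def range_frame_offset(ri, row, col, h, w, from_top_lidar=True):
    """Encode a range-image cell as ri * h * w + row * w + col; -1 for non-TOP points."""
    if not from_top_lidar:
        return -1.0
    return float(ri * h * w + row * w + col)

# Decoding a stored offset back into (ri, row, col).
h, w = 64, 2650  # example range frame size, not taken from this document
offset = int(range_frame_offset(ri=1, row=3, col=7, h=h, w=w))
ri, rem = divmod(offset, h * w)
row, col = divmod(rem, w)
assert (ri, row, col) == (1, 3, 7)
```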
## v1.0.0rc0
### Coordinate system refactoring
In this version, we did a major code refactoring which improved the consistency among the three coordinate systems (and the corresponding box representations): LiDAR, Camera, and Depth. A brief summary of this refactoring is as follows:
- The three coordinate systems are all right-handed now (which means the yaw angle increases in the counterclockwise direction).
- The LiDAR system `(x_size, y_size, z_size)` corresponds to `(l, w, h)` instead of `(w, l, h)`. This is more natural since `l` is parallel with the direction where the yaw angle is zero, and we prefer using the positive direction of the `x` axis as that direction, which is exactly how we define yaw angle in Depth and Camera coordinate systems.
- The APIs for box-related operations are improved and now are more user-friendly.
#### ***NOTICE!!***
Since definitions of box representation have changed, the annotation data of most datasets require updating:
- SUN RGB-D: Yaw angles in the annotation should be reversed.
- KITTI: For LiDAR boxes in GT databases, (x_size, y_size, z_size, yaw) out of (x, y, z, x_size, y_size, z_size, yaw) should be converted from the old LiDAR coordinate system to the new one. The training/validation data annotations should be left unchanged since they are under the Camera coordinate system, which is unmodified by the refactoring.
- Waymo: Same as KITTI.
- nuScenes: For LiDAR boxes in training/validation data and GT databases, (x_size, y_size, z_size, yaw) out of (x, y, z, x_size, y_size, z_size, yaw) should be converted.
- Lyft: Same as nuScenes.
Please regenerate the data annotation/GT database files or use [`update_data_coords.py`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/tools/update_data_coords.py) to update the data.
To use boxes under Depth and LiDAR coordinate systems, or to convert boxes between different coordinate systems, users should be aware of the difference between the old and new definitions. For example, the rotation, flipping, and bev functions of [`DepthInstance3DBoxes`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/depth_box3d.py) and [`LiDARInstance3DBoxes`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/lidar_box3d.py) and box conversion [functions](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/box_3d_mode.py) have all been reimplemented in the refactoring.
Consequently, functions like [`output_to_lyft_box`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/datasets/lyft_dataset.py) underwent small modifications to adapt to the new LiDAR/Depth boxes.
Since the LiDAR system `(x_size, y_size, z_size)` now corresponds to `(l, w, h)` instead of `(w, l, h)`, the anchor sizes for LiDAR boxes are also changed, e.g., from `[1.6, 3.9, 1.56]` to `[3.9, 1.6, 1.56]`.
Functions that only involve points are generally unaffected, unless they rely on refactored utility functions such as `rotation_3d_in_axis`.
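As a concrete illustration of the size-order change, the anchor values quoted above read as follows under the two conventions (a sketch with our own variable names, not config keys):

```python
# Old LiDAR convention: (x_size, y_size, z_size) = (w, l, h)
anchor_size_old = [1.6, 3.9, 1.56]
# New LiDAR convention: (x_size, y_size, z_size) = (l, w, h); length comes first
anchor_size_new = [3.9, 1.6, 1.56]
```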
#### Other BC-breaking or new features:
- `array_converter`: Please refer to [array_converter.py](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/utils/array_converter.py). Functions wrapped with `array_converter` can convert array-like input types of `torch.Tensor`, `np.ndarray`, and `list/tuple/float` to `torch.Tensor` for processing in a unified PyTorch pipeline. The result may finally be converted back to the input type. Most functions in [utils.py](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/utils.py) are wrapped with `array_converter` (see the usage sketch after this list).
- [`points_in_boxes`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/base_box3d.py) and [`points_in_boxes_batch`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/base_box3d.py) will be deprecated soon. They are renamed to `points_in_boxes_part` and `points_in_boxes_all` respectively, with more detailed docstrings. The major difference between the two functions is that if a point is enclosed by multiple boxes, `points_in_boxes_part` only returns the index of the first enclosing box, while `points_in_boxes_all` returns the indices of all enclosing boxes.
- `rotation_3d_in_axis`: Please refer to [utils.py](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/utils.py). This function now supports multiple input types and more options. The function with the same name in [box_np_ops.py](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/box_np_ops.py) is deleted, since we no longer need a separate function to handle NumPy data. `rotation_2d`, `points_cam2img`, and `limit_period` in box_np_ops.py are also deleted for the same reason.
- `bev` method of [`CameraInstance3DBoxes`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/core/bbox/structures/cam_box3d.py): Changed to be consistent with the definition of BEV in the Depth and LiDAR coordinate systems.
- Data augmentation utils in [data_augment_utils.py](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/datasets/pipelines/data_augment_utils.py) now follow the rules of a right-handed system.
- We no longer need the yaw hacking in KITTI after refining [`get_direction_target`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/models/dense_heads/train_mixins.py). Interested users may refer to PR [#677](https://github.com/open-mmlab/mmdetection3d/pull/677).
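To make the `array_converter` behavior above concrete, here is a hedged usage sketch; the `apply_to` decorator argument and the wrapped function below are our assumptions for illustration rather than code from the repository:

```python
import numpy as np

from mmdet3d.core.utils import array_converter

@array_converter(apply_to=('val',))
def wrap_angle(val, period=2 * np.pi):
    # Inside the function, `val` arrives as a torch.Tensor, no matter whether
    # the caller passed a list, an np.ndarray, or a tensor.
    return val % period

# The result may be converted back to match the input type (per the description above).
print(wrap_angle(np.array([0.5, 7.0])))  # np.ndarray in, np.ndarray out
```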
## 0.16.0
### Returned values of `QueryAndGroup` operation
We modified the returned `grouped_xyz` value of the `QueryAndGroup` operation to support the PAConv segmentor. Originally, `grouped_xyz` was centered by subtracting the grouping centers, i.e., it represented the relative positions of the grouped points. Now we no longer perform this subtraction, and the returned `grouped_xyz` stands for the absolute coordinates of these points.
Note that the other returned variables of `QueryAndGroup`, such as `new_features`, `unique_cnt` and `grouped_idx`, are not affected.
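Conceptually, the old and new return values relate as follows (a NumPy sketch of the relationship, not the CUDA operator itself):

```python
import numpy as np

points = np.random.rand(16, 3)      # points grouped around one query center
center = np.random.rand(3)          # the grouping (query) center

grouped_xyz_old = points - center   # before: relative positions
grouped_xyz_new = points            # now: absolute coordinates
assert np.allclose(grouped_xyz_old, grouped_xyz_new - center)
```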
### NuScenes coco-style data pre-processing
We remove the rotation and dimension hack in the monocular 3D detection on nuScenes. Specifically, we transform the rotation and dimension of boxes defined by nuScenes devkit to the coordinate system of our `CameraInstance3DBoxes` in the pre-processing and transform them back in the post-processing. In this way, we can remove the corresponding [hack](https://github.com/open-mmlab/mmdetection3d/pull/744/files#diff-5bee5062bd84e6fa25a2fdd71353f6f283dfdc4a66a0316c3b1ca26078c978b6L165) used in the visualization tools. The modification also guarantees the correctness of all the operations based on our `CameraInstance3DBoxes` (such as NMS and flip augmentation) when training monocular 3D detectors.
The modification only influences nuScenes coco-style json files. Please re-run the nuScenes data preparation script if necessary. See more details in the PR [#744](https://github.com/open-mmlab/mmdetection3d/pull/744).
### ScanNet dataset for ImVoxelNet
We adopt a new pre-processing procedure for the ScanNet dataset in order to support ImVoxelNet, a multi-view method requiring image data. In previous versions of MMDetection3D, the ScanNet dataset was only used for point cloud based 3D detection and segmentation methods. We plan to add ImVoxelNet to our model zoo, and thus update ScanNet correspondingly by adding image-related pre-processing steps. Specifically, we made these changes:
- Add a [script](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/extract_posed_images.py) for extracting RGB data.
- Update the [script](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/dataset_converters/scannet_data_utils.py) for annotation creation.
- Add instructions in the documents on preparing image data.
Please refer to the ScanNet [README.md](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md/) for more details.
## 0.15.0
### MMCV Version
In order to fix the problem that the priority of EvalHook was too low, all hook priorities were re-adjusted in MMCV 1.3.8, so MMDetection 2.14.0 needs to rely on MMCV 1.3.8 or later. For related information, please refer to [#1120](https://github.com/open-mmlab/mmcv/pull/1120); for related issues, please refer to [#5343](https://github.com/open-mmlab/mmdetection/issues/5343).
### Unified parameter initialization
To unify parameter initialization in OpenMMLab projects, MMCV supports `BaseModule`, which accepts `init_cfg` to allow the modules' parameters to be initialized in a flexible and unified manner. Now users need to explicitly call `model.init_weights()` in the training script to initialize the model (as done [here](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/train.py#L183)); previously this was handled by the detector. Please refer to PR [#622](https://github.com/open-mmlab/mmdetection3d/pull/622) for details.
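In practice the change looks like the sketch below; the config path and builder call are illustrative (assuming the standard `Config`/`build_model` helpers), and the explicit `init_weights()` call is the essential part:

```python
from mmcv import Config
from mmdet3d.models import build_model

cfg = Config.fromfile('configs/votenet/votenet_8x8_scannet-3d-18class.py')
model = build_model(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
model.init_weights()  # now required explicitly; formerly done inside the detector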
### BackgroundPointsFilter
We modified the dataset augmentation function `BackgroundPointsFilter` ([here](https://github.com/open-mmlab/mmdetection3d/blob/v0.15.0/mmdet3d/datasets/pipelines/transforms_3d.py#L1132)). In previous versions of MMDetection3D, `BackgroundPointsFilter` changed the bottom center of `gt_bboxes_3d` to the gravity center. In MMDetection3D 0.15.0, `BackgroundPointsFilter` no longer changes it. Please refer to PR [#609](https://github.com/open-mmlab/mmdetection3d/pull/609) for details.
### Enhance `IndoorPatchPointSample` transform
We enhance the pipeline function `IndoorPatchPointSample` used in the point cloud segmentation task by adding more choices for patch selection. We also plan to remove the unused parameter `sample_rate` in the future. Please modify your code as well as the config files accordingly if you use this transform.
## 0.14.0
### Dataset class for 3D segmentation task
We remove the useless parameter `label_weight` from segmentation datasets, including `Custom3DSegDataset`, `ScanNetSegDataset` and `S3DISSegDataset`, since this weight is utilized in the loss function of the model class instead. Please modify your code as well as the config files accordingly if you use or inherit from these classes.
### ScanNet data pre-processing
We adopt new pre-processing and conversion steps for the ScanNet dataset. In previous versions of MMDetection3D, the ScanNet dataset was only used for the 3D detection task, where we trained on the training set and tested on the validation set. In MMDetection3D 0.14.0, we further support the 3D segmentation task on ScanNet, which includes online benchmarking on the test set. Since the alignment matrix is not provided for the test set data, we abandon the alignment of points in the data generation steps to support both tasks. Besides, as 3D segmentation requires per-point predictions, we also remove the down-sampling step from data generation.
- In the new ScanNet processing scripts, we save the unaligned points for the training, validation and test sets. For the train and val sets, which have annotations, we also store the `axis_align_matrix` in the data infos. For ground-truth bounding boxes, we store boxes in both aligned and unaligned coordinates, with the keys `gt_boxes_upright_depth` and `unaligned_gt_boxes_upright_depth` respectively, in the data infos.
- In `ScanNetDataset`, we now load the `axis_align_matrix` as part of the data annotations. If it is not contained in old data infos, an identity matrix will be used for compatibility. We also add a transform function `GlobalAlignment` to the ScanNet detection data pipeline to align the points (see the sketch after this list).
- Since the aligned boxes share the same key as in the old data infos, the code related to them does not need to be modified. But do remember that they are not in the same coordinate system as the saved points.
- There is a `PointSample` transform in the data pipelines for the ScanNet detection task which down-samples points, so removing down-sampling from data generation does not affect the code.
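The alignment that `GlobalAlignment` performs can be sketched in NumPy as follows (illustrative only; the actual transform lives in the detection data pipeline):

```python
import numpy as np

axis_align_matrix = np.eye(4)  # identity fallback when old infos lack it
points = np.random.rand(1000, 3)

# Apply the 4x4 alignment matrix to homogeneous point coordinates.
pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
aligned_points = (pts_h @ axis_align_matrix.T)[:, :3]
```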
We have trained a [VoteNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/votenet/votenet_8x8_scannet-3d-18class.py) model on the newly processed ScanNet dataset and obtained similar benchmark results. In order to prepare ScanNet data for both detection and segmentation tasks, please re-run the new pre-processing scripts following the ScanNet [README.md](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md/).
## 0.12.0
### SUNRGBD dataset for ImVoteNet
We adopt a new pre-processing procedure for the SUNRGBD dataset in order to support ImVoteNet, which is a multi-modality method requiring both image and point cloud data. In previous versions of MMDetection3D, SUNRGBD dataset was only used for point cloud based 3D detection methods. In MMDetection3D 0.12.0, we add ImVoteNet to our model zoo, thus updating SUNRGBD correspondingly by adding image-related pre-processing steps. Specifically, we made these changes:
- Fix a bug in the image file path in the metadata.
- Convert calibration matrices from double to float to avoid type mismatch in further operations.
- Add instructions in the documents on preparing image data.
Please refer to the SUNRGBD [README.md](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md/) for more details.
## 0.6.0
### VoteNet and H3DNet model structure update
In MMDetection3D 0.6.0, we updated the model structures of VoteNet and H3DNet; therefore, model checkpoints generated by MMDetection3D \< 0.6.0 should first be converted to a format compatible with the latest structures via [convert_votenet_checkpoints.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/model_converters/convert_votenet_checkpoints.py) and [convert_h3dnet_checkpoints.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/model_converters/convert_h3dnet_checkpoints.py). For more details, please refer to the VoteNet [README.md](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/votenet/README.md/) and H3DNet [README.md](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/h3dnet/README.md/).
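For reference, a conversion call typically looks like the following sketch; please check the linked READMEs for the exact arguments, which may differ:

```bash
python tools/model_converters/convert_votenet_checkpoints.py \
    ${ORIGINAL_CHECKPOINT_PATH} --out=${NEW_CHECKPOINT_PATH}
```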
OpenMMLab welcomes everyone who is interested in contributing to our projects and accepts contributions in the form of PRs.
## What is PR
`PR` is the abbreviation of `Pull Request`. Here is the definition of a `PR` in the [official documentation](https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) of GitHub.
```
Pull requests let you tell others about changes you have pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the base branch.
```
## Basic Workflow
1. Get the most recent codebase
2. Checkout a new branch from the `dev-1.x` or `dev` branch, depending on the version of the codebase you want to contribute to. The main difference between `dev-1.x` and `dev` is that `dev-1.x` additionally depends on MMEngine and is the main branch we maintain. We strongly recommend creating your pull request based on the more advanced `dev-1.x` branch.
3. Commit your changes ([Don't forget to use pre-commit hooks!](#3-commit-your-changes))
4. Push your changes and create a PR
5. Discuss and review your code
6. Merge your branch to `dev-1.x` / `dev` branch
## Procedures in detail
### 1. Get the most recent codebase
- When you work on your first PR
Fork the OpenMMLab repository: click the **Fork** button at the top right corner of the GitHub page
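If this is your first PR, you also need to clone your fork and register the source repository as `upstream` before the pull below will work (the user name in the URL is a placeholder):
```bash
git clone https://github.com/<your-username>/mmdetection3d.git
cd mmdetection3d
git remote add upstream https://github.com/open-mmlab/mmdetection3d.git
```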
Check out the latest branch of the local repository and pull the latest branch of the source repository. Here we assume that you are working on the `dev-1.x` branch.
```bash
git checkout dev-1.x
git pull upstream dev-1.x
```
### 2. Checkout a new branch from the `dev-1.x` / `dev` branch
```bash
git checkout -b branchname
```
```{tip}
To keep the commit history clear, we strongly recommend checking out the `dev-1.x` branch before creating a new branch.
```
### 3. Commit your changes
- If you are a first-time contributor, please install and initialize pre-commit hooks from the repository root directory first.
```bash
pip install -U pre-commit
pre-commit install
```
- Commit your changes as usual. Pre-commit hooks will be triggered to stylize your code before each commit.
```bash
# coding
git add [files]
git commit -m 'messages'
```
```{note}
Sometimes your code may be changed by pre-commit hooks. In this case, please remember to re-stage the modified files and commit again.
```
### 4. Push your changes to the forked repository and create a PR
- Push the branch to your forked remote repository
- Revise the PR message template to describe your motivation and the modifications made in this PR. You can also link the related issue to the PR manually in the PR message (for more information, check out the [official guidance](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)).
- Specifically, if you are contributing to `dev-1.x`, you will have to change the base branch of the PR to `dev-1.x` on the PR page, since the default base branch is `master`.