raw_mmdetection

7aa442d5 · raojy · 9c03eaa8 · 7aa442d5 · 7aa442d5 · 7aa442d5
Commit 7aa442d5 authored Apr 01, 2026 by raojy
20 changed files
--- a/mmdetection3d/docs/en/notes/faq.md
+++ b/mmdetection3d/docs/en/notes/faq.md
+# FAQ
+We list some potential troubles encountered by users and developers, along with their corresponding solutions. Feel free to enrich the list if you find any frequent issues and contribute your solutions to solve them. If you have any trouble with environment configuration, model training, etc, please create an issue using the [provided templates](https://github.com/open-mmlab/mmdetection3d/blob/master/.github/ISSUE_TEMPLATE/error-report.md) and fill in all required information in the template.
+## MMEngine/MMCV/MMDet/MMDet3D Installation
+- Compatibility issue between MMEngine, MMCV, MMDetection and MMDetection3D; "ConvWS is already registered in conv layer"; "AssertionError: MMCV==xxx is used but incompatible. Please install mmcv>=xxx, \<=xxx."
+- The required versions of MMEngine, MMCV and MMDetection for different versions of MMDetection3D are as below. Please install the correct version of MMEngine, MMCV and MMDetection to avoid installation issues.
+  | MMDetection3D version |     MMEngine version     |      MMCV version       |   MMDetection version    |
+  | --------------------- | :----------------------: | :---------------------: | :----------------------: |
+  | main                  | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.2.0 | mmdet>=3.0.0rc5, \<3.4.0 |
+  | v1.4.0                | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.2.0 | mmdet>=3.0.0rc5, \<3.4.0 |
+  | v1.3.0                | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.2.0 | mmdet>=3.0.0rc5, \<3.3.0 |
+  | v1.2.0                | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.1.0 |  mmdet>=3.0.0, \<3.2.0   |
+  | v1.1.1                | mmengine>=0.7.1, \<1.0.0 | mmcv>=2.0.0rc4, \<2.1.0 |  mmdet>=3.0.0, \<3.1.0   |
+  **Note:** If you want to install mmdet3d-v1.0.0rcx, the compatible MMDetection, MMSegmentation and MMCV versions table can be found at [here](https://mmdetection3d.readthedocs.io/en/latest/faq.html#mmcv-mmdet-mmdet3d-installation). Please choose the correct version of MMCV, MMDetection and MMSegmentation to avoid installation issues.
+- If you faced the error shown below when importing open3d:
+  `OSError: /lib/x86_64-linux-gnu/libm.so.6: version 'GLIBC_2.27' not found`
+  please downgrade open3d to 0.9.0.0, because the latest open3d needs the support of file 'GLIBC_2.27', which only exists in Ubuntu 18.04, not in Ubuntu 16.04.
+- If you faced the error when importing pycocotools, this is because nuscenes-devkit installs pycocotools but mmdet relies on mmpycocotools. The current workaround is as below. We will migrate to use pycocotools in the future.
+  ```shell
+  pip uninstall pycocotools mmpycocotools
+  pip install mmpycocotools
+  ```
+  **NOTE**: We have migrated to use pycocotools in mmdet3d >= 0.13.0.
+- If you face the error shown below when importing pycocotools:
+  `ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject`
+  please downgrade pycocotools to 2.0.1 because of the incompatibility between the newest pycocotools and numpy \< 1.20.0. Or you can compile and install the latest pycocotools from source as below:
+  `pip install -e "git+https://github.com/cocodataset/cocoapi#egg=pycocotools&subdirectory=PythonAPI"`
+  or
+  `pip install -e "git+https://github.com/ppwwyyxx/cocoapi#egg=pycocotools&subdirectory=PythonAPI"`
+- If you face some errors about numba in cuda-9.0 environment, you should check the version of numba. In cuda-9.0 environment, the high version of numba is not supported and we suggest you could install numba==0.53.0.
+## How to annotate point cloud?
+MMDetection3D does not support point cloud annotation. Some open-source annotation tool are offered for reference:
+- [SUSTechPOINTS](https://github.com/naurril/SUSTechPOINTS)
+- [LATTE](https://github.com/bernwang/latte)
+Besides, we improved [LATTE](https://github.com/bernwang/latte) for better use. More details can be found [here](https://arxiv.org/abs/2011.10174).
--- a/mmdetection3d/docs/en/notes/index.rst
+++ b/mmdetection3d/docs/en/notes/index.rst
+.. toctree::
+   :maxdepth: 1
+   benchmarks.md
+   changelog_v1.0.x.md
+   changelog.md
+   compatibility.md
+   faq.md
+   contribution_guides.md
--- a/mmdetection3d/docs/en/stat.py
+++ b/mmdetection3d/docs/en/stat.py
+#!/usr/bin/env python
+import functools as func
+import glob
+import re
+from os import path as osp
+import numpy as np
+url_prefix = 'https://github.com/open-mmlab/mmdetection3d/blob/main'
+files = sorted(glob.glob('../../configs/*/README.md'))
+stats = []
+titles = []
+num_ckpts = 0
+for f in files:
+    url = osp.dirname(f.replace('../../', url_prefix))
+    with open(f, 'r') as content_file:
+        content = content_file.read()
+    title = content.split('\n')[0].replace('# ', '').strip()
+    ckpts = set(x.lower().strip()
+                for x in re.findall(r'\[model\]\((https?.*)\)', content))
+    if len(ckpts) == 0:
+        continue
+    _papertype = [x for x in re.findall(r'\[([A-Z]+)\]', content)]
+    assert len(_papertype) > 0
+    papertype = _papertype[0]
+    paper = set([(papertype, title)])
+    titles.append(title)
+    num_ckpts += len(ckpts)
+    statsmsg = f"""
+\t* [{papertype}] [{title}]({url}) ({len(ckpts)} ckpts)
+"""
+    stats.append((paper, ckpts, statsmsg))
+allpapers = func.reduce(lambda a, b: a.union(b), [p for p, _, _ in stats])
+msglist = '\n'.join(x for _, _, x in stats)
+papertypes, papercounts = np.unique([t for t, _ in allpapers],
+                                    return_counts=True)
+countstr = '\n'.join(
+    [f'   - {t}: {c}' for t, c in zip(papertypes, papercounts)])
+modelzoo = f"""
+# Model Zoo Statistics
+* Number of papers: {len(set(titles))}
+{countstr}
+* Number of checkpoints: {num_ckpts}
+{msglist}
+"""
+with open('modelzoo_statistics.md', 'w') as f:
+    f.write(modelzoo)
--- a/mmdetection3d/docs/en/switch_language.md
+++ b/mmdetection3d/docs/en/switch_language.md
+## <a href='https://mmdetection3d.readthedocs.io/en/latest/'>English</a>
+## <a href='https://mmdetection3d.readthedocs.io/zh_CN/latest/'>简体中文</a>
--- a/mmdetection3d/docs/en/user_guides/backends_support.md
+++ b/mmdetection3d/docs/en/user_guides/backends_support.md
+# Backends Support
+We support different file client backends: Disk, Ceph and LMDB, etc. Here is an example of how to modify configs for Ceph-based data loading and saving.
+## Load data and annotations from Ceph
+We support loading data and generated annotation info files (pkl and json) from Ceph:
+```python
+# set file client backends as Ceph
+backend_args = dict(
+    backend='petrel',
+    path_mapping=dict({
+        './data/nuscenes/':
+        's3://openmmlab/datasets/detection3d/nuscenes/', # replace the path with your data path on Ceph
+        'data/nuscenes/':
+        's3://openmmlab/datasets/detection3d/nuscenes/' # replace the path with your data path on Ceph
+    }))
+db_sampler = dict(
+    data_root=data_root,
+    info_path=data_root + 'kitti_dbinfos_train.pkl',
+    rate=1.0,
+    prepare=dict(filter_by_difficulty=[-1], filter_by_min_points=dict(Car=5)),
+    sample_groups=dict(Car=15),
+    classes=class_names,
+    # set file client for points loader to load training data
+    points_loader=dict(
+        type='LoadPointsFromFile',
+        coord_type='LIDAR',
+        load_dim=4,
+        use_dim=4,
+        backend_args=backend_args),
+    # set file client for data base sampler to load db info file
+    backend_args=backend_args)
+train_pipeline = [
+    # set file client for loading training data
+    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4, backend_args=backend_args),
+    # set file client for loading training data annotations
+    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, backend_args=backend_args),
+    dict(type='ObjectSample', db_sampler=db_sampler),
+    dict(
+        type='ObjectNoise',
+        num_try=100,
+        translation_std=[0.25, 0.25, 0.25],
+        global_rot_range=[0.0, 0.0],
+        rot_range=[-0.15707963267, 0.15707963267]),
+    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
+    dict(
+        type='GlobalRotScaleTrans',
+        rot_range=[-0.78539816, 0.78539816],
+        scale_ratio_range=[0.95, 1.05]),
+    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='PointShuffle'),
+    dict(type='DefaultFormatBundle3D', class_names=class_names),
+    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
+]
+test_pipeline = [
+    # set file client for loading validation/testing data
+    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4, backend_args=backend_args),
+    dict(
+        type='MultiScaleFlipAug3D',
+        img_scale=(1333, 800),
+        pts_scale_ratio=1,
+        flip=False,
+        transforms=[
+            dict(
+                type='GlobalRotScaleTrans',
+                rot_range=[0, 0],
+                scale_ratio_range=[1., 1.],
+                translation_std=[0, 0, 0]),
+            dict(type='RandomFlip3D'),
+            dict(
+                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+            dict(
+                type='DefaultFormatBundle3D',
+                class_names=class_names,
+                with_label=False),
+            dict(type='Collect3D', keys=['points'])
+        ])
+]
+data = dict(
+    # set file client for loading training info files (.pkl)
+    train=dict(
+        type='RepeatDataset',
+        times=2,
+        dataset=dict(pipeline=train_pipeline, classes=class_names, backend_args=backend_args)),
+    # set file client for loading validation info files (.pkl)
+    val=dict(pipeline=test_pipeline, classes=class_names,backend_args=backend_args),
+    # set file client for loading testing info files (.pkl)
+    test=dict(pipeline=test_pipeline, classes=class_names, backend_args=backend_args))
+```
+## Load pretrained model from Ceph
+```python
+model = dict(
+    pts_backbone=dict(
+        _delete_=True,
+        type='NoStemRegNet',
+        arch='regnetx_1.6gf',
+        init_cfg=dict(
+            type='Pretrained', checkpoint='s3://openmmlab/checkpoints/mmdetection3d/regnetx_1.6gf'), # replace the path with your pretrained model path on Ceph
+        ...
+```
+## Load checkpoint from Ceph
+```python
+# replace the path with your checkpoint path on Ceph
+load_from = 's3://openmmlab/checkpoints/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20200620_230614-77663cd6.pth.pth'
+resume_from = None
+workflow = [('train', 1)]
+```
+## Save checkpoint into Ceph
+```python
+# checkpoint saving
+# replace the path with your checkpoint saving path on Ceph
+checkpoint_config = dict(interval=1, max_keep_ckpts=2, out_dir='s3://openmmlab/mmdetection3d')
+```
+## EvalHook saves the best checkpoint into Ceph
+```python
+# replace the path with your checkpoint saving path on Ceph
+evaluation = dict(interval=1, save_best='bbox', out_dir='s3://openmmlab/mmdetection3d')
+```
+## Save the training log into Ceph
+The training log will be backed up to the specified Ceph path after training.
+```python
+log_config = dict(
+    interval=50,
+    hooks=[
+        dict(type='TextLoggerHook', out_dir='s3://openmmlab/mmdetection3d'),
+    ])
+```
+You can also delete the local training log after backing up to the specified Ceph path by setting `keep_local = False`.
+```python
+log_config = dict(
+    interval=50,
+    hooks=[
+        dict(type='TextLoggerHook', out_dir='s3://openmmlab/mmdetection3d', keep_local=False),
+    ])
+```
--- a/mmdetection3d/docs/en/user_guides/config.md
+++ b/mmdetection3d/docs/en/user_guides/config.md
+# Learn about Configs
+MMDetection3D and other OpenMMLab repositories use [MMEngine's config system](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html). It has a modular and inheritance design, which is convenient to conduct various experiments.
+## Config file content
+MMDetection3D uses a modular design, all modules with different functions can be configured through the config. Taking PointPillars as an example, we will introduce each field in the config according to different function modules.
+### Model config
+In MMDetection3D's config, we use `model` to setup detection algorithm components. In addition to neural network components such as `voxel_encoder`, `backbone` etc, it also requires `data_preprocessor`, `train_cfg`, and `test_cfg`. `data_preprocessor` is responsible for processing a batch of data output by dataloader. `train_cfg` and `test_cfg` in the model config are training and testing hyperparameters of the components.
+```python
+model = dict(
+    type='VoxelNet',
+    data_preprocessor=dict(
+        type='Det3DDataPreprocessor',
+        voxel=True,
+        voxel_layer=dict(
+            max_num_points=32,
+            point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1],
+            voxel_size=[0.16, 0.16, 4],
+            max_voxels=(16000, 40000))),
+    voxel_encoder=dict(
+        type='PillarFeatureNet',
+        in_channels=4,
+        feat_channels=[64],
+        with_distance=False,
+        voxel_size=[0.16, 0.16, 4],
+        point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
+    middle_encoder=dict(
+        type='PointPillarsScatter', in_channels=64, output_shape=[496, 432]),
+    backbone=dict(
+        type='SECOND',
+        in_channels=64,
+        layer_nums=[3, 5, 5],
+        layer_strides=[2, 2, 2],
+        out_channels=[64, 128, 256]),
+    neck=dict(
+        type='SECONDFPN',
+        in_channels=[64, 128, 256],
+        upsample_strides=[1, 2, 4],
+        out_channels=[128, 128, 128]),
+    bbox_head=dict(
+        type='Anchor3DHead',
+        num_classes=3,
+        in_channels=384,
+        feat_channels=384,
+        use_direction_classifier=True,
+        assign_per_class=True,
+        anchor_generator=dict(
+            type='AlignedAnchor3DRangeGenerator',
+            ranges=[[0, -39.68, -0.6, 69.12, 39.68, -0.6],
+                    [0, -39.68, -0.6, 69.12, 39.68, -0.6],
+                    [0, -39.68, -1.78, 69.12, 39.68, -1.78]],
+            sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
+            rotations=[0, 1.57],
+            reshape_out=False),
+        diff_rad_by_sin=True,
+        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
+        loss_cls=dict(
+            type='mmdet.FocalLoss',
+            use_sigmoid=True,
+            gamma=2.0,
+            alpha=0.25,
+            loss_weight=1.0),
+        loss_bbox=dict(
+            type='mmdet.SmoothL1Loss',
+            beta=0.1111111111111111,
+            loss_weight=2.0),
+        loss_dir=dict(
+            type='mmdet.CrossEntropyLoss', use_sigmoid=False,
+            loss_weight=0.2)),
+    train_cfg=dict(
+        assigner=[
+            dict(
+                type='Max3DIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.5,
+                neg_iou_thr=0.35,
+                min_pos_iou=0.35,
+                ignore_iof_thr=-1),
+            dict(
+                type='Max3DIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.5,
+                neg_iou_thr=0.35,
+                min_pos_iou=0.35,
+                ignore_iof_thr=-1),
+            dict(
+                type='Max3DIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.6,
+                neg_iou_thr=0.45,
+                min_pos_iou=0.45,
+                ignore_iof_thr=-1)
+        ],
+        allowed_border=0,
+        pos_weight=-1,
+        debug=False),
+    test_cfg=dict(
+        use_rotate_nms=True,
+        nms_across_levels=False,
+        nms_thr=0.01,
+        score_thr=0.1,
+        min_bbox_size=0,
+        nms_pre=100,
+        max_num=50))
+```
+### Dataset and evaluator config
+[Dataloaders](https://pytorch.org/docs/stable/data.html?highlight=data%20loader#torch.utils.data.DataLoader) are required for the training, validation, and testing of the [runner](https://mmengine.readthedocs.io/en/latest/tutorials/runner.html). Dataset and data pipeline need to be set to build the dataloader. Due to the complexity of this part, we use intermediate variables to simplify the writing of dataloader configs.
+```python
+dataset_type = 'KittiDataset'
+data_root = 'data/kitti/'
+class_names = ['Pedestrian', 'Cyclist', 'Car']
+point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]
+input_modality = dict(use_lidar=True, use_camera=False)
+metainfo = dict(classes=class_names)
+db_sampler = dict(
+    data_root=data_root,
+    info_path=data_root + 'kitti_dbinfos_train.pkl',
+    rate=1.0,
+    prepare=dict(
+        filter_by_difficulty=[-1],
+        filter_by_min_points=dict(Car=5, Pedestrian=5, Cyclist=5)),
+    classes=class_names,
+    sample_groups=dict(Car=15, Pedestrian=15, Cyclist=15),
+    points_loader=dict(
+        type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4))
+train_pipeline = [
+    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
+    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
+    dict(type='ObjectSample', db_sampler=db_sampler, use_ground_plane=True),
+    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
+    dict(
+        type='GlobalRotScaleTrans',
+        rot_range=[-0.78539816, 0.78539816],
+        scale_ratio_range=[0.95, 1.05]),
+    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='PointShuffle'),
+    dict(
+        type='Pack3DDetInputs',
+        keys=['points', 'gt_labels_3d', 'gt_bboxes_3d'])
+]
+test_pipeline = [
+    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
+    dict(
+        type='MultiScaleFlipAug3D',
+        img_scale=(1333, 800),
+        pts_scale_ratio=1,
+        flip=False,
+        transforms=[
+            dict(
+                type='GlobalRotScaleTrans',
+                rot_range=[0, 0],
+                scale_ratio_range=[1., 1.],
+                translation_std=[0, 0, 0]),
+            dict(type='RandomFlip3D'),
+            dict(
+                type='PointsRangeFilter', point_cloud_range=point_cloud_range)
+        ]),
+    dict(type='Pack3DDetInputs', keys=['points'])
+]
+eval_pipeline = [
+    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
+    dict(type='Pack3DDetInputs', keys=['points'])
+]
+train_dataloader = dict(
+    batch_size=6,
+    num_workers=4,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=True),
+    dataset=dict(
+        type='RepeatDataset',
+        times=2,
+        dataset=dict(
+            type=dataset_type,
+            data_root=data_root,
+            ann_file='kitti_infos_train.pkl',
+            data_prefix=dict(pts='training/velodyne_reduced'),
+            pipeline=train_pipeline,
+            modality=input_modality,
+            test_mode=False,
+            metainfo=metainfo,
+            box_type_3d='LiDAR')))
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=1,
+    persistent_workers=True,
+    drop_last=False,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        data_prefix=dict(pts='training/velodyne_reduced'),
+        ann_file='kitti_infos_val.pkl',
+        pipeline=test_pipeline,
+        modality=input_modality,
+        test_mode=True,
+        metainfo=metainfo,
+        box_type_3d='LiDAR'))
+test_dataloader = dict(
+    batch_size=1,
+    num_workers=1,
+    persistent_workers=True,
+    drop_last=False,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        data_prefix=dict(pts='training/velodyne_reduced'),
+        ann_file='kitti_infos_val.pkl',
+        pipeline=test_pipeline,
+        modality=input_modality,
+        test_mode=True,
+        metainfo=metainfo,
+        box_type_3d='LiDAR'))
+```
+[Evaluators](https://mmengine.readthedocs.io/en/latest/tutorials/evaluation.html) are used to compute the metrics of the trained model on the validation and testing datasets. The config of evaluators consists of one or a list of metric configs:
+```python
+val_evaluator = dict(
+    type='KittiMetric',
+    ann_file=data_root + 'kitti_infos_val.pkl',
+    metric='bbox')
+test_evaluator = val_evaluator
+```
+Since the test dataset has no annotation files, the test_dataloader and test_evaluator config in MMDetection3D are generally equal to the val's. If you want to save the detection results on the test dataset, you can write the config like this:
+```python
+# inference on test dataset and
+# format the output results for submission.
+test_dataloader = dict(
+    batch_size=1,
+    num_workers=1,
+    persistent_workers=True,
+    drop_last=False,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        data_prefix=dict(pts='testing/velodyne_reduced'),
+        ann_file='kitti_infos_test.pkl',
+        load_eval_anns=False,
+        pipeline=test_pipeline,
+        modality=input_modality,
+        test_mode=True,
+        metainfo=metainfo,
+        box_type_3d='LiDAR'))
+test_evaluator = dict(
+    type='KittiMetric',
+    ann_file=data_root + 'kitti_infos_test.pkl',
+    metric='bbox',
+    format_only=True,
+    submission_prefix='results/kitti-3class/kitti_results')
+```
+### Training and testing config
+MMEngine's runner uses Loop to control the training, validation, and testing processes.
+Users can set the maximum training epochs and validation intervals with these fields:
+```python
+train_cfg = dict(
+    type='EpochBasedTrainLoop',
+    max_epochs=80,
+    val_interval=2)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+```
+### Optimization config
+`optim_wrapper` is the field to configure optimization-related settings. The optimizer wrapper not only provides the functions of the optimizer, but also supports functions such as gradient clipping, mixed precision training, etc. Find more in [optimizer wrapper tutorial](https://mmengine.readthedocs.io/en/latest/tutorials/optim_wrapper.html).
+```python
+optim_wrapper = dict(  # Optimizer wrapper config
+    type='OptimWrapper',  # Optimizer wrapper type, switch to AmpOptimWrapper to enable mixed precision training.
+    optimizer=dict(  # Optimizer config. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms
+        type='AdamW', lr=0.001, betas=(0.95, 0.99), weight_decay=0.01),
+    clip_grad=dict(max_norm=35, norm_type=2))  # Gradient clip option. Set None to disable gradient clip. Find usage in https://mmengine.readthedocs.io/en/latest/tutorials/optim_wrapper.html
+```
+`param_scheduler` is a field that configures methods of adjusting optimization hyperparameters such as learning rate and momentum. Users can combine multiple schedulers to create a desired parameter adjustment strategy. Find more in [parameter scheduler tutorial](https://mmengine.readthedocs.io/en/latest/tutorials/param_scheduler.html) and [parameter scheduler API documents](https://mmengine.readthedocs.io/en/latest/api/optim.html#scheduler).
+```python
+param_scheduler = [
+    dict(
+        type='CosineAnnealingLR',
+        T_max=32,
+        eta_min=0.01,
+        begin=0,
+        end=32,
+        by_epoch=True,
+        convert_to_iter_based=True),
+    dict(
+        type='CosineAnnealingLR',
+        T_max=48,
+        eta_min=1.0000000000000001e-07,
+        begin=32,
+        end=80,
+        by_epoch=True,
+        convert_to_iter_based=True),
+    dict(
+        type='CosineAnnealingMomentum',
+        T_max=32,
+        eta_min=0.8947368421052632,
+        begin=0,
+        end=32,
+        by_epoch=True,
+        convert_to_iter_based=True),
+    dict(
+        type='CosineAnnealingMomentum',
+        T_max=48,
+        eta_min=1,
+        begin=32,
+        end=80,
+        by_epoch=True,
+        convert_to_iter_based=True),
+]
+```
+### Hook config
+Users can attach Hooks to training, validation, and testing loops to insert some operations during running. There are two different hook fields, one is `default_hooks` and the other is `custom_hooks`.
+`default_hooks` is a dict of hook configs, and they are the hooks must be required at the runtime. They have default priority which should not be modified. If not set, runner will use the default values. To disable a default hook, users can set its config to `None`.
+```python
+default_hooks = dict(
+    timer=dict(type='IterTimerHook'),
+    logger=dict(type='LoggerHook', interval=50),
+    param_scheduler=dict(type='ParamSchedulerHook'),
+    checkpoint=dict(type='CheckpointHook', interval=-1),
+    sampler_seed=dict(type='DistSamplerSeedHook'),
+    visualization=dict(type='Det3DVisualizationHook'))
+```
+`custom_hooks` is a list of all other hook configs. Users can develop their own hooks and insert them in this field.
+```python
+custom_hooks = []
+```
+### Runtime config
+```python
+default_scope = 'mmdet3d'  # The default registry scope to find modules. Refer to https://mmengine.readthedocs.io/en/latest/advanced_tutorials/registry.html
+env_cfg = dict(
+    cudnn_benchmark=False,  # Whether to enable cudnn benchmark
+    mp_cfg=dict(  # Multi-processing config
+        mp_start_method='fork',  # Use fork to start multi-processing threads. 'fork' usually faster than 'spawn' but maybe unsafe. See discussion in https://github.com/pytorch/pytorch/issues/1355
+        opencv_num_threads=0),  # Disable opencv multi-threads to avoid system being overloaded
+    dist_cfg=dict(backend='nccl'))  # Distribution configs
+vis_backends = [dict(type='LocalVisBackend')]  # Visualization backends. Refer to https://mmengine.readthedocs.io/en/latest/advanced_tutorials/visualization.html
+visualizer = dict(
+    type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
+log_processor = dict(
+    type='LogProcessor',  # Log processor to process runtime logs
+    window_size=50,  # Smooth interval of log values
+    by_epoch=True)  # Whether to format logs with epoch type. Should be consistent with the train loop's type.
+log_level = 'INFO'  # The level of logging.
+load_from = None  # Load model checkpoint as a pre-trained model from a given path. This will not resume training.
+resume = False  # Whether to resume from the checkpoint defined in `load_from`. If `load_from` is None, it will resume the latest checkpoint in the `work_dir`.
+```
+## Config file inheritance
+There are 4 basic component types under `configs/_base_`, dataset, model, schedule, default_runtime.
+Many methods could be easily constructed with one of these models like SECOND, PointPillars, PartA2, VoteNet.
+The configs that are composed of components from `_base_` are called _primitive_.
+For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum of inheritance level is 3.
+For easy understanding, we recommend contributors to inherit from existing methods.
+For example, if some modification is made based on PointPillars, users may first inherit the basic PointPillars structure by specifying `_base_ = '../pointpillars/pointpillars_hv_fpn_sbn-all_8xb4-2x_nus-3d.py'`, then modify the necessary fields in the config files.
+If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder `xxx_rcnn` under `configs`.
+Please refer to [MMEngine config tutorial](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html) for detailed documentation.
+By setting the `_base_` field, we can set which files the current configuration file inherits from.
+When `_base_` is a string of a file path, it means inheriting the contents from one config file.
+```python
+_base_ = './pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py'
+```
+When `_base_` is a list of multiple file paths, it means inheriting from multiple files.
+```python
+_base_ = [
+    '../_base_/models/pointpillars_hv_secfpn_kitti.py',
+    '../_base_/datasets/kitti-3d-3class.py',
+    '../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
+]
+```
+If you wish to inspect the config file, you may run `python tools/misc/print_config.py /PATH/TO/CONFIG` to see the complete config.
+### Ignore some fields in the base configs
+Sometimes, you may set `_delete_=True` to ignore some of the fields in base configs.
+You may refer to [MMEngine config tutorial](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html) for a simple illustration.
+In MMDetection3D, for example, to change the neck of PointPillars with the following config:
+```python
+model = dict(
+    type='MVXFasterRCNN',
+    data_preprocessor=dict(voxel_layer=dict(...)),
+    pts_voxel_encoder=dict(...),
+    pts_middle_encoder=dict(...),
+    pts_backbone=dict(...),
+    pts_neck=dict(
+        type='FPN',
+        norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
+        act_cfg=dict(type='ReLU'),
+        in_channels=[64, 128, 256],
+        out_channels=256,
+        start_level=0,
+        num_outs=3),
+    pts_bbox_head=dict(...))
+```
+`FPN` and `SECONDFPN` use different keywords to construct:
+```python
+_base_ = '../_base_/models/pointpillars_hv_fpn_nus.py'
+model = dict(
+    pts_neck=dict(
+        _delete_=True,
+        type='SECONDFPN',
+        norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
+        in_channels=[64, 128, 256],
+        upsample_strides=[1, 2, 4],
+        out_channels=[128, 128, 128]),
+    pts_bbox_head=dict(...))
+```
+The `_delete_=True` would replace all old keys in `pts_neck` field with new keys.
+### Use intermediate variables in configs
+Some intermediate variables are used in the configs files, like `train_pipeline`/`test_pipeline` in datasets.
+It's worth noting that when modifying intermediate variables in the children configs, user needs to pass the intermediate variables into corresponding fields again.
+For example, we would like to use a multi-scale strategy to train and test a PointPillars, `train_pipeline`/`test_pipeline` are intermediate variables we would like to modify.
+```python
+_base_ = './nus-3d.py'
+train_pipeline = [
+    dict(
+        type='LoadPointsFromFile',
+        load_dim=5,
+        use_dim=5,
+        backend_args=backend_args),
+    dict(
+        type='LoadPointsFromMultiSweeps',
+        sweeps_num=10,
+        backend_args=backend_args),
+    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
+    dict(
+        type='GlobalRotScaleTrans',
+        rot_range=[-0.3925, 0.3925],
+        scale_ratio_range=[0.95, 1.05],
+        translation_std=[0, 0, 0]),
+    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
+    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='ObjectNameFilter', classes=class_names),
+    dict(type='PointShuffle'),
+    dict(
+        type='Pack3DDetInputs',
+        keys=['points', 'gt_labels_3d', 'gt_bboxes_3d'])
+]
+test_pipeline = [
+    dict(
+        type='LoadPointsFromFile',
+        load_dim=5,
+        use_dim=5,
+        backend_args=backend_args),
+    dict(
+        type='LoadPointsFromMultiSweeps',
+        sweeps_num=10,
+        backend_args=backend_args),
+    dict(
+        type='MultiScaleFlipAug3D',
+        img_scale=(1333, 800),
+        pts_scale_ratio=[0.95, 1.0, 1.05],
+        flip=False,
+        transforms=[
+            dict(
+                type='GlobalRotScaleTrans',
+                rot_range=[0, 0],
+                scale_ratio_range=[1., 1.],
+                translation_std=[0, 0, 0]),
+            dict(type='RandomFlip3D'),
+            dict(
+                type='PointsRangeFilter', point_cloud_range=point_cloud_range)
+        ]),
+    dict(type='Pack3DDetInputs', keys=['points'])
+]
+train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
+val_dataloader = dict(dataset=dict(pipeline=test_pipeline))
+test_dataloader = dict(dataset=dict(pipeline=test_pipeline))
+```
+We first define the new `train_pipeline`/`test_pipeline` and pass them into dataloader fields.
+### Reuse variables in \_base\_ file
+If the users want to reuse the variables in the base file, they can get a copy of the corresponding variable by using `{{_base_.xxx}}`. E.g:
+```python
+_base_ = './pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py'
+a = {{_base_.model}}  # variable `a` is equal to the `model` defined in `_base_`
+```
+## Modify config through script arguments
+When submitting jobs using `tools/train.py` or `tools/test.py`, you may specify `--cfg-options` to in-place modify the config.
+- Update config keys of dict chains
+  The config options can be specified following the order of the dict keys in the original config.
+  For example, `--cfg-options model.backbone.norm_eval=False` changes the all BN modules in model backbones to `train` mode.
+- Update keys inside a list of configs
+  Some config dicts are composed as a list in your config. For example, the training pipeline `train_dataloader.dataset.pipeline` is normally a list
+  e.g. `[dict(type='LoadPointsFromFile'), ...]`. If you want to change `'LoadPointsFromFile'` to `'LoadPointsFromDict'` in the pipeline,
+  you may specify `--cfg-options data.train.pipeline.0.type=LoadPointsFromDict`.
+- Update values of list/tuple
+  If the value to be updated is a list or a tuple. For example, the config file normally sets `model.data_preprocessor.mean=[123.675, 116.28, 103.53]`. If you want to
+  change the mean values, you may specify `--cfg-options model.data_preprocessor.mean="[127,127,127]"`. Note that the quotation mark `"` is necessary to
+  support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
+## Config Name Style
+We follow the below style to name config files. Contributors are advised to follow the same style.
+```
+{algorithm name}_{model component names [component1]_[component2]_[...]}_{training settings}_{training dataset information}_{testing dataset information}.py
+```
+The file name is divided to five parts. All parts and components are connected with `_` and words of each part or component should be connected with `-`.
+- `{algorithm name}`: The name of the algorithm. It can be a detector name such as `pointpillars`, `fcos3d`, etc.
+- `{model component names}`: Names of the components used in the algorithm such as voxel_encoder, backbone, neck, etc. For example, `second_secfpn_head-dcn-circlenms` means using SECOND's SparseEncoder, SECONDFPN and a detection head with DCN and circle NMS.
+- `{training settings}`: Information of training settings such as batch size, augmentations, loss trick, scheduler, and epochs/iterations. For example: `8xb4-tta-cyclic-20e` means using 8-gpus x 4-samples-per-gpu, test time augmentation, cyclic annealing learning rate, and train 20 epochs.
+  Some abbreviations:
+  - `{gpu x batch_per_gpu}`: GPUs and samples per GPU. `bN` indicates N batch size per GPU. E.g. `4xb4` is the short term of 4-GPUs x 4-samples-per-GPU.
+  - `{schedule}`: training schedule, options are `schedule-2x`, `schedule-3x`, `cyclic-20e`, etc.
+    `schedule-2x` and `schedule-3x` mean 24 epochs and 36 epochs respectively.
+    `cyclic-20e` means 20 epochs respectively.
+- `{training dataset information}`: Training dataset names like `kitti-3d-3class`, `nus-3d`, `s3dis-seg`, `scannet-seg`, `waymoD5-3d-car`. Here `3d` means dataset used for 3D object detection, and `seg` means dataset used for point cloud segmentation.
+- `{testing dataset information}` (optional): Testing dataset name for models trained on one dataset but tested on another. If not mentioned, it means the model was trained and tested on the same dataset type.
--- a/mmdetection3d/docs/en/user_guides/coord_sys_tutorial.md
+++ b/mmdetection3d/docs/en/user_guides/coord_sys_tutorial.md
+# Coordinate System
+## Overview
+MMDetection3D uses three different coordinate systems. The existence of different coordinate systems in the society of 3D object detection is necessary, because for various 3D data collection devices, such as LiDAR, depth camera, etc., the coordinate systems are not consistent, and different 3D datasets also follow different data formats. Early works, such as SECOND, VoteNet, convert the raw data to another format, forming conventions that some later works also follow, making the conversion between coordinate systems even more complicated.
+Despite the variety of datasets and equipment, by summarizing the line of works on 3D object detection we can roughly categorize coordinate systems into three:
+- Camera coordinate system -- the coordinate system of most cameras, in which the positive direction of the y-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the z-axis points to the front.
+  ```
+             up  z front
+              |    ^
+              |   /
+              |  /
+              | /
+              |/
+  left ------ 0 ------> x right
+              |
+              |
+              |
+              |
+              v
+            y down
+  ```
+- LiDAR coordinate system -- the coordinate system of many LiDARs, in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the front, and the positive direction of the y-axis points to the left.
+  ```
+               z up  x front
+                 ^    ^
+                 |   /
+                 |  /
+                 | /
+                 |/
+  y left <------ 0 ------ right
+  ```
+- Depth coordinate system -- the coordinate system used by VoteNet, H3DNet, etc., in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the y-axis points to the front.
+  ```
+            z up  y front
+              ^    ^
+              |   /
+              |  /
+              | /
+              |/
+  left ------ 0 ------> x right
+  ```
+The definition of coordinate systems in this tutorial is actually **more than just defining the three axes**. For a box in the form of $(x, y, z, dx, dy, dz, r)$, our coordinate systems also define how to interpret the box dimensions $(dx, dy, dz)$ and the yaw angle $r$.
+The illustration of the three coordinate systems is shown below:
+![](https://raw.githubusercontent.com/open-mmlab/mmdetection3d/master/resources/coord_sys_all.png)
+The three figures above are the 3D coordinate systems while the three figures below are the bird's eye view.
+We will stick to the three coordinate systems defined in this tutorial in the future.
+## Definition of the yaw angle
+Please refer to [wikipedia](https://en.wikipedia.org/wiki/Euler_angles#Tait%E2%80%93Bryan_angles) for the standard definition of the yaw angle. In object detection, we choose an axis as the gravity axis, and a reference direction on the plane $\\Pi$ perpendicular to the gravity axis, then the reference direction has a yaw angle of 0, and other directions on $\\Pi$ have non-zero yaw angles depending on its angle with the reference direction.
+Currently, for all supported datasets, annotations do not include pitch angle and roll angle, which means we need only consider the yaw angle when predicting boxes and calculating overlap between boxes.
+In MMDetection3D, all three coordinate systems are right-handed coordinate systems, which means the ascending direction of the yaw angle is counter-clockwise if viewed from the negative direction of the gravity axis (the axis is pointing at one's eyes).
+The figure below shows that, in this right-handed coordinate system, if we set the positive direction of the x-axis as a reference direction, then the positive direction of the y-axis has a yaw angle of $\\frac{\\pi}{2}$.
+```
+                     z up  y front (yaw=0.5*pi)
+                      ^    ^
+                      |   /
+                      |  /
+                      | /
+                      |/
+left (yaw=pi)  ------ 0 ------> x right (yaw=0)
+```
+For a box, the value of its yaw angle equals its direction minus a reference direction. In all three coordinate systems in MMDetection3D, the reference direction is always the positive direction of the x-axis, while the direction of a box is defined to be parallel with the x-axis if its yaw angle is 0. The definition of the yaw angle of a box is illustrated in the figure below.
+```
+y front
+  ^      box direction (yaw=0.5*pi)
+ /|\        ^
+  |        /|\
+  |     ____|____
+  |    |    |    |
+  |    |    |    |
+__|____|____|____|______\ x right
+  |    |    |    |      /
+  |    |    |    |
+  |    |____|____|
+  |
+```
+## Definition of the box dimensions
+The definition of the box dimensions cannot be disentangled with the definition of the yaw angle. In the previous section, we said that the direction of a box is defined to be parallel with the x-axis if its yaw angle is 0. Then naturally, the dimension of a box which corresponds to the x-axis should be $dx$. However, this is not always the case in some datasets (we will address that later).
+The following figures show the meaning of the correspondence between the x-axis and $dx$, and between the y-axis and $dy$.
+```
+y front
+  ^      box direction (yaw=0.5*pi)
+ /|\        ^
+  |        /|\
+  |     ____|____
+  |    |    |    |
+  |    |    |    | dx
+__|____|____|____|______\ x right
+  |    |    |    |      /
+  |    |    |    |
+  |    |____|____|
+  |         dy
+```
+Note that the box direction is always parallel with the edge $dx$.
+```
+y front
+  ^     _________
+ /|\   |    |    |
+  |    |    |    |
+  |    |    |    | dy
+  |    |____|____|____\  box direction (yaw=0)
+  |    |    |    |    /
+__|____|____|____|_________\ x right
+  |    |    |    |         /
+  |    |____|____|
+  |         dx
+  |
+```
+## Relation with raw coordinate systems of supported datasets
+### KITTI
+The raw annotation of KITTI is under camera coordinate system, see [get_label_anno](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/dataset_converters/kitti_data_utils.py). In MMDetection3D, to train LiDAR-based models on KITTI, the data is first converted from camera coordinate system to LiDAR coordinate system, see [get_ann_info](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/kitti_dataset.py). For training vision-based models, the data is kept in the camera coordinate system.
+In SECOND, the LiDAR coordinate system for a box is defined as follows (a bird's eye view):
+![](https://raw.githubusercontent.com/traveller59/second.pytorch/master/images/kittibox.png)
+For each box, the dimensions are $(w, l, h)$, and the reference direction for the yaw angle is the positive direction of the y axis. For more details, refer to the [repo](https://github.com/traveller59/second.pytorch#concepts).
+Our LiDAR coordinate system has two changes:
+- The yaw angle is defined to be right-handed instead of left-handed for consistency;
+- The box dimensions are $(l, w, h)$ instead of $(w, l, h)$, since $w$ corresponds to $dy$ and $l$ corresponds to $dx$ in KITTI.
+### Waymo
+We use the KITTI-format data of Waymo dataset. Therefore, KITTI and Waymo also share the same coordinate system in our implementation.
+### NuScenes
+NuScenes provides a toolkit for evaluation, in which each box is wrapped into a `Box` instance. The coordinate system of `Box` is different from our LiDAR coordinate system in that the first two elements of the box dimension correspond to $(dy, dx)$, or $(w, l)$, respectively, instead of the reverse. For more details, please refer to the NuScenes [tutorial](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/nuscenes_det.md#notes).
+Readers may refer to the [NuScenes development kit](https://github.com/nutonomy/nuscenes-devkit/tree/master/python-sdk/nuscenes/eval/detection) for the definition of a [NuScenes box](https://github.com/nutonomy/nuscenes-devkit/blob/2c6a752319f23910d5f55cc995abc547a9e54142/python-sdk/nuscenes/utils/data_classes.py#L457) and implementation of [NuScenes evaluation](https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/eval/detection/evaluate.py).
+### Lyft
+Lyft shares the same data format with NuScenes as far as coordinate system is involved.
+Please refer to the [official website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data) for more information.
+### ScanNet
+The raw data of ScanNet is not point cloud but mesh. The sampled point cloud data is under our depth coordinate system. For ScanNet detection task, the box annotations are axis-aligned, and the yaw angle is always zero. Therefore the direction of the yaw angle in our depth coordinate system makes no difference regarding ScanNet.
+### SUN RGB-D
+The raw data of SUN RGB-D is not point cloud but RGB-D image. By back projection, we obtain the corresponding point cloud for each image, which is under our Depth coordinate system. However, the annotation is not under our system and thus needs conversion.
+For the conversion from raw annotation to annotation under our Depth coordinate system, please refer to [sunrgbd_data_utils.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/dataset_converters/sunrgbd_data_utils.py).
+### S3DIS
+S3DIS shares the same coordinate system as ScanNet in our implementation. However, S3DIS is a segmentation-task-only dataset, and thus no annotation is coordinate system sensitive.
+## Examples
+### Box conversion (between different coordinate systems)
+Take the conversion between our Camera coordinate system and LiDAR coordinate system as an example:
+First, for points and box centers, the coordinates before and after the conversion satisfy the following relationship:
+- $x\_{LiDAR}=z\_{camera}$
+- $y\_{LiDAR}=-x\_{camera}$
+- $z\_{LiDAR}=-y\_{camera}$
+Then, the box dimensions before and after the conversion satisfy the following relationship:
+- $dx\_{LiDAR}=dx\_{camera}$
+- $dy\_{LiDAR}=dz\_{camera}$
+- $dz\_{LiDAR}=dy\_{camera}$
+Finally, the yaw angle should also be converted:
+- $r\_{LiDAR}=-\\frac{\\pi}{2}-r\_{camera}$
+See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/box_3d_mode.py) for more details.
+### Bird's Eye View
+The BEV of a camera coordinate system box is $(x, z, dx, dz, -r)$ if the 3D box is $(x, y, z, dx, dy, dz, r)$. The inversion of the sign of the yaw angle is because the positive direction of the gravity axis of the Camera coordinate system points to the ground.
+See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.
+### Rotation of boxes
+We set the rotation of all kinds of boxes to be counter-clockwise about the gravity axis. Therefore, to rotate a 3D box we first calculate the new box center, and then we add the rotation angle to the yaw angle.
+See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.
+## Common FAQ
+#### Q1: Are the box related ops universal to all coordinate system types?
+No. For example, [RoI-Aware Pooling ops](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/roiaware_pool3d.py) is applicable to boxes under Depth or LiDAR coordinate system only. The evaluation functions for KITTI dataset [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/kitti_utils) are only applicable to boxes under Camera coordinate system since the rotation is clockwise if viewed from above.
+For each box related op, we have marked the type of boxes to which we can apply the op.
+#### Q2: In every coordinate system, do the three axes point exactly to the right, the front, and the ground, respectively?
+No. For example, in KITTI, we need a calibration matrix when converting from Camera coordinate system to LiDAR coordinate system.
+#### Q3: How does a phase difference of $2\\pi$ in the yaw angle of a box affect evaluation?
+For IoU calculation, a phase difference of $2\\pi$ in the yaw angle will result in the same box, thus not affecting evaluation.
+For angle prediction evaluation such as the NDS metric in NuScenes and the AOS metric in KITTI, the angle of predicted boxes will be first standardized, so the phase difference of $2\\pi$ will not change the result.
+#### Q4: How does a phase difference of $\\pi$ in the yaw angle of a box affect evaluation?
+For IoU calculation, a phase difference of $\\pi$ in the yaw angle will result in the same box, thus not affecting evaluation.
+However, for angle prediction evaluation, this will result in the exact opposite direction.
+Just think about a car. The yaw angle is the angle between the direction of the car front and the positive direction of the x-axis. If we add $\\pi$ to this angle, the car front will become the car rear.
+For categories such as barrier, the front and the rear have no difference, therefore a phase difference of $\\pi$ will not affect the angle prediction score.
--- a/mmdetection3d/docs/en/user_guides/data_pipeline.md
+++ b/mmdetection3d/docs/en/user_guides/data_pipeline.md
+# Customize Data Pipelines
+## Design of Data pipelines
+Following typical conventions, we use `Dataset` and `DataLoader` for data loading
+with multiple workers. `Dataset` returns a dict of data items corresponding
+the arguments of models' forward method.
+Since the data in object detection may not be the same size (point number, gt bbox size, etc.),
+we introduce a new `DataContainer` type in MMCV to help collect and distribute
+data of different size.
+See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
+The data preparation pipeline and the dataset is decomposed. Usually a dataset
+defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict.
+A pipeline consists of a sequence of operations. Each operation takes a dict as input and also output a dict for the next transform.
+We present a classical pipeline in the following figure. The blue blocks are pipeline operations. With the pipeline going on, each operator can add new keys (marked as green) to the result dict or update the existing keys (marked as orange).
+![](../../../resources/data_pipeline.png)
+The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.
+Here is an pipeline example for PointPillars.
+```python
+train_pipeline = [
+    dict(
+        type='LoadPointsFromFile',
+        load_dim=5,
+        use_dim=5,
+        backend_args=backend_args),
+    dict(
+        type='LoadPointsFromMultiSweeps',
+        sweeps_num=10,
+        backend_args=backend_args),
+    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
+    dict(
+        type='GlobalRotScaleTrans',
+        rot_range=[-0.3925, 0.3925],
+        scale_ratio_range=[0.95, 1.05],
+        translation_std=[0, 0, 0]),
+    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
+    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='ObjectNameFilter', classes=class_names),
+    dict(type='PointShuffle'),
+    dict(type='DefaultFormatBundle3D', class_names=class_names),
+    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
+]
+test_pipeline = [
+    dict(
+        type='LoadPointsFromFile',
+        load_dim=5,
+        use_dim=5,
+        backend_args=backend_args),
+    dict(
+        type='LoadPointsFromMultiSweeps',
+        sweeps_num=10,
+        backend_args=backend_args),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(1333, 800),
+        pts_scale_ratio=1.0,
+        flip=False,
+        pcd_horizontal_flip=False,
+        pcd_vertical_flip=False,
+        transforms=[
+            dict(
+                type='GlobalRotScaleTrans',
+                rot_range=[0, 0],
+                scale_ratio_range=[1., 1.],
+                translation_std=[0, 0, 0]),
+            dict(type='RandomFlip3D'),
+            dict(
+                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+            dict(
+                type='DefaultFormatBundle3D',
+                class_names=class_names,
+                with_label=False),
+            dict(type='Collect3D', keys=['points'])
+        ])
+]
+```
+For each operation, we list the related dict fields that are added/updated/removed.
+### Data loading
+`LoadPointsFromFile`
+- add: points
+`LoadPointsFromMultiSweeps`
+- update: points
+`LoadAnnotations3D`
+- add: gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels, pts_instance_mask, pts_semantic_mask, bbox3d_fields, pts_mask_fields, pts_seg_fields
+### Pre-processing
+`GlobalRotScaleTrans`
+- add: pcd_trans, pcd_rotation, pcd_scale_factor
+- update: points, \*bbox3d_fields
+`RandomFlip3D`
+- add: flip, pcd_horizontal_flip, pcd_vertical_flip
+- update: points, \*bbox3d_fields
+`PointsRangeFilter`
+- update: points
+`ObjectRangeFilter`
+- update: gt_bboxes_3d, gt_labels_3d
+`ObjectNameFilter`
+- update: gt_bboxes_3d, gt_labels_3d
+`PointShuffle`
+- update: points
+`PointsRangeFilter`
+- update: points
+### Formatting
+`DefaultFormatBundle3D`
+- update: points, gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels
+`Collect3D`
+- add: img_meta (the keys of img_meta is specified by `meta_keys`)
+- remove: all other keys except for those specified by `keys`
+### Test time augmentation
+`MultiScaleFlipAug`
+- update: scale, pcd_scale_factor, flip, flip_direction, pcd_horizontal_flip, pcd_vertical_flip with list of augmented data with these specific parameters
+## Extend and use custom pipelines
+1. Write a new pipeline in any file, e.g., `my_pipeline.py`. It takes a dict as input and return a dict.
+   ```python
+   from mmdet.datasets import PIPELINES
+   @PIPELINES.register_module()
+   class MyTransform:
+       def __call__(self, results):
+           results['dummy'] = True
+           return results
+   ```
+2. Import the new class.
+   ```python
+   from .my_pipeline import MyTransform
+   ```
+3. Use it in config files.
+   ```python
+   train_pipeline = [
+       dict(
+           type='LoadPointsFromFile',
+           load_dim=5,
+           use_dim=5,
+           backend_args=backend_args),
+       dict(
+           type='LoadPointsFromMultiSweeps',
+           sweeps_num=10,
+           backend_args=backend_args),
+       dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
+       dict(
+           type='GlobalRotScaleTrans',
+           rot_range=[-0.3925, 0.3925],
+           scale_ratio_range=[0.95, 1.05],
+           translation_std=[0, 0, 0]),
+       dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
+       dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+       dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
+       dict(type='ObjectNameFilter', classes=class_names),
+       dict(type='MyTransform'),
+       dict(type='PointShuffle'),
+       dict(type='DefaultFormatBundle3D', class_names=class_names),
+       dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
+   ]
+   ```
--- a/mmdetection3d/docs/en/user_guides/dataset_prepare.md
+++ b/mmdetection3d/docs/en/user_guides/dataset_prepare.md
+# Dataset Preparation
+## Before Preparation
+It is recommended to symlink the dataset root to `$MMDETECTION3D/data`.
+If your folder structure is different from the following, you may need to change the corresponding paths in config files.
+```
+mmdetection3d
+├── mmdet3d
+├── tools
+├── configs
+├── data
+│   ├── nuscenes
+│   │   ├── maps
+│   │   ├── samples
+│   │   ├── sweeps
+│   │   ├── v1.0-test
+|   |   ├── v1.0-trainval
+│   ├── kitti
+│   │   ├── ImageSets
+│   │   ├── testing
+│   │   │   ├── calib
+│   │   │   ├── image_2
+│   │   │   ├── velodyne
+│   │   ├── training
+│   │   │   ├── calib
+│   │   │   ├── image_2
+│   │   │   ├── label_2
+│   │   │   ├── velodyne
+│   ├── waymo
+│   │   ├── waymo_format
+│   │   │   ├── training
+│   │   │   ├── validation
+│   │   │   ├── testing
+│   │   │   ├── gt.bin
+│   │   ├── kitti_format
+│   │   │   ├── ImageSets
+│   ├── lyft
+│   │   ├── v1.01-train
+│   │   │   ├── v1.01-train (train_data)
+│   │   │   ├── lidar (train_lidar)
+│   │   │   ├── images (train_images)
+│   │   │   ├── maps (train_maps)
+│   │   ├── v1.01-test
+│   │   │   ├── v1.01-test (test_data)
+│   │   │   ├── lidar (test_lidar)
+│   │   │   ├── images (test_images)
+│   │   │   ├── maps (test_maps)
+│   │   ├── train.txt
+│   │   ├── val.txt
+│   │   ├── test.txt
+│   │   ├── sample_submission.csv
+│   ├── s3dis
+│   │   ├── meta_data
+│   │   ├── Stanford3dDataset_v1.2_Aligned_Version
+│   │   ├── collect_indoor3d_data.py
+│   │   ├── indoor3d_util.py
+│   │   ├── README.md
+│   ├── scannet
+│   │   ├── meta_data
+│   │   ├── scans
+│   │   ├── scans_test
+│   │   ├── batch_load_scannet_data.py
+│   │   ├── load_scannet_data.py
+│   │   ├── scannet_utils.py
+│   │   ├── README.md
+│   ├── sunrgbd
+│   │   ├── OFFICIAL_SUNRGBD
+│   │   ├── matlab
+│   │   ├── sunrgbd_data.py
+│   │   ├── sunrgbd_utils.py
+│   │   ├── README.md
+│   ├── semantickitti
+│   │   ├── sequences
+│   │   │   ├── 00
+│   │   │   │   ├── labels
+│   │   │   │   ├── velodyne
+│   │   │   ├── 01
+│   │   │   ├── ..
+│   │   │   ├── 22
+```
+## Download and Data Preparation
+### KITTI
+1. Download KITTI 3D detection data [HERE](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). Alternatively, you
+   can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The command scripts are the following:
+```bash
+# install OpenDataLab CLI tools
+pip install -U opendatalab
+# log in OpenDataLab. Note that you should register an account on [OpenDataLab](https://opendatalab.com/) before.
+pip install odl
+odl login
+# download and preprocess by MIM
+mim download mmdet3d --dataset kitti
+```
+2. Prepare KITTI data splits by running:
+```bash
+mkdir ./data/kitti/ && mkdir ./data/kitti/ImageSets
+# Download data split
+wget -c  https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt
+wget -c  https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt
+wget -c  https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt
+wget -c  https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt
+```
+3. Generate info files by running:
+```bash
+python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
+```
+In an environment using slurm, users may run the following command instead:
+```bash
+sh tools/create_data.sh <partition> kitti
+```
+**Tips**:
+- **Ready-made Annotations**. We have also provided kitti data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/kitti/`. However, if you want to use `ObjectSample` Augmentation in LiDAR-based detection methods, you should additionally generate groundtruth database files and annotations.
+  ```bash
+  python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti --only-gt-database
+  ```
+### Waymo
+Download Waymo open dataset V1.4.1 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put `.tfrecord` files into corresponding folders in `data/waymo/waymo_format/` and put the data split `.txt` files into `data/waymo/kitti_format/ImageSets`. Download ground truth `.bin` file for validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put it into `data/waymo/waymo_format/`. A tip is that you can use `gsutil` to download the large-scale dataset with commands. You can take this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) as an example for more details. Subsequently, prepare waymo data by running:
+```bash
+# TF_CPP_MIN_LOG_LEVEL=3 will disable all logging output from TensorFlow.
+# The number of `--workers` depends on the maximum number of cores in your CPU.
+TF_CPP_MIN_LOG_LEVEL=3 python tools/create_data.py waymo --root-path ./data/waymo --out-dir ./data/waymo --workers 128 --extra-tag waymo --version v1.4
+```
+Note that:
+- In case the preprocessing of Waymo dataset is slow or blocked, consider reducing the value of `--workers`. If this doesn't resolve the issue, you could set `--workers` as 0 to avoid using multiprocess.
+- If your local disk does not have enough space for saving converted data, you can change the `--out-dir` to anywhere else. Just remember to create folders and prepare data there in advance and link them back to `data/waymo/kitti_format` after the data conversion.
+**Tips**:
+- **Ready-made Annotations**. We have provided the annotation files generated offline [here](#summary-of-annotation-files). However, the original Waymo data still needs to be converted to `kitti-format` data by yourself.
+- **Waymo-mini**. If you just want to use a part of Waymo Dataset to verify some methods or debug quickly, you could use our provided [Waymo-mini](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_mini.tar.gz) which only contains two segments in train split and one segment in val split from the original dataset. All the images, point clouds and annotations in this compressed file have been processed offline so that you can directly download and unzip it to `data/waymo/`:
+  ```bash
+  tar -xzvf waymo_mini.tar.gz -C ./data/waymo_mini
+  ```
+### NuScenes
+1. Download nuScenes V1.0 full dataset data [HERE](https://www.nuscenes.org/download). Alternatively, you
+   can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The downloading and unzipping command scripts are the following:
+```bash
+# install OpenDataLab CLI tools
+pip install -U opendatalab
+# log in OpenDataLab. Note that you should register an account on [OpenDataLab](https://opendatalab.com/) before.
+pip install odl
+odl login
+# download and preprocess by MIM
+mim download mmdet3d --dataset nuscenes
+```
+2. Prepare nuscenes data by running:
+```bash
+python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
+```
+**Tips**:
+- **Ready-made Annotations**. We have also provided NuScenes data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/nuscenes/`. However, if you want to use `ObjectSample` Augmentation in LiDAR-based detection methods, you should additionally generate groundtruth database files and annotations.
+```bash
+python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --only-gt-database
+```
+### Lyft
+Download Lyft 3D detection data [HERE](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data). Prepare Lyft data by running:
+```bash
+python tools/create_data.py lyft --root-path ./data/lyft --out-dir ./data/lyft --extra-tag lyft --version v1.01
+python tools/dataset_converters/lyft_data_fixer.py --version v1.01 --root-folder ./data/lyft
+```
+Note that we follow the original folder names for clear organization. Please rename the raw folders as shown above. Also note that the second command serves the purpose of fixing a corrupted lidar data file. Please refer to the [discussion](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/discussion/110000) for more details.
+### SemanticKITTI
+1. Download SemanticKITTI dataset [HERE](http://semantic-kitti.org/dataset.html#download) and unzip all zip files. Alternatively, you
+   can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The downloading and unzipping command scripts are the following:
+```bash
+# install OpenDataLab CLI tools
+pip install -U opendatalab
+# log in OpenDataLab. Note that you should register an account on [OpenDataLab](https://opendatalab.com/) before.
+pip install odl
+odl login
+# download and preprocess by MIM
+mim download mmdet3d --dataset semantickitti
+```
+2. Generate info files by running:
+```bash
+python ./tools/create_data.py semantickitti --root-path ./data/semantickitti --out-dir ./data/semantickitti --extra-tag semantickitti
+```
+**Tips**:
+- **Ready-made Annotations**. We have also provided SemanticKITTI data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/semantickitti/`.
+### S3DIS, ScanNet and SUN RGB-D
+To prepare S3DIS data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/s3dis/README.md).
+To prepare ScanNet data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/scannet/README.md).
+To prepare SUN RGB-D data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/sunrgbd/README.md).
+**Tips**: For S3DIS, ScanNet and SUN RGB-D datasets, we have also provided data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/${DATASET}/`. However, you also need to generate point cloud files and semantic/instance masks files (if it has) by yourself.
+### Customized Datasets
+For using custom datasets, please refer to [Customize Datasets](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/en/advanced_guides/customize_dataset.md).
+### Update data infos
+If you have used v1.0.0rc1-v1.0.0rc4 mmdetection3d to create data infos before, and now you want to use the newest v1.1.0 mmdetection3d, you need to update the data infos file.
+```bash
+python tools/dataset_converters/update_infos_to_v2.py --dataset ${DATA_SET} --pkl-path ${PKL_PATH} --out-dir ${OUT_DIR}
+```
+- `--dataset` : Name of dataset.
+- `--pkl-path` : Specify the data infos pkl file path.
+- `--out-dir` : Output direction of the data infos pkl file.
+Example:
+```bash
+python tools/dataset_converters/update_infos_to_v2.py --dataset kitti --pkl-path ./data/kitti/kitti_infos_trainval.pkl --out-dir ./data/kitti
+```
+### Summary of annotation files
+We provide ready-made annotation files we generated offline for reference. You can directly use these files for convenice.
+|                                                  Dataset                                                  |                                                                                                           Train annotation file                                                                                                           |                                                                                                        Val annotation file                                                                                                         |                                                                                                              Test information file                                                                                                              |
+| :-------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+|                                                   KITTI                                                   |                                                                  [kitti_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/kitti/kitti_infos_train.pkl)                                                                   |                                                                 [kitti_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/kitti/kitti_infos_val.pkl)                                                                 |                                                                        [kitti_infos_test](https://download.openmmlab.com/mmdetection3d/data/kitti/kitti_infos_test.pkl)                                                                         |
+|                                                 NuScenes                                                  | [nuscenes_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_infos_train.pkl) [nuscenes_mini_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_mini_infos_train.pkl) | [nuscenes_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_infos_val.pkl)  [nuscenes_mini_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_mini_infos_val.pkl) |                                                                                                                                                                                                                                                 |
+|                                                   Waymo                                                   |                                                         [waymo_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_infos_train.pkl)                                                          |                                                        [waymo_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_infos_val.pkl)                                                        | [waymo_infos_test.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo/waymo_infos_test.pkl)   [waymo_infos_test_cam_only.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_infos_test_cam_only.pkl) |
+| [Waymo-mini](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_mini.tar.gz) |                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                    |                                                                                                                                                                                                                                                 |
+|                                                 SUN RGB-D                                                 |                                                               [sunrgbd_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/sunrgbd/sunrgbd_infos_train.pkl)                                                                |                                                              [sunrgbd_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/sunrgbd/sunrgbd_infos_val.pkl)                                                              |                                                                                                                                                                                                                                                 |
+|                                                  ScanNet                                                  |                                                               [scannet_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/scannet/scannet_infos_train.pkl)                                                                |                                                              [scannet_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/scannet/scannet_infos_val.pkl)                                                              |                                                                   [scannet_infos_test.pkl](https://download.openmmlab.com/mmdetection3d/data/scannet/scannet_infos_test.pkl)                                                                    |
+|                                               SemanticKitti                                               |                                                      [semantickitti_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/semantickitti/semantickitti_infos_train.pkl)                                                       |                                                     [semantickitti_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/semantickitti/semantickitti_infos_val.pkl)                                                     |                                                          [semantickitti_infos_test.pkl](https://download.openmmlab.com/mmdetection3d/data/semantickitti/semantickitti_infos_test.pkl)                                                           |
--- a/mmdetection3d/docs/en/user_guides/index.rst
+++ b/mmdetection3d/docs/en/user_guides/index.rst
+Train & Test
+**************
+.. toctree::
+   :maxdepth: 1
+   config.md
+   coord_sys_tutorial.md
+   dataset_prepare.md
+   data_pipeline.md
+   train_test.md
+   inference.md
+   new_data_model.md
+Useful Tools
+************
+.. toctree::
+   :maxdepth: 1
+   useful_tools.md
+   visualization.md
+   backends_support.md
+   model_deployment.md
--- a/mmdetection3d/docs/en/user_guides/inference.md
+++ b/mmdetection3d/docs/en/user_guides/inference.md
+# Inference
+## Introduction
+We provide scripts for multi-modality/single-modality (LiDAR-based/vision-based), indoor/outdoor 3D detection and 3D semantic segmentation demos. The pre-trained models can be downloaded from [model zoo](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/en/model_zoo.md). We provide pre-processed sample data from KITTI, SUN RGB-D, nuScenes and ScanNet dataset. You can use any other data following our pre-processing steps.
+## Testing
+### 3D Detection
+#### Point cloud demo
+To test a 3D detector on point cloud data, simply run:
+```shell
+python demo/pcd_demo.py ${PCD_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}] [--show]
+```
+The visualization results including a point cloud and predicted 3D bounding boxes will be saved in `${OUT_DIR}/PCD_NAME`, which you can open using [MeshLab](http://www.meshlab.net/). Note that if you set the flag `--show`, the prediction result will be displayed online using [Open3D](http://www.open3d.org/).
+Example on KITTI data using [PointPillars model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth):
+```shell
+python demo/pcd_demo.py demo/data/kitti/000008.bin configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py ${CHECKPOINT_FILE} --show
+```
+Example on SUN RGB-D data using [VoteNet model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/votenet/votenet_16x8_sunrgbd-3d-10class/votenet_16x8_sunrgbd-3d-10class_20210820_162823-bf11f014.pth):
+```shell
+python demo/pcd_demo.py demo/data/sunrgbd/sunrgbd_000017.bin configs/votenet/votenet_8xb16_sunrgbd-3d.py ${CHECKPOINT_FILE} --show
+```
+#### Monocular 3D demo
+To test a monocular 3D detector on image data, simply run:
+```shell
+python demo/mono_det_demo.py ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--cam-type ${CAM_TYPE}] [--score-thr ${SCORE-THR}] [--out-dir ${OUT_DIR}] [--show]
+```
+where the `ANNOTATION_FILE` should provide the 3D to 2D projection matrix (camera intrinsic matrix), and `CAM_TYPE` should be specified according to dataset. For example, if you want to inference on the front camera image, the `CAM_TYPE` should be set as `CAM_2` for KITTI, and `CAM_FRONT` for nuScenes. By specifying `CAM_TYPE`, you can even infer on any camera images for datasets with multi-view cameras, such as nuScenes and Waymo. `SCORE-THR` is the 3D bbox threshold while visualization. The visualization results including an image and its predicted 3D bounding boxes projected on the image will be saved in `${OUT_DIR}/IMG_NAME`.
+Example on KITTI data using [PGD model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pgd/pgd_r101_caffe_fpn_gn-head_3x4_4x_kitti-mono3d/pgd_r101_caffe_fpn_gn-head_3x4_4x_kitti-mono3d_20211022_102608-8a97533b.pth):
+```shell
+python demo/mono_det_demo.py demo/data/kitti/000008.png demo/data/kitti/000008.pkl  configs/pgd/pgd_r101-caffe_fpn_head-gn_4xb3-4x_kitti-mono3d.py ${CHECKPOINT_FILE}  --show --cam-type CAM2 --score-thr 8
+```
+**Note**: For PGD, the prediction score is not among (0, 1).
+Example on nuScenes data using [FCOS3D model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune_20210717_095645-8d806dc2.pth):
+```shell
+python demo/mono_det_demo.py demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525.jpg demo/data/nuscenes/n015-2018-07-24-11-22-45+0800.pkl  configs/fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d_finetune.py ${CHECKPOINT_FILE}  --show --cam-type CAM_BACK
+```
+**Note** that when visualizing results of monocular 3D detection for flipped images, the camera intrinsic matrix should also be modified accordingly. See more details and examples in PR [#744](https://github.com/open-mmlab/mmdetection3d/pull/744).
+#### Multi-modality demo
+To test a 3D detector on multi-modality data (typically point cloud and image), simply run:
+```shell
+python demo/multi_modality_demo.py ${PCD_FILE} ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}] [--show]
+```
+where the `ANNOTATION_FILE` should provide the 3D to 2D projection matrix. The visualization results including a point cloud, an image, predicted 3D bounding boxes and their projection on the image will be saved in `${OUT_DIR}/PCD_NAME`.
+Example on KITTI data using [MVX-Net model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/mvxnet/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class-8963258a.pth):
+```shell
+python demo/multi_modality_demo.py demo/data/kitti/000008.bin demo/data/kitti/000008.png demo/data/kitti/000008.pkl configs/mvxnet/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class.py ${CHECKPOINT_FILE} --cam-type CAM2 --show
+```
+Example on SUN RGB-D data using [ImVoteNet model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvotenet/imvotenet_stage2_16x8_sunrgbd-3d-10class/imvotenet_stage2_16x8_sunrgbd-3d-10class_20210819_192851-1bcd1b97.pth):
+```shell
+python demo/multi_modality_demo.py demo/data/sunrgbd/000017.bin demo/data/sunrgbd/000017.jpg demo/data/sunrgbd/sunrgbd_000017_infos.pkl configs/imvotenet/imvotenet_stage2_8xb16_sunrgbd-3d.py ${CHECKPOINT_FILE} --cam-type CAM0 --show --score-thr 0.6
+```
+Example on NuScenes data using [BEVFusion model](https://drive.google.com/file/d/1QkvbYDk4G2d6SZoeJqish13qSyXA4lp3/view?usp=share_link):
+```shell
+python demo/multi_modality_demo.py demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__LIDAR_TOP__1532402927647951.pcd.bin demo/data/nuscenes/ demo/data/nuscenes/n015-2018-07-24-11-22-45+0800.pkl projects/BEVFusion/configs/bevfusion_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py ${CHECKPOINT_FILE} --cam-type all --score-thr 0.2 --show
+```
+### 3D Segmentation
+To test a 3D segmentor on point cloud data, simply run:
+```shell
+python demo/pcd_seg_demo.py ${PCD_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--out-dir ${OUT_DIR}] [--show]
+```
+The visualization results including a point cloud and its predicted 3D segmentation mask will be saved in `${OUT_DIR}/PCD_NAME`.
+Example on ScanNet data using [PointNet++ (SSG) model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointnet2/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class_20210514_143644-ee73704a.pth):
+```shell
+python demo/pcd_seg_demo.py demo/data/scannet/scene0000_00.bin configs/pointnet2/pointnet2_ssg_2xb16-cosine-200e_scannet-seg.py ${CHECKPOINT_FILE} --show
+```
--- a/mmdetection3d/docs/en/user_guides/model_deployment.md
+++ b/mmdetection3d/docs/en/user_guides/model_deployment.md
+# Model Deployment
+MMDet3D 1.1 fully relies on [MMDeploy](https://mmdeploy.readthedocs.io/) to deploy models.
+Please stay tuned and this document will be update soon.
--- a/mmdetection3d/docs/en/user_guides/new_data_model.md
+++ b/mmdetection3d/docs/en/user_guides/new_data_model.md
+# Train with Customized Datasets
+In this note, you will know how to train and test predefined models with customized datasets. We use the Waymo dataset as an example to describe the whole process.
+The basic steps are as below:
+1. Prepare the customized dataset
+2. Prepare a config
+3. Train, test, inference models on the customized dataset.
+## Prepare the customized dataset
+There are three ways to support a new dataset in MMDetection3D:
+1. reorganize the dataset into existing format.
+2. reorganize the dataset into a standard format.
+3. implement a new dataset.
+Usually we recommend to use the first two methods which are usually easier than the third.
+In this note, we give an example for converting the data into KITTI format, you can refer to this to reorganize your dataset into kitti format. About the standard format dataset, and you can refer to [customize_dataset.md](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/en/advanced_guides/customize_dataset.md).
+**Note**: We take Waymo as the example here considering its format is totally different from other existing formats. For other datasets using similar methods to organize data, like Lyft compared to nuScenes, it would be easier to directly implement the new data converter (for the second approach above) instead of converting it to another format (for the first approach above).
+### KITTI dataset format
+Firstly, the raw data for 3D object detection from KITTI are typically organized as follows, where `ImageSets` contains split files indicating which files belong to training/validation/testing set, `calib` contains calibration information files, `image_2` and `velodyne` include image data and point cloud data, and `label_2` includes label files for 3D detection.
+```
+mmdetection3d
+├── mmdet3d
+├── tools
+├── configs
+├── data
+│   ├── kitti
+│   │   ├── ImageSets
+│   │   ├── testing
+│   │   │   ├── calib
+│   │   │   ├── image_2
+│   │   │   ├── velodyne
+│   │   ├── training
+│   │   │   ├── calib
+│   │   │   ├── image_2
+│   │   │   ├── label_2
+│   │   │   ├── velodyne
+```
+Specific annotation format is described in the official object development [kit](https://s3.eu-central-1.amazonaws.com/avg-kitti/devkit_object.zip). For example, it consists of the following labels:
+```
+#Values    Name      Description
+----------------------------------------------------------------------------
+   1    type         Describes the type of object: 'Car', 'Van', 'Truck',
+                     'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
+                     'Misc' or 'DontCare'
+   1    truncated    Float from 0 (non-truncated) to 1 (truncated), where
+                     truncated refers to the object leaving image boundaries
+   1    occluded     Integer (0,1,2,3) indicating occlusion state:
+                     0 = fully visible, 1 = partly occluded
+                     2 = largely occluded, 3 = unknown
+   1    alpha        Observation angle of object, ranging [-pi..pi]
+   4    bbox         2D bounding box of object in the image (0-based index):
+                     contains left, top, right, bottom pixel coordinates
+   3    dimensions   3D object dimensions: height, width, length (in meters)
+   3    location     3D object location x,y,z in camera coordinates (in meters)
+   1    rotation_y   Rotation ry around Y-axis in camera coordinates [-pi..pi]
+   1    score        Only for results: Float, indicating confidence in
+                     detection, needed for p/r curves, higher is better.
+```
+Assume we use the Waymo dataset.
+After downloading the data, we need to implement a function to convert both the input data and annotation format into the KITTI style. Then we can implement `WaymoDataset` inherited from `KittiDataset` to load the data and perform training, and implement `WaymoMetric` inherited from `KittiMetric` for evaluation.
+Specifically, we implement a waymo [converter](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/tools/dataset_converters/waymo_converter.py) to convert Waymo data into KITTI format and a waymo dataset [class](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/datasets/waymo_dataset.py) to process it, in addition need to add a waymo [metric](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/metrics/waymo_metric.py) to evaluate results. Because we preprocess the raw data and reorganize it like KITTI, the dataset class could be implemented more easily by inheriting from KittiDataset. Regarding the dataset evaluation metric, because Waymo has its own evaluation approach, we need further implement a new Waymo metric; more about the metric could refer to [metric_and_evaluator.md](https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/metric_and_evaluator.md). Afterward, users can successfully convert the data format and use `WaymoDataset` to train and evaluate the model by `WaymoMetric`.
+For more details about the intermediate results of preprocessing of Waymo dataset, please refer to its [waymo_det.md](https://mmdetection3d.readthedocs.io/en/latest/datasets/waymo_det.html).
+## Prepare a config
+The second step is to prepare configs such that the dataset could be successfully loaded. In addition, adjusting hyperparameters is usually necessary to obtain decent performance in 3D detection.
+Suppose we would like to train PointPillars on Waymo to achieve 3D detection for 3 classes, vehicle, cyclist and pedestrian, we need to prepare dataset config like [this](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/datasets/waymoD5-3d-3class.py), model config like [this](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/models/pointpillars_hv_secfpn_waymo.py) and combine them like [this](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py), compared to KITTI [dataset config](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/datasets/kitti-3d-3class.py), [model config](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/models/pointpillars_hv_secfpn_kitti.py) and [overall](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py).
+## Train a new model
+To train a model with the new config, you can simply run
+```shell
+python tools/train.py configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py
+```
+For more detailed usages, please refer to the [Case 1](https://mmdetection3d.readthedocs.io/en/latest/1_exist_data_model.html).
+## Test and inference
+To test the trained model, you can simply run
+```shell
+python tools/test.py configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py work_dirs/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class/latest.pth
+```
+**Note**: To use Waymo evaluation protocol, you need to follow the [tutorial](https://mmdetection3d.readthedocs.io/en/latest/datasets/waymo_det.html) and prepare files related to metrics computation as official instructions.
+For more detailed usages for test and inference, please refer to the [Case 1](https://mmdetection3d.readthedocs.io/en/latest/1_exist_data_model.html).
--- a/mmdetection3d/docs/en/user_guides/train_test.md
+++ b/mmdetection3d/docs/en/user_guides/train_test.md
+# Test and Train on Standard Datasets
+### Test existing models on standard datasets
+- single GPU
+- CPU
+- single node multiple GPU
+- multiple node
+You can use the following commands to test a dataset.
+```shell
+# single-gpu testing
+python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}]  [--show] [--show-dir ${SHOW_DIR}]
+# CPU: disable GPUs and run single-gpu testing script (experimental)
+export CUDA_VISIBLE_DEVICES=-1
+python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}]  [--show] [--show-dir ${SHOW_DIR}]
+# multi-gpu testing
+./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}]  [--show] [--show-dir ${SHOW_DIR}]
+```
+**Note**:
+For now, CPU testing is only supported for SMOKE.
+Optional arguments:
+- `--show`: If specified, detection results will be plotted in the silient mode. It is only applicable to single GPU testing and used for debugging and visualization. This should be used with `--show-dir`.
+- `--show-dir`: If specified, detection results will be plotted on the `***_points.obj` and `***_pred.obj` files in the specified directory. It is only applicable to single GPU testing and used for debugging and visualization. You do NOT need a GUI available in your environment for using this option.
+All evaluation related arguments are set in the `test_evaluator` in corresponding dataset configuration. such as
+`test_evaluator = dict(type='KittiMetric', ann_file=data_root + 'kitti_infos_val.pkl', pklfile_prefix=None, submission_prefix=None)`
+The arguments:
+- `type`: The name of the corresponding metric, usually associated with the dataset.
+- `ann_file`: The path of annotation file.
+- `pklfile_prefix`: An optional argument. The filename of the output results in pickle format. If not specified, the results will not be saved to a file.
+- `submission_prefix`: An optional argument. The results will be saved to a file then you can upload it to do the official evaluation.
+Examples:
+Assume that you have already downloaded the checkpoints to the directory `checkpoints/`.
+1. Test VoteNet on ScanNet and save the points and prediction visualization results.
+   ```shell
+   python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
+       checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \
+       --show --show-dir ./data/scannet/show_results
+   ```
+2. Test VoteNet on ScanNet, save the points, prediction, groundtruth visualization results, and evaluate the mAP.
+   ```shell
+   python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
+       checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \
+       --show --show-dir ./data/scannet/show_results
+   ```
+3. Test VoteNet on ScanNet (without saving the test results) and evaluate the mAP.
+   ```shell
+   python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
+       checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth
+   ```
+4. Test SECOND on KITTI with 8 GPUs, and evaluate the mAP.
+   ```shell
+   ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class.py \
+       checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth
+   ```
+5. Test PointPillars on nuScenes with 8 GPUs, and generate the json file to be submit to the official evaluation server.
+   ```shell
+   ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_8xb4-2x_nus-3d.py \
+       checkpoints/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d_20200620_230405-2fa62f3d.pth \
+      --cfg-options 'test_evaluator.jsonfile_prefix=./pointpillars_nuscenes_results'
+   ```
+   The generated results be under `./pointpillars_nuscenes_results` directory.
+6. Test SECOND on KITTI with 8 GPUs, and generate the pkl files and submission data to be submit to the official evaluation server.
+   ```shell
+   ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class.py \
+       checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth \
+       --cfg-options 'test_evaluator.pklfile_prefix=./second_kitti_results' 'submission_prefix=./second_kitti_results'
+   ```
+   The generated results be under `./second_kitti_results` directory.
+7. Test PointPillars on Lyft with 8 GPUs, generate the pkl files and make a submission to the leaderboard.
+   ```shell
+   ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d.py \
+       checkpoints/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d_latest.pth \
+       --cfg-options 'test_evaluator.jsonfile_prefix=results/pp_lyft/results_challenge' \
+       'test_evaluator.csv_savepath=results/pp_lyft/results_challenge.csv' \
+       'test_evaluator.pklfile_prefix=results/pp_lyft/results_challenge.pkl'
+   ```
+   **Notice**: To generate submissions on Lyft, `csv_savepath` must be given in the `--cfg-options`. After generating the csv file, you can make a submission with kaggle commands given on the [website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/submit).
+   Note that in the [config of Lyft dataset](../../configs/_base_/datasets/lyft-3d.py), the value of `ann_file` keyword in `test` is `'lyft_infos_test.pkl'`, which is the official test set of Lyft without annotation. To test on the validation set, please change this to `'lyft_infos_val.pkl'`.
+8. Test PointPillars on waymo with 8 GPUs, and evaluate the mAP with waymo metrics.
+   ```shell
+   ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymo-3d-car.py  \
+       checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth \
+       --cfg-options 'test_evaluator.pklfile_prefix=results/waymo-car/kitti_results' \
+       'test_evaluator.submission_prefix=results/waymo-car/kitti_results'
+   ```
+   **Notice**: For evaluation on waymo, please follow the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`.(Sometimes when using bazel to build `compute_detection_metrics_main`, an error `'round' is not a member of 'std'` may appear. We just need to remove the `std::` before `round` in that file.) `pklfile_prefix` should be given in the `--eval-options` for the bin file generation. For metrics, `waymo` is the recommended official evaluation prototype. Currently, evaluating with choice `kitti` is adapted from KITTI and the results for each difficulty are not exactly the same as the definition of KITTI. Instead, most of objects are marked with difficulty 0 currently, which will be fixed in the future. The reasons of its instability include the large computation for evaluation, the lack of occlusion and truncation in the converted data, different definition of difficulty and different methods of computing average precision.
+9. Test PointPillars on waymo with 8 GPUs, generate the bin files and make a submission to the leaderboard.
+   ```shell
+   ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymo-3d-car.py  \
+       checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth \
+       --cfg-options 'test_evaluator.pklfile_prefix=results/waymo-car/kitti_results' \
+       'test_evaluator.submission_prefix=results/waymo-car/kitti_results'
+   ```
+   **Notice**: After generating the bin file, you can simply build the binary file `create_submission` and use them to create a submission file by following the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/). For evaluation on the validation set with the eval server, you can also use the same way to generate a submission.
+## Train predefined models on standard datasets
+MMDetection3D implements distributed training and non-distributed training,
+which uses `MMDistributedDataParallel` and `MMDataParallel` respectively.
+All outputs (log files and checkpoints) will be saved to the working directory,
+which is specified by `work_dir` in the config file.
+By default we evaluate the model on the validation set after each epoch, you can change the evaluation interval by adding the interval argument in the training config.
+```python
+train_cfg = dict(type='EpochBasedTrainLoop', val_interval=1)  # This evaluate the model per 12 epoch.
+```
+**Important**: The default learning rate in config files is for 8 GPUs and the exact batch size is marked by the config's file name, e.g. '2xb8' means 2 samples per GPU using 8 GPUs.
+According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. However, since most of the models in this repo use ADAM rather than SGD for optimization, the rule may not hold and users need to tune the learning rate by themselves.
+### Train with a single GPU
+```shell
+python tools/train.py ${CONFIG_FILE} [optional arguments]
+```
+If you want to specify the working directory in the command, you can add an argument `--work-dir ${YOUR_WORK_DIR}`.
+### Training with CPU (experimental)
+The process of training on the CPU is consistent with single GPU training. We just need to disable GPUs before the training process.
+```shell
+export CUDA_VISIBLE_DEVICES=-1
+```
+And then run the script of train with a single GPU.
+**Note**:
+For now, most of the point cloud related algorithms rely on 3D CUDA op, which can not be trained on CPU. Some monocular 3D object detection algorithms, like FCOS3D and SMOKE can be trained on CPU. We do not recommend users to use CPU for training because it is too slow. We support this feature to allow users to debug certain models on machines without GPU for convenience.
+### Train with multiple GPUs
+```shell
+./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
+```
+Optional arguments are:
+- `--cfg-options 'Key=value'`: Override some settings in the used config.
+### Train with multiple machines
+If you run MMDetection3D on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single machine training.)
+```shell
+[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
+```
+Here is an example of using 16 GPUs to train Mask R-CNN on the dev partition.
+```shell
+GPUS=16 ./tools/slurm_train.sh dev pp_kitti_3class configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py /nfs/xxxx/pp_kitti_3class
+```
+You can check [slurm_train.sh](https://github.com/open-mmlab/mmdetection/blob/master/tools/slurm_train.sh) for full arguments and environment variables.
+If you launch with multiple machines simply connected with ethernet, you can simply run following commands:
+On the first machine:
+```shell
+NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR ./tools/dist_train.sh $CONFIG $GPUS
+```
+On the second machine:
+```shell
+NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR ./tools/dist_train.sh $CONFIG $GPUS
+```
+Usually it is slow if you do not have high speed networking like InfiniBand.
+### Launch multiple jobs on a single machine
+If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
+you need to specify different ports (29500 by default) for each job to avoid communication conflict.
+If you use `dist_train.sh` to launch training jobs, you can set the port in commands.
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
+CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
+```
+If you use launch training jobs with Slurm, there are two ways to specify the ports.
+1. Set the port through `--cfg-options`. This is more recommended since it does not change the original configs.
+   ```shell
+   CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --cfg-options 'env_cfg.dist_cfg.port=29500'
+   CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --cfg-options 'env_cfg.dist_cfg.port=29501'
+   ```
+2. Modify the config files (usually the 6th line from the bottom in config files) to set different communication ports.
+   In `config1.py`,
+   ```python
+   env_cfg = dict(
+       dist_cfg=dict(backend='nccl', port=29500)
+   )
+   ```
+   In `config2.py`,
+   ```python
+   env_cfg = dict(
+       dist_cfg=dict(backend='nccl', port=29501)
+   )
+   ```
+   Then you can launch two jobs with `config1.py` and `config2.py`.
+   ```shell
+   CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
+   CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
+   ```
--- a/mmdetection3d/docs/en/user_guides/useful_tools.md
+++ b/mmdetection3d/docs/en/user_guides/useful_tools.md
+We provide lots of useful tools under `tools/` directory.
+## Log Analysis
+You can plot loss/mAP curves given a training log file. Run `pip install seaborn` first to install the dependency.
+![loss curve image](../../../resources/loss_curve.png)
+```shell
+python tools/analysis_tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}] [--mode ${MODE}] [--interval ${INTERVAL}]
+```
+**Notice**: If the metric you want to plot is calculated in the eval stage, you need to add the flag `--mode eval`. If you perform evaluation with an interval of `${INTERVAL}`, you need to add the args `--interval ${INTERVAL}`.
+Examples:
+- Plot the classification loss of some run.
+  ```shell
+  python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
+  ```
+- Plot the classification and regression loss of some run, and save the figure to a pdf.
+  ```shell
+  python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf
+  ```
+- Compare the bbox mAP of two runs in the same figure.
+  ```shell
+  # evaluate PartA2 and second on KITTI according to Car_3D_moderate_strict
+  python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/PartA2.log.json tools/logs/second.log.json --keys KITTI/Car_3D_moderate_strict --legend PartA2 second --mode eval --interval 1
+  # evaluate PointPillars for car and 3 classes on KITTI according to Car_3D_moderate_strict
+  python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/pp-3class.log.json tools/logs/pp.log.json --keys KITTI/Car_3D_moderate_strict --legend pp-3class pp --mode eval --interval 2
+  ```
+You can also compute the average training speed.
+```shell
+python tools/analysis_tools/analyze_logs.py cal_train_time log.json [--include-outliers]
+```
+The output is expected to be like the following.
+```
+-----Analyze train time of work_dirs/some_exp/20190611_192040.log.json-----
+slowest epoch 11, average time is 1.2024
+fastest epoch 1, average time is 1.1909
+time std over epochs is 0.0028
+average iter time: 1.1959 s/iter
+```
+&#8195;
+## Model Serving
+**Note**: This tool is still experimental now, only SECOND is supported to be served with [`TorchServe`](https://pytorch.org/serve/). We'll support more models in the future.
+In order to serve an `MMDetection3D` model with [`TorchServe`](https://pytorch.org/serve/), you can follow the steps:
+### 1. Convert the model from MMDetection3D to TorchServe
+```shell
+python tools/deployment/mmdet3d2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
+--output-folder ${MODEL_STORE} \
+--model-name ${MODEL_NAME}
+```
+**Note**: ${MODEL_STORE} needs to be an absolute path to a folder.
+### 2. Build `mmdet3d-serve` docker image
+```shell
+docker build -t mmdet3d-serve:latest docker/serve/
+```
+### 3. Run `mmdet3d-serve`
+Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).
+In order to run it on the GPU, you need to install [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). You can omit the `--gpus` argument in order to run on the CPU.
+Example:
+```shell
+docker run --rm \
+--cpus 8 \
+--gpus device=0 \
+-p8080:8080 -p8081:8081 -p8082:8082 \
+--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \
+mmdet3d-serve:latest
+```
+[Read the docs](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md/) about the Inference (8080), Management (8081) and Metrics (8082) APis
+### 4. Test deployment
+You can use `test_torchserver.py` to compare result of torchserver and pytorch.
+```shell
+python tools/deployment/test_torchserver.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${MODEL_NAME}
+[--inference-addr ${INFERENCE_ADDR}] [--device ${DEVICE}] [--score-thr ${SCORE_THR}]
+```
+Example:
+```shell
+python tools/deployment/test_torchserver.py demo/data/kitti/kitti_000008.bin configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth second
+```
+&#8195;
+## Model Complexity
+You can use `tools/analysis_tools/get_flops.py` in MMDetection3D, a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch), to compute the FLOPs and params of a given model.
+```shell
+python tools/analysis_tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
+```
+You will get the results like this.
+```text
+==============================
+Input shape: (40000, 4)
+Flops: 5.78 GFLOPs
+Params: 953.83 k
+==============================
+```
+**Note**: This tool is still experimental and we do not guarantee that the
+number is absolutely correct. You may well use the result for simple
+comparisons, but double check it before you adopt it in technical reports or papers.
+1. FLOPs are related to the input shape while parameters are not. The default
+   input shape is (1, 40000, 4).
+2. Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details.
+3. We currently only support FLOPs calculation of single-stage models with single-modality input (point cloud or image). We will support two-stage and multi-modality models in the future.
+&#8195;
+## Model Conversion
+### RegNet model to MMDetection
+`tools/model_converters/regnet2mmdet.py` convert keys in pycls pretrained RegNet models to
+MMDetection style.
+```shell
+python tools/model_converters/regnet2mmdet.py ${SRC} ${DST} [-h]
+```
+### Detectron ResNet to Pytorch
+`tools/detectron2pytorch.py` in MMDetection could convert keys in the original detectron pretrained
+ResNet models to PyTorch style.
+```shell
+python tools/detectron2pytorch.py ${SRC} ${DST} ${DEPTH} [-h]
+```
+### Prepare a model for publishing
+`tools/model_converters/publish_model.py` helps users to prepare their model for publishing.
+Before you upload a model to AWS, you may want to
+1. convert model weights to CPU tensors
+2. delete the optimizer states and
+3. compute the hash of the checkpoint file and append the hash id to the
+   filename.
+```shell
+python tools/model_converters/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
+```
+E.g.,
+```shell
+python tools/model_converters/publish_model.py work_dirs/faster_rcnn/latest.pth faster_rcnn_r50_fpn_1x_20190801.pth
+```
+The final output filename will be `faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth`.
+&#8195;
+## Dataset Conversion
+`tools/dataset_converters/` contains tools for converting datasets to other formats. Most of them convert datasets to pickle based info files, like kitti, nuscenes and lyft. Waymo converter is used to reorganize waymo raw data like KITTI style. Users could refer to them for our approach to converting data format. It is also convenient to modify them to use as scripts like nuImages converter.
+To convert the nuImages dataset into COCO format, please use the command below:
+```shell
+python -u tools/dataset_converters/nuimage_converter.py --data-root ${DATA_ROOT} --version ${VERSIONS} \
+                                                    --out-dir ${OUT_DIR} --nproc ${NUM_WORKERS} --extra-tag ${TAG}
+```
+- `--data-root`: the root of the dataset, defaults to `./data/nuimages`.
+- `--version`: the version of the dataset, defaults to `v1.0-mini`. To get the full dataset, please use `--version v1.0-train v1.0-val v1.0-mini`
+- `--out-dir`: the output directory of annotations and semantic masks, defaults to `./data/nuimages/annotations/`.
+- `--nproc`: number of workers for data preparation, defaults to `4`. Larger number could reduce the preparation time as images are processed in parallel.
+- `--extra-tag`: extra tag of the annotations, defaults to `nuimages`. This can be used to separate different annotations processed in different time for study.
+More details could be referred to the [doc](https://mmdetection3d.readthedocs.io/en/latest/data_preparation.html) for dataset preparation and [README](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/nuimages/README.md/) for nuImages dataset.
+&#8195;
+## Miscellaneous
+### Print the entire config
+`tools/misc/print_config.py` prints the whole config verbatim, expanding all its
+imports.
+```shell
+python tools/misc/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}]
+```
--- a/mmdetection3d/docs/en/user_guides/visualization.md
+++ b/mmdetection3d/docs/en/user_guides/visualization.md
+# Visualization
+MMDetection3D provides a `Det3DLocalVisualizer` to visualize and store the state of the model during training and testing, as well as results, with the following features.
+1. Support the basic drawing interface for multi-modality data and multi-task.
+2. Support multiple backends such as local, TensorBoard, to write training status such as `loss`, `lr`, or performance evaluation metrics and to a specified single or multiple backends.
+3. Support ground truth visualization on multimodal data, and cross-modal visualization of 3D detection results.
+## Basic Drawing Interface
+Inherited from `DetLocalVisualizer`, `Det3DLocalVisualizer` provides an interface for drawing common objects on 2D images, such as drawing detection boxes, points, text, lines, circles, polygons, and binary masks. More details about 2D drawing can refer to the [visualization documentation](https://mmengine.readthedocs.io/zh_CN/latest/advanced_tutorials/visualization.html) in MMDetection. Here we introduce the 3D drawing interface:
+### Drawing point cloud on the image
+We support drawing point cloud on the image by using `draw_points_on_image`.
+```python
+import mmcv
+import numpy as np
+from mmengine import load
+from mmdet3d.visualization import Det3DLocalVisualizer
+info_file = load('demo/data/kitti/000008.pkl')
+points = np.fromfile('demo/data/kitti/000008.bin', dtype=np.float32)
+points = points.reshape(-1, 4)[:, :3]
+lidar2img = np.array(info_file['data_list'][0]['images']['CAM2']['lidar2img'], dtype=np.float32)
+visualizer = Det3DLocalVisualizer()
+img = mmcv.imread('demo/data/kitti/000008.png')
+img = mmcv.imconvert(img, 'bgr', 'rgb')
+visualizer.set_image(img)
+visualizer.draw_points_on_image(points, lidar2img)
+visualizer.show()
+```
+![points_on_image](../../../resources/points_on_image.png)
+### Drawing 3D Boxes on Point Cloud
+We support drawing 3D boxes on point cloud by using `draw_bboxes_3d`.
+```python
+import torch
+import numpy as np
+from mmdet3d.visualization import Det3DLocalVisualizer
+from mmdet3d.structures import LiDARInstance3DBoxes
+points = np.fromfile('demo/data/kitti/000008.bin', dtype=np.float32)
+points = points.reshape(-1, 4)
+visualizer = Det3DLocalVisualizer()
+# set point cloud in visualizer
+visualizer.set_points(points)
+bboxes_3d = LiDARInstance3DBoxes(
+    torch.tensor([[8.7314, -1.8559, -1.5997, 4.2000, 3.4800, 1.8900,
+                   -1.5808]]))
+# Draw 3D bboxes
+visualizer.draw_bboxes_3d(bboxes_3d)
+visualizer.show()
+```
+![mono3d](../../../resources/pcd.png)
+### Drawing Projected 3D Boxes on Image
+We support drawing projected 3D boxes on image by using `draw_proj_bboxes_3d`.
+```python
+import mmcv
+import numpy as np
+from mmengine import load
+from mmdet3d.visualization import Det3DLocalVisualizer
+from mmdet3d.structures import CameraInstance3DBoxes
+info_file = load('demo/data/kitti/000008.pkl')
+cam2img = np.array(info_file['data_list'][0]['images']['CAM2']['cam2img'], dtype=np.float32)
+bboxes_3d = []
+for instance in info_file['data_list'][0]['instances']:
+    bboxes_3d.append(instance['bbox_3d'])
+gt_bboxes_3d = np.array(bboxes_3d, dtype=np.float32)
+gt_bboxes_3d = CameraInstance3DBoxes(gt_bboxes_3d)
+input_meta = {'cam2img': cam2img}
+visualizer = Det3DLocalVisualizer()
+img = mmcv.imread('demo/data/kitti/000008.png')
+img = mmcv.imconvert(img, 'bgr', 'rgb')
+visualizer.set_image(img)
+# project 3D bboxes to image
+visualizer.draw_proj_bboxes_3d(gt_bboxes_3d, input_meta)
+visualizer.show()
+```
+### Drawing BEV Boxes
+We support drawing BEV boxes by using `draw_bev_bboxes`.
+```python
+import numpy as np
+from mmengine import load
+from mmdet3d.visualization import Det3DLocalVisualizer
+from mmdet3d.structures import CameraInstance3DBoxes
+info_file = load('demo/data/kitti/000008.pkl')
+bboxes_3d = []
+for instance in info_file['data_list'][0]['instances']:
+    bboxes_3d.append(instance['bbox_3d'])
+gt_bboxes_3d = np.array(bboxes_3d, dtype=np.float32)
+gt_bboxes_3d = CameraInstance3DBoxes(gt_bboxes_3d)
+visualizer = Det3DLocalVisualizer()
+# set bev image in visualizer
+visualizer.set_bev_image()
+# draw bev bboxes
+visualizer.draw_bev_bboxes(gt_bboxes_3d, edge_colors='orange')
+visualizer.show()
+```
+### Drawing 3D Semantic Mask
+We support draw segmentation mask via per-point colorization by using `draw_seg_mask`.
+```python
+import numpy as np
+from mmdet3d.visualization import Det3DLocalVisualizer
+points = np.fromfile('demo/data/sunrgbd/000017.bin', dtype=np.float32)
+points = points.reshape(-1, 3)
+visualizer = Det3DLocalVisualizer()
+mask = np.random.rand(points.shape[0], 3)
+points_with_mask = np.concatenate((points, mask), axis=-1)
+# Draw 3D points with mask
+visualizer.set_points(points, pcd_mode=2, vis_mode='add')
+visualizer.draw_seg_mask(points_with_mask)
+visualizer.show()
+```
+## Results
+To see the prediction results of trained models, you can run the following command:
+```bash
+python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --show --show-dir ${SHOW_DIR}
+```
+After running this command, plotted results including input data and the output of networks visualized on the input will be saved in `${SHOW_DIR}`.
+After running this command, you will obtain the input data, the output of networks and ground-truth labels visualized on the input (e.g. `***_gt.png` and `***_pred.png` in multi-modality detection task and vision-based detection task) in `${SHOW_DIR}`. When `show` is enabled, [Open3D](http://www.open3d.org/) will be used to visualize the results online. If you are running test in remote server without GUI, the online visualization is not supported. You can download the `results.pkl` from the remote server, and visualize the prediction results offline in your local machine.
+To visualize the results with `Open3D` backend offline, you can run the following command:
+```bash
+python tools/misc/visualize_results.py ${CONFIG_FILE} --result ${RESULTS_PATH} --show-dir ${SHOW_DIR}
+```
+![](../../../resources/open3d_visual.gif)
+This allows the inference and results generation to be done in remote server and the users can open them on their host with GUI.
+## Dataset
+We also provide scripts to visualize the dataset without inference. You can use `tools/misc/browse_dataset.py` to show loaded data and ground-truth online and save them on the disk. Currently we support single-modality 3D detection and 3D segmentation on all the datasets, multi-modality 3D detection on KITTI and SUN RGB-D, as well as monocular 3D detection on nuScenes. To browse the KITTI dataset, you can run the following command:
+```shell
+python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task lidar_det --output-dir ${OUTPUT_DIR}
+```
+**Notice**: Once specifying `--output-dir`, the images of views specified by users will be saved when pressing `_ESC_` in open3d window. If you want to zoom out/in the point clouds to inspect more details, you could specify `--show-interval=0` in the command.
+To verify the data consistency and the effect of data augmentation, you can also add `--aug` flag to visualize the data after data augmentation using the command as below:
+```shell
+python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task lidar_det --aug --output-dir ${OUTPUT_DIR}
+```
+If you also want to show 2D images with 3D bounding boxes projected onto them, you need to find a config that supports multi-modality data loading, and then change the `--task` args to `multi-modality_det`. An example is showed below:
+```shell
+python tools/misc/browse_dataset.py configs/mvxnet/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class.py --task multi-modality_det --output-dir ${OUTPUT_DIR}
+```
+![](../../../resources/browse_dataset_multi_modality.png)
+You can simply browse different datasets using different configs, e.g. visualizing the ScanNet dataset in 3D semantic segmentation task:
+```shell
+python tools/misc/browse_dataset.py configs/_base_/datasets/scannet-seg.py --task lidar_seg --output-dir ${OUTPUT_DIR}
+```
+![](../../../resources/browse_dataset_seg.png)
+And browsing the nuScenes dataset in monocular 3D detection task:
+```shell
+python tools/misc/browse_dataset.py configs/_base_/datasets/nus-mono3d.py --task mono_det --output-dir ${OUTPUT_DIR}
+```
+![](../../../resources/browse_dataset_mono.png)
--- a/mmdetection3d/docs/zh_cn/Makefile
+++ b/mmdetection3d/docs/zh_cn/Makefile
+# Minimal makefile for Sphinx documentation
+#
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = .
+BUILDDIR      = _build
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+.PHONY: help Makefile
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
--- a/mmdetection3d/docs/zh_cn/_static/css/readthedocs.css
+++ b/mmdetection3d/docs/zh_cn/_static/css/readthedocs.css
+.header-logo {
+    background-image: url("../image/mmdet3d-logo.png");
+    background-size: 182.5px 40px;
+    height: 40px;
+    width: 182.5px;
+}
--- a/mmdetection3d/docs/zh_cn/advanced_guides/customize_dataset.md
+++ b/mmdetection3d/docs/zh_cn/advanced_guides/customize_dataset.md
+# 自定义数据集
+在本节中，您将了解如何使用自定义数据集训练和测试预定义模型。
+基本步骤如下：
+1. 准备数据
+2. 准备配置文件
+3. 在自定义数据集上训练，测试和推理模型
+## 数据准备
+理想情况下我们可以重新组织自定义的原始数据并将标注格式转换成 KITTI 风格。但是，考虑到对于自定义数据集而言，KITTI 格式的校准文件和 3D 标注难以获得，因此我们在文档中介绍基本的数据格式。
+### 基本数据格式
+#### 点云格式
+目前，我们只支持 `.bin` 格式的点云用于训练和推理。在训练自己的数据集之前，需要将其它格式的点云文件转换成 `.bin` 文件。常见的点云数据格式包括 `.pcd` 和 `.las`，我们列举了一些开源工具作为参考。
+1. `.pcd` 转换成 `.bin`：https://github.com/DanielPollithy/pypcd
+- 您可以通过以下指令安装 `pypcd`：
+  ```bash
+  pip install git+https://github.com/DanielPollithy/pypcd.git
+  ```
+- 您可以使用以下脚本读取 `.pcd` 文件，并将其转换成 `.bin` 格式来保存：
+  ```python
+  import numpy as np
+  from pypcd import pypcd
+  pcd_data = pypcd.PointCloud.from_path('point_cloud_data.pcd')
+  points = np.zeros([pcd_data.width, 4], dtype=np.float32)
+  points[:, 0] = pcd_data.pc_data['x'].copy()
+  points[:, 1] = pcd_data.pc_data['y'].copy()
+  points[:, 2] = pcd_data.pc_data['z'].copy()
+  points[:, 3] = pcd_data.pc_data['intensity'].copy().astype(np.float32)
+  with open('point_cloud_data.bin', 'wb') as f:
+      f.write(points.tobytes())
+  ```
+2. `.las` 转换成 `.bin`：常见的转换流程为 `.las -> .pcd -> .bin`，`.las -> .pcd` 的转换可以用该[工具](https://github.com/Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor)实现。
+#### 标签格式
+最基本的信息：每个场景的 3D 边界框和类别标签应该包含在 `.txt` 标注文件中。每一行代表特定场景的一个 3D 框，如下所示：
+```
+# 格式：[x, y, z, dx, dy, dz, yaw, category_name]
+1.23 1.42 0.23 3.96 1.65 1.55 1.56 Car
+3.51 2.15 0.42 1.05 0.87 1.86 1.23 Pedestrian
+...
+```
+**注意**：对于自定义数据集的评估我们目前只支持 KITTI 评估方法。
+3D 框应存储在统一的 3D 坐标系中。
+#### 校准格式
+对于每个激光雷达收集的点云数据，通常会进行融合并转换到特定的激光雷达坐标系。因此，校准信息文件中通常应该包含每个相机的内参矩阵和激光雷达到每个相机的外参转换矩阵，并保存在 `.txt` 校准文件中，其中 `Px` 表示 `camera_x` 的内参矩阵，`lidar2camx` 表示 `lidar` 到 `camera_x` 的外参转换矩阵。
+```
+P0
+P1
+P2
+P3
+P4
+...
+lidar2cam0
+lidar2cam1
+lidar2cam2
+lidar2cam3
+lidar2cam4
+...
+```
+### 原始数据结构
+#### 基于激光雷达的 3D 检测
+基于激光雷达的 3D 目标检测原始数据通常组织成如下格式，其中 `ImageSets` 包含划分文件，指明哪些文件属于训练/验证集，`points` 包含存储成 `.bin` 格式的点云数据，`labels` 包含 3D 检测的标签文件。
+```
+mmdetection3d
+├── mmdet3d
+├── tools
+├── configs
+├── data
+│   ├── custom
+│   │   ├── ImageSets
+│   │   │   ├── train.txt
+│   │   │   ├── val.txt
+│   │   ├── points
+│   │   │   ├── 000000.bin
+│   │   │   ├── 000001.bin
+│   │   │   ├── ...
+│   │   ├── labels
+│   │   │   ├── 000000.txt
+│   │   │   ├── 000001.txt
+│   │   │   ├── ...
+```
+#### 基于视觉的 3D 检测
+基于视觉的 3D 目标检测原始数据通常组织成如下格式，其中 `ImageSets` 包含划分文件，指明哪些文件属于训练/验证集，`images` 包含来自不同相机的图像，例如 `camera_x` 获得的图像应放在 `images/images_x` 下，`calibs` 包含校准信息文件，其中存储了每个相机的内参矩阵，`labels` 包含 3D 检测的标签文件。
+```
+mmdetection3d
+├── mmdet3d
+├── tools
+├── configs
+├── data
+│   ├── custom
+│   │   ├── ImageSets
+│   │   │   ├── train.txt
+│   │   │   ├── val.txt
+│   │   ├── calibs
+│   │   │   ├── 000000.txt
+│   │   │   ├── 000001.txt
+│   │   │   ├── ...
+│   │   ├── images
+│   │   │   ├── images_0
+│   │   │   │   ├── 000000.png
+│   │   │   │   ├── 000001.png
+│   │   │   │   ├── ...
+│   │   │   ├── images_1
+│   │   │   ├── images_2
+│   │   │   ├── ...
+│   │   ├── labels
+│   │   │   ├── 000000.txt
+│   │   │   ├── 000001.txt
+│   │   │   ├── ...
+```
+#### 多模态 3D 检测
+多模态 3D 目标检测原始数据通常组织成如下格式。不同于基于视觉的 3D 目标检测，`calibs` 里的校准信息文件存储了每个相机的内参矩阵和外参矩阵。
+```
+mmdetection3d
+├── mmdet3d
+├── tools
+├── configs
+├── data
+│   ├── custom
+│   │   ├── ImageSets
+│   │   │   ├── train.txt
+│   │   │   ├── val.txt
+│   │   ├── calibs
+│   │   │   ├── 000000.txt
+│   │   │   ├── 000001.txt
+│   │   │   ├── ...
+│   │   ├── points
+│   │   │   ├── 000000.bin
+│   │   │   ├── 000001.bin
+│   │   │   ├── ...
+│   │   ├── images
+│   │   │   ├── images_0
+│   │   │   │   ├── 000000.png
+│   │   │   │   ├── 000001.png
+│   │   │   │   ├── ...
+│   │   │   ├── images_1
+│   │   │   ├── images_2
+│   │   │   ├── ...
+│   │   ├── labels
+│   │   │   ├── 000000.txt
+│   │   │   ├── 000001.txt
+│   │   │   ├── ...
+```
+#### 基于激光雷达的 3D 语义分割
+基于激光雷达的 3D 语义分割原始数据通常组织成如下格式，其中 `ImageSets` 包含划分文件，指明哪些文件属于训练/验证集，`points` 包含点云数据，`semantic_mask` 包含逐点级标签。
+```
+mmdetection3d
+├── mmdet3d
+├── tools
+├── configs
+├── data
+│   ├── custom
+│   │   ├── ImageSets
+│   │   │   ├── train.txt
+│   │   │   ├── val.txt
+│   │   ├── points
+│   │   │   ├── 000000.bin
+│   │   │   ├── 000001.bin
+│   │   │   ├── ...
+│   │   ├── semantic_mask
+│   │   │   ├── 000000.bin
+│   │   │   ├── 000001.bin
+│   │   │   ├── ...
+```
+### 数据转换
+按照我们的说明准备好原始数据后，您可以直接使用以下命令生成训练/验证信息文件。
+```bash
+python tools/create_data.py custom --root-path ./data/custom --out-dir ./data/custom --extra-tag custom
+```
+## 自定义数据集示例
+在完成数据准备后，我们可以在 `mmdet3d/datasets/my_dataset.py` 中创建一个新的数据集来加载数据。
+```python
+import mmengine
+from mmdet3d.registry import DATASETS
+from .det3d_dataset import Det3DDataset
+@DATASETS.register_module()
+class MyDataset(Det3DDataset):
+    # 替换成自定义 pkl 信息文件里的所有类别
+    METAINFO = {
+        'classes': ('Pedestrian', 'Cyclist', 'Car')
+    }
+    def parse_ann_info(self, info):
+        """Process the `instances` in data info to `ann_info`.
+        Args:
+            info (dict): Data information of single data sample.
+        Returns:
+            dict: Annotation information consists of the following keys:
+                - gt_bboxes_3d (:obj:`LiDARInstance3DBoxes`):
+                  3D ground truth bboxes.
+                - gt_labels_3d (np.ndarray): Labels of ground truths.
+        """
+        ann_info = super().parse_ann_info(info)
+        if ann_info is None:
+            ann_info = dict()
+            # 空实例
+            ann_info['gt_bboxes_3d'] = np.zeros((0, 7), dtype=np.float32)
+            ann_info['gt_labels_3d'] = np.zeros(0, dtype=np.int64)
+        # 过滤掉没有在训练中使用的类别
+        ann_info = self._remove_dontcare(ann_info)
+        gt_bboxes_3d = LiDARInstance3DBoxes(ann_info['gt_bboxes_3d'])
+        ann_info['gt_bboxes_3d'] = gt_bboxes_3d
+        return ann_info
+```
+数据预处理后，用户可以通过以下两个步骤来训练自定义数据集：
+1. 修改配置文件来使用自定义数据集。
+2. 验证自定义数据集标注的正确性。
+这里我们以在自定义数据集上训练 PointPillars 为例：
+### 准备配置
+这里我们演示一个纯点云训练的配置示例：
+#### 准备数据集配置
+在 `configs/_base_/datasets/custom.py` 中：
+```python
+# 数据集设置
+dataset_type = 'MyDataset'
+data_root = 'data/custom/'
+class_names = ['Pedestrian', 'Cyclist', 'Car']  # 替换成您的数据集类别
+point_cloud_range = [0, -40, -3, 70.4, 40, 1]  # 根据您的数据集进行调整
+input_modality = dict(use_lidar=True, use_camera=False)
+metainfo = dict(classes=class_names)
+train_pipeline = [
+    dict(
+        type='LoadPointsFromFile',
+        coord_type='LIDAR',
+        load_dim=4,  # 替换成您的点云数据维度
+        use_dim=4),  # 替换成在训练和推理时实际使用的维度
+    dict(
+        type='LoadAnnotations3D',
+        with_bbox_3d=True,
+        with_label_3d=True),
+    dict(
+        type='ObjectNoise',
+        num_try=100,
+        translation_std=[1.0, 1.0, 0.5],
+        global_rot_range=[0.0, 0.0],
+        rot_range=[-0.78539816, 0.78539816]),
+    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
+    dict(
+        type='GlobalRotScaleTrans',
+        rot_range=[-0.78539816, 0.78539816],
+        scale_ratio_range=[0.95, 1.05]),
+    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='PointShuffle'),
+    dict(
+        type='Pack3DDetInputs',
+        keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
+]
+test_pipeline = [
+    dict(
+        type='LoadPointsFromFile',
+        coord_type='LIDAR',
+        load_dim=4,  # 替换成您的点云数据维度
+        use_dim=4),
+    dict(type='Pack3DDetInputs', keys=['points'])
+]
+# 为可视化阶段的数据和 GT 加载构造流水线
+eval_pipeline = [
+    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
+    dict(type='Pack3DDetInputs', keys=['points']),
+]
+train_dataloader = dict(
+    batch_size=6,
+    num_workers=4,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=True),
+    dataset=dict(
+        type='RepeatDataset',
+        times=2,
+        dataset=dict(
+            type=dataset_type,
+            data_root=data_root,
+            ann_file='custom_infos_train.pkl',  # 指定您的训练 pkl 信息
+            data_prefix=dict(pts='points'),
+            pipeline=train_pipeline,
+            modality=input_modality,
+            test_mode=False,
+            metainfo=metainfo,
+            box_type_3d='LiDAR')))
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=1,
+    persistent_workers=True,
+    drop_last=False,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        data_prefix=dict(pts='points'),
+        ann_file='custom_infos_val.pkl',  # 指定您的验证 pkl 信息
+        pipeline=test_pipeline,
+        modality=input_modality,
+        test_mode=True,
+        metainfo=metainfo,
+        box_type_3d='LiDAR'))
+val_evaluator = dict(
+    type='KittiMetric',
+    ann_file=data_root + 'custom_infos_val.pkl',  # 指定您的验证 pkl 信息
+    metric='bbox')
+```
+#### 准备模型配置
+对于基于体素化的检测器如 SECOND，PointPillars 及 CenterPoint，点云范围（point cloud range）和体素大小（voxel size）应该根据您的数据集做调整。理论上，`voxel_size` 和 `point_cloud_range` 的设置是相关联的。设置较小的 `voxel_size` 将增加体素数以及相应的内存消耗。此外，需要注意以下问题：
+如果将 `point_cloud_range` 和 `voxel_size` 分别设置成 `[0, -40, -3, 70.4, 40, 1]` 和 `[0.05, 0.05, 0.1]`，那么中间特征图的形状应该为 `[(1-(-3))/0.1+1, (40-(-40))/0.05, (70.4-0)/0.05]=[41, 1600, 1408]`。更改 `point_cloud_range` 时，请记得依据 `voxel_size` 更改 `middle_encoder` 里中间特征图的形状。
+关于 `anchor_range` 的设置，一般需要根据数据集做调整。需要注意的是，`z` 值需要根据点云的位置做相应调整，具体请参考此 [issue](https://github.com/open-mmlab/mmdetection3d/issues/986)。
+关于 `anchor_size` 的设置，通常需要计算整个训练集中目标的长、宽、高的平均值作为 `anchor_size`，以获得最好的结果。
+在 `configs/_base_/models/pointpillars_hv_secfpn_custom.py` 中：
+```python
+voxel_size = [0.16, 0.16, 4]  # 根据您的数据集做调整
+point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]  # 根据您的数据集做调整
+model = dict(
+    type='VoxelNet',
+    data_preprocessor=dict(
+        type='Det3DDataPreprocessor',
+        voxel=True,
+        voxel_layer=dict(
+            max_num_points=32,
+            point_cloud_range=point_cloud_range,
+            voxel_size=voxel_size,
+            max_voxels=(16000, 40000))),
+    voxel_encoder=dict(
+        type='PillarFeatureNet',
+        in_channels=4,
+        feat_channels=[64],
+        with_distance=False,
+        voxel_size=voxel_size,
+        point_cloud_range=point_cloud_range),
+    # `output_shape` 需要根据 `point_cloud_range` 和 `voxel_size` 做相应调整
+    middle_encoder=dict(
+        type='PointPillarsScatter', in_channels=64, output_shape=[496, 432]),
+    backbone=dict(
+        type='SECOND',
+        in_channels=64,
+        layer_nums=[3, 5, 5],
+        layer_strides=[2, 2, 2],
+        out_channels=[64, 128, 256]),
+    neck=dict(
+        type='SECONDFPN',
+        in_channels=[64, 128, 256],
+        upsample_strides=[1, 2, 4],
+        out_channels=[128, 128, 128]),
+    bbox_head=dict(
+        type='Anchor3DHead',
+        num_classes=3,
+        in_channels=384,
+        feat_channels=384,
+        use_direction_classifier=True,
+        assign_per_class=True,
+        # 根据您的数据集调整 `ranges` 和 `sizes`
+        anchor_generator=dict(
+            type='AlignedAnchor3DRangeGenerator',
+            ranges=[
+                [0, -39.68, -0.6, 69.12, 39.68, -0.6],
+                [0, -39.68, -0.6, 69.12, 39.68, -0.6],
+                [0, -39.68, -1.78, 69.12, 39.68, -1.78],
+            ],
+            sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
+            rotations=[0, 1.57],
+            reshape_out=False),
+        diff_rad_by_sin=True,
+        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
+        loss_cls=dict(
+            type='mmdet.FocalLoss',
+            use_sigmoid=True,
+            gamma=2.0,
+            alpha=0.25,
+            loss_weight=1.0),
+        loss_bbox=dict(
+            type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
+        loss_dir=dict(
+            type='mmdet.CrossEntropyLoss', use_sigmoid=False,
+            loss_weight=0.2)),
+    # 模型训练和测试设置
+    train_cfg=dict(
+        assigner=[
+            dict(  # for Pedestrian
+                type='Max3DIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.5,
+                neg_iou_thr=0.35,
+                min_pos_iou=0.35,
+                ignore_iof_thr=-1),
+            dict(  # for Cyclist
+                type='Max3DIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.5,
+                neg_iou_thr=0.35,
+                min_pos_iou=0.35,
+                ignore_iof_thr=-1),
+            dict(  # for Car
+                type='Max3DIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.6,
+                neg_iou_thr=0.45,
+                min_pos_iou=0.45,
+                ignore_iof_thr=-1),
+        ],
+        allowed_border=0,
+        pos_weight=-1,
+        debug=False),
+    test_cfg=dict(
+        use_rotate_nms=True,
+        nms_across_levels=False,
+        nms_thr=0.01,
+        score_thr=0.1,
+        min_bbox_size=0,
+        nms_pre=100,
+        max_num=50))
+```
+#### 准备整体配置
+我们将上述的所有配置组合在 `configs/pointpillars/pointpillars_hv_secfpn_8xb6_custom.py` 文件中：
+```python
+_base_ = [
+    '../_base_/models/pointpillars_hv_secfpn_custom.py',
+    '../_base_/datasets/custom.py',
+    '../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
+]
+```
+#### 可视化数据集（可选）
+为了验证准备的数据和配置是否正确，我们建议在训练和验证前使用 `tools/misc/browse_dataset.py` 脚本可视化数据集和标注。更多细节请参考[可视化文档](https://mmdetection3d.readthedocs.io/zh_CN/dev-1.x/user_guides/visualization.html)。
+## 评估
+准备好数据和配置之后，您可以遵循我们的文档直接运行训练/测试脚本。
+**注意**：我们为自定义数据集提供了 KITTI 风格的评估实现方法。在数据集配置中需要包含如下内容：
+```python
+val_evaluator = dict(
+    type='KittiMetric',
+    ann_file=data_root + 'custom_infos_val.pkl',  # 指定您的验证 pkl 信息
+    metric='bbox')
+```
--- a/mmdetection3d/docs/zh_cn/advanced_guides/customize_models.md
+++ b/mmdetection3d/docs/zh_cn/advanced_guides/customize_models.md
+# 自定义模型
+我们通常把模型的各个组成成分分成 6 种类型：
+- 编码器（encoder）：包括 voxel encoder 和 middle encoder 等进入 backbone 前所使用的基于体素的方法，如 `HardVFE` 和 `PointPillarsScatter`。
+- 骨干网络（backbone）：通常采用 FCN 网络来提取特征图，如 `ResNet` 和 `SECOND`。
+- 颈部网络（neck）：位于 backbones 和 heads 之间的组成模块，如 `FPN` 和 `SECONDFPN`。
+- 检测头（head）：用于特定任务的组成模块，如`检测框的预测`和`掩码的预测`。
+- RoI 提取器（RoI extractor）：用于从特征图中提取 RoI 特征的组成模块，如 `H3DRoIHead` 和 `PartAggregationROIHead`。
+- 损失函数（loss）：heads 中用于计算损失函数的组成模块，如 `FocalLoss`、`L1Loss` 和 `GHMLoss`。
+## 开发新的组成模块
+### 添加新的编码器
+接下来我们以 HardVFE 为例展示如何开发新的组成模块。
+#### 1. 定义一个新的体素编码器（如 HardVFE：即 HV-SECOND 中使用的体素特征编码器）
+创建一个新文件 `mmdet3d/models/voxel_encoders/voxel_encoder.py`。
+```python
+import torch.nn as nn
+from mmdet3d.registry import MODELS
+@MODELS.register_module()
+class HardVFE(nn.Module):
+    def __init__(self, arg1, arg2):
+        pass
+    def forward(self, x):  # 需要返回一个元组
+        pass
+```
+#### 2. 导入该模块
+您可以在 `mmdet3d/models/voxel_encoders/__init__.py` 中添加以下代码：
+```python
+from .voxel_encoder import HardVFE
+```
+或者在配置文件中添加以下代码，从而避免修改源码：
+```python
+custom_imports = dict(
+    imports=['mmdet3d.models.voxel_encoders.voxel_encoder'],
+    allow_failed_imports=False)
+```
+#### 3. 在配置文件中使用体素编码器
+```python
+model = dict(
+    ...
+    voxel_encoder=dict(
+        type='HardVFE',
+        arg1=xxx,
+        arg2=yyy),
+    ...
+)
+```
+### 添加新的骨干网络
+接下来我们以 [SECOND](https://www.mdpi.com/1424-8220/18/10/3337)（Sparsely Embedded Convolutional Detection）为例展示如何开发新的组成模块。
+#### 1. 定义一个新的骨干网络（如 SECOND）
+创建一个新文件 `mmdet3d/models/backbones/second.py`。
+```python
+from mmengine.model import BaseModule
+from mmdet3d.registry import MODELS
+@MODELS.register_module()
+class SECOND(BaseModule):
+    def __init__(self, arg1, arg2):
+        pass
+    def forward(self, x):  # 需要返回一个元组
+        pass
+```
+#### 2. 导入该模块
+您可以在 `mmdet3d/models/backbones/__init__.py` 中添加以下代码：
+```python
+from .second import SECOND
+```
+或者在配置文件中添加以下代码，从而避免修改源码：
+```python
+custom_imports = dict(
+    imports=['mmdet3d.models.backbones.second'],
+    allow_failed_imports=False)
+```
+#### 3. 在配置文件中使用骨干网络
+```python
+model = dict(
+    ...
+    backbone=dict(
+        type='SECOND',
+        arg1=xxx,
+        arg2=yyy),
+    ...
+)
+```
+### 添加新的颈部网络
+#### 1. 定义一个新的颈部网络（如 SECONDFPN）
+创建一个新文件 `mmdet3d/models/necks/second_fpn.py`。
+```python
+from mmengine.model import BaseModule
+from mmdet3d.registry import MODELS
+@MODELS.register_module()
+class SECONDFPN(BaseModule):
+    def __init__(self,
+                 in_channels=[128, 128, 256],
+                 out_channels=[256, 256, 256],
+                 upsample_strides=[1, 2, 4],
+                 norm_cfg=dict(type='BN', eps=1e-3, momentum=0.01),
+                 upsample_cfg=dict(type='deconv', bias=False),
+                 conv_cfg=dict(type='Conv2d', bias=False),
+                 use_conv_for_no_stride=False,
+                 init_cfg=None):
+        pass
+    def forward(self, x):
+        # 具体实现忽略
+        pass
+```
+#### 2. 导入该模块
+您可以在 `mmdet3d/models/necks/__init__.py` 中添加以下代码：
+```python
+from .second_fpn import SECONDFPN
+```
+或者在配置文件中添加以下代码，从而避免修改源码：
+```python
+custom_imports = dict(
+    imports=['mmdet3d.models.necks.second_fpn'],
+    allow_failed_imports=False)
+```
+#### 3. 在配置文件中使用颈部网络
+```python
+model = dict(
+    ...
+    neck=dict(
+        type='SECONDFPN',
+        in_channels=[64, 128, 256],
+        upsample_strides=[1, 2, 4],
+        out_channels=[128, 128, 128]),
+    ...
+)
+```
+### 添加新的检测头
+接下来我们以 [PartA2 Head](https://arxiv.org/abs/1907.03670) 为例展示如何开发新的检测头。
+**注意**：此处展示的 `PartA2 RoI Head` 将用于检测器的第二阶段。对于单阶段的检测头，请参考 `mmdet3d/models/dense_heads/` 中的例子。由于其简单高效，它们更常用于自动驾驶场景下的 3D 检测中。
+首先，在 `mmdet3d/models/roi_heads/bbox_heads/parta2_bbox_head.py` 中添加新的 bbox head。`PartA2 RoI Head` 为目标检测实现了一个新的 bbox head。为了实现一个 bbox head，我们通常需要在新模块中实现如下两个函数。有时还需要实现其他相关函数，如 `loss` 和 `get_targets`。
+```python
+from mmengine.model import BaseModule
+from mmdet3d.registry import MODELS
+@MODELS.register_module()
+class PartA2BboxHead(BaseModule):
+    """PartA2 RoI head."""
+    def __init__(self,
+                 num_classes,
+                 seg_in_channels,
+                 part_in_channels,
+                 seg_conv_channels=None,
+                 part_conv_channels=None,
+                 merge_conv_channels=None,
+                 down_conv_channels=None,
+                 shared_fc_channels=None,
+                 cls_channels=None,
+                 reg_channels=None,
+                 dropout_ratio=0.1,
+                 roi_feat_size=14,
+                 with_corner_loss=True,
+                 bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
+                 conv_cfg=dict(type='Conv1d'),
+                 norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.01),
+                 loss_bbox=dict(
+                     type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
+                 loss_cls=dict(
+                     type='CrossEntropyLoss',
+                     use_sigmoid=True,
+                     reduction='none',
+                     loss_weight=1.0),
+                 init_cfg=None):
+        super(PartA2BboxHead, self).__init__(init_cfg=init_cfg)
+    def forward(self, seg_feats, part_feats):
+        pass
+```
+其次，如果有必要的话需要实现一个新的 RoI Head。我们从 `Base3DRoIHead` 中继承得到新的 `PartAggregationROIHead`。我们可以发现 `Base3DRoIHead` 已经实现了如下函数。
+```python
+from mmdet.models.roi_heads import BaseRoIHead
+from mmdet3d.registry import MODELS, TASK_UTILS
+class Base3DRoIHead(BaseRoIHead):
+    """Base class for 3d RoIHeads."""
+    def __init__(self,
+                 bbox_head=None,
+                 bbox_roi_extractor=None,
+                 mask_head=None,
+                 mask_roi_extractor=None,
+                 train_cfg=None,
+                 test_cfg=None,
+                 init_cfg=None):
+        super(Base3DRoIHead, self).__init__(
+            bbox_head=bbox_head,
+            bbox_roi_extractor=bbox_roi_extractor,
+            mask_head=mask_head,
+            mask_roi_extractor=mask_roi_extractor,
+            train_cfg=train_cfg,
+            test_cfg=test_cfg,
+            init_cfg=init_cfg)
+    def init_bbox_head(self, bbox_roi_extractor: dict,
+                       bbox_head: dict) -> None:
+        """Initialize box head and box roi extractor.
+        Args:
+            bbox_roi_extractor (dict or ConfigDict): Config of box
+                roi extractor.
+            bbox_head (dict or ConfigDict): Config of box in box head.
+        """
+        self.bbox_roi_extractor = MODELS.build(bbox_roi_extractor)
+        self.bbox_head = MODELS.build(bbox_head)
+    def init_assigner_sampler(self):
+        """Initialize assigner and sampler."""
+        self.bbox_assigner = None
+        self.bbox_sampler = None
+        if self.train_cfg:
+            if isinstance(self.train_cfg.assigner, dict):
+                self.bbox_assigner = TASK_UTILS.build(self.train_cfg.assigner)
+            elif isinstance(self.train_cfg.assigner, list):
+                self.bbox_assigner = [
+                    TASK_UTILS.build(res) for res in self.train_cfg.assigner
+                ]
+            self.bbox_sampler = TASK_UTILS.build(self.train_cfg.sampler)
+    def init_mask_head(self):
+        """Initialize mask head, skip since ``PartAggregationROIHead`` does not
+        have one."""
+        pass
+```
+接下来主要对 bbox_forward 的逻辑进行修改，同时其继承了来自 `Base3DRoIHead` 的其它逻辑。在 `mmdet3d/models/roi_heads/part_aggregation_roi_head.py` 中，我们实现了新的 RoI Head，如下所示：
+```python
+from typing import Dict, List, Tuple
+from mmdet.models.task_modules import AssignResult, SamplingResult
+from mmengine import ConfigDict
+from torch import Tensor
+from torch.nn import functional as F
+from mmdet3d.registry import MODELS
+from mmdet3d.structures import bbox3d2roi
+from mmdet3d.utils import InstanceList
+from ...structures.det3d_data_sample import SampleList
+from .base_3droi_head import Base3DRoIHead
+@MODELS.register_module()
+class PartAggregationROIHead(Base3DRoIHead):
+    """Part aggregation roi head for PartA2.
+    Args:
+        semantic_head (ConfigDict): Config of semantic head.
+        num_classes (int): The number of classes.
+        seg_roi_extractor (ConfigDict): Config of seg_roi_extractor.
+        bbox_roi_extractor (ConfigDict): Config of part_roi_extractor.
+        bbox_head (ConfigDict): Config of bbox_head.
+        train_cfg (ConfigDict): Training config.
+        test_cfg (ConfigDict): Testing config.
+    """
+    def __init__(self,
+                 semantic_head: dict,
+                 num_classes: int = 3,
+                 seg_roi_extractor: dict = None,
+                 bbox_head: dict = None,
+                 bbox_roi_extractor: dict = None,
+                 train_cfg: dict = None,
+                 test_cfg: dict = None,
+                 init_cfg: dict = None) -> None:
+        super(PartAggregationROIHead, self).__init__(
+            bbox_head=bbox_head,
+            bbox_roi_extractor=bbox_roi_extractor,
+            train_cfg=train_cfg,
+            test_cfg=test_cfg,
+            init_cfg=init_cfg)
+        self.num_classes = num_classes
+        assert semantic_head is not None
+        self.init_seg_head(seg_roi_extractor, semantic_head)
+    def init_seg_head(self, seg_roi_extractor: dict,
+                      semantic_head: dict) -> None:
+        """Initialize semantic head and seg roi extractor.
+        Args:
+            seg_roi_extractor (dict): Config of seg
+                roi extractor.
+            semantic_head (dict): Config of semantic head.
+        """
+        self.semantic_head = MODELS.build(semantic_head)
+        self.seg_roi_extractor = MODELS.build(seg_roi_extractor)
+    @property
+    def with_semantic(self):
+        """bool: whether the head has semantic branch"""
+        return hasattr(self,
+                       'semantic_head') and self.semantic_head is not None
+    def predict(self,
+                feats_dict: Dict,
+                rpn_results_list: InstanceList,
+                batch_data_samples: SampleList,
+                rescale: bool = False,
+                **kwargs) -> InstanceList:
+        """Perform forward propagation of the roi head and predict detection
+        results on the features of the upstream network.
+        Args:
+            feats_dict (dict): Contains features from the first stage.
+            rpn_results_list (List[:obj:`InstanceData`]): Detection results
+                of rpn head.
+            batch_data_samples (List[:obj:`Det3DDataSample`]): The Data
+                samples. It usually includes information such as
+                `gt_instance_3d`, `gt_panoptic_seg_3d` and `gt_sem_seg_3d`.
+            rescale (bool): If True, return boxes in original image space.
+                Defaults to False.
+        Returns:
+            list[:obj:`InstanceData`]: Detection results of each sample
+            after the post process.
+            Each item usually contains following keys.
+            - scores_3d (Tensor): Classification scores, has a shape
+              (num_instances, )
+            - labels_3d (Tensor): Labels of bboxes, has a shape
+              (num_instances, ).
+            - bboxes_3d (BaseInstance3DBoxes): Prediction of bboxes,
+              contains a tensor with shape (num_instances, C), where
+              C >= 7.
+        """
+        assert self.with_bbox, 'Bbox head must be implemented in PartA2.'
+        assert self.with_semantic, 'Semantic head must be implemented' \
+                                   ' in PartA2.'
+        batch_input_metas = [
+            data_samples.metainfo for data_samples in batch_data_samples
+        ]
+        voxels_dict = feats_dict.pop('voxels_dict')
+        # TODO: Split predict semantic and bbox
+        results_list = self.predict_bbox(feats_dict, voxels_dict,
+                                         batch_input_metas, rpn_results_list,
+                                         self.test_cfg)
+        return results_list
+    def predict_bbox(self, feats_dict: Dict, voxel_dict: Dict,
+                     batch_input_metas: List[dict],
+                     rpn_results_list: InstanceList,
+                     test_cfg: ConfigDict) -> InstanceList:
+        """Perform forward propagation of the bbox head and predict detection
+        results on the features of the upstream network.
+        Args:
+            feats_dict (dict): Contains features from the first stage.
+            voxel_dict (dict): Contains information of voxels.
+            batch_input_metas (list[dict], Optional): Batch image meta info.
+                Defaults to None.
+            rpn_results_list (List[:obj:`InstanceData`]): Detection results
+                of rpn head.
+            test_cfg (Config): Test config.
+        Returns:
+            list[:obj:`InstanceData`]: Detection results of each sample
+            after the post process.
+            Each item usually contains following keys.
+            - scores_3d (Tensor): Classification scores, has a shape
+              (num_instances, )
+            - labels_3d (Tensor): Labels of bboxes, has a shape
+              (num_instances, ).
+            - bboxes_3d (BaseInstance3DBoxes): Prediction of bboxes,
+              contains a tensor with shape (num_instances, C), where
+              C >= 7.
+        """
+        ...
+    def loss(self, feats_dict: Dict, rpn_results_list: InstanceList,
+             batch_data_samples: SampleList, **kwargs) -> dict:
+        """Perform forward propagation and loss calculation of the detection
+        roi on the features of the upstream network.
+        Args:
+            feats_dict (dict): Contains features from the first stage.
+            rpn_results_list (List[:obj:`InstanceData`]): Detection results
+                of rpn head.
+            batch_data_samples (List[:obj:`Det3DDataSample`]): The Data
+                samples. It usually includes information such as
+                `gt_instance_3d`, `gt_panoptic_seg_3d` and `gt_sem_seg_3d`.
+        Returns:
+            dict[str, Tensor]: A dictionary of loss components
+        """
+        assert len(rpn_results_list) == len(batch_data_samples)
+        losses = dict()
+        batch_gt_instances_3d = []
+        batch_gt_instances_ignore = []
+        voxels_dict = feats_dict.pop('voxels_dict')
+        for data_sample in batch_data_samples:
+            batch_gt_instances_3d.append(data_sample.gt_instances_3d)
+            if 'ignored_instances' in data_sample:
+                batch_gt_instances_ignore.append(data_sample.ignored_instances)
+            else:
+                batch_gt_instances_ignore.append(None)
+        if self.with_semantic:
+            semantic_results = self._semantic_forward_train(
+                feats_dict, voxels_dict, batch_gt_instances_3d)
+            losses.update(semantic_results.pop('loss_semantic'))
+        sample_results = self._assign_and_sample(rpn_results_list,
+                                                 batch_gt_instances_3d)
+        if self.with_bbox:
+            feats_dict.update(semantic_results)
+            bbox_results = self._bbox_forward_train(feats_dict, voxels_dict,
+                                                    sample_results)
+            losses.update(bbox_results['loss_bbox'])
+        return losses
+```
+此处我们省略了相关函数的更多细节。更多细节请参考[代码](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/models/roi_heads/part_aggregation_roi_head.py)。
+最后，用户需要在 `mmdet3d/models/roi_heads/bbox_heads/__init__.py` 和 `mmdet3d/models/roi_heads/__init__.py` 添加模块，从而能被相应的注册器找到并加载。
+此外，用户也可以在配置文件中添加以下代码以达到相同的目的。
+```python
+custom_imports=dict(
+    imports=['mmdet3d.models.roi_heads.part_aggregation_roi_head', 'mmdet3d.models.roi_heads.bbox_heads.parta2_bbox_head'],
+    allow_failed_imports=False)
+```
+`PartAggregationROIHead` 的配置文件如下所示：
+```python
+model = dict(
+    ...
+    roi_head=dict(
+        type='PartAggregationROIHead',
+        num_classes=3,
+        semantic_head=dict(
+            type='PointwiseSemanticHead',
+            in_channels=16,
+            extra_width=0.2,
+            seg_score_thr=0.3,
+            num_classes=3,
+            loss_seg=dict(
+                type='mmdet.FocalLoss',
+                use_sigmoid=True,
+                reduction='sum',
+                gamma=2.0,
+                alpha=0.25,
+                loss_weight=1.0),
+            loss_part=dict(
+                type='mmdet.CrossEntropyLoss',
+                use_sigmoid=True,
+                loss_weight=1.0)),
+        seg_roi_extractor=dict(
+            type='Single3DRoIAwareExtractor',
+            roi_layer=dict(
+                type='RoIAwarePool3d',
+                out_size=14,
+                max_pts_per_voxel=128,
+                mode='max')),
+        bbox_roi_extractor=dict(
+            type='Single3DRoIAwareExtractor',
+            roi_layer=dict(
+                type='RoIAwarePool3d',
+                out_size=14,
+                max_pts_per_voxel=128,
+                mode='avg')),
+        bbox_head=dict(
+            type='PartA2BboxHead',
+            num_classes=3,
+            seg_in_channels=16,
+            part_in_channels=4,
+            seg_conv_channels=[64, 64],
+            part_conv_channels=[64, 64],
+            merge_conv_channels=[128, 128],
+            down_conv_channels=[128, 256],
+            bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
+            shared_fc_channels=[256, 512, 512, 512],
+            cls_channels=[256, 256],
+            reg_channels=[256, 256],
+            dropout_ratio=0.1,
+            roi_feat_size=14,
+            with_corner_loss=True,
+            loss_bbox=dict(
+                type='mmdet.SmoothL1Loss',
+                beta=1.0 / 9.0,
+                reduction='sum',
+                loss_weight=1.0),
+            loss_cls=dict(
+                type='mmdet.CrossEntropyLoss',
+                use_sigmoid=True,
+                reduction='sum',
+                loss_weight=1.0))),
+    ...
+)
+```
+MMDetection 2.0 开始支持配置文件之间的继承，因此用户可以关注配置文件的修改。PartA2 Head 的第二阶段主要使用了新的 `PartAggregationROIHead` 和 `PartA2BboxHead`，需要根据对应模块的 `__init__` 函数来设置参数。
+### 添加新的损失函数
+假设您想要为检测框的回归添加一个新的损失函数 `MyLoss`。为了添加一个新的损失函数，用户需要在 `mmdet3d/models/losses/my_loss.py` 中实现该函数。装饰器 `weighted_loss` 能够保证对每个元素的损失进行加权平均。
+```python
+import torch
+import torch.nn as nn
+from mmdet.models.losses.utils import weighted_loss
+from mmdet3d.registry import MODELS
+@weighted_loss
+def my_loss(pred, target):
+    assert pred.size() == target.size() and target.numel() > 0
+    loss = torch.abs(pred - target)
+    return loss
+@MODELS.register_module()
+class MyLoss(nn.Module):
+    def __init__(self, reduction='mean', loss_weight=1.0):
+        super(MyLoss, self).__init__()
+        self.reduction = reduction
+        self.loss_weight = loss_weight
+    def forward(self,
+                pred,
+                target,
+                weight=None,
+                avg_factor=None,
+                reduction_override=None):
+        assert reduction_override in (None, 'none', 'mean', 'sum')
+        reduction = (
+            reduction_override if reduction_override else self.reduction)
+        loss_bbox = self.loss_weight * my_loss(
+            pred, target, weight, reduction=reduction, avg_factor=avg_factor)
+        return loss_bbox
+```
+接下来，用户需要在 `mmdet3d/models/losses/__init__.py` 添加该函数。
+```python
+from .my_loss import MyLoss, my_loss
+```
+或者在配置文件中添加以下代码以达到相同的目的。
+```python
+custom_imports=dict(
+    imports=['mmdet3d.models.losses.my_loss'],
+    allow_failed_imports=False)
+```
+为了使用该函数，用户需要修改 `loss_xxx` 域。由于 `MyLoss` 是用于回归的，您需要修改 head 中的 `loss_bbox` 域。
+```python
+loss_bbox=dict(type='MyLoss', loss_weight=1.0)
+```