raw_mmdetection

Commit 7aa442d5 authored Apr 01, 2026 by raojy
20 changed files
--- a/mmdetection3d/configs/smoke/metafile.yml
+++ b/mmdetection3d/configs/smoke/metafile.yml
+Collections:
+  - Name: SMOKE
+    Metadata:
+      Training Data: KITTI
+      Training Techniques:
+        - Adam
+      Training Resources: 4x V100 GPUs
+      Architecture:
+        - SMOKEMono3DHead
+        - DLA
+    Paper:
+      URL: https://arxiv.org/abs/2002.10111
+      Title: 'SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation'
+    README: configs/smoke/README.md
+    Code:
+      URL: https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0.dev0/mmdet3d/models/detectors/smoke_mono3d.py#L7
+      Version: v1.0.0
+
+Models:
+  - Name: smoke_dla34_dlaneck_gn-all_4xb8-6x_kitti-mono3d
+    In Collection: SMOKE
+    Config: configs/smoke/smoke_dla34_dlaneck_gn-all_4xb8-6x_kitti-mono3d.py
+    Metadata:
+      Training Memory (GB): 9.6
+    Results:
+      - Task: 3D Object Detection
+        Dataset: KITTI
+        Metrics:
+          mAP: 13.8
+    Weights: https://download.openmmlab.com/mmdetection3d/v0.1.0_models/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d_20210929_015553-d46d9bb0.pth
--- a/mmdetection3d/configs/smoke/smoke_dla34_dlaneck_gn-all_4xb8-6x_kitti-mono3d.py
+++ b/mmdetection3d/configs/smoke/smoke_dla34_dlaneck_gn-all_4xb8-6x_kitti-mono3d.py
+_base_ = [
+    '../_base_/datasets/kitti-mono3d.py', '../_base_/models/smoke.py',
+    '../_base_/default_runtime.py'
+]
+
+backend_args = None
+
+train_pipeline = [
+    dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
+    dict(
+        type='LoadAnnotations3D',
+        with_bbox=True,
+        with_label=True,
+        with_attr_label=False,
+        with_bbox_3d=True,
+        with_label_3d=True,
+        with_bbox_depth=True),
+    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
+    dict(type='RandomShiftScale', shift_scale=(0.2, 0.4), aug_prob=0.3),
+    dict(type='AffineResize', img_scale=(1280, 384), down_ratio=4),
+    dict(
+        type='Pack3DDetInputs',
+        keys=[
+            'img', 'gt_bboxes', 'gt_bboxes_labels', 'gt_bboxes_3d',
+            'gt_labels_3d', 'centers_2d', 'depths'
+        ]),
+]
+test_pipeline = [
+    dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
+    dict(type='AffineResize', img_scale=(1280, 384), down_ratio=4),
+    dict(type='Pack3DDetInputs', keys=['img'])
+]
+
+train_dataloader = dict(
+    batch_size=8, num_workers=4, dataset=dict(pipeline=train_pipeline))
+test_dataloader = dict(dataset=dict(pipeline=test_pipeline))
+val_dataloader = dict(dataset=dict(pipeline=test_pipeline))
+
+# training schedule for 6x
+max_epochs = 72
+train_cfg = dict(
+    type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=5)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+
+# learning rate
+param_scheduler = [
+    dict(
+        type='MultiStepLR',
+        begin=0,
+        end=max_epochs,
+        by_epoch=True,
+        milestones=[50],
+        gamma=0.1)
+]
+
+# optimizer
+optim_wrapper = dict(
+    type='OptimWrapper',
+    optimizer=dict(type='Adam', lr=2.5e-4),
+    clip_grad=None)
+
+find_unused_parameters = True
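The SMOKE schedule above runs 72 epochs (6 × the usual 12-epoch "1x" OpenMMLab schedule) and multiplies the learning rate by `gamma=0.1` once epoch 50 is reached. Here is a minimal sketch of that step rule in plain Python. It assumes PyTorch-style `MultiStepLR` semantics with 0-based epochs. `multistep_lr` is a hypothetical helper, not an MMEngine API.

```python
from bisect import bisect_right


def multistep_lr(epoch, base_lr=2.5e-4, milestones=(50,), gamma=0.1):
    """LR at a 0-based epoch under a step (MultiStepLR-style) schedule.

    The LR is multiplied by `gamma` once for every milestone already reached.
    """
    return base_lr * gamma ** bisect_right(milestones, epoch)


max_epochs = 6 * 12  # "6x" of a 12-epoch base schedule
for epoch in (0, 49, 50, max_epochs - 1):
    print(epoch, multistep_lr(epoch))
```

With a single milestone, the last 22 of the 72 epochs train at one tenth of the initial Adam learning rate.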
--- a/mmdetection3d/configs/spvcnn/README.md
+++ b/mmdetection3d/configs/spvcnn/README.md
+# Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
+
+> [Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution](https://arxiv.org/abs/2007.16100)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely. Given the limited hardware resources, existing 3D perception models are not able to recognize small instances (e.g., pedestrians, cyclists) very well due to the low-resolution voxelization and aggressive downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch. With negligible overhead, this point-based branch is able to preserve the fine details even from large outdoor scenes. To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive SemanticKITTI leaderboard. It also achieves 8x computation reduction and 3x measured speedup over MinkowskiNet with higher accuracy. Finally, we transfer our method to 3D object detection, and it achieves consistent improvements over the one-stage detection baseline on KITTI.
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/72679458/226509154-80c27d8e-c138-426a-b92e-72846997b5b3.png" width="800"/>
+</div>
+
+## Introduction
+
+We implement SPVCNN with the [TorchSparse](https://github.com/mit-han-lab/torchsparse) backend and provide results and checkpoints on the SemanticKITTI dataset.
+
+## Results and models
+
+### SemanticKITTI
+
+|                                 Method                                  | Lr schd | Laser-Polar Mix | Mem (GB) | mIoU |                                                                                                                                                                    Download                                                                                                                                                                     |
+| :---------------------------------------------------------------------: | :-----: | :-------------: | :------: | :--: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+|        [SPVCNN-W16](./spvcnn_w16_8xb2-amp-15e_semantickitti.py)         |   15e   |        ✗        |   3.9    | 61.8 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w16_8xb2-15e_semantickitti/spvcnn_w16_8xb2-15e_semantickitti_20230321_011645-a2734d85.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w16_8xb2-15e_semantickitti/spvcnn_w16_8xb2-15e_semantickitti_20230321_011645.log) |
+|        [SPVCNN-W20](./spvcnn_w20_8xb2-amp-15e_semantickitti.py)         |   15e   |        ✗        |   4.2    | 62.6 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w20_8xb2-15e_semantickitti/spvcnn_w20_8xb2-15e_semantickitti_20230321_011649-519e7eff.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w20_8xb2-15e_semantickitti/spvcnn_w20_8xb2-15e_semantickitti_20230321_011649.log) |
+|        [SPVCNN-W32](./spvcnn_w32_8xb2-amp-15e_semantickitti.py)         |   15e   |        ✗        |   5.4    | 64.3 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-15e_semantickitti/spvcnn_w32_8xb2-15e_semantickitti_20230308_113324-f7c0c5b4.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-15e_semantickitti/spvcnn_w32_8xb2-15e_semantickitti_20230308_113324.log) |
+| [SPVCNN-W32](./spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti.py) |   3x    |        ✔        |   7.2    | 68.7 |                [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti_20230425_125908-d68a68b7.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti_20230425_125908.log)                |
+
+**Note:** We follow the implementation in the original SPVNAS [repo](https://github.com/mit-han-lab/spvnas). W16/W20/W32 indicate different numbers of channels.
+
+**Note:** Because of the TorchSparse backend, model performance is unstable and may fluctuate by about 1.5 mIoU across different random seeds.
+
+## Citation
+
+```latex
+@inproceedings{tang2020searching,
+  title={Searching efficient 3d architectures with sparse point-voxel convolution},
+  author={Tang, Haotian and Liu, Zhijian and Zhao, Shengyu and Lin, Yujun and Lin, Ji and Wang, Hanrui and Han, Song},
+  booktitle={Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XXVIII},
+  pages={685--702},
+  year={2020},
+  organization={Springer}
+}
+```
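The abstract describes SPVConv as a sparse voxel branch paired with a high-resolution point branch. The fusion step can be sketched in plain Python under strong simplifications:

- point features are averaged into voxels;
- a caller-supplied `voxel_op` stands in for the sparse convolution;
- each point reads back the features of its own voxel;
- the result is added to a per-point `point_op`.

The helper names are hypothetical. The nearest-voxel lookup is a simplification, not the TorchSparse or SPVNAS implementation, which may interpolate between neighbouring voxels.

```python
from collections import defaultdict


def voxelize(coords, feats, voxel_size):
    """Average the features of all points that fall into the same voxel."""
    sums = defaultdict(lambda: [0.0] * len(feats[0]))
    counts = defaultdict(int)
    keys = []  # voxel index of every point, kept for devoxelization
    for (x, y, z), f in zip(coords, feats):
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        keys.append(key)
        counts[key] += 1
        acc = sums[key]
        for i, v in enumerate(f):
            acc[i] += v
    voxels = {k: [v / counts[k] for v in s] for k, s in sums.items()}
    return voxels, keys


def spv_fuse(coords, feats, voxel_size, voxel_op, point_op):
    """Voxel branch + point branch, fused by addition at the points."""
    voxels, keys = voxelize(coords, feats, voxel_size)
    # Stand-in for the sparse convolution over occupied voxels.
    voxels = {k: voxel_op(v) for k, v in voxels.items()}
    fused = []
    for key, f in zip(keys, feats):
        # Devoxelize by nearest lookup: each point reads its own voxel.
        fused.append([a + b for a, b in zip(voxels[key], point_op(f))])
    return fused


coords = [(0.1, 0.1, 0.1), (0.9, 0.2, 0.3), (1.5, 0.0, 0.0)]
feats = [[1.0], [3.0], [10.0]]
fused = spv_fuse(coords, feats, voxel_size=1.0,
                 voxel_op=lambda v: v, point_op=lambda f: f)
print(fused)
```

The two points sharing a voxel receive the same voxel feature but keep distinct outputs through the point branch. This is how a point branch can preserve detail lost to coarse voxels.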
--- a/mmdetection3d/configs/spvcnn/metafile.yml
+++ b/mmdetection3d/configs/spvcnn/metafile.yml
+Collections:
+  - Name: SPVCNN
+    Metadata:
+      Training Techniques:
+        - AdamW
+      Architecture:
+        - SPVCNN
+    Paper:
+      URL: https://arxiv.org/abs/2007.16100
+      Title: 'Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution'
+    README: configs/spvcnn/README.md
+    Code:
+      URL: https://github.com/open-mmlab/mmdetection3d/blob/1.1/mmdet3d/models/backbones/spvcnn_backone.py#L22
+      Version: v1.1.0
+
+Models:
+  - Name: spvcnn_w16_8xb2-amp-15e_semantickitti
+    In Collection: SPVCNN
+    Config: configs/spvcnn/spvcnn_w16_8xb2-amp-15e_semantickitti.py
+    Metadata:
+      Training Data: SemanticKITTI
+      Training Memory (GB): 3.9
+      Training Resources: 8x A100 GPUs
+    Results:
+      - Task: 3D Semantic Segmentation
+        Dataset: SemanticKITTI
+        Metrics:
+          mIOU: 61.7
+    Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w16_8xb2-15e_semantickitti/spvcnn_w16_8xb2-15e_semantickitti_20230321_011645-a2734d85.pth
+
+  - Name: spvcnn_w20_8xb2-amp-15e_semantickitti
+    In Collection: SPVCNN
+    Config: configs/spvcnn/spvcnn_w20_8xb2-amp-15e_semantickitti.py
+    Metadata:
+      Training Data: SemanticKITTI
+      Training Memory (GB): 4.2
+      Training Resources: 8x A100 GPUs
+    Results:
+      - Task: 3D Semantic Segmentation
+        Dataset: SemanticKITTI
+        Metrics:
+          mIOU: 62.9
+    Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w20_8xb2-15e_semantickitti/spvcnn_w20_8xb2-15e_semantickitti_20230321_011649-519e7eff.pth
+
+  - Name: spvcnn_w32_8xb2-amp-15e_semantickitti
+    In Collection: SPVCNN
+    Config: configs/spvcnn/spvcnn_w32_8xb2-amp-15e_semantickitti.py
+    Metadata:
+      Training Data: SemanticKITTI
+      Training Memory (GB): 5.4
+      Training Resources: 8x A100 GPUs
+    Results:
+      - Task: 3D Semantic Segmentation
+        Dataset: SemanticKITTI
+        Metrics:
+          mIOU: 64.3
+    Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-15e_semantickitti/spvcnn_w32_8xb2-15e_semantickitti_20230308_113324-f7c0c5b4.pth
+
+  - Name: spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti
+    In Collection: SPVCNN
+    Config: configs/spvcnn/spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti.py
+    Metadata:
+      Training Data: SemanticKITTI
+      Training Memory (GB): 7.2
+      Training Resources: 8x A100 GPUs
+    Results:
+      - Task: 3D Semantic Segmentation
+        Dataset: SemanticKITTI
+        Metrics:
+          mIOU: 64.3
+    Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti_20230425_125908-d68a68b7.pth
--- a/mmdetection3d/configs/spvcnn/spvcnn_w16_8xb2-amp-15e_semantickitti.py
+++ b/mmdetection3d/configs/spvcnn/spvcnn_w16_8xb2-amp-15e_semantickitti.py
+_base_ = ['./spvcnn_w32_8xb2-amp-15e_semantickitti.py']
+
+model = dict(
+    backbone=dict(
+        base_channels=16,
+        encoder_channels=[16, 32, 64, 128],
+        decoder_channels=[128, 64, 48, 48]),
+    decode_head=dict(channels=48))
+
+randomness = dict(seed=1588147245)
--- a/mmdetection3d/configs/spvcnn/spvcnn_w20_8xb2-amp-15e_semantickitti.py
+++ b/mmdetection3d/configs/spvcnn/spvcnn_w20_8xb2-amp-15e_semantickitti.py
+_base_ = ['./spvcnn_w32_8xb2-amp-15e_semantickitti.py']
+
+model = dict(
+    backbone=dict(
+        base_channels=20,
+        encoder_channels=[20, 40, 81, 163],
+        decoder_channels=[163, 81, 61, 61]),
+    decode_head=dict(channels=61))
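The W16 and W20 configs only override channel widths. Assume the W32 base model (`../_base_/models/spvcnn.py`, not shown here) uses twice the W16 widths: encoder `[32, 64, 128, 256]` and decoder `[256, 128, 96, 96]`. Under that assumption, both variants match truncating each W32 width after scaling by 0.5 (W16) or 0.64 (W20). The sketch only checks that arithmetic. The 0.64 ratio is inferred from the numbers and is not stated in the source.

```python
def scale_channels(channels, ratio):
    """Truncate each channel count after scaling by `ratio`."""
    return [int(ratio * c) for c in channels]


# Assumed W32 widths: twice the W16 values from the config above.
W32_ENCODER = [32, 64, 128, 256]
W32_DECODER = [256, 128, 96, 96]

for name, ratio in (("W16", 0.5), ("W20", 0.64)):
    enc = scale_channels(W32_ENCODER, ratio)
    dec = scale_channels(W32_DECODER, ratio)
    # base_channels = first encoder width; decode_head channels = last decoder width
    print(name, enc[0], enc, dec, dec[-1])
```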
--- a/mmdetection3d/configs/spvcnn/spvcnn_w32_8xb2-amp-15e_semantickitti.py
+++ b/mmdetection3d/configs/spvcnn/spvcnn_w32_8xb2-amp-15e_semantickitti.py
+_base_ = [
+    '../_base_/datasets/semantickitti.py', '../_base_/models/spvcnn.py',
+    '../_base_/default_runtime.py'
+]
+
+train_pipeline = [
+    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
+    dict(
+        type='LoadAnnotations3D',
+        with_bbox_3d=False,
+        with_label_3d=False,
+        with_seg_3d=True,
+        seg_3d_dtype='np.int32',
+        seg_offset=2**16,
+        dataset_type='semantickitti'),
+    dict(type='PointSegClassMapping'),
+    dict(
+        type='GlobalRotScaleTrans',
+        rot_range=[0., 6.28318531],
+        scale_ratio_range=[0.95, 1.05],
+        translation_std=[0, 0, 0],
+    ),
+    dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
+]
+
+train_dataloader = dict(
+    sampler=dict(seed=0), dataset=dict(pipeline=train_pipeline))
+
+lr = 0.24
+optim_wrapper = dict(
+    type='AmpOptimWrapper',
+    loss_scale='dynamic',
+    optimizer=dict(
+        type='SGD', lr=lr, weight_decay=0.0001, momentum=0.9, nesterov=True))
+
+param_scheduler = [
+    dict(
+        type='LinearLR', start_factor=0.008, by_epoch=False, begin=0, end=125),
+    dict(
+        type='CosineAnnealingLR',
+        begin=0,
+        T_max=15,
+        by_epoch=True,
+        eta_min=1e-5,
+        convert_to_iter_based=True)
+]
+
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=15, val_interval=1)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+
+default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=1))
+randomness = dict(seed=0, deterministic=False, diff_rank_seed=True)
+env_cfg = dict(cudnn_benchmark=True)
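The W32 schedule combines a 125-iteration linear warm-up (`start_factor=0.008`) with a cosine decay over 15 epochs, converted to per-iteration updates. Here is a minimal closed-form sketch. It assumes the two schedulers act multiplicatively during warm-up, which only approximates how MMEngine chains them. `iters_per_epoch` is hypothetical because the dataset size is not given here.

```python
import math


def spvcnn_lr(it, iters_per_epoch, base_lr=0.24, warmup_iters=125,
              start_factor=0.008, t_max_epochs=15, eta_min=1e-5):
    """Approximate LR at global iteration `it` (warm-up x cosine, sketch only)."""
    # convert_to_iter_based=True: the cosine is evaluated at fractional epochs.
    epoch = it / iters_per_epoch
    cosine = eta_min + (base_lr - eta_min) * (
        1 + math.cos(math.pi * epoch / t_max_epochs)) / 2
    # LinearLR ramps the factor from start_factor to 1.0 over warmup_iters.
    if it < warmup_iters:
        factor = start_factor + (1.0 - start_factor) * it / warmup_iters
    else:
        factor = 1.0
    return cosine * factor


ITERS_PER_EPOCH = 1000  # hypothetical; depends on dataset size and batch size
for it in (0, 62, 125, 7 * ITERS_PER_EPOCH, 15 * ITERS_PER_EPOCH):
    print(it, spvcnn_lr(it, ITERS_PER_EPOCH))
```

The warm-up starts near 0.0019, so SGD with `lr=0.24` does not take full-size steps in the first iterations.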
--- a/mmdetection3d/configs/spvcnn/spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti.py
+++ b/mmdetection3d/configs/spvcnn/spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti.py
+_base_ = [
+    '../_base_/datasets/semantickitti.py', '../_base_/models/spvcnn.py',
+    '../_base_/schedules/schedule-3x.py', '../_base_/default_runtime.py'
+]
+
+model = dict(data_preprocessor=dict(max_voxels=None))
+
+train_pipeline = [
+    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
+    dict(
+        type='LoadAnnotations3D',
+        with_bbox_3d=False,
+        with_label_3d=False,
+        with_seg_3d=True,
+        seg_3d_dtype='np.int32',
+        seg_offset=2**16,
+        dataset_type='semantickitti'),
+    dict(type='PointSegClassMapping'),
+    dict(
+        type='RandomChoice',
+        transforms=[
+            [
+                dict(
+                    type='LaserMix',
+                    num_areas=[3, 4, 5, 6],
+                    pitch_angles=[-25, 3],
+                    pre_transform=[
+                        dict(
+                            type='LoadPointsFromFile',
+                            coord_type='LIDAR',
+                            load_dim=4,
+                            use_dim=4),
+                        dict(
+                            type='LoadAnnotations3D',
+                            with_bbox_3d=False,
+                            with_label_3d=False,
+                            with_seg_3d=True,
+                            seg_3d_dtype='np.int32',
+                            seg_offset=2**16,
+                            dataset_type='semantickitti'),
+                        dict(type='PointSegClassMapping')
+                    ],
+                    prob=1)
+            ],
+            [
+                dict(
+                    type='PolarMix',
+                    instance_classes=[0, 1, 2, 3, 4, 5, 6, 7],
+                    swap_ratio=0.5,
+                    rotate_paste_ratio=1.0,
+                    pre_transform=[
+                        dict(
+                            type='LoadPointsFromFile',
+                            coord_type='LIDAR',
+                            load_dim=4,
+                            use_dim=4),
+                        dict(
+                            type='LoadAnnotations3D',
+                            with_bbox_3d=False,
+                            with_label_3d=False,
+                            with_seg_3d=True,
+                            seg_3d_dtype='np.int32',
+                            seg_offset=2**16,
+                            dataset_type='semantickitti'),
+                        dict(type='PointSegClassMapping')
+                    ],
+                    prob=1)
+            ],
+        ],
+        prob=[0.5, 0.5]),
+    dict(
+        type='GlobalRotScaleTrans',
+        rot_range=[0., 6.28318531],
+        scale_ratio_range=[0.95, 1.05],
+        translation_std=[0, 0, 0],
+    ),
+    dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
+]
+
+train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
+
+optim_wrapper = dict(type='AmpOptimWrapper', loss_scale='dynamic')
+
+default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=1))
+randomness = dict(seed=0, deterministic=False, diff_rank_seed=True)
+env_cfg = dict(cudnn_benchmark=True)
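In the config above, `RandomChoice` applies either `LaserMix` or `PolarMix` with equal probability. The LaserMix idea can be sketched in two steps. First, split the scan into `num_areas` horizontal bands by pitch angle within `pitch_angles` (degrees). Then keep points from the current scan in even bands and points from a second scan in odd bands. This is a hypothetical plain-Python illustration, not the mmdet3d transform. Band ordering and clamping are assumptions.

```python
import math


def pitch_deg(point):
    """Elevation angle of a point above the sensor's horizontal plane, in degrees."""
    x, y, z = point[:3]
    return math.degrees(math.atan2(z, math.hypot(x, y)))


def laser_mix(points_a, points_b, num_areas, pitch_angles=(-25.0, 3.0)):
    """Interleave pitch bands: even bands from scan A, odd bands from scan B."""
    low, high = pitch_angles
    width = (high - low) / num_areas

    def band(point):
        # Clamp into the pitch range; band 0 is the top-most band.
        angle = min(max(pitch_deg(point), low), high)
        return min(int((high - angle) / width), num_areas - 1)

    mixed = [p for p in points_a if band(p) % 2 == 0]
    mixed += [p for p in points_b if band(p) % 2 == 1]
    return mixed


# Points are (x, y, z, tag); the tag only makes the result easy to read.
scan_a = [(10.0, 0.0, 0.0, "a_level"), (10.0, 0.0, -5.0, "a_low")]
scan_b = [(10.0, 0.0, 0.0, "b_level"), (10.0, 0.0, -5.0, "b_low")]
print([p[3] for p in laser_mix(scan_a, scan_b, num_areas=2)])
```

In the config, `num_areas=[3, 4, 5, 6]` lists candidate band counts, which are presumably sampled per call. The sketch takes a fixed value to stay deterministic.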
--- a/mmdetection3d/configs/ssn/README.md
+++ b/mmdetection3d/configs/ssn/README.md
+# SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds
+
+> [SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds](https://arxiv.org/abs/2004.02774)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+Multi-class 3D object detection aims to localize and classify objects of multiple categories from point clouds. Due to the nature of point clouds, i.e. unstructured, sparse and noisy, some features benefiting multi-class discrimination are underexploited, such as shape information. In this paper, we propose a novel 3D shape signature to explore the shape information from point clouds. By incorporating operations of symmetry, convex hull and Chebyshev fitting, the proposed shape signature is not only compact and effective but also robust to the noise, which serves as a soft constraint to improve the feature capability of multi-class discrimination. Based on the proposed shape signature, we develop the shape signature networks (SSN) for 3D object detection, which consist of pyramid feature encoding part, shape-aware grouping heads and explicit shape encoding objective. Experiments show that the proposed method performs remarkably better than existing methods on two large-scale datasets. Furthermore, our shape signature can act as a plug-and-play component and ablation study shows its effectiveness and good scalability.
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/79644370/144024507-9c1f23c1-5e5a-49c8-b346-ff37e30adc3a.png" width="800"/>
+</div>
+
+## Introduction
+
+We implement PointPillars with the shape-aware grouping heads used in SSN and provide results and checkpoints on the nuScenes and Lyft datasets.
+
+## Results and models
+
+### nuScenes
+
+|                                            Backbone                                             | Lr schd | Mem (GB) | Inf time (fps) |  mAP  |  NDS  |                                                                                                                                                                                                                       Download                                                                                                                                                                                                                       |
+| :---------------------------------------------------------------------------------------------: | :-----: | :------: | :------------: | :---: | :---: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+|           [SECFPN](../pointpillars/pointpillars_hv_secfpn_sbn-all_8xb4-2x_nus-3d.py)            |   2x    |   16.4   |                | 35.17 | 49.76 |                     [model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d_20200620_230725-0817d270.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d_20200620_230725.log.json)                     |
+|                        [SSN](./ssn_hv_secfpn_sbn-all_16xb2-2x_nus-3d.py)                        |   2x    |   3.6    |                | 40.91 | 54.44 |                                              [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d_20210830_101351-51915986.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d_20210830_101351.log.json)                                              |
+| [RegNetX-400MF-SECFPN](../regnet/pointpillars_hv_regnet-400mf_secfpn_sbn-all_8xb4-2x_nus-3d.py) |   2x    |   16.4   |                | 41.15 | 55.20 | [model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/regnet/hv_pointpillars_regnet-400mf_secfpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_regnet-400mf_secfpn_sbn-all_4x8_2x_nus-3d_20200620_230334-53044f32.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/regnet/hv_pointpillars_regnet-400mf_secfpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_regnet-400mf_secfpn_sbn-all_4x8_2x_nus-3d_20200620_230334.log.json) |
+|          [RegNetX-400MF-SSN](./ssn_hv_regnet-400mf_secfpn_sbn-all_16xb2-2x_nus-3d.py)           |   2x    |   5.1    |                | 46.65 | 58.24 |                    [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d_20210829_210615-361e5e04.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d_20210829_210615.log.json)                    |
+
+### Lyft
+
+|                                   Backbone                                    | Lr schd | Mem (GB) | Inf time (fps) | Private Score | Public Score |                                                                                                                                                                                                      Download                                                                                                                                                                                                      |
+| :---------------------------------------------------------------------------: | :-----: | :------: | :------------: | :-----------: | :----------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+|  [SECFPN](../pointpillars/pointpillars_hv_secfpn_sbn-all_8xb2-2x_lyft-3d.py)  |   2x    |   12.2   |                |     13.9      |     14.1     |  [model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d_20210517_204807-2518e3de.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d_20210517_204807.log.json)  |
+|              [SSN](./ssn_hv_secfpn_sbn-all_16xb2-2x_lyft-3d.py)               |   2x    |   8.5    |                |     17.5      |     17.5     |                           [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d_20210822_134731-46841b41.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d_20210822_134731.log.json)                           |
+| [RegNetX-400MF-SSN](./ssn_hv_regnet-400mf_secfpn_sbn-all_16xb1-2x_lyft-3d.py) |   2x    |   7.4    |                |     17.9      |      18      | [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d_20210829_122825-d93475a1.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d_20210829_122825.log.json) |
+
+Note:
+
+The main difference between the shape-aware grouping heads and the original SECOND FPN heads is that the former group objects with similar sizes and shapes together and design a shape-specific head for each group. Heavier heads (with more convolutions and larger strides) are designed for large objects, while lighter heads are used for small objects. Because the heads may output feature maps of different sizes, the implementation also needs an anchor generator tailored to these feature maps.
+
+Users can try other head designs. Our implementation largely follows the [original SSN implementation](https://github.com/xinge008/SSN).
+
+## Citation
+
+```latex
+@inproceedings{zhu2020ssn,
+  title={SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds},
+  author={Zhu, Xinge and Ma, Yuexin and Wang, Tai and Xu, Yan and Shi, Jianping and Lin, Dahua},
+  booktitle={Proceedings of the European Conference on Computer Vision},
+  year={2020}
+}
+```
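The SSN README groups classes of similar size and gives larger groups heavier heads. One simple way to form such groups is to bucket classes by bird's-eye-view footprint (length × width). The sketch below does this for the Lyft class list used by the SSN configs. The per-class sizes and area thresholds are hypothetical placeholders, not the anchor sizes used in mmdet3d. The Lyft config requires class names to follow the anchors' order, so the sketch also checks that flattening the groups reproduces that list.

```python
from bisect import bisect_right

CLASS_NAMES = [
    'bicycle', 'motorcycle', 'pedestrian', 'animal', 'car',
    'emergency_vehicle', 'bus', 'other_vehicle', 'truck'
]

# Hypothetical (length, width, height) in metres, for illustration only.
SIZES = {
    'bicycle': (1.8, 0.6, 1.3),
    'motorcycle': (2.2, 0.8, 1.4),
    'pedestrian': (0.8, 0.8, 1.8),
    'animal': (0.7, 0.4, 0.6),
    'car': (4.7, 1.9, 1.7),
    'emergency_vehicle': (6.5, 2.4, 2.4),
    'bus': (12.0, 2.9, 3.4),
    'other_vehicle': (8.3, 2.8, 3.2),
    'truck': (10.2, 2.8, 3.4),
}


def group_by_footprint(class_names, sizes, thresholds):
    """Bucket classes by BEV area; a higher bucket index means larger objects."""
    groups = [[] for _ in range(len(thresholds) + 1)]
    for name in class_names:
        length, width, _ = sizes[name]
        groups[bisect_right(thresholds, length * width)].append(name)
    return [g for g in groups if g]  # drop empty buckets


groups = group_by_footprint(CLASS_NAMES, SIZES, thresholds=[2.0, 12.0])
print(groups)
```

If a grouping does not preserve the class order, the class list would need reordering so that classes and anchors stay aligned.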
--- a/mmdetection3d/configs/ssn/metafile.yml
+++ b/mmdetection3d/configs/ssn/metafile.yml
+Collections:
+  - Name: SSN
+    Metadata:
+      Training Techniques:
+        - AdamW
+      Training Resources: 8x GeForce GTX 1080 Ti
+      Architecture:
+        - Hard Voxelization
+    Paper:
+      URL: https://arxiv.org/abs/2004.02774
+      Title: 'SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds'
+    README: configs/ssn/README.md
+    Code:
+      URL: https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/models/dense_heads/shape_aware_head.py#L166
+      Version: v0.7.0
+
+Models:
+  - Name: hv_ssn_secfpn_sbn-all_16xb2-2x_nus-3d
+    In Collection: SSN
+    Config: configs/ssn/ssn_hv_secfpn_sbn-all_16xb2-2x_nus-3d.py
+    Metadata:
+      Training Data: nuScenes
+      Training Memory (GB): 3.6
+    Results:
+      - Task: 3D Object Detection
+        Dataset: nuScenes
+        Metrics:
+          mAP: 40.91
+          NDS: 54.44
+    Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d_20210830_101351-51915986.pth
+
+  - Name: hv_ssn_regnet-400mf_secfpn_sbn-all_16xb2-2x_nus-3d
+    In Collection: SSN
+    Config: configs/ssn/ssn_hv_regnet-400mf_secfpn_sbn-all_16xb2-2x_nus-3d.py
+    Metadata:
+      Training Data: nuScenes
+      Training Memory (GB): 5.1
+    Results:
+      - Task: 3D Object Detection
+        Dataset: nuScenes
+        Metrics:
+          mAP: 46.65
+          NDS: 58.24
+    Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d_20210829_210615-361e5e04.pth
+
+  - Name: hv_ssn_secfpn_sbn-all_16xb2-2x_lyft-3d
+    In Collection: SSN
+    Config: configs/ssn/ssn_hv_secfpn_sbn-all_16xb2-2x_lyft-3d.py
+    Metadata:
+      Training Data: Lyft
+      Training Memory (GB): 8.5
+    Results:
+      - Task: 3D Object Detection
+        Dataset: Lyft
+        Metrics:
+          Private Score: 17.5
+          Public Score: 17.5
+    Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d_20210822_134731-46841b41.pth
+
+  - Name: hv_ssn_regnet-400mf_secfpn_sbn-all_16xb1-2x_lyft-3d
+    In Collection: SSN
+    Config: configs/ssn/ssn_hv_regnet-400mf_secfpn_sbn-all_16xb1-2x_lyft-3d.py
+    Metadata:
+      Training Data: Lyft
+      Training Memory (GB): 7.4
+    Results:
+      - Task: 3D Object Detection
+        Dataset: Lyft
+        Metrics:
+          Private Score: 17.9
+          Public Score: 18.0
+    Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d_20210829_122825-d93475a1.pth
--- a/mmdetection3d/configs/ssn/ssn_hv_regnet-400mf_secfpn_sbn-all_16xb1-2x_lyft-3d.py
+++ b/mmdetection3d/configs/ssn/ssn_hv_regnet-400mf_secfpn_sbn-all_16xb1-2x_lyft-3d.py
+_base_ = './ssn_hv_secfpn_sbn-all_16xb2-2x_lyft-3d.py'
+# model settings
+model = dict(
+    type='MVXFasterRCNN',
+    pts_backbone=dict(
+        _delete_=True,
+        type='NoStemRegNet',
+        arch=dict(w0=24, wa=24.48, wm=2.54, group_w=16, depth=22, bot_mul=1.0),
+        init_cfg=dict(
+            type='Pretrained', checkpoint='open-mmlab://regnetx_400mf'),
+        out_indices=(1, 2, 3),
+        frozen_stages=-1,
+        strides=(1, 2, 2, 2),
+        base_channels=64,
+        stem_channels=64,
+        norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
+        norm_eval=False,
+        style='pytorch'),
+    pts_neck=dict(in_channels=[64, 160, 384]))
+# dataset settings
+train_dataloader = dict(batch_size=1, num_workers=2)
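The RegNet variants inherit from the SECFPN configs through `_base_`. They replace `pts_backbone` wholesale with `_delete_=True`, while `pts_neck` only overrides `in_channels`. MMEngine's `Config.fromfile` performs this resolution. Below is a simplified, hypothetical re-implementation of the merge rule:

- nested dicts merge recursively;
- `_delete_=True` discards the inherited dict;
- all other values replace the inherited ones.

The base values in the example are placeholders, not the real `pointpillars_hv_fpn_lyft.py` contents.

```python
import copy


def merge_cfg(base, child):
    """Return a copy of `base` with `child` merged in (simplified _base_ rule)."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so pop() leaves `child` untouched
            replace = value.pop('_delete_', False)
            inherited = merged.get(key)
            if isinstance(inherited, dict) and not replace:
                merged[key] = merge_cfg(inherited, value)
            else:
                merged[key] = merge_cfg({}, value)
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace outright
    return merged


# Placeholder base config (illustrative values only).
base = dict(model=dict(
    type='MVXFasterRCNN',
    pts_backbone=dict(type='SECOND', in_channels=64, layer_nums=[3, 5, 5]),
    pts_neck=dict(type='SECONDFPN', in_channels=[64, 128, 256])))

child = dict(model=dict(
    pts_backbone=dict(_delete_=True, type='NoStemRegNet', base_channels=64),
    pts_neck=dict(in_channels=[64, 160, 384])))

cfg = merge_cfg(base, child)
print(cfg['model']['pts_backbone'])
print(cfg['model']['pts_neck'])
```

Without `_delete_=True`, keys such as `layer_nums` from the inherited backbone would survive the merge and be passed to `NoStemRegNet`.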
--- a/mmdetection3d/configs/ssn/ssn_hv_regnet-400mf_secfpn_sbn-all_16xb2-2x_nus-3d.py
+++ b/mmdetection3d/configs/ssn/ssn_hv_regnet-400mf_secfpn_sbn-all_16xb2-2x_nus-3d.py
+_base_ = './ssn_hv_secfpn_sbn-all_16xb2-2x_nus-3d.py'
+# model settings
+model = dict(
+    type='MVXFasterRCNN',
+    data_preprocessor=dict(type='Det3DDataPreprocessor'),
+    pts_backbone=dict(
+        _delete_=True,
+        type='NoStemRegNet',
+        arch=dict(w0=24, wa=24.48, wm=2.54, group_w=16, depth=22, bot_mul=1.0),
+        init_cfg=dict(
+            type='Pretrained', checkpoint='open-mmlab://regnetx_400mf'),
+        out_indices=(1, 2, 3),
+        frozen_stages=-1,
+        strides=(1, 2, 2, 2),
+        base_channels=64,
+        stem_channels=64,
+        norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
+        norm_eval=False,
+        style='pytorch'),
+    pts_neck=dict(in_channels=[64, 160, 384]))
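Both RegNet configs set `pts_neck.in_channels=[64, 160, 384]`, and these numbers follow from the `arch` dict. RegNet builds its stages in four steps:

1. per-block widths grow linearly from `w0` with slope `wa`;
2. each width snaps to `w0` times a power of `wm`;
3. widths are quantized;
4. consecutive equal widths group into stages.

The sketch below reproduces that rule in plain Python. It assumes a quantization divisor of 8 and group-width rounding with `bot_mul=1.0`, and it is modelled on, not copied from, MMDetection's `RegNet`. With `out_indices=(1, 2, 3)`, the last three stage widths feed the neck.

```python
import math
from itertools import groupby


def regnet_stages(w0, wa, wm, depth, group_w, divisor=8):
    """Return (stage_widths, stage_depths) for a RegNet-style arch (sketch)."""
    widths = []
    for j in range(depth):
        u = w0 + wa * j  # continuous width, linear in the block index
        s = round(math.log(u / w0) / math.log(wm))  # nearest integer exponent
        w = w0 * wm ** s
        widths.append(int(round(w / divisor) * divisor))  # quantize
    # Consecutive blocks with equal width form one stage.
    stage_widths, stage_depths = [], []
    for w, blocks in groupby(widths):
        stage_widths.append(w)
        stage_depths.append(len(list(blocks)))
    # Make each stage width a multiple of its group width (bot_mul=1.0 assumed).
    adjusted = []
    for w in stage_widths:
        g = min(group_w, w)
        adjusted.append(int(round(w / g) * g))
    return adjusted, stage_depths


widths, depths = regnet_stages(w0=24, wa=24.48, wm=2.54, depth=22, group_w=16)
print(widths, depths)
print(widths[1:])  # stages selected by out_indices=(1, 2, 3)
```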
--- a/mmdetection3d/configs/ssn/ssn_hv_secfpn_sbn-all_16xb2-2x_lyft-3d.py
+++ b/mmdetection3d/configs/ssn/ssn_hv_secfpn_sbn-all_16xb2-2x_lyft-3d.py
+_base_ = [
+    '../_base_/models/pointpillars_hv_fpn_lyft.py',
+    '../_base_/datasets/lyft-3d.py',
+    '../_base_/schedules/schedule-2x.py',
+    '../_base_/default_runtime.py',
+]
+point_cloud_range = [-100, -100, -5, 100, 100, 3]
+# Note that the order of class names should be consistent with
+# the following anchors' order
+class_names = [
+    'bicycle', 'motorcycle', 'pedestrian', 'animal', 'car',
+    'emergency_vehicle', 'bus', 'other_vehicle', 'truck'
+]
+backend_args = None
+
+train_pipeline = [
+    dict(
+        type='LoadPointsFromFile',
+        coord_type='LIDAR',
+        load_dim=5,
+        use_dim=5,
+        backend_args=backend_args),
+    dict(
+        type='LoadPointsFromMultiSweeps',
+        sweeps_num=10,
+        backend_args=backend_args),
+    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
+    dict(
+        type='GlobalRotScaleTrans',
+        rot_range=[-0.3925, 0.3925],
+        scale_ratio_range=[0.95, 1.05],
+        translation_std=[0, 0, 0]),
+    dict(
+        type='RandomFlip3D',
+        sync_2d=False,
+        flip_ratio_bev_horizontal=0.5,
+        flip_ratio_bev_vertical=0.5),
+    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='PointShuffle'),
+    dict(
+        type='Pack3DDetInputs',
+        keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
+]
+test_pipeline = [
+    dict(
+        type='LoadPointsFromFile',
+        coord_type='LIDAR',
+        load_dim=5,
+        use_dim=5,
+        backend_args=backend_args),
+    dict(
+        type='LoadPointsFromMultiSweeps',
+        sweeps_num=10,
+        backend_args=backend_args),
+    dict(
+        type='MultiScaleFlipAug3D',
+        img_scale=(1333, 800),
+        pts_scale_ratio=1,
+        flip=False,
+        transforms=[
+            dict(
+                type='GlobalRotScaleTrans',
+                rot_range=[0, 0],
+                scale_ratio_range=[1., 1.],
+                translation_std=[0, 0, 0]),
+            dict(type='RandomFlip3D'),
+            dict(
+                type='PointsRangeFilter', point_cloud_range=point_cloud_range)
+        ]),
+    dict(type='Pack3DDetInputs', keys=['points'])
+]
+train_dataloader = dict(
+    batch_size=2, num_workers=4, dataset=dict(pipeline=train_pipeline))
+test_dataloader = dict(dataset=dict(pipeline=test_pipeline))
+val_dataloader = dict(dataset=dict(pipeline=test_pipeline))
+
+# model settings
+model = dict(
+    data_preprocessor=dict(
+        voxel_layer=dict(point_cloud_range=point_cloud_range)),
+    pts_voxel_encoder=dict(
+        feat_channels=[32, 64], point_cloud_range=point_cloud_range),
+    pts_middle_encoder=dict(output_shape=[800, 800]),
+    pts_neck=dict(
+        _delete_=True,
+        type='SECONDFPN',
+        norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
+        in_channels=[64, 128, 256],
+        upsample_strides=[1, 2, 4],
+        out_channels=[128, 128, 128]),
+    pts_bbox_head=dict(
+        _delete_=True,
+        type='ShapeAwareHead',
+        num_classes=9,
+        in_channels=384,
+        feat_channels=384,
+        use_direction_classifier=True,
+        anchor_generator=dict(
+            type='AlignedAnchor3DRangeGeneratorPerCls',
+            ranges=[[-100, -100, -1.0709302, 100, 100, -1.0709302],
+                    [-100, -100, -1.3220503, 100, 100, -1.3220503],
+                    [-100, -100, -0.9122268, 100, 100, -0.9122268],
+                    [-100, -100, -1.8012227, 100, 100, -1.8012227],
+                    [-100, -100, -1.0715024, 100, 100, -1.0715024],
+                    [-100, -100, -0.8871424, 100, 100, -0.8871424],
+                    [-100, -100, -0.3519405, 100, 100, -0.3519405],
+                    [-100, -100, -0.6276341, 100, 100, -0.6276341],
+                    [-100, -100, -0.3033737, 100, 100, -0.3033737]],
+            sizes=[
+                [1.76, 0.63, 1.44],  # bicycle
+                [2.35, 0.96, 1.59],  # motorcycle
+                [0.80, 0.76, 1.76],  # pedestrian
+                [0.73, 0.35, 0.50],  # animal
+                [4.75, 1.92, 1.71],  # car
+                [6.52, 2.42, 2.34],  # emergency vehicle
+                [12.70, 2.92, 3.42],  # bus
+                [8.17, 2.75, 3.20],  # other vehicle
+                [10.24, 2.84, 3.44]  # truck
+            ],
+            custom_values=[],
+            rotations=[0, 1.57],
+            reshape_out=False),
+        tasks=[
+            dict(
+                num_class=2,
+                class_names=['bicycle', 'motorcycle'],
+                shared_conv_channels=(64, 64),
+                shared_conv_strides=(1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01)),
+            dict(
+                num_class=2,
+                class_names=['pedestrian', 'animal'],
+                shared_conv_channels=(64, 64),
+                shared_conv_strides=(1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01)),
+            dict(
+                num_class=2,
+                class_names=['car', 'emergency_vehicle'],
+                shared_conv_channels=(64, 64, 64),
+                shared_conv_strides=(2, 1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01)),
+            dict(
+                num_class=3,
+                class_names=['bus', 'other_vehicle', 'truck'],
+                shared_conv_channels=(64, 64, 64),
+                shared_conv_strides=(2, 1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01))
+        ],
+        assign_per_class=True,
+        diff_rad_by_sin=True,
+        dir_offset=-0.7854,  # -pi/4
+        dir_limit_offset=0,
+        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=7),
+        loss_cls=dict(
+            type='mmdet.FocalLoss',
+            use_sigmoid=True,
+            gamma=2.0,
+            alpha=0.25,
+            loss_weight=1.0),
+        loss_bbox=dict(
+            type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
+        loss_dir=dict(
+            type='mmdet.CrossEntropyLoss', use_sigmoid=False,
+            loss_weight=0.2)),
+    # model training and testing settings
+    train_cfg=dict(
+        _delete_=True,
+        pts=dict(
+            assigner=[
+                dict(  # bicycle
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.55,
+                    neg_iou_thr=0.4,
+                    min_pos_iou=0.4,
+                    ignore_iof_thr=-1),
+                dict(  # motorcycle
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.55,
+                    neg_iou_thr=0.4,
+                    min_pos_iou=0.4,
+                    ignore_iof_thr=-1),
+                dict(  # pedestrian
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.55,
+                    neg_iou_thr=0.4,
+                    min_pos_iou=0.4,
+                    ignore_iof_thr=-1),
+                dict(  # animal
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.55,
+                    neg_iou_thr=0.4,
+                    min_pos_iou=0.4,
+                    ignore_iof_thr=-1),
+                dict(  # car
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.6,
+                    neg_iou_thr=0.45,
+                    min_pos_iou=0.45,
+                    ignore_iof_thr=-1),
+                dict(  # emergency vehicle
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.55,
+                    neg_iou_thr=0.4,
+                    min_pos_iou=0.4,
+                    ignore_iof_thr=-1),
+                dict(  # bus
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.6,
+                    neg_iou_thr=0.45,
+                    min_pos_iou=0.45,
+                    ignore_iof_thr=-1),
+                dict(  # other vehicle
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.55,
+                    neg_iou_thr=0.4,
+                    min_pos_iou=0.4,
+                    ignore_iof_thr=-1),
+                dict(  # truck
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.6,
+                    neg_iou_thr=0.45,
+                    min_pos_iou=0.45,
+                    ignore_iof_thr=-1)
+            ],
+            allowed_border=0,
+            code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
+            pos_weight=-1,
+            debug=False)))
+# Default setting for scaling LR automatically
+#   - `enable` means enable scaling LR automatically
+#       or not by default.
+#   - `base_batch_size` = (16 GPUs) x (2 samples per GPU).
+auto_scale_lr = dict(enable=False, base_batch_size=32)
--- a/mmdetection3d/configs/ssn/ssn_hv_secfpn_sbn-all_16xb2-2x_nus-3d.py
+++ b/mmdetection3d/configs/ssn/ssn_hv_secfpn_sbn-all_16xb2-2x_nus-3d.py
+_base_ = [
+    '../_base_/models/pointpillars_hv_fpn_nus.py',
+    '../_base_/datasets/nus-3d.py',
+    '../_base_/schedules/schedule-2x.py',
+    '../_base_/default_runtime.py',
+]
+point_cloud_range = [-50, -50, -5, 50, 50, 3]
+# Note that the order of class names should be consistent with
+# the following anchors' order
+class_names = [
+    'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier', 'car',
+    'truck', 'trailer', 'bus', 'construction_vehicle'
+]
+backend_args = None
+
+train_pipeline = [
+    dict(
+        type='LoadPointsFromFile',
+        coord_type='LIDAR',
+        load_dim=5,
+        use_dim=5,
+        backend_args=backend_args),
+    dict(
+        type='LoadPointsFromMultiSweeps',
+        sweeps_num=10,
+        backend_args=backend_args),
+    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
+    dict(
+        type='GlobalRotScaleTrans',
+        rot_range=[-0.3925, 0.3925],
+        scale_ratio_range=[0.95, 1.05],
+        translation_std=[0, 0, 0]),
+    dict(
+        type='RandomFlip3D',
+        sync_2d=False,
+        flip_ratio_bev_horizontal=0.5,
+        flip_ratio_bev_vertical=0.5),
+    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='PointShuffle'),
+    dict(
+        type='Pack3DDetInputs',
+        keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
+]
+test_pipeline = [
+    dict(
+        type='LoadPointsFromFile',
+        coord_type='LIDAR',
+        load_dim=5,
+        use_dim=5,
+        backend_args=backend_args),
+    dict(
+        type='LoadPointsFromMultiSweeps',
+        sweeps_num=10,
+        backend_args=backend_args),
+    dict(
+        type='MultiScaleFlipAug3D',
+        img_scale=(1333, 800),
+        pts_scale_ratio=1,
+        flip=False,
+        transforms=[
+            dict(
+                type='GlobalRotScaleTrans',
+                rot_range=[0, 0],
+                scale_ratio_range=[1., 1.],
+                translation_std=[0, 0, 0]),
+            dict(type='RandomFlip3D'),
+            dict(
+                type='PointsRangeFilter', point_cloud_range=point_cloud_range)
+        ]),
+    dict(type='Pack3DDetInputs', keys=['points'])
+]
+train_dataloader = dict(
+    batch_size=2,
+    num_workers=4,
+    dataset=dict(pipeline=train_pipeline, metainfo=dict(classes=class_names)))
+test_dataloader = dict(
+    dataset=dict(pipeline=test_pipeline, metainfo=dict(classes=class_names)))
+val_dataloader = dict(
+    dataset=dict(pipeline=test_pipeline, metainfo=dict(classes=class_names)))
+
+# model settings
+model = dict(
+    data_preprocessor=dict(voxel_layer=dict(max_num_points=20)),
+    pts_voxel_encoder=dict(feat_channels=[64, 64]),
+    pts_neck=dict(
+        _delete_=True,
+        type='SECONDFPN',
+        norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
+        in_channels=[64, 128, 256],
+        upsample_strides=[1, 2, 4],
+        out_channels=[128, 128, 128]),
+    pts_bbox_head=dict(
+        _delete_=True,
+        type='ShapeAwareHead',
+        num_classes=10,
+        in_channels=384,
+        feat_channels=384,
+        use_direction_classifier=True,
+        anchor_generator=dict(
+            type='AlignedAnchor3DRangeGeneratorPerCls',
+            ranges=[[-50, -50, -1.67339111, 50, 50, -1.67339111],
+                    [-50, -50, -1.71396371, 50, 50, -1.71396371],
+                    [-50, -50, -1.61785072, 50, 50, -1.61785072],
+                    [-50, -50, -1.80984986, 50, 50, -1.80984986],
+                    [-50, -50, -1.76396500, 50, 50, -1.76396500],
+                    [-50, -50, -1.80032795, 50, 50, -1.80032795],
+                    [-50, -50, -1.74440365, 50, 50, -1.74440365],
+                    [-50, -50, -1.68526504, 50, 50, -1.68526504],
+                    [-50, -50, -1.80673031, 50, 50, -1.80673031],
+                    [-50, -50, -1.64824291, 50, 50, -1.64824291]],
+            sizes=[
+                [1.68452161, 0.60058911, 1.27192197],  # bicycle
+                [2.09973778, 0.76279481, 1.44403034],  # motorcycle
+                [0.72564370, 0.66344886, 1.75748069],  # pedestrian
+                [0.40359262, 0.39694519, 1.06232151],  # traffic cone
+                [0.48578221, 2.49008838, 0.98297065],  # barrier
+                [4.60718145, 1.95017717, 1.72270761],  # car
+                [6.73778078, 2.45609390, 2.73004906],  # truck
+                [12.01320693, 2.87427237, 3.81509561],  # trailer
+                [11.1885991, 2.94046906, 3.47030982],  # bus
+                [6.38352896, 2.73050468, 3.13312415]  # construction vehicle
+            ],
+            custom_values=[0, 0],
+            rotations=[0, 1.57],
+            reshape_out=False),
+        tasks=[
+            dict(
+                num_class=2,
+                class_names=['bicycle', 'motorcycle'],
+                shared_conv_channels=(64, 64),
+                shared_conv_strides=(1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01)),
+            dict(
+                num_class=1,
+                class_names=['pedestrian'],
+                shared_conv_channels=(64, 64),
+                shared_conv_strides=(1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01)),
+            dict(
+                num_class=2,
+                class_names=['traffic_cone', 'barrier'],
+                shared_conv_channels=(64, 64),
+                shared_conv_strides=(1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01)),
+            dict(
+                num_class=1,
+                class_names=['car'],
+                shared_conv_channels=(64, 64, 64),
+                shared_conv_strides=(2, 1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01)),
+            dict(
+                num_class=4,
+                class_names=[
+                    'truck', 'trailer', 'bus', 'construction_vehicle'
+                ],
+                shared_conv_channels=(64, 64, 64),
+                shared_conv_strides=(2, 1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01))
+        ],
+        assign_per_class=True,
+        diff_rad_by_sin=True,
+        dir_offset=-0.7854,  # -pi/4
+        dir_limit_offset=0,
+        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=9),
+        loss_cls=dict(
+            type='mmdet.FocalLoss',
+            use_sigmoid=True,
+            gamma=2.0,
+            alpha=0.25,
+            loss_weight=1.0),
+        loss_bbox=dict(
+            type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
+        loss_dir=dict(
+            type='mmdet.CrossEntropyLoss', use_sigmoid=False,
+            loss_weight=0.2)),
+    # model training and testing settings
+    train_cfg=dict(
+        _delete_=True,
+        pts=dict(
+            assigner=[
+                dict(  # bicycle
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.5,
+                    neg_iou_thr=0.35,
+                    min_pos_iou=0.35,
+                    ignore_iof_thr=-1),
+                dict(  # motorcycle
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.5,
+                    neg_iou_thr=0.3,
+                    min_pos_iou=0.3,
+                    ignore_iof_thr=-1),
+                dict(  # pedestrian
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.6,
+                    neg_iou_thr=0.4,
+                    min_pos_iou=0.4,
+                    ignore_iof_thr=-1),
+                dict(  # traffic cone
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.6,
+                    neg_iou_thr=0.4,
+                    min_pos_iou=0.4,
+                    ignore_iof_thr=-1),
+                dict(  # barrier
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.55,
+                    neg_iou_thr=0.4,
+                    min_pos_iou=0.4,
+                    ignore_iof_thr=-1),
+                dict(  # car
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.6,
+                    neg_iou_thr=0.45,
+                    min_pos_iou=0.45,
+                    ignore_iof_thr=-1),
+                dict(  # truck
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.55,
+                    neg_iou_thr=0.4,
+                    min_pos_iou=0.4,
+                    ignore_iof_thr=-1),
+                dict(  # trailer
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.5,
+                    neg_iou_thr=0.35,
+                    min_pos_iou=0.35,
+                    ignore_iof_thr=-1),
+                dict(  # bus
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.55,
+                    neg_iou_thr=0.4,
+                    min_pos_iou=0.4,
+                    ignore_iof_thr=-1),
+                dict(  # construction vehicle
+                    type='Max3DIoUAssigner',
+                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                    pos_iou_thr=0.5,
+                    neg_iou_thr=0.35,
+                    min_pos_iou=0.35,
+                    ignore_iof_thr=-1)
+            ],
+            allowed_border=0,
+            code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2],
+            pos_weight=-1,
+            debug=False)))
--- a/mmdetection3d/configs/votenet/README.md
+++ b/mmdetection3d/configs/votenet/README.md
+# Deep Hough Voting for 3D Object Detection in Point Clouds
+
+> [Deep Hough Voting for 3D Object Detection in Point Clouds](https://arxiv.org/abs/1904.09664)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+Current 3D object detection methods are heavily influenced by 2D detectors. In order to leverage architectures in 2D detectors, they often convert 3D point clouds to regular grids (i.e., to voxel grids or to bird's eye view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to directly detect objects in point clouds. In this work, we return to first principles to construct a 3D detection pipeline for point cloud data and as generic as possible. However, due to the sparse nature of the data -- samples from 2D manifolds in 3D space -- we face a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point thus hard to regress accurately in one step. To address the challenge, we propose VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting. Our model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D with a simple design, compact model size and high efficiency. Remarkably, VoteNet outperforms previous methods by using purely geometric information without relying on color images.
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/79644370/143888295-af7435b4-9f75-4669-b5f8-a19ae24a051c.png" width="800"/>
+</div>
+
+## Introduction
+
+We implement VoteNet and provide results and checkpoints on the ScanNet and SUNRGBD datasets.
+
+## Results and models
+
+### ScanNet
+
+|                  Backbone                  | Lr schd | Mem (GB) | Inf time (fps) | AP@0.25 | AP@0.5 |                                                                                                                                                                  Download                                                                                                                                                                  |
+| :----------------------------------------: | :-----: | :------: | :------------: | :-----: | :----: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| [PointNet++](./votenet_8xb8_scannet-3d.py) |   3x    |   4.1    |                |  62.34  | 40.82  | [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/votenet/votenet_8x8_scannet-3d-18class/votenet_8x8_scannet-3d-18class_20210823_234503-cf8134fa.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/votenet/votenet_8x8_scannet-3d-18class/votenet_8x8_scannet-3d-18class_20210823_234503.log.json) |
+
+### SUNRGBD
+
+|                  Backbone                   | Lr schd | Mem (GB) | Inf time (fps) | AP@0.25 | AP@0.5 |                                                                                                                                                                    Download                                                                                                                                                                    |
+| :-----------------------------------------: | :-----: | :------: | :------------: | :-----: | :----: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| [PointNet++](./votenet_8xb16_sunrgbd-3d.py) |   3x    |   8.1    |                |  59.78  | 35.77  | [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/votenet/votenet_16x8_sunrgbd-3d-10class/votenet_16x8_sunrgbd-3d-10class_20210820_162823-bf11f014.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/votenet/votenet_16x8_sunrgbd-3d-10class/votenet_16x8_sunrgbd-3d-10class_20210820_162823.log.json) |
+
+**Notice**: If you are using mmdetection3d >= 0.6.0 with checkpoints downloaded from the links above, or with checkpoints trained on mmdetection3d \< 0.6.0, you must first convert them with [tools/model_converters/convert_votenet_checkpoints.py](../../tools/model_converters/convert_votenet_checkpoints.py):
+
+```shell
+python ./tools/model_converters/convert_votenet_checkpoints.py ${ORIGINAL_CHECKPOINT_PATH} --out=${NEW_CHECKPOINT_PATH}
+```
+
+Then you can use the converted checkpoints following [get_started.md](../../docs/en/get_started.md).
+
+## Indeterminism
+
+Test data preparation randomly downsamples the points. The test script fixes the random seed, but validation during training does not, so test results may differ slightly from the results reported above.
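+
+The effect of seeding on random downsampling can be illustrated with a small sketch. It assumes NumPy and a hypothetical helper `sample_points`; it is not mmdet3d's own point-sampling transform. With a fixed seed the same subset of points is kept on every run, while `seed=None` draws a fresh subset each time, which is why unseeded validation can report slightly different numbers.
+
+```python
+import numpy as np
+
+
+def sample_points(points, num_samples, seed=None):
+    """Randomly keep `num_samples` rows of an (N, 3) point array.
+
+    Hypothetical helper for illustration only.
+    """
+    # A generator built from the same seed always yields the same indices.
+    rng = np.random.default_rng(seed)
+    idx = rng.choice(len(points), size=num_samples, replace=False)
+    return points[idx]
+
+
+points = np.random.default_rng(123).random((1000, 3))
+run_1 = sample_points(points, 100, seed=0)
+run_2 = sample_points(points, 100, seed=0)
+print(np.array_equal(run_1, run_2))  # True: identical subsets with a fixed seed
+```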
+
+## IoU loss
+
+Adding an IoU loss (simply 1 - IoU) improves VoteNet's performance on ScanNet (AP@0.5 rises from 40.82 to 44.21). To use it, add this loss term to `bbox_head` in the config file:
+
+```python
+iou_loss=dict(type='AxisAlignedIoULoss', reduction='sum', loss_weight=10.0 / 3.0)
+```
+
+|                        Backbone                         | Lr schd | Mem (GB) | Inf time (fps) | AP@0.25 | AP@0.5 | Download |
+| :-----------------------------------------------------: | :-----: | :------: | :------------: | :-----: | :----: | :------: |
+| [PointNet++](./votenet_head-iouloss_8xb8_scannet-3d.py) |   3x    |   4.1    |                |  63.81  | 44.21  |    /     |
+
+Currently, IoU loss is supported only for axis-aligned bounding boxes, because the CUDA op for general 3D IoU does not implement a backward pass. It can therefore only be used on ScanNet for now, whose VoteNet config predicts axis-aligned boxes (`with_rot=False`).
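+
+The quantity being added can be sketched in plain Python. This assumes boxes given as corner coordinates `(x1, y1, z1, x2, y2, z2)`; the helper names are hypothetical, and this is not mmdet3d's `AxisAlignedIoULoss` implementation. It mirrors the `reduction='sum'` and `loss_weight=10.0 / 3.0` settings from the snippet earlier in this section. For axis-aligned boxes the intersection is just a product of per-axis interval overlaps; rotated boxes would need a polygon intersection instead.
+
+```python
+def axis_aligned_iou_3d(box_a, box_b):
+    """IoU of two axis-aligned 3D boxes given as (x1, y1, z1, x2, y2, z2)."""
+    # Intersection volume: product of per-axis overlaps, clamped at zero.
+    inter = 1.0
+    for axis in range(3):
+        low = max(box_a[axis], box_b[axis])
+        high = min(box_a[axis + 3], box_b[axis + 3])
+        inter *= max(high - low, 0.0)
+
+    def volume(box):
+        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])
+
+    union = volume(box_a) + volume(box_b) - inter
+    return inter / union if union > 0 else 0.0
+
+
+def iou_loss_sum(pred_boxes, target_boxes, loss_weight=10.0 / 3.0):
+    """Hypothetical helper: loss_weight * sum over box pairs of (1 - IoU)."""
+    return loss_weight * sum(
+        1.0 - axis_aligned_iou_3d(p, t) for p, t in zip(pred_boxes, target_boxes))
+
+
+# Two 2x2x2 boxes shifted by 1 along x: IoU = 4 / 12 = 1/3.
+print(iou_loss_sum([(0, 0, 0, 2, 2, 2)], [(1, 0, 0, 3, 2, 2)]))  # about 2.222 (20/9)
+```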
+
+## Citation
+
+```latex
+@inproceedings{qi2019deep,
+    author = {Qi, Charles R and Litany, Or and He, Kaiming and Guibas, Leonidas J},
+    title = {Deep Hough Voting for 3D Object Detection in Point Clouds},
+    booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
+    year = {2019}
+}
+```
--- a/mmdetection3d/configs/votenet/metafile.yml
+++ b/mmdetection3d/configs/votenet/metafile.yml
+Collections:
+  - Name: VoteNet
+    Metadata:
+      Training Techniques:
+        - AdamW
+      Training Resources: 8x V100 GPUs
+      Architecture:
+        - PointNet++
+    Paper:
+      URL: https://arxiv.org/abs/1904.09664
+      Title: 'Deep Hough Voting for 3D Object Detection in Point Clouds'
+    README: configs/votenet/README.md
+    Code:
+      URL: https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/models/detectors/votenet.py#L10
+      Version: v0.5.0
+
+Models:
+  - Name: votenet_8xb16_sunrgbd-3d.py
+    In Collection: VoteNet
+    Config: configs/votenet/votenet_8xb16_sunrgbd-3d.py
+    Metadata:
+      Training Data: SUNRGBD
+      Training Memory (GB): 8.1
+    Results:
+      - Task: 3D Object Detection
+        Dataset: SUNRGBD
+        Metrics:
+          AP@0.25: 59.78
+          AP@0.5: 35.77
+    Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/votenet/votenet_16x8_sunrgbd-3d-10class/votenet_16x8_sunrgbd-3d-10class_20210820_162823-bf11f014.pth
+
+  - Name: votenet_8xb8_scannet-3d.py
+    In Collection: VoteNet
+    Config: configs/votenet/votenet_8xb8_scannet-3d.py
+    Metadata:
+      Training Data: ScanNet
+      Training Memory (GB): 4.1
+    Results:
+      - Task: 3D Object Detection
+        Dataset: ScanNet
+        Metrics:
+          AP@0.25: 62.34
+          AP@0.5: 40.82
+    Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/votenet/votenet_8x8_scannet-3d-18class/votenet_8x8_scannet-3d-18class_20210823_234503-cf8134fa.pth
+
+  - Name: votenet_iouloss_8x8_scannet-3d-18class
+    In Collection: VoteNet
+    Config: configs/votenet/votenet_head-iouloss_8xb8_scannet-3d.py
+    Metadata:
+      Training Data: ScanNet
+      Training Memory (GB): 4.1
+      Architecture:
+        - IoU Loss
+    Results:
+      - Task: 3D Object Detection
+        Dataset: ScanNet
+        Metrics:
+          AP@0.25: 63.81
+          AP@0.5: 44.21
--- a/mmdetection3d/configs/votenet/votenet_8xb16_sunrgbd-3d.py
+++ b/mmdetection3d/configs/votenet/votenet_8xb16_sunrgbd-3d.py
+# TODO refactor the config of sunrgbd
+_base_ = [
+    '../_base_/datasets/sunrgbd-3d.py', '../_base_/models/votenet.py',
+    '../_base_/schedules/schedule-3x.py', '../_base_/default_runtime.py'
+]
+# model settings
+model = dict(
+    bbox_head=dict(
+        num_classes=10,
+        bbox_coder=dict(
+            type='PartialBinBasedBBoxCoder',
+            num_sizes=10,
+            num_dir_bins=12,
+            with_rot=True,
+            mean_sizes=[
+                [2.114256, 1.620300, 0.927272], [0.791118, 1.279516, 0.718182],
+                [0.923508, 1.867419, 0.845495], [0.591958, 0.552978, 0.827272],
+                [0.699104, 0.454178, 0.75625], [0.69519, 1.346299, 0.736364],
+                [0.528526, 1.002642, 1.172878], [0.500618, 0.632163, 0.683424],
+                [0.404671, 1.071108, 1.688889], [0.76584, 1.398258, 0.472728]
+            ])))
+# Default setting for scaling LR automatically
+#   - `enable` means enable scaling LR automatically
+#       or not by default.
+#   - `base_batch_size` = (8 GPUs) x (16 samples per GPU).
+auto_scale_lr = dict(enable=False, base_batch_size=128)
--- a/mmdetection3d/configs/votenet/votenet_8xb8_scannet-3d.py
+++ b/mmdetection3d/configs/votenet/votenet_8xb8_scannet-3d.py
+_base_ = [
+    '../_base_/datasets/scannet-3d.py', '../_base_/models/votenet.py',
+    '../_base_/schedules/schedule-3x.py', '../_base_/default_runtime.py'
+]
+
+# model settings
+model = dict(
+    bbox_head=dict(
+        num_classes=18,
+        bbox_coder=dict(
+            type='PartialBinBasedBBoxCoder',
+            num_sizes=18,
+            num_dir_bins=1,
+            with_rot=False,
+            mean_sizes=[[0.76966727, 0.8116021, 0.92573744],
+                        [1.876858, 1.8425595, 1.1931566],
+                        [0.61328, 0.6148609, 0.7182701],
+                        [1.3955007, 1.5121545, 0.83443564],
+                        [0.97949594, 1.0675149, 0.6329687],
+                        [0.531663, 0.5955577, 1.7500148],
+                        [0.9624706, 0.72462326, 1.1481868],
+                        [0.83221924, 1.0490936, 1.6875663],
+                        [0.21132214, 0.4206159, 0.5372846],
+                        [1.4440073, 1.8970833, 0.26985747],
+                        [1.0294262, 1.4040797, 0.87554324],
+                        [1.3766412, 0.65521795, 1.6813129],
+                        [0.6650819, 0.71111923, 1.298853],
+                        [0.41999173, 0.37906948, 1.7513971],
+                        [0.59359556, 0.5912492, 0.73919016],
+                        [0.50867593, 0.50656086, 0.30136237],
+                        [1.1511526, 1.0546296, 0.49706793],
+                        [0.47535285, 0.49249494, 0.5802117]])))
+
+default_hooks = dict(logger=dict(type='LoggerHook', interval=30))
+# Default setting for scaling LR automatically
+#   - `enable` means enable scaling LR automatically
+#       or not by default.
+#   - `base_batch_size` = (8 GPUs) x (8 samples per GPU).
+auto_scale_lr = dict(enable=False, base_batch_size=64)
--- a/mmdetection3d/configs/votenet/votenet_head-iouloss_8xb8_scannet-3d.py
+++ b/mmdetection3d/configs/votenet/votenet_head-iouloss_8xb8_scannet-3d.py
+_base_ = ['./votenet_8xb8_scannet-3d.py']
+
+# model settings, add iou loss
+model = dict(
+    bbox_head=dict(
+        iou_loss=dict(
+            type='AxisAlignedIoULoss',
+            reduction='sum',
+            loss_weight=10.0 / 3.0)))
--- a/mmdetection3d/dataset-index.yml
+++ b/mmdetection3d/dataset-index.yml
+kitti:
+  # Name of the dataset on OpenDataLab; see
+  # https://opendatalab.com/KITTI_Object/cli. You can also download it
+  # independently by running `odl get ${dataset}`.
+  dataset: KITTI_Object
+  download_root: data
+  data_root: data/kitti
+  # Scripts for unzipping datasets
+  script: tools/dataset_converters/kitti_unzip.sh
+
+nuscenes:
+  # Name of the dataset on OpenDataLab; see
+  # https://opendatalab.com/nuScenes/cli. You can also download it
+  # independently by running `odl get ${dataset}`.
+  dataset: nuScenes
+  download_root: data
+  data_root: data/nuscenes
+  # Scripts for unzipping datasets
+  script: tools/dataset_converters/nuscenes_unzip.sh
+
+semantickitti:
+  # Name of the dataset on OpenDataLab; see
+  # https://opendatalab.com/SemanticKITTI/cli. You can also download it
+  # independently by running `odl get ${dataset}`.
+  dataset: SemanticKITTI
+  download_root: data
+  data_root: data/semantickitti
+  # Scripts for unzipping datasets
+  script: tools/dataset_converters/semantickitti_unzip.sh