# Structure Aware Single-stage 3D Object Detection from Point Cloud
> [Structure Aware Single-stage 3D Object Detection from Point Cloud](https://openaccess.thecvf.com/content_CVPR_2020/papers/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.pdf)
<!-- [ALGORITHM] -->
## Abstract
3D object detection from point cloud data plays an essential role in autonomous driving. Current single-stage detectors are efficient because they progressively downscale the 3D point clouds in a fully convolutional manner. However, the downscaled features inevitably lose spatial information and cannot make full use of the structure information of the 3D point cloud, degrading their localization precision. In this work, we propose to improve the localization precision of single-stage detectors by explicitly leveraging the structure information of the 3D point cloud. Specifically, we design an auxiliary network which converts the convolutional features in the backbone network back to point-level representations. The auxiliary network is jointly optimized, by two point-level supervisions, to guide the convolutional features in the backbone network to be aware of the object structure. The auxiliary network can be detached after training and therefore introduces no extra computation in the inference stage. Besides, considering that single-stage detectors suffer from the discordance between the predicted bounding boxes and corresponding classification confidences, we develop an efficient part-sensitive warping operation to align the confidences to the predicted bounding boxes. Our proposed detector ranks at the top of the KITTI 3D/BEV detection leaderboards and runs at 25 FPS for inference.
<div align=center>
<img src="https://user-images.githubusercontent.com/30491025/172526367-c8b9bdf7-f901-4f2f-8855-bfd55c39f8d1.png" width="800"/>
</div>
## Introduction
We implement SA-SSD and provide the results and checkpoints on the KITTI dataset.
## Citation
```latex
@InProceedings{he2020sassd,
title={Structure Aware Single-stage 3D Object Detection from Point Cloud},
author={He, Chenhang and Zeng, Hui and Huang, Jianqiang and Hua, Xian-Sheng and Zhang, Lei},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2020}
}
```
_base_ = [
'../_base_/datasets/kitti-3d-3class.py',
'../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
]
voxel_size = [0.05, 0.05, 0.1]
model = dict(
type='SASSD',
data_preprocessor=dict(
type='Det3DDataPreprocessor',
voxel=True,
voxel_layer=dict(
max_num_points=5,
point_cloud_range=[0, -40, -3, 70.4, 40, 1],
voxel_size=voxel_size,
max_voxels=(16000, 40000))),
voxel_encoder=dict(type='HardSimpleVFE'),
middle_encoder=dict(
type='SparseEncoderSASSD',
in_channels=4,
sparse_shape=[41, 1600, 1408],
order=('conv', 'norm', 'act')),
backbone=dict(
type='SECOND',
in_channels=256,
layer_nums=[5, 5],
layer_strides=[1, 2],
out_channels=[128, 256]),
neck=dict(
type='SECONDFPN',
in_channels=[128, 256],
upsample_strides=[1, 2],
out_channels=[256, 256]),
bbox_head=dict(
type='Anchor3DHead',
num_classes=3,
in_channels=512,
feat_channels=512,
use_direction_classifier=True,
anchor_generator=dict(
type='Anchor3DRangeGenerator',
ranges=[
[0, -40.0, -0.6, 70.4, 40.0, -0.6],
[0, -40.0, -0.6, 70.4, 40.0, -0.6],
[0, -40.0, -1.78, 70.4, 40.0, -1.78],
],
sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
rotations=[0, 1.57],
reshape_out=False),
diff_rad_by_sin=True,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
loss_cls=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(
type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
loss_dir=dict(
type='mmdet.CrossEntropyLoss', use_sigmoid=False,
loss_weight=0.2)),
# model training and testing settings
train_cfg=dict(
assigner=[
dict( # for Pedestrian
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.35,
neg_iou_thr=0.2,
min_pos_iou=0.2,
ignore_iof_thr=-1),
dict( # for Cyclist
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.35,
neg_iou_thr=0.2,
min_pos_iou=0.2,
ignore_iof_thr=-1),
dict( # for Car
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.6,
neg_iou_thr=0.45,
min_pos_iou=0.45,
ignore_iof_thr=-1),
],
allowed_border=0,
pos_weight=-1,
debug=False),
test_cfg=dict(
use_rotate_nms=True,
nms_across_levels=False,
nms_thr=0.01,
score_thr=0.1,
min_bbox_size=0,
nms_pre=100,
max_num=50))
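The `sparse_shape=[41, 1600, 1408]` passed to `SparseEncoderSASSD` follows directly from `point_cloud_range` and `voxel_size` in the config above. A minimal sketch of the arithmetic (plain Python; the one extra cell along z is an assumption based on how SECOND-style sparse encoders pad the grid):

```python
# Derive sparse_shape from the voxel settings in the config above.
point_cloud_range = [0, -40, -3, 70.4, 40, 1]  # [x_min, y_min, z_min, x_max, y_max, z_max]
voxel_size = [0.05, 0.05, 0.1]                 # [dx, dy, dz]

nx = round((point_cloud_range[3] - point_cloud_range[0]) / voxel_size[0])  # 1408
ny = round((point_cloud_range[4] - point_cloud_range[1]) / voxel_size[1])  # 1600
nz = round((point_cloud_range[5] - point_cloud_range[2]) / voxel_size[2])  # 40

# assumption: one extra voxel of padding along z, hence 41
print([nz + 1, ny, nx])  # -> [41, 1600, 1408]
```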
# SECOND: Sparsely Embedded Convolutional Detection
> [SECOND: Sparsely Embedded Convolutional Detection](https://www.mdpi.com/1424-8220/18/10/3337)
<!-- [ALGORITHM] -->
## Abstract
LiDAR-based or RGB-D-based object detection is used in numerous applications, ranging from autonomous driving to robot vision. Voxel-based 3D convolutional networks have been used for some time to enhance the retention of information when processing point cloud LiDAR data. However, problems remain, including a slow inference speed and low orientation estimation performance. We therefore investigate an improved sparse convolution method for such networks, which significantly increases the speed of both training and inference. We also introduce a new form of angle loss regression to improve the orientation estimation performance and a new data augmentation approach that can enhance the convergence speed and performance. The proposed network produces state-of-the-art results on the KITTI 3D object detection benchmarks while maintaining a fast inference speed.
<div align=center>
<img src="https://user-images.githubusercontent.com/79644370/143889364-10be11c3-838e-4fc9-9613-184f0cd08907.png" width="800"/>
</div>
## Introduction
We implement SECOND and provide the results and checkpoints on the KITTI dataset.
## Results and models
### KITTI
| Backbone | Class | Lr schd | Mem (GB) | Inf time (fps) | mAP | Download |
| :-----------------------------------------------------------------: | :-----: | :--------: | :------: | :------------: | :---: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [SECFPN](./second_hv_secfpn_8xb6-80e_kitti-3d-car.py) | Car | cyclic 80e | 5.4 | | 78.2 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/second/second_hv_secfpn_8xb6-80e_kitti-3d-car/second_hv_secfpn_8xb6-80e_kitti-3d-car-75d9305e.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/second/second_hv_secfpn_8xb6-80e_kitti-3d-car/second_hv_secfpn_8xb6-80e_kitti-3d-car-20230420_191750.log) |
| [SECFPN (FP16)](./second_hv_secfpn_8xb6-amp-80e_kitti-3d-car.py) | Car | cyclic 80e | 2.9 | | 78.72 | [model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car_20200924_211301-1f5ad833.pth)\| [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car_20200924_211301.log.json) |
| [SECFPN](./second_hv_secfpn_8xb6-80e_kitti-3d-3class.py) | 3 Class | cyclic 80e | 5.4 | | 65.3 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class/second_hv_secfpn_8xb6-80e_kitti-3d-3class-b086d0a3.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class/second_hv_secfpn_8xb6-80e_kitti-3d-3class-20230420_221130.log) |
| [SECFPN (FP16)](./second_hv_secfpn_8xb6-amp-80e_kitti-3d-3class.py) | 3 Class | cyclic 80e | 2.9 | | 67.4 | [model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class_20200925_110059-05f67bdf.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class_20200925_110059.log.json) |
### Waymo
| Backbone | Load Interval | Class | Lr schd | Mem (GB) | Inf time (fps) | mAP@L1 | mAPH@L1 | mAP@L2 | **mAPH@L2** | Download |
| :----------------------------------------------------------------: | :-----------: | :-----: | :-----: | :------: | :------------: | :----: | :-----: | :----: | :---------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [SECFPN](./second_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py) | 5 | 3 Class | 2x | 8.12 | | 65.3 | 61.7 | 58.9 | 55.7 | [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/second/hv_second_secfpn_sbn_4x8_2x_waymoD5-3d-3class/hv_second_secfpn_sbn_4x8_2x_waymoD5-3d-3class_20201115_112448.log.json) |
| above @ Car | | | 2x | 8.12 | | 67.1 | 66.6 | 58.7 | 58.2 | |
| above @ Pedestrian | | | 2x | 8.12 | | 68.1 | 59.1 | 59.5 | 51.5 | |
| above @ Cyclist | | | 2x | 8.12 | | 60.7 | 59.5 | 58.4 | 57.3 | |
Note:
- See more details about metrics and the data split on Waymo [HERE](https://github.com/open-mmlab/mmdetection3d/tree/main/configs/pointpillars). For implementation details, we basically follow the original settings. All of these results are achieved without bells and whistles, e.g., ensembles, multi-scale training, and test augmentation.
- `FP16` means Mixed Precision (FP16) is adopted in training.
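For a quick sanity check with the checkpoints above, a minimal inference sketch (assuming an MMDetection3D v1.1 environment; the checkpoint filename and point-cloud path are placeholders for files you have downloaded):

```python
from mmdet3d.apis import inference_detector, init_model

config = 'configs/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class.py'
checkpoint = 'second_hv_secfpn_8xb6-80e_kitti-3d-3class-b086d0a3.pth'  # downloaded weights

model = init_model(config, checkpoint, device='cuda:0')
# any KITTI-format .bin point cloud works here
result, data = inference_detector(model, 'demo/data/kitti/000008.bin')
print(result.pred_instances_3d.bboxes_3d)
```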
## Citation
```latex
@article{yan2018second,
title={Second: Sparsely embedded convolutional detection},
author={Yan, Yan and Mao, Yuxing and Li, Bo},
journal={Sensors},
year={2018},
publisher={Multidisciplinary Digital Publishing Institute}
}
```
Collections:
- Name: SECOND
Metadata:
Training Techniques:
- AdamW
Architecture:
- Hard Voxelization
Paper:
URL: https://www.mdpi.com/1424-8220/18/10/3337
Title: 'SECOND: Sparsely Embedded Convolutional Detection'
README: configs/second/README.md
Code:
URL: https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/models/backbones/second.py#L11
Version: v0.5.0
Models:
- Name: second_hv_secfpn_8xb6-80e_kitti-3d-car
In Collection: SECOND
Config: configs/second/second_hv_secfpn_8xb6-80e_kitti-3d-car.py
Metadata:
Training Data: KITTI
Training Memory (GB): 5.4
Training Resources: 8x V100 GPUs
Results:
- Task: 3D Object Detection
Dataset: KITTI
Metrics:
mAP: 78.2
Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/second/second_hv_secfpn_8xb6-80e_kitti-3d-car/second_hv_secfpn_8xb6-80e_kitti-3d-car-75d9305e.pth
- Name: second_hv_secfpn_8xb6-80e_kitti-3d-3class
In Collection: SECOND
Config: configs/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class.py
Metadata:
Training Data: KITTI
Training Memory (GB): 5.4
Training Resources: 8x V100 GPUs
Results:
- Task: 3D Object Detection
Dataset: KITTI
Metrics:
mAP: 65.3
Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class/second_hv_secfpn_8xb6-80e_kitti-3d-3class-b086d0a3.pth
- Name: second_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class
In Collection: SECOND
Config: configs/second/second_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py
Metadata:
Training Data: Waymo
Training Memory (GB): 8.12
Training Resources: 8x GeForce GTX 1080 Ti
Results:
- Task: 3D Object Detection
Dataset: Waymo
Metrics:
mAP@L1: 65.3
mAPH@L1: 61.7
mAP@L2: 58.9
mAPH@L2: 55.7
- Name: second_hv_secfpn_8xb6-amp-80e_kitti-3d-car
In Collection: SECOND
Config: configs/second/second_hv_secfpn_8xb6-amp-80e_kitti-3d-car.py
Metadata:
Training Techniques:
- AdamW
- Mixed Precision Training
Training Resources: 8x TITAN Xp
Training Data: KITTI
Training Memory (GB): 2.9
Results:
- Task: 3D Object Detection
Dataset: KITTI
Metrics:
mAP: 78.72
Weights: https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car_20200924_211301-1f5ad833.pth
Code:
Version: v0.7.0
- Name: second_hv_secfpn_8xb6-amp-80e_kitti-3d-3class
In Collection: SECOND
Config: configs/second/second_hv_secfpn_8xb6-amp-80e_kitti-3d-3class.py
Metadata:
Training Techniques:
- AdamW
- Mixed Precision Training
Training Resources: 8x TITAN Xp
Training Data: KITTI
Training Memory (GB): 2.9
Results:
- Task: 3D Object Detection
Dataset: KITTI
Metrics:
mAP: 67.4
Weights: https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class_20200925_110059-05f67bdf.pth
Code:
Version: v0.7.0
_base_ = [
'../_base_/models/second_hv_secfpn_kitti.py',
'../_base_/datasets/kitti-3d-3class.py',
'../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
]
_base_ = [
'../_base_/models/second_hv_secfpn_kitti.py',
'../_base_/datasets/kitti-3d-car.py', '../_base_/schedules/cyclic-40e.py',
'../_base_/default_runtime.py'
]
point_cloud_range = [0, -40, -3, 70.4, 40, 1]
model = dict(
bbox_head=dict(
type='Anchor3DHead',
num_classes=1,
anchor_generator=dict(
_delete_=True,
type='Anchor3DRangeGenerator',
ranges=[[0, -40.0, -1.78, 70.4, 40.0, -1.78]],
sizes=[[3.9, 1.6, 1.56]],
rotations=[0, 1.57],
reshape_out=True)),
# model training and testing settings
train_cfg=dict(
_delete_=True,
assigner=dict(
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.6,
neg_iou_thr=0.45,
min_pos_iou=0.45,
ignore_iof_thr=-1),
allowed_border=0,
pos_weight=-1,
debug=False))
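The `_delete_=True` keys above are an MMEngine config mechanism: instead of deep-merging the child dict into the inherited base entry, the base entry is dropped and replaced wholesale. A small illustration using MMEngine's (private) merge helper, shown only to make the semantics concrete:

```python
from mmengine.config import Config

base = dict(head=dict(type='A', ranges=[[0, -40]], keep_me=True))
child = dict(head=dict(_delete_=True, type='B', ranges=[[0, -70]]))

# without _delete_, 'keep_me' would survive the merge; with it, the
# base 'head' entry is replaced entirely by the child's version
merged = Config._merge_a_into_b(child, base)
print(merged)  # {'head': {'type': 'B', 'ranges': [[0, -70]]}}
```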
_base_ = 'second_hv_secfpn_8xb6-80e_kitti-3d-3class.py'
# schedule settings
optim_wrapper = dict(type='AmpOptimWrapper', loss_scale=4096.)
_base_ = 'second_hv_secfpn_8xb6-80e_kitti-3d-car.py'
# schedule settings
optim_wrapper = dict(type='AmpOptimWrapper', loss_scale=4096.)
_base_ = [
'../_base_/models/second_hv_secfpn_waymo.py',
'../_base_/datasets/waymoD5-3d-3class.py',
'../_base_/schedules/schedule-2x.py',
'../_base_/default_runtime.py',
]
dataset_type = 'WaymoDataset'
data_root = 'data/waymo/kitti_format/'
class_names = ['Car', 'Pedestrian', 'Cyclist']
metainfo = dict(classes=class_names)
point_cloud_range = [-76.8, -51.2, -2, 76.8, 51.2, 4]
input_modality = dict(use_lidar=True, use_camera=False)
backend_args = None
db_sampler = dict(
data_root=data_root,
info_path=data_root + 'waymo_dbinfos_train.pkl',
rate=1.0,
prepare=dict(
filter_by_difficulty=[-1],
filter_by_min_points=dict(Car=5, Pedestrian=5, Cyclist=5)),
classes=class_names,
sample_groups=dict(Car=15, Pedestrian=10, Cyclist=10),
points_loader=dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=6,
use_dim=[0, 1, 2, 3, 4],
backend_args=backend_args),
backend_args=backend_args)
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=6,
use_dim=5,
backend_args=backend_args),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
# dict(type='ObjectSample', db_sampler=db_sampler),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
flip_ratio_bev_vertical=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=6,
use_dim=5,
backend_args=backend_args),
dict(
type='MultiScaleFlipAug3D',
img_scale=(1333, 800),
pts_scale_ratio=1,
flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='Pack3DDetInputs', keys=['points']),
])
]
train_dataloader = dict(
batch_size=4,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type='RepeatDataset',
times=2,
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='waymo_infos_train.pkl',
data_prefix=dict(pts='training/velodyne'),
pipeline=train_pipeline,
modality=input_modality,
test_mode=False,
# we use box_type_3d='LiDAR' in kitti and nuscenes dataset
# and box_type_3d='Depth' in sunrgbd and scannet dataset.
box_type_3d='LiDAR',
# load one frame every five frames
load_interval=5,
backend_args=backend_args)))
val_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='training/velodyne'),
ann_file='waymo_infos_val.pkl',
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR',
backend_args=backend_args))
test_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='training/velodyne'),
ann_file='waymo_infos_val.pkl',
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR',
backend_args=backend_args))
# Default setting for scaling LR automatically
# - `enable` means enable scaling LR automatically
# or not by default.
# - `base_batch_size` = (16 GPUs) x (2 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=32)
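When `enable=True`, MMEngine scales the learning rate linearly by the ratio of the actual total batch size to `base_batch_size`. A one-line sketch of the rule (the base LR here is a placeholder, not a value from this config):

```python
base_batch_size = 32          # 16 GPUs x 2 samples per GPU
actual_batch_size = 8 * 4     # e.g. 8 GPUs x 4 samples per GPU
base_lr = 1e-3                # placeholder

scaled_lr = base_lr * actual_batch_size / base_batch_size
print(scaled_lr)  # 1e-3: unchanged, since the total batch size matches
```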
# SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
> [SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation](https://arxiv.org/abs/2002.10111)
<!-- [ALGORITHM] -->
## Abstract
Estimating the 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving. In the case of monocular vision, successful methods have been mainly based on two ingredients: (i) a network generating 2D region proposals, and (ii) an R-CNN structure predicting 3D object pose by utilizing the acquired regions of interest. We argue that the 2D detection network is redundant and introduces non-negligible noise for 3D detection. Hence, we propose a novel 3D object detection method, named SMOKE, that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, which significantly improves both training convergence and detection accuracy. In contrast to previous 3D detection techniques, our method does not require complicated pre/post-processing, extra data, or a refinement stage. Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset, giving the best state-of-the-art result on both 3D object detection and Bird's eye view evaluation.
<div align=center>
<img src="https://user-images.githubusercontent.com/79644370/143886681-52cb72b9-6635-4624-a728-1c243b046517.png" width="800"/>
</div>
## Introduction
We implement SMOKE and provide the results and checkpoints on the KITTI dataset.
## Results and models
### KITTI
| Backbone | Lr schd | Mem (GB) | Inf time (fps) | mAP | Download |
| :-----------------------------------------------------------: | :-----: | :------: | :------------: | :---: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [DLA34](./smoke_dla34_dlaneck_gn-all_4xb8-6x_kitti-mono3d.py) | 6x | 9.64 | | 13.85 | [model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d_20210929_015553-d46d9bb0.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d_20210929_015553.log.json) |
Note: mAP represents the Car moderate 3D strict AP11 result.
Detailed performance on KITTI 3D detection (3D/BEV) is as follows, evaluated by the AP11 metric:
| | Easy | Moderate | Hard |
| ---------- | :-----------: | :-----------: | :-----------: |
| Car | 16.92 / 22.97 | 13.85 / 18.32 | 11.90 / 15.88 |
| Pedestrian | 11.13 / 12.61 | 11.10 / 11.32 | 10.67 / 11.14 |
| Cyclist | 0.99 / 1.47 | 0.54 / 0.65 | 0.55 / 0.67 |
## Citation
```latex
@inproceedings{liu2020smoke,
title={Smoke: Single-stage monocular 3d object detection via keypoint estimation},
author={Liu, Zechen and Wu, Zizhang and T{\'o}th, Roland},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
pages={996--997},
year={2020}
}
```
Collections:
- Name: SMOKE
Metadata:
Training Data: KITTI
Training Techniques:
- Adam
    Training Resources: 4x V100 GPUs
Architecture:
- SMOKEMono3DHead
- DLA
Paper:
URL: https://arxiv.org/abs/2002.10111
Title: 'SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation'
README: configs/smoke/README.md
Code:
URL: https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0.dev0/mmdet3d/models/detectors/smoke_mono3d.py#L7
Version: v1.0.0
Models:
- Name: smoke_dla34_dlaneck_gn-all_4xb8-6x_kitti-mono3d
In Collection: SMOKE
Config: configs/smoke/smoke_dla34_dlaneck_gn-all_4xb8-6x_kitti-mono3d.py
Metadata:
Training Memory (GB): 9.6
Results:
- Task: 3D Object Detection
Dataset: KITTI
Metrics:
mAP: 13.8
Weights: https://download.openmmlab.com/mmdetection3d/v0.1.0_models/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d_20210929_015553-d46d9bb0.pth
_base_ = [
'../_base_/datasets/kitti-mono3d.py', '../_base_/models/smoke.py',
'../_base_/default_runtime.py'
]
backend_args = None
train_pipeline = [
dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
dict(
type='LoadAnnotations3D',
with_bbox=True,
with_label=True,
with_attr_label=False,
with_bbox_3d=True,
with_label_3d=True,
with_bbox_depth=True),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='RandomShiftScale', shift_scale=(0.2, 0.4), aug_prob=0.3),
dict(type='AffineResize', img_scale=(1280, 384), down_ratio=4),
dict(
type='Pack3DDetInputs',
keys=[
'img', 'gt_bboxes', 'gt_bboxes_labels', 'gt_bboxes_3d',
'gt_labels_3d', 'centers_2d', 'depths'
]),
]
test_pipeline = [
dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
dict(type='AffineResize', img_scale=(1280, 384), down_ratio=4),
dict(type='Pack3DDetInputs', keys=['img'])
]
train_dataloader = dict(
batch_size=8, num_workers=4, dataset=dict(pipeline=train_pipeline))
test_dataloader = dict(dataset=dict(pipeline=test_pipeline))
val_dataloader = dict(dataset=dict(pipeline=test_pipeline))
# training schedule for 6x
max_epochs = 72
train_cfg = dict(
type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=5)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
# learning rate
param_scheduler = [
dict(
type='MultiStepLR',
begin=0,
end=max_epochs,
by_epoch=True,
milestones=[50],
gamma=0.1)
]
# optimizer
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='Adam', lr=2.5e-4),
clip_grad=None)
find_unused_parameters = True
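Under the `MultiStepLR` schedule above, the LR holds at 2.5e-4 until epoch 50 and is then decayed once by `gamma=0.1` for the remaining epochs. A tiny sketch of the resulting values:

```python
base_lr, gamma, milestones = 2.5e-4, 0.1, [50]

for epoch in (0, 49, 50, 71):
    decay = gamma ** sum(epoch >= m for m in milestones)
    print(f'epoch {epoch:2d}: lr = {base_lr * decay:.1e}')
# epoch  0: lr = 2.5e-04
# epoch 49: lr = 2.5e-04
# epoch 50: lr = 2.5e-05
# epoch 71: lr = 2.5e-05
```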
# Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
> [Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution](https://arxiv.org/abs/2007.16100)
<!-- [ALGORITHM] -->
## Abstract
Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely. Given the limited hardware resources, existing 3D perception models are not able to recognize small instances (e.g., pedestrians, cyclists) very well due to the low-resolution voxelization and aggressive downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch. With negligible overhead, this point-based branch is able to preserve the fine details even from large outdoor scenes. To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive SemanticKITTI leaderboard. It also achieves 8x computation reduction and 3x measured speedup over MinkowskiNet with higher accuracy. Finally, we transfer our method to 3D object detection, and it achieves consistent improvements over the one-stage detection baseline on KITTI.
<div align=center>
<img src="https://user-images.githubusercontent.com/72679458/226509154-80c27d8e-c138-426a-b92e-72846997b5b3.png" width="800"/>
</div>
## Introduction
We implement SPVCNN with the [TorchSparse](https://github.com/mit-han-lab/torchsparse) backend and provide the results and checkpoints on the SemanticKITTI dataset.
## Results and models
### SemanticKITTI
| Method | Lr schd | Laser-Polar Mix | Mem (GB) | mIoU | Download |
| :---------------------------------------------------------------------: | :-----: | :-------------: | :------: | :--: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [SPVCNN-W16](./spvcnn_w16_8xb2-amp-15e_semantickitti.py) | 15e | ✗ | 3.9 | 61.8 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w16_8xb2-15e_semantickitti/spvcnn_w16_8xb2-15e_semantickitti_20230321_011645-a2734d85.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w16_8xb2-15e_semantickitti/spvcnn_w16_8xb2-15e_semantickitti_20230321_011645.log) |
| [SPVCNN-W20](./spvcnn_w20_8xb2-amp-15e_semantickitti.py) | 15e | ✗ | 4.2 | 62.6 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w20_8xb2-15e_semantickitti/spvcnn_w20_8xb2-15e_semantickitti_20230321_011649-519e7eff.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w20_8xb2-15e_semantickitti/spvcnn_w20_8xb2-15e_semantickitti_20230321_011649.log) |
| [SPVCNN-W32](./spvcnn_w32_8xb2-amp-15e_semantickitti.py) | 15e | ✗ | 5.4 | 64.3 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-15e_semantickitti/spvcnn_w32_8xb2-15e_semantickitti_20230308_113324-f7c0c5b4.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-15e_semantickitti/spvcnn_w32_8xb2-15e_semantickitti_20230308_113324.log) |
| [SPVCNN-W32](./spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti.py) | 3x | ✔ | 7.2 | 68.7 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti_20230425_125908-d68a68b7.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti_20230425_125908.log) |
**Note:** We follow the implementation in the original SPVNAS [repo](https://github.com/mit-han-lab/spvnas); W16/W20/W32 indicate different numbers of channels.
**Note:** With the TorchSparse backend, model performance is unstable and may fluctuate by about 1.5 mIoU across different random seeds.
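A minimal segmentation inference sketch for these checkpoints (assuming an MMDetection3D v1.1 environment with the TorchSparse backend installed; the checkpoint filename and point-cloud path are placeholders):

```python
from mmdet3d.apis import inference_segmentor, init_model

config = 'configs/spvcnn/spvcnn_w32_8xb2-amp-15e_semantickitti.py'
checkpoint = 'spvcnn_w32_8xb2-15e_semantickitti_20230308_113324-f7c0c5b4.pth'

model = init_model(config, checkpoint, device='cuda:0')
# any SemanticKITTI-format .bin point cloud works here
result, data = inference_segmentor(model, 'demo/data/semantickitti/000000.bin')
print(result.pred_pts_seg.pts_semantic_mask)
```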
## Citation
```latex
@inproceedings{tang2020searching,
title={Searching efficient 3d architectures with sparse point-voxel convolution},
author={Tang, Haotian and Liu, Zhijian and Zhao, Shengyu and Lin, Yujun and Lin, Ji and Wang, Hanrui and Han, Song},
booktitle={Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XXVIII},
pages={685--702},
year={2020},
organization={Springer}
}
```
Collections:
- Name: SPVCNN
Metadata:
Training Techniques:
- AdamW
Architecture:
- SPVCNN
Paper:
URL: https://arxiv.org/abs/2007.16100
Title: 'Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution'
README: configs/spvcnn/README.md
Code:
URL: https://github.com/open-mmlab/mmdetection3d/blob/1.1/mmdet3d/models/backbones/spvcnn_backone.py#L22
Version: v1.1.0
Models:
- Name: spvcnn_w16_8xb2-amp-15e_semantickitti
In Collection: SPVCNN
Config: configs/spvcnn/spvcnn_w16_8xb2-amp-15e_semantickitti.py
Metadata:
Training Data: SemanticKITTI
Training Memory (GB): 3.9
Training Resources: 8x A100 GPUs
Results:
- Task: 3D Semantic Segmentation
Dataset: SemanticKITTI
Metrics:
mIOU: 61.7
Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w16_8xb2-15e_semantickitti/spvcnn_w16_8xb2-15e_semantickitti_20230321_011645-a2734d85.pth
- Name: spvcnn_w20_8xb2-amp-15e_semantickitti
In Collection: SPVCNN
Config: configs/spvcnn/spvcnn_w20_8xb2-amp-15e_semantickitti.py
Metadata:
Training Data: SemanticKITTI
Training Memory (GB): 4.2
Training Resources: 8x A100 GPUs
Results:
- Task: 3D Semantic Segmentation
Dataset: SemanticKITTI
Metrics:
mIOU: 62.9
Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w20_8xb2-15e_semantickitti/spvcnn_w20_8xb2-15e_semantickitti_20230321_011649-519e7eff.pth
- Name: spvcnn_w32_8xb2-amp-15e_semantickitti
In Collection: SPVCNN
Config: configs/spvcnn/spvcnn_w32_8xb2-amp-15e_semantickitti.py
Metadata:
Training Data: SemanticKITTI
Training Memory (GB): 5.4
Training Resources: 8x A100 GPUs
Results:
- Task: 3D Semantic Segmentation
Dataset: SemanticKITTI
Metrics:
mIOU: 64.3
Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-15e_semantickitti/spvcnn_w32_8xb2-15e_semantickitti_20230308_113324-f7c0c5b4.pth
- Name: spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti
In Collection: SPVCNN
Config: configs/spvcnn/spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti.py
Metadata:
Training Data: SemanticKITTI
Training Memory (GB): 7.2
Training Resources: 8x A100 GPUs
Results:
- Task: 3D Semantic Segmentation
Dataset: SemanticKITTI
Metrics:
        mIOU: 68.7
Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/spvcnn/spvcnn_w32_8xb2-amp-laser-polar-mix-3x_semantickitti_20230425_125908-d68a68b7.pth
_base_ = ['./spvcnn_w32_8xb2-amp-15e_semantickitti.py']
model = dict(
backbone=dict(
base_channels=16,
encoder_channels=[16, 32, 64, 128],
decoder_channels=[128, 64, 48, 48]),
decode_head=dict(channels=48))
randomness = dict(seed=1588147245)
_base_ = ['./spvcnn_w32_8xb2-amp-15e_semantickitti.py']
model = dict(
backbone=dict(
base_channels=20,
encoder_channels=[20, 40, 81, 163],
decoder_channels=[163, 81, 61, 61]),
decode_head=dict(channels=61))
_base_ = [
'../_base_/datasets/semantickitti.py', '../_base_/models/spvcnn.py',
'../_base_/default_runtime.py'
]
train_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(
type='LoadAnnotations3D',
with_bbox_3d=False,
with_label_3d=False,
with_seg_3d=True,
seg_3d_dtype='np.int32',
seg_offset=2**16,
dataset_type='semantickitti'),
dict(type='PointSegClassMapping'),
dict(
type='GlobalRotScaleTrans',
rot_range=[0., 6.28318531],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0],
),
dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
]
train_dataloader = dict(
sampler=dict(seed=0), dataset=dict(pipeline=train_pipeline))
lr = 0.24
optim_wrapper = dict(
type='AmpOptimWrapper',
loss_scale='dynamic',
optimizer=dict(
type='SGD', lr=lr, weight_decay=0.0001, momentum=0.9, nesterov=True))
param_scheduler = [
dict(
type='LinearLR', start_factor=0.008, by_epoch=False, begin=0, end=125),
dict(
type='CosineAnnealingLR',
begin=0,
T_max=15,
by_epoch=True,
eta_min=1e-5,
convert_to_iter_based=True)
]
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=15, val_interval=1)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=1))
randomness = dict(seed=0, deterministic=False, diff_rank_seed=True)
env_cfg = dict(cudnn_benchmark=True)
_base_ = [
'../_base_/datasets/semantickitti.py', '../_base_/models/spvcnn.py',
'../_base_/schedules/schedule-3x.py', '../_base_/default_runtime.py'
]
model = dict(data_preprocessor=dict(max_voxels=None))
train_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(
type='LoadAnnotations3D',
with_bbox_3d=False,
with_label_3d=False,
with_seg_3d=True,
seg_3d_dtype='np.int32',
seg_offset=2**16,
dataset_type='semantickitti'),
dict(type='PointSegClassMapping'),
dict(
type='RandomChoice',
transforms=[
[
dict(
type='LaserMix',
num_areas=[3, 4, 5, 6],
pitch_angles=[-25, 3],
pre_transform=[
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4,
use_dim=4),
dict(
type='LoadAnnotations3D',
with_bbox_3d=False,
with_label_3d=False,
with_seg_3d=True,
seg_3d_dtype='np.int32',
seg_offset=2**16,
dataset_type='semantickitti'),
dict(type='PointSegClassMapping')
],
prob=1)
],
[
dict(
type='PolarMix',
instance_classes=[0, 1, 2, 3, 4, 5, 6, 7],
swap_ratio=0.5,
rotate_paste_ratio=1.0,
pre_transform=[
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4,
use_dim=4),
dict(
type='LoadAnnotations3D',
with_bbox_3d=False,
with_label_3d=False,
with_seg_3d=True,
seg_3d_dtype='np.int32',
seg_offset=2**16,
dataset_type='semantickitti'),
dict(type='PointSegClassMapping')
],
prob=1)
],
],
prob=[0.5, 0.5]),
dict(
type='GlobalRotScaleTrans',
rot_range=[0., 6.28318531],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0],
),
dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
]
train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
optim_wrapper = dict(type='AmpOptimWrapper', loss_scale='dynamic')
default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=1))
randomness = dict(seed=0, deterministic=False, diff_rank_seed=True)
env_cfg = dict(cudnn_benchmark=True)
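The `RandomChoice` wrapper in the pipeline above applies exactly one of the two augmentation branches per sample, chosen with the given probabilities. A standalone sketch of that behaviour (stand-in callables instead of the real LaserMix/PolarMix transforms):

```python
import random

def random_choice(branches, probs, sample):
    # pick one branch according to `probs`, then apply its transforms in order
    branch = random.choices(branches, weights=probs, k=1)[0]
    for transform in branch:
        sample = transform(sample)
    return sample

# stand-ins for the LaserMix and PolarMix branches
branches = [[lambda s: s + ['laser_mix']], [lambda s: s + ['polar_mix']]]
print(random_choice(branches, [0.5, 0.5], ['points']))
```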
# SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds
> [SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds](https://arxiv.org/abs/2004.02774)
<!-- [ALGORITHM] -->
## Abstract
Multi-class 3D object detection aims to localize and classify objects of multiple categories from point clouds. Due to the nature of point clouds, i.e. unstructured, sparse and noisy, some features benefitting multi-class discrimination are underexploited, such as shape information. In this paper, we propose a novel 3D shape signature to explore the shape information from point clouds. By incorporating operations of symmetry, convex hull and Chebyshev fitting, the proposed shape signature is not only compact and effective but also robust to noise, which serves as a soft constraint to improve the feature capability of multi-class discrimination. Based on the proposed shape signature, we develop the shape signature networks (SSN) for 3D object detection, which consist of a pyramid feature encoding part, shape-aware grouping heads and an explicit shape encoding objective. Experiments show that the proposed method performs remarkably better than existing methods on two large-scale datasets. Furthermore, our shape signature can act as a plug-and-play component, and an ablation study shows its effectiveness and good scalability.
<div align=center>
<img src="https://user-images.githubusercontent.com/79644370/144024507-9c1f23c1-5e5a-49c8-b346-ff37e30adc3a.png" width="800"/>
</div>
## Introduction
We implement PointPillars with the shape-aware grouping heads used in SSN and provide the results and checkpoints on the nuScenes and Lyft datasets.
## Results and models
### NuScenes
| Backbone | Lr schd | Mem (GB) | Inf time (fps) | mAP | NDS | Download |
| :---------------------------------------------------------------------------------------------: | :-----: | :------: | :------------: | :---: | :---: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [SECFPN](../pointpillars/pointpillars_hv_secfpn_sbn-all_8xb4-2x_nus-3d.py) | 2x | 16.4 | | 35.17 | 49.76 | [model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d_20200620_230725-0817d270.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d_20200620_230725.log.json) |
| [SSN](./ssn_hv_secfpn_sbn-all_16xb2-2x_nus-3d.py) | 2x | 3.6 | | 40.91 | 54.44 | [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d_20210830_101351-51915986.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d_20210830_101351.log.json) |
| [RegNetX-400MF-SECFPN](../regnet/pointpillars_hv_regnet-400mf_secfpn_sbn-all_8xb4-2x_nus-3d.py) | 2x | 16.4 | | 41.15 | 55.20 | [model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/regnet/hv_pointpillars_regnet-400mf_secfpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_regnet-400mf_secfpn_sbn-all_4x8_2x_nus-3d_20200620_230334-53044f32.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/regnet/hv_pointpillars_regnet-400mf_secfpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_regnet-400mf_secfpn_sbn-all_4x8_2x_nus-3d_20200620_230334.log.json) |
| [RegNetX-400MF-SSN](./ssn_hv_regnet-400mf_secfpn_sbn-all_16xb2-2x_nus-3d.py) | 2x | 5.1 | | 46.65 | 58.24 | [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d_20210829_210615-361e5e04.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d_20210829_210615.log.json) |
### Lyft
| Backbone | Lr schd | Mem (GB) | Inf time (fps) | Private Score | Public Score | Download |
| :---------------------------------------------------------------------------: | :-----: | :------: | :------------: | :-----------: | :----------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [SECFPN](../pointpillars/pointpillars_hv_secfpn_sbn-all_8xb2-2x_lyft-3d.py) | 2x | 12.2 | | 13.9 | 14.1 | [model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d_20210517_204807-2518e3de.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_secfpn_sbn-all_2x8_2x_lyft-3d_20210517_204807.log.json) |
| [SSN](./ssn_hv_secfpn_sbn-all_16xb2-2x_lyft-3d.py) | 2x | 8.5 | | 17.5 | 17.5 | [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d_20210822_134731-46841b41.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d_20210822_134731.log.json) |
| [RegNetX-400MF-SSN](./ssn_hv_regnet-400mf_secfpn_sbn-all_16xb1-2x_lyft-3d.py) | 2x | 7.4 | | 17.9 | 18 | [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d_20210829_122825-d93475a1.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d_20210829_122825.log.json) |
Note:
The main difference between the shape-aware grouping heads and the original SECOND FPN heads is that the former group objects with similar sizes and shapes together and design a shape-specific head for each group. Heavier heads (with more convolutions and larger strides) are designed for large objects, while smaller heads are used for small objects. Note that the outputs may contain feature maps of different sizes, so the implementation also needs an anchor generator tailored to these feature maps; see the illustrative sketch below.
Users can experiment with other head designs; our implementation basically follows the one [HERE](https://github.com/xinge008/SSN).
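Purely as an illustration of the grouping idea (hypothetical values, not the actual `ShapeAwareHead` settings), the nuScenes classes might be partitioned by size like this, with heavier heads assigned to the larger groups:

```python
# hypothetical size-based task grouping, illustration only
head_groups = [
    dict(class_names=['pedestrian', 'traffic_cone'],      # small objects
         shared_conv_channels=(64, 64), shared_conv_strides=(1, 1)),
    dict(class_names=['bicycle', 'motorcycle'],           # medium objects
         shared_conv_channels=(64, 64, 64), shared_conv_strides=(1, 1, 1)),
    dict(class_names=['car', 'truck', 'bus', 'trailer'],  # large objects
         shared_conv_channels=(64, 64, 64, 64), shared_conv_strides=(2, 1, 1, 1)),
]
```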
## Citation
```latex
@inproceedings{zhu2020ssn,
title={SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds},
author={Zhu, Xinge and Ma, Yuexin and Wang, Tai and Xu, Yan and Shi, Jianping and Lin, Dahua},
booktitle={Proceedings of the European Conference on Computer Vision},
year={2020}
}
```
Collections:
- Name: SSN
Metadata:
Training Techniques:
- AdamW
Training Resources: 8x GeForce GTX 1080 Ti
Architecture:
- Hard Voxelization
Paper:
URL: https://arxiv.org/abs/2004.02774
Title: 'SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds'
README: configs/ssn/README.md
Code:
URL: https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/models/dense_heads/shape_aware_head.py#L166
Version: v0.7.0
Models:
- Name: hv_ssn_secfpn_sbn-all_16xb2-2x_nus-3d
In Collection: SSN
Config: configs/ssn/ssn_hv_secfpn_sbn-all_16xb2-2x_nus-3d.py
Metadata:
Training Data: nuScenes
Training Memory (GB): 3.6
Results:
- Task: 3D Object Detection
Dataset: nuScenes
Metrics:
mAP: 40.91
NDS: 54.44
Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d_20210830_101351-51915986.pth
- Name: hv_ssn_regnet-400mf_secfpn_sbn-all_16xb2-2x_nus-3d
In Collection: SSN
Config: configs/ssn/ssn_hv_regnet-400mf_secfpn_sbn-all_16xb2-2x_nus-3d.py
Metadata:
Training Data: nuScenes
Training Memory (GB): 5.1
Results:
- Task: 3D Object Detection
Dataset: nuScenes
Metrics:
mAP: 46.65
NDS: 58.24
Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_2x16_2x_nus-3d_20210829_210615-361e5e04.pth
- Name: hv_ssn_secfpn_sbn-all_16xb2-2x_lyft-3d
In Collection: SSN
Config: configs/ssn/ssn_hv_secfpn_sbn-all_16xb2-2x_lyft-3d.py
Metadata:
Training Data: Lyft
Training Memory (GB): 8.5
Results:
- Task: 3D Object Detection
Dataset: Lyft
Metrics:
Private Score: 17.5
Public Score: 17.5
Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d_20210822_134731-46841b41.pth
- Name: hv_ssn_regnet-400mf_secfpn_sbn-all_16xb1-2x_lyft-3d
In Collection: SSN
    Config: configs/ssn/ssn_hv_regnet-400mf_secfpn_sbn-all_16xb1-2x_lyft-3d.py
Metadata:
Training Data: Lyft
Training Memory (GB): 7.4
Results:
- Task: 3D Object Detection
Dataset: Lyft
Metrics:
Private Score: 17.9
Public Score: 18.0
Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/ssn/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d/hv_ssn_regnet-400mf_secfpn_sbn-all_1x16_2x_lyft-3d_20210829_122825-d93475a1.pth