[Feature] SSN benchmark on Lyft (#174)

* Add ssn config on Lyft
* Update readme of pointpillars
* Update SSN readme
* Update model_zoo.md
* Update README.md
* Fix a minor typo in the nus regnet config
* Update README.md

Commit e813610f, authored Oct 21, 2020 by twang; committed by GitHub Oct 21, 2020
6 changed files
--- a/README.md
+++ b/README.md
@@ -75,6 +75,7 @@ Results and models are available in the [model zoo](docs/model_zoo.md).
 | Part-A2            | ☐        | ☐        | ☐        | ✗         | ☐     | ✓        | ☐     |
 | MVXNet             | ☐        | ☐        | ☐        | ✗         | ☐     | ✓        | ☐     |
 | CenterPoint        | ☐        | ☐        | ☐        | ✗         | ☐     | ✓        | ☐     |
+| SSN                | ☐        | ☐        | ☐        | ✗         | ☐     | ✗        | ☐     |
 Other features
 - [x] [Dynamic Voxelization](configs/carafe/README.md)

--- a/configs/pointpillars/README.md
+++ b/configs/pointpillars/README.md
@@ -2,7 +2,7 @@
 ## Introduction
-We implement PointPillars and provide the results and checkpoints on KITTI and nuScenes datasets.
+We implement PointPillars and provide the results and checkpoints on KITTI, nuScenes, Lyft and Waymo datasets.
 ```
 @inproceedings{lang2019pointpillars,

--- a/configs/regnet/hv_pointpillars_regnet-1.6gf_fpn_sbn-all_4x8_2x_nus-3d.py
+++ b/configs/regnet/hv_pointpillars_regnet-1.6gf_fpn_sbn-all_4x8_2x_nus-3d.py
 _base_ = [
-    '../_base_/models/hv_pointpillars_fpn_lyft.py',
+    '../_base_/models/hv_pointpillars_fpn_nus.py',
    '../_base_/datasets/nus-3d.py',
    '../_base_/schedules/schedule_2x.py',
    '../_base_/default_runtime.py',

--- a/configs/ssn/README.md
+++ b/configs/ssn/README.md
+# SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds
+## Introduction
+We implement PointPillars with the shape-aware grouping heads used in SSN and provide the results and checkpoints on the Lyft dataset.
+```
+@inproceedings{zhu2020ssn,
+  title={SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds},
+  author={Zhu, Xinge and Ma, Yuexin and Wang, Tai and Xu, Yan and Shi, Jianping and Lin, Dahua},
+  booktitle={Proceedings of the European Conference on Computer Vision},
+  year={2020}
+}
+```
+## Results
+### Lyft
+|  Backbone   | Lr schd | Mem (GB) | Inf time (fps) | Private Score | Public Score | Download |
+| :---------: | :-----: | :------: | :------------: | :----: |:----: | :------: |
+|[SECFPN](../pointpillars/hv_pointpillars_secfpn_sbn-all_4x8_2x_lyft-3d.py)|2x|||13.4|13.4||
+|[SSN](./hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d.py)|2x|8.30||17.4|17.5|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d_20201016_220844-3058d9fc.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d_20201016_220844.log.json)|
+Note:
+The main difference between the shape-aware grouping heads and the original SECOND FPN heads is that the former group objects with similar sizes and shapes together and use a shape-specific head for each group. Heavier heads (with more convolutions and larger strides) are designed for large objects, while lighter heads are used for small objects. Because the heads may output feature maps of different sizes, the implementation also needs an anchor generator tailored to these feature maps.
+Users can try other head designs. Our settings basically follow the implementation [HERE](https://github.com/xinge008/SSN).
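
Reviewer note: the remark about differing feature map sizes can be made concrete with a little arithmetic. The sketch below is not part of the commit. It assumes each shared convolution uses a 3x3 kernel with padding 1, and it uses a hypothetical 400x400 input to the head; the actual kernel size, padding, and input resolution depend on the head implementation and the base model config, which this diff does not show. Only the `shared_conv_strides` values are taken from the SSN config in this commit.

```python
def conv_out_size(size, stride, kernel=3, padding=1):
    """Spatial size after one convolution (standard floor formula).

    kernel=3 and padding=1 are assumptions made for illustration.
    """
    return (size + 2 * padding - kernel) // stride + 1


def head_out_size(size, strides):
    """Apply a head's shared convolutions one after another."""
    for stride in strides:
        size = conv_out_size(size, stride)
    return size


# Hypothetical side length of the feature map fed to the heads.
neck_size = 400

# shared_conv_strides copied from the four task groups in the SSN config.
group_strides = {
    'bicycle/motorcycle': (1, 1),
    'pedestrian/animal': (1, 1),
    'car/emergency_vehicle': (2, 1, 1),
    'bus/other_vehicle/truck': (2, 1, 1),
}

sizes = {
    group: head_out_size(neck_size, strides)
    for group, strides in group_strides.items()
}

for group, size in sizes.items():
    print(f'{group}: {size} x {size}')
```

Under these assumptions the two small-object groups keep the input resolution while the two large-object groups halve it, which is why anchors have to be generated per class at the matching resolution rather than once for a single shared feature map.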
--- a/configs/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d.py
+++ b/configs/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_lyft-3d.py
+_base_ = [
+    '../_base_/models/hv_pointpillars_fpn_lyft.py',
+    '../_base_/datasets/lyft-3d.py',
+    '../_base_/schedules/schedule_2x.py',
+    '../_base_/default_runtime.py',
+]
+point_cloud_range = [-100, -100, -5, 100, 100, 3]
+# Note that the order of class names should be consistent with
+# the following anchors' order
+class_names = [
+    'bicycle', 'motorcycle', 'pedestrian', 'animal', 'car',
+    'emergency_vehicle', 'bus', 'other_vehicle', 'truck'
+]
+train_pipeline = [
+    dict(type='LoadPointsFromFile', load_dim=5, use_dim=5),
+    dict(type='LoadPointsFromMultiSweeps', sweeps_num=10),
+    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
+    dict(
+        type='GlobalRotScaleTrans',
+        rot_range=[-0.3925, 0.3925],
+        scale_ratio_range=[0.95, 1.05],
+        translation_std=[0, 0, 0]),
+    dict(
+        type='RandomFlip3D',
+        sync_2d=False,
+        flip_ratio_bev_horizontal=0.5,
+        flip_ratio_bev_vertical=0.5),
+    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='PointShuffle'),
+    dict(type='DefaultFormatBundle3D', class_names=class_names),
+    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
+]
+test_pipeline = [
+    dict(type='LoadPointsFromFile', load_dim=5, use_dim=5),
+    dict(type='LoadPointsFromMultiSweeps', sweeps_num=10),
+    dict(
+        type='MultiScaleFlipAug3D',
+        img_scale=(1333, 800),
+        pts_scale_ratio=1,
+        flip=False,
+        transforms=[
+            dict(
+                type='GlobalRotScaleTrans',
+                rot_range=[0, 0],
+                scale_ratio_range=[1., 1.],
+                translation_std=[0, 0, 0]),
+            dict(type='RandomFlip3D'),
+            dict(
+                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
+            dict(
+                type='DefaultFormatBundle3D',
+                class_names=class_names,
+                with_label=False),
+            dict(type='Collect3D', keys=['points'])
+        ])
+]
+data = dict(
+    samples_per_gpu=2,
+    workers_per_gpu=4,
+    train=dict(pipeline=train_pipeline, classes=class_names),
+    val=dict(pipeline=test_pipeline, classes=class_names),
+    test=dict(pipeline=test_pipeline, classes=class_names))
+# model settings
+model = dict(
+    pts_voxel_layer=dict(point_cloud_range=[-100, -100, -5, 100, 100, 3]),
+    pts_voxel_encoder=dict(
+        feat_channels=[32, 64],
+        point_cloud_range=[-100, -100, -5, 100, 100, 3]),
+    pts_middle_encoder=dict(output_shape=[800, 800]),
+    pts_neck=dict(
+        _delete_=True,
+        type='SECONDFPN',
+        norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
+        in_channels=[64, 128, 256],
+        upsample_strides=[1, 2, 4],
+        out_channels=[128, 128, 128]),
+    pts_bbox_head=dict(
+        _delete_=True,
+        type='ShapeAwareHead',
+        num_classes=9,
+        in_channels=384,
+        feat_channels=384,
+        use_direction_classifier=True,
+        anchor_generator=dict(
+            type='AlignedAnchor3DRangeGeneratorPerCls',
+            ranges=[[-100, -100, -1.0709302, 100, 100, -1.0709302],
+                    [-100, -100, -1.3220503, 100, 100, -1.3220503],
+                    [-100, -100, -0.9122268, 100, 100, -0.9122268],
+                    [-100, -100, -1.8012227, 100, 100, -1.8012227],
+                    [-100, -100, -1.0715024, 100, 100, -1.0715024],
+                    [-100, -100, -0.8871424, 100, 100, -0.8871424],
+                    [-100, -100, -0.3519405, 100, 100, -0.3519405],
+                    [-100, -100, -0.6276341, 100, 100, -0.6276341],
+                    [-100, -100, -0.3033737, 100, 100, -0.3033737]],
+            sizes=[
+                [0.63, 1.76, 1.44],  # bicycle
+                [0.96, 2.35, 1.59],  # motorcycle
+                [0.76, 0.80, 1.76],  # pedestrian
+                [0.35, 0.73, 0.50],  # animal
+                [1.92, 4.75, 1.71],  # car
+                [2.42, 6.52, 2.34],  # emergency vehicle
+                [2.92, 12.70, 3.42],  # bus
+                [2.75, 8.17, 3.20],  # other vehicle
+                [2.84, 10.24, 3.44]  # truck
+            ],
+            custom_values=[],
+            rotations=[0, 1.57],
+            reshape_out=False),
+        tasks=[
+            dict(
+                num_class=2,
+                class_names=['bicycle', 'motorcycle'],
+                shared_conv_channels=(64, 64),
+                shared_conv_strides=(1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01)),
+            dict(
+                num_class=2,
+                class_names=['pedestrian', 'animal'],
+                shared_conv_channels=(64, 64),
+                shared_conv_strides=(1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01)),
+            dict(
+                num_class=2,
+                class_names=['car', 'emergency_vehicle'],
+                shared_conv_channels=(64, 64, 64),
+                shared_conv_strides=(2, 1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01)),
+            dict(
+                num_class=3,
+                class_names=['bus', 'other_vehicle', 'truck'],
+                shared_conv_channels=(64, 64, 64),
+                shared_conv_strides=(2, 1, 1),
+                norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01))
+        ],
+        assign_per_class=True,
+        diff_rad_by_sin=True,
+        dir_offset=0.7854,  # pi/4
+        dir_limit_offset=0,
+        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=7),
+        loss_cls=dict(
+            type='FocalLoss',
+            use_sigmoid=True,
+            gamma=2.0,
+            alpha=0.25,
+            loss_weight=1.0),
+        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
+        loss_dir=dict(
+            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)))
+# model training and testing settings
+train_cfg = dict(
+    _delete_=True,
+    pts=dict(
+        assigner=[
+            dict(  # bicycle
+                type='MaxIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.55,
+                neg_iou_thr=0.4,
+                min_pos_iou=0.4,
+                ignore_iof_thr=-1),
+            dict(  # motorcycle
+                type='MaxIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.55,
+                neg_iou_thr=0.4,
+                min_pos_iou=0.4,
+                ignore_iof_thr=-1),
+            dict(  # pedestrian
+                type='MaxIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.55,
+                neg_iou_thr=0.4,
+                min_pos_iou=0.4,
+                ignore_iof_thr=-1),
+            dict(  # animal
+                type='MaxIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.55,
+                neg_iou_thr=0.4,
+                min_pos_iou=0.4,
+                ignore_iof_thr=-1),
+            dict(  # car
+                type='MaxIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.6,
+                neg_iou_thr=0.45,
+                min_pos_iou=0.45,
+                ignore_iof_thr=-1),
+            dict(  # emergency vehicle
+                type='MaxIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.55,
+                neg_iou_thr=0.4,
+                min_pos_iou=0.4,
+                ignore_iof_thr=-1),
+            dict(  # bus
+                type='MaxIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.6,
+                neg_iou_thr=0.45,
+                min_pos_iou=0.45,
+                ignore_iof_thr=-1),
+            dict(  # other vehicle
+                type='MaxIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.55,
+                neg_iou_thr=0.4,
+                min_pos_iou=0.4,
+                ignore_iof_thr=-1),
+            dict(  # truck
+                type='MaxIoUAssigner',
+                iou_calculator=dict(type='BboxOverlapsNearest3D'),
+                pos_iou_thr=0.6,
+                neg_iou_thr=0.45,
+                min_pos_iou=0.45,
+                ignore_iof_thr=-1)
+        ],
+        allowed_border=0,
+        code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
+        pos_weight=-1,
+        debug=False))
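
Reviewer note: the nine assigner entries above differ only in their IoU thresholds, and their order has to match `class_names`. As a possible simplification (a sketch, not what the commit does), the list could be built from a small helper. The helper name `make_assigner` and the set `high_iou_classes` are hypothetical names introduced here; the threshold values and dictionary keys are copied from the config, where `min_pos_iou` always equals `neg_iou_thr`.

```python
class_names = [
    'bicycle', 'motorcycle', 'pedestrian', 'animal', 'car',
    'emergency_vehicle', 'bus', 'other_vehicle', 'truck'
]

# Classes that use the stricter (0.6 / 0.45) thresholds in the config.
high_iou_classes = {'car', 'bus', 'truck'}


def make_assigner(pos_iou_thr, neg_iou_thr):
    """Build one per-class assigner dict (hypothetical helper)."""
    return dict(
        type='MaxIoUAssigner',
        iou_calculator=dict(type='BboxOverlapsNearest3D'),
        pos_iou_thr=pos_iou_thr,
        neg_iou_thr=neg_iou_thr,
        # In the config above, min_pos_iou matches neg_iou_thr for every class.
        min_pos_iou=neg_iou_thr,
        ignore_iof_thr=-1)


# One assigner per class, in the same order as class_names.
assigners = [
    make_assigner(0.6, 0.45)
    if name in high_iou_classes else make_assigner(0.55, 0.4)
    for name in class_names
]

# Sanity check: the list must line up one-to-one with the class order.
assert len(assigners) == len(class_names)
```

Because config files in this style are plain Python, a list built this way could be passed as `assigner=assigners` inside `train_cfg`. The explicit form in the commit is more verbose but keeps every threshold visible next to its class comment.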
--- a/docs/model_zoo.md
+++ b/docs/model_zoo.md
@@ -49,3 +49,7 @@ Please refer to [3DSSD](https://github.com/open-mmlab/mmdetection3d/blob/master/
 ### CenterPoint
 Please refer to [CenterPoint](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/centerpoint) for details.
+### SSN
+Please refer to [SSN](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/ssn) for details. We currently provide PointPillars with the shape-aware grouping heads used in SSN on the Lyft dataset.