Unverified commit 32a4328b authored by Wenwei Zhang, committed by GitHub

Bump version to V1.0.0rc0

parents 86cc487c a8817998
# PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud
> [PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud](https://arxiv.org/abs/1812.04244)
<!-- [ALGORITHM] -->
## Abstract
In this paper, we propose PointRCNN for 3D object detection from raw point cloud. The whole framework is composed of two stages: stage-1 for the bottom-up 3D proposal generation and stage-2 for refining proposals in the canonical coordinates to obtain the final detection results. Instead of generating proposals from RGB image or projecting point cloud to bird's view or voxels as previous methods do, our stage-1 sub-network directly generates a small number of high-quality 3D proposals from point cloud in a bottom-up manner via segmenting the point cloud of the whole scene into foreground points and background. The stage-2 sub-network transforms the pooled points of each proposal to canonical coordinates to learn better local spatial features, which is combined with global semantic features of each point learned in stage-1 for accurate box refinement and confidence prediction. Extensive experiments on the 3D detection benchmark of KITTI dataset show that our proposed architecture outperforms state-of-the-art methods with remarkable margins by using only point cloud as input.
<div align=center>
<img src="https://user-images.githubusercontent.com/79644370/144959105-271038a2-4ae1-4cdb-b6a8-68c14daf83b0.png" width="800"/>
</div>
## Introduction
We implement PointRCNN and provide the results and checkpoints on the KITTI dataset.
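The released config and checkpoint can be used directly with the high-level inference API for a quick sanity check. The following is a minimal sketch; the local checkpoint path and the demo `.bin` file are assumptions for illustration, not files produced by this config.

```python
from mmdet3d.apis import inference_detector, init_model

# Assumed local paths: the config ships with the repo, the checkpoint is the file
# downloaded from the table below, and the .bin file is any KITTI-format LiDAR
# scan with (x, y, z, intensity) columns.
config_file = 'configs/point_rcnn/point_rcnn_2x8_kitti-3d-3classes.py'
checkpoint_file = 'checkpoints/point_rcnn_2x8_kitti-3d-3classes_20211208_151344.pth'

# Build the detector and load the released weights.
model = init_model(config_file, checkpoint_file, device='cuda:0')

# Run detection on a single LiDAR scan.
result, data = inference_detector(model, 'demo/data/kitti/kitti_000008.bin')
# result[0] holds 'boxes_3d', 'scores_3d' and 'labels_3d' for the input scan.
print(result[0]['boxes_3d'])
```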
## Results and models
### KITTI
| Backbone |Class| Lr schd | Mem (GB) | Inf time (fps) | mAP | Download |
| :---------: | :-----: |:-----: | :------: | :------------: | :----: |:----: |
| [PointNet++](./point_rcnn_2x8_kitti-3d-3classes.py) |3 Class|cyclic 40e|4.6||70.83|[model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/point_rcnn/point_rcnn_2x8_kitti-3d-3classes_20211208_151344.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/point_rcnn/point_rcnn_2x8_kitti-3d-3classes_20211208_151344.log.json)|
Note: mAP denotes the AP11 result averaged over the 3 classes under the moderate setting.
Detailed per-class performance on KITTI 3D detection, evaluated with the AP11 metric, is as follows:
| | Easy | Moderate | Hard |
|-------------|:-------------:|:--------------:|:------------:|
| Car | 89.13 | 78.72 | 78.24 |
| Pedestrian | 65.81 | 59.57 | 52.75 |
| Cyclist | 93.51 | 74.19 | 70.73 |
## Citation
```latex
@inproceedings{Shi_2019_CVPR,
title = {PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud},
author = {Shi, Shaoshuai and Wang, Xiaogang and Li, Hongsheng},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}
```
Collections:
  - Name: PointRCNN
    Metadata:
      Training Data: KITTI
      Training Techniques:
        - AdamW
      Training Resources: 8x Titan XP GPUs
      Architecture:
        - PointNet++
    Paper:
      URL: https://arxiv.org/abs/1812.04244
      Title: 'PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud'
    README: configs/point_rcnn/README.md
    Code:
      URL: https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0.dev0/mmdet3d/models/detectors/point_rcnn.py#L8
      Version: v1.0.0

Models:
  - Name: point_rcnn_2x8_kitti-3d-3classes.py
    In Collection: PointRCNN
    Config: configs/point_rcnn/point_rcnn_2x8_kitti-3d-3classes.py
    Metadata:
      Training Memory (GB): 4.6
    Results:
      - Task: 3D Object Detection
        Dataset: KITTI
        Metrics:
          mAP: 70.83
    Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/point_rcnn/point_rcnn_2x8_kitti-3d-3classes_20211208_151344.pth
_base_ = [
    '../_base_/datasets/kitti-3d-car.py', '../_base_/models/point_rcnn.py',
    '../_base_/default_runtime.py', '../_base_/schedules/cyclic_40e.py'
]

# dataset settings
dataset_type = 'KittiDataset'
data_root = 'data/kitti/'
class_names = ['Car', 'Pedestrian', 'Cyclist']
point_cloud_range = [0, -40, -3, 70.4, 40, 1]
input_modality = dict(use_lidar=True, use_camera=False)

# ground-truth database sampler used by the ObjectSample augmentation
db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'kitti_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=5, Cyclist=5)),
    sample_groups=dict(Car=20, Pedestrian=15, Cyclist=15),
    classes=class_names)

train_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectSample', db_sampler=db_sampler),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='ObjectNoise',
        num_try=100,
        translation_std=[1.0, 1.0, 0.5],
        global_rot_range=[0.0, 0.0],
        rot_range=[-0.78539816, 0.78539816]),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointSample', num_points=16384, sample_range=40.0),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(type='PointSample', num_points=16384, sample_range=40.0),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(pipeline=train_pipeline, classes=class_names)),
    val=dict(pipeline=test_pipeline, classes=class_names),
    test=dict(pipeline=test_pipeline, classes=class_names))

# optimizer: override only the learning rate and betas of the optimizer defined
# in the base schedule
lr = 0.001  # max learning rate
optimizer = dict(lr=lr, betas=(0.95, 0.85))

# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=80)
evaluation = dict(interval=2)

# yapf:disable
log_config = dict(
    interval=30,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
> [PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space](https://arxiv.org/abs/1706.02413)
<!-- [ALGORITHM] -->
## Abstract
Few prior works study deep learning on point sets. PointNet by Qi et al. is a pioneer in this direction. However, by design PointNet does not capture local structures induced by the metric space points live in, limiting its ability to recognize fine-grained patterns and generalizability to complex scenes. In this work, we introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, our network is able to learn local features with increasing contextual scales. With further observation that point sets are usually sampled with varying densities, which results in greatly decreased performance for networks trained on uniform densities, we propose novel set learning layers to adaptively combine features from multiple scales. Experiments show that our network called PointNet++ is able to learn deep point set features efficiently and robustly. In particular, results significantly better than state-of-the-art have been obtained on challenging benchmarks of 3D point clouds.
<div align=center>
<img src="https://user-images.githubusercontent.com/79644370/143885530-ae53ed38-8132-4bb7-85a7-d2577de7de3f.png" width="800"/>
</div>
## Introduction
We implement PointNet++ and provide the results and checkpoints on the ScanNet and S3DIS datasets.
**Notice**: The original PointNet++ paper used a step learning rate schedule. We found that a cosine schedule achieves much better results, so we adopt it in our implementation. We also use a larger `weight_decay` factor because we find it consistently improves performance; a sketch of such a schedule is shown below.
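For reference, a schedule of this kind can be expressed in mmcv/mmdet3d config style roughly as follows. The values are illustrative assumptions, not the settings of a specific released config.

```python
# Rough sketch of a cosine learning-rate schedule with a larger weight decay.
optimizer = dict(type='Adam', lr=0.001, weight_decay=0.01)  # illustrative values
optimizer_config = dict(grad_clip=None)
# 'CosineAnnealing' is the mmcv LrUpdater policy; min_lr is the floor of the decay.
lr_config = dict(policy='CosineAnnealing', warmup=None, min_lr=1e-5)
runner = dict(type='EpochBasedRunner', max_epochs=200)
```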
## Results and models
### ScanNet
@@ -56,3 +56,15 @@ We implement PointNet++ and provide the result and checkpoints on ScanNet and S3
## Indeterminism
Since PointNet++ testing adopts sliding-patch inference, which involves random point sampling, and the test script uses fixed random seeds while the random seeds used for validation during training are not fixed, the test results may differ slightly from the results reported above.
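To illustrate why the fixed seeds matter, the generic snippet below (not part of the repository's test script) shows the kind of seeding that makes the random point sampling repeatable across runs.

```python
# Illustrative only: fix the seeds that the random point sampling depends on.
import random

import numpy as np
import torch


def fix_seeds(seed: int = 0) -> None:
    """Seed all relevant RNGs so repeated evaluation runs sample the same points."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


fix_seeds(0)  # call once before running evaluation
```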
## Citation
```latex
@inproceedings{qi2017pointnet++,
title={PointNet++ deep hierarchical feature learning on point sets in a metric space},
author={Qi, Charles R and Yi, Li and Su, Hao and Guibas, Leonidas J},
booktitle={Proceedings of the 31st International Conference on Neural Information Processing Systems},
pages={5105--5114},
year={2017}
}
```
# PointPillars: Fast Encoders for Object Detection from Point Clouds
> [PointPillars: Fast Encoders for Object Detection from Point Clouds](https://arxiv.org/abs/1812.05784)
<!-- [ALGORITHM] -->
## Abstract
Object detection in point clouds is an important aspect of many robotics applications such as autonomous driving. In this paper we consider the problem of encoding a point cloud into a format appropriate for a downstream detection pipeline. Recent literature suggests two types of encoders; fixed encoders tend to be fast but sacrifice accuracy, while encoders that are learned from data are more accurate, but slower. In this work we propose PointPillars, a novel encoder which utilizes PointNets to learn a representation of point clouds organized in vertical columns (pillars). While the encoded features can be used with any standard 2D convolutional detection architecture, we further propose a lean downstream network. Extensive experimentation shows that PointPillars outperforms previous encoders with respect to both speed and accuracy by a large margin. Despite only using lidar, our full detection pipeline significantly outperforms the state of the art, even among fusion methods, with respect to both the 3D and bird's eye view KITTI benchmarks. This detection performance is achieved while running at 62 Hz: a 2 - 4 fold runtime improvement. A faster version of our method matches the state of the art at 105 Hz. These benchmarks suggest that PointPillars is an appropriate encoding for object detection in point clouds.
<div align=center>
<img src="https://user-images.githubusercontent.com/79644370/143885905-aab6ffcf-7727-495e-90ca-edb8dd5e324b.png" width="800"/>
</div>
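As a rough illustration of the pillar grouping described in the abstract, the toy sketch below buckets LiDAR points into vertical columns on a fixed x-y grid. The real implementation (hard voxelization plus a PointNet per pillar) lives in mmdet3d; the grid extents and pillar size here are illustrative only.

```python
import numpy as np


def group_into_pillars(points, x_range=(0.0, 69.12), y_range=(-39.68, 39.68),
                       pillar_size=0.16):
    """Map each point to an (ix, iy) grid cell and bucket the points per cell."""
    ix = ((points[:, 0] - x_range[0]) / pillar_size).astype(int)
    iy = ((points[:, 1] - y_range[0]) / pillar_size).astype(int)
    pillars = {}
    for idx, key in enumerate(zip(ix, iy)):
        pillars.setdefault(key, []).append(points[idx])
    return {key: np.stack(pts) for key, pts in pillars.items()}


# Random stand-in for a LiDAR scan: N x (x, y, z, intensity) inside the grid extents.
scan = np.random.rand(1000, 4) * [69.12, 79.36, 4.0, 1.0] + [0.0, -39.68, -3.0, 0.0]
pillars = group_into_pillars(scan)
print(f'{len(pillars)} non-empty pillars')
```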
## Introduction
We implement PointPillars and provide the results and checkpoints on KITTI, nuScenes, Lyft and Waymo datasets.
## Results and models
### KITTI
@@ -31,7 +30,9 @@ We implement PointPillars and provide the results and checkpoints on KITTI, nuSc
| Backbone | Lr schd | Mem (GB) | Inf time (fps) | mAP |NDS| Download |
| :---------: | :-----: | :------: | :------------: | :----: |:----: | :------: |
|[SECFPN](./hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d.py)|2x|16.4||35.17|49.7|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d_20200620_230725-0817d270.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d_20200620_230725.log.json)|
|[SECFPN (FP16)](./hv_pointpillars_secfpn_sbn-all_fp16_2x8_2x_nus-3d.py)|2x|8.37||35.19|50.27|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_pointpillars_secfpn_sbn-all_fp16_2x8_2x_nus-3d/hv_pointpillars_secfpn_sbn-all_fp16_2x8_2x_nus-3d_20201020_222626-c3f0483e.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_pointpillars_secfpn_sbn-all_fp16_2x8_2x_nus-3d/hv_pointpillars_secfpn_sbn-all_fp16_2x8_2x_nus-3d_20201020_222626.log.json)|
|[FPN](./hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py)|2x|16.4||40.0|53.3|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d_20200620_230405-2fa62f3d.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d_20200620_230405.log.json)|
|[FPN (FP16)](./hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d.py)|2x|8.40||39.26|53.26|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d_20201021_120719-269f9dd6.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d_20201021_120719.log.json)|
### Lyft
@@ -62,3 +63,16 @@ We implement PointPillars and provide the results and checkpoints on KITTI, nuSc
- **Implementation Details**: We basically follow the implementation in the [paper](https://arxiv.org/pdf/1912.04838.pdf) in terms of the network architecture (using a stride of 1 for the first convolutional block). Different settings of voxelization, data augmentation and hyperparameters make these baselines outperform those reported in the paper by about 7 mAP for car and 4 mAP for pedestrian with only a subset of the whole dataset. All of these results are achieved without bells and whistles, e.g. ensembling, multi-scale training and test augmentation.
- **License Agreement**: To comply with the [license agreement of the Waymo dataset](https://waymo.com/open/terms/), the pre-trained models on the Waymo dataset are not released. We still release the training logs as a reference to ease future research.
- `FP16` means mixed precision (FP16) training is adopted. With mixed precision training, we can train PointPillars on the nuScenes dataset on 8 Titan XP GPUs with a batch size of 2 per GPU, which would otherwise cause an OOM error. The loss scale for PointPillars on nuScenes is specifically tuned to avoid the loss becoming NaN; we find 32 more stable than 512, though a loss scale of 32 can still occasionally produce NaN. The corresponding config override is shown below.
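In the FP16 configs this corresponds to a single override on top of the FP32 config, as in the config fragments further below:

```python
# fp16 settings; the loss scale is specifically tuned to avoid NaN
fp16 = dict(loss_scale=32.)
```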
## Citation
```latex
@inproceedings{lang2019pointpillars,
title={Pointpillars: Fast encoders for object detection from point clouds},
author={Lang, Alex H and Vora, Sourabh and Caesar, Holger and Zhou, Lubing and Yang, Jiong and Beijbom, Oscar},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={12697--12705},
year={2019}
}
```
_base_ = './hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py'
data = dict(samples_per_gpu=2, workers_per_gpu=2)
# fp16 settings, the loss scale is specifically tuned to avoid Nan
fp16 = dict(loss_scale=32.)
@@ -15,21 +15,15 @@ db_sampler = dict(
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=5, Cyclist=5)),
    classes=class_names,
    sample_groups=dict(Car=15, Pedestrian=15, Cyclist=15))
# PointPillars uses different augmentation hyper parameters
train_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(type='ObjectSample', db_sampler=db_sampler, use_ground_plane=False),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='GlobalRotScaleTrans',
@@ -10,7 +10,7 @@ model = dict(
            _delete_=True,
            type='Anchor3DRangeGenerator',
            ranges=[[0, -39.68, -1.78, 69.12, 39.68, -1.78]],
            sizes=[[3.9, 1.6, 1.56]],
            rotations=[0, 1.57],
            reshape_out=True)),
    # model training and testing settings
@@ -29,15 +29,15 @@ model = dict(
                [-80, -80, -0.9122268, 80, 80, -0.9122268],
                [-80, -80, -1.8012227, 80, 80, -1.8012227]],
            sizes=[
                [4.75, 1.92, 1.71],  # car
                [10.24, 2.84, 3.44],  # truck
                [12.70, 2.92, 3.42],  # bus
                [6.52, 2.42, 2.34],  # emergency vehicle
                [8.17, 2.75, 3.20],  # other vehicle
                [2.35, 0.96, 1.59],  # motorcycle
                [1.76, 0.63, 1.44],  # bicycle
                [0.80, 0.76, 1.76],  # pedestrian
                [0.73, 0.35, 0.50]  # animal
            ],
            rotations=[0, 1.57],
            reshape_out=True)))
@@ -29,13 +29,13 @@ model = dict(
                [-49.6, -49.6, -1.763965, 49.6, 49.6, -1.763965],
            ],
            sizes=[
                [4.60718145, 1.95017717, 1.72270761],  # car
                [6.73778078, 2.4560939, 2.73004906],  # truck
                [12.01320693, 2.87427237, 3.81509561],  # trailer
                [1.68452161, 0.60058911, 1.27192197],  # bicycle
                [0.7256437, 0.66344886, 1.75748069],  # pedestrian
                [0.40359262, 0.39694519, 1.06232151],  # traffic_cone
                [0.48578221, 2.49008838, 0.98297065],  # barrier
            ],
            custom_values=[0, 0],
            rotations=[0, 1.57],
_base_ = './hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d.py'
data = dict(samples_per_gpu=2, workers_per_gpu=2)
# fp16 settings, the loss scale is specifically tuned to avoid Nan
fp16 = dict(loss_scale=32.)
@@ -28,15 +28,15 @@ model = dict(
                [-100, -100, -0.9122268, 100, 100, -0.9122268],
                [-100, -100, -1.8012227, 100, 100, -1.8012227]],
            sizes=[
                [4.75, 1.92, 1.71],  # car
                [10.24, 2.84, 3.44],  # truck
                [12.70, 2.92, 3.42],  # bus
                [6.52, 2.42, 2.34],  # emergency vehicle
                [8.17, 2.75, 3.20],  # other vehicle
                [2.35, 0.96, 1.59],  # motorcycle
                [1.76, 0.63, 1.44],  # bicycle
                [0.80, 0.76, 1.76],  # pedestrian
                [0.73, 0.35, 0.50]  # animal
            ],
            rotations=[0, 1.57],
            reshape_out=True)))
@@ -17,7 +17,7 @@ model = dict(
        anchor_generator=dict(
            type='AlignedAnchor3DRangeGenerator',
            ranges=[[-74.88, -74.88, -0.0345, 74.88, 74.88, -0.0345]],
            sizes=[[4.73, 2.08, 1.77]],
            rotations=[0, 1.57],
            reshape_out=True)),
    # model training and testing settings
@@ -14,7 +14,7 @@ model = dict(
        anchor_generator=dict(
            type='AlignedAnchor3DRangeGenerator',
            ranges=[[-74.88, -74.88, -0.0345, 74.88, 74.88, -0.0345]],
            sizes=[[4.73, 2.08, 1.77]],
            rotations=[0, 1.57],
            reshape_out=True)),
    # model training and testing settings
@@ -167,3 +167,47 @@ Models:
          mAPH@L1: 63.3
          mAP@L2: 62.6
          mAPH@L2: 57.6
  - Name: hv_pointpillars_secfpn_sbn-all_fp16_2x8_2x_nus-3d
    In Collection: PointPillars
    Config: configs/pointpillars/hv_pointpillars_secfpn_sbn-all_fp16_2x8_2x_nus-3d.py
    Metadata:
      Training Techniques:
        - AdamW
        - Mixed Precision Training
      Training Resources: 8x TITAN Xp
      Architecture:
        - Hard Voxelization
      Training Data: nuScenes
      Training Memory (GB): 8.37
    Results:
      - Task: 3D Object Detection
        Dataset: nuScenes
        Metrics:
          mAP: 35.19
          NDS: 50.27
    Weights: https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_pointpillars_secfpn_sbn-all_fp16_2x8_2x_nus-3d/hv_pointpillars_secfpn_sbn-all_fp16_2x8_2x_nus-3d_20201020_222626-c3f0483e.pth
    Code:
      Version: v0.7.0
  - Name: hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d
    In Collection: PointPillars
    Config: configs/pointpillars/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d.py
    Metadata:
      Training Techniques:
        - AdamW
        - Mixed Precision Training
      Training Resources: 8x TITAN Xp
      Architecture:
        - Hard Voxelization
      Training Data: nuScenes
      Training Memory (GB): 8.40
    Results:
      - Task: 3D Object Detection
        Dataset: nuScenes
        Metrics:
          mAP: 39.26
          NDS: 53.26
    Weights: https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d_20201021_120719-269f9dd6.pth
    Code:
      Version: v0.7.0
# Designing Network Design Spaces
> [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678)
<!-- [BACKBONE] -->
## Abstract
In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network instances, we design network design spaces that parametrize populations of networks. The overall process is analogous to classic manual design of networks, but elevated to the design space level. Using our methodology we explore the structure aspect of network design and arrive at a low-dimensional design space consisting of simple, regular networks that we call RegNet. The core insight of the RegNet parametrization is surprisingly simple: widths and depths of good networks can be explained by a quantized linear function. We analyze the RegNet design space and arrive at interesting findings that do not match the current practice of network design. The RegNet design space provides simple and fast networks that work well across a wide range of flop regimes. Under comparable training settings and flops, the RegNet models outperform the popular EfficientNet models while being up to 5x faster on GPUs.
<div align=center>
<img src="https://user-images.githubusercontent.com/79644370/144025148-b73002cb-3c82-42e4-8da4-65df97aead9c.png" width="800"/>
</div>
## Introduction
We implement RegNetX models in 3D detection systems and provide their first results with PointPillars on the nuScenes and Lyft datasets.
The pre-trained models are converted from the [model zoo of pycls](https://github.com/facebookresearch/pycls/blob/master/MODEL_ZOO.md) and maintained in [mmcv](https://github.com/open-mmlab/mmcv).
## Usage
To use a RegNet model, there are two steps to do:
@@ -47,7 +46,7 @@ For other pre-trained models or self-implemented regnet models, the users are re
**Note**: Although Fig. 15 & 16 also provide `w0`, `wa`, `wm`, `group_w`, and `bot_mul` for `arch`, they are quantized and thus inaccurate; using them sometimes produces a different backbone whose parameters do not match the keys in the pre-trained model.
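As a hedged sketch of this step, a self-defined RegNet backbone can be plugged into a PointPillars config roughly as follows. The `arch` values shown are the RegNetX-400MF parameters from the pycls model zoo (plus the `depth` field the backbone also needs); treat the exact field set as illustrative rather than a verbatim copy of a released config.

```python
# Sketch only: swap the backbone inherited from the base config for a RegNet whose
# parameters are read from the pycls configs rather than from Fig. 15 & 16.
model = dict(
    pts_backbone=dict(
        _delete_=True,  # drop the backbone defined in the base config
        type='NoStemRegNet',
        # RegNetX-400MF parameters from the pycls model zoo
        arch=dict(w0=24, wa=24.48, wm=2.54, group_w=16, depth=22, bot_mul=1.0),
        init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://regnetx_400mf'),
        out_indices=(1, 2, 3)))
```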
## Results and models
### nuScenes
@@ -67,3 +66,16 @@ For other pre-trained models or self-implemented regnet models, the users are re
|[RegNetX-400MF-SECFPN](./hv_pointpillars_regnet-400mf_secfpn_sbn-all_4x8_2x_lyft-3d.py)| 2x |15.9||14.9|15.1|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/regnet/hv_pointpillars_regnet-400mf_secfpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_regnet-400mf_secfpn_sbn-all_2x8_2x_lyft-3d_20210524_092151-42513826.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/regnet/hv_pointpillars_regnet-400mf_secfpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_regnet-400mf_secfpn_sbn-all_2x8_2x_lyft-3d_20210524_092151.log.json)|
|[FPN](../pointpillars/hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d.py)|2x|9.2||14.9|15.1|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d_20210517_202818-fc6904c3.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d_20210517_202818.log.json)|
|[RegNetX-400MF-FPN](./hv_pointpillars_regnet-400mf_fpn_sbn-all_4x8_2x_lyft-3d.py)|2x|13.0||16.0|16.1|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/regnet/hv_pointpillars_regnet-400mf_fpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_regnet-400mf_fpn_sbn-all_2x8_2x_lyft-3d_20210521_115618-823dcf18.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/regnet/hv_pointpillars_regnet-400mf_fpn_sbn-all_2x8_2x_lyft-3d/hv_pointpillars_regnet-400mf_fpn_sbn-all_2x8_2x_lyft-3d_20210521_115618.log.json)|
## Citation
```latex
@article{radosavovic2020designing,
title={Designing Network Design Spaces},
author={Ilija Radosavovic and Raj Prateek Kosaraju and Ross Girshick and Kaiming He and Piotr Dollár},
year={2020},
eprint={2003.13678},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
_base_ = './hv_pointpillars_regnet-400mf_fpn_sbn-all_4x8_2x_nus-3d.py'
data = dict(samples_per_gpu=2, workers_per_gpu=2)
# fp16 settings, the loss scale is specifically tuned to avoid Nan
fp16 = dict(loss_scale=32.)
@@ -25,15 +25,15 @@ model = dict(
                [-80, -80, -0.9122268, 80, 80, -0.9122268],
                [-80, -80, -1.8012227, 80, 80, -1.8012227]],
            sizes=[
                [4.75, 1.92, 1.71],  # car
                [10.24, 2.84, 3.44],  # truck
                [12.70, 2.92, 3.42],  # bus
                [6.52, 2.42, 2.34],  # emergency vehicle
                [8.17, 2.75, 3.20],  # other vehicle
                [2.35, 0.96, 1.59],  # motorcycle
                [1.76, 0.63, 1.44],  # bicycle
                [0.80, 0.76, 1.76],  # pedestrian
                [0.73, 0.35, 0.50]  # animal
            ],
            rotations=[0, 1.57],
            reshape_out=True)))
@@ -25,13 +25,13 @@ model = dict(
                [-49.6, -49.6, -1.763965, 49.6, 49.6, -1.763965],
            ],
            sizes=[
                [4.60718145, 1.95017717, 1.72270761],  # car
                [6.73778078, 2.4560939, 2.73004906],  # truck
                [12.01320693, 2.87427237, 3.81509561],  # trailer
                [1.68452161, 0.60058911, 1.27192197],  # bicycle
                [0.7256437, 0.66344886, 1.75748069],  # pedestrian
                [0.40359262, 0.39694519, 1.06232151],  # traffic_cone
                [0.48578221, 2.49008838, 0.98297065],  # barrier
            ],
            custom_values=[0, 0],
            rotations=[0, 1.57],
@@ -26,15 +26,15 @@ model = dict(
                [-100, -100, -0.9122268, 100, 100, -0.9122268],
                [-100, -100, -1.8012227, 100, 100, -1.8012227]],
            sizes=[
                [4.75, 1.92, 1.71],  # car
                [10.24, 2.84, 3.44],  # truck
                [12.70, 2.92, 3.42],  # bus
                [6.52, 2.42, 2.34],  # emergency vehicle
                [8.17, 2.75, 3.20],  # other vehicle
                [2.35, 0.96, 1.59],  # motorcycle
                [1.76, 0.63, 1.44],  # bicycle
                [0.80, 0.76, 1.76],  # pedestrian
                [0.73, 0.35, 0.50]  # animal
            ],
            rotations=[0, 1.57],
            reshape_out=True)))