Unverified commit 32a4328b authored by Wenwei Zhang, committed by GitHub

Bump version to V1.0.0rc0

parents 86cc487c a8817998
# Second: Sparsely embedded convolutional detection
> [SECOND: Sparsely Embedded Convolutional Detection](https://www.mdpi.com/1424-8220/18/10/3337)
<!-- [ALGORITHM] -->
## Abstract
LiDAR-based or RGB-D-based object detection is used in numerous applications, ranging from autonomous driving to robot vision. Voxel-based 3D convolutional networks have been used for some time to enhance the retention of information when processing point cloud LiDAR data. However, problems remain, including a slow inference speed and low orientation estimation performance. We therefore investigate an improved sparse convolution method for such networks, which significantly increases the speed of both training and inference. We also introduce a new form of angle loss regression to improve the orientation estimation performance and a new data augmentation approach that can enhance the convergence speed and performance. The proposed network produces state-of-the-art results on the KITTI 3D object detection benchmarks while maintaining a fast inference speed.
<div align=center>
<img src="https://user-images.githubusercontent.com/79644370/143889364-10be11c3-838e-4fc9-9613-184f0cd08907.png" width="800"/>
</div>
## Introduction
We implement SECOND and provide the results and checkpoints on KITTI dataset.
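The "new form of angle loss regression" mentioned in the abstract is the sine-difference encoding that the SECOND-style heads in this commit enable via `diff_rad_by_sin=True`. A minimal NumPy sketch of the idea (an illustration only, not the mmdet3d implementation):

```python
import numpy as np


def sin_diff_encoding(theta_pred, theta_target):
    """Encode the angle error as sin(theta_pred - theta_target).

    The loss stays smooth across the +/-pi boundary and is zero for boxes
    flipped by pi; a separate direction classifier (cf. `dir_offset` in the
    configs) resolves the flip.
    """
    # sin(a - b) = sin(a) * cos(b) - cos(a) * sin(b)
    return (np.sin(theta_pred) * np.cos(theta_target) -
            np.cos(theta_pred) * np.sin(theta_target))


def smooth_l1(x, beta=1.0 / 9.0):
    x = np.abs(x)
    return np.where(x < beta, 0.5 * x * x / beta, x - 0.5 * beta)


# A heading flipped by pi gives (almost) zero angle loss.
print(smooth_l1(sin_diff_encoding(np.pi / 2, -np.pi / 2)))  # ~0.0
```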
## Results and models
### KITTI
| Backbone |Class| Lr schd | Mem (GB) | Inf time (fps) | mAP |Download |
| :---------: | :-----: | :------: | :------------: | :----: |:----: | :------: |
| [SECFPN](./hv_second_secfpn_6x8_80e_kitti-3d-car.py)| Car |cyclic 80e|5.4||79.07|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/second/hv_second_secfpn_6x8_80e_kitti-3d-car/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/second/hv_second_secfpn_6x8_80e_kitti-3d-car/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238.log.json)|
| [SECFPN (FP16)](./hv_second_secfpn_fp16_6x8_80e_kitti-3d-car.py)| Car |cyclic 80e|2.9||78.72|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car_20200924_211301-1f5ad833.pth)&#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car_20200924_211301.log.json)|
| [SECFPN](./hv_second_secfpn_6x8_80e_kitti-3d-3class.py)| 3 Class |cyclic 80e|5.4||64.41|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/second/hv_second_secfpn_6x8_80e_kitti-3d-3class/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/second/hv_second_secfpn_6x8_80e_kitti-3d-3class/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238.log.json)|
| [SECFPN (FP16)](./hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class.py)| 3 Class |cyclic 80e|2.9||67.4|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class_20200925_110059-05f67bdf.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class_20200925_110059.log.json)|
### Waymo
...@@ -34,4 +36,19 @@ We implement SECOND and provide the results and checkpoints on KITTI dataset.
| above @ Pedestrian|||2x|8.12||68.1|59.1|59.5|51.5| |
| above @ Cyclist|||2x|8.12||60.7|59.5|58.4|57.3| |
Note:
- See more details about metrics and data split on Waymo [HERE](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/pointpillars). For implementation details, we basically follow the original settings. All of these results are achieved without bells and whistles, e.g., ensembling, multi-scale training and test-time augmentation.
- `FP16` means Mixed Precision (FP16) is adopted in training; a minimal sketch of such a config is shown below.
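The FP16 variants simply inherit the corresponding FP32 config and add an `fp16` dict, mirroring the `hv_second_secfpn_fp16_*` fragments included later in this commit:

```python
# Sketch of configs/second/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car.py,
# based on the fragments shown in this commit.
_base_ = './hv_second_secfpn_6x8_80e_kitti-3d-car.py'

# fp16 settings: a static loss scale keeps small FP16 gradients from
# underflowing during mixed precision training.
fp16 = dict(loss_scale=512.)
```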
## Citation
```latex
@article{yan2018second,
title={Second: Sparsely embedded convolutional detection},
author={Yan, Yan and Mao, Yuxing and Li, Bo},
journal={Sensors},
year={2018},
publisher={Multidisciplinary Digital Publishing Institute}
}
```
...@@ -12,7 +12,7 @@ model = dict(
_delete_=True,
type='Anchor3DRangeGenerator',
ranges=[[0, -40.0, -1.78, 70.4, 40.0, -1.78]],
sizes=[[3.9, 1.6, 1.56]],
rotations=[0, 1.57],
reshape_out=True)),
# model training and testing settings
......
_base_ = './hv_second_secfpn_6x8_80e_kitti-3d-3class.py'
# fp16 settings
fp16 = dict(loss_scale=512.)
_base_ = './hv_second_secfpn_6x8_80e_kitti-3d-car.py'
# fp16 settings
fp16 = dict(loss_scale=512.)
...@@ -21,7 +21,10 @@ db_sampler = dict(
classes=class_names,
sample_groups=dict(Car=15, Pedestrian=10, Cyclist=10),
points_loader=dict(
    type='LoadPointsFromFile',
    coord_type='LIDAR',
    load_dim=5,
    use_dim=[0, 1, 2, 3, 4]))
train_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=6, use_dim=5),
......
...@@ -57,3 +57,41 @@ Models:
          mAPH@L1: 61.7
          mAP@L2: 58.9
          mAPH@L2: 55.7
  - Name: hv_second_secfpn_fp16_6x8_80e_kitti-3d-car
    In Collection: SECOND
    Config: configs/second/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car.py
    Metadata:
      Training Techniques:
        - AdamW
        - Mixed Precision Training
      Training Resources: 8x TITAN Xp
      Training Data: KITTI
      Training Memory (GB): 2.9
    Results:
      - Task: 3D Object Detection
        Dataset: KITTI
        Metrics:
          mAP: 78.72
    Weights: https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car_20200924_211301-1f5ad833.pth
    Code:
      Version: v0.7.0

  - Name: hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class
    In Collection: SECOND
    Config: configs/second/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class.py
    Metadata:
      Training Techniques:
        - AdamW
        - Mixed Precision Training
      Training Resources: 8x TITAN Xp
      Training Data: KITTI
      Training Memory (GB): 2.9
    Results:
      - Task: 3D Object Detection
        Dataset: KITTI
        Metrics:
          mAP: 67.4
    Weights: https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fp16/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class_20200925_110059-05f67bdf.pth
    Code:
      Version: v0.7.0
# SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
> [SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation](https://arxiv.org/abs/2002.10111)
<!-- [ALGORITHM] -->
## Abstract
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving. In case of monocular vision, successful methods have been mainly based on two ingredients: (i) a network generating 2D region proposals, (ii) a R-CNN structure predicting 3D object pose by utilizing the acquired regions of interest. We argue that the 2D detection network is redundant and introduces non-negligible noise for 3D detection. Hence, we propose a novel 3D object detection method, named SMOKE, in this paper that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, which significantly improves both training convergence and detection accuracy. In contrast to previous 3D detection techniques, our method does not require complicated pre/post-processing, extra data, and a refinement stage. Despite of its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset, giving the best state-of-the-art result on both 3D object detection and Bird's eye view evaluation.
<div align=center>
<img src="https://user-images.githubusercontent.com/79644370/143886681-52cb72b9-6635-4624-a728-1c243b046517.png" width="800"/>
</div>
## Introduction
We implement SMOKE and provide the results and checkpoints on KITTI dataset.
## Results and models
### KITTI
| Backbone | Lr schd | Mem (GB) | Inf time (fps) | mAP | Download |
| :---------: | :-----: | :------: | :------------: | :----: | :------: |
|[DLA34](./smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d.py)|6x|9.64||13.85|[model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d_20210929_015553-d46d9bb0.pth) &#124; [log](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d_20210929_015553.log.json)|
Note: mAP represents Car moderate 3D strict AP11 results.
Detailed performance on KITTI 3D detection (3D/BEV) is as follows, evaluated by the AP11 metric:
| | Easy | Moderate | Hard |
|-------------|:-------------:|:--------------:|:------------:|
| Car | 16.92 / 22.97 | 13.85 / 18.32 | 11.90 / 15.88|
| Pedestrian | 11.13 / 12.61| 11.10 / 11.32 | 10.67 / 11.14|
| Cyclist | 0.99 / 1.47 | 0.54 / 0.65 | 0.55 / 0.67 |
## Citation
```latex
@inproceedings{liu2020smoke,
title={Smoke: Single-stage monocular 3d object detection via keypoint estimation},
author={Liu, Zechen and Wu, Zizhang and T{\'o}th, Roland},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
pages={996--997},
year={2020}
}
```
Collections:
  - Name: SMOKE
    Metadata:
      Training Data: KITTI
      Training Techniques:
        - Adam
      Training Resources: 4x V100 GPUS
      Architecture:
        - SMOKEMono3DHead
        - DLA
    Paper:
      URL: https://arxiv.org/abs/2002.10111
      Title: 'SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation'
    README: configs/smoke/README.md
    Code:
      URL: https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0.dev0/mmdet3d/models/detectors/smoke_mono3d.py#L7
      Version: v1.0.0

Models:
  - Name: smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d
    In Collection: SMOKE
    Config: configs/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d.py
    Metadata:
      Training Memory (GB): 9.6
    Results:
      - Task: 3D Object Detection
        Dataset: KITTI
        Metrics:
          mAP: 13.8
    Weights: https://download.openmmlab.com/mmdetection3d/v0.1.0_models/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d_20210929_015553-d46d9bb0.pth
_base_ = [
    '../_base_/datasets/kitti-mono3d.py', '../_base_/models/smoke.py',
    '../_base_/default_runtime.py'
]
# optimizer
optimizer = dict(type='Adam', lr=2.5e-4)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='step', warmup=None, step=[50])
# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=72)
log_config = dict(interval=10)
find_unused_parameters = True
class_names = ['Pedestrian', 'Cyclist', 'Car']
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        with_attr_label=False,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='RandomShiftScale', shift_scale=(0.2, 0.4), aug_prob=0.3),
    dict(type='AffineResize', img_scale=(1280, 384), down_ratio=4),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d', 'gt_labels_3d',
            'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1280, 384),
        flip=False,
        transforms=[
            dict(type='AffineResize', img_scale=(1280, 384), down_ratio=4),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds
> [SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds](https://arxiv.org/abs/2004.02774)
<!-- [ALGORITHM] -->
## Abstract
Multi-class 3D object detection aims to localize and classify objects of multiple categories from point clouds. Due to the nature of point clouds, i.e. unstructured, sparse and noisy, some features benefitting multi-class discrimination are underexploited, such as shape information. In this paper, we propose a novel 3D shape signature to explore the shape information from point clouds. By incorporating operations of symmetry, convex hull and chebyshev fitting, the proposed shape signature is not only compact and effective but also robust to the noise, which serves as a soft constraint to improve the feature capability of multi-class discrimination. Based on the proposed shape signature, we develop the shape signature networks (SSN) for 3D object detection, which consist of pyramid feature encoding part, shape-aware grouping heads and explicit shape encoding objective. Experiments show that the proposed method performs remarkably better than existing methods on two large-scale datasets. Furthermore, our shape signature can act as a plug-and-play component and ablation study shows its effectiveness and good scalability.
<div align=center>
<img src="https://user-images.githubusercontent.com/79644370/144024507-9c1f23c1-5e5a-49c8-b346-ff37e30adc3a.png" width="800"/>
</div>
## Introduction
We implement PointPillars with the shape-aware grouping heads used in SSN and provide the results and checkpoints on the nuScenes and Lyft datasets.
## Results and models
### NuScenes
...@@ -39,3 +40,14 @@ Note:
The main difference between the shape-aware grouping heads and the original SECOND FPN heads is that the former groups objects with similar sizes and shapes together and designs shape-specific heads for each group. Heavier heads (with more convolutions and larger strides) are designed for large objects, while smaller heads are used for small objects. Note that the outputs may have different feature map sizes, so an anchor generator tailored to these feature maps is also needed in the implementation.
Users could try other settings in terms of the head design. Here we basically refer to the implementation [HERE](https://github.com/xinge008/SSN).
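The grouping idea itself can be sketched in a few lines. The snippet below is only an illustration (the helper names and per-group settings are hypothetical, not the actual mmdet3d shape-aware head API); the mean sizes are taken from the Lyft anchor sizes that appear later in this commit.

```python
# Illustrative sketch: bucket classes by mean box length and attach heavier
# heads (more convs, larger stride) to the large-object group.
class_mean_sizes = {  # (l, w, h) in metres, from the Lyft anchor sizes below
    'pedestrian': (0.80, 0.76, 1.76),
    'bicycle': (1.76, 0.63, 1.44),
    'car': (4.75, 1.92, 1.71),
    'bus': (12.70, 2.92, 3.42),
}


def group_by_length(sizes, threshold=3.0):
    """Split classes into a small-object and a large-object group."""
    groups = {'small': [], 'large': []}
    for name, (length, _, _) in sizes.items():
        groups['small' if length < threshold else 'large'].append(name)
    return groups


# Hypothetical per-group head settings: large objects get more convolutions
# and a larger stride, so each group outputs a feature map of a different
# resolution and therefore needs its own anchor generator settings.
head_cfg = {
    'small': dict(num_convs=2, stride=1),
    'large': dict(num_convs=4, stride=2),
}

for group, names in group_by_length(class_mean_sizes).items():
    print(group, names, head_cfg[group])
```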
## Citation
```latex
@inproceedings{zhu2020ssn,
title={SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds},
author={Zhu, Xinge and Ma, Yuexin and Wang, Tai and Xu, Yan and Shi, Jianping and Lin, Dahua},
booktitle={Proceedings of the European Conference on Computer Vision},
year={2020}
}
```
...@@ -96,15 +96,15 @@ model = dict(
[-100, -100, -0.6276341, 100, 100, -0.6276341],
[-100, -100, -0.3033737, 100, 100, -0.3033737]],
sizes=[
    [1.76, 0.63, 1.44],  # bicycle
    [2.35, 0.96, 1.59],  # motorcycle
    [0.80, 0.76, 1.76],  # pedestrian
    [0.73, 0.35, 0.50],  # animal
    [4.75, 1.92, 1.71],  # car
    [6.52, 2.42, 2.34],  # emergency vehicle
    [12.70, 2.92, 3.42],  # bus
    [8.17, 2.75, 3.20],  # other vehicle
    [10.24, 2.84, 3.44]  # truck
],
custom_values=[],
rotations=[0, 1.57],
...@@ -137,7 +137,7 @@ model = dict(
],
assign_per_class=True,
diff_rad_by_sin=True,
dir_offset=-0.7854,  # -pi/4
dir_limit_offset=0,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=7),
loss_cls=dict(
......
...@@ -94,16 +94,16 @@ model = dict(
[-50, -50, -1.80673031, 50, 50, -1.80673031],
[-50, -50, -1.64824291, 50, 50, -1.64824291]],
sizes=[
    [1.68452161, 0.60058911, 1.27192197],  # bicycle
    [2.09973778, 0.76279481, 1.44403034],  # motorcycle
    [0.72564370, 0.66344886, 1.75748069],  # pedestrian
    [0.40359262, 0.39694519, 1.06232151],  # traffic cone
    [0.48578221, 2.49008838, 0.98297065],  # barrier
    [4.60718145, 1.95017717, 1.72270761],  # car
    [6.73778078, 2.45609390, 2.73004906],  # truck
    [12.01320693, 2.87427237, 3.81509561],  # trailer
    [11.1885991, 2.94046906, 3.47030982],  # bus
    [6.38352896, 2.73050468, 3.13312415]  # construction vehicle
],
custom_values=[0, 0],
rotations=[0, 1.57],
...@@ -144,7 +144,7 @@ model = dict(
],
assign_per_class=True,
diff_rad_by_sin=True,
dir_offset=-0.7854,  # -pi/4
dir_limit_offset=0,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=9),
loss_cls=dict(
......
# Deep Hough Voting for 3D Object Detection in Point Clouds
> [Deep Hough Voting for 3D Object Detection in Point Clouds](https://arxiv.org/abs/1904.09664)
<!-- [ALGORITHM] -->
## Abstract
Current 3D object detection methods are heavily influenced by 2D detectors. In order to leverage architectures in 2D detectors, they often convert 3D point clouds to regular grids (i.e., to voxel grids or to bird's eye view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to directly detect objects in point clouds. In this work, we return to first principles to construct a 3D detection pipeline for point cloud data and as generic as possible. However, due to the sparse nature of the data -- samples from 2D manifolds in 3D space -- we face a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point thus hard to regress accurately in one step. To address the challenge, we propose VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting. Our model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D with a simple design, compact model size and high efficiency. Remarkably, VoteNet outperforms previous methods by using purely geometric information without relying on color images.
<div align=center>
<img src="https://user-images.githubusercontent.com/79644370/143888295-af7435b4-9f75-4669-b5f8-a19ae24a051c.png" width="800"/>
</div>
## Introduction
We implement VoteNet and provide the results and checkpoints on ScanNet and SUNRGBD datasets.
## Results and models
### ScanNet
...@@ -54,3 +55,14 @@ iou_loss=dict(type='AxisAlignedIoULoss', reduction='sum', loss_weight=10.0 / 3.0
| [PointNet++](./votenet_iouloss_8x8_scannet-3d-18class.py) | 3x |4.1||63.81|44.21|/|
For now, we only support calculating the IoU loss for axis-aligned bounding boxes, since the CUDA op for general 3D IoU does not implement the backward method. Therefore, the IoU loss can only be used for the ScanNet dataset for now.
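The axis-aligned case is tractable because the IoU reduces to per-axis min/max operations, which remain differentiable when written with tensors. A minimal NumPy sketch of the computation (an illustration only, not the `AxisAlignedIoULoss` implementation or the CUDA op):

```python
import numpy as np


def axis_aligned_iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (x1, y1, z1, x2, y2, z2).

    Only min/max/clip and products are involved, so the same expression is
    differentiable almost everywhere with torch tensors, which is why the
    IoU loss is restricted to axis-aligned boxes here.
    """
    box_a, box_b = np.asarray(box_a, float), np.asarray(box_b, float)
    lo = np.maximum(box_a[:3], box_b[:3])          # intersection lower corner
    hi = np.minimum(box_a[3:], box_b[3:])          # intersection upper corner
    inter = np.prod(np.clip(hi - lo, 0.0, None))   # zero if boxes are disjoint
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)


# Two unit cubes offset by 0.5 m along x: IoU = 0.5 / 1.5 = 1/3.
print(axis_aligned_iou_3d([0, 0, 0, 1, 1, 1], [0.5, 0, 0, 1.5, 1, 1]))
```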
## Citation
```latex
@inproceedings{qi2019deep,
author = {Qi, Charles R and Litany, Or and He, Kaiming and Guibas, Leonidas J},
title = {Deep Hough Voting for 3D Object Detection in Point Clouds},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
year = {2019}
}
```
import argparse
import mmcv
from indoor3d_util import export
from os import path as osp
parser = argparse.ArgumentParser()
parser.add_argument(
......
import glob
from os import path as osp
import numpy as np
# -----------------------------------------------------------------------------
# CONSTANTS
# -----------------------------------------------------------------------------
......
...@@ -11,11 +11,12 @@ Usage example: python ./batch_load_scannet_data.py
"""
import argparse
import datetime
import os
from os import path as osp
import numpy as np
from load_scannet_data import export
DONOTCARE_CLASS_IDS = np.array([])
OBJ_CLASS_IDS = np.array(
    [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, 36, 39])
......
# Modified from https://github.com/ScanNet/ScanNet/blob/master/SensReader/python/SensorData.py # noqa
import os
import struct
import zlib
from argparse import ArgumentParser
from functools import partial
import imageio
import mmcv
import numpy as np
COMPRESSION_TYPE_COLOR = {-1: 'unknown', 0: 'raw', 1: 'png', 2: 'jpeg'}
COMPRESSION_TYPE_DEPTH = {
......
...@@ -9,8 +9,9 @@ instance segmentations."""
import argparse
import inspect
import json
import os
import numpy as np
import scannet_utils
currentdir = os.path.dirname(
...@@ -90,7 +91,7 @@ def export(mesh_file,
test_mode (bool): Whether is generating test data without labels.
Default: False.
It returns a tuple, which contains the following things:
np.ndarray: Vertices of points data.
np.ndarray: Indexes of label.
np.ndarray: Indexes of instance.
......
...@@ -8,8 +8,9 @@
"""
import csv
import os
import numpy as np
from plyfile import PlyData
......
...@@ -19,7 +19,7 @@ RUN pip install mmsegmentation==0.18.0
# Install MMDetection3D
RUN conda clean --all
COPY . /mmdetection3d
WORKDIR /mmdetection3d
ENV FORCE_CUDA="1"
RUN pip install -r requirements/build.txt
......