Unverified Commit fe25f7a5 authored by Wenwei Zhang's avatar Wenwei Zhang Committed by GitHub

Merge pull request #2867 from open-mmlab/dev-1.x

Bump version to 1.4.0
parents 5c0613be 0ef13b83
......@@ -85,7 +85,7 @@ jobs:
type: string
cuda:
type: enum
enum: ["11.1", "11.7"]
enum: ["10.2", "11.7"]
cudnn:
type: integer
default: 8
......@@ -173,7 +173,8 @@ workflows:
torch: 1.8.1
# Use double quotation marks to explicitly specify its type
# as string instead of number
cuda: "11.1"
cuda: "10.2"
cudnn: 7
requires:
- hold
- build_cuda:
......@@ -190,7 +191,8 @@ workflows:
- build_cuda:
name: minimum_version_gpu
torch: 1.8.1
cuda: "11.1"
cuda: "10.2"
cudnn: 7
filters:
branches:
only:
......
......@@ -134,3 +134,4 @@ data/sunrgbd/OFFICIAL_SUNRGBD/
# Waymo evaluation
mmdet3d/evaluation/functional/waymo_utils/compute_detection_metrics_main
mmdet3d/evaluation/functional/waymo_utils/compute_detection_let_metrics_main
mmdet3d/evaluation/functional/waymo_utils/compute_segmentation_metrics_main
......@@ -104,9 +104,15 @@ Like [MMDetection](https://github.com/open-mmlab/mmdetection) and [MMCV](https:/
### Highlight
**We have renamed the branch `1.1` to `main` and switched the default branch from `master` to `main`. We encourage users to migrate to the latest version, though the migration comes at some cost. Please refer to the [Migration Guide](docs/en/migration.md) for more details.**
In version 1.4, MMDetection3D refactors the Waymo dataset and accelerates its preprocessing, training/testing setup, and evaluation. We also extend support for camera-based 3D object detection models on Waymo, such as monocular and BEV models. A detailed description of the Waymo data information is provided [here](https://mmdetection3d.readthedocs.io/en/latest/advanced_guides/datasets/waymo.html).
We have constructed a comprehensive LiDAR semantic segmentation benchmark on SemanticKITTI, including the Cylinder3D, MinkUNet, and SPVCNN methods. Notably, the improved MinkUNetv2 can achieve 70.3 mIoU on the validation set of SemanticKITTI. We have also supported the training of BEVFusion and an occupancy prediction method, TPVFormer, in our `projects`. More new features for 3D perception are on the way. Please stay tuned!
Besides, in version 1.4, MMDetection3D provides [Waymo-mini](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_mini.tar.gz) to help community users get started with Waymo and use it for quick iterative development.
**v1.4.0** was released on 8/1/2024:
- Support the training of [DSVT](https://arxiv.org/abs/2301.06051) in `projects`
- Support [NeRF-Det](https://arxiv.org/abs/2307.14620) in `projects`
- Refactor the Waymo dataset
**v1.3.0** was released on 18/10/2023:
......
......@@ -104,9 +104,15 @@ MMDetection3D is an open-source, PyTorch-based object detection toolbox, the next generation
### Highlight
**We have renamed the branch `1.1` to `main` and switched the default branch from `master` to `main`. We encourage users to migrate to the latest version. Please refer to the [Migration Guide](docs/en/migration.md) for more details.**
In version 1.4, MMDetection3D refactors the Waymo dataset and accelerates its preprocessing, training/testing setup, and evaluation. It also extends support for camera-based 3D object detection models on Waymo, such as monocular and BEV models. A detailed interpretation of the Waymo data information is provided [here](https://mmdetection3d.readthedocs.io/en/latest/advanced_guides/datasets/waymo.html).
We have constructed a comprehensive point cloud semantic segmentation benchmark on SemanticKITTI, including the Cylinder3D, MinkUNet, and SPVCNN methods. Notably, the improved MinkUNetv2 can achieve 70.3 mIoU on the validation set. We have also supported the training of BEVFusion and the new 3D occupancy prediction network TPVFormer in `projects`. More new features for 3D perception are on the way. Please stay tuned!
In addition, version 1.4 provides [Waymo-mini](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_mini.tar.gz) to help community users get started with Waymo and iterate quickly.
**v1.4.0** was released on 2024.1.8:
- Support the training of [DSVT](https://arxiv.org/abs/2301.06051) in `projects`
- Support [NeRF-Det](https://arxiv.org/abs/2307.14620) in `projects`
- Refactor the Waymo dataset
**v1.3.0** was released on 2023.10.18:
......@@ -171,8 +177,6 @@ MMDetection3D is an open-source, PyTorch-based object detection toolbox, the next generation
## Benchmark and Model Zoo
Results and models are available in the [model zoo](docs/zh_cn/model_zoo.md).
<div align="center">
......
# dataset settings
# D3 in the config name means the whole dataset is divided into 3 folds
# We only use one fold for efficient experiments
dataset_type = 'WaymoDataset'
data_root = 'data/waymo/kitti_format/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
metainfo = dict(classes=class_names)
input_modality = dict(use_lidar=False, use_camera=True)
# Example to use different file client
# Method 1: simply set the data root and let the file I/O module
# automatically infer from the prefix (LMDB and Memcached are not supported yet)
# data_root = 's3://openmmlab/datasets/detection3d/waymo/kitti_format/'
# Method 2: Use backend_args (named file_client_args in versions before 1.1.0)
# backend_args = dict(
# backend='petrel',
# path_mapping=dict({
# './data/': 's3://openmmlab/datasets/detection3d/',
# 'data/': 's3://openmmlab/datasets/detection3d/'
# }))
backend_args = None
train_pipeline = [
dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
dict(
type='LoadAnnotations3D',
with_bbox=True,
with_label=True,
with_attr_label=False,
with_bbox_3d=True,
with_label_3d=True,
with_bbox_depth=True),
# base shape (1248, 832), scale (0.95, 1.05)
dict(
type='RandomResize3D',
scale=(1248, 832),
ratio_range=(0.95, 1.05),
# ratio_range=(1., 1.),
interpolation='nearest',
keep_ratio=True,
),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='Pack3DDetInputs',
keys=[
'img', 'gt_bboxes', 'gt_bboxes_labels', 'gt_bboxes_3d',
'gt_labels_3d', 'centers_2d', 'depths'
]),
]
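# For reference, a minimal sketch (assuming mmengine is installed) of how a
# pipeline list like the above is materialized: each dict is built into a
# transform via the TRANSFORMS registry and applied to a sample in order.
#   from mmengine.dataset import Compose
#   pipeline = Compose(train_pipeline)
#   # results = pipeline(data_info)  # data_info: the dict for one sample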
test_pipeline = [
dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
dict(
type='RandomResize3D',
scale=(1248, 832),
ratio_range=(1., 1.),
interpolation='nearest',
keep_ratio=True),
dict(
type='Pack3DDetInputs',
keys=['img'],
meta_keys=[
'box_type_3d', 'img_shape', 'cam2img', 'scale_factor',
'sample_idx', 'context_name', 'timestamp', 'lidar2cam'
]),
]
# construct a pipeline for data and GT loading in the show function
# please keep its loading behavior consistent with test_pipeline (e.g. the file client)
eval_pipeline = [
dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
dict(
type='RandomResize3D',
scale=(1248, 832),
ratio_range=(1., 1.),
interpolation='nearest',
keep_ratio=True),
dict(
type='Pack3DDetInputs',
keys=['img'],
meta_keys=[
'box_type_3d', 'img_shape', 'cam2img', 'scale_factor',
'sample_idx', 'context_name', 'timestamp', 'lidar2cam'
]),
]
train_dataloader = dict(
batch_size=3,
num_workers=3,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='waymo_infos_train.pkl',
data_prefix=dict(
pts='training/velodyne',
CAM_FRONT='training/image_0',
CAM_FRONT_LEFT='training/image_1',
CAM_FRONT_RIGHT='training/image_2',
CAM_SIDE_LEFT='training/image_3',
CAM_SIDE_RIGHT='training/image_4'),
pipeline=train_pipeline,
modality=input_modality,
test_mode=False,
metainfo=metainfo,
cam_sync_instances=True,
# we use box_type_3d='LiDAR' in the KITTI and nuScenes datasets
# and box_type_3d='Depth' in the SUN RGB-D and ScanNet datasets.
box_type_3d='Camera',
load_type='fov_image_based',
# load one frame every three frames
load_interval=3,
backend_args=backend_args))
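# A sketch of what `load_interval=3` above amounts to (assuming the dataset
# applies simple stride subsampling to its annotation list):
#   data_list = data_list[::3]  # keep one frame out of every three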
val_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(
pts='training/velodyne',
CAM_FRONT='training/image_0',
CAM_FRONT_LEFT='training/image_1',
CAM_FRONT_RIGHT='training/image_2',
CAM_SIDE_LEFT='training/image_3',
CAM_SIDE_RIGHT='training/image_4'),
ann_file='waymo_infos_val.pkl',
pipeline=eval_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
cam_sync_instances=True,
# we use box_type_3d='LiDAR' in the KITTI and nuScenes datasets
# and box_type_3d='Depth' in the SUN RGB-D and ScanNet datasets.
box_type_3d='Camera',
load_type='fov_image_based',
load_eval_anns=False,
backend_args=backend_args))
test_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(
pts='training/velodyne',
CAM_FRONT='training/image_0',
CAM_FRONT_LEFT='training/image_1',
CAM_FRONT_RIGHT='training/image_2',
CAM_SIDE_LEFT='training/image_3',
CAM_SIDE_RIGHT='training/image_4'),
ann_file='waymo_infos_val.pkl',
pipeline=eval_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
cam_sync_instances=True,
# we use box_type_3d='LiDAR' in the KITTI and nuScenes datasets
# and box_type_3d='Depth' in the SUN RGB-D and ScanNet datasets.
box_type_3d='Camera',
load_type='fov_image_based',
backend_args=backend_args))
val_evaluator = dict(
type='WaymoMetric',
waymo_bin_file='./data/waymo/waymo_format/fov_gt.bin',
metric='LET_mAP',
load_type='fov_image_based',
result_prefix='./pgd_fov_pred')
test_evaluator = val_evaluator
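# As we read the evaluator above: predictions are written in Waymo format
# under `result_prefix` and scored against `fov_gt.bin`, the ground truth
# for the front-of-view setting that `load_type='fov_image_based'` selects
# (see the PGD README later in this changeset).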
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
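# This dataset file is intended to be inherited via `_base_`, as the PGD
# config later in this changeset does:
#   _base_ = ['../_base_/datasets/waymoD3-fov-mono3d-3class.py', ...]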
# dataset settings
# D3 in the config name means the whole dataset is divided into 3 folds
# We only use one fold for efficient experiments
dataset_type = 'WaymoDataset'
data_root = 'data/waymo/kitti_format/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
metainfo = dict(classes=class_names)
input_modality = dict(use_lidar=False, use_camera=True)
# Example to use different file client
# Method 1: simply set the data root and let the file I/O module
# automatically infer from the prefix (LMDB and Memcached are not supported yet)
# data_root = 's3://openmmlab/datasets/detection3d/waymo/kitti_format/'
# Method 2: Use backend_args (named file_client_args in versions before 1.1.0)
# backend_args = dict(
# backend='petrel',
# path_mapping=dict({
# './data/': 's3://openmmlab/datasets/detection3d/',
# 'data/': 's3://openmmlab/datasets/detection3d/'
# }))
backend_args = None
train_pipeline = [
dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
dict(
type='LoadAnnotations3D',
with_bbox=True,
with_label=True,
with_attr_label=False,
with_bbox_3d=True,
with_label_3d=True,
with_bbox_depth=True),
# base shape (1248, 832), scale (0.95, 1.05)
dict(
type='RandomResize3D',
scale=(1248, 832),
# ratio_range=(1., 1.),
ratio_range=(0.95, 1.05),
interpolation='nearest',
keep_ratio=True,
),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='Pack3DDetInputs',
keys=[
'img', 'gt_bboxes', 'gt_bboxes_labels', 'gt_bboxes_3d',
'gt_labels_3d', 'centers_2d', 'depths'
]),
]
test_pipeline = [
dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
dict(
type='Resize3D',
scale_factor=0.65,
interpolation='nearest',
keep_ratio=True),
dict(
type='Pack3DDetInputs',
keys=['img'],
meta_keys=[
'box_type_3d', 'img_shape', 'cam2img', 'scale_factor',
'sample_idx', 'context_name', 'timestamp', 'lidar2cam'
]),
]
# construct a pipeline for data and GT loading in the show function
# please keep its loading behavior consistent with test_pipeline (e.g. the file client)
eval_pipeline = [
dict(type='LoadImageFromFileMono3D', backend_args=backend_args),
dict(
type='Resize3D',
scale_factor=0.65,
interpolation='nearest',
keep_ratio=True),
dict(
type='Pack3DDetInputs',
keys=['img'],
meta_keys=[
'box_type_3d', 'img_shape', 'cam2img', 'scale_factor',
'sample_idx', 'context_name', 'timestamp', 'lidar2cam'
]),
]
train_dataloader = dict(
batch_size=3,
num_workers=3,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='waymo_infos_train.pkl',
data_prefix=dict(
pts='training/velodyne',
CAM_FRONT='training/image_0',
CAM_FRONT_LEFT='training/image_1',
CAM_FRONT_RIGHT='training/image_2',
CAM_SIDE_LEFT='training/image_3',
CAM_SIDE_RIGHT='training/image_4'),
pipeline=train_pipeline,
modality=input_modality,
test_mode=False,
metainfo=metainfo,
cam_sync_instances=True,
# we use box_type_3d='LiDAR' in the KITTI and nuScenes datasets
# and box_type_3d='Depth' in the SUN RGB-D and ScanNet datasets.
box_type_3d='Camera',
load_type='mv_image_based',
# load one frame every three frames
load_interval=3,
backend_args=backend_args))
val_dataloader = dict(
batch_size=1,
num_workers=0,
persistent_workers=False,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(
pts='training/velodyne',
CAM_FRONT='training/image_0',
CAM_FRONT_LEFT='training/image_1',
CAM_FRONT_RIGHT='training/image_2',
CAM_SIDE_LEFT='training/image_3',
CAM_SIDE_RIGHT='training/image_4'),
ann_file='waymo_infos_val.pkl',
pipeline=eval_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
cam_sync_instances=True,
# we use box_type_3d='LiDAR' in the KITTI and nuScenes datasets
# and box_type_3d='Depth' in the SUN RGB-D and ScanNet datasets.
box_type_3d='Camera',
load_type='mv_image_based',
# load_eval_anns=False,
backend_args=backend_args))
test_dataloader = dict(
batch_size=1,
num_workers=0,
persistent_workers=False,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(
pts='training/velodyne',
CAM_FRONT='training/image_0',
CAM_FRONT_LEFT='training/image_1',
CAM_FRONT_RIGHT='training/image_2',
CAM_SIDE_LEFT='training/image_3',
CAM_SIDE_RIGHT='training/image_4'),
ann_file='waymo_infos_val.pkl',
pipeline=eval_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
cam_sync_instances=True,
# we use box_type_3d='LiDAR' in the KITTI and nuScenes datasets
# and box_type_3d='Depth' in the SUN RGB-D and ScanNet datasets.
box_type_3d='Camera',
load_type='mv_image_based',
load_eval_anns=False,
backend_args=backend_args))
val_evaluator = dict(
type='WaymoMetric',
waymo_bin_file='./data/waymo/waymo_format/cam_gt.bin',
metric='LET_mAP',
load_type='mv_image_based',
result_prefix='./pgd_mv_pred',
nms_cfg=dict(
use_rotate_nms=True,
nms_across_levels=False,
nms_pre=500,
nms_thr=0.05,
score_thr=0.001,
min_bbox_size=0,
max_per_frame=100))
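# Note: unlike the fov evaluator earlier, this multi-view evaluator carries an
# `nms_cfg`: predictions of the same object can overlap across the five
# camera views, so they are deduplicated before scoring (our reading of the
# config; see WaymoMetric for the exact behavior).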
test_evaluator = val_evaluator
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
......@@ -89,7 +89,10 @@ test_pipeline = [
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range)
]),
dict(type='Pack3DDetInputs', keys=['points'])
dict(
type='Pack3DDetInputs',
keys=['points'],
meta_keys=['box_type_3d', 'sample_idx', 'context_name', 'timestamp'])
]
# construct a pipeline for data and GT loading in the show function
# please keep its loading behavior consistent with test_pipeline (e.g. the file client)
......@@ -100,7 +103,10 @@ eval_pipeline = [
load_dim=6,
use_dim=5,
backend_args=backend_args),
dict(type='Pack3DDetInputs', keys=['points']),
dict(
type='Pack3DDetInputs',
keys=['points'],
meta_keys=['box_type_3d', 'sample_idx', 'context_name', 'timestamp'])
]
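# `context_name` and `timestamp` identify a Waymo frame uniquely; carrying
# them through meta_keys lets the refactored WaymoMetric match predictions
# against gt.bin without the KITTI-format info files (our understanding of
# this change).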
train_dataloader = dict(
......@@ -164,12 +170,7 @@ test_dataloader = dict(
backend_args=backend_args))
val_evaluator = dict(
type='WaymoMetric',
ann_file='./data/waymo/kitti_format/waymo_infos_val.pkl',
waymo_bin_file='./data/waymo/waymo_format/gt.bin',
data_root='./data/waymo/waymo_format',
backend_args=backend_args,
convert_kitti_format=False)
type='WaymoMetric', waymo_bin_file='./data/waymo/waymo_format/gt.bin')
test_evaluator = val_evaluator
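# After the refactor the metric consumes predictions in Waymo format
# directly, so only the ground-truth bin file is needed; `ann_file`,
# `data_root`, and `convert_kitti_format` (removed above) are obsolete here.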
vis_backends = [dict(type='LocalVisBackend')]
......
......@@ -62,7 +62,8 @@ train_pipeline = [
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
keys=['points'],
meta_keys=['box_type_3d', 'sample_idx', 'context_name', 'timestamp'])
]
test_pipeline = [
dict(
......@@ -86,7 +87,10 @@ test_pipeline = [
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range)
]),
dict(type='Pack3DDetInputs', keys=['points'])
dict(
type='Pack3DDetInputs',
keys=['points'],
meta_keys=['box_type_3d', 'sample_idx', 'context_name', 'timestamp'])
]
# construct a pipeline for data and GT loading in the show function
# please keep its loading behavior consistent with test_pipeline (e.g. the file client)
......@@ -161,12 +165,7 @@ test_dataloader = dict(
backend_args=backend_args))
val_evaluator = dict(
type='WaymoMetric',
ann_file='./data/waymo/kitti_format/waymo_infos_val.pkl',
waymo_bin_file='./data/waymo/waymo_format/gt.bin',
data_root='./data/waymo/waymo_format',
convert_kitti_format=False,
backend_args=backend_args)
type='WaymoMetric', waymo_bin_file='./data/waymo/waymo_format/gt.bin')
test_evaluator = val_evaluator
vis_backends = [dict(type='LocalVisBackend')]
......
# dataset settings
# D3 in the config name means the whole dataset is divided into 3 folds
# D5 in the config name means the whole dataset is divided into 5 folds
# We only use one fold for efficient experiments
dataset_type = 'WaymoDataset'
data_root = 'data/waymo/kitti_format/'
......@@ -19,7 +19,7 @@ data_root = 'data/waymo/kitti_format/'
# }))
backend_args = None
class_names = ['Car', 'Pedestrian', 'Cyclist']
class_names = ['Pedestrian', 'Cyclist', 'Car']
input_modality = dict(use_lidar=False, use_camera=True)
point_cloud_range = [-35.0, -75.0, -2, 75.0, 75.0, 4]
......@@ -30,7 +30,7 @@ train_transforms = [
scale=(1248, 832),
ratio_range=(0.95, 1.05),
keep_ratio=True),
dict(type='RandomCrop3D', crop_size=(720, 1080)),
dict(type='RandomCrop3D', crop_size=(1080, 720)),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5, flip_box3d=False),
]
......@@ -70,7 +70,14 @@ test_pipeline = [
to_float32=True,
backend_args=backend_args),
dict(type='MultiViewWrapper', transforms=test_transforms),
dict(type='Pack3DDetInputs', keys=['img'])
dict(
type='Pack3DDetInputs',
keys=['img'],
meta_keys=[
'box_type_3d', 'img_shape', 'ori_cam2img', 'scale_factor',
'sample_idx', 'context_name', 'timestamp', 'lidar2cam',
'num_ref_frames', 'num_views'
])
]
# construct a pipeline for data and GT loading in the show function
# please keep its loading behavior consistent with test_pipeline (e.g. the file client)
......@@ -80,7 +87,14 @@ eval_pipeline = [
to_float32=True,
backend_args=backend_args),
dict(type='MultiViewWrapper', transforms=test_transforms),
dict(type='Pack3DDetInputs', keys=['img'])
dict(
type='Pack3DDetInputs',
keys=['img'],
meta_keys=[
'box_type_3d', 'img_shape', 'ori_cam2img', 'scale_factor',
'sample_idx', 'context_name', 'timestamp', 'lidar2cam',
'num_ref_frames', 'num_views'
])
]
metainfo = dict(classes=class_names)
......@@ -103,6 +117,7 @@ train_dataloader = dict(
pipeline=train_pipeline,
modality=input_modality,
test_mode=False,
cam_sync_instances=True,
metainfo=metainfo,
box_type_3d='Lidar',
load_interval=5,
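# `cam_sync_instances=True` (added above) trains on the camera-synchronized
# annotations, i.e. instances visible in the camera images, which suits this
# camera-only setting (our understanding of the Waymo refactor).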
......@@ -149,7 +164,7 @@ test_dataloader = dict(
CAM_FRONT_RIGHT='training/image_2',
CAM_SIDE_LEFT='training/image_3',
CAM_SIDE_RIGHT='training/image_4'),
pipeline=eval_pipeline,
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
......@@ -157,10 +172,7 @@ test_dataloader = dict(
backend_args=backend_args))
val_evaluator = dict(
type='WaymoMetric',
ann_file='./data/waymo/kitti_format/waymo_infos_val.pkl',
waymo_bin_file='./data/waymo/waymo_format/cam_gt.bin',
data_root='./data/waymo/waymo_format',
metric='LET_mAP',
backend_args=backend_args)
metric='LET_mAP')
test_evaluator = val_evaluator
......@@ -35,7 +35,7 @@ model = dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[[-35.0, -75.0, -2, 75.0, 75.0, 4]],
rotations=[.0]),
bbox_head=dict(
bbox_head_3d=dict(
type='Anchor3DHead',
num_classes=3,
in_channels=256,
......@@ -43,13 +43,13 @@ model = dict(
use_direction_classifier=True,
anchor_generator=dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[[-35.0, -75.0, -0.0345, 75.0, 75.0, -0.0345],
[-35.0, -75.0, 0, 75.0, 75.0, 0],
[-35.0, -75.0, -0.1188, 75.0, 75.0, -0.1188]],
ranges=[[-35.0, -75.0, 0, 75.0, 75.0, 0],  # pedestrian
[-35.0, -75.0, -0.1188, 75.0, 75.0, -0.1188],  # cyclist
[-35.0, -75.0, -0.0345, 75.0, 75.0, -0.0345]],  # car
sizes=[
[4.73, 2.08, 1.77], # car
[0.91, 0.84, 1.74], # pedestrian
[1.81, 0.84, 1.77], # cyclist
[4.73, 2.08, 1.77], # car
],
rotations=[0, 1.57],
reshape_out=False),
......@@ -69,13 +69,6 @@ model = dict(
loss_weight=0.2)),
train_cfg=dict(
assigner=[
dict( # for Car
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.6,
neg_iou_thr=0.45,
min_pos_iou=0.45,
ignore_iof_thr=-1),
dict( # for Pedestrian
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
......@@ -90,6 +83,13 @@ model = dict(
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict( # for Car
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.6,
neg_iou_thr=0.45,
min_pos_iou=0.45,
ignore_iof_thr=-1)
],
allowed_border=0,
pos_weight=-1,
......@@ -100,5 +100,5 @@ model = dict(
nms_thr=0.05,
score_thr=0.001,
min_bbox_size=0,
nms_pre=500,
max_num=100))
nms_pre=4096,
max_num=500))
_base_ = ['./minkunet_w32_8xb2-15e_semantickitti.py']
_base_ = ['./minkunet18_w32_torchsparse_8xb2-amp-15e_semantickitti.py']
model = dict(
backbone=dict(
......
_base_ = ['./minkunet_w32_8xb2-15e_semantickitti.py']
_base_ = ['./minkunet18_w32_torchsparse_8xb2-amp-15e_semantickitti.py']
model = dict(
backbone=dict(
......
# MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones
> [MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones](https://arxiv.org/abs/2207.12716)
<!-- [ALGORITHM] -->
## Abstract
In this technical report, we present our solution, dubbed MV-FCOS3D++, for the Camera-Only 3D Detection track in Waymo Open Dataset Challenge 2022. For multi-view camera-only 3D detection, methods based on bird-eye-view or 3D geometric representations can leverage the stereo cues from overlapped regions between adjacent views and directly perform 3D detection without hand-crafted post-processing. However, it lacks direct semantic supervision for 2D backbones, which can be complemented by pretraining simple monocular-based detectors. Our solution is a multi-view framework for 4D detection following this paradigm. It is built upon a simple monocular detector FCOS3D++, pretrained only with object annotations of Waymo, and converts multi-view features to a 3D grid space to detect 3D objects thereon. A dual-path neck for single-frame understanding and temporal stereo matching is devised to incorporate multi-frame information. Our method finally achieves 49.75% mAPL with a single model and wins 2nd place in the WOD challenge, without any LiDAR-based depth supervision during training. The code will be released at [this https URL](https://github.com/Tai-Wang/Depth-from-Motion).
<div align=center>
<img src="https://github.com/open-mmlab/mmdetection3d/assets/72679458/9313eb3c-cc41-40be-9ead-549b3b5fef44" width="800"/>
</div>
## Introduction
We implement multi-view FCOS3D++ and provide its results on the Waymo dataset.
## Usage
### Training commands
1. You should train PGD first:
```bash
bash tools/dist_train.sh configs/pgd/pgd_r101_fpn_gn-head_dcn_8xb3-2x_waymoD3-mv-mono3d.py 8
```
2. Given the pretrained PGD backbone, you can train multi-view FCOS3D++:
```bash
bash tools/dist_train.sh configs/mvfcos3d/multiview-fcos3d_r101-dcn_8xb2_waymoD5-3d-3class.py 8 --cfg-options load_from=${PRETRAINED_CHECKPOINT}
```
**Note**:
the `load_from` path needs to be changed to your own checkpoint accordingly.
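Alternatively, a small user config can bake in the checkpoint path instead of passing `--cfg-options` on the command line (a sketch; the file name and checkpoint path below are placeholders, assuming the config is placed next to the MV-FCOS3D++ config):

```python
# my_mvfcos3d_cfg.py -- hypothetical user config
_base_ = ['./multiview-fcos3d_r101-dcn_8xb2_waymoD5-3d-3class.py']
# Path to your pretrained PGD checkpoint (placeholder)
load_from = 'work_dirs/pgd/epoch_24.pth'
```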
## Results and models
### Waymo
| Backbone | Load Interval | mAPL | mAP | mAPH | Download |
| :--------------------------------------------------------------------: | :-----------: | :--: | :--: | :--: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [ResNet101+DCN](./multiview-fcos3d_r101-dcn_8xb2_waymoD5-3d-3class.py) | 5x | 38.2 | 52.9 | 49.5 | [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/mvfcos3d/multiview-fcos3d_r101-dcn_8xb2_waymoD5-3d-3class/multiview-fcos3d_r101-dcn_8xb2_waymoD5-3d-3class_20231127_122815.log) |
| above @ Car | | 56.5 | 73.3 | 72.3 | |
| above @ Pedestrian | | 34.8 | 49.5 | 43.1 | |
| above @ Cyclist | | 23.2 | 35.9 | 33.3 | |
**Note**:
Regrettably, we are unable to provide the pre-trained model weights due to the [Waymo Dataset License Agreement](https://waymo.com/open/terms/), so we only provide the training logs shown above.
## Citation
```latex
@article{wang2022mvfcos3d++,
title={{MV-FCOS3D++: Multi-View} Camera-Only 4D Object Detection with Pretrained Monocular Backbones},
author={Wang, Tai and Lian, Qing and Zhu, Chenming and Zhu, Xinge and Zhang, Wenwei},
journal={arXiv preprint},
year={2022}
}
```
......@@ -50,6 +50,23 @@ Note: mAP represents Car moderate 3D strict AP11 / AP40 results. Because of the
| [above w/ finetune](./pgd_r101-caffe_fpn_head-gn_16xb2-2x_nus-mono3d_finetune.py) | 2x | 9.20 | 35.8 | 42.5 | [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pgd/pgd_r101_caffe_fpn_gn-head_2x16_2x_nus-mono3d_finetune/pgd_r101_caffe_fpn_gn-head_2x16_2x_nus-mono3d_finetune_20211114_162135-5ec7c1cd.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pgd/pgd_r101_caffe_fpn_gn-head_2x16_2x_nus-mono3d_finetune/pgd_r101_caffe_fpn_gn-head_2x16_2x_nus-mono3d_finetune_20211114_162135.log.json) |
| above w/ tta | 2x | 9.20 | 36.8 | 43.1 | |
### Waymo
| Backbone | Load Interval | Camera view | mAPL | mAP | mAPH | Download |
| :--------------------------------------------------------------------------: | :-----------: | :-----------: | :--: | :--: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [ResNet101 w/ DCN](./pgd_r101_fpn_gn-head_dcn_8xb3-2x_waymoD3-fov-mono3d.py) | 3x | front-of-view | 15.8 | 22.7 | 21.51 | [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/pgd/pgd_r101_fpn_gn-head_dcn_8xb3-2x_waymoD3-fov-mono3d/pgd_r101_fpn_gn-head_dcn_8xb3-2x_waymoD3-fov-mono3d_20231107_164117.log) |
| above @ Car | | | 36.7 | 51.6 | 51.0 | |
| above @ Pedestrian | | | 9.0 | 14.1 | 11.4 | |
| above @ Cyclist | | | 1.6 | 2.5 | 2.2 | |
| [ResNet101 w/ DCN](./pgd_r101_fpn_gn-head_dcn_8xb3-2x_waymoD3-mv-mono3d.py) | 3x | multi-view | 20.8 | 29.3 | 27.7 | [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/pgd/pgd_r101_fpn_gn-head_dcn_8xb3-2x_waymoD3-mv-mono3d/pgd_r101_fpn_gn-head_dcn_8xb3-2x_waymoD3-mv-mono3d_20231120_202732.log) |
| above @ Car | | | 41.2 | 56.1 | 55.2 | |
| above @ Pedestrian | | | 20.0 | 29.6 | 25.8 | |
| above @ Cyclist | | | 1.4 | 2.2 | 2.0 | |
**Note**:
Regrettably, we are unable to provide the pre-trained model weights due to the [Waymo Dataset License Agreement](https://waymo.com/open/terms/), so we only provide the training logs shown above.
## Citation
```latex
......
......@@ -68,9 +68,9 @@ model = dict(
type='PGDBBoxCoder',
base_depths=((41.01, 18.44), ),
base_dims=(
(4.73, 1.77, 2.08),
(0.91, 1.74, 0.84),
(1.81, 1.77, 0.84),
(4.73, 1.77, 2.08), # Car
(0.91, 1.74, 0.84), # Pedestrian
(1.81, 1.77, 0.84), # Cyclist
),
code_size=7)),
# set weight 1.0 for base 7 dims (offset, depth, size, rot)
......
_base_ = [
'../_base_/datasets/waymoD3-fov-mono3d-3class.py',
'../_base_/models/pgd.py', '../_base_/schedules/mmdet-schedule-1x.py',
'../_base_/default_runtime.py'
]
# load_from = '../Depth-from-Motion/checkpoints/pgd_init.pth'
# model settings
model = dict(
backbone=dict(
type='mmdet.ResNet',
depth=101,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'),
dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
stage_with_dcn=(False, False, True, True)),
neck=dict(num_outs=3),
bbox_head=dict(
num_classes=3,
bbox_code_size=7,
pred_attrs=False,
pred_velo=False,
pred_bbox2d=True,
use_onlyreg_proj=True,
strides=(8, 16, 32),
regress_ranges=((-1, 128), (128, 256), (256, 1e8)),
group_reg_dims=(2, 1, 3, 1, 16,
4), # offset, depth, size, rot, kpts, bbox2d
reg_branch=(
(256, ), # offset
(256, ), # depth
(256, ), # size
(256, ), # rot
(256, ), # kpts
(256, ) # bbox2d
),
centerness_branch=(256, ),
loss_cls=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(
type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
loss_dir=dict(
type='mmdet.CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_centerness=dict(
type='mmdet.CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
use_depth_classifier=True,
depth_branch=(256, ),
depth_range=(0, 50),
depth_unit=10,
division='uniform',
depth_bins=6,
pred_keypoints=True,
weight_dim=1,
loss_depth=dict(
type='UncertainSmoothL1Loss', alpha=1.0, beta=3.0,
loss_weight=1.0),
loss_bbox2d=dict(
type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=0.0),
loss_consistency=dict(type='mmdet.GIoULoss', loss_weight=0.0),
bbox_coder=dict(
type='PGDBBoxCoder',
base_depths=((41.01, 18.44), ),
base_dims=(
(0.91, 1.74, 0.84), # Pedestrian
(1.81, 1.77, 0.84), # Cyclist
(4.73, 1.77, 2.08)), # Car
code_size=7)),
# set weight 1.0 for base 7 dims (offset, depth, size, rot)
# 0.2 for 16-dim keypoint offsets and 1.0 for 4-dim 2D distance targets
train_cfg=dict(code_weight=[
1.0, 1.0, 0.2, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2,
0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 1.0, 1.0, 1.0, 1.0
]),
test_cfg=dict(nms_pre=100, nms_thr=0.05, score_thr=0.001, max_per_img=20))
# optimizer
optim_wrapper = dict(
optimizer=dict(
type='SGD',
lr=0.008,
),
paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.),
clip_grad=dict(max_norm=35, norm_type=2))
param_scheduler = [
dict(
type='LinearLR',
start_factor=1.0 / 3,
by_epoch=False,
begin=0,
end=500),
dict(
type='MultiStepLR',
begin=0,
end=24,
by_epoch=True,
milestones=[16, 22],
gamma=0.1)
]
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=24, val_interval=24)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
auto_scale_lr = dict(enable=False, base_batch_size=48)
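# Quick sanity-check sketch (an assumption-laden example: it presumes an
# editable install of mmdetection3d and that this file lives at the path
# used in the README above):
#   from mmengine.config import Config
#   cfg = Config.fromfile(
#       'configs/pgd/pgd_r101_fpn_gn-head_dcn_8xb3-2x_waymoD3-fov-mono3d.py')
#   assert cfg.model.bbox_head.num_classes == 3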
_base_ = [
'../_base_/datasets/waymoD3-mv-mono3d-3class.py',
'../_base_/models/pgd.py', '../_base_/schedules/mmdet-schedule-1x.py',
'../_base_/default_runtime.py'
]
# model settings
model = dict(
backbone=dict(
type='mmdet.ResNet',
depth=101,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'),
dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
stage_with_dcn=(False, False, True, True)),
neck=dict(num_outs=3),
bbox_head=dict(
num_classes=3,
bbox_code_size=7,
pred_attrs=False,
pred_velo=False,
pred_bbox2d=True,
use_onlyreg_proj=True,
strides=(8, 16, 32),
regress_ranges=((-1, 128), (128, 256), (256, 1e8)),
group_reg_dims=(2, 1, 3, 1, 16,
4), # offset, depth, size, rot, kpts, bbox2d
reg_branch=(
(256, ), # offset
(256, ), # depth
(256, ), # size
(256, ), # rot
(256, ), # kpts
(256, ) # bbox2d
),
centerness_branch=(256, ),
loss_cls=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(
type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
loss_dir=dict(
type='mmdet.CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_centerness=dict(
type='mmdet.CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
use_depth_classifier=True,
depth_branch=(256, ),
depth_range=(0, 50),
depth_unit=10,
division='uniform',
depth_bins=6,
pred_keypoints=True,
weight_dim=1,
loss_depth=dict(
type='UncertainSmoothL1Loss', alpha=1.0, beta=3.0,
loss_weight=1.0),
loss_bbox2d=dict(
type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=0.0),
loss_consistency=dict(type='mmdet.GIoULoss', loss_weight=0.0),
bbox_coder=dict(
type='PGDBBoxCoder',
base_depths=((41.01, 18.44), ),
base_dims=(
(0.91, 1.74, 0.84), # Pedestrian
(1.81, 1.77, 0.84), # Cyclist
(4.73, 1.77, 2.08)), # Car
code_size=7)),
# set weight 1.0 for base 7 dims (offset, depth, size, rot)
# 0.2 for 16-dim keypoint offsets and 1.0 for 4-dim 2D distance targets
train_cfg=dict(code_weight=[
1.0, 1.0, 0.2, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2,
0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 1.0, 1.0, 1.0, 1.0
]),
test_cfg=dict(nms_pre=100, nms_thr=0.05, score_thr=0.001, max_per_img=20))
# optimizer
optim_wrapper = dict(
optimizer=dict(
type='SGD',
lr=0.008,
),
paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.),
clip_grad=dict(max_norm=35, norm_type=2))
param_scheduler = [
dict(
type='LinearLR',
start_factor=1.0 / 3,
by_epoch=False,
begin=0,
end=500),
dict(
type='MultiStepLR',
begin=0,
end=24,
by_epoch=True,
milestones=[16, 22],
gamma=0.1)
]
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=24, val_interval=24)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
auto_scale_lr = dict(enable=False, base_batch_size=48)
_base_ = ['./spvcnn_w32_8xb2-15e_semantickitti.py']
_base_ = ['./spvcnn_w32_8xb2-amp-15e_semantickitti.py']
model = dict(
backbone=dict(
......