Commit 7aa442d5 authored by raojy's avatar raojy
Browse files

raw_mmdetection

parent 9c03eaa8
# FAQ
We list some potential troubles encountered by users and developers, along with their corresponding solutions. Feel free to enrich the list if you find any frequent issues and contribute your solutions to solve them. If you have any trouble with environment configuration, model training, etc, please create an issue using the [provided templates](https://github.com/open-mmlab/mmdetection3d/blob/master/.github/ISSUE_TEMPLATE/error-report.md) and fill in all required information in the template.
## MMEngine/MMCV/MMDet/MMDet3D Installation
- Compatibility issue between MMEngine, MMCV, MMDetection and MMDetection3D; "ConvWS is already registered in conv layer"; "AssertionError: MMCV==xxx is used but incompatible. Please install mmcv>=xxx, \<=xxx."
- The required versions of MMEngine, MMCV and MMDetection for different versions of MMDetection3D are as below. Please install the correct version of MMEngine, MMCV and MMDetection to avoid installation issues.
| MMDetection3D version | MMEngine version | MMCV version | MMDetection version |
| --------------------- | :----------------------: | :---------------------: | :----------------------: |
| main | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.2.0 | mmdet>=3.0.0rc5, \<3.4.0 |
| v1.4.0 | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.2.0 | mmdet>=3.0.0rc5, \<3.4.0 |
| v1.3.0 | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.2.0 | mmdet>=3.0.0rc5, \<3.3.0 |
| v1.2.0 | mmengine>=0.8.0, \<1.0.0 | mmcv>=2.0.0rc4, \<2.1.0 | mmdet>=3.0.0, \<3.2.0 |
| v1.1.1 | mmengine>=0.7.1, \<1.0.0 | mmcv>=2.0.0rc4, \<2.1.0 | mmdet>=3.0.0, \<3.1.0 |
**Note:** If you want to install mmdet3d-v1.0.0rcx, the compatible MMDetection, MMSegmentation and MMCV versions table can be found at [here](https://mmdetection3d.readthedocs.io/en/latest/faq.html#mmcv-mmdet-mmdet3d-installation). Please choose the correct version of MMCV, MMDetection and MMSegmentation to avoid installation issues.
- If you faced the error shown below when importing open3d:
`OSError: /lib/x86_64-linux-gnu/libm.so.6: version 'GLIBC_2.27' not found`
please downgrade open3d to 0.9.0.0, because the latest open3d needs the support of file 'GLIBC_2.27', which only exists in Ubuntu 18.04, not in Ubuntu 16.04.
- If you faced the error when importing pycocotools, this is because nuscenes-devkit installs pycocotools but mmdet relies on mmpycocotools. The current workaround is as below. We will migrate to use pycocotools in the future.
```shell
pip uninstall pycocotools mmpycocotools
pip install mmpycocotools
```
**NOTE**: We have migrated to use pycocotools in mmdet3d >= 0.13.0.
- If you face the error shown below when importing pycocotools:
`ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject`
please downgrade pycocotools to 2.0.1 because of the incompatibility between the newest pycocotools and numpy \< 1.20.0. Or you can compile and install the latest pycocotools from source as below:
`pip install -e "git+https://github.com/cocodataset/cocoapi#egg=pycocotools&subdirectory=PythonAPI"`
or
`pip install -e "git+https://github.com/ppwwyyxx/cocoapi#egg=pycocotools&subdirectory=PythonAPI"`
- If you face some errors about numba in cuda-9.0 environment, you should check the version of numba. In cuda-9.0 environment, the high version of numba is not supported and we suggest you could install numba==0.53.0.
## How to annotate point cloud?
MMDetection3D does not support point cloud annotation. Some open-source annotation tool are offered for reference:
- [SUSTechPOINTS](https://github.com/naurril/SUSTechPOINTS)
- [LATTE](https://github.com/bernwang/latte)
Besides, we improved [LATTE](https://github.com/bernwang/latte) for better use. More details can be found [here](https://arxiv.org/abs/2011.10174).
.. toctree::
:maxdepth: 1
benchmarks.md
changelog_v1.0.x.md
changelog.md
compatibility.md
faq.md
contribution_guides.md
#!/usr/bin/env python
import functools as func
import glob
import re
from os import path as osp
import numpy as np
url_prefix = 'https://github.com/open-mmlab/mmdetection3d/blob/main'
files = sorted(glob.glob('../../configs/*/README.md'))
stats = []
titles = []
num_ckpts = 0
for f in files:
url = osp.dirname(f.replace('../../', url_prefix))
with open(f, 'r') as content_file:
content = content_file.read()
title = content.split('\n')[0].replace('# ', '').strip()
ckpts = set(x.lower().strip()
for x in re.findall(r'\[model\]\((https?.*)\)', content))
if len(ckpts) == 0:
continue
_papertype = [x for x in re.findall(r'\[([A-Z]+)\]', content)]
assert len(_papertype) > 0
papertype = _papertype[0]
paper = set([(papertype, title)])
titles.append(title)
num_ckpts += len(ckpts)
statsmsg = f"""
\t* [{papertype}] [{title}]({url}) ({len(ckpts)} ckpts)
"""
stats.append((paper, ckpts, statsmsg))
allpapers = func.reduce(lambda a, b: a.union(b), [p for p, _, _ in stats])
msglist = '\n'.join(x for _, _, x in stats)
papertypes, papercounts = np.unique([t for t, _ in allpapers],
return_counts=True)
countstr = '\n'.join(
[f' - {t}: {c}' for t, c in zip(papertypes, papercounts)])
modelzoo = f"""
# Model Zoo Statistics
* Number of papers: {len(set(titles))}
{countstr}
* Number of checkpoints: {num_ckpts}
{msglist}
"""
with open('modelzoo_statistics.md', 'w') as f:
f.write(modelzoo)
## <a href='https://mmdetection3d.readthedocs.io/en/latest/'>English</a>
## <a href='https://mmdetection3d.readthedocs.io/zh_CN/latest/'>简体中文</a>
# Backends Support
We support different file client backends: Disk, Ceph and LMDB, etc. Here is an example of how to modify configs for Ceph-based data loading and saving.
## Load data and annotations from Ceph
We support loading data and generated annotation info files (pkl and json) from Ceph:
```python
# set file client backends as Ceph
backend_args = dict(
backend='petrel',
path_mapping=dict({
'./data/nuscenes/':
's3://openmmlab/datasets/detection3d/nuscenes/', # replace the path with your data path on Ceph
'data/nuscenes/':
's3://openmmlab/datasets/detection3d/nuscenes/' # replace the path with your data path on Ceph
}))
db_sampler = dict(
data_root=data_root,
info_path=data_root + 'kitti_dbinfos_train.pkl',
rate=1.0,
prepare=dict(filter_by_difficulty=[-1], filter_by_min_points=dict(Car=5)),
sample_groups=dict(Car=15),
classes=class_names,
# set file client for points loader to load training data
points_loader=dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4,
use_dim=4,
backend_args=backend_args),
# set file client for data base sampler to load db info file
backend_args=backend_args)
train_pipeline = [
# set file client for loading training data
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4, backend_args=backend_args),
# set file client for loading training data annotations
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, backend_args=backend_args),
dict(type='ObjectSample', db_sampler=db_sampler),
dict(
type='ObjectNoise',
num_try=100,
translation_std=[0.25, 0.25, 0.25],
global_rot_range=[0.0, 0.0],
rot_range=[-0.15707963267, 0.15707963267]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
# set file client for loading validation/testing data
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4, backend_args=backend_args),
dict(
type='MultiScaleFlipAug3D',
img_scale=(1333, 800),
pts_scale_ratio=1,
flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(
type='DefaultFormatBundle3D',
class_names=class_names,
with_label=False),
dict(type='Collect3D', keys=['points'])
])
]
data = dict(
# set file client for loading training info files (.pkl)
train=dict(
type='RepeatDataset',
times=2,
dataset=dict(pipeline=train_pipeline, classes=class_names, backend_args=backend_args)),
# set file client for loading validation info files (.pkl)
val=dict(pipeline=test_pipeline, classes=class_names,backend_args=backend_args),
# set file client for loading testing info files (.pkl)
test=dict(pipeline=test_pipeline, classes=class_names, backend_args=backend_args))
```
## Load pretrained model from Ceph
```python
model = dict(
pts_backbone=dict(
_delete_=True,
type='NoStemRegNet',
arch='regnetx_1.6gf',
init_cfg=dict(
type='Pretrained', checkpoint='s3://openmmlab/checkpoints/mmdetection3d/regnetx_1.6gf'), # replace the path with your pretrained model path on Ceph
...
```
## Load checkpoint from Ceph
```python
# replace the path with your checkpoint path on Ceph
load_from = 's3://openmmlab/checkpoints/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20200620_230614-77663cd6.pth.pth'
resume_from = None
workflow = [('train', 1)]
```
## Save checkpoint into Ceph
```python
# checkpoint saving
# replace the path with your checkpoint saving path on Ceph
checkpoint_config = dict(interval=1, max_keep_ckpts=2, out_dir='s3://openmmlab/mmdetection3d')
```
## EvalHook saves the best checkpoint into Ceph
```python
# replace the path with your checkpoint saving path on Ceph
evaluation = dict(interval=1, save_best='bbox', out_dir='s3://openmmlab/mmdetection3d')
```
## Save the training log into Ceph
The training log will be backed up to the specified Ceph path after training.
```python
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook', out_dir='s3://openmmlab/mmdetection3d'),
])
```
You can also delete the local training log after backing up to the specified Ceph path by setting `keep_local = False`.
```python
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook', out_dir='s3://openmmlab/mmdetection3d', keep_local=False),
])
```
# Learn about Configs
MMDetection3D and other OpenMMLab repositories use [MMEngine's config system](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html). It has a modular and inheritance design, which is convenient to conduct various experiments.
## Config file content
MMDetection3D uses a modular design, all modules with different functions can be configured through the config. Taking PointPillars as an example, we will introduce each field in the config according to different function modules.
### Model config
In MMDetection3D's config, we use `model` to setup detection algorithm components. In addition to neural network components such as `voxel_encoder`, `backbone` etc, it also requires `data_preprocessor`, `train_cfg`, and `test_cfg`. `data_preprocessor` is responsible for processing a batch of data output by dataloader. `train_cfg` and `test_cfg` in the model config are training and testing hyperparameters of the components.
```python
model = dict(
type='VoxelNet',
data_preprocessor=dict(
type='Det3DDataPreprocessor',
voxel=True,
voxel_layer=dict(
max_num_points=32,
point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1],
voxel_size=[0.16, 0.16, 4],
max_voxels=(16000, 40000))),
voxel_encoder=dict(
type='PillarFeatureNet',
in_channels=4,
feat_channels=[64],
with_distance=False,
voxel_size=[0.16, 0.16, 4],
point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
middle_encoder=dict(
type='PointPillarsScatter', in_channels=64, output_shape=[496, 432]),
backbone=dict(
type='SECOND',
in_channels=64,
layer_nums=[3, 5, 5],
layer_strides=[2, 2, 2],
out_channels=[64, 128, 256]),
neck=dict(
type='SECONDFPN',
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
bbox_head=dict(
type='Anchor3DHead',
num_classes=3,
in_channels=384,
feat_channels=384,
use_direction_classifier=True,
assign_per_class=True,
anchor_generator=dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -1.78, 69.12, 39.68, -1.78]],
sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
rotations=[0, 1.57],
reshape_out=False),
diff_rad_by_sin=True,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
loss_cls=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(
type='mmdet.SmoothL1Loss',
beta=0.1111111111111111,
loss_weight=2.0),
loss_dir=dict(
type='mmdet.CrossEntropyLoss', use_sigmoid=False,
loss_weight=0.2)),
train_cfg=dict(
assigner=[
dict(
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict(
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict(
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.6,
neg_iou_thr=0.45,
min_pos_iou=0.45,
ignore_iof_thr=-1)
],
allowed_border=0,
pos_weight=-1,
debug=False),
test_cfg=dict(
use_rotate_nms=True,
nms_across_levels=False,
nms_thr=0.01,
score_thr=0.1,
min_bbox_size=0,
nms_pre=100,
max_num=50))
```
### Dataset and evaluator config
[Dataloaders](https://pytorch.org/docs/stable/data.html?highlight=data%20loader#torch.utils.data.DataLoader) are required for the training, validation, and testing of the [runner](https://mmengine.readthedocs.io/en/latest/tutorials/runner.html). Dataset and data pipeline need to be set to build the dataloader. Due to the complexity of this part, we use intermediate variables to simplify the writing of dataloader configs.
```python
dataset_type = 'KittiDataset'
data_root = 'data/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]
input_modality = dict(use_lidar=True, use_camera=False)
metainfo = dict(classes=class_names)
db_sampler = dict(
data_root=data_root,
info_path=data_root + 'kitti_dbinfos_train.pkl',
rate=1.0,
prepare=dict(
filter_by_difficulty=[-1],
filter_by_min_points=dict(Car=5, Pedestrian=5, Cyclist=5)),
classes=class_names,
sample_groups=dict(Car=15, Pedestrian=15, Cyclist=15),
points_loader=dict(
type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4))
train_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(type='ObjectSample', db_sampler=db_sampler, use_ground_plane=True),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_labels_3d', 'gt_bboxes_3d'])
]
test_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(
type='MultiScaleFlipAug3D',
img_scale=(1333, 800),
pts_scale_ratio=1,
flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range)
]),
dict(type='Pack3DDetInputs', keys=['points'])
]
eval_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(type='Pack3DDetInputs', keys=['points'])
]
train_dataloader = dict(
batch_size=6,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type='RepeatDataset',
times=2,
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='kitti_infos_train.pkl',
data_prefix=dict(pts='training/velodyne_reduced'),
pipeline=train_pipeline,
modality=input_modality,
test_mode=False,
metainfo=metainfo,
box_type_3d='LiDAR')))
val_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='training/velodyne_reduced'),
ann_file='kitti_infos_val.pkl',
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
test_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='training/velodyne_reduced'),
ann_file='kitti_infos_val.pkl',
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
```
[Evaluators](https://mmengine.readthedocs.io/en/latest/tutorials/evaluation.html) are used to compute the metrics of the trained model on the validation and testing datasets. The config of evaluators consists of one or a list of metric configs:
```python
val_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'kitti_infos_val.pkl',
metric='bbox')
test_evaluator = val_evaluator
```
Since the test dataset has no annotation files, the test_dataloader and test_evaluator config in MMDetection3D are generally equal to the val's. If you want to save the detection results on the test dataset, you can write the config like this:
```python
# inference on test dataset and
# format the output results for submission.
test_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='testing/velodyne_reduced'),
ann_file='kitti_infos_test.pkl',
load_eval_anns=False,
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
test_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'kitti_infos_test.pkl',
metric='bbox',
format_only=True,
submission_prefix='results/kitti-3class/kitti_results')
```
### Training and testing config
MMEngine's runner uses Loop to control the training, validation, and testing processes.
Users can set the maximum training epochs and validation intervals with these fields:
```python
train_cfg = dict(
type='EpochBasedTrainLoop',
max_epochs=80,
val_interval=2)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
```
### Optimization config
`optim_wrapper` is the field to configure optimization-related settings. The optimizer wrapper not only provides the functions of the optimizer, but also supports functions such as gradient clipping, mixed precision training, etc. Find more in [optimizer wrapper tutorial](https://mmengine.readthedocs.io/en/latest/tutorials/optim_wrapper.html).
```python
optim_wrapper = dict( # Optimizer wrapper config
type='OptimWrapper', # Optimizer wrapper type, switch to AmpOptimWrapper to enable mixed precision training.
optimizer=dict( # Optimizer config. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms
type='AdamW', lr=0.001, betas=(0.95, 0.99), weight_decay=0.01),
clip_grad=dict(max_norm=35, norm_type=2)) # Gradient clip option. Set None to disable gradient clip. Find usage in https://mmengine.readthedocs.io/en/latest/tutorials/optim_wrapper.html
```
`param_scheduler` is a field that configures methods of adjusting optimization hyperparameters such as learning rate and momentum. Users can combine multiple schedulers to create a desired parameter adjustment strategy. Find more in [parameter scheduler tutorial](https://mmengine.readthedocs.io/en/latest/tutorials/param_scheduler.html) and [parameter scheduler API documents](https://mmengine.readthedocs.io/en/latest/api/optim.html#scheduler).
```python
param_scheduler = [
dict(
type='CosineAnnealingLR',
T_max=32,
eta_min=0.01,
begin=0,
end=32,
by_epoch=True,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
T_max=48,
eta_min=1.0000000000000001e-07,
begin=32,
end=80,
by_epoch=True,
convert_to_iter_based=True),
dict(
type='CosineAnnealingMomentum',
T_max=32,
eta_min=0.8947368421052632,
begin=0,
end=32,
by_epoch=True,
convert_to_iter_based=True),
dict(
type='CosineAnnealingMomentum',
T_max=48,
eta_min=1,
begin=32,
end=80,
by_epoch=True,
convert_to_iter_based=True),
]
```
### Hook config
Users can attach Hooks to training, validation, and testing loops to insert some operations during running. There are two different hook fields, one is `default_hooks` and the other is `custom_hooks`.
`default_hooks` is a dict of hook configs, and they are the hooks must be required at the runtime. They have default priority which should not be modified. If not set, runner will use the default values. To disable a default hook, users can set its config to `None`.
```python
default_hooks = dict(
timer=dict(type='IterTimerHook'),
logger=dict(type='LoggerHook', interval=50),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', interval=-1),
sampler_seed=dict(type='DistSamplerSeedHook'),
visualization=dict(type='Det3DVisualizationHook'))
```
`custom_hooks` is a list of all other hook configs. Users can develop their own hooks and insert them in this field.
```python
custom_hooks = []
```
### Runtime config
```python
default_scope = 'mmdet3d' # The default registry scope to find modules. Refer to https://mmengine.readthedocs.io/en/latest/advanced_tutorials/registry.html
env_cfg = dict(
cudnn_benchmark=False, # Whether to enable cudnn benchmark
mp_cfg=dict( # Multi-processing config
mp_start_method='fork', # Use fork to start multi-processing threads. 'fork' usually faster than 'spawn' but maybe unsafe. See discussion in https://github.com/pytorch/pytorch/issues/1355
opencv_num_threads=0), # Disable opencv multi-threads to avoid system being overloaded
dist_cfg=dict(backend='nccl')) # Distribution configs
vis_backends = [dict(type='LocalVisBackend')] # Visualization backends. Refer to https://mmengine.readthedocs.io/en/latest/advanced_tutorials/visualization.html
visualizer = dict(
type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
log_processor = dict(
type='LogProcessor', # Log processor to process runtime logs
window_size=50, # Smooth interval of log values
by_epoch=True) # Whether to format logs with epoch type. Should be consistent with the train loop's type.
log_level = 'INFO' # The level of logging.
load_from = None # Load model checkpoint as a pre-trained model from a given path. This will not resume training.
resume = False # Whether to resume from the checkpoint defined in `load_from`. If `load_from` is None, it will resume the latest checkpoint in the `work_dir`.
```
## Config file inheritance
There are 4 basic component types under `configs/_base_`, dataset, model, schedule, default_runtime.
Many methods could be easily constructed with one of these models like SECOND, PointPillars, PartA2, VoteNet.
The configs that are composed of components from `_base_` are called _primitive_.
For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum of inheritance level is 3.
For easy understanding, we recommend contributors to inherit from existing methods.
For example, if some modification is made based on PointPillars, users may first inherit the basic PointPillars structure by specifying `_base_ = '../pointpillars/pointpillars_hv_fpn_sbn-all_8xb4-2x_nus-3d.py'`, then modify the necessary fields in the config files.
If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder `xxx_rcnn` under `configs`.
Please refer to [MMEngine config tutorial](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html) for detailed documentation.
By setting the `_base_` field, we can set which files the current configuration file inherits from.
When `_base_` is a string of a file path, it means inheriting the contents from one config file.
```python
_base_ = './pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py'
```
When `_base_` is a list of multiple file paths, it means inheriting from multiple files.
```python
_base_ = [
'../_base_/models/pointpillars_hv_secfpn_kitti.py',
'../_base_/datasets/kitti-3d-3class.py',
'../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
]
```
If you wish to inspect the config file, you may run `python tools/misc/print_config.py /PATH/TO/CONFIG` to see the complete config.
### Ignore some fields in the base configs
Sometimes, you may set `_delete_=True` to ignore some of the fields in base configs.
You may refer to [MMEngine config tutorial](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html) for a simple illustration.
In MMDetection3D, for example, to change the neck of PointPillars with the following config:
```python
model = dict(
type='MVXFasterRCNN',
data_preprocessor=dict(voxel_layer=dict(...)),
pts_voxel_encoder=dict(...),
pts_middle_encoder=dict(...),
pts_backbone=dict(...),
pts_neck=dict(
type='FPN',
norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
act_cfg=dict(type='ReLU'),
in_channels=[64, 128, 256],
out_channels=256,
start_level=0,
num_outs=3),
pts_bbox_head=dict(...))
```
`FPN` and `SECONDFPN` use different keywords to construct:
```python
_base_ = '../_base_/models/pointpillars_hv_fpn_nus.py'
model = dict(
pts_neck=dict(
_delete_=True,
type='SECONDFPN',
norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
pts_bbox_head=dict(...))
```
The `_delete_=True` would replace all old keys in `pts_neck` field with new keys.
### Use intermediate variables in configs
Some intermediate variables are used in the configs files, like `train_pipeline`/`test_pipeline` in datasets.
It's worth noting that when modifying intermediate variables in the children configs, user needs to pass the intermediate variables into corresponding fields again.
For example, we would like to use a multi-scale strategy to train and test a PointPillars, `train_pipeline`/`test_pipeline` are intermediate variables we would like to modify.
```python
_base_ = './nus-3d.py'
train_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
backend_args=backend_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
backend_args=backend_args),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_labels_3d', 'gt_bboxes_3d'])
]
test_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
backend_args=backend_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
backend_args=backend_args),
dict(
type='MultiScaleFlipAug3D',
img_scale=(1333, 800),
pts_scale_ratio=[0.95, 1.0, 1.05],
flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range)
]),
dict(type='Pack3DDetInputs', keys=['points'])
]
train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
val_dataloader = dict(dataset=dict(pipeline=test_pipeline))
test_dataloader = dict(dataset=dict(pipeline=test_pipeline))
```
We first define the new `train_pipeline`/`test_pipeline` and pass them into dataloader fields.
### Reuse variables in \_base\_ file
If the users want to reuse the variables in the base file, they can get a copy of the corresponding variable by using `{{_base_.xxx}}`. E.g:
```python
_base_ = './pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py'
a = {{_base_.model}} # variable `a` is equal to the `model` defined in `_base_`
```
## Modify config through script arguments
When submitting jobs using `tools/train.py` or `tools/test.py`, you may specify `--cfg-options` to in-place modify the config.
- Update config keys of dict chains
The config options can be specified following the order of the dict keys in the original config.
For example, `--cfg-options model.backbone.norm_eval=False` changes the all BN modules in model backbones to `train` mode.
- Update keys inside a list of configs
Some config dicts are composed as a list in your config. For example, the training pipeline `train_dataloader.dataset.pipeline` is normally a list
e.g. `[dict(type='LoadPointsFromFile'), ...]`. If you want to change `'LoadPointsFromFile'` to `'LoadPointsFromDict'` in the pipeline,
you may specify `--cfg-options data.train.pipeline.0.type=LoadPointsFromDict`.
- Update values of list/tuple
If the value to be updated is a list or a tuple. For example, the config file normally sets `model.data_preprocessor.mean=[123.675, 116.28, 103.53]`. If you want to
change the mean values, you may specify `--cfg-options model.data_preprocessor.mean="[127,127,127]"`. Note that the quotation mark `"` is necessary to
support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
## Config Name Style
We follow the below style to name config files. Contributors are advised to follow the same style.
```
{algorithm name}_{model component names [component1]_[component2]_[...]}_{training settings}_{training dataset information}_{testing dataset information}.py
```
The file name is divided to five parts. All parts and components are connected with `_` and words of each part or component should be connected with `-`.
- `{algorithm name}`: The name of the algorithm. It can be a detector name such as `pointpillars`, `fcos3d`, etc.
- `{model component names}`: Names of the components used in the algorithm such as voxel_encoder, backbone, neck, etc. For example, `second_secfpn_head-dcn-circlenms` means using SECOND's SparseEncoder, SECONDFPN and a detection head with DCN and circle NMS.
- `{training settings}`: Information of training settings such as batch size, augmentations, loss trick, scheduler, and epochs/iterations. For example: `8xb4-tta-cyclic-20e` means using 8-gpus x 4-samples-per-gpu, test time augmentation, cyclic annealing learning rate, and train 20 epochs.
Some abbreviations:
- `{gpu x batch_per_gpu}`: GPUs and samples per GPU. `bN` indicates N batch size per GPU. E.g. `4xb4` is the short term of 4-GPUs x 4-samples-per-GPU.
- `{schedule}`: training schedule, options are `schedule-2x`, `schedule-3x`, `cyclic-20e`, etc.
`schedule-2x` and `schedule-3x` mean 24 epochs and 36 epochs respectively.
`cyclic-20e` means 20 epochs respectively.
- `{training dataset information}`: Training dataset names like `kitti-3d-3class`, `nus-3d`, `s3dis-seg`, `scannet-seg`, `waymoD5-3d-car`. Here `3d` means dataset used for 3D object detection, and `seg` means dataset used for point cloud segmentation.
- `{testing dataset information}` (optional): Testing dataset name for models trained on one dataset but tested on another. If not mentioned, it means the model was trained and tested on the same dataset type.
# Coordinate System
## Overview
MMDetection3D uses three different coordinate systems. The existence of different coordinate systems in the society of 3D object detection is necessary, because for various 3D data collection devices, such as LiDAR, depth camera, etc., the coordinate systems are not consistent, and different 3D datasets also follow different data formats. Early works, such as SECOND, VoteNet, convert the raw data to another format, forming conventions that some later works also follow, making the conversion between coordinate systems even more complicated.
Despite the variety of datasets and equipment, by summarizing the line of works on 3D object detection we can roughly categorize coordinate systems into three:
- Camera coordinate system -- the coordinate system of most cameras, in which the positive direction of the y-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the z-axis points to the front.
```
up z front
| ^
| /
| /
| /
|/
left ------ 0 ------> x right
|
|
|
|
v
y down
```
- LiDAR coordinate system -- the coordinate system of many LiDARs, in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the front, and the positive direction of the y-axis points to the left.
```
z up x front
^ ^
| /
| /
| /
|/
y left <------ 0 ------ right
```
- Depth coordinate system -- the coordinate system used by VoteNet, H3DNet, etc., in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the y-axis points to the front.
```
z up y front
^ ^
| /
| /
| /
|/
left ------ 0 ------> x right
```
The definition of coordinate systems in this tutorial is actually **more than just defining the three axes**. For a box in the form of $(x, y, z, dx, dy, dz, r)$, our coordinate systems also define how to interpret the box dimensions $(dx, dy, dz)$ and the yaw angle $r$.
The illustration of the three coordinate systems is shown below:
![](https://raw.githubusercontent.com/open-mmlab/mmdetection3d/master/resources/coord_sys_all.png)
The three figures above are the 3D coordinate systems while the three figures below are the bird's eye view.
We will stick to the three coordinate systems defined in this tutorial in the future.
## Definition of the yaw angle
Please refer to [wikipedia](https://en.wikipedia.org/wiki/Euler_angles#Tait%E2%80%93Bryan_angles) for the standard definition of the yaw angle. In object detection, we choose an axis as the gravity axis, and a reference direction on the plane $\\Pi$ perpendicular to the gravity axis, then the reference direction has a yaw angle of 0, and other directions on $\\Pi$ have non-zero yaw angles depending on its angle with the reference direction.
Currently, for all supported datasets, annotations do not include pitch angle and roll angle, which means we need only consider the yaw angle when predicting boxes and calculating overlap between boxes.
In MMDetection3D, all three coordinate systems are right-handed coordinate systems, which means the ascending direction of the yaw angle is counter-clockwise if viewed from the negative direction of the gravity axis (the axis is pointing at one's eyes).
The figure below shows that, in this right-handed coordinate system, if we set the positive direction of the x-axis as a reference direction, then the positive direction of the y-axis has a yaw angle of $\\frac{\\pi}{2}$.
```
z up y front (yaw=0.5*pi)
^ ^
| /
| /
| /
|/
left (yaw=pi) ------ 0 ------> x right (yaw=0)
```
For a box, the value of its yaw angle equals its direction minus a reference direction. In all three coordinate systems in MMDetection3D, the reference direction is always the positive direction of the x-axis, while the direction of a box is defined to be parallel with the x-axis if its yaw angle is 0. The definition of the yaw angle of a box is illustrated in the figure below.
```
y front
^ box direction (yaw=0.5*pi)
/|\ ^
| /|\
| ____|____
| | | |
| | | |
__|____|____|____|______\ x right
| | | | /
| | | |
| |____|____|
|
```
## Definition of the box dimensions
The definition of the box dimensions cannot be disentangled with the definition of the yaw angle. In the previous section, we said that the direction of a box is defined to be parallel with the x-axis if its yaw angle is 0. Then naturally, the dimension of a box which corresponds to the x-axis should be $dx$. However, this is not always the case in some datasets (we will address that later).
The following figures show the meaning of the correspondence between the x-axis and $dx$, and between the y-axis and $dy$.
```
y front
^ box direction (yaw=0.5*pi)
/|\ ^
| /|\
| ____|____
| | | |
| | | | dx
__|____|____|____|______\ x right
| | | | /
| | | |
| |____|____|
| dy
```
Note that the box direction is always parallel with the edge $dx$.
```
y front
^ _________
/|\ | | |
| | | |
| | | | dy
| |____|____|____\ box direction (yaw=0)
| | | | /
__|____|____|____|_________\ x right
| | | | /
| |____|____|
| dx
|
```
## Relation with raw coordinate systems of supported datasets
### KITTI
The raw annotation of KITTI is under camera coordinate system, see [get_label_anno](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/dataset_converters/kitti_data_utils.py). In MMDetection3D, to train LiDAR-based models on KITTI, the data is first converted from camera coordinate system to LiDAR coordinate system, see [get_ann_info](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/kitti_dataset.py). For training vision-based models, the data is kept in the camera coordinate system.
In SECOND, the LiDAR coordinate system for a box is defined as follows (a bird's eye view):
![](https://raw.githubusercontent.com/traveller59/second.pytorch/master/images/kittibox.png)
For each box, the dimensions are $(w, l, h)$, and the reference direction for the yaw angle is the positive direction of the y axis. For more details, refer to the [repo](https://github.com/traveller59/second.pytorch#concepts).
Our LiDAR coordinate system has two changes:
- The yaw angle is defined to be right-handed instead of left-handed for consistency;
- The box dimensions are $(l, w, h)$ instead of $(w, l, h)$, since $w$ corresponds to $dy$ and $l$ corresponds to $dx$ in KITTI.
### Waymo
We use the KITTI-format data of Waymo dataset. Therefore, KITTI and Waymo also share the same coordinate system in our implementation.
### NuScenes
NuScenes provides a toolkit for evaluation, in which each box is wrapped into a `Box` instance. The coordinate system of `Box` is different from our LiDAR coordinate system in that the first two elements of the box dimension correspond to $(dy, dx)$, or $(w, l)$, respectively, instead of the reverse. For more details, please refer to the NuScenes [tutorial](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/nuscenes_det.md#notes).
Readers may refer to the [NuScenes development kit](https://github.com/nutonomy/nuscenes-devkit/tree/master/python-sdk/nuscenes/eval/detection) for the definition of a [NuScenes box](https://github.com/nutonomy/nuscenes-devkit/blob/2c6a752319f23910d5f55cc995abc547a9e54142/python-sdk/nuscenes/utils/data_classes.py#L457) and implementation of [NuScenes evaluation](https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/eval/detection/evaluate.py).
### Lyft
Lyft shares the same data format with NuScenes as far as coordinate system is involved.
Please refer to the [official website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data) for more information.
### ScanNet
The raw data of ScanNet is not point cloud but mesh. The sampled point cloud data is under our depth coordinate system. For ScanNet detection task, the box annotations are axis-aligned, and the yaw angle is always zero. Therefore the direction of the yaw angle in our depth coordinate system makes no difference regarding ScanNet.
### SUN RGB-D
The raw data of SUN RGB-D is not point cloud but RGB-D image. By back projection, we obtain the corresponding point cloud for each image, which is under our Depth coordinate system. However, the annotation is not under our system and thus needs conversion.
For the conversion from raw annotation to annotation under our Depth coordinate system, please refer to [sunrgbd_data_utils.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/dataset_converters/sunrgbd_data_utils.py).
### S3DIS
S3DIS shares the same coordinate system as ScanNet in our implementation. However, S3DIS is a segmentation-task-only dataset, and thus no annotation is coordinate system sensitive.
## Examples
### Box conversion (between different coordinate systems)
Take the conversion between our Camera coordinate system and LiDAR coordinate system as an example:
First, for points and box centers, the coordinates before and after the conversion satisfy the following relationship:
- $x\_{LiDAR}=z\_{camera}$
- $y\_{LiDAR}=-x\_{camera}$
- $z\_{LiDAR}=-y\_{camera}$
Then, the box dimensions before and after the conversion satisfy the following relationship:
- $dx\_{LiDAR}=dx\_{camera}$
- $dy\_{LiDAR}=dz\_{camera}$
- $dz\_{LiDAR}=dy\_{camera}$
Finally, the yaw angle should also be converted:
- $r\_{LiDAR}=-\\frac{\\pi}{2}-r\_{camera}$
See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/box_3d_mode.py) for more details.
### Bird's Eye View
The BEV of a camera coordinate system box is $(x, z, dx, dz, -r)$ if the 3D box is $(x, y, z, dx, dy, dz, r)$. The inversion of the sign of the yaw angle is because the positive direction of the gravity axis of the Camera coordinate system points to the ground.
See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.
### Rotation of boxes
We set the rotation of all kinds of boxes to be counter-clockwise about the gravity axis. Therefore, to rotate a 3D box we first calculate the new box center, and then we add the rotation angle to the yaw angle.
See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.
## Common FAQ
#### Q1: Are the box related ops universal to all coordinate system types?
No. For example, [RoI-Aware Pooling ops](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/roiaware_pool3d.py) is applicable to boxes under Depth or LiDAR coordinate system only. The evaluation functions for KITTI dataset [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/kitti_utils) are only applicable to boxes under Camera coordinate system since the rotation is clockwise if viewed from above.
For each box related op, we have marked the type of boxes to which we can apply the op.
#### Q2: In every coordinate system, do the three axes point exactly to the right, the front, and the ground, respectively?
No. For example, in KITTI, we need a calibration matrix when converting from Camera coordinate system to LiDAR coordinate system.
#### Q3: How does a phase difference of $2\\pi$ in the yaw angle of a box affect evaluation?
For IoU calculation, a phase difference of $2\\pi$ in the yaw angle will result in the same box, thus not affecting evaluation.
For angle prediction evaluation such as the NDS metric in NuScenes and the AOS metric in KITTI, the angle of predicted boxes will be first standardized, so the phase difference of $2\\pi$ will not change the result.
#### Q4: How does a phase difference of $\\pi$ in the yaw angle of a box affect evaluation?
For IoU calculation, a phase difference of $\\pi$ in the yaw angle will result in the same box, thus not affecting evaluation.
However, for angle prediction evaluation, this will result in the exact opposite direction.
Just think about a car. The yaw angle is the angle between the direction of the car front and the positive direction of the x-axis. If we add $\\pi$ to this angle, the car front will become the car rear.
For categories such as barrier, the front and the rear have no difference, therefore a phase difference of $\\pi$ will not affect the angle prediction score.
# Customize Data Pipelines
## Design of Data pipelines
Following typical conventions, we use `Dataset` and `DataLoader` for data loading
with multiple workers. `Dataset` returns a dict of data items corresponding
the arguments of models' forward method.
Since the data in object detection may not be the same size (point number, gt bbox size, etc.),
we introduce a new `DataContainer` type in MMCV to help collect and distribute
data of different size.
See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
The data preparation pipeline and the dataset is decomposed. Usually a dataset
defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict.
A pipeline consists of a sequence of operations. Each operation takes a dict as input and also output a dict for the next transform.
We present a classical pipeline in the following figure. The blue blocks are pipeline operations. With the pipeline going on, each operator can add new keys (marked as green) to the result dict or update the existing keys (marked as orange).
![](../../../resources/data_pipeline.png)
The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.
Here is an pipeline example for PointPillars.
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
backend_args=backend_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
backend_args=backend_args),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='PointShuffle'),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
backend_args=backend_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
backend_args=backend_args),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
pts_scale_ratio=1.0,
flip=False,
pcd_horizontal_flip=False,
pcd_vertical_flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(
type='DefaultFormatBundle3D',
class_names=class_names,
with_label=False),
dict(type='Collect3D', keys=['points'])
])
]
```
For each operation, we list the related dict fields that are added/updated/removed.
### Data loading
`LoadPointsFromFile`
- add: points
`LoadPointsFromMultiSweeps`
- update: points
`LoadAnnotations3D`
- add: gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels, pts_instance_mask, pts_semantic_mask, bbox3d_fields, pts_mask_fields, pts_seg_fields
### Pre-processing
`GlobalRotScaleTrans`
- add: pcd_trans, pcd_rotation, pcd_scale_factor
- update: points, \*bbox3d_fields
`RandomFlip3D`
- add: flip, pcd_horizontal_flip, pcd_vertical_flip
- update: points, \*bbox3d_fields
`PointsRangeFilter`
- update: points
`ObjectRangeFilter`
- update: gt_bboxes_3d, gt_labels_3d
`ObjectNameFilter`
- update: gt_bboxes_3d, gt_labels_3d
`PointShuffle`
- update: points
`PointsRangeFilter`
- update: points
### Formatting
`DefaultFormatBundle3D`
- update: points, gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels
`Collect3D`
- add: img_meta (the keys of img_meta is specified by `meta_keys`)
- remove: all other keys except for those specified by `keys`
### Test time augmentation
`MultiScaleFlipAug`
- update: scale, pcd_scale_factor, flip, flip_direction, pcd_horizontal_flip, pcd_vertical_flip with list of augmented data with these specific parameters
## Extend and use custom pipelines
1. Write a new pipeline in any file, e.g., `my_pipeline.py`. It takes a dict as input and return a dict.
```python
from mmdet.datasets import PIPELINES
@PIPELINES.register_module()
class MyTransform:
def __call__(self, results):
results['dummy'] = True
return results
```
2. Import the new class.
```python
from .my_pipeline import MyTransform
```
3. Use it in config files.
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
backend_args=backend_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
backend_args=backend_args),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='MyTransform'),
dict(type='PointShuffle'),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
# Dataset Preparation
## Before Preparation
It is recommended to symlink the dataset root to `$MMDETECTION3D/data`.
If your folder structure is different from the following, you may need to change the corresponding paths in config files.
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── nuscenes
│ │ ├── maps
│ │ ├── samples
│ │ ├── sweeps
│ │ ├── v1.0-test
| | ├── v1.0-trainval
│ ├── kitti
│ │ ├── ImageSets
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ ├── waymo
│ │ ├── waymo_format
│ │ │ ├── training
│ │ │ ├── validation
│ │ │ ├── testing
│ │ │ ├── gt.bin
│ │ ├── kitti_format
│ │ │ ├── ImageSets
│ ├── lyft
│ │ ├── v1.01-train
│ │ │ ├── v1.01-train (train_data)
│ │ │ ├── lidar (train_lidar)
│ │ │ ├── images (train_images)
│ │ │ ├── maps (train_maps)
│ │ ├── v1.01-test
│ │ │ ├── v1.01-test (test_data)
│ │ │ ├── lidar (test_lidar)
│ │ │ ├── images (test_images)
│ │ │ ├── maps (test_maps)
│ │ ├── train.txt
│ │ ├── val.txt
│ │ ├── test.txt
│ │ ├── sample_submission.csv
│ ├── s3dis
│ │ ├── meta_data
│ │ ├── Stanford3dDataset_v1.2_Aligned_Version
│ │ ├── collect_indoor3d_data.py
│ │ ├── indoor3d_util.py
│ │ ├── README.md
│ ├── scannet
│ │ ├── meta_data
│ │ ├── scans
│ │ ├── scans_test
│ │ ├── batch_load_scannet_data.py
│ │ ├── load_scannet_data.py
│ │ ├── scannet_utils.py
│ │ ├── README.md
│ ├── sunrgbd
│ │ ├── OFFICIAL_SUNRGBD
│ │ ├── matlab
│ │ ├── sunrgbd_data.py
│ │ ├── sunrgbd_utils.py
│ │ ├── README.md
│ ├── semantickitti
│ │ ├── sequences
│ │ │ ├── 00
│ │ │ │   ├── labels
│ │ │ │   ├── velodyne
│ │ │ ├── 01
│ │ │ ├── ..
│ │ │ ├── 22
```
## Download and Data Preparation
### KITTI
1. Download KITTI 3D detection data [HERE](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). Alternatively, you
can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The command scripts are the following:
```bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in OpenDataLab. Note that you should register an account on [OpenDataLab](https://opendatalab.com/) before.
pip install odl
odl login
# download and preprocess by MIM
mim download mmdet3d --dataset kitti
```
2. Prepare KITTI data splits by running:
```bash
mkdir ./data/kitti/ && mkdir ./data/kitti/ImageSets
# Download data split
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt
```
3. Generate info files by running:
```bash
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```
In an environment using slurm, users may run the following command instead:
```bash
sh tools/create_data.sh <partition> kitti
```
**Tips**:
- **Ready-made Annotations**. We have also provided kitti data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/kitti/`. However, if you want to use `ObjectSample` Augmentation in LiDAR-based detection methods, you should additionally generate groundtruth database files and annotations.
```bash
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti --only-gt-database
```
### Waymo
Download Waymo open dataset V1.4.1 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put `.tfrecord` files into corresponding folders in `data/waymo/waymo_format/` and put the data split `.txt` files into `data/waymo/kitti_format/ImageSets`. Download ground truth `.bin` file for validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put it into `data/waymo/waymo_format/`. A tip is that you can use `gsutil` to download the large-scale dataset with commands. You can take this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) as an example for more details. Subsequently, prepare waymo data by running:
```bash
# TF_CPP_MIN_LOG_LEVEL=3 will disable all logging output from TensorFlow.
# The number of `--workers` depends on the maximum number of cores in your CPU.
TF_CPP_MIN_LOG_LEVEL=3 python tools/create_data.py waymo --root-path ./data/waymo --out-dir ./data/waymo --workers 128 --extra-tag waymo --version v1.4
```
Note that:
- In case the preprocessing of Waymo dataset is slow or blocked, consider reducing the value of `--workers`. If this doesn't resolve the issue, you could set `--workers` as 0 to avoid using multiprocess.
- If your local disk does not have enough space for saving converted data, you can change the `--out-dir` to anywhere else. Just remember to create folders and prepare data there in advance and link them back to `data/waymo/kitti_format` after the data conversion.
**Tips**:
- **Ready-made Annotations**. We have provided the annotation files generated offline [here](#summary-of-annotation-files). However, the original Waymo data still needs to be converted to `kitti-format` data by yourself.
- **Waymo-mini**. If you just want to use a part of Waymo Dataset to verify some methods or debug quickly, you could use our provided [Waymo-mini](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_mini.tar.gz) which only contains two segments in train split and one segment in val split from the original dataset. All the images, point clouds and annotations in this compressed file have been processed offline so that you can directly download and unzip it to `data/waymo/`:
```bash
tar -xzvf waymo_mini.tar.gz -C ./data/waymo_mini
```
### NuScenes
1. Download nuScenes V1.0 full dataset data [HERE](https://www.nuscenes.org/download). Alternatively, you
can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The downloading and unzipping command scripts are the following:
```bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in OpenDataLab. Note that you should register an account on [OpenDataLab](https://opendatalab.com/) before.
pip install odl
odl login
# download and preprocess by MIM
mim download mmdet3d --dataset nuscenes
```
2. Prepare nuscenes data by running:
```bash
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
```
**Tips**:
- **Ready-made Annotations**. We have also provided NuScenes data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/nuscenes/`. However, if you want to use `ObjectSample` Augmentation in LiDAR-based detection methods, you should additionally generate groundtruth database files and annotations.
```bash
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --only-gt-database
```
### Lyft
Download Lyft 3D detection data [HERE](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data). Prepare Lyft data by running:
```bash
python tools/create_data.py lyft --root-path ./data/lyft --out-dir ./data/lyft --extra-tag lyft --version v1.01
python tools/dataset_converters/lyft_data_fixer.py --version v1.01 --root-folder ./data/lyft
```
Note that we follow the original folder names for clear organization. Please rename the raw folders as shown above. Also note that the second command serves the purpose of fixing a corrupted lidar data file. Please refer to the [discussion](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/discussion/110000) for more details.
### SemanticKITTI
1. Download SemanticKITTI dataset [HERE](http://semantic-kitti.org/dataset.html#download) and unzip all zip files. Alternatively, you
can download the dataset from [OpenDataLab](https://opendatalab.com/) using MIM. The downloading and unzipping command scripts are the following:
```bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in OpenDataLab. Note that you should register an account on [OpenDataLab](https://opendatalab.com/) before.
pip install odl
odl login
# download and preprocess by MIM
mim download mmdet3d --dataset semantickitti
```
2. Generate info files by running:
```bash
python ./tools/create_data.py semantickitti --root-path ./data/semantickitti --out-dir ./data/semantickitti --extra-tag semantickitti
```
**Tips**:
- **Ready-made Annotations**. We have also provided SemanticKITTI data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/semantickitti/`.
### S3DIS, ScanNet and SUN RGB-D
To prepare S3DIS data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/s3dis/README.md).
To prepare ScanNet data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/scannet/README.md).
To prepare SUN RGB-D data, please see its [README](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/data/sunrgbd/README.md).
**Tips**: For S3DIS, ScanNet and SUN RGB-D datasets, we have also provided data annotation files generated offline [here](#summary-of-annotation-files). You could download them and place them under `data/${DATASET}/`. However, you also need to generate point cloud files and semantic/instance masks files (if it has) by yourself.
### Customized Datasets
For using custom datasets, please refer to [Customize Datasets](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/en/advanced_guides/customize_dataset.md).
### Update data infos
If you have used v1.0.0rc1-v1.0.0rc4 mmdetection3d to create data infos before, and now you want to use the newest v1.1.0 mmdetection3d, you need to update the data infos file.
```bash
python tools/dataset_converters/update_infos_to_v2.py --dataset ${DATA_SET} --pkl-path ${PKL_PATH} --out-dir ${OUT_DIR}
```
- `--dataset` : Name of dataset.
- `--pkl-path` : Specify the data infos pkl file path.
- `--out-dir` : Output direction of the data infos pkl file.
Example:
```bash
python tools/dataset_converters/update_infos_to_v2.py --dataset kitti --pkl-path ./data/kitti/kitti_infos_trainval.pkl --out-dir ./data/kitti
```
### Summary of annotation files
We provide ready-made annotation files we generated offline for reference. You can directly use these files for convenice.
| Dataset | Train annotation file | Val annotation file | Test information file |
| :-------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| KITTI | [kitti_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/kitti/kitti_infos_train.pkl) | [kitti_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/kitti/kitti_infos_val.pkl) | [kitti_infos_test](https://download.openmmlab.com/mmdetection3d/data/kitti/kitti_infos_test.pkl) |
| NuScenes | [nuscenes_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_infos_train.pkl) [nuscenes_mini_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_mini_infos_train.pkl) | [nuscenes_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_infos_val.pkl) [nuscenes_mini_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/nuscenes/nuscenes_mini_infos_val.pkl) | |
| Waymo | [waymo_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_infos_train.pkl) | [waymo_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_infos_val.pkl) | [waymo_infos_test.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo/waymo_infos_test.pkl) [waymo_infos_test_cam_only.pkl](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_infos_test_cam_only.pkl) |
| [Waymo-mini](https://download.openmmlab.com/mmdetection3d/data/waymo_mmdet3d_after_1x4/waymo_mini.tar.gz) | | | |
| SUN RGB-D | [sunrgbd_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/sunrgbd/sunrgbd_infos_train.pkl) | [sunrgbd_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/sunrgbd/sunrgbd_infos_val.pkl) | |
| ScanNet | [scannet_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/scannet/scannet_infos_train.pkl) | [scannet_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/scannet/scannet_infos_val.pkl) | [scannet_infos_test.pkl](https://download.openmmlab.com/mmdetection3d/data/scannet/scannet_infos_test.pkl) |
| SemanticKitti | [semantickitti_infos_train.pkl](https://download.openmmlab.com/mmdetection3d/data/semantickitti/semantickitti_infos_train.pkl) | [semantickitti_infos_val.pkl](https://download.openmmlab.com/mmdetection3d/data/semantickitti/semantickitti_infos_val.pkl) | [semantickitti_infos_test.pkl](https://download.openmmlab.com/mmdetection3d/data/semantickitti/semantickitti_infos_test.pkl) |
Train & Test
**************
.. toctree::
:maxdepth: 1
config.md
coord_sys_tutorial.md
dataset_prepare.md
data_pipeline.md
train_test.md
inference.md
new_data_model.md
Useful Tools
************
.. toctree::
:maxdepth: 1
useful_tools.md
visualization.md
backends_support.md
model_deployment.md
# Inference
## Introduction
We provide scripts for multi-modality/single-modality (LiDAR-based/vision-based), indoor/outdoor 3D detection and 3D semantic segmentation demos. The pre-trained models can be downloaded from [model zoo](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/en/model_zoo.md). We provide pre-processed sample data from KITTI, SUN RGB-D, nuScenes and ScanNet dataset. You can use any other data following our pre-processing steps.
## Testing
### 3D Detection
#### Point cloud demo
To test a 3D detector on point cloud data, simply run:
```shell
python demo/pcd_demo.py ${PCD_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}] [--show]
```
The visualization results including a point cloud and predicted 3D bounding boxes will be saved in `${OUT_DIR}/PCD_NAME`, which you can open using [MeshLab](http://www.meshlab.net/). Note that if you set the flag `--show`, the prediction result will be displayed online using [Open3D](http://www.open3d.org/).
Example on KITTI data using [PointPillars model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth):
```shell
python demo/pcd_demo.py demo/data/kitti/000008.bin configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py ${CHECKPOINT_FILE} --show
```
Example on SUN RGB-D data using [VoteNet model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/votenet/votenet_16x8_sunrgbd-3d-10class/votenet_16x8_sunrgbd-3d-10class_20210820_162823-bf11f014.pth):
```shell
python demo/pcd_demo.py demo/data/sunrgbd/sunrgbd_000017.bin configs/votenet/votenet_8xb16_sunrgbd-3d.py ${CHECKPOINT_FILE} --show
```
#### Monocular 3D demo
To test a monocular 3D detector on image data, simply run:
```shell
python demo/mono_det_demo.py ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--cam-type ${CAM_TYPE}] [--score-thr ${SCORE-THR}] [--out-dir ${OUT_DIR}] [--show]
```
where the `ANNOTATION_FILE` should provide the 3D to 2D projection matrix (camera intrinsic matrix), and `CAM_TYPE` should be specified according to dataset. For example, if you want to inference on the front camera image, the `CAM_TYPE` should be set as `CAM_2` for KITTI, and `CAM_FRONT` for nuScenes. By specifying `CAM_TYPE`, you can even infer on any camera images for datasets with multi-view cameras, such as nuScenes and Waymo. `SCORE-THR` is the 3D bbox threshold while visualization. The visualization results including an image and its predicted 3D bounding boxes projected on the image will be saved in `${OUT_DIR}/IMG_NAME`.
Example on KITTI data using [PGD model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pgd/pgd_r101_caffe_fpn_gn-head_3x4_4x_kitti-mono3d/pgd_r101_caffe_fpn_gn-head_3x4_4x_kitti-mono3d_20211022_102608-8a97533b.pth):
```shell
python demo/mono_det_demo.py demo/data/kitti/000008.png demo/data/kitti/000008.pkl configs/pgd/pgd_r101-caffe_fpn_head-gn_4xb3-4x_kitti-mono3d.py ${CHECKPOINT_FILE} --show --cam-type CAM2 --score-thr 8
```
**Note**: For PGD, the prediction score is not among (0, 1).
Example on nuScenes data using [FCOS3D model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune_20210717_095645-8d806dc2.pth):
```shell
python demo/mono_det_demo.py demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525.jpg demo/data/nuscenes/n015-2018-07-24-11-22-45+0800.pkl configs/fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d_finetune.py ${CHECKPOINT_FILE} --show --cam-type CAM_BACK
```
**Note** that when visualizing results of monocular 3D detection for flipped images, the camera intrinsic matrix should also be modified accordingly. See more details and examples in PR [#744](https://github.com/open-mmlab/mmdetection3d/pull/744).
#### Multi-modality demo
To test a 3D detector on multi-modality data (typically point cloud and image), simply run:
```shell
python demo/multi_modality_demo.py ${PCD_FILE} ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}] [--show]
```
where the `ANNOTATION_FILE` should provide the 3D to 2D projection matrix. The visualization results including a point cloud, an image, predicted 3D bounding boxes and their projection on the image will be saved in `${OUT_DIR}/PCD_NAME`.
Example on KITTI data using [MVX-Net model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/mvxnet/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class-8963258a.pth):
```shell
python demo/multi_modality_demo.py demo/data/kitti/000008.bin demo/data/kitti/000008.png demo/data/kitti/000008.pkl configs/mvxnet/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class.py ${CHECKPOINT_FILE} --cam-type CAM2 --show
```
Example on SUN RGB-D data using [ImVoteNet model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvotenet/imvotenet_stage2_16x8_sunrgbd-3d-10class/imvotenet_stage2_16x8_sunrgbd-3d-10class_20210819_192851-1bcd1b97.pth):
```shell
python demo/multi_modality_demo.py demo/data/sunrgbd/000017.bin demo/data/sunrgbd/000017.jpg demo/data/sunrgbd/sunrgbd_000017_infos.pkl configs/imvotenet/imvotenet_stage2_8xb16_sunrgbd-3d.py ${CHECKPOINT_FILE} --cam-type CAM0 --show --score-thr 0.6
```
Example on NuScenes data using [BEVFusion model](https://drive.google.com/file/d/1QkvbYDk4G2d6SZoeJqish13qSyXA4lp3/view?usp=share_link):
```shell
python demo/multi_modality_demo.py demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__LIDAR_TOP__1532402927647951.pcd.bin demo/data/nuscenes/ demo/data/nuscenes/n015-2018-07-24-11-22-45+0800.pkl projects/BEVFusion/configs/bevfusion_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py ${CHECKPOINT_FILE} --cam-type all --score-thr 0.2 --show
```
### 3D Segmentation
To test a 3D segmentor on point cloud data, simply run:
```shell
python demo/pcd_seg_demo.py ${PCD_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--out-dir ${OUT_DIR}] [--show]
```
The visualization results including a point cloud and its predicted 3D segmentation mask will be saved in `${OUT_DIR}/PCD_NAME`.
Example on ScanNet data using [PointNet++ (SSG) model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointnet2/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class_20210514_143644-ee73704a.pth):
```shell
python demo/pcd_seg_demo.py demo/data/scannet/scene0000_00.bin configs/pointnet2/pointnet2_ssg_2xb16-cosine-200e_scannet-seg.py ${CHECKPOINT_FILE} --show
```
# Model Deployment
MMDet3D 1.1 fully relies on [MMDeploy](https://mmdeploy.readthedocs.io/) to deploy models.
Please stay tuned and this document will be update soon.
# Train with Customized Datasets
In this note, you will know how to train and test predefined models with customized datasets. We use the Waymo dataset as an example to describe the whole process.
The basic steps are as below:
1. Prepare the customized dataset
2. Prepare a config
3. Train, test, inference models on the customized dataset.
## Prepare the customized dataset
There are three ways to support a new dataset in MMDetection3D:
1. reorganize the dataset into existing format.
2. reorganize the dataset into a standard format.
3. implement a new dataset.
Usually we recommend to use the first two methods which are usually easier than the third.
In this note, we give an example for converting the data into KITTI format, you can refer to this to reorganize your dataset into kitti format. About the standard format dataset, and you can refer to [customize_dataset.md](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/en/advanced_guides/customize_dataset.md).
**Note**: We take Waymo as the example here considering its format is totally different from other existing formats. For other datasets using similar methods to organize data, like Lyft compared to nuScenes, it would be easier to directly implement the new data converter (for the second approach above) instead of converting it to another format (for the first approach above).
### KITTI dataset format
Firstly, the raw data for 3D object detection from KITTI are typically organized as follows, where `ImageSets` contains split files indicating which files belong to training/validation/testing set, `calib` contains calibration information files, `image_2` and `velodyne` include image data and point cloud data, and `label_2` includes label files for 3D detection.
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── kitti
│ │ ├── ImageSets
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
```
Specific annotation format is described in the official object development [kit](https://s3.eu-central-1.amazonaws.com/avg-kitti/devkit_object.zip). For example, it consists of the following labels:
```
#Values Name Description
----------------------------------------------------------------------------
1 type Describes the type of object: 'Car', 'Van', 'Truck',
'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
'Misc' or 'DontCare'
1 truncated Float from 0 (non-truncated) to 1 (truncated), where
truncated refers to the object leaving image boundaries
1 occluded Integer (0,1,2,3) indicating occlusion state:
0 = fully visible, 1 = partly occluded
2 = largely occluded, 3 = unknown
1 alpha Observation angle of object, ranging [-pi..pi]
4 bbox 2D bounding box of object in the image (0-based index):
contains left, top, right, bottom pixel coordinates
3 dimensions 3D object dimensions: height, width, length (in meters)
3 location 3D object location x,y,z in camera coordinates (in meters)
1 rotation_y Rotation ry around Y-axis in camera coordinates [-pi..pi]
1 score Only for results: Float, indicating confidence in
detection, needed for p/r curves, higher is better.
```
Assume we use the Waymo dataset.
After downloading the data, we need to implement a function to convert both the input data and annotation format into the KITTI style. Then we can implement `WaymoDataset` inherited from `KittiDataset` to load the data and perform training, and implement `WaymoMetric` inherited from `KittiMetric` for evaluation.
Specifically, we implement a waymo [converter](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/tools/dataset_converters/waymo_converter.py) to convert Waymo data into KITTI format and a waymo dataset [class](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/datasets/waymo_dataset.py) to process it, in addition need to add a waymo [metric](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/metrics/waymo_metric.py) to evaluate results. Because we preprocess the raw data and reorganize it like KITTI, the dataset class could be implemented more easily by inheriting from KittiDataset. Regarding the dataset evaluation metric, because Waymo has its own evaluation approach, we need further implement a new Waymo metric; more about the metric could refer to [metric_and_evaluator.md](https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/metric_and_evaluator.md). Afterward, users can successfully convert the data format and use `WaymoDataset` to train and evaluate the model by `WaymoMetric`.
For more details about the intermediate results of preprocessing of Waymo dataset, please refer to its [waymo_det.md](https://mmdetection3d.readthedocs.io/en/latest/datasets/waymo_det.html).
## Prepare a config
The second step is to prepare configs such that the dataset could be successfully loaded. In addition, adjusting hyperparameters is usually necessary to obtain decent performance in 3D detection.
Suppose we would like to train PointPillars on Waymo to achieve 3D detection for 3 classes, vehicle, cyclist and pedestrian, we need to prepare dataset config like [this](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/datasets/waymoD5-3d-3class.py), model config like [this](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/models/pointpillars_hv_secfpn_waymo.py) and combine them like [this](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py), compared to KITTI [dataset config](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/datasets/kitti-3d-3class.py), [model config](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/models/pointpillars_hv_secfpn_kitti.py) and [overall](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py).
## Train a new model
To train a model with the new config, you can simply run
```shell
python tools/train.py configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py
```
For more detailed usages, please refer to the [Case 1](https://mmdetection3d.readthedocs.io/en/latest/1_exist_data_model.html).
## Test and inference
To test the trained model, you can simply run
```shell
python tools/test.py configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py work_dirs/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class/latest.pth
```
**Note**: To use Waymo evaluation protocol, you need to follow the [tutorial](https://mmdetection3d.readthedocs.io/en/latest/datasets/waymo_det.html) and prepare files related to metrics computation as official instructions.
For more detailed usages for test and inference, please refer to the [Case 1](https://mmdetection3d.readthedocs.io/en/latest/1_exist_data_model.html).
# Test and Train on Standard Datasets
### Test existing models on standard datasets
- single GPU
- CPU
- single node multiple GPU
- multiple node
You can use the following commands to test a dataset.
```shell
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}] [--show] [--show-dir ${SHOW_DIR}]
# CPU: disable GPUs and run single-gpu testing script (experimental)
export CUDA_VISIBLE_DEVICES=-1
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}] [--show] [--show-dir ${SHOW_DIR}]
# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}] [--show] [--show-dir ${SHOW_DIR}]
```
**Note**:
For now, CPU testing is only supported for SMOKE.
Optional arguments:
- `--show`: If specified, detection results will be plotted in the silient mode. It is only applicable to single GPU testing and used for debugging and visualization. This should be used with `--show-dir`.
- `--show-dir`: If specified, detection results will be plotted on the `***_points.obj` and `***_pred.obj` files in the specified directory. It is only applicable to single GPU testing and used for debugging and visualization. You do NOT need a GUI available in your environment for using this option.
All evaluation related arguments are set in the `test_evaluator` in corresponding dataset configuration. such as
`test_evaluator = dict(type='KittiMetric', ann_file=data_root + 'kitti_infos_val.pkl', pklfile_prefix=None, submission_prefix=None)`
The arguments:
- `type`: The name of the corresponding metric, usually associated with the dataset.
- `ann_file`: The path of annotation file.
- `pklfile_prefix`: An optional argument. The filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- `submission_prefix`: An optional argument. The results will be saved to a file then you can upload it to do the official evaluation.
Examples:
Assume that you have already downloaded the checkpoints to the directory `checkpoints/`.
1. Test VoteNet on ScanNet and save the points and prediction visualization results.
```shell
python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \
--show --show-dir ./data/scannet/show_results
```
2. Test VoteNet on ScanNet, save the points, prediction, groundtruth visualization results, and evaluate the mAP.
```shell
python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \
--show --show-dir ./data/scannet/show_results
```
3. Test VoteNet on ScanNet (without saving the test results) and evaluate the mAP.
```shell
python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth
```
4. Test SECOND on KITTI with 8 GPUs, and evaluate the mAP.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class.py \
checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth
```
5. Test PointPillars on nuScenes with 8 GPUs, and generate the json file to be submit to the official evaluation server.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_8xb4-2x_nus-3d.py \
checkpoints/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d_20200620_230405-2fa62f3d.pth \
--cfg-options 'test_evaluator.jsonfile_prefix=./pointpillars_nuscenes_results'
```
The generated results be under `./pointpillars_nuscenes_results` directory.
6. Test SECOND on KITTI with 8 GPUs, and generate the pkl files and submission data to be submit to the official evaluation server.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class.py \
checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth \
--cfg-options 'test_evaluator.pklfile_prefix=./second_kitti_results' 'submission_prefix=./second_kitti_results'
```
The generated results be under `./second_kitti_results` directory.
7. Test PointPillars on Lyft with 8 GPUs, generate the pkl files and make a submission to the leaderboard.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d.py \
checkpoints/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d_latest.pth \
--cfg-options 'test_evaluator.jsonfile_prefix=results/pp_lyft/results_challenge' \
'test_evaluator.csv_savepath=results/pp_lyft/results_challenge.csv' \
'test_evaluator.pklfile_prefix=results/pp_lyft/results_challenge.pkl'
```
**Notice**: To generate submissions on Lyft, `csv_savepath` must be given in the `--cfg-options`. After generating the csv file, you can make a submission with kaggle commands given on the [website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/submit).
Note that in the [config of Lyft dataset](../../configs/_base_/datasets/lyft-3d.py), the value of `ann_file` keyword in `test` is `'lyft_infos_test.pkl'`, which is the official test set of Lyft without annotation. To test on the validation set, please change this to `'lyft_infos_val.pkl'`.
8. Test PointPillars on waymo with 8 GPUs, and evaluate the mAP with waymo metrics.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymo-3d-car.py \
checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth \
--cfg-options 'test_evaluator.pklfile_prefix=results/waymo-car/kitti_results' \
'test_evaluator.submission_prefix=results/waymo-car/kitti_results'
```
**Notice**: For evaluation on waymo, please follow the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`.(Sometimes when using bazel to build `compute_detection_metrics_main`, an error `'round' is not a member of 'std'` may appear. We just need to remove the `std::` before `round` in that file.) `pklfile_prefix` should be given in the `--eval-options` for the bin file generation. For metrics, `waymo` is the recommended official evaluation prototype. Currently, evaluating with choice `kitti` is adapted from KITTI and the results for each difficulty are not exactly the same as the definition of KITTI. Instead, most of objects are marked with difficulty 0 currently, which will be fixed in the future. The reasons of its instability include the large computation for evaluation, the lack of occlusion and truncation in the converted data, different definition of difficulty and different methods of computing average precision.
9. Test PointPillars on waymo with 8 GPUs, generate the bin files and make a submission to the leaderboard.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymo-3d-car.py \
checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth \
--cfg-options 'test_evaluator.pklfile_prefix=results/waymo-car/kitti_results' \
'test_evaluator.submission_prefix=results/waymo-car/kitti_results'
```
**Notice**: After generating the bin file, you can simply build the binary file `create_submission` and use them to create a submission file by following the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/). For evaluation on the validation set with the eval server, you can also use the same way to generate a submission.
## Train predefined models on standard datasets
MMDetection3D implements distributed training and non-distributed training,
which uses `MMDistributedDataParallel` and `MMDataParallel` respectively.
All outputs (log files and checkpoints) will be saved to the working directory,
which is specified by `work_dir` in the config file.
By default we evaluate the model on the validation set after each epoch, you can change the evaluation interval by adding the interval argument in the training config.
```python
train_cfg = dict(type='EpochBasedTrainLoop', val_interval=1) # This evaluate the model per 12 epoch.
```
**Important**: The default learning rate in config files is for 8 GPUs and the exact batch size is marked by the config's file name, e.g. '2xb8' means 2 samples per GPU using 8 GPUs.
According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. However, since most of the models in this repo use ADAM rather than SGD for optimization, the rule may not hold and users need to tune the learning rate by themselves.
### Train with a single GPU
```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
If you want to specify the working directory in the command, you can add an argument `--work-dir ${YOUR_WORK_DIR}`.
### Training with CPU (experimental)
The process of training on the CPU is consistent with single GPU training. We just need to disable GPUs before the training process.
```shell
export CUDA_VISIBLE_DEVICES=-1
```
And then run the script of train with a single GPU.
**Note**:
For now, most of the point cloud related algorithms rely on 3D CUDA op, which can not be trained on CPU. Some monocular 3D object detection algorithms, like FCOS3D and SMOKE can be trained on CPU. We do not recommend users to use CPU for training because it is too slow. We support this feature to allow users to debug certain models on machines without GPU for convenience.
### Train with multiple GPUs
```shell
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
```
Optional arguments are:
- `--cfg-options 'Key=value'`: Override some settings in the used config.
### Train with multiple machines
If you run MMDetection3D on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single machine training.)
```shell
[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
```
Here is an example of using 16 GPUs to train Mask R-CNN on the dev partition.
```shell
GPUS=16 ./tools/slurm_train.sh dev pp_kitti_3class configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py /nfs/xxxx/pp_kitti_3class
```
You can check [slurm_train.sh](https://github.com/open-mmlab/mmdetection/blob/master/tools/slurm_train.sh) for full arguments and environment variables.
If you launch with multiple machines simply connected with ethernet, you can simply run following commands:
On the first machine:
```shell
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR ./tools/dist_train.sh $CONFIG $GPUS
```
On the second machine:
```shell
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR ./tools/dist_train.sh $CONFIG $GPUS
```
Usually it is slow if you do not have high speed networking like InfiniBand.
### Launch multiple jobs on a single machine
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
you need to specify different ports (29500 by default) for each job to avoid communication conflict.
If you use `dist_train.sh` to launch training jobs, you can set the port in commands.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
```
If you use launch training jobs with Slurm, there are two ways to specify the ports.
1. Set the port through `--cfg-options`. This is more recommended since it does not change the original configs.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --cfg-options 'env_cfg.dist_cfg.port=29500'
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --cfg-options 'env_cfg.dist_cfg.port=29501'
```
2. Modify the config files (usually the 6th line from the bottom in config files) to set different communication ports.
In `config1.py`,
```python
env_cfg = dict(
dist_cfg=dict(backend='nccl', port=29500)
)
```
In `config2.py`,
```python
env_cfg = dict(
dist_cfg=dict(backend='nccl', port=29501)
)
```
Then you can launch two jobs with `config1.py` and `config2.py`.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
```
We provide lots of useful tools under `tools/` directory.
## Log Analysis
You can plot loss/mAP curves given a training log file. Run `pip install seaborn` first to install the dependency.
![loss curve image](../../../resources/loss_curve.png)
```shell
python tools/analysis_tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}] [--mode ${MODE}] [--interval ${INTERVAL}]
```
**Notice**: If the metric you want to plot is calculated in the eval stage, you need to add the flag `--mode eval`. If you perform evaluation with an interval of `${INTERVAL}`, you need to add the args `--interval ${INTERVAL}`.
Examples:
- Plot the classification loss of some run.
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
```
- Plot the classification and regression loss of some run, and save the figure to a pdf.
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf
```
- Compare the bbox mAP of two runs in the same figure.
```shell
# evaluate PartA2 and second on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/PartA2.log.json tools/logs/second.log.json --keys KITTI/Car_3D_moderate_strict --legend PartA2 second --mode eval --interval 1
# evaluate PointPillars for car and 3 classes on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/pp-3class.log.json tools/logs/pp.log.json --keys KITTI/Car_3D_moderate_strict --legend pp-3class pp --mode eval --interval 2
```
You can also compute the average training speed.
```shell
python tools/analysis_tools/analyze_logs.py cal_train_time log.json [--include-outliers]
```
The output is expected to be like the following.
```
-----Analyze train time of work_dirs/some_exp/20190611_192040.log.json-----
slowest epoch 11, average time is 1.2024
fastest epoch 1, average time is 1.1909
time std over epochs is 0.0028
average iter time: 1.1959 s/iter
```
&#8195;
## Model Serving
**Note**: This tool is still experimental now, only SECOND is supported to be served with [`TorchServe`](https://pytorch.org/serve/). We'll support more models in the future.
In order to serve an `MMDetection3D` model with [`TorchServe`](https://pytorch.org/serve/), you can follow the steps:
### 1. Convert the model from MMDetection3D to TorchServe
```shell
python tools/deployment/mmdet3d2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
--output-folder ${MODEL_STORE} \
--model-name ${MODEL_NAME}
```
**Note**: ${MODEL_STORE} needs to be an absolute path to a folder.
### 2. Build `mmdet3d-serve` docker image
```shell
docker build -t mmdet3d-serve:latest docker/serve/
```
### 3. Run `mmdet3d-serve`
Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).
In order to run it on the GPU, you need to install [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). You can omit the `--gpus` argument in order to run on the CPU.
Example:
```shell
docker run --rm \
--cpus 8 \
--gpus device=0 \
-p8080:8080 -p8081:8081 -p8082:8082 \
--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \
mmdet3d-serve:latest
```
[Read the docs](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md/) about the Inference (8080), Management (8081) and Metrics (8082) APis
### 4. Test deployment
You can use `test_torchserver.py` to compare result of torchserver and pytorch.
```shell
python tools/deployment/test_torchserver.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${MODEL_NAME}
[--inference-addr ${INFERENCE_ADDR}] [--device ${DEVICE}] [--score-thr ${SCORE_THR}]
```
Example:
```shell
python tools/deployment/test_torchserver.py demo/data/kitti/kitti_000008.bin configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth second
```
&#8195;
## Model Complexity
You can use `tools/analysis_tools/get_flops.py` in MMDetection3D, a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch), to compute the FLOPs and params of a given model.
```shell
python tools/analysis_tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
```
You will get the results like this.
```text
==============================
Input shape: (40000, 4)
Flops: 5.78 GFLOPs
Params: 953.83 k
==============================
```
**Note**: This tool is still experimental and we do not guarantee that the
number is absolutely correct. You may well use the result for simple
comparisons, but double check it before you adopt it in technical reports or papers.
1. FLOPs are related to the input shape while parameters are not. The default
input shape is (1, 40000, 4).
2. Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details.
3. We currently only support FLOPs calculation of single-stage models with single-modality input (point cloud or image). We will support two-stage and multi-modality models in the future.
&#8195;
## Model Conversion
### RegNet model to MMDetection
`tools/model_converters/regnet2mmdet.py` convert keys in pycls pretrained RegNet models to
MMDetection style.
```shell
python tools/model_converters/regnet2mmdet.py ${SRC} ${DST} [-h]
```
### Detectron ResNet to Pytorch
`tools/detectron2pytorch.py` in MMDetection could convert keys in the original detectron pretrained
ResNet models to PyTorch style.
```shell
python tools/detectron2pytorch.py ${SRC} ${DST} ${DEPTH} [-h]
```
### Prepare a model for publishing
`tools/model_converters/publish_model.py` helps users to prepare their model for publishing.
Before you upload a model to AWS, you may want to
1. convert model weights to CPU tensors
2. delete the optimizer states and
3. compute the hash of the checkpoint file and append the hash id to the
filename.
```shell
python tools/model_converters/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
```
E.g.,
```shell
python tools/model_converters/publish_model.py work_dirs/faster_rcnn/latest.pth faster_rcnn_r50_fpn_1x_20190801.pth
```
The final output filename will be `faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth`.
&#8195;
## Dataset Conversion
`tools/dataset_converters/` contains tools for converting datasets to other formats. Most of them convert datasets to pickle based info files, like kitti, nuscenes and lyft. Waymo converter is used to reorganize waymo raw data like KITTI style. Users could refer to them for our approach to converting data format. It is also convenient to modify them to use as scripts like nuImages converter.
To convert the nuImages dataset into COCO format, please use the command below:
```shell
python -u tools/dataset_converters/nuimage_converter.py --data-root ${DATA_ROOT} --version ${VERSIONS} \
--out-dir ${OUT_DIR} --nproc ${NUM_WORKERS} --extra-tag ${TAG}
```
- `--data-root`: the root of the dataset, defaults to `./data/nuimages`.
- `--version`: the version of the dataset, defaults to `v1.0-mini`. To get the full dataset, please use `--version v1.0-train v1.0-val v1.0-mini`
- `--out-dir`: the output directory of annotations and semantic masks, defaults to `./data/nuimages/annotations/`.
- `--nproc`: number of workers for data preparation, defaults to `4`. Larger number could reduce the preparation time as images are processed in parallel.
- `--extra-tag`: extra tag of the annotations, defaults to `nuimages`. This can be used to separate different annotations processed in different time for study.
More details could be referred to the [doc](https://mmdetection3d.readthedocs.io/en/latest/data_preparation.html) for dataset preparation and [README](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/nuimages/README.md/) for nuImages dataset.
&#8195;
## Miscellaneous
### Print the entire config
`tools/misc/print_config.py` prints the whole config verbatim, expanding all its
imports.
```shell
python tools/misc/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}]
```
# Visualization
MMDetection3D provides a `Det3DLocalVisualizer` to visualize and store the state of the model during training and testing, as well as results, with the following features.
1. Support the basic drawing interface for multi-modality data and multi-task.
2. Support multiple backends such as local, TensorBoard, to write training status such as `loss`, `lr`, or performance evaluation metrics and to a specified single or multiple backends.
3. Support ground truth visualization on multimodal data, and cross-modal visualization of 3D detection results.
## Basic Drawing Interface
Inherited from `DetLocalVisualizer`, `Det3DLocalVisualizer` provides an interface for drawing common objects on 2D images, such as drawing detection boxes, points, text, lines, circles, polygons, and binary masks. More details about 2D drawing can refer to the [visualization documentation](https://mmengine.readthedocs.io/zh_CN/latest/advanced_tutorials/visualization.html) in MMDetection. Here we introduce the 3D drawing interface:
### Drawing point cloud on the image
We support drawing point cloud on the image by using `draw_points_on_image`.
```python
import mmcv
import numpy as np
from mmengine import load
from mmdet3d.visualization import Det3DLocalVisualizer
info_file = load('demo/data/kitti/000008.pkl')
points = np.fromfile('demo/data/kitti/000008.bin', dtype=np.float32)
points = points.reshape(-1, 4)[:, :3]
lidar2img = np.array(info_file['data_list'][0]['images']['CAM2']['lidar2img'], dtype=np.float32)
visualizer = Det3DLocalVisualizer()
img = mmcv.imread('demo/data/kitti/000008.png')
img = mmcv.imconvert(img, 'bgr', 'rgb')
visualizer.set_image(img)
visualizer.draw_points_on_image(points, lidar2img)
visualizer.show()
```
![points_on_image](../../../resources/points_on_image.png)
### Drawing 3D Boxes on Point Cloud
We support drawing 3D boxes on point cloud by using `draw_bboxes_3d`.
```python
import torch
import numpy as np
from mmdet3d.visualization import Det3DLocalVisualizer
from mmdet3d.structures import LiDARInstance3DBoxes
points = np.fromfile('demo/data/kitti/000008.bin', dtype=np.float32)
points = points.reshape(-1, 4)
visualizer = Det3DLocalVisualizer()
# set point cloud in visualizer
visualizer.set_points(points)
bboxes_3d = LiDARInstance3DBoxes(
torch.tensor([[8.7314, -1.8559, -1.5997, 4.2000, 3.4800, 1.8900,
-1.5808]]))
# Draw 3D bboxes
visualizer.draw_bboxes_3d(bboxes_3d)
visualizer.show()
```
![mono3d](../../../resources/pcd.png)
### Drawing Projected 3D Boxes on Image
We support drawing projected 3D boxes on image by using `draw_proj_bboxes_3d`.
```python
import mmcv
import numpy as np
from mmengine import load
from mmdet3d.visualization import Det3DLocalVisualizer
from mmdet3d.structures import CameraInstance3DBoxes
info_file = load('demo/data/kitti/000008.pkl')
cam2img = np.array(info_file['data_list'][0]['images']['CAM2']['cam2img'], dtype=np.float32)
bboxes_3d = []
for instance in info_file['data_list'][0]['instances']:
bboxes_3d.append(instance['bbox_3d'])
gt_bboxes_3d = np.array(bboxes_3d, dtype=np.float32)
gt_bboxes_3d = CameraInstance3DBoxes(gt_bboxes_3d)
input_meta = {'cam2img': cam2img}
visualizer = Det3DLocalVisualizer()
img = mmcv.imread('demo/data/kitti/000008.png')
img = mmcv.imconvert(img, 'bgr', 'rgb')
visualizer.set_image(img)
# project 3D bboxes to image
visualizer.draw_proj_bboxes_3d(gt_bboxes_3d, input_meta)
visualizer.show()
```
### Drawing BEV Boxes
We support drawing BEV boxes by using `draw_bev_bboxes`.
```python
import numpy as np
from mmengine import load
from mmdet3d.visualization import Det3DLocalVisualizer
from mmdet3d.structures import CameraInstance3DBoxes
info_file = load('demo/data/kitti/000008.pkl')
bboxes_3d = []
for instance in info_file['data_list'][0]['instances']:
bboxes_3d.append(instance['bbox_3d'])
gt_bboxes_3d = np.array(bboxes_3d, dtype=np.float32)
gt_bboxes_3d = CameraInstance3DBoxes(gt_bboxes_3d)
visualizer = Det3DLocalVisualizer()
# set bev image in visualizer
visualizer.set_bev_image()
# draw bev bboxes
visualizer.draw_bev_bboxes(gt_bboxes_3d, edge_colors='orange')
visualizer.show()
```
### Drawing 3D Semantic Mask
We support draw segmentation mask via per-point colorization by using `draw_seg_mask`.
```python
import numpy as np
from mmdet3d.visualization import Det3DLocalVisualizer
points = np.fromfile('demo/data/sunrgbd/000017.bin', dtype=np.float32)
points = points.reshape(-1, 3)
visualizer = Det3DLocalVisualizer()
mask = np.random.rand(points.shape[0], 3)
points_with_mask = np.concatenate((points, mask), axis=-1)
# Draw 3D points with mask
visualizer.set_points(points, pcd_mode=2, vis_mode='add')
visualizer.draw_seg_mask(points_with_mask)
visualizer.show()
```
## Results
To see the prediction results of trained models, you can run the following command:
```bash
python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --show --show-dir ${SHOW_DIR}
```
After running this command, plotted results including input data and the output of networks visualized on the input will be saved in `${SHOW_DIR}`.
After running this command, you will obtain the input data, the output of networks and ground-truth labels visualized on the input (e.g. `***_gt.png` and `***_pred.png` in multi-modality detection task and vision-based detection task) in `${SHOW_DIR}`. When `show` is enabled, [Open3D](http://www.open3d.org/) will be used to visualize the results online. If you are running test in remote server without GUI, the online visualization is not supported. You can download the `results.pkl` from the remote server, and visualize the prediction results offline in your local machine.
To visualize the results with `Open3D` backend offline, you can run the following command:
```bash
python tools/misc/visualize_results.py ${CONFIG_FILE} --result ${RESULTS_PATH} --show-dir ${SHOW_DIR}
```
![](../../../resources/open3d_visual.gif)
This allows the inference and results generation to be done in remote server and the users can open them on their host with GUI.
## Dataset
We also provide scripts to visualize the dataset without inference. You can use `tools/misc/browse_dataset.py` to show loaded data and ground-truth online and save them on the disk. Currently we support single-modality 3D detection and 3D segmentation on all the datasets, multi-modality 3D detection on KITTI and SUN RGB-D, as well as monocular 3D detection on nuScenes. To browse the KITTI dataset, you can run the following command:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task lidar_det --output-dir ${OUTPUT_DIR}
```
**Notice**: Once specifying `--output-dir`, the images of views specified by users will be saved when pressing `_ESC_` in open3d window. If you want to zoom out/in the point clouds to inspect more details, you could specify `--show-interval=0` in the command.
To verify the data consistency and the effect of data augmentation, you can also add `--aug` flag to visualize the data after data augmentation using the command as below:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task lidar_det --aug --output-dir ${OUTPUT_DIR}
```
If you also want to show 2D images with 3D bounding boxes projected onto them, you need to find a config that supports multi-modality data loading, and then change the `--task` args to `multi-modality_det`. An example is showed below:
```shell
python tools/misc/browse_dataset.py configs/mvxnet/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class.py --task multi-modality_det --output-dir ${OUTPUT_DIR}
```
![](../../../resources/browse_dataset_multi_modality.png)
You can simply browse different datasets using different configs, e.g. visualizing the ScanNet dataset in 3D semantic segmentation task:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/scannet-seg.py --task lidar_seg --output-dir ${OUTPUT_DIR}
```
![](../../../resources/browse_dataset_seg.png)
And browsing the nuScenes dataset in monocular 3D detection task:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/nus-mono3d.py --task mono_det --output-dir ${OUTPUT_DIR}
```
![](../../../resources/browse_dataset_mono.png)
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.header-logo {
background-image: url("../image/mmdet3d-logo.png");
background-size: 182.5px 40px;
height: 40px;
width: 182.5px;
}
# 自定义数据集
在本节中,您将了解如何使用自定义数据集训练和测试预定义模型。
基本步骤如下:
1. 准备数据
2. 准备配置文件
3. 在自定义数据集上训练,测试和推理模型
## 数据准备
理想情况下我们可以重新组织自定义的原始数据并将标注格式转换成 KITTI 风格。但是,考虑到对于自定义数据集而言,KITTI 格式的校准文件和 3D 标注难以获得,因此我们在文档中介绍基本的数据格式。
### 基本数据格式
#### 点云格式
目前,我们只支持 `.bin` 格式的点云用于训练和推理。在训练自己的数据集之前,需要将其它格式的点云文件转换成 `.bin` 文件。常见的点云数据格式包括 `.pcd``.las`,我们列举了一些开源工具作为参考。
1. `.pcd` 转换成 `.bin`:https://github.com/DanielPollithy/pypcd
- 您可以通过以下指令安装 `pypcd`
```bash
pip install git+https://github.com/DanielPollithy/pypcd.git
```
- 您可以使用以下脚本读取 `.pcd` 文件,并将其转换成 `.bin` 格式来保存:
```python
import numpy as np
from pypcd import pypcd
pcd_data = pypcd.PointCloud.from_path('point_cloud_data.pcd')
points = np.zeros([pcd_data.width, 4], dtype=np.float32)
points[:, 0] = pcd_data.pc_data['x'].copy()
points[:, 1] = pcd_data.pc_data['y'].copy()
points[:, 2] = pcd_data.pc_data['z'].copy()
points[:, 3] = pcd_data.pc_data['intensity'].copy().astype(np.float32)
with open('point_cloud_data.bin', 'wb') as f:
f.write(points.tobytes())
```
2. `.las` 转换成 `.bin`:常见的转换流程为 `.las -> .pcd -> .bin``.las -> .pcd` 的转换可以用该[工具](https://github.com/Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor)实现。
#### 标签格式
最基本的信息:每个场景的 3D 边界框和类别标签应该包含在 `.txt` 标注文件中。每一行代表特定场景的一个 3D 框,如下所示:
```
# 格式:[x, y, z, dx, dy, dz, yaw, category_name]
1.23 1.42 0.23 3.96 1.65 1.55 1.56 Car
3.51 2.15 0.42 1.05 0.87 1.86 1.23 Pedestrian
...
```
**注意**:对于自定义数据集的评估我们目前只支持 KITTI 评估方法。
3D 框应存储在统一的 3D 坐标系中。
#### 校准格式
对于每个激光雷达收集的点云数据,通常会进行融合并转换到特定的激光雷达坐标系。因此,校准信息文件中通常应该包含每个相机的内参矩阵和激光雷达到每个相机的外参转换矩阵,并保存在 `.txt` 校准文件中,其中 `Px` 表示 `camera_x` 的内参矩阵,`lidar2camx` 表示 `lidar``camera_x` 的外参转换矩阵。
```
P0
P1
P2
P3
P4
...
lidar2cam0
lidar2cam1
lidar2cam2
lidar2cam3
lidar2cam4
...
```
### 原始数据结构
#### 基于激光雷达的 3D 检测
基于激光雷达的 3D 目标检测原始数据通常组织成如下格式,其中 `ImageSets` 包含划分文件,指明哪些文件属于训练/验证集,`points` 包含存储成 `.bin` 格式的点云数据,`labels` 包含 3D 检测的标签文件。
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── points
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
│ │ ├── labels
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
```
#### 基于视觉的 3D 检测
基于视觉的 3D 目标检测原始数据通常组织成如下格式,其中 `ImageSets` 包含划分文件,指明哪些文件属于训练/验证集,`images` 包含来自不同相机的图像,例如 `camera_x` 获得的图像应放在 `images/images_x` 下,`calibs` 包含校准信息文件,其中存储了每个相机的内参矩阵,`labels` 包含 3D 检测的标签文件。
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── calibs
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
│ │ ├── images
│ │ │ ├── images_0
│ │ │ │ ├── 000000.png
│ │ │ │ ├── 000001.png
│ │ │ │ ├── ...
│ │ │ ├── images_1
│ │ │ ├── images_2
│ │ │ ├── ...
│ │ ├── labels
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
```
#### 多模态 3D 检测
多模态 3D 目标检测原始数据通常组织成如下格式。不同于基于视觉的 3D 目标检测,`calibs` 里的校准信息文件存储了每个相机的内参矩阵和外参矩阵。
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── calibs
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
│ │ ├── points
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
│ │ ├── images
│ │ │ ├── images_0
│ │ │ │ ├── 000000.png
│ │ │ │ ├── 000001.png
│ │ │ │ ├── ...
│ │ │ ├── images_1
│ │ │ ├── images_2
│ │ │ ├── ...
│ │ ├── labels
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
```
#### 基于激光雷达的 3D 语义分割
基于激光雷达的 3D 语义分割原始数据通常组织成如下格式,其中 `ImageSets` 包含划分文件,指明哪些文件属于训练/验证集,`points` 包含点云数据,`semantic_mask` 包含逐点级标签。
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── points
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
│ │ ├── semantic_mask
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
```
### 数据转换
按照我们的说明准备好原始数据后,您可以直接使用以下命令生成训练/验证信息文件。
```bash
python tools/create_data.py custom --root-path ./data/custom --out-dir ./data/custom --extra-tag custom
```
## 自定义数据集示例
在完成数据准备后,我们可以在 `mmdet3d/datasets/my_dataset.py` 中创建一个新的数据集来加载数据。
```python
import mmengine
from mmdet3d.registry import DATASETS
from .det3d_dataset import Det3DDataset
@DATASETS.register_module()
class MyDataset(Det3DDataset):
# 替换成自定义 pkl 信息文件里的所有类别
METAINFO = {
'classes': ('Pedestrian', 'Cyclist', 'Car')
}
def parse_ann_info(self, info):
"""Process the `instances` in data info to `ann_info`.
Args:
info (dict): Data information of single data sample.
Returns:
dict: Annotation information consists of the following keys:
- gt_bboxes_3d (:obj:`LiDARInstance3DBoxes`):
3D ground truth bboxes.
- gt_labels_3d (np.ndarray): Labels of ground truths.
"""
ann_info = super().parse_ann_info(info)
if ann_info is None:
ann_info = dict()
# 空实例
ann_info['gt_bboxes_3d'] = np.zeros((0, 7), dtype=np.float32)
ann_info['gt_labels_3d'] = np.zeros(0, dtype=np.int64)
# 过滤掉没有在训练中使用的类别
ann_info = self._remove_dontcare(ann_info)
gt_bboxes_3d = LiDARInstance3DBoxes(ann_info['gt_bboxes_3d'])
ann_info['gt_bboxes_3d'] = gt_bboxes_3d
return ann_info
```
数据预处理后,用户可以通过以下两个步骤来训练自定义数据集:
1. 修改配置文件来使用自定义数据集。
2. 验证自定义数据集标注的正确性。
这里我们以在自定义数据集上训练 PointPillars 为例:
### 准备配置
这里我们演示一个纯点云训练的配置示例:
#### 准备数据集配置
`configs/_base_/datasets/custom.py` 中:
```python
# 数据集设置
dataset_type = 'MyDataset'
data_root = 'data/custom/'
class_names = ['Pedestrian', 'Cyclist', 'Car'] # 替换成您的数据集类别
point_cloud_range = [0, -40, -3, 70.4, 40, 1] # 根据您的数据集进行调整
input_modality = dict(use_lidar=True, use_camera=False)
metainfo = dict(classes=class_names)
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4, # 替换成您的点云数据维度
use_dim=4), # 替换成在训练和推理时实际使用的维度
dict(
type='LoadAnnotations3D',
with_bbox_3d=True,
with_label_3d=True),
dict(
type='ObjectNoise',
num_try=100,
translation_std=[1.0, 1.0, 0.5],
global_rot_range=[0.0, 0.0],
rot_range=[-0.78539816, 0.78539816]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4, # 替换成您的点云数据维度
use_dim=4),
dict(type='Pack3DDetInputs', keys=['points'])
]
# 为可视化阶段的数据和 GT 加载构造流水线
eval_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(type='Pack3DDetInputs', keys=['points']),
]
train_dataloader = dict(
batch_size=6,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type='RepeatDataset',
times=2,
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='custom_infos_train.pkl', # 指定您的训练 pkl 信息
data_prefix=dict(pts='points'),
pipeline=train_pipeline,
modality=input_modality,
test_mode=False,
metainfo=metainfo,
box_type_3d='LiDAR')))
val_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='points'),
ann_file='custom_infos_val.pkl', # 指定您的验证 pkl 信息
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
val_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'custom_infos_val.pkl', # 指定您的验证 pkl 信息
metric='bbox')
```
#### 准备模型配置
对于基于体素化的检测器如 SECOND,PointPillars 及 CenterPoint,点云范围(point cloud range)和体素大小(voxel size)应该根据您的数据集做调整。理论上,`voxel_size``point_cloud_range` 的设置是相关联的。设置较小的 `voxel_size` 将增加体素数以及相应的内存消耗。此外,需要注意以下问题:
如果将 `point_cloud_range``voxel_size` 分别设置成 `[0, -40, -3, 70.4, 40, 1]``[0.05, 0.05, 0.1]`,那么中间特征图的形状应该为 `[(1-(-3))/0.1+1, (40-(-40))/0.05, (70.4-0)/0.05]=[41, 1600, 1408]`。更改 `point_cloud_range` 时,请记得依据 `voxel_size` 更改 `middle_encoder` 里中间特征图的形状。
关于 `anchor_range` 的设置,一般需要根据数据集做调整。需要注意的是,`z` 值需要根据点云的位置做相应调整,具体请参考此 [issue](https://github.com/open-mmlab/mmdetection3d/issues/986)
关于 `anchor_size` 的设置,通常需要计算整个训练集中目标的长、宽、高的平均值作为 `anchor_size`,以获得最好的结果。
`configs/_base_/models/pointpillars_hv_secfpn_custom.py` 中:
```python
voxel_size = [0.16, 0.16, 4] # 根据您的数据集做调整
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1] # 根据您的数据集做调整
model = dict(
type='VoxelNet',
data_preprocessor=dict(
type='Det3DDataPreprocessor',
voxel=True,
voxel_layer=dict(
max_num_points=32,
point_cloud_range=point_cloud_range,
voxel_size=voxel_size,
max_voxels=(16000, 40000))),
voxel_encoder=dict(
type='PillarFeatureNet',
in_channels=4,
feat_channels=[64],
with_distance=False,
voxel_size=voxel_size,
point_cloud_range=point_cloud_range),
# `output_shape` 需要根据 `point_cloud_range` 和 `voxel_size` 做相应调整
middle_encoder=dict(
type='PointPillarsScatter', in_channels=64, output_shape=[496, 432]),
backbone=dict(
type='SECOND',
in_channels=64,
layer_nums=[3, 5, 5],
layer_strides=[2, 2, 2],
out_channels=[64, 128, 256]),
neck=dict(
type='SECONDFPN',
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
bbox_head=dict(
type='Anchor3DHead',
num_classes=3,
in_channels=384,
feat_channels=384,
use_direction_classifier=True,
assign_per_class=True,
# 根据您的数据集调整 `ranges` 和 `sizes`
anchor_generator=dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[
[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -1.78, 69.12, 39.68, -1.78],
],
sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
rotations=[0, 1.57],
reshape_out=False),
diff_rad_by_sin=True,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
loss_cls=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(
type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
loss_dir=dict(
type='mmdet.CrossEntropyLoss', use_sigmoid=False,
loss_weight=0.2)),
# 模型训练和测试设置
train_cfg=dict(
assigner=[
dict( # for Pedestrian
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict( # for Cyclist
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict( # for Car
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.6,
neg_iou_thr=0.45,
min_pos_iou=0.45,
ignore_iof_thr=-1),
],
allowed_border=0,
pos_weight=-1,
debug=False),
test_cfg=dict(
use_rotate_nms=True,
nms_across_levels=False,
nms_thr=0.01,
score_thr=0.1,
min_bbox_size=0,
nms_pre=100,
max_num=50))
```
#### 准备整体配置
我们将上述的所有配置组合在 `configs/pointpillars/pointpillars_hv_secfpn_8xb6_custom.py` 文件中:
```python
_base_ = [
'../_base_/models/pointpillars_hv_secfpn_custom.py',
'../_base_/datasets/custom.py',
'../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
]
```
#### 可视化数据集(可选)
为了验证准备的数据和配置是否正确,我们建议在训练和验证前使用 `tools/misc/browse_dataset.py` 脚本可视化数据集和标注。更多细节请参考[可视化文档](https://mmdetection3d.readthedocs.io/zh_CN/dev-1.x/user_guides/visualization.html)
## 评估
准备好数据和配置之后,您可以遵循我们的文档直接运行训练/测试脚本。
**注意**:我们为自定义数据集提供了 KITTI 风格的评估实现方法。在数据集配置中需要包含如下内容:
```python
val_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'custom_infos_val.pkl', # 指定您的验证 pkl 信息
metric='bbox')
```
# 自定义模型
我们通常把模型的各个组成成分分成 6 种类型:
- 编码器(encoder):包括 voxel encoder 和 middle encoder 等进入 backbone 前所使用的基于体素的方法,如 `HardVFE``PointPillarsScatter`
- 骨干网络(backbone):通常采用 FCN 网络来提取特征图,如 `ResNet``SECOND`
- 颈部网络(neck):位于 backbones 和 heads 之间的组成模块,如 `FPN``SECONDFPN`
- 检测头(head):用于特定任务的组成模块,如`检测框的预测``掩码的预测`
- RoI 提取器(RoI extractor):用于从特征图中提取 RoI 特征的组成模块,如 `H3DRoIHead``PartAggregationROIHead`
- 损失函数(loss):heads 中用于计算损失函数的组成模块,如 `FocalLoss``L1Loss``GHMLoss`
## 开发新的组成模块
### 添加新的编码器
接下来我们以 HardVFE 为例展示如何开发新的组成模块。
#### 1. 定义一个新的体素编码器(如 HardVFE:即 HV-SECOND 中使用的体素特征编码器)
创建一个新文件 `mmdet3d/models/voxel_encoders/voxel_encoder.py`
```python
import torch.nn as nn
from mmdet3d.registry import MODELS
@MODELS.register_module()
class HardVFE(nn.Module):
def __init__(self, arg1, arg2):
pass
def forward(self, x): # 需要返回一个元组
pass
```
#### 2. 导入该模块
您可以在 `mmdet3d/models/voxel_encoders/__init__.py` 中添加以下代码:
```python
from .voxel_encoder import HardVFE
```
或者在配置文件中添加以下代码,从而避免修改源码:
```python
custom_imports = dict(
imports=['mmdet3d.models.voxel_encoders.voxel_encoder'],
allow_failed_imports=False)
```
#### 3. 在配置文件中使用体素编码器
```python
model = dict(
...
voxel_encoder=dict(
type='HardVFE',
arg1=xxx,
arg2=yyy),
...
)
```
### 添加新的骨干网络
接下来我们以 [SECOND](https://www.mdpi.com/1424-8220/18/10/3337)(Sparsely Embedded Convolutional Detection)为例展示如何开发新的组成模块。
#### 1. 定义一个新的骨干网络(如 SECOND)
创建一个新文件 `mmdet3d/models/backbones/second.py`
```python
from mmengine.model import BaseModule
from mmdet3d.registry import MODELS
@MODELS.register_module()
class SECOND(BaseModule):
def __init__(self, arg1, arg2):
pass
def forward(self, x): # 需要返回一个元组
pass
```
#### 2. 导入该模块
您可以在 `mmdet3d/models/backbones/__init__.py` 中添加以下代码:
```python
from .second import SECOND
```
或者在配置文件中添加以下代码,从而避免修改源码:
```python
custom_imports = dict(
imports=['mmdet3d.models.backbones.second'],
allow_failed_imports=False)
```
#### 3. 在配置文件中使用骨干网络
```python
model = dict(
...
backbone=dict(
type='SECOND',
arg1=xxx,
arg2=yyy),
...
)
```
### 添加新的颈部网络
#### 1. 定义一个新的颈部网络(如 SECONDFPN)
创建一个新文件 `mmdet3d/models/necks/second_fpn.py`
```python
from mmengine.model import BaseModule
from mmdet3d.registry import MODELS
@MODELS.register_module()
class SECONDFPN(BaseModule):
def __init__(self,
in_channels=[128, 128, 256],
out_channels=[256, 256, 256],
upsample_strides=[1, 2, 4],
norm_cfg=dict(type='BN', eps=1e-3, momentum=0.01),
upsample_cfg=dict(type='deconv', bias=False),
conv_cfg=dict(type='Conv2d', bias=False),
use_conv_for_no_stride=False,
init_cfg=None):
pass
def forward(self, x):
# 具体实现忽略
pass
```
#### 2. 导入该模块
您可以在 `mmdet3d/models/necks/__init__.py` 中添加以下代码:
```python
from .second_fpn import SECONDFPN
```
或者在配置文件中添加以下代码,从而避免修改源码:
```python
custom_imports = dict(
imports=['mmdet3d.models.necks.second_fpn'],
allow_failed_imports=False)
```
#### 3. 在配置文件中使用颈部网络
```python
model = dict(
...
neck=dict(
type='SECONDFPN',
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
...
)
```
### 添加新的检测头
接下来我们以 [PartA2 Head](https://arxiv.org/abs/1907.03670) 为例展示如何开发新的检测头。
**注意**:此处展示的 `PartA2 RoI Head` 将用于检测器的第二阶段。对于单阶段的检测头,请参考 `mmdet3d/models/dense_heads/` 中的例子。由于其简单高效,它们更常用于自动驾驶场景下的 3D 检测中。
首先,在 `mmdet3d/models/roi_heads/bbox_heads/parta2_bbox_head.py` 中添加新的 bbox head。`PartA2 RoI Head` 为目标检测实现了一个新的 bbox head。为了实现一个 bbox head,我们通常需要在新模块中实现如下两个函数。有时还需要实现其他相关函数,如 `loss``get_targets`
```python
from mmengine.model import BaseModule
from mmdet3d.registry import MODELS
@MODELS.register_module()
class PartA2BboxHead(BaseModule):
"""PartA2 RoI head."""
def __init__(self,
num_classes,
seg_in_channels,
part_in_channels,
seg_conv_channels=None,
part_conv_channels=None,
merge_conv_channels=None,
down_conv_channels=None,
shared_fc_channels=None,
cls_channels=None,
reg_channels=None,
dropout_ratio=0.1,
roi_feat_size=14,
with_corner_loss=True,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.01),
loss_bbox=dict(
type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=True,
reduction='none',
loss_weight=1.0),
init_cfg=None):
super(PartA2BboxHead, self).__init__(init_cfg=init_cfg)
def forward(self, seg_feats, part_feats):
pass
```
其次,如果有必要的话需要实现一个新的 RoI Head。我们从 `Base3DRoIHead` 中继承得到新的 `PartAggregationROIHead`。我们可以发现 `Base3DRoIHead` 已经实现了如下函数。
```python
from mmdet.models.roi_heads import BaseRoIHead
from mmdet3d.registry import MODELS, TASK_UTILS
class Base3DRoIHead(BaseRoIHead):
"""Base class for 3d RoIHeads."""
def __init__(self,
bbox_head=None,
bbox_roi_extractor=None,
mask_head=None,
mask_roi_extractor=None,
train_cfg=None,
test_cfg=None,
init_cfg=None):
super(Base3DRoIHead, self).__init__(
bbox_head=bbox_head,
bbox_roi_extractor=bbox_roi_extractor,
mask_head=mask_head,
mask_roi_extractor=mask_roi_extractor,
train_cfg=train_cfg,
test_cfg=test_cfg,
init_cfg=init_cfg)
def init_bbox_head(self, bbox_roi_extractor: dict,
bbox_head: dict) -> None:
"""Initialize box head and box roi extractor.
Args:
bbox_roi_extractor (dict or ConfigDict): Config of box
roi extractor.
bbox_head (dict or ConfigDict): Config of box in box head.
"""
self.bbox_roi_extractor = MODELS.build(bbox_roi_extractor)
self.bbox_head = MODELS.build(bbox_head)
def init_assigner_sampler(self):
"""Initialize assigner and sampler."""
self.bbox_assigner = None
self.bbox_sampler = None
if self.train_cfg:
if isinstance(self.train_cfg.assigner, dict):
self.bbox_assigner = TASK_UTILS.build(self.train_cfg.assigner)
elif isinstance(self.train_cfg.assigner, list):
self.bbox_assigner = [
TASK_UTILS.build(res) for res in self.train_cfg.assigner
]
self.bbox_sampler = TASK_UTILS.build(self.train_cfg.sampler)
def init_mask_head(self):
"""Initialize mask head, skip since ``PartAggregationROIHead`` does not
have one."""
pass
```
接下来主要对 bbox_forward 的逻辑进行修改,同时其继承了来自 `Base3DRoIHead` 的其它逻辑。在 `mmdet3d/models/roi_heads/part_aggregation_roi_head.py` 中,我们实现了新的 RoI Head,如下所示:
```python
from typing import Dict, List, Tuple
from mmdet.models.task_modules import AssignResult, SamplingResult
from mmengine import ConfigDict
from torch import Tensor
from torch.nn import functional as F
from mmdet3d.registry import MODELS
from mmdet3d.structures import bbox3d2roi
from mmdet3d.utils import InstanceList
from ...structures.det3d_data_sample import SampleList
from .base_3droi_head import Base3DRoIHead
@MODELS.register_module()
class PartAggregationROIHead(Base3DRoIHead):
"""Part aggregation roi head for PartA2.
Args:
semantic_head (ConfigDict): Config of semantic head.
num_classes (int): The number of classes.
seg_roi_extractor (ConfigDict): Config of seg_roi_extractor.
bbox_roi_extractor (ConfigDict): Config of part_roi_extractor.
bbox_head (ConfigDict): Config of bbox_head.
train_cfg (ConfigDict): Training config.
test_cfg (ConfigDict): Testing config.
"""
def __init__(self,
semantic_head: dict,
num_classes: int = 3,
seg_roi_extractor: dict = None,
bbox_head: dict = None,
bbox_roi_extractor: dict = None,
train_cfg: dict = None,
test_cfg: dict = None,
init_cfg: dict = None) -> None:
super(PartAggregationROIHead, self).__init__(
bbox_head=bbox_head,
bbox_roi_extractor=bbox_roi_extractor,
train_cfg=train_cfg,
test_cfg=test_cfg,
init_cfg=init_cfg)
self.num_classes = num_classes
assert semantic_head is not None
self.init_seg_head(seg_roi_extractor, semantic_head)
def init_seg_head(self, seg_roi_extractor: dict,
semantic_head: dict) -> None:
"""Initialize semantic head and seg roi extractor.
Args:
seg_roi_extractor (dict): Config of seg
roi extractor.
semantic_head (dict): Config of semantic head.
"""
self.semantic_head = MODELS.build(semantic_head)
self.seg_roi_extractor = MODELS.build(seg_roi_extractor)
@property
def with_semantic(self):
"""bool: whether the head has semantic branch"""
return hasattr(self,
'semantic_head') and self.semantic_head is not None
def predict(self,
feats_dict: Dict,
rpn_results_list: InstanceList,
batch_data_samples: SampleList,
rescale: bool = False,
**kwargs) -> InstanceList:
"""Perform forward propagation of the roi head and predict detection
results on the features of the upstream network.
Args:
feats_dict (dict): Contains features from the first stage.
rpn_results_list (List[:obj:`InstanceData`]): Detection results
of rpn head.
batch_data_samples (List[:obj:`Det3DDataSample`]): The Data
samples. It usually includes information such as
`gt_instance_3d`, `gt_panoptic_seg_3d` and `gt_sem_seg_3d`.
rescale (bool): If True, return boxes in original image space.
Defaults to False.
Returns:
list[:obj:`InstanceData`]: Detection results of each sample
after the post process.
Each item usually contains following keys.
- scores_3d (Tensor): Classification scores, has a shape
(num_instances, )
- labels_3d (Tensor): Labels of bboxes, has a shape
(num_instances, ).
- bboxes_3d (BaseInstance3DBoxes): Prediction of bboxes,
contains a tensor with shape (num_instances, C), where
C >= 7.
"""
assert self.with_bbox, 'Bbox head must be implemented in PartA2.'
assert self.with_semantic, 'Semantic head must be implemented' \
' in PartA2.'
batch_input_metas = [
data_samples.metainfo for data_samples in batch_data_samples
]
voxels_dict = feats_dict.pop('voxels_dict')
# TODO: Split predict semantic and bbox
results_list = self.predict_bbox(feats_dict, voxels_dict,
batch_input_metas, rpn_results_list,
self.test_cfg)
return results_list
def predict_bbox(self, feats_dict: Dict, voxel_dict: Dict,
batch_input_metas: List[dict],
rpn_results_list: InstanceList,
test_cfg: ConfigDict) -> InstanceList:
"""Perform forward propagation of the bbox head and predict detection
results on the features of the upstream network.
Args:
feats_dict (dict): Contains features from the first stage.
voxel_dict (dict): Contains information of voxels.
batch_input_metas (list[dict], Optional): Batch image meta info.
Defaults to None.
rpn_results_list (List[:obj:`InstanceData`]): Detection results
of rpn head.
test_cfg (Config): Test config.
Returns:
list[:obj:`InstanceData`]: Detection results of each sample
after the post process.
Each item usually contains following keys.
- scores_3d (Tensor): Classification scores, has a shape
(num_instances, )
- labels_3d (Tensor): Labels of bboxes, has a shape
(num_instances, ).
- bboxes_3d (BaseInstance3DBoxes): Prediction of bboxes,
contains a tensor with shape (num_instances, C), where
C >= 7.
"""
...
def loss(self, feats_dict: Dict, rpn_results_list: InstanceList,
batch_data_samples: SampleList, **kwargs) -> dict:
"""Perform forward propagation and loss calculation of the detection
roi on the features of the upstream network.
Args:
feats_dict (dict): Contains features from the first stage.
rpn_results_list (List[:obj:`InstanceData`]): Detection results
of rpn head.
batch_data_samples (List[:obj:`Det3DDataSample`]): The Data
samples. It usually includes information such as
`gt_instance_3d`, `gt_panoptic_seg_3d` and `gt_sem_seg_3d`.
Returns:
dict[str, Tensor]: A dictionary of loss components
"""
assert len(rpn_results_list) == len(batch_data_samples)
losses = dict()
batch_gt_instances_3d = []
batch_gt_instances_ignore = []
voxels_dict = feats_dict.pop('voxels_dict')
for data_sample in batch_data_samples:
batch_gt_instances_3d.append(data_sample.gt_instances_3d)
if 'ignored_instances' in data_sample:
batch_gt_instances_ignore.append(data_sample.ignored_instances)
else:
batch_gt_instances_ignore.append(None)
if self.with_semantic:
semantic_results = self._semantic_forward_train(
feats_dict, voxels_dict, batch_gt_instances_3d)
losses.update(semantic_results.pop('loss_semantic'))
sample_results = self._assign_and_sample(rpn_results_list,
batch_gt_instances_3d)
if self.with_bbox:
feats_dict.update(semantic_results)
bbox_results = self._bbox_forward_train(feats_dict, voxels_dict,
sample_results)
losses.update(bbox_results['loss_bbox'])
return losses
```
此处我们省略了相关函数的更多细节。更多细节请参考[代码](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/models/roi_heads/part_aggregation_roi_head.py)
最后,用户需要在 `mmdet3d/models/roi_heads/bbox_heads/__init__.py``mmdet3d/models/roi_heads/__init__.py` 添加模块,从而能被相应的注册器找到并加载。
此外,用户也可以在配置文件中添加以下代码以达到相同的目的。
```python
custom_imports=dict(
imports=['mmdet3d.models.roi_heads.part_aggregation_roi_head', 'mmdet3d.models.roi_heads.bbox_heads.parta2_bbox_head'],
allow_failed_imports=False)
```
`PartAggregationROIHead` 的配置文件如下所示:
```python
model = dict(
...
roi_head=dict(
type='PartAggregationROIHead',
num_classes=3,
semantic_head=dict(
type='PointwiseSemanticHead',
in_channels=16,
extra_width=0.2,
seg_score_thr=0.3,
num_classes=3,
loss_seg=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
reduction='sum',
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_part=dict(
type='mmdet.CrossEntropyLoss',
use_sigmoid=True,
loss_weight=1.0)),
seg_roi_extractor=dict(
type='Single3DRoIAwareExtractor',
roi_layer=dict(
type='RoIAwarePool3d',
out_size=14,
max_pts_per_voxel=128,
mode='max')),
bbox_roi_extractor=dict(
type='Single3DRoIAwareExtractor',
roi_layer=dict(
type='RoIAwarePool3d',
out_size=14,
max_pts_per_voxel=128,
mode='avg')),
bbox_head=dict(
type='PartA2BboxHead',
num_classes=3,
seg_in_channels=16,
part_in_channels=4,
seg_conv_channels=[64, 64],
part_conv_channels=[64, 64],
merge_conv_channels=[128, 128],
down_conv_channels=[128, 256],
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
shared_fc_channels=[256, 512, 512, 512],
cls_channels=[256, 256],
reg_channels=[256, 256],
dropout_ratio=0.1,
roi_feat_size=14,
with_corner_loss=True,
loss_bbox=dict(
type='mmdet.SmoothL1Loss',
beta=1.0 / 9.0,
reduction='sum',
loss_weight=1.0),
loss_cls=dict(
type='mmdet.CrossEntropyLoss',
use_sigmoid=True,
reduction='sum',
loss_weight=1.0))),
...
)
```
MMDetection 2.0 开始支持配置文件之间的继承,因此用户可以关注配置文件的修改。PartA2 Head 的第二阶段主要使用了新的 `PartAggregationROIHead``PartA2BboxHead`,需要根据对应模块的 `__init__` 函数来设置参数。
### 添加新的损失函数
假设您想要为检测框的回归添加一个新的损失函数 `MyLoss`。为了添加一个新的损失函数,用户需要在 `mmdet3d/models/losses/my_loss.py` 中实现该函数。装饰器 `weighted_loss` 能够保证对每个元素的损失进行加权平均。
```python
import torch
import torch.nn as nn
from mmdet.models.losses.utils import weighted_loss
from mmdet3d.registry import MODELS
@weighted_loss
def my_loss(pred, target):
assert pred.size() == target.size() and target.numel() > 0
loss = torch.abs(pred - target)
return loss
@MODELS.register_module()
class MyLoss(nn.Module):
def __init__(self, reduction='mean', loss_weight=1.0):
super(MyLoss, self).__init__()
self.reduction = reduction
self.loss_weight = loss_weight
def forward(self,
pred,
target,
weight=None,
avg_factor=None,
reduction_override=None):
assert reduction_override in (None, 'none', 'mean', 'sum')
reduction = (
reduction_override if reduction_override else self.reduction)
loss_bbox = self.loss_weight * my_loss(
pred, target, weight, reduction=reduction, avg_factor=avg_factor)
return loss_bbox
```
接下来,用户需要在 `mmdet3d/models/losses/__init__.py` 添加该函数。
```python
from .my_loss import MyLoss, my_loss
```
或者在配置文件中添加以下代码以达到相同的目的。
```python
custom_imports=dict(
imports=['mmdet3d.models.losses.my_loss'],
allow_failed_imports=False)
```
为了使用该函数,用户需要修改 `loss_xxx` 域。由于 `MyLoss` 是用于回归的,您需要修改 head 中的 `loss_bbox` 域。
```python
loss_bbox=dict(type='MyLoss', loss_weight=1.0)
```
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment