Commit eb1107e4 authored by raojy's avatar raojy
Browse files

fix_mmdetection

parent 7aa442d5
Pipeline #3461 canceled with stages
# Inference
## Introduction
We provide scripts for multi-modality/single-modality (LiDAR-based/vision-based), indoor/outdoor 3D detection and 3D semantic segmentation demos. The pre-trained models can be downloaded from [model zoo](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/en/model_zoo.md). We provide pre-processed sample data from KITTI, SUN RGB-D, nuScenes and ScanNet dataset. You can use any other data following our pre-processing steps.
## Testing
### 3D Detection
#### Point cloud demo
To test a 3D detector on point cloud data, simply run:
```shell
python demo/pcd_demo.py ${PCD_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}] [--show]
```
The visualization results including a point cloud and predicted 3D bounding boxes will be saved in `${OUT_DIR}/PCD_NAME`, which you can open using [MeshLab](http://www.meshlab.net/). Note that if you set the flag `--show`, the prediction result will be displayed online using [Open3D](http://www.open3d.org/).
Example on KITTI data using [PointPillars model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth):
```shell
python demo/pcd_demo.py demo/data/kitti/000008.bin configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py ${CHECKPOINT_FILE} --show
```
Example on SUN RGB-D data using [VoteNet model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/votenet/votenet_16x8_sunrgbd-3d-10class/votenet_16x8_sunrgbd-3d-10class_20210820_162823-bf11f014.pth):
```shell
python demo/pcd_demo.py demo/data/sunrgbd/sunrgbd_000017.bin configs/votenet/votenet_8xb16_sunrgbd-3d.py ${CHECKPOINT_FILE} --show
```
#### Monocular 3D demo
To test a monocular 3D detector on image data, simply run:
```shell
python demo/mono_det_demo.py ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--cam-type ${CAM_TYPE}] [--score-thr ${SCORE-THR}] [--out-dir ${OUT_DIR}] [--show]
```
where the `ANNOTATION_FILE` should provide the 3D to 2D projection matrix (camera intrinsic matrix), and `CAM_TYPE` should be specified according to dataset. For example, if you want to inference on the front camera image, the `CAM_TYPE` should be set as `CAM_2` for KITTI, and `CAM_FRONT` for nuScenes. By specifying `CAM_TYPE`, you can even infer on any camera images for datasets with multi-view cameras, such as nuScenes and Waymo. `SCORE-THR` is the 3D bbox threshold while visualization. The visualization results including an image and its predicted 3D bounding boxes projected on the image will be saved in `${OUT_DIR}/IMG_NAME`.
Example on KITTI data using [PGD model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pgd/pgd_r101_caffe_fpn_gn-head_3x4_4x_kitti-mono3d/pgd_r101_caffe_fpn_gn-head_3x4_4x_kitti-mono3d_20211022_102608-8a97533b.pth):
```shell
python demo/mono_det_demo.py demo/data/kitti/000008.png demo/data/kitti/000008.pkl configs/pgd/pgd_r101-caffe_fpn_head-gn_4xb3-4x_kitti-mono3d.py ${CHECKPOINT_FILE} --show --cam-type CAM2 --score-thr 8
```
**Note**: For PGD, the prediction score is not among (0, 1).
Example on nuScenes data using [FCOS3D model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune_20210717_095645-8d806dc2.pth):
```shell
python demo/mono_det_demo.py demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525.jpg demo/data/nuscenes/n015-2018-07-24-11-22-45+0800.pkl configs/fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d_finetune.py ${CHECKPOINT_FILE} --show --cam-type CAM_BACK
```
**Note** that when visualizing results of monocular 3D detection for flipped images, the camera intrinsic matrix should also be modified accordingly. See more details and examples in PR [#744](https://github.com/open-mmlab/mmdetection3d/pull/744).
#### Multi-modality demo
To test a 3D detector on multi-modality data (typically point cloud and image), simply run:
```shell
python demo/multi_modality_demo.py ${PCD_FILE} ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}] [--show]
```
where the `ANNOTATION_FILE` should provide the 3D to 2D projection matrix. The visualization results including a point cloud, an image, predicted 3D bounding boxes and their projection on the image will be saved in `${OUT_DIR}/PCD_NAME`.
Example on KITTI data using [MVX-Net model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/mvxnet/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class-8963258a.pth):
```shell
python demo/multi_modality_demo.py demo/data/kitti/000008.bin demo/data/kitti/000008.png demo/data/kitti/000008.pkl configs/mvxnet/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class.py ${CHECKPOINT_FILE} --cam-type CAM2 --show
```
Example on SUN RGB-D data using [ImVoteNet model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvotenet/imvotenet_stage2_16x8_sunrgbd-3d-10class/imvotenet_stage2_16x8_sunrgbd-3d-10class_20210819_192851-1bcd1b97.pth):
```shell
python demo/multi_modality_demo.py demo/data/sunrgbd/000017.bin demo/data/sunrgbd/000017.jpg demo/data/sunrgbd/sunrgbd_000017_infos.pkl configs/imvotenet/imvotenet_stage2_8xb16_sunrgbd-3d.py ${CHECKPOINT_FILE} --cam-type CAM0 --show --score-thr 0.6
```
Example on NuScenes data using [BEVFusion model](https://drive.google.com/file/d/1QkvbYDk4G2d6SZoeJqish13qSyXA4lp3/view?usp=share_link):
```shell
python demo/multi_modality_demo.py demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__LIDAR_TOP__1532402927647951.pcd.bin demo/data/nuscenes/ demo/data/nuscenes/n015-2018-07-24-11-22-45+0800.pkl projects/BEVFusion/configs/bevfusion_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py ${CHECKPOINT_FILE} --cam-type all --score-thr 0.2 --show
```
### 3D Segmentation
To test a 3D segmentor on point cloud data, simply run:
```shell
python demo/pcd_seg_demo.py ${PCD_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--out-dir ${OUT_DIR}] [--show]
```
The visualization results including a point cloud and its predicted 3D segmentation mask will be saved in `${OUT_DIR}/PCD_NAME`.
Example on ScanNet data using [PointNet++ (SSG) model](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/pointnet2/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class_20210514_143644-ee73704a.pth):
```shell
python demo/pcd_seg_demo.py demo/data/scannet/scene0000_00.bin configs/pointnet2/pointnet2_ssg_2xb16-cosine-200e_scannet-seg.py ${CHECKPOINT_FILE} --show
```
# Model Deployment
MMDet3D 1.1 fully relies on [MMDeploy](https://mmdeploy.readthedocs.io/) to deploy models.
Please stay tuned and this document will be update soon.
# Train with Customized Datasets
In this note, you will know how to train and test predefined models with customized datasets. We use the Waymo dataset as an example to describe the whole process.
The basic steps are as below:
1. Prepare the customized dataset
2. Prepare a config
3. Train, test, inference models on the customized dataset.
## Prepare the customized dataset
There are three ways to support a new dataset in MMDetection3D:
1. reorganize the dataset into existing format.
2. reorganize the dataset into a standard format.
3. implement a new dataset.
Usually we recommend to use the first two methods which are usually easier than the third.
In this note, we give an example for converting the data into KITTI format, you can refer to this to reorganize your dataset into kitti format. About the standard format dataset, and you can refer to [customize_dataset.md](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/en/advanced_guides/customize_dataset.md).
**Note**: We take Waymo as the example here considering its format is totally different from other existing formats. For other datasets using similar methods to organize data, like Lyft compared to nuScenes, it would be easier to directly implement the new data converter (for the second approach above) instead of converting it to another format (for the first approach above).
### KITTI dataset format
Firstly, the raw data for 3D object detection from KITTI are typically organized as follows, where `ImageSets` contains split files indicating which files belong to training/validation/testing set, `calib` contains calibration information files, `image_2` and `velodyne` include image data and point cloud data, and `label_2` includes label files for 3D detection.
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── kitti
│ │ ├── ImageSets
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
```
Specific annotation format is described in the official object development [kit](https://s3.eu-central-1.amazonaws.com/avg-kitti/devkit_object.zip). For example, it consists of the following labels:
```
#Values Name Description
----------------------------------------------------------------------------
1 type Describes the type of object: 'Car', 'Van', 'Truck',
'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
'Misc' or 'DontCare'
1 truncated Float from 0 (non-truncated) to 1 (truncated), where
truncated refers to the object leaving image boundaries
1 occluded Integer (0,1,2,3) indicating occlusion state:
0 = fully visible, 1 = partly occluded
2 = largely occluded, 3 = unknown
1 alpha Observation angle of object, ranging [-pi..pi]
4 bbox 2D bounding box of object in the image (0-based index):
contains left, top, right, bottom pixel coordinates
3 dimensions 3D object dimensions: height, width, length (in meters)
3 location 3D object location x,y,z in camera coordinates (in meters)
1 rotation_y Rotation ry around Y-axis in camera coordinates [-pi..pi]
1 score Only for results: Float, indicating confidence in
detection, needed for p/r curves, higher is better.
```
Assume we use the Waymo dataset.
After downloading the data, we need to implement a function to convert both the input data and annotation format into the KITTI style. Then we can implement `WaymoDataset` inherited from `KittiDataset` to load the data and perform training, and implement `WaymoMetric` inherited from `KittiMetric` for evaluation.
Specifically, we implement a waymo [converter](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/tools/dataset_converters/waymo_converter.py) to convert Waymo data into KITTI format and a waymo dataset [class](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/datasets/waymo_dataset.py) to process it, in addition need to add a waymo [metric](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/metrics/waymo_metric.py) to evaluate results. Because we preprocess the raw data and reorganize it like KITTI, the dataset class could be implemented more easily by inheriting from KittiDataset. Regarding the dataset evaluation metric, because Waymo has its own evaluation approach, we need further implement a new Waymo metric; more about the metric could refer to [metric_and_evaluator.md](https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/metric_and_evaluator.md). Afterward, users can successfully convert the data format and use `WaymoDataset` to train and evaluate the model by `WaymoMetric`.
For more details about the intermediate results of preprocessing of Waymo dataset, please refer to its [waymo_det.md](https://mmdetection3d.readthedocs.io/en/latest/datasets/waymo_det.html).
## Prepare a config
The second step is to prepare configs such that the dataset could be successfully loaded. In addition, adjusting hyperparameters is usually necessary to obtain decent performance in 3D detection.
Suppose we would like to train PointPillars on Waymo to achieve 3D detection for 3 classes, vehicle, cyclist and pedestrian, we need to prepare dataset config like [this](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/datasets/waymoD5-3d-3class.py), model config like [this](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/models/pointpillars_hv_secfpn_waymo.py) and combine them like [this](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py), compared to KITTI [dataset config](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/datasets/kitti-3d-3class.py), [model config](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/models/pointpillars_hv_secfpn_kitti.py) and [overall](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py).
## Train a new model
To train a model with the new config, you can simply run
```shell
python tools/train.py configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py
```
For more detailed usages, please refer to the [Case 1](https://mmdetection3d.readthedocs.io/en/latest/1_exist_data_model.html).
## Test and inference
To test the trained model, you can simply run
```shell
python tools/test.py configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py work_dirs/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class/latest.pth
```
**Note**: To use Waymo evaluation protocol, you need to follow the [tutorial](https://mmdetection3d.readthedocs.io/en/latest/datasets/waymo_det.html) and prepare files related to metrics computation as official instructions.
For more detailed usages for test and inference, please refer to the [Case 1](https://mmdetection3d.readthedocs.io/en/latest/1_exist_data_model.html).
# Test and Train on Standard Datasets
### Test existing models on standard datasets
- single GPU
- CPU
- single node multiple GPU
- multiple node
You can use the following commands to test a dataset.
```shell
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}] [--show] [--show-dir ${SHOW_DIR}]
# CPU: disable GPUs and run single-gpu testing script (experimental)
export CUDA_VISIBLE_DEVICES=-1
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}] [--show] [--show-dir ${SHOW_DIR}]
# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}] [--show] [--show-dir ${SHOW_DIR}]
```
**Note**:
For now, CPU testing is only supported for SMOKE.
Optional arguments:
- `--show`: If specified, detection results will be plotted in the silient mode. It is only applicable to single GPU testing and used for debugging and visualization. This should be used with `--show-dir`.
- `--show-dir`: If specified, detection results will be plotted on the `***_points.obj` and `***_pred.obj` files in the specified directory. It is only applicable to single GPU testing and used for debugging and visualization. You do NOT need a GUI available in your environment for using this option.
All evaluation related arguments are set in the `test_evaluator` in corresponding dataset configuration. such as
`test_evaluator = dict(type='KittiMetric', ann_file=data_root + 'kitti_infos_val.pkl', pklfile_prefix=None, submission_prefix=None)`
The arguments:
- `type`: The name of the corresponding metric, usually associated with the dataset.
- `ann_file`: The path of annotation file.
- `pklfile_prefix`: An optional argument. The filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- `submission_prefix`: An optional argument. The results will be saved to a file then you can upload it to do the official evaluation.
Examples:
Assume that you have already downloaded the checkpoints to the directory `checkpoints/`.
1. Test VoteNet on ScanNet and save the points and prediction visualization results.
```shell
python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \
--show --show-dir ./data/scannet/show_results
```
2. Test VoteNet on ScanNet, save the points, prediction, groundtruth visualization results, and evaluate the mAP.
```shell
python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \
--show --show-dir ./data/scannet/show_results
```
3. Test VoteNet on ScanNet (without saving the test results) and evaluate the mAP.
```shell
python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth
```
4. Test SECOND on KITTI with 8 GPUs, and evaluate the mAP.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class.py \
checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth
```
5. Test PointPillars on nuScenes with 8 GPUs, and generate the json file to be submit to the official evaluation server.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_8xb4-2x_nus-3d.py \
checkpoints/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d_20200620_230405-2fa62f3d.pth \
--cfg-options 'test_evaluator.jsonfile_prefix=./pointpillars_nuscenes_results'
```
The generated results be under `./pointpillars_nuscenes_results` directory.
6. Test SECOND on KITTI with 8 GPUs, and generate the pkl files and submission data to be submit to the official evaluation server.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class.py \
checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth \
--cfg-options 'test_evaluator.pklfile_prefix=./second_kitti_results' 'submission_prefix=./second_kitti_results'
```
The generated results be under `./second_kitti_results` directory.
7. Test PointPillars on Lyft with 8 GPUs, generate the pkl files and make a submission to the leaderboard.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d.py \
checkpoints/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d_latest.pth \
--cfg-options 'test_evaluator.jsonfile_prefix=results/pp_lyft/results_challenge' \
'test_evaluator.csv_savepath=results/pp_lyft/results_challenge.csv' \
'test_evaluator.pklfile_prefix=results/pp_lyft/results_challenge.pkl'
```
**Notice**: To generate submissions on Lyft, `csv_savepath` must be given in the `--cfg-options`. After generating the csv file, you can make a submission with kaggle commands given on the [website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/submit).
Note that in the [config of Lyft dataset](../../configs/_base_/datasets/lyft-3d.py), the value of `ann_file` keyword in `test` is `'lyft_infos_test.pkl'`, which is the official test set of Lyft without annotation. To test on the validation set, please change this to `'lyft_infos_val.pkl'`.
8. Test PointPillars on waymo with 8 GPUs, and evaluate the mAP with waymo metrics.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymo-3d-car.py \
checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth \
--cfg-options 'test_evaluator.pklfile_prefix=results/waymo-car/kitti_results' \
'test_evaluator.submission_prefix=results/waymo-car/kitti_results'
```
**Notice**: For evaluation on waymo, please follow the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`.(Sometimes when using bazel to build `compute_detection_metrics_main`, an error `'round' is not a member of 'std'` may appear. We just need to remove the `std::` before `round` in that file.) `pklfile_prefix` should be given in the `--eval-options` for the bin file generation. For metrics, `waymo` is the recommended official evaluation prototype. Currently, evaluating with choice `kitti` is adapted from KITTI and the results for each difficulty are not exactly the same as the definition of KITTI. Instead, most of objects are marked with difficulty 0 currently, which will be fixed in the future. The reasons of its instability include the large computation for evaluation, the lack of occlusion and truncation in the converted data, different definition of difficulty and different methods of computing average precision.
9. Test PointPillars on waymo with 8 GPUs, generate the bin files and make a submission to the leaderboard.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymo-3d-car.py \
checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth \
--cfg-options 'test_evaluator.pklfile_prefix=results/waymo-car/kitti_results' \
'test_evaluator.submission_prefix=results/waymo-car/kitti_results'
```
**Notice**: After generating the bin file, you can simply build the binary file `create_submission` and use them to create a submission file by following the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/). For evaluation on the validation set with the eval server, you can also use the same way to generate a submission.
## Train predefined models on standard datasets
MMDetection3D implements distributed training and non-distributed training,
which uses `MMDistributedDataParallel` and `MMDataParallel` respectively.
All outputs (log files and checkpoints) will be saved to the working directory,
which is specified by `work_dir` in the config file.
By default we evaluate the model on the validation set after each epoch, you can change the evaluation interval by adding the interval argument in the training config.
```python
train_cfg = dict(type='EpochBasedTrainLoop', val_interval=1) # This evaluate the model per 12 epoch.
```
**Important**: The default learning rate in config files is for 8 GPUs and the exact batch size is marked by the config's file name, e.g. '2xb8' means 2 samples per GPU using 8 GPUs.
According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. However, since most of the models in this repo use ADAM rather than SGD for optimization, the rule may not hold and users need to tune the learning rate by themselves.
### Train with a single GPU
```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
If you want to specify the working directory in the command, you can add an argument `--work-dir ${YOUR_WORK_DIR}`.
### Training with CPU (experimental)
The process of training on the CPU is consistent with single GPU training. We just need to disable GPUs before the training process.
```shell
export CUDA_VISIBLE_DEVICES=-1
```
And then run the script of train with a single GPU.
**Note**:
For now, most of the point cloud related algorithms rely on 3D CUDA op, which can not be trained on CPU. Some monocular 3D object detection algorithms, like FCOS3D and SMOKE can be trained on CPU. We do not recommend users to use CPU for training because it is too slow. We support this feature to allow users to debug certain models on machines without GPU for convenience.
### Train with multiple GPUs
```shell
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
```
Optional arguments are:
- `--cfg-options 'Key=value'`: Override some settings in the used config.
### Train with multiple machines
If you run MMDetection3D on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single machine training.)
```shell
[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
```
Here is an example of using 16 GPUs to train Mask R-CNN on the dev partition.
```shell
GPUS=16 ./tools/slurm_train.sh dev pp_kitti_3class configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py /nfs/xxxx/pp_kitti_3class
```
You can check [slurm_train.sh](https://github.com/open-mmlab/mmdetection/blob/master/tools/slurm_train.sh) for full arguments and environment variables.
If you launch with multiple machines simply connected with ethernet, you can simply run following commands:
On the first machine:
```shell
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR ./tools/dist_train.sh $CONFIG $GPUS
```
On the second machine:
```shell
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR ./tools/dist_train.sh $CONFIG $GPUS
```
Usually it is slow if you do not have high speed networking like InfiniBand.
### Launch multiple jobs on a single machine
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
you need to specify different ports (29500 by default) for each job to avoid communication conflict.
If you use `dist_train.sh` to launch training jobs, you can set the port in commands.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
```
If you use launch training jobs with Slurm, there are two ways to specify the ports.
1. Set the port through `--cfg-options`. This is more recommended since it does not change the original configs.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --cfg-options 'env_cfg.dist_cfg.port=29500'
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --cfg-options 'env_cfg.dist_cfg.port=29501'
```
2. Modify the config files (usually the 6th line from the bottom in config files) to set different communication ports.
In `config1.py`,
```python
env_cfg = dict(
dist_cfg=dict(backend='nccl', port=29500)
)
```
In `config2.py`,
```python
env_cfg = dict(
dist_cfg=dict(backend='nccl', port=29501)
)
```
Then you can launch two jobs with `config1.py` and `config2.py`.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
```
We provide lots of useful tools under `tools/` directory.
## Log Analysis
You can plot loss/mAP curves given a training log file. Run `pip install seaborn` first to install the dependency.
![loss curve image](../../../resources/loss_curve.png)
```shell
python tools/analysis_tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}] [--mode ${MODE}] [--interval ${INTERVAL}]
```
**Notice**: If the metric you want to plot is calculated in the eval stage, you need to add the flag `--mode eval`. If you perform evaluation with an interval of `${INTERVAL}`, you need to add the args `--interval ${INTERVAL}`.
Examples:
- Plot the classification loss of some run.
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
```
- Plot the classification and regression loss of some run, and save the figure to a pdf.
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf
```
- Compare the bbox mAP of two runs in the same figure.
```shell
# evaluate PartA2 and second on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/PartA2.log.json tools/logs/second.log.json --keys KITTI/Car_3D_moderate_strict --legend PartA2 second --mode eval --interval 1
# evaluate PointPillars for car and 3 classes on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/pp-3class.log.json tools/logs/pp.log.json --keys KITTI/Car_3D_moderate_strict --legend pp-3class pp --mode eval --interval 2
```
You can also compute the average training speed.
```shell
python tools/analysis_tools/analyze_logs.py cal_train_time log.json [--include-outliers]
```
The output is expected to be like the following.
```
-----Analyze train time of work_dirs/some_exp/20190611_192040.log.json-----
slowest epoch 11, average time is 1.2024
fastest epoch 1, average time is 1.1909
time std over epochs is 0.0028
average iter time: 1.1959 s/iter
```
 
## Model Serving
**Note**: This tool is still experimental now, only SECOND is supported to be served with [`TorchServe`](https://pytorch.org/serve/). We'll support more models in the future.
In order to serve an `MMDetection3D` model with [`TorchServe`](https://pytorch.org/serve/), you can follow the steps:
### 1. Convert the model from MMDetection3D to TorchServe
```shell
python tools/deployment/mmdet3d2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
--output-folder ${MODEL_STORE} \
--model-name ${MODEL_NAME}
```
**Note**: ${MODEL_STORE} needs to be an absolute path to a folder.
### 2. Build `mmdet3d-serve` docker image
```shell
docker build -t mmdet3d-serve:latest docker/serve/
```
### 3. Run `mmdet3d-serve`
Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).
In order to run it on the GPU, you need to install [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). You can omit the `--gpus` argument in order to run on the CPU.
Example:
```shell
docker run --rm \
--cpus 8 \
--gpus device=0 \
-p8080:8080 -p8081:8081 -p8082:8082 \
--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \
mmdet3d-serve:latest
```
[Read the docs](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md/) about the Inference (8080), Management (8081) and Metrics (8082) APis
### 4. Test deployment
You can use `test_torchserver.py` to compare result of torchserver and pytorch.
```shell
python tools/deployment/test_torchserver.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${MODEL_NAME}
[--inference-addr ${INFERENCE_ADDR}] [--device ${DEVICE}] [--score-thr ${SCORE_THR}]
```
Example:
```shell
python tools/deployment/test_torchserver.py demo/data/kitti/kitti_000008.bin configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth second
```
 
## Model Complexity
You can use `tools/analysis_tools/get_flops.py` in MMDetection3D, a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch), to compute the FLOPs and params of a given model.
```shell
python tools/analysis_tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
```
You will get the results like this.
```text
==============================
Input shape: (40000, 4)
Flops: 5.78 GFLOPs
Params: 953.83 k
==============================
```
**Note**: This tool is still experimental and we do not guarantee that the
number is absolutely correct. You may well use the result for simple
comparisons, but double check it before you adopt it in technical reports or papers.
1. FLOPs are related to the input shape while parameters are not. The default
input shape is (1, 40000, 4).
2. Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details.
3. We currently only support FLOPs calculation of single-stage models with single-modality input (point cloud or image). We will support two-stage and multi-modality models in the future.
 
## Model Conversion
### RegNet model to MMDetection
`tools/model_converters/regnet2mmdet.py` convert keys in pycls pretrained RegNet models to
MMDetection style.
```shell
python tools/model_converters/regnet2mmdet.py ${SRC} ${DST} [-h]
```
### Detectron ResNet to Pytorch
`tools/detectron2pytorch.py` in MMDetection could convert keys in the original detectron pretrained
ResNet models to PyTorch style.
```shell
python tools/detectron2pytorch.py ${SRC} ${DST} ${DEPTH} [-h]
```
### Prepare a model for publishing
`tools/model_converters/publish_model.py` helps users to prepare their model for publishing.
Before you upload a model to AWS, you may want to
1. convert model weights to CPU tensors
2. delete the optimizer states and
3. compute the hash of the checkpoint file and append the hash id to the
filename.
```shell
python tools/model_converters/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
```
E.g.,
```shell
python tools/model_converters/publish_model.py work_dirs/faster_rcnn/latest.pth faster_rcnn_r50_fpn_1x_20190801.pth
```
The final output filename will be `faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth`.
 
## Dataset Conversion
`tools/dataset_converters/` contains tools for converting datasets to other formats. Most of them convert datasets to pickle based info files, like kitti, nuscenes and lyft. Waymo converter is used to reorganize waymo raw data like KITTI style. Users could refer to them for our approach to converting data format. It is also convenient to modify them to use as scripts like nuImages converter.
To convert the nuImages dataset into COCO format, please use the command below:
```shell
python -u tools/dataset_converters/nuimage_converter.py --data-root ${DATA_ROOT} --version ${VERSIONS} \
--out-dir ${OUT_DIR} --nproc ${NUM_WORKERS} --extra-tag ${TAG}
```
- `--data-root`: the root of the dataset, defaults to `./data/nuimages`.
- `--version`: the version of the dataset, defaults to `v1.0-mini`. To get the full dataset, please use `--version v1.0-train v1.0-val v1.0-mini`
- `--out-dir`: the output directory of annotations and semantic masks, defaults to `./data/nuimages/annotations/`.
- `--nproc`: number of workers for data preparation, defaults to `4`. Larger number could reduce the preparation time as images are processed in parallel.
- `--extra-tag`: extra tag of the annotations, defaults to `nuimages`. This can be used to separate different annotations processed in different time for study.
More details could be referred to the [doc](https://mmdetection3d.readthedocs.io/en/latest/data_preparation.html) for dataset preparation and [README](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/nuimages/README.md/) for nuImages dataset.
 
## Miscellaneous
### Print the entire config
`tools/misc/print_config.py` prints the whole config verbatim, expanding all its
imports.
```shell
python tools/misc/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}]
```
# Visualization
MMDetection3D provides a `Det3DLocalVisualizer` to visualize and store the state of the model during training and testing, as well as results, with the following features.
1. Support the basic drawing interface for multi-modality data and multi-task.
2. Support multiple backends such as local, TensorBoard, to write training status such as `loss`, `lr`, or performance evaluation metrics and to a specified single or multiple backends.
3. Support ground truth visualization on multimodal data, and cross-modal visualization of 3D detection results.
## Basic Drawing Interface
Inherited from `DetLocalVisualizer`, `Det3DLocalVisualizer` provides an interface for drawing common objects on 2D images, such as drawing detection boxes, points, text, lines, circles, polygons, and binary masks. More details about 2D drawing can refer to the [visualization documentation](https://mmengine.readthedocs.io/zh_CN/latest/advanced_tutorials/visualization.html) in MMDetection. Here we introduce the 3D drawing interface:
### Drawing point cloud on the image
We support drawing point cloud on the image by using `draw_points_on_image`.
```python
import mmcv
import numpy as np
from mmengine import load
from mmdet3d.visualization import Det3DLocalVisualizer
info_file = load('demo/data/kitti/000008.pkl')
points = np.fromfile('demo/data/kitti/000008.bin', dtype=np.float32)
points = points.reshape(-1, 4)[:, :3]
lidar2img = np.array(info_file['data_list'][0]['images']['CAM2']['lidar2img'], dtype=np.float32)
visualizer = Det3DLocalVisualizer()
img = mmcv.imread('demo/data/kitti/000008.png')
img = mmcv.imconvert(img, 'bgr', 'rgb')
visualizer.set_image(img)
visualizer.draw_points_on_image(points, lidar2img)
visualizer.show()
```
![points_on_image](../../../resources/points_on_image.png)
### Drawing 3D Boxes on Point Cloud
We support drawing 3D boxes on point cloud by using `draw_bboxes_3d`.
```python
import torch
import numpy as np
from mmdet3d.visualization import Det3DLocalVisualizer
from mmdet3d.structures import LiDARInstance3DBoxes
points = np.fromfile('demo/data/kitti/000008.bin', dtype=np.float32)
points = points.reshape(-1, 4)
visualizer = Det3DLocalVisualizer()
# set point cloud in visualizer
visualizer.set_points(points)
bboxes_3d = LiDARInstance3DBoxes(
torch.tensor([[8.7314, -1.8559, -1.5997, 4.2000, 3.4800, 1.8900,
-1.5808]]))
# Draw 3D bboxes
visualizer.draw_bboxes_3d(bboxes_3d)
visualizer.show()
```
![mono3d](../../../resources/pcd.png)
### Drawing Projected 3D Boxes on Image
We support drawing projected 3D boxes on image by using `draw_proj_bboxes_3d`.
```python
import mmcv
import numpy as np
from mmengine import load
from mmdet3d.visualization import Det3DLocalVisualizer
from mmdet3d.structures import CameraInstance3DBoxes
info_file = load('demo/data/kitti/000008.pkl')
cam2img = np.array(info_file['data_list'][0]['images']['CAM2']['cam2img'], dtype=np.float32)
bboxes_3d = []
for instance in info_file['data_list'][0]['instances']:
bboxes_3d.append(instance['bbox_3d'])
gt_bboxes_3d = np.array(bboxes_3d, dtype=np.float32)
gt_bboxes_3d = CameraInstance3DBoxes(gt_bboxes_3d)
input_meta = {'cam2img': cam2img}
visualizer = Det3DLocalVisualizer()
img = mmcv.imread('demo/data/kitti/000008.png')
img = mmcv.imconvert(img, 'bgr', 'rgb')
visualizer.set_image(img)
# project 3D bboxes to image
visualizer.draw_proj_bboxes_3d(gt_bboxes_3d, input_meta)
visualizer.show()
```
### Drawing BEV Boxes
We support drawing BEV boxes by using `draw_bev_bboxes`.
```python
import numpy as np
from mmengine import load
from mmdet3d.visualization import Det3DLocalVisualizer
from mmdet3d.structures import CameraInstance3DBoxes
info_file = load('demo/data/kitti/000008.pkl')
bboxes_3d = []
for instance in info_file['data_list'][0]['instances']:
bboxes_3d.append(instance['bbox_3d'])
gt_bboxes_3d = np.array(bboxes_3d, dtype=np.float32)
gt_bboxes_3d = CameraInstance3DBoxes(gt_bboxes_3d)
visualizer = Det3DLocalVisualizer()
# set bev image in visualizer
visualizer.set_bev_image()
# draw bev bboxes
visualizer.draw_bev_bboxes(gt_bboxes_3d, edge_colors='orange')
visualizer.show()
```
### Drawing 3D Semantic Mask
We support draw segmentation mask via per-point colorization by using `draw_seg_mask`.
```python
import numpy as np
from mmdet3d.visualization import Det3DLocalVisualizer
points = np.fromfile('demo/data/sunrgbd/000017.bin', dtype=np.float32)
points = points.reshape(-1, 3)
visualizer = Det3DLocalVisualizer()
mask = np.random.rand(points.shape[0], 3)
points_with_mask = np.concatenate((points, mask), axis=-1)
# Draw 3D points with mask
visualizer.set_points(points, pcd_mode=2, vis_mode='add')
visualizer.draw_seg_mask(points_with_mask)
visualizer.show()
```
## Results
To see the prediction results of trained models, you can run the following command:
```bash
python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --show --show-dir ${SHOW_DIR}
```
After running this command, plotted results including input data and the output of networks visualized on the input will be saved in `${SHOW_DIR}`.
After running this command, you will obtain the input data, the output of networks and ground-truth labels visualized on the input (e.g. `***_gt.png` and `***_pred.png` in multi-modality detection task and vision-based detection task) in `${SHOW_DIR}`. When `show` is enabled, [Open3D](http://www.open3d.org/) will be used to visualize the results online. If you are running test in remote server without GUI, the online visualization is not supported. You can download the `results.pkl` from the remote server, and visualize the prediction results offline in your local machine.
To visualize the results with `Open3D` backend offline, you can run the following command:
```bash
python tools/misc/visualize_results.py ${CONFIG_FILE} --result ${RESULTS_PATH} --show-dir ${SHOW_DIR}
```
![](../../../resources/open3d_visual.gif)
This allows the inference and results generation to be done in remote server and the users can open them on their host with GUI.
## Dataset
We also provide scripts to visualize the dataset without inference. You can use `tools/misc/browse_dataset.py` to show loaded data and ground-truth online and save them on the disk. Currently we support single-modality 3D detection and 3D segmentation on all the datasets, multi-modality 3D detection on KITTI and SUN RGB-D, as well as monocular 3D detection on nuScenes. To browse the KITTI dataset, you can run the following command:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task lidar_det --output-dir ${OUTPUT_DIR}
```
**Notice**: Once specifying `--output-dir`, the images of views specified by users will be saved when pressing `_ESC_` in open3d window. If you want to zoom out/in the point clouds to inspect more details, you could specify `--show-interval=0` in the command.
To verify the data consistency and the effect of data augmentation, you can also add `--aug` flag to visualize the data after data augmentation using the command as below:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task lidar_det --aug --output-dir ${OUTPUT_DIR}
```
If you also want to show 2D images with 3D bounding boxes projected onto them, you need to find a config that supports multi-modality data loading, and then change the `--task` args to `multi-modality_det`. An example is showed below:
```shell
python tools/misc/browse_dataset.py configs/mvxnet/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class.py --task multi-modality_det --output-dir ${OUTPUT_DIR}
```
![](../../../resources/browse_dataset_multi_modality.png)
You can simply browse different datasets using different configs, e.g. visualizing the ScanNet dataset in 3D semantic segmentation task:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/scannet-seg.py --task lidar_seg --output-dir ${OUTPUT_DIR}
```
![](../../../resources/browse_dataset_seg.png)
And browsing the nuScenes dataset in monocular 3D detection task:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/nus-mono3d.py --task mono_det --output-dir ${OUTPUT_DIR}
```
![](../../../resources/browse_dataset_mono.png)
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.header-logo {
background-image: url("../image/mmdet3d-logo.png");
background-size: 182.5px 40px;
height: 40px;
width: 182.5px;
}
# 自定义数据集
在本节中,您将了解如何使用自定义数据集训练和测试预定义模型。
基本步骤如下:
1. 准备数据
2. 准备配置文件
3. 在自定义数据集上训练,测试和推理模型
## 数据准备
理想情况下我们可以重新组织自定义的原始数据并将标注格式转换成 KITTI 风格。但是,考虑到对于自定义数据集而言,KITTI 格式的校准文件和 3D 标注难以获得,因此我们在文档中介绍基本的数据格式。
### 基本数据格式
#### 点云格式
目前,我们只支持 `.bin` 格式的点云用于训练和推理。在训练自己的数据集之前,需要将其它格式的点云文件转换成 `.bin` 文件。常见的点云数据格式包括 `.pcd``.las`,我们列举了一些开源工具作为参考。
1. `.pcd` 转换成 `.bin`:https://github.com/DanielPollithy/pypcd
- 您可以通过以下指令安装 `pypcd`
```bash
pip install git+https://github.com/DanielPollithy/pypcd.git
```
- 您可以使用以下脚本读取 `.pcd` 文件,并将其转换成 `.bin` 格式来保存:
```python
import numpy as np
from pypcd import pypcd
pcd_data = pypcd.PointCloud.from_path('point_cloud_data.pcd')
points = np.zeros([pcd_data.width, 4], dtype=np.float32)
points[:, 0] = pcd_data.pc_data['x'].copy()
points[:, 1] = pcd_data.pc_data['y'].copy()
points[:, 2] = pcd_data.pc_data['z'].copy()
points[:, 3] = pcd_data.pc_data['intensity'].copy().astype(np.float32)
with open('point_cloud_data.bin', 'wb') as f:
f.write(points.tobytes())
```
2. `.las` 转换成 `.bin`:常见的转换流程为 `.las -> .pcd -> .bin``.las -> .pcd` 的转换可以用该[工具](https://github.com/Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor)实现。
#### 标签格式
最基本的信息:每个场景的 3D 边界框和类别标签应该包含在 `.txt` 标注文件中。每一行代表特定场景的一个 3D 框,如下所示:
```
# 格式:[x, y, z, dx, dy, dz, yaw, category_name]
1.23 1.42 0.23 3.96 1.65 1.55 1.56 Car
3.51 2.15 0.42 1.05 0.87 1.86 1.23 Pedestrian
...
```
**注意**:对于自定义数据集的评估我们目前只支持 KITTI 评估方法。
3D 框应存储在统一的 3D 坐标系中。
#### 校准格式
对于每个激光雷达收集的点云数据,通常会进行融合并转换到特定的激光雷达坐标系。因此,校准信息文件中通常应该包含每个相机的内参矩阵和激光雷达到每个相机的外参转换矩阵,并保存在 `.txt` 校准文件中,其中 `Px` 表示 `camera_x` 的内参矩阵,`lidar2camx` 表示 `lidar``camera_x` 的外参转换矩阵。
```
P0
P1
P2
P3
P4
...
lidar2cam0
lidar2cam1
lidar2cam2
lidar2cam3
lidar2cam4
...
```
### 原始数据结构
#### 基于激光雷达的 3D 检测
基于激光雷达的 3D 目标检测原始数据通常组织成如下格式,其中 `ImageSets` 包含划分文件,指明哪些文件属于训练/验证集,`points` 包含存储成 `.bin` 格式的点云数据,`labels` 包含 3D 检测的标签文件。
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── points
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
│ │ ├── labels
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
```
#### 基于视觉的 3D 检测
基于视觉的 3D 目标检测原始数据通常组织成如下格式,其中 `ImageSets` 包含划分文件,指明哪些文件属于训练/验证集,`images` 包含来自不同相机的图像,例如 `camera_x` 获得的图像应放在 `images/images_x` 下,`calibs` 包含校准信息文件,其中存储了每个相机的内参矩阵,`labels` 包含 3D 检测的标签文件。
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── calibs
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
│ │ ├── images
│ │ │ ├── images_0
│ │ │ │ ├── 000000.png
│ │ │ │ ├── 000001.png
│ │ │ │ ├── ...
│ │ │ ├── images_1
│ │ │ ├── images_2
│ │ │ ├── ...
│ │ ├── labels
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
```
#### 多模态 3D 检测
多模态 3D 目标检测原始数据通常组织成如下格式。不同于基于视觉的 3D 目标检测,`calibs` 里的校准信息文件存储了每个相机的内参矩阵和外参矩阵。
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── calibs
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
│ │ ├── points
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
│ │ ├── images
│ │ │ ├── images_0
│ │ │ │ ├── 000000.png
│ │ │ │ ├── 000001.png
│ │ │ │ ├── ...
│ │ │ ├── images_1
│ │ │ ├── images_2
│ │ │ ├── ...
│ │ ├── labels
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
```
#### 基于激光雷达的 3D 语义分割
基于激光雷达的 3D 语义分割原始数据通常组织成如下格式,其中 `ImageSets` 包含划分文件,指明哪些文件属于训练/验证集,`points` 包含点云数据,`semantic_mask` 包含逐点级标签。
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── points
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
│ │ ├── semantic_mask
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
```
### 数据转换
按照我们的说明准备好原始数据后,您可以直接使用以下命令生成训练/验证信息文件。
```bash
python tools/create_data.py custom --root-path ./data/custom --out-dir ./data/custom --extra-tag custom
```
## 自定义数据集示例
在完成数据准备后,我们可以在 `mmdet3d/datasets/my_dataset.py` 中创建一个新的数据集来加载数据。
```python
import mmengine
from mmdet3d.registry import DATASETS
from .det3d_dataset import Det3DDataset
@DATASETS.register_module()
class MyDataset(Det3DDataset):
# 替换成自定义 pkl 信息文件里的所有类别
METAINFO = {
'classes': ('Pedestrian', 'Cyclist', 'Car')
}
def parse_ann_info(self, info):
"""Process the `instances` in data info to `ann_info`.
Args:
info (dict): Data information of single data sample.
Returns:
dict: Annotation information consists of the following keys:
- gt_bboxes_3d (:obj:`LiDARInstance3DBoxes`):
3D ground truth bboxes.
- gt_labels_3d (np.ndarray): Labels of ground truths.
"""
ann_info = super().parse_ann_info(info)
if ann_info is None:
ann_info = dict()
# 空实例
ann_info['gt_bboxes_3d'] = np.zeros((0, 7), dtype=np.float32)
ann_info['gt_labels_3d'] = np.zeros(0, dtype=np.int64)
# 过滤掉没有在训练中使用的类别
ann_info = self._remove_dontcare(ann_info)
gt_bboxes_3d = LiDARInstance3DBoxes(ann_info['gt_bboxes_3d'])
ann_info['gt_bboxes_3d'] = gt_bboxes_3d
return ann_info
```
数据预处理后,用户可以通过以下两个步骤来训练自定义数据集:
1. 修改配置文件来使用自定义数据集。
2. 验证自定义数据集标注的正确性。
这里我们以在自定义数据集上训练 PointPillars 为例:
### 准备配置
这里我们演示一个纯点云训练的配置示例:
#### 准备数据集配置
`configs/_base_/datasets/custom.py` 中:
```python
# 数据集设置
dataset_type = 'MyDataset'
data_root = 'data/custom/'
class_names = ['Pedestrian', 'Cyclist', 'Car'] # 替换成您的数据集类别
point_cloud_range = [0, -40, -3, 70.4, 40, 1] # 根据您的数据集进行调整
input_modality = dict(use_lidar=True, use_camera=False)
metainfo = dict(classes=class_names)
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4, # 替换成您的点云数据维度
use_dim=4), # 替换成在训练和推理时实际使用的维度
dict(
type='LoadAnnotations3D',
with_bbox_3d=True,
with_label_3d=True),
dict(
type='ObjectNoise',
num_try=100,
translation_std=[1.0, 1.0, 0.5],
global_rot_range=[0.0, 0.0],
rot_range=[-0.78539816, 0.78539816]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4, # 替换成您的点云数据维度
use_dim=4),
dict(type='Pack3DDetInputs', keys=['points'])
]
# 为可视化阶段的数据和 GT 加载构造流水线
eval_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(type='Pack3DDetInputs', keys=['points']),
]
train_dataloader = dict(
batch_size=6,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type='RepeatDataset',
times=2,
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='custom_infos_train.pkl', # 指定您的训练 pkl 信息
data_prefix=dict(pts='points'),
pipeline=train_pipeline,
modality=input_modality,
test_mode=False,
metainfo=metainfo,
box_type_3d='LiDAR')))
val_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='points'),
ann_file='custom_infos_val.pkl', # 指定您的验证 pkl 信息
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
val_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'custom_infos_val.pkl', # 指定您的验证 pkl 信息
metric='bbox')
```
#### 准备模型配置
对于基于体素化的检测器如 SECOND,PointPillars 及 CenterPoint,点云范围(point cloud range)和体素大小(voxel size)应该根据您的数据集做调整。理论上,`voxel_size``point_cloud_range` 的设置是相关联的。设置较小的 `voxel_size` 将增加体素数以及相应的内存消耗。此外,需要注意以下问题:
如果将 `point_cloud_range``voxel_size` 分别设置成 `[0, -40, -3, 70.4, 40, 1]``[0.05, 0.05, 0.1]`,那么中间特征图的形状应该为 `[(1-(-3))/0.1+1, (40-(-40))/0.05, (70.4-0)/0.05]=[41, 1600, 1408]`。更改 `point_cloud_range` 时,请记得依据 `voxel_size` 更改 `middle_encoder` 里中间特征图的形状。
关于 `anchor_range` 的设置,一般需要根据数据集做调整。需要注意的是,`z` 值需要根据点云的位置做相应调整,具体请参考此 [issue](https://github.com/open-mmlab/mmdetection3d/issues/986)
关于 `anchor_size` 的设置,通常需要计算整个训练集中目标的长、宽、高的平均值作为 `anchor_size`,以获得最好的结果。
`configs/_base_/models/pointpillars_hv_secfpn_custom.py` 中:
```python
voxel_size = [0.16, 0.16, 4] # 根据您的数据集做调整
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1] # 根据您的数据集做调整
model = dict(
type='VoxelNet',
data_preprocessor=dict(
type='Det3DDataPreprocessor',
voxel=True,
voxel_layer=dict(
max_num_points=32,
point_cloud_range=point_cloud_range,
voxel_size=voxel_size,
max_voxels=(16000, 40000))),
voxel_encoder=dict(
type='PillarFeatureNet',
in_channels=4,
feat_channels=[64],
with_distance=False,
voxel_size=voxel_size,
point_cloud_range=point_cloud_range),
# `output_shape` 需要根据 `point_cloud_range` 和 `voxel_size` 做相应调整
middle_encoder=dict(
type='PointPillarsScatter', in_channels=64, output_shape=[496, 432]),
backbone=dict(
type='SECOND',
in_channels=64,
layer_nums=[3, 5, 5],
layer_strides=[2, 2, 2],
out_channels=[64, 128, 256]),
neck=dict(
type='SECONDFPN',
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
bbox_head=dict(
type='Anchor3DHead',
num_classes=3,
in_channels=384,
feat_channels=384,
use_direction_classifier=True,
assign_per_class=True,
# 根据您的数据集调整 `ranges` 和 `sizes`
anchor_generator=dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[
[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -1.78, 69.12, 39.68, -1.78],
],
sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
rotations=[0, 1.57],
reshape_out=False),
diff_rad_by_sin=True,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
loss_cls=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(
type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
loss_dir=dict(
type='mmdet.CrossEntropyLoss', use_sigmoid=False,
loss_weight=0.2)),
# 模型训练和测试设置
train_cfg=dict(
assigner=[
dict( # for Pedestrian
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict( # for Cyclist
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict( # for Car
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.6,
neg_iou_thr=0.45,
min_pos_iou=0.45,
ignore_iof_thr=-1),
],
allowed_border=0,
pos_weight=-1,
debug=False),
test_cfg=dict(
use_rotate_nms=True,
nms_across_levels=False,
nms_thr=0.01,
score_thr=0.1,
min_bbox_size=0,
nms_pre=100,
max_num=50))
```
#### 准备整体配置
我们将上述的所有配置组合在 `configs/pointpillars/pointpillars_hv_secfpn_8xb6_custom.py` 文件中:
```python
_base_ = [
'../_base_/models/pointpillars_hv_secfpn_custom.py',
'../_base_/datasets/custom.py',
'../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
]
```
#### 可视化数据集(可选)
为了验证准备的数据和配置是否正确,我们建议在训练和验证前使用 `tools/misc/browse_dataset.py` 脚本可视化数据集和标注。更多细节请参考[可视化文档](https://mmdetection3d.readthedocs.io/zh_CN/dev-1.x/user_guides/visualization.html)
## 评估
准备好数据和配置之后,您可以遵循我们的文档直接运行训练/测试脚本。
**注意**:我们为自定义数据集提供了 KITTI 风格的评估实现方法。在数据集配置中需要包含如下内容:
```python
val_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'custom_infos_val.pkl', # 指定您的验证 pkl 信息
metric='bbox')
```
# 自定义模型
我们通常把模型的各个组成成分分成 6 种类型:
- 编码器(encoder):包括 voxel encoder 和 middle encoder 等进入 backbone 前所使用的基于体素的方法,如 `HardVFE``PointPillarsScatter`
- 骨干网络(backbone):通常采用 FCN 网络来提取特征图,如 `ResNet``SECOND`
- 颈部网络(neck):位于 backbones 和 heads 之间的组成模块,如 `FPN``SECONDFPN`
- 检测头(head):用于特定任务的组成模块,如`检测框的预测``掩码的预测`
- RoI 提取器(RoI extractor):用于从特征图中提取 RoI 特征的组成模块,如 `H3DRoIHead``PartAggregationROIHead`
- 损失函数(loss):heads 中用于计算损失函数的组成模块,如 `FocalLoss``L1Loss``GHMLoss`
## 开发新的组成模块
### 添加新的编码器
接下来我们以 HardVFE 为例展示如何开发新的组成模块。
#### 1. 定义一个新的体素编码器(如 HardVFE:即 HV-SECOND 中使用的体素特征编码器)
创建一个新文件 `mmdet3d/models/voxel_encoders/voxel_encoder.py`
```python
import torch.nn as nn
from mmdet3d.registry import MODELS
@MODELS.register_module()
class HardVFE(nn.Module):
def __init__(self, arg1, arg2):
pass
def forward(self, x): # 需要返回一个元组
pass
```
#### 2. 导入该模块
您可以在 `mmdet3d/models/voxel_encoders/__init__.py` 中添加以下代码:
```python
from .voxel_encoder import HardVFE
```
或者在配置文件中添加以下代码,从而避免修改源码:
```python
custom_imports = dict(
imports=['mmdet3d.models.voxel_encoders.voxel_encoder'],
allow_failed_imports=False)
```
#### 3. 在配置文件中使用体素编码器
```python
model = dict(
...
voxel_encoder=dict(
type='HardVFE',
arg1=xxx,
arg2=yyy),
...
)
```
### 添加新的骨干网络
接下来我们以 [SECOND](https://www.mdpi.com/1424-8220/18/10/3337)(Sparsely Embedded Convolutional Detection)为例展示如何开发新的组成模块。
#### 1. 定义一个新的骨干网络(如 SECOND)
创建一个新文件 `mmdet3d/models/backbones/second.py`
```python
from mmengine.model import BaseModule
from mmdet3d.registry import MODELS
@MODELS.register_module()
class SECOND(BaseModule):
def __init__(self, arg1, arg2):
pass
def forward(self, x): # 需要返回一个元组
pass
```
#### 2. 导入该模块
您可以在 `mmdet3d/models/backbones/__init__.py` 中添加以下代码:
```python
from .second import SECOND
```
或者在配置文件中添加以下代码,从而避免修改源码:
```python
custom_imports = dict(
imports=['mmdet3d.models.backbones.second'],
allow_failed_imports=False)
```
#### 3. 在配置文件中使用骨干网络
```python
model = dict(
...
backbone=dict(
type='SECOND',
arg1=xxx,
arg2=yyy),
...
)
```
### 添加新的颈部网络
#### 1. 定义一个新的颈部网络(如 SECONDFPN)
创建一个新文件 `mmdet3d/models/necks/second_fpn.py`
```python
from mmengine.model import BaseModule
from mmdet3d.registry import MODELS
@MODELS.register_module()
class SECONDFPN(BaseModule):
def __init__(self,
in_channels=[128, 128, 256],
out_channels=[256, 256, 256],
upsample_strides=[1, 2, 4],
norm_cfg=dict(type='BN', eps=1e-3, momentum=0.01),
upsample_cfg=dict(type='deconv', bias=False),
conv_cfg=dict(type='Conv2d', bias=False),
use_conv_for_no_stride=False,
init_cfg=None):
pass
def forward(self, x):
# 具体实现忽略
pass
```
#### 2. 导入该模块
您可以在 `mmdet3d/models/necks/__init__.py` 中添加以下代码:
```python
from .second_fpn import SECONDFPN
```
或者在配置文件中添加以下代码,从而避免修改源码:
```python
custom_imports = dict(
imports=['mmdet3d.models.necks.second_fpn'],
allow_failed_imports=False)
```
#### 3. 在配置文件中使用颈部网络
```python
model = dict(
...
neck=dict(
type='SECONDFPN',
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
...
)
```
### 添加新的检测头
接下来我们以 [PartA2 Head](https://arxiv.org/abs/1907.03670) 为例展示如何开发新的检测头。
**注意**:此处展示的 `PartA2 RoI Head` 将用于检测器的第二阶段。对于单阶段的检测头,请参考 `mmdet3d/models/dense_heads/` 中的例子。由于其简单高效,它们更常用于自动驾驶场景下的 3D 检测中。
首先,在 `mmdet3d/models/roi_heads/bbox_heads/parta2_bbox_head.py` 中添加新的 bbox head。`PartA2 RoI Head` 为目标检测实现了一个新的 bbox head。为了实现一个 bbox head,我们通常需要在新模块中实现如下两个函数。有时还需要实现其他相关函数,如 `loss``get_targets`
```python
from mmengine.model import BaseModule
from mmdet3d.registry import MODELS
@MODELS.register_module()
class PartA2BboxHead(BaseModule):
"""PartA2 RoI head."""
def __init__(self,
num_classes,
seg_in_channels,
part_in_channels,
seg_conv_channels=None,
part_conv_channels=None,
merge_conv_channels=None,
down_conv_channels=None,
shared_fc_channels=None,
cls_channels=None,
reg_channels=None,
dropout_ratio=0.1,
roi_feat_size=14,
with_corner_loss=True,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.01),
loss_bbox=dict(
type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=True,
reduction='none',
loss_weight=1.0),
init_cfg=None):
super(PartA2BboxHead, self).__init__(init_cfg=init_cfg)
def forward(self, seg_feats, part_feats):
pass
```
其次,如果有必要的话需要实现一个新的 RoI Head。我们从 `Base3DRoIHead` 中继承得到新的 `PartAggregationROIHead`。我们可以发现 `Base3DRoIHead` 已经实现了如下函数。
```python
from mmdet.models.roi_heads import BaseRoIHead
from mmdet3d.registry import MODELS, TASK_UTILS
class Base3DRoIHead(BaseRoIHead):
"""Base class for 3d RoIHeads."""
def __init__(self,
bbox_head=None,
bbox_roi_extractor=None,
mask_head=None,
mask_roi_extractor=None,
train_cfg=None,
test_cfg=None,
init_cfg=None):
super(Base3DRoIHead, self).__init__(
bbox_head=bbox_head,
bbox_roi_extractor=bbox_roi_extractor,
mask_head=mask_head,
mask_roi_extractor=mask_roi_extractor,
train_cfg=train_cfg,
test_cfg=test_cfg,
init_cfg=init_cfg)
def init_bbox_head(self, bbox_roi_extractor: dict,
bbox_head: dict) -> None:
"""Initialize box head and box roi extractor.
Args:
bbox_roi_extractor (dict or ConfigDict): Config of box
roi extractor.
bbox_head (dict or ConfigDict): Config of box in box head.
"""
self.bbox_roi_extractor = MODELS.build(bbox_roi_extractor)
self.bbox_head = MODELS.build(bbox_head)
def init_assigner_sampler(self):
"""Initialize assigner and sampler."""
self.bbox_assigner = None
self.bbox_sampler = None
if self.train_cfg:
if isinstance(self.train_cfg.assigner, dict):
self.bbox_assigner = TASK_UTILS.build(self.train_cfg.assigner)
elif isinstance(self.train_cfg.assigner, list):
self.bbox_assigner = [
TASK_UTILS.build(res) for res in self.train_cfg.assigner
]
self.bbox_sampler = TASK_UTILS.build(self.train_cfg.sampler)
def init_mask_head(self):
"""Initialize mask head, skip since ``PartAggregationROIHead`` does not
have one."""
pass
```
接下来主要对 bbox_forward 的逻辑进行修改,同时其继承了来自 `Base3DRoIHead` 的其它逻辑。在 `mmdet3d/models/roi_heads/part_aggregation_roi_head.py` 中,我们实现了新的 RoI Head,如下所示:
```python
from typing import Dict, List, Tuple
from mmdet.models.task_modules import AssignResult, SamplingResult
from mmengine import ConfigDict
from torch import Tensor
from torch.nn import functional as F
from mmdet3d.registry import MODELS
from mmdet3d.structures import bbox3d2roi
from mmdet3d.utils import InstanceList
from ...structures.det3d_data_sample import SampleList
from .base_3droi_head import Base3DRoIHead
@MODELS.register_module()
class PartAggregationROIHead(Base3DRoIHead):
"""Part aggregation roi head for PartA2.
Args:
semantic_head (ConfigDict): Config of semantic head.
num_classes (int): The number of classes.
seg_roi_extractor (ConfigDict): Config of seg_roi_extractor.
bbox_roi_extractor (ConfigDict): Config of part_roi_extractor.
bbox_head (ConfigDict): Config of bbox_head.
train_cfg (ConfigDict): Training config.
test_cfg (ConfigDict): Testing config.
"""
def __init__(self,
semantic_head: dict,
num_classes: int = 3,
seg_roi_extractor: dict = None,
bbox_head: dict = None,
bbox_roi_extractor: dict = None,
train_cfg: dict = None,
test_cfg: dict = None,
init_cfg: dict = None) -> None:
super(PartAggregationROIHead, self).__init__(
bbox_head=bbox_head,
bbox_roi_extractor=bbox_roi_extractor,
train_cfg=train_cfg,
test_cfg=test_cfg,
init_cfg=init_cfg)
self.num_classes = num_classes
assert semantic_head is not None
self.init_seg_head(seg_roi_extractor, semantic_head)
def init_seg_head(self, seg_roi_extractor: dict,
semantic_head: dict) -> None:
"""Initialize semantic head and seg roi extractor.
Args:
seg_roi_extractor (dict): Config of seg
roi extractor.
semantic_head (dict): Config of semantic head.
"""
self.semantic_head = MODELS.build(semantic_head)
self.seg_roi_extractor = MODELS.build(seg_roi_extractor)
@property
def with_semantic(self):
"""bool: whether the head has semantic branch"""
return hasattr(self,
'semantic_head') and self.semantic_head is not None
def predict(self,
feats_dict: Dict,
rpn_results_list: InstanceList,
batch_data_samples: SampleList,
rescale: bool = False,
**kwargs) -> InstanceList:
"""Perform forward propagation of the roi head and predict detection
results on the features of the upstream network.
Args:
feats_dict (dict): Contains features from the first stage.
rpn_results_list (List[:obj:`InstanceData`]): Detection results
of rpn head.
batch_data_samples (List[:obj:`Det3DDataSample`]): The Data
samples. It usually includes information such as
`gt_instance_3d`, `gt_panoptic_seg_3d` and `gt_sem_seg_3d`.
rescale (bool): If True, return boxes in original image space.
Defaults to False.
Returns:
list[:obj:`InstanceData`]: Detection results of each sample
after the post process.
Each item usually contains following keys.
- scores_3d (Tensor): Classification scores, has a shape
(num_instances, )
- labels_3d (Tensor): Labels of bboxes, has a shape
(num_instances, ).
- bboxes_3d (BaseInstance3DBoxes): Prediction of bboxes,
contains a tensor with shape (num_instances, C), where
C >= 7.
"""
assert self.with_bbox, 'Bbox head must be implemented in PartA2.'
assert self.with_semantic, 'Semantic head must be implemented' \
' in PartA2.'
batch_input_metas = [
data_samples.metainfo for data_samples in batch_data_samples
]
voxels_dict = feats_dict.pop('voxels_dict')
# TODO: Split predict semantic and bbox
results_list = self.predict_bbox(feats_dict, voxels_dict,
batch_input_metas, rpn_results_list,
self.test_cfg)
return results_list
def predict_bbox(self, feats_dict: Dict, voxel_dict: Dict,
batch_input_metas: List[dict],
rpn_results_list: InstanceList,
test_cfg: ConfigDict) -> InstanceList:
"""Perform forward propagation of the bbox head and predict detection
results on the features of the upstream network.
Args:
feats_dict (dict): Contains features from the first stage.
voxel_dict (dict): Contains information of voxels.
batch_input_metas (list[dict], Optional): Batch image meta info.
Defaults to None.
rpn_results_list (List[:obj:`InstanceData`]): Detection results
of rpn head.
test_cfg (Config): Test config.
Returns:
list[:obj:`InstanceData`]: Detection results of each sample
after the post process.
Each item usually contains following keys.
- scores_3d (Tensor): Classification scores, has a shape
(num_instances, )
- labels_3d (Tensor): Labels of bboxes, has a shape
(num_instances, ).
- bboxes_3d (BaseInstance3DBoxes): Prediction of bboxes,
contains a tensor with shape (num_instances, C), where
C >= 7.
"""
...
def loss(self, feats_dict: Dict, rpn_results_list: InstanceList,
batch_data_samples: SampleList, **kwargs) -> dict:
"""Perform forward propagation and loss calculation of the detection
roi on the features of the upstream network.
Args:
feats_dict (dict): Contains features from the first stage.
rpn_results_list (List[:obj:`InstanceData`]): Detection results
of rpn head.
batch_data_samples (List[:obj:`Det3DDataSample`]): The Data
samples. It usually includes information such as
`gt_instance_3d`, `gt_panoptic_seg_3d` and `gt_sem_seg_3d`.
Returns:
dict[str, Tensor]: A dictionary of loss components
"""
assert len(rpn_results_list) == len(batch_data_samples)
losses = dict()
batch_gt_instances_3d = []
batch_gt_instances_ignore = []
voxels_dict = feats_dict.pop('voxels_dict')
for data_sample in batch_data_samples:
batch_gt_instances_3d.append(data_sample.gt_instances_3d)
if 'ignored_instances' in data_sample:
batch_gt_instances_ignore.append(data_sample.ignored_instances)
else:
batch_gt_instances_ignore.append(None)
if self.with_semantic:
semantic_results = self._semantic_forward_train(
feats_dict, voxels_dict, batch_gt_instances_3d)
losses.update(semantic_results.pop('loss_semantic'))
sample_results = self._assign_and_sample(rpn_results_list,
batch_gt_instances_3d)
if self.with_bbox:
feats_dict.update(semantic_results)
bbox_results = self._bbox_forward_train(feats_dict, voxels_dict,
sample_results)
losses.update(bbox_results['loss_bbox'])
return losses
```
此处我们省略了相关函数的更多细节。更多细节请参考[代码](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/models/roi_heads/part_aggregation_roi_head.py)
最后,用户需要在 `mmdet3d/models/roi_heads/bbox_heads/__init__.py``mmdet3d/models/roi_heads/__init__.py` 添加模块,从而能被相应的注册器找到并加载。
此外,用户也可以在配置文件中添加以下代码以达到相同的目的。
```python
custom_imports=dict(
imports=['mmdet3d.models.roi_heads.part_aggregation_roi_head', 'mmdet3d.models.roi_heads.bbox_heads.parta2_bbox_head'],
allow_failed_imports=False)
```
`PartAggregationROIHead` 的配置文件如下所示:
```python
model = dict(
...
roi_head=dict(
type='PartAggregationROIHead',
num_classes=3,
semantic_head=dict(
type='PointwiseSemanticHead',
in_channels=16,
extra_width=0.2,
seg_score_thr=0.3,
num_classes=3,
loss_seg=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
reduction='sum',
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_part=dict(
type='mmdet.CrossEntropyLoss',
use_sigmoid=True,
loss_weight=1.0)),
seg_roi_extractor=dict(
type='Single3DRoIAwareExtractor',
roi_layer=dict(
type='RoIAwarePool3d',
out_size=14,
max_pts_per_voxel=128,
mode='max')),
bbox_roi_extractor=dict(
type='Single3DRoIAwareExtractor',
roi_layer=dict(
type='RoIAwarePool3d',
out_size=14,
max_pts_per_voxel=128,
mode='avg')),
bbox_head=dict(
type='PartA2BboxHead',
num_classes=3,
seg_in_channels=16,
part_in_channels=4,
seg_conv_channels=[64, 64],
part_conv_channels=[64, 64],
merge_conv_channels=[128, 128],
down_conv_channels=[128, 256],
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
shared_fc_channels=[256, 512, 512, 512],
cls_channels=[256, 256],
reg_channels=[256, 256],
dropout_ratio=0.1,
roi_feat_size=14,
with_corner_loss=True,
loss_bbox=dict(
type='mmdet.SmoothL1Loss',
beta=1.0 / 9.0,
reduction='sum',
loss_weight=1.0),
loss_cls=dict(
type='mmdet.CrossEntropyLoss',
use_sigmoid=True,
reduction='sum',
loss_weight=1.0))),
...
)
```
MMDetection 2.0 开始支持配置文件之间的继承,因此用户可以关注配置文件的修改。PartA2 Head 的第二阶段主要使用了新的 `PartAggregationROIHead``PartA2BboxHead`,需要根据对应模块的 `__init__` 函数来设置参数。
### 添加新的损失函数
假设您想要为检测框的回归添加一个新的损失函数 `MyLoss`。为了添加一个新的损失函数,用户需要在 `mmdet3d/models/losses/my_loss.py` 中实现该函数。装饰器 `weighted_loss` 能够保证对每个元素的损失进行加权平均。
```python
import torch
import torch.nn as nn
from mmdet.models.losses.utils import weighted_loss
from mmdet3d.registry import MODELS
@weighted_loss
def my_loss(pred, target):
assert pred.size() == target.size() and target.numel() > 0
loss = torch.abs(pred - target)
return loss
@MODELS.register_module()
class MyLoss(nn.Module):
def __init__(self, reduction='mean', loss_weight=1.0):
super(MyLoss, self).__init__()
self.reduction = reduction
self.loss_weight = loss_weight
def forward(self,
pred,
target,
weight=None,
avg_factor=None,
reduction_override=None):
assert reduction_override in (None, 'none', 'mean', 'sum')
reduction = (
reduction_override if reduction_override else self.reduction)
loss_bbox = self.loss_weight * my_loss(
pred, target, weight, reduction=reduction, avg_factor=avg_factor)
return loss_bbox
```
接下来,用户需要在 `mmdet3d/models/losses/__init__.py` 添加该函数。
```python
from .my_loss import MyLoss, my_loss
```
或者在配置文件中添加以下代码以达到相同的目的。
```python
custom_imports=dict(
imports=['mmdet3d.models.losses.my_loss'],
allow_failed_imports=False)
```
为了使用该函数,用户需要修改 `loss_xxx` 域。由于 `MyLoss` 是用于回归的,您需要修改 head 中的 `loss_bbox` 域。
```python
loss_bbox=dict(type='MyLoss', loss_weight=1.0)
```
# 自定义运行时配置
## 自定义优化器设置
优化器相关的配置是由 `optim_wrapper` 管理的,其通常有三个字段:`optimizer``paramwise_cfg``clip_grad`。更多细节请参考 [OptimWrapper](https://mmengine.readthedocs.io/zh_CN/latest/tutorials/optim_wrapper.html)。如下所示,使用 `AdamW` 作为`优化器`,骨干网络的学习率降低 10 倍,并添加了梯度裁剪。
```python
optim_wrapper = dict(
type='OptimWrapper',
# 优化器
optimizer=dict(
type='AdamW',
lr=0.0001,
weight_decay=0.05,
eps=1e-8,
betas=(0.9, 0.999)),
# 参数级学习率及权重衰减系数设置
paramwise_cfg=dict(
custom_keys={
'backbone': dict(lr_mult=0.1, decay_mult=1.0),
},
norm_decay_mult=0.0),
# 梯度裁剪
clip_grad=dict(max_norm=0.01, norm_type=2))
```
### 自定义 PyTorch 支持的优化器
我们已经支持使用所有 PyTorch 实现的优化器,且唯一需要修改的地方就是改变配置文件中的 `optim_wrapper` 字段中的 `optimizer` 字段。例如,如果您想使用 `Adam`(注意这样可能会使性能大幅下降),您可以这样修改:
```python
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='Adam', lr=0.0003, weight_decay=0.0001))
```
为了修改模型的学习率,用户只需要修改 `optimizer` 中的 `lr` 字段。用户可以根据 PyTorch 的 [API 文档](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim)直接设置参数。
### 自定义并实现优化器
#### 1. 定义新的优化器
一个自定义优化器可以按照如下过程定义:
假设您想要添加一个叫 `MyOptimizer` 的,拥有参数 `a``b``c` 的优化器,您需要创建一个叫做 `mmdet3d/engine/optimizers` 的目录。接下来,应该在目录下某个文件中实现新的优化器,比如 `mmdet3d/engine/optimizers/my_optimizer.py`
```python
from torch.optim import Optimizer
from mmdet3d.registry import OPTIMIZERS
@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):
def __init__(self, a, b, c):
pass
```
#### 2. 将优化器添加到注册器
为了找到上述定义的优化器模块,该模块首先需要被引入主命名空间。有两种实现方法:
- 修改 `mmdet3d/engine/optimizers/__init__.py` 导入该模块。
新定义的模块应该在 `mmdet3d/engine/optimizers/__init__.py` 中被导入,从而被找到并且被添加到注册器中:
```python
from .my_optimizer import MyOptimizer
```
- 在配置中使用 `custom_imports` 来人工导入新优化器。
```python
custom_imports = dict(imports=['mmdet3d.engine.optimizers.my_optimizer'], allow_failed_imports=False)
```
模块 `mmdet3d.engine.optimizers.my_optimizer` 会在程序开始被导入,且 `MyOptimizer` 类在那时会自动被注册。注意到应该只有包含 `MyOptimizer` 类的包被导入。`mmdet3d.engine.optimizers.my_optimizer.MyOptimizer`**不能**被直接导入。
事实上,用户可以在这种导入的方法中使用完全不同的文件目录结构,只要保证根目录能在 `PYTHONPATH` 中被定位。
#### 3. 在配置文件中指定优化器
接下来您可以在配置文件的 `optimizer` 字段中使用 `MyOptimizer`。在配置文件中,优化器在 `optimizer` 字段中以如下方式定义:
```python
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001))
```
为了使用您自己的优化器,该字段可以改为:
```python
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value))
```
### 自定义优化器封装构造器
部分模型可能会拥有一些参数专属的优化器设置,比如 BatchNorm 层的权重衰减 (weight decay)。用户可以通过自定义优化器封装构造器来对那些细粒度的参数进行调优。
```python
from mmengine.optim import DefaultOptimWrapperConstructor
from mmdet3d.registry import OPTIM_WRAPPER_CONSTRUCTORS
from .my_optimizer import MyOptimizer
@OPTIM_WRAPPER_CONSTRUCTORS.register_module()
class MyOptimizerWrapperConstructor(DefaultOptimWrapperConstructor):
def __init__(self,
optim_wrapper_cfg: dict,
paramwise_cfg: Optional[dict] = None):
pass
def __call__(self, model: nn.Module) -> OptimWrapper:
return optim_wrapper
```
默认优化器封装构造器在[这里](https://github.com/open-mmlab/mmengine/blob/main/mmengine/optim/optimizer/default_constructor.py#L18)实现。这部分代码也可以用作新优化器封装构造器的模板。
### 额外的设置
没有在优化器部分实现的技巧应该通过优化器封装构造器或者钩子来实现(比如逐参数的学习率设置)。我们列举了一些常用的可以稳定训练过程或者加速训练的设置。我们欢迎提供更多类似设置的 PR 和 issue。
- __使用梯度裁剪 (gradient clip) 来稳定训练过程__:一些模型依赖梯度裁剪技术来裁剪训练中的梯度,以稳定训练过程。举例如下:
```python
optim_wrapper = dict(
_delete_=True, clip_grad=dict(max_norm=35, norm_type=2))
```
如果您的配置继承了一个已经设置了 `optim_wrapper` 的基础配置,那么您可能需要 `_delete_=True` 字段来覆盖基础配置中无用的设置。更多细节请参考[配置文档](https://mmdetection3d.readthedocs.io/zh_CN/dev-1.x/user_guides/config.html)
- __使用动量调度器 (momentum scheduler) 来加速模型收敛__:我们支持用动量调度器来根据学习率更改模型的动量,这样可以使模型更快地收敛。动量调度器通常和学习率调度器一起使用,例如,如下配置文件在 [3D 检测](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/schedules/cyclic-20e.py)中被用于加速模型收敛。更多细节请参考 [CosineAnnealingLR](https://github.com/open-mmlab/mmengine/blob/main/mmengine/optim/scheduler/lr_scheduler.py#L43)[CosineAnnealingMomentum](https://github.com/open-mmlab/mmengine/blob/main/mmengine/optim/scheduler/momentum_scheduler.py#L71) 的实现方法。
```python
param_scheduler = [
# 学习率调度器
# 在前 8 个 epoch,学习率从 0 升到 lr * 10
# 在接下来 12 个 epoch,学习率从 lr * 10 降到 lr * 1e-4
dict(
type='CosineAnnealingLR',
T_max=8,
eta_min=lr * 10,
begin=0,
end=8,
by_epoch=True,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
T_max=12,
eta_min=lr * 1e-4,
begin=8,
end=20,
by_epoch=True,
convert_to_iter_based=True),
# 动量调度器
# 在前 8 个 epoch,动量从 0 升到 0.85 / 0.95
# 在接下来 12 个 epoch,动量从 0.85 / 0.95 升到 1
dict(
type='CosineAnnealingMomentum',
T_max=8,
eta_min=0.85 / 0.95,
begin=0,
end=8,
by_epoch=True,
convert_to_iter_based=True),
dict(
type='CosineAnnealingMomentum',
T_max=12,
eta_min=1,
begin=8,
end=20,
by_epoch=True,
convert_to_iter_based=True)
]
```
## 自定义训练调度
默认情况下我们使用阶梯式学习率衰减的 1 倍训练调度,这会调用 MMEngine 中的 [`MultiStepLR`](https://github.com/open-mmlab/mmengine/blob/main/mmengine/optim/scheduler/lr_scheduler.py#L144)。我们在[这里](https://github.com/open-mmlab/mmengine/blob/main/mmengine/optim/scheduler/lr_scheduler.py)支持了很多其他学习率调度,比如`余弦退火``多项式衰减`调度。下面是一些样例:
- 多项式衰减调度:
```python
param_scheduler = [
dict(
type='PolyLR',
power=0.9,
eta_min=1e-4,
begin=0,
end=8,
by_epoch=True)]
```
- 余弦退火调度:
```python
param_scheduler = [
dict(
type='CosineAnnealingLR',
T_max=8,
eta_min=lr * 1e-5,
begin=0,
end=8,
by_epoch=True)]
```
## 自定义训练循环控制器
默认情况下,我们在 `train_cfg` 中使用 `EpochBasedTrainLoop`,并在每一个训练 epoch 完成后进行一次验证,如下所示:
```python
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_begin=1, val_interval=1)
```
事实上,[`IterBasedTrainLoop`](https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/loops.py#L185)[`EpochBasedTrainLoop`](https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/loops.py#L18) 都支持动态间隔验证,如下所示:
```python
# 在第 365001 次迭代之前,我们每隔 5000 次迭代验证一次。
# 在第 365000 次迭代之后,我们每隔 368750 次迭代验证一次,
# 这意味着我们在训练结束后进行验证。
interval = 5000
max_iters = 368750
dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)]
train_cfg = dict(
type='IterBasedTrainLoop',
max_iters=max_iters,
val_interval=interval,
dynamic_intervals=dynamic_intervals)
```
## 自定义钩子
### 自定义并实现钩子
#### 1. 实现一个新钩子
MMEngine 提供了一些实用的[钩子](https://mmengine.readthedocs.io/zh_CN/latest/tutorials/hook.html),但有些场合用户可能需要实现一个新的钩子。在 v1.1.0rc0 之后,MMDetection3D 在训练时支持基于 MMEngine 自定义钩子。因此用户可以直接在 mmdet3d 或者基于 mmdet3d 的代码库中实现钩子并通过更改训练配置来使用钩子。这里我们给出一个在 mmdet3d 中创建并使用新钩子的例子。
```python
from mmengine.hooks import Hook
from mmdet3d.registry import HOOKS
@HOOKS.register_module()
class MyHook(Hook):
def __init__(self, a, b):
def before_run(self, runner) -> None:
def after_run(self, runner) -> None:
def before_train(self, runner) -> None:
def after_train(self, runner) -> None:
def before_train_epoch(self, runner) -> None:
def after_train_epoch(self, runner) -> None:
def before_train_iter(self,
runner,
batch_idx: int,
data_batch: DATA_BATCH = None) -> None:
def after_train_iter(self,
runner,
batch_idx: int,
data_batch: DATA_BATCH = None,
outputs: Optional[dict] = None) -> None:
```
用户需要根据钩子的功能指定钩子在每个训练阶段时的行为,具体包括如下阶段:`before_run``after_run``before_train``after_train``before_train_epoch``after_train_epoch``before_train_iter`,和 `after_train_iter`。有更多的位点可以插入钩子,详情可参考 [base hook class](https://github.com/open-mmlab/mmengine/blob/main/mmengine/hooks/hook.py#L9)
#### 2. 注册新钩子
接下来我们需要导入 `MyHook`。假设新钩子位于文件 `mmdet3d/engine/hooks/my_hook.py` 中,有两种实现方法:
- 修改 `mmdet3d/engine/hooks/__init__.py` 导入该模块。
新定义的模块应该在 `mmdet3d/engine/hooks/__init__.py` 中被导入,从而被找到并且被添加到注册器中:
```python
from .my_hook import MyHook
```
- 在配置中使用 `custom_imports` 来人为地导入新钩子。
```python
custom_imports = dict(imports=['mmdet3d.engine.hooks.my_hook'], allow_failed_imports=False)
```
#### 3. 更改配置文件
```python
custom_hooks = [
dict(type='MyHook', a=a_value, b=b_value)
]
```
您可以将字段 `priority` 设置为 `'NORMAL'` 或者 `'HIGHEST'` 来设置钩子的优先级,如下所示:
```python
custom_hooks = [
dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
]
```
默认情况下,注册阶段钩子的优先级为 `'NORMAL'`
### 使用 MMDetection3D 中实现的钩子
如果 MMDetection3D 中已经实现了该钩子,您可以直接通过更改配置文件来使用该钩子。
#### 例子:`DisableObjectSampleHook`
我们实现了一个名为 [DisableObjectSampleHook](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/engine/hooks/disable_object_sample_hook.py) 的自定义钩子在训练阶段达到指定 epoch 后禁用 `ObjectSample` 增强策略。
如果有需要的话我们可以在配置文件中设置它:
```python
custom_hooks = [dict(type='DisableObjectSampleHook', disable_after_epoch=15)]
```
### 更改默认的运行时钩子
有一些常用的钩子通过 `default_hooks` 注册,它们是:
- `IterTimerHook`:该钩子用来记录加载数据的时间 'data_time' 和模型训练一步的时间 'time'。
- `LoggerHook`:该钩子用来从`执行器(Runner)`的不同组件收集日志并将其写入终端,json 文件,tensorboard 和 wandb 等。
- `ParamSchedulerHook`:该钩子用来更新优化器中的一些超参数,例如学习率和动量。
- `CheckpointHook`:该钩子用来定期地保存检查点。
- `DistSamplerSeedHook`:该钩子用来设置采样和批采样的种子。
- `Det3DVisualizationHook`:该钩子用来可视化验证和测试过程的预测结果。
`IterTimerHook``ParamSchedulerHook``DistSamplerSeedHook` 都很简单,通常不需要修改,因此此处我们将介绍如何使用 `LoggerHook``CheckpointHook``Det3DVisualizationHook`
#### CheckpointHook
除了定期地保存检查点,[`CheckpointHook`](https://github.com/open-mmlab/mmengine/blob/main/mmengine/hooks/checkpoint_hook.py#L18) 提供了其它的可选项例如 `max_keep_ckpts``save_optimizer` 等。用户可以设置 `max_keep_ckpts` 只保存少量的检查点或者通过 `save_optimizer` 决定是否保存优化器的状态。参数的更多细节请参考[此处](https://github.com/open-mmlab/mmengine/blob/main/mmengine/hooks/checkpoint_hook.py#L18)
```python
default_hooks = dict(
checkpoint=dict(
type='CheckpointHook',
interval=1,
max_keep_ckpts=3,
save_optimizer=True))
```
#### LoggerHook
`LoggerHook` 允许设置日志记录间隔。详细介绍可参考[文档](https://github.com/open-mmlab/mmengine/blob/main/mmengine/hooks/logger_hook.py#L19)
```python
default_hooks = dict(logger=dict(type='LoggerHook', interval=50))
```
#### Det3DVisualizationHook
`Det3DVisualizationHook` 使用 `DetLocalVisualizer` 来可视化预测结果,`Det3DLocalVisualizer` 支持不同的后端,例如 `TensorboardVisBackend``WandbVisBackend`(更多细节请参考[文档](https://github.com/open-mmlab/mmengine/blob/main/mmengine/visualization/vis_backend.py))。用户可以添加多个后端来进行可视化,如下所示。
```python
default_hooks = dict(
visualization=dict(type='Det3DVisualizationHook', draw=True))
vis_backends = [dict(type='LocalVisBackend'),
dict(type='TensorboardVisBackend')]
visualizer = dict(
type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
```
.. toctree::
:maxdepth: 3
kitti.md
nuscenes.md
lyft.md
waymo.md
sunrgbd.md
scannet.md
s3dis.md
semantickitti.md
# KITTI 数据集
本页提供了有关在 MMDetection3D 中使用 KITTI 数据集的具体教程。
## 数据准备
您可以在[这里](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d)下载 KITTI 3D 检测数据并解压缩所有 zip 文件。此外,您可以在[这里](https://download.openmmlab.com/mmdetection3d/data/train_planes.zip)下载道路平面信息,其在训练过程中作为一个可选项,用来提高模型的性能。道路平面信息由 [AVOD](https://github.com/kujason/avod) 生成,更多细节请参考[此处](https://github.com/kujason/avod/issues/19)
像准备数据集的一般方法一样,建议将数据集根目录链接到 `$MMDETECTION3D/data`
在我们处理之前,文件夹结构应按如下方式组织:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── kitti
│ │ ├── ImageSets
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ │ │ ├── planes (optional)
```
### 创建 KITTI 数据集
为了创建 KITTI 点云数据,首先需要加载原始的点云数据并生成相关的包含目标标签和标注框的数据标注文件,同时还需要为 KITTI 数据集生成每个单独的训练目标的点云数据,并将其存储在 `data/kitti/kitti_gt_database``.bin` 格式的文件中,此外,需要为训练数据或者验证数据生成 `.pkl` 格式的包含数据信息的文件。随后,通过运行下面的命令来创建最终的 KITTI 数据:
```bash
mkdir ./data/kitti/ && mkdir ./data/kitti/ImageSets
# 下载数据划分
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti --with-plane
```
需要注意的是,如果您的本地磁盘没有充足的存储空间来存储转换后的数据,您可以通过改变 `--out-dir` 来指定其他任意的存储路径。如果您没有准备 `planes` 数据,您需要移除 `--with-plane` 标志。
处理后的文件夹结构应该如下:
```
kitti
├── ImageSets
│ ├── test.txt
│ ├── train.txt
│ ├── trainval.txt
│ ├── val.txt
├── testing
│ ├── calib
│ ├── image_2
│ ├── velodyne
│ ├── velodyne_reduced
├── training
│ ├── calib
│ ├── image_2
│ ├── label_2
│ ├── velodyne
│ ├── velodyne_reduced
│ ├── planes (optional)
├── kitti_gt_database
│ ├── xxxxx.bin
├── kitti_infos_train.pkl
├── kitti_infos_val.pkl
├── kitti_dbinfos_train.pkl
├── kitti_infos_test.pkl
├── kitti_infos_trainval.pkl
```
- `kitti_gt_database/xxxxx.bin`:训练数据集中包含在 3D 标注框中的点云数据。
- `kitti_infos_train.pkl`:训练数据集,该字典包含了两个键值:`metainfo``data_list``metainfo` 包含数据集的基本信息,例如 `categories`, `dataset``info_version``data_list` 是由字典组成的列表,每个字典(以下简称 `info`)包含了单个样本的所有详细信息。
- info\['sample_idx'\]:该样本在整个数据集的索引。
- info\['images'\]:多个相机捕获的图像信息。是一个字典,包含 5 个键值:`CAM0`, `CAM1`, `CAM2`, `CAM3`, `R0_rect`
- info\['images'\]\['R0_rect'\]:校准旋转矩阵,是一个 4x4 数组。
- info\['images'\]\['CAM2'\]:包含 `CAM2` 相机传感器的信息。
- info\['images'\]\['CAM2'\]\['img_path'\]:图像的文件名。
- info\['images'\]\['CAM2'\]\['height'\]:图像的高。
- info\['images'\]\['CAM2'\]\['width'\]:图像的宽。
- info\['images'\]\['CAM2'\]\['cam2img'\]:相机到图像的变换矩阵,是一个 4x4 数组。
- info\['images'\]\['CAM2'\]\['lidar2cam'\]:激光雷达到相机的变换矩阵,是一个 4x4 数组。
- info\['images'\]\['CAM2'\]\['lidar2img'\]:激光雷达到图像的变换矩阵,是一个 4x4 数组。
- info\['lidar_points'\]:是一个字典,包含了激光雷达点相关的信息。
- info\['lidar_points'\]\['lidar_path'\]:激光雷达点云数据的文件名。
- info\['lidar_points'\]\['num_pts_feats'\]:点的特征维度。
- info\['lidar_points'\]\['Tr_velo_to_cam'\]:Velodyne 坐标到相机坐标的变换矩阵,是一个 4x4 数组。
- info\['lidar_points'\]\['Tr_imu_to_velo'\]:IMU 坐标到 Velodyne 坐标的变换矩阵,是一个 4x4 数组。
- info\['instances'\]:是一个字典组成的列表。每个字典包含单个实例的所有标注信息。对于其中的第 i 个实例,我们有:
- info\['instances'\]\[i\]\['bbox'\]:长度为 4 的列表,以 (x1, y1, x2, y2) 的顺序表示实例的 2D 边界框。
- info\['instances'\]\[i\]\['bbox_3d'\]:长度为 7 的列表,以 (x, y, z, l, h, w, yaw) 的顺序表示实例的 3D 边界框。
- info\['instances'\]\[i\]\['bbox_label'\]:是一个整数,表示实例的 2D 标签,-1 代表忽略。
- info\['instances'\]\[i\]\['bbox_label_3d'\]:是一个整数,表示实例的 3D 标签,-1 代表忽略。
- info\['instances'\]\[i\]\['depth'\]:3D 边界框投影到相关图像平面的中心点的深度。
- info\['instances'\]\[i\]\['num_lidar_pts'\]:3D 边界框内的激光雷达点数。
- info\['instances'\]\[i\]\['center_2d'\]:3D 边界框投影的 2D 中心。
- info\['instances'\]\[i\]\['difficulty'\]:KITTI 官方定义的困难度,包括简单、适中、困难。
- info\['instances'\]\[i\]\['truncated'\]:从 0(非截断)到 1(截断)的浮点数,其中截断指的是离开检测图像边界的检测目标。
- info\['instances'\]\[i\]\['occluded'\]:整数 (0,1,2,3) 表示目标的遮挡状态:0 = 完全可见,1 = 部分遮挡,2 = 大面积遮挡,3 = 未知。
- info\['instances'\]\[i\]\['group_ids'\]:用于多部分的物体。
- info\['plane'\](可选):地平面信息。
更多细节请参考 [kitti_converter.py](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/tools/dataset_converters/kitti_converter.py)[update_infos_to_v2.py](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/tools/dataset_converters/update_infos_to_v2.py)
## 训练流程
下面展示了一个使用 KITTI 数据集进行 3D 目标检测的典型流程:
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4, # x, y, z, intensity
use_dim=4),
dict(
type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(type='ObjectSample', db_sampler=db_sampler),
dict(
type='ObjectNoise',
num_try=100,
translation_std=[1.0, 1.0, 0.5],
global_rot_range=[0.0, 0.0],
rot_range=[-0.78539816, 0.78539816]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
- 数据增强:
- `ObjectNoise`:对场景中的每个真实标注框目标添加噪音。
- `RandomFlip3D`:对输入点云数据进行随机地水平翻转或者垂直翻转。
- `GlobalRotScaleTrans`:对输入点云数据进行旋转。
## 评估
使用 8 个 GPU 以及 KITTI 指标评估的 PointPillars 的示例如下:
```shell
bash tools/dist_test.sh configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py work_dirs/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class/latest.pth 8
```
## 度量指标
KITTI 官方使用全类平均精度(mAP)和平均方向相似度(AOS)来评估 3D 目标检测的性能,更多细节请参考[官方网站](http://www.cvlibs.net/datasets/kitti/eval_3dobject.php)[论文](http://www.cvlibs.net/publications/Geiger2012CVPR.pdf)
MMDetection3D 采用相同的方法在 KITTI 数据集上进行评估,下面展示了一个评估结果的例子:
```
Car AP@0.70, 0.70, 0.70:
bbox AP:97.9252, 89.6183, 88.1564
bev AP:90.4196, 87.9491, 85.1700
3d AP:88.3891, 77.1624, 74.4654
aos AP:97.70, 89.11, 87.38
Car AP@0.70, 0.50, 0.50:
bbox AP:97.9252, 89.6183, 88.1564
bev AP:98.3509, 90.2042, 89.6102
3d AP:98.2800, 90.1480, 89.4736
aos AP:97.70, 89.11, 87.38
```
## 测试和提交
使用 8 个 GPU 在 KITTI 上测试 PointPillars 并生成对排行榜的提交的示例如下:
- 首先,你需要在你的配置文件中修改 `test_dataloader``test_evaluator` 字典,如下所示:
```python
data_root = 'data/kitti/'
test_dataloader = dict(
dataset=dict(
ann_file='kitti_infos_test.pkl',
load_eval_anns=False,
data_prefix=dict(pts='testing/velodyne_reduced')))
test_evaluator = dict(
ann_file=data_root + 'kitti_infos_test.pkl',
format_only=True,
pklfile_prefix='results/kitti-3class/kitti_results',
submission_prefix='results/kitti-3class/kitti_results')
```
- 接下来,你可以运行如下测试脚本。
```shell
./tools/dist_test.sh configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py work_dirs/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class/latest.pth 8
```
在生成 `results/kitti-3class/kitti_results/xxxxx.txt` 后,您可以提交这些文件到 KITTI 官方网站进行基准测试,更多细节请参考 [KITTI 官方网站](http://www.cvlibs.net/datasets/kitti/index.php)
# Lyft 数据集
本页提供了有关在 MMDetection3D 中使用 Lyft 数据集的具体教程。
## 准备之前
您可以在[这里](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data)下载 Lyft 3D 检测数据并解压缩所有 zip 文件。
像准备数据集的一般方法一样,建议将数据集根目录链接到 `$MMDETECTION3D/data`
在进行处理之前,文件夹结构应按如下方式组织:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── lyft
│ │ ├── v1.01-train
│ │ │ ├── v1.01-train (train_data)
│ │ │ ├── lidar (train_lidar)
│ │ │ ├── images (train_images)
│ │ │ ├── maps (train_maps)
│ │ ├── v1.01-test
│ │ │ ├── v1.01-test (test_data)
│ │ │ ├── lidar (test_lidar)
│ │ │ ├── images (test_images)
│ │ │ ├── maps (test_maps)
│ │ ├── train.txt
│ │ ├── val.txt
│ │ ├── test.txt
│ │ ├── sample_submission.csv
```
其中 `v1.01-train``v1.01-test` 包含与 nuScenes 数据集相同的元文件,`.txt` 文件包含数据划分的信息。Lyft 不提供训练集和验证集的官方划分方案,因此 MMDetection3D 对不同场景下的不同类别的目标数量进行分析,并提供了一个数据集划分方案。`sample_submission.csv` 是用于提交到 Kaggle 评估服务器的基本文件。需要注意的是,我们遵循了 Lyft 最初的文件夹命名以实现更清楚的文件组织。请将下载下来的原始文件夹按照上述组织结构重新命名。
## 数据准备
组织 Lyft 数据集的方式和组织 nuScenes 的方式相同,首先会生成几乎具有相同结构的 `.pkl` 文件,接着需要重点关注这两个数据集之间的不同点,更多关于数据集信息文件结构的说明请参考 [nuScenes 教程](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/docs/zh_cn/advanced_guides/datasets/nuscenes_det.md)
请通过运行下面的命令来生成 Lyft 的数据集信息文件:
```bash
python tools/create_data.py lyft --root-path ./data/lyft --out-dir ./data/lyft --extra-tag lyft --version v1.01
python tools/data_converter/lyft_data_fixer.py --version v1.01 --root-folder ./data/lyft
```
请注意,上面的第二行命令用于修复损坏的 lidar 数据文件,更多细节请参考此处[讨论](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/discussion/110000)
处理后的文件夹结构应该如下:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── lyft
│ │ ├── v1.01-train
│ │ │ ├── v1.01-train (train_data)
│ │ │ ├── lidar (train_lidar)
│ │ │ ├── images (train_images)
│ │ │ ├── maps (train_maps)
│ │ ├── v1.01-test
│ │ │ ├── v1.01-test (test_data)
│ │ │ ├── lidar (test_lidar)
│ │ │ ├── images (test_images)
│ │ │ ├── maps (test_maps)
│ │ ├── train.txt
│ │ ├── val.txt
│ │ ├── test.txt
│ │ ├── sample_submission.csv
│ │ ├── lyft_infos_train.pkl
│ │ ├── lyft_infos_val.pkl
│ │ ├── lyft_infos_test.pkl
```
- `lyft_infos_train.pkl`:训练数据集信息,该字典包含两个关键字:`metainfo``data_list``metainfo` 包含数据集的基本信息,例如 `categories`, `dataset``info_version``data_list` 是由字典组成的列表,每个字典(以下简称 `info`)包含了单个样本的所有详细信息。
- info\['sample_idx'\]:样本在整个数据集的索引。
- info\['token'\]:样本数据标记。
- info\['timestamp'\]:样本数据时间戳。
- info\['lidar_points'\]:是一个字典,包含了所有与激光雷达点相关的信息。
- info\['lidar_points'\]\['lidar_path'\]:激光雷达点云数据的文件名。
- info\['lidar_points'\]\['num_pts_feats'\]:点的特征维度。
- info\['lidar_points'\]\['lidar2ego'\]:该激光雷达传感器到自车的变换矩阵。(4x4 列表)
- info\['lidar_points'\]\['ego2global'\]:自车到全局坐标的变换矩阵。(4x4 列表)
- info\['lidar_sweeps'\]:是一个列表,包含了扫描信息(没有标注的中间帧)。
- info\['lidar_sweeps'\]\[i\]\['lidar_points'\]\['data_path'\]:第 i 次扫描的激光雷达数据的文件路径。
- info\['lidar_sweeps'\]\[i\]\['lidar_points'\]\[lidar2ego''\]:当前激光雷达传感器到自车在第 i 次扫描的变换矩阵。(4x4 列表)
- info\['lidar_sweeps'\]\[i\]\['lidar_points'\]\['ego2global'\]:自车在第 i 次扫描到全局坐标的变换矩阵。(4x4 列表)
- info\['lidar_sweeps'\]\[i\]\['lidar2sensor'\]:从当前帧主激光雷达到第 i 帧扫描激光雷达的变换矩阵。(4x4 列表)
- info\['lidar_sweeps'\]\[i\]\['timestamp'\]:扫描数据的时间戳。
- info\['lidar_sweeps'\]\[i\]\['sample_data_token'\]:扫描样本数据标记。
- info\['images'\]:是一个字典,包含与每个相机对应的六个键值:`'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_BACK'`, `'CAM_BACK_LEFT'`, `'CAM_BACK_RIGHT'`。每个字典包含了对应相机的所有数据信息。
- info\['images'\]\['CAM_XXX'\]\['img_path'\]:图像的文件名。
- info\['images'\]\['CAM_XXX'\]\['cam2img'\]:当 3D 点投影到图像平面时需要的内参信息相关的变换矩阵。(3x3 列表)
- info\['images'\]\['CAM_XXX'\]\['sample_data_token'\]:图像样本数据标记。
- info\['images'\]\['CAM_XXX'\]\['timestamp'\]:图像的时间戳。
- info\['images'\]\['CAM_XXX'\]\['cam2ego'\]:该相机传感器到自车的变换矩阵。(4x4 列表)
- info\['images'\]\['CAM_XXX'\]\['lidar2cam'\]:激光雷达传感器到该相机的变换矩阵。(4x4 列表)
- info\['instances'\]:是一个字典组成的列表。每个字典包含单个实例的所有标注信息。对于其中的第 i 个实例,我们有:
- info\['instances'\]\[i\]\['bbox_3d'\]:长度为 7 的列表,以 (x, y, z, l, w, h, yaw) 的顺序表示实例在激光雷达坐标系下的 3D 边界框。
- info\['instances'\]\[i\]\['bbox_label_3d'\]:整数从 0 开始表示实例的标签,其中 -1 代表忽略该类别。
- info\['instances'\]\[i\]\['bbox_3d_isvalid'\]:每个包围框是否有效。一般情况下,我们只将包含至少一个激光雷达或雷达点的 3D 框作为有效框。
接下来将详细介绍 Lyft 数据集和 nuScenes 数据集之间的数据集信息文件中的不同点:
- `lyft_database/xxxxx.bin` 文件不存在:由于真实标注框的采样对实验的影响可以忽略不计,在 Lyft 数据集中不会提取该目录和相关的 `.bin` 文件。
- `lyft_infos_train.pkl`
- info\['instances'\]\[i\]\['velocity'\] 不存在:Lyft 数据集中不存在速度评估信息。
- info\['instances'\]\[i\]\['num_lidar_pts'\] 及 info\['instances'\]\[i\]\['num_radar_pts'\] 不存在。
这里仅介绍存储在训练数据文件的数据记录信息。这同样适用于验证集和测试集(没有实例)。
更多关于 `lyft_infos_xxx.pkl` 的结构信息请参考 [lyft_converter.py](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/tools/dataset_converters/lyft_converter.py)
## 训练流程
### 基于 LiDAR 的方法
Lyft 上基于 LiDAR 的 3D 检测(包括多模态方法)的训练流程与 nuScenes 几乎相同,如下所示:
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=5,
use_dim=5),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
与 nuScenes 相似,在 Lyft 上进行训练的模型也需要 `LoadPointsFromMultiSweeps` 步骤来从连续帧中加载点云数据。另外,考虑到 Lyft 中所收集的激光雷达点的强度是无效的,因此将 `LoadPointsFromMultiSweeps` 中的 `use_dim` 默认值设置为 `[0, 1, 2, 4]`,其中前三个维度表示点的坐标,最后一个维度表示时间戳的差异。
## 评估
使用 8 个 GPU 以及 Lyft 指标评估的 PointPillars 的示例如下:
```shell
bash ./tools/dist_test.sh configs/pointpillars/pointpillars_hv_fpn_sbn-all_8xb2-2x_lyft-3d.py checkpoints/hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d_20210517_202818-fc6904c3.pth 8
```
## 度量指标
Lyft 提出了一个更加严格的用以评估所预测的 3D 检测框的度量指标。判断一个预测框是否是正类的基本评判标准和 KITTI 一样,如基于 3D 交并比进行评估,然而,Lyft 采用与 COCO 相似的方式来计算平均精度 -- 计算 3D 交并比在 0.5-0.95 之间的不同阈值下的平均精度。实际上,重叠部分大于 0.7 的 3D 交并比是一项对于 3D 检测方法比较严格的标准,因此整体的性能似乎会偏低。相比于其他数据集,Lyft 上不同类别的标注不平衡是导致最终结果偏低的另一个重要原因。更多关于度量指标的定义请参考[官方网址](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/overview/evaluation)
这里将采用官方方法对 Lyft 进行评估,下面展示了一个评估结果的例子:
```
+mAPs@0.5:0.95------+--------------+
| class | mAP@0.5:0.95 |
+-------------------+--------------+
| animal | 0.0 |
| bicycle | 0.099 |
| bus | 0.177 |
| car | 0.422 |
| emergency_vehicle | 0.0 |
| motorcycle | 0.049 |
| other_vehicle | 0.359 |
| pedestrian | 0.066 |
| truck | 0.176 |
| Overall | 0.15 |
+-------------------+--------------+
```
## 测试和提交
使用 8 个 GPU 在 Lyft 上测试 PointPillars 并生成对排行榜的提交的示例如下:
```shell
./tools/dist_test.sh configs/pointpillars/pointpillars_hv_fpn_sbn-all_8xb2-2x_lyft-3d.py work_dirs/pp-lyft/latest.pth 8 --cfg-options test_evaluator.jsonfile_prefix=work_dirs/pp-lyft/results_challenge test_evaluator.csv_savepath=results/pp-lyft/results_challenge.csv
```
在生成 `work_dirs/pp-lyft/results_challenge.csv`,您可以将生成的文件提交到 Kaggle 评估服务器,请参考[官方网址](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles)获取更多细节。
同时还可以使用可视化工具将预测结果进行可视化,更多细节请参考[可视化文档](https://mmdetection3d.readthedocs.io/zh_CN/latest/useful_tools.html#visualization)
# NuScenes 数据集
本页提供了有关在 MMDetection3D 中使用 nuScenes 数据集的具体教程。
## 准备之前
您可以在[这里](https://www.nuscenes.org/download)下载 nuScenes 3D 检测数据 Full dataset (v1.0) 并解压缩所有 zip 文件。
如果您想进行 3D 语义分割任务,需要额外下载 nuScenes-lidarseg 数据标注,并将解压的文件放入 nuScenes 对应的文件夹下。
**注意**:nuScenes-lidarseg 中的 v1.0trainval(test)/categroy.json 会替换原先 Full dataset (v1.0) 原先的 v1.0trainval(test)/categroy.json,但是不会对 3D 目标检测任务造成影响。
像准备数据集的一般方法一样,建议将数据集根目录链接到 `$MMDETECTION3D/data`
在我们处理之前,文件夹结构应按如下方式组织。
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── nuscenes
│ │ ├── maps
│ │ ├── samples
│ │ ├── sweeps
│ │ ├── lidarseg (optional)
│ │ ├── v1.0-test
| | ├── v1.0-trainval
```
## 数据准备
我们通常需要通过特定样式来使用 `.pkl` 文件组织有用的数据信息。要为 nuScenes 准备这些文件,请运行以下命令:
```bash
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
```
处理后的文件夹结构应该如下。
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── nuscenes
│ │ ├── maps
│ │ ├── samples
│ │ ├── sweeps
│ │ ├── lidarseg (optional)
│ │ ├── v1.0-test
| | ├── v1.0-trainval
│ │ ├── nuscenes_database
│ │ ├── nuscenes_infos_train.pkl
│ │ ├── nuscenes_infos_val.pkl
│ │ ├── nuscenes_infos_test.pkl
│ │ ├── nuscenes_dbinfos_train.pkl
```
- `nuscenes_database/xxxxx.bin`:训练数据集的每个 3D 包围框中包含的点云数据。
- `nuscenes_infos_train.pkl`:训练数据集,该字典包含了两个键值:`metainfo``data_list``metainfo` 包含数据集的基本信息,例如 `categories`, `dataset``info_version``data_list` 是由字典组成的列表,每个字典(以下简称 `info`)包含了单个样本的所有详细信息。
- info\['sample_idx'\]:样本在整个数据集的索引。
- info\['token'\]:样本数据标记。
- info\['timestamp'\]:样本数据时间戳。
- info\['ego2global'\]:自车到全局坐标的变换矩阵。(4x4 列表)
- info\['lidar_points'\]:是一个字典,包含了所有与激光雷达点相关的信息。
- info\['lidar_points'\]\['lidar_path'\]:激光雷达点云数据的文件名。
- info\['lidar_points'\]\['num_pts_feats'\]:点的特征维度。
- info\['lidar_points'\]\['lidar2ego'\]:该激光雷达传感器到自车的变换矩阵。(4x4 列表)
- info\['lidar_sweeps'\]:是一个列表,包含了扫描信息(没有标注的中间帧)。
- info\['lidar_sweeps'\]\[i\]\['lidar_points'\]\['data_path'\]:第 i 次扫描的激光雷达数据的文件路径。
- info\['lidar_sweeps'\]\[i\]\['lidar_points'\]\[lidar2ego''\]:当前激光雷达传感器到自车的变换矩阵。(4x4 列表)
- info\['lidar_sweeps'\]\[i\]\['lidar_points'\]\['ego2global'\]:自车到全局坐标的变换矩阵。(4x4 列表)
- info\['lidar_sweeps'\]\[i\]\['lidar2sensor'\]:从主激光雷达传感器到当前传感器(用于收集扫描数据)的变换矩阵。(4x4 列表)
- info\['lidar_sweeps'\]\[i\]\['timestamp'\]:扫描数据的时间戳。
- info\['lidar_sweeps'\]\[i\]\['sample_data_token'\]:扫描样本数据标记。
- info\['images'\]:是一个字典,包含与每个相机对应的六个键值:`'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_BACK'`, `'CAM_BACK_LEFT'`, `'CAM_BACK_RIGHT'`。每个字典包含了对应相机的所有数据信息。
- info\['images'\]\['CAM_XXX'\]\['img_path'\]:图像的文件名。
- info\['images'\]\['CAM_XXX'\]\['cam2img'\]:当 3D 点投影到图像平面时需要的内参信息相关的变换矩阵。(3x3 列表)
- info\['images'\]\['CAM_XXX'\]\['sample_data_token'\]:图像样本数据标记。
- info\['images'\]\['CAM_XXX'\]\['timestamp'\]:图像的时间戳。
- info\['images'\]\['CAM_XXX'\]\['cam2ego'\]:该相机传感器到自车的变换矩阵。(4x4 列表)
- info\['images'\]\['CAM_XXX'\]\['lidar2cam'\]:激光雷达传感器到该相机的变换矩阵。(4x4 列表)
- info\['instances'\]:是一个字典组成的列表。每个字典包含单个实例的所有标注信息。对于其中的第 i 个实例,我们有:
- info\['instances'\]\[i\]\['bbox_3d'\]:长度为 7 的列表,以 (x, y, z, l, w, h, yaw) 的顺序表示实例的 3D 边界框。
- info\['instances'\]\[i\]\['bbox_label_3d'\]:整数表示实例的标签,-1 代表忽略。
- info\['instances'\]\[i\]\['velocity'\]:3D 边界框的速度(由于不正确,没有垂直测量),大小为 (2, ) 的列表。
- info\['instances'\]\[i\]\['num_lidar_pts'\]:每个 3D 边界框内包含的激光雷达点数。
- info\['instances'\]\[i\]\['num_radar_pts'\]:每个 3D 边界框内包含的雷达点数。
- info\['instances'\]\[i\]\['bbox_3d_isvalid'\]:每个包围框是否有效。一般情况下,我们只将包含至少一个激光雷达或雷达点的 3D 框作为有效框。
- info\['cam_instances'\]:是一个字典,包含以下键值:`'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_BACK'`, `'CAM_BACK_LEFT'`, `'CAM_BACK_RIGHT'`。对于基于视觉的 3D 目标检测任务,我们将整个场景的 3D 标注划分至它们所属于的相应相机中。对于其中的第 i 个实例,我们有:
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_label'\]:实例标签。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_label_3d'\]:实例标签。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox'\]:2D 边界框标注(3D 框投影的矩形框),顺序为 \[x1, y1, x2, y2\] 的列表。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['center_2d'\]:3D 框投影到图像上的中心点,大小为 (2, ) 的列表。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['depth'\]:3D 框投影中心的深度。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['velocity'\]:3D 边界框的速度(由于不正确,没有垂直测量),大小为 (2, ) 的列表。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['attr_label'\]:实例的属性标签。我们为属性分类维护了一个属性集合和映射。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_3d'\]:长度为 7 的列表,以 (x, y, z, l, h, w, yaw) 的顺序表示实例的 3D 边界框。
- info\['pts_semantic_mask_path'\]:激光雷达语义分割标注的文件名。
注意:
1. `instances``cam_instances``bbox_3d` 的区别。`bbox_3d` 都被转换到 MMDet3D 定义的坐标系下,`instances` 中的 `bbox_3d` 是在激光雷达坐标系下,而 `cam_instances` 是在相机坐标系下。注意它们 3D 框中表示的不同('l, w, h' 和 'l, h, w')。
2. 这里我们只解释训练信息文件中记录的数据。这同样适用于验证集和测试集(测试集的 `.pkl` 文件中不包含 `instances` 以及 `cam_instances`)。
获取 `nuscenes_infos_xxx.pkl` 的核心函数为 [\_fill_trainval_infos](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/tools/dataset_converters/nuscenes_converter.py#L146)。更多细节请参考 [nuscenes_converter.py](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/tools/dataset_converters/nuscenes_converter.py)
## 训练流程
### 基于 LiDAR 的方法
nuScenes 上基于 LiDAR 的 3D 检测(包括多模态方法)的典型训练流程如下。
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=5,
use_dim=5),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
与一般情况相比,nuScenes 有一个特定的 `'LoadPointsFromMultiSweeps'` 流水线来从连续帧加载点云。这是此设置中使用的常见做法。更多细节请参考 nuScenes [原始论文](https://arxiv.org/abs/1903.11027)`'LoadPointsFromMultiSweeps'` 中的默认 `use_dim``[0, 1, 2, 4]`,其中前 3 个维度是指点坐标,最后一个是指时间戳差异。由于在拼接来自不同帧的点时使用点云的强度信息会产生噪声,因此默认情况下不使用点云的强度信息。
### 基于视觉的方法
#### 基于单目方法
在NuScenes数据集中,对于多视角图像,单目检测范式通常由针对每张图像检测和输出 3D 检测结果以及通过后处理(例如 NMS )得到最终检测结果两步组成。从本质上来说,这种范式直接将单目 3D 检测扩展到多视角任务。NuScenes 上基于图像的 3D 检测的典型训练流水线如下。
```python
train_pipeline = [
dict(type='LoadImageFromFileMono3D'),
dict(
type='LoadAnnotations3D',
with_bbox=True,
with_label=True,
with_attr_label=True,
with_bbox_3d=True,
with_label_3d=True,
with_bbox_depth=True),
dict(type='mmdet.Resize', scale=(1600, 900), keep_ratio=True),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='Pack3DDetInputs',
keys=[
'img', 'gt_bboxes', 'gt_bboxes_labels', 'attr_labels', 'gt_bboxes_3d',
'gt_labels_3d', 'centers_2d', 'depths'
]),
]
```
它遵循 2D 检测的一般流水线,但在一些细节上有所不同:
- 它使用单目流水线加载图像,其中包括额外的必需信息,如相机内参矩阵。
- 它需要加载 3D 标注。
- 一些数据增强技术需要调整,例如`RandomFlip3D`。目前我们不支持更多的增强方法,因为如何迁移和应用其他技术仍在探索中。
#### 基于BEV方法
鸟瞰图,BEV(Bird's-Eye-View),是另一种常用的 3D 检测范式。它直接利用多个视角图像进行 3D 检测。对于 NuScenes 数据集而言,这些视角包括前方`CAM_FRONT`、左前方`CAM_FRONT_LEFT`、右前方`CAM_FRONT_RIGHT`、后方`CAM_BACK`、左后方`CAM_BACK_LEFT`、右后方`CAM_BACK_RIGHT`。一个基本的用于 BEV 方法的流水线如下。
```python
class_names = [
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
train_transforms = [
dict(type='PhotoMetricDistortion3D'),
dict(
type='RandomResize3D',
scale=(1600, 900),
ratio_range=(1., 1.),
keep_ratio=True)
]
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles',
to_float32=True,
num_views=6, ),
dict(type='LoadAnnotations3D',
with_bbox_3d=True,
with_label_3d=True,
with_attr_label=False),
# 可选,数据增强
dict(type='MultiViewWrapper', transforms=train_transforms),
# 可选, 筛选特定点云范围内物体
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
# 可选, 筛选特定类别物体
dict(type='ObjectNameFilter', classes=class_names),
dict(type='Pack3DDetInputs', keys=['img', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
为了读取多个视角的图像,数据集也应进行相应微调。
```python
data_prefix = dict(
CAM_FRONT='samples/CAM_FRONT',
CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
CAM_BACK='samples/CAM_BACK',
CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
)
train_dataloader = dict(
batch_size=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type="NuScenesDataset",
data_root="./data/nuScenes",
ann_file="nuscenes_infos_train.pkl",
data_prefix=data_prefix,
modality=dict(use_camera=True, use_lidar=False, ),
pipeline=train_pipeline,
test_mode=False, )
)
```
## 评估
使用 8 个 GPU 以及 nuScenes 指标评估的 PointPillars 的示例如下
```shell
bash ./tools/dist_test.sh configs/pointpillars/pointpillars_hv_fpn_sbn-all_8xb4-2x_nus-3d.py checkpoints/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d_20200620_230405-2fa62f3d.pth 8
```
## 指标
NuScenes 提出了一个综合指标,即 nuScenes 检测分数(NDS),以评估不同的方法并设置基准测试。它由平均精度(mAP)、平均平移误差(ATE)、平均尺度误差(ASE)、平均方向误差(AOE)、平均速度误差(AVE)和平均属性误差(AAE)组成。更多细节请参考其[官方网站](https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any)
我们也采用这种方法对 nuScenes 进行评估。打印的评估结果示例如下:
```
mAP: 0.3197
mATE: 0.7595
mASE: 0.2700
mAOE: 0.4918
mAVE: 1.3307
mAAE: 0.1724
NDS: 0.3905
Eval time: 170.8s
Per-class results:
Object Class AP ATE ASE AOE AVE AAE
car 0.503 0.577 0.152 0.111 2.096 0.136
truck 0.223 0.857 0.224 0.220 1.389 0.179
bus 0.294 0.855 0.204 0.190 2.689 0.283
trailer 0.081 1.094 0.243 0.553 0.742 0.167
construction_vehicle 0.058 1.017 0.450 1.019 0.137 0.341
pedestrian 0.392 0.687 0.284 0.694 0.876 0.158
motorcycle 0.317 0.737 0.265 0.580 2.033 0.104
bicycle 0.308 0.704 0.299 0.892 0.683 0.010
traffic_cone 0.555 0.486 0.309 nan nan nan
barrier 0.466 0.581 0.269 0.169 nan nan
```
## 测试和提交
使用 8 个 GPU 在 nuScenes 上测试 PointPillars 并生成对排行榜的提交的示例如下
你需要在对应的配置文件中的 `test_evaluator` 里修改 `jsonfile_prefix`。举个例子,添加 `test_evaluator = dict(type='NuScenesMetric', jsonfile_prefix='work_dirs/pp-nus/results_eval.json')` 或在测试命令后使用 `--cfg-options "test_evaluator.jsonfile_prefix=work_dirs/pp-nus/results_eval.json)`
```shell
./tools/dist_test.sh configs/pointpillars/pointpillars_hv_fpn_sbn-all_8xb4-2x_nus-3d.py work_dirs/pp-nus/latest.pth 8 --cfg-options 'test_evaluator.jsonfile_prefix=work_dirs/pp-nus/results_eval'
```
请注意,在[这里](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/_base_/datasets/nus-3d.py#L132)测试信息应更改为测试集而不是验证集。
生成 `work_dirs/pp-nus/results_eval.json` 后,您可以压缩并提交给 nuScenes 基准测试。更多信息请参考 [nuScenes 官方网站](https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any)
我们还可以使用我们开发的可视化工具将预测结果可视化。更多细节请参考[可视化文档](https://mmdetection3d.readthedocs.io/zh_CN/latest/useful_tools.html#id2)
## 注意
### `NuScenesBox` 和我们的 `CameraInstanceBoxes` 之间的转换。
总的来说,`NuScenesBox` 和我们的 `CameraInstanceBoxes` 的主要区别主要体现在转向角(yaw)定义上。 `NuScenesBox` 定义了一个四元数或三个欧拉角的旋转,而我们的由于实际情况只定义了一个转向角(yaw),它需要我们在预处理和后处理中手动添加一些额外的旋转,例如[这里](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/nuscenes_mono_dataset.py#L673)
另外,请注意,角点和位置的定义在 `NuScenesBox` 中是分离的。例如,在单目 3D 检测中,框位置的定义在其相机坐标中(有关汽车设置,请参阅其官方[插图](https://www.nuscenes.org/nuscenes#data-collection)),即与[我们的](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py)一致。相比之下,它的角点是通过[惯例](https://github.com/nutonomy/nuscenes-devkit/blob/02e9200218977193a1058dd7234f935834378319/python-sdk/nuscenes/utils/data_classes.py#L527) 定义的,“x 向前, y 向左, z 向上”。它导致了与我们的 `CameraInstanceBoxes` 不同的维度和旋转定义理念。一个移除相似冲突的例子是 PR [#744](https://github.com/open-mmlab/mmdetection3d/pull/744)。同样的问题也存在于 LiDAR 系统中。为了解决它们,我们通常会在预处理和后处理中添加一些转换,以保证在整个训练和推理过程中框都在我们的坐标系系统里。
# S3DIS 数据集
## 数据集的准备
对于数据集准备的整体流程,请参考 S3DIS 的[指南](https://github.com/open-mmlab/mmdetection3d/blob/master/data/s3dis/README.md/)
### 提取 S3DIS 数据
通过从原始数据中提取 S3DIS 数据,我们将点云数据读取并保存下相关的标注信息,例如语义分割标签和实例分割标签。
数据提取前的目录结构应该如下所示:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── s3dis
│ │ ├── meta_data
│ │ ├── Stanford3dDataset_v1.2_Aligned_Version
│ │ │ ├── Area_1
│ │ │ │ ├── conferenceRoom_1
│ │ │ │ ├── office_1
│ │ │ │ ├── ...
│ │ │ ├── Area_2
│ │ │ ├── Area_3
│ │ │ ├── Area_4
│ │ │ ├── Area_5
│ │ │ ├── Area_6
│ │ ├── indoor3d_util.py
│ │ ├── collect_indoor3d_data.py
│ │ ├── README.md
```
`Stanford3dDataset_v1.2_Aligned_Version` 目录下,所有房间依据所属区域被分为 6 组。
我们通常使用 5 个区域进行训练,然后在余下 1 个区域上进行测试 (被余下的 1 个区域通常为区域 5)。
在每个区域的目录下包含有多个房间的文件夹,每个文件夹是一个房间的原始点云数据和相关的标注信息。
例如,在 `Area_1/office_1` 目录下的文件如下所示:
- `office_1.txt`:一个 txt 文件存储着原始点云数据每个点的坐标和颜色信息。
- `Annotations/`:这个文件夹里包含有此房间中实例物体的信息 (以 txt 文件的形式存储)。每个 txt 文件表示一个实例,例如:
- `chair_1.txt`:存储有该房间中一把椅子的点云数据。
如果我们将 `Annotations/` 下的所有 txt 文件合并起来,得到的点云就和 `office_1.txt` 中的点云是一致的。
你可以通过 `python collect_indoor3d_data.py` 指令进行 S3DIS 数据的提取。
主要步骤包括:
- 从原始 txt 文件中读取点云数据、语义分割标签和实例分割标签。
- 将点云数据和相关标注文件存储下来。
这其中的核心函数 `indoor3d_util.py` 中的 `export` 函数实现如下:
```python
def export(anno_path, out_filename):
"""将原始数据集的文件转化为点云、语义分割标签和实例分割掩码文件。
我们将同一房间中所有实例的点进行聚合。
参数列表:
anno_path (str): 标注信息的路径,例如 Area_1/office_2/Annotations/
out_filename (str): 保存点云和标签的路径
file_format (str): txt 或 numpy,指定保存的文件格式
注意:
点云在处理过程中被整体移动了,保存下的点最小位于原点 (即没有负数坐标值)
"""
points_list = []
ins_idx = 1 # 实例标签从 1 开始,因此最终实例标签为 0 的点就是无标注的点
# `anno_path` 的一个例子:Area_1/office_1/Annotations
# 其中以 txt 文件存储有该房间中所有实例物体的点云
for f in glob.glob(osp.join(anno_path, '*.txt')):
# get class name of this instance
one_class = osp.basename(f).split('_')[0]
if one_class not in class_names: # 某些房间有 'staris' 类物体
one_class = 'clutter'
points = np.loadtxt(f)
labels = np.ones((points.shape[0], 1)) * class2label[one_class]
ins_labels = np.ones((points.shape[0], 1)) * ins_idx
ins_idx += 1
points_list.append(np.concatenate([points, labels, ins_labels], 1))
data_label = np.concatenate(points_list, 0) # [N, 8], (pts, rgb, sem, ins)
# 将点云对齐到原点
xyz_min = np.amin(data_label, axis=0)[0:3]
data_label[:, 0:3] -= xyz_min
np.save(f'{out_filename}_point.npy', data_label[:, :6].astype(np.float32))
np.save(f'{out_filename}_sem_label.npy', data_label[:, 6].astype(np.int64))
np.save(f'{out_filename}_ins_label.npy', data_label[:, 7].astype(np.int64))
```
上述代码中,我们读取 `Annotations/` 下的所有点云实例,将其合并得到整体房屋的点云,同时生成语义/实例分割的标签。
在提取完每个房间的数据后,点云、语义分割和实例分割的标签文件应以 `.npy` 的格式被保存下来。
### 创建数据集
```shell
python tools/create_data.py s3dis --root-path ./data/s3dis \
--out-dir ./data/s3dis --extra-tag s3dis
```
上述指令首先读取以 `.npy` 格式存储的点云、语义分割和实例分割标签文件,然后进一步将它们以 `.bin` 格式保存。
同时,每个区域 `.pkl` 格式的信息文件也会被保存下来。
数据预处理后的目录结构如下所示:
```
s3dis
├── meta_data
├── indoor3d_util.py
├── collect_indoor3d_data.py
├── README.md
├── Stanford3dDataset_v1.2_Aligned_Version
├── s3dis_data
├── points
│ ├── xxxxx.bin
├── instance_mask
│ ├── xxxxx.bin
├── semantic_mask
│ ├── xxxxx.bin
├── seg_info
│ ├── Area_1_label_weight.npy
│ ├── Area_1_resampled_scene_idxs.npy
│ ├── Area_2_label_weight.npy
│ ├── Area_2_resampled_scene_idxs.npy
│ ├── Area_3_label_weight.npy
│ ├── Area_3_resampled_scene_idxs.npy
│ ├── Area_4_label_weight.npy
│ ├── Area_4_resampled_scene_idxs.npy
│ ├── Area_5_label_weight.npy
│ ├── Area_5_resampled_scene_idxs.npy
│ ├── Area_6_label_weight.npy
│ ├── Area_6_resampled_scene_idxs.npy
├── s3dis_infos_Area_1.pkl
├── s3dis_infos_Area_2.pkl
├── s3dis_infos_Area_3.pkl
├── s3dis_infos_Area_4.pkl
├── s3dis_infos_Area_5.pkl
├── s3dis_infos_Area_6.pkl
```
- `points/xxxxx.bin`:提取的点云数据。
- `instance_mask/xxxxx.bin`:每个点云的实例标签,取值范围为 \[0, ${实例个数}\],其中 0 代表未标注的点。
- `semantic_mask/xxxxx.bin`:每个点云的语义标签,取值范围为 \[0, 12\]
- `s3dis_infos_Area_1.pkl`:区域 1 的数据信息,每个房间的详细信息如下:
- info\['point_cloud'\]: {'num_features': 6, 'lidar_idx': sample_idx}.
- info\['pts_path'\]: `points/xxxxx.bin` 点云的路径。
- info\['pts_instance_mask_path'\]: `instance_mask/xxxxx.bin` 实例标签的路径。
- info\['pts_semantic_mask_path'\]: `semantic_mask/xxxxx.bin` 语义标签的路径。
- `seg_info`:为支持语义分割任务所生成的信息文件。
- `Area_1_label_weight.npy`:每一语义类别的权重系数。因为 S3DIS 中属于不同类的点的数量相差很大,一个常见的操作是在计算损失时对不同类别进行加权 (label re-weighting) 以得到更好的分割性能。
- `Area_1_resampled_scene_idxs.npy`:每一个场景 (房间) 的重采样标签。在训练过程中,我们依据每个场景的点的数量,会对其进行不同次数的重采样,以保证训练数据均衡。
## 训练流程
S3DIS 上 3D 语义分割的一种典型数据载入流程如下所示:
```python
class_names = ('ceiling', 'floor', 'wall', 'beam', 'column', 'window', 'door',
'table', 'chair', 'sofa', 'bookcase', 'board', 'clutter')
num_points = 4096
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=False,
use_color=True,
load_dim=6,
use_dim=[0, 1, 2, 3, 4, 5]),
dict(
type='LoadAnnotations3D',
with_bbox_3d=False,
with_label_3d=False,
with_mask_3d=False,
with_seg_3d=True),
dict(
type='PointSegClassMapping'),
dict(
type='IndoorPatchPointSample',
num_points=num_points,
block_size=1.0,
ignore_index=None,
use_normalized_coord=True,
enlarge_size=None,
min_unique_num=num_points // 4,
eps=0.0),
dict(type='NormalizePointsColor', color_mean=None),
dict(
type='GlobalRotScaleTrans',
rot_range=[-3.141592653589793, 3.141592653589793], # [-pi, pi]
scale_ratio_range=[0.8, 1.2],
translation_std=[0, 0, 0]),
dict(
type='RandomJitterPoints',
jitter_std=[0.01, 0.01, 0.01],
clip_range=[-0.05, 0.05]),
dict(type='RandomDropPointsColor', drop_ratio=0.2),
dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
]
```
- `PointSegClassMapping`:在训练过程中,只有被使用的类别的序号会被映射到类似 \[0, 13) 范围内的类别标签。其余的类别序号会被转换为 `ignore_index` 所制定的忽略标签,在本例中是 `13`
- `IndoorPatchPointSample`:从输入点云中裁剪一个含有固定数量点的小块 (patch)。`block_size` 指定了裁剪块的边长,在 S3DIS 上这个数值一般设置为 `1.0`
- `NormalizePointsColor`:将输入点的颜色信息归一化,通过将 RGB 值除以 `255` 来实现。
- 数据增广:
- `GlobalRotScaleTrans`:对输入点云进行随机旋转和放缩变换。
- `RandomJitterPoints`:通过对每一个点施加不同的噪声向量以实现对点云的随机扰动。
- `RandomDropPointsColor`:以 `drop_ratio` 的概率随机将点云的颜色值全部置零。
## 度量指标
通常我们使用平均交并比 (mean Intersection over Union, mIoU) 作为 S3DIS 语义分割任务的度量指标。
具体而言,我们先计算所有类别的 IoU,然后取平均值作为 mIoU。
更多实现细节请参考 [seg_eval.py](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/functional/seg_eval.py)
正如在 `提取 S3DIS 数据` 一节中所提及的,S3DIS 通常在 5 个区域上进行训练,然后在余下的 1 个区域上进行测试。但是在其他论文中,也有不同的划分方式。
为了便于灵活划分训练和测试的子集,我们首先定义子数据集 (sub-dataset) 来表示每一个区域,然后根据区域划分对其进行合并,以得到完整的训练集。
以下是在区域 1、2、3、4、6 上训练并在区域 5 上测试的一个配置文件例子:
```python
dataset_type = 'S3DISSegDataset'
data_root = './data/s3dis/'
class_names = ('ceiling', 'floor', 'wall', 'beam', 'column', 'window', 'door',
'table', 'chair', 'sofa', 'bookcase', 'board', 'clutter')
train_area = [1, 2, 3, 4, 6]
test_area = 5
train_dataloader = dict(
batch_size=8,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_files=[f's3dis_infos_Area_{i}.pkl' for i in train_area],
metainfo=metainfo,
data_prefix=data_prefix,
pipeline=train_pipeline,
modality=input_modality,
ignore_index=len(class_names),
scene_idxs=[
f'seg_info/Area_{i}_resampled_scene_idxs.npy' for i in train_area
],
test_mode=False))
test_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_files=f's3dis_infos_Area_{test_area}.pkl',
metainfo=metainfo,
data_prefix=data_prefix,
pipeline=test_pipeline,
modality=input_modality,
ignore_index=len(class_names),
scene_idxs=f'seg_info/Area_{test_area}_resampled_scene_idxs.npy',
test_mode=True))
val_dataloader = test_dataloader
```
可以看到,我们通过将多个相应路径构成的列表 (list) 输入 `ann_files``scene_idxs` 以实现训练测试集的划分。
如果修改训练测试区域的划分,只需要简单修改 `train_area``test_area` 即可。
# ScanNet 数据集
MMDetection3D 支持在 ScanNet 数据集上进行 3D 目标检测\\语义分割 任务。本页提供了有关在 MMDetection3D 中使用 ScanNet 数据集的具体教程。
## 数据集准备
请参考 ScanNet 的[指南](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md)以查看总体流程。
### 提取 ScanNet 点云数据
通过提取 ScanNet 数据,我们加载原始点云文件,并生成包括语义标签、实例标签和真实物体包围框在内的相关标注。
```shell
python batch_load_scannet_data.py
```
数据处理之前的文件目录结构如下:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── scannet
│ │ ├── meta_data
│ │ ├── scans
│ │ │ ├── scenexxxx_xx
│ │ ├── batch_load_scannet_data.py
│ │ ├── load_scannet_data.py
│ │ ├── scannet_utils.py
│ │ ├── README.md
```
`scans` 文件夹下总共有 1201 个训练样本文件夹和 312 个验证样本文件夹,其中存有未处理的点云数据和相关的标注。比如说,在文件夹 `scene0001_01` 下文件是这样组织的:
- `scene0001_01_vh_clean_2.ply`:存有每个顶点坐标和颜色的网格文件。网格的顶点被直接用作未处理的点云数据。
- `scene0001_01.aggregation.json`:包含物体 ID、分割部分 ID、标签的标注文件。
- `scene0001_01_vh_clean_2.0.010000.segs.json`:包含分割部分 ID 和顶点的分割标注文件。
- `scene0001_01.txt`:包括对齐矩阵等的元文件。
- `scene0001_01_vh_clean_2.labels.ply`:包含每个顶点类别的标注文件。
通过运行 `python batch_load_scannet_data.py` 来提取 ScanNet 数据的处理过程主要包含以下几步:
- 从原始文件中提取出点云、实例标签、语义标签和包围框标签文件。
- 下采样原始点云并过滤掉不合法的类别。
- 保存处理后的点云数据和相关的标注文件。
`load_scannet_data.py` 中的核心函数 `export` 如下:
```python
def export(mesh_file,
agg_file,
seg_file,
meta_file,
label_map_file,
output_file=None,
test_mode=False):
# 标签映射文件:./data/scannet/meta_data/scannetv2-labels.combined.tsv
# 该标签映射文件中有多种标签标准,比如 'nyu40id'
label_map = scannet_utils.read_label_mapping(
label_map_file, label_from='raw_category', label_to='nyu40id')
# 加载原始点云数据,特征包括6维:XYZRGB
mesh_vertices = scannet_utils.read_mesh_vertices_rgb(mesh_file)
# 加载场景坐标轴对齐矩阵:一个 4x4 的变换矩阵
# 将传感器坐标系下的原始点转化到另一个坐标系下
# 该坐标系与房屋的两边平行(也就是与坐标轴平行)
lines = open(meta_file).readlines()
# 测试集的数据没有对齐矩阵
axis_align_matrix = np.eye(4)
for line in lines:
if 'axisAlignment' in line:
axis_align_matrix = [
float(x)
for x in line.rstrip().strip('axisAlignment = ').split(' ')
]
break
axis_align_matrix = np.array(axis_align_matrix).reshape((4, 4))
# 对网格顶点进行全局的对齐
pts = np.ones((mesh_vertices.shape[0], 4))
# 同种类坐标下的原始点云,每一行的数据是 [x, y, z, 1]
pts[:, 0:3] = mesh_vertices[:, 0:3]
# 将原始网格顶点转换为对齐后的顶点
pts = np.dot(pts, axis_align_matrix.transpose()) # Nx4
aligned_mesh_vertices = np.concatenate([pts[:, 0:3], mesh_vertices[:, 3:]],
axis=1)
# 加载语义与实例标签
if not test_mode:
# 每个物体都有一个语义标签,并且包含几个分割部分
object_id_to_segs, label_to_segs = read_aggregation(agg_file)
# 很多点属于同一分割部分
seg_to_verts, num_verts = read_segmentation(seg_file)
label_ids = np.zeros(shape=(num_verts), dtype=np.uint32)
object_id_to_label_id = {}
for label, segs in label_to_segs.items():
label_id = label_map[label]
for seg in segs:
verts = seg_to_verts[seg]
# 每个点都有一个语义标签
label_ids[verts] = label_id
instance_ids = np.zeros(
shape=(num_verts), dtype=np.uint32) # 0:未标注的
for object_id, segs in object_id_to_segs.items():
for seg in segs:
verts = seg_to_verts[seg]
# object_id 从 1 开始计数,比如 1,2,3,.,,,.NUM_INSTANCES
# 每个点都属于一个物体
instance_ids[verts] = object_id
if object_id not in object_id_to_label_id:
object_id_to_label_id[object_id] = label_ids[verts][0]
# 包围框格式为 [x, y, z, dx, dy, dz, label_id]
# [x, y, z] 是包围框的重力中心, [dx, dy, dz] 是与坐标轴平行的
# [label_id] 是 'nyu40id' 标准下的语义标签
# 注意:因为三维包围框是与坐标轴平行的,所以旋转角是 0
unaligned_bboxes = extract_bbox(mesh_vertices, object_id_to_segs,
object_id_to_label_id, instance_ids)
aligned_bboxes = extract_bbox(aligned_mesh_vertices, object_id_to_segs,
object_id_to_label_id, instance_ids)
...
return mesh_vertices, label_ids, instance_ids, unaligned_bboxes, \
aligned_bboxes, object_id_to_label_id, axis_align_matrix
```
在从每个场景的扫描文件提取数据后,如果原始点云点数过多,可以将其下采样(比如到 50000 个点),但在三维语义分割任务中,点云不会被下采样。此外,在 `nyu40id` 标准之外的不合法语义标签或者可选的 `DONOT CARE` 类别标签应被过滤。最终,点云文件、语义标签、实例标签和真实物体的集合应被存储于 `.npy` 文件中。
### 提取 ScanNet RGB 色彩数据(可选的)
通过提取 ScanNet RGB 色彩数据,对于每个场景我们加载 RGB 图像与配套 4x4 位姿矩阵、单个 4x4 相机内参矩阵的集合。请注意,这一步是可选的,除非要运行多视图物体检测,否则可以略去这步。
```shell
python extract_posed_images.py
```
1201 个训练样本,312 个验证样本和 100 个测试样本中的每一个都包含一个单独的 `.sens` 文件。比如说,对于场景 `0001_01` 我们有 `data/scannet/scans/scene0001_01/0001_01.sens`。对于这个场景所有图像和位姿数据都被提取至 `data/scannet/posed_images/scene0001_01`。具体来说,该文件夹下会有 300 个 xxxxx.jpg 格式的图像数据,300 个 xxxxx.txt 格式的相机位姿数据和一个单独的 `intrinsic.txt` 内参文件。通常来说,一个场景包含数千张图像。默认情况下,我们只会提取其中的 300 张,从而只占用少于 100 Gb 的空间。要想提取更多图像,请使用 `--max-images-per-scene` 参数。
### 创建数据集
```shell
python tools/create_data.py scannet --root-path ./data/scannet \
--out-dir ./data/scannet --extra-tag scannet
```
上述提取的点云文件,语义类别标注文件,和物体实例标注文件被进一步以 `.bin` 格式保存。与此同时 `.pkl` 格式的文件被生成并用于训练和验证。获取数据信息的核心函数 `process_single_scene` 如下:
```python
def process_single_scene(sample_idx):
# 分别以 .bin 格式保存点云文件,语义类别标注文件和物体实例标注文件
# 获取 info['pts_path'],info['pts_instance_mask_path'] 和 info['pts_semantic_mask_path']
...
# 获取标注
if has_label:
annotations = {}
# 包围框的形状为 [k, 6 + class]
aligned_box_label = self.get_aligned_box_label(sample_idx)
unaligned_box_label = self.get_unaligned_box_label(sample_idx)
annotations['gt_num'] = aligned_box_label.shape[0]
if annotations['gt_num'] != 0:
aligned_box = aligned_box_label[:, :-1] # k, 6
unaligned_box = unaligned_box_label[:, :-1]
classes = aligned_box_label[:, -1] # k
annotations['name'] = np.array([
self.label2cat[self.cat_ids2class[classes[i]]]
for i in range(annotations['gt_num'])
])
# 为了向后兼容,默认的参数名赋予了与坐标轴平行的包围框
# 我们同时保存了对应的与坐标轴不平行的包围框的信息
annotations['location'] = aligned_box[:, :3]
annotations['dimensions'] = aligned_box[:, 3:6]
annotations['gt_boxes_upright_depth'] = aligned_box
annotations['unaligned_location'] = unaligned_box[:, :3]
annotations['unaligned_dimensions'] = unaligned_box[:, 3:6]
annotations[
'unaligned_gt_boxes_upright_depth'] = unaligned_box
annotations['index'] = np.arange(
annotations['gt_num'], dtype=np.int32)
annotations['class'] = np.array([
self.cat_ids2class[classes[i]]
for i in range(annotations['gt_num'])
])
axis_align_matrix = self.get_axis_align_matrix(sample_idx)
annotations['axis_align_matrix'] = axis_align_matrix # 4x4
info['annos'] = annotations
return info
```
如上数据处理后,文件目录结构应如下:
```
scannet
├── meta_data
├── batch_load_scannet_data.py
├── load_scannet_data.py
├── scannet_utils.py
├── README.md
├── scans
├── scans_test
├── scannet_instance_data
├── points
│ ├── xxxxx.bin
├── instance_mask
│ ├── xxxxx.bin
├── semantic_mask
│ ├── xxxxx.bin
├── seg_info
│ ├── train_label_weight.npy
│ ├── train_resampled_scene_idxs.npy
│ ├── val_label_weight.npy
│ ├── val_resampled_scene_idxs.npy
├── posed_images
│ ├── scenexxxx_xx
│ │ ├── xxxxxx.txt
│ │ ├── xxxxxx.jpg
│ │ ├── intrinsic.txt
├── scannet_infos_train.pkl
├── scannet_infos_val.pkl
├── scannet_infos_test.pkl
```
- `points/xxxxx.bin`:下采样后,未与坐标轴平行(即没有对齐)的点云。因为 ScanNet 3D 检测任务将与坐标轴平行的点云作为输入,而 ScanNet 3D 语义分割任务将对齐前的点云作为输入,我们选择存储对齐前的点云和它们的对齐矩阵。请注意:在 3D 检测的预处理流程 [`GlobalAlignment`](https://github.com/open-mmlab/mmdetection3d/blob/9f0b01caf6aefed861ef4c3eb197c09362d26b32/mmdet3d/datasets/pipelines/transforms_3d.py#L423) 后,点云就都是与坐标轴平行的了。
- `instance_mask/xxxxx.bin`:每个点的实例标签,值的范围为:\[0, NUM_INSTANCES\],其中 0 表示没有标注。
- `semantic_mask/xxxxx.bin`:每个点的语义标签,值的范围为:\[1, 40\], 也就是 `nyu40id` 的标准。请注意:在训练流程 `PointSegClassMapping` 中,`nyu40id` 的 ID 会被映射到训练 ID。
- `seg_info`:为支持语义分割任务所生成的信息文件。
- `train_label_weight.npy`:每一语义类别的权重系数。因为 ScanNet 中属于不同类的点的数量相差很大,一个常见的操作是在计算损失时对不同类别进行加权 (label re-weighting) 以得到更好的分割性能。
- `train_resampled_scene_idxs.npy`:每一个场景 (房间) 的重采样标签。在训练过程中,我们依据每个场景的点的数量,会对其进行不同次数的重采样,以保证训练数据均衡。
- `posed_images/scenexxxx_xx``.jpg` 图像的集合,还包含 `.txt` 格式的 4x4 相机姿态和单个 `.txt` 格式的相机内参矩阵文件。
- `scannet_infos_train.pkl`:训练集的数据信息,每个场景的具体信息如下:
- info\['lidar_points'\]:字典包含与激光雷达点相关的信息。
- info\['lidar_points'\]\['lidar_path'\]:激光雷达点云数据的文件名。
- info\['lidar_points'\]\['num_pts_feats'\]:点的特征维度。
- info\['lidar_points'\]\['axis_align_matrix'\]:用于对齐坐标轴的变换矩阵。
- info\['pts_semantic_mask_path'\]:语义分割标注的文件名。
- info\['pts_instance_mask_path'\]:实例分割标注的文件名。
- info\['instances'\]:字典组成的列表,每个字典包含一个实例的所有标注信息。对于其中的第 i 个实例,我们有:
- info\['instances'\]\[i\]\['bbox_3d'\]:长度为 6 的列表,以 (x, y, z, l, w, h) 的顺序表示深度坐标系下与坐标轴平行的 3D 边界框。
- info\[instances\]\[i\]\['bbox_label_3d'\]:3D 边界框的标签。
- `scannet_infos_val.pkl`:验证集上的数据信息,与 `scannet_infos_train.pkl` 格式完全一致。
- `scannet_infos_test.pkl`:测试集上的数据信息,与 `scannet_infos_train.pkl` 格式几乎完全一致,除了缺少标注。
## 训练流程
ScanNet 进行 **3D 目标检测**的一种典型数据载入流程如下所示:
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(
type='LoadAnnotations3D',
with_bbox_3d=True,
with_label_3d=True,
with_mask_3d=True,
with_seg_3d=True),
dict(type='GlobalAlignment', rotation_axis=2),
dict(type='PointSegClassMapping'),
dict(type='PointSample', num_points=40000),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
flip_ratio_bev_vertical=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.087266, 0.087266],
scale_ratio_range=[1.0, 1.0],
shift_height=True),
dict(
type='Pack3DDetInputs',
keys=[
'points', 'gt_bboxes_3d', 'gt_labels_3d', 'pts_semantic_mask',
'pts_instance_mask'
])
]
```
- `GlobalAlignment`:输入的点云在施加了坐标轴平行的矩阵后应被转换为与坐标轴平行的形式。
- `PointSegClassMapping`:训练中,只有合法的类别 ID 才会被映射到类别标签,比如 \[0, 18)。
- 数据增强:
- `PointSample`:下采样输入点云。
- `RandomFlip3D`:随机左右或前后翻转点云。
- `GlobalRotScaleTrans`:旋转输入点云,对于 ScanNet 角度通常落入 \[-5, 5\](度)的范围;并放缩输入点云,对于 ScanNet 比例通常为 1.0(即不做缩放);最后平移输入点云,对于 ScanNet 通常位移量为 0(即不做位移)。
ScanNet 进行 **3D 语义分割**的一种典型数据载入流程如下所示:
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=False,
use_color=True,
load_dim=6,
use_dim=[0, 1, 2, 3, 4, 5]),
dict(
type='LoadAnnotations3D',
with_bbox_3d=False,
with_label_3d=False,
with_mask_3d=False,
with_seg_3d=True),
dict(
type='PointSegClassMapping'),
dict(
type='IndoorPatchPointSample',
num_points=num_points,
block_size=1.5,
ignore_index=len(class_names),
use_normalized_coord=False,
enlarge_size=0.2,
min_unique_num=None),
dict(type='NormalizePointsColor', color_mean=None),
dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
]
```
- `PointSegClassMapping`:在训练过程中,只有被使用的类别的序号会被映射到类似 \[0, 20) 范围内的类别标签。其余的类别序号会被转换为 `ignore_index` 所制定的忽略标签,在本例中是 `20`
- `IndoorPatchPointSample`:从输入点云中裁剪一个含有固定数量点的小块 (patch)。`block_size` 指定了裁剪块的边长,在 ScanNet 上这个数值一般设置为 `1.5`
- `NormalizePointsColor`:将输入点的颜色信息归一化,通过将 RGB 值除以 `255` 来实现。
## 评估指标
- **目标检测**:通常使用全类平均精度(mAP)来评估 ScanNet 的 3D 检测任务的性能,比如 `mAP@0.25``mAP@0.5`。具体来说,评估时调用一个通用的计算 3D 物体检测多个类别的精度和召回率的函数。更多细节请参考 [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/functional/indoor_eval.py)
**注意**:与在章节`提取 ScanNet 数据`中介绍的那样,所有真实物体的三维包围框是与坐标轴平行的,也就是说旋转角为 0。因此,预测包围框的网络接受的包围框旋转角监督也是 0,且在后处理阶段我们使用适用于与坐标轴平行的包围框的非极大值抑制(NMS),该过程不会考虑包围框的旋转。
- **语义分割**:通常使用平均交并比 (mean Intersection over Union, mIoU) 来评估 ScanNet 的 3D 语义分割任务的性能。具体而言,我们先计算所有类别的 IoU,然后取平均值作为 mIoU。更多实现细节请参考 [seg_eval.py](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/functional/seg_eval.py)
## 在测试集上测试并提交结果
默认情况下,MMDet3D 的代码是在训练集上进行模型训练,然后在验证集上进行模型测试。
如果你也想在在线基准上测试模型的性能(仅支持语义分割),请在测试命令中加上 `--format-only` 的标记,同时也要将 ScanNet 数据集[配置文件](https://github.com/open-mmlab/mmdetection3d/blob/main/configs/_base_/datasets/scannet-seg.py#L126)中的 `ann_file=data_root + 'scannet_infos_val.pkl'` 改成 `ann_file=data_root + 'scannet_infos_test.pkl'`
请记得通过 `txt_prefix` 来指定想要保存测试结果的文件夹名称。
以 PointNet++ (SSG) 在 ScanNet 上的测试为例,你可以运行以下命令来完成测试结果的保存:
```
./tools/dist_test.sh configs/pointnet2/pointnet2_ssg_16x2_cosine_200e_scannet-seg.py \
work_dirs/pointnet2_ssg/latest.pth --format-only \
--eval-options txt_prefix=work_dirs/pointnet2_ssg/test_submission
```
在保存测试结果后,你可以将该文件夹压缩,然后提交到 [ScanNet 在线测试服务器](http://kaldir.vc.in.tum.de/scannet_benchmark/semantic_label_3d)上进行验证。
# SemanticKITTI 数据集
本页提供了有关在 MMDetection3D 中使用 SemanticKITTI 数据集的具体教程。
## 数据集准备
您可以在[这里](http://semantic-kitti.org/dataset.html#download)下载 SemanticKITTI 数据集并解压缩所有 zip 文件。
像准备数据集的一般方法一样,建议将数据集根目录链接到 `$MMDETECTION3D/data`
在我们处理之前,文件夹结构应按如下方式组织:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── semantickitti
│ │ ├── sequences
│ │ │ ├── 00
│ │ │ │   ├── labels
│ │ │ │   ├── velodyne
│ │ │ ├── 01
│ │ │ ├── ..
│ │ │ ├── 22
```
SemanticKITTI 数据集包含 23 个序列,其中序列 \[0-7\] , \[9-10\] 作为训练集(约 19k 训练样本),序列 8 作为验证集(约 4k 验证样本),\[11-22\] 作为测试集 (约20k测试样本)。其中每个序列分别包含 velodyne 和 labels 两个文件夹分别存放激光雷达点云数据和分割标注 (其中高16位存放实例分割标注,低16位存放语义分割标注)。
### 创建 SemanticKITTI 数据集
我们提供了生成数据集信息的脚本,用于测试和训练。通过以下命令生成 `.pkl` 文件:
```bash
python ./tools/create_data.py semantickitti --root-path ./data/semantickitti --out-dir ./data/semantickitti --extra-tag semantickitti
```
处理后的文件夹结构应该如下:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── semantickitti
│ │ ├── sequences
│ │ │ ├── 00
│ │ │ │   ├── labels
│ │ │ │   ├── velodyne
│ │ │ ├── 01
│ │ │ ├── ..
│ │ │ ├── 22
│ │ ├── semantickitti_infos_test.pkl
│ │ ├── semantickitti_infos_train.pkl
│ │ ├── semantickitti_infos_val.pkl
```
- `semantickitti_infos_train.pkl`: 训练数据集, 该字典包含两个键值: `metainfo``data_list`.
`metainfo` 包含该数据集的基本信息。 `data_list` 是由字典组成的列表,每个字典(以下简称 `info`)包含了单个样本的所有详细信息。
- info\['sample_id'\]:该样本在整个数据集的索引。
- info\['lidar_points'\]:是一个字典,包含了激光雷达点相关的信息。
- info\['lidar_points'\]\['lidar_path'\]:激光雷达点云数据的文件名。
- info\['lidar_points'\]\['num_pts_feats'\]:点的特征维度
- info\['pts_semantic_mask_pth'\]:三维语义分割的标注文件的文件路径
更多细节请参考 [semantickitti_converter.py](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/tools/dataset_converters/semantickitti_converter.py)[update_infos_to_v2.py ](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/tools/dataset_converters/update_infos_to_v2.py)
## Train pipeline
下面展示了一个使用 SemanticKITTI 数据集进行 3D 语义分割的典型流程:
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4,
use_dim=4,
backend_args=backend_args),
dict(
type='LoadAnnotations3D',
with_bbox_3d=False,
with_label_3d=False,
with_seg_3d=True,
seg_3d_dtype='np.int32',
seg_offset=2**16,
dataset_type='semantickitti',
backend_args=backend_args),
dict(type='PointSegClassMapping'),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
flip_ratio_bev_vertical=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05],
translation_std=[0.1, 0.1, 0.1],
),
dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
]
```
- 数据增强:
- `RandomFlip3D`:对输入点云数据进行随机地水平翻转或者垂直翻转。
- `GlobalRotScaleTrans`:对输入点云数据进行旋转、缩放、平移。
## 评估
使用 8 个 GPU 以及 SemanticKITTI 指标评估的 MinkUNet 的示例如下:
```shell
bash tools/dist_test.sh configs/minkunet/minkunet_w32_8xb2-15e_semantickitti.py checkpoints/minkunet_w32_8xb2-15e_semantickitti_20230309_160710-7fa0a6f1.pth 8
```
## 度量指标
通常我们使用平均交并比 (mean Intersection over Union, mIoU) 作为 SemanticKITTI 语义分割任务的度量指标。
具体而言,我们先计算所有类别的 IoU,然后取平均值作为 mIoU。
更多实现细节请参考 [seg_eval.py](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/functional/seg_eval.py)
以下是一个评估结果的样例:
| classes | car | bicycle | motorcycle | truck | bus | person | bicyclist | motorcyclist | road | parking | sidewalk | other-ground | building | fence | vegetation | trunck | terrian | pole | traffic-sign | miou | acc | acc_cls |
| ------- | ------ | ------- | ---------- | ------ | ------ | ------ | --------- | ------------ | ------ | ------- | -------- | ------------ | -------- | ------ | ---------- | ------ | ------- | ------ | ------------ | ------ | ------ | ------- |
| results | 0.9687 | 0.1908 | 0.6313 | 0.8580 | 0.6359 | 0.6818 | 0.8444 | 0.0002 | 0.9353 | 0.4854 | 0.8106 | 0.0024 | 0.9050 | 0.6111 | 0.8822 | 0.6605 | 0.7493 | 0.6442 | 0.4837 | 0.6306 | 0.9202 | 0.6924 |
# SUN RGB-D 数据集
## 数据集的准备
对于数据集准备的整体流程,请参考 SUN RGB-D 的[指南](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md)
### 下载 SUN RGB-D 数据与工具包
[这里](http://rgbd.cs.princeton.edu/data/)下载 SUN RGB-D 的数据。接下来,将 `SUNRGBD.zip``SUNRGBDMeta2DBB_v2.mat``SUNRGBDMeta3DBB_v2.mat``SUNRGBDtoolbox.zip` 移动到 `OFFICIAL_SUNRGBD` 文件夹,并解压文件。
下载完成后,数据处理之前的文件目录结构如下:
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
```
### 从原始数据中提取 3D 检测所需数据与标注
通过运行如下指令从原始文件中提取出 SUN RGB-D 的标注(这需要您的机器中安装了 MATLAB):
```bash
matlab -nosplash -nodesktop -r 'extract_split;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v2;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v1;quit;'
```
主要的步骤包括:
- 提取出训练集和验证集的索引文件;
- 从原始数据中提取出 3D 检测所需要的数据;
- 从原始的标注数据中提取并组织检测任务使用的标注数据。
用于从深度图中提取点云数据的 `extract_rgbd_data_v2.m` 的主要部分如下:
```matlab
data = SUNRGBDMeta(imageId);
data.depthpath(1:16) = '';
data.depthpath = strcat('../OFFICIAL_SUNRGBD', data.depthpath);
data.rgbpath(1:16) = '';
data.rgbpath = strcat('../OFFICIAL_SUNRGBD', data.rgbpath);
% 从深度图获取点云
[rgb,points3d,depthInpaint,imsize]=read3dPoints(data);
rgb(isnan(points3d(:,1)),:) = [];
points3d(isnan(points3d(:,1)),:) = [];
points3d_rgb = [points3d, rgb];
% MAT 文件比 TXT 文件小三倍。在 Python 中我们可以使用
% scipy.io.loadmat('xxx.mat')['points3d_rgb'] 来加载数据
mat_filename = strcat(num2str(imageId,'%06d'), '.mat');
txt_filename = strcat(num2str(imageId,'%06d'), '.txt');
% 保存点云数据
parsave(strcat(depth_folder, mat_filename), points3d_rgb);
```
用于提取并组织检测任务标注的 `extract_rgbd_data_v1.m` 的主要部分如下:
```matlab
% 输出 2D 和 3D 包围框
data2d = data;
fid = fopen(strcat(det_label_folder, txt_filename), 'w');
for j = 1:length(data.groundtruth3DBB)
centroid = data.groundtruth3DBB(j).centroid; % 3D 包围框中心
classname = data.groundtruth3DBB(j).classname; % 类名
orientation = data.groundtruth3DBB(j).orientation; % 3D 包围框方向
coeffs = abs(data.groundtruth3DBB(j).coeffs); % 3D 包围框大小
box2d = data2d.groundtruth2DBB(j).gtBb2D; % 2D 包围框
fprintf(fid, '%s %d %d %d %d %f %f %f %f %f %f %f %f\n', classname, box2d(1), box2d(2), box2d(3), box2d(4), centroid(1), centroid(2), centroid(3), coeffs(1), coeffs(2), coeffs(3), orientation(1), orientation(2));
end
fclose(fid);
```
上面的两个脚本调用了 SUN RGB-D 提供的[工具包](https://rgbd.cs.princeton.edu/data/SUNRGBDtoolbox.zip)中的一些函数,如 `read3dPoints`
使用上述脚本提取数据后,文件目录结构应如下:
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
├── sunrgbd_trainval
│ ├── calib
│ ├── depth
│ ├── image
│ ├── label
│ ├── label_v1
│ ├── seg_label
│ ├── train_data_idx.txt
│ ├── val_data_idx.txt
```
在如下每个文件夹下,都有总计 5285 个训练集样本和 5050 个验证集样本:
- `calib``.txt` 后缀的相机标定文件。
- `depth``.mat` 后缀的点云文件,包含 xyz 坐标和 rgb 色彩值。
- `image``.jpg` 后缀的二维图像文件。
- `label``.txt` 后缀的用于检测任务的标注数据(版本二)。
- `label_v1``.txt` 后缀的用于检测任务的标注数据(版本一)。
- `seg_label``.txt` 后缀的用于分割任务的标注数据。
目前,我们使用版本一的数据用于训练与测试,因此版本二的标注并未使用。
### 创建数据集
请运行如下指令创建数据集:
```shell
python tools/create_data.py sunrgbd --root-path ./data/sunrgbd \
--out-dir ./data/sunrgbd --extra-tag sunrgbd
```
或者,如果使用 slurm,可以使用如下指令替代:
```
bash tools/create_data.sh <job_name> sunrgbd
```
之前提到的点云数据就会被处理并以 `.bin` 格式重新存储。与此同时,`.pkl` 文件也被生成,用于存储数据标注和元信息。
如上数据处理后,文件目录结构应如下:
```
sunrgbd
├── README.md
├── matlab
│ ├── ...
├── OFFICIAL_SUNRGBD
│ ├── ...
├── sunrgbd_trainval
│ ├── ...
├── points
├── sunrgbd_infos_train.pkl
├── sunrgbd_infos_val.pkl
```
- `points/xxxxxx.bin`:降采样后的点云数据。
- `sunrgbd_infos_train.pkl`:训练集数据信息(标注与元信息),每个场景所含数据信息具体如下:
- info\['lidar_points'\]:字典包含了与激光雷达点相关的信息。
- info\['lidar_points'\]\['num_pts_feats'\]:点的特征维度。
- info\['lidar_points'\]\['lidar_path'\]:激光雷达点云数据的文件名。
- info\['images'\]:字典包含了与图像数据相关的信息。
- info\['images'\]\['CAM0'\]\['img_path'\]:图像的文件名。
- info\['images'\]\['CAM0'\]\['depth2img'\]:深度到图像的变换矩阵,形状为 (4, 4)。
- info\['images'\]\['CAM0'\]\['height'\]:图像的高。
- info\['images'\]\['CAM0'\]\['width'\]:图像的宽。
- info\['instances'\]:由字典组成的列表,包含了该帧的所有标注信息。每个字典与单个实例的标注相关。对于其中的第 i 个实例,我们有:
- info\['instances'\]\[i\]\['bbox_3d'\]:长度为 7 的列表,表示深度坐标系下的 3D 边界框。
- info\['instances'\]\[i\]\['bbox'\]:长度为 4 的列表,以 (x1, y1, x2, y2) 的顺序表示实例的 2D 边界框。
- info\['instances'\]\[i\]\['bbox_label_3d'\]:整数表示实例的 3D 标签,-1 表示忽略该类别。
- info\['instances'\]\[i\]\['bbox_label'\]:整数表示实例的 2D 标签,-1 表示忽略该类别。
- `sunrgbd_infos_val.pkl`:验证集上的数据信息,与 `sunrgbd_infos_train.pkl` 格式完全一致。
## 训练流程
SUN RGB-D 上纯点云 3D 物体检测的典型流程如下:
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadAnnotations3D'),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(type='PointSample', num_points=20000),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
点云上的数据增强
- `RandomFlip3D`:随机左右或前后翻转输入点云。
- `GlobalRotScaleTrans`:旋转输入点云,对于 SUN RGB-D 角度通常落入 \[-30, 30\](度)的范围;并放缩输入点云,对于 SUN RGB-D 比例通常落入 \[0.85, 1.15\] 的范围;最后平移输入点云,对于 SUN RGB-D 通常位移量为 0(即不做位移)。
- `PointSample`:降采样输入点云。
SUN RGB-D 上多模态(点云和图像)3D 物体检测的典型流程如下:
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations3D'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='Resize', scale=(1333, 600), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.0),
dict(type='Pad', size_divisor=32),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d','img', 'gt_bboxes', 'gt_bboxes_labels'])
]
```
图像上的数据增强
- `Resize`:改变输入图像的大小,`keep_ratio=True` 意味着图像的比例不改变。
- `RandomFlip`:随机地翻折图像。
图像增强的实现取自 [MMDetection](https://github.com/open-mmlab/mmdetection/tree/dev-3.x/mmdet/datasets/transforms)
## 度量指标
与 ScanNet 一样,通常使用 mAP(全类平均精度)来评估 SUN RGB-D 的检测任务的性能,比如 `mAP@0.25``mAP@0.5`。具体来说,评估时调用一个通用的计算 3D 物体检测多个类别的精度和召回率的函数。更多细节请参考 [`indoor_eval.py`](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/functional/indoor_eval.py)
因为 SUN RGB-D 包含有图像数据,所以图像上的物体检测也是可行的。举个例子,在 ImVoteNet 中,我们首先训练了一个图像检测器,并且也使用 mAP 指标,如 `mAP@0.5`,来评估其表现。我们使用 [MMDetection](https://github.com/open-mmlab/mmdetection) 库中的 `eval_map` 函数来计算 mAP。
# Waymo 数据集
本文档页包含了关于 MMDetection3D 中 Waymo 数据集用法的教程。
## 数据集准备
在准备 Waymo 数据集之前,如果您之前只安装了 `requirements/build.txt``requirements/runtime.txt` 中的依赖,请通过运行如下指令额外安装 Waymo 数据集所依赖的官方包:
```
pip install waymo-open-dataset-tf-2-6-0
```
或者
```
pip install -r requirements/optional.txt
```
和准备数据集的通用方法一致,我们推荐将数据集根目录软链接至 `$MMDETECTION3D/data`
由于原始 Waymo 数据的格式基于 `tfrecord`,我们需要将原始数据进行预处理,以便于训练和测试时使用。我们的方法是将它们转换为 KITTI 格式。
处理之前,文件目录结构组织如下:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── waymo
│ │ ├── waymo_format
│ │ │ ├── training
│ │ │ ├── validation
│ │ │ ├── testing
│ │ │ ├── gt.bin
│ │ │ ├── cam_gt.bin
│ │ │ ├── fov_gt.bin
│ │ ├── kitti_format
│ │ │ ├── ImageSets
```
您可以在[这里](https://waymo.com/open/download/)下载 1.2 版本的 Waymo 公开数据集,并在[这里](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing)下载其训练/验证/测试集拆分文件。接下来,请将 `tfrecord` 文件放入 `data/waymo/waymo_format/` 下的对应文件夹,并将 txt 格式的数据集拆分文件放入 `data/waymo/kitti_format/ImageSets`。在[这里](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects)下载验证集使用的 bin 格式真实标注 (Ground Truth) 文件并放入 `data/waymo/waymo_format/`。小窍门:您可以使用 `gsutil` 来在命令行下载大规模数据集。您可以将该[工具](https://github.com/RalphMao/Waymo-Dataset-Tool) 作为一个例子来查看更多细节。之后,通过运行如下指令准备 Waymo 数据:
```bash
# TF_CPP_MIN_LOG_LEVEL=3 will disable all logging output from TensorFlow.
# The number of `--workers` depends on the maximum number of cores in your CPU.
TF_CPP_MIN_LOG_LEVEL=3 python tools/create_data.py waymo --root-path ./data/waymo --out-dir ./data/waymo --workers 128 --extra-tag waymo --version v1.4
```
请注意,如果您的本地磁盘没有足够空间保存转换后的数据,您可以将 `--out-dir` 改为其他目录;只要在创建文件夹、准备数据并转换格式后,将数据文件链接到 `data/waymo/kitti_format` 即可。
在数据转换后,文件目录结构应组织如下:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── waymo
│ │ ├── waymo_format
│ │ │ ├── training
│ │ │ ├── validation
│ │ │ ├── testing
│ │ │ ├── gt.bin
│ │ │ ├── cam_gt.bin
│ │ │ ├── fov_gt.bin
│ │ ├── kitti_format
│ │ │ ├── ImageSets
│ │ │ ├── training
│ │ │ │ ├── image_0
│ │ │ │ ├── image_1
│ │ │ │ ├── image_2
│ │ │ │ ├── image_3
│ │ │ │ ├── image_4
│ │ │ │ ├── velodyne
│ │ │ ├── testing
│ │ │ │ ├── (the same as training)
│ │ │ ├── waymo_gt_database
│ │ │ ├── waymo_infos_trainval.pkl
│ │ │ ├── waymo_infos_train.pkl
│ │ │ ├── waymo_infos_val.pkl
│ │ │ ├── waymo_infos_test.pkl
│ │ │ ├── waymo_dbinfos_train.pkl
```
- `kitti_format/training/image_{0-4}/{a}{bbb}{ccc}.jpg` 因为 Waymo 数据的来源包含数个相机,这里我们将每个相机对应的图像和标签文件分别存储,并将相机位姿 (pose) 文件存储下来以供后续处理连续多帧的点云。我们使用 `{a}{bbb}{ccc}` 的名称编码方式为每帧数据命名,其中 `a` 是不同数据拆分的前缀(`0` 指代训练集,`1` 指代验证集,`2` 指代测试集),`bbb` 是分割部分 (segment) 的索引,而 `ccc` 是帧索引。您可以轻而易举地按照如上命名规则定位到所需的帧。我们将训练和验证所需数据按 KITTI 的方式集合在一起,然后将训练集/验证集/测试集的索引存储在 `ImageSet` 下的文件中。
- `kitti_format/training/velodyne/{a}{bbb}{ccc}.bin` 当前样本的点云数据
- `kitti_format/waymo_gt_database/xxx_{Car/Pedestrian/Cyclist}_x.bin`. 训练数据集的每个 3D 包围框中包含的点云数据。这些点云会在数据增强中被使用,例如. `ObjectSample`. `xxx` 表示训练样本的索引,`x` 表示实例在当前样本中的索引。
- `kitti_format/waymo_infos_train.pkl`. 训练数据集,该字典包含了两个键值:`metainfo``data_list``metainfo` 包含数据集的基本信息,例如 `dataset``version``info_version``data_list` 是由字典组成的列表,每个字典(以下简称 `info`)包含了单个样本的所有详细信息。:
- info\['sample_idx'\]: 样本在整个数据集的索引。
- info\['ego2global'\]: 自车到全局坐标的变换矩阵。(4x4 列表)
- info\['timestamp'\]:样本数据时间戳。
- info\['context_name'\]: 语境名,表示样本从哪个 `*.tfrecord` 片段中提取的。
- info\['lidar_points'\]: 是一个字典,包含了所有与激光雷达点相关的信息。
- info\['lidar_points'\]\['lidar_path'\]: 激光雷达点云数据的文件名。
- info\['lidar_points'\]\['num_pts_feats'\]: 点的特征维度。
- info\['lidar_sweeps'\]: 是一个列表,包含了历史帧信息。
- info\['lidar_sweeps'\]\[i\]\['lidar_points'\]\['lidar_path'\]: 第 i 帧的激光雷达数据的文件路径。
- info\['lidar_sweeps'\]\[i\]\['ego2global'\]: 第 i 帧的激光雷达传感器到自车的变换矩阵。(4x4 列表)
- info\['lidar_sweeps'\]\[i\]\['timestamp'\]: 第 i 帧的样本数据时间戳。
- info\['images'\]: 是一个字典,包含与每个相机对应的六个键值:`'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_SIDE_LEFT'`, `'CAM_SIDE_RIGHT'`。每个字典包含了对应相机的所有数据信息。
- info\['images'\]\['CAM_XXX'\]\['img_path'\]: 图像的文件名。
- info\['images'\]\['CAM_XXX'\]\['height'\]: 图像的高
- info\['images'\]\['CAM_XXX'\]\['width'\]: 图像的宽
- info\['images'\]\['CAM_XXX'\]\['cam2img'\]: 当 3D 点投影到图像平面时需要的内参信息相关的变换矩阵。(3x3 列表)
- info\['images'\]\['CAM_XXX'\]\['lidar2cam'\]: 激光雷达传感器到该相机的变换矩阵。(4x4 列表)
- info\['images'\]\['CAM_XXX'\]\['lidar2img'\]: 激光雷达传感器到图像平面的变换矩阵。(4x4 列表)
- info\['image_sweeps'\]: 是一个列表,包含了历史帧信息。
- info\['image_sweeps'\]\[i\]\['images'\]\['CAM_XXX'\]\['img_path'\]: 第i帧的图像的文件名.
- info\['image_sweeps'\]\[i\]\['ego2global'\]: 第 i 帧的自车到全局坐标的变换矩阵。(4x4 列表)
- info\['image_sweeps'\]\[i\]\['timestamp'\]: 第 i 帧的样本数据时间戳。
- info\['instances'\]: 是一个字典组成的列表。每个字典包含单个实例的所有标注信息。对于其中的第 i 个实例,我们有:
- info\['instances'\]\[i\]\['bbox_3d'\]: 长度为 7 的列表,以 (x, y, z, l, w, h, yaw) 的顺序表示实例的 3D 边界框。
- info\['instances'\]\[i\]\['bbox'\]: 2D 边界框标注(,顺序为 \[x1, y1, x2, y2\] 的列表。有些实例可能没有对应的 2D 边界框标注。
- info\['instances'\]\[i\]\['bbox_label_3d'\]: 整数表示实例的标签,-1 代表忽略。
- info\['instances'\]\[i\]\['bbox_label'\]: 整数表示实例的标签,-1 代表忽略。
- info\['instances'\]\[i\]\['num_lidar_pts'\]: 每个 3D 边界框内包含的激光雷达点数。
- info\['instances'\]\[i\]\['camera_id'\]: 当前实例最可见相机的索引。
- info\['instances'\]\[i\]\['group_id'\]: 当前实例在当前样本中的索引。
- info\['cam_sync_instances'\]: 是一个字典组成的列表。每个字典包含单个实例的所有标注信息。它的形式与 \['instances'\]相同. 但是, \['cam_sync_instances'\] 专门用于基于多视角相机的三维目标检测任务。
- info\['cam_instances'\]: 是一个字典,包含以下键值: `'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_SIDE_LEFT'`, `'CAM_SIDE_RIGHT'`. 对于基于视觉的 3D 目标检测任务,我们将整个场景的 3D 标注划分至它们所属于的相应相机中。对于其中的第 i 个实例,我们有:
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_3d'\]: 长度为 7 的列表,以 (x, y, z, l, h, w, yaw) 的顺序表示实例的 3D 边界框。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox'\]: 2D 边界框标注(3D 框投影的矩形框),顺序为 \[x1, y1, x2, y2\] 的列表。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_label_3d'\]: 实例标签。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['bbox_label'\]: 实例标签。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['center_2d'\]: 3D 框投影到图像上的中心点,大小为 (2, ) 的列表。
- info\['cam_instances'\]\['CAM_XXX'\]\[i\]\['depth'\]: 3D 框投影中心的深度。
## 训练
考虑到原始数据集中的数据有很多相似的帧,我们基本上可以主要使用一个子集来训练我们的模型。在我们初步的基线中,我们在每五帧图片中加载一帧。得益于我们的超参数设置和数据增强方案,我们得到了比 Waymo [原论文](https://arxiv.org/pdf/1912.04838.pdf)中更好的性能。请移步 `configs/pointpillars/` 下的 README.md 以查看更多配置和性能相关的细节。我们会尽快发布一个更完整的 Waymo 基准榜单 (benchmark)。
## 评估
为了在 Waymo 数据集上进行检测性能评估,请按照[此处指示](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md)构建用于计算评估指标的二进制文件 `compute_detection_metrics_main`,并将它置于 `mmdet3d/core/evaluation/waymo_utils/` 下。您基本上可以按照下方命令安装 `bazel`,然后构建二进制文件:
```shell
# download the code and enter the base directory
git clone https://github.com/waymo-research/waymo-open-dataset.git waymo-od
# git clone https://github.com/Abyssaledge/waymo-open-dataset-master waymo-od # if you want to use faster multi-thread version.
cd waymo-od
git checkout remotes/origin/master
# use the Bazel build system
sudo apt-get install --assume-yes pkg-config zip g++ zlib1g-dev unzip python3 python3-pip
BAZEL_VERSION=3.1.0
wget https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
sudo bash bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
sudo apt install build-essential
# configure .bazelrc
./configure.sh
# delete previous bazel outputs and reset internal caches
bazel clean
bazel build waymo_open_dataset/metrics/tools/compute_detection_metrics_main
cp bazel-bin/waymo_open_dataset/metrics/tools/compute_detection_metrics_main ../mmdetection3d/mmdet3d/evaluation/functional/waymo_utils/
```
接下来,您就可以在 Waymo 上评估您的模型了。如下示例是使用 8 个图形处理器 (GPU) 在 Waymo 上用 Waymo 评价指标评估 PointPillars 模型的情景:
```shell
./tools/dist_test.sh configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymo-3d-car.py checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth
```
如果需要生成 bin 文件,需要在配置文件的 `test_evaluator` 中指定 `pklfile_prefix`,因此你可以在命令后添加 `--cfg-options "test_evaluator.pklfile_prefix=xxxx"`
**注意**:
1. 有时用 `bazel` 构建 `compute_detection_metrics_main` 的过程中会出现如下错误:`'round' 不是 'std' 的成员` (`'round' is not a member of 'std'`)。我们只需要移除该文件中,`round` 前的 `std::`
2. 考虑到 Waymo 上评估一次耗时不短,我们建议只在模型训练结束时进行评估。
3. 为了在 CUDA 9 环境使用 TensorFlow,我们建议通过编译 TensorFlow 源码的方式使用。除了官方教程之外,您还可以参考该[链接](https://github.com/SmileTM/Tensorflow2.X-GPU-CUDA9.0)以寻找可能合适的预编译包以及编译源码的实用攻略。
## 测试并提交到官方服务器
如下是一个使用 8 个图形处理器在 Waymo 上测试 PointPillars,生成 bin 文件并提交结果到官方榜单的例子:
如果你想生成 bin 文件并提交到服务器中,在运行测试指令前你需要在配置文件的 `test_evaluator` 中指定 `submission_prefix`
在生成 bin 文件后,您可以简单地构建二进制文件 `create_submission`,并按照[指示](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/)创建一个提交文件。下面是一些示例:
```shell
cd ../waymo-od/
bazel build waymo_open_dataset/metrics/tools/create_submission
cp bazel-bin/waymo_open_dataset/metrics/tools/create_submission ../mmdetection3d/mmdet3d/core/evaluation/waymo_utils/
vim waymo_open_dataset/metrics/tools/submission.txtpb # set the metadata information
cp waymo_open_dataset/metrics/tools/submission.txtpb ../mmdetection3d/mmdet3d/evaluation/functional/waymo_utils/
cd ../mmdetection3d
# suppose the result bin is in `results/waymo-car/submission`
mmdet3d/core/evaluation/waymo_utils/create_submission --input_filenames='results/waymo-car/kitti_results_test.bin' --output_filename='results/waymo-car/submission/model' --submission_filename='mmdet3d/evaluation/functional/waymo_utils/submission.txtpb'
tar cvf results/waymo-car/submission/my_model.tar results/waymo-car/submission/my_model/
gzip results/waymo-car/submission/my_model.tar
```
如果想用官方评估服务器评估您在验证集上的结果,您可以使用同样的方法生成提交文件,只需确保您在运行如上指令前更改 `submission.txtpb` 中的字段值即可。
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment