Unverified Commit bb204696 authored by Wenwei Zhang, committed by GitHub

Release v1.0.0rc3

parents 14c5ded4 dea954e5
......@@ -7,43 +7,43 @@ MMDetection3D uses three different coordinate systems. The existence of differen
Despite the variety of datasets and equipment, by summarizing the line of work on 3D object detection, we can roughly categorize coordinate systems into three:
- Camera coordinate system -- the coordinate system of most cameras, in which the positive direction of the y-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the z-axis points to the front.
```
up z front
| ^
| /
| /
| /
|/
left ------ 0 ------> x right
|
|
|
|
v
y down
```
- LiDAR coordinate system -- the coordinate system of many LiDARs, in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the front, and the positive direction of the y-axis points to the left.
```
z up x front
^ ^
| /
| /
| /
|/
y left <------ 0 ------ right
```
- Depth coordinate system -- the coordinate system used by VoteNet, H3DNet, etc., in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the y-axis points to the front.
```
z up y front
^ ^
| /
| /
| /
|/
left ------ 0 ------> x right
```
The definition of coordinate systems in this tutorial is actually **more than just defining the three axes**. For a box in the form of `` $$`(x, y, z, dx, dy, dz, r)`$$ ``, our coordinate systems also define how to interpret the box dimensions `` $$`(dx, dy, dz)`$$ `` and the yaw angle `` $$`r`$$ ``.
The illustration of the three coordinate systems is shown below:
......@@ -55,13 +55,13 @@ We will stick to the three coordinate systems defined in this tutorial in the fu
## Definition of the yaw angle
Please refer to [wikipedia](https://en.wikipedia.org/wiki/Euler_angles#Tait%E2%80%93Bryan_angles) for the standard definition of the yaw angle. In object detection, we choose an axis as the gravity axis and a reference direction on the plane `` $$`\Pi`$$ `` perpendicular to the gravity axis. The reference direction then has a yaw angle of 0, and other directions on `` $$`\Pi`$$ `` have non-zero yaw angles depending on their angle with the reference direction.
Currently, for all supported datasets, annotations do not include pitch angle and roll angle, which means we need only consider the yaw angle when predicting boxes and calculating overlap between boxes.
In MMDetection3D, all three coordinate systems are right-handed coordinate systems, which means the ascending direction of the yaw angle is counter-clockwise if viewed from the negative direction of the gravity axis (the axis is pointing at one's eyes).
The figure below shows that, in this right-handed coordinate system, if we set the positive direction of the x-axis as a reference direction, then the positive direction of the y-axis has a yaw angle of `` $$`\frac{\pi}{2}`$$ ``.
```
z up y front (yaw=0.5*pi)
......@@ -92,9 +92,9 @@ __|____|____|____|______\ x right
## Definition of the box dimensions
The definition of the box dimensions cannot be disentangled from the definition of the yaw angle. In the previous section, we said that the direction of a box is defined to be parallel with the x-axis if its yaw angle is 0. Then naturally, the dimension of a box which corresponds to the x-axis should be `` $$`dx`$$ ``. However, this is not always the case in some datasets (we will address that later).

The following figures show the meaning of the correspondence between the x-axis and `` $$`dx`$$ ``, and between the y-axis and `` $$`dy`$$ ``.
```
y front
......@@ -111,7 +111,7 @@ __|____|____|____|______\ x right
| dy
```
Note that the box direction is always parallel with the edge `` $$`dx`$$ ``.
```
y front
......@@ -138,14 +138,12 @@ In SECOND, the LiDAR coordinate system for a box is defined as follows (a bird's
![](https://raw.githubusercontent.com/traveller59/second.pytorch/master/images/kittibox.png)
For each box, the dimensions are `` $$`(w, l, h)`$$ ``, and the reference direction for the yaw angle is the positive direction of the y-axis. For more details, refer to the [repo](https://github.com/traveller59/second.pytorch#concepts).
Our LiDAR coordinate system has two changes:
- The yaw angle is defined to be right-handed instead of left-handed for consistency;
- The box dimensions are `` $$`(l, w, h)`$$ `` instead of `` $$`(w, l, h)`$$ ``, since `` $$`w`$$ `` corresponds to `` $$`dy`$$ `` and `` $$`l`$$ `` corresponds to `` $$`dx`$$ `` in KITTI.
### Waymo
......@@ -153,7 +151,7 @@ We use the KITTI-format data of Waymo dataset. Therefore, KITTI and Waymo also s
### NuScenes
NuScenes provides a toolkit for evaluation, in which each box is wrapped into a `Box` instance. The coordinate system of `Box` is different from our LiDAR coordinate system in that the first two elements of the box dimension correspond to `` $$`(dy, dx)`$$ ``, or `` $$`(w, l)`$$ ``, respectively, instead of the reverse. For more details, please refer to the NuScenes [tutorial](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/nuscenes_det.md#notes).
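For illustration, reordering the first two size entries between the two conventions can be sketched as follows (the array values and variable names are hypothetical, not part of the NuScenes API):

```python
import numpy as np

# A NuScenes-style box stores its size as (w, l, h), i.e. (dy, dx, dz),
# while our LiDAR boxes use (dx, dy, dz), i.e. (l, w, h).
nus_wlh = np.array([1.95, 4.60, 1.73])  # hypothetical car size (w, l, h)

# Swap the first two entries to obtain the (l, w, h) ordering used here.
our_lwh = nus_wlh[[1, 0, 2]]  # -> [4.60, 1.95, 1.73]
```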
Readers may refer to the [NuScenes development kit](https://github.com/nutonomy/nuscenes-devkit/tree/master/python-sdk/nuscenes/eval/detection) for the definition of a [NuScenes box](https://github.com/nutonomy/nuscenes-devkit/blob/2c6a752319f23910d5f55cc995abc547a9e54142/python-sdk/nuscenes/utils/data_classes.py#L457) and implementation of [NuScenes evaluation](https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/eval/detection/evaluate.py).
......@@ -185,25 +183,25 @@ Take the conversion between our Camera coordinate system and LiDAR coordinate sy
First, for points and box centers, the coordinates before and after the conversion satisfy the following relationship:
- `` $$`x_{LiDAR}=z_{camera}`$$ ``
- `` $$`y_{LiDAR}=-x_{camera}`$$ ``
- `` $$`z_{LiDAR}=-y_{camera}`$$ ``
Then, the box dimensions before and after the conversion satisfy the following relationship:
- `` $$`dx_{LiDAR}=dx_{camera}`$$ ``
- `` $$`dy_{LiDAR}=dz_{camera}`$$ ``
- `` $$`dz_{LiDAR}=dy_{camera}`$$ ``
Finally, the yaw angle should also be converted:
- `` $$`r_{LiDAR}=-\frac{\pi}{2}-r_{camera}`$$ ``
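Putting the three sets of relations together, a minimal NumPy sketch of the conversion could look like the following (`cam_box` is a hypothetical `(x, y, z, dx, dy, dz, r)` array, not the box classes used in the codebase):

```python
import numpy as np

def cam_box_to_lidar(cam_box):
    """Toy conversion of one (x, y, z, dx, dy, dz, r) camera box to the
    LiDAR coordinate system, following the relations listed above."""
    x, y, z, dx, dy, dz, r = cam_box
    center = [z, -x, -y]    # x_LiDAR = z_cam, y_LiDAR = -x_cam, z_LiDAR = -y_cam
    dims = [dx, dz, dy]     # dx_LiDAR = dx_cam, dy_LiDAR = dz_cam, dz_LiDAR = dy_cam
    yaw = -np.pi / 2 - r    # r_LiDAR = -pi/2 - r_cam
    return np.array(center + dims + [yaw])
```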
See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/box_3d_mode.py) for more details.
### Bird's Eye View
The BEV of a camera coordinate system box is `` $$`(x, z, dx, dz, -r)`$$ `` if the 3D box is `` $$`(x, y, z, dx, dy, dz, r)`$$ ``. The inversion of the sign of the yaw angle is because the positive direction of the gravity axis of the Camera coordinate system points to the ground.
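A minimal sketch of this projection, again assuming a plain `(x, y, z, dx, dy, dz, r)` array rather than the box classes in the codebase:

```python
import numpy as np

def cam_box_to_bev(cam_box):
    """Project a camera-coordinate box (x, y, z, dx, dy, dz, r) to its
    BEV form (x, z, dx, dz, -r) as described above."""
    x, y, z, dx, dy, dz, r = cam_box
    return np.array([x, z, dx, dz, -r])
```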
See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.
......@@ -225,18 +223,18 @@ For each box related op, we have marked the type of boxes to which we can apply
No. For example, in KITTI, we need a calibration matrix when converting from the Camera coordinate system to the LiDAR coordinate system.
#### Q3: How does a phase difference of `` $$`2\pi`$$ `` in the yaw angle of a box affect evaluation?

For IoU calculation, a phase difference of `` $$`2\pi`$$ `` in the yaw angle will result in the same box, thus not affecting evaluation.

For angle prediction evaluation such as the NDS metric in NuScenes and the AOS metric in KITTI, the angle of predicted boxes will be first standardized, so the phase difference of `` $$`2\pi`$$ `` will not change the result.
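For example, a common way to standardize an angle (a generic sketch, not the exact evaluation code) is to wrap it into a fixed period, so that values differing by `` $$`2\pi`$$ `` become identical:

```python
import numpy as np

def wrap_angle(angle, period=2 * np.pi):
    """Wrap an angle into [0, period); angles that differ by 2*pi map to
    the same value, so the phase difference does not affect evaluation."""
    return angle % period

assert np.isclose(wrap_angle(0.3), wrap_angle(0.3 + 2 * np.pi))
```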
#### Q4: How does a phase difference of `` $$`\pi`$$ `` in the yaw angle of a box affect evaluation?

For IoU calculation, a phase difference of `` $$`\pi`$$ `` in the yaw angle will result in the same box, thus not affecting evaluation.
However, for angle prediction evaluation, this will result in the exact opposite direction.
Just think about a car. The yaw angle is the angle between the direction of the car front and the positive direction of the x-axis. If we add `` $$`\pi`$$ `` to this angle, the car front will become the car rear.

For categories such as barrier, the front and the rear have no difference, therefore a phase difference of `` $$`\pi`$$ `` will not affect the angle prediction score.
......@@ -222,62 +222,62 @@ There are three ways to concatenate the dataset.
1. If the datasets you want to concatenate are of the same type but have different annotation files, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
pipeline=train_pipeline
)
```
If the concatenated dataset is used for test or evaluation, this manner supports evaluating each dataset separately. To test the concatenated datasets as a whole, you can set `separate_eval=False` as below.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
separate_eval=False,
pipeline=train_pipeline
)
```
2. In case the datasets you want to concatenate are different, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict()
dataset_B_train = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train = [
dataset_A_train,
dataset_B_train
],
val = dataset_A_val,
test = dataset_A_test
)
```
If the concatenated dataset is used for test or evaluation, this manner also supports evaluating each dataset separately.
3. We also support defining `ConcatDataset` explicitly as follows.
```python
dataset_A_val = dict()
dataset_B_val = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dataset_A_train,
val=dict(
type='ConcatDataset',
datasets=[dataset_A_val, dataset_B_val],
separate_eval=False))
```
This manner allows users to evaluate all the datasets as a single one by setting `separate_eval=False`.
**Note:**
......
......@@ -41,8 +41,8 @@ To find the above module defined above, this module should be imported into the
- Add `mmdet3d/core/optimizer/__init__.py` to import it.
The newly defined module should be imported in `mmdet3d/core/optimizer/__init__.py` so that the registry will
find the new module and add it:
```python
from .my_optimizer import MyOptimizer
......@@ -116,35 +116,35 @@ Tricks not implemented by the optimizer should be implemented through optimizer
- __Use gradient clip to stabilize training__:
Some models need gradient clip to clip the gradients to stabilize the training process. An example is as below:
```python
optimizer_config = dict(
_delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
```
If your config inherits the base config which already sets the `optimizer_config`, you might need `_delete_=True` to override the unnecessary settings in the base config. See the [config documentation](https://mmdetection.readthedocs.io/en/latest/tutorials/config.html) for more details.
- __Use momentum schedule to accelerate model convergence__:
We support a momentum scheduler that modifies the model's momentum according to the learning rate, which can make the model converge faster.
The momentum scheduler is usually used together with the LR scheduler; for example, the following config is used in 3D detection to accelerate convergence.
For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/v1.3.7/mmcv/runner/hooks/lr_updater.py#L358) and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/v1.3.7/mmcv/runner/hooks/momentum_updater.py#L225).
```python
lr_config = dict(
policy='cyclic',
target_ratio=(10, 1e-4),
cyclic_times=1,
step_ratio_up=0.4,
)
momentum_config = dict(
policy='cyclic',
target_ratio=(0.85 / 0.95, 1),
cyclic_times=1,
step_ratio_up=0.4,
)
```
## Customize training schedules
......@@ -153,20 +153,20 @@ We support many other learning rate schedule [here](https://github.com/open-mmla
- Poly schedule:
```python
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
```
- CosineAnnealing schedule:
```python
lr_config = dict(
policy='CosineAnnealing',
warmup='linear',
warmup_iters=1000,
warmup_ratio=1.0 / 10,
min_lr_ratio=1e-5)
```
## Customize workflow
......@@ -240,8 +240,8 @@ Then we need to make `MyHook` imported. Assuming the hook is in `mmdet3d/core/ut
- Modify `mmdet3d/core/utils/__init__.py` to import it.
The newly defined module should be imported in `mmdet3d/core/utils/__init__.py` so that the registry will
find the new module and add it:
```python
from .my_hook import MyHook
......
......@@ -86,100 +86,113 @@ For each operation, we list the related dict fields that are added/updated/remov
### Data loading
`LoadPointsFromFile`
- add: points
`LoadPointsFromMultiSweeps`
- update: points
`LoadAnnotations3D`
- add: gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels, pts_instance_mask, pts_semantic_mask, bbox3d_fields, pts_mask_fields, pts_seg_fields
### Pre-processing
`GlobalRotScaleTrans`
- add: pcd_trans, pcd_rotation, pcd_scale_factor
- update: points, \*bbox3d_fields
`RandomFlip3D`
- add: flip, pcd_horizontal_flip, pcd_vertical_flip
- update: points, \*bbox3d_fields
`PointsRangeFilter`
- update: points
`ObjectRangeFilter`
- update: gt_bboxes_3d, gt_labels_3d
`ObjectNameFilter`
- update: gt_bboxes_3d, gt_labels_3d
`PointShuffle`
- update: points
`PointsRangeFilter`
- update: points
### Formatting
`DefaultFormatBundle3D`
- update: points, gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels
`Collect3D`
- add: img_meta (the keys of img_meta are specified by `meta_keys`)
- remove: all other keys except for those specified by `keys`
### Test time augmentation
`MultiScaleFlipAug`
- update: scale, pcd_scale_factor, flip, flip_direction, pcd_horizontal_flip, pcd_vertical_flip with list of augmented data with these specific parameters
## Extend and use custom pipelines
1. Write a new pipeline in any file, e.g., `my_pipeline.py`. It takes a dict as input and returns a dict.
```python
from mmdet.datasets import PIPELINES

@PIPELINES.register_module()
class MyTransform:

    def __call__(self, results):
        results['dummy'] = True
        return results
```
2. Import the new class.
```python
from .my_pipeline import MyTransform
```
3. Use it in config files.
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
file_client_args=file_client_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
file_client_args=file_client_args),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='MyTransform'),
dict(type='PointShuffle'),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
......@@ -8,4 +8,5 @@
customize_runtime.md
coord_sys_tutorial.md
backends_support.md
model_deployment.md
pure_point_cloud_dataset.md
......@@ -37,17 +37,17 @@ python ./tools/deploy.py \
### Description of all arguments
- `deploy_cfg`: The path of the deploy config file in the MMDeploy codebase.
- `model_cfg`: The path of the model config file in the OpenMMLab codebase.
- `checkpoint`: The path of the model checkpoint file.
- `img`: The path of the point cloud file or image file used to convert the model.
- `--test-img`: The path of the image file used to test the model. If not specified, it will be set to `None`.
- `--work-dir`: The path of the work directory used to save logs and models.
- `--calib-dataset-cfg`: Only valid in int8 mode. Config used for calibration. If not specified, it will be set to `None` and the "val" dataset in the model config will be used for calibration.
- `--device`: The device used for conversion. If not specified, it will be set to `cpu`.
- `--log-level`: The log level, one of `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
- `--show`: Whether to show detection outputs.
- `--dump-info`: Whether to output information for SDK.
### Example
......@@ -110,12 +110,12 @@ python tools/test.py \
## Supported models
| Model | TorchScript | OnnxRuntime | TensorRT | NCNN | PPLNN | OpenVINO | Model config |
| -------------------- | :---------: | :---------: | :------: | :--: | :---: | :------: | -------------------------------------------------------------------------------------- |
| PointPillars | ? | Y | Y | N | N | Y | [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars) |
| CenterPoint (pillar) | ? | Y | Y | N | N | Y | [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/centerpoint) |
## Note
- MMDeploy version >= 0.4.0.
- Currently, only the pillar version of CenterPoint is supported.
# Tutorial 9: Use Pure Point Cloud Dataset
## Data Pre-Processing
......@@ -261,62 +261,62 @@ There are three ways to concatenate the dataset.
1. If the datasets you want to concatenate are of the same type but have different annotation files, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
pipeline=train_pipeline
)
```
If the concatenated dataset is used for test or evaluation, this manner supports evaluating each dataset separately. To test the concatenated datasets as a whole, you can set `separate_eval=False` as below.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
separate_eval=False,
pipeline=train_pipeline
)
```
2. In case the datasets you want to concatenate are different, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict()
dataset_B_train = dict()
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train = [
        dataset_A_train,
        dataset_B_train
    ],
    val = dataset_A_val,
    test = dataset_A_test
)
```

If the concatenated dataset is used for test or evaluation, this manner also supports evaluating each dataset separately.
3. We also support defining `ConcatDataset` explicitly as follows.
```python
dataset_A_val = dict()
dataset_B_val = dict()
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dataset_A_train,
    val=dict(
        type='ConcatDataset',
        datasets=[dataset_A_val, dataset_B_val],
        separate_eval=False))
```

This manner allows users to evaluate all the datasets as a single one by setting `separate_eval=False`.
**Note:**
......@@ -435,7 +435,6 @@ Here you can refer to the setting of the existing datasets. theoretically, `voxe
if the `point_cloud_range` and `voxel_size` are set to `[0, -40, -3, 70.4, 40, 1]` and `[0.05, 0.05, 0.1]` respectively, then the shape of the intermediate feature map should be `[(1-(-3))/0.1+1, (40-(-40))/0.05, (70.4-0)/0.05]=[41, 1600, 1408]`. More details can be found in this [issue](https://github.com/open-mmlab/mmdetection3d/issues/382).
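As a quick sanity check, the shape above can be reproduced directly from `point_cloud_range` and `voxel_size` (a rough sketch that ignores implementation details such as padding):

```python
point_cloud_range = [0, -40, -3, 70.4, 40, 1]
voxel_size = [0.05, 0.05, 0.1]

# Number of voxels along x, y and z: (range_max - range_min) / voxel_size.
nx = (point_cloud_range[3] - point_cloud_range[0]) / voxel_size[0]  # 1408
ny = (point_cloud_range[4] - point_cloud_range[1]) / voxel_size[1]  # 1600
nz = (point_cloud_range[5] - point_cloud_range[2]) / voxel_size[2]  # 40

# The intermediate feature map is laid out as [z, y, x]; the text above
# uses (1 - (-3)) / 0.1 + 1 = 41 for the z dimension.
print([round(nz) + 1, round(ny), round(nx)])  # [41, 1600, 1408]
```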
### Adjust Anchor Range and Size in Config
```python
......@@ -450,6 +449,7 @@ anchor_generator=dict(
rotations=[0, 1.57],
reshape_out=False),
```
Regarding the setting of `anchor_range`, it is generally adjusted according to the dataset. Note that the `z` value needs to be adjusted according to the position of the point cloud; please refer to this [issue](https://github.com/open-mmlab/mmdetection3d/issues/986).
Regarding the setting of `anchor_size`, it is usually necessary to compute the average length, width and height of objects in the entire training dataset and use them as `anchor_size` to obtain the best results.
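A rough sketch of how such statistics could be collected is shown below; `gt_boxes_list` is a hypothetical list of `(N_i, 7)` arrays in `(x, y, z, dx, dy, dz, r)` format for one class, not an MMDetection3D API:

```python
import numpy as np

def mean_anchor_size(gt_boxes_list):
    """Average (dx, dy, dz) over all ground-truth boxes of one class,
    which can then be used as the anchor_size for that class."""
    all_boxes = np.concatenate(gt_boxes_list, axis=0)  # (sum(N_i), 7)
    return all_boxes[:, 3:6].mean(axis=0)              # mean (dx, dy, dz)
```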
......
......@@ -14,26 +14,26 @@ python tools/analysis_tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title
Examples:
- Plot the classification loss of some run.
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
```
- Plot the classification and regression loss of some run, and save the figure to a pdf.
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf
```
- Compare the bbox mAP of two runs in the same figure.
```shell
# evaluate PartA2 and second on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/PartA2.log.json tools/logs/second.log.json --keys KITTI/Car_3D_moderate_strict --legend PartA2 second --mode eval --interval 1
# evaluate PointPillars for car and 3 classes on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/pp-3class.log.json tools/logs/pp.log.json --keys KITTI/Car_3D_moderate_strict --legend pp-3class pp --mode eval --interval 2
```
You can also compute the average training speed.
......@@ -51,7 +51,7 @@ time std over epochs is 0.0028
average iter time: 1.1959 s/iter
```
&#8195;
# Visualization
......@@ -126,7 +126,7 @@ python tools/misc/browse_dataset.py configs/_base_/datasets/nus-mono3d.py --task
![](../../resources/browse_dataset_mono.png)
&#8195;
# Model Serving
......@@ -184,7 +184,7 @@ Example:
python tools/deployment/test_torchserver.py demo/data/kitti/kitti_000008.bin configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth second
```
&#8195;
# Model Complexity
......@@ -213,7 +213,7 @@ comparisons, but double check it before you adopt it in technical reports or pap
2. Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details.
3. We currently only support FLOPs calculation of single-stage models with single-modality input (point cloud or image). We will support two-stage and multi-modality models in the future.
&#8195;
# Model Conversion
......@@ -258,7 +258,7 @@ python tools/model_converters/publish_model.py work_dirs/faster_rcnn/latest.pth
The final output filename will be `faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth`.
&#8195;
# Dataset Conversion
......@@ -271,15 +271,15 @@ python -u tools/data_converter/nuimage_converter.py --data-root ${DATA_ROOT} --v
--out-dir ${OUT_DIR} --nproc ${NUM_WORKERS} --extra-tag ${TAG}
```
- `--data-root`: the root of the dataset, defaults to `./data/nuimages`.
- `--version`: the version of the dataset, defaults to `v1.0-mini`. To get the full dataset, please use `--version v1.0-train v1.0-val v1.0-mini`.
- `--out-dir`: the output directory of annotations and semantic masks, defaults to `./data/nuimages/annotations/`.
- `--nproc`: number of workers for data preparation, defaults to `4`. A larger number could reduce the preparation time, as images are processed in parallel.
- `--extra-tag`: extra tag of the annotations, defaults to `nuimages`. This can be used to separate annotations processed at different times for study.
More details can be found in the [doc](https://mmdetection3d.readthedocs.io/en/latest/data_preparation.html) for dataset preparation and the [README](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/nuimages/README.md/) for the nuImages dataset.
&#8195;
# Miscellaneous
......
......@@ -32,6 +32,7 @@ python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [-
Currently we only support CPU inference testing for SMOKE.

Optional arguments:

- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- `EVAL_METRICS`: Items to be evaluated on the results; different datasets have different legal values. Specifically, we adopt the official evaluation protocol of each dataset by default, so for the detection task it can simply be set to `mAP` for nuScenes, Lyft, ScanNet and SUN RGB-D. For KITTI, if we only want to evaluate the 2D detection performance, the metric can be set to `img_bbox`. For Waymo, we provide both the KITTI-style (unstable) and the Waymo official-style evaluation, corresponding to `kitti` and `waymo` respectively; we recommend the default official metric, which is stable and allows fair comparison with other methods. Likewise, for the segmentation task on S3DIS and ScanNet the metric can be set to `mIoU`.
- `--show`: If specified, detection results will be saved in silent mode for debugging and visualization. This only takes effect with single-GPU testing and should be used together with `--show-dir`.
......@@ -180,6 +181,7 @@ export CUDA_VISIBLE_DEVICES=-1
- `--options 'Key=value'`: Override some settings in the config used.

Difference between `resume-from` and `load-from`:

- `resume-from` loads both the model weights and the optimizer state, and the training epoch is also inherited from the specified checkpoint. It is usually used to resume a training process that was interrupted accidentally.
- `load-from` only loads the model weights, and the training epoch starts from 0. It is usually used for fine-tuning.
......
......@@ -72,7 +72,6 @@ KITTI 官方提供的目标检测开发[工具包](https://s3.eu-central-1.amazo
For more details about the intermediate results of Waymo dataset preprocessing, please refer to the corresponding [documentation](https://mmdetection3d.readthedocs.io/zh_CN/latest/datasets/waymo_det.html).

## Prepare a config

The second step is to prepare a config file so that the dataset can be loaded and used. In addition, adjusting hyperparameters is usually necessary to obtain decent performance in 3D detection.
......
# Benchmark

Here we benchmark the training and testing speed of models in MMDetection3D against other open-source 3D detection codebases.

## Settings

- Hardware: 8 NVIDIA Tesla V100 (32G) GPUs, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
- Software: Python 3.7, CUDA 10.1, cuDNN 7.6.5, PyTorch 1.3, numba 0.48.0
- Model: since different codebases implement different sets of models, we select SECOND, PointPillars, Part-A2 and VoteNet for the benchmark and compare each of them with the corresponding implementations in other codebases.
- Metric: we use the average throughput during the whole training process as the metric and skip the first 50 iterations of each epoch to eliminate the warm-up effect.

## Main Results

For training speed (samples per second), we compare MMDetection3D with other codebases that implement the same models. The results are shown below; a larger number indicates faster training. Models that are not supported by a codebase are marked with `×`.
| Model | MMDetection3D | OpenPCDet | votenet | Det3D |
| :-----------------: | :-----------: | :-------: | :-----: | :---: |
| VoteNet | 358 | × | 77 | × |
| PointPillars-car | 141 | × | × | 140 |
| PointPillars-3class | 107 | 44 | × | × |
| SECOND | 40 | 30 | × | × |
| Part-A2 | 17 | 14 | × | × |
## Details of Comparison

### Modifications for Calculating Speed

- __MMDetection3D__: we try to use configurations as similar as possible to those in the other codebases; see the [benchmark configurations](https://github.com/open-mmlab/MMDetection3D/blob/master/configs/benchmark) for details.
- __Det3D__: for comparison with Det3D, we use the code at commit [519251e](https://github.com/poodarchu/Det3D/tree/519251e72a5c1fdd58972eabeac67808676b9bb7).
- __OpenPCDet__: for comparison with OpenPCDet, we use the code at commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2).

To calculate the training speed, we add code that records the running time to `./tools/train_utils/train_utils.py`. We compute the training speed of each epoch and report the average speed over all epochs.
<details>
<summary>
(The detailed modifications for benchmarking with the same method - click to expand)
</summary>
```diff
diff --git a/tools/train_utils/train_utils.py b/tools/train_utils/train_utils.py
index 91f21dd..021359d 100644
--- a/tools/train_utils/train_utils.py
+++ b/tools/train_utils/train_utils.py
@@ -2,6 +2,7 @@ import torch
import os
import glob
import tqdm
+import datetime
from torch.nn.utils import clip_grad_norm_
@@ -13,7 +14,10 @@ def train_one_epoch(model, optimizer, train_loader, model_func, lr_scheduler, ac
if rank == 0:
pbar = tqdm.tqdm(total=total_it_each_epoch, leave=leave_pbar, desc='train', dynamic_ncols=True)
+ start_time = None
for cur_it in range(total_it_each_epoch):
+ if cur_it > 49 and start_time is None:
+ start_time = datetime.datetime.now()
try:
batch = next(dataloader_iter)
except StopIteration:
@@ -55,9 +59,11 @@ def train_one_epoch(model, optimizer, train_loader, model_func, lr_scheduler, ac
tb_log.add_scalar('learning_rate', cur_lr, accumulated_iter)
for key, val in tb_dict.items():
tb_log.add_scalar('train_' + key, val, accumulated_iter)
+ endtime = datetime.datetime.now()
+ speed = (endtime - start_time).seconds / (total_it_each_epoch - 50)
if rank == 0:
pbar.close()
- return accumulated_iter
+ return accumulated_iter, speed
def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_cfg,
@@ -65,6 +71,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
lr_warmup_scheduler=None, ckpt_save_interval=1, max_ckpt_save_num=50,
merge_all_iters_to_one_epoch=False):
accumulated_iter = start_iter
+ speeds = []
with tqdm.trange(start_epoch, total_epochs, desc='epochs', dynamic_ncols=True, leave=(rank == 0)) as tbar:
total_it_each_epoch = len(train_loader)
if merge_all_iters_to_one_epoch:
@@ -82,7 +89,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
cur_scheduler = lr_warmup_scheduler
else:
cur_scheduler = lr_scheduler
- accumulated_iter = train_one_epoch(
+ accumulated_iter, speed = train_one_epoch(
model, optimizer, train_loader, model_func,
lr_scheduler=cur_scheduler,
accumulated_iter=accumulated_iter, optim_cfg=optim_cfg,
@@ -91,7 +98,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
total_it_each_epoch=total_it_each_epoch,
dataloader_iter=dataloader_iter
)
-
+ speeds.append(speed)
# save trained model
trained_epoch = cur_epoch + 1
if trained_epoch % ckpt_save_interval == 0 and rank == 0:
@@ -107,6 +114,8 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_
save_checkpoint(
checkpoint_state(model, optimizer, trained_epoch, accumulated_iter), filename=ckpt_name,
)
+ print(speed)
+ print(f'*******{sum(speeds) / len(speeds)}******')
def model_state_to_cpu(model_state):
```
</details>
### VoteNet
- __MMDetection3D__: with release v0.1.0, run the following command:
```bash
./tools/dist_train.sh configs/votenet/votenet_16x8_sunrgbd-3d-10class.py 8 --no-validate
```
- __votenet__: at commit [2f6d6d3](https://github.com/facebookresearch/votenet/tree/2f6d6d36ff98d96901182e935afe48ccee82d566), run the following command:
```bash
python train.py --dataset sunrgbd --batch_size 16
```
Then run the following command to evaluate the test speed:
```bash
python eval.py --dataset sunrgbd --checkpoint_path log_sunrgbd/checkpoint.tar --batch_size 1 --dump_dir eval_sunrgbd --cluster_sampling seed_fps --use_3d_nms --use_cls_nms --per_class_proposal
```
Note that we modified `eval.py` in order to calculate the inference speed.
<details>
<summary>
(The detailed modifications for testing the same models - click to expand)
</summary>
```diff
diff --git a/eval.py b/eval.py
index c0b2886..04921e9 100644
--- a/eval.py
+++ b/eval.py
@@ -10,6 +10,7 @@ import os
import sys
import numpy as np
from datetime import datetime
+import time
import argparse
import importlib
import torch
@@ -28,7 +29,7 @@ parser.add_argument('--checkpoint_path', default=None, help='Model checkpoint pa
parser.add_argument('--dump_dir', default=None, help='Dump dir to save sample outputs [default: None]')
parser.add_argument('--num_point', type=int, default=20000, help='Point Number [default: 20000]')
parser.add_argument('--num_target', type=int, default=256, help='Point Number [default: 256]')
-parser.add_argument('--batch_size', type=int, default=8, help='Batch Size during training [default: 8]')
+parser.add_argument('--batch_size', type=int, default=1, help='Batch Size during training [default: 8]')
parser.add_argument('--vote_factor', type=int, default=1, help='Number of votes generated from each seed [default: 1]')
parser.add_argument('--cluster_sampling', default='vote_fps', help='Sampling strategy for vote clusters: vote_fps, seed_fps, random [default: vote_fps]')
parser.add_argument('--ap_iou_thresholds', default='0.25,0.5', help='A list of AP IoU thresholds [default: 0.25,0.5]')
@@ -132,6 +133,7 @@ CONFIG_DICT = {'remove_empty_box': (not FLAGS.faster_eval), 'use_3d_nms': FLAGS.
# ------------------------------------------------------------------------- GLOBAL CONFIG END
def evaluate_one_epoch():
+ time_list = list()
stat_dict = {}
ap_calculator_list = [APCalculator(iou_thresh, DATASET_CONFIG.class2type) \
for iou_thresh in AP_IOU_THRESHOLDS]
@@ -144,6 +146,8 @@ def evaluate_one_epoch():
# Forward pass
inputs = {'point_clouds': batch_data_label['point_clouds']}
+ torch.cuda.synchronize()
+ start_time = time.perf_counter()
with torch.no_grad():
end_points = net(inputs)
@@ -161,6 +165,12 @@ def evaluate_one_epoch():
batch_pred_map_cls = parse_predictions(end_points, CONFIG_DICT)
batch_gt_map_cls = parse_groundtruths(end_points, CONFIG_DICT)
+ torch.cuda.synchronize()
+ elapsed = time.perf_counter() - start_time
+ time_list.append(elapsed)
+
+    if len(time_list) == 200:
+ print("average inference time: %4f"%(sum(time_list[5:])/len(time_list[5:])))
for ap_calculator in ap_calculator_list:
ap_calculator.step(batch_pred_map_cls, batch_gt_map_cls)
```
### PointPillars-car
- __MMDetection3D__: with release v0.1.0, run the following command:
```bash
./tools/dist_train.sh configs/benchmark/hv_pointpillars_secfpn_3x8_100e_det3d_kitti-3d-car.py 8 --no-validate
```
- __Det3D__: at commit [519251e](https://github.com/poodarchu/Det3D/tree/519251e72a5c1fdd58972eabeac67808676b9bb7), use `kitti_point_pillars_mghead_syncbn.py` and run the following command:
```bash
./tools/scripts/train.sh --launcher=slurm --gpus=8
```
Note that we modified `train.sh` in order to train PointPillars.
<details>
<summary>
(The detailed modifications for testing the same models - click to expand)
</summary>
```diff
diff --git a/tools/scripts/train.sh b/tools/scripts/train.sh
index 3a93f95..461e0ea 100755
--- a/tools/scripts/train.sh
+++ b/tools/scripts/train.sh
@@ -16,9 +16,9 @@ then
fi
# Voxelnet
-python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/ kitti_car_vfev3_spmiddlefhd_rpn1_mghead_syncbn.py --work_dir=$SECOND_WORK_DIR
+# python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/ kitti_car_vfev3_spmiddlefhd_rpn1_mghead_syncbn.py --work_dir=$SECOND_WORK_DIR
# python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/cbgs/configs/ nusc_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py --work_dir=$NUSC_CBGS_WORK_DIR
# python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/ lyft_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py --work_dir=$LYFT_CBGS_WORK_DIR
# PointPillars
-# python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py ./examples/point_pillars/configs/ original_pp_mghead_syncbn_kitti.py --work_dir=$PP_WORK_DIR
+python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py ./examples/point_pillars/configs/ kitti_point_pillars_mghead_syncbn.py
```
</details>
### PointPillars-3class
- __MMDetection3D__: with release v0.1.0, run the following command:
```bash
./tools/dist_train.sh configs/benchmark/hv_pointpillars_secfpn_4x8_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
- __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), run the following command:
```bash
cd tools
sh scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/pointpillar.yaml --batch_size 32 --workers 32 --epochs 80
```
### SECOND
SECOND in this benchmark refers to [SECONDv1.5](https://github.com/traveller59/second.pytorch/blob/master/second/configs/all.fhd.config), which was first implemented in [second.Pytorch](https://github.com/traveller59/second.pytorch). Since the SECOND implementation in Det3D uses its own Multi-Group Head, its speed cannot be compared with the other codebases.
- __MMDetection3D__: With release v0.1.0, run the following command:
```bash
./tools/dist_train.sh configs/benchmark/hv_second_secfpn_4x8_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
- __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), run the following command:
```bash
cd tools
sh ./scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/second.yaml --batch_size 32 --workers 32 --epochs 80
```
### Part-A2
- __MMDetection3D__: With release v0.1.0, run the following command:
```bash
./tools/dist_train.sh configs/benchmark/hv_PartA2_secfpn_4x8_cyclic_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
- __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), run the following command to train the model:
```bash
cd tools
sh ./scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/PartA2.yaml --batch_size 32 --workers 32 --epochs 80
```
......@@ -136,10 +136,11 @@ latex_documents = [
StandaloneHTMLBuilder.supported_image_types = [
'image/svg+xml', 'image/gif', 'image/png', 'image/jpeg'
]
# -- Extension configuration -------------------------------------------------
# Ignore >>> when copying code
copybutton_prompt_text = r'>>> |\.\.\. '
copybutton_prompt_is_regexp = True
# Enable ::: for MyST
myst_enable_extensions = ['colon_fence']
myst_heading_anchors = 3
language = 'zh_CN'
def builder_inited_handler(app):
......
......@@ -88,27 +88,27 @@ kitti
- `kitti_gt_database/xxxxx.bin`: point cloud data of the objects contained in the 3D annotation boxes of the training set.
- `kitti_infos_train.pkl`: information of the training set, where the info of each frame contains the following:
  - `info['point_cloud']`: {'num_features': 4, 'velodyne_path': velodyne_path}.
  - `info['annos']`: {
    - location: x, y, z are the bottom centers of the objects in the camera reference coordinate system (in meters), an Nx3 array
    - dimensions: height, width and length of the objects (in meters), an Nx3 array
    - rotation_y: rotation ry of the objects around the Y axis of the camera coordinate system, within the range [-pi..pi], an array of size N
    - name: names of the objects contained in the ground truth boxes, an array of size N
    - difficulty: difficulty defined by KITTI: easy, moderate, hard
    - group_ids: used for multi-part objects
      }
  - (optional) `info['calib']`: {
    - P0: rectified projection matrix of camera0, a 3x4 array
    - P1: rectified projection matrix of camera1, a 3x4 array
    - P2: rectified projection matrix of camera2, a 3x4 array
    - P3: rectified projection matrix of camera3, a 3x4 array
    - R0_rect: rectifying rotation matrix, a 4x4 array
    - Tr_velo_to_cam: transformation matrix from Velodyne coordinates to camera coordinates, a 4x4 array
    - Tr_imu_to_velo: transformation matrix from IMU coordinates to Velodyne coordinates, a 4x4 array
      }
  - (optional) `info['image']`: {'image_idx': idx, 'image_path': image_path, 'image_shape': image_shape}.

  **Note**: the data in `info['annos']` are all in the camera reference coordinate system. Please refer to [here](http://www.cvlibs.net/publications/Geiger2013IJRR.pdf) for more details.

The core functions for obtaining `kitti_infos_xxx.pkl` and `kitti_infos_xxx_mono3d.coco.json` are [get_kitti_image_info](https://github.com/open-mmlab/mmdetection3d/blob/7873c8f62b99314f35079f369d1dab8d63f8a3ce/tools/data_converter/kitti_data_utils.py#L140) and [get_2d_boxes](https://github.com/open-mmlab/mmdetection3d/blob/7873c8f62b99314f35079f369d1dab8d63f8a3ce/tools/data_converter/kitti_converter.py#L378), respectively.
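To see the structure above in practice, the following hedged sketch loads the generated info file with the standard `pickle` module and prints the annotation fields of the first frame; the path is only an assumption about where the file was generated.
```python
import pickle

# Hypothetical output path of the converter; adjust to your data root.
with open('data/kitti/kitti_infos_train.pkl', 'rb') as f:
    infos = pickle.load(f)            # a list with one dict per frame

frame = infos[0]
print(frame['point_cloud'])           # {'num_features': 4, 'velodyne_path': ...}
annos = frame['annos']
print(annos['name'])                  # object names, array of size N
print(annos['location'].shape)        # bottom centers in the camera frame, (N, 3)
print(annos['dimensions'].shape)      # height, width, length, (N, 3)
print(annos['rotation_y'].shape)      # yaw around the camera Y axis, (N,)
```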
......@@ -150,9 +150,9 @@ train_pipeline = [
```
- Data augmentation:
  - `ObjectNoise`: apply noise to each ground truth object in the scene.
  - `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
  - `GlobalRotScaleTrans`: rotate the input point cloud.
## Evaluation
......@@ -191,4 +191,4 @@ mkdir -p results/kitti-3class
./tools/dist_test.sh configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py work_dirs/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class/latest.pth 8 --out results/kitti-3class/results_eval.pkl --format-only --eval-options 'pklfile_prefix=results/kitti-3class/kitti_results' 'submission_prefix=results/kitti-3class/kitti_results'
```
After generating `results/kitti-3class/kitti_results/xxxxx.txt`, you can submit these files to the KITTI benchmark. Please refer to the [KITTI official website](http://www.cvlibs.net/datasets/kitti/index.php) for more details.
......@@ -89,21 +89,21 @@ mmdetection3d
- `lyft_database/xxxxx.bin`: this file does not exist, since the sampling of ground truth boxes has a negligible effect on the experiments; this directory and the related `.bin` files are therefore not extracted for the Lyft dataset.
- `lyft_infos_train.pkl`: information of the training set, where each frame has two keys: `metadata` and `infos`.
  `metadata` contains the basic information of the dataset itself, such as `{'version': 'v1.01-train'}`, while `infos` contains detailed information similar to that of the nuScenes dataset, except for the following points:
  - `info['sweeps']`: sweep information.
    - `info['sweeps'][i]['type']`: data type of the sweep, e.g. `'lidar'`.
      Some samples in the Lyft dataset have different LiDAR settings; for consistency of the data distribution, the points collected by the top LiDAR are always used here.
  - `info['gt_names']`: there are 9 categories in the Lyft dataset, and the annotation imbalance across categories is more severe than in nuScenes.
  - `info['gt_velocity']`: does not exist, since there is no velocity annotation in the Lyft dataset.
  - `info['num_lidar_pts']`: set to -1 by default.
  - `info['num_radar_pts']`: set to 0 by default.
  - `info['valid_flag']`: does not exist, since this flag only exists because of the invalid `num_lidar_pts` and `num_radar_pts`.
- `nuscenes_infos_train_mono3d.coco.json`: coco-style information of the training set. This file only contains 2D-related information and does not contain the information required for 3D object detection, such as camera intrinsics.
  - `info['images']`: a list containing all the image info.
    - It only contains `'file_name'`, `'id'`, `'width'`, `'height'`.
  - `info['annotations']`: a list containing all the annotation info.
    - It only contains `'file_name'`, `'image_id'`, `'area'`, `'category_name'`, `'category_id'`, `'bbox'`, `'is_crowd'`, `'segmentation'`, `'id'`, where `'is_crowd'` and `'segmentation'` are set to `0` and `[]` by default.
      There is no attribute annotation in the Lyft dataset.
Here we only explain the data recorded in the training info files; the same recording format applies to the test set.
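As a quick way to inspect the class imbalance mentioned above, here is a minimal sketch (assuming the info file was generated at the hypothetical path used below) that counts how often each class name occurs in `gt_names`:
```python
import pickle
from collections import Counter

# Hypothetical output path of the converter; adjust to your data root.
with open('data/lyft/lyft_infos_train.pkl', 'rb') as f:
    data = pickle.load(f)

print(data['metadata'])                   # e.g. {'version': 'v1.01-train'}
counter = Counter()
for info in data['infos']:
    counter.update(info['gt_names'])      # class names of the boxes in this frame
print(counter.most_common())              # classes sorted by frequency
```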
......
......@@ -64,60 +64,60 @@ mmdetection3d
- `nuscenes_database/xxxxx.bin`: point cloud data of the objects contained in each 3D bounding box of the training set.
- `nuscenes_infos_train.pkl`: information of the training set. Each frame has two keys: `metadata` and `infos`. `metadata` contains the basic information of the dataset itself, such as `{'version': 'v1.0-trainval'}`, while `infos` contains the detailed information as follows:
  - `info['lidar_path']`: file path of the LiDAR point cloud data.
  - `info['token']`: sample data token.
  - `info['sweeps']`: sweep information (in nuScenes, `sweeps` refers to the intermediate frames without annotations, while `samples` refers to the annotated keyframes).
    - `info['sweeps'][i]['data_path']`: data path of the i-th sweep.
    - `info['sweeps'][i]['type']`: data type of the sweep, e.g. `'lidar'`.
    - `info['sweeps'][i]['sample_data_token']`: sample data token of the sweep.
    - `info['sweeps'][i]['sensor2ego_translation']`: translation from the sensor that collects the sweep to the ego vehicle (the vehicle carrying the perception sensors, to which the ego coordinate system is attached), a 1x3 list.
    - `info['sweeps'][i]['sensor2ego_rotation']`: rotation from the sensor that collects the sweep to the ego vehicle, a 1x4 list in quaternion format.
    - `info['sweeps'][i]['ego2global_translation']`: translation from the ego vehicle to the global frame, a 1x3 list.
    - `info['sweeps'][i]['ego2global_rotation']`: rotation from the ego vehicle to the global frame, a 1x4 list in quaternion format.
    - `info['sweeps'][i]['timestamp']`: timestamp of the sweep data.
    - `info['sweeps'][i]['sensor2lidar_translation']`: translation from the sensor that collects the sweep to the LiDAR, a 1x3 list.
    - `info['sweeps'][i]['sensor2lidar_rotation']`: rotation from the sensor that collects the sweep to the LiDAR, a 1x4 list in quaternion format.
  - `info['cams']`: camera calibration information. It contains six keys, one for each camera: `'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_BACK'`, `'CAM_BACK_LEFT'`, `'CAM_BACK_RIGHT'`. Each entry contains the detailed information of the corresponding sweep data in the format described above (with the same keys). In addition, each camera has an extra key `'cam_intrinsic'` that stores the intrinsic parameters needed to project 3D points onto the image plane.
  - `info['lidar2ego_translation']`: translation from the LiDAR to the ego vehicle, a 1x3 list.
  - `info['lidar2ego_rotation']`: rotation from the LiDAR to the ego vehicle, a 1x4 list in quaternion format.
  - `info['ego2global_translation']`: translation from the ego vehicle to the global frame, a 1x3 list.
  - `info['ego2global_rotation']`: rotation from the ego vehicle to the global frame, a 1x4 list in quaternion format.
  - `info['timestamp']`: timestamp of the sample data.
  - `info['gt_boxes']`: 3D bounding boxes with 7 degrees of freedom, an Nx7 array.
  - `info['gt_names']`: categories of the 3D bounding boxes, a 1xN array.
  - `info['gt_velocity']`: velocities of the 3D bounding boxes (no vertical measurement due to inaccuracy), an Nx2 array.
  - `info['num_lidar_pts']`: number of LiDAR points contained in each 3D bounding box.
  - `info['num_radar_pts']`: number of radar points contained in each 3D bounding box.
  - `info['valid_flag']`: whether each bounding box is valid. In general, we only treat 3D boxes containing at least one LiDAR or radar point as valid.
- `nuscenes_infos_train_mono3d.coco.json`: coco-style information of the training set. The file organizes the image-based data under three keys: `'categories'`, `'images'` and `'annotations'`.
  - `info['categories']`: a list containing all the category names. Each element follows the dict format with two keys: `'id'` and `'name'`.
  - `info['images']`: a list containing all the image info.
    - `info['images'][i]['file_name']`: file name of the i-th image.
    - `info['images'][i]['id']`: sample data token of the i-th image.
    - `info['images'][i]['token']`: sample token corresponding to this frame.
    - `info['images'][i]['cam2ego_rotation']`: rotation from the camera to the ego vehicle, a 1x4 list in quaternion format.
    - `info['images'][i]['cam2ego_translation']`: translation from the camera to the ego vehicle, a 1x3 list.
    - `info['images'][i]['ego2global_rotation']`: rotation from the ego vehicle to the global frame, a 1x4 list in quaternion format.
    - `info['images'][i]['ego2global_translation']`: translation from the ego vehicle to the global frame, a 1x3 list.
    - `info['images'][i]['cam_intrinsic']`: camera intrinsic matrix, a 3x3 list.
    - `info['images'][i]['width']`: image width, 1600 by default in nuScenes.
    - `info['images'][i]['height']`: image height, 900 by default in nuScenes.
  - `info['annotations']`: a list containing all the annotation info.
    - `info['annotations'][i]['file_name']`: file name of the corresponding image.
    - `info['annotations'][i]['image_id']`: image id (token) of the corresponding image.
    - `info['annotations'][i]['area']`: area of the 2D bounding box.
    - `info['annotations'][i]['category_name']`: category name.
    - `info['annotations'][i]['category_id']`: category id.
    - `info['annotations'][i]['bbox']`: 2D bounding box annotation (the exterior rectangle of the projected 3D box), a 1x4 list in the form [x1, y1, x2-x1, y2-y1], where x1/y1 are the minimum coordinates along the horizontal/vertical direction of the image.
    - `info['annotations'][i]['iscrowd']`: whether the region is crowded, 0 by default.
    - `info['annotations'][i]['bbox_cam3d']`: 3D bounding box given as (gravity) center location (3), size (3) and (global) yaw angle (1), a 1x7 list.
    - `info['annotations'][i]['velo_cam3d']`: velocities of the 3D bounding boxes (no vertical measurement due to inaccuracy), an Nx2 array.
    - `info['annotations'][i]['center2d']`: projected 3D center containing 2.5D information: the projected center location on the image (2) and the depth (1), a 1x3 list.
    - `info['annotations'][i]['attribute_name']`: attribute name.
    - `info['annotations'][i]['attribute_id']`: attribute id. We maintain a set of attributes and a mapping for attribute classification; please refer to [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/nuscenes_mono_dataset.py#L53) for more details.
    - `info['annotations'][i]['id']`: annotation id, `i` by default.
Here we only explain the data recorded in the training info files; the same applies to the validation and test sets.
The core functions for obtaining `nuscenes_infos_xxx.pkl` and `nuscenes_infos_xxx_mono3d.coco.json` are [\_fill_trainval_infos](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/data_converter/nuscenes_converter.py#L143) and [get_2d_boxes](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/data_converter/nuscenes_converter.py#L397), respectively. Please refer to [nuscenes_converter.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/data_converter/nuscenes_converter.py) for more details.
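As a worked example of how the `center2d` field relates to `bbox_cam3d` and `cam_intrinsic`, the sketch below projects a 3D center given in camera coordinates onto the image plane and keeps its depth; the intrinsic matrix and center used here are illustrative values, not taken from the dataset.
```python
import numpy as np


def project_center(center_cam, cam_intrinsic):
    """Project a 3D center in camera coordinates to (u, v, depth)."""
    K = np.asarray(cam_intrinsic, dtype=np.float64)     # 3x3 intrinsic matrix
    c = np.asarray(center_cam, dtype=np.float64)        # (x, y, z) in the camera frame
    uvw = K @ c                                          # homogeneous image coordinates
    return np.array([uvw[0] / uvw[2], uvw[1] / uvw[2], c[2]])


# Illustrative values only (not from an actual nuScenes frame):
K = [[1266.4, 0.0, 816.3], [0.0, 1266.4, 491.5], [0.0, 0.0, 1.0]]
print(project_center([2.0, 1.0, 20.0], K))              # -> [u, v, 20.0]
```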
......@@ -191,10 +191,11 @@ train_pipeline = [
```
It follows the general pipeline of 2D detection, but differs in a few details:
- It uses the monocular pipeline to load images, which includes additional required information such as the camera intrinsic matrix.
- It needs to load 3D annotations.
- Some data augmentation techniques need to be adjusted, such as `RandomFlip3D`.
  Currently we do not support more augmentation methods, because how to transfer and apply other techniques is still under exploration.
## Evaluation
......
......@@ -39,10 +39,12 @@ mmdetection3d
For example, the files under the `Area_1/office_1` directory are as follows:
- `office_1.txt`: a txt file storing the coordinates and colors of each point in the raw point cloud.
- `Annotations/`: this folder contains the information of the object instances in the room (stored as txt files). Each txt file represents one instance, e.g.:
  - `chair_1.txt`: the point cloud of one chair in the room.
    If we concatenate all the txt files under `Annotations/`, we obtain the same point cloud as in `office_1.txt`.
You can extract the S3DIS data by running `python collect_indoor3d_data.py`.
The main steps include:
......@@ -143,16 +145,16 @@ s3dis
```
- `points/xxxxx.bin`: the extracted point cloud data.
- `instance_mask/xxxxx.bin`: the instance label of each point, in the range [0, ${NUM_INSTANCES}], where 0 stands for unannotated points.
- `semantic_mask/xxxxx.bin`: the semantic label of each point, in the range [0, 12].
- `s3dis_infos_Area_1.pkl`: data information of Area 1, where the detailed information of each room is as follows:
  - `info['point_cloud']`: {'num_features': 6, 'lidar_idx': sample_idx}.
  - `info['pts_path']`: path of `points/xxxxx.bin`.
  - `info['pts_instance_mask_path']`: path of `instance_mask/xxxxx.bin`.
  - `info['pts_semantic_mask_path']`: path of `semantic_mask/xxxxx.bin`.
- `seg_info`: info files generated to support the semantic segmentation task.
  - `Area_1_label_weight.npy`: the weight of each semantic class. Since the numbers of points belonging to different classes differ greatly in S3DIS, a common practice is to re-weight the classes when computing the loss (label re-weighting) to obtain better segmentation performance.
  - `Area_1_resampled_scene_idxs.npy`: resampled indices for each scene (room). During training, each scene is resampled a different number of times according to how many points it contains, so as to keep the training data balanced.
## Training pipeline
......@@ -205,13 +207,13 @@ train_pipeline = [
]
```
- `PointSegClassMapping`: during training, only the indices of the used classes are mapped to class labels in the range [0, 13). The remaining class indices are converted to the ignore label specified by `ignore_index`, which is `13` in this example (a minimal sketch of this mapping is shown after this list).
- `IndoorPatchPointSample`: crop a patch containing a fixed number of points from the input point cloud. `block_size` specifies the side length of the cropped block, which is usually set to `1.0` on S3DIS.
- `NormalizePointsColor`: normalize the colors of the input points by dividing the RGB values by `255`.
- Data augmentation:
  - `GlobalRotScaleTrans`: randomly rotate and scale the input point cloud.
  - `RandomJitterPoints`: randomly jitter the point cloud by adding a different noise vector to each point.
  - `RandomDropPointsColor`: set the colors of the point cloud to all zeros with a probability of `drop_ratio`.
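Below is a minimal NumPy sketch of the index mapping performed by `PointSegClassMapping`, assuming an arbitrary list of valid category ids; the ids used here are illustrative rather than the actual S3DIS ones.
```python
import numpy as np


def map_seg_labels(semantic_mask, valid_cat_ids, ignore_index):
    """Map raw category ids to train ids in [0, len(valid_cat_ids));
    every other id becomes `ignore_index`."""
    size = max(int(semantic_mask.max()), max(valid_cat_ids)) + 1
    lut = np.full(size, ignore_index, dtype=np.int64)    # lookup table, default ignore
    for train_id, cat_id in enumerate(valid_cat_ids):
        lut[cat_id] = train_id
    return lut[semantic_mask]


# Illustrative ids only: 13 used classes, ignore label 13.
mask = np.array([0, 3, 7, 42, 12])
print(map_seg_labels(mask, valid_cat_ids=list(range(13)), ignore_index=13))
# -> [ 0  3  7 13 12]
```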
## Metrics
......
......@@ -223,25 +223,25 @@ scannet
```
- `points/xxxxx.bin`: the downsampled point cloud that has not been axis-aligned. Since the ScanNet 3D detection task takes axis-aligned point clouds as input while the ScanNet 3D semantic segmentation task takes the point clouds before alignment, we choose to store the unaligned point clouds together with their alignment matrices. Note that after the [`GlobalAlignment`](https://github.com/open-mmlab/mmdetection3d/blob/9f0b01caf6aefed861ef4c3eb197c09362d26b32/mmdet3d/datasets/pipelines/transforms_3d.py#L423) step of the 3D detection pre-processing pipeline, the point clouds are axis-aligned.
- `instance_mask/xxxxx.bin`: the instance label of each point, in the range [0, NUM_INSTANCES], where 0 means unannotated.
- `semantic_mask/xxxxx.bin`: the semantic label of each point, in the range [1, 40], i.e. the `nyu40id` standard. Note that in the `PointSegClassMapping` step of the training pipeline, the `nyu40id` ids are mapped to train ids.
- `posed_images/scenexxxx_xx`: a collection of `.jpg` images, together with 4x4 camera poses in `.txt` format and a single `.txt` file with the camera intrinsic matrix.
- `scannet_infos_train.pkl`: data information of the training set, where the detailed information of each scene is as follows:
  - `info['point_cloud']`: `{'num_features': 6, 'lidar_idx': sample_idx}`, where `sample_idx` is the index of the scene.
  - `info['pts_path']`: path of `points/xxxxx.bin`.
  - `info['pts_instance_mask_path']`: path of `instance_mask/xxxxx.bin`.
  - `info['pts_semantic_mask_path']`: path of `semantic_mask/xxxxx.bin`.
  - `info['annos']`: annotations of each scene.
    - `annotations['gt_num']`: the number of ground truth objects.
    - `annotations['name']`: the semantic class names of all ground truth objects, e.g. `chair`.
    - `annotations['location']`: the gravity centers of the axis-aligned 3D bounding boxes in the depth coordinate system, with shape [K, 3], where K is the number of ground truth objects.
    - `annotations['dimensions']`: the sizes of the axis-aligned 3D bounding boxes in the depth coordinate system, with shape [K, 3].
    - `annotations['gt_boxes_upright_depth']`: the axis-aligned 3D bounding boxes `(x, y, z, x_size, y_size, z_size)` in the depth coordinate system, with shape [K, 6].
    - `annotations['unaligned_location']`: the gravity centers of the unaligned (i.e. before alignment) 3D bounding boxes in the depth coordinate system.
    - `annotations['unaligned_dimensions']`: the sizes of the unaligned 3D bounding boxes in the depth coordinate system.
    - `annotations['unaligned_gt_boxes_upright_depth']`: the unaligned 3D bounding boxes in the depth coordinate system.
    - `annotations['index']`: the indices of all ground truth objects, in the range [0, K).
    - `annotations['class']`: the class labels of all ground truth objects, in the range [0, 18), with shape [K, ].
- `scannet_infos_val.pkl`: data information of the validation set, in exactly the same format as `scannet_infos_train.pkl`.
- `scannet_infos_test.pkl`: data information of the test set, in almost the same format as `scannet_infos_train.pkl`, except that it contains no annotations.
......@@ -291,11 +291,11 @@ train_pipeline = [
```
- `GlobalAlignment`: the input point cloud should become axis-aligned after the axis-alignment matrix is applied (a sketch of this step is shown after this list).
- `PointSegClassMapping`: during training, only valid category ids are mapped to class labels, e.g. in the range [0, 18).
- Data augmentation:
  - `PointSample`: downsample the input point cloud.
  - `RandomFlip3D`: randomly flip the point cloud left-right or front-back.
  - `GlobalRotScaleTrans`: rotate the input point cloud (for ScanNet the angle usually falls in [-5, 5] degrees), then scale it (for ScanNet the ratio is usually 1.0, i.e. no scaling), and finally translate it (for ScanNet the translation is usually 0, i.e. no translation).
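The alignment performed by `GlobalAlignment` is a rigid transform of the points by the stored 4x4 axis-alignment matrix. Below is a hedged sketch of that step; the matrix used in the example is an illustrative rotation, not one taken from an actual scene.
```python
import numpy as np


def align_points(points, axis_align_matrix):
    """Apply a 4x4 axis-alignment matrix to points (xyz in the first 3 columns)."""
    pts = np.asarray(points, dtype=np.float64)
    ones = np.ones((pts.shape[0], 1))
    xyz_h = np.hstack([pts[:, :3], ones])                       # homogeneous coordinates
    aligned = (np.asarray(axis_align_matrix) @ xyz_h.T).T[:, :3]
    return np.hstack([aligned, pts[:, 3:]])                     # keep color or other channels


# Illustrative matrix: 90-degree rotation around z, no translation.
theta = np.pi / 2
M = np.array([[np.cos(theta), -np.sin(theta), 0, 0],
              [np.sin(theta),  np.cos(theta), 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
print(align_points(np.array([[1.0, 0.0, 0.0, 255.0, 0.0, 0.0]]), M))
```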
## Metrics
......
......@@ -10,7 +10,6 @@ The preparation of the ScanNet 3D semantic segmentation dataset is similar to that of the 3D detection task
Since the ScanNet test set provides an online benchmark for 3D semantic segmentation, we also need to download the test set and put it under the `scannet` directory.
The directory structure before data preprocessing should be as follows:
```
mmdetection3d
├── mmdet3d
......@@ -70,8 +69,8 @@ scannet
```
- `seg_info`: info files generated to support the semantic segmentation task.
  - `train_label_weight.npy`: the weight of each semantic class. Since the numbers of points belonging to different classes differ greatly in ScanNet, a common practice is to re-weight the classes when computing the loss (label re-weighting) to obtain better segmentation performance; a minimal sketch of one such weighting scheme follows this list.
  - `train_resampled_scene_idxs.npy`: resampled indices for each scene (room). During training, each scene is resampled a different number of times according to how many points it contains, so as to keep the training data balanced.
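The exact weighting scheme is decided by the converter script; as one common choice, the hedged sketch below computes inverse-log-frequency weights from per-class point counts (the counts are made up for illustration).
```python
import numpy as np


def label_weights(num_points_per_class):
    """Inverse-log-frequency class weights: rarer classes get larger weights."""
    counts = np.asarray(num_points_per_class, dtype=np.float64)
    freq = counts / counts.sum()
    return 1.0 / np.log(1.2 + freq)


print(label_weights([5_000_000, 800_000, 20_000]))   # made-up per-class point counts
```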
## Training pipeline
......@@ -111,7 +110,7 @@ train_pipeline = [
]
```
- `PointSegClassMapping`: during training, only the indices of the used classes are mapped to class labels in the range [0, 20). The remaining class indices are converted to the ignore label specified by `ignore_index`, which is `20` in this example.
- `IndoorPatchPointSample`: crop a patch containing a fixed number of points from the input point cloud. `block_size` specifies the side length of the cropped block, which is usually set to `1.5` on ScanNet.
- `NormalizePointsColor`: normalize the colors of the input points by dividing the RGB values by `255`.
......
......@@ -239,25 +239,24 @@ sunrgbd
- `points/0xxxxx.bin`: the downsampled point cloud data.
- `sunrgbd_infos_train.pkl`: data information (annotations and metadata) of the training set, where the detailed information of each scene is as follows:
  - `info['point_cloud']`: `{'num_features': 6, 'lidar_idx': sample_idx}`, where `sample_idx` is the index of the scene.
  - `info['pts_path']`: path of `points/0xxxxx.bin`.
  - `info['image']`: image path and metadata:
    - `image['image_idx']`: the index of the image.
    - `image['image_shape']`: the shape of the image tensor (i.e. its size).
    - `image['image_path']`: the path of the image.
  - `info['annos']`: annotations of each scene:
    - `annotations['gt_num']`: the number of ground truth objects.
    - `annotations['name']`: the semantic class names of all ground truth objects, e.g. `chair`.
    - `annotations['location']`: the gravity centers of the 3D bounding boxes in the depth coordinate system, with shape [K, 3], where K is the number of ground truth objects.
    - `annotations['dimensions']`: the sizes of the 3D bounding boxes in the depth coordinate system, with shape [K, 3].
    - `annotations['rotation_y']`: the yaw angles of the 3D bounding boxes in the depth coordinate system, with shape [K, ].
    - `annotations['gt_boxes_upright_depth']`: the 3D bounding boxes `(x, y, z, x_size, y_size, z_size, yaw)` in the depth coordinate system, with shape [K, 7].
    - `annotations['bbox']`: the 2D bounding boxes `(x, y, x_size, y_size)`, with shape [K, 4] (see the sketch after this list for a conversion to corner format).
    - `annotations['index']`: the indices of all ground truth objects, in the range [0, K).
    - `annotations['class']`: the class labels of all ground truth objects, in the range [0, 10), with shape [K, ].
- `sunrgbd_infos_val.pkl`: data information of the validation set, in exactly the same format as `sunrgbd_infos_train.pkl`.
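For instance, the 2D `annotations['bbox']` entries can be converted from `(x, y, x_size, y_size)` to corner format `(x1, y1, x2, y2)` with a small helper like this hedged sketch:
```python
import numpy as np


def bbox_xywh_to_xyxy(bboxes):
    """Convert Kx4 boxes from (x, y, x_size, y_size) to (x1, y1, x2, y2)."""
    b = np.asarray(bboxes, dtype=np.float64)
    return np.hstack([b[:, :2], b[:, :2] + b[:, 2:4]])


print(bbox_xywh_to_xyxy([[10.0, 20.0, 30.0, 40.0]]))   # -> [[10. 20. 40. 60.]]
```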
## Training pipeline
A typical pipeline for point-cloud-only 3D object detection on SUN RGB-D is as follows:
......@@ -288,8 +287,9 @@ train_pipeline = [
```
Data augmentation on point clouds:
- `RandomFlip3D`: randomly flip the input point cloud left-right or front-back.
- `GlobalRotScaleTrans`: rotate the input point cloud (for SUN RGB-D the angle usually falls in [-30, 30] degrees), then scale it (for SUN RGB-D the ratio usually falls in [0.85, 1.15]), and finally translate it (for SUN RGB-D the translation is usually 0, i.e. no translation).
- `PointSample`: downsample the input point cloud.

A typical pipeline for multi-modality (point cloud and image) 3D object detection on SUN RGB-D is as follows:
......@@ -331,6 +331,7 @@ train_pipeline = [
```
Data augmentation/normalization on images (an illustrative config sketch follows this list):
- `Resize`: resize the input image; `keep_ratio=True` means that the aspect ratio is kept unchanged.
- `Normalize`: normalize the RGB channels of the input image.
- `RandomFlip`: randomly flip the image.
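For reference, here is a hedged sketch of what these image-side transforms typically look like as mmdet-style config entries; the scale, normalization values and flip ratio below are placeholders, not the values from the actual SUN RGB-D config.
```python
# Placeholder normalization values and scales, for illustration only.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

img_transforms = [
    dict(type='Resize', img_scale=(1333, 600), keep_ratio=True),  # keep the aspect ratio
    dict(type='Normalize', **img_norm_cfg),                       # normalize RGB channels
    dict(type='RandomFlip', flip_ratio=0.5),                      # random horizontal flip
]
```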
......
......@@ -103,36 +103,36 @@ mmdetection3d
To evaluate the detection performance on the Waymo dataset, please follow the [instructions here](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/) to build the binary `compute_detection_metrics_main`, which computes the evaluation metrics, and put it under `mmdet3d/core/evaluation/waymo_utils/`. Basically, you can install `bazel` and then build the binary with the commands below:
```shell
# download the code and enter the base directory
git clone https://github.com/waymo-research/waymo-open-dataset.git waymo-od
cd waymo-od
git checkout remotes/origin/master
# use the Bazel build system
sudo apt-get install --assume-yes pkg-config zip g++ zlib1g-dev unzip python3 python3-pip
BAZEL_VERSION=3.1.0
wget https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
sudo bash bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
sudo apt install build-essential
# configure .bazelrc
./configure.sh
# delete previous bazel outputs and reset internal caches
bazel clean
bazel build waymo_open_dataset/metrics/tools/compute_detection_metrics_main
cp bazel-bin/waymo_open_dataset/metrics/tools/compute_detection_metrics_main ../mmdetection3d/mmdet3d/core/evaluation/waymo_utils/
```
Then you can evaluate your models on Waymo. The following example evaluates a PointPillars model on Waymo with the Waymo metrics using 8 GPUs:
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car.py \
checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth --out results/waymo-car/results_eval.pkl \
--eval waymo --eval-options 'pklfile_prefix=results/waymo-car/kitti_results' \
'submission_prefix=results/waymo-car/kitti_results'
```
If you need to generate bin files, `pklfile_prefix` should be given in `--eval-options`. For the metrics, `waymo` is the recommended official evaluation prototype. Currently, the `kitti` evaluation option is adapted from KITTI, and the results under each difficulty are not exactly the same as those defined on the KITTI dataset: most objects are currently labeled with difficulty 0 (this will be fixed in the future). The instability of the `kitti` option comes from the heavy computation, the lack of occlusion and truncation in the converted data, the different definitions of difficulty, and the different ways of computing Average Precision.
......@@ -148,28 +148,28 @@ mmdetection3d
The following is an example of testing PointPillars on Waymo with 8 GPUs, generating the bin files and submitting the results to the official leaderboard:
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car.py \
checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth --out results/waymo-car/results_eval.pkl \
--format-only --eval-options 'pklfile_prefix=results/waymo-car/kitti_results' \
'submission_prefix=results/waymo-car/kitti_results'
```
After generating the bin file, you can simply build the binary `create_submission` and follow the [instructions](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/) to create a submission file. Here are some examples:
```shell
cd ../waymo-od/
bazel build waymo_open_dataset/metrics/tools/create_submission
cp bazel-bin/waymo_open_dataset/metrics/tools/create_submission ../mmdetection3d/mmdet3d/core/evaluation/waymo_utils/
vim waymo_open_dataset/metrics/tools/submission.txtpb # set the metadata information
cp waymo_open_dataset/metrics/tools/submission.txtpb ../mmdetection3d/mmdet3d/core/evaluation/waymo_utils/

cd ../mmdetection3d
# suppose the result bin is in `results/waymo-car/submission`
mmdet3d/core/evaluation/waymo_utils/create_submission --input_filenames='results/waymo-car/kitti_results_test.bin' --output_filename='results/waymo-car/submission/model' --submission_filename='mmdet3d/core/evaluation/waymo_utils/submission.txtpb'

tar cvf results/waymo-car/submission/my_model.tar results/waymo-car/submission/my_model/
gzip results/waymo-car/submission/my_model.tar
```
If you want to evaluate your results on the validation set with the official evaluation server, you can generate the submission files in the same way; just make sure you change the field values in `submission.txtpb` before running the commands above.