Unverified commit bb204696 authored by Wenwei Zhang, committed by GitHub

Release v1.0.0rc3

parents 14c5ded4 dea954e5
......@@ -2,7 +2,7 @@
We follow the procedure in [pointnet](https://github.com/charlesq34/pointnet).
1. Download S3DIS data by filling in this [Google form](https://docs.google.com/forms/d/e/1FAIpQLScDimvNMCGhy_rmBA2gHfDu3naktRm6A8BPwAWWDv-Uhm6Shw/viewform?c=0&w=1). Download the `Stanford3dDataset_v1.2_Aligned_Version.zip` file and unzip it. Link or move the folder to this directory.
2. In this directory, extract point clouds and annotations by running `python collect_indoor3d_data.py`.
......
......@@ -32,6 +32,7 @@ python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [-
For now, CPU testing is only supported for SMOKE.
Optional arguments:
- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset. Typically we use the official metrics by default for evaluation on different datasets, so it can simply be set to `mAP` as a placeholder for detection tasks, which applies to nuScenes, Lyft, ScanNet and SUNRGBD. For KITTI, if we only want to evaluate the 2D detection performance, we can simply set the metric to `img_bbox` (unstable, stay tuned). For Waymo, we provide both KITTI-style evaluation (unstable) and the Waymo-style official protocol, corresponding to the metrics `kitti` and `waymo` respectively. We recommend using the default official metric for stable performance and fair comparison with other methods. Similarly, the metric can be set to `mIoU` for segmentation tasks, which applies to S3DIS and ScanNet.
- `--show`: If specified, detection results will be plotted in silent mode. It is only applicable to single GPU testing and used for debugging and visualization. This should be used together with `--show-dir`.
......@@ -182,6 +183,7 @@ Optional arguments are:
- `--options 'Key=value'`: Override some settings in the used config.
Difference between `resume-from` and `load-from`:
- `resume-from` loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming the training process that is interrupted accidentally.
- `load-from` only loads the model weights and the training epoch starts from 0. It is usually used for finetuning.
......@@ -217,7 +219,6 @@ NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR ./tools/dist_tra
Usually it is slow if you do not have high-speed networking like InfiniBand.
### Launch multiple jobs on a single machine
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
......
# Benchmarks
Here we benchmark the training and testing speed of models in MMDetection3D,
......@@ -6,34 +5,35 @@ with some other open source 3D detection codebases.
## Settings
- Hardware: 8 NVIDIA Tesla V100 (32G) GPUs, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
- Software: Python 3.7, CUDA 10.1, cuDNN 7.6.5, PyTorch 1.3, numba 0.48.0.
- Model: Since the other codebases implement different models, we compare the corresponding models, including SECOND, PointPillars, Part-A2, and VoteNet, with them separately.
- Metrics: We use the average throughput in iterations over the entire training run and skip the first 50 iterations of each epoch to exclude GPU warm-up time (a small computation sketch is given below).
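As a rough companion to the metric described above, the following is a minimal sketch of how such a throughput number can be computed, assuming per-iteration durations have been logged; the timings and batch size are fabricated placeholders, not measurements from any of the codebases.

```python
# Minimal sketch: average training throughput (samples/s), skipping the
# first 50 iterations of every epoch to exclude GPU warm-up time.
def average_throughput(iter_times, samples_per_iter, warmup_iters=50):
    """`iter_times` maps epoch -> list of per-iteration durations in seconds."""
    kept = []
    for durations in iter_times.values():
        kept.extend(durations[warmup_iters:])  # drop warm-up iterations
    return samples_per_iter * len(kept) / sum(kept) if kept else 0.0

# Fabricated timings: 2 epochs of 200 iterations, 8 GPUs x 2 samples per GPU.
fake_times = {0: [0.12] * 200, 1: [0.11] * 200}
print(f'{average_throughput(fake_times, samples_per_iter=16):.1f} samples/s')
```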
## Main Results
We compare the training speed (samples/s) with other codebases if they implement similar models. The results are shown below; the larger the number, the faster the training. Models that are not supported by other codebases are marked by `×`.
| Methods | MMDetection3D | OpenPCDet | votenet | Det3D |
| :-----------------: | :-----------: | :-------: | :-----: | :---: |
| VoteNet | 358 | × | 77 | × |
| PointPillars-car | 141 | × | × | 140 |
| PointPillars-3class | 107 | 44 | × | × |
| SECOND | 40 | 30 | × | × |
| Part-A2 | 17 | 14 | × | × |
## Details of Comparison
### Modification for Calculating Speed
- __MMDetection3D__: We try to use settings as similar to those of the other codebases as possible, using the [benchmark configs](https://github.com/open-mmlab/MMDetection3D/blob/master/configs/benchmark).
- __Det3D__: For comparison with Det3D, we use the commit [519251e](https://github.com/poodarchu/Det3D/tree/519251e72a5c1fdd58972eabeac67808676b9bb7).
- __OpenPCDet__: For comparison with OpenPCDet, we use the commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2).
For training speed, we add code to record the running time in the file `./tools/train_utils/train_utils.py`. We calculate the speed of each epoch, and report the average speed of all the epochs.
<details>
<summary>
(diff to make it use the same method for benchmarking speed - click to expand)
......@@ -117,19 +117,18 @@ We compare the training speed (samples/s) with other codebases if they implement
### VoteNet
- __MMDetection3D__: With release v0.1.0, run
```bash
./tools/dist_train.sh configs/votenet/votenet_16x8_sunrgbd-3d-10class.py 8 --no-validate
```
- __votenet__: At commit [2f6d6d3](https://github.com/facebookresearch/votenet/tree/2f6d6d36ff98d96901182e935afe48ccee82d566), run
```bash
python train.py --dataset sunrgbd --batch_size 16
```
Then benchmark the test speed by running
```bash
......@@ -199,13 +198,13 @@ We compare the training speed (samples/s) with other codebases if they implement
### PointPillars-car
- __MMDetection3D__: With release v0.1.0, run
```bash
./tools/dist_train.sh configs/benchmark/hv_pointpillars_secfpn_3x8_100e_det3d_kitti-3d-car.py 8 --no-validate
```
- __Det3D__: At commit [519251e](https://github.com/poodarchu/Det3D/tree/519251e72a5c1fdd58972eabeac67808676b9bb7), use `kitti_point_pillars_mghead_syncbn.py` and run
```bash
./tools/scripts/train.sh --launcher=slurm --gpus=8
......@@ -241,13 +240,13 @@ We compare the training speed (samples/s) with other codebases if they implement
### PointPillars-3class
- __MMDetection3D__: With release v0.1.0, run
```bash
./tools/dist_train.sh configs/benchmark/hv_pointpillars_secfpn_4x8_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
- __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), run
```bash
cd tools
......@@ -258,13 +257,13 @@ We compare the training speed (samples/s) with other codebases if they implement
For SECOND, we mean the [SECONDv1.5](https://github.com/traveller59/second.pytorch/blob/master/second/configs/all.fhd.config) that was first implemented in [second.Pytorch](https://github.com/traveller59/second.pytorch). Det3D's implementation of SECOND uses its self-implemented Multi-Group Head, so its speed is not directly comparable with that of the other codebases.
- __MMDetection3D__: With release v0.1.0, run
```bash
./tools/dist_train.sh configs/benchmark/hv_second_secfpn_4x8_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
- __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), run
```bash
cd tools
......@@ -273,13 +272,13 @@ For SECOND, we mean the [SECONDv1.5](https://github.com/traveller59/second.pytor
### Part-A2
- __MMDetection3D__: With release v0.1.0, run
```bash
./tools/dist_train.sh configs/benchmark/hv_PartA2_secfpn_4x8_cyclic_80e_pcdet_kitti-3d-3class.py 8 --no-validate
```
- __OpenPCDet__: At commit [b32fbddb](https://github.com/open-mmlab/OpenPCDet/tree/b32fbddbe06183507bad433ed99b407cbc2175c2), train the model by running
```bash
cd tools
......
## Changelog
### v1.0.0rc3 (8/6/2022)
#### Highlights
- Support [SA-SSD](https://openaccess.thecvf.com/content_CVPR_2020/papers/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.pdf)
#### New Features
- Support [SA-SSD](https://openaccess.thecvf.com/content_CVPR_2020/papers/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.pdf) (#1337)
#### Improvements
- Add Chinese documentation for vision-only 3D detection (#1438)
- Update CenterPoint pretrained models that are compatible with refactored coordinate systems (#1450)
- Configure myst-parser to parse anchor tag in the documentation (#1488)
- Replace markdownlint with mdformat for avoiding installing ruby (#1489)
- Add missing `gt_names` when getting annotation info in Custom3DDataset (#1519)
- Support S3DIS full ceph training (#1542)
- Rewrite the installation and FAQ documentation (#1545)
#### Bug Fixes
- Fix the incorrect registry name when building RoI extractors (#1460)
- Fix the potential problems caused by the registry scope update when composing pipelines (#1466) and using CocoDataset (#1536)
- Fix the missing selection with `order` in the [box3d_nms](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/post_processing/box3d_nms.py) introduced by [#1403](https://github.com/open-mmlab/mmdetection3d/pull/1403) (#1479)
- Update the [PointPillars config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car.py) to make it consistent with the log (#1486)
- Fix heading anchor in documentation (#1490)
- Fix the compatibility of mmcv in the dockerfile (#1508)
- Make overwrite_spconv packaged when building whl (#1516)
- Fix the requirement of mmcv and mmdet (#1537)
- Update configs of PartA2 and support its compatibility with spconv 2.0 (#1538)
#### Contributors
A total of 13 developers contributed to this release.
@Xiangxu-0103, @ZCMax, @jshilong, @filaPro, @atinfinity, @Tai-Wang, @wenbo-yu, @yi-chen-isuzu, @ZwwWayne, @wchen61, @VVsssssk, @AlexPasqua, @lianqing11
### v1.0.0rc2 (1/5/2022)
#### Highlights
......@@ -51,7 +89,7 @@ A total of 11 developers contributed to this release.
- We update some of the model checkpoints after the refactor of coordinate systems. Please stay tuned for the release of the remaining model checkpoints.
| | Fully Updated | Partially Updated | In Progress | No Influence |
| ------------- | :-----------: | :---------------: | :---------: | :-----------: |
| SECOND | | ✓ | | |
| PointPillars | | ✓ | | |
| FreeAnchor | ✓ | | | |
......@@ -60,19 +98,18 @@ A total of 11 developers contributed to this release.
| 3DSSD | | ✓ | | |
| Part-A2 | ✓ | | | |
| MVXNet | ✓ | | | |
| CenterPoint | | | | |
| SSN | ✓ | | | |
| ImVoteNet | ✓ | | | |
| FCOS3D | | | | |
| PointNet++ | | | | |
| Group-Free-3D | | | | |
| ImVoxelNet | ✓ | | | |
| PAConv | | | | ✓ |
| DGCNN | | | | ✓ |
| SMOKE | | | | ✓ |
| PGD | | | | ✓ |
| MonoFlex | | | | ✓ |
#### Highlights
......@@ -414,7 +451,6 @@ A total of 12 developers contributed to this release.
@yinchimaoliang, @gopi231091, @filaPro, @ZwwWayne, @ZCMax, @hjin2902, @wHao-Wu, @Wuziyi616, @xiliu8006, @THU17cyz, @DCNSW, @Tai-Wang
### v0.15.0 (1/7/2021)
#### Compatibility
......@@ -449,7 +485,6 @@ In order to fix the problem that the priority of EvalHook is too low, all hook p
- Add documentation for vision-only 3D detection (#669)
- Refine docs for Quick Run and Useful Tools (#686)
#### Bug Fixes
- Fix the bug of [BackgroundPointsFilter](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/pipelines/transforms_3d.py) using the bottom center of ground truth (#609)
......@@ -458,10 +493,10 @@ In order to fix the problem that the priority of EvalHook is too low, all hook p
- Fix test commands in docs and make some refinements (#635)
- Fix wrong config paths in unit tests (#641)
### v0.14.0 (1/6/2021)
#### Highlights
- Support the point cloud segmentation method [PointNet++](https://arxiv.org/abs/1706.02413)
#### New Features
......@@ -482,16 +517,17 @@ In order to fix the problem that the priority of EvalHook is too low, all hook p
- Remove a useless parameter `label_weight` from segmentation datasets including `Custom3DSegDataset`, `ScanNetSegDataset` and `S3DISSegDataset` (#607)
#### Bug Fixes
- Fix a corrupted lidar data file in Lyft dataset in [data_preparation](https://github.com/open-mmlab/mmdetection3d/tree/master/docs/data_preparation.md) (#546)
- Fix evaluation bugs in nuScenes and Lyft dataset (#549)
- Fix converting points between coordinates with specific transformation matrix in the [coord_3d_mode.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/coord_3d_mode.py) (#556)
- Support PointPillars models on Lyft dataset (#578)
- Fix the bug of demo with pre-trained VoteNet model on ScanNet (#600)
### v0.13.0 (1/5/2021)
#### Highlights
- Support a monocular 3D detection method [FCOS3D](https://arxiv.org/abs/2104.10956)
- Support ScanNet and S3DIS semantic segmentation dataset
- Enhancement of visualization tools for dataset browsing and demos, including support of visualization for multi-modality data and point cloud segmentation.
......@@ -746,7 +782,7 @@ In order to fix the problem that the priority of EvalHook is too low, all hook p
- Support Batch Inference (#95, #103, #116): MMDetection3D v0.6.0 migrates to support batch inference based on MMDetection >= v2.4.0. This change influences all the test APIs in MMDetection3D and downstream codebases.
- Start to use collect environment function from MMCV (#113): MMDetection3D v0.6.0 migrates to use `collect_env` function in MMCV.
`get_compiler_version` and `get_compiling_cuda_version` compiled in `mmdet3d.ops.utils` are removed. Please import these two functions from `mmcv.ops`.
#### New Features
......
......@@ -10,46 +10,45 @@ In this version we did a major code refactoring that boosted the performance of
Meanwhile, we also fixed the imprecise timestamp saving issue in the Waymo dataset conversion. This change introduces the following backward compatibility breaks:
- The point cloud .bin files of waymo dataset need to be regenerated.
In the .bin files each point occupies 6 `float32` and the meaning of the last `float32` has changed from **imprecise timestamps** to **range frame offset**.
The **range frame offset** for each point is calculated as `ri * h * w + row * w + col` if the point is from the **TOP** lidar, or `-1` otherwise.
`h` and `w` denote the height and width of the TOP lidar's range frame.
`ri`, `row` and `col` denote the return index, the row and the column of the range frame where each point is located (a decoding sketch is given after this list).
The following tables show the difference across the change:
Before
| Element offset (float32) | 0 | 1 | 2 | 3 | 4 | 5 |
| ------------------------ | :-: | :-: | :-: | :-------: | :--------: | :---------------------: |
| Bytes offset | 0 | 4 | 8 | 12 | 16 | 20 |
| Meaning | x | y | z | intensity | elongation | **imprecise timestamp** |
After
| Element offset (float32) | 0 | 1 | 2 | 3 | 4 | 5 |
| ------------------------ | :-: | :-: | :-: | :-------: | :--------: | :--------------------: |
| Bytes offset | 0 | 4 | 8 | 12 | 16 | 20 |
| Meaning | x | y | z | intensity | elongation | **range frame offset** |
- The objects' point cloud .bin files in the GT-database of waymo dataset need to be regenerated because we also dumped the range frame offset for each point into it.
The following tables show the difference across the change:
Before
| Element offset (float32) | 0 | 1 | 2 | 3 | 4 |
| ------------------------ | :-: | :-: | :-: | :-------: | :--------: |
| Bytes offset | 0 | 4 | 8 | 12 | 16 |
| Meaning | x | y | z | intensity | elongation |
After
| Element offset (float32) | 0 | 1 | 2 | 3 | 4 | 5 |
| ------------------------ | :-: | :-: | :-: | :-------: | :--------: | :--------------------: |
| Bytes offset | 0 | 4 | 8 | 12 | 16 | 20 |
| Meaning | x | y | z | intensity | elongation | **range frame offset** |
- Any configuration that uses waymo dataset with GT Augmentation should change the `db_sampler.points_loader.load_dim` from `5` to `6`.
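To make the layout above concrete, here is a small sketch that reads a regenerated Waymo `.bin` file as 6 `float32` values per point and decodes the range frame offset back into `(ri, row, col)`; the file path and the range-frame size `h, w` are placeholder assumptions, not values taken from the converter.

```python
import numpy as np

# Each point stores x, y, z, intensity, elongation, range frame offset.
points = np.fromfile('data/waymo/kitti_format/training/velodyne/0000000.bin',
                     dtype=np.float32).reshape(-1, 6)

h, w = 64, 2650  # assumed height/width of the TOP lidar range frame
offset = points[:, 5].astype(np.int64)
top = offset >= 0  # -1 marks points that are not from the TOP lidar

# Invert offset = ri * h * w + row * w + col for TOP-lidar points.
ri = offset[top] // (h * w)
row = (offset[top] % (h * w)) // w
col = offset[top] % w
print(f'{top.sum()} TOP-lidar points, return indices: {np.unique(ri)}')
```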
## v1.0.0rc0
### Coordinate system refactoring
......@@ -63,6 +62,7 @@ In this version, we did a major code refactoring which improved the consistency
#### ***NOTICE!!***
Since definitions of box representation have changed, the annotation data of most datasets require updating:
- SUN RGB-D: Yaw angles in the annotation should be reversed.
- KITTI: For LiDAR boxes in GT databases, (x_size, y_size, z_size, yaw) out of (x, y, z, x_size, y_size, z_size) should be converted from the old LiDAR coordinate system to the new one. The training/validation data annotations should be left unchanged since they are under the Camera coordinate system, which is unmodified after the refactoring.
- Waymo: Same as KITTI.
......@@ -88,7 +88,6 @@ Functions only involving points are generally unaffected except if they rely on
- Data augmentation utils in [data_augment_utils.py](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/datasets/pipelines/data_augment_utils.py) now follow the rules of a right-handed system.
- We do not need the yaw hacking in KITTI anymore after refining [`get_direction_target`](https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc0/mmdet3d/models/dense_heads/train_mixins.py). Interested users may refer to PR [#677](https://github.com/open-mmlab/mmdetection3d/pull/677) .
## 0.16.0
### Returned values of `QueryAndGroup` operation
......@@ -168,4 +167,4 @@ Please refer to the SUNRGBD [README.md](https://github.com/open-mmlab/mmdetectio
### VoteNet and H3DNet model structure update
In MMDetection 0.6.0, we updated the model structures of VoteNet and H3DNet, therefore model checkpoints generated by MMDetection \< 0.6.0 should be first converted to a format compatible with the latest structures via [convert_votenet_checkpoints.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/model_converters/convert_votenet_checkpoints.py) and [convert_h3dnet_checkpoints.py](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/model_converters/convert_h3dnet_checkpoints.py) . For more details, please refer to the VoteNet [README.md](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/votenet/README.md/) and H3DNet [README.md](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/h3dnet/README.md/).
......@@ -136,10 +136,11 @@ latex_documents = [
StandaloneHTMLBuilder.supported_image_types = [
'image/svg+xml', 'image/gif', 'image/png', 'image/jpeg'
]
# -- Extension configuration -------------------------------------------------
# Ignore >>> when copying code
copybutton_prompt_text = r'>>> |\.\.\. '
copybutton_prompt_is_regexp = True
# Enable ::: for myst
myst_enable_extensions = ['colon_fence']
myst_heading_anchors = 3
language = 'en'
def builder_inited_handler(app):
......
......@@ -88,16 +88,16 @@ kitti
- `kitti_gt_database/xxxxx.bin`: point cloud data included in each 3D bounding box of the training dataset
- `kitti_infos_train.pkl`: training dataset infos, each frame info contains following details:
- info\['point_cloud'\]: {'num_features': 4, 'velodyne_path': velodyne_path}.
- info\['annos'\]: {
- location: x,y,z are bottom center in referenced camera coordinate system (in meters), an Nx3 array
- dimensions: height, width, length (in meters), an Nx3 array
- rotation_y: rotation ry around Y-axis in camera coordinates \[-pi..pi\], an N array
- name: ground truth name array, an N array
- difficulty: kitti difficulty, Easy, Moderate, Hard
- group_ids: used for multi-part object
}
- (optional) info\['calib'\]: {
- P0: camera0 projection matrix after rectification, a 3x4 array
- P1: camera1 projection matrix after rectification, a 3x4 array
- P2: camera2 projection matrix after rectification, a 3x4 array
......@@ -106,9 +106,9 @@ kitti
- Tr_velo_to_cam: transformation from Velodyne coordinate to camera coordinate, a 4x4 array
- Tr_imu_to_velo: transformation from IMU coordinate to Velodyne coordinate, a 4x4 array
}
- (optional) info\['image'\]: {'image_idx': idx, 'image_path': image_path, 'image_shape': image_shape}.
**Note:** the info\['annos'\] is in the referenced camera coordinate system. For more details, please refer to [this paper](http://www.cvlibs.net/publications/Geiger2013IJRR.pdf).
The core functions used to generate `kitti_infos_xxx.pkl` and `kitti_infos_xxx_mono3d.coco.json` are [get_kitti_image_info](https://github.com/open-mmlab/mmdetection3d/blob/7873c8f62b99314f35079f369d1dab8d63f8a3ce/tools/data_converter/kitti_data_utils.py#L140) and [get_2d_boxes](https://github.com/open-mmlab/mmdetection3d/blob/7873c8f62b99314f35079f369d1dab8d63f8a3ce/tools/data_converter/kitti_converter.py#L378). Please refer to [kitti_converter.py](https://github.com/open-mmlab/mmdetection3d/blob/7873c8f62b99314f35079f369d1dab8d63f8a3ce/tools/data_converter/kitti_converter.py) for more details.
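To make the structure above concrete, the sketch below inspects one frame from `kitti_infos_train.pkl`; the file path is an assumption and the optional blocks are read only if present.

```python
import pickle

# Assumed default output path of the KITTI converter.
with open('data/kitti/kitti_infos_train.pkl', 'rb') as f:
    infos = pickle.load(f)

info = infos[0]
print(info['point_cloud']['velodyne_path'])  # lidar file of this frame

annos = info['annos']
print(annos['name'])        # (N,) ground-truth class names
print(annos['location'])    # (N, 3) bottom centers in the referenced camera frame
print(annos['rotation_y'])  # (N,) yaw around the camera Y-axis

if 'calib' in info:             # optional calibration block
    print(info['calib']['P2'])  # 3x4 projection matrix of camera 2
if 'image' in info:             # optional image metadata
    print(info['image']['image_path'], info['image']['image_shape'])
```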
......
......@@ -90,19 +90,19 @@ Next, we will elaborate on the difference compared to nuScenes in terms of the d
- without `lyft_database/xxxxx.bin`: This folder and `.bin` files are not extracted on the Lyft dataset due to the negligible effect of ground-truth sampling in the experiments.
- `lyft_infos_train.pkl`: training dataset infos, each frame info has two keys: `metadata` and `infos`.
`metadata` contains the basic information for the dataset itself, such as `{'version': 'v1.01-train'}`, while `infos` contains the detailed information the same as nuScenes except for the following details:
- info\['sweeps'\]: Sweeps information.
- info\['sweeps'\]\[i\]\['type'\]: The sweep data type, e.g., `'lidar'`.
Lyft has different LiDAR settings for some samples, but we always take only the points collected by the top LiDAR for the consistency of data distribution.
- info\['gt_names'\]: There are 9 categories on the Lyft dataset, and the imbalance of annotations for different categories is even more significant than nuScenes.
- without info\['gt_velocity'\]: There is no velocity measurement on Lyft.
- info\['num_lidar_pts'\]: Set to -1 by default.
- info\['num_radar_pts'\]: Set to 0 by default.
- without info\['valid_flag'\]: This flag is not recorded because `num_lidar_pts` and `num_radar_pts` are invalid.
- `nuscenes_infos_train_mono3d.coco.json`: training dataset coco-style info. This file only contains 2D information, without the information required by 3D detection, such as camera intrinsics.
- info\['images'\]: A list containing all the image info.
- only containing `'file_name'`, `'id'`, `'width'`, `'height'`.
- info\['annotations'\]: A list containing all the annotation info.
- only containing `'file_name'`, `'image_id'`, `'area'`, `'category_name'`, `'category_id'`, `'bbox'`, `'is_crowd'`, `'segmentation'`, `'id'`, where `'is_crowd'`, `'segmentation'` are set to `0` and `[]` by default.
There is no attribute annotation on Lyft.
......
......@@ -62,63 +62,63 @@ Next, we will elaborate on the details recorded in these info files.
- `nuscenes_database/xxxxx.bin`: point cloud data included in each 3D bounding box of the training dataset
- `nuscenes_infos_train.pkl`: training dataset info, each frame info has two keys: `metadata` and `infos`.
`metadata` contains the basic information for the dataset itself, such as `{'version': 'v1.0-trainval'}`, while `infos` contains the detailed information as follows:
- info\['lidar_path'\]: The file path of the lidar point cloud data.
- info\['token'\]: Sample data token.
- info\['sweeps'\]: Sweeps information (`sweeps` in the nuScenes refer to the intermediate frames without annotations, while `samples` refer to those key frames with annotations).
- info\['sweeps'\]\[i\]\['data_path'\]: The data path of i-th sweep.
- info\['sweeps'\]\[i\]\['type'\]: The sweep data type, e.g., `'lidar'`.
- info\['sweeps'\]\[i\]\['sample_data_token'\]: The sweep sample data token.
- info\['sweeps'\]\[i\]\['sensor2ego_translation'\]: The translation from the current sensor (for collecting the sweep data) to ego vehicle. (1x3 list)
- info\['sweeps'\]\[i\]\['sensor2ego_rotation'\]: The rotation from the current sensor (for collecting the sweep data) to ego vehicle. (1x4 list in the quaternion format)
- info\['sweeps'\]\[i\]\['ego2global_translation'\]: The translation from the ego vehicle to global coordinates. (1x3 list)
- info\['sweeps'\]\[i\]\['ego2global_rotation'\]: The rotation from the ego vehicle to global coordinates. (1x4 list in the quaternion format)
- info\['sweeps'\]\[i\]\['timestamp'\]: Timestamp of the sweep data.
- info\['sweeps'\]\[i\]\['sensor2lidar_translation'\]: The translation from the current sensor (for collecting the sweep data) to lidar. (1x3 list)
- info\['sweeps'\]\[i\]\['sensor2lidar_rotation'\]: The rotation from the current sensor (for collecting the sweep data) to lidar. (1x4 list in the quaternion format)
- info\['cams'\]: Cameras calibration information. It contains six keys corresponding to each camera: `'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_BACK'`, `'CAM_BACK_LEFT'`, `'CAM_BACK_RIGHT'`.
Each dictionary contains detailed information following the above way for each sweep data (has the same keys for each information as above). In addition, each camera has a key `'cam_intrinsic'` for recording the intrinsic parameters when projecting 3D points to each image plane.
- info\['lidar2ego_translation'\]: The translation from lidar to ego vehicle. (1x3 list)
- info\['lidar2ego_rotation'\]: The rotation from lidar to ego vehicle. (1x4 list in the quaternion format)
- info\['ego2global_translation'\]: The translation from the ego vehicle to global coordinates. (1x3 list)
- info\['ego2global_rotation'\]: The rotation from the ego vehicle to global coordinates. (1x4 list in the quaternion format)
- info\['timestamp'\]: Timestamp of the sample data.
- info\['gt_boxes'\]: 7-DoF annotations of 3D bounding boxes, an Nx7 array.
- info\['gt_names'\]: Categories of 3D bounding boxes, a 1xN array.
- info\['gt_velocity'\]: Velocities of 3D bounding boxes (no vertical measurements due to inaccuracy), an Nx2 array.
- info\['num_lidar_pts'\]: Number of lidar points included in each 3D bounding box.
- info\['num_radar_pts'\]: Number of radar points included in each 3D bounding box.
- info\['valid_flag'\]: Whether each bounding box is valid. In general, we only take the 3D boxes that include at least one lidar or radar point as valid boxes.
- `nuscenes_infos_train_mono3d.coco.json`: training dataset coco-style info. This file organizes image-based data into three categories (keys): `'categories'`, `'images'`, `'annotations'`.
- info\['categories'\]: A list containing all the category names. Each element follows the dictionary format and consists of two keys: `'id'` and `'name'`.
- info\['images'\]: A list containing all the image info.
- info\['images'\]\[i\]\['file_name'\]: The file name of the i-th image.
- info\['images'\]\[i\]\['id'\]: Sample data token of the i-th image.
- info\['images'\]\[i\]\['token'\]: Sample token corresponding to this frame.
- info\['images'\]\[i\]\['cam2ego_rotation'\]: The rotation from the camera to ego vehicle. (1x4 list in the quaternion format)
- info\['images'\]\[i\]\['cam2ego_translation'\]: The translation from the camera to ego vehicle. (1x3 list)
- info\['images'\]\[i\]\['ego2global_rotation'\]: The rotation from the ego vehicle to global coordinates. (1x4 list in the quaternion format)
- info\['images'\]\[i\]\['ego2global_translation'\]: The translation from the ego vehicle to global coordinates. (1x3 list)
- info\['images'\]\[i\]\['cam_intrinsic'\]: Camera intrinsic matrix. (3x3 list)
- info\['images'\]\[i\]\['width'\]: Image width, 1600 by default in nuScenes.
- info\['images'\]\[i\]\['height'\]: Image height, 900 by default in nuScenes.
- info\['annotations'\]: A list containing all the annotation info.
- info\['annotations'\]\[i\]\['file_name'\]: The file name of the corresponding image.
- info\['annotations'\]\[i\]\['image_id'\]: The image id (token) of the corresponding image.
- info\['annotations'\]\[i\]\['area'\]: Area of the 2D bounding box.
- info\['annotations'\]\[i\]\['category_name'\]: Category name.
- info\['annotations'\]\[i\]\['category_id'\]: Category id.
- info\['annotations'\]\[i\]\['bbox'\]: 2D bounding box annotation (exterior rectangle of the projected 3D box), 1x4 list following \[x1, y1, x2-x1, y2-y1\].
x1/y1 are minimum coordinates along horizontal/vertical direction of the image.
- info\['annotations'\]\[i\]\['iscrowd'\]: Whether the region is crowded. Defaults to 0.
- info\['annotations'\]\[i\]\['bbox_cam3d'\]: 3D bounding box (gravity) center location (3), size (3), (global) yaw angle (1), 1x7 list.
- info\['annotations'\]\[i\]\['velo_cam3d'\]: Velocities of 3D bounding boxes (no vertical measurements due to inaccuracy), an Nx2 array.
- info\['annotations'\]\[i\]\['center2d'\]: Projected 3D-center containing 2.5D information: projected center location on the image (2) and depth (1), 1x3 list.
- info\['annotations'\]\[i\]\['attribute_name'\]: Attribute name.
- info\['annotations'\]\[i\]\['attribute_id'\]: Attribute id.
We maintain a default attribute collection and mapping for attribute classification.
Please refer to [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/nuscenes_mono_dataset.py#L53) for more details.
- info\['annotations'\]\[i\]\['id'\]: Annotation id. Defaults to `i`.
Here we only explain the data recorded in the training info files. The same applies to the validation and testing sets.
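As a concrete companion to the field list above, the sketch below opens `nuscenes_infos_train.pkl` and reads a few of the documented keys; the path is an assumption and only fields described above are accessed.

```python
import pickle

# Assumed default output path of the nuScenes converter.
with open('data/nuscenes/nuscenes_infos_train.pkl', 'rb') as f:
    data = pickle.load(f)

print(data['metadata'])  # e.g. {'version': 'v1.0-trainval'}

info = data['infos'][0]
print(info['lidar_path'], info['timestamp'])
print(len(info['gt_boxes']), 'annotated boxes')    # Nx7 boxes
print(info['cams']['CAM_FRONT']['cam_intrinsic'])  # 3x3 camera intrinsics

for sweep in info['sweeps'][:1]:  # first intermediate (unannotated) frame, if any
    # Transform from the sweep sensor to the lidar, as documented above.
    print(sweep['sensor2lidar_translation'], sweep['sensor2lidar_rotation'])
```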
......@@ -194,10 +194,11 @@ train_pipeline = [
```
It follows the general pipeline of 2D detection while differing in some details:
- It uses monocular pipelines to load images, which includes additional required information like camera intrinsics.
- It needs to load 3D annotations.
- Some data augmentation techniques need to be adjusted, such as `RandomFlip3D`.
Currently we do not support more augmentation methods, because how to transfer and apply other techniques is still being explored.
## Evaluation
......
......@@ -36,7 +36,9 @@ mmdetection3d
Under the folder `Stanford3dDataset_v1.2_Aligned_Version`, the rooms are split into 6 areas. We use 5 areas for training and 1 for evaluation (typically `Area_5`). Under the directory of each area, there are folders in which raw point cloud data and relevant annotations are saved. For instance, under the folder `Area_1/office_1` the files are as below:
- `office_1.txt`: A txt file storing coordinates and colors of each point in the raw point cloud data.
- `Annotations/`: This folder contains txt files for different object instances. Each txt file represents one instance, e.g.
- `chair_1.txt`: A txt file storing raw point cloud data of one chair in this room.
If we concatenate all the txt files under `Annotations/`, we will get the same point cloud as denoted by `office_1.txt`.
......@@ -138,13 +140,13 @@ s3dis
```
- `points/xxxxx.bin`: The exported point cloud data.
- `instance_mask/xxxxx.bin`: The instance label for each point, value range: \[0, ${NUM_INSTANCES}\], 0: unannotated.
- `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: \[0, 12\].
- `s3dis_infos_Area_1.pkl`: Area 1 data infos, the detailed info of each room is as follows:
- info\['point_cloud'\]: {'num_features': 6, 'lidar_idx': sample_idx}.
- info\['pts_path'\]: The path of `points/xxxxx.bin`.
- info\['pts_instance_mask_path'\]: The path of `instance_mask/xxxxx.bin`.
- info\['pts_semantic_mask_path'\]: The path of `semantic_mask/xxxxx.bin`.
- `seg_info`: The generated infos to support semantic segmentation model training.
- `Area_1_label_weight.npy`: Weighting factor for each semantic class. Since the number of points in different classes varies greatly, it's a common practice to use label re-weighting to get better performance (an illustrative computation is sketched after this list).
- `Area_1_resampled_scene_idxs.npy`: Re-sampling index for each scene. Different rooms will be sampled multiple times according to their number of points to balance training data.
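One common way to obtain such per-class weighting factors is frequency-based re-weighting over the semantic masks of an area. The sketch below only illustrates the idea with random placeholder labels; it is not necessarily the exact formula behind `Area_1_label_weight.npy`.

```python
import numpy as np

num_classes = 13
# Placeholder per-point semantic labels standing in for all rooms of one area.
semantic_masks = [np.random.randint(0, num_classes, size=100000)]

counts = np.zeros(num_classes, dtype=np.float64)
for mask in semantic_masks:
    counts += np.bincount(mask, minlength=num_classes)

freq = counts / counts.sum()
label_weight = 1.0 / np.log(1.2 + freq)  # one popular re-weighting scheme
print(label_weight)  # rarer classes receive larger weights
```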
......@@ -200,7 +202,7 @@ train_pipeline = [
]
```
- `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like \[0, 13) during training. Other class ids will be converted to `ignore_index` which equals to `13`.
- `IndoorPatchPointSample`: Crop a patch containing a fixed number of points from input point cloud. `block_size` indicates the size of the cropped block, typically `1.0` for S3DIS.
- `NormalizePointsColor`: Normalize the RGB color values of input point cloud by dividing `255`.
- Data augmentation:
......
......@@ -135,7 +135,7 @@ By exporting ScanNet RGB data, for each scene we load a set of RGB images with c
python extract_posed_images.py
```
Each of the 1201 train, 312 validation and 100 test scenes contains a single `.sens` file. For instance, for scene `0001_01` we have `data/scannet/scans/scene0001_01/0001_01.sens`. For this scene, all images and poses are extracted to `data/scannet/posed_images/scene0001_01`. Specifically, there will be 300 image files xxxxx.jpg, 300 camera pose files xxxxx.txt and a single `intrinsic.txt` file. Typically, a single scene contains several thousand images. By default, we extract only 300 of them, occupying \<100 GB of space. To extract more images, use the `--max-images-per-scene` parameter.
### Create dataset
......@@ -222,29 +222,28 @@ scannet
```
- `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsampling. Since the ScanNet 3D detection task takes axis-aligned point clouds as input, while the ScanNet 3D semantic segmentation task takes unaligned points, we choose to store the unaligned points and their axis-alignment transform matrix. Note: the points will be axis-aligned in the pre-processing pipeline [`GlobalAlignment`](https://github.com/open-mmlab/mmdetection3d/blob/9f0b01caf6aefed861ef4c3eb197c09362d26b32/mmdet3d/datasets/pipelines/transforms_3d.py#L423) of the 3D detection task (a small alignment sketch follows this list).
- `instance_mask/xxxxx.bin`: The instance label for each point, value range: \[0, NUM_INSTANCES\], 0: unannotated.
- `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: \[1, 40\], i.e. `nyu40id` standard. Note: the `nyu40id` ID will be mapped to train ID in train pipeline `PointSegClassMapping`.
- `posed_images/scenexxxx_xx`: The set of `.jpg` images with `.txt` 4x4 poses and the single `.txt` file with camera intrinsic matrix.
- `scannet_infos_train.pkl`: The train data infos, the detailed info of each scan is as follows:
- info\['point_cloud'\]: {'num_features': 6, 'lidar_idx': sample_idx}.
- info\['pts_path'\]: The path of `points/xxxxx.bin`.
- info\['pts_instance_mask_path'\]: The path of `instance_mask/xxxxx.bin`.
- info\['pts_semantic_mask_path'\]: The path of `semantic_mask/xxxxx.bin`.
- info\['annos'\]: The annotations of each scan.
- annotations\['gt_num'\]: The number of ground truths.
- annotations\['name'\]: The semantic name of all ground truths, e.g. `chair`.
- annotations\['location'\]: The gravity center of the axis-aligned 3D bounding boxes in depth coordinate system. Shape: \[K, 3\], K is the number of ground truths.
- annotations\['dimensions'\]: The dimensions of the axis-aligned 3D bounding boxes in depth coordinate system, i.e. (x_size, y_size, z_size), shape: \[K, 3\].
- annotations\['gt_boxes_upright_depth'\]: The axis-aligned 3D bounding boxes in depth coordinate system, each bounding box is (x, y, z, x_size, y_size, z_size), shape: \[K, 6\].
- annotations\['unaligned_location'\]: The gravity center of the axis-unaligned 3D bounding boxes in depth coordinate system.
- annotations\['unaligned_dimensions'\]: The dimensions of the axis-unaligned 3D bounding boxes in depth coordinate system.
- annotations\['unaligned_gt_boxes_upright_depth'\]: The axis-unaligned 3D bounding boxes in depth coordinate system.
- annotations\['index'\]: The index of all ground truths, i.e. \[0, K).
- annotations\['class'\]: The train class ID of the bounding boxes, value range: \[0, 18), shape: \[K, \].
- `scannet_infos_val.pkl`: The val data infos, which shares the same format as `scannet_infos_train.pkl`.
- `scannet_infos_test.pkl`: The test data infos, which almost shares the same format as `scannet_infos_train.pkl` except for the lack of annotation.
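Referring back to the `points/xxxxx.bin` description above, here is a minimal alignment sketch in the spirit of `GlobalAlignment`. The paths, and the assumption that the 4x4 alignment matrix is stored under `annos['axis_align_matrix']`, may need adjusting to your converter version.

```python
import pickle

import numpy as np

with open('data/scannet/scannet_infos_train.pkl', 'rb') as f:  # assumed path
    infos = pickle.load(f)

info = infos[0]
annos = info['annos']
print(annos['gt_num'], np.asarray(annos['gt_boxes_upright_depth']).shape)

# 6 float32 per point (num_features = 6), stored axis-unaligned.
points = np.fromfile('data/scannet/' + info['pts_path'],
                     dtype=np.float32).reshape(-1, 6)

# Axis-align the points with the stored matrix, as GlobalAlignment would.
align = np.asarray(annos['axis_align_matrix'])  # assumed 4x4 homogeneous matrix
xyz1 = np.hstack([points[:, :3], np.ones((len(points), 1), np.float32)])
points[:, :3] = (xyz1 @ align.T)[:, :3]
```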
## Training pipeline
A typical training pipeline of ScanNet for 3D detection is as follows.
......@@ -291,11 +290,11 @@ train_pipeline = [
```
- `GlobalAlignment`: The previous point cloud would be axis-aligned using the axis-aligned matrix.
- `PointSegClassMapping`: Only the valid category IDs will be mapped to class label IDs like \[0, 18) during training (see the sketch after this list).
- Data augmentation:
- `PointSample`: downsample the input point cloud.
- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of \[-5, 5\] (degrees) for ScanNet; then scale the input point cloud, usually by 1.0 for ScanNet (which means no scaling); finally translate the input point cloud, usually by 0 for ScanNet (which means no translation).
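The ID mapping performed by `PointSegClassMapping` can be pictured with the lookup-table sketch below; the valid category IDs are made up for illustration rather than taken from the actual ScanNet configuration.

```python
import numpy as np

# Hypothetical valid raw category IDs (e.g. a few nyu40id values); any other
# raw ID is mapped to ignore_index.
valid_cat_ids = [3, 4, 5, 7, 9]
ignore_index = len(valid_cat_ids)

lookup = np.full(41, ignore_index, dtype=np.int64)  # raw IDs lie in [0, 40]
lookup[valid_cat_ids] = np.arange(len(valid_cat_ids))

raw_semantic_mask = np.array([3, 40, 9, 1, 7])
print(lookup[raw_semantic_mask])  # -> [0 5 4 5 3]; unknown IDs become 5
```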
## Metrics
......
......@@ -108,7 +108,7 @@ train_pipeline = [
]
```
- `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like \[0, 20) during training. Other class ids will be converted to `ignore_index` which equals to `20`.
- `IndoorPatchPointSample`: Crop a patch containing a fixed number of points from input point cloud. `block_size` indicates the size of the cropped block, typically `1.5` for ScanNet.
- `NormalizePointsColor`: Normalize the RGB color values of input point cloud by dividing `255`.
......
......@@ -116,7 +116,7 @@ Under each following folder there are overall 5285 train files and 5050 val file
- `label_v1`: Detection annotation data in `.txt` (version 1)
- `seg_label`: Segmentation annotation data in `.txt`
Currently, we use v1 data for training and testing, so the version 2 labels are unused.
### Create dataset
......@@ -240,25 +240,24 @@ sunrgbd
- `points/0xxxxx.bin`: The point cloud data after downsample.
- `sunrgbd_infos_train.pkl`: The train data infos, the detailed info of each scene is as follows:
- info\['point_cloud'\]: `{'num_features': 6, 'lidar_idx': sample_idx}`, where `sample_idx` is the index of the scene.
- info\['pts_path'\]: The path of `points/0xxxxx.bin`.
- info\['image'\]: The image path and metainfo:
- image\['image_idx'\]: The index of the image.
- image\['image_shape'\]: The shape of the image tensor.
- image\['image_path'\]: The path of the image.
- info\['annos'\]: The annotations of each scene.
- annotations\['gt_num'\]: The number of ground truths.
- annotations\['name'\]: The semantic name of all ground truths, e.g. `chair`.
- annotations\['location'\]: The gravity center of the 3D bounding boxes in depth coordinate system. Shape: \[K, 3\], K is the number of ground truths.
- annotations\['dimensions'\]: The dimensions of the 3D bounding boxes in depth coordinate system, i.e. `(x_size, y_size, z_size)`, shape: \[K, 3\].
- annotations\['rotation_y'\]: The yaw angle of the 3D bounding boxes in depth coordinate system. Shape: \[K, \].
- annotations\['gt_boxes_upright_depth'\]: The 3D bounding boxes in depth coordinate system, each bounding box is `(x, y, z, x_size, y_size, z_size, yaw)`, shape: \[K, 7\].
- annotations\['bbox'\]: The 2D bounding boxes, each bounding box is `(x, y, x_size, y_size)`, shape: \[K, 4\].
- annotations\['index'\]: The index of all ground truths, range \[0, K).
- annotations\['class'\]: The train class id of the bounding boxes, value range: \[0, 10), shape: \[K, \].
- `sunrgbd_infos_val.pkl`: The val data infos, which shares the same format as `sunrgbd_infos_train.pkl` (a short inspection sketch follows this list).
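To tie the annotation fields above together, the short sketch below inspects one scene from `sunrgbd_infos_train.pkl`; the path is an assumption and the shapes in the comments follow the field list above.

```python
import pickle

import numpy as np

with open('data/sunrgbd/sunrgbd_infos_train.pkl', 'rb') as f:  # assumed path
    infos = pickle.load(f)

info = infos[0]
print(info['image']['image_path'], info['image']['image_shape'])

annos = info['annos']
boxes = np.asarray(annos['gt_boxes_upright_depth'])  # (K, 7) depth-frame boxes
bboxes_2d = np.asarray(annos['bbox'])                # (K, 4) 2D boxes
print(annos['gt_num'], boxes.shape, bboxes_2d.shape)
print(annos['name'][:5])  # semantic names of the first few ground truths
```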
## Train pipeline
A typical train pipeline of SUN RGB-D for point cloud only 3D detection is as follows.
......@@ -289,8 +288,9 @@ train_pipeline = [
```
Data augmentation for point clouds:
- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of \[-30, 30\] (degrees) for SUN RGB-D; then scale the input point cloud, usually in the range of \[0.85, 1.15\] for SUN RGB-D; finally translate the input point cloud, usually by 0 for SUN RGB-D (which means no translation).
- `PointSample`: downsample the input point cloud.
A typical train pipeline of SUN RGB-D for multi-modality (point cloud and image) 3D detection is as follows.
......@@ -332,6 +332,7 @@ train_pipeline = [
```
Data augmentation/normalization for images:
- `Resize`: resize the input image, `keep_ratio=True` means the ratio of the image is kept unchanged.
- `Normalize`: normalize the RGB channels of the input image.
- `RandomFlip`: randomly flip the input image.
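Correspondingly, the image branch of the multi-modality pipeline combines these transforms roughly as in the sketch below. The image scale, normalization statistics and flip ratio are illustrative assumptions rather than the shipped config values.

```python
# Image augmentation/normalization part of a multi-modality train pipeline (illustrative values).
img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_img_transforms = [
    dict(type='Resize', img_scale=(1333, 600), keep_ratio=True),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='RandomFlip', flip_ratio=0.0),
]
```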
......
......@@ -103,36 +103,36 @@ Considering there are many similar frames in the original dataset, we can basica
For evaluation on Waymo, please follow the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`. Basically, you can follow the commands below to install `bazel` and build the file.
```shell
# download the code and enter the base directory
git clone https://github.com/waymo-research/waymo-open-dataset.git waymo-od
cd waymo-od
git checkout remotes/origin/master
# use the Bazel build system
sudo apt-get install --assume-yes pkg-config zip g++ zlib1g-dev unzip python3 python3-pip
BAZEL_VERSION=3.1.0
wget https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
sudo bash bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
sudo apt install build-essential
# configure .bazelrc
./configure.sh
# delete previous bazel outputs and reset internal caches
bazel clean
bazel build waymo_open_dataset/metrics/tools/compute_detection_metrics_main
cp bazel-bin/waymo_open_dataset/metrics/tools/compute_detection_metrics_main ../mmdetection3d/mmdet3d/core/evaluation/waymo_utils/
```
Then you can evaluate your models on Waymo. An example of evaluating PointPillars on Waymo with 8 GPUs using Waymo metrics is as follows.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car.py \
    checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth --out results/waymo-car/results_eval.pkl \
    --eval waymo --eval-options 'pklfile_prefix=results/waymo-car/kitti_results' \
    'submission_prefix=results/waymo-car/kitti_results'
```
`pklfile_prefix` should be specified in `--eval-options` if the bin file needs to be generated. For metrics, `waymo` is the recommended official evaluation protocol. Currently, evaluation with the `kitti` option is adapted from KITTI, and the results for each difficulty are not exactly the same as KITTI's definition. Instead, most objects are currently marked with difficulty 0, which will be fixed in the future. The reasons for its instability include the heavy computation required for evaluation, the lack of occlusion and truncation in the converted data, different definitions of difficulty, and different methods of computing Average Precision.
......@@ -148,28 +148,28 @@ Then you can evaluate your models on Waymo. An example to evaluate PointPillars
An example of testing PointPillars on Waymo with 8 GPUs, generating the bin files and making a submission to the leaderboard is shown below.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car.py \
    checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth --out results/waymo-car/results_eval.pkl \
    --format-only --eval-options 'pklfile_prefix=results/waymo-car/kitti_results' \
    'submission_prefix=results/waymo-car/kitti_results'
```
After generating the bin file, you can simply build the binary file `create_submission` and use it to create a submission file by following the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/). Basically, here are some example commands.
```shell
cd ../waymo-od/
bazel build waymo_open_dataset/metrics/tools/create_submission
cp bazel-bin/waymo_open_dataset/metrics/tools/create_submission ../mmdetection3d/mmdet3d/core/evaluation/waymo_utils/
vim waymo_open_dataset/metrics/tools/submission.txtpb # set the metadata information
cp waymo_open_dataset/metrics/tools/submission.txtpb ../mmdetection3d/mmdet3d/core/evaluation/waymo_utils/

cd ../mmdetection3d
# suppose the result bin is in `results/waymo-car/submission`
mmdet3d/core/evaluation/waymo_utils/create_submission --input_filenames='results/waymo-car/kitti_results_test.bin' --output_filename='results/waymo-car/submission/model' --submission_filename='mmdet3d/core/evaluation/waymo_utils/submission.txtpb'

tar cvf results/waymo-car/submission/my_model.tar results/waymo-car/submission/my_model/
gzip results/waymo-car/submission/my_model.tar
```
For evaluation on the validation set with the eval server, you can generate a submission in the same way. Make sure you change the fields in `submission.txtpb` before running the command above.
......@@ -4,9 +4,39 @@ We list some potential troubles encountered by users and developers, along with
## MMCV/MMDet/MMDet3D Installation
- Compatibility issue between MMCV, MMDetection, MMSegmentation and MMDetection3D; "ConvWS is already registered in conv layer"; "AssertionError: MMCV==xxx is used but incompatible. Please install mmcv>=xxx, \<=xxx."
The required versions of MMCV, MMDetection and MMSegmentation for different versions of MMDetection3D are as below. Please install the correct version of MMCV, MMDetection and MMSegmentation to avoid installation issues.
| MMDetection3D version | MMDetection version | MMSegmentation version | MMCV version |
| :-------------------: | :---------------------: | :--------------------: | :------------------------: |
| master | mmdet>=2.24.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.6.0 |
| v1.0.0rc3 | mmdet>=2.24.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.6.0 |
| v1.0.0rc2 | mmdet>=2.24.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.6.0 |
| v1.0.0rc1 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.5.0 |
| v1.0.0rc0 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.18.1 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.18.0 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.17.3 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.2 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.1 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.16.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.15.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.14.0 | mmdet>=2.10.0, <=2.11.0 | mmseg==0.14.0 | mmcv-full>=1.3.1, <=1.4.0 |
| 0.13.0 | mmdet>=2.10.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.4.0 |
| 0.12.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.4.0 |
| 0.11.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.10.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.9.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.8.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.1.5, <=1.3.0 |
| 0.7.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.1.5, <=1.3.0 |
| 0.6.0 | mmdet>=2.4.0, <=2.11.0 | Not required | mmcv-full>=1.1.3, <=1.2.0 |
| 0.5.0 | 2.3.0 | Not required | mmcv-full==1.0.5 |
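If you are unsure which versions are actually installed in your environment, a quick check such as the one below (a small sketch, not part of the codebase) prints them so they can be compared against the table:

```python
# Print the installed versions to compare against the compatibility table above.
import mmcv
import mmdet
import mmseg

print('mmcv-full     :', mmcv.__version__)
print('mmdetection   :', mmdet.__version__)
print('mmsegmentation:', mmseg.__version__)
```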
- If you face the error shown below when importing open3d:

  `OSError: /lib/x86_64-linux-gnu/libm.so.6: version 'GLIBC_2.27' not found`

  please downgrade open3d to 0.9.0.0, because the latest open3d requires 'GLIBC_2.27', which is available in Ubuntu 18.04 but not in Ubuntu 16.04.
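After downgrading, you can confirm which open3d build is active with a one-liner (assuming the import itself now succeeds):

```python
import open3d
print(open3d.__version__)  # expect 0.9.0.0 after the downgrade
```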
......@@ -21,15 +51,15 @@ We list some potential troubles encountered by users and developers, along with
- If you face the error shown below when importing pycocotools:

  `ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject`

  please downgrade pycocotools to 2.0.1 because of the incompatibility between the newest pycocotools and numpy \< 1.20.0. Or you can compile and install the latest pycocotools from source as below:

  `pip install -e "git+https://github.com/cocodataset/cocoapi#egg=pycocotools&subdirectory=PythonAPI"`

  or

  `pip install -e "git+https://github.com/ppwwyyxx/cocoapi#egg=pycocotools&subdirectory=PythonAPI"`
## How to annotate point cloud?
......
# Prerequisites
In this section we demonstrate how to prepare an environment with PyTorch.
MMDetection3D works on Linux, Windows (experimental support) and macOS, and requires the following packages:
- Linux or macOS (Windows is in experimental support)
- Python 3.6+
- PyTorch 1.3+
- CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible)
- GCC 5+
- [MMCV](https://mmcv.readthedocs.io/en/latest/#installation)
The required versions of MMCV, MMDetection and MMSegmentation for different versions of MMDetection3D are as below. Please install the correct version of MMCV, MMDetection and MMSegmentation to avoid installation issues.
| MMDetection3D version | MMDetection version | MMSegmentation version | MMCV version |
| :-------------------: | :---------------------: | :--------------------: | :------------------------: |
| master | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.7.0 |
| v1.0.0rc2 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.7.0 |
| v1.0.0rc1 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.5.0 |
| v1.0.0rc0 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.18.1 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.18.0 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.17.3 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.2 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.1 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.16.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.15.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.14.0 | mmdet>=2.10.0, <=2.11.0 | mmseg==0.14.0 | mmcv-full>=1.3.1, <=1.4.0 |
| 0.13.0 | mmdet>=2.10.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.4.0 |
| 0.12.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.4.0 |
| 0.11.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.10.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.9.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.8.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.1.5, <=1.3.0 |
| 0.7.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.1.5, <=1.3.0 |
| 0.6.0 | mmdet>=2.4.0, <=2.11.0 | Not required | mmcv-full>=1.1.3, <=1.2.0 |
| 0.5.0 | 2.3.0 | Not required | mmcv-full==1.0.5 |
# Installation
## Install MMDetection3D
### Quick installation instructions script
Assuming that you already have CUDA 11.0 installed, here is a full script for quick installation of MMDetection3D with conda.
Otherwise, you should refer to the step-by-step installation instructions in the next section.
```shell
conda create -n open-mmlab python=3.7 pytorch=1.9 cudatoolkit=11.0 torchvision -c pytorch -y
conda activate open-mmlab
pip3 install openmim
mim install mmcv-full
mim install mmdet
mim install mmsegmentation
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
pip3 install -e .
```

```{note}
If you are experienced with PyTorch and have already installed it, just skip this part and jump to the [next section](#installation). Otherwise, you can follow these steps for the preparation.
```
### Step-by-step installation instructions
**Step 0.** Download and install Miniconda from the [official website](https://docs.conda.io/en/latest/miniconda.html).
**Step 1.** Create a conda environment and activate it.
```shell
conda create --name openmmlab python=3.8 -y
conda activate openmmlab
```
**Step 2.** Install PyTorch following [official instructions](https://pytorch.org/get-started/locally/), e.g.
```shell
conda install pytorch torchvision -c pytorch
```
Note: Make sure that your compilation CUDA version and runtime CUDA version match.
You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/).
`E.g. 1` If you have CUDA 10.1 installed under `/usr/local/cuda` and would like to install
PyTorch 1.5, you need to install the prebuilt PyTorch with CUDA 10.1.
```shell
conda install pytorch==1.5.0 cudatoolkit=10.1 torchvision==0.6.0 -c pytorch
```
`E.g. 2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install PyTorch 1.3.1, you need to install the prebuilt PyTorch with CUDA 9.2.
```shell
conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch
```
If you build PyTorch from source instead of installing the prebuilt package,
you can use more CUDA versions such as 9.0.
**c. Install [MMCV](https://mmcv.readthedocs.io/en/latest/).**

*mmcv-full* is necessary since MMDetection3D relies on MMDetection, and the CUDA ops in *mmcv-full* are required.

`e.g.` The pre-built *mmcv-full* can be installed by running the following command (available versions can be found [here](https://mmcv.readthedocs.io/en/latest/#install-with-pip)):
```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
```
Please replace `{cu_version}` and `{torch_version}` in the URL with your desired versions. For example, to install the latest `mmcv-full` with `CUDA 11` and `PyTorch 1.7.0`, use the following command:
```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
```

(On CPU-only platforms, the PyTorch step above can instead use `conda install pytorch torchvision cpuonly -c pytorch`.)
mmcv-full is only compiled on PyTorch 1.x.0 because the compatibility usually holds between 1.x.0 and 1.x.1. If your PyTorch version is 1.x.1, you can install mmcv-full compiled with PyTorch 1.x.0 and it usually works well.
```shell
# We can ignore the micro version of PyTorch
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7/index.html
```
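If you are unsure which find-link applies to your environment, you can first inspect the local PyTorch build; this is only a small helper sketch, not part of MMDetection3D:

```python
# Inspect the installed PyTorch build to pick the matching mmcv-full find-link,
# e.g. torch 1.7.x with CUDA 11.0 -> .../cu110/torch1.7/index.html
import torch

print(torch.__version__)   # e.g. '1.7.1'
print(torch.version.cuda)  # e.g. '11.0' (None for CPU-only builds)
```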
We recommend that users follow our best practices to install MMDetection3D. However, the whole process is highly customizable. See [Customize Installation](#customize-installation) section for more information.
See [here](https://github.com/open-mmlab/mmcv#install-with-pip) for different versions of MMCV compatible with different PyTorch and CUDA versions.
Optionally, you could also build the full version from source:

```shell
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e . # package mmcv-full will be installed after this step
cd ..
```

Or directly run

```shell
pip install mmcv-full
```

## Best Practices

Assuming that you already have CUDA 11.0 installed, here is a full script for quick installation of MMDetection3D with conda.
Otherwise, you should refer to the step-by-step installation instructions in the next section.

```shell
pip install openmim
mim install mmcv-full
mim install mmdet
mim install mmsegmentation
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
pip install -e .
```

**Step 0.** Install [MMCV](https://github.com/open-mmlab/mmcv) using [MIM](https://github.com/open-mmlab/mim).
**Step 1.** Install [MMDetection](https://github.com/open-mmlab/mmdetection).
```shell
pip install mmdet
......@@ -140,12 +67,12 @@ Optionally, you could also build MMDetection from source in case you want to mod
```shell
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
git checkout v2.24.0 # switch to v2.24.0 branch
pip install -r requirements/build.txt
pip install -v -e . # or "python setup.py develop"
```
**Step 2.** Install [MMSegmentation](https://github.com/open-mmlab/mmsegmentation).
```shell
pip install mmsegmentation
......@@ -160,14 +87,14 @@ git checkout v0.20.0 # switch to v0.20.0 branch
pip install -e . # or "python setup.py develop"
```
**Step 3.** Clone the MMDetection3D repository.
```shell
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
```
**Step 4.** Install build requirements and then install MMDetection3D.
```shell
pip install -v -e . # or "python setup.py develop"
......@@ -215,60 +142,10 @@ you can install it before installing MMCV.
5. The code cannot be built for a CPU-only environment (where CUDA isn't available) for now.
## Using multiple MMDetection3D versions
The train and test scripts already modify the `PYTHONPATH` to ensure the scripts use the MMDetection3D in the current directory.
To use the default MMDetection3D installed in the environment rather than the one you are working with, you can remove the following line in those scripts:
```shell
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
```
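To confirm which MMDetection3D installation is actually picked up by Python, a quick check (not part of the scripts themselves) is:

```python
import mmdet3d
print(mmdet3d.__version__, mmdet3d.__file__)  # version and install location in use
```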
# Verification
## Verify with point cloud demo
We provide several demo scripts to test a single sample. Pre-trained models can be downloaded from [model zoo](model_zoo.md). To test a single-modality 3D detection on point cloud scenes:
......@@ -326,24 +203,76 @@ to_ply('./test.obj', './test.ply', 'obj')
More demos about single/multi-modality and indoor/outdoor 3D detection can be found in [demo](demo.md).
## High-level APIs for testing point clouds

### Synchronous interface

Here is an example of building the model and testing given point clouds.

```python
from mmdet3d.apis import init_model, inference_detector

config_file = 'configs/votenet/votenet_8x8_scannet-3d-18class.py'
checkpoint_file = 'checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth'

# build the model from a config file and a checkpoint file
model = init_model(config_file, checkpoint_file, device='cuda:0')

# test a single point cloud sample and show the results
point_cloud = 'test.bin'
result, data = inference_detector(model, point_cloud)
# visualize the results and save the results in 'results' folder
model.show_results(data, result, out_dir='results')
```

## Customize Installation

### CUDA Versions

When installing PyTorch, you need to specify the version of CUDA. If you are not clear on which to choose, follow our recommendations:

- For Ampere-based NVIDIA GPUs, such as GeForce 30 series and NVIDIA A100, CUDA 11 is a must.
- For older NVIDIA GPUs, CUDA 11 is backward compatible, but CUDA 10.2 offers better compatibility and is more lightweight.

Please make sure the GPU driver satisfies the minimum version requirements. See [this table](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions) for more information.

```{note}
Installing CUDA runtime libraries is enough if you follow our best practices, because no CUDA code will be compiled locally. However, if you hope to compile MMCV from source or develop other CUDA operators, you need to install the complete CUDA toolkit from NVIDIA's [website](https://developer.nvidia.com/cuda-downloads), and its version should match the CUDA version of PyTorch, i.e., the specified version of cudatoolkit in the `conda install` command.
```

### Install MMCV without MIM

MMCV contains C++ and CUDA extensions, thus depending on PyTorch in a complex way. MIM solves such dependencies automatically and makes the installation easier. However, it is not a must.

To install MMCV with pip instead of MIM, please follow [MMCV installation guides](https://mmcv.readthedocs.io/en/latest/get_started/installation.html). This requires manually specifying a find-url based on the PyTorch version and its CUDA version.

For example, the following command installs mmcv-full built for PyTorch 1.10.x and CUDA 11.3.

```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10/index.html
```

### Using MMDetection3D with Docker

We provide a [Dockerfile](https://github.com/open-mmlab/mmdetection3d/blob/master/docker/Dockerfile) to build an image.

```shell
# build an image with PyTorch 1.6, CUDA 10.1
docker build -t mmdetection3d -f docker/Dockerfile .
```

Run it with

```shell
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection3d/data mmdetection3d
```
### A from-scratch setup script
Here is a full script for setting up MMDetection3D with conda.
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
# install latest PyTorch prebuilt with the default prebuilt CUDA version (usually the latest)
conda install -c pytorch pytorch torchvision -y
# install mmcv
pip install mmcv-full
# install mmdetection
pip install git+https://github.com/open-mmlab/mmdetection.git
# install mmsegmentation
pip install git+https://github.com/open-mmlab/mmsegmentation.git
# install mmdetection3d
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
pip install -v -e .
```
## Troubleshooting

If you run into issues during the installation, please first check the [FAQ](faq.md) page.
You may [open an issue](https://github.com/open-mmlab/mmdetection3d/issues/new/choose) on GitHub if no solution is found.
......@@ -33,9 +33,11 @@ Please refer to [Dynamic Voxelization](https://github.com/open-mmlab/mmdetection
Please refer to [MVXNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/mvxnet) for details.
### RegNetX
Please refer to [RegNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/regnet) for details. We currently provide PointPillars baselines with RegNetX backbones on the nuScenes and Lyft datasets.
### nuImages
We also support baseline models on [nuImages dataset](https://www.nuscenes.org/nuimages). Please refer to [nuImages](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/nuimages) for details. We report Mask R-CNN, Cascade Mask R-CNN and HTC results currently.
### H3DNet
......@@ -98,6 +100,10 @@ Please refer to [PointRCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.
Please refer to [MonoFlex](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/monoflex) for details. We provide MonoFlex baselines on KITTI dataset.
### SA-SSD
Please refer to [SA-SSD](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/sassd) for details. We provide SA-SSD baselines on the KITTI dataset.
### Mixed Precision (FP16) Training
Please refer to [Mixed Precision (FP16) Training on PointPillars](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pointpillars/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d.py) for details.
......@@ -67,8 +67,8 @@ We can also achieve better performance with finetuned FCOS3D by running:
./tools/dist_train.sh fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune.py 8
```
After training a baseline model with the previous script,
please remember to modify the path [here](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune.py#L8) correspondingly.
## Quantitative Evaluation
......
......@@ -94,9 +94,8 @@ data = dict(
test=dict(pipeline=test_pipeline, classes=class_names, file_client_args=file_client_args))
```
## Load pretrained model from Ceph
```python
model = dict(
pts_backbone=dict(
......@@ -109,6 +108,7 @@ model = dict(
```
## Load checkpoint from Ceph
```python
# replace the path with your checkpoint path on Ceph
load_from = 's3://openmmlab/checkpoints/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20200620_230614-77663cd6.pth.pth'
......@@ -132,6 +132,7 @@ evaluation = dict(interval=1, save_best='bbox', out_dir='s3://openmmlab/mmdetect
```
## Save the training log into Ceph
The training log will be backed up to the specified Ceph path after training.
```python
......@@ -141,6 +142,7 @@ log_config = dict(
dict(type='TextLoggerHook', out_dir='s3://openmmlab/mmdetection3d'),
])
```
You can also delete the local training log after backing up to the specified Ceph path by setting `keep_local = False`.
```python
......
......@@ -34,14 +34,14 @@ We follow the below style to name config files. Contributors are advised to foll
- `{backbone}`: backbone type like `regnet-400mf`, `regnet-1.6gf`.
- `[neck]`: neck type like `fpn`, `secfpn`.
- `[norm_setting]`: `bn` (Batch Normalization) is used unless specified, other norm layer type could be `gn` (Group Normalization), `sbn` (Synchronized Batch Normalization).
`gn-head`/`gn-neck` indicates GN is applied in head/neck only, while `gn-all` means GN is applied in the entire model, e.g. backbone, neck, head.
- `[misc]`: miscellaneous setting/plugins of model, e.g. `strong-aug` means using stronger augmentation strategies for training.
- `[batch_per_gpu x gpu]`: samples per GPU and GPUs, `4x8` is used by default.
- `{schedule}`: training schedule, options are `1x`, `2x`, `20e`, etc.
`1x` and `2x` mean 12 epochs and 24 epochs respectively.
`20e` is adopted in cascade models, which denotes 20 epochs.
For `1x`/`2x`, the initial learning rate decays by a factor of 10 at the 8th/16th and 11th/22nd epochs.
For `20e`, the initial learning rate decays by a factor of 10 at the 16th and 19th epochs.
- `{dataset}`: dataset like `nus-3d`, `kitti-3d`, `lyft-3d`, `scannet-3d`, `sunrgbd-3d`. We also indicate the number of classes we are using if there exist multiple settings, e.g., `kitti-3d-3class` and `kitti-3d-car` mean training on the KITTI dataset with 3 classes and a single class, respectively.
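As a concrete illustration of the `2x` schedule described above, the corresponding settings in a config would look roughly like the sketch below; the exact values live in the schedule config files.

```python
# Rough equivalent of the `2x` schedule: 24 epochs, with the initial learning
# rate decayed by a factor of 10 at the 16th and 22nd epochs.
lr_config = dict(policy='step', step=[16, 22])
runner = dict(type='EpochBasedRunner', max_epochs=24)
```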
## Deprecated train_cfg/test_cfg
......