Commit cf922153 authored by Shilong Zhang, committed by GitHub

[Docs] Replace markdownlint with mdformat for avoiding installing ruby (#1489)

parent 7f4bb54d
@@ -62,63 +62,63 @@ Next, we will elaborate on the details recorded in these info files.
- `nuscenes_database/xxxxx.bin`: point cloud data included in each 3D bounding box of the training dataset
- `nuscenes_infos_train.pkl`: training dataset info, each frame info has two keys: `metadata` and `infos`.
`metadata` contains the basic information for the dataset itself, such as `{'version': 'v1.0-trainval'}`, while `infos` contains the detailed information as follows:
- info['lidar_path']: The file path of the lidar point cloud data.
- info['token']: Sample data token.
- info['sweeps']: Sweeps information (`sweeps` in nuScenes refers to the intermediate frames without annotations, while `samples` refers to the key frames with annotations).
- info['sweeps'][i]['data_path']: The data path of the i-th sweep.
- info['sweeps'][i]['type']: The sweep data type, e.g., `'lidar'`.
- info['sweeps'][i]['sample_data_token']: The sweep sample data token.
- info['sweeps'][i]['sensor2ego_translation']: The translation from the current sensor (for collecting the sweep data) to ego vehicle. (1x3 list)
- info['sweeps'][i]['sensor2ego_rotation']: The rotation from the current sensor (for collecting the sweep data) to ego vehicle. (1x4 list in the quaternion format)
- info['sweeps'][i]['ego2global_translation']: The translation from the ego vehicle to global coordinates. (1x3 list)
- info['sweeps'][i]['ego2global_rotation']: The rotation from the ego vehicle to global coordinates. (1x4 list in the quaternion format)
- info['sweeps'][i]['timestamp']: Timestamp of the sweep data.
- info['sweeps'][i]['sensor2lidar_translation']: The translation from the current sensor (for collecting the sweep data) to lidar. (1x3 list)
- info['sweeps'][i]['sensor2lidar_rotation']: The rotation from the current sensor (for collecting the sweep data) to lidar. (1x4 list in the quaternion format)
- info['cams']: Camera calibration information. It contains six keys corresponding to each camera: `'CAM_FRONT'`, `'CAM_FRONT_RIGHT'`, `'CAM_FRONT_LEFT'`, `'CAM_BACK'`, `'CAM_BACK_LEFT'`, `'CAM_BACK_RIGHT'`.
Each camera dictionary contains detailed calibration information organized in the same way as the sweep data above (with the same keys). In addition, each camera has a key `'cam_intrinsic'` recording the intrinsic parameters used when projecting 3D points to its image plane.
- info['lidar2ego_translation']: The translation from lidar to ego vehicle. (1x3 list)
- info['lidar2ego_rotation']: The rotation from lidar to ego vehicle. (1x4 list in the quaternion format)
- info['ego2global_translation']: The translation from the ego vehicle to global coordinates. (1x3 list)
- info['ego2global_rotation']: The rotation from the ego vehicle to global coordinates. (1x4 list in the quaternion format)
- info['timestamp']: Timestamp of the sample data.
- info['gt_boxes']: 7-DoF annotations of 3D bounding boxes, an Nx7 array.
- info['gt_names']: Categories of 3D bounding boxes, a 1xN array.
- info['gt_velocity']: Velocities of 3D bounding boxes (no vertical measurements due to inaccuracy), an Nx2 array.
- info['num_lidar_pts']: Number of lidar points included in each 3D bounding box.
- info['num_radar_pts']: Number of radar points included in each 3D bounding box.
- info['valid_flag']: Whether each bounding box is valid. In general, we only take the 3D boxes that include at least one lidar or radar point as valid boxes.
- `nuscenes_infos_train_mono3d.coco.json`: training dataset coco-style info. This file organizes image-based data into three categories (keys): `'categories'`, `'images'`, `'annotations'`.
- info['categories']: A list containing all the category names. Each element follows the dictionary format and consists of two keys: `'id'` and `'name'`.
- info['images']: A list containing all the image info.
- info['images'][i]['file_name']: The file name of the i-th image.
- info['images'][i]['id']: Sample data token of the i-th image.
- info['images'][i]['token']: Sample token corresponding to this frame.
- info['images'][i]['cam2ego_rotation']: The rotation from the camera to ego vehicle. (1x4 list in the quaternion format)
- info['images'][i]['cam2ego_translation']: The translation from the camera to ego vehicle. (1x3 list)
- info['images'][i]['ego2global_rotation']: The rotation from the ego vehicle to global coordinates. (1x4 list in the quaternion format)
- info['images'][i]['ego2global_translation']: The translation from the ego vehicle to global coordinates. (1x3 list)
- info['images'][i]['cam_intrinsic']: Camera intrinsic matrix. (3x3 list)
- info['images'][i]['width']: Image width, 1600 by default in nuScenes.
- info['images'][i]['height']: Image height, 900 by default in nuScenes.
- info['annotations']: A list containing all the annotation info.
- info['annotations'][i]['file_name']: The file name of the corresponding image.
- info['annotations'][i]['image_id']: The image id (token) of the corresponding image.
- info['annotations'][i]['area']: Area of the 2D bounding box.
- info['annotations'][i]['category_name']: Category name.
- info['annotations'][i]['category_id']: Category id.
- info['annotations'][i]['bbox']: 2D bounding box annotation (exterior rectangle of the projected 3D box), 1x4 list following [x1, y1, x2-x1, y2-y1].
x1/y1 are minimum coordinates along horizontal/vertical direction of the image.
- info['annotations'][i]['iscrowd']: Whether the region is crowded. Defaults to 0.
- info['annotations'][i]['bbox_cam3d']: 3D bounding box (gravity) center location (3), size (3), (global) yaw angle (1), 1x7 list.
- info['annotations'][i]['velo_cam3d']: Velocities of 3D bounding boxes (no vertical measurements due to inaccuracy), an Nx2 array.
- info['annotations'][i]['center2d']: Projected 3D-center containing 2.5D information: projected center location on the image (2) and depth (1), 1x3 list.
- info['annotations'][i]['attribute_name']: Attribute name.
- info['annotations'][i]['attribute_id']: Attribute id.
We maintain a default attribute collection and mapping for attribute classification.
Please refer to [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/nuscenes_mono_dataset.py#L53) for more details.
- info['annotations'][i]['id']: Annotation id. Defaults to `i`.
Here we only explain the data recorded in the training info files. The same applies to the validation and testing sets.
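These info files can be inspected directly with standard Python. Below is a minimal sketch, assuming the files were generated under the default `data/nuscenes/` layout (the path is an assumption); only keys listed above are accessed.

```python
import pickle

# Minimal sketch: load and inspect the generated nuScenes training info file.
with open('data/nuscenes/nuscenes_infos_train.pkl', 'rb') as f:
    data = pickle.load(f)

print(data['metadata'])               # e.g. {'version': 'v1.0-trainval'}
info = data['infos'][0]               # detailed info of the first key frame
print(info['lidar_path'], info['token'])
print(len(info['sweeps']), 'sweeps')  # intermediate frames without annotations
print(info['gt_boxes'].shape)         # (N, 7): the 7-DoF boxes described above
print(list(info['cams'].keys()))      # the six camera names, e.g. 'CAM_FRONT'
```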
@@ -194,10 +194,11 @@ train_pipeline = [
```
It follows the general pipeline of 2D detection but differs in some details:
- It uses monocular pipelines to load images, which includes additional required information like camera intrinsics.
- It needs to load 3D annotations.
- Some data augmentation techniques need to be adjusted, such as `RandomFlip3D`.
Currently we do not support more augmentation methods, because how to transfer and apply other techniques is still being explored.
## Evaluation
@@ -36,10 +36,12 @@ mmdetection3d
Under folder `Stanford3dDataset_v1.2_Aligned_Version`, the rooms are split into 6 areas. We use 5 areas for training and 1 for evaluation (typically `Area_5`). Under the directory of each area, there are folders in which raw point cloud data and relevant annotations are saved. For instance, under folder `Area_1/office_1` the files are as below:
- `office_1.txt`: A txt file storing coordinates and colors of each point in the raw point cloud data.
- `Annotations/`: This folder contains txt files for different object instances. Each txt file represents one instance, e.g.
- `chair_1.txt`: A txt file storing raw point cloud data of one chair in this room.
If we concatenate all the txt files under `Annotations/`, we will get the same point cloud as denoted by `office_1.txt`.
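As a quick sanity check of the statement above, the instance files can be concatenated and compared with the room-level file. A minimal sketch, assuming the raw dataset layout shown above and that each txt file stores one `x y z r g b` point per row:

```python
import glob
import numpy as np

room = 'Stanford3dDataset_v1.2_Aligned_Version/Area_1/office_1'

# The room-level point cloud and the concatenation of all instance files.
room_points = np.loadtxt(f'{room}/office_1.txt')
instance_points = np.concatenate(
    [np.loadtxt(path) for path in sorted(glob.glob(f'{room}/Annotations/*.txt'))])

# Both should contain the same points (possibly in a different order).
print(room_points.shape, instance_points.shape)
```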
Export S3DIS data by running `python collect_indoor3d_data.py`. The main steps include:
@@ -138,16 +140,16 @@ s3dis
```
- `points/xxxxx.bin`: The exported point cloud data.
- `instance_mask/xxxxx.bin`: The instance label for each point, value range: [0, ${NUM_INSTANCES}], 0: unannotated.
- `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: [0, 12].
- `s3dis_infos_Area_1.pkl`: Area 1 data infos, the detailed info of each room is as follows:
- info['point_cloud']: {'num_features': 6, 'lidar_idx': sample_idx}.
- info['pts_path']: The path of `points/xxxxx.bin`.
- info['pts_instance_mask_path']: The path of `instance_mask/xxxxx.bin`.
- info['pts_semantic_mask_path']: The path of `semantic_mask/xxxxx.bin`.
- `seg_info`: The generated infos to support semantic segmentation model training.
- `Area_1_label_weight.npy`: Weighting factor for each semantic class. Since the number of points in different classes varies greatly, it's a common practice to use label re-weighting to get a better performance.
- `Area_1_resampled_scene_idxs.npy`: Re-sampling index for each scene. Different rooms will be sampled multiple times according to their number of points to balance training data.
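Both `seg_info` files are plain NumPy arrays, so they can be inspected directly. A minimal sketch (the `data/s3dis/seg_info/` path is an assumption following the directory layout above):

```python
import numpy as np

# Per-class weighting factors used for label re-weighting during training.
label_weight = np.load('data/s3dis/seg_info/Area_1_label_weight.npy')
# Scene indices after re-sampling; rooms with more points appear more often.
scene_idxs = np.load('data/s3dis/seg_info/Area_1_resampled_scene_idxs.npy')

print(label_weight.shape, scene_idxs.shape)
```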
## Training pipeline
@@ -200,13 +202,13 @@ train_pipeline = [
]
```
- `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like [0, 13) during training. Other class ids will be converted to `ignore_index`, which equals `13` (a minimal mapping sketch follows this list).
- `IndoorPatchPointSample`: Crop a patch containing a fixed number of points from input point cloud. `block_size` indicates the size of the cropped block, typically `1.0` for S3DIS.
- `NormalizePointsColor`: Normalize the RGB color values of input point cloud by dividing `255`.
- Data augmentation:
- `GlobalRotScaleTrans`: randomly rotate and scale input point cloud.
- `RandomJitterPoints`: randomly jitter point cloud by adding different noise vector to each point.
- `RandomDropPointsColor`: set the colors of point cloud to all zeros by a probability `drop_ratio`.
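Conceptually, the mapping performed by `PointSegClassMapping` (first item in the list above) can be sketched with a simple lookup table as below; the category ids are illustrative, and the real transform lives in the mmdet3d pipeline.

```python
import numpy as np

def map_seg_labels(seg_labels, valid_cat_ids, ignore_index):
    """Map valid raw category ids to contiguous train ids [0, len(valid_cat_ids));
    every other id becomes `ignore_index`."""
    lut_size = max(int(seg_labels.max()), int(valid_cat_ids.max())) + 1
    lut = np.full(lut_size, ignore_index, dtype=np.int64)
    lut[valid_cat_ids] = np.arange(len(valid_cat_ids))
    return lut[seg_labels]

# Illustrative ids: 13 valid S3DIS classes mapped to [0, 13), everything else to 13.
valid_cat_ids = np.arange(13)
print(map_seg_labels(np.array([0, 5, 12, 25]), valid_cat_ids, ignore_index=13))
# -> [ 0  5 12 13]
```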
## Metrics
@@ -135,7 +135,7 @@ By exporting ScanNet RGB data, for each scene we load a set of RGB images with c
python extract_posed_images.py
```
Each of 1201 train, 312 validation and 100 test scenes contains a single `.sens` file. For instance, for scene `0001_01` we have `data/scannet/scans/scene0001_01/0001_01.sens`. For this scene all images and poses are extracted to `data/scannet/posed_images/scene0001_01`. Specifically, there will be 300 image files xxxxx.jpg, 300 camera pose files xxxxx.txt and a single `intrinsic.txt` file. Typically, a single scene contains several thousand images. By default, we extract only 300 of them, resulting in less than 100 GB of disk space. To extract more images, use the `--max-images-per-scene` parameter.
### Create dataset
@@ -222,29 +222,28 @@ scannet
```
- `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsampling. Since the ScanNet 3D detection task takes axis-aligned point clouds as input, while the ScanNet 3D semantic segmentation task takes unaligned points, we choose to store the unaligned points and their axis-align transform matrix. Note: the points will be axis-aligned by the pre-processing transform [`GlobalAlignment`](https://github.com/open-mmlab/mmdetection3d/blob/9f0b01caf6aefed861ef4c3eb197c09362d26b32/mmdet3d/datasets/pipelines/transforms_3d.py#L423) of the 3D detection task (a minimal alignment sketch is given after this list).
- `instance_mask/xxxxx.bin`: The instance label for each point, value range: [0, NUM_INSTANCES], 0: unannotated.
- `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: [1, 40], i.e. `nyu40id` standard. Note: the `nyu40id` ID will be mapped to train ID in train pipeline `PointSegClassMapping`.
- `posed_images/scenexxxx_xx`: The set of `.jpg` images with `.txt` 4x4 poses and the single `.txt` file with camera intrinsic matrix.
- `scannet_infos_train.pkl`: The train data infos, the detailed info of each scan is as follows:
- info['point_cloud']: {'num_features': 6, 'lidar_idx': sample_idx}.
- info['pts_path']: The path of `points/xxxxx.bin`.
- info['pts_instance_mask_path']: The path of `instance_mask/xxxxx.bin`.
- info['pts_semantic_mask_path']: The path of `semantic_mask/xxxxx.bin`.
- info['annos']: The annotations of each scan.
- annotations['gt_num']: The number of ground truths.
- annotations['name']: The semantic name of all ground truths, e.g. `chair`.
- annotations['location']: The gravity center of the axis-aligned 3D bounding boxes in depth coordinate system. Shape: [K, 3], K is the number of ground truths.
- annotations['dimensions']: The dimensions of the axis-aligned 3D bounding boxes in depth coordinate system, i.e. (x_size, y_size, z_size), shape: [K, 3].
- annotations['gt_boxes_upright_depth']: The axis-aligned 3D bounding boxes in depth coordinate system, each bounding box is (x, y, z, x_size, y_size, z_size), shape: [K, 6].
- annotations['unaligned_location']: The gravity center of the axis-unaligned 3D bounding boxes in depth coordinate system.
- annotations['unaligned_dimensions']: The dimensions of the axis-unaligned 3D bounding boxes in depth coordinate system.
- annotations['unaligned_gt_boxes_upright_depth']: The axis-unaligned 3D bounding boxes in depth coordinate system.
- annotations['index']: The index of all ground truths, i.e. [0, K).
- annotations['class']: The train class ID of the bounding boxes, value range: [0, 18), shape: [K, ].
- `scannet_infos_val.pkl`: The val data infos, which shares the same format as `scannet_infos_train.pkl`.
- `scannet_infos_test.pkl`: The test data infos, which almost shares the same format as `scannet_infos_train.pkl` except for the lack of annotation.
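The axis alignment mentioned above (what `GlobalAlignment` performs in the detection pipeline) amounts to applying a 4x4 transform to the stored axis-unaligned points. A minimal sketch, assuming the `N x 6` (xyz + RGB) point format described above; the file name and the identity matrix are placeholders, since the actual axis-align matrix comes from the exported ScanNet meta files.

```python
import numpy as np

# Load the exported axis-unaligned points: each row is (x, y, z, r, g, b).
points = np.fromfile('data/scannet/points/scene0000_00.bin',
                     dtype=np.float32).reshape(-1, 6)
xyz, rgb = points[:, :3], points[:, 3:]

# Placeholder: in practice this 4x4 matrix is the per-scene axis-align matrix
# exported alongside the points (an assumption for this sketch).
axis_align_matrix = np.eye(4, dtype=np.float32)

# Apply the homogeneous transform, as GlobalAlignment does for the detection task.
xyz_h = np.hstack([xyz, np.ones((xyz.shape[0], 1), dtype=np.float32)])
aligned_points = np.hstack([(xyz_h @ axis_align_matrix.T)[:, :3], rgb])
```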
## Training pipeline
A typical training pipeline of ScanNet for 3D detection is as follows.
@@ -291,11 +290,11 @@ train_pipeline = [
```
- `GlobalAlignment`: The loaded point cloud will be axis-aligned using the axis-align matrix.
- `PointSegClassMapping`: Only the valid category IDs will be mapped to class label IDs like [0, 18) during training.
- Data augmentation:
- `PointSample`: downsample the input point cloud.
- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of [-5, 5] (degrees) for ScanNet; then scale the input point cloud, usually by 1.0 for ScanNet (which means no scaling); finally translate the input point cloud, usually by 0 for ScanNet (which means no translation).
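For reference, the corresponding pipeline entry looks roughly like the sketch below. The field names are those of `GlobalRotScaleTrans` in mmdet3d, and the numbers simply restate the ranges above (about ±5° expressed in radians); the shipped ScanNet configs may differ slightly.

```python
# Hedged sketch of the augmentation entry described above.
global_rot_scale_trans = dict(
    type='GlobalRotScaleTrans',
    rot_range=[-0.087266, 0.087266],  # roughly [-5, 5] degrees, in radians
    scale_ratio_range=[1.0, 1.0],     # scale by 1.0, i.e. no scaling
    translation_std=[0, 0, 0])        # no translation
```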
## Metrics
@@ -67,8 +67,8 @@ scannet
```
- `seg_info`: The generated infos to support semantic segmentation model training.
- `train_label_weight.npy`: Weighting factor for each semantic class. Since the number of points in different classes varies greatly, it's a common practice to use label re-weighting to get a better performance.
- `train_resampled_scene_idxs.npy`: Re-sampling index for each scene. Different rooms will be sampled multiple times according to their number of points to balance training data.
## Training pipeline
@@ -108,7 +108,7 @@ train_pipeline = [
]
```
- `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like [0, 20) during training. Other class ids will be converted to `ignore_index`, which equals `20`.
- `IndoorPatchPointSample`: Crop a patch containing a fixed number of points from input point cloud. `block_size` indicates the size of the cropped block, typically `1.5` for ScanNet.
- `NormalizePointsColor`: Normalize the RGB color values of input point cloud by dividing `255`.
@@ -116,7 +116,7 @@ Under each following folder there are overall 5285 train files and 5050 val file
- `label_v1`: Detection annotation data in `.txt` (version 1)
- `seg_label`: Segmentation annotation data in `.txt`
Currently, we use v1 data for training and testing, so the version 2 labels are unused.
### Create dataset
@@ -240,25 +240,24 @@ sunrgbd
- `points/0xxxxx.bin`: The point cloud data after downsampling.
- `sunrgbd_infos_train.pkl`: The train data infos, the detailed info of each scene is as follows:
- info['point_cloud']: `{'num_features': 6, 'lidar_idx': sample_idx}`, where `sample_idx` is the index of the scene.
- info['pts_path']: The path of `points/0xxxxx.bin`.
- info['image']: The image path and metainfo:
- image['image_idx']: The index of the image.
- image['image_shape']: The shape of the image tensor.
- image['image_path']: The path of the image.
- info['annos']: The annotations of each scene.
- annotations['gt_num']: The number of ground truths.
- annotations['name']: The semantic name of all ground truths, e.g. `chair`.
- annotations['location']: The gravity center of the 3D bounding boxes in depth coordinate system. Shape: [K, 3], K is the number of ground truths.
- annotations['dimensions']: The dimensions of the 3D bounding boxes in depth coordinate system, i.e. `(x_size, y_size, z_size)`, shape: [K, 3].
- annotations['rotation_y']: The yaw angle of the 3D bounding boxes in depth coordinate system. Shape: [K, ].
- annotations['gt_boxes_upright_depth']: The 3D bounding boxes in depth coordinate system, each bounding box is `(x, y, z, x_size, y_size, z_size, yaw)`, shape: [K, 7].
- annotations['bbox']: The 2D bounding boxes, each bounding box is `(x, y, x_size, y_size)`, shape: [K, 4].
- annotations['index']: The index of all ground truths, range [0, K).
- annotations['class']: The train class id of the bounding boxes, value range: [0, 10), shape: [K, ].
- `sunrgbd_infos_val.pkl`: The val data infos, which shares the same format as `sunrgbd_infos_train.pkl`.
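To make the relation between these annotation fields concrete: the 7-DoF boxes are simply the concatenation of the gravity centers, sizes and yaw angles listed above. A minimal sketch (the info path follows the layout above; whether the stored `gt_boxes_upright_depth` matches this concatenation exactly depends on the converter and is an assumption here):

```python
import pickle

import numpy as np

with open('data/sunrgbd/sunrgbd_infos_train.pkl', 'rb') as f:
    infos = pickle.load(f)

annos = infos[0]['annos']                     # annotations of the first scene
boxes = np.concatenate(
    [annos['location'],                       # (K, 3) gravity centers
     annos['dimensions'],                     # (K, 3) x_size, y_size, z_size
     annos['rotation_y'][:, None]], axis=1)   # (K, 1) yaw angles
print(boxes.shape)                            # (K, 7), cf. gt_boxes_upright_depth
```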
## Train pipeline
A typical train pipeline of SUN RGB-D for point cloud only 3D detection is as follows.
@@ -289,8 +288,9 @@ train_pipeline = [
```
Data augmentation for point clouds:
- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of [-30, 30] (degrees) for SUN RGB-D; then scale the input point cloud, usually in the range of [0.85, 1.15] for SUN RGB-D; finally translate the input point cloud, usually by 0 for SUN RGB-D (which means no translation).
- `PointSample`: downsample the input point cloud.
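A hedged sketch of these augmentations as pipeline entries is shown below; the field names follow the mmdet3d transforms named above, and the numbers restate the ranges in this list (about ±30° in radians, scaling in [0.85, 1.15]), so the actual SUN RGB-D configs may differ in the exact values.

```python
# Hedged sketch of the point cloud augmentation entries described above.
point_cloud_augmentations = [
    dict(type='RandomFlip3D',
         sync_2d=False,
         flip_ratio_bev_horizontal=0.5),         # random horizontal flip
    dict(type='GlobalRotScaleTrans',
         rot_range=[-0.523599, 0.523599],        # about [-30, 30] degrees, in radians
         scale_ratio_range=[0.85, 1.15],         # random scaling
         translation_std=[0, 0, 0]),             # no translation
    dict(type='PointSample', num_points=20000),  # downsample to a fixed point budget (illustrative value)
]
```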
A typical train pipeline of SUN RGB-D for multi-modality (point cloud and image) 3D detection is as follows.
@@ -332,6 +332,7 @@ train_pipeline = [
```
Data augmentation/normalization for images:
- `Resize`: resize the input image, `keep_ratio=True` means the ratio of the image is kept unchanged.
- `Normalize`: normalize the RGB channels of the input image.
- `RandomFlip`: randomly flip the input image.
@@ -103,36 +103,36 @@ Considering there are many similar frames in the original dataset, we can basica
For evaluation on Waymo, please follow the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`. Basically, you can follow the commands below to install `bazel` and build the file.
```shell
# download the code and enter the base directory
git clone https://github.com/waymo-research/waymo-open-dataset.git waymo-od
cd waymo-od
git checkout remotes/origin/master
# use the Bazel build system
sudo apt-get install --assume-yes pkg-config zip g++ zlib1g-dev unzip python3 python3-pip
BAZEL_VERSION=3.1.0
wget https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
sudo bash bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
sudo apt install build-essential
# configure .bazelrc
./configure.sh
# delete previous bazel outputs and reset internal caches
bazel clean
bazel build waymo_open_dataset/metrics/tools/compute_detection_metrics_main
cp bazel-bin/waymo_open_dataset/metrics/tools/compute_detection_metrics_main ../mmdetection3d/mmdet3d/core/evaluation/waymo_utils/
```
Then you can evaluate your models on Waymo. An example of evaluating PointPillars on Waymo with 8 GPUs using Waymo metrics is as follows.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car.py \
checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth --out results/waymo-car/results_eval.pkl \
--eval waymo --eval-options 'pklfile_prefix=results/waymo-car/kitti_results' \
'submission_prefix=results/waymo-car/kitti_results'
```
`pklfile_prefix` should be given in the `--eval-options` if the bin file needs to be generated. For metrics, `waymo` is the recommended official evaluation prototype. Currently, evaluation with the `kitti` choice is adapted from KITTI, and the results for each difficulty are not exactly the same as defined by KITTI: at the moment most objects are marked with difficulty 0, which will be fixed in the future. The reasons for this instability include the heavy computation required for evaluation, the lack of occlusion and truncation in the converted data, different definitions of difficulty, and different methods of computing Average Precision.
@@ -148,28 +148,28 @@ Then you can evaluate your models on Waymo. An example to evaluate PointPillars
An example of testing PointPillars on Waymo with 8 GPUs, generating the bin files and making a submission to the leaderboard is as follows.
```shell
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car.py \
checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth --out results/waymo-car/results_eval.pkl \
--format-only --eval-options 'pklfile_prefix=results/waymo-car/kitti_results' \
'submission_prefix=results/waymo-car/kitti_results'
```
After generating the bin file, you can simply build the binary file `create_submission` and use it to create a submission file by following the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md/). Basically, here are some example commands.
```shell
cd ../waymo-od/
bazel build waymo_open_dataset/metrics/tools/create_submission
cp bazel-bin/waymo_open_dataset/metrics/tools/create_submission ../mmdetection3d/mmdet3d/core/evaluation/waymo_utils/
vim waymo_open_dataset/metrics/tools/submission.txtpb # set the metadata information
cp waymo_open_dataset/metrics/tools/submission.txtpb ../mmdetection3d/mmdet3d/core/evaluation/waymo_utils/
cd ../mmdetection3d
# suppose the result bin is in `results/waymo-car/submission`
mmdet3d/core/evaluation/waymo_utils/create_submission --input_filenames='results/waymo-car/kitti_results_test.bin' --output_filename='results/waymo-car/submission/model' --submission_filename='mmdet3d/core/evaluation/waymo_utils/submission.txtpb'
tar cvf results/waymo-car/submission/my_model.tar results/waymo-car/submission/my_model/
gzip results/waymo-car/submission/my_model.tar
```
For evaluation on the validation set with the eval server, you can generate a submission in the same way. Make sure you change the fields in `submission.txtpb` before running the command above.
@@ -6,7 +6,7 @@ We list some potential troubles encountered by users and developers, along with
- If you face the error shown below when importing open3d:
``OSError: /lib/x86_64-linux-gnu/libm.so.6: version 'GLIBC_2.27' not found``
please downgrade open3d to 0.9.0.0, because the latest open3d requires `GLIBC_2.27`, which is available in Ubuntu 18.04 but not in Ubuntu 16.04.
@@ -21,15 +21,15 @@ We list some potential troubles encountered by users and developers, along with
- If you face the error shown below when importing pycocotools:
``ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject``
please downgrade pycocotools to 2.0.1 because of the incompatibility between the newest pycocotools and numpy < 1.20.0. Or you can compile and install the latest pycocotools from source as below:
``pip install -e "git+https://github.com/cocodataset/cocoapi#egg=pycocotools&subdirectory=PythonAPI"``
or
``pip install -e "git+https://github.com/ppwwyyxx/cocoapi#egg=pycocotools&subdirectory=PythonAPI"``
## How to annotate point cloud?
@@ -7,33 +7,32 @@
- GCC 5+
- [MMCV](https://mmcv.readthedocs.io/en/latest/#installation)
The required versions of MMCV, MMDetection and MMSegmentation for different versions of MMDetection3D are as below. Please install the correct version of MMCV, MMDetection and MMSegmentation to avoid installation issues.
| MMDetection3D version | MMDetection version | MMSegmentation version | MMCV version |
| :-------------------: | :---------------------: | :--------------------: | :------------------------: |
| master | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.7.0 |
| v1.0.0rc2 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.7.0 |
| v1.0.0rc1 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.4.8, <=1.5.0 |
| v1.0.0rc0 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.18.1 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.18.0 | mmdet>=2.19.0, <=3.0.0 | mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.17, <=1.5.0 |
| 0.17.3 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.2 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.1 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.17.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.16.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.15.0 | mmdet>=2.14.0, <=3.0.0 | mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0 |
| 0.14.0 | mmdet>=2.10.0, <=2.11.0 | mmseg==0.14.0 | mmcv-full>=1.3.1, <=1.4.0 |
| 0.13.0 | mmdet>=2.10.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.4.0 |
| 0.12.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.4.0 |
| 0.11.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.10.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.9.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.2.4, <=1.3.0 |
| 0.8.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.1.5, <=1.3.0 |
| 0.7.0 | mmdet>=2.5.0, <=2.11.0 | Not required | mmcv-full>=1.1.5, <=1.3.0 |
| 0.6.0 | mmdet>=2.4.0, <=2.11.0 | Not required | mmcv-full>=1.1.3, <=1.2.0 |
| 0.5.0 | 2.3.0 | Not required | mmcv-full==1.0.5 |
# Installation
@@ -176,20 +175,20 @@ pip install -v -e . # or "python setup.py develop"
Note:
1. The git commit id will be written to the version number with step d, e.g. 0.6.0+2e7045c. The version will also be saved in trained models.
It is recommended that you run step d each time you pull some updates from GitHub. If C++/CUDA code is modified, then this step is compulsory.
> Important: Be sure to remove the `./build` folder if you reinstall mmdet with a different CUDA/PyTorch version.
```shell
pip uninstall mmdet3d
rm -rf ./build
find . -name "*.so" | xargs rm
```
2. Following the above instructions, MMDetection3D is installed in `dev` mode; any local modifications made to the code will take effect without the need to reinstall it (unless you submit some commits and want to update the version number).
3. If you would like to use `opencv-python-headless` instead of `opencv-python`,
you can install it before installing MMCV.
4. Some dependencies are optional. Simply running `pip install -v -e .` will only install the minimum runtime requirements. To use optional dependencies like `albumentations` and `imagecorruptions` either install them manually with `pip install -r requirements/optional.txt` or specify desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`). Valid keys for the extras field are: `all`, `tests`, `build`, and `optional`.
@@ -208,10 +207,10 @@ you can install it before installing MMCV.
We also support Minkowski Engine as a sparse convolution backend. If necessary please follow original [installation guide](https://github.com/NVIDIA/MinkowskiEngine#installation) or use `pip`:
```shell
conda install openblas-devel -c anaconda
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps --install-option="--blas_include_dirs=/opt/conda/include" --install-option="--blas=openblas"
```
5. The code cannot be built in a CPU-only environment (where CUDA is unavailable) for now.
@@ -283,7 +282,7 @@ python demo/pcd_demo.py demo/data/kitti/kitti_000008.bin configs/second/hv_secon
```
If you want to input a `ply` file, you can use the following function to convert it to `bin` format. Then you can use the converted `bin` file to run the demo.
Note that you need to install `pandas` and `plyfile` before using this script. This function can also be used for data preprocessing when training on `ply` data.
```python
import numpy as np
@@ -33,9 +33,11 @@ Please refer to [Dynamic Voxelization](https://github.com/open-mmlab/mmdetection
Please refer to [MVXNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/mvxnet) for details.
### RegNetX
Please refer to [RegNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/regnet) for details. We currently provide PointPillars baselines with RegNetX backbones on the nuScenes and Lyft datasets.
### nuImages
We also support baseline models on [nuImages dataset](https://www.nuscenes.org/nuimages). Please refer to [nuImages](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/nuimages) for details. We report Mask R-CNN, Cascade Mask R-CNN and HTC results currently.
### H3DNet
@@ -94,9 +94,8 @@ data = dict(
test=dict(pipeline=test_pipeline, classes=class_names, file_client_args=file_client_args))
```
## Load pretrained model from Ceph
```python
model = dict(
pts_backbone=dict(
@@ -109,6 +108,7 @@
```
## Load checkpoint from Ceph
```python
# replace the path with your checkpoint path on Ceph
load_from = 's3://openmmlab/checkpoints/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20200620_230614-77663cd6.pth.pth'
@@ -132,6 +132,7 @@ evaluation = dict(interval=1, save_best='bbox', out_dir='s3://openmmlab/mmdetect
```
## Save the training log into Ceph
The training log will be backed up to the specified Ceph path after training.
```python
@@ -141,6 +142,7 @@ log_config = dict(
dict(type='TextLoggerHook', out_dir='s3://openmmlab/mmdetection3d'),
])
```
You can also delete the local training log after backing up to the specified Ceph path by setting `keep_local = False`.
```python
@@ -34,14 +34,14 @@ We follow the below style to name config files. Contributors are advised to foll
- `{backbone}`: backbone type like `regnet-400mf`, `regnet-1.6gf`.
- `[neck]`: neck type like `fpn`, `secfpn`.
- `[norm_setting]`: `bn` (Batch Normalization) is used unless specified, other norm layer type could be `gn` (Group Normalization), `sbn` (Synchronized Batch Normalization).
`gn-head`/`gn-neck` indicates GN is applied in head/neck only, while `gn-all` means GN is applied in the entire model, e.g. backbone, neck, head.
- `[misc]`: miscellaneous setting/plugins of model, e.g. `strong-aug` means using stronger augmentation strategies for training.
- `[batch_per_gpu x gpu]`: samples per GPU and GPUs, `4x8` is used by default.
- `{schedule}`: training schedule, options are `1x`, `2x`, `20e`, etc.
`1x` and `2x` means 12 epochs and 24 epochs respectively.
`20e` is adopted in cascade models, which denotes 20 epochs.
For `1x`/`2x`, initial learning rate decays by a factor of 10 at the 8/16th and 11/22th epochs.
For `20e`, initial learning rate decays by a factor of 10 at the 16th and 19th epochs.
`1x` and `2x` means 12 epochs and 24 epochs respectively.
`20e` is adopted in cascade models, which denotes 20 epochs.
For `1x`/`2x`, initial learning rate decays by a factor of 10 at the 8/16th and 11/22th epochs.
For `20e`, initial learning rate decays by a factor of 10 at the 16th and 19th epochs.
- `{dataset}`: dataset like `nus-3d`, `kitti-3d`, `lyft-3d`, `scannet-3d`, `sunrgbd-3d`. We also indicate the number of classes we are using if there exist multiple settings, e.g., `kitti-3d-3class` and `kitti-3d-car` means training on KITTI dataset with 3 classes and single class, respectively.
## Deprecated train_cfg/test_cfg
@@ -7,43 +7,43 @@ MMDetection3D uses three different coordinate systems. The existence of differen
Despite the variety of datasets and equipment, by summarizing the line of work on 3D object detection we can roughly categorize coordinate systems into three:
- Camera coordinate system -- the coordinate system of most cameras, in which the positive direction of the y-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the z-axis points to the front.
```
up z front
| ^
| /
| /
| /
|/
left ------ 0 ------> x right
|
|
|
|
v
y down
```
- LiDAR coordinate system -- the coordinate system of many LiDARs, in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the front, and the positive direction of the y-axis points to the left.
```
z up x front
^ ^
| /
| /
| /
|/
y left <------ 0 ------ right
```
- Depth coordinate system -- the coordinate system used by VoteNet, H3DNet, etc., in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the y-axis points to the front.
```
z up y front
^ ^
| /
| /
| /
|/
left ------ 0 ------> x right
```
The definition of coordinate systems in this tutorial is actually **more than just defining the three axes**. For a box in the form of ``$$`(x, y, z, dx, dy, dz, r)`$$``, our coordinate systems also define how to interpret the box dimensions ``$$`(dx, dy, dz)`$$`` and the yaw angle ``$$`r`$$``.
The illustration of the three coordinate systems is shown below:
@@ -55,13 +55,13 @@ We will stick to the three coordinate systems defined in this tutorial in the fu
## Definition of the yaw angle
Please refer to [wikipedia](https://en.wikipedia.org/wiki/Euler_angles#Tait%E2%80%93Bryan_angles) for the standard definition of the yaw angle. In object detection, we choose an axis as the gravity axis, and a reference direction on the plane ``$$`\Pi`$$`` perpendicular to the gravity axis; the reference direction then has a yaw angle of 0, and other directions on ``$$`\Pi`$$`` have non-zero yaw angles depending on their angle with the reference direction.
Currently, for all supported datasets, annotations do not include pitch angle and roll angle, which means we need only consider the yaw angle when predicting boxes and calculating overlap between boxes.
In MMDetection3D, all three coordinate systems are right-handed coordinate systems, which means the ascending direction of the yaw angle is counter-clockwise if viewed from the negative direction of the gravity axis (the axis is pointing at one's eyes).
The figure below shows that, in this right-handed coordinate system, if we set the positive direction of the x-axis as a reference direction, then the positive direction of the y-axis has a yaw angle of ``$$`\frac{\pi}{2}`$$``.
```
z up y front (yaw=0.5*pi)
@@ -92,9 +92,9 @@ __|____|____|____|______\ x right
## Definition of the box dimensions
The definition of the box dimensions cannot be disentangled from the definition of the yaw angle. In the previous section, we said that the direction of a box is defined to be parallel with the x-axis if its yaw angle is 0. Then naturally, the dimension of a box which corresponds to the x-axis should be ``$$`dx`$$``. However, this is not always the case in some datasets (we will address that later).
The following figures show the meaning of the correspondence between the x-axis and ``$$`dx`$$``, and between the y-axis and ``$$`dy`$$``.
```
y front
......@@ -111,7 +111,7 @@ __|____|____|____|______\ x right
| dy
```
Note that the box direction is always parallel with the edge ``$$`dx`$$``.
Note that the box direction is always parallel with the edge `` $$`dx`$$ ``.
```
y front
......@@ -138,14 +138,12 @@ In SECOND, the LiDAR coordinate system for a box is defined as follows (a bird's
![](https://raw.githubusercontent.com/traveller59/second.pytorch/master/images/kittibox.png)
For each box, the dimensions are ``$$`(w, l, h)`$$``, and the reference direction for the yaw angle is the positive direction of the y axis. For more details, refer to the [repo](https://github.com/traveller59/second.pytorch#concepts).
For each box, the dimensions are `` $$`(w, l, h)`$$ ``, and the reference direction for the yaw angle is the positive direction of the y axis. For more details, refer to the [repo](https://github.com/traveller59/second.pytorch#concepts).
Our LiDAR coordinate system has two changes:
- The yaw angle is defined to be right-handed instead of left-handed for consistency;
- The box dimensions are ``$$`(l, w, h)`$$`` instead of ``$$`(w, l, h)`$$``, since ``$$`w`$$`` corresponds to ``$$`dy`$$`` and ``$$`l`$$`` corresponds to ``$$`dx`$$`` in KITTI.
- The box dimensions are `` $$`(l, w, h)`$$ `` instead of `` $$`(w, l, h)`$$ ``, since `` $$`w`$$ `` corresponds to `` $$`dy`$$ `` and `` $$`l`$$ `` corresponds to `` $$`dx`$$ `` in KITTI.
### Waymo
......@@ -153,7 +151,7 @@ We use the KITTI-format data of Waymo dataset. Therefore, KITTI and Waymo also s
### NuScenes
NuScenes provides a toolkit for evaluation, in which each box is wrapped into a `Box` instance. The coordinate system of `Box` is different from our LiDAR coordinate system in that the first two elements of the box dimension correspond to ``$$`(dy, dx)`$$``, or ``$$`(w, l)`$$``, respectively, instead of the reverse. For more details, please refer to the NuScenes [tutorial](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/nuscenes_det.md#notes).
NuScenes provides a toolkit for evaluation, in which each box is wrapped into a `Box` instance. The coordinate system of `Box` is different from our LiDAR coordinate system in that the first two elements of the box dimension correspond to `` $$`(dy, dx)`$$ ``, or `` $$`(w, l)`$$ ``, respectively, instead of the reverse. For more details, please refer to the NuScenes [tutorial](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/nuscenes_det.md#notes).
Readers may refer to the [NuScenes development kit](https://github.com/nutonomy/nuscenes-devkit/tree/master/python-sdk/nuscenes/eval/detection) for the definition of a [NuScenes box](https://github.com/nutonomy/nuscenes-devkit/blob/2c6a752319f23910d5f55cc995abc547a9e54142/python-sdk/nuscenes/utils/data_classes.py#L457) and implementation of [NuScenes evaluation](https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/eval/detection/evaluate.py).
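As a tiny sketch of that dimension reordering (ignoring the center and rotation handling of the real converter; the helper name is illustrative):

```python
def nus_size_to_lidar_dims(wlh):
    """Reorder a NuScenes-style (w, l, h) size into our (dx, dy, dz) = (l, w, h)."""
    w, l, h = wlh
    return (l, w, h)

print(nus_size_to_lidar_dims((1.6, 3.9, 1.5)))  # (3.9, 1.6, 1.5)
```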
......@@ -185,25 +183,25 @@ Take the conversion between our Camera coordinate system and LiDAR coordinate sy
First, for points and box centers, the coordinates before and after the conversion satisfy the following relationship:
- ``$$`x_{LiDAR}=z_{camera}`$$``
- ``$$`y_{LiDAR}=-x_{camera}`$$``
- ``$$`z_{LiDAR}=-y_{camera}`$$``
- `` $$`x_{LiDAR}=z_{camera}`$$ ``
- `` $$`y_{LiDAR}=-x_{camera}`$$ ``
- `` $$`z_{LiDAR}=-y_{camera}`$$ ``
Then, the box dimensions before and after the conversion satisfy the following relationship:
- ``$$`dx_{LiDAR}=dx_{camera}`$$``
- ``$$`dy_{LiDAR}=dz_{camera}`$$``
- ``$$`dz_{LiDAR}=dy_{camera}`$$``
- `` $$`dx_{LiDAR}=dx_{camera}`$$ ``
- `` $$`dy_{LiDAR}=dz_{camera}`$$ ``
- `` $$`dz_{LiDAR}=dy_{camera}`$$ ``
Finally, the yaw angle should also be converted:
- ``$$`r_{LiDAR}=-\frac{\pi}{2}-r_{camera}`$$``
- `` $$`r_{LiDAR}=-\frac{\pi}{2}-r_{camera}`$$ ``
See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/box_3d_mode.py) for more details.
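As a minimal illustration of the three relations above, the sketch below converts a single box with plain NumPy. It only covers the axis-convention change described here; the actual conversion utilities linked above additionally handle calibration and batched tensors, and the function name is purely illustrative.

```python
import numpy as np

def cam_box_to_lidar_box(cam_box):
    """Convert one (x, y, z, dx, dy, dz, r) box from the Camera convention
    to the LiDAR convention, following the relations listed above."""
    x, y, z, dx, dy, dz, r = cam_box
    center = [z, -x, -y]     # x_lidar = z_cam, y_lidar = -x_cam, z_lidar = -y_cam
    dims = [dx, dz, dy]      # dx_lidar = dx_cam, dy_lidar = dz_cam, dz_lidar = dy_cam
    yaw = -np.pi / 2 - r     # r_lidar = -pi/2 - r_cam
    return np.array(center + dims + [yaw])

# a camera-frame box 10 m in front of the camera with yaw 0
print(cam_box_to_lidar_box([1.0, 1.5, 10.0, 3.9, 1.6, 1.5, 0.0]))
# -> approximately [10, -1, -1.5, 3.9, 1.5, 1.6, -1.5708]
```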
### Bird's Eye View
The BEV of a camera coordinate system box is ``$$`(x, z, dx, dz, -r)`$$`` if the 3D box is ``$$`(x, y, z, dx, dy, dz, r)`$$``. The inversion of the sign of the yaw angle is because the positive direction of the gravity axis of the Camera coordinate system points to the ground.
The BEV of a camera coordinate system box is `` $$`(x, z, dx, dz, -r)`$$ `` if the 3D box is `` $$`(x, y, z, dx, dy, dz, r)`$$ ``. The inversion of the sign of the yaw angle is because the positive direction of the gravity axis of the Camera coordinate system points to the ground.
See the code [here](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/bbox/structures/cam_box3d.py) for more details.
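A corresponding one-liner for the BEV projection (again only a sketch; the helper name is not from the codebase):

```python
def cam_box_to_bev(cam_box):
    """Project an (x, y, z, dx, dy, dz, r) Camera-frame box onto its BEV
    (x, z, dx, dz, -r), as described above."""
    x, y, z, dx, dy, dz, r = cam_box
    return [x, z, dx, dz, -r]

print(cam_box_to_bev([1.0, 1.5, 10.0, 3.9, 1.6, 1.5, 0.3]))  # [1.0, 10.0, 3.9, 1.5, -0.3]
```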
......@@ -225,18 +223,18 @@ For each box related op, we have marked the type of boxes to which we can apply
No. For example, in KITTI, we need a calibration matrix when converting from the Camera coordinate system to the LiDAR coordinate system.
#### Q3: How does a phase difference of ``$$`2\pi`$$`` in the yaw angle of a box affect evaluation?
#### Q3: How does a phase difference of `` $$`2\pi`$$ `` in the yaw angle of a box affect evaluation?
For IoU calculation, a phase difference of ``$$`2\pi`$$`` in the yaw angle will result in the same box, thus not affecting evaluation.
For IoU calculation, a phase difference of `` $$`2\pi`$$ `` in the yaw angle will result in the same box, thus not affecting evaluation.
For angle prediction evaluation such as the NDS metric in NuScenes and the AOS metric in KITTI, the angle of predicted boxes will be first standardized, so the phase difference of ``$$`2\pi`$$`` will not change the result.
For angle prediction evaluation such as the NDS metric in NuScenes and the AOS metric in KITTI, the angle of predicted boxes will be first standardized, so the phase difference of `` $$`2\pi`$$ `` will not change the result.
#### Q4: How does a phase difference of ``$$`\pi`$$`` in the yaw angle of a box affect evaluation?
#### Q4: How does a phase difference of `` $$`\pi`$$ `` in the yaw angle of a box affect evaluation?
For IoU calculation, a phase difference of ``$$`\pi`$$`` in the yaw angle will result in the same box, thus not affecting evaluation.
For IoU calculation, a phase difference of `` $$`\pi`$$ `` in the yaw angle will result in the same box, thus not affecting evaluation.
However, for angle prediction evaluation, this will result in the exact opposite direction.
Just think about a car. The yaw angle is the angle between the direction of the car front and the positive direction of the x-axis. If we add ``$$`\pi`$$`` to this angle, the car front will become the car rear.
Just think about a car. The yaw angle is the angle between the direction of the car front and the positive direction of the x-axis. If we add `` $$`\pi`$$ `` to this angle, the car front will become the car rear.
For categories such as barrier, the front and the rear have no difference, so a phase difference of ``$$`\pi`$$`` will not affect the angle prediction score.
For categories such as barrier, the front and the rear have no difference, so a phase difference of `` $$`\pi`$$ `` will not affect the angle prediction score.
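To make Q3 and Q4 concrete, below is a small sketch of the standardization step (plain NumPy; this is not the exact library routine, and mapping into `[-pi, pi)` is only one possible convention). A `2*pi` offset maps back to the same yaw, while a `pi` offset maps to the opposite direction.

```python
import numpy as np

def standardize_yaw(angle, period=2 * np.pi):
    """Map an angle into [-period / 2, period / 2)."""
    return angle - np.floor(angle / period + 0.5) * period

r = 0.3
print(standardize_yaw(r + 2 * np.pi))  # 0.3 -> same direction, evaluation unaffected
print(standardize_yaw(r + np.pi))      # about -2.84 -> the car front becomes the car rear
```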
......@@ -222,62 +222,62 @@ There are three ways to concatenate the dataset.
1. If the datasets you want to concatenate are of the same type but have different annotation files, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
pipeline=train_pipeline
)
```
If the concatenated dataset is used for test or evaluation, this manner supports evaluating each dataset separately. To test the concatenated datasets as a whole, you can set `separate_eval=False` as below.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
separate_eval=False,
pipeline=train_pipeline
)
```
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
pipeline=train_pipeline
)
```
If the concatenated dataset is used for test or evaluation, this manner supports evaluating each dataset separately. To test the concatenated datasets as a whole, you can set `separate_eval=False` as below.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
separate_eval=False,
pipeline=train_pipeline
)
```
2. If the datasets you want to concatenate are of different types, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict()
dataset_B_train = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train = [
dataset_A_train,
dataset_B_train
],
val = dataset_A_val,
test = dataset_A_test
)
```
```python
dataset_A_train = dict()
dataset_B_train = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train = [
dataset_A_train,
dataset_B_train
],
val = dataset_A_val,
test = dataset_A_test
)
```
If the concatenated dataset is used for test or evaluation, this manner also supports evaluating each dataset separately.
If the concatenated dataset is used for test or evaluation, this manner also supports evaluating each dataset separately.
3. We also support defining `ConcatDataset` explicitly, as follows.
```python
dataset_A_val = dict()
dataset_B_val = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dataset_A_train,
val=dict(
type='ConcatDataset',
datasets=[dataset_A_val, dataset_B_val],
separate_eval=False))
```
This manner allows users to evaluate all the datasets as a single one by setting `separate_eval=False`.
```python
dataset_A_val = dict()
dataset_B_val = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dataset_A_train,
val=dict(
type='ConcatDataset',
datasets=[dataset_A_val, dataset_B_val],
separate_eval=False))
```
This manner allows users to evaluate all the datasets as a single one by setting `separate_eval=False`.
**Note:**
......
......@@ -41,8 +41,8 @@ To find the above module defined above, this module should be imported into the
- Add `mmdet3d/core/optimizer/__init__.py` to import it.
The newly defined module should be imported in `mmdet3d/core/optimizer/__init__.py` so that the registry will
find the new module and add it:
The newly defined module should be imported in `mmdet3d/core/optimizer/__init__.py` so that the registry will
find the new module and add it:
```python
from .my_optimizer import MyOptimizer
......@@ -116,35 +116,35 @@ Tricks not implemented by the optimizer should be implemented through optimizer
- __Use gradient clip to stabilize training__:
Some models need gradient clipping to stabilize the training process. An example is shown below:
Some models need gradient clipping to stabilize the training process. An example is shown below:
```python
optimizer_config = dict(
_delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
```
```python
optimizer_config = dict(
_delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
```
If your config inherits the base config which already sets the `optimizer_config`, you might need `_delete_=True` to override the unnecessary settings in the base config. See the [config documentation](https://mmdetection.readthedocs.io/en/latest/tutorials/config.html) for more details.
If your config inherits the base config which already sets the `optimizer_config`, you might need `_delete_=True` to override the unnecessary settings in the base config. See the [config documentation](https://mmdetection.readthedocs.io/en/latest/tutorials/config.html) for more details.
- __Use momentum schedule to accelerate model convergence__:
We support the momentum scheduler, which modifies the model's momentum according to the learning rate and can make the model converge faster.
The momentum scheduler is usually used together with the LR scheduler; for example, the following config is used in 3D detection to accelerate convergence.
For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/v1.3.7/mmcv/runner/hooks/lr_updater.py#L358) and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/v1.3.7/mmcv/runner/hooks/momentum_updater.py#L225).
```python
lr_config = dict(
policy='cyclic',
target_ratio=(10, 1e-4),
cyclic_times=1,
step_ratio_up=0.4,
)
momentum_config = dict(
policy='cyclic',
target_ratio=(0.85 / 0.95, 1),
cyclic_times=1,
step_ratio_up=0.4,
)
```
We support the momentum scheduler, which modifies the model's momentum according to the learning rate and can make the model converge faster.
The momentum scheduler is usually used together with the LR scheduler; for example, the following config is used in 3D detection to accelerate convergence.
For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/v1.3.7/mmcv/runner/hooks/lr_updater.py#L358) and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/v1.3.7/mmcv/runner/hooks/momentum_updater.py#L225).
```python
lr_config = dict(
policy='cyclic',
target_ratio=(10, 1e-4),
cyclic_times=1,
step_ratio_up=0.4,
)
momentum_config = dict(
policy='cyclic',
target_ratio=(0.85 / 0.95, 1),
cyclic_times=1,
step_ratio_up=0.4,
)
```
## Customize training schedules
......@@ -153,20 +153,20 @@ We support many other learning rate schedule [here](https://github.com/open-mmla
- Poly schedule:
```python
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
```
```python
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
```
- CosineAnnealing schedule:
```python
lr_config = dict(
policy='CosineAnnealing',
warmup='linear',
warmup_iters=1000,
warmup_ratio=1.0 / 10,
min_lr_ratio=1e-5)
```
```python
lr_config = dict(
policy='CosineAnnealing',
warmup='linear',
warmup_iters=1000,
warmup_ratio=1.0 / 10,
min_lr_ratio=1e-5)
```
## Customize workflow
......@@ -240,8 +240,8 @@ Then we need to make `MyHook` imported. Assuming the hook is in `mmdet3d/core/ut
- Modify `mmdet3d/core/utils/__init__.py` to import it.
The newly defined module should be imported in `mmdet3d/core/utils/__init__.py` so that the registry will
find the new module and add it:
The newly defined module should be imported in `mmdet3d/core/utils/__init__.py` so that the registry will
find the new module and add it:
```python
from .my_hook import MyHook
......
......@@ -86,100 +86,113 @@ For each operation, we list the related dict fields that are added/updated/remov
### Data loading
`LoadPointsFromFile`
- add: points
`LoadPointsFromMultiSweeps`
- update: points
`LoadAnnotations3D`
- add: gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels, pts_instance_mask, pts_semantic_mask, bbox3d_fields, pts_mask_fields, pts_seg_fields
### Pre-processing
`GlobalRotScaleTrans`
- add: pcd_trans, pcd_rotation, pcd_scale_factor
- update: points, *bbox3d_fields
- update: points, \*bbox3d_fields
`RandomFlip3D`
- add: flip, pcd_horizontal_flip, pcd_vertical_flip
- update: points, *bbox3d_fields
- update: points, \*bbox3d_fields
`PointsRangeFilter`
- update: points
`ObjectRangeFilter`
- update: gt_bboxes_3d, gt_labels_3d
`ObjectNameFilter`
- update: gt_bboxes_3d, gt_labels_3d
`PointShuffle`
- update: points
`PointsRangeFilter`
- update: points
### Formatting
`DefaultFormatBundle3D`
- update: points, gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels
`Collect3D`
- add: img_meta (the keys of img_meta are specified by `meta_keys`)
- remove: all other keys except for those specified by `keys`
### Test time augmentation
`MultiScaleFlipAug`
- update: scale, pcd_scale_factor, flip, flip_direction, pcd_horizontal_flip, pcd_vertical_flip with list of augmented data with these specific parameters
## Extend and use custom pipelines
1. Write a new pipeline in any file, e.g., `my_pipeline.py`. It takes a dict as input and returns a dict.
```python
from mmdet.datasets import PIPELINES
```python
from mmdet.datasets import PIPELINES
@PIPELINES.register_module()
class MyTransform:
@PIPELINES.register_module()
class MyTransform:
def __call__(self, results):
results['dummy'] = True
return results
```
def __call__(self, results):
results['dummy'] = True
return results
```
2. Import the new class.
```python
from .my_pipeline import MyTransform
```
```python
from .my_pipeline import MyTransform
```
3. Use it in config files.
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
file_client_args=file_client_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
file_client_args=file_client_args),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='MyTransform'),
dict(type='PointShuffle'),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
file_client_args=file_client_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
file_client_args=file_client_args),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='MyTransform'),
dict(type='PointShuffle'),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
......@@ -37,17 +37,17 @@ python ./tools/deploy.py \
### Description of all arguments
* `deploy_cfg` : The path of deploy config file in MMDeploy codebase.
* `model_cfg` : The path of model config file in OpenMMLab codebase.
* `checkpoint` : The path of model checkpoint file.
* `img` : The path of the point cloud file or image file that is used to convert the model.
* `--test-img` : The path of the image file that is used to test the model. If not specified, it will be set to `None`.
* `--work-dir` : The path of the work directory that is used to save logs and models.
* `--calib-dataset-cfg` : Only valid in int8 mode. Config used for calibration. If not specified, it will be set to `None` and the "val" dataset in the model config will be used for calibration.
* `--device` : The device used for conversion. If not specified, it will be set to `cpu`.
* `--log-level` : Set the log level, which is one of `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
* `--show` : Whether to show detection outputs.
* `--dump-info` : Whether to output information for SDK.
- `deploy_cfg` : The path of deploy config file in MMDeploy codebase.
- `model_cfg` : The path of model config file in OpenMMLab codebase.
- `checkpoint` : The path of model checkpoint file.
- `img` : The path of the point cloud file or image file that is used to convert the model.
- `--test-img` : The path of the image file that is used to test the model. If not specified, it will be set to `None`.
- `--work-dir` : The path of the work directory that is used to save logs and models.
- `--calib-dataset-cfg` : Only valid in int8 mode. Config used for calibration. If not specified, it will be set to `None` and the "val" dataset in the model config will be used for calibration.
- `--device` : The device used for conversion. If not specified, it will be set to `cpu`.
- `--log-level` : Set the log level, which is one of `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
- `--show` : Whether to show detection outputs.
- `--dump-info` : Whether to output information for SDK.
### Example
......@@ -110,12 +110,12 @@ python tools/test.py \
## Supported models
| Model | TorchScript | OnnxRuntime | TensorRT | NCNN | PPLNN | OpenVINO | Model config |
| -------------------- | :---------: | :---------: | :------: | :---: | :---: | :------: | -------------------------------------------------------------------------------------- |
| PointPillars | ? | Y | Y | N | N | Y | [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars) |
| CenterPoint (pillar) | ? | Y | Y | N | N | Y | [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/centerpoint) |
| Model | TorchScript | OnnxRuntime | TensorRT | NCNN | PPLNN | OpenVINO | Model config |
| -------------------- | :---------: | :---------: | :------: | :--: | :---: | :------: | -------------------------------------------------------------------------------------- |
| PointPillars | ? | Y | Y | N | N | Y | [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars) |
| CenterPoint (pillar) | ? | Y | Y | N | N | Y | [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/centerpoint) |
## Note
* MMDeploy version >= 0.4.0.
* Currently, only the pillar version of CenterPoint is supported.
- MMDeploy version >= 0.4.0.
- Currently, only the pillar version of CenterPoint is supported.
......@@ -261,62 +261,62 @@ There are three ways to concatenate the dataset.
1. If the datasets you want to concatenate are of the same type but have different annotation files, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
pipeline=train_pipeline
)
```
If the concatenated dataset is used for test or evaluation, this manner supports evaluating each dataset separately. To test the concatenated datasets as a whole, you can set `separate_eval=False` as below.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
separate_eval=False,
pipeline=train_pipeline
)
```
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
pipeline=train_pipeline
)
```
If the concatenated dataset is used for test or evaluation, this manner supports evaluating each dataset separately. To test the concatenated datasets as a whole, you can set `separate_eval=False` as below.
```python
dataset_A_train = dict(
type='Dataset_A',
ann_file = ['anno_file_1', 'anno_file_2'],
separate_eval=False,
pipeline=train_pipeline
)
```
2. If the datasets you want to concatenate are of different types, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict()
dataset_B_train = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train = [
dataset_A_train,
dataset_B_train
],
val = dataset_A_val,
test = dataset_A_test
)
```
```python
dataset_A_train = dict()
dataset_B_train = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train = [
dataset_A_train,
dataset_B_train
],
val = dataset_A_val,
test = dataset_A_test
)
```
If the concatenated dataset is used for test or evaluation, this manner also supports evaluating each dataset separately.
3. We also support defining `ConcatDataset` explicitly, as follows.
```python
dataset_A_val = dict()
dataset_B_val = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dataset_A_train,
val=dict(
type='ConcatDataset',
datasets=[dataset_A_val, dataset_B_val],
separate_eval=False))
```
```python
dataset_A_val = dict()
dataset_B_val = dict()
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dataset_A_train,
val=dict(
type='ConcatDataset',
datasets=[dataset_A_val, dataset_B_val],
separate_eval=False))
```
This manner allows users to evaluate all the datasets as a single one by setting `separate_eval=False`.
This manner allows users to evaluate all the datasets as a single one by setting `separate_eval=False`.
**Note:**
......@@ -435,7 +435,6 @@ Here you can refer to the setting of the existing datasets. theoretically, `voxe
if the `point_cloud_range` and `voxel_size` are set to be `[0, -40, -3, 70.4, 40, 1]` and `[0.05, 0.05, 0.1]` respectively, then the shape of the intermediate feature map should be `[(1-(-3))/0.1+1, (40-(-40))/0.05, (70.4-0)/0.05]=[41, 1600, 1408]`. For more details, refer to this [issue](https://github.com/open-mmlab/mmdetection3d/issues/382).
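The arithmetic above can be checked with a few lines of Python (the variable names are only illustrative):

```python
point_cloud_range = [0, -40, -3, 70.4, 40, 1]  # [x_min, y_min, z_min, x_max, y_max, z_max]
voxel_size = [0.05, 0.05, 0.1]

nx = round((point_cloud_range[3] - point_cloud_range[0]) / voxel_size[0])  # 1408
ny = round((point_cloud_range[4] - point_cloud_range[1]) / voxel_size[1])  # 1600
nz = round((point_cloud_range[5] - point_cloud_range[2]) / voxel_size[2])  # 40

# the intermediate feature map is laid out as [nz + 1, ny, nx]
print([nz + 1, ny, nx])  # [41, 1600, 1408]
```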
### Adjust Anchor Range and Size in Config
```python
......@@ -450,6 +449,7 @@ anchor_generator=dict(
rotations=[0, 1.57],
reshape_out=False),
```
Regarding the setting of `anchor_range`, it is generally adjusted according to the dataset. Note that the `z` value needs to be adjusted according to the position of the point cloud; please refer to this [issue](https://github.com/open-mmlab/mmdetection3d/issues/986).
Regarding the setting of `anchor_size`, it is usually necessary to compute the average length, width, and height of objects over the entire training dataset and use them as `anchor_size` to obtain the best results, as sketched below.
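A rough sketch of how such statistics might be gathered, assuming the ground-truth sizes of one class have already been collected into a NumPy array (the variable names are hypothetical):

```python
import numpy as np

# hypothetical (N, 3) array of ground-truth (dx, dy, dz) sizes for one class,
# gathered over the whole training split
gt_dims = np.array([[3.9, 1.6, 1.56],
                    [4.2, 1.7, 1.50],
                    [3.7, 1.6, 1.45]])

anchor_size = gt_dims.mean(axis=0).round(2).tolist()
print(anchor_size)  # [3.93, 1.63, 1.5] -> candidate anchor size for this class
```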
......
......@@ -14,26 +14,26 @@ python tools/analysis_tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title
Examples:
- Plot the classification loss of some run.
- Plot the classification loss of some run.
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
```
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
```
- Plot the classification and regression loss of some run, and save the figure to a pdf.
- Plot the classification and regression loss of some run, and save the figure to a pdf.
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf
```
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf
```
- Compare the bbox mAP of two runs in the same figure.
- Compare the bbox mAP of two runs in the same figure.
```shell
# evaluate PartA2 and second on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/PartA2.log.json tools/logs/second.log.json --keys KITTI/Car_3D_moderate_strict --legend PartA2 second --mode eval --interval 1
# evaluate PointPillars for car and 3 classes on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/pp-3class.log.json tools/logs/pp.log.json --keys KITTI/Car_3D_moderate_strict --legend pp-3class pp --mode eval --interval 2
```
```shell
# evaluate PartA2 and second on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/PartA2.log.json tools/logs/second.log.json --keys KITTI/Car_3D_moderate_strict --legend PartA2 second --mode eval --interval 1
# evaluate PointPillars for car and 3 classes on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/pp-3class.log.json tools/logs/pp.log.json --keys KITTI/Car_3D_moderate_strict --legend pp-3class pp --mode eval --interval 2
```
You can also compute the average training speed.
......@@ -51,7 +51,7 @@ time std over epochs is 0.0028
average iter time: 1.1959 s/iter
```
&emsp;
&#8195;
# Visualization
......@@ -126,7 +126,7 @@ python tools/misc/browse_dataset.py configs/_base_/datasets/nus-mono3d.py --task
![](../../resources/browse_dataset_mono.png)
&emsp;
&#8195;
# Model Serving
......@@ -184,7 +184,7 @@ Example:
python tools/deployment/test_torchserver.py demo/data/kitti/kitti_000008.bin configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth second
```
&emsp;
&#8195;
# Model Complexity
......@@ -213,7 +213,7 @@ comparisons, but double check it before you adopt it in technical reports or pap
2. Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details.
3. We currently only support FLOPs calculation of single-stage models with single-modality input (point cloud or image). We will support two-stage and multi-modality models in the future.
&emsp;
&#8195;
# Model Conversion
......@@ -258,7 +258,7 @@ python tools/model_converters/publish_model.py work_dirs/faster_rcnn/latest.pth
The final output filename will be `faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth`.
&emsp;
&#8195;
# Dataset Conversion
......@@ -271,15 +271,15 @@ python -u tools/data_converter/nuimage_converter.py --data-root ${DATA_ROOT} --v
--out-dir ${OUT_DIR} --nproc ${NUM_WORKERS} --extra-tag ${TAG}
```
- `--data-root`: the root of the dataset, defaults to `./data/nuimages`.
- `--version`: the version of the dataset, defaults to `v1.0-mini`. To get the full dataset, please use `--version v1.0-train v1.0-val v1.0-mini`
- `--out-dir`: the output directory of annotations and semantic masks, defaults to `./data/nuimages/annotations/`.
- `--nproc`: number of workers for data preparation, defaults to `4`. A larger number can reduce the preparation time, as images are processed in parallel.
- `--extra-tag`: extra tag of the annotations, defaults to `nuimages`. This can be used to separate annotations processed at different times for study.
- `--data-root`: the root of the dataset, defaults to `./data/nuimages`.
- `--version`: the version of the dataset, defaults to `v1.0-mini`. To get the full dataset, please use `--version v1.0-train v1.0-val v1.0-mini`
- `--out-dir`: the output directory of annotations and semantic masks, defaults to `./data/nuimages/annotations/`.
- `--nproc`: number of workers for data preparation, defaults to `4`. A larger number can reduce the preparation time, as images are processed in parallel.
- `--extra-tag`: extra tag of the annotations, defaults to `nuimages`. This can be used to separate annotations processed at different times for study.
For more details, please refer to the [doc](https://mmdetection3d.readthedocs.io/en/latest/data_preparation.html) for dataset preparation and the [README](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/nuimages/README.md/) for the nuImages dataset.
&emsp;
&#8195;
# Miscellaneous
......
......@@ -32,6 +32,7 @@ python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [-
Currently we only support CPU inference testing for SMOKE.
Optional arguments:
- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved.
- `EVAL_METRICS`: Items to be evaluated on the results; different datasets have different legal values. Specifically, we use each dataset's official metric for evaluation by default, so for the detection task on nuScenes, Lyft, ScanNet, and SUNRGBD it can simply be set to `mAP`; for KITTI, if we only want to evaluate the 2D detection performance, the metric can be set to `img_bbox`; for Waymo, we provide both the KITTI-style (unstable) and the official Waymo-style evaluation, corresponding to `kitti` and `waymo` respectively, and we recommend the default official metric, which is stable and allows fair comparison with other methods; similarly, for the segmentation task on S3DIS and ScanNet, the metric can be set to `mIoU`.
- `--show`: If specified, detection results will be saved in silent mode for debugging and visualization. It only works with single-GPU testing and should be used together with `--show-dir`.
......@@ -180,6 +181,7 @@ export CUDA_VISIBLE_DEVICES=-1
- `--options 'Key=value'`: Override some settings in the used config.
The difference between `resume-from` and `load-from`:
- `resume-from` loads both the model weights and the optimizer state, and the epoch number is also inherited from the specified checkpoint. It is usually used to resume a training process that was interrupted accidentally.
- `load-from` only loads the model weights, and the training epoch starts from 0. It is usually used for fine-tuning.
......
......@@ -72,7 +72,6 @@ The official KITTI object detection development [toolkit](https://s3.eu-central-1.amazo
For more details about the intermediate results of Waymo dataset preprocessing, please refer to the corresponding [documentation](https://mmdetection3d.readthedocs.io/zh_CN/latest/datasets/waymo_det.html).
## Prepare a config
The second step is to prepare a config to help load and use the dataset. In addition, adjusting hyperparameters is usually necessary to obtain decent performance in 3D detection.
......