Unverified Commit 43b4632b authored by Yezhen Cong, committed by GitHub

[Doc] Add SUN RGB-D doc (#770)

* Add doc

* Creation of SUN RGB-D dataset doc and some mods on ScanNet dataset doc

* Revert mistakenly modified file

* Fix typos

* Add multi-modality related info

* Add doc

* Creation of SUN RGB-D dataset doc and some mods on ScanNet dataset doc

* Revert mistakenly modified file

* Fix typos

* Add multi-modality related info

* Add multi-modality related info

* Update according to comments

* Add Chinese doc and revised the docs

* Add some script

* Fix typos and formats

* Fix typos

* Fix typos
parent 111f33be
@@ -2,7 +2,7 @@
We follow the procedure in [votenet](https://github.com/facebookresearch/votenet/).
1. Download SUNRGBD data [HERE](http://rgbd.cs.princeton.edu/data/). Then, move SUNRGBD.zip, SUNRGBDMeta2DBB_v2.mat, SUNRGBDMeta3DBB_v2.mat and SUNRGBDtoolbox.zip to the OFFICIAL_SUNRGBD folder, unzip the zip files.
2. Enter the `matlab` folder and extract point clouds and annotations by running `extract_split.m`, `extract_rgbd_data_v2.m` and `extract_rgbd_data_v1.m`.
@@ -47,12 +47,12 @@ sunrgbd
│ ├── SUNRGBDtoolbox
├── sunrgbd_trainval
│ ├── calib
│ ├── depth
│ ├── image
│ ├── label
│ ├── label_v1
│ ├── seg_label
│ ├── train_data_idx.txt
│ ├── val_data_idx.txt
├── points
├── sunrgbd_infos_train.pkl
@@ -2,6 +2,7 @@
:maxdepth: 2
waymo_det.md
sunrgbd_det.md
scannet_det.md
scannet_sem_seg.md
s3dis_sem_seg.md
@@ -113,7 +113,7 @@ def export(mesh_file,
# bbox format is [x, y, z, dx, dy, dz, label_id]
# [x, y, z] is gravity center of bbox, [dx, dy, dz] is axis-aligned
# [label_id] is semantic label id in 'nyu40id' standard
# Note: since 3D bbox is axis-aligned, the yaw is 0.
unaligned_bboxes = extract_bbox(mesh_vertices, object_id_to_segs,
object_id_to_label_id, instance_ids)
aligned_bboxes = extract_bbox(aligned_mesh_vertices, object_id_to_segs,
@@ -221,7 +221,7 @@ scannet
├── scannet_infos_test.pkl
```
- `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsampling. Since the ScanNet 3D detection task takes axis-aligned point clouds as input while the ScanNet 3D semantic segmentation task takes unaligned points, we store the unaligned points together with their axis-align transform matrix. Note: the points will be axis-aligned by the `GlobalAlignment` step in the pre-processing pipeline of the 3D detection task (see the sketch after this list).
- `instance_mask/xxxxx.bin`: The instance label for each point, value range: [0, NUM_INSTANCES], 0: unannotated.
- `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: [1, 40], i.e. `nyu40id` standard. Note: the `nyu40id` id will be mapped to train id in train pipeline `PointSegClassMapping`.
- `posed_images/scenexxxx_xx`: The set of `.jpg` images with `.txt` 4x4 poses and the single `.txt` file with camera intrinsic matrix.
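As a rough sketch of the alignment mentioned above, the stored unaligned points can be axis-aligned by applying the saved 4x4 transform to their xyz coordinates. The snippet below is a minimal numpy illustration; the file path and the identity placeholder matrix are assumptions, and the real matrix comes from the scan's info file.
```python
import numpy as np

# Minimal illustration of the axis alignment done by `GlobalAlignment`.
# The path is illustrative; the real axis-align matrix is read from the data infos.
points = np.fromfile('scannet/points/scene0000_00.bin', dtype=np.float32).reshape(-1, 6)
axis_align_matrix = np.eye(4, dtype=np.float32)  # placeholder for the saved 4x4 matrix

xyz_hom = np.hstack([points[:, :3], np.ones((points.shape[0], 1), dtype=np.float32)])
points[:, :3] = (xyz_hom @ axis_align_matrix.T)[:, :3]  # axis-aligned xyz, colors untouched
```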
@@ -231,21 +231,21 @@ scannet
- info['pts_instance_mask_path']: The path of `instance_mask/xxxxx.bin`.
- info['pts_semantic_mask_path']: The path of `semantic_mask/xxxxx.bin`.
- info['annos']: The annotations of each scan.
- annotations['gt_num']: The number of ground truths.
- annotations['name']: The semantic name of all ground truths, e.g. `chair`.
- annotations['location']: The gravity center of the axis-aligned 3D bounding boxes. Shape: [K, 3], K is the number of ground truths.
- annotations['dimensions']: The dimensions of the axis-aligned 3D bounding boxes, i.e. (x_size, y_size, z_size), shape: [K, 3].
- annotations['gt_boxes_upright_depth']: The axis-aligned 3D bounding boxes, each bounding box is (x, y, z, x_size, y_size, z_size), shape: [K, 6].
- annotations['unaligned_location']: The gravity center of the axis-unaligned 3D bounding boxes.
- annotations['unaligned_dimensions']: The dimensions of the axis-unaligned 3D bounding boxes.
- annotations['unaligned_gt_boxes_upright_depth']: The axis-unaligned 3D bounding boxes.
- annotations['index']: The index of all ground truths, i.e. [0, K).
- annotations['class']: The train class id of the bounding boxes, value range: [0, 18), shape: [K, ].
## Training pipeline
A typical training pipeline of ScanNet for 3D detection is as follows.
```python
train_pipeline = [
@@ -291,12 +291,12 @@ train_pipeline = [
- `GlobalAlignment`: The previous point cloud would be axis-aligned using the axis-aligned matrix.
- `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like [0, 18) during training.
- Data augmentation:
- `IndoorPointSample`: downsample the input point cloud.
- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of [-5, 5] (degrees) for ScanNet; then scale the input point cloud, usually by 1.0 for ScanNet; finally translate the input point cloud, usually by 0 for ScanNet.
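For reference, these augmentation steps typically appear in a ScanNet detection config roughly as below; the concrete numbers are assumptions for illustration and should be checked against the configs shipped with mmdetection3d.
```python
# Sketch of the augmentation part of a ScanNet detection pipeline (values illustrative).
scannet_augmentation = [
    dict(type='IndoorPointSample', num_points=40000),
    dict(
        type='RandomFlip3D',
        sync_2d=False,
        flip_ratio_bev_horizontal=0.5,
        flip_ratio_bev_vertical=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.087266, 0.087266],  # roughly +/- 5 degrees in radians
        scale_ratio_range=[1.0, 1.0],
        shift_height=True),
]
```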
## Metrics
Typically mean Average Precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function that computes precision and recall for 3D object detection for multiple classes is called; please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
As introduced in the section `Export ScanNet data`, all ground truth 3D bounding boxes are axis-aligned, i.e. the yaw is zero. So the yaw target of the network-predicted 3D bounding boxes is also zero, and axis-aligned 3D non-maximum suppression (NMS) is adopted during post-processing without regard to rotation.
# SUN RGB-D for 3D Object Detection
## Dataset preparation
For the overall process, please refer to the [README](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md/) page for SUN RGB-D.
### Download SUN RGB-D data and toolbox
Download SUNRGBD data [HERE](http://rgbd.cs.princeton.edu/data/). Then, move `SUNRGBD.zip`, `SUNRGBDMeta2DBB_v2.mat`, `SUNRGBDMeta3DBB_v2.mat` and `SUNRGBDtoolbox.zip` to the `OFFICIAL_SUNRGBD` folder, unzip the zip files.
The directory structure before data preparation should be as below:
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
```
### Extract data and annotations for 3D detection from raw data
Extract SUN RGB-D annotation data from the raw annotation data by running the following commands (this requires MATLAB to be installed on your machine):
```bash
matlab -nosplash -nodesktop -r 'extract_split;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v2;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v1;quit;'
```
The main steps include:
- Extract train and val split.
- Extract data for 3D detection from raw data.
- Extract and format detection annotation from raw data.
The main component of `extract_rgbd_data_v2.m` which extracts point cloud data from depth map is as follows:
```matlab
data = SUNRGBDMeta(imageId);
data.depthpath(1:16) = '';
data.depthpath = strcat('../OFFICIAL_SUNRGBD', data.depthpath);
data.rgbpath(1:16) = '';
data.rgbpath = strcat('../OFFICIAL_SUNRGBD', data.rgbpath);
% extract point cloud from depth map
[rgb,points3d,depthInpaint,imsize]=read3dPoints(data);
rgb(isnan(points3d(:,1)),:) = [];
points3d(isnan(points3d(:,1)),:) = [];
points3d_rgb = [points3d, rgb];
% MAT files are 3x smaller than TXT files. In Python we can use
% scipy.io.loadmat('xxx.mat')['points3d_rgb'] to load the data.
mat_filename = strcat(num2str(imageId,'%06d'), '.mat');
txt_filename = strcat(num2str(imageId,'%06d'), '.txt');
% save point cloud data
parsave(strcat(depth_folder, mat_filename), points3d_rgb);
```
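As the comment above suggests, the saved `.mat` point cloud can then be loaded in Python, for example as below (the path is just an example):
```python
import scipy.io

# Load the (N, 6) xyz+rgb point cloud saved by extract_rgbd_data_v2.m.
points3d_rgb = scipy.io.loadmat('sunrgbd_trainval/depth/000001.mat')['points3d_rgb']
print(points3d_rgb.shape)  # (N, 6)
```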
The main component of `extract_rgbd_data_v1.m` which extracts annotation is as follows:
```matlab
% Write 2D and 3D box label
data2d = data;
fid = fopen(strcat(det_label_folder, txt_filename), 'w');
for j = 1:length(data.groundtruth3DBB)
centroid = data.groundtruth3DBB(j).centroid; % 3D bbox center
classname = data.groundtruth3DBB(j).classname; % class name
orientation = data.groundtruth3DBB(j).orientation; % 3D bbox orientation
coeffs = abs(data.groundtruth3DBB(j).coeffs); % 3D bbox size
box2d = data2d.groundtruth2DBB(j).gtBb2D; % 2D bbox
fprintf(fid, '%s %d %d %d %d %f %f %f %f %f %f %f %f\n', classname, box2d(1), box2d(2), box2d(3), box2d(4), centroid(1), centroid(2), centroid(3), coeffs(1), coeffs(2), coeffs(3), orientation(1), orientation(2));
end
fclose(fid);
```
The above two scripts call functions such as `read3dPoints` from the [toolbox](https://rgbd.cs.princeton.edu/data/SUNRGBDtoolbox.zip) provided by SUN RGB-D.
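Given the `fprintf` format above, each line of a generated label file holds the class name, the four 2D box values, the 3D centroid, the 3D box size coefficients and the two orientation components. Below is a minimal sketch of parsing one such line in Python; the function name and returned keys are illustrative.
```python
def parse_sunrgbd_label_line(line):
    """Parse one line written by extract_rgbd_data_v1.m (see the fprintf format above)."""
    parts = line.strip().split(' ')
    values = [float(x) for x in parts[1:]]
    return {
        'classname': parts[0],
        'box2d': values[0:4],          # 2D bbox (gtBb2D values)
        'centroid': values[4:7],       # 3D bbox center
        'coeffs': values[7:10],        # 3D bbox size
        'orientation': values[10:12],  # 3D bbox orientation (first two components)
    }
```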
The directory structure after extraction should be as follows.
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
├── sunrgbd_trainval
│ ├── calib
│ ├── depth
│ ├── image
│ ├── label
│ ├── label_v1
│ ├── seg_label
│ ├── train_data_idx.txt
│ ├── val_data_idx.txt
```
Each of the following folders contains 5285 train files and 5050 val files in total:
- `calib`: Camera calibration information in `.txt`
- `depth`: Point cloud saved in `.mat` (xyz+rgb)
- `image`: Image data in `.jpg`
- `label`: Detection annotation data in `.txt` (version 2)
- `label_v1`: Detection annotation data in `.txt` (version 1)
- `seg_label`: Segmentation annotation data in `.txt`
Currently, we use v1 data for training and testing, so the version 2 labels are unused.
### Create dataset
Please run the command below to create the dataset.
```shell
python tools/create_data.py sunrgbd --root-path ./data/sunrgbd \
--out-dir ./data/sunrgbd --extra-tag sunrgbd
```
or (if in a slurm environment)
```
bash tools/create_data.sh <job_name> sunrgbd
```
The above point cloud data are further saved in `.bin` format. Meanwhile, `.pkl` info files are also generated to save annotations and metadata. The core function `process_single_scene` used to get the data infos is as follows.
```python
def process_single_scene(sample_idx):
print(f'{self.split} sample_idx: {sample_idx}')
# convert depth to points
# and downsample the points
SAMPLE_NUM = 50000
pc_upright_depth = self.get_depth(sample_idx)
pc_upright_depth_subsampled = random_sampling(
pc_upright_depth, SAMPLE_NUM)
info = dict()
pc_info = {'num_features': 6, 'lidar_idx': sample_idx}
info['point_cloud'] = pc_info
# save point cloud data in `.bin` format
mmcv.mkdir_or_exist(osp.join(self.root_dir, 'points'))
pc_upright_depth_subsampled.tofile(
osp.join(self.root_dir, 'points', f'{sample_idx:06d}.bin'))
# save point cloud file path
info['pts_path'] = osp.join('points', f'{sample_idx:06d}.bin')
# save image file path and metainfo
img_path = osp.join('image', f'{sample_idx:06d}.jpg')
image_info = {
'image_idx': sample_idx,
'image_shape': self.get_image_shape(sample_idx),
'image_path': img_path
}
info['image'] = image_info
# save calibration information
K, Rt = self.get_calibration(sample_idx)
calib_info = {'K': K, 'Rt': Rt}
info['calib'] = calib_info
# save all annotation
if has_label:
obj_list = self.get_label_objects(sample_idx)
annotations = {}
annotations['gt_num'] = len([
obj.classname for obj in obj_list
if obj.classname in self.cat2label.keys()
])
if annotations['gt_num'] != 0:
# class name
annotations['name'] = np.array([
obj.classname for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 2D image bounding boxes
annotations['bbox'] = np.concatenate([
obj.box2d.reshape(1, 4) for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0)
# 3D bounding box center location (in depth coordinate system)
annotations['location'] = np.concatenate([
obj.centroid.reshape(1, 3) for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0)
# 3D bounding box dimension/size (in depth coordinate system)
annotations['dimensions'] = 2 * np.array([
[obj.l, obj.h, obj.w] for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 3D bounding box rotation angle/yaw angle (in depth coordinate system)
annotations['rotation_y'] = np.array([
obj.heading_angle for obj in obj_list
if obj.classname in self.cat2label.keys()
])
annotations['index'] = np.arange(
len(obj_list), dtype=np.int32)
# class label (number)
annotations['class'] = np.array([
self.cat2label[obj.classname] for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 3D bounding box (in depth coordinate system)
annotations['gt_boxes_upright_depth'] = np.stack(
[
obj.box3d for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0) # (K,8)
info['annos'] = annotations
return info
```
The directory structure after processing should be as follows.
```
sunrgbd
├── README.md
├── matlab
│ ├── ...
├── OFFICIAL_SUNRGBD
│ ├── ...
├── sunrgbd_trainval
│ ├── ...
├── points
├── sunrgbd_infos_train.pkl
├── sunrgbd_infos_val.pkl
```
- `points/0xxxxx.bin`: The point cloud data after downsampling.
- `sunrgbd_infos_train.pkl`: The train data infos, the detailed info of each scene is as follows:
- info['point_cloud']: `{'num_features': 6, 'lidar_idx': sample_idx}`, where `sample_idx` is the index of the scene.
- info['pts_path']: The path of `points/0xxxxx.bin`.
- info['image']: The image path and metainfo:
- image['image_idx']: The index of the image.
- image['image_shape']: The shape of the image tensor.
- image['image_path']: The path of the image.
- info['annos']: The annotations of each scene.
- annotations['gt_num']: The number of ground truths.
- annotations['name']: The semantic name of all ground truths, e.g. `chair`.
- annotations['location']: The gravity center of the 3D bounding boxes in depth coordinate system. Shape: [K, 3], K is the number of ground truths.
- annotations['dimensions']: The dimensions of the 3D bounding boxes in depth coordinate system, i.e. `(x_size, y_size, z_size)`, shape: [K, 3].
- annotations['rotation_y']: The yaw angle of the 3D bounding boxes in depth coordinate system. Shape: [K, ].
- annotations['gt_boxes_upright_depth']: The 3D bounding boxes in depth coordinate system, each bounding box is `(x, y, z, x_size, y_size, z_size, yaw)`, shape: [K, 7].
- annotations['bbox']: The 2D bounding boxes, each bounding box is `(x, y, x_size, y_size)`, shape: [K, 4].
- annotations['index']: The index of all ground truths, range [0, K).
- annotations['class']: The train class id of the bounding boxes, value range: [0, 10), shape: [K, ].
- `sunrgbd_infos_val.pkl`: The val data infos, which shares the same format as `sunrgbd_infos_train.pkl`.
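A quick way to inspect these info files is to load them with `mmcv.load`, as in the minimal sketch below (the path is illustrative):
```python
import mmcv

infos = mmcv.load('data/sunrgbd/sunrgbd_infos_train.pkl')
info = infos[0]  # info dict of the first scene
print(info['pts_path'])                               # e.g. points/000001.bin
print(info['image']['image_path'])                    # e.g. image/000001.jpg
print(info['annos']['gt_boxes_upright_depth'].shape)  # (K, 7)
```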
## Train pipeline
A typical train pipeline of SUN RGB-D for point cloud only 3D detection is as follows.
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadAnnotations3D'),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(type='IndoorPointSample', num_points=20000),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
Data augmentation for point clouds:
- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of [-30, 30] (degrees) for SUN RGB-D; then scale the input point cloud, usually in the range of [0.85, 1.15] for SUN RGB-D; finally translate the input point cloud, usually by 0 for SUN RGB-D.
- `IndoorPointSample`: downsample the input point cloud.
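Conceptually, `GlobalRotScaleTrans` rotates the whole point cloud about the z-axis, rescales it and (optionally) translates it. A simplified numpy illustration of this idea, not the mmdet3d implementation, is:
```python
import numpy as np

def global_rot_scale_trans(points, rot_range=(-0.523599, 0.523599),
                           scale_range=(0.85, 1.15), trans_std=0.0):
    """Simplified rotate/scale/translate of an (N, >=3) point cloud (illustration only)."""
    angle = np.random.uniform(*rot_range)
    scale = np.random.uniform(*scale_range)
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    rot_mat = np.array([[cos_a, -sin_a, 0.0],
                        [sin_a,  cos_a, 0.0],
                        [0.0,    0.0,   1.0]], dtype=points.dtype)
    out = points.copy()
    out[:, :3] = out[:, :3] @ rot_mat.T * scale
    if trans_std > 0:
        out[:, :3] += np.random.normal(scale=trans_std, size=(1, 3)).astype(points.dtype)
    return out
```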
A typical train pipeline of SUN RGB-D for multi-modality (point cloud and image) 3D detection is as follows.
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations3D'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='Resize', img_scale=(1333, 600), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.0),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(type='IndoorPointSample', num_points=20000),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(
type='Collect3D',
keys=[
'img', 'gt_bboxes', 'gt_labels', 'points', 'gt_bboxes_3d',
'gt_labels_3d'
])
]
```
Data augmentation/normalization for images:
- `Resize`: resize the input image; `keep_ratio=True` means the aspect ratio of the image is kept unchanged.
- `Normalize`: normalize the RGB channels of the input image.
- `RandomFlip`: randomly flip the input image.
- `Pad`: pad the input image with zeros by default.
The image augmentation and normalization functions are implemented in [MMDetection](https://github.com/open-mmlab/mmdetection/tree/master/mmdet/datasets/pipelines).
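As a rough illustration of what `Normalize` and `Pad` do to an image array, consider the numpy sketch below; the mean/std values are assumptions and should be taken from the actual `img_norm_cfg` of the config.
```python
import numpy as np

# Assumed normalization values; check img_norm_cfg in the actual config.
mean = np.array([103.53, 116.28, 123.675], dtype=np.float32)
std = np.array([1.0, 1.0, 1.0], dtype=np.float32)

def normalize_and_pad(img, size_divisor=32):
    """Normalize the channels and zero-pad H/W up to a multiple of `size_divisor`."""
    img = (img.astype(np.float32) - mean) / std
    h, w = img.shape[:2]
    pad_h = int(np.ceil(h / size_divisor)) * size_divisor
    pad_w = int(np.ceil(w / size_divisor)) * size_divisor
    padded = np.zeros((pad_h, pad_w, img.shape[2]), dtype=img.dtype)
    padded[:h, :w] = img
    return padded
```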
## Metrics
Same as ScanNet, mean Average Precision (mAP) is typically used for evaluation on SUN RGB-D, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function that computes precision and recall for 3D object detection for multiple classes is called; please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
Since SUN RGB-D also contains image data, detection on images is feasible as well. For instance, in ImVoteNet, we first train an image detector, and we also use mAP for its evaluation, e.g. `mAP@0.5`. We use the `eval_map` function from [MMDetection](https://github.com/open-mmlab/mmdetection) to calculate mAP.
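For the 2D detector, a minimal sketch of calling `eval_map` looks roughly like this (the toy boxes below are made up; in practice the detection results and annotations come from the dataset and the trained model):
```python
import numpy as np
from mmdet.core import eval_map

# One image, one class: detections are (n, 5) arrays of [x1, y1, x2, y2, score];
# annotations are dicts with 'bboxes' of shape (k, 4) and 'labels' of shape (k,).
det_results = [[np.array([[10.0, 10.0, 50.0, 50.0, 0.9]])]]
annotations = [dict(bboxes=np.array([[12.0, 12.0, 48.0, 52.0]]),
                    labels=np.array([0]))]

mean_ap, _ = eval_map(det_results, annotations, iou_thr=0.5)
print(mean_ap)
```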
@@ -2,6 +2,7 @@
:maxdepth: 2
waymo_det.md
sunrgbd_det.md
scannet_det.md
scannet_sem_seg.md
s3dis_sem_seg.md
# ScanNet for 3D Object Detection
# SUN RGB-D for 3D Object Detection
## Dataset preparation
For the overall process, please refer to the [README](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md/) page for SUN RGB-D.
### Download SUN RGB-D data and toolbox
Download SUNRGBD data [HERE](http://rgbd.cs.princeton.edu/data/). Then, move `SUNRGBD.zip`, `SUNRGBDMeta2DBB_v2.mat`, `SUNRGBDMeta3DBB_v2.mat` and `SUNRGBDtoolbox.zip` to the `OFFICIAL_SUNRGBD` folder and unzip the zip files.
The directory structure before data preparation should be as below:
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
```
### Extract data and annotations for 3D detection from raw data
Extract SUN RGB-D annotation data from the raw annotation data by running the following commands (this requires MATLAB to be installed on your machine):
```bash
matlab -nosplash -nodesktop -r 'extract_split;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v2;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v1;quit;'
```
The main steps include:
- Extract the train and val split files.
- Extract data for 3D detection from the raw data.
- Extract and format detection annotations from the raw data.
The main component of `extract_rgbd_data_v2.m`, which extracts point cloud data from the depth maps, is as follows:
```matlab
data = SUNRGBDMeta(imageId);
data.depthpath(1:16) = '';
data.depthpath = strcat('../OFFICIAL_SUNRGBD', data.depthpath);
data.rgbpath(1:16) = '';
data.rgbpath = strcat('../OFFICIAL_SUNRGBD', data.rgbpath);
% extract point cloud from the depth map
[rgb,points3d,depthInpaint,imsize]=read3dPoints(data);
rgb(isnan(points3d(:,1)),:) = [];
points3d(isnan(points3d(:,1)),:) = [];
points3d_rgb = [points3d, rgb];
% MAT files are 3x smaller than TXT files. In Python we can use
% scipy.io.loadmat('xxx.mat')['points3d_rgb'] to load the data.
mat_filename = strcat(num2str(imageId,'%06d'), '.mat');
txt_filename = strcat(num2str(imageId,'%06d'), '.txt');
% save point cloud data
parsave(strcat(depth_folder, mat_filename), points3d_rgb);
```
The main component of `extract_rgbd_data_v1.m`, which extracts and formats the annotations, is as follows:
```matlab
% Write 2D and 3D box labels
data2d = data;
fid = fopen(strcat(det_label_folder, txt_filename), 'w');
for j = 1:length(data.groundtruth3DBB)
centroid = data.groundtruth3DBB(j).centroid; % 3D bbox center
classname = data.groundtruth3DBB(j).classname; % class name
orientation = data.groundtruth3DBB(j).orientation; % 3D bbox orientation
coeffs = abs(data.groundtruth3DBB(j).coeffs); % 3D bbox size
box2d = data2d.groundtruth2DBB(j).gtBb2D; % 2D bbox
fprintf(fid, '%s %d %d %d %d %f %f %f %f %f %f %f %f\n', classname, box2d(1), box2d(2), box2d(3), box2d(4), centroid(1), centroid(2), centroid(3), coeffs(1), coeffs(2), coeffs(3), orientation(1), orientation(2));
end
fclose(fid);
```
The above two scripts call functions such as `read3dPoints` from the [toolbox](https://rgbd.cs.princeton.edu/data/SUNRGBDtoolbox.zip) provided by SUN RGB-D.
After extracting the data with the above scripts, the directory structure should be as follows:
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
├── sunrgbd_trainval
│ ├── calib
│ ├── depth
│ ├── image
│ ├── label
│ ├── label_v1
│ ├── seg_label
│ ├── train_data_idx.txt
│ ├── val_data_idx.txt
```
Each of the following folders contains 5285 train files and 5050 val files in total:
- `calib`: Camera calibration information in `.txt` files.
- `depth`: Point clouds saved in `.mat` files (xyz + rgb).
- `image`: Image data in `.jpg` files.
- `label`: Detection annotation data in `.txt` files (version 2).
- `label_v1`: Detection annotation data in `.txt` files (version 1).
- `seg_label`: Segmentation annotation data in `.txt` files.
Currently, we use the v1 data for training and testing, so the version 2 labels are unused.
### Create dataset
Please run the commands below to create the dataset:
```shell
python tools/create_data.py sunrgbd --root-path ./data/sunrgbd \
--out-dir ./data/sunrgbd --extra-tag sunrgbd
```
Alternatively, if in a slurm environment, the following command can be used instead:
```
bash tools/create_data.sh <job_name> sunrgbd
```
The point cloud data mentioned above are then processed and saved in `.bin` format. Meanwhile, `.pkl` info files are generated to store the annotations and metadata. The core function `process_single_scene` used to generate these info files is as follows:
```python
def process_single_scene(sample_idx):
print(f'{self.split} sample_idx: {sample_idx}')
# convert the depth map to a point cloud and downsample the points
SAMPLE_NUM = 50000
pc_upright_depth = self.get_depth(sample_idx)
pc_upright_depth_subsampled = random_sampling(
pc_upright_depth, SAMPLE_NUM)
info = dict()
pc_info = {'num_features': 6, 'lidar_idx': sample_idx}
info['point_cloud'] = pc_info
# save point cloud data in `.bin` format
mmcv.mkdir_or_exist(osp.join(self.root_dir, 'points'))
pc_upright_depth_subsampled.tofile(
osp.join(self.root_dir, 'points', f'{sample_idx:06d}.bin'))
# save point cloud file path
info['pts_path'] = osp.join('points', f'{sample_idx:06d}.bin')
# save image file path and metainfo
img_path = osp.join('image', f'{sample_idx:06d}.jpg')
image_info = {
'image_idx': sample_idx,
'image_shape': self.get_image_shape(sample_idx),
'image_path': img_path
}
info['image'] = image_info
# save calibration information
K, Rt = self.get_calibration(sample_idx)
calib_info = {'K': K, 'Rt': Rt}
info['calib'] = calib_info
# save all annotations
if has_label:
obj_list = self.get_label_objects(sample_idx)
annotations = {}
annotations['gt_num'] = len([
obj.classname for obj in obj_list
if obj.classname in self.cat2label.keys()
])
if annotations['gt_num'] != 0:
# class name
annotations['name'] = np.array([
obj.classname for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 2D image bounding boxes
annotations['bbox'] = np.concatenate([
obj.box2d.reshape(1, 4) for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0)
# 3D bounding box center location (in depth coordinate system)
annotations['location'] = np.concatenate([
obj.centroid.reshape(1, 3) for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0)
# 3D bounding box dimension/size (in depth coordinate system)
annotations['dimensions'] = 2 * np.array([
[obj.l, obj.h, obj.w] for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 3D bounding box rotation angle/yaw angle (in depth coordinate system)
annotations['rotation_y'] = np.array([
obj.heading_angle for obj in obj_list
if obj.classname in self.cat2label.keys()
])
annotations['index'] = np.arange(
len(obj_list), dtype=np.int32)
# class label (number)
annotations['class'] = np.array([
self.cat2label[obj.classname] for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 3D bounding box (in depth coordinate system)
annotations['gt_boxes_upright_depth'] = np.stack(
[
obj.box3d for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0) # (K,8)
info['annos'] = annotations
return info
```
The directory structure after processing should be as follows:
```
sunrgbd
├── README.md
├── matlab
│ ├── ...
├── OFFICIAL_SUNRGBD
│ ├── ...
├── sunrgbd_trainval
│ ├── ...
├── points
├── sunrgbd_infos_train.pkl
├── sunrgbd_infos_val.pkl
```
- `points/0xxxxx.bin`: The point cloud data after downsampling.
- `sunrgbd_infos_train.pkl`: The train data infos (annotations and metadata); the detailed info of each scene is as follows:
- info['point_cloud']: `{'num_features': 6, 'lidar_idx': sample_idx}`, where `sample_idx` is the index of the scene.
- info['pts_path']: The path of `points/0xxxxx.bin`.
- info['image']: The image path and metainfo:
- image['image_idx']: The index of the image.
- image['image_shape']: The shape of the image tensor (i.e. its size).
- image['image_path']: The path of the image.
- info['annos']: The annotations of each scene:
- annotations['gt_num']: The number of ground truths.
- annotations['name']: The semantic names of all ground truths, e.g. `chair`.
- annotations['location']: The gravity center of the 3D bounding boxes in depth coordinate system. Shape: [K, 3], K is the number of ground truths.
- annotations['dimensions']: The dimensions of the 3D bounding boxes in depth coordinate system, shape: [K, 3].
- annotations['rotation_y']: The yaw angle of the 3D bounding boxes in depth coordinate system. Shape: [K, ].
- annotations['gt_boxes_upright_depth']: The 3D bounding boxes in depth coordinate system, each bounding box is `(x, y, z, x_size, y_size, z_size, yaw)`, shape: [K, 7].
- annotations['bbox']: The 2D bounding boxes, each bounding box is `(x, y, x_size, y_size)`, shape: [K, 4].
- annotations['index']: The index of all ground truths, range [0, K).
- annotations['class']: The train class ids of the bounding boxes, value range: [0, 10), shape: [K, ].
- `sunrgbd_infos_val.pkl`: The val data infos, which shares the same format as `sunrgbd_infos_train.pkl`.
## Train pipeline
A typical train pipeline of SUN RGB-D for point cloud only 3D detection is as follows:
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadAnnotations3D'),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(type='IndoorPointSample', num_points=20000),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
Data augmentation for point clouds:
- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of [-30, 30] (degrees) for SUN RGB-D; then scale the input point cloud, usually in the range of [0.85, 1.15] for SUN RGB-D; finally translate the input point cloud, usually by 0 for SUN RGB-D.
- `IndoorPointSample`: downsample the input point cloud.
A typical train pipeline of SUN RGB-D for multi-modality (point cloud and image) 3D detection is as follows:
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations3D'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='Resize', img_scale=(1333, 600), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.0),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(type='IndoorPointSample', num_points=20000),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(
type='Collect3D',
keys=[
'img', 'gt_bboxes', 'gt_labels', 'points', 'gt_bboxes_3d',
'gt_labels_3d'
])
]
```
Data augmentation/normalization for images:
- `Resize`: resize the input image; `keep_ratio=True` means the aspect ratio of the image is kept unchanged.
- `Normalize`: normalize the RGB channels of the input image.
- `RandomFlip`: randomly flip the input image.
- `Pad`: pad the input image, with zeros by default.
The image augmentation and normalization functions are implemented in [MMDetection](https://github.com/open-mmlab/mmdetection/tree/master/mmdet/datasets/pipelines).
## Metrics
Same as ScanNet, mean Average Precision (mAP) is typically used for evaluation on SUN RGB-D, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function that computes precision and recall for 3D object detection for multiple classes is called; please refer to [indoor_eval.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
Since SUN RGB-D also contains image data, detection on images is feasible as well. For instance, in ImVoteNet, we first train an image detector, and we also use mAP, e.g. `mAP@0.5`, to evaluate its performance. We use the `eval_map` function from [MMDetection](https://github.com/open-mmlab/mmdetection) to calculate mAP.
@@ -191,9 +191,9 @@ class SUNRGBDData(object):
],
axis=0)
annotations['dimensions'] = 2 * np.array([
[obj.l, obj.w, obj.h] for obj in obj_list
if obj.classname in self.cat2label.keys()
]) # lwh (depth) format
annotations['rotation_y'] = np.array([
obj.heading_angle for obj in obj_list
if obj.classname in self.cat2label.keys()