# ScanNet for 3D Object Detection
## Dataset preparation
For the overall process, please refer to the [README](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md/) page for ScanNet.
### Export ScanNet data
By exporting ScanNet data, we load the raw point cloud data and generate the relevant annotations, including semantic labels, instance labels and ground truth bounding boxes.
```shell
python batch_load_scannet_data.py
```
The directory structure before data preparation should be as below:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── scannet
│   │   ├── meta_data
│   │   ├── scans
│   │   │   ├── scenexxxx_xx
│   │   ├── batch_load_scannet_data.py
│   │   ├── load_scannet_data.py
│   │   ├── scannet_utils.py
│   │   ├── README.md
```
Under the folder `scans` there are in total 1201 train and 312 validation folders, in which the raw point cloud data and relevant annotations are saved. For instance, the files under the folder `scene0001_01` are as below:
- `scene0001_01_vh_clean_2.ply`: Mesh file storing the raw point cloud data.
- `scene0001_01.aggregation.json`: Aggregation file including object ids, segment ids and labels.
- `scene0001_01_vh_clean_2.0.010000.segs.json`: Segmentation file including segment ids and vertices.
- `scene0001_01.txt`: Meta file including the axis-aligned matrix, etc.
- `scene0001_01_vh_clean_2.labels.ply`: Mesh file with semantic labels annotated on each vertex.
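For reference, a quick way to peek at the aggregation file of one scan (a minimal sketch with an illustrative path; the keys follow the ScanNet annotation format):
```python
import json

# illustrative path to one scan's aggregation file
with open('scans/scene0001_01/scene0001_01.aggregation.json') as f:
    agg = json.load(f)
# each segGroup is one object with a semantic label and its segment ids
for group in agg['segGroups'][:3]:
    print(group['objectId'], group['label'], len(group['segments']))
```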
Export ScanNet data by running `python batch_load_scannet_data.py`. The main steps include:
- Export original files to point cloud, instance label, semantic label and bounding box files.
- Downsample the raw point cloud and filter invalid classes.
- Save the point cloud data and relevant annotation files.

The core function `export` in `load_scannet_data.py` is as follows:
```python
def export(mesh_file,
           agg_file,
           seg_file,
           meta_file,
           label_map_file,
           output_file=None,
           test_mode=False):
    # label map file: ./data/scannet/meta_data/scannetv2-labels.combined.tsv
    # the various label standards in the label map file, e.g. 'nyu40id'
    label_map = scannet_utils.read_label_mapping(
        label_map_file, label_from='raw_category', label_to='nyu40id')
    # load raw point cloud data, 6-dim features: XYZRGB
    mesh_vertices = scannet_utils.read_mesh_vertices_rgb(mesh_file)
    # Load scene axis alignment matrix: a 4x4 transformation matrix
    # transforming raw points in the sensor coordinate system to a coordinate
    # system which is axis-aligned with the length/width of the room
    lines = open(meta_file).readlines()
    # test set data doesn't have align_matrix
    axis_align_matrix = np.eye(4)
    for line in lines:
        if 'axisAlignment' in line:
            axis_align_matrix = [
                float(x)
                for x in line.rstrip().strip('axisAlignment = ').split(' ')
            ]
            break
    axis_align_matrix = np.array(axis_align_matrix).reshape((4, 4))
    # perform global alignment of mesh vertices
    pts = np.ones((mesh_vertices.shape[0], 4))
    # raw point cloud in homogeneous coordinates, each row: [x, y, z, 1]
    pts[:, 0:3] = mesh_vertices[:, 0:3]
    # transform raw mesh vertices to aligned mesh vertices
    pts = np.dot(pts, axis_align_matrix.transpose())  # Nx4
    aligned_mesh_vertices = np.concatenate([pts[:, 0:3], mesh_vertices[:, 3:]],
                                           axis=1)
    # Load semantic and instance labels
    if not test_mode:
        # each object has one semantic label and consists of several segments
        object_id_to_segs, label_to_segs = read_aggregation(agg_file)
        # many points may belong to the same segment
        seg_to_verts, num_verts = read_segmentation(seg_file)
        label_ids = np.zeros(shape=(num_verts), dtype=np.uint32)
        object_id_to_label_id = {}
        for label, segs in label_to_segs.items():
            label_id = label_map[label]
            for seg in segs:
                verts = seg_to_verts[seg]
                # each point has one semantic label
                label_ids[verts] = label_id
        instance_ids = np.zeros(
            shape=(num_verts), dtype=np.uint32)  # 0: unannotated
        for object_id, segs in object_id_to_segs.items():
            for seg in segs:
                verts = seg_to_verts[seg]
                # object_id is 1-indexed, i.e. 1, 2, 3, ..., NUM_INSTANCES
                # each point belongs to one object
                instance_ids[verts] = object_id
                if object_id not in object_id_to_label_id:
                    object_id_to_label_id[object_id] = label_ids[verts][0]
        # bbox format is [x, y, z, dx, dy, dz, label_id]
        # [x, y, z] is the gravity center of the bbox,
        # [dx, dy, dz] is the axis-aligned size,
        # label_id is the semantic label id in the 'nyu40id' standard
        # Note: since the 3d bbox is axis-aligned, the yaw is 0.
        unaligned_bboxes = extract_bbox(mesh_vertices, object_id_to_segs,
                                        object_id_to_label_id, instance_ids)
        aligned_bboxes = extract_bbox(aligned_mesh_vertices, object_id_to_segs,
                                      object_id_to_label_id, instance_ids)
    ...
    return mesh_vertices, label_ids, instance_ids, unaligned_bboxes, \
        aligned_bboxes, object_id_to_label_id, axis_align_matrix
```
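A brief usage sketch of `export` on a single scan, with illustrative paths (the real entry point `batch_load_scannet_data.py` wires these paths up for every scan):
```python
# hypothetical paths for a single scan, for illustration only
scan = 'scans/scene0001_01/scene0001_01'
mesh_vertices, label_ids, instance_ids, unaligned_bboxes, aligned_bboxes, \
    object_id_to_label_id, axis_align_matrix = export(
        mesh_file=f'{scan}_vh_clean_2.ply',
        agg_file=f'{scan}.aggregation.json',
        seg_file=f'{scan}_vh_clean_2.0.010000.segs.json',
        meta_file=f'{scan}.txt',
        label_map_file='meta_data/scannetv2-labels.combined.tsv')
```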
After exporting each scan, the raw point cloud is downsampled if its number of points is too large (e.g. to 50000 points). In addition, invalid semantic labels outside of the `nyu40id` standard or the optional `DONOT CARE` classes are filtered. Finally, the point cloud data, semantic labels, instance labels and ground truth bounding boxes are saved as `.npy` files.
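A minimal sketch of this downsampling and filtering step, assuming the valid class ids and the 50000-point budget used by `batch_load_scannet_data.py`:
```python
import numpy as np

# valid `nyu40id` object classes kept after export
OBJ_CLASS_IDS = np.array(
    [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, 36, 39])


def downsample_and_filter(points, sem_labels, ins_labels, bboxes,
                          num_points=50000):
    # randomly downsample the point cloud if it exceeds the point budget
    if points.shape[0] > num_points:
        choices = np.random.choice(
            points.shape[0], num_points, replace=False)
        points = points[choices]
        sem_labels = sem_labels[choices]
        ins_labels = ins_labels[choices]
    # keep only boxes whose semantic label (last column) is a valid class
    bbox_mask = np.isin(bboxes[:, -1], OBJ_CLASS_IDS)
    return points, sem_labels, ins_labels, bboxes[bbox_mask]
```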
### Create dataset
```shell
python tools/create_data.py scannet --root-path ./data/scannet \
--out-dir ./data/scannet --extra-tag scannet
```
The exported point cloud files, semantic label files and instance label files are further saved in `.bin` format. Meanwhile, `.pkl` info files are also generated for training and validation. The core function `process_single_scene` for getting data infos is as follows.
```python
def process_single_scene(sample_idx):
    # save the point cloud, instance label and semantic label as .bin files,
    # and record info['pts_path'], info['pts_instance_mask_path'] and
    # info['pts_semantic_mask_path']
    ...
    # get annotations
    if has_label:
        annotations = {}
        # box is of shape [k, 6 + class]
        aligned_box_label = self.get_aligned_box_label(sample_idx)
        unaligned_box_label = self.get_unaligned_box_label(sample_idx)
        annotations['gt_num'] = aligned_box_label.shape[0]
        if annotations['gt_num'] != 0:
            aligned_box = aligned_box_label[:, :-1]  # k, 6
            unaligned_box = unaligned_box_label[:, :-1]
            classes = aligned_box_label[:, -1]  # k
            annotations['name'] = np.array([
                self.label2cat[self.cat_ids2class[classes[i]]]
                for i in range(annotations['gt_num'])
            ])
            # default names are given to the aligned bbox for compatibility;
            # we also save the unaligned bbox info with marked names
            annotations['location'] = aligned_box[:, :3]
            annotations['dimensions'] = aligned_box[:, 3:6]
            annotations['gt_boxes_upright_depth'] = aligned_box
            annotations['unaligned_location'] = unaligned_box[:, :3]
            annotations['unaligned_dimensions'] = unaligned_box[:, 3:6]
            annotations[
                'unaligned_gt_boxes_upright_depth'] = unaligned_box
            annotations['index'] = np.arange(
                annotations['gt_num'], dtype=np.int32)
            annotations['class'] = np.array([
                self.cat_ids2class[classes[i]]
                for i in range(annotations['gt_num'])
            ])
        axis_align_matrix = self.get_axis_align_matrix(sample_idx)
        annotations['axis_align_matrix'] = axis_align_matrix  # 4x4
        info['annos'] = annotations
    return info
```
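The generated info files can be inspected directly; a quick check, assuming `mmcv` is available (plain `pickle` works too):
```python
import mmcv

infos = mmcv.load('data/scannet/scannet_infos_train.pkl')
print(len(infos))  # number of training scans, expected 1201
print(infos[0]['annos'].keys())  # annotation fields of the first scan
```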
The directory structure after processing should be as below:
```
scannet
├── batch_load_scannet_data.py
├── load_scannet_data.py
├── scannet_utils.py
├── README.md
├── scans
├── scans_test
├── scannet_instance_data
├── points
│   ├── xxxxx.bin
├── instance_mask
│   ├── xxxxx.bin
├── semantic_mask
│   ├── xxxxx.bin
├── seg_info
│   ├── train_label_weight.npy
│   ├── train_resampled_scene_idxs.npy
│   ├── val_label_weight.npy
│   ├── val_resampled_scene_idxs.npy
├── scannet_infos_train.pkl
├── scannet_infos_val.pkl
├── scannet_infos_test.pkl
```
- `points/xxxxx.bin`: The axis-unaligned point cloud data after downsampling. Note: the points will be axis-aligned by the `GlobalAlignment` transform in the 3d detection pre-processing pipeline.
- `instance_mask/xxxxx.bin`: The instance label of each point, value range: [0, NUM_INSTANCES], where 0 means unannotated.
- `semantic_mask/xxxxx.bin`: The semantic label of each point, value range: [1, 40], i.e. the `nyu40id` standard. Note: the `nyu40id` ids will be mapped to train ids by `PointSegClassMapping` in the train pipeline.
- `scannet_infos_train.pkl`: The train data infos; the detailed info of each scan is as follows:
    - info['point_cloud']: {'num_features': 6, 'lidar_idx': sample_idx}.
    - info['pts_path']: The path of `points/xxxxx.bin`.
    - info['pts_instance_mask_path']: The path of `instance_mask/xxxxx.bin`.
    - info['pts_semantic_mask_path']: The path of `semantic_mask/xxxxx.bin`.
    - info['annos']: The annotations of each scan.
        - annotations['gt_num']: The number of ground truths.
        - annotations['name']: The semantic names of all ground truths, e.g. `chair`.
        - annotations['location']: The gravity centers of the axis-aligned 3d bounding boxes. Shape: [K, 3], where K is the number of ground truths.
        - annotations['dimensions']: The dimensions of the axis-aligned 3d bounding boxes, i.e. (x_size, y_size, z_size). Shape: [K, 3].
        - annotations['gt_boxes_upright_depth']: The axis-aligned 3d bounding boxes, each of which is (x, y, z, x_size, y_size, z_size). Shape: [K, 6].
        - annotations['unaligned_location']: The gravity centers of the axis-unaligned 3d bounding boxes.
        - annotations['unaligned_dimensions']: The dimensions of the axis-unaligned 3d bounding boxes.
        - annotations['unaligned_gt_boxes_upright_depth']: The axis-unaligned 3d bounding boxes.
        - annotations['index']: The indices of all ground truths, i.e. [0, K).
        - annotations['class']: The train class ids of the bounding boxes, value range: [0, 18). Shape: [K, ].
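A minimal sketch of reading these files back, assuming the dtypes used by mmdetection3d's loading transforms (`xxxxx` stands for a real scan index):
```python
import numpy as np

# points are saved as float32 with 6 values (XYZRGB) per point
points = np.fromfile('data/scannet/points/xxxxx.bin',
                     dtype=np.float32).reshape(-1, 6)
# both masks store one int64 label per point
instance_mask = np.fromfile('data/scannet/instance_mask/xxxxx.bin',
                            dtype=np.int64)
semantic_mask = np.fromfile('data/scannet/semantic_mask/xxxxx.bin',
                            dtype=np.int64)
```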
## Train pipeline
A typical train pipeline of ScanNet for 3d detection is as below.
```python
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='DEPTH',
        shift_height=True,
        load_dim=6,
        use_dim=[0, 1, 2]),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        with_mask_3d=True,
        with_seg_3d=True),
    dict(type='GlobalAlignment', rotation_axis=2),
    dict(
        type='PointSegClassMapping',
        valid_cat_ids=(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33,
                       34, 36, 39),
        max_cat_id=40),
    dict(type='IndoorPointSample', num_points=40000),
    dict(
        type='RandomFlip3D',
        sync_2d=False,
        flip_ratio_bev_horizontal=0.5,
        flip_ratio_bev_vertical=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.087266, 0.087266],
        scale_ratio_range=[1.0, 1.0],
        shift_height=True),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'points', 'gt_bboxes_3d', 'gt_labels_3d', 'pts_semantic_mask',
            'pts_instance_mask'
        ])
]
```
- `GlobalAlignment`: The input point cloud is axis-aligned using the axis-align matrix.
- `PointSegClassMapping`: Only the valid category ids are mapped to train class label ids in the range [0, 18); see the sketch after this list.
- Data augmentation:
    - `IndoorPointSample`: downsample the input point cloud.
    - `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
    - `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range [-5, 5] degrees.
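A minimal sketch of the id mapping performed by `PointSegClassMapping`, assuming the `valid_cat_ids` and `max_cat_id` from the pipeline above:
```python
import numpy as np

valid_cat_ids = (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34,
                 36, 39)
max_cat_id = 40

# ids outside valid_cat_ids are mapped to len(valid_cat_ids) = 18 (ignored)
cat_id2class = np.full(max_cat_id + 1, len(valid_cat_ids), dtype=np.int64)
for class_id, cat_id in enumerate(valid_cat_ids):
    cat_id2class[cat_id] = class_id

pts_semantic_mask = np.array([1, 3, 4, 39, 40])  # `nyu40id` labels
print(cat_id2class[pts_semantic_mask])  # -> [18  0  1 17 18]
```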
## Metrics
Typically mean average precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function computing precision and recall for 3d object detection of multiple classes is called; please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
As introduced in the section `Export ScanNet data`, all ground truth 3d bounding boxes are axis-aligned, i.e. the yaw is zero. So the yaw target of the network-predicted 3d bounding boxes is also zero, and axis-aligned 3d non-maximum suppression (NMS) is adopted during post-processing without regard to rotation.
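For intuition, since the yaw is zero the overlap measure underlying `mAP@0.25` / `mAP@0.5` reduces to an axis-aligned 3d IoU; a minimal sketch (not the exact `indoor_eval` implementation), assuming boxes in (x, y, z, dx, dy, dz) format with (x, y, z) the gravity center:
```python
import numpy as np


def axis_aligned_3d_iou(box1, box2):
    # corner coordinates of both boxes
    min1, max1 = box1[:3] - box1[3:] / 2, box1[:3] + box1[3:] / 2
    min2, max2 = box2[:3] - box2[3:] / 2, box2[:3] + box2[3:] / 2
    # intersection volume, clipped at zero when the boxes don't overlap
    inter = np.prod(
        np.clip(np.minimum(max1, max2) - np.maximum(min1, min2), 0, None))
    union = np.prod(box1[3:]) + np.prod(box2[3:]) - inter
    return inter / union


# a prediction matches a ground truth at mAP@0.25 if IoU >= 0.25
print(axis_aligned_3d_iou(np.array([0., 0., 0., 2., 2., 2.]),
                          np.array([0.5, 0., 0., 2., 2., 2.])))  # 0.6
```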