Unverified Commit 43b4632b authored by Yezhen Cong, committed by GitHub

[Doc] Add SUN RGB-D doc (#770)

* Add doc

* Creation of SUN RGB-D dataset doc and some mods on ScanNet dataset doc

* Revert mistakenly modified file

* Fix typos

* Add multi-modality related info

* Add doc

* Creation of SUN RGB-D dataset doc and some mods on ScanNet dataset doc

* Revert mistakenly modified file

* Fix typos

* Add multi-modality related info

* Add multi-modality related info

* Update according to comments

* Add Chinese doc and revised the docs

* Add some script

* Fix typos and formats

* Fix typos

* Fix typos
parent 111f33be
@@ -2,7 +2,7 @@
We follow the procedure in [votenet](https://github.com/facebookresearch/votenet/).
1. Download SUNRGBD data [HERE](http://rgbd.cs.princeton.edu/data/). Then, move SUNRGBD.zip, SUNRGBDMeta2DBB_v2.mat, SUNRGBDMeta3DBB_v2.mat and SUNRGBDtoolbox.zip to the OFFICIAL_SUNRGBD folder, unzip the zip files.
2. Enter the `matlab` folder and extract point clouds and annotations by running `extract_split.m`, `extract_rgbd_data_v2.m` and `extract_rgbd_data_v1.m`.
@@ -47,12 +47,12 @@ sunrgbd
│ ├── SUNRGBDtoolbox
├── sunrgbd_trainval
│ ├── calib
│ ├── depth
│ ├── image
│ ├── label
│ ├── label_v1
│ ├── seg_label
│ ├── train_data_idx.txt
│ ├── val_data_idx.txt
├── points
├── sunrgbd_infos_train.pkl
@@ -2,6 +2,7 @@
:maxdepth: 2
waymo_det.md
sunrgbd_det.md
scannet_det.md
scannet_sem_seg.md
s3dis_sem_seg.md
@@ -113,7 +113,7 @@ def export(mesh_file,
# bbox format is [x, y, z, dx, dy, dz, label_id]
# [x, y, z] is gravity center of bbox, [dx, dy, dz] is axis-aligned
# [label_id] is semantic label id in 'nyu40id' standard
# Note: since 3D bbox is axis-aligned, the yaw is 0.
unaligned_bboxes = extract_bbox(mesh_vertices, object_id_to_segs,
object_id_to_label_id, instance_ids)
aligned_bboxes = extract_bbox(aligned_mesh_vertices, object_id_to_segs,
@@ -221,7 +221,7 @@ scannet
├── scannet_infos_test.pkl
```
- `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsampling. Since the ScanNet 3D detection task takes axis-aligned point clouds as input while the ScanNet 3D semantic segmentation task takes unaligned points, we store the unaligned points together with their axis-align transform matrix. Note: the points will be axis-aligned by the `GlobalAlignment` step in the pre-processing pipeline of the 3D detection task (see the sketch after this list).
- `instance_mask/xxxxx.bin`: The instance label for each point, value range: [0, NUM_INSTANCES], 0: unannotated.
- `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: [1, 40], i.e. `nyu40id` standard. Note: the `nyu40id` id will be mapped to train id in train pipeline `PointSegClassMapping`.
- `posed_images/scenexxxx_xx`: The set of `.jpg` images with `.txt` 4x4 poses and the single `.txt` file with camera intrinsic matrix.
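As a rough sketch of the alignment mentioned above, the stored unaligned points can be axis-aligned by applying the saved 4x4 transform to their xyz coordinates. The snippet below is a minimal numpy illustration; the file path and the identity placeholder matrix are assumptions, and the real matrix comes from the scan's info file.
```python
import numpy as np

# Minimal illustration of the axis alignment done by `GlobalAlignment`.
# The path is illustrative; the real axis-align matrix is read from the data infos.
points = np.fromfile('scannet/points/scene0000_00.bin', dtype=np.float32).reshape(-1, 6)
axis_align_matrix = np.eye(4, dtype=np.float32)  # placeholder for the saved 4x4 matrix

xyz_hom = np.hstack([points[:, :3], np.ones((points.shape[0], 1), dtype=np.float32)])
points[:, :3] = (xyz_hom @ axis_align_matrix.T)[:, :3]  # axis-aligned xyz, colors untouched
```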
@@ -231,21 +231,21 @@ scannet
- info['pts_instance_mask_path']: The path of `instance_mask/xxxxx.bin`.
- info['pts_semantic_mask_path']: The path of `semantic_mask/xxxxx.bin`.
- info['annos']: The annotations of each scan.
- annotations['gt_num']: The number of ground truths.
- annotations['name']: The semantic name of all ground truths, e.g. `chair`.
- annotations['location']: The gravity center of the axis-aligned 3D bounding boxes. Shape: [K, 3], K is the number of ground truths.
- annotations['dimensions']: The dimensions of the axis-aligned 3D bounding boxes, i.e. (x_size, y_size, z_size), shape: [K, 3].
- annotations['gt_boxes_upright_depth']: The axis-aligned 3D bounding boxes, each bounding box is (x, y, z, x_size, y_size, z_size), shape: [K, 6].
- annotations['unaligned_location']: The gravity center of the axis-unaligned 3D bounding boxes.
- annotations['unaligned_dimensions']: The dimensions of the axis-unaligned 3D bounding boxes.
- annotations['unaligned_gt_boxes_upright_depth']: The axis-unaligned 3D bounding boxes.
- annotations['index']: The index of all ground truths, i.e. [0, K).
- annotations['class']: The train class id of the bounding boxes, value range: [0, 18), shape: [K, ].
## Training pipeline
A typical training pipeline of ScanNet for 3D detection is as follows.
```python
train_pipeline = [
@@ -291,12 +291,12 @@ train_pipeline = [
- `GlobalAlignment`: The previous point cloud would be axis-aligned using the axis-aligned matrix.
- `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like [0, 18) during training.
- Data augmentation:
- `IndoorPointSample`: downsample the input point cloud.
- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of [-5, 5] (degrees) for ScanNet; then scale the input point cloud, usually by 1.0 for ScanNet; finally translate the input point cloud, usually by 0 for ScanNet.
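For reference, these augmentation steps typically appear in a ScanNet detection config roughly as below; the concrete numbers are assumptions for illustration and should be checked against the configs shipped with mmdetection3d.
```python
# Sketch of the augmentation part of a ScanNet detection pipeline (values illustrative).
scannet_augmentation = [
    dict(type='IndoorPointSample', num_points=40000),
    dict(
        type='RandomFlip3D',
        sync_2d=False,
        flip_ratio_bev_horizontal=0.5,
        flip_ratio_bev_vertical=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.087266, 0.087266],  # roughly +/- 5 degrees in radians
        scale_ratio_range=[1.0, 1.0],
        shift_height=True),
]
```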
## Metrics
Typically mean Average Precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function that computes precision and recall for 3D object detection for multiple classes is called; please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
As introduced in the section `Export ScanNet data`, all ground truth 3D bounding boxes are axis-aligned, i.e. the yaw is zero. So the yaw target of the network-predicted 3D bounding boxes is also zero, and axis-aligned 3D non-maximum suppression (NMS) is adopted during post-processing without regard to rotation.
# SUN RGB-D for 3D Object Detection
## Dataset preparation
For the overall process, please refer to the [README](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md/) page for SUN RGB-D.
### Download SUN RGB-D data and toolbox
Download SUNRGBD data [HERE](http://rgbd.cs.princeton.edu/data/). Then, move `SUNRGBD.zip`, `SUNRGBDMeta2DBB_v2.mat`, `SUNRGBDMeta3DBB_v2.mat` and `SUNRGBDtoolbox.zip` to the `OFFICIAL_SUNRGBD` folder, unzip the zip files.
The directory structure before data preparation should be as below:
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
```
### Extract data and annotations for 3D detection from raw data
Extract SUN RGB-D annotation data from the raw annotation data by running the following commands (this requires MATLAB to be installed on your machine):
```bash
matlab -nosplash -nodesktop -r 'extract_split;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v2;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v1;quit;'
```
The main steps include:
- Extract train and val split.
- Extract data for 3D detection from raw data.
- Extract and format detection annotation from raw data.
The main component of `extract_rgbd_data_v2.m` which extracts point cloud data from depth map is as follows:
```matlab
data = SUNRGBDMeta(imageId);
data.depthpath(1:16) = '';
data.depthpath = strcat('../OFFICIAL_SUNRGBD', data.depthpath);
data.rgbpath(1:16) = '';
data.rgbpath = strcat('../OFFICIAL_SUNRGBD', data.rgbpath);
% extract point cloud from depth map
[rgb,points3d,depthInpaint,imsize]=read3dPoints(data);
rgb(isnan(points3d(:,1)),:) = [];
points3d(isnan(points3d(:,1)),:) = [];
points3d_rgb = [points3d, rgb];
% MAT files are 3x smaller than TXT files. In Python we can use
% scipy.io.loadmat('xxx.mat')['points3d_rgb'] to load the data.
mat_filename = strcat(num2str(imageId,'%06d'), '.mat');
txt_filename = strcat(num2str(imageId,'%06d'), '.txt');
% save point cloud data
parsave(strcat(depth_folder, mat_filename), points3d_rgb);
```
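As the comment above suggests, the saved `.mat` point cloud can then be loaded in Python, for example as below (the path is just an example):
```python
import scipy.io

# Load the (N, 6) xyz+rgb point cloud saved by extract_rgbd_data_v2.m.
points3d_rgb = scipy.io.loadmat('sunrgbd_trainval/depth/000001.mat')['points3d_rgb']
print(points3d_rgb.shape)  # (N, 6)
```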
The main component of `extract_rgbd_data_v1.m` which extracts annotation is as follows:
```matlab
% Write 2D and 3D box label
data2d = data;
fid = fopen(strcat(det_label_folder, txt_filename), 'w');
for j = 1:length(data.groundtruth3DBB)
centroid = data.groundtruth3DBB(j).centroid; % 3D bbox center
classname = data.groundtruth3DBB(j).classname; % class name
orientation = data.groundtruth3DBB(j).orientation; % 3D bbox orientation
coeffs = abs(data.groundtruth3DBB(j).coeffs); % 3D bbox size
box2d = data2d.groundtruth2DBB(j).gtBb2D; % 2D bbox
fprintf(fid, '%s %d %d %d %d %f %f %f %f %f %f %f %f\n', classname, box2d(1), box2d(2), box2d(3), box2d(4), centroid(1), centroid(2), centroid(3), coeffs(1), coeffs(2), coeffs(3), orientation(1), orientation(2));
end
fclose(fid);
```
The above two scripts call functions such as `read3dPoints` from the [toolbox](https://rgbd.cs.princeton.edu/data/SUNRGBDtoolbox.zip) provided by SUN RGB-D.
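Given the `fprintf` format above, each line of a generated label file holds the class name, the four 2D box values, the 3D centroid, the 3D box size coefficients and the two orientation components. Below is a minimal sketch of parsing one such line in Python; the function name and returned keys are illustrative.
```python
def parse_sunrgbd_label_line(line):
    """Parse one line written by extract_rgbd_data_v1.m (see the fprintf format above)."""
    parts = line.strip().split(' ')
    values = [float(x) for x in parts[1:]]
    return {
        'classname': parts[0],
        'box2d': values[0:4],          # 2D bbox (gtBb2D values)
        'centroid': values[4:7],       # 3D bbox center
        'coeffs': values[7:10],        # 3D bbox size
        'orientation': values[10:12],  # 3D bbox orientation (first two components)
    }
```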
The directory structure after extraction should be as follows.
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
├── sunrgbd_trainval
│ ├── calib
│ ├── depth
│ ├── image
│ ├── label
│ ├── label_v1
│ ├── seg_label
│ ├── train_data_idx.txt
│ ├── val_data_idx.txt
```
Each of the following folders contains 5285 train files and 5050 val files in total:
- `calib`: Camera calibration information in `.txt`
- `depth`: Point cloud saved in `.mat` (xyz+rgb)
- `image`: Image data in `.jpg`
- `label`: Detection annotation data in `.txt` (version 2)
- `label_v1`: Detection annotation data in `.txt` (version 1)
- `seg_label`: Segmentation annotation data in `.txt`
Currently, we use v1 data for training and testing, so the version 2 labels are unused.
### Create dataset
Please run the command below to create the dataset.
```shell
python tools/create_data.py sunrgbd --root-path ./data/sunrgbd \
--out-dir ./data/sunrgbd --extra-tag sunrgbd
```
or (if in a slurm environment)
```
bash tools/create_data.sh <job_name> sunrgbd
```
The above point cloud data are further saved in `.bin` format. Meanwhile, `.pkl` info files are also generated to save annotations and metadata. The core function `process_single_scene` used to get the data infos is as follows.
```python
def process_single_scene(sample_idx):
print(f'{self.split} sample_idx: {sample_idx}')
# convert depth to points
# and downsample the points
SAMPLE_NUM = 50000
pc_upright_depth = self.get_depth(sample_idx)
pc_upright_depth_subsampled = random_sampling(
pc_upright_depth, SAMPLE_NUM)
info = dict()
pc_info = {'num_features': 6, 'lidar_idx': sample_idx}
info['point_cloud'] = pc_info
# save point cloud data in `.bin` format
mmcv.mkdir_or_exist(osp.join(self.root_dir, 'points'))
pc_upright_depth_subsampled.tofile(
osp.join(self.root_dir, 'points', f'{sample_idx:06d}.bin'))
# save point cloud file path
info['pts_path'] = osp.join('points', f'{sample_idx:06d}.bin')
# save image file path and metainfo
img_path = osp.join('image', f'{sample_idx:06d}.jpg')
image_info = {
'image_idx': sample_idx,
'image_shape': self.get_image_shape(sample_idx),
'image_path': img_path
}
info['image'] = image_info
# save calibration information
K, Rt = self.get_calibration(sample_idx)
calib_info = {'K': K, 'Rt': Rt}
info['calib'] = calib_info
# save all annotation
if has_label:
obj_list = self.get_label_objects(sample_idx)
annotations = {}
annotations['gt_num'] = len([
obj.classname for obj in obj_list
if obj.classname in self.cat2label.keys()
])
if annotations['gt_num'] != 0:
# class name
annotations['name'] = np.array([
obj.classname for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 2D image bounding boxes
annotations['bbox'] = np.concatenate([
obj.box2d.reshape(1, 4) for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0)
# 3D bounding box center location (in depth coordinate system)
annotations['location'] = np.concatenate([
obj.centroid.reshape(1, 3) for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0)
# 3D bounding box dimension/size (in depth coordinate system)
annotations['dimensions'] = 2 * np.array([
[obj.l, obj.h, obj.w] for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 3D bounding box rotation angle/yaw angle (in depth coordinate system)
annotations['rotation_y'] = np.array([
obj.heading_angle for obj in obj_list
if obj.classname in self.cat2label.keys()
])
annotations['index'] = np.arange(
len(obj_list), dtype=np.int32)
# class label (number)
annotations['class'] = np.array([
self.cat2label[obj.classname] for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 3D bounding box (in depth coordinate system)
annotations['gt_boxes_upright_depth'] = np.stack(
[
obj.box3d for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0) # (K,8)
info['annos'] = annotations
return info
```
The directory structure after processing should be as follows.
```
sunrgbd
├── README.md
├── matlab
│ ├── ...
├── OFFICIAL_SUNRGBD
│ ├── ...
├── sunrgbd_trainval
│ ├── ...
├── points
├── sunrgbd_infos_train.pkl
├── sunrgbd_infos_val.pkl
```
- `points/0xxxxx.bin`: The point cloud data after downsampling.
- `sunrgbd_infos_train.pkl`: The train data infos, the detailed info of each scene is as follows:
- info['point_cloud']: `{'num_features': 6, 'lidar_idx': sample_idx}`, where `sample_idx` is the index of the scene.
- info['pts_path']: The path of `points/0xxxxx.bin`.
- info['image']: The image path and metainfo:
- image['image_idx']: The index of the image.
- image['image_shape']: The shape of the image tensor.
- image['image_path']: The path of the image.
- info['annos']: The annotations of each scene.
- annotations['gt_num']: The number of ground truths.
- annotations['name']: The semantic name of all ground truths, e.g. `chair`.
- annotations['location']: The gravity center of the 3D bounding boxes in depth coordinate system. Shape: [K, 3], K is the number of ground truths.
- annotations['dimensions']: The dimensions of the 3D bounding boxes in depth coordinate system, i.e. `(x_size, y_size, z_size)`, shape: [K, 3].
- annotations['rotation_y']: The yaw angle of the 3D bounding boxes in depth coordinate system. Shape: [K, ].
- annotations['gt_boxes_upright_depth']: The 3D bounding boxes in depth coordinate system, each bounding box is `(x, y, z, x_size, y_size, z_size, yaw)`, shape: [K, 7].
- annotations['bbox']: The 2D bounding boxes, each bounding box is `(x, y, x_size, y_size)`, shape: [K, 4].
- annotations['index']: The index of all ground truths, range [0, K).
- annotations['class']: The train class id of the bounding boxes, value range: [0, 10), shape: [K, ].
- `sunrgbd_infos_val.pkl`: The val data infos, which shares the same format as `sunrgbd_infos_train.pkl`.
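A quick way to inspect these info files is to load them with `mmcv.load`, as in the minimal sketch below (the path is illustrative):
```python
import mmcv

infos = mmcv.load('data/sunrgbd/sunrgbd_infos_train.pkl')
info = infos[0]  # info dict of the first scene
print(info['pts_path'])                               # e.g. points/000001.bin
print(info['image']['image_path'])                    # e.g. image/000001.jpg
print(info['annos']['gt_boxes_upright_depth'].shape)  # (K, 7)
```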
## Train pipeline
A typical train pipeline of SUN RGB-D for point cloud only 3D detection is as follows.
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadAnnotations3D'),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(type='IndoorPointSample', num_points=20000),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
Data augmentation for point clouds:
- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of [-30, 30] (degrees) for SUN RGB-D; then scale the input point cloud, usually in the range of [0.85, 1.15] for SUN RGB-D; finally translate the input point cloud, usually by 0 for SUN RGB-D.
- `IndoorPointSample`: downsample the input point cloud.
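Conceptually, `GlobalRotScaleTrans` rotates the whole point cloud about the z-axis, rescales it and (optionally) translates it. A simplified numpy illustration of this idea, not the mmdet3d implementation, is:
```python
import numpy as np

def global_rot_scale_trans(points, rot_range=(-0.523599, 0.523599),
                           scale_range=(0.85, 1.15), trans_std=0.0):
    """Simplified rotate/scale/translate of an (N, >=3) point cloud (illustration only)."""
    angle = np.random.uniform(*rot_range)
    scale = np.random.uniform(*scale_range)
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    rot_mat = np.array([[cos_a, -sin_a, 0.0],
                        [sin_a,  cos_a, 0.0],
                        [0.0,    0.0,   1.0]], dtype=points.dtype)
    out = points.copy()
    out[:, :3] = out[:, :3] @ rot_mat.T * scale
    if trans_std > 0:
        out[:, :3] += np.random.normal(scale=trans_std, size=(1, 3)).astype(points.dtype)
    return out
```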
A typical train pipeline of SUN RGB-D for multi-modality (point cloud and image) 3D detection is as follows.
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations3D'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='Resize', img_scale=(1333, 600), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.0),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(type='IndoorPointSample', num_points=20000),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(
type='Collect3D',
keys=[
'img', 'gt_bboxes', 'gt_labels', 'points', 'gt_bboxes_3d',
'gt_labels_3d'
])
]
```
Data augmentation/normalization for images:
- `Resize`: resize the input image; `keep_ratio=True` means the aspect ratio of the image is kept unchanged.
- `Normalize`: normalize the RGB channels of the input image.
- `RandomFlip`: randomly flip the input image.
- `Pad`: pad the input image with zeros by default.
The image augmentation and normalization functions are implemented in [MMDetection](https://github.com/open-mmlab/mmdetection/tree/master/mmdet/datasets/pipelines).
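As a rough illustration of what `Normalize` and `Pad` do to an image array, consider the numpy sketch below; the mean/std values are assumptions and should be taken from the actual `img_norm_cfg` of the config.
```python
import numpy as np

# Assumed normalization values; check img_norm_cfg in the actual config.
mean = np.array([103.53, 116.28, 123.675], dtype=np.float32)
std = np.array([1.0, 1.0, 1.0], dtype=np.float32)

def normalize_and_pad(img, size_divisor=32):
    """Normalize the channels and zero-pad H/W up to a multiple of `size_divisor`."""
    img = (img.astype(np.float32) - mean) / std
    h, w = img.shape[:2]
    pad_h = int(np.ceil(h / size_divisor)) * size_divisor
    pad_w = int(np.ceil(w / size_divisor)) * size_divisor
    padded = np.zeros((pad_h, pad_w, img.shape[2]), dtype=img.dtype)
    padded[:h, :w] = img
    return padded
```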
## Metrics
Same as ScanNet, mean Average Precision (mAP) is typically used for evaluation on SUN RGB-D, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function that computes precision and recall for 3D object detection for multiple classes is called; please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
Since SUN RGB-D also contains image data, detection on images is feasible as well. For instance, in ImVoteNet, we first train an image detector, and we also use mAP for its evaluation, e.g. `mAP@0.5`. We use the `eval_map` function from [MMDetection](https://github.com/open-mmlab/mmdetection) to calculate mAP.
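For the 2D detector, a minimal sketch of calling `eval_map` looks roughly like this (the toy boxes below are made up; in practice the detection results and annotations come from the dataset and the trained model):
```python
import numpy as np
from mmdet.core import eval_map

# One image, one class: detections are (n, 5) arrays of [x1, y1, x2, y2, score];
# annotations are dicts with 'bboxes' of shape (k, 4) and 'labels' of shape (k,).
det_results = [[np.array([[10.0, 10.0, 50.0, 50.0, 0.9]])]]
annotations = [dict(bboxes=np.array([[12.0, 12.0, 48.0, 52.0]]),
                    labels=np.array([0]))]

mean_ap, _ = eval_map(det_results, annotations, iou_thr=0.5)
print(mean_ap)
```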
@@ -2,6 +2,7 @@
:maxdepth: 2
waymo_det.md
sunrgbd_det.md
scannet_det.md
scannet_sem_seg.md
s3dis_sem_seg.md
# ScanNet for 3D Object Detection
# SUN RGB-D for 3D Object Detection
## Dataset preparation
For the overall process, please refer to the [README](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md/) page for SUN RGB-D.
### Download SUN RGB-D data and toolbox
Download SUNRGBD data [HERE](http://rgbd.cs.princeton.edu/data/). Then, move `SUNRGBD.zip`, `SUNRGBDMeta2DBB_v2.mat`, `SUNRGBDMeta3DBB_v2.mat` and `SUNRGBDtoolbox.zip` to the `OFFICIAL_SUNRGBD` folder and unzip the zip files.
The directory structure before data preparation should be as below:
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
```
### Extract data and annotations for 3D detection from raw data
Extract SUN RGB-D annotation data from the raw annotation data by running the following commands (this requires MATLAB to be installed on your machine):
```bash
matlab -nosplash -nodesktop -r 'extract_split;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v2;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v1;quit;'
```
The main steps include:
- Extract the train and val split files.
- Extract data for 3D detection from the raw data.
- Extract and format detection annotations from the raw data.
The main component of `extract_rgbd_data_v2.m`, which extracts point cloud data from the depth maps, is as follows:
```matlab
data = SUNRGBDMeta(imageId);
data.depthpath(1:16) = '';
data.depthpath = strcat('../OFFICIAL_SUNRGBD', data.depthpath);
data.rgbpath(1:16) = '';
data.rgbpath = strcat('../OFFICIAL_SUNRGBD', data.rgbpath);
% extract point cloud from the depth map
[rgb,points3d,depthInpaint,imsize]=read3dPoints(data);
rgb(isnan(points3d(:,1)),:) = [];
points3d(isnan(points3d(:,1)),:) = [];
points3d_rgb = [points3d, rgb];
% MAT files are 3x smaller than TXT files. In Python we can use
% scipy.io.loadmat('xxx.mat')['points3d_rgb'] to load the data.
mat_filename = strcat(num2str(imageId,'%06d'), '.mat');
txt_filename = strcat(num2str(imageId,'%06d'), '.txt');
% save point cloud data
parsave(strcat(depth_folder, mat_filename), points3d_rgb);
```
The main component of `extract_rgbd_data_v1.m`, which extracts and formats the annotations, is as follows:
```matlab
% Write 2D and 3D box labels
data2d = data;
fid = fopen(strcat(det_label_folder, txt_filename), 'w');
for j = 1:length(data.groundtruth3DBB)
centroid = data.groundtruth3DBB(j).centroid; % 3D bbox center
classname = data.groundtruth3DBB(j).classname; % class name
orientation = data.groundtruth3DBB(j).orientation; % 3D bbox orientation
coeffs = abs(data.groundtruth3DBB(j).coeffs); % 3D bbox size
box2d = data2d.groundtruth2DBB(j).gtBb2D; % 2D bbox
fprintf(fid, '%s %d %d %d %d %f %f %f %f %f %f %f %f\n', classname, box2d(1), box2d(2), box2d(3), box2d(4), centroid(1), centroid(2), centroid(3), coeffs(1), coeffs(2), coeffs(3), orientation(1), orientation(2));
end
fclose(fid);
```
The above two scripts call functions such as `read3dPoints` from the [toolbox](https://rgbd.cs.princeton.edu/data/SUNRGBDtoolbox.zip) provided by SUN RGB-D.
After extracting the data with the above scripts, the directory structure should be as follows:
```
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
├── sunrgbd_trainval
│ ├── calib
│ ├── depth
│ ├── image
│ ├── label
│ ├── label_v1
│ ├── seg_label
│ ├── train_data_idx.txt
│ ├── val_data_idx.txt
```
Each of the following folders contains 5285 train files and 5050 val files in total:
- `calib`: Camera calibration information in `.txt` files.
- `depth`: Point clouds saved in `.mat` files (xyz + rgb).
- `image`: Image data in `.jpg` files.
- `label`: Detection annotation data in `.txt` files (version 2).
- `label_v1`: Detection annotation data in `.txt` files (version 1).
- `seg_label`: Segmentation annotation data in `.txt` files.
Currently, we use the v1 data for training and testing, so the version 2 labels are unused.
### Create dataset
Please run the commands below to create the dataset:
```shell
python tools/create_data.py sunrgbd --root-path ./data/sunrgbd \
--out-dir ./data/sunrgbd --extra-tag sunrgbd
```
Alternatively, if in a slurm environment, the following command can be used instead:
```
bash tools/create_data.sh <job_name> sunrgbd
```
The point cloud data mentioned above are then processed and saved in `.bin` format. Meanwhile, `.pkl` info files are generated to store the annotations and metadata. The core function `process_single_scene` used to generate these info files is as follows:
```python
def process_single_scene(sample_idx):
print(f'{self.split} sample_idx: {sample_idx}')
# convert the depth map to a point cloud and downsample the points
SAMPLE_NUM = 50000
pc_upright_depth = self.get_depth(sample_idx)
pc_upright_depth_subsampled = random_sampling(
pc_upright_depth, SAMPLE_NUM)
info = dict()
pc_info = {'num_features': 6, 'lidar_idx': sample_idx}
info['point_cloud'] = pc_info
# save point cloud data in `.bin` format
mmcv.mkdir_or_exist(osp.join(self.root_dir, 'points'))
pc_upright_depth_subsampled.tofile(
osp.join(self.root_dir, 'points', f'{sample_idx:06d}.bin'))
# save point cloud file path
info['pts_path'] = osp.join('points', f'{sample_idx:06d}.bin')
# save image file path and metainfo
img_path = osp.join('image', f'{sample_idx:06d}.jpg')
image_info = {
'image_idx': sample_idx,
'image_shape': self.get_image_shape(sample_idx),
'image_path': img_path
}
info['image'] = image_info
# save calibration information
K, Rt = self.get_calibration(sample_idx)
calib_info = {'K': K, 'Rt': Rt}
info['calib'] = calib_info
# save all annotations
if has_label:
obj_list = self.get_label_objects(sample_idx)
annotations = {}
annotations['gt_num'] = len([
obj.classname for obj in obj_list
if obj.classname in self.cat2label.keys()
])
if annotations['gt_num'] != 0:
# class name
annotations['name'] = np.array([
obj.classname for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 2D image bounding boxes
annotations['bbox'] = np.concatenate([
obj.box2d.reshape(1, 4) for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0)
# 3D bounding box center location (in depth coordinate system)
annotations['location'] = np.concatenate([
obj.centroid.reshape(1, 3) for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0)
# 3D bounding box dimension/size (in depth coordinate system)
annotations['dimensions'] = 2 * np.array([
[obj.l, obj.h, obj.w] for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 3D bounding box rotation angle/yaw angle (in depth coordinate system)
annotations['rotation_y'] = np.array([
obj.heading_angle for obj in obj_list
if obj.classname in self.cat2label.keys()
])
annotations['index'] = np.arange(
len(obj_list), dtype=np.int32)
# class label (number)
annotations['class'] = np.array([
self.cat2label[obj.classname] for obj in obj_list
if obj.classname in self.cat2label.keys()
])
# 3D bounding box (in depth coordinate system)
annotations['gt_boxes_upright_depth'] = np.stack(
[
obj.box3d for obj in obj_list
if obj.classname in self.cat2label.keys()
], axis=0) # (K,8)
info['annos'] = annotations
return info
```
The directory structure after processing should be as follows:
```
sunrgbd
├── README.md
├── matlab
│ ├── ...
├── OFFICIAL_SUNRGBD
│ ├── ...
├── sunrgbd_trainval
│ ├── ...
├── points
├── sunrgbd_infos_train.pkl
├── sunrgbd_infos_val.pkl
```
- `points/0xxxxx.bin`: The point cloud data after downsampling.
- `sunrgbd_infos_train.pkl`: The train data infos (annotations and metadata); the detailed info of each scene is as follows:
- info['point_cloud']: `{'num_features': 6, 'lidar_idx': sample_idx}`, where `sample_idx` is the index of the scene.
- info['pts_path']: The path of `points/0xxxxx.bin`.
- info['image']: The image path and metainfo:
- image['image_idx']: The index of the image.
- image['image_shape']: The shape of the image tensor (i.e. its size).
- image['image_path']: The path of the image.
- info['annos']: The annotations of each scene:
- annotations['gt_num']: The number of ground truths.
- annotations['name']: The semantic names of all ground truths, e.g. `chair`.
- annotations['location']: The gravity center of the 3D bounding boxes in depth coordinate system. Shape: [K, 3], K is the number of ground truths.
- annotations['dimensions']: The dimensions of the 3D bounding boxes in depth coordinate system, shape: [K, 3].
- annotations['rotation_y']: The yaw angle of the 3D bounding boxes in depth coordinate system. Shape: [K, ].
- annotations['gt_boxes_upright_depth']: The 3D bounding boxes in depth coordinate system, each bounding box is `(x, y, z, x_size, y_size, z_size, yaw)`, shape: [K, 7].
- annotations['bbox']: The 2D bounding boxes, each bounding box is `(x, y, x_size, y_size)`, shape: [K, 4].
- annotations['index']: The index of all ground truths, range [0, K).
- annotations['class']: The train class ids of the bounding boxes, value range: [0, 10), shape: [K, ].
- `sunrgbd_infos_val.pkl`: The val data infos, which shares the same format as `sunrgbd_infos_train.pkl`.
## Train pipeline
A typical train pipeline of SUN RGB-D for point cloud only 3D detection is as follows:
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadAnnotations3D'),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(type='IndoorPointSample', num_points=20000),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
Data augmentation for point clouds:
- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of [-30, 30] (degrees) for SUN RGB-D; then scale the input point cloud, usually in the range of [0.85, 1.15] for SUN RGB-D; finally translate the input point cloud, usually by 0 for SUN RGB-D.
- `IndoorPointSample`: downsample the input point cloud.
A typical train pipeline of SUN RGB-D for multi-modality (point cloud and image) 3D detection is as follows:
```python
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations3D'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='Resize', img_scale=(1333, 600), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.0),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(type='IndoorPointSample', num_points=20000),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(
type='Collect3D',
keys=[
'img', 'gt_bboxes', 'gt_labels', 'points', 'gt_bboxes_3d',
'gt_labels_3d'
])
]
```
Data augmentation/normalization for images:
- `Resize`: resize the input image; `keep_ratio=True` means the aspect ratio of the image is kept unchanged.
- `Normalize`: normalize the RGB channels of the input image.
- `RandomFlip`: randomly flip the input image.
- `Pad`: pad the input image, with zeros by default.
The image augmentation and normalization functions are implemented in [MMDetection](https://github.com/open-mmlab/mmdetection/tree/master/mmdet/datasets/pipelines).
## Metrics
Same as ScanNet, mean Average Precision (mAP) is typically used for evaluation on SUN RGB-D, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function that computes precision and recall for 3D object detection for multiple classes is called; please refer to [indoor_eval.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
Since SUN RGB-D also contains image data, detection on images is feasible as well. For instance, in ImVoteNet, we first train an image detector, and we also use mAP, e.g. `mAP@0.5`, to evaluate its performance. We use the `eval_map` function from [MMDetection](https://github.com/open-mmlab/mmdetection) to calculate mAP.
@@ -191,9 +191,9 @@ class SUNRGBDData(object):
],
axis=0)
annotations['dimensions'] = 2 * np.array([
[obj.l, obj.w, obj.h] for obj in obj_list
if obj.classname in self.cat2label.keys()
]) # lwh (depth) format
annotations['rotation_y'] = np.array([
obj.heading_angle for obj in obj_list
if obj.classname in self.cat2label.keys()