# Customize Datasets

In this note, you will learn how to train and test predefined models with customized datasets.

The basic steps are as follows:

1. Prepare data
2. Prepare a config
3. Train, test, and run inference with models on the customized dataset

## Data Preparation

Ideally, we would reorganize the customized raw data and convert the annotation format into KITTI style. However, since calibration files and 3D annotations in KITTI format are difficult to obtain for some customized datasets, this doc introduces a basic data format instead.

### Basic Data Format

#### Point Cloud Format

Currently, we only support point clouds in `.bin` format for training and inference. Before training on your own datasets, you need to convert point cloud files in other formats to `.bin` files. Common point cloud data formats include `.pcd` and `.las`; we list some open-source conversion tools for reference.

1. Convert `.pcd` to `.bin`: https://github.com/DanielPollithy/pypcd

- You can install `pypcd` with the following command:

  ```bash
  pip install git+https://github.com/DanielPollithy/pypcd.git
  ```

- You can use the following script to read the `.pcd` file and convert it to `.bin` format for saving:

  ```python
  import numpy as np
  from pypcd import pypcd

  pcd_data = pypcd.PointCloud.from_path('point_cloud_data.pcd')
  # stack x, y, z and intensity into an (N, 4) float32 array
  points = np.zeros([pcd_data.width, 4], dtype=np.float32)
  points[:, 0] = pcd_data.pc_data['x'].copy()
  points[:, 1] = pcd_data.pc_data['y'].copy()
  points[:, 2] = pcd_data.pc_data['z'].copy()
  points[:, 3] = pcd_data.pc_data['intensity'].copy().astype(np.float32)
  # save the raw float32 bytes as a `.bin` file
  with open('point_cloud_data.bin', 'wb') as f:
      f.write(points.tobytes())
  ```
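
- To sanity-check the conversion, you can load the `.bin` file back with `np.fromfile` and confirm its shape (a quick check, assuming 4 float32 values per point as above):

  ```python
  import numpy as np

  # each point is stored as 4 consecutive float32 values: x, y, z, intensity
  points = np.fromfile('point_cloud_data.bin', dtype=np.float32).reshape(-1, 4)
  print(points.shape)  # (num_points, 4)
  ```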

2. Convert `.las` to `.bin`: The common conversion path is `.las -> .pcd -> .bin`, and the conversion path `.las -> .pcd` can be achieved through [this tool](https://github.com/Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor).
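
If your `.las` files only need the `x`, `y`, `z` and `intensity` fields, you can also convert them to `.bin` directly with the open-source [laspy](https://github.com/laspy/laspy) library (a minimal sketch, not an officially supported path):

```python
import laspy
import numpy as np

las = laspy.read('point_cloud_data.las')
# stack x, y, z and intensity into an (N, 4) float32 array
points = np.stack([
    np.asarray(las.x, dtype=np.float32),
    np.asarray(las.y, dtype=np.float32),
    np.asarray(las.z, dtype=np.float32),
    np.asarray(las.intensity, dtype=np.float32),
], axis=-1)
# write the raw float32 bytes as a `.bin` file
points.tofile('point_cloud_data.bin')
```
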
#### Label Format

The `.txt` annotation file must contain the most basic information for each scene: the 3D bounding box and the category label of each instance. Each line represents a 3D box in a certain scene as follows:

```
# format: [x, y, z, dx, dy, dz, yaw, category_name]
1.23 1.42 0.23 3.96 1.65 1.55 1.56 Car
3.51 2.15 0.42 1.05 0.87 1.86 1.23 Pedestrian
...
```

**Note**: Currently we only support KITTI metric evaluation for customized datasets.

The 3D boxes should be stored in the unified 3D coordinate system.
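
For reference, a label file in this format can be parsed with a few lines of Python (a minimal sketch, assuming the `[x, y, z, dx, dy, dz, yaw, category_name]` layout above):

```python
import numpy as np

def load_label_file(path):
    """Parse a custom `.txt` label file into (N, 7) boxes and class names."""
    bboxes, names = [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):  # skip comments and empty lines
                continue
            *box, name = line.split()
            bboxes.append([float(v) for v in box])  # x, y, z, dx, dy, dz, yaw
            names.append(name)
    return np.array(bboxes, dtype=np.float32).reshape(-1, 7), names
```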

#### Calibration Format

Point cloud data collected by multiple LiDARs are usually fused and converted into the coordinate system of a certain LiDAR. The `.txt` calibration file should therefore contain the intrinsic matrix of each camera and the extrinsic transformation matrix from the LiDAR to each camera, where `Px` represents the intrinsic matrix of `camera_x` and `lidar2camx` represents the extrinsic transformation matrix from the LiDAR to `camera_x`:

```
P0
P1
P2
P3
P4
...
lidar2cam0
lidar2cam1
lidar2cam2
lidar2cam3
lidar2cam4
...
```
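
The file only needs to be parseable by your own converter. For illustration, assuming each line stores the key followed by the flattened matrix values (e.g. `P0: ...`), a reader might look like this (a sketch, not a fixed mmdetection3d format):

```python
import numpy as np

def load_calib_file(path):
    """Parse calibration lines of the assumed form `key: v1 v2 ...`."""
    calib = {}
    with open(path) as f:
        for line in f:
            if ':' not in line:
                continue
            key, values = line.split(':', 1)
            # adapt the reshape to how your matrices are flattened, e.g. 3x4 or 4x4
            calib[key.strip()] = np.array(
                [float(v) for v in values.split()], dtype=np.float32)
    return calib
```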

### Raw Data Structure

#### LiDAR-Based 3D Detection

The raw data for LiDAR-based 3D object detection are typically organized as follows, where `ImageSets` contains the split files indicating which samples belong to the training/validation set, `points` contains the point cloud data stored in `.bin` format, and `labels` contains the label files for 3D detection.

```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── custom
│   │   ├── ImageSets
│   │   │   ├── train.txt
│   │   │   ├── val.txt
│   │   ├── points
│   │   │   ├── 000000.bin
│   │   │   ├── 000001.bin
│   │   │   ├── ...
│   │   ├── labels
│   │   │   ├── 000000.txt
│   │   │   ├── 000001.txt
│   │   │   ├── ...
```
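
The split files in `ImageSets` typically list the sample basenames (without extension), one per line. If you do not already have a split, one way to generate it is as follows (a sketch; the 80/20 ratio and the fixed seed are arbitrary choices):

```python
import random
from pathlib import Path

random.seed(0)
stems = sorted(p.stem for p in Path('data/custom/points').glob('*.bin'))
random.shuffle(stems)
num_train = int(len(stems) * 0.8)  # 80/20 train/val split

split_dir = Path('data/custom/ImageSets')
split_dir.mkdir(parents=True, exist_ok=True)
(split_dir / 'train.txt').write_text('\n'.join(stems[:num_train]) + '\n')
(split_dir / 'val.txt').write_text('\n'.join(stems[num_train:]) + '\n')
```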

#### Vision-Based 3D Detection

The raw data for vision-based 3D object detection are typically organized as follows, where `ImageSets` contains the split files indicating which samples belong to the training/validation set, `images` contains the images from different cameras (for example, images from `camera_x` need to be placed in `images/images_x`), `calibs` contains the calibration files storing the intrinsic matrix of each camera, and `labels` contains the label files for 3D detection.

```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── custom
│   │   ├── ImageSets
│   │   │   ├── train.txt
│   │   │   ├── val.txt
│   │   ├── calibs
│   │   │   ├── 000000.txt
│   │   │   ├── 000001.txt
│   │   │   ├── ...
│   │   ├── images
│   │   │   ├── images_0
│   │   │   │   ├── 000000.png
│   │   │   │   ├── 000001.png
│   │   │   │   ├── ...
│   │   │   ├── images_1
│   │   │   ├── images_2
│   │   │   ├── ...
│   │   ├── labels
│   │   │   ├── 000000.txt
│   │   │   ├── 000001.txt
│   │   │   ├── ...
```

#### Multi-Modality 3D Detection

The raw data for multi-modality 3D object detection are typically organized as follows. Different from vision-based 3D object detection, the calibration files in `calibs` store both the intrinsic matrix of each camera and the LiDAR-to-camera extrinsic matrix.

```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── custom
│   │   ├── ImageSets
│   │   │   ├── train.txt
│   │   │   ├── val.txt
│   │   ├── calibs
│   │   │   ├── 000000.txt
│   │   │   ├── 000001.txt
│   │   │   ├── ...
│   │   ├── points
│   │   │   ├── 000000.bin
│   │   │   ├── 000001.bin
│   │   │   ├── ...
│   │   ├── images
│   │   │   ├── images_0
│   │   │   │   ├── 000000.png
│   │   │   │   ├── 000001.png
│   │   │   │   ├── ...
│   │   │   ├── images_1
│   │   │   ├── images_2
│   │   │   ├── ...
│   │   ├── labels
│   │   │   ├── 000000.txt
│   │   │   ├── 000001.txt
│   │   │   ├── ...
```

#### LiDAR-Based 3D Semantic Segmentation

The raw data for LiDAR-based 3D semantic segmentation are typically organized as follows, where `ImageSets` contains the split files indicating which samples belong to the training/validation set, `points` contains the point cloud data, and `semantic_mask` contains the point-level labels.

```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── custom
│   │   ├── ImageSets
│   │   │   ├── train.txt
│   │   │   ├── val.txt
│   │   ├── points
│   │   │   ├── 000000.bin
│   │   │   ├── 000001.bin
│   │   │   ├── ...
│   │   ├── semantic_mask
│   │   │   ├── 000000.bin
│   │   │   ├── 000001.bin
│   │   │   ├── ...
```

### Data Converter

Once you have prepared the raw data following the instructions above, you can directly use the following command to generate training/validation information files:

```bash
python tools/create_data.py custom --root-path ./data/custom --out-dir ./data/custom --extra-tag custom
```

## An example of a customized dataset

Once we finish data preparation, we can create a new dataset in `mmdet3d/datasets/my_dataset.py` to load the data.

```python
import numpy as np

from mmdet3d.registry import DATASETS
from mmdet3d.structures import LiDARInstance3DBoxes
from .det3d_dataset import Det3DDataset


@DATASETS.register_module()
class MyDataset(Det3DDataset):

    # replace with all the classes in customized pkl info file
    METAINFO = {
        'classes': ('Pedestrian', 'Cyclist', 'Car')
    }

    def parse_ann_info(self, info):
        """Process the `instances` in data info to `ann_info`.

        Args:
            info (dict): Data information of single data sample.

        Returns:
            dict: Annotation information consists of the following keys:

                - gt_bboxes_3d (:obj:`LiDARInstance3DBoxes`):
                  3D ground truth bboxes.
                - gt_labels_3d (np.ndarray): Labels of ground truths.
        """
        ann_info = super().parse_ann_info(info)
        if ann_info is None:
            ann_info = dict()
            # empty instance
            ann_info['gt_bboxes_3d'] = np.zeros((0, 7), dtype=np.float32)
            ann_info['gt_labels_3d'] = np.zeros(0, dtype=np.int64)

        # filter the gt classes not used in training
        ann_info = self._remove_dontcare(ann_info)
        gt_bboxes_3d = LiDARInstance3DBoxes(ann_info['gt_bboxes_3d'])
        ann_info['gt_bboxes_3d'] = gt_bboxes_3d
        return ann_info
```

After the data pre-processing, there are two steps for users to train on the customized dataset:

1. Modify the config file for using the customized dataset.
2. Check the annotations of the customized dataset.

Here we take training PointPillars on a customized dataset as an example:

### Prepare a config

Here we demonstrate a config sample for pure point cloud training.

#### Prepare dataset config

In `configs/_base_/datasets/custom.py`:

```python
# dataset settings
dataset_type = 'MyDataset'
data_root = 'data/custom/'
class_names = ['Pedestrian', 'Cyclist', 'Car']  # replace with your dataset class
point_cloud_range = [0, -40, -3, 70.4, 40, 1]  # adjust according to your dataset
input_modality = dict(use_lidar=True, use_camera=False)
metainfo = dict(classes=class_names)

train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,  # replace with your point cloud data dimension
        use_dim=4),  # replace with the actual dimension used in training and inference
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True),
    dict(
        type='ObjectNoise',
        num_try=100,
        translation_std=[1.0, 1.0, 0.5],
        global_rot_range=[0.0, 0.0],
        rot_range=[-0.78539816, 0.78539816]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(
        type='Pack3DDetInputs',
        keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,  # replace with your point cloud data dimension
        use_dim=4),
    dict(type='Pack3DDetInputs', keys=['points'])
]
# construct a pipeline for data and gt loading in show function
eval_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
    dict(type='Pack3DDetInputs', keys=['points']),
]
train_dataloader = dict(
    batch_size=6,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file='custom_infos_train.pkl',  # specify your training pkl info
            data_prefix=dict(pts='points'),
            pipeline=train_pipeline,
            modality=input_modality,
            test_mode=False,
            metainfo=metainfo,
            box_type_3d='LiDAR')))
val_dataloader = dict(
    batch_size=1,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(pts='points'),
        ann_file='custom_infos_val.pkl',  # specify your validation pkl info
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        metainfo=metainfo,
        box_type_3d='LiDAR'))
val_evaluator = dict(
    type='KittiMetric',
    ann_file=data_root + 'custom_infos_val.pkl',  # specify your validation pkl info
    metric='bbox')
```

#### Prepare model config

For voxel-based detectors such as SECOND, PointPillars and CenterPoint, the point cloud range and voxel size should be adjusted according to your dataset.
Theoretically, `voxel_size` is linked to the setting of `point_cloud_range`: a smaller `voxel_size` increases the number of voxels and the corresponding memory consumption. In addition, the following issues need to be noted:

If `point_cloud_range` and `voxel_size` are set to `[0, -40, -3, 70.4, 40, 1]` and `[0.05, 0.05, 0.1]` respectively, then the shape of the intermediate feature map is `[(1-(-3))/0.1+1, (40-(-40))/0.05, (70.4-0)/0.05] = [41, 1600, 1408]`. When changing `point_cloud_range`, remember to change the shape of the intermediate feature map in `middle_encoder` according to the `voxel_size`.
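
The grid shape can be derived directly from these two settings. A small helper makes it easier to keep `middle_encoder` consistent when either value changes (a sketch; note that the sparse middle encoder above additionally adds 1 to the `z` dimension):

```python
# compute the voxel grid shape from point_cloud_range and voxel_size
def voxel_grid_shape(point_cloud_range, voxel_size):
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    nx = round((x_max - x_min) / voxel_size[0])
    ny = round((y_max - y_min) / voxel_size[1])
    nz = round((z_max - z_min) / voxel_size[2])
    return nx, ny, nz

# with the PointPillars settings used below:
# voxel_grid_shape([0, -39.68, -3, 69.12, 39.68, 1], [0.16, 0.16, 4])
# -> (432, 496, 1), i.e. output_shape=[496, 432] ([ny, nx]) in middle_encoder
```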

Regarding the setting of `anchor_range`, it is generally adjusted according to the dataset. Note that the `z` value needs to be adjusted according to the position of the point cloud; please refer to this [issue](https://github.com/open-mmlab/mmdetection3d/issues/986).

Regarding the setting of `anchor_size`, it is usually necessary to compute the average length, width and height of the objects in the entire training dataset and use them as `anchor_size` to obtain the best results, as sketched below.
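
One possible way to collect these statistics from the label files described earlier (a sketch, assuming the `[x, y, z, dx, dy, dz, yaw, category_name]` label layout):

```python
from collections import defaultdict
from pathlib import Path

import numpy as np

# accumulate (dx, dy, dz) per class over all label files
sizes = defaultdict(list)
for label_file in Path('data/custom/labels').glob('*.txt'):
    for line in label_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith('#'):  # skip comments and empty lines
            continue
        *box, name = line.split()
        sizes[name].append([float(v) for v in box[3:6]])  # dx, dy, dz

for name, class_sizes in sizes.items():
    print(name, np.mean(class_sizes, axis=0))  # average size per class
```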

In `configs/_base_/models/pointpillars_hv_secfpn_custom.py`:

```python
voxel_size = [0.16, 0.16, 4]  # adjust according to your dataset
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]  # adjust according to your dataset
model = dict(
    type='VoxelNet',
    data_preprocessor=dict(
        type='Det3DDataPreprocessor',
        voxel=True,
        voxel_layer=dict(
            max_num_points=32,
            point_cloud_range=point_cloud_range,
            voxel_size=voxel_size,
            max_voxels=(16000, 40000))),
    voxel_encoder=dict(
        type='PillarFeatureNet',
        in_channels=4,
        feat_channels=[64],
        with_distance=False,
        voxel_size=voxel_size,
        point_cloud_range=point_cloud_range),
    # the `output_shape` should be adjusted according to `point_cloud_range`
    # and `voxel_size`
    middle_encoder=dict(
        type='PointPillarsScatter', in_channels=64, output_shape=[496, 432]),
    backbone=dict(
        type='SECOND',
        in_channels=64,
        layer_nums=[3, 5, 5],
        layer_strides=[2, 2, 2],
        out_channels=[64, 128, 256]),
    neck=dict(
        type='SECONDFPN',
        in_channels=[64, 128, 256],
        upsample_strides=[1, 2, 4],
        out_channels=[128, 128, 128]),
    bbox_head=dict(
        type='Anchor3DHead',
        num_classes=3,
        in_channels=384,
        feat_channels=384,
        use_direction_classifier=True,
        assign_per_class=True,
        # adjust the `ranges` and `sizes` according to your dataset
        anchor_generator=dict(
            type='AlignedAnchor3DRangeGenerator',
            ranges=[
                [0, -39.68, -0.6, 69.12, 39.68, -0.6],
                [0, -39.68, -0.6, 69.12, 39.68, -0.6],
                [0, -39.68, -1.78, 69.12, 39.68, -1.78],
            ],
            sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
            rotations=[0, 1.57],
            reshape_out=False),
        diff_rad_by_sin=True,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
        loss_cls=dict(
            type='mmdet.FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(
            type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
        loss_dir=dict(
            type='mmdet.CrossEntropyLoss', use_sigmoid=False,
            loss_weight=0.2)),
    # model training and testing settings
    train_cfg=dict(
        assigner=[
            dict(  # for Pedestrian
                type='Max3DIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.5,
                neg_iou_thr=0.35,
                min_pos_iou=0.35,
                ignore_iof_thr=-1),
            dict(  # for Cyclist
                type='Max3DIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.5,
                neg_iou_thr=0.35,
                min_pos_iou=0.35,
                ignore_iof_thr=-1),
            dict(  # for Car
                type='Max3DIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.6,
                neg_iou_thr=0.45,
                min_pos_iou=0.45,
                ignore_iof_thr=-1),
        ],
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        use_rotate_nms=True,
        nms_across_levels=False,
        nms_thr=0.01,
        score_thr=0.1,
        min_bbox_size=0,
        nms_pre=100,
        max_num=50))
```

#### Prepare overall config

We combine all the configs above in `configs/pointpillars/pointpillars_hv_secfpn_8xb6_custom.py`:

```python
_base_ = [
    '../_base_/models/pointpillars_hv_secfpn_custom.py',
    '../_base_/datasets/custom.py',
    '../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
]
```

#### Visualize your dataset (optional)

To validate whether your prepared data and config are correct, it's highly recommended to use the `tools/misc/browse_dataset.py` script to visualize your dataset and annotations before training and validation. Please refer to the [visualization doc](https://mmdetection3d.readthedocs.io/en/dev-1.x/user_guides/visualization.html) for more details.
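
For the LiDAR-only example above, the invocation might look like this (the flags are indicative; check `python tools/misc/browse_dataset.py --help` for the exact options of your version):

```bash
python tools/misc/browse_dataset.py configs/pointpillars/pointpillars_hv_secfpn_8xb6_custom.py --task lidar_det
```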

## Evaluation

Once the data and config have been prepared, you can directly run the training/testing scripts following our doc.
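
For instance, with the overall config assembled above, the standard entry points look like this (the checkpoint path is a placeholder for your own setup):

```bash
# single-GPU training
python tools/train.py configs/pointpillars/pointpillars_hv_secfpn_8xb6_custom.py

# testing with a trained checkpoint
python tools/test.py configs/pointpillars/pointpillars_hv_secfpn_8xb6_custom.py work_dirs/pointpillars_hv_secfpn_8xb6_custom/epoch_40.pth
```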

**Note**: We only provide an implementation of KITTI-style evaluation for customized datasets. The evaluator should be included in the dataset config:

```python
val_evaluator = dict(
    type='KittiMetric',
    ann_file=data_root + 'custom_infos_val.pkl',  # specify your validation pkl info
    metric='bbox')
```