# Prepare datasets for Painter

The training of our model uses [COCO](https://cocodataset.org/), [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/), [NYUDepthV2](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html), [Synthetic Rain Datasets](https://paperswithcode.com/dataset/synthetic-rain-datasets), [SIDD](https://www.eecs.yorku.ca/~kamel/sidd/), and [LoL](https://daooshee.github.io/BMVC2018website/) datasets.

After processing, the datasets should look like:

```
$Painter_ROOT/datasets/
    nyu_depth_v2/
        sync/
        official_splits/
        nyu_depth_v2_labeled.mat
        nyuv2_sync_image_depth.json  # generated
        nyuv2_test_image_depth.json  # generated
    ade20k/
        images/
        annotations/
        annotations_detectron2/  # generated
        annotations_with_color/  # generated
        ade20k_training_image_semantic.json  # generated
        ade20k_validation_image_semantic.json  # generated
    ADEChallengeData2016/  # symlink to $Painter_ROOT/datasets/ade20k
    coco/
        train2017/
        val2017/
        annotations/
            instances_train2017.json
            instances_val2017.json
            person_keypoints_val2017.json
            panoptic_train2017.json
            panoptic_val2017.json
            panoptic_train2017/
            panoptic_val2017/
        panoptic_semseg_val2017/  # generated
        panoptic_val2017/  # symlink to $Painter_ROOT/datasets/coco/annotations/panoptic_val2017
        pano_sem_seg/  # generated
            panoptic_segm_train2017_with_color
            panoptic_segm_val2017_with_color
            coco_train2017_image_panoptic_sem_seg.json
            coco_val2017_image_panoptic_sem_seg.json
        pano_ca_inst/  # generated
            train_aug0/
            train_aug1/
            ...
            train_aug29/
            train_org/
            train_flip/
            val_org/
            coco_train_image_panoptic_inst.json
            coco_val_image_panoptic_inst.json
    coco_pose/
        person_detection_results/
            COCO_val2017_detections_AP_H_56_person.json
        data_pair/  # generated
            train_256x192_aug0/
            train_256x192_aug1/
            ...
            train_256x192_aug19/
            val_256x192/
            test_256x192/
            test_256x192_flip/
        coco_pose_256x192_train.json  # generated
        coco_pose_256x192_val.json  # generated
    derain/
        train/
            input/
            target/
        test/
            Rain100H/
            Rain100L/
            Test100/
            Test1200/
            Test2800/
        derain_train.json  # generated
        derain_test_rain100h.json  # generated
    denoise/
        SIDD_Medium_Srgb/
        train/
        val/
        denoise_ssid_train.json  # generated
        denoise_ssid_val.json  # generated
    light_enhance/
        our485/
            low/
            high/
        eval15/
            low/
            high/
        enhance_lol_train.json  # generated
        enhance_lol_val.json  # generated

```
Please follow the instructions below to pre-process each dataset.


## NYU Depth V2

First, download the dataset from [here](https://drive.google.com/file/d/1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP/view?usp=sharing). Make sure to place the downloaded file at `$Painter_ROOT/datasets/nyu_depth_v2/sync.zip`.
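
The archive then needs to be extracted; a minimal sketch, assuming `sync.zip` unpacks into a top-level `sync/` directory:
```bash
cd $Painter_ROOT
# extract the training data; this should produce datasets/nyu_depth_v2/sync/
unzip datasets/nyu_depth_v2/sync.zip -d datasets/nyu_depth_v2/
```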

Next, prepare [NYU Depth V2](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html) test set.
```bash
# get official NYU Depth V2 split file
wget -P datasets/nyu_depth_v2/ http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat
# convert mat file to image files
python data/depth/extract_official_train_test_set_from_mat.py datasets/nyu_depth_v2/nyu_depth_v2_labeled.mat data/depth/splits.mat datasets/nyu_depth_v2/official_splits/
```

Lastly, prepare json files for training and evaluation. The generated json files will be saved at `$Painter_ROOT/datasets/nyu_depth_v2/`.
```bash
python data/depth/gen_json_nyuv2_depth.py --split sync
python data/depth/gen_json_nyuv2_depth.py --split test
```
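
As a quick sanity check, the entry counts of the generated files can be printed directly (assuming each file is a top-level json list of image/depth pairs):
```bash
# hypothetical spot check: print the number of entries per split
python -c "import json; print(len(json.load(open('datasets/nyu_depth_v2/nyuv2_sync_image_depth.json'))))"
python -c "import json; print(len(json.load(open('datasets/nyu_depth_v2/nyuv2_test_image_depth.json'))))"
```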

## ADE20k Semantic Segmentation

First, download the dataset from the [official website](https://groups.csail.mit.edu/vision/datasets/ADE20K/) and put it in `$Painter_ROOT/datasets/`. Afterwards, unzip the archive and rename the extracted folder to `ade20k`. The ADE20k folder should look like:
```
ade20k/
    images/
    annotations/
```
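
For reference, a minimal download-and-rename sketch, assuming the standard SceneParsing release archive `ADEChallengeData2016.zip`:
```bash
cd $Painter_ROOT/datasets
# the SceneParsing release ships images/ and annotations/ inside ADEChallengeData2016/
unzip ADEChallengeData2016.zip
mv ADEChallengeData2016 ade20k
```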

Second, prepare annotations for training using the following command. The generated annotations will be saved at `$Painter_ROOT/datasets/ade20k/annotations_with_color/`.
```bash
python data/ade20k/gen_color_ade20k_sem.py --split training
python data/ade20k/gen_color_ade20k_sem.py --split validation
```

Third, prepare json files for training and evaluation. The generated json files will be saved at `$Painter_ROOT/datasets/ade20k/`.
```bash
python data/ade20k/gen_json_ade20k_sem.py --split training
python data/ade20k/gen_json_ade20k_sem.py --split validation
```

Lastly, to enable evaluation with detectron2, link `$Painter_ROOT/datasets/ade20k` to `$Painter_ROOT/datasets/ADEChallengeData2016` and run:
```bash
# ln -s $Painter_ROOT/datasets/ade20k datasets/ADEChallengeData2016
python data/prepare_ade20k_sem_seg.py
```

## COCO Panoptic Segmentation
Download the COCO 2017 dataset and the corresponding panoptic segmentation annotations. The COCO folder should look like:
```
coco/
    train2017/
    val2017/
    annotations/
        instances_train2017.json
        instances_val2017.json
        panoptic_train2017.json
        panoptic_val2017.json
        panoptic_train2017/
        panoptic_val2017/
```
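
For reference, one way to fetch these files from the official COCO mirrors; a sketch, assuming the standard 2017 archives (the panoptic PNG folders ship as nested zips inside the panoptic annotations archive):
```bash
cd $Painter_ROOT/datasets && mkdir -p coco && cd coco
# images
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
# instance/keypoint annotations and panoptic annotations
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
unzip train2017.zip && unzip val2017.zip
unzip annotations_trainval2017.zip
unzip panoptic_annotations_trainval2017.zip
# unpack the nested panoptic png archives into annotations/
unzip annotations/panoptic_train2017.zip -d annotations/
unzip annotations/panoptic_val2017.zip -d annotations/
```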

### Prepare Data for COCO Semantic Segmentation
Prepare annotations for training using the following command. The generated annotations will be saved at `$Painter_ROOT/datasets/coco/pano_sem_seg/`.
```bash
python data/coco_semseg/gen_color_coco_panoptic_segm.py --split train2017
python data/coco_semseg/gen_color_coco_panoptic_segm.py --split val2017
```

Prepare json files for training and evaluation. The generated json files will be saved at `$Painter_ROOT/datasets/coco/pano_sem_seg/`.
```bash
python data/coco_semseg/gen_json_coco_panoptic_segm.py --split train2017
python data/coco_semseg/gen_json_coco_panoptic_segm.py --split val2017
```

### Prepare Data for COCO Class-Agnostic Instance Segmentation 

First, pre-process the dataset using the following commands; the painted ground truth will be saved to `$Painter_ROOT/datasets/coco/pano_ca_inst`.

```bash
cd $Painter_ROOT/data/mmdet_custom

# generate training data with common data augmentation for instance segmentation;
# note that we generate 30 copies by varying train_aug{idx} in configs/coco_panoptic_ca_inst_gen_aug.py
./tools/dist_train.sh configs/coco_panoptic_ca_inst_gen_aug.py 1
# generate training data with only horizontal flip augmentation
./tools/dist_train.sh configs/coco_panoptic_ca_inst_gen_orgflip.py 1
# generate training data w/o data augmentation
./tools/dist_train.sh configs/coco_panoptic_ca_inst_gen_org.py 1

# generate validation data (w/o data augmentation)
./tools/dist_test.sh configs/coco_panoptic_ca_inst_gen_org.py none 1 --eval segm
```
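
Given the directory layout at the top of this document, a quick check that all training copies were produced:
```bash
cd $Painter_ROOT
# expect 30 augmented copies alongside train_org/, train_flip/, and val_org/
ls -d datasets/coco/pano_ca_inst/train_aug* | wc -l   # should print 30
```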

Next, prepare json files for training and evaluation. The generated json files will be saved at `$Painter_ROOT/datasets/coco/pano_ca_inst`.
```bash
cd $Painter_ROOT
python data/mmdet_custom/gen_json_coco_panoptic_inst.py --split train
python data/mmdet_custom/gen_json_coco_panoptic_inst.py --split val
```

Lastly, to enable evaluation with detectron2, link `$Painter_ROOT/datasets/coco/annotations/panoptic_val2017` to `$Painter_ROOT/datasets/coco/panoptic_val2017` and run:
```bash
# ln -s $Painter_ROOT/datasets/coco/annotations/panoptic_val2017 datasets/coco/panoptic_val2017
python data/prepare_coco_semantic_annos_from_panoptic_annos.py
```
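
A quick way to verify the generated semantic annotations; since val2017 contains 5000 images, the same number of PNGs is expected:
```bash
# count the generated semantic segmentation pngs for val2017
find datasets/coco/panoptic_semseg_val2017 -name '*.png' | wc -l   # should print 5000
```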


## COCO Human Pose Estimation

First, download the person detection results for COCO val2017 from [google drive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk) and put them in `$Painter_ROOT/datasets/coco_pose/`.

Next, pre-process the dataset using the following commands; the painted ground truth will be saved to `$Painter_ROOT/datasets/coco_pose/`.

```bash
cd $Painter_ROOT/data/mmpose_custom

# generate training data with common data augmentation for pose estimation; note that we generate 20 copies for training
./tools/dist_train.sh configs/coco_256x192_gendata.py 1
# generate data for evaluation during training
./tools/dist_test.sh configs/coco_256x192_gendata.py none 1

# generate data for testing (using offline boxes)
./tools/dist_test.sh configs/coco_256x192_gendata_test.py none 1
# generate data for testing (using offline boxes & with flip)
./tools/dist_test.sh configs/coco_256x192_gendata_testflip.py none 1
```
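
Per the directory layout at the top of this document, pre-processing should leave 20 augmented training copies next to the val/test splits; a quick check:
```bash
cd $Painter_ROOT
# expect train_256x192_aug0 ... train_256x192_aug19
ls -d datasets/coco_pose/data_pair/train_256x192_aug* | wc -l   # should print 20
```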

Next, prepare json files for training and evaluation. The generated json files will be saved at `$Painter_ROOT/datasets/coco_pose/`.
```bash
cd $Painter_ROOT
python data/mmpose_custom/gen_json_coco_pose.py --split train
python data/mmpose_custom/gen_json_coco_pose.py --split val
```


## Low-level Vision Tasks

### Deraining
We follow [MPRNet](https://github.com/swz30/MPRNet) to prepare the data for deraining.

Download the dataset following the instructions in [MPRNet](https://github.com/swz30/MPRNet/blob/main/Deraining/Datasets/README.md), and put it in `$Painter_ROOT/datasets/derain/`. The folder should look like:
```
derain/
    train/
        input/
        target/
    test/
        Rain100H/
        Rain100L/
        Test100/
        Test1200/
        Test2800/
```
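
Since the rainy inputs and clean targets are paired, the two training folders should contain the same number of images; a quick check:
```bash
# the two counts below should match
find datasets/derain/train/input -type f | wc -l
find datasets/derain/train/target -type f | wc -l
```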

Next, prepare json files for training and evaluation. The generated json files will be saved at `$Painter_ROOT/datasets/derain/`.
```bash
python data/derain/gen_json_rain.py --split train
python data/derain/gen_json_rain.py --split val
```

### Denoising
We follow [Uformer](https://github.com/ZhendongWang6/Uformer) to prepare the SIDD denoising dataset.

For training, download the SIDD-Medium dataset from the [official URL](https://www.eecs.yorku.ca/~kamel/sidd/dataset.php). For evaluation on SIDD, download the data from [here](https://mailustceducn-my.sharepoint.com/:f:/g/personal/zhendongwang_mail_ustc_edu_cn/Ev832uKaw2JJhwROKqiXGfMBttyFko_zrDVzfSbFFDoi4Q?e=S3p5hQ).

Next, generate image patches for training with the following command:
```bash
python data/sidd/generate_patches_SIDD.py --src_dir datasets/denoise/SIDD_Medium_Srgb/Data --tar_dir datasets/denoise/train
```
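
A rough check that the patches were actually written to the target directory:
```bash
# count the extracted training patches
find datasets/denoise/train -type f | wc -l
```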

Lastly, prepare json files for training and evaluation. The generated json files will be saved at `$Painter_ROOT/datasets/denoise/`.
```bash
python data/sidd/gen_json_sidd.py --split train
python data/sidd/gen_json_sidd.py --split val
```


### Low-Light Image Enhancement

First, download the images of the LOL dataset from [google drive](https://drive.google.com/file/d/157bjO1_cFuSd0HWDUuAmcHRJDVyWpOxB/view) and put them in `$Painter_ROOT/datasets/light_enhance/`. The folder should look like:
```
light_enhance/
    our485/
        low/
        high/
    eval15/
        low/
        high/
```
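
The LOL training split contains 485 low/high pairs (hence `our485`); a quick check that the two folders line up:
```bash
# both counts should print 485
ls datasets/light_enhance/our485/low | wc -l
ls datasets/light_enhance/our485/high | wc -l
```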

Next, prepare json files for training and evaluation. The generated json files will be saved at `$Painter_ROOT/datasets/light_enhance/`.
```bash
python data/lol/gen_json_lol.py --split train
python data/lol/gen_json_lol.py --split val
```
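
Finally, a rough overall check that the generated json files are in place (the COCO `pano_sem_seg/` and `pano_ca_inst/` jsons sit one level deeper, so adjust `-maxdepth` to include them):
```bash
cd $Painter_ROOT
# list json files directly under each dataset folder
find datasets -maxdepth 2 -name '*.json' | sort
```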