__pycache__
datasets/
toy_datasets/
models
models_inference
work_dirs/
wandb
datasets
.idea
.nfs*
*.pth
log.txt
log*.txt
*log.txt
# MaskedDenoising
## Paper
[Images Speak in Images: A Generalist Painter for In-Context Visual Learning](https://arxiv.org/abs/2212.02499)
## Model Architecture
<div align=center>
<img src="./doc/method.jpg"/>
</div>
## Algorithm Overview
<div align=center>
<img src="./doc/progress.png"/>
</div>
## Environment Setup
### Docker (Method 1)
Adjust the `-v` mount paths, `docker_name`, and `imageID` below to match your setup.
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py38-latest
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
cd /your_code_path/maskeddenoising_pytorch
pip install --upgrade setuptools wheel
pip install -r requirement.txt
```
### Dockerfile (Method 2)
Adjust the `-v` mount paths and `docker_name` below to match your setup.
```bash
cd ./docker
cp ../requirement.txt requirement.txt
docker build --no-cache -t maskeddenoising:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name maskeddenoising:latest bash
cd /your_code_path/maskeddenoising_pytorch
pip install --upgrade setuptools wheel
pip install -r requirement.txt
```
### Anaconda (Method 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded from the developer community: https://developer.hpccube.com/tool/
```
DTK stack: dtk23.04.1
python: python3.8
torch: 1.13.1
torchvision: 0.14.1
```
Note: the DTK stack, Python, torch, and the other DCU-related tool versions above must match each other exactly.
2. Install the remaining (non-DCU-specific) libraries from requirement.txt:
```bash
pip install --upgrade setuptools wheel
pip install -r requirement.txt
```
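
After installation, you can quickly confirm that the environment matches the versions above and that the DCU devices are visible. The sketch below is only a sanity check, not part of the repository; on DCU/ROCm builds the devices are exposed through the standard `torch.cuda` API.
```python
# Sanity check: verify library versions and DCU device visibility.
import torch
import torchvision

print("torch:", torch.__version__)              # expected 1.13.1
print("torchvision:", torchvision.__version__)  # expected 0.14.1
print("device available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:", torch.cuda.device_count())
    print("device 0:", torch.cuda.get_device_name(0))
```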
## Datasets
### Environment setup for dataset preparation
#### ADE20K Semantic Segmentation
```bash
git clone https://github.com/facebookresearch/detectron2
python -m pip install -e detectron2
```
#### COCO Panoptic Segmentation
```bash
pip install openmim  # tested with 0.3.9
mim install mmcv-full  # check that the installed version is 1.7.1
pip install mmdet==2.26.0  # matches mmcv-full 1.7.1
pip install yapf==0.40.1
```
#### COCO Pose Estimation
```bash
pip install mmcv==1.3.9
pip install mmpose==0.29.0
```
Alternatively, mmpose can be installed from source:
```bash
# choose commit id `8c58a18b`
git clone https://github.com/open-mmlab/mmpose.git
cd mmpose
pip install -r requirements.txt
pip install -v -e .
```
### Dataset download
The project requires quite a few datasets. To verify functionality you can use the provided [a toy training dataset](https://huggingface.co/BAAI/Painter/blob/main/toy_datasets.tar), which contains 10 samples from each task. Place it under `$Painter_ROOT/toy_datasets` and set `DATA_PATH=toy_datasets` in `$Painter_ROOT/train_painter_vit_large.sh`.
The full set of required datasets is listed below:
#### NYU Depth V2
First, download the dataset [here](https://drive.google.com/file/d/1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP/view?usp=sharing) and make sure it is saved as `$Painter_ROOT/datasets/nyu_depth_v2/sync.zip`.
Next, prepare the [NYU Depth V2 test](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html) set.
```bash
# download the official NYU Depth V2 split file
wget -P datasets/nyu_depth_v2/ http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat
# convert the mat file into image files
python data/depth/extract_official_train_test_set_from_mat.py datasets/nyu_depth_v2/nyu_depth_v2_labeled.mat data/depth/splits.mat datasets/nyu_depth_v2/official_splits/
```
Finally, prepare the json files needed for training and validation; by default they are saved to `$Painter_ROOT/datasets/nyu_depth_v2/`.
```bash
python data/depth/gen_json_nyuv2_depth.py --split sync
python data/depth/gen_json_nyuv2_depth.py --split test
```
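
To confirm the pair lists were generated correctly, a small check such as the following can be used. It is not part of the repository and only assumes the format written by `data/depth/gen_json_nyuv2_depth.py`: a list of `{"image_path", "target_path", "type"}` entries with paths relative to `datasets/`.
```python
# Quick sanity check of the generated NYU Depth V2 pair list.
import json
import os

with open("datasets/nyu_depth_v2/nyuv2_sync_image_depth.json") as f:
    pairs = json.load(f)

print(len(pairs), "image/depth pairs")
print(pairs[0]["type"], pairs[0]["image_path"], "->", pairs[0]["target_path"])
# spot-check that the first few referenced files actually exist
for pair in pairs[:100]:
    assert os.path.isfile(os.path.join("datasets", pair["image_path"]))
    assert os.path.isfile(os.path.join("datasets", pair["target_path"]))
```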
#### ADE20k Semantic Segmentation
First, download the dataset from the [official website](https://groups.csail.mit.edu/vision/datasets/ADE20K/) and place it under `$Painter_ROOT/datasets/`.
Next, unzip the file and rename the folder to `ade20k`. The resulting ade20k directory structure looks like this:
```bash
ade20k/
images/
annotations/
```
Second, run the commands below to prepare the annotations needed for training and validation; by default they are saved to `$Painter_ROOT/datasets/ade20k/annotations_with_color/`.
```bash
python data/ade20k/gen_color_ade20k_sem.py --split training
python data/ade20k/gen_color_ade20k_sem.py --split validation
```
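
For intuition, the sketch below (an illustration only, not part of the pipeline) mirrors the color mapping used by `data/ade20k/gen_color_ade20k_sem.py`: each of the 150 ADE20k class ids is assigned a well-separated RGB color so that predictions can later be decoded back into class ids.
```python
# Reproduce the class-id -> RGB mapping used when painting the annotations.
num_classes = 150
num_sep = int(num_classes ** (1 / 3)) + 1  # 6 separations per channel
step = 256 // num_sep                      # 42

def class_color(class_id):
    # color index = class id - 1; class 0 (background) is left black
    loc = class_id - 1
    r = 255 - (loc // num_sep ** 2) * step
    g = 255 - ((loc % num_sep ** 2) // num_sep) * step
    b = 255 - (loc % num_sep) * step
    return (r, g, b)

print(class_color(1))    # (255, 255, 255)
print(class_color(2))    # (255, 255, 213)
print(class_color(150))  # (87, 255, 45)
```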
Third, prepare the json files needed for training and validation; by default they are saved to `$Painter_ROOT/datasets/ade20k/`.
```bash
python data/ade20k/gen_json_ade20k_sem.py --split training
python data/ade20k/gen_json_ade20k_sem.py --split validation
```
Finally, to enable evaluation with detectron2, create a symlink from `$Painter_ROOT/datasets/ade20k` to `$Painter_ROOT/datasets/ADEChallengeData2016`, then run:
```bash
# create the symlink
# ln -s $Painter_ROOT/datasets/ade20k datasets/ADEChallengeData2016
# run the preparation script
python data/prepare_ade20k_sem_seg.py
```
#### COCO Panoptic Segmentation
Download the COCO 2017 images and the corresponding panoptic segmentation annotations. The resulting COCO directory structure looks like this:
```
coco/
train2017/
val2017/
annotations/
instances_train2017.json
instances_val2017.json
panoptic_train2017.json
panoptic_val2017.json
panoptic_train2017/
panoptic_val2017/
```
1. Prepare COCO semantic segmentation
Prepare the annotations needed for training; by default they are saved to `$Painter_ROOT/datasets/coco/pano_sem_seg/`.
```bash
python data/coco_semseg/gen_color_coco_panoptic_segm.py --split train2017
python data/coco_semseg/gen_color_coco_panoptic_segm.py --split val2017
```
Prepare the json files for training and validation; by default they are saved to `$Painter_ROOT/datasets/coco/pano_sem_seg/`.
```bash
python data/coco_semseg/gen_json_coco_panoptic_segm.py --split train2017
python data/coco_semseg/gen_json_coco_panoptic_segm.py --split val2017
```
2. Prepare COCO class-agnostic instance segmentation
First, preprocess the data with the commands below; by default the painted ground truth is saved to `$Painter_ROOT/datasets/coco/pano_ca_inst`.
```bash
cd $Painter_ROOT/data/mmdet_custom
# generate training data with common augmentation for instance segmentation; 30 copies train_aug{idx} are generated by switching idx in configs/coco_panoptic_ca_inst_gen_aug.py
./tools/dist_train.sh configs/coco_panoptic_ca_inst_gen_aug.py 1
# generate training data with horizontal-flip augmentation only
./tools/dist_train.sh configs/coco_panoptic_ca_inst_gen_orgflip.py 1
# generate training data without augmentation
./tools/dist_train.sh configs/coco_panoptic_ca_inst_gen_org.py 1
# generate validation data (no augmentation)
./tools/dist_test.sh configs/coco_panoptic_ca_inst_gen_org.py none 1 --eval segm
```
Then, prepare the json files for training and validation; by default they are saved to `$Painter_ROOT/datasets/coco/pano_ca_inst`.
```bash
cd $Painter_ROOT
python data/mmdet_custom/gen_json_coco_panoptic_inst.py --split train
python data/mmdet_custom/gen_json_coco_panoptic_inst.py --split val
```
Finally, to enable evaluation with detectron2, create a symlink from `$Painter_ROOT/datasets/coco/annotations/panoptic_val2017` to `$Painter_ROOT/datasets/coco/panoptic_val2017` and run:
```bash
# create the symlink
# ln -s $Painter_ROOT/datasets/coco/annotations/panoptic_val2017 datasets/coco/panoptic_val2017
# run the preparation script
python data/prepare_coco_semantic_annos_from_panoptic_annos.py
```
#### COCO Human Pose Estimation
First, download the person detection results for COCO val2017 from [google drive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk) and place them under `$Painter_ROOT/datasets/coco_pose/`.
Then, preprocess the data with the commands below; by default the painted ground truth is saved to `$Painter_ROOT/datasets/coco_pose/`.
```bash
cd $Painter_ROOT/data/mmpose_custom
# generate training data with common augmentation for pose estimation; this project generates 20 copies for training, so change the aug_idx parameter at line 52 of coco_256x192_gendata.py accordingly (default 0); see the sketch after this block
./tools/dist_train.sh configs/coco_256x192_gendata.py 1
# generate data used for validation during training
./tools/dist_test.sh configs/coco_256x192_gendata.py none 1
# generate test data (using offline boxes)
./tools/dist_test.sh configs/coco_256x192_gendata_test.py none 1
# generate test data (using offline boxes + flip)
./tools/dist_test.sh configs/coco_256x192_gendata_testflip.py none 1
```
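
Generating the 20 augmented copies means editing `aug_idx` and re-running the first command once per copy. The hypothetical helper below (not part of the repository) automates this; it assumes the config contains a line of the form `aug_idx = <int>` (line 52, as noted above) and that it is run from `$Painter_ROOT/data/mmpose_custom`.
```python
# Hypothetical helper: rewrite aug_idx in the config and regenerate each copy.
import re
import subprocess

CFG = "configs/coco_256x192_gendata.py"

for idx in range(20):
    with open(CFG) as f:
        cfg = f.read()
    cfg = re.sub(r"aug_idx\s*=\s*\d+", f"aug_idx = {idx}", cfg, count=1)
    with open(CFG, "w") as f:
        f.write(cfg)
    subprocess.run(["./tools/dist_train.sh", CFG, "1"], check=True)
```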
Next, prepare the json files for training and validation; by default they are saved under `datasets/coco_pose/`.
```bash
cd $Painter_ROOT
python data/mmpose_custom/gen_json_coco_pose.py --split train
python data/mmpose_custom/gen_json_coco_pose.py --split val
```
#### Low-level Vision Tasks
##### Deraining
Follow [MPRNet](https://github.com/swz30/MPRNet) to prepare the deraining data.
Download the datasets following the instructions in [MPRNet](https://github.com/swz30/MPRNet/blob/main/Deraining/Datasets/README.md) and save them under `$Painter_ROOT/datasets/derain/`. The resulting derain directory structure looks like this:
```
derain/
train/
input/
target/
test/
Rain100H/
Rain100L/
Test100/
Test1200/
Test2800/
```
Next, prepare the json files for training and validation with the commands below. The generated json files are saved under `datasets/derain/`.
```bash
python data/derain/gen_json_rain.py --split train
python data/derain/gen_json_rain.py --split val
```
##### Denoising
Follow [Uformer](https://github.com/ZhendongWang6/Uformer) to prepare the SIDD denoising dataset.
For training, download the SIDD-Medium dataset from the [official url](https://www.eecs.yorku.ca/~kamel/sidd/dataset.php);
for validation, the SIDD data can be downloaded [here](https://mailustceducn-my.sharepoint.com/:f:/g/personal/zhendongwang_mail_ustc_edu_cn/Ev832uKaw2JJhwROKqiXGfMBttyFko_zrDVzfSbFFDoi4Q?e=S3p5hQ).
Next, generate the image patches used for training:
```bash
python data/sidd/generate_patches_SIDD.py --src_dir datasets/denoise/SIDD_Medium_Srgb/Data --tar_dir datasets/denoise/train
```
Finally, prepare the json files for training and validation; they are saved under `datasets/denoise/`.
```bash
python data/sidd/gen_json_sidd.py --split train
python data/sidd/gen_json_sidd.py --split val
```
##### Low-Light Image Enhancement
First, download the LOL dataset from [google drive](https://drive.google.com/file/d/157bjO1_cFuSd0HWDUuAmcHRJDVyWpOxB/view) and place it under `$Painter_ROOT/datasets/light_enhance/`. The resulting LOL directory structure looks like this:
```
light_enhance/
our485/
low/
high/
eval15/
low/
high/
```
Next, prepare the json files for training and evaluation. The generated json files will be saved at `$Painter_ROOT/datasets/light_enhance/`.
```
python data/lol/gen_json_lol.py --split train
python data/lol/gen_json_lol.py --split val
```
The final layout of the `datasets/` directory is as follows:
```
├── nyu_depth_v2/
│ ├── sync/
│ ├── official_splits/
│ ├── nyu_depth_v2_labeled.mat
│ ├── nyuv2_sync_image_depth.json # generated
│ ├── nyuv2_test_image_depth.json # generated
├── ade20k/
│ ├── images/
│ ├── annotations/
│ ├── annotations_detectron2/ # generated
│ ├── annotations_with_color/ # generated
│ ├── ade20k_training_image_semantic.json # generated
│ ├── ade20k_validation_image_semantic.json # generated
├── ADEChallengeData2016/ # symlink to $Painter_ROOT/datasets/ade20k
├── coco/
│ ├── train2017/
│ ├── val2017/
│ ├── annotations/
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ ├── person_keypoints_val2017.json
│ ├── panoptic_train2017.json
│ ├── panoptic_val2017.json
│ ├── panoptic_train2017/
│ ├── panoptic_val2017/
│ ├── panoptic_semseg_val2017/ # generated
│ ├── panoptic_val2017/ # symlink to $Painter_ROOT/datasets/coco/annotations/panoptic_val2017
│ ├── pano_sem_seg/ # generated
│ ├── panoptic_segm_train2017_with_color
│ ├── panoptic_segm_val2017_with_color
│ ├── coco_train2017_image_panoptic_sem_seg.json
│ ├── coco_val2017_image_panoptic_sem_seg.json
│ ├── pano_ca_inst/ # generated
│ ├── train_aug0/
│ ├── train_aug1/
│ ├── ...
│ ├── train_aug29/
│ ├── train_org/
│ ├── train_flip/
│ ├── val_org/
│ ├── coco_train_image_panoptic_inst.json
│ ├── coco_val_image_panoptic_inst.json
├── coco_pose/
│ ├── person_detection_results/
│ ├── COCO_val2017_detections_AP_H_56_person.json
│ ├── data_pair/ # generated
│ ├── train_256x192_aug0/
│ ├── train_256x192_aug1/
│ ├── ...
│ ├── train_256x192_aug19/
│ ├── val_256x192/
│ ├── test_256x192/
│ ├── test_256x192_flip/
│ ├── coco_pose_256x192_train.json # generated
│ ├── coco_pose_256x192_val.json # generated
├── derain/
│ ├── train/
│ ├── input/
│ ├── target/
│ ├── test/
│ ├── Rain100H/
│ ├── Rain100L/
│ ├── Test100/
│ ├── Test1200/
│ ├── Test2800/
│ ├── derain_train.json
│ ├── derain_test_rain100h.json
├── denoise/
│ ├── SIDD_Medium_Srgb/
│ ├── train/
│ ├── val/
│ ├── denoise_ssid_train.json # generated
│ ├── denoise_ssid_val.json # generated
├── light_enhance/
│ ├── our485/
│ ├── low/
│ ├── high/
│ ├── eval15/
│ ├── low/
│ ├── high/
│ ├── enhance_lol_train.json # generated
│ ├── enhance_lol_val.json # generated
```
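
Before training, it can be worth confirming that every file marked `# generated` above actually exists. The check below is optional, not part of the repository, and simply walks the json file names from the tree.
```python
# Optional check that all generated pair-list json files are in place.
import os

expected = [
    "nyu_depth_v2/nyuv2_sync_image_depth.json",
    "nyu_depth_v2/nyuv2_test_image_depth.json",
    "ade20k/ade20k_training_image_semantic.json",
    "ade20k/ade20k_validation_image_semantic.json",
    "coco/pano_sem_seg/coco_train2017_image_panoptic_sem_seg.json",
    "coco/pano_sem_seg/coco_val2017_image_panoptic_sem_seg.json",
    "coco/pano_ca_inst/coco_train_image_panoptic_inst.json",
    "coco/pano_ca_inst/coco_val_image_panoptic_inst.json",
    "coco_pose/coco_pose_256x192_train.json",
    "coco_pose/coco_pose_256x192_val.json",
    "derain/derain_train.json",
    "derain/derain_test_rain100h.json",
    "denoise/denoise_ssid_train.json",
    "denoise/denoise_ssid_val.json",
    "light_enhance/enhance_lol_train.json",
    "light_enhance/enhance_lol_val.json",
]
missing = [p for p in expected if not os.path.isfile(os.path.join("datasets", p))]
print("all generated json files present" if not missing else f"missing: {missing}")
```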
## Training
Download the pretrained [MAE ViT-Large model](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth) and set the `finetune` argument in `$Painter_ROOT/train_painter_vit_large.sh` to its path.
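
Before launching training, you may want to confirm that the downloaded weights load cleanly. The sketch below is optional; the path is a placeholder for wherever you saved `mae_pretrain_vit_large.pth`, and MAE releases typically store the weights under a `model` key.
```python
# Optional: inspect the MAE ViT-Large checkpoint before pointing `finetune` at it.
import torch

ckpt = torch.load("models/mae_pretrain_vit_large.pth", map_location="cpu")  # adjust path
state_dict = ckpt.get("model", ckpt)  # fall back to the raw dict if no "model" key
print(len(state_dict), "tensors")
print(sorted(state_dict.keys())[:5])
```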
### Single machine, multiple GPUs
#### Standard training
```
bash train_painter_vit_large.sh
```
#### Distributed training
```
bash train_multi.sh
```
## Inference
Download the inference model from [🤗 Hugging Face Models](https://huggingface.co/BAAI/Painter/blob/main/painter_vit_large.pth). Evaluation on the individual tasks is described below.
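
If you prefer to fetch the checkpoint programmatically rather than through the browser, recent versions of `huggingface_hub` can download it directly; the repo id and filename below come from the link above, and `local_dir` is chosen to match the path used by the evaluation commands.
```python
# Download the released Painter checkpoint with huggingface_hub (optional).
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="BAAI/Painter",
    filename="painter_vit_large.pth",
    local_dir="models/painter_vit_large",
)
print("checkpoint saved to", ckpt_path)
```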
## NYU Depth V2
To evaluate Painter on NYU Depth V2, you may first update the `$JOB_NAME` in `$Painter_ROOT/eval/nyuv2_depth/eval.sh`, then run:
```bash
bash eval/nyuv2_depth/eval.sh
```
## ADE20k Semantic Segmentation
To evaluate Painter on ADE20k semantic segmentation, you may first update the `$JOB_NAME` in `$Painter_ROOT/eval/ade20k_semantic/eval.sh`, then run:
```bash
bash eval/ade20k_semantic/eval.sh
```
## COCO Panoptic Segmentation
To evaluate Painter on COCO panoptic segmentation, you may first update the `$JOB_NAME` in `$Painter_ROOT/eval/coco_panoptic/eval.sh`, then run:
```bash
bash eval/coco_panoptic/eval.sh
```
## COCO Human Pose Estimation
To evaluate Painter on COCO pose estimation, first generate the painted images:
```bash
python -m torch.distributed.launch --nproc_per_node=8 --master_port=29500 --use_env eval/mmpose_custom/painter_inference_pose.py --ckpt_path models/painter_vit_large/painter_vit_large.pth
python -m torch.distributed.launch --nproc_per_node=8 --master_port=29500 --use_env eval/mmpose_custom/painter_inference_pose.py --ckpt_path models/painter_vit_large/painter_vit_large.pth --flip_test
```
Then, you may update the `job_name` and `ckpt_file` in `$Painter_ROOT/eval/mmpose_custom/configs/coco_256x192_test_offline.py`, and run:
```bash
cd $Painter_ROOT/eval/mmpose_custom
./tools/dist_test.sh configs/coco_256x192_test_offline.py none 1 --eval mAP
```
## Low-level Vision Tasks
### Deraining
To evaluate Painter on deraining, first generate the derained images.
```bash
python eval/derain/painter_inference_derain.py --ckpt_path models/painter_vit_large/painter_vit_large.pth
```
Then, update the path to derained images and ground truth in `$Painter_ROOT/eval/derain/evaluate_PSNR_SSIM.m` and run the following script in MATLAB.
```bash
$Painter_ROOT/eval/derain/evaluate_PSNR_SSIM.m
```
### Denoising
To evaluate Painter on SIDD denoising, first generate the denoised images.
```bash
python eval/sidd/painter_inference_sidd.py --ckpt_path models/painter_vit_large/painter_vit_large.pth
```
Then, update the path to denoising output and ground truth in `$Painter_ROOT/eval/sidd/eval_sidd.m` and run the following script in MATLAB.
```bash
$Painter_ROOT/eval/sidd/eval_sidd.m
```
### Low-Light Image Enhancement
To evaluate Painter on LoL image enhancement:
```bash
python eval/lol/painter_inference_lol.py --ckpt_path models/painter_vit_large/painter_vit_large.pth
```
#### Single-GPU inference
```
bash test.sh
```
## Results
A single-image example of results on the local test set:
<div align=center>
<img src="./doc/origin.png"/>
</div>
<div align=center>
<img src="./doc/results.png"/>
</div>
### Accuracy
Single-GPU test results on the test data provided with the project:
| | PSNR | SSIM | LPIPS |
| :------: | :------: | :------: | :------: |
| ours | 29.04 | 0.7615 | 0.1294 |
| paper | 30.13 | 0.7981 | 0.1031 |
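
The PSNR / SSIM / LPIPS numbers above can be reproduced approximately with scikit-image and the `lpips` package, as in the sketch below. It assumes a pair of same-sized result/ground-truth PNGs (the file names are placeholders); the paper's exact evaluation protocol (color space, border handling, averaging) may differ.
```python
# Approximate PSNR / SSIM / LPIPS for one result/ground-truth image pair.
import numpy as np
import torch
import lpips
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

pred = np.array(Image.open("result.png").convert("RGB"))  # placeholder path
gt = np.array(Image.open("gt.png").convert("RGB"))        # placeholder path

psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)

# LPIPS expects float tensors in [-1, 1], shaped (N, 3, H, W)
to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1).float()[None] / 127.5 - 1.0
lp = lpips.LPIPS(net="alex")(to_tensor(pred), to_tensor(gt)).item()

print(f"PSNR {psnr:.2f}  SSIM {ssim:.4f}  LPIPS {lp:.4f}")
```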
## Application Scenarios
### Algorithm category
Image denoising
### Key application industries
Transportation, public security, manufacturing
## Source Repository and Issue Feedback
http://developer.hpccube.com/codes/modelzoo/maskeddenoising_pytorch.git
## References
https://github.com/haoyuc/MaskedDenoising.git
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import argparse
import json
import tqdm
import sys
sys.path.insert(0, "data")
import numpy as np
from PIL import Image
def unique(ar, return_index=False, return_inverse=False, return_counts=False):
"copied from https://github.com/CSAILVision/semantic-segmentation-pytorch/blob/master/mit_semseg/utils.py"
ar = np.asanyarray(ar).flatten()
optional_indices = return_index or return_inverse
optional_returns = optional_indices or return_counts
if ar.size == 0:
if not optional_returns:
ret = ar
else:
ret = (ar,)
if return_index:
                ret += (np.empty(0, bool),)  # np.bool was removed in NumPy 1.24
            if return_inverse:
                ret += (np.empty(0, bool),)
if return_counts:
ret += (np.empty(0, np.intp),)
return ret
if optional_indices:
perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
aux = ar[perm]
else:
ar.sort()
aux = ar
flag = np.concatenate(([True], aux[1:] != aux[:-1]))
if not optional_returns:
ret = aux[flag]
else:
ret = (aux[flag],)
if return_index:
ret += (perm[flag],)
if return_inverse:
iflag = np.cumsum(flag) - 1
inv_idx = np.empty(ar.shape, dtype=np.intp)
inv_idx[perm] = iflag
ret += (inv_idx,)
if return_counts:
idx = np.concatenate(np.nonzero(flag) + ([ar.size],))
ret += (np.diff(idx),)
return ret
def colorEncode(labelmap, colors, mode='RGB'):
"Modified from https://github.com/CSAILVision/semantic-segmentation-pytorch/blob/master/mit_semseg/utils.py"
labelmap = labelmap.astype('int')
labelmap_rgb = np.zeros((labelmap.shape[0], labelmap.shape[1], 3),
dtype=np.uint8)
for label in unique(labelmap):
if label <= 0:
continue
# note the color_index = class_index - 1
labelmap_rgb += (labelmap == label)[:, :, np.newaxis] * \
np.tile(np.array(colors[label-1], dtype=np.uint8), (labelmap.shape[0], labelmap.shape[1], 1))
if mode == 'BGR':
return labelmap_rgb[:, :, ::-1]
else:
return labelmap_rgb
def define_colors_per_location_mean_sep():
num_locations = 150
    num_sep_per_channel = int(num_locations ** (1 / 3)) + 1  # 6 separations per channel
separation_per_channel = 256 // num_sep_per_channel
color_list = []
for location in range(num_locations):
num_seq_r = location // num_sep_per_channel ** 2
num_seq_g = (location % num_sep_per_channel ** 2) // num_sep_per_channel
num_seq_b = location % num_sep_per_channel
assert (num_seq_r <= num_sep_per_channel) and (num_seq_g <= num_sep_per_channel) \
and (num_seq_b <= num_sep_per_channel)
R = 255 - num_seq_r * separation_per_channel
G = 255 - num_seq_g * separation_per_channel
B = 255 - num_seq_b * separation_per_channel
assert (R < 256) and (G < 256) and (B < 256)
assert (R >= 0) and (G >= 0) and (B >= 0)
assert (R, G, B) not in color_list
color_list.append((R, G, B))
# print(location, (num_seq_r, num_seq_g, num_seq_b), (R, G, B))
return color_list
PALETTE = define_colors_per_location_mean_sep()
def get_args_parser():
parser = argparse.ArgumentParser('ADE20k semantic segmentation preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['training', 'validation'], required=True)
return parser.parse_args()
if __name__ == '__main__':
args = get_args_parser()
image_dir = os.path.join("datasets/ade20k/images", args.split)
segm_dir = os.path.join("datasets/ade20k/annotations", args.split)
save_dir = os.path.join("datasets/ade20k/annotations_with_color", args.split)
if not os.path.exists(save_dir):
os.makedirs(save_dir)
color_list = define_colors_per_location_mean_sep()
segm_path_list = glob.glob(os.path.join(segm_dir, '*.png'))
for segm_path in tqdm.tqdm(segm_path_list):
# check files
file_name = os.path.basename(segm_path)
# in ade20k, images are jpegs, while segms are pngs
image_path = os.path.join(image_dir, file_name.replace('.png', '.jpg'))
assert os.path.isfile(segm_path)
assert os.path.isfile(image_path)
# paint colors on segm
segm = Image.open(segm_path)
segm_color = colorEncode(labelmap=np.array(segm), colors=color_list).astype(np.uint8)
segm_color = Image.fromarray(segm_color)
segm_color.save(os.path.join(save_dir, file_name))
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import json
import tqdm
import argparse
def get_args_parser():
parser = argparse.ArgumentParser('ADE20k semantic segmentation preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['training', 'validation'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/ade20k')
return parser.parse_args()
if __name__ == '__main__':
args = get_args_parser()
image_dir = os.path.join("datasets/ade20k/images", args.split)
annos_dir = os.path.join("datasets/ade20k/annotations_with_color", args.split)
save_path = os.path.join(args.output_dir, "ade20k_{}_image_semantic.json".format(args.split))
output_dict = []
image_path_list = glob.glob(os.path.join(image_dir, '*g'))
for image_path in tqdm.tqdm(image_path_list):
image_name = image_path.split('/')[-1].split('.')[0]
image_path = os.path.join(image_dir, image_name + '.jpg')
panoptic_path = os.path.join(annos_dir, image_name + '.png')
assert os.path.isfile(image_path)
assert os.path.isfile(panoptic_path)
pair_dict = {}
pair_dict["image_path"] = os.path.join("ade20k/images/{}/".format(args.split), image_name + ".jpg")
pair_dict["target_path"] = "ade20k/annotations_with_color/{}/".format(args.split) + image_name + ".png"
pair_dict["type"] = "ade20k_image2semantic"
output_dict.append(pair_dict)
json.dump(output_dict, open(save_path, 'w'))
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import copy
import os
import argparse
import glob
import json
import warnings
import tqdm
import sys
sys.path.insert(0, "data")
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
# from .gen_color_coco_stuff_sem import PALETTE
# from .gen_color_ade20k_sem import unique, colorEncode
from skimage.segmentation import find_boundaries
from panopticapi.utils import rgb2id, IdGenerator
# define colors according to mean separation
def define_colors_by_mean_sep(num_colors=133, channelsep=7):
num_sep_per_channel = channelsep
separation_per_channel = 256 // num_sep_per_channel
color_dict = {}
# R = G = B = 0
# B += separation_per_channel # offset for the first loop
for location in range(num_colors):
num_seq_r = location // num_sep_per_channel ** 2
num_seq_g = (location % num_sep_per_channel ** 2) // num_sep_per_channel
num_seq_b = location % num_sep_per_channel
assert (num_seq_r <= num_sep_per_channel) and (num_seq_g <= num_sep_per_channel) \
and (num_seq_b <= num_sep_per_channel)
R = 255 - num_seq_r * separation_per_channel
G = 255 - num_seq_g * separation_per_channel
B = 255 - num_seq_b * separation_per_channel
assert (R < 256) and (G < 256) and (B < 256)
assert (R >= 0) and (G >= 0) and (B >= 0)
assert (R, G, B) not in color_dict.values()
color_dict[location] = (R, G, B)
# print(location, (num_seq_r, num_seq_g, num_seq_b), (R, G, B))
return color_dict
def load_image_with_retry(image_path):
while True:
try:
img = Image.open(image_path)
return img
except OSError as e:
print(f"Catched exception: {str(e)}. Re-trying...")
import time
time.sleep(1)
def get_args_parser():
parser = argparse.ArgumentParser('COCO panoptic segmentation preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['train2017', 'val2017'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/coco/pano_sem_seg')
return parser.parse_args()
if __name__ == '__main__':
args = get_args_parser()
split = args.split
channelsep = 7
json_file = 'datasets/coco/annotations/panoptic_{}.json'.format(split)
segmentations_folder = 'datasets/coco/annotations/panoptic_{}'.format(split)
img_folder = 'datasets/coco/{}'.format(split)
panoptic_coco_categories = 'data/panoptic_coco_categories.json'
output_dir = os.path.join(args.output_dir, 'panoptic_segm_{}_with_color'.format(split))
print(output_dir)
if not os.path.exists(output_dir):
os.makedirs(output_dir)
else:
warnings.warn("{} exists! make sure to overwrite?".format(output_dir))
# raise NotImplementedError("{} exists! make sure to overwrite?".format(output_dir))
# load cat info
with open(panoptic_coco_categories, 'r') as f:
categories_list = json.load(f)
categories = {category['id']: category for category in categories_list}
catid2colorid = {category['id']: idx for idx, category in enumerate(categories_list)}
# define colors (dict of cat_id to color mapper)
num_colors = len(categories)
color_dict = define_colors_by_mean_sep(num_colors=num_colors, channelsep=channelsep)
# load image annos
with open(json_file, 'r') as f:
coco_d = json.load(f)
num_iscrowd = 0
for ann in tqdm.tqdm(coco_d['annotations']):
# save the time for loading images
# # find input img that correspond to the annotation
segmentation_org = np.array(
load_image_with_retry(os.path.join(segmentations_folder, ann['file_name'])),
dtype=np.uint8
)
segmentation_id = rgb2id(segmentation_org)
image_height_segm, image_width_segm = segmentation_org.shape[0], segmentation_org.shape[1]
image_height, image_width = image_height_segm, image_width_segm
segmentation = copy.deepcopy(segmentation_org)
segmentation[:, :, :] = 0
boxes = [seg['bbox'] for seg in ann['segments_info']] # x, y, w, h
if len(boxes) == 0:
print("bbox is empty!")
continue
boxes = np.array(boxes) # (num_boxes, 4)
for segment_info in ann['segments_info']:
# retrieval color using class id
catid = segment_info['category_id']
colorid = catid2colorid[catid]
color = color_dict[colorid]
# paint color
mask = segmentation_id == segment_info['id']
segmentation[mask] = color
segmentation = Image.fromarray(segmentation)
segmentation.save(os.path.join(output_dir, ann['file_name']))
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import json
import tqdm
import argparse
def get_args_parser():
parser = argparse.ArgumentParser('COCO semantic segmentation preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['train2017', 'val2017'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/coco/pano_sem_seg/')
return parser.parse_args()
if __name__ == "__main__":
args = get_args_parser()
image_dir = "datasets/coco/{}/".format(args.split)
panoptic_dir = 'datasets/coco/pano_sem_seg/panoptic_segm_{}_with_color/'.format(args.split)
save_path = os.path.join(args.output_dir, "coco_{}_image_panoptic_sem_seg.json".format(args.split))
print(save_path)
output_dict = []
image_path_list = glob.glob(image_dir + '*g')
for image_path in tqdm.tqdm(image_path_list):
image_name = image_path.split('/')[-1].split('.')[0]
image_path = os.path.join(image_dir, image_name+'.jpg')
panoptic_path = os.path.join(panoptic_dir, image_name+'.png')
assert os.path.isfile(image_path)
if not os.path.isfile(panoptic_path):
print("ignore {}".format(image_path))
continue
pair_dict = {}
pair_dict["image_path"] = image_path.replace('datasets/', '')
pair_dict["target_path"] = panoptic_path.replace('datasets/', '')
pair_dict["type"] = "coco_image2panoptic_sem_seg"
output_dict.append(pair_dict)
json.dump(output_dict, open(save_path, 'w'))
# Source: https://stackoverflow.com/a/39225039
import requests
def download_file_from_google_drive(id, destination):
def get_confirm_token(response):
for key, value in response.cookies.items():
if key.startswith('download_warning'):
return value
return None
def save_response_content(response, destination):
CHUNK_SIZE = 32768
with open(destination, "wb") as f:
for chunk in response.iter_content(CHUNK_SIZE):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
URL = "https://docs.google.com/uc?export=download"
session = requests.Session()
response = session.get(URL, params = { 'id' : id }, stream = True)
token = get_confirm_token(response)
if token:
params = { 'id' : id, 'confirm' : token }
response = session.get(URL, params = params, stream = True)
save_response_content(response, destination)
if __name__ == "__main__":
import sys
if len(sys.argv) != 3:
print("Usage: python google_drive.py drive_file_id destination_file_path")
else:
# TAKE ID FROM SHAREABLE LINK
file_id = sys.argv[1]
# DESTINATION FILE ON YOUR DISK
destination = sys.argv[2]
download_file_from_google_drive(file_id, destination)
# Copyright (C) 2019 Jin Han Lee
#
# This file is a part of BTS.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>
from __future__ import absolute_import, division, print_function
import os
import argparse
import fnmatch
import cv2
import numpy as np
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
def convert_arg_line_to_args(arg_line):
for arg in arg_line.split():
if not arg.strip():
continue
yield arg
parser = argparse.ArgumentParser(description='BTS TensorFlow implementation.', fromfile_prefix_chars='@')
parser.convert_arg_line_to_args = convert_arg_line_to_args
parser.add_argument('--pred_path', type=str, help='path to the prediction results in png', required=True)
parser.add_argument('--gt_path', type=str, help='root path to the groundtruth data', required=False,
default='datasets/nyuv2/official_splits/test/')
parser.add_argument('--dataset', type=str, help='dataset to test on, nyu or kitti', default='nyu')
parser.add_argument('--eigen_crop', help='if set, crops according to Eigen NIPS14', action='store_true')
parser.add_argument('--garg_crop', help='if set, crops according to Garg ECCV16', action='store_true')
parser.add_argument('--min_depth_eval', type=float, help='minimum depth for evaluation', default=1e-3)
parser.add_argument('--max_depth_eval', type=float, help='maximum depth for evaluation', default=10)
parser.add_argument('--do_kb_crop', help='if set, crop input images as kitti benchmark images', action='store_true')
args = parser.parse_args()
def compute_errors(gt, pred):
thresh = np.maximum((gt / pred), (pred / gt))
d1 = (thresh < 1.25).mean()
d2 = (thresh < 1.25 ** 2).mean()
d3 = (thresh < 1.25 ** 3).mean()
rmse = (gt - pred) ** 2
rmse = np.sqrt(rmse.mean())
rmse_log = (np.log(gt) - np.log(pred)) ** 2
rmse_log = np.sqrt(rmse_log.mean())
abs_rel = np.mean(np.abs(gt - pred) / gt)
sq_rel = np.mean(((gt - pred)**2) / gt)
err = np.log(pred) - np.log(gt)
silog = np.sqrt(np.mean(err ** 2) - np.mean(err) ** 2) * 100
err = np.abs(np.log10(pred) - np.log10(gt))
log10 = np.mean(err)
return silog, log10, abs_rel, sq_rel, rmse, rmse_log, d1, d2, d3
def test():
global gt_depths, missing_ids, pred_filenames
gt_depths = []
missing_ids = set()
pred_filenames = []
for root, dirnames, filenames in os.walk(args.pred_path):
for pred_filename in fnmatch.filter(filenames, '*.png'):
if 'cmap' in pred_filename or 'gt' in pred_filename:
continue
dirname = root.replace(args.pred_path, '')
pred_filenames.append(os.path.join(dirname, pred_filename))
num_test_samples = len(pred_filenames)
pred_depths = []
for i in range(num_test_samples):
pred_depth_path = os.path.join(args.pred_path, pred_filenames[i])
pred_depth = cv2.imread(pred_depth_path, -1)
if pred_depth is None:
print('Missing: %s ' % pred_depth_path)
missing_ids.add(i)
continue
if args.dataset == 'nyu':
pred_depth = pred_depth.astype(np.float32) / 1000.0
else:
pred_depth = pred_depth.astype(np.float32) / 256.0
pred_depths.append(pred_depth)
print('Raw png files reading done')
print('Evaluating {} files'.format(len(pred_depths)))
if args.dataset == 'kitti':
for t_id in range(num_test_samples):
file_dir = pred_filenames[t_id].split('.')[0]
filename = file_dir.split('_')[-1]
directory = file_dir.replace('_' + filename, '')
gt_depth_path = os.path.join(args.gt_path, directory, 'proj_depth/groundtruth/image_02', filename + '.png')
depth = cv2.imread(gt_depth_path, -1)
if depth is None:
print('Missing: %s ' % gt_depth_path)
missing_ids.add(t_id)
continue
depth = depth.astype(np.float32) / 256.0
gt_depths.append(depth)
elif args.dataset == 'nyu':
for t_id in range(num_test_samples):
file_dir = pred_filenames[t_id].split('.')[0]
filename = file_dir.split('_')[-1]
directory = file_dir.replace('_rgb_'+file_dir.split('_')[-1], '')
gt_depth_path = os.path.join(args.gt_path, directory, 'sync_depth_' + filename + '.png')
depth = cv2.imread(gt_depth_path, -1)
if depth is None:
print('Missing: %s ' % gt_depth_path)
missing_ids.add(t_id)
continue
depth = depth.astype(np.float32) / 1000.0
gt_depths.append(depth)
print('GT files reading done')
print('{} GT files missing'.format(len(missing_ids)))
print('Computing errors')
eval(pred_depths)
print('Done.')
def eval(pred_depths):
num_samples = len(pred_depths)
pred_depths_valid = []
i = 0
for t_id in range(num_samples):
if t_id in missing_ids:
continue
pred_depths_valid.append(pred_depths[t_id])
num_samples = num_samples - len(missing_ids)
silog = np.zeros(num_samples, np.float32)
log10 = np.zeros(num_samples, np.float32)
rms = np.zeros(num_samples, np.float32)
log_rms = np.zeros(num_samples, np.float32)
abs_rel = np.zeros(num_samples, np.float32)
sq_rel = np.zeros(num_samples, np.float32)
d1 = np.zeros(num_samples, np.float32)
d2 = np.zeros(num_samples, np.float32)
d3 = np.zeros(num_samples, np.float32)
for i in range(num_samples):
gt_depth = gt_depths[i]
pred_depth = pred_depths_valid[i]
pred_depth[pred_depth < args.min_depth_eval] = args.min_depth_eval
pred_depth[pred_depth > args.max_depth_eval] = args.max_depth_eval
pred_depth[np.isinf(pred_depth)] = args.max_depth_eval
gt_depth[np.isinf(gt_depth)] = 0
gt_depth[np.isnan(gt_depth)] = 0
valid_mask = np.logical_and(gt_depth > args.min_depth_eval, gt_depth < args.max_depth_eval)
if args.do_kb_crop:
height, width = gt_depth.shape
top_margin = int(height - 352)
left_margin = int((width - 1216) / 2)
pred_depth_uncropped = np.zeros((height, width), dtype=np.float32)
pred_depth_uncropped[top_margin:top_margin + 352, left_margin:left_margin + 1216] = pred_depth
pred_depth = pred_depth_uncropped
if args.garg_crop or args.eigen_crop:
gt_height, gt_width = gt_depth.shape
eval_mask = np.zeros(valid_mask.shape)
if args.garg_crop:
eval_mask[int(0.40810811 * gt_height):int(0.99189189 * gt_height), int(0.03594771 * gt_width):int(0.96405229 * gt_width)] = 1
elif args.eigen_crop:
if args.dataset == 'kitti':
eval_mask[int(0.3324324 * gt_height):int(0.91351351 * gt_height), int(0.0359477 * gt_width):int(0.96405229 * gt_width)] = 1
else:
eval_mask[45:471, 41:601] = 1
valid_mask = np.logical_and(valid_mask, eval_mask)
silog[i], log10[i], abs_rel[i], sq_rel[i], rms[i], log_rms[i], d1[i], d2[i], d3[i] = compute_errors(gt_depth[valid_mask], pred_depth[valid_mask])
print("{:>7}, {:>7}, {:>7}, {:>7}, {:>7}, {:>7}, {:>7}, {:>7}, {:>7}".format(
'd1', 'd2', 'd3', 'AbsRel', 'SqRel', 'RMSE', 'RMSElog', 'SILog', 'log10'))
print("{:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}".format(
d1.mean(), d2.mean(), d3.mean(),
abs_rel.mean(), sq_rel.mean(), rms.mean(), log_rms.mean(), silog.mean(), log10.mean()))
return silog, log10, abs_rel, sq_rel, rms, log_rms, d1, d2, d3
def main():
test()
if __name__ == '__main__':
main()
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#######################################################################################
# The MIT License
# Copyright (c) 2014 Hannes Schulz, University of Bonn <schulz@ais.uni-bonn.de>
# Copyright (c) 2013 Benedikt Waldvogel, University of Bonn <mail@bwaldvogel.de>
# Copyright (c) 2008-2009 Sebastian Nowozin <nowozin@gmail.com>
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#######################################################################################
#
# Helper script to convert the NYU Depth v2 dataset Matlab file into a set of
# PNG and JPEG images.
#
# See https://github.com/deeplearningais/curfil/wiki/Training-and-Prediction-with-the-NYU-Depth-v2-Dataset
from __future__ import print_function
import h5py
import numpy as np
import os
import scipy.io
import sys
import cv2
def convert_image(i, scene, depth_raw, image):
idx = int(i) + 1
if idx in train_images:
train_test = "train"
else:
assert idx in test_images, "index %d neither found in training set nor in test set" % idx
train_test = "test"
folder = "%s/%s/%s" % (out_folder, train_test, scene)
if not os.path.exists(folder):
os.makedirs(folder)
img_depth = depth_raw * 1000.0
img_depth_uint16 = img_depth.astype(np.uint16)
cv2.imwrite("%s/sync_depth_%05d.png" % (folder, i), img_depth_uint16)
image = image[:, :, ::-1]
image_black_boundary = np.zeros((480, 640, 3), dtype=np.uint8)
image_black_boundary[7:474, 7:632, :] = image[7:474, 7:632, :]
cv2.imwrite("%s/rgb_%05d.jpg" % (folder, i), image_black_boundary)
if __name__ == "__main__":
if len(sys.argv) < 4:
print("usage: %s <h5_file> <train_test_split> <out_folder>" % sys.argv[0], file=sys.stderr)
sys.exit(0)
h5_file = h5py.File(sys.argv[1], "r")
# h5py is not able to open that file. but scipy is
train_test = scipy.io.loadmat(sys.argv[2])
out_folder = sys.argv[3]
test_images = set([int(x) for x in train_test["testNdxs"]])
train_images = set([int(x) for x in train_test["trainNdxs"]])
print("%d training images" % len(train_images))
print("%d test images" % len(test_images))
depth_raw = h5_file['rawDepths']
print("reading", sys.argv[1])
images = h5_file['images']
# scenes = [u''.join(chr(c) for c in h5_file[obj_ref]) for obj_ref in h5_file['sceneTypes'][0]]
scenes = [u''.join(chr(c[0]) for c in h5_file[obj_ref]) for obj_ref in h5_file['sceneTypes'][0]]
print("processing images")
for i, image in enumerate(images):
print("image", i + 1, "/", len(images))
convert_image(i, scenes[i], depth_raw[i, :, :].T, image.T)
print("Finished")
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import json
import tqdm
import argparse
def get_args_parser():
parser = argparse.ArgumentParser('NYU Depth V2 preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['sync', 'test'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/nyu_depth_v2')
return parser.parse_args()
if __name__ == '__main__':
args = get_args_parser()
split2dir = {
'sync': 'sync',
'test': 'official_splits/test',
}
split_dir = split2dir[args.split]
output_dict = []
save_path = os.path.join(args.output_dir, "nyuv2_{}_image_depth.json".format(args.split))
src_dir = os.path.join("datasets/nyu_depth_v2", split_dir)
image_path_list = glob.glob(src_dir + "/*/rgb_*.jpg")
for image_path in tqdm.tqdm(image_path_list):
room_name = image_path.split('/')[-2]
frame_name = image_path.split('/')[-1].split('.')[0].split('_')[1]
target_path = src_dir + '/' + room_name + '/sync_depth_' + frame_name + '.png'
assert os.path.isfile(image_path)
assert os.path.isfile(target_path)
image_name = image_path.split('{}/'.format(args.split))[-1]
target_name = target_path.split('{}/'.format(args.split))[-1]
pair_dict = {}
pair_dict["image_path"] = "nyu_depth_v2/{}/".format(split_dir) + image_name
pair_dict["target_path"] = "nyu_depth_v2/{}/".format(split_dir) + target_name
pair_dict["type"] = "nyuv2_image2depth"
output_dict.append(pair_dict)
json.dump(output_dict, open(save_path, 'w'))
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import json
import tqdm
import argparse
def get_args_parser():
parser = argparse.ArgumentParser('Deraining preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['train', 'val'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/derain')
return parser.parse_args()
if __name__ == "__main__":
args = get_args_parser()
if args.split == 'train':
image_dir = "datasets/derain/train/input/"
save_path = os.path.join(args.output_dir, "derain_train.json")
elif args.split == 'val':
# we simply use Rain100H for loss eval
image_dir = "datasets/derain/test/Rain100H/input/"
save_path = os.path.join(args.output_dir, "derain_test_rain100h.json")
else:
raise NotImplementedError
print(save_path)
output_dict = []
image_path_list = glob.glob(os.path.join(image_dir, '*.png')) + glob.glob(os.path.join(image_dir, '*.jpg'))
for image_path in tqdm.tqdm(image_path_list):
# image_name = os.path.basename(image_path)
target_path = image_path.replace('input', 'target')
assert os.path.isfile(image_path)
assert os.path.isfile(target_path)
pair_dict = {}
pair_dict["image_path"] = image_path.replace('datasets/', '')
pair_dict["target_path"] = target_path.replace('datasets/', '')
pair_dict["type"] = "derain_image2derain"
output_dict.append(pair_dict)
json.dump(output_dict, open(save_path, 'w'))
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import json
import tqdm
import argparse
def get_args_parser():
    parser = argparse.ArgumentParser('Low light enhancement preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['train', 'val'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/light_enhance')
return parser.parse_args()
if __name__ == "__main__":
args = get_args_parser()
save_path = os.path.join(args.output_dir, "enhance_lol_{}.json".format(args.split))
if args.split == 'train':
image_dir = "datasets/light_enhance/our485/low/"
elif args.split == 'val':
image_dir = "datasets/light_enhance/eval15/low/"
else:
raise NotImplementedError
print(save_path)
output_dict = []
image_path_list = glob.glob(os.path.join(image_dir, '*.png')) + glob.glob(os.path.join(image_dir, '*.jpg'))
for image_path in tqdm.tqdm(image_path_list):
target_path = image_path.replace('low', 'high')
assert os.path.isfile(image_path)
assert os.path.isfile(target_path)
pair_dict = {}
pair_dict["image_path"] = image_path.replace('datasets/', '')
pair_dict["target_path"] = target_path.replace('datasets/', '')
pair_dict["type"] = "lol_image2enhance"
output_dict.append(pair_dict)
json.dump(output_dict, open(save_path, 'w'))
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import os
import random
import sys
import tqdm
import numpy as np
import torch
import torch.distributed as dist
from mmcv.runner import (DistSamplerSeedHook, EpochBasedRunner,
Fp16OptimizerHook, OptimizerHook, build_runner,
get_dist_info)
from mmdet.core import DistEvalHook, EvalHook, build_optimizer
from mmdet.datasets import (build_dataloader, build_dataset,
replace_ImageToTensor)
from mmdet.utils import (build_ddp, build_dp, compat_cfg,
find_latest_checkpoint, get_root_logger)
def train_detector(model,
dataset,
cfg,
distributed=False,
validate=False,
timestamp=None,
meta=None):
cfg = compat_cfg(cfg)
logger = get_root_logger(log_level=cfg.log_level)
# prepare data loaders
dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset]
runner_type = 'EpochBasedRunner' if 'runner' not in cfg else cfg.runner[
'type']
train_dataloader_default_args = dict(
samples_per_gpu=2,
workers_per_gpu=2,
# `num_gpus` will be ignored if distributed
num_gpus=len(cfg.gpu_ids),
dist=distributed,
seed=cfg.seed,
runner_type=runner_type,
persistent_workers=False)
train_loader_cfg = {
**train_dataloader_default_args,
**cfg.data.get('train_dataloader', {})
}
load_data_only = cfg.custom.get('load_data_only', False)
assert load_data_only
train_loader_cfg_custom = copy.deepcopy(train_loader_cfg)
# train_loader_cfg_custom['shuffle'] = False # we prefer gen data in order
# train_loader_cfg_custom['dist'] = False
data_loaders = [build_dataloader(ds, **train_loader_cfg_custom) for ds in dataset]
# only enumerate dataset
for data_loader in data_loaders:
for _ in tqdm.tqdm(data_loader):
pass
print("dataset enumerated, exit!")
sys.exit()
# dataset settings
dataset_type = 'CocoPanopticDatasetCustom'
data_root = '/home/datasets/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='LoadPanopticAnnotations',
with_bbox=True,
with_mask=True,
with_seg=True),
dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='SegRescale', scale_factor=1 / 4),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/panoptic_train2017.json',
img_prefix=data_root + 'train2017/',
seg_prefix=data_root + 'annotations/panoptic_train2017/',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/panoptic_val2017.json',
img_prefix=data_root + 'val2017/',
seg_prefix=data_root + 'annotations/panoptic_val2017/',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/panoptic_val2017.json',
img_prefix=data_root + 'val2017/',
seg_prefix=data_root + 'annotations/panoptic_val2017/',
pipeline=test_pipeline))
evaluation = dict(interval=1, metric=['PQ'])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
# disable opencv multithreading to avoid system being overloaded
opencv_num_threads = 0
# set multi-process start method as `fork` to speed up the training
mp_start_method = 'fork'
# Default setting for scaling LR automatically
# - `enable` means enable scaling LR automatically
# or not by default.
# - `base_batch_size` = (8 GPUs) x (2 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=16)
# modified from mask2former config
_base_ = [
'./_base_/dataset/coco_panoptic.py', './_base_/default_runtime.py'
]
num_things_classes = 80
num_stuff_classes = 53
num_classes = num_things_classes + num_stuff_classes
model = None
# dataset settings
image_size = (1024, 1024)
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile', to_float32=True),
dict(
type='LoadPanopticAnnotations',
with_bbox=True,
with_mask=True,
with_seg=True),
dict(type='RandomFlip', flip_ratio=0.5),
# large scale jittering
dict(
type='Resize',
img_scale=image_size,
ratio_range=(0.7, 2.0),
multiscale_mode='range',
keep_ratio=False),
dict(
type='RandomCrop',
crop_size=image_size,
crop_type='absolute',
recompute_bbox=True,
allow_negative_crop=True),
# dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=image_size),
dict(
type='SaveDataPairCustom',
dir_name='train_aug0',
target_path='/home/datasets/coco/pano_ca_inst',
), # custom, we don't care the transforms afterward
dict(type='DefaultFormatBundle', img_to_float=True),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
]
test_pipeline = [
dict(type='LoadImageFromFile', to_float32=True),
dict(
type='LoadPanopticAnnotations',
with_bbox=True,
with_mask=True,
with_seg=True),
dict(type='RandomFlip', flip_ratio=0.0),
# large scale jittering
dict(
type='Resize',
img_scale=image_size,
ratio_range=(1.0, 1.0),
multiscale_mode='range',
keep_ratio=False),
dict(type='Pad', size=image_size),
dict(
type='SaveDataPairCustom',
dir_name='val_org',
target_path='/home/datasets/coco/pano_ca_inst',
), # custom, we don't care the transforms afterward
dict(type='Normalize', **img_norm_cfg),
dict(type='DefaultFormatBundle', img_to_float=True),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
]
data_root = '/home/datasets/coco/'
data = dict(
samples_per_gpu=8,
workers_per_gpu=8,
train=dict(pipeline=train_pipeline),
val=dict(
pipeline=test_pipeline,
ins_ann_file=data_root + 'annotations/instances_val2017.json',
),
test=dict(
pipeline=test_pipeline,
ins_ann_file=data_root + 'annotations/instances_val2017.json',
))
embed_multi = dict(lr_mult=1.0, decay_mult=0.0)
# optimizer
optimizer = dict(
type='AdamW',
lr=0.0001,
weight_decay=0.05,
eps=1e-8,
betas=(0.9, 0.999),
paramwise_cfg=dict(
custom_keys={
'backbone': dict(lr_mult=0.1, decay_mult=1.0),
'query_embed': embed_multi,
'query_feat': embed_multi,
'level_embed': embed_multi,
},
norm_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=0.01, norm_type=2))
custom = dict(
load_data_only=True,
)
by_epoch = True
# learning policy
lr_config = dict(
policy='step',
gamma=0.1,
by_epoch=by_epoch,
step=[327778, 355092],
warmup='linear',
warmup_by_epoch=by_epoch,
warmup_ratio=1.0, # no warmup
warmup_iters=10)
max_iters = 368750
# runner = dict(type='IterBasedRunner', max_iters=max_iters)
runner = dict(type='EpochBasedRunner', max_epochs=1) # we prefer by epoch
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook', by_epoch=by_epoch),
dict(type='TensorboardLoggerHook', by_epoch=by_epoch)
])
interval = 5000
workflow = [('train', interval)]
checkpoint_config = dict(
by_epoch=by_epoch, interval=interval, save_last=True, max_keep_ckpts=3)
# Before 365001th iteration, we do evaluation every 5000 iterations.
# After 365000th iteration, we do evaluation every 368750 iterations,
# which means that we do evaluation at the end of training.
dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)]
evaluation = dict(
interval=interval,
dynamic_intervals=dynamic_intervals,
metric=['PQ', 'bbox', 'segm'])
# import newly registered module
custom_imports = dict(
imports=[
'data.coco_panoptic',
'data.pipelines.transforms',
],
allow_failed_imports=False)
# modified from mask2former config
_base_ = [
'./_base_/dataset/coco_panoptic.py', './_base_/default_runtime.py'
]
num_things_classes = 80
num_stuff_classes = 53
num_classes = num_things_classes + num_stuff_classes
model = None
# dataset settings
image_size = (1024, 1024)
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile', to_float32=True),
dict(
type='LoadPanopticAnnotations',
with_bbox=True,
with_mask=True,
with_seg=True),
dict(type='RandomFlip', flip_ratio=0.0),
# # large scale jittering
dict(
type='Resize',
img_scale=image_size,
ratio_range=(1.0, 1.0),
multiscale_mode='range',
keep_ratio=False),
# dict(
# type='RandomCrop',
# crop_size=image_size,
# crop_type='absolute',
# recompute_bbox=True,
# allow_negative_crop=True),
# dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=image_size),
dict(
type='SaveDataPairCustom',
dir_name='train_org',
target_path='/home/datasets/coco/pano_ca_inst',
), # custom, we don't care the transforms afterward
dict(type='DefaultFormatBundle', img_to_float=True),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
]
test_pipeline = [
dict(type='LoadImageFromFile', to_float32=True),
dict(
type='LoadPanopticAnnotations',
with_bbox=True,
with_mask=True,
with_seg=True),
dict(type='RandomFlip', flip_ratio=0.0),
# large scale jittering
dict(
type='Resize',
img_scale=image_size,
ratio_range=(1.0, 1.0),
multiscale_mode='range',
keep_ratio=False),
dict(type='Pad', size=image_size),
dict(
type='SaveDataPairCustom',
dir_name='val_org',
target_path='/home/datasets/coco/pano_ca_inst',
), # custom, we don't care the transforms afterward
dict(type='Normalize', **img_norm_cfg),
dict(type='DefaultFormatBundle', img_to_float=True),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
]
data_root = '/home/datasets/coco/'
data = dict(
samples_per_gpu=8,
workers_per_gpu=8,
train=dict(pipeline=train_pipeline),
val=dict(
pipeline=test_pipeline,
ins_ann_file=data_root + 'annotations/instances_val2017.json',
),
test=dict(
pipeline=test_pipeline,
ins_ann_file=data_root + 'annotations/instances_val2017.json',
))
embed_multi = dict(lr_mult=1.0, decay_mult=0.0)
# optimizer
optimizer = dict(
type='AdamW',
lr=0.0001,
weight_decay=0.05,
eps=1e-8,
betas=(0.9, 0.999),
paramwise_cfg=dict(
custom_keys={
'backbone': dict(lr_mult=0.1, decay_mult=1.0),
'query_embed': embed_multi,
'query_feat': embed_multi,
'level_embed': embed_multi,
},
norm_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=0.01, norm_type=2))
custom = dict(
load_data_only=True,
)
by_epoch = True
# learning policy
lr_config = dict(
policy='step',
gamma=0.1,
by_epoch=by_epoch,
step=[327778, 355092],
warmup='linear',
warmup_by_epoch=by_epoch,
warmup_ratio=1.0, # no warmup
warmup_iters=10)
max_iters = 368750
# runner = dict(type='IterBasedRunner', max_iters=max_iters)
runner = dict(type='EpochBasedRunner', max_epochs=1) # we prefer by epoch
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook', by_epoch=by_epoch),
dict(type='TensorboardLoggerHook', by_epoch=by_epoch)
])
interval = 5000
workflow = [('train', interval)]
checkpoint_config = dict(
by_epoch=by_epoch, interval=interval, save_last=True, max_keep_ckpts=3)
# Before 365001th iteration, we do evaluation every 5000 iterations.
# After 365000th iteration, we do evaluation every 368750 iterations,
# which means that we do evaluation at the end of training.
dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)]
evaluation = dict(
interval=interval,
dynamic_intervals=dynamic_intervals,
metric=['PQ', 'bbox', 'segm'])
# import newly registered module
custom_imports = dict(
imports=[
'data.coco_panoptic',
'data.pipelines.transforms',
],
allow_failed_imports=False)