__pycache__
datasets/
toy_datasets/
models
models_inference
work_dirs/
wandb
datasets
.idea
.nfs*
*.pth
log.txt
log*.txt
*log.txt
# MaskedDenoising
## Paper
[Images Speak in Images: A Generalist Painter for In-Context Visual Learning](https://arxiv.org/abs/2212.02499)
## Model Architecture
<div align=center>
<img src="./doc/method.jpg"/>
</div>
## Algorithm Overview
<div align=center>
<img src="./doc/progress.png"/>
</div>
## Environment Setup
### Docker (Method 1)
Adjust the `-v` mount paths, `docker_name`, and `imageID` below to match your setup.
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py38-latest
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
cd /your_code_path/maskeddenoising_pytorch
pip install --upgrade setuptools wheel
pip install -r requirement.txt
```
### Dockerfile (Method 2)
Adjust the `-v` mount paths and `docker_name` below to match your setup.
```bash
cd ./docker
cp ../requirement.txt requirement.txt
docker build --no-cache -t maskeddenoising:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name maskeddenoising:latest bash
cd /your_code_path/maskeddenoising_pytorch
pip install --upgrade setuptools wheel
pip install -r requirement.txt
```
### Anaconda (Method 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded from the developer community: https://developer.hpccube.com/tool/
```
DTK stack: dtk23.04.1
python: python3.8
torch: 1.13.1
torchvision: 0.14.1
```
Note: the DTK stack, Python, torch, and the other DCU-related tool versions above must match each other exactly.
2. Install the remaining (non-DCU-specific) libraries from requirement.txt:
```bash
pip install --upgrade setuptools wheel
pip install -r requirement.txt
```
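
After installation, you can quickly confirm that the environment matches the versions above and that the DCU devices are visible. The sketch below is only a sanity check, not part of the repository; on DCU/ROCm builds the devices are exposed through the standard `torch.cuda` API.
```python
# Sanity check: verify library versions and DCU device visibility.
import torch
import torchvision

print("torch:", torch.__version__)              # expected 1.13.1
print("torchvision:", torchvision.__version__)  # expected 0.14.1
print("device available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:", torch.cuda.device_count())
    print("device 0:", torch.cuda.get_device_name(0))
```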
## Datasets
### Environment setup for dataset preparation
#### ADE20K Semantic Segmentation
```bash
git clone https://github.com/facebookresearch/detectron2
python -m pip install -e detectron2
```
#### COCO Panoptic Segmentation
```bash
pip install openmim  # tested with 0.3.9
mim install mmcv-full  # check that the installed version is 1.7.1
pip install mmdet==2.26.0  # matches mmcv-full 1.7.1
pip install yapf==0.40.1
```
#### COCO Pose Estimation
```bash
pip install mmcv==1.3.9
pip install mmpose==0.29.0
```
Alternatively, mmpose can be installed from source:
```bash
# choose commit id `8c58a18b`
git clone https://github.com/open-mmlab/mmpose.git
cd mmpose
pip install -r requirements.txt
pip install -v -e .
```
### Dataset download
The project requires quite a few datasets. To verify functionality you can use the provided [a toy training dataset](https://huggingface.co/BAAI/Painter/blob/main/toy_datasets.tar), which contains 10 samples from each task. Place it under `$Painter_ROOT/toy_datasets` and set `DATA_PATH=toy_datasets` in `$Painter_ROOT/train_painter_vit_large.sh`.
The full set of required datasets is listed below:
#### NYU Depth V2
First, download the dataset [here](https://drive.google.com/file/d/1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP/view?usp=sharing) and make sure it is saved as `$Painter_ROOT/datasets/nyu_depth_v2/sync.zip`.
Next, prepare the [NYU Depth V2 test](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html) set.
```bash
# download the official NYU Depth V2 split file
wget -P datasets/nyu_depth_v2/ http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat
# convert the mat file into image files
python data/depth/extract_official_train_test_set_from_mat.py datasets/nyu_depth_v2/nyu_depth_v2_labeled.mat data/depth/splits.mat datasets/nyu_depth_v2/official_splits/
```
Finally, prepare the json files needed for training and validation; by default they are saved to `$Painter_ROOT/datasets/nyu_depth_v2/`.
```bash
python data/depth/gen_json_nyuv2_depth.py --split sync
python data/depth/gen_json_nyuv2_depth.py --split test
```
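
To confirm the pair lists were generated correctly, a small check such as the following can be used. It is not part of the repository and only assumes the format written by `data/depth/gen_json_nyuv2_depth.py`: a list of `{"image_path", "target_path", "type"}` entries with paths relative to `datasets/`.
```python
# Quick sanity check of the generated NYU Depth V2 pair list.
import json
import os

with open("datasets/nyu_depth_v2/nyuv2_sync_image_depth.json") as f:
    pairs = json.load(f)

print(len(pairs), "image/depth pairs")
print(pairs[0]["type"], pairs[0]["image_path"], "->", pairs[0]["target_path"])
# spot-check that the first few referenced files actually exist
for pair in pairs[:100]:
    assert os.path.isfile(os.path.join("datasets", pair["image_path"]))
    assert os.path.isfile(os.path.join("datasets", pair["target_path"]))
```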
#### ADE20k Semantic Segmentation
First, download the dataset from the [official website](https://groups.csail.mit.edu/vision/datasets/ADE20K/) and place it under `$Painter_ROOT/datasets/`.
Next, unzip the file and rename the folder to `ade20k`. The resulting ade20k directory structure looks like this:
```bash
ade20k/
images/
annotations/
```
Second, run the commands below to prepare the annotations needed for training and validation; by default they are saved to `$Painter_ROOT/datasets/ade20k/annotations_with_color/`.
```bash
python data/ade20k/gen_color_ade20k_sem.py --split training
python data/ade20k/gen_color_ade20k_sem.py --split validation
```
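
For intuition, the sketch below (an illustration only, not part of the pipeline) mirrors the color mapping used by `data/ade20k/gen_color_ade20k_sem.py`: each of the 150 ADE20k class ids is assigned a well-separated RGB color so that predictions can later be decoded back into class ids.
```python
# Reproduce the class-id -> RGB mapping used when painting the annotations.
num_classes = 150
num_sep = int(num_classes ** (1 / 3)) + 1  # 6 separations per channel
step = 256 // num_sep                      # 42

def class_color(class_id):
    # color index = class id - 1; class 0 (background) is left black
    loc = class_id - 1
    r = 255 - (loc // num_sep ** 2) * step
    g = 255 - ((loc % num_sep ** 2) // num_sep) * step
    b = 255 - (loc % num_sep) * step
    return (r, g, b)

print(class_color(1))    # (255, 255, 255)
print(class_color(2))    # (255, 255, 213)
print(class_color(150))  # (87, 255, 45)
```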
Third, prepare the json files needed for training and validation; by default they are saved to `$Painter_ROOT/datasets/ade20k/`.
```bash
python data/ade20k/gen_json_ade20k_sem.py --split training
python data/ade20k/gen_json_ade20k_sem.py --split validation
```
Finally, to enable evaluation with detectron2, create a symlink from `$Painter_ROOT/datasets/ade20k` to `$Painter_ROOT/datasets/ADEChallengeData2016`, then run:
```bash
# create the symlink
# ln -s $Painter_ROOT/datasets/ade20k datasets/ADEChallengeData2016
# run the preparation script
python data/prepare_ade20k_sem_seg.py
```
#### COCO Panoptic Segmentation
Download the COCO 2017 images and the corresponding panoptic segmentation annotations. The resulting COCO directory structure looks like this:
```
coco/
train2017/
val2017/
annotations/
instances_train2017.json
instances_val2017.json
panoptic_train2017.json
panoptic_val2017.json
panoptic_train2017/
panoptic_val2017/
```
1. Prepare COCO semantic segmentation
Prepare the annotations needed for training; by default they are saved to `$Painter_ROOT/datasets/coco/pano_sem_seg/`.
```bash
python data/coco_semseg/gen_color_coco_panoptic_segm.py --split train2017
python data/coco_semseg/gen_color_coco_panoptic_segm.py --split val2017
```
Prepare the json files for training and validation; by default they are saved to `$Painter_ROOT/datasets/coco/pano_sem_seg/`.
```bash
python data/coco_semseg/gen_json_coco_panoptic_segm.py --split train2017
python data/coco_semseg/gen_json_coco_panoptic_segm.py --split val2017
```
2. Prepare COCO class-agnostic instance segmentation
First, preprocess the data with the commands below; by default the painted ground truth is saved to `$Painter_ROOT/datasets/coco/pano_ca_inst`.
```bash
cd $Painter_ROOT/data/mmdet_custom
# generate training data with common augmentation for instance segmentation; 30 copies train_aug{idx} are generated by switching idx in configs/coco_panoptic_ca_inst_gen_aug.py
./tools/dist_train.sh configs/coco_panoptic_ca_inst_gen_aug.py 1
# generate training data with horizontal-flip augmentation only
./tools/dist_train.sh configs/coco_panoptic_ca_inst_gen_orgflip.py 1
# generate training data without augmentation
./tools/dist_train.sh configs/coco_panoptic_ca_inst_gen_org.py 1
# generate validation data (no augmentation)
./tools/dist_test.sh configs/coco_panoptic_ca_inst_gen_org.py none 1 --eval segm
```
Then, prepare the json files for training and validation; by default they are saved to `$Painter_ROOT/datasets/coco/pano_ca_inst`.
```bash
cd $Painter_ROOT
python data/mmdet_custom/gen_json_coco_panoptic_inst.py --split train
python data/mmdet_custom/gen_json_coco_panoptic_inst.py --split val
```
Finally, to enable evaluation with detectron2, create a symlink from `$Painter_ROOT/datasets/coco/annotations/panoptic_val2017` to `$Painter_ROOT/datasets/coco/panoptic_val2017` and run:
```bash
# create the symlink
# ln -s $Painter_ROOT/datasets/coco/annotations/panoptic_val2017 datasets/coco/panoptic_val2017
# run the preparation script
python data/prepare_coco_semantic_annos_from_panoptic_annos.py
```
#### COCO Human Pose Estimation
First, download the person detection results for COCO val2017 from [google drive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk) and place them under `$Painter_ROOT/datasets/coco_pose/`.
Then, preprocess the data with the commands below; by default the painted ground truth is saved to `$Painter_ROOT/datasets/coco_pose/`.
```bash
cd $Painter_ROOT/data/mmpose_custom
# generate training data with common augmentation for pose estimation; this project generates 20 copies for training, so change the aug_idx parameter at line 52 of coco_256x192_gendata.py accordingly (default 0); see the sketch after this block
./tools/dist_train.sh configs/coco_256x192_gendata.py 1
# generate data used for validation during training
./tools/dist_test.sh configs/coco_256x192_gendata.py none 1
# generate test data (using offline boxes)
./tools/dist_test.sh configs/coco_256x192_gendata_test.py none 1
# generate test data (using offline boxes + flip)
./tools/dist_test.sh configs/coco_256x192_gendata_testflip.py none 1
```
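
Generating the 20 augmented copies means editing `aug_idx` and re-running the first command once per copy. The hypothetical helper below (not part of the repository) automates this; it assumes the config contains a line of the form `aug_idx = <int>` (line 52, as noted above) and that it is run from `$Painter_ROOT/data/mmpose_custom`.
```python
# Hypothetical helper: rewrite aug_idx in the config and regenerate each copy.
import re
import subprocess

CFG = "configs/coco_256x192_gendata.py"

for idx in range(20):
    with open(CFG) as f:
        cfg = f.read()
    cfg = re.sub(r"aug_idx\s*=\s*\d+", f"aug_idx = {idx}", cfg, count=1)
    with open(CFG, "w") as f:
        f.write(cfg)
    subprocess.run(["./tools/dist_train.sh", CFG, "1"], check=True)
```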
Next, prepare the json files for training and validation; by default they are saved under `datasets/coco_pose/`.
```bash
cd $Painter_ROOT
python data/mmpose_custom/gen_json_coco_pose.py --split train
python data/mmpose_custom/gen_json_coco_pose.py --split val
```
#### Low-level Vision Tasks
##### Deraining
Follow [MPRNet](https://github.com/swz30/MPRNet) to prepare the deraining data.
Download the datasets following the instructions in [MPRNet](https://github.com/swz30/MPRNet/blob/main/Deraining/Datasets/README.md) and save them under `$Painter_ROOT/datasets/derain/`. The resulting derain directory structure looks like this:
```
derain/
train/
input/
target/
test/
Rain100H/
Rain100L/
Test100/
Test1200/
Test2800/
```
Next, prepare the json files for training and validation with the commands below. The generated json files are saved under `datasets/derain/`.
```bash
python data/derain/gen_json_rain.py --split train
python data/derain/gen_json_rain.py --split val
```
##### Denoising
Follow [Uformer](https://github.com/ZhendongWang6/Uformer) to prepare the SIDD denoising dataset.
For training, download the SIDD-Medium dataset from the [official url](https://www.eecs.yorku.ca/~kamel/sidd/dataset.php);
for validation, the SIDD data can be downloaded [here](https://mailustceducn-my.sharepoint.com/:f:/g/personal/zhendongwang_mail_ustc_edu_cn/Ev832uKaw2JJhwROKqiXGfMBttyFko_zrDVzfSbFFDoi4Q?e=S3p5hQ).
Next, generate the image patches used for training:
```bash
python data/sidd/generate_patches_SIDD.py --src_dir datasets/denoise/SIDD_Medium_Srgb/Data --tar_dir datasets/denoise/train
```
Finally, prepare the json files for training and validation; they are saved under `datasets/denoise/`.
```bash
python data/sidd/gen_json_sidd.py --split train
python data/sidd/gen_json_sidd.py --split val
```
##### Low-Light Image Enhancement
First, download the LOL dataset from [google drive](https://drive.google.com/file/d/157bjO1_cFuSd0HWDUuAmcHRJDVyWpOxB/view) and place it under `$Painter_ROOT/datasets/light_enhance/`. The resulting LOL directory structure looks like this:
```
light_enhance/
our485/
low/
high/
eval15/
low/
high/
```
Next, prepare the json files for training and evaluation. The generated json files will be saved at `$Painter_ROOT/datasets/light_enhance/`.
```
python data/lol/gen_json_lol.py --split train
python data/lol/gen_json_lol.py --split val
```
The final layout of the `datasets/` directory is as follows:
```
├── nyu_depth_v2/
│ ├── sync/
│ ├── official_splits/
│ ├── nyu_depth_v2_labeled.mat
│ ├── nyuv2_sync_image_depth.json # generated
│ ├── nyuv2_test_image_depth.json # generated
├── ade20k/
│ ├── images/
│ ├── annotations/
│ ├── annotations_detectron2/ # generated
│ ├── annotations_with_color/ # generated
│ ├── ade20k_training_image_semantic.json # generated
│ ├── ade20k_validation_image_semantic.json # generated
├── ADEChallengeData2016/ # symlink to $Painter_ROOT/datasets/ade20k
├── coco/
│ ├── train2017/
│ ├── val2017/
│ ├── annotations/
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ ├── person_keypoints_val2017.json
│ ├── panoptic_train2017.json
│ ├── panoptic_val2017.json
│ ├── panoptic_train2017/
│ ├── panoptic_val2017/
│ ├── panoptic_semseg_val2017/ # generated
│ ├── panoptic_val2017/ # symlink to $Painter_ROOT/datasets/coco/annotations/panoptic_val2017
│ ├── pano_sem_seg/ # generated
│ ├── panoptic_segm_train2017_with_color
│ ├── panoptic_segm_val2017_with_color
│ ├── coco_train2017_image_panoptic_sem_seg.json
│ ├── coco_val2017_image_panoptic_sem_seg.json
│ ├── pano_ca_inst/ # generated
│ ├── train_aug0/
│ ├── train_aug1/
│ ├── ...
│ ├── train_aug29/
│ ├── train_org/
│ ├── train_flip/
│ ├── val_org/
│ ├── coco_train_image_panoptic_inst.json
│ ├── coco_val_image_panoptic_inst.json
├── coco_pose/
│ ├── person_detection_results/
│ ├── COCO_val2017_detections_AP_H_56_person.json
│ ├── data_pair/ # generated
│ ├── train_256x192_aug0/
│ ├── train_256x192_aug1/
│ ├── ...
│ ├── train_256x192_aug19/
│ ├── val_256x192/
│ ├── test_256x192/
│ ├── test_256x192_flip/
│ ├── coco_pose_256x192_train.json # generated
│ ├── coco_pose_256x192_val.json # generated
├── derain/
│ ├── train/
│ ├── input/
│ ├── target/
│ ├── test/
│ ├── Rain100H/
│ ├── Rain100L/
│ ├── Test100/
│ ├── Test1200/
│ ├── Test2800/
│ ├── derain_train.json
│ ├── derain_test_rain100h.json
├── denoise/
│ ├── SIDD_Medium_Srgb/
│ ├── train/
│ ├── val/
│ ├── denoise_ssid_train.json # generated
│ ├── denoise_ssid_val.json # generated
├── light_enhance/
│ ├── our485/
│ ├── low/
│ ├── high/
│ ├── eval15/
│ ├── low/
│ ├── high/
│ ├── enhance_lol_train.json # generated
│ ├── enhance_lol_val.json # generated
```
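
Before training, it can be worth confirming that every file marked `# generated` above actually exists. The check below is optional, not part of the repository, and simply walks the json file names from the tree.
```python
# Optional check that all generated pair-list json files are in place.
import os

expected = [
    "nyu_depth_v2/nyuv2_sync_image_depth.json",
    "nyu_depth_v2/nyuv2_test_image_depth.json",
    "ade20k/ade20k_training_image_semantic.json",
    "ade20k/ade20k_validation_image_semantic.json",
    "coco/pano_sem_seg/coco_train2017_image_panoptic_sem_seg.json",
    "coco/pano_sem_seg/coco_val2017_image_panoptic_sem_seg.json",
    "coco/pano_ca_inst/coco_train_image_panoptic_inst.json",
    "coco/pano_ca_inst/coco_val_image_panoptic_inst.json",
    "coco_pose/coco_pose_256x192_train.json",
    "coco_pose/coco_pose_256x192_val.json",
    "derain/derain_train.json",
    "derain/derain_test_rain100h.json",
    "denoise/denoise_ssid_train.json",
    "denoise/denoise_ssid_val.json",
    "light_enhance/enhance_lol_train.json",
    "light_enhance/enhance_lol_val.json",
]
missing = [p for p in expected if not os.path.isfile(os.path.join("datasets", p))]
print("all generated json files present" if not missing else f"missing: {missing}")
```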
## Training
Download the pretrained [MAE ViT-Large model](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth) and set the `finetune` argument in `$Painter_ROOT/train_painter_vit_large.sh` to its path.
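
Before launching training, you may want to confirm that the downloaded weights load cleanly. The sketch below is optional; the path is a placeholder for wherever you saved `mae_pretrain_vit_large.pth`, and MAE releases typically store the weights under a `model` key.
```python
# Optional: inspect the MAE ViT-Large checkpoint before pointing `finetune` at it.
import torch

ckpt = torch.load("models/mae_pretrain_vit_large.pth", map_location="cpu")  # adjust path
state_dict = ckpt.get("model", ckpt)  # fall back to the raw dict if no "model" key
print(len(state_dict), "tensors")
print(sorted(state_dict.keys())[:5])
```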
### Single machine, multiple GPUs
#### Standard training
```
bash train_painter_vit_large.sh
```
#### Distributed training
```
bash train_multi.sh
```
## Inference
Download the inference model from [🤗 Hugging Face Models](https://huggingface.co/BAAI/Painter/blob/main/painter_vit_large.pth). Evaluation on the individual tasks is described below.
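
If you prefer to fetch the checkpoint programmatically rather than through the browser, recent versions of `huggingface_hub` can download it directly; the repo id and filename below come from the link above, and `local_dir` is chosen to match the path used by the evaluation commands.
```python
# Download the released Painter checkpoint with huggingface_hub (optional).
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="BAAI/Painter",
    filename="painter_vit_large.pth",
    local_dir="models/painter_vit_large",
)
print("checkpoint saved to", ckpt_path)
```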
## NYU Depth V2
To evaluate Painter on NYU Depth V2, you may first update the `$JOB_NAME` in `$Painter_ROOT/eval/nyuv2_depth/eval.sh`, then run:
```bash
bash eval/nyuv2_depth/eval.sh
```
## ADE20k Semantic Segmentation
To evaluate Painter on ADE20k semantic segmentation, you may first update the `$JOB_NAME` in `$Painter_ROOT/eval/ade20k_semantic/eval.sh`, then run:
```bash
bash eval/ade20k_semantic/eval.sh
```
## COCO Panoptic Segmentation
To evaluate Painter on COCO panoptic segmentation, you may first update the `$JOB_NAME` in `$Painter_ROOT/eval/coco_panoptic/eval.sh`, then run:
```bash
bash eval/coco_panoptic/eval.sh
```
## COCO Human Pose Estimation
To evaluate Painter on COCO pose estimation, first generate the painted images:
```bash
python -m torch.distributed.launch --nproc_per_node=8 --master_port=29500 --use_env eval/mmpose_custom/painter_inference_pose.py --ckpt_path models/painter_vit_large/painter_vit_large.pth
python -m torch.distributed.launch --nproc_per_node=8 --master_port=29500 --use_env eval/mmpose_custom/painter_inference_pose.py --ckpt_path models/painter_vit_large/painter_vit_large.pth --flip_test
```
Then, you may update the `job_name` and `ckpt_file` in `$Painter_ROOT/eval/mmpose_custom/configs/coco_256x192_test_offline.py`, and run:
```bash
cd $Painter_ROOT/eval/mmpose_custom
./tools/dist_test.sh configs/coco_256x192_test_offline.py none 1 --eval mAP
```
## Low-level Vision Tasks
### Deraining
To evaluate Painter on deraining, first generate the derained images.
```bash
python eval/derain/painter_inference_derain.py --ckpt_path models/painter_vit_large/painter_vit_large.pth
```
Then, update the path to derained images and ground truth in `$Painter_ROOT/eval/derain/evaluate_PSNR_SSIM.m` and run the following script in MATLAB.
```bash
$Painter_ROOT/eval/derain/evaluate_PSNR_SSIM.m
```
### Denoising
To evaluate Painter on SIDD denoising, first generate the denoised images.
```bash
python eval/sidd/painter_inference_sidd.py --ckpt_path models/painter_vit_large/painter_vit_large.pth
```
Then, update the path to denoising output and ground truth in `$Painter_ROOT/eval/sidd/eval_sidd.m` and run the following script in MATLAB.
```bash
$Painter_ROOT/eval/sidd/eval_sidd.m
```
### Low-Light Image Enhancement
To evaluate Painter on LoL image enhancement:
```bash
python eval/lol/painter_inference_lol.py --ckpt_path models/painter_vit_large/painter_vit_large.pth
```
#### Single-GPU inference
```
bash test.sh
```
## Results
A single-image example of results on the local test set:
<div align=center>
<img src="./doc/origin.png"/>
</div>
<div align=center>
<img src="./doc/results.png"/>
</div>
### Accuracy
Single-GPU test results on the test data provided with the project:
| | PSNR | SSIM | LPIPS |
| :------: | :------: | :------: | :------: |
| ours | 29.04 | 0.7615 | 0.1294 |
| paper | 30.13 | 0.7981 | 0.1031 |
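
The PSNR / SSIM / LPIPS numbers above can be reproduced approximately with scikit-image and the `lpips` package, as in the sketch below. It assumes a pair of same-sized result/ground-truth PNGs (the file names are placeholders); the paper's exact evaluation protocol (color space, border handling, averaging) may differ.
```python
# Approximate PSNR / SSIM / LPIPS for one result/ground-truth image pair.
import numpy as np
import torch
import lpips
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

pred = np.array(Image.open("result.png").convert("RGB"))  # placeholder path
gt = np.array(Image.open("gt.png").convert("RGB"))        # placeholder path

psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)

# LPIPS expects float tensors in [-1, 1], shaped (N, 3, H, W)
to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1).float()[None] / 127.5 - 1.0
lp = lpips.LPIPS(net="alex")(to_tensor(pred), to_tensor(gt)).item()

print(f"PSNR {psnr:.2f}  SSIM {ssim:.4f}  LPIPS {lp:.4f}")
```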
## Application Scenarios
### Algorithm category
Image denoising
### Key application industries
Transportation, public security, manufacturing
## Source Repository and Issue Feedback
http://developer.hpccube.com/codes/modelzoo/maskeddenoising_pytorch.git
## References
https://github.com/haoyuc/MaskedDenoising.git
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import argparse
import json
import tqdm
import sys
sys.path.insert(0, "data")
import numpy as np
from PIL import Image
def unique(ar, return_index=False, return_inverse=False, return_counts=False):
"copied from https://github.com/CSAILVision/semantic-segmentation-pytorch/blob/master/mit_semseg/utils.py"
ar = np.asanyarray(ar).flatten()
optional_indices = return_index or return_inverse
optional_returns = optional_indices or return_counts
if ar.size == 0:
if not optional_returns:
ret = ar
else:
ret = (ar,)
if return_index:
                ret += (np.empty(0, bool),)  # np.bool was removed in NumPy 1.24
            if return_inverse:
                ret += (np.empty(0, bool),)
if return_counts:
ret += (np.empty(0, np.intp),)
return ret
if optional_indices:
perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
aux = ar[perm]
else:
ar.sort()
aux = ar
flag = np.concatenate(([True], aux[1:] != aux[:-1]))
if not optional_returns:
ret = aux[flag]
else:
ret = (aux[flag],)
if return_index:
ret += (perm[flag],)
if return_inverse:
iflag = np.cumsum(flag) - 1
inv_idx = np.empty(ar.shape, dtype=np.intp)
inv_idx[perm] = iflag
ret += (inv_idx,)
if return_counts:
idx = np.concatenate(np.nonzero(flag) + ([ar.size],))
ret += (np.diff(idx),)
return ret
def colorEncode(labelmap, colors, mode='RGB'):
"Modified from https://github.com/CSAILVision/semantic-segmentation-pytorch/blob/master/mit_semseg/utils.py"
labelmap = labelmap.astype('int')
labelmap_rgb = np.zeros((labelmap.shape[0], labelmap.shape[1], 3),
dtype=np.uint8)
for label in unique(labelmap):
if label <= 0:
continue
# note the color_index = class_index - 1
labelmap_rgb += (labelmap == label)[:, :, np.newaxis] * \
np.tile(np.array(colors[label-1], dtype=np.uint8), (labelmap.shape[0], labelmap.shape[1], 1))
if mode == 'BGR':
return labelmap_rgb[:, :, ::-1]
else:
return labelmap_rgb
def define_colors_per_location_mean_sep():
num_locations = 150
    num_sep_per_channel = int(num_locations ** (1 / 3)) + 1  # 6 separations per channel
separation_per_channel = 256 // num_sep_per_channel
color_list = []
for location in range(num_locations):
num_seq_r = location // num_sep_per_channel ** 2
num_seq_g = (location % num_sep_per_channel ** 2) // num_sep_per_channel
num_seq_b = location % num_sep_per_channel
assert (num_seq_r <= num_sep_per_channel) and (num_seq_g <= num_sep_per_channel) \
and (num_seq_b <= num_sep_per_channel)
R = 255 - num_seq_r * separation_per_channel
G = 255 - num_seq_g * separation_per_channel
B = 255 - num_seq_b * separation_per_channel
assert (R < 256) and (G < 256) and (B < 256)
assert (R >= 0) and (G >= 0) and (B >= 0)
assert (R, G, B) not in color_list
color_list.append((R, G, B))
# print(location, (num_seq_r, num_seq_g, num_seq_b), (R, G, B))
return color_list
PALETTE = define_colors_per_location_mean_sep()
def get_args_parser():
parser = argparse.ArgumentParser('ADE20k semantic segmentation preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['training', 'validation'], required=True)
return parser.parse_args()
if __name__ == '__main__':
args = get_args_parser()
image_dir = os.path.join("datasets/ade20k/images", args.split)
segm_dir = os.path.join("datasets/ade20k/annotations", args.split)
save_dir = os.path.join("datasets/ade20k/annotations_with_color", args.split)
if not os.path.exists(save_dir):
os.makedirs(save_dir)
color_list = define_colors_per_location_mean_sep()
segm_path_list = glob.glob(os.path.join(segm_dir, '*.png'))
for segm_path in tqdm.tqdm(segm_path_list):
# check files
file_name = os.path.basename(segm_path)
# in ade20k, images are jpegs, while segms are pngs
image_path = os.path.join(image_dir, file_name.replace('.png', '.jpg'))
assert os.path.isfile(segm_path)
assert os.path.isfile(image_path)
# paint colors on segm
segm = Image.open(segm_path)
segm_color = colorEncode(labelmap=np.array(segm), colors=color_list).astype(np.uint8)
segm_color = Image.fromarray(segm_color)
segm_color.save(os.path.join(save_dir, file_name))
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import json
import tqdm
import argparse
def get_args_parser():
parser = argparse.ArgumentParser('ADE20k semantic segmentation preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['training', 'validation'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/ade20k')
return parser.parse_args()
if __name__ == '__main__':
args = get_args_parser()
image_dir = os.path.join("datasets/ade20k/images", args.split)
annos_dir = os.path.join("datasets/ade20k/annotations_with_color", args.split)
save_path = os.path.join(args.output_dir, "ade20k_{}_image_semantic.json".format(args.split))
output_dict = []
image_path_list = glob.glob(os.path.join(image_dir, '*g'))
for image_path in tqdm.tqdm(image_path_list):
image_name = image_path.split('/')[-1].split('.')[0]
image_path = os.path.join(image_dir, image_name + '.jpg')
panoptic_path = os.path.join(annos_dir, image_name + '.png')
assert os.path.isfile(image_path)
assert os.path.isfile(panoptic_path)
pair_dict = {}
pair_dict["image_path"] = os.path.join("ade20k/images/{}/".format(args.split), image_name + ".jpg")
pair_dict["target_path"] = "ade20k/annotations_with_color/{}/".format(args.split) + image_name + ".png"
pair_dict["type"] = "ade20k_image2semantic"
output_dict.append(pair_dict)
json.dump(output_dict, open(save_path, 'w'))
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import copy
import os
import argparse
import glob
import json
import warnings
import tqdm
import sys
sys.path.insert(0, "data")
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
# from .gen_color_coco_stuff_sem import PALETTE
# from .gen_color_ade20k_sem import unique, colorEncode
from skimage.segmentation import find_boundaries
from panopticapi.utils import rgb2id, IdGenerator
# define colors according to mean separation
def define_colors_by_mean_sep(num_colors=133, channelsep=7):
num_sep_per_channel = channelsep
separation_per_channel = 256 // num_sep_per_channel
color_dict = {}
# R = G = B = 0
# B += separation_per_channel # offset for the first loop
for location in range(num_colors):
num_seq_r = location // num_sep_per_channel ** 2
num_seq_g = (location % num_sep_per_channel ** 2) // num_sep_per_channel
num_seq_b = location % num_sep_per_channel
assert (num_seq_r <= num_sep_per_channel) and (num_seq_g <= num_sep_per_channel) \
and (num_seq_b <= num_sep_per_channel)
R = 255 - num_seq_r * separation_per_channel
G = 255 - num_seq_g * separation_per_channel
B = 255 - num_seq_b * separation_per_channel
assert (R < 256) and (G < 256) and (B < 256)
assert (R >= 0) and (G >= 0) and (B >= 0)
assert (R, G, B) not in color_dict.values()
color_dict[location] = (R, G, B)
# print(location, (num_seq_r, num_seq_g, num_seq_b), (R, G, B))
return color_dict
def load_image_with_retry(image_path):
while True:
try:
img = Image.open(image_path)
return img
except OSError as e:
print(f"Catched exception: {str(e)}. Re-trying...")
import time
time.sleep(1)
def get_args_parser():
parser = argparse.ArgumentParser('COCO panoptic segmentation preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['train2017', 'val2017'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/coco/pano_sem_seg')
return parser.parse_args()
if __name__ == '__main__':
args = get_args_parser()
split = args.split
channelsep = 7
json_file = 'datasets/coco/annotations/panoptic_{}.json'.format(split)
segmentations_folder = 'datasets/coco/annotations/panoptic_{}'.format(split)
img_folder = 'datasets/coco/{}'.format(split)
panoptic_coco_categories = 'data/panoptic_coco_categories.json'
output_dir = os.path.join(args.output_dir, 'panoptic_segm_{}_with_color'.format(split))
print(output_dir)
if not os.path.exists(output_dir):
os.makedirs(output_dir)
else:
warnings.warn("{} exists! make sure to overwrite?".format(output_dir))
# raise NotImplementedError("{} exists! make sure to overwrite?".format(output_dir))
# load cat info
with open(panoptic_coco_categories, 'r') as f:
categories_list = json.load(f)
categories = {category['id']: category for category in categories_list}
catid2colorid = {category['id']: idx for idx, category in enumerate(categories_list)}
# define colors (dict of cat_id to color mapper)
num_colors = len(categories)
color_dict = define_colors_by_mean_sep(num_colors=num_colors, channelsep=channelsep)
# load image annos
with open(json_file, 'r') as f:
coco_d = json.load(f)
num_iscrowd = 0
for ann in tqdm.tqdm(coco_d['annotations']):
# save the time for loading images
# # find input img that correspond to the annotation
segmentation_org = np.array(
load_image_with_retry(os.path.join(segmentations_folder, ann['file_name'])),
dtype=np.uint8
)
segmentation_id = rgb2id(segmentation_org)
image_height_segm, image_width_segm = segmentation_org.shape[0], segmentation_org.shape[1]
image_height, image_width = image_height_segm, image_width_segm
segmentation = copy.deepcopy(segmentation_org)
segmentation[:, :, :] = 0
boxes = [seg['bbox'] for seg in ann['segments_info']] # x, y, w, h
if len(boxes) == 0:
print("bbox is empty!")
continue
boxes = np.array(boxes) # (num_boxes, 4)
for segment_info in ann['segments_info']:
# retrieval color using class id
catid = segment_info['category_id']
colorid = catid2colorid[catid]
color = color_dict[colorid]
# paint color
mask = segmentation_id == segment_info['id']
segmentation[mask] = color
segmentation = Image.fromarray(segmentation)
segmentation.save(os.path.join(output_dir, ann['file_name']))
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import json
import tqdm
import argparse
def get_args_parser():
parser = argparse.ArgumentParser('COCO semantic segmentation preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['train2017', 'val2017'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/coco/pano_sem_seg/')
return parser.parse_args()
if __name__ == "__main__":
args = get_args_parser()
image_dir = "datasets/coco/{}/".format(args.split)
panoptic_dir = 'datasets/coco/pano_sem_seg/panoptic_segm_{}_with_color/'.format(args.split)
save_path = os.path.join(args.output_dir, "coco_{}_image_panoptic_sem_seg.json".format(args.split))
print(save_path)
output_dict = []
image_path_list = glob.glob(image_dir + '*g')
for image_path in tqdm.tqdm(image_path_list):
image_name = image_path.split('/')[-1].split('.')[0]
image_path = os.path.join(image_dir, image_name+'.jpg')
panoptic_path = os.path.join(panoptic_dir, image_name+'.png')
assert os.path.isfile(image_path)
if not os.path.isfile(panoptic_path):
print("ignore {}".format(image_path))
continue
pair_dict = {}
pair_dict["image_path"] = image_path.replace('datasets/', '')
pair_dict["target_path"] = panoptic_path.replace('datasets/', '')
pair_dict["type"] = "coco_image2panoptic_sem_seg"
output_dict.append(pair_dict)
json.dump(output_dict, open(save_path, 'w'))
# Source: https://stackoverflow.com/a/39225039
import requests
def download_file_from_google_drive(id, destination):
def get_confirm_token(response):
for key, value in response.cookies.items():
if key.startswith('download_warning'):
return value
return None
def save_response_content(response, destination):
CHUNK_SIZE = 32768
with open(destination, "wb") as f:
for chunk in response.iter_content(CHUNK_SIZE):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
URL = "https://docs.google.com/uc?export=download"
session = requests.Session()
response = session.get(URL, params = { 'id' : id }, stream = True)
token = get_confirm_token(response)
if token:
params = { 'id' : id, 'confirm' : token }
response = session.get(URL, params = params, stream = True)
save_response_content(response, destination)
if __name__ == "__main__":
import sys
if len(sys.argv) != 3:
print("Usage: python google_drive.py drive_file_id destination_file_path")
else:
# TAKE ID FROM SHAREABLE LINK
file_id = sys.argv[1]
# DESTINATION FILE ON YOUR DISK
destination = sys.argv[2]
download_file_from_google_drive(file_id, destination)
# Copyright (C) 2019 Jin Han Lee
#
# This file is a part of BTS.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>
from __future__ import absolute_import, division, print_function
import os
import argparse
import fnmatch
import cv2
import numpy as np
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
def convert_arg_line_to_args(arg_line):
for arg in arg_line.split():
if not arg.strip():
continue
yield arg
parser = argparse.ArgumentParser(description='BTS TensorFlow implementation.', fromfile_prefix_chars='@')
parser.convert_arg_line_to_args = convert_arg_line_to_args
parser.add_argument('--pred_path', type=str, help='path to the prediction results in png', required=True)
parser.add_argument('--gt_path', type=str, help='root path to the groundtruth data', required=False,
default='datasets/nyuv2/official_splits/test/')
parser.add_argument('--dataset', type=str, help='dataset to test on, nyu or kitti', default='nyu')
parser.add_argument('--eigen_crop', help='if set, crops according to Eigen NIPS14', action='store_true')
parser.add_argument('--garg_crop', help='if set, crops according to Garg ECCV16', action='store_true')
parser.add_argument('--min_depth_eval', type=float, help='minimum depth for evaluation', default=1e-3)
parser.add_argument('--max_depth_eval', type=float, help='maximum depth for evaluation', default=10)
parser.add_argument('--do_kb_crop', help='if set, crop input images as kitti benchmark images', action='store_true')
args = parser.parse_args()
def compute_errors(gt, pred):
thresh = np.maximum((gt / pred), (pred / gt))
d1 = (thresh < 1.25).mean()
d2 = (thresh < 1.25 ** 2).mean()
d3 = (thresh < 1.25 ** 3).mean()
rmse = (gt - pred) ** 2
rmse = np.sqrt(rmse.mean())
rmse_log = (np.log(gt) - np.log(pred)) ** 2
rmse_log = np.sqrt(rmse_log.mean())
abs_rel = np.mean(np.abs(gt - pred) / gt)
sq_rel = np.mean(((gt - pred)**2) / gt)
err = np.log(pred) - np.log(gt)
silog = np.sqrt(np.mean(err ** 2) - np.mean(err) ** 2) * 100
err = np.abs(np.log10(pred) - np.log10(gt))
log10 = np.mean(err)
return silog, log10, abs_rel, sq_rel, rmse, rmse_log, d1, d2, d3
def test():
global gt_depths, missing_ids, pred_filenames
gt_depths = []
missing_ids = set()
pred_filenames = []
for root, dirnames, filenames in os.walk(args.pred_path):
for pred_filename in fnmatch.filter(filenames, '*.png'):
if 'cmap' in pred_filename or 'gt' in pred_filename:
continue
dirname = root.replace(args.pred_path, '')
pred_filenames.append(os.path.join(dirname, pred_filename))
num_test_samples = len(pred_filenames)
pred_depths = []
for i in range(num_test_samples):
pred_depth_path = os.path.join(args.pred_path, pred_filenames[i])
pred_depth = cv2.imread(pred_depth_path, -1)
if pred_depth is None:
print('Missing: %s ' % pred_depth_path)
missing_ids.add(i)
continue
if args.dataset == 'nyu':
pred_depth = pred_depth.astype(np.float32) / 1000.0
else:
pred_depth = pred_depth.astype(np.float32) / 256.0
pred_depths.append(pred_depth)
print('Raw png files reading done')
print('Evaluating {} files'.format(len(pred_depths)))
if args.dataset == 'kitti':
for t_id in range(num_test_samples):
file_dir = pred_filenames[t_id].split('.')[0]
filename = file_dir.split('_')[-1]
directory = file_dir.replace('_' + filename, '')
gt_depth_path = os.path.join(args.gt_path, directory, 'proj_depth/groundtruth/image_02', filename + '.png')
depth = cv2.imread(gt_depth_path, -1)
if depth is None:
print('Missing: %s ' % gt_depth_path)
missing_ids.add(t_id)
continue
depth = depth.astype(np.float32) / 256.0
gt_depths.append(depth)
elif args.dataset == 'nyu':
for t_id in range(num_test_samples):
file_dir = pred_filenames[t_id].split('.')[0]
filename = file_dir.split('_')[-1]
directory = file_dir.replace('_rgb_'+file_dir.split('_')[-1], '')
gt_depth_path = os.path.join(args.gt_path, directory, 'sync_depth_' + filename + '.png')
depth = cv2.imread(gt_depth_path, -1)
if depth is None:
print('Missing: %s ' % gt_depth_path)
missing_ids.add(t_id)
continue
depth = depth.astype(np.float32) / 1000.0
gt_depths.append(depth)
print('GT files reading done')
print('{} GT files missing'.format(len(missing_ids)))
print('Computing errors')
eval(pred_depths)
print('Done.')
def eval(pred_depths):
num_samples = len(pred_depths)
pred_depths_valid = []
i = 0
for t_id in range(num_samples):
if t_id in missing_ids:
continue
pred_depths_valid.append(pred_depths[t_id])
num_samples = num_samples - len(missing_ids)
silog = np.zeros(num_samples, np.float32)
log10 = np.zeros(num_samples, np.float32)
rms = np.zeros(num_samples, np.float32)
log_rms = np.zeros(num_samples, np.float32)
abs_rel = np.zeros(num_samples, np.float32)
sq_rel = np.zeros(num_samples, np.float32)
d1 = np.zeros(num_samples, np.float32)
d2 = np.zeros(num_samples, np.float32)
d3 = np.zeros(num_samples, np.float32)
for i in range(num_samples):
gt_depth = gt_depths[i]
pred_depth = pred_depths_valid[i]
pred_depth[pred_depth < args.min_depth_eval] = args.min_depth_eval
pred_depth[pred_depth > args.max_depth_eval] = args.max_depth_eval
pred_depth[np.isinf(pred_depth)] = args.max_depth_eval
gt_depth[np.isinf(gt_depth)] = 0
gt_depth[np.isnan(gt_depth)] = 0
valid_mask = np.logical_and(gt_depth > args.min_depth_eval, gt_depth < args.max_depth_eval)
if args.do_kb_crop:
height, width = gt_depth.shape
top_margin = int(height - 352)
left_margin = int((width - 1216) / 2)
pred_depth_uncropped = np.zeros((height, width), dtype=np.float32)
pred_depth_uncropped[top_margin:top_margin + 352, left_margin:left_margin + 1216] = pred_depth
pred_depth = pred_depth_uncropped
if args.garg_crop or args.eigen_crop:
gt_height, gt_width = gt_depth.shape
eval_mask = np.zeros(valid_mask.shape)
if args.garg_crop:
eval_mask[int(0.40810811 * gt_height):int(0.99189189 * gt_height), int(0.03594771 * gt_width):int(0.96405229 * gt_width)] = 1
elif args.eigen_crop:
if args.dataset == 'kitti':
eval_mask[int(0.3324324 * gt_height):int(0.91351351 * gt_height), int(0.0359477 * gt_width):int(0.96405229 * gt_width)] = 1
else:
eval_mask[45:471, 41:601] = 1
valid_mask = np.logical_and(valid_mask, eval_mask)
silog[i], log10[i], abs_rel[i], sq_rel[i], rms[i], log_rms[i], d1[i], d2[i], d3[i] = compute_errors(gt_depth[valid_mask], pred_depth[valid_mask])
print("{:>7}, {:>7}, {:>7}, {:>7}, {:>7}, {:>7}, {:>7}, {:>7}, {:>7}".format(
'd1', 'd2', 'd3', 'AbsRel', 'SqRel', 'RMSE', 'RMSElog', 'SILog', 'log10'))
print("{:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}, {:7.3f}".format(
d1.mean(), d2.mean(), d3.mean(),
abs_rel.mean(), sq_rel.mean(), rms.mean(), log_rms.mean(), silog.mean(), log10.mean()))
return silog, log10, abs_rel, sq_rel, rms, log_rms, d1, d2, d3
def main():
test()
if __name__ == '__main__':
main()
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#######################################################################################
# The MIT License
# Copyright (c) 2014 Hannes Schulz, University of Bonn <schulz@ais.uni-bonn.de>
# Copyright (c) 2013 Benedikt Waldvogel, University of Bonn <mail@bwaldvogel.de>
# Copyright (c) 2008-2009 Sebastian Nowozin <nowozin@gmail.com>
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#######################################################################################
#
# Helper script to convert the NYU Depth v2 dataset Matlab file into a set of
# PNG and JPEG images.
#
# See https://github.com/deeplearningais/curfil/wiki/Training-and-Prediction-with-the-NYU-Depth-v2-Dataset
from __future__ import print_function
import h5py
import numpy as np
import os
import scipy.io
import sys
import cv2
def convert_image(i, scene, depth_raw, image):
idx = int(i) + 1
if idx in train_images:
train_test = "train"
else:
assert idx in test_images, "index %d neither found in training set nor in test set" % idx
train_test = "test"
folder = "%s/%s/%s" % (out_folder, train_test, scene)
if not os.path.exists(folder):
os.makedirs(folder)
img_depth = depth_raw * 1000.0
img_depth_uint16 = img_depth.astype(np.uint16)
cv2.imwrite("%s/sync_depth_%05d.png" % (folder, i), img_depth_uint16)
image = image[:, :, ::-1]
image_black_boundary = np.zeros((480, 640, 3), dtype=np.uint8)
image_black_boundary[7:474, 7:632, :] = image[7:474, 7:632, :]
cv2.imwrite("%s/rgb_%05d.jpg" % (folder, i), image_black_boundary)
if __name__ == "__main__":
if len(sys.argv) < 4:
print("usage: %s <h5_file> <train_test_split> <out_folder>" % sys.argv[0], file=sys.stderr)
sys.exit(0)
h5_file = h5py.File(sys.argv[1], "r")
# h5py is not able to open that file. but scipy is
train_test = scipy.io.loadmat(sys.argv[2])
out_folder = sys.argv[3]
test_images = set([int(x) for x in train_test["testNdxs"]])
train_images = set([int(x) for x in train_test["trainNdxs"]])
print("%d training images" % len(train_images))
print("%d test images" % len(test_images))
depth_raw = h5_file['rawDepths']
print("reading", sys.argv[1])
images = h5_file['images']
# scenes = [u''.join(chr(c) for c in h5_file[obj_ref]) for obj_ref in h5_file['sceneTypes'][0]]
scenes = [u''.join(chr(c[0]) for c in h5_file[obj_ref]) for obj_ref in h5_file['sceneTypes'][0]]
print("processing images")
for i, image in enumerate(images):
print("image", i + 1, "/", len(images))
convert_image(i, scenes[i], depth_raw[i, :, :].T, image.T)
print("Finished")
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import json
import tqdm
import argparse
def get_args_parser():
parser = argparse.ArgumentParser('NYU Depth V2 preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['sync', 'test'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/nyu_depth_v2')
return parser.parse_args()
if __name__ == '__main__':
args = get_args_parser()
split2dir = {
'sync': 'sync',
'test': 'official_splits/test',
}
split_dir = split2dir[args.split]
output_dict = []
save_path = os.path.join(args.output_dir, "nyuv2_{}_image_depth.json".format(args.split))
src_dir = os.path.join("datasets/nyu_depth_v2", split_dir)
image_path_list = glob.glob(src_dir + "/*/rgb_*.jpg")
for image_path in tqdm.tqdm(image_path_list):
room_name = image_path.split('/')[-2]
frame_name = image_path.split('/')[-1].split('.')[0].split('_')[1]
target_path = src_dir + '/' + room_name + '/sync_depth_' + frame_name + '.png'
assert os.path.isfile(image_path)
assert os.path.isfile(target_path)
image_name = image_path.split('{}/'.format(args.split))[-1]
target_name = target_path.split('{}/'.format(args.split))[-1]
pair_dict = {}
pair_dict["image_path"] = "nyu_depth_v2/{}/".format(split_dir) + image_name
pair_dict["target_path"] = "nyu_depth_v2/{}/".format(split_dir) + target_name
pair_dict["type"] = "nyuv2_image2depth"
output_dict.append(pair_dict)
json.dump(output_dict, open(save_path, 'w'))
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import json
import tqdm
import argparse
def get_args_parser():
parser = argparse.ArgumentParser('Deraining preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['train', 'val'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/derain')
return parser.parse_args()
if __name__ == "__main__":
args = get_args_parser()
if args.split == 'train':
image_dir = "datasets/derain/train/input/"
save_path = os.path.join(args.output_dir, "derain_train.json")
elif args.split == 'val':
# we simply use Rain100H for loss eval
image_dir = "datasets/derain/test/Rain100H/input/"
save_path = os.path.join(args.output_dir, "derain_test_rain100h.json")
else:
raise NotImplementedError
print(save_path)
output_dict = []
image_path_list = glob.glob(os.path.join(image_dir, '*.png')) + glob.glob(os.path.join(image_dir, '*.jpg'))
for image_path in tqdm.tqdm(image_path_list):
# image_name = os.path.basename(image_path)
target_path = image_path.replace('input', 'target')
assert os.path.isfile(image_path)
assert os.path.isfile(target_path)
pair_dict = {}
pair_dict["image_path"] = image_path.replace('datasets/', '')
pair_dict["target_path"] = target_path.replace('datasets/', '')
pair_dict["type"] = "derain_image2derain"
output_dict.append(pair_dict)
json.dump(output_dict, open(save_path, 'w'))
# --------------------------------------------------------
# Images Speak in Images: A Generalist Painter for In-Context Visual Learning (https://arxiv.org/abs/2212.02499)
# Github source: https://github.com/baaivision/Painter
# Copyright (c) 2022 Beijing Academy of Artificial Intelligence (BAAI)
# Licensed under The MIT License [see LICENSE for details]
# By Xinlong Wang, Wen Wang
# Based on MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetetection, mmpose, MIRNet, MPRNet, and Uformer codebases
# --------------------------------------------------------'
import os
import glob
import json
import tqdm
import argparse
def get_args_parser():
    parser = argparse.ArgumentParser('Low light enhancement preparation', add_help=False)
parser.add_argument('--split', type=str, help='dataset split',
choices=['train', 'val'], required=True)
parser.add_argument('--output_dir', type=str, help='path to output dir',
default='datasets/light_enhance')
return parser.parse_args()
if __name__ == "__main__":
args = get_args_parser()
save_path = os.path.join(args.output_dir, "enhance_lol_{}.json".format(args.split))
if args.split == 'train':
image_dir = "datasets/light_enhance/our485/low/"
elif args.split == 'val':
image_dir = "datasets/light_enhance/eval15/low/"
else:
raise NotImplementedError
print(save_path)
output_dict = []
image_path_list = glob.glob(os.path.join(image_dir, '*.png')) + glob.glob(os.path.join(image_dir, '*.jpg'))
for image_path in tqdm.tqdm(image_path_list):
target_path = image_path.replace('low', 'high')
assert os.path.isfile(image_path)
assert os.path.isfile(target_path)
pair_dict = {}
pair_dict["image_path"] = image_path.replace('datasets/', '')
pair_dict["target_path"] = target_path.replace('datasets/', '')
pair_dict["type"] = "lol_image2enhance"
output_dict.append(pair_dict)
json.dump(output_dict, open(save_path, 'w'))
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import os
import random
import sys
import tqdm
import numpy as np
import torch
import torch.distributed as dist
from mmcv.runner import (DistSamplerSeedHook, EpochBasedRunner,
Fp16OptimizerHook, OptimizerHook, build_runner,
get_dist_info)
from mmdet.core import DistEvalHook, EvalHook, build_optimizer
from mmdet.datasets import (build_dataloader, build_dataset,
replace_ImageToTensor)
from mmdet.utils import (build_ddp, build_dp, compat_cfg,
find_latest_checkpoint, get_root_logger)
def train_detector(model,
dataset,
cfg,
distributed=False,
validate=False,
timestamp=None,
meta=None):
cfg = compat_cfg(cfg)
logger = get_root_logger(log_level=cfg.log_level)
# prepare data loaders
dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset]
runner_type = 'EpochBasedRunner' if 'runner' not in cfg else cfg.runner[
'type']
train_dataloader_default_args = dict(
samples_per_gpu=2,
workers_per_gpu=2,
# `num_gpus` will be ignored if distributed
num_gpus=len(cfg.gpu_ids),
dist=distributed,
seed=cfg.seed,
runner_type=runner_type,
persistent_workers=False)
train_loader_cfg = {
**train_dataloader_default_args,
**cfg.data.get('train_dataloader', {})
}
load_data_only = cfg.custom.get('load_data_only', False)
assert load_data_only
train_loader_cfg_custom = copy.deepcopy(train_loader_cfg)
# train_loader_cfg_custom['shuffle'] = False # we prefer gen data in order
# train_loader_cfg_custom['dist'] = False
data_loaders = [build_dataloader(ds, **train_loader_cfg_custom) for ds in dataset]
# only enumerate dataset
for data_loader in data_loaders:
for _ in tqdm.tqdm(data_loader):
pass
print("dataset enumerated, exit!")
sys.exit()
# dataset settings
dataset_type = 'CocoPanopticDatasetCustom'
data_root = '/home/datasets/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='LoadPanopticAnnotations',
with_bbox=True,
with_mask=True,
with_seg=True),
dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='SegRescale', scale_factor=1 / 4),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/panoptic_train2017.json',
img_prefix=data_root + 'train2017/',
seg_prefix=data_root + 'annotations/panoptic_train2017/',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/panoptic_val2017.json',
img_prefix=data_root + 'val2017/',
seg_prefix=data_root + 'annotations/panoptic_val2017/',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/panoptic_val2017.json',
img_prefix=data_root + 'val2017/',
seg_prefix=data_root + 'annotations/panoptic_val2017/',
pipeline=test_pipeline))
evaluation = dict(interval=1, metric=['PQ'])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
# disable opencv multithreading to avoid system being overloaded
opencv_num_threads = 0
# set multi-process start method as `fork` to speed up the training
mp_start_method = 'fork'
# Default setting for scaling LR automatically
# - `enable` means enable scaling LR automatically
# or not by default.
# - `base_batch_size` = (8 GPUs) x (2 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=16)
# modified from mask2former config
_base_ = [
'./_base_/dataset/coco_panoptic.py', './_base_/default_runtime.py'
]
num_things_classes = 80
num_stuff_classes = 53
num_classes = num_things_classes + num_stuff_classes
model = None
# dataset settings
image_size = (1024, 1024)
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile', to_float32=True),
dict(
type='LoadPanopticAnnotations',
with_bbox=True,
with_mask=True,
with_seg=True),
dict(type='RandomFlip', flip_ratio=0.5),
# large scale jittering
dict(
type='Resize',
img_scale=image_size,
ratio_range=(0.7, 2.0),
multiscale_mode='range',
keep_ratio=False),
dict(
type='RandomCrop',
crop_size=image_size,
crop_type='absolute',
recompute_bbox=True,
allow_negative_crop=True),
# dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=image_size),
dict(
type='SaveDataPairCustom',
dir_name='train_aug0',
target_path='/home/datasets/coco/pano_ca_inst',
), # custom, we don't care the transforms afterward
dict(type='DefaultFormatBundle', img_to_float=True),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
]
test_pipeline = [
dict(type='LoadImageFromFile', to_float32=True),
dict(
type='LoadPanopticAnnotations',
with_bbox=True,
with_mask=True,
with_seg=True),
dict(type='RandomFlip', flip_ratio=0.0),
# large scale jittering
dict(
type='Resize',
img_scale=image_size,
ratio_range=(1.0, 1.0),
multiscale_mode='range',
keep_ratio=False),
dict(type='Pad', size=image_size),
dict(
type='SaveDataPairCustom',
dir_name='val_org',
target_path='/home/datasets/coco/pano_ca_inst',
), # custom, we don't care the transforms afterward
dict(type='Normalize', **img_norm_cfg),
dict(type='DefaultFormatBundle', img_to_float=True),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
]
data_root = '/home/datasets/coco/'
data = dict(
samples_per_gpu=8,
workers_per_gpu=8,
train=dict(pipeline=train_pipeline),
val=dict(
pipeline=test_pipeline,
ins_ann_file=data_root + 'annotations/instances_val2017.json',
),
test=dict(
pipeline=test_pipeline,
ins_ann_file=data_root + 'annotations/instances_val2017.json',
))
embed_multi = dict(lr_mult=1.0, decay_mult=0.0)
# optimizer
optimizer = dict(
type='AdamW',
lr=0.0001,
weight_decay=0.05,
eps=1e-8,
betas=(0.9, 0.999),
paramwise_cfg=dict(
custom_keys={
'backbone': dict(lr_mult=0.1, decay_mult=1.0),
'query_embed': embed_multi,
'query_feat': embed_multi,
'level_embed': embed_multi,
},
norm_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=0.01, norm_type=2))
custom = dict(
load_data_only=True,
)
by_epoch = True
# learning policy
lr_config = dict(
policy='step',
gamma=0.1,
by_epoch=by_epoch,
step=[327778, 355092],
warmup='linear',
warmup_by_epoch=by_epoch,
warmup_ratio=1.0, # no warmup
warmup_iters=10)
max_iters = 368750
# runner = dict(type='IterBasedRunner', max_iters=max_iters)
runner = dict(type='EpochBasedRunner', max_epochs=1) # we prefer by epoch
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook', by_epoch=by_epoch),
dict(type='TensorboardLoggerHook', by_epoch=by_epoch)
])
interval = 5000
workflow = [('train', interval)]
checkpoint_config = dict(
by_epoch=by_epoch, interval=interval, save_last=True, max_keep_ckpts=3)
# Before 365001th iteration, we do evaluation every 5000 iterations.
# After 365000th iteration, we do evaluation every 368750 iterations,
# which means that we do evaluation at the end of training.
dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)]
evaluation = dict(
interval=interval,
dynamic_intervals=dynamic_intervals,
metric=['PQ', 'bbox', 'segm'])
# import newly registered module
custom_imports = dict(
imports=[
'data.coco_panoptic',
'data.pipelines.transforms',
],
allow_failed_imports=False)
# modified from mask2former config
_base_ = [
'./_base_/dataset/coco_panoptic.py', './_base_/default_runtime.py'
]
num_things_classes = 80
num_stuff_classes = 53
num_classes = num_things_classes + num_stuff_classes
model = None
# dataset settings
image_size = (1024, 1024)
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile', to_float32=True),
dict(
type='LoadPanopticAnnotations',
with_bbox=True,
with_mask=True,
with_seg=True),
dict(type='RandomFlip', flip_ratio=0.0),
# # large scale jittering
dict(
type='Resize',
img_scale=image_size,
ratio_range=(1.0, 1.0),
multiscale_mode='range',
keep_ratio=False),
# dict(
# type='RandomCrop',
# crop_size=image_size,
# crop_type='absolute',
# recompute_bbox=True,
# allow_negative_crop=True),
# dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=image_size),
dict(
type='SaveDataPairCustom',
dir_name='train_org',
target_path='/home/datasets/coco/pano_ca_inst',
), # custom, we don't care the transforms afterward
dict(type='DefaultFormatBundle', img_to_float=True),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
]
test_pipeline = [
dict(type='LoadImageFromFile', to_float32=True),
dict(
type='LoadPanopticAnnotations',
with_bbox=True,
with_mask=True,
with_seg=True),
dict(type='RandomFlip', flip_ratio=0.0),
# large scale jittering
dict(
type='Resize',
img_scale=image_size,
ratio_range=(1.0, 1.0),
multiscale_mode='range',
keep_ratio=False),
dict(type='Pad', size=image_size),
dict(
type='SaveDataPairCustom',
dir_name='val_org',
target_path='/home/datasets/coco/pano_ca_inst',
), # custom, we don't care the transforms afterward
dict(type='Normalize', **img_norm_cfg),
dict(type='DefaultFormatBundle', img_to_float=True),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
]
data_root = '/home/datasets/coco/'
data = dict(
samples_per_gpu=8,
workers_per_gpu=8,
train=dict(pipeline=train_pipeline),
val=dict(
pipeline=test_pipeline,
ins_ann_file=data_root + 'annotations/instances_val2017.json',
),
test=dict(
pipeline=test_pipeline,
ins_ann_file=data_root + 'annotations/instances_val2017.json',
))
embed_multi = dict(lr_mult=1.0, decay_mult=0.0)
# optimizer
optimizer = dict(
type='AdamW',
lr=0.0001,
weight_decay=0.05,
eps=1e-8,
betas=(0.9, 0.999),
paramwise_cfg=dict(
custom_keys={
'backbone': dict(lr_mult=0.1, decay_mult=1.0),
'query_embed': embed_multi,
'query_feat': embed_multi,
'level_embed': embed_multi,
},
norm_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=0.01, norm_type=2))
custom = dict(
load_data_only=True,
)
by_epoch = True
# learning policy
lr_config = dict(
policy='step',
gamma=0.1,
by_epoch=by_epoch,
step=[327778, 355092],
warmup='linear',
warmup_by_epoch=by_epoch,
warmup_ratio=1.0, # no warmup
warmup_iters=10)
max_iters = 368750
# runner = dict(type='IterBasedRunner', max_iters=max_iters)
runner = dict(type='EpochBasedRunner', max_epochs=1) # we prefer by epoch
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook', by_epoch=by_epoch),
dict(type='TensorboardLoggerHook', by_epoch=by_epoch)
])
interval = 5000
workflow = [('train', interval)]
checkpoint_config = dict(
by_epoch=by_epoch, interval=interval, save_last=True, max_keep_ckpts=3)
# Before 365001th iteration, we do evaluation every 5000 iterations.
# After 365000th iteration, we do evaluation every 368750 iterations,
# which means that we do evaluation at the end of training.
dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)]
evaluation = dict(
interval=interval,
dynamic_intervals=dynamic_intervals,
metric=['PQ', 'bbox', 'segm'])
# import newly registered module
custom_imports = dict(
imports=[
'data.coco_panoptic',
'data.pipelines.transforms',
],
allow_failed_imports=False)