README.md 10 KB
Newer Older
chenych's avatar
chenych committed
1
2
3
4
5
6
7
# Painter
## 论文
[Images Speak in Images: A Generalist Painter for In-Context Visual Learning](https://arxiv.org/abs/2212.02499)

## 模型结构

<div align=center>
Rayyyyy's avatar
Rayyyyy committed
8
    <img src="./doc/method.png"/>
chenych's avatar
chenych committed
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
</div>

## 算法原理
将视觉任务的连续输出空间离散化, 并使用语言或专门设计的离散标记作为任务提示, 将视觉问题转化为 NLP 问题.

<div align=center>
    <img src="./doc/progress.png"/>
</div>

## 环境配置
Tips: timm==0.3.2 版本存在 [cannot import name 'container_abcs' from 'torch._six'](https://github.com/huggingface/pytorch-image-models/issues/420#issuecomment-776459842) 问题, 需要将 timm/models/layers/helpers.py 中 from torch._six import container_abcs 修改为

```bash
import torch
TORCH_MAJOR = int(torch.__version__.split('.')[0])
TORCH_MINOR = int(torch.__version__.split('.')[1])

if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
chenych's avatar
chenych committed
29
    import collections.abc as container_abcs
chenych's avatar
chenych committed
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
```

### Docker(方法一)

-v 路径、docker_name和imageID根据实际情况修改

```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py38-latest
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/painter_pytorch
pip install --upgrade setuptools wheel
pip install -r requirements.txt
# 安装detectron2
git clone https://github.com/facebookresearch/detectron2
python -m pip install -e detectron2
```

### Dockerfile(方法二)

-v 路径、docker_name和imageID根据实际情况修改

```bash
cd ./docker
cp ../requirements.txt requirements.txt
docker build --no-cache -t painter:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/painter_pytorch
pip install --upgrade setuptools wheel
pip install -r requirements.txt
# 安装detectron2
git clone https://github.com/facebookresearch/detectron2
python -m pip install -e detectron2
```

### Anaconda(方法三)

1、关于本项目DCU显卡所需的特殊深度学习库可从光合开发者社区下载安装: https://developer.hpccube.com/tool/

```
DTK软件栈:dtk23.04.1
python:python3.8
torch:1.13.1
torchvision:0.14.1
```

Tips:以上dtk软件栈、python、torch等DCU相关工具版本需要严格一一对应

2、其他非特殊库直接按照requirements.txt安装

```bash
pip install --upgrade setuptools wheel
pip install -r requirements.txt
# 安装detectron2
git clone https://github.com/facebookresearch/detectron2
python -m pip install -e detectron2
```

## 数据集
chenych's avatar
chenych committed
90
91
本项目所需数据集较多, 所以提供了项目数据集需求较多, 可以使用提供的[a toy training dataset](https://huggingface.co/BAAI/Painter/blob/main/toy_datasets.tar)数据集来验证部分功能, 数据集由每个类别中各10个类别组成. 将数据集放置于 `$Painter_ROOT/toy_datasets` 路径下, 并设置`$Painter_ROOT/train_painter_vit_large.sh``DATA_PATH=toy_datasets`即可, 其他参数请参考训练章节的介绍.
如需完整数据集, 请参考[data instructions](docs/DATA.md).
chenych's avatar
chenych committed
92

chenych's avatar
chenych committed
93
完整数据集的目录结构如下:
chenych's avatar
chenych committed
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180

```
├── nyu_depth_v2/
│   ├── sync/
│   ├── official_splits/
│   ├── nyu_depth_v2_labeled.mat
│   ├── nyuv2_sync_image_depth.json  # 生成
│   ├── nyuv2_test_image_depth.json  # 生成
├── ade20k/
│   ├── images/
│   ├── annotations/
│   ├── annotations_detectron2/  # 生成
│   ├── annotations_with_color/  # 生成
│   ├── ade20k_training_image_semantic.json  # 生成
│   ├── ade20k_validation_image_semantic.json  # 生成
├── ADEChallengeData2016/  # sim-link to $Painter_ROOT/datasets/ade20k
├── coco/
│   ├── train2017/
│   ├── val2017/
│   ├── annotations/
│       ├── instances_train2017.json
│       ├── instances_val2017.json
│       ├── person_keypoints_val2017.json
│       ├── panoptic_train2017.json
│       ├── panoptic_val2017.json
│       ├── panoptic_train2017/
│       ├── panoptic_val2017/
│   ├── panoptic_semseg_val2017/  # 生成
│   ├── panoptic_val2017/  # sim-link to $Painter_ROOT/datasets/coco/annotations/panoptic_val2017
│   ├── pano_sem_seg/  # 生成
│       ├── panoptic_segm_train2017_with_color
│       ├── panoptic_segm_val2017_with_color
│       ├── coco_train2017_image_panoptic_sem_seg.json
│       ├── coco_val2017_image_panoptic_sem_seg.json
│   ├── pano_ca_inst/  # 生成
│       ├── train_aug0/
│       ├── train_aug1/
│       ├── ...
│       ├── train_aug29/
│       ├── train_org/
│       ├── train_flip/
│       ├── val_org/
│       ├── coco_train_image_panoptic_inst.json
│       ├── coco_val_image_panoptic_inst.json
├── coco_pose/
│   ├── person_detection_results/
│       ├── COCO_val2017_detections_AP_H_56_person.json
│   ├── data_pair/  # 生成
│       ├── train_256x192_aug0/
│       ├── train_256x192_aug1/
│       ├── ...
│       ├── train_256x192_aug19/
│       ├── val_256x192/
│       ├── test_256x192/
│       ├── test_256x192_flip/
│   ├── coco_pose_256x192_train.json  # 生成
│   ├── coco_pose_256x192_val.json  # 生成
├── derain/
│   ├── train/
│       ├── input/
│       ├── target/
│   ├── test/
│       ├── Rain100H/
│       ├── Rain100L/
│       ├── Test100/
│       ├── Test1200/
│       ├── Test2800/
│   ├── derain_train.json
│   ├── derain_test_rain100h.json
├── denoise/
│   ├── SIDD_Medium_Srgb/
│   ├── train/
│   ├── val/
│   ├── denoise_ssid_train.json  # 生成
│   ├── denoise_ssid_val.json  # 生成
├── light_enhance/
│   ├── our485/
│       ├── low/
│       ├── high/
│   ├── eval15/
│       ├── low/
│       ├── high/
│   ├── enhance_lol_train.json  # 生成
│   ├── enhance_lol_val.json  # 生成
```

## 训练
181
下载预训练模型 [MAE ViT-Large model ](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth), 修改 `$Painter_ROOT/train.sh``$Painter_ROOT/single_process.sh` 中finetune参数地址.
chenych's avatar
chenych committed
182
183

### 单机多卡
Rayyyyy's avatar
Rayyyyy committed
184
本项目默认参数是单机4卡 (total_bsz = 1x4x32 = 128), 如需使用其他的卡数, 请修改 train.sh 中对应参数.
chenych's avatar
chenych committed
185
186
187
188
189
190
```bash
bash train.sh
```

### 多机多卡
Tips: 作者使用8个节点, 每个节点8张卡 (total_bsz = 8x8x32 = 2048) 进行的训练;
191
使用多节点的情况下,需要将使用节点写入hostfile文件, 多节点每个节点一行, 例如: c1xxxxxx slots=4
chenych's avatar
chenych committed
192
```bash
Rayyyyy's avatar
Rayyyyy committed
193
bash run_train_multi.sh
chenych's avatar
chenych committed
194
195
196
```

## 推理
chenych's avatar
chenych committed
197
198
199
200
201
202
1. 下载推理模型[🤗 Hugging Face Models](https://huggingface.co/BAAI/Painter/blob/main/painter_vit_large.pth), 或者准备好自己的待测试模型

2. 部分测试无法使用toy_datasets进行验证,如果使用toy_datasets数据集进行推理, 需确认默认图片是否存在, 如不存在请修改对应参数, 如
`eval/nyuv2_depth/eval.sh` 中的 `PROMPT="study_room_0005b/rgb_00094"`, rgb_00094图片不存在于toy_datasets数据集中,请对应修改为toy_datasets中的图片名,如`PROMPT="study_room_0005b/rgb_00092"`

各个数据集推理方法如下:
chenych's avatar
chenych committed
203
204
205

### NYU Depth V2

chenych's avatar
chenych committed
206
设置 `$Painter_ROOT/eval/nyuv2_depth/eval.sh` 文件里的 `JOB_NAME``PROMPT``CKPT_FILE``DATA_DIR` 参数:
chenych's avatar
chenych committed
207
208
209
210
211
212
```bash
bash eval/nyuv2_depth/eval.sh
```

### ADE20k Semantic Segmentation

chenych's avatar
chenych committed
213
214
1. 无法使用toy_datasets进行验证;
2. 设置 `$Painter_ROOT/eval/ade20k_semantic/eval.sh` 文件里的 `JOB_NAME``PROMPT` 参数, 执行下面的命令:
chenych's avatar
chenych committed
215
216
217
218
219
220
```bash
bash eval/ade20k_semantic/eval.sh
```

### COCO Panoptic Segmentation

chenych's avatar
chenych committed
221
222
1. 无法使用toy_datasets进行验证;
2. 设置 `$Painter_ROOT/eval/coco_panoptic/eval.sh` 文件里的 `JOB_NAME``PROMPT` 参数, 然后执行下面的命令:
chenych's avatar
chenych committed
223
224
225
226
227
228
```bash
bash eval/coco_panoptic/eval.sh
```

### COCO Human Pose Estimation

chenych's avatar
chenych committed
229
230
1. 无法使用toy_datasets进行验证;
2. 生成验证所需的图像:
chenych's avatar
chenych committed
231
```bash
chenych's avatar
chenych committed
232
233
python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 --use_env eval/mmpose_custom/painter_inference_pose.py --ckpt_path models/painter_vit_large/painter_vit_large.pth
python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 --use_env eval/mmpose_custom/painter_inference_pose.py --ckpt_path models/painter_vit_large/painter_vit_large.pth --flip_test
chenych's avatar
chenych committed
234
235
```

chenych's avatar
chenych committed
236
3. 修改  `$Painter_ROOT/eval/mmpose_custom/configs/coco_256x192_test_offline.py` 文件中的 `job_name``data_root``bbox_file``ckpt_file` 参数, 执行:
chenych's avatar
chenych committed
237
238
239
240
241
242
243
```bash
cd $Painter_ROOT/eval/mmpose_custom
./tools/dist_test.sh configs/coco_256x192_test_offline.py none 1 --eval mAP
```

### Low-Light Image Enhancement

chenych's avatar
chenych committed
244
执行
chenych's avatar
chenych committed
245
246
247
248
```bash
python eval/lol/painter_inference_lol.py --ckpt_path models/path/of/painter_vit_large.pth --data_dir path/of/datasets

Example:
chenych's avatar
chenych committed
249
python eval/lol/painter_inference_lol.py --ckpt_path models/painter_vit_large.pth --data_dir datasets
chenych's avatar
chenych committed
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
```

## result

三张图从左到右分别为: 原始图像, 低分辨率图像, 经模型超分处理后的图像.
<div align=center>
    <img src="./doc/high.png"/>
    <img src="./doc/low.png"/>
    <img src="./doc/results.png"/>
</div>


### 精度

基于项目提供的测试数据, 得到单卡测试结果如下:

chenych's avatar
chenych committed
266
| enhance | datasets | PSNR | SSIM |
chenych's avatar
chenych committed
267
| :------: | :------: | :------: | :------: |
chenych's avatar
chenych committed
268
269
| ours | LOL | 22.34 | 0.8716 |
| paper | LOL | 22.34 | 0.872 |
chenych's avatar
chenych committed
270
271
272
273
274
275
276
277
278
279
280
281
282
283


## 应用场景
### 算法类别
图像超分,图像分割

### 热点应用行业
交通,医疗,政府,制造

## 源码仓库及问题反馈
http://developer.hpccube.com/codes/modelzoo/painter_pytorch.git

## 参考资料
https://github.com/baaivision/Painter/tree/main/Painter