# Painter
## Paper
`Images Speak in Images: A Generalist Painter for In-Context Visual Learning`
- https://arxiv.org/abs/2212.02499

## Model Architecture
The model is built on a ViT-family backbone with two parts, an encoder and a decoder: the encoder is a stack of ViT blocks, while the decoder consists of convolutional layers.
<div align=center>
    <img src="./doc/method.png"/>
</div>

## Algorithm
Painter is a generalist vision model built around a "vision-centric" idea: both inputs and outputs are images, so the model can pick up visual context in-context and carry out different vision tasks. This contrasts with approaches that discretize the continuous output space of vision tasks and use language or specially designed discrete tokens as task prompts, effectively turning vision problems into NLP problems.
<div align=center>
    <img src="./doc/progress.png"/>
</div>
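The in-context setup above can be sketched in a few lines: a task is specified by stitching a prompt input/target pair together with the query image into one canvas, and the model paints the missing region. This is a minimal illustration only; the exact layout, resolutions, and masking follow the paper's implementation, not this sketch.

```python
import numpy as np

# Minimal sketch of in-context task formatting (layout illustrative):
# a prompt (input, target) example is stitched with the query image into
# a single 2x2 canvas; the model must paint the masked quadrant.
H, W, C = 224, 224, 3
prompt_input = np.random.rand(H, W, C)   # task example: input image
prompt_target = np.random.rand(H, W, C)  # task example: desired output
query_input = np.random.rand(H, W, C)    # image we want a prediction for
masked = np.zeros((H, W, C))             # region the model fills in

top = np.concatenate([prompt_input, prompt_target], axis=1)
bottom = np.concatenate([query_input, masked], axis=1)
canvas = np.concatenate([top, bottom], axis=0)
print(canvas.shape)  # (448, 448, 3)
```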

## Environment Setup
Tip: timm==0.3.2 hits the issue [cannot import name 'container_abcs' from 'torch._six'](https://github.com/huggingface/pytorch-image-models/issues/420#issuecomment-776459842); fix it by replacing `from torch._six import container_abcs` in `timm/models/layers/helpers.py` with:

```python
import torch
TORCH_MAJOR = int(torch.__version__.split('.')[0])
TORCH_MINOR = int(torch.__version__.split('.')[1])

if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
    import collections.abc as container_abcs
```

### Docker (Option 1)
Adjust the `-v` mount path, `docker_name`, and `imageID` to match your environment.

```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/painter_pytorch
pip install -r requirements.txt
# Install detectron2
git clone https://github.com/facebookresearch/detectron2
python -m pip install -e detectron2
```

### Dockerfile (Option 2)
Adjust the `-v` mount path, `docker_name`, and `imageID` to match your environment.

```bash
cd ./docker

docker build --no-cache -t painter:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/painter_pytorch
pip install -r requirements.txt
# Install detectron2
git clone https://github.com/facebookresearch/detectron2
python -m pip install -e detectron2
```

### Anaconda (Option 3)

1. The DCU-specific deep-learning packages required by this project can be downloaded from the 光合开发者社区 (developer community): https://developer.hpccube.com/tool/

```
DTK stack: dtk24.04
python: 3.10
torch: 2.1.0
torchvision: 0.16.0
```

Tip: the DTK stack, Python, torch, and other DCU-related versions above must match each other exactly.
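As a quick sanity check for this pairing, something like the following compares an installed version string against the expected one. The `+das.dtk2404` local tag is a hypothetical example of what a DCU build might report; substitute your actual `torch.__version__`.

```python
def matches(installed: str, expected: str) -> bool:
    """Compare versions, ignoring any local build tag after '+'."""
    return installed.split("+")[0] == expected

# Hypothetical DCU build strings; expected values come from the list above.
print(matches("2.1.0+das.dtk2404", "2.1.0"))    # True
print(matches("0.16.0+das.dtk2404", "0.16.0"))  # True
print(matches("2.0.1", "2.1.0"))                # False
```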

2. Install the remaining (non-DCU-specific) packages from requirements.txt:

```bash
pip install -r requirements.txt
# Install detectron2
git clone https://github.com/facebookresearch/detectron2
python -m pip install -e detectron2
```

## Datasets
This project uses many datasets. The provided [toy_datasets](http://113.200.138.88:18080/aimodels/baai/Painter/-/blob/main/toy_datasets.tar) archive, with 10 samples per category, can be used to verify part of the functionality: place it at `$Painter_ROOT/toy_datasets` and set `DATA_PATH=toy_datasets` in `$Painter_ROOT/train_painter_vit_large.sh`; see the Training section for the other parameters.

For the full datasets, see the [data instructions](./docs/DATA.md). The directory layout of the full datasets is:

```
├── nyu_depth_v2/
│   ├── sync/
│   ├── official_splits/
│   ├── nyu_depth_v2_labeled.mat
│   ├── nyuv2_sync_image_depth.json  # generated
│   ├── nyuv2_test_image_depth.json  # generated
├── ade20k/
│   ├── images/
│   ├── annotations/
│   ├── annotations_detectron2/  # generated
│   ├── annotations_with_color/  # generated
│   ├── ade20k_training_image_semantic.json  # generated
│   ├── ade20k_validation_image_semantic.json  # generated
├── ADEChallengeData2016/  # symlink to $Painter_ROOT/datasets/ade20k
├── coco/
│   ├── train2017/
│   ├── val2017/
│   ├── annotations/
│       ├── instances_train2017.json
│       ├── instances_val2017.json
│       ├── person_keypoints_val2017.json
│       ├── panoptic_train2017.json
│       ├── panoptic_val2017.json
│       ├── panoptic_train2017/
│       ├── panoptic_val2017/
│   ├── panoptic_semseg_val2017/  # generated
│   ├── panoptic_val2017/  # symlink to $Painter_ROOT/datasets/coco/annotations/panoptic_val2017
│   ├── pano_sem_seg/  # generated
│       ├── panoptic_segm_train2017_with_color
│       ├── panoptic_segm_val2017_with_color
│       ├── coco_train2017_image_panoptic_sem_seg.json
│       ├── coco_val2017_image_panoptic_sem_seg.json
│   ├── pano_ca_inst/  # generated
│       ├── train_aug0/
│       ├── train_aug1/
│       ├── ...
│       ├── train_aug29/
│       ├── train_org/
│       ├── train_flip/
│       ├── val_org/
│       ├── coco_train_image_panoptic_inst.json
│       ├── coco_val_image_panoptic_inst.json
├── coco_pose/
│   ├── person_detection_results/
│       ├── COCO_val2017_detections_AP_H_56_person.json
│   ├── data_pair/  # generated
│       ├── train_256x192_aug0/
│       ├── train_256x192_aug1/
│       ├── ...
│       ├── train_256x192_aug19/
│       ├── val_256x192/
│       ├── test_256x192/
│       ├── test_256x192_flip/
│   ├── coco_pose_256x192_train.json  # generated
│   ├── coco_pose_256x192_val.json  # generated
├── derain/
│   ├── train/
│       ├── input/
│       ├── target/
│   ├── test/
│       ├── Rain100H/
│       ├── Rain100L/
│       ├── Test100/
│       ├── Test1200/
│       ├── Test2800/
│   ├── derain_train.json
│   ├── derain_test_rain100h.json
├── denoise/
│   ├── SIDD_Medium_Srgb/
│   ├── train/
│   ├── val/
│   ├── denoise_ssid_train.json  # generated
│   ├── denoise_ssid_val.json  # generated
├── light_enhance/
│   ├── our485/
│       ├── low/
│       ├── high/
│   ├── eval15/
│       ├── low/
│       ├── high/
│   ├── enhance_lol_train.json  # generated
│   ├── enhance_lol_val.json  # generated
```

## Training
Download the pretrained [MAE ViT-Large model](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth) and update the `finetune` path in `$Painter_ROOT/train.sh` and `$Painter_ROOT/single_process.sh`.

### Single-node multi-GPU
By default this project trains on a single node with 4 GPUs (total_bsz = 1 x 4 x 32 = 128). To use a different number of GPUs, adjust the corresponding parameters in `train.sh`.

```bash
bash train.sh
```
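The effective batch size is nodes × GPUs per node × per-GPU batch size, so changing the GPU count changes total_bsz unless the per-GPU batch size (and typically the learning rate) is adjusted to compensate. A trivial check of the two configurations mentioned in this README:

```python
def total_bsz(nodes: int, gpus_per_node: int, per_gpu_bsz: int) -> int:
    """Effective global batch size for data-parallel training."""
    return nodes * gpus_per_node * per_gpu_bsz

print(total_bsz(1, 4, 32))  # 128  (this project's default)
print(total_bsz(8, 8, 32))  # 2048 (the original authors' setup)
```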

### Multi-node multi-GPU
Tip: the original authors trained on 8 nodes with 8 GPUs each (total_bsz = 8 x 8 x 32 = 2048).

For multi-node training, list the participating nodes in a hostfile, one node per line, e.g. `c1xxxxxx slots=8`, where 8 is the number of GPUs on that node.
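For example, a hostfile for two 8-GPU nodes might look like this (the hostnames are placeholders):

```
node01 slots=8
node02 slots=8
```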

```bash
bash run_train_multi.sh
```

## Inference
1. Download the inference model [painter_vit_large.pth](http://113.200.138.88:18080/aimodels/baai/Painter), or prepare your own checkpoint;

2. Some evaluations cannot be run with toy_datasets. When running inference on toy_datasets, check that the default prompt image exists; if it does not, change the corresponding `PROMPT` parameter;

3. Inference for each dataset works as follows:

### NYU Depth V2
Set the `JOB_NAME`, `PROMPT`, `CKPT_FILE`, and `DATA_DIR` parameters in [eval/nyuv2_depth/eval.sh](./eval/nyuv2_depth/eval.sh), then run:

```bash
bash eval/nyuv2_depth/eval.sh
```

### ADE20k Semantic Segmentation
1. **This dataset cannot be evaluated with toy_datasets**;

2. Set the `JOB_NAME` and `PROMPT` parameters in [eval/ade20k_semantic/eval.sh](./eval/ade20k_semantic/eval.sh), then run:

```bash
bash eval/ade20k_semantic/eval.sh
```

### COCO Panoptic Segmentation
1. **This dataset cannot be evaluated with toy_datasets**;

2. Set the `JOB_NAME` and `PROMPT` parameters in [eval/coco_panoptic/eval.sh](./eval/coco_panoptic/eval.sh), then run:
```bash
bash eval/coco_panoptic/eval.sh
```

### COCO Human Pose Estimation
1. **This dataset cannot be evaluated with toy_datasets**;

2. Generate the images required for evaluation:

```bash
python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 --use_env eval/mmpose_custom/painter_inference_pose.py --ckpt_path models/painter_vit_large/painter_vit_large.pth
python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 --use_env eval/mmpose_custom/painter_inference_pose.py --ckpt_path models/painter_vit_large/painter_vit_large.pth --flip_test
```

3. Update the `job_name`, `data_root`, `bbox_file`, and `ckpt_file` parameters in `$Painter_ROOT/eval/mmpose_custom/configs/coco_256x192_test_offline.py`, then run:

```bash
cd $Painter_ROOT/eval/mmpose_custom
./tools/dist_test.sh configs/coco_256x192_test_offline.py none 1 --eval mAP
```

### Low-Light Image Enhancement
```bash
python eval/lol/painter_inference_lol.py --ckpt_path models/path/of/painter_vit_large.pth --data_dir path/of/datasets
```
Example:
```bash
python eval/lol/painter_inference_lol.py --ckpt_path models/painter_vit_large.pth --data_dir datasets
```

## Results
The three images below show, from left to right: the original image, the low-resolution image, and the model's super-resolved output.
<div align=center>
    <img src="./doc/high.png"/>
    <img src="./doc/low.png"/>
    <img src="./doc/results.png"/>
</div>

### Accuracy
Single-GPU results on the test data provided with this project:

| Device | Task | Dataset | PSNR | SSIM |
| :------: | :------: | :------: | :------: | :------: |
| Z100L | enhance | LOL | 22.34 | 0.8716 |
| paper | enhance | LOL | 22.34 | 0.872 |
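PSNR in the table is the standard peak signal-to-noise ratio over the 8-bit range, and SSIM is structural similarity. As a reference for how the PSNR numbers are defined, here is a minimal sketch (not the project's evaluation code):

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val**2 / mse)

# A uniform error of one gray level over the 8-bit range gives ~48.13 dB.
a = np.zeros((4, 4), dtype=np.uint8)
b = np.ones((4, 4), dtype=np.uint8)
print(round(psnr(a, b), 2))  # 48.13
```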

## Application Scenarios
### Algorithm Categories
Image super-resolution, image segmentation

### Key Industries
Transportation, healthcare, government, manufacturing

## Source Repository & Issue Feedback
http://developer.hpccube.com/codes/modelzoo/painter_pytorch.git

## References
https://github.com/baaivision/Painter/tree/main/Painter