# X-Decoder
## Paper
`Generalized Decoding for Pixel, Image, and Language`
- https://arxiv.org/abs/2212.11270
## Model Architecture
X-Decoder integrates pixel-level, image-level, and language-level tasks into a single semantic space through one unified decoder, enabling efficient processing and joint learning across different vision and vision-language tasks. X-Decoder takes two types of queries as input: generic non-semantic queries and semantic queries driven by text input, which allows the decoder to handle a variety of language-related vision tasks and deliver strong performance across them.
<div align=center>
    <img src="./assets/xdecoder_framework.png"/>
</div>

## Algorithm Overview
The complete model structure is shown below. It consists of an image encoder, a text encoder, and the X-Decoder designed in the paper. X-Decoder accepts two types of input queries: non-semantic queries (such as image features) and semantic queries (such as queries extracted from text). These queries are processed by the decoder to generate the corresponding outputs. With this design, X-Decoder is the first model to support all types of image segmentation and a variety of vision-language (VL) tasks in a unified way. The design enables seamless interaction between tasks of different granularity and brings mutual benefit by learning a shared, rich pixel-level visual-semantic understanding space, without any pseudo-labels. After pretraining on a mixture of a limited amount of segmentation data and millions of image-text pairs, X-Decoder shows strong transferability to a wide range of downstream tasks in both zero-shot and fine-tuned settings.

<div align=center>
    <img src="./assets/Overall pipeline for our model.png"/>
</div>
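
The Python sketch below illustrates the generalized-decoding idea described above: a single transformer decoder consumes latent (non-semantic) queries together with optional text-derived (semantic) queries, cross-attends to the image features, and emits per-query mask embeddings plus semantic embeddings in one shared space. This is only a minimal sketch with hypothetical names and shapes, not the implementation used in this repository (which lives under projects/XDecoder).

```
import torch
import torch.nn as nn

class XDecoderSketch(nn.Module):
    """Minimal sketch: one decoder, two kinds of queries, shared output space."""

    def __init__(self, dim=256, num_latent_queries=100, num_layers=3, num_heads=8):
        super().__init__()
        self.latent_queries = nn.Embedding(num_latent_queries, dim)  # non-semantic queries
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.mask_head = nn.Linear(dim, dim)      # dot product with pixel features -> masks
        self.semantic_head = nn.Linear(dim, dim)  # compared against text embeddings -> recognition

    def forward(self, image_feats, pixel_feats, text_queries=None):
        # image_feats:  (B, N, C) flattened features from the image encoder
        # pixel_feats:  (B, C, H, W) high-resolution pixel embeddings
        # text_queries: (B, T, C) optional semantic queries from the text encoder
        b = image_feats.size(0)
        queries = self.latent_queries.weight.unsqueeze(0).expand(b, -1, -1)
        if text_queries is not None:
            queries = torch.cat([queries, text_queries], dim=1)
        decoded = self.decoder(tgt=queries, memory=image_feats)
        mask_embed = self.mask_head(decoded)                              # (B, Q, C)
        masks = torch.einsum('bqc,bchw->bqhw', mask_embed, pixel_feats)   # (B, Q, H, W)
        semantics = self.semantic_head(decoded)                           # (B, Q, C)
        return masks, semantics

# Example: 100 latent queries plus 5 text queries over a 32x32 feature map.
model = XDecoderSketch()
img = torch.randn(2, 32 * 32, 256)
pix = torch.randn(2, 256, 32, 32)
txt = torch.randn(2, 5, 256)
masks, semantics = model(img, pix, txt)
print(masks.shape, semantics.shape)  # (2, 105, 32, 32), (2, 105, 256)
```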

## Environment Setup

### Docker (Method 1)
This section provides the address and steps for pulling the Docker image from [光源](https://www.sourcefind.cn/#/service-details), as well as the download address for deep-learning libraries in the [光合](https://developer.sourcefind.cn/tool/) developer community.

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10 
docker run -it --shm-size=128G -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name xdecoder_mmcv <your IMAGE ID> bash # replace <your IMAGE ID> with the ID of the image pulled above
cd /path/your_code_data/xdecoder_mmcv
pip install -r requirements/multimodal.txt -i https://mirrors.aliyun.com/pypi/simple/
pip install mmdet -i https://mirrors.aliyun.com/pypi/simple/
git clone https://github.com/cocodataset/panopticapi.git
cd panopticapi
pip install -e .
```

### Dockerfile (Method 2)
This section describes how to build and run the image from the Dockerfile.

```
docker build --no-cache -t xdecoder:latest .
docker run -it --shm-size=128G -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name xdecoder_mmcv  xdecoder  bash
cd /path/your_code_data/xdecoder_mmcv
pip install -r requirements/multimodal.txt -i https://mirrors.aliyun.com/pypi/simple/
pip install mmdet -i https://mirrors.aliyun.com/pypi/simple/
git clone https://github.com/cocodataset/panopticapi.git
cd panopticapi
pip install -e .
```

### Anaconda (Method 3)
This section provides detailed steps for local configuration and compilation, for example:

The DCU-specific deep-learning libraries required by this project can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.
```
DTK driver: dtk24.04.2
python: 3.10
torch: 2.1.0
torchvision: 0.16.0
mmcv: 2.0.1

conda create -n xdecoder python=3.10
conda activate xdecoder
pip install torch-2.1.0+das.opt1.dtk24042-cp310-cp310-manylinux_2_28_x86_64.whl
pip install torchvision-0.16.0+das.opt1.dtk24042-cp310-cp310-manylinux_2_28_x86_64.whl
pip install mmcv-2.0.1+das.opt1.dtk24042-cp310-cp310-manylinux_2_28_x86_64.whl

```
`Tips: the DTK driver, python, torch and other DCU-related tool versions listed above must correspond to each other exactly.`
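
As a quick sanity check that the installed wheels match the versions listed above, the following minimal Python snippet can be run (the expected version strings in the comments come from the list above; on DTK/ROCm builds the DCU devices are exposed through the torch.cuda API):

```
# Minimal environment sanity check; expected versions follow the list above.
import torch
import torchvision
import mmcv

print("torch:", torch.__version__)              # expect 2.1.0+das.opt1.dtk24042
print("torchvision:", torchvision.__version__)  # expect 0.16.0+das.opt1.dtk24042
print("mmcv:", mmcv.__version__)                # expect 2.0.1
print("device available:", torch.cuda.is_available())  # True if a DCU is visible
```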

Install the remaining dependencies as follows:

```
cd /path/your_code_data/xdecoder_mmcv
pip install -r requirements/multimodal.txt -i https://mirrors.aliyun.com/pypi/simple/
pip install mmdet -i https://mirrors.aliyun.com/pypi/simple/
git clone https://github.com/cocodataset/panopticapi.git
cd panopticapi
pip install -e .
```

## Dataset

Prepare the dataset according to the following documentation: [docs](https://github.com/open-mmlab/mmdetection/blob/main/docs/en/user_guides/dataset_prepare.md#coco-caption-dataset-preparation)

After downloading, process the data following the instructions in the docs link above.

The dataset directory layout is as follows:
```
data
├── ADEChallengeData2016
│   ├── ade20k_instance_train.json
│   ├── ade20k_instance_val.json
│   ├── ade20k_panoptic_train
│   │   ├── ADE_train_00000001.png
│   │   ├── ADE_train_00000002.png
│   │   ├── ...
│   ├── ade20k_panoptic_train.json
│   ├── ade20k_panoptic_val
│   │   ├── ADE_val_00000001.png
│   │   ├── ADE_val_00000002.png
│   │   ├── ...
│   ├── ade20k_panoptic_val.json
│   ├── annotations
│   │   ├── training
│   │   │   ├── ADE_train_00000001.png
│   │   │   ├── ADE_train_00000002.png
│   │   │   ├── ...
│   │   ├── validation
│   │   │   ├── ADE_val_00000001.png
│   │   │   ├── ADE_val_00000002.png
│   │   │   ├── ...
│   ├── annotations_instance
│   │   ├── training
│   │   │   ├── ADE_train_00000001.png
│   │   │   ├── ADE_train_00000002.png
│   │   │   ├── ...
│   │   ├── validation
│   │   │   ├── ADE_val_00000001.png
│   │   │   ├── ADE_val_00000002.png
│   │   │   ├── ...
│   ├── categoryMapping.txt
│   ├── images
│   │   ├── training
│   │   │   ├── ADE_train_00000001.jpg
│   │   │   ├── ADE_train_00000002.jpg
│   │   │   ├── ...
│   │   ├── validation
│   │   │   ├── ADE_val_00000001.jpg
│   │   │   ├── ADE_val_00000002.jpg
│   │   │   ├── ...
│   ├── imgCatIds.json
│   ├── objectInfo150.txt
│   │── sceneCategories.txt
├── coco
│   ├── annotations
│   │   ├── panoptic_train2017.json
│   │   ├── panoptic_train2017
│   │   ├── panoptic_val2017.json
│   │   ├── panoptic_val2017
│   │   ├── coco_karpathy_train.json
│   │   ├── coco_karpathy_test.json
│   │   ├── coco_karpathy_val.json
│   │   ├── coco_karpathy_val_gt.json
│   │   ├── coco_karpathy_test_gt.json
│   │   ├── panoptic_semseg_train2017 (generated)
│   │   ├── panoptic_semseg_val2017 (generated)
│   │   ...
│   ├── train2017
│   ├── val2017
│   ├── test2017
│   ├── ...
│   ├── refcoco
│   │   ├── instances.json
│   │   ├── refs(google).p
│   │   └── refs(unc).p
│   ├── refcoco+
│   │   ├── instances.json
│   │   └── refs(unc).p
│   ├── refcocog
│   │   ├── instances.json
│   │   ├── refs(google).p
│   │   └── refs(umd).p
│   │── train2014
│   ├── val2014
    ...
```

## Training



## Inference
First download the following model weight files:
```
cd /path/your_code_data/xdecoder_mmcv
wget https://download.openmmlab.com/mmdetection/v3.0/xdecoder/xdecoder_focalt_last_novg.pt
wget https://download.openmmlab.com/mmdetection/v3.0/xdecoder/xdecoder_focalt_best_openseg.pt
```
### Single Node, Single Card

If you cannot access the external network, first download clip-vit-base-patch32 into the /path/your_code_data/xdecoder_mmcv/openai folder.
Download from Hugging Face: [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32)

Note: to use the local clip-vit-base-patch32 weights, change `tokenizer='openai/clip-vit-base-patch32'` on line 18 of projects/XDecoder/xdecoder/language_model.py to the absolute path of your local clip-vit-base-patch32 folder.
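
For reference, the edit looks roughly like this (the local path below is only an example placeholder; point it at wherever clip-vit-base-patch32 was actually downloaded):

```
# projects/XDecoder/xdecoder/language_model.py, around line 18
# Before (downloads from Hugging Face at runtime):
#     tokenizer='openai/clip-vit-base-patch32',
# After (uses the locally downloaded weights; example path):
#     tokenizer='/path/your_code_data/xdecoder_mmcv/openai/clip-vit-base-patch32',
```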

(1) Open Vocabulary Semantic Segmentation

```
cd projects/XDecoder
python demo.py ../../images/animals.png configs/xdecoder-tiny_zeroshot_open-vocab-semseg_coco.py --weights ../../xdecoder_focalt_last_novg.pt --texts zebra.giraffe
```

(2) Open Vocabulary Instance Segmentation
```
cd projects/XDecoder
python demo.py ../../images/owls.jpeg configs/xdecoder-tiny_zeroshot_open-vocab-instance_coco.py --weights ../../xdecoder_focalt_last_novg.pt --texts owl
```
(3) Open Vocabulary Panoptic Segmentation

```
cd projects/XDecoder
python demo.py ../../images/street.jpg configs/xdecoder-tiny_zeroshot_open-vocab-panoptic_coco.py --weights ../../xdecoder_focalt_last_novg.pt  --text car.person --stuff-text tree.sky
```

(4) Referring Expression Segmentation

```
cd projects/XDecoder
python demo.py ../../images/fruit.jpg configs/xdecoder-tiny_zeroshot_open-vocab-ref-seg_refcocog.py --weights ../../xdecoder_focalt_last_novg.pt  --text "The larger watermelon. The front white flower. White tea pot."
```

(5) Image Caption

```
cd projects/XDecoder
python demo.py ../../images/penguin.jpeg configs/xdecoder-tiny_zeroshot_caption_coco2014.py --weights ../../xdecoder_focalt_last_novg.pt
```

(6) Referring Expression Image Caption

```
cd projects/XDecoder
python demo.py ../../images/fruit.jpg configs/xdecoder-tiny_zeroshot_ref-caption.py --weights ../../xdecoder_focalt_last_novg.pt --text 'White tea pot'
```
(7) Text Image Region Retrieval

```
cd projects/XDecoder
python demo.py ../../images/coco configs/xdecoder-tiny_zeroshot_text-image-retrieval.py --weights ../../xdecoder_focalt_last_novg.pt --text 'pizza on the plate'
```
### Single Node, Multiple Cards
`Note: the PYTHONPATH in the commands below must be changed to match your actual code path.`

(1) Semantic segmentation on ADE20K
```
HIP_VISIBLE_DEVICES=0,1,2,3 PYTHONPATH=/path/your_code_data/xdecoder_mmcv/projects/XDecoder ./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-semseg_ade20k.py xdecoder_focalt_best_openseg.pt 4 --cfg-options model.test_cfg.use_thr_for_mc=False
```
(2) Instance segmentation on ADE20K
```
HIP_VISIBLE_DEVICES=0,1,2,3 PYTHONPATH=/path/your_code_data/xdecoder_mmcv/projects/XDecoder ./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-instance_ade20k.py xdecoder_focalt_best_openseg.pt 4
```
(3) Panoptic segmentation on ADE20K
```
HIP_VISIBLE_DEVICES=0,1,2,3 PYTHONPATH=/path/your_code_data/xdecoder_mmcv/projects/XDecoder ./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-panoptic_ade20k.py xdecoder_focalt_best_openseg.pt 4
```
(4) Semantic segmentation on COCO2017
```
HIP_VISIBLE_DEVICES=0,1,2,3 PYTHONPATH=/path/your_code_data/xdecoder_mmcv/projects/XDecoder ./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-semseg_coco.py xdecoder_focalt_last_novg.pt 4 --cfg-options model.test_cfg.use_thr_for_mc=False
```
(5) Instance segmentation on COCO2017
```
HIP_VISIBLE_DEVICES=0,1,2,3 PYTHONPATH=/path/your_code_data/xdecoder_mmcv/projects/XDecoder ./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-instance_coco.py xdecoder_focalt_last_novg.pt 4
```
(6) Panoptic segmentation on COCO2017
```
HIP_VISIBLE_DEVICES=0,1,2,3 PYTHONPATH=/path/your_code_data/xdecoder_mmcv/projects/XDecoder ./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-panoptic_coco.py xdecoder_focalt_last_novg.pt 4
```
(7) Referring segmentation on RefCOCO
```
HIP_VISIBLE_DEVICES=0,1,2,3 PYTHONPATH=/path/your_code_data/xdecoder_mmcv/projects/XDecoder ./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-ref-seg_refcocog.py xdecoder_focalt_last_novg.pt 4 --cfg-options test_dataloader.dataset.split='val'
```
(8) Image Caption on COCO2014

JDK 1.8 must be installed before testing; otherwise the test will report that java does not exist.

It can be installed with the following steps (switching the apt repositories first if necessary):

```
apt update
apt install -y openjdk-8-jdk
```

```
HIP_VISIBLE_DEVICES=0,1,2,3 PYTHONPATH=/path/your_code_data/xdecoder_mmcv/projects/XDecoder ./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_caption_coco2014.py xdecoder_focalt_last_novg.pt 4
```
## Results
(1) Open Vocabulary Semantic Segmentation

<div align=center>
    <img src="./assets/animals.png"/>
</div>

(2) Open Vocabulary Instance Segmentation

<div align=center>
    <img src="./assets/owls.jpeg"/>
</div>

(3) Open Vocabulary Panoptic Segmentation

<div align=center>
    <img src="./assets/street.jpg"/>
</div>

(4) Referring Expression Segmentation

<div align=center>
    <img src="./assets/fruit1.jpg"/>
</div>

(5) Image Caption

<div align=center>
    <img src="./assets/penguin.jpeg"/>
</div>

(6) Referring Expression Image Caption

<div align=center>
    <img src="./assets/fruit.jpg"/>
</div>

(7) Text Image Region Retrieval

<div align=center>
    <img src="./assets/000.jpg"/>
</div>


### Accuracy
Inference was run on four DCU K100 AI cards.

(1) Semantic segmentation on ADE20K


|    Model    |   mIoU   | mIoU (official) | Config     | 
|:------------:|:-------------------------:|------|------------|
|  xdecoder_focalt_best_openseg.pt    |        25.24         | 25.13 | [config](projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-semseg_ade20k.py) | 

(2) Instance segmentation on ADE20K


|    Model    |   mIoU   | mIoU (official) | Config     | 
|:------------:|:-------------------------:|------|------------|
|  xdecoder_focalt_best_openseg.pt    |        10.1         | 10.1 | [config](projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-instance_ade20k.py) | 

(3) Panoptic segmentation on ADE20K

|    Model    | mIoU  | mIoU (official) | Config     | 
|:------------:|:-----:|------|------------|
| xdecoder_focalt_best_openseg.pt    | 19.12 | 18.97 | [config](projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-panoptic_ade20k.py) | 

(4) Semantic segmentation on COCO2017

|    Model    | mIoU  | mIoU (official) | Config     | 
|:------------:|:-----:|----------------|------------|
| xdecoder_focalt_last_novg.pt    | 62.10 | 62.10          | [config](projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-semseg_coco.py) | 

(5) Instance segmentation on COCO2017

|    Model    | mIoU | mIoU (official) | Config     | 
|:------------:|:----:|------|------------|
| xdecoder_focalt_last_novg.pt    | 39.8 | 39.7 | [config](projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-instance_coco.py) | 

(6) Panoptic segmentation on COCO2017

|    Model    | mIoU  | mIoU (official) | Config     | 
|:------------:|:-----:|------|------------|
| xdecoder_focalt_last_novg.pt    | 51.42 | 51.16 | [config](projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-panoptic_coco.py) | 

(7) Referring segmentation on RefCOCO


|    Model    | text mode  | cIoU | cIoU (official)     | Config  | 
|:------------:|:-----:|------|------------|---|
|xdecoder_focalt_last_novg.pt    | select first | 58.8514 | 57.85 |  [config](projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-ref-seg_refcocog.py) | 

(8) Image Caption on COCO2014


|    Model    | BLEU-4 | CIDEr | Config     | 
|:------------:|:-----:|------|------------|
| xdecoder_focalt_last_novg.pt   | 35.26 | 116.81 | [config](projects/XDecoder/configs/xdecoder-tiny_zeroshot_caption_coco2014.py) | 


## Application Scenarios
### Algorithm Category
`Image Segmentation`
### Key Application Industries
`Research, Manufacturing, Healthcare, Smart Home, Education`
## Source Repository and Issue Reporting
- https://developer.sourcefind.cn/codes/modelzoo/xdecoder_mmcv
## References
- https://github.com/open-mmlab/mmdetection/tree/main/projects/XDecoder