# IDM-VTON

## Paper

`Improving Diffusion Models for Virtual Try-on`

* https://arxiv.org/abs/2403.05139

## Model Architecture

The model is built on `SDXL`. It uses `IP-Adapter` and `GarmentNet` (`Unet`) to extract garment features and feed them into the main network.

![alt text](readme_imgs/model_structure.png)

## Algorithm

Low-level image features are fused through `self-attention`, and high-level semantic features are fused through `cross attention`.

![alt text](readme_imgs/alg.png)
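
The two fusion steps can be illustrated with a small, dependency-free sketch. It assumes that garment tokens are concatenated with the main network's tokens before self-attention and that only the main network's half of the output is kept; this README does not spell that out. The helper name `fuse_garment`, the plain-list token format, and the absence of learned projections, residual connections, and multiple heads are simplifications made for illustration, not the project's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; every token is a list of floats."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def fuse_garment(person_tokens, garment_tokens, semantic_tokens):
    """Hypothetical helper: fuse garment information into person tokens."""
    # Low-level fusion: self-attention over person and garment tokens together.
    joint = person_tokens + garment_tokens
    mixed = attention(joint, joint, joint)
    # Keep only the outputs that belong to the person tokens.
    person_mixed = mixed[:len(person_tokens)]
    # High-level fusion: person tokens attend to the semantic tokens.
    return attention(person_mixed, semantic_tokens, semantic_tokens)
```

The output has one token per person token, so the fused result keeps the spatial layout of the main network regardless of how many garment or semantic tokens are supplied.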


## Environment Setup

### Docker (Option 1)
    
    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310

    docker run --shm-size 10g --network=host --name=idmvton --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute/path/to/project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install bitsandbytes-0.42.0-py3-none-any.whl  # the file is in the whl folder

    cd BasicSR && python setup.py develop


### Dockerfile (Option 2)

    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 10g --network=host --name=idmvton --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute/path/to/project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install bitsandbytes-0.42.0-py3-none-any.whl  # the file is in the whl folder

    cd BasicSR && python setup.py develop


### Anaconda (Option 3)

Step 1. The DCU-specific deep learning libraries this project needs can be downloaded from the Guanghe (光合) developer community:
https://developer.sourcefind.cn/tool/

    DTK driver: dtk24.04
    python: python3.10
    torch: 2.1.0
    torchvision: 0.16.0
    onnx: 1.15.0
    flash-attn: 2.0.4

Tip: the versions of the DTK driver, python, torch, and the other DCU-related tools listed above must match each other exactly.

Step 2. Install the remaining, non-DCU-specific libraries from requirements.txt:

    pip install -r requirements.txt

    pip install bitsandbytes-0.42.0-py3-none-any.whl  # the file is in the whl folder

    cd BasicSR && python setup.py develop
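
Because the pinned versions in the Anaconda option have to match exactly, a quick check after installation can catch mismatches early. The sketch below compares installed packages against the versions listed in this section. The distribution names in `EXPECTED` (in particular `flash-attn`) and the assumption that DCU builds may carry a local suffix after `+` are guesses, not something this README states, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list in this section; distribution names are assumptions.
EXPECTED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "onnx": "1.15.0",
    "flash-attn": "2.0.4",
}


def check_versions(expected, lookup=metadata.version):
    """Return {package: problem} for each missing or mismatched package."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Ignore a local build suffix such as "+abc123" when comparing.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in check_versions(EXPECTED).items():
        print(f"{name}: {problem}")
```

Run as a script, it prints one line per problem and nothing when every version matches. The DTK driver itself is not a Python package, so it is outside the scope of this check.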

## Datasets

|Name|Link|
|:---|:---|
|VITON-HD| [github](https://github.com/shadow2496/VITON-HD) <br> [SCNet]|
|Dress Code|[github](https://github.com/aimagelab/dress-code) <br> [SCNet]|

Besides the original data available for download, this project ships a small dataset for testing, stored in `datasets`.

VITON-HD

    train
    |-- ...

    test
    |-- image
    |-- image-densepose
    |-- agnostic-mask
    |-- cloth
    |-- vitonhd_test_tagged.json

DressCode

    |-- dresses
        |-- images
        |-- image-densepose
        |-- dc_caption.txt
        |-- ...
    |-- lower_body
        |-- images
        |-- image-densepose
        |-- dc_caption.txt
        |-- ...
    |-- upper_body
        |-- images
        |-- image-densepose
        |-- dc_caption.txt
        |-- ...

Note: the `image-densepose` images are generated with [detectron2](https://github.com/facebookresearch/detectron2); see https://github.com/sangyun884/HR-VITON/issues/45 for details. They can also be downloaded directly from the [link provided by the original authors](https://kaistackr-my.sharepoint.com/:u:/g/personal/cpis7_kaist_ac_kr/EaIPRG-aiRRIopz9i002FOwBDa-0-BHUKVZ7Ia5yAVVG3A?e=YxkAip).
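
Before running inference it can help to confirm that a dataset directory matches the layouts in this section. The following is a minimal sketch that only checks the entries named explicitly in those layouts (the `...` entries are not checked). The helper names `missing_viton_hd` and `missing_dress_code` are hypothetical and not part of the project.

```python
from pathlib import Path

# Entry names copied from the layouts in this section.
VITON_HD_TEST = ["image", "image-densepose", "agnostic-mask", "cloth",
                 "vitonhd_test_tagged.json"]
DRESS_CODE_CATEGORIES = ["dresses", "lower_body", "upper_body"]
DRESS_CODE_ENTRIES = ["images", "image-densepose", "dc_caption.txt"]


def missing_viton_hd(root):
    """Return the expected VITON-HD test entries that are absent under root."""
    root = Path(root)
    expected = [Path("test") / name for name in VITON_HD_TEST]
    return [p.as_posix() for p in expected if not (root / p).exists()]


def missing_dress_code(root):
    """Return the expected DressCode entries that are absent under root."""
    root = Path(root)
    expected = [Path(category) / name
                for category in DRESS_CODE_CATEGORIES
                for name in DRESS_CODE_ENTRIES]
    return [p.as_posix() for p in expected if not (root / p).exists()]
```

Both functions return paths relative to the dataset root, so an empty list means every listed entry was found.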

## Training

## Inference

### Command Line

#### VITON-HD

    accelerate launch inference.py \
    --width 768 --height 1024 --num_inference_steps 30 \
    --pretrained_model_name_or_path <path/to/pretrained_models> \
    --output_dir "result" \
    --unpaired \
    --data_dir <path/to/datasets/viton-hd> \
    --seed 42 \
    --test_batch_size 1 \
    --guidance_scale 2.0

#### DressCode

    accelerate launch inference_dc.py \
    --width 768 --height 1024 --num_inference_steps 30 \
    --pretrained_model_name_or_path <path/to/pretrained_models> \
    --unet_path <path/to/pretrained_models/dc> \
    --output_dir "result" \
    --unpaired \
    --data_dir <path/to/datasets/dress-code> \
    --seed 42 \
    --test_batch_size 1 \
    --guidance_scale 2.0 \
    --category "upper_body"

Note: the commands above run multi-card inference by default. To restrict the devices used, run `export HIP_VISIBLE_DEVICES=<device IDs>` first.
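
When inference is driven from a script rather than a shell, the same device restriction can be applied per process. The sketch below assembles the VITON-HD command from this section as an argument list and sets `HIP_VISIBLE_DEVICES` only in the child environment. The helper names are hypothetical, and the comma-separated device ID format is an assumption not stated in this README.

```python
import os
import subprocess


def build_viton_hd_command(pretrained_dir, data_dir, output_dir="result"):
    """Assemble the VITON-HD inference command as an argument list."""
    return [
        "accelerate", "launch", "inference.py",
        "--width", "768", "--height", "1024", "--num_inference_steps", "30",
        "--pretrained_model_name_or_path", str(pretrained_dir),
        "--output_dir", str(output_dir),
        "--unpaired",
        "--data_dir", str(data_dir),
        "--seed", "42",
        "--test_batch_size", "1",
        "--guidance_scale", "2.0",
    ]


def restricted_env(device_ids, base_env=None):
    """Copy the environment and expose only the given devices."""
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


def run_inference(pretrained_dir, data_dir, device_ids):
    """Launch inference on the selected devices; raises if the command fails."""
    command = build_viton_hd_command(pretrained_dir, data_dir)
    return subprocess.run(command, env=restricted_env(device_ids), check=True)
```

For example, `run_inference("pretrained_models", "datasets/viton-hd", [0])` would launch the command on a single device, assuming both paths exist and the call is made from the repository root.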

### webui

    python gradio_demo/app.py

Note: before launching, set `base_path='path/to/pretrained_models'` in `gradio_demo/app.py`.

## Results

|model|image|cloth|prompt|result|
|:---|:---:|:---:|:---:|:---:|
|dc|![alt txt](readme_imgs/020716_0.jpg)|![alt txt](readme_imgs/020717_1.jpg)|Short Sleeves Round Neck Knit Dress|![alt txt](readme_imgs/020716_0_r.jpg)|
||![alt txt](readme_imgs/00006_00.jpg)|![alt txt](readme_imgs/00013_00.jpg)|a photo of Short Sleeve Round Neck T-shirts|![alt txt](readme_imgs/00006_00_r.jpg)|


### Accuracy



## Application Scenarios

### Algorithm Category

`AIGC`

### Key Application Industries

`Retail, advertising and media, e-commerce`

## Pretrained Weights

|Name|Link|
|:---|:---|
|IDM-VTON|[huggingface](https://hf-mirror.com/yisol/IDM-VTON) <br> [SCNet]|
|DC|[huggingface](https://hf-mirror.com/yisol/IDM-VTON-DC) <br> [SCNet]|

    ckpt/
    ├── densepose
    │   └── model_final_162be9.pkl
    ├── humanparsing
    │   ├── parsing_atr.onnx
    │   └── parsing_lip.onnx
    └── openpose
        └── ckpts
            └── body_pose_model.pth
    

    pretrained_models/
    ├── dc
    │   ├── config.json
    │   └── diffusion_pytorch_model.bin
    ├── image_encoder
    │   ├── config.json
    │   └── model.safetensors
    ├── model_index.json
    ├── scheduler
    │   └── scheduler_config.json
    ├── text_encoder
    │   ├── config.json
    │   └── model.safetensors
    ├── text_encoder_2
    │   ├── config.json
    │   └── model.safetensors
    ├── tokenizer
    │   ├── merges.txt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── tokenizer_2
    │   ├── merges.txt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── unet
    │   ├── config.json
    │   └── diffusion_pytorch_model.bin
    ├── unet_encoder
    │   ├── config.json
    │   └── diffusion_pytorch_model.safetensors
    └── vae
        ├── config.json
        └── diffusion_pytorch_model.safetensors

## Source Repository and Issue Reporting

* https://developer.sourcefind.cn/codes/modelzoo/idm-vton_pytorch

## References

* https://github.com/yisol/IDM-VTON