# PixArt-alpha

## Paper

`PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis`

* https://arxiv.org/abs/2310.00426

## Model Architecture

The model is based on the `DiT (Diffusion Transformer)` architecture, with `Multi-Head Cross-Attention` added to align text with images.

![alt text](readme_imgs/image-1.png)


## Algorithm

The model mainly involves `Multi-Head Self-Attention` and `Multi-Head Cross-Attention`: `Multi-Head Self-Attention` models the image, while `Multi-Head Cross-Attention` aligns the image with the text.

![alt text](readme_imgs/image-2.png)
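The split of roles above can be sketched with a minimal single-head attention in NumPy (token counts and dimensions are illustrative assumptions, not values from the PixArt codebase): self-attention uses the image tokens as query, key, and value, while cross-attention queries the image tokens against the text tokens.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) @ v."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # (n_q, n_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ v                                  # (n_q, d_v)

rng = np.random.default_rng(0)
img_tokens = rng.normal(size=(16, 8))   # 16 image patch tokens, dim 8 (toy sizes)
txt_tokens = rng.normal(size=(5, 8))    # 5 text tokens, dim 8 (toy sizes)

# Self-attention: image tokens attend to other image tokens.
self_out = attention(img_tokens, img_tokens, img_tokens)

# Cross-attention: image tokens (queries) attend to text tokens (keys/values).
cross_out = attention(img_tokens, txt_tokens, txt_tokens)

print(self_out.shape, cross_out.shape)  # (16, 8) (16, 8)
```

In the real model each block applies multi-head versions of both, with the cross-attention keys and values coming from the T5 text-encoder output.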

## Environment Setup

### Docker (Option 1)

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk23.10.1-py38

    docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute/path/to/project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    # From whl.zip
    pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl

    # Download from the developer community
    pip install triton-2.1.0%2Bgit34f8189.abi0.dtk2310-cp38-cp38-manylinux2014_x86_64.whl

    # From whl.zip
    cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh

    pip install -r requirements.txt

    pip install timm --no-deps

    pip uninstall apex

    # Install diffusers
    # Option A: manual install
    git clone https://github.com/huggingface/diffusers.git

    cd diffusers && python setup.py install

    # Option B: install directly from GitHub
    pip install git+https://github.com/huggingface/diffusers

### Dockerfile (Option 2)

    # Run from the directory containing the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute/path/to/project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    # From whl.zip
    pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl

    # Download from the developer community
    pip install triton-2.1.0%2Bgit34f8189.abi0.dtk2310-cp38-cp38-manylinux2014_x86_64.whl

    # From whl.zip
    cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh

    pip install -r requirements.txt

    pip install timm --no-deps

    pip uninstall apex

    # Install diffusers
    # Option A: manual install
    git clone https://github.com/huggingface/diffusers.git

    cd diffusers && python setup.py install

    # Option B: install directly from GitHub
    pip install git+https://github.com/huggingface/diffusers


### Anaconda (Option 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded from the developer community:
https://developer.hpccube.com/tool/

    DTK driver: dtk23.10.1
    python: python3.8
    torch: 2.1.0
    torchvision: 0.16.0
    triton: 2.1.0

Tips: the versions of the DTK driver, Python, torch, and the other DCU-related tools above must match each other exactly.

2. Install the remaining, non-DCU-specific libraries per requirements.txt:

    # From whl.zip
    pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl

    # From whl.zip
    cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh

    pip install -r requirements.txt

    pip install timm --no-deps

    # Install diffusers
    # Option A: manual install
    git clone https://github.com/huggingface/diffusers.git

    cd diffusers && python setup.py install

    # Option B: install directly from GitHub
    pip install git+https://github.com/huggingface/diffusers

## Datasets

Note: this is the training dataset.

Full dataset: https://ai.meta.com/datasets/segment-anything/

Toy test dataset: https://huggingface.co/datasets/PixArt-alpha/data_toy


After downloading, the data must be preprocessed by running the following scripts:

    # Use LLaVA to generate more detailed image captions
    python tools/VLM_caption_lightning.py --output output/dir/ --data-root data/root/path --index path/to/data.json

    # Precompute the features needed for training
    python tools/extract_features.py --img_size=256 \
    --json_path "data/data_toy/data_info.json" \
    --t5_save_root "data/data_toy/caption_feature_wmask" \
    --vae_save_root "data/data_toy/img_vae_features" \
    --pretrained_models_dir "pretrained_models/hub/pixart_alpha" \
    --dataset_root "data/data_toy/images/"

After processing, the directory structure looks like this:

    data/
    └── data_toy
        ├── caption_feature_wmask
        │   ├── 0_1.npz
        │   └── 0_3.npz
        ├── captions
        │   ├── 0_1.txt
        │   └── 0_3.txt
        ├── data_info.json
        ├── images
        │   ├── 0_1.png
        │   └── 0_3.png
        ├── img_vae_features
        │   └── 256resolution
        │       └── noflip
        │           ├── 0_1.npy
        │           └── 0_3.npy
        └── partition
            └── part0.txt
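To sanity-check the extracted features, they can be read back with NumPy. This is a hedged sketch: the array names (`caption_feature`, `attention_mask`) and shapes below are illustrative assumptions rather than values verified against the PixArt tools, so the snippet lists what an archive actually contains before using any key.

```python
import numpy as np

# Write stand-in files so the sketch runs anywhere; in the real layout these
# would be caption_feature_wmask/0_1.npz and img_vae_features/256resolution/noflip/0_1.npy.
# Array names and shapes are assumptions (T5-XXL hidden size 4096; 4x32x32 latent at 256 px).
np.savez("0_1.npz",
         caption_feature=np.zeros((120, 4096), dtype=np.float32),
         attention_mask=np.ones((1, 120), dtype=np.int64))
np.save("0_1.npy", np.zeros((4, 32, 32), dtype=np.float32))

# Inspect what a caption-feature archive contains before relying on key names.
with np.load("0_1.npz") as caption_file:
    arrays = {name: caption_file[name] for name in caption_file.files}
print(sorted(arrays))                    # names of the stored arrays
print(arrays["caption_feature"].shape)   # per-token T5 embeddings

# VAE latents are plain .npy arrays.
latent = np.load("0_1.npy")
print(latent.shape)
```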

## Training


## Inference

### Model Downloads

|Model (URL)|Save Location|
|:---:|:---:|
|[T5](https://hf-mirror.com/PixArt-alpha/PixArt-alpha/tree/main/t5-v1_1-xxl)|/path/to/save/models/pixart_alpha/t5_ckpts|
|[sd-vae-ft-ema](https://hf-mirror.com/PixArt-alpha/PixArt-alpha/tree/main/sd-vae-ft-ema)|/path/to/save/models/pixart_alpha/sd-vae-ft-ema|

    pixart_alpha/
    ├── sd-vae-ft-ema
    │   ├── config.json
    │   └── diffusion_pytorch_model.bin
    └── t5_ckpts
        └── t5-v1_1-xxl
            ├── config.json
            ├── pytorch_model-00001-of-00002.bin
            ├── pytorch_model-00002-of-00002.bin
            ├── pytorch_model.bin.index.json
            ├── special_tokens_map.json
            ├── spiece.model
            └── tokenizer_config.json

Note: the models above must be downloaded manually; the remaining models are downloaded automatically at runtime.

    export HF_ENDPOINT=https://hf-mirror.com
    export HUB_HOME=/path/to/save/models

### Commands

    # Quick test
    HIP_VISIBLE_DEVICES=0 python quick_inference_with_code.py <prompt>

### WebUI

    # diffusers version
    DEMO_PORT=12345 python app/app.py

## Results

||prompt|output|
|:---|:---:|:---:|
||a dog is playing a basketball|![alt text](readme_imgs/image-3.png)|


### Accuracy



## Application Scenarios

### Algorithm Category

`AIGC`

### Key Application Industries

`Retail, media, education`

## Source Repository and Issue Reporting

* https://developer.hpccube.com/codes/modelzoo/pixart-alpha_pytorch

## References

* https://github.com/PixArt-alpha/PixArt-alpha