# HunyuanDiT

## Paper

`Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding`

* https://arxiv.org/pdf/2405.08748

## Model Structure

The model is based on a `transformer decoder` architecture. Building on `DiT`, it redesigns how the `Time Embedding` and `positional Embedding` are injected, and the `Text Prompt` is encoded by two `text encoder`s; everything else matches DiT.

![alt text](readme_imgs/model_structure.png)
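Since the `ckpts` layout in this README contains both a `clip_text_encoder` and an `mt5` folder, the two text-encoder outputs are combined into one conditioning sequence. A shape-only sketch of that idea (the token counts and dimension below are illustrative, not the model's real ones):

```python
# shape-only sketch: combine two text-encoder outputs along the sequence dim
# (77 / 256 tokens and 1024 dims are made-up example numbers)
clip_tokens = [[0.0] * 1024 for _ in range(77)]   # CLIP token sequence
mt5_tokens = [[0.0] * 1024 for _ in range(256)]   # mT5 sequence (after projection to a common dim)
text_states = clip_tokens + mt5_tokens            # one combined conditioning sequence
```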


## Algorithm

`self-attention` captures structural information within the image, while `cross attention` aligns the text with the image.

![alt text](readme_imgs/alg.png)
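As a rough illustration of the cross-attention step (not the HunyuanDiT implementation; all dimensions and values below are made up), each image-token query is softmax-matched against text-token keys and returns a weighted mix of the text values:

```python
import math

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention in plain Python.

    queries: image-token vectors, shape [nq][d]
    keys/values: text-token vectors, shape [nk][d]
    Returns one output per query: a softmax-weighted mix of the values.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # similarity of this image token to every text token
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # softmax over text tokens
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# two image tokens attending over three text tokens (d = 2)
out = cross_attention([[1.0, 0.0], [0.0, 1.0]],
                      [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                      [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Each output component stays inside the range of the corresponding value components, since it is a convex combination of the text value vectors.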


## Environment Setup

### Docker (Option 1)
    
    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310

    docker run --shm-size 10g --network=host --name=hunyuan --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute project path>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install timm --no-deps

    pip install flash_attn-2.0.4+das1.0+82379d7.abi0.dtk2404.torch2.1-cp310-cp310-manylinux2014_x86_64.whl  (download from the developer community)

    pip install bitsandbytes-0.42.0-py3-none-any.whl  (provided in the whl folder)


### Dockerfile (Option 2)

    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 10g --network=host --name=hunyuan --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute project path>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt
    
    pip install timm --no-deps

    pip install flash_attn-2.0.4+das1.0+82379d7.abi0.dtk2404.torch2.1-cp310-cp310-manylinux2014_x86_64.whl  (download from the developer community)

    pip install bitsandbytes-0.42.0-py3-none-any.whl  (provided in the whl folder)


### Anaconda (Option 3)
1. The special deep learning libraries required for this project's DCU GPUs can be downloaded from the 光合 developer community:

https://developer.sourcefind.cn/tool/

    DTK driver: dtk24.04
    python: 3.10
    torch: 2.1.0
    torchvision: 0.16.0
    onnx: 1.15.0
    flash-attn: 2.0.4

Tips: the DTK driver, python, torch, and other DCU-related tool versions above must correspond to each other exactly.
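
A minimal sketch of a sanity check for the pinned versions above (the pins are copied from this README; the helper itself is illustrative, not part of the project):

```python
import sys

# versions pinned by this README (dtk24.04 stack)
REQUIRED = {"python": "3.10", "torch": "2.1.0", "torchvision": "0.16.0"}

def check_versions(installed):
    """Return a list of (name, expected, found) mismatches.

    `installed` maps tool name -> version string; a version matches if it
    starts with the pin (so a build like "2.1.0+das" still matches "2.1.0").
    """
    mismatches = []
    for name, expected in REQUIRED.items():
        found = installed.get(name, "missing")
        if not found.startswith(expected):
            mismatches.append((name, expected, found))
    return mismatches

installed = {"python": "%d.%d" % sys.version_info[:2],
             "torch": "2.1.0", "torchvision": "0.16.0"}
problems = check_versions(installed)  # empty list when everything matches
```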

2. Install the remaining, non-special libraries according to requirements.txt:

    pip install -r requirements.txt

    pip install timm --no-deps

    pip install bitsandbytes-0.42.0-py3-none-any.whl  (provided in the whl folder)

## Dataset



## Inference

### Command Line

    # Prompt Enhancement + Text-to-Image. Torch mode
    python sample_t2i.py --prompt "千里冰封万里雪飘"

    # Prompt Enhancement + Text-to-Image. Flash Attention mode (available on the latest hardware)
    python sample_t2i.py --prompt "千里冰封万里雪飘" --infer-mode fa

    # Only Text-to-Image. Torch mode
    python sample_t2i.py --prompt "飞流直下三千尺疑是银河落九天" --no-enhance

    # Generate an image with other image sizes.
    python sample_t2i.py --prompt "飞流直下三千尺疑是银河落九天" --image-size 1280 768

    # Prompt Enhancement + Text-to-Image. DialogGen loads with 4-bit quantization, but it may lose performance.
    python sample_t2i.py --prompt "飞流直下三千尺疑是银河落九天"  --load-4bit


Parameter list
|    Argument     |  Default  |                     Description                     |
|:---------------:|:---------:|:---------------------------------------------------:|
|   `--prompt`    |   None    |        The text prompt for image generation         |
| `--image-size`  | 1024 1024 |           The size of the generated image           |
|    `--seed`     |    42     |        The random seed for generating images        |
| `--infer-steps` |    100    |          The number of steps for sampling           |
|  `--negative`   |     -     |      The negative prompt for image generation       |
| `--infer-mode`  |   torch   |       The inference mode (torch, fa)         |
|   `--sampler`   |   ddpm    |    The diffusion sampler (ddpm, ddim, or dpmms)     |
| `--no-enhance`  |   False   |        Disable the prompt enhancement model         |
| `--model-root`  |   ckpts   |     The root directory of the model checkpoints     |
|  `--load-key`   |    ema    | Load the student model or EMA model (ema or module) |
|  `--load-4bit`  |   False   |     Load DialogGen model with 4bit quantization     |
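
For reference, the flags above can be sketched with `argparse` roughly as follows. This is a hypothetical mirror of the documented interface, not the actual source of `sample_t2i.py`:

```python
import argparse

def build_parser():
    # hypothetical reconstruction of the CLI described in the table above
    p = argparse.ArgumentParser(description="HunyuanDiT text-to-image sampling")
    p.add_argument("--prompt", type=str, default=None,
                   help="text prompt for image generation")
    p.add_argument("--image-size", type=int, nargs=2, default=[1024, 1024],
                   help="size of the generated image, e.g. --image-size 1280 768")
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--infer-steps", type=int, default=100)
    p.add_argument("--negative", type=str, default=None)
    p.add_argument("--infer-mode", choices=["torch", "fa"], default="torch")
    p.add_argument("--sampler", choices=["ddpm", "ddim", "dpmms"], default="ddpm")
    p.add_argument("--no-enhance", action="store_true")
    p.add_argument("--model-root", type=str, default="ckpts")
    p.add_argument("--load-key", choices=["ema", "module"], default="ema")
    p.add_argument("--load-4bit", action="store_true")
    return p

args = build_parser().parse_args(
    ["--prompt", "千里冰封万里雪飘", "--image-size", "1280", "768", "--no-enhance"])
```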


### WebUI (recommended)

    # By default, we start a Chinese UI.
    python app/hydit_app.py

    # Using Flash Attention for acceleration (available on the latest hardware).
    python app/hydit_app.py --infer-mode fa

    # You can disable the enhancement model if the GPU memory is insufficient.
    # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag. 
    python app/hydit_app.py --no-enhance

    # Start with English UI
    python app/hydit_app.py --lang en

    # Start a multi-turn (conversational) T2I generation UI.
    # If your DCU memory is less than 32GB, use '--load-4bit' to enable 4-bit quantization, which requires at least 22GB of memory.
    python app/multiTurnT2I_app.py


## Result

|||||
|:---|:---:|:---:|:---:|
|Result|![alt text](readme_imgs/result_1.png)|![alt text](readme_imgs/result_2.png)|![alt text](readme_imgs/result_3.png)|
|prompt|千里冰封万里雪飘|飞流直下三千尺疑是银河落九天|一只金毛犬叼着一个RTX4090显卡|


### Accuracy



## Application Scenarios

### Algorithm Category

`AIGC`

### Key Application Industries

`Retail, Broadcast Media, E-commerce`

## Pretrained Weights
[HunyuanDiT](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT)

Download all the model files from the link above and place them in the `ckpts` folder.

    ckpts/
    ├── dialoggen
    │   ├── config.json
    │   ├── generation_config.json
    │   ├── model-00001-of-00004.safetensors
    │   ├── model-00002-of-00004.safetensors
    │   ├── model-00003-of-00004.safetensors
    │   ├── model-00004-of-00004.safetensors
    │   ├── model.safetensors.index.json
    │   ├── openai
    │   │   └── clip-vit-large-patch14-336
    │   │       ├── config.json
    │   │       ├── merges.txt
    │   │       ├── preprocessor_config.json
    │   │       ├── pytorch_model.bin
    │   │       ├── README.md
    │   │       ├── special_tokens_map.json
    │   │       ├── tokenizer_config.json
    │   │       ├── tokenizer.json
    │   │       └── vocab.json
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── tokenizer.model
    └── t2i
        ├── clip_text_encoder
        │   ├── config.json
        │   └── pytorch_model.bin
        ├── model
        │   ├── pytorch_model_ema.pt
        │   └── pytorch_model_module.pt
        ├── mt5
        │   ├── config.json
        │   ├── download.sh
        │   ├── generation_config.json
        │   ├── nohup.out
        │   ├── pytorch_model.bin
        │   ├── README.md
        │   ├── special_tokens_map.json
        │   ├── spiece.model
        │   └── tokenizer_config.json
        ├── sdxl-vae-fp16-fix
        │   ├── config.json
        │   ├── diffusion_pytorch_model.bin
        │   └── diffusion_pytorch_model.safetensors
        └── tokenizer
            ├── special_tokens_map.json
            ├── tokenizer_config.json
            ├── vocab_org.txt
            └── vocab.txt
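
A small helper to verify the layout above before running inference. This is illustrative only; the path list is a representative subset of the tree shown, not an exhaustive one:

```python
import os

# a few representative files from the ckpts layout above (not exhaustive)
EXPECTED = [
    "t2i/model/pytorch_model_ema.pt",
    "t2i/model/pytorch_model_module.pt",
    "t2i/clip_text_encoder/pytorch_model.bin",
    "t2i/mt5/pytorch_model.bin",
    "t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors",
    "dialoggen/model.safetensors.index.json",
]

def missing_checkpoints(root="ckpts"):
    """Return the expected checkpoint files that are absent under `root`."""
    return [p for p in EXPECTED if not os.path.isfile(os.path.join(root, p))]
```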

## Source Repository & Issue Feedback

* https://developer.sourcefind.cn/codes/modelzoo/hunyuandit_pytorch

## References

* https://github.com/Tencent/HunyuanDiT/tree/main