# sd3.5

## Paper

`Scaling Rectified Flow Transformers for High-Resolution Image Synthesis`

* https://arxiv.org/abs/2403.03206

## Model Architecture

sd3.5 uses the same architecture as sd3: it conditions on three text encoders, and its backbone is built from `MM-DiT` blocks.

<img src="readme_imgs/arch.png" style="zoom:70%">
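
As a rough illustration of how the outputs of three text encoders can be merged into one conditioning sequence, here is a shape-only sketch. The sizes (77 tokens; 768, 1280 and 4096 channels) follow the description in the paper cited above; the function names are hypothetical, zeros stand in for real encoder outputs, and the code in this repository may differ in detail.

```python
# Shape-only sketch; zeros replace real encoder outputs and all names are illustrative.

def zeros(rows, cols):
    """Return a rows x cols matrix of zeros as nested lists."""
    return [[0.0] * cols for _ in range(rows)]


def combine_text_context(clip_l, clip_g, t5):
    """Merge per-token features from the three encoders into one sequence."""
    # 1) Concatenate the two CLIP feature maps channel-wise (768 + 1280 = 2048).
    clip = [row_l + row_g for row_l, row_g in zip(clip_l, clip_g)]
    # 2) Zero-pad the CLIP channels up to the T5 width.
    width = len(t5[0])
    clip = [row + [0.0] * (width - len(row)) for row in clip]
    # 3) Concatenate CLIP tokens and T5 tokens along the sequence axis.
    return clip + t5


def combine_pooled(pooled_l, pooled_g):
    """Concatenate the pooled CLIP vectors into one global conditioning vector."""
    return pooled_l + pooled_g


context = combine_text_context(zeros(77, 768), zeros(77, 1280), zeros(77, 4096))
pooled = combine_pooled([0.0] * 768, [0.0] * 1280)
print(len(context), len(context[0]), len(pooled))  # 154 4096 2048
```

The token-level context goes to the attention layers of the backbone, while the pooled vector acts as a global conditioning signal.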

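## Algorithm

The model is trained with the `flow matching` algorithm. Unlike conventional diffusion training, flow matching trains the network to regress the velocity field that carries noise to data along a straight path, which simplifies the training objective and improves generation efficiency.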

<img src="readme_imgs/alg.png" style="zoom:100%">
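
The idea can be made concrete with a minimal, dependency-free sketch of the straight-path (rectified flow) objective from the cited paper. It is a toy illustration and not the training code of this repository: vectors are plain lists, and a hypothetical `oracle` function that already knows the true velocity stands in for a trained network.

```python
import random


def interpolate(x0, noise, t):
    """Point on the straight path between data (t=0) and noise (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]


def target_velocity(x0, noise):
    """Time derivative of the straight path: constant and equal to noise - data."""
    return [b - a for a, b in zip(x0, noise)]


def flow_matching_loss(predict, x0, noise, t):
    """Mean squared error between the predicted and the target velocity at time t."""
    z_t = interpolate(x0, noise, t)
    v_pred = predict(z_t, t)
    v_true = target_velocity(x0, noise)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def euler_sample(predict, noise, steps):
    """Integrate dz/dt = v from t=1 (noise) down to t=0 (data) with Euler steps."""
    z = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = predict(z, t)
        z = [zi - dt * vi for zi, vi in zip(z, v)]
    return z


rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]                       # toy "data" sample
noise = [rng.gauss(0.0, 1.0) for _ in x0]   # Gaussian noise sample


def oracle(z, t):
    """Stand-in for a trained network: returns the true velocity (assumption)."""
    return target_velocity(x0, noise)


loss = flow_matching_loss(oracle, x0, noise, t=0.3)
sample = euler_sample(oracle, noise, steps=4)
print(loss, sample)
```

Because the target velocity is constant along a straight path, a well-trained model can in principle reach the data with only a few Euler steps, which is where the efficiency gain comes from.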


## Environment Setup

### Docker (Option 1)
    
    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu22.04-dtk24.04.2-py3.10

    docker run --shm-size 50g --network=host --name=sd3.5 --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt

### Dockerfile (Option 2)

    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 50g --network=host --name=sd3.5 --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt

### Anaconda (Option 3)

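1. The DCU-specific deep learning libraries required by this project can be downloaded and installed from the 光合 (Guanghe) developer community: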
https://developer.sourcefind.cn/tool/

    DTK driver: dtk24.04.2
    python: python3.10
    torch: 2.1.0
    torchvision: 0.16.0

Tips: The versions of the DTK driver, python, torch, and the other DCU-related tools listed above must match each other exactly.

2. Install the remaining general-purpose libraries from requirements.txt:

    pip install -r requirements.txt
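
Since the versions have to line up, a quick check after installation can catch mismatches early. The sketch below compares the installed `torch` and `torchvision` versions with the pins listed above. The prefix-matching rule is an assumption (DCU builds may append a local suffix to the version string), and the DTK driver version is not checked here.

```python
import sys
from importlib import metadata

# Pins copied from the version list above; matching by prefix is an assumption.
EXPECTED = {"torch": "2.1.0", "torchvision": "0.16.0"}
EXPECTED_PYTHON = "3.10"


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, expected):
    """List (name, pin, installed) for packages that are missing or off-pin."""
    problems = []
    for name, pin in expected.items():
        version = installed.get(name)
        if version is None or not version.startswith(pin):
            problems.append((name, pin, version))
    return problems


python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
if python_version != EXPECTED_PYTHON:
    print(f"python: expected {EXPECTED_PYTHON}, found {python_version}")
for name, pin, version in find_mismatches(installed_versions(EXPECTED), EXPECTED):
    print(f"{name}: expected {pin}, found {version}")
```

An empty output means the Python, torch and torchvision versions agree with the pins.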

## Dataset



## Training



## Inference

```
export HF_ENDPOINT=https://hf-mirror.com
```

```bash
# Generate a cat using SD3.5 Large model (at models/sd3.5_large.safetensors) with its default settings
python sd3_infer.py --prompt "cute wallpaper art of a cat"
# Or use a text file with a list of prompts
python sd3_infer.py --prompt path/to/my_prompts.txt
# Generate a cat using SD3.5 Large Turbo with its default settings
python sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_large_turbo.safetensors
```
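
For batch runs it can be convenient to drive the same commands from Python. The following sketch only assembles the command lines and the environment; it uses just the `--prompt` and `--model` flags and the `HF_ENDPOINT` variable shown above. The helper names are hypothetical, and actually launching the script (for example with `subprocess.run`) is left to the caller.

```python
import os
import sys


def build_command(prompt, model=None, script="sd3_infer.py"):
    """Assemble the argument list for one inference run.

    `prompt` is either a prompt string or the path to a prompt text file.
    """
    cmd = [sys.executable, script, "--prompt", str(prompt)]
    if model is not None:
        cmd += ["--model", str(model)]
    return cmd


def build_env(endpoint="https://hf-mirror.com"):
    """Copy the current environment and point HF_ENDPOINT at the mirror."""
    env = dict(os.environ)
    env["HF_ENDPOINT"] = endpoint
    return env


runs = [
    build_command("cute wallpaper art of a cat"),
    build_command("path/to/my_prompts.txt",
                  model="models/sd3.5_large_turbo.safetensors"),
]
for cmd in runs:
    print(" ".join(cmd[1:]))

# To execute a run (not done here):
#   import subprocess
#   subprocess.run(runs[0], env=build_env(), check=True)
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes.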

## Result

|model|prompt|result|
|:---:|:---:|:---:|
|large|cute wallpaper art of a dog|<img src="readme_imgs/result1.png" style="zoom:20%">|
|turbo|cute wallpaper art of a dog|<img src="readme_imgs/result2.png" style="zoom:20%">|

### Accuracy



## Application Scenarios

### Algorithm Category

`AIGC`

### Popular Application Industries

`E-commerce, painting, broadcast media`

## Pretrained Weights

large_model: [huggingface](https://hf-mirror.com/stabilityai/stable-diffusion-3.5-large)

large_turbo: [huggingface](https://hf-mirror.com/stabilityai/stable-diffusion-3.5-large-turbo)

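Note: You only need to download the weights you actually use, arranged according to the weight file layout below. The files `clip_g.safetensors`, `clip_l.safetensors`, and `t5xxl_fp16.safetensors` are located under the `text_encoders` directory.

### Weight File Layout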

```
models/
├── clip_g.safetensors
├── clip_l.safetensors
├── sd3.5_large.safetensors
├── sd3.5_large_turbo.safetensors
└── t5xxl_fp16.safetensors
```
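
Before running inference, it may help to verify that the files are where the layout above says they should be. This is a small sketch with a hypothetical helper, `missing_weights`; the file names are taken from the tree above, and since only the weights you need have to be present, you can pass a shorter list.

```python
import tempfile
from pathlib import Path

# File names copied from the layout above.
EXPECTED_WEIGHTS = [
    "clip_g.safetensors",
    "clip_l.safetensors",
    "sd3.5_large.safetensors",
    "sd3.5_large_turbo.safetensors",
    "t5xxl_fp16.safetensors",
]


def missing_weights(models_dir="models", expected=EXPECTED_WEIGHTS):
    """Return the expected file names that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in expected if not (root / name).is_file()]


# Demonstration on a temporary directory that contains only one of two files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "clip_g.safetensors").touch()
    demo_missing = missing_weights(tmp, ["clip_g.safetensors", "clip_l.safetensors"])
print(demo_missing)  # ['clip_l.safetensors']
```

In a real checkout, calling `missing_weights()` with no arguments checks the `models/` directory relative to the current working directory.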

## Source Repository and Issue Feedback

* https://developer.sourcefind.cn/codes/modelzoo/sd3.5_pytorch

## References

* https://github.com/Stability-AI/sd3.5