# Stable Diffusion 3 Medium
## Paper

`Scaling Rectified Flow Transformers for High-Resolution Image Synthesis`

https://arxiv.org/abs/2403.03206

## Model Architecture

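Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model. It delivers significantly better image quality, typography, understanding of complex prompts, and resource efficiency.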

This project focuses on optimizing Stable Diffusion 3 Medium inference performance on the DCU platform, with the goal of fast image generation on DCU hardware.

![MMDiT architecture](docs/mmdit.png)


## Algorithm

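SD3 processes both the text input and the visual latent features as sequences of embeddings. Positional encodings are added to 2x2 patches of the latent features, and the patches are then flattened into a sequence of patch embeddings. This sequence is fed into the MMDiT blocks together with the sequence of text features. The two sequences are projected to the same feature dimension and concatenated, then passed through a series of attention blocks and multilayer perceptrons (MLPs).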
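
The patchify-and-concatenate step can be sketched in NumPy as below. This is a simplified illustration resting on several assumptions, not SD3's actual implementation. The patch embedding is modeled as a single linear projection, which is equivalent to a convolution with kernel size and stride 2. Each patch gets one positional embedding vector. All sizes are toy values. The helper names `patchify` and `build_joint_sequence` are hypothetical.

```python
import numpy as np


def patchify(latent: np.ndarray, patch_size: int = 2) -> np.ndarray:
    """Split a (C, H, W) latent into a row-major sequence of flattened patches.

    Returns an array of shape (H/p * W/p, C * p * p).
    """
    c, h, w = latent.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError("H and W must be divisible by patch_size")
    # (C, H/p, p, W/p, p) -> (H/p, W/p, C, p, p): group values by patch.
    x = latent.reshape(c, h // p, p, w // p, p).transpose(1, 3, 0, 2, 4)
    return x.reshape((h // p) * (w // p), c * p * p)


def build_joint_sequence(latent, text_feats, w_img, w_txt, pos_emb):
    """Project both modalities to a shared width and concatenate them.

    Hypothetical helper: a linear patch embedding plus a per-patch positional
    embedding for the image, and a linear projection for the text.
    """
    img_seq = patchify(latent) @ w_img + pos_emb  # (num_patches, d_model)
    txt_seq = text_feats @ w_txt                  # (num_text_tokens, d_model)
    # Text-first ordering is an arbitrary choice in this sketch.
    return np.concatenate([txt_seq, img_seq], axis=0)


rng = np.random.default_rng(0)
latent = rng.standard_normal((16, 8, 8))      # toy 16-channel latent, 8x8 spatial
text_feats = rng.standard_normal((5, 32))     # 5 text tokens with 32-dim features
d_model = 64
w_img = rng.standard_normal((16 * 2 * 2, d_model))
w_txt = rng.standard_normal((32, d_model))
pos_emb = rng.standard_normal((16, d_model))  # one embedding per patch of the 4x4 grid
seq = build_joint_sequence(latent, text_feats, w_img, w_txt, pos_emb)
print(seq.shape)  # (21, 64): 5 text tokens + 16 image patches
```

An 8x8 latent yields a 4x4 grid of 2x2 patches, so the image contributes 16 tokens. In general, the number of image tokens grows with the area of the latent.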

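Because the two modalities differ, each MMDiT block uses two separate sets of weights to transform the features of the text and image sequences. The two sequences are then concatenated before the attention operation. Each representation therefore works in its own feature space, while attention still lets each one extract useful information from the other. This bidirectional flow of information between text and image sets MMDiT apart from earlier text-to-image models. Those models inject text through cross-attention, and every layer receives the same text-encoder output regardless of its depth.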
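
A single MMDiT attention step can be sketched as follows. The sketch is deliberately reduced: it uses one head and random weights, and omits normalization, timestep modulation, and the MLP. The function and weight names are hypothetical and do not mirror the diffusers implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(txt, img, txt_w, img_w):
    """Single-head joint attention with modality-specific weights.

    txt: (n_txt, d), img: (n_img, d); txt_w / img_w: dicts of (d, d)
    matrices under the keys "q", "k", "v", "o".
    """
    n_txt = txt.shape[0]
    # Each modality is projected with its own weights...
    q = np.concatenate([txt @ txt_w["q"], img @ img_w["q"]], axis=0)
    k = np.concatenate([txt @ txt_w["k"], img @ img_w["k"]], axis=0)
    v = np.concatenate([txt @ txt_w["v"], img @ img_w["v"]], axis=0)
    # ...but attention runs over the concatenated sequence, so every token
    # can attend to tokens of both modalities.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    out = attn @ v
    # Split the joint output back and apply per-modality output projections.
    return out[:n_txt] @ txt_w["o"], out[n_txt:] @ img_w["o"]


def make_weights(rng: np.random.Generator, d: int) -> dict:
    """Random (d, d) projection matrices for one modality (illustration only)."""
    return {name: rng.standard_normal((d, d)) / np.sqrt(d) for name in "qkvo"}


rng = np.random.default_rng(0)
d = 8
txt_w, img_w = make_weights(rng, d), make_weights(rng, d)
txt = rng.standard_normal((3, d))
img = rng.standard_normal((4, d))
txt_out, img_out = joint_attention(txt, img, txt_w, img_w)
print(txt_out.shape, img_out.shape)  # (3, 8) (4, 8)
```

Changing `img` also changes `txt_out`, which is the bidirectional text-image flow described above. In a cross-attention model, by contrast, the text features are fixed inputs that the image stream cannot update.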

![Algorithm overview](docs/algorithm.png)

## Environment Setup
Pull the Docker image for inference from [SourceFind (光源)](https://www.sourcefind.cn/#/service-details):
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:stablediffusion_v2-1_dtk24.04_xformers0.0.25_py310
# Replace <Image ID> with the ID of the Docker image pulled above
# <Host Path>: path on the host
# <Container Path>: mount path inside the container
docker run -it --name sd3 --shm-size=1024G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash
```
Image dependencies:
* DTK driver: dtk24.04
* PyTorch: 2.1.0
* Python: python3.10
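
Inside the container, you can run a quick sanity check against the versions listed above, as in the sketch below. It relies only on the standard library (`importlib.metadata`). The DTK driver version is not a Python package, so the sketch does not check it. `check_environment` is a hypothetical helper and is not part of this repository.

```python
import sys
from importlib import metadata


def check_environment(expected_python=(3, 10), expected_packages=None):
    """Return a list of mismatches between the running environment and the
    expected versions; an empty list means everything matched."""
    if expected_packages is None:
        expected_packages = {"torch": "2.1.0"}
    problems = []
    have_py = tuple(sys.version_info[:2])
    if have_py != tuple(expected_python):
        problems.append(f"python {have_py} != expected {tuple(expected_python)}")
    for name, want in expected_packages.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {want})")
            continue
        # Ignore any PEP 440 local version label (the part after "+"),
        # which vendor builds commonly append.
        if have.split("+")[0] != want:
            problems.append(f"{name}: {have} != expected {want}")
    return problems


for issue in check_environment():
    print("mismatch:", issue)
```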

## Dataset


## Inference
### Install diffusers and dependencies

```
git clone http://developer.sourcefind.cn/codes/modelzoo/stable-diffusion-3-medium_diffusers.git
cd stable-diffusion-3-medium_diffusers
git submodule init && git submodule update

# 1. Uninstall the old torch and diffusers
pip uninstall torch diffusers
# 2. Install the new version
cd diffusers
python3 setup.py install
cd .. && ./env.sh

```

### Model Download
[stable-diffusion-3-medium-diffusers](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers)
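
One way to fetch the weights programmatically is `huggingface_hub.snapshot_download`, as in the sketch below. The wrapper name and the local directory are illustrative assumptions. This README does not document how `SD3-medium.py` locates the weights, so place them wherever the script expects.

```python
def download_sd3_medium(local_dir: str,
                        repo_id: str = "stabilityai/stable-diffusion-3-medium-diffusers") -> str:
    """Download the diffusers-format weights into local_dir and return the path.

    The repository is gated: accept the license on its Hugging Face page and
    authenticate first (for example with `huggingface-cli login` or the
    HF_TOKEN environment variable).
    """
    # Imported lazily so this module loads even where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Usage: `download_sd3_medium("./stable-diffusion-3-medium-diffusers")`.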

### Run stable-diffusion-3-medium

```
python SD3-medium.py

# Compute attention with xformers:
export USE_XFORMERS=1
python SD3-medium.py
```
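
This README does not show the contents of `SD3-medium.py`. For orientation, the sketch below shows a minimal generation function built on the public diffusers `StableDiffusion3Pipeline` API. It is not the repository's script and does not reproduce the script's `USE_XFORMERS` switch. The defaults of 28 steps and guidance 7.0 follow the upstream Hugging Face model card example, and the function name `generate` is hypothetical.

```python
def generate(model_dir: str, prompt: str, out_path: str = "sd3_output.png",
             steps: int = 28, guidance_scale: float = 7.0, seed: int = 0) -> str:
    """Run one text-to-image generation with the diffusers SD3 pipeline."""
    # Lazy imports: torch and diffusers are only needed when generating.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
    # Assumption: the DTK (ROCm/HIP-based) PyTorch build exposes DCUs through
    # the "cuda" device name, as ROCm builds of PyTorch do.
    pipe = pipe.to("cuda")
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(
        prompt,
        negative_prompt="",
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]
    image.save(out_path)
    return out_path
```

Usage: `generate("./stable-diffusion-3-medium-diffusers", "A cat holding a sign that says hello world")`.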

## Result
![Generated result](./docs/result.png)

### Accuracy


## Application Scenarios
### Algorithm Category
`Text-to-image`

### Key Application Industries
`Painting, animation, media`

## Source Code Repository and Issue Feedback
http://developer.sourcefind.cn/codes/modelzoo/stable-diffusion-3-medium_diffusers.git

## References
https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers