# Stable Diffusion 3 Medium
## Paper

`Scaling Rectified Flow Transformers for High-Resolution Image Synthesis`

https://arxiv.org/abs/2403.03206

## Model Architecture

Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with significantly improved performance in image quality, typography, complex prompt understanding, and resource efficiency.

This project focuses on optimizing the inference performance of Stable Diffusion 3 Medium on the DCU platform, with the goal of fast image generation on DCU.

![img](docs/mmdit.png)


## Algorithm Principles

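SD3 processes both the text input and the visual latent features as sequences of embeddings. Positional encodings are added to 2x2 patches of the latent features, which are then flattened into a sequence of patch embeddings. This sequence, together with the text feature sequence, is fed into the MMDiT blocks. The two sequences are projected to the same feature dimension, concatenated, and passed through a series of attention modules and multilayer perceptrons (MLPs).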
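
The patch step described above can be illustrated with a minimal sketch. It only shows how a latent is cut into 2x2 patches and flattened into a token sequence; the learned projection and the positional encodings of the real model are left out. The function name `patchify` and the toy latent are illustrative assumptions, not code from this repository.

```python
from typing import List

Latent = List[List[List[float]]]  # indexed as [channel][row][col]


def patchify(latent: Latent, patch: int = 2) -> List[List[float]]:
    """Split a C x H x W latent into flattened patch x patch tokens (row-major)."""
    channels = len(latent)
    height, width = len(latent[0]), len(latent[0][0])
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # One token gathers every channel value inside the current patch.
            token = [
                latent[c][top + dy][left + dx]
                for c in range(channels)
                for dy in range(patch)
                for dx in range(patch)
            ]
            tokens.append(token)
    return tokens


# Toy 1-channel 4x4 latent: (4/2) * (4/2) = 4 tokens, each of length 1*2*2 = 4.
toy = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
tokens = patchify(toy)
print(len(tokens), len(tokens[0]))
```

In the actual model each flattened patch would then be mapped to the shared feature dimension before being concatenated with the text sequence.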

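To handle the differences between the two modalities, each MMDiT block uses two separate sets of weights to project the text and image sequences. The two sequences are then concatenated before the attention operation. This design lets each representation work in its own feature space while still allowing each to extract useful information from the other through attention. This bidirectional flow of information between text and image differs from earlier text-to-image models, where text information enters the model through cross-attention and the text features fed to every layer are the same text-encoder output, unchanged with depth.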
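
As a rough illustration of this idea, the toy sketch below projects text and image tokens with separate weights, concatenates them, and runs a single attention over the joint sequence. It is a single-head, pure-Python simplification with no output projection, normalization, or MLP; all names (`joint_attention`, `text_w`, `image_w`) and the toy values are assumptions made for this example, not the model's implementation.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(text: Matrix, image: Matrix,
                    text_w: Dict[str, Matrix],
                    image_w: Dict[str, Matrix]) -> Tuple[Matrix, Matrix]:
    # Each modality is projected with its own q/k/v weights...
    q = matmul(text, text_w["q"]) + matmul(image, image_w["q"])
    k = matmul(text, text_w["k"]) + matmul(image, image_w["k"])
    v = matmul(text, text_w["v"]) + matmul(image, image_w["v"])
    # ...then one attention runs over the concatenated sequence,
    # so text tokens attend to image tokens and vice versa.
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    out = matmul(weights, v)
    n_text = len(text)
    return out[:n_text], out[n_text:]  # split back into the two streams


identity = [[1.0, 0.0], [0.0, 1.0]]
doubled = [[2.0, 0.0], [0.0, 2.0]]
text_w = {"q": identity, "k": identity, "v": identity}
image_w = {"q": doubled, "k": doubled, "v": doubled}
text_out, image_out = joint_attention([[1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]], text_w, image_w)
print(len(text_out), len(image_out))
```

The text token starts with a zero in its second dimension, yet its output there is positive: that value can only have come from the image tokens, which is the bidirectional flow the paragraph describes.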

## Environment Setup
Pull the inference docker image from [光源 (SourceFind)](https://www.sourcefind.cn/#/service-details):
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:stablediffusion_v2-1_dtk24.04_xformers0.0.25_py310
# Replace <Image ID> with the ID of the docker image pulled above
# <Host Path>: path on the host
# <Container Path>: path it is mapped to inside the container
docker run -it --name sd3 --shm-size=1024G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash
```
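
Once inside the container, it can be useful to confirm that the device nodes passed with `--device` are actually visible. The following is a minimal sketch of such a check; the helper name `missing_devices` is an assumption for illustration and is not part of this repository.

```python
import os
from typing import Callable, Iterable, List

# Device nodes handed to the container by the `docker run` command above.
REQUIRED_DEVICES = ("/dev/kfd", "/dev/dri")


def missing_devices(paths: Iterable[str] = REQUIRED_DEVICES,
                    exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return the device paths that are not visible in this environment."""
    return [p for p in paths if not exists(p)]


missing = missing_devices()
if missing:
    print("Missing device nodes:", ", ".join(missing))
else:
    print("All expected device nodes are visible.")
```

A missing entry usually points to a `--device` flag that was dropped from the `docker run` command.
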
Image version dependencies:
* DTK driver: dtk24.04
* PyTorch: 2.1.0
* Python: python3.10
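
The versions listed above can be checked from Python with a short sketch like the one below. The prefix comparison for torch is an assumption: platform-specific builds may append a local suffix to `2.1.0`, so the check only looks at the leading part. The helper names are illustrative.

```python
import sys
from importlib import metadata
from typing import Optional, Sequence, Tuple


def python_matches(version_info: Sequence[int] = sys.version_info,
                   required: Tuple[int, int] = (3, 10)) -> bool:
    """True when the interpreter's major.minor equals the required pair."""
    return tuple(version_info[:2]) == required


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def torch_matches(version: Optional[str], required: str = "2.1.0") -> bool:
    # Assumption: a prefix match tolerates local build suffixes after 2.1.0.
    return version is not None and version.startswith(required)


print("Python 3.10:", python_matches())
print("torch 2.1.0:", torch_matches(installed_version("torch")))
```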

## Dataset


## Inference
### Install diffusers and dependencies

```
git clone http://developer.hpccube.com/codes/modelzoo/stable-diffusion-3-medium_diffusers.git
cd stable-diffusion-3-medium_diffusers
git submodule init && git submodule update

# 1. Uninstall the old torch and diffusers
pip uninstall torch diffusers
# 2. Install the new versions
cd diffusers
python3 setup.py install
cd .. && ./env.sh
```
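
After installation, a quick sanity check is to see whether the installed `diffusers` exports `StableDiffusion3Pipeline`. The sketch below only tests that the name is exported; it does not load any weights, and the helper names are assumptions for illustration.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


def load_module(name: str) -> Optional[ModuleType]:
    """Import a module by name, returning None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def has_sd3_pipeline(module: Any) -> bool:
    """True when the module exposes a StableDiffusion3Pipeline attribute."""
    if module is None:
        return False
    try:
        return hasattr(module, "StableDiffusion3Pipeline")
    except Exception:
        # Lazy-import failures inside the package are treated as "not available".
        return False


diffusers = load_module("diffusers")
print("diffusers version:", getattr(diffusers, "__version__", "not installed"))
print("SD3 pipeline available:", has_sd3_pipeline(diffusers))
```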

### Model Download

Fast model download center: [AIModels](http://113.200.138.88:18080/aimodels). Model link for this project: [stable-diffusion-3-medium-diffusers](http://113.200.138.88:18080/aimodels/stable-diffusion-3-medium)
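
After downloading, a minimal sketch like the following can confirm that the model directory looks complete. The expected entries follow the layout of the `stabilityai/stable-diffusion-3-medium-diffusers` repository on Hugging Face; whether the mirror above uses exactly the same layout is an assumption, and the local path is hypothetical.

```python
import os
from typing import Iterable, List

# Top-level entries of the diffusers-format SD3 Medium checkpoint.
EXPECTED = (
    "model_index.json",
    "scheduler",
    "text_encoder", "text_encoder_2", "text_encoder_3",
    "tokenizer", "tokenizer_2", "tokenizer_3",
    "transformer",
    "vae",
)


def missing_components(model_dir: str, expected: Iterable[str] = EXPECTED) -> List[str]:
    """List the expected files or folders that are absent from model_dir."""
    return [name for name in expected
            if not os.path.exists(os.path.join(model_dir, name))]


# Hypothetical local path; replace it with wherever the model was downloaded.
model_dir = "./stable-diffusion-3-medium-diffusers"
print("Missing components:", missing_components(model_dir) or "none")
```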

### Run stable-diffusion-3-medium

```
python SD3-medium.py

# Use xformers to compute attention:
export USE_XFORMERS=1
python SD3-medium.py
```
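
This README does not show the contents of `SD3-medium.py`. For orientation, the sketch below shows what a minimal SD3 Medium generation script could look like with the `diffusers` `StableDiffusion3Pipeline` API. It is not the repository's script: the function names, the output path, the step count and guidance scale, the `"cuda"` device name on DCU, and the way `USE_XFORMERS` is read are all assumptions.

```python
import os
from typing import Mapping


def xformers_requested(environ: Mapping[str, str] = os.environ) -> bool:
    """Read the USE_XFORMERS flag; how the real script consumes it is not shown here."""
    return environ.get("USE_XFORMERS", "0") == "1"


def generate(model_dir: str, prompt: str, out_path: str = "result.png") -> str:
    # Imported lazily so the helper above stays usable without the GPU stack.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
    # Assumption: the DCU build of PyTorch exposes the accelerator as "cuda".
    pipe = pipe.to("cuda")
    image = pipe(
        prompt,
        negative_prompt="",
        num_inference_steps=28,  # illustrative values, not tuned for DCU
        guidance_scale=7.0,
    ).images[0]
    image.save(out_path)
    return out_path


print("xformers requested:", xformers_requested())
```

Calling `generate("<model directory>", "a photo of a cat holding a sign")` on a machine with the model downloaded would write the image to `result.png`.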

## Result
![img](./docs/result.png)

### Accuracy


## Application Scenarios
### Algorithm Category
`Text-to-image`

### Popular Application Industries
`Painting, animation, media`

## Source Repository and Issue Feedback
http://developer.hpccube.com/codes/modelzoo/stable-diffusion-3-medium_diffusers.git

## References
https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers