# stable_diffusion
## Paper
[High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/pdf/2112.10752)
## Model Architecture
Stable Diffusion is built on the latent diffusion model (LDM), whose architecture is shown below:

![image](http://10.6.10.68/modelzoo/stable_diffusion_ait/-/raw/master/resources/sd_model.png)
## Algorithm
Following the model architecture, the algorithm is summarized in the diagram below:

![image](http://10.6.10.68/modelzoo/stable_diffusion_ait/-/raw/master/resources/sd_principle.png)
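
For readers who prefer a formula to a diagram, the training objective given in the linked paper can be restated as follows. This is taken from the paper rather than from this repository's code. Here $\mathcal{E}$ is the image encoder, $z_t$ is the noised latent at timestep $t$, $\tau_\theta$ is the conditioning (text) encoder, and $\epsilon_\theta$ is the denoising UNet.

```math
L_{LDM} := \mathbb{E}_{\mathcal{E}(x),\, y,\, \epsilon \sim \mathcal{N}(0, 1),\, t}
\left[ \left\lVert \epsilon - \epsilon_\theta\big(z_t, t, \tau_\theta(y)\big) \right\rVert_2^2 \right]
```

At inference time a latent is sampled from Gaussian noise, denoised step by step by the UNet under the text conditioning, and then mapped back to pixel space by the decoder $\mathcal{D}$. These stages appear to correspond to the `clip`, `unet`, and `vae` modules benchmarked later in this README.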
## Dataset
None
## Environment Setup
The Docker image for inference can be pulled from [SourceFind (光源)](https://sourcefind.cn/#/service-list). The recommended image for stable_diffusion_ait is:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:ait-0.0.1_dtk24.04_py310
docker run -d -t -v /opt/hyhal:/opt/hyhal:ro --privileged --device=/dev/kfd --device=/dev/dri/ --network=host --group-add video --name sd-test image.sourcefind.cn:5000/dcu/admin/base/custom:ait-0.0.1_dtk24.04_py310
docker exec -it sd-test bash
source /opt/dtk/env.sh
```
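Before going further it can help to confirm that the device nodes and mounts passed to `docker run` are visible inside the container. The sketch below is not part of this repository. It only checks that the paths named in the commands above exist, and the helper name `check_paths` is made up for illustration.

```python
import os

# Paths that the docker run / source commands above rely on.
EXPECTED_PATHS = ["/dev/kfd", "/dev/dri", "/opt/hyhal", "/opt/dtk/env.sh"]


def check_paths(paths):
    """Return a mapping of path -> True if it exists in this environment."""
    return {path: os.path.exists(path) for path in paths}


if __name__ == "__main__":
    for path, present in check_paths(EXPECTED_PATHS).items():
        print(f"{path}: {'found' if present else 'MISSING'}")
```

A missing entry usually points at a `--device` or `-v` flag that was dropped from the `docker run` command.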
## Inference
**install ait**
```
cd stable_diffusion_ait
pip3 install dist/aitemplate-0.0.1-py3-none-any.whl
```
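To confirm that the wheel was installed, the package metadata can be queried from Python. This is a minimal sketch using only the standard library. The distribution name `aitemplate` is inferred from the wheel filename above, and `installed_version` is a hypothetical helper, not part of AIT.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Name assumed from dist/aitemplate-0.0.1-py3-none-any.whl.
    version = installed_version("aitemplate")
    print(f"aitemplate: {version or 'not installed'}")
```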
#### 01_resnet-50
```
cd examples/01_resnet-50
```
Download the [resnet50 weights](https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet50_a1_0-14fe96d1.pth).
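
The download can also be scripted. The sketch below fetches the resnet50 checkpoint with the standard library only. This README does not say where the example scripts expect the file, so the destination directory (the current directory by default) is an assumption to adjust as needed.

```python
import os
import urllib.request
from urllib.parse import urlparse

WEIGHTS_URL = (
    "https://github.com/rwightman/pytorch-image-models/releases/"
    "download/v0.1-rsb-weights/resnet50_a1_0-14fe96d1.pth"
)


def filename_from_url(url):
    """Extract the file name from the path component of a URL."""
    return os.path.basename(urlparse(url).path)


def download(url, dest_dir="."):
    """Download url into dest_dir unless the file already exists."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, filename_from_url(url))
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


if __name__ == "__main__":
    # dest_dir is an assumption; the README does not specify a location.
    print(download(WEIGHTS_URL))
```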

**benchmark**
```
python3 benchmark_ait.py
python3 benchmark_pt.py
```
**infer**
```
python3 infer_with_torch.py
```
#### 02_bert
```
cd examples/02_bert
```
Download the [bert-base-uncased weights](https://huggingface.co/google-bert/bert-base-uncased).

**benchmark**
```
python3 benchmark_ait.py
python3 benchmark_pt.py
```
**infer**
```
python3 demo.py
```
#### 03_vit
```
cd examples/03_vit
```
Download the [vit_base_patch16_224 weights](https://huggingface.co/timm/vit_base_patch16_224.augreg2_in21k_ft_in1k).

**benchmark**
```
python3 benchmark_ait.py
python3 benchmark_pt.py
```
**verification**
```
python3 verification.py
```
#### 04_stable_diffusion

Download the [stable-diffusion-2-1-base weights](https://huggingface.co/stabilityai/stable-diffusion-2-1-base).

Download the [clip-vit-large-patch14 weights](https://huggingface.co/openai/clip-vit-large-patch14).
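
One way to fetch both repositories is `snapshot_download` from the `huggingface_hub` package. The sketch below assumes that package is installed (`pip3 install huggingface_hub`), which this README does not state. The local directory names are placeholders matching the `--local-dir` and `--clip-dir` values used in the commands of this section.

```python
# Repository IDs come from the two links above; the local directory names
# are placeholders and can be any paths later passed to the scripts.
REPOS = {
    "stabilityai/stable-diffusion-2-1-base": "stable-diffusion-2-1-base_path",
    "openai/clip-vit-large-patch14": "clip-vit-large-patch14_path",
}


def download_all(repos=REPOS):
    """Download every repository in repos and return the local paths."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return [
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
        for repo_id, local_dir in repos.items()
    ]


if __name__ == "__main__":
    for path in download_all():
        print(path)
```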

**compile**
```
cd examples/04_stable_diffusion
python3 scripts/compile.py --local-dir stable-diffusion-2-1-base_path
```
**benchmark**
```
python3 src/benchmark.py --local-dir stable-diffusion-2-1-base_path --clip-dir clip-vit-large-patch14_path --benchmark-pt True
```
**infer**
```
python3 scripts/demo.py --local-dir stable-diffusion-2-1-base_path
python3 scripts/demo_pt.py --local-dir stable-diffusion-2-1-base_path
```
## Results
![image](http://10.6.10.68/modelzoo/stable_diffusion_ait/-/raw/master/resources/example_ait.png)
### Accuracy
None
### Performance
01_resnet-50
| batch size | pt latency(ms) | ait latency(ms) | 
| :------: | :------: |:------: |
| 1 | 3.50665771484375 | 2.7900346517562866 | 
| 2 | 4.198978271484375 | 3.022238612174988 | 
| 4 | 5.242999877929687 | 3.6645140647888184 | 
| 8 | 7.416472778320313 | 4.517657279968262 | 
| 16 | 11.60461181640625 | 6.50670599937439 | 
| 32 | 19.8466064453125 | 10.511437177658081 | 
| 64 | 36.08590576171875 | 18.35416030883789 | 
| 128 | 67.2965625 | 32.82508373260498 | 
| 256 | 133.891044921875 | 59.32628345489502 | 

02_bert
bert sequence length 64
| batch size | pt latency(ms) | ait latency(ms) | 
| :------: | :------: |:------: |
| 1 | 5.492328491210937 | 3.763094484806061 | 
| 2 | 5.851014404296875 | 3.934549331665039 | 
| 4 | 5.8462060546875 | 7.370500087738037 | 
| 8 | 7.20282958984375 | 7.630655765533447 | 
| 16 | 10.13709716796875 | 6.997292518615723 | 
| 32 | 14.629547119140625 | 15.192972660064697 | 
| 64 | 24.83916259765625 | 18.988140106201172 | 
| 128 | 45.0836083984375 | 33.51811981201172 | 
| 256 | 85.2006640625 | 91.8479995727539 | 

bert sequence length 128
| batch size | pt latency(ms) | ait latency(ms) | 
| :------: | :------: |:------: |
| 1 | 5.583170776367187 | 3.8969525694847107 | 
| 2 | 5.851030883789062 | 7.2915791273117065 | 
| 4 | 7.507911376953125 | 7.635279178619385 | 
| 8 | 10.716405029296874 | 6.723778605461121 | 
| 16 | 16.03172607421875 | 14.665886878967285 | 
| 32 | 27.00265869140625 | 18.18143320083618 | 
| 64 | 49.812158203125 | 32.23751640319824 | 
| 128 | 94.589228515625 | 87.85263633728027 | 
| 256 | 179.57365234375 | 107.5546760559082 | 

bert sequence length 256
| batch size | pt latency(ms) | ait latency(ms) | 
| :------: | :------: |:------: |
| 1 | 5.536416625976562 | 4.418213129043579 | 
| 2 | 7.61077392578125 | 5.24817168712616 | 
| 4 | 11.207763671875 | 14.24576473236084 | 
| 8 | 16.431724853515625 | 12.16104507446289 | 
| 16 | 26.9556640625 | 19.11765956878662 | 
| 32 | 49.54421875 | 33.73731803894043 | 
| 64 | 93.535673828125 | 61.45344161987305 | 
| 128 | 178.09998046875 | 113.69585227966309 | 
| 256 | 347.1721484375 | 2563.2373657226562 | 

03_vit
vit_base_patch16_224
| batch size | pt latency(ms) | ait latency(ms) | 
| :------: | :------: |:------: |
| 1 | 4.531996154785157 | 8.322586297988892 | 
| 2 | 6.666417846679687 | 8.580682277679443 | 
| 4 | 10.00460205078125 | 6.754000902175903 | 
| 8 | 13.427578125 | 9.69419240951538 | 
| 16 | 21.916123046875 | 17.138832092285156 | 
| 32 | 40.23213134765625 | 28.402775287628174 | 
| 64 | 72.446611328125 | 53.653794288635254 | 
| 128 | 136.889541015625 | 99.72106170654297 | 
| 256 | 269.488203125 | 186.07625198364258 | 

04_stable_diffusion
single batch
| module | pt latency(ms) | ait latency(ms) | 
| :------: | :------: |:------: |
| clip | 11.66868896484375 | 13.639567375183105 | 
| unet | 106.440107421875 | 71.8858814239502 | 
| vae | 95.6298046875 | 74.00970458984375 | 
| pipeline | 5429.30386474609375 | 3681.943343162536855 | 

batched version
| batch size | pt latency(ms) | ait latency(ms) | 
| :------: | :------: |:------: |
| 1 | 5429.30386474609375 | 3681.943343162536855 | 
| 2 | 13816.322286962532 | 5283.831155044027 | 
| 4 | 23745.903372997418 | 9285.692506004125 | 
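
The tables are easier to compare as speedup ratios (PyTorch latency divided by AIT latency). The sketch below computes that ratio for a few rows copied from the tables above and lists any row where AIT was slower. The helper names are illustrative only, and the row selection is arbitrary.

```python
def speedup(pt_ms, ait_ms):
    """Ratio of PyTorch latency to AIT latency; above 1 means AIT is faster."""
    return pt_ms / ait_ms


# (label, pt latency in ms, ait latency in ms), copied from the tables above.
ROWS = [
    ("resnet-50, batch 256", 133.891044921875, 59.32628345489502),
    ("bert seq 256, batch 256", 347.1721484375, 2563.2373657226562),
    ("stable diffusion pipeline, batch 1", 5429.30386474609375, 3681.943343162536855),
    ("stable diffusion pipeline, batch 4", 23745.903372997418, 9285.692506004125),
]


def slower_rows(rows):
    """Return the labels of rows where AIT latency exceeds PyTorch latency."""
    return [label for label, pt_ms, ait_ms in rows if speedup(pt_ms, ait_ms) < 1.0]


if __name__ == "__main__":
    for label, pt_ms, ait_ms in ROWS:
        print(f"{label}: {speedup(pt_ms, ait_ms):.2f}x")
    print("AIT slower on:", slower_rows(ROWS))
```

A ratio below 1 marks a configuration where AIT was slower than PyTorch in these measurements, which is worth re-running before drawing conclusions.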
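## Application Scenarios
### Algorithm Category
Text-to-image generation
### Key Application Industries
Art and design, game development, film production
## Source Repository and Issue Feedback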
http://10.6.10.68/modelzoo/stable_diffusion_ait
## References
https://github.com/ROCm/AITemplate

