# stable_diffusion

## Paper

[High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/pdf/2112.10752)

## Model Structure

The core of Stable Diffusion is the latent diffusion model, whose structure is shown below:

![image](https://developer.hpccube.com/codes/modelzoo/stable_diffusion_ait/-/raw/master/resources/sd_model.png)

## Algorithm Principle

Based on the model structure, the algorithm works, in brief, as follows:

![image](https://developer.hpccube.com/codes/modelzoo/stable_diffusion_ait/-/raw/master/resources/sd_principle.png)

## Dataset

None

## Environment Setup

The inference Docker image can be pulled from [SourceFind (光源)](https://sourcefind.cn/#/service-list). The recommended image for stable_diffusion_ait is:

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:ait-0.0.1_dtk24.04_py310
docker run -d -t -v /opt/hyhal:/opt/hyhal:ro --privileged --device=/dev/kfd --device=/dev/dri/ --network=host --group-add video --name sd-test image.sourcefind.cn:5000/dcu/admin/base/custom:ait-0.0.1_dtk24.04_py310
docker exec -it sd-test bash
source /opt/dtk/env.sh
```

## Inference

**install ait**

```
cd stable_diffusion_ait
pip3 install dist/aitemplate-0.0.1-py3-none-any.whl
```

The `cd examples/...` commands in the sections below are relative to the repository root.

#### 01_resnet-50

```
cd examples/01_resnet-50
```

Download the resnet50 weights (https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet50_a1_0-14fe96d1.pth)

**benchmark**

```
python3 benchmark_ait.py
python3 benchmark_pt.py
```

**infer**

```
python3 infer_with_torch.py
```

#### 02_bert

```
cd examples/02_bert
```

Download the bert-base-uncased weights (https://huggingface.co/google-bert/bert-base-uncased)

**benchmark**

```
python3 benchmark_ait.py
python3 benchmark_pt.py
```

**infer**

```
python3 demo.py
```

#### 03_vit

```
cd examples/03_vit
```

Download the vit_base_patch16_224 weights (https://huggingface.co/timm/vit_base_patch16_224.augreg2_in21k_ft_in1k)

**benchmark**

```
python3 benchmark_ait.py
python3 benchmark_pt.py
```

**verification**

```
python3 verification.py
```

#### 04_stable_diffusion

Download the stable-diffusion-2-1-base weights (https://huggingface.co/stabilityai/stable-diffusion-2-1-base)

Download the clip-vit-large-patch14 weights (https://huggingface.co/openai/clip-vit-large-patch14)

In the commands below, `stable-diffusion-2-1-base_path` and `clip-vit-large-patch14_path` stand for the local directories that hold the downloaded weights.

**compile**

```
cd examples/04_stable_diffusion
python3 scripts/compile.py --local-dir stable-diffusion-2-1-base_path
```

**benchmark**

```
python3 src/benchmark.py --local-dir stable-diffusion-2-1-base_path --clip-dir clip-vit-large-patch14_path --benchmark-pt True
```

**infer**

```
python3 scripts/demo.py --local-dir stable-diffusion-2-1-base_path
python3 scripts/demo_pt.py --local-dir stable-diffusion-2-1-base_path
```

## Result

![image](https://developer.hpccube.com/codes/modelzoo/stable_diffusion_ait/-/raw/master/resources/example_ait.png)

### Accuracy

None

### Performance Data

In the tables below, "PyTorch" refers to the `benchmark_pt` runs and "AIT" to the AITemplate runs.

01_resnet-50

| batch size | PyTorch latency (ms) | AIT latency (ms) |
| :------: | :------: | :------: |
| 1 | 3.50665771484375 | 2.7900346517562866 |
| 2 | 4.198978271484375 | 3.022238612174988 |
| 4 | 5.242999877929687 | 3.6645140647888184 |
| 8 | 7.416472778320313 | 4.517657279968262 |
| 16 | 11.60461181640625 | 6.50670599937439 |
| 32 | 19.8466064453125 | 10.511437177658081 |
| 64 | 36.08590576171875 | 18.35416030883789 |
| 128 | 67.2965625 | 32.82508373260498 |
| 256 | 133.891044921875 | 59.32628345489502 |

02_bert

bert sequence length 64

| batch size | PyTorch latency (ms) | AIT latency (ms) |
| :------: | :------: | :------: |
| 1 | 5.492328491210937 | 3.763094484806061 |
| 2 | 5.851014404296875 | 3.934549331665039 |
| 4 | 5.8462060546875 | 7.370500087738037 |
| 8 | 7.20282958984375 | 7.630655765533447 |
| 16 | 10.13709716796875 | 6.997292518615723 |
| 32 | 14.629547119140625 | 15.192972660064697 |
| 64 | 24.83916259765625 | 18.988140106201172 |
| 128 | 45.0836083984375 | 33.51811981201172 |
| 256 | 85.2006640625 | 91.8479995727539 |

bert sequence length 128

| batch size | PyTorch latency (ms) | AIT latency (ms) |
| :------: | :------: | :------: |
| 1 | 5.583170776367187 | 3.8969525694847107 |
| 2 | 5.851030883789062 | 7.2915791273117065 |
| 4 | 7.507911376953125 | 7.635279178619385 |
| 8 | 10.716405029296874 | 6.723778605461121 |
| 16 | 16.03172607421875 | 14.665886878967285 |
| 32 | 27.00265869140625 | 18.18143320083618 |
| 64 | 49.812158203125 | 32.23751640319824 |
| 128 | 94.589228515625 | 87.85263633728027 |
| 256 | 179.57365234375 | 107.5546760559082 |

bert sequence length 256

| batch size | PyTorch latency (ms) | AIT latency (ms) |
| :------: | :------: | :------: |
| 1 | 5.536416625976562 | 4.418213129043579 |
| 2 | 7.61077392578125 | 5.24817168712616 |
| 4 | 11.207763671875 | 14.24576473236084 |
| 8 | 16.431724853515625 | 12.16104507446289 |
| 16 | 26.9556640625 | 19.11765956878662 |
| 32 | 49.54421875 | 33.73731803894043 |
| 64 | 93.535673828125 | 61.45344161987305 |
| 128 | 178.09998046875 | 113.69585227966309 |
| 256 | 347.1721484375 | 2563.2373657226562 |

03_vit

vit_base_patch16_224

| batch size | PyTorch latency (ms) | AIT latency (ms) |
| :------: | :------: | :------: |
| 1 | 4.531996154785157 | 8.322586297988892 |
| 2 | 6.666417846679687 | 8.580682277679443 |
| 4 | 10.00460205078125 | 6.754000902175903 |
| 8 | 13.427578125 | 9.69419240951538 |
| 16 | 21.916123046875 | 17.138832092285156 |
| 32 | 40.23213134765625 | 28.402775287628174 |
| 64 | 72.446611328125 | 53.653794288635254 |
| 128 | 136.889541015625 | 99.72106170654297 |
| 256 | 269.488203125 | 186.07625198364258 |

04_stable_diffusion

single batch

| module | PyTorch latency (ms) | AIT latency (ms) |
| :------: | :------: | :------: |
| clip | 11.66868896484375 | 13.639567375183105 |
| unet | 106.440107421875 | 71.8858814239502 |
| vae | 95.6298046875 | 74.00970458984375 |
| pipeline | 5429.30386474609375 | 3681.943343162536855 |

batched version

| batch size | PyTorch latency (ms) | AIT latency (ms) |
| :------: | :------: | :------: |
| 1 | 5429.30386474609375 | 3681.943343162536855 |
| 2 | 13816.322286962532 | 5283.831155044027 |
| 4 | 23745.903372997418 | 9285.692506004125 |

## Application Scenarios

### Algorithm Category

Text-to-image generation

### Key Application Industries

Art and design, game development, film production

## Source Repository and Issue Feedback

http://10.6.10.68/modelzoo/stable_diffusion_ait

## References

https://github.com/ROCm/AITemplate
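
## Appendix: reading the latency tables as speedups

The latency tables above give raw PyTorch and AIT timings but do not state the resulting speedup. As a rough sketch, the ratio of PyTorch latency to AIT latency can be computed as shown here. The helper name `speedup` and the hard-coded rows are illustrative and not part of the repository; the values are copied from the 01_resnet-50 table and rounded to three decimals.

```python
# Illustrative helper, not part of the repository.
# Speedup of AIT over PyTorch = (PyTorch latency) / (AIT latency).
# A value above 1 means AIT was faster; below 1 means AIT was slower.


def speedup(pt_ms: float, ait_ms: float) -> float:
    """Return the PyTorch-to-AIT latency ratio for one table row."""
    if ait_ms <= 0:
        raise ValueError("AIT latency must be positive")
    return pt_ms / ait_ms


# (batch size, PyTorch ms, AIT ms), rounded from the 01_resnet-50 table.
RESNET50_ROWS = [
    (1, 3.507, 2.790),
    (32, 19.847, 10.511),
    (256, 133.891, 59.326),
]

if __name__ == "__main__":
    for batch, pt_ms, ait_ms in RESNET50_ROWS:
        print(f"batch {batch:>3}: {speedup(pt_ms, ait_ms):.2f}x")
```

A ratio below 1 marks a row where AIT was slower than PyTorch in that measurement, for example bert at sequence length 256 and batch size 256 (about 347.17 ms for PyTorch against about 2563.24 ms for AIT).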