# Stable Diffusion

## Paper

`High-Resolution Image Synthesis with Latent Diffusion Models`

- https://arxiv.org/abs/2112.10752

## Model Structure

The LDM is conditioned either via concatenation or via the more general cross-attention mechanism.

![img](./doc/arch.png)

## Algorithm

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. Training diffusion models on this representation allows, for the first time, reaching a near-optimal point between complexity reduction and spatial downsampling, greatly improving visual fidelity. By introducing cross-attention layers into the model architecture, diffusion models become powerful and flexible generators for general conditioning inputs such as text or bounding boxes, and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve highly competitive performance on a variety of tasks, including unconditional image generation, inpainting, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.

![img](./doc/algo.png)

## Model Introduction

Stable Diffusion is a text-to-image latent diffusion model. It was trained on 512x512 images from a subset of the LAION-5B database. Similar to Google's Imagen, the model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M-parameter UNet and 123M-parameter text encoder, the model is relatively lightweight and can run on a GPU with as little as 10GB of VRAM.

## Environment Setup

### Docker (Method 1):

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:stablediffusion

# Replace <image_id> below with the ID of the image pulled above
docker run --rm --shm-size 10g --network=host --name=stablediffussion --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v path_to_stablediffussion:/home/sd -it <image_id> bash
```

### Dockerfile (Method 2):

```
cd stablediffussion/docker
docker build --no-cache -t stablediffussion:test .
docker run --rm --shm-size 10g --network=host --name=stablediffussion --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v path_to_stablediffussion:/home/sd -it stablediffussion:test bash
```

## Download the Stable Diffusion Model

```
cd stablediffussion

## Download checkpoint models
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-1-original
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-2-original
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-3-original
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-4-original

## Download diffusers-version models
git clone https://huggingface.co/CompVis/stable-diffusion-v1-1
git clone https://huggingface.co/CompVis/stable-diffusion-v1-2
git clone https://huggingface.co/CompVis/stable-diffusion-v1-3
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
## Pick any one of the nine models above

## Download the safety checker
git clone https://huggingface.co/CompVis/stable-diffusion-safety-checker
```

## Inference

### 1. Run the checkpoint-version example:

After downloading one of the `stable-diffusion-*-original` models, link the checkpoint into the expected directory:

```
mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
```

Run:

```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
```

```
usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save]
                  [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code]
                  [--ddim_eta DDIM_ETA] [--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F]
                  [--n_samples N_SAMPLES] [--n_rows N_ROWS] [--scale SCALE]
                  [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT] [--seed SEED]
                  [--precision {full,autocast}]

optional arguments:
  -h, --help            show this help message and exit
  --prompt [PROMPT]     the prompt to render
  --outdir [OUTDIR]     dir to write results to
  --skip_grid           do not save a grid, only individual samples.
                        Helpful when evaluating lots of samples
  --skip_save           do not save individual samples. For speed measurements.
  --ddim_steps DDIM_STEPS
                        number of ddim sampling steps
  --plms                use plms sampling
  --laion400m           uses the LAION400M model
  --fixed_code          if enabled, uses the same starting code across samples
  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling)
  --n_iter N_ITER       sample this often
  --H H                 image height, in pixel space
  --W W                 image width, in pixel space
  --C C                 latent channels
  --f F                 downsampling factor
  --n_samples N_SAMPLES
                        how many samples to produce for each given prompt. A.k.a. batch size
  --n_rows N_ROWS       rows in the grid (default: n_samples)
  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
  --from-file FROM_FILE
                        if specified, load prompts from this file
  --config CONFIG       path to config which constructs model
  --ckpt CKPT           path to checkpoint of model
  --seed SEED           the seed (for reproducible sampling)
  --precision {full,autocast}
                        evaluate at this precision
```

### 2. Run the Diffusers example:

Download a diffusers-version model, then run `python Diffusers.py`:

```
# make sure you're logged in with `huggingface-cli login`
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-4",
    use_auth_token=True
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]

image.save("astronaut_rides_horse.png")
```

## Result

![img](./doc/result.png)

## Application Scenarios

### Algorithm Category

`Text-to-Image Generation`

### Popular Application Industries

`Painting, animation, 3D character models`

## Source Repository and Issue Feedback

http://developer.hpccube.com/codes/modelzoo/stablediffussion.git

## References

https://github.com/CompVis/stable-diffusion
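The `--scale` option in the txt2img help above applies classifier-free guidance using exactly the formula quoted there: `eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))`. A minimal NumPy sketch of that arithmetic (the 4x64x64 latent shape is only illustrative, matching the default `--C 4` channels and 512/`--f 8` spatial size):

```python
import numpy as np

def guided_eps(eps_uncond, eps_cond, scale):
    """Classifier-free guidance as quoted in the txt2img help:
    eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Stand-in noise predictions with a latent-space shape (C, H/f, W/f).
rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal((4, 64, 64))  # prediction for the empty prompt
eps_cond = rng.standard_normal((4, 64, 64))    # prediction for the text prompt

guided = guided_eps(eps_uncond, eps_cond, scale=7.5)

# Sanity checks: scale=0 recovers the unconditional prediction,
# scale=1 recovers the purely conditional prediction.
assert np.allclose(guided_eps(eps_uncond, eps_cond, 0.0), eps_uncond)
assert np.allclose(guided_eps(eps_uncond, eps_cond, 1.0), eps_cond)
```

Scales above 1 (the script defaults to 7.5) push the prediction further toward the prompt-conditioned direction, trading diversity for prompt adherence.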
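The cross-attention conditioning mentioned in the model-structure section above can be sketched as scaled dot-product attention in which the queries come from the image latents and the keys/values come from the text-encoder hidden states. This NumPy sketch is purely illustrative: the random projection weights, the 320-dimensional latent features, and the single head are assumptions, not the actual UNet implementation (only the 77x768 CLIP ViT-L/14 output shape is standard):

```python
import numpy as np

def cross_attention(latent_tokens, text_tokens, d_k=64, seed=42):
    """Single-head scaled dot-product cross-attention (illustrative weights):
    queries from image latents, keys/values from text embeddings."""
    rng = np.random.default_rng(seed)
    d_lat = latent_tokens.shape[-1]
    d_txt = text_tokens.shape[-1]
    W_q = rng.standard_normal((d_lat, d_k)) / np.sqrt(d_lat)
    W_k = rng.standard_normal((d_txt, d_k)) / np.sqrt(d_txt)
    W_v = rng.standard_normal((d_txt, d_k)) / np.sqrt(d_txt)
    Q = latent_tokens @ W_q            # (n_latent, d_k)
    K = text_tokens @ W_k              # (n_text, d_k)
    V = text_tokens @ W_v              # (n_text, d_k)
    scores = Q @ K.T / np.sqrt(d_k)    # (n_latent, n_text)
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                 # each latent position mixes text features

latents = np.random.default_rng(0).standard_normal((64, 320))  # flattened latent positions (assumed dims)
text = np.random.default_rng(1).standard_normal((77, 768))     # CLIP ViT-L/14 hidden states
out = cross_attention(latents, text)
```

Because the attention output has one row per latent position, this is how a text prompt of arbitrary length steers every spatial location of the denoising UNet.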