# Stable Diffusion

## Model Introduction

Stable Diffusion is a latent text-to-image diffusion model, trained on 512x512 images from a subset of the LAION-5B database. Similar to Google's Imagen, it uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and can run on a GPU with at least 10GB VRAM. See the sections below and the model card for details.

## Environment Setup

1. Using the Docker image: an inference Docker image can be pulled from SourceFind (光源) as follows (a hedged sketch of a typical `docker run` invocation is given in the appendix at the end of this README):

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:stable-diffusion
```

2. Using a conda environment: a suitable [conda](https://conda.io/) environment named `ldm` can be created and activated with:

```
conda env create -f environment.yaml
conda activate ldm
```

## Download the Stable Diffusion Model

### checkpoint version

```
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-1-original
```

### diffusers version

```
git clone https://huggingface.co/CompVis/stable-diffusion-v1-1
```

## Running the Examples

### Running the checkpoint version example

After downloading the `stable-diffusion-*-original` model, link the checkpoint into the directory the scripts expect:

```
mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
```

Run:

```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
```

```commandline
usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save] [--ddim_steps DDIM_STEPS]
                  [--plms] [--laion400m] [--fixed_code] [--ddim_eta DDIM_ETA] [--n_iter N_ITER] [--H H] [--W W]
                  [--C C] [--f F] [--n_samples N_SAMPLES] [--n_rows N_ROWS] [--scale SCALE] [--from-file FROM_FILE]
                  [--config CONFIG] [--ckpt CKPT] [--seed SEED] [--precision {full,autocast}]

optional arguments:
  -h, --help            show this help message and exit
  --prompt [PROMPT]     the prompt to render
  --outdir [OUTDIR]     dir to write results to
  --skip_grid           do not save a grid, only individual samples. Helpful when evaluating lots of samples
  --skip_save           do not save individual samples. For speed measurements.
  --ddim_steps DDIM_STEPS
                        number of ddim sampling steps
  --plms                use plms sampling
  --laion400m           uses the LAION400M model
  --fixed_code          if enabled, uses the same starting code across samples
  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling)
  --n_iter N_ITER       sample this often
  --H H                 image height, in pixel space
  --W W                 image width, in pixel space
  --C C                 latent channels
  --f F                 downsampling factor
  --n_samples N_SAMPLES
                        how many samples to produce for each given prompt. A.k.a. batch size
  --n_rows N_ROWS       rows in the grid (default: n_samples)
  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
  --from-file FROM_FILE
                        if specified, load prompts from this file
  --config CONFIG       path to config which constructs model
  --ckpt CKPT           path to checkpoint of model
  --seed SEED           the seed (for reproducible sampling)
  --precision {full,autocast}
                        evaluate at this precision
```

The guidance formula quoted in the `--scale` help text is illustrated with a small sketch in the appendix.

### Running the Diffusers example

`Diffusers.py`:

```py
# make sure you're logged in with `huggingface-cli login`
from torch import autocast
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]

image.save("astronaut_rides_horse.png")
```

Note that this uses the early diffusers API; a seeded, reproducible variant against a newer diffusers release (the analogue of the `--seed` flag above) is sketched in the appendix.

## Source Repository and Issue Feedback

http://developer.hpccube.com/codes/modelzoo/stablediffussion.git

## References

https://github.com/CompVis/stable-diffusion
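
## Appendix: Additional Sketches

### Running the Docker container

The environment section above only shows the `docker pull` command. The invocation below is a minimal sketch of how a DCU image of this kind is typically started; the ROCm-style device nodes (`/dev/kfd`, `/dev/dri`), the `--shm-size` value, and the container name `sd-infer` are assumptions, so consult the image documentation on SourceFind for the authoritative command.

```
docker run -it --name sd-infer \
    --device=/dev/kfd --device=/dev/dri \
    --security-opt seccomp=unconfined \
    --shm-size 16G \
    image.sourcefind.cn:5000/dcu/admin/base/custom:stable-diffusion /bin/bash
```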
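
### Unconditional guidance scale

The `--scale` help text above quotes the classifier-free guidance formula `eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))`. The sketch below only demonstrates that arithmetic; `noise_pred` is a hypothetical stand-in for the UNet, not the actual model code from `txt2img.py`.

```py
import torch

def noise_pred(x, cond):
    # Hypothetical stand-in for the 860M UNet's noise estimate,
    # depending on the latent x and the text conditioning cond.
    return 0.1 * x + cond.mean() * torch.ones_like(x)

scale = 7.5                           # txt2img.py's default guidance scale
x = torch.randn(1, 4, 64, 64)         # latent: C=4 channels, 512/f = 64 with f=8
empty_cond = torch.zeros(1, 77, 768)  # embedding of the empty prompt ""
text_cond = torch.randn(1, 77, 768)   # embedding of the actual prompt

eps_uncond = noise_pred(x, empty_cond)
eps_cond = noise_pred(x, text_cond)

# Classifier-free guidance: move the noise prediction away from the
# unconditional estimate and towards the text-conditioned one.
eps = eps_uncond + scale * (eps_cond - eps_uncond)
```

At `scale = 1.0` this reduces to the purely conditional prediction; larger values trade sample diversity for closer adherence to the prompt.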
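
### Reproducible sampling with Diffusers

As a counterpart to the `--seed` flag of `txt2img.py`, the sketch below fixes the random generator for the Diffusers pipeline. It assumes a newer diffusers release, where the pipeline call returns `.images` instead of `["sample"]` and `autocast` is no longer needed; treat it as a sketch rather than a drop-in replacement for the example above.

```py
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # rough analogue of --precision autocast
).to("cuda")

# Fixing the generator seed makes a given prompt reproduce the same image.
generator = torch.Generator(device="cuda").manual_seed(42)

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, guidance_scale=7.5, generator=generator).images[0]
image.save("astronaut_rides_horse_seed42.png")
```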