# Stable Diffusion

## Paper

`High-Resolution Image Synthesis with Latent Diffusion Models`

- https://arxiv.org/abs/2112.10752

## Model Structure

The LDM is conditioned either via concatenation or via the more general cross-attention mechanism.

![img](./doc/arch.png)

## Algorithm

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. Training diffusion models on this representation allows, for the first time, reaching a near-optimal point between complexity reduction and spatial downsampling, greatly improving visual fidelity. By introducing cross-attention layers into the model architecture, diffusion models become powerful and flexible generators for general conditioning inputs such as text or bounding boxes, and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve highly competitive performance on a variety of tasks, including unconditional image generation, inpainting, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.

![img](./doc/algo.png)

## Model Introduction

Stable Diffusion is a text-to-image latent diffusion model. It was trained on 512x512 images from a subset of the LAION-5B database. Similar to Google's Imagen, the model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M-parameter UNet and 123M-parameter text encoder, the model is relatively lightweight and can run on a GPU with as little as 10GB of VRAM.

## Environment Setup

### Docker (Method 1):

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:stablediffusion

# Replace <image_id> below with the ID of the image pulled above
docker run --rm --shm-size 10g --network=host --name=stablediffussion --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v path_to_stablediffussion:/home/sd -it <image_id> bash
```

### Dockerfile (Method 2):

```
cd stablediffussion/docker
docker build --no-cache -t stablediffussion:test .
docker run --rm --shm-size 10g --network=host --name=stablediffussion --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v path_to_stablediffussion:/home/sd -it stablediffussion:test bash
```

## Download the Stable Diffusion Model

```
cd stablediffussion

## Download checkpoint models
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-1-original
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-2-original
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-3-original
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-4-original

## Download diffusers-version models
git clone https://huggingface.co/CompVis/stable-diffusion-v1-1
git clone https://huggingface.co/CompVis/stable-diffusion-v1-2
git clone https://huggingface.co/CompVis/stable-diffusion-v1-3
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
## Pick any one of the nine models above

## Download the safety checker
git clone https://huggingface.co/CompVis/stable-diffusion-safety-checker
```

## Inference

### 1. Run the checkpoint-version example:

After downloading one of the `stable-diffusion-*-original` models, link the checkpoint into the expected directory:

```
mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
```

Run:

```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
```

```
usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save]
                  [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code]
                  [--ddim_eta DDIM_ETA] [--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F]
                  [--n_samples N_SAMPLES] [--n_rows N_ROWS] [--scale SCALE]
                  [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT] [--seed SEED]
                  [--precision {full,autocast}]

optional arguments:
  -h, --help            show this help message and exit
  --prompt [PROMPT]     the prompt to render
  --outdir [OUTDIR]     dir to write results to
  --skip_grid           do not save a grid, only individual samples.
                        Helpful when evaluating lots of samples
  --skip_save           do not save individual samples. For speed measurements.
  --ddim_steps DDIM_STEPS
                        number of ddim sampling steps
  --plms                use plms sampling
  --laion400m           uses the LAION400M model
  --fixed_code          if enabled, uses the same starting code across samples
  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling)
  --n_iter N_ITER       sample this often
  --H H                 image height, in pixel space
  --W W                 image width, in pixel space
  --C C                 latent channels
  --f F                 downsampling factor
  --n_samples N_SAMPLES
                        how many samples to produce for each given prompt. A.k.a. batch size
  --n_rows N_ROWS       rows in the grid (default: n_samples)
  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
  --from-file FROM_FILE
                        if specified, load prompts from this file
  --config CONFIG       path to config which constructs model
  --ckpt CKPT           path to checkpoint of model
  --seed SEED           the seed (for reproducible sampling)
  --precision {full,autocast}
                        evaluate at this precision
```

### 2. Run the Diffusers example:

Download a diffusers-version model, then run `python Diffusers.py`:

```
# make sure you're logged in with `huggingface-cli login`
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-4",
    use_auth_token=True
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]

image.save("astronaut_rides_horse.png")
```

## Result

![img](./doc/result.png)

## Application Scenarios

### Algorithm Category

`Text-to-Image Generation`

### Popular Application Industries

`Painting, animation, 3D character models`

## Source Repository and Issue Feedback

http://developer.hpccube.com/codes/modelzoo/stablediffussion.git

## References

https://github.com/CompVis/stable-diffusion
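The `--scale` option in the txt2img help above applies classifier-free guidance using exactly the formula quoted there: `eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))`. A minimal NumPy sketch of that arithmetic (the 4x64x64 latent shape is only illustrative, matching the default `--C 4` channels and 512/`--f 8` spatial size):

```python
import numpy as np

def guided_eps(eps_uncond, eps_cond, scale):
    """Classifier-free guidance as quoted in the txt2img help:
    eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Stand-in noise predictions with a latent-space shape (C, H/f, W/f).
rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal((4, 64, 64))  # prediction for the empty prompt
eps_cond = rng.standard_normal((4, 64, 64))    # prediction for the text prompt

guided = guided_eps(eps_uncond, eps_cond, scale=7.5)

# Sanity checks: scale=0 recovers the unconditional prediction,
# scale=1 recovers the purely conditional prediction.
assert np.allclose(guided_eps(eps_uncond, eps_cond, 0.0), eps_uncond)
assert np.allclose(guided_eps(eps_uncond, eps_cond, 1.0), eps_cond)
```

Scales above 1 (the script defaults to 7.5) push the prediction further toward the prompt-conditioned direction, trading diversity for prompt adherence.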
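The cross-attention conditioning mentioned in the model-structure section above can be sketched as scaled dot-product attention in which the queries come from the image latents and the keys/values come from the text-encoder hidden states. This NumPy sketch is purely illustrative: the random projection weights, the 320-dimensional latent features, and the single head are assumptions, not the actual UNet implementation (only the 77x768 CLIP ViT-L/14 output shape is standard):

```python
import numpy as np

def cross_attention(latent_tokens, text_tokens, d_k=64, seed=42):
    """Single-head scaled dot-product cross-attention (illustrative weights):
    queries from image latents, keys/values from text embeddings."""
    rng = np.random.default_rng(seed)
    d_lat = latent_tokens.shape[-1]
    d_txt = text_tokens.shape[-1]
    W_q = rng.standard_normal((d_lat, d_k)) / np.sqrt(d_lat)
    W_k = rng.standard_normal((d_txt, d_k)) / np.sqrt(d_txt)
    W_v = rng.standard_normal((d_txt, d_k)) / np.sqrt(d_txt)
    Q = latent_tokens @ W_q            # (n_latent, d_k)
    K = text_tokens @ W_k              # (n_text, d_k)
    V = text_tokens @ W_v              # (n_text, d_k)
    scores = Q @ K.T / np.sqrt(d_k)    # (n_latent, n_text)
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                 # each latent position mixes text features

latents = np.random.default_rng(0).standard_normal((64, 320))  # flattened latent positions (assumed dims)
text = np.random.default_rng(1).standard_normal((77, 768))     # CLIP ViT-L/14 hidden states
out = cross_attention(latents, text)
```

Because the attention output has one row per latent position, this is how a text prompt of arbitrary length steers every spatial location of the denoising UNet.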