# Stable Diffusion

## Overview

Stable Diffusion was proposed in the [Stable Diffusion Announcement](https://stability.ai/blog/stable-diffusion-announcement) by Patrick Esser, Robin Rombach, and the Stability AI team.

The summary of the model is the following:

*Stable Diffusion is a text-to-image model that will empower billions of people to create stunning art within seconds. It is a breakthrough in speed and quality meaning that it can run on consumer GPUs. You can see some of the amazing output that has been created by this model without pre or post-processing on this page. The model itself builds upon the work of the team at CompVis and Runway in their widely used latent diffusion model combined with insights from the conditional diffusion models by our lead generative AI developer Katherine Crowson, Dall-E 2 by Open AI, Imagen by Google Brain and many others. We are delighted that AI media generation is a cooperative field and hope it can continue this way to bring the gift of creativity to all.*

## Tips:

- Stable Diffusion has the same architecture as [Latent Diffusion](https://huggingface.co/papers/2112.10752) but uses a frozen CLIP Text Encoder instead of training the text encoder jointly with the diffusion model (see the sketch after this list for how to inspect the pipeline's components).
- An in-detail explanation of the Stable Diffusion model can be found under [Stable Diffusion with 🧨 Diffusers](https://huggingface.co/blog/stable_diffusion).
- If you don't want to rely on the Hugging Face Hub and have to pass an authentication token, you can download the weights with `git lfs install; git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5` and instead pass the local path of the cloned folder to `from_pretrained`, as shown below.
- Stable Diffusion can work with a variety of different samplers, as shown below.
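
To see these components concretely, here is a minimal sketch (assuming the `stable-diffusion-v1-5/stable-diffusion-v1-5` checkpoint used throughout this page) that loads the pipeline and prints the classes of its main parts, including the frozen CLIP text encoder:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")

# the pipeline bundles a CLIP text encoder (kept frozen at inference)
# alongside the latent-diffusion UNet and the VAE
print(pipe.text_encoder.__class__.__name__)  # CLIPTextModel (from transformers)
print(pipe.unet.__class__.__name__)          # UNet2DConditionModel
print(pipe.vae.__class__.__name__)           # AutoencoderKL
print(pipe.scheduler.__class__.__name__)     # the checkpoint's default scheduler
```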

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_stable_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) |
| [pipeline_stable_diffusion_img2img](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py) | *Image-to-Image Text-Guided Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) |
| [pipeline_stable_diffusion_inpaint](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py) | *Text-Guided Image Inpainting* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb) |

## Examples:

### Using Stable Diffusion without being logged into the Hub

If you want to download the model weights using a single Python line, you need to be logged in via `hf auth login`.

```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
```

However, this can make it difficult to build applications on top of `diffusers`, as you will always have to pass the token around. A potential way to solve this issue is to download the weights to a local path `"./stable-diffusion-v1-5"`:

```bash
git lfs install
git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
```

and simply passing the local path to `from_pretrained`:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
```
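
If `git lfs` is not available, the same local snapshot can be fetched programmatically. This is a minimal sketch using `snapshot_download` from the `huggingface_hub` library; the returned path can be passed to `from_pretrained` exactly like the cloned folder above:

```python
from huggingface_hub import snapshot_download
from diffusers import StableDiffusionPipeline

# download the full repository once; the returned string is the local path
local_path = snapshot_download("stable-diffusion-v1-5/stable-diffusion-v1-5")
pipe = StableDiffusionPipeline.from_pretrained(local_path)
```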

### Text-to-Image with default PLMS scheduler

```python
# make sure you're logged in with `hf auth login`
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")
```
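
The next two sections load a replacement scheduler from the Hub with `from_pretrained`. An equivalent pattern, sketched below with `EulerDiscreteScheduler` as an arbitrary example, derives the new scheduler from the pipeline's existing configuration via `from_config`, which avoids a second download:

```python
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5").to("cuda")

# reuse the loaded scheduler's config instead of fetching one from the Hub
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut_rides_horse_euler.png")
```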

### Text-to-Image with DDIM scheduler

```python
# make sure you're logged in with `hf auth login`
from diffusers import StableDiffusionPipeline, DDIMScheduler

scheduler = DDIMScheduler.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="scheduler")

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    scheduler=scheduler,
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")
```

### Text-to-Image with K-LMS scheduler

```python
# make sure you're logged in with `hf auth login`
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

lms = LMSDiscreteScheduler.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="scheduler")

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    scheduler=lms,
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")
```
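
Because each scheduler follows a different sampling trajectory, the examples above produce different images for the same prompt. For side-by-side comparisons it helps to fix the random seed; here is a minimal sketch using the pipeline's `generator` argument:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5").to("cuda")

# a seeded generator makes runs reproducible for a given scheduler
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe("a photo of an astronaut riding a horse on mars", generator=generator).images[0]
image.save("astronaut_rides_horse_seeded.png")
```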

### CycleDiffusion using Stable Diffusion and DDIM scheduler

```python
import requests
import torch
from PIL import Image
from io import BytesIO

from diffusers import CycleDiffusionPipeline, DDIMScheduler


# load the pipeline
# make sure you're logged in with `hf auth login`
model_id_or_path = "CompVis/stable-diffusion-v1-4"

# load the scheduler. CycleDiffusion only supports stochastic schedulers.
scheduler = DDIMScheduler.from_pretrained(model_id_or_path, subfolder="scheduler")
pipe = CycleDiffusionPipeline.from_pretrained(model_id_or_path, scheduler=scheduler).to("cuda")

# let's download an initial image
url = "https://raw.githubusercontent.com/ChenWu98/cycle-diffusion/main/data/dalle2/An%20astronaut%20riding%20a%20horse.png"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image.save("horse.png")

# let's specify a prompt
source_prompt = "An astronaut riding a horse"
prompt = "An astronaut riding an elephant"

# call the pipeline
image = pipe(
    prompt=prompt,
    source_prompt=source_prompt,
    image=init_image,
    num_inference_steps=100,
    eta=0.1,
    strength=0.8,
    guidance_scale=2,
    source_guidance_scale=1,
).images[0]

image.save("horse_to_elephant.png")

# let's try another example
# See more samples at the original repo: https://github.com/ChenWu98/cycle-diffusion
url = "https://raw.githubusercontent.com/ChenWu98/cycle-diffusion/main/data/dalle2/A%20black%20colored%20car.png"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image.save("black.png")

source_prompt = "A black colored car"
prompt = "A blue colored car"

# call the pipeline
torch.manual_seed(0)
image = pipe(
    prompt=prompt,
    source_prompt=source_prompt,
    image=init_image,
    num_inference_steps=100,
    eta=0.1,
    strength=0.85,
    guidance_scale=3,
    source_guidance_scale=1,
).images[0]

image.save("black_to_blue.png")
```