<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

[[open-in-colab]]

# Quicktour

Diffusion models are trained to denoise random Gaussian noise step-by-step to generate a sample of interest, such as an image or audio. This has sparked a tremendous amount of interest in generative AI, and you have probably seen examples of diffusion-generated images on the internet. 🧨 Diffusers is a library aimed at making diffusion models widely accessible to everyone.

Whether you're a developer or an everyday user, this quicktour will introduce you to 🧨 Diffusers and help you get up and generating quickly! There are three main components of the library to know about:

* The [`DiffusionPipeline`] is a high-level end-to-end class designed to rapidly generate samples from pretrained diffusion models for inference.
* Popular pretrained [model](./api/models) architectures and modules that can be used as building blocks for creating diffusion systems.
* Many different [schedulers](./api/schedulers/overview) - algorithms that control how noise is added during training and how denoised images are generated during inference.

The quicktour will show you how to use the [`DiffusionPipeline`] for inference, and then walk you through how to combine a model and scheduler to replicate what's happening inside the [`DiffusionPipeline`].

<Tip>

The quicktour is a simplified version of the introductory 🧨 Diffusers [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) to help you get started quickly. If you want to learn more about 🧨 Diffusers' goal, design philosophy, and additional details about its core API, check out the notebook!

</Tip>

Before you begin, make sure you have all the necessary libraries installed:

```py
# uncomment to install the necessary libraries in Colab
#!pip install --upgrade diffusers accelerate transformers
```

- [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training.
- [🤗 Transformers](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview).

## DiffusionPipeline

The [`DiffusionPipeline`] is the easiest way to use a pretrained diffusion system for inference. It is an end-to-end system containing the model and the scheduler. You can use the [`DiffusionPipeline`] out-of-the-box for many tasks. Take a look at the table below for some supported tasks; for the complete list, check out the [🧨 Diffusers Summary](./api/pipelines/overview#diffusers-summary) table.

| **Task**                     | **Description**                                                                                              | **Pipeline**    |
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|
| Unconditional Image Generation          | generate an image from Gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) |
| Text-Guided Image Generation | generate an image given a text prompt | [conditional_image_generation](./using-diffusers/conditional_image_generation) |
| Text-Guided Image-to-Image Translation     | adapt an image guided by a text prompt | [img2img](./using-diffusers/img2img) |
| Text-Guided Image-Inpainting          | fill the masked part of an image given the image, the mask and a text prompt | [inpaint](./using-diffusers/inpaint) |
| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2img](./using-diffusers/depth2img) |

Start by creating an instance of a [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download.
You can use the [`DiffusionPipeline`] for any [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) stored on the Hugging Face Hub.
In this quicktour, you'll load the [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) checkpoint for text-to-image generation.

<Tip warning={true}>

For [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) models, please carefully read the [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) first before running the model. 🧨 Diffusers implements a [`safety_checker`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) to prevent offensive or harmful content, but the model's improved image generation capabilities can still produce potentially harmful content.

</Tip>

Load the model with the [`~DiffusionPipeline.from_pretrained`] method:

```python
>>> from diffusers import DiffusionPipeline

>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_safetensors=True)
```

The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. You'll see that the Stable Diffusion pipeline is composed of the [`UNet2DConditionModel`] and [`PNDMScheduler`] among other things:

```py
>>> pipeline
StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.21.4",
  ...,
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  ...,
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
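
Each of these components is also exposed as an attribute on the pipeline, which is handy if you want to inspect or replace them individually. A minimal sketch (the attribute names match the keys in the printed configuration above):

```py
>>> # the components listed in the configuration are attributes on the pipeline
>>> unet = pipeline.unet
>>> scheduler = pipeline.scheduler
>>> vae = pipeline.vae
```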

We strongly recommend running the pipeline on a GPU because the model consists of roughly 1.4 billion parameters.
You can move the pipeline object to a GPU, just as you would any PyTorch module:

```python
>>> pipeline.to("cuda")
```

Now you can pass a text prompt to the `pipeline` to generate an image, and then access the denoised image. By default, the image output is wrapped in a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object.

```python
>>> image = pipeline("An image of a squirrel in Picasso style").images[0]
>>> image
```

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/image_of_squirrel_painting.png"/>
</div>

Save the image by calling `save`:

```python
>>> image.save("image_of_squirrel_painting.png")
```

### Local pipeline

You can also use the pipeline locally. The only difference is you need to download the weights first:

```bash
!git lfs install
!git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
```

Then load the saved weights into the pipeline:

```python
>>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", use_safetensors=True)
```

Now, you can run the pipeline as you would in the section above.
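
Another option, assuming you've already loaded a pipeline from the Hub, is to write your own local copy with [`~DiffusionPipeline.save_pretrained`] and point [`~DiffusionPipeline.from_pretrained`] at that folder later; a minimal sketch (the directory name here is just an example):

```py
>>> # save a local copy of the loaded pipeline (example path)
>>> pipeline.save_pretrained("./my-stable-diffusion-v1-5")

>>> # reload it later without contacting the Hub
>>> pipeline = DiffusionPipeline.from_pretrained("./my-stable-diffusion-v1-5")
```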

### Swapping schedulers

Different schedulers come with different denoising speeds and quality trade-offs. The best way to find out which one works best for you is to try them out! One of the main features of 🧨 Diffusers is to allow you to easily switch between schedulers. For example, to replace the default [`PNDMScheduler`] with the [`EulerDiscreteScheduler`], load it with the [`~diffusers.ConfigMixin.from_config`] method:

```py
>>> from diffusers import EulerDiscreteScheduler

>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_safetensors=True)
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
```

Try generating an image with the new scheduler and see if you notice a difference!
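
To make the comparison meaningful, you can fix the random seed with a `torch.Generator` so that both schedulers start from the same initial noise; a minimal sketch, assuming the pipeline has been moved to a GPU with `pipeline.to("cuda")` as before:

```py
>>> import torch

>>> # same seed -> same starting noise, so differences come from the scheduler
>>> generator = torch.Generator("cuda").manual_seed(0)
>>> image = pipeline("An image of a squirrel in Picasso style", generator=generator).images[0]
>>> image
```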

In the next section, you'll take a closer look at the components - the model and scheduler - that make up the [`DiffusionPipeline`] and learn how to use these components to generate an image of a cat.

## Models

Most models take a noisy sample, and at each timestep they predict the *noise residual* (other models learn to predict the previous sample directly or the velocity or [`v-prediction`](https://github.com/huggingface/diffusers/blob/5e5ce13e2f89ac45a0066cb3f369462a3cf1d9ef/src/diffusers/schedulers/scheduling_ddim.py#L110)), the difference between a less noisy image and the input image. You can mix and match models to create other diffusion systems.

Models are instantiated with the [`~ModelMixin.from_pretrained`] method, which also caches the model weights locally so it is faster the next time you load the model. For the quicktour, you'll load the [`UNet2DModel`], a basic unconditional image generation model with a checkpoint trained on cat images:

```py
>>> from diffusers import UNet2DModel

>>> repo_id = "google/ddpm-cat-256"
>>> model = UNet2DModel.from_pretrained(repo_id, use_safetensors=True)
```

To access the model parameters, call `model.config`:

```py
>>> model.config
```

The model configuration is a 🧊 frozen 🧊 dictionary, which means those parameters can't be changed after the model is created. This is intentional and ensures that the parameters used to define the model architecture at the start remain the same, while other parameters can still be adjusted during inference.

Some of the most important parameters are:

* `sample_size`: the height and width dimension of the input sample.
* `in_channels`: the number of input channels of the input sample.
* `down_block_types` and `up_block_types`: the type of down- and upsampling blocks used to create the UNet architecture.
* `block_out_channels`: the number of output channels of the downsampling blocks; also used in reverse order for the number of input channels of the upsampling blocks.
* `layers_per_block`: the number of ResNet blocks present in each UNet block.
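
You can read these values directly off the loaded checkpoint's configuration; for example, the two shown below are consistent with the noise shape used in the next step:

```py
>>> model.config.sample_size  # height and width the checkpoint was trained on
256
>>> model.config.in_channels  # RGB input
3
```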

To use the model for inference, create a tensor of random Gaussian noise with the shape of the desired image. It should have a `batch` axis because the model can receive multiple random noise samples, a `channel` axis corresponding to the number of input channels, and a `sample_size` axis for the height and width of the image:

```py
>>> import torch

>>> torch.manual_seed(0)

>>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
>>> noisy_sample.shape
torch.Size([1, 3, 256, 256])
```

For inference, pass the noisy image and a `timestep` to the model. The `timestep` indicates how noisy the input image is, with more noise at the beginning of the process and less at the end. This helps the model determine its position in the diffusion process, whether it is closer to the start or the end. Access the `sample` attribute of the output to get the predicted noise residual:

```py
>>> with torch.no_grad():
...     noisy_residual = model(sample=noisy_sample, timestep=2).sample
```
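
The predicted `noisy_residual` has the same shape as the input sample - one predicted noise value per channel and pixel - which is exactly what the scheduler expects in the next section:

```py
>>> noisy_residual.shape
torch.Size([1, 3, 256, 256])
```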

To generate actual examples though, you'll need a scheduler to guide the denoising process. In the next section, you'll learn how to couple a model with a scheduler.

## Schedulers

Schedulers manage going from a noisy sample to a less noisy sample given the model output - in this case, it is the `noisy_residual`.

<Tip>

🧨 Diffusers is a toolbox for building diffusion systems. While the [`DiffusionPipeline`] is a convenient way to get started with a pre-built diffusion system, you can also choose your own model and scheduler components separately to build a custom diffusion system.

</Tip>

For the quicktour, you'll instantiate the [`DDPMScheduler`] with its [`~SchedulerMixin.from_pretrained`] method:

```py
>>> from diffusers import DDPMScheduler

>>> scheduler = DDPMScheduler.from_pretrained(repo_id)
>>> scheduler
DDPMScheduler {
  "_class_name": "DDPMScheduler",
  "_diffusers_version": "0.21.4",
  "beta_end": 0.02,
  "beta_schedule": "linear",
  "beta_start": 0.0001,
  "clip_sample": true,
  "clip_sample_range": 1.0,
  "dynamic_thresholding_ratio": 0.995,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "sample_max_value": 1.0,
  "steps_offset": 0,
  "thresholding": false,
  "timestep_spacing": "leading",
  "trained_betas": null,
  "variance_type": "fixed_small"
}
```

<Tip>

💡 Unlike a model, a scheduler does not have trainable weights and is parameter-free!

</Tip>

Some of the most important parameters are:

* `num_train_timesteps`: the length of the denoising process or, in other words, the number of timesteps required to process random Gaussian noise into a data sample.
* `beta_schedule`: the type of noise schedule to use for inference and training.
* `beta_start` and `beta_end`: the start and end noise values for the noise schedule.
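
These values correspond one-to-one to the configuration printed above, so you could also construct the same schedule by hand instead of loading it from the checkpoint; a minimal sketch (stored under a different name so it doesn't replace the loaded `scheduler`):

```py
>>> # the same linear schedule, built explicitly from the values above
>>> manual_scheduler = DDPMScheduler(
...     num_train_timesteps=1000,
...     beta_schedule="linear",
...     beta_start=0.0001,
...     beta_end=0.02,
... )
```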

To predict a slightly less noisy image, pass the following to the scheduler's [`~diffusers.DDPMScheduler.step`] method: model output, `timestep`, and current `sample`.

```py
>>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
>>> less_noisy_sample.shape
torch.Size([1, 3, 256, 256])
```

The `less_noisy_sample` can be passed to the next `timestep` where it'll get even less noisy! Let's bring it all together now and visualize the entire denoising process. 

First, create a function that postprocesses and displays the denoised image as a `PIL.Image`:

```py
>>> import PIL.Image
>>> import numpy as np


>>> def display_sample(sample, i):
...     image_processed = sample.cpu().permute(0, 2, 3, 1)
...     image_processed = (image_processed + 1.0) * 127.5
...     image_processed = image_processed.numpy().astype(np.uint8)

...     image_pil = PIL.Image.fromarray(image_processed[0])
...     display(f"Image at step {i}")
...     display(image_pil)
```

To speed up the denoising process, move the input and model to a GPU:

```py
>>> model.to("cuda")
>>> noisy_sample = noisy_sample.to("cuda")
```

Now create a denoising loop that predicts the residual of the less noisy sample, and computes the less noisy sample with the scheduler:

```py
>>> import tqdm

>>> sample = noisy_sample

>>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
...     # 1. predict noise residual
...     with torch.no_grad():
...         residual = model(sample, t).sample

...     # 2. compute less noisy image and set x_t -> x_t-1
...     sample = scheduler.step(residual, t, sample).prev_sample

...     # 3. optionally look at image
...     if (i + 1) % 50 == 0:
...         display_sample(sample, i + 1)
```

Sit back and watch as a cat is generated from nothing but noise! 😻

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/diffusion-quicktour.png"/>
</div>
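
If you're running this outside of a notebook, `display` won't be available; one option, reusing the same postprocessing as `display_sample`, is to save the final sample to disk instead (the filename is just an example):

```py
>>> # convert the final sample to a PIL image and save it (example filename)
>>> final = (sample.cpu().permute(0, 2, 3, 1) + 1.0) * 127.5
>>> final = final.numpy().astype(np.uint8)
>>> PIL.Image.fromarray(final[0]).save("generated_cat.png")
```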

## Next steps

Hopefully, you generated some cool images with 🧨 Diffusers in this quicktour! For your next steps, you can:

* Train or finetune a model to generate your own images in the [training](./tutorials/basic_training) tutorial.
* See example official and community [training or finetuning scripts](https://github.com/huggingface/diffusers/tree/main/examples#-diffusers-examples) for a variety of use cases.
* Learn more about loading, accessing, changing, and comparing schedulers in the [Using different Schedulers](./using-diffusers/schedulers) guide.
* Explore prompt engineering, speed and memory optimizations, and tips and tricks for generating higher-quality images with the [Stable Diffusion](./stable_diffusion) guide.
* Dive deeper into speeding up 🧨 Diffusers with guides on [optimized PyTorch on a GPU](./optimization/fp16), and inference guides for running [Stable Diffusion on Apple Silicon (M1/M2)](./optimization/mps) and [ONNX Runtime](./optimization/onnx).