<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

[[open-in-colab]]

# DiffusionPipeline

Diffusion models consist of multiple components like UNets or diffusion transformers (DiTs), text encoders, variational autoencoders (VAEs), and schedulers. The [`DiffusionPipeline`] wraps all of these components into a single easy-to-use API without giving up the flexibility to modify its components.

This guide will show you how to load a [`DiffusionPipeline`].

## Loading a pipeline

[`DiffusionPipeline`] is a base pipeline class that automatically selects and returns an instance of a model's pipeline subclass, like [`QwenImagePipeline`], by scanning the `model_index.json` file for the class name.

Pass a model id to [`~DiffusionPipeline.from_pretrained`] to load a pipeline.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
  "Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="cuda"
)
```
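
Since [`DiffusionPipeline`] resolves the subclass from `model_index.json`, you can check which class was actually returned. A minimal sketch, assuming the pipeline above has been loaded:

```py
# the class name in the checkpoint's model_index.json determines the subclass
print(type(pipeline).__name__)
# QwenImagePipeline
```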

Every model has a specific pipeline subclass that inherits from [`DiffusionPipeline`]. A subclass usually has a narrow, task-specific focus. See the table below for an example.

| pipeline subclass | task |
|---|---|
| [`QwenImagePipeline`] | text-to-image |
| [`QwenImageImg2ImgPipeline`] | image-to-image |
| [`QwenImageInpaintPipeline`] | inpaint |

You can also use the subclass directly by passing a model id to [`~QwenImagePipeline.from_pretrained`].

```py
import torch
from diffusers import QwenImagePipeline

pipeline = QwenImagePipeline.from_pretrained(
  "Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="cuda"
)
```

> [!TIP]
> Refer to the [Single file format](./other-formats#single-file-format) docs to learn how to load single file models.

### Local pipelines

Pipelines can also be loaded from local files. Use [`~huggingface_hub.snapshot_download`] to download a model repository.

```py
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="Qwen/Qwen-Image")
```

The model is downloaded to your [cache](../installation#cache), and [`~huggingface_hub.snapshot_download`] returns the local folder path. Pass that path to [`~QwenImagePipeline.from_pretrained`] to load the pipeline.

```py
import torch
from diffusers import QwenImagePipeline

pipeline = QwenImagePipeline.from_pretrained(
  local_path, torch_dtype=torch.bfloat16, device_map="cuda"
)
```

The [`~QwenImagePipeline.from_pretrained`] method won't download files from the Hub when it detects a local path, but this also means it won't download and cache any updates made to the model.

## Pipeline data types

Use the `torch_dtype` argument in [`~DiffusionPipeline.from_pretrained`] to load a model with a specific data type. This allows you to load different models in different precisions. For example, loading a large transformer model in half-precision reduces the memory required.

Pass the data type for each model as a dictionary to `torch_dtype`. Use the `default` key to set the default data type. If a model isn't in the dictionary and `default` isn't provided, it is loaded in full precision (`torch.float32`).

```py
import torch
from diffusers import QwenImagePipeline

pipeline = QwenImagePipeline.from_pretrained(
  "Qwen/Qwen-Image",
  torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipeline.transformer.dtype, pipeline.vae.dtype)
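# expected: torch.bfloat16 torch.float16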
```

You don't need to use a dictionary if you're loading all the models in the same data type.

```py
import torch
from diffusers import QwenImagePipeline

pipeline = QwenImagePipeline.from_pretrained(
  "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
)
print(pipeline.transformer.dtype, pipeline.vae.dtype)
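# expected: torch.bfloat16 torch.bfloat16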
```

## Device placement

The `device_map` argument controls where the models in a pipeline, or the pipeline as a whole, are placed on accelerators like GPUs. It is especially helpful when there are multiple GPUs.

A pipeline supports two options for `device_map`: `"cuda"` and `"balanced"`. Refer to the table below to compare the two placement strategies.

| parameter | description |
|---|---|
| `"cuda"` | places pipeline on a supported accelerator device like CUDA |
| `"balanced"` | evenly distributes pipeline on all GPUs |

Use the `max_memory` argument in [`~DiffusionPipeline.from_pretrained`] to allocate a maximum amount of memory to use on each device. By default, Diffusers uses the maximum amount available.

```py
import torch
from diffusers import DiffusionPipeline

max_memory = {0: "16GB", 1: "16GB"}
pipeline = DiffusionPipeline.from_pretrained(
  "Qwen/Qwen-Image",
  torch_dtype=torch.bfloat16,
  device_map="balanced",
  max_memory=max_memory,
)
```

The `hf_device_map` attribute allows you to access and view the `device_map`.

```py
print(pipeline.hf_device_map)
# example output for a pipeline distributed across two GPUs with device_map="balanced":
# {'unet': 1, 'vae': 1, 'safety_checker': 0, 'text_encoder': 0}
```

Reset a pipeline's `device_map` with the [`~DiffusionPipeline.reset_device_map`] method. This is necessary if you want to use methods such as `.to()`, [`~DiffusionPipeline.enable_sequential_cpu_offload`], and [`~DiffusionPipeline.enable_model_cpu_offload`].

```py
pipeline.reset_device_map()
```

## Parallel loading

Large models are often [sharded](../training/distributed_inference#model-sharding) into smaller files so that they are easier to load. Diffusers supports loading shards in parallel to speed up the loading process.

Set `HF_ENABLE_PARALLEL_LOADING` to `"YES"` to enable parallel loading of shards.

Set `device_map="cuda"` as well to pre-allocate a large chunk of memory up front based on the model size. Warming up the memory allocator once substantially reduces load time because it avoids many smaller allocator calls later.

```py
import os
import torch
from diffusers import DiffusionPipeline

os.environ["HF_ENABLE_PARALLEL_LOADING"] = "YES"

pipeline = DiffusionPipeline.from_pretrained(
  "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16, device_map="cuda"
)
```

## Replacing models in a pipeline

[`DiffusionPipeline`] is flexible and accommodates loading different models or schedulers. You can experiment with different schedulers to optimize for generation speed or quality, and you can replace models with more performant ones.

The example below swaps in a VAE that is more numerically stable in half-precision.

```py
import torch
from diffusers import DiffusionPipeline, AutoModel

vae = AutoModel.from_pretrained(
  "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

pipeline = DiffusionPipeline.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  vae=vae,
  torch_dtype=torch.float16,
  device_map="cuda"
)
```
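
Schedulers can be swapped the same way. They hold no learned weights and share a config interface, so a different scheduler can be instantiated from the current one's config. A minimal sketch using [`DPMSolverMultistepScheduler`], one of several schedulers compatible with SDXL:

```py
from diffusers import DPMSolverMultistepScheduler

# schedulers have no weights, so swapping one is instant and memory-free
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
```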

## Reusing models in multiple pipelines

When working with multiple pipelines that use the same model, the [`~DiffusionPipeline.from_pipe`] method enables reusing a model instead of reloading it each time. This allows you to use multiple pipelines without increasing memory usage.

Memory usage is determined by the pipeline with the highest memory requirement regardless of the number of pipelines.

The example below loads a pipeline and then loads a second pipeline with [`~DiffusionPipeline.from_pipe`] to use [perturbed-attention guidance (PAG)](../api/pipelines/pag) to improve generation quality.

> [!WARNING]
> Use [`AutoPipelineForText2Image`] because [`DiffusionPipeline`] doesn't support PAG. Refer to the [AutoPipeline](../tutorials/autopipeline) docs to learn more. 

```py
import torch
from diffusers import AutoPipelineForText2Image

pipeline_sdxl = AutoPipelineForText2Image.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, device_map="cuda"
)

prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
image = pipeline_sdxl(prompt).images[0]
print(f"Max memory reserved: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
# Max memory reserved: 10.47 GB
```

Set `enable_pag=True` in the second pipeline to enable PAG. The second pipeline uses the same amount of memory because it shares model weights with the first one.

```py
pipeline = AutoPipelineForText2Image.from_pipe(
  pipeline_sdxl, enable_pag=True
)
prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
image = pipeline(prompt).images[0]
print(f"Max memory reserved: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
# Max memory reserved: 10.47 GB
```

> [!WARNING]
> Pipelines created by [`~DiffusionPipeline.from_pipe`] share the same models and *state*. Modifying the state of a model in one pipeline affects all the other pipelines that share the same model.

Some methods may not work correctly on pipelines created with [`~DiffusionPipeline.from_pipe`]. For example, [`~DiffusionPipeline.enable_model_cpu_offload`] relies on a unique model execution order, which may differ in the new pipeline. To ensure proper functionality, reapply these methods on the new pipeline.
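
For example, to combine CPU offloading with the PAG pipeline from above, reapply it on the new pipeline. A minimal sketch (reset any existing device map first, as described in the device placement section above):

```py
pipeline = AutoPipelineForText2Image.from_pipe(pipeline_sdxl, enable_pag=True)

# memory optimizations aren't inherited from the source pipeline,
# so reset the device map and reapply offloading here
pipeline.reset_device_map()
pipeline.enable_model_cpu_offload()
```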

## Safety checker

Diffusers provides a [safety checker](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) for older Stable Diffusion models to prevent generating harmful content. It screens the generated output against a set of hardcoded harmful concepts.

If you want to disable the safety checker, pass `safety_checker=None` in [`~DiffusionPipeline.from_pretrained`] as shown below.

```py
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
  "stable-diffusion-v1-5/stable-diffusion-v1-5", safety_checker=None
)
"""
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide by the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend keeping the safety filter enabled in all public-facing circumstances, disabling it only for use cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
"""
```