<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

[[open-in-colab]]

# DiffusionPipeline

Diffusion models consist of multiple components like UNets or diffusion transformers (DiTs), text encoders, variational autoencoders (VAEs), and schedulers. The [`DiffusionPipeline`] wraps all of these components into a single easy-to-use API without giving up the flexibility to modify its components.

This guide will show you how to load a [`DiffusionPipeline`].

## Loading a pipeline

[`DiffusionPipeline`] is a base pipeline class that automatically selects and returns an instance of a model's pipeline subclass, like [`QwenImagePipeline`], by scanning the `model_index.json` file for the class name.

Pass a model id to [`~DiffusionPipeline.from_pretrained`] to load a pipeline.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
  "Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="cuda"
)
```
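Since the concrete class is resolved from `model_index.json`, the object returned above is the model's own pipeline subclass rather than a plain [`DiffusionPipeline`]. A quick way to check which class was resolved, using the pipeline loaded above:

```py
# the class name comes from the _class_name entry in model_index.json
print(pipeline.__class__.__name__)
# expected: QwenImagePipeline
```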

Every model has a specific pipeline subclass that inherits from [`DiffusionPipeline`]. A subclass usually has a narrow focus and is task-specific. See the table below for an example.

| pipeline subclass | task |
|---|---|
| [`QwenImagePipeline`] | text-to-image |
| [`QwenImageImg2ImgPipeline`] | image-to-image |
| [`QwenImageInpaintPipeline`] | inpaint |

You can use the subclass directly by passing a model id to [`~QwenImagePipeline.from_pretrained`].

```py
import torch
from diffusers import QwenImagePipeline

pipeline = QwenImagePipeline.from_pretrained(
  "Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="cuda"
)
```

### Local pipelines

Pipelines can also be loaded from local files. Use [`~huggingface_hub.snapshot_download`] to download a model repository to a local directory.

```py
from huggingface_hub import snapshot_download

snapshot_download(repo_id="Qwen/Qwen-Image")
```
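[`~huggingface_hub.snapshot_download`] also returns the path of the local directory it downloaded to, so you can capture it instead of looking up the cache location yourself. A minimal sketch:

```py
from huggingface_hub import snapshot_download

# the return value is the local snapshot directory
local_dir = snapshot_download(repo_id="Qwen/Qwen-Image")
print(local_dir)
```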

The model is downloaded to your [cache](../installation#cache). Pass the folder path to [`~QwenImagePipeline.from_pretrained`] to load it.

```py
import torch
from diffusers import QwenImagePipeline

pipeline = QwenImagePipeline.from_pretrained(
  "path/to/your/cache", torch_dtype=torch.bfloat16, device_map="cuda"
)
```

The [`~QwenImagePipeline.from_pretrained`] method won't download files from the Hub when it detects a local path. But this also means it won't download and cache any updates that have been made to the model either.

## Pipeline data types

Use the `torch_dtype` argument in [`~DiffusionPipeline.from_pretrained`] to load a model with a specific data type. This allows you to load different models in different precisions. For example, loading a large transformer model in half-precision reduces the memory required.

Pass the data type for each model as a dictionary to `torch_dtype`. Use the `default` key to set the default data type. If a model isn't in the dictionary and `default` isn't provided, it is loaded in full precision (`torch.float32`).

```py
import torch
from diffusers import QwenImagePipeline

pipeline = QwenImagePipeline.from_pretrained(
  "Qwen/Qwen-Image",
  torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipeline.transformer.dtype, pipeline.vae.dtype)
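# expected output: torch.bfloat16 torch.float16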
```

You don't need to use a dictionary if you're loading all the models in the same data type.

```py
import torch
from diffusers import QwenImagePipeline

pipeline = QwenImagePipeline.from_pretrained(
  "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
)
print(pipeline.transformer.dtype, pipeline.vae.dtype)
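# expected output: torch.bfloat16 torch.bfloat16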
```

## Device placement

The `device_map` argument determines where an individual model or an entire pipeline is placed on available accelerators like GPUs. It is especially helpful when there are multiple GPUs.

Diffusers currently provides three options for `device_map`: `"cuda"`, `"balanced"`, and `"auto"`. Refer to the table below to compare the three placement strategies.

| parameter | description |
|---|---|
| `"cuda"` | places model or pipeline on CUDA device |
| `"balanced"` | evenly distributes model or pipeline on all GPUs |
| `"auto"` | distributes the model from the fastest device first to the slowest |

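For example, on a machine with more than one GPU, `"balanced"` splits the pipeline's models across the available GPUs. A minimal sketch, assuming at least two GPUs are visible:

```py
import torch
from diffusers import DiffusionPipeline

# "balanced" spreads the pipeline's models evenly across all visible GPUs
pipeline = DiffusionPipeline.from_pretrained(
  "Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="balanced"
)
```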
Use the `max_memory` argument in [`~DiffusionPipeline.from_pretrained`] to allocate a maximum amount of memory to use on each device. By default, Diffusers uses the maximum amount available.

<hfoptions id="device_map">
<hfoption id="pipeline">

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
  "Qwen/Qwen-Image", 
  torch_dtype=torch.bfloat16,
  device_map="cuda",
)
```

</hfoption>
<hfoption id="individual model">

```py
import torch
from diffusers import AutoModel

max_memory = {0: "16GB", 1: "16GB"}
transformer = AutoModel.from_pretrained(
    "Qwen/Qwen-Image", 
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    max_memory=max_memory
)
```

</hfoption>
</hfoptions>

The `hf_device_map` attribute allows you to access and view the `device_map`.

```py
print(pipeline.hf_device_map)
# {'unet': 1, 'vae': 1, 'safety_checker': 0, 'text_encoder': 0}
```

Reset a pipeline's `device_map` with the [`~DiffusionPipeline.reset_device_map`] method. This is necessary if you want to use methods such as `.to()`, [`~DiffusionPipeline.enable_sequential_cpu_offload`], and [`~DiffusionPipeline.enable_model_cpu_offload`].

```py
pipeline.reset_device_map()
```
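Once the device map is reset, placement and offloading methods can be applied again. A minimal sketch, continuing with the pipeline above:

```py
# after resetting the device map, offloading works as usual
pipeline.enable_model_cpu_offload()
```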

## Parallel loading

Large models are often [sharded](../training/distributed_inference#model-sharding) into smaller files so that they are easier to load. Diffusers supports loading shards in parallel to speed up the loading process.

Set `HF_ENABLE_PARALLEL_LOADING` to `"YES"` to enable parallel loading of shards.

The `device_map` argument should be set to `"cuda"` to pre-allocate a large chunk of memory based on the model size. Warming up the memory allocator up front avoids many smaller allocator calls later, which substantially reduces model load time.

```py
import os
import torch
from diffusers import DiffusionPipeline

os.environ["HF_ENABLE_PARALLEL_LOADING"] = "YES"

pipeline = DiffusionPipeline.from_pretrained(
  "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16, device_map="cuda"
)
```

## Replacing models in a pipeline

[`DiffusionPipeline`] is flexible and accommodates loading different models or schedulers. You can experiment with different schedulers to optimize for generation speed or quality, and you can replace models with more performant ones.

The example below swaps the default scheduler for [`HeunDiscreteScheduler`] to generate higher quality images and swaps the default VAE for a more numerically stable version. Pass the `subfolder` argument to [`~HeunDiscreteScheduler.from_pretrained`] to load the scheduler from the correct subfolder.

```py
import torch
from diffusers import DiffusionPipeline, HeunDiscreteScheduler, AutoModel

scheduler = HeunDiscreteScheduler.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler"
)
vae = AutoModel.from_pretrained(
  "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

pipeline = DiffusionPipeline.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  scheduler=scheduler,
  vae=vae,
  torch_dtype=torch.float16,
  device_map="cuda"
)
```

## Reusing models in multiple pipelines

When working with multiple pipelines that use the same model, the [`~DiffusionPipeline.from_pipe`] method enables reusing a model instead of reloading it each time. This allows you to use multiple pipelines without increasing memory usage.

Memory usage is determined by the pipeline with the highest memory requirement regardless of the number of pipelines.

The example below loads a pipeline and then loads a second pipeline with [`~DiffusionPipeline.from_pipe`] to use [perturbed-attention guidance (PAG)](../api/pipelines/pag) to improve generation quality.

> [!WARNING]
> Use [`AutoPipelineForText2Image`] because [`DiffusionPipeline`] doesn't support PAG. Refer to the [AutoPipeline](../tutorials/autopipeline) docs to learn more. 

```py
import torch
from diffusers import AutoPipelineForText2Image

pipeline_sdxl = AutoPipelineForText2Image.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, device_map="cuda"
)
prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
image = pipeline_sdxl(prompt).images[0]
print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
# Max memory allocated: 10.47 GB
```

Pass `enable_pag=True` to [`~DiffusionPipeline.from_pipe`] to enable PAG in the second pipeline. The second pipeline uses the same amount of memory because it shares model weights with the first one.

```py
pipeline = AutoPipelineForText2Image.from_pipe(
  pipeline_sdxl, enable_pag=True
)
prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
image = pipeline(prompt).images[0]
print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
# Max memory allocated: 10.47 GB
```

> [!WARNING]
> Pipelines created by [`~DiffusionPipeline.from_pipe`] share the same models and *state*. Modifying the state of a model in one pipeline affects all the other pipelines that share the same model.

Some methods may not work correctly on pipelines created with [`~DiffusionPipeline.from_pipe`]. For example, [`~DiffusionPipeline.enable_model_cpu_offload`] relies on a unique model execution order, which may differ in the new pipeline. To ensure proper functionality, reapply these methods on the new pipeline.
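A minimal sketch of reapplying offloading, assuming a hypothetical `original_pipeline` that used CPU offloading before calling [`~DiffusionPipeline.from_pipe`]:

```py
# offloading hooks depend on the pipeline's execution order, so reapply them on the new pipeline
new_pipeline = AutoPipelineForText2Image.from_pipe(original_pipeline)
new_pipeline.enable_model_cpu_offload()
```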

## Safety checker

Diffusers provides a [safety checker](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) for older Stable Diffusion models to prevent generating harmful content. It screens the generated output against a set of hardcoded harmful concepts.

If you want to disable the safety checker, pass `safety_checker=None` to [`~DiffusionPipeline.from_pretrained`] as shown below.

```py
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
  "stable-diffusion-v1-5/stable-diffusion-v1-5", safety_checker=None
)
"""
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide by the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend keeping the safety filter enabled in all public-facing circumstances, disabling it only for use cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
"""
```