callback.md 14.8 KB
Newer Older
1
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
3
4
5
6
7
8
9
10
11
12

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

Steven Liu's avatar
Steven Liu committed
13
# Pipeline callbacks
14

omahs's avatar
omahs committed
15
The denoising loop of a pipeline can be modified with custom defined functions using the `callback_on_step_end` parameter. The callback function is executed at the end of each step, and modifies the pipeline attributes and variables for the next step. This is really useful for *dynamically* adjusting certain pipeline attributes or modifying tensor variables. This versatility allows for interesting use cases such as changing the prompt embeddings at each timestep, assigning different weights to the prompt embeddings, and editing the guidance scale. With callbacks, you can implement new features without modifying the underlying code!
16

Steven Liu's avatar
Steven Liu committed
17
18
> [!TIP]
> 🤗 Diffusers currently only supports `callback_on_step_end`, but feel free to open a [feature request](https://github.com/huggingface/diffusers/issues/new/choose) if you have a cool use-case and require a callback function with a different execution point!
Steven Liu's avatar
Steven Liu committed
19

Steven Liu's avatar
Steven Liu committed
20
This guide will demonstrate how callbacks work by a few features you can implement with them.
Steven Liu's avatar
Steven Liu committed
21

Álvaro Somoza's avatar
Álvaro Somoza committed
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
## Official callbacks

We provide a list of callbacks you can plug into an existing pipeline and modify the denoising loop. This is the current list of official callbacks:

- `SDCFGCutoffCallback`: Disables the CFG after a certain number of steps for all SD 1.5 pipelines, including text-to-image, image-to-image, inpaint, and controlnet.
- `SDXLCFGCutoffCallback`: Disables the CFG after a certain number of steps for all SDXL pipelines, including text-to-image, image-to-image, inpaint, and controlnet.
- `IPAdapterScaleCutoffCallback`: Disables the IP Adapter after a certain number of steps for all pipelines supporting IP-Adapter.

> [!TIP]
> If you want to add a new official callback, feel free to open a [feature request](https://github.com/huggingface/diffusers/issues/new/choose) or [submit a PR](https://huggingface.co/docs/diffusers/main/en/conceptual/contribution#how-to-open-a-pr).

To set up a callback, you need to specify the number of denoising steps after which the callback comes into effect. You can do so by using either one of these two arguments

- `cutoff_step_ratio`: Float number with the ratio of the steps.
- `cutoff_step_index`: Integer number with the exact number of the step.

```python
import torch

from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline
from diffusers.callbacks import SDXLCFGCutoffCallback


callback = SDXLCFGCutoffCallback(cutoff_step_ratio=0.4)
# can also be used with cutoff_step_index
# callback = SDXLCFGCutoffCallback(cutoff_step_ratio=None, cutoff_step_index=10)

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)

prompt = "a sports car at the road, best quality, high quality, high detail, 8k resolution"

generator = torch.Generator(device="cpu").manual_seed(2628670641)

out = pipeline(
    prompt=prompt,
    negative_prompt="",
    guidance_scale=6.5,
    num_inference_steps=25,
    generator=generator,
    callback_on_step_end=callback,
)

out.images[0].save("official_callback.png")
```

<div class="flex gap-4">
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/without_cfg_callback.png" alt="generated image of a sports car at the road" />
    <figcaption class="mt-2 text-center text-sm text-gray-500">without SDXLCFGCutoffCallback</figcaption>
  </div>
  <div>
omahs's avatar
omahs committed
78
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/with_cfg_callback.png" alt="generated image of a sports car at the road with cfg callback" />
Álvaro Somoza's avatar
Álvaro Somoza committed
79
80
81
82
    <figcaption class="mt-2 text-center text-sm text-gray-500">with SDXLCFGCutoffCallback</figcaption>
  </div>
</div>

Steven Liu's avatar
Steven Liu committed
83
84
85
86
## Dynamic classifier-free guidance

Dynamic classifier-free guidance (CFG) is a feature that allows you to disable CFG after a certain number of inference steps which can help you save compute with minimal cost to performance. The callback function for this should have the following arguments:

Álvaro Somoza's avatar
Álvaro Somoza committed
87
88
89
- `pipeline` (or the pipeline instance) provides access to important properties such as `num_timesteps` and `guidance_scale`. You can modify these properties by updating the underlying attributes. For this example, you'll disable CFG by setting `pipeline._guidance_scale=0.0`.
- `step_index` and `timestep` tell you where you are in the denoising loop. Use `step_index` to turn off CFG after reaching 40% of `num_timesteps`.
- `callback_kwargs` is a dict that contains tensor variables you can modify during the denoising loop. It only includes variables specified in the `callback_on_step_end_tensor_inputs` argument, which is passed to the pipeline's `__call__` method. Different pipelines may use different sets of variables, so please check a pipeline's `_callback_tensor_inputs` attribute for the list of variables you can modify. Some common variables include `latents` and `prompt_embeds`. For this function, change the batch size of `prompt_embeds` after setting `guidance_scale=0.0` in order for it to work properly.
Steven Liu's avatar
Steven Liu committed
90
91

Your callback function should look something like this:
92
93

```python
94
def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs):
95
        # adjust the batch_size of prompt_embeds according to guidance_scale
Steven Liu's avatar
Steven Liu committed
96
        if step_index == int(pipeline.num_timesteps * 0.4):
97
                prompt_embeds = callback_kwargs["prompt_embeds"]
98
                prompt_embeds = prompt_embeds.chunk(2)[-1]
99

100
                # update guidance_scale and prompt_embeds
Steven Liu's avatar
Steven Liu committed
101
                pipeline._guidance_scale = 0.0
102
                callback_kwargs["prompt_embeds"] = prompt_embeds
103
104
105
        return callback_kwargs
```

Steven Liu's avatar
Steven Liu committed
106
Now, you can pass the callback function to the `callback_on_step_end` parameter and the `prompt_embeds` to `callback_on_step_end_tensor_inputs`.
107

Steven Liu's avatar
Steven Liu committed
108
```py
109
110
111
import torch
from diffusers import StableDiffusionPipeline

112
pipeline = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16)
Steven Liu's avatar
Steven Liu committed
113
pipeline = pipeline.to("cuda")
114
115
116
117

prompt = "a photo of an astronaut riding a horse on mars"

generator = torch.Generator(device="cuda").manual_seed(1)
Steven Liu's avatar
Steven Liu committed
118
119
120
121
122
123
out = pipeline(
    prompt,
    generator=generator,
    callback_on_step_end=callback_dynamic_cfg,
    callback_on_step_end_tensor_inputs=['prompt_embeds']
)
124
125
126
127

out.images[0].save("out_custom_cfg.png")
```

Steven Liu's avatar
Steven Liu committed
128
## Interrupt the diffusion process
Dhruv Nair's avatar
Dhruv Nair committed
129

Steven Liu's avatar
Steven Liu committed
130
131
> [!TIP]
> The interruption callback is supported for text-to-image, image-to-image, and inpainting for the [StableDiffusionPipeline](../api/pipelines/stable_diffusion/overview) and [StableDiffusionXLPipeline](../api/pipelines/stable_diffusion/stable_diffusion_xl).
Dhruv Nair's avatar
Dhruv Nair committed
132

Steven Liu's avatar
Steven Liu committed
133
Stopping the diffusion process early is useful when building UIs that work with Diffusers because it allows users to stop the generation process if they're unhappy with the intermediate results. You can incorporate this into your pipeline with a callback.
Dhruv Nair's avatar
Dhruv Nair committed
134

Steven Liu's avatar
Steven Liu committed
135
This callback function should take the following arguments: `pipeline`, `i`, `t`, and `callback_kwargs` (this must be returned). Set the pipeline's `_interrupt` attribute to `True` to stop the diffusion process after a certain number of steps. You are also free to implement your own custom stopping logic inside the callback.
Dhruv Nair's avatar
Dhruv Nair committed
136
137
138
139
140
141

In this example, the diffusion process is stopped after 10 steps even though `num_inference_steps` is set to 50.

```python
from diffusers import StableDiffusionPipeline

142
pipeline = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
Steven Liu's avatar
Steven Liu committed
143
pipeline.enable_model_cpu_offload()
Dhruv Nair's avatar
Dhruv Nair committed
144
145
num_inference_steps = 50

Steven Liu's avatar
Steven Liu committed
146
def interrupt_callback(pipeline, i, t, callback_kwargs):
Dhruv Nair's avatar
Dhruv Nair committed
147
148
    stop_idx = 10
    if i == stop_idx:
Steven Liu's avatar
Steven Liu committed
149
        pipeline._interrupt = True
Dhruv Nair's avatar
Dhruv Nair committed
150
151
152

    return callback_kwargs

Steven Liu's avatar
Steven Liu committed
153
pipeline(
Dhruv Nair's avatar
Dhruv Nair committed
154
155
156
157
158
    "A photo of a cat",
    num_inference_steps=num_inference_steps,
    callback_on_step_end=interrupt_callback,
)
```
Steven Liu's avatar
Steven Liu committed
159

160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
## IP Adapter Cutoff

IP Adapter is an image prompt adapter that can be used for diffusion models without any changes to the underlying model. We can use the IP Adapter Cutoff Callback to disable the IP Adapter after a certain number of steps. To set up the callback, you need to specify the number of denoising steps after which the callback comes into effect. You can do so by using either one of these two arguments:

- `cutoff_step_ratio`: Float number with the ratio of the steps.
- `cutoff_step_index`: Integer number with the exact number of the step.

We need to download the diffusion model and load the ip_adapter for it as follows:

```py
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
pipeline.set_ip_adapter_scale(0.6)
```
The setup for the callback should look something like this:

```py

from diffusers import AutoPipelineForText2Image
from diffusers.callbacks import IPAdapterScaleCutoffCallback
from diffusers.utils import load_image
import torch
 

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", 
    torch_dtype=torch.float16
).to("cuda")


pipeline.load_ip_adapter(
    "h94/IP-Adapter", 
    subfolder="sdxl_models", 
    weight_name="ip-adapter_sdxl.bin"
)

pipeline.set_ip_adapter_scale(0.6)


callback = IPAdapterScaleCutoffCallback(
    cutoff_step_ratio=None, 
    cutoff_step_index=5
)

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png"
)

generator = torch.Generator(device="cuda").manual_seed(2628670641)

images = pipeline(
    prompt="a tiger sitting in a chair drinking orange juice",
    ip_adapter_image=image,
    negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
    generator=generator,
    num_inference_steps=50,
    callback_on_step_end=callback,
).images

images[0].save("custom_callback_img.png")
```

<div class="flex gap-4">
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/without_callback.png" alt="generated image of a tiger sitting in a chair drinking orange juice" />
    <figcaption class="mt-2 text-center text-sm text-gray-500">without IPAdapterScaleCutoffCallback</figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/with_callback2.png" alt="generated image of a tiger sitting in a chair drinking orange juice with ip adapter callback" />
    <figcaption class="mt-2 text-center text-sm text-gray-500">with IPAdapterScaleCutoffCallback</figcaption>
  </div>
</div>


Steven Liu's avatar
Steven Liu committed
238
239
240
241
242
243
244
245
246
247
248
249
250
251
## Display image after each generation step

> [!TIP]
> This tip was contributed by [asomoza](https://github.com/asomoza).

Display an image after each generation step by accessing and converting the latents after each step into an image. The latent space is compressed to 128x128, so the images are also 128x128 which is useful for a quick preview.

1. Use the function below to convert the SDXL latents (4 channels) to RGB tensors (3 channels) as explained in the [Explaining the SDXL latent space](https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space) blog post.

```py
def latents_to_rgb(latents):
    weights = (
        (60, -60, 25, -70),
        (60,  -5, 15, -50),
252
        (60,  10, -5, -35),
Steven Liu's avatar
Steven Liu committed
253
254
255
256
257
    )

    weights_tensor = torch.t(torch.tensor(weights, dtype=latents.dtype).to(latents.device))
    biases_tensor = torch.tensor((150, 140, 130), dtype=latents.dtype).to(latents.device)
    rgb_tensor = torch.einsum("...lxy,lr -> ...rxy", latents, weights_tensor) + biases_tensor.unsqueeze(-1).unsqueeze(-1)
258
    image_array = rgb_tensor.clamp(0, 255).byte().cpu().numpy().transpose(1, 2, 0)
Steven Liu's avatar
Steven Liu committed
259
260
261
262
263
264
265
266
267

    return Image.fromarray(image_array)
```

2. Create a function to decode and save the latents into an image.

```py
def decode_tensors(pipe, step, timestep, callback_kwargs):
    latents = callback_kwargs["latents"]
Tolga Cangöz's avatar
Tolga Cangöz committed
268

269
    image = latents_to_rgb(latents[0])
Steven Liu's avatar
Steven Liu committed
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
    image.save(f"{step}.png")

    return callback_kwargs
```

3. Pass the `decode_tensors` function to the `callback_on_step_end` parameter to decode the tensors after each step. You also need to specify what you want to modify in the `callback_on_step_end_tensor_inputs` parameter, which in this case are the latents.

```py
from diffusers import AutoPipelineForText2Image
import torch
from PIL import Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
).to("cuda")

289
290
291
image = pipeline(
    prompt="A croissant shaped like a cute bear.",
    negative_prompt="Deformed, ugly, bad anatomy",
Steven Liu's avatar
Steven Liu committed
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
    callback_on_step_end=decode_tensors,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]
```

<div class="flex gap-4 justify-center">
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/tips_step_0.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">step 0</figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/tips_step_19.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">step 19
    </figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/tips_step_29.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">step 29</figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/tips_step_39.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">step 39</figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/tips_step_49.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">step 49</figcaption>
  </div>
</div>