pag.md 5.29 KB
Newer Older
Aryan's avatar
Aryan committed
1
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
YiYi Xu's avatar
YiYi Xu committed
2
3
4
5
6
7
8
9
10
11
12
13
14

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Perturbed-Attention Guidance

Steven Liu's avatar
Steven Liu committed
15
16
17
18
<div class="flex flex-wrap space-x-1">
  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>

YiYi Xu's avatar
YiYi Xu committed
19
20
21
22
23
24
25
26
[Perturbed-Attention Guidance (PAG)](https://ku-cvlab.github.io/Perturbed-Attention-Guidance/) is a new diffusion sampling guidance that improves sample quality across both unconditional and conditional settings, achieving this without requiring further training or the integration of external modules.

PAG was introduced in [Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance](https://huggingface.co/papers/2403.17377) by Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin and Seungryong Kim.

The abstract from the paper is:

*Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, achieving this without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. It involves generating intermediate samples with degraded structure by substituting selected self-attention maps in diffusion U-Net with an identity matrix, by considering the self-attention mechanisms' ability to capture structural information, and guiding the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidances such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.*

27
PAG can be used by specifying the `pag_applied_layers` as a parameter when instantiating a PAG pipeline. It can be a single string or a list of strings. Each string can be a unique layer identifier or a regular expression to identify one or more layers.
28
29
30
31
32
33

- Full identifier as a normal string: `down_blocks.2.attentions.0.transformer_blocks.0.attn1.processor`
- Full identifier as a RegEx: `down_blocks.2.(attentions|motion_modules).0.transformer_blocks.0.attn1.processor`
- Partial identifier as a RegEx: `down_blocks.2`, or `attn1`
- List of identifiers (can be combo of strings and ReGex): `["blocks.1", "blocks.(14|20)", r"down_blocks\.(2,3)"]`

Steven Liu's avatar
Steven Liu committed
34
35
> [!WARNING]
> Since RegEx is supported as a way for matching layer identifiers, it is crucial to use it correctly otherwise there might be unexpected behaviour. The recommended way to use PAG is by specifying layers as `blocks.{layer_index}` and `blocks.({layer_index_1|layer_index_2|...})`. Using it in any other way, while doable, may bypass our basic validation checks and give you unexpected results.
36

Aryan's avatar
Aryan committed
37
38
39
40
41
## AnimateDiffPAGPipeline
[[autodoc]] AnimateDiffPAGPipeline
  - all
  - __call__

42
43
44
45
46
## HunyuanDiTPAGPipeline
[[autodoc]] HunyuanDiTPAGPipeline
  - all
  - __call__

Álvaro Somoza's avatar
Álvaro Somoza committed
47
48
49
## KolorsPAGPipeline
[[autodoc]] KolorsPAGPipeline
  - all
50
  - __call__
Álvaro Somoza's avatar
Álvaro Somoza committed
51

52
53
54
55
56
## StableDiffusionPAGInpaintPipeline
[[autodoc]] StableDiffusionPAGInpaintPipeline
	- all
	- __call__

57
58
59
60
61
## StableDiffusionPAGPipeline
[[autodoc]] StableDiffusionPAGPipeline
	- all
	- __call__

62
63
64
65
66
## StableDiffusionPAGImg2ImgPipeline
[[autodoc]] StableDiffusionPAGImg2ImgPipeline
	- all
	- __call__

67
68
## StableDiffusionControlNetPAGPipeline
[[autodoc]] StableDiffusionControlNetPAGPipeline
69
70
71

## StableDiffusionControlNetPAGInpaintPipeline
[[autodoc]] StableDiffusionControlNetPAGInpaintPipeline
72
73
74
	- all
	- __call__

YiYi Xu's avatar
YiYi Xu committed
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
## StableDiffusionXLPAGPipeline
[[autodoc]] StableDiffusionXLPAGPipeline
	- all
	- __call__

## StableDiffusionXLPAGImg2ImgPipeline
[[autodoc]] StableDiffusionXLPAGImg2ImgPipeline
	- all
	- __call__

## StableDiffusionXLPAGInpaintPipeline
[[autodoc]] StableDiffusionXLPAGInpaintPipeline
	- all
	- __call__

## StableDiffusionXLControlNetPAGPipeline
[[autodoc]] StableDiffusionXLControlNetPAGPipeline
	- all
	- __call__
94

95
96
97
98
## StableDiffusionXLControlNetPAGImg2ImgPipeline
[[autodoc]] StableDiffusionXLControlNetPAGImg2ImgPipeline
	- all
	- __call__
99

100
101
102
103
104
## StableDiffusion3PAGPipeline
[[autodoc]] StableDiffusion3PAGPipeline
	- all
	- __call__

105
106
107
108
## StableDiffusion3PAGImg2ImgPipeline
[[autodoc]] StableDiffusion3PAGImg2ImgPipeline
	- all
	- __call__
109

110
111
112
## PixArtSigmaPAGPipeline
[[autodoc]] PixArtSigmaPAGPipeline
	- all
113
	- __call__