Unverified Commit fa35750d authored by Susung Hong's avatar Susung Hong Committed by GitHub
Browse files

Add Self-Attention-Guided (SAG) Stable Diffusion pipeline (#2193)



* Add Stable Diffusion Sw/ elf-Attention Guidance

* Modify __init__.py

* Register attention storing processor

* Update pipeline_stable_diffusion_sag.py

* Editing default value

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update dummy_torch_and_transformers_objects.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* Update pipeline_stable_diffusion_sag.py

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Create test_stable_diffusion_sag.py

* Create self_attention_guidance.py

* Update pipeline_stable_diffusion_sag.py

* Update test_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Rename self_attention_guidance.py to self_attention_guidance.mdx

* Update self_attention_guidance.mdx

* Update self_attention_guidance.mdx

* Update _toctree.yml

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Fixing order

* Update pipeline_stable_diffusion_sag.py

* fixing import order

* fix order

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Naming change

* Noting pred_x0

* Adding some fast tests

* Update pipeline_stable_diffusion_sag.py

* Update test_stable_diffusion_sag.py

* Update test_stable_diffusion_sag.py

* Update test_stable_diffusion_sag.py

* Update docs/source/en/api/pipelines/stable_diffusion/self_attention_guidance.mdx

* implement gaussian_blur

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* fix tests

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

---------
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: default avatarWill Berman <wlbberman@gmail.com>
parent fd3d5502
......@@ -153,6 +153,8 @@
title: InstructPix2Pix
- local: api/pipelines/stable_diffusion/pix2pix_zero
title: Pix2Pix Zero
- local: api/pipelines/stable_diffusion/self_attention_guidance
title: Self-Attention Guidance
title: Stable Diffusion
- local: api/pipelines/stable_diffusion_2
title: Stable Diffusion 2
......
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Self-Attention Guidance (SAG)
## Overview
[Self-Attention Guidance](https://arxiv.org/abs/2210.00939) by Susung Hong et al.
The abstract of the paper is the following:
*Denoising diffusion models (DDMs) have been drawing much attention for their appreciable sample quality and diversity. Despite their remarkable performance, DDMs remain black boxes on which further study is necessary to take a profound step. Motivated by this, we delve into the design of conventional U-shaped diffusion models. More specifically, we investigate the self-attention modules within these models through carefully designed experiments and explore their characteristics. In addition, inspired by the studies that substantiate the effectiveness of the guidance schemes, we present plug-and-play diffusion guidance, namely Self-Attention Guidance (SAG), that can drastically boost the performance of existing diffusion models. Our method, SAG, extracts the intermediate attention map from a diffusion model at every iteration and selects tokens above a certain attention score for masking and blurring to obtain a partially blurred input. Subsequently, we measure the dissimilarity between the predicted noises obtained from feeding the blurred and original input to the diffusion model and leverage it as guidance. With this guidance, we observe apparent improvements in a wide range of diffusion models, e.g., ADM, IDDPM, and Stable Diffusion, and show that the results further improve by combining our method with the conventional guidance scheme. We provide extensive ablation studies to verify our choices.*
Resources:
* [Project Page](https://ku-cvlab.github.io/Self-Attention-Guidance).
* [Paper](https://arxiv.org/abs/2210.00939).
* [Original Code](https://github.com/KU-CVLAB/Self-Attention-Guidance).
* [Demo](https://colab.research.google.com/github/SusungHong/Self-Attention-Guidance/blob/main/SAG_Stable.ipynb).
## Available Pipelines:
| Pipeline | Tasks | Demo
|---|---|:---:|
| [StableDiffusionSAGPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py) | *Text-to-Image Generation* | [Colab](https://colab.research.google.com/github/SusungHong/Self-Attention-Guidance/blob/main/SAG_Stable.ipynb) |
## Usage example
```python
import torch
from diffusers import StableDiffusionSAGPipeline
from accelerate.utils import set_seed
pipe = StableDiffusionSAGPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
seed = 8978
prompt = "."
guidance_scale = 7.5
num_images_per_prompt = 1
sag_scale = 1.0
set_seed(seed)
images = pipe(
prompt, num_images_per_prompt=num_images_per_prompt, guidance_scale=guidance_scale, sag_scale=sag_scale
).images
images[0].save("example.png")
```
## StableDiffusionSAGPipeline
[[autodoc]] StableDiffusionSAGPipeline
- __call__
- all
......@@ -119,6 +119,7 @@ else:
StableDiffusionPipeline,
StableDiffusionPipelineSafe,
StableDiffusionPix2PixZeroPipeline,
StableDiffusionSAGPipeline,
StableDiffusionUpscalePipeline,
StableUnCLIPImg2ImgPipeline,
StableUnCLIPPipeline,
......
......@@ -55,6 +55,7 @@ else:
StableDiffusionLatentUpscalePipeline,
StableDiffusionPipeline,
StableDiffusionPix2PixZeroPipeline,
StableDiffusionSAGPipeline,
StableDiffusionUpscalePipeline,
StableUnCLIPImg2ImgPipeline,
StableUnCLIPPipeline,
......
......@@ -44,6 +44,7 @@ if is_transformers_available() and is_torch_available():
from .pipeline_stable_diffusion_inpaint_legacy import StableDiffusionInpaintPipelineLegacy
from .pipeline_stable_diffusion_instruct_pix2pix import StableDiffusionInstructPix2PixPipeline
from .pipeline_stable_diffusion_latent_upscale import StableDiffusionLatentUpscalePipeline
from .pipeline_stable_diffusion_sag import StableDiffusionSAGPipeline
from .pipeline_stable_diffusion_upscale import StableDiffusionUpscalePipeline
from .pipeline_stable_unclip import StableUnCLIPPipeline
from .pipeline_stable_unclip_img2img import StableUnCLIPImg2ImgPipeline
......
......@@ -212,6 +212,21 @@ class StableDiffusionPipelineSafe(metaclass=DummyObject):
requires_backends(cls, ["torch", "transformers"])
class StableDiffusionSAGPipeline(metaclass=DummyObject):
_backends = ["torch", "transformers"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch", "transformers"])
@classmethod
def from_config(cls, *args, **kwargs):
requires_backends(cls, ["torch", "transformers"])
@classmethod
def from_pretrained(cls, *args, **kwargs):
requires_backends(cls, ["torch", "transformers"])
class StableDiffusionPix2PixZeroPipeline(metaclass=DummyObject):
_backends = ["torch", "transformers"]
......
# coding=utf-8
# Copyright 2022 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import gc
import unittest
import numpy as np
import torch
from transformers import CLIPTextConfig, CLIPTextModel, CLIPTokenizer
from diffusers import (
AutoencoderKL,
DDIMScheduler,
StableDiffusionSAGPipeline,
UNet2DConditionModel,
)
from diffusers.utils import slow, torch_device
from diffusers.utils.testing_utils import require_torch_gpu
from ...test_pipelines_common import PipelineTesterMixin
torch.backends.cuda.matmul.allow_tf32 = False
class StableDiffusionSAGPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
pipeline_class = StableDiffusionSAGPipeline
test_cpu_offload = False
def get_dummy_components(self):
torch.manual_seed(0)
unet = UNet2DConditionModel(
block_out_channels=(32, 64),
layers_per_block=2,
sample_size=32,
in_channels=4,
out_channels=4,
down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
cross_attention_dim=32,
)
scheduler = DDIMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
clip_sample=False,
set_alpha_to_one=False,
)
torch.manual_seed(0)
vae = AutoencoderKL(
block_out_channels=[32, 64],
in_channels=3,
out_channels=3,
down_block_types=["DownEncoderBlock2D", "DownEncoderBlock2D"],
up_block_types=["UpDecoderBlock2D", "UpDecoderBlock2D"],
latent_channels=4,
)
torch.manual_seed(0)
text_encoder_config = CLIPTextConfig(
bos_token_id=0,
eos_token_id=2,
hidden_size=32,
intermediate_size=37,
layer_norm_eps=1e-05,
num_attention_heads=4,
num_hidden_layers=5,
pad_token_id=1,
vocab_size=1000,
)
text_encoder = CLIPTextModel(text_encoder_config)
tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
components = {
"unet": unet,
"scheduler": scheduler,
"vae": vae,
"text_encoder": text_encoder,
"tokenizer": tokenizer,
"safety_checker": None,
"feature_extractor": None,
}
return components
def get_dummy_inputs(self, device, seed=0):
if str(device).startswith("mps"):
generator = torch.manual_seed(seed)
else:
generator = torch.Generator(device=device).manual_seed(seed)
inputs = {
"prompt": ".",
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 1.0,
"sag_scale": 1.0,
"output_type": "numpy",
}
return inputs
@slow
@require_torch_gpu
class StableDiffusionPipelineIntegrationTests(unittest.TestCase):
def tearDown(self):
# clean up the VRAM after each test
super().tearDown()
gc.collect()
torch.cuda.empty_cache()
def test_stable_diffusion_1(self):
sag_pipe = StableDiffusionSAGPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
sag_pipe = sag_pipe.to(torch_device)
sag_pipe.set_progress_bar_config(disable=None)
prompt = "."
generator = torch.manual_seed(0)
output = sag_pipe(
[prompt], generator=generator, guidance_scale=7.5, sag_scale=1.0, num_inference_steps=20, output_type="np"
)
image = output.images
image_slice = image[0, -3:, -3:, -1]
assert image.shape == (1, 512, 512, 3)
expected_slice = np.array([0.1568, 0.1738, 0.1695, 0.1693, 0.1507, 0.1705, 0.1547, 0.1751, 0.1949])
assert np.abs(image_slice.flatten() - expected_slice).max() < 5e-4
def test_stable_diffusion_2(self):
sag_pipe = StableDiffusionSAGPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base")
sag_pipe = sag_pipe.to(torch_device)
sag_pipe.set_progress_bar_config(disable=None)
prompt = "."
generator = torch.manual_seed(0)
output = sag_pipe(
[prompt], generator=generator, guidance_scale=7.5, sag_scale=1.0, num_inference_steps=20, output_type="np"
)
image = output.images
image_slice = image[0, -3:, -3:, -1]
assert image.shape == (1, 512, 512, 3)
expected_slice = np.array([0.3459, 0.2876, 0.2537, 0.3002, 0.2671, 0.2160, 0.3026, 0.2262, 0.2371])
assert np.abs(image_slice.flatten() - expected_slice).max() < 5e-5
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment