"vscode:/vscode.git/clone" did not exist on "c04c17edfa6407738d2bbbb5f44fe36b7e2f3f63"
t2i_adapter.md 6.09 KB
Newer Older
Steven Liu's avatar
Steven Liu committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# T2I-Adapter

[T2I-Adapter](https://huggingface.co/papers/2302.08453) is an adapter that enables controllable generation, like [ControlNet](./controlnet). A T2I-Adapter works by learning a *mapping* between a control signal (for example, a depth map) and a pretrained model's internal knowledge. The adapter is plugged into the base model to provide extra guidance based on the control signal during generation.

Load a T2I-Adapter conditioned on a specific control, such as canny edges, and pass it to the pipeline in [`~DiffusionPipeline.from_pretrained`].

```py
import torch
from diffusers import T2IAdapter, StableDiffusionXLAdapterPipeline, AutoencoderKL

t2i_adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0",
    torch_dtype=torch.float16,
)
```

Generate a canny image with [opencv-python](https://github.com/opencv/opencv-python).

```py
import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)

image = np.array(original_image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
```
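The Canny detector returns a single-channel edge map, but the pipeline expects a three-channel RGB image, which is why the code above stacks the map three times along the channel axis. A minimal sketch of that stacking step on a dummy array:

```python
import numpy as np

# dummy single-channel edge map (values 0 or 255), shape (H, W)
edges = np.array([[0, 255], [255, 0]], dtype=np.uint8)

# add a channel axis, then repeat it three times -> shape (H, W, 3)
stacked = np.concatenate([edges[:, :, None]] * 3, axis=2)
print(stacked.shape)
```

All three channels are identical, so the result displays as a grayscale edge image while satisfying the RGB input shape.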

Pass the canny image to the pipeline to generate an image.

```py
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=t2i_adapter,
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = """
A photorealistic overhead image of a cat reclining sideways in a flamingo pool floatie holding a margarita. 
The cat is floating leisurely in the pool and completely relaxed and happy.
"""

pipeline(
    prompt, 
    image=canny_image,
    num_inference_steps=100, 
    guidance_scale=10,
).images[0]
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png" width="300" alt="Generated image (prompt only)"/>
    <figcaption style="text-align: center;">original image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png" width="300" alt="Control image (Canny edges)"/>
    <figcaption style="text-align: center;">canny image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/t2i-canny-cat-generated.png" width="300" alt="Generated image (ControlNet + prompt)"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>

## MultiAdapter

You can compose multiple controls, such as a canny image and a depth map, with the [`MultiAdapter`] class.

The example below composes a canny image and a depth map.

Load the control images and T2I-Adapters as a list.

```py
import torch
from diffusers.utils import load_image
from diffusers import StableDiffusionXLAdapterPipeline, AutoencoderKL, MultiAdapter, T2IAdapter

canny_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png"
)
depth_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_depth_image.png"
)
controls = [canny_image, depth_image]
prompt = ["""
a relaxed rabbit sitting on a striped towel next to a pool with a tropical drink nearby, 
bright sunny day, vacation scene, 35mm photograph, film, professional, 4k, highly detailed
"""]

adapters = MultiAdapter(
    [
        T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16),
        T2IAdapter.from_pretrained("TencentARC/t2i-adapter-depth-midas-sdxl-1.0", torch_dtype=torch.float16),
    ]
)
```

Pass the adapters, prompt, and control images to [`StableDiffusionXLAdapterPipeline`]. Use the `adapter_conditioning_scale` parameter to determine how much weight to assign to each control.

```py
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    vae=vae,
    adapter=adapters,
).to("cuda")

pipeline(
    prompt,
    image=controls,
    height=1024,
    width=1024,
    adapter_conditioning_scale=[0.7, 0.7]
).images[0]
```
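The effect of `adapter_conditioning_scale` can be pictured as a per-adapter weight applied to each adapter's output features before they are combined; a simplified NumPy sketch of that weighting, assuming the combination is a weighted sum (the real implementation operates on per-block torch feature tensors inside [`MultiAdapter`]):

```python
import numpy as np

# dummy feature maps standing in for each adapter's output
canny_features = np.ones((4, 4))
depth_features = np.full((4, 4), 2.0)

# one weight per adapter, mirroring adapter_conditioning_scale=[0.7, 0.7]
scales = [0.7, 0.7]

# weighted sum of the adapter outputs
combined = scales[0] * canny_features + scales[1] * depth_features
print(combined.shape)
```

Lowering one scale relative to the other shifts how strongly that control shapes the generated image.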

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png" width="300" alt="Generated image (prompt only)"/>
    <figcaption style="text-align: center;">canny image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_depth_image.png" width="300" alt="Control image (Canny edges)"/>
    <figcaption style="text-align: center;">depth map</figcaption>
  </figure>
  <figure> 
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/t2i-multi-rabbit.png" width="300" alt="Generated image (ControlNet + prompt)"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
Steven Liu's avatar
Steven Liu committed
157
</div>