"vscode:/vscode.git/clone" did not exist on "320506c75ad7d40c7d9fc6db5a45955906c5f729"
custom_pipeline_overview.md 13.3 KB
Newer Older
1
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Patrick von Platen's avatar
Patrick von Platen committed
2
3
4
5
6
7
8
9
10
11
12

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

13
# Load community pipelines and components
Patrick von Platen's avatar
Patrick von Platen committed
14

15
16
[[open-in-colab]]

17
18
## Community pipelines

Steven Liu's avatar
Steven Liu committed
19
Community pipelines are any [`DiffusionPipeline`] class that are different from the original paper implementation (for example, the [`StableDiffusionControlNetPipeline`] corresponds to the [Text-to-Image Generation with ControlNet Conditioning](https://arxiv.org/abs/2302.05543) paper). They provide additional functionality or extend the original implementation of a pipeline.
Patrick von Platen's avatar
Patrick von Platen committed
20

Steven Liu's avatar
Steven Liu committed
21
There are many cool community pipelines like [Marigold Depth Estimation](https://github.com/huggingface/diffusers/tree/main/examples/community#marigold-depth-estimation) or [InstantID](https://github.com/huggingface/diffusers/tree/main/examples/community#instantid-pipeline), and you can find all the official community pipelines [here](https://github.com/huggingface/diffusers/tree/main/examples/community).
Patrick von Platen's avatar
Patrick von Platen committed
22

Steven Liu's avatar
Steven Liu committed
23
There are two types of community pipelines, those stored on the Hugging Face Hub and those stored on Diffusers GitHub repository. Hub pipelines are completely customizable (scheduler, models, pipeline code, etc.) while Diffusers GitHub pipelines are only limited to custom pipeline code. Refer to this [table](./contribute_pipeline#share-your-pipeline) for a more detailed comparison of Hub vs GitHub community pipelines.
Patrick von Platen's avatar
Patrick von Platen committed
24

Steven Liu's avatar
Steven Liu committed
25
26
<hfoptions id="community">
<hfoption id="Hub pipelines">
Patrick von Platen's avatar
Patrick von Platen committed
27

Steven Liu's avatar
Steven Liu committed
28
To load a Hugging Face Hub community pipeline, pass the repository id of the community pipeline to the `custom_pipeline` argument and the model repository where you'd like to load the pipeline weights and components from. For example, the example below loads a dummy pipeline from [hf-internal-testing/diffusers-dummy-pipeline](https://huggingface.co/hf-internal-testing/diffusers-dummy-pipeline/blob/main/pipeline.py) and the pipeline weights and components from [google/ddpm-cifar10-32](https://huggingface.co/google/ddpm-cifar10-32):
29

Steven Liu's avatar
Steven Liu committed
30
31
> [!WARNING]
> By loading a community pipeline from the Hugging Face Hub, you are trusting that the code you are loading is safe. Make sure to inspect the code online before loading and running it automatically!
32
33

```py
Patrick von Platen's avatar
Patrick von Platen committed
34
35
36
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
37
    "google/ddpm-cifar10-32", custom_pipeline="hf-internal-testing/diffusers-dummy-pipeline", use_safetensors=True
Patrick von Platen's avatar
Patrick von Platen committed
38
39
40
)
```

Steven Liu's avatar
Steven Liu committed
41
42
43
44
</hfoption>
<hfoption id="GitHub pipelines">

To load a GitHub community pipeline, pass the repository id of the community pipeline to the `custom_pipeline` argument and the model repository where you you'd like to load the pipeline weights and components from. You can also load model components directly. The example below loads the community [CLIP Guided Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#clip-guided-stable-diffusion) pipeline and the CLIP model components.
Patrick von Platen's avatar
Patrick von Platen committed
45

46
```py
Patrick von Platen's avatar
Patrick von Platen committed
47
from diffusers import DiffusionPipeline
48
from transformers import CLIPImageProcessor, CLIPModel
Patrick von Platen's avatar
Patrick von Platen committed
49
50
51

clip_model_id = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"

52
feature_extractor = CLIPImageProcessor.from_pretrained(clip_model_id)
Patrick von Platen's avatar
Patrick von Platen committed
53
54
55
clip_model = CLIPModel.from_pretrained(clip_model_id)

pipeline = DiffusionPipeline.from_pretrained(
apolinario's avatar
apolinario committed
56
    "runwayml/stable-diffusion-v1-5",
Patrick von Platen's avatar
Patrick von Platen committed
57
58
59
    custom_pipeline="clip_guided_stable_diffusion",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
60
    use_safetensors=True,
Patrick von Platen's avatar
Patrick von Platen committed
61
62
63
)
```

Steven Liu's avatar
Steven Liu committed
64
65
66
</hfoption>
</hfoptions>

67
68
### Load from a local file

Steven Liu's avatar
Steven Liu committed
69
Community pipelines can also be loaded from a local file if you pass a file path instead. The path to the passed directory must contain a pipeline.py file that contains the pipeline class.
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87

```py
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="./path/to/pipeline_directory/",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    use_safetensors=True,
)
```

### Load from a specific version

By default, community pipelines are loaded from the latest stable version of Diffusers. To load a community pipeline from another version, use the `custom_revision` parameter.

<hfoptions id="version">
<hfoption id="main">

Steven Liu's avatar
Steven Liu committed
88
For example, to load from the main branch:
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103

```py
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="clip_guided_stable_diffusion",
    custom_revision="main",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    use_safetensors=True,
)
```

</hfoption>
<hfoption id="older version">

Steven Liu's avatar
Steven Liu committed
104
For example, to load from a previous version of Diffusers like v0.25.0:
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119

```py
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="clip_guided_stable_diffusion",
    custom_revision="v0.25.0",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    use_safetensors=True,
)
```

</hfoption>
</hfoptions>

Steven Liu's avatar
Steven Liu committed
120
### Load with from_pipe
121

Steven Liu's avatar
Steven Liu committed
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
Community pipelines can also be loaded with the [`~DiffusionPipeline.from_pipe`] method which allows you to load and reuse multiple pipelines without any additional memory overhead (learn more in the [Reuse a pipeline](./loading#reuse-a-pipeline) guide). The memory requirement is determined by the largest single pipeline loaded.

For example, let's load a community pipeline that supports [long prompts with weighting](https://github.com/huggingface/diffusers/tree/main/examples/community#long-prompt-weighting-stable-diffusion) from a Stable Diffusion pipeline.

```py
import torch
from diffusers import DiffusionPipeline

pipe_sd = DiffusionPipeline.from_pretrained("emilianJR/CyberRealistic_V3", torch_dtype=torch.float16)
pipe_sd.to("cuda")
# load long prompt weighting pipeline
pipe_lpw = DiffusionPipeline.from_pipe(
    pipe_sd,
    custom_pipeline="lpw_stable_diffusion",
).to("cuda")

prompt = "cat, hiding in the leaves, ((rain)), zazie rainyday, beautiful eyes, macro shot, colorful details, natural lighting, amazing composition, subsurface scattering, amazing textures, filmic, soft light, ultra-detailed eyes, intricate details, detailed texture, light source contrast, dramatic shadows, cinematic light, depth of field, film grain, noise, dark background, hyperrealistic dslr film still, dim volumetric cinematic lighting"
neg_prompt = "(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers:1.4), (deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
generator = torch.Generator(device="cpu").manual_seed(20)
out_lpw = pipe_lpw(
    prompt, 
    negative_prompt=neg_prompt, 
    width=512,
    height=512,
    max_embeddings_multiples=3, 
    num_inference_steps=50,
    generator=generator,
    ).images[0]
out_lpw
```

<div class="flex gap-4">
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/from_pipe_lpw.png" />
    <figcaption class="mt-2 text-center text-sm text-gray-500">Stable Diffusion with long prompt weighting</figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/from_pipe_non_lpw.png" />
    <figcaption class="mt-2 text-center text-sm text-gray-500">Stable Diffusion</figcaption>
  </div>
</div>
163
164
165

## Community components

166
Community components allow users to build pipelines that may have customized components that are not a part of Diffusers. If your pipeline has custom components that Diffusers doesn't already support, you need to provide their implementations as Python modules. These customized components could be a VAE, UNet, and scheduler. In most cases, the text encoder is imported from the Transformers library. The pipeline code itself can also be customized.
167

168
This section shows how users should use community components to build a community pipeline.
169

Steven Liu's avatar
Steven Liu committed
170
You'll use the [showlab/show-1-base](https://huggingface.co/showlab/show-1-base) pipeline checkpoint as an example.
171

172
1. Import and load the text encoder from Transformers:
173
174
175
176
177
178
179
180
181

```python
from transformers import T5Tokenizer, T5EncoderModel

pipe_id = "showlab/show-1-base"
tokenizer = T5Tokenizer.from_pretrained(pipe_id, subfolder="tokenizer")
text_encoder = T5EncoderModel.from_pretrained(pipe_id, subfolder="text_encoder")
```

182
2. Load a scheduler:
183
184
185
186
187
188
189

```python
from diffusers import DPMSolverMultistepScheduler

scheduler = DPMSolverMultistepScheduler.from_pretrained(pipe_id, subfolder="scheduler")
```

190
3. Load an image processor:
191
192
193
194
195
196
197

```python
from transformers import CLIPFeatureExtractor

feature_extractor = CLIPFeatureExtractor.from_pretrained(pipe_id, subfolder="feature_extractor")
```

198
199
200
201
202
203
<Tip warning={true}>

In steps 4 and 5, the custom [UNet](https://github.com/showlab/Show-1/blob/main/showone/models/unet_3d_condition.py) and [pipeline](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/unet/showone_unet_3d_condition.py) implementation must match the format shown in their files for this example to work.

</Tip>

Steven Liu's avatar
Steven Liu committed
204
4. Now you'll load a [custom UNet](https://github.com/showlab/Show-1/blob/main/showone/models/unet_3d_condition.py), which in this example, has already been implemented in [showone_unet_3d_condition.py](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/unet/showone_unet_3d_condition.py) for your convenience. You'll notice the [`UNet3DConditionModel`] class name is changed to `ShowOneUNet3DConditionModel` because [`UNet3DConditionModel`] already exists in Diffusers. Any components needed for the `ShowOneUNet3DConditionModel` class should be placed in showone_unet_3d_condition.py.
205

Steven Liu's avatar
Steven Liu committed
206
    Once this is done, you can initialize the UNet:
207

Steven Liu's avatar
Steven Liu committed
208
209
    ```python
    from showone_unet_3d_condition import ShowOneUNet3DConditionModel
210

Steven Liu's avatar
Steven Liu committed
211
212
    unet = ShowOneUNet3DConditionModel.from_pretrained(pipe_id, subfolder="unet")
    ```
213

Steven Liu's avatar
Steven Liu committed
214
5. Finally, you'll load the custom pipeline code. For this example, it has already been created for you in [pipeline_t2v_base_pixel.py](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/pipeline_t2v_base_pixel.py). This script contains a custom `TextToVideoIFPipeline` class for generating videos from text. Just like the custom UNet, any code needed for the custom pipeline to work should go in pipeline_t2v_base_pixel.py.
215

216
Once everything is in place, you can initialize the `TextToVideoIFPipeline` with the `ShowOneUNet3DConditionModel`:
217
218
219
220
221
222

```python
from pipeline_t2v_base_pixel import TextToVideoIFPipeline
import torch

pipeline = TextToVideoIFPipeline(
223
224
225
226
    unet=unet,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    scheduler=scheduler,
227
228
229
230
231
232
    feature_extractor=feature_extractor
)
pipeline = pipeline.to(device="cuda")
pipeline.torch_dtype = torch.float16
```

233
Push the pipeline to the Hub to share with the community!
234
235
236
237
238

```python
pipeline.push_to_hub("custom-t2v-pipeline")
```

Steven Liu's avatar
Steven Liu committed
239
After the pipeline is successfully pushed, you need to make a few changes:
240

Steven Liu's avatar
Steven Liu committed
241
242
243
1. Change the `_class_name` attribute in [model_index.json](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/model_index.json#L2) to `"pipeline_t2v_base_pixel"` and `"TextToVideoIFPipeline"`.
2. Upload `showone_unet_3d_condition.py` to the [unet](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/unet/showone_unet_3d_condition.py) subfolder.
3. Upload `pipeline_t2v_base_pixel.py` to the pipeline [repository](https://huggingface.co/sayakpaul/show-1-base-with-code/tree/main).
244

Steven Liu's avatar
Steven Liu committed
245
246
247
248
To run inference, add the `trust_remote_code` argument while initializing the pipeline to handle all the "magic" behind the scenes.

> [!WARNING]
> As an additional precaution with `trust_remote_code=True`, we strongly encourage you to pass a commit hash to the `revision` parameter in [`~DiffusionPipeline.from_pretrained`] to make sure the code hasn't been updated with some malicious new lines of code (unless you fully trust the model owners).
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273

```python
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "<change-username>/<change-id>", trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")

prompt = "hello"

# Text embeds
prompt_embeds, negative_embeds = pipeline.encode_prompt(prompt)

# Keyframes generation (8x64x40, 2fps)
video_frames = pipeline(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    num_frames=8,
    height=40,
    width=64,
    num_inference_steps=2,
    guidance_scale=9.0,
    output_type="pt"
).frames
274
275
```

Steven Liu's avatar
Steven Liu committed
276
As an additional reference, take a look at the repository structure of [stabilityai/japanese-stable-diffusion-xl](https://huggingface.co/stabilityai/japanese-stable-diffusion-xl/) which also uses the `trust_remote_code` feature.
277
278
279
280
281
282
283
284
285

```python
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/japanese-stable-diffusion-xl", trust_remote_code=True
)
pipeline.to("cuda")
286
```