<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Stable unCLIP

Stable unCLIP checkpoints are fine-tuned from [Stable Diffusion 2.1](./stable_diffusion_2) checkpoints to condition on CLIP image embeddings.
Stable unCLIP still conditions on text embeddings as well. Given the two separate conditionings, stable unCLIP can be used
for text-guided image variation. When combined with an unCLIP prior, it can also be used for full text-to-image generation.

## Tips

Stable unCLIP takes `noise_level` as input during inference, which determines how much noise is added
to the image embeddings. A higher `noise_level` increases variation in the final un-noised images. By default,
no additional noise is added to the image embeddings (`noise_level = 0`).
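For example, once a pipeline has been loaded (as in the snippets below), a higher `noise_level` can simply be
passed to the pipeline call. A minimal sketch; the value 100 is purely illustrative:

```python
# Add extra noise to the CLIP image embeddings for more varied outputs;
# `noise_level=0` (the default) leaves the embeddings un-noised.
images = pipe(prompt, noise_level=100).images
```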

### Available checkpoints

TODO

### Text-to-Image Generation

```python
import torch
from diffusers import StableUnCLIPPipeline

pipe = StableUnCLIPPipeline.from_pretrained(
    "fusing/stable-unclip-2-1-l", torch_dtype=torch.float16
)  # TODO update model path
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
images = pipe(prompt).images
images[0].save("astronaut_horse.png")
```
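Both pipelines expose the usual memory helpers listed in the API reference below; enabling them before inference
can help on smaller GPUs. A minimal sketch:

```python
# Optional memory optimizations (documented in the API reference below).
pipe.enable_attention_slicing()  # trade some speed for lower peak memory
pipe.enable_xformers_memory_efficient_attention()  # requires the xformers package
```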


### Text-guided Image-to-Image Variation

```python
import requests
import torch
from PIL import Image
from io import BytesIO

from diffusers import StableUnCLIPImg2ImgPipeline

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "fusing/stable-unclip-2-1-l-img2img", torch_dtype=torch.float16
)  # TODO update model path
pipe = pipe.to("cuda")

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"

# Note: the input image is the first argument for the img2img pipeline.
images = pipe(init_image, prompt=prompt).images
images[0].save("fantasy_landscape.png")
```
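Since the text and image conditionings are separate, passing an empty prompt should yield variations driven
mostly by the image embedding alone; this is an assumption, so verify against the `__call__` reference below:

```python
# Hypothetical usage: image variation without text guidance (empty prompt).
images = pipe(init_image, prompt="").images
images[0].save("landscape_variation.png")
```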

### StableUnCLIPPipeline

[[autodoc]] StableUnCLIPPipeline
	- all
	- __call__
	- enable_attention_slicing
	- disable_attention_slicing
	- enable_vae_slicing
	- disable_vae_slicing
	- enable_xformers_memory_efficient_attention
	- disable_xformers_memory_efficient_attention


### StableUnCLIPImg2ImgPipeline

[[autodoc]] StableUnCLIPImg2ImgPipeline
	- all
	- __call__
	- enable_attention_slicing
	- disable_attention_slicing
	- enable_vae_slicing
	- disable_vae_slicing
	- enable_xformers_memory_efficient_attention
	- disable_xformers_memory_efficient_attention