"git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "2a8cf8e39fca6fb3e92b3c36e15358dbdc404de7"
Unverified Commit abc47dec authored by Patrick von Platen's avatar Patrick von Platen Committed by GitHub
Browse files

[SDXL, Docs] Textual inversion (#5039)



* [SDXL, Docs] Textual inversion

* Update docs/source/en/using-diffusers/sdxl.md

* finish

* Apply suggestions from code review
Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>

---------
Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
parent 941473a1
...@@ -397,6 +397,8 @@ image = pipeline(prompt=prompt, prompt_2=prompt_2).images[0] ...@@ -397,6 +397,8 @@ image = pipeline(prompt=prompt, prompt_2=prompt_2).images[0]
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-double-prompt.png" alt="generated image of an astronaut in a jungle in the style of a van gogh painting"/> <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-double-prompt.png" alt="generated image of an astronaut in a jungle in the style of a van gogh painting"/>
</div> </div>
The dual text-encoders also support textual inversion embeddings that need to be loaded separately as explained in the [SDXL textual inversion](textual_inversion_inference#stable-diffusion-xl] section.
## Optimizations ## Optimizations
SDXL is a large model, and you may need to optimize memory to get it to run on your hardware. Here are some tips to save memory and speed up inference. SDXL is a large model, and you may need to optimize memory to get it to run on your hardware. Here are some tips to save memory and speed up inference.
...@@ -426,4 +428,4 @@ SDXL is a large model, and you may need to optimize memory to get it to run on y ...@@ -426,4 +428,4 @@ SDXL is a large model, and you may need to optimize memory to get it to run on y
## Other resources ## Other resources
If you're interested in experimenting with a minimal version of the [`UNet2DConditionModel`] used in SDXL, take a look at the [minSDXL](https://github.com/cloneofsimo/minSDXL) implementation which is written in PyTorch and directly compatible with 🤗 Diffusers. If you're interested in experimenting with a minimal version of the [`UNet2DConditionModel`] used in SDXL, take a look at the [minSDXL](https://github.com/cloneofsimo/minSDXL) implementation which is written in PyTorch and directly compatible with 🤗 Diffusers.
\ No newline at end of file
...@@ -28,6 +28,8 @@ from diffusers.utils import make_image_grid ...@@ -28,6 +28,8 @@ from diffusers.utils import make_image_grid
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
``` ```
## Stable Diffusion 1 and 2
Pick a Stable Diffusion checkpoint and a pre-learned concept from the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer): Pick a Stable Diffusion checkpoint and a pre-learned concept from the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer):
```py ```py
...@@ -69,3 +71,50 @@ grid ...@@ -69,3 +71,50 @@ grid
<div class="flex justify-center"> <div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/textual_inversion_inference.png"> <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/textual_inversion_inference.png">
</div> </div>
## Stable Diffusion XL
Stable Diffusion XL (SDXL) can also use textual inversion vectors for inference. In contrast to Stable Diffusion 1 and 2, SDXL has two text encoders so you'll need two textual inversion embeddings - one for each text encoder model.
Let's download the SDXL textual inversion embeddings and have a closer look at it's structure:
```py
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
file = hf_hub_download("dn118/unaestheticXL", filename="unaestheticXLv31.safetensors")
state_dict = load_file(file)
state_dict
```
```
{'clip_g': tensor([[ 0.0077, -0.0112, 0.0065, ..., 0.0195, 0.0159, 0.0275],
...,
[-0.0170, 0.0213, 0.0143, ..., -0.0302, -0.0240, -0.0362]],
'clip_l': tensor([[ 0.0023, 0.0192, 0.0213, ..., -0.0385, 0.0048, -0.0011],
...,
[ 0.0475, -0.0508, -0.0145, ..., 0.0070, -0.0089, -0.0163]],
```
There are two tensors, `"clip-g"` and `"clip-l"`.
`"clip-g"` corresponds to the bigger text encoder in SDXL and refers to
`pipe.text_encoder_2` and `"clip-l"` refers to `pipe.text_encoder`.
Now you can load each tensor separately by passing them along with the correct text encoder and tokenizer
to [`~loaders.TextualInversionLoaderMixin.load_textual_inversion`]:
```py
from diffusers import AutoPipelineForText2Image
import torch
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", variant="fp16", torch_dtype=torch.float16)
pipe.to("cuda")
pipe.load_textual_inversion(state_dict["clip_g"], token="unaestheticXLv31", text_encoder=pipe.text_encoder_2, tokenizer=pipe.tokenizer_2)
pipe.load_textual_inversion(state_dict["clip_l"], token="unaestheticXLv31", text_encoder=pipe.text_encoder, tokenizer=pipe.tokenizer)
# the embedding should be used as a negative embedding, so we pass it as a negative prompt
generator = torch.Generator().manual_seed(33)
image = pipe("a woman standing in front of a mountain", negative_prompt="unaestheticXLv31", generator=generator).images[0]
```
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment