Unverified commit 7942bb8d, authored by M. Tolga Cangöz and committed by GitHub

[`Docs`] Fix typos, improve, and update the Using Diffusers task pages (#5611)



* Fix typos, improve, and update; Kandinsky doesn't accept the fp16 variant due to deprecation; the ogkalu and kohbanye checkpoints don't ship safetensors weights; add make_image_grid for better visualization

* Update inpaint.md

* Remove erroneous Space

* Update docs/source/en/using-diffusers/conditional_image_generation.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update img2img.md

* load_image() already converts to RGB

* Update depth2img.md

* Update img2img.md

* Update inpaint.md

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
parent aab6de22
@@ -30,6 +30,7 @@ You can generate images from a prompt in 🤗 Diffusers in two steps:
```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
```
@@ -42,6 +43,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
```py
image = pipeline(
    "stained glass of darth vader, backlight, centered composition, masterpiece, photorealistic, 8k"
).images[0]
image
```

<div class="flex justify-center">
@@ -65,6 +67,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
```py
).to("cuda")

generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
image
```

### Stable Diffusion XL

@@ -80,6 +83,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
```py
).to("cuda")

generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
image
```

### Kandinsky 2.2

@@ -93,15 +97,16 @@ from diffusers import AutoPipelineForText2Image
```py
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
image
```

### ControlNet

ControlNet models are auxiliary models or adapters that are finetuned on top of text-to-image models, such as [Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5). Using ControlNet models in combination with text-to-image models offers diverse options for more explicit control over how to generate an image. With ControlNet, you add an additional conditioning input image to the model. For example, if you provide an image of a human pose (usually represented as multiple keypoints that are connected into a skeleton) as a conditioning input, the model generates an image that follows the pose of the image. Check out the more in-depth [ControlNet](controlnet) guide to learn more about other conditioning inputs and how to use them.

In this example, let's condition the ControlNet with a human pose estimation image. Load the ControlNet model pretrained on human pose estimations:
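The diff hunks below jump straight to the generation call, so the loading step referenced here isn't shown. A minimal sketch of what it might look like follows; the OpenPose checkpoint name is an assumption based on the public lllyasviel ControlNet collection, not something taken from this diff.

```py
from diffusers import ControlNetModel, AutoPipelineForText2Image
import torch

# Assumed checkpoint; the guide's actual choice isn't visible in this excerpt.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
# pose_image (used below) would be a pose-estimation image loaded with diffusers.utils.load_image(...)
```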
@@ -124,6 +129,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
```py
).to("cuda")

generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", image=pose_image, generator=generator).images[0]
image
```

<div class="flex flex-row gap-4">
@@ -163,6 +169,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
```py
image = pipeline(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", height=768, width=512
).images[0]
image
```

<div class="flex justify-center">
@@ -171,7 +178,7 @@ image = pipeline(
<Tip warning={true}>

Other models may have different default image sizes depending on the image sizes in the training dataset. For example, SDXL's default image size is 1024x1024 and using lower `height` and `width` values may result in lower quality images. Make sure you check the model's API reference first!

</Tip>
@@ -189,6 +196,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
```py
image = pipeline(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", guidance_scale=3.5
).images[0]
image
```

<div class="flex flex-row gap-4">
@@ -221,16 +229,17 @@ image = pipeline(
```py
    prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy",
).images[0]
image
```

<div class="flex flex-row gap-4">
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-neg-prompt-1.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-neg-prompt-2.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">negative_prompt = "astronaut"</figcaption>
  </div>
</div>
@@ -252,6 +261,7 @@ image = pipeline(
```py
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    generator=generator,
).images[0]
image
```

## Control image generation

@@ -278,14 +288,14 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
```py
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipeline(
    prompt_embeds=prompt_embeds, # generated from Compel
    negative_prompt_embeds=negative_prompt_embeds, # generated from Compel
).images[0]
```
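The `prompt_embeds` and `negative_prompt_embeds` above come from the Compel library, which the excerpt never shows being set up. As a rough sketch only (the prompt weighting syntax and constructor arguments are assumptions based on Compel's public documentation, not part of this diff), the embeddings could be built like this:

```py
# Sketch: build weighted prompt embeddings with Compel for the pipeline defined above.
from compel import Compel

compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
prompt_embeds = compel("Astronaut in a jungle, (cold color palette)++, muted colors, detailed, 8k")
negative_prompt_embeds = compel("ugly, deformed, disfigured, poor details, bad anatomy")
```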
### ControlNet

As you saw in the [ControlNet](#controlnet) section, these models offer a more flexible and accurate way to generate images by incorporating an additional conditioning image input. Each ControlNet model is pretrained on a particular type of conditioning image to generate new images that resemble it. For example, if you take a ControlNet model pretrained on depth maps, you can give the model a depth map as a conditioning input and it'll generate an image that preserves the spatial information in it. This is quicker and easier than specifying the depth information in a prompt. You can even combine multiple conditioning inputs with a [MultiControlNet](controlnet#multicontrolnet)!

There are many types of conditioning inputs you can use, and 🤗 Diffusers supports ControlNet for Stable Diffusion and SDXL models. Take a look at the more comprehensive [ControlNet](controlnet) guide to learn how you can use these models.
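To make the idea concrete, here is a minimal, hedged sketch of depth conditioning with the text-to-image pipeline. It reuses the depth checkpoint and control image that appear later in this diff; treat it as an illustration rather than the guide's own example.

```py
from diffusers import ControlNetModel, AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

# Depth ControlNet and control image borrowed from the img2img ControlNet example further down.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
).to("cuda")

depth_image = load_image("https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png")
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", image=depth_image).images[0]
image
```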
@@ -300,7 +310,7 @@ from diffusers import AutoPipelineForText2Image
```py
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16").to("cuda")
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```

For more tips on how to optimize your code to save memory and speed up inference, read the [Memory and speed](../optimization/fp16) and [Torch 2.0](../optimization/torch2.0) guides.
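For instance, the memory-saving helpers used throughout the rest of this diff can be combined with the compiled UNet above; a brief sketch, under the same assumptions as the snippet it extends:

```py
# Sketch: the offloading and attention options used elsewhere in these guides.
pipeline.enable_model_cpu_offload()
# remove the following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
```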
@@ -20,12 +20,10 @@ Start by creating an instance of the [`StableDiffusionDepth2ImgPipeline`]:
```python
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from diffusers.utils import load_image, make_image_grid

pipeline = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
    use_safetensors=True,
```
@@ -36,22 +34,13 @@ Now pass your prompt to the pipeline. You can also pass a `negative_prompt` to p
```python
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = load_image(url)
prompt = "two tigers"
negative_prompt = "bad, deformed, ugly, bad anatomy"

image = pipeline(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=0.7).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```

| Input | Output |
|-------|--------|
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/coco-cats.png" width="500"/> | <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/depth2img-tigers.png" width="500"/> |
@@ -21,13 +21,15 @@ With 🤗 Diffusers, this is as easy as 1-2-3:
1. Load a checkpoint into the [`AutoPipelineForImage2Image`] class; this pipeline automatically handles loading the correct pipeline class based on the checkpoint:

```py
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
```
@@ -48,7 +50,7 @@ init_image = load_image("https://huggingface.co/datasets/huggingface/documentati
```py
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
image = pipeline(prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```

<div class="flex gap-4">
@@ -72,27 +74,25 @@ Stable Diffusion v1.5 is a latent diffusion model initialized from an earlier ch
```py
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```

<div class="flex gap-4">
@@ -112,27 +112,25 @@ SDXL is a more powerful version of the Stable Diffusion model. It uses a larger
```py
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-sdxl-init.png"
init_image = load_image(url)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, strength=0.5).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```

<div class="flex gap-4">
@@ -154,27 +152,25 @@ The simplest way to use Kandinsky 2.2 is:
```py
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```

<div class="flex gap-4">
@@ -199,32 +195,29 @@ There are several important parameters you can configure in the pipeline that'll
- 📈 a higher `strength` value gives the model more "creativity" to generate an image that's different from the initial image; a `strength` value of 1.0 means the initial image is more or less ignored
- 📉 a lower `strength` value means the generated image is more similar to the initial image

The `strength` and `num_inference_steps` parameters are related because `strength` determines the number of noise steps to add. For example, if `num_inference_steps` is 50 and `strength` is 0.8, the pipeline adds 40 (50 * 0.8) steps of noise to the initial image and then denoises for 40 steps to get the newly generated image.

```py
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, strength=0.8).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```
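To see the effect of `strength` directly, it can help to render several values side by side. A small sketch that reuses the objects defined in the block above (the specific values are arbitrary):

```py
# Sketch: compare a few strength values on the same initial image.
images = [pipeline(prompt, image=init_image, strength=s).images[0] for s in (0.4, 0.6, 0.8)]
make_image_grid([init_image, *images], rows=1, cols=4)
```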
<div class="flex flex-row gap-4"> <div class="flex flex-row gap-4">
...@@ -250,27 +243,25 @@ You can combine `guidance_scale` with `strength` for even more precise control o ...@@ -250,27 +243,25 @@ You can combine `guidance_scale` with `strength` for even more precise control o
```py ```py
import torch import torch
import requests
from PIL import Image
from io import BytesIO
from diffusers import AutoPipelineForImage2Image from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained( pipeline = AutoPipelineForImage2Image.from_pretrained(
"runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda") ).to("cuda")
pipeline.enable_model_cpu_offload() pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention() pipeline.enable_xformers_memory_efficient_attention()
# prepare image # prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png" url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
response = requests.get(url) init_image = load_image(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline # pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, guidance_scale=8.0).images[0] image = pipeline(prompt, image=init_image, guidance_scale=8.0).images[0]
image make_image_grid([init_image, image], rows=1, cols=2)
``` ```
<div class="flex flex-row gap-4"> <div class="flex flex-row gap-4">
@@ -294,38 +285,36 @@ A negative prompt conditions the model to *not* include things in an image, and
```py
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"

# pass prompt and image to pipeline
image = pipeline(prompt, negative_prompt=negative_prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```

<div class="flex flex-row gap-4">
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-negative-1.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-negative-2.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">negative_prompt = "jungle"</figcaption>
  </div>
</div>
@@ -342,52 +331,54 @@ Start by generating an image with the text-to-image pipeline:
```py
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image
import torch
from diffusers.utils import make_image_grid

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

text2image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k").images[0]
text2image
```

Now you can pass this generated image to the image-to-image pipeline:

```py
pipeline = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image2image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", image=text2image).images[0]
make_image_grid([text2image, image2image], rows=1, cols=2)
```
### Image-to-image-to-image

You can also chain multiple image-to-image pipelines together to create more interesting images. This can be useful for iteratively performing style transfer on an image, generating short GIFs, restoring color to an image, or restoring missing areas of an image.

Start by generating an image:

```py
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
```
@@ -404,10 +395,11 @@ It is important to specify `output_type="latent"` in the pipeline to keep all th
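The hunk above skips over the generation call it refers to. Based on the variables defined in the surrounding code, that elided step presumably looks like the sketch below; the exact line isn't shown in this diff.

```py
# Sketch of the elided step: keep the output in latent space so it can be fed
# directly to the next pipeline without an extra decode/encode round trip.
image = pipeline(prompt, image=init_image, output_type="latent").images[0]
```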
Pass the latent output from this pipeline to the next pipeline to generate an image in a [comic book art style](https://huggingface.co/ogkalu/Comic-Diffusion):

```py
pipeline = AutoPipelineForImage2Image.from_pretrained(
    "ogkalu/Comic-Diffusion", torch_dtype=torch.float16
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# need to include the token "charliebo artstyle" in the prompt to use this checkpoint
```
@@ -418,14 +410,15 @@ Repeat one more time to generate the final image in a [pixel art style](https://
```py
pipeline = AutoPipelineForImage2Image.from_pretrained(
    "kohbanye/pixel-art-style", torch_dtype=torch.float16
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# need to include the token "pixelartstyle" in the prompt to use this checkpoint
image = pipeline("Astronaut in a jungle, pixelartstyle", image=image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```
### Image-to-upscaler-to-super-resolution

@@ -436,21 +429,19 @@ Start with an image-to-image pipeline:
```py
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
```
@@ -467,7 +458,9 @@ It is important to specify `output_type="latent"` in the pipeline to keep all th
Chain it to an upscaler pipeline to increase the image resolution:

```py
from diffusers import StableDiffusionLatentUpscalePipeline

upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
    "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
upscaler.enable_model_cpu_offload()
```
@@ -479,14 +472,16 @@ image_2 = upscaler(prompt, image=image_1, output_type="latent").images[0]
Finally, chain it to a super-resolution pipeline to further enhance the resolution:

```py
from diffusers import StableDiffusionUpscalePipeline

super_res = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
super_res.enable_model_cpu_offload()
super_res.enable_xformers_memory_efficient_attention()

image_3 = super_res(prompt, image=image_2).images[0]
make_image_grid([init_image, image_3.resize((512, 512))], rows=1, cols=2)
```
## Control image generation

@@ -504,13 +499,14 @@ from diffusers import AutoPipelineForImage2Image
```py
import torch

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt_embeds=prompt_embeds, # generated from Compel
    negative_prompt_embeds=negative_prompt_embeds, # generated from Compel
    image=init_image,
).images[0]
```
@@ -522,19 +518,20 @@ ControlNets provide a more flexible and accurate way to control image generation
For example, let's condition an image with a depth map to keep the spatial information in the image.

```py
from diffusers.utils import load_image, make_image_grid

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
init_image = init_image.resize((958, 960)) # resize to depth image dimensions
depth_image = load_image("https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png")
make_image_grid([init_image, depth_image], rows=1, cols=2)
```

Load a ControlNet model conditioned on depth maps and the [`AutoPipelineForImage2Image`]:

```py
from diffusers import ControlNetModel, AutoPipelineForImage2Image
import torch

controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
```
@@ -542,6 +539,7 @@ pipeline = AutoPipelineForImage2Image.from_pretrained(
```py
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
```
@@ -549,8 +547,8 @@ Now generate a new image conditioned on the depth map, initial image, and prompt
```py
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image_control_net = pipeline(prompt, image=init_image, control_image=depth_image).images[0]
make_image_grid([init_image, depth_image, image_control_net], rows=1, cols=3)
```

<div class="flex flex-row gap-4">
@@ -575,13 +573,14 @@ pipeline = AutoPipelineForImage2Image.from_pretrained(
```py
    "nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16,
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "elden ring style astronaut in a jungle" # include the token "elden ring style" in the prompt
negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"

image_elden_ring = pipeline(prompt, negative_prompt=negative_prompt, image=image_control_net, strength=0.45, guidance_scale=10.5).images[0]
make_image_grid([init_image, depth_image, image_control_net, image_elden_ring], rows=2, cols=2)
```

<div class="flex justify-center">
@@ -597,10 +596,10 @@ Running diffusion models is computationally expensive and intensive, but with a
```diff
+ pipeline.enable_xformers_memory_efficient_attention()
```

With [`torch.compile`](../optimization/torch2.0#torchcompile), you can boost your inference speed even more by wrapping your UNet with it:

```py
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```

To learn more, take a look at the [Reduce memory usage](../optimization/memory) and [Torch 2.0](../optimization/torch2.0) guides.
@@ -23,12 +23,13 @@ With 🤗 Diffusers, here is how you can do inpainting:
```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
```
@@ -41,8 +42,8 @@ You'll notice throughout the guide, we use [`~DiffusionPipeline.enable_model_cpu
2. Load the base and mask images:

```py
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")
```

3. Create a prompt to inpaint the image with and pass it to the pipeline with the base and mask images:

@@ -51,6 +52,7 @@ mask_image = load_image("https://huggingface.co/datasets/huggingface/documentati
```py
prompt = "a black cat with glowing eyes, cute, adorable, disney, pixar, highly detailed, 8k"
negative_prompt = "bad anatomy, deformed, ugly, disfigured"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```

<div class="flex gap-4">
@@ -58,6 +60,10 @@ image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_imag
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">base image</figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">mask image</figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint-cat.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">generated image</figcaption>
@@ -79,7 +85,7 @@ Upload a base image to inpaint on and use the sketch tool to draw a mask. Once y
## Popular models ## Popular models
[Stable Diffusion Inpainting](https://huggingface.co/runwayml/stable-diffusion-inpainting), [Stable Diffusion XL (SDXL) Inpainting](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1), and [Kandinsky 2.2](https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder-inpaint) are among the most popular models for inpainting. SDXL typically produces higher resolution images than Stable Diffusion v1.5, and Kandinsky 2.2 is also capable of generating high-quality images. [Stable Diffusion Inpainting](https://huggingface.co/runwayml/stable-diffusion-inpainting), [Stable Diffusion XL (SDXL) Inpainting](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1), and [Kandinsky 2.2 Inpainting](https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder-inpaint) are among the most popular models for inpainting. SDXL typically produces higher resolution images than Stable Diffusion v1.5, and Kandinsky 2.2 is also capable of generating high-quality images.
### Stable Diffusion Inpainting ### Stable Diffusion Inpainting
...@@ -88,21 +94,23 @@ Stable Diffusion Inpainting is a latent diffusion model finetuned on 512x512 ima ...@@ -88,21 +94,23 @@ Stable Diffusion Inpainting is a latent diffusion model finetuned on 512x512 ima
```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```
### Stable Diffusion XL (SDXL) Inpainting

...@@ -112,21 +120,23 @@ SDXL is a larger and more powerful version of Stable Diffusion v1.5. This model
```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```
### Kandinsky 2.2 Inpainting

...@@ -136,21 +146,23 @@ The Kandinsky model family is similar to SDXL because it uses two models as well
```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```
<div class="flex flex-row gap-4"> <div class="flex flex-row gap-4">
...@@ -186,20 +198,22 @@ Image features - like quality and "creativity" - are dependent on pipeline param ...@@ -186,20 +198,22 @@ Image features - like quality and "creativity" - are dependent on pipeline param
```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.6).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```
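
To see how `strength` changes the output, you can reuse the pipeline, prompt, and images from the snippet above and sweep a few values. This is a minimal sketch; the values and grid layout are only illustrative:

```py
# reuse pipeline, prompt, init_image, and mask_image from above;
# lower strength keeps more of the base image, higher strength repaints more
images = [
    pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=s).images[0]
    for s in [0.4, 0.6, 1.0]
]
make_image_grid([init_image] + images, rows=1, cols=4)
```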
<div class="flex flex-row gap-4"> <div class="flex flex-row gap-4">
...@@ -229,20 +243,22 @@ You can use `strength` and `guidance_scale` together for more control over how e ...@@ -229,20 +243,22 @@ You can use `strength` and `guidance_scale` together for more control over how e
```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, guidance_scale=2.5).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```
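
As noted above, `strength` and `guidance_scale` can also be combined. A minimal sketch reusing the objects from the previous snippet (the exact values are only illustrative):

```py
# reuse pipeline, prompt, init_image, and mask_image from above
image = pipeline(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    strength=0.8,         # how much of the masked area is re-noised
    guidance_scale=10.5,  # how closely the prompt is followed
).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```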
<div class="flex flex-row gap-4"> <div class="flex flex-row gap-4">
...@@ -267,22 +283,23 @@ A negative prompt assumes the opposite role of a prompt; it guides the model awa ...@@ -267,22 +283,23 @@ A negative prompt assumes the opposite role of a prompt; it guides the model awa
```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
negative_prompt = "bad architecture, unstable, poor details, blurry"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```
<div class="flex justify-center"> <div class="flex justify-center">
...@@ -302,7 +319,7 @@ import numpy as np ...@@ -302,7 +319,7 @@ import numpy as np
import torch import torch
from diffusers import AutoPipelineForInpainting from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image from diffusers.utils import load_image, make_image_grid
device = "cuda" device = "cuda"
pipeline = AutoPipelineForInpainting.from_pretrained( pipeline = AutoPipelineForInpainting.from_pretrained(
...@@ -334,6 +351,7 @@ mask_image_arr[mask_image_arr >= 0.5] = 1 ...@@ -334,6 +351,7 @@ mask_image_arr[mask_image_arr >= 0.5] = 1
unmasked_unchanged_image_arr = (1 - mask_image_arr) * init_image + mask_image_arr * repainted_image unmasked_unchanged_image_arr = (1 - mask_image_arr) * init_image + mask_image_arr * repainted_image
unmasked_unchanged_image = PIL.Image.fromarray(unmasked_unchanged_image_arr.round().astype("uint8")) unmasked_unchanged_image = PIL.Image.fromarray(unmasked_unchanged_image_arr.round().astype("uint8"))
unmasked_unchanged_image.save("force_unmasked_unchanged.png") unmasked_unchanged_image.save("force_unmasked_unchanged.png")
make_image_grid([init_image, mask_image, repainted_image, unmasked_unchanged_image], rows=2, cols=2)
``` ```
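
Because the diff above only shows fragments of this example, here is a self-contained sketch of the same idea: blend the original pixels back into every unmasked region so that only the masked area changes. The 0.5 threshold and image URLs mirror the snippet above, and `repainted_image` stands in for the inpainting pipeline output (assumed to be the same size as the base image):

```py
import numpy as np
import PIL.Image
from diffusers.utils import load_image

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")
# `repainted_image` is the inpainting pipeline output for init_image/mask_image

init_arr = np.array(init_image.convert("RGB")).astype(np.float32)
repainted_arr = np.array(repainted_image.convert("RGB")).astype(np.float32)

# binarize the mask: 1 = repainted region, 0 = keep the original pixels
mask_arr = (np.array(mask_image.convert("L")).astype(np.float32) / 255.0 >= 0.5).astype(np.float32)[..., None]

blended_arr = (1 - mask_arr) * init_arr + mask_arr * repainted_arr
unmasked_unchanged_image = PIL.Image.fromarray(blended_arr.round().astype("uint8"))
unmasked_unchanged_image.save("force_unmasked_unchanged.png")
```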
## Chained inpainting pipelines

...@@ -349,35 +367,37 @@ Start with the text-to-image pipeline to create a castle:
```py
import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

text2image = pipeline("concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k").images[0]
```
Load the mask image of the output from above:

```py
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_text-chain-mask.png")
```

And let's inpaint the masked area with a waterfall:
```py
pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "digital painting of a fantasy waterfall, cloudy"
image = pipeline(prompt=prompt, image=text2image, mask_image=mask_image).images[0]
make_image_grid([text2image, mask_image, image], rows=1, cols=3)
```
<div class="flex flex-row gap-4"> <div class="flex flex-row gap-4">
...@@ -391,7 +411,6 @@ image ...@@ -391,7 +411,6 @@ image
</div> </div>
</div> </div>
### Inpaint-to-image-to-image ### Inpaint-to-image-to-image
You can also chain an inpainting pipeline before another pipeline like image-to-image or an upscaler to improve the quality. You can also chain an inpainting pipeline before another pipeline like image-to-image or an upscaler to improve the quality.
...@@ -401,23 +420,24 @@ Begin by inpainting an image: ...@@ -401,23 +420,24 @@ Begin by inpainting an image:
```py
import torch
from diffusers import AutoPipelineForInpainting, AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image_inpainting = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

# resize image to 1024x1024 for SDXL
image_inpainting = image_inpainting.resize((1024, 1024))
```
Now let's pass the image to another inpainting pipeline with SDXL's refiner model to enhance the image details and quality:

...@@ -427,9 +447,10 @@ pipeline = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image_inpainting, mask_image=mask_image, output_type="latent").images[0]
```
<Tip>

...@@ -442,9 +463,11 @@ Finally, you can pass this image to an image-to-image pipeline to put the finish
```py
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline)
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image).images[0]
make_image_grid([init_image, mask_image, image_inpainting, image], rows=2, cols=2)
```
<div class="flex flex-row gap-4"> <div class="flex flex-row gap-4">
...@@ -477,18 +500,21 @@ Once you've generated the embeddings, pass them to the `prompt_embeds` (and `neg ...@@ -477,18 +500,21 @@ Once you've generated the embeddings, pass them to the `prompt_embeds` (and `neg
```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16,
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt_embeds=prompt_embeds, # generated from Compel
    negative_prompt_embeds=negative_prompt_embeds, # generated from Compel
    image=init_image,
    mask_image=mask_image
).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```
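
The snippet above assumes `prompt_embeds` and `negative_prompt_embeds` already exist. A minimal sketch of producing them with the [Compel](https://github.com/damian0815/compel) library might look like the following; the weighting syntax and prompts are only examples, so check the prompt weighting guide for the exact API your version supports:

```py
from compel import Compel

compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# "+" upweights a term in Compel's weighting syntax
prompt_embeds = compel("concept art digital painting of an elven castle++, highly detailed, 8k")
negative_prompt_embeds = compel("bad architecture, blurry, poor details")

# pad both tensors to the same length before passing them to the pipeline
[prompt_embeds, negative_prompt_embeds] = compel.pad_conditioning_tensors_to_same_length(
    [prompt_embeds, negative_prompt_embeds]
)
```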
### ControlNet

...@@ -501,7 +527,7 @@ For example, let's condition an image with a ControlNet pretrained on inpaint im
import torch
import numpy as np
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image, make_image_grid

# load ControlNet
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16, variant="fp16")
...@@ -511,11 +537,12 @@ pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

# prepare control image
def make_inpaint_condition(init_image, mask_image):
...@@ -536,7 +563,7 @@ Now generate an image from the base, mask and control images. You'll notice feat
```py
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, control_image=control_image).images[0]
make_image_grid([init_image, mask_image, PIL.Image.fromarray(np.uint8(control_image[0][0])).convert('RGB'), image], rows=2, cols=2)
```
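
The body of `make_inpaint_condition` and the `control_image` assignment fall inside the elided part of the diff above. A typical implementation (a sketch, not necessarily identical to the original) marks masked pixels with a sentinel value so the ControlNet knows which regions to repaint:

```py
import numpy as np
import torch

def make_inpaint_condition(init_image, mask_image):
    # scale the image to [0, 1] and the mask to {0, 1}
    image = np.array(init_image.convert("RGB")).astype(np.float32) / 255.0
    mask = np.array(mask_image.convert("L")).astype(np.float32) / 255.0
    assert image.shape[:2] == mask.shape[:2], "image and mask must have the same size"

    # mark masked pixels with -1.0 so the ControlNet treats them as regions to fill
    image[mask > 0.5] = -1.0

    # convert to the NCHW float tensor passed as control_image
    image = torch.from_numpy(np.expand_dims(image, 0).transpose(0, 3, 1, 2))
    return image

control_image = make_inpaint_condition(init_image, mask_image)
```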
You can take this a step further and chain it with an image-to-image pipeline to apply a new [style](https://huggingface.co/nitrosocke/elden-ring-diffusion):

...@@ -548,13 +575,14 @@ pipeline = AutoPipelineForImage2Image.from_pretrained(
"nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16, "nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16,
).to("cuda") ).to("cuda")
pipeline.enable_model_cpu_offload() pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention() pipeline.enable_xformers_memory_efficient_attention()
prompt = "elden ring style castle" # include the token "elden ring style" in the prompt prompt = "elden ring style castle" # include the token "elden ring style" in the prompt
negative_prompt = "bad architecture, deformed, disfigured, poor details" negative_prompt = "bad architecture, deformed, disfigured, poor details"
image = pipeline(prompt, negative_prompt=negative_prompt, image=image).images[0] image_elden_ring = pipeline(prompt, negative_prompt=negative_prompt, image=image).images[0]
image make_image_grid([init_image, mask_image, image, image_elden_ring], rows=2, cols=2)
``` ```
<div class="flex flex-row gap-4"> <div class="flex flex-row gap-4">
...@@ -576,17 +604,17 @@ image ...@@ -576,17 +604,17 @@ image
It can be difficult and slow to run diffusion models if you're resource constrained, but it doesn't have to be with a few optimization tricks. One of the biggest (and easiest) optimizations you can enable is switching to memory-efficient attention. If you're using PyTorch 2.0, [scaled-dot product attention](../optimization/torch2.0#scaled-dot-product-attention) is automatically enabled and you don't need to do anything else. For non-PyTorch 2.0 users, you can install and use [xFormers](../optimization/xformers)'s implementation of memory-efficient attention. Both options reduce memory usage and accelerate inference. It can be difficult and slow to run diffusion models if you're resource constrained, but it doesn't have to be with a few optimization tricks. One of the biggest (and easiest) optimizations you can enable is switching to memory-efficient attention. If you're using PyTorch 2.0, [scaled-dot product attention](../optimization/torch2.0#scaled-dot-product-attention) is automatically enabled and you don't need to do anything else. For non-PyTorch 2.0 users, you can install and use [xFormers](../optimization/xformers)'s implementation of memory-efficient attention. Both options reduce memory usage and accelerate inference.
You can also offload the model to the GPU to save even more memory: You can also offload the model to the CPU to save even more memory:
```diff
+ pipeline.enable_xformers_memory_efficient_attention()
+ pipeline.enable_model_cpu_offload()
```
To speed up your inference code even more, use [`torch_compile`](../optimization/torch2.0#torchcompile). You should wrap `torch.compile` around the most intensive component in the pipeline, which is typically the UNet:
```py
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```
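
Putting these options together, a typical setup might look like the sketch below. It assumes PyTorch 2.0, so scaled-dot product attention is already active, and reuses the same inpainting checkpoint as the rest of this guide:

```py
import torch
from diffusers import AutoPipelineForInpainting

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# compile the UNet, usually the most compute-intensive component
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)

# if GPU memory is tight, trade some speed for memory instead of keeping everything on the GPU:
# pipeline.enable_model_cpu_offload()
```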
Learn more in the [Reduce memory usage](../optimization/memory) and [Torch 2.0](../optimization/torch2.0) guides.

...@@ -23,16 +23,16 @@ You can use any of the 🧨 Diffusers [checkpoints](https://huggingface.co/model
<Tip>

💡 Want to train your own unconditional image generation model? Take a look at the training [guide](../training/unconditional_training) to learn how to generate your own images.

</Tip>

In this guide, you'll use [`DiffusionPipeline`] for unconditional image generation with [DDPM](https://arxiv.org/abs/2006.11239):
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128", use_safetensors=True)
```
The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components.

...@@ -40,13 +40,14 @@ Because the model consists of roughly 1.4 billion parameters, we strongly recomm

You can move the generator object to a GPU, just like you would in PyTorch:
```python
generator.to("cuda")
```

Now you can use the `generator` to generate an image:

```python
image = generator().images[0]
image
```
By default, the output is wrapped in a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object.

...@@ -54,7 +55,7 @@ The output is by default wrapped into a [`PIL.Image`](https://pillow.readthedocs

You can save the image by calling:
```python
image.save("generated_image.png")
```

Try out the Spaces below, and feel free to play around with the inference steps parameter to see how it affects the image quality! You can also pass the same parameter locally, as sketched below.
...@@ -65,5 +66,3 @@ Try out the Spaces below, and feel free to play around with the inference steps
width="850"
height="500"
></iframe>
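
If you prefer to experiment locally rather than in the Space, the same inference steps parameter can be passed straight to the pipeline call. This is a sketch with an illustrative step count (DDPM defaults to 1000 steps):

```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128", use_safetensors=True)
generator.to("cuda")

# fewer steps run faster but typically reduce image quality
image = generator(num_inference_steps=100).images[0]
image.save("generated_image_100_steps.png")
```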