Unverified Commit 3eb40786 authored by Steven Liu's avatar Steven Liu Committed by GitHub
Browse files

[docs] Prompting (#12312)

* init

* fix

* batch inf

* feedback

* update
parent a4bc8454
......@@ -125,6 +125,9 @@ dmypy.json
.vs
.vscode
# Cursor
.cursor
# Pycharm
.idea
......
......@@ -49,7 +49,7 @@
isExpanded: false
sections:
- local: using-diffusers/weighted_prompts
title: Prompt techniques
title: Prompting
- local: using-diffusers/create_a_server
title: Create a server
- local: using-diffusers/batched_inference
......
......@@ -16,43 +16,7 @@ Batch inference processes multiple prompts at a time to increase throughput. It
The downside is increased latency because you must wait for the entire batch to complete, and more GPU memory is required for large batches.
<hfoptions id="usage">
<hfoption id="text-to-image">
For text-to-image, pass a list of prompts to the pipeline.
```py
import torch
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16
).to("cuda")
prompts = [
"cinematic photo of A beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
"cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
"pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
]
images = pipeline(
prompt=prompts,
).images
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
axes = axes.flatten()
for i, image in enumerate(images):
axes[i].imshow(image)
axes[i].set_title(f"Image {i+1}")
axes[i].axis('off')
plt.tight_layout()
plt.show()
```
To generate multiple variations of one prompt, use the `num_images_per_prompt` argument.
For text-to-image, pass a list of prompts to the pipeline and for image-to-image, pass a list of images and prompts to the pipeline. The example below demonstrates batched text-to-image inference.
```py
import torch
......@@ -61,78 +25,19 @@ from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16
).to("cuda")
images = pipeline(
prompt="pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics",
num_images_per_prompt=4
).images
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
axes = axes.flatten()
for i, image in enumerate(images):
axes[i].imshow(image)
axes[i].set_title(f"Image {i+1}")
axes[i].axis('off')
plt.tight_layout()
plt.show()
```
Combine both approaches to generate different variations of different prompts.
```py
images = pipeline(
prompt=prompts,
num_images_per_prompt=2,
).images
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
axes = axes.flatten()
for i, image in enumerate(images):
axes[i].imshow(image)
axes[i].set_title(f"Image {i+1}")
axes[i].axis('off')
plt.tight_layout()
plt.show()
```
</hfoption>
<hfoption id="image-to-image">
For image-to-image, pass a list of input images and prompts to the pipeline.
```py
import torch
from diffusers.utils import load_image
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16
).to("cuda")
input_images = [
load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png"),
load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"),
load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
]
torch_dtype=torch.float16,
device_map="cuda"
)
prompts = [
"cinematic photo of a beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
"cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
"pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
"Cinematic shot of a cozy coffee shop interior, warm pastel light streaming through a window where a cat rests. Shallow depth of field, glowing cups in soft focus, dreamy lofi-inspired mood, nostalgic tones, framed like a quiet film scene.",
"Polaroid-style photograph of a cozy coffee shop interior, bathed in warm pastel light. A cat sits on the windowsill near steaming mugs. Soft, slightly faded tones and dreamy blur evoke nostalgia, a lofi mood, and the intimate, imperfect charm of instant film.",
"Soft watercolor illustration of a cozy coffee shop interior, pastel washes of color filling the space. A cat rests peacefully on the windowsill as warm light glows through. Gentle brushstrokes create a dreamy, lofi-inspired atmosphere with whimsical textures and nostalgic calm.",
"Isometric pixel-art illustration of a cozy coffee shop interior in detailed 8-bit style. Warm pastel light fills the space as a cat rests on the windowsill. Blocky furniture and tiny mugs add charm, low-res retro graphics enhance the nostalgic, lofi-inspired game aesthetic."
]
images = pipeline(
prompt=prompts,
image=input_images,
guidance_scale=8.0,
strength=0.5
).images
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
......@@ -147,24 +52,31 @@ plt.tight_layout()
plt.show()
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/batch-inference.png"/>
</div>
To generate multiple variations of one prompt, use the `num_images_per_prompt` argument.
```py
import torch
import matplotlib.pyplot as plt
from diffusers.utils import load_image
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16
).to("cuda")
torch_dtype=torch.float16,
device_map="cuda"
)
input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
prompt="""
Isometric pixel-art illustration of a cozy coffee shop interior in detailed 8-bit style. Warm pastel light fills the
space as a cat rests on the windowsill. Blocky furniture and tiny mugs add charm, low-res retro graphics enhance the
nostalgic, lofi-inspired game aesthetic.
"""
images = pipeline(
prompt="pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics",
image=input_image,
prompt=prompt,
num_images_per_prompt=4
).images
......@@ -180,26 +92,19 @@ plt.tight_layout()
plt.show()
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/batch-inference-2.png"/>
</div>
Combine both approaches to generate different variations of different prompts.
```py
input_images = [
load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"),
load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
]
prompts = [
"cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
"pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
]
images = pipeline(
prompt=prompts,
image=input_images,
num_images_per_prompt=2,
).images
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
fig, axes = plt.subplots(2, 4, figsize=(12, 12))
axes = axes.flatten()
for i, image in enumerate(images):
......@@ -211,16 +116,18 @@ plt.tight_layout()
plt.show()
```
</hfoption>
</hfoptions>
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/batch-inference-3.png"/>
</div>
## Deterministic generation
Enable reproducible batch generation by passing a list of [Generator’s](https://pytorch.org/docs/stable/generated/torch.Generator.html) to the pipeline and tie each `Generator` to a seed to reuse it.
Use a list comprehension to iterate over the batch size specified in `range()` to create a unique `Generator` object for each image in the batch.
> [!TIP]
> Refer to the [Reproducibility](./reusing_seeds) docs to learn more about deterministic algorithms and the `Generator` object.
Don't multiply the `Generator` by the batch size because that only creates one `Generator` object that is used sequentially for each image in the batch.
Use a list comprehension to iterate over the batch size specified in `range()` to create a unique `Generator` object for each image in the batch. Don't multiply the `Generator` by the batch size because that only creates one `Generator` object that is used sequentially for each image in the batch.
```py
generator = [torch.Generator(device="cuda").manual_seed(0)] * 3
......@@ -234,14 +141,16 @@ from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16
).to("cuda")
torch_dtype=torch.float16,
device_map="cuda"
)
generator = [torch.Generator(device="cuda").manual_seed(i) for i in range(3)]
prompts = [
"cinematic photo of A beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
"cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
"pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
"Cinematic shot of a cozy coffee shop interior, warm pastel light streaming through a window where a cat rests. Shallow depth of field, glowing cups in soft focus, dreamy lofi-inspired mood, nostalgic tones, framed like a quiet film scene.",
"Polaroid-style photograph of a cozy coffee shop interior, bathed in warm pastel light. A cat sits on the windowsill near steaming mugs. Soft, slightly faded tones and dreamy blur evoke nostalgia, a lofi mood, and the intimate, imperfect charm of instant film.",
"Soft watercolor illustration of a cozy coffee shop interior, pastel washes of color filling the space. A cat rests peacefully on the windowsill as warm light glows through. Gentle brushstrokes create a dreamy, lofi-inspired atmosphere with whimsical textures and nostalgic calm.",
"Isometric pixel-art illustration of a cozy coffee shop interior in detailed 8-bit style. Warm pastel light fills the space as a cat rests on the windowsill. Blocky furniture and tiny mugs add charm, low-res retro graphics enhance the nostalgic, lofi-inspired game aesthetic."
]
images = pipeline(
......@@ -261,4 +170,4 @@ plt.tight_layout()
plt.show()
```
You can use this to iteratively select an image associated with a seed and then improve on it by crafting a more detailed prompt.
\ No newline at end of file
You can use this to select an image associated with a seed and iteratively improve on it by crafting a more detailed prompt.
\ No newline at end of file
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment