<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

[[open-in-colab]]

# Basic performance

Diffusion is a random process that is computationally demanding. You may need to run the [`DiffusionPipeline`] several times before getting a desired output, which is why it's important to carefully balance generation speed and memory usage so you can iterate faster.

This guide recommends some basic performance tips for using the [`DiffusionPipeline`]. Refer to the Inference Optimization docs, such as [Accelerate inference](./optimization/fp16) or [Reduce memory usage](./optimization/memory), for more detailed performance guides.

## Memory usage

Reducing the amount of memory used indirectly speeds up generation and can help a model fit on your device.

The [`~DiffusionPipeline.enable_model_cpu_offload`] method moves a model to the CPU when it is not in use to save GPU memory.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  torch_dtype=torch.bfloat16,
  device_map="cuda"
)
pipeline.enable_model_cpu_offload()

prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
pipeline(prompt).images[0]
print(f"Max memory reserved: {torch.cuda.max_memory_reserved() / 1024**3:.2f} GB")
```

## Inference speed

Denoising is the most computationally demanding process during diffusion. Methods that optimize this process accelerate inference. Try the following methods for a speed up.

- Add `device_map="cuda"` to place the pipeline on a GPU. Placing a model on an accelerator, like a GPU, increases speed because it performs computations in parallel.
- Set `torch_dtype=torch.bfloat16` to execute the pipeline in half-precision. Reducing the data type precision increases speed because it takes less time to perform computations in a lower precision.

```py
import time

import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  torch_dtype=torch.bfloat16,
  device_map="cuda"
)
```

- Use a faster scheduler, such as [`DPMSolverMultistepScheduler`], which only requires ~20-25 steps.
- Set `num_inference_steps` to a lower value. Reducing the number of inference steps reduces the overall number of computations. However, this can result in lower generation quality.

```py
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""

start_time = time.perf_counter()
image = pipeline(prompt).images[0]
end_time = time.perf_counter()

print(f"Image generation took {end_time - start_time:.3f} seconds")
```

## Generation quality

Many modern diffusion models deliver high-quality images out-of-the-box. However, you can still improve generation quality by trying the following.

- Try a more detailed and descriptive prompt. Include details such as the image medium, subject, style, and aesthetic. A negative prompt may also help by guiding a model away from undesirable features with words like *low quality* or *blurry*.

    ```py
    import torch
    from diffusers import DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.bfloat16,
        device_map="cuda"
    )

    prompt = """
    cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
    highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
    """
    negative_prompt = "low quality, blurry, ugly, poor details"
    pipeline(prompt, negative_prompt=negative_prompt).images[0]
    ```

    For more details about creating better prompts, take a look at the [Prompt techniques](./using-diffusers/weighted_prompts) doc.

- Try a different scheduler, like [`HeunDiscreteScheduler`] or [`LMSDiscreteScheduler`], that trades generation speed for quality.

    ```py
    import torch
    from diffusers import DiffusionPipeline, HeunDiscreteScheduler

    pipeline = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.bfloat16,
        device_map="cuda"
    )
    pipeline.scheduler = HeunDiscreteScheduler.from_config(pipeline.scheduler.config)

    prompt = """
    cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
    highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
    """
    negative_prompt = "low quality, blurry, ugly, poor details"
    pipeline(prompt, negative_prompt=negative_prompt).images[0]
    ```

## Next steps

Diffusers offers more advanced and powerful optimizations such as [group-offloading](./optimization/memory#group-offloading) and [regional compilation](./optimization/fp16#regional-compilation). To learn more about how to maximize performance, take a look at the Inference Optimization section.