Unverified Commit f77ff561 authored by Patrick von Platen, committed by GitHub

[Docs] No more autocast (#2021)

no more autocast
parent f861cde1
...@@ -20,7 +20,6 @@ We'll discuss how the following settings impact performance and memory.
| ---------------- | ------- | ------- |
| original | 9.50s | x1 |
| cuDNN auto-tuner | 9.37s | x1.01 |
| autocast (fp16) | 5.47s | x1.74 |
| fp16 | 3.61s | x2.63 |
| channels last | 3.30s | x2.88 |
| traced UNet | 3.21s | x2.96 |
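As a point of reference for the "channels last" row above (the full section is not part of this diff), a minimal, hedged sketch of enabling the channels-last memory format on the pipeline's UNet could look like the following; the checkpoint name matches the one used elsewhere in this document.
```Python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

# The UNet is convolution-heavy, so the channels-last memory format
# usually lets cuDNN pick faster kernels for it.
pipe.unet.to(memory_format=torch.channels_last)

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```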
...@@ -54,27 +53,9 @@ import torch
torch.backends.cuda.matmul.allow_tf32 = True
```
## Automatic mixed precision (AMP)
If you use a CUDA GPU, you can take advantage of `torch.autocast` to perform inference roughly twice as fast at the cost of slightly lower precision. All you need to do is put your inference call inside an `autocast` context manager. The following example shows how to do it using Stable Diffusion text-to-image generation as an example:
```Python
from torch import autocast
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt).images[0]
```
Despite the precision loss, in our experience the final image results look the same as the `float32` versions. Feel free to experiment and report back!
## Half precision weights
To save more GPU memory and get even more speed, you can load and run the model weights directly in half precision. This involves loading the float16 version of the weights, which was saved to a branch named `fp16`, and telling PyTorch to use the `float16` type when loading them:
To save more GPU memory and get more speed, you can load and run the model weights directly in half precision. This involves loading the float16 version of the weights, which was saved to a branch named `fp16`, and telling PyTorch to use the `float16` type when loading them:
```Python
pipe = StableDiffusionPipeline.from_pretrained(
...@@ -88,6 +69,11 @@ prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
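The diff only shows the first and last lines of the snippet above; based on the surrounding prose (weights from the `fp16` branch, loaded as `float16`), a complete, hedged version of the half-precision example would look roughly like this:
```Python
import torch
from diffusers import StableDiffusionPipeline

# Load the weights stored on the `fp16` branch and keep them in float16.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```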
<Tip warning={true}>
It is strongly discouraged to make use of [`torch.autocast`](https://pytorch.org/docs/stable/amp.html#torch.autocast) in any of the pipelines as it can lead to black images and is always slower than using pure
float16 precision.
</Tip>
## Sliced attention for additional memory savings
For even additional memory savings, you can use a sliced version of attention that performs the computation in steps instead of all at once.
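The body of this section is not shown in the diff; as a hedged sketch, sliced attention is typically switched on with the pipeline's `enable_attention_slicing` method:
```Python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Compute attention in chunks instead of all at once to lower peak memory usage.
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```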
......
...@@ -640,7 +640,6 @@ from diffusers import DiffusionPipeline
from PIL import Image
import requests
from torch import autocast
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")
...@@ -659,8 +658,7 @@ image = Image.open(requests.get(url, stream=True).raw).resize((512, 512))
text = "a glass" # will mask out this text
prompt = "a cup" # the masked out region will be replaced with this
with autocast("cuda"):
    image = pipe(image=image, text=text, prompt=prompt).images[0]
image = pipe(image=image, text=text, prompt=prompt).images[0]
```
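Only fragments of this community-pipeline example appear in the diff. A hedged reconstruction follows; the `custom_pipeline="text_inpainting"` identifier, its `segmentation_model`/`segmentation_processor` arguments, the inpainting checkpoint, and the image URL placeholder are assumptions, not part of the diff.
```Python
import requests
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
from diffusers import DiffusionPipeline

# CLIPSeg produces the mask from the `text` query.
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

# Assumed: the "text_inpainting" community pipeline on top of an inpainting checkpoint.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    custom_pipeline="text_inpainting",
    segmentation_model=model,
    segmentation_processor=processor,
)
pipe = pipe.to("cuda")

url = "https://example.com/image.png"  # placeholder; the original URL is not shown in the diff
image = Image.open(requests.get(url, stream=True).raw).resize((512, 512))

text = "a glass"  # will mask out this text
prompt = "a cup"  # the masked out region will be replaced with this

image = pipe(image=image, text=text, prompt=prompt).images[0]
```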
### Bit Diffusion
......