@@ -74,11 +74,14 @@ You need to accept the model license before downloading or using the Stable Diff
### Text-to-Image generation with Stable Diffusion
We recommend using the model in [half-precision (`fp16`)](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) as it gives almost always the same results as full
precision while being roughly twice as fast and requiring half the amount of GPU RAM.
```python
# make sure you're logged in with `huggingface-cli login`
@@ -68,7 +68,7 @@ Despite the precision loss, in our experience the final image results look the s
## Half precision weights
To save more GPU memory, you can load the model weights directly in half precision. This involves loading the float16 version of the weights, which was saved to a branch named `fp16`, and telling PyTorch to use the `float16` type when loading them:
To save more GPU memory and get even more speed, you can load and run the model weights directly in half precision. This involves loading the float16 version of the weights, which was saved to a branch named `fp16`, and telling PyTorch to use the `float16` type when loading them:
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
## Sliced attention for additional memory savings
...
...
@@ -101,8 +105,7 @@ pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_attention_slicing()
with torch.autocast("cuda"):
image = pipe(prompt).images[0]
image = pipe(prompt).images[0]
```
There's a small performance penalty of about 10% slower inference times, but this method allows you to use Stable Diffusion in as little as 3.2 GB of VRAM!
@@ -109,7 +109,6 @@ A full training run takes ~1 hour on one V100 GPU.
Once you have trained a model using above command, the inference can be done simply using the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.
Once you have trained a model using above command, the inference can be done simply using the `StableDiffusionPipeline`. Make sure to include the `identifier`(e.g. sks in above example) in your prompt.
@@ -74,8 +74,6 @@ A full training run takes ~1 hour on one V100 GPU.
Once you have trained a model using above command, the inference can be done simply using the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.