Unverified commit acd31781, authored by Pedro Cuenca, committed by GitHub

Docs: recommend xformers (#1724)

* Fix links to flash attention.

* Add xformers installation instructions.

* Make link to xformers install more prominent.

* Link to xformers install from training docs.
parent c6d0dff4
@@ -45,6 +45,8 @@
- sections:
  - local: optimization/fp16
    title: "Memory and Speed"
  - local: optimization/xformers
    title: "xFormers"
  - local: optimization/onnx
    title: "ONNX"
  - local: optimization/open_vino
...
@@ -12,7 +12,9 @@ specific language governing permissions and limitations under the License.
# Memory and speed
We present some techniques and ideas to optimize 🤗 Diffusers _inference_ for memory or speed. As a general rule, we recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for memory efficient attention; please see the recommended [installation instructions](xformers).
We'll discuss how the following settings impact performance and memory.
| | Latency | Speedup |
| ---------------- | ------- | ------- |
@@ -322,7 +324,9 @@ with torch.inference_mode():
## Memory Efficient Attention
Recent work on optimizing the bandwidth in the attention block has generated huge speed ups and gains in GPU memory usage. The most recent is Flash Attention from @tridao: [code](https://github.com/HazyResearch/flash-attention), [paper](https://arxiv.org/pdf/2205.14135.pdf).
Here are the speedups we obtain on a few Nvidia GPUs when running the inference at 512x512 with a batch size of 1 (one prompt):
| GPU | Base Attention FP16 | Memory Efficient Attention FP16 |
@@ -338,7 +342,7 @@ Here are the speedups we obtain on a few Nvidia GPUs when running the inference
To leverage it, just make sure you have:
- PyTorch > 1.12
- CUDA available
- [Installed the xformers library](xformers)
```python
from diffusers import StableDiffusionPipeline
import torch
```
...
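The rest of this snippet is collapsed in the diff above. For reference, here is a minimal sketch of how the pieces typically fit together; the model id `runwayml/stable-diffusion-v1-5` and the prompt are illustrative and not part of the original hunk:

```python
from diffusers import StableDiffusionPipeline
import torch

# Load the pipeline in half precision and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model id
    torch_dtype=torch.float16,
).to("cuda")

# Replace the default attention with xFormers memory efficient attention.
pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    image = pipe("a photo of an astronaut riding a horse").images[0]
```

If xFormers is not installed, the `enable_xformers_memory_efficient_attention()` call will fail, so in scripts it is usually guarded by an availability check.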
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Installing xFormers
We recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for both inference and training. In our tests, the optimizations performed in the attention blocks allow for both faster speed and reduced memory consumption.
Installing xFormers has historically been a bit involved, as binary distributions were not always up to date. Fortunately, the project has [very recently](https://github.com/facebookresearch/xformers/pull/591) integrated a process to build pip wheels as part of the project's continuous integration, so this should improve a lot starting from xFormers version 0.0.16.
Until xFormers 0.0.16 is deployed, you can install pip wheels using [`TestPyPI`](https://test.pypi.org/project/formers/). These are the steps that worked for us on a Linux computer to install xFormers version 0.0.15:
```bash
pip install pyre-extensions==0.0.23
pip install -i https://test.pypi.org/simple/ formers==0.0.15.dev376
```
We'll update these instructions when the wheels are published to the official PyPI repository.
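As a quick sanity check after installation, you can exercise the memory efficient attention op directly. This is a minimal sketch; it assumes a CUDA GPU and an xFormers build that accepts 3-D query/key/value tensors:

```python
import torch
import xformers
import xformers.ops

print(xformers.__version__)

# Run the memory efficient attention op on tiny random tensors
# shaped (batch, sequence_length, head_dim) in half precision.
q = torch.randn(1, 16, 64, device="cuda", dtype=torch.float16)
out = xformers.ops.memory_efficient_attention(q, q, q)
print(out.shape)  # expected: torch.Size([1, 16, 64])
```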
@@ -36,7 +36,9 @@ pip install git+https://github.com/huggingface/diffusers
pip install -U -r diffusers/examples/dreambooth/requirements.txt
```
xFormers is not part of the training requirements, but [we recommend you install it if you can](../optimization/xformers). It could make your training faster and less memory intensive.
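If you do install it, switching the UNet's attention blocks to the memory efficient implementation in a training script can look roughly like the sketch below. This is only an illustration: it assumes your diffusers version exposes `is_xformers_available` and `enable_xformers_memory_efficient_attention()` on the model, and the model id is hypothetical:

```python
from diffusers import UNet2DConditionModel
from diffusers.utils.import_utils import is_xformers_available

# Load the UNet that will be fine-tuned (illustrative model id).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Switch to xFormers memory efficient attention before training starts, if available.
if is_xformers_available():
    unet.enable_xformers_memory_efficient_attention()
```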
After all dependencies have been set up, you can configure a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
...
@@ -38,6 +38,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie
- [Text Inversion](./text_inversion)
- [Dreambooth](./dreambooth)
If possible, please [install xFormers](../optimization/xformers) for memory efficient attention. This could help make your training faster and less memory intensive.
| Task | 🤗 Accelerate | 🤗 Datasets | Colab |
|---|---|:---:|:---:|
...