Unverified commit acd31781, authored by Pedro Cuenca, committed by GitHub

Docs: recommend xformers (#1724)

* Fix links to flash attention.

* Add xformers installation instructions.

* Make link to xformers install more prominent.

* Link to xformers install from training docs.
parent c6d0dff4
@@ -45,6 +45,8 @@
- sections:
  - local: optimization/fp16
    title: "Memory and Speed"
  - local: optimization/xformers
    title: "xFormers"
  - local: optimization/onnx
    title: "ONNX"
  - local: optimization/open_vino
...
@@ -12,7 +12,9 @@ specific language governing permissions and limitations under the License.
# Memory and speed
We present some techniques and ideas to optimize 🤗 Diffusers _inference_ for memory or speed. As a general rule, we recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for memory efficient attention; please see the recommended [installation instructions](xformers).
We'll discuss how the following settings impact performance and memory.
| | Latency | Speedup |
| ---------------- | ------- | ------- |
@@ -322,7 +324,9 @@ with torch.inference_mode():
## Memory Efficient Attention
Recent work on optimizing the bandwidth in the attention block has generated huge speed ups and gains in GPU memory usage. The most recent is Flash Attention from @tridao: [code](https://github.com/HazyResearch/flash-attention), [paper](https://arxiv.org/pdf/2205.14135.pdf).
Here are the speedups we obtain on a few Nvidia GPUs when running the inference at 512x512 with a batch size of 1 (one prompt):
| GPU | Base Attention FP16 | Memory Efficient Attention FP16 |
@@ -338,7 +342,7 @@ Here are the speedups we obtain on a few Nvidia GPUs when running the inference
To leverage it, just make sure you have:
- PyTorch > 1.12
- CUDA available
- [Installed the xformers library](xformers)
```python
from diffusers import StableDiffusionPipeline
import torch
```
...
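The rest of this snippet is collapsed in the diff above. For reference, here is a minimal sketch of how the pieces typically fit together; the model id `runwayml/stable-diffusion-v1-5` and the prompt are illustrative and not part of the original hunk:

```python
from diffusers import StableDiffusionPipeline
import torch

# Load the pipeline in half precision and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model id
    torch_dtype=torch.float16,
).to("cuda")

# Replace the default attention with xFormers memory efficient attention.
pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    image = pipe("a photo of an astronaut riding a horse").images[0]
```

If xFormers is not installed, the `enable_xformers_memory_efficient_attention()` call will fail, so in scripts it is usually guarded by an availability check.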
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Installing xFormers
We recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for both inference and training. In our tests, the optimizations performed in the attention blocks allow for both faster speed and reduced memory consumption.
Installing xFormers has historically been a bit involved, as binary distributions were not always up to date. Fortunately, the project has [very recently](https://github.com/facebookresearch/xformers/pull/591) integrated a process to build pip wheels as part of the project's continuous integration, so this should improve a lot starting from xFormers version 0.0.16.
Until xFormers 0.0.16 is deployed, you can install pip wheels using [`TestPyPI`](https://test.pypi.org/project/formers/). These are the steps that worked for us on a Linux computer to install xFormers version 0.0.15:
```bash
pip install pyre-extensions==0.0.23
pip install -i https://test.pypi.org/simple/ formers==0.0.15.dev376
```
We'll update these instructions when the wheels are published to the official PyPI repository.
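As a quick sanity check after installation, you can exercise the memory efficient attention op directly. This is a minimal sketch; it assumes a CUDA GPU and an xFormers build that accepts 3-D query/key/value tensors:

```python
import torch
import xformers
import xformers.ops

print(xformers.__version__)

# Run the memory efficient attention op on tiny random tensors
# shaped (batch, sequence_length, head_dim) in half precision.
q = torch.randn(1, 16, 64, device="cuda", dtype=torch.float16)
out = xformers.ops.memory_efficient_attention(q, q, q)
print(out.shape)  # expected: torch.Size([1, 16, 64])
```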
@@ -36,7 +36,9 @@ pip install git+https://github.com/huggingface/diffusers
pip install -U -r diffusers/examples/dreambooth/requirements.txt
```
xFormers is not part of the training requirements, but [we recommend you install it if you can](../optimization/xformers). It could make your training faster and less memory intensive.
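If you do install it, switching the UNet's attention blocks to the memory efficient implementation in a training script can look roughly like the sketch below. This is only an illustration: it assumes your diffusers version exposes `is_xformers_available` and `enable_xformers_memory_efficient_attention()` on the model, and the model id is hypothetical:

```python
from diffusers import UNet2DConditionModel
from diffusers.utils.import_utils import is_xformers_available

# Load the UNet that will be fine-tuned (illustrative model id).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Switch to xFormers memory efficient attention before training starts, if available.
if is_xformers_available():
    unet.enable_xformers_memory_efficient_attention()
```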
After all dependencies have been set up, you can configure a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
...
@@ -38,6 +38,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie
- [Text Inversion](./text_inversion)
- [Dreambooth](./dreambooth)
If possible, please [install xFormers](../optimization/xformers) for memory efficient attention. This could help make your training faster and less memory intensive.
| Task | 🤗 Accelerate | 🤗 Datasets | Colab |
|---|---|:---:|:---:|
...