Unverified Commit 054a9c17 authored by Kirthi Shankar Sivamani, committed by GitHub

Add compilation OOM note for FA 2.0 (#346)



Add compilation warning for FA 2.0
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
parent 1cb4b25a
@@ -191,6 +191,14 @@ From source
`See the installation guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html>`_.

Compiling with Flash Attention 2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TransformerEngine release v0.11.0 adds support for Flash Attention 2.0 for improved performance. It is a known issue that Flash Attention 2.0 compilation is
resource intensive and requires a large amount of RAM (see `bug <https://github.com/Dao-AILab/flash-attention/issues/358>`_), which may lead to out-of-memory
errors during the installation of TransformerEngine. To circumvent the issue, try setting **MAX_JOBS=1** in the environment. If the errors persist,
install a supported version of Flash Attention 1 (v1.0.6 to v1.0.9) instead.
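
As a minimal sketch of this workaround, assuming a pip-based build from the TransformerEngine source tree and the ``flash-attn`` package name for the Flash Attention 1 fallback (your exact install commands may differ):

.. code-block:: bash

    # Limit parallel compilation jobs to reduce peak RAM usage during the build
    MAX_JOBS=1 pip install .

    # If out-of-memory errors persist, fall back to a supported Flash Attention 1 release
    pip install flash-attn==1.0.9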

Model Support
-------------
...