Unverified Commit 054a9c17 authored by Kirthi Shankar Sivamani, committed by GitHub

Add compilation OOM note for FA 2.0 (#346)



Add compilation warning for FA 2.0
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
parent 1cb4b25a
@@ -191,6 +191,14 @@ From source
`See the installation guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html>`_.

Compiling with Flash Attention 2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TransformerEngine release v0.11.0 adds support for Flash Attention 2.0 for improved performance. It is a known issue that Flash Attention 2.0 compilation is
resource intensive and requires a large amount of RAM (see `bug <https://github.com/Dao-AILab/flash-attention/issues/358>`_), which may lead to out-of-memory
errors during the installation of TransformerEngine. To circumvent the issue, try setting **MAX_JOBS=1** in the environment. If the errors persist,
install a supported version of Flash Attention 1 (v1.0.6 to v1.0.9) instead.
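
As a minimal sketch of this workaround, assuming a pip-based build from the TransformerEngine source tree and the ``flash-attn`` package name for the Flash Attention 1 fallback (your exact install commands may differ):

.. code-block:: bash

    # Limit parallel compilation jobs to reduce peak RAM usage during the build
    MAX_JOBS=1 pip install .

    # If out-of-memory errors persist, fall back to a supported Flash Attention 1 release
    pip install flash-attn==1.0.9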

Model Support
-------------
...