Unverified Commit 7976bd00 authored by Santosh Bhavani, committed by GitHub

Update README.rst - Installation section (#502)



* Update README.rst - Installation section

Added pip install instructions and cleaned up pre-reqs and FlashAttention-2 section
Signed-off-by: Santosh Bhavani <santosh@semantic.md>

* Update README.rst
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Santosh Bhavani <santosh@semantic.md>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
parent a9cfbfd3
@@ -135,37 +135,47 @@

Installation
------------

.. installation

Pre-requisites
^^^^^^^^^^^^^^^^^^^^
* Linux x86_64
* CUDA 11.8+ for Hopper and CUDA 12.1+ for Ada
* NVIDIA Driver supporting CUDA 11.8 or later
* cuDNN 8.1 or later
* For fused attention: CUDA 12.1 or later, an NVIDIA Driver supporting CUDA 12.1 or later, and cuDNN 8.9 or later.
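On an existing system, the pre-requisites above can be sanity-checked with a few commands (a minimal sketch; the cuDNN header location is an assumption and varies by installation method):

.. code-block:: bash

    # Driver version and visible GPUs
    nvidia-smi
    # CUDA toolkit version
    nvcc --version
    # cuDNN version (header path is an assumption; adjust for your install)
    grep -m1 -A2 CUDNN_MAJOR /usr/include/cudnn_version.h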
Docker
^^^^^^^^^^^^^^^^^^^^

The quickest way to get started with Transformer Engine is by using Docker images on the
`NVIDIA GPU Cloud (NGC) Catalog <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch>`_. For example, to use the NGC PyTorch container interactively:

.. code-block:: bash

    docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.10-py3

where 23.10 is the container version; for example, 23.10 corresponds to the October 2023 release.
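Once inside the container, a quick check that Transformer Engine loads might look like this (a sketch; ``transformer_engine.pytorch`` is the module providing the PyTorch integration):

.. code-block:: bash

    # Confirm the PyTorch integration imports cleanly
    python -c "import transformer_engine.pytorch"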
pip
^^^^^^^^^^^^^^^^^^^^

To install the latest stable version of Transformer Engine:

.. code-block:: bash

    pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

This will automatically detect which supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable NVTE_FRAMEWORK to a comma-separated list (e.g. NVTE_FRAMEWORK=jax,pytorch).
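For instance, to build only the PyTorch integration, the two pieces above can be combined as follows (a sketch; both ``NVTE_FRAMEWORK`` and the ``@stable`` ref come from this section):

.. code-block:: bash

    # Build Transformer Engine with PyTorch support only
    NVTE_FRAMEWORK=pytorch pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable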
From source
^^^^^^^^^^^

`See the installation guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html#installation-from-source>`_.
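For orientation, a typical from-source flow might look like the following sketch; the ``--recursive`` clone (for any git submodules) is an assumption here, and the linked guide remains the authoritative reference:

.. code-block:: bash

    # Assumption: stable branch plus submodules; see the installation guide for exact steps
    git clone --branch stable --recursive https://github.com/NVIDIA/TransformerEngine.git
    cd TransformerEngine
    NVTE_FRAMEWORK=pytorch pip install .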
Compiling with FlashAttention-2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Transformer Engine release v0.11.0 adds support for FlashAttention-2 in PyTorch for improved performance.

It is a known issue that FlashAttention-2 compilation is resource-intensive and requires a large amount of RAM (see `this bug <https://github.com/Dao-AILab/flash-attention/issues/358>`_), which may lead to out-of-memory errors during the installation of Transformer Engine. Try setting **MAX_JOBS=1** in the environment to circumvent the issue. If the errors persist, install a supported version of FlashAttention-1 (v1.0.6 to v1.0.9).

Note that NGC PyTorch 23.08+ containers include FlashAttention-2.
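As a concrete sketch of the **MAX_JOBS** workaround above (``flash-attn`` as FlashAttention's PyPI package name is an assumption):

.. code-block:: bash

    # Limit parallel compile jobs to reduce peak RAM during the FlashAttention-2 build
    MAX_JOBS=1 pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

    # Fallback: pin a supported FlashAttention-1 release (v1.0.6 to v1.0.9)
    pip install "flash-attn>=1.0.6,<=1.0.9"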
Model Support
-------------