Unverified commit 8ffbbabd authored by Santosh Bhavani, committed by GitHub

README.md - Installation section (#1689)



* Update README.rst - Installation

Update installation section with comprehensive guidelines

- Add detailed system requirements
- Include Conda installation method (experimental)
- Document environment variables for customizing build process
- Update FlashAttention support to cover both version 2 and 3
- Add troubleshooting section with solutions for common installation issues
Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>

* Update README.rst - Installation

removed conda section
Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>

* Update README.rst - Installation

added all gpu archs that support FP8
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update README.rst
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update README.rst
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update README.rst
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update installation.rst
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix docs and add troubleshooting
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
parent beaecf84
@@ -145,18 +145,30 @@ Flax

Installation
============

.. installation

System Requirements
^^^^^^^^^^^^^^^^^^^

* **Hardware:** Blackwell, Hopper, Grace Hopper/Blackwell, Ada, Ampere
* **OS:** Linux (official), WSL2 (limited support)
* **Software:**

  * CUDA: 12.1+ (Hopper/Ada/Ampere), 12.8+ (Blackwell) with compatible NVIDIA drivers
  * cuDNN: 9.3+
  * Compiler: GCC 9+ or Clang 10+ with C++17 support
  * Python: 3.12 recommended

* **Source Build Requirements:** CMake 3.18+, Ninja, Git 2.17+, pybind11 2.6.0+
* **Notes:** FP8 features require Compute Capability 8.9+ (Ada/Hopper/Blackwell)
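The compute-capability requirement above can be checked programmatically. A minimal sketch, assuming a hypothetical helper (the function name and threshold constant are illustrative, not part of Transformer Engine's API):

```python
# Minimal sketch: check whether a GPU's CUDA compute capability supports FP8.
# The (8, 9) threshold comes from the requirement stated above; the helper
# name is an illustrative assumption, not Transformer Engine's API.

FP8_MIN_COMPUTE_CAPABILITY = (8, 9)  # Ada (8.9), Hopper (9.0), Blackwell (10.0+)

def supports_fp8(compute_capability: tuple) -> bool:
    """Return True if the given (major, minor) compute capability supports FP8."""
    return tuple(compute_capability) >= FP8_MIN_COMPUTE_CAPABILITY

# Example: Ampere (8.6) lacks FP8 support, while Hopper (9.0) has it.
print(supports_fp8((8, 6)), supports_fp8((9, 0)))  # → False True
```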
Installation Methods
^^^^^^^^^^^^^^^^^^^^

Docker (Recommended)
^^^^^^^^^^^^^^^^^^^^
The quickest way to get started with Transformer Engine is by using Docker images on
`NVIDIA GPU Cloud (NGC) Catalog <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch>`_.
For example, to use the NGC PyTorch container interactively:
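The interactive invocation itself falls in an unchanged region elided by the diff. A typical command might look like the following (the ``25.01-py3`` image tag is an assumption based on the January 2025 release; substitute the tag you need):

```shell
# Hedged example: launch the NGC PyTorch container interactively with GPU
# access. The 25.01-py3 tag is assumed; pick the release you want from NGC.
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.01-py3
```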
@@ -167,41 +179,116 @@ For example to use the NGC PyTorch container interactively,

Where 25.01 (corresponding to the January 2025 release) is the container version.
**Benefits of using NGC containers:**

* All dependencies pre-installed with compatible versions and optimized configurations
* NGC PyTorch 23.08+ containers include FlashAttention-2

pip Installation
^^^^^^^^^^^^^^^^
**Prerequisites for pip installation:**

* A compatible C++ compiler
* CUDA Toolkit with cuDNN and NVCC (NVIDIA CUDA Compiler) installed

To install the latest stable version with pip:

.. code-block:: bash

    # For PyTorch integration
    pip install --no-build-isolation transformer_engine[pytorch]

    # For JAX integration
    pip install --no-build-isolation transformer_engine[jax]

    # For both frameworks
    pip install --no-build-isolation transformer_engine[pytorch,jax]

Alternatively, install directly from the GitHub repository:

.. code-block:: bash

    pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

When installing from GitHub, you can explicitly specify frameworks using the environment variable:

.. code-block:: bash

    NVTE_FRAMEWORK=pytorch,jax pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
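``NVTE_FRAMEWORK`` holds a comma-separated framework list; when it is unset, the build autodetects installed frameworks. A minimal sketch of that selection logic (the function name and autodetect fallback are illustrative assumptions, not Transformer Engine's actual build code):

```python
import importlib.util
import os

def resolve_frameworks() -> list:
    """Illustrative sketch: pick frameworks from NVTE_FRAMEWORK, or autodetect.

    Mirrors the documented behavior only; not Transformer Engine's build code.
    """
    explicit = os.environ.get("NVTE_FRAMEWORK")
    if explicit:
        # Comma-separated list, e.g. NVTE_FRAMEWORK=pytorch,jax
        return [f.strip() for f in explicit.split(",") if f.strip()]
    # Autodetect: build for whichever supported frameworks are importable.
    return [f for f in ("torch", "jax") if importlib.util.find_spec(f) is not None]

os.environ["NVTE_FRAMEWORK"] = "pytorch,jax"
print(resolve_frameworks())  # → ['pytorch', 'jax']
```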
Source Installation
^^^^^^^^^^^^^^^^^^^
`See the installation guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html#installation-from-source>`_.

Environment Variables
^^^^^^^^^^^^^^^^^^^^^
These environment variables can be set before installation to customize the build process:

* **CUDA_PATH**: Path to CUDA installation
* **CUDNN_PATH**: Path to cuDNN installation
* **CXX**: Path to C++ compiler
* **NVTE_FRAMEWORK**: Comma-separated list of frameworks to build for (e.g., ``pytorch,jax``)
* **MAX_JOBS**: Limit number of parallel build jobs (default varies by system)
* **NVTE_BUILD_THREADS_PER_JOB**: Control threads per build job
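How a build script might consume the variables above can be sketched as follows (the dataclass, helper name, and defaults are illustrative assumptions, not Transformer Engine's actual configuration code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BuildConfig:
    """Illustrative sketch of reading the build-customization variables above."""
    cuda_path: Optional[str]
    cudnn_path: Optional[str]
    cxx: Optional[str]
    max_jobs: Optional[int]          # None means "let the build system decide"
    threads_per_job: Optional[int]

def load_build_config(env: dict) -> BuildConfig:
    def as_int(name: str) -> Optional[int]:
        value = env.get(name)
        return int(value) if value else None
    return BuildConfig(
        cuda_path=env.get("CUDA_PATH"),
        cudnn_path=env.get("CUDNN_PATH"),
        cxx=env.get("CXX"),
        max_jobs=as_int("MAX_JOBS"),
        threads_per_job=as_int("NVTE_BUILD_THREADS_PER_JOB"),
    )

cfg = load_build_config({"CUDA_PATH": "/usr/local/cuda", "MAX_JOBS": "1"})
print(cfg.max_jobs)  # → 1
```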
Compiling with FlashAttention
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Transformer Engine supports both FlashAttention-2 and FlashAttention-3 in PyTorch for improved performance. FlashAttention-3 was added in release v1.11 and is prioritized over FlashAttention-2 when both are present in the environment.
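The preference rule described above can be sketched as a small selection function (the function name and backend strings are illustrative, not Transformer Engine's internal logic):

```python
from typing import Optional

# Illustrative sketch of the documented preference: FlashAttention-3 wins over
# FlashAttention-2 when both are installed. Not Transformer Engine's own code.
PREFERENCE_ORDER = ["flash-attention-3", "flash-attention-2"]

def pick_flash_attention(installed: set) -> Optional[str]:
    """Return the preferred FlashAttention backend, or None if none installed."""
    for backend in PREFERENCE_ORDER:
        if backend in installed:
            return backend
    return None

print(pick_flash_attention({"flash-attention-2", "flash-attention-3"}))  # → flash-attention-3
```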
You can verify which FlashAttention version is being used by setting these environment variables:
.. code-block:: bash

    NVTE_DEBUG=1 NVTE_DEBUG_LEVEL=1 python your_script.py
It is a known issue that FlashAttention-2 compilation is resource-intensive and requires a large amount of RAM (see `bug <https://github.com/Dao-AILab/flash-attention/issues/358>`_), which may lead to out of memory errors during the installation of Transformer Engine. Please try setting **MAX_JOBS=1** in the environment to circumvent the issue.
.. troubleshooting-begin-marker-do-not-remove

Troubleshooting
^^^^^^^^^^^^^^^
**Common Issues and Solutions:**
1. **ABI Compatibility Issues:**

   * **Symptoms:** ``ImportError`` with undefined symbols when importing transformer_engine
   * **Solution:** Ensure PyTorch and Transformer Engine are built with the same C++ ABI setting. Rebuild PyTorch from source with a matching ABI.
   * **Context:** If you're using PyTorch built with a different C++ ABI than your system's default, you may encounter these undefined symbol errors. This is particularly common with pip-installed PyTorch outside of containers.
2. **Missing Headers or Libraries:**

   * **Symptoms:** CMake errors about missing headers (``cudnn.h``, ``cublas_v2.h``, ``filesystem``, etc.)
   * **Solution:** Install the missing development packages or set environment variables to point to the correct locations:

     .. code-block:: bash

         export CUDA_PATH=/path/to/cuda
         export CUDNN_PATH=/path/to/cudnn

   * If CMake can't find a C++ compiler, set the ``CXX`` environment variable.
   * Ensure all paths are correctly set before installation.
3. **Build Resource Issues:**

   * **Symptoms:** Compilation hangs, system freezes, or out-of-memory errors
   * **Solution:** Limit parallel builds:

     .. code-block:: bash

         MAX_JOBS=1 NVTE_BUILD_THREADS_PER_JOB=1 pip install ...
4. **Verbose Build Logging:**

   * For detailed build logs to help diagnose issues:

     .. code-block:: bash

         cd transformer_engine
         pip install -v -v -v --no-build-isolation .
.. troubleshooting-end-marker-do-not-remove
Breaking Changes
================
@@ -34,7 +34,7 @@ Transformer Engine can be directly installed from `our PyPI <https://pypi.org/pr

.. code-block:: bash

    pip3 install --no-build-isolation transformer_engine[pytorch]

To obtain the necessary Python bindings for Transformer Engine, the frameworks needed must be explicitly specified as extra dependencies in a comma-separated list (e.g. [jax,pytorch]). Transformer Engine ships wheels for the core library. Source distributions are shipped for the JAX and PyTorch extensions.

@@ -54,7 +54,7 @@ Execute the following command to install the latest stable version of Transforme

.. code-block:: bash

    pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@stable

This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable `NVTE_FRAMEWORK` to a comma-separated list (e.g. `NVTE_FRAMEWORK=jax,pytorch`).

@@ -71,7 +71,7 @@ Execute the following command to install the latest development build of Transfo

.. code-block:: bash

    pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@main

This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable `NVTE_FRAMEWORK` to a comma-separated list (e.g. `NVTE_FRAMEWORK=jax,pytorch`). To only build the framework-agnostic C++ API, set `NVTE_FRAMEWORK=none`.

@@ -79,7 +79,7 @@ In order to install a specific PR, execute (after changing NNN to the PR number)

.. code-block:: bash

    pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@refs/pull/NNN/merge

Installation (from source)

@@ -93,8 +93,8 @@ Execute the following commands to install Transformer Engine from source:

    git clone --branch stable --recursive https://github.com/NVIDIA/TransformerEngine.git
    cd TransformerEngine
    export NVTE_FRAMEWORK=pytorch  # Optionally set framework
    pip3 install --no-build-isolation .  # Build and install

If the Git repository has already been cloned, make sure to also clone the submodules:

@@ -106,10 +106,14 @@ Extra dependencies for testing can be installed by setting the "test" option:

.. code-block:: bash

    pip3 install --no-build-isolation .[test]

To build the C++ extensions with debug symbols, e.g. with the `-g` flag:

.. code-block:: bash

    pip3 install --no-build-isolation . --global-option=--debug
.. include:: ../README.rst
   :start-after: troubleshooting-begin-marker-do-not-remove
   :end-before: troubleshooting-end-marker-do-not-remove