Unverified Commit 8ffbbabd authored by Santosh Bhavani's avatar Santosh Bhavani Committed by GitHub
Browse files

README.md - Installation section (#1689)



* Update README.rst - Installation

Update installation section with comprehensive guidelines

- Add detailed system requirements
- Include Conda installation method (experimental)
- Document environment variables for customizing build process
- Update FlashAttention support to cover both version 2 and 3
- Add troubleshooting section with solutions for common installation issues
Signed-off-by: default avatarSantosh Bhavani <sbhavani@nvidia.com>

* Update README.rst - Installation

removed conda section
Signed-off-by: default avatarSantosh Bhavani <sbhavani@nvidia.com>

* Update README.rst - Installation

added all gpu archs that support FP8
Co-authored-by: default avatarPrzemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: default avatarKirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update README.rst
Signed-off-by: default avatarKirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update README.rst
Signed-off-by: default avatarKirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update README.rst
Signed-off-by: default avatarKirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update installation.rst
Signed-off-by: default avatarKirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix docs and adding troubleshooting
Signed-off-by: default avatarKirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: default avatarSantosh Bhavani <sbhavani@nvidia.com>
Signed-off-by: default avatarKirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: default avatarPrzemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: default avatarKirthi Shankar Sivamani <ksivamani@nvidia.com>
parent beaecf84
......@@ -145,18 +145,30 @@ Flax
Installation
============
.. installation
Pre-requisites
System Requirements
^^^^^^^^^^^^^^^^^^^^
* Linux x86_64
* CUDA 12.1+ (CUDA 12.8+ for Blackwell)
* NVIDIA Driver supporting CUDA 12.1 or later
* cuDNN 9.3 or later
Docker
^^^^^^^^^^^^^^^^^^^^
* **Hardware:** Blackwell, Hopper, Grace Hopper/Blackwell, Ada, Ampere
* **OS:** Linux (official), WSL2 (limited support)
* **Software:**
* CUDA: 12.1+ (Hopper/Ada/Ampere), 12.8+ (Blackwell) with compatible NVIDIA drivers
* cuDNN: 9.3+
* Compiler: GCC 9+ or Clang 10+ with C++17 support
* Python: 3.12 recommended
* **Source Build Requirements:** CMake 3.18+, Ninja, Git 2.17+, pybind11 2.6.0+
* **Notes:** FP8 features require Compute Capability 8.9+ (Ada/Hopper/Blackwell)
Installation Methods
^^^^^^^^^^^^^^^^^^^
Docker (Recommended)
^^^^^^^^^^^^^^^^^^^
The quickest way to get started with Transformer Engine is by using Docker images on
`NVIDIA GPU Cloud (NGC) Catalog <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch>`_.
For example to use the NGC PyTorch container interactively,
......@@ -167,41 +179,116 @@ For example to use the NGC PyTorch container interactively,
Where 25.01 (corresponding to January 2025 release) is the container version.
pip
^^^^^^^^^^^^^^^^^^^^
To install the latest stable version of Transformer Engine,
**Benefits of using NGC containers:**
* All dependencies pre-installed with compatible versions and optimized configurations
* NGC PyTorch 23.08+ containers include FlashAttention-2
pip Installation
^^^^^^^^^^^^^^^^^^^
**Prerequisites for pip installation:**
* A compatible C++ compiler
* CUDA Toolkit with cuDNN and NVCC (NVIDIA CUDA Compiler) installed
To install the latest stable version with pip:
.. code-block:: bash
pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@stable
# For PyTorch integration
pip install --no-build-isolation transformer_engine[pytorch]
# For JAX integration
pip install --no-build-isolation transformer_engine[jax]
# For both frameworks
pip install --no-build-isolation transformer_engine[pytorch,jax]
Alternatively, install directly from the GitHub repository:
This will automatically detect if any supported deep learning frameworks are installed and build
Transformer Engine support for them. To explicitly specify frameworks, set the environment variable
NVTE_FRAMEWORK to a comma-separated list (e.g. NVTE_FRAMEWORK=jax,pytorch).
.. code-block:: bash
pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
Alternatively, the package can be directly installed from
`Transformer Engine's PyPI <https://pypi.org/project/transformer-engine/>`_, e.g.
When installing from GitHub, you can explicitly specify frameworks using the environment variable:
.. code-block:: bash
pip3 install transformer_engine[pytorch]
NVTE_FRAMEWORK=pytorch,jax pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
Source Installation
^^^^^^^^^^^^^^^^^^^
`See the installation guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html#installation-from-source>`_
Environment Variables
^^^^^^^^^^^^^^^^^^^
These environment variables can be set before installation to customize the build process:
To obtain the necessary Python bindings for Transformer Engine, the frameworks needed must be
explicitly specified as extra dependencies in a comma-separated list (e.g. [jax,pytorch]).
Transformer Engine ships wheels for the core library. Source distributions are shipped for the JAX
and PyTorch extensions.
* **CUDA_PATH**: Path to CUDA installation
* **CUDNN_PATH**: Path to cuDNN installation
* **CXX**: Path to C++ compiler
* **NVTE_FRAMEWORK**: Comma-separated list of frameworks to build for (e.g., ``pytorch,jax``)
* **MAX_JOBS**: Limit number of parallel build jobs (default varies by system)
* **NVTE_BUILD_THREADS_PER_JOB**: Control threads per build job
From source
^^^^^^^^^^^
`See the installation guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html#installation-from-source>`_.
Compiling with FlashAttention
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Transformer Engine supports both FlashAttention-2 and FlashAttention-3 in PyTorch for improved performance. FlashAttention-3 was added in release v1.11 and is prioritized over FlashAttention-2 when both are present in the environment.
Compiling with FlashAttention-2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Transformer Engine release v0.11.0 added support for FlashAttention-2 in PyTorch for improved performance.
You can verify which FlashAttention version is being used by setting these environment variables:
.. code-block:: bash
NVTE_DEBUG=1 NVTE_DEBUG_LEVEL=1 python your_script.py
It is a known issue that FlashAttention-2 compilation is resource-intensive and requires a large amount of RAM (see `bug <https://github.com/Dao-AILab/flash-attention/issues/358>`_), which may lead to out of memory errors during the installation of Transformer Engine. Please try setting **MAX_JOBS=1** in the environment to circumvent the issue.
Note that NGC PyTorch 23.08+ containers include FlashAttention-2.
.. troubleshooting-begin-marker-do-not-remove
Troubleshooting
^^^^^^^^^^^^^^^^^^^
**Common Issues and Solutions:**
1. **ABI Compatibility Issues:**
* **Symptoms:** ``ImportError`` with undefined symbols when importing transformer_engine
* **Solution:** Ensure PyTorch and Transformer Engine are built with the same C++ ABI setting. Rebuild PyTorch from source with matching ABI.
* **Context:** If you're using PyTorch built with a different C++ ABI than your system's default, you may encounter these undefined symbol errors. This is particularly common with pip-installed PyTorch outside of containers.
2. **Missing Headers or Libraries:**
* **Symptoms:** CMake errors about missing headers (``cudnn.h``, ``cublas_v2.h``, ``filesystem``, etc.)
* **Solution:** Install missing development packages or set environment variables to point to correct locations:
.. code-block:: bash
export CUDA_PATH=/path/to/cuda
export CUDNN_PATH=/path/to/cudnn
* If CMake can't find a C++ compiler, set the ``CXX`` environment variable.
* Ensure all paths are correctly set before installation.
3. **Build Resource Issues:**
* **Symptoms:** Compilation hangs, system freezes, or out-of-memory errors
* **Solution:** Limit parallel builds:
.. code-block:: bash
MAX_JOBS=1 NVTE_BUILD_THREADS_PER_JOB=1 pip install ...
4. **Verbose Build Logging:**
* For detailed build logs to help diagnose issues:
.. code-block:: bash
cd transformer_engine
pip install -v -v -v --no-build-isolation .
.. troubleshooting-end-marker-do-not-remove
Breaking Changes
================
......
......@@ -34,7 +34,7 @@ Transformer Engine can be directly installed from `our PyPI <https://pypi.org/pr
.. code-block:: bash
pip3 install transformer_engine[pytorch]
pip3 install --no-build-isolation transformer_engine[pytorch]
To obtain the necessary Python bindings for Transformer Engine, the frameworks needed must be explicitly specified as extra dependencies in a comma-separated list (e.g. [jax,pytorch]). Transformer Engine ships wheels for the core library. Source distributions are shipped for the JAX and PyTorch extensions.
......@@ -54,7 +54,7 @@ Execute the following command to install the latest stable version of Transforme
.. code-block:: bash
pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@stable
pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@stable
This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable `NVTE_FRAMEWORK` to a comma-separated list (e.g. `NVTE_FRAMEWORK=jax,pytorch`).
......@@ -71,7 +71,7 @@ Execute the following command to install the latest development build of Transfo
.. code-block:: bash
pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@main
pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@main
This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable `NVTE_FRAMEWORK` to a comma-separated list (e.g. `NVTE_FRAMEWORK=jax,pytorch`). To only build the framework-agnostic C++ API, set `NVTE_FRAMEWORK=none`.
......@@ -79,7 +79,7 @@ In order to install a specific PR, execute (after changing NNN to the PR number)
.. code-block:: bash
pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@refs/pull/NNN/merge
pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@refs/pull/NNN/merge
Installation (from source)
......@@ -94,7 +94,7 @@ Execute the following commands to install Transformer Engine from source:
cd TransformerEngine
export NVTE_FRAMEWORK=pytorch # Optionally set framework
pip3 install . # Build and install
pip3 install --no-build-isolation . # Build and install
If the Git repository has already been cloned, make sure to also clone the submodules:
......@@ -106,10 +106,14 @@ Extra dependencies for testing can be installed by setting the "test" option:
.. code-block:: bash
pip3 install .[test]
pip3 install --no-build-isolation .[test]
To build the C++ extensions with debug symbols, e.g. with the `-g` flag:
.. code-block:: bash
pip3 install . --global-option=--debug
pip3 install --no-build-isolation . --global-option=--debug
.. include:: ../README.rst
:start-after: troubleshooting-begin-marker-do-not-remove
:end-before: troubleshooting-end-marker-do-not-remove
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment