README.md - Installation section (#1689)

* Update README.rst - Installation

  Update installation section with comprehensive guidelines

  - Add detailed system requirements
  - Include Conda installation method (experimental)
  - Document environment variables for customizing build process
  - Update FlashAttention support to cover both version 2 and 3
  - Add troubleshooting section with solutions for common installation issues

  Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>

* Update README.rst - Installation

  removed conda section

  Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>

* Update README.rst - Installation

  added all gpu archs that support FP8

  Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update README.rst

  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update README.rst

  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update README.rst

  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update installation.rst

  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix docs and adding troubleshooting

  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------

Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Commit 8ffbbabd, authored Apr 16, 2025 by Santosh Bhavani; committed by GitHub Apr 16, 2025

Showing with 128 additions and 37 deletions

README.rst +116 -29

docs/installation.rst +12 -8

--- a/README.rst
+++ b/README.rst
@@ -145,18 +145,30 @@ Flax

 Installation
 ============
-.. installation

-Pre-requisites
+System Requirements
 ^^^^^^^^^^^^^^^^^^^^
-* Linux x86_64
-* CUDA 12.1+ (CUDA 12.8+ for Blackwell)
-* NVIDIA Driver supporting CUDA 12.1 or later
-* cuDNN 9.3 or later

-Docker
-^^^^^^^^^^^^^^^^^^^^
+* **Hardware:** Blackwell, Hopper, Grace Hopper/Blackwell, Ada, Ampere
+
+* **OS:** Linux (official), WSL2 (limited support)
+
+* **Software:**
+
+  * CUDA: 12.1+ (Hopper/Ada/Ampere), 12.8+ (Blackwell) with compatible NVIDIA drivers
+  * cuDNN: 9.3+
+  * Compiler: GCC 9+ or Clang 10+ with C++17 support
+  * Python: 3.12 recommended
+
+* **Source Build Requirements:** CMake 3.18+, Ninja, Git 2.17+, pybind11 2.6.0+
+
+* **Notes:** FP8 features require Compute Capability 8.9+ (Ada/Hopper/Blackwell)

+Installation Methods
+^^^^^^^^^^^^^^^^^^^^
+
+Docker (Recommended)
+^^^^^^^^^^^^^^^^^^^^
 The quickest way to get started with Transformer Engine is by using Docker images on
 `NVIDIA GPU Cloud (NGC) Catalog <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch>`_.
 For example to use the NGC PyTorch container interactively,
@@ -167,41 +179,116 @@ For example to use the NGC PyTorch container interactively,

 Where 25.01 (corresponding to January 2025 release) is the container version.

-pip
-^^^^^^^^^^^^^^^^^^^^
-To install the latest stable version of Transformer Engine,
+**Benefits of using NGC containers:**
+
+* All dependencies pre-installed with compatible versions and optimized configurations
+* NGC PyTorch 23.08+ containers include FlashAttention-2
+
+pip Installation
+^^^^^^^^^^^^^^^^^^^
+
+**Prerequisites for pip installation:**
+
+* A compatible C++ compiler
+* CUDA Toolkit with cuDNN and NVCC (NVIDIA CUDA Compiler) installed
+
+To install the latest stable version with pip:

 .. code-block:: bash

-    pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@stable
+    # For PyTorch integration
+    pip install --no-build-isolation transformer_engine[pytorch]
+    
+    # For JAX integration
+    pip install --no-build-isolation transformer_engine[jax]
+    
+    # For both frameworks
+    pip install --no-build-isolation transformer_engine[pytorch,jax]
+
+Alternatively, install directly from the GitHub repository:

-This will automatically detect if any supported deep learning frameworks are installed and build
-Transformer Engine support for them. To explicitly specify frameworks, set the environment variable
-NVTE_FRAMEWORK to a comma-separated list (e.g. NVTE_FRAMEWORK=jax,pytorch).
+.. code-block:: bash
+
+    pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

-Alternatively, the package can be directly installed from
-`Transformer Engine's PyPI <https://pypi.org/project/transformer-engine/>`_, e.g.
+When installing from GitHub, you can explicitly specify frameworks using the environment variable:

 .. code-block:: bash

-    pip3 install transformer_engine[pytorch]
+    NVTE_FRAMEWORK=pytorch,jax pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
+
+Source Installation
+^^^^^^^^^^^^^^^^^^^
+
+`See the installation guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html#installation-from-source>`_
+
+Environment Variables
+^^^^^^^^^^^^^^^^^^^^^
+These environment variables can be set before installation to customize the build process:

-To obtain the necessary Python bindings for Transformer Engine, the frameworks needed must be
-explicitly specified as extra dependencies in a comma-separated list (e.g. [jax,pytorch]).
-Transformer Engine ships wheels for the core library. Source distributions are shipped for the JAX
-and PyTorch extensions.
+* **CUDA_PATH**: Path to CUDA installation
+* **CUDNN_PATH**: Path to cuDNN installation
+* **CXX**: Path to C++ compiler
+* **NVTE_FRAMEWORK**: Comma-separated list of frameworks to build for (e.g., ``pytorch,jax``)
+* **MAX_JOBS**: Limit number of parallel build jobs (default varies by system)
+* **NVTE_BUILD_THREADS_PER_JOB**: Control threads per build job

-From source
-^^^^^^^^^^^
-`See the installation guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html#installation-from-source>`_.
+Compiling with FlashAttention
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Transformer Engine supports both FlashAttention-2 and FlashAttention-3 in PyTorch for improved performance. FlashAttention-3 was added in release v1.11 and is prioritized over FlashAttention-2 when both are present in the environment.

-Compiling with FlashAttention-2
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Transformer Engine release v0.11.0 added support for FlashAttention-2 in PyTorch for improved performance.
+You can verify which FlashAttention version is being used by setting these environment variables:
+
+.. code-block:: bash
+
+    NVTE_DEBUG=1 NVTE_DEBUG_LEVEL=1 python your_script.py

 It is a known issue that FlashAttention-2 compilation is resource-intensive and requires a large amount of RAM (see `bug <https://github.com/Dao-AILab/flash-attention/issues/358>`_), which may lead to out of memory errors during the installation of Transformer Engine. Please try setting **MAX_JOBS=1** in the environment to circumvent the issue.

-Note that NGC PyTorch 23.08+ containers include FlashAttention-2.
+.. troubleshooting-begin-marker-do-not-remove
+Troubleshooting
+^^^^^^^^^^^^^^^^^^^
+
+**Common Issues and Solutions:**
+
+1. **ABI Compatibility Issues:**
+
+   * **Symptoms:** ``ImportError`` with undefined symbols when importing transformer_engine
+   * **Solution:** Ensure PyTorch and Transformer Engine are built with the same C++ ABI setting. Rebuild PyTorch from source with matching ABI.
+   * **Context:** If you're using PyTorch built with a different C++ ABI than your system's default, you may encounter these undefined symbol errors. This is particularly common with pip-installed PyTorch outside of containers.
+
+2. **Missing Headers or Libraries:**
+
+   * **Symptoms:** CMake errors about missing headers (``cudnn.h``, ``cublas_v2.h``, ``filesystem``, etc.)
+   * **Solution:** Install missing development packages or set environment variables to point to correct locations:
+
+     .. code-block:: bash
+
+         export CUDA_PATH=/path/to/cuda
+         export CUDNN_PATH=/path/to/cudnn
+
+   * If CMake can't find a C++ compiler, set the ``CXX`` environment variable.
+   * Ensure all paths are correctly set before installation.
+
+3. **Build Resource Issues:**
+
+   * **Symptoms:** Compilation hangs, system freezes, or out-of-memory errors
+   * **Solution:** Limit parallel builds:
+
+     .. code-block:: bash
+
+         MAX_JOBS=1 NVTE_BUILD_THREADS_PER_JOB=1 pip install ...
+
+4. **Verbose Build Logging:**
+
+   * For detailed build logs to help diagnose issues:
+
+     .. code-block:: bash
+
+         cd transformer_engine
+         pip install -v -v -v --no-build-isolation .
+
+.. troubleshooting-end-marker-do-not-remove

 Breaking Changes
 ================
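
The FP8 note in the new System Requirements list (Compute Capability 8.9+) can be checked programmatically. The following is a minimal sketch and not part of the commit: the helper name `supports_fp8` is hypothetical, and it only compares a `(major, minor)` pair against 8.9. In a PyTorch environment the pair could come from `torch.cuda.get_device_capability()`; that call appears only in a comment so the block runs without a GPU.

```python
# Hypothetical helper (not part of Transformer Engine): decide whether a
# GPU's compute capability meets the 8.9+ requirement stated in the README.
FP8_MIN_CAPABILITY = (8, 9)  # Ada is 8.9, Hopper is 9.0


def supports_fp8(major: int, minor: int) -> bool:
    """Return True if compute capability (major, minor) is at least 8.9."""
    # Tuples compare element by element, so (9, 0) >= (8, 9) is True
    # and (8, 6) >= (8, 9) is False.
    return (major, minor) >= FP8_MIN_CAPABILITY


# With PyTorch installed and a CUDA device visible (assumption), the pair
# could be obtained like this:
#   import torch
#   major, minor = torch.cuda.get_device_capability()

if __name__ == "__main__":
    for name, cap in [("Ampere", (8, 0)), ("Ada", (8, 9)), ("Hopper", (9, 0))]:
        print(f"{name} {cap}: FP8 supported = {supports_fp8(*cap)}")
```

Comparing tuples rather than a single float avoids misordering capabilities such as 8.10 versus 8.9, should a two-digit minor version ever appear.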

--- a/docs/installation.rst
+++ b/docs/installation.rst
@@ -34,7 +34,7 @@ Transformer Engine can be directly installed from `our PyPI <https://pypi.org/pr

 .. code-block:: bash

-    pip3 install transformer_engine[pytorch]
+    pip3 install --no-build-isolation transformer_engine[pytorch]

 To obtain the necessary Python bindings for Transformer Engine, the frameworks needed must be explicitly specified as extra dependencies in a comma-separated list (e.g. [jax,pytorch]). Transformer Engine ships wheels for the core library. Source distributions are shipped for the JAX and PyTorch extensions.

@@ -54,7 +54,7 @@ Execute the following command to install the latest stable version of Transforme

 .. code-block:: bash

-  pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@stable
+  pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@stable

 This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable `NVTE_FRAMEWORK` to a comma-separated list (e.g. `NVTE_FRAMEWORK=jax,pytorch`).

@@ -71,7 +71,7 @@ Execute the following command to install the latest development build of Transfo

 .. code-block:: bash

-  pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@main
+  pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@main

 This will automatically detect if any supported deep learning frameworks are installed and build Transformer Engine support for them. To explicitly specify frameworks, set the environment variable `NVTE_FRAMEWORK` to a comma-separated list (e.g. `NVTE_FRAMEWORK=jax,pytorch`). To only build the framework-agnostic C++ API, set `NVTE_FRAMEWORK=none`.

@@ -79,7 +79,7 @@ In order to install a specific PR, execute (after changing NNN to the PR number)

 .. code-block:: bash

-  pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@refs/pull/NNN/merge
+  pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@refs/pull/NNN/merge


 Installation (from source)
@@ -94,7 +94,7 @@ Execute the following commands to install Transformer Engine from source:

  cd TransformerEngine
  export NVTE_FRAMEWORK=pytorch         # Optionally set framework
-  pip3 install .                   # Build and install
+  pip3 install --no-build-isolation .   # Build and install

 If the Git repository has already been cloned, make sure to also clone the submodules:

@@ -106,10 +106,14 @@ Extra dependencies for testing can be installed by setting the "test" option:

 .. code-block:: bash

-  pip3 install .[test]
+  pip3 install --no-build-isolation .[test]

 To build the C++ extensions with debug symbols, e.g. with the `-g` flag:

 .. code-block:: bash

-  pip3 install . --global-option=--debug
+  pip3 install --no-build-isolation . --global-option=--debug
+
+.. include:: ../README.rst
+   :start-after: troubleshooting-begin-marker-do-not-remove
+   :end-before: troubleshooting-end-marker-do-not-remove
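
The `.. include::` directive added at the end of docs/installation.rst selects the Troubleshooting section of README.rst by means of the two marker comments. As a rough illustration of that selection, here is a sketch that mirrors the documented `start-after` / `end-before` behaviour; it is not docutils' actual implementation. The function name `extract_between` is hypothetical, and the sample README string is made up for the example.

```python
def extract_between(text: str, start_after: str, end_before: str) -> str:
    """Return the part of `text` after `start_after` and before `end_before`."""
    start = text.find(start_after)
    if start < 0:
        raise ValueError(f"start marker not found: {start_after!r}")
    start += len(start_after)
    # Search for the end marker only in the text that follows the start marker.
    end = text.find(end_before, start)
    if end < 0:
        raise ValueError(f"end marker not found: {end_before!r}")
    return text[start:end]


# Made-up stand-in for README.rst, using the marker names from the commit.
SAMPLE_README = """\
Installation
============

.. troubleshooting-begin-marker-do-not-remove
Troubleshooting
^^^^^^^^^^^^^^^^^^^

1. **ABI Compatibility Issues:** ...

.. troubleshooting-end-marker-do-not-remove

Breaking Changes
================
"""

section = extract_between(
    SAMPLE_README,
    "troubleshooting-begin-marker-do-not-remove",
    "troubleshooting-end-marker-do-not-remove",
)
print(section)
```

Because the markers are matched as plain text, renaming or deleting either one in README.rst makes the lookup fail, which is presumably why both carry a do-not-remove suffix. In this sketch the leading `.. ` of the end marker line stays in the selected text.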