Unverified Commit 94ba75d7 authored by Vadim Markovtsev's avatar Vadim Markovtsev Committed by GitHub

Support building with headers from nvidia wheels (#2623)



* Support building with headers from nvidia wheels

There are two changes:
1. `import nvidia` can return a namespace package whose `__file__` is `None`, so the code now falls back to `__path__`.
2. Add a way to force using the headers from nvidia wheels via the `NVTE_BUILD_USE_NVIDIA_WHEELS` environment variable. Without that envvar, it is practically impossible to select the wheel headers when CUDA is also installed system-wide.
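
The namespace-package behavior behind change (1) can be reproduced with a throwaway package; the `nvidia_demo` name below is invented purely for illustration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a directory WITHOUT __init__.py so Python treats it as a
# PEP 420 namespace package (same situation as the real `nvidia` wheels).
tmp = Path(tempfile.mkdtemp())
(tmp / "nvidia_demo" / "cuda_runtime" / "include").mkdir(parents=True)

sys.path.insert(0, str(tmp))
pkg = importlib.import_module("nvidia_demo")

# Since Python 3.7, namespace packages have __file__ set to None,
# so Path(pkg.__file__).parent would raise TypeError.
print(pkg.__file__)   # None
# __path__ still lists the package directories -- the fallback the patch uses.
root = Path(pkg.__path__[0])
print(root.name)      # nvidia_demo
```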

I successfully built the package with torch using the following `uv` configuration:
```
[tool.uv.extra-build-dependencies]
"transformer-engine-torch" = [
    "ninja",
    "nvidia-cuda-crt==13.0.88",
    "nvidia-cuda-cccl==13.0.85",
    { requirement = "torch", match-runtime = true },
    { requirement = "pytorch-triton", match-runtime = true },
    { requirement = "nvidia-cusolver", match-runtime = true },
    { requirement = "nvidia-curand", match-runtime = true },
    { requirement = "nvidia-cublas", match-runtime = true },
    { requirement = "nvidia-cusparse", match-runtime = true },
    { requirement = "nvidia-cudnn-cu13", match-runtime = true },
    { requirement = "nvidia-nvtx", match-runtime = true },
    { requirement = "nvidia-cuda-nvrtc", match-runtime = true },
    { requirement = "nvidia-cuda-runtime", match-runtime = true },
]
```
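
A note on how the new flag is parsed: the patch reads it as `bool(int(...))`, so only integer strings are accepted; a value like `"true"` would raise `ValueError`. A minimal sketch of that parsing:

```python
import os

def use_nvidia_wheels() -> bool:
    # Mirrors the parsing in the patch: the value must be an integer
    # string such as "0" or "1"; non-numeric values raise ValueError.
    return bool(int(os.getenv("NVTE_BUILD_USE_NVIDIA_WHEELS", "0")))

os.environ.pop("NVTE_BUILD_USE_NVIDIA_WHEELS", None)
print(use_nvidia_wheels())  # False (unset defaults to "0")

os.environ["NVTE_BUILD_USE_NVIDIA_WHEELS"] = "1"
print(use_nvidia_wheels())  # True
```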
Signed-off-by: Vadim Markovtsev <vadim@poolside.ai>

* Apply suggestion from @ksivaman
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Vadim Markovtsev <vadim@poolside.ai>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
parent 3ceb248e
```diff
@@ -228,9 +228,10 @@ def nvcc_path() -> Tuple[str, str]:
 def get_cuda_include_dirs() -> Tuple[str, str]:
     """Returns the CUDA header directory."""
+    force_wheels = bool(int(os.getenv("NVTE_BUILD_USE_NVIDIA_WHEELS", "0")))
     # If cuda is installed via toolkit, all necessary headers
     # are bundled inside the top level cuda directory.
-    if cuda_toolkit_include_path() is not None:
+    if not force_wheels and cuda_toolkit_include_path() is not None:
         return [cuda_toolkit_include_path()]
     # Use pip wheels to include all headers.
@@ -239,7 +240,10 @@ def get_cuda_include_dirs() -> Tuple[str, str]:
     except ModuleNotFoundError as e:
         raise RuntimeError("CUDA not found.")
-    cuda_root = Path(nvidia.__file__).parent
+    if nvidia.__file__ is not None:
+        cuda_root = Path(nvidia.__file__).parent
+    else:
+        cuda_root = Path(nvidia.__path__[0])  # namespace
     return [
         subdir / "include"
         for subdir in cuda_root.iterdir()
```
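
Putting the two hunks together, the resolution order can be sketched as a standalone approximation. This is not the actual `build_tools` code; `find_cuda_include_dirs` and its parameters are invented here so the logic can run without a CUDA install:

```python
import os
from pathlib import Path
from types import SimpleNamespace

def find_cuda_include_dirs(nvidia_pkg, toolkit_include=None):
    """Approximation of the patched get_cuda_include_dirs() logic."""
    force_wheels = bool(int(os.getenv("NVTE_BUILD_USE_NVIDIA_WHEELS", "0")))
    # A system-wide toolkit wins unless the envvar forces wheel headers.
    if not force_wheels and toolkit_include is not None:
        return [toolkit_include]
    # Regular package: __file__ points at nvidia/__init__.py.
    if nvidia_pkg.__file__ is not None:
        cuda_root = Path(nvidia_pkg.__file__).parent
    else:
        # Namespace package: fall back to the first __path__ entry.
        cuda_root = Path(nvidia_pkg.__path__[0])
    return [sub / "include" for sub in cuda_root.iterdir() if sub.is_dir()]

# Usage with a fake namespace-package layout (illustrative only):
import tempfile
demo_root = Path(tempfile.mkdtemp()) / "nvidia"
(demo_root / "cuda_runtime" / "include").mkdir(parents=True)
fake_pkg = SimpleNamespace(__file__=None, __path__=[str(demo_root)])
os.environ["NVTE_BUILD_USE_NVIDIA_WHEELS"] = "1"
print(find_cuda_include_dirs(fake_pkg, toolkit_include=Path("/usr/local/cuda/include")))
```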