[Perf][GDN] Align TMA usage with upstream FLA (#38981)

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Unverified commit 99e5539a, authored Apr 04, 2026 by Artem Perevedentsev; committed by GitHub Apr 05, 2026.

Showing with 7 additions and 3 deletions

vllm/model_executor/layers/fla/ops/utils.py +7 -3

--- a/vllm/model_executor/layers/fla/ops/utils.py
+++ b/vllm/model_executor/layers/fla/ops/utils.py
@@ -154,9 +154,13 @@ is_nvidia_hopper = is_nvidia and (
 )
 use_cuda_graph = is_nvidia and os.environ.get("FLA_USE_CUDA_GRAPH", "0") == "1"
 is_gather_supported = hasattr(triton.language, "gather")
-is_tma_supported = (is_nvidia and torch.cuda.get_device_capability(0)[0] >= 9) and (
-    hasattr(triton.language, "_experimental_make_tensor_descriptor")
-    or hasattr(triton.language, "make_tensor_descriptor")
+is_tma_supported = (
+    is_nvidia_hopper
+    and os.getenv("FLA_USE_TMA", "0") == "1"
+    and (
+        hasattr(triton.language, "_experimental_make_tensor_descriptor")
+        or hasattr(triton.language, "make_tensor_descriptor")
+    )
 )
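To make the behavioral change concrete, here is a minimal sketch that restates the old and new `is_tma_supported` expressions from the hunk as plain functions, so they can be exercised without a GPU or Triton. The function names and the `SimpleNamespace` stand-ins for `triton.language` are hypothetical. `is_nvidia_hopper` is passed in as a boolean because its full definition is cut off in the hunk header. Under the old rule, TMA was enabled by default on any NVIDIA device whose compute-capability major version is 9 or higher. Under the new rule, it also requires an explicit `FLA_USE_TMA=1` opt-in.

```python
from types import SimpleNamespace
from typing import Mapping


def tma_supported_before(is_nvidia: bool, major_capability: int, tl) -> bool:
    # Old rule (hypothetical restatement): NVIDIA device with compute capability
    # major >= 9, plus a Triton build exposing a tensor-descriptor API.
    return (is_nvidia and major_capability >= 9) and (
        hasattr(tl, "_experimental_make_tensor_descriptor")
        or hasattr(tl, "make_tensor_descriptor")
    )


def tma_supported_after(is_nvidia_hopper: bool, env: Mapping[str, str], tl) -> bool:
    # New rule (hypothetical restatement): is_nvidia_hopper AND explicit opt-in
    # via FLA_USE_TMA=1 AND a tensor-descriptor API. In the real module, `env`
    # corresponds to the process environment read through os.getenv.
    return (
        is_nvidia_hopper
        and env.get("FLA_USE_TMA", "0") == "1"
        and (
            hasattr(tl, "_experimental_make_tensor_descriptor")
            or hasattr(tl, "make_tensor_descriptor")
        )
    )


# Stand-ins for triton.language: one exposes make_tensor_descriptor, one exposes neither.
tl_with_desc = SimpleNamespace(make_tensor_descriptor=object())
tl_without_desc = SimpleNamespace()

print(tma_supported_before(True, 9, tl_with_desc))                       # True: on by default
print(tma_supported_after(True, {}, tl_with_desc))                       # False: not opted in
print(tma_supported_after(True, {"FLA_USE_TMA": "1"}, tl_with_desc))     # True: opted in
print(tma_supported_after(True, {"FLA_USE_TMA": "1"}, tl_without_desc))  # False: no descriptor API
```

In practice, the opt-in means exporting `FLA_USE_TMA=1` in the environment before the module is imported, because `is_tma_supported` is evaluated once at import time.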