Commit f8cb598c authored by Tim Moon, committed by GitHub

[PyTorch] Only disable Flash Attention in Userbuffers test on SM 8.0 (#2401)



Only disable Flash Attention in Userbuffers test on A100
Signed-off-by: Tim Moon <tmoon@nvidia.com>
parent a75da0ca
@@ -120,6 +120,10 @@ def _run_layer_with_overlap(
os.environ["PYTORCH_JIT"] = "0"
os.environ["NVTE_TORCH_COMPILE"] = "0"
os.environ["NVTE_ALLOW_NONDETERMINISTIC_ALGO"] = "0"
if te.get_device_compute_capability() <= (8, 0):
        # We've observed numerical discrepancies in the Flash
        # Attention backward pass when running with Userbuffers on
        # A100s (SM 8.0). This does not show up on more recent GPUs.
os.environ["NVTE_FLASH_ATTN"] = "0"
result = subprocess.run(test_cmd, env=os.environ, capture_output=True, check=False)
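The gating logic above can be sketched in isolation. This is a minimal, hypothetical helper (not part of Transformer Engine) that mirrors the diff's behavior: given a `(major, minor)` compute capability, it sets `NVTE_FLASH_ATTN=0` for SM 8.0 and older, relying on Python's lexicographic tuple comparison.

```python
import os


def maybe_disable_flash_attn(compute_capability, env=None):
    """Hypothetical sketch of the test's gating logic.

    compute_capability: (major, minor) tuple, e.g. (8, 0) for A100.
    env: mutable mapping of environment variables (defaults to os.environ).
    """
    if env is None:
        env = os.environ
    # Tuples compare lexicographically, so (8, 0) <= (8, 0) is True
    # while (8, 6) and (9, 0) are not, matching the diff's condition.
    if compute_capability <= (8, 0):
        env["NVTE_FLASH_ATTN"] = "0"
    return env


# Usage sketch: A100 (SM 8.0) gets Flash Attention disabled,
# Hopper (SM 9.0) is left untouched.
a100_env = maybe_disable_flash_attn((8, 0), env={})
hopper_env = maybe_disable_flash_attn((9, 0), env={})
```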