Unverified commit 26370b11 authored by Kshitij Lakhani, committed by GitHub

[PyT] Bump the min version expected to support FP8 current scaling determinism on Blackwell (#2316)

* Bump the min version expected to support FP8 current scaling determinism on Blackwell
Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>

* Disable fused attention for cuDNN < 9.14 for FP8 current scaling; disable fused attention for cuDNN < 9.18 for deterministic FP8 current scaling
Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



---------
Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
parent 0e80c847
@@ -477,9 +477,21 @@ def get_attention_backend(
     if device_compute_capability < (10, 0):
         logger.debug("Disabling FusedAttention for FP8 current scaling on arch < sm100")
         use_fused_attention = False
-    elif cudnn_version < (9, 14, 0):
-        logger.debug("Disabling FusedAttention for FP8 current scaling with cuDNN < 9.14.0")
-        use_fused_attention = False
+    # TODO(cyanguwa): Modify the min cuDNN version supporting FP8 current scaling
+    # determinism for Blackwell
+    else:
+        if cudnn_version < (9, 14, 0):
+            logger.debug(
+                "Disabling FusedAttention for FP8 current scaling with cuDNN < 9.14.0"
+            )
+            use_fused_attention = False
+        else:
+            if deterministic and cudnn_version < (9, 18, 0):
+                logger.debug(
+                    "Disabling FusedAttention for FP8 current scaling requiring determinism"
+                    " with cuDNN < 9.18.0"
+                )
+                use_fused_attention = False

     if device_compute_capability == (12, 0):
         if use_flash_attention:
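For reference, the gate this hunk modifies reduces to a pure predicate over the device compute capability, the cuDNN version, and the determinism flag. Below is a minimal standalone sketch of that logic; the function name and the tuple-shaped arguments are illustrative assumptions, not part of the TransformerEngine API:

```python
# Sketch of the FP8 current-scaling fused-attention gate after this commit.
# The helper name and tuple-based version comparisons are hypothetical;
# TransformerEngine performs these checks inside get_attention_backend.
def fused_attention_ok_for_fp8_current_scaling(
    device_compute_capability: tuple[int, int],
    cudnn_version: tuple[int, int, int],
    deterministic: bool,
) -> bool:
    # FP8 current scaling needs Blackwell (sm100) or newer.
    if device_compute_capability < (10, 0):
        return False
    # Non-deterministic FP8 current scaling needs cuDNN >= 9.14.0.
    if cudnn_version < (9, 14, 0):
        return False
    # Deterministic FP8 current scaling needs cuDNN >= 9.18.0.
    if deterministic and cudnn_version < (9, 18, 0):
        return False
    return True


# Example: on Blackwell with cuDNN 9.16, determinism forces a fallback,
# while cuDNN 9.18 keeps FusedAttention eligible.
assert not fused_attention_ok_for_fp8_current_scaling((10, 0), (9, 16, 0), True)
assert fused_attention_ok_for_fp8_current_scaling((10, 0), (9, 18, 0), True)
```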