"git@developer.sourcefind.cn:OpenDAS/TransformerEngine.git" did not exist on "b20c05310abf293db722345f490a9107894771d4"
Unverified commit ab5cc407 authored by yuzhongw-nvidia, committed by GitHub

Fix the condition error when checking fp8 attn in `get_attention_backend` (#1965)



Update utils.py

Fix the incorrect condition for checking FP8 attention in `get_attention_backend`.
Signed-off-by: yuzhongw-nvidia <yuzhongw@nvidia.com>
Co-authored-by: Xiaowei Ren <103958965+xrennvidia@users.noreply.github.com>
parent 78a38212
@@ -609,7 +609,7 @@ def get_attention_backend(
                 " bias for THD format"
             )
             use_fused_attention = False
-        elif fp8 and head_dim_qk != head_dim_v:
+        elif fp8 and fp8_meta["recipe"].fp8_dpa and head_dim_qk != head_dim_v:
             logger.debug(
                 "Disabling FusedAttention as it does not support context parallelism with FP8"
                 " MLA attention"
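
For context, here is a minimal standalone sketch of what the fix changes. `fp8=True` only indicates the model is running under an FP8 recipe; the recipe's `fp8_dpa` flag is what controls whether dot-product attention itself runs in FP8. The old condition disabled FusedAttention for any FP8 run with MLA-style mismatched head dims; the fixed one disables it only when FP8 attention is actually requested. The `Recipe` dataclass and `should_disable_fused_attention` helper below are illustrative stand-ins, not TransformerEngine APIs; the real check lives inside `get_attention_backend` and reads `fp8_meta["recipe"].fp8_dpa`.

```python
# Minimal sketch of the fixed condition. `Recipe` and
# `should_disable_fused_attention` are hypothetical stand-ins,
# not TransformerEngine code.
from dataclasses import dataclass


@dataclass
class Recipe:
    fp8_dpa: bool = False  # whether dot-product attention runs in FP8


def should_disable_fused_attention(
    fp8: bool, recipe: Recipe, head_dim_qk: int, head_dim_v: int
) -> bool:
    # Old (buggy):  fp8 and head_dim_qk != head_dim_v
    # New (fixed):  also require recipe.fp8_dpa, i.e. FP8 attention is on
    return fp8 and recipe.fp8_dpa and head_dim_qk != head_dim_v


# FP8 recipe but attention kept in higher precision: mismatched head
# dims (e.g. 192 vs 128) no longer disable FusedAttention.
assert not should_disable_fused_attention(True, Recipe(fp8_dpa=False), 192, 128)
# FP8 attention with mismatched head dims remains unsupported here.
assert should_disable_fused_attention(True, Recipe(fp8_dpa=True), 192, 128)
```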