"git@developer.sourcefind.cn:OpenDAS/TransformerEngine.git" did not exist on "b20c05310abf293db722345f490a9107894771d4"
Unverified commit ab5cc407 authored by yuzhongw-nvidia, committed by GitHub

Fix the condition error when checking fp8 attn in `get_attention_backend` (#1965)



Update utils.py

Fix the incorrect condition for checking FP8 attention in `get_attention_backend`.
Signed-off-by: yuzhongw-nvidia <yuzhongw@nvidia.com>
Co-authored-by: Xiaowei Ren <103958965+xrennvidia@users.noreply.github.com>
parent 78a38212
@@ -609,7 +609,7 @@ def get_attention_backend(
                 " bias for THD format"
             )
             use_fused_attention = False
-        elif fp8 and head_dim_qk != head_dim_v:
+        elif fp8 and fp8_meta["recipe"].fp8_dpa and head_dim_qk != head_dim_v:
             logger.debug(
                 "Disabling FusedAttention as it does not support context parallelism with FP8"
                 " MLA attention"
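
For context, here is a minimal standalone sketch of what the fix changes. `fp8=True` only indicates the model is running under an FP8 recipe; the recipe's `fp8_dpa` flag is what controls whether dot-product attention itself runs in FP8. The old condition disabled FusedAttention for any FP8 run with MLA-style mismatched head dims; the fixed one disables it only when FP8 attention is actually requested. The `Recipe` dataclass and `should_disable_fused_attention` helper below are illustrative stand-ins, not TransformerEngine APIs; the real check lives inside `get_attention_backend` and reads `fp8_meta["recipe"].fp8_dpa`.

```python
# Minimal sketch of the fixed condition. `Recipe` and
# `should_disable_fused_attention` are hypothetical stand-ins,
# not TransformerEngine code.
from dataclasses import dataclass


@dataclass
class Recipe:
    fp8_dpa: bool = False  # whether dot-product attention runs in FP8


def should_disable_fused_attention(
    fp8: bool, recipe: Recipe, head_dim_qk: int, head_dim_v: int
) -> bool:
    # Old (buggy):  fp8 and head_dim_qk != head_dim_v
    # New (fixed):  also require recipe.fp8_dpa, i.e. FP8 attention is on
    return fp8 and recipe.fp8_dpa and head_dim_qk != head_dim_v


# FP8 recipe but attention kept in higher precision: mismatched head
# dims (e.g. 192 vs 128) no longer disable FusedAttention.
assert not should_disable_fused_attention(True, Recipe(fp8_dpa=False), 192, 128)
# FP8 attention with mismatched head dims remains unsupported here.
assert should_disable_fused_attention(True, Recipe(fp8_dpa=True), 192, 128)
```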