Commit 684f2545 authored by Matthew Bonanni, committed by GitHub

Prefer FlashAttention MLA as default over FlashMLA (#27363)


Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
parent e5534249
```diff
@@ -55,15 +55,15 @@ def _get_backend_priorities(
             return [
                 AttentionBackendEnum.CUTLASS_MLA,
                 AttentionBackendEnum.FLASHINFER_MLA,
-                AttentionBackendEnum.FLASHMLA,
                 AttentionBackendEnum.FLASH_ATTN_MLA,
+                AttentionBackendEnum.FLASHMLA,
                 AttentionBackendEnum.TRITON_MLA,
                 AttentionBackendEnum.FLASHMLA_SPARSE,
             ]
         else:
             return [
-                AttentionBackendEnum.FLASHMLA,
                 AttentionBackendEnum.FLASH_ATTN_MLA,
+                AttentionBackendEnum.FLASHMLA,
                 AttentionBackendEnum.FLASHINFER_MLA,
                 AttentionBackendEnum.TRITON_MLA,
                 AttentionBackendEnum.FLASHMLA_SPARSE,
```
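The hunk only swaps `FLASHMLA` and `FLASH_ATTN_MLA` in both returned lists, so the change matters only if the list order drives selection. Below is a minimal sketch of that effect. It assumes the caller walks the list and takes the first backend the current setup supports. The `select_backend` loop and the `is_supported` predicate are hypothetical illustrations, not vLLM's selection code. The enum is a stand-in that reuses the member names from the diff. The condition that separates the two branches is not visible in the hunk, so the lists are simply labeled "if" and "else".

```python
from enum import Enum, auto
from typing import Callable, Sequence


class AttentionBackendEnum(Enum):
    # Hypothetical stand-in: same member names as in the diff, not vLLM's real enum.
    CUTLASS_MLA = auto()
    FLASHINFER_MLA = auto()
    FLASH_ATTN_MLA = auto()
    FLASHMLA = auto()
    TRITON_MLA = auto()
    FLASHMLA_SPARSE = auto()


B = AttentionBackendEnum

# Priority orders after this commit, copied from the two branches of the hunk.
IF_BRANCH_PRIORITIES = [
    B.CUTLASS_MLA, B.FLASHINFER_MLA, B.FLASH_ATTN_MLA,
    B.FLASHMLA, B.TRITON_MLA, B.FLASHMLA_SPARSE,
]
ELSE_BRANCH_PRIORITIES = [
    B.FLASH_ATTN_MLA, B.FLASHMLA, B.FLASHINFER_MLA,
    B.TRITON_MLA, B.FLASHMLA_SPARSE,
]


def select_backend(
    priorities: Sequence[AttentionBackendEnum],
    is_supported: Callable[[AttentionBackendEnum], bool],
) -> AttentionBackendEnum:
    """Hypothetical: return the first backend, in priority order, that is supported."""
    for backend in priorities:
        if is_supported(backend):
            return backend
    raise ValueError("no supported MLA backend in the priority list")


if __name__ == "__main__":
    # With every backend supported, the else branch now resolves to FlashAttention MLA.
    print(select_backend(ELSE_BRANCH_PRIORITIES, lambda b: True))
    # FlashMLA is reached only when FlashAttention MLA is ruled out.
    print(select_backend(ELSE_BRANCH_PRIORITIES, lambda b: b is not B.FLASH_ATTN_MLA))
```

Under these assumptions, the first call prints `AttentionBackendEnum.FLASH_ATTN_MLA` and the second prints `AttentionBackendEnum.FLASHMLA`. Before this commit, the first call would have returned `FLASHMLA`.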