[bugfix][deepseek] fix flashmla kernel selection (#25956)

Signed-off-by: youkaichao <youkaichao@gmail.com>

Commit a2e6fa7e authored Oct 01, 2025 by youkaichao, committed by GitHub Oct 01, 2025

Showing 1 changed file with 1 addition and 1 deletion

vllm/attention/ops/flashmla.py +1 -1

--- a/vllm/attention/ops/flashmla.py
+++ b/vllm/attention/ops/flashmla.py
@@ -136,7 +136,7 @@ def flash_mla_with_kvcache(
        descale_k is None
    ), "descale_q and descale_k should be both None or both not None"

-    if (descale_q is not None) and (descale_k is not None):
+    if indices is None and q.element_size() == 1:
        out, softmax_lse = torch.ops._flashmla_extension_C.fwd_kvcache_mla_fp8(
            q, k_cache, head_dim_v, cache_seqlens, block_table, softmax_scale,
            causal, tile_scheduler_metadata, num_splits, descale_q, descale_k)
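
The effect of the changed condition can be sketched without a GPU. The following is a minimal illustration, not vLLM code: the helper names `old_uses_fp8_kernel` and `new_uses_fp8_kernel` are hypothetical, and plain Python values stand in for the tensors (`q_element_size` plays the role of `q.element_size()`, which is 1 for one-byte dtypes such as FP8). It only mirrors the two predicates from the hunk above; the branch taken when the condition is false is not shown in this hunk and is not modeled here.

```python
from typing import Optional


def old_uses_fp8_kernel(descale_q: Optional[object],
                        descale_k: Optional[object]) -> bool:
    """Predicate removed by this commit: chose the FP8 kernel
    whenever both descale factors were provided."""
    return (descale_q is not None) and (descale_k is not None)


def new_uses_fp8_kernel(indices: Optional[object],
                        q_element_size: int) -> bool:
    """Predicate added by this commit: chooses the FP8 kernel only when
    no indices are passed and each query element occupies one byte."""
    return indices is None and q_element_size == 1


if __name__ == "__main__":
    scale = object()  # hypothetical stand-in for a descale tensor

    # Hypothetical input combinations, chosen only to show where the two
    # predicates disagree:
    # (description, descale_q, descale_k, indices, q_element_size)
    cases = [
        ("1-byte q, no indices, descales given", scale, scale, None, 1),
        ("2-byte q, no indices, descales given", scale, scale, None, 2),
        ("1-byte q, indices given, descales given", scale, scale, [0, 1], 1),
        ("1-byte q, no indices, no descales", None, None, None, 1),
    ]
    for name, dq, dk, idx, size in cases:
        old = old_uses_fp8_kernel(dq, dk)
        new = new_uses_fp8_kernel(idx, size)
        print(f"{name}: old={old} new={new}")
```

The design point the sketch makes is that the new check keys on the query's storage width and the absence of `indices` rather than on whether descale factors happen to be supplied, so the two predicates can disagree for the same call.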