[Misc] Remove unnecessary fallback to prefill-decode attention (#19138)

Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>

Commit 18093084 (unverified), authored Jun 05, 2025 by vllmellm; committed by GitHub Jun 05, 2025

Showing 1 changed file with 1 addition and 4 deletions

vllm/v1/attention/backends/triton_attn.py +1 -4

--- a/vllm/v1/attention/backends/triton_attn.py
+++ b/vllm/v1/attention/backends/triton_attn.py
@@ -171,10 +171,7 @@ class TritonAttentionImpl(AttentionImpl):
        # Whenever making a change in this method, please benchmark the
        # performance to make sure it does not introduce any overhead.
-        num_queries_per_kv = query.shape[1] // key.shape[1]
-        num_q_is_pow2 = (num_queries_per_kv & (num_queries_per_kv - 1)) == 0
-        use_prefill_decode_attn = (self.force_prefill_decode_attn
-                                   or not num_q_is_pow2)
+        use_prefill_decode_attn = self.force_prefill_decode_attn
        num_actual_tokens = attn_metadata.num_actual_tokens
        if use_prefill_decode_attn:
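
Note on the removed check: the deleted lines picked the prefill-decode path whenever the ratio of query heads to KV heads was not a power of two. The standalone sketch below reproduces that decision before and after this commit. It is illustrative only, not code from the vLLM tree: the function names are hypothetical, and it assumes that `query.shape[1]` and `key.shape[1]` hold the query and KV head counts, as the name `num_queries_per_kv` suggests.

```python
def is_power_of_two(n: int) -> bool:
    # Same bit trick as the removed line: clearing the lowest set bit of a
    # power of two leaves zero. Note that it also returns True for n == 0.
    return (n & (n - 1)) == 0


def use_prefill_decode_before(force: bool, num_q_heads: int,
                              num_kv_heads: int) -> bool:
    # Hypothetical helper mirroring the removed logic: fall back when forced
    # or when the heads-per-KV ratio is not a power of two.
    num_queries_per_kv = num_q_heads // num_kv_heads
    return force or not is_power_of_two(num_queries_per_kv)


def use_prefill_decode_after(force: bool, num_q_heads: int,
                             num_kv_heads: int) -> bool:
    # Hypothetical helper mirroring the new logic: only the force flag matters.
    return force


if __name__ == "__main__":
    # Head counts chosen for illustration; 24 // 8 == 3 is not a power of two.
    for q_heads, kv_heads in [(32, 8), (24, 8)]:
        before = use_prefill_decode_before(False, q_heads, kv_heads)
        after = use_prefill_decode_after(False, q_heads, kv_heads)
        print(q_heads, kv_heads, before, after)
```

With the force flag off, the two versions differ only for ratios that are not powers of two, for example 24 query heads over 8 KV heads.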