[Bugfix] Add missing encoder-only guard for do_kv_cache_update (#33269)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

Commit ab597c86 authored Jan 28, 2026 by Gregory Shtrasberg; committed by GitHub Jan 28, 2026.

Showing 1 changed file with 4 additions and 0 deletions:

vllm/v1/attention/backends/triton_attn.py (+4 -0)

--- a/vllm/v1/attention/backends/triton_attn.py
+++ b/vllm/v1/attention/backends/triton_attn.py
@@ -572,6 +572,10 @@ class TritonAttentionImpl(AttentionImpl):
        kv_cache: torch.Tensor,
        slot_mapping: torch.Tensor,
    ):
+        if self.attn_type in (AttentionType.ENCODER_ONLY, AttentionType.ENCODER):
+            # For encoder attention,
+            # we use direct Q, K, V tensors without caching
+            return
        # For decoder and cross-attention, use KV cache as before
        key_cache, value_cache = kv_cache.unbind(1)
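The effect of the guard can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `AttentionType` is a stand-in for the enum the diff references, `ToyAttentionImpl` is not vLLM's `TritonAttentionImpl`, the `key`/`value` parameters are assumed, and the KV cache is modelled as two plain Python lists (keys, values) indexed by slot rather than as a torch tensor split with `unbind(1)`. It shows that encoder-type layers return before touching the cache, while decoder layers write each token's key and value into the slot given by `slot_mapping`.

```python
from enum import Enum


class AttentionType(Enum):
    # Hypothetical stand-in for the attention types named in the diff.
    DECODER = "decoder"
    ENCODER = "encoder"
    ENCODER_ONLY = "encoder_only"
    ENCODER_DECODER = "encoder_decoder"


class ToyAttentionImpl:
    """Hypothetical, simplified impl; not vLLM's TritonAttentionImpl."""

    def __init__(self, attn_type):
        self.attn_type = attn_type

    def do_kv_cache_update(self, key, value, kv_cache, slot_mapping):
        # Encoder attention consumes Q, K, V directly, so nothing is cached.
        if self.attn_type in (AttentionType.ENCODER_ONLY, AttentionType.ENCODER):
            return
        # Decoder / cross-attention: kv_cache[0] holds keys, kv_cache[1] values.
        key_cache, value_cache = kv_cache
        # Token i is written to the cache slot slot_mapping[i].
        for tok, slot in enumerate(slot_mapping):
            key_cache[slot] = key[tok]
            value_cache[slot] = value[tok]


def make_cache(num_slots):
    # Two empty slot arrays: one for keys, one for values.
    return [[None] * num_slots, [None] * num_slots]


# Decoder layer: tokens 0 and 1 land in slots 2 and 0.
dec_cache = make_cache(4)
ToyAttentionImpl(AttentionType.DECODER).do_kv_cache_update(
    ["k0", "k1"], ["v0", "v1"], dec_cache, [2, 0]
)

# Encoder-only layer: the guard returns early and the cache is untouched.
enc_cache = make_cache(4)
ToyAttentionImpl(AttentionType.ENCODER_ONLY).do_kv_cache_update(
    ["k0", "k1"], ["v0", "v1"], enc_cache, [2, 0]
)
```

Returning early, rather than writing into a cache that encoder layers never read, avoids touching (or requiring) a KV cache for layers that attend only over the current input.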