mla: don't update kv cache on dummy forwards (#36282)

Signed-off-by: Itay Alroy <ialroy@nvidia.com>

Commit 24a03915 (signature unverified), authored Mar 07, 2026 by Itay Alroy, committed by GitHub Mar 07, 2026

Showing 1 changed file with 4 additions and 0 deletions

vllm/model_executor/layers/attention/mla_attention.py +4 -0

--- a/vllm/model_executor/layers/attention/mla_attention.py
+++ b/vllm/model_executor/layers/attention/mla_attention.py
@@ -905,6 +905,10 @@ def unified_mla_kv_cache_update(
    the data dependency between them to ensure torch.compile preserves ordering.
    """
    forward_context = get_forward_context()
+    if forward_context.attn_metadata is None:
+        # Dummy/profile forwards should not update live KV cache pages.
+        return torch.empty(0, device=kv_c_normed.device, dtype=kv_c_normed.dtype)
+
    attn_layer = forward_context.no_compile_layers[layer_name]
    kv_cache = attn_layer.kv_cache[forward_context.virtual_engine]
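
The guard added in this hunk can be illustrated outside vLLM with a minimal sketch. Everything named here (`FakeForwardContext`, `kv_cache_update`, the `slot_mapping` key, the list-based cache) is a hypothetical stand-in for illustration, not the vLLM API. The only behavior taken from the diff is the rule itself: when `attn_metadata` is `None`, return an empty placeholder and leave the cache untouched.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeForwardContext:
    # Hypothetical stand-in for the forward context. attn_metadata is None
    # on dummy/profile forwards, mirroring the check in the diff.
    attn_metadata: Optional[dict] = None
    kv_cache: list = field(default_factory=list)


def kv_cache_update(ctx: FakeForwardContext, new_entry: float) -> list:
    """Append to the cache only when real attention metadata is present."""
    if ctx.attn_metadata is None:
        # Dummy forward: do not touch the cache, return an empty placeholder
        # so callers still receive a value of the same kind.
        return []
    ctx.kv_cache.append(new_entry)
    return [new_entry]


# A dummy forward (no metadata) and a real forward (metadata present).
dummy = FakeForwardContext(attn_metadata=None)
real = FakeForwardContext(attn_metadata={"slot_mapping": [0]})

dummy_out = kv_cache_update(dummy, 1.0)
real_out = kv_cache_update(real, 1.0)
```

In the sketch, the dummy call leaves `dummy.kv_cache` empty while the real call appends one entry. The actual change returns an empty tensor with the device and dtype of `kv_c_normed` rather than a list, which likely keeps the return type consistent for the data dependency that the docstring describes.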