Commit 24a03915, authored by Itay Alroy and committed by GitHub

mla: don't update kv cache on dummy forwards (#36282)


Signed-off-by: Itay Alroy <ialroy@nvidia.com>
parent b5e34e1f
@@ -905,6 +905,10 @@ def unified_mla_kv_cache_update(
    the data dependency between them to ensure torch.compile preserves ordering.
    """
    forward_context = get_forward_context()
    if forward_context.attn_metadata is None:
        # Dummy/profile forwards should not update live KV cache pages.
        return torch.empty(0, device=kv_c_normed.device, dtype=kv_c_normed.dtype)
    attn_layer = forward_context.no_compile_layers[layer_name]
    kv_cache = attn_layer.kv_cache[forward_context.virtual_engine]
......
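
Note: the guard in this hunk can be illustrated with a minimal, framework-free sketch. Everything below is hypothetical and for illustration only: `FakeForwardContext` and `kv_cache_update` are stand-ins, not vLLM APIs, and a plain list takes the place of the real KV cache tensor. The sketch assumes, as the added comment in the diff suggests, that a missing `attn_metadata` marks a dummy/profile forward.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeForwardContext:
    # Hypothetical stand-in for the real forward context object.
    # attn_metadata is None on dummy/profile forwards (assumption from the diff).
    attn_metadata: Optional[dict] = None
    kv_cache: list = field(default_factory=list)


def kv_cache_update(ctx: FakeForwardContext, new_entry: str) -> bool:
    """Write new_entry into the cache unless this is a dummy forward.

    Returns True if the cache was written, False if the write was skipped.
    """
    if ctx.attn_metadata is None:
        # Early return: leave the live cache untouched.
        return False
    ctx.kv_cache.append(new_entry)
    return True


# A dummy forward (no metadata) and a real forward (metadata present).
dummy_ctx = FakeForwardContext(attn_metadata=None)
live_ctx = FakeForwardContext(attn_metadata={"num_tokens": 1})

dummy_written = kv_cache_update(dummy_ctx, "kv0")
live_written = kv_cache_update(live_ctx, "kv0")
```

In the actual change the early return still produces an empty tensor on the same device and dtype as `kv_c_normed`, presumably so that the caller keeps receiving the dummy tensor the docstring describes; the sketch returns a boolean instead only to stay dependency-free.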