[Bugfix] Fix V1 dummy run writing NaN to KV cache null block (#39444)

Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>

Commit ad720aef (unverified), authored Apr 10, 2026 by Elvir Crnčević, committed by GitHub Apr 10, 2026

Showing with 7 additions and 1 deletion

vllm/v1/worker/gpu_model_runner.py +7 -1
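
The diff makes two changes in the dummy-run path: it passes the padded token count to `_get_slot_mappings`, and it fills every slot mapping with -1 so that the cache-write kernels skip the write. The toy model below is only a sketch of why the second change matters, not vLLM code. `write_kv`, `fresh_cache`, `has_nan`, and `BLOCK_SIZE` are hypothetical names, and the "before" case assumes the stale mapping pointed at slot 0 in block 0, which this sketch treats as the null block.

```python
import math

BLOCK_SIZE = 4  # hypothetical block size for this toy cache


def fresh_cache(num_blocks=2):
    """Build a zero-filled toy cache: a list of blocks, each a list of slots."""
    return [[0.0] * BLOCK_SIZE for _ in range(num_blocks)]


def write_kv(cache, values, slot_mapping):
    """Toy stand-in for a KV-cache write kernel that skips negative slots."""
    for value, slot in zip(values, slot_mapping):
        if slot < 0:
            continue  # no real slot assigned: skip the write
        block, offset = divmod(slot, BLOCK_SIZE)
        cache[block][offset] = value


def has_nan(block):
    """Return True if any slot in the block holds NaN."""
    return any(math.isnan(v) for v in block)


# Assumption: a dummy run can produce garbage (here NaN) key/value data.
dummy_values = [float("nan")] * 3

# Assumed "before" behaviour: every dummy token maps to slot 0 of block 0.
cache_before = fresh_cache()
write_kv(cache_before, dummy_values, [0, 0, 0])

# "After" behaviour: the mapping is filled with -1, so nothing is written.
cache_after = fresh_cache()
write_kv(cache_after, dummy_values, [-1, -1, -1])

print("block 0 poisoned before:", has_nan(cache_before[0]))
print("block 0 poisoned after: ", has_nan(cache_after[0]))
```

In this model the -1 sentinel leaves the cache untouched regardless of what the dummy forward pass computes, which is the property the new `sm.fill_(-1)` loop relies on.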

--- a/vllm/v1/worker/gpu_model_runner.py
+++ b/vllm/v1/worker/gpu_model_runner.py
@@ -5356,12 +5356,18 @@ class GPUModelRunner(
        attn_metadata: PerLayerAttnMetadata | None = None
        slot_mappings_by_group, slot_mappings = self._get_slot_mappings(
-            num_tokens_padded=num_tokens,
+            num_tokens_padded=num_tokens_padded,
            num_reqs_padded=num_reqs_padded,
            num_tokens_unpadded=num_tokens_unpadded,
            ubatch_slices=ubatch_slices_padded,
        )
+        # Dummy runs have no real slot assignments — fill with -1 so
+        # concat_and_cache kernels skip the KV write.
+        if slot_mappings_by_group is not None:
+            for sm in slot_mappings_by_group.values():
+                sm.fill_(-1)
        # _dummy_run shares pinned CPU buffers (seq_lens, query_start_loc,
        # etc.) with execute_model.  It must participate in the same event
        # protocol so that back-to-back dummy/real steps don't overwrite