[TPU][V1][Bugfix] Fix chunked prefill with padding (#15037)

Signed-off-by: NickLucche <nlucches@redhat.com>

Commit af35d3a3 (diff 3b457143..af35d3a3), authored Mar 18, 2025 by Nicolò Lucchesi, committed by GitHub Mar 18, 2025

Showing 1 changed file with 3 additions and 0 deletions

vllm/v1/worker/tpu_model_runner.py +3 -0

--- a/vllm/v1/worker/tpu_model_runner.py
+++ b/vllm/v1/worker/tpu_model_runner.py
@@ -410,6 +410,9 @@ class TPUModelRunner:
        # Do the padding and copy the tensors to the TPU.
        padded_total_num_scheduled_tokens = _get_padded_token_len(
            total_num_scheduled_tokens)
+        # Zero out to avoid spurious values from prev iteration (last cp chunk)
+        self.input_ids_cpu[
+            total_num_scheduled_tokens:padded_total_num_scheduled_tokens] = 0
        self.input_ids = self.input_ids_cpu[:
                                            padded_total_num_scheduled_tokens].to(
                                                self.device)
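
The effect of the added lines can be reproduced in a small, self-contained sketch. It is not the vLLM implementation: a plain Python list stands in for the `input_ids_cpu` tensor, `ToyRunner` and `run` are hypothetical names, and `padded_len` is an assumed stand-in for `_get_padded_token_len` (whose body is not shown in this diff) that rounds up to a power of two with a minimum of 16. The sketch shows how a persistent buffer that is reused across iterations can leak token IDs from a longer earlier step into the padded region of a shorter later step, and how zeroing the range `[total:padded]` avoids that.

```python
def padded_len(n: int, min_len: int = 16) -> int:
    """Assumed padding rule (hypothetical): next power of two, at least min_len."""
    if n <= min_len:
        return min_len
    return 1 << (n - 1).bit_length()


class ToyRunner:
    """Toy model of a runner that reuses one CPU-side input buffer."""

    def __init__(self, max_tokens: int = 64) -> None:
        # Persistent buffer, analogous to self.input_ids_cpu in the diff.
        self.input_ids_cpu = [0] * max_tokens

    def prepare(self, token_ids: list, zero_padding: bool) -> list:
        total = len(token_ids)
        padded = padded_len(total)
        # Write only the scheduled tokens; the rest of the buffer is untouched.
        self.input_ids_cpu[:total] = token_ids
        if zero_padding:
            # Mirrors the fix: clear the padded tail left by earlier iterations.
            self.input_ids_cpu[total:padded] = [0] * (padded - total)
        # Return the padded slice that would be copied to the device.
        return self.input_ids_cpu[:padded]


def run(zero_padding: bool) -> list:
    runner = ToyRunner()
    # Iteration 1: a 20-token chunk (padded to 32 under the assumed rule).
    runner.prepare(list(range(100, 120)), zero_padding)
    # Iteration 2: a 3-token chunk (padded to 16), reusing the same buffer.
    return runner.prepare([7, 8, 9], zero_padding)


stale = run(zero_padding=False)  # padded tail still holds IDs from iteration 1
clean = run(zero_padding=True)   # padded tail is all zeros
```

Without zeroing, positions 3 through 15 of the second batch still contain IDs written during the first iteration; with zeroing, they are 0. The diff itself does not state why stale padding values cause incorrect behavior downstream, only that they are spurious leftovers from the last chunked-prefill chunk.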