[Bugfix][TPU] Use np array when updating cache slot_mapping (#17971)

Signed-off-by: Siyuan Liu <lsiyuan@google.com>

Commit 43078301, authored May 11, 2025 by Siyuan Liu; committed by GitHub May 12, 2025

Showing with 1 addition and 1 deletion

vllm/v1/worker/tpu_model_runner.py +1 -1

--- a/vllm/v1/worker/tpu_model_runner.py
+++ b/vllm/v1/worker/tpu_model_runner.py
@@ -531,7 +531,7 @@ class TPUModelRunner(LoRAModelRunnerMixin):
        np.add(block_numbers * self.block_size,
               block_offsets,
               out=self.input_batch.block_table.
-               slot_mapping_cpu[:total_num_scheduled_tokens])
+               slot_mapping_np[:total_num_scheduled_tokens])

        # Prepare the attention metadata.
        self.query_start_loc_np[0] = 0
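
Note on the hunk above: the changed line passes a slice of `slot_mapping_np` as the `out=` argument of `np.add`, which NumPy documents as taking an ndarray. The following is a minimal standalone sketch of that in-place pattern, not the runner's actual code. The buffer size, `block_size` value, and sample block numbers and offsets are made up for illustration, and treating `slot_mapping_np` as a preallocated NumPy buffer is an assumption based on its name.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
block_size = 16

# Preallocated slot-mapping buffer, larger than any single step needs.
slot_mapping_np = np.zeros(8, dtype=np.int64)

# One entry per scheduled token: the KV-cache block it lives in and its
# offset inside that block.
block_numbers = np.array([3, 3, 7], dtype=np.int64)
block_offsets = np.array([0, 1, 5], dtype=np.int64)
total_num_scheduled_tokens = block_numbers.shape[0]

# slot = block_number * block_size + block_offset, written in place into
# the leading slice of the buffer. The slice is a view, so the write lands
# in slot_mapping_np itself and no new output array is allocated.
np.add(block_numbers * block_size,
       block_offsets,
       out=slot_mapping_np[:total_num_scheduled_tokens])

print(slot_mapping_np)
```

Only the first `total_num_scheduled_tokens` entries are overwritten; the rest of the buffer keeps whatever it held before.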