Commit 43078301 authored by Siyuan Liu, committed by GitHub (unverified)

[Bugfix][TPU] Use np array when updating cache slot_mapping (#17971)


Signed-off-by: Siyuan Liu <lsiyuan@google.com>
parent 19a3c78d
@@ -531,7 +531,7 @@ class TPUModelRunner(LoRAModelRunnerMixin):
         np.add(block_numbers * self.block_size,
                block_offsets,
                out=self.input_batch.block_table.
-               slot_mapping_cpu[:total_num_scheduled_tokens])
+               slot_mapping_np[:total_num_scheduled_tokens])
         # Prepare the attention metadata.
         self.query_start_loc_np[0] = 0
......
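For context, here is a minimal sketch of what the changed line computes. It assumes the usual paged KV-cache layout, in which a token's flat slot is `block_number * block_size + block_offset`, matching the `np.add` call in the diff. The helper `compute_slot_mapping` and the toy inputs are hypothetical. The sketch also assumes that `slot_mapping_np` is a NumPy view sharing memory with the CPU buffer `slot_mapping_cpu`; this diff does not show that relationship. The point illustrated is that the `out=` target of a NumPy ufunc is an ndarray, here a prefix view of a persistent buffer, so the result is written in place.

```python
import numpy as np


def compute_slot_mapping(block_table: np.ndarray,
                         req_indices: np.ndarray,
                         positions: np.ndarray,
                         block_size: int,
                         out: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map each scheduled token to a flat KV-cache slot.

    block_table: (num_reqs, max_blocks_per_req) physical block ids.
    req_indices: request index of every scheduled token.
    positions:   position of every token within its request.
    out:         preallocated NumPy array (or view) that receives the slots.
    """
    # Logical block index of each token -> physical block number.
    block_numbers = block_table[req_indices, positions // block_size]
    # Offset of each token inside its block.
    block_offsets = positions % block_size
    # slot = block_number * block_size + offset, written in place into `out`,
    # which is an ndarray view (not a torch tensor) in this sketch.
    np.add(block_numbers * block_size, block_offsets, out=out)
    return out


# Toy example: block_size 4, two requests with 6 and 2 scheduled tokens.
block_size = 4
block_table = np.array([[3, 7], [5, 0]], dtype=np.int64)
req_indices = np.array([0, 0, 0, 0, 0, 0, 1, 1])
positions = np.array([0, 1, 2, 3, 4, 5, 0, 1])
total_num_scheduled_tokens = len(positions)

# Persistent buffer sized for a maximum batch; only a prefix is filled.
slot_mapping_np = np.zeros(16, dtype=np.int64)
compute_slot_mapping(block_table, req_indices, positions, block_size,
                     out=slot_mapping_np[:total_num_scheduled_tokens])
print(slot_mapping_np[:total_num_scheduled_tokens])
# [12 13 14 15 28 29 20 21]
```

Because the slice is a view, the slots land directly in the persistent buffer with no extra copy. If that buffer shares memory with a CPU tensor, which is an assumption here, the tensor sees the same values.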