Commit d74278fb authored by Woosuk Kwon, committed by GitHub

[Model Runner V2] Fix unintended CPU-GPU sync in make_dummy (#34667)


Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
parent b68fd899
...
@@ -108,7 +108,7 @@ class InputBatch:
         query_start_loc_np = np.empty(num_reqs + 1, dtype=np.int32)
         query_start_loc_np[0] = 0
         np.cumsum(num_scheduled_tokens, out=query_start_loc_np[1:])
-        input_buffers.query_start_loc[0] = 0
+        input_buffers.query_start_loc[:1] = 0
         torch.cumsum(
             seq_lens, dim=0, out=input_buffers.query_start_loc[1 : num_reqs + 1]
         )
...
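
The changed line only swaps a scalar-index write (`[0] = 0`) for a one-element slice write (`[:1] = 0`). The following is a minimal sketch, not code from this commit, that checks the two forms leave the same values in the buffer. It uses NumPy on the CPU as a stand-in, so it says nothing about the CPU-GPU sync behavior the commit title refers to. The helper name `build_query_start_loc` and its `zero_with_slice` flag are hypothetical, introduced only for illustration.

```python
import numpy as np


def build_query_start_loc(num_scheduled_tokens, zero_with_slice):
    """Return prefix-sum offsets [0, n0, n0 + n1, ...] for a batch.

    Hypothetical helper: mirrors the NumPy lines in the hunk above,
    with a flag selecting which form writes the leading zero.
    """
    num_reqs = len(num_scheduled_tokens)
    query_start_loc = np.empty(num_reqs + 1, dtype=np.int32)
    if zero_with_slice:
        query_start_loc[:1] = 0  # slice form, as in the added line
    else:
        query_start_loc[0] = 0  # scalar-index form, as in the removed line
    # Write the running totals directly into the tail of the buffer.
    np.cumsum(num_scheduled_tokens, out=query_start_loc[1:])
    return query_start_loc


tokens = np.array([3, 1, 4], dtype=np.int32)
old_style = build_query_start_loc(tokens, zero_with_slice=False)
new_style = build_query_start_loc(tokens, zero_with_slice=True)
print(old_style.tolist(), new_style.tolist())
```

Both calls should print the same offsets, `[0, 3, 4, 8]`: entry `i` is the position where request `i` starts in the flattened token sequence, and the last entry is the total token count.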