Commit c878b43b authored by Woosuk Kwon, committed by GitHub

[Model Runner V2] Remove unnecessary copies in PW CUDA graph capture (#34849)


Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
parent 2b84ac66
@@ -218,13 +218,11 @@ class CudaGraphManager:
             batch_descriptor=batch_descriptor,
             slot_mapping=slot_mappings,
         ):
-            hidden_states = model(
+            model(
                 input_ids=input_ids,
                 positions=positions,
                 inputs_embeds=inputs_embeds,
             )
-            assert self.hidden_states is not None
-            self.hidden_states[:num_tokens] = hidden_states
     @torch.inference_mode()
     def capture(
......