Commit c878b43b authored by Woosuk Kwon, committed by GitHub

[Model Runner V2] Remove unnecessary copies in PW CUDA graph capture (#34849)


Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
parent 2b84ac66
@@ -218,13 +218,11 @@ class CudaGraphManager:
                 batch_descriptor=batch_descriptor,
                 slot_mapping=slot_mappings,
             ):
-                hidden_states = model(
+                model(
                     input_ids=input_ids,
                     positions=positions,
                     inputs_embeds=inputs_embeds,
                 )
-                assert self.hidden_states is not None
-                self.hidden_states[:num_tokens] = hidden_states
 
     @torch.inference_mode()
     def capture(
......