Commit 19748806, authored by Benjamin Chislett, committed by GitHub

[Bugfix] skip cuda graph for drafter when running with eager (#26821)


Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
parent 4a8a567e
@@ -3482,7 +3482,10 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
     if self.speculative_config and self.speculative_config.use_eagle():
         assert isinstance(self.drafter, EagleProposer)
-        use_cudagraphs = cudagraph_runtime_mode == CUDAGraphMode.PIECEWISE
+        use_cudagraphs = (
+            cudagraph_runtime_mode == CUDAGraphMode.PIECEWISE
+            and not self.speculative_config.enforce_eager
+        )
         self.drafter.dummy_run(num_tokens, use_cudagraphs=use_cudagraphs)
     # This is necessary to avoid blocking DP.
......
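The gating condition introduced by this change can be illustrated in isolation. The following is a minimal sketch, not the project's code: `CUDAGraphMode` and `SpeculativeConfig` here are simplified stand-ins defined only for this example (the enum values are arbitrary), and `drafter_uses_cudagraphs` is a hypothetical helper name. It shows that the drafter's dummy run would use CUDA graphs only when the runtime mode is `PIECEWISE` and the speculative config does not enforce eager execution.

```python
from dataclasses import dataclass
from enum import Enum


class CUDAGraphMode(Enum):
    # Stand-in enum for illustration; values are arbitrary assumptions.
    NONE = 0
    PIECEWISE = 1
    FULL = 2


@dataclass
class SpeculativeConfig:
    # Hypothetical stand-in holding only the flag read by the patched condition.
    enforce_eager: bool = False


def drafter_uses_cudagraphs(mode: CUDAGraphMode, spec_config: SpeculativeConfig) -> bool:
    """Mirror the patched condition: PIECEWISE mode and eager not enforced."""
    return mode == CUDAGraphMode.PIECEWISE and not spec_config.enforce_eager


if __name__ == "__main__":
    # Print the decision for every combination of mode and eager flag.
    for mode in CUDAGraphMode:
        for eager in (False, True):
            decision = drafter_uses_cudagraphs(mode, SpeculativeConfig(enforce_eager=eager))
            print(f"{mode.name:9} enforce_eager={eager!s:5} -> {decision}")
```

Under the pre-patch condition, the `PIECEWISE` with `enforce_eager=True` case would have evaluated to true; the added check is what turns it off.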