[perf] Use CPU tensor to reduce GPU->CPU sync (#25884)

Signed-off-by: Lehua Ding <lehuading@tencent.com>

Commit e184c9c5 (unverified), authored Sep 30, 2025 by Lehua Ding, committed by GitHub Sep 30, 2025

Showing 1 changed file with 1 addition and 1 deletion

vllm/v1/worker/gpu_model_runner.py +1 -1

--- a/vllm/v1/worker/gpu_model_runner.py
+++ b/vllm/v1/worker/gpu_model_runner.py
@@ -2478,7 +2478,7 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
            effective_drafter_max_model_len = (
                self.speculative_config.draft_model_config.max_model_len)
        input_fits_in_drafter = spec_decode_common_attn_metadata and (
-            spec_decode_common_attn_metadata.seq_lens.max() +
+            spec_decode_common_attn_metadata.max_seq_len +
            self.speculative_config.num_speculative_tokens
            <= effective_drafter_max_model_len)
        if use_padded_batch_for_eagle and input_fits_in_drafter:
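
Note on the change: the old expression reduces `seq_lens` with `.max()`, and when that tensor lives on the GPU, evaluating the result in the following `if` forces a device-to-host synchronization. Reading a maximum that is already available on the host avoids that. The sketch below is a minimal pure-Python illustration of the new check, not the vLLM implementation: `AttnMetadataSketch` and the standalone `input_fits_in_drafter` function are hypothetical stand-ins, and it assumes `max_seq_len` is a plain host-side integer kept in step with `seq_lens`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttnMetadataSketch:
    """Hypothetical stand-in for spec_decode_common_attn_metadata."""
    seq_lens: List[int]  # stands in for the (possibly GPU-resident) seq_lens tensor
    max_seq_len: int     # assumption: maximum already tracked on the host


def input_fits_in_drafter(meta: Optional[AttnMetadataSketch],
                          num_speculative_tokens: int,
                          drafter_max_model_len: int) -> bool:
    """Mirror the condition in the diff using only host-side integers."""
    if meta is None:
        return False
    # No reduction over seq_lens here, so nothing has to be copied
    # back from the device before the comparison can be evaluated.
    return (meta.max_seq_len + num_speculative_tokens
            <= drafter_max_model_len)


meta = AttnMetadataSketch(seq_lens=[5, 12, 9], max_seq_len=12)
print(input_fits_in_drafter(meta, 3, 16))
print(input_fits_in_drafter(meta, 5, 16))
```

The check is only equivalent to the old one if the host-side maximum always matches the largest entry of `seq_lens`; the sketch takes that for granted.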