[Perf][Async Scheduling] Remove CPU->GPU sync in dummy_run (#27455)

Signed-off-by: Lehua Ding <lehuading@tencent.com>

Commit 04024282 (unverified), authored Oct 25, 2025 by Lehua Ding; committed by GitHub Oct 24, 2025

Showing with 4 additions and 1 deletion

vllm/v1/worker/gpu_model_runner.py +4 -1

--- a/vllm/v1/worker/gpu_model_runner.py
+++ b/vllm/v1/worker/gpu_model_runner.py
@@ -3492,7 +3492,10 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
            self.eplb_step(is_dummy=True, is_profile=is_profile)
        logit_indices = np.cumsum(num_scheduled_tokens) - 1
-        return hidden_states, hidden_states[logit_indices]
+        logit_indices_device = torch.from_numpy(logit_indices).to(
+            self.device, non_blocking=True
+        )
+        return hidden_states, hidden_states[logit_indices_device]
    @torch.inference_mode()
    def _dummy_sampler_run(
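
The change above can be illustrated outside vLLM with a minimal sketch. It assumes that `num_scheduled_tokens` is a NumPy array of per-request token counts and that `hidden_states` has one row per scheduled token; the sizes, the `device` selection, and the name `last_token_states` are hypothetical stand-ins, not taken from `GPUModelRunner`. The sketch shows that `np.cumsum(...) - 1` yields the index of each request's last token, and that the index array is moved to the device explicitly with `non_blocking=True` before it is used for indexing, rather than indexing the tensor with the NumPy array directly.

```python
import numpy as np
import torch

# Hypothetical setup: falls back to CPU so the sketch runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assumed shape of the inputs: three requests with 3, 1 and 4 tokens each.
num_scheduled_tokens = np.array([3, 1, 4], dtype=np.int64)
hidden_size = 8  # illustrative value only
total_tokens = int(num_scheduled_tokens.sum())

# One row of hidden state per scheduled token.
hidden_states = torch.arange(
    total_tokens * hidden_size, dtype=torch.float32, device=device
).reshape(total_tokens, hidden_size)

# Index of the last token of each request: cumsum gives [3, 4, 8],
# so subtracting 1 gives [2, 3, 7].
logit_indices = np.cumsum(num_scheduled_tokens) - 1

# As in the diff: build the index tensor on the host, then request a
# non-blocking copy to the device before indexing.
logit_indices_device = torch.from_numpy(logit_indices).to(
    device, non_blocking=True
)
last_token_states = hidden_states[logit_indices_device]

print(logit_indices.tolist())
print(tuple(last_token_states.shape))
```

On a CPU-only machine the copy is a no-op and the two indexing forms behave identically; the difference the commit title refers to only matters when `hidden_states` lives on a GPU.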