[BugFix] Spec decode with VLLM_ENABLE_V1_MULTIPROCESSING=0 (#30319)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

Commit 24b65eff (unverified), authored Dec 18, 2025 by Chen Zhang; committed by GitHub Dec 18, 2025

Showing with 2 additions and 1 deletion

vllm/v1/engine/core_client.py +2 -1

--- a/vllm/v1/engine/core_client.py
+++ b/vllm/v1/engine/core_client.py
@@ -268,7 +268,8 @@ class InprocClient(EngineCoreClient):
        self.engine_core = EngineCore(*args, **kwargs)

    def get_output(self) -> EngineCoreOutputs:
-        outputs, _ = self.engine_core.step_fn()
+        outputs, model_executed = self.engine_core.step_fn()
+        self.engine_core.post_step(model_executed=model_executed)
        return outputs and outputs.get(0) or EngineCoreOutputs()

    def get_supported_tasks(self) -> tuple[SupportedTask, ...]:
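
To illustrate why the discarded second return value matters, here is a minimal toy sketch. It is not vLLM code. ToyEngineCore, ToyInprocClient, pending_drafts, scheduled_drafts and the call_post_step flag are hypothetical names, and the sketch assumes that post_step is the point where draft tokens proposed during a model run are handed back for the next step. Under that assumption, skipping post_step in the in-process client means the drafts never reach the following step.

```python
class ToyEngineCore:
    """Hypothetical stand-in for EngineCore; not the real vLLM class."""

    def __init__(self):
        self.pending_drafts = None   # drafts proposed by the last model run
        self.scheduled_drafts = []   # drafts the next step will verify

    def step_fn(self):
        # Verify whatever drafts were handed over before this step.
        verified = list(self.scheduled_drafts)
        self.scheduled_drafts = []
        # Pretend the model ran and proposed new draft tokens.
        self.pending_drafts = [1, 2, 3]
        outputs = {0: {"verified_drafts": verified}}
        return outputs, True  # (outputs, model_executed)

    def post_step(self, model_executed):
        # Assumed behavior: hand drafts over only if the model actually ran.
        if model_executed and self.pending_drafts is not None:
            self.scheduled_drafts = self.pending_drafts
            self.pending_drafts = None


class ToyInprocClient:
    """Hypothetical in-process client mirroring the shape of get_output."""

    def __init__(self, call_post_step):
        self.engine_core = ToyEngineCore()
        self.call_post_step = call_post_step  # toggles old vs. new behavior

    def get_output(self):
        outputs, model_executed = self.engine_core.step_fn()
        if self.call_post_step:
            self.engine_core.post_step(model_executed=model_executed)
        return outputs and outputs.get(0) or {}


def run(call_post_step, steps=2):
    client = ToyInprocClient(call_post_step)
    return [client.get_output()["verified_drafts"] for _ in range(steps)]


old_behavior = run(call_post_step=False)  # drafts are never handed over
new_behavior = run(call_post_step=True)   # second step sees the drafts
```

In this toy, old_behavior verifies no drafts on either step, while new_behavior verifies the proposed drafts on the second step, which is the kind of difference the two-line change above is meant to address.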