fix(vllm): warn that stream interval is not respected for now (#4650)

Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>
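
The guard added by this commit can be exercised in isolation. The following is a minimal sketch, not the Dynamo code itself: `warn_if_stream_interval_ignored` is a hypothetical helper name, and `SimpleNamespace` objects stand in for `AsyncEngineArgs`. The `hasattr` check is assumed to be there so that engine-args objects lacking a `stream_interval` field do not raise.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("stream_interval_demo")


def warn_if_stream_interval_ignored(engine_args) -> bool:
    """Log a warning and return True when a non-default stream_interval is set.

    Hypothetical helper that mirrors the condition in the diff; the real
    check lives inline in parse_args().
    """
    # Assumption: hasattr() guards against engine-args objects that have no
    # stream_interval attribute at all.
    if hasattr(engine_args, "stream_interval") and engine_args.stream_interval != 1:
        logger.warning(
            "--stream-interval is currently not respected in Dynamo. "
            "Dynamo uses its own post-processing implementation on the frontend, "
            "bypassing vLLM's OutputProcessor buffering."
        )
        return True
    return False


# Stand-ins for AsyncEngineArgs, for illustration only.
default_args = SimpleNamespace(stream_interval=1)  # default value: no warning
custom_args = SimpleNamespace(stream_interval=4)   # non-default: warning
legacy_args = SimpleNamespace()                    # attribute missing: no warning

for candidate in (default_args, custom_args, legacy_args):
    warn_if_stream_interval_ignored(candidate)
```

Only the second object triggers the warning; the default value and the missing attribute both pass silently.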

Commit bbeb2808, authored Dec 15, 2025 by Alec, committed by GitHub Dec 15, 2025
Showing with 6 additions and 0 deletions

components/src/dynamo/vllm/args.py +6 -0

--- a/components/src/dynamo/vllm/args.py
+++ b/components/src/dynamo/vllm/args.py
@@ -225,6 +225,12 @@ def parse_args() -> Config:
    args.enable_local_indexer = str(args.enable_local_indexer).lower() == "true"
    engine_args = AsyncEngineArgs.from_cli_args(args)
+    if hasattr(engine_args, "stream_interval") and engine_args.stream_interval != 1:
+        logger.warning(
+            "--stream-interval is currently not respected in Dynamo. "
+            "Dynamo uses its own post-processing implementation on the frontend, "
+            "bypassing vLLM's OutputProcessor buffering. "
+        )
    # Workaround for vLLM GIL contention bug with NIXL connector when using UniProcExecutor.
    # With TP=1, vLLM defaults to UniProcExecutor which runs scheduler and worker in the same
    # process. This causes a hot loop in _process_engine_step that doesn't release the GIL,