fix(sglang): use incremental_streaming_output instead of deprecated stream_output (#7642)

Signed-off-by: Yangmin Li <yangminl@nvidia.com>

Commit 8fe2082c, authored Apr 01, 2026 by YAMY, committed by GitHub Apr 01, 2026 (signature unverified)

Showing with 6 additions and 4 deletions

components/src/dynamo/sglang/args.py +6 -4

--- a/components/src/dynamo/sglang/args.py
+++ b/components/src/dynamo/sglang/args.py
@@ -374,9 +374,11 @@ async def parse_args(args: list[str]) -> Config:
        )

    # Dynamo's streaming handlers expect disjoint output_ids from SGLang (only new
-    # tokens since last output), not cumulative tokens. When stream_output=True,
-    # SGLang sends disjoint segments which Dynamo passes through directly.
-    # Force stream_output=True for optimal streaming performance.
+    # tokens since last output), not cumulative tokens.
+    # sglang renamed stream_output -> incremental_streaming_output in PR #20614.
+    if hasattr(ServerArgs, "incremental_streaming_output"):
+        server_args.incremental_streaming_output = True
+    else:
        server_args.stream_output = True

    if dynamo_config.use_sglang_tokenizer:
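
The version check in this hunk can be tried in isolation. The following is a minimal sketch, not the code from args.py: `OldServerArgs`, `NewServerArgs`, and `force_disjoint_streaming` are hypothetical stand-ins introduced for illustration, and it assumes the flag is a dataclass field with a plain default, so that `hasattr` on the class (as in the diff) can see it.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for sglang's ServerArgs before and after the rename.
# The real class has many more fields; only the streaming flag is modeled here.
@dataclass
class OldServerArgs:
    stream_output: bool = False


@dataclass
class NewServerArgs:
    incremental_streaming_output: bool = False


def force_disjoint_streaming(server_args) -> str:
    """Set whichever streaming flag the class defines and return its name."""
    cls = type(server_args)
    # A dataclass field with a plain default is also a class attribute,
    # so checking the class works without inspecting the instance.
    if hasattr(cls, "incremental_streaming_output"):
        name = "incremental_streaming_output"
    else:
        # Fall back to the older attribute name.
        name = "stream_output"
    setattr(server_args, name, True)
    return name


old_args = OldServerArgs()
new_args = NewServerArgs()
print(force_disjoint_streaming(old_args), old_args.stream_output)
print(force_disjoint_streaming(new_args), new_args.incremental_streaming_output)
```

Checking for the new name first means the old attribute is only touched on versions that lack the new one, which mirrors the order of the branches in the hunk.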