Unverified commit 8fe2082c authored by YAMY, committed by GitHub

fix(sglang): use incremental_streaming_output instead of deprecated stream_output (#7642)


Signed-off-by: Yangmin Li <yangminl@nvidia.com>
parent ab5a31b5
@@ -374,9 +374,11 @@ async def parse_args(args: list[str]) -> Config:
)
# Dynamo's streaming handlers expect disjoint output_ids from SGLang (only new
# tokens since last output), not cumulative tokens. When stream_output=True,
# SGLang sends disjoint segments which Dynamo passes through directly.
# Force stream_output=True for optimal streaming performance.
# tokens since last output), not cumulative tokens.
# sglang renamed stream_output -> incremental_streaming_output in PR #20614.
if hasattr(ServerArgs, "incremental_streaming_output"):
server_args.incremental_streaming_output = True
else:
server_args.stream_output = True
if dynamo_config.use_sglang_tokenizer:
......
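The hunk does two things. It relies on SGLang emitting disjoint `output_ids`, meaning only the tokens produced since the previous chunk and not the whole sequence so far. It also probes `ServerArgs` with `hasattr` so the same launcher works on SGLang builds from before and after the flag was renamed. The sketch below is a minimal, hypothetical illustration of both ideas. `OldServerArgs`, `NewServerArgs`, `enable_disjoint_streaming` and `to_disjoint` are made-up names and are not the real SGLang or Dynamo APIs. The stand-ins are dataclasses whose fields have defaults, so each field is also visible as a class attribute, which is what a class-level `hasattr` check sees.

```python
from dataclasses import dataclass


@dataclass
class OldServerArgs:
    # Hypothetical stand-in for ServerArgs before the rename.
    stream_output: bool = False


@dataclass
class NewServerArgs:
    # Hypothetical stand-in for ServerArgs after the rename.
    incremental_streaming_output: bool = False


def enable_disjoint_streaming(server_args_cls, server_args) -> str:
    """Enable disjoint streaming under whichever flag name the class defines.

    Mirrors the feature-detection pattern in the diff: probe the class for
    the new name and fall back to the old one. Returns the name that was set.
    """
    if hasattr(server_args_cls, "incremental_streaming_output"):
        name = "incremental_streaming_output"
    else:
        name = "stream_output"
    setattr(server_args, name, True)
    return name


def to_disjoint(cumulative_chunks):
    """Turn cumulative token lists into disjoint deltas (new tokens only)."""
    seen = 0
    for chunk in cumulative_chunks:
        yield chunk[seen:]  # only tokens not emitted in an earlier chunk
        seen = len(chunk)


if __name__ == "__main__":
    args = NewServerArgs()
    print(enable_disjoint_streaming(NewServerArgs, args))  # incremental_streaming_output
    print(list(to_disjoint([[1], [1, 2], [1, 2, 3]])))     # [[1], [2], [3]]
```

Probing for the attribute, rather than pinning an SGLang version, keeps a single code path working across releases. Picking the right name matters. On an ordinary Python class without `__slots__`, assigning the old name on a newer build would just add an unused attribute and would not turn the feature on.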