Unverified commit 8fe2082c, authored by YAMY, committed by GitHub

fix(sglang): use incremental_streaming_output instead of deprecated stream_output (#7642)


Signed-off-by: Yangmin Li <yangminl@nvidia.com>
parent ab5a31b5
@@ -374,9 +374,11 @@ async def parse_args(args: list[str]) -> Config:
     )
     # Dynamo's streaming handlers expect disjoint output_ids from SGLang (only new
-    # tokens since last output), not cumulative tokens. When stream_output=True,
-    # SGLang sends disjoint segments which Dynamo passes through directly.
-    # Force stream_output=True for optimal streaming performance.
-    server_args.stream_output = True
+    # tokens since last output), not cumulative tokens.
+    # sglang renamed stream_output -> incremental_streaming_output in PR #20614.
+    if hasattr(ServerArgs, "incremental_streaming_output"):
+        server_args.incremental_streaming_output = True
+    else:
+        server_args.stream_output = True
     if dynamo_config.use_sglang_tokenizer:
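The change above picks the flag name by feature detection rather than by checking the sglang version. A minimal, self-contained sketch of that pattern follows. It does not import sglang: `OldServerArgs`, `NewServerArgs`, and `enable_disjoint_streaming` are hypothetical stand-ins introduced only for illustration, and the real `ServerArgs` has many more fields.

```python
from dataclasses import dataclass


@dataclass
class OldServerArgs:
    """Hypothetical stand-in for a release that only has stream_output."""

    stream_output: bool = False


@dataclass
class NewServerArgs:
    """Hypothetical stand-in for a release after the rename."""

    incremental_streaming_output: bool = False


def enable_disjoint_streaming(server_args) -> str:
    """Set whichever streaming flag the class defines and return its name."""
    # Checking the class mirrors hasattr(ServerArgs, ...) in the diff. It works
    # here because a dataclass field with a plain default stays a class attribute.
    if hasattr(type(server_args), "incremental_streaming_output"):
        server_args.incremental_streaming_output = True
        return "incremental_streaming_output"
    # Fall back to the older name when the new field is absent.
    server_args.stream_output = True
    return "stream_output"
```

Calling `enable_disjoint_streaming(OldServerArgs())` sets the old flag, and calling it with `NewServerArgs()` sets the new one, so the same caller works on both sides of the rename. One caveat with class-level `hasattr` on a dataclass: a field declared without a default, or with `default_factory`, is not a class attribute, so the check would need an instance in that case.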