feat(sglang): enforce stream_output=True for optimal streaming performance (#5510)
This ensures that sglang returns only newly generated tokens, which avoids the overhead of copying the entire token sequence on every iteration. These copies can become a bottleneck, particularly at long sequence lengths and high concurrency.
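
The cost difference can be illustrated with a minimal sketch. It does not use sglang itself: the function names `stream_cumulative`, `stream_incremental`, and `tokens_copied` are hypothetical, and the generators only simulate the two output styles under the assumption that cumulative streaming copies the whole sequence at each step.

```python
def stream_cumulative(tokens):
    """Simulated cumulative streaming: yield everything generated so far."""
    generated = []
    for tok in tokens:
        generated.append(tok)
        # Copies the entire sequence on every step (hypothetical model).
        yield list(generated)


def stream_incremental(tokens):
    """Simulated incremental streaming: yield only the new token."""
    for tok in tokens:
        yield [tok]


def tokens_copied(stream):
    """Total number of tokens handed to the consumer across all steps."""
    return sum(len(chunk) for chunk in stream)


n = 1000
tokens = list(range(n))

cumulative = tokens_copied(stream_cumulative(tokens))    # n * (n + 1) / 2
incremental = tokens_copied(stream_incremental(tokens))  # n

print(f"cumulative:  {cumulative} tokens copied")
print(f"incremental: {incremental} tokens copied")
```

In this model the cumulative variant moves a number of tokens that grows quadratically with sequence length, while the incremental variant grows linearly, which is consistent with the bottleneck described above for long sequences.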
Signed-off-by: Matej Kosec <mkosec@nvidia.com>