Fix mini_lb for PD with long output: limit chunk size of decode response (#7301)

Signed-off-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>
Co-authored-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>

Commit 2ae809c5, authored Jun 19, 2025 by ch-tiger1, committed by GitHub Jun 18, 2025

Showing with 7 additions and 1 deletion

python/sglang/srt/disaggregation/mini_lb.py +7 -1

--- a/python/sglang/srt/disaggregation/mini_lb.py
+++ b/python/sglang/srt/disaggregation/mini_lb.py
@@ -18,6 +18,10 @@ from fastapi.responses import ORJSONResponse, Response, StreamingResponse

 from sglang.srt.disaggregation.utils import PDRegistryRequest

+AIOHTTP_STREAM_READ_CHUNK_SIZE = (
+    1024 * 64
+)  # 64KB, to prevent aiohttp's "Chunk too big" error
+

 def setup_logger():
    logger = logging.getLogger("pdlb")
@@ -154,7 +158,9 @@ class MiniLoadBalancer:
                        else:
                            yield chunk
                else:
-                    async for chunk in decode_response.content:
+                    async for chunk in decode_response.content.iter_chunked(
+                        AIOHTTP_STREAM_READ_CHUNK_SIZE
+                    ):
                        yield chunk

        return StreamingResponse(
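
The change above swaps line-oriented iteration over the decode response for fixed-size chunked reads. The sketch below illustrates why that matters, using the standard library's asyncio.StreamReader as a stand-in for aiohttp's stream reader (an assumption made so the example runs without a server; the two readers differ in details, and the helper names here are hypothetical). A line-oriented read fails once a single line exceeds the reader's buffer limit, while a read bounded by a chunk size keeps working no matter how long the line is.

```python
import asyncio

# Same value as AIOHTTP_STREAM_READ_CHUNK_SIZE in the diff (64KB).
READ_CHUNK_SIZE = 64 * 1024


def make_reader(payload, limit):
    """Build an in-memory reader holding `payload`. Call from inside a coroutine."""
    reader = asyncio.StreamReader(limit=limit)
    reader.feed_data(payload)
    reader.feed_eof()
    return reader


async def read_by_line(payload, limit):
    """Line-oriented read: raises ValueError if one line exceeds `limit`."""
    reader = make_reader(payload, limit)
    lines = []
    while True:
        line = await reader.readline()
        if not line:
            break
        lines.append(line)
    return lines


async def read_chunked(payload, limit, chunk_size):
    """Chunked read: each piece is at most `chunk_size` bytes, regardless of newlines."""
    reader = make_reader(payload, limit)
    chunks = []
    while True:
        chunk = await reader.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


async def demo():
    # One very long "line", standing in for a long decode response without line breaks.
    payload = b"x" * (200 * 1024) + b"\n"
    try:
        await read_by_line(payload, limit=READ_CHUNK_SIZE)
        line_error = None
    except ValueError as exc:
        line_error = exc
    chunks = await read_chunked(
        payload, limit=READ_CHUNK_SIZE, chunk_size=READ_CHUNK_SIZE
    )
    return line_error, chunks
```

Running `asyncio.run(demo())` returns the error raised by the line-oriented read together with the list of chunks from the bounded read. In the patched code the same role is played by `decode_response.content.iter_chunked(AIOHTTP_STREAM_READ_CHUNK_SIZE)`, which yields pieces of at most the given size instead of splitting on newlines.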