examples/tensorrt_llm/configs/disagg.yaml · 6630fa5c4423af6df9c346d4fba8c9f18eeb6f99 · OpenDAS / dynamo

fix: change the processor number to 5 to reduce the tokenization bottleneck (#865) · 6630fa5c

richardhuo-nv authored Apr 28, 2025

We observed a 40% performance drop relative to trtllm serve when benchmarking with isl=1000 and osl=200 (input and output sequence lengths) at concurrency levels above 128.

The number of tokenization workers was the bottleneck. After increasing the number of tokenization processors to 5, dynamo's benchmark performance matched that of trtllm serve.
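
As a rough illustration of why a small tokenization pool can cap throughput, here is a back-of-the-envelope sizing sketch in Python. It is not taken from dynamo: the function names and every number in the usage lines are hypothetical, and it assumes each request occupies one worker for a fixed tokenization time.

```python
import math


def min_tokenizer_workers(requests_per_sec: float, tokenize_sec_per_request: float) -> int:
    """Smallest worker count whose combined capacity covers the offered load.

    Hypothetical helper: assumes each request holds one worker for a fixed time.
    """
    if requests_per_sec < 0 or tokenize_sec_per_request < 0:
        raise ValueError("inputs must be non-negative")
    # Offered load = worker-seconds of tokenization demanded per wall-clock second.
    offered_load = requests_per_sec * tokenize_sec_per_request
    return max(1, math.ceil(offered_load))


def max_requests_per_sec(workers: int, tokenize_sec_per_request: float) -> float:
    """Upper bound on request rate the tokenization stage can sustain."""
    if workers < 1 or tokenize_sec_per_request <= 0:
        raise ValueError("need at least one worker and a positive service time")
    return workers / tokenize_sec_per_request


# Illustrative numbers only, not measurements from this commit.
needed = min_tokenizer_workers(requests_per_sec=100, tokenize_sec_per_request=0.045)
ceiling_one_worker = max_requests_per_sec(workers=1, tokenize_sec_per_request=0.045)
```

Under these made-up inputs a single worker saturates well below the offered request rate, so the stages behind it sit idle. Adding workers raises the ceiling until tokenization is no longer the limiting stage.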


disagg.yaml 1.49 KB
