fix: increase the tokenization processor count to 5 to reduce the tokenization bottleneck (#865)
We observed a 40% performance drop relative to trtllm serve when benchmarking with isl=1000 and osl=200 at concurrency levels above 128. The number of tokenization workers was the bottleneck. After increasing the number of tokenization processors to 5, dynamo's benchmark performance matches that of trtllm serve.
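As a rough illustration of why the worker count caps throughput at high concurrency, here is a toy model, not dynamo's actual code. It assumes every request arrives at once and that tokenization has a fixed per-request cost; the function name `makespan` and all parameter values are hypothetical.

```python
import heapq


def makespan(num_requests, num_workers, cost_per_request):
    """Time at which the last request finishes tokenization (toy model)."""
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0.0] * num_workers
    heapq.heapify(free_at)
    for _ in range(num_requests):
        # Hand the next request to whichever worker frees up first.
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost_per_request)
    return max(free_at)


# Hypothetical numbers: 256 concurrent requests, 0.5 time units each.
single = makespan(256, 1, 0.5)
five = makespan(256, 5, 0.5)
print(f"1 worker: {single}, 5 workers: {five}")
```

In this model the queueing delay shrinks roughly in proportion to the number of workers, which is consistent with the bottleneck easing once the count is raised.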