Commit 02a43f82 authored by Michael Goin, committed by GitHub

Update default max_num_batch_tokens for chunked prefill to 2048 (#10544)

parent cfea9c04
@@ -1133,9 +1133,9 @@ class SchedulerConfig:
         # max_num_batched_tokens.
         self.max_num_batched_tokens = max(self.max_model_len, 2048)
     else:
-        # It is the values that have the best balance between ITL
-        # and TTFT on A100. Note it is not optimized for throughput.
-        self.max_num_batched_tokens = 512
+        # This value is chosen to have a balance between ITL
+        # and TTFT. Note it is not optimized for throughput.
+        self.max_num_batched_tokens = 2048
 else:
     # If max_model_len is too short, use 2048 as the default value
     # for higher throughput.
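
The hunk shows only part of the branch structure, so the following is a minimal sketch of how the default might be selected after this change, not the actual vLLM code. The function name and the `chunked_prefill` and `multi_step` flags are hypothetical stand-ins for conditions that fall outside the visible lines, and the final branch's `max(max_model_len, 2048)` is inferred from its comment rather than from a visible assignment.

```python
CHUNKED_PREFILL_DEFAULT = 2048  # new default from this commit (previously 512)


def default_max_num_batched_tokens(
    max_model_len: int,
    chunked_prefill: bool,
    multi_step: bool = False,
) -> int:
    """Illustrative helper mirroring the branches visible in the hunk.

    `chunked_prefill` and `multi_step` are assumed names for the
    conditions that the diff does not show.
    """
    if chunked_prefill:
        if multi_step:
            # Visible in the hunk: never go below max_model_len, floor at 2048.
            return max(max_model_len, 2048)
        # Changed line: a balance between ITL and TTFT,
        # not optimized for throughput.
        return CHUNKED_PREFILL_DEFAULT
    # Assumption based on the comment only: use 2048 when
    # max_model_len is too short, for higher throughput.
    return max(max_model_len, 2048)


if __name__ == "__main__":
    print(default_max_num_batched_tokens(4096, chunked_prefill=True))
    print(default_max_num_batched_tokens(4096, chunked_prefill=False))
```

Under these assumptions, the only behavior this commit changes is the single-step chunked prefill case, where the default batch budget rises from 512 to 2048 tokens regardless of `max_model_len`.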