Unverified commit 864de4be, authored by jthomson04, committed by GitHub

fix: Reduce max_num_tokens in wide_ep dsr1 prefill worker (#7133)


Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
parent aeaf45f5
@@ -26,7 +26,7 @@ pipeline_parallel_size: 1
 enable_attention_dp: true
 max_batch_size: 1
-max_num_tokens: 8192
+max_num_tokens: 4096
 max_seq_len: 8192
 kv_cache_config:
...