Unverified commit 43854732 authored by Neal Vaidya, committed by GitHub

fix: use default batch_size for decode (#2376)

parent 3b722842
@@ -17,7 +17,6 @@ disable_overlap_scheduler: false
 moe_config:
   backend: CUTLASS
 cuda_graph_config:
-  max_batch_size: 128
   enable_padding: true
 cache_transceiver_config:
   backend: ucx
@@ -203,7 +203,6 @@ CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m dynamo.trtllm \
   --disaggregation-mode decode \
   --disaggregation-strategy prefill_first \
   --max-num-tokens 16384 \
-  --max-batch-size 128 \
   --free-gpu-memory-fraction 0.9 \
   --tensor-parallel-size 4 \
   --expert-parallel-size 4
@@ -40,7 +40,6 @@ CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m dynamo.trtllm \
   --disaggregation-mode decode \
   --disaggregation-strategy "$DISAGGREGATION_STRATEGY" \
   --max-num-tokens 16384 \
-  --max-batch-size 128 \
   --free-gpu-memory-fraction 0.9 \
   --tensor-parallel-size 4 \
   --expert-parallel-size 4