Unverified commit a738dbb2, authored by QiliangCui, committed by GitHub

Update test case parameter to have the throughput above 8.0 (#19994)


Signed-off-by: Qiliang Cui <derrhein@gmail.com>
parent 33d5e29b
@@ -4,8 +4,8 @@ CONTAINER_NAME=vllm-tpu
 # vllm config
 MODEL=meta-llama/Llama-3.1-8B-Instruct
-MAX_NUM_SEQS=512
-MAX_NUM_BATCHED_TOKENS=512
+MAX_NUM_SEQS=256
+MAX_NUM_BATCHED_TOKENS=1024
 TENSOR_PARALLEL_SIZE=1
 MAX_MODEL_LEN=2048
 DOWNLOAD_DIR=/mnt/disks/persist
......
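For orientation, here is a minimal sketch of how values like these might be handed to a vLLM server launch. The script is hypothetical and not part of this commit. It assumes the variables map one-to-one onto the `vllm serve` options of the same name, and it only prints the resulting command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the real test harness that reads this env file is not shown
# in the diff. The values below mirror the updated side of the hunk.
set -euo pipefail

MODEL=meta-llama/Llama-3.1-8B-Instruct
MAX_NUM_SEQS=256
MAX_NUM_BATCHED_TOKENS=1024
TENSOR_PARALLEL_SIZE=1
MAX_MODEL_LEN=2048
DOWNLOAD_DIR=/mnt/disks/persist

# Build the command as an array so each argument stays a single word.
# Assumption: each variable maps onto the vLLM option of the same name.
cmd=(vllm serve "$MODEL"
  --max-num-seqs "$MAX_NUM_SEQS"
  --max-num-batched-tokens "$MAX_NUM_BATCHED_TOKENS"
  --tensor-parallel-size "$TENSOR_PARALLEL_SIZE"
  --max-model-len "$MAX_MODEL_LEN"
  --download-dir "$DOWNLOAD_DIR")

# Print the command instead of executing it; run "${cmd[@]}" to actually launch.
printf '%s ' "${cmd[@]}"
printf '\n'
```

Printing the command first makes it easy to confirm which values a run would use before starting a long benchmark.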