Qwen2-72B-Instruct-GPTQ-Int4 with benchmark_throughput: at fixed input and output lengths, throughput does not scale linearly as num-prompts increases (#4) · Issues · ModelZoo / Qwen1.5_vllm

Test command:
python3 benchmark_throughput.py --model $path -q gptq --tensor-parallel-size 4 --num-prompts $bs --input-len $input_len --output-len $output_len --trust-remote-code --enforce-eager --dtype float16

Result: with Batch Size (the --num-prompts value, $bs) below 32, Generate Throughput does not increase linearly; it stays essentially unchanged.
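To make the scaling claim easier to check, here is a minimal sketch of how the sweep could be scripted and how "linear scaling" could be quantified. It only reuses the flags from the command above. The helper names `build_command` and `scaling_efficiency`, the model path, the input/output lengths, and the list of batch sizes are all hypothetical placeholders, not values from this report.

```python
import shlex


def build_command(model_path, num_prompts, input_len, output_len):
    """Build the argv list for one run, using only the flags from the reported command."""
    return [
        "python3", "benchmark_throughput.py",
        "--model", model_path,
        "-q", "gptq",
        "--tensor-parallel-size", "4",
        "--num-prompts", str(num_prompts),
        "--input-len", str(input_len),
        "--output-len", str(output_len),
        "--trust-remote-code",
        "--enforce-eager",
        "--dtype", "float16",
    ]


def scaling_efficiency(throughput_by_bs):
    """Compare measured throughput with ideal linear scaling from the smallest batch.

    throughput_by_bs maps batch size -> measured generate throughput.
    A value of 1.0 means perfectly linear scaling; values near
    base_bs / bs mean throughput did not grow at all.
    """
    base_bs = min(throughput_by_bs)
    base_tp = throughput_by_bs[base_bs]
    return {
        bs: tp / (base_tp * bs / base_bs)
        for bs, tp in sorted(throughput_by_bs.items())
    }


if __name__ == "__main__":
    # Placeholder path, lengths, and batch sizes (assumptions, not from the report).
    for bs in (1, 2, 4, 8, 16, 32):
        print(shlex.join(build_command("/path/to/model", bs, 512, 512)))
```

Feeding the measured Generate Throughput for each batch size into `scaling_efficiency` would show how far each run falls from the linear ideal, which is the behaviour described above for Batch Size below 32.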