Update README.md

36816c8e · jerrrrry · 6cd6b13d · 36816c8e
Commit 36816c8e authored Jun 17, 2025 by jerrrrry

Showing with 58 additions and 0 deletions

README.md README.md +58 -0

--- a/README.md
+++ b/README.md
+# 0.8.5
+1. Offline inference: customize the parameters as needed
+benchmark_throughput_0.8.5.py
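+The script below avoids reloading the model for every parameter combination.
+batch, prompt_tokens, and completion_tokens can each be passed as a space-separated string.
+All other parameters are the same as in the standard script.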
+<pre>
+export HIP_VISIBLE_DEVICES=1
+tp=1
+model_path=/llm-models/qwen1.5/Qwen1.5-0.5B-Chat
+batch="1 2"
+prompt_tokens="16 64"
+completion_tokens="128 256"
+python benchmark_throughput_0.8.5.py --model ${model_path} --tensor-parallel-size ${tp} --num-prompts ${batch} --input-len ${prompt_tokens} --output-len ${completion_tokens} \
+    --dtype float16  --trust-remote-code --max-model-len 32768 --output-json ./test_0.5B-0.7.2.txt
+</pre>
+With the arguments above, the following scenarios are run:
+bs    input    output
+1      16        128
+1      64        256
+2      16        128
+2      64        256
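The scenario table above suggests that input and output lengths are paired by position and that every batch size runs every pair. The sketch below reproduces that expansion under this assumption; `expand_scenarios` is a hypothetical helper, not a function from `benchmark_throughput_0.8.5.py`, and the length check is an added safeguard rather than documented behavior.

```python
from itertools import product


def expand_scenarios(batch: str, prompt_tokens: str, completion_tokens: str):
    """Return (bs, input_len, output_len) tuples from space-separated strings.

    Hypothetical helper: the pairing rule is inferred from the scenario table,
    not taken from the benchmark script itself.
    """
    batches = [int(x) for x in batch.split()]
    inputs = [int(x) for x in prompt_tokens.split()]
    outputs = [int(x) for x in completion_tokens.split()]
    if len(inputs) != len(outputs):
        # Assumption: unequal lists are treated as an error in this sketch.
        raise ValueError("prompt_tokens and completion_tokens need the same number of entries")
    # Lengths are zipped by position; each batch size is combined with each pair.
    return [(bs, i, o) for bs, (i, o) in product(batches, zip(inputs, outputs))]


for bs, input_len, output_len in expand_scenarios("1 2", "16 64", "128 256"):
    print(bs, input_len, output_len)
```

With two batch sizes and two length pairs this yields four scenarios, in the same order as the table.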
+The results are collected in the file passed to --output-json (./test_0.5B-0.7.2.txt above). Example:
+bash
+bs_in_out,elapsed_time,Throughput,total_tokens,output_tokens,ttft_mean,ttft_median,ttft_p99,tpop_mean,tpop_median,tpop_p99,output_token_throughput_mean,output_token_throughput_median,output_token_throughput_p99,inout_token_throughput_mean,inout_token_throughput_median,inout_token_throughput_p99
+1_16_128,3.49,0.29,41.26,36.68,0.03801,0.03801,0.03801,0.0269,0.02691,0.02691,37.04,37.04,37.04,41.66,41.66,41.66
+1_64_256,7.14,0.14,44.82,35.85,0.0291,0.0291,0.0291,0.0278,0.02776,0.02776,36.01,36.01,36.01,45.01,45.01,45.01
+2_16_128,3.62,0.55,79.56,70.72,0.04829,0.04829,0.04893,0.028,0.02801,0.02801,35.51,35.51,35.51,39.94,39.94,39.95
+2_64_256,7.31,0.27,87.55,70.04,0.04697,0.04697,0.04764,0.0284,0.02836,0.02836,35.17,35.17,35.18,43.97,43.97,43.97
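The summary shown above is plain comma-separated text, so it can be post-processed with Python's standard `csv` module. The following is a minimal sketch, assuming the file keeps the header row shown above; `load_results` is a hypothetical helper, and `SAMPLE` is an excerpt of the example rows truncated to the first five columns for brevity.

```python
import csv
import io

# Excerpt of the example output above, truncated to the first five columns.
SAMPLE = """bs_in_out,elapsed_time,Throughput,total_tokens,output_tokens
1_16_128,3.49,0.29,41.26,36.68
2_64_256,7.31,0.27,87.55,70.04
"""


def load_results(fileobj):
    """Parse the summary into dicts, splitting bs_in_out into three integers."""
    rows = []
    for record in csv.DictReader(fileobj):
        bs, input_len, output_len = (int(x) for x in record.pop("bs_in_out").split("_"))
        row = {"bs": bs, "input_len": input_len, "output_len": output_len}
        # Every remaining column in the example is numeric.
        row.update({name: float(value) for name, value in record.items()})
        rows.append(row)
    return rows


for row in load_results(io.StringIO(SAMPLE)):
    print(row["bs"], row["input_len"], row["output_len"], row["output_tokens"])
```

To read a real run, pass an open file handle for the path given to --output-json in place of the `io.StringIO` wrapper.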
+2. Server inference
+First run bash server.sh; once the service is up, run bash test.sh. Adjust the test parameters as needed.
 # 0.7.2
 1. Offline inference
 benchmark_throughput_0.7.2.py