# Environment Setup
1. Pull the image, create a container, and install the basic dependency packages
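A minimal sketch of this step, assuming a Docker-based workflow; the image name, device flags (which assume a ROCm/DCU-style environment, suggested by the hipprof reference below), and dependency list are placeholders to be replaced with the project's actual values.

```bash
# Hypothetical example only: substitute the real image, tag, and dependencies.
docker pull <registry>/<vllm-image>:<tag>
docker run -it --name vllm-test --shm-size=16g \
    --device=/dev/kfd --device=/dev/dri \
    -v /path/to/models:/models \
    <registry>/<vllm-image>:<tag> /bin/bash
# Inside the container, install the basic dependency packages
pip install -r requirements.txt
```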
# 0.7.2
1. Offline inference
benchmark_throughput_0.7.2.py
The following script can be used to avoid repeatedly reloading the model when running inference with different parameter sets.
batch, prompt_tokens, and completion_tokens can be passed as space-separated strings (see the example invocation after the table below).
bs input output
2 16 128
2 64 256
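A hypothetical invocation of benchmark_throughput_0.7.2.py covering the two configurations above; the flag names below are assumptions (check the script's --help for the real ones), and only the space-separated-string convention and --output-json come from this README.

```bash
# Hypothetical flags; verify against benchmark_throughput_0.7.2.py --help
python benchmark_throughput_0.7.2.py \
    --model $MODEL_PATH \
    --batch "2 2" \
    --prompt-tokens "16 64" \
    --completion-tokens "128 256" \
    --output-json ./test_0.5B-0.7.2.txt
```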
The inference results are summarized in the --output-json file ./test_0.5B-0.7.2.txt, for example:
```bash
bs_in_out,elapsed_time,Throughput,total_tokens,output_tokens,ttft_mean,ttft_median,ttft_p99,tpop_mean,tpop_median,tpop_p99,output_token_throughput_mean,output_token_throughput_median,output_token_throughput_p99,inout_token_throughput_mean,inout_token_throughput_median,inout_token_throughput_p99
2_16_128,3.62,0.55,79.56,70.72,0.04829,0.04829,0.04893,0.028,0.02801,0.02801,35.51,35.51,35.51,39.94,39.94,39.95
2_64_256,7.31,0.27,87.55,70.04,0.04697,0.04697,0.04764,0.0284,0.02836,0.02836,35.17,35.17,35.18,43.97,43.97,43.97
```
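Since the result file is plain CSV, it can be inspected directly from the shell; the commands below are just one convenient way to read it.

```bash
# Render the CSV as an aligned table
column -s, -t < ./test_0.5B-0.7.2.txt
# Print the batch config (column 1) and mean output-token throughput (column 12)
awk -F, '{print $1, $12}' ./test_0.5B-0.7.2.txt
```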
2. Server inference
benchmark_servein_0.7.2.py
backend_request_func.py
This approach avoids the problem of the actual generated length not matching the specified output length during server inference.
```bash
# Test with the provided scripts

# Start the server
vllm serve $MODEL_PATH --trust-remote-code --dtype $dtype --max-model-len $max_len -tp $tp --gpu-memory-utilization 0.97
# Add other arguments, such as --distributed-executor-backend ray, as needed

# Send requests
# The usage is the same as usual; just add --ignore-eos
python benchmark_servein_0.7.2.py --backend vllm --ignore-eos --dataset-name random --random-input-len $input_len --random-output-len $output_len --model $MODEL_PATH --num-prompts $num_prompts --endpoint /v1/completions
```
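The $MODEL_PATH, $dtype, $max_len, $tp, $input_len, $output_len, and $num_prompts variables above are placeholders; the values below are illustrative only and should be adjusted to the model and hardware under test.

```bash
# Example values only
export MODEL_PATH=/models/Qwen2.5-0.5B-Instruct   # any local model path
export dtype=float16
export max_len=4096
export tp=1
export input_len=64
export output_len=256
export num_prompts=2
```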
prof
offline_prof
hipprof
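offline_prof and hipprof are listed here without further detail; as one hedged example, hipprof can typically wrap a normal Python run of the offline benchmark, but the exact options depend on the installed toolkit version, so verify them locally.

```bash
# Hypothetical profiling run; check hipprof --help in the local toolkit first
hipprof python benchmark_throughput_0.7.2.py --model $MODEL_PATH
```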