# 0.7.2
1. Offline inference
benchmark_throughput_0.7.2.py
Using this script avoids repeatedly reloading the model when running inference with different parameter combinations.
batch, prompt_tokens, and completion_tokens can each be passed as a space-separated string, pairing up by position; see the invocation sketch after the table below.
bs input output
2 16 128
2 64 256
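A minimal invocation sketch matching the table above. The flag names --batch, --prompt-tokens, and --completion-tokens are assumptions for illustration only; check the script's actual argument names before use.

```bash
# Hypothetical flags for benchmark_throughput_0.7.2.py -- the real argparse
# option names may differ. Each list is one space-separated string, and the
# values pair up positionally: (2,16,128) and (2,64,256).
python benchmark_throughput_0.7.2.py \
    --model "$MODEL_PATH" \
    --batch "2 2" \
    --prompt-tokens "16 64" \
    --completion-tokens "128 256" \
    --output-json ./test_0.5B-0.7.2.txt
```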
The inference results are aggregated into the --output-json file ./test_0.5B-0.7.2.txt, for example:
```bash
bs_in_out,elapsed_time,Throughput,total_tokens,output_tokens,ttft_mean,ttft_median,ttft_p99,tpop_mean,tpop_median,tpop_p99,output_token_throughput_mean,output_token_throughput_median,output_token_throughput_p99,inout_token_throughput_mean,inout_token_throughput_median,inout_token_throughput_p99
2_16_128,3.62,0.55,79.56,70.72,0.04829,0.04829,0.04893,0.028,0.02801,0.02801,35.51,35.51,35.51,39.94,39.94,39.95
2_64_256,7.31,0.27,87.55,70.04,0.04697,0.04697,0.04764,0.0284,0.02836,0.02836,35.17,35.17,35.18,43.97,43.97,43.97
```
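The key columns can be pulled out of this summary with standard tools. A sketch against the header above; Throughput appears to be requests per second (2 requests / 3.62 s ≈ 0.55), and the ttft values are in seconds.

```bash
# Print per-configuration throughput and mean TTFT from the summary CSV.
# Column positions follow the header above: 1=bs_in_out, 3=Throughput, 6=ttft_mean.
awk -F, 'NR > 1 { printf "%s: throughput=%s req/s, ttft_mean=%s s\n", $1, $3, $6 }' \
    ./test_0.5B-0.7.2.txt
```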
2. Server inference
benchmark_servein_0.7.2.py
backend_request_func.py
This approach avoids the problem of the actual generated length not matching the specified length during server inference.
```bash
# Test using the provided scripts

# Start the server
# (add other options such as --distributed-executor-backend ray as needed)
vllm serve $MODEL_PATH --trust-remote-code --dtype $dtype --max-model-len $max_len -tp $tp --gpu-memory-utilization 0.97
```
Send requests the same way as usual; just add --ignore-eos:
```bash
python benchmark_servein_0.7.2.py --backend vllm --ignore-eos --dataset-name random --random-input-len $input_len --random-output-len $output_len --model $MODEL_PATH --num-prompts $num_prompts --endpoint /v1/completions
```
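Before launching the benchmark it can help to confirm the server is actually up. A small sketch assuming the default port 8000; vLLM's OpenAI-compatible server exposes /health and the /v1/completions endpoint used above.

```bash
# Wait until the vLLM server reports healthy, then send one smoke-test request.
# Port 8000 is vLLM's default; adjust if the server listens elsewhere.
until curl -sf http://localhost:8000/health > /dev/null; do
    sleep 1
done
curl -s http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$MODEL_PATH\", \"prompt\": \"hello\", \"max_tokens\": 8}"
```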
prof
offline_prof
hipprof
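For the profiling entries above, a hedged sketch of wrapping the offline benchmark with hipprof, assuming it acts as a command wrapper like most HIP profilers; verify the actual interface with `hipprof --help` on your toolkit installation.

```bash
# Assumption: hipprof wraps the target command (typical HIP/ROCm profiler style).
# The --model flag is the hypothetical one from the offline example above.
hipprof python benchmark_throughput_0.7.2.py --model "$MODEL_PATH"
```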