[README] Update API service inference performance test

Commit a3afc415 authored Oct 15, 2024 by laibao

Showing with 16 additions and 0 deletions

README.md +16 -0

--- a/README.md
+++ b/README.md
@@ -123,6 +123,22 @@ python benchmarks/benchmark_throughput.py --num-prompts 1 --model Qwen/Qwen2.5-7

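 Here `--num-prompts` is the batch size, `--model` is the model path, `--dataset` is the dataset to use, `-tp` is the number of accelerator cards, and `dtype="float16"` is the inference data type; if the model weights are bfloat16, change the type to float16 for inference. `-q gptq` runs inference with a GPTQ-quantized model.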

+### OpenAI API service inference performance test
+
+1. Start the server:
+
+```bash
+python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-7B-instruct --enforce-eager --dtype float16 --trust-remote-code
+```
+
+2. Start the client:
+
+```bash
+python benchmarks/benchmark_serving.py --model Qwen/Qwen2.5-7B-instruct --dataset ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts 1 --trust-remote-code
+```
+
+The parameters are the same as in the dataset-based offline batch inference performance test; for details, see [benchmarks/benchmark_serving.py](/codes/modelzoo/qwen1.5_vllm/-/blob/master/benchmarks/benchmark_serving.py).
+
 ### OpenAI-compatible service

 Start the server: