update readme

b51171e8 · zhuwenwen · 2b338d49 · b51171e8
Commit b51171e8 authored Jun 06, 2024 by zhuwenwen

Showing with 4 additions and 4 deletions

README.md +4 -4

--- a/README.md
+++ b/README.md
@@ -59,7 +59,7 @@ pip install ray==2.9.1 tiktoken aiohttp==3.9.1 outlines==0.0.37 openai==1.23.3
 * flash_attn: 2.0.4
 * python: python3.10
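-`Tips: the versions of the DCU-related tools listed above (dtk driver, python, torch, etc.) must correspond to one another exactly.`
+`Tips: the versions of the DCU-related tools listed above (dtk driver, python, torch, etc.) must correspond to one another exactly. Currently this can only be used on K100_AI.`
 ## Dataset
 None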
@@ -102,7 +102,7 @@ python vllm/examples/offline_inference.py
 ```bash
 python vllm/benchmarks/benchmark_throughput.py --num-prompts 1 --input-len 32 --output-len 128 --model Qwen/Qwen1.5-7B-Chat -tp 1 --trust-remote-code --enforce-eager --dtype float16
 ```
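-Here `--num-prompts` is the batch size, `--input-len` is the input sequence length, `--output-len` is the number of output tokens, `--model` is the model path, `-tp` is the number of cards to use, and `dtype="float16"` is the inference data type; if the model weights are bfloat16, this must be changed to float16 for inference. Setting `--output-len 1` measures first-token latency.
+Here `--num-prompts` is the batch size, `--input-len` is the input sequence length, `--output-len` is the number of output tokens, `--model` is the model path, `-tp` is the number of cards to use, and `dtype="float16"` is the inference data type; if the model weights are bfloat16, this must be changed to float16 for inference. Setting `--output-len 1` measures first-token latency. `-q gptq` runs inference with a GPTQ-quantized model.
 2. Using a dataset
 Download the dataset: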
@@ -113,7 +113,7 @@ wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/r
 ```bash
 python benchmark_throughput.py --num-prompts 1 --model Qwen/Qwen1.5-7B-Chat --dataset ShareGPT_V3_unfiltered_cleaned_split.json -tp 1 --trust-remote-code --enforce-eager --dtype float16
 ```
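-Here `--num-prompts` is the batch size, `--model` is the model path, `--dataset` is the dataset to use, `-tp` is the number of cards to use, and `dtype="float16"` is the inference data type; if the model weights are bfloat16, this must be changed to float16 for inference.
+Here `--num-prompts` is the batch size, `--model` is the model path, `--dataset` is the dataset to use, `-tp` is the number of cards to use, and `dtype="float16"` is the inference data type; if the model weights are bfloat16, this must be changed to float16 for inference. `-q gptq` runs inference with a GPTQ-quantized model.
 ### API server inference performance test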
@@ -134,7 +134,7 @@ python vllm/benchmarks/benchmark_serving.py --model Qwen/Qwen1.5-7B-Chat --datas
 ```bash
 python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen1.5-7B-Chat --enforce-eager --dtype float16 --trust-remote-code
 ```
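-Here `--model` is the path of the model to load and `--dtype` is the data type (float16). By default the chat template predefined in the tokenizer is used; `--chat-template` supplies a new template that overrides the default.
+Here `--model` is the path of the model to load and `--dtype` is the data type (float16). By default the chat template predefined in the tokenizer is used; `--chat-template` supplies a new template that overrides the default. `-q gptq` runs inference with a GPTQ-quantized model.
 List the models: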
 ```bash