update readme

0d96d60e · zhuwenwen · c6432003 · 0d96d60e
Commit 0d96d60e authored Jun 06, 2024 by zhuwenwen

Showing with 6 additions and 6 deletions

README.md +6 -6

--- a/README.md
+++ b/README.md
@@ -60,7 +60,7 @@ pip install ray==2.9.1 aiohttp==3.9.1 outlines==0.0.37 openai==1.23.3
 * flash_attn: 2.0.4
 * python: python3.10

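-`Tips: The versions of the DCU-related tools listed above (dtk driver, python, torch, etc.) must match one another exactly.`
+`Tips: The versions of the DCU-related tools listed above (dtk driver, python, torch, etc.) must match one another exactly. Currently this only works on K100_AI.`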

 ## Dataset
 None
@@ -69,8 +69,8 @@ pip install ray==2.9.1 aiohttp==3.9.1 outlines==0.0.37 openai==1.23.3
 ### Build and install from source
 ```
 # If you use the 光源 image, you can skip building from source; vllm is already installed in the image.
-git clone http://developer.hpccube.com/codes/modelzoo/qwen1.5_vllm.git
-cd qwen1.5_vllm
+git clone http://developer.hpccube.com/codes/modelzoo/llama_vllm.git
+cd llama_vllm
 git submodule init && git submodule update
 cd vllm
 pip install wheel
@@ -102,7 +102,7 @@ python vllm/examples/offline_inference.py
 ```bash
 python vllm/benchmarks/benchmark_throughput.py --num-prompts 1 --input-len 32 --output-len 128 --model meta-llama/Llama-2-7b-chat-hf -tp 1 --trust-remote-code --enforce-eager --dtype float16
 ```
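-Here `--num-prompts` is the batch size, `--input-len` is the input sequence length, `--output-len` is the number of output tokens, `--model` is the model path, `-tp` is the number of cards used, and `dtype="float16"` is the inference data type; if the model weights are bfloat16, switch to float16 for inference. Setting `--output-len 1` measures first-token latency.
+Here `--num-prompts` is the batch size, `--input-len` is the input sequence length, `--output-len` is the number of output tokens, `--model` is the model path, `-tp` is the number of cards used, and `dtype="float16"` is the inference data type; if the model weights are bfloat16, switch to float16 for inference. Setting `--output-len 1` measures first-token latency. `-q gptq` runs inference with a GPTQ-quantized model.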

 2. Using a dataset
 Download the dataset:
@@ -113,7 +113,7 @@ wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/r
 ```bash
 python vllm/benchmarks/benchmark_throughput.py --num-prompts 1 --model meta-llama/Llama-2-7b-chat-hf --dataset ShareGPT_V3_unfiltered_cleaned_split.json -tp 1 --trust-remote-code --enforce-eager --dtype float16
 ```
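-Here `--num-prompts` is the batch size, `--model` is the model path, `--dataset` is the dataset to use, `-tp` is the number of cards used, and `dtype="float16"` is the inference data type; if the model weights are bfloat16, switch to float16 for inference.
+Here `--num-prompts` is the batch size, `--model` is the model path, `--dataset` is the dataset to use, `-tp` is the number of cards used, and `dtype="float16"` is the inference data type; if the model weights are bfloat16, switch to float16 for inference. `-q gptq` runs inference with a GPTQ-quantized model.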


 ### API server inference performance test
@@ -134,7 +134,7 @@ python vllm/benchmarks/benchmark_serving.py --model meta-llama/Llama-2-7b-chat-h
 ```bash
 python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf --enforce-eager --dtype float16 --trust-remote-code
 ```
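-Here `--model` is the path of the model to load and `--dtype` is the data type (float16). By default the chat template predefined in the tokenizer is used; `--chat-template` supplies a new template that overrides the default.
+Here `--model` is the path of the model to load and `--dtype` is the data type (float16). By default the chat template predefined in the tokenizer is used; `--chat-template` supplies a new template that overrides the default. `-q gptq` runs inference with a GPTQ-quantized model.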

 List the available models:
 ```bash