Commit b5d6a878 authored by zhuwenwen's avatar zhuwenwen
Browse files

update std

parent 243aa730
......@@ -69,7 +69,7 @@ python offline_inference.py
### 离线批量推理性能测试
1、指定输入输出
```bash
python benchmark_throughput.py -num-prompts 1 --input-len 32 --output-len 128 --model meta-llama/Llama-2-7b-chat-hf -tp 1 --trust-remote-code --enforce-eager --dtype float16
python benchmark_throughput.py --num-prompts 1 --input-len 32 --output-len 128 --model meta-llama/Llama-2-7b-chat-hf -tp 1 --trust-remote-code --enforce-eager --dtype float16
```
其中`--num-prompts`是batch数,`--input-len`是输入seqlen,`--output-len`是输出token长度,`--model`为模型路径,`-tp`为使用卡数,`dtype="float16"`为推理数据类型,如果模型权重是bfloat16,需要修改为float16推理。若指定`--output-len 1`即为首字延迟。
......@@ -80,7 +80,7 @@ wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/r
```
```bash
python benchmark_throughput.py -num-prompts 1 --model meta-llama/Llama-2-7b-chat-hf --dataset ShareGPT_V3_unfiltered_cleaned_split.json -tp 1 --trust-remote-code --enforce-eager --dtype float16
python benchmark_throughput.py --num-prompts 1 --model meta-llama/Llama-2-7b-chat-hf --dataset ShareGPT_V3_unfiltered_cleaned_split.json -tp 1 --trust-remote-code --enforce-eager --dtype float16
```
其中`--num-prompts`是batch数,`--model`为模型路径,`--dataset`为使用的数据集,`-tp`为使用卡数,`dtype="float16"`为推理数据类型,如果模型权重是bfloat16,需要修改为float16推理。
......@@ -145,7 +145,7 @@ curl http://localhost:8000/v1/chat/completions \
Prompt: 'I believe the meaning of life is', Generated text: ' to find purpose, happiness, and fulfillment. Here are some reasons why:\n\n1. Purpose: Having a sense of purpose gives life meaning and direction. It helps individuals set goals and work towards achieving them, which can lead to a sense of accomplishment and fulfillment.\n2. Happiness: Happiness is a fundamental aspect of life that brings joy and satisfaction.
```
## 精度
### 精度
## 应用场景
......
......@@ -7,4 +7,4 @@ modelDescription=LLama是一个基础语言模型的集合,参数范围从7B到7
# 应用场景
appScenario=推理,对话问答,金融,科研,教育
# 框架类型
frameType=llama_vllm
frameType=vllm
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment