<!--
 * @Author: laibai
 * @email: laibao@sugon.com
 * @Date: 2024-05-24 14:15:07
 * @LastEditTime: 2024-09-30 08:30:01
-->

# Qwen2.5

## Paper


## Model Architecture

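Qwen2.5 is the latest generation of large language models open-sourced by Alibaba Cloud, and another step forward for the Qwen series in performance and functionality. This release focuses on multilingual capability and supports more than 29 languages, including Chinese, English, French, Spanish, Portuguese, and German. The series supports context lengths of up to 128K tokens and can generate up to 8K tokens. The pre-training dataset has grown from 7T tokens to 18T tokens, which significantly expands the model's knowledge. Qwen2.5 also adapts better to system prompts, improving role-play and background setting for chatbots. The series covers parameter scales from 0.5B to 72B to meet the needs of different application scenarios.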

<div align=center>
    <img src="./doc/qwen1.5.jpg"/>
</div>

## Algorithm

Like Qwen, Qwen2.5 is a decoder-only Transformer model that uses the SwiGLU activation function, RoPE (rotary position embeddings), multi-head attention, and related components.

<div align=center>
    <img src="./doc/qwen1.5.png"/>
</div>
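
The two components named above, SwiGLU and RoPE, can be illustrated with a minimal pure-Python sketch. This is not the model's actual implementation: the helper names are hypothetical, the weights are plain nested lists, RoPE uses the interleaved pairing from the original formulation, and `base=10000.0` is an assumed default rather than the value configured for Qwen2.5.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(weight: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: W_down @ (SiLU(W_gate @ x) * (W_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def rope(x: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotate each pair (x[i], x[i+1]) by the angle position * base**(-i/d).

    i runs over the even indices, so the exponent equals -2k/d for pair k.
    The vector length d must be even.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out
```

Because RoPE only rotates pairs of coordinates, it leaves the vector norm unchanged and is the identity at position 0, which makes it easy to sanity-check.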

## Environment Setup

### Docker (Method 1)

Pull the inference Docker image from [光源 (SourceFind)](https://www.sourcefind.cn/#/image/dcu/custom):

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10
# <Image ID>: replace with the ID of the Docker image pulled above
# <Host Path>: path on the host
# <Container Path>: mount path inside the container
docker run -it --name qwen2.5_vllm --privileged --shm-size=64G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal -v <Host Path>:<Container Path> <Image ID> /bin/bash
```

`Tips: On K100/Z100L, use the custom image: docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.5.0-dtk24.04.1-ubuntu20.04-py310-zk-v1. K100/Z100L does not support AWQ quantization.`

### Dockerfile (Method 2)

```
# <Host Path>: path on the host
# <Container Path>: mount path inside the container
docker build -t qwen2.5:latest .
docker run -it --name qwen2.5_vllm --privileged --shm-size=64G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> qwen2.5:latest /bin/bash
```

### Anaconda (Method 3)

```
conda create -n qwen2.5_vllm python=3.10
```

The DCU-specific deep learning libraries required by this project can be downloaded and installed from the [光合 (HPCCube)](https://developer.hpccube.com/tool/) developer community.

* DTK driver: dtk24.04.2
* PyTorch: 2.1.0
* triton: 2.1.0
* lmslim: 0.1.0
* xformers: 0.0.25
* flash_attn: 2.6.1
* vllm: 0.5.0
* python: python3.10

`Tips: Install the dependencies above first, and install the vllm package last.`
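
After installation, the environment can be compared with the version list above. The following is a minimal sketch using only the standard library. The distribution names in `REQUIRED` are assumptions and may differ from the names used by the DCU builds. Only the version prefix is compared, on the assumption that vendor builds may append a local suffix.

```python
from importlib import metadata

# Versions copied from the list above. The distribution names are assumptions
# and may not match the package names of the DCU builds.
REQUIRED = {
    "torch": "2.1.0",
    "triton": "2.1.0",
    "lmslim": "0.1.0",
    "xformers": "0.0.25",
    "flash_attn": "2.6.1",
    "vllm": "0.5.0",
}


def installed_version(name: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(required, lookup=installed_version):
    """Return {name: (expected, found)} for every package that does not match.

    A package matches when its installed version starts with the expected one,
    so a build suffix after the version number is tolerated.
    """
    problems = {}
    for name, expected in required.items():
        found = lookup(name)
        if found is None or not found.startswith(expected):
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_versions(REQUIRED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every listed package was found with the expected version prefix.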

## Dataset


## Inference

### Model Download

| Base model | Chat model | GPTQ model | AWQ model |
| ---------- | ---------- | ---------- | --------- |
| [Qwen2.5-3B](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-3B) | [Qwen2.5-3B-Instruct](http://113.200.138.88:18080/aimodels/qwen2.5-3b-instruct) | [Qwen2.5-3B-Instruct-GPTQ-Int4](http://113.200.138.88:18080/aimodels/qwen/qwen2.5-3b-instruct-gptq-int4) | [Qwen2.5-3B-Instruct-AWQ](http://113.200.138.88:18080/aimodels/qwen/qwen2.5-3b-instruct-awq) |
| [Qwen2.5-7B](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-7B) | [Qwen2.5-7B-Instruct](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-7B-Instruct) | [Qwen2.5-7B-Instruct-GPTQ-Int4](http://113.200.138.88:18080/aimodels/qwen/qwen2.5-7b-instruct-gptq-int4) | [Qwen2.5-7B-Instruct-AWQ](http://113.200.138.88:18080/aimodels/qwen/qwen2.5-7b-instruct-awq) |
| [Qwen2.5-14B](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-14B) | [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) | [Qwen2.5-14B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4) | [Qwen2.5-14B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-AWQ) |
| [Qwen2.5-32B](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-32B) | [Qwen2.5-32B-Instruct](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-32B-Instruct) | [Qwen2.5-32B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4) | [Qwen2.5-32B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct-AWQ) |
| [Qwen2.5-72B](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-72B) | [Qwen2.5-72B-Instruct](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-72B-Instruct) | [Qwen2.5-72B-Instruct-GPTQ-Int4](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-72B-Instruct-GPTQ-Int4) | [Qwen2.5-72B-Instruct-AWQ](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-72B-Instruct-AWQ) |
| [Qwen2.5-Coder-1.5B](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-Coder-1.5B) | [Qwen2.5-Coder-1.5B-Instruct](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-Coder-1.5B-Instruct) | [Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4) | [Qwen2.5-Coder-1.5B-Instruct-AWQ](http://113.200.138.88:18080/aimodels/qwen/qwen2.5-coder-1.5b-instruct-awq) |
| [Qwen2.5-Coder-7B](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-Coder-7B) | [Qwen2.5-Coder-7B-Instruct](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-Coder-7B-Instruct) | [Qwen2.5-Coder-7B-Instruct-GPTQ-Int4](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4) | [Qwen2.5-Coder-7B-Instruct-AWQ](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-Coder-7B-Instruct-AWQ) |
| [Qwen2.5-Math-1.5B](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-Math-1.5B) | [Qwen2.5-Math-1.5B-Instruct](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-Math-1.5B-Instruct) | | |
| [Qwen2.5-Math-7B](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-Math-7B) | [Qwen2.5-Math-7B-Instruct](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-Math-7B-Instruct) | | |

### Offline Batch Inference

```bash
python examples/offline_inference.py
```

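Here, `prompts` is the list of prompts. `temperature` controls sampling randomness: lower values make generation more deterministic, higher values make it more random, 0 means greedy sampling, and the default is 1. `max_tokens=16` sets the maximum number of generated tokens; the vLLM default is 16.
`model` is the model path; `tensor_parallel_size=1` is the number of cards to use (default 1); `dtype="float16"` is the inference data type, and it must be set to float16 if the model weights are bfloat16. `quantization="gptq"` runs inference with GPTQ quantization and requires one of the GPTQ models listed above. `quantization="awq"` runs inference with AWQ quantization and requires one of the AWQ models listed above.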
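
The parameters described above fit together roughly as in the sketch below. It is an illustration, not the contents of `examples/offline_inference.py`: the helper names `build_engine_kwargs` and `run` are hypothetical, and `vllm` is imported lazily so that the argument handling can be inspected without a DCU environment.

```python
def build_engine_kwargs(model, tensor_parallel_size=1, dtype="float16", quantization=None):
    """Collect the engine arguments described above into keyword arguments."""
    kwargs = {
        "model": model,                                # model path
        "tensor_parallel_size": tensor_parallel_size,  # number of cards
        "dtype": dtype,                                # use float16 for bfloat16 weights
    }
    if quantization is not None:
        kwargs["quantization"] = quantization          # "gptq" or "awq"
    return kwargs


def run(prompts, model, temperature=0.8, max_tokens=16, **engine_options):
    """Generate one completion per prompt and return the generated texts."""
    # Imported here so the module can be loaded without vllm installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs(model, **engine_options))
    params = SamplingParams(temperature=temperature, max_tokens=max_tokens)
    return [output.outputs[0].text for output in llm.generate(prompts, params)]
```

For example, `run(["What is deep learning?"], "Qwen/Qwen2.5-7B-instruct", temperature=0)` would perform greedy decoding, assuming the model is available at that path.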

### Offline Batch Inference Performance Benchmark

1. Specify input and output lengths

```bash
python benchmarks/benchmark_throughput.py --num-prompts 1 --input-len 32 --output-len 128 --model Qwen/Qwen2.5-7B-instruct -tp 1 --trust-remote-code --enforce-eager --dtype float16
```

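Here, `--num-prompts` is the batch size, `--input-len` is the input sequence length, `--output-len` is the number of output tokens, `--model` is the model path, `-tp` is the number of cards to use, and `--dtype float16` is the inference data type; if the model weights are bfloat16, set it to float16. Setting `--output-len 1` measures first-token latency. `-q gptq` runs inference with a GPTQ-quantized model.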
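
To benchmark several batch sizes in a row, the command line can be generated programmatically. The sketch below uses only the flags shown above. The helper names `throughput_cmd` and `sweep` are hypothetical, and the batch sizes in the example are arbitrary.

```python
import shlex


def throughput_cmd(model, num_prompts, input_len, output_len,
                   tp=1, dtype="float16", quantization=None):
    """Build the argument list for benchmarks/benchmark_throughput.py."""
    cmd = [
        "python", "benchmarks/benchmark_throughput.py",
        "--num-prompts", str(num_prompts),
        "--input-len", str(input_len),
        "--output-len", str(output_len),
        "--model", model,
        "-tp", str(tp),
        "--trust-remote-code", "--enforce-eager",
        "--dtype", dtype,
    ]
    if quantization:
        cmd += ["-q", quantization]  # e.g. "gptq"
    return cmd


def sweep(model, batch_sizes, input_len=32, output_len=128):
    """Return one command per batch size, with fixed input and output lengths."""
    return [throughput_cmd(model, b, input_len, output_len) for b in batch_sizes]


if __name__ == "__main__":
    for cmd in sweep("Qwen/Qwen2.5-7B-instruct", [1, 8, 32]):
        print(shlex.join(cmd))
```

Each returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues when the model path contains spaces.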

2. Use a dataset

Download the dataset:

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
```

```bash
python benchmarks/benchmark_throughput.py --num-prompts 1 --model Qwen/Qwen2.5-7B-instruct --dataset ShareGPT_V3_unfiltered_cleaned_split.json -tp 1 --trust-remote-code --enforce-eager --dtype float16
```

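Here, `--num-prompts` is the batch size, `--model` is the model path, `--dataset` is the dataset to use, `-tp` is the number of cards to use, and `--dtype float16` is the inference data type; if the model weights are bfloat16, set it to float16. `-q gptq` runs inference with a GPTQ-quantized model.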
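
Before running the benchmark, it can help to check how many usable samples the downloaded file contains. The sketch below assumes the ShareGPT layout, a JSON list whose entries carry a `conversations` list of turns. It also assumes that only entries with at least two turns, a prompt and a reply, are usable. Both assumptions should be checked against the benchmark script. The function names are hypothetical.

```python
import json


def load_dataset(path):
    """Load the ShareGPT JSON file into a list of conversation records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_usable_conversations(records):
    """Count records that contain at least two turns (a prompt and a reply)."""
    return sum(1 for record in records if len(record.get("conversations", [])) >= 2)


if __name__ == "__main__":
    records = load_dataset("ShareGPT_V3_unfiltered_cleaned_split.json")
    print(f"{count_usable_conversations(records)} of {len(records)} records are usable")
```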

### OpenAI API Server Performance Benchmark

1. Start the server:

```bash
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-7B-instruct --enforce-eager --dtype float16 --trust-remote-code
```

2. Start the client:

```
python benchmarks/benchmark_serving.py --model Qwen/Qwen2.5-7B-instruct --dataset ShareGPT_V3_unfiltered_cleaned_split.json  --num-prompts 1 --trust-remote-code
```

The parameters are the same as in the dataset-based offline batch inference benchmark above; for details, see [benchmarks/benchmark_serving.py](/codes/modelzoo/qwen1.5_vllm/-/blob/master/benchmarks/benchmark_serving.py)

### OpenAI-Compatible Server

Start the server:

```bash
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-7B-instruct --enforce-eager --dtype float16 --trust-remote-code
```

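Here, `--model` is the path of the model to load and `--dtype` is the data type (float16). By default the server uses the chat template predefined in the tokenizer; `--chat-template` supplies a new template that overrides the default. `-q gptq` runs inference with a GPTQ-quantized model, and `-q awq` runs inference with an AWQ-quantized model.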

List the available models:

```bash
curl http://localhost:8000/v1/models
```
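
The same request can be made from Python with the standard library only. This is a minimal sketch that assumes the server is reachable at `http://localhost:8000/v1` and answers in the OpenAI list format, a JSON object with a `data` array of entries that each carry an `id`. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_model_ids(payload: dict) -> list:
    """Extract the model IDs from a /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url="http://localhost:8000/v1", timeout=10.0):
    """Query the running server and return the IDs of the models it serves."""
    with urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    print(list_models())
```

The returned ID is the value to pass as `model` in later completion and chat requests.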

### Using the OpenAI Completions API with vLLM

```bash
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-7B-instruct",
        "prompt": "What is deep learning?",
        "max_tokens": 7,
        "temperature": 0
    }'
```

Alternatively, use [examples/openai_completion_client.py](examples/openai_completion_client.py)
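
For reference, the curl request above corresponds to the following standard-library sketch. It is not the contents of `examples/openai_completion_client.py`. The function names are hypothetical, and the response is assumed to follow the OpenAI completions format with the text at `choices[0].text`.

```python
import json
from urllib.request import Request, urlopen


def build_completion_request(prompt, model="Qwen/Qwen2.5-7B-instruct",
                             base_url="http://localhost:8000/v1",
                             max_tokens=7, temperature=0):
    """Build the POST request that mirrors the curl command above."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return Request(
        f"{base_url}/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    with urlopen(build_completion_request(prompt, **kwargs)) as resp:
        return json.load(resp)["choices"][0]["text"]


if __name__ == "__main__":
    print(complete("What is deep learning?"))
```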

### Using the OpenAI Chat API with vLLM

```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-7B-instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is deep learning?"}
        ]
    }'
```

Alternatively, use [examples/openai_chatcompletion_client.py](examples/openai_chatcompletion_client.py)
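
A chat request can also be sent from Python with the standard library. The sketch below is not the contents of `examples/openai_chatcompletion_client.py`. The function names and the default system prompt are assumptions, and the reply is assumed to be at `choices[0].message.content`, as in the OpenAI chat format.

```python
import json
from urllib.request import Request, urlopen


def build_chat_body(user_message, model="Qwen/Qwen2.5-7B-instruct",
                    system_prompt="You are a helpful assistant."):
    """Build the JSON body with one system message and one user message."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    }


def extract_reply(response: dict) -> str:
    """Return the assistant text from a chat completion response."""
    return response["choices"][0]["message"]["content"]


def chat(user_message, base_url="http://localhost:8000/v1", **kwargs):
    """Send one chat turn to the server and return the assistant reply."""
    req = Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_body(user_message, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return extract_reply(json.load(resp))


if __name__ == "__main__":
    print(chat("What is deep learning?"))
```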

### Using Gradio with vLLM

1. Install gradio

```
pip install gradio
```

2. Install the required files

2.1 Start the gradio service and follow the prompts

```
python  gradio_openai_chatbot_webserver.py --model "Qwen/Qwen2.5-7B-instruct" --model-url http://localhost:8000/v1 --temp 0.8 --stop-token-ids ""
```

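2.2 Change the file permissions

Go to the download directory shown in the prompt and run the following command to make the file executable: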

```
chmod +x frpc_linux_amd64_v0.*
```

3. Start the OpenAI-compatible server

```
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-7B-instruct --enforce-eager --dtype float16 --trust-remote-code --port 8000
```

4. Start the gradio service

```
python  gradio_openai_chatbot_webserver.py --model "Qwen/Qwen2.5-7B-instruct" --model-url http://localhost:8000/v1 --temp 0.8 --stop-token-ids ""
```

5. Use the chat service

Open the local URL in a browser to use the chat service provided by Gradio.
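
Because the Gradio front end depends on the OpenAI-compatible server from step 3, it can be useful to wait until that server answers before starting step 4. The sketch below polls the `/v1/models` endpoint. The function names, the number of attempts, and the delay are assumptions chosen for illustration.

```python
import time
from urllib.error import URLError
from urllib.request import urlopen


def server_is_up(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False


def wait_for_server(url="http://localhost:8000/v1/models", attempts=30, delay=2.0,
                    probe=server_is_up, sleep=time.sleep):
    """Poll the server until it responds or the attempts run out."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay)
    return False


if __name__ == "__main__":
    print("server ready" if wait_for_server() else "server did not start in time")
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.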

## Result

Accelerator used: 1 × DCU-K100_AI-64G

```
Prompt: 'What is deep learning?', Generated text: ' Deep learning is a subset of machine learning that involves the use of neural networks to model and solve complex problems. Neural networks are a network of interconnected nodes or " neurons" that are designed to recognize patterns in data, learn from examples, and make predictions or decisions.\nThe term "deep" in deep learning refers to the use of multiple layers or hidden layers in these neural networks. Each layer processes the input data in a different way, extracting increasingly abstract features as the data passes through.'
```

### Accuracy



## Application Scenarios

### Algorithm Category

Conversational question answering

### Key Application Industries

Finance, scientific research, education

## Source Repository and Issue Feedback

* [https://developer.hpccube.com/codes/modelzoo/qwen2.5_vllm](https://developer.hpccube.com/codes/modelzoo/qwen2.5_vllm)

## References

* [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)