# Qwen

## Paper

`Qwen Technical Report`

https://arxiv.org/pdf/2309.16609.pdf

## Model Architecture

Tongyi Qianwen (Qwen) comprises the 7B- and 14B-parameter models of the Tongyi Qianwen large-model series developed by Alibaba Cloud. Qwen is a Transformer-based large language model trained on very large-scale pretraining data. The pretraining data is diverse and broad in coverage, including large amounts of web text, professional books, and code. On top of Qwen-7B, an alignment mechanism was used to build Qwen-7B-Chat, an AI assistant based on the large language model.

This project focuses on optimizing the inference performance of Qwen-Chat on the DCU platform to achieve fast chat responses on DCU.

![qwen](docs/transformer.jpg)

## Algorithm

Qwen adopts a LLaMA-like architecture. The main differences from the standard Transformer are: 1) untied input/output embeddings, 2) rotary positional embeddings (RoPE), 3) no biases in attention except for QKV, 4) RMSNorm instead of LayerNorm, 5) SwiGLU instead of ReLU, and 6) FlashAttention to accelerate training. The model has 32 layers, an embedding dimension of 4096, and 32 attention heads.

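For reference, the substitutions in points 4) and 5) have the following standard definitions (general formulas, not Qwen-specific notation; $g$ is a learned gain vector, $\epsilon$ a small constant, and $W$, $V$ the two gating projections):

$$
\mathrm{RMSNorm}(x) = \frac{x}{\sqrt{\tfrac{1}{d}\sum_{i=1}^{d} x_i^2 + \epsilon}} \odot g,
\qquad
\mathrm{SwiGLU}(x) = \mathrm{Swish}(xW) \odot (xV),
\qquad
\mathrm{Swish}(z) = z \cdot \sigma(z)
$$
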
![qwen](docs/qwen.png)

## Environment Setup

Pull the inference Docker image from [SourceFind](https://www.sourcefind.cn/#/service-details):
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:lmdeploy1.0-dtk23.10-torch1.13-py38-latest

# <Image ID>: replace with the ID of the image pulled above
# <Host Path>: path on the host
# <Container Path>: mapped path inside the container
docker run -it --name qwen --shm-size=1024G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash
```

Image version dependencies:
* DTK driver: dtk23.10
* PyTorch: 1.13
* Python: 3.8

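Inside the container, a quick sanity check of the preinstalled stack (an illustrative check; the exact versions printed may differ):

```
python3 -c "import torch; print(torch.__version__)"
python3 -c "import lmdeploy; print(lmdeploy.__version__)"
```
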
## Dataset

Not required; this project covers inference only.

## Inference

### Build and Install from Source
```
# If you use the SourceFind image, lmdeploy is already installed and you can skip building from source
# Get the source code, build, and install
git clone http://developer.hpccube.com/codes/modelzoo/Qwen_lmdeploy.git
cd Qwen_lmdeploy
git submodule init && git submodule update
cd lmdeploy
mkdir build && cd build
sh ../generate.sh
make -j 32
make install
cd .. && python3 setup.py install

```
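
After installation, you can confirm the CLI is on the PATH (a simple smoke test, not part of the original build steps):

```
lmdeploy --help            # lists the available subcommands (convert, chat, serve, ...)
```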

### Model Download

[Qwen-7B-Chat](https://modelscope.cn/models/Qwen/Qwen-7B-Chat)

[Qwen-14B-Chat](https://modelscope.cn/models/Qwen/Qwen-14B-Chat)

[Qwen-72B-Chat](https://modelscope.cn/models/qwen/Qwen-72B-Chat)

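If you prefer the command line, the weights can also be fetched from ModelScope with git (assuming `git-lfs` is installed; the target path is illustrative):

```
git lfs install
git clone https://www.modelscope.cn/Qwen/Qwen-7B-Chat.git /path/to/model
```
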
### Run Qwen-7B-Chat
```
# Model conversion
# <model_name>: name of the model ('llama', 'internlm', 'vicuna', 'wizardlM', 'internlm-chat-7b', 'internlm-chat', 'internlm-chat-7b-8k', 'internlm-chat-20b', 'internlm-20b', 'baichuan-7b', 'baichuan2-7b', 'puyu', 'llama2', 'qwen-7b', 'qwen-14b', 'qwen-72b', 'codellama', 'solar', 'ultralm', 'ultracm', 'yi')
# <model_path>: path to the model
# <model_format>: format of the model ('llama', 'hf', or None; defaults to None, in which case the format is inferred from the model)
# <tokenizer_path>: path to the tokenizer model (defaults to None, in which case it is looked up under model_path: 'tokenizer.model', or 'qwen.tiktoken' for Qwen)
# <dst_path>: destination path for the converted output (default ./workspace)
# <tp>: number of GPUs for tensor parallelism; must be a power of 2

lmdeploy convert --model_name qwen-7b --model_path /path/to/model --dst_path ./workspace_qwe7b --tp 1
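# The converted TurboMind model is written to the --dst_path directory (./workspace_qwe7b here)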

# Run in the terminal
lmdeploy chat turbomind --model_path ./workspace_qwe7b --tp 1     # after entering a question, press Enter twice to run inference

# Run as a web service

# Run the following in the terminal:
# <model_path_or_server>: path of the deployed model, a tritonserver URL, or a RESTful API URL. A model path runs the service directly with gradio; a URL is treated as a tritonserver by default. If the URL is a RESTful API, also enable the 'restful_api' flag.
# <server_name>: IP address of the gradio server
# <server_port>: port of the gradio server
# <batch_size>: batch size when running TurboMind directly (default 32)
# <tp>: number of GPUs for tensor parallelism; must be a power of 2 (keep consistent with model conversion)
# <restful_api>: flag describing model_path_or_server (default False)

lmdeploy serve gradio --model_path_or_server ./workspace_qwe7b --server_name {ip} --server_port {port} --batch_size 32 --tp 1 --restful_api False

# Open {ip}:{port} in a browser to start chatting
```
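
For example, with concrete values substituted into the gradio command (illustrative IP/port only):

```
lmdeploy serve gradio --model_path_or_server ./workspace_qwe7b --server_name 0.0.0.0 --server_port 6006 --batch_size 32 --tp 1 --restful_api False
```

Then open `http://<host_ip>:6006` in a browser.
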
### Run Qwen-14B-Chat
```
# Model conversion
lmdeploy convert --model_name qwen-14b --model_path /path/to/model --dst_path ./workspace_qwen14b --tp 2

# Run in the terminal
lmdeploy chat turbomind --model_path ./workspace_qwen14b --tp 2

# Run as a web service

# Run the following in the terminal:
lmdeploy serve gradio --model_path_or_server ./workspace_qwen14b --server_name {ip} --server_port {port} --batch_size 32 --tp 2 --restful_api False

# Open {ip}:{port} in a browser to start chatting
```
### Run Qwen-72B-Chat

```
# Model conversion
lmdeploy convert --model_name qwen-72b --model_path /path/to/model --dst_path ./workspace_qwen72b --tp 8

# Run in the terminal
lmdeploy chat turbomind --model_path ./workspace_qwen72b --tp 8

# Run as a web service

# Run the following in the terminal:
lmdeploy serve gradio --model_path_or_server ./workspace_qwen72b --server_name {ip} --server_port {port} --batch_size 32 --tp 8 --restful_api False

# Open {ip}:{port} in a browser to start chatting
```

### Run via api-server

Start the server:

```shell
# --instance_num: number of turbomind inference instances, which can be understood as the maximum supported concurrency
# --tp: number of GPUs used for tensor parallelism
lmdeploy serve api_server ./workspace_qwen72b --server_name ${server_ip} --server_port ${server_port} --tp 8
```

Open `http://{server_ip}:{server_port}` in a browser to access the Swagger page, which documents the RESTful API in detail.
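
Since the server exposes a Swagger page, the machine-readable schema can usually be fetched as well (an assumption based on the standard FastAPI layout that lmdeploy's api_server is built on):

```shell
curl http://{server_ip}:{server_port}/openapi.json
```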

You can talk to the server from the command line (run this in a newly opened terminal):

```shell
# restful_api_url is produced by api_server, i.e. the http://{server_ip}:{server_port} used to start the server above
lmdeploy serve api_client restful_api_url
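# for example: lmdeploy serve api_client http://localhost:23333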
```

Alternatively, start gradio and chat with the service in the WebUI chat box:

```shell
# restful_api_url is produced by api_server, e.g. http://localhost:23333
# server_ip and server_port are used to serve the gradio UI
# example: lmdeploy serve gradio http://localhost:23333 --server_name localhost --server_port 6006 --restful_api True
lmdeploy serve gradio restful_api_url --server_name ${server_ip} --server_port ${server_port} --restful_api True
```

**Make sure `{server_ip}:{server_port}` is reachable from an external browser.**

For a detailed introduction to the RESTful API, please refer to [this document](https://developer.hpccube.com/codes/aicomponent/lmdeploy/-/blob/dtk23.10-v0.1.0/docs/zh_cn/restful_api.md).

## Results

![qwen inference](docs/qwen推理.gif)

### Accuracy



## Application Scenarios

### Algorithm Category

`Conversational Q&A`


### Key Application Industries

`Healthcare, scientific research, finance, education`


## Source Repository and Issue Reporting
https://developer.hpccube.com/codes/modelzoo/qwen_lmdeploy

## References
https://github.com/InternLM/LMDeploy