<!--
 * @Author: zhuww
 * @email: zhuww@sugon.com
 * @Date: 2024-05-24 14:15:07
 * @LastEditTime: 2024-09-30 08:30:01
-->

# llava

## Paper

Visual Instruction Tuning

[2304.08485 (arxiv.org)](https://arxiv.org/pdf/2304.08485)

## Model Architecture

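LLaVA (Large Language and Vision Assistant) is an open-source large multimodal model that combines vision and language capabilities. By connecting a vision encoder to the Vicuna language model, it achieves strong visual and language understanding, performs well on multimodal tasks, and set a new state of the art on several benchmarks such as Science QA. LLaVA is known for its cost-effective training and efficient scaling; recent updates focus on improving multimodal reasoning, especially the understanding of high-resolution images.

Recent LLaVA developments include support for dynamic high-resolution processing and zero-shot multilingual capability (for example, Chinese), showing that the model maintains strong performance on non-English data without language-specific fine-tuning.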

<div align=center>
    <img src="./doc/llava_network.png"/>
</div>

## Algorithm Principles

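The algorithmic principles of LLaVA (Large Language and Vision Assistant) cover the following aspects:

* **Visual instruction tuning**: The model is tuned on multimodal language-image instruction data generated with GPT-4 to improve its zero-shot capability on new tasks.
* **Large multimodal model**: CLIP's vision encoder is connected to Vicuna's language decoder to form an end-to-end trained multimodal model for general-purpose visual and language understanding.
* **Data generation**: GPT-4 is used to generate multimodal instruction-following data, including detailed descriptions of image content and complex reasoning questions.
* **Evaluation benchmarks**: Two evaluation benchmarks with diverse and challenging application tasks were built to test the model's multimodal conversation capability.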
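
The connection between the vision encoder and the language decoder can be sketched as follows. This is a minimal illustration in plain Python, assuming a single linear projection (as described in the original LLaVA paper) and toy dimensions. The names `project_visual_features` and `build_input_sequence` are hypothetical and do not come from any library or from this repository.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def project_visual_features(features: Matrix, weight: Matrix, bias: Vector) -> Matrix:
    """Map each visual feature (dim d_v) into the language embedding space (dim d_t).

    weight has shape (d_t, d_v); bias has length d_t.
    """
    projected = []
    for feat in features:
        row = [
            sum(w * x for w, x in zip(w_row, feat)) + b
            for w_row, b in zip(weight, bias)
        ]
        projected.append(row)
    return projected


def build_input_sequence(visual_tokens: Matrix, text_embeddings: Matrix) -> Matrix:
    """Place the projected visual tokens in front of the text token embeddings."""
    return visual_tokens + text_embeddings


# Toy example (assumed values): 2 image patches with d_v = 3, projected to d_t = 2.
patch_features = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
b = [0.5, -0.5]
text_embeddings = [[0.1, 0.2]]

visual_tokens = project_visual_features(patch_features, W, b)
sequence = build_input_sequence(visual_tokens, text_embeddings)
```

In a real model the projection weights are learned during training and the sequence is fed to the language decoder; here the point is only that image features become extra "tokens" in the same embedding space as the text.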

## Environment Setup

### Docker (Method 1)

The Docker image for inference can be pulled from [光源 (SourceFind)](https://www.sourcefind.cn/#/image/dcu/custom):

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10
# Replace <Image ID> with the ID of the Docker image pulled above
# <Host Path>: path on the host
# <Container Path>: mapped path inside the container
docker run -it --name qwen1.5_vllm --privileged --shm-size=64G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal -v <Host Path>:<Container Path> <Image ID> /bin/bash
```

`Tips: On K100/Z100L, use the custom image docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.5.0-dtk24.04.1-ubuntu20.04-py310-zk-v1. K100/Z100L does not support AWQ quantization.`

### Dockerfile (Method 2)

```
# <Host Path>: path on the host
# <Container Path>: mapped path inside the container
docker build -t qwen1.5:latest .
docker run -it --name qwen1.5_vllm --privileged --shm-size=64G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> qwen1.5:latest /bin/bash
```

### Anaconda (Method 3)

```
conda create -n qwen1.5_vllm python=3.10
```

The special deep learning libraries required for the DCU accelerator cards used in this project can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.

* DTK driver: dtk24.04.2
* PyTorch: 2.1.0
* triton: 2.1.0
* lmslim: 0.1.0
* xformers: 0.0.25
* flash_attn: 2.0.4
* vllm: 0.5.0
* python: python3.10

`Tips: Install the dependencies above first, and install the vllm package last.`
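
After installation, the versions listed above can be checked with a small script. The sketch below uses only the standard library; the distribution names in `EXPECTED` (for example `torch` for PyTorch) are assumptions about how the packages are registered, and the prefix comparison is a choice made here so that builds carrying a local suffix are still accepted.

```python
from importlib import metadata
from typing import Callable, Dict

# Versions taken from the list above; distribution names are assumptions.
EXPECTED = {
    "torch": "2.1.0",
    "triton": "2.1.0",
    "lmslim": "0.1.0",
    "xformers": "0.0.25",
    "flash_attn": "2.0.4",
    "vllm": "0.5.0",
}


def installed_version(name: str) -> str:
    """Return the installed version of a distribution, or '' if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return ""


def check_versions(expected: Dict[str, str],
                   lookup: Callable[[str], str] = installed_version) -> Dict[str, str]:
    """Return {package: problem} for packages that are missing or do not match."""
    problems = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if not found:
            problems[name] = "not installed"
        elif not found.startswith(wanted):
            # Prefix match, so that local build suffixes such as "+xyz" are accepted.
            problems[name] = f"found {found}, expected {wanted}"
    return problems
```

Calling `check_versions(EXPECTED)` inside the prepared environment returns an empty dictionary when every package matches.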

## Dataset



## Inference

### Model Download

| Base model                                                     | Chat model                                                                  | GPTQ model                                                                                 | AWQ model                                                                                    |
| -------------------------------------------------------------- | --------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------- |
| [Qwen-7B](https://huggingface.co/Qwen/Qwen-7B)                    | [Qwen-7B-Chat](http://113.200.138.88:18080/aimodels/Qwen-7B-Chat)              | [Qwen-7B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen-7B-Chat-Int4)                       |                                                                                              |
| [Qwen-14B](https://huggingface.co/Qwen/Qwen-14B)                  | [Qwen-14B-Chat](http://113.200.138.88:18080/aimodels/Qwen-14B-Chat)            | [Qwen-14B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen-14B-Chat-Int4)                     |                                                                                              |
| [Qwen-72B](http://113.200.138.88:18080/aimodels/qwen/Qwen-72B)    | [Qwen-72B-Chat](http://113.200.138.88:18080/aimodels/Qwen-72B-Chat)            | [Qwen-72B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen-72B-Chat-Int4)                     |                                                                                              |
| [Qwen1.5-7B](https://huggingface.co/Qwen/Qwen1.5-7B)              | [Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat)                 | [Qwen1.5-7B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-7B-Chat-GPTQ-Int4)            | [Qwen1.5-7B-Chat-AWQ-Int4](http://113.200.138.88:18080/aimodels/qwen/Qwen1.5-7B-Chat-AWQ)       |
| [Qwen1.5-14B](https://huggingface.co/Qwen/Qwen1.5-14B)            | [Qwen1.5-14B-Chat](http://113.200.138.88:18080/aimodels/qwen/Qwen1.5-14B-Chat) | [Qwen1.5-14B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-14B-Chat-GPTQ-Int4)          | [Qwen1.5-14B-Chat-AWQ-Int4](http://113.200.138.88:18080/aimodels/qwen/Qwen1.5-14B-Chat-AWQ)     |
| [Qwen1.5-32B](http://113.200.138.88:18080/aimodels/Qwen1.5-32B)   | [Qwen1.5-32B-Chat](http://113.200.138.88:18080/aimodels/Qwen1.5-32B-Chat)      | [Qwen1.5-32B-Chat-GPTQ-Int4](http://113.200.138.88:18080/aimodels/Qwen1.5-32B-Chat-GPTQ-Int4) | [Qwen1.5-32B-Chat-AWQ-Int4](https://huggingface.co/Qwen/Qwen1.5-32B-Chat-AWQ)                   |
| [Qwen1.5-72B](http://113.200.138.88:18080/aimodels/Qwen1.5-72B)   | [Qwen1.5-72B-Chat](http://113.200.138.88:18080/aimodels/Qwen1.5-72B-Chat)      | [Qwen1.5-72B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-72B-Chat-GPTQ-Int4)          | [Qwen1.5-72B-Chat-AWQ-Int4](http://113.200.138.88:18080/aimodels/qwen/Qwen1.5-72B-Chat-AWQ)     |
| [Qwen1.5-110B](http://113.200.138.88:18080/aimodels/Qwen1.5-110B) | [Qwen1.5-110B-Chat](http://113.200.138.88:18080/aimodels/Qwen1.5-110B-Chat)    | [Qwen1.5-110B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-110B-Chat-GPTQ-Int4)        | [Qwen1.5-110B-Chat-AWQ-Int4](http://113.200.138.88:18080/aimodels/qwen/Qwen1.5-110B-Chat-AWQ)   |
| [Qwen2-7B](http://113.200.138.88:18080/aimodels/Qwen2-7B)         | [Qwen2-7B-Instruct](http://113.200.138.88:18080/aimodels/Qwen2-7B-Instruct)    | [Qwen2-7B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GPTQ-Int4)        | [Qwen2-7B-Instruct-AWQ-Int4](http://113.200.138.88:18080/aimodels/qwen/Qwen2-7B-Instruct-AWQ)   |
| [Qwen2-72B](http://113.200.138.88:18080/aimodels/Qwen2-72B)       | [Qwen2-72B-Instruct](http://113.200.138.88:18080/aimodels/Qwen2-72B-Instruct)  | [Qwen2-72B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-72B-Instruct-GPTQ-Int4)      | [Qwen2-72B-Instruct-AWQ-Int4](http://113.200.138.88:18080/aimodels/qwen/Qwen2-72B-Instruct-AWQ) |

### Offline Batch Inference

```bash
python examples/offline_inference.py
```

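Here, `prompts` is the list of prompts. `temperature` controls sampling randomness: lower values make generation more deterministic, higher values make it more random, 0 means greedy sampling, and the default is 1. `max_tokens=16` sets the generation length; the default is 16.
`model` is the model path. `tensor_parallel_size=1` is the number of cards to use; the default is 1. `dtype="float16"` is the inference data type; if the model weights are bfloat16, change it to float16 for inference. `quantization="gptq"` runs inference with GPTQ quantization and requires one of the GPTQ models listed above. `quantization="awq"` runs inference with AWQ quantization and requires one of the AWQ models listed above.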
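
The parameters above map onto the vLLM Python API roughly as in the following sketch. It is not the content of `examples/offline_inference.py`; the helper names `engine_kwargs` and `generate` are hypothetical, and the restriction to `gptq` and `awq` is a choice made here because those are the two methods mentioned in this README. vLLM is imported lazily so the helpers can be loaded on a machine without it.

```python
from typing import Any, Dict, List, Optional


def engine_kwargs(model: str, tensor_parallel_size: int = 1, dtype: str = "float16",
                  quantization: Optional[str] = None) -> Dict[str, Any]:
    """Collect the keyword arguments passed to vllm.LLM."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "tensor_parallel_size": tensor_parallel_size,
        "dtype": dtype,
    }
    if quantization is not None:
        # This sketch only accepts the two methods mentioned in this README.
        if quantization not in ("gptq", "awq"):
            raise ValueError(f"unsupported quantization: {quantization}")
        kwargs["quantization"] = quantization
    return kwargs


def generate(prompts: List[str], model: str, temperature: float = 1.0,
             max_tokens: int = 16, **engine_options: Any) -> List[str]:
    """Run offline batch inference and return one generated text per prompt."""
    # Imported lazily so that this module can be loaded without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs(model, **engine_options))
    params = SamplingParams(temperature=temperature, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]
```

For example, `generate(["What is deep learning?"], "Qwen/Qwen1.5-7B-Chat", temperature=0, max_tokens=128)` would run greedy decoding on a single card.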

### Offline Batch Inference Benchmark

1. Specify the input and output lengths

```bash
python benchmarks/benchmark_throughput.py --num-prompts 1 --input-len 32 --output-len 128 --model Qwen/Qwen1.5-7B-Chat -tp 1 --trust-remote-code --enforce-eager --dtype float16
```

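Here, `--num-prompts` is the batch size, `--input-len` is the input sequence length, `--output-len` is the number of output tokens, `--model` is the model path, `-tp` is the number of cards to use, and `--dtype float16` is the inference data type; if the model weights are bfloat16, change it to float16 for inference. Setting `--output-len 1` measures the first-token latency. `-q gptq` runs inference with a GPTQ-quantized model.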
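
To run the benchmark over several batch sizes, the command line can be assembled programmatically. The sketch below only builds and prints the commands, using the flags shown above; the helper name `throughput_command` and the batch sizes 8 and 32 are illustrative assumptions.

```python
import shlex
from typing import List, Optional


def throughput_command(model: str, num_prompts: int, input_len: int, output_len: int,
                       tp: int = 1, dtype: str = "float16",
                       quantization: Optional[str] = None) -> List[str]:
    """Build the argument list for benchmarks/benchmark_throughput.py."""
    cmd = [
        "python", "benchmarks/benchmark_throughput.py",
        "--num-prompts", str(num_prompts),
        "--input-len", str(input_len),
        "--output-len", str(output_len),
        "--model", model,
        "-tp", str(tp),
        "--trust-remote-code",
        "--enforce-eager",
        "--dtype", dtype,
    ]
    if quantization:
        cmd += ["-q", quantization]
    return cmd


# Sweep over a few batch sizes; output_len=1 measures first-token latency.
for batch in (1, 8, 32):
    print(shlex.join(throughput_command("Qwen/Qwen1.5-7B-Chat", batch, 32, 1)))
```

Each printed line can be pasted into a shell, or the list returned by `throughput_command` can be passed to `subprocess.run`.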

2. Use a dataset

Download the dataset:

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
```

```bash
python benchmarks/benchmark_throughput.py --num-prompts 1 --model Qwen/Qwen1.5-7B-Chat --dataset ShareGPT_V3_unfiltered_cleaned_split.json -tp 1 --trust-remote-code --enforce-eager --dtype float16
```

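Here, `--num-prompts` is the batch size, `--model` is the model path, `--dataset` is the dataset to use, `-tp` is the number of cards to use, and `--dtype float16` is the inference data type; if the model weights are bfloat16, change it to float16 for inference. `-q gptq` runs inference with a GPTQ-quantized model.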
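
Before benchmarking, it can help to look at what the dataset contains. The sketch below assumes the ShareGPT layout (a JSON list of records, each with a `conversations` list of `{"from": ..., "value": ...}` turns) and pairs the first two turns of each conversation as prompt and reference completion. The helper names are hypothetical, and this is not the benchmark script's own loader.

```python
import json
from typing import Any, Dict, List, Tuple


def first_turn_pairs(records: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return (prompt, completion) pairs from the first two turns of each conversation."""
    pairs = []
    for record in records:
        turns = record.get("conversations", [])
        # Conversations with fewer than two turns have no completion to pair with.
        if len(turns) >= 2:
            pairs.append((turns[0]["value"], turns[1]["value"]))
    return pairs


def load_pairs(path: str) -> List[Tuple[str, str]]:
    """Load a ShareGPT-style JSON file and extract its first-turn pairs."""
    with open(path, encoding="utf-8") as f:
        return first_turn_pairs(json.load(f))
```

For example, `len(load_pairs("ShareGPT_V3_unfiltered_cleaned_split.json"))` gives the number of usable conversations in the downloaded file.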

### API Server Inference Benchmark

1. Start the server:

```bash
python -m vllm.entrypoints.openai.api_server  --model Qwen/Qwen1.5-7B-Chat  --dtype float16 --enforce-eager -tp 1 
```

2. Start the client:

```bash
python benchmarks/benchmark_serving.py --model Qwen/Qwen1.5-7B-Chat --dataset ShareGPT_V3_unfiltered_cleaned_split.json  --num-prompts 1 --trust-remote-code
```

The parameters are the same as in the dataset-based offline batch inference benchmark; see [benchmarks/benchmark_serving.py](benchmarks/benchmark_serving.py) for details.

### OpenAI-Compatible Server

Start the server:

```bash
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen1.5-7B-Chat --enforce-eager --dtype float16 --trust-remote-code
```

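Here, `--model` is the path of the model to load and `--dtype` is the data type (float16). By default, the chat template predefined in the tokenizer is used; `--chat-template` supplies a new template that overrides the default. `-q gptq` runs inference with a GPTQ-quantized model, and `-q awq` runs inference with an AWQ-quantized model.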

List the available models:

```bash
curl http://localhost:8000/v1/models
```
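
The same query can be made from Python with the standard library. This is a minimal sketch, assuming the server started above is listening on `localhost:8000` and returns the OpenAI-style list response; the helper names `model_ids` and `list_models` are hypothetical.

```python
import json
import urllib.request
from typing import List


def model_ids(response_body: str) -> List[str]:
    """Extract the model identifiers from a /v1/models response body."""
    payload = json.loads(response_body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = "http://localhost:8000") -> List[str]:
    """Query a running server and return the identifiers of the served models."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return model_ids(resp.read().decode("utf-8"))
```

The identifier returned here is the value to pass as `"model"` in later requests.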

### Using the OpenAI Completions API with vLLM

```bash
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen1.5-7B-Chat",
        "prompt": "What is deep learning?",
        "max_tokens": 7,
        "temperature": 0
    }'
```

Alternatively, use [examples/openai_completion_client.py](examples/openai_completion_client.py).
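
For a dependency-free alternative, the completion request can also be built with the standard library. The sketch below is not the content of the example client; `completion_request` and `complete` are hypothetical helper names, and the model name must match the one the server was started with.

```python
import json
import urllib.request


def completion_request(prompt: str, model: str, max_tokens: int = 7,
                       temperature: float = 0.0,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, model: str, **options) -> str:
    """Send the request to a running server and return the generated text."""
    with urllib.request.urlopen(completion_request(prompt, model, **options)) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

With the server running, `complete("What is deep learning?", "Qwen/Qwen1.5-7B-Chat")` returns the first generated continuation.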

### Using the OpenAI Chat API with vLLM

```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen1.5-7B-Chat",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is deep learning?"}
        ]
    }'
```

Alternatively, use [examples/openai_chatcompletion_client.py](examples/openai_chatcompletion_client.py).
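
When a conversation spans several turns, the message list has to be assembled before each request and the reply extracted from the response. The sketch below shows one way to do both with the standard library; `chat_payload` and `reply_text` are hypothetical helpers, and the default system prompt is an assumption rather than something this repository prescribes.

```python
import json
from typing import Dict, List, Optional

Message = Dict[str, str]


def chat_payload(model: str, user_message: str,
                 system_prompt: str = "You are a helpful assistant.",
                 history: Optional[List[Message]] = None) -> Dict[str, object]:
    """Build the JSON body for POST /v1/chat/completions."""
    messages: List[Message] = [{"role": "system", "content": system_prompt}]
    messages += history or []  # earlier user/assistant turns, oldest first
    messages.append({"role": "user", "content": user_message})
    return {"model": model, "messages": messages}


def reply_text(response_body: str) -> str:
    """Extract the assistant reply from a chat completion response body."""
    return json.loads(response_body)["choices"][0]["message"]["content"]
```

The dictionary returned by `chat_payload` can be serialized with `json.dumps` and sent with any HTTP client; appending the assistant reply to `history` keeps the context for the next turn.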

## Result

Accelerator used: 1 × DCU-K100_AI-64G

```
Prompt: 'What is deep learning?', Generated text: ' Deep learning is a subset of machine learning that involves the use of neural networks to model and solve complex problems. Neural networks are a network of interconnected nodes or " neurons" that are designed to recognize patterns in data, learn from examples, and make predictions or decisions.\nThe term "deep" in deep learning refers to the use of multiple layers or hidden layers in these neural networks. Each layer processes the input data in a different way, extracting increasingly abstract features as the data passes through.'
```

### Accuracy



## Application Scenarios

### Algorithm Category

Conversational question answering

### Key Application Industries

Finance, scientific research, education

## Source Repository and Issue Feedback

* [https://developer.hpccube.com/codes/modelzoo/qwen1.5_vllm](https://developer.hpccube.com/codes/modelzoo/qwen1.5_vllm)

## References

* [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)