<!--
 * @Author: zhuww
 * @email: zhuww@sugon.com
 * @Date: 2024-05-24 14:15:07
 * @LastEditTime: 2024-09-30 08:30:01
-->

# LLaVA

## Paper

Visual Instruction Tuning

[2304.08485 (arxiv.org)](https://arxiv.org/pdf/2304.08485)

## Model Architecture

LLaVA (Large Language and Vision Assistant) is an open-source large multimodal model that combines vision and language capabilities. By connecting a vision encoder to the Vicuna language model, it achieves strong visual and language understanding, performs well on multimodal tasks, and set new state-of-the-art results on several benchmarks such as Science QA. LLaVA is known for cost-effective training and efficient scaling; recent updates focus on improving multimodal reasoning, especially the understanding of high-resolution images.

Recent advances in LLaVA include support for dynamic high-resolution processing and zero-shot multilingual capability (e.g., Chinese), showing strong performance on non-English data without language-specific fine-tuning.

<div align=center>
    <img src="./doc/llava_network.png"/>
</div>

## Algorithm Principles

The algorithm principles of LLaVA (Large Language and Vision Assistant) mainly cover the following aspects; a minimal sketch of the vision-to-language connection follows the list:

* **Visual instruction tuning**: the model is tuned on multimodal language-image instruction data generated with GPT-4 to improve its zero-shot capability on new tasks.
* **Large multimodal model**: CLIP's vision encoder is connected to Vicuna's language decoder, forming an end-to-end trained multimodal model for general-purpose visual and language understanding.
* **Data generation**: GPT-4 is used to generate multimodal instruction-following data, including detailed descriptions of image content and complex reasoning questions.
* **Evaluation benchmarks**: two evaluation benchmarks with diverse and challenging application tasks are constructed to test the model's multimodal conversation capability.
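
For illustration, here is a minimal PyTorch sketch of the vision-to-language connection described above (the hidden sizes, patch count, and the single linear projector are assumptions for illustration, not this project's code):

```python
import torch
import torch.nn as nn

# Assumed hidden sizes for a CLIP ViT-L/14 vision encoder and a Vicuna-7B decoder.
vision_dim, llm_dim = 1024, 4096

# LLaVA bridges the two models with a small projector (a linear layer in the
# original paper; LLaVA-1.5 uses a two-layer MLP).
projector = nn.Linear(vision_dim, llm_dim)

image_patches = torch.randn(1, 576, vision_dim)  # e.g. 24x24 CLIP patch features
text_embeds = torch.randn(1, 32, llm_dim)        # embedded prompt tokens

visual_tokens = projector(image_patches)                     # (1, 576, llm_dim)
llm_inputs = torch.cat([visual_tokens, text_embeds], dim=1)  # sequence fed to the LLM
print(llm_inputs.shape)  # torch.Size([1, 608, 4096])
```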

## Environment Setup

### Docker (Method 1)

A Docker image for inference can be pulled from [光源](https://www.sourcefind.cn/#/image/dcu/custom):

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.1-rc5-rocblas104381-0915-das1.6-py3.10-20250916-rc2

# Replace <Image ID> with the ID of the image pulled above
# <Host Path>: path on the host
# <Container Path>: mapped path inside the container
# To map specific ports between the host and the container, remove the --network host flag
docker run -it --name llava_vllm --privileged --shm-size=64G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal -v <Host Path>:<Container Path> <Image ID> /bin/bash
```

`Tips: If running on K100/Z100L, use the dedicated image docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.5.0-dtk24.04.1-ubuntu20.04-py310-zk-v1; K100/Z100L does not support AWQ quantization`

### Dockerfile (Method 2)

```
# <Host Path>: path on the host
# <Container Path>: mapped path inside the container
docker build -t llava:latest .
docker run -it --name llava_vllm --privileged --shm-size=64G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> llava:latest /bin/bash

```

### Anaconda (Method 3)

```
conda create -n llava_vllm python=3.10
```

The special deep learning libraries required by this project for DCU accelerators can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.

* DTK driver: dtk25.04.01
* Pytorch: 2.4.0
* triton: 3.0.0
* lmslim: 0.2.1
* flash_attn: 2.6.1
* flash_mla: 1.0.0
* vllm: 0.9.2
* python: python3.10

`Tips: Install the related dependencies first, and install the vllm package last`

Environment variables:

```
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_NUMA_BIND=1
export VLLM_RANK0_NUMA=0
export VLLM_RANK1_NUMA=1
export VLLM_RANK2_NUMA=2
export VLLM_RANK3_NUMA=3
export VLLM_RANK4_NUMA=4
export VLLM_RANK5_NUMA=5
export VLLM_RANK6_NUMA=6
export VLLM_RANK7_NUMA=7
```

## Dataset



## Inference

### Model Download

| Base models |  |  |
| --- | --- | --- |
| [llava-v1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | [llava-v1.6-34b-hf](https://huggingface.co/llava-hf/llava-v1.6-34b-hf) | [llava-v1.6-vicuna-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-7b-hf) |

### Model Inference

```bash
python examples/offline_inference/vision_language.py
```
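
For reference, roughly the same thing can be done directly with the vLLM Python API (a minimal sketch; the model path, local image file, and prompt format are illustrative assumptions, and the bundled `vision_language.py` script covers more model variants):

```python
from vllm import LLM, SamplingParams
from PIL import Image

# Load a LLaVA checkpoint (replace with your local model path).
llm = LLM(model="llava-hf/llava-1.5-7b-hf")

# LLaVA-1.5-style prompt containing the image placeholder token.
prompt = "USER: <image>\nWhat is the content of this image? ASSISTANT:"
image = Image.open("demo.jpg")  # hypothetical local image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.2, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```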

### OpenAI-Compatible Server

Start the service:

`cd examples`

```bash
vllm serve model_path --chat-template template_llava.jinja --port 8000 --allowed-local-media-path xxx
```

Here `model_path` is the path of the model to load, `--chat-template` can supply a new template to override the default one, and `--allowed-local-media-path` specifies the local media directory the server is allowed to access.

### Using the OpenAI Completions API with vLLM

```bash

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "model_path",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is the content of this image?"},
          {"type": "image_url", "image_url": {"url": "xxx"}}
        ]
      }
    ],
    "max_tokens": 300
  }'

```
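
The same request can also be issued from Python with the `openai` client library (a sketch; the base URL, model path, and image URL are placeholders for your own deployment):

```python
from openai import OpenAI

# Point the client at the vLLM OpenAI-compatible server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="model_path",  # must match the model name/path served by vLLM
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the content of this image?"},
            {"type": "image_url", "image_url": {"url": "xxx"}},  # placeholder image URL
        ],
    }],
    max_tokens=300,
)
print(resp.choices[0].message.content)
```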

### **Using gradio with vLLM**

1. Install gradio

```
pip install gradio
```

2. Install the required files and set up port mapping

    2.1 Start the gradio service and follow the prompts

```
python gradio_openai_vlm_webserver.py --model "/mnt/data/llm-models/llava/llava-1.5-7b-hf" --model-url http://localhost:8000/v1 --host "0.0.0.0" --port 8001
```

    2.2 Change file permissions

Go to the download directory indicated in the prompt and run the following command to grant execute permission:

```
chmod +x frpc_linux_amd64_v0.*
```

    2.3 Port mapping

```
ssh -L 8000:<compute node IP>:8000 -L 8001:<compute node IP>:8001 <username>@<login node> -p <login node port>
```

3. Start the OpenAI-compatible server

`cd examples`

```
vllm serve /mnt/data/llm-models/llava/llava-1.5-7b-hf --chat-template template_llava.jinja --port 8000 --host "0.0.0.0"
```

4. Start the gradio service

```
python gradio_openai_vlm_webserver.py --model "/mnt/data/llm-models/llava/llava-1.5-7b-hf" --model-url http://localhost:8000/v1 --host "0.0.0.0" --port 8001
```

5. Use the chat service

Open the local URL in a browser to use the chat service provided by Gradio.

## Result

### Offline Inference Service

Accelerator: single K100_AI card     Model: [llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b)

Input:

    images:

<div align="center">
    <img src="./doc/images.png" width="300" height="200"/>
</div>

    text:     What is the content of this image?

Output:

    output:   The image features a close-up view of a stop sign on a city street

### Gradio Service

Accelerator: single K100_AI card     Model: [llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b)

<div align=center>
    <img src="./doc/llava_gradio.png" width="800" height="500"/>
</div>

### Accuracy



## Application Scenarios

### Algorithm Category

Conversational question answering

### Key Application Industries

Finance, scientific research, education

## Source Repository and Issue Feedback

* [https://developer.sourcefind.cn/codes/modelzoo/llava_vllm](https://developer.sourcefind.cn/codes/modelzoo/llava_vllm)

## References

* [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)