<!--
 * @Author: zhuww
 * @email: zhuww@sugon.com
 * @Date: 2024-05-24 14:15:07
 * @LastEditTime: 2024-09-30 08:30:01
-->

# LLaVA

## Paper

Visual Instruction Tuning

[2304.08485 (arxiv.org)](https://arxiv.org/pdf/2304.08485)

## Model Architecture

LLaVA (Large Language and Vision Assistant) is an open-source large multimodal model that combines visual and language capabilities. By connecting a vision encoder to the Vicuna language model, it achieves strong visual and language understanding, performs well on multimodal tasks, and set a new state of the art on several benchmarks (such as Science QA). LLaVA is known for cost-effective training and efficient scaling; recent updates focus on improving multimodal reasoning, especially for high-resolution images.

Recent advances in LLaVA include support for dynamic high-resolution processing and zero-shot multilingual ability (e.g., Chinese), maintaining strong performance on non-English data without language-specific fine-tuning.

<div align=center>
    <img src="./doc/llava_network.png"/>
</div>

## Algorithm

The core ideas behind LLaVA (Large Language and Vision Assistant) are:

* **Visual instruction tuning**: the model is tuned on multimodal language-image instruction data generated by GPT-4, improving its zero-shot ability on new tasks.
* **Large multimodal model**: CLIP's vision encoder is connected to Vicuna's language decoder, forming an end-to-end trained multimodal model for general visual and language understanding (a minimal sketch of this connection follows the list).
* **Data generation**: GPT-4 is used to generate multimodal instruction-following data, including detailed image descriptions and complex reasoning questions.
* **Evaluation benchmarks**: two benchmarks with diverse and challenging tasks are constructed to test the model's multimodal conversation ability.
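
The following is an illustrative sketch (not this repository's code) of the encoder-to-decoder connection: a trainable projection maps CLIP vision features into the language model's embedding space, so projected image tokens can be consumed by Vicuna alongside text tokens. Dimensions follow LLaVA-1.5 with CLIP ViT-L/14-336 (576 patches, hidden size 1024) and Vicuna-7B (hidden size 4096).

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Maps CLIP patch features into the LLM embedding space."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # LLaVA-1.5 uses a two-layer MLP; the original LLaVA used a single linear layer.
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        # vision_features: [batch, num_patches, vision_dim] from the CLIP encoder
        return self.mlp(vision_features)  # [batch, num_patches, llm_dim]

# Projected image tokens are concatenated with text embeddings and fed to the decoder.
image_tokens = VisionProjector()(torch.randn(1, 576, 1024))
print(image_tokens.shape)  # torch.Size([1, 576, 4096])
```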

## Environment Setup

### Docker (Method 1)

Pull the inference Docker image from [光源](https://www.sourcefind.cn/#/image/dcu/custom):

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu22.04-dtk24.04.3-py3.10

# <Image ID>: the ID of the image pulled above
# <Host Path>: host-side path
# <Container Path>: container mount path
# To map ports between host and container, remove the --network host flag
docker run -it --name llava_vllm --privileged --shm-size=64G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal -v <Host Path>:<Container Path> <Image ID> /bin/bash
```

`Tips: On K100/Z100L, pull the custom image image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.5.0-dtk24.04.1-ubuntu20.04-py310-zk-v1 instead; K100/Z100L does not support AWQ quantization.`

### Dockerfile (Method 2)

```
# <Host Path>: host-side path
# <Container Path>: container mount path
docker build -t llava:latest .
docker run -it --name llava_vllm --privileged --shm-size=64G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> llava:latest /bin/bash
```

### Anaconda (Method 3)

```
conda create -n llava_vllm python=3.10
```

The DCU-specific deep learning libraries required by this project can be downloaded from the [光合](https://developer.hpccube.com/tool/) developer community.

* DTK driver: dtk24.04.3
* PyTorch: 2.3.0
* triton: 2.1.0
* lmslim: 0.1.2
* flash_attn: 2.6.1
* vllm: 0.6.2
* python: 3.10

`Tips: install the other dependencies first, and install the vllm package last.`
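
After installation, a quick sanity check (a minimal sketch, assuming the versions listed above) confirms that the stack imports cleanly:

```python
# Verify that the key packages import and match the expected versions.
import torch
import vllm

print("torch:", torch.__version__)  # expected: 2.3.0
print("vllm:", vllm.__version__)    # expected: 0.6.2
```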
Environment variables:

```
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_NUMA_BIND=1
export VLLM_RANK0_NUMA=0
export VLLM_RANK1_NUMA=1
export VLLM_RANK2_NUMA=2
export VLLM_RANK3_NUMA=3
export VLLM_RANK4_NUMA=4
export VLLM_RANK5_NUMA=5
export VLLM_RANK6_NUMA=6
export VLLM_RANK7_NUMA=7
```
## Dataset



## Inference

### Model Download

| Base Models                                                                         |                                                                                         |                                                                                                     |
| ----------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| [llava-v1.5-7b-hf](http://113.200.138.88:18080/aimodels/llava-hf/llava-1.5-7b-hf.git) | [llava-v1.6-34b-hf](http://113.200.138.88:18080/aimodels/llava-hf/llava-v1.6-34b-hf.git) | [llava-v1.6-vicuna-7b-hf](http://113.200.138.88:18080/aimodels/llava-hf/llava-v1.6-vicuna-7b-hf.git) |

### Model Inference

```bash
python examples/offline_inference_vision_language.py
```
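
For reference, here is a minimal offline-inference sketch using the vLLM Python API (assuming vllm 0.6.x; the model path and image file are placeholders):

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Load the model; point this at your local llava-1.5-7b-hf directory.
llm = LLM(model="llava-hf/llava-1.5-7b-hf")
image = Image.open("images/cherry_blossom.jpg")

# LLaVA-1.5 prompt format: the <image> placeholder marks where image features go.
prompt = "USER: <image>\nWhat is the content of this image?\nASSISTANT:"
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```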

### OpenAI-Compatible Server

Start the server:

`cd examples`

```bash
vllm serve llava/llava-1.5-7b-hf --chat-template template_llava.jinja --port 8000
```

Here, `--model` is the path of the model to load; `--image-input-type pixel_values` sets the image input type to pixel_values; `--image-token-id` specifies the special token ID used for image input; `--image-input-shape` sets the shape of the image input; `--image-feature-size` specifies the image feature size; and `--chat-template` supplies a new template that overrides the default one.
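
Once the server is running, a quick check (a minimal sketch, assuming the openai Python client is installed) lists the model names the server exposes, which must match the `"model"` field in the API requests below:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server accepts any API key by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
for model in client.models.list():
    print(model.id)
```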

### Using the OpenAI Completions API with vLLM

```bash

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "llava/llava-1.5-7b-hf",
    "messages": [
      {
        "role": "user",
        "content": "What is the content of this image? [local file](images/cherry_blossom.jpg)"
      }
    ],
    "max_tokens": 300
  }'

```
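
The same request can be made with the official openai Python client; this sketch passes the image as a standard OpenAI-style image_url content part (the image URL is a placeholder, not from this repository):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="llava/llava-1.5-7b-hf",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the content of this image?"},
            # Placeholder URL; replace with a reachable image.
            {"type": "image_url", "image_url": {"url": "https://example.com/cherry_blossom.jpg"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```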

### Using Gradio with vLLM

1. Install gradio

```
pip install gradio
```

2. Install the required files and set up port mapping

    2.1 Start the gradio service and follow the on-screen prompts

```
python gradio_openai_vlm_webserver.py --model "/mnt/data/llm-models/llava/llava-1.5-7b-hf" --model-url http://localhost:8000/v1 --host "0.0.0.0" --port 8001
```

    2.2 Change file permissions

Open the download directory indicated in the prompt and run the following command to grant execute permission:

```
chmod +x frpc_linux_amd64_v0.*
```

    2.3 Port mapping

```
ssh -L 8000:<compute node IP>:8000 -L 8001:<compute node IP>:8001 <username>@<login node> -p <login node port>
```

3. Start the OpenAI-compatible server

`cd examples`

```
vllm serve /mnt/data/llm-models/llava/llava-1.5-7b-hf --chat-template template_llava.jinja --port 8000 --host "0.0.0.0"
```

4. Start the gradio service
```
python gradio_openai_vlm_webserver.py --model "/mnt/data/llm-models/llava/llava-1.5-7b-hf" --model-url http://localhost:8000/v1 --host "0.0.0.0" --port 8001
```

5. Use the chat service

Open the local URL in a browser to use the chat service provided by Gradio.
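
If the page does not load, a quick sanity check (a sketch assuming the default ports from the steps above) confirms both services are up:

```python
import requests

# 8000: vLLM OpenAI-compatible server; 8001: Gradio web app.
for name, url in [("vllm", "http://localhost:8000/v1/models"),
                  ("gradio", "http://localhost:8001")]:
    r = requests.get(url, timeout=5)
    print(f"{name}: HTTP {r.status_code}")
```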

## Results

### Offline Inference

Accelerator: single K100_AI card. Model: [llava-v1.5-7b](http://113.200.138.88:18080/aimodels/llava-v1.5-7b)

Input:

    images:

<div align="center">
    <img src="./doc/images.png" width="300" height="200"/>
</div>

    text: What is the content of this image?

Output:

    output: The image features a close-up view of a stop sign on a city street

### Gradio Service

Accelerator: single K100_AI card. Model: [llava-v1.5-7b](http://113.200.138.88:18080/aimodels/llava-v1.5-7b)

<div align=center>
    <img src="./doc/llava_gradio.png" width="800" height="500"/>
</div>

### Accuracy



## Application Scenarios

### Algorithm Category

Conversational question answering

### Key Application Industries

Finance, scientific research, education

## Source Code Repository and Issue Reporting

* [https://developer.sourcefind.cn/codes/modelzoo/llava_vllm](https://developer.sourcefind.cn/codes/modelzoo/llava_vllm)

## References

* [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)