<!--
 * @Author: zhuww
 * @email: zhuww@sugon.com
 * @Date: 2024-05-24 14:15:07
 * @LastEditTime: 2024-09-30 08:30:01
-->

# llava

## Paper

Visual Instruction Tuning

[2304.08485 (arxiv.org)](https://arxiv.org/pdf/2304.08485)

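## Model Architecture

LLaVA (Large Language and Vision Assistant) is an open-source large multimodal model that combines vision and language capabilities. It connects a vision encoder to the Vicuna language model to achieve strong visual and language understanding, performs well on multimodal tasks, and set new standards on several benchmarks such as Science QA. LLaVA is known for cost-efficient training and efficient scaling; recent updates focus on improving multimodal reasoning, particularly the understanding of high-resolution images.

Recent LLaVA developments include support for dynamic high-resolution processing and zero-shot multilingual capability, for example in Chinese, showing that the model performs well on non-English data without language-specific fine-tuning.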

<div align=center>
    <img src="./doc/llava_network.png"/>
</div>

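## Algorithm Principles

The main ideas behind LLaVA (Large Language and Vision Assistant) are:

* **Visual instruction tuning**: The model is tuned on multimodal language-image instruction data generated with GPT-4 to improve its zero-shot ability on new tasks.
* **Large multimodal model**: The CLIP vision encoder is connected to the Vicuna language decoder to form an end-to-end trained multimodal model for general-purpose visual and language understanding.
* **Data generation**: GPT-4 is used to generate multimodal instruction-following data, including detailed descriptions of image content and complex reasoning questions.
* **Evaluation benchmarks**: Two evaluation benchmarks with diverse and challenging application tasks were built to test the model's multimodal conversation ability.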
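
The connection between the vision encoder and the language decoder can be pictured with a small sketch. It assumes, following the LLaVA paper, that a trainable projection maps visual features into the language embedding space and that the projected image tokens are placed alongside the text embeddings. The function names, the toy dimensions, and the choice to put image tokens first are illustrative assumptions, not code from this repository.

```python
def project(visual_features, weight):
    """Map each visual feature vector into the language embedding space.

    weight holds one row per output dimension; each row has one entry per
    input dimension, so this computes W @ z for every feature z.
    """
    return [
        [sum(w * z for w, z in zip(row, feature)) for row in weight]
        for feature in visual_features
    ]


def build_input_sequence(image_tokens, text_embeddings):
    """Place the projected image tokens in front of the text embeddings."""
    return image_tokens + text_embeddings


# Toy sizes for illustration only: 4 image patches, visual dim 3, language dim 2.
visual_features = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5], [0.0, 1.0, 0.0], [2.0, 2.0, 2.0]]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
text_embeddings = [[0.1, 0.2], [0.3, 0.4]]

image_tokens = project(visual_features, weight)
sequence = build_input_sequence(image_tokens, text_embeddings)
print(len(sequence), len(sequence[0]))  # 6 positions, each of dimension 2
```

In the real model the projection is learned during training and the dimensions are far larger; the sketch only shows how the two modalities end up in one token sequence.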

## Environment Setup

### Docker (Method 1)

Pull the inference Docker image from [光源 (SourceFind)](https://www.sourcefind.cn/#/image/dcu/custom):

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10
# Replace <Image ID> with the ID of the Docker image pulled above
# <Host Path>: path on the host
# <Container Path>: mount path inside the container
docker run -it --name llava_vllm --privileged --shm-size=64G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal -v <Host Path>:<Container Path> <Image ID> /bin/bash
```

`Tips: On K100/Z100L, use the custom image instead: docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.5.0-dtk24.04.1-ubuntu20.04-py310-zk-v1. K100/Z100L does not support AWQ quantization.`

### Dockerfile (Method 2)

```
# <Host Path>: path on the host
# <Container Path>: mount path inside the container
docker build -t llava:latest .
docker run -it --name llava_vllm --privileged --shm-size=64G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> llava:latest /bin/bash
```

### Anaconda (Method 3)

```
conda create -n llava_vllm python=3.10
```

The DCU-specific deep learning libraries this project needs can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.

* DTK driver: dtk24.04.2
* PyTorch: 2.1.0
* triton: 2.1.0
* lmslim: 0.1.0
* xformers: 0.0.25
* flash_attn: 2.0.4
* vllm: 0.5.0
* python: python3.10

`Tips: Install the dependencies above first, then install the vllm package last.`
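
After installation, a quick check of the installed versions against the list above can catch mismatches early. The following is a minimal sketch using only the Python standard library. The distribution names in `EXPECTED` (for example `torch` for PyTorch) and the prefix comparison are assumptions; DCU builds may register under different names or carry version suffixes.

```python
from importlib import metadata

# Versions listed in this README. The distribution names are assumptions:
# DCU builds may be registered under different names.
EXPECTED = {
    "torch": "2.1.0",
    "triton": "2.1.0",
    "lmslim": "0.1.0",
    "xformers": "0.0.25",
    "flash_attn": "2.0.4",
    "vllm": "0.5.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected=EXPECTED):
    """Map each package name to (expected, installed, matches)."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        # Compare by prefix so that a build suffix (assumed possible on DCU
        # wheels) does not cause a false mismatch.
        ok = have is not None and have.startswith(want)
        report[name] = (want, have, ok)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_versions().items():
        status = "OK" if ok else "CHECK"
        print(f"{name:12s} expected {want:8s} installed {have} [{status}]")
```

A `CHECK` status only means the package needs a closer look; it may be installed under a different distribution name.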

## Dataset


## Inference

### Model Download

| Base models | | |
| ---------------------------------------------------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
| [llava-v1.5-7b](http://113.200.138.88:18080/aimodels/llava-v1.5-7b) | [llava-v1.6-34b-hf](https://huggingface.co/llava-hf/llava-v1.6-34b-hf) | [llava-v1.6-vicuna-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-7b-hf) |

### Model Inference

```bash
python examples/llava_example.py
```


### OpenAI-Compatible Server

Start the server:

`cd examples`

```bash
python -m vllm.entrypoints.openai.api_server --model llava/llava-1.5-7b-hf --image-input-type pixel_values --image-token-id 32000 --image-input-shape 1,3,336,336 --image-feature-size 576 --chat-template template_llava.jinja
```

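Here `--model` is the path of the model to load, `--image-input-type pixel_values` sets the image input type to pixel_values, `--image-token-id` specifies the special token ID that marks image input, `--image-input-shape` sets the shape of the image input, `--image-feature-size` specifies the size of the image features, and `--chat-template` supplies a template that overrides the default one.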
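
The values `1,3,336,336` and `576` in the command are related. As a rough sketch, assuming the vision encoder splits the image into non-overlapping 14x14 patches (the patch size of the CLIP ViT-L/14 encoder used by llava-1.5, which this README does not state), the feature size is the number of patches. The helper name `image_feature_size` is hypothetical.

```python
def image_feature_size(height, width, patch_size=14):
    """Number of patch tokens a ViT-style encoder yields for one image."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of the patch size")
    return (height // patch_size) * (width // patch_size)


# --image-input-shape 1,3,336,336 is read here as (batch, channels, height, width).
batch, channels, height, width = (int(x) for x in "1,3,336,336".split(","))
print(image_feature_size(height, width))  # 24 * 24 = 576
```

If a different model or input resolution is used, both flags would need to change together.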

### Using the OpenAI Chat Completions API with vllm

```bash

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "llava/llava-1.5-7b-hf",
    "messages": [
      {
        "role": "user",
        "content": "What is the content of this image? [local file](images/cherry_blossom.jpg)"
      }
    ],
    "max_tokens": 300
  }'

```
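
The same request can be sent from Python. The sketch below mirrors the curl call using only the standard library; the helper names `build_request` and `ask` are hypothetical, and it assumes the server replies in the usual OpenAI chat format with the text under `choices[0].message.content`.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # same endpoint as the curl call


def build_request(prompt, model="llava/llava-1.5-7b-hf", max_tokens=300):
    """Build the same POST request that the curl command sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer EMPTY",
        },
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    # Assumes an OpenAI-style chat response layout.
    return body["choices"][0]["message"]["content"]


# Example call (requires the server started above to be running):
# print(ask("What is the content of this image? "
#           "[local file](images/cherry_blossom.jpg)"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.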

### Result

Accelerator: single K100_AI card     Model: [llava-v1.5-7b](http://113.200.138.88:18080/aimodels/llava-v1.5-7b)

Input:

    images:

<div align="center">
    <img src="./doc/images.png" width="300" height="200"/>
</div>

    text:     What is the content of this image?

Output:

    output:   The image features a close-up view of a stop sign on a city street

### **Using gradio with vllm**

1. Install gradio

```
pip install gradio
```

2. Install the required files

    2.1 Start the gradio service and follow the prompts

```
python gradio_openai_vlm_webserver.py --model "/mnt/data/llm-models/llava/llava-1.5-7b-hf" --model-url http://localhost:8000/v1
```

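    2.2 Change the file permissions

Open the directory where the file named in the prompt was downloaded, and run the following command to make it executable: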

```
chmod +x frpc_linux_amd64_v0.*
```

3. Start the OpenAI-compatible server

`cd examples`

```
python -m vllm.entrypoints.openai.api_server     --model /mnt/data/llm-models/llava/llava-1.5-7b-hf     --image-input-type pixel_values     --image-token-id 32000     --image-input-shape 1,3,336,336     --image-feature-size 576     --chat-template template_llava.jinja
```

4. Use the chat service

Open the local URL in a browser to use the chat service provided by Gradio.

### Result

Accelerator: single K100_AI card     Model: [llava-v1.5-7b](http://113.200.138.88:18080/aimodels/llava-v1.5-7b)

<div align=center>
    <img src="./doc/llava_gradio.png" width="800" height="500"/>
</div>

### Accuracy


## Application Scenarios

### Algorithm Category

Conversational question answering

### Key Application Industries

Finance, scientific research, education

## Source Repository and Issue Reporting

* [https://developer.sourcefind.cn/codes/modelzoo/llava_vllm](https://developer.sourcefind.cn/codes/modelzoo/llava_vllm)

## References

* [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)