# LLaVA-NeXT: Improved reasoning, OCR, and world knowledge

## Paper

`LLaVA-NeXT: Improved reasoning, OCR, and world knowledge`

* https://llava-vl.github.io/blog/2024-01-30-llava-next/

## Model Architecture

The model consists of a pretrained vision encoder, a projector, and a large language model.

![alt text](readme_imgs/arch.png)

## Algorithm

The model's ability to perceive complex details in an image improves significantly when it is given a high-resolution input and a representation that preserves those details. This also reduces hallucination, i.e. guessing at imagined visual content when only a low-resolution image is available. LLaVA-NeXT therefore splits the image into smaller patches at the resolution the vision encoder was originally trained on and encodes each patch independently. The resulting per-patch feature maps are then merged into a single large feature map at the target resolution and fed into the large language model (LLM).

![alt text](readme_imgs/alg.png)
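The patch-splitting scheme above can be sketched as follows. This is a minimal NumPy illustration, not the actual LLaVA-NeXT implementation: the tile size matches CLIP-ViT-L/336's training resolution, and `dummy_encoder` is a stand-in for the real vision encoder.

```python
import numpy as np

def split_into_tiles(image: np.ndarray, tile: int) -> list:
    """Split an (H, W, C) image into non-overlapping tile x tile patches.
    H and W are assumed to be multiples of `tile` (resize/pad beforehand)."""
    h, w, _ = image.shape
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h, tile)
            for c in range(0, w, tile)]

def dummy_encoder(patch: np.ndarray, dim: int = 8) -> np.ndarray:
    """Stand-in for the vision encoder: one `dim`-dim feature per patch."""
    return np.full(dim, patch.mean())

# A 672x672 "high-res" image splits into four 336x336 tiles
# (336 is the base resolution the encoder was trained on).
image = np.random.rand(672, 672, 3)
tiles = split_into_tiles(image, tile=336)

# Encode each tile independently, then combine into one feature map
# that is handed to the LLM.
features = np.stack([dummy_encoder(t) for t in tiles])
print(len(tiles), features.shape)  # 4 (4, 8)
```

In the real model the per-tile features keep their 2D layout and are stitched back into one large spatial grid; the flat stack here only shows the independent-encoding step.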

## Environment Setup

Note: the source needs one modification before running, as follows:

`llava/model/llava_arch.py +410`

```python
# Before
image_feature = torch.cat((image_feature, self.model.image_newline[None]), dim=0)

# After: move image_newline onto image_feature's device before concatenating
image_feature = torch.cat((image_feature, self.model.image_newline[None].to(image_feature.device)), dim=0)
```
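Why the fix is needed: under multi-GPU model parallelism the learned `image_newline` embedding can live on a different device than `image_feature`, and `torch.cat` requires all inputs on the same device. A minimal CPU illustration of the fixed pattern (the shapes here are made up):

```python
import torch

# Stand-ins for the real tensors in llava_arch.py.
image_feature = torch.zeros(2, 4)   # patch features
image_newline = torch.ones(4)       # learned newline embedding

# .to(image_feature.device) guarantees a same-device concat,
# which is a no-op when both tensors already share a device.
merged = torch.cat((image_feature, image_newline[None].to(image_feature.device)), dim=0)
print(merged.shape)  # torch.Size([3, 4])
```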

### Docker (Method 1)
    
    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10

    docker run --shm-size 100g --network=host --name=llava_next --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute project path>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -e ".[train]"

    pip install lmms-eval

    pip install python-Levenshtein

    pip install gradio

    pip install rouge


### Dockerfile (Method 2)

    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 100g --network=host --name=llava_next --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute project path>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
    
    pip install -e ".[train]"

    pip install lmms-eval

    pip install python-Levenshtein

    pip install gradio

    pip install rouge


### Anaconda (Method 3)

1. The special deep-learning libraries required for DCU GPUs can be downloaded from the 光合 developer community: https://developer.hpccube.com/tool/

```
DTK driver: dtk2504
python:python3.10
torch:2.4.1
torchvision:0.19.1
triton:3.0.0
vllm:0.6.2
flash-attn:2.6.1
deepspeed:0.14.2
apex:1.4.0
```

2. Install the remaining common libraries as listed in requirements.txt:

```
pip install -e ".[train]"

pip install lmms-eval

pip install python-Levenshtein

pip install gradio
```

## Dataset



## Training



## Inference

Single-image input

```bash
python inference_single_hf.py
```

Multi-image input

```bash
python inference_multi_hf.py
```

Note: edit the image and model paths inside the scripts before running.
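For reference, a minimal single-image inference sketch using the Hugging Face `transformers` LLaVA-NeXT API. The model path, image path, and question are placeholders, and the repository's `inference_single_hf.py` may differ in detail; the prompt template shown is the Mistral-style one used by `llava-v1.6-mistral-7b-hf`, and other checkpoints use different chat templates.

```python
def build_prompt(question: str) -> str:
    # Mistral-style prompt used by llava-v1.6-mistral-7b-hf.
    return f"[INST] <image>\n{question} [/INST]"

def run(model_path: str, image_path: str, question: str) -> str:
    # Heavy imports kept inside the function so the module is cheap to import.
    import torch
    from PIL import Image
    from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

    processor = LlavaNextProcessor.from_pretrained(model_path)
    model = LlavaNextForConditionalGeneration.from_pretrained(
        model_path, torch_dtype=torch.float16, device_map="auto"
    )
    image = Image.open(image_path)
    inputs = processor(images=image, text=build_prompt(question),
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    return processor.decode(out[0], skip_special_tokens=True)
```

Example use (edit the paths first, as noted above): `run("llava-hf/llava-v1.6-mistral-7b-hf", "example.jpg", "What is shown in this image?")`.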

### More Models

[interleave](interleave/README_interleave.md) | [image](image/README_image.md) | [onevision](onevision/README_onevision.md) | [video](video/README_video.md)

## Results

![alt text](readme_imgs/result.png)

### Accuracy



## Application Scenarios

### Algorithm Category

`Conversational Q&A`

### Key Application Industries

`E-commerce, education, transportation, energy`


## Pretrained Weights

|model|url|
|:---:|:---:|
|llava-v1.6-mistral-7b-hf|[hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf) \| [SCNet]() |
|llava-v1.6-vicuna-7b-hf|[hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-7b-hf) \| [SCNet](http://113.200.138.88:18080/aimodels/llava-hf/llava-v1.6-vicuna-7b-hf) |
|llava-v1.6-vicuna-13b-hf|[hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf) \| [SCNet]() |
|llava-v1.6-34b-hf|[hf](https://huggingface.co/llava-hf/llava-v1.6-34b-hf) \| [SCNet](http://113.200.138.88:18080/aimodels/llava-hf/llava-v1.6-34b-hf) |
|llama3-llava-next-8b-hf|[hf](https://huggingface.co/llava-hf/llama3-llava-next-8b-hf) \| [SCNet]() |
|llava-next-72b-hf|[hf](https://huggingface.co/llava-hf/llava-next-72b-hf) \| [SCNet]() |
|llava-next-110b-hf|[hf](https://huggingface.co/llava-hf/llava-next-110b-hf) \| [SCNet]() |


<!-- | Version | LLM | Schedule | Checkpoint |
|----------|----------|-----------|-----------|
| LLaVA-1.6 | Vicuna-7B | full_ft-1e | [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) |
| LLaVA-1.6 | Vicuna-13B | full_ft-1e | [liuhaotian/llava-v1.6-vicuna-13b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-13b) |
| LLaVA-1.6 | Mistral-7B | full_ft-1e | [liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b) |
| LLaVA-1.6 | Hermes-Yi-34B | full_ft-1e | [liuhaotian/llava-v1.6-34b](https://huggingface.co/liuhaotian/llava-v1.6-34b) | -->


## Source Repository and Issue Feedback

* https://developer.sourcefind.cn/codes/modelzoo/llava-next_pytorch

## References

* https://llava-vl.github.io/blog/2024-01-30-llava-next/
* https://hugging-face.cn/docs/transformers/model_doc/llava_next