# Vary

**An open-source multimodal OCR large model**

## Paper

- [Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models](https://arxiv.org/abs/2312.06109)

## Model Structure
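The overall idea behind Vary is simple. It works in two stages, Vary-tiny and Vary-base:

- Vary-tiny: a vocabulary network is paired with a small decoder-only transformer, and the new vision vocabulary is generated through autoregression. The vocabulary network is trained together with OPT-125M.
- Vary-base: the new vision vocabulary is merged with the original one (CLIP), scaling up the vanilla vision vocabulary. It is trained jointly with a 7B LLM.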
                    
<div align="center">
    <img src="./image/model.png"/>
</div>

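## Algorithm Principle
Vary comes in two configurations: Vary-tiny and Vary-base. Vary-tiny is designed to "write" the new vision vocabulary, and Vary-base puts that vocabulary to use. Specifically, Vary-tiny consists mainly of a vocabulary network and a tiny OPT-125M, with a linear layer between the two modules to align channel dimensions. Because Vary-tiny focuses on fine-grained perception, it has no text input branch. The new vision vocabulary network is expected to perform well on artificial images (documents and charts) to make up for CLIP's weaknesses, and at the same time it should not become noise for CLIP when tokenizing natural images. Therefore, during generation, artificial document and chart data serve as positive samples and natural images as negative samples for training Vary-tiny. Once this is done, the vocabulary network is extracted and added to a large model to build Vary-base. The new and old vocabulary networks have independent input embedding layers and are integrated before the LLM. In this stage, the weights of both the new and old vision vocabulary networks are frozen, and the weights of all other modules are unfrozen.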
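
The Vary-base integration step can be sketched in a few lines of plain Python. This is a toy illustration, not the repository's implementation: the README only says the two branches have independent input embedding layers and are integrated before the LLM, so the per-token channel concatenation, the helper names (`linear`, `fuse`), and all sizes below are assumptions chosen for readability.

```python
import random


def linear(tokens, weight):
    """Bias-free linear layer: tokens [n, d_in] times weight [d_in, d_out]."""
    return [
        [sum(x * w for x, w in zip(tok, col)) for col in zip(*weight)]
        for tok in tokens
    ]


def random_matrix(rows, cols, rng):
    """Random stand-in for features or weights."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def fuse(clip_tokens, vary_tokens, w_clip, w_vary):
    """Project each branch with its own layer, then concatenate per token.

    Channel-wise concatenation is an assumption made for this sketch.
    """
    clip_emb = linear(clip_tokens, w_clip)
    vary_emb = linear(vary_tokens, w_vary)
    return [c + v for c, v in zip(clip_emb, vary_emb)]


rng = random.Random(0)
N_TOKENS, D_CLIP, D_VARY, D_LLM = 4, 6, 6, 8  # toy sizes, not the real ones

clip_tokens = random_matrix(N_TOKENS, D_CLIP, rng)  # stand-in for CLIP features
vary_tokens = random_matrix(N_TOKENS, D_VARY, rng)  # stand-in for new-vocabulary features
w_clip = random_matrix(D_CLIP, D_LLM // 2, rng)     # independent input embedding (old)
w_vary = random_matrix(D_VARY, D_LLM // 2, rng)     # independent input embedding (new)

image_emb = fuse(clip_tokens, vary_tokens, w_clip, w_vary)
print(len(image_emb), len(image_emb[0]))  # 4 8

# Training policy described in the text for this stage:
# both vision vocabulary networks frozen, every other module trainable.
trainable = {
    "clip_vocab": False,
    "new_vocab": False,
    "clip_input_embedding": True,
    "new_input_embedding": True,
    "llm": True,
}
```

Each branch gets half of the LLM width here only so that the concatenated token matches `D_LLM`; other ways of combining the two streams are possible.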

<div align="center">
    <img src="./assets/CLIP.png"/>
</div>

## Environment Setup

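`Note: before setting up the environment, change the model paths in vary/demo/run_qwen_vary.py and vary/model/vary_qwen_vary.py in this repository to your local model paths, and change the model path in the model's config.json to a local path as well. Only after that, run pip install -e .`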
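
If the paths in config.json share a common prefix, the edit can be scripted. The sketch below is a hypothetical helper, not part of this repository: it does not assume any particular key name and simply rewrites every string value that starts with a given prefix, so check the result before using it.

```python
import json
from pathlib import Path


def replace_paths(node, old_prefix, new_prefix):
    """Recursively swap old_prefix for new_prefix in every string value."""
    if isinstance(node, dict):
        return {k: replace_paths(v, old_prefix, new_prefix) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_paths(v, old_prefix, new_prefix) for v in node]
    if isinstance(node, str) and node.startswith(old_prefix):
        return new_prefix + node[len(old_prefix):]
    return node


def localize_config(config_path, old_prefix, new_prefix):
    """Rewrite a config.json in place and return the updated content."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    updated = replace_paths(config, old_prefix, new_prefix)
    path.write_text(json.dumps(updated, indent=2, ensure_ascii=False), encoding="utf-8")
    return updated


# Example call (placeholder paths, adjust to your setup):
# localize_config("/path/to/model/config.json", "/old/prefix", "/path/your_code_data")
```

The two Python files mentioned in the note still need to be edited by hand, since the helper only touches JSON.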

### Docker (Method 1)
Pull the Docker image from [光源](https://www.sourcefind.cn/#/service-details) and use it as follows:

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10

docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=64G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name vary <your imageID> bash

cd /path/your_code_data/

pip install -e .

pip install numpy==1.24.3
```

### Dockerfile (Method 2)
```
cd /path/your_code_data/docker

docker build --no-cache -t vary:latest .

docker run --shm-size=64G --name vary -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v /path/your_code_data/:/path/your_code_data/ -it vary:latest bash

cd /path/your_code_data/

pip install -e .

pip install numpy==1.24.3
```

### Anaconda (Method 3)
The special deep learning libraries this project needs on DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.
```
DTK driver: dtk24.04.1
python: python3.10
torch: 2.1
torchvision: 0.16.0
deepspeed: 0.12.3
```
`Tips: the versions of the DCU-related tools above (DTK driver, python, torch, etc.) must match each other exactly`

```
conda create -n vary python=3.10

conda activate vary

cd /path/your_code_data/

pip install -e .

pip install numpy==1.24.3
```
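
Since the versions need to line up, a quick check after installation can catch mismatches early. The following is a minimal sketch using only the standard library; the expected versions are the ones listed in this README, and the component-wise comparison (which ignores local build suffixes such as `+git...`) is an assumption about how DCU builds are labeled.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README.
EXPECTED = {
    "torch": "2.1",
    "torchvision": "0.16.0",
    "deepspeed": "0.12.3",
    "numpy": "1.24.3",
}


def check_versions(expected, get_version=version):
    """Return {package: (installed_or_None, ok)} for each expected package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        ok = False
        if installed is not None:
            # Compare only as many components as the README specifies,
            # after dropping any local build suffix.
            have = installed.split("+")[0].split(".")
            want = wanted.split(".")
            ok = have[: len(want)] == want
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:12s} expected {EXPECTED[name]:8s} found {installed} {status}")
```

The DTK driver itself is not a Python package, so it is not covered by this check.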

## Dataset
The dataset for this project has not been released, so you need to build your own.
See the project's GitHub repository for reference:
- [Ucas-HaoranWei/Vary](https://github.com/Ucas-HaoranWei/Vary)

## Training


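## Inference
**Files must be arranged strictly according to this repository's directory layout.**

Note: edit `--image-file` in run.sh to point to the image you want to OCR.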
```
python /home/wanglch/projects/Vary_pytorch/vary/demo/run_qwen_vary.py --model-name /home/wanglch/projects/Vary_pytorch/cache/models--HaoranWei--vary-llava80k --image-file /home/wanglch/projects/Vary_pytorch/image/pic.jpg
```
Note: replace line 57 of vary/demo/run_qwen_vary.py with one of the following to run a different task:
```
qs = 'Provide the ocr results of this image.' # OCR task
qs = 'Detect the ** in this image.' # detection task
qs = 'Convert the document to markdown format.' # convert formulas to markdown
qs = 'Describe this image in within 100 words.' # multimodal description
```
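
Rather than editing the line by hand for each task, the prompt could be chosen by name. This is a hypothetical helper, not code from the repository: the task keys are invented for the sketch, and `{target}` stands in for the `**` placeholder on the assumption that it marks the object to detect.

```python
# Prompt strings follow the list in this README.
PROMPTS = {
    "ocr": "Provide the ocr results of this image.",
    "detect": "Detect the {target} in this image.",
    "markdown": "Convert the document to markdown format.",
    "describe": "Describe this image in within 100 words.",
}


def build_prompt(task, target=None):
    """Return the prompt for a task; 'detect' needs the object to look for."""
    if task not in PROMPTS:
        raise ValueError(f"unknown task {task!r}; choose from {sorted(PROMPTS)}")
    if task == "detect":
        if not target:
            raise ValueError("the 'detect' task needs a target, e.g. target='title'")
        return PROMPTS[task].format(target=target)
    return PROMPTS[task]


qs = build_prompt("ocr")
print(qs)  # Provide the ocr results of this image.
```

For detection, `build_prompt("detect", target="title")` fills in the object name.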

### Inference Code

```
bash run.sh
```
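
Assuming run.sh simply wraps the `run_qwen_vary.py` command shown earlier, the same call can be assembled from Python, which makes it easier to loop over several images. The sketch below only uses the `--model-name` and `--image-file` flags that appear in this README; `build_command` is a hypothetical helper and the paths are placeholders.

```python
import shlex
import sys


def build_command(repo_root, model_name, image_file, python=sys.executable):
    """Assemble the argv list for vary/demo/run_qwen_vary.py."""
    script = repo_root.rstrip("/") + "/vary/demo/run_qwen_vary.py"
    return [python, script, "--model-name", model_name, "--image-file", image_file]


cmd = build_command(
    repo_root="/path/your_code_data",
    model_name="/path/your_code_data/cache/models--HaoranWei--vary-llava80k",
    image_file="/path/your_code_data/image/pic.jpg",
)
print(shlex.join(cmd))
# To actually launch inference: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.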

## Result

**OCR result on an English document**
<div align=center>
    <img src="./image/pic3.jpg"/>
</div>

<div align=center>
    <img src="./assets/ocr_en.png"/>
</div>

**OCR result on a Chinese document**
<div align=center>
    <img src="./image/pic2.jpg"/>
</div>

<div align=center>
    <img src="./assets/ocr_cn.png"/>
</div>

**Image description result**
<div align=center>
    <img src="./image/pic.jpg"/>
</div>

<div align=center>
    <img src="./assets/pic_result.png"/>
</div>

### Accuracy


## Application Scenarios
### Algorithm Category
`OCR`

### Key Application Industries
`Finance, education, government, scientific research`

## Pretrained Weights

- [Vary weights on Hugging Face (pretrained model download)](https://huggingface.co/Haoran-megvii/Vary). Contact the author to obtain the model weights:
`weihaoran18@mails.ucas.ac.cn`

- This project also provides the weights [here](https://pan.baidu.com/s/1CjlRmq0_q-NSJez2BKrghg);
  leave a message in this repository to request the access code.

- [Download the CLIP-VIT-L](https://huggingface.co/openai/clip-vit-large-patch14/)


## Source Repository and Issue Feedback
- http://developer.sourcefind.cn/codes/modelzoo/vary_pytorch.git

## References
- [Ucas-HaoranWei/Vary](https://github.com/Ucas-HaoranWei/Vary)