"vscode:/vscode.git/clone" did not exist on "37f5729c51d71f9c10c3d691dffb17f3af666d1a"
README.md 4.59 KB
Newer Older
wanglch's avatar
wanglch committed
1
2
3
4
5
6
# Vary-toy

**An open-source multimodal OCR large model**

## Paper

- [Small Language Model Meets with Reinforced Vision Vocabulary](https://arxiv.org/abs/2401.12503)

## Model Architecture
The Vary team recently released a smaller version of the Vary model: the 1.8B Vary-toy. Besides being smaller than Vary, Vary-toy also comes with an improved new vision vocabulary, which addresses two issues of the original Vary: the network capacity wasted by using the new vision vocabulary only for PDF OCR, and the inability to benefit from SAM pre-training. Released alongside Vary-toy is a stronger vision vocabulary network that handles not only PDF-level OCR but also general visual object detection. Vary-toy can be trained on consumer-grade GPUs, runs on older cards with 8 GB of VRAM, and still supports both Chinese and English.

<div align="center">
    <img src="./image/model.png"/>
</div>

## Algorithm
Vary-toy uses the Vary-tiny+ pipeline to generate a new vision vocabulary for Vary-toy. Such a vision vocabulary can efficiently encode dense text and the positions of natural objects into tokens. Built on this improved vocabulary, Vary-toy not only retains all of the previous capabilities (document OCR) but also handles object detection tasks well.

<div align="center">
    <img src="./assets/CLIP.png"/>
</div>
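
For intuition, below is a minimal, hypothetical PyTorch sketch of the dual-branch idea described above: features from the new vision vocabulary network and from CLIP are each projected into the language model's embedding space and merged into one image-token stream. The module names, dimensions, and fusion details are illustrative assumptions, not the repository's actual implementation.

```
import torch
import torch.nn as nn


class DualVisionFusion(nn.Module):
    """Illustrative stand-in for fusing the two vision vocabularies (hypothetical)."""

    def __init__(self, clip_dim=1024, vary_dim=1024, llm_dim=2048):
        super().__init__()
        self.clip_proj = nn.Linear(clip_dim, llm_dim)  # CLIP-ViT branch projection
        self.vary_proj = nn.Linear(vary_dim, llm_dim)  # new vision-vocabulary branch projection

    def forward(self, clip_feats, vary_feats):
        # Project each branch's patch features into the LLM embedding space and
        # merge them into a single image-token sequence for the 1.8B language model.
        return torch.cat([self.clip_proj(clip_feats), self.vary_proj(vary_feats)], dim=1)


fusion = DualVisionFusion()
image_tokens = fusion(torch.randn(1, 256, 1024), torch.randn(1, 256, 1024))
print(image_tokens.shape)  # torch.Size([1, 512, 2048])
```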

## Environment Setup
**Note:** 🚨 
| Before setting up the environment, change the model paths in [vary/demo/run_qwen_vary.py](https://developer.sourcefind.cn/codes/modelzoo/vary-toy_pytorch/-/blob/main/vary/demo/run_qwen_vary.py), [vary/model/vary_qwen_vary.py](https://developer.sourcefind.cn/codes/modelzoo/vary-toy_pytorch/-/blob/main/vary/model/vary_qwen_vary.py), and [vary/model/vary_toy_qwen1_8.py](https://developer.sourcefind.cn/codes/modelzoo/vary-toy_pytorch/-/blob/main/vary/model/vary_toy_qwen1_8.py) to your local model paths, and also change the model path in the Vary-toy model's config.json to the local path. Only then run the `pip install -e .` command. |
| -------- |
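
Since all three scripts and config.json need to point to a local copy of the weights, it can help to first download the weights listed in the Pre-trained Weights section below to a local directory. The snippet below is an optional, hypothetical helper (not part of this repository); it assumes `huggingface_hub` is installed, Hugging Face is reachable, and the target directories are placeholders you may change.

```
from huggingface_hub import snapshot_download

# Download the weights referenced in the "Pre-trained Weights" section to local
# directories, then point the scripts above and config.json at these paths.
vary_toy_dir = snapshot_download("Haoran-megvii/Vary-toy", local_dir="./weights/Vary-toy")
clip_dir = snapshot_download("openai/clip-vit-large-patch14", local_dir="./weights/clip-vit-large-patch14")
print(vary_toy_dir, clip_dir)
```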

 

### Docker (Method 1)
Pull the Docker image from [光源](https://www.sourcefind.cn/#/service-details) and use it as follows:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10

docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=64G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name vary-toy <your imageID> bash

cd /path/your_code_data/

pip install -e .
pip install numpy==1.24.3

```

### Dockerfile (Method 2)
```
cd /path/your_code_data/docker

docker build --no-cache -t vary-toy:latest .

docker run --shm-size=64G --name vary-toy -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v /path/your_code_data/:/path/your_code_data/ -it vary-toy:latest bash

cd /path/your_code_data/

pip install -e .
pip install numpy==1.24.3
```

### Anaconda (Method 3)
The special deep learning libraries required by this project for DCU GPUs can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.
```
DTK driver: dtk24.04.1
python: python3.10
torch: 2.1
torchvision: 0.16.0
deepspeed: 0.12.3
```
`Tips: the DTK driver, Python, torch, and other DCU-related tool versions listed above must correspond to each other exactly.`

```
conda create -n vary-toy python=3.10

conda activate vary-toy

cd /path/your_code_data/

pip install -e .
pip install numpy==1.24.3

```
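
After installation with any of the three methods, the versions listed above can be verified quickly. This optional check assumes the DCU build of PyTorch exposes its devices through the standard `torch.cuda` interface, as ROCm builds do.

```
import torch
import torchvision

print("torch:", torch.__version__)              # expected: 2.1.x
print("torchvision:", torchvision.__version__)  # expected: 0.16.0
print("device available:", torch.cuda.is_available())
```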

## Dataset

None; this project has not released a dataset yet.

## Training

## Inference
**The code must be laid out strictly according to the directory structure of this repository.**
Note: modify `--image-file` in run.sh to change the OCR input image.
```
python /home/wanglch/projects/Vary-toy_pytorch/vary/demo/run_qwen_vary.py --model-name /home/wanglch/projects/Vary-toy_pytorch/cache/models--HaoranWei--Vary-toy --image-file /home/wanglch/projects/Vary-toy_pytorch/image/pic.jpg
```
Note: edit line 57 of vary/demo/run_qwen_vary.py to switch between different tasks:
```
qs = 'Provide the ocr results of this image.'   # OCR task
qs = 'Detect the ** in this image.'             # object detection task
qs = 'Convert the document to markdown format.' # convert formulas/document to Markdown
qs = 'Describe this image in within 100 words.' # multimodal image description
```
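
To run the demo over several images without editing run.sh each time, a small wrapper such as the hypothetical sketch below can be used. The script path and flags come from the command shown above; the model directory and image list are placeholders to adjust.

```
import subprocess

DEMO_SCRIPT = "vary/demo/run_qwen_vary.py"
MODEL_DIR = "/path/to/Vary-toy"        # local Vary-toy weights (placeholder)
IMAGE_FILES = ["./image/pic.jpg"]      # images to process (placeholder)

for image_file in IMAGE_FILES:
    # Invoke the demo script once per image, reusing the flags shown above.
    subprocess.run(
        ["python", DEMO_SCRIPT, "--model-name", MODEL_DIR, "--image-file", image_file],
        check=True,
    )
```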

### Inference Code

```
bash run.sh
```

## Results

**OCR result on an English document**
<div align=center>
    <img src="./image/ocr_en.png"/>
</div>

**OCR result on a Chinese document**
<div align=center>
    <img src="./image/ocr_cn.png"/>
</div>

### Accuracy


## Application Scenarios
### Algorithm Category
`OCR`

### Key Application Industries
`Finance, Education, Government, Scientific Research`

## Pre-trained Weights

- [Vary-toy](https://huggingface.co/Haoran-megvii/Vary-toy)
- [clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)

## Source Repository and Issue Feedback
- http://developer.sourcefind.cn/codes/modelzoo/vary-toy_pytorch.git

## References
- Upstream repository of this project: [Ucas-HaoranWei/Vary-toy](https://github.com/Ucas-HaoranWei/Vary-toy/tree/main)