# donut

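Donut 🍩, the Document Understanding Transformer, is a new approach to document understanding built on an OCR-free, end-to-end Transformer model. Donut needs no off-the-shelf OCR engine or API, yet it achieves state-of-the-art performance on a range of visual document understanding tasks, such as visual document classification and information extraction (also known as document parsing).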

## Paper
- Paper: [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664)

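## Model Architecture

Donut is an end-to-end (i.e., self-contained) visual document understanding (VDU) model for general understanding of document images. Its architecture is simple: a Transformer-based visual encoder and a text decoder module. Note that Donut does not rely on any OCR-related module; instead, it uses the visual encoder to extract features from the given document image. The text decoder then maps the derived features to a sequence of subword tokens to construct the desired structured format. Because every component is Transformer-based, the model is easy to train end to end.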

<div align="center">
    <img src="misc/overview.png"/>
</div>

## Algorithm Principles

Encoder. The visual encoder converts the input document image $x \in \mathbb{R}^{H \times W \times C}$ into a set of embeddings $\{z_i \mid z_i \in \mathbb{R}^d,\ 1 \le i \le n\}$, where $n$ is the feature map size or the number of image patches and $d$ is the dimension of the encoder's latent vectors.

Decoder. Given $\{z\}$, the text decoder generates a token sequence $(y_i)_{i=1}^{m}$, where $y_i \in \mathbb{R}^v$ is the one-hot vector for the $i$-th token, $v$ is the size of the token vocabulary, and $m$ is a hyperparameter.

<div align=center>
    <img src="./misc/model.png"/>
</div>
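
To make the last decoding step more concrete, here is a minimal sketch of how a generated token sequence could be turned back into a structured result. It assumes the `<s_key>...</s_key>` field-token convention used by the upstream Donut project; the helper `tokens_to_dict` and the field names (`menu`, `nm`, `price`, `total`) are illustrative only. It is a simplification, not this repository's implementation: it does not handle repeated groups or a field nested inside a field of the same name.

```python
import re

FIELD_START = re.compile(r"<s_(\w+)>")


def tokens_to_dict(sequence):
    """Convert '<s_a>1</s_a><s_b><s_c>2</s_c></s_b>' style text into a nested dict.

    Simplified on purpose: no repeated groups, no error recovery.
    """
    result = {}
    pos = 0
    while True:
        start = FIELD_START.search(sequence, pos)
        if start is None:
            break
        key = start.group(1)
        end_tag = f"</s_{key}>"
        end = sequence.find(end_tag, start.end())
        if end == -1:
            break  # unclosed field: stop parsing
        inner = sequence[start.end():end]
        # Recurse when the field body contains further field tokens.
        result[key] = tokens_to_dict(inner) if "<s_" in inner else inner.strip()
        pos = end + len(end_tag)
    return result


# Hypothetical decoder output for a receipt image.
example = "<s_menu><s_nm>Latte</s_nm><s_price>4.5</s_price></s_menu><s_total>4.5</s_total>"
print(tokens_to_dict(example))
# {'menu': {'nm': 'Latte', 'price': '4.5'}, 'total': '4.5'}
```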

## Environment Setup
### Docker (Method 1)
The Docker image address and usage steps are available from [光源](https://www.sourcefind.cn/#/service-details):
```
# Pull the base image (PyTorch 2.1.0, DTK 24.04.1, Python 3.10)
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10

# Start a container; replace <your imageID> and the mounted paths with your own
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=64G --privileged=true --network=host --device=/dev/kfd --device=/dev/dri/ --group-add video --name donut <your imageID> bash

# Inside the container, install the Python dependencies
cd /path/your_code_data/

pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```

### Dockerfile (Method 2)
```
# Build the image from the Dockerfile in the docker directory
cd /path/your_code_data/docker

docker build --no-cache -t donut:latest .

# Start a container from the image built above
docker run --shm-size=64G --name donut -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --network=host --device=/dev/dri/ --group-add video -v /path/your_code_data/:/path/your_code_data/ -it donut bash
```
### Anaconda (Method 3)

The DCU-specific deep learning libraries this project needs can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
```
DTK driver: dtk24.04.1
python: python3.10
torch: 2.1.0
```
`Tips: the versions of the DCU-related tools above (DTK driver, python, pytorch) must match one another exactly`

```
# Create and activate a Python 3.10 environment
conda create -n donut python=3.10

conda activate donut

cd /path/your_code_data/

# The mirror is served over plain HTTP, so pip needs --trusted-host to use it
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
```
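
Because the versions have to line up exactly, a quick check after installation can save debugging time. The sketch below compares the running Python and the installed `torch` package against the versions listed above. The helper names `installed_versions` and `find_mismatches` are made up for this example, and the DTK driver version is not checked because this sketch does not assume any particular way to query it from Python.

```python
import sys
from importlib import metadata

# Pins copied from the version list in this section.
EXPECTED = {"python": "3.10", "torch": "2.1.0"}


def installed_versions():
    """Return the running Python version and the installed torch version (None if absent)."""
    python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    return {"python": python_version, "torch": torch_version}


def find_mismatches(found, expected=EXPECTED):
    """List components whose found version does not start with the expected pin."""
    problems = []
    for name, want in expected.items():
        have = found.get(name)
        # Prefix match so that build suffixes after the pinned version are accepted.
        if have is None or not have.startswith(want):
            problems.append(f"{name}: expected {want}, found {have}")
    return problems


if __name__ == "__main__":
    for line in find_mismatches(installed_versions()) or ["all pinned versions match"]:
        print(line)
```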

## Dataset

This repository uses the [naver-clova-ix/cord-v2](https://hf-mirror.com/datasets/naver-clova-ix/cord-v2) dataset; follow the link to download the complete dataset. The training data directory structure is shown below. To build your own fine-tuning dataset, prepare it with the same structure (see the dataset section of [donut github](https://github.com/clovaai/donut?tab=readme-ov-file) for details):

```
> tree dataset_name
dataset_name
├── test
│   ├── metadata.jsonl
│   ├── {image_path0}
│   ├── {image_path1}
│             .
│             .
├── train
│   ├── metadata.jsonl
│   ├── {image_path0}
│   ├── {image_path1}
│             .
│             .
└── validation
    ├── metadata.jsonl
    ├── {image_path0}
    ├── {image_path1}
              .
              .

> cat dataset_name/test/metadata.jsonl
{"file_name": {image_path0}, "ground_truth": "{\"gt_parse\": {ground_truth_parse}, ... {other_metadata_not_used} ... }"}
{"file_name": {image_path1}, "ground_truth": "{\"gt_parse\": {ground_truth_parse}, ... {other_metadata_not_used} ... }"}
     .
     .
```
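
When preparing a custom dataset, the detail that is easiest to get wrong is that `ground_truth` is a JSON string nested inside each JSON line, not a nested object. The sketch below writes a `metadata.jsonl` in the layout shown above. The helper name `write_metadata` and the sample file names and fields are illustrative assumptions, not part of this repository.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(split_dir, samples):
    """Write one JSON line per (file_name, gt_parse) pair to split_dir/metadata.jsonl."""
    split_dir = Path(split_dir)
    split_dir.mkdir(parents=True, exist_ok=True)
    out_path = split_dir / "metadata.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for file_name, gt_parse in samples:
            # "ground_truth" is itself a JSON string that wraps the gt_parse object.
            ground_truth = json.dumps({"gt_parse": gt_parse}, ensure_ascii=False)
            record = {"file_name": file_name, "ground_truth": ground_truth}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path


if __name__ == "__main__":
    # Hypothetical samples; the images themselves must sit next to metadata.jsonl.
    root = Path(tempfile.mkdtemp()) / "dataset_name"
    path = write_metadata(root / "train", [
        ("receipt_0.png", {"menu": {"nm": "Latte", "price": "4.5"}, "total": "4.5"}),
    ])
    print(path.read_text(encoding="utf-8"))
```

Run the same function once per split (`train`, `validation`, `test`) to produce the full directory structure.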
## Training
### Single Node, Single Card
```
bash single_dcu_train.sh
```


## Inference

### Single Node, Single Card

### Receipt OCR

```
python donut_cord_infernce.py
```
### Train Ticket OCR

```
python donut_zhtrainticket_inference.py
```

## Results

### Receipt OCR Results

<div align=center>
    <img src="./result/cord1.png"/>
</div>


### Train Ticket OCR Results

<div align=center>
    <img src="./result/train.png"/>
</div>

### Accuracy
Test data: [naver-clova-ix/cord-v2](https://hf-mirror.com/datasets/naver-clova-ix/cord-v2); accelerators used: V100S/K100.

| device | train_loss | TED_accuracy_score | F1_accuracy_score |
| :------: | :------: | :------: | :------: |
| V100S | 0.0533 | 0.87157 | 0.796 |
| K100 | 0.038 | 0.87028 | 0.8047 |
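
For readers unfamiliar with field-level scoring, the sketch below shows one simple way an F1 score over parsed fields can be computed: flatten the predicted and ground-truth structures into (key path, value) pairs and count exact matches. This is an illustrative simplification, and it is an assumption that it resembles how `F1_accuracy_score` in the table was produced; the training scripts may compute it differently. The names `flatten` and `field_f1` are made up for this example.

```python
from collections import Counter


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a list of (key path, value) pairs."""
    if isinstance(obj, dict):
        pairs = []
        for key, value in obj.items():
            pairs.extend(flatten(value, f"{prefix}.{key}" if prefix else key))
        return pairs
    if isinstance(obj, list):
        pairs = []
        for item in obj:
            pairs.extend(flatten(item, prefix))
        return pairs
    return [(prefix, str(obj))]


def field_f1(pred, truth):
    """F1 over exactly matching (key path, value) pairs; 0.0 when nothing matches."""
    pred_fields = Counter(flatten(pred))
    true_fields = Counter(flatten(truth))
    true_positives = sum((pred_fields & true_fields).values())
    if true_positives == 0:
        return 0.0
    precision = true_positives / sum(pred_fields.values())
    recall = true_positives / sum(true_fields.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction with one wrong field out of three.
pred = {"menu": {"nm": "Latte", "price": "4.0"}, "total": "4.5"}
truth = {"menu": {"nm": "Latte", "price": "4.5"}, "total": "4.5"}
print(round(field_f1(pred, truth), 4))  # 0.6667
```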

## Application Scenarios

### Algorithm Category
`ocr`

### Key Application Industries
`finance, education, government, scientific research, manufacturing, energy, transportation`

## Pretrained Weights

Fast download center for pretrained weights: [SCNet AIModels](http://113.200.138.88:18080/aimodels)

The pretrained weights used in this project can be downloaded through the fast download channel:

[donut](http://113.200.138.88:18080/aimodels/donut)

[Donut-Base Finetuned-Docvqa](http://113.200.138.88:18080/aimodels/donut-base-finetuned-docvqa)

[donut-base-finetuned-cord-v2](http://113.200.138.88:18080/aimodels/donut-base-finetuned-cord-v2)

The Hugging Face links are as follows:
- [naver-clova-ix/donut-base](https://huggingface.co/naver-clova-ix/donut-base/tree/official)

- [donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2/tree/official)

- [donut-base-finetuned-zhtrainticket](https://huggingface.co/naver-clova-ix/donut-base-finetuned-zhtrainticket/tree/official)

## Source Repository and Issue Feedback
- https://developer.hpccube.com/codes/modelzoo/donut_pytorch
## References
- [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664)
- [donut github](https://github.com/clovaai/donut?tab=readme-ov-file)