# Donut

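Donut 🍩, the Document Understanding Transformer, is a new approach to document understanding built on an OCR-free, end-to-end Transformer model. Donut does not need an off-the-shelf OCR engine or API, yet it achieves state-of-the-art performance on a range of visual document understanding tasks, such as visual document classification and information extraction (also known as document parsing).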

## Paper
- Paper: [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664)

## Model Architecture

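Donut is an end-to-end (i.e., self-contained) visual document understanding (VDU) model for general-purpose understanding of document images. Its architecture is simple: a Transformer-based visual encoder followed by a Transformer-based textual decoder. Donut does not rely on any OCR-related module; instead, the visual encoder extracts features directly from the input document image. The textual decoder then maps these features to a sequence of subword tokens that forms the desired structured output. Because every component is Transformer-based, the whole model can easily be trained end to end.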

<div align="center">
    <img src="misc/overview.png"/>
</div>
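
The decoder's token sequence encodes the structured result. In the official Donut code, as we understand it, each field is wrapped in a paired tag such as `<s_menu>` ... `</s_menu>`. The sketch below is a simplified, hypothetical parser (`tokens_to_json` is our own name, not the official `token2json`) that turns such a sequence into a nested dict. It assumes well-formed tags and ignores list separators and other special tokens.

```python
import re
from typing import Any, Dict

_OPEN_TAG = re.compile(r"<s_([^>]+)>")  # matches an opening tag such as <s_menu>


def tokens_to_json(sequence: str) -> Dict[str, Any]:
    """Parse '<s_key>...</s_key>' spans into a nested dict (simplified sketch)."""
    result: Dict[str, Any] = {}
    pos = 0
    while True:
        match = _OPEN_TAG.search(sequence, pos)
        if match is None:
            break
        key = match.group(1)
        close_tag = f"</s_{key}>"
        end = sequence.find(close_tag, match.end())
        if end == -1:
            break  # unclosed tag: stop rather than guess
        inner = sequence[match.end():end]
        # Recurse if the span contains further tags; otherwise it is a leaf value.
        result[key] = tokens_to_json(inner) if _OPEN_TAG.search(inner) else inner.strip()
        pos = end + len(close_tag)
    return result


print(tokens_to_json("<s_menu><s_nm>Latte</s_nm></s_menu>"))  # {'menu': {'nm': 'Latte'}}
```

Repeated keys at the same level overwrite each other here; a real parser would need to collect them into lists.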

## Algorithm

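Encoder. The visual encoder converts an input document image $\mathbf{x} \in \mathbb{R}^{H \times W \times C}$ into a set of embeddings $\{\mathbf{z}_i \mid \mathbf{z}_i \in \mathbb{R}^{d}, 1 \le i \le n\}$, where $n$ is the feature-map size (the number of image patches) and $d$ is the dimension of the encoder's latent vectors.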
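
To make the shapes concrete, the sketch below maps an image $\mathbf{x}$ to $n$ vectors of size $d$ with a plain ViT-style patch embedding. This is an illustrative assumption, not Donut's encoder: the paper uses a Swin Transformer, whose hierarchical stages yield a different $n$, and the random projection here only stands in for learned weights. `patch_embed` is a hypothetical helper.

```python
import numpy as np


def patch_embed(image: np.ndarray, patch: int, d: int, seed: int = 0) -> np.ndarray:
    """Map an H x W x C image to n = (H/patch) * (W/patch) vectors of size d."""
    H, W, C = image.shape
    if H % patch or W % patch:
        raise ValueError("H and W must be divisible by the patch size")
    # (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (n, p*p*C)
    tiles = image.reshape(H // patch, patch, W // patch, patch, C).transpose(0, 2, 1, 3, 4)
    tiles = tiles.reshape(-1, patch * patch * C)
    # A random linear projection stands in for the learned embedding weights.
    rng = np.random.default_rng(seed)
    weight = rng.standard_normal((patch * patch * C, d)) / np.sqrt(patch * patch * C)
    return tiles @ weight  # shape (n, d); row i plays the role of z_i


x = np.random.default_rng(1).random((64, 48, 3))  # H=64, W=48, C=3
z = patch_embed(x, patch=16, d=128)
print(z.shape)  # (12, 128)
```

For a 64 × 48 image and 16 × 16 patches, $n = (64/16) \cdot (48/16) = 12$.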

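Decoder. Given $\{\mathbf{z}\}$, the textual decoder generates a token sequence $(\mathbf{y}_i)_{i=1}^{m}$, where $\mathbf{y}_i \in \mathbb{R}^{v}$ is the one-hot vector of the $i$-th token, $v$ is the size of the token vocabulary, and $m$ is a hyperparameter.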

<div align=center>
    <img src="./misc/model.png"/>
</div>
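
The decoder described above produces one token at a time. A minimal sketch of greedy autoregressive decoding, assuming that $m$ acts as the maximum sequence length and that `next_token_logits` (hypothetical) returns one score per vocabulary entry given the tokens generated so far. In Donut the decoder would also attend to $\{\mathbf{z}\}$; that conditioning is folded into the scoring function here. Taking the argmax over the $v$ scores picks the index of the 1 in the one-hot vector $\mathbf{y}_i$.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    start_id: int,
    eos_id: int,
    max_len: int,
) -> List[int]:
    """Greedy decoding: stop at eos_id or once max_len tokens (start included) exist."""
    tokens = [start_id]
    while len(tokens) < max_len:
        scores = next_token_logits(tokens)
        # argmax over the v vocabulary scores = position of the 1 in one-hot y_i
        best = max(range(len(scores)), key=lambda j: scores[j])
        tokens.append(best)
        if best == eos_id:
            break
    return tokens


# Toy scorer over a vocabulary of v = 5 that always prefers (last token + 1) mod 5.
def toy_scores(tokens: Sequence[int]) -> List[float]:
    return [1.0 if j == (tokens[-1] + 1) % 5 else 0.0 for j in range(5)]


print(greedy_decode(toy_scores, start_id=0, eos_id=4, max_len=10))  # [0, 1, 2, 3, 4]
```

Greedy search is the simplest choice; beam search is a common alternative that keeps several candidate prefixes instead of one.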

## Environment Setup
### Docker (Option 1)
The address and usage steps for pulling the Docker image are available from [光源 (SourceFind)](https://www.sourcefind.cn/#/service-details):
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-ubuntu20.04-dtk23.10-py38

docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=64G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name donut <your imageID> bash

cd /path/your_code_data/

pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```

### Dockerfile (Option 2)
```
cd /path/your_code_data/docker

docker build --no-cache -t donut:latest .

docker run --shm-size=64G --name donut -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v /path/your_code_data/:/path/your_code_data/ -it donut bash
```
### Anaconda (Option 3)

The DCU-specific deep learning libraries this project requires can be downloaded from the [光合](https://developer.hpccube.com/tool/) developer community.
```
DTK driver:dtk23.10
python:python3.8
torch:1.13.0
torchvision:0.14.0
```
`Tip: the versions of the DCU-related tools above (DTK driver, python, torch, torchvision) must match each other exactly.`

```
conda create -n donut python=3.8

conda activate donut

cd /path/your_code_data/

pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```
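
Because the version table above must match exactly, a quick check after installation can save debugging time. A minimal sketch using only the standard library; `version_matches`, `installed_versions` and `mismatches` are hypothetical helpers. It assumes DCU builds of torch may carry a local suffix after `+`, which it ignores, and it does not check the DTK driver. `EXPECTED` copies the Anaconda table, so adjust it if your environment ships a different build (the Docker image tag above, for example, names pytorch 1.13.1).

```python
import sys
from importlib import metadata  # standard library since Python 3.8

# Versions from the table above; the DTK driver is not checked here.
EXPECTED = {"python": "3.8", "torch": "1.13.0", "torchvision": "0.14.0"}


def version_matches(installed: str, expected: str) -> bool:
    """Compare versions, ignoring any local suffix after '+'.

    "3.8.10" matches "3.8" (dotted prefix); "1.13.1" does not match "1.13.0".
    """
    base = installed.split("+", 1)[0]
    return base == expected or base.startswith(expected + ".")


def installed_versions() -> dict:
    found = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for pkg in ("torch", "torchvision"):
        try:
            found[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            found[pkg] = None  # package not installed
    return found


def mismatches(found: dict, expected: dict = EXPECTED) -> dict:
    """Return {name: (installed, expected)} for every missing or mismatched entry."""
    return {
        name: (found.get(name), want)
        for name, want in expected.items()
        if found.get(name) is None or not version_matches(found[name], want)
    }


if __name__ == "__main__":
    problems = mismatches(installed_versions())
    for name, (have, want) in problems.items():
        print(f"{name}: installed {have}, expected {want}")
    if not problems:
        print("All checked versions match.")
```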

## Dataset

 [naver-clova-ix/cord-v2](https://hf-mirror.com/datasets/naver-clova-ix/cord-v2) 

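The project ships a mini dataset for trial training. The training data directory is laid out as shown below; prepare the full dataset for regular training with the same structure: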
```bash
> tree dataset_name
dataset_name
├── test
│   ├── metadata.jsonl
│   ├── {image_path0}
│   ├── {image_path1}
.
.
├── train
│   ├── metadata.jsonl
│   ├── {image_path0}
│   ├── {image_path1}
.
.
└── validation
    ├── metadata.jsonl
    ├── {image_path0}
    ├── {image_path1}
              .
              .

> cat dataset_name/test/metadata.jsonl
{"file_name": {image_path0}, "ground_truth": "{\"gt_parse\": {ground_truth_parse}, ... {other_metadata_not_used} ... }"}
{"file_name": {image_path1}, "ground_truth": "{\"gt_parse\": {ground_truth_parse}, ... {other_metadata_not_used} ... }"}
     .
     .
```
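
Note that `ground_truth` is itself a JSON-encoded string that wraps `gt_parse`, so each line is JSON nested inside JSON. A minimal sketch for writing and reading such a file with the standard library; `write_metadata` and `read_metadata` are hypothetical helper names, and the `gt_parse` contents used below are made-up examples.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List, Tuple

Record = Tuple[str, Dict[str, Any]]  # (file_name, gt_parse)


def write_metadata(split_dir: str, records: Iterable[Record]) -> None:
    """Write <split_dir>/metadata.jsonl, one JSON object per image."""
    out_dir = Path(split_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "metadata.jsonl", "w", encoding="utf-8") as f:
        for file_name, gt_parse in records:
            # "ground_truth" holds a JSON *string*, not a nested object.
            ground_truth = json.dumps({"gt_parse": gt_parse}, ensure_ascii=False)
            line = {"file_name": file_name, "ground_truth": ground_truth}
            f.write(json.dumps(line, ensure_ascii=False) + "\n")


def read_metadata(split_dir: str) -> List[Record]:
    """Read metadata.jsonl back into (file_name, gt_parse) pairs."""
    records = []
    with open(Path(split_dir) / "metadata.jsonl", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            # Decode the inner JSON string; any other metadata keys are ignored.
            gt_parse = json.loads(entry["ground_truth"])["gt_parse"]
            records.append((entry["file_name"], gt_parse))
    return records
```

For example, `write_metadata("dataset_name/train", [("receipt_0.png", {"menu": {"nm": "Latte"}})])` creates `dataset_name/train/metadata.jsonl`; the images themselves still have to be copied into the same directory.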
## Training
### Single Machine, Single Card
```
bash single_dcu_train.sh
```


## Inference

### Single Machine, Single Card

### Receipt OCR

```
python donut_cord_infernce.py
```
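
The contents of `donut_cord_infernce.py` are not reproduced here. As an alternative sketch, the Hugging Face `transformers` library provides Donut classes (`DonutProcessor`, `VisionEncoderDecoderModel`) that can load the `donut-base-finetuned-cord-v2` weights listed under the pretrained weights in this README. The code assumes `transformers`, `torch` and `Pillow` are installed and the weights can be downloaded; `parse_receipt` and `clean_sequence` are hypothetical helper names, and whether this runs unchanged on DCU has not been verified.

```python
import re


def clean_sequence(sequence: str, eos_token: str, pad_token: str) -> str:
    """Drop EOS/PAD tokens and the leading task-prompt tag (e.g. "<s_cord-v2>")."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def parse_receipt(image_path: str,
                  model_id: str = "naver-clova-ix/donut-base-finetuned-cord-v2") -> dict:
    # Heavy imports stay local so clean_sequence() works without them.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).eval()

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
    # The task prompt tells the decoder which output schema to produce.
    decoder_input_ids = processor.tokenizer(
        "<s_cord-v2>", add_special_tokens=False, return_tensors="pt"
    ).input_ids.to(device)
    with torch.no_grad():
        outputs = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
            return_dict_in_generate=True,
        )
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = clean_sequence(sequence, processor.tokenizer.eos_token,
                              processor.tokenizer.pad_token)
    return processor.token2json(sequence)
```

Keeping `clean_sequence` separate makes the post-processing testable without downloading a model.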
### Train Ticket OCR

```
python donut_zhtrainticket_inference.py
```

## Results

### Receipt OCR Results

<div align=center>
    <img src="./result/cord1.png"/>
</div>


### Train Ticket OCR Results

<div align=center>
    <img src="./result/train.png"/>
</div>

### Accuracy
Test data: [naver-clova-ix/cord-v2](https://hf-mirror.com/datasets/naver-clova-ix/cord-v2); accelerators used: V100S and K100.

| Device | Train loss | TED accuracy | F1 score |
| :------: | :------: | :------: | :------: |
| V100S | 0.0533 | 0.87157 | 0.796 |
| K100 | 0.038 | 0.87028 | 0.8047 |
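
The table reports a TED-based accuracy and an F1 score. As a rough illustration of the latter, here is a minimal sketch of a field-level F1 in the spirit of the Donut paper: flatten each JSON parse into (key path, value) pairs and count exact matches. This is a simplification under our own assumptions (`flatten` and `field_f1` are hypothetical helpers), not the evaluation code that produced the numbers above, and it does not compute the TED score.

```python
from collections import Counter
from typing import Any, List, Tuple


def flatten(obj: Any, prefix: str = "") -> List[Tuple[str, str]]:
    """Flatten nested dict/list JSON into (key_path, leaf_value) pairs."""
    if isinstance(obj, dict):
        pairs: List[Tuple[str, str]] = []
        for key, value in obj.items():
            pairs += flatten(value, f"{prefix}.{key}" if prefix else key)
        return pairs
    if isinstance(obj, list):
        pairs = []
        for item in obj:
            pairs += flatten(item, prefix)  # list items share the parent key path
        return pairs
    return [(prefix, str(obj))]


def field_f1(pred: Any, gt: Any) -> float:
    """F1 = 2*TP / (#predicted fields + #ground-truth fields); exact match only."""
    p, g = Counter(flatten(pred)), Counter(flatten(gt))
    tp = sum((p & g).values())  # multiset intersection of matching fields
    total = sum(p.values()) + sum(g.values())
    return 2 * tp / total if total else 1.0
```

A field with even one wrong character counts as a miss, so a single price typo among five fields gives 2·4 / (5 + 5) = 0.8.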

## Application Scenarios

### Algorithm Category
`ocr`

### Key Application Industries
`Finance, education, government, scientific research, manufacturing, energy, transportation`

## Pretrained Weights

-  [naver-clova-ix/donut-base](https://huggingface.co/naver-clova-ix/donut-base/tree/official)

-  [donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2/tree/official)

- [donut-base-finetuned-zhtrainticket](https://huggingface.co/naver-clova-ix/donut-base-finetuned-zhtrainticket/tree/official) 

## Source Repository and Issue Reporting
- https://developer.hpccube.com/codes/modelzoo/donut_pytorch
## References
- [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664)
- [donut github](https://github.com/clovaai/donut?tab=readme-ov-file)