# Donut

Donut 🍩 (Document Understanding Transformer) is an OCR-free, end-to-end Transformer approach to document understanding. It needs no off-the-shelf OCR engine or API, yet it shows state-of-the-art performance on a range of visual document understanding (VDU) tasks, such as visual document classification and information extraction (also known as document parsing).

## Paper
- Paper: [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664)

## Model architecture

Donut is an end-to-end (i.e., self-contained) VDU model for general understanding of document images. Its architecture is simple: a Transformer-based visual encoder and a text decoder. Donut does not rely on any OCR-related module; instead, the visual encoder extracts features from the given document image, and the text decoder maps those features to a sequence of subword tokens that forms the desired structured output. Because every component is Transformer-based, the model can be trained end to end.
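The last step described above, turning the decoded token sequence into structured output, can be sketched as follows. This is a simplified illustration, not the library's own converter: it assumes each field is wrapped in `<s_key>` ... `</s_key>` tags, `tokens_to_json` is a hypothetical helper name, and lists and repeated keys are ignored.

```python
import re

# Assumption: fields are delimited by <s_key> ... </s_key> tags.
START_TAG = re.compile(r"<s_(\w+)>")


def tokens_to_json(seq: str) -> dict:
    """Convert a tagged token string into nested dicts (simplified sketch)."""
    result = {}
    pos = 0
    while True:
        match = START_TAG.search(seq, pos)
        if match is None:
            break
        key = match.group(1)
        end_tag = f"</s_{key}>"
        end = seq.find(end_tag, match.end())
        if end == -1:
            # Unclosed field: stop instead of guessing its extent.
            break
        inner = seq[match.end():end]
        # Recurse when the field contains nested fields, else keep the text.
        result[key] = tokens_to_json(inner) if "<s_" in inner else inner.strip()
        pos = end + len(end_tag)
    return result


example = (
    "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price></s_menu>"
    "<s_total><s_total_price>4.50</s_total_price></s_total>"
)
parsed = tokens_to_json(example)
print(parsed)
```

The field names in `example` are made up for illustration; real outputs depend on the fine-tuning dataset.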

<div align="center">
    <img src="misc/overview.png"/>
</div>

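## Algorithm

Encoder. The visual encoder converts the input document image $x \in \mathbb{R}^{H \times W \times C}$ into a set of embeddings $\{z_i \mid z_i \in \mathbb{R}^{d},\ 1 \le i \le n\}$, where $n$ is the feature map size or the number of image patches, and $d$ is the dimension of the encoder's latent vectors.

Decoder. Given $\{z\}$, the text decoder generates a token sequence $(y_i)_{i=1}^{m}$, where $y_i \in \mathbb{R}^{v}$ is the one-hot vector of the $i$-th token, $v$ is the size of the token vocabulary, and $m$ is a hyperparameter.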
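To make the notation concrete, here is a toy sketch of the quantities involved. The numbers are illustrative only and are not Donut's real configuration; the non-overlapping patch tiling in `num_patches` is an assumption made for the example, and both helper names are hypothetical.

```python
from typing import List


def num_patches(height: int, width: int, patch: int) -> int:
    """n for a grid of non-overlapping patch x patch tiles (assumed tiling)."""
    return (height // patch) * (width // patch)


def one_hot(token_id: int, vocab_size: int) -> List[int]:
    """y_i: a length-v vector with a single 1 at the token's index."""
    if not 0 <= token_id < vocab_size:
        raise ValueError("token_id out of range")
    vec = [0] * vocab_size
    vec[token_id] = 1
    return vec


# Illustrative values, not taken from the model.
H, W, patch = 64, 48, 16
n = num_patches(H, W, patch)      # the encoder would emit z_1 .. z_n

v = 5                             # toy vocabulary size
token_ids = [2, 0, 3]             # m = 3 decoded tokens
y = [one_hot(t, v) for t in token_ids]

print(n, len(y), y[0])
```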

<div align=center>
    <img src="./misc/model.png"/>
</div>

## Environment setup
### Docker (option 1)
The Docker image address and usage steps are available from [SourceFind (光源)](https://www.sourcefind.cn/#/service-details). Pull the image, start a container, and install the dependencies:
```
docker pull image.sourcefind.cn:5000/gpu/admin/base/pytorch:pytorch1.13-py3.8-cuda11.8
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=64G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name donut <your imageID> bash
cd /path/your_code_data/
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```

### Dockerfile (option 2)
```
cd /path/your_code_data/docker
docker build --no-cache -t donut:latest .
docker run --shm-size=64G --name donut -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v /path/your_code_data/:/path/your_code_data/ -it donut bash
```
### Anaconda (option 3)

The special deep learning libraries that this project needs for DCU cards can be downloaded and installed from the [Guanghe (光合)](https://developer.hpccube.com/tool/) developer community. The required versions are:
```
DTK driver: dtk23.10
python: python3.8
torch: 1.13.0
torchvision: 0.14.0
```
`Tips: the versions of the DTK driver, Python, torch, and the other DCU-related tools listed above must match each other exactly.`
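
One way to catch a mismatch early is to compare the installed versions against the list above. The following is a minimal sketch using only the standard library. `find_mismatches` and `installed_versions` are hypothetical helpers, the prefix comparison assumes that vendor builds may append a suffix to the version string, and the DTK driver version is not checked here.

```python
import platform
from importlib import metadata

# Versions taken from the list above (the DTK driver is not covered).
EXPECTED = {"python": "3.8", "torch": "1.13.0", "torchvision": "0.14.0"}


def find_mismatches(installed: dict, expected: dict = EXPECTED) -> dict:
    """Return {name: (expected, found)} for versions that do not match.

    A version matches when it starts with the expected string, so a
    build suffix after the version number is tolerated (assumption).
    """
    mismatches = {}
    for name, want in expected.items():
        found = installed.get(name)
        if found is None or not found.startswith(want):
            mismatches[name] = (want, found)
    return mismatches


def installed_versions() -> dict:
    """Collect the versions present in the current environment."""
    versions = {"python": platform.python_version()}
    for name in ("torch", "torchvision"):
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed
    return versions


if __name__ == "__main__":
    for name, (want, found) in find_mismatches(installed_versions()).items():
        print(f"{name}: expected {want}, found {found}")
```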

Then create the conda environment and install the Python dependencies:
```
conda create -n donut python=3.8
conda activate donut
cd /path/your_code_data/
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```

## Dataset

[naver-clova-ix/cord-v2](http://icrc.hitsz.edu.cn/Article/show/139.html)

The project ships with a mini dataset for trial training. The training data directory is laid out as shown below; prepare the full dataset for regular training with the same structure:
```bash
> tree dataset_name
dataset_name
├── test
│   ├── metadata.jsonl
│   ├── {image_path0}
│   ├── {image_path1}
.
.
├── train
│   ├── metadata.jsonl
│   ├── {image_path0}
│   ├── {image_path1}
.
.
└── validation
    ├── metadata.jsonl
    ├── {image_path0}
    ├── {image_path1}
              .
              .

> cat dataset_name/test/metadata.jsonl
{"file_name": {image_path0}, "ground_truth": "{\"gt_parse\": {ground_truth_parse}, ... {other_metadata_not_used} ... }"}
{"file_name": {image_path1}, "ground_truth": "{\"gt_parse\": {ground_truth_parse}, ... {other_metadata_not_used} ... }"}
     .
     .
```
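
Note that in the `metadata.jsonl` layout above, `ground_truth` is itself a JSON document stored as a string, so each line is encoded twice. A minimal sketch of writing and reading one such line, using only the standard library; the helper names, the file name, and the `gt_parse` fields are hypothetical examples.

```python
import json


def make_metadata_line(file_name: str, gt_parse: dict) -> str:
    """Build one metadata.jsonl line with a string-encoded ground truth."""
    # Inner encoding: the ground truth becomes a JSON string.
    ground_truth = json.dumps({"gt_parse": gt_parse}, ensure_ascii=False)
    # Outer encoding: one JSON object per line.
    return json.dumps(
        {"file_name": file_name, "ground_truth": ground_truth},
        ensure_ascii=False,
    )


def read_gt_parse(line: str) -> dict:
    """Decode one metadata.jsonl line back to its gt_parse dict."""
    record = json.loads(line)
    return json.loads(record["ground_truth"])["gt_parse"]


# Hypothetical sample; real field names depend on the dataset.
sample = {"menu": {"nm": "Latte", "price": "4.50"}}
line = make_metadata_line("receipt_0001.png", sample)
print(line)
```

Writing one such line per image into `train/metadata.jsonl`, `validation/metadata.jsonl`, and `test/metadata.jsonl` would reproduce the layout shown in the tree.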
## Training
### Single node, single DCU
```
bash single_dcu_train.sh
```

### Single node, multiple DCUs
```
bash multi_dcu_train.sh
```

## Inference

### Single node, single DCU

### Receipt OCR
```
python donut_cord_infernce.py
```
### Train ticket OCR
```
python donut_zhtrainticket_inference.py
```

## Results

### Receipt OCR results

<div align=center>
    <img src="./result/cord1.png"/>
</div>


### Train ticket OCR results

<div align=center>
    <img src="./result/train.png"/>
</div>

### Accuracy
Test data: [naver-clova-ix/cord-v2](http://icrc.hitsz.edu.cn/Article/show/139.html). Accelerators used: V100S and K100.

| device | train_time | train_loss | TED_accuracy_score | F1_accuracy_score |
| :------: | :------: | :------: | :------: | :------: |
| V100S | 12.62 min | 0.0533 | 0.87157 | 0.796 |
| K100 | 24.99 min | 0.038 | 0.87028 | 0.8047 |
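
For a quick side-by-side reading of the table, the gaps between the two devices can be computed directly from the reported values. This sketch only restates the numbers above; the dictionary keys are hypothetical names chosen for the example.

```python
# Values copied from the accuracy table.
results = {
    "V100S": {"train_time_min": 12.62, "train_loss": 0.0533, "ted": 0.87157, "f1": 0.796},
    "K100": {"train_time_min": 24.99, "train_loss": 0.038, "ted": 0.87028, "f1": 0.8047},
}

v100s, k100 = results["V100S"], results["K100"]

# How much longer training took on the K100.
time_ratio = k100["train_time_min"] / v100s["train_time_min"]
# Absolute score differences (K100 minus V100S).
ted_gap = k100["ted"] - v100s["ted"]
f1_gap = k100["f1"] - v100s["f1"]

print(f"K100 / V100S training time: {time_ratio:.2f}x")
print(f"TED accuracy gap: {ted_gap:+.5f}")
print(f"F1 accuracy gap: {f1_gap:+.4f}")
```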

## Application scenarios

### Algorithm category
`ocr`

### Key application industries
`Finance, education, government, scientific research, manufacturing, energy, transportation`

## Pretrained weights

- [naver-clova-ix/donut-base](https://huggingface.co/naver-clova-ix/donut-base/tree/official)

- [donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2/tree/official)

- [donut-base-finetuned-zhtrainticket](https://huggingface.co/naver-clova-ix/donut-base-finetuned-zhtrainticket/tree/official)

## Source repository and issue reporting
- http://developer.hpccube.com/codes/modelzoo/umt5.git
## References
- [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664)
- [donut github](https://github.com/clovaai/donut?tab=readme-ov-file)