# Donut

Donut 🍩, the Document Understanding Transformer, is a new approach to document understanding built on an OCR-free, end-to-end Transformer model. Donut does not require an off-the-shelf OCR engine or API, yet it achieves state-of-the-art performance on a range of visual document understanding tasks, such as visual document classification and information extraction (also known as document parsing).

## Paper
- Paper link: [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664)

## Model Structure

Donut is an end-to-end (i.e., self-contained) visual document understanding (VDU) model for general understanding of document images. Its architecture is simple: a Transformer-based visual encoder followed by a text decoder. Donut does not rely on any OCR-related module; instead, the visual encoder extracts features directly from the given document image. The text decoder then maps these features to a sequence of subword tokens that builds the desired structured output. Because every component is Transformer-based, the model can easily be trained end to end.

<div align="center">
    <img src="misc/overview.png"/>
</div>

## Algorithm

Encoder. The visual encoder converts the input document image $x \in \mathbb{R}^{H \times W \times C}$ into a set of embeddings $\{z_i \mid z_i \in \mathbb{R}^{d},\ 1 \le i \le n\}$, where $n$ is the feature map size (the number of image patches) and $d$ is the dimension of the encoder's latent vectors.

Decoder. Given $\{z\}$, the text decoder generates a token sequence $(y_i)_{i=1}^{m}$, where $y_i \in \mathbb{R}^{v}$ is the one-hot vector of the $i$-th token, $v$ is the size of the token vocabulary, and $m$ is a hyperparameter.

<div align=center>
    <img src="./misc/model.png"/>
</div>
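
The shapes in the encoder and decoder descriptions can be made concrete with a small sketch. It assumes the image is cut into non-overlapping square patches with no further downsampling, which simplifies what a real visual encoder does. The function names and the sizes used are hypothetical and are not taken from this repository.

```python
from typing import List


def num_embeddings(height: int, width: int, patch_size: int) -> int:
    """Number n of encoder embeddings for an H x W image cut into square patches.

    Assumes non-overlapping patches and no later downsampling (a simplification).
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    return (height // patch_size) * (width // patch_size)


def one_hot(token_id: int, vocab_size: int) -> List[int]:
    """One-hot vector y_i of length v for a single decoder token."""
    if not 0 <= token_id < vocab_size:
        raise ValueError("token_id out of range")
    return [1 if k == token_id else 0 for k in range(vocab_size)]


# Hypothetical sizes chosen only for illustration.
n = num_embeddings(height=64, width=48, patch_size=16)  # 4 rows * 3 columns of patches
y = one_hot(token_id=2, vocab_size=5)
print(n, y)
```

With these toy sizes the encoder would emit 12 embeddings, and each decoder token is a length-5 vector with a single 1.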

## Environment Setup
### Docker (Method 1)
The address of the Docker image and the steps for using it are available from [光源](https://www.sourcefind.cn/#/service-details):
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-ubuntu20.04-dtk23.10-py38

docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=64G --privileged=true --network=host --device=/dev/kfd --device=/dev/dri/ --group-add video --name donut <your imageID> bash

cd /path/your_code_data/

pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```

### Dockerfile (Method 2)
```
cd /path/your_code_data/docker

docker build --no-cache -t donut:latest .

docker run --shm-size=64G --name donut -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --network=host --device=/dev/dri/ --group-add video -v /path/your_code_data/:/path/your_code_data/ -it donut bash
```
### Anaconda (Method 3)

The special deep learning libraries that this project needs for DCU cards can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
```
DTK driver: dtk23.10
python: python3.8
torch: 1.13.0
torchvision: 0.14.0
```
`Tips: the versions of the DTK driver, python, torch, and the other DCU-related tools listed above must match one another exactly`

```
conda create -n donut python=3.8

conda activate donut

cd /path/your_code_data/

pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```

## Dataset

This repository uses [naver-clova-ix/cord-v2](https://hf-mirror.com/datasets/naver-clova-ix/cord-v2) as its training and validation dataset; download the complete dataset before use. The training data is laid out as shown below, and any custom fine-tuning dataset should be prepared with the same directory structure:

```
> tree dataset_name
dataset_name
├── test
│   ├── metadata.jsonl
│   ├── {image_path0}
│   ├── {image_path1}
│             .
│             .
├── train
│   ├── metadata.jsonl
│   ├── {image_path0}
│   ├── {image_path1}
│             .
│             .
└── validation
    ├── metadata.jsonl
    ├── {image_path0}
    ├── {image_path1}
              .
              .

> cat dataset_name/test/metadata.jsonl
{"file_name": {image_path0}, "ground_truth": "{\"gt_parse\": {ground_truth_parse}, ... {other_metadata_not_used} ... }"}
{"file_name": {image_path1}, "ground_truth": "{\"gt_parse\": {ground_truth_parse}, ... {other_metadata_not_used} ... }"}
     .
     .
```
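
In the layout above, the value of `ground_truth` is itself a JSON string, so each line is serialized twice. The following is a minimal sketch of writing and reading such a line. The helper names, the file name, and the sample fields are hypothetical; the fields of a real `gt_parse` depend on the dataset.

```python
import json
from typing import Any, Dict


def metadata_line(file_name: str, gt_parse: Dict[str, Any]) -> str:
    """Build one metadata.jsonl line in the layout shown above.

    The inner object is dumped to a string first, then embedded in the outer record.
    """
    ground_truth = json.dumps({"gt_parse": gt_parse}, ensure_ascii=False)
    record = {"file_name": file_name, "ground_truth": ground_truth}
    return json.dumps(record, ensure_ascii=False)


def read_gt_parse(line: str) -> Dict[str, Any]:
    """Recover gt_parse from one metadata.jsonl line (two decoding steps)."""
    record = json.loads(line)
    return json.loads(record["ground_truth"])["gt_parse"]


# Hypothetical sample record, for illustration only.
sample = {"menu": [{"nm": "Latte", "price": "4.50"}]}
line = metadata_line("receipt_0001.png", sample)
print(line)
```

Writing one such line per image into `metadata.jsonl` inside each of `train`, `validation`, and `test` would reproduce the structure shown in the tree.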

## Training
### Single Node, Single Card
```
bash single_dcu_train.sh
```

## Inference

### Single Node, Single Card

### Receipt OCR

```
python donut_cord_infernce.py
```
### Train Ticket OCR

```
python donut_zhtrainticket_inference.py
```

## Results

### Receipt OCR Result

<div align=center>
    <img src="./result/cord1.png"/>
</div>

### Train Ticket OCR Result

<div align=center>
    <img src="./result/train.png"/>
</div>

### Accuracy
Test data: [naver-clova-ix/cord-v2](https://hf-mirror.com/datasets/naver-clova-ix/cord-v2); accelerator cards used: V100S/K100.

| device | train_loss | TED_accuracy_score | F1_accuracy_score |
| :------: | :------: | :------: | :------: |
| V100S | 0.0533 | 0.87157 | 0.796 |
| K100 | 0.038 | 0.87028 | 0.8047 |
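
The table does not say how `F1_accuracy_score` is computed. One common field-level definition, assumed here purely for illustration, flattens the predicted and ground-truth parses into (path, value) pairs and takes the F1 over matching pairs. This sketch is not this repository's evaluator, and `flatten` and `field_f1` are hypothetical names.

```python
from collections import Counter
from typing import Any, Iterator, Tuple


def flatten(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield a (path, value) pair for every leaf of a nested parse."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for child in node:
            yield from flatten(child, prefix)
    else:
        yield prefix, str(node)


def field_f1(pred: Any, truth: Any) -> float:
    """F1 over flattened fields; repeated fields are counted with multiplicity."""
    pred_fields = Counter(flatten(pred))
    true_fields = Counter(flatten(truth))
    tp = sum((pred_fields & true_fields).values())  # fields present in both
    fp = sum(pred_fields.values()) - tp             # predicted but wrong
    fn = sum(true_fields.values()) - tp             # expected but missing
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical parses: one of three fields is predicted incorrectly.
truth = {"menu": [{"nm": "Latte", "price": "4.50"}], "total": "4.50"}
pred = {"menu": [{"nm": "Latte", "price": "4.00"}], "total": "4.50"}
print(round(field_f1(pred, truth), 4))
```

In this toy case two of three fields match, giving precision and recall of 2/3 each and therefore an F1 of 2/3.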

## Application Scenarios

### Algorithm Category
`ocr`

### Popular Application Industries
`finance, education, government, scientific research, manufacturing, energy, transportation`

## Pretrained Weights

- [naver-clova-ix/donut-base](https://huggingface.co/naver-clova-ix/donut-base/tree/official)

- [donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2/tree/official)

- [donut-base-finetuned-zhtrainticket](https://huggingface.co/naver-clova-ix/donut-base-finetuned-zhtrainticket/tree/official)

## Source Repository and Issue Feedback
- https://developer.hpccube.com/codes/modelzoo/donut_pytorch

## References
- [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664)
- [donut github](https://github.com/clovaai/donut?tab=readme-ov-file)