English | [简体中文](README_ch.md)

- [1. Introduction](#1-introduction)
- [2. Update log](#2-update-log)
- [3. Features](#3-features)
- [4. Results](#4-results)
  - [4.1 Layout analysis and table recognition](#41-layout-analysis-and-table-recognition)
  - [4.2 DOC-VQA](#42-doc-vqa)
- [5. Quick start](#5-quick-start)
- [6. PP-Structure System](#6-pp-structure-system)
  - [6.1 Layout analysis and table recognition](#61-layout-analysis-and-table-recognition)
    - [6.1.1 Layout analysis](#611-layout-analysis)
    - [6.1.2 Table recognition](#612-table-recognition)
  - [6.2 DOC-VQA](#62-doc-vqa)
- [7. Model List](#7-model-list)
  - [7.1 Layout analysis model](#71-layout-analysis-model)
  - [7.2 OCR and table recognition model](#72-ocr-and-table-recognition-model)
  - [7.3 DOC-VQA model](#73-doc-vqa-model)

<a name="1"></a>

## 1. Introduction

PP-Structure is an OCR toolkit for analyzing and processing documents with complex structures, designed to help developers complete document understanding tasks.

<a name="2"></a>

## 2. Update log
* 2022.02.12 Added the LayoutLMv2 model to DOC-VQA.
* 2021.12.07 Added [DOC-VQA SER and RE tasks](vqa/README.md).

<a name="3"></a>

## 3. Features

The main features of PP-Structure are as follows:

- Supports layout analysis of documents, dividing them into 5 types of areas: **text, title, table, image and list** (used in conjunction with Layout-Parser)
- Supports extracting text from the text, title, image and list areas (used in conjunction with PP-OCR)
- Supports exporting table areas to Excel files
- Supports a Python whl package and command-line usage for ease of use
- Supports custom training for layout analysis and table structure tasks
- Supports Document Visual Question Answering (DOC-VQA) tasks: Semantic Entity Recognition (SER) and Relation Extraction (RE)


<a name="4"></a>

## 4. Results

<a name="41"></a>

### 4.1 Layout analysis and table recognition

<img src="../doc/table/ppstructure.GIF" width="100%"/>

The figure shows the pipeline of layout analysis and table recognition. Layout analysis first divides the image into four types of areas: image, text, title and table. OCR detection and recognition are then performed on the image, text and title areas, while table recognition is performed on the table areas. The image areas are also stored for later use.

<a name="42"></a>

### 4.2 DOC-VQA

* SER

![](../doc/vqa/result_ser/zh_val_0_ser.jpg) | ![](../doc/vqa/result_ser/zh_val_42_ser.jpg)
---|---

Boxes of different colors in the figure represent different categories. For the xfun dataset, there are three categories: query, answer and header:

* Dark purple: header
* Light purple: query
* Army green: answer

The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box.


* RE

![](../doc/vqa/result_re/zh_val_21_re.jpg) | ![](../doc/vqa/result_re/zh_val_40_re.jpg)
---|---


In the figure, the red box represents the question, the blue box represents the answer, and the question and answer are connected by green lines. The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box.


<a name="5"></a>

## 5. Quick start

Start from [Quick Installation](./docs/quickstart.md)

<a name="6"></a>

## 6. PP-Structure System

<a name="61"></a>

### 6.1 Layout analysis and table recognition

![pipeline](../doc/table/pipeline.jpg)

In PP-Structure, the image is divided into 5 types of areas: **text, title, image, list and table**. For the first 4 types of areas, the PP-OCR system is used directly for text detection and recognition. For the table area, the table structuring process converts the table in the image into an Excel file with the same table style.
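
The routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region record format (`type`, `bbox`), the `route_regions` helper and the two stand-in engines are hypothetical names invented for this example, not part of the PP-Structure API.

```python
# Hypothetical region record produced by layout analysis:
# {"type": "text" | "title" | "image" | "list" | "table", "bbox": [x1, y1, x2, y2]}
OCR_TYPES = {"text", "title", "image", "list"}


def route_regions(regions, run_ocr, run_table):
    """Send table regions to table recognition and the other 4 types to OCR."""
    results = []
    for region in regions:
        if region["type"] == "table":
            output = run_table(region)
        elif region["type"] in OCR_TYPES:
            output = run_ocr(region)
        else:
            raise ValueError(f"unknown region type: {region['type']}")
        results.append({"type": region["type"], "bbox": region["bbox"], "res": output})
    return results


# Stand-in engines (assumptions); a real system would call PP-OCR
# and the table recognition model here.
def fake_ocr(region):
    return "ocr result"


def fake_table(region):
    return "table result"


regions = [
    {"type": "title", "bbox": [40, 20, 560, 60]},
    {"type": "table", "bbox": [40, 230, 560, 480]},
]
routed = route_regions(regions, fake_ocr, fake_table)
```

Passing the engines in as callables keeps the routing logic independent of any particular OCR or table model.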

#### 6.1.1 Layout analysis

Layout analysis classifies the regions of an image. The documentation covers using the layout analysis tools from Python scripts, extracting detection boxes of designated categories, performance indicators, and custom training of layout analysis models. For details, please refer to the [document](layout/README.md).
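
As a rough illustration of extracting detection boxes of designated categories, the sketch below filters a list of detections by category and confidence. The detection format (`type`, `score`, `bbox`), the `select_boxes` helper and the `min_score` threshold are assumptions made for this example; the actual output format is described in the linked document.

```python
def select_boxes(detections, categories, min_score=0.5):
    """Keep detections whose category is wanted and whose score is high enough."""
    wanted = {c.lower() for c in categories}
    return [
        d for d in detections
        if d["type"].lower() in wanted and d["score"] >= min_score
    ]


# Hypothetical layout analysis output for one page.
detections = [
    {"type": "Text", "score": 0.98, "bbox": [40, 60, 560, 210]},
    {"type": "Table", "score": 0.91, "bbox": [40, 230, 560, 480]},
    {"type": "Table", "score": 0.32, "bbox": [300, 500, 560, 640]},
]
tables = select_boxes(detections, ["table"])
```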

#### 6.1.2 Table recognition

Table recognition converts table images into Excel documents. It involves the detection and recognition of table text and the prediction of table structure and cell coordinates. For detailed instructions, please refer to the [document](table/README.md).
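
To make the idea concrete, here is a minimal sketch of the final assembly step: recognized cell texts are placed into a grid and written out as a spreadsheet-like file. It assumes the cells have already been mapped to row and column indices, and it writes CSV with the standard library as a stand-in for Excel. The cell format and both helper functions are hypothetical and do not reflect the actual PP-Structure implementation.

```python
import csv
import io


def cells_to_grid(cells):
    """Place each recognized cell text at its (row, col) position."""
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["row"]][c["col"]] = c["text"]
    return grid


def grid_to_csv(grid):
    """Serialize the grid as CSV text (a simple stand-in for an Excel export)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(grid)
    return buffer.getvalue()


# Hypothetical cells: structure prediction gives the position,
# text recognition gives the content.
cells = [
    {"row": 0, "col": 0, "text": "Model"},
    {"row": 0, "col": 1, "text": "Size"},
    {"row": 1, "col": 0, "text": "det"},
    {"row": 1, "col": 1, "text": "3M"},
]
grid = cells_to_grid(cells)
csv_text = grid_to_csv(grid)
```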

<a name="62"></a>

### 6.2 DOC-VQA

Document Visual Question Answering (DOC-VQA) is a type of Visual Question Answering (VQA) that includes the Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks. The SER task recognizes and classifies the text in an image. The RE task extracts relations between text contents in the image, such as matching a question with its answer. For details, please refer to the [document](vqa/README.md).
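
The relationship between the two tasks can be sketched as follows: SER yields labeled text entities, and RE yields links between them. The entity fields, the label names and the `pair_questions_with_answers` helper are assumptions chosen for illustration only; the real output format is covered in the linked document.

```python
def pair_questions_with_answers(entities, relations):
    """Combine SER entities and RE links into a question -> answers mapping."""
    by_id = {e["id"]: e for e in entities}
    pairs = {}
    for question_id, answer_id in relations:
        question, answer = by_id[question_id], by_id[answer_id]
        # Skip links that do not connect a question to an answer.
        if question["label"] != "question" or answer["label"] != "answer":
            continue
        pairs.setdefault(question["text"], []).append(answer["text"])
    return pairs


# Hypothetical SER output: recognized text plus its predicted category.
entities = [
    {"id": 0, "text": "Name", "label": "question"},
    {"id": 1, "text": "Zhang San", "label": "answer"},
    {"id": 2, "text": "Date", "label": "question"},
    {"id": 3, "text": "2022-02-12", "label": "answer"},
    {"id": 4, "text": "Registration Form", "label": "header"},
]
# Hypothetical RE output: (question id, answer id) links.
relations = [(0, 1), (2, 3)]

pairs = pair_questions_with_answers(entities, relations)
```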


<a name="7"></a>

## 7. Model List

PP-Structure Series Model List (Updating)


<a name="71"></a>

### 7.1 Layout analysis model

|model name|description|download|
| --- | --- | --- |
| ppyolov2_r50vd_dcn_365e_publaynet | Layout analysis model trained on the PubLayNet dataset; divides an image into 5 types of areas: **text, title, table, picture, and list** | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) |

<a name="72"></a>

### 7.2 OCR and table recognition model

|model name|description|model size|download|
| --- | --- | --- | --- |
|ch_PP-OCRv2_det_slim|Lightweight model with slim quantization and distillation, supporting Chinese, English and multilingual text detection| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)|
|ch_PP-OCRv2_rec_slim|Lightweight model with slim quantization and distillation, supporting Chinese, English and multilingual text recognition| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction model for English table scenes, trained on the PubTabNet dataset| 18.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |

<a name="73"></a>

### 7.3 DOC-VQA model

|model name|description|model size|download|
| --- | --- | --- | --- |
|ser_LayoutXLM_xfun_zh|SER model trained on the xfun Chinese dataset, based on LayoutXLM|1.4G|inference model coming soon / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
|re_LayoutXLM_xfun_zh|RE model trained on the xfun Chinese dataset, based on LayoutXLM|1.4G|inference model coming soon / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh) |

If you need other models, you can download them from the [PPOCR model_list](../doc/doc_en/models_list_en.md) and the [PPStructure model_list](./docs/model_list.md).
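
The download links in the tables above point to `.tar` archives. A minimal sketch for fetching and unpacking one of them with the Python standard library is shown below. The `download_and_extract` helper and the `./inference` target directory are assumptions made for this example, not a tool shipped with PP-Structure.

```python
import tarfile
import urllib.request
from pathlib import Path


def download_and_extract(url, dest_dir):
    """Download a .tar archive into dest_dir (if missing) and unpack it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(archive) as tar:
        # Use the safer "data" extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest
```

For example, `download_and_extract("https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar", "./inference")` would place the table structure inference model under `./inference`.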