English | [简体中文](README_ch.md)

# PP-Structure

PP-Structure is an OCR toolkit for complex document analysis. Its main features are as follows:
- Supports layout analysis, dividing documents into five types of regions: **text, title, table, image and list** (in conjunction with Layout-Parser)
- Supports extracting text from the text, title, image and list regions (in conjunction with PP-OCR)
- Supports converting table regions into Excel files
- Supports Python whl package and command-line usage, easy to use
- Supports custom training for the layout analysis and table structure tasks

## 1. Visualization

<img src="../doc/table/ppstructure.GIF" width="100%"/>

## 2. Installation

### 2.1 Install requirements

- **(1) Install PaddlePaddle**

```bash
pip3 install --upgrade pip

# GPU
python3 -m pip install paddlepaddle-gpu==2.1.1 -i https://mirror.baidu.com/pypi/simple

# CPU
python3 -m pip install paddlepaddle==2.1.1 -i https://mirror.baidu.com/pypi/simple

# For more information, refer to [Installation](https://www.paddlepaddle.org.cn/install/quick).
```

- **(2) Install Layout-Parser**

```bash
pip3 install -U premailer paddleocr https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```

### 2.2 Install PaddleOCR (including PP-OCR and PP-Structure)

- **(1) Install the PaddleOCR whl package with pip (inference only)**

```bash
pip install "paddleocr>=2.2"
```

- **(2) Clone PaddleOCR (inference + training)**

```bash
git clone https://github.com/PaddlePaddle/PaddleOCR
```


## 3. Quick Start

### 3.1 Use from the command line

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

### 3.2 Use the Python API

```python
import os
import cv2
from paddleocr import PPStructure, draw_structure_result, save_structure_res
from PIL import Image

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = '../doc/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
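    # drop the image data so the printed result stays readable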
    line.pop('img')
    print(line)

font_path = '../doc/fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
### 3.3 Returned results format
The result returned by PP-Structure is a list of dicts. An example is shown below:

```shell
[
  {   'type': 'Text',
      'bbox': [34, 432, 345, 462],
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
Each field in the dict is described below:

| Field | Description |
| --------------- | ------------- |
|type|Type of the image region|
|bbox|Coordinates of the image region in the original image: [upper-left x, upper-left y, lower-right x, lower-right y]|
|res|OCR or table recognition result of the image region.<br> Table: the HTML string of the table; <br> OCR: a tuple containing the detection coordinates and recognition result of each line of text|
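
As an illustration, the snippet below iterates over a result list using only the fields documented above. It is a minimal sketch rather than part of the official API, and it assumes that table regions carry `type == 'Table'` with an HTML string in `res`, while the other regions carry a `(boxes, texts)` tuple as in the example output:

```python
# Minimal sketch: consume a PP-Structure result list using the fields documented above.
# `result` is assumed to come from the Python API example in section 3.2.
for region in result:
    x1, y1, x2, y2 = region['bbox']
    if region['type'] == 'Table':
        html_table = region['res']  # HTML string describing the table
        print(f"Table at ({x1}, {y1}, {x2}, {y2}): {len(html_table)} characters of HTML")
    else:
        boxes, texts = region['res']  # detection boxes and (text, confidence) pairs
        for text, confidence in texts:
            print(f"{region['type']}: {text} ({confidence:.2f})")
```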


### 3.4 Parameter description

| Parameter       | Description                                     | Default value                                |
| --------------- | ---------------------------------------- | ------------------------------------------- |
| output          | Path where the Excel files and recognition results are saved | ./output/table                |
| table_max_len   | Length to which the long side of the image is resized for the table structure model | 488     |
| table_model_dir | Inference model path of the table structure model | None                                       |
| table_char_type | Dictionary path of the table structure model | ../ppocr/utils/dict/table_structure_dict.txt   |

Most of the parameters are consistent with the paddleocr whl package; see the [whl package documentation](../doc/doc_en/whl_en.md).
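
For example, the table-related parameters above can be set when constructing the engine. The following is a hedged sketch: it assumes these names are accepted as keyword arguments by `PPStructure`, mirroring the command-line flags, and the model path is a placeholder:

```python
from paddleocr import PPStructure

# Sketch only: the model directory below is a placeholder, not a shipped default.
table_engine = PPStructure(
    table_model_dir='./inference/en_ppocr_mobile_v2.0_table_structure_infer',
    table_max_len=488,
    show_log=True,
)
# When using the Python API, the save folder is passed to save_structure_res()
# directly (see section 3.2); the `output` parameter applies to the command-line
# and predict_system.py workflows.
```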

After running, each image has a directory with the same name under the directory specified by the `output` parameter. Each table in the image is stored as an Excel file, and each figure region is cropped and saved; the Excel and image file names are the coordinates of the corresponding region in the original image.

## 4. PP-Structure Pipeline

The pipeline is as follows:
![pipeline](../doc/table/pipeline_en.jpg)

In PP-Structure, the image is first analyzed by Layout-Parser. The layout analysis classifies each region of the image into one of five categories: **text, title, image, list and table**. The first four types of regions are handled directly by PP-OCR, which performs text detection and recognition. Table regions are converted into Excel files with the same table structure via table recognition.

### 4.1 LayoutParser

Layout analysis divides a document image into regions. The linked document describes how to run the layout analysis tool from Python scripts, extract detection boxes of specific categories, evaluate performance, and train custom layout analysis models. For details, please refer to the [layout analysis documentation](layout/README_en.md).
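
As a quick illustration, the sketch below runs the layout analysis step on its own with the Layout-Parser package installed in section 2.1. The config path and label map follow the PubLayNet model from the model list and are assumptions here; refer to the linked document for the authoritative usage.

```python
import cv2
import layoutparser as lp

# Sketch: standalone layout analysis with the PubLayNet layout model.
image = cv2.imread('../doc/table/1.png')
image = image[..., ::-1]  # BGR -> RGB

model = lp.PaddleDetectionLayoutModel(
    config_path='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config',
    threshold=0.5,
    label_map={0: 'Text', 1: 'Title', 2: 'List', 3: 'Table', 4: 'Figure'},
    enforce_cpu=False,
)
layout = model.detect(image)

for block in layout:
    print(block.type, block.coordinates)  # region category and bounding box
```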

### 4.2 Table Recognition

Table recognition converts a table image into an Excel document. It involves detecting and recognizing the text in the table and predicting the table structure and cell coordinates. For details, please refer to the [table recognition documentation](table/README.md).
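
If you only need the HTML string returned for a table region (see section 3.3) and want to turn it into an Excel file yourself, one possible sketch uses pandas. This is not part of PP-Structure and assumes pandas (with lxml) and openpyxl are installed:

```python
from io import StringIO

import pandas as pd

# Sketch: convert the HTML string of a 'Table' region into an Excel file.
# `html_table` stands in for the `res` field of a table region (section 3.3).
html_table = '<table><tr><td>cell 1</td><td>cell 2</td></tr></table>'  # placeholder
df = pd.read_html(StringIO(html_table))[0]  # read_html returns a list of DataFrames
df.to_excel('table.xlsx', index=False)      # requires openpyxl
```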

## 5. Prediction with the inference engine

Use the following commands to run inference.

```bash
cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
# Download the table structure model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..

python3 predict_system.py --det_model_dir=inference/ch_ppocr_mobile_v2.0_det_infer --rec_model_dir=inference/ch_ppocr_mobile_v2.0_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --rec_char_type=ch --output=../output/table --vis_font_path=../doc/fonts/simfang.ttf
```
After running, each image has a directory with the same name under the directory specified by the `output` parameter. Each table in the image is stored as an Excel file, and each figure region is cropped and saved; the Excel and image file names are the coordinates of the corresponding region in the original image.

**Model List**


|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |

**Model List**

LayoutParser models

|model name|description|download|
| --- | --- | --- |
| ppyolov2_r50vd_dcn_365e_publaynet | Layout analysis model trained on the PubLayNet dataset; divides a document into five types of regions: **text, title, table, image and list** | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) |
| ppyolov2_r50vd_dcn_365e_tableBank_word | Layout analysis model trained on the TableBank Word dataset; detects tables only | [TableBank Word](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) |
| ppyolov2_r50vd_dcn_365e_tableBank_latex | Layout analysis model trained on the TableBank Latex dataset; detects tables only | [TableBank Latex](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_latex.tar) |

OCR and table recognition models

|model name|description|model size|download|
| --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|2.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar) |
|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) |
|en_ppocr_mobile_v2.0_table_det|Text detection of English table scenes trained on PubLayNet dataset|4.7M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar) |
|en_ppocr_mobile_v2.0_table_rec|Text recognition of English table scene trained on PubLayNet dataset|6.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction of English table scene trained on PubLayNet dataset|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |

If you need other models, you can download them from the [model list](../doc/doc_en/models_list_en.md) or use your own trained models, and configure their paths via the `det_model_dir`, `rec_model_dir` and `table_model_dir` parameters.
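
For example, with the whl package the three model directories can be passed when constructing the engine. This is a hedged sketch; the paths are placeholders for models you have downloaded or trained yourself:

```python
from paddleocr import PPStructure

# Sketch: point PP-Structure at locally downloaded or custom-trained models.
table_engine = PPStructure(
    det_model_dir='./inference/ch_ppocr_mobile_v2.0_det_infer',
    rec_model_dir='./inference/ch_ppocr_mobile_v2.0_rec_infer',
    table_model_dir='./inference/en_ppocr_mobile_v2.0_table_structure_infer',
)
```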