English | [简体中文](README_ch.md)

# PP-Structure

PP-Structure is an OCR toolkit for analyzing complex documents. Its main features are:
- Supports layout analysis, dividing a document into 5 types of areas: **text, title, table, image and list** (used in conjunction with Layout-Parser)
- Supports extracting text from the text, title, image and list areas (used in conjunction with PP-OCR)
- Supports exporting table areas to Excel files
- Supports a Python whl package and command-line usage
- Supports custom training for the layout analysis and table structure tasks
- The total model size is only about 18.6M (under continuous optimization)

## 1. Visualization

<img src="../doc/table/ppstructure.GIF" width="100%"/>



## 2. Installation

### 2.1 Install requirements

- **(1) Install PaddlePaddle**

```bash
pip3 install --upgrade pip

# GPU
python3 -m pip install paddlepaddle-gpu==2.1.1 -i https://mirror.baidu.com/pypi/simple

# CPU
python3 -m pip install paddlepaddle==2.1.1 -i https://mirror.baidu.com/pypi/simple

# For more options, refer to the installation guide: https://www.paddlepaddle.org.cn/install/quick
```

- **(2) Install Layout-Parser**

```bash
pip3 install -U premailer paddleocr https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```

### 2.2 Install PaddleOCR (including PP-OCR and PP-Structure)

- **(1) Install the PaddleOCR whl package with pip (inference only)**

```bash
pip install "paddleocr>=2.2"
```

- **(2) Clone PaddleOCR (inference + training)**

```bash
git clone https://github.com/PaddlePaddle/PaddleOCR
```


## 3. Quick Start

### 3.1 Use by command line

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

### 3.2 Use by python API

```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = '../doc/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# Save the results under save_folder, using the image name without its extension
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Drop the 'img' entry of each region before printing it
for line in result:
    line.pop('img')
    print(line)

# Draw the results on the image and save the visualization
font_path = '../doc/fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
### 3.3 Returned results format

PP-Structure returns a list of dicts, one per detected area. For example:

```shell
[
  {   'type': 'Text',
      'bbox': [34, 432, 345, 462],
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
The fields of each dict are described below:

| Field | Description |
| --------------- | -------------|
|type|Type of the image area|
|bbox|Coordinates of the image area in the original image: [upper-left x, upper-left y, lower-right x, lower-right y]|
|res|OCR or table recognition result of the image area.<br> Table: HTML string of the table; <br> OCR: a tuple containing the detection coordinates and recognition results of each line of text|
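
As an illustration of the format above, the following sketch splits a result list into recognized text lines and table HTML. The helper `summarize_regions`, its `min_score` threshold and the sample data are hypothetical and not part of paddleocr. It assumes that table areas have the type `Table` (compared case-insensitively) and that `res` has the shape described in the table above.

```python
def summarize_regions(result, min_score=0.5):
    """Split a PP-Structure style result list into text lines and table HTML.

    Assumes each item is a dict with 'type', 'bbox' and 'res' keys.
    """
    texts, tables = [], []
    for region in result:
        if region['type'].lower() == 'table':
            # Assumption: for table areas, 'res' is an HTML string.
            tables.append((region['bbox'], region['res']))
        else:
            # Assumption: otherwise 'res' is (boxes, [(text, score), ...]).
            _boxes, recognized = region['res']
            for text, score in recognized:
                if score >= min_score:
                    texts.append((region['type'], text.strip(), score))
    return texts, tables


# Hand-written sample data in the documented shape (not real model output).
sample = [
    {'type': 'Text', 'bbox': [34, 432, 345, 462],
     'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0]],
             [('Tigure-6. The performance', 0.90), ('Tent  ', 0.47)])},
    {'type': 'Table', 'bbox': [10, 10, 200, 100],
     'res': '<html><body><table><tr><td>a</td></tr></table></body></html>'},
]
texts, tables = summarize_regions(sample)
print(texts)
print(tables)
```

Filtering by score is optional; it is shown here only because low-confidence lines such as `'Tent  '` in the example above are often noise.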


### 3.4 Parameter description:

| Parameter       | Description                                                                        | Default value                               |
| --------------- | ---------------------------------------------------------------------------------- | ------------------------------------------- |
| output          | Path where the Excel files and recognition results are saved                       | ./output/table                              |
| table_max_len   | Size to which the long side of the image is resized for the table structure model  | 488                                         |
| table_model_dir | Path to the inference model of the table structure model                           | None                                        |
| table_char_type | Path to the dictionary of the table structure model                                | ../ppocr/utils/dict/table_structure_dict.tx |

Most of the parameters are the same as those of the paddleocr whl package; see the [whl documentation](../doc/doc_en/whl_en.md).

After running, each image gets a directory with the same name under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file, and each figure area is cropped and saved as an image. The Excel and image files are named after the coordinates of the corresponding area in the image.

## 4. PP-Structure Pipeline

The process is as follows:

![pipeline](../doc/table/pipeline_en.jpg)

In PP-Structure, the image is first analyzed by layoutparser. Layout analysis classifies each area of the image into one of 5 categories: **text, title, image, list and table**. For the first 4 types of areas, PP-OCR is used directly to perform text detection and recognition. Table areas are converted by Table OCR into an Excel file with the same table style.
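
The routing described above can be sketched as follows. This is not the PP-Structure implementation: `run_pipeline` and the three callables it takes are hypothetical placeholders, and the stub functions exist only so that the sketch runs without any models installed.

```python
def run_pipeline(image, layout_fn, ocr_fn, table_fn):
    """Route each layout area to OCR or to table recognition.

    layout_fn(image) -> list of (area_type, bbox, crop)
    ocr_fn(crop)     -> OCR result for a non-table area
    table_fn(crop)   -> table recognition result
    All three callables are hypothetical placeholders.
    """
    results = []
    for area_type, bbox, crop in layout_fn(image):
        if area_type.lower() == 'table':
            res = table_fn(crop)   # table areas go to table recognition
        else:
            res = ocr_fn(crop)     # text, title, image and list areas go to OCR
        results.append({'type': area_type, 'bbox': bbox, 'res': res})
    return results


# Stub functions standing in for the real models.
def fake_layout(image):
    return [('Title', [0, 0, 100, 20], 'crop-a'),
            ('Table', [0, 30, 100, 90], 'crop-b')]


def fake_ocr(crop):
    return ([], [('text from ' + crop, 1.0)])


def fake_table(crop):
    return '<table><tr><td>' + crop + '</td></tr></table>'


demo = run_pipeline(None, fake_layout, fake_ocr, fake_table)
for area in demo:
    print(area['type'], area['bbox'])
```

Passing the three stages in as callables keeps the dispatch logic separate from the models, which makes it straightforward to test the routing on its own.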

### 4.1 LayoutParser

Layout analysis divides the document into regions. The layout documentation covers using the layout analysis tools from Python scripts, extracting detection boxes of specific categories, performance metrics, and custom training of layout analysis models. For details, please refer to the [document](layout/README_en.md).

### 4.2 Table Recognition

Table recognition converts a table image into an Excel document. It involves the detection and recognition of table text and the prediction of the table structure and cell coordinates. For details, please refer to the [document](table/README.md).
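
Since the result for a table area is returned as an HTML string (see section 3.3), it can also be processed without Excel. The sketch below turns such a string into a list of rows using only the Python standard library. The class `TableRows`, the function `html_table_to_rows` and the sample HTML are hypothetical; the sketch ignores `rowspan`/`colspan` and nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Return the table cells as a list of rows (lists of strings)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hand-written sample HTML (not real model output).
html = ('<html><body><table>'
        '<tr><td>Model</td><td>Size</td></tr>'
        '<tr><td>A</td><td>18.6M</td></tr>'
        '</table></body></html>')
rows = html_table_to_rows(html)
print(rows)
```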

## 5. Prediction by inference engine

Use the following commands to complete the inference.

```bash
cd PaddleOCR/ppstructure

# Download the models
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
# Download the ultra-lightweight table structure model for English tables and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..

python3 table/predict_system.py \
    --det_model_dir=inference/ch_ppocr_mobile_v2.0_det_infer \
    --rec_model_dir=inference/ch_ppocr_mobile_v2.0_rec_infer \
    --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
    --image_dir=../doc/table/1.png \
    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --rec_char_type=ch \
    --det_limit_side_len=736 \
    --det_limit_type=min \
    --output=../output/table \
    --vis_font_path=../doc/fonts/simfang.ttf
```
After running, each image gets a directory with the same name under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file, and each figure area is cropped and saved as an image. The Excel and image files are named after the coordinates of the corresponding area in the image.

**Model List**


|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |