# PPStructure

PPStructure is an OCR toolkit for complex layout analysis. It can divide document images into 5 types of regions, **text, table, title, picture and list**, and extract table regions as Excel files.

## 1. Quick start

### 1.1 Install

**Install PaddleOCR**

Refer to the [PaddleOCR whl package documentation](../doc/doc_en/whl_en.md).

**Install layoutparser**
```sh
pip3 install -U premailer https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```
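To confirm that both installs succeeded, a quick import check such as the following can be run. This is a minimal sketch; it assumes the two installs provide top-level modules named `paddleocr` and `layoutparser`, which the steps above do not state explicitly.

```python
import importlib.util

# Module names assumed to be provided by the two installs above.
REQUIRED_MODULES = ("paddleocr", "layoutparser")


def check_installed(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to whether it can be found."""
    # find_spec returns None for a missing top-level module without importing it.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```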

### 1.2 Use

#### 1.2.1 Use by command line

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

#### 1.2.2 Use by code

```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = '../doc/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# Save the results (tables as Excel files, figure regions as cropped images)
# under save_folder/<image name without extension>
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Drop the region image data ('img') before printing each region dict
for line in result:
    line.pop('img')
    print(line)

# Visualize the layout and recognition results
font_path = '../doc/fonts/simfang.ttf'  # font used to draw the recognized text
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
#### 1.2.3 Description of the returned result
The result returned by PPStructure is a list of dicts, one for each region in the image. An example is as follows:

```python
[
  {   'type': 'Text', 
      'bbox': [34, 432, 345, 462], 
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]], 
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
The fields of each dict are described below:

| Field | Description |
| --- | --- |
| type | Type of the image region |
| bbox | Coordinates of the region in the original image, in the form [top-left x, top-left y, bottom-right x, bottom-right y] |
| res | OCR or table recognition result of the region.<br> Table: HTML string of the table;<br> OCR: a tuple containing the detection coordinates and the recognition results (text and confidence) of each text line |
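As a sketch of consuming this structure, the following walks a result list and, for non-table regions, keeps only the recognized text. It assumes table regions carry the type name `Table` (inferred from the capitalized `Text` in the example above), and the `min_score` confidence filter is a hypothetical addition, not a PPStructure option.

```python
# Sample copied from the example above; in practice `result` comes from
# `table_engine(img)`.
result = [
    {
        "type": "Text",
        "bbox": [34, 432, 345, 462],
        "res": (
            [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
             [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
            [("Tigure-6. The performance of CNN and IPT models using difforen", 0.90060663),
             ("Tent  ", 0.465441)],
        ),
    }
]


def summarize_region(region, min_score=0.5):
    """Return (type, bbox, payload) for one region dict.

    Table regions: payload is the HTML string unchanged.
    Other regions: payload is the list of recognized lines whose
    confidence is at least `min_score` (hypothetical threshold).
    """
    region_type, bbox, res = region["type"], region["bbox"], region["res"]
    if region_type == "Table":  # assumed type name for table regions
        return region_type, bbox, res
    _boxes, rec_results = res  # per-line boxes, list of (text, score) pairs
    lines = [text.strip() for text, score in rec_results if score >= min_score]
    return region_type, bbox, lines


for region in result:
    print(summarize_region(region))
```

With the default threshold, the low-confidence line `'Tent  '` (score 0.465441) is dropped.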

#### 1.2.4 Parameter description

| Parameter | Description | Default value |
| --- | --- | --- |
| output | Path where the Excel files and recognition results are saved | ./output/table |
| table_max_len | Length to which the long side of the image is resized for the table structure model | 488 |
| table_model_dir | Inference model path of the table structure model | None |
| table_char_dict_path | Dictionary path of the table structure model | ../ppocr/utils/dict/table_structure_dict.txt |
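The effect of `table_max_len` can be illustrated with a little arithmetic. This sketch assumes the aspect ratio is preserved and the short side is rounded to the nearest integer; the real preprocessing may differ in rounding and may additionally pad the image.

```python
def resized_shape(height, width, max_len=488):
    """Return (height, width) after scaling the long side to `max_len`.

    Assumes aspect-ratio-preserving scaling with round-to-nearest;
    padding, if any, is not modeled here.
    """
    scale = max_len / max(height, width)
    return round(height * scale), round(width * scale)


print(resized_shape(1000, 600))  # (488, 293)
```

A 1000 x 600 image is scaled by 488 / 1000 = 0.488, so the short side becomes 600 x 0.488 = 292.8, rounded to 293.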

Most of the remaining parameters are the same as those of the PaddleOCR whl package; see the [whl package documentation](../doc/doc_en/whl_en.md).

After running, a directory with the same name as each image is created under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file and each figure region is cropped and saved as an image; the Excel and image files are named after the coordinates of the corresponding region in the image.
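To collect the generated files for one image, a small helper like the one below can be used. This is a sketch under two assumptions not stated above: Excel files use the `.xlsx` extension and cropped figures use `.jpg`. The per-image directory name follows the `os.path.basename(img_path).split('.')[0]` convention from the code example in Section 1.2.2.

```python
from pathlib import Path


def list_outputs(output_dir, image_path):
    """Group the files written for one image into tables and figures.

    Assumes tables are saved as *.xlsx and cropped figures as *.jpg.
    """
    image_dir = Path(output_dir) / Path(image_path).name.split(".")[0]
    if not image_dir.is_dir():
        return {"tables": [], "figures": []}
    tables = sorted(p.name for p in image_dir.glob("*.xlsx"))
    figures = sorted(p.name for p in image_dir.glob("*.jpg"))
    return {"tables": tables, "figures": figures}


if __name__ == "__main__":
    print(list_outputs("./output/table", "../doc/table/1.png"))
```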

## 2. PPStructure Pipeline

The process is as follows:

![pipeline](../doc/table/pipeline_en.jpg)

In PPStructure, the image is first analyzed by layoutparser. During layout analysis, each region in the image is classified into one of 5 categories: **text, title, image, list and table**. For the first 4 types, PP-OCR is used directly for text detection and recognition; table regions are converted by Table OCR into Excel files that preserve the table structure.
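The routing step of this pipeline can be sketched as follows. This is not PPStructure's implementation: `ocr_fn` and `table_fn` are hypothetical stand-ins for PP-OCR and Table OCR, the layout is assumed to arrive as `(type, bbox)` pairs, table regions are assumed to be labeled `Table`, and the output dicts mirror the result format described in Section 1.2.3.

```python
import numpy as np


def run_pipeline(image, layout, ocr_fn, table_fn):
    """Route each layout region to the matching recognizer.

    image:    H x W x C numpy array
    layout:   list of (region_type, [x1, y1, x2, y2]) from layout analysis
    ocr_fn:   callable(crop) -> OCR result (stand-in for PP-OCR)
    table_fn: callable(crop) -> HTML string (stand-in for Table OCR)
    """
    results = []
    for region_type, bbox in layout:
        x1, y1, x2, y2 = bbox
        crop = image[y1:y2, x1:x2]  # rows are y, columns are x
        recognizer = table_fn if region_type == "Table" else ocr_fn
        results.append({"type": region_type, "bbox": bbox, "res": recognizer(crop)})
    return results


if __name__ == "__main__":
    # Dummy image and recognizers, for demonstration only.
    demo_img = np.zeros((100, 200, 3), dtype=np.uint8)
    demo_layout = [("Text", [0, 0, 50, 20]), ("Table", [10, 30, 190, 90])]
    for r in run_pipeline(demo_img, demo_layout,
                          ocr_fn=lambda crop: f"OCR on crop {crop.shape}",
                          table_fn=lambda crop: "<table></table>"):
        print(r)
```

Passing the recognizers in as callables keeps the routing logic independent of any particular model.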

### 2.1 LayoutParser

Layout analysis divides document images into regions. The linked document covers using the layout analysis tool from a Python script, extracting detection boxes of a specific category, performance metrics, and training a custom layout analysis model. For details, please refer to the [document](layout/README_en.md).
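The linked document describes extracting boxes of a specific category with layoutparser itself. As a rough stand-in that works on PPStructure's own output instead, regions can be grouped or filtered by their `type` field. This is a plain-Python sketch, not the layoutparser API; sorting by the top-left corner (y, then x) is an illustrative choice.

```python
from collections import defaultdict


def group_by_type(result):
    """Group PPStructure result dicts by their 'type' field."""
    groups = defaultdict(list)
    for region in result:
        groups[region["type"]].append(region)
    return dict(groups)


def regions_of_type(result, region_type):
    """Return regions of one category, ordered top to bottom, then left to right."""
    selected = [region for region in result if region["type"] == region_type]
    return sorted(selected, key=lambda r: (r["bbox"][1], r["bbox"][0]))
```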

### 2.2 Table Structure

Table OCR converts table images into Excel documents. The process includes detection and recognition of the text in the table, as well as prediction of the table structure and cell coordinates. For details, please refer to the [document](table/README.md).
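Because the `res` of a table region is an HTML string (Section 1.2.3), it can also be turned into rows without Excel. The sketch below uses only the standard library and writes CSV; it assumes standard `<tr>`/`<td>` (or `<th>`) markup and does not expand merged cells, so a cell with `colspan` or `rowspan` occupies a single position.

```python
import csv
import io
from html.parser import HTMLParser


class TableHTMLParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row (spans ignored)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:  # keep text inside cells only
            self._cell.append(data)


def html_table_to_csv(html):
    """Convert one HTML table string into CSV text."""
    parser = TableHTMLParser()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf).writerows(parser.rows)
    return buf.getvalue()
```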

## 3. Prediction with the inference engine

Use the following commands to run inference:

```bash
cd PaddleOCR/ppstructure

# Download the models
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
# Download the English table structure model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..

python3 table/predict_system.py \
    --det_model_dir=inference/ch_ppocr_mobile_v2.0_det_infer \
    --rec_model_dir=inference/ch_ppocr_mobile_v2.0_rec_infer \
    --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
    --image_dir=../doc/table/1.png \
    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
    --rec_char_type=ch \
    --det_limit_side_len=736 \
    --det_limit_type=min \
    --output=../output/table \
    --vis_font_path=../doc/fonts/simfang.ttf
```
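When the same command is run repeatedly with different images or output folders, it can be assembled from Python. This is a sketch: the flags and default paths are copied from the shell example above, `build_command` is a hypothetical helper, and `sys.executable` is used instead of `python3` so the current interpreter runs the script. The script is only launched when `table/predict_system.py` exists, i.e. when run from `PaddleOCR/ppstructure`.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_command(image_dir, output="../output/table", model_root="inference"):
    """Build the predict_system.py argument list from the example above."""
    return [
        sys.executable, "table/predict_system.py",
        f"--det_model_dir={model_root}/ch_ppocr_mobile_v2.0_det_infer",
        f"--rec_model_dir={model_root}/ch_ppocr_mobile_v2.0_rec_infer",
        f"--table_model_dir={model_root}/en_ppocr_mobile_v2.0_table_structure_infer",
        f"--image_dir={image_dir}",
        "--rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt",
        "--table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt",
        "--rec_char_type=ch",
        "--det_limit_side_len=736",
        "--det_limit_type=min",
        f"--output={output}",
        "--vis_font_path=../doc/fonts/simfang.ttf",
    ]


if __name__ == "__main__":
    cmd = build_command("../doc/table/1.png")
    print(shlex.join(cmd))  # shell-quoted form, for logging or copy-paste
    if Path("table/predict_system.py").exists():
        subprocess.run(cmd, check=True)
```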
After running, the output is organized as described in Section 1.2.4: each image gets a directory of the same name under the directory given by `--output`, in which each table is saved as an Excel file and each figure region is cropped and saved as an image, with file names given by the coordinates of the corresponding region in the image.

**Model List**


|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |