# PPStructure

PPStructure is an OCR toolkit for complex layout analysis. It divides document images into five types of regions: **text, table, title, picture and list**, and exports table regions as Excel files.
## 1. Quick start

### 1.1 Install
**Install PaddlePaddle 2.0**

```bash
pip3 install --upgrade pip

# If you have CUDA 9 or CUDA 10 installed on your machine, run the following command to install the GPU version
python3 -m pip install paddlepaddle-gpu==2.0.0 -i https://mirror.baidu.com/pypi/simple

# If your machine only has a CPU, run the following command to install the CPU version
python3 -m pip install paddlepaddle==2.0.0 -i https://mirror.baidu.com/pypi/simple
```

For more version requirements, please refer to the instructions in the [installation document](https://www.paddlepaddle.org.cn/install/quick).

**Clone PaddleOCR repo**

```bash
# Recommended
git clone https://github.com/PaddlePaddle/PaddleOCR

# If you cannot clone from GitHub due to network problems, you can use the mirror hosted on Gitee instead:
git clone https://gitee.com/paddlepaddle/PaddleOCR

# Note: The Gitee mirror may not be synchronized with the GitHub project in real time. There might be a delay of 3-5 days, so please prefer the recommended method.
```

**Install paddleocr**

Install from PyPI:
```bash
cd PaddleOCR
pip install "paddleocr>=2.2" # version 2.2 or later is recommended
```

Build your own whl package and install it:

```bash
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
```
**Install layoutparser**
```sh
pip3 install -U premailer https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```

### 1.2 Use

#### 1.2.1 Use by command line

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

#### 1.2.2 Use by code

```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = '../doc/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# Save the recognition results under save_folder, in a directory named after the image
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Remove the 'img' entry from each region before printing
for line in result:
    line.pop('img')
    print(line)

# Draw the recognition result on the original image and save the visualization
font_path = '../doc/fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
#### 1.2.3 Description of the returned result
PPStructure returns a list of dicts. An example is as follows:

```shell
[
  {   'type': 'Text', 
      'bbox': [34, 432, 345, 462], 
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]], 
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
The fields of each dict are described below:

| Field | Description |
| --------------- | -------------|
|type|Type of the image region|
|bbox|Coordinates of the region in the original image, as [top-left x, top-left y, bottom-right x, bottom-right y]|
|res|OCR or table recognition result of the region.<br> Table: HTML string of the table; <br> OCR: a tuple containing the detection coordinates and recognition results of each line of text|
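
As a rough illustration of how this structure can be consumed, the sketch below walks a result list and collects the recognized text lines. The sample data is copied from the example above; `extract_text_lines`, `bbox_size` and the `min_score` threshold are hypothetical helpers introduced here and are not part of paddleocr.

```python
# Sample result copied from the example above (a single 'Text' region).
result = [
    {
        'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': (
            [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
             [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
            [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663),
             ('Tent  ', 0.465441)],
        ),
    }
]


def extract_text_lines(result, min_score=0.5):
    """Hypothetical helper: collect (region type, text, score) from OCR regions."""
    lines = []
    for region in result:
        if region['type'].lower() == 'table':
            continue  # for tables, 'res' is an HTML string rather than an OCR tuple
        boxes, rec_results = region['res']
        for _box, (text, score) in zip(boxes, rec_results):
            if score >= min_score:
                lines.append((region['type'], text.strip(), score))
    return lines


def bbox_size(bbox):
    """Hypothetical helper: width and height of a [left, top, right, bottom] box."""
    left, top, right, bottom = bbox
    return right - left, bottom - top


for region_type, text, score in extract_text_lines(result):
    print(region_type, round(score, 3), text)
print(bbox_size(result[0]['bbox']))
```

Filtering on the recognition score is only one possible design choice; here it drops the low-confidence second line of the sample.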


#### 1.2.4 Parameter description

| Parameter            | Description                                     | Default value                                        |
| --------------- | ---------------------------------------- | ------------------------------------------- |
| output          | Path where the Excel files and recognition results are saved                | ./output/table                              |
| table_max_len   | Length to which the long side of the image is resized for the table structure model  | 488                                         |
| table_model_dir | Inference model path of the table structure model          | None                                        |
| table_char_type | dict path of table structure model                 | ../ppocr/utils/dict/table_structure_dict.tx |

Most of the parameters are consistent with the paddleocr whl package; see the [whl documentation](../doc/doc_en/whl_en.md).

After running, each image gets a directory with the same name under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file, and each figure region is cropped and saved as an image. The Excel and image files are named after the coordinates of the region in the image.
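
To inspect what was written for a given image, a minimal sketch such as the following can list the saved files. It assumes only the layout described above (one sub-directory per image under the output directory). `list_saved_results` is a hypothetical helper, and because the exact file extensions are not stated here, it simply groups whatever files it finds by extension.

```python
from pathlib import Path


def list_saved_results(output_dir, image_name):
    """Hypothetical helper: group the files saved for one image by extension."""
    image_dir = Path(output_dir) / image_name
    grouped = {}
    for path in sorted(image_dir.iterdir()):
        if path.is_file():
            grouped.setdefault(path.suffix.lower(), []).append(path.name)
    return grouped


# Usage with the paths from the examples above; skipped if nothing has been saved yet.
output_dir = Path('./output/table')
if (output_dir / '1').is_dir():
    print(list_saved_results(output_dir, '1'))
```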

## 2. PPStructure Pipeline

The process is as follows:
![pipeline](../doc/table/pipeline_en.jpg)

In PPStructure, the image is first analyzed by layoutparser. Layout analysis classifies the regions of the image into five categories: **text, title, image, list and table**. For the first four types of regions, PP-OCR is used directly for text detection and recognition. Table regions are converted by Table OCR into an Excel file with the same table style.
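
The routing described above can be sketched as follows. This is only an illustration of the control flow, not the actual PPStructure implementation: `run_text_ocr` and `run_table_ocr` are hypothetical stubs standing in for PP-OCR and Table OCR, and the input is assumed to be a list of regions already produced by layout analysis.

```python
def run_text_ocr(region_img):
    """Hypothetical stub for PP-OCR text detection and recognition."""
    return ([], [])  # (detection boxes, recognition results)


def run_table_ocr(region_img):
    """Hypothetical stub for Table OCR; tables are reported as an HTML string."""
    return '<table></table>'


def analyze_regions(regions):
    """Route each layout region to Table OCR or to plain text OCR."""
    results = []
    for region in regions:
        if region['type'].lower() == 'table':
            res = run_table_ocr(region['img'])
        else:
            # text, title, image (figure) and list regions all go through text OCR
            res = run_text_ocr(region['img'])
        results.append({'type': region['type'], 'bbox': region['bbox'], 'res': res})
    return results


# Two made-up regions; 'img' would normally hold the cropped region image.
regions = [
    {'type': 'Table', 'bbox': [0, 0, 100, 50], 'img': None},
    {'type': 'Text', 'bbox': [0, 60, 100, 80], 'img': None},
]
for item in analyze_regions(regions):
    print(item['type'], type(item['res']).__name__)
```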

### 2.1 LayoutParser

Layout analysis divides the document into regions. Its documentation covers using the layout analysis tools from Python scripts, extracting detection boxes of specific categories, performance indicators, and training custom layout analysis models. For details, please refer to the [document](layout/README_en.md).

### 2.2 Table Structure

Table OCR converts table images into Excel documents. This includes the detection and recognition of table text and the prediction of table structure and cell coordinates. For details, please refer to the [document](table/README.md).
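
Since the recognition result of a table region is an HTML string, one way to look at its cells without opening the Excel file is to parse that string. The sketch below uses only the Python standard library. The `sample` HTML is made up for illustration, `html_table_to_rows` is a hypothetical helper, and `rowspan`/`colspan` attributes are ignored.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of an HTML table as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None and self._row is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_rows(html):
    """Hypothetical helper: turn a table HTML string into a list of rows."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up table HTML, used only to exercise the parser.
sample = ('<table><tr><td>Model</td><td>Size</td></tr>'
          '<tr><td>A</td><td>18.6M</td></tr></table>')
for row in html_table_to_rows(sample):
    print(row)
```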

## 3. Prediction by inference engine

Use the following commands to run inference.

```bash
cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
# Download the English table structure model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..

python3 table/predict_system.py --det_model_dir=inference/ch_ppocr_mobile_v2.0_det_infer --rec_model_dir=inference/ch_ppocr_mobile_v2.0_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --rec_char_type=ch --det_limit_side_len=736 --det_limit_type=min --output=../output/table --vis_font_path=../doc/fonts/simfang.ttf
```
After running, each image gets a directory with the same name under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file, and each figure region is cropped and saved as an image. The Excel and image files are named after the coordinates of the region in the image.

**Model List**


|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |