# PP-Structure Quick Start

- [1. Install dependencies](#1)
- [2. Quick use](#2)
    - [2.1 Command-line usage](#21)
        - [2.1.1 Layout analysis + table recognition](#211)
        - [2.1.2 Layout analysis](#212)
        - [2.1.3 Table recognition](#213)
        - [2.1.4 DocVQA](#214)
    - [2.2 Python usage](#22)
        - [2.2.1 Layout analysis + table recognition](#221)
        - [2.2.2 Layout analysis](#222)
        - [2.2.3 Table recognition](#223)
        - [2.2.4 DocVQA](#224)
    - [2.3 Return results](#23)
        - [2.3.1 Layout analysis + table recognition](#231)
        - [2.3.2 DocVQA](#232)
    - [2.4 Parameters](#24)


<a name="1"></a>
## 1. Install dependencies

```bash
# Install paddleocr; version 2.5 or later is recommended
pip3 install "paddleocr>=2.5"
# Install layoutparser, the layout analysis dependency (skip if you do not need layout analysis)
pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
# Install paddlenlp, the DocVQA dependency (skip if you do not need DocVQA)
pip install paddlenlp
```

<a name="2"></a>
## 2. Quick use

<a name="21"></a>
### 2.1 Command-line usage

<a name="211"></a>
#### 2.1.1 Layout analysis + table recognition
```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure
```

<a name="212"></a>
#### 2.1.2 Layout analysis
```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --table=false --ocr=false
```

<a name="213"></a>
#### 2.1.3 Table recognition
```bash
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structure --layout=false
```
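
The three commands above differ only in which stages are switched off. As a rough sketch, the same command lines can be assembled from Python. `build_command` is a hypothetical helper, not part of paddleocr; it only reproduces the flags shown in this section and omits any switch left at its default.

```python
import shlex


def build_command(image_dir, layout=True, table=True, ocr=True):
    """Assemble the paddleocr command line as an argument list.

    Hypothetical helper: switches left at True are omitted, matching the
    commands in this section, which only pass the ones set to false.
    """
    args = ['paddleocr', f'--image_dir={image_dir}', '--type=structure']
    for name, enabled in (('layout', layout), ('table', table), ('ocr', ocr)):
        if not enabled:
            args.append(f'--{name}=false')
    return args


img = 'PaddleOCR/ppstructure/docs/table/1.png'
# Layout analysis + table recognition (all stages on)
print(shlex.join(build_command(img)))
# Layout analysis only
print(shlex.join(build_command(img, table=False, ocr=False)))
# Table recognition only
print(shlex.join(build_command('PaddleOCR/ppstructure/docs/table/table.jpg', layout=False)))
```

The returned list can be passed to `subprocess.run` directly if the `paddleocr` command is installed.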

<a name="214"></a>
#### 2.1.4 DocVQA

See [Document Visual Question Answering](../vqa/README.md).

<a name="22"></a>
### 2.2 Python usage

<a name="221"></a>
#### 2.2.1 Layout analysis + table recognition

```python
import os
import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# Save the recognition results under save_folder, in a directory named after the image
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

# Print each region without its cropped image array
for line in result:
    line.pop('img')
    print(line)

font_path = 'PaddleOCR/doc/fonts/simfang.ttf'  # font file provided in the PaddleOCR repository
image = Image.open(img_path).convert('RGB')
# Draw the recognized regions on the image and save the visualization
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

<a name="222"></a>
#### 2.2.2 Layout analysis

```python
import os
import cv2
from paddleocr import PPStructure,save_structure_res

table_engine = PPStructure(table=False, ocr=False, show_log=True)

save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)
```
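
The examples in this section pass `os.path.basename(img_path).split('.')[0]` to `save_structure_res` as the name for the image's results. The sketch below shows what that expression yields. `result_name` and `result_name_splitext` are hypothetical helper names, and the `os.path.splitext` variant is an alternative that is not part of the original example; it may be preferable when file names contain extra dots.

```python
import os


def result_name(img_path):
    """Expression used in the examples: text before the first dot of the file name."""
    return os.path.basename(img_path).split('.')[0]


def result_name_splitext(img_path):
    """Alternative (an assumption, not from the original): strip only the last extension."""
    return os.path.splitext(os.path.basename(img_path))[0]


print(result_name('PaddleOCR/ppstructure/docs/table/1.png'))
# A hypothetical file name with an extra dot shows the difference
print(result_name('scans/page.v2.png'))
print(result_name_splitext('scans/page.v2.png'))
```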

<a name="223"></a>
#### 2.2.3 Table recognition

```python
import os
import cv2
from paddleocr import PPStructure,save_structure_res

table_engine = PPStructure(layout=False, show_log=True)

save_folder = './output'
img_path = 'PaddleOCR/ppstructure/docs/table/table.jpg'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)
```

<a name="224"></a>
#### 2.2.4 DocVQA

See [Document Visual Question Answering](../vqa/README.md).

<a name="23"></a>
### 2.3 Return results
PP-Structure returns a list of dicts. An example follows.

<a name="231"></a>
#### 2.3.1 Layout analysis + table recognition
```shell
[
  {   'type': 'Text',
      'bbox': [34, 432, 345, 462],
      'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
  }
]
```
The fields of each dict are described below.

| Field | Description |
| --------------- | --------------- |
|type| Type of the image region |
|bbox| Coordinates of the image region in the original image, as [top-left x, top-left y, bottom-right x, bottom-right y] |
|res| OCR or table recognition result for the image region.<br> Table: a dict with the following fields<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `html`: HTML string of the table<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; In Python usage, passing return_ocr_result_in_table=True in the forward call also returns the detection and recognition result of each text in the table, in the following fields: <br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `boxes`: text detection coordinates<br>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; `rec_res`: text recognition results.<br> OCR: a tuple containing the detection coordinates and recognition results of each single text line |
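
As a minimal sketch of reading these fields, the helper below summarizes one region. It assumes `res` is either a dict with an `html` field (table) or a 2-tuple of boxes and `(text, score)` pairs (OCR), as in the example above. `summarize_region` is a hypothetical name, and other shapes of `res` are not handled.

```python
def summarize_region(region):
    """Return (type, width, height, content) for one dict in the result list."""
    # bbox is [top-left x, top-left y, bottom-right x, bottom-right y]
    x1, y1, x2, y2 = region['bbox']
    res = region['res']
    if isinstance(res, dict):
        # Table region: the recognized table as an HTML string
        content = res.get('html', '')
    else:
        # OCR region: (detection boxes, [(text, score), ...])
        boxes, rec_res = res
        content = ' '.join(text for text, score in rec_res)
    return region['type'], x2 - x1, y2 - y1, content


# The sample region from the example above
sample = {
    'type': 'Text',
    'bbox': [34, 432, 345, 462],
    'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
             [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
            [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663),
             ('Tent  ', 0.465441)]),
}
print(summarize_region(sample))
```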

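After the run completes, each image has a directory of the same name under the directory specified by the `output` parameter. Each table in the image is saved as an Excel file, and each figure region is cropped and saved as an image. The Excel and image files are named after the region's coordinates in the image.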

  ```
  /output/table/1/
    └─ res.txt
    └─ [454, 360, 824, 658].xlsx        table recognition result
    └─ [16, 2, 828, 305].jpg            cropped image region
    └─ [17, 361, 404, 711].xlsx        table recognition result
  ```
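
Because the saved files are named after region coordinates, the coordinates can be recovered from a file name. This is a small sketch under the assumption that names follow the `[x1, y1, x2, y2].ext` pattern in the listing above. `bbox_from_filename` is a hypothetical helper, not a PaddleOCR function.

```python
import ast
import os


def bbox_from_filename(filename):
    """Parse '[x1, y1, x2, y2].ext' back into a list of four ints."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    bbox = ast.literal_eval(stem)  # safely evaluates the list literal
    if not (isinstance(bbox, list) and len(bbox) == 4):
        raise ValueError(f'unexpected file name: {filename!r}')
    return [int(v) for v in bbox]


print(bbox_from_filename('/output/table/1/[454, 360, 824, 658].xlsx'))
print(bbox_from_filename('[16, 2, 828, 305].jpg'))
```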

<a name="232"></a>
#### 2.3.2 DocVQA

See [Document Visual Question Answering](../vqa/README.md).

<a name="24"></a>
### 2.4 Parameters

| Parameter | Description | Default |
|----------------------|----------------------|----------------------|
| output | Directory where the Excel files and recognition results are saved | ./output/table |
| table_max_len | Size to which the long side of the image is resized for table structure model prediction | 488 |
| table_model_dir | Path to the table structure inference model | None |
| table_char_dict_path | Path to the dictionary used by the table structure model | ../ppocr/utils/dict/table_structure_dict.txt |
| layout_path_model | Path to the layout analysis model; may be an online or a local path. For a local path, layout_label_map must be specified; in command-line mode, pass --layout_label_map='{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}' | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config |
| layout_label_map | Label mapping dict for the layout analysis model | None |
| model_name_or_path | Path to the VQA SER model | None |
| max_seq_length | Maximum number of tokens supported by the VQA SER model | 512 |
| label_map_path | Path to the VQA SER label file | ./vqa/labels/labels_ser.txt |
| layout | Whether to run layout analysis in the forward pass | True |
| table | Whether to run table recognition in the forward pass | True |
| ocr | Whether to run OCR on non-table regions found by layout analysis. Automatically set to False when layout is False | True |
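
Two rules in the table can be illustrated in plain Python: the `layout_label_map` string maps class indices to region types, and `ocr` is forced to False when `layout` is False. The sketch below is illustrative only. `parse_label_map` and `effective_ocr` are hypothetical helpers, and parsing with `ast.literal_eval` is an assumption, not a description of how PaddleOCR reads the argument.

```python
import ast


def parse_label_map(text):
    """Turn a '{0: "Text", ...}' string into a dict of int index -> label name."""
    label_map = ast.literal_eval(text)  # integer keys make this a Python literal, not JSON
    if not isinstance(label_map, dict):
        raise ValueError('layout_label_map must be a dict literal')
    return {int(k): str(v) for k, v in label_map.items()}


def effective_ocr(layout, ocr):
    """Per the table: ocr is treated as False whenever layout is False."""
    return bool(layout and ocr)


label_map = parse_label_map('{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}')
print(label_map[3])
print(effective_ocr(layout=False, ocr=True))
```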

Most parameters are the same as in the PaddleOCR whl package; see the [whl package documentation](../../doc/doc_ch/whl.md).