"vscode:/vscode.git/clone" did not exist on "468793641374d00a0cd7017cc0b2310b4c710376"
README_ch.md 6.03 KB
Newer Older
grasswolfs's avatar
grasswolfs committed
1
2
3
4
[English](README.md) | 简体中文

# 版面分析使用说明

MissPenguin's avatar
update  
MissPenguin committed
5
6
7
8
9
10
11
12
- [1. 安装whl包](#1)
- [2. 使用](#2)
- [3. 后处理](#3)
- [4. 指标](#4)
- [5. 训练版面分析模型](#5)


<a name="1"></a>
grasswolfs's avatar
grasswolfs committed
13
14
15
16
17
## 1.  安装whl包
```bash
pip install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```

MissPenguin's avatar
update  
MissPenguin committed
18
<a name="2"></a>
grasswolfs's avatar
grasswolfs committed
19
20
21
22
23
24
25
## 2. 使用

使用layoutparser识别给定文档的布局:

```python
import cv2
import layoutparser as lp
MissPenguin's avatar
update  
MissPenguin committed
26
image = cv2.imread("ppstructure/docs/table/layout.jpg")
grasswolfs's avatar
grasswolfs committed
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
image = image[..., ::-1]

# 加载模型
model = lp.PaddleDetectionLayoutModel(config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",
                                threshold=0.5,
                                label_map={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"},
                                enforce_cpu=False,
                                enable_mkldnn=True)
# 检测
layout = model.detect(image)

# 显示结果
show_img = lp.draw_box(image, layout, box_width=3, show_element_type=True)
show_img.show()
```

下图展示了结果,不同颜色的检测框表示不同的类别,并通过`show_element_type`在框的左上角显示具体类别:

<div align="center">
MissPenguin's avatar
update  
MissPenguin committed
46
<img src="../docs/table/result_all.jpg"  width = "600" />
grasswolfs's avatar
grasswolfs committed
47
48
49
50
51
52
53
54
55
56
57
</div>

`PaddleDetectionLayoutModel`函数参数说明如下:

|      参数      |            含义             |   默认值    |                             备注                             |
| :------------: | :-------------------------: | :---------: | :----------------------------------------------------------: |
|  config_path   |        模型配置路径         |    None     | 指定config_path会自动下载模型(仅第一次,之后模型存在,不会再下载) |
|   model_path   |          模型路径           |    None     | 本地模型路径,config_path和model_path必须设置一个,不能同时为None |
|   threshold    |       预测得分的阈值        |     0.5     |                              \                               |
|  input_shape   |     reshape之后图片尺寸     | [3,640,640] |                              \                               |
|   batch_size   |       测试batch size        |      1      |                              \                               |
WenmuZhou's avatar
WenmuZhou committed
58
|   label_map    |         类别映射表          |    None     | 设置config_path时,可以为None,根据数据集名称自动获取label_map,设置model_path时需要手动指定 |
grasswolfs's avatar
grasswolfs committed
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
|  enforce_cpu   |     代码是否使用CPU运行     |    False    |         设置为False表示使用GPU,True表示强制使用CPU          |
| enforce_mkldnn | CPU预测中是否开启MKLDNN加速 |    True     |                              \                               |
|   thread_num   |        设置CPU线程数        |     10      |                              \                               |

目前支持以下几种模型配置和label map,您可以通过修改 `--config_path``--label_map`使用这些模型,从而检测不同类型的内容:

| dataset                                                      | config_path                                                  | label_map                                                 |
| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------------------------- |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) word | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config | {0:"Table"}                                               |
| TableBank latex                                              | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_latex/config | {0:"Table"}                                               |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet)        | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config      | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"} |

* TableBank word和TableBank latex分别在word文档、latex文档数据集训练;
* 下载的TableBank数据集里同时包含word和latex。

MissPenguin's avatar
update  
MissPenguin committed
74
<a name="3"></a>
grasswolfs's avatar
grasswolfs committed
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
## 3. 后处理

版面分析检测包含多个类别,如果只想获取指定类别(如"Text"类别)的检测框、可以使用下述代码:

```python
# 接上面代码
# 首先过滤特定文本类型的区域
text_blocks = lp.Layout([b for b in layout if b.type=='Text'])
figure_blocks = lp.Layout([b for b in layout if b.type=='Figure'])

# 因为在图像区域内可能检测到文本区域,所以只需要删除它们
text_blocks = lp.Layout([b for b in text_blocks \
                   if not any(b.is_in(b_fig) for b_fig in figure_blocks)])

# 对文本区域排序并分配id
h, w = image.shape[:2]

left_interval = lp.Interval(0, w/2*1.05, axis='x').put_on_canvas(image)

left_blocks = text_blocks.filter_by(left_interval, center=True)
left_blocks.sort(key = lambda b:b.coordinates[1])

right_blocks = [b for b in text_blocks if b not in left_blocks]
right_blocks.sort(key = lambda b:b.coordinates[1])

# 最终合并两个列表,并按顺序添加索引
text_blocks = lp.Layout([b.set(id = idx) for idx, b in enumerate(left_blocks + right_blocks)])

# 显示结果
show_img = lp.draw_box(image, text_blocks,
            box_width=3,
            show_element_id=True)
show_img.show()
```

显示只有"Text"类别的结果:

<div align="center">
MissPenguin's avatar
update  
MissPenguin committed
113
<img src="../docs/table/result_text.jpg"  width = "600" />
grasswolfs's avatar
grasswolfs committed
114
115
</div>

MissPenguin's avatar
update  
MissPenguin committed
116
<a name="4"></a>
grasswolfs's avatar
grasswolfs committed
117
118
119
120
121
122
123
124
125
126
127
128
129
## 4. 指标

| Dataset   | mAP  | CPU time cost | GPU time cost |
| --------- | ---- | ------------- | ------------- |
| PubLayNet | 93.6 | 1713.7ms      | 66.6ms        |
| TableBank | 96.2 | 1968.4ms      | 65.1ms        |

**Envrionment:**

**CPU:**  Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz,24core

**GPU:**  a single NVIDIA Tesla P40

MissPenguin's avatar
update  
MissPenguin committed
130
<a name="5"></a>
grasswolfs's avatar
grasswolfs committed
131
132
133
## 5. 训练版面分析模型

上述模型基于[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 训练,如果您想训练自己的版面分析模型,请参考:[train_layoutparser_model](train_layoutparser_model_ch.md)