English | [简体中文](README_ch.md)


# Getting Started

[1. Install whl package](#Install)

[2. Quick Start](#QuickStart)

[3. PostProcess](#PostProcess)

[4. Results](#Results)

[5. Training](#Training)

<a name="Install"></a>

## 1. Install whl package
```bash
wget https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
pip install -U layoutparser-0.0.0-py3-none-any.whl
```

<a name="QuickStart"></a>

## 2. Quick Start

Use LayoutParser to identify the layout of a document:

```python
import cv2
import layoutparser as lp
image = cv2.imread("doc/table/layout.jpg")
# convert BGR (OpenCV's channel order) to RGB
image = image[..., ::-1]

# load model
model = lp.PaddleDetectionLayoutModel(config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",
                                threshold=0.5,
                                label_map={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"},
                                enforce_cpu=False,
                                enable_mkldnn=True)
# detect
layout = model.detect(image)

# show result
show_img = lp.draw_box(image, layout, box_width=3, show_element_type=True)
show_img.show()
```
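Each element of `layout` is a detected block carrying its category, confidence score, and bounding box. The loop below is a minimal sketch of inspecting them, using a hypothetical `Block` namedtuple as a stand-in for the objects that `model.detect` returns:

```python
from collections import namedtuple

# Hypothetical stand-in for a detected block; the attribute names
# (type, score, coordinates) mirror the blocks returned by model.detect.
Block = namedtuple("Block", ["type", "score", "coordinates"])

layout = [
    Block("Title", 0.98, (120.0, 40.0, 880.0, 90.0)),
    Block("Text", 0.91, (60.0, 120.0, 940.0, 480.0)),
]

lines = []
for block in layout:
    x1, y1, x2, y2 = block.coordinates
    lines.append(
        f"{block.type}: score={block.score:.2f} "
        f"box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})"
    )
print("\n".join(lines))
```

With a real `layout` from the code above, the same loop prints one line per detected region.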

The following figure shows the result: detection boxes of different colors represent different categories, and with `show_element_type=True` the category name is displayed in the upper-left corner of each box.

<div align="center">
<img src="../../doc/table/result_all.jpg"  width = "600" />
</div>
The parameters of `PaddleDetectionLayoutModel` are described as follows:

|   parameter   |                        description                         |   default   |                            remark                            |
| :-----------: | :--------------------------------------------------------: | :---------: | :----------------------------------------------------------: |
|  config_path  |                     model config path                      |    None     | Specifying config_path downloads the model automatically (only the first time; the cached model is reused afterwards) |
|  model_path   |                      local model path                      |    None     | config_path and model_path cannot both be None; at least one must be set |
|   threshold   |               threshold of prediction score                |     0.5     |                              \                               |
|  input_shape  |               image shape fed into the model               | [3,640,640] |                              \                               |
|  batch_size   |                    inference batch size                    |      1      |                              \                               |
|   label_map   |                   category mapping table                   |    None     | Can be None when config_path is set; the label map is then inferred from the dataset name |
|  enforce_cpu  |               whether to force CPU inference               |    False    |       False to use GPU, True to force the use of CPU         |
| enable_mkldnn | whether MKL-DNN acceleration is enabled for CPU prediction |    True     |                              \                               |
|  thread_num   |                 the number of CPU threads                  |     10      |                              \                               |
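For example, to force CPU inference with MKL-DNN enabled, the relevant parameters can be collected in a kwargs dict (a sketch; the config path is the PubLayNet one used above):

```python
# CPU-only settings for PaddleDetectionLayoutModel, gathered in one
# place so they are easy to inspect; pass them via **cpu_kwargs.
cpu_kwargs = dict(
    config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",
    threshold=0.5,
    enforce_cpu=True,    # True forces CPU even if a GPU is available
    enable_mkldnn=True,  # MKL-DNN acceleration for CPU prediction
    thread_num=10,       # number of CPU threads to use
)
# model = lp.PaddleDetectionLayoutModel(**cpu_kwargs)
```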

The following model configurations and label maps are currently supported; you can detect different types of content by changing `config_path` and `label_map`:

| dataset                                                      | config_path                                                  | label_map                                                 |
| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------------------------- |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) word | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config | {0:"Table"}                                               |
| TableBank latex                                              | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_latex/config | {0:"Table"}                                               |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet)        | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config      | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"} |

* TableBank word and TableBank latex are trained on datasets of Word documents and LaTeX documents respectively;
* The downloaded TableBank dataset contains both the word and latex subsets.
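Switching to table-only detection is then just a matter of swapping `config_path` and `label_map`, e.g. for TableBank word (a sketch using the values from the table above):

```python
# Detect only tables by pointing the model at a TableBank config.
tablebank_kwargs = dict(
    config_path="lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config",
    label_map={0: "Table"},  # TableBank models predict a single class
    threshold=0.5,
)
# model = lp.PaddleDetectionLayoutModel(**tablebank_kwargs)
```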

<a name="PostProcess"></a>

## 3. PostProcess

The layout result contains multiple categories. If you only want the detection boxes of a specific category (such as "Text"), you can use the following code:

```python
# follow the above code
# filter areas for a specific text type
text_blocks = lp.Layout([b for b in layout if b.type=='Text'])
figure_blocks = lp.Layout([b for b in layout if b.type=='Figure'])

# text areas may be detected within the image area, delete these areas
text_blocks = lp.Layout([b for b in text_blocks \
                   if not any(b.is_in(b_fig) for b_fig in figure_blocks)])

# sort text areas and assign ID
h, w = image.shape[:2]

left_interval = lp.Interval(0, w/2*1.05, axis='x').put_on_canvas(image)

left_blocks = text_blocks.filter_by(left_interval, center=True)
left_blocks.sort(key = lambda b:b.coordinates[1])

right_blocks = [b for b in text_blocks if b not in left_blocks]
right_blocks.sort(key = lambda b:b.coordinates[1])

# the two lists are merged and the indexes are added in order
text_blocks = lp.Layout([b.set(id = idx) for idx, b in enumerate(left_blocks + right_blocks)])

# display result
show_img = lp.draw_box(image, text_blocks,
            box_width=3,
            show_element_id=True)
show_img.show()
```
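The left/right column logic above can be sketched with plain `(x1, y1, x2, y2)` boxes, independent of layoutparser, to make the reading-order rule explicit:

```python
# Sketch of the two-column reading-order rule: a box belongs to the
# left column if its horizontal center lies left of ~half the page
# width, each column is sorted top-to-bottom, and the left column
# precedes the right one.
def order_two_columns(boxes, page_width):
    mid = page_width / 2 * 1.05
    left = [b for b in boxes if (b[0] + b[2]) / 2 <= mid]
    right = [b for b in boxes if b not in left]
    left.sort(key=lambda b: b[1])   # sort by top edge (y1)
    right.sort(key=lambda b: b[1])
    return left + right

boxes = [(520, 80, 900, 200), (40, 300, 480, 420), (40, 60, 480, 180)]
ordered = order_two_columns(boxes, page_width=1000)
```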

The figure below shows the result with only the "Text" category kept:

<div align="center">
<img src="../../doc/table/result_text.jpg"  width = "600" />
</div>
<a name="Results"></a>

## 4. Results

| Dataset   | mAP  | CPU time cost | GPU time cost |
| --------- | ---- | ------------- | ------------- |
| PubLayNet | 93.6 | 1713.7ms      | 66.6ms        |
| TableBank | 96.2 | 1968.4ms      | 65.1ms        |
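Timings like those above can be reproduced with a simple wall-clock helper (a sketch; `model` and `image` from the Quick Start are assumed, and the exact methodology behind the table's numbers is not specified here):

```python
import time

def time_ms(fn, *args, warmup=2, runs=10):
    """Average wall-clock time of fn(*args) in milliseconds."""
    for _ in range(warmup):       # warm-up runs are excluded
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs * 1000.0

# e.g.: avg = time_ms(model.detect, image)
```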

**Environment:**

**CPU:** Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 24 cores

**GPU:**  a single NVIDIA Tesla P40

<a name="Training"></a>

## 5. Training

The above models are trained with [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). If you want to train your own layout parser model, please refer to [train_layoutparser_model](train_layoutparser_model.md).