# Getting Started

[1. Install whl package](#install-whl-package)

[2. Quick Start](#quick-start)

[3. PostProcess](#PostProcess)

[4. Results](#Results)

[5. Training](#Training)

<a name="install-whl-package"></a>

## 1.  Install whl package
```bash
wget https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
pip install -U layoutparser-0.0.0-py3-none-any.whl
```
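
After installation, a quick sanity check confirms the package is importable (`__file__` is a standard Python module attribute, not specific to this wheel):

```python
# verify that the custom layoutparser wheel is importable
import layoutparser as lp

print(lp.__file__)  # should point into your site-packages directory
```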

<a name="Quick Start"></a>

## 2. Quick Start

Use LayoutParser to identify the layout of a given document:

```python
import cv2
import layoutparser as lp

# read the image and convert BGR (the OpenCV default) to RGB
image = cv2.imread("doc/table/layout.jpg")
image = image[..., ::-1]

# load model
model = lp.PaddleDetectionLayoutModel(config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",
                                      threshold=0.5,
                                      label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
                                      enforce_cpu=False,
                                      enable_mkldnn=True)

# detect
layout = model.detect(image)

# show result
show_img = lp.draw_box(image, layout, box_width=3, show_element_type=True)
show_img.show()
```

The following figure shows the result. Detection boxes of different colors represent different categories, and with `show_element_type=True` the category name is displayed in the upper-left corner of each box.

<div align="center">
<img src="../../doc/table/result_all.jpg"  width = "600" />
</div>
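
The returned `layout` can also be inspected programmatically. A minimal sketch, assuming the blocks expose the `type`, `score`, and `coordinates` attributes of the upstream layoutparser block interface:

```python
# iterate over detected blocks and print category, confidence, and bounding box
for block in layout:
    print(block.type, round(block.score, 3), block.coordinates)
```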

The parameters of `PaddleDetectionLayoutModel` are described as follows:

|   parameter    |                       description                        |   default   |                            remark                            |
| :------------: | :------------------------------------------------------: | :---------: | :----------------------------------------------------------: |
|  config_path   |                    model config path                     |    None     | if `config_path` is specified, the model is downloaded automatically (only the first time; the local copy is reused afterwards) |
|   model_path   |                        model path                        |    None     | local model path; `config_path` and `model_path` cannot both be None |
|   threshold    |              threshold of prediction score               |     0.5     |                              \                               |
|  input_shape   |                image size after resizing                 | [3,640,640] |                              \                               |
|   batch_size   |                   inference batch size                   |      1      |                              \                               |
|   label_map    |                  category mapping table                  |    None     | can be None when `config_path` is set; the label map is then derived automatically from the dataset |
|  enforce_cpu   |                    whether to use CPU                    |    False    |           False to use GPU, True to force CPU                |
| enable_mkldnn  | whether MKL-DNN acceleration is enabled for CPU prediction |    True     |                              \                               |
|   thread_num   |                  number of CPU threads                   |     10      |                              \                               |
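
For example, to force CPU inference with MKL-DNN on a locally downloaded model (a sketch; `./publaynet_model` is a hypothetical local path, and the parameter values mirror the table above):

```python
# a minimal sketch: CPU-only inference from a local model directory
# "./publaynet_model" is a hypothetical path to an already-downloaded model
model = lp.PaddleDetectionLayoutModel(
    model_path="./publaynet_model",
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    threshold=0.5,
    enforce_cpu=True,    # force CPU prediction
    enable_mkldnn=True,  # enable MKL-DNN acceleration on CPU
    thread_num=10)       # number of CPU threads
layout = model.detect(image)
```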

The following model configurations and label maps are currently supported; you can detect different types of content by modifying `config_path` and `label_map` (see the sketch after the notes below):

| dataset                                                      | config_path                                                  | label_map                                                 |
| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------------------------- |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) word | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config | {0:"Table"}                                               |
| TableBank latex                                              | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_latex/config | {0:"Table"}                                               |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet)        | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config      | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"} |

* TableBank word and TableBank latex are trained on datasets of Word documents and LaTeX documents respectively;
* The downloaded TableBank dataset contains both the word and latex parts.
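
For instance, a table-only detector can be built by pointing `config_path` at the TableBank word model (a sketch; the config path and label map come straight from the table above):

```python
# a minimal sketch: detect only tables with the TableBank word model
table_model = lp.PaddleDetectionLayoutModel(
    config_path="lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config",
    threshold=0.5,
    label_map={0: "Table"})
table_layout = table_model.detect(image)
```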

<a name="PostProcess"></a>

## 3. PostProcess

The detected layout contains multiple categories. If you only want the detection boxes of a specific category (such as "Text"), you can use the following code:

```python
# follow the above code
# filter out blocks of a specific type
text_blocks = lp.Layout([b for b in layout if b.type == 'Text'])
figure_blocks = lp.Layout([b for b in layout if b.type == 'Figure'])

# text blocks may be detected inside figure regions; remove them
text_blocks = lp.Layout([b for b in text_blocks
                         if not any(b.is_in(b_fig) for b_fig in figure_blocks)])

# sort the text blocks and assign IDs
h, w = image.shape[:2]

left_interval = lp.Interval(0, w / 2 * 1.05, axis='x').put_on_canvas(image)

left_blocks = text_blocks.filter_by(left_interval, center=True)
left_blocks.sort(key=lambda b: b.coordinates[1])

right_blocks = [b for b in text_blocks if b not in left_blocks]
right_blocks.sort(key=lambda b: b.coordinates[1])

# merge the two lists and assign indexes in reading order
text_blocks = lp.Layout([b.set(id=idx) for idx, b in enumerate(left_blocks + right_blocks)])

# display result
show_img = lp.draw_box(image, text_blocks,
                       box_width=3,
                       show_element_id=True)
show_img.show()
```

The figure below shows the result, keeping only the "Text" category:

<div align="center">
<img src="../../doc/table/result_text.jpg"  width = "600" />
</div>
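
After post-processing, the sorted text regions can be cropped out of the page for downstream OCR. A minimal sketch using plain NumPy slicing, assuming each block's `coordinates` is an `(x1, y1, x2, y2)` tuple as in upstream layoutparser:

```python
# a minimal sketch: crop every sorted text block out of the original image
crops = []
for block in text_blocks:
    x1, y1, x2, y2 = map(int, block.coordinates)
    crops.append(image[y1:y2, x1:x2])  # numpy slicing: rows = y, cols = x
```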
<a name="Results"></a>

## 4. Results

| Dataset   | mAP  | CPU time cost | GPU time cost |
| --------- | ---- | ------------- | ------------- |
| PubLayNet | 93.6 | 1713.7ms      | 66.6ms        |
| TableBank | 96.2 | 1968.4ms      | 65.1ms        |

**Environment:**

**CPU:**  Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 24 cores

**GPU:**  a single NVIDIA Tesla P40

<a name="Training"></a>

## 5. Training

The above models are based on [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). If you want to train your own layout parser model, please refer to: [train_layoutparser_model](train_layoutparser_model_en.md)