English | [简体中文](README_ch.md)
- [Getting Started](#getting-started)
  - [1.  Install whl package](#1--install-whl-package)
  - [2. Quick Start](#2-quick-start)
  - [3. PostProcess](#3-postprocess)
  - [4. Results](#4-results)
  - [5. Training](#5-training)

# Getting Started

## 1.  Install whl package
```bash
wget https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
pip install -U layoutparser-0.0.0-py3-none-any.whl
```
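
A quick way to confirm the installation succeeded is to import the package and look up the model class used throughout this document (a minimal check; nothing is downloaded at this point):

```python
import layoutparser as lp

# Should print the class without raising ImportError / AttributeError
print(lp.PaddleDetectionLayoutModel)
```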

## 2. Quick Start

Use LayoutParser to identify the layout of a document:

```python
import cv2
import layoutparser as lp
image = cv2.imread("doc/table/layout.jpg")
image = image[..., ::-1]  # convert BGR (OpenCV default) to RGB

# load model
model = lp.PaddleDetectionLayoutModel(config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",
                                threshold=0.5,
                                label_map={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"},
                                enforce_cpu=False,
                                enable_mkldnn=True)
# detect
layout = model.detect(image)

# show result
show_img = lp.draw_box(image, layout, box_width=3, show_element_type=True)
show_img.show()
```

The following figure shows the result. Detection boxes of different colors represent different categories, and with `show_element_type=True` the category name is displayed in the upper-left corner of each box.

<div align="center">
<img src="../../doc/table/result_all.jpg"  width = "600" />
</div>
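
Besides visualizing the layout, you can also iterate over the detected blocks directly. A minimal sketch, using only the `type` and `coordinates` attributes that also appear in the post-processing example below:

```python
from collections import Counter

# Count detections per category
print(Counter(block.type for block in layout))

# Print each block's bounding box as (x1, y1) - (x2, y2)
for block in layout:
    x1, y1, x2, y2 = block.coordinates
    print(f"{block.type}: ({x1:.0f}, {y1:.0f}) - ({x2:.0f}, {y2:.0f})")
```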
The parameters of `PaddleDetectionLayoutModel` are described as follows:

|   parameter    |                       description                        |   default   |                            remark                            |
| :------------: | :------------------------------------------------------: | :---------: | :----------------------------------------------------------: |
|  config_path   |                    model config path                     |    None     | If config_path is specified, the model is downloaded automatically (only the first time; afterwards the cached model is reused) |
|   model_path   |                     local model path                     |    None     | At least one of config_path and model_path must be set; they cannot both be None |
|   threshold    |              threshold of prediction score               |     0.5     |                              \                               |
|  input_shape   |             input image size after resizing              | [3,640,640] |                              \                               |
|   batch_size   |                   inference batch size                   |      1      |                              \                               |
|   label_map    |                  category mapping table                  |    None     | Can be None when config_path is set (the label_map is then derived automatically from the dataset name); it must be specified manually when model_path is set |
|  enforce_cpu   |              whether to force CPU inference              |    False    |      False to use GPU, True to force the use of CPU          |
| enable_mkldnn  | whether MKL-DNN acceleration is enabled for CPU inference |    True     |                              \                               |
|   thread_num   |                 the number of CPU threads                |     10      |                              \                               |
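
For example, a minimal sketch of loading weights from a local directory instead of an `lp://` config URL; the path below is hypothetical, and when `model_path` is used the `label_map` must be given explicitly, as noted in the table:

```python
import layoutparser as lp

model = lp.PaddleDetectionLayoutModel(
    model_path="./models/ppyolov2_r50vd_dcn_365e_publaynet",  # hypothetical local path
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    threshold=0.5,
    enforce_cpu=True,     # force CPU inference
    enable_mkldnn=True,   # MKL-DNN acceleration for CPU inference
    thread_num=10)
```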

The following model configurations and label maps are currently supported. You can detect different types of content by changing `config_path` and `label_map` accordingly (a short example follows the notes below):

| dataset                                                      | config_path                                                  | label_map                                                 |
| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------------------------- |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) word | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config | {0:"Table"}                                               |
| TableBank latex                                              | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_latex/config | {0:"Table"}                                               |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet)        | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config      | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"} |

* TableBank word and TableBank latex are trained on datasets of Word documents and LaTeX documents respectively;
* The downloadable TableBank dataset contains both the Word and LaTeX subsets.
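
For example, a sketch of switching to the TableBank word model to detect only tables, with `config_path` and `label_map` taken from the table above (the image path is the same sample used earlier):

```python
import cv2
import layoutparser as lp

image = cv2.imread("doc/table/layout.jpg")[..., ::-1]  # BGR -> RGB

model = lp.PaddleDetectionLayoutModel(
    config_path="lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config",
    label_map={0: "Table"},
    threshold=0.5)

layout = model.detect(image)
tables = [b for b in layout if b.type == "Table"]
print(f"detected {len(tables)} table(s)")
```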

## 3. PostProcess

The layout result contains multiple categories. If you only want the detection boxes of a specific category (such as "Text"), you can use the following code:

```python
# follow the above code
# filter areas for a specific text type
text_blocks = lp.Layout([b for b in layout if b.type=='Text'])
figure_blocks = lp.Layout([b for b in layout if b.type=='Figure'])

# text areas may be detected inside figure areas; remove them
text_blocks = lp.Layout([b for b in text_blocks \
                   if not any(b.is_in(b_fig) for b_fig in figure_blocks)])

# sort text areas and assign ID
h, w = image.shape[:2]

left_interval = lp.Interval(0, w/2*1.05, axis='x').put_on_canvas(image)

left_blocks = text_blocks.filter_by(left_interval, center=True)
left_blocks.sort(key = lambda b:b.coordinates[1])

right_blocks = [b for b in text_blocks if b not in left_blocks]
right_blocks.sort(key = lambda b:b.coordinates[1])

# merge the two lists and assign IDs in reading order
text_blocks = lp.Layout([b.set(id = idx) for idx, b in enumerate(left_blocks + right_blocks)])

# display result
show_img = lp.draw_box(image, text_blocks,
            box_width=3,
            show_element_id=True)
show_img.show()
```
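
The sorted blocks can then be handed to downstream steps (for example, cropping each region for text recognition). A minimal sketch that converts them into plain Python records, using only the `id`, `type`, and `coordinates` attributes set in the code above:

```python
# Collect the ordered text regions as simple dicts for downstream use
regions = [
    {"id": b.id, "type": b.type, "bbox": [float(v) for v in b.coordinates]}
    for b in text_blocks
]
for r in regions:
    print(r["id"], r["bbox"])
```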

The result now shows only the "Text" category:

<div align="center">
<img src="../../doc/table/result_text.jpg"  width = "600" />
</div>

## 4. Results

| Dataset   | mAP  | CPU inference time | GPU inference time |
| --------- | ---- | ------------------ | ------------------ |
| PubLayNet | 93.6 | 1713.7ms      | 66.6ms        |
| TableBank | 96.2 | 1968.4ms      | 65.1ms        |

**Environment:**

**CPU:** Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 24 cores

**GPU:**  a single NVIDIA Tesla P40

## 5. Training

The above models are based on [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). If you want to train your own layout parser model, please refer to [train_layoutparser_model](train_layoutparser_model.md).