- [Table Recognition](#table-recognition)
  - [1. Pipeline](#1-pipeline)
  - [2. Performance](#2-performance)
  - [3. How to use](#3-how-to-use)
    - [3.1 Quick start](#31-quick-start)
    - [3.2 Train](#32-train)
    - [3.3 Eval](#33-eval)
    - [3.4 Inference](#34-inference)


# Table Recognition

## 1. Pipeline
Table recognition mainly involves three models:
1. Single-line text detection: DB
2. Single-line text recognition: CRNN
3. Table structure and cell coordinate prediction: RARE

The table recognition flow chart is as follows:

![tableocr_pipeline](../docs/table/tableocr_pipeline_en.jpg)

1. The coordinates of each single line of text are detected by the DB model, and the text regions are then sent to the recognition model to obtain the recognition results.
2. The table structure and cell coordinates are predicted by the RARE model.
3. Each single-line recognition result is assigned to a cell by matching the line's coordinates against the cell coordinates (see the sketch below).
4. The cell recognition results and the table structure together construct the HTML string of the table.

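As an illustration of step 3, here is a minimal sketch (not the PaddleOCR implementation) of assigning recognized text lines to cells: each line is attached to the cell whose box contains the center of the line's bounding box, and the texts inside a cell are then joined.

```python
# Minimal sketch of line-to-cell matching (not the PaddleOCR implementation):
# attach each recognized line to the cell whose box contains the
# center of the line's bounding box, then join the texts per cell.

def box_center(box):
    # box: [x1, y1, x2, y2]
    return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2

def match_lines_to_cells(line_boxes, line_texts, cell_boxes):
    cell_texts = [[] for _ in cell_boxes]
    for box, text in zip(line_boxes, line_texts):
        cx, cy = box_center(box)
        for i, (x1, y1, x2, y2) in enumerate(cell_boxes):
            if x1 <= cx <= x2 and y1 <= cy <= y2:
                cell_texts[i].append(text)
                break
    # one string per cell, lines joined in detection order
    return [" ".join(texts) for texts in cell_texts]
```

A production implementation, such as the one invoked via `table/predict_table.py` below, additionally has to handle lines that overlap several cells or match no cell at all.
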
## 2. Performance
We evaluated the algorithm on the PubTabNet<sup>[1]</sup> eval dataset, and the performance is as follows:


|Method|[TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src)|
| --- | --- |
| EDD<sup>[2]</sup> | 88.3 |
| Ours | 93.32 |

## 3. How to use

### 3.1 Quick start

```shell
cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the text detection model of the ultra-lightweight English table OCR model and unzip it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar && tar xf en_ppocr_mobile_v2.0_table_det_infer.tar
# Download the text recognition model of the ultra-lightweight English table OCR model and unzip it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar && tar xf en_ppocr_mobile_v2.0_table_rec_infer.tar
# Download the table structure model of the ultra-lightweight English table OCR model and unzip it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..
# run
python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=./docs/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ./output/table
```
Note: The above models are trained on the PubTabNet dataset and only support English scanned document scenarios. To recognize images from other scenarios, you need to train models yourself and replace the models specified by the three fields `det_model_dir`, `rec_model_dir`, and `table_model_dir` accordingly.

After running, the Excel sheet for each image will be saved in the directory specified by the `output` field.
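
As a quick sanity check, a generated sheet can be opened with pandas; the file name below is hypothetical (it follows the input image name), and reading `.xlsx` files with pandas requires the `openpyxl` package:

```python
# Minimal sketch: inspect a generated Excel sheet.
import pandas as pd

df = pd.read_excel("./output/table/table.xlsx")  # hypothetical file name
print(df.head())
```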

### 3.2 Train

This section only covers training of the table structure model. For training the [text detection](../../doc/doc_en/detection_en.md) and [text recognition](../../doc/doc_en/recognition_en.md) models, please refer to the corresponding documents.

* Data preparation
The training data uses the public dataset [PubTabNet](https://arxiv.org/abs/1911.10683), which can be downloaded from the official [website](https://github.com/ibm-aur-nlp/PubTabNet). The PubTabNet dataset contains about 500,000 images, together with annotations in HTML format.
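
For orientation, the following sketch shows how such annotations can be inspected, assuming the jsonl layout described on the dataset page (one JSON record per line, with `filename`, `split`, and an `html` field holding structure tokens and per-cell tokens); the local file name below is hypothetical:

```python
# Minimal sketch: inspect the first PubTabNet annotation record,
# assuming the jsonl format described on the dataset page.
import json

with open("PubTabNet_2.0.0.jsonl") as f:  # hypothetical local path
    for line in f:
        record = json.loads(line)
        print(record["filename"], record["split"])
        print(record["html"]["structure"]["tokens"][:10])
        break  # only inspect the first record
```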

* Start training  
*If you have installed the CPU version of PaddlePaddle, please set the `use_gpu` field in the configuration file to `false`.*
```shell
# single GPU training
python3 tools/train.py -c configs/table/table_mv3.yml
# multi-GPU training
# Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/table/table_mv3.yml
```

In the commands above, `-c` selects the `configs/table/table_mv3.yml` configuration file for training.
For a detailed explanation of the configuration file, please refer to [config](../../doc/doc_en/config_en.md).
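
Individual configuration fields can also be overridden from the command line with `-o` (the same mechanism used with `Global.checkpoints` below), so the `use_gpu` change from the note above can be applied without editing the yml:

```shell
# Override a configuration field from the command line
# (equivalent to setting use_gpu: false in configs/table/table_mv3.yml)
python3 tools/train.py -c configs/table/table_mv3.yml -o Global.use_gpu=false
```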

* Load a trained model and continue training

To load a trained model and continue training, specify the path of the model to be loaded via the `Global.checkpoints` parameter.

```shell
python3 tools/train.py -c configs/table/table_mv3.yml -o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` has higher priority than `Global.pretrain_weights`; that is, when both parameters are specified, the model specified by `Global.checkpoints` is loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` is loaded instead.

### 3.3 Eval

The table model uses [TEDS (Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src) as the evaluation metric. Before evaluation, the three models in the pipeline need to be exported as inference models (we have provided them), and the ground truth (gt) for evaluation needs to be prepared. An example of the gt is as follows:
```json
{"PMC4289340_004_00.png": [
  ["<html>", "<body>", "<table>", "<thead>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</thead>", "<tbody>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>",  "</tbody>", "</table>", "</body>", "</html>"],
  [[1, 4, 29, 13], [137, 4, 161, 13], [215, 4, 236, 13], [1, 17, 30, 27], [137, 17, 147, 27], [215, 17, 225, 27]],
  [["<b>", "F", "e", "a", "t", "u", "r", "e", "</b>"], ["<b>", "G", "b", "3", " ", "+", "</b>"], ["<b>", "G", "b", "3", " ", "-", "</b>"], ["<b>", "P", "a", "t", "i", "e", "n", "t", "s", "</b>"], ["6", "2"], ["4", "5"]]
]}
```
In the gt json, each key is an image name and its value is the corresponding gt, a list of three items (see the sketch after this list):
1. The HTML token list of the table structure
2. The coordinates of each cell (excluding cells with empty text)
3. The text tokens of each cell (excluding cells with empty text)

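As referenced above, a minimal sketch for reading this gt format, rebuilding the plain text of each cell by joining its tokens (style tags such as `<b>` are kept as-is); the gt path is hypothetical:

```python
# Minimal sketch: load the gt json shown above and print per-cell text.
import json

with open("gt.json") as f:  # hypothetical path to the ground-truth file
    gt = json.load(f)

for image_name, (structure_tokens, cell_boxes, cell_tokens) in gt.items():
    cell_texts = ["".join(tokens) for tokens in cell_tokens]
    print(image_name, cell_texts)
```
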
Use the following command to evaluate. After the evaluation is completed, the TEDS metric will be printed.
```shell
cd PaddleOCR/ppstructure
python3 table/eval_table.py --det_model_dir=path/to/det_model_dir --rec_model_dir=path/to/rec_model_dir --table_model_dir=path/to/table_model_dir --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --gt_path=path/to/gt.json
```

If the PubTabNet eval dataset is used, the output will be:
```bash
teds: 93.32
```

### 3.4 Inference

```shell
cd PaddleOCR/ppstructure
python3 table/predict_table.py --det_model_dir=path/to/det_model_dir --rec_model_dir=path/to/rec_model_dir --table_model_dir=path/to/table_model_dir --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table
```
After running, the Excel sheet for each image will be saved in the directory specified by the `output` field.

References
1. https://github.com/ibm-aur-nlp/PubTabNet
2. https://arxiv.org/pdf/1911.10683