- [7.2 OCR and table recognition model](#72-ocr-and-table-recognition-model)
- [7.3 DOC-VQA model](#73-doc-vqa-model)
## 1. Introduction

PP-Structure is an OCR toolkit for analyzing and processing documents with complex structures, designed to help developers better complete document understanding tasks.
<aname="2"></a>
## 2. Update log
## 2. Update log
* 2022.02.12 DOC-VQA add LayoutLMv2 model。
* 2021.12.07 add [DOC-VQA SER and RE tasks](vqa/README.md)。
* 2021.12.07 add [DOC-VQA SER and RE tasks](vqa/README.md)。
<aname="3"></a>
## 3. Features
## 3. Features
The main features of PP-Structure are as follows:
The main features of PP-Structure are as follows:
- Support custom training for layout analysis and table structure tasks
- Support Document Visual Question Answering (DOC-VQA) tasks: Semantic Entity Recognition (SER) and Relation Extraction (RE)
The figure shows the pipeline of layout analysis + table recognition. The image is first divided into four types of regions (image, text, title, and table) by layout analysis. OCR detection and recognition are then performed on the text, title, and image regions, table recognition is performed on the table region, and the detected image regions are also saved for later use.
<aname="42"></a>
### 4.2 DOC-VQA
### 4.2 DOC-VQA
* SER
* SER
In the figure, the red box represents the question, the blue box represents the answer, and the question and answer are connected by green lines. The corresponding category and OCR recognition results are also marked at the top left of the OCR detection box.
<aname="5"></a>
## 5. Quick start
## 5. Quick start
Start from [Quick Installation](./docs/quickstart.md)
Start from [Quick Installation](./docs/quickstart.md)
<aname="6"></a>
## 6. PP-Structure System
## 6. PP-Structure System
<aname="61"></a>
### 6.1 Layout analysis and table recognition
### 6.1 Layout analysis and table recognition


Layout analysis classifies images by region, including the use of Python scripts for layout analysis, extraction of detection boxes for designated categories, performance metrics, and custom training of layout analysis models.
Table recognition converts table images into Excel documents, including the detection and recognition of table text and the prediction of table structure and cell coordinates. For detailed instructions, please refer to the [document](table/README.md)
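As a quick orientation before the detailed documents, the sketch below runs the layout analysis + table recognition pipeline through the `PPStructure` interface of the paddleocr package; the image path and output folder are placeholders, and argument details may vary between versions.

```python
import os
import cv2
from paddleocr import PPStructure, save_structure_res

table_engine = PPStructure(show_log=True)

img_path = 'ppstructure/docs/table/1.png'  # placeholder test image
save_folder = './output/table'

img = cv2.imread(img_path)
result = table_engine(img)  # layout analysis + OCR + table recognition

# write each detected region to disk; table regions become Excel files
save_structure_res(result, save_folder,
                   os.path.basename(img_path).split('.')[0])

for region in result:
    region.pop('img')  # drop the raw image crop before printing
    print(region['type'], region['bbox'])
```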
<aname="62"></a>
### 6.2 DOC-VQA
### 6.2 DOC-VQA
Document Visual Question Answering (DOC-VQA) if a type of Visual Question Answering (VQA), which includes Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks. Based on SER task, text recognition and classification in images can be completed. Based on THE RE task, we can extract the relation of the text content in the image, such as judge the problem pair. For details, please refer to [document](vqa/README.md)
Document Visual Question Answering (DOC-VQA) if a type of Visual Question Answering (VQA), which includes Semantic Entity Recognition (SER) and Relation Extraction (RE) tasks. Based on SER task, text recognition and classification in images can be completed. Based on THE RE task, we can extract the relation of the text content in the image, such as judge the problem pair. For details, please refer to [document](vqa/README.md)
<aname="7"></a>
## 7. Model List
## 7. Model List
PP-Structure系列模型列表(更新中)
PP-Structure Series Model List (Updating)
### 7.1 Layout analysis model

|model name|description|download|
| --- | --- | --- |
| ppyolov2_r50vd_dcn_365e_publaynet | The layout analysis model trained on the PubLayNet dataset can divide an image into 5 types of areas: **text, title, table, picture, and list** | [PubLayNet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) |
### 7.2 OCR and table recognition model

|model name|description|model size|download|
| --- | --- | --- | --- |
|ch_PP-OCRv2_det_slim|[New] Slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection| 3M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar)|
|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) |
|ch_PP-OCRv2_rec_slim|[New] Slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text recognition| 9M |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenes, trained on the PubTabNet dataset| 18.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |
### 7.3 DOC-VQA model

|model name|description|model size|download|
| --- | --- | --- | --- |
|ser_LayoutXLM_xfun_zh|SER model trained on the XFUN Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar) |
|re_LayoutXLM_xfun_zh|RE model trained on the XFUN Chinese dataset based on LayoutXLM|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar) |
If you need to use other models, you can download them from the [PPOCR model_list](../doc/doc_en/models_list_en.md) and the [PPStructure model_list](./docs/model_list.md)
- [Key Information Extraction (KIE)](#key-information-extraction-kie)
  - [1. Quick Use](#1-quick-use)
  - [2. Model Training](#2-model-training)
  - [3. Model Evaluation](#3-model-evaluation)
  - [4. Reference](#4-reference)
# Key Information Extraction (KIE)
This section provides a tutorial example on how to quickly use, train, and evaluate the key information extraction (KIE) model, SDMGR, in PaddleOCR.
[SDMGR (Spatial Dual-Modality Graph Reasoning)](https://arxiv.org/abs/2103.14470) is a KIE algorithm that classifies each detected text region into predefined categories, such as order ID, invoice number, and amount.
## 1. Quick Use
The [Wildreceipt dataset](https://paperswithcode.com/dataset/wildreceipt) is used for this tutorial. It contains 1765 photos, 25 classes, and 50000 text boxes, and can be downloaded as sketched below:
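A minimal Python sketch of the download; the tarball URL is an assumption based on PaddleOCR's public mirrors, so check the dataset page if it has moved.

```python
import tarfile
import urllib.request

# assumed mirror URL -- verify before relying on it
URL = "https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar"

urllib.request.urlretrieve(URL, "wildreceipt.tar")
with tarfile.open("wildreceipt.tar") as tar:
    tar.extractall(".")  # creates ./wildreceipt
```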
The visualization results are shown in the figure below:
<div align="center">
<img src="./imgs/0.png" width="800">
</div>
<aname="2-----"></a>
## 2. Model Training
## 2. Model Training
Create a softlink to the folder, `PaddleOCR/train_data`:
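For example, a small Python sketch; the paths assume wildreceipt was extracted next to the PaddleOCR checkout (an `ln -s` from the shell works just as well).

```python
import os

# assumed layout: ./wildreceipt extracted beside ./PaddleOCR
os.makedirs("PaddleOCR/train_data", exist_ok=True)
os.symlink(os.path.abspath("wildreceipt"),
           "PaddleOCR/train_data/wildreceipt")
```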
The configuration file used for training is `configs/kie/kie_unet_sdmgr.yml`.
Use LayoutParser to identify the layout of a document:
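A minimal sketch with the layoutparser PaddleDetection backend is shown below; the image path is a placeholder, the config and label map follow the PubLayNet model from the model table above, and exact parameter names may differ across layoutparser versions.

```python
import cv2
import layoutparser as lp

image = cv2.imread("doc/table/layout.jpg")[..., ::-1]  # BGR -> RGB

model = lp.PaddleDetectionLayoutModel(
    config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",
    threshold=0.5,
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})

layout = model.detect(image)  # a Layout of detected blocks
lp.draw_box(image, layout, box_width=3,
            show_element_type=True).show()
```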
The following model configurations and label maps are currently supported; you can select among them by changing `config_path` and `label_map` to detect different types of content.
* TableBank word and TableBank latex are trained on datasets of Word documents and LaTeX documents respectively;
* The downloaded TableBank dataset contains both the word and latex versions.
<aname="PostProcess"></a>
## 3. PostProcess
## 3. PostProcess
The Layout Parser output contains multiple categories. If you only want the detection boxes for a specific category (such as the "Text" category), you can use code like the following:
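A plausible filter, continuing from the detection sketch above (it reuses the `layout` result):

```python
import layoutparser as lp

# keep only blocks classified as "Text"
text_blocks = lp.Layout([b for b in layout if b.type == "Text"])

# optionally drop text boxes that fall inside detected figures
figure_blocks = lp.Layout([b for b in layout if b.type == "Figure"])
text_blocks = lp.Layout([b for b in text_blocks
                         if not any(b.is_in(fig) for fig in figure_blocks)])
```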
## 4. Performance
**GPU:** a single NVIDIA Tesla P40
<aname="Training"></a>
## 5. Training
## 5. Training
The above model is based on [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). If you want to train your own layout parser model, please refer to: [train_layoutparser_model](train_layoutparser_model.md)
For more installation tutorials, please refer to: [Install doc](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL_cn.md)
<aname="Data_preparation"></a>
## 2. Data preparation
## 2. Data preparation
Download the [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) dataset.
For other datasets, please refer to [PrepareDataSet](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/PrepareDataSet.md)
<aname="Configuration"></a>
## 3. Configuration
## 3. Configuration
We use the `configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml` configuration for training.
The `ppyolov2_r50vd_dcn_365e_coco.yml` configuration depends on other configuration files covering the dataset, runtime, model architecture, optimizer, and reader settings.
Modify the preceding files as needed, such as the dataset path and the batch size.
<aname="Training"></a>
## 4. Training
## 4. Training
PaddleDetection provides single-card and multi-card training modes to meet the various training needs of users:
`--draw_threshold` is an optional parameter. According to the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659), different thresholds will produce different results. `keep_top_k` represents the maximum number of output targets and defaults to 10. You can set different values according to your actual situation.
<aname="Deployment"></a>
## 6. Deployment
## 6. Deployment
Use your trained model in Layout Parser
Use your trained model in Layout Parser
<aname="Export_model"></a>
### 6.1 Export model
### 6.1 Export model
In the process of model training, the saved model file contains both the forward-prediction and back-propagation processes. In actual industrial deployment there is no need for back propagation, so the model should be converted into the format required for deployment. The `tools/export_model.py` script provided in PaddleDetection exports the model.
The prediction model is exported to the `inference/ppyolov2_r50vd_dcn_365e_coco` directory, including the inference configuration file and the model weight files.
For more model export tutorials, please refer to: [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/deploy/EXPORT_MODEL.md)
<aname="Inference"></a>
### 6.2 Inference
### 6.2 Inference
`model_path` represents the trained model path, and layoutparser is used to predict:
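A sketch of loading the exported model for prediction; the label map assumes the PubLayNet classes and should be replaced with your own training labels, and the parameter names follow the layoutparser PaddleDetection backend.

```python
import cv2
import layoutparser as lp

image = cv2.imread("doc/table/layout.jpg")[..., ::-1]  # BGR -> RGB

model = lp.PaddleDetectionLayoutModel(
    model_path="inference/ppyolov2_r50vd_dcn_365e_coco",
    threshold=0.5,
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})

print(model.detect(image))
```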
For more PaddleDetection training tutorials, please refer to: [PaddleDetection Training](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/GETTING_STARTED_cn.md)