Merge remote-tracking branch 'origin/dygraph' into dygraph

Commit aa59fca5 authored Apr 28, 2022 by Leif
20 changed files
--- a/doc/table/pipeline.jpg
+++ b/doc/table/pipeline.jpg
--- a/doc/table/pipeline_en.jpg
+++ b/doc/table/pipeline_en.jpg
--- a/doc/table/ppstructure.GIF
+++ b/doc/table/ppstructure.GIF
--- a/doc/table/result_all.jpg
+++ b/doc/table/result_all.jpg
--- a/doc/table/result_text.jpg
+++ b/doc/table/result_text.jpg
--- a/doc/table/table.jpg
+++ b/doc/table/table.jpg
--- a/doc/table/tableocr_pipeline.jpg
+++ b/doc/table/tableocr_pipeline.jpg
--- a/doc/table/tableocr_pipeline_en.jpg
+++ b/doc/table/tableocr_pipeline_en.jpg
--- a/doc/vqa/input/zh_val_0.jpg
+++ b/doc/vqa/input/zh_val_0.jpg
--- a/doc/vqa/input/zh_val_21.jpg
+++ b/doc/vqa/input/zh_val_21.jpg
--- a/doc/vqa/input/zh_val_40.jpg
+++ b/doc/vqa/input/zh_val_40.jpg
--- a/doc/vqa/input/zh_val_42.jpg
+++ b/doc/vqa/input/zh_val_42.jpg
--- a/doc/vqa/result_re/zh_val_21_re.jpg
+++ b/doc/vqa/result_re/zh_val_21_re.jpg
--- a/doc/vqa/result_re/zh_val_40_re.jpg
+++ b/doc/vqa/result_re/zh_val_40_re.jpg
--- a/doc/vqa/result_ser/zh_val_0_ser.jpg
+++ b/doc/vqa/result_ser/zh_val_0_ser.jpg
--- a/doc/vqa/result_ser/zh_val_42_ser.jpg
+++ b/doc/vqa/result_ser/zh_val_42_ser.jpg
--- a/ppstructure/layout/README_ch.md
+++ b/ppstructure/layout/README_ch.md
 [English](README.md) | 简体中文
- [Layout Analysis Usage Guide](#版面分析使用说明)
-  - [1.  Install the whl package](#1--安装whl包)
-  - [2. Usage](#2-使用)
-  - [3. Post-processing](#3-后处理)
-  - [4. Metrics](#4-指标)
-  - [5. Train a layout analysis model](#5-训练版面分析模型)

 # Layout Analysis Usage Guide

+- [1. Install the whl package](#1)
+- [2. Usage](#2)
+- [3. Post-processing](#3)
+- [4. Metrics](#4)
+- [5. Train a layout analysis model](#5)
+
+
+<a name="1"></a>
 ## 1.  Install the whl package
 ```bash
 pip install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
 ```

+<a name="2"></a>
 ## 2. Usage

 Use layoutparser to detect the layout of a given document:
@@ -20,7 +23,7 @@ pip install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-a
 ```python
 import cv2
 import layoutparser as lp
-image = cv2.imread("doc/table/layout.jpg")
+image = cv2.imread("ppstructure/docs/table/layout.jpg")
 image = image[..., ::-1]

 # Load the model
@@ -40,7 +43,7 @@ show_img.show()
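 The figure below shows the result. Detection boxes of different colors represent different categories, and `show_element_type` displays the category name at the top-left corner of each box: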

 <div align="center">
-<img src="../../doc/table/result_all.jpg"  width = "600" />
+<img src="../docs/table/result_all.jpg"  width = "600" />
 </div>

 The parameters of `PaddleDetectionLayoutModel` are described below:
@@ -68,6 +71,7 @@ show_img.show()
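 * TableBank word and TableBank latex are trained on the Word-document and LaTeX-document datasets, respectively;
 * The downloaded TableBank dataset contains both the word and latex subsets.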

+<a name="3"></a>
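 ## 3. Post-processing

 Layout analysis detects multiple categories. To keep only the detection boxes of a specific category (for example, "Text"), use the following code: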
@@ -106,9 +110,10 @@ show_img.show()
 The result showing only the "Text" category:

 <div align="center">
-<img src="../../doc/table/result_text.jpg"  width = "600" />
+<img src="../docs/table/result_text.jpg"  width = "600" />
 </div>

+<a name="4"></a>
 ## 4. Metrics

 | Dataset   | mAP  | CPU time cost | GPU time cost |
@@ -122,6 +127,7 @@ show_img.show()

    **GPU：**  a single NVIDIA Tesla P40

+<a name="5"></a>
 ## 5. Train a layout analysis model

 The models above are trained with [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). To train your own layout analysis model, see [train_layoutparser_model](train_layoutparser_model_ch.md).
--- a/ppstructure/predict_system.py
+++ b/ppstructure/predict_system.py
@@ -23,9 +23,10 @@ sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))
 os.environ["FLAGS_allocator_strategy"] = 'auto_growth'
 import cv2
 import json
-import numpy as np
 import time
 import logging
+from copy import deepcopy
+from attrdict import AttrDict

 from ppocr.utils.utility import get_image_file_list, check_and_read_gif
 from ppocr.utils.logging import get_logger
@@ -40,97 +41,122 @@ class StructureSystem(object):
    def __init__(self, args):
        self.mode = args.mode
        if self.mode == 'structure':
-            import layoutparser as lp
-            # args.det_limit_type = 'resize_long'
-            args.drop_score = 0
            if not args.show_log:
                logger.setLevel(logging.INFO)
-            self.text_system = TextSystem(args)
-            self.table_system = TableSystem(args,
-                                            self.text_system.text_detector,
-                                            self.text_system.text_recognizer)
-
-            config_path = None
-            model_path = None
-            if os.path.isdir(args.layout_path_model):
-                model_path = args.layout_path_model
+            if args.layout == False and args.ocr == True:
+                args.ocr = False
+                logger.warning(
+                    "When args.layout is false, args.ocr is automatically set to false"
+                )
+            args.drop_score = 0
+            # init layout and ocr model
+            self.text_system = None
+            if args.layout:
+                import layoutparser as lp
+                config_path = None
+                model_path = None
+                if os.path.isdir(args.layout_path_model):
+                    model_path = args.layout_path_model
+                else:
+                    config_path = args.layout_path_model
+                self.table_layout = lp.PaddleDetectionLayoutModel(
+                    config_path=config_path,
+                    model_path=model_path,
+                    label_map=args.layout_label_map,
+                    threshold=0.5,
+                    enable_mkldnn=args.enable_mkldnn,
+                    enforce_cpu=not args.use_gpu,
+                    thread_num=args.cpu_threads)
+                if args.ocr:
+                    self.text_system = TextSystem(args)
+            else:
+                self.table_layout = None
+            if args.table:
+                if self.text_system is not None:
+                    self.table_system = TableSystem(
+                        args, self.text_system.text_detector,
+                        self.text_system.text_recognizer)
+                else:
+                    self.table_system = TableSystem(args)
            else:
-                config_path = args.layout_path_model
-            self.table_layout = lp.PaddleDetectionLayoutModel(
-                config_path=config_path,
-                model_path=model_path,
-                label_map=args.layout_label_map,
-                threshold=0.5,
-                enable_mkldnn=args.enable_mkldnn,
-                enforce_cpu=not args.use_gpu,
-                thread_num=args.cpu_threads)
-            self.use_angle_cls = args.use_angle_cls
-            self.drop_score = args.drop_score
+                self.table_system = None
+
        elif self.mode == 'vqa':
            raise NotImplementedError
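
The flag handling in the new `__init__` amounts to a small dependency rule: the text system is only created when layout analysis is enabled, so `ocr` is forced off whenever `layout` is off. A minimal standalone sketch of that rule, assuming a hypothetical helper `normalize_flags` (not part of this commit) and plain booleans in place of the real `args` object:

```python
import logging

logger = logging.getLogger("structure_flags")


def normalize_flags(layout, ocr, table):
    """Hypothetical helper mirroring the flag rule in StructureSystem.__init__.

    OCR depends on layout analysis, so it is switched off whenever
    layout analysis is disabled. The table flag is passed through unchanged.
    """
    if not layout and ocr:
        logger.warning(
            "When layout is false, ocr is automatically set to false")
        ocr = False
    return {"layout": layout, "ocr": ocr, "table": table}


flags = normalize_flags(layout=False, ocr=True, table=True)
print(flags)
```

With `layout=False`, only the table branch can still run, which matches the `TableSystem(args)` call in the hunk above that is made without a shared text detector and recognizer.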

-    def __call__(self, img):
+    def __call__(self, img, return_ocr_result_in_table=False):
        if self.mode == 'structure':
            ori_im = img.copy()
-            layout_res = self.table_layout.detect(img[..., ::-1])
+            if self.table_layout is not None:
+                layout_res = self.table_layout.detect(img[..., ::-1])
+            else:
+                h, w = ori_im.shape[:2]
+                layout_res = [AttrDict(coordinates=[0, 0, w, h], type='Table')]
            res_list = []
            for region in layout_res:
+                res = ''
                x1, y1, x2, y2 = region.coordinates
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
                roi_img = ori_im[y1:y2, x1:x2, :]
                if region.type == 'Table':
-                    res = self.table_system(roi_img)
+                    if self.table_system is not None:
+                        res = self.table_system(roi_img,
+                                                return_ocr_result_in_table)
                else:
-                    filter_boxes, filter_rec_res = self.text_system(roi_img)
-                    # remove style char
-                    style_token = [
-                        '<strike>', '<strike>', '<sup>', '</sub>', '<b>',
-                        '</b>', '<sub>', '</sup>', '<overline>', '</overline>',
-                        '<underline>', '</underline>', '<i>', '</i>'
-                    ]
-                    res = []
-                    for box, rec_res in zip(filter_boxes, filter_rec_res):
-                        rec_str, rec_conf = rec_res
-                        for token in style_token:
-                            if token in rec_str:
-                                rec_str = rec_str.replace(token, '')
-                        box += [x1, y1]
-                        res.append({
-                            'text': rec_str,
-                            'confidence': float(rec_conf),
-                            'text_region': box.tolist()
-                        })
+                    if self.text_system is not None:
+                        filter_boxes, filter_rec_res = self.text_system(roi_img)
+                        # remove style char
+                        style_token = [
+                            '<strike>', '<strike>', '<sup>', '</sub>', '<b>',
+                            '</b>', '<sub>', '</sup>', '<overline>',
+                            '</overline>', '<underline>', '</underline>', '<i>',
+                            '</i>'
+                        ]
+                        res = []
+                        for box, rec_res in zip(filter_boxes, filter_rec_res):
+                            rec_str, rec_conf = rec_res
+                            for token in style_token:
+                                if token in rec_str:
+                                    rec_str = rec_str.replace(token, '')
+                            box += [x1, y1]
+                            res.append({
+                                'text': rec_str,
+                                'confidence': float(rec_conf),
+                                'text_region': box.tolist()
+                            })
                res_list.append({
                    'type': region.type,
                    'bbox': [x1, y1, x2, y2],
                    'img': roi_img,
                    'res': res
                })
+            return res_list
        elif self.mode == 'vqa':
            raise NotImplementedError
-        return res_list
+        return None
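
When layout analysis is disabled, the new `__call__` treats the whole image as a single `Table` region. A sketch of that fallback, using `types.SimpleNamespace` as a stand-in for `AttrDict` (an assumption made to keep the example dependency-free) and a hypothetical function name `whole_image_region`:

```python
from types import SimpleNamespace


def whole_image_region(height, width):
    """Return a one-element layout result covering the full image.

    Coordinates follow the [x1, y1, x2, y2] order used in the diff.
    """
    return [SimpleNamespace(coordinates=[0, 0, width, height], type="Table")]


layout_res = whole_image_region(height=480, width=640)
for region in layout_res:
    x1, y1, x2, y2 = (int(v) for v in region.coordinates)
    print(region.type, [x1, y1, x2, y2])
```

Because the fallback exposes the same `coordinates` and `type` attributes as a detected region, the loop that follows in `__call__` does not need a separate code path.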


 def save_structure_res(res, save_folder, img_name):
    excel_save_folder = os.path.join(save_folder, img_name)
    os.makedirs(excel_save_folder, exist_ok=True)
+    res_cp = deepcopy(res)
    # save res
    with open(
            os.path.join(excel_save_folder, 'res.txt'), 'w',
            encoding='utf8') as f:
-        for region in res:
-            if region['type'] == 'Table':
+        for region in res_cp:
+            roi_img = region.pop('img')
+            f.write('{}\n'.format(json.dumps(region)))
+
+            if region['type'] == 'Table' and len(region[
+                    'res']) > 0 and 'html' in region['res']:
                excel_path = os.path.join(excel_save_folder,
                                          '{}.xlsx'.format(region['bbox']))
-                to_excel(region['res'], excel_path)
+                to_excel(region['res']['html'], excel_path)
            elif region['type'] == 'Figure':
-                roi_img = region['img']
                img_path = os.path.join(excel_save_folder,
                                        '{}.jpg'.format(region['bbox']))
                cv2.imwrite(img_path, roi_img)
-            else:
-                for text_result in region['res']:
-                    f.write('{}\n'.format(json.dumps(text_result)))
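
The rewritten `save_structure_res` writes one JSON line per region after removing the image, and it works on a deep copy so the caller's results keep their `img` entries. A minimal sketch of that pattern, assuming a hypothetical helper `regions_to_json_lines` and a nested list standing in for the cropped image array:

```python
import json
from copy import deepcopy


def regions_to_json_lines(res):
    """Serialize each region to a JSON string without mutating `res`."""
    res_cp = deepcopy(res)
    lines = []
    for region in res_cp:
        # In the real code the image is a NumPy array, which json.dumps
        # cannot serialize, so it is removed from the copy before dumping.
        region.pop("img")
        lines.append(json.dumps(region))
    return lines


results = [{
    "type": "Text",
    "bbox": [0, 0, 10, 10],
    "img": [[0, 0], [0, 0]],  # placeholder for the cropped image
    "res": [{"text": "hello", "confidence": 0.9}],
}]
lines = regions_to_json_lines(results)
print(lines[0])
```

The deep copy costs extra memory for large images, but it keeps `res` usable by any code that runs after saving.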


 def main(args):

--- a/ppstructure/table/README.md
+++ b/ppstructure/table/README.md
@@ -51,7 +51,7 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_tab
 wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
 cd ..
 # run
-python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=../doc/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table
+python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=./docs/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ./output/table
 ```
 Note: The above model is trained on the PubLayNet dataset and only supports English scanned documents. To recognize other scenarios, train your own models and replace the three fields `det_model_dir`, `rec_model_dir`, and `table_model_dir`.


--- a/ppstructure/table/README_ch.md
+++ b/ppstructure/table/README_ch.md
- [Table Recognition](#表格识别)
-  - [1. Table recognition pipeline](#1-表格识别-pipeline)
-  - [2. Performance](#2-性能)
-  - [3. Usage](#3-使用)
-    - [3.1 Quick start](#31-快速开始)
-    - [3.2 Training](#32-训练)
-    - [3.3 Evaluation](#33-评估)
-    - [3.4 Prediction](#34-预测)
+[English](README.md) | 简体中文

 # Table Recognition

+- [1. Table recognition pipeline](#1)
+- [2. Performance](#2)
+- [3. Usage](#3)
+    - [3.1 Quick start](#31)
+    - [3.2 Training](#32)
+    - [3.3 Evaluation](#33)
+    - [3.4 Prediction](#34)
+
+
+<a name="1"></a>
 ## 1. Table recognition pipeline

 Table recognition mainly relies on three models
@@ -18,7 +21,7 @@

 The detailed flowchart is shown below

-![tableocr_pipeline](../../doc/table/tableocr_pipeline.jpg)
+![tableocr_pipeline](../docs/table/tableocr_pipeline.jpg)

 Pipeline description:

@@ -28,7 +31,9 @@
 4. The cell recognition results and the table structure are combined to build the HTML string of the table.


+<a name="2"></a>
 ## 2. Performance
+
 We evaluated the algorithm on the PubTabNet<sup>[1]</sup> evaluation dataset. The results are as follows


@@ -37,8 +42,10 @@
 | EDD<sup>[2]</sup> | 88.3 |
 | Ours | 93.32 |

+<a name="3"></a>
 ## 3. Usage

+<a name="31"></a>
 ### 3.1 Quick start

 ```python
@@ -54,12 +61,13 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_tab
 wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
 cd ..
 # run prediction
-python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=../doc/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table
+python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=./docs/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ./output/table
 ```
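 After the run completes, the Excel table for each image is saved to the directory specified by the output field

 Note: The above model is a table recognition model trained on the PubLayNet dataset and only supports English scanned documents. To recognize other scenarios, train your own models and replace the three fields `det_model_dir`, `rec_model_dir`, and `table_model_dir`.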

+<a name="32"></a>
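 ### 3.2 Training

 This section only covers training of the table structure model. For training the [text detection](../../doc/doc_ch/detection.md) and [text recognition](../../doc/doc_ch/recognition.md) models, see the corresponding documents.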
@@ -89,6 +97,7 @@ python3 tools/train.py -c configs/table/table_mv3.yml -o Global.checkpoints=./yo

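 **Note**: `Global.checkpoints` takes priority over `Global.pretrain_weights`. When both parameters are specified, the model given by `Global.checkpoints` is loaded first; if the path given by `Global.checkpoints` is invalid, the model given by `Global.pretrain_weights` is loaded instead.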

+<a name="33"></a>
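 ### 3.3 Evaluation

 Tables are evaluated with [TEDS (Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src). Before evaluation, the three models in the pipeline must each be exported as inference models (we already provide them), and the evaluation ground truth (gt) must be prepared. An example gt is shown below: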
@@ -113,6 +122,8 @@ python3 table/eval_table.py --det_model_dir=path/to/det_model_dir --rec_model_di
 ```bash
 teds: 93.32
 ```
+
+<a name="34"></a>
 ### 3.4 Prediction

 ```python
@@ -120,6 +131,6 @@ cd PaddleOCR/ppstructure
 python3 table/predict_table.py --det_model_dir=path/to/det_model_dir --rec_model_dir=path/to/rec_model_dir --table_model_dir=path/to/table_model_dir --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table
 ```

-Reference
+# Reference
 1. https://github.com/ibm-aur-nlp/PubTabNet
 2. https://arxiv.org/pdf/1911.10683