Commit 86b90aa9 authored by Leif

Merge remote-tracking branch 'origin/dygraph' into dygraph

parents 801b5771 8fe1b8d3
...@@ -13,7 +13,6 @@ English | [简体中文](README_ch.md) ...@@ -13,7 +13,6 @@ English | [简体中文](README_ch.md)
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a> <a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a> <a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/pypi/format/PaddleOCR?color=c77"></a> <a href=""><img src="https://img.shields.io/pypi/format/PaddleOCR?color=c77"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleOCR?color=9ea"></a>
<a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a> <a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a> <a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a>
</p> </p>
...@@ -24,7 +23,8 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools ...@@ -24,7 +23,8 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
**Recent updates** **Recent updates**
- 2021.12.21 The OCR open-source online course begins. Lessons start at 8:30 pm every evening and run for ten days. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 Released PaddleOCR v2.4, adding 1 text detection algorithm (PSENet), 3 text recognition algorithms (NRTR, SEED, SAR), 1 key information extraction algorithm (SDMGR), and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- PaddleOCR R&D team would like to share the key points of PP-OCRv2, at 20:15 pm on September 8th, [Course Address](https://aistudio.baidu.com/aistudio/education/group/info/6758). - PaddleOCR R&D team would like to share the key points of PP-OCRv2, at 20:15 pm on September 8th, [Course Address](https://aistudio.baidu.com/aistudio/education/group/info/6758).
- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile. - 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files). - 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
...@@ -38,7 +38,11 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools ...@@ -38,7 +38,11 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
- Ultra lightweight PP-OCR mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M - Ultra lightweight PP-OCR mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
- General PP-OCR server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M - General PP-OCR server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition - Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-language recognition: Korean, Japanese, German, French - Support multi-language recognition: about 80 languages, such as Korean, Japanese, German, and French
- Document structuring system PP-Structure
- Support layout analysis and table recognition (with export to Excel)
- Support key information extraction
- Support DocVQA
- Rich toolkits related to the OCR areas - Rich toolkits related to the OCR areas
- Semi-automatic data annotation tool, i.e., PPOCRLabel: support fast and efficient data annotation - Semi-automatic data annotation tool, i.e., PPOCRLabel: support fast and efficient data annotation
- Data synthesis tool, i.e., Style-Text: easy to synthesize a large number of images which are similar to the target scene image - Data synthesis tool, i.e., Style-Text: easy to synthesize a large number of images which are similar to the target scene image
......
...@@ -9,7 +9,6 @@ ...@@ -9,7 +9,6 @@
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a> <a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a> <a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/pypi/format/PaddleOCR?color=c77"></a> <a href=""><img src="https://img.shields.io/pypi/format/PaddleOCR?color=c77"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleOCR?color=9ea"></a>
<a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a> <a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a> <a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a>
</p> </p>
...@@ -20,11 +19,13 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ...@@ -20,11 +19,13 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
## 近期更新 ## 近期更新
- 2021.12.21 《OCR十讲》课程开讲,12月21日起每晚八点半线上授课! 【免费】报名地址:https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 发布PaddleOCR v2.4。OCR算法新增1种文本检测算法(PSENet),3种文本识别算法(NRTR、SEED、SAR);文档结构化算法新增1种关键信息提取算法(SDMGR),3种DocVQA算法(LayoutLM、LayoutLMv2,LayoutXLM)。
- PaddleOCR研发团队对最新发版内容技术深入解读,9月8日晚上20:15,[课程回放](https://aistudio.baidu.com/aistudio/education/group/info/6758) - PaddleOCR研发团队对最新发版内容技术深入解读,9月8日晚上20:15,[课程回放](https://aistudio.baidu.com/aistudio/education/group/info/6758)
- 2021.9.7 发布PaddleOCR v2.3与[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。 - 2021.9.7 发布PaddleOCR v2.3与[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。 - 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
> 完整PaddleOCR更新时间线可参考[文档](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/update.md) > [更多](./doc/doc_ch/update.md)
## 特性 ## 特性
...@@ -33,11 +34,14 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ...@@ -33,11 +34,14 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
- 超轻量PP-OCR mobile移动端系列:检测(3.0M)+方向分类器(1.4M)+ 识别(5.0M)= 9.4M - 超轻量PP-OCR mobile移动端系列:检测(3.0M)+方向分类器(1.4M)+ 识别(5.0M)= 9.4M
- 通用PPOCR server系列:检测(47.1M)+方向分类器(1.4M)+ 识别(94.9M)= 143.4M - 通用PPOCR server系列:检测(47.1M)+方向分类器(1.4M)+ 识别(94.9M)= 143.4M
- 支持中英文数字组合识别、竖排文本识别、长文本识别 - 支持中英文数字组合识别、竖排文本识别、长文本识别
- 支持多语言识别:韩语、日语、德语、法语等 - 支持多语言识别:韩语、日语、德语、法语等约80种语言
- PP-Structure文档结构化系统
- 支持版面分析与表格识别(含Excel导出)
- 支持关键信息提取任务
- 支持DocVQA任务
- 丰富易用的OCR相关工具组件 - 丰富易用的OCR相关工具组件
- 半自动数据标注工具PPOCRLabel:支持快速高效的数据标注 - 半自动数据标注工具PPOCRLabel:支持快速高效的数据标注
- 数据合成工具Style-Text:批量合成大量与目标场景类似的图像 - 数据合成工具Style-Text:批量合成大量与目标场景类似的图像
- 文档分析能力PP-Structure:支持版面分析与表格识别(含Excel导出)
- 支持用户自定义训练,提供丰富的预测推理部署方案 - 支持用户自定义训练,提供丰富的预测推理部署方案
- 支持PIP快速安装使用 - 支持PIP快速安装使用
- 可运行于Linux、Windows、MacOS等多种系统 - 可运行于Linux、Windows、MacOS等多种系统
...@@ -56,6 +60,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力 ...@@ -56,6 +60,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
<div align="center"> <div align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus.PNG" width = "200" height = "200" /> <img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus.PNG" width = "200" height = "200" />
</div> </div>
## 零代码体验 ## 零代码体验
- 在线网站体验:超轻量PP-OCR mobile模型体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr - 在线网站体验:超轻量PP-OCR mobile模型体验地址:https://www.paddlepaddle.org.cn/hub/scene/ocr
......
Global:
  use_gpu: True
  epoch_num: 60
  log_smooth_window: 20
  print_batch_step: 50
  save_model_dir: ./output/kie_5/
  save_epoch_step: 50
  # evaluation is run every 80 iterations after the 0th iteration
  eval_batch_step: [ 0, 80 ]
  # 1. If pretrained_model is saved in static mode, such as classification pretrained model
  # from static branch, load_static_weights must be set as True.
  # 2. If you want to finetune the pretrained models we provide in the docs,
  # you should set load_static_weights as False.
  load_static_weights: False
  cal_metric_during_train: False
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  class_path: ./train_data/wildreceipt/class_list.txt
  infer_img: ./train_data/wildreceipt/1.txt
  save_res_path: ./output/sdmgr_kie/predicts_kie.txt
  img_scale: [ 1024, 512 ]
Architecture:
  model_type: kie
  algorithm: SDMGR
  Transform:
  Backbone:
    name: Kie_backbone
  Head:
    name: SDMGRHead
Loss:
  name: SDMGRLoss
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Piecewise
    learning_rate: 0.001
    decay_epochs: [ 60, 80, 100]
    values: [ 0.001, 0.0001, 0.00001]
    warmup_epoch: 2
  regularizer:
    name: 'L2'
    factor: 0.00005
PostProcess:
  name: None
Metric:
  name: KIEMetric
  main_indicator: hmean
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/wildreceipt/
    label_file_list: [ './train_data/wildreceipt/wildreceipt_train.txt' ]
    ratio_list: [ 1.0 ]
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - KieLabelEncode: # Class handling label
          character_dict_path: ./train_data/wildreceipt/dict.txt
      - KieResize:
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'image', 'relations', 'texts', 'points', 'labels', 'tag', 'shape'] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 4
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/wildreceipt
    label_file_list:
      - ./train_data/wildreceipt/wildreceipt_test.txt
      # - /paddle/data/PaddleOCR/train_data/wildreceipt/1.txt
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - KieLabelEncode: # Class handling label
          character_dict_path: ./train_data/wildreceipt/dict.txt
      - KieResize:
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'image', 'relations', 'texts', 'points', 'labels', 'tag', 'ori_image', 'ori_boxes', 'shape']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 4
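The `NormalizeImage` entries in the config above use `scale: 1` with per-channel `mean` and `std` in `hwc` order, followed by `ToCHWImage`. As a minimal sketch, assuming the transform computes `(img * scale - mean) / std` per channel, the effect on an RGB array can be illustrated as follows. The helper names `normalize_hwc` and `to_chw` are hypothetical and are not part of PaddleOCR.

```python
import numpy as np

# Values copied from the NormalizeImage config above.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)


def normalize_hwc(img, scale=1.0, mean=MEAN, std=STD):
    """Normalize an H x W x 3 image channel-wise (hypothetical helper)."""
    img = np.asarray(img, dtype=np.float32)
    # Reshape to (1, 1, 3) so the statistics broadcast over height and width.
    mean = mean.reshape(1, 1, 3)
    std = std.reshape(1, 1, 3)
    return (img * scale - mean) / std


def to_chw(img):
    """Move channels first, mirroring the ToCHWImage step in the config."""
    return img.transpose(2, 0, 1)
```

With `scale: 1` the pixel values stay in the 0 to 255 range before the mean is subtracted, which is why the mean and std are given on that scale rather than in 0 to 1.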
...@@ -28,6 +28,7 @@ Optimizer: ...@@ -28,6 +28,7 @@ Optimizer:
lr: lr:
name: Cosine name: Cosine
learning_rate: 0.001 learning_rate: 0.001
warmup_epoch: 5
regularizer: regularizer:
name: 'L2' name: 'L2'
factor: 0.00004 factor: 0.00004
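The hunk above adds `warmup_epoch: 5` to a `Cosine` schedule with `learning_rate: 0.001`. The sketch below shows what such a schedule typically looks like, assuming linear warmup from 0 followed by cosine annealing to 0 over the remaining epochs. The function `lr_at_epoch` and the `total_epochs` value are illustrative assumptions, and the framework's exact per-step behaviour may differ.

```python
import math


def lr_at_epoch(epoch, base_lr=0.001, warmup_epoch=5, total_epochs=100):
    """Learning rate at a given 0-based epoch (hypothetical helper)."""
    if epoch < warmup_epoch:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * epoch / warmup_epoch
    # Cosine annealing from base_lr down to 0 after warmup.
    progress = (epoch - warmup_epoch) / (total_epochs - warmup_epoch)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

The warmup phase keeps early updates small while the optimizer statistics settle, after which the rate decays smoothly.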
......
...@@ -28,6 +28,7 @@ Optimizer: ...@@ -28,6 +28,7 @@ Optimizer:
lr: lr:
name: Cosine name: Cosine
learning_rate: 0.001 learning_rate: 0.001
warmup_epoch: 5
regularizer: regularizer:
name: 'L2' name: 'L2'
factor: 0.00001 factor: 0.00001
......
...@@ -75,7 +75,7 @@ Train: ...@@ -75,7 +75,7 @@ Train:
channel_first: False channel_first: False
- SEEDLabelEncode: # Class handling label - SEEDLabelEncode: # Class handling label
- RecResizeImg: - RecResizeImg:
character_type: en character_dict_path:
image_shape: [3, 64, 256] image_shape: [3, 64, 256]
padding: False padding: False
- KeepKeys: - KeepKeys:
...@@ -96,7 +96,7 @@ Eval: ...@@ -96,7 +96,7 @@ Eval:
channel_first: False channel_first: False
- SEEDLabelEncode: # Class handling label - SEEDLabelEncode: # Class handling label
- RecResizeImg: - RecResizeImg:
character_type: en character_dict_path:
image_shape: [3, 64, 256] image_shape: [3, 64, 256]
padding: False padding: False
- KeepKeys: - KeepKeys:
......
...@@ -103,7 +103,7 @@ opencv3/ ...@@ -103,7 +103,7 @@ opencv3/
#### 1.2.1 直接下载安装 #### 1.2.1 直接下载安装
* [Paddle预测库官网](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0/guides/05_inference_deployment/inference/build_and_install_lib_cn.html) 上提供了不同cuda版本的Linux预测库,可以在官网查看并选择合适的预测库版本(*建议选择paddle版本>=2.0.1版本的预测库* )。 * [Paddle预测库官网](https://paddle-inference.readthedocs.io/en/latest/user_guides/download_lib.html) 上提供了不同cuda版本的Linux预测库,可以在官网查看并选择合适的预测库版本(*建议选择paddle版本>=2.0.1版本的预测库* )。
* 下载之后使用下面的方法解压。 * 下载之后使用下面的方法解压。
...@@ -119,7 +119,7 @@ tar -xf paddle_inference.tgz ...@@ -119,7 +119,7 @@ tar -xf paddle_inference.tgz
```shell ```shell
git clone https://github.com/PaddlePaddle/Paddle.git git clone https://github.com/PaddlePaddle/Paddle.git
git checkout release/2.1 git checkout develop
``` ```
* 进入Paddle目录后,编译方法如下。 * 进入Paddle目录后,编译方法如下。
......
...@@ -79,7 +79,7 @@ opencv3/ ...@@ -79,7 +79,7 @@ opencv3/
#### 1.2.1 Direct download and installation #### 1.2.1 Direct download and installation
[Paddle inference library official website](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0/guides/05_inference_deployment/inference/build_and_install_lib_cn.html). You can view and select the appropriate version of the inference library on the official website. [Paddle inference library official website](https://paddle-inference.readthedocs.io/en/latest/user_guides/download_lib.html). You can view and select the appropriate version of the inference library on the official website.
* After downloading, use the following method to uncompress. * After downloading, use the following method to uncompress.
...@@ -97,7 +97,7 @@ Finally you can see the following files in the folder of `paddle_inference/`. ...@@ -97,7 +97,7 @@ Finally you can see the following files in the folder of `paddle_inference/`.
```shell ```shell
git clone https://github.com/PaddlePaddle/Paddle.git git clone https://github.com/PaddlePaddle/Paddle.git
git checkout release/2.1 git checkout develop
``` ```
* After entering the Paddle directory, the commands to compile the paddle inference library are as follows. * After entering the Paddle directory, the commands to compile the paddle inference library are as follows.
......
...@@ -45,63 +45,67 @@ PaddleOCR operating environment and Paddle Serving operating environment are nee ...@@ -45,63 +45,67 @@ PaddleOCR operating environment and Paddle Serving operating environment are nee
``` ```
3. Install the client to send requests to the service 3. Install the client to send requests to the service
In [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md) find the client installation package corresponding to the python version.
The python3.7 version is recommended here:
``` ```bash
wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl # Install serving, used to start the service
pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post102-py3-none-any.whl
``` pip3 install paddle_serving_server_gpu-0.7.0.post102-py3-none-any.whl
# For a CUDA 10.1 environment, use the commands below to install paddle-serving-server
4. Install serving-app # wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post101-py3-none-any.whl
``` # pip3 install paddle_serving_server_gpu-0.7.0.post101-py3-none-any.whl
pip3 install paddle-serving-app==0.6.1
``` # Install the client, used to send requests to the service
wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.7.0-cp37-none-any.whl
pip3 install paddle_serving_client-0.7.0-cp37-none-any.whl
# Install serving-app
wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.7.0-py3-none-any.whl
pip3 install paddle_serving_app-0.7.0-py3-none-any.whl
```
**note:** If you want to install the latest version of PaddleServing, refer to [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md). **note:** If you want to install the latest version of PaddleServing, refer to [link](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Latest_Packages_CN.md).
<a name="model-conversion"></a> <a name="model-conversion"></a>
## Model conversion ## Model conversion
When using PaddleServing for service deployment, you need to convert the saved inference model into a serving model that is easy to deploy. When using PaddleServing for service deployment, you need to convert the saved inference model into a serving model that is easy to deploy.
Firstly, download the [inference model](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-20-series-model-listupdate-on-dec-15) of PPOCR Firstly, download the [inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/README_ch.md#pp-ocr%E7%B3%BB%E5%88%97%E6%A8%A1%E5%9E%8B%E5%88%97%E8%A1%A8%E6%9B%B4%E6%96%B0%E4%B8%AD) of PPOCR
``` ```
# Download and unzip the OCR text detection model # Download and unzip the OCR text detection model
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar -O ch_PP-OCRv2_det_infer.tar && tar -xf ch_PP-OCRv2_det_infer.tar
# Download and unzip the OCR text recognition model # Download and unzip the OCR text recognition model
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar -O ch_PP-OCRv2_rec_infer.tar && tar -xf ch_PP-OCRv2_rec_infer.tar
``` ```
Then, you can use installed paddle_serving_client tool to convert inference model to mobile model. Then, you can use installed paddle_serving_client tool to convert inference model to mobile model.
``` ```
# Detection model conversion # Detection model conversion
python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_mobile_v2.0_det_infer/ \ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_det_infer/ \
--model_filename inference.pdmodel \ --model_filename inference.pdmodel \
--params_filename inference.pdiparams \ --params_filename inference.pdiparams \
--serving_server ./ppocr_det_mobile_2.0_serving/ \ --serving_server ./ppocrv2_det_serving/ \
--serving_client ./ppocr_det_mobile_2.0_client/ --serving_client ./ppocrv2_det_client/
# Recognition model conversion # Recognition model conversion
python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_mobile_v2.0_rec_infer/ \ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_rec_infer/ \
--model_filename inference.pdmodel \ --model_filename inference.pdmodel \
--params_filename inference.pdiparams \ --params_filename inference.pdiparams \
--serving_server ./ppocr_rec_mobile_2.0_serving/ \ --serving_server ./ppocrv2_rec_serving/ \
--serving_client ./ppocr_rec_mobile_2.0_client/ --serving_client ./ppocrv2_rec_client/
``` ```
After the detection model is converted, there will be additional folders of `ppocr_det_mobile_2.0_serving` and `ppocr_det_mobile_2.0_client` in the current folder, with the following format: After the detection model is converted, there will be additional folders of `ppocrv2_det_serving` and `ppocrv2_det_client` in the current folder, with the following format:
``` ```
|- ppocr_det_mobile_2.0_serving/ |- ppocrv2_det_serving/
|- __model__ |- __model__
|- __params__ |- __params__
|- serving_server_conf.prototxt |- serving_server_conf.prototxt
|- serving_server_conf.stream.prototxt |- serving_server_conf.stream.prototxt
|- ppocr_det_mobile_2.0_client |- ppocrv2_det_client
|- serving_client_conf.prototxt |- serving_client_conf.prototxt
|- serving_client_conf.stream.prototxt |- serving_client_conf.stream.prototxt
``` ```
The recognition model is the same. The recognition model is the same.
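After conversion it can be useful to confirm that the output folders contain the files shown in the listing above. The following is a minimal sketch that assumes exactly those file names; the helper `missing_files` and the `EXPECTED` mapping are hypothetical and are not part of Paddle Serving.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED = {
    "serving": [
        "__model__",
        "__params__",
        "serving_server_conf.prototxt",
        "serving_server_conf.stream.prototxt",
    ],
    "client": [
        "serving_client_conf.prototxt",
        "serving_client_conf.stream.prototxt",
    ],
}


def missing_files(directory, kind):
    """Return the expected file names that are absent from `directory`."""
    root = Path(directory)
    return [name for name in EXPECTED[kind] if not (root / name).is_file()]
```

For example, `missing_files("./ppocrv2_det_serving", "serving")` returns an empty list when every expected file is present.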
......
...@@ -34,70 +34,66 @@ PaddleOCR提供2种服务部署方式: ...@@ -34,70 +34,66 @@ PaddleOCR提供2种服务部署方式:
- 准备PaddleServing的运行环境,步骤如下 - 准备PaddleServing的运行环境,步骤如下
1. 安装serving,用于启动服务 ```bash
``` # 安装serving,用于启动服务
pip3 install paddle-serving-server==0.6.1 # for CPU wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post102-py3-none-any.whl
pip3 install paddle-serving-server-gpu==0.6.1 # for GPU pip3 install paddle_serving_server_gpu-0.7.0.post102-py3-none-any.whl
# 其他GPU环境需要确认环境再选择执行如下命令 # 如果是cuda10.1环境,可以使用下面的命令安装paddle-serving-server
pip3 install paddle-serving-server-gpu==0.6.1.post101 # GPU with CUDA10.1 + TensorRT6 # wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post101-py3-none-any.whl
pip3 install paddle-serving-server-gpu==0.6.1.post11 # GPU with CUDA11 + TensorRT7 # pip3 install paddle_serving_server_gpu-0.7.0.post101-py3-none-any.whl
```
# 安装client,用于向服务发送请求
2. 安装client,用于向服务发送请求 wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.7.0-cp37-none-any.whl
[下载链接](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md)中找到对应python版本的client安装包,这里推荐python3.7版本: pip3 install paddle_serving_client-0.7.0-cp37-none-any.whl
``` # 安装serving-app
wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.7.0-py3-none-any.whl
pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl pip3 install paddle_serving_app-0.7.0-py3-none-any.whl
``` ```
3. 安装serving-app
```
pip3 install paddle-serving-app==0.6.1
```
**Note:** 如果要安装最新版本的PaddleServing参考[链接](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md)。 **Note:** 如果要安装最新版本的PaddleServing参考[链接](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Latest_Packages_CN.md)
<a name="模型转换"></a> <a name="模型转换"></a>
## 模型转换 ## 模型转换
使用PaddleServing做服务化部署时,需要将保存的inference模型转换为serving易于部署的模型。 使用PaddleServing做服务化部署时,需要将保存的inference模型转换为serving易于部署的模型。
首先,下载PPOCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-20-series-model-listupdate-on-dec-15) 首先,下载PPOCR的[inference模型](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-series-model-listupdate-on-september-8th)
```
```bash
# 下载并解压 OCR 文本检测模型 # 下载并解压 OCR 文本检测模型
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar -O ch_PP-OCRv2_det_infer.tar && tar -xf ch_PP-OCRv2_det_infer.tar
# 下载并解压 OCR 文本识别模型 # 下载并解压 OCR 文本识别模型
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar -O ch_PP-OCRv2_rec_infer.tar && tar -xf ch_PP-OCRv2_rec_infer.tar
``` ```
接下来,用安装的paddle_serving_client把下载的inference模型转换成易于server部署的模型格式。 接下来,用安装的paddle_serving_client把下载的inference模型转换成易于server部署的模型格式。
``` ```bash
# 转换检测模型 # 转换检测模型
python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_mobile_v2.0_det_infer/ \ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_det_infer/ \
--model_filename inference.pdmodel \ --model_filename inference.pdmodel \
--params_filename inference.pdiparams \ --params_filename inference.pdiparams \
--serving_server ./ppocr_det_mobile_2.0_serving/ \ --serving_server ./ppocrv2_det_serving/ \
--serving_client ./ppocr_det_mobile_2.0_client/ --serving_client ./ppocrv2_det_client/
# 转换识别模型 # 转换识别模型
python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_mobile_v2.0_rec_infer/ \ python3 -m paddle_serving_client.convert --dirname ./ch_PP-OCRv2_rec_infer/ \
--model_filename inference.pdmodel \ --model_filename inference.pdmodel \
--params_filename inference.pdiparams \ --params_filename inference.pdiparams \
--serving_server ./ppocr_rec_mobile_2.0_serving/ \ --serving_server ./ppocrv2_rec_serving/ \
--serving_client ./ppocr_rec_mobile_2.0_client/ --serving_client ./ppocrv2_rec_client/
``` ```
检测模型转换完成后,会在当前文件夹多出`ppocr_det_mobile_2.0_serving``ppocr_det_mobile_2.0_client`的文件夹,具备如下格式: 检测模型转换完成后,会在当前文件夹多出`ppocrv2_det_serving``ppocrv2_det_client`的文件夹,具备如下格式:
``` ```
|- ppocr_det_mobile_2.0_serving/ |- ppocrv2_det_serving/
|- __model__ |- __model__
|- __params__ |- __params__
|- serving_server_conf.prototxt |- serving_server_conf.prototxt
|- serving_server_conf.stream.prototxt |- serving_server_conf.stream.prototxt
|- ppocr_det_mobile_2.0_client |- ppocrv2_det_client
|- serving_client_conf.prototxt |- serving_client_conf.prototxt
|- serving_client_conf.stream.prototxt |- serving_client_conf.stream.prototxt
......
...@@ -34,7 +34,7 @@ op: ...@@ -34,7 +34,7 @@ op:
client_type: local_predictor client_type: local_predictor
#det模型路径 #det模型路径
model_config: ./ppocr_det_mobile_2.0_serving model_config: ./ppocrv2_det_serving
#Fetch结果列表,以client_config中fetch_var的alias_name为准 #Fetch结果列表,以client_config中fetch_var的alias_name为准
fetch_list: ["save_infer_model/scale_0.tmp_1"] fetch_list: ["save_infer_model/scale_0.tmp_1"]
...@@ -60,7 +60,7 @@ op: ...@@ -60,7 +60,7 @@ op:
client_type: local_predictor client_type: local_predictor
#rec模型路径 #rec模型路径
model_config: ./ppocr_rec_mobile_2.0_serving model_config: ./ppocrv2_rec_serving
#Fetch结果列表,以client_config中fetch_var的alias_name为准 #Fetch结果列表,以client_config中fetch_var的alias_name为准
fetch_list: ["save_infer_model/scale_0.tmp_1"] fetch_list: ["save_infer_model/scale_0.tmp_1"]
......
...@@ -54,7 +54,7 @@ class DetOp(Op): ...@@ -54,7 +54,7 @@ class DetOp(Op):
_, self.new_h, self.new_w = det_img.shape _, self.new_h, self.new_w = det_img.shape
return {"x": det_img[np.newaxis, :].copy()}, False, None, "" return {"x": det_img[np.newaxis, :].copy()}, False, None, ""
def postprocess(self, input_dicts, fetch_dict, log_id): def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
det_out = fetch_dict["save_infer_model/scale_0.tmp_1"] det_out = fetch_dict["save_infer_model/scale_0.tmp_1"]
ratio_list = [ ratio_list = [
float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
...@@ -129,7 +129,7 @@ class RecOp(Op): ...@@ -129,7 +129,7 @@ class RecOp(Op):
return feed_list, False, None, "" return feed_list, False, None, ""
def postprocess(self, input_dicts, fetch_data, log_id): def postprocess(self, input_dicts, fetch_data, data_id, log_id):
res_list = [] res_list = []
if isinstance(fetch_data, dict): if isinstance(fetch_data, dict):
if len(fetch_data) > 0: if len(fetch_data) > 0:
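The hunks above add a `data_id` parameter to both `postprocess` overrides. The sketch below illustrates why an override has to match the argument list its caller uses, assuming the caller passes four positional arguments. The classes `FakeRunner`, `OldOp`, and `NewOp` are invented stand-ins for illustration and are not Paddle Serving APIs.

```python
class OldOp:
    # Pre-change signature: no data_id parameter.
    def postprocess(self, input_dicts, fetch_dict, log_id):
        return fetch_dict


class NewOp:
    # Post-change signature, matching the diff above.
    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
        return fetch_dict


class FakeRunner:
    """Hypothetical caller that passes data_id positionally."""

    def run(self, op):
        return op.postprocess({}, {"out": 1}, 0, 0)


def is_compatible(op):
    """Return True if the op accepts the four-argument call."""
    try:
        FakeRunner().run(op)
        return True
    except TypeError:
        return False
```

Under that assumption, `is_compatible(OldOp())` is `False` because the three-parameter override rejects the extra argument, while `is_compatible(NewOp())` is `True`.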
......
...@@ -54,7 +54,7 @@ class DetOp(Op): ...@@ -54,7 +54,7 @@ class DetOp(Op):
_, self.new_h, self.new_w = det_img.shape _, self.new_h, self.new_w = det_img.shape
return {"x": det_img[np.newaxis, :].copy()}, False, None, "" return {"x": det_img[np.newaxis, :].copy()}, False, None, ""
def postprocess(self, input_dicts, fetch_dict, log_id): def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
det_out = fetch_dict["save_infer_model/scale_0.tmp_1"] det_out = fetch_dict["save_infer_model/scale_0.tmp_1"]
ratio_list = [ ratio_list = [
float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
......
...@@ -56,7 +56,7 @@ class RecOp(Op): ...@@ -56,7 +56,7 @@ class RecOp(Op):
feed_list.append(feed) feed_list.append(feed)
return feed_list, False, None, "" return feed_list, False, None, ""
def postprocess(self, input_dicts, fetch_data, log_id): def postprocess(self, input_dicts, fetch_data, data_id, log_id):
res_list = [] res_list = []
if isinstance(fetch_data, dict): if isinstance(fetch_data, dict):
if len(fetch_data) > 0: if len(fetch_data) > 0:
......
# 更新 # 更新
- 2021.12.21 《OCR十讲》课程开讲,12月21日起每晚八点半线上授课! 【免费】报名地址:https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 发布PaddleOCR v2.4。OCR算法新增1种文本检测算法(PSENet),3种文本识别算法(NRTR、SEED、SAR);文档结构化算法新增1种关键信息提取算法(SDMGR),3种DocVQA算法(LayoutLM、LayoutLMv2,LayoutXLM)。
- 2021.9.7 发布PaddleOCR v2.3,发布[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。 - 2021.9.7 发布PaddleOCR v2.3,发布[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。 - 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
- 2021.6.29 [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md)新增5个高频问题,总数248个,每周一都会更新,欢迎大家持续关注。 - 2021.6.29 [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md)新增5个高频问题,总数248个,每周一都会更新,欢迎大家持续关注。
......
# RECENT UPDATES # RECENT UPDATES
- 2021.12.21 The OCR open-source online course begins. Lessons start at 8:30 pm every evening and run for ten days. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 Released PaddleOCR v2.4, adding 1 text detection algorithm (PSENet), 3 text recognition algorithms (NRTR, SEED, SAR), 1 key information extraction algorithm (SDMGR), and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The CPU inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile. - 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The CPU inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files). - 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
- 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) which is published in AAAI 2021. Find tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md); release multi language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), support more than 80 languages recognition; especially, the performance of [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is optimized. - 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) which is published in AAAI 2021. Find tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md); release multi language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), support more than 80 languages recognition; especially, the performance of [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is optimized.
......
@@ -19,6 +19,7 @@ from __future__ import unicode_literals
import numpy as np
import string
from shapely.geometry import LineString, Point, Polygon
import json
from ppocr.utils.logging import get_logger
@@ -286,6 +287,168 @@ class E2ELabelEncodeTrain(object):
        return data
class KieLabelEncode(object):
    def __init__(self, character_dict_path, norm=10, directed=False, **kwargs):
        super(KieLabelEncode, self).__init__()
        # Index 0 is reserved for the empty string; characters start at 1.
        self.dict = dict({'': 0})
        with open(character_dict_path, 'r', encoding='utf-8') as fr:
            idx = 1
            for line in fr:
                char = line.strip()
                self.dict[char] = idx
                idx += 1
        self.norm = norm
        self.directed = directed

    def compute_relation(self, boxes):
        """Compute relation between every two boxes."""
        # Columns 0-1 hold the first vertex, columns 4-5 the third vertex.
        x1s, y1s = boxes[:, 0:1], boxes[:, 1:2]
        x2s, y2s = boxes[:, 4:5], boxes[:, 5:6]
        ws, hs = x2s - x1s + 1, np.maximum(y2s - y1s + 1, 1)
        # (1, N) against (N, 1) broadcasts to (N, N) pairwise values.
        dxs = (x1s[:, 0][None] - x1s) / self.norm
        dys = (y1s[:, 0][None] - y1s) / self.norm
        xhhs, xwhs = hs[:, 0][None] / hs, ws[:, 0][None] / hs
        whs = ws / hs + np.zeros_like(xhhs)
        relations = np.stack([dxs, dys, whs, xhhs, xwhs], -1)
        bboxes = np.concatenate([x1s, y1s, x2s, y2s], -1).astype(np.float32)
        return relations, bboxes

    def pad_text_indices(self, text_inds):
        """Pad text index to same length."""
        max_len = 300
        recoder_len = max([len(text_ind) for text_ind in text_inds])
        # Unused positions are filled with -1.
        padded_text_inds = -np.ones((len(text_inds), max_len), np.int32)
        for idx, text_ind in enumerate(text_inds):
            padded_text_inds[idx, :len(text_ind)] = np.array(text_ind)
        return padded_text_inds, recoder_len

    def list_to_numpy(self, ann_infos):
        """Convert bboxes, relations, texts and labels to ndarray."""
        boxes, text_inds = ann_infos['points'], ann_infos['text_inds']
        boxes = np.array(boxes, np.int32)
        relations, bboxes = self.compute_relation(boxes)

        labels = ann_infos.get('labels', None)
        if labels is not None:
            labels = np.array(labels, np.int32)
            edges = ann_infos.get('edges', None)
            if edges is not None:
                labels = labels[:, None]
                edges = np.array(edges)
                edges = (edges[:, None] == edges[None, :]).astype(np.int32)
                if self.directed:
                    edges = (edges & labels == 1).astype(np.int32)
                np.fill_diagonal(edges, -1)
                labels = np.concatenate([labels, edges], -1)
        padded_text_inds, recoder_len = self.pad_text_indices(text_inds)
        max_num = 300
        temp_bboxes = np.zeros([max_num, 4])
        h, _ = bboxes.shape
        # bboxes has shape (h, 4); slicing the columns with :h would fail
        # to broadcast whenever an image has fewer than 4 boxes.
        temp_bboxes[:h, :] = bboxes

        temp_relations = np.zeros([max_num, max_num, 5])
        temp_relations[:h, :h, :] = relations

        temp_padded_text_inds = np.zeros([max_num, max_num])
        temp_padded_text_inds[:h, :] = padded_text_inds

        temp_labels = np.zeros([max_num, max_num])
        temp_labels[:h, :h + 1] = labels

        # tag records the real box count and the longest text length.
        tag = np.array([h, recoder_len])
        return dict(
            image=ann_infos['image'],
            points=temp_bboxes,
            relations=temp_relations,
            texts=temp_padded_text_inds,
            labels=temp_labels,
            tag=tag)

    def convert_canonical(self, points_x, points_y):
        assert len(points_x) == 4
        assert len(points_y) == 4

        points = [Point(points_x[i], points_y[i]) for i in range(4)]

        polygon = Polygon([(p.x, p.y) for p in points])
        min_x, min_y, _, _ = polygon.bounds
        # The vertex closest to the bounding box's top-left corner goes first.
        points_to_lefttop = [
            LineString([points[i], Point(min_x, min_y)]) for i in range(4)
        ]
        distances = np.array([line.length for line in points_to_lefttop])
        sort_dist_idx = np.argsort(distances)
        lefttop_idx = sort_dist_idx[0]

        if lefttop_idx == 0:
            point_orders = [0, 1, 2, 3]
        elif lefttop_idx == 1:
            point_orders = [1, 2, 3, 0]
        elif lefttop_idx == 2:
            point_orders = [2, 3, 0, 1]
        else:
            point_orders = [3, 0, 1, 2]

        sorted_points_x = [points_x[i] for i in point_orders]
        sorted_points_y = [points_y[j] for j in point_orders]

        return sorted_points_x, sorted_points_y

    def sort_vertex(self, points_x, points_y):
        assert len(points_x) == 4
        assert len(points_y) == 4

        x = np.array(points_x)
        y = np.array(points_y)
        center_x = np.sum(x) * 0.25
        center_y = np.sum(y) * 0.25

        x_arr = np.array(x - center_x)
        y_arr = np.array(y - center_y)

        # Order the vertices by their angle around the centroid.
        angle = np.arctan2(y_arr, x_arr) * 180.0 / np.pi
        sort_idx = np.argsort(angle)

        sorted_points_x, sorted_points_y = [], []
        for i in range(4):
            sorted_points_x.append(points_x[sort_idx[i]])
            sorted_points_y.append(points_y[sort_idx[i]])

        return self.convert_canonical(sorted_points_x, sorted_points_y)

    def __call__(self, data):
        import json
        label = data['label']
        annotations = json.loads(label)
        boxes, texts, text_inds, labels, edges = [], [], [], [], []
        for ann in annotations:
            box = ann['points']
            x_list = [box[i][0] for i in range(4)]
            y_list = [box[i][1] for i in range(4)]
            sorted_x_list, sorted_y_list = self.sort_vertex(x_list, y_list)
            # Flatten to [x1, y1, x2, y2, x3, y3, x4, y4].
            sorted_box = []
            for x, y in zip(sorted_x_list, sorted_y_list):
                sorted_box.append(x)
                sorted_box.append(y)
            boxes.append(sorted_box)
            text = ann['transcription']
            texts.append(ann['transcription'])
            # Characters missing from the dictionary are skipped.
            text_ind = [self.dict[c] for c in text if c in self.dict]
            text_inds.append(text_ind)
            labels.append(ann['label'])
            edges.append(ann.get('edge', 0))
        ann_infos = dict(
            image=data['image'],
            points=boxes,
            texts=texts,
            text_inds=text_inds,
            edges=edges,
            labels=labels)

        return self.list_to_numpy(ann_infos)
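
Review note: the broadcasting in `compute_relation` is dense, so here is a minimal standalone sketch of the same five pairwise features. The function name `pairwise_relations` and the toy boxes are illustrative assumptions, not part of this commit; the feature order follows the `np.stack` call in the added code.

```python
import numpy as np


def pairwise_relations(boxes, norm=10.0):
    """Return an (N, N, 5) array of pairwise features for N boxes.

    `boxes` has shape (N, 8): x1, y1, ..., x4, y4, with the top-left
    vertex first and the bottom-right vertex third (hypothetical helper).
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    x1, y1 = boxes[:, 0:1], boxes[:, 1:2]  # top-left, shape (N, 1)
    x2, y2 = boxes[:, 4:5], boxes[:, 5:6]  # bottom-right, shape (N, 1)
    w = x2 - x1 + 1
    h = np.maximum(y2 - y1 + 1, 1)  # guard against zero height

    # (1, N) against (N, 1) broadcasts so that entry [i, j] compares
    # box j with box i.
    dx = (x1.T - x1) / norm  # (x1_j - x1_i) / norm
    dy = (y1.T - y1) / norm  # (y1_j - y1_i) / norm
    h_ratio = h.T / h        # h_j / h_i
    w_over_h = w.T / h       # w_j / h_i
    aspect = np.broadcast_to(w / h, h_ratio.shape)  # w_i / h_i on row i
    return np.stack([dx, dy, aspect, h_ratio, w_over_h], axis=-1)


# Two toy boxes: a 10x10 square at the origin and a 20x10 box at (20, 10).
toy_boxes = [
    [0, 0, 9, 0, 9, 9, 0, 9],
    [20, 10, 39, 10, 39, 19, 20, 19],
]
relations = pairwise_relations(toy_boxes)
```

The result is asymmetric by design: `relations[i, j]` describes box `j` relative to box `i`, which is why the offsets flip sign when the indices are swapped.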
class AttnLabelEncode(BaseRecLabelEncode):
    """ Convert between text-label and text-index """
@@ -344,8 +507,12 @@ class SEEDLabelEncode(BaseRecLabelEncode):
             max_text_length, character_dict_path, use_space_char)
     def add_special_char(self, dict_character):
+        self.padding = "padding"
         self.end_str = "eos"
-        dict_character = dict_character + [self.end_str]
+        self.unknown = "unknown"
+        dict_character = dict_character + [
+            self.end_str, self.padding, self.unknown
+        ]
         return dict_character
     def __call__(self, data):
@@ -356,8 +523,8 @@ class SEEDLabelEncode(BaseRecLabelEncode):
         if len(text) >= self.max_text_len:
             return None
         data['length'] = np.array(len(text)) + 1  # conclude eos
-        text = text + [len(self.character) - 1] * (self.max_text_len - len(text)
-                                                   )
+        text = text + [len(self.character) - 3] + [len(self.character) - 2] * (
+            self.max_text_len - len(text) - 1)
         data['label'] = np.array(text)
         return data
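
Review note: with the three special tokens appended in the order `eos`, `padding`, `unknown`, the new label is "text, one `eos`, then `padding` up to `max_text_len`". A small sketch of that layout follows; `build_seed_label` and its arguments are hypothetical names used only for illustration.

```python
def build_seed_label(text_indices, num_chars, max_text_len):
    """Append one eos index, then pad with the padding index.

    Assumes the character list ends with [..., eos, padding, unknown],
    so eos is num_chars - 3 and padding is num_chars - 2.
    """
    if len(text_indices) >= max_text_len:
        return None  # no room left for the eos token
    eos_idx = num_chars - 3
    pad_idx = num_chars - 2
    n_pad = max_text_len - len(text_indices) - 1
    return list(text_indices) + [eos_idx] + [pad_idx] * n_pad


# A 2-character text in a 40-entry dictionary, padded to length 6.
label = build_seed_label([5, 7], num_chars=40, max_text_len=6)
too_long = build_seed_label([1] * 6, num_chars=40, max_text_len=6)
```

Before this change the tail of the label was filled with the last dictionary index, which was `eos` itself, so end-of-sequence and padding shared one index.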
......
@@ -111,7 +111,6 @@ class NormalizeImage(object):
        from PIL import Image
        if isinstance(img, Image.Image):
            img = np.array(img)
        assert isinstance(img,
                          np.ndarray), "invalid input 'img' in NormalizeImage"
        data['image'] = (
@@ -367,3 +366,53 @@ class E2EResizeForTest(object):
        ratio_w = resize_w / float(w)
        return im, (ratio_h, ratio_w)
class KieResize(object):
    def __init__(self, **kwargs):
        super(KieResize, self).__init__()
        self.max_side, self.min_side = kwargs['img_scale'][0], kwargs[
            'img_scale'][1]

    def __call__(self, data):
        img = data['image']
        points = data['points']
        src_h, src_w, _ = img.shape
        im_resized, scale_factor, [ratio_h, ratio_w
                                   ], [new_h, new_w] = self.resize_image(img)
        resize_points = self.resize_boxes(img, points, scale_factor)
        data['ori_image'] = img
        data['ori_boxes'] = points
        data['points'] = resize_points
        data['image'] = im_resized
        data['shape'] = np.array([new_h, new_w])
        return data

    def resize_image(self, img):
        # The resized image is pasted into a fixed 1024x1024 canvas.
        norm_img = np.zeros([1024, 1024, 3], dtype='float32')
        scale = [512, 1024]
        h, w = img.shape[:2]
        max_long_edge = max(scale)
        max_short_edge = min(scale)
        scale_factor = min(max_long_edge / max(h, w),
                           max_short_edge / min(h, w))
        resize_w, resize_h = int(w * float(scale_factor) + 0.5), int(h * float(
            scale_factor) + 0.5)
        # Round both sides up to a multiple of the stride.
        max_stride = 32
        resize_h = (resize_h + max_stride - 1) // max_stride * max_stride
        resize_w = (resize_w + max_stride - 1) // max_stride * max_stride
        im = cv2.resize(img, (resize_w, resize_h))
        new_h, new_w = im.shape[:2]
        w_scale = new_w / w
        h_scale = new_h / h
        scale_factor = np.array(
            [w_scale, h_scale, w_scale, h_scale], dtype=np.float32)
        norm_img[:new_h, :new_w, :] = im
        return norm_img, scale_factor, [h_scale, w_scale], [new_h, new_w]

    def resize_boxes(self, im, points, scale_factor):
        points = points * scale_factor
        img_shape = im.shape[:2]
        points[:, 0::2] = np.clip(points[:, 0::2], 0, img_shape[1])
        points[:, 1::2] = np.clip(points[:, 1::2], 0, img_shape[0])
        return points
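
Review note: the target size in `resize_image` comes from two steps, an aspect-preserving scale that fits the image inside a 512 by 1024 budget, then rounding each side up to a multiple of 32. The arithmetic can be checked without OpenCV; `kie_resize_shape` below is a hypothetical helper that mirrors only the size computation, not the actual resize or padding.

```python
def kie_resize_shape(h, w, scale=(512, 1024), stride=32):
    """Return (new_h, new_w, h_scale, w_scale) for an h x w image."""
    long_edge, short_edge = max(scale), min(scale)
    # Largest factor that keeps the long side <= long_edge and the
    # short side <= short_edge.
    factor = min(long_edge / max(h, w), short_edge / min(h, w))
    new_w = int(w * factor + 0.5)
    new_h = int(h * factor + 0.5)
    # Round each side up to the next multiple of the stride.
    new_h = (new_h + stride - 1) // stride * stride
    new_w = (new_w + stride - 1) // stride * stride
    return new_h, new_w, new_h / h, new_w / w


# A 600 x 800 (height x width) image.
new_h, new_w, h_scale, w_scale = kie_resize_shape(600, 800)
```

Because of the rounding, the two per-axis scales usually differ slightly, which is presumably why the added code recomputes `w_scale` and `h_scale` from the final size before scaling the boxes.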
@@ -35,6 +35,7 @@ from .cls_loss import ClsLoss
 # e2e loss
 from .e2e_pg_loss import PGLoss
+from .kie_sdmgr_loss import SDMGRLoss
 # basic loss function
 from .basic_loss import DistanceLoss
@@ -50,7 +51,7 @@ def build_loss(config):
     support_dict = [
         'DBLoss', 'PSELoss', 'EASTLoss', 'SASTLoss', 'CTCLoss', 'ClsLoss',
         'AttentionLoss', 'SRNLoss', 'PGLoss', 'CombinedLoss', 'NRTRLoss',
-        'TableAttentionLoss', 'SARLoss', 'AsterLoss'
+        'TableAttentionLoss', 'SARLoss', 'AsterLoss', 'SDMGRLoss'
     ]
     config = copy.deepcopy(config)
     module_name = config.pop('name')
......
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from paddle import nn
import paddle


class SDMGRLoss(nn.Layer):
    def __init__(self, node_weight=1.0, edge_weight=1.0, ignore=0):
        super().__init__()
        self.loss_node = nn.CrossEntropyLoss(ignore_index=ignore)
        # Edge labels of -1 (the diagonal) are excluded from the loss.
        self.loss_edge = nn.CrossEntropyLoss(ignore_index=-1)
        self.node_weight = node_weight
        self.edge_weight = edge_weight
        self.ignore = ignore

    def pre_process(self, gts, tag):
        gts, tag = gts.numpy(), tag.numpy().tolist()
        temp_gts = []
        batch = len(tag)
        for i in range(batch):
            # Strip the zero padding: keep num rows and num + 1 columns.
            num, recoder_len = tag[i][0], tag[i][1]
            temp_gts.append(
                paddle.to_tensor(
                    gts[i, :num, :num + 1], dtype='int64'))
        return temp_gts

    def accuracy(self, pred, target, topk=1, thresh=None):
        """Calculate accuracy according to the prediction and target.

        Args:
            pred (paddle.Tensor): The model prediction, shape (N, num_class)
            target (paddle.Tensor): The target of each prediction, shape (N, )
            topk (int | tuple[int], optional): If the predictions in ``topk``
                matches the target, the predictions will be regarded as
                correct ones. Defaults to 1.
            thresh (float, optional): If not None, predictions with scores under
                this threshold are considered incorrect. Default to None.
                Currently unused.

        Returns:
            float | tuple[float]: If the input ``topk`` is a single integer,
                the function will return a single float as accuracy. If
                ``topk`` is a tuple containing multiple integers, the
                function will return a tuple containing accuracies of
                each ``topk`` number.
        """
        assert isinstance(topk, (int, tuple))
        if isinstance(topk, int):
            topk = (topk, )
            return_single = True
        else:
            return_single = False

        maxk = max(topk)
        if pred.shape[0] == 0:
            # Build the zero with paddle.to_tensor instead of the
            # torch-style pred.new_tensor(0.).
            accu = [paddle.to_tensor(0.) for i in range(len(topk))]
            return accu[0] if return_single else accu
        pred_value, pred_label = paddle.topk(pred, maxk, axis=1)
        pred_label = pred_label.transpose(
            [1, 0])  # transpose to shape (maxk, N)
        correct = paddle.equal(pred_label,
                               (target.reshape([1, -1]).expand_as(pred_label)))
        res = []
        for k in topk:
            correct_k = paddle.sum(correct[:k].reshape([-1]).astype('float32'),
                                   axis=0,
                                   keepdim=True)
            res.append(
                paddle.multiply(correct_k,
                                paddle.to_tensor(100.0 / pred.shape[0])))
        return res[0] if return_single else res

    def forward(self, pred, batch):
        node_preds, edge_preds = pred
        gts, tag = batch[4], batch[5]
        gts = self.pre_process(gts, tag)
        node_gts, edge_gts = [], []
        for gt in gts:
            # Column 0 holds the node label; the rest is the edge matrix.
            node_gts.append(gt[:, 0])
            edge_gts.append(gt[:, 1:].reshape([-1]))
        node_gts = paddle.concat(node_gts)
        edge_gts = paddle.concat(edge_gts)

        node_valids = paddle.nonzero(node_gts != self.ignore).reshape([-1])
        edge_valids = paddle.nonzero(edge_gts != -1).reshape([-1])
        loss_node = self.loss_node(node_preds, node_gts)
        loss_edge = self.loss_edge(edge_preds, edge_gts)
        loss = self.node_weight * loss_node + self.edge_weight * loss_edge
        return dict(
            loss=loss,
            loss_node=loss_node,
            loss_edge=loss_edge,
            acc_node=self.accuracy(
                paddle.gather(node_preds, node_valids),
                paddle.gather(node_gts, node_valids)),
            acc_edge=self.accuracy(
                paddle.gather(edge_preds, edge_valids),
                paddle.gather(edge_gts, edge_valids)))
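
Review note: `SDMGRLoss.accuracy` reports a top-k hit rate as a percentage. The NumPy sketch below reproduces that metric for a single `k`, independent of Paddle; `topk_accuracy` and the toy scores are illustrative assumptions and not part of the commit.

```python
import numpy as np


def topk_accuracy(scores, targets, k=1):
    """Percentage of rows whose target is among the k highest scores."""
    scores = np.asarray(scores, dtype=np.float64)
    targets = np.asarray(targets)
    if scores.shape[0] == 0:
        return 0.0  # no valid samples, mirror the early return above
    # Indices of the k highest scores in each row.
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = (topk == targets[:, None]).any(axis=1)
    return 100.0 * hits.mean()


toy_scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.5, 0.3, 0.2],  # predicts class 0, class 1 is second
    [0.2, 0.3, 0.5],  # predicts class 2
]
toy_targets = [1, 1, 2]
top1 = topk_accuracy(toy_scores, toy_targets, k=1)
top2 = topk_accuracy(toy_scores, toy_targets, k=2)
empty = topk_accuracy(np.zeros((0, 3)), np.zeros((0, ), dtype=int))
```

In `forward`, the ignored node labels and the `-1` edge labels are gathered out before this metric is computed, so the accuracy covers only supervised entries.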