"vscode:/vscode.git/clone" did not exist on "d934d3d79519653ad4db5b63a7665c1536b4187c"
Commit a9444597 authored by LDOUBLEV's avatar LDOUBLEV
Browse files

Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into kie_doc

parents 29a096e2 0e325b5a
...@@ -28,12 +28,12 @@ The multilingual models cover Latin, Arabic, Traditional Chinese, Korean, Japane ...@@ -28,12 +28,12 @@ The multilingual models cover Latin, Arabic, Traditional Chinese, Korean, Japane
This document will briefly introduce how to use the multilingual model. This document will briefly introduce how to use the multilingual model.
- [1 Installation](#Install) - [1 Installation](#Install)
- [1.1 paddle installation](#paddleinstallation) - [1.1 Paddle installation](#paddleinstallation)
- [1.2 paddleocr package installation](#paddleocr_package_install) - [1.2 PaddleOCR package installation](#paddleocr_package_install)
- [2 Quick Use](#Quick_Use) - [2 Quick Use](#Quick_Use)
- [2.1 Command line operation](#Command_line_operation) - [2.1 Command line operation](#Command_line_operation)
- [2.2 python script running](#python_Script_running) - [2.2 Run with Python script](#python_Script_running)
- [3 Custom Training](#Custom_Training) - [3 Custom Training](#Custom_Training)
- [4 Inference and Deployment](#inference) - [4 Inference and Deployment](#inference)
- [4 Supported languages and abbreviations](#language_abbreviations) - [4 Supported languages and abbreviations](#language_abbreviations)
...@@ -42,7 +42,7 @@ This document will briefly introduce how to use the multilingual model. ...@@ -42,7 +42,7 @@ This document will briefly introduce how to use the multilingual model.
## 1 Installation ## 1 Installation
<a name="paddle_install"></a> <a name="paddle_install"></a>
### 1.1 paddle installation ### 1.1 Paddle installation
``` ```
# cpu # cpu
pip install paddlepaddle pip install paddlepaddle
...@@ -52,7 +52,7 @@ pip install paddlepaddle-gpu ...@@ -52,7 +52,7 @@ pip install paddlepaddle-gpu
``` ```
<a name="paddleocr_package_install"></a> <a name="paddleocr_package_install"></a>
### 1.2 paddleocr package installation ### 1.2 PaddleOCR package installation
pip install pip install
...@@ -79,8 +79,8 @@ paddleocr -h ...@@ -79,8 +79,8 @@ paddleocr -h
* Whole image prediction (detection + recognition) * Whole image prediction (detection + recognition)
Paddleocr currently supports 80 languages, which can be switched by modifying the --lang parameter. PaddleOCR currently supports 80 languages, which can be specified by the --lang parameter.
The specific supported [language] (#language_abbreviations) can be viewed in the table. The supported languages are listed in the [table](#language_abbreviations).
``` bash ``` bash
paddleocr --image_dir doc/imgs_en/254.jpg --lang=en paddleocr --image_dir doc/imgs_en/254.jpg --lang=en
...@@ -90,7 +90,7 @@ paddleocr --image_dir doc/imgs_en/254.jpg --lang=en ...@@ -90,7 +90,7 @@ paddleocr --image_dir doc/imgs_en/254.jpg --lang=en
<img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600"> <img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600">
</div> </div>
The result is a list, each item contains a text box, text and recognition confidence The result is a list. Each item contains a text box, text and recognition confidence
```text ```text
[('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]] [('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]]
[('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]] [('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]]
...@@ -110,7 +110,7 @@ paddleocr --image_dir doc/imgs_words_en/word_308.png --det false --lang=en ...@@ -110,7 +110,7 @@ paddleocr --image_dir doc/imgs_words_en/word_308.png --det false --lang=en
![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.1/doc/imgs_words_en/word_308.png) ![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.1/doc/imgs_words_en/word_308.png)
The result is a tuple, which returns the recognition result and recognition confidence The result is a 2-tuple, which contains the recognition result and recognition confidence
```text ```text
(0.99879867, 'LITTLE') (0.99879867, 'LITTLE')
...@@ -122,7 +122,7 @@ The result is a tuple, which returns the recognition result and recognition conf ...@@ -122,7 +122,7 @@ The result is a tuple, which returns the recognition result and recognition conf
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --rec false paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --rec false
``` ```
The result is a list, each item contains only text boxes The result is a list. Each item represents the coordinates of a text box.
``` ```
[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]] [[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]]
...@@ -132,9 +132,9 @@ The result is a list, each item contains only text boxes ...@@ -132,9 +132,9 @@ The result is a list, each item contains only text boxes
``` ```
<a name="python_script_running"></a> <a name="python_script_running"></a>
### 2.2 python script running ### 2.2 Run with Python script
ppocr also supports running in python scripts for easy embedding in your own code: PPOCR is able to run with Python scripts for easy integration with your own code:
* Whole image prediction (detection + recognition) * Whole image prediction (detection + recognition)
...@@ -167,12 +167,12 @@ Visualization of results: ...@@ -167,12 +167,12 @@ Visualization of results:
![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.1/doc/imgs_results/korean.jpg) ![](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.1/doc/imgs_results/korean.jpg)
ppocr also supports direction classification. For more usage methods, please refer to: [whl package instructions](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.0/doc/doc_ch/whl.md). PPOCR also supports direction classification. For more detailed usage, please refer to: [whl package instructions](whl_en.md).
<a name="Custom_training"></a> <a name="Custom_training"></a>
## 3 Custom training ## 3 Custom training
ppocr supports using your own data for custom training or finetune, where the recognition model can refer to [French configuration file](../../configs/rec/multi_language/rec_french_lite_train.yml) PPOCR supports using your own data for custom training or fine-tune, where the recognition model can refer to [French configuration file](../../configs/rec/multi_language/rec_french_lite_train.yml)
Modify the training data path, dictionary and other parameters. Modify the training data path, dictionary and other parameters.
For specific data preparation and training process, please refer to: [Text Detection](../doc_en/detection_en.md), [Text Recognition](../doc_en/recognition_en.md), more functions such as predictive deployment, For specific data preparation and training process, please refer to: [Text Detection](../doc_en/detection_en.md), [Text Recognition](../doc_en/recognition_en.md), more functions such as predictive deployment,
...@@ -183,7 +183,7 @@ For functions such as data annotation, you can read the complete [Document Tutor ...@@ -183,7 +183,7 @@ For functions such as data annotation, you can read the complete [Document Tutor
## 4 Inference and Deployment ## 4 Inference and Deployment
In addition to installing the whl package for quick forecasting, In addition to installing the whl package for quick forecasting,
ppocr also provides a variety of forecasting deployment methods. PPOCR also provides a variety of forecasting deployment methods.
If necessary, you can read related documents: If necessary, you can read related documents:
- [Python Inference](./inference_en.md) - [Python Inference](./inference_en.md)
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
## 1. PaddleOCR Overview ## 1. PaddleOCR Overview
PaddleOCR contains rich text detection, text recognition and end-to-end algorithms. Combining actual testing and industrial experience, PaddleOCR chooses DB and CRNN as the basic detection and recognition models, and proposes a series of models, named PP-OCR, for industrial applications after a series of optimization strategies. The PP-OCR model is aimed at general scenarios and forms a model library according to different languages. Based on the capabilities of PP-OCR, PaddleOCR releases the PP-Structure tool library for document scene tasks, including two major tasks: layout analysis and table recognition. In order to get through the entire process of industrial landing, PaddleOCR provides large-scale data production tools and a variety of prediction deployment tools to help developers quickly turn ideas into reality. PaddleOCR contains rich text detection, text recognition and end-to-end algorithms. With the experience from real world scenarios and the industry, PaddleOCR chooses DB and CRNN as the basic detection and recognition models, and proposes a series of models, named PP-OCR, for industrial applications after a series of optimization strategies. The PP-OCR model is aimed at general scenarios and forms a model library of different languages. Based on the capabilities of PP-OCR, PaddleOCR releases the PP-Structure toolkit for document scene tasks, including two major tasks: layout analysis and table recognition. In order to get through the entire process of industrial landing, PaddleOCR provides large-scale data production tools and a variety of prediction deployment tools to help developers quickly turn ideas into reality.
<div align="center"> <div align="center">
<img src="../overview_en.png"> <img src="../overview_en.png">
...@@ -18,11 +18,11 @@ PaddleOCR contains rich text detection, text recognition and end-to-end algorith ...@@ -18,11 +18,11 @@ PaddleOCR contains rich text detection, text recognition and end-to-end algorith
# Recommend # Recommend
git clone https://github.com/PaddlePaddle/PaddleOCR git clone https://github.com/PaddlePaddle/PaddleOCR
# If you cannot pull successfully due to network problems, you can also choose to use the code hosting on the cloud: # If you cannot pull successfully due to network problems, you can switch to the mirror hosted on Gitee:
git clone https://gitee.com/paddlepaddle/PaddleOCR git clone https://gitee.com/paddlepaddle/PaddleOCR
# Note: The cloud-hosting code may not be able to synchronize the update with this GitHub project in real time. There might be a delay of 3-5 days. Please give priority to the recommended method. # Note: The mirror on Gitee may not keep in synchronization with the latest project on GitHub. There might be a delay of 3-5 days. Please try GitHub at first.
``` ```
### **2.2 Install third-party libraries** ### **2.2 Install third-party libraries**
...@@ -34,6 +34,6 @@ pip3 install -r requirements.txt ...@@ -34,6 +34,6 @@ pip3 install -r requirements.txt
If you getting this error `OSError: [WinError 126] The specified module could not be found` when you install shapely on windows. If you getting this error `OSError: [WinError 126] The specified module could not be found` when you install shapely on windows.
Please try to download Shapely whl file using [http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely). Please try to download Shapely whl file from [http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).
Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found) Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)
\ No newline at end of file
...@@ -6,18 +6,18 @@ ...@@ -6,18 +6,18 @@
<a name="Brief_Introduction"></a> <a name="Brief_Introduction"></a>
## 1. Brief Introduction ## 1. Brief Introduction
OCR algorithm can be divided into two-stage algorithm and end-to-end algorithm. The two-stage OCR algorithm is generally divided into two parts, text detection and text recognition algorithm. The text detection algorithm gets the detection box of the text line from the image, and then the recognition algorithm identifies the content of the text box. The end-to-end OCR algorithm can complete text detection and recognition in one algorithm. Its basic idea is to design a model with both detection unit and recognition module, share the CNN features of both and train them together. Because one algorithm can complete character recognition, the end-to-end model is smaller and faster. OCR algorithms can be divided into two categories: two-stage algorithm and end-to-end algorithm. The two-stage OCR algorithm is generally divided into two parts, text detection and text recognition algorithm. The text detection algorithm locates the box of the text line from the image, and then the recognition algorithm identifies the content of the text box. The end-to-end OCR algorithm combines text detection and recognition in one algorithm. Its basic idea is to design a model with both detection unit and recognition module, share the CNN features of both and train them together. Because one algorithm can complete character recognition, the end-to-end model is smaller and faster.
### Introduction Of PGNet Algorithm ### Introduction Of PGNet Algorithm
In recent years, the end-to-end OCR algorithm has been well developed, including MaskTextSpotter series, TextSnake, TextDragon, PGNet series and so on. Among these algorithms, PGNet algorithm has the advantages that other algorithms do not During the recent years, the end-to-end OCR algorithm has been well developed, including MaskTextSpotter series, TextSnake, TextDragon, PGNet series and so on. Among these algorithms, PGNet algorithm has some advantages over the other algorithms.
- Pgnet loss is designed to guide training, and no character-level annotations is needed - PGNet loss is designed to guide training, and no character-level annotations is needed.
- NMS and ROI related operations are not needed, It can accelerate the prediction - NMS and ROI related operations are not needed. It can accelerate the prediction
- The reading order prediction module is proposed - The reading order prediction module is proposed
- A graph based modification module (GRM) is proposed to further improve the performance of model recognition - A graph based modification module (GRM) is proposed to further improve the performance of model recognition
- Higher accuracy and faster prediction speed - Higher accuracy and faster prediction speed
For details of PGNet algorithm, please refer to [paper](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) ,The schematic diagram of the algorithm is as follows: For details of PGNet algorithm, please refer to [paper](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf). The schematic diagram of the algorithm is as follows:
![](../pgnet_framework.png) ![](../pgnet_framework.png)
After feature extraction, the input image is sent to four branches: TBO module for text edge offset prediction, TCL module for text centerline prediction, TDO module for text direction offset prediction, and TCC module for text character classification graph prediction. After feature extraction, the input image is sent to four branches: TBO module for text edge offset prediction, TCL module for text center-line prediction, TDO module for text direction offset prediction, and TCC module for text character classification graph prediction.
The output of TBO and TCL can get text detection results after post-processing, and TCL, TDO and TCC are responsible for text recognition. The output of TBO and TCL can get text detection results after post-processing, and TCL, TDO and TCC are responsible for text recognition.
The results of detection and recognition are as follows: The results of detection and recognition are as follows:
...@@ -40,7 +40,7 @@ Please refer to [Operation Environment Preparation](./environment_en.md) to conf ...@@ -40,7 +40,7 @@ Please refer to [Operation Environment Preparation](./environment_en.md) to conf
<a name="Quick_Use"></a> <a name="Quick_Use"></a>
## 3. Quick Use ## 3. Quick Use
### inference model download ### Inference model download
This section takes the trained end-to-end model as an example to quickly use the model prediction. First, download the trained end-to-end inference model [download address](https://paddleocr.bj.bcebos.com/dygraph_v2.0/pgnet/e2e_server_pgnetA_infer.tar) This section takes the trained end-to-end model as an example to quickly use the model prediction. First, download the trained end-to-end inference model [download address](https://paddleocr.bj.bcebos.com/dygraph_v2.0/pgnet/e2e_server_pgnetA_infer.tar)
``` ```
mkdir inference && cd inference mkdir inference && cd inference
...@@ -131,7 +131,7 @@ python3 tools/train.py -c configs/e2e/e2e_r50_vd_pg.yml -o Optimizer.base_lr=0.0 ...@@ -131,7 +131,7 @@ python3 tools/train.py -c configs/e2e/e2e_r50_vd_pg.yml -o Optimizer.base_lr=0.0
``` ```
#### Load trained model and continue training #### Load trained model and continue training
If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded. If you would like to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded.
```shell ```shell
python3 tools/train.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.checkpoints=./your/trained/model python3 tools/train.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.checkpoints=./your/trained/model
``` ```
......
...@@ -12,15 +12,15 @@ ...@@ -12,15 +12,15 @@
* [4. FAQ](#3-faq) * [4. FAQ](#3-faq)
This article will introduce the basic concepts that need to be mastered during model training and the tuning methods during training. This article will introduce the basic concepts that is necessary for model training and tuning.
At the same time, it will briefly introduce the components of the PaddleOCR model training data and how to prepare the data finetune model in the vertical scene. At the same time, it will briefly introduce the structure of the training data and how to prepare the data to fine-tune model in vertical scenes.
<a name="1-Yml-Configuration"></a> <a name="1-Yml-Configuration"></a>
## 1. Yml Configuration ## 1. Yml Configuration
The PaddleOCR model uses configuration files to manage network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file, and then builds a complete training process to complete the model training. When optimized, the configuration can be completed by modifying the parameters in the configuration file, which is simple to use and convenient to modify. The PaddleOCR uses configuration files to control network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file, and then builds a complete training process to train the model. Fine-tuning can also be completed by modifying the parameters in the configuration file, which is simple and convenient.
For the complete configuration file description, please refer to [Configuration File](./config_en.md) For the complete configuration file description, please refer to [Configuration File](./config_en.md)
...@@ -28,13 +28,13 @@ For the complete configuration file description, please refer to [Configuration ...@@ -28,13 +28,13 @@ For the complete configuration file description, please refer to [Configuration
## 2. Basic Concepts ## 2. Basic Concepts
In the process of model training, some hyperparameters need to be manually adjusted to help the model obtain the optimal index at the least loss. Different data volumes may require different hyper-parameters. When you want to finetune your own data or tune the model effect, there are several parameter adjustment strategies for reference: During the model training process, some hyper-parameters can be manually specified to obtain the optimal result at the least cost. Different data volumes may require different hyper-parameters. When you want to fine-tune the model based on your own data, there are several parameter adjustment strategies for reference:
<a name="11-learning-rate"></a> <a name="11-learning-rate"></a>
### 2.1 Learning Rate ### 2.1 Learning Rate
The learning rate is one of the important hyperparameters for training neural networks. It represents the step length of the gradient moving to the optimal solution of the loss function in each iteration. The learning rate is one of the most important hyper-parameters for training neural networks. It represents the step length of the gradient moving towards the optimal solution of the loss function in each iteration.
A variety of learning rate update strategies are provided in PaddleOCR, which can be modified through configuration files, for example: A variety of learning rate update strategies are provided by PaddleOCR, which can be specified in configuration files. For example,
``` ```
Optimizer: Optimizer:
...@@ -46,16 +46,15 @@ Optimizer: ...@@ -46,16 +46,15 @@ Optimizer:
warmup_epoch: 5 warmup_epoch: 5
``` ```
Piecewise stands for piecewise constant attenuation. Different learning rates are specified in different learning stages, `Piecewise` stands for piece-wise constant attenuation. Different learning rates are specified in different learning stages, and the learning rate stay the same in each stage.
and the learning rate is the same in each stage.
warmup_epoch means that in the first 5 epochs, the learning rate will gradually increase from 0 to base_lr. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py). `warmup_epoch` means that in the first 5 epochs, the learning rate will be increased gradually from 0 to base_lr. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
<a name="12-regularization"></a> <a name="12-regularization"></a>
### 2.2 Regularization ### 2.2 Regularization
Regularization can effectively avoid algorithm overfitting. PaddleOCR provides L1 and L2 regularization methods. Regularization can effectively avoid algorithm over-fitting. PaddleOCR provides L1 and L2 regularization methods.
L1 and L2 regularization are the most commonly used regularization methods. L1 and L2 regularization are the most widely used regularization methods.
L1 regularization adds a regularization term to the objective function to reduce the sum of absolute values of the parameters; L1 regularization adds a regularization term to the objective function to reduce the sum of absolute values of the parameters;
while in L2 regularization, the purpose of adding a regularization term is to reduce the sum of squared parameters. while in L2 regularization, the purpose of adding a regularization term is to reduce the sum of squared parameters.
The configuration method is as follows: The configuration method is as follows:
...@@ -95,7 +94,7 @@ The current open source models, data sets and magnitudes are as follows: ...@@ -95,7 +94,7 @@ The current open source models, data sets and magnitudes are as follows:
- Chinese data set, LSVT street view data set crops the image according to the truth value, and performs position calibration, a total of 30w images. In addition, based on the LSVT corpus, 500w of synthesized data. - Chinese data set, LSVT street view data set crops the image according to the truth value, and performs position calibration, a total of 30w images. In addition, based on the LSVT corpus, 500w of synthesized data.
- Small language data set, using different corpora and fonts, respectively generated 100w synthetic data set, and using ICDAR-MLT as the verification set. - Small language data set, using different corpora and fonts, respectively generated 100w synthetic data set, and using ICDAR-MLT as the verification set.
Among them, the public data sets are all open source, users can search and download by themselves, or refer to [Chinese data set](./datasets.md), synthetic data is not open source, users can use open source synthesis tools to synthesize by themselves. Synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator) etc. Among them, the public data sets are all open source, users can search and download by themselves, or refer to [Chinese data set](../doc_ch/datasets.md), synthetic data is not open source, users can use open source synthesis tools to synthesize by themselves. Synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator) etc.
<a name="22-vertical-scene"></a> <a name="22-vertical-scene"></a>
...@@ -129,17 +128,17 @@ There are several experiences for reference when constructing the data set: ...@@ -129,17 +128,17 @@ There are several experiences for reference when constructing the data set:
**Q**: How to choose a suitable network input shape when training CRNN recognition? **Q**: How to choose a suitable network input shape when training CRNN recognition?
A: The general height is 32, the longest width is selected, there are two methods: A: The general height is 32, the longest width is selected, there are two methods:
(1) Calculate the aspect ratio distribution of training sample images. The selection of the maximum aspect ratio considers 80% of the training samples. (1) Calculate the aspect ratio distribution of training sample images. The selection of the maximum aspect ratio considers 80% of the training samples.
(2) Count the number of texts in training samples. The selection of the longest number of characters considers the training sample that satisfies 80%. Then the aspect ratio of Chinese characters is approximately considered to be 1, and that of English is 3:1, and the longest width is estimated. (2) Count the number of texts in training samples. The selection of the longest number of characters considers the training sample that satisfies 80%. Then the aspect ratio of Chinese characters is approximately considered to be 1, and that of English is 3:1, and the longest width is estimated.
**Q**: During the recognition training, the accuracy of the training set has reached 90, but the accuracy of the verification set has been kept at 70, what should I do? **Q**: During the recognition training, the accuracy of the training set has reached 90, but the accuracy of the verification set has been kept at 70, what should I do?
A: If the accuracy of the training set is 90 and the test set is more than 70, it should be over-fitting. There are two methods to try: A: If the accuracy of the training set is 90 and the test set is more than 70, it should be over-fitting. There are two methods to try:
(1) Add more augmentation methods or increase the [probability] of augmented prob (https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/rec_img_aug.py#L341), The default is 0.4. (1) Add more augmentation methods or increase the [probability] of augmented prob (https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/rec_img_aug.py#L341), The default is 0.4.
(2) Increase the [l2 dcay value] of the system (https://github.com/PaddlePaddle/PaddleOCR/blob/a501603d54ff5513fc4fc760319472e59da25424/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml#L47) (2) Increase the [l2 dcay value] of the system (https://github.com/PaddlePaddle/PaddleOCR/blob/a501603d54ff5513fc4fc760319472e59da25424/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml#L47)
**Q**: When the recognition model is trained, loss can drop normally, but acc is always 0 **Q**: When the recognition model is trained, loss can drop normally, but acc is always 0
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files). - 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
- 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) which is published in AAAI 2021. Find tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md);release multi language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), support more than 80 languages recognition; especically, the performance of [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is Optimized. - 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) which is published in AAAI 2021. Find tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md);release multi language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), support more than 80 languages recognition; especically, the performance of [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is Optimized.
- 2021.1.21 update more than 25+ multilingual recognition models [models list](./doc/doc_en/models_list_en.md), including:English, Chinese, German, French, Japanese,Spanish,Portuguese Russia Arabic and so on. Models for more languages will continue to be updated [Develop Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048). - 2021.1.21 update more than 25+ multilingual recognition models [models list](./models_list_en.md), including:English, Chinese, German, French, Japanese,Spanish,Portuguese Russia Arabic and so on. Models for more languages will continue to be updated [Develop Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048).
- 2020.12.15 update Data synthesis tool, i.e., [Style-Text](../../StyleText/README.md),easy to synthesize a large number of images which are similar to the target scene image. - 2020.12.15 update Data synthesis tool, i.e., [Style-Text](../../StyleText/README.md),easy to synthesize a large number of images which are similar to the target scene image.
- 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](../../PPOCRLabel/README.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used in training of the PP-OCR system directly. - 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](../../PPOCRLabel/README.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used in training of the PP-OCR system directly.
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941 - 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
......
doc/joinus.PNG

189 KB | W: | H:

doc/joinus.PNG

199 KB | W: | H:

doc/joinus.PNG
doc/joinus.PNG
doc/joinus.PNG
doc/joinus.PNG
  • 2-up
  • Swipe
  • Onion skin
...@@ -2551,7 +2551,7 @@ ...@@ -2551,7 +2551,7 @@
"\n", "\n",
"Paddle Serving是飞桨为方便开发者进行服务化部署而打造的工具,本节主要介绍基于Paddle Serving的PP-OCRv2系统服务化部署过程。\n", "Paddle Serving是飞桨为方便开发者进行服务化部署而打造的工具,本节主要介绍基于Paddle Serving的PP-OCRv2系统服务化部署过程。\n",
"\n", "\n",
"## 4.1 Padde Serving简介\n", "## 4.1 Paddle Serving简介\n",
"\n", "\n",
"Paddle Serving作为飞桨(PaddlePaddle)开源的服务化部署框架,长期目标就是围绕着人工智能落地的最后一公里提供越来越专业、可靠、易用的服务。Paddle Serving目前提供了两套框架C++ Serving和Python Pipeline。Python Pipeline框架倾向于二次开发的便捷性,C++ Serving框架更倾向于追求极致性能。\n", "Paddle Serving作为飞桨(PaddlePaddle)开源的服务化部署框架,长期目标就是围绕着人工智能落地的最后一公里提供越来越专业、可靠、易用的服务。Paddle Serving目前提供了两套框架C++ Serving和Python Pipeline。Python Pipeline框架倾向于二次开发的便捷性,C++ Serving框架更倾向于追求极致性能。\n",
"\n", "\n",
...@@ -347,8 +347,9 @@ class PaddleOCR(predict_system.TextSystem): ...@@ -347,8 +347,9 @@ class PaddleOCR(predict_system.TextSystem):
ocr with paddleocr ocr with paddleocr
args: args:
img: img for ocr, support ndarray, img_path and list or ndarray img: img for ocr, support ndarray, img_path and list or ndarray
det: use text detection or not, if false, only rec will be exec. default is True det: use text detection or not. If false, only rec will be exec. Default is True
rec: use text recognition or not, if false, only det will be exec. default is True rec: use text recognition or not. If false, only det will be exec. Default is True
cls: use angle classifier or not. Default is True. If true, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False.
""" """
assert isinstance(img, (np.ndarray, list, str)) assert isinstance(img, (np.ndarray, list, str))
if isinstance(img, list) and det == True: if isinstance(img, list) and det == True:
......
...@@ -16,6 +16,7 @@ ...@@ -16,6 +16,7 @@
class ClsMetric(object): class ClsMetric(object):
def __init__(self, main_indicator='acc', **kwargs): def __init__(self, main_indicator='acc', **kwargs):
self.main_indicator = main_indicator self.main_indicator = main_indicator
self.eps = 1e-5
self.reset() self.reset()
def __call__(self, pred_label, *args, **kwargs): def __call__(self, pred_label, *args, **kwargs):
...@@ -28,7 +29,7 @@ class ClsMetric(object): ...@@ -28,7 +29,7 @@ class ClsMetric(object):
all_num += 1 all_num += 1
self.correct_num += correct_num self.correct_num += correct_num
self.all_num += all_num self.all_num += all_num
return {'acc': correct_num / all_num, } return {'acc': correct_num / (all_num + self.eps), }
def get_metric(self): def get_metric(self):
""" """
...@@ -36,7 +37,7 @@ class ClsMetric(object): ...@@ -36,7 +37,7 @@ class ClsMetric(object):
'acc': 0 'acc': 0
} }
""" """
acc = self.correct_num / self.all_num acc = self.correct_num / (self.all_num + self.eps)
self.reset() self.reset()
return {'acc': acc} return {'acc': acc}
......
...@@ -20,6 +20,7 @@ class RecMetric(object): ...@@ -20,6 +20,7 @@ class RecMetric(object):
def __init__(self, main_indicator='acc', is_filter=False, **kwargs): def __init__(self, main_indicator='acc', is_filter=False, **kwargs):
self.main_indicator = main_indicator self.main_indicator = main_indicator
self.is_filter = is_filter self.is_filter = is_filter
self.eps = 1e-5
self.reset() self.reset()
def _normalize_text(self, text): def _normalize_text(self, text):
...@@ -47,8 +48,8 @@ class RecMetric(object): ...@@ -47,8 +48,8 @@ class RecMetric(object):
self.all_num += all_num self.all_num += all_num
self.norm_edit_dis += norm_edit_dis self.norm_edit_dis += norm_edit_dis
return { return {
'acc': correct_num / all_num, 'acc': correct_num / (all_num + self.eps),
'norm_edit_dis': 1 - norm_edit_dis / (all_num + 1e-3) 'norm_edit_dis': 1 - norm_edit_dis / (all_num + self.eps)
} }
def get_metric(self): def get_metric(self):
...@@ -58,8 +59,8 @@ class RecMetric(object): ...@@ -58,8 +59,8 @@ class RecMetric(object):
'norm_edit_dis': 0, 'norm_edit_dis': 0,
} }
""" """
acc = 1.0 * self.correct_num / (self.all_num + 1e-3) acc = 1.0 * self.correct_num / (self.all_num + self.eps)
norm_edit_dis = 1 - self.norm_edit_dis / (self.all_num + 1e-3) norm_edit_dis = 1 - self.norm_edit_dis / (self.all_num + self.eps)
self.reset() self.reset()
return {'acc': acc, 'norm_edit_dis': norm_edit_dis} return {'acc': acc, 'norm_edit_dis': norm_edit_dis}
......
...@@ -12,9 +12,12 @@ ...@@ -12,9 +12,12 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import numpy as np import numpy as np
class TableMetric(object): class TableMetric(object):
def __init__(self, main_indicator='acc', **kwargs): def __init__(self, main_indicator='acc', **kwargs):
self.main_indicator = main_indicator self.main_indicator = main_indicator
self.eps = 1e-5
self.reset() self.reset()
def __call__(self, pred, batch, *args, **kwargs): def __call__(self, pred, batch, *args, **kwargs):
...@@ -31,9 +34,7 @@ class TableMetric(object): ...@@ -31,9 +34,7 @@ class TableMetric(object):
correct_num += 1 correct_num += 1
self.correct_num += correct_num self.correct_num += correct_num
self.all_num += all_num self.all_num += all_num
return { return {'acc': correct_num * 1.0 / (all_num + self.eps), }
'acc': correct_num * 1.0 / all_num,
}
def get_metric(self): def get_metric(self):
""" """
...@@ -41,7 +42,7 @@ class TableMetric(object): ...@@ -41,7 +42,7 @@ class TableMetric(object):
'acc': 0, 'acc': 0,
} }
""" """
acc = 1.0 * self.correct_num / self.all_num acc = 1.0 * self.correct_num / (self.all_num + self.eps)
self.reset() self.reset()
return {'acc': acc} return {'acc': acc}
......
...@@ -96,7 +96,7 @@ In PP-Structure, the image will be divided into 5 types of areas **text, title, ...@@ -96,7 +96,7 @@ In PP-Structure, the image will be divided into 5 types of areas **text, title,
#### 6.1.1 Layout analysis #### 6.1.1 Layout analysis
Layout analysis classifies image by region, including the use of Python scripts of layout analysis tools, extraction of designated category detection boxes, performance indicators, and custom training layout analysis models. For details, please refer to [document](layout/README.md). Layout analysis classifies image by region, including the use of Python scripts of layout analysis tools, extraction of designated category detection boxes, performance indicators, and custom training layout analysis models. For details, please refer to [document](layout/README_en.md).
#### 6.1.2 Table recognition #### 6.1.2 Table recognition
......
# Key Information Extraction(KIE)
This section provides a tutorial example on how to quickly use, train, and evaluate a key information extraction(KIE) model, [SDMGR](https://arxiv.org/abs/2103.14470), in PaddleOCR.
[SDMGR(Spatial Dual-Modality Graph Reasoning)](https://arxiv.org/abs/2103.14470) is a KIE algorithm that classifies each detected text region into predefined categories, such as order ID, invoice number, amount, and etc.
* [1. Quick Use](#1-----)
* [2. Model Training](#2-----)
* [3. Model Evaluation](#3-----)
<a name="1-----"></a>
## 1. Quick Use
[Wildreceipt dataset](https://paperswithcode.com/dataset/wildreceipt) is used for this tutorial. It contains 1765 photos, with 25 classes, and 50000 text boxes, which can be downloaded by wget:
```shell
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
```
Download the pretrained model and predict the result:
```shell
cd PaddleOCR/
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar && tar xf kie_vgg16.tar
python3.7 tools/infer_kie.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=kie_vgg16/best_accuracy Global.infer_img=../wildreceipt/1.txt
```
The prediction result is saved as `./output/sdmgr_kie/predicts_kie.txt`, and the visualization results are saved in the folder`/output/sdmgr_kie/kie_results/`.
The visualization results are shown in the figure below:
<div align="center">
<img src="./imgs/0.png" width="800">
</div>
<a name="2-----"></a>
## 2. Model Training
Create a softlink to the folder, `PaddleOCR/train_data`:
```shell
cd PaddleOCR/ && mkdir train_data && cd train_data
ln -s ../../wildreceipt ./
```
The configuration file used for training is `configs/kie/kie_unet_sdmgr.yml`. The default training data path in the configuration file is `train_data/wildreceipt`. After preparing the data, you can execute the model training with the following command:
```shell
python3.7 tools/train.py -c configs/kie/kie_unet_sdmgr.yml -o Global.save_model_dir=./output/kie/
```
<a name="3-----"></a>
## 3. Model Evaluation
After training, you can execute the model evaluation with the following command:
```shell
python3.7 tools/eval.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=./output/kie/best_accuracy
```
**Reference:**
<!-- [ALGORITHM] -->
```bibtex
@misc{sun2021spatial,
title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction},
author={Hongbin Sun and Zhanghui Kuang and Xiaoyu Yue and Chenhao Lin and Wayne Zhang},
year={2021},
eprint={2103.14470},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
...@@ -39,7 +39,7 @@ paddleocr --image_dir=../doc/table/1.png --type=structure ...@@ -39,7 +39,7 @@ paddleocr --image_dir=../doc/table/1.png --type=structure
* VQA * VQA
coming soon 请参考:[文档视觉问答](../vqa/README.md)
<a name="22"></a> <a name="22"></a>
...@@ -74,7 +74,7 @@ im_show.save('result.jpg') ...@@ -74,7 +74,7 @@ im_show.save('result.jpg')
* VQA * VQA
comming soon 请参考:[文档视觉问答](../vqa/README.md)
<a name="23"></a> <a name="23"></a>
...@@ -101,7 +101,7 @@ dict 里各个字段说明如下 ...@@ -101,7 +101,7 @@ dict 里各个字段说明如下
* VQA * VQA
comming soon 请参考:[文档视觉问答](../vqa/README.md)
<a name="24"></a> <a name="24"></a>
...@@ -116,9 +116,9 @@ comming soon ...@@ -116,9 +116,9 @@ comming soon
| model_name_or_path | VQA SER模型地址 | None | | model_name_or_path | VQA SER模型地址 | None |
| max_seq_length | VQA SER模型最大支持token长度 | 512 | | max_seq_length | VQA SER模型最大支持token长度 | 512 |
| label_map_path | VQA SER 标签文件地址 | ./vqa/labels/labels_ser.txt | | label_map_path | VQA SER 标签文件地址 | ./vqa/labels/labels_ser.txt |
| mode | pipeline预测模式,structure: 版面分析+表格识别; vqa: ser文档信息抽取 | structure | | mode | pipeline预测模式,structure: 版面分析+表格识别; VQA: SER文档信息抽取 | structure |
大部分参数和paddleocr whl包保持一致,见 [whl包文档](../doc/doc_ch/whl.md) 大部分参数和PaddleOCR whl包保持一致,见 [whl包文档](../../doc/doc_ch/whl.md)
运行完成后,每张图片会在`output`字段指定的目录下有一个同名目录,图片里的每个表格会存储为一个excel,图片区域会被裁剪之后保存下来,excel文件和图片名名为表格在图片里的坐标。 运行完成后,每张图片会在`output`字段指定的目录下有一个同名目录,图片里的每个表格会存储为一个excel,图片区域会被裁剪之后保存下来,excel文件和图片名名为表格在图片里的坐标。
...@@ -133,16 +133,16 @@ cd ppstructure ...@@ -133,16 +133,16 @@ cd ppstructure
# 下载模型 # 下载模型
mkdir inference && cd inference mkdir inference && cd inference
# 下载超轻量级中文OCR模型的检测模型并解压 # 下载PP-OCRv2文本检测模型并解压
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar && tar xf ch_PP-OCRv2_det_slim_quant_infer.tar
# 下载超轻量级中文OCR模型的识别模型并解压 # 下载PP-OCRv2文本识别模型并解压
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar && tar xf ch_PP-OCRv2_rec_slim_quant_infer.tar
# 下载超轻量级英文表格英寸模型并解压 # 下载超轻量级英文表格预测模型并解压
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd .. cd ..
python3 predict_system.py --det_model_dir=inference/ch_ppocr_mobile_v2.0_det_infer \ python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \
--rec_model_dir=inference/ch_ppocr_mobile_v2.0_rec_infer \ --rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \
--table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \ --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
--image_dir=../doc/table/1.png \ --image_dir=../doc/table/1.png \
--rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \ --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
......
...@@ -41,7 +41,7 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_tab ...@@ -41,7 +41,7 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_tab
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd .. cd ..
# run # run
python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=../doc/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --rec_char_dict_path=../ppocr/utils/dict/en_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=../doc/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table
``` ```
Note: The above model is trained on the PubLayNet dataset and only supports English scanning scenarios. If you need to identify other scenarios, you need to train the model yourself and replace the three fields `det_model_dir`, `rec_model_dir`, `table_model_dir`. Note: The above model is trained on the PubLayNet dataset and only supports English scanning scenarios. If you need to identify other scenarios, you need to train the model yourself and replace the three fields `det_model_dir`, `rec_model_dir`, `table_model_dir`.
......
...@@ -56,7 +56,7 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_tab ...@@ -56,7 +56,7 @@ wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_tab
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd .. cd ..
# 执行预测 # 执行预测
python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=../doc/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --rec_char_dict_path=../ppocr/utils/dict/en_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=../doc/table/table.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table
``` ```
运行完成后,每张图片的excel表格会保存到output字段指定的目录下 运行完成后,每张图片的excel表格会保存到output字段指定的目录下
......
...@@ -68,14 +68,14 @@ test_tipc/ ...@@ -68,14 +68,14 @@ test_tipc/
├── model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt # 测试Linux上c++预测的配置文件 ├── model_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt # 测试Linux上c++预测的配置文件
├── model_linux_gpu_normal_normal_infer_python_jetson.txt # 测试Jetson上python预测的配置文件 ├── model_linux_gpu_normal_normal_infer_python_jetson.txt # 测试Jetson上python预测的配置文件
├── train_linux_gpu_fleet_amp_infer_python_linux_gpu_cpu.txt # 测试Linux上多机多卡、混合精度训练和python预测的配置文件 ├── train_linux_gpu_fleet_amp_infer_python_linux_gpu_cpu.txt # 测试Linux上多机多卡、混合精度训练和python预测的配置文件
├── ... ├── ...
├── ch_ppocr_server_v2.0_det # ch_ppocr_server_v2.0_det模型的测试配置文件目录 ├── ch_ppocr_server_v2.0_det # ch_ppocr_server_v2.0_det模型的测试配置文件目录
├── ... ├── ...
├── ch_ppocr_mobile_v2.0_rec # ch_ppocr_mobile_v2.0_rec模型的测试配置文件目录 ├── ch_ppocr_mobile_v2.0_rec # ch_ppocr_mobile_v2.0_rec模型的测试配置文件目录
├── ... ├── ...
├── ch_ppocr_server_v2.0_det # ch_ppocr_server_v2.0_det模型的测试配置文件目录 ├── ch_ppocr_server_v2.0_det # ch_ppocr_server_v2.0_det模型的测试配置文件目录
├── ... ├── ...
├── ... ├── ...
├── results/ # 预先保存的预测结果,用于和实际预测结果进行精读比对 ├── results/ # 预先保存的预测结果,用于和实际预测结果进行精读比对
├── python_ppocr_det_mobile_results_fp32.txt # 预存的mobile版ppocr检测模型python预测fp32精度的结果 ├── python_ppocr_det_mobile_results_fp32.txt # 预存的mobile版ppocr检测模型python预测fp32精度的结果
├── python_ppocr_det_mobile_results_fp16.txt # 预存的mobile版ppocr检测模型python预测fp16精度的结果 ├── python_ppocr_det_mobile_results_fp16.txt # 预存的mobile版ppocr检测模型python预测fp16精度的结果
...@@ -119,7 +119,7 @@ bash test_tipc/test_train_inference_python.sh configs/[model_name]/[params_file_ ...@@ -119,7 +119,7 @@ bash test_tipc/test_train_inference_python.sh configs/[model_name]/[params_file_
bash test_tipc/prepare.sh ./test_tipc/configs/ch_ppocr_mobile_v2.0_det/train_infer_python.txt 'lite_train_lite_infer' bash test_tipc/prepare.sh ./test_tipc/configs/ch_ppocr_mobile_v2.0_det/train_infer_python.txt 'lite_train_lite_infer'
# 运行测试 # 运行测试
bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/ch_ppocr_mobile_v2.0_det/train_infer_python.txt 'lite_train_lite_infer' bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/ch_ppocr_mobile_v2.0_det/train_infer_python.txt 'lite_train_lite_infer'
``` ```
关于本示例命令的更多信息可查看[基础训练预测使用文档](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/test_tipc/docs/test_train_inference_python.md#22-%E5%8A%9F%E8%83%BD%E6%B5%8B%E8%AF%95) 关于本示例命令的更多信息可查看[基础训练预测使用文档](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/test_tipc/docs/test_train_inference_python.md#22-%E5%8A%9F%E8%83%BD%E6%B5%8B%E8%AF%95)
### 配置文件命名规范 ### 配置文件命名规范
...@@ -136,9 +136,9 @@ bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/ch_ppocr_mobil ...@@ -136,9 +136,9 @@ bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/ch_ppocr_mobil
<a name="more"></a> <a name="more"></a>
## 4. 开始测试 ## 4. 开始测试
各功能测试中涉及混合精度、裁剪、量化等训练相关,及mkldnn、Tensorrt等多种预测相关参数配置,请点击下方相应链接了解更多细节和使用教程: 各功能测试中涉及混合精度、裁剪、量化等训练相关,及mkldnn、Tensorrt等多种预测相关参数配置,请点击下方相应链接了解更多细节和使用教程:
- [test_train_inference_python 使用](docs/test_train_inference_python.md) :测试基于Python的模型训练、评估、推理等基本功能,包括裁剪、量化、蒸馏。 - [test_train_inference_python 使用](docs/test_train_inference_python.md) :测试基于Python的模型训练、评估、推理等基本功能,包括裁剪、量化、蒸馏。
- [test_inference_cpp 使用](docs/test_inference_cpp.md):测试基于C++的模型推理。 - [test_inference_cpp 使用](docs/test_inference_cpp.md):测试基于C++的模型推理。
- [test_serving 使用](docs/test_serving.md):测试基于Paddle Serving的服务化部署功能。 - [test_serving 使用](docs/test_serving.md):测试基于Paddle Serving的服务化部署功能。
- [test_lite_arm_cpu_cpp 使用](docs/test_lite_arm_cpu_cpp.md):测试基于Paddle-Lite的ARM CPU端c++预测部署功能。 - [test_lite_arm_cpp 使用](docs/test_lite_arm_cpp.md):测试基于Paddle-Lite的ARM CPU端c++预测部署功能。
- [test_paddle2onnx 使用](docs/test_paddle2onnx.md):测试Paddle2ONNX的模型转化功能,并验证正确性。 - [test_paddle2onnx 使用](docs/test_paddle2onnx.md):测试Paddle2ONNX的模型转化功能,并验证正确性。
...@@ -284,7 +284,6 @@ else ...@@ -284,7 +284,6 @@ else
set_amp_config=" " set_amp_config=" "
fi fi
for trainer in ${trainer_list[*]}; do for trainer in ${trainer_list[*]}; do
eval ${env}
flag_quant=False flag_quant=False
if [ ${trainer} = ${pact_key} ]; then if [ ${trainer} = ${pact_key} ]; then
run_train=${pact_trainer} run_train=${pact_trainer}
...@@ -344,6 +343,7 @@ else ...@@ -344,6 +343,7 @@ else
# run eval # run eval
if [ ${eval_py} != "null" ]; then if [ ${eval_py} != "null" ]; then
eval ${env}
set_eval_params1=$(func_set_params "${eval_key1}" "${eval_value1}") set_eval_params1=$(func_set_params "${eval_key1}" "${eval_value1}")
eval_cmd="${python} ${eval_py} ${set_eval_pretrain} ${set_use_gpu} ${set_eval_params1}" eval_cmd="${python} ${eval_py} ${set_eval_pretrain} ${set_use_gpu} ${set_eval_params1}"
eval $eval_cmd eval $eval_cmd
......
...@@ -92,11 +92,11 @@ class TextSystem(object): ...@@ -92,11 +92,11 @@ class TextSystem(object):
self.draw_crop_rec_res(self.args.crop_res_save_dir, img_crop_list, self.draw_crop_rec_res(self.args.crop_res_save_dir, img_crop_list,
rec_res) rec_res)
filter_boxes, filter_rec_res = [], [] filter_boxes, filter_rec_res = [], []
for box, rec_reuslt in zip(dt_boxes, rec_res): for box, rec_result in zip(dt_boxes, rec_res):
text, score = rec_reuslt text, score = rec_result
if score >= self.drop_score: if score >= self.drop_score:
filter_boxes.append(box) filter_boxes.append(box)
filter_rec_res.append(rec_reuslt) filter_rec_res.append(rec_result)
return filter_boxes, filter_rec_res return filter_boxes, filter_rec_res
......
...@@ -73,8 +73,8 @@ def main(): ...@@ -73,8 +73,8 @@ def main():
images = paddle.to_tensor(images) images = paddle.to_tensor(images)
preds = model(images) preds = model(images)
post_result = post_process_class(preds) post_result = post_process_class(preds)
for rec_reuslt in post_result: for rec_result in post_result:
logger.info('\t result: {}'.format(rec_reuslt)) logger.info('\t result: {}'.format(rec_result))
logger.info("success!") logger.info("success!")
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment