Commit ec0de454 authored by Leif

Merge remote-tracking branch 'upstream/dygraph' into dy1

parents a127340c c3e5522c
...@@ -122,8 +122,7 @@ For a new language request, please refer to [Guideline for new language_requests ...@@ -122,8 +122,7 @@ For a new language request, please refer to [Guideline for new language_requests
<img src="./doc/ppocr_framework.png" width="800"> <img src="./doc/ppocr_framework.png" width="800">
</div> </div>
PP-OCR is a practical ultra-lightweight OCR system. It consists mainly of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English and digit OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). In addition, the implementation of the FPGM Pruner and PACT quantization is based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim). PP-OCR is a practical ultra-lightweight OCR system. It consists mainly of three parts: DB text detection[2], detection frame correction and CRNN text recognition[7]. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English and digit OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). In addition, the implementation of the FPGM Pruner [8] and PACT quantization [9] is based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).
## Visualization [more](./doc/doc_en/visualization_en.md) ## Visualization [more](./doc/doc_en/visualization_en.md)
......
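...@@ -8,8 +8,8 @@ PaddleOCR supports both dynamic-graph and static-graph programming paradigms ...@@ -8,8 +8,8 @@ PaddleOCR supports both dynamic-graph and static-graph programming paradigms
- Static graph version: develop branch - Static graph version: develop branch
**Recent updates** **Recent updates**
- 2021.1.4 [FAQ](./doc/doc_ch/FAQ.md): 5 new frequently asked questions added, 142 in total. The FAQ is updated every Monday, so please keep following it.
- 2020.12.15 Released the data synthesis tool [Style-Text](./StyleText/README_ch.md), which can batch-synthesize large numbers of images similar to the target scene; validated in multiple scenarios with a clear improvement in results. - 2020.12.15 Released the data synthesis tool [Style-Text](./StyleText/README_ch.md), which can batch-synthesize large numbers of images similar to the target scene; validated in multiple scenarios with a clear improvement in results.
- 2020.12.14 [FAQ](./doc/doc_ch/FAQ.md): 5 new frequently asked questions added, 127 in total. The FAQ is updated every Monday, so please keep following it.
- 2020.11.25 Released the semi-automatic annotation tool [PPOCRLabel](./PPOCRLabel/README_ch.md), which helps developers complete annotation tasks efficiently; its output format plugs directly into PP-OCR training tasks. - 2020.11.25 Released the semi-automatic annotation tool [PPOCRLabel](./PPOCRLabel/README_ch.md), which helps developers complete annotation tasks efficiently; its output format plugs directly into PP-OCR training tasks.
- 2020.9.22 Released the PP-OCR technical article, https://arxiv.org/abs/2009.09941 - 2020.9.22 Released the PP-OCR technical article, https://arxiv.org/abs/2009.09941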
- [More](./doc/doc_ch/update.md) - [More](./doc/doc_ch/update.md)
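...@@ -101,8 +101,8 @@ PaddleOCR supports both dynamic-graph and static-graph programming paradigms ...@@ -101,8 +101,8 @@ PaddleOCR supports both dynamic-graph and static-graph programming paradigms
- [Visualization](#效果展示) - [Visualization](#效果展示)
- FAQ - FAQ
- [Selected: 10 key OCR questions](./doc/doc_ch/FAQ.md) - [Selected: 10 key OCR questions](./doc/doc_ch/FAQ.md)
- [Theory: 30 general OCR questions](./doc/doc_ch/FAQ.md) - [Theory: 31 general OCR questions](./doc/doc_ch/FAQ.md)
- [Practice: 84 PaddleOCR hands-on questions](./doc/doc_ch/FAQ.md) - [Practice: 101 PaddleOCR hands-on questions](./doc/doc_ch/FAQ.md)
- [Technical discussion group](#欢迎加入PaddleOCR技术交流群) - [Technical discussion group](#欢迎加入PaddleOCR技术交流群)
- [References](./doc/doc_ch/reference.md) - [References](./doc/doc_ch/reference.md)
- [License](#许可证书) - [License](#许可证书)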
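...@@ -115,7 +115,7 @@ PaddleOCR supports both dynamic-graph and static-graph programming paradigms ...@@ -115,7 +115,7 @@ PaddleOCR supports both dynamic-graph and static-graph programming paradigms
<img src="./doc/ppocr_framework.png" width="800"> <img src="./doc/ppocr_framework.png" width="800">
</div> </div>
PP-OCR is a practical ultra-lightweight OCR system. It consists mainly of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects (backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-trained model use, and automatic model pruning and quantization) to tune and slim down the model of each module, resulting in an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English and digit OCR model. For more details, please refer to the PP-OCR technical article https://arxiv.org/abs/2009.09941 . For the implementation of the FPGM pruner and PACT quantization, refer to [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) PP-OCR is a practical ultra-lightweight OCR system. It consists mainly of three parts: DB text detection[2], detection frame correction and CRNN text recognition[7]. The system adopts 19 effective strategies from 8 aspects (backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-trained model use, and automatic model pruning and quantization) to tune and slim down the model of each module, resulting in an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English and digit OCR model. For more details, please refer to the PP-OCR technical article https://arxiv.org/abs/2009.09941 . For the implementation of the FPGM pruner[8] and PACT quantization[9], refer to [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
<a name="效果展示"></a> <a name="效果展示"></a>
## Visualization [more](./doc/doc_ch/visualization.md) ## Visualization [more](./doc/doc_ch/visualization.md)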
......
...@@ -22,7 +22,7 @@ English | [简体中文](README_ch.md) ...@@ -22,7 +22,7 @@ English | [简体中文](README_ch.md)
</div> </div>
The Style-Text data synthesis tool is a tool based on Baidu's self-developed text editing algorithm "Editing Text in the Wild" [https://arxiv.org/abs/1908.03047](https://arxiv.org/abs/1908.03047). The Style-Text data synthesis tool is a tool based on Baidu and HUST cooperation research work, "Editing Text in the Wild" [https://arxiv.org/abs/1908.03047](https://arxiv.org/abs/1908.03047).
Different from the commonly used GAN-based data synthesis tools, the main framework of Style-Text includes: Different from the commonly used GAN-based data synthesis tools, the main framework of Style-Text includes:
* (1) Text foreground style transfer module. * (1) Text foreground style transfer module.
...@@ -69,12 +69,14 @@ fusion_generator: ...@@ -69,12 +69,14 @@ fusion_generator:
1. You can run `tools/synth_image` and generate the demo image, which is saved in the current folder. 1. You can run `tools/synth_image` and generate the demo image, which is saved in the current folder.
```python ```python
python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en python3 tools/synth_image.py -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
``` ```
* Note 1: The language option must correspond to the corpus. Currently, the tool supports only English, Simplified Chinese and Korean. * Note 1: The language option must correspond to the corpus. Currently, the tool supports only English, Simplified Chinese and Korean.
* Note 2: Style-Text is mainly used to generate images for OCR recognition models. * Note 2: Style-Text is mainly used to generate images for OCR recognition models.
So the height of style images should be around 32 pixels. Images of other sizes may give poor results. So the height of style images should be around 32 pixels. Images of other sizes may give poor results.
* Note 3: You can modify `use_gpu` in `configs/config.yml` to determine whether to use GPU for prediction.
For example, enter the following image and corpus `PaddleOCR`. For example, enter the following image and corpus `PaddleOCR`.
...@@ -139,8 +141,9 @@ We provide a general dataset containing Chinese, English and Korean (50,000 imag ...@@ -139,8 +141,9 @@ We provide a general dataset containing Chinese, English and Korean (50,000 imag
2. You can run the following command to start synthesis task: 2. You can run the following command to start synthesis task:
``` bash ``` bash
python -m tools.synth_dataset.py -c configs/dataset_config.yml python3 tools/synth_dataset.py -c configs/dataset_config.yml
``` ```
We also provide an example corpus and images in the `examples` folder. We also provide an example corpus and images in the `examples` folder.
<div align="center"> <div align="center">
<img src="examples/style_images/1.jpg" width="300"> <img src="examples/style_images/1.jpg" width="300">
......
...@@ -21,7 +21,7 @@ ...@@ -21,7 +21,7 @@
</div> </div>
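The Style-Text data synthesis tool is based on Baidu's self-developed text editing algorithm "Editing Text in the Wild" https://arxiv.org/abs/1908.03047 The Style-Text data synthesis tool is based on the text editing algorithm "Editing Text in the Wild", jointly developed by Baidu and HUST https://arxiv.org/abs/1908.03047
Unlike the commonly used GAN-based data synthesis tools, the main framework of Style-Text includes: 1. a text foreground style transfer module, 2. a background extraction module, 3. a fusion module. After these three steps, image text style transfer can be achieved quickly. The figure below shows some results of the data synthesis tool. Unlike the commonly used GAN-based data synthesis tools, the main framework of Style-Text includes: 1. a text foreground style transfer module, 2. a background extraction module, 3. a fusion module. After these three steps, image text style transfer can be achieved quickly. The figure below shows some results of the data synthesis tool.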
...@@ -61,11 +61,12 @@ fusion_generator: ...@@ -61,11 +61,12 @@ fusion_generator:
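Given a style image and a text corpus, run tools/synth_image to synthesize a single image; the result is saved in the current directory: Given a style image and a text corpus, run tools/synth_image to synthesize a single image; the result is saved in the current directory: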
```python ```python
python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en python3 tools/synth_image.py -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
``` ```
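* Note 1: The language option must correspond to the corpus. Currently, the tool supports only English, Simplified Chinese and Korean. * Note 1: The language option must correspond to the corpus. Currently, the tool supports only English, Simplified Chinese and Korean.
* Note 2: Data generated by Style-Text is mainly intended for OCR recognition scenarios. Given the design of the current PaddleOCR recognition model, we mainly support style images with a height of about 32. * Note 2: Data generated by Style-Text is mainly intended for OCR recognition scenarios. Given the design of the current PaddleOCR recognition model, we mainly support style images with a height of about 32.
If the input image size differs too much from this, results may be poor. If the input image size differs too much from this, results may be poor.
* Note 3: You can set the `use_gpu` parameter (true or false) in the configuration file to decide whether to use the GPU for prediction.
For example, given the following image and the corpus "PaddleOCR": For example, given the following image and the corpus "PaddleOCR":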
...@@ -127,7 +128,7 @@ python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_ ...@@ -127,7 +128,7 @@ python3 -m tools.synth_image -c configs/config.yml --style_image examples/style_
2. Run `tools/synth_dataset` to synthesize data: 2. Run `tools/synth_dataset` to synthesize data:
``` bash ``` bash
python -m tools.synth_dataset -c configs/dataset_config.yml python3 tools/synth_dataset.py -c configs/dataset_config.yml
``` ```
We provide sample images and a corpus in the examples directory. We provide sample images and a corpus in the examples directory.
<div align="center"> <div align="center">
......
...@@ -28,6 +28,7 @@ class StyleTextRecPredictor(object): ...@@ -28,6 +28,7 @@ class StyleTextRecPredictor(object):
], "Generator {} not supported.".format(algorithm) ], "Generator {} not supported.".format(algorithm)
use_gpu = config["Global"]['use_gpu'] use_gpu = config["Global"]['use_gpu']
check_gpu(use_gpu) check_gpu(use_gpu)
paddle.set_device('gpu' if use_gpu else 'cpu')
self.logger = get_logger() self.logger = get_logger()
self.generator = getattr(style_text_rec, algorithm)(config) self.generator = getattr(style_text_rec, algorithm)(config)
self.height = config["Global"]["image_height"] self.height = config["Global"]["image_height"]
......
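Review note on the `paddle.set_device` line added above: the device string is derived directly from `Global.use_gpu`. A minimal sketch of that selection logic, kept free of Paddle so it can run anywhere; the helper name `pick_device` is hypothetical and not part of the repository.

```python
def pick_device(config):
    """Return the device string implied by config["Global"]["use_gpu"].

    Hypothetical helper: it mirrors the expression used in the hunk above,
    `'gpu' if use_gpu else 'cpu'`, without importing paddle.
    """
    use_gpu = bool(config["Global"]["use_gpu"])
    return "gpu" if use_gpu else "cpu"


if __name__ == "__main__":
    # With use_gpu disabled in the config, the predictor would run on CPU.
    print(pick_device({"Global": {"use_gpu": False}}))
```

In the predictor itself, the returned string would be passed to `paddle.set_device`, as the added line in the hunk does.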
...@@ -11,6 +11,14 @@ ...@@ -11,6 +11,14 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))
from engine.synthesisers import DatasetSynthesiser from engine.synthesisers import DatasetSynthesiser
......
...@@ -16,13 +16,13 @@ import cv2 ...@@ -16,13 +16,13 @@ import cv2
import sys import sys
import glob import glob
from utils.config import ArgsParser
from engine.synthesisers import ImageSynthesiser
__dir__ = os.path.dirname(os.path.abspath(__file__)) __dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__) sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '..'))) sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))
from utils.config import ArgsParser
from engine.synthesisers import ImageSynthesiser
def synth_image(): def synth_image():
args = ArgsParser().parse_args() args = ArgsParser().parse_args()
......
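Review note on the import reordering above: the project imports (`utils.config`, `engine.synthesisers`) now come after the `sys.path` changes, which is presumably what lets the script be started as `python3 tools/synth_image.py` instead of `python3 -m tools.synth_image`. A small sketch of the same path setup, assuming the script sits one level below the project root; `add_project_root` is a hypothetical name used only for illustration.

```python
import os
import sys


def add_project_root(script_path):
    """Append the script's directory and its parent to sys.path.

    Hypothetical helper mirroring the three lines in the hunk above:
    after it runs, packages located in the parent directory can be
    imported even when the script is launched by file path.
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    project_root = os.path.abspath(os.path.join(script_dir, ".."))
    for path in (script_dir, project_root):
        if path not in sys.path:
            sys.path.append(path)
    return project_root


if __name__ == "__main__":
    # Imports of project packages must come after this call.
    print(add_project_root(__file__))
```

The ordering matters: any `from utils... import` placed before the `sys.path.append` calls is resolved against the old search path, which is the problem the hunk fixes.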
...@@ -67,7 +67,7 @@ Train: ...@@ -67,7 +67,7 @@ Train:
data_dir: ./train_data/icdar2015/text_localization/ data_dir: ./train_data/icdar2015/text_localization/
label_file_list: label_file_list:
- ./train_data/icdar2015/text_localization/train_icdar2015_label.txt - ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
ratio_list: [0.5] ratio_list: [1.0]
transforms: transforms:
- DecodeImage: # load image - DecodeImage: # load image
img_mode: BGR img_mode: BGR
......
...@@ -66,7 +66,7 @@ Train: ...@@ -66,7 +66,7 @@ Train:
data_dir: ./train_data/icdar2015/text_localization/ data_dir: ./train_data/icdar2015/text_localization/
label_file_list: label_file_list:
- ./train_data/icdar2015/text_localization/train_icdar2015_label.txt - ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
ratio_list: [0.5] ratio_list: [1.0]
transforms: transforms:
- DecodeImage: # load image - DecodeImage: # load image
img_mode: BGR img_mode: BGR
......
...@@ -62,7 +62,7 @@ Train: ...@@ -62,7 +62,7 @@ Train:
name: SimpleDataSet name: SimpleDataSet
data_dir: ./train_data/ data_dir: ./train_data/
label_file_list: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt] label_file_list: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train_label_json.txt]
data_ratio_list: [0.5, 0.5] ratio_list: [0.5, 0.5]
transforms: transforms:
- DecodeImage: # load image - DecodeImage: # load image
img_mode: BGR img_mode: BGR
......
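Review note on the `ratio_list` changes above: the diff shows only the key rename (`data_ratio_list` to `ratio_list`) and the value change from `[0.5]` to `[1.0]`, not how the loader consumes it. The sketch below assumes each ratio is the fraction of lines sampled from the matching label file; `sample_labels` and the in-memory label lists are hypothetical stand-ins, not the repository's `SimpleDataSet` code.

```python
import random


def sample_labels(label_files, ratio_list, seed=0):
    """Sample a fraction of the lines from each label file.

    label_files: list of lists of label lines (stand-in for files on disk).
    ratio_list:  one ratio in [0, 1] per label file (assumed semantics).
    """
    if len(label_files) != len(ratio_list):
        raise ValueError("ratio_list needs one entry per label file")
    rng = random.Random(seed)
    sampled = []
    for lines, ratio in zip(label_files, ratio_list):
        keep = round(len(lines) * ratio)
        sampled.extend(rng.sample(lines, keep))
    return sampled


if __name__ == "__main__":
    icdar = ["img_%d.jpg" % i for i in range(10)]
    print(len(sample_labels([icdar], [0.5])))  # half of the lines
    print(len(sample_labels([icdar], [1.0])))  # every line
```

Under that assumption, `[0.5]` trains on half of the single ICDAR2015 label file per pass, while `[1.0]` uses all of it.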
...@@ -25,9 +25,9 @@ ...@@ -25,9 +25,9 @@
namespace PaddleOCR { namespace PaddleOCR {
class Config { class OCRConfig {
public: public:
explicit Config(const std::string &config_file) { explicit OCRConfig(const std::string &config_file) {
config_map_ = LoadConfig(config_file); config_map_ = LoadConfig(config_file);
this->use_gpu = bool(stoi(config_map_["use_gpu"])); this->use_gpu = bool(stoi(config_map_["use_gpu"]));
...@@ -41,8 +41,6 @@ public: ...@@ -41,8 +41,6 @@ public:
this->use_mkldnn = bool(stoi(config_map_["use_mkldnn"])); this->use_mkldnn = bool(stoi(config_map_["use_mkldnn"]));
this->use_zero_copy_run = bool(stoi(config_map_["use_zero_copy_run"]));
this->max_side_len = stoi(config_map_["max_side_len"]); this->max_side_len = stoi(config_map_["max_side_len"]);
this->det_db_thresh = stod(config_map_["det_db_thresh"]); this->det_db_thresh = stod(config_map_["det_db_thresh"]);
...@@ -76,8 +74,6 @@ public: ...@@ -76,8 +74,6 @@ public:
bool use_mkldnn = false; bool use_mkldnn = false;
bool use_zero_copy_run = false;
int max_side_len = 960; int max_side_len = 960;
double det_db_thresh = 0.3; double det_db_thresh = 0.3;
......
...@@ -30,6 +30,8 @@ ...@@ -30,6 +30,8 @@
#include <include/preprocess_op.h> #include <include/preprocess_op.h>
#include <include/utility.h> #include <include/utility.h>
using namespace paddle_infer;
namespace PaddleOCR { namespace PaddleOCR {
class Classifier { class Classifier {
...@@ -37,14 +39,12 @@ public: ...@@ -37,14 +39,12 @@ public:
explicit Classifier(const std::string &model_dir, const bool &use_gpu, explicit Classifier(const std::string &model_dir, const bool &use_gpu,
const int &gpu_id, const int &gpu_mem, const int &gpu_id, const int &gpu_mem,
const int &cpu_math_library_num_threads, const int &cpu_math_library_num_threads,
const bool &use_mkldnn, const bool &use_zero_copy_run, const bool &use_mkldnn, const double &cls_thresh) {
const double &cls_thresh) {
this->use_gpu_ = use_gpu; this->use_gpu_ = use_gpu;
this->gpu_id_ = gpu_id; this->gpu_id_ = gpu_id;
this->gpu_mem_ = gpu_mem; this->gpu_mem_ = gpu_mem;
this->cpu_math_library_num_threads_ = cpu_math_library_num_threads; this->cpu_math_library_num_threads_ = cpu_math_library_num_threads;
this->use_mkldnn_ = use_mkldnn; this->use_mkldnn_ = use_mkldnn;
this->use_zero_copy_run_ = use_zero_copy_run;
this->cls_thresh = cls_thresh; this->cls_thresh = cls_thresh;
...@@ -57,14 +57,13 @@ public: ...@@ -57,14 +57,13 @@ public:
cv::Mat Run(cv::Mat &img); cv::Mat Run(cv::Mat &img);
private: private:
std::shared_ptr<PaddlePredictor> predictor_; std::shared_ptr<Predictor> predictor_;
bool use_gpu_ = false; bool use_gpu_ = false;
int gpu_id_ = 0; int gpu_id_ = 0;
int gpu_mem_ = 4000; int gpu_mem_ = 4000;
int cpu_math_library_num_threads_ = 4; int cpu_math_library_num_threads_ = 4;
bool use_mkldnn_ = false; bool use_mkldnn_ = false;
bool use_zero_copy_run_ = false;
double cls_thresh = 0.5; double cls_thresh = 0.5;
std::vector<float> mean_ = {0.5f, 0.5f, 0.5f}; std::vector<float> mean_ = {0.5f, 0.5f, 0.5f};
......
...@@ -32,6 +32,8 @@ ...@@ -32,6 +32,8 @@
#include <include/postprocess_op.h> #include <include/postprocess_op.h>
#include <include/preprocess_op.h> #include <include/preprocess_op.h>
using namespace paddle_infer;
namespace PaddleOCR { namespace PaddleOCR {
class DBDetector { class DBDetector {
...@@ -39,8 +41,8 @@ public: ...@@ -39,8 +41,8 @@ public:
explicit DBDetector(const std::string &model_dir, const bool &use_gpu, explicit DBDetector(const std::string &model_dir, const bool &use_gpu,
const int &gpu_id, const int &gpu_mem, const int &gpu_id, const int &gpu_mem,
const int &cpu_math_library_num_threads, const int &cpu_math_library_num_threads,
const bool &use_mkldnn, const bool &use_zero_copy_run, const bool &use_mkldnn, const int &max_side_len,
const int &max_side_len, const double &det_db_thresh, const double &det_db_thresh,
const double &det_db_box_thresh, const double &det_db_box_thresh,
const double &det_db_unclip_ratio, const double &det_db_unclip_ratio,
const bool &visualize) { const bool &visualize) {
...@@ -49,7 +51,6 @@ public: ...@@ -49,7 +51,6 @@ public:
this->gpu_mem_ = gpu_mem; this->gpu_mem_ = gpu_mem;
this->cpu_math_library_num_threads_ = cpu_math_library_num_threads; this->cpu_math_library_num_threads_ = cpu_math_library_num_threads;
this->use_mkldnn_ = use_mkldnn; this->use_mkldnn_ = use_mkldnn;
this->use_zero_copy_run_ = use_zero_copy_run;
this->max_side_len_ = max_side_len; this->max_side_len_ = max_side_len;
...@@ -69,14 +70,13 @@ public: ...@@ -69,14 +70,13 @@ public:
void Run(cv::Mat &img, std::vector<std::vector<std::vector<int>>> &boxes); void Run(cv::Mat &img, std::vector<std::vector<std::vector<int>>> &boxes);
private: private:
std::shared_ptr<PaddlePredictor> predictor_; std::shared_ptr<Predictor> predictor_;
bool use_gpu_ = false; bool use_gpu_ = false;
int gpu_id_ = 0; int gpu_id_ = 0;
int gpu_mem_ = 4000; int gpu_mem_ = 4000;
int cpu_math_library_num_threads_ = 4; int cpu_math_library_num_threads_ = 4;
bool use_mkldnn_ = false; bool use_mkldnn_ = false;
bool use_zero_copy_run_ = false;
int max_side_len_ = 960; int max_side_len_ = 960;
......
...@@ -32,6 +32,8 @@ ...@@ -32,6 +32,8 @@
#include <include/preprocess_op.h> #include <include/preprocess_op.h>
#include <include/utility.h> #include <include/utility.h>
using namespace paddle_infer;
namespace PaddleOCR { namespace PaddleOCR {
class CRNNRecognizer { class CRNNRecognizer {
...@@ -39,14 +41,12 @@ public: ...@@ -39,14 +41,12 @@ public:
explicit CRNNRecognizer(const std::string &model_dir, const bool &use_gpu, explicit CRNNRecognizer(const std::string &model_dir, const bool &use_gpu,
const int &gpu_id, const int &gpu_mem, const int &gpu_id, const int &gpu_mem,
const int &cpu_math_library_num_threads, const int &cpu_math_library_num_threads,
const bool &use_mkldnn, const bool &use_zero_copy_run, const bool &use_mkldnn, const string &label_path) {
const string &label_path) {
this->use_gpu_ = use_gpu; this->use_gpu_ = use_gpu;
this->gpu_id_ = gpu_id; this->gpu_id_ = gpu_id;
this->gpu_mem_ = gpu_mem; this->gpu_mem_ = gpu_mem;
this->cpu_math_library_num_threads_ = cpu_math_library_num_threads; this->cpu_math_library_num_threads_ = cpu_math_library_num_threads;
this->use_mkldnn_ = use_mkldnn; this->use_mkldnn_ = use_mkldnn;
this->use_zero_copy_run_ = use_zero_copy_run;
this->label_list_ = Utility::ReadDict(label_path); this->label_list_ = Utility::ReadDict(label_path);
this->label_list_.insert(this->label_list_.begin(), this->label_list_.insert(this->label_list_.begin(),
...@@ -63,14 +63,13 @@ public: ...@@ -63,14 +63,13 @@ public:
Classifier *cls); Classifier *cls);
private: private:
std::shared_ptr<PaddlePredictor> predictor_; std::shared_ptr<Predictor> predictor_;
bool use_gpu_ = false; bool use_gpu_ = false;
int gpu_id_ = 0; int gpu_id_ = 0;
int gpu_mem_ = 4000; int gpu_mem_ = 4000;
int cpu_math_library_num_threads_ = 4; int cpu_math_library_num_threads_ = 4;
bool use_mkldnn_ = false; bool use_mkldnn_ = false;
bool use_zero_copy_run_ = false;
std::vector<std::string> label_list_; std::vector<std::string> label_list_;
......
...@@ -122,10 +122,10 @@ build/paddle_inference_install_dir/ ...@@ -122,10 +122,10 @@ build/paddle_inference_install_dir/
* After downloading, extract the archive as follows. * After downloading, extract the archive as follows.
``` ```
tar -xf fluid_inference.tgz tar -xf paddle_inference.tgz
``` ```
This produces a `fluid_inference/` subfolder in the current folder. This produces a `paddle_inference/` subfolder in the current folder.
## 2 Getting started ## 2 Getting started
...@@ -137,11 +137,11 @@ tar -xf fluid_inference.tgz ...@@ -137,11 +137,11 @@ tar -xf fluid_inference.tgz
``` ```
inference/ inference/
|-- det_db |-- det_db
| |--model | |--inference.pdiparams
| |--params | |--inference.pdmodel
|-- rec_rcnn |-- rec_rcnn
| |--model | |--inference.pdiparams
| |--params | |--inference.pdmodel
``` ```
...@@ -180,7 +180,7 @@ cmake .. \ ...@@ -180,7 +180,7 @@ cmake .. \
make -j make -j
``` ```
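`OPENCV_DIR` is the path where OpenCV was compiled and installed; `LIB_DIR` is the path of the downloaded (`fluid_inference` folder) or compiled (`build/fluid_inference_install_dir` folder) Paddle inference library; `CUDA_LIB_DIR` is the CUDA library path, which in docker is `/usr/local/cuda/lib64`; `CUDNN_LIB_DIR` is the cuDNN library path, which in docker is `/usr/lib/x86_64-linux-gnu/`. `OPENCV_DIR` is the path where OpenCV was compiled and installed; `LIB_DIR` is the path of the downloaded (`paddle_inference` folder) or compiled (`build/paddle_inference_install_dir` folder) Paddle inference library; `CUDA_LIB_DIR` is the CUDA library path, which in docker is `/usr/local/cuda/lib64`; `CUDNN_LIB_DIR` is the cuDNN library path, which in docker is `/usr/lib/x86_64-linux-gnu/`.
* After compilation, an executable file named `ocr_system` is generated in the `build` folder. * After compilation, an executable file named `ocr_system` is generated in the `build` folder.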
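...@@ -202,7 +202,6 @@ gpu_id 0 # GPU id, effective when using the GPU ...@@ -202,7 +202,6 @@ gpu_id 0 # GPU id, effective when using the GPU
gpu_mem 4000 # GPU memory requested gpu_mem 4000 # GPU memory requested
cpu_math_library_num_threads 10 # Number of threads for CPU inference; when the machine has enough cores, the larger the value, the faster the inference cpu_math_library_num_threads 10 # Number of threads for CPU inference; when the machine has enough cores, the larger the value, the faster the inference
use_mkldnn 1 # Whether to use the mkldnn library use_mkldnn 1 # Whether to use the mkldnn library
use_zero_copy_run 1 # Whether to use use_zero_copy_run for inference
# det config # det config
max_side_len 960 # When the image height or width exceeds 960, scale the image proportionally so that its longest side is 960 max_side_len 960 # When the image height or width exceeds 960, scale the image proportionally so that its longest side is 960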
......
...@@ -107,10 +107,10 @@ make inference_lib_dist ...@@ -107,10 +107,10 @@ make inference_lib_dist
For more compilation parameter options, please refer to the official website of the Paddle C++ inference library:[https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html). For more compilation parameter options, please refer to the official website of the Paddle C++ inference library:[https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html).
* After the compilation process, you can see the following files in the folder of `build/fluid_inference_install_dir/`. * After the compilation process, you can see the following files in the folder of `build/paddle_inference_install_dir/`.
``` ```
build/fluid_inference_install_dir/ build/paddle_inference_install_dir/
|-- CMakeCache.txt |-- CMakeCache.txt
|-- paddle |-- paddle
|-- third_party |-- third_party
...@@ -130,10 +130,10 @@ Among them, `paddle` is the Paddle library required for C++ prediction later, an ...@@ -130,10 +130,10 @@ Among them, `paddle` is the Paddle library required for C++ prediction later, an
* After downloading, use the following method to uncompress. * After downloading, use the following method to uncompress.
``` ```
tar -xf fluid_inference.tgz tar -xf paddle_inference.tgz
``` ```
Finally you can see the following files in the folder of `fluid_inference/`. Finally you can see the following files in the folder of `paddle_inference/`.
## 2. Compile and run the demo ## 2. Compile and run the demo
...@@ -145,11 +145,11 @@ Finally you can see the following files in the folder of `fluid_inference/`. ...@@ -145,11 +145,11 @@ Finally you can see the following files in the folder of `fluid_inference/`.
``` ```
inference/ inference/
|-- det_db |-- det_db
| |--model | |--inference.pdiparams
| |--params | |--inference.pdmodel
|-- rec_rcnn |-- rec_rcnn
| |--model | |--inference.pdiparams
| |--params | |--inference.pdmodel
``` ```
...@@ -188,7 +188,9 @@ cmake .. \ ...@@ -188,7 +188,9 @@ cmake .. \
make -j make -j
``` ```
`OPENCV_DIR` is the OpenCV installation path; `LIB_DIR` is the downloaded (`fluid_inference` folder) or compiled (`build/fluid_inference_install_dir` folder) Paddle inference library path; `CUDA_LIB_DIR` is the CUDA library path, which in docker is `/usr/local/cuda/lib64`; `CUDNN_LIB_DIR` is the cuDNN library path, which in docker is `/usr/lib/x86_64-linux-gnu/`. `OPENCV_DIR` is the OpenCV installation path; `LIB_DIR` is the downloaded (`paddle_inference` folder)
or compiled (`build/paddle_inference_install_dir` folder) Paddle inference library path;
`CUDA_LIB_DIR` is the CUDA library path, which in docker is `/usr/local/cuda/lib64`; `CUDNN_LIB_DIR` is the cuDNN library path, which in docker is `/usr/lib/x86_64-linux-gnu/`.
* After the compilation is completed, an executable file named `ocr_system` will be generated in the `build` folder. * After the compilation is completed, an executable file named `ocr_system` will be generated in the `build` folder.
...@@ -211,7 +213,6 @@ gpu_id 0 # GPU id when use_gpu is 1 ...@@ -211,7 +213,6 @@ gpu_id 0 # GPU id when use_gpu is 1
gpu_mem 4000 # GPU memory requested gpu_mem 4000 # GPU memory requested
cpu_math_library_num_threads 10 # Number of threads for CPU inference. When the machine has enough cores, the larger the value, the faster the inference cpu_math_library_num_threads 10 # Number of threads for CPU inference. When the machine has enough cores, the larger the value, the faster the inference
use_mkldnn 1 # Whether to use the mkldnn library use_mkldnn 1 # Whether to use the mkldnn library
use_zero_copy_run 1 # Whether to use use_zero_copy_run for inference
max_side_len 960 # Limit the maximum image height and width to 960 max_side_len 960 # Limit the maximum image height and width to 960
det_db_thresh 0.3 # Used to filter the binarized image of DB prediction, setting 0.-0.3 has no obvious effect on the result det_db_thresh 0.3 # Used to filter the binarized image of DB prediction, setting 0.-0.3 has no obvious effect on the result
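Review note on the config file above: each line is a `key value` pair followed by an optional `#` comment, and the C++ `OCRConfig` constructor converts the string values with `stoi`/`stod`. The following is a rough Python sketch of that format, not a port of `OCRConfig::LoadConfig`; the function names are hypothetical, and stripping everything after `#` is an assumption about how comments are meant to be treated.

```python
def load_config(text):
    """Parse 'key value  # comment' lines into a dict of strings."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue  # blank line or comment-only line such as '# det config'
        parts = line.split()
        if len(parts) < 2:
            continue  # key without a value; ignored in this sketch
        config[parts[0]] = parts[1]
    return config


def as_bool(config, key):
    """Mirror the C++ pattern bool(stoi(value)): '0' is False, '1' is True."""
    return bool(int(config[key]))


if __name__ == "__main__":
    sample = """
    use_gpu 0 # whether to use the GPU
    gpu_mem 4000 # GPU memory requested
    # det config
    max_side_len 960
    det_db_thresh 0.3
    """
    cfg = load_config(sample)
    print(as_bool(cfg, "use_gpu"), int(cfg["max_side_len"]), float(cfg["det_db_thresh"]))
```

Since this commit removes `use_zero_copy_run` from `OCRConfig`, a leftover `use_zero_copy_run` line in an existing config file would, by this reading of the diff, simply no longer be consulted.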
...@@ -244,4 +245,4 @@ The detection results will be shown on the screen, which is as follows. ...@@ -244,4 +245,4 @@ The detection results will be shown on the screen, which is as follows.
### 2.3 Notes ### 2.3 Notes
* Paddle2.0.0-beta0 inference model library is recommanded for this tuturial. * Paddle2.0.0-beta0 inference model library is recommended for this tutorial.
...@@ -16,7 +16,7 @@ ...@@ -16,7 +16,7 @@
namespace PaddleOCR { namespace PaddleOCR {
std::vector<std::string> Config::split(const std::string &str, std::vector<std::string> OCRConfig::split(const std::string &str,
const std::string &delim) { const std::string &delim) {
std::vector<std::string> res; std::vector<std::string> res;
if ("" == str) if ("" == str)
...@@ -38,7 +38,7 @@ std::vector<std::string> Config::split(const std::string &str, ...@@ -38,7 +38,7 @@ std::vector<std::string> Config::split(const std::string &str,
} }
std::map<std::string, std::string> std::map<std::string, std::string>
Config::LoadConfig(const std::string &config_path) { OCRConfig::LoadConfig(const std::string &config_path) {
auto config = Utility::ReadDict(config_path); auto config = Utility::ReadDict(config_path);
std::map<std::string, std::string> dict; std::map<std::string, std::string> dict;
...@@ -53,7 +53,7 @@ Config::LoadConfig(const std::string &config_path) { ...@@ -53,7 +53,7 @@ Config::LoadConfig(const std::string &config_path) {
return dict; return dict;
} }
void Config::PrintConfigInfo() { void OCRConfig::PrintConfigInfo() {
std::cout << "=======Paddle OCR inference config======" << std::endl; std::cout << "=======Paddle OCR inference config======" << std::endl;
for (auto iter = config_map_.begin(); iter != config_map_.end(); iter++) { for (auto iter = config_map_.begin(); iter != config_map_.end(); iter++) {
std::cout << iter->first << " : " << iter->second << std::endl; std::cout << iter->first << " : " << iter->second << std::endl;
......
...@@ -42,7 +42,7 @@ int main(int argc, char **argv) { ...@@ -42,7 +42,7 @@ int main(int argc, char **argv) {
exit(1); exit(1);
} }
Config config(argv[1]); OCRConfig config(argv[1]);
config.PrintConfigInfo(); config.PrintConfigInfo();
...@@ -50,37 +50,22 @@ int main(int argc, char **argv) { ...@@ -50,37 +50,22 @@ int main(int argc, char **argv) {
cv::Mat srcimg = cv::imread(img_path, cv::IMREAD_COLOR); cv::Mat srcimg = cv::imread(img_path, cv::IMREAD_COLOR);
DBDetector det( DBDetector det(config.det_model_dir, config.use_gpu, config.gpu_id,
config.det_model_dir, config.use_gpu, config.gpu_id, config.gpu_mem, config.gpu_mem, config.cpu_math_library_num_threads,
config.cpu_math_library_num_threads, config.use_mkldnn, config.use_mkldnn, config.max_side_len, config.det_db_thresh,
config.use_zero_copy_run, config.max_side_len, config.det_db_thresh, config.det_db_box_thresh, config.det_db_unclip_ratio,
config.det_db_box_thresh, config.det_db_unclip_ratio, config.visualize); config.visualize);
Classifier *cls = nullptr; Classifier *cls = nullptr;
if (config.use_angle_cls == true) { if (config.use_angle_cls == true) {
cls = new Classifier(config.cls_model_dir, config.use_gpu, config.gpu_id, cls = new Classifier(config.cls_model_dir, config.use_gpu, config.gpu_id,
config.gpu_mem, config.cpu_math_library_num_threads, config.gpu_mem, config.cpu_math_library_num_threads,
config.use_mkldnn, config.use_zero_copy_run, config.use_mkldnn, config.cls_thresh);
config.cls_thresh);
} }
CRNNRecognizer rec(config.rec_model_dir, config.use_gpu, config.gpu_id, CRNNRecognizer rec(config.rec_model_dir, config.use_gpu, config.gpu_id,
config.gpu_mem, config.cpu_math_library_num_threads, config.gpu_mem, config.cpu_math_library_num_threads,
config.use_mkldnn, config.use_zero_copy_run, config.use_mkldnn, config.char_list_file);
config.char_list_file);
#ifdef USE_MKL
#pragma omp parallel
for (auto i = 0; i < 10; i++) {
LOG_IF(WARNING,
config.cpu_math_library_num_threads != omp_get_num_threads())
<< "WARNING! MKL is running on " << omp_get_num_threads()
<< " threads while cpu_math_library_num_threads is set to "
<< config.cpu_math_library_num_threads
<< ". Possible reason could be 1. You have set omp_set_num_threads() "
"somewhere; 2. MKL is not linked properly";
}
#endif
auto start = std::chrono::system_clock::now(); auto start = std::chrono::system_clock::now();
std::vector<std::vector<std::vector<int>>> boxes; std::vector<std::vector<std::vector<int>>> boxes;
......
@@ -35,26 +35,16 @@ cv::Mat Classifier::Run(cv::Mat &img) {
   this->permute_op_.Run(&resize_img, input.data());
   // Inference.
-  if (this->use_zero_copy_run_) {
-    auto input_names = this->predictor_->GetInputNames();
-    auto input_t = this->predictor_->GetInputTensor(input_names[0]);
-    input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
-    input_t->copy_from_cpu(input.data());
-    this->predictor_->ZeroCopyRun();
-  } else {
-    paddle::PaddleTensor input_t;
-    input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
-    input_t.data =
-        paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
-    input_t.dtype = PaddleDType::FLOAT32;
-    std::vector<paddle::PaddleTensor> outputs;
-    this->predictor_->Run({input_t}, &outputs, 1);
-  }
+  auto input_names = this->predictor_->GetInputNames();
+  auto input_t = this->predictor_->GetInputHandle(input_names[0]);
+  input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
+  input_t->CopyFromCpu(input.data());
+  this->predictor_->Run();
   std::vector<float> softmax_out;
   std::vector<int64_t> label_out;
   auto output_names = this->predictor_->GetOutputNames();
-  auto softmax_out_t = this->predictor_->GetOutputTensor(output_names[0]);
+  auto softmax_out_t = this->predictor_->GetOutputHandle(output_names[0]);
   auto softmax_shape_out = softmax_out_t->shape();
   int softmax_out_num =
@@ -63,7 +53,7 @@ cv::Mat Classifier::Run(cv::Mat &img) {
   softmax_out.resize(softmax_out_num);
-  softmax_out_t->copy_to_cpu(softmax_out.data());
+  softmax_out_t->CopyToCpu(softmax_out.data());
   float score = 0;
   int label = 0;
@@ -95,7 +85,7 @@ void Classifier::LoadModel(const std::string &model_dir) {
   }
   // false for zero copy tensor
-  config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
+  config.SwitchUseFeedFetchOps(false);
   // true for multiple input
   config.SwitchSpecifyInputNames(true);
@@ -104,6 +94,6 @@ void Classifier::LoadModel(const std::string &model_dir) {
   config.EnableMemoryOptim();
   config.DisableGlogInfo();
-  this->predictor_ = CreatePaddlePredictor(config);
+  this->predictor_ = CreatePredictor(config);
 }
 } // namespace PaddleOCR
@@ -17,12 +17,17 @@
 namespace PaddleOCR {
 void DBDetector::LoadModel(const std::string &model_dir) {
-  AnalysisConfig config;
+  // AnalysisConfig config;
+  paddle_infer::Config config;
   config.SetModel(model_dir + "/inference.pdmodel",
                   model_dir + "/inference.pdiparams");
   if (this->use_gpu_) {
     config.EnableUseGpu(this->gpu_mem_, this->gpu_id_);
+    // config.EnableTensorRtEngine(
+    //     1 << 20, 1, 3,
+    //     AnalysisConfig::Precision::kFloat32,
+    //     false, false);
   } else {
     config.DisableGpu();
     if (this->use_mkldnn_) {
@@ -32,10 +37,8 @@ void DBDetector::LoadModel(const std::string &model_dir) {
     }
     config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
   }
-  // false for zero copy tensor
-  // true for commom tensor
-  config.SwitchUseFeedFetchOps(!this->use_zero_copy_run_);
+  // use zero_copy_run as default
+  config.SwitchUseFeedFetchOps(false);
   // true for multiple input
   config.SwitchSpecifyInputNames(true);
@@ -44,7 +47,7 @@ void DBDetector::LoadModel(const std::string &model_dir) {
   config.EnableMemoryOptim();
   config.DisableGlogInfo();
-  this->predictor_ = CreatePaddlePredictor(config);
+  this->predictor_ = CreatePredictor(config);
 }
 void DBDetector::Run(cv::Mat &img,
@@ -64,31 +67,21 @@ void DBDetector::Run(cv::Mat &img,
   this->permute_op_.Run(&resize_img, input.data());
   // Inference.
-  if (this->use_zero_copy_run_) {
-    auto input_names = this->predictor_->GetInputNames();
-    auto input_t = this->predictor_->GetInputTensor(input_names[0]);
-    input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
-    input_t->copy_from_cpu(input.data());
-    this->predictor_->ZeroCopyRun();
-  } else {
-    paddle::PaddleTensor input_t;
-    input_t.shape = {1, 3, resize_img.rows, resize_img.cols};
-    input_t.data =
-        paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
-    input_t.dtype = PaddleDType::FLOAT32;
-    std::vector<paddle::PaddleTensor> outputs;
-    this->predictor_->Run({input_t}, &outputs, 1);
-  }
+  auto input_names = this->predictor_->GetInputNames();
+  auto input_t = this->predictor_->GetInputHandle(input_names[0]);
+  input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
+  input_t->CopyFromCpu(input.data());
+  this->predictor_->Run();
   std::vector<float> out_data;
   auto output_names = this->predictor_->GetOutputNames();
-  auto output_t = this->predictor_->GetOutputTensor(output_names[0]);
+  auto output_t = this->predictor_->GetOutputHandle(output_names[0]);
   std::vector<int> output_shape = output_t->shape();
   int out_num = std::accumulate(output_shape.begin(), output_shape.end(), 1,
                                 std::multiplies<int>());
   out_data.resize(out_num);
-  output_t->copy_to_cpu(out_data.data());
+  output_t->CopyToCpu(out_data.data());
   int n2 = output_shape[2];
   int n3 = output_shape[3];
......