Commit df001f3c authored by Leif

Merge remote-tracking branch 'origin/dygraph' into dygraph

parents 9cce1213 bdca6cd7
include LICENSE
include README.md
recursive-include ppocr/utils *.txt utility.py logging.py network.py
recursive-include ppocr/utils *.*
recursive-include ppocr/data *.py
recursive-include ppocr/postprocess *.py
recursive-include tools/infer *.py
......
......@@ -11,8 +11,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddleocr
from .paddleocr import *
__version__ = paddleocr.VERSION
__all__ = ['PaddleOCR', 'PPStructure', 'draw_ocr', 'draw_structure_result', 'save_structure_res','download_with_progressbar']
__all__ = [
'PaddleOCR', 'PPStructure', 'draw_ocr', 'draw_structure_result',
'save_structure_res', 'download_with_progressbar'
]
......@@ -8,7 +8,7 @@ Global:
# evaluation is run every 2000 iterations after the 3000th iteration
eval_batch_step: [3000, 2000]
cal_metric_during_train: False
pretrained_model: ./pretrain_models/ch_PP-OCRv2_det_distill_train/best_accuracy
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: False
......
- [Visual Studio 2019 Community CMake Build Guide](#visual-studio-2019-community-cmake-build-guide)
- [1. Environment Preparation](#1-environment-preparation)
- [1.1 Install the Required Environment](#11-install-the-required-environment)
- [1.2 Download the PaddlePaddle C++ Inference Library and OpenCV](#12-download-the-paddlepaddle-c-inference-library-and-opencv)
- [1.2.1 Download the PaddlePaddle C++ Inference Library](#121-download-the-paddlepaddle-c-inference-library)
- [1.2.2 Install and Configure OpenCV](#122-install-and-configure-opencv)
- [1.2.3 Download the PaddleOCR Code](#123-download-the-paddleocr-code)
- [2. Getting Started](#2-getting-started)
- [Step1: Build the Visual Studio Project](#step1-build-the-visual-studio-project)
- [Step2: Run the CMake Configuration](#step2-run-the-cmake-configuration)
- [Step3: Generate the Visual Studio Project](#step3-generate-the-visual-studio-project)
- [Step4: Inference](#step4-inference)
- [FAQ](#faq)
# Visual Studio 2019 Community CMake Build Guide
PaddleOCR has been tested on the Windows platform with `Visual Studio 2019 Community`. Microsoft has supported managing `CMake` cross-platform build projects directly since `Visual Studio 2017`, but stable and complete support only arrived in `2019`, so if you want to use CMake to manage your project build, we recommend doing so under `Visual Studio 2019`.
**All examples below use `D:\projects\cpp` as the working directory.**
## 1. Environment Preparation
### 1.1 Install the Required Environment
## Prerequisites
* Visual Studio 2019
* CUDA 10.2, cuDNN 7+ (required only when using the GPU version of the inference library)
* CMake 3.0+
* CMake 3.22+
Make sure the basic software above is installed on your system; we use the Community edition of `VS2019`.
**All examples below use `D:\projects` as the working directory.**
### 1.2 Download the PaddlePaddle C++ Inference Library and OpenCV
### Step1: Download the PaddlePaddle C++ inference library paddle_inference
#### 1.2.1 Download the PaddlePaddle C++ Inference Library
The PaddlePaddle C++ inference library provides different precompiled builds for different `CPU` and `CUDA` versions; please download the one that matches your setup: [C++ inference library download list](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#windows)
......@@ -26,87 +43,94 @@ paddle_inference
└── version.txt # version and build information
```
### Step2: Install and configure OpenCV
#### 1.2.2 Install and Configure OpenCV
1. Download OpenCV 3.4.6 for the Windows platform from the official OpenCV site, [download link](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. Run the downloaded executable and extract OpenCV to a directory of your choice, e.g. `D:\projects\opencv`
3. Configure the environment variables as follows
- My Computer -> Properties -> Advanced system settings -> Environment Variables
- Find Path in the system variables (create it if missing) and double-click to edit it
- Add a new entry with the OpenCV path, e.g. `D:\projects\opencv\build\x64\vc14\bin`, and save
1. Download OpenCV for the Windows platform from the official OpenCV site, [download link](https://github.com/opencv/opencv/releases)
2. Run the downloaded executable and extract OpenCV to a directory of your choice, e.g. `D:\projects\cpp\opencv`
### Step3: Build the CMake project directly with Visual Studio 2019
#### 1.2.3 Download the PaddleOCR Code
```bash
git clone -b dygraph https://github.com/PaddlePaddle/PaddleOCR
```
1. Open Visual Studio 2019 Community and click `Continue without code`
![step2](https://paddleseg.bj.bcebos.com/inference/vs2019_step1.png)
2. Click: `File` -> `Open` -> `CMake`
![step2.1](https://paddleseg.bj.bcebos.com/inference/vs2019_step2.png)
## 2. Getting Started
Select the path containing the project code and open `CMakeLists.txt`
### Step1: Build the Visual Studio Project
After CMake is installed, the system provides a cmake-gui program. Open cmake-gui, enter the source code path in the first input box and the build output path in the second.
![step2.2](https://paddleseg.bj.bcebos.com/inference/vs2019_step3.png)
![step1](imgs/cmake_step1.png)
3. Click: `Project` -> `CMake Settings`
### Step2: Run the CMake Configuration
Click the `Configure` button at the bottom of the window. The first click opens a dialog for Visual Studio configuration, as shown below; select your Visual Studio version, choose x64 as the target platform, then click the `finish` button to start the configuration automatically.
![step3](https://paddleseg.bj.bcebos.com/inference/vs2019_step4.png)
![step2](imgs/cmake_step2.png)
4. Set the build options to specify the paths of `CUDA`, `CUDNN_LIB`, `OpenCV`, and the `Paddle inference library`
The first run will report an error; this is normal. Next, configure OpenCV and the inference library.
The meanings of the build parameters are explained below (a `*` prefix means the parameter is only needed when using the **GPU version** of the inference library; keep the CUDA library versions aligned as closely as possible):
* For the CPU version, only the three parameters OPENCV_DIR, OpenCV_DIR, and PADDLE_LIB need to be set
| Parameter | Meaning |
| ---- | ---- |
| *CUDA_LIB | CUDA library path |
| *CUDNN_LIB | cuDNN library path |
| OPENCV_DIR | OpenCV installation path |
| PADDLE_LIB | Paddle inference library path |
- OPENCV_DIR: fill in the location of the OpenCV `lib` folder
- OpenCV_DIR: likewise, fill in the location of the OpenCV `lib` folder
- PADDLE_LIB: the location of the `paddle_inference` folder
**Note:**
1. If you use the `CPU` version of the inference library, uncheck `WITH_GPU`
2. If you use the `openblas` version, uncheck `WITH_MKL`
* For the GPU version, the following variables must be filled in on top of the CPU settings:
CUDA_LIB, CUDNN_LIB, TENSORRT_DIR, WITH_GPU, WITH_TENSORRT
![step4](https://paddleseg.bj.bcebos.com/inference/vs2019_step5.png)
- CUDA_LIB: the CUDA path, e.g. `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\lib\x64`
- CUDNN_LIB: the same as CUDA_LIB
- TENSORRT_DIR: the location where the TensorRT download was extracted
- WITH_GPU: checked
- WITH_TENSORRT: checked
A screenshot of the finished configuration is shown below
![step3](imgs/cmake_step3.png)
After the configuration is complete, click the `Configure` button again.
An example configuration with GPU enabled is shown below:
![step5](./vs2019_build_withgpu_config.png)
**Note:**
The CMAKE_BACKWARDS version must be set according to the CMake version installed on the platform.
1. If you use the `openblas` version, uncheck `WITH_MKL`
2. If you hit the error `unable to access 'https://github.com/LDOUBLEV/AutoLog.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.`, change the GitHub address in `deploy/cpp_infer/external-cmake/auto-log.cmake` to https://gitee.com/Double_V/AutoLog.
### Step3: Generate the Visual Studio Project
**After the settings are complete**, click `Save and generate CMake cache to load variables` in the figure above
Click the `Generate` button to generate the sln file of the Visual Studio project.
![step4](imgs/cmake_step4.png)
5. Click `Build` -> `Build All`
Click the `Open Project` button to open the project in Visual Studio. A screenshot after opening is shown below
![step6](https://paddleseg.bj.bcebos.com/inference/vs2019_step6.png)
![step5](imgs/vs_step1.png)
Before building the solution, perform the following steps:
1. Change `Debug` to `Release`
2. Download [dirent.h](https://paddleocr.bj.bcebos.com/deploy/cpp_infer/cpp_files/dirent.h) and copy it into the Visual Studio include folder, e.g. `C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\VS\include` (see the sketch below for why this header is needed)
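For context, `dirent.h` supplies the POSIX directory-listing API (`opendir`/`readdir`/`closedir`) that the project's `utility.cpp` relies on but that MSVC does not ship. A minimal illustrative sketch of the kind of loop this header enables (a hypothetical helper, not the exact PaddleOCR code):
```cpp
#include <dirent.h>
#include <string>
#include <vector>

// Collect all entries in dir_path except "." and "..". This is the
// POSIX-style listing that the bundled dirent.h makes available under MSVC.
// (Illustrative sketch only.)
static std::vector<std::string> ListDir(const std::string &dir_path) {
  std::vector<std::string> names;
  DIR *dir = opendir(dir_path.c_str());
  if (dir == nullptr)
    return names;
  struct dirent *entry;
  while ((entry = readdir(dir)) != nullptr) {
    std::string name = entry->d_name;
    if (name != "." && name != "..")
      names.push_back(dir_path + "/" + name);
  }
  closedir(dir);
  return names;
}
```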
Click `Build -> Build Solution`, and the `ppocr.exe` file will appear in the `build/Release/` folder.
Before running, copy the following files into the `build/Release/` folder:
1. `paddle_inference/paddle/lib/paddle_inference.dll`
2. `opencv/build/x64/vc15/bin/opencv_world455.dll`
### Step4: Inference
The executable produced by the `Visual Studio 2019` build above is in the `out\build\x64-Release\Release` directory. Open `cmd` and switch to `D:\projects\PaddleOCR\deploy\cpp_infer\`:
The executable produced by the `Visual Studio 2019` build above is in the `out\build\x64-Release\Release` directory. Open `cmd` and switch to `D:\projects\cpp\PaddleOCR\deploy\cpp_infer\`:
```
cd D:\projects\PaddleOCR\deploy\cpp_infer
cd /d D:\projects\cpp\PaddleOCR\deploy\cpp_infer
```
The executable `ppocr.exe` is the sample inference program. Its main usage is shown below; for more usage, see the `Run the demo` section of the [documentation](../readme.md).
```shell
# Recognize Chinese images in `D:\projects\PaddleOCR\doc\imgs_words\ch\`
.\out\build\x64-Release\Release\ppocr.exe rec --rec_model_dir=D:\projects\PaddleOCR\ch_ppocr_mobile_v2.0_rec_infer --image_dir=D:\projects\PaddleOCR\doc\imgs_words\ch\
# Recognize English images in 'D:\projects\PaddleOCR\doc\imgs_words\en\'
.\out\build\x64-Release\Release\ppocr.exe rec --rec_model_dir=D:\projects\PaddleOCR\inference\rec_mv3crnn --image_dir=D:\projects\PaddleOCR\doc\imgs_words\en\ --char_list_file=D:\projects\PaddleOCR\ppocr\utils\dict\en_dict.txt
# Switch the terminal encoding to UTF-8
CHCP 65001
# Run inference
.\build\Release\ppocr.exe system --det_model_dir=D:\projects\cpp\ch_PP-OCRv2_det_slim_quant_infer --rec_model_dir=D:\projects\cpp\ch_PP-OCRv2_rec_slim_quant_infer --image_dir=D:\projects\cpp\PaddleOCR\doc\imgs\11.jpg
```
The recognition results are as follows
![result](imgs/result.png)
The first parameter is the configuration file path, the second is the path of the image to predict, and the third is the dictionary used for text recognition.
### FAQ
* When running the exe in a Windows terminal, garbled output may appear. Enter `CHCP 65001` in the terminal to switch its encoding from GBK (the default) to UTF-8; a more detailed explanation can be found in this blog post: [https://blog.csdn.net/qq_35038153/article/details/78430359](https://blog.csdn.net/qq_35038153/article/details/78430359).
* If the build fails with `error C1083: Cannot open include file: "dirent.h": No such file or directory`, refer to this [document](https://blog.csdn.net/Dora_blank/article/details/117740837#41_C1083_direnthNo_such_file_or_directory_54): create a `dirent.h` file and add it to the header includes of `utility.cpp`; also change `lstat` to `stat` on line 70 of `utility.cpp`.
* If the build fails with `Autolog is undefined`, create an `autolog.h` file with the contents of [autolog.h](https://github.com/LDOUBLEV/AutoLog/blob/main/auto_log/autolog.h), add it to the header includes of `main.cpp`, and build again.
* If at runtime a dialog reports that `paddle_inference.dll` or `openblas.dll` cannot be found, locate the two files in the `D:\projects\paddle_inference` inference library and copy them into the `D:\projects\PaddleOCR\deploy\cpp_infer\out\build\x64-Release\Release` directory. No rebuild is needed; just run again.
## FAQ
* If at runtime a dialog reports `The application was unable to start correctly (0xc0000142)` and the `cmd` window shows `You are using Paddle compiled with TensorRT, but TensorRT dynamic library is not found.`, copy all the dll files from the `lib` folder of the TensorRT directory into the release directory and run again.
......@@ -6,6 +6,7 @@ set(FETCHCONTENT_BASE_DIR "${CMAKE_CURRENT_BINARY_DIR}/third-party")
FetchContent_Declare(
extern_Autolog
PREFIX autolog
# If you don't have access to github, replace it with https://gitee.com/Double_V/AutoLog
GIT_REPOSITORY https://github.com/LDOUBLEV/AutoLog.git
GIT_TAG main
)
......
......@@ -46,8 +46,7 @@ public:
const double &det_db_box_thresh,
const double &det_db_unclip_ratio,
const bool &use_polygon_score, const bool &use_dilation,
const bool &visualize, const bool &use_tensorrt,
const std::string &precision) {
const bool &use_tensorrt, const std::string &precision) {
this->use_gpu_ = use_gpu;
this->gpu_id_ = gpu_id;
this->gpu_mem_ = gpu_mem;
......@@ -62,7 +61,6 @@ public:
this->use_polygon_score_ = use_polygon_score;
this->use_dilation_ = use_dilation;
this->visualize_ = visualize;
this->use_tensorrt_ = use_tensorrt;
this->precision_ = precision;
......
......@@ -44,7 +44,8 @@ public:
const int &gpu_id, const int &gpu_mem,
const int &cpu_math_library_num_threads,
const bool &use_mkldnn, const string &label_path,
const bool &use_tensorrt, const std::string &precision,
const bool &use_tensorrt,
const std::string &precision,
const int &rec_batch_num) {
this->use_gpu_ = use_gpu;
this->gpu_id_ = gpu_id;
......@@ -66,7 +67,8 @@ public:
// Load Paddle inference model
void LoadModel(const std::string &model_dir);
void Run(std::vector<cv::Mat> img_list, std::vector<double> *times);
void Run(std::vector<cv::Mat> img_list, std::vector<std::string> &rec_texts,
std::vector<float> &rec_text_scores, std::vector<double> *times);
private:
std::shared_ptr<Predictor> predictor_;
......
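The `Run` signature above now returns recognition results through output parameters instead of printing inside the recognizer. A minimal caller sketch, assuming a `CRNNRecognizer rec` constructed as in `main.cpp` and an `img_list` of cropped text images:
```cpp
#include <iostream>
#include <string>
#include <vector>
#include <opencv2/core.hpp>
#include <include/ocr_rec.h>

// Sketch: run batch recognition and print one line per input image.
// Results come back through rec_texts / rec_text_scores.
void RunRecognition(PaddleOCR::CRNNRecognizer &rec,
                    const std::vector<cv::Mat> &img_list) {
  std::vector<std::string> rec_texts(img_list.size(), "");
  std::vector<float> rec_text_scores(img_list.size(), 0.f);
  std::vector<double> rec_times;
  rec.Run(img_list, rec_texts, rec_text_scores, &rec_times);
  for (size_t i = 0; i < rec_texts.size(); i++)
    std::cout << i << "\t" << rec_texts[i] << "\t" << rec_text_scores[i]
              << std::endl;
}
```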
......@@ -38,7 +38,8 @@ public:
static void
VisualizeBboxes(const cv::Mat &srcimg,
const std::vector<std::vector<std::vector<int>>> &boxes);
const std::vector<std::vector<std::vector<int>>> &boxes,
const std::string &save_path);
template <class ForwardIterator>
inline static size_t argmax(ForwardIterator first, ForwardIterator last) {
......@@ -51,8 +52,9 @@ public:
static cv::Mat GetRotateCropImage(const cv::Mat &srcimage,
std::vector<std::vector<int>> box);
static std::vector<int> argsort(const std::vector<float>& array);
static std::vector<int> argsort(const std::vector<float> &array);
static std::string basename(const std::string &filename);
};
} // namespace PaddleOCR
\ No newline at end of file
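The `utility.h` diff above adds a `save_path` parameter to `VisualizeBboxes` and a new `basename` helper, so each visualization can be written under the output directory with the input image's own file name. A hedged sketch of plausible implementations, shown as free functions for brevity (the real code lives in `utility.cpp` and may differ):
```cpp
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <string>
#include <vector>

namespace PaddleOCR {

// Sketch: draw each detected box as a closed green polyline and save the
// visualization to save_path (instead of a hard-coded ./ocr_vis.png).
void VisualizeBboxes(const cv::Mat &srcimg,
                     const std::vector<std::vector<std::vector<int>>> &boxes,
                     const std::string &save_path) {
  cv::Mat img_vis = srcimg.clone();
  for (const auto &box : boxes) {
    std::vector<std::vector<cv::Point>> poly(1);
    for (const auto &pt : box)
      poly[0].emplace_back(pt[0], pt[1]);
    cv::polylines(img_vis, poly, /*isClosed=*/true, cv::Scalar(0, 255, 0), 2);
  }
  cv::imwrite(save_path, img_vis);
}

// Sketch: strip the directory prefix so each input image can be written
// under the output directory with the same file name.
std::string basename(const std::string &filename) {
  size_t pos = filename.find_last_of("/\\");
  return pos == std::string::npos ? filename : filename.substr(pos + 1);
}

} // namespace PaddleOCR
```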
# Server-side C++ Inference
This chapter introduces the C++ deployment method of the PaddleOCR model; for the corresponding Python inference deployment, refer to the [documentation](../../doc/doc_ch/inference.md).
C++ outperforms Python in computational performance, so C++ deployment is preferred in most CPU and GPU deployment scenarios. This section describes how to set up the C++ environment and deploy PaddleOCR models on Linux (CPU/GPU) and Windows.
- [Server-side C++ Inference](#server-side-c-inference)
- [1. Prepare the Environment](#1-prepare-the-environment)
- [1.0 Running Preparation](#10-running-preparation)
......@@ -18,6 +12,14 @@ PaddleOCR模型部署。
- [1. Detection only:](#1-detection-only)
- [2. Recognition only:](#2-recognition-only)
- [3. End-to-end pipeline:](#3-end-to-end-pipeline)
- [3. FAQ](#3-faq)
# Server-side C++ Inference
This chapter introduces the C++ deployment method of the PaddleOCR model; for the corresponding Python inference deployment, refer to the [documentation](../../doc/doc_ch/inference.md).
C++ outperforms Python in computational performance, so C++ deployment is preferred in most CPU and GPU deployment scenarios. This section describes how to set up the C++ environment and deploy PaddleOCR models on Linux (CPU/GPU) and Windows.
<a name="1"></a>
......@@ -28,7 +30,7 @@ PaddleOCR模型部署。
### 1.0 Running Preparation
- Linux environment; docker is recommended.
- Windows environment; building with `Visual Studio 2019 Community` is currently supported
- Windows environment.
* This document mainly covers the PaddleOCR C++ inference workflow on Linux. If you need to run C++ inference on Windows against the inference library, see the [Windows build tutorial](./docs/windows_vs2019_build.md) for the build steps.
......@@ -254,6 +256,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
|gpu_mem|int|4000|GPU memory requested|
|cpu_math_library_num_threads|int|10|Number of threads for CPU inference; when the machine has enough cores, the larger the value, the faster the inference|
|enable_mkldnn|bool|true|Whether to use the mkldnn library|
|output|str|./output|Path where visualization results are saved|
- Detection model related
......@@ -265,7 +268,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
|det_db_box_thresh|float|0.5|DB post-processing threshold for filtering boxes; if boxes are missed, it can be reduced as appropriate|
|det_db_unclip_ratio|float|1.6|Compactness of the text box; the smaller the value, the closer the box hugs the text|
|use_polygon_score|bool|false|Whether to use a polygon box to compute the bbox score; false means a rectangle is used. Rectangles are faster to compute; polygons are more accurate for curved text regions.|
|visualize|bool|true|Whether to visualize the results; when set to 1, the prediction result is saved as `ocr_vis.png` in the current folder.|
|visualize|bool|true|Whether to visualize the results; when set to 1, the prediction result is saved in the folder specified by the `output` field, as an image with the same name as the input image.|
- Orientation classifier related
......@@ -280,10 +283,10 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
|Parameter|Type|Default|Meaning|
| :---: | :---: | :---: | :---: |
|rec_model_dir|string|-|Path of the recognition inference model|
|char_list_file|string|../../ppocr/utils/ppocr_keys_v1.txt|dictionary file|
|rec_char_dict_path|string|../../ppocr/utils/ppocr_keys_v1.txt|dictionary file|
* PaddleOCR also supports multilingual inference; for more supported languages and models, see the multilingual dictionaries and models section of the [recognition documentation](../../doc/doc_ch/recognition.md). For multilingual inference, you only need to modify the `char_list_file` (dictionary file path) and `rec_model_dir` (inference model path) fields.
* PaddleOCR also supports multilingual inference; for more supported languages and models, see the multilingual dictionaries and models section of the [recognition documentation](../../doc/doc_ch/recognition.md). For multilingual inference, you only need to modify the `rec_char_dict_path` (dictionary file path) and `rec_model_dir` (inference model path) fields.
Finally, the detection results will be printed on the screen as follows.
......@@ -291,5 +294,6 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
<img src="./imgs/cpp_infer_pred_12.png" width="600">
</div>
## 3. FAQ
**Note: when using the Paddle inference library, version 2.0.0 is recommended.**
1. If you hit the error `unable to access 'https://github.com/LDOUBLEV/AutoLog.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.`, change the GitHub address in `deploy/cpp_infer/external-cmake/auto-log.cmake` to https://gitee.com/Double_V/AutoLog.
- [Server-side C++ Inference](#server-side-c-inference)
- [1. Prepare the Environment](#1-prepare-the-environment)
- [Environment](#environment)
- [1.1 Compile OpenCV](#11-compile-opencv)
- [1.2 Compile or Download the Paddle Inference Library](#12-compile-or-download-or-the-paddle-inference-library)
- [1.2.1 Direct download and installation](#121-direct-download-and-installation)
- [1.2.2 Compile the inference source code](#122-compile-the-inference-source-code)
- [2. Compile and Run the Demo](#2-compile-and-run-the-demo)
- [2.1 Export the inference model](#21-export-the-inference-model)
- [2.2 Compile PaddleOCR C++ inference demo](#22-compile-paddleocr-c-inference-demo)
- [Run the demo](#run-the-demo)
- [1. run det demo:](#1-run-det-demo)
- [2. run rec demo:](#2-run-rec-demo)
- [3. run system demo:](#3-run-system-demo)
- [3. FAQ](#3-faq)
# Server-side C++ Inference
This chapter introduces the C++ deployment steps of the PaddleOCR model. For the corresponding Python inference deployment, refer to the [document](../../doc/doc_ch/inference.md).
......@@ -10,6 +26,7 @@ This section will introduce how to configure the C++ environment and deploy Padd
### Environment
- Linux, docker is recommended.
- Windows.
### 1.1 Compile OpenCV
......@@ -232,6 +249,7 @@ More parameters are as follows,
|gpu_mem|int|4000|GPU memory requested|
|cpu_math_library_num_threads|int|10|Number of threads for CPU inference; when the machine has enough cores, the larger the value, the faster the inference|
|enable_mkldnn|bool|true|Whether to use the mkldnn library|
|output|str|./output|Path where visualization results are saved|
- Detection related parameters
......@@ -243,7 +261,7 @@ More parameters are as follows,
|det_db_box_thresh|float|0.5|DB post-processing filter box threshold, if there is a missing box detected, it can be reduced as appropriate|
|det_db_unclip_ratio|float|1.6|Indicates the compactness of the text box, the smaller the value, the closer the text box to the text|
|use_polygon_score|bool|false|Whether to use polygon box to calculate bbox score, false means to use rectangle box to calculate. Use rectangular box to calculate faster, and polygonal box more accurate for curved text area.|
|visualize|bool|true|Whether to visualize the results. When set to true, the prediction result will be saved in the image file `./ocr_vis.png`.|
|visualize|bool|true|Whether to visualize the results. When set to true, the prediction results will be saved in the folder specified by the `output` field, as an image with the same name as the input image.|
- Classifier related parameters
......@@ -258,9 +276,9 @@ More parameters are as follows,
|parameter|data type|default|meaning|
| --- | --- | --- | --- |
|rec_model_dir|string|-|Address of recognition inference model|
|char_list_file|string|../../ppocr/utils/ppocr_keys_v1.txt|dictionary file|
|rec_char_dict_path|string|../../ppocr/utils/ppocr_keys_v1.txt|dictionary file|
* Multi-language inference is also supported in PaddleOCR. You can refer to the [recognition tutorial](../../doc/doc_en/recognition_en.md) for more supported languages and models. Specifically, if you want to infer using multi-language models, you just need to modify the values of `char_list_file` and `rec_model_dir`.
* Multi-language inference is also supported in PaddleOCR. You can refer to the [recognition tutorial](../../doc/doc_en/recognition_en.md) for more supported languages and models. Specifically, if you want to infer using multi-language models, you just need to modify the values of `rec_char_dict_path` and `rec_model_dir`.
The detection results will be shown on the screen as follows.
......@@ -270,6 +288,6 @@ The detection results will be shown on the screen, which is as follows.
</div>
### 2.3 Notes
## 3. FAQ
* The Paddle 2.0.0 inference library is recommended for this tutorial.
1. If you encounter the error `unable to access 'https://github.com/LDOUBLEV/AutoLog.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.`, change the GitHub address in `deploy/cpp_infer/external-cmake/auto-log.cmake` to https://gitee.com/Double_V/AutoLog.
......@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "glog/logging.h"
#include "omp.h"
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
......@@ -21,13 +20,13 @@
#include <iomanip>
#include <iostream>
#include <ostream>
#include <sys/stat.h>
#include <vector>
#include <cstring>
#include <fstream>
#include <numeric>
#include <glog/logging.h>
#include <include/ocr_cls.h>
#include <include/ocr_det.h>
#include <include/ocr_rec.h>
......@@ -45,7 +44,7 @@ DEFINE_bool(enable_mkldnn, false, "Whether use mkldnn with CPU.");
DEFINE_bool(use_tensorrt, false, "Whether use tensorrt.");
DEFINE_string(precision, "fp32", "Precision be one of fp32/fp16/int8");
DEFINE_bool(benchmark, false, "Whether use benchmark.");
DEFINE_string(save_log_path, "./log_output/", "Save benchmark log path.");
DEFINE_string(output, "./output/", "Path where visualization results are saved.");
// detection related
DEFINE_string(image_dir, "", "Dir of input image.");
DEFINE_string(det_model_dir, "", "Path of det inference model.");
......@@ -63,7 +62,7 @@ DEFINE_double(cls_thresh, 0.9, "Threshold of cls_thresh.");
// recognition related
DEFINE_string(rec_model_dir, "", "Path of rec inference model.");
DEFINE_int32(rec_batch_num, 6, "rec_batch_num.");
DEFINE_string(char_list_file, "../../ppocr/utils/ppocr_keys_v1.txt",
DEFINE_string(rec_char_dict_path, "../../ppocr/utils/ppocr_keys_v1.txt",
"Path of dictionary.");
using namespace std;
......@@ -86,11 +85,17 @@ int main_det(std::vector<cv::String> cv_all_img_names) {
FLAGS_gpu_mem, FLAGS_cpu_threads, FLAGS_enable_mkldnn,
FLAGS_max_side_len, FLAGS_det_db_thresh,
FLAGS_det_db_box_thresh, FLAGS_det_db_unclip_ratio,
FLAGS_use_polygon_score, FLAGS_use_dilation, FLAGS_visualize,
FLAGS_use_polygon_score, FLAGS_use_dilation,
FLAGS_use_tensorrt, FLAGS_precision);
if (!PathExists(FLAGS_output)) {
mkdir(FLAGS_output.c_str(), 0777);
}
for (int i = 0; i < cv_all_img_names.size(); ++i) {
// LOG(INFO) << "The predict img: " << cv_all_img_names[i];
if (!FLAGS_benchmark) {
cout << "The predict img: " << cv_all_img_names[i] << endl;
}
cv::Mat srcimg = cv::imread(cv_all_img_names[i], cv::IMREAD_COLOR);
if (!srcimg.data) {
......@@ -102,7 +107,11 @@ int main_det(std::vector<cv::String> cv_all_img_names) {
std::vector<double> det_times;
det.Run(srcimg, boxes, &det_times);
// visualization
if (FLAGS_visualize) {
std::string file_name = Utility::basename(cv_all_img_names[i]);
Utility::VisualizeBboxes(srcimg, boxes, FLAGS_output + "/" + file_name);
}
time_info[0] += det_times[0];
time_info[1] += det_times[1];
time_info[2] += det_times[2];
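Note that `mkdir(FLAGS_output.c_str(), 0777)` in the hunk above is the POSIX call from `<sys/stat.h>`. A portable alternative for the same output-directory guard, sketched here with C++17 `<filesystem>` and a hypothetical helper name:
```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Create the output directory (and any missing parents) if it does not
// exist yet; returns true when the directory is usable afterwards.
// C++17 sketch only, not the committed code.
static bool EnsureOutputDir(const std::string &path) {
  std::error_code ec;
  std::filesystem::create_directories(path, ec);  // no-op if already present
  return !ec && std::filesystem::is_directory(path);
}
```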
......@@ -130,20 +139,18 @@ int main_det(std::vector<cv::String> cv_all_img_names) {
int main_rec(std::vector<cv::String> cv_all_img_names) {
std::vector<double> time_info = {0, 0, 0};
std::string char_list_file = FLAGS_char_list_file;
std::string rec_char_dict_path = FLAGS_rec_char_dict_path;
if (FLAGS_benchmark)
char_list_file = FLAGS_char_list_file.substr(6);
cout << "label file: " << char_list_file << endl;
rec_char_dict_path = FLAGS_rec_char_dict_path.substr(6);
cout << "label file: " << rec_char_dict_path << endl;
CRNNRecognizer rec(FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
FLAGS_gpu_mem, FLAGS_cpu_threads, FLAGS_enable_mkldnn,
char_list_file, FLAGS_use_tensorrt, FLAGS_precision,
rec_char_dict_path, FLAGS_use_tensorrt, FLAGS_precision,
FLAGS_rec_batch_num);
std::vector<cv::Mat> img_list;
for (int i = 0; i < cv_all_img_names.size(); ++i) {
LOG(INFO) << "The predict img: " << cv_all_img_names[i];
cv::Mat srcimg = cv::imread(cv_all_img_names[i], cv::IMREAD_COLOR);
if (!srcimg.data) {
std::cerr << "[ERROR] image read failed! image path: "
......@@ -152,8 +159,15 @@ int main_rec(std::vector<cv::String> cv_all_img_names) {
}
img_list.push_back(srcimg);
}
std::vector<std::string> rec_texts(img_list.size(), "");
std::vector<float> rec_text_scores(img_list.size(), 0);
std::vector<double> rec_times;
rec.Run(img_list, &rec_times);
rec.Run(img_list, rec_texts, rec_text_scores, &rec_times);
// output rec results
for (int i = 0; i < rec_texts.size(); i++) {
cout << "The predict img: " << cv_all_img_names[i] << "\t" << rec_texts[i]
<< "\t" << rec_text_scores[i] << endl;
}
time_info[0] += rec_times[0];
time_info[1] += rec_times[1];
time_info[2] += rec_times[2];
......@@ -172,11 +186,15 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
std::vector<double> time_info_det = {0, 0, 0};
std::vector<double> time_info_rec = {0, 0, 0};
if (!PathExists(FLAGS_output)) {
mkdir(FLAGS_output.c_str(), 0777);
}
DBDetector det(FLAGS_det_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
FLAGS_gpu_mem, FLAGS_cpu_threads, FLAGS_enable_mkldnn,
FLAGS_max_side_len, FLAGS_det_db_thresh,
FLAGS_det_db_box_thresh, FLAGS_det_db_unclip_ratio,
FLAGS_use_polygon_score, FLAGS_use_dilation, FLAGS_visualize,
FLAGS_use_polygon_score, FLAGS_use_dilation,
FLAGS_use_tensorrt, FLAGS_precision);
Classifier *cls = nullptr;
......@@ -186,18 +204,18 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
FLAGS_cls_thresh, FLAGS_use_tensorrt, FLAGS_precision);
}
std::string char_list_file = FLAGS_char_list_file;
std::string rec_char_dict_path = FLAGS_rec_char_dict_path;
if (FLAGS_benchmark)
char_list_file = FLAGS_char_list_file.substr(6);
cout << "label file: " << char_list_file << endl;
rec_char_dict_path = FLAGS_rec_char_dict_path.substr(6);
cout << "label file: " << rec_char_dict_path << endl;
CRNNRecognizer rec(FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
FLAGS_gpu_mem, FLAGS_cpu_threads, FLAGS_enable_mkldnn,
char_list_file, FLAGS_use_tensorrt, FLAGS_precision,
rec_char_dict_path, FLAGS_use_tensorrt, FLAGS_precision,
FLAGS_rec_batch_num);
for (int i = 0; i < cv_all_img_names.size(); ++i) {
LOG(INFO) << "The predict img: " << cv_all_img_names[i];
cout << "The predict img: " << cv_all_img_names[i] << endl;
cv::Mat srcimg = cv::imread(cv_all_img_names[i], cv::IMREAD_COLOR);
if (!srcimg.data) {
......@@ -205,15 +223,21 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
<< cv_all_img_names[i] << endl;
exit(1);
}
// det
std::vector<std::vector<std::vector<int>>> boxes;
std::vector<double> det_times;
std::vector<double> rec_times;
det.Run(srcimg, boxes, &det_times);
if (FLAGS_visualize) {
std::string file_name = Utility::basename(cv_all_img_names[i]);
Utility::VisualizeBboxes(srcimg, boxes, FLAGS_output + "/" + file_name);
}
time_info_det[0] += det_times[0];
time_info_det[1] += det_times[1];
time_info_det[2] += det_times[2];
// rec
std::vector<cv::Mat> img_list;
for (int j = 0; j < boxes.size(); j++) {
cv::Mat crop_img;
......@@ -223,8 +247,14 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
}
img_list.push_back(crop_img);
}
rec.Run(img_list, &rec_times);
std::vector<std::string> rec_texts(img_list.size(), "");
std::vector<float> rec_text_scores(img_list.size(), 0);
rec.Run(img_list, rec_texts, rec_text_scores, &rec_times);
// output rec results
for (int i = 0; i < rec_texts.size(); i++) {
std::cout << i << "\t" << rec_texts[i] << "\t" << rec_text_scores[i]
<< std::endl;
}
time_info_rec[0] += rec_times[0];
time_info_rec[1] += rec_times[1];
time_info_rec[2] += rec_times[2];
......
......@@ -175,11 +175,6 @@ void DBDetector::Run(cv::Mat &img,
std::chrono::duration<float> postprocess_diff =
postprocess_end - postprocess_start;
times->push_back(double(postprocess_diff.count() * 1000));
//// visualization
if (this->visualize_) {
Utility::VisualizeBboxes(srcimg, boxes);
}
}
} // namespace PaddleOCR
......@@ -17,6 +17,8 @@
namespace PaddleOCR {
void CRNNRecognizer::Run(std::vector<cv::Mat> img_list,
std::vector<std::string> &rec_texts,
std::vector<float> &rec_text_scores,
std::vector<double> *times) {
std::chrono::duration<float> preprocess_diff =
std::chrono::steady_clock::now() - std::chrono::steady_clock::now();
......@@ -86,7 +88,7 @@ void CRNNRecognizer::Run(std::vector<cv::Mat> img_list,
// ctc decode
auto postprocess_start = std::chrono::steady_clock::now();
for (int m = 0; m < predict_shape[0]; m++) {
std::vector<std::string> str_res;
std::string str_res;
int argmax_idx;
int last_index = 0;
float score = 0.f;
......@@ -104,17 +106,16 @@ void CRNNRecognizer::Run(std::vector<cv::Mat> img_list,
if (argmax_idx > 0 && (!(n > 0 && argmax_idx == last_index))) {
score += max_value;
count += 1;
str_res.push_back(label_list_[argmax_idx]);
str_res += label_list_[argmax_idx];
}
last_index = argmax_idx;
}
score /= count;
if (isnan(score))
if (isnan(score)) {
continue;
for (int i = 0; i < str_res.size(); i++) {
std::cout << str_res[i];
}
std::cout << "\tscore: " << score << std::endl;
rec_texts[indices[beg_img_no + m]] = str_res;
rec_text_scores[indices[beg_img_no + m]] = score;
}
auto postprocess_end = std::chrono::steady_clock::now();
postprocess_diff += postprocess_end - postprocess_start;
......
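The decode loop above is greedy CTC decoding: take the argmax class at each time step, drop the blank (index 0) and consecutive repeats, and average the confidences of the kept steps. The same idea as a self-contained sketch with hypothetical names, assuming row-major `[T, C]` prediction data:
```cpp
#include <algorithm>
#include <string>
#include <vector>

// Greedy CTC decode for one sequence. predict holds T*C scores row-major;
// label_list[0] is the CTC blank. Returns the decoded text and writes the
// mean confidence of the kept steps to *score. (Illustrative sketch.)
static std::string CtcGreedyDecode(const std::vector<float> &predict, int T,
                                   int C,
                                   const std::vector<std::string> &label_list,
                                   float *score) {
  std::string text;
  int last_index = 0;
  float sum = 0.f;
  int count = 0;
  for (int n = 0; n < T; n++) {
    const float *row = predict.data() + n * C;
    int argmax_idx = int(std::max_element(row, row + C) - row);
    // keep a character only if it is not blank and not a repeat of the
    // previous time step's argmax
    if (argmax_idx > 0 && !(n > 0 && argmax_idx == last_index)) {
      sum += row[argmax_idx];
      count += 1;
      text += label_list[argmax_idx];
    }
    last_index = argmax_idx;
  }
  *score = count > 0 ? sum / count : 0.f;
  return text;
}
```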