Commit 243f89da authored by LDOUBLEV

Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into dygraph

parents b62e6954 a8318353
......@@ -86,15 +86,8 @@ PPOCRLabel # [Normal mode] for [detection + recognition] labeling
PPOCRLabel --kie True # [KIE mode] for [detection + recognition + keyword extraction] labeling
```
#### 1.2.2 Build and Install the Whl Package Locally
```bash
cd PaddleOCR/PPOCRLabel
python3 setup.py bdist_wheel
pip3 install dist/PPOCRLabel-1.0.2-py2.py3-none-any.whl
```
#### 1.2.3 Run PPOCRLabel by Python Script
#### 1.2.2 Run PPOCRLabel by Python Script
If you modify the PPOCRLabel files (for example, to specify a new built-in model), running the Python script makes it easier to see the results. If you still want to launch from the whl package, first uninstall the whl package from the current environment, then rebuild it as described in the next section.
```bash
cd ./PPOCRLabel # Switch to the PPOCRLabel directory
......@@ -104,6 +97,13 @@ python PPOCRLabel.py # [Normal mode] for [detection + recognition] labeling
python PPOCRLabel.py --kie True # [KIE mode] for [detection + recognition + keyword extraction] labeling
```
#### 1.2.3 Build and Install the Whl Package Locally
Build and install a new whl package. Here 1.0.2 is the version number; you can specify a new version in `setup.py`.
```bash
cd PaddleOCR/PPOCRLabel
python3 setup.py bdist_wheel
pip3 install dist/PPOCRLabel-1.0.2-py2.py3-none-any.whl
```
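The install command above hard-codes the wheel file name, which changes whenever the version in `setup.py` changes. A minimal sketch, assuming the wheel is written to a `dist/` directory as in the command above, that picks the most recently built wheel instead. The helper name `newest_wheel` is hypothetical and not part of PPOCRLabel.

```python
from pathlib import Path
from typing import Optional


def newest_wheel(dist_dir: str, package: str = "PPOCRLabel") -> Optional[Path]:
    """Return the most recently modified wheel for `package` in `dist_dir`.

    Hypothetical helper: it only inspects file names and modification
    times, and returns None when no matching wheel exists.
    """
    wheels = sorted(
        Path(dist_dir).glob(f"{package}-*.whl"),
        key=lambda path: path.stat().st_mtime,
    )
    return wheels[-1] if wheels else None
```

The returned path can then be passed to `pip3 install` in place of the fixed file name.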
## 2. Usage
......
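......@@ -88,7 +88,7 @@ PPOCRLabel --lang ch --kie True # Launch [KIE mode], for labeling [detection +
#### 1.2.2 Run PPOCRLabel by Python Script
If you modify the PPOCRLabel files (for example, to specify a new built-in model), running the Python script makes it easier to see the results. If you still need to launch from the whl package, rebuild the whl package as described in the next section.
If you modify the PPOCRLabel files (for example, to specify a new built-in model), running the Python script makes it easier to see the results. If you still need to launch from the whl package, first uninstall the whl package from the current environment, then rebuild the whl package as described in the next section.
```bash
cd ./PPOCRLabel # Switch to the PPOCRLabel directory
......@@ -100,11 +100,9 @@ python PPOCRLabel.py --lang ch
Build and install a new whl package. Here 1.0.2 is the version number; a new version can be specified in `setup.py`.
```bash
cd ./PPOCRLabel # Switch to the PPOCRLabel directory
# Choose a labeling mode to launch
python PPOCRLabel.py --lang ch # Launch [Normal mode], for labeling [detection + recognition] scenarios
python PPOCRLabel.py --lang ch --kie True # Launch [KIE mode], for labeling [detection + recognition + keyword extraction] scenarios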
cd PaddleOCR/PPOCRLabel
python3 setup.py bdist_wheel
pip3 install dist/PPOCRLabel-1.0.2-py2.py3-none-any.whl -i https://mirror.baidu.com/pypi/simple
```
......
......@@ -45,8 +45,9 @@ public:
const double &det_db_thresh,
const double &det_db_box_thresh,
const double &det_db_unclip_ratio,
const bool &use_polygon_score, const bool &use_dilation,
const bool &use_tensorrt, const std::string &precision) {
const std::string &det_db_score_mode,
const bool &use_dilation, const bool &use_tensorrt,
const std::string &precision) {
this->use_gpu_ = use_gpu;
this->gpu_id_ = gpu_id;
this->gpu_mem_ = gpu_mem;
......@@ -58,7 +59,7 @@ public:
this->det_db_thresh_ = det_db_thresh;
this->det_db_box_thresh_ = det_db_box_thresh;
this->det_db_unclip_ratio_ = det_db_unclip_ratio;
this->use_polygon_score_ = use_polygon_score;
this->det_db_score_mode_ = det_db_score_mode;
this->use_dilation_ = use_dilation;
this->use_tensorrt_ = use_tensorrt;
......@@ -88,7 +89,7 @@ private:
double det_db_thresh_ = 0.3;
double det_db_box_thresh_ = 0.5;
double det_db_unclip_ratio_ = 2.0;
bool use_polygon_score_ = false;
std::string det_db_score_mode_ = "slow";
bool use_dilation_ = false;
bool visualize_ = true;
......
......@@ -56,7 +56,7 @@ public:
std::vector<std::vector<std::vector<int>>>
BoxesFromBitmap(const cv::Mat pred, const cv::Mat bitmap,
const float &box_thresh, const float &det_db_unclip_ratio,
const bool &use_polygon_score);
const std::string &det_db_score_mode);
std::vector<std::vector<std::vector<int>>>
FilterTagDetRes(std::vector<std::vector<std::vector<int>>> boxes,
......
......@@ -267,7 +267,7 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
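|det_db_thresh|float|0.3|Threshold used to binarize the DB prediction map; values between 0 and 0.3 have no obvious effect on the result|
|det_db_box_thresh|float|0.5|Box filtering threshold in DB post-processing; reduce it if text boxes are being missed|
|det_db_unclip_ratio|float|1.6|Compactness of the text box; the smaller the value, the closer the box is to the text|
|use_polygon_score|bool|false|Whether to use the polygon box to calculate the bbox score; false means the rectangle box is used. The rectangle box is faster to compute, and the polygon box is more accurate for curved text regions.|
|det_db_score_mode|string|slow|slow: use the polygon box to calculate the bbox score; fast: use the rectangle box. The rectangle box is faster to compute, and the polygon box is more accurate for curved text regions.|
|visualize|bool|true|Whether to visualize the results; when set to 1, the prediction results are saved in the folder specified by the `output` field, on an image with the same name as the input image.|
- Angle classifier parameters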
......
......@@ -260,7 +260,7 @@ More parameters are as follows,
|det_db_thresh|float|0.3|Threshold used to binarize the DB prediction map; values between 0 and 0.3 have no obvious effect on the result|
|det_db_box_thresh|float|0.5|Box filtering threshold in DB post-processing; reduce it if text boxes are being missed|
|det_db_unclip_ratio|float|1.6|Compactness of the text box; the smaller the value, the closer the box is to the text|
|use_polygon_score|bool|false|Whether to use the polygon box to calculate the bbox score; false means the rectangle box is used. The rectangle box is faster to compute, and the polygon box is more accurate for curved text regions.|
|det_db_score_mode|string|slow|slow: use the polygon box to calculate the bbox score; fast: use the rectangle box. The rectangle box is faster to compute, and the polygon box is more accurate for curved text regions.|
|visualize|bool|true|Whether to visualize the results; when set to true, the prediction results are saved in the folder specified by the `output` field, on an image with the same name as the input image.|
- Classifier related parameters
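The trade-off described for `det_db_score_mode` can be illustrated with a small pure-Python sketch. It is not the C++ implementation from this commit: the function names are hypothetical, the "fast" branch is simplified to the axis-aligned bounding rectangle of the region, and the "slow" branch uses a ray-casting point-in-polygon test on pixel centers instead of OpenCV's `fillPoly`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: Sequence[Point]) -> bool:
    """Even-odd ray-casting test for a point against a closed polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def box_score(pred: List[List[float]], poly: Sequence[Point], mode: str = "slow") -> float:
    """Mean of `pred` over the region; polygon mask if mode == "slow", else rectangle."""
    xs = [int(p[0]) for p in poly]
    ys = [int(p[1]) for p in poly]
    xmin, xmax = max(min(xs), 0), min(max(xs), len(pred[0]) - 1)
    ymin, ymax = max(min(ys), 0), min(max(ys), len(pred) - 1)
    total, count = 0.0, 0
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            # In slow mode, skip pixels whose center lies outside the polygon.
            if mode == "slow" and not point_in_polygon(x + 0.5, y + 0.5, poly):
                continue
            total += pred[y][x]
            count += 1
    return total / count if count else 0.0
```

For a triangular text region the rectangle also averages background pixels, so the "fast" score comes out lower than the "slow" one; that gap is the accuracy cost the table refers to for curved text.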
......
......@@ -36,25 +36,26 @@
#include "auto_log/autolog.h"
#include <gflags/gflags.h>
// common args
DEFINE_bool(use_gpu, false, "Inferring with GPU or CPU.");
DEFINE_bool(use_tensorrt, false, "Whether use tensorrt.");
DEFINE_int32(gpu_id, 0, "Device id of GPU to execute.");
DEFINE_int32(gpu_mem, 4000, "GPU memory when inferring with GPU.");
DEFINE_int32(cpu_threads, 10, "Num of threads with CPU.");
DEFINE_bool(enable_mkldnn, false, "Whether use mkldnn with CPU.");
DEFINE_bool(use_tensorrt, false, "Whether use tensorrt.");
DEFINE_string(precision, "fp32", "Precision be one of fp32/fp16/int8");
DEFINE_bool(benchmark, false, "Whether use benchmark.");
DEFINE_string(output, "./output/", "Save benchmark log path.");
// detection related
DEFINE_string(image_dir, "", "Dir of input image.");
DEFINE_bool(visualize, true, "Whether show the detection results.");
// detection related
DEFINE_string(det_model_dir, "", "Path of det inference model.");
DEFINE_int32(max_side_len, 960, "max_side_len of input image.");
DEFINE_double(det_db_thresh, 0.3, "Threshold of det_db_thresh.");
DEFINE_double(det_db_box_thresh, 0.6, "Threshold of det_db_box_thresh.");
DEFINE_double(det_db_unclip_ratio, 1.5, "Threshold of det_db_unclip_ratio.");
DEFINE_bool(use_polygon_score, false, "Whether use polygon score.");
DEFINE_bool(use_dilation, false, "Whether use the dilation on output map.");
DEFINE_bool(visualize, true, "Whether show the detection results.");
DEFINE_string(det_db_score_mode, "slow", "Score mode: slow uses the polygon box, fast uses the rectangle box.");
// classification related
DEFINE_bool(use_angle_cls, false, "Whether use use_angle_cls.");
DEFINE_string(cls_model_dir, "", "Path of cls inference model.");
......@@ -85,7 +86,7 @@ int main_det(std::vector<cv::String> cv_all_img_names) {
FLAGS_gpu_mem, FLAGS_cpu_threads, FLAGS_enable_mkldnn,
FLAGS_max_side_len, FLAGS_det_db_thresh,
FLAGS_det_db_box_thresh, FLAGS_det_db_unclip_ratio,
FLAGS_use_polygon_score, FLAGS_use_dilation,
FLAGS_det_db_score_mode, FLAGS_use_dilation,
FLAGS_use_tensorrt, FLAGS_precision);
if (!PathExists(FLAGS_output)) {
......@@ -117,13 +118,21 @@ int main_det(std::vector<cv::String> cv_all_img_names) {
time_info[2] += det_times[2];
if (FLAGS_benchmark) {
cout << cv_all_img_names[i] << '\t';
cout << cv_all_img_names[i] << "\t[";
for (int n = 0; n < boxes.size(); n++) {
cout << '[';
for (int m = 0; m < boxes[n].size(); m++) {
cout << boxes[n][m][0] << ' ' << boxes[n][m][1] << ' ';
cout << '[' << boxes[n][m][0] << ',' << boxes[n][m][1] << "]";
if (m != boxes[n].size() - 1) {
cout << ',';
}
}
cout << ']';
if (n != boxes.size() - 1) {
cout << ',';
}
}
cout << endl;
cout << ']' << endl;
}
}
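With the brackets and commas added in this hunk, each benchmark line appears to take the form `<image name>\t[[[x,y],...],...]`, and the part after the tab reads as valid JSON. A minimal sketch of a consumer, assuming that shape; `parse_benchmark_line` is a hypothetical helper, not part of the repository.

```python
import json
from typing import List, Tuple


def parse_benchmark_line(line: str) -> Tuple[str, List[List[List[int]]]]:
    """Split one benchmark line into (image name, list of boxes).

    Assumes the line is "<name>\\t<JSON array>", where each box is a list
    of [x, y] points. The split is on the last tab, so an image name that
    itself contains a tab does not break parsing.
    """
    name, _, payload = line.rstrip("\n").rpartition("\t")
    return name, json.loads(payload)
```

The previous space-separated output had no box delimiters, so box boundaries could only be recovered by assuming a fixed number of points per box.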
......@@ -140,8 +149,6 @@ int main_rec(std::vector<cv::String> cv_all_img_names) {
std::vector<double> time_info = {0, 0, 0};
std::string rec_char_dict_path = FLAGS_rec_char_dict_path;
if (FLAGS_benchmark)
rec_char_dict_path = FLAGS_rec_char_dict_path.substr(6);
cout << "label file: " << rec_char_dict_path << endl;
CRNNRecognizer rec(FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
......@@ -194,7 +201,7 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
FLAGS_gpu_mem, FLAGS_cpu_threads, FLAGS_enable_mkldnn,
FLAGS_max_side_len, FLAGS_det_db_thresh,
FLAGS_det_db_box_thresh, FLAGS_det_db_unclip_ratio,
FLAGS_use_polygon_score, FLAGS_use_dilation,
FLAGS_det_db_score_mode, FLAGS_use_dilation,
FLAGS_use_tensorrt, FLAGS_precision);
Classifier *cls = nullptr;
......@@ -205,8 +212,6 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
}
std::string rec_char_dict_path = FLAGS_rec_char_dict_path;
if (FLAGS_benchmark)
rec_char_dict_path = FLAGS_rec_char_dict_path.substr(6);
cout << "label file: " << rec_char_dict_path << endl;
CRNNRecognizer rec(FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
......
......@@ -161,7 +161,7 @@ void DBDetector::Run(cv::Mat &img,
boxes = post_processor_.BoxesFromBitmap(
pred_map, bit_map, this->det_db_box_thresh_, this->det_db_unclip_ratio_,
this->use_polygon_score_);
this->det_db_score_mode_);
boxes = post_processor_.FilterTagDetRes(boxes, ratio_h, ratio_w, srcimg);
auto postprocess_end = std::chrono::steady_clock::now();
......
......@@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include <include/clipper.h>
#include <include/postprocess_op.h>
#include <include/clipper.cpp>
namespace PaddleOCR {
......@@ -187,8 +187,7 @@ float PostProcessor::PolygonScoreAcc(std::vector<cv::Point> contour,
cv::Mat mask;
mask = cv::Mat::zeros(ymax - ymin + 1, xmax - xmin + 1, CV_8UC1);
cv::Point* rook_point = new cv::Point[contour.size()];
cv::Point *rook_point = new cv::Point[contour.size()];
for (int i = 0; i < contour.size(); ++i) {
rook_point[i] = cv::Point(int(box_x[i]) - xmin, int(box_y[i]) - ymin);
......@@ -196,14 +195,14 @@ float PostProcessor::PolygonScoreAcc(std::vector<cv::Point> contour,
const cv::Point *ppt[1] = {rook_point};
int npt[] = {int(contour.size())};
cv::fillPoly(mask, ppt, npt, 1, cv::Scalar(1));
cv::Mat croppedImg;
pred(cv::Rect(xmin, ymin, xmax - xmin + 1, ymax - ymin + 1)).copyTo(croppedImg);
pred(cv::Rect(xmin, ymin, xmax - xmin + 1, ymax - ymin + 1))
.copyTo(croppedImg);
float score = cv::mean(croppedImg, mask)[0];
delete []rook_point;
delete[] rook_point;
return score;
}
......@@ -247,7 +246,7 @@ float PostProcessor::BoxScoreFast(std::vector<std::vector<float>> box_array,
std::vector<std::vector<std::vector<int>>> PostProcessor::BoxesFromBitmap(
const cv::Mat pred, const cv::Mat bitmap, const float &box_thresh,
const float &det_db_unclip_ratio, const bool &use_polygon_score) {
const float &det_db_unclip_ratio, const std::string &det_db_score_mode) {
const int min_size = 3;
const int max_candidates = 1000;
......@@ -281,7 +280,7 @@ std::vector<std::vector<std::vector<int>>> PostProcessor::BoxesFromBitmap(
}
float score;
if (use_polygon_score)
if (det_db_score_mode == "slow")
/* compute using polygon*/
score = PolygonScoreAcc(contours[_i], pred);
else
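Two details of this change are easy to miss. The comparison above sends every value other than `"slow"` to the `else` branch, so a misspelled mode is accepted silently. The default also flips: the old member defaulted to `use_polygon_score_ = false`, while the new one defaults to `"slow"`. A minimal sketch of a caller-side check that maps the old boolean to the new string and rejects unknown values; `resolve_score_mode` is a hypothetical helper, not part of the commit.

```python
from typing import Union


def resolve_score_mode(value: Union[bool, str]) -> str:
    """Map a legacy use_polygon_score bool or a mode string to "slow" or "fast".

    Hypothetical helper: True (polygon scoring) becomes "slow" and False
    (rectangle scoring) becomes "fast"; any other string raises.
    """
    if isinstance(value, bool):
        return "slow" if value else "fast"
    mode = str(value).strip().lower()
    if mode not in ("slow", "fast"):
        raise ValueError(f"det_db_score_mode must be 'slow' or 'fast', got {value!r}")
    return mode
```

Validating before the flag is passed on keeps a typo from quietly selecting the rectangle path.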
......
......@@ -29,8 +29,7 @@ def read_params():
cfg.rec_model_dir = "./inference/ch_PP-OCRv2_rec_infer/"
cfg.rec_image_shape = "3, 32, 320"
cfg.rec_char_type = 'ch'
cfg.rec_batch_num = 30
cfg.rec_batch_num = 6
cfg.max_text_length = 25
cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
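This hunk lowers `rec_batch_num` from 30 to 6. A minimal sketch of what a batch size of this kind usually controls, assuming the recognizer processes cropped text images in fixed-size groups; `iter_batches` is a hypothetical helper and does not reproduce the repository's batching code.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_num: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `batch_num` items."""
    if batch_num <= 0:
        raise ValueError("batch_num must be positive")
    for start in range(0, len(items), batch_num):
        yield list(items[start:start + batch_num])
```

With 14 crops, a batch size of 6 gives three passes (6, 6, 2), where 30 would give a single pass; smaller batches trade throughput for lower peak memory.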
......
......@@ -47,8 +47,7 @@ def read_params():
cfg.rec_model_dir = "./inference/ch_PP-OCRv2_rec_infer/"
cfg.rec_image_shape = "3, 32, 320"
cfg.rec_char_type = 'ch'
cfg.rec_batch_num = 30
cfg.rec_batch_num = 6
cfg.max_text_length = 25
cfg.rec_char_dict_path = "./ppocr/utils/ppocr_keys_v1.txt"
......
......@@ -188,7 +188,7 @@ hub serving start -c deploy/hubserving/ocr_system/config.json
- **output**: path where visualization results are saved, default `./hubserving_result`
Request example:
```python tools/test_hubserving.py --server_url=http://127.0.0.1:8868/predict/ocr_system --image_dir./doc/imgs/ --visualize=false```
```python tools/test_hubserving.py --server_url=http://127.0.0.1:8868/predict/ocr_system --image_dir=./doc/imgs/ --visualize=false```
## 4. Returned result format
The returned result is a list. Each item in the list is a dict, which may contain up to 3 fields, as follows:
......
......@@ -196,7 +196,7 @@ For example, if using the configuration file to start the text angle classificat
**Eg.**
```shell
python tools/test_hubserving.py --server_url=http://127.0.0.1:8868/predict/ocr_system --image_dir./doc/imgs/ --visualize=false
python tools/test_hubserving.py --server_url=http://127.0.0.1:8868/predict/ocr_system --image_dir=./doc/imgs/ --visualize=false
```
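The only change in the command above is the `=` after `--image_dir`. A minimal sketch of why it matters, assuming the script parses its options with `argparse`; the parser below is a hypothetical stand-in with the three flags from the example, not the actual parser in `tools/test_hubserving.py`.

```python
import argparse
from typing import List, Tuple


def parse_example_args(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Parse the example flags and return (known options, unrecognized tokens)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--server_url")
    parser.add_argument("--image_dir")
    parser.add_argument("--visualize", default="false")
    # parse_known_args collects unrecognized tokens instead of exiting.
    return parser.parse_known_args(argv)
```

Without the `=`, the token `--image_dir./doc/imgs/` is not recognized as the `--image_dir` option, so `image_dir` stays unset and the whole token ends up in the unrecognized list.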
## 4. Returned result format
......
......@@ -25,7 +25,6 @@ def read_params():
# params for table structure model
cfg.table_max_len = 488
cfg.table_model_dir = './inference/en_ppocr_mobile_v2.0_table_structure_infer/'
cfg.table_char_type = 'en'
cfg.table_char_dict_path = './ppocr/utils/dict/table_structure_dict.txt'
cfg.show_log = False
return cfg
......@@ -133,6 +133,7 @@ def main():
sub_model_save_path, logger)
else:
save_path = os.path.join(save_path, "inference")
model.eval()
export_single_model(quanter, model, infer_shape, save_path, logger)
......
......@@ -3,12 +3,13 @@
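This document describes how to use the Python inference engine with the PP-OCR model zoo. It covers text detection, text recognition, the angle classifier, and the three chained together, with prediction on CPU and GPU.
- [1. Text Detection Model Inference](#文本检测模型推理)
- [2. Text Recognition Model Inference](#文本识别模型推理)
- [2.1 Ultra-Lightweight Chinese Recognition Model Inference](#超轻量中文识别模型推理)
- [2.2 Multilingual Model Inference](#多语言模型的推理)
- [3. Angle Classification Model Inference](#方向分类模型推理)
- [4. Text Detection, Angle Classification, and Text Recognition in Series](#文本检测、方向分类和文字识别串联推理)
- [Python Inference for the PP-OCR Model Zoo](#基于python引擎的pp-ocr模型库推理)
- [1. Text Detection Model Inference](#1-文本检测模型推理)
- [2. Text Recognition Model Inference](#2-文本识别模型推理)
- [2.1 Ultra-Lightweight Chinese Recognition Model Inference](#21-超轻量中文识别模型推理)
- [2.2 Multilingual Model Inference](#22-多语言模型的推理)
- [3. Angle Classification Model Inference](#3-方向分类模型推理)
- [4. Text Detection, Angle Classification, and Text Recognition in Series](#4-文本检测方向分类和文字识别串联推理)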
<a name="文本检测模型推理"></a>
......@@ -82,7 +83,7 @@ Predicts of ./doc/imgs_words/ch/word_4.jpg:('实力活力', 0.98458153)
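If you need to predict with a model for another language, you can find the inference model for that language at [this link](./models_list.md#%E5%A4%9A%E8%AF%AD%E8%A8%80%E8%AF%86%E5%88%AB%E6%A8%A1%E5%9E%8B). When predicting with the inference model, specify the dictionary path with `--rec_char_dict_path`. To get correct visualization results, also specify the font path with `--vis_font_path`; fonts for less common languages are provided by default under `doc/fonts/`. For example, Korean recognition: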
```
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
```
![](../imgs_words/korean/1.jpg)
......
# PaddleOCR Quick Start
- [1. Installation](#1)
- [1.1 Install PaddlePaddle](#11)
- [1.2 Install the PaddleOCR whl Package](#12)
- [PaddleOCR Quick Start](#paddleocr快速开始)
- [1. Installation](#1-安装)
- [1.1 Install PaddlePaddle](#11-安装paddlepaddle)
- [1.2 Install the PaddleOCR whl Package](#12-安装paddleocr-whl包)
- [2. Quick Use](#2-便捷使用)
- [2.1 Command-Line Usage](#21-命令行使用)
- [2.1.1 Chinese and English Models](#211-中英文模型)
- [2.1.2 Multilingual Models](#212-多语言模型)
- [2.1.3 Layout Analysis](#213-版面分析)
- [2.2 Python Script Usage](#22-python脚本使用)
- [2.2.1 Chinese, English, and Multilingual Usage](#221-中英文与多语言使用)
- [2.2.2 Layout Analysis](#222-版面分析)
- [3. Summary](#3-小结)
- [2. Quick Use](#2)
- [2.1 Command-Line Usage](#21)
- [2.1.1 Chinese and English Models](#211)
- [2.1.2 Multilingual Models](#212)
- [2.1.3 Layout Analysis](#213)
- [2.2 Python Script Usage](#22)
- [2.2.1 Chinese, English, and Multilingual Usage](#221)
- [2.2.2 Layout Analysis](#222)
- [3.Summary](#3)
# PaddleOCR Quick Start
<a name="1"></a>
......@@ -204,7 +204,7 @@ paddleocr --image_dir=./table/1.png --type=structure
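| output | Path where the Excel file and recognition results are saved | ./output/table |
| table_max_len | Size to which the long side of the image is resized for table structure model prediction | 488 |
| table_model_dir | Path of the table structure inference model | None |
| table_char_type | Path of the dictionary used by the table structure model | ../ppocr/utils/dict/table_structure_dict.txt |
| table_char_dict_path | Path of the dictionary used by the table structure model | ../ppocr/utils/dict/table_structure_dict.txt |
Most parameters are the same as in the paddleocr whl package; see the [whl package documentation](./whl.md)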
......
......@@ -2,19 +2,20 @@
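This document is an end-to-end guide to the PaddleOCR text recognition task. It covers data preparation, model training, tuning, evaluation, and prediction, with detailed notes on each stage:
- [1 Data Preparation](#数据准备)
- [1.1 Custom Dataset](#自定义数据集)
- [1.2 Dataset Download](#数据下载)
- [1.3 Dictionary](#字典)
- [1.4 Space Support](#支持空格)
- [2 Training](#启动训练)
- [2.1 Data Augmentation](#数据增强)
- [2.2 General Model Training](#通用模型训练)
- [2.3 Multilingual Model Training](#多语言模型训练)
- [2.4 Knowledge Distillation Training](#知识蒸馏训练)
- [3 Evaluation](#评估)
- [4 Prediction](#预测)
- [5 Convert to Inference Model and Test](#Inference)
- [Text Recognition](#文字识别)
- [1. Data Preparation](#1-数据准备)
- [1.1 Custom Dataset](#11-自定义数据集)
- [1.2 Dataset Download](#12-数据下载)
- [1.3 Dictionary](#13-字典)
- [1.4 Add a Space Category](#14-添加空格类别)
- [2. Training](#2-启动训练)
- [2.1 Data Augmentation](#21-数据增强)
- [2.2 General Model Training](#22-通用模型训练)
- [2.3 Multilingual Model Training](#23-多语言模型训练)
- [2.4 Knowledge Distillation Training](#24-知识蒸馏训练)
- [3 Evaluation](#3-评估)
- [4 Prediction](#4-预测)
- [5. Convert to Inference Model and Test](#5-转inference模型测试)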
<a name="数据准备"></a>
......@@ -477,8 +478,8 @@ python3 tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_trai
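- Inference with a custom model
If you modified the text dictionary during training, specify the dictionary path with `--rec_char_dict_path` when predicting with the inference model, and set `rec_char_type=ch`
If you modified the text dictionary during training, specify the dictionary path with `--rec_char_dict_path` when predicting with the inference model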
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="ch" --rec_char_dict_path="your text dict path"
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_dict_path="your text dict path"
```
......@@ -98,7 +98,6 @@ def read_params():
cfg.rec_model_dir = "./ocr_rec_server/" # Path of the recognition model
cfg.rec_image_shape = "3, 32, 320"
cfg.rec_char_type = 'ch'
cfg.rec_batch_num = 30
cfg.max_text_length = 25
......