Commit 8c000977 authored by WenmuZhou's avatar WenmuZhou
Browse files

update doc

parent aad3093a
# 使用Paddle Serving预测推理
阅读本文档之前,请先阅读文档 [基于Python预测引擎推理](./inference.md)
同本地执行预测一样,我们需要保存一份可以用于Paddle Serving的模型。
接下来首先介绍如何将训练的模型转换成Paddle Serving模型,然后将依次介绍文本检测、文本识别以及两者串联基于预测引擎推理。
### 一、 准备环境
我们先安装Paddle Serving相关组件
我们推荐用户使用GPU来做Paddle Serving的OCR服务部署
**CUDA版本:9.X/10.X**
**CUDNN版本:7.X**
**操作系统版本:Linux/Windows**
**Python版本: 2.7/3.5/3.6/3.7**
**Python操作指南:**
目前Serving用于OCR的部分功能还在测试当中,因此在这里我们给出[Servnig latest package](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md)
大家根据自己的环境选择需要安装的whl包即可,例如以Python 3.5为例,执行下列命令
```
#CPU/GPU版本选择一个
#GPU版本服务端
#CUDA 9
python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post9-py3-none-any.whl
#CUDA 10
python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post10-py3-none-any.whl
#CPU版本服务端
python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.0.0-py3-none-any.whl
#客户端和App包使用以下链接(CPU,GPU通用)
python -m pip install -U https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp36-none-any.whl https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.0.0-py3-none-any.whl
```
## 二、训练模型转Serving模型
在前序文档 [基于Python预测引擎推理](./inference.md) 中,我们提供了如何把训练的checkpoint转换成Paddle模型。Paddle模型通常由一个文件夹构成,内含模型结构描述文件`model`和模型参数文件`params`。Serving模型由两个文件夹构成,用于存放客户端和服务端的配置。
我们以`ch_rec_r34_vd_crnn`模型作为例子,下载链接在:
```
wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar
tar xf ch_rec_r34_vd_crnn_infer.tar
```
因此我们按照Serving模型转换教程,运行下列python文件。
```
python tools/inference_to_serving.py --model_dir ch_rec_r34_vd_crnn
```
最终会在`serving_client_dir``serving_server_dir`生成客户端和服务端的模型配置。其中`serving_server_dir``serving_client_dir`的名字可以自定义。最终文件结构如下
```
/ch_rec_r34_vd_crnn/
├── serving_client_dir # 客户端配置文件夹
└── serving_server_dir # 服务端配置文件夹
```
## 三、文本检测模型Serving推理
启动服务可以根据实际需求选择启动`标准版`或者`快速版`,两种方式的对比如下表:
|版本|特点|适用场景|
|-|-|-|
|标准版|稳定性高,分布式部署|适用于吞吐量大,需要跨机房部署的情况|
|快速版|部署方便,预测速度快|适用于对预测速度要求高,迭代速度快的场景,Windows用户只能选择快速版|
接下来的命令中,我们会指定快速版和标准版的命令。需要说明的是,标准版只能用Linux平台,快速版可以支持Linux/Windows。
文本检测模型推理,默认使用DB模型的配置参数,识别默认为CRNN。
配置文件在`params.py`中,我们贴出配置部分,如果需要做改动,也在这个文件内部进行修改。
```
def read_params():
cfg = Config()
#use gpu
cfg.use_gpu = False # 是否使用GPU
cfg.use_pdserving = True # 是否使用paddleserving,必须为True
#params for text detector
cfg.det_algorithm = "DB" # 检测算法, DB/EAST等
cfg.det_model_dir = "./det_mv_server/" # 检测算法模型路径
cfg.det_max_side_len = 960
#DB params
cfg.det_db_thresh =0.3
cfg.det_db_box_thresh =0.5
cfg.det_db_unclip_ratio =2.0
#EAST params
cfg.det_east_score_thresh = 0.8
cfg.det_east_cover_thresh = 0.1
cfg.det_east_nms_thresh = 0.2
#params for text recognizer
cfg.rec_algorithm = "CRNN" # 识别算法, CRNN/RARE等
cfg.rec_model_dir = "./ocr_rec_server/" # 识别算法模型路径
cfg.rec_image_shape = "3, 32, 320"
cfg.rec_char_type = 'ch'
cfg.rec_batch_num = 30
cfg.max_text_length = 25
cfg.rec_char_dict_path = "./ppocr_keys_v1.txt" # 识别算法字典文件
cfg.use_space_char = True
#params for text classifier
cfg.use_angle_cls = True # 是否启用分类算法
cfg.cls_model_dir = "./ocr_clas_server/" # 分类算法模型路径
cfg.cls_image_shape = "3, 48, 192"
cfg.label_list = ['0', '180']
cfg.cls_batch_num = 30
cfg.cls_thresh = 0.9
return cfg
```
与本地预测不同的是,Serving预测需要一个客户端和一个服务端,因此接下来的教程都是两行代码。
在正式执行服务端启动命令之前,先export PYTHONPATH到工程主目录下。
```
export PYTHONPATH=$PWD:$PYTHONPATH
cd deploy/pdserving
```
为了方便用户复现Demo程序,我们提供了Chinese and English ultra-lightweight OCR model (8.1M)版本的Serving模型
```
wget --no-check-certificate https://paddleocr.bj.bcebos.com/deploy/pdserving/ocr_pdserving_suite.tar.gz
tar xf ocr_pdserving_suite.tar.gz
```
### 1. 超轻量中文检测模型推理
超轻量中文检测模型推理,可以执行如下命令启动服务端:
```
#根据环境只需要启动其中一个就可以
python det_rpc_server.py #标准版,Linux用户
python det_local_server.py #快速版,Windows/Linux用户
```
客户端
```
python det_web_client.py
```
Serving的推测和本地预测不同点在于,客户端发送请求到服务端,服务端需要检测到文字框之后返回框的坐标,此处没有后处理的图片,只能看到坐标值。
## 四、文本识别模型Serving推理
下面将介绍超轻量中文识别模型推理、基于CTC损失的识别模型推理和基于Attention损失的识别模型推理。对于中文文本识别,建议优先选择基于CTC损失的识别模型,实践中也发现基于Attention损失的效果不如基于CTC损失的识别模型。此外,如果训练时修改了文本的字典,请参考下面的自定义文本识别字典的推理。
### 1. 超轻量中文识别模型推理
超轻量中文识别模型推理,可以执行如下命令启动服务端:
需要注意params.py中的`--use_gpu`的值
```
#根据环境只需要启动其中一个就可以
python rec_rpc_server.py #标准版,Linux用户
python rec_local_server.py #快速版,Windows/Linux用户
```
如果需要使用CPU版本,还需增加 `--use_gpu False`
客户端
```
python rec_web_client.py
```
![](../imgs_words/ch/word_4.jpg)
执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下:
```
{u'result': {u'score': [u'0.89547354'], u'pred_text': ['实力活力']}}
```
## 五、方向分类模型推理
下面将介绍方向分类模型推理。
### 1. 方向分类模型推理
方向分类模型推理, 可以执行如下命令启动服务端:
需要注意params.py中的`--use_gpu`的值
```
#根据环境只需要启动其中一个就可以
python clas_rpc_server.py #标准版,Linux用户
python clas_local_server.py #快速版,Windows/Linux用户
```
客户端
```
python rec_web_client.py
```
![](../imgs_words/ch/word_4.jpg)
执行命令后,上面图像的预测结果(分类的方向和得分)会打印到屏幕上,示例如下:
```
{u'result': {u'direction': [u'0'], u'score': [u'0.9999963']}}
```
## 六、文本检测、方向分类和文字识别串联Serving推理
### 1. 超轻量中文OCR模型推理
在执行预测时,需要通过参数`image_dir`指定单张图像或者图像集合的路径、参数`det_model_dir`,`cls_model_dir``rec_model_dir`分别指定检测,方向分类和识别的inference模型路径。参数`use_angle_cls`用于控制是否启用方向分类模型。与本地预测不同的是,为了减少网络传输耗时,可视化识别结果目前不做处理,用户收到的是推理得到的文字字段。
执行如下命令启动服务端:
需要注意params.py中的`--use_gpu`的值
```
#标准版,Linux用户
#GPU用户
python -m paddle_serving_server_gpu.serve --model det_infer_server --port 9293 --gpu_id 0
python -m paddle_serving_server_gpu.serve --model cls_infer_server --port 9294 --gpu_id 0
python ocr_rpc_server.py
#CPU用户
python -m paddle_serving_server.serve --model det_infer_server --port 9293
python -m paddle_serving_server.serve --model cls_infer_server --port 9294
python ocr_rpc_server.py
#快速版,Windows/Linux用户
python ocr_local_server.py
```
客户端
```
python rec_web_client.py
```
# 整体目录结构
PaddleOCR 的整体目录结构介绍如下:
```
PaddleOCR
├── configs // 配置文件,可通过yml文件选择模型结构并修改超参
│ ├── cls // 方向分类器相关配置文件
│ │ ├── cls_mv3.yml // 训练配置相关,包括骨干网络、head、loss、优化器
│ │ └── cls_reader.yml // 数据读取相关,数据读取方式、数据存储路径
│ ├── det // 检测相关配置文件
│ │ ├── det_db_icdar15_reader.yml // 数据读取
│ │ ├── det_mv3_db.yml // 训练配置
│ │ ...
│ └── rec // 识别相关配置文件
│ ├── rec_benchmark_reader.yml // LMDB 格式数据读取相关
│ ├── rec_chinese_common_train.yml // 通用中文训练配置
│ ├── rec_icdar15_reader.yml // simple 数据读取相关,包括数据读取函数、数据路径、标签文件
│ ...
├── deploy // 部署相关
│ ├── android_demo // android_demo
│ │ ...
│ ├── cpp_infer // C++ infer
│ │ ├── CMakeLists.txt // Cmake 文件
│ │ ├── docs // 说明文档
│ │ │ └── windows_vs2019_build.md
│ │ ├── include // 头文件
│ │ │ ├── clipper.h // clipper 库
│ │ │ ├── config.h // 预测配置
│ │ │ ├── ocr_cls.h // 方向分类器
│ │ │ ├── ocr_det.h // 文字检测
│ │ │ ├── ocr_rec.h // 文字识别
│ │ │ ├── postprocess_op.h // 检测后处理
│ │ │ ├── preprocess_op.h // 检测预处理
│ │ │ └── utility.h // 工具
│ │ ├── readme.md // 说明文档
│ │ ├── ...
│ │ ├── src // 源文件
│ │ │ ├── clipper.cpp
│ │ │ ├── config.cpp
│ │ │ ├── main.cpp
│ │ │ ├── ocr_cls.cpp
│ │ │ ├── ocr_det.cpp
│ │ │ ├── ocr_rec.cpp
│ │ │ ├── postprocess_op.cpp
│ │ │ ├── preprocess_op.cpp
│ │ │ └── utility.cpp
│ │ └── tools // 编译、执行脚本
│ │ ├── build.sh // 编译脚本
│ │ ├── config.txt // 配置文件
│ │ └── run.sh // 测试启动脚本
│ ├── docker
│ │ └── hubserving
│ │ ├── cpu
│ │ │ └── Dockerfile
│ │ ├── gpu
│ │ │ └── Dockerfile
│ │ ├── README_cn.md
│ │ ├── README.md
│ │ └── sample_request.txt
│ ├── hubserving // hubserving
│ │ ├── ocr_det // 文字检测
│ │ │ ├── config.json // serving 配置
│ │ │ ├── __init__.py
│ │ │ ├── module.py // 预测模型
│ │ │ └── params.py // 预测参数
│ │ ├── ocr_rec // 文字识别
│ │ │ ├── config.json
│ │ │ ├── __init__.py
│ │ │ ├── module.py
│ │ │ └── params.py
│ │ └── ocr_system // 系统预测
│ │ ├── config.json
│ │ ├── __init__.py
│ │ ├── module.py
│ │ └── params.py
│ ├── imgs // 预测图片
│ │ ├── cpp_infer_pred_12.png
│ │ └── demo.png
│ ├── ios_demo // ios demo
│ │ ...
│ ├── lite // lite 部署
│ │ ├── cls_process.cc // 方向分类器数据处理
│ │ ├── cls_process.h
│ │ ├── config.txt // 检测配置参数
│ │ ├── crnn_process.cc // crnn数据处理
│ │ ├── crnn_process.h
│ │ ├── db_post_process.cc // db数据处理
│ │ ├── db_post_process.h
│ │ ├── Makefile // 编译文件
│ │ ├── ocr_db_crnn.cc // 串联预测
│ │ ├── prepare.sh // 数据准备
│ │ ├── readme.md // 说明文档
│ │ ...
│ ├── pdserving // pdserving 部署
│ │ ├── det_local_server.py // 检测 快速版,部署方便预测速度快
│ │ ├── det_web_server.py // 检测 完整版,稳定性高分布式部署
│ │ ├── ocr_local_server.py // 检测+识别 快速版
│ │ ├── ocr_web_client.py // 客户端
│ │ ├── ocr_web_server.py // 检测+识别 完整版
│ │ ├── readme.md // 说明文档
│ │ ├── rec_local_server.py // 识别 快速版
│ │ └── rec_web_server.py // 识别 完整版
│ └── slim
│ └── quantization // 量化相关
│ ├── export_model.py // 导出模型
│ ├── quant.py // 量化
│ └── README.md // 说明文档
├── doc // 文档教程
│ ...
├── paddleocr.py
├── ppocr // 网络核心代码
│ ├── data // 数据处理
│ │ ├── cls // 方向分类器
│ │ │ ├── dataset_traversal.py // 数据传输,定义数据读取器,读取数据并组成batch
│ │ │ └── randaugment.py // 随机数据增广操作
│ │ ├── det // 检测
│ │ │ ├── data_augment.py // 数据增广操作
│ │ │ ├── dataset_traversal.py // 数据传输,定义数据读取器,读取数据并组成batch
│ │ │ ├── db_process.py // db 数据处理
│ │ │ ├── east_process.py // east 数据处理
│ │ │ ├── make_border_map.py // 生成边界图
│ │ │ ├── make_shrink_map.py // 生成收缩图
│ │ │ ├── random_crop_data.py // 随机切割
│ │ │ └── sast_process.py // sast 数据处理
│ │ ├── reader_main.py // 数据读取器主函数
│ │ └── rec // 识别
│ │ ├── dataset_traversal.py // 数据传输,定义数据读取器,包含 LMDB_Reader 和 Simple_Reader
│ │ └── img_tools.py // 数据处理相关,包括数据归一化、扰动
│ ├── __init__.py
│ ├── modeling // 组网相关
│ │ ├── architectures // 模型架构,定义模型所需的各个模块
│ │ │ ├── cls_model.py // 方向分类器
│ │ │ ├── det_model.py // 检测
│ │ │ └── rec_model.py // 识别
│ │ ├── backbones // 骨干网络
│ │ │ ├── det_mobilenet_v3.py // 检测 mobilenet_v3
│ │ │ ├── det_resnet_vd.py
│ │ │ ├── det_resnet_vd_sast.py
│ │ │ ├── rec_mobilenet_v3.py // 识别 mobilenet_v3
│ │ │ ├── rec_resnet_fpn.py
│ │ │ └── rec_resnet_vd.py
│ │ ├── common_functions.py // 公共函数
│ │ ├── heads // 头函数
│ │ │ ├── cls_head.py // 分类头
│ │ │ ├── det_db_head.py // db 检测头
│ │ │ ├── det_east_head.py // east 检测头
│ │ │ ├── det_sast_head.py // sast 检测头
│ │ │ ├── rec_attention_head.py // 识别 attention
│ │ │ ├── rec_ctc_head.py // 识别 ctc
│ │ │ ├── rec_seq_encoder.py // 识别 序列编码
│ │ │ ├── rec_srn_all_head.py // 识别 srn 相关
│ │ │ └── self_attention // srn attention
│ │ │ └── model.py
│ │ ├── losses // 损失函数
│ │ │ ├── cls_loss.py // 方向分类器损失函数
│ │ │ ├── det_basic_loss.py // 检测基础loss
│ │ │ ├── det_db_loss.py // DB loss
│ │ │ ├── det_east_loss.py // EAST loss
│ │ │ ├── det_sast_loss.py // SAST loss
│ │ │ ├── rec_attention_loss.py // attention loss
│ │ │ ├── rec_ctc_loss.py // ctc loss
│ │ │ └── rec_srn_loss.py // srn loss
│ │ └── stns // 空间变换网络
│ │ └── tps.py // TPS 变换
│ ├── optimizer.py // 优化器
│ ├── postprocess // 后处理
│ │ ├── db_postprocess.py // DB 后处理
│ │ ├── east_postprocess.py // East 后处理
│ │ ├── lanms // lanms 相关
│ │ │ ...
│ │ ├── locality_aware_nms.py // nms
│ │ └── sast_postprocess.py // sast 后处理
│ └── utils // 工具
│ ├── character.py // 字符处理,包括对文本的编码和解码,计算预测准确率
│ ├── check.py // 参数加载检查
│ ├── ic15_dict.txt // 英文数字字典,区分大小写
│ ├── ppocr_keys_v1.txt // 中文字典,用于训练中文模型
│ ├── save_load.py // 模型保存和加载函数
│ ├── stats.py // 统计
│ └── utility.py // 工具函数,包含输入参数是否合法等相关检查工具
├── README_en.md // 说明文档
├── README.md
├── requirments.txt // 安装依赖
├── setup.py // whl包打包脚本
└── tools // 启动工具
├── eval.py // 评估函数
├── eval_utils // 评估工具
│ ├── eval_cls_utils.py // 分类相关
│ ├── eval_det_iou.py // 检测 iou 相关
│ ├── eval_det_utils.py // 检测相关
│ ├── eval_rec_utils.py // 识别相关
│ └── __init__.py
├── export_model.py // 导出 infer 模型
├── infer // 基于预测引擎预测
│ ├── predict_cls.py
│ ├── predict_det.py
│ ├── predict_rec.py
│ ├── predict_system.py
│ └── utility.py
├── infer_cls.py // 基于训练引擎 预测分类
├── infer_det.py // 基于训练引擎 预测检测
├── infer_rec.py // 基于训练引擎 预测识别
├── program.py // 整体流程
├── test_hubserving.py
└── train.py // 启动训练
```
## 中文OCR训练预测技巧
这里整理了一些中文OCR训练预测技巧,持续更新中,欢迎各位小伙伴贡献OCR炼丹秘籍~
- [更换骨干网络](#更换骨干网络)
- [中文长文本识别](#中文长文本识别)
- [空格识别](#空格识别)
<a name="更换骨干网络"></a>
#### 1、更换骨干网络
- **问题描述**
目前PaddleOCR中使用的骨干网络有ResNet_vd系列和MobileNetV3系列,更换骨干网络是否有助于效果提升?更换时需要注意什么?
- **炼丹建议**
- 无论是文字检测,还是文字识别,骨干网络的选择是预测效果和预测效率的权衡。一般,选择更大规模的骨干网络,例如ResNet101_vd,则检测或识别更准确,但预测耗时相应也会增加。而选择更小规模的骨干网络,例如MobileNetV3_small_x0_35,则预测更快,但检测或识别的准确率会大打折扣。幸运的是不同骨干网络的检测或识别效果与在ImageNet数据集图像1000分类任务效果正相关。[**飞桨图像分类套件PaddleClas**](https://github.com/PaddlePaddle/PaddleClas)汇总了ResNet_vd、Res2Net、HRNet、MobileNetV3、GhostNet等23种系列的分类网络结构,在上述图像分类任务的top1识别准确率,GPU(V100和T4)和CPU(骁龙855)的预测耗时以及相应的[**117个预训练模型下载地址**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html)
- 文字检测骨干网络的替换,主要是确定类似与ResNet的4个stages,以方便集成后续的类似FPN的检测头。此外,对于文字检测问题,使用ImageNet训练的分类预训练模型,可以加速收敛和效果提升。
- 文字识别的骨干网络的替换,需要注意网络宽高stride的下降位置。由于文本识别一般宽高比例很大,因此高度下降频率少一些,宽度下降频率多一些。可以参考PaddleOCR中[MobileNetV3骨干网络](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/modeling/backbones/rec_mobilenet_v3.py)的改动。
<a name="中文长文本识别"></a>
#### 2、中文长文本识别
- **问题描述**
中文识别模型训练时分辨率最大是[3,32,320],如果待识别的文本图像太长,如下图所示,该如何适配?
<div align="center">
<img src="../tricks/long_text_examples.jpg" width="600">
</div>
- **炼丹建议**
在中文识别模型训练时,并不是采用直接将训练样本缩放到[3,32,320]进行训练,而是先等比例缩放图像,保证图像高度为32,宽度不足320的部分补0,宽高比大于10的样本直接丢弃。预测时,如果是单张图像预测,则按上述操作直接对图像缩放,不做宽度320的限制。如果是多张图预测,则采用batch方式预测,每个batch的宽度动态变换,采用这个batch中最长宽度。[参考代码如下](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/predict_rec.py)
```
def resize_norm_img(self, img, max_wh_ratio):
imgC, imgH, imgW = self.rec_image_shape
assert imgC == img.shape[2]
if self.character_type == "ch":
imgW = int((32 * max_wh_ratio))
h, w = img.shape[:2]
ratio = w / float(h)
if math.ceil(imgH * ratio) > imgW:
resized_w = imgW
else:
resized_w = int(math.ceil(imgH * ratio))
resized_image = cv2.resize(img, (resized_w, imgH))
resized_image = resized_image.astype('float32')
resized_image = resized_image.transpose((2, 0, 1)) / 255
resized_image -= 0.5
resized_image /= 0.5
padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
padding_im[:, :, 0:resized_w] = resized_image
return padding_im
```
<a name="空格识别"></a>
#### 3、空格识别
- **问题描述**
如下图所示,对于中英文混合场景,为了便于阅读和使用识别结果,往往需要将单词之间的空格识别出来,这种情况如何适配?
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/en_paper.jpg" width="600">
</div>
- **炼丹建议**
空格识别可以考虑以下两种方案:(1)优化文本检测算法。检测结果在空格处将文本断开。这种方案在检测数据标注时,需要将含有空格的文本行分成好多段。(2)优化文本识别算法。在识别字典里面引入空格字符,然后在识别的训练数据中,如果用空行,进行标注。此外,合成数据时,通过拼接训练数据,生成含有空格的文本。PaddleOCR目前采用的是第二种方案。
\ No newline at end of file
# 更新
- 2020.9.22 更新PP-OCR技术文章,https://arxiv.org/abs/2009.09941
- 2020.9.19 更新超轻量压缩ppocr_mobile_slim系列模型,整体模型3.5M(详见[PP-OCR Pipline](../../README_ch.md#PP-OCR)),适合在移动端部署使用。[模型下载](../../README_ch.md#模型下载)
- 2020.9.17 更新超轻量ppocr_mobile系列和通用ppocr_server系列中英文ocr模型,媲美商业效果。[模型下载](../../README_ch.md#模型下载)
- 2020.9.17 更新[英文识别模型](./models_list.md#english-recognition-model)[多语种识别模型](./models_list.md#english-recognition-model),已支持`德语、法语、日语、韩语`,更多语种识别模型将持续更新。
- 2020.8.26 更新OCR相关的84个常见问题及解答,具体参考[FAQ](./FAQ.md)
- 2020.8.24 支持通过whl包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md)
- 2020.8.21 更新8月18日B站直播课回放和PPT,课节2,易学易用的OCR工具大礼包,[获取地址](https://aistudio.baidu.com/aistudio/education/group/info/1519)
- 2020.8.16 开源文本检测算法[SAST](https://arxiv.org/abs/1908.05498)和文本识别算法[SRN](https://arxiv.org/abs/2003.12294)
......@@ -7,8 +12,8 @@
- 2020.7.15 完善预测部署,添加基于C++预测引擎推理、服务化部署和端侧部署方案,以及超轻量级中文OCR模型预测耗时Benchmark
- 2020.7.15 整理OCR相关数据集、常用数据标注以及合成工具
- 2020.7.9 添加支持空格的识别模型,识别效果,预测及训练方式请参考快速开始和文本识别训练相关文档
- 2020.7.9 添加数据增强、学习率衰减策略,具体参考[配置文件](./doc/doc_ch/config.md)
- 2020.6.8 添加[数据集](./doc/doc_ch/datasets.md),并保持持续更新
- 2020.7.9 添加数据增强、学习率衰减策略,具体参考[配置文件](./config.md)
- 2020.6.8 添加[数据集](./datasets.md),并保持持续更新
- 2020.6.5 支持 `attetnion` 模型导出 `inference_model`
- 2020.6.5 支持单独预测识别时,输出结果得分
- 2020.5.30 提供超轻量级中文OCR在线体验
......
# 效果展示
- [超轻量级中文OCR效果展示](#超轻量级中文OCR)
- [通用中文OCR效果展示](#通用中文OCR)
- [支持空格的中文OCR效果展示](#支持空格的中文OCR)
<a name="超轻量级中文OCR"></a>
## 超轻量级中文OCR效果展示
<a name="通用ppocr_server_1.1效果展示"></a>
## 通用ppocr_server_1.1效果展示
<div align="center">
<img src="../imgs_results/1.jpg" width="800">
<img src="../imgs_results/1101.jpg" width="800">
<img src="../imgs_results/1102.jpg" width="800">
<img src="../imgs_results/1103.jpg" width="800">
<img src="../imgs_results/1104.jpg" width="800">
<img src="../imgs_results/1105.jpg" width="800">
<img src="../imgs_results/1106.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/7.jpg" width="800">
</div>
<a name="英文识别模型效果展示"></a>
## 英文识别模型效果展示
<div align="center">
<img src="../imgs_results/12.jpg" width="800">
<img src="../imgs_results/img_12.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/4.jpg" width="800">
</div>
<a name="多语言识别模型效果展示"></a>
## 多语言识别模型效果展示
<div align="center">
<img src="../imgs_results/6.jpg" width="800">
<img src="../imgs_results/1110.jpg" width="800">
<img src="../imgs_results/1112.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/9.jpg" width="800">
</div>
<a name="超轻量ppocr_mobile_1.0效果展示"></a>
## 超轻量ppocr_mobile_1.0效果展示
<div align="center">
<img src="../imgs_results/1.jpg" width="800">
<img src="../imgs_results/7.jpg" width="800">
<img src="../imgs_results/6.jpg" width="800">
<img src="../imgs_results/16.png" width="800">
</div>
<div align="center">
<img src="../imgs_results/22.jpg" width="800">
</div>
<a name="通用中文OCR"></a>
## 通用中文OCR效果展示
<a name="通用ppocr_server_1.0效果展示"></a>
## 通用ppocr_server_1.0效果展示
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/11.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/2.jpg" width="800">
</div>
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/8.jpg" width="800">
</div>
<a name="支持空格的中文OCR"></a>
## 支持空格的中文OCR效果展示
### 轻量级模型
<div align="center">
<img src="../imgs_results/img_11.jpg" width="800">
</div>
### 通用模型
<div align="center">
<img src="../imgs_results/chinese_db_crnn_server/en_paper.jpg" width="800">
</div>
......@@ -11,12 +11,47 @@ pip install paddleocr
本地构建并安装
```bash
python setup.py bdist_wheel
pip install dist/paddleocr-0.0.3-py3-none-any.whl
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x是paddleocr的版本号
```
### 1. 代码使用
* 检测+识别全流程
* 检测+分类+识别全流程
```python
from paddleocr import PaddleOCR, draw_ocr
# Paddleocr目前支持中英文、英文、法语、德语、韩语、日语,可以通过修改lang参数进行切换
# 参数依次为`ch`, `en`, `french`, `german`, `korean`, `japan`。
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
print(line)
# 显示结果
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
结果是一个list,每个item包含了文本框,文字和识别置信度
```bash
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
......
```
结果可视化
<div align="center">
<img src="../imgs_results/whl/11_det_rec.jpg" width="800">
</div>
* 检测+识别
```python
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR() # need to run only once to download and load model into memory
......@@ -48,12 +83,27 @@ im_show.save('result.jpg')
<img src="../imgs_results/whl/11_det_rec.jpg" width="800">
</div>
* 分类+识别
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
result = ocr.ocr(img_path, det=False, cls=True)
for line in result:
print(line)
```
结果是一个list,每个item只包含识别结果和识别置信度
```bash
['韩国小馆', 0.9907421]
```
* 单独执行检测
```python
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path,rec=False)
result = ocr.ocr(img_path, rec=False)
for line in result:
print(line)
......@@ -84,7 +134,7 @@ im_show.save('result.jpg')
from paddleocr import PaddleOCR
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
result = ocr.ocr(img_path,det=False)
result = ocr.ocr(img_path, det=False)
for line in result:
print(line)
```
......@@ -93,6 +143,20 @@ for line in result:
['韩国小馆', 0.9907421]
```
* 单独执行分类
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
result = ocr.ocr(img_path, det=False, rec=False, cls=True)
for line in result:
print(line)
```
结果是一个list,每个item只包含分类结果和分类置信度
```bash
['0', 0.9999924]
```
### 通过命令行使用
查看帮助信息
......@@ -100,7 +164,19 @@ for line in result:
paddleocr -h
```
* 检测+识别全流程
* 检测+分类+识别全流程
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true --cls true
```
结果是一个list,每个item包含了文本框,文字和识别置信度
```bash
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
......
```
* 检测+识别
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg
```
......@@ -112,6 +188,16 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg
......
```
* 分类+识别
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false
```
结果是一个list,每个item只包含识别结果和识别置信度
```bash
['韩国小馆', 0.9907421]
```
* 单独执行检测
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --rec false
......@@ -134,17 +220,27 @@ paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --det false
['韩国小馆', 0.9907421]
```
* 单独执行分类
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false --rec false
```
结果是一个list,每个item只包含分类结果和分类置信度
```bash
['0', 0.9999924]
```
## 自定义模型
当内置模型无法满足需求时,需要使用到自己训练的模型。
首先,参照[inference.md](./inference.md) 第一节转换将检测和识别模型转换为inference模型,然后按照如下方式使用
首先,参照[inference.md](./inference.md) 第一节转换将检测、分类和识别模型转换为inference模型,然后按照如下方式使用
### 代码使用
```python
from paddleocr import PaddleOCR, draw_ocr
# 检测模型和识别模型路径下必须含有model和params文件
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}',rec_model_dir='{your_rec_model_dir}')
# 模型路径下必须含有model和params文件
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}', rec_model_dir='{your_rec_model_dir}', rec_char_dict_path='{your_rec_char_dict_path}', cls_model_dir='{your_cls_model_dir}', use_angle_cls=True)
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path)
result = ocr.ocr(img_path, cls=True)
for line in result:
print(line)
......@@ -162,7 +258,7 @@ im_show.save('result.jpg')
### 通过命令行使用
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir}
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true
```
## 参数说明
......@@ -182,13 +278,21 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_
| det_east_cover_thresh | EAST模型输出框的阈值,低于此值的预测框会被丢弃 | 0.1 |
| det_east_nms_thresh | EAST模型输出框NMS的阈值 | 0.2 |
| rec_algorithm | 使用的识别算法类型 | CRNN |
| rec_model_dir | 识别模型所在文件夹。传承那方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/rec`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 | None |
| rec_model_dir | 识别模型所在文件夹。传方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/rec`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 | None |
| rec_image_shape | 识别算法的输入图片尺寸 | "3,32,320" |
| rec_char_type | 识别算法的字符类型,中文(ch)英文(en) | ch |
| rec_char_type | 识别算法的字符类型,中文(ch)英文(en)、法语(french)、德语(german)、韩语(korean)、日语(japan) | ch |
| rec_batch_num | 进行识别时,同时前向的图片数 | 30 |
| max_text_length | 识别算法能识别的最大文字长度 | 25 |
| rec_char_dict_path | 识别模型字典路径,当rec_model_dir使用方式2传参时需要修改为自己的字典路径 | ./ppocr/utils/ppocr_keys_v1.txt |
| use_space_char | 是否识别空格 | TRUE |
| use_angle_cls | 是否加载分类模型 | FALSE |
| cls_model_dir | 分类模型所在文件夹。传参方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/cls`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 | None |
| cls_image_shape | 分类算法的输入图片尺寸 | "3, 48, 192" |
| label_list | 分类算法的标签列表 | ['0', '180'] |
| cls_batch_num | 进行分类时,同时前向的图片数 |30 |
| enable_mkldnn | 是否启用mkldnn | FALSE |
| use_zero_copy_run | 是否通过zero_copy_run的方式进行前向 | FALSE |
| lang | 模型语言类型,目前支持 中文(ch)和英文(en) | ch |
| det | 前向时使用启动检测 | TRUE |
| rec | 前向时是否启动识别 | TRUE |
| cls | 前向时是否启动分类 | FALSE |
<a name="Algorithm_introduction"></a>
## Algorithm introduction
This tutorial lists the text detection algorithms and text recognition algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v1.1 models list](./models_list_en.md).
- [1. Text Detection Algorithm](#TEXTDETECTIONALGORITHM)
- [2. Text Recognition Algorithm](#TEXTRECOGNITIONALGORITHM)
<a name="TEXTDETECTIONALGORITHM"></a>
### 1. Text Detection Algorithm
PaddleOCR open source text detection algorithms list:
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
- [x] DB([paper](https://arxiv.org/abs/1911.08947))
- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research)
On the ICDAR2015 dataset, the text detection result is as follows:
|Model|Backbone|precision|recall|Hmean|Download link|
|-|-|-|-|-|-|
|EAST|ResNet50_vd|88.18%|85.51%|86.82%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)|
|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|
On Total-Text dataset, the text detection result is as follows:
|Model|Backbone|precision|recall|Hmean|Download link|
|-|-|-|-|-|-|
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)|
**Note:** Additional data, like icdar2013, icdar2017, COCO-Text, ArT, was added to the model training of SAST. Download English public dataset in organized format used by PaddleOCR from [Baidu Drive](https://pan.baidu.com/s/12cPnZcVuV1zn5DOd4mqjVw) (download code: 2bpi).
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
<a name="TEXTRECOGNITIONALGORITHM"></a>
### 2. Text Recognition Algorithm
PaddleOCR open-source text recognition algorithms list:
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
- [x] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research)
Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow:
|Model|Backbone|Avg Accuracy|Module combination|Download link|
|-|-|-|-|-|
|Rosetta|Resnet34_vd|80.24%|rec_r34_vd_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_none_ctc.tar)|
|Rosetta|MobileNetV3|78.16%|rec_mv3_none_none_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_none_ctc.tar)|
|CRNN|Resnet34_vd|82.20%|rec_r34_vd_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_none_bilstm_ctc.tar)|
|CRNN|MobileNetV3|79.37%|rec_mv3_none_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar)|
|STAR-Net|Resnet34_vd|83.93%|rec_r34_vd_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)|
|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[Download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)|
**Note:** SRN model uses data expansion method to expand the two training sets mentioned above, and the expanded data can be downloaded from [Baidu Drive](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (download code: y3ry).
The average accuracy of the two-stage training in the original paper is 89.74%, and that of one stage training in paddleocr is 88.33%. Both pre-trained weights can be downloaded [here](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar).
Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
## TEXT ANGLE CLASSIFICATION
### DATA PREPARATION
Please organize the dataset as follows:
The default storage path for training data is `PaddleOCR/train_data/cls`, if you already have a dataset on your disk, just create a soft link to the dataset directory:
```
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/cls/dataset
```
please refer to the following to organize your data.
- Training set
First put the training images in the same folder (train_images), and use a txt file (cls_gt_train.txt) to store the image path and label.
* Note: by default, the image path and image label are split with `\t`, if you use other methods to split, it will cause training error
0 and 180 indicate that the angle of the image is 0 degrees and 180 degrees, respectively.
```
" Image file name Image annotation "
train_data/word_001.jpg 0
train_data/word_002.jpg 180
```
The final training set should have the following file structure:
```
|-train_data
|-cls
|- cls_gt_train.txt
|- train
|- word_001.png
|- word_002.jpg
|- word_003.jpg
| ...
```
- Test set
Similar to the training set, the test set also needs to be provided a folder
containing all images (test) and a cls_gt_test.txt. The structure of the test set is as follows:
```
|-train_data
|-cls
|- cls_gt_test.txt
|- test
|- word_001.jpg
|- word_002.jpg
|- word_003.jpg
| ...
```
### TRAINING
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts.
Start training:
```
# Set PYTHONPATH path
export PYTHONPATH=$PYTHONPATH:.
# GPU training Support single card and multi-card training, specify the card number through CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=0,1,2,3
# Training icdar15 English data
python3 tools/train.py -c configs/cls/cls_mv3.yml
```
- Data Augmentation
PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.
The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, random crop, perspective, color reverse, RandAugment.
Except for RandAugment, each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to:
[randaugment.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/cls/randaugment.py)
[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
- Training
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/cls/cls_mv3.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/cls_mv3/best_accuracy` during the evaluation process.
If the evaluation set is large, the test will be time-consuming. It is recommended to reduce the number of evaluations, or evaluate after training.
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
### EVALUATION
The evaluation data set can be modified via `configs/cls/cls_reader.yml` setting of `label_file_path` in EvalReader.
```
export CUDA_VISIBLE_DEVICES=0
# GPU evaluation, Global.checkpoints is the weight to be tested
python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```
### PREDICTION
* Training engine prediction
Using the model trained by paddleocr, you can quickly get prediction through the following script.
The default prediction picture is stored in `infer_img`, and the weight is specified via `-o Global.checkpoints`:
```
# Predict English results
python3 tools/infer_rec.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg
```
Input image:
![](../imgs_words/en/word_1.png)
Get the prediction result of the input image:
```
infer_img: doc/imgs_words/en/word_1.png
scores: [[0.93161047 0.06838956]]
label: [0]
```
# BENCHMARK
This document gives the prediction time-consuming benchmark of PaddleOCR Ultra Lightweight Chinese Model (8.6M) on each platform.
This document gives the performance of the series models for Chinese and English recognition.
## TEST DATA
* 500 images were randomly sampled from the Chinese public data set [ICDAR2017-RCTW](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#ICDAR2017-RCTW-17).
Most of the pictures in the set were collected in the wild through mobile phone cameras.
Some are screenshots.
These pictures show various scenes, including street scenes, posters, menus, indoor scenes and screenshots of mobile applications.
## MEASUREMENT
The predicted time-consuming indicators on the four platforms are as follows:
We collected 300 images for different real application scenarios to evaluate the overall OCR system, including contract samples, license plates, nameplates, train tickets, test sheets, forms, certificates, street view images, business cards, digital meter, etc. The following figure shows some images of the test set.
<div align="center">
<img src="../datasets/doc.jpg" width = "1000" height = "500" />
</div>
| Long size(px) | T4(s) | V100(s) | Intel Xeon 6148(s) | Snapdragon 855(s) |
| :---------: | :-----: | :-------: | :------------------: | :-----------------: |
| 960 | 0.092 | 0.057 | 0.319 | 0.354 |
| 640 | 0.067 | 0.045 | 0.198 | 0.236 |
| 480 | 0.057 | 0.043 | 0.151 | 0.175 |
## MEASUREMENT
Explanation:
* The evaluation time-consuming stage is the complete stage from image input to result output, including image
pre-processing and post-processing.
* ```Intel Xeon 6148``` is the server-side CPU model. Intel MKL-DNN is used in the test to accelerate the CPU prediction speed.
To use this operation, you need to:
* Update to the latest version of PaddlePaddle: https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-dev
Please select the corresponding mkl version wheel package according to the CUDA version and Python version of your environment,
for example, CUDA10, Python3.7 environment, you should:
```
# Obtain the installation package
wget https://paddle-wheel.bj.bcebos.com/0.0.0-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl
# Installation
pip3.7 install paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl
```
* Use parameters ```--enable_mkldnn True``` to turn on the acceleration switch when making predictions
* ```Snapdragon 855``` is a mobile processing platform model.
- v1.0 indicates DB+CRNN models without the strategies. v1.1 indicates the PP-OCR models with the strategies and the direction classify. slim_v1.1 indicates the PP-OCR models with prunner or quantization.
- The long size of the input for the text detector is 960.
- The evaluation time-consuming stage is the complete stage from image input to result output, including image pre-processing and post-processing.
- ```Intel Xeon 6148``` is the server-side CPU model. Intel MKL-DNN is used in the test to accelerate the CPU prediction speed.
- ```Snapdragon 855``` is a mobile processing platform model.
Compares the model size and F-score:
| Model Name | Model Size <br> of the <br> Whole System\(M\) | Model Size <br>of the Text <br> Detector\(M\) | Model Size <br> of the Direction <br> Classifier\(M\) | Model Size<br>of the Text <br> Recognizer \(M\) | F\-score |
|:-:|:-:|:-:|:-:|:-:|:-:|
| ch\_ppocr\_mobile\_v1\.1 | 8\.1 | 2\.6 | 0\.9 | 4\.6 | 0\.5193 |
| ch\_ppocr\_server\_v1\.1 | 155\.1 | 47\.2 | 0\.9 | 107 | 0\.5414 |
| ch\_ppocr\_mobile\_v1\.0 | 8\.6 | 4\.1 | \- | 4\.5 | 0\.393 |
| ch\_ppocr\_server\_v1\.0 | 203\.8 | 98\.5 | \- | 105\.3 | 0\.4436 |
Compares the time-consuming on T4 GPU (ms):
| Model Name | Overall | Text Detector | Direction Classifier | Text Recognizer |
|:-:|:-:|:-:|:-:|:-:|
| ch\_ppocr\_mobile\_v1\.1 | 137 | 35 | 24 | 78 |
| ch\_ppocr\_server\_v1\.1 | 204 | 39 | 25 | 140 |
| ch\_ppocr\_mobile\_v1\.0 | 117 | 41 | \- | 76 |
| ch\_ppocr\_server\_v1\.0 | 199 | 52 | \- | 147 |
Compares the time-consuming on CPU (ms):
| Model Name | Overall | Text Detector | Direction Classifier | Text Recognizer |
|:-:|:-:|:-:|:-:|:-:|
| ch\_ppocr\_mobile\_v1\.1 | 421 | 164 | 51 | 206 |
| ch\_ppocr\_mobile\_v1\.0 | 398 | 219 | \- | 179 |
Compares the model size, F-score, the time-consuming on SD 855 of between the slim models and the original models:
| Model Name | Model Size <br> of the <br> Whole System\(M\) | Model Size <br>of the Text <br> Detector\(M\) | Model Size <br> of the Direction <br> Classifier\(M\) | Model Size<br>of the Text <br> Recognizer \(M\) | F\-score | SD 855<br>\(ms\) |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| ch\_ppocr\_mobile\_v1\.1 | 8\.1 | 2\.6 | 0\.9 | 4\.6 | 0\.5193 | 306 |
| ch\_ppocr\_mobile\_slim\_v1\.1 | 3\.5 | 1\.4 | 0\.5 | 1\.6 | 0\.521 | 268 |
......@@ -10,7 +10,7 @@ The following list can be viewed via `--help`
## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE
Take `rec_chinese_lite_train.yml` as an example
Take `rec_chinese_lite_train_v1.1.yml` as an example
| Parameter | Use | Default | Note |
......@@ -32,6 +32,7 @@ Take `rec_chinese_lite_train.yml` as an example
| loss_type | Set loss type | ctc | Supports two types of loss: ctc / attention |
| distort | Set use distort | false | Support distort type ,read [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) |
| use_space_char | Wether to recognize space | false | Only support in character_type=ch mode |
label_list | Set the angle supported by the direction classifier | ['0','180'] | Only valid in the direction classifier |
| reader_yml | Set the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ |
| pretrain_weights | Load pre-trained model path | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption |
......
......@@ -6,7 +6,7 @@ The process of making a customized ultra-lightweight OCR models can be divided i
PaddleOCR provides two text detection algorithms: EAST and DB. Both support MobileNetV3 and ResNet50_vd backbone networks, select the corresponding configuration file as needed and start training. For example, to train with MobileNetV3 as the backbone network for DB detection model :
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
python3 tools/train.py -c configs/det/det_mv3_db.yml 2>&1 | tee det_db.log
```
For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection_en.md)
......@@ -14,7 +14,7 @@ For more details about data preparation and training tutorials, refer to the doc
PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. They all support two backbone networks: MobileNetV3 and ResNet34_vd, select the corresponding configuration files as needed to start training. For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network:
```
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml 2>&1 | tee rec_ch_lite.log
```
For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition_en.md)
......
......@@ -38,12 +38,14 @@ If you want to train PaddleOCR on other datasets, please build the annotation fi
## TRAINING
First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs.
First download the pretrained model. The detection model of PaddleOCR currently supports 3 backbones, namely MobileNetV3, ResNet18_vd and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs.
```shell
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
# Download the pre-trained model of ResNet50
# or, download the pre-trained model of ResNet18_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar
# or, download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
# decompressing the pre-training model file, take MobileNetV3 as an example
......@@ -62,7 +64,7 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model
#### START TRAINING
*If CPU version installed, please set the parameter `use_gpu` to `false` in the configuration.*
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml 2>&1 | tee train_det.log
```
In the above instruction, use `-c` to select the training to use the `configs/det/det_db_mv3.yml` configuration file.
......@@ -70,15 +72,15 @@ For a detailed explanation of the configuration file, please refer to [config](.
You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Optimizer.base_lr=0.0001
```
#### load trained model and conntinue training
#### load trained model and continue training
If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded.
For example:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./your/trained/model
```
**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.
......@@ -88,18 +90,18 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you
PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean.
Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_db_mv3.yml`
Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_db_mv3_v1.1.yml`
When evaluating, set post-processing parameters `box_thresh=0.6`, `unclip_ratio=1.5`. If you use different datasets, different models for training, these two parameters should be adjusted for better result.
```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file.
Such as:
```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and not need to be set when evaluating the EAST model.
......@@ -108,16 +110,16 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./ou
Test the detection result on a single image:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
```
When testing the DB model, adjust the post-processing threshold:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
Test the detection result on all images in the folder:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
```
......@@ -5,13 +5,14 @@ The inference model (the model saved by `fluid.io.save_inference_model`) is gene
The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification Framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html).
Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification Framework](https://github.com/PaddlePaddle/PaddleClas/blob/master/docs/zh_CN/extension/paddle_inference.md).
Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, and the concatenation of them based on inference model.
- [CONVERT TRAINING MODEL TO INFERENCE MODEL](#CONVERT)
- [Convert detection model to inference model](#Convert_detection_model)
- [Convert recognition model to inference model](#Convert_recognition_model)
- [Convert angle classification model to inference model](#Convert_angle_class_model)
- [TEXT DETECTION MODEL INFERENCE](#DETECTION_MODEL_INFERENCE)
......@@ -19,15 +20,20 @@ Next, we first introduce how to convert a trained model into an inference model,
- [2. DB TEXT DETECTION MODEL INFERENCE](#DB_DETECTION)
- [3. EAST TEXT DETECTION MODEL INFERENCE](#EAST_DETECTION)
- [4. SAST TEXT DETECTION MODEL INFERENCE](#SAST_DETECTION)
- [5. Multilingual model inference](#Multilingual model inference)
- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
- [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
- [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
- [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION)
- [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
- [4. SRN-BASED TEXT RECOGNITION MODEL INFERENCE](#SRN-BASED_RECOGNITION)
- [5. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
- [6. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE)
- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
- [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
- [TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION)
- [TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION)
- [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_CHINESE_MODEL)
- [2. OTHER MODELS](#OTHER_MODELS)
......@@ -38,8 +44,9 @@ Next, we first introduce how to convert a trained model into an inference model,
Download the lightweight Chinese detection model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_det_train.tar -C ./ch_lite/
```
The above model is a DB algorithm trained with MobileNetV3 as the backbone. To convert the trained model into an inference model, just run the following command:
```
# -c Set the training algorithm yml configuration file
......@@ -47,9 +54,9 @@ The above model is a DB algorithm trained with MobileNetV3 as the backbone. To c
# Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
# Global.save_inference_dir Set the address where the converted model will be saved.
python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy \
Global.save_inference_dir=./inference/det_db/
python3 tools/export_model.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_det_train/best_accuracy Global.save_inference_dir=./inference/det_db/
```
When converting to an inference model, the configuration file used is the same as the configuration file used during training. In addition, you also need to set the `Global.checkpoints` and `Global.save_inference_dir` parameters in the configuration file.
`Global.checkpoints` points to the model parameter file saved during training, and `Global.save_inference_dir` is the directory where the generated inference model is saved.
After the conversion is successful, there are two files in the `save_inference_dir` directory:
......@@ -64,7 +71,7 @@ inference/det_db/
Download the lightweight Chinese recognition model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar && tar xf ch_ppocr_mobile_v1.1_rec_train.tar -C ./ch_lite/
```
The recognition model is converted to the inference model in the same way as the detection, as follows:
......@@ -74,8 +81,7 @@ The recognition model is converted to the inference model in the same way as the
# Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
# Global.save_inference_dir Set the address where the converted model will be saved.
python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints=./ch_lite/rec_mv3_crnn/best_accuracy \
Global.save_inference_dir=./inference/rec_crnn/
python3 tools/export_model.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_rec_train/best_accuracy \
```
If you have a model trained on your own dataset with a different dictionary file, please make sure that you modify the `character_dict_path` in the configuration file to your dictionary file path.
......@@ -87,6 +93,33 @@ After the conversion is successful, there are two files in the directory:
└─ params Identify the parameter files of the inference model
```
<a name="Convert_angle_class_model"></a>
### Convert angle classification model to inference model
Download the angle classification model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_cls_train.tar -C ./ch_lite/
```
The angle classification model is converted to the inference model in the same way as the detection, as follows:
```
# -c Set the training algorithm yml configuration file
# -o Set optional parameters
# Global.checkpoints parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
# Global.save_inference_dir Set the address where the converted model will be saved.
python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_cls_train/best_accuracy \
Global.save_inference_dir=./inference/cls/
```
After the conversion is successful, there are two files in the directory:
```
/inference/cls/
└─ model Identify the saved model files
└─ params Identify the parameter files of the inference model
```
<a name="DETECTION_MODEL_INFERENCE"></a>
## TEXT DETECTION MODEL INFERENCE
......@@ -268,23 +301,82 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
```
<a name="SRN-BASED_RECOGNITION"></a>
### 4. SRN-BASED TEXT RECOGNITION MODEL INFERENCE
The recognition model based on SRN requires additional setting of the recognition algorithm parameter --rec_algorithm="SRN".
At the same time, it is necessary to ensure that the predicted shape is consistent with the training, such as: --rec_image_shape="1, 64, 256"
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" \
--rec_model_dir="./inference/srn/" \
--rec_image_shape="1, 64, 256" \
--rec_char_type="en" \
--rec_algorithm="SRN"
```
<a name="USING_CUSTOM_CHARACTERS"></a>
### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
### 5. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
If the chars dictionary is modified during training, you need to specify the new dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict.
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path"
```
<a name="MULTILINGUAL_MODEL_INFERENCE"></a>
### 6. MULTILINGAUL MODEL INFERENCE
If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results,
You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/` path, such as Korean recognition:
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/korean.ttf"
```
![](../imgs_words/korean/1.jpg)
After executing the command, the prediction result of the above figure is:
``` text
2020-09-19 16:15:05,076-INFO: index: [205 206 38 39]
2020-09-19 16:15:05,077-INFO: word : 바탕으로
2020-09-19 16:15:05,077-INFO: score: 0.9171358942985535
```
<a name="ANGLE_CLASSIFICATION_MODEL_INFERENCE"></a>
## ANGLE CLASSIFICATION MODEL INFERENCE
The following will introduce the angle classification model inference.
<a name="ANGLE_CLASS_MODEL_INFERENCE"></a>
### 1.ANGLE CLASSIFICATION MODEL INFERENCE
For angle classification model inference, you can execute the following commands:
```
python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --cls_model_dir="./inference/cls/"
```
![](../imgs_words/ch/word_4.jpg)
After executing the command, the prediction results (classification angle and score) of the above image will be printed on the screen.
Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999963]
<a name="CONCATENATION"></a>
## TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION
## TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION
<a name="LIGHTWEIGHT_CHINESE_MODEL"></a>
### 1. LIGHTWEIGHT CHINESE MODEL
When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visualized recognition results are saved to the `./inference_results` folder by default.
When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, the parameter `cls_model_dir` specifies the path to angle classification inference model and the parameter `rec_model_dir` specifies the path to identify the inference model. The parameter `use_angle_cls` is used to control whether to enable the angle classification model.The visualized recognition results are saved to the `./inference_results` folder by default.
```
# use direction classifier
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true
# not use use direction classifier
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/"
```
......
......@@ -3,11 +3,11 @@
After testing, paddleocr can run on glibc 2.23. You can also test other glibc versions or install glic 2.23 for the best compatibility.
PaddleOCR working environment:
- PaddlePaddle1.7
- PaddlePaddle1.8+, Recommend PaddlePaddle 2.0.0.beta
- python3.7
- glibc 2.23
It is recommended to use the docker provided by us to run PaddleOCR, please refer to the use of docker [link](https://docs.docker.com/get-started/).
It is recommended to use the docker provided by us to run PaddleOCR, please refer to the use of docker [link](https://www.runoob.com/docker/docker-tutorial.html/).
*If you want to directly run the prediction code on mac or windows, you can start from step 2.*
......@@ -49,18 +49,15 @@ docker images
hub.baidubce.com/paddlepaddle/paddle latest-gpu-cuda9.0-cudnn7-dev f56310dcc829
```
**2. Install PaddlePaddle Fluid v1.7 (the higher version is not supported yet, the adaptation work is in progress)**
**2. Install PaddlePaddle Fluid v2.0**
```
pip3 install --upgrade pip
# If you have cuda9 installed on your machine, please run the following command to install
python3 -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple
# If you have cuda10 installed on your machine, please run the following command to install
python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
# If you have cuda9 or cuda10 installed on your machine, please run the following command to install
python3 -m pip install paddlepaddle-gpu==2.0.0b0 -i https://mirror.baidu.com/pypi/simple
# If you only have cpu on your machine, please run the following command to install
python3 -m pip install paddlepaddle==1.7.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
python3 -m pip install paddlepaddle==2.0.0b0 -i https://mirror.baidu.com/pypi/simple
```
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
......
## OCR model list(V1.1, updated on 9.22)
- [1. Text Detection Model](#Detection)
- [2. Text Recognition Model](#Recognition)
- [Chinese Recognition Model](#Chinese)
- [English Recognition Model](#English)
- [Multilingual Recognition Model](#Multilingual)
- [3. Text Angle Classification Model](#Angle)
The downloadable models provided by PaddleOCR include `inference model`, `trained model`, `pre-trained model` and `slim model`. The differences between the models are as follows:
|model type|model format|description|
|-|-|-|
|inference model|model、params|Used for reasoning based on Python prediction engine. [detail](./inference_en.md)|
|trained model / pre-trained model|\*.pdmodel、\*.pdopt、\*.pdparams|The checkpoints model saved in the training process, which stores the parameters of the model, mostly used for model evaluation and continuous training.|
|slim model|\*.nb|Generally used for Lite deployment|
<a name="Detection"></a>
### 1. Text Detection Model
|model name|description|config|model size|download|
|-|-|-|-|-|
|ch_ppocr_mobile_slim_v1.1_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[det_mv3_db_v1.1.yml](../../configs/det/det_mv3_db_v1.1.yml)|1.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_det_prune_opt.nb)|
|ch_ppocr_mobile_v1.1_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[det_mv3_db_v1.1.yml](../../configs/det/det_mv3_db_v1.1.yml)|2.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|
|ch_ppocr_server_v1.1_det|General model, which is larger than the lightweight model, but achieved better performance|[det_r18_vd_db_v1.1.yml](../../configs/det/det_r18_vd_db_v1.1.yml)|47.2M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar)|
<a name="Recognition"></a>
### 2. Text Recognition Model
<a name="Chinese"></a>
#### Chinese Recognition Model
|model name|description|config|model size|download|
|-|-|-|-|-|
|ch_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml)|1.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) |
|ch_ppocr_mobile_v1.1_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml)|4.6M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) |
|ch_ppocr_server_v1.1_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml)|105M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) |
**Note:** The `trained model` is finetuned on the `pre-trained model` with real data and synthsized vertical text data, which achieved better performance in real scene. The `pre-trained model` is directly trained on the full amount of real data and synthsized data, which is more suitable for finetune on your own dataset.
<a name="English"></a>
#### English Recognition Model
|model name|description|config|model size|download|
|-|-|-|-|-|
|en_ppocr_mobile_slim_v1.1_rec|Slim pruned and quantized lightweight model, supporting English and number recognition|[rec_en_lite_train.yml](../../configs/rec/multi_languages/rec_en_lite_train.yml)|0.9M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/en/en_ppocr_mobile_v1.1_rec_quant_opt.nb) |
|en_ppocr_mobile_v1.1_rec|Original lightweight model, supporting English and number recognition|[rec_en_lite_train.yml](../../configs/rec/multi_languages/rec_en_lite_train.yml)|2.0M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_train.tar) |
<a name="Multilingual"></a>
#### Multilingual Recognition Model(Updating...)
|model name|description|config|model size|download|
|-|-|-|-|-|
| french_ppocr_mobile_v1.1_rec |Lightweight model for French recognition|[rec_french_lite_train.yml](../../configs/rec/multi_languages/rec_french_lite_train.yml)|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_train.tar) |
| german_ppocr_mobile_v1.1_rec |German model for French recognition|[rec_ger_lite_train.yml](../../configs/rec/multi_languages/rec_ger_lite_train.yml)|2.1M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_train.tar) |
| korean_ppocr_mobile_v1.1_rec |Lightweight model for Korean recognition|[rec_korean_lite_train.yml](../../configs/rec/multi_languages/rec_korean_lite_train.yml)|3.4M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_train.tar) |
| japan_ppocr_mobile_v1.1_rec |Lightweight model for Japanese recognition|[rec_japan_lite_train.yml](../../configs/rec/multi_languages/rec_japan_lite_train.yml)|3.7M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_train.tar) |
<a name="Angle"></a>
### 3. Text Angle Classification Model
|model name|description|config|model size|download|
|-|-|-|-|-|
|ch_ppocr_mobile_v1.1_cls_quant|Slim quantized model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|0.5M|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_train.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_cls_quant_opt.nb) |
|ch_ppocr_mobile_v1.1_cls|Original model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|850kb|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |
## OCR model list(V1.0, updated on 7.16)
|model name|description|detection model|recognition model|recognition model supporting space recognition|
|-|-|-|-|-|
|chinese_db_crnn_mobile|8.6M lightweight OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar) | [inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar) |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|chinese_db_crnn_server|General OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar) | [inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar) |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
......@@ -5,16 +5,19 @@
Please refer to [quick installation](./installation_en.md) to configure the PaddleOCR operating environment.
*Note: Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md)。*
* Note: Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md).
## 2.inference models
| Name | Introduction | Detection model | Recognition model | Recognition model with space support |
|-|-|-|-|-|
|chinese_db_crnn_mobile| Ultra-lightweight Chinese OCR model |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)
|chinese_db_crnn_server| Universal Chinese OCR model |[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)
The detection and recognition models on the mobile and server sides are as follows. For more models (including multiple languages), please refer to [PP-OCR v1.1 series model list](../doc_ch/models_list.md)
* If wget is not installed in the windows environment, you can copy the link to the browser to download when downloading the model, and uncompress it and place it in the corresponding directory.
| Model introduction | Model name | Recommended scene | Detection model | Direction Classifier | Recognition model |
| ------------ | --------------- | ----------------|---- | ---------- | -------- |
| Ultra-lightweight Chinese OCR model(8.1M) | ch_ppocr_mobile_v1.1_xx |Mobile-side/Server-side|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) |
| Universal Chinese OCR model(155.1M) |ch_ppocr_server_v1.1_xx|Server-side |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) |
* If `wget` is not installed in the windows environment, you can copy the link to the browser to download when downloading the model, then uncompress it and place it in the corresponding directory.
Copy the download address of the `inference model` for detection and recognition in the table above, and uncompress them.
......@@ -24,6 +27,8 @@ mkdir inference && cd inference
wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package}
# Download the recognition model and unzip
wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package}
# Download the direction classifier model and unzip
wget {url/of/classification/inference_model} && tar xf {name/of/classification/inference_model/package}
cd ..
```
......@@ -32,9 +37,11 @@ Take the ultra-lightweight model as an example:
```
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar && tar xf ch_ppocr_mobile_v1.1_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar && tar xf ch_ppocr_mobile_v1.1_rec_infer.tar
# Download the direction classifier model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar && tar xf ch_ppocr_mobile_v1.1_cls_infer.tar
cd ..
```
......@@ -42,10 +49,13 @@ After decompression, the file structure should be as follows:
```
|-inference
|-ch_rec_mv3_crnn
|-ch_ppocr_mobile_v1.1_det_infer
|- model
|- params
|-ch_ppocr_mobile_v1.1_rec_infer
|- model
|- params
|-ch_det_mv3_db
|-ch_ppocr_mobile_v1.1_cls_infer
|- model
|- params
...
......@@ -53,20 +63,20 @@ After decompression, the file structure should be as follows:
## 3. Single image or image set prediction
* The following code implements text detection and recognition process. When performing prediction, you need to specify the path of a single image or image set through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visual results are saved to the `./inference_results` folder by default.
* The following code implements text detection and recognition process. When performing prediction, you need to specify the path of a single image or image set through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, the parameter `rec_model_dir` specifies the path to identify the inference model, the parameter `use_angle_cls` specifies whether to use the direction classifier, the parameter `cls_model_dir` specifies the path to identify the direction classifier model, the parameter `use_space_char` specifies whether to predict the space char. The visual results are saved to the `./inference_results` folder by default.
```bash
# Predict a single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True
# Predict imageset specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True
# If you want to use the CPU for prediction, you need to set the use_gpu parameter to False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True --use_gpu=False
```
- Universal Chinese OCR model
......@@ -75,25 +85,18 @@ Please follow the above steps to download the corresponding models and update th
```
# Predict a single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_server_v1.1_det_infer/" --rec_model_dir="./inference/ch_ppocr_server_v1.1_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/" --use_angle_cls=True --use_space_char=True
```
- Universal Chinese OCR model with the support of space
Please follow the above steps to download the corresponding models and update the relevant parameters, The example is as follows.
* Note
- If you want to use the recognition model which does not support space char recognition, please update the source code to the latest version and add parameters `--use_space_char=False`.
- If you do not want to use direction classifier, please update the source code to the latest version and add parameters `--use_angle_cls=False`.
* Note: Please update the source code to the latest version and add parameters `--use_space_char=True`
```
# Predict a single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/" --use_space_char=True
```
For more text detection and recognition tandem reasoning, please refer to the document tutorial
: [Inference with Python inference engine](./inference_en.md)
In addition, the tutorial also provides other deployment methods for the Chinese OCR model:
- [Server-side C++ inference](../../deploy/cpp_infer/readme_en.md)
- [Service deployment](./serving_en.md)
- [Service deployment](../../deploy/pdserving/readme_en.md)
- [End-to-end deployment](../../deploy/lite/readme_en.md)
## TEXT RECOGNITION
- [DATA PREPARATION](#DATA_PREPARATION)
- [Dataset Download](#Dataset_download)
- [Costom Dataset](#Costom_Dataset)
- [Dictionary](#Dictionary)
- [Add Space Category](#Add_space_category)
- [TRAINING](#TRAINING)
- [Data Augmentation](#Data_Augmentation)
- [Training](#Training)
- [Multi-language](#Multi_language)
- [EVALUATION](#EVALUATION)
- [PREDICTION](#PREDICTION)
- [Training engine prediction](#Training_engine_prediction)
<a name="DATA_PREPARATION"></a>
### DATA PREPARATION
......@@ -13,13 +30,14 @@ The default storage path for training data is `PaddleOCR/train_data`, if you alr
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
```
<a name="Dataset_download"></a>
* Dataset download
If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here),download the lmdb format dataset required for benchmark
If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
<a name="Costom_Dataset"></a>
* Use your own dataset:
If you want to use your own data for training, please refer to the following to organize your data.
......@@ -72,7 +90,7 @@ Similar to the training set, the test set also needs to be provided a folder con
|- word_003.jpg
| ...
```
<a name="Dictionary"></a>
- Dictionary
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
......@@ -92,9 +110,21 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a
`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.
`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters.
`ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters
`ppocr/utils/dict/french_dict.txt` is a French dictionary with 118 characters
`ppocr/utils/dict/japan_dict.txt` is a French dictionary with 4399 characters
`ppocr/utils/dict/korean_dict.txt` is a French dictionary with 3636 characters
`ppocr/utils/dict/german_dict.txt` is a French dictionary with 131 characters
You can use it on demand.
The current multi-language model is still in the demo stage and will continue to optimize the model and add languages. **You are very welcome to provide us with dictionaries and fonts in other languages**,
If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) or corpus file to [corpus](../../ppocr/utils/corpus) and we will thank you in the Repo.
You can use them if needed.
To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.
......@@ -102,12 +132,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
If you need to customize dic file, please add character_dict_path field in configs/rec/rec_icdar15_train.yml to point to your dictionary path. And set character_type to ch.
<a name="Add_space_category"></a>
- Add space category
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `true`.
**Note: use_space_char only takes effect when character_type=ch**
<a name="TRAINING"></a>
### TRAINING
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
......@@ -126,14 +158,12 @@ tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar
Start training:
```
# Set PYTHONPATH path
export PYTHONPATH=$PYTHONPATH:.
# GPU training Support single card and multi-card training, specify the card number through CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=0,1,2,3
# Training icdar15 English data
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml
# Training icdar15 English data and saving the log as train_rec.log
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml 2>&1 | tee train_rec.log
```
<a name="Data_Augmentation"></a>
- Data Augmentation
PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.
......@@ -142,7 +172,7 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand
Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
<a name="Training"></a>
- Training
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process.
......@@ -154,7 +184,10 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
| Configuration file | Algorithm | backbone | trans | seq | pred |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: |
| [rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml) | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
| [rec_chinese_common_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_common_train_v1.1.yml) | CRNN | ResNet34_vd | None | BiLSTM | ctc |
| rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc |
| rec_chinese_common_train.yml | CRNN | ResNet34_vd | None | BiLSTM | ctc |
| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc |
| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc |
......@@ -165,7 +198,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
| rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention |
| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
For training Chinese data, it is recommended to use `rec_chinese_lite_train.yml`. If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
For training Chinese data, it is recommended to use
训练中文数据,推荐使用[rec_chinese_lite_train_v1.1.yml](../../configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
co
Take `rec_mv3_none_none_ctc.yml` as an example:
```
......@@ -201,8 +235,43 @@ Optimizer:
```
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
<a name="Multi_language"></a>
- Multi-language
PaddleOCR also provides multi-language. The configuration file in `configs/rec/multi_languages` provides multi-language configuration files. Currently, the multi-language algorithms supported by PaddleOCR are:
| Configuration file | Algorithm name | backbone | trans | seq | pred | language |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: |
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English |
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French |
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German |
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |
The multi-language model training method is the same as the Chinese model. The training data set is 100w synthetic data. A small amount of fonts and test data can be downloaded on [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA),Extraction code:frgi.
If you want to finetune on the basis of the existing model effect, please refer to the following instructions to modify the configuration file:
Take `rec_french_lite_train` as an example:
```
Global:
...
# Add a custom dictionary, if you modify the dictionary
# please point the path to the new dictionary
character_dict_path: ./ppocr/utils/dict/french_dict.txt
# Add data augmentation during training
distort: true
# Identify spaces
use_space_char: true
...
# Modify reader type
reader_yml: ./configs/rec/multi_languages/rec_french_reader.yml
...
...
```
<a name="EVALUATION"></a>
### EVALUATION
The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml` setting of `label_file_path` in EvalReader.
......@@ -210,11 +279,13 @@ The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml`
```
export CUDA_VISIBLE_DEVICES=0
# GPU evaluation, Global.checkpoints is the weight to be tested
python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
python3 tools/eval.py -c configs/rec/rec_icdar15_reader.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```
<a name="PREDICTION"></a>
### PREDICTION
<a name="Training_engine_prediction"></a>
* Training engine prediction
Using the model trained by paddleocr, you can quickly get prediction through the following script.
......@@ -223,7 +294,7 @@ The default prediction picture is stored in `infer_img`, and the weight is speci
```
# Predict English results
python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg
python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg
```
Input image:
......@@ -238,11 +309,11 @@ infer_img: doc/imgs_words/en/word_1.png
word : joint
```
The configuration file used for prediction must be consistent with the training. For example, you completed the training of the Chinese model with `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml`, you can use the following command to predict the Chinese model:
The configuration file used for prediction must be consistent with the training. For example, you completed the training of the Chinese model with `python3 tools/train.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml`, you can use the following command to predict the Chinese model:
```
# Predict Chinese results
python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg
python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg
```
Input image:
......
# Service deployment
PaddleOCR provides 2 service deployment methods::
- Based on **HubServing**:Has been integrated into PaddleOCR ([code](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/hubserving)). Please follow this tutorial.
- Based on **PaddleServing**:See PaddleServing official website for details ([demo](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr)). Follow-up will also be integrated into PaddleOCR.
The service deployment directory includes three service packages: detection, recognition, and two-stage series connection. Select the corresponding service package to install and start service according to your needs. The directory is as follows:
```
deploy/hubserving/
└─ ocr_det detection module service package
└─ ocr_rec recognition module service package
└─ ocr_system two-stage series connection service package
```
Each service pack contains 3 files. Take the 2-stage series connection service package as an example, the directory is as follows:
```
deploy/hubserving/ocr_system/
└─ __init__.py Empty file, required
└─ config.json Configuration file, optional, passed in as a parameter when using configuration to start the service
└─ module.py Main module file, required, contains the complete logic of the service
└─ params.py Parameter file, required, including parameters such as model path, pre- and post-processing parameters
```
## Quick start service
The following steps take the 2-stage series service as an example. If only the detection service or recognition service is needed, replace the corresponding file path.
### 1. Prepare the environment
```shell
# Install paddlehub
pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
# Set environment variables on Linux
export PYTHONPATH=.
# Set environment variables on Windows
SET PYTHONPATH=.
```
### 2. Install Service Module
PaddleOCR provides 3 kinds of service modules, install the required modules according to your needs.
* On Linux platform, the examples are as follows.
```shell
# Install the detection service module:
hub install deploy/hubserving/ocr_det/
# Or, install the recognition service module:
hub install deploy/hubserving/ocr_rec/
# Or, install the 2-stage series service module:
hub install deploy/hubserving/ocr_system/
```
* On Windows platform, the examples are as follows.
```shell
# Install the detection service module:
hub install deploy\hubserving\ocr_det\
# Or, install the recognition service module:
hub install deploy\hubserving\ocr_rec\
# Or, install the 2-stage series service module:
hub install deploy\hubserving\ocr_system\
```
### 3. Start service
#### Way 1. Start with command line parameters (CPU only)
**start command:**
```shell
$ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
--port XXXX \
--use_multiprocess \
--workers \
```
**parameters:**
|parameters|usage|
|-|-|
|--modules/-m|PaddleHub Serving pre-installed model, listed in the form of multiple Module==Version key-value pairs<br>*`When Version is not specified, the latest version is selected by default`*|
|--port/-p|Service port, default is 8866|
|--use_multiprocess|Enable concurrent mode, the default is single-process mode, this mode is recommended for multi-core CPU machines<br>*`Windows operating system only supports single-process mode`*|
|--workers|The number of concurrent tasks specified in concurrent mode, the default is `2*cpu_count-1`, where `cpu_count` is the number of CPU cores|
For example, start the 2-stage series service:
```shell
hub serving start -m ocr_system
```
This completes the deployment of a service API, using the default port number 8866.
#### Way 2. Start with configuration file(CPU、GPU)
**start command:**
```shell
hub serving start --config/-c config.json
```
Wherein, the format of `config.json` is as follows:
```python
{
"modules_info": {
"ocr_system": {
"init_args": {
"version": "1.0.0",
"use_gpu": true
},
"predict_args": {
}
}
},
"port": 8868,
"use_multiprocess": false,
"workers": 2
}
```
- The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. Among them, **when `use_gpu` is `true`, it means that the GPU is used to start the service**.
- The configurable parameters in `predict_args` are consistent with the `predict` function interface in `module.py`.
**Note:**
- When using the configuration file to start the service, other parameters will be ignored.
- If you use GPU prediction (that is, `use_gpu` is set to `true`), you need to set the environment variable CUDA_VISIBLE_DEVICES before starting the service, such as: ```export CUDA_VISIBLE_DEVICES=0```, otherwise you do not need to set it.
- **`use_gpu` and `use_multiprocess` cannot be `true` at the same time.**
For example, use GPU card No. 3 to start the 2-stage series service:
```shell
export CUDA_VISIBLE_DEVICES=3
hub serving start -c deploy/hubserving/ocr_system/config.json
```
## Send prediction requests
After the service starts, you can use the following command to send a prediction request to obtain the prediction result:
```shell
python tools/test_hubserving.py server_url image_path
```
Two parameters need to be passed to the script:
- **server_url**:service address,format of which is
`http://[ip_address]:[port]/predict/[module_name]`
For example, if the detection, recognition and 2-stage serial services are started with provided configuration files, the respective `server_url` would be:
`http://127.0.0.1:8866/predict/ocr_det`
`http://127.0.0.1:8867/predict/ocr_rec`
`http://127.0.0.1:8868/predict/ocr_system`
- **image_path**:Test image path, can be a single image path or an image directory path
**Eg.**
```shell
python tools/test_hubserving.py http://127.0.0.1:8868/predict/ocr_system ./doc/imgs/
```
## Returned result format
The returned result is a list. Each item in the list is a dict. The dict may contain three fields. The information is as follows:
|field name|data type|description|
|-|-|-|
|text|str|text content|
|confidence|float|text recognition confidence|
|text_region|list|text location coordinates|
The fields returned by different modules are different. For example, the results returned by the text recognition service module do not contain `text_region`. The details are as follows:
|field name/module name|ocr_det|ocr_rec|ocr_system|
|-|-|-|-|
|text||✔|✔|
|confidence||✔|✔|
|text_region|✔||✔|
**Note:** If you need to add, delete or modify the returned fields, you can modify the file `module.py` of the corresponding module. For the complete process, refer to the user-defined modification service module in the next section.
## User defined service module modification
If you need to modify the service logic, the following steps are generally required (take the modification of `ocr_system` for example):
- 1. Stop service
```shell
hub serving stop --port/-p XXXX
```
- 2. Modify the code in the corresponding files, like `module.py` and `params.py`, according to the actual needs.
For example, if you need to replace the model used by the deployed service, you need to modify model path parameters `det_model_dir` and `rec_model_dir` in `params.py`. Of course, other related parameters may need to be modified at the same time. Please modify and debug according to the actual situation. It is suggested to run `module.py` directly for debugging after modification before starting the service test.
- 3. Uninstall old service module
```shell
hub uninstall ocr_system
```
- 4. Install modified service module
```shell
hub install deploy/hubserving/ocr_system/
```
- 5. Restart service
```shell
hub serving start -m ocr_system
```
# Overall directory structure
The overall directory structure of PaddleOCR is introduced as follows:
```
PaddleOCR
├── configs // configuration file, you can select model structure and modify hyperparameters through yml file
│ ├── cls // Related configuration files of direction classifier
│ │ ├── cls_mv3.yml // training configuration related, including backbone network, head, loss, optimizer
│ │ └── cls_reader.yml // Data reading related, data reading method, data storage path
│ ├── det // Detection related configuration files
│ │ ├── det_db_icdar15_reader.yml // data read
│ │ ├── det_mv3_db.yml // training configuration
│ │ ...
│ └── rec // Identify related configuration files
│ ├── rec_benchmark_reader.yml // LMDB format data reading related
│ ├── rec_chinese_common_train.yml // General Chinese training configuration
│ ├── rec_icdar15_reader.yml // simple data reading related, including data reading function, data path, label file
│ ...
├── deploy // deployment related
│ ├── android_demo // android_demo
│ │ ...
│ ├── cpp_infer // C++ infer
│ │ ├── CMakeLists.txt // Cmake file
│ │ ├── docs // documentation
│ │ │ └── windows_vs2019_build.md
│ │ ├── include
│ │ │ ├── clipper.h // clipper library
│ │ │ ├── config.h // infer configuration
│ │ │ ├── ocr_cls.h // direction classifier
│ │ │ ├── ocr_det.h // text detection
│ │ │ ├── ocr_rec.h // text recognition
│ │ │ ├── postprocess_op.h // postprocess after detection
│ │ │ ├── preprocess_op.h // preprocess detection
│ │ │ └── utility.h // tools
│ │ ├── readme.md // documentation
│ │ ├── ...
│ │ ├── src // source file
│ │ │ ├── clipper.cpp
│ │ │ ├── config.cpp
│ │ │ ├── main.cpp
│ │ │ ├── ocr_cls.cpp
│ │ │ ├── ocr_det.cpp
│ │ │ ├── ocr_rec.cpp
│ │ │ ├── postprocess_op.cpp
│ │ │ ├── preprocess_op.cpp
│ │ │ └── utility.cpp
│ │ └── tools // compile and execute script
│ │ ├── build.sh // compile script
│ │ ├── config.txt // configuration file
│ │ └── run.sh // Test startup script
│ ├── docker
│ │ └── hubserving
│ │ ├── cpu
│ │ │ └── Dockerfile
│ │ ├── gpu
│ │ │ └── Dockerfile
│ │ ├── README_cn.md
│ │ ├── README.md
│ │ └── sample_request.txt
│ ├── hubserving // hubserving
│ │ ├── ocr_det // text detection
│ │ │ ├── config.json // serving configuration
│ │ │ ├── __init__.py
│ │ │ ├── module.py // prediction model
│ │ │ └── params.py // prediction parameters
│ │ ├── ocr_rec // text recognition
│ │ │ ├── config.json
│ │ │ ├── __init__.py
│ │ │ ├── module.py
│ │ │ └── params.py
│ │ └── ocr_system // system forecast
│ │ ├── config.json
│ │ ├── __init__.py
│ │ ├── module.py
│ │ └── params.py
│ ├── imgs // prediction picture
│ │ ├── cpp_infer_pred_12.png
│ │ └── demo.png
│ ├── ios_demo // ios demo
│ │ ...
│ ├── lite // lite deployment
│ │ ├── cls_process.cc // direction classifier data processing
│ │ ├── cls_process.h
│ │ ├── config.txt // check configuration parameters
│ │ ├── crnn_process.cc // crnn data processing
│ │ ├── crnn_process.h
│ │ ├── db_post_process.cc // db data processing
│ │ ├── db_post_process.h
│ │ ├── Makefile // compile file
│ │ ├── ocr_db_crnn.cc // series prediction
│ │ ├── prepare.sh // data preparation
│ │ ├── readme.md // documentation
│ │ ...
│ ├── pdserving // pdserving deployment
│ │ ├── det_local_server.py // fast detection version, easy deployment and fast prediction
│ │ ├── det_web_server.py // Full version of detection, high stability and distributed deployment
│ │ ├── ocr_local_server.py // detection + identification quick version
│ │ ├── ocr_web_client.py // client
│ │ ├── ocr_web_server.py // detection + identification full version
│ │ ├── readme.md // documentation
│ │ ├── rec_local_server.py // recognize quick version
│ │ └── rec_web_server.py // Identify the full version
│ └── slim
│ └── quantization // quantization related
│ ├── export_model.py // export model
│ ├── quant.py // quantization
│ └── README.md // Documentation
├── doc // Documentation tutorial
│ ...
├── paddleocr.py
├── ppocr // network core code
│ ├── data // data processing
│ │ ├── cls // direction classifier
│ │ │ ├── dataset_traversal.py // Data transmission, define data reader, read data and form batch
│ │ │ └── randaugment.py // Random data augmentation operation
│ │ ├── det // detection
│ │ │ ├── data_augment.py // data augmentation operation
│ │ │ ├── dataset_traversal.py // Data transmission, define data reader, read data and form batch
│ │ │ ├── db_process.py // db data processing
│ │ │ ├── east_process.py // east data processing
│ │ │ ├── make_border_map.py // Generate boundary map
│ │ │ ├── make_shrink_map.py // Generate shrink map
│ │ │ ├── random_crop_data.py // random crop
│ │ │ └── sast_process.py // sast data processing
│ │ ├── reader_main.py // main function of data reader
│ │ └── rec // recognation
│ │ ├── dataset_traversal.py // Data transmission, define data reader, including LMDB_Reader and Simple_Reader
│ │ └── img_tools.py // Data processing related, including data normalization and disturbance
│ ├── __init__.py
│ ├── modeling // networking related
│ │ ├── architectures // Model architecture, which defines the various modules required by the model
│ │ │ ├── cls_model.py // direction classifier
│ │ │ ├── det_model.py // detection
│ │ │ └── rec_model.py // recognition
│ │ ├── backbones // backbone network
│ │ │ ├── det_mobilenet_v3.py // detect mobilenet_v3
│ │ │ ├── det_resnet_vd.py
│ │ │ ├── det_resnet_vd_sast.py
│ │ │ ├── rec_mobilenet_v3.py // recognize mobilenet_v3
│ │ │ ├── rec_resnet_fpn.py
│ │ │ └── rec_resnet_vd.py
│ │ ├── common_functions.py // common functions
│ │ ├── heads
│ │ │ ├── cls_head.py // class header
│ │ │ ├── det_db_head.py // db detection head
│ │ │ ├── det_east_head.py // east detection head
│ │ │ ├── det_sast_head.py // sast detection head
│ │ │ ├── rec_attention_head.py // recognition attention
│ │ │ ├── rec_ctc_head.py // recognition ctc
│ │ │ ├── rec_seq_encoder.py // recognition sequence code
│ │ │ ├── rec_srn_all_head.py // srn related
│ │ │ └── self_attention // srn attention
│ │ │ └── model.py
│ │ ├── losses // loss function
│ │ │ ├── cls_loss.py // Directional classifier loss function
│ │ │ ├── det_basic_loss.py // detect basic loss
│ │ │ ├── det_db_loss.py // DB loss
│ │ │ ├── det_east_loss.py // EAST loss
│ │ │ ├── det_sast_loss.py // SAST loss
│ │ │ ├── rec_attention_loss.py // attention loss
│ │ │ ├── rec_ctc_loss.py // ctc loss
│ │ │ └── rec_srn_loss.py // srn loss
│ │ └── stns // Spatial transformation network
│ │ └── tps.py // TPS conversion
│ ├── optimizer.py // optimizer
│ ├── postprocess // post-processing
│ │ ├── db_postprocess.py // DB postprocess
│ │ ├── east_postprocess.py // East postprocess
│ │ ├── lanms // lanms related
│ │ │ ...
│ │ ├── locality_aware_nms.py // nms
│ │ └── sast_postprocess.py // sast post-processing
│ └── utils // tools
│ ├── character.py // Character processing, including text encoding and decoding, and calculation of prediction accuracy
│ ├── check.py // parameter loading check
│ ├── ic15_dict.txt // English number dictionary, case sensitive
│ ├── ppocr_keys_v1.txt // Chinese dictionary, used to train Chinese models
│ ├── save_load.py // model save and load function
│ ├── stats.py // Statistics
│ └── utility.py // Tool functions, including related check tools such as whether the input parameters are legal
├── README_en.md // documentation
├── README.md
├── requirments.txt // installation dependencies
├── setup.py // whl package packaging script
└── tools // start tool
├── eval.py // evaluation function
├── eval_utils // evaluation tools
│ ├── eval_cls_utils.py // category related
│ ├── eval_det_iou.py // detect iou related
│ ├── eval_det_utils.py // detection related
│ ├── eval_rec_utils.py // recognition related
│ └── __init__.py
├── export_model.py // export infer model
├── infer // Forecast based on prediction engine
│ ├── predict_cls.py
│ ├── predict_det.py
│ ├── predict_rec.py
│ ├── predict_system.py
│ └── utility.py
├── infer_cls.py // Predict classification based on training engine
├── infer_det.py // Predictive detection based on training engine
├── infer_rec.py // Predictive recognition based on training engine
├── program.py // overall process
├── test_hubserving.py
└── train.py // start training
```
# RECENT UPDATES
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipline](../../README.md#PP-OCR-Pipline)), suitable for mobile deployment. [Model Downloads](../../README.md#Supported-Chinese-model-list)
- 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English ocr models, which are comparable to commercial effects. [Model Downloads](../../README.md#Supported-Chinese-model-list)
- 2020.9.17 update [English recognition model](./models_list_en.md#english-recognition-model) and [Multilingual recognition model](./models_list_en.md#english-recognition-model), `German`, `French`, `Japanese` and `Korean` have been supported. Models for more languages will continue to be updated.
- 2020.8.24 Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md)
- 2020.8.16 Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
- 2020.7.23, Release the playback and PPT of live class on BiliBili station, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
......@@ -7,7 +11,7 @@
- 2020.7.15, Add several related datasets, data annotation and synthesis tools.
- 2020.7.9 Add a new model to support recognize the character "space".
- 2020.7.9 Add the data augument and learning rate decay strategies during training.
- 2020.6.8 Add [datasets](./doc/doc_en/datasets_en.md) and keep updating
- 2020.6.8 Add [datasets](./datasets_en.md) and keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- 2020.6.5 Support exporting `attention` model to `inference_model`
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment