Merge pull request #5 from PaddlePaddle/develop

merge paddleocr

ee05c913 · zhoujun · GitHub · 7c09c97d · 2bdaea56 · ee05c913
Unverified Commit ee05c913 authored Aug 27, 2020 by zhoujun Committed by GitHub Aug 27, 2020
20 changed files
--- a/deploy/lite/readme_en.md
+++ b/deploy/lite/readme_en.md
@@ -13,7 +13,7 @@ deployment solutions for end-side deployment issues.
 - Computer (for Compiling Paddle Lite)
 - Mobile phone (arm7 or arm8)

-## 2. Build ncnn library
+## 2. Build PaddleLite library
 [build for Docker](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#docker)
 [build for Linux](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#android)
 [build for MAC OS](https://paddle-lite.readthedocs.io/zh/latest/user_guides/source_compile.html#id13)

--- a/deploy/pdserving/det_local_server.py
+++ b/deploy/pdserving/det_local_server.py
@@ -23,7 +23,7 @@ from paddle_serving_app.reader import Div, Normalize, Transpose
 from paddle_serving_app.reader import DBPostProcess, FilterBoxes
 if sys.argv[1] == 'gpu':
    from paddle_serving_server_gpu.web_service import WebService
-elif sys.argv[1] == 'cpu'
+elif sys.argv[1] == 'cpu':
    from paddle_serving_server.web_service import WebService
 import time
 import re
@@ -67,11 +67,13 @@ class OCRService(WebService):

 ocr_service = OCRService(name="ocr")
 ocr_service.load_model_config("ocr_det_model")
+ocr_service.init_det()
 if sys.argv[1] == 'gpu':
    ocr_service.set_gpus("0")
    ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
+    ocr_service.run_debugger_service(gpu=True)
 elif sys.argv[1] == 'cpu':
    ocr_service.prepare_server(workdir="workdir", port=9292)
+    ocr_service.run_debugger_service()
 ocr_service.init_det()
-ocr_service.run_debugger_service()
 ocr_service.run_web_service()
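The server scripts in this commit choose the CPU or GPU `WebService` from `sys.argv[1]`. Below is a minimal sketch of validating that argument up front, so that a missing or misspelled value fails with a clear message rather than an `IndexError` or a later `NameError`. The helper name `parse_device` is hypothetical and is not part of the PaddleOCR scripts.

```python
def parse_device(argv):
    """Return 'cpu' or 'gpu' from a command line such as ['det_local_server.py', 'gpu'].

    Hypothetical helper for illustration only; the scripts in this commit
    read sys.argv[1] directly.
    """
    if len(argv) < 2:
        raise SystemExit("usage: python det_local_server.py {cpu,gpu}")
    device = argv[1].lower()
    if device not in ("cpu", "gpu"):
        raise SystemExit("unknown device %r, expected 'cpu' or 'gpu'" % argv[1])
    return device


# Example call with an explicit argument list instead of sys.argv.
device = parse_device(["det_local_server.py", "gpu"])
print(device)
```

A script could call `parse_device(sys.argv)` once and branch on the result for both the import and the `prepare_server` call.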
--- a/deploy/pdserving/ocr_local_server.py
+++ b/deploy/pdserving/ocr_local_server.py
@@ -104,10 +104,11 @@ class OCRService(WebService):

 ocr_service = OCRService(name="ocr")
 ocr_service.load_model_config("ocr_rec_model")
-ocr_service.prepare_server(workdir="workdir", port=9292)
 ocr_service.init_det_debugger(det_model_config="ocr_det_model")
 if sys.argv[1] == 'gpu':
+    ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
    ocr_service.run_debugger_service(gpu=True)
 elif sys.argv[1] == 'cpu':
+    ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
    ocr_service.run_debugger_service()
 ocr_service.run_web_service()
--- a/deploy/pdserving/readme.md
+++ b/deploy/pdserving/readme.md
@@ -55,6 +55,23 @@ tar -xzvf ocr_det.tar.gz
 ```
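 The commands above download the `db_crnn_mobile` model. To use the larger `db_crnn_server` model instead, download and extract its inference model, then convert it by following [How to convert a saved Paddle inference model into a deployable Paddle Serving model](https://github.com/PaddlePaddle/Serving/blob/develop/doc/INFERENCE_TO_SERVING_CN.md).

+Taking the `ch_rec_r34_vd_crnn` model as an example, download it with: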
+
+```
+wget --no-check-certificate https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar
+tar xf ch_rec_r34_vd_crnn_infer.tar
+```
+Then, following the Serving model conversion tutorial, run the Python script below.
+```
+from paddle_serving_client.io import inference_model_to_serving
+inference_model_dir = "ch_rec_r34_vd_crnn"
+serving_client_dir = "serving_client_dir"
+serving_server_dir = "serving_server_dir"
+feed_var_names, fetch_var_names = inference_model_to_serving(
+        inference_model_dir, serving_client_dir, serving_server_dir, model_filename="model", params_filename="params")
+```
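+This generates the client and server model configurations in `serving_client_dir` and `serving_server_dir`.
+
 ### 3. Start the service
 Depending on your needs, you can start either the `standard` or the `fast` version of the service. The table below compares the two:  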


--- a/deploy/pdserving/rec_local_server.py
+++ b/deploy/pdserving/rec_local_server.py
@@ -22,7 +22,10 @@ from paddle_serving_client import Client
 from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
 from paddle_serving_app.reader import Div, Normalize, Transpose
 from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
-from paddle_serving_server_gpu.web_service import WebService
+if sys.argv[1] == 'gpu':
+    from paddle_serving_server_gpu.web_service import WebService
+elif sys.argv[1] == 'cpu':
+    from paddle_serving_server.web_service import WebService
 import time
 import re
 import base64
@@ -65,8 +68,12 @@ class OCRService(WebService):

 ocr_service = OCRService(name="ocr")
 ocr_service.load_model_config("ocr_rec_model")
-ocr_service.set_gpus("0")
 ocr_service.init_rec()
-ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
-ocr_service.run_debugger_service()
+if sys.argv[1] == 'gpu':
+    ocr_service.set_gpus("0")
+    ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
+    ocr_service.run_debugger_service(gpu=True)
+elif sys.argv[1] == 'cpu':
+    ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
+    ocr_service.run_debugger_service()
 ocr_service.run_web_service()
--- a/doc/doc_ch/FAQ.md
+++ b/doc/doc_ch/FAQ.md
-## FAQ
+# FAQ

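-1. **Prediction error: got an unexpected keyword argument 'gradient_clip'**  
-The installed paddle version is wrong. This project currently supports only paddle 1.7; support for 1.8 is coming soon.
+## Preface

-2. **Error when converting an attention recognition model: KeyError: 'predict'**  
-This issue has been fixed; please update to the latest code.
+- We have collected the common questions raised in issues and user groups since the project was open-sourced and answered them briefly. We hope they serve as a reference for OCR developers and help everyone avoid some detours.

-3. **About inference speed**  
-When an image contains a lot of text, prediction takes longer. You can use --rec_batch_num to set a smaller prediction batch size; the default is 30, and you can change it to 10 or another value.
+- There are many experts in the OCR field, and the answers here rely mainly on limited project experience, so omissions are inevitable. **Additions and corrections are very welcome.** Many thanks.

-4. **Service deployment and mobile deployment**  
-A Serving-based service deployment solution and a Paddle Lite-based mobile deployment solution are expected to be released in mid-to-late June. Stay tuned.

-5. **Release schedule for in-house algorithms**  
-The in-house algorithms SAST, SRN, and End2End-PSL will be released over July and August. Stay tuned.
+## PaddleOCR FAQ summary (continuously updated)

-6. **How to run on Windows or Mac**  
-PaddleOCR has been adapted to Windows and Mac. Note two points when running: (1) During [quick installation](./installation.md), if you do not want to install docker, skip step 1 and start from step 2, installing paddle. (2) When downloading inference models, if wget is not installed, click the model link or paste its URL into a browser to download, then extract the files into the corresponding directory.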
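+* [Highlights: 10 selected OCR questions](#OCR精选10个问题)
+* [Theory: 21 general OCR questions](#OCR通用问题)
+  * [Basics: 3 questions](#基础知识)
+  * [Datasets: 4 questions](#数据集)
+  * [Model training and tuning: 6 questions](#模型训练调优)
+  * [Inference and deployment: 8 questions](#预测部署)
+* [Practice: 53 PaddleOCR questions](#PaddleOCR实战问题)
+  * [Usage questions: 17 questions](#使用咨询)
+  * [Datasets: 9 questions](#数据集)
+  * [Model training and tuning: 13 questions](#模型训练调优)
+  * [Inference and deployment: 14 questions](#预测部署)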

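-7. **Differences between the ultra-lightweight model and the general OCR model**  
-PaddleOCR currently open-sources two Chinese models: the 8.6M ultra-lightweight Chinese model and the general Chinese OCR model. They compare as follows:
-    - Similarities: both use the same **algorithms** and **training data**;  
-    - Differences: they differ in **backbone network** and **channel parameters**. The ultra-lightweight model uses MobileNetV3 as its backbone, while the general model uses Resnet50_vd as the detection backbone and Resnet34_vd as the recognition backbone. For detailed parameter differences, compare the training configuration files of the two models.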
+
+
+<a name="OCR精选10个问题"></a>
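+## [Highlights] 10 selected OCR questions
+
+#### Q1.1.1: What kinds of deep-learning-based text detection methods are there, and what are their pros and cons?
+
+**A**: Common deep-learning-based text detection methods fall into two broad classes, regression-based and segmentation-based, plus some methods that combine the two.
+
+(1) Regression-based methods are divided into box regression and pixel-value regression. a. Box-regression methods mainly include CTPN, the Textbox series, and EAST. They detect regularly shaped text well but cannot accurately detect irregularly shaped text. b. Pixel-value regression methods mainly include CRAFT and SA-Text. They can detect curved text and perform very well on small text, but they are not fast enough for real-time use.
+
+(2) Segmentation-based methods, such as PSENet, are not limited by text shape and work well on text of all shapes, but their post-processing is usually complex and therefore slow. Some algorithms target this problem specifically, such as DB, which approximates binarization so that it is differentiable and can be included in training. This yields more accurate boundaries and greatly reduces post-processing time.
+
+#### Q1.1.2: For Chinese text-line recognition, which is better, CTC or Attention?
+
+**A**: (1) In terms of accuracy, CTC outperforms Attention in general OCR scenarios. The recognition dictionary contains many characters (more than three thousand commonly used Chinese characters), and when training samples are insufficient it is hard to learn the sequence relationships among them, so the advantages of Attention models do not show in Chinese scenarios. In addition, Attention suits short phrases and performs poorly on long sentences.
+
+(2) In terms of training and prediction speed, the serial decoding structure of Attention limits prediction speed, whereas the CTC network structure is more efficient and faster at prediction.
+
+#### Q1.1.3: How should curved or deformed text be handled for recognition? What is TPS used for, and does it work well?
+
+**A**: (1) In most cases, if the curvature or deformation is not too severe, detecting the 4 corner points and rectifying the region with an affine transformation before recognition is enough.
+
+(2) If that is not sufficient, try TPS (Thin Plate Spline) interpolation. TPS is an interpolation algorithm often used for image warping; a small number of control points can drive the deformation of an image. It is generally used for recognizing curved or deformed text: when an irregular or curved text region is detected (for example, by a segmentation-based detector), TPS is often used first to rectify the region into a rectangle before recognition. Recognition algorithms such as STAR-Net and RARE include a TPS module.
+**Warning**: TPS looks attractive, but in practice it is often not robust enough and adds latency, so use it with caution.
+
+#### Q1.1.4: For a simple OCR task without high accuracy requirements, how many images are needed in the dataset?
+
+**A**: (1) The amount of training data depends on the complexity of the problem. The harder the task and the higher the required accuracy, the more data is needed, and in practice more training data generally gives better results.
+
+(2) For scenarios without high accuracy requirements, detection and recognition need different amounts of data. For detection, 500 images can ensure basic detection quality. For recognition, each character in the dictionary should appear in more than 200 text-line images from different scenes (for example, if the dictionary has 5 characters and each must appear in more than 200 images, the minimum number of images is between 200 and 1000). This ensures basic recognition quality.
+
+#### Q1.1.5: How can text with background interference be recognized (for example, a seal stamped over a signature, where either the signature or the seal text must be read)?
+
+**A**: (1) Provided the text is legible to the human eye, first make sure the detection boxes are accurate enough. If they are not, consider preprocessing the image, for example by color filtering, and adding more relevant training data. For recognition, add augmented images with background interference to the training data.
+
+(2) If the MobileNet model cannot meet the requirements, try a larger ResNet-series model for better results.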
+
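+#### Q1.1.6: What evaluation metrics are commonly used in OCR?
+
+**A**: For two-stage systems, the detection and recognition stages can be evaluated separately.
+
+(1) Detection stage: detection boxes are first matched against ground-truth boxes by IOU, and a detection counts as correct when the IOU exceeds a threshold. Unlike boxes in general object detection, detection and ground-truth boxes here are represented as polygons. Detection precision: the proportion of correct detection boxes among all detection boxes; it mainly reflects false detections. Detection recall: the proportion of correct detection boxes among all ground-truth boxes; it mainly reflects missed detections.
+
+
+(2) Recognition stage:
+Character recognition accuracy, i.e. the proportion of correctly recognized text lines among all annotated text lines. A line counts as correct only if the whole line is recognized correctly.
+
+(3) End-to-end statistics:
+End-to-end recall: the proportion of text lines that are both accurately detected and correctly recognized among all annotated text lines;
+End-to-end precision: the proportion of text lines that are both accurately detected and correctly recognized among all detected text lines;
+A detection is accurate when the IOU between the detection box and the ground-truth box exceeds a threshold; a recognition is correct when the text in the accurately detected box matches the annotated text.
+
+
+#### Q1.1.7: How should multiple text types in a single image be handled (for example, printed and handwritten text in the same image)?
+
+**A**: Multiple text types in one image are common. A typical example is a student exam paper, which contains both handwritten and printed text. In such cases, try a "1 detection model + 1 N-way classifier + N recognition models" solution.
+All text types share one detection model. The N-way classifier is an additional classifier trained to classify the detected text: handwriting plus print is a binary classification, and N languages is an N-way classification. For recognition, train a separate model for each text type; in the handwriting plus print scenario, that means one handwriting recognition model and one print recognition model. If a text box is classified as handwriting, it is passed to the handwriting recognition model, and likewise for other types.
+
+#### Q1.1.8: Which datasets were used for the Chinese ultra-lightweight and general models in PaddleOCR? How many samples, what GPU configuration, how many epochs, and how long did training take?
+
+**A**:
+(1) Detection: the LSVT street-view dataset, 30,000 images in total. Ultra-lightweight model: about 150 epochs on 2 V100 GPUs, under 2 days. General model: 150 epochs on 2 V100 GPUs, under 4 days.
+(2) Recognition: a dataset of about 5.2 million images (260,000 real + 5 million synthetic). Ultra-lightweight model: 4 V100 GPUs, about 5 days in total. General model: 4 V100 GPUs, 6 days in total.
+
+Ultra-lightweight model training has 2 stages:
+(1) Training on the full dataset for 50 epochs, taking 3 days
+(2) Fine-tuning for 200 epochs with synthetic and real data sampled at 1:1, taking 2 days
+
+General model training:
+Real data + synthetic data with dynamic sampling (1:1), 200 epochs, taking about 6 days.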
+
+
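+#### Q1.1.9: How many inference modes does PaddleOCR support, and what are their pros and cons?
+
+**A**: Inference is currently supported with either the training engine or the prediction engine.
+
+(1) Training-engine inference needs no model conversion, but the network must be built before the parameters are loaded. It supports only Python and is not suitable for system integration.
+
+(2) Prediction-engine inference requires converting the model to the inference format first; inference then runs without building the network. It supports C++ and Python and is suitable for system integration.
+
+#### Q1.1.10: In PaddleOCR, how can prediction be accelerated on CPU? What does TensorRT-based GPU acceleration require of the input?
+
+**A**: (1) On CPU, mkldnn can be used for acceleration. For Python inference, set enable_mkldnn to true ([reference code](https://github.com/PaddlePaddle/PaddleOCR/blob/549108fe0aa0d87c0a3b2d471f1c653e89daab80/tools/infer/utility.py#L73)); for C++ inference, set use_mkldnn 1 in the configuration file ([reference code](https://github.com/PaddlePaddle/PaddleOCR/blob/549108fe0aa0d87c0a3b2d471f1c653e89daab80/deploy/cpp_infer/tools/config.txt#L6)).
+
+(2) On GPU, pay attention to issues such as variable-length input; variable-length input is supported only from TRT6 onward.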
+
+<a name="OCR通用问题"></a>
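+## [Theory] General OCR questions
+<a name="基础知识"></a>
+### Basics
+
+#### Q2.1.1: Can CRNN recognize two lines of text, or must it be a single line?
+
+**A**: CRNN is a 1D-CTC-based algorithm. By design it cannot recognize two or more lines of text; it recognizes a single line only.
+
+#### Q2.1.2: How can you tell whether a text-line image is upside down?
+
+**A**: There are two options: (1) Run recognition on both the original and the flipped image, and take the result with the higher score.
+(2) Train an orientation classifier on normal and upside-down images to decide.
+
+#### Q2.1.3: OCR is currently mostly two-stage. How well are end-to-end solutions adopted in industry?
+
+**A**: End-to-end solutions are more efficient in scenarios with densely distributed text. Accuracy depends on how much business data you have accumulated; if you have a lot of line-level recognition data, two-stage works better. Baidu's production scenarios, such as industrial meter reading and license plate recognition, use end-to-end solutions.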
+
+
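+<a name="数据集"></a>
+### Datasets
+
+#### Q2.2.1: For a model that supports spaces, do spaces need to be labeled in the annotations? Must several consecutive spaces all be labeled?
+
+**A**: If the detection and recognition models need to handle spaces, label the spaces during annotation and add the space character to the dictionary. If there are several consecutive spaces, labeling one is enough.
+
+#### Q2.2.2: To support vertical text recognition, how can the dataset be synthesized?
+
+**A**: Vertical text is synthesized the same way as horizontal text, except that a vertical font is selected. Recommended synthesis tool: [text_renderer](https://github.com/Sanster/text_renderer)
+
+#### Q2.2.3: When training a text recognition model with 300,000 real samples and 5 million synthetic samples, is sample balancing needed?
+
+**A**: Yes. Keeping the ratio of real to synthetic samples in a batch at roughly 1:1 to 1:3 generally works well. With too much synthetic data, the model overfits to it and prediction quality usually suffers. Another **heuristic** approach is to train a base model on a large amount of synthetic data first and then fine-tune it on real data, which also helps in some simple scenarios.
+
+#### Q2.2.4: In vertical text recognition the character features have changed. Should the dataset and dictionary add a new class for them, or should characters at different angles share one class?
+
+**A**: Try both, depending on the scenario. Sharing one class does converge and works reasonably well. Training them as separate classes gives better consistency among samples of the same class, converges more easily, and recognizes better.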
+
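+<a name="模型训练调优"></a>
+### Model training and tuning
+
+#### Q2.3.1: How do I replace the backbone for text detection/recognition?
+**A**: For both detection and recognition, choosing a backbone is a trade-off between accuracy and speed. In general, a larger backbone such as ResNet101_vd gives more accurate detection or recognition but takes longer to predict, whereas a smaller backbone such as MobileNetV3_small_x0_35 predicts faster but with much lower accuracy. Fortunately, the detection or recognition quality of a backbone correlates positively with its accuracy on the ImageNet 1000-class image classification task. [**PaddleClas, the PaddlePaddle image classification suite**](https://github.com/PaddlePaddle/PaddleClas) covers 23 families of classification networks, including ResNet_vd, Res2Net, HRNet, MobileNetV3, and GhostNet. It lists their top-1 accuracy on that task, their prediction time on GPU (V100 and T4) and CPU (Snapdragon 855), and [**download links for 117 pretrained models**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html).
+
+ (1) To replace a detection backbone, the main task is to identify 4 stages similar to those of ResNet, so that an FPN-like detection head can be attached. For text detection, a classification model pretrained on ImageNet also speeds up convergence and improves results.
+
+ (2) To replace a recognition backbone, pay attention to where the width and height strides downsample. Because text-line images usually have a large aspect ratio, height is downsampled less often and width more often. See the changes made to the [MobileNetV3 backbone](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/modeling/backbones/rec_mobilenet_v3.py) in PaddleOCR.
+
+#### Q2.3.2: Can text recognition training converge without an LSTM?
+
+**A**: In theory, yes. The LSTM module is added mainly to capture sequence relationships between characters and improve recognition. Its benefit is most noticeable in scenarios with clear contextual semantics.
+
+#### Q2.3.3: How do I choose between LSTM and GRU for text recognition?
+
+**A**: In our project experience, an LSTM sequence module recognizes better than a GRU, but LSTM needs somewhat more computation. Choose according to your situation.
+
+#### Q2.3.4: For the CRNN model, which backbone is better, DenseNet or ResNet_vd?
+
+**A**: A backbone's accuracy and efficiency in the CRNN model are consistent with its accuracy and efficiency on the ImageNet 1000-class image classification task. On image classification, ResNet_vd (79%+) is clearly more accurate than DenseNet (77%+). In addition, Nvidia has optimized ResNet-series models on GPU, so prediction is more efficient. ResNet_vd is therefore the better choice. On mobile, consider the MobileNetV3 series first.
+
+#### Q2.3.5: When training recognition, how do I choose a suitable network input shape?
+
+**A**: The height is usually 32. There are two ways to choose the maximum width:
+
+(1) Compute the aspect-ratio distribution of the training images, and choose a maximum aspect ratio that covers 80% of the training samples.
+
+(2) Count the characters in the training samples, and choose a maximum character count that covers 80% of the training samples. Then, treating the aspect ratio of a Chinese character as roughly 1 and that of an English character as 3:1, estimate a maximum width.
+
+#### Q2.3.6: How can long text be recognized?
+
+**A**: When training the Chinese recognition model, samples are not directly resized to [3,32,320]. Instead, each image is first scaled proportionally to a height of 32; if its width is below 320, the remainder is padded with 0, and samples with an aspect ratio greater than 10 are discarded. At prediction time, a single image is scaled in the same way but without the width limit of 320. Multiple images are predicted in batches, and the width of each batch changes dynamically to the maximum width in that batch.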
+
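The resizing rule in the answer to Q2.3.6 can be sketched as a small size calculation. This is an illustrative sketch only: the function name `rec_resize_plan` and the choice to round the scaled width up are assumptions, and the actual PaddleOCR preprocessing code may differ.

```python
import math


def rec_resize_plan(width, height, target_h=32, max_w=320, training=True):
    """Return (resized_width, pad_width) for a text-line image, or None to drop it.

    Follows the steps described for Q2.3.6: scale to height 32 while keeping
    the aspect ratio; in training, drop samples whose aspect ratio exceeds
    max_w / target_h (10 by default) and zero-pad the width up to max_w;
    for single-image prediction, apply no width limit and no padding.
    """
    ratio = width / height
    if training and ratio > max_w / target_h:
        return None
    resized_w = max(1, math.ceil(target_h * ratio))
    if not training:
        return resized_w, 0
    resized_w = min(resized_w, max_w)
    return resized_w, max_w - resized_w


print(rec_resize_plan(100, 32))                   # short line, padded in training
print(rec_resize_plan(640, 32))                   # too wide for training, dropped
print(rec_resize_plan(640, 32, training=False))   # prediction keeps the full width
```

For batched prediction, the same calculation would use the widest image in the batch in place of `max_w`.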
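+<a name="预测部署"></a>
+### Inference and deployment
+
+#### Q2.4.1: Is there a good way to handle dense text in images?
+
+**A**: First test with a pretrained model, such as DB+CRNN, to determine whether the problem with dense text lies in detection or recognition, and then improve that part specifically. Alternatively, if the dense text is small, try increasing the image resolution by stretching the image within a reasonable range, which spreads the text out and improves recognition.
+
+#### Q2.4.2: Are there image enhancement methods for text that is slightly blurry at recognition time?
+
+**A**: Provided the text is legible to the human eye, you can try image-processing filters such as mean, median, or Gaussian filtering. You can also strengthen model robustness through data augmentation. Newer ideas worth borrowing include adversarial training and super-resolution (SR). However, the industry has no widely accepted best solution yet, so we recommend first improving image quality by adding constraints at the data collection stage.
+
+#### Q2.4.3: For detecting specific text, such as only the name on an ID card, is it better to detect only the specified region or to detect all regions and then filter?
+
+**A**: Detecting all regions and then filtering is generally better, for two reasons.
+
+(1) The visual features of the target text and other text are not strongly distinguishable, so detecting only the specified region easily misses the target text.
+
+(2) Product requirements may change, and the model requirements may change later (for example, another field may need to be added). Post-processing logic is easier to adjust than a trained model.
+
+#### Q2.4.4: How can a beginner quickly get started with a Chinese OCR project?
+
+**A**: Start by learning the basics of OCR and getting a general understanding of the basic detection and recognition algorithms. Then browse OCR-related repos on GitHub. In terms of completeness, PaddleOCR's bilingual Chinese and English tutorials have a clear advantage: the documentation on datasets, model training, and inference deployment is detailed, so you can get started quickly. There is also a WeChat user group for questions, which makes it well suited to learning and practice. Project: [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
+
+#### Q2.4.5: How can English text-line images containing spaces be recognized?
+
+**A**: Consider the following two approaches to space recognition:
+
+(1) Optimize the text detection algorithm so that detection splits the text at spaces. With this approach, text lines containing spaces must be divided into multiple segments when annotating detection data.
+
+(2) Optimize the text recognition algorithm. Add the space character to the recognition dictionary and label spaces in the recognition training data. In addition, when synthesizing data, generate text containing spaces by concatenating training samples.
+
+#### Q2.4.6: Can a space character also be added for training when recognizing Chinese and English together?
+
+**A**: Chinese recognition can be trained with the space as a separator. We cannot give a direct assessment of how well it works; judge by training on your actual business data.
+
+#### Q2.4.7: Are there super-resolution methods for low-resolution text or small fonts?
+
+**A**: Super-resolution methods are divided into traditional methods and deep-learning-based methods. Among the deep-learning-based methods, SRCNN is a classic. There is also a CVPR 2020 super-resolution paper worth consulting: Unpaired Image Super-Resolution using Pseudo-Supervision. It has not been thoroughly validated in practice, however, so check how it performs in your scenario.
+
+#### Q2.4.8: Are there good models or papers to recommend for table recognition?
+
+**A**: Academia currently has few mature solutions for tables. You can try the approaches in segmentation papers.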
+
+
+<a name="PaddleOCR实战问题"></a>
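+## [Practice] PaddleOCR questions
+
+<a name="使用咨询"></a>
+### Usage questions
+
+#### Q3.1.1: OSError: [WinError 126] The specified module could not be found. Shapely import problem on mac pro with python 3.4
+
+**A**: This happens because the shapely library was installed incorrectly. See issue [#212](https://github.com/PaddlePaddle/PaddleOCR/issues/212) and reinstall it.
+
+#### Q3.1.2: I installed paddle-gpu, but at runtime I am told that the GPU version of paddle is not installed. What could be the cause?
+
+**A**: Both the CPU and GPU versions of paddle are installed. Uninstall both, then reinstall the GPU version of paddle.
+
+#### Q3.1.3: Error on first try: Cannot load cudnn shared library. What is the cause?
+
+**A**: Add the cudnn lib directory to LD_LIBRARY_PATH.
+
+#### Q3.1.4: How do I specify which GPU PaddlePaddle runs on? os.environ["CUDA_VISIBLE_DEVICES"] does not take effect
+
+**A**: Set the environment variable with export CUDA_VISIBLE_DEVICES='0'
+
+#### Q3.1.5: Training works on Windows, but aistudio reports a problem with the data path
+
+**A**: Change `\` to `/` (Windows and Linux use different path separators: `\` on Windows and `/` on Linux)
+
+#### Q3.1.6: Although the GPU version of paddle can run on CPU, a GPU device must be present
+
+**A**: With export CUDA_VISIBLE_DEVICES='', it runs normally on CPU
+
+#### Q3.1.7: Prediction error ImportError: dlopen: cannot load any more object with static TLS
+
+**A**: This is a glibc version problem; a glibc version greater than 2.23 is required.
+
+#### Q3.1.8: What is the difference between the provided inference model and the pretrained model?
+
+**A**: An inference model is a frozen model whose files contain both the network structure and the parameters; it is mostly used for inference deployment. A pretrained model is a model saved during training; it is mostly used for fine-tuning or resuming training.
+
+#### Q3.1.9: Does model decoding include post-processing?
+
+**A**: Yes. Detection post-processing is under the ppocr/postprocess path, and recognition post-processing is all in ppocr/utils/character.py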
+
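+#### Q3.1.10: Does the PaddleOCR Chinese model support digit recognition?
+
+**A**: Yes. See ppocr/utils/ppocr_keys_v1.txt, the list of supported characters, which includes digits.
+
+#### Q3.1.11: How does PaddleOCR support both horizontal and vertical text?
+
+**A**: A batch of vertical text was synthesized, rotated 90 degrees counterclockwise, and added to the training set alongside horizontal text. At prediction time, the aspect ratio of the image determines whether the text is vertical; if so, the cropped text is rotated 90 degrees counterclockwise before being sent to the recognition network.
+
+#### Q3.1.12: How do I get the coordinates of the detected text boxes?
+
+**A**: The text detection results contain the boxes and the text. See the [reference code](https://github.com/PaddlePaddle/PaddleOCR/blob/9d33e36df550762b204d5fbfd7977a25e31b2c44/tools/infer/predict_system.py#L13)
+
+#### Q3.1.13: The predicted boxes are too tight, so text at the edges is lost and recognition fails
+
+**A**: Add --det_db_unclip_ratio to the command (the parameter is [defined here](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/utility.py#L49)). This parameter controls the text box size in detection post-processing. The default is 2.0; try 2.5 or larger. Conversely, if the boxes are not tight enough, decrease it.
+
+#### Q3.1.14: Are there plans to provide a pretrained model for English handwriting recognition?
+
+**A**: We are currently surveying demand. If many enterprise users need it, we will consider investing more in its development and provide a pretrained model later. If you need it, contact us through an issue or the WeChat group.
+
+#### Q3.1.15: Can PaddleOCR's algorithms be used for handwritten text detection and recognition? Are there plans to release a handwriting pretrained model?
+**A**: In theory, yes, given a suitable dataset. Handwriting does differ from print, though, so the training and tuning strategy may need to be adapted.
+
+
+#### Q3.1.16: Does PaddleOCR run on Windows or Mac?
+
+**A**: PaddleOCR has been adapted to Windows and Mac. Note two points when running:
+
+(1) During [quick installation](./installation.md), if you do not want to install docker, skip step 1 and start from step 2, installing paddle.
+
+(2) When downloading inference models, if wget is not installed, click the model link or paste its URL into a browser to download, then extract the files into the corresponding directory.
+
+#### Q3.1.17: What is the difference between PaddleOCR's open-source ultra-lightweight model and general OCR model?
+**A**: PaddleOCR currently open-sources two Chinese models: the 8.6M ultra-lightweight Chinese model and the general Chinese OCR model. They compare as follows:
+- Similarities: both use the same **algorithms** and **training data**;  
+- Differences: they differ in **backbone network** and **channel parameters**. The ultra-lightweight model uses MobileNetV3 as its backbone, while the general model uses Resnet50_vd as the detection backbone and Resnet34_vd as the recognition backbone. For detailed parameter differences, compare the training configuration files of the two models.

 |Model|Backbone|Detection training config|Recognition training config|
 |-|-|-|-|
 |8.6M ultra-lightweight Chinese OCR model|MobileNetV3+MobileNetV3|det_mv3_db.yml|rec_chinese_lite_train.yml|
 |General Chinese OCR model|Resnet50_vd+Resnet34_vd|det_r50_vd_db.yml|rec_chinese_common_train.yml|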

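-8. **Are there plans to open-source models that recognize only digits, or only English and digits?**  
-There are currently no plans to open-source digit-only, digit-plus-English, or other narrow-vertical models. PaddleOCR open-sources many detection and recognition algorithms for custom training, and the two Chinese models were also trained with this open-source library. If you have narrow-vertical needs, prepare data following the tutorials, choose a suitable configuration file, and train your own model; we believe the results will be good. If you have any training problems, open an issue or ask in the discussion group, and we will respond promptly.

-9. **What training data do the open-source models use, and can it be open-sourced?**  
-The datasets and their sizes for the currently open-sourced models are as follows:
-    - Detection:  
-    English dataset: ICDAR2015  
-    Chinese dataset: 30,000 training images from the LSVT street-view dataset
-    - Recognition:  
-    English dataset: MJSynth and SynthText synthetic data, more than ten million samples.  
-    Chinese dataset: images cropped from the LSVT street-view dataset according to the ground truth and position-calibrated, 300,000 images in total. In addition, 5 million synthetic samples generated from the LSVT corpus.  
-    
-    The public datasets are all open source; you can search for and download them yourself, or see [Chinese datasets](./datasets.md). The synthetic data is not open-sourced for now; you can synthesize your own with open-source tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).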
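+### Datasets
+
+#### Q3.2.1: How do I create data in a format PaddleOCR supports?
+
+**A**: See the detection and recognition training documents, which describe the data formats in detail: [detection document](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md), [recognition document](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/recognition.md)
+
+#### Q3.2.2: I want to use the pretrained model, but my data contains characters that are not in its character set. Should the new characters be added at the beginning or at the end of the character set?
+
+**A**: Add them at the end. Modifying the dict changes the structure of the model's last fc layer, so the previously trained parameters are not used, which is equivalent to training from scratch, and acc is therefore 0.
+
+#### Q3.2.3: How do I debug the data reader?
+
+**A**: tools/train.py has a test_reader() function for debugging data reading.
+
+#### Q3.2.4: What training data do the open-source models use, and can it be open-sourced?
+
+**A**: The datasets and their sizes for the currently open-sourced models are as follows:
+
+- Detection:  
+    - English dataset: ICDAR2015  
+    - Chinese dataset: 30,000 training images from the LSVT street-view dataset
+
+- Recognition:  
+    - English dataset: MJSynth and SynthText synthetic data, more than ten million samples.  
+    - Chinese dataset: images cropped from the LSVT street-view dataset according to the ground truth and position-calibrated, 300,000 images in total. In addition, 5 million synthetic samples generated from the LSVT corpus.  
+
+The public datasets are all open source; you can search for and download them yourself, or see [Chinese datasets](./datasets.md). The synthetic data is not open-sourced for now; you can synthesize your own with open-source tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
+
+#### Q3.2.5: How large is the Chinese character set? Are rare characters supported?
+
+**A**: The Chinese character set has 6623 characters and supports rare characters. The training samples include some rare characters, but not many. For special needs, fine-tune on your own dataset.
+
+#### Q3.2.6: About how much data is needed to build training sets for Chinese text detection and recognition?
+
+**A**: Detection needs relatively little data. Fine-tuning from a PaddleOCR model, 500 images are generally enough for good results.
+Recognition differs for English and Chinese: English scenarios generally need several hundred thousand samples for good results, whereas Chinese needs several million or more.
+
+#### Q3.2.7: How do I choose a Chinese recognition model?
+
+**A**: There are 2 main types of Chinese models, general and ultra-lightweight. Their respective strengths are:
+The ultra-lightweight model is smaller and predicts faster, which suits on-device use.
+The general model is more accurate, which suits scenarios not sensitive to model size.
+In addition, based on these models, PaddleOCR provides models that support space recognition, mainly for English sentences in Chinese scenarios.
+Choose according to your actual needs.
+
+#### Q3.2.8: When an image is rotated by 90°, text detection still locates the text, but recognition accuracy drops sharply. Will rotation preprocessing be added?
+
+**A**: The model currently supports only two text orientations, horizontal and vertical. To keep the model small and prediction fast, PaddleOCR does not yet determine image orientation. We recommend rotating images upright before recognition; rotation-angle detection may be added later.
+
+#### Q3.2.9: On the same image the general model detects 21 items and the lightweight model 26. Doesn't that mean the lightweight model is better?
+
+**A**: Judge mainly by the visualized results. The general model tends to detect a whole line of text, while the lightweight model may split one line into two segments. More boxes does not mean better results.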
+
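+### Model training and tuning
+
+#### Q3.3.1: How should text longer than 25 characters be handled?
+
+**A**: By default, the maximum recognizable text length during training is 25; longer text is ignored and excluded from training. If your training samples contain many long texts, increase the max\_text\_length field in the configuration file. Its location is [here](https://github.com/PaddlePaddle/PaddleOCR/blob/fb9e47b262529386983edc21b33abfa16bbf06ac/configs/rec/rec_chinese_lite_train.yml#L13).
+
+#### Q3.3.2: Can detection thresholds be set in the configuration file?
+
+**A**: Yes. The main detection-related parameters are:
+``max_side_len: the size of the longer side when resizing the image at prediction time
+thresh: the threshold for binarizing the output map
+box_thresh: the threshold for filtering text boxes; boxes scoring below it are discarded
+unclip_ratio: the expansion coefficient of text boxes, which determines box size``
+
+The default values of these parameters are in the [code](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/tools/infer/utility.py#L40) and can be changed by passing command-line arguments.
+
+#### Q3.3.3: When training recognition, how do you handle non-rectangular text boxes in lsvt? Do you ignore them or take the minimum rotated box?
+
+**A**: They are currently ignored
+
+#### Q3.3.4: How do I stop training properly? (Killing it directly often leaves GPU memory occupied)
+
+**A**: The following script terminates all processes whose command line contains train.py:
+
+```
+ps -axu | grep train.py | awk '{print $2}' | xargs kill -9
+```
+
+#### Q3.3.5: With 4~8 data-reading processes, after training for a while the processes go defunct one after another, GPU utilization stays at 0, and training hangs
+
+**A**: This was solved by changing the queue size of the multiprocess reader. Change the [code snippet](https://github.com/PaddlePaddle/PaddleOCR/blob/549108fe0aa0d87c0a3b2d471f1c653e89daab80/ppocr/data/reader_main.py#L75) to:
+
+```
+return paddle.reader.multiprocess_reader(readers, False, queue_size=320)
+
+```
+
+#### Q3.3.6: Can pretrain_weights be left empty? I want to train a model from scratch
+
+**A**: Yes. When training the general recognition model, pretrain_weights is left empty. However, more iterations may be needed to reach the same accuracy.
+
+#### Q3.3.7: Doesn't PaddleOCR save the model every 200 steps by default? Why is nothing generated in the folder?
+
+**A**: Because saving by default starts at iteration 4000 rather than 0. Changing eval_batch_step [4000, 5000] to [0, 2000] saves the model every 2000 iterations starting from iteration 0
+
+#### Q3.3.8: How do I fine-tune a model?
+
+**A**: Make sure the dataset is configured in the matching format. Then, for fine-tuning, load one of our pretrained models by setting Global.pretrain_weights in the configuration file to the path of the pretrained model.
+
+#### Q3.3.9: Text detection cannot train on my own data. What do the "###" entries mean?
+
+**A**: The data format is wrong. "###" marks a text region to be ignored, so all your data was skipped. Replace it with any other characters, or leave it empty.
+
+#### Q3.3.10: About copy_from_cpu: if the input does not change (the size of t_data is unchanged) and copy_from_cpu() is called twice in a row, does gpu_place malloc GPU memory again? Or does it malloc on the GPU again only when ele_size changes?
+
+**A**: Memory is not reallocated when the size is less than or equal to the current one; it is reallocated only when the size is larger
+
+#### Q3.3.11: Can a model I trained myself, not yet converted to an inference model, be used as a pretrained model?
+
+**A**: Yes. But if the training data is small, the model may overfit to it and generalize poorly.
+
+#### Q3.3.12: How do I replace the backbone for text detection/recognition?
+
+**A**: Simply change Backbone.function in the configuration file. The format is: network file path,network class name. If the backbone you need is not provided in PaddleOCR, adapt the network structures in PaddleClas. For the principles, see the answer to "How do I replace the backbone for text detection/recognition" in the general OCR questions.
+
+#### Q3.3.13: Prediction error when using a recognition model with TPS. Error message: ``Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](320) != Grid dimension[2](100)  ``
+
+**A**: The TPS module does not yet support variable-length input. Set ``--rec_image_shape='3,32,100' --rec_char_type='en'`` to fix the input shape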
+
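+### Inference and deployment
+
+#### Q3.4.1: How do I install the opt model conversion tool with pip?
+
+**A**: On-device OCR deployment needs certain operators that exist only in the latest develop branch of Paddle-Lite, so you need to compile the opt model conversion tool yourself. The opt tool is obtained by compiling PaddleLite; for the steps, see section 2.1 (model optimization) of the [lite deployment document](https://github.com/PaddlePaddle/PaddleOCR/blob/0791714b91/deploy/lite/readme.md).
+
+#### Q3.4.2: How do I package the PaddleOCR inference models into an SDK?
+
+**A**: In Python, wrap TextSystem from tools/infer/predict_system.py. In C++, wrap DBDetector and CRNNRecognizer under deploy/cpp_infer/src
+
+#### Q3.4.3: Can a service deploy only text recognition, without the text detection model?
+
+**A**: Yes. The default service deployment chains detection and recognition, but deploying the detection or recognition model alone is also supported. For example, when using the PaddleOCR models with PaddleHUB, there are three folders under deploy:
+
+- ocr_det: detection
+- ocr_rec: recognition
+- ocr_system: detection and recognition chained
+
+Each module is separate, so you can deploy only the text recognition model. The same applies when deploying with PaddleServing.
+
+
+#### Q3.4.4: Why does PaddleOCR detection prediction support only one image at a time, i.e. test_batch_size_per_card=1?
+
+**A**: At test time, images are scaled proportionally so that the longest side is 960. After scaling, different images have different sizes and cannot form a batch, so test_batch_size is set to 1.
+
+#### Q3.4.5: Why do c++ inference and python inference give different results?
+
+**A**: A likely cause is a version mismatch: the exported inference model and the prediction library must be the same version. For example, on Windows the prediction library provided on the Paddle website is version 1.8, while the inference model provided by PaddleOCR is version 1.7, so the final results differ. Export the model in a Paddle 1.8 environment and predict with that model.
+Also make sure the prediction parameters of the two are exactly the same.
+
+#### Q3.4.6: Why does the first image take a long time to predict, while later images are faster?
+
+**A**: The first image requires GPU memory initialization, which takes longer. Once the model is loaded, later predictions are much faster.
+
+#### Q3.4.7: Can the opt tool directly convert an int8-quantized model into a .nb file?
+
+**A**: Yes. PaddleLite provides a full-featured opt tool; see the [documentation](https://paddle-lite.readthedocs.io/zh/latest/user_guides/post_quant_with_data.html)
+
+#### Q3.4.8: How do I set the parameter --det_db_unclip_ratio=3 on Android?
+
+**A**: It cannot be set in the Android APK, which does not expose this interface. If you are using the demo in PaddleOCR/deploy/lite/, change the corresponding parameter in config.txt
+
+#### Q3.4.9: Can PaddleOCR models be converted to ONNX?
+
+**A**: Conversion to ONNX is not supported yet; the work is in development.
+
+#### Q3.4.10: Converting the detection model with the opt tool fails with: can not found op arguments for node conv2_b_attr
+
+**A**: Most likely the opt tool was compiled from a Paddle-Lite branch other than develop. Compile the opt tool from the develop branch of Paddle-Lite.
+
+#### Q3.4.11: What does it mean when libopenblas.so cannot be found?
+
+**A**: There are currently two versions of the prediction library, mkl and openblas. We recommend the mkl version. If the prediction library you downloaded is the mkl version, also enable the `with_mkl` option when compiling.
+For example, when compiling on Linux, set it to ON with `-DWITH_MKL=ON` ([reference link](https://github.com/PaddlePaddle/PaddleOCR/blob/8a78af26df0dd8f15b734cc8db13e25d2a3656a2/deploy/cpp_infer/tools/build.sh#L12)). In addition, we recommend developing with the prediction library on Linux or Windows, not on MacOS.
+
+#### Q3.4.12: After training with a custom dictionary, what must change at inference?
+
+**A**: If you trained with a custom dictionary, set the dictionary path with --rec_char_dict_path when predicting with the inference model. For details, see the [documentation](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/inference.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E5%AD%97%E5%85%B8%E7%9A%84%E6%8E%A8%E7%90%86)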

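-10. **Prediction error when using a recognition model with TPS**  
-Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](320) != Grid dimension[2](100)  
-Cause: the TPS module does not yet support variable-length input. Set --rec_image_shape='3,32,100' --rec_char_type='en' to fix the input shape
+#### Q3.4.13: Can the position of each individual character be returned?

-11. **A model trained with a custom dictionary outputs characters that are not in the dictionary**  
-The custom dictionary path was not set at prediction time. Set it by adding the input parameter rec_char_dict_path at prediction time.
+**A**: Training annotations cover whole text lines, so predictions are text-line positions too. To get per-character positions, count the characters in the predicted text and then estimate each character's position within the text block from the position of the whole line.

+#### Q3.4.14: What deployment options are there for PaddleOCR models?
+**A**: Currently there are Inference deployment, serving deployment, and Paddle Lite deployment on mobile phones; choose flexibly by scenario. Inference deployment suits local offline deployment, serving deployment suits cloud deployment, and Paddle Lite deployment suits mobile integration.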
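The estimate described in the answer to Q3.4.13 can be sketched as follows. This is a rough illustration under stated assumptions, not PaddleOCR code: the function name `estimate_char_boxes` is hypothetical, the box is taken to be four (x, y) points clockwise from the top-left corner, and every character is assumed to have the same width and to be read left to right.

```python
def estimate_char_boxes(box, text):
    """Split a text-line quadrilateral into one equal-width box per character.

    box: four (x, y) points, clockwise from the top-left corner.
    Assumes evenly spaced characters, so the result is only an estimate.
    """
    if not text:
        return []
    tl, tr, br, bl = box
    n = len(text)

    def lerp(p, q, t):
        # Point at fraction t of the way from p to q.
        return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

    boxes = []
    for i, ch in enumerate(text):
        t0, t1 = i / n, (i + 1) / n
        quad = [lerp(tl, tr, t0), lerp(tl, tr, t1),
                lerp(bl, br, t1), lerp(bl, br, t0)]
        boxes.append((ch, quad))
    return boxes


line_box = [(0, 0), (100, 0), (100, 20), (0, 20)]
for ch, quad in estimate_char_boxes(line_box, "MASA"):
    print(ch, quad)
```

Mixed Chinese and English text has uneven character widths, so a weighted split would likely be more accurate than the equal-width split used here.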
--- a/doc/doc_ch/config.md
+++ b/doc/doc_ch/config.md
@@ -63,8 +63,9 @@
 |         beta1           |    Exponential decay rate for the first-moment estimate  |       0.9         |               \             |
 |         beta2           |    Exponential decay rate for the second-moment estimate  |     0.999         |               \             |
 |         decay           |         Whether to use decay       |    \              |               \             |
-|      function(decay)    |         Decay method       |   -    |       Currently supports cosine_decay and piecewise_decay  |
-|      step_each_epoch    |      Number of iterations per epoch; effective with cosine_decay   |         20       | Computed as: total_image_num / (batch_size_per_card * card_size) |
-|        total_epoch      |    Total number of epochs; effective with cosine_decay        |       1000      | Same as Global.epoch_num        |
+|      function(decay)    |         Decay method       |   -    |       Currently supports cosine_decay, cosine_decay_warmup, and piecewise_decay  |
+|      step_each_epoch    |      Number of iterations per epoch; effective with cosine_decay/cosine_decay_warmup   |         20       | Computed as: total_image_num / (batch_size_per_card * card_size) |
+|        total_epoch      |    Total number of epochs; effective with cosine_decay/cosine_decay_warmup        |       1000      | Same as Global.epoch_num        |
+|        warmup_minibatch      |  Number of linear warmup iterations; effective with cosine_decay_warmup        |       1000      | \        |
 |        boundaries      |    Iteration boundaries at which the learning rate drops; effective with piecewise_decay       |       -      | Given as a list        |
 |        decay_rate      |    Learning rate decay coefficient; effective with piecewise_decay       |       -      |  \        |
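To illustrate how `step_each_epoch`, `total_epoch`, and `warmup_minibatch` could interact under `cosine_decay_warmup`, here is a minimal sketch of a linear warmup followed by a per-epoch cosine decay. The function below and its exact formula are assumptions for illustration (warmup rising linearly from 0 to the base learning rate, then a half-cosine over `total_epoch` epochs); the schedule implemented in PaddleOCR may differ in detail.

```python
import math


def cosine_decay_warmup(step, base_lr, step_each_epoch, total_epoch, warmup_minibatch):
    """Learning rate at a global step: linear warmup, then per-epoch cosine decay.

    Assumption for illustration: warmup rises linearly from 0 to base_lr over
    warmup_minibatch steps; afterwards the rate follows
    base_lr * 0.5 * (cos(pi * epoch / total_epoch) + 1).
    """
    if step < warmup_minibatch:
        return base_lr * step / warmup_minibatch
    epoch = step // step_each_epoch
    return base_lr * 0.5 * (math.cos(math.pi * epoch / total_epoch) + 1.0)


# Defaults from the table above, with a hypothetical base learning rate.
for step in (0, 500, 1000, 10000, 20000):
    print(step, cosine_decay_warmup(step, 0.001, 20, 1000, 1000))
```

With these values the rate climbs during the first 1000 iterations and then falls toward 0 as training approaches `step_each_epoch * total_epoch` iterations.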
--- a/doc/doc_ch/detection.md
+++ b/doc/doc_ch/detection.md
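 # Text detection

-This section uses the icdar15 dataset as an example to introduce training, evaluation, and testing of detection models in PaddleOCR.
+This section uses the icdar2015 dataset as an example to introduce training, evaluation, and testing of detection models in PaddleOCR.

 ## Data preparation
 The icdar2015 dataset can be downloaded from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads); registration is required for the first download.

 Extract the downloaded dataset into your working directory, assumed here to be PaddleOCR/train_data/. In addition, PaddleOCR has consolidated the scattered annotation files into single annotation files,
 which you can download with wget.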
-```
+```shell
 # In the PaddleOCR directory
 cd PaddleOCR/
 wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
@@ -23,21 +23,21 @@ wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_la
  └─ test_icdar2015_label.txt     Test annotations for the icdar dataset
 ```

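-The provided annotation files have the following format, with "\t" as the separator in the middle:
+The provided annotation files use the following format, with fields separated by "\t":
 ```
 " Image file name                    Image annotation encoded with json.dumps"
 ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
 ```
 Before json.dumps encoding, the image annotation is a list of dicts. In each dict, `points` holds the (x, y) coordinates of the four corners of the text box, arranged clockwise starting from the top-left corner.
-`transcription` is the text in the current text box; this information is not needed for text detection.
-To train PaddleOCR on other datasets, build annotation files in the format above.
+`transcription` is the text in the current text box. **When its content is "###", the text box is invalid and is skipped during training.**

+To train on other datasets, build annotation files in the format above.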
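As a sketch of the annotation format described above, the snippet below parses one label line and leaves out boxes marked "###". The function name `parse_det_label` is hypothetical, and the second box in the sample line is made up for illustration; this is not the reader used by PaddleOCR.

```python
import json


def parse_det_label(line):
    """Parse one annotation line into (image_path, valid_boxes).

    The line holds an image path and a JSON list separated by a tab. Boxes
    whose transcription is "###" are invalid, so they are left out.
    """
    image_path, annotation = line.rstrip("\n").split("\t", 1)
    boxes = [b for b in json.loads(annotation) if b["transcription"] != "###"]
    return image_path, boxes


# The first box comes from the example above; the "###" box is invented.
sample = ('ch4_test_images/img_61.jpg\t'
          '[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, '
          '{"transcription": "###", "points": [[0, 0], [10, 0], [10, 10], [0, 10]]}]')
path, boxes = parse_det_label(sample)
print(path, len(boxes))
```

Running a check like this over a custom label file is a quick way to confirm that the tab separator and the JSON encoding are correct before training.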

 ## Quick start for training

 First download the pretrained model for the backbone. PaddleOCR detection models currently support two backbones, MobileNetV3 and ResNet50_vd.
 You can swap in a backbone from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) as needed.
-```
+```shell
 cd PaddleOCR/
 # Download the pretrained MobileNetV3 model
 wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
@@ -45,7 +45,7 @@ wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/Mob
 wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar

 # Extract the pretrained model file, using MobileNetV3 as an example
-tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models/
+tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models/

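 # Note: after the backbone pretrained weights are extracted correctly, the folder contains many weight files named after network layers, in the following format: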
 ./pretrain_models/MobileNetV3_large_x0_5_pretrained/
@@ -57,11 +57,11 @@ tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models

 ```

-**Start training**
+#### Start training

 *If you installed the CPU version, set the `use_gpu` field in the configuration file to false*

-```
+```shell
 python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
 ```

@@ -69,52 +69,52 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=
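 For a detailed explanation of the configuration file, see [this link](./config.md).

 You can also change training parameters with the -o option without modifying the yml file, for example, to set the learning rate to 0.0001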
-```
+```shell
 python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
 ```

-**Resume training**
+#### Resume training

 If training is interrupted and you want to resume from the interrupted model, specify the path of the model to load with Global.checkpoints:
-```
+```shell
 python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
 ```

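-**Note**: Global.checkpoints takes priority over Global.pretrain_weights. When both are specified, the model specified by Global.checkpoints is loaded first; if that path is wrong, the model specified by Global.pretrain_weights is loaded.
+**Note**: `Global.checkpoints` takes priority over `Global.pretrain_weights`. When both are specified, the model specified by `Global.checkpoints` is loaded first; if that path is wrong, the model specified by `Global.pretrain_weights` is loaded.

 ## Evaluation

 PaddleOCR computes three OCR detection metrics: Precision, Recall, and Hmean.

-Run the following code to compute the evaluation metrics from the test-set detection result file specified by save_res_path in the configuration file det_db_mv3.yml.
+Run the following code to compute the evaluation metrics from the test-set detection result file specified by `save_res_path` in the configuration file `det_db_mv3.yml`.

-For evaluation, the post-processing parameters are set to box_thresh=0.6 and unclip_ratio=1.5. When training with different datasets or models, adjust these two parameters to optimize results
-```
+For evaluation, the post-processing parameters are set to `box_thresh=0.6` and `unclip_ratio=1.5`. When training with different datasets or models, adjust these two parameters to optimize results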
+```shell
 python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
 ```
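-During training, model parameters are saved in the Global.save_model_dir directory by default. For evaluation, set Global.checkpoints to point to the saved parameter file.
+During training, model parameters are saved in the `Global.save_model_dir` directory by default. For evaluation, set `Global.checkpoints` to point to the saved parameter file.

 For example: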
-```
+```shell
 python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
 ```

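-* Note: box_thresh and unclip_ratio are parameters required by DB post-processing; they do not need to be set when evaluating an EAST model
+* Note: `box_thresh` and `unclip_ratio` are parameters required by DB post-processing; they do not need to be set when evaluating an EAST model

 ## Test detection results

 Test detection on a single image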
-```
+```shell
 python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
 ```

 When testing a DB model, adjust the post-processing thresholds:
-```
+```shell
 python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
 ```


 Test detection on all images in a folder
-```
+```shell
 python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
 ```
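The Precision, Recall, and Hmean metrics named in the evaluation section of detection.md can be sketched from three counts, as below. This is an illustrative sketch only: the function name is hypothetical, and how PaddleOCR matches detections to ground-truth boxes (for example by an IOU threshold) is not shown in this diff, so the matched count is taken as given.

```python
def detection_metrics(num_matched, num_detected, num_gt):
    """Precision, recall, and Hmean (their harmonic mean) for text detection.

    num_matched: detections matched to a ground-truth box
    num_detected: all detections
    num_gt: all ground-truth boxes
    """
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    hmean = 2 * precision * recall / (precision + recall)
    return precision, recall, hmean


# Hypothetical counts: 8 of 10 detections are correct, out of 16 ground-truth boxes.
print(detection_metrics(8, 10, 16))
```

Because Hmean is a harmonic mean, it stays low unless both precision and recall are high, which is why it is a useful single number when tuning `box_thresh` and `unclip_ratio`.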
--- a/doc/doc_ch/inference.md
+++ b/doc/doc_ch/inference.md

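 # Inference with the Python prediction engine

-An inference model (a model saved with fluid.io.save_inference_model)
-is generally a frozen model saved after training is complete, and is mostly used for inference deployment.
-A model saved during training is a checkpoints model; it stores the model parameters and is mostly used for resuming training.
+An inference model (a model saved with `fluid.io.save_inference_model`)
+is generally a frozen model saved after training is complete, and is mostly used for inference deployment. A model saved during training is a checkpoints model; it stores the model parameters and is mostly used for resuming training.
 Compared with a checkpoints model, an inference model additionally stores the model structure. It performs better in inference deployment and accelerated inference, is flexible and convenient, and is suited to integration with real systems. For more details, see the document [Classification prediction framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html).

 The following sections first describe how to convert a trained model into an inference model, and then cover prediction-engine inference for text detection, text recognition, and the two chained together.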

+
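+- [1. Convert a trained model to an inference model](#训练模型转inference模型)
+    - [Convert a detection model to an inference model](#检测模型转inference模型)
+    - [Convert a recognition model to an inference model](#识别模型转inference模型)  
+    
+- [2. Text detection model inference](#文本检测模型推理)
+    - [1. Ultra-lightweight Chinese detection model inference](#超轻量中文检测模型推理)
+    - [2. DB text detection model inference](#DB文本检测模型推理)
+    - [3. EAST text detection model inference](#EAST文本检测模型推理)
+    - [4. SAST text detection model inference](#SAST文本检测模型推理)  
+    
+- [3. Text recognition model inference](#文本识别模型推理)
+    - [1. Ultra-lightweight Chinese recognition model inference](#超轻量中文识别模型推理)
+    - [2. Inference for recognition models with CTC loss](#基于CTC损失的识别模型推理)
+    - [3. Inference for recognition models with Attention loss](#基于Attention损失的识别模型推理)
+    - [4. Inference with a custom text recognition dictionary](#自定义文本识别字典的推理)  
+    
+- [4. Chained text detection and recognition inference](#文本检测、识别串联推理)
+    - [1. Ultra-lightweight Chinese OCR model inference](#超轻量中文OCR模型推理)
+    - [2. Inference with other models](#其他模型推理)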
+    
+    
+<a name="训练模型转inference模型"></a>
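 ## I. Convert a training model to an inference model
+<a name="检测模型转inference模型"></a>
 ### Convert a detection model to an inference model

 Download the ultra-lightweight Chinese detection model: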
@@ -24,15 +47,16 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar &

 python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy Global.save_inference_dir=./inference/det_db/
 ```
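-When converting to an inference model, use the same configuration file as for training. In addition, the Global.checkpoints and Global.save_inference_dir parameters in the configuration file must be set.
-Global.checkpoints points to the model parameter file saved during training, and Global.save_inference_dir is the directory in which the generated inference model is saved.
-After a successful conversion, the save_inference_dir directory contains two files:
+When converting to an inference model, use the same configuration file as for training. In addition, the `Global.checkpoints` and `Global.save_inference_dir` parameters in the configuration file must be set.
+`Global.checkpoints` points to the model parameter file saved during training, and `Global.save_inference_dir` is the directory in which the generated inference model is saved.
+After a successful conversion, the `save_inference_dir` directory contains two files:
 ```
 inference/det_db/
  └─  model     program file of the detection inference model
  └─  params    parameter file of the detection inference model
 ```

+<a name="识别模型转inference模型"></a>
 ### Convert a recognition model to an inference model

 Download the ultra-lightweight Chinese recognition model: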
@@ -51,7 +75,7 @@ python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Globa
        Global.save_inference_dir=./inference/rec_crnn/
 ```

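-If the model was trained on your own dataset and you changed the Chinese character dictionary file, make sure that character_dict_path in the configuration file points to the dictionary you need.
+**Note:** If the model was trained on your own dataset and you changed the Chinese character dictionary file, make sure that `character_dict_path` in the configuration file points to the dictionary you need.

 After a successful conversion, the directory contains two files: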
 ```
@@ -60,11 +84,13 @@ python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Globa
  └─  params    parameter file of the recognition inference model
 ```

+<a name="文本检测模型推理"></a>
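 ## II. Text detection model inference

-The following covers ultra-lightweight Chinese detection model inference, DB text detection model inference, and EAST text detection model inference. The default configuration is set for DB text detection model inference. Because the EAST and DB algorithms differ considerably, the corresponding parameters must be passed at inference time to adapt to the EAST text detection algorithm.
+Text detection model inference uses the DB model's configuration parameters by default. When a model other than DB is used, the corresponding parameters must be passed at inference time to adapt to that algorithm; see below for details.

-### 1.Ultra-lightweight Chinese detection model inference
+<a name="超轻量中文检测模型推理"></a>
+### 1. Ultra-lightweight Chinese detection model inference

 Ultra-lightweight Chinese detection model inference can be run with the following command: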

@@ -72,11 +98,11 @@ python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Globa
 python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"
 ```

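-The visualized text detection results are saved to the ./inference_results folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below:
+The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below:

 ![](../imgs_results/det_res_2.jpg)

-The det_max_side_len parameter sets the maximum size to which the detection algorithm normalizes images. If both the height and width of the image are smaller than det_max_side_len, the original image is used for prediction; otherwise the image is scaled proportionally to this maximum before prediction. The default is det_max_side_len=960. If the input image has a high resolution and you want to predict at a larger resolution, run the following command:
+The `det_max_side_len` parameter sets the maximum size to which the detection algorithm normalizes images. If both the height and width of the image are smaller than `det_max_side_len`, the original image is used for prediction; otherwise the image is scaled proportionally to this maximum before prediction. The default is `det_max_side_len=960`. If the input image has a high resolution and you want to predict at a larger resolution, run the following command: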

 ```
 python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_max_side_len=1200
@@ -87,7 +113,8 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_di
 python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
 ```

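-### 2.DB text detection model inference
+<a name="DB文本检测模型推理"></a>
+### 2. DB text detection model inference

 First, convert the model saved during DB text detection training into an inference model. Taking the model based on the Resnet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), the conversion can be done with the following command: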

@@ -105,13 +132,14 @@ DB文本检测模型推理，可以执行如下命令：
 python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/"
 ```

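-The visualized text detection results are saved to the ./inference_results folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below:
+The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below:

 ![](../imgs_results/det_res_img_10_db.jpg)

-**Note**: Because the ICDAR2015 dataset has only 1000 training images and mainly targets English scenes, the above model performs very poorly on Chinese text images.
+**Note**: Because the ICDAR2015 dataset has only 1000 training images and mainly targets English scenes, the above model performs relatively poorly on Chinese text images.

-### 3.EAST text detection model inference
+<a name="EAST文本检测模型推理"></a>
+### 3. EAST text detection model inference

 First, convert the model saved during EAST text detection training into an inference model. Taking the model based on the Resnet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), the conversion can be done with the following command: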

@@ -123,24 +151,59 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_
 python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.checkpoints="./models/det_r50_vd_east/best_accuracy" Global.save_inference_dir="./inference/det_east"
 ```

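-EAST text detection model inference requires setting the det_algorithm parameter to specify EAST as the detection algorithm. Run the following command:
+**EAST text detection model inference requires setting the parameter `--det_algorithm="EAST"`.** Run the following command: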

 ```
 python3 tools/infer/predict_det.py --det_algorithm="EAST" --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/"
 ```
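-The visualized text detection results are saved to the ./inference_results folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below:
+The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below:

 ![](../imgs_results/det_res_img_10_east.jpg)

-**Note**: The NMS in the EAST post-processing of this repository is implemented in Python, so prediction is relatively slow. A C++ implementation would be noticeably faster.
+**Note**: In this repository, the Locality-Aware NMS used in EAST post-processing has both a Python and a C++ version, and the C++ version is significantly faster. Because of a build-version issue with the C++ NMS, it is only called under Python 3.5; in all other cases the Python NMS is used.
+
+
+<a name="SAST文本检测模型推理"></a>
+### 4. SAST text detection model inference
+#### (1). Quadrilateral text detection model (ICDAR2015)  
+First, convert the model saved during SAST text detection training into an inference model. Taking the model based on the Resnet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)), the conversion can be done with the following command: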
+```
+python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.checkpoints="./models/sast_r50_vd_icdar2015/best_accuracy" Global.save_inference_dir="./inference/det_sast_ic15"
+```
+**SAST text detection model inference requires setting the parameter `--det_algorithm="SAST"`.** Run the following command:
+```
+python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_sast_ic15/"
+```
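+The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below: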
+
+![](../imgs_results/det_res_img_10_sast.jpg)
+
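+#### (2). Curved text detection model (Total-Text)  
+First, convert the model saved during SAST text detection training into an inference model. Taking the model based on the Resnet50_vd backbone and trained on the Total-Text English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)), the conversion can be done with the following command: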
+
+```
+python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.checkpoints="./models/sast_r50_vd_total_text/best_accuracy" Global.save_inference_dir="./inference/det_sast_tt"
+```
+
+**SAST text detection model inference requires setting the parameter `--det_algorithm="SAST"` and additionally adding the parameter `--det_sast_polygon=True`.** Run the following command:
+```
+python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img623.jpg" --det_model_dir="./inference/det_sast_tt/" --det_sast_polygon=True
+```
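+The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'det_res'. An example result is shown below: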

+![](../imgs_results/det_res_img623_sast.jpg)

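+**Note**: In this repository, the Locality-Aware NMS used in SAST post-processing has both a Python and a C++ version, and the C++ version is significantly faster. Because of a build-version issue with the C++ NMS, it is only called under Python 3.5; in all other cases the Python NMS is used.
+
+
+<a name="文本识别模型推理"></a>
 ## III. Text recognition model inference

 The following covers ultra-lightweight Chinese recognition model inference, CTC-based recognition model inference, and Attention-based recognition model inference. For Chinese text recognition, CTC-based recognition models are the recommended first choice; in practice, Attention-based models have also been found to perform worse than CTC-based ones. In addition, if the text dictionary was modified during training, see the section on inference with a custom text recognition dictionary below.


-### 1.Ultra-lightweight Chinese recognition model inference
+<a name="超轻量中文识别模型推理"></a>
+### 1. Ultra-lightweight Chinese recognition model inference

 Ultra-lightweight Chinese recognition model inference can be run with the following command: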

@@ -155,7 +218,8 @@ python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg"
 Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695]


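-### 2.CTC-based recognition model inference
+<a name="基于CTC损失的识别模型推理"></a>
+### 2. CTC-based recognition model inference

 We take STAR-Net as an example to describe CTC-based recognition model inference. CRNN and Rosetta are used in a similar way, and the recognition algorithm parameter rec_algorithm does not need to be set.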

@@ -176,7 +240,8 @@ STAR-Net文本识别模型推理，可以执行如下命令：
 python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
 ```

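-### 3.Attention-based recognition model inference
+<a name="基于Attention损失的识别模型推理"></a>
+### 3. Attention-based recognition model inference

 Unlike CTC-based models, Attention-based recognition models additionally require the recognition algorithm parameter --rec_algorithm="RARE".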

@@ -202,16 +267,18 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
 dict_character = list(self.character_str)
 ```

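-### 4.Inference with a custom text recognition dictionary
+<a name="自定义文本识别字典的推理"></a>
+### 4. Inference with a custom text recognition dictionary
 If the text dictionary was modified during training, specify the dictionary path with `--rec_char_dict_path` when predicting with the inference model.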

 ```
 python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path"
 ```

+<a name="文本检测、识别串联推理"></a>
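 ## IV. Chained text detection and recognition inference
-
-### 1.Ultra-lightweight Chinese OCR model inference
+<a name="超轻量中文OCR模型推理"></a>
+### 1. Ultra-lightweight Chinese OCR model inference

 When running prediction, use the image_dir parameter to specify the path of a single image or a set of images, det_model_dir to specify the path of the detection inference model, and rec_model_dir to specify the path of the recognition inference model. The visualized recognition results are saved to the ./inference_results folder by default.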

@@ -223,9 +290,14 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model

 ![](../imgs_results/2.jpg)

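-### 2.Inference with other models
+<a name="其他模型推理"></a>
+### 2. Inference with other models
+
+To try other detection or recognition algorithms, refer to the text detection model inference and text recognition model inference sections above, and update the corresponding configuration and model.
+
+**Note: Because of limitations in the detection box rectification logic, chaining with the SAST curved text detection model (that is, when using the parameter `--det_sast_polygon=True`) is not yet supported.**

-To try other detection or recognition algorithms, refer to the text detection model inference and text recognition model inference sections above, and update the corresponding configuration and model. The following command runs EAST text detection with STAR-Net text recognition:
+The following command runs EAST text detection with STAR-Net text recognition: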

 ```
 python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"

--- a/doc/doc_ch/quickstart.md
+++ b/doc/doc_ch/quickstart.md
@@ -5,6 +5,8 @@

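 Please first refer to [Quick installation](./installation.md) to set up the PaddleOCR runtime environment.

+*Note: PaddleOCR can also be installed and used through the whl package; see the [paddleocr package usage guide](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md) for details.*
+
 ## 2. Download inference models

 |Model name|Model description|Detection model link|Recognition model link|Recognition model link (with space support)|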

--- a/doc/doc_ch/recognition.md
+++ b/doc/doc_ch/recognition.md
@@ -18,6 +18,8 @@ ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset

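 If you do not have a dataset locally, you can download the [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) data from the official website for quick verification. You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the lmdb-format datasets required for the benchmark.

+To reproduce the metrics reported in the SRN paper, download the offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (extraction code: y3ry). The augmented data is obtained by applying rotation and perturbation to MJSynth and SynthText. After downloading, extract the data to {your_path}/PaddleOCR/train_data/data_lmdb_release/training/.
+
 * Use your own dataset:

 If you want to train on your own data, organize it as described below.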
@@ -161,6 +163,7 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
 | rec_r34_vd_none_none_ctc.yml |  Rosetta |   Resnet34_vd |  None   |  None |  ctc  |
 | rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention |
 | rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |
+| rec_r50fpn_vd_none_srn.yml | SRN | Resnet50_fpn_vd | None | rnn | srn |

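 For training on Chinese data, `rec_chinese_lite_train.yml` is recommended. To try other algorithms on a Chinese dataset, modify the configuration file according to the following instructions: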


--- a/doc/doc_ch/update.md
+++ b/doc/doc_ch/update.md
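 # Updates
+- 2020.8.24 Added support for installing and using PaddleOCR through the whl package; see the [paddleocr package usage guide](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md) for details
+- 2020.8.21 Updated the replay and slides of the August 18 Bilibili live course, Lesson 2: an easy-to-learn, easy-to-use OCR toolkit, [available here](https://aistudio.baidu.com/aistudio/education/group/info/1519)
 - 2020.8.16 Open-sourced the text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and the text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
- 2020.7.23 Released the replay and slides of the July 21 Bilibili live course, a comprehensive walkthrough of the PaddleOCR open-source toolkit, [available here](https://aistudio.baidu.com/aistudio/course/introduce/1519)
+- 2020.7.23 Released the replay and slides of the July 21 Bilibili live course, Lesson 1: a comprehensive walkthrough of the PaddleOCR open-source toolkit, [available here](https://aistudio.baidu.com/aistudio/course/introduce/1519)
 - 2020.7.15 Added mobile demos based on EasyEdge and Paddle-Lite, supporting iOS and Android
 - 2020.7.15 Improved inference deployment: added C++ prediction engine inference, serving deployment, and on-device deployment solutions, as well as an inference-time benchmark for the ultra-lightweight Chinese OCR model
 - 2020.7.15 Compiled OCR-related datasets and commonly used data annotation and synthesis tools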

--- a/doc/doc_ch/whl.md
+++ b/doc/doc_ch/whl.md
+# paddleocr package usage guide
+
+## Quick start
+
+### Install the whl package
+
+Install with pip
+```bash
+pip install paddleocr
+```
+
+Build and install locally
+```bash
+python setup.py bdist_wheel
+pip install dist/paddleocr-0.0.3-py3-none-any.whl
+```
+### Use in code
+
+* Full pipeline: detection + recognition
+```python
+from paddleocr import PaddleOCR, draw_ocr
+ocr = PaddleOCR() # need to run only once to download and load model into memory
+img_path = 'PaddleOCR/doc/imgs/11.jpg'
+result = ocr.ocr(img_path)
+for line in result:
+    print(line)
+
+# Visualize the result
+from PIL import Image
+image = Image.open(img_path).convert('RGB')
+boxes = [line[0] for line in result]
+txts = [line[1][0] for line in result]
+scores = [line[1][1] for line in result]
+im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
+im_show = Image.fromarray(im_show)
+im_show.save('result.jpg')
+```
+The result is a list; each item contains the text box, the text, and the recognition confidence.
+```bash
+[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
+[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
+[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['（45元/每公斤，100公斤起订）', 0.9676722]]
+......
+```
+Visualized result
+
+<div align="center">
+    <img src="../imgs_results/whl/11_det_rec.jpg" width="800">
+</div>
+
+* Detection only
+```python
+from paddleocr import PaddleOCR, draw_ocr
+ocr = PaddleOCR() # need to run only once to download and load model into memory
+img_path = 'PaddleOCR/doc/imgs/11.jpg'
+result = ocr.ocr(img_path,rec=False)
+for line in result:
+    print(line)
+
+# Visualize the result
+from PIL import Image
+
+image = Image.open(img_path).convert('RGB')
+im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
+im_show = Image.fromarray(im_show)
+im_show.save('result.jpg')
+```
+The result is a list; each item contains only the text box.
+```bash
+[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]]
+[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]]
+[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]]
+......
+```
+Visualized result
+
+
+<div align="center">
+    <img src="../imgs_results/whl/11_det.jpg" width="800">
+</div>
+
+* Recognition only
+```python
+from paddleocr import PaddleOCR
+ocr = PaddleOCR() # need to run only once to download and load model into memory
+img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
+result = ocr.ocr(img_path,det=False)
+for line in result:
+    print(line)
+```
+The result is a list; each item contains only the recognition result and the recognition confidence.
+```bash
+['韩国小馆', 0.9907421]
+```
+
+### Use from the command line
+
+Show help information
+```bash
+paddleocr -h
+```
+
+* Full pipeline: detection + recognition
+```bash
+paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg
+```
+The result is a list; each item contains the text box, the text, and the recognition confidence.
+```bash
+[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
+[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
+[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['（45元/每公斤，100公斤起订）', 0.9676722]]
+......
+```
+
+* Detection only
+```bash
+paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --rec false
+```
+The result is a list; each item contains only the text box.
+```bash
+[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]]
+[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]]
+[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]]
+......
+```
+
+* Recognition only
+```bash
+paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --det false
+```
+
+The result is a list; each item contains only the recognition result and the recognition confidence.
+```bash
+['韩国小馆', 0.9907421]
+```
+
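+## Custom models
+When the built-in models do not meet your needs, you can use models you trained yourself.
+First, convert the detection and recognition models to inference models as described in the first section of [inference.md](./inference.md), then use them as follows.
+
+### Use in code
+```python
+from paddleocr import PaddleOCR, draw_ocr
+# The detection and recognition model directories must each contain the model and params files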
+ocr = PaddleOCR(det_model_dir='{your_det_model_dir}',rec_model_dir='{your_rec_model_dir}')
+img_path = 'PaddleOCR/doc/imgs/11.jpg'
+result = ocr.ocr(img_path)
+for line in result:
+    print(line)
+
+# Visualize the result
+from PIL import Image
+image = Image.open(img_path).convert('RGB')
+boxes = [line[0] for line in result]
+txts = [line[1][0] for line in result]
+scores = [line[1][1] for line in result]
+im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
+im_show = Image.fromarray(im_show)
+im_show.save('result.jpg')
+```
+
+### Use from the command line
+
+```bash
+paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir}
+```
+
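+## Parameters
+
+| Field | Description | Default |
+|-------|-------------|---------|
+| use_gpu | Whether to use the GPU | TRUE |
+| gpu_mem | GPU memory reserved at initialization | 8000M |
+| image_dir | Path of the image or folder to predict on when invoked from the command line | |
+| det_algorithm | Detection algorithm to use | DB |
+| det_model_dir | Folder containing the detection model. It can be passed in two ways: 1. None: the built-in model is downloaded automatically to `~/.paddleocr/det`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None |
+| det_max_side_len | Maximum size of the longer image side in the detection forward pass. When the longer side exceeds this value, it is resized to this size and the shorter side is scaled proportionally | 960 |
+| det_db_thresh | Binarization threshold for the prediction map output by the DB model | 0.3 |
+| det_db_box_thresh | Threshold for boxes output by the DB model; predicted boxes below this value are discarded | 0.5 |
+| det_db_unclip_ratio | Expansion ratio applied to boxes output by the DB model | 2 |
+| det_east_score_thresh | Binarization threshold for the prediction map output by the EAST model | 0.8 |
+| det_east_cover_thresh | Threshold for boxes output by the EAST model; predicted boxes below this value are discarded | 0.1 |
+| det_east_nms_thresh | NMS threshold for boxes output by the EAST model | 0.2 |
+| rec_algorithm | Recognition algorithm to use | CRNN |
+| rec_model_dir | Folder containing the recognition model. It can be passed in two ways: 1. None: the built-in model is downloaded automatically to `~/.paddleocr/rec`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None |
+| rec_image_shape | Input image shape for the recognition algorithm | "3,32,320" |
+| rec_char_type | Character type for the recognition algorithm, Chinese (ch) or English (en) | ch |
+| rec_batch_num | Number of images forwarded together in one batch during recognition | 30 |
+| max_text_length | Maximum text length the recognition algorithm can recognize | 25 |
+| rec_char_dict_path | Path of the recognition model dictionary. When rec_model_dir is passed using option 2, change this to your own dictionary path | ./ppocr/utils/ppocr_keys_v1.txt |
+| use_space_char | Whether to recognize spaces | TRUE |
+| enable_mkldnn | Whether to enable mkldnn | FALSE |
+| det | Whether to run detection in the forward pass | TRUE |
+| rec | Whether to run recognition in the forward pass | TRUE |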
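
The `det_max_side_len` row above describes a proportional resize that is applied only when the longer side exceeds the limit. A minimal Python sketch of that rule follows. The function name `resize_to_max_side` is hypothetical, and the real preprocessing may apply further adjustments (for example rounding each side to a size the network requires) that are left out here.

```python
def resize_to_max_side(height, width, max_side_len=960):
    """Return the (height, width) used for detection under the described rule."""
    longer = max(height, width)
    if longer <= max_side_len:
        # Both sides already fit: predict on the original size.
        return height, width
    # Scale both sides by the same ratio so the longer side equals the limit.
    ratio = max_side_len / longer
    return max(1, round(height * ratio)), max(1, round(width * ratio))


print(resize_to_max_side(720, 1280))        # longer side exceeds 960, so it is scaled
print(resize_to_max_side(600, 800))         # fits, so it is left unchanged
print(resize_to_max_side(720, 1280, 1200))  # a larger limit keeps more detail
```

Raising the limit trades speed and memory for resolution, which is why the parameter is exposed rather than fixed.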
--- a/doc/doc_en/FAQ_en.md
+++ b/doc/doc_en/FAQ_en.md
@@ -45,9 +45,12 @@ At present, the open source model, dataset and magnitude are as follows:
    Among them, the public datasets are opensourced, users can search and download by themselves, or refer to [Chinese data set](./datasets_en.md), synthetic data is not opensourced, users can use open-source synthesis tools to synthesize data themselves. Current available synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc.

 10. **Error in using the model with TPS module for prediction**  
-Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](108) != Grid dimension[2](100)  
+Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3]\(108) != Grid dimension[2]\(100)  
 Solution: TPS does not support variable input shapes. Please set --rec_image_shape='3,32,100' and --rec_char_type='en'.

-11. **Custom dictionary used during training, the recognition results show that words do not appear in the dictionary**
+11. **Custom dictionary used during training, the recognition results show that words do not appear in the dictionary**  
+The custom dictionary path was not set at prediction time. The solution is to set the parameter `rec_char_dict_path` to the corresponding dictionary file.

-The used custom dictionary path is not set when making prediction. The solution is setting parameter `rec_char_dict_path` to the corresponding dictionary file.
\ No newline at end of file
+
+12. **Results of cpp_infer and python_inference are very different**  
+The versions of the exported inference model and the inference library should be the same. For example, on the Windows platform, the inference library provided by PaddlePaddle is version 1.8, but the inference model provided by PaddleOCR is version 1.7. In that case, export the model yourself (`tools/export_model.py`) with PaddlePaddle 1.8 and then use the exported model for inference.
--- a/doc/doc_en/config_en.md
+++ b/doc/doc_en/config_en.md
@@ -60,8 +60,9 @@ Take `rec_icdar15_train.yml` as an example:
 |         beta1           |    Set the exponential decay rate for the 1st moment estimates  |       0.9         |               \             |
 |         beta2           |    Set the exponential decay rate for the 2nd moment estimates  |     0.999         |               \             |
 |         decay           |         Whether to use decay       |    \              |               \             |
-|      function(decay)    |         Set the decay function       |   cosine_decay    |         Support cosine_decay and piecewise_decay            |
-|      step_each_epoch    |      The number of steps in an epoch. Used in cosine_decay  |         20       | Calculation ：total_image_num / (batch_size_per_card * card_size) |
-|        total_epoch      |    The number of epochs. Used in cosine_decay      |       1000      | Consistent with Global.epoch_num      |
+|      function(decay)    |         Set the decay function       |   cosine_decay    |         Support cosine_decay, cosine_decay_warmup and piecewise_decay            |
+|      step_each_epoch    |      The number of steps in an epoch. Used in cosine_decay/cosine_decay_warmup  |         20       | Calculation: total_image_num / (batch_size_per_card * card_size) |
+|        total_epoch      |    The number of epochs. Used in cosine_decay/cosine_decay_warmup      |       1000      | Consistent with Global.epoch_num      |
+|        warmup_minibatch      |  Number of steps for linear warmup. Used in cosine_decay_warmup        |       1000      | \        |
 |        boundaries      |    The step intervals to reduce learning rate. Used in piecewise_decay       |       -      |  The format is list        |
 |        decay_rate      |    Learning rate decay rate. Used in piecewise_decay       |       -      |  \        |
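
To make the interaction between `step_each_epoch`, `total_epoch`, and `warmup_minibatch` concrete, here is a minimal Python sketch of one plausible `cosine_decay_warmup` schedule. It assumes a linear warmup from 0 to the base learning rate, followed by a cosine decay that is evaluated per epoch. The function names are hypothetical, and the exact formula used by the training code may differ.

```python
import math


def steps_per_epoch(total_image_num, batch_size_per_card, card_size):
    """step_each_epoch from the table; floor division is an assumption."""
    return total_image_num // (batch_size_per_card * card_size)


def lr_at_step(step, base_lr, step_each_epoch, total_epoch, warmup_minibatch):
    """Assumed schedule: linear warmup from 0, then per-epoch cosine decay."""
    if step < warmup_minibatch:
        return base_lr * step / warmup_minibatch
    epoch = step // step_each_epoch
    return base_lr * 0.5 * (math.cos(epoch * math.pi / total_epoch) + 1)


# Values below are illustrative; 20, 1000, and 1000 are the table defaults.
step_each_epoch = steps_per_epoch(1000, 10, 5)
for step in (0, 500, 1000, 10000, 20000):
    print(step, lr_at_step(step, 0.001, step_each_epoch, 1000, 1000))
```

Under these assumptions the learning rate rises during the first `warmup_minibatch` steps and then falls toward zero as the epoch count approaches `total_epoch`, which is why `total_epoch` should stay consistent with `Global.epoch_num`.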
--- a/doc/doc_en/detection_en.md
+++ b/doc/doc_en/detection_en.md
 # TEXT DETECTION

-This section uses the icdar15 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.
+This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.

 ## DATA PREPARATION
 The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.

 Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded by wget:
-```
+```shell
 # Under the PaddleOCR path
 cd PaddleOCR/
 wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
@@ -27,16 +27,19 @@ The provided annotation file format is as follow, seperated by "\t":
 " Image file name             Image annotation information encoded by json.dumps"
 ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
 ```
-The image annotation after json.dumps() encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.
+The image annotation after **json.dumps()** encoding is a list containing multiple dictionaries. 
+
+The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.
+
+`transcription` represents the text of the current text box. **When its content is "###" it means that the text box is invalid and will be skipped during training.**

-`transcription` represents the text of the current text box, and this information is not needed in the text detection task.
-If you want to train PaddleOCR on other datasets, you can build the annotation file according to the above format.
+If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.


 ## TRAINING

 First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs.
-```
+```shell
 cd PaddleOCR/
 # Download the pre-trained model of MobileNetV3
 wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
@@ -44,7 +47,7 @@ wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/Mob
 wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar

 # decompressing the pre-training model file, take MobileNetV3 as an example
-tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models/
+tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar -C ./pretrain_models/

 # Note: After decompressing the backbone pre-training weight file correctly, the file list in the folder is as follows:
 ./pretrain_models/MobileNetV3_large_x0_5_pretrained/
@@ -56,9 +59,9 @@ tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models

 ```

-**START TRAINING**  
+#### START TRAINING
 *If the CPU version is installed, please set the parameter `use_gpu` to `false` in the configuration.*
-```
+```shell
 python3 tools/train.py -c configs/det/det_mv3_db.yml
 ```

@@ -66,19 +69,19 @@ In the above instruction, use `-c` to select the training to use the `configs/de
 For a detailed explanation of the configuration file, please refer to [config](./config_en.md).

 You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001
-```
+```shell
 python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
 ```

-**load trained model and conntinue training**
+#### LOAD TRAINED MODEL AND CONTINUE TRAINING
 If you want to load a trained model and continue training, you can specify the parameter `Global.checkpoints` as the model path to be loaded.

 For example:
-```
+```shell
 python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
 ```

-**Note**:The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by Global.checkpoints will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.
+**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.


 ## EVALUATION
@@ -89,7 +92,7 @@ Run the following code to calculate the evaluation indicators. The result will b

 When evaluating, set post-processing parameters `box_thresh=0.6`, `unclip_ratio=1.5`. If you use different datasets, different models for training, these two parameters should be adjusted for better result.

-```
+```shell
 python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
 ```
 The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file.
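
As an illustration of the `-o` override syntax used in the commands above (for example `Optimizer.base_lr=0.0001`), here is a minimal Python sketch of how a dotted `Section.key=value` string could be merged into a nested configuration dictionary. It is not PaddleOCR's actual implementation: the helper `apply_override` and the sample `config` dictionary are hypothetical names introduced only for this example.

```python
import ast


def apply_override(config, option):
    """Merge one 'Section.key=value' string into a nested dict.

    Hypothetical helper for illustration; not taken from tools/train.py.
    """
    dotted_key, raw_value = option.split("=", 1)
    try:
        # Interpret numbers, booleans and quoted strings as Python literals.
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        # Anything else (for example a file path) is kept as a plain string.
        value = raw_value

    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Assumed sample configuration; real yml files contain many more keys.
config = {"Optimizer": {"base_lr": 0.001}, "Global": {"checkpoints": None}}
apply_override(config, "Optimizer.base_lr=0.0001")
apply_override(config, "Global.checkpoints=./your/trained/model")
print(config)
```

The sketch only shows why an override can replace a single nested value without editing the yml file; type handling in the real tool may differ.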

--- a/doc/doc_en/inference_en.md
+++ b/doc/doc_en/inference_en.md

 # Reasoning based on Python prediction engine

-The inference model (the model saved by fluid.io.save_inference_model) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment.
+The inference model (the model saved by `fluid.io.save_inference_model`) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment.

 The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.

@@ -9,7 +9,31 @@ Compared with the checkpoints model, the inference model will additionally save

 Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, and the concatenation of them based on inference model.

+- [CONVERT TRAINING MODEL TO INFERENCE MODEL](#CONVERT)
+    - [Convert detection model to inference model](#Convert_detection_model)
+    - [Convert recognition model to inference model](#Convert_recognition_model)
+    
+    
+- [TEXT DETECTION MODEL INFERENCE](#DETECTION_MODEL_INFERENCE)
+    - [1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE](#LIGHTWEIGHT_DETECTION)
+    - [2. DB TEXT DETECTION MODEL INFERENCE](#DB_DETECTION)
+    - [3. EAST TEXT DETECTION MODEL INFERENCE](#EAST_DETECTION)
+    - [4. SAST TEXT DETECTION MODEL INFERENCE](#SAST_DETECTION)
+    
+- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
+    - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
+    - [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
+    - [3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE](#ATTENTION-BASED_RECOGNITION)
+    - [4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
+    
+    
+- [TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION)
+    - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_CHINESE_MODEL)
+    - [2. OTHER MODELS](#OTHER_MODELS)
+    
+<a name="CONVERT"></a>
 ## CONVERT TRAINING MODEL TO INFERENCE MODEL
+<a name="Convert_detection_model"></a>
 ### Convert detection model to inference model

 Download the lightweight Chinese detection model:
@@ -35,6 +59,7 @@ inference/det_db/
  └─  params    Check the parameter file of the inference model
 ```

+<a name="Convert_recognition_model"></a>
 ### Convert recognition model to inference model

 Download the lightweight Chinese recognition model:
@@ -62,11 +87,13 @@ After the conversion is successful, there are two files in the directory:
  └─  params    Identify the parameter files of the inference model
 ```

+<a name="DETECTION_MODEL_INFERENCE"></a>
 ## TEXT DETECTION MODEL INFERENCE

 The following will introduce the lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference setting of the DB text detection model.
 Because EAST and DB algorithms are very different, when inference, it is necessary to **adapt the EAST text detection algorithm by passing in corresponding parameters**.

+<a name="LIGHTWEIGHT_DETECTION"></a>
 ### 1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE

 For lightweight Chinese detection model inference, you can execute the following commands:
@@ -90,6 +117,7 @@ If you want to use the CPU for prediction, execute the command as follows
 python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
 ```

+<a name="DB_DETECTION"></a>
 ### 2. DB TEXT DETECTION MODEL INFERENCE

 First, convert the model saved in the DB text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), you can use the following command to convert:
@@ -114,6 +142,7 @@ The visualized text detection results are saved to the `./inference_results` fol

 **Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection result on Chinese text images.

+<a name="EAST_DETECTION"></a>
 ### 3. EAST TEXT DETECTION MODEL INFERENCE

 First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), you can use the following command to convert:
@@ -126,23 +155,64 @@ First, convert the model saved in the EAST text detection training process into
 python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.checkpoints="./models/det_r50_vd_east/best_accuracy" Global.save_inference_dir="./inference/det_east"
 ```

-For EAST text detection model inference, you need to set the parameter det_algorithm, specify the detection algorithm type to EAST, run the following command:
+**For EAST text detection model inference, you need to set the parameter `--det_algorithm="EAST"`**. Run the following command:

 ```
 python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST"
 ```
+
 The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:

 ![](../imgs_results/det_res_img_10_east.jpg)

-**Note**: The Python version of NMS in EAST post-processing used in this codebase so the prediction speed is quite slow. If you use the C++ version, there will be a significant speedup.
+**Note**: The locality-aware NMS used in EAST post-processing has two versions: Python and C++. The C++ version is significantly faster than the Python version. Because of a compilation version issue with the C++ NMS, the C++ version is called only in a Python 3.5 environment; the Python version is called in all other cases.
+
+
+<a name="SAST_DETECTION"></a>
+### 4. SAST TEXT DETECTION MODEL INFERENCE
+#### (1). Quadrangle text detection model (ICDAR2015)  
+First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)), you can use the following command to convert:
+
+```
+python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.checkpoints="./models/sast_r50_vd_icdar2015/best_accuracy" Global.save_inference_dir="./inference/det_sast_ic15"
+```
+
+**For SAST quadrangle text detection model inference, you need to set the parameter `--det_algorithm="SAST"`**. Run the following command:
+
+```
+python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_sast_ic15/"
+```
+
+The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
+
+![](../imgs_results/det_res_img_10_sast.jpg)

+#### (2). Curved text detection model (Total-Text)  
+First, convert the model saved in the SAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the Total-Text English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)), you can use the following command to convert:

+```
+python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.checkpoints="./models/sast_r50_vd_total_text/best_accuracy" Global.save_inference_dir="./inference/det_sast_tt"
+```
+
+**For SAST curved text detection model inference, you need to set the parameters `--det_algorithm="SAST"` and `--det_sast_polygon=True`**. Run the following command:
+
+```
+python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img623.jpg" --det_model_dir="./inference/det_sast_tt/" --det_sast_polygon=True
+```
+
+The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
+
+![](../imgs_results/det_res_img623_sast.jpg)
+
+**Note**: The locality-aware NMS used in SAST post-processing has two versions: Python and C++. The C++ version is significantly faster than the Python version. Because of a compilation version issue with the C++ NMS, the C++ version is called only in a Python 3.5 environment; the Python version is called in all other cases.
+
+<a name="RECOGNITION_MODEL_INFERENCE"></a>
 ## TEXT RECOGNITION MODEL INFERENCE

 The following will introduce the lightweight Chinese recognition model inference, other CTC-based and Attention-based text recognition models inference. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss. In practice, it is also found that the result of the model based on Attention loss is not as good as the one based on CTC loss. In addition, if the characters dictionary is modified during training, make sure that you use the same characters set during inferencing. Please check below for details.


+<a name="LIGHTWEIGHT_RECOGNITION"></a>
 ### 1. LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL REFERENCE

 For lightweight Chinese recognition model inference, you can execute the following commands:
@@ -158,6 +228,7 @@ After executing the command, the prediction results (recognized text and score)
 Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695]


+<a name="CTC-BASED_RECOGNITION"></a>
 ### 2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE

 Taking STAR-Net as an example, we introduce the recognition model inference based on CTC loss. CRNN and Rosetta are used in a similar way, by setting the recognition algorithm parameter `rec_algorithm`.
@@ -178,6 +249,7 @@ For STAR-Net text recognition model inference, execute the following commands:
 python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
 ```

+<a name="ATTENTION-BASED_RECOGNITION"></a>
 ### 3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE
 ![](../imgs_words_en/word_336.png)

@@ -196,6 +268,7 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
 dict_character = list(self.character_str)
 ```

+<a name="USING_CUSTOM_CHARACTERS"></a>
 ### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
 If the chars dictionary is modified during training, you need to specify the new dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict.

@@ -203,8 +276,10 @@ If the chars dictionary is modified during training, you need to specify the new
 python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path"
 ```

+<a name="CONCATENATION"></a>
 ## TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION

+<a name="LIGHTWEIGHT_CHINESE_MODEL"></a>
 ### 1. LIGHTWEIGHT CHINESE MODEL

 When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visualized recognition results are saved to the `./inference_results` folder by default.
@@ -217,9 +292,14 @@ After executing the command, the recognition result image is as follows:

 ![](../imgs_results/2.jpg)

+<a name="OTHER_MODELS"></a>
 ### 2. OTHER MODELS

-If you want to try other detection algorithms or recognition algorithms, please refer to the above text detection model inference and text recognition model inference, update the corresponding configuration and model, the following command uses the combination of the EAST text detection and STAR-Net text recognition:
+If you want to try other detection algorithms or recognition algorithms, please refer to the above text detection model inference and text recognition model inference, update the corresponding configuration and model.
+
+**Note: because of a limitation in the rotation logic for detected boxes, the SAST curved text detection model (using the parameter `det_sast_polygon=True`) is not yet supported for model combination.**
+
+The following command uses the combination of the EAST text detection and STAR-Net text recognition:

 ```
 python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"

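The detection commands documented in `inference_en.md` differ only in a few flags: DB is the default, EAST and SAST need `--det_algorithm`, and the SAST curved-text model additionally needs `--det_sast_polygon=True`. The following Python sketch assembles the argument list under those rules. The helper `build_det_command` is hypothetical and not part of PaddleOCR; only the flags quoted in the documentation above are used.

```python
def build_det_command(image_dir, det_model_dir, algorithm="DB", sast_polygon=False):
    """Assemble the argument list for tools/infer/predict_det.py.

    Hypothetical helper for illustration; it only strings together flags
    that appear in the documentation and does not run anything.
    """
    if sast_polygon and algorithm != "SAST":
        raise ValueError("--det_sast_polygon=True only applies to the SAST algorithm")

    cmd = [
        "python3",
        "tools/infer/predict_det.py",
        f"--image_dir={image_dir}",
        f"--det_model_dir={det_model_dir}",
    ]
    if algorithm != "DB":
        # DB is the documented default, so the flag is only added for other algorithms.
        cmd.append(f"--det_algorithm={algorithm}")
    if sast_polygon:
        # Needed for the curved text (Total-Text) SAST model.
        cmd.append("--det_sast_polygon=True")
    return cmd


east = build_det_command("./doc/imgs_en/img_10.jpg", "./inference/det_east/", algorithm="EAST")
sast_tt = build_det_command(
    "./doc/imgs_en/img623.jpg", "./inference/det_sast_tt/", algorithm="SAST", sast_polygon=True
)
print(" ".join(east))
print(" ".join(sast_tt))
```

The returned list could be passed to `subprocess.run` from the repository root; the shell quoting shown in the documentation (`--det_algorithm="EAST"`) is not needed when arguments are passed as a list.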
--- a/doc/doc_en/quickstart_en.md
+++ b/doc/doc_en/quickstart_en.md
@@ -5,6 +5,7 @@

 Please refer to [quick installation](./installation_en.md) to configure the PaddleOCR operating environment.

+*Note: PaddleOCR can also be used through whl package installation; please refer to [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md).*

 ## 2.inference models


--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
@@ -18,6 +18,8 @@ ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset

 If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here)，download the lmdb format dataset required for benchmark

+If you want to reproduce the results reported in the SRN paper, you need to download the offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
+
 * Use your own dataset:

 If you want to use your own data for training, please refer to the following to organize your data.

--- a/doc/doc_en/update_en.md
+++ b/doc/doc_en/update_en.md
 # RECENT UPDATES
+- 2020.8.24 Support the use of PaddleOCR through whl package installation; please refer to [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md)
 - 2020.8.16 Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
 - 2020.7.23, Release the playback and PPT of live class on BiliBili station, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
 - 2020.7.15, Add mobile App demo , support both iOS and  Android  ( based on easyedge and Paddle Lite)