Merge pull request #1 from PaddlePaddle/dygraph

Dygraph

6893d151 · Thomas Young · GitHub · 32665fe5 · 58794e06 · 6893d151
Unverified Commit 6893d151 authored May 21, 2021 by Thomas Young Committed by GitHub May 21, 2021
20 changed files
--- a/deploy/lite/readme.md
+++ b/deploy/lite/readme.md
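+# On-device deployment
+
+This tutorial describes in detail how to deploy the PaddleOCR ultra-lightweight Chinese detection and recognition models on mobile devices with [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite).
+
+Paddle Lite is PaddlePaddle's lightweight inference engine. It provides efficient inference for mobile phones and IoT devices, integrates a wide range of cross-platform hardware, and offers a lightweight solution for on-device deployment and application rollout.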
+
+
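+## 1. Prepare the environment
+
+### Requirements
+- A computer (to compile Paddle Lite)
+- An Android phone (armv7 or armv8)
+
+### 1.1 Prepare the cross-compilation environment
+The cross-compilation environment is used to compile the C++ demos of Paddle Lite and PaddleOCR.
+Several development environments are supported; for the compilation process in each one, refer to the corresponding document.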
+
+1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker)
+2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux)
+3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os)
+
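+### 1.2 Prepare the prediction library
+
+There are two ways to obtain the prediction library:
+- 1. Download it directly. The download links are as follows:
+
+      | Platform | Prediction library download link |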
+      |---|---|
+      |Android|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.9/inference_lite_lib.android.armv7.gcc.c++_shared.with_extra.with_cv.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.9/inference_lite_lib.android.armv8.gcc.c++_shared.with_extra.with_cv.tar.gz)|
+      |IOS|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.9/inference_lite_lib.ios.armv7.with_cv.with_extra.with_log.tiny_publish.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.9/inference_lite_lib.ios.armv8.with_cv.with_extra.with_log.tiny_publish.tar.gz)|
+
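+      Note: the prediction libraries above are built from the Paddle-Lite 2.9 branch. For details on Paddle-Lite 2.9, see this [link](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.9).
+
+- 2. [Recommended] Compile Paddle-Lite to obtain the prediction library. Paddle-Lite is compiled as follows: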
+```
+git clone https://github.com/PaddlePaddle/Paddle-Lite.git
+cd Paddle-Lite
+# Switch to the stable Paddle-Lite release/v2.9 branch
+git checkout release/v2.9
+./lite/tools/build_android.sh  --arch=armv8  --with_cv=ON --with_extra=ON
+```
+
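+Note: when compiling Paddle-Lite to obtain the prediction library, both `--with_cv=ON` and `--with_extra=ON` must be enabled. `--arch` selects the `arm` version, here armv8.
+For more build commands, see this [link](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_andriod.html).
+
+Downloading and extracting the prebuilt prediction library yields the `inference_lite_lib.android.armv8/` folder. A prediction library built by compiling Paddle-Lite is located in the
+`Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/` folder.
+The prediction library has the following directory layout: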
+```
+inference_lite_lib.android.armv8/
+|-- cxx                                        C++ prediction library and header files
+|   |-- include                                C++ header files
+|   |   |-- paddle_api.h
+|   |   |-- paddle_image_preprocess.h
+|   |   |-- paddle_lite_factory_helper.h
+|   |   |-- paddle_place.h
+|   |   |-- paddle_use_kernels.h
+|   |   |-- paddle_use_ops.h
+|   |   `-- paddle_use_passes.h
+|   `-- lib                                           C++ prediction library
+|       |-- libpaddle_api_light_bundled.a             C++ static library
+|       `-- libpaddle_light_api_shared.so             C++ shared library
+|-- java                                     Java prediction library
+|   |-- jar
+|   |   `-- PaddlePredictor.jar
+|   |-- so
+|   |   `-- libpaddle_lite_jni.so
+|   `-- src
+|-- demo                                     C++ and Java example code
+|   |-- cxx                                  C++ prediction library demo
+|   `-- java                                 Java prediction library demo
+```
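
The layout above can be checked before moving on. This is a minimal sketch, assuming the library was extracted or built for Android armv8; `check_lite_lib` is an illustrative helper name, not part of Paddle-Lite, and it only tests a few of the listed paths:

```shell
# check_lite_lib DIR: report paths from the layout above that are missing under DIR.
# Returns 0 when every checked path exists, 1 otherwise.
check_lite_lib() {
  lite_root="$1"
  lite_missing=0
  for rel in \
    cxx/include/paddle_api.h \
    cxx/lib/libpaddle_light_api_shared.so \
    demo/cxx
  do
    if [ ! -e "$lite_root/$rel" ]; then
      echo "missing: $lite_root/$rel" >&2
      lite_missing=1
    fi
  done
  return "$lite_missing"
}
```

A typical call would be `check_lite_lib ./inference_lite_lib.android.armv8`.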
+
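+## 2 Run
+
+### 2.1 Model optimization
+
+Paddle-Lite provides several strategies for automatically optimizing the original model, including quantization, subgraph fusion, hybrid scheduling, and kernel selection. The Paddle-Lite opt tool applies these optimizations to an inference model automatically,
+producing a lighter model that runs faster.
+
+If you already have a model file ending in `.nb`, you can skip this step.
+
+The table below also provides a set of Chinese models for mobile devices:
+
+|Model version|Description|Model size|Detection model|Text direction classification model|Recognition model|Paddle-Lite version|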
+|---|---|---|---|---|---|---|
+|V2.0|Ultra-lightweight Chinese OCR mobile model|7.8M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9|
+|V2.0(slim)|Ultra-lightweight Chinese OCR mobile model|3.3M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9|
+
+If you deploy one of the models in the table above, you can skip the steps below and go straight to [Section 2.2](#2.2与手机联调).
+
+If the model you want to deploy is not in the table, follow the steps below to obtain an optimized model.
+
+Model optimization requires the Paddle-Lite opt executable, which can be obtained by compiling the Paddle-Lite source code as follows:
+```
+# Skip the clone if Paddle-Lite was already cloned while preparing the environment
+git clone https://github.com/PaddlePaddle/Paddle-Lite.git
+cd Paddle-Lite
+git checkout release/v2.9
+# Start the build
+./lite/tools/build.sh build_optimize_tool
+```
+
+After the build finishes, the opt executable is located in `build.opt/lite/api/`. Run it without arguments to see its options and usage:
+```
+cd build.opt/lite/api/
+./opt
+```
+
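+|Option|Description|
+|---|---|
+|--model_dir|Path to the PaddlePaddle model to optimize (non-combined form)|
+|--model_file|Path to the network structure file of the PaddlePaddle model to optimize (combined form)|
+|--param_file|Path to the weights file of the PaddlePaddle model to optimize (combined form)|
+|--optimize_out_type|Output model type. Two types are currently supported: protobuf and naive_buffer, where naive_buffer is a more lightweight serialization/deserialization implementation. To run inference on mobile devices, set this option to naive_buffer. The default is protobuf|
+|--optimize_out|Output path of the optimized model|
+|--valid_targets|Backends the model may run on; the default is arm. x86, arm, opencl, npu, and xpu are currently supported. Several backends can be given at once (separated by spaces), and the Model Optimize Tool picks the best option automatically. To support the Huawei NPU (the Da Vinci architecture NPU in Kirin 810/990 SoCs), set it to npu, arm|
+|--record_tailoring_info|Set to true when using the "tailor the library to the model" feature, to record the kernels and ops contained in the optimized model. The default is false|
+
+`--model_dir` applies when the model to optimize is in non-combined form. PaddleOCR inference models are in combined form, that is, the model structure and the model parameters are each stored in a single file.
+
+The following uses the PaddleOCR ultra-lightweight Chinese model as an example to show how the compiled opt executable converts an inference model into a Paddle-Lite optimized model.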
+
+```
+# [Recommended] Download the PaddleOCR V2.0 Chinese and English inference models
+wget  https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_slim_infer.tar && tar xf  ch_ppocr_mobile_v2.0_det_slim_infer.tar
+wget  https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_rec_slim_infer.tar && tar xf  ch_ppocr_mobile_v2.0_rec_slim_infer.tar
+wget  https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_cls_slim_infer.tar && tar xf  ch_ppocr_mobile_v2.0_cls_slim_infer.tar
+# Convert the V2.0 detection model
+./opt --model_file=./ch_ppocr_mobile_v2.0_det_slim_infer/inference.pdmodel  --param_file=./ch_ppocr_mobile_v2.0_det_slim_infer/inference.pdiparams  --optimize_out=./ch_ppocr_mobile_v2.0_det_slim_opt --valid_targets=arm  --optimize_out_type=naive_buffer
+# Convert the V2.0 recognition model
+./opt --model_file=./ch_ppocr_mobile_v2.0_rec_slim_infer/inference.pdmodel  --param_file=./ch_ppocr_mobile_v2.0_rec_slim_infer/inference.pdiparams  --optimize_out=./ch_ppocr_mobile_v2.0_rec_slim_opt --valid_targets=arm  --optimize_out_type=naive_buffer
+# Convert the V2.0 direction classifier model
+./opt --model_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdmodel  --param_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdiparams  --optimize_out=./ch_ppocr_mobile_v2.0_cls_slim_opt --valid_targets=arm  --optimize_out_type=naive_buffer
+
+```
+
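+After a successful conversion, a file ending in `.nb` appears in the inference model directory; this is the converted model file.
+
+Note: deployment with Paddle-Lite requires a model optimized by the opt tool. The input to the opt tool is an inference model saved by Paddle.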
+
+<a name="2.2与手机联调"></a>
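+### 2.2 Debugging with a phone
+
+Some preparation is needed first.
+ 1. Prepare an arm8 Android phone. If the compiled prediction library and opt file are armv7, an arm7 phone is needed, and the Makefile must be changed to `ARM_ABI = arm7`.
+ 2. Enable USB debugging on the phone, select file transfer mode, and connect the phone to the computer.
+ 3. Install the adb tool on the computer for debugging. adb is installed as follows:
+
+    3.1. Install ADB on macOS: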
+    ```
+    brew install --cask android-platform-tools
+    ```
+    3.2. Install ADB on Linux
+    ```
+    sudo apt update
+    sudo apt install -y wget adb
+    ```
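+    3.3. Install ADB on Windows
+
+    On Windows, download the adb package from Google's Android platform site and install it: [link](https://developer.android.com/studio)
+
+    Open a terminal, connect the phone to the computer, and run
+    ```
+    adb devices
+    ```
+    If a device is listed in the output, the installation succeeded.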
+    ```
+       List of devices attached
+       744be294    device
+    ```
+
+ 4. Prepare the optimized models, prediction library files, test image, and dictionary file.
+ ```
+ git clone https://github.com/PaddlePaddle/PaddleOCR.git
+ cd PaddleOCR/deploy/lite/
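+ # Run prepare.sh to collect the prediction library files, test image, and dictionary file, and place them under demo/cxx/ocr in the prediction library
+ sh prepare.sh /{lite prediction library path}/inference_lite_lib.android.armv8
+
+ # Enter the working directory of the OCR demo
+ cd /{lite prediction library path}/inference_lite_lib.android.armv8/
+ cd demo/cxx/ocr/
+ # Copy the C++ prediction shared library (.so) into the debug folder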
+ cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/
+ ```
+
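+ Prepare the test image: taking `PaddleOCR/doc/imgs/11.jpg` as an example, copy it into the `demo/cxx/ocr/debug/` folder.
+ Prepare the model files optimized by the lite opt tool, for example `ch_ppocr_mobile_v2.0_det_slim_opt.nb`, `ch_ppocr_mobile_v2.0_rec_slim_opt.nb`, and `ch_ppocr_mobile_v2.0_cls_slim_opt.nb`, and place them in the `demo/cxx/ocr/debug/` folder.
+
+ After these steps, the ocr folder has the following layout: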
+
+```
+demo/cxx/ocr/
+|-- debug/  
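+|   |--ch_ppocr_mobile_v2.0_det_slim_opt.nb           Optimized detection model file
+|   |--ch_ppocr_mobile_v2.0_rec_slim_opt.nb           Optimized recognition model file
+|   |--ch_ppocr_mobile_v2.0_cls_slim_opt.nb           Optimized text direction classifier model file
+|   |--11.jpg                           Test image
+|   |--ppocr_keys_v1.txt                Chinese dictionary file
+|   |--libpaddle_light_api_shared.so    C++ prediction library file
+|   |--config.txt                       Hyperparameter configuration
+|-- config.txt                  Hyperparameter configuration
+|-- cls_process.cc              Pre-processing and post-processing for the direction classifier
+|-- cls_process.h
+|-- crnn_process.cc             Pre-processing and post-processing for the CRNN recognition model
+|-- crnn_process.h
+|-- db_post_process.cc          Post-processing for the DB detection model
+|-- db_post_process.h
+|-- Makefile                    Build file
+|-- ocr_db_crnn.cc              C++ prediction source file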
+```
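
The `adb devices` check from step 3 can be scripted before pushing any files. A minimal sketch, assuming the output format shown in that step (one `serial  state` pair per line); `count_adb_devices` is an illustrative name:

```shell
# count_adb_devices: read `adb devices` output on stdin and print how many
# entries are in the "device" state, i.e. ready for adb push / adb shell.
# Other states (for example "unauthorized") and the header line are ignored.
count_adb_devices() {
  awk 'NF == 2 && $2 == "device" { n++ } END { print n + 0 }'
}
```

It would be used as `adb devices | count_adb_devices`; a result of `0` suggests that USB debugging is not enabled or not yet authorized on the phone.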
+
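+#### Note:
+1. ppocr_keys_v1.txt is the Chinese dictionary file. If the nb model in use is for English, digits, or another language, replace it with the dictionary for that language.
+PaddleOCR stores several dictionaries under ppocr/utils/, including: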
+```
+dict/french_dict.txt     # French dictionary
+dict/german_dict.txt     # German dictionary
+ic15_dict.txt       # English dictionary
+dict/japan_dict.txt      # Japanese dictionary
+dict/korean_dict.txt     # Korean dictionary
+ppocr_keys_v1.txt   # Chinese dictionary
+...
+```
+
+2.  `config.txt` contains the hyperparameters of the detector and classifier, as follows:
+```
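+max_side_len  960         # If the image's height or width exceeds 960, scale the image proportionally so that its longest side is 960
+det_db_thresh  0.3        # Used to filter the binarized image of the DB prediction; values between 0 and 0.3 have no obvious effect on the result
+det_db_box_thresh  0.5    # Threshold for filtering boxes in DB post-processing; lower it if boxes are being missed
+det_db_unclip_ratio  1.6  # Tightness of the text box; the smaller the value, the closer the box is to the text
+use_direction_classify  0  # Whether to use the direction classifier: 0 means no, 1 means yes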
+```
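
When tuning these values it can help to read one back from the file. A minimal sketch, assuming each line has the `key  value` shape shown above, with or without a trailing comment; `get_config` is an illustrative helper, not part of the demo:

```shell
# get_config KEY FILE: print the value of KEY from a config.txt-style file.
# Prints nothing if KEY is absent.
get_config() {
  awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}
```

For example, `get_config det_db_box_thresh debug/config.txt` shows the threshold that the copy pushed to the phone will use.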
+
+ 5. Start debugging
+
+ Once the steps above are complete, use adb to push the files to the phone and run them:
+
+ ```
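+ # Build the executable ocr_db_crnn. The first run of this command downloads dependencies such as opencv; once the download finishes, run it again
+ make -j
+
+ # Move the compiled executable into the debug folder
+ mv ocr_db_crnn ./debug/
+ # Push the debug folder to the phone
+ adb push debug /data/local/tmp/
+ adb shell
+ cd /data/local/tmp/debug
+ export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
+ # Usage of the ocr_db_crnn executable:
+ # ./ocr_db_crnn  <detection model file>  <recognition model file>  <direction classifier model file>  <test image path>  <dictionary file path>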
+ ./ocr_db_crnn ch_ppocr_mobile_v2.0_det_slim_opt.nb  ch_ppocr_mobile_v2.0_rec_slim_opt.nb  ch_ppocr_mobile_v2.0_cls_slim_opt.nb  ./11.jpg  ppocr_keys_v1.txt
+ ```
+
+ If you modify the code, recompile and push to the phone again.
+
+ The result looks like this:
+
+<div align="center">
+    <img src="imgs/lite_demo.png" width="600">
+</div>
+
+
+## FAQ
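+Q1: What if I want to switch models? Do I need to go through the whole process again?
+
+A1: Once the steps above work, switching models only requires replacing the .nb model files. Remember to update the dictionary as well.
+
+Q2: How do I test with a different image?
+
+A2: Replace the .jpg test image under debug with the image you want to test, then adb push it to the phone.
+
+Q3: How do I package this into a mobile app?
+
+A3: This demo is meant to provide the core algorithm for running OCR on a phone. PaddleOCR/deploy/android_demo is an example of packaging this demo into a mobile app, for reference.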
--- a/deploy/lite/readme_en.md
+++ b/deploy/lite/readme_en.md
+# Tutorial of PaddleOCR Mobile deployment
+
+This tutorial describes how to use [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) to deploy the PaddleOCR ultra-lightweight Chinese and English detection and recognition models on mobile phones.
+
+Paddle Lite is a lightweight inference engine for PaddlePaddle. It provides efficient inference for mobile phones and IoT devices, and integrates a wide range of cross-platform hardware to offer a lightweight solution for on-device deployment.
+
+## 1. Preparation
+
+### Requirements
+
+- Computer (for compiling Paddle Lite)
+- Android phone (armv7 or armv8)
+
+### 1.1 Prepare the cross-compilation environment
+The cross-compilation environment is used to compile the C++ demos of Paddle Lite and PaddleOCR.
+Several development environments are supported.
+
+For the compilation process in each development environment, please refer to the corresponding document.
+
+1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker)
+2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux)
+3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os)
+
+### 1.2 Prepare Paddle-Lite library
+
+There are two ways to obtain the Paddle-Lite library:
+- 1. Download it directly. The download links are as follows:
+
+      | Platform | Paddle-Lite library download link |
+      |---|---|
+      |Android|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.9/inference_lite_lib.android.armv7.gcc.c++_shared.with_extra.with_cv.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.9/inference_lite_lib.android.armv8.gcc.c++_shared.with_extra.with_cv.tar.gz)|
+      |IOS|[arm7](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.9/inference_lite_lib.ios.armv7.with_cv.with_extra.with_log.tiny_publish.tar.gz) / [arm8](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.9/inference_lite_lib.ios.armv8.with_cv.with_extra.with_log.tiny_publish.tar.gz)|
+
+      Note: The Paddle-Lite libraries above are compiled from the Paddle-Lite 2.9 branch. For more information about Paddle-Lite 2.9, please refer to this [link](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.9).
+
+- 2. [Recommended] Compile Paddle-Lite to get the prediction library. Paddle-Lite is compiled as follows:
+```
+git clone https://github.com/PaddlePaddle/Paddle-Lite.git
+cd Paddle-Lite
+# Switch to the stable Paddle-Lite release/v2.9 branch
+git checkout release/v2.9
+./lite/tools/build_android.sh  --arch=armv8  --with_cv=ON --with_extra=ON
+```
+
+Note: When compiling Paddle-Lite to obtain the Paddle-Lite library, you need to enable both `--with_cv=ON` and `--with_extra=ON`. `--arch` selects the `arm` version, here armv8.
+
+For more compilation commands, see this [link](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_andriod.html).
+
+After downloading and decompressing the prebuilt Paddle-Lite library, you get the `inference_lite_lib.android.armv8/` folder. A library built by compiling Paddle-Lite is located in the
+`Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/` folder.
+
+The structure of the prediction library is as follows:
+```
+inference_lite_lib.android.armv8/
+|-- cxx                                        C++ prediction library and header files
+|   |-- include                                C++ header files
+|   |   |-- paddle_api.h
+|   |   |-- paddle_image_preprocess.h
+|   |   |-- paddle_lite_factory_helper.h
+|   |   |-- paddle_place.h
+|   |   |-- paddle_use_kernels.h
+|   |   |-- paddle_use_ops.h
+|   |   `-- paddle_use_passes.h
+|   `-- lib                                           C++ library
+|       |-- libpaddle_api_light_bundled.a             C++ static library
+|       `-- libpaddle_light_api_shared.so             C++ dynamic library
+|-- java                                     Java library
+|   |-- jar
+|   |   `-- PaddlePredictor.jar
+|   |-- so
+|   |   `-- libpaddle_lite_jni.so
+|   `-- src
+|-- demo                                     C++ and Java demo
+|   |-- cxx                                  C++ demo
+|   `-- java                                 Java demo
+```
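
For the direct-download route, the two Android archive URLs differ only in the architecture part of the file name. A minimal sketch, assuming the v2.9 links in the table above remain valid; `lite_lib_url` is an illustrative helper, not part of Paddle-Lite:

```shell
# lite_lib_url ARCH: print the v2.9 Android download URL for armv7 or armv8.
# The URL pattern is copied from the download table; other values are rejected.
lite_lib_url() {
  case "$1" in
    armv7|armv8)
      echo "https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.9/inference_lite_lib.android.$1.gcc.c++_shared.with_extra.with_cv.tar.gz"
      ;;
    *)
      echo "unsupported arch: $1" >&2
      return 1
      ;;
  esac
}
```

The archive could then be fetched with `wget "$(lite_lib_url armv8)"` and unpacked with `tar xzf`.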
+
+## 2 Run
+
+### 2.1 Inference Model Optimization
+
+Paddle Lite provides a variety of strategies to automatically optimize the original model, including quantization, sub-graph fusion, hybrid scheduling, and kernel selection. To make optimization more convenient, Paddle Lite provides the opt tool, which completes these steps automatically and outputs a lightweight, optimized executable model.
+
+If you have prepared the model file ending in .nb, you can skip this step.
+
+The following table also provides a series of models that can be deployed on mobile phones to recognize Chinese. You can directly download the optimized model.
+
+|Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch|
+|---|---|---|---|---|---|---|
+|V2.0|extra-lightweight Chinese OCR optimized model|7.8M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9|
+|V2.0(slim)|extra-lightweight Chinese OCR optimized model|3.3M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9|
+
+If you directly use the model in the above table for deployment, you can skip the following steps and directly read [Section 2.2](#2.2 Run optimized model on Phone).
+
+If the model to be deployed is not in the above table, you need to follow the steps below to obtain the optimized model.
+
+The `opt` tool can be obtained by compiling Paddle Lite.
+```
+git clone https://github.com/PaddlePaddle/Paddle-Lite.git
+cd Paddle-Lite
+git checkout release/v2.9
+./lite/tools/build.sh build_optimize_tool
+```
+
+After the compilation is complete, the opt file is located under `build.opt/lite/api/`. Run it without arguments to view its options and usage:
+
+```
+cd build.opt/lite/api/
+./opt
+```
+
+|Options|Description|
+|---|---|
+|--model_dir|The path of the PaddlePaddle model to be optimized (non-combined form)|
+|--model_file|The network structure file path of the PaddlePaddle model (combined form) to be optimized|
+|--param_file|The weight file path of the PaddlePaddle model (combined form) to be optimized|
+|--optimize_out_type|Output model type, currently supports two types: protobuf and naive_buffer, among which naive_buffer is a more lightweight serialization/deserialization implementation. If you need to perform model prediction on the mobile side, please set this option to naive_buffer. The default is protobuf|
+|--optimize_out|The output path of the optimized model|
+|--valid_targets|Backends the model may run on; the default is arm. x86, arm, opencl, npu, and xpu are currently supported. Multiple backends can be specified at the same time (separated by spaces), and the Model Optimize Tool automatically selects the best option. To support the Huawei NPU (the DaVinci architecture NPU in Kirin 810/990 SoCs), set it to npu, arm|
+|--record_tailoring_info|Set this option to true when tailoring the library files to the model, to record the kernel and OP information contained in the optimized model. The default is false|
+
+`--model_dir` applies when the model to be optimized is in non-combined form. The inference models of PaddleOCR are in combined form, that is, the model structure and the model parameters are each stored in a single file.
+
+The following takes the ultra-lightweight Chinese model of PaddleOCR as an example to show how the compiled opt file converts an inference model into a Paddle-Lite optimized model.
+
+```
+# [Recommended] Download the Chinese and English inference models of PaddleOCR V2.0
+wget  https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_slim_infer.tar && tar xf  ch_ppocr_mobile_v2.0_det_slim_infer.tar
+wget  https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_rec_slim_infer.tar && tar xf  ch_ppocr_mobile_v2.0_rec_slim_infer.tar
+wget  https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_cls_slim_infer.tar && tar xf  ch_ppocr_mobile_v2.0_cls_slim_infer.tar
+# Convert V2.0 detection model
+./opt --model_file=./ch_ppocr_mobile_v2.0_det_slim_infer/inference.pdmodel  --param_file=./ch_ppocr_mobile_v2.0_det_slim_infer/inference.pdiparams  --optimize_out=./ch_ppocr_mobile_v2.0_det_slim_opt --valid_targets=arm  --optimize_out_type=naive_buffer
+# Convert V2.0 recognition model
+./opt --model_file=./ch_ppocr_mobile_v2.0_rec_slim_infer/inference.pdmodel  --param_file=./ch_ppocr_mobile_v2.0_rec_slim_infer/inference.pdiparams  --optimize_out=./ch_ppocr_mobile_v2.0_rec_slim_opt --valid_targets=arm  --optimize_out_type=naive_buffer
+# Convert V2.0 angle classifier model
+./opt --model_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdmodel  --param_file=./ch_ppocr_mobile_v2.0_cls_slim_infer/inference.pdiparams  --optimize_out=./ch_ppocr_mobile_v2.0_cls_slim_opt --valid_targets=arm  --optimize_out_type=naive_buffer
+
+```
+
+After a successful conversion, a file ending in `.nb` appears in the inference model directory; this is the converted model file.
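
The three conversions above differ only in the model kind (`det`, `rec`, `cls`), so they can be generated in a loop. A minimal sketch that prints the commands rather than running them, assuming the slim model archives were extracted into the current directory as shown; `opt_command` is an illustrative name:

```shell
# opt_command KIND: print the opt invocation for one slim model,
# where KIND is det, rec or cls. Flags are the ones used above.
opt_command() {
  opt_name="ch_ppocr_mobile_v2.0_${1}_slim"
  printf './opt --model_file=./%s_infer/inference.pdmodel --param_file=./%s_infer/inference.pdiparams --optimize_out=./%s_opt --valid_targets=arm --optimize_out_type=naive_buffer\n' \
    "$opt_name" "$opt_name" "$opt_name"
}

# Print one command per model kind for review.
for kind in det rec cls; do
  opt_command "$kind"
done
```

Printing first makes it easy to review the paths; piping the output to `sh` from the directory that contains `opt` would execute the conversions.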
+
+<a name="2.2 Run optimized model on Phone"></a>
+### 2.2 Run optimized model on Phone
+
+Some preparatory work is required first.
+ 1. Prepare an arm8 Android phone. If the compiled prediction library and opt file are armv7, you need an arm7 phone, and you must set `ARM_ABI = arm7` in the Makefile.
+ 2. Enable USB debugging on the phone, select file transfer mode, and connect the phone to the computer.
+ 3. Install the adb tool on the computer.
+
+    3.1. Install ADB for MAC:
+    ```
+    brew install --cask android-platform-tools
+    ```
+    3.2. Install ADB for Linux
+    ```
+    sudo apt update
+    sudo apt install -y wget adb
+    ```
+    3.3. Install ADB for Windows
+
+    On Windows, download the adb package from Google's Android platform site and install it: [link](https://developer.android.com/studio)
+
+    Verify that adb is installed: connect the phone to the computer and run
+    ```
+    adb devices
+    ```
+    If a device is listed in the output, the installation succeeded.
+    ```
+       List of devices attached
+       744be294    device
+    ```
+
+ 4. Prepare optimized models, prediction library files, test images and dictionary files used.
+ ```
+ git clone https://github.com/PaddlePaddle/PaddleOCR.git
+ cd PaddleOCR/deploy/lite/
+ # Run prepare.sh to place the prediction library files, test image, and dictionary file under demo/cxx/ocr in the prediction library
+ sh prepare.sh /{lite prediction library path}/inference_lite_lib.android.armv8
+
+ # Enter the working directory of the OCR demo
+ cd /{lite prediction library path}/inference_lite_lib.android.armv8/
+ cd demo/cxx/ocr/
+ # Copy the Paddle-Lite C++ .so file to the debug/ directory
+ cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/
+ ```
+
+Prepare the test image: taking `PaddleOCR/doc/imgs/11.jpg` as an example, copy the image file to the `demo/cxx/ocr/debug/` folder. Prepare the model files optimized by the lite opt tool, for example `ch_ppocr_mobile_v2.0_det_slim_opt.nb`, `ch_ppocr_mobile_v2.0_rec_slim_opt.nb`, and `ch_ppocr_mobile_v2.0_cls_slim_opt.nb`, and place them in the `demo/cxx/ocr/debug/` folder.
+
+The structure of the OCR demo is as follows after the above command is executed:
+
+```
+demo/cxx/ocr/
+|-- debug/  
+|   |--ch_ppocr_mobile_v2.0_det_slim_opt.nb           Detection model
+|   |--ch_ppocr_mobile_v2.0_rec_slim_opt.nb           Recognition model
+|   |--ch_ppocr_mobile_v2.0_cls_slim_opt.nb           Text direction classification model
+|   |--11.jpg                           Image for OCR
+|   |--ppocr_keys_v1.txt                Dictionary file
+|   |--libpaddle_light_api_shared.so    C++ .so file
+|   |--config.txt                       Config file
+|-- config.txt                  Config file
+|-- cls_process.cc              Pre-processing and post-processing files for the angle classifier
+|-- cls_process.h
+|-- crnn_process.cc             Pre-processing and post-processing files for the CRNN model
+|-- crnn_process.h
+|-- db_post_process.cc          Post-processing files for the DB model
+|-- db_post_process.h
+|-- Makefile  
+|-- ocr_db_crnn.cc              C++ main code
+```
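
Step 1 depends on whether the phone is arm7 or arm8. One way to find out is to read the phone's ABI property over adb and map it to the Makefile value. A minimal sketch, assuming adb is already working and that the Makefile's 64-bit value is `arm8` (only `arm7` is named explicitly above); `abi_to_arm_abi` is an illustrative name:

```shell
# abi_to_arm_abi ABI: map an Android ABI string to the ARM_ABI value
# expected by the demo Makefile. Unknown ABIs are rejected.
abi_to_arm_abi() {
  case "$1" in
    arm64-v8a)   echo arm8 ;;
    armeabi-v7a) echo arm7 ;;
    *)           echo "unsupported ABI: $1" >&2; return 1 ;;
  esac
}
```

With a phone attached, `abi_to_arm_abi "$(adb shell getprop ro.product.cpu.abi | tr -d '\r')"` prints the value to use.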
+
+#### Note:
+1. `ppocr_keys_v1.txt` is a Chinese dictionary file. If the nb model is used for English or another language, replace the dictionary file with the dictionary of the corresponding language. PaddleOCR provides a variety of dictionaries under ppocr/utils/, including:
+```
+dict/french_dict.txt     # French
+dict/german_dict.txt     # German
+ic15_dict.txt       # English
+dict/japan_dict.txt      # Japanese
+dict/korean_dict.txt     # Korean
+ppocr_keys_v1.txt   # Chinese
+```
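
Choosing the dictionary can be made explicit with a small lookup. A minimal sketch covering only the six files listed above; the language keys and the name `dict_for_language` are illustrative choices, not PaddleOCR conventions:

```shell
# dict_for_language NAME: print the dictionary path, relative to ppocr/utils/,
# for one of the languages listed above.
dict_for_language() {
  case "$1" in
    french)   echo dict/french_dict.txt ;;
    german)   echo dict/german_dict.txt ;;
    english)  echo ic15_dict.txt ;;
    japanese) echo dict/japan_dict.txt ;;
    korean)   echo dict/korean_dict.txt ;;
    chinese)  echo ppocr_keys_v1.txt ;;
    *)        echo "no dictionary listed for: $1" >&2; return 1 ;;
  esac
}
```

For example, `cp "PaddleOCR/ppocr/utils/$(dict_for_language french)" debug/` would stage the French dictionary next to the models.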
+
+2.  `config.txt` contains the hyperparameters of the detector and classifier, as shown below:
+```
+max_side_len  960         # If the image's height or width exceeds 960, scale the image proportionally so that its longest side is 960
+det_db_thresh  0.3        # Used to filter the binarized image of the DB prediction; values between 0 and 0.3 have no obvious effect on the result
+det_db_box_thresh  0.5    # Threshold for filtering boxes in DB post-processing; reduce it if boxes are being missed
+det_db_unclip_ratio  1.6  # Compactness of the text box; the smaller the value, the closer the text box is to the text
+use_direction_classify  0  # Whether to use the direction classifier: 0 means no, 1 means yes
+```
+
+ 5. Run Model on phone
+
+After the above steps are completed, use adb to push the files to the phone and run them:
+
+ ```
+ # Execute the compilation and get the executable file ocr_db_crnn
+ # The first execution of this command will download dependent libraries such as opencv. After the download is complete, you need to execute it again
+ make -j
+ # Move the compiled executable file to the debug folder
+ mv ocr_db_crnn ./debug/
+ # Push the debug folder to the phone
+ adb push debug /data/local/tmp/
+ adb shell
+ cd /data/local/tmp/debug
+ export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
+ # Usage of ocr_db_crnn:
+ # ./ocr_db_crnn  <detection model file>  <recognition model file>  <direction classifier model file>  <test image path>  <dictionary file path>
+ ./ocr_db_crnn ch_ppocr_mobile_v2.0_det_slim_opt.nb  ch_ppocr_mobile_v2.0_rec_slim_opt.nb  ch_ppocr_mobile_v2.0_cls_slim_opt.nb  ./11.jpg  ppocr_keys_v1.txt
+ ```
+
+If you modify the code, you need to recompile and push to the phone.
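
For repeated runs, the interactive `adb shell` session can be replaced by a single command string. A minimal sketch, assuming the slim model files from the directory listing above are in `debug/` and keeping the argument order of the command shown above; `remote_ocr_command` is an illustrative name:

```shell
# remote_ocr_command IMAGE: print the command line to run on the phone for
# IMAGE (a file name inside /data/local/tmp/debug). $PWD and
# $LD_LIBRARY_PATH are left unexpanded so that the phone's shell expands them.
remote_ocr_command() {
  printf 'cd /data/local/tmp/debug && export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH && ./ocr_db_crnn ch_ppocr_mobile_v2.0_det_slim_opt.nb ch_ppocr_mobile_v2.0_rec_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb ./%s ppocr_keys_v1.txt\n' "$1"
}
```

After `adb push debug /data/local/tmp/`, running `adb shell "$(remote_ocr_command 11.jpg)"` would execute the demo without opening an interactive session.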
+
+The outputs are as follows:
+
+<div align="center">
+    <img src="imgs/lite_demo.png" width="600">
+</div>
+
+## FAQ
+
+Q1: What if I want to change the model? Do I need to go through the whole process again?
+
+A1: If you have completed the steps above, you only need to replace the .nb model files to switch models. Remember to update the dictionary as well.
+
+Q2: How do I test with another picture?
+
+A2: Replace the .jpg test image under ./debug with the image you want to test, and run adb push to push the new image to the phone.
+
+Q3: How do I package it into a mobile app?
+
+A3: This demo aims to provide the core algorithm for running OCR on mobile phones. PaddleOCR/deploy/android_demo is an example of packaging this demo into a mobile app, for reference.
--- a/deploy/pdserving/README.md
+++ b/deploy/pdserving/README.md
@@ -30,38 +30,32 @@ The introduction and tutorial of Paddle Serving service deployment framework ref
 PaddleOCR operating environment and Paddle Serving operating environment are needed.

 1. Please prepare PaddleOCR operating environment reference [link](../../doc/doc_ch/installation.md).
+   Download the paddle whl package that matches your environment; version 2.0.1 is recommended.
+

 2. The steps of PaddleServing operating environment prepare are as follows:

    Install serving which used to start the service
    ```
-    pip3 install paddle-serving-server==0.5.0 # for CPU
-    pip3 install paddle-serving-server-gpu==0.5.0 # for GPU
+    pip3 install paddle-serving-server==0.6.0 # for CPU
+    pip3 install paddle-serving-server-gpu==0.6.0 # for GPU
    # Other GPU environments need to confirm the environment and then choose to execute the following commands
-    pip3 install paddle-serving-server-gpu==0.5.0.post9 # GPU with CUDA9.0
-    pip3 install paddle-serving-server-gpu==0.5.0.post10 # GPU with CUDA10.0
-    pip3 install paddle-serving-server-gpu==0.5.0.post101 # GPU with CUDA10.1 + TensorRT6
-    pip3 install paddle-serving-server-gpu==0.5.0.post11 # GPU with CUDA10.1 + TensorRT7
+    pip3 install paddle-serving-server-gpu==0.6.0.post101 # GPU with CUDA10.1 + TensorRT6
+    pip3 install paddle-serving-server-gpu==0.6.0.post11 # GPU with CUDA11 + TensorRT7
    ```

 3. Install the client to send requests to the service
-    ```
-    pip3 install paddle-serving-client==0.5.0 # for CPU
+    Find the client installation package for your Python version at this [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
+    Python 3.7 is recommended here:

-    pip3 install paddle-serving-client-gpu==0.5.0 # for GPU
+    ```
+    wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
+    pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
    ```

 4. Install serving-app
    ```
-    pip3 install paddle-serving-app==0.3.0
-    # fix local_predict to support load dynamic model
-    # find the install directoory of paddle_serving_app
-    vim /usr/local/lib/python3.7/site-packages/paddle_serving_app/local_predict.py
-    # replace line 85 of local_predict.py config = AnalysisConfig(model_path) with:
-    if os.path.exists(os.path.join(model_path, "__params__")):
-        config = AnalysisConfig(os.path.join(model_path, "__model__"), os.path.join(model_path, "__params__"))
-    else:
-        config = AnalysisConfig(model_path)
+    pip3 install paddle-serving-app==0.6.0
    ```

   **note:** If you want to install the latest version of PaddleServing, refer to [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
@@ -74,38 +68,38 @@ When using PaddleServing for service deployment, you need to convert the saved i
 Firstly, download the [inference model](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-20-series-model-listupdate-on-dec-15) of PPOCR
 ```
 # Download and unzip the OCR text detection model
-wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar && tar xf ch_ppocr_server_v2.0_det_infer.tar
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
 # Download and unzip the OCR text recognition model
-wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar && tar xf ch_ppocr_server_v2.0_rec_infer.tar
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar

 ```
-Then, you can use installed paddle_serving_client tool to convert inference model to server model.
+Then, use the installed paddle_serving_client tool to convert the mobile inference models into serving models.
 ```
 #  Detection model conversion
-python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_server_v2.0_det_infer/ \
+python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_mobile_v2.0_det_infer/ \
                                         --model_filename inference.pdmodel          \
                                         --params_filename inference.pdiparams       \
-                                         --serving_server ./ppocr_det_server_2.0_serving/ \
-                                         --serving_client ./ppocr_det_server_2.0_client/
+                                         --serving_server ./ppocr_det_mobile_2.0_serving/ \
+                                         --serving_client ./ppocr_det_mobile_2.0_client/

 #  Recognition model conversion
-python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_server_v2.0_rec_infer/ \
+python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_mobile_v2.0_rec_infer/ \
                                         --model_filename inference.pdmodel          \
                                         --params_filename inference.pdiparams       \
-                                         --serving_server ./ppocr_rec_server_2.0_serving/  \
-                                         --serving_client ./ppocr_rec_server_2.0_client/
+                                         --serving_server ./ppocr_rec_mobile_2.0_serving/  \
+                                         --serving_client ./ppocr_rec_mobile_2.0_client/

 ```

-After the detection model is converted, there will be additional folders of `ppocr_det_server_2.0_serving` and `ppocr_det_server_2.0_client` in the current folder, with the following format:
+After the detection model is converted, two new folders, `ppocr_det_mobile_2.0_serving` and `ppocr_det_mobile_2.0_client`, appear in the current folder with the following layout:
 ```
-|- ppocr_det_server_2.0_serving/
+|- ppocr_det_mobile_2.0_serving/
   |- __model__
   |- __params__
   |- serving_server_conf.prototxt
   |- serving_server_conf.stream.prototxt

-|- ppocr_det_server_2.0_client
+|- ppocr_det_mobile_2.0_client
   |- serving_client_conf.prototxt
   |- serving_client_conf.stream.prototxt

@@ -147,6 +141,78 @@ The recognition model is the same.
    After a successful run, the model's prediction results are printed in the cmd window. An example of the result is:
    ![](./imgs/results.png)  

+    Adjust the `concurrency` values in config.yml to maximize QPS. As a rule of thumb, set the detection and recognition concurrency in a 2:1 ratio:
+
+    ```
+    det:
+        concurrency: 8
+        ...
+    rec:
+        concurrency: 4
+        ...
+    ```
+
+    Multiple service requests can be sent at the same time if necessary.
+
+    Prediction performance data is written automatically to the `PipelineServingLogs/pipeline.tracer` file.
+
+    In a test on 200 real images with the detection long side limited to 960, the average QPS on a T4 GPU reached about 23:
+
+    ```
+
+    2021-05-13 03:42:36,895 ==================== TRACER ======================
+    2021-05-13 03:42:36,975 Op(rec):
+    2021-05-13 03:42:36,976         in[14.472382882882883 ms]
+    2021-05-13 03:42:36,976         prep[9.556855855855856 ms]
+    2021-05-13 03:42:36,976         midp[59.921905405405404 ms]
+    2021-05-13 03:42:36,976         postp[15.345945945945946 ms]
+    2021-05-13 03:42:36,976         out[1.9921216216216215 ms]
+    2021-05-13 03:42:36,976         idle[0.16254943864471572]
+    2021-05-13 03:42:36,976 Op(det):
+    2021-05-13 03:42:36,976         in[315.4468035714286 ms]
+    2021-05-13 03:42:36,976         prep[69.5980625 ms]
+    2021-05-13 03:42:36,976         midp[18.989535714285715 ms]
+    2021-05-13 03:42:36,976         postp[18.857803571428573 ms]
+    2021-05-13 03:42:36,977         out[3.1337544642857145 ms]
+    2021-05-13 03:42:36,977         idle[0.7477961159203756]
+    2021-05-13 03:42:36,977 DAGExecutor:
+    2021-05-13 03:42:36,977         Query count[224]
+    2021-05-13 03:42:36,977         QPS[22.4 q/s]
+    2021-05-13 03:42:36,977         Succ[0.9910714285714286]
+    2021-05-13 03:42:36,977         Error req[169, 170]
+    2021-05-13 03:42:36,977         Latency:
+    2021-05-13 03:42:36,977                 ave[535.1678348214285 ms]
+    2021-05-13 03:42:36,977                 .50[172.651 ms]
+    2021-05-13 03:42:36,977                 .60[187.904 ms]
+    2021-05-13 03:42:36,977                 .70[245.675 ms]
+    2021-05-13 03:42:36,977                 .80[526.684 ms]
+    2021-05-13 03:42:36,977                 .90[854.596 ms]
+    2021-05-13 03:42:36,977                 .95[1722.728 ms]
+    2021-05-13 03:42:36,977                 .99[3990.292 ms]
+    2021-05-13 03:42:36,978 Channel (server worker num[10]):
+    2021-05-13 03:42:36,978         chl0(In: ['@DAGExecutor'], Out: ['det']) size[0/0]
+    2021-05-13 03:42:36,979         chl1(In: ['det'], Out: ['rec']) size[6/0]
+    2021-05-13 03:42:36,979         chl2(In: ['rec'], Out: ['@DAGExecutor']) size[0/0]
+    ```
+
+## WINDOWS Users
+
+Windows does not support Pipeline Serving. To launch Paddle Serving on Windows, use Web Service instead. For more information, see [Paddle Serving for Windows Users](https://github.com/PaddlePaddle/Serving/blob/develop/doc/WINDOWS_TUTORIAL.md)
+
+
+1. Start Server
+
+```
+cd win
+python3 ocr_web_server.py
+```
+
+2. Client Send Requests
+
+```
+python3 ocr_web_client.py
+```
+
 <a name="faq"></a>
 ## FAQ
 **Q1**: No result is returned after sending the request.
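
The tracer output added to the README in this hunk can be summarized programmatically. The following is a minimal sketch, assuming the log keeps the `name[value unit]` layout shown in the sample; `parse_tracer` is a hypothetical helper and not part of Paddle Serving.

```python
import re

# Matches entries such as "QPS[22.4 q/s]" or ".99[3990.292 ms]".
_ENTRY = re.compile(
    r"(?P<name>[\w. ]+?)\[(?P<value>-?\d+(?:\.\d+)?)(?:\s*(?P<unit>[\w/]+))?\]"
)
# Matches the "2021-05-13 03:42:36,977 " timestamp prefix of each log line.
_TIMESTAMP = re.compile(r"^\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+\s*")


def parse_tracer(lines):
    """Collect numeric tracer entries, grouped by section (Op(det), DAGExecutor, ...)."""
    result = {}
    section = None
    for raw in lines:
        text = _TIMESTAMP.sub("", raw).strip()
        # A line such as "Op(rec):" or "Latency:" opens a new section.
        if text.endswith(":") and "[" not in text:
            section = text[:-1]
            result.setdefault(section, {})
            continue
        match = _ENTRY.fullmatch(text)
        # Non-numeric entries (for example "Error req[169, 170]") are skipped.
        if match and section is not None:
            result[section][match.group("name").strip()] = float(match.group("value"))
    return result
```

Calling `parse_tracer(open("PipelineServingLogs/pipeline.tracer"))` would then give access to values such as `result["DAGExecutor"]["QPS"]`. Note that in this sketch the percentile entries end up under a separate `"Latency"` section because that line also ends with a colon.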

--- a/deploy/pdserving/README_CN.md
+++ b/deploy/pdserving/README_CN.md
@@ -29,41 +29,31 @@ PaddleOCR提供2种服务部署方式：

 Both the PaddleOCR and the Paddle Serving runtime environments need to be prepared.

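- To prepare the PaddleOCR runtime environment, see this [link](../../doc/doc_ch/installation.md)
+- Prepare the PaddleOCR runtime environment: [link](../../doc/doc_ch/installation.md)
+  Download the paddle whl package that matches your environment; version 2.0.1 is recommended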

 - Prepare the PaddleServing runtime environment as follows

 1. Install serving, which is used to start the service
    ```
-    pip3 install paddle-serving-server==0.5.0 # for CPU
-    pip3 install paddle-serving-server-gpu==0.5.0 # for GPU
+    pip3 install paddle-serving-server==0.6.0 # for CPU
+    pip3 install paddle-serving-server-gpu==0.6.0 # for GPU
    # For other GPU environments, confirm your setup before running one of the following commands
-    pip3 install paddle-serving-server-gpu==0.5.0.post9 # GPU with CUDA9.0
-    pip3 install paddle-serving-server-gpu==0.5.0.post10 # GPU with CUDA10.0
-    pip3 install paddle-serving-server-gpu==0.5.0.post101 # GPU with CUDA10.1 + TensorRT6
-    pip3 install paddle-serving-server-gpu==0.5.0.post11 # GPU with CUDA10.1 + TensorRT7
+    pip3 install paddle-serving-server-gpu==0.6.0.post101 # GPU with CUDA10.1 + TensorRT6
+    pip3 install paddle-serving-server-gpu==0.6.0.post11 # GPU with CUDA11 + TensorRT7
    ```

 2. Install the client, which is used to send requests to the service
-    ```
-    pip3 install paddle-serving-client==0.5.0  # for CPU
+    Find the client package for your Python version at the [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md); Python 3.7 is recommended here:

-    pip3 install paddle-serving-client-gpu==0.5.0   # for GPU
+    ```
+    wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
+    pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
    ```

 3. Install serving-app
    ```
-    pip3 install paddle-serving-app==0.3.0
-    ```
-    **note:**  After installing serving-app 0.3.0, the serving_app source code must be modified so that it can load dynamic-graph models, as follows:
-    ```
-    # Locate the paddle_serving_app install directory, then find and edit local_predict.py
-    vim /usr/local/lib/python3.7/site-packages/paddle_serving_app/local_predict.py
-    # Replace line 85 of local_predict.py, config = AnalysisConfig(model_path), with:
-    if os.path.exists(os.path.join(model_path, "__params__")):
-        config = AnalysisConfig(os.path.join(model_path, "__model__"), os.path.join(model_path, "__params__"))
-    else:
-        config = AnalysisConfig(model_path)
+    pip3 install paddle-serving-app==0.6.0
    ```

    **Note:** To install the latest version of PaddleServing, see this [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
@@ -76,38 +66,38 @@ PaddleOCR提供2种服务部署方式：
 First, download the PPOCR [inference models](https://github.com/PaddlePaddle/PaddleOCR#pp-ocr-20-series-model-listupdate-on-dec-15)
 ```
 # Download and unzip the OCR text detection model
-wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar && tar xf ch_ppocr_server_v2.0_det_infer.tar
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
 # Download and unzip the OCR text recognition model
-wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar && tar xf ch_ppocr_server_v2.0_rec_infer.tar
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
 ```

 Next, use the installed paddle_serving_client to convert the downloaded inference models into a format that is easy to deploy on the server.

 ```
 # Convert the detection model
-python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_server_v2.0_det_infer/ \
+python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_mobile_v2.0_det_infer/ \
                                         --model_filename inference.pdmodel          \
                                         --params_filename inference.pdiparams       \
-                                         --serving_server ./ppocr_det_server_2.0_serving/ \
-                                         --serving_client ./ppocr_det_server_2.0_client/
+                                         --serving_server ./ppocr_det_mobile_2.0_serving/ \
+                                         --serving_client ./ppocr_det_mobile_2.0_client/

 # Convert the recognition model
-python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_server_v2.0_rec_infer/ \
+python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_mobile_v2.0_rec_infer/ \
                                         --model_filename inference.pdmodel          \
                                         --params_filename inference.pdiparams       \
-                                         --serving_server ./ppocr_rec_server_2.0_serving/  \
-                                         --serving_client ./ppocr_rec_server_2.0_client/
+                                         --serving_server ./ppocr_rec_mobile_2.0_serving/  \
+                                         --serving_client ./ppocr_rec_mobile_2.0_client/
 ```

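-After the detection model is converted, two new folders, `ppocr_det_server_2.0_serving` and `ppocr_det_server_2.0_client`, appear in the current folder with the following layout:
+After the detection model is converted, two new folders, `ppocr_det_mobile_2.0_serving` and `ppocr_det_mobile_2.0_client`, appear in the current folder with the following layout: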
 ```
-|- ppocr_det_server_2.0_serving/
+|- ppocr_det_mobile_2.0_serving/
  |- __model__  
  |- __params__
  |- serving_server_conf.prototxt  
  |- serving_server_conf.stream.prototxt

-|- ppocr_det_server_2.0_client
+|- ppocr_det_mobile_2.0_client
  |- serving_client_conf.prototxt  
  |- serving_client_conf.stream.prototxt

@@ -148,6 +138,77 @@ python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_server_v2.0_rec_in
    After a successful run, the model's prediction results are printed in the cmd window. An example of the result is:
    ![](./imgs/results.png)

+    Adjust the concurrency values in config.yml to maximize QPS. As a rule of thumb, set the detection and recognition concurrency in a 2:1 ratio
+    ```
+    det:
+        #Concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
+        concurrency: 8
+        ...
+    rec:
+        #Concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
+        concurrency: 4
+        ...
+    ```
+    Multiple service requests can be sent at the same time if necessary.
+
+    Prediction performance data is written automatically to the `PipelineServingLogs/pipeline.tracer` file.
+
+    In a test on 200 real images with the detection long side limited to 960, the average QPS on a T4 GPU reached about 23:
+
+    ```
+    2021-05-13 03:42:36,895 ==================== TRACER ======================
+    2021-05-13 03:42:36,975 Op(rec):
+    2021-05-13 03:42:36,976         in[14.472382882882883 ms]
+    2021-05-13 03:42:36,976         prep[9.556855855855856 ms]
+    2021-05-13 03:42:36,976         midp[59.921905405405404 ms]
+    2021-05-13 03:42:36,976         postp[15.345945945945946 ms]
+    2021-05-13 03:42:36,976         out[1.9921216216216215 ms]
+    2021-05-13 03:42:36,976         idle[0.16254943864471572]
+    2021-05-13 03:42:36,976 Op(det):
+    2021-05-13 03:42:36,976         in[315.4468035714286 ms]
+    2021-05-13 03:42:36,976         prep[69.5980625 ms]
+    2021-05-13 03:42:36,976         midp[18.989535714285715 ms]
+    2021-05-13 03:42:36,976         postp[18.857803571428573 ms]
+    2021-05-13 03:42:36,977         out[3.1337544642857145 ms]
+    2021-05-13 03:42:36,977         idle[0.7477961159203756]
+    2021-05-13 03:42:36,977 DAGExecutor:
+    2021-05-13 03:42:36,977         Query count[224]
+    2021-05-13 03:42:36,977         QPS[22.4 q/s]
+    2021-05-13 03:42:36,977         Succ[0.9910714285714286]
+    2021-05-13 03:42:36,977         Error req[169, 170]
+    2021-05-13 03:42:36,977         Latency:
+    2021-05-13 03:42:36,977                 ave[535.1678348214285 ms]
+    2021-05-13 03:42:36,977                 .50[172.651 ms]
+    2021-05-13 03:42:36,977                 .60[187.904 ms]
+    2021-05-13 03:42:36,977                 .70[245.675 ms]
+    2021-05-13 03:42:36,977                 .80[526.684 ms]
+    2021-05-13 03:42:36,977                 .90[854.596 ms]
+    2021-05-13 03:42:36,977                 .95[1722.728 ms]
+    2021-05-13 03:42:36,977                 .99[3990.292 ms]
+    2021-05-13 03:42:36,978 Channel (server worker num[10]):
+    2021-05-13 03:42:36,978         chl0(In: ['@DAGExecutor'], Out: ['det']) size[0/0]
+    2021-05-13 03:42:36,979         chl1(In: ['det'], Out: ['rec']) size[6/0]
+    2021-05-13 03:42:36,979         chl2(In: ['rec'], Out: ['@DAGExecutor']) size[0/0]
+    ```
+
+## Windows Users
+
+Windows users cannot use the startup method described above and need to use Web Service instead. For details, see the [guide to using Paddle Serving on Windows](https://github.com/PaddlePaddle/Serving/blob/develop/doc/WINDOWS_TUTORIAL_CN.md)
+
+
+1. Start the server program
+
+```
+cd win 
+python3 ocr_web_server.py
+```
+
+2. Send service requests
+
+```
+python3 ocr_web_client.py
+```
+

 <a name="FAQ"></a>
 ## FAQ
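
The DAGExecutor figures in the sample tracer output are consistent with each other, which helps when reading the log. A small check, assuming that QPS is the query count divided by the tracer interval (`interval_s: 10` in config.yml) and that the two ids listed under `Error req` are the only failed requests; `tracer_summary` is a hypothetical helper:

```python
def tracer_summary(query_count, failed, interval_s):
    """Return (qps, success_ratio) for one tracer interval."""
    qps = query_count / interval_s
    success_ratio = (query_count - failed) / query_count
    return qps, success_ratio


# Values taken from the sample log: Query count[224], Error req[169, 170].
qps, succ = tracer_summary(query_count=224, failed=2, interval_s=10)
print(qps, succ)
```

Under these assumptions the result lines up with the `QPS[22.4 q/s]` and `Succ[0.9910714285714286]` entries in the log.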

--- a/deploy/pdserving/config.yml
+++ b/deploy/pdserving/config.yml
 #RPC port. rpc_port and http_port must not both be empty. When rpc_port is empty and http_port is not, rpc_port is automatically set to http_port+1
-rpc_port: 18090
+rpc_port: 18091

 #HTTP port. rpc_port and http_port must not both be empty. When rpc_port is available and http_port is empty, http_port is not generated automatically
-http_port: 9999
+http_port: 9998

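 #worker_num, the maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building its own gRPC server and DAG
 ##When build_dag_each_worker=False, the framework sets max_workers=worker_num for the gRPC thread pool of the main thread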
-worker_num: 20
+worker_num: 10

 #build_dag_each_worker: False, the framework creates one DAG inside the process; True, the framework creates multiple independent DAGs in each process
-build_dag_each_worker: false
+build_dag_each_worker: False

 dag:
    #op resource type: True for the thread model, False for the process model
    is_thread_op: False

    #Number of retries
-    retry: 1
+    retry: 10

    #Enable profiling: True generates Timeline performance data, with some impact on performance; False disables it
-    use_profile: False
+    use_profile: True
    
    tracer:
        interval_s: 10
 op:
    det:
        #Concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
-        concurrency: 4
+        concurrency: 8

        #When the op config has no server_endpoints, the local service configuration is read from local_service_conf
        local_service_conf:
@@ -34,18 +34,18 @@ op:
            client_type: local_predictor

            #det model path
-            model_config: /paddle/serving/models/det_serving_server/ #ocr_det_model
+            model_config: ./ppocr_det_mobile_2.0_serving

            #Fetch result list, based on the alias_name of fetch_var in client_config
            fetch_list: ["save_infer_model/scale_0.tmp_1"]

            #Compute device IDs: CPU prediction when devices is "" or omitted; GPU prediction when devices is "0" or "0,1,2", indicating which GPU cards to use
-            devices: "2"
+            devices: "0"
            
            ir_optim: True
    rec:
        #Concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
-        concurrency: 1
+        concurrency: 4

        #Timeout, in ms
        timeout: -1
@@ -60,12 +60,12 @@ op:
            client_type: local_predictor

            #rec model path
-            model_config: /paddle/serving/models/rec_serving_server/ #ocr_rec_model
+            model_config: ./ppocr_rec_mobile_2.0_serving

            #Fetch result list, based on the alias_name of fetch_var in client_config
-            fetch_list: ["save_infer_model/scale_0.tmp_1"] #["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"] 
+            fetch_list: ["save_infer_model/scale_0.tmp_1"]  

            #Compute device IDs: CPU prediction when devices is "" or omitted; GPU prediction when devices is "0" or "0,1,2", indicating which GPU cards to use
-            devices: "2"
+            devices: "0"
            
            ir_optim: True
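
The port rules stated in the config.yml comments can be written down as a short function. This is only an illustration of the rule as the comments describe it, using a hypothetical `resolve_ports` helper; it is not how Paddle Serving implements it internally.

```python
def resolve_ports(rpc_port=None, http_port=None):
    """Apply the port rules from the config.yml comments (illustrative only)."""
    if rpc_port is None and http_port is None:
        raise ValueError("rpc_port and http_port must not both be empty")
    if rpc_port is None:
        # rpc_port empty, http_port set: rpc_port becomes http_port + 1.
        rpc_port = http_port + 1
    # rpc_port set, http_port empty: http_port is left empty.
    return rpc_port, http_port
```

With the values in this commit, `resolve_ports(18091, 9998)` returns both ports unchanged; the fallback only matters when `rpc_port` is left out.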
--- a/deploy/pdserving/ocr_reader.py
+++ b/deploy/pdserving/ocr_reader.py
@@ -21,7 +21,6 @@ import sys
 import argparse
 import string
 from copy import deepcopy
-import paddle


 class DetResizeForTest(object):
@@ -34,12 +33,12 @@ class DetResizeForTest(object):
        elif 'limit_side_len' in kwargs:
            self.limit_side_len = kwargs['limit_side_len']
            self.limit_type = kwargs.get('limit_type', 'min')
-        elif 'resize_long' in kwargs:
-            self.resize_type = 2
-            self.resize_long = kwargs.get('resize_long', 960)
-        else:
+        elif 'resize_short' in kwargs:
            self.limit_side_len = 736
            self.limit_type = 'min'
+        else:
+            self.resize_type = 2
+            self.resize_long = kwargs.get('resize_long', 960)

    def __call__(self, data):
        img = deepcopy(data)
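
After this hunk, a `DetResizeForTest` created without `image_shape`, `limit_side_len`, or `resize_short` falls back to `resize_type = 2` with `resize_long` defaulting to 960. The sketch below reproduces only the size arithmetic of `resize_image_type2` as it appears in the `win/ocr_reader.py` copy added in the same commit; `long_side_target` is a hypothetical name and the actual `cv2.resize` call is left out.

```python
def long_side_target(h, w, resize_long=960, max_stride=128):
    """Return the (height, width) that resize_image_type2 would resize to."""
    # Scale so that the longer side equals resize_long.
    ratio = float(resize_long) / max(h, w)
    new_h = int(h * ratio)
    new_w = int(w * ratio)
    # Round each side up to the next multiple of max_stride.
    new_h = (new_h + max_stride - 1) // max_stride * max_stride
    new_w = (new_w + max_stride - 1) // max_stride * max_stride
    return new_h, new_w


print(long_side_target(1080, 1920))  # (640, 1024)
```

Because 960 is not a multiple of 128, the long side ends up at 1024 after rounding, so the network input can be slightly larger than the nominal limit.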
@@ -227,8 +226,6 @@ class CTCLabelDecode(BaseRecLabelDecode):
        super(CTCLabelDecode, self).__init__(config)

    def __call__(self, preds, label=None, *args, **kwargs):
-        if isinstance(preds, paddle.Tensor):
-            preds = preds.numpy()
        preds_idx = preds.argmax(axis=2)
        preds_prob = preds.max(axis=2)
        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
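
`CTCLabelDecode.__call__` takes the argmax and the max probability per time step and then calls `decode` with `is_remove_duplicate=True`. The following pure-Python sketch shows the same greedy CTC idea on nested lists, assuming index 0 is the blank token as in `get_ignored_tokens`; `ctc_greedy_decode` is a hypothetical stand-in, and unlike the original it returns a score of 0.0 rather than a NaN mean when nothing is decoded.

```python
def ctc_greedy_decode(probs, charset, blank=0):
    """Greedy CTC decoding for one sequence.

    probs: list of time steps, each a list of class probabilities.
    charset: list of characters, with a placeholder at the blank index.
    """
    chars, confs = [], []
    prev = None
    for step in probs:
        # Argmax over the classes at this time step.
        idx = max(range(len(step)), key=step.__getitem__)
        # Skip blanks and repeats of the previous time step's index.
        if idx != blank and idx != prev:
            chars.append(charset[idx])
            confs.append(step[idx])
        prev = idx
    score = sum(confs) / len(confs) if confs else 0.0
    return "".join(chars), score
```

A blank between two identical indices keeps both characters, which is how CTC represents genuinely doubled letters.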

--- a/deploy/pdserving/pipeline_http_client.py
+++ b/deploy/pdserving/pipeline_http_client.py
@@ -23,8 +23,8 @@ def cv2_to_base64(image):
    return base64.b64encode(image).decode('utf8')


-url = "http://127.0.0.1:9999/ocr/prediction"
-test_img_dir = "../doc/imgs/"
+url = "http://127.0.0.1:9998/ocr/prediction"
+test_img_dir = "../../doc/imgs/"
 for idx, img_file in enumerate(os.listdir(test_img_dir)):
    with open(os.path.join(test_img_dir, img_file), 'rb') as file:
        image_data1 = file.read()
@@ -36,5 +36,5 @@ for idx, img_file in enumerate(os.listdir(test_img_dir)):
        r = requests.post(url=url, data=json.dumps(data))
        print(r.json())

-test_img_dir = "../doc/imgs/"
+test_img_dir = "../../doc/imgs/"
 print("==> total number of test imgs: ", len(os.listdir(test_img_dir)))
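
The README notes that several requests can be sent at the same time. A minimal sketch of doing that from one client process is shown below. It assumes the request body has the layout `{"key": ["image"], "value": [...]}`, which is not visible in this diff and should be checked against your copy of `pipeline_http_client.py`; `build_payload` and `send_all` are hypothetical helpers.

```python
import base64
import json
from concurrent.futures import ThreadPoolExecutor


def build_payload(image_bytes):
    """Base64-encode raw image bytes, as cv2_to_base64 does in the client script."""
    image = base64.b64encode(image_bytes).decode("utf8")
    # Assumed request layout; verify against pipeline_http_client.py.
    return json.dumps({"key": ["image"], "value": [image]})


def send_all(payloads, send_one, workers=4):
    """Send every payload through send_one on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_one, payloads))
```

Here `send_one` could be a small function that wraps `requests.post(url=url, data=payload)` against `http://127.0.0.1:9998/ocr/prediction`. Passing the sender in as a parameter keeps the sketch independent of any particular HTTP library.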
--- a/deploy/pdserving/pipeline_rpc_client.py
+++ b/deploy/pdserving/pipeline_rpc_client.py
@@ -23,7 +23,7 @@ import base64
 import os

 client = PipelineClient()
-client.connect(['127.0.0.1:18090'])
+client.connect(['127.0.0.1:18091'])


 def cv2_to_base64(image):
@@ -39,4 +39,3 @@ for img_file in os.listdir(test_img_dir):
 for i in range(1):
    ret = client.predict(feed_dict={"image": image}, fetch=["res"])
    print(ret)
-    #print(ret)
--- a/deploy/pdserving/web_service.py
+++ b/deploy/pdserving/web_service.py
@@ -11,10 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-try:
-    from paddle_serving_server_gpu.web_service import WebService, Op
-except ImportError:
-    from paddle_serving_server.web_service import WebService, Op
+from paddle_serving_server.web_service import WebService, Op

 import logging
 import numpy as np
@@ -48,28 +45,24 @@ class DetOp(Op):
    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
        data = base64.b64decode(input_dict["image"].encode('utf8'))
+        self.raw_im = data
        data = np.fromstring(data, np.uint8)
        # Note: class variables(self.var) can only be used in process op mode
        im = cv2.imdecode(data, cv2.IMREAD_COLOR)
-        self.im = im
        self.ori_h, self.ori_w, _ = im.shape
-
-        det_img = self.det_preprocess(self.im)
+        det_img = self.det_preprocess(im)
        _, self.new_h, self.new_w = det_img.shape
-        print("det image shape", det_img.shape)
        return {"x": det_img[np.newaxis, :].copy()}, False, None, ""

    def postprocess(self, input_dicts, fetch_dict, log_id):
-        print("input_dicts: ", input_dicts)
        det_out = fetch_dict["save_infer_model/scale_0.tmp_1"]
        ratio_list = [
            float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
        ]
        dt_boxes_list = self.post_func(det_out, [ratio_list])
        dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
-        out_dict = {"dt_boxes": dt_boxes, "image": self.im}
+        out_dict = {"dt_boxes": dt_boxes, "image": self.raw_im}

-        print("out dict", out_dict["dt_boxes"])
        return out_dict, None, ""


@@ -83,35 +76,75 @@ class RecOp(Op):

    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
-        im = input_dict["image"]
+        raw_im = input_dict["image"]
+        data = np.frombuffer(raw_im, np.uint8)
+        im = cv2.imdecode(data, cv2.IMREAD_COLOR)
        dt_boxes = input_dict["dt_boxes"]
        dt_boxes = self.sorted_boxes(dt_boxes)
        feed_list = []
        img_list = []
        max_wh_ratio = 0
-        for i, dtbox in enumerate(dt_boxes):
-            boximg = self.get_rotate_crop_image(im, dt_boxes[i])
-            img_list.append(boximg)
-            h, w = boximg.shape[0:2]
-            wh_ratio = w * 1.0 / h
-            max_wh_ratio = max(max_wh_ratio, wh_ratio)
-        _, w, h = self.ocr_reader.resize_norm_img(img_list[0],
-                                                  max_wh_ratio).shape
-
-        imgs = np.zeros((len(img_list), 3, w, h)).astype('float32')
-        for id, img in enumerate(img_list):
-            norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
-            imgs[id] = norm_img
-        print("rec image shape", imgs.shape)
-        feed = {"x": imgs.copy()}
-        return feed, False, None, ""
-
-    def postprocess(self, input_dicts, fetch_dict, log_id):
-        rec_res = self.ocr_reader.postprocess(fetch_dict, with_score=True)
-        res_lst = []
-        for res in rec_res:
-            res_lst.append(res[0])
-        res = {"res": str(res_lst)}
+        ## With multiple mini-batches, feed_data is a list.
+        max_batch_size = 6  # len(dt_boxes)
+
+        # If max_batch_size is 0, skip the predict stage
+        if max_batch_size == 0:
+            return {}, True, None, ""
+        boxes_size = len(dt_boxes)
+        batch_size = boxes_size // max_batch_size
+        rem = boxes_size % max_batch_size
+        for bt_idx in range(0, batch_size + 1):
+            imgs = None
+            boxes_num_in_one_batch = 0
+            if bt_idx == batch_size:
+                if rem == 0:
+                    continue
+                else:
+                    boxes_num_in_one_batch = rem
+            elif bt_idx < batch_size:
+                boxes_num_in_one_batch = max_batch_size
+            else:
+                _LOGGER.error("batch_size error, bt_idx={}, batch_size={}".
+                              format(bt_idx, batch_size))
+                break
+
+            start = bt_idx * max_batch_size
+            end = start + boxes_num_in_one_batch
+            img_list = []
+            for box_idx in range(start, end):
+                boximg = self.get_rotate_crop_image(im, dt_boxes[box_idx])
+                img_list.append(boximg)
+                h, w = boximg.shape[0:2]
+                wh_ratio = w * 1.0 / h
+                max_wh_ratio = max(max_wh_ratio, wh_ratio)
+            _, w, h = self.ocr_reader.resize_norm_img(img_list[0],
+                                                      max_wh_ratio).shape
+
+            imgs = np.zeros((boxes_num_in_one_batch, 3, w, h)).astype('float32')
+            for id, img in enumerate(img_list):
+                norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
+                imgs[id] = norm_img
+            feed = {"x": imgs.copy()}
+            feed_list.append(feed)
+
+        return feed_list, False, None, ""
+
+    def postprocess(self, input_dicts, fetch_data, log_id):
+        res_list = []
+        if isinstance(fetch_data, dict):
+            if len(fetch_data) > 0:
+                rec_batch_res = self.ocr_reader.postprocess(
+                    fetch_data, with_score=True)
+                for res in rec_batch_res:
+                    res_list.append(res[0])
+        elif isinstance(fetch_data, list):
+            for one_batch in fetch_data:
+                one_batch_res = self.ocr_reader.postprocess(
+                    one_batch, with_score=True)
+                for res in one_batch_res:
+                    res_list.append(res[0])
+
+        res = {"res": str(res_list)}
        return res, None, ""
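
The new `RecOp.preprocess` splits the detected boxes into mini-batches of at most `max_batch_size` (6 in this commit) using a quotient and a remainder. The same index ranges can be produced more compactly. This is a sketch with a hypothetical `split_batches` helper, not a drop-in replacement for the code above.

```python
def split_batches(num_boxes, max_batch_size=6):
    """Return (start, end) index pairs covering num_boxes in chunks of max_batch_size."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    return [
        (start, min(start + max_batch_size, num_boxes))
        for start in range(0, num_boxes, max_batch_size)
    ]


print(split_batches(14))  # [(0, 6), (6, 12), (12, 14)]
```

Each `(start, end)` pair corresponds to one entry of `feed_list`, so 14 boxes yield three feeds, the last holding the 2 remaining boxes.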



--- a/deploy/pdserving/win/ocr_reader.py
+++ b/deploy/pdserving/win/ocr_reader.py
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import cv2
+import copy
+import numpy as np
+import math
+import re
+import sys
+import argparse
+import string
+from copy import deepcopy
+
+
+class DetResizeForTest(object):
+    def __init__(self, **kwargs):
+        super(DetResizeForTest, self).__init__()
+        self.resize_type = 0
+        if 'image_shape' in kwargs:
+            self.image_shape = kwargs['image_shape']
+            self.resize_type = 1
+        elif 'limit_side_len' in kwargs:
+            self.limit_side_len = kwargs['limit_side_len']
+            self.limit_type = kwargs.get('limit_type', 'min')
+        elif 'resize_short' in kwargs:
+            self.limit_side_len = 736
+            self.limit_type = 'min'
+        else:
+            self.resize_type = 2
+            self.resize_long = kwargs.get('resize_long', 960)
+
+    def __call__(self, data):
+        img = deepcopy(data)
+        src_h, src_w, _ = img.shape
+
+        if self.resize_type == 0:
+            img, [ratio_h, ratio_w] = self.resize_image_type0(img)
+        elif self.resize_type == 2:
+            img, [ratio_h, ratio_w] = self.resize_image_type2(img)
+        else:
+            img, [ratio_h, ratio_w] = self.resize_image_type1(img)
+
+        return img
+
+    def resize_image_type1(self, img):
+        resize_h, resize_w = self.image_shape
+        ori_h, ori_w = img.shape[:2]  # (h, w, c)
+        ratio_h = float(resize_h) / ori_h
+        ratio_w = float(resize_w) / ori_w
+        img = cv2.resize(img, (int(resize_w), int(resize_h)))
+        return img, [ratio_h, ratio_w]
+
+    def resize_image_type0(self, img):
+        """
+        resize image to a size multiple of 32 which is required by the network
+        args:
+            img(array): array with shape [h, w, c]
+        return(tuple):
+            img, (ratio_h, ratio_w)
+        """
+        limit_side_len = self.limit_side_len
+        h, w, _ = img.shape
+
+        # limit the max side
+        if self.limit_type == 'max':
+            if max(h, w) > limit_side_len:
+                if h > w:
+                    ratio = float(limit_side_len) / h
+                else:
+                    ratio = float(limit_side_len) / w
+            else:
+                ratio = 1.
+        else:
+            if min(h, w) < limit_side_len:
+                if h < w:
+                    ratio = float(limit_side_len) / h
+                else:
+                    ratio = float(limit_side_len) / w
+            else:
+                ratio = 1.
+        resize_h = int(h * ratio)
+        resize_w = int(w * ratio)
+
+        resize_h = int(round(resize_h / 32) * 32)
+        resize_w = int(round(resize_w / 32) * 32)
+
+        try:
+            if int(resize_w) <= 0 or int(resize_h) <= 0:
+                return None, (None, None)
+            img = cv2.resize(img, (int(resize_w), int(resize_h)))
+        except:
+            print(img.shape, resize_w, resize_h)
+            sys.exit(0)
+        ratio_h = resize_h / float(h)
+        ratio_w = resize_w / float(w)
+        # return img, np.array([h, w])
+        return img, [ratio_h, ratio_w]
+
+    def resize_image_type2(self, img):
+        h, w, _ = img.shape
+
+        resize_w = w
+        resize_h = h
+
+        # Fix the longer side
+        if resize_h > resize_w:
+            ratio = float(self.resize_long) / resize_h
+        else:
+            ratio = float(self.resize_long) / resize_w
+
+        resize_h = int(resize_h * ratio)
+        resize_w = int(resize_w * ratio)
+
+        max_stride = 128
+        resize_h = (resize_h + max_stride - 1) // max_stride * max_stride
+        resize_w = (resize_w + max_stride - 1) // max_stride * max_stride
+        img = cv2.resize(img, (int(resize_w), int(resize_h)))
+        ratio_h = resize_h / float(h)
+        ratio_w = resize_w / float(w)
+
+        return img, [ratio_h, ratio_w]
+
+
+class BaseRecLabelDecode(object):
+    """ Convert between text-label and text-index """
+
+    def __init__(self, config):
+        support_character_type = [
+            'ch', 'en', 'EN_symbol', 'french', 'german', 'japan', 'korean',
+            'it', 'xi', 'pu', 'ru', 'ar', 'ta', 'ug', 'fa', 'ur', 'rs', 'oc',
+            'rsc', 'bg', 'uk', 'be', 'te', 'ka', 'chinese_cht', 'hi', 'mr',
+            'ne', 'EN'
+        ]
+        character_type = config['character_type']
+        character_dict_path = config['character_dict_path']
+        use_space_char = True
+        assert character_type in support_character_type, "Only {} are supported now but get {}".format(
+            support_character_type, character_type)
+
+        self.beg_str = "sos"
+        self.end_str = "eos"
+
+        if character_type == "en":
+            self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
+            dict_character = list(self.character_str)
+        elif character_type == "EN_symbol":
+            # same with ASTER setting (use 94 char).
+            self.character_str = string.printable[:-6]
+            dict_character = list(self.character_str)
+        elif character_type in support_character_type:
+            self.character_str = ""
+            assert character_dict_path is not None, "character_dict_path should not be None when character_type is {}".format(
+                character_type)
+            with open(character_dict_path, "rb") as fin:
+                lines = fin.readlines()
+                for line in lines:
+                    line = line.decode('utf-8').strip("\n").strip("\r\n")
+                    self.character_str += line
+            if use_space_char:
+                self.character_str += " "
+            dict_character = list(self.character_str)
+
+        else:
+            raise NotImplementedError
+        self.character_type = character_type
+        dict_character = self.add_special_char(dict_character)
+        self.dict = {}
+        for i, char in enumerate(dict_character):
+            self.dict[char] = i
+        self.character = dict_character
+
+    def add_special_char(self, dict_character):
+        return dict_character
+
+    def decode(self, text_index, text_prob=None, is_remove_duplicate=False):
+        """ convert text-index into text-label. """
+        result_list = []
+        ignored_tokens = self.get_ignored_tokens()
+        batch_size = len(text_index)
+        for batch_idx in range(batch_size):
+            char_list = []
+            conf_list = []
+            for idx in range(len(text_index[batch_idx])):
+                if text_index[batch_idx][idx] in ignored_tokens:
+                    continue
+                if is_remove_duplicate:
+                    # only for predict
+                    if idx > 0 and text_index[batch_idx][idx - 1] == text_index[
+                            batch_idx][idx]:
+                        continue
+                char_list.append(self.character[int(text_index[batch_idx][
+                    idx])])
+                if text_prob is not None:
+                    conf_list.append(text_prob[batch_idx][idx])
+                else:
+                    conf_list.append(1)
+            text = ''.join(char_list)
+            result_list.append((text, np.mean(conf_list)))
+        return result_list
+
+    def get_ignored_tokens(self):
+        return [0]  # for ctc blank
+
+
+class CTCLabelDecode(BaseRecLabelDecode):
+    """ Convert between text-label and text-index """
+
+    def __init__(
+            self,
+            config,
+            #character_dict_path=None,
+            #character_type='ch',
+            #use_space_char=False,
+            **kwargs):
+        super(CTCLabelDecode, self).__init__(config)
+
+    def __call__(self, preds, label=None, *args, **kwargs):
+        preds_idx = preds.argmax(axis=2)
+        preds_prob = preds.max(axis=2)
+        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
+        if label is None:
+            return text
+        label = self.decode(label)
+        return text, label
+
+    def add_special_char(self, dict_character):
+        dict_character = ['blank'] + dict_character
+        return dict_character
+
+
+class CharacterOps(object):
+    """ Convert between text-label and text-index """
+
+    def __init__(self, config):
+        self.character_type = config['character_type']
+        self.loss_type = config['loss_type']
+        if self.character_type == "en":
+            self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
+            dict_character = list(self.character_str)
+        elif self.character_type == "ch":
+            character_dict_path = config['character_dict_path']
+            self.character_str = ""
+            with open(character_dict_path, "rb") as fin:
+                lines = fin.readlines()
+                for line in lines:
+                    line = line.decode('utf-8').strip("\n").strip("\r\n")
+                    self.character_str += line
+            dict_character = list(self.character_str)
+        elif self.character_type == "en_sensitive":
+            # same with ASTER setting (use 94 char).
+            self.character_str = string.printable[:-6]
+            dict_character = list(self.character_str)
+        else:
+            self.character_str = None
+        assert self.character_str is not None, \
+            "Unsupported character type: {}".format(self.character_type)
+        self.beg_str = "sos"
+        self.end_str = "eos"
+        if self.loss_type == "attention":
+            dict_character = [self.beg_str, self.end_str] + dict_character
+        self.dict = {}
+        for i, char in enumerate(dict_character):
+            self.dict[char] = i
+        self.character = dict_character
+
+    def encode(self, text):
+        """convert text-label into text-index.
+        input:
+            text: text labels of each image. [batch_size]
+
+        output:
+            text: concatenated text index for CTCLoss.
+                    [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
+            length: length of each text. [batch_size]
+        """
+        if self.character_type == "en":
+            text = text.lower()
+
+        text_list = []
+        for char in text:
+            if char not in self.dict:
+                continue
+            text_list.append(self.dict[char])
+        text = np.array(text_list)
+        return text
+
+    def decode(self, text_index, is_remove_duplicate=False):
+        """ convert text-index into text-label. """
+        char_list = []
+        char_num = self.get_char_num()
+
+        if self.loss_type == "attention":
+            beg_idx = self.get_beg_end_flag_idx("beg")
+            end_idx = self.get_beg_end_flag_idx("end")
+            ignored_tokens = [beg_idx, end_idx]
+        else:
+            ignored_tokens = [char_num]
+
+        for idx in range(len(text_index)):
+            if text_index[idx] in ignored_tokens:
+                continue
+            if is_remove_duplicate:
+                if idx > 0 and text_index[idx - 1] == text_index[idx]:
+                    continue
+            char_list.append(self.character[text_index[idx]])
+        text = ''.join(char_list)
+        return text
+
+    def get_char_num(self):
+        return len(self.character)
+
+    def get_beg_end_flag_idx(self, beg_or_end):
+        if self.loss_type == "attention":
+            if beg_or_end == "beg":
+                idx = np.array(self.dict[self.beg_str])
+            elif beg_or_end == "end":
+                idx = np.array(self.dict[self.end_str])
+            else:
+                assert False, "Unsupport type %s in get_beg_end_flag_idx"\
+                    % beg_or_end
+            return idx
+        else:
+            err = "error in get_beg_end_flag_idx when using the loss %s"\
+                % (self.loss_type)
+            assert False, err
+
+
+class OCRReader(object):
+    def __init__(self,
+                 algorithm="CRNN",
+                 image_shape=[3, 32, 320],
+                 char_type="ch",
+                 batch_num=1,
+                 char_dict_path="./ppocr_keys_v1.txt"):
+        self.rec_image_shape = image_shape
+        self.character_type = char_type
+        self.rec_batch_num = batch_num
+        char_ops_params = {}
+        char_ops_params["character_type"] = char_type
+        char_ops_params["character_dict_path"] = char_dict_path
+        char_ops_params['loss_type'] = 'ctc'
+        self.char_ops = CharacterOps(char_ops_params)
+        self.label_ops = CTCLabelDecode(char_ops_params)
+
+    def resize_norm_img(self, img, max_wh_ratio):
+        imgC, imgH, imgW = self.rec_image_shape
+        if self.character_type == "ch":
+            imgW = int(32 * max_wh_ratio)
+        h = img.shape[0]
+        w = img.shape[1]
+        ratio = w / float(h)
+        if math.ceil(imgH * ratio) > imgW:
+            resized_w = imgW
+        else:
+            resized_w = int(math.ceil(imgH * ratio))
+        resized_image = cv2.resize(img, (resized_w, imgH))
+        resized_image = resized_image.astype('float32')
+        resized_image = resized_image.transpose((2, 0, 1)) / 255
+        resized_image -= 0.5
+        resized_image /= 0.5
+        padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
+
+        padding_im[:, :, 0:resized_w] = resized_image
+        return padding_im
+
+    def preprocess(self, img_list):
+        img_num = len(img_list)
+        norm_img_batch = []
+        max_wh_ratio = 0
+        for ino in range(img_num):
+            h, w = img_list[ino].shape[0:2]
+            wh_ratio = w * 1.0 / h
+            max_wh_ratio = max(max_wh_ratio, wh_ratio)
+
+        for ino in range(img_num):
+            norm_img = self.resize_norm_img(img_list[ino], max_wh_ratio)
+            norm_img = norm_img[np.newaxis, :]
+            norm_img_batch.append(norm_img)
+        norm_img_batch = np.concatenate(norm_img_batch)
+        norm_img_batch = norm_img_batch.copy()
+
+        return norm_img_batch[0]
+
+    def postprocess_old(self, outputs, with_score=False):
+        rec_res = []
+        rec_idx_lod = outputs["ctc_greedy_decoder_0.tmp_0.lod"]
+        rec_idx_batch = outputs["ctc_greedy_decoder_0.tmp_0"]
+        if with_score:
+            predict_lod = outputs["softmax_0.tmp_0.lod"]
+        for rno in range(len(rec_idx_lod) - 1):
+            beg = rec_idx_lod[rno]
+            end = rec_idx_lod[rno + 1]
+            if isinstance(rec_idx_batch, list):
+                rec_idx_tmp = [x[0] for x in rec_idx_batch[beg:end]]
+            else:  #nd array
+                rec_idx_tmp = rec_idx_batch[beg:end, 0]
+            preds_text = self.char_ops.decode(rec_idx_tmp)
+            if with_score:
+                beg = predict_lod[rno]
+                end = predict_lod[rno + 1]
+                if isinstance(outputs["softmax_0.tmp_0"], list):
+                    outputs["softmax_0.tmp_0"] = np.array(outputs[
+                        "softmax_0.tmp_0"]).astype(np.float32)
+                probs = outputs["softmax_0.tmp_0"][beg:end, :]
+                ind = np.argmax(probs, axis=1)
+                blank = probs.shape[1]
+                valid_ind = np.where(ind != (blank - 1))[0]
+                score = np.mean(probs[valid_ind, ind[valid_ind]])
+                rec_res.append([preds_text, score])
+            else:
+                rec_res.append([preds_text])
+        return rec_res
+
+    def postprocess(self, outputs, with_score=False):
+        preds = outputs["save_infer_model/scale_0.tmp_1"]
+        try:
+            preds = preds.numpy()
+        except AttributeError:
+            pass  # preds has no .numpy(), e.g. it is already a numpy array
+        preds_idx = preds.argmax(axis=2)
+        preds_prob = preds.max(axis=2)
+        text = self.label_ops.decode(
+            preds_idx, preds_prob, is_remove_duplicate=True)
+        return text
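
Review note: `BaseRecLabelDecode.decode`, as called from `CTCLabelDecode.__call__` with `is_remove_duplicate=True`, amounts to greedy CTC decoding: collapse repeated indices, then drop the blank. Below is a minimal pure-Python sketch of that idea for a single sequence. It assumes index 0 is the blank (as `get_ignored_tokens` returns), and the helper name `ctc_greedy_decode` is hypothetical, not part of this PR. Unlike the code above, it returns 0.0 instead of NaN when nothing is decoded.

```python
def ctc_greedy_decode(indices, probs, characters, blank=0):
    """Greedy CTC decode for one sequence (illustrative sketch only).

    indices: per-timestep argmax indices
    probs: per-timestep max probabilities
    characters: index -> character table, with the blank at position `blank`
    """
    chars, confs = [], []
    prev = None
    for idx, prob in zip(indices, probs):
        # Keep a timestep only if it is not blank and not a repeat
        # of the immediately preceding timestep.
        if idx != blank and idx != prev:
            chars.append(characters[idx])
            confs.append(prob)
        prev = idx
    # Assumption: report 0.0 for an empty result instead of NaN.
    score = sum(confs) / len(confs) if confs else 0.0
    return "".join(chars), score


table = ["blank", "a", "b"]
text, score = ctc_greedy_decode(
    [1, 1, 0, 1, 2, 2, 0], [0.9, 0.8, 0.99, 0.7, 0.6, 0.5, 0.99], table)
print(text, round(score, 4))
```

The blank between the two runs of `a` is what lets a doubled character survive the duplicate collapse.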
--- a/deploy/pdserving/win/ocr_web_client.py
+++ b/deploy/pdserving/win/ocr_web_client.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# -*- coding: utf-8 -*-
+
+import requests
+import json
+import cv2
+import base64
+import os, sys
+import time
+
+
+def cv2_to_base64(image):
+    #data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(image).decode(
+        'utf8')  #data.tostring()).decode('utf8')
+
+
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:9292/ocr/prediction"
+
+test_img_dir = "../../../doc/imgs/"
+for idx, img_file in enumerate(os.listdir(test_img_dir)):
+    with open(os.path.join(test_img_dir, img_file), 'rb') as file:
+        image_data1 = file.read()
+
+    image = cv2_to_base64(image_data1)
+    for i in range(1):
+        data = {"feed": [{"image": image}], "fetch": ["save_infer_model/scale_0.tmp_1"]}
+        r = requests.post(url=url, headers=headers, data=json.dumps(data))
+        print(r.json())
+
+test_img_dir = "../../../doc/imgs/"
+print("==> total number of test imgs: ", len(os.listdir(test_img_dir)))
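
Review note: the client sends raw image file bytes as base64 inside a JSON body, and the server's `preprocess` reverses that with `base64.b64decode`. The round trip can be sketched without a running service, as below. The helper names `build_payload` and `decode_payload` are hypothetical; the fetch name is copied from the client above.

```python
import base64
import json


def build_payload(image_bytes, fetch_name="save_infer_model/scale_0.tmp_1"):
    """Encode raw image bytes the way the client above does (sketch)."""
    image_b64 = base64.b64encode(image_bytes).decode("utf8")
    return json.dumps({"feed": [{"image": image_b64}], "fetch": [fetch_name]})


def decode_payload(body):
    """Recover the raw bytes from a request body (mirrors the server side)."""
    request = json.loads(body)
    return base64.b64decode(request["feed"][0]["image"].encode("utf8"))


raw = b"\xff\xd8 not a real JPEG, just sample bytes"
assert decode_payload(build_payload(raw)) == raw
```

Because the bytes are the encoded file itself, the server still has to decode them into pixels (here with `cv2.imdecode`).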
--- a/deploy/pdserving/win/ocr_web_server.py
+++ b/deploy/pdserving/win/ocr_web_server.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle_serving_client import Client
+import cv2
+import sys
+import numpy as np
+import os
+from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
+from paddle_serving_app.reader import Div, Normalize, Transpose
+from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
+from ocr_reader import OCRReader
+try:
+    from paddle_serving_server_gpu.web_service import WebService
+except ImportError:
+    from paddle_serving_server.web_service import WebService
+from paddle_serving_app.local_predict import LocalPredictor
+import time
+import re
+import base64
+
+
+class OCRService(WebService):
+    def init_det_debugger(self, det_model_config):
+        self.det_preprocess = Sequential([
+            ResizeByFactor(32, 960), Div(255),
+            Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
+                (2, 0, 1))
+        ])
+        self.det_client = LocalPredictor()
+        if sys.argv[1] == 'gpu':
+            self.det_client.load_model_config(
+                det_model_config, use_gpu=True, gpu_id=1)
+        elif sys.argv[1] == 'cpu':
+            self.det_client.load_model_config(det_model_config)
+        self.ocr_reader = OCRReader(
+            char_dict_path="../../../ppocr/utils/ppocr_keys_v1.txt")
+
+    def preprocess(self, feed=[], fetch=[]):
+        data = base64.b64decode(feed[0]["image"].encode('utf8'))
+        data = np.frombuffer(data, np.uint8)  # np.fromstring is deprecated for binary data
+        im = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        ori_h, ori_w, _ = im.shape
+        det_img = self.det_preprocess(im)
+        _, new_h, new_w = det_img.shape
+        det_img = det_img[np.newaxis, :]
+        det_img = det_img.copy()
+        det_out = self.det_client.predict(
+            feed={"x": det_img}, fetch=["save_infer_model/scale_0.tmp_1"], batch=True)
+        filter_func = FilterBoxes(10, 10)
+        post_func = DBPostProcess({
+            "thresh": 0.3,
+            "box_thresh": 0.5,
+            "max_candidates": 1000,
+            "unclip_ratio": 1.5,
+            "min_size": 3
+        })
+        sorted_boxes = SortedBoxes()
+        ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w]
+        dt_boxes_list = post_func(det_out["save_infer_model/scale_0.tmp_1"], [ratio_list])
+        dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w])
+        dt_boxes = sorted_boxes(dt_boxes)
+        get_rotate_crop_image = GetRotateCropImage()
+        img_list = []
+        max_wh_ratio = 0
+        for i, dtbox in enumerate(dt_boxes):
+            boximg = get_rotate_crop_image(im, dt_boxes[i])
+            img_list.append(boximg)
+            h, w = boximg.shape[0:2]
+            wh_ratio = w * 1.0 / h
+            max_wh_ratio = max(max_wh_ratio, wh_ratio)
+        if len(img_list) == 0:
+            return [], []
+        _, w, h = self.ocr_reader.resize_norm_img(img_list[0],
+                                                  max_wh_ratio).shape
+        imgs = np.zeros((len(img_list), 3, w, h)).astype('float32')
+        for id, img in enumerate(img_list):
+            norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
+            imgs[id] = norm_img
+        feed = {"x": imgs.copy()}
+        fetch = ["save_infer_model/scale_0.tmp_1"]
+        return feed, fetch, True
+
+    def postprocess(self, feed={}, fetch=[], fetch_map=None):
+        rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
+        res_lst = []
+        for res in rec_res:
+            res_lst.append(res[0])
+        res = {"res": res_lst}
+        return res
+
+
+ocr_service = OCRService(name="ocr")
+ocr_service.load_model_config("../ppocr_rec_mobile_2.0_serving")
+ocr_service.prepare_server(workdir="workdir", port=9292)
+ocr_service.init_det_debugger(det_model_config="../ppocr_det_mobile_2.0_serving")
+if sys.argv[1] == 'gpu':
+    ocr_service.set_gpus("0")
+    ocr_service.run_debugger_service(gpu=True)
+elif sys.argv[1] == 'cpu':
+    ocr_service.run_debugger_service()
+ocr_service.run_web_service()
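
Review note: in `preprocess`, every crop is resized to a fixed height and padded to one common width derived from the largest width/height ratio in the batch (`OCRReader.resize_norm_img`). The width arithmetic alone can be sketched as follows. The function name `resized_width` is hypothetical, and the default height of 32 matches the `"ch"` branch of `resize_norm_img`.

```python
import math


def resized_width(img_h, img_w, target_h=32, max_wh_ratio=10.0):
    """Return (resized_w, canvas_w) for one crop (illustrative sketch).

    canvas_w is the padded width shared by the whole batch; resized_w is
    the width this crop is scaled to before zero padding on the right.
    """
    canvas_w = int(target_h * max_wh_ratio)
    ratio = img_w / float(img_h)
    scaled_w = int(math.ceil(target_h * ratio))
    # Crops wider than the canvas are squeezed to fit; others keep their ratio.
    return min(scaled_w, canvas_w), canvas_w


print(resized_width(64, 128))   # a 2:1 crop inside a 10:1 canvas
print(resized_width(32, 640))   # a 20:1 crop is clipped to the canvas width
```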
--- a/deploy/slim/prune/README.md
+++ b/deploy/slim/prune/README.md
@@ -23,13 +23,13 @@

 ```bash
 git clone https://github.com/PaddlePaddle/PaddleSlim.git
+cd PaddleSlim
 git checkout develop
-cd Paddleslim
 python3 setup.py install
 ```

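 ### 2. Obtain a pretrained model
-Model pruning needs to load a previously trained model. PaddleOCR also provides a series of (models)[../../../doc/doc_ch/models_list.md]; developers can pick one as needed or use their own model.
+Model pruning needs to load a previously trained model. PaddleOCR also provides a series of [models](../../../doc/doc_ch/models_list.md); developers can pick one as needed or use their own model.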

 ### 3. Sensitivity analysis training

@@ -49,14 +49,14 @@ python3 setup.py install

 From the PaddleOCR root directory, run sensitivity analysis training on the model with the following command:
 ```bash
-python3.7 deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrain_weights="your trained model"
+python3.7 deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model="your trained model" Global.save_model_dir=./output/prune_model/
 ```

 ### 4. Export the model and deploy for prediction

 After obtaining the model saved by pruning training, we can export it as inference_model:
 ```bash
-pytho3.7 deploy/slim/prune/export_prune_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrain_weights=./output/det_db/best_accuracy  Global.save_inference_dir=inference_model
+python3.7 deploy/slim/prune/export_prune_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model=./output/det_db/best_accuracy  Global.save_inference_dir=./prune/prune_inference_model
 ```

 For prediction and deployment of the inference model, see:

--- a/deploy/slim/prune/README_en.md
+++ b/deploy/slim/prune/README_en.md
@@ -22,15 +22,15 @@ Five steps for OCR model prune:

 ```bash
 git clone https://github.com/PaddlePaddle/PaddleSlim.git
+cd PaddleSlim
 git checkout develop
-cd Paddleslim
 python3 setup.py install
 ```


 ### 2. Download Pretrain Model
 Model prune needs to load pre-trained models.
-PaddleOCR also provides a series of (models)[../../../doc/doc_en/models_list_en.md]. Developers can choose their own models or use their own models according to their needs.
+PaddleOCR also provides a series of [models](../../../doc/doc_en/models_list_en.md). Developers can choose one of these or use their own models, as needed.


 ### 3. Pruning sensitivity analysis
@@ -54,7 +54,7 @@ Enter the PaddleOCR root directory，perform sensitivity analysis on the model w

 ```bash

-python3.7 deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrain_weights="your trained model"
+python3.7 deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model="your trained model"  Global.save_model_dir=./output/prune_model/

 ```

@@ -63,7 +63,7 @@ python3.7 deploy/slim/prune/sensitivity_anal.py -c configs/det/ch_ppocr_v2.0/ch_

 We can export the pruned model as inference_model for deployment:
 ```bash
-python deploy/slim/prune/export_prune_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml  -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model
+python deploy/slim/prune/export_prune_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml  -o Global.pretrained_model=./output/det_db/best_accuracy  Global.save_inference_dir=./prune/prune_inference_model
 ```

 Reference for prediction and deployment of inference model:

--- a/deploy/slim/quantization/README.md
+++ b/deploy/slim/quantization/README.md
@@ -23,7 +23,7 @@

 ```bash
 git clone https://github.com/PaddlePaddle/PaddleSlim.git
-cd Paddleslim
+cd PaddleSlim
 python setup.py install
 ```

@@ -37,12 +37,12 @@ PaddleOCR提供了一系列训练好的[模型](../../../doc/doc_ch/models_list.

 The quantization training code is in slim/quantization/quant.py. For example, to train a detection model, the command is:
 ```bash
-python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights='your trained model'   Global.save_model_dir=./output/quant_model
+python deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model='your trained model'   Global.save_model_dir=./output/quant_model

 # For example, download the provided trained model
 wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar
 tar -xf ch_ppocr_mobile_v2.0_det_train.tar
-python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./ch_ppocr_mobile_v2.0_det_train/best_accuracy   Global.save_inference_dir=./output/quant_inference_model
+python deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model=./ch_ppocr_mobile_v2.0_det_train/best_accuracy   Global.save_model_dir=./output/quant_inference_model

 ```
 To run quantization training for a recognition model, just change the config file and the loaded model parameters.
@@ -52,7 +52,7 @@ python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global
 After obtaining the model saved by quantization training, we can export it as inference_model for prediction and deployment:

 ```bash
-python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model
+python deploy/slim/quantization/export_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_inference_dir=./output/quant_inference_model
 ```

 ### 5. Deploy the quantized model

--- a/deploy/slim/quantization/README_en.md
+++ b/deploy/slim/quantization/README_en.md
@@ -26,7 +26,7 @@ After training, if you want to further compress the model size and accelerate th

 ```bash
 git clone https://github.com/PaddlePaddle/PaddleSlim.git
-cd Paddleslim
+cd PaddleSlim
 python setup.py install
 ```

@@ -43,13 +43,12 @@ After the quantization strategy is defined, the model can be quantified.

 The code for quantization training is located in `slim/quantization/quant.py`. For example, to train a detection model, the training instructions are as follows:
 ```bash
-python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights='your trained model'   Global.save_model_dir=./output/quant_model
+python deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model='your trained model'   Global.save_model_dir=./output/quant_model

 # download provided model
 wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar
 tar -xf ch_ppocr_mobile_v2.0_det_train.tar
-python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./ch_ppocr_mobile_v2.0_det_train/best_accuracy   Global.save_model_dir=./output/quant_model
-
+python deploy/slim/quantization/quant.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model=./ch_ppocr_mobile_v2.0_det_train/best_accuracy   Global.save_model_dir=./output/quant_model
 ```


@@ -58,7 +57,7 @@ python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global
 After obtaining the model saved by quantization training, we can export it as inference_model for deployment:

 ```bash
-python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_inference_dir=./output/quant_inference_model
+python deploy/slim/quantization/export_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_inference_dir=./output/quant_inference_model
 ```

 ### 5. Deploy

--- a/deploy/slim/quantization/quant.py
+++ b/deploy/slim/quantization/quant.py
@@ -112,10 +112,6 @@ def main(config, device, logger, vdl_writer):
        config['Architecture']["Head"]['out_channels'] = char_num
    model = build_model(config['Architecture'])

-    # prepare to quant
-    quanter = QAT(config=quant_config, act_preprocess=PACT)
-    quanter.quantize(model)
-
    if config['Global']['distributed']:
        model = paddle.DataParallel(model)

@@ -136,31 +132,15 @@ def main(config, device, logger, vdl_writer):

    logger.info('train dataloader has {} iters, valid dataloader has {} iters'.
                format(len(train_dataloader), len(valid_dataloader)))
+    quanter = QAT(config=quant_config, act_preprocess=PACT)
+    quanter.quantize(model)
+
    # start train
    program.train(config, train_dataloader, valid_dataloader, device, model,
                  loss_class, optimizer, lr_scheduler, post_process_class,
                  eval_class, pre_best_model_dict, logger, vdl_writer)


-def test_reader(config, device, logger):
-    loader = build_dataloader(config, 'Train', device, logger)
-    import time
-    starttime = time.time()
-    count = 0
-    try:
-        for data in loader():
-            count += 1
-            if count % 1 == 0:
-                batch_time = time.time() - starttime
-                starttime = time.time()
-                logger.info("reader: {}, {}, {}".format(
-                    count, len(data[0]), batch_time))
-    except Exception as e:
-        logger.info(e)
-    logger.info("finish reader: {}, Success!".format(count))
-
-
 if __name__ == '__main__':
    config, device, logger, vdl_writer = program.preprocess(is_train=True)
    main(config, device, logger, vdl_writer)
-    # test_reader(config, device, logger)
--- a/doc/doc_ch/detection.md
+++ b/doc/doc_ch/detection.md
@@ -45,26 +45,17 @@ json.dumps编码前的图像标注信息是包含多个字典的list，字典中
 ## Quick start training

 First download the pretrained model for the backbone. PaddleOCR detection models currently support two backbones, the MobileNetV3 and ResNet_vd series.
-You can swap in a backbone from the models in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) as needed.
+You can swap in a backbone from the models in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/develop/ppcls/modeling/architectures) as needed.
+Download links for the corresponding pretrained backbones are on the [PaddleClas repo home page](https://github.com/PaddlePaddle/PaddleClas#mobile-series).
 ```shell
 cd PaddleOCR/
+# Download the pretrained model that matches your backbone
 # Download the MobileNetV3 pretrained model
-wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
+wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams
 # Or download the ResNet18_vd pretrained model
-wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar
+wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams
 # Or download the ResNet50_vd pretrained model
-wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
-
-# Extract the pretrained model archive, using MobileNetV3 as an example
-tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models/
-
-# Note: after the backbone weights are extracted correctly, the folder contains many weight files named after network layers, laid out as follows:
-./pretrain_models/MobileNetV3_large_x0_5_pretrained/
-  └─ conv_last_bn_mean
-  └─ conv_last_bn_offset
-  └─ conv_last_bn_scale
-  └─ conv_last_bn_variance
-  └─ ......
+wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams

 ```

@@ -120,16 +111,16 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{pat

 Test detection on a single image:
 ```shell
-python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" Global.load_static_weights=false
+python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"
 ```

 When testing the DB model, you can adjust the post-processing thresholds:
 ```shell
-python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" Global.load_static_weights=false PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
+python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"  PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
 ```


 Test detection on all images in a folder:
 ```shell
-python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy" Global.load_static_weights=false
+python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy"
 ```
--- a/doc/doc_ch/distributed_training.md
+++ b/doc/doc_ch/distributed_training.md
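+# Distributed training
+
+## Introduction
+
+* High-performance distributed training is one of PaddlePaddle's core strengths; on classification tasks it can reach a nearly linear speedup. OCR training often involves large amounts of data. For recognition, for example, the ppocrv2.0 model was trained on 18 million samples, which is very time-consuming on a single machine. PaddleOCR therefore uses the distributed training interface for training, and supports both single-machine and multi-machine training. For more methods and documentation on distributed training, see the [distributed training quick start tutorial](https://fleet-x.readthedocs.io/en/latest/paddle_fleet_rst/parameter_server/ps_quick_start.html).
+
+## Usage
+
+### Single-machine training
+
+* Taking recognition as an example: once the data is prepared locally, start the training job through the `paddle.distributed.launch` interface. An example command follows.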
+
+```shell
+python3 -m paddle.distributed.launch \
+    --log_dir=./log/ \
+    --gpus '0,1,2,3,4,5,6,7' \
+    tools/train.py \
+    -c configs/rec/rec_mv3_none_bilstm_ctc.yml
+```
+
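+### Multi-machine training
+
+* Compared with single-machine training, multi-machine training only needs the additional `--ips` argument, which lists the IPs of the machines taking part in distributed training, separated by commas. An example command follows.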
+
+
+```shell
+ip_list="192.168.0.1,192.168.0.2"
+python3 -m paddle.distributed.launch \
+    --log_dir=./log/ \
+    --ips="${ip_list}" \
+    --gpus="0,1,2,3,4,5,6,7" \
+    tools/train.py \
+    -c configs/rec/rec_mv3_none_bilstm_ctc.yml
+```
+
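+**Notes:**
+* The IPs of the different machines must be separated by commas; you can look them up with `ifconfig` or `ipconfig`.
+* The machines need passwordless access to one another and must be able to ping each other directly; otherwise they cannot communicate.
+* The code, data, and launch command or script must be identical across machines, and the training command or script must be run on every machine. The first device on the first machine in `ip_list` becomes trainer0, and so on.
+
+
+## Performance test
+
+* Training was run on one machine with 8 P40 GPUs and on two machines with 8 P40 GPUs each, using 260,000 public recognition samples (LSVT, RCTW, MTWI). The resulting training times are as follows.
+
+| Model | Config file | Machines | GPUs per machine | Training time | Recognition Acc | Speedup |
+| :----------------------: | :------------: | :------------: | :---------------: | :----------: | :-----------: | :-----------: |
+| CRNN | configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml | 1 | 8 | 60h | 66.7% | - |
+| CRNN | configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml | 2 | 8 | 40h | 67.0% | 150% |
+
+With no loss in accuracy, training time drops from 60h to 40h: the speedup is 60h/40h=150%, and the efficiency is 60h/(40h*2)=75%.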
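
Review note: the speedup and efficiency figures in the new distributed-training doc follow from two divisions. A minimal sketch that reproduces them from the table's training times; the function name `scaling_metrics` is hypothetical:

```python
def scaling_metrics(base_hours, scaled_hours, machine_factor):
    """Return (speedup, efficiency) for a scale-out run.

    base_hours: training time on the baseline setup (1 machine here)
    scaled_hours: training time on the larger setup
    machine_factor: how many times more machines the larger setup uses
    """
    speedup = base_hours / scaled_hours
    efficiency = speedup / machine_factor
    return speedup, efficiency


speedup, efficiency = scaling_metrics(60, 40, 2)
print("speedup = {:.0%}, efficiency = {:.0%}".format(speedup, efficiency))
```

An efficiency of 100% would mean perfectly linear scaling; the 75% here means a quarter of the added capacity is lost, presumably to overhead such as communication between the two machines.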
--- a/doc/doc_ch/inference.md
+++ b/doc/doc_ch/inference.md
@@ -12,7 +12,7 @@ inference 模型（`paddle.jit.save`保存的模型）
 - [一、训练模型转inference模型](#训练模型转inference模型)
    - [检测模型转inference模型](#检测模型转inference模型)
    - [识别模型转inference模型](#识别模型转inference模型)  
-    - [方向分类模型转inference模型](#方向分类模型转inference模型)  
+    - [方向分类模型转inference模型](#方向分类模型转inference模型)

 - [二、文本检测模型推理](#文本检测模型推理)
    - [1. 超轻量中文检测模型推理](#超轻量中文检测模型推理)
@@ -49,10 +49,9 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobi
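 # -c sets the yml config file of the training algorithm
 # -o sets optional parameters
 # Global.pretrained_model sets the path of the trained model to convert; omit the .pdmodel, .pdopt, or .pdparams suffix.
-# Global.load_static_weights must be set to False.
 # Global.save_inference_dir sets the directory where the converted model is saved.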

-python3 tools/export_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model=./ch_lite/ch_ppocr_mobile_v2.0_det_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_db/
+python3 tools/export_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o Global.pretrained_model=./ch_lite/ch_ppocr_mobile_v2.0_det_train/best_accuracy Global.save_inference_dir=./inference/det_db/
 ```
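 When converting to an inference model, use the same config file as for training. You also need to set `Global.pretrained_model` in the config file so that it points to the model parameter file saved during training.
 After a successful conversion, the model save directory contains three files: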
@@ -76,10 +75,9 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobi
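 # -c sets the yml config file of the training algorithm
 # -o sets optional parameters
 # Global.pretrained_model sets the path of the trained model to convert; omit the .pdmodel, .pdopt, or .pdparams suffix.
-# Global.load_static_weights must be set to False.
 # Global.save_inference_dir sets the directory where the converted model is saved.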

-python3 tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./ch_lite/ch_ppocr_mobile_v2.0_rec_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/rec_crnn/
+python3 tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./ch_lite/ch_ppocr_mobile_v2.0_rec_train/best_accuracy  Global.save_inference_dir=./inference/rec_crnn/
 ```

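 **Note:** If you trained the model on your own dataset and changed the Chinese character dictionary file, check that `character_dict_path` in the config file points to the dictionary you need.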
@@ -105,10 +103,9 @@ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobi
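 # -c sets the yml config file of the training algorithm
 # -o sets optional parameters
 # Global.pretrained_model sets the path of the trained model to convert; omit the .pdmodel, .pdopt, or .pdparams suffix.
-# Global.load_static_weights must be set to False.
 # Global.save_inference_dir sets the directory where the converted model is saved.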

-python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.pretrained_model=./ch_lite/ch_ppocr_mobile_v2.0_cls_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/cls/
+python3 tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.pretrained_model=./ch_lite/ch_ppocr_mobile_v2.0_cls_train/best_accuracy  Global.save_inference_dir=./inference/cls/
 ```

 After a successful conversion, the directory contains three files:
@@ -164,7 +161,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_di
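 First convert the model saved during DB text detection training into an inference model. Taking the model with a Resnet50_vd backbone trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)), convert it with the following command: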

 ```
-python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.pretrained_model=./det_r50_vd_db_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_db
+python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.pretrained_model=./det_r50_vd_db_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_db
 ```

 To run inference with the DB text detection model, execute the following command:
@@ -185,7 +182,7 @@ python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_
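 First, convert the model saved during EAST text detection training into an inference model. Taking the model trained on the ICDAR2015 English dataset with the Resnet50_vd backbone as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)), run the following command to convert it: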

 ```
-python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.pretrained_model=./det_r50_vd_east_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_east
+python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.pretrained_model=./det_r50_vd_east_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_east
 ```

 **To run inference with the EAST text detection model, you must set the parameter `--det_algorithm="EAST"`**. Execute the following command:
@@ -205,7 +202,7 @@ python3 tools/infer/predict_det.py --det_algorithm="EAST" --image_dir="./doc/img
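 #### (1). Quadrilateral text detection model (ICDAR2015)
 First, convert the model saved during SAST text detection training into an inference model. Taking the model trained on the ICDAR2015 English dataset with the Resnet50_vd backbone as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)), run the following command to convert it: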
 ```
-python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.pretrained_model=./det_r50_vd_sast_icdar15_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_sast_ic15
+python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.pretrained_model=./det_r50_vd_sast_icdar15_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_sast_ic15

 ```
 **To run inference with the SAST text detection model, you must set the parameter `--det_algorithm="SAST"`**. Execute the following command:
@@ -220,7 +217,7 @@ python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/img
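 First, convert the model saved during SAST text detection training into an inference model. Taking the model trained on the Total-Text English dataset with the Resnet50_vd backbone as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)), run the following command to convert it: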

 ```
-python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.pretrained_model=./det_r50_vd_sast_totaltext_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/det_sast_tt
+python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.pretrained_model=./det_r50_vd_sast_totaltext_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_sast_tt

 ```

@@ -270,7 +267,7 @@ Predicts of ./doc/imgs_words/ch/word_4.jpg:('实力活力', 0.98458153)
 as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_none_bilstm_ctc_v2.0_train.tar)); it can be converted with the following command:

 ```
-python3 tools/export_model.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml -o Global.pretrained_model=./rec_r34_vd_none_bilstm_ctc_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/rec_crnn
+python3 tools/export_model.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml -o Global.pretrained_model=./rec_r34_vd_none_bilstm_ctc_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/rec_crnn
 ```

 To run inference with the CRNN text recognition model, execute the following command:
@@ -362,17 +359,18 @@ Predicts of ./doc/imgs_words/ch/word_4.jpg:['0', 0.9999982]
 <a name="超轻量中文OCR模型推理"></a>
 ### 1. Ultra-lightweight Chinese OCR model inference

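-When running prediction, use the `image_dir` parameter to specify the path of a single image or a directory of images, and use `det_model_dir`, `cls_model_dir`, and `rec_model_dir` to specify the paths of the detection, direction classification, and recognition inference models, respectively. The `use_angle_cls` parameter controls whether the direction classification model is enabled. Visualized recognition results are saved to the ./inference_results folder by default.
+When running prediction, use the `image_dir` parameter to specify the path of a single image or a directory of images, and use `det_model_dir`, `cls_model_dir`, and `rec_model_dir` to specify the paths of the detection, direction classification, and recognition inference models, respectively. The `use_angle_cls` parameter controls whether the direction classification model is enabled. `use_mp` specifies whether to use multiprocessing, and `total_process_num` sets the number of processes used when multiprocessing is enabled. Visualized recognition results are saved to the ./inference_results folder by default.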

-```
+```shell
 # Use the direction classifier
 python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true

 # Do not use the direction classifier
 python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false
-```
-

+# Use multiprocessing
+python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false --use_mp=True --total_process_num=6
+```