Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into paddle2onnx

Commit 8485edfc authored Nov 08, 2021 by tink2123
20 changed files
--- a/PPOCRLabel/README.md
+++ b/PPOCRLabel/README.md
@@ -21,12 +21,9 @@ PPOCRLabel is a semi-automatic graphic annotation tool suitable for OCR field, w
  - Click to modify the recognition result. (If you can't change the result, switch to the system default input method, or switch back to the original input method.)
 - 2020.12.18: Support re-recognition of a single label box (by [ninetailskim](https://github.com/ninetailskim)), improved shortcut keys.
-### TODO:
+## 1. Installation
- Lock box mode: For the same scene data, the size and position of the locked detection box can be transferred between different pictures.
-## Installation
+### 1.1 Environment Preparation
-### 1. Environment Preparation
 #### **Install PaddlePaddle 2.0**
@@ -66,7 +63,7 @@ If you getting this error `OSError: [WinError 126] The specified module could no
 Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)
-### 2. Install PPOCRLabel
+### 1.2 Install PPOCRLabel
 #### Windows
@@ -94,9 +91,9 @@ cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
 python3 PPOCRLabel.py
 ```
-## Usage
+## 2. Usage
-### Steps
+### 2.1 Steps
 1. Build and launch using the instructions above.
@@ -140,9 +137,9 @@ python3 PPOCRLabel.py
 |  rec_gt.txt   | The recognition label file, which can be directly used for PPOCR identification model training, is generated after the user clicks on the menu bar "File"-"Export recognition result". |
 |   crop_img    | The recognition data, generated at the same time with *rec_gt.txt* |
-## Explanation
+## 3. Explanation
-### Shortcut keys
+### 3.1 Shortcut keys
 | Shortcut keys            | Description                                      |
 | ------------------------ | ------------------------------------------------ |
@@ -162,31 +159,37 @@ python3 PPOCRLabel.py
 | Ctrl--                   | Zoom out                                         |
 | ↑→↓←                     | Move selected box                                |
-### Built-in Model
+### 3.2 Built-in Model
 - Default model: PPOCRLabel uses the Chinese and English ultra-lightweight OCR model in PaddleOCR by default, supports Chinese, English and number recognition, and multiple language detection.
 - Model language switching: To change the built-in model language, click "PaddleOCR"-"Choose OCR Model" in the menu bar. Currently supported languages include French, German, Korean, and Japanese.
  For specific model download links, please refer to [PaddleOCR Model List](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md#multilingual-recognition-modelupdating)
- Custom model: The model trained by users can be replaced by modifying PPOCRLabel.py in [PaddleOCR class instantiation](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/PPOCRLabel/PPOCRLabel.py#L110) referring [Custom Model Code](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md#use-custom-model)
+- **Custom Model**: To replace the built-in model with your own inference model, follow [Custom Model Code Usage](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_en/whl_en.md#31-use-by-code) and modify the [instantiation of the PaddleOCR class](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/PPOCRLabel/PPOCRLabel.py#L116) in PPOCRLabel.py:
+  add the parameter `det_model_dir` to `self.ocr = PaddleOCR(use_pdserving=False, use_angle_cls=True, det=True, cls=True, use_gpu=gpu, lang=lang)`
-### Export Label Result
+### 3.3 Export Label Result
 PPOCRLabel supports three ways to export Label.txt
 - Automatic export: After selecting "File - Auto Export Label Mode", the program writes the annotations into Label.txt every time the user confirms an image. If this option is not turned on, the annotations are exported automatically once the user has manually confirmed 5 images.
+  > Automatic export mode is turned off by default
 - Manual export: Click "File-Export Marking Results" to manually export the label.
 - Close application export
-### Export Partial Recognition Results
+### 3.4 Export Partial Recognition Results
-For some data that are difficult to recognize, the recognition results will not be exported by **unchecking** the corresponding tags in the recognition results checkbox.
+For data that is difficult to recognize, **uncheck** the corresponding tags in the recognition results checkbox so that their recognition results are not exported. An unchecked recognition result is saved with `difficult` set to `True` in the label file `label.txt`.
-*Note: The status of the checkboxes in the recognition results still needs to be saved manually by clicking Save Button.*
+> *Note: The status of the checkboxes in the recognition results still needs to be saved manually by clicking Save Button.*
-### Error message
+### 3.5 Error message
 - If paddleocr is installed with whl, it has a higher priority than calling PaddleOCR class with paddleocr.py, which may cause an exception if whl package is not updated.

--- a/PPOCRLabel/README_ch.md
+++ b/PPOCRLabel/README_ch.md
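@@ -21,16 +21,12 @@ PPOCRLabel is a semi-automatic graphic annotation tool suitable for the OCR field, with built-in P
  - Recognition results are now modified with a single click. (If you cannot modify the result, switch to the system default input method, or switch back to the original input method.)
 - 2020.12.18: Support re-recognition of a single label box (by [ninetailskim](https://github.com/ninetailskim)), improved shortcut keys.
-#### Coming soon
- Lock box mode: For data from the same scene, the size and position of a locked detection box can be carried over between different images.
 If you are interested in the above or have other ideas for improving the tool, you are welcome to join our SIG team and develop with us. You can complete the questionnaire and prerequisite tasks [here](https://github.com/PaddlePaddle/PaddleOCR/issues/1728); once we have confirmed them you can formally join, enjoy SIG benefits, and contribute to open-source OCR together (note: improvements to PPOCRLabel also count as PaddleOCR prerequisite tasks).
-## Installation
+## 1. Installation
-### 1. Environment Preparation
+### 1.1 Environment Preparation
 #### Install PaddlePaddle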
 ```bash
@@ -67,7 +63,7 @@ pip3 install -r requirements.txt
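 Note: on Windows, it is recommended to download the shapely package from [here](https://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely) and install it; shapely installed directly via pip may fail with `[WinError 126] The specified module could not be found`.
-### 2. Install PPOCRLabel
+### 1.2 Install PPOCRLabel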
 #### Windows
@@ -95,11 +91,9 @@ cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
 python3 PPOCRLabel.py --lang ch
 ```
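+## 2. Usage
+### 2.1 Steps
-## Usage
-### Steps
 1. Install and run: use the commands above to install and launch the program.
 2. Open folder: click "File" - "Open Directory" in the menu bar to select the folder of images to label<sup>[1]</sup>.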
@@ -130,9 +124,9 @@ python3 PPOCRLabel.py --lang ch
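 |  rec_gt.txt   | The recognition label file. It can be used directly for PPOCR recognition model training, and is generated after the user clicks "File" - "Export recognition result" in the menu bar. |
 |   crop_img    | The recognition data: images cropped according to the detection boxes. Generated at the same time as rec_gt.txt. |
-## Explanation
+## 3. Explanation
-### Shortcut keys
+### 3.1 Shortcut keys
 | Shortcut keys    | Description                  |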
 | ---------------- | ---------------------------- |
@@ -152,29 +146,35 @@ python3 PPOCRLabel.py --lang ch
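 | Ctrl--           | Zoom out                     |
 | ↑→↓←             | Move selected box            |
-### Built-in Model
+### 3.2 Built-in Model
 - Default model: PPOCRLabel uses the Chinese and English ultra-lightweight OCR model in PaddleOCR by default, which supports Chinese, English and number recognition, and multi-language detection.
 - Model language switching: The built-in model language can be changed via "PaddleOCR" - "Choose Model" in the menu bar. Currently supported languages include French, German, Korean, and Japanese. For specific model download links, see the [PaddleOCR Model List](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md).
- - Custom model: Following [Custom Model Code Usage](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%A8%A1%E5%9E%8B), users can substitute their own trained model by modifying the [instantiation of the PaddleOCR class](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/PPOCRLabel/PPOCRLabel.py#L110) in PPOCRLabel.py.
+ - **Custom model**: To replace the built-in model with your own inference model, follow [Custom Model Code Usage](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/whl.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%A8%A1%E5%9E%8B) and modify the [instantiation of the PaddleOCR class](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/PPOCRLabel/PPOCRLabel.py#L116) in PPOCRLabel.py:
-### Export Label Result
+   `self.ocr = PaddleOCR(use_pdserving=False, use_angle_cls=True, det=True, cls=True, use_gpu=gpu, lang=lang)`, passing your own model via `det_model_dir`.
+### 3.3 Export Label Result
 PPOCRLabel supports three export methods:
 - Automatic export: After clicking "File - Auto Export Label Mode", the program writes the annotations into Label.txt every time the user confirms an image. If this option is not turned on, the annotations are exported automatically once the user has manually confirmed 5 images.
+  > Automatic export is turned off by default
 - Manual export: Click "File - Export Marking Results" to export the labels manually.
 - Export on closing the application
-### Export Partial Recognition Results
+### 3.4 Export Partial Recognition Results
-For data that is difficult to recognize, **uncheck** the corresponding tags in the recognition results checkbox so that their recognition results are not exported.
+For data that is difficult to recognize, **uncheck** the corresponding tags in the recognition results checkbox so that their recognition results are not exported. An unchecked recognition result is saved with `difficult` set to `True` in the label file `label.txt`.
-*Note: The state of the checkboxes in the recognition results is kept only after the user manually clicks to confirm.*
+> *Note: The state of the checkboxes in the recognition results is kept only after the user manually clicks to confirm.*
-### Error message
+### 3.5 Error message
 - If paddleocr is also installed from the whl package, it takes priority over calling the PaddleOCR class through paddleocr.py, which causes program errors when the whl package is not up to date.
 - PPOCRLabel **does not support** automatic labeling of images with Chinese file names.
@@ -194,6 +194,6 @@ PPOCRLabel supports three export methods: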
    pip install opencv-contrib-python-headless==4.2.0.32
    ```
-### References
+### 4. References
 1.[Tzutalin. LabelImg. Git code (2015)](https://github.com/tzutalin/labelImg)
--- a/README.md
+++ b/README.md
@@ -119,7 +119,7 @@ For a new language request, please refer to [Guideline for new language_requests
    - [Table Recognition](./ppstructure/table/README.md)
 - Academic Circles
    - [Two-stage Algorithm](./doc/doc_en/algorithm_overview_en.md)
-    - [PGNet Algorithm](./doc/doc_en/algorithm_overview_en.md)
+    - [PGNet Algorithm](./doc/doc_en/pgnet_en.md)
    - [Python Inference](./doc/doc_en/inference_en.md)
 - Data Annotation and Synthesis
    - [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md)

--- a/README_ch.md
+++ b/README_ch.md
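@@ -109,15 +109,16 @@ PaddleOCR aims to create a rich, leading, and practical OCR tool library, helping
 - [PP-Structure Information Extraction](./ppstructure/README_ch.md)
    - [Layout Analysis](./ppstructure/layout/README_ch.md)
    - [Table Recognition](./ppstructure/table/README_ch.md)
+- Academic Circles
+    - [Two-stage Models: Introduction and Download](./doc/doc_ch/algorithm_overview.md)
+    - [End-to-end PGNet Algorithm](./doc/doc_ch/pgnet.md)
+    - [Inference with the Python Prediction Engine](./doc/doc_ch/inference.md)
+    - [Add New Algorithms Using the PaddleOCR Architecture](./doc/doc_ch/add_new_algorithm.md)
 - Data Annotation and Synthesis
    - [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README_ch.md)
    - [Data Synthesis Tool: Style-Text](./StyleText/README_ch.md)
    - [Other Data Annotation Tools](./doc/doc_ch/data_annotation.md)
    - [Other Data Synthesis Tools](./doc/doc_ch/data_synthesis.md)
- Academic Circles
-    - [Two-stage Models: Introduction and Download](./doc/doc_ch/algorithm_overview.md)
-    - [End-to-end PGNet Algorithm](./doc/doc_ch/pgnet.md)
-    - [Inference with the Python Prediction Engine](./doc/doc_ch/inference.md)
 - Datasets
    - [General Chinese and English OCR Datasets](./doc/doc_ch/datasets.md)
    - [Handwritten Chinese OCR Datasets](./doc/doc_ch/handwritten_datasets.md)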

--- a/benchmark/readme.md
+++ b/benchmark/readme.md
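-# PaddleOCR DB/EAST algorithm training benchmark test
+# PaddleOCR DB/EAST/PSE algorithm training benchmark test
 The files in the PaddleOCR/benchmark directory are used to collect and analyze training logs.
 Training uses the icdar2015 dataset, which contains 1000 training images and 500 test images. The model configuration uses resnet18_vd as the backbone and is trained with batch_size=8 and batch_size=16 respectively.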
@@ -28,7 +28,3 @@ det_res18_db_v2.0_sp_bs8_fp32_1
 det_res18_db_v2.0_mp_bs16_fp32_1
 det_res18_db_v2.0_mp_bs8_fp32_1
 ```
--- a/benchmark/run_benchmark_det.sh
+++ b/benchmark/run_benchmark_det.sh
@@ -6,7 +6,7 @@ function _set_params(){
    run_mode=${1:-"sp"}          # single GPU: sp | multiple GPUs: mp
    batch_size=${2:-"64"}
    fp_item=${3:-"fp32"}        # fp32|fp16
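-    max_iter=${4:-"500"}       # optional, for when the code must be modified to stop training early
+    max_iter=${4:-"10"}       # optional, for when the code must be modified to stop training early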
    model_name=${5:-"model_name"}
    run_log_path=${TRAIN_LOG_DIR:-$(pwd)}  # TRAIN_LOG_DIR: QA will set this parameter later
@@ -20,7 +20,7 @@ function _train(){
    echo "Train on ${num_gpu_devices} GPUs"
    echo "current CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES, gpus=$num_gpu_devices, batch_size=$batch_size"
-    train_cmd="-c configs/det/${model_name}.yml -o Train.loader.batch_size_per_card=${batch_size} Global.epoch_num=${max_iter} "   
+    train_cmd="-c configs/det/${model_name}.yml -o Train.loader.batch_size_per_card=${batch_size} Global.epoch_num=${max_iter} Global.eval_batch_step=[0,20000] Global.print_batch_step=2"   
    case ${run_mode} in
      sp) 
        train_cmd="python3.7 tools/train.py "${train_cmd}""
@@ -39,18 +39,24 @@ function _train(){
        echo -e "${model_name}, SUCCESS"
        export job_fail_flag=0
    fi
-    kill -9 `ps -ef|grep 'python3.7'|awk '{print $2}'`
    if [ $run_mode = "mp" -a -d mylog ]; then
        rm ${log_file}
        cp mylog/workerlog.0 ${log_file}
    fi
+}
-    # run log analysis
+function _analysis_log(){
-    analysis_cmd="python3.7 benchmark/analysis.py --filename ${log_file}  --mission_name ${model_name} --run_mode ${mode} --direction_id 0 --keyword 'ips:' --base_batch_size ${batch_szie} --skip_steps 1 --gpu_num ${num_gpu_devices}  --index 1  --model_mode=-1  --ips_unit=samples/sec"
+    analysis_cmd="python3.7 benchmark/analysis.py --filename ${log_file}  --mission_name ${model_name} --run_mode ${run_mode} --direction_id 0 --keyword 'ips:' --base_batch_size ${batch_size} --skip_steps 1 --gpu_num ${num_gpu_devices}  --index 1  --model_mode=-1  --ips_unit=samples/sec"
    eval $analysis_cmd
 }
+function _kill_process(){
+    kill -9 `ps -ef|grep 'python3.7'|awk '{print $2}'`
+}
 _set_params $@
 _train
+_analysis_log
+_kill_process
\ No newline at end of file
--- a/benchmark/run_det.sh
+++ b/benchmark/run_det.sh
@@ -3,11 +3,11 @@
 # 1 Install the dependencies required by the model (note if optimization strategies need to be enabled)
 python3.7 -m pip install -r requirements.txt
 # 2 Copy the data and pretrained models required by the model
-wget -c  -p ./tain_data/  https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/icdar2015.tar && cd train_data  && tar xf icdar2015.tar && cd ../
+wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/icdar2015.tar && cd train_data  && tar xf icdar2015.tar && cd ../
-wget -c -p ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams
+wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams
 # 3 Batch run (if batching is inconvenient, steps 1 and 2 must be placed in each individual model)
-model_mode_list=(det_res18_db_v2.0 det_r50_vd_east)
+model_mode_list=(det_res18_db_v2.0 det_r50_vd_east det_r50_vd_pse)
 fp_item_list=(fp32)
 bs_list=(8 16)
 for model_mode in ${model_mode_list[@]}; do
@@ -15,11 +15,11 @@ for model_mode in ${model_mode_list[@]}; do
          for bs_item in ${bs_list[@]}; do
            echo "index is speed, 1gpus, begin, ${model_name}"
            run_mode=sp
-            CUDA_VISIBLE_DEVICES=0 bash benchmark/run_benchmark_det.sh ${run_mode} ${bs_item} ${fp_item} 10 ${model_mode}     #  (5min)
+            CUDA_VISIBLE_DEVICES=0 bash benchmark/run_benchmark_det.sh ${run_mode} ${bs_item} ${fp_item} 2 ${model_mode}     #  (5min)
            sleep 60
            echo "index is speed, 8gpus, run_mode is multi_process, begin, ${model_name}"
            run_mode=mp
-            CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash benchmark/run_benchmark_det.sh ${run_mode} ${bs_item} ${fp_item} 10 ${model_mode} 
+            CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash benchmark/run_benchmark_det.sh ${run_mode} ${bs_item} ${fp_item} 2 ${model_mode} 
            sleep 60
            done
      done

--- a/configs/det/ch_PP-OCRv2/ch_PP-OCR_det_distill.yml
+++ b/configs/det/ch_PP-OCRv2/ch_PP-OCR_det_distill.yml
@@ -90,7 +90,7 @@ Optimizer:
 PostProcess:
  name: DistillationDBPostProcess
-  model_name: ["Student", "Student2"]
+  model_name: ["Student"]
  key: head_out
  thresh: 0.3
  box_thresh: 0.6

--- a/deploy/cpp_infer/include/ocr_rec.h
+++ b/deploy/cpp_infer/include/ocr_rec.h
@@ -44,7 +44,8 @@ public:
                          const int &gpu_id, const int &gpu_mem,
                          const int &cpu_math_library_num_threads,
                          const bool &use_mkldnn, const string &label_path,
-                          const bool &use_tensorrt, const std::string &precision) {
+                          const bool &use_tensorrt, const std::string &precision,
+                          const int &rec_batch_num) {
    this->use_gpu_ = use_gpu;
    this->gpu_id_ = gpu_id;
    this->gpu_mem_ = gpu_mem;
@@ -52,6 +53,7 @@ public:
    this->use_mkldnn_ = use_mkldnn;
    this->use_tensorrt_ = use_tensorrt;
    this->precision_ = precision;
+    this->rec_batch_num_ = rec_batch_num;
    this->label_list_ = Utility::ReadDict(label_path);
    this->label_list_.insert(this->label_list_.begin(),
@@ -64,7 +66,7 @@ public:
  // Load Paddle inference model
  void LoadModel(const std::string &model_dir);
-  void Run(cv::Mat &img, std::vector<double> *times);
+  void Run(std::vector<cv::Mat> img_list, std::vector<double> *times);
 private:
  std::shared_ptr<Predictor> predictor_;
@@ -82,10 +84,12 @@ private:
  bool is_scale_ = true;
  bool use_tensorrt_ = false;
  std::string precision_ = "fp32";
+  int rec_batch_num_ = 6;
  // pre-process
  CrnnResizeImg resize_op_;
  Normalize normalize_op_;
-  Permute permute_op_;
+  PermuteBatch permute_op_;
  // post-process
  PostProcessor post_processor_;
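
The header change above switches `Run` from a single `cv::Mat` to a list and adds `rec_batch_num_`. A minimal sketch of the batching idea this implies, assuming images are sorted by width/height ratio and then cut into chunks of at most `rec_batch_num_`; the names `RecBatch` and `MakeRecBatches` are hypothetical and not part of PaddleOCR, and `std::stable_sort` is used here instead of the commit's `std::sort` so that ties keep their input order:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One recognition batch: indices into the original image list plus the
// largest width/height ratio among them (it determines the padded width).
struct RecBatch {
  std::vector<int> indices;
  float max_wh_ratio = 0.0f;
};

// Hypothetical helper, not part of PaddleOCR: sort images by w/h ratio and
// cut the sorted order into chunks of at most batch_num images.
std::vector<RecBatch> MakeRecBatches(const std::vector<float> &wh_ratios,
                                     int batch_num) {
  assert(batch_num > 0);
  const std::size_t step = static_cast<std::size_t>(batch_num);

  // argsort: order[k] is the index of the k-th narrowest image.
  std::vector<int> order(wh_ratios.size());
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(), [&wh_ratios](int a, int b) {
    return wh_ratios[a] < wh_ratios[b];
  });

  std::vector<RecBatch> batches;
  for (std::size_t beg = 0; beg < order.size(); beg += step) {
    const std::size_t end = std::min(order.size(), beg + step);
    RecBatch batch;
    for (std::size_t i = beg; i < end; ++i) {
      batch.indices.push_back(order[i]);
      batch.max_wh_ratio = std::max(batch.max_wh_ratio, wh_ratios[order[i]]);
    }
    batches.push_back(batch);
  }
  return batches;
}
```

Grouping images of similar aspect ratio keeps the padding added to each batch small, which is presumably why the commit sorts before batching.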

--- a/deploy/cpp_infer/include/preprocess_op.h
+++ b/deploy/cpp_infer/include/preprocess_op.h
@@ -44,6 +44,11 @@ public:
  virtual void Run(const cv::Mat *im, float *data);
 };
+class PermuteBatch {
+public:
+  virtual void Run(const std::vector<cv::Mat> imgs, float *data);
+};
 class ResizeImgType0 {
 public:
  virtual void Run(const cv::Mat &img, cv::Mat &resize_img, int max_size_len,

--- a/deploy/cpp_infer/include/utility.h
+++ b/deploy/cpp_infer/include/utility.h
@@ -50,6 +50,9 @@ public:
  static cv::Mat GetRotateCropImage(const cv::Mat &srcimage,
                          std::vector<std::vector<int>> box);
+  static std::vector<int> argsort(const std::vector<float>& array);
 };
 } // namespace PaddleOCR
\ No newline at end of file
--- a/deploy/cpp_infer/src/main.cpp
+++ b/deploy/cpp_infer/src/main.cpp
@@ -61,7 +61,7 @@ DEFINE_string(cls_model_dir, "", "Path of cls inference model.");
 DEFINE_double(cls_thresh, 0.9, "Threshold of cls_thresh.");
 // recognition related
 DEFINE_string(rec_model_dir, "", "Path of rec inference model.");
-DEFINE_int32(rec_batch_num, 1, "rec_batch_num.");
+DEFINE_int32(rec_batch_num, 6, "rec_batch_num.");
 DEFINE_string(char_list_file, "../../ppocr/utils/ppocr_keys_v1.txt", "Path of dictionary.");
@@ -146,8 +146,9 @@ int main_rec(std::vector<cv::String> cv_all_img_names) {
    CRNNRecognizer rec(FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
                       FLAGS_gpu_mem, FLAGS_cpu_threads,
                       FLAGS_enable_mkldnn, char_list_file,
-                       FLAGS_use_tensorrt, FLAGS_precision);
+                       FLAGS_use_tensorrt, FLAGS_precision, FLAGS_rec_batch_num);
+    std::vector<cv::Mat> img_list;
    for (int i = 0; i < cv_all_img_names.size(); ++i) {
      LOG(INFO) << "The predict img: " << cv_all_img_names[i];
@@ -156,14 +157,13 @@ int main_rec(std::vector<cv::String> cv_all_img_names) {
        std::cerr << "[ERROR] image read failed! image path: " << cv_all_img_names[i] << endl;
        exit(1);
      }
+      img_list.push_back(srcimg);
+    }
    std::vector<double> rec_times;
-      rec.Run(srcimg, &rec_times);
+    rec.Run(img_list, &rec_times);
    time_info[0] += rec_times[0];
    time_info[1] += rec_times[1];
    time_info[2] += rec_times[2];
-    }
    if (FLAGS_benchmark) {
        AutoLogger autolog("ocr_rec", 
@@ -171,7 +171,7 @@ int main_rec(std::vector<cv::String> cv_all_img_names) {
                           FLAGS_use_tensorrt,
                           FLAGS_enable_mkldnn,
                           FLAGS_cpu_threads,
-                           1, 
+                           FLAGS_rec_batch_num, 
                           "dynamic", 
                           FLAGS_precision, 
                           time_info, 
@@ -209,7 +209,7 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
    CRNNRecognizer rec(FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
                       FLAGS_gpu_mem, FLAGS_cpu_threads,
                       FLAGS_enable_mkldnn, char_list_file,
-                       FLAGS_use_tensorrt, FLAGS_precision);
+                       FLAGS_use_tensorrt, FLAGS_precision, FLAGS_rec_batch_num);
    for (int i = 0; i < cv_all_img_names.size(); ++i) {
      LOG(INFO) << "The predict img: " << cv_all_img_names[i];
@@ -228,19 +228,22 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
      time_info_det[1] += det_times[1];
      time_info_det[2] += det_times[2];
-      cv::Mat crop_img;
+      std::vector<cv::Mat> img_list;
      for (int j = 0; j < boxes.size(); j++) {
+          cv::Mat crop_img;
          crop_img = Utility::GetRotateCropImage(srcimg, boxes[j]);
          if (cls != nullptr) {
              crop_img = cls->Run(crop_img);
          }
-        rec.Run(crop_img, &rec_times);
+          img_list.push_back(crop_img);
+      }
+      rec.Run(img_list, &rec_times);
      time_info_rec[0] += rec_times[0];
      time_info_rec[1] += rec_times[1];
      time_info_rec[2] += rec_times[2];
    }
-    }
    if (FLAGS_benchmark) {
        AutoLogger autolog_det("ocr_det", 
                            FLAGS_use_gpu,
@@ -257,7 +260,7 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
                            FLAGS_use_tensorrt,
                            FLAGS_enable_mkldnn,
                            FLAGS_cpu_threads,
-                            1, 
+                            FLAGS_rec_batch_num, 
                            "dynamic", 
                            FLAGS_precision, 
                            time_info_rec, 

--- a/deploy/cpp_infer/src/ocr_rec.cpp
+++ b/deploy/cpp_infer/src/ocr_rec.cpp
@@ -16,27 +16,48 @@
 namespace PaddleOCR {
-void CRNNRecognizer::Run(cv::Mat &img, std::vector<double> *times) {
+void CRNNRecognizer::Run(std::vector<cv::Mat> img_list, std::vector<double> *times) {
-  cv::Mat srcimg;
+    std::chrono::duration<float> preprocess_diff = std::chrono::steady_clock::now() - std::chrono::steady_clock::now();
-  img.copyTo(srcimg);
+    std::chrono::duration<float> inference_diff = std::chrono::steady_clock::now() - std::chrono::steady_clock::now();
-  cv::Mat resize_img;
+    std::chrono::duration<float> postprocess_diff = std::chrono::steady_clock::now() - std::chrono::steady_clock::now();
+    int img_num = img_list.size();
+    std::vector<float> width_list;
+    for (int i = 0; i < img_num; i++) {
+        width_list.push_back(float(img_list[i].cols) / img_list[i].rows);
+    }
+    std::vector<int> indices = Utility::argsort(width_list);
-  float wh_ratio = float(srcimg.cols) / float(srcimg.rows);
+    for (int beg_img_no = 0; beg_img_no < img_num; beg_img_no += this->rec_batch_num_) {
        auto preprocess_start = std::chrono::steady_clock::now();
-  this->resize_op_.Run(srcimg, resize_img, wh_ratio, this->use_tensorrt_);
+        int end_img_no = min(img_num, beg_img_no + this->rec_batch_num_);
+        float max_wh_ratio = 0;
-  this->normalize_op_.Run(&resize_img, this->mean_, this->scale_,
+        for (int ino = beg_img_no; ino < end_img_no; ino ++) {
-                          this->is_scale_);
+            int h = img_list[indices[ino]].rows;
+            int w = img_list[indices[ino]].cols;
-  std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
+            float wh_ratio = w * 1.0 / h;
+            max_wh_ratio = max(max_wh_ratio, wh_ratio);
+        }
+        std::vector<cv::Mat> norm_img_batch;
+        for (int ino = beg_img_no; ino < end_img_no; ino ++) {
+            cv::Mat srcimg;
+            img_list[indices[ino]].copyTo(srcimg);
+            cv::Mat resize_img;
+            this->resize_op_.Run(srcimg, resize_img, max_wh_ratio, this->use_tensorrt_);
+            this->normalize_op_.Run(&resize_img, this->mean_, this->scale_, this->is_scale_);
+            norm_img_batch.push_back(resize_img);
+        }
-  this->permute_op_.Run(&resize_img, input.data());
+        int batch_width = int(ceilf(32 * max_wh_ratio)) - 1;
+        std::vector<float> input(this->rec_batch_num_ * 3 * 32 * batch_width, 0.0f);
+        this->permute_op_.Run(norm_img_batch, input.data());
        auto preprocess_end = std::chrono::steady_clock::now();
+        preprocess_diff += preprocess_end - preprocess_start;
        // Inference.
        auto input_names = this->predictor_->GetInputNames();
        auto input_t = this->predictor_->GetInputHandle(input_names[0]);
-  input_t->Reshape({1, 3, resize_img.rows, resize_img.cols});
+        input_t->Reshape({this->rec_batch_num_, 3, 32, batch_width});
        auto inference_start = std::chrono::steady_clock::now();
        input_t->CopyFromCpu(input.data());
        this->predictor_->Run();
@@ -52,9 +73,11 @@ void CRNNRecognizer::Run(cv::Mat &img, std::vector<double> *times) {
        output_t->CopyToCpu(predict_batch.data());
        auto inference_end = std::chrono::steady_clock::now();
+        inference_diff += inference_end - inference_start;
        // ctc decode
        auto postprocess_start = std::chrono::steady_clock::now();
+        for (int m = 0; m < predict_shape[0]; m++) {
            std::vector<std::string> str_res;
            int argmax_idx;
            int last_index = 0;
@@ -64,11 +87,11 @@ void CRNNRecognizer::Run(cv::Mat &img, std::vector<double> *times) {
            for (int n = 0; n < predict_shape[1]; n++) {
                argmax_idx =
-        int(Utility::argmax(&predict_batch[n * predict_shape[2]],
+                    int(Utility::argmax(&predict_batch[(m * predict_shape[1] + n) * predict_shape[2]],
-                            &predict_batch[(n + 1) * predict_shape[2]]));
+                                        &predict_batch[(m * predict_shape[1] + n + 1) * predict_shape[2]]));
                max_value =
-        float(*std::max_element(&predict_batch[n * predict_shape[2]],
+                    float(*std::max_element(&predict_batch[(m * predict_shape[1] + n) * predict_shape[2]],
-                                &predict_batch[(n + 1) * predict_shape[2]]));
+                                            &predict_batch[(m * predict_shape[1] + n + 1) * predict_shape[2]]));
                if (argmax_idx > 0 && (!(n > 0 && argmax_idx == last_index))) {
                    score += max_value;
@@ -77,21 +100,23 @@ void CRNNRecognizer::Run(cv::Mat &img, std::vector<double> *times) {
                }
                last_index = argmax_idx;
            }
-  auto postprocess_end = std::chrono::steady_clock::now();
            score /= count;
+            if (isnan(score))
+                continue;
            for (int i = 0; i < str_res.size(); i++) {
                std::cout << str_res[i];
            }
            std::cout << "\tscore: " << score << std::endl;
+        }
-  std::chrono::duration<float> preprocess_diff = preprocess_end - preprocess_start;
+        auto postprocess_end = std::chrono::steady_clock::now();
+        postprocess_diff += postprocess_end - postprocess_start;
+    }
    times->push_back(double(preprocess_diff.count() * 1000));
-  std::chrono::duration<float> inference_diff = inference_end - inference_start;
    times->push_back(double(inference_diff.count() * 1000));
-  std::chrono::duration<float> postprocess_diff = postprocess_end - postprocess_start;
    times->push_back(double(postprocess_diff.count() * 1000));
 }
 void CRNNRecognizer::LoadModel(const std::string &model_dir) {
  //   AnalysisConfig config;
  paddle_infer::Config config;
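
The decode loop in this hunk now addresses the output as a flat `[N, T, C]` buffer, using `(m * predict_shape[1] + n) * predict_shape[2]` as the row offset. A minimal sketch of that greedy CTC decode for one sample, assuming class 0 is the blank label as the `argmax_idx > 0` check suggests; `CtcGreedyDecode` is a hypothetical name, and it returns label indices rather than looking them up in the dictionary:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Greedy CTC decoding for sample m of a flat [N, T, C] probability buffer.
// Class 0 is assumed to be the CTC blank; consecutive repeats are collapsed.
std::vector<int> CtcGreedyDecode(const std::vector<float> &probs, int m,
                                 int steps, int classes) {
  assert(m >= 0 && steps > 0 && classes > 0);
  assert(static_cast<std::size_t>(m + 1) * steps * classes <= probs.size());

  std::vector<int> labels;
  int last_index = 0;
  for (int n = 0; n < steps; ++n) {
    // Row for (sample m, time step n): same offset as in the diff.
    const float *row =
        probs.data() + (static_cast<std::size_t>(m) * steps + n) * classes;
    const int argmax_idx =
        static_cast<int>(std::max_element(row, row + classes) - row);
    // Keep a label only if it is not blank and not a repeat of the last step.
    if (argmax_idx > 0 && !(n > 0 && argmax_idx == last_index)) {
      labels.push_back(argmax_idx);
    }
    last_index = argmax_idx;
  }
  return labels;
}
```

Dropping the `m * steps` term would make every sample read the rows of sample 0, which is the indexing the single-image version relied on.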

--- a/deploy/cpp_infer/src/preprocess_op.cpp
+++ b/deploy/cpp_infer/src/preprocess_op.cpp
@@ -40,6 +40,17 @@ void Permute::Run(const cv::Mat *im, float *data) {
  }
 }
+void PermuteBatch::Run(const std::vector<cv::Mat> imgs, float *data) {
+    for (int j = 0; j < imgs.size(); j ++){
+        int rh = imgs[j].rows;
+        int rw = imgs[j].cols;
+        int rc = imgs[j].channels();
+        for (int i = 0; i < rc; ++i) {
+            cv::extractChannel(imgs[j], cv::Mat(rh, rw, CV_32FC1, data + (j * rc + i) * rh * rw), i);
+        }
+    }
+}
 void Normalize::Run(cv::Mat *im, const std::vector<float> &mean,
                    const std::vector<float> &scale, const bool is_scale) {
  double e = 1.0;
@@ -95,6 +106,7 @@ void CrnnResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img, float wh_ratio,
  float ratio = float(img.cols) / float(img.rows);
  int resize_w, resize_h;
  if (ceilf(imgH * ratio) > imgW)
    resize_w = imgW;
  else
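
`PermuteBatch::Run` above writes each image's channels as separate planes at offset `(j * rc + i) * rh * rw`, which is the NCHW layout the predictor input expects. A minimal sketch of the same layout conversion without OpenCV, assuming every image in the batch has identical height, width and channel count and is stored as interleaved HWC floats; `PermuteBatchHWCToNCHW` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies a batch of interleaved HWC float images into one planar NCHW buffer.
// Every image must have the same height, width and channel count, and `out`
// must hold at least imgs.size() * channels * height * width floats.
void PermuteBatchHWCToNCHW(const std::vector<std::vector<float>> &imgs,
                           int height, int width, int channels, float *out) {
  assert(height > 0 && width > 0 && channels > 0);
  const std::size_t plane = static_cast<std::size_t>(height) * width;
  for (std::size_t j = 0; j < imgs.size(); ++j) {
    assert(imgs[j].size() == plane * channels);
    for (int c = 0; c < channels; ++c) {
      // Destination plane for image j, channel c: (j * C + c) * H * W.
      float *dst = out + (j * channels + c) * plane;
      for (std::size_t p = 0; p < plane; ++p) {
        dst[p] = imgs[j][p * channels + c];
      }
    }
  }
}
```

The equal-size assumption matters: the offset formula only lines up if all images in a batch were resized to the same shape before the permute step.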

--- a/deploy/cpp_infer/src/utility.cpp
+++ b/deploy/cpp_infer/src/utility.cpp
@@ -147,4 +147,17 @@ cv::Mat Utility::GetRotateCropImage(const cv::Mat &srcimage,
  }
 }
+std::vector<int> Utility::argsort(const std::vector<float>& array)
+{
+    const int array_len(array.size());
+    std::vector<int> array_index(array_len, 0);
+    for (int i = 0; i < array_len; ++i)
+        array_index[i] = i;
+    std::sort(array_index.begin(), array_index.end(),
+        [&array](int pos1, int pos2) {return (array[pos1] < array[pos2]); });
+    return array_index;
+}
 } // namespace PaddleOCR
\ No newline at end of file
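
Because `Run` processes images in the order returned by `Utility::argsort`, the recognized text appears to be printed in sorted order rather than input order, as far as this diff shows. A minimal sketch of how results could be mapped back to input order, assuming one result string per image; `RestoreInputOrder` is a hypothetical helper and not part of this commit:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: sorted_results[k] belongs to input image indices[k],
// where `indices` is the permutation returned by an argsort.
std::vector<std::string> RestoreInputOrder(
    const std::vector<int> &indices,
    const std::vector<std::string> &sorted_results) {
  assert(indices.size() == sorted_results.size());
  std::vector<std::string> restored(indices.size());
  for (std::size_t k = 0; k < indices.size(); ++k) {
    assert(indices[k] >= 0 &&
           static_cast<std::size_t>(indices[k]) < restored.size());
    // Scatter the k-th sorted result back to its original position.
    restored[indices[k]] = sorted_results[k];
  }
  return restored;
}
```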
--- a/deploy/pdserving/README.md
+++ b/deploy/pdserving/README.md
@@ -114,7 +114,7 @@ The recognition model is the same.
    git clone https://github.com/PaddlePaddle/PaddleOCR
    # Enter the working directory  
-    cd PaddleOCR/deploy/pdserver/
+    cd PaddleOCR/deploy/pdserving/
    ```
    The pdserver directory contains the code to start the pipeline service and send prediction requests, including:

--- a/deploy/pdserving/README_CN.md
+++ b/deploy/pdserving/README_CN.md
@@ -112,7 +112,7 @@ python3 -m paddle_serving_client.convert --dirname ./ch_ppocr_mobile_v2.0_rec_in
    git clone https://github.com/PaddlePaddle/PaddleOCR
    # Enter the working directory
-    cd PaddleOCR/deploy/pdserver/
+    cd PaddleOCR/deploy/pdserving/
    ```
    The pdserver directory contains the code to start the pipeline service and send prediction requests, including:
    ```

--- a/doc/doc_ch/detection.md
+++ b/doc/doc_ch/detection.md
@@ -96,6 +96,10 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml \
 # Single-node multi-GPU training; set the GPU IDs to use with the --gpus parameter
 python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
+# Multi-node multi-GPU training; set the machine IP addresses with the --ips parameter and the GPU IDs with the --gpus parameter
+python3 -m paddle.distributed.launch --ips="10.21.226.181,10.21.226.133" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
+     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
 ```
 In the commands above, -c selects the configs/det/det_mv3_db.yml configuration file for training.
@@ -106,6 +110,15 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/
 python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
 ```
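+**Note:** For multi-node multi-GPU training, replace the ips value in the command above with the addresses of your machines; the machines must be able to ping each other. The command to view a machine's IP address is `ifconfig`.
+To further speed up training, you can use [automatic mixed precision training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html). Taking single-node single-GPU training as an example, the command is: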
+```shell
+python3 tools/train.py -c configs/det/det_mv3_db.yml \
+     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
+     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
+ ```
 <a name="14-----"></a>
 ## 1.4 Resume Training

--- a/doc/doc_ch/training.md
+++ b/doc/doc_ch/training.md
@@ -4,15 +4,16 @@
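 It also briefly introduces the components of PaddleOCR model training data and how to prepare data to finetune models in vertical scenarios.
- [1. Basic Concepts](#基本概念)
+- [1. Configuration File Description](#配置文件)
-  * [1.1 Learning Rate](#学习率)
+- [2. Basic Concepts](#基本概念)
-  * [1.2 Regularization](#正则化)
+  * [2.1 Learning Rate](#学习率)
-  * [1.3 Evaluation Metrics](#评估指标)
+  * [2.2 Regularization](#正则化)
- [2. Data and Vertical Scenarios](#数据与垂类场景)
+  * [2.3 Evaluation Metrics](#评估指标)
-  * [2.1 Training Data](#训练数据)
+- [3. Data and Vertical Scenarios](#数据与垂类场景)
-  * [2.2 Vertical Scenarios](#垂类场景)
+  * [3.1 Training Data](#训练数据)
-  * [2.3 Building Your Own Dataset](#自己构建数据集)
+  * [3.2 Vertical Scenarios](#垂类场景)
-* [3. FAQ](#常见问题)
+  * [3.3 Building Your Own Dataset](#自己构建数据集)
+* [4. FAQ](#常见问题)
 <a name="基本概念"></a>
 ## 1. Basic Concepts
@@ -23,7 +24,7 @@ OCR (Optical Character Recognition) refers to analyzing images
 The following parameters need attention when tuning the model:
 <a name="学习率"></a>
-### 1.1 Learning Rate
+### 2.1 Learning Rate
 The learning rate is one of the important hyperparameters for training neural networks. It represents the step size by which the gradient moves toward the optimum of the loss function in each iteration.
 PaddleOCR provides several learning rate update strategies, which can be changed through the configuration file, for example:
@@ -42,7 +43,7 @@ Piecewise means piecewise constant decay, specifying different learning
 warmup_epoch means that during the first 5 epochs the learning rate gradually increases from 0 to base_lr. For all strategies, see the code in [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
 <a name="正则化"></a>
-### 1.2 Regularization
+### 2.2 Regularization
 Regularization can effectively prevent overfitting. PaddleOCR provides L1 and L2 regularization, the most commonly used regularization methods. L1 regularization adds a regularization term to the objective function to reduce the sum of the absolute values of the parameters, while L2 regularization adds a term that reduces the sum of the squares of the parameters. The configuration is as follows: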
@@ -55,7 +56,7 @@ Optimizer:
 ```
 <a name="评估指标"></a>
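-### 1.3 Evaluation Metrics
+### 2.3 Evaluation Metrics
 (1) Detection stage: evaluation is based first on the IOU between the detection box and the annotation box; a detection is judged correct if the IOU exceeds a threshold. Unlike boxes in general object detection, the detection and annotation boxes here are represented as polygons. Detection precision: the proportion of correct detection boxes among all detection boxes, which mainly serves as the detection metric. Detection recall: the proportion of correct detection boxes among all annotation boxes, which mainly measures missed detections.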
@@ -65,10 +66,10 @@ Optimizer:
 <a name="数据与垂类场景"></a>
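-## 2. Data and Vertical Scenarios
+## 3. Data and Vertical Scenarios
 <a name="训练数据"></a>
-### 2.1 Training Data
+### 3.1 Training Data
 For the currently open-sourced models, the datasets and their sizes are as follows:
    - Detection:  
@@ -83,13 +84,14 @@ Optimizer:
 The public datasets are all open source; users can search for and download them, or refer to [Chinese datasets](./datasets.md). The synthetic data is not open-sourced for now; users can synthesize their own with open-source tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
 <a name="垂类场景"></a>
-### 2.2 Vertical Scenarios
+### 3.2 Vertical Scenarios
 PaddleOCR mainly focuses on general OCR. If you have vertical-domain needs, you can train your own model with PaddleOCR plus vertical-domain data;
 if you lack annotated data or do not want to invest in R&D, it is recommended to call the open APIs directly, which cover some of the more common vertical domains.
 <a name="自己构建数据集"></a>
-### 2.3 Building Your Own Dataset
+### 3.3 Building Your Own Dataset
 Some rules of thumb for building a dataset:
@@ -107,7 +109,7 @@ PaddleOCR mainly focuses on general OCR. If you have vertical-domain needs, you can use PaddleOCR+
 <a name="常见问题"></a>
-## 3. FAQ
+## 4. FAQ
 **Q**: When training CRNN recognition, how do I choose a suitable network input shape?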

--- a/doc/doc_en/detection_en.md
+++ b/doc/doc_en/detection_en.md
@@ -99,6 +99,18 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o   \
 # Set the GPU ID used by the '--gpus' parameter.
 python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
+# multi-Node, multi-GPU training
+# Set the IPs of your nodes used by the '--ips' parameter. Set the GPU ID used by the '--gpus' parameter.
+python3 -m paddle.distributed.launch --ips="10.21.226.181,10.21.226.133" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
+     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
+```
+**Note:** For multi-node multi-GPU training, replace the `ips` value in the preceding command with the addresses of your machines; the machines must be able to ping each other. The command for viewing a machine's IP address is `ifconfig`.
+If you want to further speed up training, you can use [automatic mixed precision training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_en.html). For single-GPU training, the command is as follows:
+```
+python3 tools/train.py -c configs/det/det_mv3_db.yml \
+     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
+     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
 ```
 ### 2.2 Load Trained Model and Continue Training