"...MoQ/huggingface-transformers/examples/seq2seq/README.md" did not exist on "aebde649e30016aa33b2e1345cb22210a2e49b04"
Commit 79aec8f2 authored by LDOUBLEV's avatar LDOUBLEV
Browse files

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleOCR into fixocr

parents caaf0bd4 7214a741
# Inference Based on the Python Prediction Engine
An inference model (a model saved by fluid.io.save_inference_model)
is a frozen model saved after training, and is mostly used for deployment and prediction.
......
# Quick Start with the Chinese OCR Models
## 1. Environment Setup
Please refer to [Quick Installation](./installation.md) to set up the PaddleOCR runtime environment.
## 2. Download the Inference Models
|Model name|Description|Detection model|Recognition model|Recognition model with space support|
|-|-|-|-|-|
|chinese_db_crnn_mobile|Ultra-lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)|
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)|
*On Windows, if wget is not installed, you can paste the links into a browser to download the models, then extract them into the corresponding directory (a Python alternative is sketched after the next code block).*
Copy the download links of the detection and recognition `inference models` from the table above, then download and extract them:
```
mkdir inference && cd inference
# Download the detection model and extract it
wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package}
# Download the recognition model and extract it
wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package}
cd ..
```
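If `wget` is unavailable, the models can also be fetched with a few lines of Python; this sketch uses only the standard library and the ultra-lightweight detection model URL from the table above:
```python
import os
import tarfile
import urllib.request

# Detection model of the ultra-lightweight Chinese OCR model (URL from the table above)
url = "https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar"

os.makedirs("inference", exist_ok=True)
tar_path = os.path.join("inference", os.path.basename(url))
urllib.request.urlretrieve(url, tar_path)   # download the tar package
with tarfile.open(tar_path) as tar:
    tar.extractall(path="inference")        # extract into the inference/ directory
```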
Take the ultra-lightweight model as an example:
```
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and extract it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and extract it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
cd ..
```
After extraction, the file structure should look like this:
```
|-inference
|-ch_rec_mv3_crnn
|- model
|- params
|-ch_det_mv3_db
|- model
|- params
...
```
## 3. Predict a Single Image or a Set of Images
The following commands run text detection and recognition as a pipeline. When running prediction, specify the path of a single image or an image directory with the `image_dir` parameter, the path of the detection inference model with `det_model_dir`, and the path of the recognition inference model with `rec_model_dir`. The visualized results are saved to the `./inference_results` directory by default.
```bash
# Predict the single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# Predict the set of images specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# To predict on CPU, set the use_gpu parameter to False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
```
To try the general Chinese OCR model, download the corresponding models following the steps above and update the related parameters, for example:
```
# Predict the single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
```
To try the general Chinese OCR model with space support, likewise download the corresponding models and update the related parameters, for example:
```
# Predict the single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/"
```
For more ways to use the detection and recognition pipeline, please refer to [Inference Based on the Python Prediction Engine](./inference.md) in the tutorials.
In addition, the tutorials also cover other deployment options for the Chinese OCR models:
- Inference based on the C++ prediction engine (coming soon)
- [Serving deployment](./doc/doc_ch/serving.md)
- On-device deployment (coming soon)
......@@ -94,7 +94,10 @@ Each line of word_dict.txt contains a single character, mapping characters to numeric indices,
`ppocr/utils/ic15_dict.txt` is an English dictionary containing 36 characters;
you can use it as needed.
To customize the dict file, add a `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.
*If you want to support recognition of the "space" character, set the `use_space_char` field in the yml file to `true`. Note that `use_space_char` only takes effect when `character_type=ch`.*
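As a minimal sketch of what `use_space_char` does (mirroring the `CharacterOps` logic included in this commit), the space is simply appended as one extra class to the character set loaded from the dictionary file:
```python
def load_char_set(character_dict_path, use_space_char=False):
    """Load the recognition character set, optionally appending a space class."""
    character_str = ""
    with open(character_dict_path, "rb") as fin:
        for line in fin.readlines():
            character_str += line.decode("utf-8").strip("\n").strip("\r\n")
    if use_space_char:
        character_str += " "  # the space becomes one more label the recognizer can emit
    return list(character_str)

# e.g. chars = load_char_set("./ppocr/utils/ppocr_keys_v1.txt", use_space_char=True)
```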
### Start Training
......@@ -124,6 +127,18 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml
```
- Data augmentation
PaddleOCR provides a variety of data augmentation methods. To add perturbations during training, set `distort: true` in the configuration file.
The default perturbations are: color space conversion (cvtColor), blur, jitter, Gaussian noise, random crop, perspective, and color reverse.
During training, each perturbation is selected with a 50% probability, as shown in the sketch below. For the concrete implementation, refer to [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
*Due to OpenCV compatibility issues, the perturbation operations are currently supported only on GPU.*
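A condensed sketch of the 50% gating pattern used in `warp()` (`img_tools.py` draws `random.randint(1, 100)` and applies the operation when the draw is at least 50; `random.random()` below is an equivalent stand-in):
```python
import random

def apply_with_prob(img, op, prob=0.5):
    """Apply augmentation `op` to `img` with probability `prob`."""
    return op(img) if random.random() < prob else img

# e.g. img = apply_with_prob(img, blur); img = apply_with_prob(img, add_gasuss_noise)
```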
- Training
PaddleOCR supports alternating training and evaluation. Set `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to control the evaluation frequency; by default, evaluation runs every 500 iterations. During evaluation, the best-accuracy model so far is saved as `output/rec_CRNN/best_accuracy` (see the parsing sketch below for the list form of `eval_batch_step`).
If the validation set is large, evaluation will be time-consuming; it is recommended to evaluate less frequently, or to evaluate only after training finishes.
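`eval_batch_step` also accepts a two-element list `[start, interval]`, in which case evaluation only begins after iteration `start`. A minimal sketch of how the training loop in this commit interprets the two forms:
```python
eval_batch_step = [1000, 2000]  # value taken from the yml config

start_eval_step = 0
if isinstance(eval_batch_step, list) and len(eval_batch_step) >= 2:
    start_eval_step, eval_batch_step = eval_batch_step

def should_eval(train_batch_id):
    """Evaluate once we are past start_eval_step and hit the interval."""
    return (train_batch_id > start_eval_step and
            (train_batch_id - start_eval_step) % eval_batch_step == 0)
```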
......@@ -157,12 +172,26 @@ Global:
  character_type: ch
  # Add a custom dictionary; if you change the dictionary, point this path to the new one
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
  # Apply data augmentation during training
  distort: true
  # Recognize spaces
  use_space_char: true
  ...
  # Modify the reader type
  reader_yml: ./configs/rec/rec_chinese_reader.yml
  ...
...
Optimizer:
  ...
  # Add a learning rate decay strategy
  decay:
    function: cosine_decay
    # Number of iterations per epoch
    step_each_epoch: 20
    # Total number of training epochs
    total_epoch: 1000
```
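For reference, `cosine_decay` anneals the learning rate from its initial value toward zero over `total_epoch` epochs. A minimal sketch of the schedule, assuming the usual formulation of Paddle's `cosine_decay` (check the Paddle documentation for the exact operator semantics):
```python
import math

def cosine_decay(lr0, global_step, step_each_epoch, total_epoch):
    """Cosine annealing: returns lr0 at epoch 0 and ~0 at total_epoch."""
    cur_epoch = global_step // step_each_epoch
    return lr0 * 0.5 * (math.cos(cur_epoch * math.pi / total_epoch) + 1)

# e.g. cosine_decay(0.001, global_step=10000, step_each_epoch=20, total_epoch=1000)
```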
**Note: the configuration file used for prediction/evaluation must be identical to the one used for training.**
......
# References
```
1. EAST:
@inproceedings{zhou2017east,
title={EAST: an efficient and accurate scene text detector},
author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={5551--5560},
year={2017}
}
2. DB:
@article{liao2019real,
title={Real-time Scene Text Detection with Differentiable Binarization},
author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
journal={arXiv preprint arXiv:1911.08947},
year={2019}
}
3. DTRB:
@inproceedings{baek2019wrong,
title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={4715--4723},
year={2019}
}
4. SAST:
@inproceedings{wang2019single,
title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
pages={1277--1285},
year={2019}
}
5. SRN:
@article{yu2020towards,
title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks},
author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui},
journal={arXiv preprint arXiv:2003.12294},
year={2020}
}
6. end2end-psl:
@inproceedings{sun2019chinese,
title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning},
author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9086--9095},
year={2019}
}
```
# Service Deployment
PaddleOCR provides two service deployment options:
- HubServing-based deployment: already integrated into PaddleOCR ([code](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/ocr_hubserving)); follow this tutorial to use it;
- PaddleServing-based deployment: see the PaddleServing [demo](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr); it will also be integrated into PaddleOCR later.
The service deployment directory contains three service packages, for detection, recognition, and the two-stage pipeline; install and start the one that fits your needs. The layout is:
```
deploy/hubserving/
└─ ocr_det      detection module service package
└─ ocr_rec      recognition module service package
└─ ocr_system   detection + recognition pipeline service package
```
Each service package contains three files. Taking the two-stage pipeline package as an example, the layout is:
```
deploy/hubserving/ocr_system/
└─ __init__.py   empty file
└─ config.json   configuration file, passed in as an argument when the service starts
└─ module.py     main module, containing the complete service logic
```
## Start the Service
The following steps use the detection + recognition two-stage pipeline service as an example. If you only need the detection service or the recognition service, substitute the corresponding file paths.
### 1. Install paddlehub
`pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple`
### 2. Install the Service Module
PaddleOCR provides three service modules; install the ones you need. For example:
Install the detection service module:
`hub install deploy/hubserving/ocr_det/`
Or install the recognition service module:
`hub install deploy/hubserving/ocr_rec/`
Or install the detection + recognition pipeline service module:
`hub install deploy/hubserving/ocr_system/`
### 3. Modify the Configuration File
Specify the model paths, whether to use the GPU, whether to visualize the results, and other parameters in config.json. For example, the configuration of the pipeline service ocr_system:
```json
{
"modules_info": {
"ocr_system": {
"init_args": {
"version": "1.0.0",
"det_model_dir": "./inference/det/",
"rec_model_dir": "./inference/rec/",
"use_gpu": true
},
"predict_args": {
"visualization": false
}
}
}
}
```
The model paths here must point to `inference` models.
### 4. Start the Service
`hub serving start -m ocr_system --config deploy/hubserving/ocr_system/config.json`
This completes the deployment of a serving API; the default port is 8866.
**NOTE:** To predict on GPU (i.e., with use_gpu set to true in the config), you must set the CUDA_VISIBLE_DEVICES environment variable before starting the service, e.g. `export CUDA_VISIBLE_DEVICES=0`; otherwise it is not needed.
## Send a Prediction Request
With the service configured, the few lines of code below send a prediction request and retrieve the result:
```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
return base64.b64encode(image).decode('utf8')
# Send the HTTP request
data = {'images':[cv2_to_base64(open("./doc/imgs/11.jpg", 'rb').read())]}
headers = {"Content-type": "application/json"}
# url = "http://127.0.0.1:8866/predict/ocr_det"
# url = "http://127.0.0.1:8866/predict/ocr_rec"
url = "http://127.0.0.1:8866/predict/ocr_system"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
You may need to change the port number and the service module name in the `url` string to match your actual setup.
The code above is already included in a test script, which you can run directly: `python tools/test_hubserving.py`
## Customize the Service Module
If you need to modify the service logic, the general steps are:
1. Stop the service:
`hub serving stop -m ocr_system`
2. Modify the code in the corresponding module.py as needed.
3. Uninstall the old service package:
`hub uninstall ocr_system`
4. Install the modified service package:
`hub install deploy/hubserving/ocr_system/`
# Release Notes
- 2020.7.9 Added the recognition model with space support, along with its recognition results
- 2020.7.9 Added data augmentation and learning-rate decay strategies; see the configuration files for details
- 2020.6.8 Added datasets, which are continuously updated
- 2020.6.5 Supported exporting `attention` models to `inference_model`
- 2020.6.5 Recognition-only prediction now also outputs result scores
- 2020.5.30 Provided the online demo of the ultra-lightweight Chinese OCR model
......
# Visualization
- [Ultra-lightweight Chinese OCR results](#超轻量级中文OCR)
- [General Chinese OCR results](#通用中文OCR)
- [Space-supporting Chinese OCR results](#支持空格的中文OCR)
<a name="超轻量级中文OCR"></a>
## Ultra-lightweight Chinese OCR Results
![](../imgs_results/1.jpg)
![](../imgs_results/7.jpg)
![](../imgs_results/12.jpg)
![](../imgs_results/4.jpg)
![](../imgs_results/6.jpg)
![](../imgs_results/9.jpg)
![](../imgs_results/16.png)
![](../imgs_results/22.jpg)
<a name="通用中文OCR"></a>
## 通用中文OCR效果展示
![](../imgs_results/chinese_db_crnn_server/11.jpg)
![](../imgs_results/chinese_db_crnn_server/2.jpg)
![](../imgs_results/chinese_db_crnn_server/8.jpg)
<a name="支持空格的中文OCR"></a>
## 支持空格的中文OCR效果展示
### 轻量级模型
![](../imgs_results/img_11.jpg)
### General model
![](../imgs_results/chinese_db_crnn_server/en_paper.jpg)
......@@ -22,7 +22,7 @@ Take `rec_chinese_lite_train.yml` as an example
| print_batch_step | Set print log interval | 10 | \ |
| save_model_dir | Set model save path | output/{model_name} | \ |
| save_epoch_step | Set model save interval | 3 | \ |
| eval_batch_step | Set the model evaluation interval | 2000 or [1000, 2000] | Run evaluation every 2000 iterations; with the list form, run it every 2000 iterations starting after the 1000th iteration |
|train_batch_size_per_card | Set the batch size during training | 256 | \ |
| test_batch_size_per_card | Set the batch size during testing | 256 | \ |
| image_shape | Set input image size | [3, 32, 100] | \ |
......@@ -30,6 +30,8 @@ Take `rec_chinese_lite_train.yml` as an example
| character_type | Set character type | ch | en/ch, the default dict will be used for en, and the custom dict will be used for ch|
| character_dict_path | Set dictionary path | ./ppocr/utils/ic15_dict.txt | \ |
| loss_type | Set loss type | ctc | Supports two types of loss: ctc / attention |
| distort | Whether to use data augmentation | false | For the supported distortion types, see [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) |
| use_space_char | Whether to recognize spaces | false | Only supported when character_type=ch |
| reader_yml | Set the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ |
| pretrain_weights | Load pre-trained model path | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption |
......
......@@ -158,9 +158,23 @@ Global:
  ...
  # Modify the reader type
  reader_yml: ./configs/rec/rec_chinese_reader.yml
  # Whether to use data augmentation
  distort: true
  # Whether to recognize spaces
  use_space_char: true
  ...
...
Optimizer:
  ...
  # Add a learning rate decay strategy
  decay:
    function: cosine_decay
    # Number of iterations per epoch
    step_each_epoch: 20
    # Total number of training epochs
    total_epoch: 1000
```
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
......
......@@ -194,8 +194,12 @@ class DBProcessTest(object):
img_std = [0.229, 0.224, 0.225]
im = im.astype(np.float32, copy=False)
im = im / 255
im[:, :, 0] -= img_mean[0]
im[:, :, 1] -= img_mean[1]
im[:, :, 2] -= img_mean[2]
im[:, :, 0] /= img_std[0]
im[:, :, 1] /= img_std[1]
im[:, :, 2] /= img_std[2]
channel_swap = (2, 0, 1)
im = im.transpose(channel_swap)
return im
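The per-channel statements above implement the standard ImageNet-style normalization; with the mean and std held as numpy arrays, the same computation can be written in vectorized form, as in this sketch:
```python
import numpy as np

def normalize_chw(im):
    """Scale to [0, 1], normalize each channel, and convert HWC -> CHW."""
    img_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    img_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    im = im.astype(np.float32) / 255.0
    im = (im - img_mean) / img_std  # broadcasts over the last (channel) axis
    return im.transpose(2, 0, 1)
```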
......
......@@ -45,12 +45,20 @@ class LMDBReader(object):
self.use_tps = False
if "tps" in params:
self.use_tps = True
self.use_distort = False
if "distort" in params:
self.use_distort = params['distort'] and params['use_gpu']
if not params['use_gpu']:
logger.info(
"Distort operation can only support in GPU. Distort will be set to False."
)
if params['mode'] == 'train':
self.batch_size = params['train_batch_size_per_card']
self.drop_last = True
else:
self.batch_size = params['test_batch_size_per_card']
self.drop_last = False
self.use_distort = False
self.infer_img = params['infer_img']
def load_hierarchical_lmdb_dataset(self):
......@@ -142,7 +150,8 @@ class LMDBReader(object):
label=label,
char_ops=self.char_ops,
loss_type=self.loss_type,
max_text_length=self.max_text_length)
max_text_length=self.max_text_length,
distort=self.use_distort)
if outs is None:
continue
yield outs
......@@ -185,12 +194,20 @@ class SimpleReader(object):
self.use_tps = False
if "tps" in params:
self.use_tps = True
self.use_distort = False
if "distort" in params:
self.use_distort = params['distort'] and params['use_gpu']
if not params['use_gpu']:
logger.info(
"Distort operation can only support in GPU.Distort will be set to False."
)
if params['mode'] == 'train':
self.batch_size = params['train_batch_size_per_card']
self.drop_last = True
else:
self.batch_size = params['test_batch_size_per_card']
self.drop_last = False
self.use_distort = False
def __call__(self, process_id):
if self.mode != 'train':
......@@ -232,9 +249,14 @@ class SimpleReader(object):
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
label = substr[1]
outs = process_image(
img=img,
image_shape=self.image_shape,
label=label,
char_ops=self.char_ops,
loss_type=self.loss_type,
max_text_length=self.max_text_length,
distort=self.use_distort)
if outs is None:
continue
yield outs
......
......@@ -15,6 +15,7 @@
import math
import cv2
import numpy as np
import random
from ppocr.utils.utility import initial_logger
logger = initial_logger()
......@@ -89,6 +90,254 @@ def get_img_data(value):
return imgori
def flag():
"""
flag
"""
return 1 if random.random() > 0.5000001 else -1
def cvtColor(img):
"""
cvtColor
"""
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
delta = 0.001 * random.random() * flag()
hsv[:, :, 2] = hsv[:, :, 2] * (1 + delta)
new_img = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
return new_img
def blur(img):
"""
blur
"""
h, w, _ = img.shape
if h > 10 and w > 10:
return cv2.GaussianBlur(img, (5, 5), 1)
else:
return img
def jitter(img):
"""
jitter
"""
w, h, _ = img.shape
if h > 10 and w > 10:
thres = min(w, h)
s = int(random.random() * thres * 0.01)
src_img = img.copy()
for i in range(s):
img[i:, i:, :] = src_img[:w - i, :h - i, :]
return img
else:
return img
def add_gasuss_noise(image, mean=0, var=0.1):
"""
Gasuss noise
"""
noise = np.random.normal(mean, var**0.5, image.shape)
out = image + 0.5 * noise
out = np.clip(out, 0, 255)
out = np.uint8(out)
return out
def get_crop(image):
"""
random crop
"""
h, w, _ = image.shape
top_min = 1
top_max = 8
top_crop = int(random.randint(top_min, top_max))
top_crop = min(top_crop, h - 1)
crop_img = image.copy()
ratio = random.randint(0, 1)
if ratio:
crop_img = crop_img[top_crop:h, :, :]
else:
crop_img = crop_img[0:h - top_crop, :, :]
return crop_img
class Config:
"""
Config
"""
def __init__(self, ):
self.anglex = random.random() * 30
self.angley = random.random() * 15
self.anglez = random.random() * 10
self.fov = 42
self.r = 0
self.shearx = random.random() * 0.3
self.sheary = random.random() * 0.05
self.borderMode = cv2.BORDER_REPLICATE
def make(self, w, h, ang):
"""
make
"""
self.anglex = random.random() * 5 * flag()
self.angley = random.random() * 5 * flag()
self.anglez = -1 * random.random() * int(ang) * flag()
self.fov = 42
self.r = 0
self.shearx = 0
self.sheary = 0
self.borderMode = cv2.BORDER_REPLICATE
self.w = w
self.h = h
self.perspective = True
self.crop = True
self.affine = False
self.reverse = True
self.noise = True
self.jitter = True
self.blur = True
self.color = True
def rad(x):
"""
rad
"""
return x * np.pi / 180
def get_warpR(config):
"""
get_warpR
"""
anglex, angley, anglez, fov, w, h, r = \
config.anglex, config.angley, config.anglez, config.fov, config.w, config.h, config.r
if w > 69 and w < 112:
anglex = anglex * 1.5
z = np.sqrt(w**2 + h**2) / 2 / np.tan(rad(fov / 2))
# Homogeneous coordinate transformation matrix
rx = np.array([[1, 0, 0, 0],
[0, np.cos(rad(anglex)), -np.sin(rad(anglex)), 0], [
0,
-np.sin(rad(anglex)),
np.cos(rad(anglex)),
0,
], [0, 0, 0, 1]], np.float32)
ry = np.array([[np.cos(rad(angley)), 0, np.sin(rad(angley)), 0],
[0, 1, 0, 0], [
-np.sin(rad(angley)),
0,
np.cos(rad(angley)),
0,
], [0, 0, 0, 1]], np.float32)
rz = np.array([[np.cos(rad(anglez)), np.sin(rad(anglez)), 0, 0],
[-np.sin(rad(anglez)), np.cos(rad(anglez)), 0, 0],
[0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
r = rx.dot(ry).dot(rz)
# generate 4 points
pcenter = np.array([h / 2, w / 2, 0, 0], np.float32)
p1 = np.array([0, 0, 0, 0], np.float32) - pcenter
p2 = np.array([w, 0, 0, 0], np.float32) - pcenter
p3 = np.array([0, h, 0, 0], np.float32) - pcenter
p4 = np.array([w, h, 0, 0], np.float32) - pcenter
dst1 = r.dot(p1)
dst2 = r.dot(p2)
dst3 = r.dot(p3)
dst4 = r.dot(p4)
list_dst = np.array([dst1, dst2, dst3, dst4])
org = np.array([[0, 0], [w, 0], [0, h], [w, h]], np.float32)
dst = np.zeros((4, 2), np.float32)
# Project onto the image plane
dst[:, 0] = list_dst[:, 0] * z / (z - list_dst[:, 2]) + pcenter[0]
dst[:, 1] = list_dst[:, 1] * z / (z - list_dst[:, 2]) + pcenter[1]
warpR = cv2.getPerspectiveTransform(org, dst)
dst1, dst2, dst3, dst4 = dst
r1 = int(min(dst1[1], dst2[1]))
r2 = int(max(dst3[1], dst4[1]))
c1 = int(min(dst1[0], dst3[0]))
c2 = int(max(dst2[0], dst4[0]))
try:
ratio = min(1.0 * h / (r2 - r1), 1.0 * w / (c2 - c1))
dx = -c1
dy = -r1
T1 = np.float32([[1., 0, dx], [0, 1., dy], [0, 0, 1.0 / ratio]])
ret = T1.dot(warpR)
except Exception:
ratio = 1.0
T1 = np.float32([[1., 0, 0], [0, 1., 0], [0, 0, 1.]])
ret = T1
return ret, (-r1, -c1), ratio, dst
def get_warpAffine(config):
"""
get_warpAffine
"""
anglez = config.anglez
rz = np.array([[np.cos(rad(anglez)), np.sin(rad(anglez)), 0],
[-np.sin(rad(anglez)), np.cos(rad(anglez)), 0]], np.float32)
return rz
def warp(img, ang):
"""
warp
"""
h, w, _ = img.shape
config = Config()
config.make(w, h, ang)
new_img = img
if config.perspective:
tp = random.randint(1, 100)
if tp >= 50:
warpR, (r1, c1), ratio, dst = get_warpR(config)
new_w = int(np.max(dst[:, 0])) - int(np.min(dst[:, 0]))
new_img = cv2.warpPerspective(
new_img,
warpR, (int(new_w * ratio), h),
borderMode=config.borderMode)
if config.crop:
img_height, img_width = img.shape[0:2]
tp = random.randint(1, 100)
if tp >= 50 and img_height >= 20 and img_width >= 20:
new_img = get_crop(new_img)
if config.affine:
warpT = get_warpAffine(config)
new_img = cv2.warpAffine(
new_img, warpT, (w, h), borderMode=config.borderMode)
if config.blur:
tp = random.randint(1, 100)
if tp >= 50:
new_img = blur(new_img)
if config.color:
tp = random.randint(1, 100)
if tp >= 50:
new_img = cvtColor(new_img)
if config.jitter:
new_img = jitter(new_img)
if config.noise:
tp = random.randint(1, 100)
if tp >= 50:
new_img = add_gasuss_noise(new_img)
if config.reverse:
tp = random.randint(1, 100)
if tp >= 50:
new_img = 255 - new_img
return new_img
def process_image(img,
image_shape,
label=None,
......@@ -96,7 +345,10 @@ def process_image(img,
loss_type=None,
max_text_length=None,
tps=None,
infer_mode=False,
distort=False):
if distort:
img = warp(img, 10)
if infer_mode and char_ops.character_type == "ch" and not tps:
norm_img = resize_norm_img_chinese(img, image_shape)
else:
......@@ -108,7 +360,7 @@ def process_image(img,
text = char_ops.encode(label)
if len(text) == 0 or len(text) > max_text_length:
logger.info(
"Warning in ppocr/data/rec/img_tools.py:line106: Wrong data type."
"Warning in ppocr/data/rec/img_tools.py:line362: Wrong data type."
"Excepted string with length between 1 and {}, but "
"got '{}'. Label is '{}'".format(max_text_length,
len(text), label))
......
......@@ -30,12 +30,17 @@ class CharacterOps(object):
dict_character = list(self.character_str)
elif self.character_type == "ch":
character_dict_path = config['character_dict_path']
add_space = False
if 'use_space_char' in config:
add_space = config['use_space_char']
self.character_str = ""
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
line = line.decode('utf-8').strip("\n").strip("\r\n")
self.character_str += line
if add_space:
self.character_str += " "
dict_character = list(self.character_str)
elif self.character_type == "en_sensitive":
# same with ASTER setting (use 94 char).
......@@ -93,7 +98,7 @@ class CharacterOps(object):
if is_remove_duplicate:
if idx > 0 and text_index[idx - 1] == text_index[idx]:
continue
char_list.append(self.character[int(text_index[idx])])
text = ''.join(char_list)
return text
......
......@@ -39,7 +39,8 @@ class TextRecognizer(object):
self.rec_algorithm = args.rec_algorithm
char_ops_params = {
"character_type": args.rec_char_type,
"character_dict_path": args.rec_char_dict_path
"character_dict_path": args.rec_char_dict_path,
"use_space_char": args.use_space_char
}
if self.rec_algorithm != "RARE":
char_ops_params['loss_type'] = 'ctc'
......
......@@ -63,6 +63,7 @@ def parse_args():
"--rec_char_dict_path",
type=str,
default="./ppocr/utils/ppocr_keys_v1.txt")
parser.add_argument("--use_space_char", type=bool, default=True)
return parser.parse_args()
......@@ -90,8 +91,9 @@ def create_predictor(args, mode):
config.enable_use_gpu(args.gpu_mem, 0)
else:
config.disable_gpu()
config.enable_mkldnn()
config.set_cpu_math_library_num_threads(4)
#config.enable_memory_optim()
config.disable_glog_info()
# use zero copy
......@@ -169,26 +171,35 @@ def draw_ocr_box_txt(image, boxes, txts):
draw_left = ImageDraw.Draw(img_left)
draw_right = ImageDraw.Draw(img_right)
for (box, txt) in zip(boxes, txts):
color = (random.randint(0, 255), random.randint(0, 255),
random.randint(0, 255))
draw_left.polygon(box, fill=color)
draw_right.polygon(
[
box[0][0], box[0][1], box[1][0], box[1][1], box[2][0],
box[2][1], box[3][0], box[3][1]
],
outline=color)
box_height = math.sqrt((box[0][0] - box[3][0])**2 + (box[0][1] - box[3][1])**2)
box_width = math.sqrt((box[0][0] - box[1][0])**2 + (box[0][1] - box[1][1])**2)
if box_height > 2 * box_width:
font_size = max(int(box_width * 0.9), 10)
font = ImageFont.truetype("./doc/simfang.ttf", font_size, encoding="utf-8")
font = ImageFont.truetype(
"./doc/simfang.ttf", font_size, encoding="utf-8")
cur_y = box[0][1]
for c in txt:
char_size = font.getsize(c)
draw_right.text(
(box[0][0] + 3, cur_y), c, fill=(0, 0, 0), font=font)
cur_y += char_size[1]
else:
font_size = max(int(box_height * 0.8), 10)
font = ImageFont.truetype("./doc/simfang.ttf", font_size, encoding="utf-8")
draw_right.text([box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font)
font = ImageFont.truetype(
"./doc/simfang.ttf", font_size, encoding="utf-8")
draw_right.text(
[box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font)
img_left = Image.blend(image, img_left, 0.5)
img_show = Image.new('RGB', (w * 2, h), (255, 255, 255))
img_show.paste(img_left, (0, 0, w, h))
......@@ -292,6 +303,25 @@ def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.):
return np.array(blank_img)
def base64_to_cv2(b64str):
import base64
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)  # np.fromstring is deprecated for binary data
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
def draw_boxes(image, boxes, scores=None, drop_score=0.5):
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score:
continue
box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
return image
if __name__ == '__main__':
test_img = "./doc/test_v2"
predict_txt = "./doc/predict.txt"
......
......@@ -219,6 +219,13 @@ def train_eval_det_run(config, exe, train_info_dict, eval_info_dict):
epoch_num = config['Global']['epoch_num']
print_batch_step = config['Global']['print_batch_step']
eval_batch_step = config['Global']['eval_batch_step']
start_eval_step = 0
if type(eval_batch_step) == list and len(eval_batch_step) >= 2:
start_eval_step = eval_batch_step[0]
eval_batch_step = eval_batch_step[1]
logger.info(
"During the training process, after the {}th iteration, an evaluation is run every {} iterations".
format(start_eval_step, eval_batch_step))
save_epoch_step = config['Global']['save_epoch_step']
save_model_dir = config['Global']['save_model_dir']
if not os.path.exists(save_model_dir):
......@@ -246,7 +253,7 @@ def train_eval_det_run(config, exe, train_info_dict, eval_info_dict):
t2 = time.time()
train_batch_elapse = t2 - t1
train_stats.update(stats)
if train_batch_id > start_eval_step and (train_batch_id - start_eval_step) \
% print_batch_step == 0:
logs = train_stats.log()
strs = 'epoch: {}, iter: {}, {}, time: {:.3f}'.format(
......@@ -286,6 +293,13 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict):
epoch_num = config['Global']['epoch_num']
print_batch_step = config['Global']['print_batch_step']
eval_batch_step = config['Global']['eval_batch_step']
start_eval_step = 0
if type(eval_batch_step) == list and len(eval_batch_step) >= 2:
start_eval_step = eval_batch_step[0]
eval_batch_step = eval_batch_step[1]
logger.info(
"During the training process, after the {}th iteration, an evaluation is run every {} iterations".
format(start_eval_step, eval_batch_step))
save_epoch_step = config['Global']['save_epoch_step']
save_model_dir = config['Global']['save_model_dir']
if not os.path.exists(save_model_dir):
......@@ -324,7 +338,7 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict):
train_batch_elapse = t2 - t1
stats = {'loss': loss, 'acc': acc}
train_stats.update(stats)
if train_batch_id > start_eval_step and (train_batch_id - start_eval_step) \
% print_batch_step == 0:
logs = train_stats.log()
strs = 'epoch: {}, iter: {}, lr: {:.6f}, {}, time: {:.3f}'.format(
......
#!usr/bin/python
# -*- coding: utf-8 -*-
import requests
import json
import cv2
import base64
import time
def cv2_to_base64(image):
return base64.b64encode(image).decode('utf8')
start = time.time()
# Send the HTTP request
data = {'images':[cv2_to_base64(open("./doc/imgs/11.jpg", 'rb').read())]}
headers = {"Content-type": "application/json"}
# url = "http://127.0.0.1:8866/predict/ocr_det"
# url = "http://127.0.0.1:8866/predict/ocr_rec"
url = "http://127.0.0.1:8866/predict/ocr_system"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
end = time.time()
# Print the prediction results
print(r.json()["results"])
print("time cost: ", end - start)