Merge pull request #13 from PaddlePaddle/dygraph

Dygraph

Unverified Commit 0458f0cc authored Dec 10, 2020 by zhoujun Committed by GitHub Dec 10, 2020
20 changed files
--- a/deploy/docker/hubserving/gpu/Dockerfile
+++ b/deploy/docker/hubserving/gpu/Dockerfile
+# Version: 1.0.0
+FROM hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev
+
+# PaddleOCR is based on Python 3.7
+RUN pip3.7 install --upgrade pip -i https://mirror.baidu.com/pypi/simple
+
+RUN python3.7 -m pip install paddlepaddle-gpu==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
+
+RUN pip3.7 install paddlehub --upgrade -i https://mirror.baidu.com/pypi/simple
+
+RUN git clone https://github.com/PaddlePaddle/PaddleOCR.git /PaddleOCR
+
+WORKDIR /PaddleOCR
+
+RUN pip3.7 install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
+
+RUN mkdir -p /PaddleOCR/inference/
+# Download the OCR detection model (light version). To use the normal version, change ch_ppocr_mobile_v1.1_det_infer to ch_ppocr_server_v1.1_det_infer, and remember to change det_model_dir in deploy/hubserving/ocr_system/params.py.
+ADD {link} /PaddleOCR/inference/
+RUN tar xf /PaddleOCR/inference/{file}.tar -C /PaddleOCR/inference/
+
+# Download the direction classifier (light version). To use a different classifier model, replace ch_ppocr_mobile_v1.1_cls_infer, and remember to change cls_model_dir in deploy/hubserving/ocr_system/params.py.
+ADD {link} /PaddleOCR/inference/
+RUN tar xf /PaddleOCR/inference/{file} -C /PaddleOCR/inference/
+
+# Download the OCR recognition model (light version). To use the normal version, change ch_ppocr_mobile_v1.1_rec_infer to ch_ppocr_server_v1.1_rec_infer, and remember to change rec_model_dir in deploy/hubserving/ocr_system/params.py.
+ADD {link} /PaddleOCR/inference/
+RUN tar xf /PaddleOCR/inference/{file}.tar -C /PaddleOCR/inference/
+
+EXPOSE 8868
+
+CMD ["/bin/bash","-c","hub install deploy/hubserving/ocr_system/ && hub serving start -m ocr_system"]
\ No newline at end of file
--- a/deploy/docker/hubserving/sample_request.txt
+++ b/deploy/docker/hubserving/sample_request.txt
--- a/doc/doc_ch/add_new_algorithm.md
+++ b/doc/doc_ch/add_new_algorithm.md
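+# Add a new algorithm
+
+PaddleOCR decomposes an algorithm into the following parts and modularizes each one, so that new algorithms can be assembled quickly.
+
+* Data loading and processing
+* Network
+* Post-processing
+* Loss function
+* Metric evaluation
+* Optimizer
+
+The following sections introduce each part and explain how to add the modules a new algorithm needs.
+
+## Data loading and processing
+
+Data loading and processing is composed of different modules, which handle image reading, data augmentation, and label generation. This part lives under [ppocr/data](../../ppocr/data). The files and folders are as follows: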
+
+```bash
+ppocr/data/
+├── imaug             # Files for image reading, data augmentation, and label generation
+│   ├── label_ops.py  # Modules that transform the label
+│   ├── operators.py  # Modules that transform the image
+│   ├──.....
+├── __init__.py
+├── lmdb_dataset.py   # Dataset that reads LMDB data
+└── simple_dataset.py # Dataset that reads data saved in the form `image_path\tgt`
+```
+
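+PaddleOCR has many built-in image operation modules. A module that is not built in can be added with the following steps:
+
+1. Create a new file under the [ppocr/data/imaug](../../ppocr/data/imaug) folder, such as my_module.py.
+2. Add the code to my_module.py. Sample code: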
+
+```python
+class MyModule:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, data):
+        img = data['image']
+        label = data['label']
+        # your process code
+
+        data['image'] = img
+        data['label'] = label
+        return data
+```
+
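+3. Import the added module in [ppocr/data/imaug/\__init\__.py](../../ppocr/data/imaug/__init__.py).
+
+All data processing steps are executed sequentially by different modules, which are combined as a list in the config file. For example: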
+
+```yaml
+# angle class data process
+transforms:
+  - DecodeImage: # load image
+      img_mode: BGR
+      channel_first: False
+  - MyModule:
+      args1: args1
+      args2: args2
+  - KeepKeys:
+      keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
+```
+
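+## Network
+
+The network part builds the model. PaddleOCR divides the network into four parts, which live under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in order (transforms->backbones->necks->heads).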
+
+```bash
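+├── architectures # Code that assembles the network
+├── transforms    # Image transformation modules of the network
+├── backbones     # Feature extraction modules of the network
+├── necks         # Feature enhancement modules of the network
+└── heads         # Output modules of the network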
+```
+
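+PaddleOCR has built-in modules for algorithms such as DB, EAST, SAST, CRNN, and Attention. A module that is not built in can be added with the following steps. The steps are the same for all four parts; backbones is used as the example:
+
+1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
+2. Add the code to my_backbone.py. Sample code: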
+
+```python
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+class MyBackbone(nn.Layer):
+    def __init__(self, *args, **kwargs):
+        super(MyBackbone, self).__init__()
+        # your init code
+        self.conv = nn.xxxx
+
+    def forward(self, inputs):
+        # your network forward
+        y = self.conv(inputs)
+        return y
+```
+
+3. Import the added module in [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py).
+
+After adding modules for the four parts of the network, configure them in the config file to use them. For example:
+
+```yaml
+Architecture:
+  model_type: rec
+  algorithm: CRNN
+  Transform:
+    name: MyTransform
+    args1: args1
+    args2: args2
+  Backbone:
+    name: MyBackbone
+    args1: args1
+  Neck:
+    name: MyNeck
+    args1: args1
+  Head:
+    name: MyHead
+    args1: args1
+```
+
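+## Post-processing
+
+Post-processing decodes the network output to obtain text boxes or recognized text. This part lives under [ppocr/postprocess](../../ppocr/postprocess).
+PaddleOCR has built-in post-processing modules for algorithms such as DB, EAST, SAST, CRNN, and Attention. A component that is not built in can be added with the following steps:
+
+1. Create a new file under the [ppocr/postprocess](../../ppocr/postprocess) folder, such as my_postprocess.py.
+2. Add the code to my_postprocess.py. Sample code: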
+
+```python
+import paddle
+
+
+class MyPostProcess:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, preds, label=None, *args, **kwargs):
+        if isinstance(preds, paddle.Tensor):
+            preds = preds.numpy()
+        # your preds decode code
+        preds = self.decode_preds(preds)
+        if label is None:
+            return preds
+        # your label decode code
+        label = self.decode_label(label)
+        return preds, label
+
+    def decode_preds(self, preds):
+        # your preds decode code
+        pass
+
+    def decode_label(self, label):
+        # your label decode code
+        pass
+```
+
+3. Import the added module in [ppocr/postprocess/\__init\__.py](../../ppocr/postprocess/__init__.py).
+
+After adding the post-processing module, configure it in the config file to use it. For example:
+
+```yaml
+PostProcess:
+  name: MyPostProcess
+  args1: args1
+  args2: args2
+```
+
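+## Loss function
+
+The loss function computes the distance between the network output and the label. This part lives under [ppocr/losses](../../ppocr/losses).
+PaddleOCR has built-in loss function modules for algorithms such as DB, EAST, SAST, CRNN, and Attention. A module that is not built in can be added with the following steps:
+
+1. Create a new file under the [ppocr/losses](../../ppocr/losses) folder, such as my_loss.py.
+2. Add the code to my_loss.py. Sample code: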
+
+```python
+import paddle
+from paddle import nn
+
+
+class MyLoss(nn.Layer):
+    def __init__(self, **kwargs):
+        super(MyLoss, self).__init__()
+        # your init code
+        pass
+
+    def __call__(self, predicts, batch):
+        label = batch[1]
+        # your loss code
+        loss = self.loss(input=predicts, label=label)
+        return {'loss': loss}
+```
+
+3. Import the added module in [ppocr/losses/\__init\__.py](../../ppocr/losses/__init__.py).
+
+After adding the loss function, configure it in the config file to use it. For example:
+
+```yaml
+Loss:
+  name: MyLoss
+  args1: args1
+  args2: args2
+```
+
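+## Metric evaluation
+
+Metric evaluation computes the performance of the network on the current batch. This part lives under [ppocr/metrics](../../ppocr/metrics). PaddleOCR has built-in metric modules for detection, classification, and recognition algorithms. A module that is not built in can be added with the following steps:
+
+1. Create a new file under the [ppocr/metrics](../../ppocr/metrics) folder, such as my_metric.py.
+2. Add the code to my_metric.py. Sample code: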
+
+```python
+
+class MyMetric(object):
+    def __init__(self, main_indicator='acc', **kwargs):
+        # main_indicator is used to select the best model
+        self.main_indicator = main_indicator
+        self.reset()
+
+    def __call__(self, preds, batch, *args, **kwargs):
+        # preds is the output of postprocess
+        # batch is the output of the dataloader
+        labels = batch[1]
+        cur_correct_num = 0
+        cur_all_num = 0
+        # your metric code
+        self.correct_num += cur_correct_num
+        self.all_num += cur_all_num
+        return {'acc': cur_correct_num / cur_all_num, }
+
+    def get_metric(self):
+        """
+        return metrics {
+                 'acc': 0,
+                 'norm_edit_dis': 0,
+            }
+        """
+        acc = self.correct_num / self.all_num
+        self.reset()
+        return {'acc': acc}
+
+    def reset(self):
+        # reset metric
+        self.correct_num = 0
+        self.all_num = 0
+
+```
+
+3. Import the added module in [ppocr/metrics/\__init\__.py](../../ppocr/metrics/__init__.py).
+
+After adding the metric module, configure it in the config file to use it. For example:
+
+```yaml
+Metric:
+  name: MyMetric
+  main_indicator: acc
+```
+
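+## Optimizer
+
+The optimizer is used to train the network. It also contains the network regularization and learning rate decay modules. This part lives under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in optimizer modules such as `Momentum`, `Adam`, and `RMSProp`, learning rate decay modules such as `Linear`, `Cosine`, `Step`, and `Piecewise`, and regularization modules such as `L1Decay` and `L2Decay`.
+A module that is not built in can be added with the following steps, using `optimizer` as the example:
+
+1. Create your own optimizer in [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py). Sample code: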
+
+```python
+from paddle import optimizer as optim
+
+
+class MyOptim(object):
+    def __init__(self, learning_rate=0.001, *args, **kwargs):
+        self.learning_rate = learning_rate
+
+    def __call__(self, parameters):
+        # It is recommended to wrap the built-in optimizer of paddle
+        opt = optim.XXX(
+            learning_rate=self.learning_rate,
+            parameters=parameters)
+        return opt
+
+```
+
+After adding the optimizer module, configure it in the config file to use it. For example:
+
+```yaml
+Optimizer:
+  name: MyOptim
+  args1: args1
+  args2: args2
+  lr:
+    name: Cosine
+    learning_rate: 0.001
+  regularizer:
+    name: 'L2'
+    factor: 0
+```
\ No newline at end of file
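
The document above says the data processing steps are combined as a list in the config file and executed in order. The following is a minimal sketch of that idea in plain Python, not PaddleOCR's actual builder: the `REGISTRY` dict, the `create_operators` and `transform` helpers, and the `scale` parameter of `MyModule` are all assumptions made for illustration.

```python
# Minimal sketch of a config-driven transform pipeline (illustrative only;
# PaddleOCR's own builder may differ).


class MyModule:
    """Hypothetical transform following the template in the document."""

    def __init__(self, scale=1, **kwargs):
        self.scale = scale

    def __call__(self, data):
        # Multiply every pixel value by a constant factor.
        data['image'] = [v * self.scale for v in data['image']]
        return data


class KeepKeys:
    """Return the selected values as a list, in the configured order."""

    def __init__(self, keep_keys, **kwargs):
        self.keep_keys = keep_keys

    def __call__(self, data):
        return [data[key] for key in self.keep_keys]


# Hypothetical registry mapping config names to classes.
REGISTRY = {'MyModule': MyModule, 'KeepKeys': KeepKeys}


def create_operators(op_configs):
    """Instantiate one operator per single-key dict in the config list."""
    ops = []
    for op_config in op_configs:
        (name, params), = op_config.items()
        ops.append(REGISTRY[name](**(params or {})))
    return ops


def transform(data, ops):
    """Run the operators in order, each consuming the previous output."""
    for op in ops:
        data = op(data)
    return data


# Equivalent of the YAML `transforms` list, already parsed into Python.
config = [
    {'MyModule': {'scale': 2}},
    {'KeepKeys': {'keep_keys': ['image', 'label']}},
]
sample = {'image': [1, 2, 3], 'label': '0', 'extra': 'dropped'}
result = transform(sample, create_operators(config))
print(result)
```

Because each operator receives the previous operator's output, the order of the list in the config determines the order of execution, and the final `KeepKeys` step fixes the order of the values handed to the dataloader.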
--- a/doc/doc_ch/angle_class.md
+++ b/doc/doc_ch/angle_class.md
@@ -45,7 +45,7 @@ train_data/cls/word_002.jpg   180
 ```
 |-train_data
    |-cls
-        |- and a cls_gt_test.txt
+        |- cls_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
@@ -62,29 +62,36 @@ PaddleOCR provides training, evaluation, and prediction scripts.
 *If you installed the CPU version, set the `use_gpu` field in the config file to false*

 ```
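-# Set the PYTHONPATH
-export PYTHONPATH=$PYTHONPATH:.
-# GPU training supports single-card and multi-card training; specify the card IDs with CUDA_VISIBLE_DEVICES
-export CUDA_VISIBLE_DEVICES=0,1,2,3
-# Start training
-python3 tools/train.py -c configs/cls/cls_mv3.yml
+# GPU training supports single-card and multi-card training; specify the card IDs with selected_gpus
+# Start training. The command below is already written in train.sh; only the config file path in that file needs to be changed
+python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7'  tools/train.py -c configs/cls/cls_mv3.yml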
 ```

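 - Data augmentation

-PaddleOCR provides a variety of data augmentation methods. To add perturbation during training, set `distort: true` in the config file.
+PaddleOCR provides a variety of data augmentation methods. To add perturbation during training, uncomment the `RecAug` and `RandAugment` fields under `Train.dataset.transforms` in the config file.

 The default perturbation methods are: color space conversion (cvtColor), blur, jitter, Gaussian noise, random crop, perspective, color reverse, and random data augmentation (RandAugment).

 During training, each perturbation method except RandAugment is selected with a probability of 50%. For the implementation, see:
-[randaugment.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/cls/randaugment.py)
-[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
+[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py) 
+[randaugment.py](../../ppocr/data/imaug/randaugment.py)

 *Due to OpenCV compatibility issues, perturbation operations are currently supported only on Linux*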

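 ### Training

-PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/cls/cls_mv3.yml` to set the evaluation frequency; by default, evaluation runs every 500 iterations. During evaluation, the model with the best acc is saved as `output/cls_mv3/best_accuracy` by default.
+PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/cls/cls_mv3.yml` to set the evaluation frequency; by default, evaluation runs every 1000 iterations. The following files are saved during training:
+```bash
+├── best_accuracy.pdopt # Optimizer parameters of the best model
+├── best_accuracy.pdparams # Parameters of the best model
+├── best_accuracy.states # Metrics, epoch, and other information of the best model
+├── config.yml # Config file of this experiment
+├── latest.pdopt # Optimizer parameters of the latest model
+├── latest.pdparams # Parameters of the latest model
+├── latest.states # Metrics, epoch, and other information of the latest model
+└── train.log # Training log
+```

 If the validation set is large, testing will be time-consuming. It is recommended to reduce the number of evaluations, or to evaluate after training.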

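@@ -92,9 +99,8 @@ PaddleOCR supports alternating training and evaluation; in `configs/cls/cls_mv3.yml`

 ### Evaluation

-The evaluation dataset can be set by modifying `label_file_path` in EvalReader in `configs/cls/cls_reader.yml`.
+The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in `configs/cls/cls_mv3.yml`.

-*Note* During evaluation, the infer_img field in the config file must be empty
 ```
 export CUDA_VISIBLE_DEVICES=0
 # GPU evaluation; Global.checkpoints is the weights to be tested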
@@ -107,21 +113,20 @@ python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/

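 With a model trained by PaddleOCR, you can run quick prediction using the following script.

-By default, the prediction images are stored in `infer_img`, and the weights are specified with `-o Global.checkpoints`:
+Use `Global.infer_img` to specify the path of the prediction image or folder, and `Global.checkpoints` to specify the weights: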

 ```
 # Predict the classification result
-python3 tools/infer_cls.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+python3 tools/infer_cls.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/ch/word_1.jpg
 ```

 Prediction image:

-![](../imgs_words/en/word_1.png)
+![](../imgs_words/ch/word_1.jpg)

 Prediction result for the input image:

 ```
-infer_img: doc/imgs_words/en/word_1.png
-    scores: [[0.93161047 0.06838956]]
-    label: [0]
+infer_img: doc/imgs_words/ch/word_1.jpg
+     result: ('0', 0.9998784)
 ```
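
The data augmentation note above states that each perturbation (other than RandAugment) is selected with a probability of 50%. A minimal sketch of that selection rule follows; it is not the code in rec_img_aug.py, and the `reverse` and `jitter` functions operating on a flat list of pixel values are hypothetical stand-ins for the real image operations.

```python
import random


def reverse(image):
    """Hypothetical stand-in for color reversal on 8-bit pixel values."""
    return [255 - v for v in image]


def jitter(image):
    """Hypothetical stand-in for jitter: brighten by one, capped at 255."""
    return [min(255, v + 1) for v in image]


def apply_perturbations(image, perturbations, prob=0.5, rng=random):
    """Apply each perturbation independently with probability `prob`."""
    for perturb in perturbations:
        if rng.random() < prob:
            image = perturb(image)
    return image


pixels = [0, 100, 255]
# With prob=0.5 the outcome varies from call to call.
augmented = apply_perturbations(pixels, [reverse, jitter])
print(augmented)
```

Each perturbation is drawn independently, so a single sample may receive several perturbations, one, or none.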
--- a/doc/doc_ch/config.md
+++ b/doc/doc_ch/config.md
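-# Optional parameter list
+## Optional parameter list

 The following list can be viewed with `--help`

@@ -8,65 +8,115 @@
 |          -o              |      ALL       |  Set parameters in the config file  |  None  |  Options set with -o take priority over the config file selected with -c. For example: `-o Global.use_gpu=false`  |  


-## Introduction to the Global parameters of the config file
+## Introduction to config file parameters

 Using `rec_chinese_lite_train_v1.1.yml` as an example
-
+### Global 

 |         Field            |          Purpose           |     Default       |          Remarks           |
 | :----------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
-|      algorithm           |    Set the algorithm       |  Same as the config file   |     Selects the model; for supported models see the [introduction](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/README.md) |
-|      use_gpu             |    Set where the code runs |       true        |                \                 |
-|      epoch_num           |    Maximum number of training epochs |       3000        |                \                 |
+|      use_gpu             |    Set whether the code runs on the GPU |       true        |                \                 |
+|      epoch_num           |    Maximum number of training epochs |       500        |                \                 |
 |      log_smooth_window   |    Sliding window size     |       20          |                \                 |
 |      print_batch_step    |    Set the log printing interval |       10          |                \                 |
 |      save_model_dir      |    Set the model save path |  output/{algorithm name}  |                \                 |
 |      save_epoch_step     |    Set the model save interval |       3           |                \                 |
 |      eval_batch_step     |    Set the model evaluation interval | 2000 or [1000, 2000]        | 2000 means evaluate once every 2000 iterations; [1000, 2000] means starting from iteration 1000, evaluate once every 2000 iterations   |
-|train_batch_size_per_card |  Set the per-card batch size for training    |         256         |                \                 |
-| test_batch_size_per_card |  Set the per-card batch size for evaluation    |         256         |                \                 |
-|      image_shape         |    Set the input image size |   [3, 32, 100]    |                \                 |
+|      cal_metric_during_train     |    Set whether to evaluate metrics during training; the metrics are those of the model on the current batch        |       true         |                \                 |
+|      load_static_weights     |   Set whether the pretrained model was saved in static graph mode (currently needed only by detection algorithms)        |       true         |                \                 |
+|      pretrained_model    |    Set the path of the pretrained model to load      |  ./pretrain_models/CRNN/best_accuracy  |  \          |
+|      checkpoints         |    Path of the model parameters to load            |       None        |    Used to load parameters and resume training after an interruption |
+|      use_visualdl  |    Set whether to enable visualdl for visual log display |          False        |    [Tutorial](https://www.paddlepaddle.org.cn/paddle/visualdl) |
+|      infer_img            |    Set the path of the prediction image or folder     |       ./infer_img | \|
+|      character_dict_path |    Set the dictionary path            |  ./ppocr/utils/ppocr_keys_v1.txt  |    \                 |
 |      max_text_length     |    Set the maximum text length        |       25          |                \                 |
 |      character_type      |    Set the character type            |       ch          |    en/ch; en uses the default dict, ch uses a custom dict|
-|      character_dict_path |    Set the dictionary path            |  ./ppocr/utils/ic15_dict.txt  |    \                 |
-|      loss_type           |    Set the loss type              |       ctc         |    Two losses are supported: ctc / attention |
-|       distort            |    Set whether to use data augmentation          |       false       |  When set to true, random perturbation is applied during training; for the supported perturbation operations see [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)                 |
-|       use_space_char     |    Set whether to recognize spaces             |        false      |          Spaces are supported only when character_type=ch                 |
+|      use_space_char     |    Set whether to recognize spaces             |        True      |          Spaces are supported only when character_type=ch                 |
 |      label_list          |    Set the angles supported by the direction classifier       |    ['0','180']    |     Takes effect only in the direction classifier |
-|      average_window      |    Window length ratio in the ModelAverage optimizer |  0.15       |       Currently applied only to SRN |
-|      max_average_window  |    Maximum length of the averaging window   |   15625              | Recommended to be set to the number of mini-batches in one training epoch|
-|      min_average_window  |    Minimum length of the averaging window  |    10000              |      \          |
-|      reader_yml          |    Set the reader config file          |  ./configs/rec/rec_icdar15_reader.yml  |  \          |
-|      pretrain_weights    |    Path of the pretrained model to load      |  ./pretrain_models/CRNN/best_accuracy  |  \          |
-|      checkpoints         |    Path of the model parameters to load            |       None        |    Used to load parameters and resume training after an interruption |
-|      save_inference_dir  |    Save path of the inference model |          None        |    Used to save the inference model |
+|      save_res_path          |    Set the save path of the detection model results       |    ./output/det_db/predicts_db.txt    |     Takes effect only in detection models |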

-## Introduction to the Reader parameters of the config file
+### Optimizer ([ppocr/optimizer](../../ppocr/optimizer))

-Using `rec_chinese_reader.yml` as an example
+|         Field            |          Purpose           |     Default        |          Remarks            |
+| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
+|      name        |         Optimizer class name          |  Adam  |  Currently supports `Momentum`,`Adam`,`RMSProp`; see [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py)  |
+|      beta1           |    Set the exponential decay rate of the first-moment estimate  |       0.9         |               \             |
+|      beta2           |    Set the exponential decay rate of the second-moment estimate  |     0.999         |               \             |
+|      **lr**                |         Set the learning rate decay method       |   -    |       \  |
+|        name    |      Learning rate decay class name   |         Cosine       | Currently supports `Linear`,`Cosine`,`Step`,`Piecewise`; see [ppocr/optimizer/learning_rate.py](../../ppocr/optimizer/learning_rate.py) |
+|        learning_rate      |    Base learning rate        |       0.001      |  \        |
+|      **regularizer**      |  Set the network regularization method        |       -      | \        |
+|        name      |    Regularizer class name      |       L2     | Currently supports `L1`,`L2`; see [ppocr/optimizer/regularizer.py](../../ppocr/optimizer/regularizer.py)        |
+|        factor      |    Regularization coefficient       |       0.00004     |  \        |

-|         Field            |          Purpose           |     Default       |          Remarks           |
-| :----------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
-|      reader_function     |    Select the data reading method        |  ppocr.data.rec.dataset_traversal,SimpleReader  | Supports two data reading methods: SimpleReader / LMDBReader |
-|      num_workers             |    Set the number of data reading threads            |       8        |                \                 |
-|      img_set_dir          |    Dataset path             |       ./train_data        |                \                 |
-|      label_file_path      |    Data label path           |       ./train_data/rec_gt_train.txt| \    |
-|      infer_img            |    Path of the prediction image folder     |       ./infer_img | \|

-## Introduction to the Optimizer parameters of the config file
+### Architecture ([ppocr/modeling](../../ppocr/modeling))
+In ppocr, the network is divided into four stages: Transform, Backbone, Neck, and Head
+
+|         Field            |          Purpose           |     Default        |          Remarks            |
+| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
+|      model_type        |         Network type          |  rec  |  Currently supports `rec`,`det`,`cls`  |
+|      algorithm           |    Model name  |       CRNN         |               For the supported list see [algorithm_overview](./algorithm_overview.md)             |
+|      **Transform**           |    Set the transformation method  |       -       |               Currently supported only by rec-type algorithms; see [ppocr/modeling/transforms](../../ppocr/modeling/transforms)              |
+|        name    |      Transformation class name   |         TPS       | Currently supports `TPS` |
+|        num_fiducial      |    Number of TPS control points        |       20      |  Ten on each of the top and bottom edges       |
+|        loc_lr      |    Localization network learning rate        |       0.1      |  \      |
+|        model_name      |    Localization network size        |       small      |  Currently supports `small`,`large`       |
+|      **Backbone**      |  Set the network backbone class name        |       -      | See [ppocr/modeling/backbones](../../ppocr/modeling/backbones)        |
+|        name      |    backbone class name       |       ResNet     | Currently supports `MobileNetV3`,`ResNet`        |
+|        layers      |    Number of ResNet layers       |       34     |  Supports 18,34,50,101,152,200       |
+|        model_name      |    MobileNetV3 network size       |       small     |  Supports `small`,`large`       |
+|      **Neck**      |  Set the network neck        |       -      | See [ppocr/modeling/necks](../../ppocr/modeling/necks)        |
+|        name      |    neck class name       |       SequenceEncoder     | Currently supports `SequenceEncoder`,`DBFPN`        |
+|        encoder_type      |    SequenceEncoder encoder type       |       rnn     |  Supports `reshape`,`fc`,`rnn`       |
+|        hidden_size      |   Number of RNN hidden units       |       48     |  \      |
+|        out_channels      |   Number of DBFPN output channels       |       256     |  \      |
+|      **Head**      |  Set the network head        |       -      | See [ppocr/modeling/heads](../../ppocr/modeling/heads)        |
+|        name      |    head class name       |       CTCHead     | Currently supports `CTCHead`,`DBHead`,`ClsHead`        |
+|        fc_decay      |    CTCHead regularization coefficient       |       0.0004     |  \      |
+|        k      |   DBHead binarization coefficient       |       50     |  \      |
+|        class_dim      |   Number of ClsHead output classes       |       2     |  \      |
+

-Using `rec_icdar15_train.yml` as an example
+### Loss ([ppocr/losses](../../ppocr/losses))
+
+|         Field            |          Purpose           |     Default        |          Remarks            |
+| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
+|      name        |         Network loss class name          |  CTCLoss  |  Currently supports `CTCLoss`,`DBLoss`,`ClsLoss`  |
+|      balance_loss        |        In DBLoss, whether to balance the numbers of positive and negative samples (using OHEM)         |  True  |  \  |
+|      ohem_ratio        |        In DBLoss, the negative-to-positive sample ratio for OHEM         |  3  |  \  |
+|      main_loss_type        |        In DBLoss, the loss used for shrink_map        |  DiceLoss  |  Supports `DiceLoss`,`BCELoss`  |
+|      alpha        |        In DBLoss, the coefficient of shrink_map_loss       |  5  |  \  |
+|      beta        |        In DBLoss, the coefficient of threshold_map_loss       |  10  |  \  |
+
+### PostProcess ([ppocr/postprocess](../../ppocr/postprocess))
+
+|         Field            |          Purpose           |     Default        |          Remarks            |
+| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
+|      name        |         Post-processing class name          |  CTCLabelDecode  |  Currently supports `CTCLabelDecode`,`AttnLabelDecode`,`DBPostProcess`,`ClsPostProcess`  |
+|      thresh        |        In DBPostProcess, the threshold for binarizing the segmentation map         |  0.3  |  \  |
+|      box_thresh        |        In DBPostProcess, the threshold for filtering output boxes; boxes below it are not output         |  0.7  |  \  |
+|      max_candidates        |        In DBPostProcess, the maximum number of output text boxes        |  1000  |  \  |
+|      unclip_ratio        |        In DBPostProcess, the ratio by which text boxes are expanded       |  2.0  |  \  |
+
+### Metric ([ppocr/metrics](../../ppocr/metrics))
+
+|         Field            |          Purpose           |     Default        |          Remarks            |
+| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
+|      name        |         Metric evaluation method name          |  RecMetric  |  Currently supports `DetMetric`,`RecMetric`,`ClsMetric`  |
+|      main_indicator        |        Main indicator, used to select the best model         |  acc |  hmean for detection; acc for recognition and classification  |

+### Dataset  ([ppocr/data](../../ppocr/data))
 |         Field            |          Purpose           |     Default        |          Remarks            |
 | :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
-|         function        |         Select the optimizer          |  pocr.optimizer,AdamDecay  |  Currently only Adam is supported  |
-|         base_lr         |      Set the initial learning rate          |       0.0005      |               \             |
-|         beta1           |    Set the exponential decay rate of the first-moment estimate  |       0.9         |               \             |
-|         beta2           |    Set the exponential decay rate of the second-moment estimate  |     0.999         |               \             |
-|         decay           |         Whether to use decay       |    \              |               \             |
-|      function(decay)    |         Set the decay method       |   -    |       Currently supports cosine_decay, cosine_decay_warmup, and piecewise_decay  |
-|      step_each_epoch    |      Number of iterations per epoch; effective for cosine_decay/cosine_decay_warmup   |         20       | Calculated as: total_image_num / (batch_size_per_card * card_size) |
-|        total_epoch      |    Total number of epochs; effective for cosine_decay/cosine_decay_warmup        |       1000      | Same as Global.epoch_num        |
-|        warmup_minibatch      |  Number of linear warmup iterations; effective for cosine_decay_warmup        |       1000      | \        |
-|        boundaries      |    Iteration intervals at which the learning rate drops; effective for piecewise_decay       |       -      | Given as a list        |
-|        decay_rate      |    Learning rate decay coefficient; effective for piecewise_decay       |       -      |  \        |
+|      **dataset**        |         Returns one sample per iteration          |  -  |  -  |
+|      name        |        dataset class name         |  SimpleDataSet |  Currently supports `SimpleDataSet` and `LMDBDateSet`  |
+|      data_dir        |        Path where the dataset images are stored         |  ./train_data |  \  |
+|      label_file_list        |        Data label paths         |  ["./train_data/train_list.txt"] | Not needed when dataset is LMDBDateSet   |
+|      ratio_list        |        Sampling ratios of the datasets         |  [1.0] | If label_file_list contains two train_lists and ratio_list is [0.4,0.6], 40% is sampled from train_list1 and 60% from train_list2 to form the whole dataset   |
+|      transforms        |        List of methods that transform images and labels         |  [DecodeImage,CTCLabelEncode,RecResizeImg,KeepKeys] |   See [ppocr/data/imaug](../../ppocr/data/imaug)  |
+|      **loader**        |        dataloader settings         |  - |  \  |
+|      shuffle        |        Whether to shuffle the dataset order each epoch         |  True | \  |
+|      batch_size_per_card        |        Per-card batch size for training         |  256 | \  |
+|      drop_last        |        Whether to drop the last incomplete mini-batch produced when the number of dataset samples is not divisible by batch_size        |  True | \  |
+|      num_workers        |        Number of subprocesses used for data loading; 0 disables subprocesses and loads data in the main process        |  8 | \  |
\ No newline at end of file
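
The `eval_batch_step` row in the Global table accepts either a single interval or a `[start, interval]` pair. The sketch below shows one way such a value could be interpreted; it is not taken from PaddleOCR's training loop. The function name `should_evaluate` is hypothetical, and the choice that the first evaluation happens one full interval after the start step is an assumption.

```python
def should_evaluate(global_step, eval_batch_step):
    """Decide whether to evaluate at `global_step`.

    `eval_batch_step` is either an int interval (e.g. 2000) or a
    [start_step, interval] pair (e.g. [1000, 2000]).
    Assumption: the first evaluation runs at start_step + interval.
    """
    if isinstance(eval_batch_step, (list, tuple)):
        start_step, interval = eval_batch_step
    else:
        start_step, interval = 0, eval_batch_step
    past_start = global_step > start_step
    return past_start and (global_step - start_step) % interval == 0


# Steps at which evaluation would run during the first 6000 iterations.
steps_int = [s for s in range(1, 6001) if should_evaluate(s, 2000)]
steps_pair = [s for s in range(1, 6001) if should_evaluate(s, [1000, 2000])]
print(steps_int)
print(steps_pair)
```

A larger interval, or a later start step, reduces the number of evaluations, which matters when the validation set is large.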
--- a/doc/doc_ch/tree.md
+++ b/doc/doc_ch/tree.md
--- a/doc/doc_ch/whl.md
+++ b/doc/doc_ch/whl.md
@@ -261,6 +261,61 @@ im_show.save('result.jpg')
 paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true
 ```

+### Use a network image or a numpy array as input
+
+1. Network image
+
+Usage in code
+```python
+from paddleocr import PaddleOCR, draw_ocr
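+# PaddleOCR currently supports Chinese-English, English, French, German, Korean, and Japanese; switch by changing the lang parameter
+# The parameter values are `ch`, `en`, `french`, `german`, `korean`, `japan`, respectively.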
+ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
+img_path = 'http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg'
+result = ocr.ocr(img_path, cls=True)
+for line in result:
+    print(line)
+
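+# Show the result (the image is downloaded first, because PIL cannot open a URL directly)
+from io import BytesIO
+from urllib.request import urlopen
+from PIL import Image
+image = Image.open(BytesIO(urlopen(img_path).read())).convert('RGB')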
+boxes = [line[0] for line in result]
+txts = [line[1][0] for line in result]
+scores = [line[1][1] for line in result]
+im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
+im_show = Image.fromarray(im_show)
+im_show.save('result.jpg')
+```
+Command-line mode
+```bash
+paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true
+```
+
+2. numpy array
+numpy arrays are supported as input only when used from code
+```python
+import cv2
+from paddleocr import PaddleOCR, draw_ocr
+# PaddleOCR currently supports Chinese-English, English, French, German, Korean, and Japanese; switch by changing the lang parameter
+# The parameter values are `ch`, `en`, `french`, `german`, `korean`, `japan`, respectively.
+ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
+img_path = 'PaddleOCR/doc/imgs/11.jpg'
+img = cv2.imread(img_path)
+# img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # uncomment this line if your own trained model supports grayscale images
+result = ocr.ocr(img, cls=True)
+for line in result:
+    print(line)
+
+# Show the result
+from PIL import Image
+image = Image.open(img_path).convert('RGB')
+boxes = [line[0] for line in result]
+txts = [line[1][0] for line in result]
+scores = [line[1][1] for line in result]
+im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
+im_show = Image.fromarray(im_show)
+im_show.save('result.jpg')
+```
+
 ## Parameter description

 | Field                   | Description                                                                                                                                                                                                          | Default                 |
@@ -285,6 +340,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_
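 | max_text_length         | Maximum text length the recognition algorithm can recognize                                                                                                                                                          | 25                      |
 | rec_char_dict_path      | Path of the recognition model dictionary; must be changed to your own dictionary path when rec_model_dir is passed using method 2                                                                                     | ./ppocr/utils/ppocr_keys_v1.txt                        |
 | use_space_char          | Whether to recognize spaces                                                                                                                                                                                          | TRUE                    |
+| drop_score          | Filter the output by score (from the recognition model); results below this score are not returned                                                                                                                      | 0.5                    |
 | use_angle_cls          | Whether to load the classification model                                                                                                                                                                              | FALSE                    |
 | cls_model_dir          | Folder containing the classification model. There are two ways to pass it: 1. None: automatically download the built-in model to `~/.paddleocr/cls`; 2. the path of an inference model you converted yourself, which must contain the model and params files                                                                                 | None                    |
 | cls_image_shape          | Input image size of the classification algorithm                                                                           | "3, 48, 192"                    |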
@@ -295,4 +351,4 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_
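 | lang                     | Model language type; currently supports Chinese (ch) and English (en)                                                                                                                                               | ch                    |
 | det                     | Whether to run detection in the forward pass                                                                                                                                                                          | TRUE                    |
 | rec                     | Whether to run recognition in the forward pass                                                                                                                                                                        | TRUE                    |
-| cls                     | Whether to run classification in the forward pass                                                                                                                                                                   | FALSE                    |
+| cls                     | Whether to run classification in the forward pass (in command-line mode, use use_angle_cls to control whether classification runs)                                                                                  | FALSE                    |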
--- a/doc/doc_en/add_new_algorithm_en.md
+++ b/doc/doc_en/add_new_algorithm_en.md
+# Add new algorithm
+
+PaddleOCR decomposes an algorithm into the following parts, and modularizes each part to make it more convenient to develop new algorithms.
+
+* Data loading and processing
+* Network
+* Post-processing
+* Loss
+* Metric
+* Optimizer
+
+The following will introduce each part separately, and introduce how to add the modules required for the new algorithm.
+
+
+## Data loading and processing
+
+Data loading and processing are composed of different modules that handle image reading, data augmentation and label production. This part is under [ppocr/data](../../ppocr/data). The files and folders are explained below:
+
+```bash
+ppocr/data/
+├── imaug             # Scripts for image reading, data augmentation and label production
+│   ├── label_ops.py  # Modules that transform the label
+│   ├── operators.py  # Modules that transform the image
+│   ├──.....
+├── __init__.py
+├── lmdb_dataset.py   # The dataset that reads the lmdb
+└── simple_dataset.py # Read the dataset saved in the form of `image_path\tgt`
+```
+
+PaddleOCR has a large number of built-in image operation related modules. For modules that are not built-in, you can add them through the following steps:
+
+1. Create a new file under the [ppocr/data/imaug](../../ppocr/data/imaug) folder, such as my_module.py.
+2. Add code in the my_module.py file. The sample code is as follows:
+
+```python
+class MyModule:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, data):
+        img = data['image']
+        label = data['label']
+        # your process code
+
+        data['image'] = img
+        data['label'] = label
+        return data
+```
+
+3. Import the added module in the [ppocr/data/imaug/\__init\__.py](../../ppocr/data/imaug/__init__.py) file.
+
+The data processing modules are listed in the config file and executed in sequence, in the order they appear. For example:
+
+```yaml
+# angle class data process
+transforms:
+  - DecodeImage: # load image
+      img_mode: BGR
+      channel_first: False
+  - MyModule:
+      args1: args1
+      args2: args2
+  - KeepKeys:
+      keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
+```
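The list semantics above can be illustrated with a minimal sketch. It assumes that each configured entry becomes a callable that takes and returns the `data` dict, and that a module may return `None` to drop a sample. `run_transforms`, `AddSuffix` and `KeepKeysSketch` are hypothetical names used for illustration only, not PaddleOCR APIs.

```python
class AddSuffix:
    """Hypothetical label transform that follows the MyModule pattern."""

    def __init__(self, suffix="", **kwargs):
        self.suffix = suffix

    def __call__(self, data):
        data["label"] = data["label"] + self.suffix
        return data


class KeepKeysSketch:
    """Hypothetical stand-in for KeepKeys: returns values in the configured order."""

    def __init__(self, keep_keys, **kwargs):
        self.keep_keys = keep_keys

    def __call__(self, data):
        return [data[key] for key in self.keep_keys]


def run_transforms(data, ops):
    """Apply each op in list order; stop early if an op rejects the sample."""
    for op in ops:
        data = op(data)
        if data is None:
            return None
    return data


ops = [AddSuffix(suffix="!"), KeepKeysSketch(keep_keys=["image", "label"])]
sample = {"image": [[0, 0], [0, 0]], "label": "0", "path": "a.jpg"}
result = run_transforms(sample, ops)
```

Because the last op keeps only `image` and `label`, the extra `path` key is dropped and the values come back in the order given by `keep_keys`.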
+
+## Network
+
+The network part builds the network. PaddleOCR divides the network into four parts, which are under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms->backbones->necks->heads).
+
+```bash
+├── architectures # Code for building network
+├── transforms    # Image Transformation Module
+├── backbones     # Feature extraction module
+├── necks         # Feature enhancement module
+└── heads         # Output module
+```
+
+PaddleOCR has built-in modules for common algorithms such as DB, EAST, SAST, CRNN and Attention. Modules that are not built in can be added through the following steps. The four parts are added in the same way; take backbones as an example:
+
+1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
+2. Add code in the my_backbone.py file. The sample code is as follows:
+
+```python
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+class MyBackbone(nn.Layer):
+    def __init__(self, *args, **kwargs):
+        super(MyBackbone, self).__init__()
+        # your init code
+        self.conv = nn.xxxx  # placeholder: replace with a real layer, e.g. nn.Conv2D
+
+    def forward(self, inputs):
+        # your network forward
+        y = self.conv(inputs)
+        return y
+```
+
+3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.
+
+After adding the four-part modules of the network, you only need to configure them in the configuration file to use them. For example:
+
+```yaml
+Architecture:
+  model_type: rec
+  algorithm: CRNN
+  Transform:
+    name: MyTransform
+    args1: args1
+    args2: args2
+  Backbone:
+    name: MyBackbone
+    args1: args1
+  Neck:
+    name: MyNeck
+    args1: args1
+  Head:
+    name: MyHead
+    args1: args1
+```
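A minimal sketch of how a `name` plus arguments in the configuration can be turned into a module instance. It assumes a name-to-class lookup, which is also why step 3 (importing the new module) matters: a class that is not visible to the builder cannot be found by name. `build_module` and `SUPPORTED` are hypothetical names, and the stand-in class skips `paddle.nn.Layer` so the sketch runs without Paddle installed.

```python
class MyBackbone:
    """Hypothetical stand-in; a real backbone would subclass paddle.nn.Layer."""

    def __init__(self, args1=None, **kwargs):
        self.args1 = args1


# Names visible to the builder; this plays the role of the import in __init__.py.
SUPPORTED = {"MyBackbone": MyBackbone}


def build_module(config, registry):
    """Instantiate the class named by config['name'], passing the other keys as kwargs."""
    config = dict(config)  # copy so the caller's config is not mutated
    name = config.pop("name")
    if name not in registry:
        raise KeyError(f"{name} is not registered; supported: {sorted(registry)}")
    return registry[name](**config)


backbone = build_module({"name": "MyBackbone", "args1": "args1"}, SUPPORTED)
```

Every key other than `name` is forwarded to `__init__`, which matches the `args1`/`args2` placeholders in the YAML above.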
+
+## Post-processing
+
+Post-processing decodes the network output to obtain text boxes or recognized text. This part is under [ppocr/postprocess](../../ppocr/postprocess).
+PaddleOCR has built-in post-processing modules for algorithms such as DB, EAST, SAST, CRNN and Attention. Components that are not built in can be added through the following steps:
+
+1. Create a new file under the [ppocr/postprocess](../../ppocr/postprocess) folder, such as my_postprocess.py.
+2. Add code in the my_postprocess.py file. The sample code is as follows:
+
+```python
+import paddle
+
+
+class MyPostProcess:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, preds, label=None, *args, **kwargs):
+        if isinstance(preds, paddle.Tensor):
+            preds = preds.numpy()
+        # your preds decode code
+        preds = self.decode_preds(preds)
+        if label is None:
+            return preds
+        # your label decode code
+        label = self.decode_label(label)
+        return preds, label
+
+    def decode_preds(self, preds):
+        # your preds decode code
+        pass
+
+    def decode_label(self, label):
+        # your label decode code
+        pass
+```
+
+3. Import the added module in the [ppocr/postprocess/\__init\__.py](../../ppocr/postprocess/__init__.py) file.
+
+After the post-processing module is added, you only need to configure it in the configuration file to use it. For example:
+
+```yaml
+PostProcess:
+  name: MyPostProcess
+  args1: args1
+  args2: args2
+```
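As a concrete instance of the template, here is a minimal sketch of a post-process that maps each row of class scores to a `(label, score)` pair. `ArgmaxPostProcess` is a hypothetical name, and the sketch takes plain Python lists so that it runs without Paddle; a real module would first convert a `paddle.Tensor` with `.numpy()` as in the template.

```python
class ArgmaxPostProcess:
    """Hypothetical post-process: map each row of class scores to (label, score)."""

    def __init__(self, label_list, **kwargs):
        self.label_list = label_list

    def __call__(self, preds, label=None, *args, **kwargs):
        decoded = [self.decode_row(row) for row in preds]
        if label is None:
            return decoded
        # Ground-truth class indices are decoded with a score of 1.0.
        return decoded, [(self.label_list[idx], 1.0) for idx in label]

    def decode_row(self, row):
        # Index of the highest score in this row.
        best = max(range(len(row)), key=lambda i: row[i])
        return self.label_list[best], row[best]


post = ArgmaxPostProcess(label_list=["0", "180"])
preds = [[0.9, 0.1], [0.2, 0.8]]
decoded = post(preds)
decoded_with_label = post(preds, label=[0, 1])
```

Returning both decoded predictions and decoded labels when `label` is given lets the same module serve inference (predictions only) and evaluation (predictions compared with labels).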
+
+## Loss
+
+The loss function is used to calculate the distance between the network output and the label. This part is under [ppocr/losses](../../ppocr/losses).
+PaddleOCR has built-in loss function modules for algorithms such as DB, EAST, SAST, CRNN and Attention. Modules that are not built in can be added through the following steps:
+
+1. Create a new file in the [ppocr/losses](../../ppocr/losses) folder, such as my_loss.py.
+2. Add code in the my_loss.py file. The sample code is as follows:
+
+```python
+import paddle
+from paddle import nn
+
+
+class MyLoss(nn.Layer):
+    def __init__(self, **kwargs):
+        super(MyLoss, self).__init__()
+        # your init code (self.loss, used below, must be defined here)
+        pass
+
+    def __call__(self, predicts, batch):
+        label = batch[1]
+        # your loss code
+        loss = self.loss(input=predicts, label=label)
+        return {'loss': loss}
+```
+
+3. Import the added module in the [ppocr/losses/\__init\__.py](../../ppocr/losses/__init__.py) file.
+
+After the loss function module is added, you only need to configure it in the configuration file to use it, such as:
+
+```yaml
+Loss:
+  name: MyLoss
+  args1: args1
+  args2: args2
+```
+
+## Metric
+
+The metric module calculates the performance of the network on the current batch. This part is under [ppocr/metrics](../../ppocr/metrics). PaddleOCR has built-in evaluation modules for detection, classification and recognition. Modules that are not built in can be added through the following steps:
+
+1. Create a new file under the [ppocr/metrics](../../ppocr/metrics) folder, such as my_metric.py.
+2. Add code in the my_metric.py file. The sample code is as follows:
+
+```python
+
+class MyMetric(object):
+    def __init__(self, main_indicator='acc', **kwargs):
+        # main_indicator is used to select the best model
+        self.main_indicator = main_indicator
+        self.reset()
+
+    def __call__(self, preds, batch, *args, **kwargs):
+        # preds is the output of the post-processing module
+        # batch is the output of the dataloader
+        labels = batch[1]
+        cur_correct_num = 0
+        cur_all_num = 0
+        # your metric code
+        self.correct_num += cur_correct_num
+        self.all_num += cur_all_num
+        # max(..., 1) avoids dividing by zero on an empty batch
+        return {'acc': cur_correct_num / max(cur_all_num, 1), }
+
+    def get_metric(self):
+        """
+        return metrics {
+                 'acc': 0,
+                 'norm_edit_dis': 0,
+            }
+        """
+        acc = self.correct_num / max(self.all_num, 1)  # guard against zero samples
+        self.reset()
+        return {'acc': acc}
+
+    def reset(self):
+        # reset metric
+        self.correct_num = 0
+        self.all_num = 0
+
+```
+
+3. Import the added module in the [ppocr/metrics/\__init\__.py](../../ppocr/metrics/__init__.py) file.
+
+After the metric module is added, you only need to configure it in the configuration file to use it, such as:
+
+```yaml
+Metric:
+  name: MyMetric
+  main_indicator: acc
+```
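The per-batch versus accumulated behaviour of the template can be seen in a minimal sketch. `AccMetricSketch` is a hypothetical name, and for brevity it takes the labels directly instead of unpacking them from `batch[1]`.

```python
class AccMetricSketch:
    """Hypothetical accuracy metric that follows the MyMetric template."""

    def __init__(self, main_indicator="acc", **kwargs):
        self.main_indicator = main_indicator
        self.reset()

    def __call__(self, preds, labels, *args, **kwargs):
        correct = sum(1 for p, t in zip(preds, labels) if p == t)
        self.correct_num += correct
        self.all_num += len(labels)
        # max(..., 1) avoids dividing by zero on an empty batch
        return {"acc": correct / max(len(labels), 1)}

    def get_metric(self):
        acc = self.correct_num / max(self.all_num, 1)
        self.reset()
        return {"acc": acc}

    def reset(self):
        self.correct_num = 0
        self.all_num = 0


metric = AccMetricSketch()
batch1 = metric(["0", "180"], ["0", "0"])        # 1 of 2 correct
batch2 = metric(["180", "180"], ["180", "180"])  # 2 of 2 correct
overall = metric.get_metric()                    # 3 of 4 correct across both batches
```

`__call__` reports the accuracy of the current batch only, while `get_metric` reports the accuracy over everything seen since the last reset and then clears the counters.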
+
+## Optimizer
+
+The optimizer is used to train the network. It also contains the network regularization and learning rate decay modules. This part is under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in
+commonly used optimizer modules such as `Momentum`, `Adam` and `RMSProp`, common learning rate decay modules such as `Linear`, `Cosine`, `Step` and `Piecewise`, and common regularization modules such as `L1Decay` and `L2Decay`.
+Modules that are not built in can be added through the following steps; take `optimizer` as an example:
+
+1. Create your own optimizer in the [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py) file. The sample code is as follows:
+
+```python
+from paddle import optimizer as optim
+
+
+class MyOptim(object):
+    def __init__(self, learning_rate=0.001, *args, **kwargs):
+        self.learning_rate = learning_rate
+
+    def __call__(self, parameters):
+        # It is recommended to wrap the built-in optimizer of paddle
+        opt = optim.XXX(
+            learning_rate=self.learning_rate,
+            parameters=parameters)
+        return opt
+
+```
+
+After the optimizer module is added, you only need to configure it in the configuration file to use it. For example:
+
+```yaml
+Optimizer:
+  name: MyOptim
+  args1: args1
+  args2: args2
+  lr:
+    name: Cosine
+    learning_rate: 0.001
+  regularizer:
+    name: 'L2'
+    factor: 0
+```
\ No newline at end of file
--- a/doc/doc_en/angle_class_en.md
+++ b/doc/doc_en/angle_class_en.md
@@ -65,26 +65,35 @@ Start training:
 ```
 # Set PYTHONPATH path
 export PYTHONPATH=$PYTHONPATH:.
-# GPU training Support single card and multi-card training, specify the card number through CUDA_VISIBLE_DEVICES
-export CUDA_VISIBLE_DEVICES=0,1,2,3
-# Training icdar15 English data
-python3 tools/train.py -c configs/cls/cls_mv3.yml
+# GPU training supports single-card and multi-card training; specify the card numbers through selected_gpus
+# Start training. The following command has been written into the train.sh file; just modify the configuration file path in that file
+python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3,4,5,6,7'  tools/train.py -c configs/cls/cls_mv3.yml
 ```

 - Data Augmentation

-PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.
+PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please uncomment the `RecAug` and `RandAugment` fields under `Train.dataset.transforms` in the configuration file.

 The default perturbation methods are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, color reverse, RandAugment.

 Except for RandAugment, each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to:
-[randaugment.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/cls/randaugment.py)
-[img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
+[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py) 
+[randaugment.py](../../ppocr/data/imaug/randaugment.py)


 - Training

-PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/cls/cls_mv3.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/cls_mv3/best_accuracy` during the evaluation process.
+PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/cls/cls_mv3.yml` to set the evaluation frequency. By default, it is evaluated every 1000 iterations. The following content will be saved during training:
+```bash
+├── best_accuracy.pdopt # Optimizer parameters for the best model
+├── best_accuracy.pdparams # Parameters of the best model
+├── best_accuracy.states # Metric info and epochs of the best model
+├── config.yml # Configuration file for this experiment
+├── latest.pdopt # Optimizer parameters for the latest model
+├── latest.pdparams # Parameters of the latest model
+├── latest.states # Metric info and epochs of the latest model
+└── train.log # Training log
+```

 If the evaluation set is large, the test will be time-consuming. It is recommended to reduce the number of evaluations, or evaluate after training.

@@ -92,7 +101,7 @@ If the evaluation set is large, the test will be time-consuming. It is recommend

 ### EVALUATION

-The evaluation data set can be modified via `configs/cls/cls_reader.yml` setting of `label_file_path` in EvalReader.
+The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/cls/cls_mv3.yml` file.

 ```
 export CUDA_VISIBLE_DEVICES=0
@@ -106,21 +115,20 @@ python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/

 Using the model trained by paddleocr, you can quickly get prediction through the following script.

-The default prediction picture is stored in `infer_img`, and the weight is specified via `-o Global.checkpoints`:
+Use `Global.infer_img` to specify the path of the predicted picture or folder, and use `Global.checkpoints` to specify the weight:

 ```
 # Predict English results
-python3 tools/infer_rec.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg
+python3 tools/infer_cls.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words_en/word_10.png
 ```

 Input image:

-![](../imgs_words/en/word_1.png)
+![](../imgs_words_en/word_10.png)

 Get the prediction result of the input image:

 ```
-infer_img: doc/imgs_words/en/word_1.png
-    scores: [[0.93161047 0.06838956]]
-    label: [0]
+infer_img: doc/imgs_words_en/word_10.png
+     result: ('0', 0.9999995)
 ```
--- a/doc/doc_en/config_en.md
+++ b/doc/doc_en/config_en.md
-# OPTIONAL PARAMETERS LIST
+## Optional parameter list

-The following list can be viewed via `--help`
+The following list can be viewed through `--help`

 |         FLAG             |     Supported script    |        Use        |      Defaults       |         Note         |
 | :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
-|          -c              |      ALL       |  Specify configuration file to use |  None  |  **Please refer to the parameter introduction for configuration file usage** |
-|          -o              |      ALL       |  set configuration options  |  None  |  Configuration using -o has higher priority than the configuration file selected with -c. E.g: `-o Global.use_gpu=false`  |  
-
+|          -c              |      ALL       |  Specify configuration file to use  |  None  |  **Please refer to the parameter introduction for configuration file usage** |
+|          -o              |      ALL       |  set configuration options  |  None  |  Configuration using -o has higher priority than the configuration file selected with -c. E.g: -o Global.use_gpu=false |
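
The priority rule for `-o` can be sketched as a small merge step applied after the configuration file is loaded. This is a minimal illustration under stated assumptions: `parse_value` and `apply_override` are hypothetical helpers, and the value parsing here (bool, int, float, else string) is a simplification, not the parsing used by the training scripts.

```python
def parse_value(text):
    """Convert a command-line string to bool, int or float, else keep it as str."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_override(config, option):
    """Set a nested key from a 'Section.key=value' string, in place."""
    dotted_key, raw_value = option.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = parse_value(raw_value)
    return config


config = {"Global": {"use_gpu": True, "epoch_num": 500}}  # as if loaded via -c
apply_override(config, "Global.use_gpu=false")
apply_override(config, "Global.epoch_num=10")
```

Because the overrides are written into the already loaded dictionary, they replace whatever the file selected with `-c` specified for the same keys.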

 ## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE

-Take `rec_chinese_lite_train_v1.1.yml` as an example
-
+Take rec_chinese_lite_train_v1.1.yml as an example
+### Global 

-|         Parameter             |            Use                |      Default       |            Note            |
+|         Parameter             |            Use                |      Defaults       |            Note            |
 | :----------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
-|      algorithm           |    Select algorithm to use                    |  Synchronize with configuration file   |     For selecting model, please refer to the supported model [list](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/README_en.md) |
-|      use_gpu             |    Set using GPU or not            |       true        |                \                 |
-|      epoch_num           |    Maximum training epoch number             |       3000        |                \                 |
+|      use_gpu             |    Set using GPU or not           |       true        |                \                 |
+|      epoch_num           |    Maximum training epoch number             |       500        |                \                 |
 |      log_smooth_window   |    Sliding window size            |       20          |                \                 |
 |      print_batch_step    |    Set print log interval         |       10          |                \                 |
-|      save_model_dir      |    Set model save path        |  output/{model_name}  |                \                 |
+|      save_model_dir      |    Set model save path        |  output/{algorithm name}  |                \                 |
 |      save_epoch_step     |    Set model save interval        |       3           |                \                 |
-|      eval_batch_step     |    Set the model evaluation interval        |2000 or [1000, 2000] |runing evaluation every 2000 iters or evaluation is run every 2000 iterations after the 1000th iteration  |
-|train_batch_size_per_card |  Set the batch size during training   |         256         |                \                 |
-| test_batch_size_per_card |  Set the batch size during testing    |         256         |                \                 |
-|      image_shape         |    Set input image size        |   [3, 32, 100]    |                \                 |
-|      max_text_length     |    Set the maximum text length        |       25          |                \                 |
-|      character_type      |    Set character type            |       ch          |    en/ch, the default dict will be used for en, and the custom dict will be used for ch|
-|      character_dict_path |    Set dictionary path            |  ./ppocr/utils/ic15_dict.txt  |    \                 |
-|      loss_type           |    Set loss type              |       ctc         |    Supports two types of loss: ctc / attention |
-|       distort            |    Set use distort          |       false       |  Support distort type ,read [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)                 |
-|      use_space_char          |    Wether to recognize space             |        false      |         Only support in character_type=ch mode                 |
-     label_list          | Set the angle supported by the direction classifier | ['0','180'] | Only valid in the direction classifier |
-|      reader_yml          |    Set the reader configuration file          |  ./configs/rec/rec_icdar15_reader.yml  |  \          |
-|      pretrain_weights    |    Load pre-trained model path      |  ./pretrain_models/CRNN/best_accuracy  |  \          |
-|      checkpoints         |    Load saved model path            |       None        |    Used to load saved parameters to continue training after interruption |
-|      save_inference_dir  |   path to save model for inference |          None        |   Use to save inference model |
-
-## INTRODUCTION TO READER PARAMETERS OF CONFIGURATION FILE
-
-Take `rec_chinese_reader.yml` as an example:
-
-|         Parameter             |            Use                |      Default       |            Note            |
-| :----------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
-|      reader_function     |    Select data reading method        |  ppocr.data.rec.dataset_traversal,SimpleReader  | Support two data reading methods: SimpleReader / LMDBReader  |
-|      num_workers             |    Set the number of data reading threads            |       8        |                \                 |
-|      img_set_dir          |    Image folder path             |       ./train_data        |                \                 |
-|      label_file_path      |    Groundtruth file path           |       ./train_data/rec_gt_train.txt| \    |
-|      infer_img            |    Result folder path     |       ./infer_img | \|
+|      eval_batch_step     |    Set the model evaluation interval        | 2000 or [1000, 2000]        | Run evaluation every 2000 iterations, or every 2000 iterations after the 1000th iteration   |
+|      cal_metric_during_train     |    Set whether to evaluate the metric during the training process. At this time, the metric of the model under the current batch is evaluated        |       true         |                \                 |
+|      load_static_weights     |   Set whether the pre-training model is saved in static graph mode (currently only required by the detection algorithm)        |       true         |                \                 |
+|      pretrained_model    |    Set the path of the pre-trained model      |  ./pretrain_models/CRNN/best_accuracy  |  \          |
+|      checkpoints         |    Set the model parameter path            |       None        |   Used to load parameters and continue training after an interruption |
+|      use_visualdl  |    Set whether to enable visualdl for visual log display |          False        |    [Tutorial](https://www.paddlepaddle.org.cn/paddle/visualdl) |
+|      infer_img            |    Set inference image path or folder path     |       ./infer_img | \|
+|      character_dict_path |    Set dictionary path            |  ./ppocr/utils/ppocr_keys_v1.txt  |    \                 |
+|      max_text_length     |    Set the maximum length of text        |       25          |                \                 |
+|      character_type      |    Set character type            |       ch          |    en/ch, the default dict will be used for en, and the custom dict will be used for ch |
+|      use_space_char     |    Set whether to recognize spaces             |        True      |          Only support in character_type=ch mode                 |
+|      label_list          |    Set the angle supported by the direction classifier       |    ['0','180']    |     Only valid in angle classifier model |
+|      save_res_path          |    Set the save address of the test model results       |    ./output/det_db/predicts_db.txt    |     Only valid in the text detection model |
+
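The two accepted forms of `eval_batch_step` can be read as a simple predicate on the training step. The sketch below is one interpretation of the table entry, assuming that with `[start, interval]` the interval is counted from the start step; `should_evaluate` is a hypothetical helper, not a function from the training script.

```python
def should_evaluate(global_step, eval_batch_step):
    """Return True when an evaluation is due at this training step."""
    if isinstance(eval_batch_step, (list, tuple)):
        start_step, interval = eval_batch_step
    else:
        start_step, interval = 0, eval_batch_step
    return global_step > start_step and (global_step - start_step) % interval == 0


steps_plain = [s for s in range(1, 6001) if should_evaluate(s, 2000)]
steps_delayed = [s for s in range(1, 6001) if should_evaluate(s, [1000, 2000])]
```

Under this reading, `2000` evaluates at steps 2000, 4000 and 6000, while `[1000, 2000]` skips the first 1000 iterations and evaluates at steps 3000 and 5000.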
+### Optimizer ([ppocr/optimizer](../../ppocr/optimizer))
+
+|         Parameter             |            Use            |      Defaults        |            Note             |
+| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
+|      name        |         Optimizer class name          |  Adam  |  Currently supports `Momentum`, `Adam`, `RMSProp`, see [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py)  |
+|      beta1           |    Set the exponential decay rate for the 1st moment estimates  |       0.9         |               \             |
+|      beta2           |    Set the exponential decay rate for the 2nd moment estimates  |     0.999         |               \             |
+|      **lr**                |         Set the learning rate decay method       |   -    |       \  |
+|        name    |      Learning rate decay class name   |         Cosine       | Currently supports `Linear`, `Cosine`, `Step`, `Piecewise`, see [ppocr/optimizer/learning_rate.py](../../ppocr/optimizer/learning_rate.py) |
+|        learning_rate      |    Set the base learning rate        |       0.001      |  \        |
+|      **regularizer**      |  Set network regularization method        |       -      | \        |
+|        name      |    Regularizer class name      |       L2     |  Currently supports `L1`, `L2`, see [ppocr/optimizer/regularizer.py](../../ppocr/optimizer/regularizer.py)        |
+|        factor      |    Regularization coefficient       |       0.00004     |  \        |
+
+
+### Architecture ([ppocr/modeling](../../ppocr/modeling))
+In ppocr, the network is divided into four stages: Transform, Backbone, Neck and Head
+
+|         Parameter             |            Use            |      Defaults        |            Note             |
+| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
+|      model_type        |         Network type          |  rec  |  Currently supports `rec`, `det`, `cls`  |
+|      algorithm           |    Model name  |       CRNN         |               See [algorithm_overview](./algorithm_overview.md) for the support list             |
+|      **Transform**           |    Set the transformation method  |       -       |               Currently only supported for recognition algorithms, see [ppocr/modeling/transforms](../../ppocr/modeling/transforms) for details            |
+|        name    |      Transformation class name   |         TPS       | Currently supports `TPS` |
+|        num_fiducial      |   Number of TPS control points        |       20      |  Ten each on the top and bottom edges       |
+|        loc_lr      |    Localization network learning rate        |       0.1      |  \      |
+|        model_name      |    Localization network size        |       small      |  Currently supports `small`, `large`       |
+|      **Backbone**      |  Set the network backbone class name        |       -      | see [ppocr/modeling/backbones](../../ppocr/modeling/backbones)        |
+|        name      |    Backbone class name       |       ResNet     | Currently supports `MobileNetV3`, `ResNet`        |
+|        layers      |    Number of ResNet layers       |       34     |  Currently supports 18, 34, 50, 101, 152, 200       |
+|        model_name      |    MobileNetV3 network size       |       small     |  Currently supports `small`, `large`       |
+|      **Neck**      |  Set network neck        |       -      | see [ppocr/modeling/necks](../../ppocr/modeling/necks)        |
+|        name      |    Neck class name       |       SequenceEncoder     | Currently supports `SequenceEncoder`, `DBFPN`        |
+|        encoder_type      |    SequenceEncoder encoder type       |       rnn     |  Currently supports `reshape`, `fc`, `rnn`       |
+|        hidden_size      |   Number of RNN internal units       |       48     |  \      |
+|        out_channels      |   Number of DBFPN output channels       |       256     |  \      |
+|      **Head**      |  Set the network head        |       -      | see [ppocr/modeling/heads](../../ppocr/modeling/heads)        |
+|        name      |    Head class name       |       CTCHead     | Currently supports `CTCHead`, `DBHead`, `ClsHead`        |
+|        fc_decay      |    CTCHead regularization coefficient       |       0.0004     |  \      |
+|        k      |   DBHead binarization coefficient       |       50     |  \      |
+|        class_dim      |   ClsHead output category number       |       2     |  \      |
+
+
+### Loss ([ppocr/losses](../../ppocr/losses))
+
+|         Parameter             |            Use            |      Defaults        |            Note             |
+| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
+|      name        |         Loss class name          |  CTCLoss  |  Currently supports `CTCLoss`, `DBLoss`, `ClsLoss`  |
+|      balance_loss        |        Whether to balance the number of positive and negative samples in DBLoss (using OHEM)         |  True  |  \  |
+|      ohem_ratio        |        The negative-to-positive sample ratio of OHEM in DBLoss         |  3  |  \  |
+|      main_loss_type        |        The loss used by shrink_map in DBLoss        |  DiceLoss  |  Currently supports `DiceLoss`, `BCELoss`  |
+|      alpha        |        The coefficient of shrink_map_loss in DBLoss       |  5  |  \  |
+|      beta        |        The coefficient of threshold_map_loss in DBLoss       |  10  |  \  |

-## INTRODUCTION TO OPTIMIZER PARAMETERS OF CONFIGURATION FILE
+### PostProcess ([ppocr/postprocess](../../ppocr/postprocess))

-Take `rec_icdar15_train.yml` as an example:
+|         Parameter             |            Use            |      Defaults        |            Note             |
+| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
+|      name        |         Post-processing class name          |  CTCLabelDecode  |  Currently supports `CTCLabelDecode`, `AttnLabelDecode`, `DBPostProcess`, `ClsPostProcess`  |
+|      thresh        |        The threshold for binarization of the segmentation map in DBPostProcess         |  0.3  |  \  |
+|      box_thresh        |        The threshold for filtering output boxes in DBPostProcess. Boxes below this threshold will not be output         |  0.7  |  \  |
+|      max_candidates        |        The maximum number of text boxes output in DBPostProcess        |  1000  |  \  |
+|      unclip_ratio        |        The unclip ratio of the text box in DBPostProcess       |  2.0  |  \  |
+
+### Metric ([ppocr/metrics](../../ppocr/metrics))
+
+|         Parameter             |            Use            |      Defaults        |            Note             |
+| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
+|      name        |         Metric method name          |  RecMetric  |  Currently supports `DetMetric`, `RecMetric`, `ClsMetric`  |
+|      main_indicator        |        Main indicator, used to select the best model        |  acc |  hmean for detection; acc for recognition and classification  |

-|         Parameter             |            Use          |      Default        |            None             |
+### Dataset  ([ppocr/data](../../ppocr/data))
+|         Parameter             |            Use            |      Defaults        |            Note             |
 | :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
-|         function        |         Select Optimizer function          |  pocr.optimizer,AdamDecay  |  Only support Adam  |
-|         base_lr         |      Set the base lr          |       0.0005      |               \             |
-|         beta1           |    Set the exponential decay rate for the 1st moment estimates  |       0.9         |               \             |
-|         beta2           |    Set the exponential decay rate for the 2nd moment estimates  |     0.999         |               \             |
-|         decay           |         Whether to use decay       |    \              |               \             |
-|      function(decay)    |         Set the decay function       |   cosine_decay    |         Support cosine_decay, cosine_decay_warmup and piecewise_decay            |
-|      step_each_epoch    |      The number of steps in an epoch. Used in cosine_decay/cosine_decay_warmup  |         20       | Calculation: total_image_num / (batch_size_per_card * card_size) |
-|        total_epoch      |    The number of epochs. Used in cosine_decay/cosine_decay_warmup      |       1000      | Consistent with Global.epoch_num      |
-|        warmup_minibatch      |  Number of steps for linear warmup. Used in cosine_decay_warmup        |       1000      | \        |
-|        boundaries      |    The step intervals to reduce learning rate. Used in piecewise_decay       |       -      |  The format is list        |
-|        decay_rate      |    Learning rate decay rate. Used in piecewise_decay       |       -      |  \        |
+|      **dataset**        |         Return one sample per iteration          |  -  |  -  |
+|      name        |        Dataset class name         |  SimpleDataSet |   Currently supports `SimpleDataSet`, `LMDBDataSet`  |
+|      data_dir        |        Image folder path        |  ./train_data |  \  |
+|      label_file_list        |        Ground-truth file path         |  ["./train_data/train_list.txt"] | This parameter is not required when the dataset is LMDBDataSet   |
+|      ratio_list        |        Sampling ratio for each label file         |  [1.0] | If there are two train_lists in label_file_list and ratio_list is [0.4,0.6], 40% will be sampled from train_list1 and 60% from train_list2, and the samples are combined into the full dataset   |
+|      transforms        |        List of methods to transform images and labels         |  [DecodeImage,CTCLabelEncode,RecResizeImg,KeepKeys] |   See [ppocr/data/imaug](../../ppocr/data/imaug)  |
+|      **loader**        |        dataloader related         |  - |   |
+|      shuffle        |        Whether to shuffle the dataset at each epoch         |  True | \  |
+|      batch_size_per_card        |        Single card batch size during training         |  256 | \  |
+|      drop_last        |        Whether to discard the last incomplete mini-batch when the number of samples in the dataset is not divisible by batch_size        |  True | \  |
+|      num_workers        |        Number of subprocesses used to load data. If 0, no subprocess is started and the data is loaded in the main process       |  8 | \  |
\ No newline at end of file
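
Reviewer note: the `ratio_list` behaviour described in the table above can be sketched as follows. This is a minimal illustration, assuming each label file contributes a random subset whose size is its ratio times its number of lines. The helper name `sample_by_ratio` and the in-memory lists are hypothetical and are not part of PaddleOCR.

```python
import random


def sample_by_ratio(label_lists, ratio_list, seed=None):
    """Combine several label lists, sampling a fraction of each (hypothetical helper)."""
    if len(label_lists) != len(ratio_list):
        raise ValueError("label_lists and ratio_list must have the same length")
    rng = random.Random(seed)
    combined = []
    for lines, ratio in zip(label_lists, ratio_list):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("each ratio must lie in [0, 1]")
        # Number of samples drawn from this label file.
        k = round(len(lines) * ratio)
        combined.extend(rng.sample(lines, k))
    return combined


# Two made-up label lists with ten entries each.
train_list1 = ["a_%d.jpg" % i for i in range(10)]
train_list2 = ["b_%d.jpg" % i for i in range(10)]
dataset = sample_by_ratio([train_list1, train_list2], [0.4, 0.6], seed=0)
```

With ratios `[0.4, 0.6]`, four entries come from the first list and six from the second.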
--- a/doc/doc_en/tree_en.md
+++ b/doc/doc_en/tree_en.md
--- a/doc/doc_en/whl_en.md
+++ b/doc/doc_en/whl_en.md
@@ -271,6 +271,59 @@ im_show.save('result.jpg')
 paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true
 ```

+### Use web images or numpy array as input
+
+1. Web image
+
+Use by code
+```python
+from paddleocr import PaddleOCR, draw_ocr
+ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
+img_path = 'http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg'
+result = ocr.ocr(img_path, cls=True)
+for line in result:
+    print(line)
+
+# show result
+from PIL import Image
+image = Image.open('tmp.jpg').convert('RGB')  # ocr.ocr() saves the downloaded web image as tmp.jpg; Image.open cannot read a URL
+boxes = [line[0] for line in result]
+txts = [line[1][0] for line in result]
+scores = [line[1][1] for line in result]
+im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
+im_show = Image.fromarray(im_show)
+im_show.save('result.jpg')
+```
+Use by command line
+```bash
+paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true
+```
+
+2. Numpy array
+Numpy arrays are supported as input only when PaddleOCR is used from code
+
+```python
+import cv2
+from paddleocr import PaddleOCR, draw_ocr
+ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
+img_path = 'PaddleOCR/doc/imgs/11.jpg'
+img = cv2.imread(img_path)
+# img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # uncomment if your own trained model supports grayscale images
+result = ocr.ocr(img, cls=True)  # pass the numpy array, not the path
+for line in result:
+    print(line)
+
+# show result
+from PIL import Image
+image = Image.open(img_path).convert('RGB')
+boxes = [line[0] for line in result]
+txts = [line[1][0] for line in result]
+scores = [line[1][1] for line in result]
+im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
+im_show = Image.fromarray(im_show)
+im_show.save('result.jpg')
+```
+
+
 ## Parameter Description

 | Parameter                    | Description                                                                                                                                                                                                                 | Default value                  |
@@ -295,6 +348,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_
 | max_text_length         | The maximum text length that the recognition algorithm can recognize                                                                                                                                                                                         | 25                      |
 | rec_char_dict_path      | the alphabet path which needs to be modified to your own path when `rec_model_Name` use mode 2                                                                                                                                              | ./ppocr/utils/ppocr_keys_v1.txt                        |
 | use_space_char          | Whether to recognize spaces                                                                                                                                                                                                         | TRUE                    |
+| drop_score          | Filter the output by score (from the recognition model), and those below this score will not be returned                                                                                                                                                                                                        | 0.5                    |
 | use_angle_cls          | Whether to load classification model                                                                                                                                                                                                       | FALSE                    |
 | cls_model_dir           | the classification inference model folder. There are two ways to transfer parameters, 1. None: Automatically download the built-in model to `~/.paddleocr/cls`; 2. The path of the inference model converted by yourself, the model and params files must be included in the model path | None |
 | cls_image_shape         | image shape of classification algorithm                                                                                                                                                                                            | "3,48,192"              |
@@ -305,4 +359,4 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_
 | lang                     | The support language, now only Chinese(ch)、English(en)、French(french)、German(german)、Korean(korean)、Japanese(japan) are supported                                                                                                                                                                                                  | ch                    |
 | det                     | Enable detection when `ppocr.ocr` func exec                                                                                                                                                                                                   | TRUE                    |
 | rec                     | Enable recognition when `ppocr.ocr` func exec                                                                                                                                                                                                   | TRUE                    |
-| cls                     | Enable classification when `ppocr.ocr` func exec                                                                                                                                                                                                   | FALSE                    |
+| cls                     | Enable classification when `ppocr.ocr` func exec((Use use_angle_cls in command line mode to control whether to start classification in the forward direction)                                                                                                                                                                                                   | FALSE                    |
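
Reviewer note: the effect of `drop_score` can be sketched as a filter over the result list. This is only an illustration, assuming the `[box, (text, score)]` layout used by the examples in this file. The helper `filter_by_score` and the sample data are made up and are not part of the `paddleocr` package.

```python
def filter_by_score(result, drop_score=0.5):
    """Keep only lines whose recognition score is at least drop_score (hypothetical helper)."""
    return [line for line in result if line[1][1] >= drop_score]


# Made-up result in the [box, (text, score)] layout.
sample_result = [
    [[[0, 0], [10, 0], [10, 5], [0, 5]], ("hello", 0.93)],
    [[[0, 6], [10, 6], [10, 11], [0, 11]], ("w0rld", 0.31)],
]
kept = filter_by_score(sample_result, drop_score=0.5)
```

Only the first line survives, because the score of the second (0.31) is below the threshold.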
--- a/paddleocr.py
+++ b/paddleocr.py
@@ -26,17 +26,50 @@ import requests
 from tqdm import tqdm

 from tools.infer import predict_system
-from ppocr.utils.utility import initial_logger
+from ppocr.utils.logging import get_logger

-logger = initial_logger()
+logger = get_logger()
 from ppocr.utils.utility import check_and_read_gif, get_image_file_list

 __all__ = ['PaddleOCR']

-model_params = {
-    'det': 'https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar',
-    'rec':
-    'https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar',
+model_urls = {
+    'det':
+        'https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar',
+    'rec': {
+        'ch': {
+            'url':
+                'https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar',
+            'dict_path': './ppocr/utils/ppocr_keys_v1.txt'
+        },
+        'en': {
+            'url':
+                'https://paddleocr.bj.bcebos.com/20-09-22/mobile/en/en_ppocr_mobile_v1.1_rec_infer.tar',
+            'dict_path': './ppocr/utils/ic15_dict.txt'
+        },
+        'french': {
+            'url':
+                'https://paddleocr.bj.bcebos.com/20-09-22/mobile/fr/french_ppocr_mobile_v1.1_rec_infer.tar',
+            'dict_path': './ppocr/utils/dict/french_dict.txt'
+        },
+        'german': {
+            'url':
+                'https://paddleocr.bj.bcebos.com/20-09-22/mobile/ge/german_ppocr_mobile_v1.1_rec_infer.tar',
+            'dict_path': './ppocr/utils/dict/german_dict.txt'
+        },
+        'korean': {
+            'url':
+                'https://paddleocr.bj.bcebos.com/20-09-22/mobile/kr/korean_ppocr_mobile_v1.1_rec_infer.tar',
+            'dict_path': './ppocr/utils/dict/korean_dict.txt'
+        },
+        'japan': {
+            'url':
+                'https://paddleocr.bj.bcebos.com/20-09-22/mobile/jp/japan_ppocr_mobile_v1.1_rec_infer.tar',
+            'dict_path': './ppocr/utils/dict/japan_dict.txt'
+        }
+    },
+    'cls':
+        'https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar'
 }

 SUPPORT_DET_MODEL = ['DB']
@@ -54,8 +87,8 @@ def download_with_progressbar(url, save_path):
            progress_bar.update(len(data))
            file.write(data)
    progress_bar.close()
-    if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
-        logger.error("ERROR, something went wrong")
+    if total_size_in_bytes == 0 or progress_bar.n != total_size_in_bytes:
+        logger.error("Something went wrong while downloading models")
        sys.exit(0)


@@ -63,7 +96,7 @@ def maybe_download(model_storage_directory, url):
    # using custom model
    if not os.path.exists(os.path.join(
            model_storage_directory, 'model')) or not os.path.exists(
-                os.path.join(model_storage_directory, 'params')):
+        os.path.join(model_storage_directory, 'params')):
        tmp_path = os.path.join(model_storage_directory, url.split('/')[-1])
        print('download {} to {}'.format(url, tmp_path))
        os.makedirs(model_storage_directory, exist_ok=True)
@@ -84,53 +117,102 @@ def maybe_download(model_storage_directory, url):
        os.remove(tmp_path)


-def parse_args():
+def parse_args(mMain=True, add_help=True):
    import argparse

    def str2bool(v):
        return v.lower() in ("true", "t", "1")

-    parser = argparse.ArgumentParser()
-    # params for prediction engine
-    parser.add_argument("--use_gpu", type=str2bool, default=True)
-    parser.add_argument("--ir_optim", type=str2bool, default=True)
-    parser.add_argument("--use_tensorrt", type=str2bool, default=False)
-    parser.add_argument("--gpu_mem", type=int, default=8000)
-
-    # params for text detector
-    parser.add_argument("--image_dir", type=str)
-    parser.add_argument("--det_algorithm", type=str, default='DB')
-    parser.add_argument("--det_model_dir", type=str, default=None)
-    parser.add_argument("--det_max_side_len", type=float, default=960)
-
-    # DB parmas
-    parser.add_argument("--det_db_thresh", type=float, default=0.3)
-    parser.add_argument("--det_db_box_thresh", type=float, default=0.5)
-    parser.add_argument("--det_db_unclip_ratio", type=float, default=2.0)
-
-    # EAST parmas
-    parser.add_argument("--det_east_score_thresh", type=float, default=0.8)
-    parser.add_argument("--det_east_cover_thresh", type=float, default=0.1)
-    parser.add_argument("--det_east_nms_thresh", type=float, default=0.2)
-
-    # params for text recognizer
-    parser.add_argument("--rec_algorithm", type=str, default='CRNN')
-    parser.add_argument("--rec_model_dir", type=str, default=None)
-    parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320")
-    parser.add_argument("--rec_char_type", type=str, default='ch')
-    parser.add_argument("--rec_batch_num", type=int, default=30)
-    parser.add_argument("--max_text_length", type=int, default=25)
-    parser.add_argument(
-        "--rec_char_dict_path",
-        type=str,
-        default="./ppocr/utils/ppocr_keys_v1.txt")
-    parser.add_argument("--use_space_char", type=bool, default=True)
-    parser.add_argument("--enable_mkldnn", type=bool, default=False)
-
-    parser.add_argument("--det", type=str2bool, default=True)
-    parser.add_argument("--rec", type=str2bool, default=True)
-    parser.add_argument("--use_zero_copy_run", type=bool, default=False)
-    return parser.parse_args()
+    if mMain:
+        parser = argparse.ArgumentParser(add_help=add_help)
+        # params for prediction engine
+        parser.add_argument("--use_gpu", type=str2bool, default=True)
+        parser.add_argument("--ir_optim", type=str2bool, default=True)
+        parser.add_argument("--use_tensorrt", type=str2bool, default=False)
+        parser.add_argument("--gpu_mem", type=int, default=8000)
+
+        # params for text detector
+        parser.add_argument("--image_dir", type=str)
+        parser.add_argument("--det_algorithm", type=str, default='DB')
+        parser.add_argument("--det_model_dir", type=str, default=None)
+        parser.add_argument("--det_limit_side_len", type=float, default=960)
+        parser.add_argument("--det_limit_type", type=str, default='max')
+
+        # DB params
+        parser.add_argument("--det_db_thresh", type=float, default=0.3)
+        parser.add_argument("--det_db_box_thresh", type=float, default=0.5)
+        parser.add_argument("--det_db_unclip_ratio", type=float, default=2.0)
+
+        # EAST params
+        parser.add_argument("--det_east_score_thresh", type=float, default=0.8)
+        parser.add_argument("--det_east_cover_thresh", type=float, default=0.1)
+        parser.add_argument("--det_east_nms_thresh", type=float, default=0.2)
+
+        # params for text recognizer
+        parser.add_argument("--rec_algorithm", type=str, default='CRNN')
+        parser.add_argument("--rec_model_dir", type=str, default=None)
+        parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320")
+        parser.add_argument("--rec_char_type", type=str, default='ch')
+        parser.add_argument("--rec_batch_num", type=int, default=30)
+        parser.add_argument("--max_text_length", type=int, default=25)
+        parser.add_argument("--rec_char_dict_path", type=str, default=None)
+        parser.add_argument("--use_space_char", type=str2bool, default=True)
+        parser.add_argument("--drop_score", type=float, default=0.5)
+
+        # params for text classifier
+        parser.add_argument("--cls_model_dir", type=str, default=None)
+        parser.add_argument("--cls_image_shape", type=str, default="3, 48, 192")
+        parser.add_argument("--label_list", type=list, default=['0', '180'])
+        parser.add_argument("--cls_batch_num", type=int, default=30)
+        parser.add_argument("--cls_thresh", type=float, default=0.9)
+
+        parser.add_argument("--enable_mkldnn", type=str2bool, default=False)
+        parser.add_argument("--use_zero_copy_run", type=str2bool, default=False)
+        parser.add_argument("--use_pdserving", type=str2bool, default=False)
+
+        parser.add_argument("--lang", type=str, default='ch')
+        parser.add_argument("--det", type=str2bool, default=True)
+        parser.add_argument("--rec", type=str2bool, default=True)
+        parser.add_argument("--use_angle_cls", type=str2bool, default=False)
+        return parser.parse_args()
+    else:
+        return argparse.Namespace(use_gpu=True,
+                                  ir_optim=True,
+                                  use_tensorrt=False,
+                                  gpu_mem=8000,
+                                  image_dir='',
+                                  det_algorithm='DB',
+                                  det_model_dir=None,
+                                  det_limit_side_len=960,
+                                  det_limit_type='max',
+                                  det_db_thresh=0.3,
+                                  det_db_box_thresh=0.5,
+                                  det_db_unclip_ratio=2.0,
+                                  det_east_score_thresh=0.8,
+                                  det_east_cover_thresh=0.1,
+                                  det_east_nms_thresh=0.2,
+                                  rec_algorithm='CRNN',
+                                  rec_model_dir=None,
+                                  rec_image_shape="3, 32, 320",
+                                  rec_char_type='ch',
+                                  rec_batch_num=30,
+                                  max_text_length=25,
+                                  rec_char_dict_path=None,
+                                  use_space_char=True,
+                                  drop_score=0.5,
+                                  cls_model_dir=None,
+                                  cls_image_shape="3, 48, 192",
+                                  label_list=['0', '180'],
+                                  cls_batch_num=30,
+                                  cls_thresh=0.9,
+                                  enable_mkldnn=False,
+                                  use_zero_copy_run=False,
+                                  use_pdserving=False,
+                                  lang='ch',
+                                  det=True,
+                                  rec=True,
+                                  use_angle_cls=False
+                                  )


 class PaddleOCR(predict_system.TextSystem):
@@ -140,18 +222,31 @@ class PaddleOCR(predict_system.TextSystem):
        args:
            **kwargs: other params show in paddleocr --help
        """
-        postprocess_params = parse_args()
+        postprocess_params = parse_args(mMain=False, add_help=False)
        postprocess_params.__dict__.update(**kwargs)
+        self.use_angle_cls = postprocess_params.use_angle_cls
+        lang = postprocess_params.lang
+        assert lang in model_urls[
+            'rec'], 'param lang must be in {}, but got {}'.format(
+            model_urls['rec'].keys(), lang)
+        if postprocess_params.rec_char_dict_path is None:
+            postprocess_params.rec_char_dict_path = model_urls['rec'][lang][
+                'dict_path']

        # init model dir
        if postprocess_params.det_model_dir is None:
            postprocess_params.det_model_dir = os.path.join(BASE_DIR, 'det')
        if postprocess_params.rec_model_dir is None:
-            postprocess_params.rec_model_dir = os.path.join(BASE_DIR, 'rec')
+            postprocess_params.rec_model_dir = os.path.join(
+                BASE_DIR, 'rec/{}'.format(lang))
+        if postprocess_params.cls_model_dir is None:
+            postprocess_params.cls_model_dir = os.path.join(BASE_DIR, 'cls')
        print(postprocess_params)
        # download model
-        maybe_download(postprocess_params.det_model_dir, model_params['det'])
-        maybe_download(postprocess_params.rec_model_dir, model_params['rec'])
+        maybe_download(postprocess_params.det_model_dir, model_urls['det'])
+        maybe_download(postprocess_params.rec_model_dir,
+                       model_urls['rec'][lang]['url'])
+        maybe_download(postprocess_params.cls_model_dir, model_urls['cls'])

        if postprocess_params.det_algorithm not in SUPPORT_DET_MODEL:
            logger.error('det_algorithm must in {}'.format(SUPPORT_DET_MODEL))
@@ -166,7 +261,7 @@ class PaddleOCR(predict_system.TextSystem):
        # init det_model and rec_model
        super().__init__(postprocess_params)

-    def ocr(self, img, det=True, rec=True):
+    def ocr(self, img, det=True, rec=True, cls=False):
        """
        ocr with paddleocr
        args：
@@ -175,7 +270,16 @@ class PaddleOCR(predict_system.TextSystem):
            rec: use text recognition or not, if false, only det will be exec. default is True
        """
        assert isinstance(img, (np.ndarray, list, str))
+        if isinstance(img, list) and det:
+            logger.error('When the input is a list of images, det must be False')
+            exit(0)
+
+        self.use_angle_cls = cls
        if isinstance(img, str):
+            # download net image
+            if img.startswith('http'):
+                download_with_progressbar(img, 'tmp.jpg')
+                img = 'tmp.jpg'
            image_file = img
            img, flag = check_and_read_gif(image_file)
            if not flag:
@@ -183,6 +287,8 @@ class PaddleOCR(predict_system.TextSystem):
            if img is None:
                logger.error("error in loading image:{}".format(image_file))
                return None
+        if isinstance(img, np.ndarray) and len(img.shape) == 2:
+            img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
        if det and rec:
            dt_boxes, rec_res = self.__call__(img)
            return [[box.tolist(), res] for box, res in zip(dt_boxes, rec_res)]
@@ -194,20 +300,34 @@ class PaddleOCR(predict_system.TextSystem):
        else:
            if not isinstance(img, list):
                img = [img]
+            if self.use_angle_cls:
+                img, cls_res, elapse = self.text_classifier(img)
+                if not rec:
+                    return cls_res
            rec_res, elapse = self.text_recognizer(img)
            return rec_res


 def main():
-    # for com
-    args = parse_args()
-    image_file_list = get_image_file_list(args.image_dir)
+    # for cmd
+    args = parse_args(mMain=True)
+    image_dir = args.image_dir
+    if image_dir.startswith('http'):
+        download_with_progressbar(image_dir, 'tmp.jpg')
+        image_file_list = ['tmp.jpg']
+    else:
+        image_file_list = get_image_file_list(args.image_dir)
    if len(image_file_list) == 0:
        logger.error('no images find in {}'.format(args.image_dir))
        return
-    ocr_engine = PaddleOCR()
+
+    ocr_engine = PaddleOCR(**(args.__dict__))
    for img_path in image_file_list:
-        print(img_path)
-        result = ocr_engine.ocr(img_path, det=args.det, rec=args.rec)
-        for line in result:
-            print(line)
\ No newline at end of file
+        logger.info('{}{}{}'.format('*' * 10, img_path, '*' * 10))
+        result = ocr_engine.ocr(img_path,
+                                det=args.det,
+                                rec=args.rec,
+                                cls=args.use_angle_cls)
+        if result is not None:
+            for line in result:
+                logger.info(line)
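
Reviewer note on boolean flags: `argparse` applies `type` to the raw command-line string, so `type=bool` treats any non-empty string, including `"false"`, as `True`. A converter such as the `str2bool` helper defined in `parse_args` avoids this. The sketch below is a minimal demonstration; the flag names `--naive` and `--parsed` are made up for illustration.

```python
import argparse


def str2bool(v):
    """Same conversion as the str2bool helper in parse_args."""
    return v.lower() in ("true", "t", "1")


parser = argparse.ArgumentParser()
# bool("false") is True, because any non-empty string is truthy.
parser.add_argument("--naive", type=bool, default=True)
# str2bool inspects the text, so "false" becomes False.
parser.add_argument("--parsed", type=str2bool, default=True)

args = parser.parse_args(["--naive", "false", "--parsed", "false"])
print(args.naive, args.parsed)
```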
--- a/ppocr/data/imaug/__init__.py
+++ b/ppocr/data/imaug/__init__.py
@@ -26,6 +26,9 @@ from .randaugment import RandAugment
 from .operators import *
 from .label_ops import *

+from .east_process import *
+from .sast_process import *
+

 def transform(data, ops=None):
    """ transform """

--- a/ppocr/data/imaug/east_process.py
+++ b/ppocr/data/imaug/east_process.py
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+import math
+import cv2
+import numpy as np
+import json
+import sys
+import os
+
+__all__ = ['EASTProcessTrain']
+
+
+class EASTProcessTrain(object):
+    def __init__(self,
+                 image_shape = [512, 512],
+                 background_ratio = 0.125,
+                 min_crop_side_ratio = 0.1,
+                 min_text_size = 10,
+                 **kwargs):
+        self.input_size = image_shape[1]
+        self.random_scale = np.array([0.5, 1, 2.0, 3.0])
+        self.background_ratio = background_ratio
+        self.min_crop_side_ratio = min_crop_side_ratio
+        self.min_text_size = min_text_size
+
+    def preprocess(self, im):
+        input_size = self.input_size
+        im_shape = im.shape
+        im_size_min = np.min(im_shape[0:2])
+        im_size_max = np.max(im_shape[0:2])
+        im_scale = float(input_size) / float(im_size_max)
+        im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale)
+        img_mean = [0.485, 0.456, 0.406]
+        img_std = [0.229, 0.224, 0.225]
+        # im = im[:, :, ::-1].astype(np.float32)
+        im = im / 255
+        im -= img_mean
+        im /= img_std
+        new_h, new_w, _ = im.shape
+        im_padded = np.zeros((input_size, input_size, 3), dtype=np.float32)
+        im_padded[:new_h, :new_w, :] = im
+        im_padded = im_padded.transpose((2, 0, 1))
+        im_padded = im_padded[np.newaxis, :]
+        return im_padded, im_scale
+
+    def rotate_im_poly(self, im, text_polys):
+        """
+        rotate image by 90 / 180 / 270 degrees
+        """
+        im_w, im_h = im.shape[1], im.shape[0]
+        dst_im = im.copy()
+        dst_polys = []
+        rand_degree_ratio = np.random.rand()
+        rand_degree_cnt = 1
+        if 0.333 < rand_degree_ratio < 0.666:
+            rand_degree_cnt = 2
+        elif rand_degree_ratio > 0.666:
+            rand_degree_cnt = 3
+        for i in range(rand_degree_cnt):
+            dst_im = np.rot90(dst_im)
+        rot_degree = -90 * rand_degree_cnt
+        rot_angle = rot_degree * math.pi / 180.0
+        n_poly = text_polys.shape[0]
+        cx, cy = 0.5 * im_w, 0.5 * im_h
+        ncx, ncy = 0.5 * dst_im.shape[1], 0.5 * dst_im.shape[0]
+        for i in range(n_poly):
+            wordBB = text_polys[i]
+            poly = []
+            for j in range(4):
+                sx, sy = wordBB[j][0], wordBB[j][1]
+                dx = math.cos(rot_angle) * (sx - cx)\
+                    - math.sin(rot_angle) * (sy - cy) + ncx
+                dy = math.sin(rot_angle) * (sx - cx)\
+                    + math.cos(rot_angle) * (sy - cy) + ncy
+                poly.append([dx, dy])
+            dst_polys.append(poly)
+        dst_polys = np.array(dst_polys, dtype=np.float32)
+        return dst_im, dst_polys
+
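
Reviewer note: the per-vertex arithmetic in `rotate_im_poly` can be checked in isolation. The sketch below repeats the same formulas for a single point. The function name `rotate_point` and the `quarter_turns` parameter are illustrative, and the new image size is derived by swapping width and height for an odd number of quarter turns, which matches what `np.rot90` does to the array shape.

```python
import math


def rotate_point(sx, sy, im_w, im_h, quarter_turns):
    """Map a point through the rotation used for the polygon vertices."""
    angle = -90 * quarter_turns * math.pi / 180.0
    cx, cy = 0.5 * im_w, 0.5 * im_h
    # An odd number of quarter turns swaps the image width and height.
    new_w, new_h = (im_h, im_w) if quarter_turns % 2 else (im_w, im_h)
    ncx, ncy = 0.5 * new_w, 0.5 * new_h
    dx = math.cos(angle) * (sx - cx) - math.sin(angle) * (sy - cy) + ncx
    dy = math.sin(angle) * (sx - cx) + math.cos(angle) * (sy - cy) + ncy
    return dx, dy


# A point in a 100x50 (width x height) image, rotated once and twice.
once = rotate_point(10, 20, 100, 50, 1)
twice = rotate_point(10, 20, 100, 50, 2)
```

For one quarter turn the point maps to roughly `(sy, im_w - sx)`, and for two to `(im_w - sx, im_h - sy)`, up to floating-point error.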
+    def polygon_area(self, poly):
+        """
+        compute area of a polygon
+        :param poly:
+        :return:
+        """
+        edge = [(poly[1][0] - poly[0][0]) * (poly[1][1] + poly[0][1]),
+                (poly[2][0] - poly[1][0]) * (poly[2][1] + poly[1][1]),
+                (poly[3][0] - poly[2][0]) * (poly[3][1] + poly[2][1]),
+                (poly[0][0] - poly[3][0]) * (poly[0][1] + poly[3][1])]
+        return np.sum(edge) / 2.
+
+    def check_and_validate_polys(self, polys, tags, img_height, img_width):
+        """
+        check so that the text poly is in the same direction,
+        and also filter some invalid polygons
+        :param polys:
+        :param tags:
+        :return:
+        """
+        h, w = img_height, img_width
+        if polys.shape[0] == 0:
+            return polys
+        polys[:, :, 0] = np.clip(polys[:, :, 0], 0, w - 1)
+        polys[:, :, 1] = np.clip(polys[:, :, 1], 0, h - 1)
+
+        validated_polys = []
+        validated_tags = []
+        for poly, tag in zip(polys, tags):
+            p_area = self.polygon_area(poly)
+            #invalid poly
+            if abs(p_area) < 1:
+                continue
+            if p_area > 0:
+                #'poly in wrong direction'
+                if not tag:
+                    tag = True  # reversed cases should be ignored
+                poly = poly[(0, 3, 2, 1), :]
+            validated_polys.append(poly)
+            validated_tags.append(tag)
+        return np.array(validated_polys), np.array(validated_tags)
+
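
Reviewer note: `check_and_validate_polys` relies on the sign of the area returned by `polygon_area` to detect vertex order. The sketch below reproduces that logic in plain Python for a single quadrilateral. The helper `fix_orientation` is illustrative and not part of the file.

```python
def polygon_area(poly):
    """Signed area of a quadrilateral, using the same edge sum as polygon_area above."""
    total = 0.0
    for i in range(4):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % 4]
        total += (x1 - x0) * (y1 + y0)
    return total / 2.0


def fix_orientation(poly):
    """Reverse the vertex order when the signed area is positive (illustrative helper)."""
    if polygon_area(poly) > 0:
        return [poly[0], poly[3], poly[2], poly[1]]
    return poly


# Unit square listed top-left, top-right, bottom-right, bottom-left (image coordinates).
expected_order = [(0, 0), (1, 0), (1, 1), (0, 1)]
# The same square with the vertex order reversed.
reversed_order = [(0, 0), (0, 1), (1, 1), (1, 0)]
fixed = fix_orientation(reversed_order)
```

With this edge sum, the top-left, top-right, bottom-right, bottom-left order yields a negative area, so a positive area marks a polygon whose vertices need to be reordered.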
+    def draw_img_polys(self, img, polys):
+        if len(img.shape) == 4:
+            img = np.squeeze(img, axis=0)
+        if img.shape[0] == 3:
+            img = img.transpose((1, 2, 0))
+            img[:, :, 2] += 123.68
+            img[:, :, 1] += 116.78
+            img[:, :, 0] += 103.94
+        cv2.imwrite("tmp.jpg", img)
+        img = cv2.imread("tmp.jpg")
+        for box in polys:
+            box = box.astype(np.int32).reshape((-1, 1, 2))
+            cv2.polylines(img, [box], True, color=(255, 255, 0), thickness=2)
+        import random
+        ino = random.randint(0, 100)
+        cv2.imwrite("tmp_%d.jpg" % ino, img)
+        return
+
+    def shrink_poly(self, poly, r):
+        """
+        fit a poly inside the original poly (may contain bugs)
+        used to generate the score map
+        :param poly: the text poly
+        :param r: r in the paper
+        :return: the shrunk poly
+        """
+        # shrink ratio
+        R = 0.3
+        # find the longer pair
+        dist0 = np.linalg.norm(poly[0] - poly[1])
+        dist1 = np.linalg.norm(poly[2] - poly[3])
+        dist2 = np.linalg.norm(poly[0] - poly[3])
+        dist3 = np.linalg.norm(poly[1] - poly[2])
+        if dist0 + dist1 > dist2 + dist3:
+            # first move (p0, p1), (p2, p3), then (p0, p3), (p1, p2)
+            ## p0, p1
+            theta = np.arctan2((poly[1][1] - poly[0][1]),
+                               (poly[1][0] - poly[0][0]))
+            poly[0][0] += R * r[0] * np.cos(theta)
+            poly[0][1] += R * r[0] * np.sin(theta)
+            poly[1][0] -= R * r[1] * np.cos(theta)
+            poly[1][1] -= R * r[1] * np.sin(theta)
+            ## p2, p3
+            theta = np.arctan2((poly[2][1] - poly[3][1]),
+                               (poly[2][0] - poly[3][0]))
+            poly[3][0] += R * r[3] * np.cos(theta)
+            poly[3][1] += R * r[3] * np.sin(theta)
+            poly[2][0] -= R * r[2] * np.cos(theta)
+            poly[2][1] -= R * r[2] * np.sin(theta)
+            ## p0, p3
+            theta = np.arctan2((poly[3][0] - poly[0][0]),
+                               (poly[3][1] - poly[0][1]))
+            poly[0][0] += R * r[0] * np.sin(theta)
+            poly[0][1] += R * r[0] * np.cos(theta)
+            poly[3][0] -= R * r[3] * np.sin(theta)
+            poly[3][1] -= R * r[3] * np.cos(theta)
+            ## p1, p2
+            theta = np.arctan2((poly[2][0] - poly[1][0]),
+                               (poly[2][1] - poly[1][1]))
+            poly[1][0] += R * r[1] * np.sin(theta)
+            poly[1][1] += R * r[1] * np.cos(theta)
+            poly[2][0] -= R * r[2] * np.sin(theta)
+            poly[2][1] -= R * r[2] * np.cos(theta)
+        else:
+            ## p0, p3
+            # print poly
+            theta = np.arctan2((poly[3][0] - poly[0][0]),
+                               (poly[3][1] - poly[0][1]))
+            poly[0][0] += R * r[0] * np.sin(theta)
+            poly[0][1] += R * r[0] * np.cos(theta)
+            poly[3][0] -= R * r[3] * np.sin(theta)
+            poly[3][1] -= R * r[3] * np.cos(theta)
+            ## p1, p2
+            theta = np.arctan2((poly[2][0] - poly[1][0]),
+                               (poly[2][1] - poly[1][1]))
+            poly[1][0] += R * r[1] * np.sin(theta)
+            poly[1][1] += R * r[1] * np.cos(theta)
+            poly[2][0] -= R * r[2] * np.sin(theta)
+            poly[2][1] -= R * r[2] * np.cos(theta)
+            ## p0, p1
+            theta = np.arctan2((poly[1][1] - poly[0][1]),
+                               (poly[1][0] - poly[0][0]))
+            poly[0][0] += R * r[0] * np.cos(theta)
+            poly[0][1] += R * r[0] * np.sin(theta)
+            poly[1][0] -= R * r[1] * np.cos(theta)
+            poly[1][1] -= R * r[1] * np.sin(theta)
+            ## p2, p3
+            theta = np.arctan2((poly[2][1] - poly[3][1]),
+                               (poly[2][0] - poly[3][0]))
+            poly[3][0] += R * r[3] * np.cos(theta)
+            poly[3][1] += R * r[3] * np.sin(theta)
+            poly[2][0] -= R * r[2] * np.cos(theta)
+            poly[2][1] -= R * r[2] * np.sin(theta)
+        return poly
+
+    def generate_quad(self, im_size, polys, tags):
+        """
+        Generate quadrangle.
+        """
+        h, w = im_size
+        poly_mask = np.zeros((h, w), dtype=np.uint8)
+        score_map = np.zeros((h, w), dtype=np.uint8)
+        # (x1, y1, ..., x4, y4, short_edge_norm)
+        geo_map = np.zeros((h, w, 9), dtype=np.float32)
+        # mask used during training, to ignore some hard areas
+        training_mask = np.ones((h, w), dtype=np.uint8)
+        for poly_idx, poly_tag in enumerate(zip(polys, tags)):
+            poly = poly_tag[0]
+            tag = poly_tag[1]
+
+            r = [None, None, None, None]
+            for i in range(4):
+                dist1 = np.linalg.norm(poly[i] - poly[(i + 1) % 4])
+                dist2 = np.linalg.norm(poly[i] - poly[(i - 1) % 4])
+                r[i] = min(dist1, dist2)
+            # score map
+            shrinked_poly = self.shrink_poly(
+                poly.copy(), r).astype(np.int32)[np.newaxis, :, :]
+            cv2.fillPoly(score_map, shrinked_poly, 1)
+            cv2.fillPoly(poly_mask, shrinked_poly, poly_idx + 1)
+            # if the poly is too small, then ignore it during training
+            poly_h = min(
+                np.linalg.norm(poly[0] - poly[3]),
+                np.linalg.norm(poly[1] - poly[2]))
+            poly_w = min(
+                np.linalg.norm(poly[0] - poly[1]),
+                np.linalg.norm(poly[2] - poly[3]))
+            if min(poly_h, poly_w) < self.min_text_size:
+                cv2.fillPoly(training_mask,
+                             poly.astype(np.int32)[np.newaxis, :, :], 0)
+
+            if tag:
+                cv2.fillPoly(training_mask,
+                             poly.astype(np.int32)[np.newaxis, :, :], 0)
+
+            xy_in_poly = np.argwhere(poly_mask == (poly_idx + 1))
+            # geo map.
+            y_in_poly = xy_in_poly[:, 0]
+            x_in_poly = xy_in_poly[:, 1]
+            poly[:, 0] = np.minimum(np.maximum(poly[:, 0], 0), w)
+            poly[:, 1] = np.minimum(np.maximum(poly[:, 1], 0), h)
+            for pno in range(4):
+                geo_channel_beg = pno * 2
+                geo_map[y_in_poly, x_in_poly, geo_channel_beg] =\
+                    x_in_poly - poly[pno, 0]
+                geo_map[y_in_poly, x_in_poly, geo_channel_beg+1] =\
+                    y_in_poly - poly[pno, 1]
+            geo_map[y_in_poly, x_in_poly, 8] = \
+                1.0 / max(min(poly_h, poly_w), 1.0)
+        return score_map, geo_map, training_mask
+
+    def crop_area(self,
+                  im,
+                  polys,
+                  tags,
+                  crop_background=False,
+                  max_tries=50):
+        """
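+        Make a random crop from the input image that does not cut through text.
+        :param im: input image of shape (h, w, c)
+        :param polys: text polygons, shape (n, 4, 2)
+        :param tags: per-polygon ignore flags
+        :param crop_background: if True, return a crop that contains no text
+        :param max_tries: maximum number of random crop attempts
+        :return: cropped image, polygons inside the crop, and their tags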
+        """
+        h, w, _ = im.shape
+        pad_h = h // 10
+        pad_w = w // 10
+        h_array = np.zeros((h + pad_h * 2), dtype=np.int32)
+        w_array = np.zeros((w + pad_w * 2), dtype=np.int32)
+        for poly in polys:
+            poly = np.round(poly, decimals=0).astype(np.int32)
+            minx = np.min(poly[:, 0])
+            maxx = np.max(poly[:, 0])
+            w_array[minx + pad_w:maxx + pad_w] = 1
+            miny = np.min(poly[:, 1])
+            maxy = np.max(poly[:, 1])
+            h_array[miny + pad_h:maxy + pad_h] = 1
+        # ensure the cropped area does not cut across text
+        h_axis = np.where(h_array == 0)[0]
+        w_axis = np.where(w_array == 0)[0]
+        if len(h_axis) == 0 or len(w_axis) == 0:
+            return im, polys, tags
+
+        for i in range(max_tries):
+            xx = np.random.choice(w_axis, size=2)
+            xmin = np.min(xx) - pad_w
+            xmax = np.max(xx) - pad_w
+            xmin = np.clip(xmin, 0, w - 1)
+            xmax = np.clip(xmax, 0, w - 1)
+            yy = np.random.choice(h_axis, size=2)
+            ymin = np.min(yy) - pad_h
+            ymax = np.max(yy) - pad_h
+            ymin = np.clip(ymin, 0, h - 1)
+            ymax = np.clip(ymax, 0, h - 1)
+            if xmax - xmin < self.min_crop_side_ratio * w or \
+               ymax - ymin < self.min_crop_side_ratio * h:
+                # area too small
+                continue
+            if polys.shape[0] != 0:
+                poly_axis_in_area = (polys[:, :, 0] >= xmin)\
+                    & (polys[:, :, 0] <= xmax)\
+                    & (polys[:, :, 1] >= ymin)\
+                    & (polys[:, :, 1] <= ymax)
+                selected_polys = np.where(
+                    np.sum(poly_axis_in_area, axis=1) == 4)[0]
+            else:
+                selected_polys = []
+
+            if len(selected_polys) == 0:
+                # no text in this area
+                if crop_background:
+                    im = im[ymin:ymax + 1, xmin:xmax + 1, :]
+                    polys = []
+                    tags = []
+                    return im, polys, tags
+                else:
+                    continue
+
+            im = im[ymin:ymax + 1, xmin:xmax + 1, :]
+            polys = polys[selected_polys]
+            tags = tags[selected_polys]
+            polys[:, :, 0] -= xmin
+            polys[:, :, 1] -= ymin
+            return im, polys, tags
+        return im, polys, tags
+
+    def crop_background_infor(self, im, text_polys, text_tags):
+        im, text_polys, text_tags = self.crop_area(
+            im, text_polys, text_tags, crop_background=True)
+
+        if len(text_polys) > 0:
+            return None
+        # pad and resize image
+        input_size = self.input_size
+        im, ratio = self.preprocess(im)
+        score_map = np.zeros((input_size, input_size), dtype=np.float32)
+        geo_map = np.zeros((input_size, input_size, 9), dtype=np.float32)
+        training_mask = np.ones((input_size, input_size), dtype=np.float32)
+        return im, score_map, geo_map, training_mask
+
+    def crop_foreground_infor(self, im, text_polys, text_tags):
+        im, text_polys, text_tags = self.crop_area(
+            im, text_polys, text_tags, crop_background=False)
+
+        if text_polys.shape[0] == 0:
+            return None
+        # skip the sample if every polygon is tagged as ignored
+        if np.sum((text_tags * 1.0)) >= text_tags.size:
+            return None
+        # pad and resize image
+        input_size = self.input_size
+        im, ratio = self.preprocess(im)
+        text_polys[:, :, 0] *= ratio
+        text_polys[:, :, 1] *= ratio
+        _, _, new_h, new_w = im.shape
+        #         print(im.shape)
+        #         self.draw_img_polys(im, text_polys)
+        score_map, geo_map, training_mask = self.generate_quad(
+            (new_h, new_w), text_polys, text_tags)
+        return im, score_map, geo_map, training_mask
+
+    def __call__(self, data):
+        im = data['image']
+        text_polys = data['polys']
+        text_tags = data['ignore_tags']
+        if im is None:
+            return None
+        if text_polys.shape[0] == 0:
+            return None
+
+        # randomly rotate the image and polygons
+        if np.random.rand() < 0.5:
+            im, text_polys = self.rotate_im_poly(im, text_polys)
+        h, w, _ = im.shape
+        text_polys, text_tags = self.check_and_validate_polys(text_polys,
+                                                              text_tags, h, w)
+        if text_polys.shape[0] == 0:
+            return None
+
+        # random scale this image
+        rd_scale = np.random.choice(self.random_scale)
+        im = cv2.resize(im, dsize=None, fx=rd_scale, fy=rd_scale)
+        text_polys *= rd_scale
+        if np.random.rand() < self.background_ratio:
+            outs = self.crop_background_infor(im, text_polys, text_tags)
+        else:
+            outs = self.crop_foreground_infor(im, text_polys, text_tags)
+
+        if outs is None:
+            return None
+        im, score_map, geo_map, training_mask = outs
+        score_map = score_map[np.newaxis, ::4, ::4].astype(np.float32)
+        geo_map = np.swapaxes(geo_map, 1, 2)
+        geo_map = np.swapaxes(geo_map, 1, 0)
+        geo_map = geo_map[:, ::4, ::4].astype(np.float32)
+        training_mask = training_mask[np.newaxis, ::4, ::4]
+        training_mask = training_mask.astype(np.float32)
+
+        data['image'] = im[0]
+        data['score_map'] = score_map
+        data['geo_map'] = geo_map
+        data['training_mask'] = training_mask
+        # print(im.shape, score_map.shape, geo_map.shape, training_mask.shape)
+        return data
\ No newline at end of file
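Note on the label layout produced by `__call__` above: the score map and training mask get a leading channel axis, the geo map is moved from HWC to CHW, and all three are sampled at every 4th pixel. The following is a minimal NumPy sketch of just that step, not the author's code; the 512x512 size and the random geo map are assumptions made for illustration.

```python
import numpy as np

# Hypothetical input size, chosen only for illustration.
h, w = 512, 512
score_map = np.zeros((h, w), dtype=np.uint8)
geo_map = np.random.rand(h, w, 9).astype(np.float32)
training_mask = np.ones((h, w), dtype=np.uint8)

# Same steps as in __call__: add a channel axis and keep every 4th pixel.
score_out = score_map[np.newaxis, ::4, ::4].astype(np.float32)
mask_out = training_mask[np.newaxis, ::4, ::4].astype(np.float32)

# Two swapaxes calls move the channel axis to the front: HWC -> CHW.
geo_chw = np.swapaxes(np.swapaxes(geo_map, 1, 2), 1, 0)
geo_out = geo_chw[:, ::4, ::4]

# A single transpose produces the same array.
geo_alt = np.transpose(geo_map, (2, 0, 1))[:, ::4, ::4]

print(score_out.shape, geo_out.shape, mask_out.shape)
```

The single `np.transpose` call is equivalent to the two `np.swapaxes` calls and may be easier to read, but the diff's version is not wrong.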
--- a/ppocr/data/imaug/label_ops.py
+++ b/ppocr/data/imaug/label_ops.py
@@ -52,6 +52,7 @@ class DetLabelEncode(object):
                txt_tags.append(True)
            else:
                txt_tags.append(False)
+        boxes = self.expand_points_num(boxes)
        boxes = np.array(boxes, dtype=np.float32)
        txt_tags = np.array(txt_tags, dtype=np.bool)

@@ -70,6 +71,17 @@ class DetLabelEncode(object):
        rect[3] = pts[np.argmax(diff)]
        return rect

+    def expand_points_num(self, boxes):
+        max_points_num = 0
+        for box in boxes:
+            if len(box) > max_points_num:
+                max_points_num = len(box)
+        ex_boxes = []
+        for box in boxes:
+            ex_box = box + [box[-1]] * (max_points_num - len(box))
+            ex_boxes.append(ex_box)
+        return ex_boxes
+

 class BaseRecLabelEncode(object):
    """ Convert between text-label and text-index """
@@ -79,15 +91,17 @@ class BaseRecLabelEncode(object):
                 character_dict_path=None,
                 character_type='ch',
                 use_space_char=False):
-        support_character_type = ['ch', 'en', 'en_sensitive']
+        support_character_type = [
+            'ch', 'en', 'en_sensitive', 'french', 'german', 'japan', 'korean'
+        ]
        assert character_type in support_character_type, "Only {} are supported now but get {}".format(
-            support_character_type, self.character_str)
+            support_character_type, character_type)

        self.max_text_len = max_text_length
        if character_type == "en":
            self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
            dict_character = list(self.character_str)
-        elif character_type == "ch":
+        elif character_type in ["ch", "french", "german", "japan", "korean"]:
            self.character_str = ""
            assert character_dict_path is not None, "character_dict_path should not be None when character_type is ch"
            with open(character_dict_path, "rb") as fin:
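Note on `expand_points_num` above: polygons with different point counts cannot be stacked into one regular float array, so shorter boxes are padded by repeating their last point. The following standalone sketch mirrors that logic outside the class; the sample boxes are made up for illustration.

```python
import numpy as np


def expand_points_num(boxes):
    """Pad every box to the largest point count by repeating its last point."""
    max_points_num = max((len(box) for box in boxes), default=0)
    return [box + [box[-1]] * (max_points_num - len(box)) for box in boxes]


# Hypothetical annotations: one quadrilateral and one 6-point polygon.
boxes = [
    [[0, 0], [1, 0], [1, 1], [0, 1]],
    [[0, 0], [2, 0], [2, 1], [1, 2], [0, 1], [0, 0.5]],
]
padded = np.array(expand_points_num(boxes), dtype=np.float32)
print(padded.shape)
```

Repeating the last vertex keeps the polygon's outline unchanged, since the extra points are degenerate duplicates.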

--- a/ppocr/data/imaug/operators.py
+++ b/ppocr/data/imaug/operators.py
@@ -42,6 +42,8 @@ class DecodeImage(object):
                img) > 0, "invalid input 'img' in DecodeImage"
        img = np.frombuffer(img, dtype='uint8')
        img = cv2.imdecode(img, 1)
+        if img is None:
+            return None
        if self.img_mode == 'GRAY':
            img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
        elif self.img_mode == 'RGB':
@@ -120,26 +122,37 @@ class DetResizeForTest(object):
        if 'limit_side_len' in kwargs:
            self.limit_side_len = kwargs['limit_side_len']
            self.limit_type = kwargs.get('limit_type', 'min')
+        elif 'resize_long' in kwargs:
+            self.resize_type = 2
+            self.resize_long = kwargs.get('resize_long', 960)
        else:
            self.limit_side_len = 736
            self.limit_type = 'min'

    def __call__(self, data):
        img = data['image']
+        src_h, src_w, _ = img.shape

        if self.resize_type == 0:
-            img, shape = self.resize_image_type0(img)
+            # img, shape = self.resize_image_type0(img)
+            img, [ratio_h, ratio_w] = self.resize_image_type0(img)
+        elif self.resize_type == 2:
+            img, [ratio_h, ratio_w] = self.resize_image_type2(img)
        else:
-            img, shape = self.resize_image_type1(img)
+            # img, shape = self.resize_image_type1(img)
+            img, [ratio_h, ratio_w] = self.resize_image_type1(img)
        data['image'] = img
-        data['shape'] = shape
+        data['shape'] = np.array([src_h, src_w, ratio_h, ratio_w])
        return data

    def resize_image_type1(self, img):
        resize_h, resize_w = self.image_shape
        ori_h, ori_w = img.shape[:2]  # (h, w, c)
+        ratio_h = float(resize_h) / ori_h
+        ratio_w = float(resize_w) / ori_w
        img = cv2.resize(img, (int(resize_w), int(resize_h)))
-        return img, np.array([ori_h, ori_w])
+        # return img, np.array([ori_h, ori_w])
+        return img, [ratio_h, ratio_w]

    def resize_image_type0(self, img):
        """
@@ -182,4 +195,31 @@ class DetResizeForTest(object):
        except:
            print(img.shape, resize_w, resize_h)
            sys.exit(0)
-        return img, np.array([h, w])
+        ratio_h = resize_h / float(h)
+        ratio_w = resize_w / float(w)
+        # return img, np.array([h, w])
+        return img, [ratio_h, ratio_w]
+
+    def resize_image_type2(self, img):
+        h, w, _ = img.shape
+
+        resize_w = w
+        resize_h = h
+
+        # Fix the longer side
+        if resize_h > resize_w:
+            ratio = float(self.resize_long) / resize_h
+        else:
+            ratio = float(self.resize_long) / resize_w
+
+        resize_h = int(resize_h * ratio)
+        resize_w = int(resize_w * ratio)
+
+        max_stride = 128
+        resize_h = (resize_h + max_stride - 1) // max_stride * max_stride
+        resize_w = (resize_w + max_stride - 1) // max_stride * max_stride
+        img = cv2.resize(img, (int(resize_w), int(resize_h)))
+        ratio_h = resize_h / float(h)
+        ratio_w = resize_w / float(w)
+
+        return img, [ratio_h, ratio_w]
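Note on `resize_image_type2` above: the size arithmetic can be checked without OpenCV. The sketch below reproduces only the size computation; the function name `type2_target_size` and the 720x1280 input are assumptions for illustration and are not part of the diff.

```python
def type2_target_size(h, w, resize_long=960, max_stride=128):
    """Return (resize_h, resize_w, ratio_h, ratio_w) as computed in type 2."""
    # Scale so that the longer side equals resize_long.
    ratio = float(resize_long) / max(h, w)
    resize_h = int(h * ratio)
    resize_w = int(w * ratio)
    # Round each side up to the next multiple of max_stride.
    resize_h = (resize_h + max_stride - 1) // max_stride * max_stride
    resize_w = (resize_w + max_stride - 1) // max_stride * max_stride
    return resize_h, resize_w, resize_h / float(h), resize_w / float(w)


# Hypothetical 720x1280 image.
print(type2_target_size(720, 1280))
```

Because both sides are rounded up to a multiple of 128, the longer side can exceed `resize_long` (here 1024 instead of 960), and `ratio_h` and `ratio_w` generally differ, which is why both ratios are stored in `data['shape']`.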
--- a/ppocr/data/imaug/sast_process.py
+++ b/ppocr/data/imaug/sast_process.py
--- a/ppocr/data/simple_dataset.py
+++ b/ppocr/data/simple_dataset.py
@@ -27,14 +27,13 @@ class SimpleDataSet(Dataset):
        global_config = config['Global']
        dataset_config = config[mode]['dataset']
        loader_config = config[mode]['loader']
-        batch_size = loader_config['batch_size_per_card']

        self.delimiter = dataset_config.get('delimiter', '\t')
        label_file_list = dataset_config.pop('label_file_list')
        data_source_num = len(label_file_list)
        ratio_list = dataset_config.get("ratio_list", [1.0])
        if isinstance(ratio_list, (float, int)):
-            ratio_list = [float(ratio_list)] * len(data_source_num)
+            ratio_list = [float(ratio_list)] * int(data_source_num)

        assert len(
            ratio_list
@@ -76,6 +75,8 @@ class SimpleDataSet(Dataset):
            label = substr[1]
            img_path = os.path.join(self.data_dir, file_name)
            data = {'img_path': img_path, 'label': label}
+            if not os.path.exists(img_path):
+                raise Exception("{} does not exist!".format(img_path))
            with open(data['img_path'], 'rb') as f:
                img = f.read()
                data['image'] = img
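Note on the `ratio_list` fix above: `data_source_num` is already an integer, so calling `len()` on it raised a `TypeError` whenever a scalar ratio was configured. The sketch below isolates the normalization; the helper name `normalize_ratio_list` is hypothetical, and the length check is an assumption about what the truncated `assert` in the hunk verifies.

```python
def normalize_ratio_list(ratio_list, data_source_num):
    """Broadcast a scalar sampling ratio to one entry per label file."""
    if isinstance(ratio_list, (float, int)):
        ratio_list = [float(ratio_list)] * int(data_source_num)
    # Assumed check: one ratio per data source.
    assert len(ratio_list) == data_source_num, \
        "The length of ratio_list should match the number of label files."
    return ratio_list


print(normalize_ratio_list(0.5, 3))
print(normalize_ratio_list([1.0, 0.2], 2))
```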

--- a/ppocr/losses/__init__.py
+++ b/ppocr/losses/__init__.py
@@ -18,6 +18,8 @@ import copy
 def build_loss(config):
    # det loss
    from .det_db_loss import DBLoss
+    from .det_east_loss import EASTLoss
+    from .det_sast_loss import SASTLoss

    # rec loss
    from .rec_ctc_loss import CTCLoss
@@ -25,7 +27,7 @@ def build_loss(config):
    # cls loss
    from .cls_loss import ClsLoss

-    support_dict = ['DBLoss', 'CTCLoss', 'ClsLoss']
+    support_dict = ['DBLoss', 'EASTLoss', 'SASTLoss', 'CTCLoss', 'ClsLoss']

    config = copy.deepcopy(config)
    module_name = config.pop('name')