Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into trt_cpp

Commit 6ec9aa4d authored Feb 18, 2021 by LDOUBLEV
18 changed files
--- a/doc/doc_en/inference_en.md
+++ b/doc/doc_en/inference_en.md
@@ -148,7 +148,7 @@ The visual text detection results are saved to the ./inference_results folder by
 ![](../imgs_results/det_res_00018069.jpg)

 You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image,
-The optional parameters of `litmit_type` are [`max`, `min`], and
+The optional parameters of `limit_type` are [`max`, `min`], and
 `det_limit_side_len` is a positive integer, generally set to a multiple of 32, such as 960.

 The default setting of the parameters is `limit_type='max', det_limit_side_len=960`, which means that the longest side of the network input image cannot exceed 960,

--- a/doc/doc_en/models_list_en.md
+++ b/doc/doc_en/models_list_en.md
 ## OCR model list（V2.0, updated on 2021.1.20）
-**Note** : Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance.
+> **Note**
+> 1. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance.
+> 2. All models in this tutorial are ppocr-series models. For an introduction to more algorithms and models based on public datasets, refer to the [algorithm overview tutorial](./algorithm_overview_en.md).

 - [1. Text Detection Model](#Detection)
 - [2. Text Recognition Model](#Recognition)
@@ -12,9 +14,13 @@ The downloadable models provided by PaddleOCR include `inference model`, `traine

 |model type|model format|description|
 |--- | --- | --- |
-|inference model|inference.pdmodel、inference.pdiparams|Used for reasoning based on Python prediction engine，[detail](./inference_en.md)|
+|inference model|inference.pdmodel、inference.pdiparams|Used for inference based on Paddle inference engine，[detail](./inference_en.md)|
 |trained model, pre-trained model|\*.pdparams、\*.pdopt、\*.states |The checkpoints model saved in the training process, which stores the parameters of the model, mostly used for model evaluation and continuous training.|
-|slim model|\*.nb|Generally used for Lite deployment|
+|slim model|\*.nb| Model compressed by PaddleSlim (a model compression tool using PaddlePaddle), which is suitable for mobile-side deployment scenarios (Paddle-Lite is needed for slim model deployment). |
+
+Relationship of the above models is as follows.
+
+![](../imgs_en/model_prod_flow_en.png)

 <a name="Detection"></a>
 ### 1. Text Detection Model

--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
 ## TEXT RECOGNITION

- [DATA PREPARATION](#DATA_PREPARATION)
-    - [Dataset Download](#Dataset_download)
-    - [Costom Dataset](#Costom_Dataset)  
-    - [Dictionary](#Dictionary)  
-    - [Add Space Category](#Add_space_category)
+- [1 DATA PREPARATION](#DATA_PREPARATION)
+    - [1.1 Custom Dataset](#Costom_Dataset)
+    - [1.2 Dataset Download](#Dataset_download)
+    - [1.3 Dictionary](#Dictionary)  
+    - [1.4 Add Space Category](#Add_space_category)

- [TRAINING](#TRAINING)
-    - [Data Augmentation](#Data_Augmentation)
-    - [Training](#Training)
-    - [Multi-language](#Multi_language)
+- [2 TRAINING](#TRAINING)
+    - [2.1 Data Augmentation](#Data_Augmentation)
+    - [2.2 Training](#Training)
+    - [2.3 Multi-language](#Multi_language)

- [EVALUATION](#EVALUATION)
+- [3 EVALUATION](#EVALUATION)

- [PREDICTION](#PREDICTION)
-    - [Training engine prediction](#Training_engine_prediction)
+- [4 PREDICTION](#PREDICTION)
+    - [4.1 Training engine prediction](#Training_engine_prediction)

 <a name="DATA_PREPARATION"></a>
 ### DATA PREPARATION


-PaddleOCR supports two data formats: `LMDB` is used to train public data and evaluation algorithms; `general data` is used to train your own data:
+PaddleOCR supports two data formats:
+- `LMDB` is used to train data sets stored in lmdb format;
+- `general data` is used to train data sets stored in text files:

 Please organize the dataset as follows:

 The default storage path for training data is `PaddleOCR/train_data`, if you already have a dataset on your disk, just create a soft link to the dataset directory:

 ```
+# linux and mac os
 ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
+# windows
+mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
 ```

-<a name="Dataset_download"></a>
-* Dataset download
-
-If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here)，download the lmdb format dataset required for benchmark
-
-If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
-
 <a name="Costom_Dataset"></a>
-* Use your own dataset:
+#### 1.1 Custom dataset

 If you want to use your own data for training, please refer to the following to organize your data.

 - Training set

-First put the training images in the same folder (train_images), and use a txt file (rec_gt_train.txt) to store the image path and label.
+It is recommended to put the training images in the same folder, and use a txt file (rec_gt_train.txt) to store the image path and label. The contents of the txt file are as follows:

 * Note: by default, the image path and image label are split with \t, if you use other methods to split, it will cause training error

 ```
 " Image file name           Image annotation "

-train_data/train_0001.jpg   简单可依赖
-train_data/train_0002.jpg   用科技让复杂的世界更简单
-```
-PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
-
-```
-# Training set label
-wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
-# Test Set Label
-wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
+train_data/rec/train/word_001.jpg   简单可依赖
+train_data/rec/train/word_002.jpg   用科技让复杂的世界更简单
+...
 ```

 The final training set should have the following file structure:

 ```
 |-train_data
-    |-ic15_data
+  |-rec
    |- rec_gt_train.txt
    |- train
        |- word_001.png
@@ -82,6 +73,7 @@ Similar to the training set, the test set also needs to be provided a folder con

 ```
 |-train_data
+  |-rec
    |-ic15_data
        |- rec_gt_test.txt
        |- test
@@ -90,8 +82,25 @@ Similar to the training set, the test set also needs to be provided a folder con
            |- word_003.jpg
            | ...
 ```
+
+<a name="Dataset_download"></a>
+#### 1.2 Dataset download
+
+If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) ，download the lmdb format dataset required for benchmark
+
+If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
+
+PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
+
+```
+# Training set label
+wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
+# Test Set Label
+wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
+```
+
 <a name="Dictionary"></a>
- Dictionary
+#### 1.3 Dictionary

 Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.

@@ -108,6 +117,8 @@ n

 In `word_dict.txt`, there is a single word in each line, which maps characters and numeric indexes together, e.g "and" will be mapped to [2 5 1]

+PaddleOCR has built-in dictionaries, which can be used on demand.
+
 `ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.

 `ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters
@@ -123,8 +134,6 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a
 `ppocr/utils/dict/en_dict.txt` is a English dictionary with 63 characters


-You can use it on demand.
-
 The current multi-language model is still in the demo stage and will continue to optimize the model and add languages. **You are very welcome to provide us with dictionaries and fonts in other languages**,
 If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) and we will thank you in the Repo.

@@ -136,14 +145,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
 If you need to customize dic file, please add character_dict_path field in configs/rec/rec_icdar15_train.yml to point to your dictionary path. And set character_type to ch.

 <a name="Add_space_category"></a>
- Add space category
+#### 1.4 Add space category

 If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.

 **Note: use_space_char only takes effect when character_type=ch**

 <a name="TRAINING"></a>
-### TRAINING
+### 2 TRAINING

 PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:

@@ -166,7 +175,7 @@ Start training:
 python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_icdar15_train.yml
 ```
 <a name="Data_Augmentation"></a>
- Data Augmentation
+#### 2.1 Data Augmentation

 PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.

@@ -175,7 +184,7 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand
 Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)

 <a name="Training"></a>
- Training
+#### 2.2 Training

 PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process.

@@ -268,7 +277,7 @@ Eval:
 **Note that the configuration file for prediction/evaluation must be consistent with the training.**

 <a name="Multi_language"></a>
- Multi-language
+#### 2.3 Multi-language

 PaddleOCR currently supports 26 (except Chinese) language recognition. A multi-language configuration file template is
 provided under the path `configs/rec/multi_languages`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)。
@@ -420,7 +429,7 @@ Eval:
 ```

 <a name="EVALUATION"></a>
-### EVALUATION
+### 3 EVALUATION

 The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.

@@ -430,10 +439,10 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
 ```

 <a name="PREDICTION"></a>
-### PREDICTION
+### 4 PREDICTION

 <a name="Training_engine_prediction"></a>
-* Training engine prediction
+#### 4.1 Training engine prediction

 Using the model trained by paddleocr, you can quickly get prediction through the following script.


--- a/doc/doc_en/whl_en.md
+++ b/doc/doc_en/whl_en.md
 # paddleocr package

-## Get started quickly
-### install package
+## 1 Get started quickly
+### 1.1 install package
 install by pypi
 ```bash
 pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
@@ -12,9 +12,11 @@ build own whl package and install
 python3 setup.py bdist_wheel
 pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
 ```
-### 1. Use by code
+## 2 Use
+### 2.1 Use by code
+The paddleocr whl package will automatically download the ppocr lightweight model as the default model, which can be customized and replaced according to the section 3 **Custom Model**.

-* detection classification and recognition
+* detection, angle classification and recognition
 ```python
 from paddleocr import PaddleOCR,draw_ocr
 # Paddleocr supports Chinese, English, French, German, Korean and Japanese.
@@ -163,7 +165,7 @@ Output will be a list, each item contains classification result and confidence
 ['0', 0.99999964]
 ```

-### Use by command line
+### 2.2 Use by command line

 show help information
 ```bash
@@ -239,11 +241,11 @@ Output will be a list, each item contains classification result and confidence
 ['0', 0.99999964]
 ```

-## Use custom model
+## 3 Use custom model
 When the built-in model cannot meet the needs, you need to use your own trained model.
 First, refer to the first section of [inference_en.md](./inference_en.md) to convert your det and rec model to inference model, and then use it as follows

-### 1. Use by code
+### 3.1 Use by code

 ```python
 from paddleocr import PaddleOCR,draw_ocr
@@ -265,17 +267,17 @@ im_show = Image.fromarray(im_show)
 im_show.save('result.jpg')
 ```

-### Use by command line
+### 3.2 Use by command line

 ```bash
 paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true
 ```

-### Use web images or numpy array as input
+## 4 Use web images or numpy array as input

-1. Web image
+### 4.1 Web image

-Use by code
+- Use by code
 ```python
 from paddleocr import PaddleOCR, draw_ocr
 ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
@@ -294,12 +296,12 @@ im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc
 im_show = Image.fromarray(im_show)
 im_show.save('result.jpg')
 ```
-Use by command line
+- Use by command line
 ```bash
 paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true
 ```

-2. Numpy array
+### 4.2 Numpy array
 Support numpy array as input only when used by code

 ```python
@@ -324,7 +326,7 @@ im_show.save('result.jpg')
 ```


-## Parameter Description
+## 5 Parameter Description

 | Parameter                    | Description                                                                                                                                                                                                                 | Default value                  |
 |-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|

--- a/doc/imgs/model_prod_flow_ch.png
+++ b/doc/imgs/model_prod_flow_ch.png
--- a/doc/imgs_en/model_prod_flow_en.png
+++ b/doc/imgs_en/model_prod_flow_en.png
--- a/doc/imgs_results/angle_class_example.jpg
+++ b/doc/imgs_results/angle_class_example.jpg
--- a/doc/joinus.PNG
+++ b/doc/joinus.PNG
--- a/ppocr/data/imaug/label_ops.py
+++ b/ppocr/data/imaug/label_ops.py
@@ -215,7 +215,7 @@ class AttnLabelEncode(BaseRecLabelEncode):
            return None
        data['length'] = np.array(len(text))
        text = [0] + text + [len(self.character) - 1] + [0] * (self.max_text_len
-                                                               - len(text) - 1)
+                                                               - len(text) - 2)
        data['label'] = np.array(text)
        return data

@@ -255,13 +255,13 @@ class SRNLabelEncode(BaseRecLabelEncode):
    def __call__(self, data):
        text = data['label']
        text = self.encode(text)
-        char_num = len(self.character_str)
+        char_num = len(self.character)
        if text is None:
            return None
        if len(text) > self.max_text_len:
            return None
        data['length'] = np.array(len(text))
-        text = text + [char_num] * (self.max_text_len - len(text))
+        text = text + [char_num - 1] * (self.max_text_len - len(text))
        data['label'] = np.array(text)
        return data
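
The two padding changes in `label_ops.py` above can be checked with a small standalone sketch. The helper names `attn_pad` and `srn_pad` are hypothetical and not part of PaddleOCR; reading index 0 as a start token and the last index as an end/padding token is an assumption drawn from the expressions in the diff. The sketch also assumes the encoded text is at most `max_text_len - 2` items long for the attention case.

```python
import numpy as np


def attn_pad(text, num_classes, max_text_len):
    """Hypothetical helper mirroring the AttnLabelEncode expression:
    [0] + text + [last index] + zero padding, total length max_text_len."""
    pad = [0] * (max_text_len - len(text) - 2)  # -2 accounts for both extra tokens
    return np.array([0] + text + [num_classes - 1] + pad)


def srn_pad(text, num_classes, max_text_len):
    """Hypothetical helper mirroring the SRNLabelEncode expression:
    pad with the last valid index (num_classes - 1), not num_classes."""
    return np.array(text + [num_classes - 1] * (max_text_len - len(text)))


encoded = [5, 3, 8]
attn = attn_pad(encoded, num_classes=40, max_text_len=10)
srn = srn_pad(encoded, num_classes=40, max_text_len=10)
print(len(attn), len(srn))
```

With `- 1` instead of `- 2`, the attention label would come out one element longer than `max_text_len`; with `char_num` instead of `char_num - 1`, the SRN padding value would fall outside the range of valid indices.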


--- a/ppocr/data/imaug/make_shrink_map.py
+++ b/ppocr/data/imaug/make_shrink_map.py
@@ -84,11 +84,12 @@ class MakeShrinkMap(object):
        return polygons, ignore_tags

    def polygon_area(self, polygon):
-        # return cv2.contourArea(polygon.astype(np.float32))
-        edge = 0
-        for i in range(polygon.shape[0]):
-            next_index = (i + 1) % polygon.shape[0]
-            edge += (polygon[next_index, 0] - polygon[i, 0]) * (
-                polygon[next_index, 1] - polygon[i, 1])
-
-        return edge / 2.
+        """
+        compute polygon area
+        """
+        area = 0
+        q = polygon[-1]
+        for p in polygon:
+            area += p[0] * q[1] - p[1] * q[0]
+            q = p
+        return area / 2.0
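
For comparison, here is a standalone sketch of the removed and added `polygon_area` bodies from the hunk above, written as free functions (the names `old_polygon_area` and `new_polygon_area` are only for this illustration). The added version is the shoelace formula and returns a signed area whose sign depends on vertex order; on an axis-aligned square the removed version evaluates to zero, since every edge has either a zero x-difference or a zero y-difference.

```python
import numpy as np


def old_polygon_area(polygon):
    """Removed implementation: sums products of coordinate differences."""
    edge = 0.0
    n = polygon.shape[0]
    for i in range(n):
        j = (i + 1) % n
        edge += (polygon[j, 0] - polygon[i, 0]) * (polygon[j, 1] - polygon[i, 1])
    return edge / 2.0


def new_polygon_area(polygon):
    """Added implementation: shoelace formula, returns a signed area."""
    area = 0.0
    q = polygon[-1]
    for p in polygon:
        area += p[0] * q[1] - p[1] * q[0]
        q = p
    return area / 2.0


square = np.array([[0, 0], [2, 0], [2, 2], [0, 2]], dtype=np.float32)
print(old_polygon_area(square), new_polygon_area(square), new_polygon_area(square[::-1]))
```

Callers that need an unsigned area can take `abs()` of the result.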
--- a/ppocr/data/imaug/operators.py
+++ b/ppocr/data/imaug/operators.py
@@ -185,8 +185,8 @@ class DetResizeForTest(object):
        resize_h = int(h * ratio)
        resize_w = int(w * ratio)

-        resize_h = int(round(resize_h / 32) * 32)
-        resize_w = int(round(resize_w / 32) * 32)
+        resize_h = max(int(round(resize_h / 32) * 32), 32)
+        resize_w = max(int(round(resize_w / 32) * 32), 32)

        try:
            if int(resize_w) <= 0 or int(resize_h) <= 0:
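
A minimal sketch of why the `max(..., 32)` floor matters: rounding a very small side to the nearest multiple of 32 can yield 0. The function `target_size` is hypothetical, and the `limit_type` handling is an assumption based on the description in `inference_en.md`, not a copy of `DetResizeForTest`; only the last two assignments come from the diff.

```python
def target_size(h, w, limit_side_len=960, limit_type="max"):
    """Hypothetical helper: compute the resized (h, w) for a detection input."""
    if limit_type == "max":
        # Assumed behaviour: shrink only if the longest side exceeds the limit.
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    else:
        # Assumed behaviour: enlarge only if the shortest side is below the limit.
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0

    resize_h = int(h * ratio)
    resize_w = int(w * ratio)

    # From the diff: snap to a multiple of 32, but never below 32.
    resize_h = max(int(round(resize_h / 32) * 32), 32)
    resize_w = max(int(round(resize_w / 32) * 32), 32)
    return resize_h, resize_w


print(target_size(10, 600))     # a thin text strip
print(target_size(1920, 1080))  # a large photo
```

Without the floor, the 10-pixel-high strip would be rounded to a height of 0 and rejected by the `<= 0` check that follows in the original code.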

--- a/ppocr/metrics/rec_metric.py
+++ b/ppocr/metrics/rec_metric.py
@@ -29,7 +29,7 @@ class RecMetric(object):
            pred = pred.replace(" ", "")
            target = target.replace(" ", "")
            norm_edit_dis += Levenshtein.distance(pred, target) / max(
-                len(pred), len(target))
+                len(pred), len(target), 1)
            if pred == target:
                correct_num += 1
            all_num += 1
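
The extra `1` in `max(len(pred), len(target), 1)` guards against division by zero when both strings are empty after spaces are stripped. A self-contained sketch follows; it uses a small pure-Python edit distance in place of the `Levenshtein` package so that it runs without extra dependencies, and the function names are chosen only for this illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (stand-in for Levenshtein.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def norm_edit_distance(pred, target):
    """Normalized edit distance with the zero-length guard from the diff."""
    pred = pred.replace(" ", "")
    target = target.replace(" ", "")
    return levenshtein(pred, target) / max(len(pred), len(target), 1)


print(norm_edit_distance("", "  "))            # both empty after stripping spaces
print(norm_edit_distance("kitten", "sitting"))
```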

--- a/ppocr/modeling/heads/rec_att_head.py
+++ b/ppocr/modeling/heads/rec_att_head.py
@@ -146,6 +146,9 @@ class AttentionLSTM(nn.Layer):
        else:
            targets = paddle.zeros(shape=[batch_size], dtype="int32")
            probs = None
+            char_onehots = None
+            outputs = None
+            alpha = None

            for i in range(num_steps):
                char_onehots = self._char_to_onehot(

--- a/requirements.txt
+++ b/requirements.txt
 shapely
-imgaug
+scikit-image==0.17.2
+imgaug==0.4.0
 pyclipper
 lmdb
 opencv-python==4.2.0.32

--- a/tools/eval.py
+++ b/tools/eval.py
@@ -47,6 +47,7 @@ def main():
        config['Architecture']["Head"]['out_channels'] = len(
            getattr(post_process_class, 'character'))
    model = build_model(config['Architecture'])
+    use_srn = config['Architecture']['algorithm'] == "SRN"

    best_model_dict = init_model(config, model, logger)
    if len(best_model_dict):
@@ -59,7 +60,7 @@ def main():

    # start eval
    metirc = program.eval(model, valid_dataloader, post_process_class,
-                          eval_class)
+                          eval_class, use_srn)
    logger.info('metric eval ***************')
    for k, v in metirc.items():
        logger.info('{}:{}'.format(k, v))

--- a/tools/infer/predict_rec.py
+++ b/tools/infer/predict_rec.py
@@ -54,6 +54,13 @@ class TextRecognizer(object):
                "character_dict_path": args.rec_char_dict_path,
                "use_space_char": args.use_space_char
            }
+        elif self.rec_algorithm == "RARE":
+            postprocess_params = {
+                'name': 'AttnLabelDecode',
+                "character_type": args.rec_char_type,
+                "character_dict_path": args.rec_char_dict_path,
+                "use_space_char": args.use_space_char
+            }
        self.postprocess_op = build_post_process(postprocess_params)
        self.predictor, self.input_tensor, self.output_tensors = \
            utility.create_predictor(args, 'rec', logger)

--- a/tools/program.py
+++ b/tools/program.py
@@ -163,6 +163,11 @@ def train(config,
    if type(eval_batch_step) == list and len(eval_batch_step) >= 2:
        start_eval_step = eval_batch_step[0]
        eval_batch_step = eval_batch_step[1]
+        if len(valid_dataloader) == 0:
+            logger.info(
+                'No Images in eval dataset, evaluation during training will be disabled'
+            )
+            start_eval_step = 1e111
        logger.info(
            "During the training process, after the {}th iteration, an evaluation is run every {} iterations".
            format(start_eval_step, eval_batch_step))
@@ -177,6 +182,8 @@ def train(config,
    model_average = False
    model.train()

+    use_srn = config['Architecture']['algorithm'] == "SRN"
+
    if 'start_epoch' in best_model_dict:
        start_epoch = best_model_dict['start_epoch']
    else:
@@ -195,7 +202,7 @@ def train(config,
                break
            lr = optimizer.get_lr()
            images = batch[0]
-            if config['Architecture']['algorithm'] == "SRN":
+            if use_srn:
                others = batch[-4:]
                preds = model(images, others)
                model_average = True
@@ -251,8 +258,12 @@ def train(config,
                        min_average_window=10000,
                        max_average_window=15625)
                    Model_Average.apply()
-                cur_metric = eval(model, valid_dataloader, post_process_class,
-                                  eval_class)
+                cur_metric = eval(
+                    model,
+                    valid_dataloader,
+                    post_process_class,
+                    eval_class,
+                    use_srn=use_srn)
                cur_metric_str = 'cur metric, {}'.format(', '.join(
                    ['{}: {}'.format(k, v) for k, v in cur_metric.items()]))
                logger.info(cur_metric_str)
@@ -316,7 +327,8 @@ def train(config,
    return


-def eval(model, valid_dataloader, post_process_class, eval_class):
+def eval(model, valid_dataloader, post_process_class, eval_class,
+         use_srn=False):
    model.eval()
    with paddle.no_grad():
        total_frame = 0.0
@@ -327,7 +339,8 @@ def eval(model, valid_dataloader, post_process_class, eval_class):
                break
            images = batch[0]
            start = time.time()
-            if "SRN" in str(model.head):
+
+            if use_srn:
                others = batch[-4:]
                preds = model(images, others)
            else:

--- a/tools/train.py
+++ b/tools/train.py
@@ -50,6 +50,12 @@ def main(config, device, logger, vdl_writer):

    # build dataloader
    train_dataloader = build_dataloader(config, 'Train', device, logger)
+    if len(train_dataloader) == 0:
+        logger.error(
+            'No Images in train dataset, please check annotation file and path in the configuration file'
+        )
+        return
+
    if config['Eval']:
        valid_dataloader = build_dataloader(config, 'Eval', device, logger)
    else: