Commit 6cb47e76 authored by LDOUBLEV

delete debug

parents d666de85 c9d7ec85
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"# 1. Course Prerequisites\n",
"\n",
"The OCR model involved in this course is based on deep learning, so its related basic knowledge, environment configuration, project engineering and other materials will be introduced in this section, especially for readers who are not familiar with deep learning. content.\n",
"\n",
"### 1.1 Preliminary Knowledge\n",
"\n",
"The \"learning\" of deep learning has been developed from the content of neurons, perceptrons, and multilayer neural networks in machine learning. Therefore, understanding the basic machine learning algorithms is of great help to the understanding and application of deep learning. The \"deepness\" of deep learning is embodied in a series of vector-based mathematical operations such as convolution and pooling used in the process of processing a large amount of information. If you lack the theoretical foundation of the two, you can learn from teacher Li Hongyi's [Linear Algebra](https://aistudio.baidu.com/aistudio/course/introduce/2063) and [Machine Learning](https://aistudio.baidu.com/aistudio/course/introduce/1978) courses.\n",
"\n",
"For the understanding of deep learning itself, you can refer to the zero-based course of Bai Ran, an outstanding architect of Baidu: [Baidu architects take you hands-on with zero-based practice deep learning](https://aistudio.baidu.com/aistudio/course/introduce/1297), which covers the development history of deep learning and introduces the complete components of deep learning through a classic case. It is a set of practice-oriented deep learning courses.\n",
"\n",
"For the practice of theoretical knowledge, [Python basic knowledge](https://aistudio.baidu.com/aistudio/course/introduce/1224) is essential. At the same time, in order to quickly reproduce the deep learning model, the deep learning framework used in this course For: Flying PaddlePaddle. If you have used other frameworks, you can quickly learn how to use flying paddles through [Quick Start Document](https://www.paddlepaddle.org.cn/documentation/docs/zh/practices/quick_start/hello_paddle.html).\n",
"\n",
"### 1.2 Basic Environment Preparation\n",
"\n",
"If you want to run the code of this course in a local environment and have not built a Python environment before, you can follow the [zero-base operating environment preparation](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/environment.md), install Anaconda or docker environment according to your operating system.\n",
"\n",
"If you don't have local resources, you can run the code through the AI Studio training platform. Each item in it is presented in a notebook, which is convenient for developers to learn. If you are not familiar with the related operations of Notebook, you can refer to [AI Studio Project Description](https://ai.baidu.com/ai-doc/AISTUDIO/0k3e2tfzm).\n",
"\n",
"### 1.3 Get and Run the Code\n",
"\n",
"This course relies on the formation of PaddleOCR's code repository. First, clone the complete project of PaddleOCR:\n",
"\n",
"```bash\n",
"# [recommend]\n",
"git clone https://github.com/PaddlePaddle/PaddleOCR\n",
"\n",
"# If you cannot pull successfully due to network problems, you can also choose to use the hosting on Code Cloud:\n",
"git clone https://gitee.com/paddlepaddle/PaddleOCR\n",
"```\n",
"\n",
"> Note: The code cloud hosted code may not be able to synchronize the update of this github project in real time, there is a delay of 3~5 days, please use the recommended method first.\n",
">\n",
"> If you are not familiar with git operations, you can download the compressed package directly from the `Code` on the homepage of PaddleOCR\n",
"\n",
"Then install third-party libraries:\n",
"\n",
"```bash\n",
"cd PaddleOCR\n",
"pip3 install -r requirements.txt\n",
"```\n",
"\n",
"\n",
"\n",
"### 1.4 Access to Information\n",
"\n",
"[PaddleOCR Usage Document](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/README.md) describes in detail how to use PaddleOCR to complete model application, training and deployment. The document is rich in content, most of the user’s questions are described in the document or FAQ, especially in [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.3/doc/doc_en/FAQ_en.md), in accordance with the application process of deep learning, has precipitated the user's common questions, it is recommended that you read it carefully.\n",
"\n",
"### 1.5 Ask for Help\n",
"\n",
"If you encounter BUG, ease of use or documentation related issues while using PaddleOCR, you can contact the official via [Github issue](https://github.com/PaddlePaddle/PaddleOCR/issues), please follow the issue template Provide as much information as possible so that official personnel can quickly locate the problem. At the same time, the WeChat group is the daily communication position for the majority of PaddleOCR users, and it is more suitable for asking some consulting questions. In addition to the PaddleOCR team members, there will also be enthusiastic developers answering your questions."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "py35-paddle1.2.0"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
@@ -105,3 +105,22 @@ def set_seed(seed=1024):
     random.seed(seed)
     np.random.seed(seed)
     paddle.seed(seed)
+
+
+class AverageMeter:
+    def __init__(self):
+        self.reset()
+
+    def reset(self):
+        """reset"""
+        self.val = 0
+        self.avg = 0
+        self.sum = 0
+        self.count = 0
+
+    def update(self, val, n=1):
+        """update"""
+        self.val = val
+        self.sum += val * n
+        self.count += n
+        self.avg = self.sum / self.count
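The new `AverageMeter` maintains a running mean incrementally: each `update(val, n)` call adds `val` weighted by a sample count `n`, so `avg` is always the mean over all samples seen so far. A minimal usage sketch (the import path follows the `from ppocr.utils.utility import print_dict, AverageMeter` line in the diff below; the values are illustrative):

```python
from ppocr.utils.utility import AverageMeter

meter = AverageMeter()
meter.update(0.5, n=2)  # sum = 1.0, count = 2, avg = 0.50
meter.update(1.0, n=2)  # sum = 3.0, count = 4, avg = 0.75
print(meter.val, meter.avg)  # 1.0 0.75 (last value, running mean)
```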
@@ -3,7 +3,7 @@
 On Linux, the main program for the basic training and inference functional test is test_train_python.sh. It can test Python-based model training, evaluation, and other basic functions, including pruning, quantization, and distillation training.
-![](./tipc_train.png)
+![](./test_tipc/tipc_train.png)
 The test chain is shown in the figure above. It mainly checks that models with shared weights and custom OPs train normally, and that the slim-related training workflows run correctly.
@@ -28,23 +28,23 @@ pip3 install -r requirements.txt
 - Mode 1: lite_train_lite_infer, trains with a small amount of data; used to quickly verify that the training-to-inference pipeline runs end to end, without verifying accuracy or speed;
 ```
-bash test_tipc/test_train_python.sh ./test_tipc/ch_ppocr_mobile_v2.0_det/train_infer_python.txt 'lite_train_lite_infer'
+bash test_tipc/test_train_python.sh ./test_tipc/train_infer_python.txt 'lite_train_lite_infer'
 ```
 - Mode 2: whole_train_whole_infer, trains with the full dataset; used to verify the training-to-inference pipeline and the model's final training accuracy;
 ```
-bash test_tipc/test_train_python.sh ./test_tipc/ch_ppocr_mobile_v2.0_det/train_infer_python.txt 'whole_train_whole_infer'
+bash test_tipc/test_train_python.sh ./test_tipc/train_infer_python.txt 'whole_train_whole_infer'
 ```
 To run training modes such as quantization or pruning, different configuration files are needed. The test command for quantization training is:
 ```
-bash test_tipc/test_train_python.sh ./test_tipc/ch_ppocr_mobile_v2.0_det/train_infer_python_PACT.txt 'lite_train_lite_infer'
+bash test_tipc/test_train_python.sh ./test_tipc/train_infer_python_PACT.txt 'lite_train_lite_infer'
 ```
 Similarly, FPGM pruning is run as follows:
 ```
-bash test_tipc/test_train_python.sh ./test_tipc/ch_ppocr_mobile_v2.0_det/train_infer_python_FPGM.txt 'lite_train_lite_infer'
+bash test_tipc/test_train_python.sh ./test_tipc/train_infer_python_FPGM.txt 'lite_train_lite_infer'
 ```
 After running the corresponding command, the run logs are saved automatically under the `test_tipc/output` folder. For example, after running in 'lite_train_lite_infer' mode, the test_tipc/extra_output folder contains the following files:
......
@@ -21,7 +21,7 @@ import sys
 import platform
 import yaml
 import time
-import shutil
+import datetime
 import paddle
 import paddle.distributed as dist
 from tqdm import tqdm
@@ -29,11 +29,10 @@ from argparse import ArgumentParser, RawDescriptionHelpFormatter
 from ppocr.utils.stats import TrainingStats
 from ppocr.utils.save_load import save_model
-from ppocr.utils.utility import print_dict
+from ppocr.utils.utility import print_dict, AverageMeter
 from ppocr.utils.logging import get_logger
 from ppocr.utils import profiler
 from ppocr.data import build_dataloader
-import numpy as np


 class ArgsParser(ArgumentParser):
@@ -48,7 +47,8 @@ class ArgsParser(ArgumentParser):
             '--profiler_options',
             type=str,
             default=None,
-            help='The option of profiler, which should be in format \"key1=value1;key2=value2;key3=value3\".'
+            help='The option of profiler, which should be in format ' \
+                 '\"key1=value1;key2=value2;key3=value3\".'
         )

     def parse_args(self, argv=None):
@@ -99,7 +99,8 @@ def merge_config(config, opts):
             sub_keys = key.split('.')
             assert (
                 sub_keys[0] in config
-            ), "the sub_keys can only be one of global_config: {}, but get: {}, please check your running command".format(
+            ), "the sub_keys can only be one of global_config: {}, but get: " \
+               "{}, please check your running command".format(
                 config.keys(), sub_keys[0])
             cur = config[sub_keys[0]]
             for idx, sub_key in enumerate(sub_keys[1:]):
@@ -160,11 +161,13 @@ def train(config,
         eval_batch_step = eval_batch_step[1]
         if len(valid_dataloader) == 0:
             logger.info(
-                'No Images in eval dataset, evaluation during training will be disabled'
+                'No Images in eval dataset, evaluation during training ' \
+                'will be disabled'
             )
             start_eval_step = 1e111
         logger.info(
-            "During the training process, after the {}th iteration, an evaluation is run every {} iterations".
+            "During the training process, after the {}th iteration, " \
+            "an evaluation is run every {} iterations".
             format(start_eval_step, eval_batch_step))
     save_epoch_step = config['Global']['save_epoch_step']
     save_model_dir = config['Global']['save_model_dir']
@@ -189,10 +192,11 @@ def train(config,
     start_epoch = best_model_dict[
         'start_epoch'] if 'start_epoch' in best_model_dict else 1
-    train_reader_cost = 0.0
-    train_run_cost = 0.0
     total_samples = 0
+    train_reader_cost = 0.0
+    train_batch_cost = 0.0
     reader_start = time.time()
+    eta_meter = AverageMeter()

     max_iter = len(train_dataloader) - 1 if platform.system(
     ) == "Windows" else len(train_dataloader)
@@ -203,7 +207,6 @@ def train(config,
             config, 'Train', device, logger, seed=epoch)
         max_iter = len(train_dataloader) - 1 if platform.system(
         ) == "Windows" else len(train_dataloader)
-
         for idx, batch in enumerate(train_dataloader):
             profiler.add_profiler_step(profiler_options)
             train_reader_cost += time.time() - reader_start
@@ -214,7 +217,6 @@ def train(config,
             if use_srn:
                 model_average = True
-            train_start = time.time()
             # use amp
             if scaler:
                 with paddle.amp.auto_cast():
@@ -242,7 +244,9 @@ def train(config,
             optimizer.step()
             optimizer.clear_grad()
-            train_run_cost += time.time() - train_start
+            train_batch_time = time.time() - reader_start
+            train_batch_cost += train_batch_time
+            eta_meter.update(train_batch_time)
             global_step += 1
             total_samples += len(images)
@@ -273,19 +277,27 @@ def train(config,
                 (global_step > 0 and global_step % print_batch_step == 0) or
                 (idx >= len(train_dataloader) - 1)):
                 logs = train_stats.log()
-                strs = 'epoch: [{}/{}], global_step: {}, {}, avg_reader_cost: {:.5f} s, avg_batch_cost: {:.5f} s, avg_samples: {}, samples/s: {:.5f}'.format(
-                    epoch, epoch_num, global_step, logs, train_reader_cost /
-                    print_batch_step, (train_reader_cost + train_run_cost) /
-                    print_batch_step, total_samples / print_batch_step,
-                    total_samples / (train_reader_cost + train_run_cost))
+
+                eta_sec = ((epoch_num + 1 - epoch) * \
+                    len(train_dataloader) - idx - 1) * eta_meter.avg
+                eta_sec_format = str(datetime.timedelta(seconds=int(eta_sec)))
+                strs = 'epoch: [{}/{}], global_step: {}, {}, avg_reader_cost: ' \
+                       '{:.5f} s, avg_batch_cost: {:.5f} s, avg_samples: {}, ' \
+                       'samples/s: {:.5f}, eta: {}'.format(
+                    epoch, epoch_num, global_step, logs,
+                    train_reader_cost / print_batch_step,
+                    train_batch_cost / print_batch_step,
+                    total_samples / print_batch_step,
+                    total_samples / train_batch_cost, eta_sec_format)
                 logger.info(strs)
-                train_reader_cost = 0.0
-                train_run_cost = 0.0
                 total_samples = 0
+                train_reader_cost = 0.0
+                train_batch_cost = 0.0
             # eval
             if global_step > start_eval_step and \
-                (global_step - start_eval_step) % eval_batch_step == 0 and dist.get_rank() == 0:
+                    (global_step - start_eval_step) % eval_batch_step == 0 \
+                    and dist.get_rank() == 0:
                 if model_average:
                     Model_Average = paddle.incubate.optimizer.ModelAverage(
                         0.15,
......
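In the logging hunk above, the new ETA estimate multiplies the number of remaining batches (the rest of the current epoch plus all batches of the remaining epochs, with `epoch` counted from 1) by the running average batch time tracked by `eta_meter`, and formats the result with `datetime.timedelta`. A standalone sketch of that arithmetic, with illustrative values standing in for the loop variables:

```python
import datetime

epoch, epoch_num = 3, 10   # current epoch (1-indexed) and total epochs
idx = 49                   # index of the current batch within the epoch
batches_per_epoch = 200    # stands in for len(train_dataloader)
avg_batch_time = 0.25      # stands in for eta_meter.avg, seconds per batch

# Batches left in this epoch plus all batches of the remaining epochs.
remaining = (epoch_num + 1 - epoch) * batches_per_epoch - idx - 1
eta_sec = remaining * avg_batch_time
print(str(datetime.timedelta(seconds=int(eta_sec))))  # 0:06:27
```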