Commit 3c906d41 authored by tink2123

merge dygraph

parents 8308f332 6a41a37a
@@ -5,7 +5,7 @@
  + [1. Install PaddleOCR Whl Package](#1-install-paddleocr-whl-package)
* [2. Easy-to-Use](#2-easy-to-use)
  + [2.1 Use by Command Line](#21-use-by-command-line)
    - [2.1.1 English and Chinese Model](#211-english-and-chinese-model)
    - [2.1.2 Multi-language Model](#212-multi-language-model)
    - [2.1.3 Layout Analysis](#213-layoutAnalysis)
@@ -39,7 +39,7 @@ pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+

<a name="21-use-by-command-line"></a>
### 2.1 Use by Command Line

PaddleOCR provides a series of test images. Click [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip) to download them, then switch to the corresponding directory in the terminal.
@@ -95,7 +95,7 @@ If you do not use the provided test image, you can replace the following `--imag
['PAIN', 0.990372]
```
If you need to use the 2.0 model, please specify the parameter `--version PP-OCR`; paddleocr uses the 2.1 model (`--version PP-OCRv2`) by default. More whl package usage can be found in [whl package](./whl_en.md).

<a name="212-multi-language-model"></a>
#### 2.1.2 Multi-language Model
...
# Text Recognition

- [1. Data Preparation](#DATA_PREPARATION)
  - [1.1 Custom Dataset](#Costom_Dataset)
  - [1.2 Dataset Download](#Dataset_download)
  - [1.3 Dictionary](#Dictionary)
  - [1.4 Add Space Category](#Add_space_category)
- [2. Training](#TRAINING)
  - [2.1 Data Augmentation](#Data_Augmentation)
  - [2.2 General Training](#Training)
  - [2.3 Multi-language Training](#Multi_language)
- [3. Evaluation](#EVALUATION)
- [4. Prediction](#PREDICTION)
- [5. Convert to Inference Model](#Inference)
<a name="DATA_PREPARATION"></a>
## 1. Data Preparation

PaddleOCR supports two data formats:
@@ -37,7 +36,7 @@ mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
```

<a name="Costom_Dataset"></a>
### 1.1 Custom Dataset

If you want to use your own data for training, please refer to the following to organize your data.
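As a minimal sketch of what such a label file looks like (PaddleOCR's recognition label files are tab-separated `image_path<TAB>transcription` lines; the file names below are invented for illustration):

```python
import os
import tempfile

# Hypothetical recognition label entries: one "relative/image/path<TAB>transcription" per line.
entries = [
    ("train/word_001.jpg", "PaddleOCR"),
    ("train/word_002.jpg", "MODEL TRAINING"),
]

label_path = os.path.join(tempfile.mkdtemp(), "rec_gt_train.txt")
with open(label_path, "w", encoding="utf-8") as f:
    for img_path, text in entries:
        f.write(f"{img_path}\t{text}\n")

# Parsing mirrors what a dataset loader would do: split on the first tab only,
# so transcriptions that themselves contain spaces survive intact.
with open(label_path, encoding="utf-8") as f:
    parsed = [line.rstrip("\n").split("\t", 1) for line in f]

print(parsed[1])  # ['train/word_002.jpg', 'MODEL TRAINING']
```

Splitting on the first tab only is the important detail: a transcription like `MODEL TRAINING` contains a space, and a naive whitespace split would corrupt it.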
@@ -85,7 +84,7 @@ Similar to the training set, the test set also needs to be provided a folder con
```

<a name="Dataset_download"></a>
### 1.2 Dataset Download

- ICDAR2015
@@ -169,14 +168,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
If you need to customize the dict file, please add the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` to point to your dictionary path, and set `character_type` to `ch`.
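To make the dictionary mechanics concrete, here is a small self-contained sketch (the dictionary body is invented; real dict files ship one character per line, and a character's line number becomes its label index):

```python
# A hypothetical dict file body: one character per line, index = line number.
dict_text = "p\na\nd\nl\ne\no\nc\nr\n"

char_to_idx = {ch: i for i, ch in enumerate(dict_text.splitlines())}

def encode(text):
    # Characters missing from the dictionary are simply dropped, which is
    # why a dictionary matching your data's alphabet matters.
    return [char_to_idx[ch] for ch in text if ch in char_to_idx]

print(encode("paddleocr"))  # [0, 1, 2, 2, 3, 4, 5, 6, 7]
```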
<a name="Add_space_category"></a>
### 1.4 Add Space Category

If you want to support recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.

**Note: `use_space_char` only takes effect when `character_type=ch`.**
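As a sketch, the relevant fragment of a recognition config might look like this (field placement is illustrative; check the actual yml you are training with):

```yaml
Global:
  character_type: ch     # use_space_char only takes effect with character_type=ch
  use_space_char: True
```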
<a name="TRAINING"></a>
## 2. Training

<a name="Data_Augmentation"></a>
### 2.1 Data Augmentation
@@ -367,7 +366,7 @@ Eval:

<a name="EVALUATION"></a>
## 3. Evaluation

The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
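A sketch of the fragment in question (the paths are placeholders, not the shipped defaults):

```yaml
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data
    label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]
```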
@@ -377,7 +376,7 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
```

<a name="PREDICTION"></a>
## 4. Prediction

Using a model trained with PaddleOCR, you can quickly get predictions through the following script.
@@ -441,7 +440,7 @@ infer_img: doc/imgs_words/ch/word_1.jpg

<a name="Inference"></a>
## 5. Convert to Inference Model

The recognition model is converted to an inference model in the same way as the detection model, as follows:
...
# Model Training

- [1. Yml Configuration](#1-Yml-Configuration)
- [2. Basic Concepts](#1-basic-concepts)
  * [2.1 Learning Rate](#11-learning-rate)
  * [2.2 Regularization](#12-regularization)
  * [2.3 Evaluation Indicators](#13-evaluation-indicators-)
- [3. Data and Vertical Scenes](#2-data-and-vertical-scenes)
  * [3.1 Training Data](#21-training-data)
  * [3.2 Vertical Scene](#22-vertical-scene)
  * [3.3 Build Your Own Dataset](#23-build-your-own-data-set)
- [4. FAQ](#3-faq)
This article introduces the basic concepts that need to be mastered during model training and the tuning methods used in training. It also briefly introduces the components of PaddleOCR model training data and how to prepare data for finetuning the model in a vertical scene.
<a name="1-Yml-Configuration"></a>
## 1. Yml Configuration

The PaddleOCR model uses configuration files to manage network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters. PaddleOCR reads these parameters from the configuration file and builds the complete training process to train the model. Tuning can likewise be done by modifying the parameters in the configuration file, which is simple to use and convenient to modify.

For the complete configuration file description, please refer to [Configuration File](./config_en.md).
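As a rough sketch of the shape of such a file (the section names follow PaddleOCR's convention; the values are illustrative, not recommended settings):

```yaml
Global:
  epoch_num: 500
  save_model_dir: ./output/rec/
Architecture:
  model_type: rec
  algorithm: CRNN
Loss:
  name: CTCLoss
Optimizer:
  name: Adam
  lr:
    name: Cosine
    learning_rate: 0.001
Train:
  dataset:
    name: SimpleDataSet
    label_file_list: ["./train_data/train_list.txt"]
```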
<a name="1-basic-concepts"></a>
## 2. Basic Concepts

OCR (Optical Character Recognition) refers to the process of analyzing and recognizing images to obtain text and layout information. It is a typical computer vision task, and usually consists of two subtasks: text detection and text recognition.

The following parameters need to be paid attention to when tuning the model:
<a name="11-learning-rate"></a>
### 2.1 Learning Rate

The learning rate is one of the most important hyperparameters for training neural networks. It represents the step length with which the gradient moves toward the optimal solution of the loss function in each iteration.
A variety of learning rate update strategies are provided in PaddleOCR, which can be modified through the configuration file, for example:
@@ -61,7 +69,7 @@ Optimizer:
  factor: 2.0e-05
```
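As an illustration of what such a strategy computes, here is a plain-Python warmup-plus-cosine-decay schedule (a generic formula for a common strategy, not PaddleOCR's actual implementation):

```python
import math

def lr_at_step(step, base_lr=0.001, warmup_steps=100, total_steps=1000):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        # ramp linearly from base_lr / warmup_steps up to base_lr
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

schedule = [lr_at_step(s) for s in range(1000)]
print(lr_at_step(100))  # 0.001 — the peak, right at the warmup/decay boundary
```

The warmup phase keeps early updates small while the network is still far from any reasonable solution; the cosine phase then shrinks the step length smoothly as training converges.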
<a name="13-evaluation-indicators-"></a>
### 2.3 Evaluation Indicators

(1) Detection stage: First, evaluate using the IoU of the detection box and the labeled box; if the IoU is greater than a certain threshold, the detection is judged to be accurate. Note that the boxes here differ from those in general object detection: they are represented by polygons. Detection precision: the percentage of correct detection boxes among all detection boxes, mainly used to judge detection quality. Detection recall: the percentage of correct detection boxes among all labeled boxes, mainly an indicator of missed detections.
@@ -71,11 +79,11 @@ Optimizer:
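The detection-stage indicators above can be sketched in code as follows (simplified to axis-aligned boxes and greedy matching; PaddleOCR actually evaluates polygon boxes, so this only illustrates the counting logic):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(preds, gts, iou_thresh=0.5):
    matched_gts = set()
    tp = 0
    for p in preds:
        # greedily match each prediction to the first unused ground truth
        for gi, g in enumerate(gts):
            if gi not in matched_gts and box_iou(p, g) > iou_thresh:
                matched_gts.add(gi)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0  # correct / all detections
    recall = tp / len(gts) if gts else 0.0         # correct / all labels
    return precision, recall

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60)]  # one good match, one false alarm
print(precision_recall(preds, gts))  # (0.5, 0.5)
```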
<a name="2-data-and-vertical-scenes"></a>
## 3. Data and Vertical Scenes

<a name="21-training-data"></a>
### 3.1 Training Data

The currently open-sourced models, datasets and their magnitudes are as follows:
@@ -92,14 +100,14 @@ Among them, the public data sets are all open source, users can search and downl

<a name="22-vertical-scene"></a>
### 3.2 Vertical Scene
PaddleOCR mainly focuses on general OCR. If you have vertical requirements, you can train your own model with PaddleOCR plus vertical data;
if labeled data is lacking, or if you do not want to invest in R&D costs, it is recommended to directly call an open API, which covers the more common vertical categories.

<a name="23-build-your-own-data-set"></a>
### 3.3 Build Your Own Dataset

There are several experiences for reference when constructing a dataset:
...
doc/joinus.PNG
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle.vision.transforms import ColorJitter as pp_ColorJitter

__all__ = ['ColorJitter']


class ColorJitter(object):
    def __init__(self, brightness=0, contrast=0, saturation=0, hue=0, **kwargs):
        self.aug = pp_ColorJitter(brightness, contrast, saturation, hue)

    def __call__(self, data):
        # apply the color jitter to the image stored in the sample dict
        image = data['image']
        image = self.aug(image)
        data['image'] = image
        return data
@@ -19,11 +19,13 @@ from __future__ import unicode_literals
from .iaa_augment import IaaAugment
from .make_border_map import MakeBorderMap
from .make_shrink_map import MakeShrinkMap
from .random_crop_data import EastRandomCropData, RandomCropImgMask
from .make_pse_gt import MakePseGt

from .rec_img_aug import RecAug, RecResizeImg, ClsResizeImg, SRNRecResizeImg, NRTRRecResizeImg, SARRecResizeImg, SEEDResize
from .randaugment import RandAugment
from .copy_paste import CopyPaste
from .ColorJitter import ColorJitter
from .operators import *
from .label_ops import *
...
@@ -181,6 +181,8 @@ class NRTRLabelEncode(BaseRecLabelEncode):
        text = self.encode(text)
        if text is None:
            return None
        if len(text) >= self.max_text_len - 1:
            return None
        data['length'] = np.array(len(text))
        text.insert(0, 2)
        text.append(3)
...
# -*- coding:utf-8 -*-

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import cv2
import numpy as np
import pyclipper
from shapely.geometry import Polygon

__all__ = ['MakePseGt']


class MakePseGt(object):
    r'''
    Making binary mask from detection data with ICDAR format.
    Typically following the process of class `MakeICDARData`.
    '''

    def __init__(self, kernel_num=7, size=640, min_shrink_ratio=0.4, **kwargs):
        self.kernel_num = kernel_num
        self.min_shrink_ratio = min_shrink_ratio
        self.size = size

    def __call__(self, data):
        image = data['image']
        text_polys = data['polys']
        ignore_tags = data['ignore_tags']

        h, w, _ = image.shape
        short_edge = min(h, w)
        if short_edge < self.size:
            # keep short_edge >= self.size
            scale = self.size / short_edge
            image = cv2.resize(image, dsize=None, fx=scale, fy=scale)
            text_polys *= scale

        gt_kernels = []
        for i in range(1, self.kernel_num + 1):
            # s1 -> sn, from big to small
            rate = 1.0 - (1.0 - self.min_shrink_ratio) / (self.kernel_num - 1) * i
            text_kernel, ignore_tags = self.generate_kernel(
                image.shape[0:2], rate, text_polys, ignore_tags)
            gt_kernels.append(text_kernel)

        training_mask = np.ones(image.shape[0:2], dtype='uint8')
        for i in range(text_polys.shape[0]):
            if ignore_tags[i]:
                cv2.fillPoly(training_mask,
                             text_polys[i].astype(np.int32)[np.newaxis, :, :], 0)

        gt_kernels = np.array(gt_kernels)
        gt_kernels[gt_kernels > 0] = 1

        data['image'] = image
        data['polys'] = text_polys
        data['gt_kernels'] = gt_kernels[0:]
        data['gt_text'] = gt_kernels[0]
        data['mask'] = training_mask.astype('float32')
        return data

    def generate_kernel(self, img_size, shrink_ratio, text_polys, ignore_tags=None):
        h, w = img_size
        text_kernel = np.zeros((h, w), dtype=np.float32)
        for i, poly in enumerate(text_polys):
            # offset distance for the Vatti clipping shrink, derived from
            # the polygon's area and perimeter
            polygon = Polygon(poly)
            distance = polygon.area * (1 - shrink_ratio * shrink_ratio) / (
                polygon.length + 1e-6)
            subject = [tuple(l) for l in poly]
            pco = pyclipper.PyclipperOffset()
            pco.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
            shrinked = np.array(pco.Execute(-distance))

            if len(shrinked) == 0 or shrinked.size == 0:
                if ignore_tags is not None:
                    ignore_tags[i] = True
                continue
            try:
                shrinked = np.array(shrinked[0]).reshape(-1, 2)
            except Exception:
                if ignore_tags is not None:
                    ignore_tags[i] = True
                continue
            cv2.fillPoly(text_kernel, [shrinked.astype(np.int32)], i + 1)
        return text_kernel, ignore_tags
@@ -164,47 +164,55 @@ class EastRandomCropData(object):
        return data


class RandomCropImgMask(object):
    def __init__(self, size, main_key, crop_keys, p=3 / 8, **kwargs):
        self.size = size
        self.main_key = main_key
        self.crop_keys = crop_keys
        self.p = p

    def __call__(self, data):
        image = data['image']

        h, w = image.shape[0:2]
        th, tw = self.size
        if w == tw and h == th:
            return data

        mask = data[self.main_key]
        if np.max(mask) > 0 and random.random() > self.p:
            # make sure to crop the text region
            tl = np.min(np.where(mask > 0), axis=1) - (th, tw)
            tl[tl < 0] = 0
            br = np.max(np.where(mask > 0), axis=1) - (th, tw)
            br[br < 0] = 0
            # keep the crop window inside the image
            br[0] = min(br[0], h - th)
            br[1] = min(br[1], w - tw)

            i = random.randint(tl[0], br[0]) if tl[0] < br[0] else 0
            j = random.randint(tl[1], br[1]) if tl[1] < br[1] else 0
        else:
            i = random.randint(0, h - th) if h - th > 0 else 0
            j = random.randint(0, w - tw) if w - tw > 0 else 0

        # crop every entry listed in crop_keys with the same window,
        # handling CHW, HWC and single-channel layouts
        for k in data:
            if k in self.crop_keys:
                if len(data[k].shape) == 3:
                    if np.argmin(data[k].shape) == 0:
                        img = data[k][:, i:i + th, j:j + tw]
                    elif np.argmin(data[k].shape) == 2:
                        img = data[k][i:i + th, j:j + tw, :]
                    else:
                        img = data[k]
                else:
                    img = data[k][i:i + th, j:j + tw]
                data[k] = img
        return data
@@ -44,12 +44,33 @@ class ClsResizeImg(object):

class NRTRRecResizeImg(object):
    def __init__(self, image_shape, resize_type, padding=False, **kwargs):
        self.image_shape = image_shape
        self.resize_type = resize_type
        self.padding = padding

    def __call__(self, data):
        img = data['image']
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        image_shape = self.image_shape
        if self.padding:
            imgC, imgH, imgW = image_shape
            # TODO: change to 0 and modify image shape
            h = img.shape[0]
            w = img.shape[1]
            ratio = w / float(h)
            # resize keeping the aspect ratio, capped at the target width
            if math.ceil(imgH * ratio) > imgW:
                resized_w = imgW
            else:
                resized_w = int(math.ceil(imgH * ratio))
            resized_image = cv2.resize(img, (resized_w, imgH))
            norm_img = np.expand_dims(resized_image, -1)
            norm_img = norm_img.transpose((2, 0, 1))
            # normalize to [-1, 1] and zero-pad the width up to imgW
            resized_image = norm_img.astype(np.float32) / 128. - 1.
            padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
            padding_im[:, :, 0:resized_w] = resized_image
            data['image'] = padding_im
            return data
        if self.resize_type == 'PIL':
            image_pil = Image.fromarray(np.uint8(img))
            img = image_pil.resize(self.image_shape, Image.ANTIALIAS)
...
@@ -15,7 +15,6 @@ import numpy as np
import os
import random

from paddle.io import Dataset

from .imaug import transform, create_operators
...
@@ -20,6 +20,7 @@ import paddle.nn as nn
from .det_db_loss import DBLoss
from .det_east_loss import EASTLoss
from .det_sast_loss import SASTLoss
from .det_pse_loss import PSELoss

# rec loss
from .rec_ctc_loss import CTCLoss
@@ -27,6 +28,8 @@ from .rec_att_loss import AttentionLoss
from .rec_srn_loss import SRNLoss
from .rec_nrtr_loss import NRTRLoss
from .rec_sar_loss import SARLoss
from .rec_aster_loss import AsterLoss

# cls loss
from .cls_loss import ClsLoss
@@ -42,14 +45,12 @@ from .combined_loss import CombinedLoss
# table loss
from .table_att_loss import TableAttentionLoss


def build_loss(config):
    support_dict = [
        'DBLoss', 'PSELoss', 'EASTLoss', 'SASTLoss', 'CTCLoss', 'ClsLoss',
        'AttentionLoss', 'SRNLoss', 'PGLoss', 'CombinedLoss', 'NRTRLoss',
        'TableAttentionLoss', 'SARLoss', 'AsterLoss'
    ]
    config = copy.deepcopy(config)
...
@@ -56,31 +56,34 @@ class CELoss(nn.Layer):

class KLJSLoss(object):
    def __init__(self, mode='kl'):
        assert mode in ['kl', 'js', 'KL', 'JS'], \
            "mode can only be one of ['kl', 'js', 'KL', 'JS']"
        self.mode = mode

    def __call__(self, p1, p2, reduction="mean"):
        loss = paddle.multiply(p2, paddle.log((p2 + 1e-5) / (p1 + 1e-5) + 1e-5))

        if self.mode.lower() == "js":
            loss += paddle.multiply(
                p1, paddle.log((p1 + 1e-5) / (p2 + 1e-5) + 1e-5))
            loss *= 0.5
        if reduction == "mean":
            loss = paddle.mean(loss, axis=[1, 2])
        elif reduction == "none" or reduction is None:
            return loss
        else:
            loss = paddle.sum(loss, axis=[1, 2])
        return loss


class DMLLoss(nn.Layer):
    """
    DMLLoss
    """

    def __init__(self, act=None, use_log=False):
        super().__init__()
        if act is not None:
            assert act in ["softmax", "sigmoid"]
@@ -91,19 +94,23 @@ class DMLLoss(nn.Layer):
        else:
            self.act = None

        self.use_log = use_log
        self.jskl_loss = KLJSLoss(mode="js")

    def forward(self, out1, out2):
        if self.act is not None:
            out1 = self.act(out1)
            out2 = self.act(out2)
        if self.use_log:
            # for recognition distillation, log is needed for feature map
            log_out1 = paddle.log(out1)
            log_out2 = paddle.log(out2)
            loss = (F.kl_div(
                log_out1, out2, reduction='batchmean') + F.kl_div(
                    log_out2, out1, reduction='batchmean')) / 2.0
        else:
            # for detection distillation log is not needed
            loss = self.jskl_loss(out1, out2)
        return loss
...
@@ -49,11 +49,15 @@ class CombinedLoss(nn.Layer):
            loss = loss_func(input, batch, **kargs)
            if isinstance(loss, paddle.Tensor):
                loss = {"loss_{}_{}".format(str(loss), idx): loss}

            weight = self.loss_weight[idx]

            loss = {key: loss[key] * weight for key in loss}

            if "loss" in loss:
                loss_all += loss["loss"]
            else:
                loss_all += paddle.add_n(list(loss.values()))
            loss_dict.update(loss)
        loss_dict["loss"] = loss_all
        return loss_dict
@@ -75,12 +75,6 @@ class BalanceLoss(nn.Layer):
            mask (variable): masked maps.
        return: (variable) balanced loss
        """
        positive = gt * mask
        negative = (1 - gt) * mask
@@ -154,52 +148,3 @@ class BCELoss(nn.Layer):
    def forward(self, input, label, mask=None, weight=None, name=None):
        loss = F.binary_cross_entropy(input, label, reduction=self.reduction)
        return loss
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
from paddle import nn
from paddle.nn import functional as F
import numpy as np
from ppocr.utils.iou import iou


class PSELoss(nn.Layer):
    def __init__(self,
                 alpha,
                 ohem_ratio=3,
                 kernel_sample_mask='pred',
                 reduction='sum',
                 eps=1e-6,
                 **kwargs):
        """Implement PSE Loss."""
        super(PSELoss, self).__init__()
        assert reduction in ['sum', 'mean', 'none']
        self.alpha = alpha
        self.ohem_ratio = ohem_ratio
        self.kernel_sample_mask = kernel_sample_mask
        self.reduction = reduction
        self.eps = eps

    def forward(self, outputs, labels):
        predicts = outputs['maps']
        predicts = F.interpolate(predicts, scale_factor=4)

        texts = predicts[:, 0, :, :]
        kernels = predicts[:, 1:, :, :]
        gt_texts, gt_kernels, training_masks = labels[1:]

        # text loss
        selected_masks = self.ohem_batch(texts, gt_texts, training_masks)
        loss_text = self.dice_loss(texts, gt_texts, selected_masks)
        iou_text = iou((texts > 0).astype('int64'),
                       gt_texts,
                       training_masks,
                       reduce=False)
        losses = dict(loss_text=loss_text, iou_text=iou_text)

        # kernel loss
        loss_kernels = []
        if self.kernel_sample_mask == 'gt':
            selected_masks = gt_texts * training_masks
        elif self.kernel_sample_mask == 'pred':
            selected_masks = (
                F.sigmoid(texts) > 0.5).astype('float32') * training_masks

        for i in range(kernels.shape[1]):
            kernel_i = kernels[:, i, :, :]
            gt_kernel_i = gt_kernels[:, i, :, :]
            loss_kernel_i = self.dice_loss(kernel_i, gt_kernel_i,
                                           selected_masks)
            loss_kernels.append(loss_kernel_i)
        loss_kernels = paddle.mean(paddle.stack(loss_kernels, axis=1), axis=1)
        iou_kernel = iou((kernels[:, -1, :, :] > 0).astype('int64'),
                         gt_kernels[:, -1, :, :],
                         training_masks * gt_texts,
                         reduce=False)
        losses.update(dict(loss_kernels=loss_kernels, iou_kernel=iou_kernel))

        loss = self.alpha * loss_text + (1 - self.alpha) * loss_kernels
        losses['loss'] = loss
        if self.reduction == 'sum':
            losses = {x: paddle.sum(v) for x, v in losses.items()}
        elif self.reduction == 'mean':
            losses = {x: paddle.mean(v) for x, v in losses.items()}
        return losses

    def dice_loss(self, input, target, mask):
        input = F.sigmoid(input)

        input = input.reshape([input.shape[0], -1])
        target = target.reshape([target.shape[0], -1])
        mask = mask.reshape([mask.shape[0], -1])

        input = input * mask
        target = target * mask

        a = paddle.sum(input * target, 1)
        b = paddle.sum(input * input, 1) + self.eps
        c = paddle.sum(target * target, 1) + self.eps
        d = (2 * a) / (b + c)
        return 1 - d

    def ohem_single(self, score, gt_text, training_mask, ohem_ratio=3):
        pos_num = int(paddle.sum((gt_text > 0.5).astype('float32'))) - int(
            paddle.sum(
                paddle.logical_and((gt_text > 0.5), (training_mask <= 0.5))
                .astype('float32')))

        if pos_num == 0:
            selected_mask = training_mask
            selected_mask = selected_mask.reshape(
                [1, selected_mask.shape[0], selected_mask.shape[1]]).astype(
                    'float32')
            return selected_mask

        neg_num = int(paddle.sum((gt_text <= 0.5).astype('float32')))
        neg_num = int(min(pos_num * ohem_ratio, neg_num))
if neg_num == 0:
selected_mask = training_mask
            selected_mask = selected_mask.reshape(
                [1, selected_mask.shape[0],
                 selected_mask.shape[1]]).astype('float32')
return selected_mask
neg_score = paddle.masked_select(score, gt_text <= 0.5)
neg_score_sorted = paddle.sort(-neg_score)
threshold = -neg_score_sorted[neg_num - 1]
selected_mask = paddle.logical_and(
paddle.logical_or((score >= threshold), (gt_text > 0.5)),
(training_mask > 0.5))
selected_mask = selected_mask.reshape(
[1, selected_mask.shape[0], selected_mask.shape[1]]).astype(
'float32')
return selected_mask
def ohem_batch(self, scores, gt_texts, training_masks, ohem_ratio=3):
selected_masks = []
for i in range(scores.shape[0]):
selected_masks.append(
self.ohem_single(scores[i, :, :], gt_texts[i, :, :],
training_masks[i, :, :], ohem_ratio))
selected_masks = paddle.concat(selected_masks, 0).astype('float32')
return selected_masks
@@ -44,10 +44,11 @@ class DistillationDMLLoss(DMLLoss):
     def __init__(self,
                  model_name_pairs=[],
                  act=None,
+                 use_log=False,
                  key=None,
                  maps_name=None,
                  name="dml"):
-        super().__init__(act=act)
+        super().__init__(act=act, use_log=use_log)
         assert isinstance(model_name_pairs, list)
         self.key = key
         self.model_name_pairs = self._check_model_name_pairs(model_name_pairs)
@@ -57,7 +58,8 @@ class DistillationDMLLoss(DMLLoss):
     def _check_model_name_pairs(self, model_name_pairs):
         if not isinstance(model_name_pairs, list):
             return []
-        elif isinstance(model_name_pairs[0], list) and isinstance(model_name_pairs[0][0], str):
+        elif isinstance(model_name_pairs[0], list) and isinstance(
+                model_name_pairs[0][0], str):
             return model_name_pairs
         else:
             return [model_name_pairs]
@@ -112,8 +114,8 @@ class DistillationDMLLoss(DMLLoss):
                         loss_dict["{}_{}_{}_{}_{}".format(key, pair[
                             0], pair[1], map_name, idx)] = loss[key]
                     else:
-                        loss_dict["{}_{}_{}".format(self.name, self.maps_name[_c],
-                                                    idx)] = loss
+                        loss_dict["{}_{}_{}".format(self.name, self.maps_name[
+                            _c], idx)] = loss
         loss_dict = _sum_loss(loss_dict)
...
@@ -169,21 +169,10 @@ class DetectionIoUEvaluator(object):
         numGlobalCareDet += numDetCare

         perSampleMetrics = {
-            'precision': precision,
-            'recall': recall,
-            'hmean': hmean,
-            'pairs': pairs,
-            'iouMat': [] if len(detPols) > 100 else iouMat.tolist(),
-            'gtPolPoints': gtPolPoints,
-            'detPolPoints': detPolPoints,
             'gtCare': numGtCare,
             'detCare': numDetCare,
-            'gtDontCare': gtDontCarePolsNum,
-            'detDontCare': detDontCarePolsNum,
             'detMatched': detMatched,
-            'evaluationLog': evaluationLog
         }
         return perSampleMetrics

     def combine_results(self, results):
...
@@ -13,6 +13,7 @@
 # limitations under the License.

 from paddle import nn
+import paddle


 class MTB(nn.Layer):
@@ -40,7 +41,8 @@ class MTB(nn.Layer):
         x = self.block(images)
         if self.cnn_num == 2:
             # (b, w, h, c)
-            x = x.transpose([0, 3, 2, 1])
-            x_shape = x.shape
-            x = x.reshape([x_shape[0], x_shape[1], x_shape[2] * x_shape[3]])
+            x = paddle.transpose(x, [0, 3, 2, 1])
+            x_shape = paddle.shape(x)
+            x = paddle.reshape(
+                x, [x_shape[0], x_shape[1], x_shape[2] * x_shape[3]])
         return x
@@ -20,6 +20,7 @@ def build_head(config):
     from .det_db_head import DBHead
     from .det_east_head import EASTHead
     from .det_sast_head import SASTHead
+    from .det_pse_head import PSEHead
     from .e2e_pg_head import PGHead

     # rec head
@@ -33,9 +34,9 @@ def build_head(config):
     # cls head
     from .cls_head import ClsHead
     support_dict = [
-        'DBHead', 'EASTHead', 'SASTHead', 'CTCHead', 'ClsHead', 'AttentionHead',
-        'SRNHead', 'PGHead', 'TableAttentionHead', 'SARHead', 'Transformer',
-        'AsterHead', 'SARHead'
+        'DBHead', 'PSEHead', 'EASTHead', 'SASTHead', 'CTCHead', 'ClsHead',
+        'AttentionHead', 'SRNHead', 'PGHead', 'Transformer',
+        'TableAttentionHead', 'SARHead', 'AsterHead'
     ]

     #table head
...