Commit 4824c25b authored by wangsen's avatar wangsen
Browse files

Initial commit

parents
# 快速构建卡证类OCR
- [快速构建卡证类OCR](#快速构建卡证类ocr)
- [1. 金融行业卡证识别应用](#1-金融行业卡证识别应用)
- [1.1 金融行业中的OCR相关技术](#11-金融行业中的ocr相关技术)
- [1.2 金融行业中的卡证识别场景介绍](#12-金融行业中的卡证识别场景介绍)
- [1.3 OCR落地挑战](#13-ocr落地挑战)
- [2. 卡证识别技术解析](#2-卡证识别技术解析)
- [2.1 卡证分类模型](#21-卡证分类模型)
- [2.2 卡证识别模型](#22-卡证识别模型)
- [3. OCR技术拆解](#3-ocr技术拆解)
- [3.1技术流程](#31技术流程)
- [3.2 OCR技术拆解---卡证分类](#32-ocr技术拆解---卡证分类)
- [卡证分类:数据、模型准备](#卡证分类数据模型准备)
- [卡证分类---修改配置文件](#卡证分类---修改配置文件)
- [卡证分类---训练](#卡证分类---训练)
- [3.2 OCR技术拆解---卡证识别](#32-ocr技术拆解---卡证识别)
- [身份证识别:检测+分类](#身份证识别检测分类)
- [数据标注](#数据标注)
- [4 . 项目实践](#4--项目实践)
- [4.1 环境准备](#41-环境准备)
- [4.2 配置文件修改](#42-配置文件修改)
- [4.3 代码修改](#43-代码修改)
- [4.3.1 数据读取](#431-数据读取)
- [4.3.2 head修改](#432--head修改)
- [4.3.3 修改loss](#433-修改loss)
- [4.3.4 后处理](#434-后处理)
- [4.4. 模型启动](#44-模型启动)
- [5 总结](#5-总结)
- [References](#references)
## 1. 金融行业卡证识别应用
### 1.1 金融行业中的OCR相关技术
* 《“十四五”数字经济发展规划》指出,2020年我国数字经济核心产业增加值占GDP比重达7.8%,随着数字经济迈向全面扩展,到2025年该比例将提升至10%。
* 在过去数年的跨越发展与积累沉淀中,数字金融、金融科技已在对金融业的重塑与再造中充分印证了其自身价值。
* 以智能为目标,提升金融数字化水平,实现业务流程自动化,降低人力成本。
![](https://ai-studio-static-online.cdn.bcebos.com/8bb381f164c54ea9b4043cf66fc92ffdea8aaf851bab484fa6e19bd2f93f154f)
### 1.2 金融行业中的卡证识别场景介绍
应用场景:身份证、银行卡、营业执照、驾驶证等。
应用难点:由于数据的采集来源多样,以及实际采集数据各种噪声:反光、褶皱、模糊、倾斜等各种问题干扰。
![](https://ai-studio-static-online.cdn.bcebos.com/981640e17d05487e961162f8576c9e11634ca157f79048d4bd9d3bc21722afe8)
### 1.3 OCR落地挑战
![](https://ai-studio-static-online.cdn.bcebos.com/a5973a8ddeff4bd7ac082f02dc4d0c79de21e721b41641cbb831f23c2cb8fce2)
## 2. 卡证识别技术解析
![](https://ai-studio-static-online.cdn.bcebos.com/d7f96effc2434a3ca2d4144ff33c50282b830670c892487d8d7dec151921cce7)
### 2.1 卡证分类模型
卡证分类:基于PPLCNet
与其他轻量级模型相比在CPU环境下ImageNet数据集上的表现
![](https://ai-studio-static-online.cdn.bcebos.com/cbda3390cb994f98a3c8a9ba88c90c348497763f6c9f4b4797f7d63d84da5f63)
![](https://ai-studio-static-online.cdn.bcebos.com/dedab7b7fd6543aa9e7f625132b24e3ba3f200e361fa468dac615f7814dfb98d)
* 模型来自模型库PaddleClas,它是一个图像识别和图像分类任务的工具集,助力使用者训练出更好的视觉模型和应用落地。
### 2.2 卡证识别模型
* 检测:DBNet 识别:SVRT
![](https://ai-studio-static-online.cdn.bcebos.com/9a7a4e19edc24310b46620f2ee7430f918223b93d4f14a15a52973c096926bad)
* PPOCRv3在文本检测、识别进行了一系列改进优化,在保证精度的同时提升预测效率
![](https://ai-studio-static-online.cdn.bcebos.com/6afdbb77e8db4aef9b169e4e94c5d90a9764cfab4f2c4c04aa9afdf4f54d7680)
![](https://ai-studio-static-online.cdn.bcebos.com/c1a7d197847a4f168848c59b8e625d1d5e8066b778144395a8b9382bb85dc364)
## 3. OCR技术拆解
### 3.1技术流程
![](https://ai-studio-static-online.cdn.bcebos.com/89ba046177864d8783ced6cb31ba92a66ca2169856a44ee59ac2bb18e44a6c4b)
### 3.2 OCR技术拆解---卡证分类
#### 卡证分类:数据、模型准备
A 使用爬虫获取无标注数据,将相同类别的放在同一文件夹下,文件名从0开始命名。具体格式如下图所示。
​ 注:卡证类数据,建议每个类别数据量在500张以上
![](https://ai-studio-static-online.cdn.bcebos.com/6f875b6e695e4fe5aedf427beb0d4ce8064ad7cc33c44faaad59d3eb9732639d)
B 一行命令生成标签文件
```
tree -r -i -f | grep -E "jpg|JPG|jpeg|JPEG|png|PNG|webp" | awk -F "/" '{print $0" "$2}' > train_list.txt
```
C [下载预训练模型 ](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/models/PP-LCNet.md)
#### 卡证分类---修改配置文件
配置文件主要修改三个部分:
全局参数:预训练模型路径/训练轮次/图像尺寸
模型结构:分类数
数据处理:训练/评估数据路径
![](https://ai-studio-static-online.cdn.bcebos.com/e0dc05039c7444c5ab1260ff550a408748df8d4cfe864223adf390e51058dbd5)
#### 卡证分类---训练
指定配置文件启动训练:
```
!python /home/aistudio/work/PaddleClas/tools/train.py -c /home/aistudio/work/PaddleClas/ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml
```
![](https://ai-studio-static-online.cdn.bcebos.com/06af09bde845449ba0a676410f4daa1cdc3983ac95034bdbbafac3b7fd94042f)
​ 注:日志中显示了训练结果和评估结果(训练时可以设置固定轮数评估一次)
### 3.2 OCR技术拆解---卡证识别
卡证识别(以身份证检测为例)
存在的困难及问题:
* 在自然场景下,由于各种拍摄设备以及光线、角度不同等影响导致实际得到的证件影像千差万别。
* 如何快速提取需要的关键信息
* 多行的文本信息,检测结果如何正确拼接
![](https://ai-studio-static-online.cdn.bcebos.com/4f8f5533a2914e0a821f4a639677843c32ec1f08a1b1488d94c0b8bfb6e72d2d)
* OCR技术拆解---OCR工具库
PaddleOCR是一个丰富、领先且实用的OCR工具库,助力开发者训练出更好的模型并应用落地
身份证识别:用现有的方法识别
![](https://ai-studio-static-online.cdn.bcebos.com/12d402e6a06d482a88f979e0ebdfb39f4d3fc8b80517499689ec607ddb04fbf3)
#### 身份证识别:检测+分类
> 方法:基于现有的dbnet检测模型,加入分类方法。检测同时进行分类,从一定程度上优化识别流程
![](https://ai-studio-static-online.cdn.bcebos.com/e1e798c87472477fa0bfca0da12bb0c180845a3e167a4761b0d26ff4330a5ccb)
![](https://ai-studio-static-online.cdn.bcebos.com/23a5a19c746441309864586e467f995ec8a551a3661640e493fc4d77520309cd)
#### 数据标注
使用PaddleOCRLable进行快速标注
![](https://ai-studio-static-online.cdn.bcebos.com/a73180425fa14f919ce52d9bf70246c3995acea1831843cca6c17d871b8f5d95)
* 修改PPOCRLabel.py,将下图中的kie参数设置为True
![](https://ai-studio-static-online.cdn.bcebos.com/d445cf4d850e4063b9a7fc6a075c12204cf912ff23ec471fa2e268b661b3d693)
* 数据标注踩坑分享
![](https://ai-studio-static-online.cdn.bcebos.com/89f42eccd600439fa9e28c97ccb663726e4e54ce3a854825b4c3b7d554ea21df)
​ 注:两者只有标注有差别,训练参数数据集都相同
## 4 . 项目实践
AIStudio项目链接:[快速构建卡证类OCR](https://aistudio.baidu.com/aistudio/projectdetail/4459116)
### 4.1 环境准备
1)拉取[paddleocr](https://github.com/PaddlePaddle/PaddleOCR)项目,如果从github上拉取速度慢可以选择从gitee上获取。
```
!git clone https://github.com/PaddlePaddle/PaddleOCR.git -b release/2.6 /home/aistudio/work/
```
2)获取并解压预训练模型,如果要使用其他模型可以从模型库里自主选择合适模型。
```
!wget -P work/pre_trained/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
!tar -vxf /home/aistudio/work/pre_trained/ch_PP-OCRv3_det_distill_train.tar -C /home/aistudio/work/pre_trained
```
3) 安装必要依赖
```
!pip install -r /home/aistudio/work/requirements.txt
```
### 4.2 配置文件修改
修改配置文件 *work/configs/det/detmv3db.yml*
具体修改说明如下:
![](https://ai-studio-static-online.cdn.bcebos.com/fcdf517af5a6466294d72db7450209378d8efd9b77764e329d3f2aff3579a20c)
注:在上述的配置文件的Global变量中需要添加以下两个参数:
​ label_list 为标签表
​ num_classes 为分类数
​ 上述两个参数根据实际的情况配置即可
![](https://ai-studio-static-online.cdn.bcebos.com/0b056be24f374812b61abf43305774767ae122c8479242f98aa0799b7bfc81d4)
其中lable_list内容如下例所示,***建议第一个参数设置为 background,不要设置为实际要提取的关键信息种类***
![](https://ai-studio-static-online.cdn.bcebos.com/9fc78bbcdf754898b9b2c7f000ddf562afac786482ab4f2ab063e2242faa542a)
配置文件中的其他设置说明
![](https://ai-studio-static-online.cdn.bcebos.com/c7fc5e631dd44bc8b714630f4e49d9155a831d9e56c64e2482ded87081d0db22)
![](https://ai-studio-static-online.cdn.bcebos.com/8d1022ac25d9474daa4fb236235bd58760039d58ad46414f841559d68e0d057f)
![](https://ai-studio-static-online.cdn.bcebos.com/ee927ad9ebd442bb96f163a7ebbf4bc95e6bedee97324a51887cf82de0851fd3)
### 4.3 代码修改
#### 4.3.1 数据读取
* 修改 PaddleOCR/ppocr/data/imaug/label_ops.py中的DetLabelEncode
```python
class DetLabelEncode(object):
# 修改检测标签的编码处,新增了参数分类数:num_classes,重写初始化方法,以及分类标签的读取
def __init__(self, label_list, num_classes=8, **kwargs):
self.num_classes = num_classes
self.label_list = []
if label_list:
if isinstance(label_list, str):
with open(label_list, 'r+', encoding='utf-8') as f:
for line in f.readlines():
self.label_list.append(line.replace("\n", ""))
else:
self.label_list = label_list
else:
assert ' please check label_list whether it is none or config is right'
if num_classes != len(self.label_list): # 校验分类数和标签的一致性
assert 'label_list length is not equal to the num_classes'
def __call__(self, data):
label = data['label']
label = json.loads(label)
nBox = len(label)
boxes, txts, txt_tags, classes = [], [], [], []
for bno in range(0, nBox):
box = label[bno]['points']
txt = label[bno]['key_cls'] # 此处将kie中的参数作为分类读取
boxes.append(box)
txts.append(txt)
if txt in ['*', '###']:
txt_tags.append(True)
if self.num_classes > 1:
classes.append(-2)
else:
txt_tags.append(False)
if self.num_classes > 1: # 将KIE内容的key标签作为分类标签使用
classes.append(int(self.label_list.index(txt)))
if len(boxes) == 0:
return None
boxes = self.expand_points_num(boxes)
boxes = np.array(boxes, dtype=np.float32)
txt_tags = np.array(txt_tags, dtype=np.bool_)
classes = classes
data['polys'] = boxes
data['texts'] = txts
data['ignore_tags'] = txt_tags
if self.num_classes > 1:
data['classes'] = classes
return data
```
* 修改 PaddleOCR/ppocr/data/imaug/make_shrink_map.py中的MakeShrinkMap类。这里需要注意的是,如果我们设置的label_list中的第一个参数为要检测的信息那么会得到如下的mask,
举例说明:
这是检测的mask图,图中有四个mask那么实际对应的分类应该是4类
![](https://ai-studio-static-online.cdn.bcebos.com/42d2188d3d6b498880952e12c3ceae1efabf135f8d9f4c31823f09ebe02ba9d2)
label_list中第一个为关键分类,则得到的分类Mask实际如下,与上图相比,少了一个box:
![](https://ai-studio-static-online.cdn.bcebos.com/864604967256461aa7c5d32cd240645e9f4c70af773341d5911f22d5a3e87b5f)
```python
class MakeShrinkMap(object):
r'''
Making binary mask from detection data with ICDAR format.
Typically following the process of class `MakeICDARData`.
'''
def __init__(self, min_text_size=8, shrink_ratio=0.4, num_classes=8, **kwargs):
self.min_text_size = min_text_size
self.shrink_ratio = shrink_ratio
self.num_classes = num_classes # 添加了分类
def __call__(self, data):
image = data['image']
text_polys = data['polys']
ignore_tags = data['ignore_tags']
if self.num_classes > 1:
classes = data['classes']
h, w = image.shape[:2]
text_polys, ignore_tags = self.validate_polygons(text_polys,
ignore_tags, h, w)
gt = np.zeros((h, w), dtype=np.float32)
mask = np.ones((h, w), dtype=np.float32)
gt_class = np.zeros((h, w), dtype=np.float32) # 新增分类
for i in range(len(text_polys)):
polygon = text_polys[i]
height = max(polygon[:, 1]) - min(polygon[:, 1])
width = max(polygon[:, 0]) - min(polygon[:, 0])
if ignore_tags[i] or min(height, width) < self.min_text_size:
cv2.fillPoly(mask,
polygon.astype(np.int32)[np.newaxis, :, :], 0)
ignore_tags[i] = True
else:
polygon_shape = Polygon(polygon)
subject = [tuple(l) for l in polygon]
padding = pyclipper.PyclipperOffset()
padding.AddPath(subject, pyclipper.JT_ROUND,
pyclipper.ET_CLOSEDPOLYGON)
shrinked = []
# Increase the shrink ratio every time we get multiple polygon returned back
possible_ratios = np.arange(self.shrink_ratio, 1,
self.shrink_ratio)
np.append(possible_ratios, 1)
for ratio in possible_ratios:
distance = polygon_shape.area * (
1 - np.power(ratio, 2)) / polygon_shape.length
shrinked = padding.Execute(-distance)
if len(shrinked) == 1:
break
if shrinked == []:
cv2.fillPoly(mask,
polygon.astype(np.int32)[np.newaxis, :, :], 0)
ignore_tags[i] = True
continue
for each_shirnk in shrinked:
shirnk = np.array(each_shirnk).reshape(-1, 2)
cv2.fillPoly(gt, [shirnk.astype(np.int32)], 1)
if self.num_classes > 1: # 绘制分类的mask
cv2.fillPoly(gt_class, polygon.astype(np.int32)[np.newaxis, :, :], classes[i])
data['shrink_map'] = gt
if self.num_classes > 1:
data['class_mask'] = gt_class
data['shrink_mask'] = mask
return data
```
由于在训练数据中会对数据进行resize设置,yml中的操作为:EastRandomCropData,所以需要修改PaddleOCR/ppocr/data/imaug/random_crop_data.py中的EastRandomCropData
```python
class EastRandomCropData(object):
def __init__(self,
size=(640, 640),
max_tries=10,
min_crop_side_ratio=0.1,
keep_ratio=True,
num_classes=8,
**kwargs):
self.size = size
self.max_tries = max_tries
self.min_crop_side_ratio = min_crop_side_ratio
self.keep_ratio = keep_ratio
self.num_classes = num_classes
def __call__(self, data):
img = data['image']
text_polys = data['polys']
ignore_tags = data['ignore_tags']
texts = data['texts']
if self.num_classes > 1:
classes = data['classes']
all_care_polys = [
text_polys[i] for i, tag in enumerate(ignore_tags) if not tag
]
# 计算crop区域
crop_x, crop_y, crop_w, crop_h = crop_area(
img, all_care_polys, self.min_crop_side_ratio, self.max_tries)
# crop 图片 保持比例填充
scale_w = self.size[0] / crop_w
scale_h = self.size[1] / crop_h
scale = min(scale_w, scale_h)
h = int(crop_h * scale)
w = int(crop_w * scale)
if self.keep_ratio:
padimg = np.zeros((self.size[1], self.size[0], img.shape[2]),
img.dtype)
padimg[:h, :w] = cv2.resize(
img[crop_y:crop_y + crop_h, crop_x:crop_x + crop_w], (w, h))
img = padimg
else:
img = cv2.resize(
img[crop_y:crop_y + crop_h, crop_x:crop_x + crop_w],
tuple(self.size))
# crop 文本框
text_polys_crop = []
ignore_tags_crop = []
texts_crop = []
classes_crop = []
for poly, text, tag,class_index in zip(text_polys, texts, ignore_tags,classes):
poly = ((poly - (crop_x, crop_y)) * scale).tolist()
if not is_poly_outside_rect(poly, 0, 0, w, h):
text_polys_crop.append(poly)
ignore_tags_crop.append(tag)
texts_crop.append(text)
if self.num_classes > 1:
classes_crop.append(class_index)
data['image'] = img
data['polys'] = np.array(text_polys_crop)
data['ignore_tags'] = ignore_tags_crop
data['texts'] = texts_crop
if self.num_classes > 1:
data['classes'] = classes_crop
return data
```
#### 4.3.2 head修改
主要修改 ppocr/modeling/heads/det_db_head.py,将Head类中的最后一层的输出修改为实际的分类数,同时在DBHead中新增分类的head。
![](https://ai-studio-static-online.cdn.bcebos.com/0e25da2ccded4af19e95c85c3d3287ab4d53e31a4eed4607b6a4cb637c43f6d3)
#### 4.3.3 修改loss
修改PaddleOCR/ppocr/losses/det_db_loss.py中的DBLoss类,分类采用交叉熵损失函数进行计算。
![](https://ai-studio-static-online.cdn.bcebos.com/dc10a070018d4d27946c26ec24a2a85bc3f16422f4964f72a9b63c6170d954e1)
#### 4.3.4 后处理
由于涉及到eval以及后续推理能否正常使用,我们需要修改后处理的相关代码,修改位置 PaddleOCR/ppocr/postprocess/db_postprocess.py中的DBPostProcess类
```python
class DBPostProcess(object):
"""
The post process for Differentiable Binarization (DB).
"""
def __init__(self,
thresh=0.3,
box_thresh=0.7,
max_candidates=1000,
unclip_ratio=2.0,
use_dilation=False,
score_mode="fast",
**kwargs):
self.thresh = thresh
self.box_thresh = box_thresh
self.max_candidates = max_candidates
self.unclip_ratio = unclip_ratio
self.min_size = 3
self.score_mode = score_mode
assert score_mode in [
"slow", "fast"
], "Score mode must be in [slow, fast] but got: {}".format(score_mode)
self.dilation_kernel = None if not use_dilation else np.array(
[[1, 1], [1, 1]])
def boxes_from_bitmap(self, pred, _bitmap, classes, dest_width, dest_height):
"""
_bitmap: single map with shape (1, H, W),
whose values are binarized as {0, 1}
"""
bitmap = _bitmap
height, width = bitmap.shape
outs = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST,
cv2.CHAIN_APPROX_SIMPLE)
if len(outs) == 3:
img, contours, _ = outs[0], outs[1], outs[2]
elif len(outs) == 2:
contours, _ = outs[0], outs[1]
num_contours = min(len(contours), self.max_candidates)
boxes = []
scores = []
class_indexes = []
class_scores = []
for index in range(num_contours):
contour = contours[index]
points, sside = self.get_mini_boxes(contour)
if sside < self.min_size:
continue
points = np.array(points)
if self.score_mode == "fast":
score, class_index, class_score = self.box_score_fast(pred, points.reshape(-1, 2), classes)
else:
score, class_index, class_score = self.box_score_slow(pred, contour, classes)
if self.box_thresh > score:
continue
box = self.unclip(points).reshape(-1, 1, 2)
box, sside = self.get_mini_boxes(box)
if sside < self.min_size + 2:
continue
box = np.array(box)
box[:, 0] = np.clip(
np.round(box[:, 0] / width * dest_width), 0, dest_width)
box[:, 1] = np.clip(
np.round(box[:, 1] / height * dest_height), 0, dest_height)
boxes.append(box.astype(np.int16))
scores.append(score)
class_indexes.append(class_index)
class_scores.append(class_score)
if classes is None:
return np.array(boxes, dtype=np.int16), scores
else:
return np.array(boxes, dtype=np.int16), scores, class_indexes, class_scores
def unclip(self, box):
unclip_ratio = self.unclip_ratio
poly = Polygon(box)
distance = poly.area * unclip_ratio / poly.length
offset = pyclipper.PyclipperOffset()
offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.array(offset.Execute(distance))
return expanded
def get_mini_boxes(self, contour):
bounding_box = cv2.minAreaRect(contour)
points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
index_1, index_2, index_3, index_4 = 0, 1, 2, 3
if points[1][1] > points[0][1]:
index_1 = 0
index_4 = 1
else:
index_1 = 1
index_4 = 0
if points[3][1] > points[2][1]:
index_2 = 2
index_3 = 3
else:
index_2 = 3
index_3 = 2
box = [
points[index_1], points[index_2], points[index_3], points[index_4]
]
return box, min(bounding_box[1])
def box_score_fast(self, bitmap, _box, classes):
'''
box_score_fast: use bbox mean score as the mean score
'''
h, w = bitmap.shape[:2]
box = _box.copy()
xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int32), 0, w - 1)
xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int32), 0, w - 1)
ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int32), 0, h - 1)
ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int32), 0, h - 1)
mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
box[:, 0] = box[:, 0] - xmin
box[:, 1] = box[:, 1] - ymin
cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
if classes is None:
return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0], None, None
else:
k = 999
class_mask = np.full((ymax - ymin + 1, xmax - xmin + 1), k, dtype=np.int32)
cv2.fillPoly(class_mask, box.reshape(1, -1, 2).astype(np.int32), 0)
classes = classes[ymin:ymax + 1, xmin:xmax + 1]
new_classes = classes + class_mask
a = new_classes.reshape(-1)
b = np.where(a >= k)
classes = np.delete(a, b[0].tolist())
class_index = np.argmax(np.bincount(classes))
class_score = np.sum(classes == class_index) / len(classes)
return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0], class_index, class_score
def box_score_slow(self, bitmap, contour, classes):
"""
box_score_slow: use polyon mean score as the mean score
"""
h, w = bitmap.shape[:2]
contour = contour.copy()
contour = np.reshape(contour, (-1, 2))
xmin = np.clip(np.min(contour[:, 0]), 0, w - 1)
xmax = np.clip(np.max(contour[:, 0]), 0, w - 1)
ymin = np.clip(np.min(contour[:, 1]), 0, h - 1)
ymax = np.clip(np.max(contour[:, 1]), 0, h - 1)
mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
contour[:, 0] = contour[:, 0] - xmin
contour[:, 1] = contour[:, 1] - ymin
cv2.fillPoly(mask, contour.reshape(1, -1, 2).astype(np.int32), 1)
if classes is None:
return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0], None, None
else:
k = 999
class_mask = np.full((ymax - ymin + 1, xmax - xmin + 1), k, dtype=np.int32)
cv2.fillPoly(class_mask, contour.reshape(1, -1, 2).astype(np.int32), 0)
classes = classes[ymin:ymax + 1, xmin:xmax + 1]
new_classes = classes + class_mask
a = new_classes.reshape(-1)
b = np.where(a >= k)
classes = np.delete(a, b[0].tolist())
class_index = np.argmax(np.bincount(classes))
class_score = np.sum(classes == class_index) / len(classes)
return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0], class_index, class_score
def __call__(self, outs_dict, shape_list):
pred = outs_dict['maps']
if isinstance(pred, paddle.Tensor):
pred = pred.numpy()
pred = pred[:, 0, :, :]
segmentation = pred > self.thresh
if "classes" in outs_dict:
classes = outs_dict['classes']
if isinstance(classes, paddle.Tensor):
classes = classes.numpy()
classes = classes[:, 0, :, :]
else:
classes = None
boxes_batch = []
for batch_index in range(pred.shape[0]):
src_h, src_w, ratio_h, ratio_w = shape_list[batch_index]
if self.dilation_kernel is not None:
mask = cv2.dilate(
np.array(segmentation[batch_index]).astype(np.uint8),
self.dilation_kernel)
else:
mask = segmentation[batch_index]
if classes is None:
boxes, scores = self.boxes_from_bitmap(pred[batch_index], mask, None,
src_w, src_h)
boxes_batch.append({'points': boxes})
else:
boxes, scores, class_indexes, class_scores = self.boxes_from_bitmap(pred[batch_index], mask,
classes[batch_index],
src_w, src_h)
boxes_batch.append({'points': boxes, "classes": class_indexes, "class_scores": class_scores})
return boxes_batch
```
### 4.4. 模型启动
在完成上述步骤后我们就可以正常启动训练
```
!python /home/aistudio/work/PaddleOCR/tools/train.py -c /home/aistudio/work/PaddleOCR/configs/det/det_mv3_db.yml
```
其他命令:
```
!python /home/aistudio/work/PaddleOCR/tools/eval.py -c /home/aistudio/work/PaddleOCR/configs/det/det_mv3_db.yml
!python /home/aistudio/work/PaddleOCR/tools/infer_det.py -c /home/aistudio/work/PaddleOCR/configs/det/det_mv3_db.yml
```
模型推理
```
!python /home/aistudio/work/PaddleOCR/tools/infer/predict_det.py --image_dir="/home/aistudio/work/test_img/" --det_model_dir="/home/aistudio/work/PaddleOCR/output/infer"
```
## 5 总结
1. 分类+检测在一定程度上能够缩短用时,具体的模型选取要根据业务场景恰当选择。
2. 数据标注需要多次进行测试调整标注方法,一般进行检测模型微调,需要标注至少上百张。
3. 设置合理的batch_size以及resize大小,同时注意lr设置。
## References
1 https://github.com/PaddlePaddle/PaddleOCR
2 https://github.com/PaddlePaddle/PaddleClas
3 https://blog.csdn.net/YY007H/article/details/124491217
# 基于PP-OCRv3的手写文字识别
- [1. 项目背景及意义](#1-项目背景及意义)
- [2. 项目内容](#2-项目内容)
- [3. PP-OCRv3识别算法介绍](#3-PP-OCRv3识别算法介绍)
- [4. 安装环境](#4-安装环境)
- [5. 数据准备](#5-数据准备)
- [6. 模型训练](#6-模型训练)
- [6.1 下载预训练模型](#61-下载预训练模型)
- [6.2 修改配置文件](#62-修改配置文件)
- [6.3 开始训练](#63-开始训练)
- [7. 模型评估](#7-模型评估)
- [8. 模型导出推理](#8-模型导出推理)
- [8.1 模型导出](#81-模型导出)
- [8.2 模型推理](#82-模型推理)
## 1. 项目背景及意义
目前光学字符识别(OCR)技术在我们的生活当中被广泛使用,但是大多数模型在通用场景下的准确性还有待提高。针对于此我们借助飞桨提供的PaddleOCR套件较容易的实现了在垂类场景下的应用。手写体在日常生活中较为常见,然而手写体的识别却存在着很大的挑战,因为每个人的手写字体风格不一样,这对于视觉模型来说还是相当有挑战的。因此训练一个手写体识别模型具有很好的现实意义。下面给出一些手写体的示例图:
![example](https://ai-studio-static-online.cdn.bcebos.com/7a8865b2836f42d382e7c3fdaedc4d307d797fa2bcd0466e9f8b7705efff5a7b)
## 2. 项目内容
本项目基于PaddleOCR套件,以PP-OCRv3识别模型为基础,针对手写文字识别场景进行优化。
Aistudio项目链接:[OCR手写文字识别](https://aistudio.baidu.com/aistudio/projectdetail/4330587)
## 3. PP-OCRv3识别算法介绍
PP-OCRv3的识别模块是基于文本识别算法[SVTR](https://arxiv.org/abs/2205.00159)优化。SVTR不再采用RNN结构,通过引入Transformers结构更加有效地挖掘文本行图像的上下文信息,从而提升文本识别能力。如下图所示,PP-OCRv3采用了6个优化策略。
![v3_rec](https://ai-studio-static-online.cdn.bcebos.com/d4f5344b5b854d50be738671598a89a45689c6704c4d481fb904dd7cf72f2a1a)
优化策略汇总如下:
* SVTR_LCNet:轻量级文本识别网络
* GTC:Attention指导CTC训练策略
* TextConAug:挖掘文字上下文信息的数据增广策略
* TextRotNet:自监督的预训练模型
* UDML:联合互学习策略
* UIM:无标注数据挖掘方案
详细优化策略描述请参考[PP-OCRv3优化策略](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/PP-OCRv3_introduction.md#3-%E8%AF%86%E5%88%AB%E4%BC%98%E5%8C%96)
## 4. 安装环境
```python
# 首先git官方的PaddleOCR项目,安装需要的依赖
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
pip install -r requirements.txt
```
## 5. 数据准备
本项目使用公开的手写文本识别数据集,包含Chinese OCR, 中科院自动化研究所-手写中文数据集[CASIA-HWDB2.x](http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html),以及由中科院手写数据和网上开源数据合并组合的[数据集](https://aistudio.baidu.com/aistudio/datasetdetail/102884/0)等,该项目已经挂载处理好的数据集,可直接下载使用进行训练。
```python
下载并解压数据
tar -xf hw_data.tar
```
## 6. 模型训练
### 6.1 下载预训练模型
首先需要下载我们需要的PP-OCRv3识别预训练模型,更多选择请自行选择其他的[文字识别模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/models_list.md#2-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E6%A8%A1%E5%9E%8B)
```python
# 使用该指令下载需要的预训练模型
wget -P ./pretrained_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
# 解压预训练模型文件
tar -xf ./pretrained_models/ch_PP-OCRv3_rec_train.tar -C pretrained_models
```
### 6.2 修改配置文件
我们使用`configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml`,主要修改训练轮数和学习率参相关参数,设置预训练模型路径,设置数据集路径。 另外,batch_size可根据自己机器显存大小进行调整。 具体修改如下几个地方:
```
epoch_num: 100 # 训练epoch数
save_model_dir: ./output/ch_PP-OCR_v3_rec
save_epoch_step: 10
eval_batch_step: [0, 100] # 评估间隔,每隔100step评估一次
pretrained_model: ./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy # 预训练模型路径
lr:
name: Cosine # 修改学习率衰减策略为Cosine
learning_rate: 0.0001 # 修改fine-tune的学习率
warmup_epoch: 2 # 修改warmup轮数
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data # 训练集图片路径
ext_op_transform_idx: 1
label_file_list:
- ./train_data/chineseocr-data/rec_hand_line_all_label_train.txt # 训练集标签
- ./train_data/handwrite/HWDB2.0Train_label.txt
- ./train_data/handwrite/HWDB2.1Train_label.txt
- ./train_data/handwrite/HWDB2.2Train_label.txt
- ./train_data/handwrite/hwdb_ic13/handwriting_hwdb_train_labels.txt
- ./train_data/handwrite/HW_Chinese/train_hw.txt
ratio_list:
- 0.1
- 1.0
- 1.0
- 1.0
- 0.02
- 1.0
loader:
shuffle: true
batch_size_per_card: 64
drop_last: true
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data # 测试集图片路径
label_file_list:
- ./train_data/chineseocr-data/rec_hand_line_all_label_val.txt # 测试集标签
- ./train_data/handwrite/HWDB2.0Test_label.txt
- ./train_data/handwrite/HWDB2.1Test_label.txt
- ./train_data/handwrite/HWDB2.2Test_label.txt
- ./train_data/handwrite/hwdb_ic13/handwriting_hwdb_val_labels.txt
- ./train_data/handwrite/HW_Chinese/test_hw.txt
loader:
shuffle: false
drop_last: false
batch_size_per_card: 64
num_workers: 4
```
由于数据集大多是长文本,因此需要**注释**掉下面的数据增广策略,以便训练出更好的模型。
```
- RecConAug:
prob: 0.5
ext_data_num: 2
image_shape: [48, 320, 3]
```
### 6.3 开始训练
我们使用上面修改好的配置文件`configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml`,预训练模型,数据集路径,学习率,训练轮数等都已经设置完毕后,可以使用下面命令开始训练。
```python
# 开始训练识别模型
python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml
```
## 7. 模型评估
在训练之前,我们可以直接使用下面命令来评估预训练模型的效果:
```python
# 评估预训练模型
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy"
```
```
[2022/07/14 10:46:22] ppocr INFO: load pretrain successful from ./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy
eval model:: 100%|████████████████████████████| 687/687 [03:29<00:00, 3.27it/s]
[2022/07/14 10:49:52] ppocr INFO: metric eval ***************
[2022/07/14 10:49:52] ppocr INFO: acc:0.03724954461811258
[2022/07/14 10:49:52] ppocr INFO: norm_edit_dis:0.4859541065843199
[2022/07/14 10:49:52] ppocr INFO: Teacher_acc:0.0371584699368947
[2022/07/14 10:49:52] ppocr INFO: Teacher_norm_edit_dis:0.48718814890536477
[2022/07/14 10:49:52] ppocr INFO: fps:947.8562684823883
```
可以看出,直接加载预训练模型进行评估,效果较差,因为预训练模型并不是基于手写文字进行单独训练的,所以我们需要基于预训练模型进行finetune。
训练完成后,可以进行测试评估,评估命令如下:
```python
# 评估finetune效果
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy"
```
评估结果如下,可以看出识别准确率为54.3%。
```
[2022/07/14 10:54:06] ppocr INFO: metric eval ***************
[2022/07/14 10:54:06] ppocr INFO: acc:0.5430100180913
[2022/07/14 10:54:06] ppocr INFO: norm_edit_dis:0.9203322593158589
[2022/07/14 10:54:06] ppocr INFO: Teacher_acc:0.5401183969626324
[2022/07/14 10:54:06] ppocr INFO: Teacher_norm_edit_dis:0.919827504507755
[2022/07/14 10:54:06] ppocr INFO: fps:928.948733797251
```
如需获取已训练模型,请扫码填写问卷,加入PaddleOCR官方交流群获取全部OCR垂类模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁
<div align="left">
<img src="https://ai-studio-static-online.cdn.bcebos.com/dd721099bd50478f9d5fb13d8dd00fad69c22d6848244fd3a1d3980d7fefc63e" width = "150" height = "150" />
</div>
将下载或训练完成的模型放置在对应目录下即可完成模型推理。
## 8. 模型导出推理
训练完成后,可以将训练模型转换成inference模型。inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。
### 8.1 模型导出
导出命令如下:
```python
# 转化为推理模型
python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy" Global.save_inference_dir="./inference/rec_ppocrv3/"
```
### 8.2 模型推理
导出模型后,可以使用如下命令进行推理预测:
```python
# 推理预测
python tools/infer/predict_rec.py --image_dir="train_data/handwrite/HWDB2.0Test_images/104-P16_4.jpg" --rec_model_dir="./inference/rec_ppocrv3/Student"
```
```
[2022/07/14 10:55:56] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2022/07/14 10:55:58] ppocr INFO: Predicts of train_data/handwrite/HWDB2.0Test_images/104-P16_4.jpg:('品结构,差异化的多品牌渗透使欧莱雅确立了其在中国化妆', 0.9904912114143372)
```
```python
# 可视化文字识别图片
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import os
img_path = 'train_data/handwrite/HWDB2.0Test_images/104-P16_4.jpg'
def vis(img_path):
plt.figure()
image = Image.open(img_path)
plt.imshow(image)
plt.show()
# image = image.resize([208, 208])
vis(img_path)
```
![res](https://ai-studio-static-online.cdn.bcebos.com/ad7c02745491498d82e0ce95f4a274f9b3920b2f467646858709359b7af9d869)
# 金融智能核验:扫描合同关键信息抽取
本案例将使用OCR技术和通用信息抽取技术,实现合同关键信息审核和比对。通过本章的学习,你可以快速掌握:
1. 使用PaddleOCR提取扫描文本内容
2. 使用PaddleNLP抽取自定义信息
点击进入 [AI Studio 项目](https://aistudio.baidu.com/aistudio/projectdetail/4545772)
## 1. 项目背景
合同审核广泛应用于大中型企业、上市公司、证券、基金公司中,是规避风险的重要任务。
- 合同内容对比:合同审核场景中,快速找出不同版本合同修改区域、版本差异;如合同盖章归档场景中有效识别实际签署的纸质合同、电子版合同差异。
- 合规性检查:法务人员进行合同审核,如合同完备性检查、大小写金额检查、签约主体一致性检查、双方权利和义务对等性分析等。
- 风险点识别:通过合同审核可识别事实倾向型风险点和数值计算型风险点等,例如交付地点约定不明、合同总价款不一致、重要条款缺失等风险点。
![](https://ai-studio-static-online.cdn.bcebos.com/d5143df967fa4364a38868793fe7c57b0c0b1213930243babd6ae01423dcbc4d)
传统业务中大多使用人工进行纸质版合同审核,存在成本高,工作量大,效率低的问题,且一旦出错将造成巨额损失。
本项目针对以上场景,使用PaddleOCR+PaddleNLP快速提取文本内容,经过少量数据微调即可准确抽取关键信息,**高效完成合同内容对比、合规性检查、风险点识别等任务,提高效率,降低风险**
![](https://ai-studio-static-online.cdn.bcebos.com/54f3053e6e1b47a39b26e757006fe2c44910d60a3809422ab76c25396b92e69b)
## 2. 解决方案
### 2.1 扫描合同文本内容提取
使用PaddleOCR开源的模型可以快速完成扫描文档的文本内容提取,在清晰文档上识别准确率可达到95%+。下面来快速体验一下:
#### 2.1.1 环境准备
[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)提供了适用于通用场景的高精轻量模型,提供数据预处理-模型推理-后处理全流程,支持pip安装:
```
python -m pip install paddleocr
```
#### 2.1.2 效果测试
使用一张合同图片作为测试样本,感受ppocrv3模型效果:
<img src=https://ai-studio-static-online.cdn.bcebos.com/46258d0dc9dc40bab3ea0e70434e4a905646df8a647f4c49921e217de5142def width=300>
使用中文检测+识别模型提取文本,实例化PaddleOCR类:
```
from paddleocr import PaddleOCR, draw_ocr
# paddleocr目前支持中英文、英文、法语、德语、韩语、日语等80个语种,可以通过修改lang参数进行切换
ocr = PaddleOCR(use_angle_cls=False, lang="ch") # need to run only once to download and load model into memory
```
一行命令启动预测,预测结果包括`检测框``文本识别内容`:
```
img_path = "./test_img/hetong2.jpg"
result = ocr.ocr(img_path, cls=False)
for line in result:
print(line)
# 可视化结果
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.show()
```
#### 2.1.3 图片预处理
通过上图可视化结果可以看到,印章部分造成的文本遮盖,影响了文本识别结果,因此可以考虑通道提取,去除图片中的红色印章:
```
import cv2
import numpy as np
import matplotlib.pyplot as plt
#读入图像,三通道
image=cv2.imread("./test_img/hetong2.jpg",cv2.IMREAD_COLOR) #timg.jpeg
#获得三个通道
Bch,Gch,Rch=cv2.split(image)
#保存三通道图片
cv2.imwrite('blue_channel.jpg',Bch)
cv2.imwrite('green_channel.jpg',Gch)
cv2.imwrite('red_channel.jpg',Rch)
```
#### 2.1.4 合同文本信息提取
经过2.1.3的预处理后,合同照片的红色通道被分离,获得了一张相对更干净的图片,此时可以再次使用ppocr模型提取文本内容:
```
import numpy as np
import cv2
img_path = './red_channel.jpg'
result = ocr.ocr(img_path, cls=False)
# 可视化结果
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./simfang.ttf')
im_show = Image.fromarray(im_show)
vis = np.array(im_show)
im_show.show()
```
忽略检测框内容,提取完整的合同文本:
```
txts = [line[1][0] for line in result]
all_context = "\n".join(txts)
print(all_context)
```
通过以上环节就完成了扫描合同关键信息抽取的第一步:文本内容提取,接下来可以基于识别出的文本内容抽取关键信息
### 2.2 合同关键信息抽取
#### 2.2.1 环境准备
安装PaddleNLP
```
pip install --upgrade pip
pip install --upgrade paddlenlp
```
#### 2.2.2 合同关键信息抽取
PaddleNLP 使用 Taskflow 统一管理多场景任务的预测功能,其中`information_extraction` 通过大量的有标签样本进行训练,在通用的场景中一般可以直接使用,只需更换关键字即可。例如在合同信息抽取中,我们重新定义抽取关键字:
甲方、乙方、币种、金额、付款方式
将使用OCR提取好的文本作为输入,使用三行命令可以对上文中提取到的合同文本进行关键信息抽取:
```
from paddlenlp import Taskflow
schema = ["甲方","乙方","总价"]
ie = Taskflow('information_extraction', schema=schema)
ie.set_schema(schema)
ie(all_context)
```
可以看到UIE模型可以准确的提取出关键信息,用于后续的信息比对或审核。
## 3.效果优化
### 3.1 文本识别后处理调优
实际图片采集过程中,可能出现部分图片弯曲等问题,导致使用默认参数识别文本时存在漏检,影响关键信息获取。
例如下图:
<img src="https://ai-studio-static-online.cdn.bcebos.com/fe350481be0241c58736d487d1bf06c2e65911bf01254a79944be629c4c10091" height="300" width="300">
直接进行预测:
```
img_path = "./test_img/hetong3.jpg"
# 预测结果
result = ocr.ocr(img_path, cls=False)
# 可视化结果
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.show()
```
可视化结果可以看到,弯曲图片存在漏检,一般来说可以通过调整后处理参数解决,无需重新训练模型。漏检问题往往是因为检测模型获得的分割图太小,生成框的得分过低被过滤掉了,通常有两种方式调整参数:
- 开启`use_dilatiion=True` 膨胀分割区域
- 调小`det_db_box_thresh`阈值
```
# 重新实例化 PaddleOCR
ocr = PaddleOCR(use_angle_cls=False, lang="ch", det_db_box_thresh=0.3, use_dilation=True)
# 预测并可视化
img_path = "./test_img/hetong3.jpg"
# 预测结果
result = ocr.ocr(img_path, cls=False)
# 可视化结果
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.show()
```
可以看到漏检问题被很好的解决,提取完整的文本内容:
```
txts = [line[1][0] for line in result]
context = "\n".join(txts)
print(context)
```
### 3.2 关键信息提取调优
UIE通过大量有标签样本进行训练,得到了一个开箱即用的高精模型。 然而针对不同场景,可能会出现部分实体无法被抽取的情况。通常来说有以下几个方法进行效果调优:
- 修改 schema
- 添加正则方法
- 标注小样本微调模型
**修改schema**
Prompt和原文描述越像,抽取效果越好,例如
```
三:合同价格:总价为人民币大写:参拾玖万捌仟伍佰
元,小写:398500.00元。总价中包括站房工程建设、安装
及相关避雷、消防、接地、电力、材料费、检验费、安全、
验收等所需费用及其他相关费用和税金。
```
schema = ["总金额"] 时无法准确抽取,与原文描述差异较大。 修改 schema = ["总价"] 再次尝试:
```
from paddlenlp import Taskflow
# schema = ["总金额"]
schema = ["总价"]
ie = Taskflow('information_extraction', schema=schema)
ie.set_schema(schema)
ie(all_context)
```
**模型微调**
UIE的建模方式主要是通过 `Prompt` 方式来建模, `Prompt` 在小样本上进行微调效果非常有效。详细的数据标注+模型微调步骤可以参考项目:
[PaddleNLP信息抽取技术重磅升级!](https://aistudio.baidu.com/aistudio/projectdetail/3914778?channelType=0&channel=0)
[工单信息抽取](https://aistudio.baidu.com/aistudio/projectdetail/3914778?contributionType=1)
[快递单信息抽取](https://aistudio.baidu.com/aistudio/projectdetail/4038499?contributionType=1)
## 总结
扫描合同的关键信息提取可以使用 PaddleOCR + PaddleNLP 组合实现,两个工具均有以下优势:
* 使用简单:whl包一键安装,3行命令调用
* 效果领先:优秀的模型效果可覆盖几乎全部的应用场景
* 调优成本低:OCR模型可通过后处理参数的调整适配略有偏差的扫描文本, UIE模型可以通过极少的标注样本微调,成本很低。
## 作业
尝试自己解析出 `test_img/homework.png` 扫描合同中的 [甲方、乙方] 关键词:
<img src=https://ai-studio-static-online.cdn.bcebos.com/50a49a3c9f8348bfa04e8c8b97d3cce0d0dd6b14040f43939268d120688ef7ca width=300 hight=400>
更多场景下的垂类模型获取,请扫下图二维码填写问卷,加入PaddleOCR官方交流群获取模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁
<img src=https://ai-studio-static-online.cdn.bcebos.com/606538b59ea845cb99943b1dec6efe724e78f75c1e9c49228c7bf7da9f8837f5 width=300 hight=300>
# 基于PP-OCRv3的液晶屏读数识别
- [1. 项目背景及意义](#1-项目背景及意义)
- [2. 项目内容](#2-项目内容)
- [3. 安装环境](#3-安装环境)
- [4. 文字检测](#4-文字检测)
- [4.1 PP-OCRv3检测算法介绍](#41-PP-OCRv3检测算法介绍)
- [4.2 数据准备](#42-数据准备)
- [4.3 模型训练](#43-模型训练)
- [4.3.1 预训练模型直接评估](#431-预训练模型直接评估)
- [4.3.2 预训练模型直接finetune](#432-预训练模型直接finetune)
- [4.3.3 基于预训练模型Finetune_student模型](#433-基于预训练模型Finetune_student模型)
- [4.3.4 基于预训练模型Finetune_teacher模型](#434-基于预训练模型Finetune_teacher模型)
- [4.3.5 采用CML蒸馏进一步提升student模型精度](#435-采用CML蒸馏进一步提升student模型精度)
- [4.3.6 模型导出推理](#436-4.3.6-模型导出推理)
- [5. 文字识别](#5-文字识别)
- [5.1 PP-OCRv3识别算法介绍](#51-PP-OCRv3识别算法介绍)
- [5.2 数据准备](#52-数据准备)
- [5.3 模型训练](#53-模型训练)
- [5.4 模型导出推理](#54-模型导出推理)
- [6. 系统串联](#6-系统串联)
- [6.1 后处理](#61-后处理)
- [7. PaddleServing部署](#7-PaddleServing部署)
## 1. 项目背景及意义
目前光学字符识别(OCR)技术在我们的生活当中被广泛使用,但是大多数模型在通用场景下的准确性还有待提高,针对于此我们借助飞桨提供的PaddleOCR套件较容易的实现了在垂类场景下的应用。
该项目以国家质量基础(NQI)为准绳,充分利用大数据、云计算、物联网等高新技术,构建覆盖计量端、实验室端、数据端和硬件端的完整计量解决方案,解决传统计量校准中存在的难题,拓宽计量检测服务体系和服务领域;解决无数传接口或数传接口不统一、不公开的计量设备,以及计量设备所处的环境比较恶劣,不适合人工读取数据。通过OCR技术实现远程计量,引领计量行业向智慧计量转型和发展。
## 2. 项目内容
本项目基于PaddleOCR开源套件,以PP-OCRv3检测和识别模型为基础,针对液晶屏读数识别场景进行优化。
Aistudio项目链接:[OCR液晶屏读数识别](https://aistudio.baidu.com/aistudio/projectdetail/4080130)
## 3. 安装环境
```python
# 首先git官方的PaddleOCR项目,安装需要的依赖
# 第一次运行打开该注释
# git clone https://gitee.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
pip install -r requirements.txt
```
## 4. 文字检测
文本检测的任务是定位出输入图像中的文字区域。近年来学术界关于文本检测的研究非常丰富,一类方法将文本检测视为目标检测中的一个特定场景,基于通用目标检测算法进行改进适配,如TextBoxes[1]基于一阶段目标检测器SSD[2]算法,调整目标框使之适合极端长宽比的文本行,CTPN[3]则是基于Faster RCNN[4]架构改进而来。但是文本检测与目标检测在目标信息以及任务本身上仍存在一些区别,如文本一般长宽比较大,往往呈“条状”,文本行之间可能比较密集,弯曲文本等,因此又衍生了很多专用于文本检测的算法。本项目基于PP-OCRv3算法进行优化。
### 4.1 PP-OCRv3检测算法介绍
PP-OCRv3检测模型是对PP-OCRv2中的CML(Collaborative Mutual Learning) 协同互学习文本检测蒸馏策略进行了升级。如下图所示,CML的核心思想结合了①传统的Teacher指导Student的标准蒸馏与 ②Students网络之间的DML互学习,可以让Students网络互学习的同时,Teacher网络予以指导。PP-OCRv3分别针对教师模型和学生模型进行进一步效果优化。其中,在对教师模型优化时,提出了大感受野的PAN结构LK-PAN和引入了DML(Deep Mutual Learning)蒸馏策略;在对学生模型优化时,提出了残差注意力机制的FPN结构RSE-FPN。
![](https://ai-studio-static-online.cdn.bcebos.com/c306b2f028364805a55494d435ab553a76cf5ae5dd3f4649a948ea9aeaeb28b8)
详细优化策略描述请参考[PP-OCRv3优化策略](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/PP-OCRv3_introduction.md#2)
### 4.2 数据准备
[计量设备屏幕字符检测数据集](https://aistudio.baidu.com/aistudio/datasetdetail/127845)数据来源于实际项目中各种计量设备的数显屏,以及在网上搜集的一些其他数显屏,包含训练集755张,测试集355张。
```python
# 在PaddleOCR下创建新的文件夹train_data
mkdir train_data
# 下载数据集并解压到指定路径下
unzip icdar2015.zip -d train_data
```
```python
# 随机查看文字检测数据集图片
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import os
train = './train_data/icdar2015/text_localization/test'
# 从指定目录中选取一张图片
def get_one_image(train):
plt.figure()
files = os.listdir(train)
n = len(files)
ind = np.random.randint(0,n)
img_dir = os.path.join(train,files[ind])
image = Image.open(img_dir)
plt.imshow(image)
plt.show()
image = image.resize([208, 208])
get_one_image(train)
```
![det_png](https://ai-studio-static-online.cdn.bcebos.com/0639da09b774458096ae577e82b2c59e89ced6a00f55458f946997ab7472a4f8)
### 4.3 模型训练
#### 4.3.1 预训练模型直接评估
下载我们需要的PP-OCRv3检测预训练模型,更多选择请自行选择其他的[文字检测模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/models_list.md#1-%E6%96%87%E6%9C%AC%E6%A3%80%E6%B5%8B%E6%A8%A1%E5%9E%8B)
```python
#使用该指令下载需要的预训练模型
wget -P ./pretrained_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
# 解压预训练模型文件
tar -xf ./pretrained_models/ch_PP-OCRv3_det_distill_train.tar -C pretrained_models
```
在训练之前,我们可以直接使用下面命令来评估预训练模型的效果:
```python
# 评估预训练模型
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy"
```
结果如下:
| | 方案 |hmeans|
|---|---------------------------|---|
| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.50%|
#### 4.3.2 预训练模型直接finetune
##### 修改配置文件
我们使用configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml,主要修改训练轮数和学习率参相关参数,设置预训练模型路径,设置数据集路径。 另外,batch_size可根据自己机器显存大小进行调整。 具体修改如下几个地方:
```
epoch:100
save_epoch_step:10
eval_batch_step:[0, 50]
save_model_dir: ./output/ch_PP-OCR_v3_det/
pretrained_model: ./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy
learning_rate: 0.00025
num_workers: 0 # 如果单卡训练,建议将Train和Eval的loader部分的num_workers设置为0,否则会出现`/dev/shm insufficient`的报错
```
##### 开始训练
使用我们上面修改的配置文件configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml,训练命令如下:
```python
# 开始训练模型
python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy
```
评估训练好的模型:
```python
# 评估训练好的模型
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det/best_accuracy"
```
结果如下:
| | 方案 |hmeans|
|---|---------------------------|---|
| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.50%|
| 1 | PP-OCRv3中英文超轻量检测预训练模型fintune |65.20%|
#### 4.3.3 基于预训练模型Finetune_student模型
我们使用configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml,主要修改训练轮数和学习率参相关参数,设置预训练模型路径,设置数据集路径。 另外,batch_size可根据自己机器显存大小进行调整。 具体修改如下几个地方:
```
epoch:100
save_epoch_step:10
eval_batch_step:[0, 50]
save_model_dir: ./output/ch_PP-OCR_v3_det_student/
pretrained_model: ./pretrained_models/ch_PP-OCRv3_det_distill_train/student
learning_rate: 0.00025
num_workers: 0 # 如果单卡训练,建议将Train和Eval的loader部分的num_workers设置为0,否则会出现`/dev/shm insufficient`的报错
```
训练命令如下:
```python
python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/student
```
评估训练好的模型:
```python
# 评估训练好的模型
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_student/best_accuracy"
```
结果如下:
| | 方案 |hmeans|
|---|---------------------------|---|
| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.50%|
| 1 | PP-OCRv3中英文超轻量检测预训练模型fintune |65.20%|
| 2 | PP-OCRv3中英文超轻量检测预训练模型fintune学生模型 |80.00%|
#### 4.3.4 基于预训练模型Finetune_teacher模型
首先需要从提供的预训练模型best_accuracy.pdparams中提取teacher参数,组合成适合dml训练的初始化模型,提取代码如下:
```python
cd ./pretrained_models/
# transform teacher params in best_accuracy.pdparams into teacher_dml.paramers
import paddle
# load pretrained model
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# print(all_params.keys())
# keep teacher params
t_params = {key[len("Teacher."):]: all_params[key] for key in all_params if "Teacher." in key}
# print(t_params.keys())
s_params = {"Student." + key: t_params[key] for key in t_params}
s2_params = {"Student2." + key: t_params[key] for key in t_params}
s_params = {**s_params, **s2_params}
# print(s_params.keys())
paddle.save(s_params, "ch_PP-OCRv3_det_distill_train/teacher_dml.pdparams")
```
我们使用configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml,主要修改训练轮数和学习率参相关参数,设置预训练模型路径,设置数据集路径。 另外,batch_size可根据自己机器显存大小进行调整。 具体修改如下几个地方:
```
epoch:100
save_epoch_step:10
eval_batch_step:[0, 50]
save_model_dir: ./output/ch_PP-OCR_v3_det_teacher/
pretrained_model: ./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_dml
learning_rate: 0.00025
num_workers: 0 # 如果单卡训练,建议将Train和Eval的loader部分的num_workers设置为0,否则会出现`/dev/shm insufficient`的报错
```
训练命令如下:
```python
python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_dml
```
评估训练好的模型:
```python
# 评估训练好的模型
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_teacher/best_accuracy"
```
结果如下:
| | 方案 |hmeans|
|---|---------------------------|---|
| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.50%|
| 1 | PP-OCRv3中英文超轻量检测预训练模型fintune |65.20%|
| 2 | PP-OCRv3中英文超轻量检测预训练模型fintune学生模型 |80.00%|
| 3 | PP-OCRv3中英文超轻量检测预训练模型fintune教师模型 |84.80%|
#### 4.3.5 采用CML蒸馏进一步提升student模型精度
需要从4.3.3和4.3.4训练得到的best_accuracy.pdparams中提取各自代表student和teacher的参数,组合成适合cml训练的初始化模型,提取代码如下:
```python
# transform teacher params and student parameters into cml model
import paddle
all_params = paddle.load("./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# print(all_params.keys())
t_params = paddle.load("./output/ch_PP-OCR_v3_det_teacher/best_accuracy.pdparams")
# print(t_params.keys())
s_params = paddle.load("./output/ch_PP-OCR_v3_det_student/best_accuracy.pdparams")
# print(s_params.keys())
for key in all_params:
# teacher is OK
if "Teacher." in key:
new_key = key.replace("Teacher", "Student")
#print("{} >> {}\n".format(key, new_key))
assert all_params[key].shape == t_params[new_key].shape
all_params[key] = t_params[new_key]
if "Student." in key:
new_key = key.replace("Student.", "")
#print("{} >> {}\n".format(key, new_key))
assert all_params[key].shape == s_params[new_key].shape
all_params[key] = s_params[new_key]
if "Student2." in key:
new_key = key.replace("Student2.", "")
print("{} >> {}\n".format(key, new_key))
assert all_params[key].shape == s_params[new_key].shape
all_params[key] = s_params[new_key]
paddle.save(all_params, "./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_cml_student.pdparams")
```
训练命令如下:
```python
python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_cml_student Global.save_model_dir=./output/ch_PP-OCR_v3_det_finetune/
```
评估训练好的模型:
```python
# 评估训练好的模型
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_finetune/best_accuracy"
```
结果如下:
| | 方案 |hmeans|
|---|---------------------------|---|
| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 |47.50%|
| 1 | PP-OCRv3中英文超轻量检测预训练模型fintune |65.20%|
| 2 | PP-OCRv3中英文超轻量检测预训练模型fintune学生模型 |80.00%|
| 3 | PP-OCRv3中英文超轻量检测预训练模型fintune教师模型 |84.80%|
| 4 | 基于2和3训练好的模型fintune |82.70%|
如需获取已训练模型,请扫码填写问卷,加入PaddleOCR官方交流群获取全部OCR垂类模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁
<div align="left">
<img src="https://ai-studio-static-online.cdn.bcebos.com/dd721099bd50478f9d5fb13d8dd00fad69c22d6848244fd3a1d3980d7fefc63e" width = "150" height = "150" />
</div>
将下载或训练完成的模型放置在对应目录下即可完成模型推理。
#### 4.3.6 模型导出推理
训练完成后,可以将训练模型转换成inference模型。inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。
##### 4.3.6.1 模型导出
导出命令如下:
```python
# 转化为推理模型
python tools/export_model.py \
-c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Global.pretrained_model=./output/ch_PP-OCR_v3_det_finetune/best_accuracy \
-o Global.save_inference_dir="./inference/det_ppocrv3"
```
##### 4.3.6.2 模型推理
导出模型后,可以使用如下命令进行推理预测:
```python
# 推理预测
python tools/infer/predict_det.py --image_dir="train_data/icdar2015/text_localization/test/1.jpg" --det_model_dir="./inference/det_ppocrv3/Student"
```
## 5. 文字识别
文本识别的任务是识别出图像中的文字内容,一般输入来自于文本检测得到的文本框截取出的图像文字区域。文本识别一般可以根据待识别文本形状分为规则文本识别和不规则文本识别两大类。规则文本主要指印刷字体、扫描文本等,文本大致处在水平线位置;不规则文本往往不在水平位置,存在弯曲、遮挡、模糊等问题。不规则文本场景具有很大的挑战性,也是目前文本识别领域的主要研究方向。本项目基于PP-OCRv3算法进行优化。
### 5.1 PP-OCRv3识别算法介绍
PP-OCRv3的识别模块是基于文本识别算法[SVTR](https://arxiv.org/abs/2205.00159)优化。SVTR不再采用RNN结构,通过引入Transformers结构更加有效地挖掘文本行图像的上下文信息,从而提升文本识别能力。如下图所示,PP-OCRv3采用了6个优化策略。
![](https://ai-studio-static-online.cdn.bcebos.com/d4f5344b5b854d50be738671598a89a45689c6704c4d481fb904dd7cf72f2a1a)
优化策略汇总如下:
* SVTR_LCNet:轻量级文本识别网络
* GTC:Attention指导CTC训练策略
* TextConAug:挖掘文字上下文信息的数据增广策略
* TextRotNet:自监督的预训练模型
* UDML:联合互学习策略
* UIM:无标注数据挖掘方案
详细优化策略描述请参考[PP-OCRv3优化策略](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/PP-OCRv3_introduction.md#3-%E8%AF%86%E5%88%AB%E4%BC%98%E5%8C%96)
### 5.2 数据准备
[计量设备屏幕字符识别数据集](https://aistudio.baidu.com/aistudio/datasetdetail/128714)数据来源于实际项目中各种计量设备的数显屏,以及在网上搜集的一些其他数显屏,包含训练集19912张,测试集4099张。
```python
# 解压下载的数据集到指定路径下
unzip ic15_data.zip -d train_data
```
```python
# 随机查看文字检测数据集图片
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import os
train = './train_data/ic15_data/train'
# 从指定目录中选取一张图片
def get_one_image(train):
plt.figure()
files = os.listdir(train)
n = len(files)
ind = np.random.randint(0,n)
img_dir = os.path.join(train,files[ind])
image = Image.open(img_dir)
plt.imshow(image)
plt.show()
image = image.resize([208, 208])
get_one_image(train)
```
![rec_png](https://ai-studio-static-online.cdn.bcebos.com/3de0d475c69746d0a184029001ef07c85fd68816d66d4beaa10e6ef60030f9b4)
### 5.3 模型训练
#### 下载预训练模型
下载我们需要的PP-OCRv3识别预训练模型,更多选择请自行选择其他的[文字识别模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/models_list.md#2-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E6%A8%A1%E5%9E%8B)
```python
# 使用该指令下载需要的预训练模型
wget -P ./pretrained_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
# 解压预训练模型文件
tar -xf ./pretrained_models/ch_PP-OCRv3_rec_train.tar -C pretrained_models
```
#### 修改配置文件
我们使用configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml,主要修改训练轮数和学习率参相关参数,设置预训练模型路径,设置数据集路径。 另外,batch_size可根据自己机器显存大小进行调整。 具体修改如下几个地方:
```
epoch_num: 100 # 训练epoch数
save_model_dir: ./output/ch_PP-OCR_v3_rec
save_epoch_step: 10
eval_batch_step: [0, 100] # 评估间隔,每隔100step评估一次
cal_metric_during_train: true
pretrained_model: ./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy # 预训练模型路径
character_dict_path: ppocr/utils/ppocr_keys_v1.txt
use_space_char: true # 使用空格
lr:
name: Cosine # 修改学习率衰减策略为Cosine
learning_rate: 0.0002 # 修改fine-tune的学习率
warmup_epoch: 2 # 修改warmup轮数
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/ic15_data/ # 训练集图片路径
ext_op_transform_idx: 1
label_file_list:
- ./train_data/ic15_data/rec_gt_train.txt # 训练集标签
ratio_list:
- 1.0
loader:
shuffle: true
batch_size_per_card: 64
drop_last: true
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/ic15_data/ # 测试集图片路径
label_file_list:
- ./train_data/ic15_data/rec_gt_test.txt # 测试集标签
ratio_list:
- 1.0
loader:
shuffle: false
drop_last: false
batch_size_per_card: 64
num_workers: 4
```
在训练之前,我们可以直接使用下面命令来评估预训练模型的效果:
```python
# 评估预训练模型
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy"
```
结果如下:
| | 方案 |accuracy|
|---|---------------------------|---|
| 0 | PP-OCRv3中英文超轻量识别预训练模型直接预测 |70.40%|
#### 开始训练
我们使用上面修改好的配置文件configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml,预训练模型,数据集路径,学习率,训练轮数等都已经设置完毕后,可以使用下面命令开始训练。
```python
# 开始训练识别模型
python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml
```
训练完成后,可以对训练模型中最好的进行测试,评估命令如下:
```python
# 评估finetune效果
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.checkpoints="./output/ch_PP-OCR_v3_rec/best_accuracy"
```
结果如下:
| | 方案 |accuracy|
|---|---------------------------|---|
| 0 | PP-OCRv3中英文超轻量识别预训练模型直接预测 |70.40%|
| 1 | PP-OCRv3中英文超轻量识别预训练模型finetune |82.20%|
如需获取已训练模型,请扫码填写问卷,加入PaddleOCR官方交流群获取全部OCR垂类模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁
<div align="left">
<img src="https://ai-studio-static-online.cdn.bcebos.com/dd721099bd50478f9d5fb13d8dd00fad69c22d6848244fd3a1d3980d7fefc63e" width = "150" height = "150" />
</div>
将下载或训练完成的模型放置在对应目录下即可完成模型推理。
### 5.4 模型导出推理
训练完成后,可以将训练模型转换成inference模型。inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。
#### 模型导出
导出命令如下:
```python
# 转化为推理模型
python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy" Global.save_inference_dir="./inference/rec_ppocrv3/"
```
#### 模型推理
导出模型后,可以使用如下命令进行推理预测
```python
# 推理预测
python tools/infer/predict_rec.py --image_dir="train_data/ic15_data/test/1_crop_0.jpg" --rec_model_dir="./inference/rec_ppocrv3/Student"
```
## 6. 系统串联
我们将上面训练好的检测和识别模型进行系统串联测试,命令如下:
```python
#串联测试
python3 tools/infer/predict_system.py --image_dir="./train_data/icdar2015/text_localization/test/142.jpg" --det_model_dir="./inference/det_ppocrv3/Student" --rec_model_dir="./inference/rec_ppocrv3/Student"
```
测试结果保存在`./inference_results/`目录下,可以用下面代码进行可视化
```python
%cd /home/aistudio/PaddleOCR
# 显示结果
import matplotlib.pyplot as plt
from PIL import Image
img_path= "./inference_results/142.jpg"
img = Image.open(img_path)
plt.figure("test_img", figsize=(30,30))
plt.imshow(img)
plt.show()
```
![sys_res_png](https://ai-studio-static-online.cdn.bcebos.com/901ab741cb46441ebec510b37e63b9d8d1b7c95f63cc4e5e8757f35179ae6373)
### 6.1 后处理
如果需要获取key-value信息,可以基于启发式的规则,将识别结果与关键字库进行匹配;如果匹配上了,则取该字段为key, 后面一个字段为value。
```python
def postprocess(rec_res):
keys = ["型号", "厂家", "版本号", "检定校准分类", "计量器具编号", "烟尘流量",
"累积体积", "烟气温度", "动压", "静压", "时间", "试验台编号", "预测流速",
"全压", "烟温", "流速", "工况流量", "标杆流量", "烟尘直读嘴", "烟尘采样嘴",
"大气压", "计前温度", "计前压力", "干球温度", "湿球温度", "流量", "含湿量"]
key_value = []
if len(rec_res) > 1:
for i in range(len(rec_res) - 1):
rec_str, _ = rec_res[i]
for key in keys:
if rec_str in key:
key_value.append([rec_str, rec_res[i + 1][0]])
break
return key_value
key_value = postprocess(filter_rec_res)
```
## 7. PaddleServing部署
首先需要安装PaddleServing部署相关的环境
```python
python -m pip install paddle-serving-server-gpu
python -m pip install paddle_serving_client
python -m pip install paddle-serving-app
```
### 7.1 转化检测模型
```python
cd deploy/pdserving/
python -m paddle_serving_client.convert --dirname ../../inference/det_ppocrv3/Student/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./ppocr_det_v3_serving/ \
--serving_client ./ppocr_det_v3_client/
```
### 7.2 转化识别模型
```python
python -m paddle_serving_client.convert --dirname ../../inference/rec_ppocrv3/Student \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./ppocr_rec_v3_serving/ \
--serving_client ./ppocr_rec_v3_client/
```
### 7.3 启动服务
首先可以将后处理代码加入到web_service.py中,具体修改如下:
```
# 代码153行后面增加下面代码
def _postprocess(rec_res):
keys = ["型号", "厂家", "版本号", "检定校准分类", "计量器具编号", "烟尘流量",
"累积体积", "烟气温度", "动压", "静压", "时间", "试验台编号", "预测流速",
"全压", "烟温", "流速", "工况流量", "标杆流量", "烟尘直读嘴", "烟尘采样嘴",
"大气压", "计前温度", "计前压力", "干球温度", "湿球温度", "流量", "含湿量"]
key_value = []
if len(rec_res) > 1:
for i in range(len(rec_res) - 1):
rec_str, _ = rec_res[i]
for key in keys:
if rec_str in key:
key_value.append([rec_str, rec_res[i + 1][0]])
break
return key_value
key_value = _postprocess(rec_list)
res = {"result": str(key_value)}
# res = {"result": str(result_list)}
```
启动服务端
```python
python web_service.py 2>&1 >log.txt
```
### 7.4 发送请求
然后再开启一个新的终端,运行下面的客户端代码
```python
python pipeline_http_client.py --image_dir ../../train_data/icdar2015/text_localization/test/142.jpg
```
可以获取到最终的key-value结果:
```
大气压, 100.07kPa
干球温度, 0000℃
计前温度, 0000℃
湿球温度, 0000℃
计前压力, -0000kPa
流量, 00.0L/min
静压, 00000kPa
含湿量, 00.0 %
```
# 一种基于PaddleOCR的轻量级车牌识别模型
- [1. 项目介绍](#1-项目介绍)
- [2. 环境搭建](#2-环境搭建)
- [3. 数据集准备](#3-数据集准备)
- [3.1 数据集标注规则](#31-数据集标注规则)
- [3.2 制作符合PP-OCR训练格式的标注文件](#32-制作符合pp-ocr训练格式的标注文件)
- [4. 实验](#4-实验)
- [4.1 检测](#41-检测)
- [4.1.1 预训练模型直接预测](#411-预训练模型直接预测)
- [4.1.2 CCPD车牌数据集fine-tune](#412-ccpd车牌数据集fine-tune)
- [4.1.3 CCPD车牌数据集fine-tune+量化训练](#413-ccpd车牌数据集fine-tune量化训练)
- [4.1.4 模型导出](#414-模型导出)
- [4.2 识别](#42-识别)
- [4.2.1 预训练模型直接预测](#421-预训练模型直接预测)
- [4.2.2 预训练模型直接预测+改动后处理](#422-预训练模型直接预测改动后处理)
- [4.2.3 CCPD车牌数据集fine-tune](#423-ccpd车牌数据集fine-tune)
- [4.2.4 CCPD车牌数据集fine-tune+量化训练](#424-ccpd车牌数据集fine-tune量化训练)
- [4.2.5 模型导出](#425-模型导出)
- [4.3 计算End2End指标](#43-计算End2End指标)
- [4.4 部署](#44-部署)
- [4.5 实验总结](#45-实验总结)
## 1. 项目介绍
车牌识别(Vehicle License Plate Recognition,VLPR) 是计算机视频图像识别技术在车辆牌照识别中的一种应用。车牌识别技术要求能够将运动中的汽车牌照从复杂背景中提取并识别出来,在高速公路车辆管理,停车场管理和城市交通中得到广泛应用。
本项目难点如下:
1. 车牌在图像中的尺度差异大、在车辆上的悬挂位置不固定
2. 车牌图像质量层次不齐: 角度倾斜、图片模糊、光照不足、过曝等问题严重
3. 边缘和端测场景应用对模型大小有限制,推理速度有要求
针对以上问题, 本例选用 PP-OCRv3 这一开源超轻量OCR系统进行车牌识别系统的开发。基于PP-OCRv3模型,在CCPD数据集达到99%的检测和94%的识别精度,模型大小12.8M(2.5M+10.3M)。基于量化对模型体积进行进一步压缩到5.8M(1M+4.8M), 同时推理速度提升25%。
aistudio项目链接: [基于PaddleOCR的轻量级车牌识别范例](https://aistudio.baidu.com/aistudio/projectdetail/3919091?contributionType=1)
## 2. 环境搭建
本任务基于Aistudio完成, 具体环境如下:
- 操作系统: Linux
- PaddlePaddle: 2.3
- paddleslim: 2.2.2
- PaddleOCR: Release/2.5
下载 PaddleOCR代码
```bash
git clone -b dygraph https://github.com/PaddlePaddle/PaddleOCR
```
安装依赖库
```bash
pip install -r PaddleOCR/requirements.txt
```
## 3. 数据集准备
所使用的数据集为 CCPD2020 新能源车牌数据集,该数据集为
该数据集分布如下:
|数据集类型|数量|
|---|---|
|训练集| 5769|
|验证集| 1001|
|测试集| 5006|
数据集图片示例如下:
![](https://ai-studio-static-online.cdn.bcebos.com/3bce057a8e0c40a0acbd26b2e29e4e2590a31bc412764be7b9e49799c69cb91c)
数据集可以从这里下载 https://aistudio.baidu.com/aistudio/datasetdetail/101595
下载好数据集后对数据集进行解压
```bash
unzip -d /home/aistudio/data /home/aistudio/data/data101595/CCPD2020.zip
```
### 3.1 数据集标注规则
CPPD数据集的图片文件名具有特殊规则,详细可查看:https://github.com/detectRecog/CCPD
具体规则如下:
例如: 025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg
每个名称可以分为七个字段,以-符号作为分割。这些字段解释如下。
- 025:车牌面积与整个图片区域的面积比。025 (25%)
- 95_113:水平倾斜程度和垂直倾斜度。水平 95度 垂直 113度
- 154&383_386&473:左上和右下顶点的坐标。左上(154,383) 右下(386,473)
- 386&473_177&454_154&383_363&402:整个图像中车牌的四个顶点的精确(x,y)坐标。这些坐标从右下角顶点开始。(386,473) (177,454) (154,383) (363,402)
- 0_0_22_27_27_33_16:CCPD中的每个图像只有一个车牌。每个车牌号码由一个汉字,一个字母和五个字母或数字组成。有效的中文车牌由七个字符组成:省(1个字符),字母(1个字符),字母+数字(5个字符)。“ 0_0_22_27_27_33_16”是每个字符的索引。这三个数组定义如下。每个数组的最后一个字符是字母O,而不是数字0。我们将O用作“无字符”的符号,因为中文车牌字符中没有O。因此以上车牌拼起来即为 皖AY339S
- 37:牌照区域的亮度。 37 (37%)
- 15:车牌区域的模糊度。15 (15%)
```python
provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"]
alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W','X', 'Y', 'Z', 'O']
ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X','Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O']
```
### 3.2 制作符合PP-OCR训练格式的标注文件
在开始训练之前,可使用如下代码制作符合PP-OCR训练格式的标注文件。
```python
import cv2
import os
import json
from tqdm import tqdm
import numpy as np
provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"]
alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'O']
ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O']
def make_label(img_dir, save_gt_folder, phase):
crop_img_save_dir = os.path.join(save_gt_folder, phase, 'crop_imgs')
os.makedirs(crop_img_save_dir, exist_ok=True)
f_det = open(os.path.join(save_gt_folder, phase, 'det.txt'), 'w', encoding='utf-8')
f_rec = open(os.path.join(save_gt_folder, phase, 'rec.txt'), 'w', encoding='utf-8')
i = 0
for filename in tqdm(os.listdir(os.path.join(img_dir, phase))):
str_list = filename.split('-')
if len(str_list) < 5:
continue
coord_list = str_list[3].split('_')
txt_list = str_list[4].split('_')
boxes = []
for coord in coord_list:
boxes.append([int(x) for x in coord.split("&")])
boxes = [boxes[2], boxes[3], boxes[0], boxes[1]]
lp_number = provinces[int(txt_list[0])] + alphabets[int(txt_list[1])] + ''.join([ads[int(x)] for x in txt_list[2:]])
# det
det_info = [{'points':boxes, 'transcription':lp_number}]
f_det.write('{}\t{}\n'.format(os.path.join(phase, filename), json.dumps(det_info, ensure_ascii=False)))
# rec
boxes = np.float32(boxes)
img = cv2.imread(os.path.join(img_dir, phase, filename))
# crop_img = img[int(boxes[:,1].min()):int(boxes[:,1].max()),int(boxes[:,0].min()):int(boxes[:,0].max())]
crop_img = get_rotate_crop_image(img, boxes)
crop_img_save_filename = '{}_{}.jpg'.format(i,'_'.join(txt_list))
crop_img_save_path = os.path.join(crop_img_save_dir, crop_img_save_filename)
cv2.imwrite(crop_img_save_path, crop_img)
f_rec.write('{}/crop_imgs/{}\t{}\n'.format(phase, crop_img_save_filename, lp_number))
i+=1
f_det.close()
f_rec.close()
def get_rotate_crop_image(img, points):
'''
img_height, img_width = img.shape[0:2]
left = int(np.min(points[:, 0]))
right = int(np.max(points[:, 0]))
top = int(np.min(points[:, 1]))
bottom = int(np.max(points[:, 1]))
img_crop = img[top:bottom, left:right, :].copy()
points[:, 0] = points[:, 0] - left
points[:, 1] = points[:, 1] - top
'''
assert len(points) == 4, "shape of points must be 4*2"
img_crop_width = int(
max(
np.linalg.norm(points[0] - points[1]),
np.linalg.norm(points[2] - points[3])))
img_crop_height = int(
max(
np.linalg.norm(points[0] - points[3]),
np.linalg.norm(points[1] - points[2])))
pts_std = np.float32([[0, 0], [img_crop_width, 0],
[img_crop_width, img_crop_height],
[0, img_crop_height]])
M = cv2.getPerspectiveTransform(points, pts_std)
dst_img = cv2.warpPerspective(
img,
M, (img_crop_width, img_crop_height),
borderMode=cv2.BORDER_REPLICATE,
flags=cv2.INTER_CUBIC)
dst_img_height, dst_img_width = dst_img.shape[0:2]
if dst_img_height * 1.0 / dst_img_width >= 1.5:
dst_img = np.rot90(dst_img)
return dst_img
img_dir = '/home/aistudio/data/CCPD2020/ccpd_green'
save_gt_folder = '/home/aistudio/data/CCPD2020/PPOCR'
# phase = 'train' # change to val and test to make val dataset and test dataset
for phase in ['train','val','test']:
make_label(img_dir, save_gt_folder, phase)
```
通过上述命令可以完成了`训练集``验证集``测试集`的制作,制作完成的数据集信息如下:
| 类型 | 数据集 | 图片地址 | 标签地址 | 图片数量 |
| --- | --- | --- | --- | --- |
| 检测 | 训练集 | /home/aistudio/data/CCPD2020/ccpd_green/train | /home/aistudio/data/CCPD2020/PPOCR/train/det.txt | 5769 |
| 检测 | 验证集 | /home/aistudio/data/CCPD2020/ccpd_green/val | /home/aistudio/data/CCPD2020/PPOCR/val/det.txt | 1001 |
| 检测 | 测试集 | /home/aistudio/data/CCPD2020/ccpd_green/test | /home/aistudio/data/CCPD2020/PPOCR/test/det.txt | 5006 |
| 识别 | 训练集 | /home/aistudio/data/CCPD2020/PPOCR/train/crop_imgs | /home/aistudio/data/CCPD2020/PPOCR/train/rec.txt | 5769 |
| 识别 | 验证集 | /home/aistudio/data/CCPD2020/PPOCR/val/crop_imgs | /home/aistudio/data/CCPD2020/PPOCR/val/rec.txt | 1001 |
| 识别 | 测试集 | /home/aistudio/data/CCPD2020/PPOCR/test/crop_imgs | /home/aistudio/data/CCPD2020/PPOCR/test/rec.txt | 5006 |
在普遍的深度学习流程中,都是在训练集训练,在验证集选择最优模型后在测试集上进行测试。在本例中,我们省略中间步骤,直接在训练集训练,在测试集选择最优模型,因此我们只使用训练集和测试集。
## 4. 实验
由于数据集比较少,为了模型更好和更快的收敛,这里选用 PaddleOCR 中的 PP-OCRv3 模型进行文本检测和识别,并且使用 PP-OCRv3 模型参数作为预训练模型。PP-OCRv3在PP-OCRv2的基础上,中文场景端到端Hmean指标相比于PP-OCRv2提升5%, 英文数字模型端到端效果提升11%。详细优化细节请参考[PP-OCRv3](../doc/doc_ch/PP-OCRv3_introduction.md)技术报告。
由于车牌场景均为端侧设备部署,因此对速度和模型大小有比较高的要求,因此还需要采用量化训练的方式进行模型大小的压缩和模型推理速度的加速。模型量化可以在基本不损失模型的精度的情况下,将FP32精度的模型参数转换为Int8精度,减小模型参数大小并加速计算,使用量化后的模型在移动端等部署时更具备速度优势。
因此,本实验中对于车牌检测和识别有如下3种方案:
1. PP-OCRv3中英文超轻量预训练模型直接预测
2. CCPD车牌数据集在PP-OCRv3模型上fine-tune
3. CCPD车牌数据集在PP-OCRv3模型上fine-tune后量化
### 4.1 检测
#### 4.1.1 预训练模型直接预测
从下表中下载PP-OCRv3文本检测预训练模型
|模型名称|模型简介|配置文件|推理模型大小|下载地址|
| --- | --- | --- | --- | --- |
|ch_PP-OCRv3_det| 【最新】原始超轻量模型,支持中英文、多语种文本检测 |[ch_PP-OCRv3_det_cml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)| 3.8M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar)|
使用如下命令下载预训练模型
```bash
mkdir models
cd models
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar -xf ch_PP-OCRv3_det_distill_train.tar
cd /home/aistudio/PaddleOCR
```
预训练模型下载完成后,我们使用[ch_PP-OCRv3_det_student.yml](../configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml) 配置文件进行后续实验,在开始评估之前需要对配置文件中部分字段进行设置,具体如下:
1. 模型存储和训练相关:
1. Global.pretrained_model: 指向PP-OCRv3文本检测预训练模型地址
2. 数据集相关
1. Eval.dataset.data_dir:指向测试集图片存放目录
2. Eval.dataset.label_file_list:指向测试集标注文件
上述字段均为必须修改的字段,可以通过修改配置文件的方式改动,也可在不需要修改配置文件的情况下,改变训练的参数。这里使用不改变配置文件的方式 。使用如下命令进行PP-OCRv3文本检测预训练模型的评估
```bash
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \
Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/det.txt]
```
上述指令中,通过-c 选择训练使用配置文件,通过-o参数在不需要修改配置文件的情况下,改变训练的参数。
使用预训练模型进行评估,指标如下所示:
| 方案 |hmeans|
|---------------------------|---|
| PP-OCRv3中英文超轻量检测预训练模型直接预测 |76.12%|
#### 4.1.2 CCPD车牌数据集fine-tune
**训练**
为了进行fine-tune训练,我们需要在配置文件中设置需要使用的预训练模型地址,学习率和数据集等参数。 具体如下:
1. 模型存储和训练相关:
1. Global.pretrained_model: 指向PP-OCRv3文本检测预训练模型地址
2. Global.eval_batch_step: 模型多少step评估一次,这里设为从第0个step开始没隔772个step评估一次,772为一个epoch总的step数。
2. 优化器相关:
1. Optimizer.lr.name: 学习率衰减器设为常量 Const
2. Optimizer.lr.learning_rate: 做 fine-tune 实验,学习率需要设置的比较小,此处学习率设为配置文件中的0.05倍
3. Optimizer.lr.warmup_epoch: warmup_epoch设为0
3. 数据集相关:
1. Train.dataset.data_dir:指向训练集图片存放目录
2. Train.dataset.label_file_list:指向训练集标注文件
3. Eval.dataset.data_dir:指向测试集图片存放目录
4. Eval.dataset.label_file_list:指向测试集标注文件
使用如下代码即可启动在CCPD车牌数据集上的fine-tune。
```bash
python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \
Global.save_model_dir=output/CCPD/det \
Global.eval_batch_step="[0, 772]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
Train.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/train/det.txt] \
Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/det.txt]
```
在上述命令中,通过`-o`的方式修改了配置文件中的参数。
**评估**
训练完成后使用如下命令进行评估
```bash
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/det.txt]
```
使用预训练模型和CCPD车牌数据集fine-tune,指标分别如下:
|方案|hmeans|
|---|---|
|PP-OCRv3中英文超轻量检测预训练模型直接预测|76.12%|
|PP-OCRv3中英文超轻量检测预训练模型 fine-tune|99.00%|
可以看到进行fine-tune能显著提升车牌检测的效果。
#### 4.1.3 CCPD车牌数据集fine-tune+量化训练
此处采用 PaddleOCR 中提供好的[量化教程](../deploy/slim/quantization/README.md)对模型进行量化训练。
量化训练可通过如下命令启动:
```bash
python3.7 deploy/slim/quantization/quant.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
Global.save_model_dir=output/CCPD/det_quant \
Global.eval_batch_step="[0, 772]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
Train.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/train/det.txt] \
Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/det.txt]
```
量化后指标对比如下
|方案|hmeans| 模型大小 | 预测速度(lite) |
|---|---|------|------------|
|PP-OCRv3中英文超轻量检测预训练模型 fine-tune|99.00%| 2.5M | 223ms |
|PP-OCRv3中英文超轻量检测预训练模型 fine-tune+量化|98.91%| 1.0M | 189ms |
可以看到通过量化训练在精度几乎无损的情况下,降低模型体积60%并且推理速度提升15%。
速度测试基于[PaddleOCR lite教程](../deploy/lite/readme_ch.md)完成。
#### 4.1.4 模型导出
使用如下命令可以将训练好的模型进行导出
* 非量化模型
```bash
python tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
Global.save_inference_dir=output/det/infer
```
* 量化模型
```bash
python deploy/slim/quantization/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=output/CCPD/det_quant/best_accuracy.pdparams \
Global.save_inference_dir=output/det/infer
```
### 4.2 识别
#### 4.2.1 预训练模型直接预测
从下表中下载PP-OCRv3文本识别预训练模型
|模型名称|模型简介|配置文件|推理模型大小|下载地址|
| --- | --- | --- | --- | --- |
|ch_PP-OCRv3_rec|【最新】原始超轻量模型,支持中英文、数字识别|[ch_PP-OCRv3_rec_distillation.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)| 12.4M |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) |
使用如下命令下载预训练模型
```bash
mkdir models
cd models
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
tar -xf ch_PP-OCRv3_rec_train.tar
cd /home/aistudio/PaddleOCR
```
PaddleOCR提供的PP-OCRv3识别模型采用蒸馏训练策略,因此提供的预训练模型中会包含`Teacher``Student`模型的参数,详细信息可参考[knowledge_distillation.md](../doc/doc_ch/knowledge_distillation.md)。 因此,模型下载完成后需要使用如下代码提取`Student`模型的参数:
```python
import paddle
# 加载预训练模型
all_params = paddle.load("models/ch_PP-OCRv3_rec_train/best_accuracy.pdparams")
# 查看权重参数的keys
print(all_params.keys())
# 学生模型的权重提取
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# 查看学生模型权重参数的keys
print(s_params.keys())
# 保存
paddle.save(s_params, "models/ch_PP-OCRv3_rec_train/student.pdparams")
```
预训练模型下载完成后,我们使用[ch_PP-OCRv3_rec.yml](../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml) 配置文件进行后续实验,在开始评估之前需要对配置文件中部分字段进行设置,具体如下:
1. 模型存储和训练相关:
1. Global.pretrained_model: 指向PP-OCRv3文本识别预训练模型地址
2. 数据集相关
1. Eval.dataset.data_dir:指向测试集图片存放目录
2. Eval.dataset.label_file_list:指向测试集标注文件
使用如下命令进行PP-OCRv3文本识别预训练模型的评估
```bash
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/rec.txt]
```
如需获取已训练模型,请扫码填写问卷,加入PaddleOCR官方交流群获取全部OCR垂类模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁
<div align="left">
<img src="https://ai-studio-static-online.cdn.bcebos.com/dd721099bd50478f9d5fb13d8dd00fad69c22d6848244fd3a1d3980d7fefc63e" width = "150" height = "150" />
</div>
评估部分日志如下:
```bash
[2022/05/12 19:52:02] ppocr INFO: load pretrain successful from models/ch_PP-OCRv3_rec_train/best_accuracy
eval model:: 100%|██████████████████████████████| 40/40 [00:15<00:00, 2.57it/s]
[2022/05/12 19:52:17] ppocr INFO: metric eval ***************
[2022/05/12 19:52:17] ppocr INFO: acc:0.0
[2022/05/12 19:52:17] ppocr INFO: norm_edit_dis:0.8656084923002452
[2022/05/12 19:52:17] ppocr INFO: Teacher_acc:0.000399520574511545
[2022/05/12 19:52:17] ppocr INFO: Teacher_norm_edit_dis:0.8657902943394548
[2022/05/12 19:52:17] ppocr INFO: fps:1443.1801978719905
```
使用预训练模型进行评估,指标如下所示:
|方案|acc|
|---|---|
|PP-OCRv3中英文超轻量识别预训练模型直接预测|0%|
从评估日志中可以看到,直接使用PP-OCRv3预训练模型进行评估,acc非常低,但是norm_edit_dis很高。因此,我们猜测是模型大部分文字识别是对的,只有少部分文字识别错误。使用如下命令进行infer查看模型的推理结果进行验证:
```bash
python tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
Global.infer_img=/home/aistudio/data/CCPD2020/PPOCR/test/crop_imgs/0_0_0_3_32_30_31_30_30.jpg
```
输出部分日志如下:
```bash
[2022/05/01 08:51:57] ppocr INFO: train with paddle 2.2.2 and device CUDAPlace(0)
W0501 08:51:57.127391 11326 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
W0501 08:51:57.132315 11326 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/05/01 08:52:00] ppocr INFO: load pretrain successful from models/ch_PP-OCRv3_rec_train/student
[2022/05/01 08:52:00] ppocr INFO: infer_img: /home/aistudio/data/CCPD2020/PPOCR/test/crop_imgs/0_0_3_32_30_31_30_30.jpg
[2022/05/01 08:52:00] ppocr INFO: result: {"Student": {"label": "皖A·D86766", "score": 0.9552637934684753}, "Teacher": {"label": "皖A·D86766", "score": 0.9917094707489014}}
[2022/05/01 08:52:00] ppocr INFO: success!
```
从infer结果可以看到,车牌中的文字大部分都识别正确,只是多识别出了一个`·`。针对这种情况,有如下两种方案:
1. 直接通过后处理去掉多识别的`·`
2. 进行 fine-tune。
#### 4.2.2 预训练模型直接预测+改动后处理
直接通过后处理去掉多识别的`·`,在后处理的改动比较简单,只需在 [ppocr/postprocess/rec_postprocess.py](../ppocr/postprocess/rec_postprocess.py) 文件的76行添加如下代码:
```python
text = text.replace('·','')
```
改动前后指标对比:
|方案|acc|
|---|---|
|PP-OCRv3中英文超轻量识别预训练模型直接预测|0.20%|
|PP-OCRv3中英文超轻量识别预训练模型直接预测+后处理去掉多识别的`·`|90.97%|
可以看到,去掉多余的`·`能大幅提高精度。
#### 4.2.3 CCPD车牌数据集fine-tune
**训练**
为了进行fine-tune训练,我们需要在配置文件中设置需要使用的预训练模型地址,学习率和数据集等参数。 具体如下:
1. 模型存储和训练相关:
1. Global.pretrained_model: 指向PP-OCRv3文本识别预训练模型地址
2. Global.eval_batch_step: 模型多少step评估一次,这里设为从第0个step开始没隔45个step评估一次,45为一个epoch总的step数。
2. 优化器相关
1. Optimizer.lr.name: 学习率衰减器设为常量 Const
2. Optimizer.lr.learning_rate: 做 fine-tune 实验,学习率需要设置的比较小,此处学习率设为配置文件中的0.05倍
3. Optimizer.lr.warmup_epoch: warmup_epoch设为0
3. 数据集相关
1. Train.dataset.data_dir:指向训练集图片存放目录
2. Train.dataset.label_file_list:指向训练集标注文件
3. Eval.dataset.data_dir:指向测试集图片存放目录
4. Eval.dataset.label_file_list:指向测试集标注文件
使用如下命令启动 fine-tune
```bash
python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
Global.save_model_dir=output/CCPD/rec/ \
Global.eval_batch_step="[0, 90]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
Train.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/train/rec.txt] \
Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/rec.txt]
```
**评估**
训练完成后使用如下命令进行评估
```bash
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/rec.txt]
```
使用预训练模型和CCPD车牌数据集fine-tune,指标分别如下:
|方案| acc |
|---|--------|
|PP-OCRv3中英文超轻量识别预训练模型直接预测| 0.00% |
|PP-OCRv3中英文超轻量识别预训练模型直接预测+后处理去掉多识别的`·`| 90.97% |
|PP-OCRv3中英文超轻量识别预训练模型 fine-tune| 94.54% |
可以看到进行fine-tune能显著提升车牌识别的效果。
#### 4.2.4 CCPD车牌数据集fine-tune+量化训练
此处采用 PaddleOCR 中提供好的[量化教程](../deploy/slim/quantization/README.md)对模型进行量化训练。
量化训练可通过如下命令启动:
```bash
python3.7 deploy/slim/quantization/quant.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
Global.save_model_dir=output/CCPD/rec_quant/ \
Global.eval_batch_step="[0, 90]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
Train.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/train/rec.txt] \
Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/rec.txt]
```
量化后指标对比如下
|方案| acc | 模型大小 | 预测速度(lite) |
|---|--------|-------|------------|
|PP-OCRv3中英文超轻量识别预训练模型 fine-tune| 94.54% | 10.3M | 4.2ms |
|PP-OCRv3中英文超轻量识别预训练模型 fine-tune + 量化| 93.40% | 4.8M | 1.8ms |
可以看到量化后能降低模型体积53%并且推理速度提升57%,但是由于识别数据过少,量化带来了1%的精度下降。
速度测试基于[PaddleOCR lite教程](../deploy/lite/readme_ch.md)完成。
#### 4.2.5 模型导出
使用如下命令可以将训练好的模型进行导出。
* 非量化模型
```bash
python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
Global.save_inference_dir=output/CCPD/rec/infer
```
* 量化模型
```bash
python deploy/slim/quantization/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=output/CCPD/rec_quant/best_accuracy.pdparams \
Global.save_inference_dir=output/CCPD/rec_quant/infer
```
### 4.3 计算End2End指标
端到端指标可通过 [PaddleOCR内置脚本](../tools/end2end/readme.md) 进行计算,具体步骤如下:
1. 导出模型
通过如下命令进行模型的导出。注意,量化模型导出时,需要配置eval数据集
```bash
# 检测模型
# 预训练模型
python tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \
Global.save_inference_dir=output/ch_PP-OCRv3_det_distill_train/infer
# 非量化模型
python tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
Global.save_inference_dir=output/CCPD/det/infer
# 量化模型
python deploy/slim/quantization/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=output/CCPD/det_quant/best_accuracy.pdparams \
Global.save_inference_dir=output/CCPD/det_quant/infer \
Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/det.txt] \
Eval.loader.num_workers=0
# 识别模型
# 预训练模型
python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
Global.save_inference_dir=output/ch_PP-OCRv3_rec_train/infer
# 非量化模型
python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
Global.save_inference_dir=output/CCPD/rec/infer
# 量化模型
python deploy/slim/quantization/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=output/CCPD/rec_quant/best_accuracy.pdparams \
Global.save_inference_dir=output/CCPD/rec_quant/infer \
Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/rec.txt]
```
2. 用导出的模型对测试集进行预测
此处,分别使用PP-OCRv3预训练模型,fintune模型和量化模型对测试集的所有图像进行预测,命令如下:
```bash
# PP-OCRv3中英文超轻量检测预训练模型,PP-OCRv3中英文超轻量识别预训练模型
python3 tools/infer/predict_system.py --det_model_dir=models/ch_PP-OCRv3_det_distill_train/infer --rec_model_dir=models/ch_PP-OCRv3_rec_train/infer --det_limit_side_len=736 --det_limit_type=min --image_dir=/home/aistudio/data/CCPD2020/ccpd_green/test/ --draw_img_save_dir=infer/pretrain --use_dilation=true
# PP-OCRv3中英文超轻量检测预训练模型+fine-tune,PP-OCRv3中英文超轻量识别预训练模型+fine-tune
python3 tools/infer/predict_system.py --det_model_dir=output/CCPD/det/infer --rec_model_dir=output/CCPD/rec/infer --det_limit_side_len=736 --det_limit_type=min --image_dir=/home/aistudio/data/CCPD2020/ccpd_green/test/ --draw_img_save_dir=infer/fine-tune --use_dilation=true
# PP-OCRv3中英文超轻量检测预训练模型 fine-tune +量化,PP-OCRv3中英文超轻量识别预训练模型 fine-tune +量化 结果转换和评估
python3 tools/infer/predict_system.py --det_model_dir=output/CCPD/det_quant/infer --rec_model_dir=output/CCPD/rec_quant/infer --det_limit_side_len=736 --det_limit_type=min --image_dir=/home/aistudio/data/CCPD2020/ccpd_green/test/ --draw_img_save_dir=infer/quant --use_dilation=true
```
3. 转换label并计算指标
将gt和上一步保存的预测结果转换为端对端评测需要的数据格式,并根据转换后的数据进行端到端指标计算
```bash
python3 tools/end2end/convert_ppocr_label.py --mode=gt --label_path=/home/aistudio/data/CCPD2020/PPOCR/test/det.txt --save_folder=end2end/gt
# PP-OCRv3中英文超轻量检测预训练模型,PP-OCRv3中英文超轻量识别预训练模型 结果转换和评估
python3 tools/end2end/convert_ppocr_label.py --mode=pred --label_path=infer/pretrain/system_results.txt --save_folder=end2end/pretrain
python3 tools/end2end/eval_end2end.py end2end/gt end2end/pretrain
# PP-OCRv3中英文超轻量检测预训练模型,PP-OCRv3中英文超轻量识别预训练模型+后处理去掉多识别的`·` 结果转换和评估
# 需手动修改后处理函数
python3 tools/end2end/convert_ppocr_label.py --mode=pred --label_path=infer/post/system_results.txt --save_folder=end2end/post
python3 tools/end2end/eval_end2end.py end2end/gt end2end/post
# PP-OCRv3中英文超轻量检测预训练模型 fine-tune,PP-OCRv3中英文超轻量识别预训练模型 fine-tune 结果转换和评估
python3 tools/end2end/convert_ppocr_label.py --mode=pred --label_path=infer/fine-tune/system_results.txt --save_folder=end2end/fine-tune
python3 tools/end2end/eval_end2end.py end2end/gt end2end/fine-tune
# PP-OCRv3中英文超轻量检测预训练模型 fine-tune +量化,PP-OCRv3中英文超轻量识别预训练模型 fine-tune +量化 结果转换和评估
python3 tools/end2end/convert_ppocr_label.py --mode=pred --label_path=infer/quant/system_results.txt --save_folder=end2end/quant
python3 tools/end2end/eval_end2end.py end2end/gt end2end/quant
```
日志如下:
```bash
The convert label saved in end2end/gt
The convert label saved in end2end/pretrain
start testing...
hit, dt_count, gt_count 2 5988 5006
character_acc: 70.42%
avg_edit_dist_field: 2.37
avg_edit_dist_img: 2.37
precision: 0.03%
recall: 0.04%
fmeasure: 0.04%
The convert label saved in end2end/post
start testing...
hit, dt_count, gt_count 4224 5988 5006
character_acc: 81.59%
avg_edit_dist_field: 1.47
avg_edit_dist_img: 1.47
precision: 70.54%
recall: 84.38%
fmeasure: 76.84%
The convert label saved in end2end/fine-tune
start testing...
hit, dt_count, gt_count 4286 4898 5006
character_acc: 94.16%
avg_edit_dist_field: 0.47
avg_edit_dist_img: 0.47
precision: 87.51%
recall: 85.62%
fmeasure: 86.55%
The convert label saved in end2end/quant
start testing...
hit, dt_count, gt_count 4349 4951 5006
character_acc: 94.13%
avg_edit_dist_field: 0.47
avg_edit_dist_img: 0.47
precision: 87.84%
recall: 86.88%
fmeasure: 87.36%
```
各个方案端到端指标如下:
|模型| 指标 |
|---|--------|
|PP-OCRv3中英文超轻量检测预训练模型 <br> PP-OCRv3中英文超轻量识别预训练模型| 0.04% |
|PP-OCRv3中英文超轻量检测预训练模型 <br> PP-OCRv3中英文超轻量识别预训练模型 + 后处理去掉多识别的`·`| 78.27% |
|PP-OCRv3中英文超轻量检测预训练模型+fine-tune <br> PP-OCRv3中英文超轻量识别预训练模型+fine-tune| 87.14% |
|PP-OCRv3中英文超轻量检测预训练模型+fine-tune+量化 <br> PP-OCRv3中英文超轻量识别预训练模型+fine-tune+量化| 88.00% |
从结果中可以看到对预训练模型不做修改,只根据场景下的具体情况进行后处理的修改就能大幅提升端到端指标到78.27%,在CCPD数据集上进行 fine-tune 后指标进一步提升到87.14%, 在经过量化训练之后,由于检测模型的recall变高,指标进一步提升到88%。但是这个结果仍旧不符合检测模型+识别模型的真实性能(99%*94%=93%),因此我们需要对 base case 进行具体分析。
在之前的端到端预测结果中,可以看到很多不符合车牌标注的文字被识别出来, 因此可以进行简单的过滤来提升precision
为了快速评估,我们在 ` tools/end2end/convert_ppocr_label.py` 脚本的 58 行加入如下代码,对非8个字符的结果进行过滤
```python
if len(txt) != 8: # 车牌字符串长度为8
continue
```
此外,通过可视化box可以发现有很多框都是竖直翻转之后的框,并且没有完全框住车牌边界,因此需要进行框的竖直翻转以及轻微扩大,示意图如下:
![](https://ai-studio-static-online.cdn.bcebos.com/59ab0411c8eb4dfd917fb2b6e5b69a17ee7ca48351444aec9ac6104b79ff1028)
修改前后个方案指标对比如下:
各个方案端到端指标如下:
|模型|base|A:识别结果过滤|B:use_dilation|C:flip_box|best|
|---|---|---|---|---|---|
|PP-OCRv3中英文超轻量检测预训练模型 <br> PP-OCRv3中英文超轻量识别预训练模型|0.04%|0.08%|0.02%|0.05%|0.00%(A)|
|PP-OCRv3中英文超轻量检测预训练模型 <br> PP-OCRv3中英文超轻量识别预训练模型 + 后处理去掉多识别的`·`|78.27%|90.84%|78.61%|79.43%|91.66%(A+B+C)|
|PP-OCRv3中英文超轻量检测预训练模型+fine-tune <br> PP-OCRv3中英文超轻量识别预训练模型+fine-tune|87.14%|90.40%|87.66%|89.98%|92.50%(A+B+C)|
|PP-OCRv3中英文超轻量检测预训练模型+fine-tune+量化 <br> PP-OCRv3中英文超轻量识别预训练模型+fine-tune+量化|88.00%|90.54%|88.50%|89.46%|92.02%(A+B+C)|
从结果中可以看到对预训练模型不做修改,只根据场景下的具体情况进行后处理的修改就能大幅提升端到端指标到91.66%,在CCPD数据集上进行 fine-tune 后指标进一步提升到92.5%, 在经过量化训练之后,指标变为92.02%。
### 4.4 部署
- 基于 Paddle Inference 的python推理
检测模型和识别模型分别 fine-tune 并导出为inference模型之后,可以使用如下命令基于 Paddle Inference 进行端到端推理并对结果进行可视化。
```bash
python tools/infer/predict_system.py \
--det_model_dir=output/CCPD/det/infer/ \
--rec_model_dir=output/CCPD/rec/infer/ \
--image_dir="/home/aistudio/data/CCPD2020/ccpd_green/test/04131106321839081-92_258-159&509_530&611-527&611_172&599_159&509_530&525-0_0_3_32_30_31_30_30-109-106.jpg" \
--rec_image_shape=3,48,320
```
推理结果如下
![](https://ai-studio-static-online.cdn.bcebos.com/76b6a0939c2c4cf49039b6563c4b28e241e11285d7464e799e81c58c0f7707a7)
- 端侧部署
端侧部署我们采用基于 PaddleLite 的 cpp 推理。Paddle Lite是飞桨轻量化推理引擎,为手机、IOT端提供高效推理能力,并广泛整合跨平台硬件,为端侧部署及应用落地问题提供轻量化的部署方案。具体可参考 [PaddleOCR lite教程](../deploy/lite/readme_ch.md)
### 4.5 实验总结
我们分别使用PP-OCRv3中英文超轻量预训练模型在车牌数据集上进行了直接评估和 fine-tune 和 fine-tune +量化3种方案的实验,并基于[PaddleOCR lite教程](../deploy/lite/readme_ch.md)进行了速度测试,指标对比如下:
- 检测
|方案|hmeans| 模型大小 | 预测速度(lite) |
|---|---|------|------------|
|PP-OCRv3中英文超轻量检测预训练模型直接预测|76.12%|2.5M| 233ms |
|PP-OCRv3中英文超轻量检测预训练模型 fine-tune|99.00%| 2.5M | 233ms |
|PP-OCRv3中英文超轻量检测预训练模型 fine-tune + 量化|98.91%| 1.0M | 189ms |fine-tune
- 识别
|方案| acc | 模型大小 | 预测速度(lite) |
|---|--------|-------|------------|
|PP-OCRv3中英文超轻量识别预训练模型直接预测| 0.00% |10.3M| 4.2ms |
|PP-OCRv3中英文超轻量识别预训练模型直接预测+后处理去掉多识别的`·`| 90.97% |10.3M| 4.2ms |
|PP-OCRv3中英文超轻量识别预训练模型 fine-tune| 94.54% | 10.3M | 4.2ms |
|PP-OCRv3中英文超轻量识别预训练模型 fine-tune + 量化| 93.40% | 4.8M | 1.8ms |
- 端到端指标如下:
|方案|fmeasure|模型大小|预测速度(lite) |
|---|---|---|---|
|PP-OCRv3中英文超轻量检测预训练模型 <br> PP-OCRv3中英文超轻量识别预训练模型|0.08%|12.8M|298ms|
|PP-OCRv3中英文超轻量检测预训练模型 <br> PP-OCRv3中英文超轻量识别预训练模型 + 后处理去掉多识别的`·`|91.66%|12.8M|298ms|
|PP-OCRv3中英文超轻量检测预训练模型+fine-tune <br> PP-OCRv3中英文超轻量识别预训练模型+fine-tune|92.50%|12.8M|298ms|
|PP-OCRv3中英文超轻量检测预训练模型+fine-tune+量化 <br> PP-OCRv3中英文超轻量识别预训练模型+fine-tune+量化|92.02%|5.80M|224ms|
**结论**
PP-OCRv3的检测模型在未经过fine-tune的情况下,在车牌数据集上也有一定的精度,经过 fine-tune 后能够极大的提升检测效果,精度达到99%。在使用量化训练后检测模型的精度几乎无损,并且模型大小压缩60%。
PP-OCRv3的识别模型在未经过fine-tune的情况下,在车牌数据集上精度为0,但是经过分析可以知道,模型大部分字符都预测正确,但是会多预测一个特殊字符,去掉这个特殊字符后,精度达到90%。PP-OCRv3识别模型在经过 fine-tune 后识别精度进一步提升,达到94.4%。在使用量化训练后识别模型大小压缩53%,但是由于数据量多少,带来了1%的精度损失。
从端到端结果中可以看到对预训练模型不做修改,只根据场景下的具体情况进行后处理的修改就能大幅提升端到端指标到91.66%,在CCPD数据集上进行 fine-tune 后指标进一步提升到92.5%, 在经过量化训练之后,指标轻微下降到92.02%但模型大小降低54%。
# 高精度中文场景文本识别模型SVTR
## 1. 简介
PP-OCRv3是百度开源的超轻量级场景文本检测识别模型库,其中超轻量的场景中文识别模型SVTR_LCNet使用了SVTR算法结构。为了保证速度,SVTR_LCNet将SVTR模型的Local Blocks替换为LCNet,使用两层Global Blocks。在中文场景中,PP-OCRv3识别主要使用如下优化策略([详细技术报告](../doc/doc_ch/PP-OCRv3_introduction.md)):
- GTC:Attention指导CTC训练策略;
- TextConAug:挖掘文字上下文信息的数据增广策略;
- TextRotNet:自监督的预训练模型;
- UDML:联合互学习策略;
- UIM:无标注数据挖掘方案。
其中 *UIM:无标注数据挖掘方案* 使用了高精度的SVTR中文模型进行无标注文件的刷库,该模型在PP-OCRv3识别的数据集上训练,精度对比如下表。
|中文识别算法|模型|UIM|精度|
| --- | --- | --- |--- |
|PP-OCRv3|SVTR_LCNet| w/o |78.40%|
|PP-OCRv3|SVTR_LCNet| w |79.40%|
|SVTR|SVTR-Tiny|-|82.50%|
aistudio项目链接: [高精度中文场景文本识别模型SVTR](https://aistudio.baidu.com/aistudio/projectdetail/4263032)
## 2. SVTR中文模型使用
### 环境准备
本任务基于Aistudio完成, 具体环境如下:
- 操作系统: Linux
- PaddlePaddle: 2.3
- PaddleOCR: dygraph
下载 PaddleOCR代码
```bash
git clone -b dygraph https://github.com/PaddlePaddle/PaddleOCR
```
安装依赖库
```bash
pip install -r PaddleOCR/requirements.txt -i https://mirror.baidu.com/pypi/simple
```
### 快速使用
获取SVTR中文模型文件,请扫码填写问卷,加入PaddleOCR官方交流群获取全部OCR垂类模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/dd721099bd50478f9d5fb13d8dd00fad69c22d6848244fd3a1d3980d7fefc63e" width = "150" height = "150" />
</div>
```bash
# 解压模型文件
tar xf svtr_ch_high_accuracy.tar
```
预测中文文本,以下图为例:
![](../doc/imgs_words/ch/word_1.jpg)
预测命令:
```bash
# CPU预测
python tools/infer_rec.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.infer_img=./doc/imgs_words/ch/word_1.jpg Global.use_gpu=False
# GPU预测
#python tools/infer_rec.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.infer_img=./doc/imgs_words/ch/word_1.jpg Global.use_gpu=True
```
可以看到最后打印结果为
- result: 韩国小馆 0.9853458404541016
0.9853458404541016为预测置信度。
### 推理模型导出与预测
inference 模型(paddle.jit.save保存的模型) 一般是模型训练,把模型结构和模型参数保存在文件中的固化模型,多用于预测部署场景。 训练过程中保存的模型是checkpoints模型,保存的只有模型的参数,多用于恢复训练等。 与checkpoints模型相比,inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。
运行识别模型转inference模型命令,如下:
```bash
python tools/export_model.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.save_inference_dir=./inference/svtr_ch
```
转换成功后,在目录下有三个文件:
```shell
inference/svtr_ch/
├── inference.pdiparams # 识别inference模型的参数文件
├── inference.pdiparams.info # 识别inference模型的参数信息,可忽略
└── inference.pdmodel # 识别inference模型的program文件
```
inference模型预测,命令如下:
```bash
# CPU预测
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_1.jpg" --rec_algorithm='SVTR' --rec_model_dir=./inference/svtr_ch/ --rec_image_shape='3, 32, 320' --rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt --use_gpu=False
# GPU预测
#python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_1.jpg" --rec_algorithm='SVTR' --rec_model_dir=./inference/svtr_ch/ --rec_image_shape='3, 32, 320' --rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt --use_gpu=True
```
**注意**
- 使用SVTR算法时,需要指定--rec_algorithm='SVTR'
- 如果使用自定义字典训练的模型,需要将--rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt修改为自定义的字典
- --rec_image_shape='3, 32, 320' 该参数不能去掉
*.html linguist-language=python
*.ipynb linguist-language=python
\ No newline at end of file
.DS_Store
*.pth
*.pyc
*.pyo
*.log
*.tmp
*.pkl
__pycache__/
.idea/
output/
test/*.jpg
datasets/
index/
train_log/
log/
profiling_log/
\ No newline at end of file
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# Real-time Scene Text Detection with Differentiable Binarization
**note**: some code is inherited from [WenmuZhou/DBNet.pytorch](https://github.com/WenmuZhou/DBNet.pytorch)
[中文解读](https://zhuanlan.zhihu.com/p/94677957)
![network](imgs/paper/db.jpg)
## update
2020-06-07: 添加灰度图训练,训练灰度图时需要在配置里移除`dataset.args.transforms.Normalize`
## Install Using Conda
```
conda env create -f environment.yml
git clone https://github.com/WenmuZhou/DBNet.paddle.git
cd DBNet.paddle/
```
or
## Install Manually
```bash
conda create -n dbnet python=3.6
conda activate dbnet
conda install ipython pip
# python dependencies
pip install -r requirement.txt
# clone repo
git clone https://github.com/WenmuZhou/DBNet.paddle.git
cd DBNet.paddle/
```
## Requirements
* paddlepaddle 2.4+
## Download
TBD
## Data Preparation
Training data: prepare a text `train.txt` in the following format, use '\t' as a separator
```
./datasets/train/img/001.jpg ./datasets/train/gt/001.txt
```
Validation data: prepare a text `test.txt` in the following format, use '\t' as a separator
```
./datasets/test/img/001.jpg ./datasets/test/gt/001.txt
```
- Store images in the `img` folder
- Store groundtruth in the `gt` folder
The groundtruth can be `.txt` files, with the following format:
```
x1, y1, x2, y2, x3, y3, x4, y4, annotation
```
## Train
1. config the `dataset['train']['dataset'['data_path']'`,`dataset['validate']['dataset'['data_path']`in [config/icdar2015_resnet18_fpn_DBhead_polyLR.yaml](cconfig/icdar2015_resnet18_fpn_DBhead_polyLR.yaml)
* . single gpu train
```bash
bash singlel_gpu_train.sh
```
* . Multi-gpu training
```bash
bash multi_gpu_train.sh
```
## Test
[eval.py](tools/eval.py) is used to test model on test dataset
1. config `model_path` in [eval.sh](eval.sh)
2. use following script to test
```bash
bash eval.sh
```
## Predict
[predict.py](tools/predict.py) Can be used to inference on all images in a folder
1. config `model_path`,`input_folder`,`output_folder` in [predict.sh](predict.sh)
2. use following script to predict
```
bash predict.sh
```
You can change the `model_path` in the `predict.sh` file to your model location.
tips: if result is not good, you can change `thre` in [predict.sh](predict.sh)
## Export Model
[export_model.py](tools/export_model.py) Can be used to inference on all images in a folder
use following script to export inference model
```
python tools/export_model.py --config_file config/icdar2015_resnet50_FPN_DBhead_polyLR.yaml -o trainer.resume_checkpoint=model_best.pth trainer.output_dir=output/infer
```
## Paddle Inference infer
[infer.py](tools/infer.py) Can be used to inference on all images in a folder
use following script to export inference model
```
python tools/infer.py --model-dir=output/infer/ --img-path imgs/paper/db.jpg
```
<h2 id="Performance">Performance</h2>
### [ICDAR 2015](http://rrc.cvc.uab.es/?ch=4)
only train on ICDAR2015 dataset
| Method | image size (short size) |learning rate | Precision (%) | Recall (%) | F-measure (%) | FPS |
|:--------------------------:|:-------:|:--------:|:--------:|:------------:|:---------------:|:-----:|
| ImageNet-resnet50-FPN-DBHead(torch) |736 |1e-3|90.19 | 78.14 | 83.88 | 27 |
| ImageNet-resnet50-FPN-DBHead(paddle) |736 |1e-3| 89.47 | 79.03 | 83.92 | 27 |
| ImageNet-resnet50-FPN-DBHead(paddle_amp) |736 |1e-3| 88.62 | 79.95 | 84.06 | 27 |
### examples
TBD
### reference
1. https://arxiv.org/pdf/1911.08947.pdf
2. https://github.com/WenmuZhou/DBNet.pytorch
**If this repository helps you,please star it. Thanks.**
from .base_trainer import BaseTrainer
from .base_dataset import BaseDataSet
\ No newline at end of file
# -*- coding: utf-8 -*-
# @Time : 2019/12/4 13:12
# @Author : zhoujun
import copy
from paddle.io import Dataset
from data_loader.modules import *
class BaseDataSet(Dataset):
def __init__(self,
data_path: str,
img_mode,
pre_processes,
filter_keys,
ignore_tags,
transform=None,
target_transform=None):
assert img_mode in ['RGB', 'BRG', 'GRAY']
self.ignore_tags = ignore_tags
self.data_list = self.load_data(data_path)
item_keys = [
'img_path', 'img_name', 'text_polys', 'texts', 'ignore_tags'
]
for item in item_keys:
assert item in self.data_list[
0], 'data_list from load_data must contains {}'.format(
item_keys)
self.img_mode = img_mode
self.filter_keys = filter_keys
self.transform = transform
self.target_transform = target_transform
self._init_pre_processes(pre_processes)
def _init_pre_processes(self, pre_processes):
self.aug = []
if pre_processes is not None:
for aug in pre_processes:
if 'args' not in aug:
args = {}
else:
args = aug['args']
if isinstance(args, dict):
cls = eval(aug['type'])(**args)
else:
cls = eval(aug['type'])(args)
self.aug.append(cls)
def load_data(self, data_path: str) -> list:
"""
把数据加载为一个list:
:params data_path: 存储数据的文件夹或者文件
return a dict ,包含了,'img_path','img_name','text_polys','texts','ignore_tags'
"""
raise NotImplementedError
def apply_pre_processes(self, data):
for aug in self.aug:
data = aug(data)
return data
def __getitem__(self, index):
try:
data = copy.deepcopy(self.data_list[index])
im = cv2.imread(data['img_path'], 1
if self.img_mode != 'GRAY' else 0)
if self.img_mode == 'RGB':
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
data['img'] = im
data['shape'] = [im.shape[0], im.shape[1]]
data = self.apply_pre_processes(data)
if self.transform:
data['img'] = self.transform(data['img'])
data['text_polys'] = data['text_polys'].tolist()
if len(self.filter_keys):
data_dict = {}
for k, v in data.items():
if k not in self.filter_keys:
data_dict[k] = v
return data_dict
else:
return data
except:
return self.__getitem__(np.random.randint(self.__len__()))
def __len__(self):
return len(self.data_list)
# -*- coding: utf-8 -*-
# @Time : 2019/8/23 21:50
# @Author : zhoujun
import os
import pathlib
import shutil
from pprint import pformat
import anyconfig
import paddle
import numpy as np
import random
from paddle.jit import to_static
from paddle.static import InputSpec
from utils import setup_logger
class BaseTrainer:
def __init__(self,
config,
model,
criterion,
train_loader,
validate_loader,
metric_cls,
post_process=None):
config['trainer']['output_dir'] = os.path.join(
str(pathlib.Path(os.path.abspath(__name__)).parent),
config['trainer']['output_dir'])
config['name'] = config['name'] + '_' + model.name
self.save_dir = config['trainer']['output_dir']
self.checkpoint_dir = os.path.join(self.save_dir, 'checkpoint')
os.makedirs(self.checkpoint_dir, exist_ok=True)
self.global_step = 0
self.start_epoch = 0
self.config = config
self.criterion = criterion
# logger and tensorboard
self.visualdl_enable = self.config['trainer'].get('visual_dl', False)
self.epochs = self.config['trainer']['epochs']
self.log_iter = self.config['trainer']['log_iter']
if paddle.distributed.get_rank() == 0:
anyconfig.dump(config, os.path.join(self.save_dir, 'config.yaml'))
self.logger = setup_logger(os.path.join(self.save_dir, 'train.log'))
self.logger_info(pformat(self.config))
self.model = self.apply_to_static(model)
# device
if paddle.device.cuda.device_count(
) > 0 and paddle.device.is_compiled_with_cuda():
self.with_cuda = True
random.seed(self.config['trainer']['seed'])
np.random.seed(self.config['trainer']['seed'])
paddle.seed(self.config['trainer']['seed'])
else:
self.with_cuda = False
self.logger_info('train with and paddle {}'.format(paddle.__version__))
# metrics
self.metrics = {
'recall': 0,
'precision': 0,
'hmean': 0,
'train_loss': float('inf'),
'best_model_epoch': 0
}
self.train_loader = train_loader
if validate_loader is not None:
assert post_process is not None and metric_cls is not None
self.validate_loader = validate_loader
self.post_process = post_process
self.metric_cls = metric_cls
self.train_loader_len = len(train_loader)
if self.validate_loader is not None:
self.logger_info(
'train dataset has {} samples,{} in dataloader, validate dataset has {} samples,{} in dataloader'.
format(
len(self.train_loader.dataset), self.train_loader_len,
len(self.validate_loader.dataset),
len(self.validate_loader)))
else:
self.logger_info(
'train dataset has {} samples,{} in dataloader'.format(
len(self.train_loader.dataset), self.train_loader_len))
self._initialize_scheduler()
self._initialize_optimizer()
# resume or finetune
if self.config['trainer']['resume_checkpoint'] != '':
self._load_checkpoint(
self.config['trainer']['resume_checkpoint'], resume=True)
elif self.config['trainer']['finetune_checkpoint'] != '':
self._load_checkpoint(
self.config['trainer']['finetune_checkpoint'], resume=False)
if self.visualdl_enable and paddle.distributed.get_rank() == 0:
from visualdl import LogWriter
self.writer = LogWriter(self.save_dir)
# 混合精度训练
self.amp = self.config.get('amp', None)
if self.amp == 'None':
self.amp = None
if self.amp:
self.amp['scaler'] = paddle.amp.GradScaler(
init_loss_scaling=self.amp.get("scale_loss", 1024),
use_dynamic_loss_scaling=self.amp.get(
'use_dynamic_loss_scaling', True))
self.model, self.optimizer = paddle.amp.decorate(
models=self.model,
optimizers=self.optimizer,
level=self.amp.get('amp_level', 'O2'))
# 分布式训练
if paddle.device.cuda.device_count() > 1:
self.model = paddle.DataParallel(self.model)
# make inverse Normalize
self.UN_Normalize = False
for t in self.config['dataset']['train']['dataset']['args'][
'transforms']:
if t['type'] == 'Normalize':
self.normalize_mean = t['args']['mean']
self.normalize_std = t['args']['std']
self.UN_Normalize = True
def apply_to_static(self, model):
support_to_static = self.config['trainer'].get('to_static', False)
if support_to_static:
specs = None
print('static')
specs = [InputSpec([None, 3, -1, -1])]
model = to_static(model, input_spec=specs)
self.logger_info(
"Successfully to apply @to_static with specs: {}".format(specs))
return model
def train(self):
"""
Full training logic
"""
for epoch in range(self.start_epoch + 1, self.epochs + 1):
self.epoch_result = self._train_epoch(epoch)
self._on_epoch_finish()
if paddle.distributed.get_rank() == 0 and self.visualdl_enable:
self.writer.close()
self._on_train_finish()
def _train_epoch(self, epoch):
"""
Training logic for an epoch
:param epoch: Current epoch number
"""
raise NotImplementedError
def _eval(self, epoch):
"""
eval logic for an epoch
:param epoch: Current epoch number
"""
raise NotImplementedError
def _on_epoch_finish(self):
raise NotImplementedError
def _on_train_finish(self):
raise NotImplementedError
def _save_checkpoint(self, epoch, file_name):
"""
Saving checkpoints
:param epoch: current epoch number
:param log: logging information of the epoch
:param save_best: if True, rename the saved checkpoint to 'model_best.pth.tar'
"""
state_dict = self.model.state_dict()
state = {
'epoch': epoch,
'global_step': self.global_step,
'state_dict': state_dict,
'optimizer': self.optimizer.state_dict(),
'config': self.config,
'metrics': self.metrics
}
filename = os.path.join(self.checkpoint_dir, file_name)
paddle.save(state, filename)
def _load_checkpoint(self, checkpoint_path, resume):
"""
Resume from saved checkpoints
:param checkpoint_path: Checkpoint path to be resumed
"""
self.logger_info("Loading checkpoint: {} ...".format(checkpoint_path))
checkpoint = paddle.load(checkpoint_path)
self.model.set_state_dict(checkpoint['state_dict'])
if resume:
self.global_step = checkpoint['global_step']
self.start_epoch = checkpoint['epoch']
self.config['lr_scheduler']['args']['last_epoch'] = self.start_epoch
# self.scheduler.load_state_dict(checkpoint['scheduler'])
self.optimizer.set_state_dict(checkpoint['optimizer'])
if 'metrics' in checkpoint:
self.metrics = checkpoint['metrics']
self.logger_info("resume from checkpoint {} (epoch {})".format(
checkpoint_path, self.start_epoch))
else:
self.logger_info("finetune from checkpoint {}".format(
checkpoint_path))
def _initialize(self, name, module, *args, **kwargs):
module_name = self.config[name]['type']
module_args = self.config[name].get('args', {})
assert all([k not in module_args for k in kwargs
]), 'Overwriting kwargs given in config file is not allowed'
module_args.update(kwargs)
return getattr(module, module_name)(*args, **module_args)
def _initialize_scheduler(self):
self.lr_scheduler = self._initialize('lr_scheduler',
paddle.optimizer.lr)
def _initialize_optimizer(self):
self.optimizer = self._initialize(
'optimizer',
paddle.optimizer,
parameters=self.model.parameters(),
learning_rate=self.lr_scheduler)
def inverse_normalize(self, batch_img):
if self.UN_Normalize:
batch_img[:, 0, :, :] = batch_img[:, 0, :, :] * self.normalize_std[
0] + self.normalize_mean[0]
batch_img[:, 1, :, :] = batch_img[:, 1, :, :] * self.normalize_std[
1] + self.normalize_mean[1]
batch_img[:, 2, :, :] = batch_img[:, 2, :, :] * self.normalize_std[
2] + self.normalize_mean[2]
def logger_info(self, s):
if paddle.distributed.get_rank() == 0:
self.logger.info(s)
name: DBNet
dataset:
train:
dataset:
type: SynthTextDataset # 数据集类型
args:
data_path: ''# SynthTextDataset 根目录
pre_processes: # 数据的预处理过程,包含augment和标签制作
- type: IaaAugment # 使用imgaug进行变换
args:
- {'type':Fliplr, 'args':{'p':0.5}}
- {'type': Affine, 'args':{'rotate':[-10,10]}}
- {'type':Resize,'args':{'size':[0.5,3]}}
- type: EastRandomCropData
args:
size: [640,640]
max_tries: 50
keep_ratio: true
- type: MakeBorderMap
args:
shrink_ratio: 0.4
- type: MakeShrinkMap
args:
shrink_ratio: 0.4
min_text_size: 8
transforms: # 对图片进行的变换方式
- type: ToTensor
args: {}
- type: Normalize
args:
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
img_mode: RGB
filter_keys: ['img_path','img_name','text_polys','texts','ignore_tags','shape'] # 返回数据之前,从数据字典里删除的key
ignore_tags: ['*', '###']
loader:
batch_size: 1
shuffle: true
num_workers: 0
collate_fn: ''
\ No newline at end of file
name: DBNet
base: ['config/SynthText.yaml']
arch:
type: Model
backbone:
type: resnet18
pretrained: true
neck:
type: FPN
inner_channels: 256
head:
type: DBHead
out_channels: 2
k: 50
post_processing:
type: SegDetectorRepresenter
args:
thresh: 0.3
box_thresh: 0.7
max_candidates: 1000
unclip_ratio: 1.5 # from paper
metric:
type: QuadMetric
args:
is_output_polygon: false
loss:
type: DBLoss
alpha: 1
beta: 10
ohem_ratio: 3
optimizer:
type: Adam
args:
lr: 0.001
weight_decay: 0
amsgrad: true
lr_scheduler:
type: WarmupPolyLR
args:
warmup_epoch: 3
trainer:
seed: 2
epochs: 1200
log_iter: 10
show_images_iter: 50
resume_checkpoint: ''
finetune_checkpoint: ''
output_dir: output
visual_dl: false
amp:
scale_loss: 1024
amp_level: O2
custom_white_list: []
custom_black_list: ['exp', 'sigmoid', 'concat']
dataset:
train:
dataset:
args:
data_path: ./datasets/SynthText
img_mode: RGB
loader:
batch_size: 2
shuffle: true
num_workers: 6
collate_fn: ''
\ No newline at end of file
name: DBNet
dataset:
train:
dataset:
type: ICDAR2015Dataset # 数据集类型
args:
data_path: # 一个存放 img_path \t gt_path的文件
- ''
pre_processes: # 数据的预处理过程,包含augment和标签制作
- type: IaaAugment # 使用imgaug进行变换
args:
- {'type':Fliplr, 'args':{'p':0.5}}
- {'type': Affine, 'args':{'rotate':[-10,10]}}
- {'type':Resize,'args':{'size':[0.5,3]}}
- type: EastRandomCropData
args:
size: [640,640]
max_tries: 50
keep_ratio: true
- type: MakeBorderMap
args:
shrink_ratio: 0.4
thresh_min: 0.3
thresh_max: 0.7
- type: MakeShrinkMap
args:
shrink_ratio: 0.4
min_text_size: 8
transforms: # 对图片进行的变换方式
- type: ToTensor
args: {}
- type: Normalize
args:
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
img_mode: RGB
filter_keys: [img_path,img_name,text_polys,texts,ignore_tags,shape] # 返回数据之前,从数据字典里删除的key
ignore_tags: ['*', '###']
loader:
batch_size: 1
shuffle: true
num_workers: 0
collate_fn: ''
validate:
dataset:
type: ICDAR2015Dataset
args:
data_path:
- ''
pre_processes:
- type: ResizeShortSize
args:
short_size: 736
resize_text_polys: false
transforms:
- type: ToTensor
args: {}
- type: Normalize
args:
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
img_mode: RGB
filter_keys: []
ignore_tags: ['*', '###']
loader:
batch_size: 1
shuffle: true
num_workers: 0
collate_fn: ICDARCollectFN
\ No newline at end of file
name: DBNet
base: ['config/icdar2015.yaml']
arch:
type: Model
backbone:
type: deformable_resnet18
pretrained: true
neck:
type: FPN
inner_channels: 256
head:
type: DBHead
out_channels: 2
k: 50
post_processing:
type: SegDetectorRepresenter
args:
thresh: 0.3
box_thresh: 0.7
max_candidates: 1000
unclip_ratio: 1.5 # from paper
metric:
type: QuadMetric
args:
is_output_polygon: false
loss:
type: DBLoss
alpha: 1
beta: 10
ohem_ratio: 3
optimizer:
type: Adam
args:
lr: 0.001
weight_decay: 0
amsgrad: true
lr_scheduler:
type: WarmupPolyLR
args:
warmup_epoch: 3
trainer:
seed: 2
epochs: 1200
log_iter: 10
show_images_iter: 50
resume_checkpoint: ''
finetune_checkpoint: ''
output_dir: output
visual_dl: false
amp:
scale_loss: 1024
amp_level: O2
custom_white_list: []
custom_black_list: ['exp', 'sigmoid', 'concat']
dataset:
train:
dataset:
args:
data_path:
- ./datasets/train.txt
img_mode: RGB
loader:
batch_size: 1
shuffle: true
num_workers: 6
collate_fn: ''
validate:
dataset:
args:
data_path:
- ./datasets/test.txt
pre_processes:
- type: ResizeShortSize
args:
short_size: 736
resize_text_polys: false
img_mode: RGB
loader:
batch_size: 1
shuffle: true
num_workers: 6
collate_fn: ICDARCollectFN
\ No newline at end of file
name: DBNet
base: ['config/icdar2015.yaml']
arch:
type: Model
backbone:
type: resnet18
pretrained: true
neck:
type: FPN
inner_channels: 256
head:
type: DBHead
out_channels: 2
k: 50
post_processing:
type: SegDetectorRepresenter
args:
thresh: 0.3
box_thresh: 0.7
max_candidates: 1000
unclip_ratio: 1.5 # from paper
metric:
type: QuadMetric
args:
is_output_polygon: false
loss:
type: DBLoss
alpha: 1
beta: 10
ohem_ratio: 3
optimizer:
type: Adam
args:
lr: 0.001
weight_decay: 0
amsgrad: true
lr_scheduler:
type: WarmupPolyLR
args:
warmup_epoch: 3
trainer:
seed: 2
epochs: 1200
log_iter: 10
show_images_iter: 50
resume_checkpoint: ''
finetune_checkpoint: ''
output_dir: output
visual_dl: false
amp:
scale_loss: 1024
amp_level: O2
custom_white_list: []
custom_black_list: ['exp', 'sigmoid', 'concat']
dataset:
train:
dataset:
args:
data_path:
- ./datasets/train.txt
img_mode: RGB
loader:
batch_size: 1
shuffle: true
num_workers: 6
collate_fn: ''
validate:
dataset:
args:
data_path:
- ./datasets/test.txt
pre_processes:
- type: ResizeShortSize
args:
short_size: 736
resize_text_polys: false
img_mode: RGB
loader:
batch_size: 1
shuffle: true
num_workers: 6
collate_fn: ICDARCollectFN
name: DBNet
base: ['config/icdar2015.yaml']
arch:
type: Model
backbone:
type: resnet18
pretrained: true
neck:
type: FPN
inner_channels: 256
head:
type: DBHead
out_channels: 2
k: 50
post_processing:
type: SegDetectorRepresenter
args:
thresh: 0.3
box_thresh: 0.7
max_candidates: 1000
unclip_ratio: 1.5 # from paper
metric:
type: QuadMetric
args:
is_output_polygon: false
loss:
type: DBLoss
alpha: 1
beta: 10
ohem_ratio: 3
optimizer:
type: Adam
args:
lr: 0.001
weight_decay: 0
amsgrad: true
lr_scheduler:
type: StepLR
args:
step_size: 10
gama: 0.8
trainer:
seed: 2
epochs: 500
log_iter: 10
show_images_iter: 50
resume_checkpoint: ''
finetune_checkpoint: ''
output_dir: output
visual_dl: false
amp:
scale_loss: 1024
amp_level: O2
custom_white_list: []
custom_black_list: ['exp', 'sigmoid', 'concat']
dataset:
train:
dataset:
args:
data_path:
- ./datasets/train.txt
img_mode: RGB
loader:
batch_size: 1
shuffle: true
num_workers: 6
collate_fn: ''
validate:
dataset:
args:
data_path:
- ./datasets/test.txt
pre_processes:
- type: ResizeShortSize
args:
short_size: 736
resize_text_polys: false
img_mode: RGB
loader:
batch_size: 1
shuffle: true
num_workers: 6
collate_fn: ICDARCollectFN
name: DBNet
base: ['config/icdar2015.yaml']
arch:
type: Model
backbone:
type: resnet50
pretrained: true
neck:
type: FPN
inner_channels: 256
head:
type: DBHead
out_channels: 2
k: 50
post_processing:
type: SegDetectorRepresenter
args:
thresh: 0.3
box_thresh: 0.7
max_candidates: 1000
unclip_ratio: 1.5 # from paper
metric:
type: QuadMetric
args:
is_output_polygon: false
loss:
type: DBLoss
alpha: 1
beta: 10
ohem_ratio: 3
optimizer:
type: Adam
lr_scheduler:
type: Polynomial
args:
learning_rate: 0.001
warmup_epoch: 3
trainer:
seed: 2
epochs: 1200
log_iter: 10
show_images_iter: 50
resume_checkpoint: ''
finetune_checkpoint: ''
output_dir: output/fp16_o2
visual_dl: false
amp:
scale_loss: 1024
amp_level: O2
custom_white_list: []
custom_black_list: ['exp', 'sigmoid', 'concat']
dataset:
train:
dataset:
args:
data_path:
- ./datasets/train.txt
img_mode: RGB
loader:
batch_size: 16
shuffle: true
num_workers: 6
collate_fn: ''
validate:
dataset:
args:
data_path:
- ./datasets/test.txt
pre_processes:
- type: ResizeShortSize
args:
short_size: 736
resize_text_polys: false
img_mode: RGB
loader:
batch_size: 1
shuffle: true
num_workers: 6
collate_fn: ICDARCollectFN
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment