"...MoQ/huggingface-transformers/examples/seq2seq/README.md" did not exist on "aebde649e30016aa33b2e1345cb22210a2e49b04"
Commit 79aec8f2 authored by LDOUBLEV's avatar LDOUBLEV
Browse files

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleOCR into fixocr

parents caaf0bd4 7214a741
# Inference Based on the Python Prediction Engine
An inference model (a model saved by fluid.io.save_inference_model)
is a frozen model saved after training, and is mostly used for deployment and prediction.
......
# Quick Start with the Chinese OCR Models
## 1. Environment Setup
Please refer to [Quick Installation](./installation.md) to set up the PaddleOCR runtime environment.
## 2. Download the Inference Models
|Model name|Description|Detection model|Recognition model|Recognition model with space support|
|-|-|-|-|-|
|chinese_db_crnn_mobile|Ultra-lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)|
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_enhance.tar)|
*On Windows, if wget is not installed, you can paste the links into a browser to download the models, then extract them into the corresponding directory (a Python alternative is sketched after the next code block).*
Copy the download links of the detection and recognition `inference models` from the table above, then download and extract them:
```
mkdir inference && cd inference
# Download the detection model and extract it
wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package}
# Download the recognition model and extract it
wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package}
cd ..
```
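If `wget` is unavailable, the models can also be fetched with a few lines of Python; this sketch uses only the standard library and the ultra-lightweight detection model URL from the table above:
```python
import os
import tarfile
import urllib.request

# Detection model of the ultra-lightweight Chinese OCR model (URL from the table above)
url = "https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar"

os.makedirs("inference", exist_ok=True)
tar_path = os.path.join("inference", os.path.basename(url))
urllib.request.urlretrieve(url, tar_path)   # download the tar package
with tarfile.open(tar_path) as tar:
    tar.extractall(path="inference")        # extract into the inference/ directory
```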
Take the ultra-lightweight model as an example:
```
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and extract it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and extract it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
cd ..
```
After extraction, the file structure should look like this:
```
|-inference
|-ch_rec_mv3_crnn
|- model
|- params
|-ch_det_mv3_db
|- model
|- params
...
```
## 3. Predict a Single Image or a Set of Images
The following commands run text detection and recognition as a pipeline. When running prediction, specify the path of a single image or an image directory with the `image_dir` parameter, the path of the detection inference model with `det_model_dir`, and the path of the recognition inference model with `rec_model_dir`. The visualized results are saved to the `./inference_results` directory by default.
```bash
# Predict the single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# Predict the set of images specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
# To predict on CPU, set the use_gpu parameter to False
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
```
To try the general Chinese OCR model, download the corresponding models following the steps above and update the related parameters, for example:
```
# Predict the single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
```
To try the general Chinese OCR model with space support, likewise download the corresponding models and update the related parameters, for example:
```
# Predict the single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_12.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn_enhance/"
```
For more ways to use the detection and recognition pipeline, please refer to [Inference Based on the Python Prediction Engine](./inference.md) in the tutorials.
In addition, the tutorials also cover other deployment options for the Chinese OCR models:
- Inference based on the C++ prediction engine (coming soon)
- [Serving deployment](./doc/doc_ch/serving.md)
- On-device deployment (coming soon)
......@@ -94,7 +94,10 @@ Each line of word_dict.txt contains a single character, mapping characters to numeric indices,
`ppocr/utils/ic15_dict.txt` is an English dictionary containing 36 characters;
you can use it as needed.
To customize the dict file, add a `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.
*If you want to support recognition of the "space" character, set the `use_space_char` field in the yml file to `true`. Note that `use_space_char` only takes effect when `character_type=ch`.*
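As a minimal sketch of what `use_space_char` does (mirroring the `CharacterOps` logic included in this commit), the space is simply appended as one extra class to the character set loaded from the dictionary file:
```python
def load_char_set(character_dict_path, use_space_char=False):
    """Load the recognition character set, optionally appending a space class."""
    character_str = ""
    with open(character_dict_path, "rb") as fin:
        for line in fin.readlines():
            character_str += line.decode("utf-8").strip("\n").strip("\r\n")
    if use_space_char:
        character_str += " "  # the space becomes one more label the recognizer can emit
    return list(character_str)

# e.g. chars = load_char_set("./ppocr/utils/ppocr_keys_v1.txt", use_space_char=True)
```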
### Start Training
......@@ -124,6 +127,18 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml
```
- Data augmentation
PaddleOCR provides a variety of data augmentation methods. To add perturbations during training, set `distort: true` in the configuration file.
The default perturbations are: color space conversion (cvtColor), blur, jitter, Gaussian noise, random crop, perspective, and color reverse.
During training, each perturbation is selected with a 50% probability, as shown in the sketch below. For the concrete implementation, refer to [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
*Due to OpenCV compatibility issues, the perturbation operations are currently supported only on GPU.*
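A condensed sketch of the 50% gating pattern used in `warp()` (`img_tools.py` draws `random.randint(1, 100)` and applies the operation when the draw is at least 50; `random.random()` below is an equivalent stand-in):
```python
import random

def apply_with_prob(img, op, prob=0.5):
    """Apply augmentation `op` to `img` with probability `prob`."""
    return op(img) if random.random() < prob else img

# e.g. img = apply_with_prob(img, blur); img = apply_with_prob(img, add_gasuss_noise)
```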
- Training
PaddleOCR supports alternating training and evaluation. Set `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to control the evaluation frequency; by default, evaluation runs every 500 iterations. During evaluation, the best-accuracy model so far is saved as `output/rec_CRNN/best_accuracy` (see the parsing sketch below for the list form of `eval_batch_step`).
If the validation set is large, evaluation will be time-consuming; it is recommended to evaluate less frequently, or to evaluate only after training finishes.
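`eval_batch_step` also accepts a two-element list `[start, interval]`, in which case evaluation only begins after iteration `start`. A minimal sketch of how the training loop in this commit interprets the two forms:
```python
eval_batch_step = [1000, 2000]  # value taken from the yml config

start_eval_step = 0
if isinstance(eval_batch_step, list) and len(eval_batch_step) >= 2:
    start_eval_step, eval_batch_step = eval_batch_step

def should_eval(train_batch_id):
    """Evaluate once we are past start_eval_step and hit the interval."""
    return (train_batch_id > start_eval_step and
            (train_batch_id - start_eval_step) % eval_batch_step == 0)
```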
......@@ -157,12 +172,26 @@ Global:
  character_type: ch
  # Add a custom dictionary; if you change the dictionary, point this path to the new one
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
  # Apply data augmentation during training
  distort: true
  # Recognize spaces
  use_space_char: true
  ...
  # Modify the reader type
  reader_yml: ./configs/rec/rec_chinese_reader.yml
  ...
...
Optimizer:
  ...
  # Add a learning rate decay strategy
  decay:
    function: cosine_decay
    # Number of iterations per epoch
    step_each_epoch: 20
    # Total number of training epochs
    total_epoch: 1000
```
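For reference, `cosine_decay` anneals the learning rate from its initial value toward zero over `total_epoch` epochs. A minimal sketch of the schedule, assuming the usual formulation of Paddle's `cosine_decay` (check the Paddle documentation for the exact operator semantics):
```python
import math

def cosine_decay(lr0, global_step, step_each_epoch, total_epoch):
    """Cosine annealing: returns lr0 at epoch 0 and ~0 at total_epoch."""
    cur_epoch = global_step // step_each_epoch
    return lr0 * 0.5 * (math.cos(cur_epoch * math.pi / total_epoch) + 1)

# e.g. cosine_decay(0.001, global_step=10000, step_each_epoch=20, total_epoch=1000)
```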
**Note: the configuration file used for prediction/evaluation must be identical to the one used for training.**
......
# References
```
1. EAST:
@inproceedings{zhou2017east,
title={EAST: an efficient and accurate scene text detector},
author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={5551--5560},
year={2017}
}
2. DB:
@article{liao2019real,
title={Real-time Scene Text Detection with Differentiable Binarization},
author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
journal={arXiv preprint arXiv:1911.08947},
year={2019}
}
3. DTRB:
@inproceedings{baek2019wrong,
title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={4715--4723},
year={2019}
}
4. SAST:
@inproceedings{wang2019single,
title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
pages={1277--1285},
year={2019}
}
5. SRN:
@article{yu2020towards,
title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks},
author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui},
journal={arXiv preprint arXiv:2003.12294},
year={2020}
}
6. end2end-psl:
@inproceedings{sun2019chinese,
title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning},
author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9086--9095},
year={2019}
}
```
# Service Deployment
PaddleOCR provides two service deployment options:
- HubServing-based deployment: already integrated into PaddleOCR ([code](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/ocr_hubserving)); follow this tutorial to use it;
- PaddleServing-based deployment: see the PaddleServing [demo](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr); it will also be integrated into PaddleOCR later.
The service deployment directory contains three service packages, for detection, recognition, and the two-stage pipeline; install and start the one that fits your needs. The layout is:
```
deploy/hubserving/
└─ ocr_det      detection module service package
└─ ocr_rec      recognition module service package
└─ ocr_system   detection + recognition pipeline service package
```
Each service package contains three files. Taking the two-stage pipeline package as an example, the layout is:
```
deploy/hubserving/ocr_system/
└─ __init__.py   empty file
└─ config.json   configuration file, passed in as an argument when the service starts
└─ module.py     main module, containing the complete service logic
```
## Start the Service
The following steps use the detection + recognition two-stage pipeline service as an example. If you only need the detection service or the recognition service, substitute the corresponding file paths.
### 1. Install paddlehub
`pip3 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple`
### 2. Install the Service Module
PaddleOCR provides three service modules; install the ones you need. For example:
Install the detection service module:
`hub install deploy/hubserving/ocr_det/`
Or install the recognition service module:
`hub install deploy/hubserving/ocr_rec/`
Or install the detection + recognition pipeline service module:
`hub install deploy/hubserving/ocr_system/`
### 3. Modify the Configuration File
Specify the model paths, whether to use the GPU, whether to visualize the results, and other parameters in config.json. For example, the configuration of the pipeline service ocr_system:
```json
{
"modules_info": {
"ocr_system": {
"init_args": {
"version": "1.0.0",
"det_model_dir": "./inference/det/",
"rec_model_dir": "./inference/rec/",
"use_gpu": true
},
"predict_args": {
"visualization": false
}
}
}
}
```
The model paths here must point to `inference` models.
### 4. Start the Service
`hub serving start -m ocr_system --config deploy/hubserving/ocr_system/config.json`
This completes the deployment of a serving API; the default port is 8866.
**NOTE:** To predict on GPU (i.e., with use_gpu set to true in the config), you must set the CUDA_VISIBLE_DEVICES environment variable before starting the service, e.g. `export CUDA_VISIBLE_DEVICES=0`; otherwise it is not needed.
## Send a Prediction Request
With the service configured, the few lines of code below send a prediction request and retrieve the result:
```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
return base64.b64encode(image).decode('utf8')
# Send the HTTP request
data = {'images':[cv2_to_base64(open("./doc/imgs/11.jpg", 'rb').read())]}
headers = {"Content-type": "application/json"}
# url = "http://127.0.0.1:8866/predict/ocr_det"
# url = "http://127.0.0.1:8866/predict/ocr_rec"
url = "http://127.0.0.1:8866/predict/ocr_system"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
You may need to change the port number and the service module name in the `url` string to match your actual setup.
The code above is already included in a test script, which you can run directly: `python tools/test_hubserving.py`
## Customize the Service Module
If you need to modify the service logic, the general steps are:
1. Stop the service:
`hub serving stop -m ocr_system`
2. Modify the code in the corresponding module.py as needed.
3. Uninstall the old service package:
`hub uninstall ocr_system`
4. Install the modified service package:
`hub install deploy/hubserving/ocr_system/`
# Release Notes
- 2020.7.9 Added the recognition model with space support, along with its recognition results
- 2020.7.9 Added data augmentation and learning-rate decay strategies; see the configuration files for details
- 2020.6.8 Added datasets, which are continuously updated
- 2020.6.5 Supported exporting `attention` models to `inference_model`
- 2020.6.5 Recognition-only prediction now also outputs result scores
- 2020.5.30 Provided the online demo of the ultra-lightweight Chinese OCR model
......
# Visualization
- [Ultra-lightweight Chinese OCR results](#超轻量级中文OCR)
- [General Chinese OCR results](#通用中文OCR)
- [Space-supporting Chinese OCR results](#支持空格的中文OCR)
<a name="超轻量级中文OCR"></a>
## Ultra-lightweight Chinese OCR Results
![](../imgs_results/1.jpg)
![](../imgs_results/7.jpg)
![](../imgs_results/12.jpg)
![](../imgs_results/4.jpg)
![](../imgs_results/6.jpg)
![](../imgs_results/9.jpg)
![](../imgs_results/16.png)
![](../imgs_results/22.jpg)
<a name="通用中文OCR"></a>
## 通用中文OCR效果展示
![](../imgs_results/chinese_db_crnn_server/11.jpg)
![](../imgs_results/chinese_db_crnn_server/2.jpg)
![](../imgs_results/chinese_db_crnn_server/8.jpg)
<a name="支持空格的中文OCR"></a>
## 支持空格的中文OCR效果展示
### 轻量级模型
![](../imgs_results/img_11.jpg)
### General model
![](../imgs_results/chinese_db_crnn_server/en_paper.jpg)
......@@ -22,7 +22,7 @@ Take `rec_chinese_lite_train.yml` as an example
| print_batch_step | Set print log interval | 10 | \ |
| save_model_dir | Set model save path | output/{model_name} | \ |
| save_epoch_step | Set model save interval | 3 | \ |
| eval_batch_step | Set the model evaluation interval | 2000 or [1000, 2000] | Run evaluation every 2000 iterations; with the list form, run it every 2000 iterations starting after the 1000th iteration |
|train_batch_size_per_card | Set the batch size during training | 256 | \ |
| test_batch_size_per_card | Set the batch size during testing | 256 | \ |
| image_shape | Set input image size | [3, 32, 100] | \ |
......@@ -30,6 +30,8 @@ Take `rec_chinese_lite_train.yml` as an example
| character_type | Set character type | ch | en/ch, the default dict will be used for en, and the custom dict will be used for ch|
| character_dict_path | Set dictionary path | ./ppocr/utils/ic15_dict.txt | \ |
| loss_type | Set loss type | ctc | Supports two types of loss: ctc / attention |
| distort | Whether to use data augmentation | false | For the supported distortion types, see [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) |
| use_space_char | Whether to recognize spaces | false | Only supported when character_type=ch |
| reader_yml | Set the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ |
| pretrain_weights | Load pre-trained model path | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption |
......
......@@ -158,9 +158,23 @@ Global:
  ...
  # Modify the reader type
  reader_yml: ./configs/rec/rec_chinese_reader.yml
  # Whether to use data augmentation
  distort: true
  # Whether to recognize spaces
  use_space_char: true
  ...
...
Optimizer:
  ...
  # Add a learning rate decay strategy
  decay:
    function: cosine_decay
    # Number of iterations per epoch
    step_each_epoch: 20
    # Total number of training epochs
    total_epoch: 1000
```
**Note that the configuration file for prediction/evaluation must be consistent with the training.**
......
......@@ -194,8 +194,12 @@ class DBProcessTest(object):
img_std = [0.229, 0.224, 0.225]
im = im.astype(np.float32, copy=False)
im = im / 255
im[:, :, 0] -= img_mean[0]
im[:, :, 1] -= img_mean[1]
im[:, :, 2] -= img_mean[2]
im[:, :, 0] /= img_std[0]
im[:, :, 1] /= img_std[1]
im[:, :, 2] /= img_std[2]
channel_swap = (2, 0, 1)
im = im.transpose(channel_swap)
return im
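The per-channel statements above implement the standard ImageNet-style normalization; with the mean and std held as numpy arrays, the same computation can be written in vectorized form, as in this sketch:
```python
import numpy as np

def normalize_chw(im):
    """Scale to [0, 1], normalize each channel, and convert HWC -> CHW."""
    img_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    img_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    im = im.astype(np.float32) / 255.0
    im = (im - img_mean) / img_std  # broadcasts over the last (channel) axis
    return im.transpose(2, 0, 1)
```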
......
......@@ -45,12 +45,20 @@ class LMDBReader(object):
self.use_tps = False
if "tps" in params:
self.use_tps = True
self.use_distort = False
if "distort" in params:
self.use_distort = params['distort'] and params['use_gpu']
if not params['use_gpu']:
logger.info(
"Distort operation can only support in GPU. Distort will be set to False."
)
if params['mode'] == 'train':
self.batch_size = params['train_batch_size_per_card']
self.drop_last = True
else:
self.batch_size = params['test_batch_size_per_card']
self.drop_last = False
self.use_distort = False
self.infer_img = params['infer_img']
def load_hierarchical_lmdb_dataset(self):
......@@ -142,7 +150,8 @@ class LMDBReader(object):
label=label,
char_ops=self.char_ops,
loss_type=self.loss_type,
max_text_length=self.max_text_length)
max_text_length=self.max_text_length,
distort=self.use_distort)
if outs is None:
continue
yield outs
......@@ -185,12 +194,20 @@ class SimpleReader(object):
self.use_tps = False
if "tps" in params:
self.use_tps = True
self.use_distort = False
if "distort" in params:
self.use_distort = params['distort'] and params['use_gpu']
if not params['use_gpu']:
logger.info(
"Distort operation can only support in GPU.Distort will be set to False."
)
if params['mode'] == 'train':
self.batch_size = params['train_batch_size_per_card']
self.drop_last = True
else:
self.batch_size = params['test_batch_size_per_card']
self.drop_last = False
self.use_distort = False
def __call__(self, process_id):
if self.mode != 'train':
......@@ -232,9 +249,14 @@ class SimpleReader(object):
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
label = substr[1]
outs = process_image(
img=img,
image_shape=self.image_shape,
label=label,
char_ops=self.char_ops,
loss_type=self.loss_type,
max_text_length=self.max_text_length,
distort=self.use_distort)
if outs is None:
continue
yield outs
......
......@@ -15,6 +15,7 @@
import math
import cv2
import numpy as np
import random
from ppocr.utils.utility import initial_logger
logger = initial_logger()
......@@ -89,6 +90,254 @@ def get_img_data(value):
return imgori
def flag():
"""
flag
"""
return 1 if random.random() > 0.5000001 else -1
def cvtColor(img):
"""
cvtColor
"""
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
delta = 0.001 * random.random() * flag()
hsv[:, :, 2] = hsv[:, :, 2] * (1 + delta)
new_img = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
return new_img
def blur(img):
"""
blur
"""
h, w, _ = img.shape
if h > 10 and w > 10:
return cv2.GaussianBlur(img, (5, 5), 1)
else:
return img
def jitter(img):
"""
jitter
"""
w, h, _ = img.shape
if h > 10 and w > 10:
thres = min(w, h)
s = int(random.random() * thres * 0.01)
src_img = img.copy()
for i in range(s):
img[i:, i:, :] = src_img[:w - i, :h - i, :]
return img
else:
return img
def add_gasuss_noise(image, mean=0, var=0.1):
"""
Gasuss noise
"""
noise = np.random.normal(mean, var**0.5, image.shape)
out = image + 0.5 * noise
out = np.clip(out, 0, 255)
out = np.uint8(out)
return out
def get_crop(image):
"""
random crop
"""
h, w, _ = image.shape
top_min = 1
top_max = 8
top_crop = int(random.randint(top_min, top_max))
top_crop = min(top_crop, h - 1)
crop_img = image.copy()
ratio = random.randint(0, 1)
if ratio:
crop_img = crop_img[top_crop:h, :, :]
else:
crop_img = crop_img[0:h - top_crop, :, :]
return crop_img
class Config:
"""
Config
"""
def __init__(self, ):
self.anglex = random.random() * 30
self.angley = random.random() * 15
self.anglez = random.random() * 10
self.fov = 42
self.r = 0
self.shearx = random.random() * 0.3
self.sheary = random.random() * 0.05
self.borderMode = cv2.BORDER_REPLICATE
def make(self, w, h, ang):
"""
make
"""
self.anglex = random.random() * 5 * flag()
self.angley = random.random() * 5 * flag()
self.anglez = -1 * random.random() * int(ang) * flag()
self.fov = 42
self.r = 0
self.shearx = 0
self.sheary = 0
self.borderMode = cv2.BORDER_REPLICATE
self.w = w
self.h = h
self.perspective = True
self.crop = True
self.affine = False
self.reverse = True
self.noise = True
self.jitter = True
self.blur = True
self.color = True
def rad(x):
"""
rad
"""
return x * np.pi / 180
def get_warpR(config):
"""
get_warpR
"""
anglex, angley, anglez, fov, w, h, r = \
config.anglex, config.angley, config.anglez, config.fov, config.w, config.h, config.r
if w > 69 and w < 112:
anglex = anglex * 1.5
z = np.sqrt(w**2 + h**2) / 2 / np.tan(rad(fov / 2))
# Homogeneous coordinate transformation matrix
rx = np.array([[1, 0, 0, 0],
[0, np.cos(rad(anglex)), -np.sin(rad(anglex)), 0], [
0,
-np.sin(rad(anglex)),
np.cos(rad(anglex)),
0,
], [0, 0, 0, 1]], np.float32)
ry = np.array([[np.cos(rad(angley)), 0, np.sin(rad(angley)), 0],
[0, 1, 0, 0], [
-np.sin(rad(angley)),
0,
np.cos(rad(angley)),
0,
], [0, 0, 0, 1]], np.float32)
rz = np.array([[np.cos(rad(anglez)), np.sin(rad(anglez)), 0, 0],
[-np.sin(rad(anglez)), np.cos(rad(anglez)), 0, 0],
[0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
r = rx.dot(ry).dot(rz)
# generate 4 points
pcenter = np.array([h / 2, w / 2, 0, 0], np.float32)
p1 = np.array([0, 0, 0, 0], np.float32) - pcenter
p2 = np.array([w, 0, 0, 0], np.float32) - pcenter
p3 = np.array([0, h, 0, 0], np.float32) - pcenter
p4 = np.array([w, h, 0, 0], np.float32) - pcenter
dst1 = r.dot(p1)
dst2 = r.dot(p2)
dst3 = r.dot(p3)
dst4 = r.dot(p4)
list_dst = np.array([dst1, dst2, dst3, dst4])
org = np.array([[0, 0], [w, 0], [0, h], [w, h]], np.float32)
dst = np.zeros((4, 2), np.float32)
# Project onto the image plane
dst[:, 0] = list_dst[:, 0] * z / (z - list_dst[:, 2]) + pcenter[0]
dst[:, 1] = list_dst[:, 1] * z / (z - list_dst[:, 2]) + pcenter[1]
warpR = cv2.getPerspectiveTransform(org, dst)
dst1, dst2, dst3, dst4 = dst
r1 = int(min(dst1[1], dst2[1]))
r2 = int(max(dst3[1], dst4[1]))
c1 = int(min(dst1[0], dst3[0]))
c2 = int(max(dst2[0], dst4[0]))
try:
ratio = min(1.0 * h / (r2 - r1), 1.0 * w / (c2 - c1))
dx = -c1
dy = -r1
T1 = np.float32([[1., 0, dx], [0, 1., dy], [0, 0, 1.0 / ratio]])
ret = T1.dot(warpR)
except Exception:
ratio = 1.0
T1 = np.float32([[1., 0, 0], [0, 1., 0], [0, 0, 1.]])
ret = T1
return ret, (-r1, -c1), ratio, dst
def get_warpAffine(config):
"""
get_warpAffine
"""
anglez = config.anglez
rz = np.array([[np.cos(rad(anglez)), np.sin(rad(anglez)), 0],
[-np.sin(rad(anglez)), np.cos(rad(anglez)), 0]], np.float32)
return rz
def warp(img, ang):
"""
warp
"""
h, w, _ = img.shape
config = Config()
config.make(w, h, ang)
new_img = img
if config.perspective:
tp = random.randint(1, 100)
if tp >= 50:
warpR, (r1, c1), ratio, dst = get_warpR(config)
new_w = int(np.max(dst[:, 0])) - int(np.min(dst[:, 0]))
new_img = cv2.warpPerspective(
new_img,
warpR, (int(new_w * ratio), h),
borderMode=config.borderMode)
if config.crop:
img_height, img_width = img.shape[0:2]
tp = random.randint(1, 100)
if tp >= 50 and img_height >= 20 and img_width >= 20:
new_img = get_crop(new_img)
if config.affine:
warpT = get_warpAffine(config)
new_img = cv2.warpAffine(
new_img, warpT, (w, h), borderMode=config.borderMode)
if config.blur:
tp = random.randint(1, 100)
if tp >= 50:
new_img = blur(new_img)
if config.color:
tp = random.randint(1, 100)
if tp >= 50:
new_img = cvtColor(new_img)
if config.jitter:
new_img = jitter(new_img)
if config.noise:
tp = random.randint(1, 100)
if tp >= 50:
new_img = add_gasuss_noise(new_img)
if config.reverse:
tp = random.randint(1, 100)
if tp >= 50:
new_img = 255 - new_img
return new_img
def process_image(img,
image_shape,
label=None,
......@@ -96,7 +345,10 @@ def process_image(img,
loss_type=None,
max_text_length=None,
tps=None,
infer_mode=False,
distort=False):
if distort:
img = warp(img, 10)
if infer_mode and char_ops.character_type == "ch" and not tps:
norm_img = resize_norm_img_chinese(img, image_shape)
else:
......@@ -108,7 +360,7 @@ def process_image(img,
text = char_ops.encode(label)
if len(text) == 0 or len(text) > max_text_length:
logger.info(
"Warning in ppocr/data/rec/img_tools.py:line106: Wrong data type."
"Warning in ppocr/data/rec/img_tools.py:line362: Wrong data type."
"Excepted string with length between 1 and {}, but "
"got '{}'. Label is '{}'".format(max_text_length,
len(text), label))
......
......@@ -30,12 +30,17 @@ class CharacterOps(object):
dict_character = list(self.character_str)
elif self.character_type == "ch":
character_dict_path = config['character_dict_path']
add_space = False
if 'use_space_char' in config:
add_space = config['use_space_char']
self.character_str = ""
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
line = line.decode('utf-8').strip("\n").strip("\r\n")
self.character_str += line
if add_space:
self.character_str += " "
dict_character = list(self.character_str)
elif self.character_type == "en_sensitive":
# same with ASTER setting (use 94 char).
......@@ -93,7 +98,7 @@ class CharacterOps(object):
if is_remove_duplicate:
if idx > 0 and text_index[idx - 1] == text_index[idx]:
continue
char_list.append(self.character[int(text_index[idx])])
text = ''.join(char_list)
return text
......
......@@ -39,7 +39,8 @@ class TextRecognizer(object):
self.rec_algorithm = args.rec_algorithm
char_ops_params = {
"character_type": args.rec_char_type,
"character_dict_path": args.rec_char_dict_path
"character_dict_path": args.rec_char_dict_path,
"use_space_char": args.use_space_char
}
if self.rec_algorithm != "RARE":
char_ops_params['loss_type'] = 'ctc'
......
......@@ -63,6 +63,7 @@ def parse_args():
"--rec_char_dict_path",
type=str,
default="./ppocr/utils/ppocr_keys_v1.txt")
parser.add_argument("--use_space_char", type=bool, default=True)
return parser.parse_args()
......@@ -90,8 +91,9 @@ def create_predictor(args, mode):
config.enable_use_gpu(args.gpu_mem, 0)
else:
config.disable_gpu()
config.enable_mkldnn()
config.set_cpu_math_library_num_threads(4)
#config.enable_memory_optim()
config.disable_glog_info()
# use zero copy
......@@ -169,26 +171,35 @@ def draw_ocr_box_txt(image, boxes, txts):
draw_left = ImageDraw.Draw(img_left)
draw_right = ImageDraw.Draw(img_right)
for (box, txt) in zip(boxes, txts):
color = (random.randint(0, 255), random.randint(0, 255),
random.randint(0, 255))
draw_left.polygon(box, fill=color)
draw_right.polygon(
[
box[0][0], box[0][1], box[1][0], box[1][1], box[2][0],
box[2][1], box[3][0], box[3][1]
],
outline=color)
box_height = math.sqrt((box[0][0] - box[3][0])**2 + (box[0][1] - box[3][1])**2)
box_width = math.sqrt((box[0][0] - box[1][0])**2 + (box[0][1] - box[1][1])**2)
if box_height > 2 * box_width:
font_size = max(int(box_width * 0.9), 10)
font = ImageFont.truetype("./doc/simfang.ttf", font_size, encoding="utf-8")
font = ImageFont.truetype(
"./doc/simfang.ttf", font_size, encoding="utf-8")
cur_y = box[0][1]
for c in txt:
char_size = font.getsize(c)
draw_right.text(
(box[0][0] + 3, cur_y), c, fill=(0, 0, 0), font=font)
cur_y += char_size[1]
else:
font_size = max(int(box_height * 0.8), 10)
font = ImageFont.truetype("./doc/simfang.ttf", font_size, encoding="utf-8")
draw_right.text([box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font)
font = ImageFont.truetype(
"./doc/simfang.ttf", font_size, encoding="utf-8")
draw_right.text(
[box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font)
img_left = Image.blend(image, img_left, 0.5)
img_show = Image.new('RGB', (w * 2, h), (255, 255, 255))
img_show.paste(img_left, (0, 0, w, h))
......@@ -292,6 +303,25 @@ def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.):
return np.array(blank_img)
def base64_to_cv2(b64str):
import base64
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)  # np.fromstring is deprecated for binary data
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
def draw_boxes(image, boxes, scores=None, drop_score=0.5):
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score:
continue
box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
return image
if __name__ == '__main__':
test_img = "./doc/test_v2"
predict_txt = "./doc/predict.txt"
......
......@@ -219,6 +219,13 @@ def train_eval_det_run(config, exe, train_info_dict, eval_info_dict):
epoch_num = config['Global']['epoch_num']
print_batch_step = config['Global']['print_batch_step']
eval_batch_step = config['Global']['eval_batch_step']
start_eval_step = 0
if type(eval_batch_step) == list and len(eval_batch_step) >= 2:
start_eval_step = eval_batch_step[0]
eval_batch_step = eval_batch_step[1]
logger.info(
"During the training process, after the {}th iteration, an evaluation is run every {} iterations".
format(start_eval_step, eval_batch_step))
save_epoch_step = config['Global']['save_epoch_step']
save_model_dir = config['Global']['save_model_dir']
if not os.path.exists(save_model_dir):
......@@ -246,7 +253,7 @@ def train_eval_det_run(config, exe, train_info_dict, eval_info_dict):
t2 = time.time()
train_batch_elapse = t2 - t1
train_stats.update(stats)
if train_batch_id > start_eval_step and (train_batch_id - start_eval_step) \
% print_batch_step == 0:
logs = train_stats.log()
strs = 'epoch: {}, iter: {}, {}, time: {:.3f}'.format(
......@@ -286,6 +293,13 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict):
epoch_num = config['Global']['epoch_num']
print_batch_step = config['Global']['print_batch_step']
eval_batch_step = config['Global']['eval_batch_step']
start_eval_step = 0
if type(eval_batch_step) == list and len(eval_batch_step) >= 2:
start_eval_step = eval_batch_step[0]
eval_batch_step = eval_batch_step[1]
logger.info(
"During the training process, after the {}th iteration, an evaluation is run every {} iterations".
format(start_eval_step, eval_batch_step))
save_epoch_step = config['Global']['save_epoch_step']
save_model_dir = config['Global']['save_model_dir']
if not os.path.exists(save_model_dir):
......@@ -324,7 +338,7 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict):
train_batch_elapse = t2 - t1
stats = {'loss': loss, 'acc': acc}
train_stats.update(stats)
if train_batch_id > start_eval_step and (train_batch_id - start_eval_step) \
% print_batch_step == 0:
logs = train_stats.log()
strs = 'epoch: {}, iter: {}, lr: {:.6f}, {}, time: {:.3f}'.format(
......
#!usr/bin/python
# -*- coding: utf-8 -*-
import requests
import json
import cv2
import base64
import time
def cv2_to_base64(image):
return base64.b64encode(image).decode('utf8')
start = time.time()
# Send the HTTP request
data = {'images':[cv2_to_base64(open("./doc/imgs/11.jpg", 'rb').read())]}
headers = {"Content-type": "application/json"}
# url = "http://127.0.0.1:8866/predict/ocr_det"
# url = "http://127.0.0.1:8866/predict/ocr_rec"
url = "http://127.0.0.1:8866/predict/ocr_system"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
end = time.time()
# Print the prediction results
print(r.json()["results"])
print("time cost: ", end - start)