Commit e3fd6db0 authored by xiaoting, committed by GitHub

Merge branch 'develop' into adaptation_ch

parents 5edb619c bf321ecd
# paddleocr package usage guide
## Quick start
### Install the whl package
Install with pip
```bash
pip install paddleocr
```
Build and install locally
```bash
python setup.py bdist_wheel
pip install dist/paddleocr-0.0.3-py3-none-any.whl
```
### 1. Use from code
* Detection and recognition (full pipeline)
```python
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path)
for line in result:
    print(line)

# Show the result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
The result is a list; each item contains the text box, the text, and the recognition confidence.
```bash
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
......
```
Visualization of the result
<div align="center">
<img src="../imgs_results/whl/11_det_rec.jpg" width="800">
</div>
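The nested list shape shown above can be awkward to pass to other tools. As a minimal sketch (the helper name `to_records` is hypothetical and not part of paddleocr), the items can be flattened into dictionaries and serialized as JSON. The sample data is copied from the output above.

```python
import json

# Sample items copied from the output above: [box, [text, score]]
result = [
    [[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]],
    [[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]],
]


def to_records(result):
    """Hypothetical helper: turn [box, [text, score]] items into dicts."""
    return [
        {"box": box, "text": text, "score": float(score)}
        for box, (text, score) in result
    ]


records = to_records(result)
# ensure_ascii=False keeps the Chinese characters readable in the JSON text
print(json.dumps(records, ensure_ascii=False, indent=2))
```

Each record keeps the four corner points untouched, so the original boxes can still be passed to `draw_ocr` later.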
* Detection only
```python
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path, rec=False)
for line in result:
    print(line)

# Show the result
from PIL import Image
image = Image.open(img_path).convert('RGB')
im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
The result is a list; each item contains only the text box.
```bash
[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]]
[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]]
[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]]
......
```
Visualization of the result
<div align="center">
<img src="../imgs_results/whl/11_det.jpg" width="800">
</div>
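In the sample output above, the boxes are not listed top to bottom. If reading order matters downstream, one simple approach is to sort by the topmost and then leftmost corner of each box. This is only a sketch under that assumption (the helper `sort_boxes` is hypothetical), and it does not handle multi-column layouts.

```python
# Boxes copied from the detection-only output above
boxes = [
    [[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]],
    [[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]],
    [[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]],
]


def sort_boxes(boxes):
    """Hypothetical helper: order boxes by smallest y, then smallest x."""
    return sorted(
        boxes,
        key=lambda box: (min(p[1] for p in box), min(p[0] for p in box)))


sorted_boxes = sort_boxes(boxes)
for box in sorted_boxes:
    print(box[0])  # first corner of each box, now top to bottom
```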
* Recognition only
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
result = ocr.ocr(img_path, det=False)
for line in result:
    print(line)
```
The result is a list; each item contains only the recognized text and the recognition confidence.
```bash
['韩国小馆', 0.9907421]
```
### Use from the command line
Show help information
```bash
paddleocr -h
```
* Detection and recognition (full pipeline)
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg
```
The result is a list; each item contains the text box, the text, and the recognition confidence.
```bash
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
......
```
* Detection only
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --rec false
```
The result is a list; each item contains only the text box.
```bash
[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]]
[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]]
[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]]
......
```
* Recognition only
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --det false
```
The result is a list; each item contains only the recognized text and the recognition confidence.
```bash
['韩国小馆', 0.9907421]
```
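## Custom models
When the built-in models do not meet your needs, you can use models you trained yourself.
First, follow section 1 of [inference.md](./inference.md) to convert the detection and recognition models into inference models, then use them as follows.
### Use from code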
```python
from paddleocr import PaddleOCR, draw_ocr
# The detection and recognition model directories must each contain the model and params files
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}', rec_model_dir='{your_rec_model_dir}')
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path)
for line in result:
    print(line)

# Show the result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
### Use from the command line
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir}
```
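## Parameter description
| Parameter | Description | Default |
|-------------------------|-------------|-------------------------|
| use_gpu | Whether to use the GPU | TRUE |
| gpu_mem | GPU memory reserved at initialization | 8000M |
| image_dir | Path of the image or folder to run prediction on when invoked from the command line | |
| det_algorithm | Detection algorithm to use | DB |
| det_model_dir | Folder containing the detection model. There are two ways to set it: 1. None: the built-in model is downloaded automatically to `~/.paddleocr/det`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None |
| det_max_side_len | Maximum size of the long side of the image in the detection forward pass. When the long side exceeds this value, it is resized to this size and the short side is scaled proportionally | 960 |
| det_db_thresh | Binarization threshold for the DB model's output prediction map | 0.3 |
| det_db_box_thresh | Threshold for DB output boxes; predicted boxes scoring below this value are discarded | 0.5 |
| det_db_unclip_ratio | Expansion ratio of DB output boxes | 2 |
| det_east_score_thresh | Binarization threshold for the EAST model's output prediction map | 0.8 |
| det_east_cover_thresh | Threshold for EAST output boxes; predicted boxes scoring below this value are discarded | 0.1 |
| det_east_nms_thresh | NMS threshold for EAST output boxes | 0.2 |
| rec_algorithm | Recognition algorithm to use | CRNN |
| rec_model_dir | Folder containing the recognition model. There are two ways to set it: 1. None: the built-in model is downloaded automatically to `~/.paddleocr/rec`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None |
| rec_image_shape | Input image shape for the recognition algorithm | "3,32,320" |
| rec_char_type | Character type for the recognition algorithm, Chinese (ch) or English (en) | ch |
| rec_batch_num | Number of images forwarded together during recognition | 30 |
| max_text_length | Maximum text length the recognition algorithm can recognize | 25 |
| rec_char_dict_path | Path of the recognition dictionary; change it to your own dictionary path when rec_model_dir is set using option 2 | ./ppocr/utils/ppocr_keys_v1.txt |
| use_space_char | Whether to recognize spaces | TRUE |
| enable_mkldnn | Whether to enable mkldnn | FALSE |
| det | Whether to run detection in the forward pass | TRUE |
| rec | Whether to run recognition in the forward pass | TRUE |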
@@ -60,8 +60,9 @@ Take `rec_icdar15_train.yml` as an example:
 | beta1 | Set the exponential decay rate for the 1st moment estimates | 0.9 | \ |
 | beta2 | Set the exponential decay rate for the 2nd moment estimates | 0.999 | \ |
 | decay | Whether to use decay | \ | \ |
-| function(decay) | Set the decay function | cosine_decay | Support cosine_decay and piecewise_decay |
-| step_each_epoch | The number of steps in an epoch. Used in cosine_decay | 20 | Calculation :total_image_num / (batch_size_per_card * card_size) |
-| total_epoch | The number of epochs. Used in cosine_decay | 1000 | Consistent with Global.epoch_num |
+| function(decay) | Set the decay function | cosine_decay | Support cosine_decay, cosine_decay_warmup and piecewise_decay |
+| step_each_epoch | The number of steps in an epoch. Used in cosine_decay/cosine_decay_warmup | 20 | Calculation: total_image_num / (batch_size_per_card * card_size) |
+| total_epoch | The number of epochs. Used in cosine_decay/cosine_decay_warmup | 1000 | Consistent with Global.epoch_num |
+| warmup_minibatch | Number of steps for linear warmup. Used in cosine_decay_warmup | 1000 | \ |
 | boundaries | The step intervals to reduce learning rate. Used in piecewise_decay | - | The format is list |
 | decay_rate | Learning rate decay rate. Used in piecewise_decay | - | \ |
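To make the interaction between `step_each_epoch`, `total_epoch` and `warmup_minibatch` concrete, here is a plain-Python sketch of the schedule that mirrors the arithmetic of `cosine_decay_with_warmup` added in this commit. It is not the fluid implementation; the function names and the use of integer division are assumptions made for illustration.

```python
import math


def steps_per_epoch(total_image_num, batch_size_per_card, card_size):
    # Follows the "Calculation" note in the table; integer division is an assumption
    return total_image_num // (batch_size_per_card * card_size)


def cosine_decay_warmup_lr(step, base_lr, step_each_epoch, total_epoch,
                           warmup_minibatch=1000):
    """Learning rate at a given global step (illustrative, pure Python)."""
    if step < warmup_minibatch:
        # Linear warmup from 0 up to base_lr
        return base_lr * step / warmup_minibatch
    # Half-cosine decay over total_epoch * step_each_epoch steps after warmup
    angle = (step - warmup_minibatch) * math.pi / (total_epoch * step_each_epoch)
    return base_lr * (math.cos(angle) + 1) / 2


step_each_epoch = steps_per_epoch(total_image_num=640, batch_size_per_card=8, card_size=4)
for step in (0, 500, 1000, 1000 + 1000 * step_each_epoch):
    print(step, cosine_decay_warmup_lr(step, 0.001, step_each_epoch, 1000))
```

Under these assumptions the rate rises linearly to `base_lr` at step `warmup_minibatch` and then decays towards zero after `total_epoch * step_each_epoch` further steps.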
@@ -5,6 +5,7 @@
 Please refer to [quick installation](./installation_en.md) to configure the PaddleOCR operating environment.
+*Note: PaddleOCR can also be used through whl package installation; please refer to [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md).*
 ## 2.inference models
......
 # RECENT UPDATES
+- 2020.8.24 Support the use of PaddleOCR through whl package installation; please refer to [PaddleOCR Package](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/whl_en.md)
 - 2020.8.16 Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
 - 2020.7.23, Release the playback and PPT of live class on BiliBili station, PaddleOCR Introduction, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
 - 2020.7.15, Add mobile App demo , support both iOS and Android ( based on easyedge and Paddle Lite)
......
# paddleocr package
## Get started quickly
### Install package
Install from PyPI
```bash
pip install paddleocr
```
Build your own whl package and install it
```bash
python setup.py bdist_wheel
pip install dist/paddleocr-0.0.3-py3-none-any.whl
```
### 1. Use by code
* detection and recognition
```python
from paddleocr import PaddleOCR,draw_ocr
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path)
for line in result:
    print(line)

# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
The output is a list; each item contains the bounding box, the text, and the recognition confidence.
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```
Visualization of results
<div align="center">
<img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>
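Since every item carries its own confidence, a common follow-up is to drop low-confidence lines before using the text. The sketch below assumes the `[box, [text, score]]` shape shown above. The helper name `filter_by_score` and the 0.95 threshold are illustrative choices, not part of paddleocr.

```python
# Sample items copied from the output above
result = [
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]],
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]],
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]],
]


def filter_by_score(result, min_score):
    """Hypothetical helper: keep (text, score) pairs at or above min_score."""
    return [(text, score) for _box, (text, score) in result if score >= min_score]


kept = filter_by_score(result, min_score=0.95)
for text, score in kept:
    print(f"{score:.3f}  {text}")
```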
* only detection
```python
from paddleocr import PaddleOCR,draw_ocr
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path,rec=False)
for line in result:
    print(line)

# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
The output is a list; each item contains only the bounding box.
```bash
[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
......
```
Visualization of results
<div align="center">
<img src="../imgs_results/whl/12_det.jpg" width="800">
</div>
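Each detected box is a list of four corner points and is not necessarily axis-aligned (note the slightly differing y values in the sample output). When a plain rectangle is needed, for example to crop a region, a minimal sketch is to take the extremes of the corner coordinates. The helper `bounding_rect` is hypothetical.

```python
def bounding_rect(box):
    """Hypothetical helper: axis-aligned (x_min, y_min, x_max, y_max) of a 4-point box."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)


# Second box from the detection-only output above
box = [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
x_min, y_min, x_max, y_max = bounding_rect(box)
print("top-left:", (x_min, y_min), "size:", (x_max - x_min, y_max - y_min))
```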
* only recognition
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR() # need to run only once to load model into memory
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
result = ocr.ocr(img_path,det=False)
for line in result:
    print(line)
```
The output is a list; each item contains the text and the recognition confidence.
```bash
['PAIN', 0.990372]
```
### Use by command line
Show help information
```bash
paddleocr -h
```
* detection and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg
```
The output is a list; each item contains the bounding box, the text, and the recognition confidence.
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```
* only detection
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --rec false
```
The output is a list; each item contains only the bounding box.
```bash
[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
......
```
* only recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false
```
The output is a list; each item contains the text and the recognition confidence.
```bash
['PAIN', 0.990372]
```
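The `--det false` and `--rec false` flags above take a string value rather than acting as bare switches. The sketch below reproduces the `str2bool` converter from `paddleocr.py` in this commit inside a reduced parser (the `build_parser` wrapper is an illustrative assumption) to show how the values are interpreted.

```python
import argparse


def str2bool(v):
    # Same rule as in paddleocr.py: only "true", "t" and "1" count as True
    return v.lower() in ("true", "t", "1")


def build_parser():
    """Reduced parser with only the two flags discussed here (illustrative)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--det", type=str2bool, default=True)
    parser.add_argument("--rec", type=str2bool, default=True)
    return parser


args = build_parser().parse_args(["--rec", "false"])
print(args.det, args.rec)
```

With this rule any value outside the accepted set, such as `yes`, is treated as false, so it is safest to pass `true` or `false` explicitly.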
## Use custom model
When the built-in models do not meet your needs, you can use models you trained yourself.
First, follow the first section of [inference_en.md](./inference_en.md) to convert your detection and recognition models to inference models, then use them as follows.
### 1. Use by code
```python
from paddleocr import PaddleOCR,draw_ocr
# The path of detection and recognition model must contain model and params files
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}', rec_model_dir='{your_rec_model_dir}')
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path)
for line in result:
    print(line)
# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
### Use by command line
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir}
```
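Because the model directories must contain `model` and `params` files, it can help to verify a custom directory before constructing `PaddleOCR`. In this commit's `maybe_download`, a directory missing either file has the built-in model downloaded into it, so a mistyped path may go unnoticed. The check below is a sketch. The helper name `has_inference_files` is hypothetical and the temporary directory only stands in for a real model folder.

```python
import os
import tempfile


def has_inference_files(model_dir):
    """Hypothetical helper: True if model_dir contains both 'model' and 'params'."""
    return all(
        os.path.exists(os.path.join(model_dir, name))
        for name in ("model", "params"))


with tempfile.TemporaryDirectory() as tmp_dir:
    before = has_inference_files(tmp_dir)
    # Create empty placeholder files to simulate a converted inference model
    for name in ("model", "params"):
        open(os.path.join(tmp_dir, name), "wb").close()
    after = has_inference_files(tmp_dir)

print(before, after)
```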
## Parameter Description
| Parameter | Description | Default value |
|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|
| use_gpu | use GPU or not | TRUE |
| gpu_mem | GPU memory size used for initialization | 8000M |
| image_dir | The images path or folder path for predicting when used by the command line | |
| det_algorithm | Type of detection algorithm selected | DB |
| det_model_dir | The text detection inference model folder. There are two ways to set it: 1. None: automatically download the built-in model to `~/.paddleocr/det`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None |
| det_max_side_len | The maximum size of the long side of the image. When the long side exceeds this value, the long side will be resized to this size, and the short side will be scaled proportionally | 960 |
| det_db_thresh | Binarization threshold value of DB output map | 0.3 |
| det_db_box_thresh | The threshold value of the DB output box. Boxes score lower than this value will be discarded | 0.5 |
| det_db_unclip_ratio | The expanded ratio of DB output box | 2 |
| det_east_score_thresh | Binarization threshold value of EAST output map | 0.8 |
| det_east_cover_thresh | The threshold value of the EAST output box. Boxes score lower than this value will be discarded | 0.1 |
| det_east_nms_thresh | The NMS threshold value of EAST model output box | 0.2 |
| rec_algorithm | Type of recognition algorithm selected | CRNN |
| rec_model_dir | The text recognition inference model folder. There are two ways to set it: 1. None: automatically download the built-in model to `~/.paddleocr/rec`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None |
| rec_image_shape | image shape of recognition algorithm | "3,32,320" |
| rec_char_type | Character type of recognition algorithm, Chinese (ch) or English (en) | ch |
| rec_batch_num | When performing recognition, the batchsize of forward images | 30 |
| max_text_length | The maximum text length that the recognition algorithm can recognize | 25 |
| rec_char_dict_path | The dictionary path, which needs to be changed to your own path when `rec_model_dir` is set using option 2 | ./ppocr/utils/ppocr_keys_v1.txt |
| use_space_char | Whether to recognize spaces | TRUE |
| enable_mkldnn | Whether to enable mkldnn | FALSE |
| det | Whether to run detection when `ppocr.ocr` is executed | TRUE |
| rec | Whether to run recognition when `ppocr.ocr` is executed | TRUE |
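The `det_max_side_len` rule in the table can be checked with a little arithmetic. The sketch below implements only the rule as described (cap the long side, scale the short side proportionally). The function name `limit_long_side` is hypothetical, and the actual detector preprocessing may apply further constraints that are not shown here.

```python
def limit_long_side(width, height, max_side_len=960):
    """Hypothetical helper: scale (width, height) so the long side is at most max_side_len."""
    long_side = max(width, height)
    if long_side <= max_side_len:
        return width, height  # already small enough, left unchanged
    ratio = max_side_len / long_side
    return int(round(width * ratio)), int(round(height * ratio))


print(limit_long_side(1920, 1080))
print(limit_long_side(640, 480))
```

A 1920x1080 image is scaled by 0.5 under the default limit, while a 640x480 image is left as it is.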
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
__dir__ = os.path.dirname(__file__)
sys.path.append(os.path.join(__dir__, ''))
import cv2
import numpy as np
from pathlib import Path
import tarfile
import requests
from tqdm import tqdm
from tools.infer import predict_system
from ppocr.utils.utility import initial_logger
logger = initial_logger()
from ppocr.utils.utility import check_and_read_gif, get_image_file_list
__all__ = ['PaddleOCR']
model_params = {
    'det': 'https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar',
    'rec':
    'https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar',
}
SUPPORT_DET_MODEL = ['DB']
SUPPORT_REC_MODEL = ['CRNN']
BASE_DIR = os.path.expanduser("~/.paddleocr/")

def download_with_progressbar(url, save_path):
    response = requests.get(url, stream=True)
    total_size_in_bytes = int(response.headers.get('content-length', 0))
    block_size = 1024  # 1 Kibibyte
    progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)
    with open(save_path, 'wb') as file:
        for data in response.iter_content(block_size):
            progress_bar.update(len(data))
            file.write(data)
    progress_bar.close()
    if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
        logger.error("ERROR, something went wrong")
        sys.exit(0)

def maybe_download(model_storage_directory, url):
    # using custom model
    if not os.path.exists(os.path.join(
            model_storage_directory, 'model')) or not os.path.exists(
                os.path.join(model_storage_directory, 'params')):
        tmp_path = os.path.join(model_storage_directory, url.split('/')[-1])
        print('download {} to {}'.format(url, tmp_path))
        os.makedirs(model_storage_directory, exist_ok=True)
        download_with_progressbar(url, tmp_path)
        with tarfile.open(tmp_path, 'r') as tarObj:
            for member in tarObj.getmembers():
                if "model" in member.name:
                    filename = 'model'
                elif "params" in member.name:
                    filename = 'params'
                else:
                    continue
                file = tarObj.extractfile(member)
                with open(
                        os.path.join(model_storage_directory, filename),
                        'wb') as f:
                    f.write(file.read())
        os.remove(tmp_path)

def parse_args():
    import argparse

    def str2bool(v):
        return v.lower() in ("true", "t", "1")

    parser = argparse.ArgumentParser()
    # params for prediction engine
    parser.add_argument("--use_gpu", type=str2bool, default=True)
    parser.add_argument("--ir_optim", type=str2bool, default=True)
    parser.add_argument("--use_tensorrt", type=str2bool, default=False)
    parser.add_argument("--gpu_mem", type=int, default=8000)

    # params for text detector
    parser.add_argument("--image_dir", type=str)
    parser.add_argument("--det_algorithm", type=str, default='DB')
    parser.add_argument("--det_model_dir", type=str, default=None)
    parser.add_argument("--det_max_side_len", type=float, default=960)

    # DB params
    parser.add_argument("--det_db_thresh", type=float, default=0.3)
    parser.add_argument("--det_db_box_thresh", type=float, default=0.5)
    parser.add_argument("--det_db_unclip_ratio", type=float, default=2.0)

    # EAST params
    parser.add_argument("--det_east_score_thresh", type=float, default=0.8)
    parser.add_argument("--det_east_cover_thresh", type=float, default=0.1)
    parser.add_argument("--det_east_nms_thresh", type=float, default=0.2)

    # params for text recognizer
    parser.add_argument("--rec_algorithm", type=str, default='CRNN')
    parser.add_argument("--rec_model_dir", type=str, default=None)
    parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320")
    parser.add_argument("--rec_char_type", type=str, default='ch')
    parser.add_argument("--rec_batch_num", type=int, default=30)
    parser.add_argument("--max_text_length", type=int, default=25)
    parser.add_argument(
        "--rec_char_dict_path",
        type=str,
        default="./ppocr/utils/ppocr_keys_v1.txt")
    # type=bool would treat any non-empty string (even "false") as True,
    # so use str2bool like the other boolean flags
    parser.add_argument("--use_space_char", type=str2bool, default=True)
    parser.add_argument("--enable_mkldnn", type=str2bool, default=False)

    parser.add_argument("--det", type=str2bool, default=True)
    parser.add_argument("--rec", type=str2bool, default=True)
    return parser.parse_args()

class PaddleOCR(predict_system.TextSystem):
    def __init__(self, **kwargs):
        """
        paddleocr package
        args:
            **kwargs: other params show in paddleocr --help
        """
        postprocess_params = parse_args()
        postprocess_params.__dict__.update(**kwargs)

        # init model dir
        if postprocess_params.det_model_dir is None:
            postprocess_params.det_model_dir = os.path.join(BASE_DIR, 'det')
        if postprocess_params.rec_model_dir is None:
            postprocess_params.rec_model_dir = os.path.join(BASE_DIR, 'rec')
        print(postprocess_params)
        # download model
        maybe_download(postprocess_params.det_model_dir, model_params['det'])
        maybe_download(postprocess_params.rec_model_dir, model_params['rec'])

        if postprocess_params.det_algorithm not in SUPPORT_DET_MODEL:
            logger.error('det_algorithm must in {}'.format(SUPPORT_DET_MODEL))
            sys.exit(0)
        if postprocess_params.rec_algorithm not in SUPPORT_REC_MODEL:
            logger.error('rec_algorithm must in {}'.format(SUPPORT_REC_MODEL))
            sys.exit(0)

        postprocess_params.rec_char_dict_path = Path(
            __file__).parent / postprocess_params.rec_char_dict_path

        # init det_model and rec_model
        super().__init__(postprocess_params)

    def ocr(self, img, det=True, rec=True):
        """
        ocr with paddleocr
        args:
            img: image for OCR; supports an ndarray, an image path, or a list of ndarrays
            det: use text detection or not; if False, only recognition is executed. Default is True
            rec: use text recognition or not; if False, only detection is executed. Default is True
        """
        assert isinstance(img, (np.ndarray, list, str))
        if isinstance(img, str):
            image_file = img
            img, flag = check_and_read_gif(image_file)
            if not flag:
                img = cv2.imread(image_file)
            if img is None:
                logger.error("error in loading image:{}".format(image_file))
                return None
        if det and rec:
            dt_boxes, rec_res = self.__call__(img)
            return [[box.tolist(), res] for box, res in zip(dt_boxes, rec_res)]
        elif det and not rec:
            dt_boxes, elapse = self.text_detector(img)
            if dt_boxes is None:
                return None
            return [box.tolist() for box in dt_boxes]
        else:
            if not isinstance(img, list):
                img = [img]
            rec_res, elapse = self.text_recognizer(img)
            return rec_res

def main():
    # entry point for command-line use
    args = parse_args()
    image_file_list = get_image_file_list(args.image_dir)
    if len(image_file_list) == 0:
        logger.error('no images found in {}'.format(args.image_dir))
        return
    ocr_engine = PaddleOCR()
    for img_path in image_file_list:
        print(img_path)
        result = ocr_engine.ocr(img_path, det=args.det, rec=args.rec)
        if result is None:
            # ocr() returns None when the image cannot be loaded or nothing is detected
            continue
        for line in result:
            print(line)
@@ -14,14 +14,50 @@
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
+import math
 import paddle.fluid as fluid
 from paddle.fluid.regularizer import L2Decay
+from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
+import paddle.fluid.layers.ops as ops
 from ppocr.utils.utility import initial_logger
 logger = initial_logger()
+def cosine_decay_with_warmup(learning_rate,
+                             step_each_epoch,
+                             epochs=500,
+                             warmup_minibatch=1000):
+    """Applies cosine decay to the learning rate.
+    lr = 0.05 * (math.cos(epoch * (math.pi / 120)) + 1)
+    decrease lr for every mini-batch and start with warmup.
+    """
+    global_step = _decay_step_counter()
+    lr = fluid.layers.tensor.create_global_var(
+        shape=[1],
+        value=0.0,
+        dtype='float32',
+        persistable=True,
+        name="learning_rate")
+    warmup_minibatch = fluid.layers.fill_constant(
+        shape=[1],
+        dtype='float32',
+        value=float(warmup_minibatch),
+        force_cpu=True)
+    with fluid.layers.control_flow.Switch() as switch:
+        with switch.case(global_step < warmup_minibatch):
+            decayed_lr = learning_rate * (1.0 * global_step / warmup_minibatch)
+            fluid.layers.tensor.assign(input=decayed_lr, output=lr)
+        with switch.default():
+            decayed_lr = learning_rate * \
+                (ops.cos((global_step - warmup_minibatch) * (math.pi / (epochs * step_each_epoch))) + 1)/2
+            fluid.layers.tensor.assign(input=decayed_lr, output=lr)
+    return lr
 def AdamDecay(params, parameter_list=None):
     """
     define optimizer function
@@ -36,7 +72,9 @@ def AdamDecay(params, parameter_list=None):
     l2_decay = params.get("l2_decay", 0.0)
     if 'decay' in params:
-        supported_decay_mode = ["cosine_decay", "piecewise_decay"]
+        supported_decay_mode = [
+            "cosine_decay", "cosine_decay_warmup", "piecewise_decay"
+        ]
         params = params['decay']
         decay_mode = params['function']
         assert decay_mode in supported_decay_mode, "Supported decay mode is {}, but got {}".format(
@@ -49,6 +87,15 @@ def AdamDecay(params, parameter_list=None):
                 learning_rate=base_lr,
                 step_each_epoch=step_each_epoch,
                 epochs=total_epoch)
+        elif decay_mode == "cosine_decay_warmup":
+            step_each_epoch = params['step_each_epoch']
+            total_epoch = params['total_epoch']
+            warmup_minibatch = params.get("warmup_minibatch", 1000)
+            base_lr = cosine_decay_with_warmup(
+                learning_rate=base_lr,
+                step_each_epoch=step_each_epoch,
+                epochs=total_epoch,
+                warmup_minibatch=warmup_minibatch)
         elif decay_mode == "piecewise_decay":
             boundaries = params["boundaries"]
             decay_rate = params["decay_rate"]
@@ -104,5 +151,5 @@ def RMSProp(params, parameter_list=None):
     optimizer = fluid.optimizer.RMSProp(
         learning_rate=base_lr,
         regularization=fluid.regularizer.L2Decay(regularization_coeff=l2_decay))
     return optimizer
\ No newline at end of file
 shapely
 imgaug
 pyclipper
-lmdb
\ No newline at end of file
+lmdb
+tqdm
+numpy
\ No newline at end of file
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from setuptools import setup
from io import open

with open('requirments.txt', encoding="utf-8-sig") as f:
    requirements = f.readlines()
    requirements.append('tqdm')


def readme():
    with open('doc/doc_en/whl_en.md', encoding="utf-8-sig") as f:
        README = f.read()
    return README


setup(
    name='paddleocr',
    packages=['paddleocr'],
    package_dir={'paddleocr': ''},
    include_package_data=True,
    entry_points={"console_scripts": ["paddleocr= paddleocr.paddleocr:main"]},
    version='0.0.3',
    install_requires=requirements,
    license='Apache License 2.0',
    description='Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embedded and IoT devices)',
    long_description=readme(),
    long_description_content_type='text/markdown',
    url='https://github.com/PaddlePaddle/PaddleOCR',
    download_url='https://github.com/PaddlePaddle/PaddleOCR.git',
    keywords=[
        'ocr textdetection textrecognition paddleocr crnn east star-net rosetta ocrlite db chineseocr chinesetextdetection chinesetextrecognition'
    ],
    classifiers=[
        'Intended Audience :: Developers', 'Operating System :: OS Independent',
        'Natural Language :: Chinese (Simplified)',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.2',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.7', 'Topic :: Utilities'
    ], )
@@ -17,28 +17,32 @@ __dir__ = os.path.dirname(os.path.abspath(__file__))
 sys.path.append(__dir__)
 sys.path.append(os.path.abspath(os.path.join(__dir__, '../..')))
+import cv2
+import copy
+import numpy as np
+import math
+import time
+import sys
+import paddle.fluid as fluid
 import tools.infer.utility as utility
 from ppocr.utils.utility import initial_logger
 logger = initial_logger()
 from ppocr.utils.utility import get_image_file_list, check_and_read_gif
-import cv2
 from ppocr.data.det.sast_process import SASTProcessTest
 from ppocr.data.det.east_process import EASTProcessTest
 from ppocr.data.det.db_process import DBProcessTest
 from ppocr.postprocess.db_postprocess import DBPostProcess
 from ppocr.postprocess.east_postprocess import EASTPostPocess
 from ppocr.postprocess.sast_postprocess import SASTPostProcess
-import copy
-import numpy as np
-import math
-import time
-import sys
 class TextDetector(object):
     def __init__(self, args):
         max_side_len = args.det_max_side_len
         self.det_algorithm = args.det_algorithm
+        self.use_zero_copy_run = args.use_zero_copy_run
         preprocess_params = {'max_side_len': max_side_len}
         postprocess_params = {}
         if self.det_algorithm == "DB":
...@@ -127,7 +131,7 @@ class TextDetector(object): ...@@ -127,7 +131,7 @@ class TextDetector(object):
dt_boxes_new.append(box) dt_boxes_new.append(box)
dt_boxes = np.array(dt_boxes_new) dt_boxes = np.array(dt_boxes_new)
return dt_boxes return dt_boxes
def __call__(self, img): def __call__(self, img):
ori_im = img.copy() ori_im = img.copy()
im, ratio_list = self.preprocess_op(img) im, ratio_list = self.preprocess_op(img)
...@@ -135,8 +139,12 @@ class TextDetector(object): ...@@ -135,8 +139,12 @@ class TextDetector(object):
return None, 0 return None, 0
im = im.copy() im = im.copy()
starttime = time.time() starttime = time.time()
self.input_tensor.copy_from_cpu(im) if self.use_zero_copy_run:
self.predictor.zero_copy_run() self.input_tensor.copy_from_cpu(im)
self.predictor.zero_copy_run()
else:
im = fluid.core.PaddleTensor(im)
self.predictor.run([im])
outputs = [] outputs = []
for output_tensor in self.output_tensors: for output_tensor in self.output_tensors:
output = output_tensor.copy_to_cpu() output = output_tensor.copy_to_cpu()
...@@ -152,7 +160,7 @@ class TextDetector(object): ...@@ -152,7 +160,7 @@ class TextDetector(object):
outs_dict['f_tvo'] = outputs[3] outs_dict['f_tvo'] = outputs[3]
else: else:
outs_dict['maps'] = outputs[0] outs_dict['maps'] = outputs[0]
dt_boxes_list = self.postprocess_op(outs_dict, [ratio_list]) dt_boxes_list = self.postprocess_op(outs_dict, [ratio_list])
dt_boxes = dt_boxes_list[0] dt_boxes = dt_boxes_list[0]
if self.det_algorithm == "SAST" and self.det_sast_polygon: if self.det_algorithm == "SAST" and self.det_sast_polygon:
......
...@@ -17,15 +17,18 @@ __dir__ = os.path.dirname(os.path.abspath(__file__)) ...@@ -17,15 +17,18 @@ __dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__) sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '../..'))) sys.path.append(os.path.abspath(os.path.join(__dir__, '../..')))
import tools.infer.utility as utility
from ppocr.utils.utility import initial_logger
logger = initial_logger()
from ppocr.utils.utility import get_image_file_list, check_and_read_gif
import cv2 import cv2
import copy import copy
import numpy as np import numpy as np
import math import math
import time import time
import paddle.fluid as fluid
import tools.infer.utility as utility
from ppocr.utils.utility import initial_logger
logger = initial_logger()
from ppocr.utils.utility import get_image_file_list, check_and_read_gif
from ppocr.utils.character import CharacterOps from ppocr.utils.character import CharacterOps
...@@ -37,6 +40,7 @@ class TextRecognizer(object): ...@@ -37,6 +40,7 @@ class TextRecognizer(object):
self.character_type = args.rec_char_type self.character_type = args.rec_char_type
self.rec_batch_num = args.rec_batch_num self.rec_batch_num = args.rec_batch_num
self.rec_algorithm = args.rec_algorithm self.rec_algorithm = args.rec_algorithm
self.use_zero_copy_run = args.use_zero_copy_run
char_ops_params = { char_ops_params = {
"character_type": args.rec_char_type, "character_type": args.rec_char_type,
"character_dict_path": args.rec_char_dict_path, "character_dict_path": args.rec_char_dict_path,
...@@ -102,8 +106,12 @@ class TextRecognizer(object): ...@@ -102,8 +106,12 @@ class TextRecognizer(object):
norm_img_batch = np.concatenate(norm_img_batch) norm_img_batch = np.concatenate(norm_img_batch)
norm_img_batch = norm_img_batch.copy() norm_img_batch = norm_img_batch.copy()
starttime = time.time() starttime = time.time()
self.input_tensor.copy_from_cpu(norm_img_batch) if self.use_zero_copy_run:
self.predictor.zero_copy_run() self.input_tensor.copy_from_cpu(norm_img_batch)
self.predictor.zero_copy_run()
else:
norm_img_batch = fluid.core.PaddleTensor(norm_img_batch)
self.predictor.run([norm_img_batch])
if self.loss_type == "ctc": if self.loss_type == "ctc":
rec_idx_batch = self.output_tensors[0].copy_to_cpu() rec_idx_batch = self.output_tensors[0].copy_to_cpu()
......
...@@ -157,7 +157,6 @@ def main(args): ...@@ -157,7 +157,6 @@ def main(args):
boxes, boxes,
txts, txts,
scores, scores,
draw_txt=True,
drop_score=drop_score) drop_score=drop_score)
draw_img_save = "./inference_results/" draw_img_save = "./inference_results/"
if not os.path.exists(draw_img_save): if not os.path.exists(draw_img_save):
......
...@@ -71,6 +71,7 @@ def parse_args(): ...@@ -71,6 +71,7 @@ def parse_args():
default="./ppocr/utils/ppocr_keys_v1.txt") default="./ppocr/utils/ppocr_keys_v1.txt")
parser.add_argument("--use_space_char", type=bool, default=True) parser.add_argument("--use_space_char", type=bool, default=True)
parser.add_argument("--enable_mkldnn", type=bool, default=False) parser.add_argument("--enable_mkldnn", type=bool, default=False)
parser.add_argument("--use_zero_copy_run", type=bool, default=False)
return parser.parse_args() return parser.parse_args()
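Review note on the new `--use_zero_copy_run` flag: `argparse` with `type=bool` calls `bool()` on the raw string, so any non-empty value, including `"False"`, parses as `True`. The sketch below shows the behaviour and one possible workaround. The helper name `str2bool` and the function `build_parser` are hypothetical and not part of this change; only the flag name comes from the diff.

```python
import argparse


def str2bool(value):
    """Hypothetical helper: map common textual spellings to a real boolean."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    """Hypothetical parser that declares only the flag added in this diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_zero_copy_run", type=str2bool, default=False)
    return parser


# Same declaration as in the diff: bool("False") is True because the string is non-empty.
naive = argparse.ArgumentParser()
naive.add_argument("--use_zero_copy_run", type=bool, default=False)
naive_value = naive.parse_args(["--use_zero_copy_run", "False"]).use_zero_copy_run

# With the converter, the textual value is interpreted as intended.
fixed_value = build_parser().parse_args(
    ["--use_zero_copy_run", "False"]).use_zero_copy_run
```

With `type=bool`, the only way to get `False` is to omit the flag or pass an empty string. The same applies to the existing `--use_space_char` and `--enable_mkldnn` flags shown in this hunk.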
...@@ -105,9 +106,12 @@ def create_predictor(args, mode): ...@@ -105,9 +106,12 @@ def create_predictor(args, mode):
#config.enable_memory_optim() #config.enable_memory_optim()
config.disable_glog_info() config.disable_glog_info()
# use zero copy if args.use_zero_copy_run:
config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass") config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
config.switch_use_feed_fetch_ops(False) config.switch_use_feed_fetch_ops(False)
else:
config.switch_use_feed_fetch_ops(True)
predictor = create_paddle_predictor(config) predictor = create_paddle_predictor(config)
input_names = predictor.get_input_names() input_names = predictor.get_input_names()
input_tensor = predictor.get_input_tensor(input_names[0]) input_tensor = predictor.get_input_tensor(input_names[0])
...@@ -139,7 +143,12 @@ def resize_img(img, input_size=600): ...@@ -139,7 +143,12 @@ def resize_img(img, input_size=600):
return im return im
def draw_ocr(image, boxes, txts, scores, draw_txt=True, drop_score=0.5): def draw_ocr(image,
boxes,
txts=None,
scores=None,
drop_score=0.5,
font_path="./doc/simfang.ttf"):
""" """
Visualize the results of OCR detection and recognition Visualize the results of OCR detection and recognition
args: args:
...@@ -147,23 +156,29 @@ def draw_ocr(image, boxes, txts, scores, draw_txt=True, drop_score=0.5): ...@@ -147,23 +156,29 @@ def draw_ocr(image, boxes, txts, scores, draw_txt=True, drop_score=0.5):
boxes(list): boxes with shape(N, 4, 2) boxes(list): boxes with shape(N, 4, 2)
txts(list): the texts txts(list): the texts
scores(list): txxs corresponding scores scores(list): txxs corresponding scores
draw_txt(bool): whether draw text or not
drop_score(float): only scores greater than drop_threshold will be visualized drop_score(float): only scores greater than drop_threshold will be visualized
font_path: the path of font which is used to draw text
return(array): return(array):
the visualized img the visualized img
""" """
if scores is None: if scores is None:
scores = [1] * len(boxes) scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores): box_num = len(boxes)
if score < drop_score or math.isnan(score): for i in range(box_num):
if scores is not None and (scores[i] < drop_score or
math.isnan(scores[i])):
continue continue
box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64) box = np.reshape(np.array(boxes[i]), [-1, 1, 2]).astype(np.int64)
image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2) image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
if txts is not None:
if draw_txt:
img = np.array(resize_img(image, input_size=600)) img = np.array(resize_img(image, input_size=600))
txt_img = text_visual( txt_img = text_visual(
txts, scores, img_h=img.shape[0], img_w=600, threshold=drop_score) txts,
scores,
img_h=img.shape[0],
img_w=600,
threshold=drop_score,
font_path=font_path)
img = np.concatenate([np.array(img), np.array(txt_img)], axis=1) img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
return img return img
return image return image
...@@ -241,7 +256,12 @@ def str_count(s): ...@@ -241,7 +256,12 @@ def str_count(s):
return s_len - math.ceil(en_dg_count / 2) return s_len - math.ceil(en_dg_count / 2)
def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.): def text_visual(texts,
scores,
img_h=400,
img_w=600,
threshold=0.,
font_path="./doc/simfang.ttf"):
""" """
create new blank img and draw txt on it create new blank img and draw txt on it
args: args:
...@@ -249,6 +269,7 @@ def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.): ...@@ -249,6 +269,7 @@ def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.):
scores(list|None): corresponding score of each txt scores(list|None): corresponding score of each txt
img_h(int): the height of blank img img_h(int): the height of blank img
img_w(int): the width of blank img img_w(int): the width of blank img
font_path: the path of font which is used to draw text
return(array): return(array):
""" """
...@@ -267,7 +288,7 @@ def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.): ...@@ -267,7 +288,7 @@ def text_visual(texts, scores, img_h=400, img_w=600, threshold=0.):
font_size = 20 font_size = 20
txt_color = (0, 0, 0) txt_color = (0, 0, 0)
font = ImageFont.truetype("./doc/simfang.ttf", font_size, encoding="utf-8") font = ImageFont.truetype(font_path, font_size, encoding="utf-8")
gap = font_size + 5 gap = font_size + 5
txt_img_list = [] txt_img_list = []
...@@ -348,6 +369,6 @@ if __name__ == '__main__': ...@@ -348,6 +369,6 @@ if __name__ == '__main__':
txts.append(dic['transcription']) txts.append(dic['transcription'])
scores.append(round(dic['scores'], 3)) scores.append(round(dic['scores'], 3))
new_img = draw_ocr(image, boxes, txts, scores, draw_txt=True) new_img = draw_ocr(image, boxes, txts, scores)
cv2.imwrite(img_name, new_img) cv2.imwrite(img_name, new_img)