"test/gemm/gemm_xdl_int8.cpp" did not exist on "5b178874a1b2a1cae217e87e1988ab92a40d71b8"
Unverified Commit 8a7d3a29 authored by andyjpaddle's avatar andyjpaddle Committed by GitHub
Browse files

Merge branch 'dygraph' into add_rec_sar

parents 073fad37 e63533e4
# PaddleOCR全景图与项目克隆
## 1. PaddleOCR全景图
PaddleOCR包含丰富的文本检测、文本识别以及端到端算法。结合实际测试与产业经验,PaddleOCR选择DB和CRNN作为基础的检测和识别模型,经过一系列优化策略提出面向产业应用的PP-OCR模型。PP-OCR模型针对通用场景,根据不同语种形成了PP-OCR模型库。基于PP-OCR的能力,PaddleOCR针对文档场景任务发布PP-Structure工具库,包含版面分析和表格识别两大任务。为了打通产业落地的全流程,PaddleOCR提供了规模化的数据生产工具和多种预测部署工具,助力开发者快速落地。
<div align="center">
<img src="../overview.png">
</div>
## 2. 项目克隆
### **2.1 克隆PaddleOCR repo代码**
```
【推荐】git clone https://github.com/PaddlePaddle/PaddleOCR
```
如果因为网络问题无法pull成功,也可选择使用码云上的托管:
```
git clone https://gitee.com/paddlepaddle/PaddleOCR
```
注:码云托管代码可能无法实时同步本github项目更新,存在3~5天延时,请优先使用推荐方式。
### **2.2 安装第三方库**
```
cd PaddleOCR
pip3 install -r requirements.txt
```
# PaddleOCR快速开始
# 中文OCR模型快速使用
## 1.环境配置 - [PaddleOCR快速开始](#paddleocr)
+ [1. 安装PaddleOCR whl包](#1)
* [2. 便捷使用](#2)
+ [2.1 命令行使用](#21)
- [2.1.1 中英文模型](#211)
- [2.1.2 多语言模型](#212)
- [2.1.3 版面分析](#213)
+ [2.2 Python脚本使用](#22)
- [2.2.1 中英文与多语言使用](#221)
- [2.2.2 版面分析](#222)
请先参考[快速安装](./installation.md)配置PaddleOCR运行环境。 <a name="1"></a>
*注意:也可以通过 whl 包安装使用PaddleOCR,具体参考[Paddleocr Package使用说明](./whl.md)。* ## 1. 安装PaddleOCR whl包
## 2.inference模型下载 ```bash
pip install "paddleocr>=2.0.1" # 推荐使用2.0.1+版本
```
* 移动端和服务器端的检测与识别模型如下,更多模型下载(包括多语言),可以参考[PP-OCR v2.0 系列模型下载](../doc_ch/models_list.md) - 对于Windows环境用户:
| 模型简介 | 模型名称 |推荐场景 | 检测模型 | 方向分类器 | 识别模型 | 直接通过pip安装的shapely库可能出现`[winRrror 126] 找不到指定模块的问题`。建议从[这里](https://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely)下载shapely安装包完成安装,
| ------------ | --------------- | ----------------|---- | ---------- | -------- |
| 中英文超轻量OCR模型(8.1M) | ch_ppocr_mobile_v2.0_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| 中英文通用OCR模型(143M) | ch_ppocr_server_v2.0_xx |服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
- 使用**版面分析**功能时,运行以下命令**安装 Layout-Parser**
* windows 环境下如果没有安装wget,下载模型时可将链接复制到浏览器中下载,并解压放置在相应目录下 ```bash
pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```
复制上表中的检测和识别的`inference模型`下载地址,并解压
``` <a name="2"></a>
mkdir inference && cd inference ## 2. 便捷使用
# 下载检测模型并解压 <a name="21"></a>
wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package} ### 2.1 命令行使用
# 下载识别模型并解压
wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package}
# 下载方向分类器模型并解压
wget {url/of/classification/inference_model} && tar xf {name/of/classification/inference_model/package}
cd ..
```
以超轻量级模型为例: PaddleOCR提供了一系列测试图片,点击[这里](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip)下载并解压,然后在终端中切换到相应目录
``` ```
mkdir inference && cd inference cd /path/to/ppocr_img
# 下载超轻量级中文OCR模型的检测模型并解压
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# 下载超轻量级中文OCR模型的识别模型并解压
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
# 下载超轻量级中文OCR模型的文本方向分类器模型并解压
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar && tar xf ch_ppocr_mobile_v2.0_cls_infer.tar
cd ..
``` ```
解压完毕后应有如下文件结构: 如果不使用提供的测试图片,可以将下方`--image_dir`参数替换为相应的测试图片路径
<a name="211"></a>
#### 2.1.1 中英文模型
* 检测+方向分类器+识别全流程:设置方向分类器参数`--use_angle_cls true`后可对竖排文本进行识别。
```bash
paddleocr --image_dir ./imgs/11.jpg --use_angle_cls true
```
结果是一个list,每个item包含了文本框,文字和识别置信度
```bash
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
......
```
- 单独使用检测:设置`--rec``false`
```bash
paddleocr --image_dir ./imgs/11.jpg --rec false
```
结果是一个list,每个item只包含文本框
```bash
[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]]
[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]]
[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]]
......
```
- 单独使用识别:设置`--det``false`
```bash
paddleocr --image_dir ./imgs_words/ch/word_1.jpg --det false
```
结果是一个list,每个item只包含识别结果和识别置信度
```bash
['韩国小馆', 0.9907421]
```
更多whl包使用可参考[whl包文档](./whl.md)
<a name="212"></a>
#### 2.1.2 多语言模型
Paddleocr目前支持80个语种,可以通过修改`--lang`参数进行切换,对于英文模型,指定`--lang=en`
``` bash
paddleocr --image_dir ./imgs_en/254.jpg --lang=en
``` ```
├── ch_ppocr_mobile_v2.0_cls_infer
│ ├── inference.pdiparams <div align="center">
│ ├── inference.pdiparams.info <img src="../imgs_en/254.jpg" width="300" height="600">
│ └── inference.pdmodel <img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600">
├── ch_ppocr_mobile_v2.0_det_infer </div>
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info 结果是一个list,每个item包含了文本框,文字和识别置信度
│ └── inference.pdmodel
├── ch_ppocr_mobile_v2.0_rec_infer ```text
├── inference.pdiparams [('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]]
├── inference.pdiparams.info [('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]]
└── inference.pdmodel [('Montpelier Vermont', 0.97389287), [[69.0, 132.0], [501.0, 126.0], [501.0, 158.0], [70.0, 164.0]]]
[('8022256183', 0.99810505), [[71.0, 175.0], [363.0, 170.0], [364.0, 202.0], [72.0, 207.0]]]
[('REG 07-24-201706:59 PM', 0.93537045), [[73.0, 299.0], [653.0, 281.0], [654.0, 318.0], [74.0, 336.0]]]
[('045555', 0.99346405), [[509.0, 331.0], [651.0, 325.0], [652.0, 356.0], [511.0, 362.0]]]
[('CT1', 0.9988654), [[535.0, 367.0], [654.0, 367.0], [654.0, 406.0], [535.0, 406.0]]]
......
``` ```
## 3.单张图像或者图像集合预测 常用的多语言简写包括
以下代码实现了文本检测、方向分类器和识别串联推理,在执行预测时,需要通过参数image_dir指定单张图像或者图像集合的路径、参数`det_model_dir`指定检测inference模型的路径、参数`rec_model_dir`指定识别inference模型的路径、参数`use_angle_cls`指定是否使用方向分类器、参数`cls_model_dir`指定方向分类器inference模型的路径、参数`use_space_char`指定是否预测空格字符。可视化识别结果默认保存到`./inference_results`文件夹里面。 | 语种 | 缩写 | | 语种 | 缩写 | | 语种 | 缩写 |
| -------- | ----------- | ---- | -------- | ------ | ---- | -------- | ------ |
| 中文 | ch | | 法文 | fr | | 日文 | japan |
| 英文 | en | | 德文 | german | | 韩文 | korean |
| 繁体中文 | chinese_cht | | 意大利文 | it | | 俄罗斯文 | ru |
```bash 全部语种及其对应的缩写列表可查看[多语言模型教程](./multi_languages.md)
<a name="213"></a>
# 预测image_dir指定的单张图像 #### 2.1.3 版面分析
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True
# 预测image_dir指定的图像集合 版面分析是指对文档图片中的文字、标题、列表、图片和表格5类区域进行划分。对于前三类区域,直接使用OCR模型完成对应区域文字检测与识别,并将结果保存在txt中。对于表格类区域,经过表格结构化处理后,表格图片转换为相同表格样式的Excel文件。图片区域会被单独裁剪成图像。
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_ppocr_mobile_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True
# 如果想使用CPU进行预测,需设置use_gpu参数为False 使用PaddleOCR的版面分析功能,需要指定`--type=structure`
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True --use_gpu=False
```bash
paddleocr --image_dir=./table/1.png --type=structure
``` ```
- 通用中文OCR模型 - **返回结果说明**
PP-Structure的返回结果为一个dict组成的list,示例如下
```shell
[{ 'type': 'Text',
'bbox': [34, 432, 345, 462],
'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
[('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent ', 0.465441)])
}
]
```
其中各个字段说明如下
| 字段 | 说明 |
| ---- | ------------------------------------------------------------ |
| type | 图片区域的类型 |
| bbox | 图片区域的在原图的坐标,分别[左上角x,左上角y,右下角x,右下角y] |
| res | 图片区域的OCR或表格识别结果。<br>表格: 表格的HTML字符串; <br>OCR: 一个包含各个单行文字的检测坐标和识别结果的元组 |
运行完成后,每张图片会在`output`字段指定的目录下有一个同名目录,图片里的每个表格会存储为一个excel,图片区域会被裁剪之后保存下来,excel文件和图片名为表格在图片里的坐标。
```
/output/table/1/
└─ res.txt
└─ [454, 360, 824, 658].xlsx 表格识别结果
└─ [16, 2, 828, 305].jpg 被裁剪出的图片区域
└─ [17, 361, 404, 711].xlsx 表格识别结果
```
- **参数说明**
| 字段 | 说明 | 默认值 |
| --------------- | ---------------------------------------- | -------------------------------------------- |
| output | excel和识别结果保存的地址 | ./output/table |
| table_max_len | 表格结构模型预测时,图像的长边resize尺度 | 488 |
| table_model_dir | 表格结构模型 inference 模型地址 | None |
| table_char_type | 表格结构模型所用字典地址 | ../ppocr/utils/dict/table_structure_dict.txt |
大部分参数和paddleocr whl包保持一致,见 [whl包文档](./whl.md)
<a name="22"></a>
### 2.2 Python脚本使用
<a name="221"></a>
#### 2.2.1 中英文与多语言使用
请按照上述步骤下载相应的模型,并且更新相关的参数,示例如下: 通过Python脚本使用PaddleOCR whl包,whl包会自动下载ppocr轻量级模型作为默认模型。
* 检测+方向分类器+识别全流程
```python
from paddleocr import PaddleOCR, draw_ocr
# Paddleocr目前支持的多语言语种可以通过修改lang参数进行切换
# 例如`ch`, `en`, `fr`, `german`, `korean`, `japan`
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = './imgs/11.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
print(line)
# 显示结果
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
结果是一个list,每个item包含了文本框,文字和识别置信度
```bash ```bash
# 预测image_dir指定的单张图像 [[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_server_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_server_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True [[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
......
``` ```
* 注意: 结果可视化
- 如果希望使用不支持空格的识别模型,在预测的时候需要注意:请将代码更新到最新版本,并添加参数 `--use_space_char=False`
- 如果不希望使用方向分类器,在预测的时候需要注意:请将代码更新到最新版本,并添加参数 `--use_angle_cls=False` <div align="center">
<img src="../imgs_results/whl/11_det_rec.jpg" width="800">
</div>
<a name="222"></a>
#### 2.2.2 版面分析
```python
import os
import cv2
from paddleocr import PPStructure,draw_structure_result,save_structure_res
table_engine = PPStructure(show_log=True)
更多的文本检测、识别串联推理使用方式请参考文档教程中[基于Python预测引擎推理](./inference.md) save_folder = './output/table'
img_path = './table/paper-image.jpg'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])
此外,文档教程中也提供了中文OCR模型的其他预测部署方式: for line in result:
- [基于C++预测引擎推理](../../deploy/cpp_infer/readme.md) line.pop('img')
- [服务部署](../../deploy/hubserving) print(line)
- [端侧部署(目前只支持静态图)](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/lite)
from PIL import Image
font_path = './fonts/simfang.ttf' # PaddleOCR下提供字体包
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
## 文字识别 # 文字识别
本文提供了PaddleOCR文本识别任务的全流程指南,包括数据准备、模型训练、调优、评估、预测,各个阶段的详细说明:
- [1 数据准备](#数据准备) - [1 数据准备](#数据准备)
- [1.1 自定义数据集](#自定义数据集) - [1.1 自定义数据集](#自定义数据集)
...@@ -9,22 +10,21 @@ ...@@ -9,22 +10,21 @@
- [2 启动训练](#启动训练) - [2 启动训练](#启动训练)
- [2.1 数据增强](#数据增强) - [2.1 数据增强](#数据增强)
- [2.2 训练](#训练) - [2.2 通用模型训练](#通用模型训练)
- [2.3 小语种](#小语种) - [2.3 多语言模型训练](#多语言模型训练)
- [3 评估](#评估) - [3 评估](#评估)
- [4 预测](#预测) - [4 预测](#预测)
- [4.1 训练引擎预测](#训练引擎预测)
<a name="数据准备"></a> <a name="数据准备"></a>
### 1. 数据准备 ## 1. 数据准备
PaddleOCR 支持两种数据格式: PaddleOCR 支持两种数据格式:
- `lmdb` 用于训练以lmdb格式存储的数据集; - `lmdb` 用于训练以lmdb格式存储的数据集(LMDBDataSet);
- `通用数据` 用于训练以文本文件存储的数据集: - `通用数据` 用于训练以文本文件存储的数据集(SimpleDataSet);
训练数据的默认存储路径是 `PaddleOCR/train_data`,如果您的磁盘上已有数据集,只需创建软链接至数据集目录: 训练数据的默认存储路径是 `PaddleOCR/train_data`,如果您的磁盘上已有数据集,只需创建软链接至数据集目录:
...@@ -36,7 +36,7 @@ mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset> ...@@ -36,7 +36,7 @@ mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
``` ```
<a name="准备数据集"></a> <a name="准备数据集"></a>
#### 1.1 自定义数据集 ### 1.1 自定义数据集
下面以通用数据集为例, 介绍如何准备数据集: 下面以通用数据集为例, 介绍如何准备数据集:
* 训练集 * 训练集
...@@ -82,16 +82,16 @@ train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单 ...@@ -82,16 +82,16 @@ train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
<a name="数据下载"></a> <a name="数据下载"></a>
1.2 数据下载 ### 1.2 数据下载
若您本地没有数据集,可以在官网下载 [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) 数据,用于快速验证。也可以参考[DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) ,下载 benchmark 所需的lmdb格式数据集。 - ICDAR2015
如果你使用的是icdar2015的公开数据集,PaddleOCR 提供了一份用于训练 icdar2015 数据集的标签文件,通过以下方式下载: 若您本地没有数据集,可以在官网下载 [ICDAR2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) 数据,用于快速验证。也可以参考[DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) ,下载 benchmark 所需的lmdb格式数据集。
如果希望复现SRN的论文指标,需要下载离线[增广数据](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA),提取码: y3ry。增广数据是由MJSynth和SynthText做旋转和扰动得到的。数据下载完成后请解压到 {your_path}/PaddleOCR/train_data/data_lmdb_release/training/ 路径下。
如果希望复现SAR的论文指标,需要下载[SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), 提取码:627x。此外,真实数据集icdar2013, icdar2015, cocotext, IIIT5也作为训练数据的一部分。具体数据细节可以参考论文SAR。 如果希望复现SAR的论文指标,需要下载[SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), 提取码:627x。此外,真实数据集icdar2013, icdar2015, cocotext, IIIT5也作为训练数据的一部分。具体数据细节可以参考论文SAR。
如果你使用的是icdar2015的公开数据集,PaddleOCR 提供了一份用于训练 ICDAR2015 数据集的标签文件,通过以下方式下载:
``` ```
# 训练集标签 # 训练集标签
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
...@@ -99,15 +99,25 @@ wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_t ...@@ -99,15 +99,25 @@ wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_t
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
``` ```
PaddleOCR 也提供了数据格式转换脚本,可以将官网 label 转换支持的数据格式。 数据转换工具在 `ppocr/utils/gen_label.py`, 这里以训练集为例: PaddleOCR 也提供了数据格式转换脚本,可以将ICDAR官网 label 转换为PaddleOCR支持的数据格式。 数据转换工具在 `ppocr/utils/gen_label.py`, 这里以训练集为例:
``` ```
# 将官网下载的标签文件转换为 rec_gt_label.txt # 将官网下载的标签文件转换为 rec_gt_label.txt
python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt" python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
``` ```
数据样式格式如下,(a)为原始图片,(b)为每张图片对应的 Ground Truth 文本文件:
![](../datasets/icdar_rec.png)
- 多语言数据集
多语言模型的训练数据集均为100w的合成数据,使用了开源合成工具 [text_renderer](https://github.com/Sanster/text_renderer) ,少量的字体可以通过下面两种方式下载。
* [百度网盘](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA) 提取码:frgi
* [google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)
<a name="字典"></a> <a name="字典"></a>
1.3 字典 ### 1.3 字典
最后需要提供一个字典({word_dict_name}.txt),使模型在训练时,可以将所有出现的字符映射为字典的索引。 最后需要提供一个字典({word_dict_name}.txt),使模型在训练时,可以将所有出现的字符映射为字典的索引。
...@@ -154,13 +164,27 @@ PaddleOCR内置了一部分字典,可以按需使用。 ...@@ -154,13 +164,27 @@ PaddleOCR内置了一部分字典,可以按需使用。
并将 `character_type` 设置为 `ch` 并将 `character_type` 设置为 `ch`
<a name="支持空格"></a> <a name="支持空格"></a>
1.4 添加空格类别 ### 1.4 添加空格类别
如果希望支持识别"空格"类别, 请将yml文件中的 `use_space_char` 字段设置为 `True` 如果希望支持识别"空格"类别, 请将yml文件中的 `use_space_char` 字段设置为 `True`
<a name="启动训练"></a> <a name="启动训练"></a>
### 2. 启动训练 ## 2. 启动训练
<a name="数据增强"></a>
### 2.1 数据增强
PaddleOCR提供了多种数据增强方式,默认配置文件中已经添加了数据增广。
默认的扰动方式有:颜色空间转换(cvtColor)、模糊(blur)、抖动(jitter)、噪声(Gasuss noise)、随机切割(random crop)、透视(perspective)、颜色反转(reverse)、TIA数据增广。
训练过程中每种扰动方式以40%的概率被选择,具体代码实现请参考:[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
*由于OpenCV的兼容性问题,扰动操作暂时只支持Linux*
<a name="通用模型训练"></a>
### 2.2 通用模型训练
PaddleOCR提供了训练脚本、评估脚本和预测脚本,本节将以 CRNN 识别模型为例: PaddleOCR提供了训练脚本、评估脚本和预测脚本,本节将以 CRNN 识别模型为例:
...@@ -180,23 +204,16 @@ tar -xf rec_mv3_none_bilstm_ctc_v2.0_train.tar && rm -rf rec_mv3_none_bilstm_ctc ...@@ -180,23 +204,16 @@ tar -xf rec_mv3_none_bilstm_ctc_v2.0_train.tar && rm -rf rec_mv3_none_bilstm_ctc
*如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false* *如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false*
``` ```
# GPU训练 支持单卡,多卡训练,通过--gpus参数指定卡号 # GPU训练 支持单卡,多卡训练
# 训练icdar15英文数据 训练日志会自动保存为 "{save_model_dir}" 下的train.log # 训练icdar15英文数据 训练日志会自动保存为 "{save_model_dir}" 下的train.log
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
```
<a name="数据增强"></a>
#### 2.1 数据增强
PaddleOCR提供了多种数据增强方式,默认配置文件中已经添加了数据增广。
默认的扰动方式有:颜色空间转换(cvtColor)、模糊(blur)、抖动(jitter)、噪声(Gasuss noise)、随机切割(random crop)、透视(perspective)、颜色反转(reverse)、TIA数据增广。
训练过程中每种扰动方式以40%的概率被选择,具体代码实现请参考:[rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py) #单卡训练(训练周期长,不建议)
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml
*由于OpenCV的兼容性问题,扰动操作暂时只支持Linux* #多卡训练,通过--gpus参数指定卡号
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
```
<a name="训练"></a>
#### 2.2 训练
PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_train.yml` 中修改 `eval_batch_step` 设置评估频率,默认每500个iter评估一次。评估过程中默认将最佳acc模型,保存为 `output/rec_CRNN/best_accuracy` PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_train.yml` 中修改 `eval_batch_step` 设置评估频率,默认每500个iter评估一次。评估过程中默认将最佳acc模型,保存为 `output/rec_CRNN/best_accuracy`
...@@ -286,85 +303,12 @@ Eval: ...@@ -286,85 +303,12 @@ Eval:
``` ```
**注意,预测/评估时的配置文件请务必与训练一致。** **注意,预测/评估时的配置文件请务必与训练一致。**
<a name="小语种"></a> <a name="多语言模型训练"></a>
#### 2.3 小语种 ### 2.3 多语言模型训练
PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi_languages` 路径下提供了一个多语言的配置文件模版: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml) PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi_languages` 路径下提供了一个多语言的配置文件模版: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)
您有两种方式创建所需的配置文件: 按语系划分,目前PaddleOCR支持的语种有:
1. 通过脚本自动生成
[generate_multi_language_configs.py](../../configs/rec/multi_language/generate_multi_language_configs.py) 可以帮助您生成多语言模型的配置文件
- 以意大利语为例,如果您的数据是按如下格式准备的:
```
|-train_data
|- it_train.txt # 训练集标签
|- it_val.txt # 验证集标签
|- data
|- word_001.jpg
|- word_002.jpg
|- word_003.jpg
| ...
```
可以使用默认参数,生成配置文件:
```bash
# 该代码需要在指定目录运行
cd PaddleOCR/configs/rec/multi_language/
# 通过-l或者--language参数设置需要生成的语种的配置文件,该命令会将默认参数写入配置文件
python3 generate_multi_language_configs.py -l it
```
- 如果您的数据放置在其他位置,或希望使用自己的字典,可以通过指定相关参数来生成配置文件:
```bash
# -l或者--language字段是必须的
# --train修改训练集,--val修改验证集,--data_dir修改数据集目录,--dict修改字典路径, -o修改对应默认参数
cd PaddleOCR/configs/rec/multi_language/
python3 generate_multi_language_configs.py -l it \ # 语种
--train {path/of/train_label.txt} \ # 训练标签文件的路径
--val {path/of/val_label.txt} \ # 验证集标签文件的路径
--data_dir {train_data/path} \ # 训练数据的根目录
--dict {path/of/dict} \ # 字典文件路径
-o Global.use_gpu=False # 是否使用gpu
...
```
意大利文由拉丁字母组成,因此执行完命令后会得到名为 rec_latin_lite_train.yml 的配置文件。
2. 手动修改配置文件
您也可以手动修改模版中的以下几个字段:
```
Global:
use_gpu: True
epoch_num: 500
...
character_type: it # 需要识别的语种
character_dict_path: {path/of/dict} # 字典文件所在路径
Train:
dataset:
name: SimpleDataSet
data_dir: train_data/ # 数据存放根目录
label_file_list: ["./train_data/train_list.txt"] # 训练集label路径
...
Eval:
dataset:
name: SimpleDataSet
data_dir: train_data/ # 数据存放根目录
label_file_list: ["./train_data/val_list.txt"] # 验证集label路径
...
```
目前PaddleOCR支持的多语言算法有:
| 配置文件 | 算法名称 | backbone | trans | seq | pred | language | character_type | | 配置文件 | 算法名称 | backbone | trans | seq | pred | language | character_type |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: | | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: |
...@@ -381,10 +325,6 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi ...@@ -381,10 +325,6 @@ PaddleOCR目前已支持80种(除中文外)语种识别,`configs/rec/multi
更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md#%E8%AF%AD%E7%A7%8D%E7%BC%A9%E5%86%99) 更多支持语种请参考: [多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/multi_languages.md#%E8%AF%AD%E7%A7%8D%E7%BC%A9%E5%86%99)
多语言模型训练方式与中文模型一致,训练数据集均为100w的合成数据,少量的字体可以通过下面两种方式下载。
* [百度网盘](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA)。提取码:frgi。
* [google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)
如您希望在现有模型效果的基础上调优,请参考下列说明修改配置文件: 如您希望在现有模型效果的基础上调优,请参考下列说明修改配置文件:
`rec_french_lite_train` 为例: `rec_french_lite_train` 为例:
...@@ -420,7 +360,7 @@ Eval: ...@@ -420,7 +360,7 @@ Eval:
... ...
``` ```
<a name="评估"></a> <a name="评估"></a>
### 3 评估 ## 3 评估
评估数据集可以通过 `configs/rec/rec_icdar15_train.yml` 修改Eval中的 `label_file_path` 设置。 评估数据集可以通过 `configs/rec/rec_icdar15_train.yml` 修改Eval中的 `label_file_path` 设置。
...@@ -430,14 +370,29 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec ...@@ -430,14 +370,29 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
``` ```
<a name="预测"></a> <a name="预测"></a>
### 4 预测 ## 4 预测
<a name="训练引擎预测"></a>
#### 4.1 训练引擎的预测
使用 PaddleOCR 训练好的模型,可以通过以下脚本进行快速预测。 使用 PaddleOCR 训练好的模型,可以通过以下脚本进行快速预测。
默认预测图片存储在 `infer_img` 里,通过 `-o Global.checkpoints` 指定权重: 默认预测图片存储在 `infer_img` 里,通过 `-o Global.checkpoints` 加载训练好的参数文件:
根据配置文件中设置的的 `save_model_dir``save_epoch_step` 字段,会有以下几种参数被保存下来:
```
output/rec/
├── best_accuracy.pdopt
├── best_accuracy.pdparams
├── best_accuracy.states
├── config.yml
├── iter_epoch_3.pdopt
├── iter_epoch_3.pdparams
├── iter_epoch_3.states
├── latest.pdopt
├── latest.pdparams
├── latest.states
└── train.log
```
其中 best_accuracy.* 是评估集上的最优模型;iter_epoch_x.* 是以 `save_epoch_step` 为间隔保存下来的模型;latest.* 是最后一个epoch的模型。
``` ```
# 预测英文结果 # 预测英文结果
......
# 模型训练
本文将介绍模型训练时需掌握的基本概念,和训练时的调优方法。
同时会简单介绍PaddleOCR模型训练数据的组成部分,以及如何在垂类场景中准备数据finetune模型。
- [1. 基本概念](#基本概念)
* [1.1 学习率](#学习率)
* [1.2 正则化](#正则化)
* [1.3 评估指标](#评估指标)
- [2. 数据与垂类场景](#数据与垂类场景)
* [2.1 训练数据](#训练数据)
* [2.2 垂类场景](#垂类场景)
* [2.3 自己构建数据集](#自己构建数据集)
* [3. 常见问题](#常见问题)
<a name="基本概念"></a>
## 1. 基本概念
OCR(Optical Character Recognition,光学字符识别)是指对图像进行分析识别处理,获取文字和版面信息的过程,是典型的计算机视觉任务,
通常由文本检测和文本识别两个子任务构成。
模型调优时需要关注以下参数:
<a name="学习率"></a>
### 1.1 学习率
学习率是训练神经网络的重要超参数之一,它代表在每一次迭代中梯度向损失函数最优解移动的步长。
在PaddleOCR中提供了多种学习率更新策略,可以通过配置文件修改,例如:
```
Optimizer:
...
lr:
name: Piecewise
decay_epochs : [700, 800]
values : [0.001, 0.0001]
warmup_epoch: 5
```
Piecewise 代表分段常数衰减,在不同的学习阶段指定不同的学习率,在每段内学习率相同。
warmup_epoch 代表在前5个epoch中,学习率将逐渐从0增加到base_lr。全部策略可以参考代码[learning_rate.py](../../ppocr/optimizer/learning_rate.py)
<a name="正则化"></a>
### 1.2 正则化
正则化可以有效的避免算法过拟合,PaddleOCR中提供了L1、L2正则方法,L1 和 L2 正则化是最常用的正则化方法。L1 正则化向目标函数添加正则化项,以减少参数的绝对值总和;而 L2 正则化中,添加正则化项的目的在于减少参数平方的总和。配置方法如下:
```
Optimizer:
...
regularizer:
name: L2
factor: 2.0e-05
```
<a name="评估指标"></a>
### 1.3 评估指标
(1)检测阶段:先按照检测框和标注框的IOU评估,IOU大于某个阈值判断为检测准确。这里检测框和标注框不同于一般的通用目标检测框,是采用多边形进行表示。检测准确率:正确的检测框个数在全部检测框的占比,主要是判断检测指标。检测召回率:正确的检测框个数在全部标注框的占比,主要是判断漏检的指标。
(2)识别阶段: 字符识别准确率,即正确识别的文本行占标注的文本行数量的比例,只有整行文本识别对才算正确识别。
(3)端到端统计: 端对端召回率:准确检测并正确识别文本行在全部标注文本行的占比; 端到端准确率:准确检测并正确识别文本行在 检测到的文本行数量 的占比; 准确检测的标准是检测框与标注框的IOU大于某个阈值,正确识别的的检测框中的文本与标注的文本相同。
<a name="数据与垂类场景"></a>
## 2. 数据与垂类场景
<a name="训练数据"></a>
### 2.1 训练数据
目前开源的模型,数据集和量级如下:
- 检测:
- 英文数据集,ICDAR2015
- 中文数据集,LSVT街景数据集训练数据3w张图片
- 识别:
- 英文数据集,MJSynth和SynthText合成数据,数据量上千万。
- 中文数据集,LSVT街景数据集根据真值将图crop出来,并进行位置校准,总共30w张图像。此外基于LSVT的语料,合成数据500w。
- 小语种数据集,使用不同语料和字体,分别生成了100w合成数据集,并使用ICDAR-MLT作为验证集。
其中,公开数据集都是开源的,用户可自行搜索下载,也可参考[中文数据集](./datasets.md),合成数据暂不开源,用户可使用开源合成工具自行合成,可参考的合成工具包括[text_renderer](https://github.com/Sanster/text_renderer)[SynthText](https://github.com/ankush-me/SynthText)[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator) 等。
<a name="垂类场景"></a>
### 2.2 垂类场景
PaddleOCR主要聚焦通用OCR,如果有垂类需求,您可以用PaddleOCR+垂类数据自己训练;
如果缺少带标注的数据,或者不想投入研发成本,建议直接调用开放的API,开放的API覆盖了目前比较常见的一些垂类。
<a name="自己构建数据集"></a>
### 2.3 自己构建数据集
在构建数据集时有几个经验可供参考:
(1) 训练集的数据量:
a. 检测需要的数据相对较少,在PaddleOCR模型的基础上进行Fine-tune,一般需要500张可达到不错的效果。
b. 识别分英文和中文,一般英文场景需要几十万数据可达到不错的效果,中文则需要几百万甚至更多。
(2)当训练数据量少时,可以尝试以下三种方式获取更多的数据:
a. 人工采集更多的训练数据,最直接也是最有效的方式。
b. 基于PIL和opencv基本图像处理或者变换。例如PIL中ImageFont, Image, ImageDraw三个模块将文字写到背景中,opencv的旋转仿射变换,高斯滤波等。
c. 利用数据生成算法合成数据,例如pix2pix或StyleText等算法。
<a name="常见问题"></a>
## 3. 常见问题
**Q**:训练CRNN识别时,如何选择合适的网络输入shape?
A:一般高度采用32,最长宽度的选择,有两种方法:
(1)统计训练样本图像的宽高比分布。最大宽高比的选取考虑满足80%的训练样本。
(2)统计训练样本文字数目。最长字符数目的选取考虑满足80%的训练样本。然后中文字符长宽比近似认为是1,英文认为3:1,预估一个最长宽度。
**Q**:识别训练时,训练集精度已经到达90了,但验证集精度一直在70,涨不上去怎么办?
A:训练集精度90,测试集70多的话,应该是过拟合了,有两个可尝试的方法:
(1)加入更多的增广方式或者调大增广prob的[概率](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/rec_img_aug.py#L341),默认为0.4。
(2)调大系统的[l2 dcay值](https://github.com/PaddlePaddle/PaddleOCR/blob/a501603d54ff5513fc4fc760319472e59da25424/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml#L47)
**Q**: 识别模型训练时,loss能正常下降,但acc一直为0
A:识别模型训练初期acc为0是正常的,多训一段时间指标就上来了。
# 更新 # 更新
- 2021.9.7 发布PaddleOCR v2.3,发布[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
- 2021.6.29 [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md)新增5个高频问题,总数248个,每周一都会更新,欢迎大家持续关注。
- 2021.4.8 release 2.1版本,新增AAAI 2021论文[端到端识别算法PGNet](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/pgnet.md)开源,[多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/multi_languages.md)支持种类增加到80+。
- 2020.12.15 更新数据合成工具[Style-Text](../../StyleText/README_ch.md),可以批量合成大量与目标场景类似的图像,在多个场景验证,效果明显提升。 - 2020.12.15 更新数据合成工具[Style-Text](../../StyleText/README_ch.md),可以批量合成大量与目标场景类似的图像,在多个场景验证,效果明显提升。
- 2020.12.07 [FAQ](../../doc/doc_ch/FAQ.md)新增5个高频问题,总数124个,并且计划以后每周一都会更新,欢迎大家持续关注。 - 2020.12.07 [FAQ](../../doc/doc_ch/FAQ.md)新增5个高频问题,总数124个,并且计划以后每周一都会更新,欢迎大家持续关注。
- 2020.11.25 更新半自动标注工具[PPOCRLabel](../../PPOCRLabel/README_ch.md),辅助开发者高效完成标注任务,输出格式与PP-OCR训练任务完美衔接。 - 2020.11.25 更新半自动标注工具[PPOCRLabel](../../PPOCRLabel/README_ch.md),辅助开发者高效完成标注任务,输出格式与PP-OCR训练任务完美衔接。
......
# 效果展示 # 效果展示
<a name="超轻量PP-OCRv2效果展示"></a>
## 超轻量PP-OCRv2效果展示
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic001.jpg" width="800">
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic002.jpg" width="800">
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg" width="800">
<a name="超轻量ppocr_server_2.0效果展示"></a> <a name="超轻量ppocr_server_2.0效果展示"></a>
## 通用ppocr_server_2.0 效果展示 ## 通用PP-OCR server 效果展示
<div align="center"> <div align="center">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00006737.jpg" width="800"> <img src="../imgs_results/ch_ppocr_mobile_v2.0/00006737.jpg" width="800">
...@@ -10,8 +16,6 @@ ...@@ -10,8 +16,6 @@
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800"> <img src="../imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800"> <img src="../imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00057937.jpg" width="800"> <img src="../imgs_results/ch_ppocr_mobile_v2.0/00057937.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00059985.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00111002.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00077949.jpg" width="800"> <img src="../imgs_results/ch_ppocr_mobile_v2.0/00077949.jpg" width="800">
<img src="../imgs_results/ch_ppocr_mobile_v2.0/00207393.jpg" width="800"> <img src="../imgs_results/ch_ppocr_mobile_v2.0/00207393.jpg" width="800">
</div> </div>
......
...@@ -210,7 +210,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true ...@@ -210,7 +210,7 @@ paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true
```bash ```bash
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]] [[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]] [[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]] [[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]µ
...... ......
``` ```
......
...@@ -51,7 +51,7 @@ Take rec_chinese_lite_train_v2.0.yml as an example ...@@ -51,7 +51,7 @@ Take rec_chinese_lite_train_v2.0.yml as an example
### Architecture ([ppocr/modeling](../../ppocr/modeling)) ### Architecture ([ppocr/modeling](../../ppocr/modeling))
In ppocr, the network is divided into four stages: Transform, Backbone, Neck and Head In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck and Head
| Parameter | Use | Defaults | Note | | Parameter | Use | Defaults | Note |
| :---------------------: | :---------------------: | :--------------: | :--------------------: | | :---------------------: | :---------------------: | :--------------: | :--------------------: |
...@@ -120,3 +120,108 @@ In ppocr, the network is divided into four stages: Transform, Backbone, Neck and ...@@ -120,3 +120,108 @@ In ppocr, the network is divided into four stages: Transform, Backbone, Neck and
| batch_size_per_card | Single card batch size during training | 256 | \ | | batch_size_per_card | Single card batch size during training | 256 | \ |
| drop_last | Whether to discard the last incomplete mini-batch because the number of samples in the data set cannot be divisible by batch_size | True | \ | | drop_last | Whether to discard the last incomplete mini-batch because the number of samples in the data set cannot be divisible by batch_size | True | \ |
| num_workers | The number of sub-processes used to load data, if it is 0, the sub-process is not started, and the data is loaded in the main process | 8 | \ | | num_workers | The number of sub-processes used to load data, if it is 0, the sub-process is not started, and the data is loaded in the main process | 8 | \ |
## 3. MULTILINGUAL CONFIG FILE GENERATION
PaddleOCR currently supports 80 (except Chinese) language recognition. A multi-language configuration file template is
provided under the path `configs/rec/multi_languages`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)
There are two ways to create the required configuration file::
1. Automatically generated by script
[generate_multi_language_configs.py](../../configs/rec/multi_language/generate_multi_language_configs.py) Can help you generate configuration files for multi-language models
- Take Italian as an example, if your data is prepared in the following format:
```
|-train_data
|- it_train.txt # train_set label
|- it_val.txt # val_set label
|- data
|- word_001.jpg
|- word_002.jpg
|- word_003.jpg
| ...
```
You can use the default parameters to generate a configuration file:
```bash
# The code needs to be run in the specified directory
cd PaddleOCR/configs/rec/multi_language/
# Set the configuration file of the language to be generated through the -l or --language parameter.
# This command will write the default parameters into the configuration file
python3 generate_multi_language_configs.py -l it
```
- If your data is placed in another location, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters:
```bash
# -l or --language field is required
# --train to modify the training set
# --val to modify the validation set
# --data_dir to modify the data set directory
# --dict to modify the dict path
# -o to modify the corresponding default parameters
cd PaddleOCR/configs/rec/multi_language/
python3 generate_multi_language_configs.py -l it \ # language
--train {path/of/train_label.txt} \ # path of train_label
--val {path/of/val_label.txt} \ # path of val_label
--data_dir {train_data/path} \ # root directory of training data
--dict {path/of/dict} \ # path of dict
-o Global.use_gpu=False # whether to use gpu
...
```
Italian is made up of Latin letters, so after executing the command, you will get the rec_latin_lite_train.yml.
2. Manually modify the configuration file
You can also manually modify the following fields in the template:
```
Global:
use_gpu: True
epoch_num: 500
...
character_type: it # language
character_dict_path: {path/of/dict} # path of dict
Train:
dataset:
name: SimpleDataSet
data_dir: train_data/ # root directory of training data
label_file_list: ["./train_data/train_list.txt"] # train label path
...
Eval:
dataset:
name: SimpleDataSet
data_dir: train_data/ # root directory of val data
label_file_list: ["./train_data/val_list.txt"] # val label path
...
```
Currently, the multi-language algorithms supported by PaddleOCR are:
| Configuration file | Algorithm name | backbone | trans | seq | pred | language | character_type |
| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: |
| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional | chinese_cht|
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English(Case sensitive) | EN |
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French | french |
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German | german |
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | japan |
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean | korean |
| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin | latin |
| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | arabic | ar |
| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | cyrillic | cyrillic |
| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | devanagari | devanagari |
For more supported languages, please refer to : [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations)
The multi-language model training method is the same as the Chinese model. The training data set is 100w synthetic data. A small amount of fonts and test data can be downloaded using the following two methods.
* [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA),Extraction code:frgi.
* [Google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)
# TEXT DETECTION # CONTENT
- [Paste Your Document In Here](#paste-your-document-in-here)
- [1. TEXT DETECTION](#1-text-detection)
* [1.1 DATA PREPARATION](#11-data-preparation)
* [1.2 DOWNLOAD PRETRAINED MODEL](#12-download-pretrained-model)
* [1.3 START TRAINING](#13-start-training)
* [1.4 LOAD TRAINED MODEL AND CONTINUE TRAINING](#14-load-trained-model-and-continue-training)
* [1.5 TRAINING WITH NEW BACKBONE](#15-training-with-new-backbone)
* [1.6 EVALUATION](#16-evaluation)
* [1.7 TEST](#17-test)
* [1.8 INFERENCE MODEL PREDICTION](#18-inference-model-prediction)
- [2. FAQ](#2-faq)
# 1. TEXT DETECTION
This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR. This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.
## DATA PREPARATION ## 1.1 DATA PREPARATION
The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading. The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.
After registering and logging in, download the part marked in the red box in the figure below. And, the content downloaded by `Training Set Images` should be saved as the folder `icdar_c4_train_imgs`, and the content downloaded by `Test Set Images` is saved as the folder `ch4_test_images`
<p align="center">
<img src="./doc/datasets/ic15_location_download.png" align="middle" width = "600"/>
<p align="center">
Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded by wget: Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded by wget:
```shell ```shell
# Under the PaddleOCR path # Under the PaddleOCR path
...@@ -36,10 +58,11 @@ The `points` in the dictionary represent the coordinates (x, y) of the four poin ...@@ -36,10 +58,11 @@ The `points` in the dictionary represent the coordinates (x, y) of the four poin
If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format. If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
## TRAINING ## 1.2 DOWNLOAD PRETRAINED MODEL
First download the pretrained model. The detection model of PaddleOCR currently supports 3 backbones, namely MobileNetV3, ResNet18_vd and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/ppcls/modeling/architectures) to replace backbone according to your needs.
And the responding download link of backbone pretrain weights can be found in (https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97).
First download the pretrained model. The detection model of PaddleOCR currently supports 3 backbones, namely MobileNetV3, ResNet18_vd and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/develop/ppcls/modeling/architectures) to replace backbone according to your needs.
And the responding download link of backbone pretrain weights can be found in [PaddleClas repo](https://github.com/PaddlePaddle/PaddleClas#mobile-series).
```shell ```shell
cd PaddleOCR/ cd PaddleOCR/
# Download the pre-trained model of MobileNetV3 # Download the pre-trained model of MobileNetV3
...@@ -49,11 +72,13 @@ wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dyg ...@@ -49,11 +72,13 @@ wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dyg
# or, download the pre-trained model of ResNet50_vd # or, download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams
```
#### START TRAINING ## 1.3 START TRAINING
*If CPU version installed, please set the parameter `use_gpu` to `false` in the configuration.* *If CPU version installed, please set the parameter `use_gpu` to `false` in the configuration.*
```shell ```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml python3 tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained
``` ```
In the above instruction, use `-c` to select the training to use the `configs/det/det_db_mv3.yml` configuration file. In the above instruction, use `-c` to select the training to use the `configs/det/det_db_mv3.yml` configuration file.
...@@ -62,16 +87,17 @@ For a detailed explanation of the configuration file, please refer to [config](. ...@@ -62,16 +87,17 @@ For a detailed explanation of the configuration file, please refer to [config](.
You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001 You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001
```shell ```shell
# single GPU training # single GPU training
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 python3 tools/train.py -c configs/det/det_mv3_db.yml -o \
Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Optimizer.base_lr=0.0001
# multi-GPU training # multi-GPU training
# Set the GPU ID used by the '--gpus' parameter. # Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained
``` ```
#### load trained model and continue training ## 1.4 LOAD TRAINED MODEL AND CONTINUE TRAINING
If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded. If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded.
For example: For example:
...@@ -82,9 +108,59 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you ...@@ -82,9 +108,59 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you
**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded. **Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.
## EVALUATION ## 1.5 TRAINING WITH NEW BACKBONE
The network part completes the construction of the network, and PaddleOCR divides the network into four parts, which are under [ppocr/modeling](../../ppocr/modeling). The data entering the network will pass through these four parts in sequence(transforms->backbones->
necks->heads).
```bash
├── architectures # Code for building network
├── transforms # Image Transformation Module
├── backbones # Feature extraction module
├── necks # Feature enhancement module
└── heads # Output module
```
If the Backbone to be replaced has a corresponding implementation in PaddleOCR, you can directly modify the parameters in the `Backbone` part of the configuration yml file.
However, if you want to use a new Backbone, an example of replacing the backbones is as follows:
1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
2. Add code in the my_backbone.py file, the sample code is as follows:
```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
class MyBackbone(nn.Layer):
def __init__(self, *args, **kwargs):
super(MyBackbone, self).__init__()
# your init code
self.conv = nn.xxxx
def forward(self, inputs):
# your network forward
y = self.conv(inputs)
return y
```
3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.
After adding the four-part modules of the network, you only need to configure them in the configuration file to use, such as:
```yaml
Backbone:
name: MyBackbone
args1: args1
```
**NOTE**: More details about replace Backbone and other mudule can be found in [doc](add_new_algorithm_en.md).
## 1.6 EVALUATION
PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean. PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean(F-Score).
Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_db_mv3.yml` Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_db_mv3.yml`
...@@ -95,10 +171,9 @@ The model parameters during training are saved in the `Global.save_model_dir` di ...@@ -95,10 +171,9 @@ The model parameters during training are saved in the `Global.save_model_dir` di
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and not need to be set when evaluating the EAST and SAST model.
* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and not need to be set when evaluating the EAST model. ## 1.7 TEST
## TEST
Test the detection result on a single image: Test the detection result on a single image:
```shell ```shell
...@@ -107,7 +182,7 @@ python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./ ...@@ -107,7 +182,7 @@ python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./
When testing the DB model, adjust the post-processing threshold: When testing the DB model, adjust the post-processing threshold:
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=2.0
``` ```
...@@ -115,3 +190,33 @@ Test the detection result on all images in the folder: ...@@ -115,3 +190,33 @@ Test the detection result on all images in the folder:
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy" python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy"
``` ```
## 1.8 INFERENCE MODEL PREDICTION
The inference model (the model saved by `paddle.jit.save`) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment.
The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
Compared with the checkpoints model, the inference model will additionally save the structural information of the model. Therefore, it is easier to deploy because the model structure and model parameters are already solidified in the inference model file, and is suitable for integration with actual systems.
Firstly, we can convert DB trained model to inference model:
```shell
python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model="./output/det_db/best_accuracy" Global.save_inference_dir="./output/det_db_inference/"
```
The detection inference model prediction:
```shell
python3 tools/infer/predict_det.py --det_algorithm="DB" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
```
If it is other detection algorithms, such as the EAST, the det_algorithm parameter needs to be modified to EAST, and the default is the DB algorithm:
```shell
python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
```
# 2. FAQ
Q1: The prediction results of trained model and inference model are inconsistent?
**A**: Most of the problems are caused by the inconsistency of the pre-processing and post-processing parameters during the prediction of the trained model and the pre-processing and post-processing parameters during the prediction of the inference model. Taking the model trained by the det_mv3_db.yml configuration file as an example, the solution to the problem of inconsistent prediction results between the training model and the inference model is as follows:
- Check whether the [trained model preprocessing](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116) is consistent with the prediction [preprocessing function of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42). When the algorithm is evaluated, the input image size will affect the accuracy. In order to be consistent with the paper, the image is resized to [736, 1280] in the training icdar15 configuration file, but there is only a set of default parameters when the inference model predicts, which will be considered To predict the speed problem, the longest side of the image is limited to 960 for resize by default. The preprocessing function of the training model preprocessing and the inference model is located in [ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147)
- Check whether the [post-processing of the trained model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51) is consistent with the [post-processing parameters of the inference](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50).
# Environment Preparation
* [1. Python Environment Setup](#1)
+ [1.1 Windows](#1.1)
+ [1.2 Mac](#1.2)
+ [1.3 Linux](#1.3)
* [2. Install PaddlePaddle 2.0](#2)
<a name="1"></a>
## 1. Python Environment Setup
<a name="1.1"></a>
### 1.1 Windows
#### 1.1.1 Install Anaconda
- Note: To use paddlepaddle you need to install python environment first, here we choose python integrated environment Anaconda toolkit
- Anaconda is a common python package manager
- After installing Anaconda, you can install the python environment, as well as numpy and other required toolkit environment.
- Anaconda download.
- Address: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D
- Most Win10 computers are 64-bit operating systems, choose x86_64 version; if the computer is a 32-bit operating system, choose x86.exe
<img src="../install/windows/Anaconda_download.png" alt="anaconda download" width="800" align="center "/>
- After the download is complete, double-click the installer to enter the graphical interface
- The default installation location is C drive, it is recommended to change the installation location to D drive.
<img src="../install/windows/anaconda_install_folder.png" alt="install config" width="500" align=" left"/>
- Check conda to add environment variables and ignore the warning that
<img src="../install/windows/anaconda_install_env.png" alt="add conda to path" width="500" align="center"/>
#### 1.1.2 Opening the terminal and creating the conda environment
- Open Anaconda Prompt terminal: bottom left Windows Start Menu -> Anaconda3 -> Anaconda Prompt start console
<img src="../install/windows/anaconda_prompt.png" alt="anaconda download" width="300" align="center"/>
- Create a new conda environment
```shell
# Enter the following command at the command line to create an environment named paddle_env
# Here to speed up the download, use the Tsinghua source
conda create --name paddle_env python=3.8 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ # This is a one line command
```
This command will create an executable environment named paddle_env with python version 3.8, which will take a while depending on the network status
The command line will then output a prompt, type y and enter to continue the installation
<img src="../install/windows/conda_new_env.png" alt="conda create" width="700" align="center"/>
- To activate the conda environment you just created, enter the following command at the command line.
```shell
# Activate the paddle_env environment
conda activate paddle_env
# View the current location of python
where python
```
<img src="../install/windows/conda_list_env.png" alt="create environment" width="600" align="center"/>
The above anaconda environment and python environment are installed
<a name="1.2"></a>
### 1.2 Mac
#### 1.2.1 Installing Anaconda
- Note: To use paddlepaddle you need to install the python environment first, here we choose the python integrated environment Anaconda toolkit
- Anaconda is a common python package manager
- After installing Anaconda, you can install the python environment, as well as numpy and other required toolkit environment
- Anaconda download:.
- Address: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D
<img src="../install/mac/anaconda_start.png" alt="anaconda download" width="800" align="center"/>
- Select `Anaconda3-2021.05-MacOSX-x86_64.pkg` at the bottom to download
- After downloading, double click on the .pkg file to enter the graphical interface
- Just follow the default settings, it will take a while to install
- It is recommended to install a code editor such as vscode or pycharm
#### 1.2.2 Open a terminal and create a conda environment
- Open the terminal
- Press command and spacebar at the same time, type "terminal" in the focus search, double click to enter terminal
- **Add conda to the environment variables**
- Environment variables are added so that the system can recognize the conda command
- Open `~/.bash_profile` in the terminal by typing the following command.
```shell
vim ~/.bash_profile
```
- Add conda as an environment variable in `~/.bash_profile`.
```shell
# Press i first to enter edit mode
# In the first line type.
export PATH="~/opt/anaconda3/bin:$PATH"
# If you customized the installation location during installation, change ~/opt/anaconda3/bin to the bin folder in the customized installation directory
```
```shell
# The modified ~/.bash_profile file should look like this (where xxx is the username)
export PATH="~/opt/anaconda3/bin:$PATH"
# >>> conda initialize >>>
# !!! Contents within this block are managed by 'conda init' !!!
__conda_setup="$('/Users/xxx/opt/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh" ]; then
. "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh"
else
export PATH="/Users/xxx/opt/anaconda3/bin:$PATH"
fi
fi
unset __conda_setup
# <<< conda initialize <<<
```
- When you are done, press `esc` to exit edit mode, then type `:wq!` and enter to save and exit
- Verify that the conda command is recognized.
- Enter `source ~/.bash_profile` in the terminal to update the environment variables
- Enter `conda info --envs` in the terminal again, if it shows that there is a base environment, then conda has been added to the environment variables
- Create a new conda environment
```shell
# Enter the following command at the command line to create an environment called paddle_env
# Here to speed up the download, use Tsinghua source
conda create --name paddle_env python=3.8 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
```
- This command will create an executable environment named paddle_env with python version 3.8, which will take a while depending on the network status
- The command line will then output a prompt, type y and enter to continue the installation
- <img src="../install/mac/conda_create.png" alt="conda_create" width="600" align="center"/>
- To activate the conda environment you just created, enter the following command at the command line.
```shell
# Activate the paddle_env environment
conda activate paddle_env
# View the current location of python
where python
```
<img src="../install/mac/conda_activate.png" alt="conda_actviate" width="600" align="center"/>
The above anaconda environment and python environment are installed
<a name="1.3"></a>
### 1.3 Linux
Linux users can choose to run either Anaconda or Docker. If you are familiar with Docker and need to train the PaddleOCR model, it is recommended to use the Docker environment, where the development process of PaddleOCR is run. If you are not familiar with Docker, you can also use Anaconda to run the project.
#### 1.3.1 Anaconda environment configuration
- Note: To use paddlepaddle you need to install the python environment first, here we choose the python integrated environment Anaconda toolkit
- Anaconda is a common python package manager
- After installing Anaconda, you can install the python environment, as well as numpy and other required toolkit environment
- **Download Anaconda**.
- Download at: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D
<img src="../install/linux/anaconda_download.png" akt="anaconda download" width="800" align="center"/>
- Select the appropriate version for your operating system
- Type `uname -m` in the terminal to check the command set used by your system
- Download method 1: Download locally, then transfer the installation package to the linux server
- Download method 2: Directly use linux command line to download
```shell
# First install wget
sudo apt-get install wget # Ubuntu
sudo yum install wget # CentOS
```
```bash
# Then use wget to download from Tsinghua source
# If you want to download Anaconda3-2021.05-Linux-x86_64.sh, the download command is as follows
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.05-Linux-x86_64.sh
# If you want to download another version, you need to change the file name after the last 1 / to the version you want to download
```
- To install Anaconda.
- Type `sh Anaconda3-2021.05-Linux-x86_64.sh` at the command line
- If you downloaded a different version, replace the file name of the command with the name of the file you downloaded
- Just follow the installation instructions
- You can exit by typing q when viewing the license
- **Add conda to the environment variables**
- If you have already added conda to the environment variable path during the installation, you can skip this step
- Open `~/.bashrc` in a terminal.
```shell
# Enter the following command in the terminal.
vim ~/.bashrc
```
- Add conda as an environment variable in `~/.bashrc`.
```shell
# Press i first to enter edit mode # In the first line enter.
export PATH="~/anaconda3/bin:$PATH"
# If you customized the installation location during installation, change ~/anaconda3/bin to the bin folder in the customized installation directory
```
```shell
# The modified ~/.bash_profile file should look like this (where xxx is the username)
export PATH="~/opt/anaconda3/bin:$PATH"
# >>> conda initialize >>>
# !!! Contents within this block are managed by 'conda init' !!!
__conda_setup="$('/Users/xxx/opt/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh" ]; then
. "/Users/xxx/opt/anaconda3/etc/profile.d/conda.sh"
else
export PATH="/Users/xxx/opt/anaconda3/bin:$PATH"
fi
fi
unset __conda_setup
# <<< conda initialize <<<
```
- When you are done, press `esc` to exit edit mode, then type `:wq!` and enter to save and exit
- Verify that the conda command is recognized.
- Enter `source ~/.bash_profile` in the terminal to update the environment variables
- Enter `conda info --envs` in the terminal again, if it shows that there is a base environment, then conda has been added to the environment variables
- Create a new conda environment
```shell
# Enter the following command at the command line to create an environment called paddle_env
# Here to speed up the download, use Tsinghua source
conda create --name paddle_env python=3.8 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
```
- This command will create an executable environment named paddle_env with python version 3.8, which will take a while depending on the network status
- The command line will then output a prompt, type y and enter to continue the installation
<img src="../install/linux/conda_create.png" alt="conda_create" width="500" align="center"/>
- To activate the conda environment you just created, enter the following command at the command line.
```shell
# Activate the paddle_env environment
conda activate paddle_env
```
The above anaconda environment and python environment are installed
#### 1.3.2 Docker environment preparation
**The first time you use this docker image, it will be downloaded automatically. Please be patient.**
```bash
# Switch to the working directory
cd /home/Projects
# You need to create a docker container for the first run, and do not need to run the current command when you run it again
# Create a docker container named ppocr and map the current directory to the /paddle directory of the container
# If using CPU, use docker instead of nvidia-docker to create docker
sudo docker run --name ppocr -v $PWD:/paddle --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
```
<a name="2"></a>
## 2. Install PaddlePaddle 2.0
- If you have cuda9 or cuda10 installed on your machine, please run the following command to install
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- If you only have cpu on your machine, please run the following command to install
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
# Reasoning based on Python prediction engine
This article introduces the use of the Python inference engine for the PP-OCR model library. The content is in order of text detection, text recognition, direction classifier and the prediction method of the three in series on the CPU and GPU.
- [TEXT DETECTION MODEL INFERENCE](#DETECTION_MODEL_INFERENCE)
- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
- [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
- [2. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE)
- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
- [TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION)
<a name="DETECTION_MODEL_INFERENCE"></a>
## TEXT DETECTION MODEL INFERENCE
The default configuration is based on the inference setting of the DB text detection model. For lightweight Chinese detection model inference, you can execute the following commands:
```
# download DB text detection inference model
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# predict
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/"
```
The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_00018069.jpg)
You can use the parameters `limit_type` and `det_limit_side_len` to limit the size of the input image,
The optional parameters of `limit_type` are [`max`, `min`], and
`det_limit_size_len` is a positive integer, generally set to a multiple of 32, such as 960.
The default setting of the parameters is `limit_type='max', det_limit_side_len=960`. Indicates that the longest side of the network input image cannot exceed 960,
If this value is exceeded, the image will be resized with the same width ratio to ensure that the longest side is `det_limit_side_len`.
Set as `limit_type='min', det_limit_side_len=960`, it means that the shortest side of the image is limited to 960.
If the resolution of the input picture is relatively large and you want to use a larger resolution prediction, you can set det_limit_side_len to the desired value, such as 1216:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --det_limit_type=max --det_limit_side_len=1216
```
If you want to use the CPU for prediction, execute the command as follows
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
```
<a name="RECOGNITION_MODEL_INFERENCE"></a>
## TEXT RECOGNITION MODEL INFERENCE
<a name="LIGHTWEIGHT_RECOGNITION"></a>
### 1. LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL REFERENCE
For lightweight Chinese recognition model inference, you can execute the following commands:
```
# download CRNN text recognition inference model
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_10.png" --rec_model_dir="ch_ppocr_mobile_v2.0_rec_infer"
```
![](../imgs_words_en/word_10.png)
After executing the command, the prediction results (recognized text and score) of the above image will be printed on the screen.
```bash
Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.9897658)
```
<a name="MULTILINGUAL_MODEL_INFERENCE"></a>
### 2. MULTILINGAUL MODEL INFERENCE
If you need to predict other language models, when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results,
You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/fonts` path, such as Korean recognition:
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_type="korean" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
```
![](../imgs_words/korean/1.jpg)
After executing the command, the prediction result of the above figure is:
``` text
Predicts of ./doc/imgs_words/korean/1.jpg:('바탕으로', 0.9948904)
```
<a name="ANGLE_CLASS_MODEL_INFERENCE"></a>
## ANGLE CLASSIFICATION MODEL INFERENCE
For angle classification model inference, you can execute the following commands:
```
# download text angle class inference model:
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar
tar xf ch_ppocr_mobile_v2.0_cls_infer.tar
python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words_en/word_10.png" --cls_model_dir="ch_ppocr_mobile_v2.0_cls_infer"
```
![](../imgs_words_en/word_10.png)
After executing the command, the prediction results (classification angle and score) of the above image will be printed on the screen.
```
Predicts of ./doc/imgs_words_en/word_10.png:['0', 0.9999995]
```
<a name="CONCATENATION"></a>
## TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION
When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, the parameter `cls_model_dir` specifies the path to angle classification inference model and the parameter `rec_model_dir` specifies the path to identify the inference model. The parameter `use_angle_cls` is used to control whether to enable the angle classification model. The parameter `use_mp` specifies whether to use multi-process to infer `total_process_num` specifies process number when using multi-process. The parameter . The visualized recognition results are saved to the `./inference_results` folder by default.
```shell
# use direction classifier
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --cls_model_dir="./inference/cls/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=true
# not use use direction classifier
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/"
# use multi-process
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" --use_angle_cls=false --use_mp=True --total_process_num=6
```
After executing the command, the recognition result image is as follows:
![](../imgs_results/system_res_00018069.jpg)
# PP-OCR Model and Configuration
- [INTRODUCTION ABOUT OCR](#introduction-about-ocr)
* [BASIC CONCEPTS OF OCR DETECTION MODEL](#basic-concepts-of-ocr-detection-model)
* [Basic concepts of OCR recognition model](#basic-concepts-of-ocr-recognition-model)
* [PP-OCR model](#pp-ocr-model)
* [And a table of contents](#and-a-table-of-contents)
* [On the right](#on-the-right)
## 1. INTRODUCTION ABOUT OCR
This section briefly introduces the basic concepts of OCR detection model and recognition model, and introduces PaddleOCR's PP-OCR model.
OCR (Optical Character Recognition, Optical Character Recognition) is currently the general term for text recognition. It is not limited to document or book text recognition, but also includes recognizing text in natural scenes. It can also be called STR (Scene Text Recognition).
OCR text recognition generally includes two parts, text detection and text recognition. The text detection module first uses detection algorithms to detect text lines in the image. And then the recognition algorithm to identify the specific text in the text line.
### 1.1 BASIC CONCEPTS OF OCR DETECTION MODEL
Text detection can locate the text area in the image, and then usually mark the word or text line in the form of a bounding box. Traditional text detection algorithms mostly extract features manually, which are characterized by fast speed and good effect in simple scenes, but the effect will be greatly reduced when faced with natural scenes. Currently, deep learning methods are mostly used.
Text detection algorithms based on deep learning can be roughly divided into the following categories:
1. Method based on target detection. Generally, after the text box is predicted, the final text box is filtered through NMS, which is mostly four-point text box, which is not ideal for curved text scenes. Typical algorithms are methods such as EAST and Text Box.
2. Method based on text segmentation. The text line is regarded as the segmentation target, and then the external text box is constructed through the segmentation result, which can handle curved text, and the effect is not ideal for the text cross scene problem. Typical algorithms are DB, PSENet and other methods.
3. Hybrid target detection and segmentation method.
### 1.2 Basic concepts of OCR recognition model
The input of the OCR recognition algorithm is generally text lines images which has less background information, and the text information occupies the main part. The recognition algorithm can be divided into two types of algorithms:
1. CTC-based method. The text prediction module of the recognition algorithm is based on CTC, and the commonly used algorithm combination is CNN+RNN+CTC. There are also some algorithms that try to add transformer modules to the network and so on.
2. Attention-based method. The text prediction module of the recognition algorithm is based on Attention, and the commonly used algorithm combination is CNN+RNN+Attention.
### 1.3 PP-OCR model
PaddleOCR integrates many OCR algorithms, text detection algorithms include DB, EAST, SAST, etc., text recognition algorithms include CRNN, RARE, StarNet, Rosetta, SRN and other algorithms.
Among them, PaddleOCR has released the PP-OCR series model for the general OCR in Chinese and English natural scenes. The PP-OCR model is composed of the DB+CRNN algorithm. It uses massive Chinese data training and model tuning methods to have high text detection and recognition capabilities in Chinese scenes. And PaddleOCR has launched a high-precision and ultra-lightweight PP-OCRv2 model. The detection model is only 3M, and the recognition model is only 8.5M. Using [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)'s model quantification method, the detection model can be compressed to 0.8M without reducing the accuracy. The recognition is compressed to 3M, which is more suitable for mobile deployment scenarios.
## OCR model list(V2.0, updated on 2021.1.20 ## OCR model list(V2.1, updated on 2021.9.6
> **Note** > **Note**
> 1. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance. > 1. Compared with the model v2.0, the 2.1 version of the detection model has a improvement in accuracy, and the 2.1 version of the recognition model is optimized in accuracy and CPU speed.
> 2. All models in this tutorial are all ppocr-series models, for more introduction of algorithms and models based on public dataset, you can refer to [algorithm overview tutorial](./algorithm_overview_en.md). > 2. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance.
> 3. All models in this tutorial are all ppocr-series models, for more introduction of algorithms and models based on public dataset, you can refer to [algorithm overview tutorial](./algorithm_overview_en.md).
- [1. Text Detection Model](#Detection) - [1. Text Detection Model](#Detection)
- [2. Text Recognition Model](#Recognition) - [2. Text Recognition Model](#Recognition)
...@@ -28,6 +29,8 @@ Relationship of the above models is as follows. ...@@ -28,6 +29,8 @@ Relationship of the above models is as follows.
|model name|description|config|model size|download| |model name|description|config|model size|download|
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.1_det|slim quantization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_lite_train_cml_v2.1.yml](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)| 3M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_quant_infer.tar)|
|ch_ppocr_mobile_v2.1_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_lite_train_cml_v2.1.ym](../../configs/det/ch_ppocr_v2.1/ch_det_lite_train_cml_v2.1.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_distill_train.tar)|
|ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|2.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)| |ch_ppocr_mobile_slim_v2.0_det|Slim pruned lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|2.6M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/slim/ch_ppocr_mobile_v2.0_det_prune_infer.tar)|
|ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)| |ch_ppocr_mobile_v2.0_det|Original lightweight model, supporting Chinese, English, multilingual text detection|[ch_det_mv3_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml)|3M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|
|ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieved better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)| |ch_ppocr_server_v2.0_det|General model, which is larger than the lightweight model, but achieved better performance|[ch_det_res18_db_v2.0.yml](../../configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml)|47M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)|
...@@ -40,6 +43,8 @@ Relationship of the above models is as follows. ...@@ -40,6 +43,8 @@ Relationship of the above models is as follows.
|model name|description|config|model size|download| |model name|description|config|model size|download|
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.1_rec|Slim qunatization with distillation lightweight model, supporting Chinese, English, multilingual text detection|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)| 9M |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_quant_train.tar) |
|ch_ppocr_mobile_v2.1_rec|Original lightweight model, supporting Chinese, English, multilingual text detection|[rec_chinese_lite_train_distillation_v2.1.yml](../../configs/rec/ch_ppocr_v2.1/rec_chinese_lite_train_distillation_v2.1.yml)|8.5M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_train.tar) |
|ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) | |ch_ppocr_mobile_slim_v2.0_rec|Slim pruned and quantized lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)| 6M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_slim_train.tar) |
|ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) | |ch_ppocr_mobile_v2.0_rec|Original lightweight model, supporting Chinese, English and number recognition|[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)|5.2M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
|ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) | |ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
...@@ -58,45 +63,6 @@ Relationship of the above models is as follows. ...@@ -58,45 +63,6 @@ Relationship of the above models is as follows.
<a name="Multilingual"></a> <a name="Multilingual"></a>
#### Multilingual Recognition Model(Updating...) #### Multilingual Recognition Model(Updating...)
**Note:** The configuration file of the new multi language model is generated by code. You can use the `--help` parameter to check which multi language are supported by current PaddleOCR.
```bash
# The code needs to run in the specified directory
cd {your/path/}PaddleOCR/configs/rec/multi_language/
python3 generate_multi_language_configs.py --help
```
Take the Italian configuration file as an example:
##### 1.Generate Italian configuration file to test the model provided
you can generate the default configuration file through the following command, and use the default language dictionary provided by paddleocr for prediction.
```bash
# The code needs to run in the specified directory
cd {your/path/}PaddleOCR/configs/rec/multi_language/
# Set the required language configuration file through -l or --language parameter
# This command will write the default parameter to the configuration file.
python3 generate_multi_language_configs.py -l it
```
##### 2. Generate Italian configuration file to train your own data
If you want to train your own model, you can prepare the training set file, verification set file, dictionary file and training data path. Here we assume that the Italian training set, verification set, dictionary and training data path are:
- Training set:{your/path/}PaddleOCR/train_data/train_list.txt
- Validation set: {your/path/}PaddleOCR/train_data/val_list.txt
- Use the default dictionary provided by paddleocr:{your/path/}PaddleOCR/ppocr/utils/dict/it_dict.txt
- Training data path:{your/path/}PaddleOCR/train_data
```bash
# The code needs to run in the specified directory
cd {your/path/}PaddleOCR/configs/rec/multi_language/
# The -l or --language parameter is required
# --train modify train_list path
# --val modify eval_list path
# --data_dir modify data dir
# -o modify default parameters
# --dict Change the dictionary path. The example uses the default dictionary path, so that this parameter can be empty.
python3 generate_multi_language_configs.py -l it \
--train {path/to/train_list} \
--val {path/to/val_list} \
--data_dir {path/to/data_dir} \
-o Global.use_gpu=False
```
|model name| dict file | description|config|model size|download| |model name| dict file | description|config|model size|download|
| --- | --- | --- |--- | --- | --- | | --- | --- | --- |--- | --- | --- |
| french_mobile_v2.0_rec | ppocr/utils/dict/french_dict.txt | Lightweight model for French recognition|[rec_french_lite_train.yml](../../configs/rec/multi_language/rec_french_lite_train.yml)|2.65M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_train.tar) | | french_mobile_v2.0_rec | ppocr/utils/dict/french_dict.txt | Lightweight model for French recognition|[rec_french_lite_train.yml](../../configs/rec/multi_language/rec_french_lite_train.yml)|2.65M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_train.tar) |
...@@ -120,12 +86,14 @@ For more supported languages, please refer to : [Multi-language model](./multi_l ...@@ -120,12 +86,14 @@ For more supported languages, please refer to : [Multi-language model](./multi_l
|model name|description|config|model size|download| |model name|description|config|model size|download|
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
|ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| 2.1M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_train.tar) | |ch_ppocr_mobile_slim_v2.0_cls|Slim quantized model for text angle classification|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)| 2.1M | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_slim_train.tar) |
|ch_ppocr_mobile_v2.0_cls|Original model|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | |ch_ppocr_mobile_v2.0_cls|Original model for text angle classification|[cls_mv3.yml](../../configs/cls/cls_mv3.yml)|1.38M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |
<a name="Paddle-Lite"></a> <a name="Paddle-Lite"></a>
### 4. Paddle-Lite Model ### 4. Paddle-Lite Model
|Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch| |Version|Introduction|Model size|Detection model|Text Direction model|Recognition model|Paddle-Lite branch|
|---|---|---|---|---|---|---| |---|---|---|---|---|---|---|
|V2.0|extra-lightweight chinese OCR optimized model|7.8M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9| |V2.1|ppocr_v2.1 extra-lightweight chinese OCR optimized model|11M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_infer_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_infer_opt.nb)|v2.9|
|V2.0(slim)|extra-lightweight chinese OCR optimized model|3.3M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9| |V2.1(slim)|extra-lightweight chinese OCR optimized model|4.9M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/chinese/ch_ppocr_mobile_v2.1_rec_slim_opt.nb)|v2.9|
|V2.0|ppocr_v2.0 extra-lightweight chinese OCR optimized model|7.8M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_opt.nb)|v2.9|
|V2.0(slim)|ppovr_v2.0 extra-lightweight chinese OCR optimized model|3.3M|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_det_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_cls_slim_opt.nb)|[download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb)|v2.9|
...@@ -198,13 +198,13 @@ If necessary, you can read related documents: ...@@ -198,13 +198,13 @@ If necessary, you can read related documents:
| Language | Abbreviation | | Language | Abbreviation | | Language | Abbreviation | | Language | Abbreviation |
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
|chinese and english|ch| |Arabic|ar| |Chinese & English|ch| |Arabic|ar|
|english|en| |Hindi|hi| |English|en| |Hindi|hi|
|french|fr| |Uyghur|ug| |French|fr| |Uyghur|ug|
|german|german| |Persian|fa| |German|german| |Persian|fa|
|japan|japan| |Urdu|ur| |Japan|japan| |Urdu|ur|
|korean|korean| | Serbian(latin) |rs_latin| |Korean|korean| | Serbian(latin) |rs_latin|
|chinese traditional |ch_tra| |Occitan |oc| |Chinese Traditional |chinese_cht| |Occitan |oc|
| Italian |it| |Marathi|mr| | Italian |it| |Marathi|mr|
|Spanish |es| |Nepali|ne| |Spanish |es| |Nepali|ne|
| Portuguese|pt| |Serbian(cyrillic)|rs_cyrillic| | Portuguese|pt| |Serbian(cyrillic)|rs_cyrillic|
......
# PaddleOCR Overview and Project Clone
## 1. PaddleOCR Overview
PaddleOCR contains rich text detection, text recognition and end-to-end algorithms. Combining actual testing and industrial experience, PaddleOCR chooses DB and CRNN as the basic detection and recognition models, and proposes a series of models, named PP-OCR, for industrial applications after a series of optimization strategies. The PP-OCR model is aimed at general scenarios and forms a model library according to different languages. Based on the capabilities of PP-OCR, PaddleOCR releases the PP-Structure tool library for document scene tasks, including two major tasks: layout analysis and table recognition. In order to get through the entire process of industrial landing, PaddleOCR provides large-scale data production tools and a variety of prediction deployment tools to help developers quickly turn ideas into reality.
<div align="center">
<img src="../overview_en.png">
</div>
## 2. Project Clone
### **2.1 Clone PaddleOCR repo**
```
# Recommend
git clone https://github.com/PaddlePaddle/PaddleOCR
# If you cannot pull successfully due to network problems, you can also choose to use the code hosting on the cloud:
git clone https://gitee.com/paddlepaddle/PaddleOCR
# Note: The cloud-hosting code may not be able to synchronize the update with this GitHub project in real time. There might be a delay of 3-5 days. Please give priority to the recommended method.
```
### **2.2 Install third-party libraries**
```
cd PaddleOCR
pip3 install -r requirements.txt
```
If you getting this error `OSError: [WinError 126] The specified module could not be found` when you install shapely on windows.
Please try to download Shapely whl file using [http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).
Reference: [Solve shapely installation on windows](
\ No newline at end of file
# Quick start of Chinese OCR model # PaddleOCR Quick Start
## 1. Prepare for the environment [PaddleOCR Quick Start](#paddleocr-quick-start)
Please refer to [quick installation](./installation_en.md) to configure the PaddleOCR operating environment. + [1. Install PaddleOCR Whl Package](#1-install-paddleocr-whl-package)
* [2. Easy-to-Use](#2-easy-to-use)
+ [2.1 Use by command line](#21-use-by-command-line)
- [2.1.1 English and Chinese Model](#211-english-and-chinese-model)
- [2.1.2 Multi-language Model](#212-multi-language-model)
- [2.1.3 Layout Analysis](#213-layoutAnalysis)
+ [2.2 Use by Code](#22-use-by-code)
- [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese---english-model-and-multilingual-model)
- [2.2.2 Layout Analysis](#222-layoutAnalysis)
* Note: Support the use of PaddleOCR through whl package installation,pelease refer [PaddleOCR Package](./whl_en.md).
## 2.inference models
The detection and recognition models on the mobile and server sides are as follows. For more models (including multiple languages), please refer to [PP-OCR v2.0 series model list](../doc_ch/models_list.md) <a name="1-install-paddleocr-whl-package"></a>
| Model introduction | Model name | Recommended scene | Detection model | Direction Classifier | Recognition model | ## 1. Install PaddleOCR Whl Package
| ------------ | --------------- | ----------------|---- | ---------- | -------- |
| Ultra-lightweight Chinese OCR model (8.1M) | ch_ppocr_mobile_v2.0_xx |Mobile-side/Server-side|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| Universal Chinese OCR model (143M) | ch_ppocr_server_v2.0_xx |Server-side |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
```bash
pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
```
* If `wget` is not installed in the windows environment, you can copy the link to the browser to download when downloading the model, then uncompress it and place it in the corresponding directory. - **For windows users:** If you getting this error `OSError: [WinError 126] The specified module could not be found` when you install shapely on windows. Please try to download Shapely whl file [here](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).
Copy the download address of the `inference model` for detection and recognition in the table above, and uncompress them. Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)
``` - **For layout analysis users**, run the following command to install **Layout-Parser**
mkdir inference && cd inference
# Download the detection model and unzip
wget {url/of/detection/inference_model} && tar xf {name/of/detection/inference_model/package}
# Download the recognition model and unzip
wget {url/of/recognition/inference_model} && tar xf {name/of/recognition/inference_model/package}
# Download the direction classifier model and unzip
wget {url/of/classification/inference_model} && tar xf {name/of/classification/inference_model/package}
cd ..
```
Take the ultra-lightweight model as an example: ```bash
pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
```
<a name="2-easy-to-use"></a>
## 2. Easy-to-Use
<a name="21-use-by-command-line"></a>
### 2.1 Use by command line
PaddleOCR provides a series of test images, click [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip) to download, and then switch to the corresponding directory in the terminal
```bash
cd /path/to/ppocr_img
``` ```
mkdir inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar
# Download the angle classifier model of the ultra-lightweight Chinese OCR model and uncompress it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar && tar xf ch_ppocr_mobile_v2.0_cls_infer.tar
cd ..
```
After decompression, the file structure should be as follows: If you do not use the provided test image, you can replace the following `--image_dir` parameter with the corresponding test image path
<a name="211-english-and-chinese-model"></a>
#### 2.1.1 Chinese and English Model
* Detection, direction classification and recognition: set the direction classifier parameter`--use_angle_cls true` to recognize vertical text.
```bash
paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en
```
Output will be a list, each item contains bounding box, text and recognition confidence
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```
* Only detection: set `--rec` to `false`
```bash
paddleocr --image_dir ./imgs_en/img_12.jpg --rec false
```
Output will be a list, each item only contains bounding box
```bash
[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
......
```
* Only recognition: set `--det` to `false`
```bash
paddleocr --image_dir ./imgs_words_en/word_10.png --det false --lang en
```
Output will be a list, each item contains text and recognition confidence
```bash
['PAIN', 0.990372]
```
More whl package usage can be found in [whl package](./whl_en.md)
<a name="212-multi-language-model"></a>
#### 2.1.2 Multi-language Model
Paddleocr currently supports 80 languages, which can be switched by modifying the `--lang` parameter.
``` bash
paddleocr --image_dir ./doc/imgs_en/254.jpg --lang=en
``` ```
├── ch_ppocr_mobile_v2.0_cls_infer
│ ├── inference.pdiparams <div align="center">
│ ├── inference.pdiparams.info <img src="../imgs_en/254.jpg" width="300" height="600">
│ └── inference.pdmodel <img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600">
├── ch_ppocr_mobile_v2.0_det_infer </div>
│ ├── inference.pdiparams The result is a list, each item contains a text box, text and recognition confidence
│ ├── inference.pdiparams.info
│ └── inference.pdmodel ```text
├── ch_ppocr_mobile_v2.0_rec_infer [('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]]
├── inference.pdiparams [('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]]
├── inference.pdiparams.info [('Montpelier Vermont', 0.97389287), [[69.0, 132.0], [501.0, 126.0], [501.0, 158.0], [70.0, 164.0]]]
└── inference.pdmodel [('8022256183', 0.99810505), [[71.0, 175.0], [363.0, 170.0], [364.0, 202.0], [72.0, 207.0]]]
[('REG 07-24-201706:59 PM', 0.93537045), [[73.0, 299.0], [653.0, 281.0], [654.0, 318.0], [74.0, 336.0]]]
[('045555', 0.99346405), [[509.0, 331.0], [651.0, 325.0], [652.0, 356.0], [511.0, 362.0]]]
[('CT1', 0.9988654), [[535.0, 367.0], [654.0, 367.0], [654.0, 406.0], [535.0, 406.0]]]
......
``` ```
## 3. Single image or image set prediction Commonly used multilingual abbreviations include
| Language | Abbreviation | | Language | Abbreviation | | Language | Abbreviation |
| ------------------- | ------------ | ---- | -------- | ------------ | ---- | -------- | ------------ |
| Chinese & English | ch | | French | fr | | Japanese | japan |
| English | en | | German | german | | Korean | korean |
| Chinese Traditional | chinese_cht | | Italian | it | | Russian | ru |
* The following code implements text detection、angle class and recognition process. When performing prediction, you need to specify the path of a single image or image set through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, the parameter `rec_model_dir` specifies the path to identify the inference model, the parameter `use_angle_cls` specifies whether to use the direction classifier, the parameter `cls_model_dir` specifies the path to identify the direction classifier model, the parameter `use_space_char` specifies whether to predict the space char. The visual results are saved to the `./inference_results` folder by default. A list of all languages and their corresponding abbreviations can be found in [Multi-Language Model Tutorial](./multi_languages_en.md)
<a name="213-layoutAnalysis"></a>
#### 2.1.3 Layout Analysis
Layout analysis refers to the division of 5 types of areas of the document, including text, title, list, picture and table. For the first three types of regions, directly use the OCR model to complete the text detection and recognition of the corresponding regions, and save the results in txt. For the table area, after the table structuring process, the table picture is converted into an Excel file of the same table style. The picture area will be individually cropped into an image.
To use the layout analysis function of PaddleOCR, you need to specify `--type=structure`
```bash ```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```
# Predict a single image specified by image_dir - **Results Format**
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True
The returned results of PP-Structure is a list composed of a dict, an example is as follows
```shell
[
{ 'type': 'Text',
'bbox': [34, 432, 345, 462],
'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
[('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent ', 0.465441)])
}
]
```
The description of each field in dict is as follows
| Parameter | Description |
| --------- | ------------------------------------------------------------ |
| type | Type of image area |
| bbox | The coordinates of the image area in the original image, respectively [left upper x, left upper y, right bottom x, right bottom y] |
| res | OCR or table recognition result of image area。<br> Table: HTML string of the table; <br> OCR: A tuple containing the detection coordinates and recognition results of each single line of text |
- **Parameter Description:**
| Parameter | Description | Default value |
| --------------- | ------------------------------------------------------------ | -------------------------------------------- |
| output | The path where excel and recognition results are saved | ./output/table |
| table_max_len | The long side of the image is resized in table structure model | 488 |
| table_model_dir | inference model path of table structure model | None |
| table_char_type | dict path of table structure model | ../ppocr/utils/dict/table_structure_dict.txt |
<a name="22-use-by-code"></a>
### 2.2 Use by Code
<a name="221-chinese---english-model-and-multilingual-model"></a>
#### 2.2.1 Chinese & English Model and Multilingual Model
* detection, angle classification and recognition:
```python
from paddleocr import PaddleOCR,draw_ocr
# Paddleocr supports Chinese, English, French, German, Korean and Japanese.
# You can set the parameter `lang` as `ch`, `en`, `fr`, `german`, `korean`, `japan`
# to switch the language model in order.
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
img_path = './imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
print(line)
# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
# Predict imageset specified by image_dir Output will be a list, each item contains bounding box, text and recognition confidence
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_ppocr_mobile_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True
# If you want to use the CPU for prediction, you need to set the use_gpu parameter to False ```bash
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True --use_gpu=False [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
``` ```
- Universal Chinese OCR model Visualization of results
Please follow the above steps to download the corresponding models and update the relevant parameters, The example is as follows. <div align="center">
<img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>
<a name="222-layoutAnalysis"></a>
``` #### 2.2.2 Layout Analysis
# Predict a single image specified by image_dir
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_server_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_server_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True ```python
``` import os
import cv2
from paddleocr import PPStructure,draw_structure_result,save_structure_res
* Note table_engine = PPStructure(show_log=True)
- If you want to use the recognition model which does not support space char recognition, please update the source code to the latest version and add parameters `--use_space_char=False`.
- If you do not want to use direction classifier, please update the source code to the latest version and add parameters `--use_angle_cls=False`.
save_folder = './output/table'
img_path = './table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])
For more text detection and recognition tandem reasoning, please refer to the document tutorial for line in result:
: [Inference with Python inference engine](./inference_en.md) line.pop('img')
print(line)
In addition, the tutorial also provides other deployment methods for the Chinese OCR model: from PIL import Image
- [Server-side C++ inference](../../deploy/cpp_infer/readme_en.md)
- [Service deployment](../../deploy/hubserving) font_path = './fonts/simfang.ttf'
- [End-to-end deployment](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/deploy/lite) image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
## TEXT RECOGNITION # TEXT RECOGNITION
- [1 DATA PREPARATION](#DATA_PREPARATION) - [1 DATA PREPARATION](#DATA_PREPARATION)
- [1.1 Costom Dataset](#Costom_Dataset) - [1.1 Costom Dataset](#Costom_Dataset)
...@@ -8,8 +8,8 @@ ...@@ -8,8 +8,8 @@
- [2 TRAINING](#TRAINING) - [2 TRAINING](#TRAINING)
- [2.1 Data Augmentation](#Data_Augmentation) - [2.1 Data Augmentation](#Data_Augmentation)
- [2.2 Training](#Training) - [2.2 General Training](#Training)
- [2.3 Multi-language](#Multi_language) - [2.3 Multi-language Training](#Multi_language)
- [3 EVALUATION](#EVALUATION) - [3 EVALUATION](#EVALUATION)
...@@ -17,12 +17,12 @@ ...@@ -17,12 +17,12 @@
- [4.1 Training engine prediction](#Training_engine_prediction) - [4.1 Training engine prediction](#Training_engine_prediction)
<a name="DATA_PREPARATION"></a> <a name="DATA_PREPARATION"></a>
### DATA PREPARATION ## 1 DATA PREPARATION
PaddleOCR supports two data formats: PaddleOCR supports two data formats:
- `LMDB` is used to train data sets stored in lmdb format; - `LMDB` is used to train data sets stored in lmdb format(LMDBDataSet);
- `general data` is used to train data sets stored in text files: - `general data` is used to train data sets stored in text files(SimpleDataSet):
Please organize the dataset as follows: Please organize the dataset as follows:
...@@ -36,7 +36,7 @@ mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset> ...@@ -36,7 +36,7 @@ mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
``` ```
<a name="Costom_Dataset"></a> <a name="Costom_Dataset"></a>
#### 1.1 Costom dataset ### 1.1 Costom dataset
If you want to use your own data for training, please refer to the following to organize your data. If you want to use your own data for training, please refer to the following to organize your data.
...@@ -84,11 +84,12 @@ Similar to the training set, the test set also needs to be provided a folder con ...@@ -84,11 +84,12 @@ Similar to the training set, the test set also needs to be provided a folder con
``` ```
<a name="Dataset_download"></a> <a name="Dataset_download"></a>
#### 1.2 Dataset download ### 1.2 Dataset download
If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) ,download the lmdb format dataset required for benchmark - ICDAR2015
If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path. If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads).
Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) ,download the lmdb format dataset required for benchmark
If you want to reproduce the paper SAR, you need to download extra dataset [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), extraction code: 627x. Besides, icdar2013, icdar2015, cocotext, IIIT5k datasets are also used to train. For specific details, please refer to the paper SAR. If you want to reproduce the paper SAR, you need to download extra dataset [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), extraction code: 627x. Besides, icdar2013, icdar2015, cocotext, IIIT5k datasets are also used to train. For specific details, please refer to the paper SAR.
...@@ -101,8 +102,28 @@ wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_t ...@@ -101,8 +102,28 @@ wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_t
wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
``` ```
PaddleOCR also provides a data format conversion script, which can convert ICDAR official website label to a data format
supported by PaddleOCR. The data conversion tool is in `ppocr/utils/gen_label.py`, here is the training set as an example:
```
# convert the official gt to rec_gt_label.txt
python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
```
The data format is as follows, (a) is the original picture, (b) is the Ground Truth text file corresponding to each picture:
![](../datasets/icdar_rec.png)
- Multilingual dataset
The multi-language model training method is the same as the Chinese model. The training data set is 100w synthetic data. A small amount of fonts and test data can be downloaded using the following two methods.
* [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA) ,Extraction code:frgi.
* [Google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)
<a name="Dictionary"></a> <a name="Dictionary"></a>
#### 1.3 Dictionary ### 1.3 Dictionary
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index. Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
...@@ -147,14 +168,26 @@ To customize the dict file, please modify the `character_dict_path` field in `co ...@@ -147,14 +168,26 @@ To customize the dict file, please modify the `character_dict_path` field in `co
If you need to customize dic file, please add character_dict_path field in configs/rec/rec_icdar15_train.yml to point to your dictionary path. And set character_type to ch. If you need to customize dic file, please add character_dict_path field in configs/rec/rec_icdar15_train.yml to point to your dictionary path. And set character_type to ch.
<a name="Add_space_category"></a> <a name="Add_space_category"></a>
#### 1.4 Add space category ### 1.4 Add space category
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`. If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
**Note: use_space_char only takes effect when character_type=ch** **Note: use_space_char only takes effect when character_type=ch**
<a name="TRAINING"></a> <a name="TRAINING"></a>
### 2 TRAINING ## 2 TRAINING
<a name="Data_Augmentation"></a>
### 2.1 Data Augmentation
PaddleOCR provides a variety of data augmentation methods. All the augmentation methods are enabled by default.
The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, random crop, perspective, color reverse, TIA augmentation.
Each disturbance method is selected with a 40% probability during the training process. For specific code implementation, please refer to: [rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
<a name="Training"></a>
### 2.2 General Training
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example: PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
...@@ -172,21 +205,15 @@ tar -xf rec_mv3_none_bilstm_ctc_v2.0_train.tar && rm -rf rec_mv3_none_bilstm_ctc ...@@ -172,21 +205,15 @@ tar -xf rec_mv3_none_bilstm_ctc_v2.0_train.tar && rm -rf rec_mv3_none_bilstm_ctc
Start training: Start training:
``` ```
# GPU training Support single card and multi-card training, specify the card number through --gpus # GPU training Support single card and multi-card training
# Training icdar15 English data and The training log will be automatically saved as train.log under "{save_model_dir}" # Training icdar15 English data and The training log will be automatically saved as train.log under "{save_model_dir}"
#specify the single card training(Long training time, not recommended)
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml
#specify the card number through --gpus
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml
``` ```
<a name="Data_Augmentation"></a>
#### 2.1 Data Augmentation
PaddleOCR provides a variety of data augmentation methods. All the augmentation methods are enabled by default.
The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, random crop, perspective, color reverse, TIA augmentation.
Each disturbance method is selected with a 40% probability during the training process. For specific code implementation, please refer to: [rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
<a name="Training"></a>
#### 2.2 Training
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process. PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process.
...@@ -281,87 +308,7 @@ Eval: ...@@ -281,87 +308,7 @@ Eval:
**Note that the configuration file for prediction/evaluation must be consistent with the training.** **Note that the configuration file for prediction/evaluation must be consistent with the training.**
<a name="Multi_language"></a> <a name="Multi_language"></a>
#### 2.3 Multi-language ### 2.3 Multi-language Training
PaddleOCR currently supports 80 (except Chinese) language recognition. A multi-language configuration file template is
provided under the path `configs/rec/multi_languages`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)
There are two ways to create the required configuration file::
1. Automatically generated by script
[generate_multi_language_configs.py](../../configs/rec/multi_language/generate_multi_language_configs.py) Can help you generate configuration files for multi-language models
- Take Italian as an example, if your data is prepared in the following format:
```
|-train_data
|- it_train.txt # train_set label
|- it_val.txt # val_set label
|- data
|- word_001.jpg
|- word_002.jpg
|- word_003.jpg
| ...
```
You can use the default parameters to generate a configuration file:
```bash
# The code needs to be run in the specified directory
cd PaddleOCR/configs/rec/multi_language/
# Set the configuration file of the language to be generated through the -l or --language parameter.
# This command will write the default parameters into the configuration file
python3 generate_multi_language_configs.py -l it
```
- If your data is placed in another location, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters:
```bash
# -l or --language field is required
# --train to modify the training set
# --val to modify the validation set
# --data_dir to modify the data set directory
# --dict to modify the dict path
# -o to modify the corresponding default parameters
cd PaddleOCR/configs/rec/multi_language/
python3 generate_multi_language_configs.py -l it \ # language
--train {path/of/train_label.txt} \ # path of train_label
--val {path/of/val_label.txt} \ # path of val_label
--data_dir {train_data/path} \ # root directory of training data
--dict {path/of/dict} \ # path of dict
-o Global.use_gpu=False # whether to use gpu
...
```
Italian is made up of Latin letters, so after executing the command, you will get the rec_latin_lite_train.yml.
2. Manually modify the configuration file
You can also manually modify the following fields in the template:
```
Global:
use_gpu: True
epoch_num: 500
...
character_type: it # language
character_dict_path: {path/of/dict} # path of dict
Train:
dataset:
name: SimpleDataSet
data_dir: train_data/ # root directory of training data
label_file_list: ["./train_data/train_list.txt"] # train label path
...
Eval:
dataset:
name: SimpleDataSet
data_dir: train_data/ # root directory of val data
label_file_list: ["./train_data/val_list.txt"] # val label path
...
```
Currently, the multi-language algorithms supported by PaddleOCR are: Currently, the multi-language algorithms supported by PaddleOCR are:
...@@ -380,9 +327,6 @@ Currently, the multi-language algorithms supported by PaddleOCR are: ...@@ -380,9 +327,6 @@ Currently, the multi-language algorithms supported by PaddleOCR are:
For more supported languages, please refer to : [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations) For more supported languages, please refer to : [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations)
The multi-language model training method is the same as the Chinese model. The training data set is 100w synthetic data. A small amount of fonts and test data can be downloaded using the following two methods.
* [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA),Extraction code:frgi.
* [Google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)
If you want to finetune on the basis of the existing model effect, please refer to the following instructions to modify the configuration file: If you want to finetune on the basis of the existing model effect, please refer to the following instructions to modify the configuration file:
...@@ -421,7 +365,7 @@ Eval: ...@@ -421,7 +365,7 @@ Eval:
``` ```
<a name="EVALUATION"></a> <a name="EVALUATION"></a>
### 3 EVALUATION ## 3 EVALUATION
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file. The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
...@@ -431,20 +375,39 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec ...@@ -431,20 +375,39 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
``` ```
<a name="PREDICTION"></a> <a name="PREDICTION"></a>
### 4 PREDICTION ## 4 PREDICTION
<a name="Training_engine_prediction"></a>
#### 4.1 Training engine prediction
Using the model trained by paddleocr, you can quickly get prediction through the following script. Using the model trained by paddleocr, you can quickly get prediction through the following script.
The default prediction picture is stored in `infer_img`, and the weight is specified via `-o Global.checkpoints`: The default prediction picture is stored in `infer_img`, and the trained weight is specified via `-o Global.checkpoints`:
According to the `save_model_dir` and `save_epoch_step` fields set in the configuration file, the following parameters will be saved:
```
output/rec/
├── best_accuracy.pdopt
├── best_accuracy.pdparams
├── best_accuracy.states
├── config.yml
├── iter_epoch_3.pdopt
├── iter_epoch_3.pdparams
├── iter_epoch_3.states
├── latest.pdopt
├── latest.pdparams
├── latest.states
└── train.log
```
Among them, best_accuracy.* is the best model on the evaluation set; iter_epoch_x.* is the model saved at intervals of `save_epoch_step`; latest.* is the model of the last epoch.
``` ```
# Predict English results # Predict English results
python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.load_static_weights=false Global.infer_img=doc/imgs_words/en/word_1.jpg python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.load_static_weights=false Global.infer_img=doc/imgs_words/en/word_1.jpg
``` ```
Input image: Input image:
![](../imgs_words/en/word_1.png) ![](../imgs_words/en/word_1.png)
......
# MODEL TRAINING
- [1. Basic concepts](#1-basic-concepts)
* [1.1 Learning rate](#11-learning-rate)
* [1.2 Regularization](#12-regularization)
* [1.3 Evaluation indicators](#13-evaluation-indicators-)
- [2. Data and vertical scenes](#2-data-and-vertical-scenes)
* [2.1 Training data](#21-training-data)
* [2.2 Vertical scene](#22-vertical-scene)
* [2.3 Build your own data set](#23-build-your-own-data-set)
* [3. FAQ](#3-faq)
This article will introduce the basic concepts that need to be mastered during model training and the tuning methods during training.
At the same time, it will briefly introduce the components of the PaddleOCR model training data and how to prepare the data finetune model in the vertical scene.
<a name="1-basic-concepts"></a>
# 1. Basic concepts
OCR (Optical Character Recognition) refers to the process of analyzing and recognizing images to obtain text and layout information. It is a typical computer vision task.
It usually consists of two subtasks: text detection and text recognition.
The following parameters need to be paid attention to when tuning the model:
<a name="11-learning-rate"></a>
## 1.1 Learning rate
The learning rate is one of the important hyperparameters for training neural networks. It represents the step length of the gradient moving to the optimal solution of the loss function in each iteration.
A variety of learning rate update strategies are provided in PaddleOCR, which can be modified through configuration files, for example:
```
Optimizer:
...
lr:
name: Piecewise
decay_epochs : [700, 800]
values : [0.001, 0.0001]
warmup_epoch: 5
```
Piecewise stands for piecewise constant attenuation. Different learning rates are specified in different learning stages,
and the learning rate is the same in each stage.
warmup_epoch means that in the first 5 epochs, the learning rate will gradually increase from 0 to base_lr. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
<a name="12-regularization"></a>
## 1.2 Regularization
Regularization can effectively avoid algorithm overfitting. PaddleOCR provides L1 and L2 regularization methods.
L1 and L2 regularization are the most commonly used regularization methods.
L1 regularization adds a regularization term to the objective function to reduce the sum of absolute values of the parameters;
while in L2 regularization, the purpose of adding a regularization term is to reduce the sum of squared parameters.
The configuration method is as follows:
```
Optimizer:
...
regularizer:
name: L2
factor: 2.0e-05
```
<a name="13-evaluation-indicators-"></a>
## 1.3 Evaluation indicators
(1) Detection stage: First, evaluate according to the IOU of the detection frame and the labeled frame. If the IOU is greater than a certain threshold, it is judged that the detection is accurate. Here, the detection frame and the label frame are different from the general general target detection frame, and they are represented by polygons. Detection accuracy: the percentage of the correct detection frame number in all detection frames is mainly used to judge the detection index. Detection recall rate: the percentage of correct detection frames in all marked frames, which is mainly an indicator of missed detection.
(2) Recognition stage: Character recognition accuracy, that is, the ratio of correctly recognized text lines to the number of marked text lines. Only the entire line of text recognition pairs can be regarded as correct recognition.
(3) End-to-end statistics: End-to-end recall rate: accurately detect and correctly identify the proportion of text lines in all labeled text lines; End-to-end accuracy rate: accurately detect and correctly identify the number of text lines in the detected text lines The standard for accurate detection is that the IOU of the detection box and the labeled box is greater than a certain threshold, and the text in the correctly identified detection box is the same as the labeled text.
<a name="2-data-and-vertical-scenes"></a>
# 2. Data and vertical scenes
<a name="21-training-data"></a>
## 2.1 Training data
The current open source models, data sets and magnitudes are as follows:
- Detection:
- English data set, ICDAR2015
- Chinese data set, LSVT street view data set training data 3w pictures
- Identification:
- English data set, MJSynth and SynthText synthetic data, the data volume is tens of millions.
- Chinese data set, LSVT street view data set crops the image according to the truth value, and performs position calibration, a total of 30w images. In addition, based on the LSVT corpus, 500w of synthesized data.
- Small language data set, using different corpora and fonts, respectively generated 100w synthetic data set, and using ICDAR-MLT as the verification set.
Among them, the public data sets are all open source, users can search and download by themselves, or refer to [Chinese data set](./datasets.md), synthetic data is not open source, users can use open source synthesis tools to synthesize by themselves. Synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator) etc.
<a name="22-vertical-scene"></a>
## 2.2 Vertical scene
PaddleOCR mainly focuses on general OCR. If you have vertical requirements, you can use PaddleOCR + vertical data to train yourself;
If there is a lack of labeled data, or if you do not want to invest in research and development costs, it is recommended to directly call the open API, which covers some of the more common vertical categories.
<a name="23-build-your-own-data-set"></a>
## 2.3 Build your own data set
There are several experiences for reference when constructing the data set:
(1) The amount of data in the training set:
a. The data required for detection is relatively small. For Fine-tune based on the PaddleOCR model, 500 sheets are generally required to achieve good results.
b. Recognition is divided into English and Chinese. Generally, English scenarios require hundreds of thousands of data to achieve good results, while Chinese requires several million or more.
(2) When the amount of training data is small, you can try the following three ways to get more data:
a. Manually collect more training data, the most direct and effective way.
b. Basic image processing or transformation based on PIL and opencv. For example, the three modules of ImageFont, Image, ImageDraw in PIL write text into the background, opencv's rotating affine transformation, Gaussian filtering and so on.
c. Use data generation algorithms to synthesize data, such as algorithms such as pix2pix.
<a name="3-faq"></a>
# 3. FAQ
**Q**: How to choose a suitable network input shape when training CRNN recognition?
A: The general height is 32, the longest width is selected, there are two methods:
(1) Calculate the aspect ratio distribution of training sample images. The selection of the maximum aspect ratio considers 80% of the training samples.
(2) Count the number of texts in training samples. The selection of the longest number of characters considers the training sample that satisfies 80%. Then the aspect ratio of Chinese characters is approximately considered to be 1, and that of English is 3:1, and the longest width is estimated.
**Q**: During the recognition training, the accuracy of the training set has reached 90, but the accuracy of the verification set has been kept at 70, what should I do?
A: If the accuracy of the training set is 90 and the test set is more than 70, it should be over-fitting. There are two methods to try:
(1) Add more augmentation methods or increase the [probability] of augmented prob (https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/rec_img_aug.py#L341), The default is 0.4.
(2) Increase the [l2 dcay value] of the system (https://github.com/PaddlePaddle/PaddleOCR/blob/a501603d54ff5513fc4fc760319472e59da25424/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml#L47)
**Q**: When the recognition model is trained, loss can drop normally, but acc is always 0
A: It is normal for the acc to be 0 at the beginning of the recognition model training, and the indicator will come up after a longer training period.
# RECENT UPDATES # RECENT UPDATES
- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The CPU inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
- 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) which is published in AAAI 2021. Find tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md);release multi language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), support more than 80 languages recognition; especically, the performance of [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is Optimized.
- 2021.1.21 update more than 25+ multilingual recognition models [models list](./doc/doc_en/models_list_en.md), including:English, Chinese, German, French, Japanese,Spanish,Portuguese Russia Arabic and so on. Models for more languages will continue to be updated [Develop Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048).
- 2020.12.15 update Data synthesis tool, i.e., [Style-Text](../../StyleText/README.md),easy to synthesize a large number of images which are similar to the target scene image. - 2020.12.15 update Data synthesis tool, i.e., [Style-Text](../../StyleText/README.md),easy to synthesize a large number of images which are similar to the target scene image.
- 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](../../PPOCRLabel/README.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used in training of the PP-OCR system directly. - 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](../../PPOCRLabel/README.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used in training of the PP-OCR system directly.
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941 - 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
......
# Visualization # Visualization
<a name="PP-OCRv2"></a>
## PP-OCRv2
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic001.jpg" width="800">
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic002.jpg" width="800">
<img src="../imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg" width="800">
<a name="ppocr_server_2.0"></a> <a name="ppocr_server_2.0"></a>
## ch_ppocr_server_2.0 ## ch_ppocr_server_2.0
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment