whl.md 19.6 KB
Newer Older
WenmuZhou's avatar
WenmuZhou committed
1
2
3
4
5
6
7
8
9
10
11
12
13
# paddleocr package使用说明

## 快速上手

### 安装whl包

pip安装
```bash
pip install paddleocr
```

本地构建并安装
```bash
WenmuZhou's avatar
WenmuZhou committed
14
15
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x是paddleocr的版本号
WenmuZhou's avatar
WenmuZhou committed
16
17
18
```
### 1. 代码使用

WenmuZhou's avatar
WenmuZhou committed
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
* 检测+分类+识别全流程
```python
from paddleocr import PaddleOCR, draw_ocr
# Paddleocr目前支持中英文、英文、法语、德语、韩语、日语,可以通过修改lang参数进行切换
# 参数依次为`ch`, `en`, `french`, `german`, `korean`, `japan`。
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# 显示结果
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
结果是一个list,每个item包含了文本框,文字和识别置信度
```bash
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
......
```
结果可视化

<div align="center">
    <img src="../imgs_results/whl/11_det_rec.jpg" width="800">
</div>


* 检测+识别
WenmuZhou's avatar
WenmuZhou committed
55
56
```python
from paddleocr import PaddleOCR, draw_ocr
WenmuZhou's avatar
WenmuZhou committed
57
ocr = PaddleOCR() # need to run only once to download and load model into memory
WenmuZhou's avatar
WenmuZhou committed
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
img_path = 'PaddleOCR/doc/imgs/11.jpg'
result = ocr.ocr(img_path)
for line in result:
    print(line)

# 显示结果
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
结果是一个list,每个item包含了文本框,文字和识别置信度
```bash
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
WenmuZhou's avatar
WenmuZhou committed
78
......
WenmuZhou's avatar
WenmuZhou committed
79
80
81
82
83
84
85
```
结果可视化

<div align="center">
    <img src="../imgs_results/whl/11_det_rec.jpg" width="800">
</div>

WenmuZhou's avatar
WenmuZhou committed
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100

* 分类+识别
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
result = ocr.ocr(img_path, det=False, cls=True)
for line in result:
    print(line)
```
结果是一个list,每个item只包含识别结果和识别置信度
```bash
['韩国小馆', 0.9907421]
```

WenmuZhou's avatar
WenmuZhou committed
101
102
103
* 单独执行检测
```python
from paddleocr import PaddleOCR, draw_ocr
WenmuZhou's avatar
WenmuZhou committed
104
ocr = PaddleOCR() # need to run only once to download and load model into memory
WenmuZhou's avatar
WenmuZhou committed
105
img_path = 'PaddleOCR/doc/imgs/11.jpg'
WenmuZhou's avatar
WenmuZhou committed
106
result = ocr.ocr(img_path, rec=False)
WenmuZhou's avatar
WenmuZhou committed
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
for line in result:
    print(line)

# 显示结果
from PIL import Image

image = Image.open(img_path).convert('RGB')
im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
结果是一个list,每个item只包含文本框
```bash
[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]]
[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]]
[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]]
WenmuZhou's avatar
WenmuZhou committed
123
......
WenmuZhou's avatar
WenmuZhou committed
124
125
126
127
128
129
130
131
132
133
134
```
结果可视化


<div align="center">
    <img src="../imgs_results/whl/11_det.jpg" width="800">
</div>

* 单独执行识别
```python
from paddleocr import PaddleOCR
WenmuZhou's avatar
WenmuZhou committed
135
ocr = PaddleOCR() # need to run only once to download and load model into memory
WenmuZhou's avatar
WenmuZhou committed
136
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
WenmuZhou's avatar
WenmuZhou committed
137
result = ocr.ocr(img_path, det=False)
WenmuZhou's avatar
WenmuZhou committed
138
139
140
141
142
143
144
145
for line in result:
    print(line)
```
结果是一个list,每个item只包含识别结果和识别置信度
```bash
['韩国小馆', 0.9907421]
```

WenmuZhou's avatar
WenmuZhou committed
146
147
148
149
150
151
152
153
154
155
156
157
158
159
* 单独执行分类
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_words/ch/word_1.jpg'
result = ocr.ocr(img_path, det=False, rec=False, cls=True)
for line in result:
    print(line)
```
结果是一个list,每个item只包含分类结果和分类置信度
```bash
['0', 0.9999924]
```

WenmuZhou's avatar
WenmuZhou committed
160
161
162
163
164
165
166
### 通过命令行使用

查看帮助信息
```bash
paddleocr -h
```

WenmuZhou's avatar
WenmuZhou committed
167
168
169
170
171
172
173
174
175
176
177
178
179
* 检测+分类+识别全流程
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --use_angle_cls true --cls true
```
结果是一个list,每个item包含了文本框,文字和识别置信度
```bash
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
......
```

* 检测+识别
WenmuZhou's avatar
WenmuZhou committed
180
181
182
183
184
185
186
187
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg
```
结果是一个list,每个item包含了文本框,文字和识别置信度
```bash
[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['纯臻营养护发素', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['产品信息/参数', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45元/每公斤,100公斤起订)', 0.9676722]]
WenmuZhou's avatar
WenmuZhou committed
188
......
WenmuZhou's avatar
WenmuZhou committed
189
190
```

WenmuZhou's avatar
WenmuZhou committed
191
192
193
194
195
196
197
198
199
200
* 分类+识别
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false
```

结果是一个list,每个item只包含识别结果和识别置信度
```bash
['韩国小馆', 0.9907421]
```

WenmuZhou's avatar
WenmuZhou committed
201
202
203
204
205
206
207
208
209
* 单独执行检测
```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --rec false
```
结果是一个list,每个item只包含文本框
```bash
[[26.0, 457.0], [137.0, 457.0], [137.0, 477.0], [26.0, 477.0]]
[[25.0, 425.0], [372.0, 425.0], [372.0, 448.0], [25.0, 448.0]]
[[128.0, 397.0], [273.0, 397.0], [273.0, 414.0], [128.0, 414.0]]
WenmuZhou's avatar
WenmuZhou committed
210
......
WenmuZhou's avatar
WenmuZhou committed
211
212
213
214
215
216
217
218
219
220
221
222
```

* 单独执行识别
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --det false
```

结果是一个list,每个item只包含识别结果和识别置信度
```bash
['韩国小馆', 0.9907421]
```

WenmuZhou's avatar
WenmuZhou committed
223
224
225
226
227
228
229
230
231
232
* 单独执行分类
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words/ch/word_1.jpg --use_angle_cls true --cls true --det false --rec false
```

结果是一个list,每个item只包含分类结果和分类置信度
```bash
['0', 0.9999924]
```

WenmuZhou's avatar
WenmuZhou committed
233
234
## 自定义模型
当内置模型无法满足需求时,需要使用到自己训练的模型。
WenmuZhou's avatar
WenmuZhou committed
235
首先,参照[inference.md](./inference.md) 第一节转换将检测、分类和识别模型转换为inference模型,然后按照如下方式使用
WenmuZhou's avatar
WenmuZhou committed
236
237
238
239

### 代码使用
```python
from paddleocr import PaddleOCR, draw_ocr
WenmuZhou's avatar
WenmuZhou committed
240
241
# 模型路径下必须含有model和params文件
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}', rec_model_dir='{your_rec_model_dir}', rec_char_dict_path='{your_rec_char_dict_path}', cls_model_dir='{your_cls_model_dir}', use_angle_cls=True)
WenmuZhou's avatar
WenmuZhou committed
242
img_path = 'PaddleOCR/doc/imgs/11.jpg'
WenmuZhou's avatar
WenmuZhou committed
243
result = ocr.ocr(img_path, cls=True)
WenmuZhou's avatar
WenmuZhou committed
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
for line in result:
    print(line)

# 显示结果
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

### 通过命令行使用

```bash
WenmuZhou's avatar
WenmuZhou committed
261
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true --cls true
WenmuZhou's avatar
WenmuZhou committed
262
263
```

WenmuZhou's avatar
WenmuZhou committed
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
### 使用网络图片或者numpy数组作为输入

1. 网络图片

代码使用
```python
from paddleocr import PaddleOCR, draw_ocr
# Paddleocr目前支持中英文、英文、法语、德语、韩语、日语,可以通过修改lang参数进行切换
# 参数依次为`ch`, `en`, `french`, `german`, `korean`, `japan`。
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# 显示结果
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
命令行模式
```bash
paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true
```

2. numpy数组
仅通过代码使用时支持numpy数组作为输入
```python
from paddleocr import PaddleOCR, draw_ocr
# Paddleocr目前支持中英文、英文、法语、德语、韩语、日语,可以通过修改lang参数进行切换
# 参数依次为`ch`, `en`, `french`, `german`, `korean`, `japan`。
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
img = cv2.imread(img_path)
# img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY), 如果你自己训练的模型支持灰度图,可以将这句话的注释取消
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# 显示结果
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

WenmuZhou's avatar
WenmuZhou committed
319
320
321
322
323
324
325
326
## 参数说明

| 字段                    | 说明                                                                                                                                                                                                                 | 默认值                  |
|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|
| use_gpu                 | 是否使用GPU                                                                                                                                                                                                          | TRUE                    |
| gpu_mem                 | 初始化占用的GPU内存大小                                                                                                                                                                                              | 8000M                   |
| image_dir               | 通过命令行调用时执行预测的图片或文件夹路径                                                                                                                                                                           |                         |
| det_algorithm           | 使用的检测算法类型                                                                                                                                                                                                   | DB                      |
WenmuZhou's avatar
WenmuZhou committed
327
| det_model_dir          |  检测模型所在文件夹。传参方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/det`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 |   None        |
WenmuZhou's avatar
WenmuZhou committed
328
329
330
331
332
333
334
335
| det_max_side_len        | 检测算法前向时图片长边的最大尺寸,当长边超出这个值时会将长边resize到这个大小,短边等比例缩放                                                                                                                         | 960                     |
| det_db_thresh           | DB模型输出预测图的二值化阈值                                                                                                                                                                                         | 0.3                     |
| det_db_box_thresh       | DB模型输出框的阈值,低于此值的预测框会被丢弃                                                                                                                                                                           | 0.5                     |
| det_db_unclip_ratio     | DB模型输出框扩大的比例                                                                                                                                                                                               | 2                       |
| det_east_score_thresh   | EAST模型输出预测图的二值化阈值                                                                                                                                                                                       | 0.8                     |
| det_east_cover_thresh   | EAST模型输出框的阈值,低于此值的预测框会被丢弃                                                                                                                                                                         | 0.1                     |
| det_east_nms_thresh     | EAST模型输出框NMS的阈值                                                                                                                                                                                              | 0.2                     |
| rec_algorithm           | 使用的识别算法类型                                                                                                                                                                                                   | CRNN                    |
WenmuZhou's avatar
WenmuZhou committed
336
| rec_model_dir          | 识别模型所在文件夹。传参方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/rec`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件 | None |
WenmuZhou's avatar
WenmuZhou committed
337
| rec_image_shape         | 识别算法的输入图片尺寸                                                                                                                                                                                             | "3,32,320"              |
WenmuZhou's avatar
WenmuZhou committed
338
| rec_char_type           | 识别算法的字符类型,中英文(ch)、英文(en)、法语(french)、德语(german)、韩语(korean)、日语(japan)                                                                                                                                                                               | ch                      |
WenmuZhou's avatar
WenmuZhou committed
339
| rec_batch_num           | 进行识别时,同时前向的图片数                                                                                                                                                                                         | 30                      |
WenmuZhou's avatar
WenmuZhou committed
340
341
| max_text_length         | 识别算法能识别的最大文字长度                                                                                                                                                                                         | 25                      |
| rec_char_dict_path      | 识别模型字典路径,当rec_model_dir使用方式2传参时需要修改为自己的字典路径                                                                                                                                                | ./ppocr/utils/ppocr_keys_v1.txt                        |
WenmuZhou's avatar
WenmuZhou committed
342
| use_space_char          | 是否识别空格                                                                                                                                                                                                         | TRUE                    |
WenmuZhou's avatar
WenmuZhou committed
343
| drop_score          | 对输出按照分数(来自于识别模型)进行过滤,低于此分数的不返回                                                                                                                                                                                                         | 0.5                    |
WenmuZhou's avatar
WenmuZhou committed
344
345
346
347
348
| use_angle_cls          | 是否加载分类模型                                                                                                                                                                                                         | FALSE                    |
| cls_model_dir          | 分类模型所在文件夹。传参方式有两种,1. None: 自动下载内置模型到 `~/.paddleocr/cls`;2.自己转换好的inference模型路径,模型路径下必须包含model和params文件                                                                                 | None                    |
| cls_image_shape          | 分类算法的输入图片尺寸                                                                           | "3, 48, 192"                    |
| label_list          | 分类算法的标签列表                                                                           | ['0', '180']                  |
| cls_batch_num          | 进行分类时,同时前向的图片数                                                                          |30                 |
WenmuZhou's avatar
WenmuZhou committed
349
| enable_mkldnn           | 是否启用mkldnn                                                                                                                                                                                                       | FALSE                   |
WenmuZhou's avatar
WenmuZhou committed
350
351
| use_zero_copy_run           | 是否通过zero_copy_run的方式进行前向                                                                                                                                                                               | FALSE                   |
| lang                     | 模型语言类型,目前支持 中文(ch)和英文(en)                                                                                                                                                                                                  | ch                    |
WenmuZhou's avatar
WenmuZhou committed
352
353
| det                     | 前向时使用启动检测                                                                                                                                                                                                   | TRUE                    |
| rec                     | 前向时是否启动识别                                                                                                                                                                                                   | TRUE                    |
WenmuZhou's avatar
WenmuZhou committed
354
| cls                     | 前向时是否启动分类 (命令行模式下使用use_angle_cls控制前向是否启动分类)                                                                                                                                                                                                | FALSE                    |