# paddleocr package

## 1 Get started quickly
### 1.1 Install package
Install from PyPI:
```bash
pip install "paddleocr>=2.0.1" # version 2.0.1 or later is recommended
```

Build your own whl package and install it:
```bash
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
```
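
To confirm that an installed version satisfies the 2.0.1+ recommendation above, a small comparison helper can be used. This is only a sketch: `meets_minimum` is a hypothetical name, and it assumes a plain numeric version string such as `2.0.3` (suffixes like `rc1` are not handled).

```python
def meets_minimum(version, minimum=(2, 0, 1)):
    """Compare a dotted numeric version string against a minimum tuple.

    `meets_minimum` is an illustrative helper, not part of paddleocr.
    """
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts >= minimum

print(meets_minimum("2.0.1"))
print(meets_minimum("1.1.1"))
```

On Python 3.8+, the installed version string can be read with `importlib.metadata.version("paddleocr")` and passed to this helper.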
## 2 Use
### 2.1 Use by code
The paddleocr whl package automatically downloads the ppocr lightweight model as the default model. To replace it with your own model, see section 3, **Use custom model**.

* detection, angle classification and recognition
```python
from paddleocr import PaddleOCR,draw_ocr
# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the `lang` parameter to `ch`, `en`, `french`, `german`, `korean` or `japan`
# to select the corresponding language model.
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)


# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the text and the recognition confidence
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```
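
Each item above pairs a four-point box with a `[text, confidence]` pair. A minimal sketch of turning such items into plain records and keeping only the confident ones is shown here. The helper name `to_records` and the `min_score` cutoff are illustrative assumptions, not part of the paddleocr API; the sample data is copied from the output above.

```python
# Sample items in the shape shown above: [box, [text, confidence]]
sample_result = [
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]],
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]],
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]],
]

def to_records(result, min_score=0.0):
    """Convert [box, [text, score]] items into dicts, dropping low scores.

    `to_records` and `min_score` are hypothetical, for illustration only.
    """
    records = []
    for box, (text, score) in result:
        if score >= min_score:
            records.append({"box": box, "text": text, "score": score})
    return records

records = to_records(sample_result, min_score=0.95)
for rec in records:
    print(rec["text"], rec["score"])
```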

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>

* detection and recognition
```python
from paddleocr import PaddleOCR,draw_ocr
ocr = PaddleOCR(lang='en') # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path)
for line in result:
    print(line)

# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the text and the recognition confidence
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>

* classification and recognition
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to load model into memory
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
result = ocr.ocr(img_path, det=False, cls=True)
for line in result:
    print(line)
```

The output is a list; each item contains the recognized text and its confidence
```bash
['PAIN', 0.990372]
```

* only detection
```python
from paddleocr import PaddleOCR,draw_ocr
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path,rec=False)
for line in result:
    print(line)

# draw result
from PIL import Image

image = Image.open(img_path).convert('RGB')
im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains only a bounding box
```bash
[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
......
```
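
Detected boxes are four `[x, y]` corner points and are not necessarily axis-aligned. If an upright rectangle is needed, for example to crop a region, the enclosing rectangle can be computed as sketched here. `bounding_rect` is a hypothetical helper; the sample box is taken from the output above.

```python
def bounding_rect(box):
    """Return (x_min, y_min, x_max, y_max) enclosing a four-point box."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)

box = [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
x_min, y_min, x_max, y_max = bounding_rect(box)
print(x_min, y_min, x_max, y_max)
```

The tuple order matches the `(left, upper, right, lower)` box expected by PIL's `Image.crop`.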

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det.jpg" width="800">
</div>

* only recognition
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(lang='en') # need to run only once to load model into memory
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
result = ocr.ocr(img_path, det=False, cls=False)
for line in result:
    print(line)
```

The output is a list; each item contains the recognized text and its confidence
```bash
['PAIN', 0.990372]
```

* only classification
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to load model into memory
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
result = ocr.ocr(img_path, det=False, rec=False, cls=True)
for line in result:
    print(line)
```

The output is a list; each item contains the classification result and its confidence
```bash
['0', 0.99999964]
```
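
A classification item is a `[label, confidence]` pair. The sketch below decides whether an image should be turned around before further processing. It assumes that `'0'` means upright and `'180'` means upside down (inferred from the default `label_list`), and both `needs_rotation` and its `min_score` cutoff are hypothetical, not part of paddleocr.

```python
def needs_rotation(cls_result, min_score=0.9):
    """Return True when the label is '180' with enough confidence.

    Assumes labels '0' (upright) and '180' (upside down).
    """
    label, score = cls_result
    return label == '180' and score >= min_score

print(needs_rotation(['0', 0.99999964]))
print(needs_rotation(['180', 0.97]))
```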

### 2.2 Use by command line

Show help information:
```bash
paddleocr -h
```

* detection, classification and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --use_angle_cls true --lang en
```

The output is a list; each item contains a bounding box, the text and the recognition confidence
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```

* detection and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --lang en
```

The output is a list; each item contains a bounding box, the text and the recognition confidence
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```

* classification and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --lang en
```

The output is a list; each item contains the text and the recognition confidence
```bash
['PAIN', 0.990372]
```

* only detection
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --rec false
```

The output is a list; each item contains only a bounding box
```bash
[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
......
```

* only recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false --lang en
```

The output is a list; each item contains the text and the recognition confidence
```bash
['PAIN', 0.990372]
```

* only classification
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --rec false
```

The output is a list; each item contains the classification result and its confidence
```bash
['0', 0.99999964]
```
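
When the command-line output is captured by another script, each printed line can be turned back into Python lists. The sketch below assumes that a line is exactly a Python literal as shown in the outputs above; if your version prefixes lines with logging information, strip that prefix first. `parse_output_line` is a hypothetical helper name.

```python
import ast

def parse_output_line(line):
    """Parse one printed result line into Python lists.

    Assumes the line is a bare Python literal, as in the examples above.
    """
    return ast.literal_eval(line.strip())

line = "[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]"
box, (text, score) = parse_output_line(line)
print(text, score)
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text produced by another process.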

## 3 Use custom model
When the built-in models do not meet your needs, you can use your own trained models.
First, follow the first section of [inference_en.md](./inference_en.md) to convert your det and rec models to inference models, then use them as follows.

### 3.1 Use by code

```python
from paddleocr import PaddleOCR,draw_ocr
# The path of detection and recognition model must contain model and params files
ocr = PaddleOCR(
    det_model_dir='{your_det_model_dir}',
    rec_model_dir='{your_rec_model_dir}',
    rec_char_dict_path='{your_rec_char_dict_path}',
    cls_model_dir='{your_cls_model_dir}',
    use_angle_cls=True,
)
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

### 3.2 Use by command line

```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true
```
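
Since a custom model folder must contain the model and params files, it can help to check the folder before passing it to PaddleOCR. The following is a sketch only: `missing_model_files` is a hypothetical helper, and the default file names `model` and `params` are an assumption taken from the wording in this document; adjust them to the names your exported inference model actually uses.

```python
import os
import tempfile

def missing_model_files(model_dir, required=("model", "params")):
    """Return the names from `required` that are absent in `model_dir`.

    The default names are an assumption; pass your own if they differ.
    """
    return [name for name in required
            if not os.path.isfile(os.path.join(model_dir, name))]

# Demonstration with a temporary folder that holds only a "model" file
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "model"), "w").close()
    print(missing_model_files(demo_dir))
```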

## 4 Use web images or numpy array as input

### 4.1 Web image

- Use by code
```python
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

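# show result
from io import BytesIO
from urllib.request import urlopen
from PIL import Image
# PIL cannot open a URL directly, so download the image bytes first
image = Image.open(BytesIO(urlopen(img_path).read())).convert('RGB')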
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
- Use by command line
```bash
paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true
```
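
In your own scripts it can be useful to tell a web image from a local file, for instance to decide whether the image must be downloaded before it can be opened for drawing. How paddleocr makes this distinction internally is not covered here; `is_web_image` below is a hypothetical helper built on the standard library.

```python
from urllib.parse import urlparse

def is_web_image(path):
    """Return True if `path` looks like an http(s) URL rather than a file path."""
    return urlparse(path).scheme in ("http", "https")

print(is_web_image('http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg'))
print(is_web_image('PaddleOCR/doc/imgs/11.jpg'))
```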

### 4.2 Numpy array
A numpy array is supported as input only when paddleocr is used from code

```python
import cv2
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
img = cv2.imread(img_path)  # numpy array in BGR order
# If your own trained model supports grayscale images, you can uncomment the next line
# img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
result = ocr.ocr(img, cls=True)  # pass the array, not the path
for line in result:
    print(line)

# show result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```


## 5 Parameter Description

| Parameter                    | Description                                                                                                                                                                                                                 | Default value                  |
|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|
| use_gpu                 | use GPU or not                                                                                                                                                                                                          | TRUE                    |
| gpu_mem                 | GPU memory size used for initialization                                                                                                                                                                                              | 8000M                   |
| image_dir               | The images path or folder path for predicting when used by the command line                                                                                                                                                                           |                         |
| det_algorithm           | Type of detection algorithm selected                                                                                                                                                                                                   | DB                      |
| det_model_dir           | The text detection inference model folder. Two options: 1. None: the built-in model is downloaded automatically to `~/.paddleocr/det`; 2. the path of an inference model you converted yourself; the model and params files must be in that folder | None           |
| det_max_side_len        | The maximum size of the long side of the image. When the long side exceeds this value, the long side will be resized to this size, and the short side will be scaled proportionally                                                                                                                         | 960                     |
| det_db_thresh           | Binarization threshold value of DB output map                                                                                                                                                                                        | 0.3                     |
| det_db_box_thresh       | The threshold value of the DB output box. Boxes with a score lower than this value are discarded | 0.5                     |
| det_db_unclip_ratio     | The expanded ratio of DB output box                                                                                                                                                                                             | 2                       |
| det_east_score_thresh   | Binarization threshold value of EAST output map                                                                                                                                                                                       | 0.8                     |
| det_east_cover_thresh   | The threshold value of the EAST output box. Boxes with a score lower than this value are discarded | 0.1                     |
| det_east_nms_thresh     | The NMS threshold value of EAST model output box                                                                                                                                                                                              | 0.2                     |
| rec_algorithm           | Type of recognition algorithm selected                                                                                                                                                                                                | CRNN                    |
| rec_model_dir           | The text recognition inference model folder. Two options: 1. None: the built-in model is downloaded automatically to `~/.paddleocr/rec`; 2. the path of an inference model you converted yourself; the model and params files must be in that folder | None |
| rec_image_shape         | image shape of recognition algorithm                                                                                                                                                                                            | "3,32,320"              |
| rec_char_type           | Character type of recognition algorithm, Chinese (ch) or English (en)                                                                                                                                                                               | ch                      |
| rec_batch_num           | Batch size of the images forwarded during recognition | 30                      |
| max_text_length         | The maximum text length that the recognition algorithm can recognize | 25                      |
| rec_char_dict_path      | The path of the character dictionary; change it to your own path when `rec_model_dir` uses option 2 | ./ppocr/utils/ppocr_keys_v1.txt                        |
| use_space_char          | Whether to recognize spaces                                                                                                                                                                                                         | TRUE                    |
| drop_score          | Filter the output by score (from the recognition model), and those below this score will not be returned                                                                                                                                                                                                        | 0.5                    |
| use_angle_cls          | Whether to load classification model                                                                                                                                                                                                       | FALSE                    |
| cls_model_dir           | The classification inference model folder. Two options: 1. None: the built-in model is downloaded automatically to `~/.paddleocr/cls`; 2. the path of an inference model you converted yourself; the model and params files must be in that folder | None |
| cls_image_shape         | image shape of classification algorithm                                                                                                                                                                                            | "3,48,192"              |
| label_list         | label list of classification algorithm                                                                                                                                                                                            | ['0','180']           |
| cls_batch_num           | Batch size of the images forwarded during classification | 30                      |
| enable_mkldnn           | Whether to enable mkldnn                                                                                                                                                                                                       | FALSE                   |
| use_zero_copy_run           | Whether to forward by zero_copy_run | FALSE                   |
| lang                     | The language to use. Currently supported: Chinese (ch), English (en), French (french), German (german), Korean (korean) and Japanese (japan) | ch                    |
| det                     | Enable detection when the `ppocr.ocr` function executes | TRUE                    |
| rec                     | Enable recognition when the `ppocr.ocr` function executes | TRUE                    |
| cls                     | Enable classification when the `ppocr.ocr` function executes (in command line mode, use `use_angle_cls` to control whether classification runs during the forward pass) | FALSE                    |
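
The `det_max_side_len` rule in the table (shrink the long side to the limit and scale the short side proportionally) amounts to the arithmetic sketched below. `limit_long_side` is a hypothetical helper that covers only the proportional scaling described in the table; the actual preprocessing may apply further rounding that is not shown here.

```python
def limit_long_side(width, height, max_side_len=960):
    """Scale (width, height) so that the long side is at most max_side_len."""
    long_side = max(width, height)
    if long_side <= max_side_len:
        return width, height  # already within the limit, keep as is
    ratio = max_side_len / long_side
    return round(width * ratio), round(height * ratio)

print(limit_long_side(1920, 1080))
print(limit_long_side(640, 480))
```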