quick_start.md 9.75 KB
Newer Older
wanglch's avatar
wanglch committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
---
comments: true
hide:
  - navigation
---

### 安装

#### 1. 安装PaddlePaddle

> 如果您没有基础的Python运行环境,请参考[运行环境准备](./ppocr/environment.md)。

=== "CPU端安装"

    ```bash linenums="1"
    python -m pip install paddlepaddle==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
    ```

=== "GPU端安装"

    由于GPU端需要根据具体CUDA版本来对应安装使用,以下仅以Linux平台,pip安装英伟达GPU, CUDA11.8为例,其他平台,请参考[飞桨官网安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。


    ```bash linenums="1"
    python -m pip install paddlepaddle-gpu==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
    ```

#### 2. 安装`paddleocr`

```bash linenums="1"
pip install paddleocr
```

NOTE: 可以通过设置环境变量 `PADDLE_OCR_BASE_DIR` 来自定义 OCR 模型的存储位置。如果未设置此变量,模型将下载到以下默认位置:

- 在Linux/macOS上路径为:`${HOME}/.paddleocr`
- 在Windows上路径为:`C:\Users\{username}\.paddleocr`

### Python脚本使用

=== "文本检测+方向分类+文本识别"

    ```python linenums="1"
    from paddleocr import PaddleOCR, draw_ocr

    # Paddleocr supports Chinese, English, French, German, Korean and Japanese
    # You can set the parameter `lang` as `ch`, `en`, `french`, `german`, `korean`, `japan`
    # to switch the language model in order
    ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
    img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
    result = ocr.ocr(img_path, cls=True)
    for idx in range(len(result)):
        res = result[idx]
        for line in res:
            print(line)

    # draw result
    from PIL import Image
    result = result[0]
    image = Image.open(img_path).convert('RGB')
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
    im_show = Image.fromarray(im_show)
    im_show.save('result.jpg')
    ```

    输出示例:

    ```python linenums="1"
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
    ......
    ```

=== "文本检测+文本识别"

    ```python linenums="1"
    from paddleocr import PaddleOCR,draw_ocr
    ocr = PaddleOCR(lang='en') # need to run only once to download and load model into memory
    img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
    result = ocr.ocr(img_path, cls=False)
    for idx in range(len(result)):
        res = result[idx]
        for line in res:
            print(line)

    # draw result
    from PIL import Image
    result = result[0]
    image = Image.open(img_path).convert('RGB')
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
    im_show = Image.fromarray(im_show)
    im_show.save('result.jpg')
    ```

    输出示例:

    ```python linenums="1"
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
    ......
    ```

=== "方向分类+文本识别"

    ```python linenums="1"
    from paddleocr import PaddleOCR
    ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to load model into memory
    img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
    result = ocr.ocr(img_path, det=False, cls=True)
    for idx in range(len(result)):
        res = result[idx]
        for line in res:
            print(line)
    ```

    输出示例:

    ```python linenums="1"
    ['PAIN', 0.990372]
    ```

=== "只有文本检测"

    ```python linenums="1"
    from paddleocr import PaddleOCR,draw_ocr
    ocr = PaddleOCR() # need to run only once to download and load model into memory
    img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
    result = ocr.ocr(img_path,rec=False)
    for idx in range(len(result)):
        res = result[idx]
        for line in res:
            print(line)

    # draw result
    from PIL import Image
    result = result[0]
    image = Image.open(img_path).convert('RGB')
    im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
    im_show = Image.fromarray(im_show)
    im_show.save('result.jpg')
    ```

    输出示例:

    ```python linenums="1"
    [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
    [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
    [[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
    ......
    ```

=== "只有识别"

    ```python linenums="1"
    from paddleocr import PaddleOCR
    ocr = PaddleOCR(lang='en') # need to run only once to load model into memory
    img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
    result = ocr.ocr(img_path, det=False, cls=False)
    for idx in range(len(result)):
        res = result[idx]
        for line in res:
            print(line)
    ```

    输出示例:

    ```python linenums="1"
    ['PAIN', 0.990372]
    ```

=== "只有方向分类"

    ```python linenums="1"
    from paddleocr import PaddleOCR
    ocr = PaddleOCR(use_angle_cls=True) # need to run only once to load model into memory
    img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
    result = ocr.ocr(img_path, det=False, rec=False, cls=True)
    for idx in range(len(result)):
        res = result[idx]
        for line in res:
            print(line)
    ```

    输出示例:

    ```python linenums="1"
    ['0', 0.99999964]
    ```

### 命令行使用

显示帮助信息

```bash linenums="1"
paddleocr -h
```

=== "文本检测+方向分类+文本识别"

    ```bash linenums="1"
    paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --use_angle_cls true --lang en
    ```

    输出示例:

    ```python linenums="1"
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
    ......
    ```

    还支持 PDF 文件,可以通过设置 `page_num` 参数来推理前几页,默认值为 0,这意味着处理全部页面。

    ```bash linenums="1"
    paddleocr --image_dir ./xxx.pdf --use_angle_cls true --use_gpu false --page_num 2
    ```

=== "文本检测+文本识别"

    ```bash linenums="1"
    paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --lang en
    ```

    输出示例:

    ```python linenums="1"
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
    ......
    ```

=== "方向分类+文本识别"

    ```bash linenums="1"
    paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --lang en
    ```

    输出示例:

    ```python linenums="1"
    ['PAIN', 0.990372]
    ```

=== "只有文本检测"

    ```bash linenums="1"
    paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --rec false
    ```

    输出示例:

    ```python linenums="1"
    [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
    [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
    [[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
    ......
    ```

=== "只有识别"

    ```bash linenums="1"
    paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false --lang en
    ```

    输出示例:

    ```python linenums="1"
    ['PAIN', 0.990372]
    ```

=== "只有方向分类"

    ```bash linenums="1"
    paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --rec false
    ```

    输出示例:

    ```python linenums="1"
    ['0', 0.99999964]
    ```

更加详细的文档,请移步:[PaddleOCR快速开始](./ppocr/quick_start.md)

### 在线demo

- PP-OCRv4 在线体验地址:<https://aistudio.baidu.com/community/app/91660>
- SLANet 在线体验地址:<https://aistudio.baidu.com/community/app/91661>
- PP-ChatOCRv3-doc 在线体验地址:<https://aistudio.baidu.com/community/app/182491>
- PP-ChatOCRv2-common 在线体验地址:<https://aistudio.baidu.com/community/app/91662>
- PP-ChatOCRv2-doc 在线体验地址:<https://aistudio.baidu.com/community/app/70303>

### 相关文档

- [一键调用48个PaddleOCR核心模型](https://paddlepaddle.github.io/PaddleOCR/latest/paddlex/quick_start.html)
- 一行命令快速使用:[文本检测识别(中英文/多语言)](https://paddlepaddle.github.io/PaddleOCR/latest/ppocr/overview.html)
- 一行命令快速使用:[文档分析](https://paddlepaddle.github.io/PaddleOCR/latest/ppstructure/overview.html)