# PaddleOCR Quick Start

**Note:** This tutorial mainly covers the PP-OCR series models. For document analysis functions, please refer to [PP-Structure Quick Start](../../ppstructure/docs/quickstart_en.md).

- [1. Installation](#1-installation)
    - [1.1 Install PaddlePaddle](#11-install-paddlepaddle)
    - [1.2 Install PaddleOCR Whl Package](#12-install-paddleocr-whl-package)
- [2. Easy-to-Use](#2-easy-to-use)
    - [2.1 Use by Command Line](#21-use-by-command-line)
      - [2.1.1 Chinese and English Model](#211-chinese-and-english-model)
      - [2.1.2 Multi-language Model](#212-multi-language-model)
      - [2.1.3 Layout Analysis](#213-layout-analysis)
    - [2.2 Use by Code](#22-use-by-code)
      - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese--english-model-and-multilingual-model)
      - [2.2.2 Layout Analysis](#222-layout-analysis)
- [3. Summary](#3-summary)



<a name="1nstallation"></a>

## 1. Installation

<a name="11-install-paddlepaddle"></a>

### 1.1 Install PaddlePaddle

> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md).

- If you have CUDA 9 or CUDA 10 installed on your machine, please run the following command to install the GPU version

  ```bash
  python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
  ```

- If you have no available GPU on your machine, please run the following command to install the CPU version

  ```bash
  python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
  ```

For other software version requirements, please follow the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).

<a name="12-install-paddleocr-whl-package"></a>

### 1.2 Install PaddleOCR Whl Package

```bash
pip install "paddleocr>=2.0.1" # version 2.0.1 or later is recommended
```

- **For Windows users:** If you get the error `OSError: [WinError 126] The specified module could not be found` when installing shapely on Windows, please try downloading the Shapely whl file [here](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).

  Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)

- **For layout analysis users**, run the following command to install **Layout-Parser**

  ```bash
  pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
  ```

<a name="2-easy-to-use"></a>

## 2. Easy-to-Use

<a name="21-use-by-command-line"></a>

### 2.1 Use by Command Line

PaddleOCR provides a set of test images. Click [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip) to download them, then switch to the extracted directory in the terminal

```bash
cd /path/to/ppocr_img
```

If you do not use the provided test images, replace the `--image_dir` parameter in the following commands with the path to your own test image

**Note**: The whl package uses the `PP-OCRv3` model by default, and its recognition model takes an input shape of `3,48,320`. If you use the recognition function, add the parameter `--rec_image_shape 3,48,320`. If you are not using the default `PP-OCRv3` model, you do not need to set this parameter.
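
The rule in the note can be written down as a small argument builder. This is only a sketch: `build_paddleocr_args` is a hypothetical helper, not part of the whl package, and it only assembles flags that appear in this tutorial (`--image_dir`, `--lang`, `--version`, `--rec`, `--rec_image_shape`).

```python
def build_paddleocr_args(image_dir, lang="en", version="PP-OCRv3", rec=True):
    """Assemble a `paddleocr` command line (hypothetical helper, for illustration only)."""
    args = ["paddleocr", "--image_dir", image_dir, "--lang", lang, "--version", version]
    if not rec:
        # Detection only: recognition is off, so no recognition input shape is needed.
        args += ["--rec", "false"]
    elif version == "PP-OCRv3":
        # The default PP-OCRv3 recognition model uses a 3,48,320 input shape.
        args += ["--rec_image_shape", "3,48,320"]
    return args


print(" ".join(build_paddleocr_args("./imgs_en/img_12.jpg")))
```

The resulting list can be passed to `subprocess.run` or joined into a shell command; the shape flag is added only when recognition runs with the default model.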

<a name="211-english-and-chinese-model"></a>

#### 2.1.1 Chinese and English Model

* Detection, direction classification and recognition: set the parameter `--use_gpu false` to disable the GPU device

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en --use_gpu false --rec_image_shape 3,48,320
  ```

  The output is a list; each item contains a bounding box, the text and the recognition confidence

  ```bash
  [[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]], ('ACKNOWLEDGEMENTS', 0.9971134662628174)]
  [[[403.0, 346.0], [1204.0, 348.0], [1204.0, 384.0], [402.0, 383.0]], ('We would like to thank all the designers and', 0.9761400818824768)]
  [[[403.0, 396.0], [1204.0, 398.0], [1204.0, 434.0], [402.0, 433.0]], ('contributors who have been involved in the', 0.9791957139968872)]
  ......
  ```

* Only detection: set `--rec` to `false`

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --rec false
  ```

  The output is a list; each item contains only a bounding box

  ```bash
  [[397.0, 802.0], [1092.0, 802.0], [1092.0, 841.0], [397.0, 841.0]]
  [[397.0, 750.0], [1211.0, 750.0], [1211.0, 789.0], [397.0, 789.0]]
  [[397.0, 702.0], [1209.0, 698.0], [1209.0, 734.0], [397.0, 738.0]]
  ......
  ```

* Only recognition: set `--det` to `false`

  ```bash
  paddleocr --image_dir ./imgs_words_en/word_10.png --det false --lang en --rec_image_shape 3,48,320
  ```

  The output is a list; each item contains the text and the recognition confidence

  ```bash
  ['PAIN', 0.9934559464454651]
  ```
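
The three modes above print items of different shapes. The sketch below, using sample items copied from those outputs, shows one way to normalize them into `(box, text, score)`. `describe` is a hypothetical helper, and the shape checks assume the items look exactly like the printed examples.

```python
# Sample items copied from the three outputs above.
det_rec_item = [[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]],
                ('ACKNOWLEDGEMENTS', 0.9971134662628174)]
det_only_item = [[397.0, 802.0], [1092.0, 802.0], [1092.0, 841.0], [397.0, 841.0]]
rec_only_item = ['PAIN', 0.9934559464454651]


def describe(item):
    """Return (box, text, score); fields a mode does not produce are None."""
    first = item[0]
    if isinstance(first, str):
        # Recognition only: [text, score]
        return None, item[0], item[1]
    if isinstance(first[0], (int, float)):
        # Detection only: the item itself is a list of four [x, y] corner points
        return item, None, None
    # Detection + recognition: [box, (text, score)]
    box, (text, score) = item
    return box, text, score


for sample in (det_rec_item, det_only_item, rec_only_item):
    print(describe(sample))
```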

If you need to use the 2.0 model, specify the parameter `--version PP-OCR`. paddleocr uses the PP-OCRv3 model by default (`--version PP-OCRv3`). More whl package usage can be found in [whl package](./whl_en.md).

<a name="212-multi-language-model"></a>

#### 2.1.2 Multi-language Model

PaddleOCR currently supports 80 languages, which can be switched by modifying the `--lang` parameter. PP-OCRv3 currently provides only Chinese and English models; other multilingual models will be released gradually.

```bash
paddleocr --image_dir ./doc/imgs_en/254.jpg --lang=en --rec_image_shape 3,48,320
```

<div align="center">
    <img src="../imgs_en/254.jpg" width="300" height="600">
    <img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600">
</div>

The result is a list; each item contains a text box, the text and the recognition confidence

```text
[[[67.0, 51.0], [327.0, 46.0], [327.0, 74.0], [68.0, 80.0]], ('PHOCAPITAL', 0.9944712519645691)]
[[[72.0, 92.0], [453.0, 84.0], [454.0, 114.0], [73.0, 122.0]], ('107 State Street', 0.9744491577148438)]
[[[69.0, 135.0], [501.0, 125.0], [501.0, 156.0], [70.0, 165.0]], ('Montpelier Vermont', 0.9357033967971802)]
......
```

Commonly used language abbreviations include

| Language            | Abbreviation |      | Language | Abbreviation |      | Language | Abbreviation |
| ------------------- | ------------ | ---- | -------- | ------------ | ---- | -------- | ------------ |
| Chinese & English   | ch           |      | French   | fr           |      | Japanese | japan        |
| English             | en           |      | German   | german       |      | Korean   | korean       |
| Chinese Traditional | chinese_cht  |      | Italian  | it           |      | Russian  | ru           |
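
For scripting, the table above can be kept as a lookup from language name to `--lang` value. This is a minimal sketch: `LANG_ABBREVIATIONS` and `lang_flag` are hypothetical names, and the mapping holds only the nine entries listed in the table.

```python
# Language name -> value for the `--lang` parameter, copied from the table above.
LANG_ABBREVIATIONS = {
    "Chinese & English": "ch",
    "English": "en",
    "Chinese Traditional": "chinese_cht",
    "French": "fr",
    "German": "german",
    "Italian": "it",
    "Japanese": "japan",
    "Korean": "korean",
    "Russian": "ru",
}


def lang_flag(language):
    """Return the `--lang` value for a language name (hypothetical helper)."""
    try:
        return LANG_ABBREVIATIONS[language]
    except KeyError:
        raise ValueError(f"no abbreviation listed for {language!r}") from None


print(lang_flag("German"))  # german
```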

A list of all languages and their corresponding abbreviations can be found in the [Multi-Language Model Tutorial](./multi_languages_en.md).

<a name="213-layoutAnalysis"></a>

#### 2.1.3 Layout Analysis

Layout analysis divides a document into 5 types of regions: text, title, list, picture and table. For the first three types, the OCR model is applied directly to detect and recognize the text in the region, and the results are saved as txt. Table regions go through table structuring, which converts the table image into an Excel file with the same table style. Picture regions are cropped and saved as individual images.
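
The per-type handling described above amounts to a simple dispatch on the region type. The sketch below only illustrates that routing; `plan_region` is a hypothetical function, and the lowercase type names are taken from the prose, so the exact label strings the tool emits are an assumption.

```python
# Region types that are passed straight to the OCR model.
TEXT_LIKE = {"text", "title", "list"}


def plan_region(region_type):
    """Describe how a region of the given type is handled (illustrative only)."""
    kind = region_type.lower()
    if kind in TEXT_LIKE:
        return "run OCR and save the text to a txt file"
    if kind == "table":
        return "run table structuring and export an Excel file"
    if kind == "picture":
        return "crop the region and save it as an image"
    raise ValueError(f"unknown region type: {region_type!r}")


for name in ("Text", "Title", "List", "Table", "Picture"):
    print(name, "->", plan_region(name))
```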

To use the layout analysis function of PaddleOCR, you need to specify `--type=structure`

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

- **Results Format**

  PP-Structure returns a list of dicts, for example

  ```shell
  [
    {   'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                  [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
    }
  ]
  ```

  Each field in the dict is described below

  | Parameter | Description                                                  |
  | --------- | ------------------------------------------------------------ |
  | type      | Type of image area                                           |
  | bbox      | The coordinates of the image area in the original image, in the order [top-left x, top-left y, bottom-right x, bottom-right y] |
  | res       | OCR or table recognition result of the image area.<br> Table: HTML string of the table; <br> OCR: A tuple containing the detection coordinates and recognition results of each single line of text |

- **Parameter Description:**

  | Parameter            | Description                                                  | Default value                                |
  | -------------------- | ------------------------------------------------------------ | -------------------------------------------- |
  | output               | The path where Excel files and recognition results are saved | ./output/table                               |
  | table_max_len        | The length the long side of the image is resized to in the table structure model | 488                      |
  | table_model_dir      | Inference model path of the table structure model            | None                                         |
  | table_char_dict_path | Dict path of the table structure model                       | ../ppocr/utils/dict/table_structure_dict.txt |
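
To make the result format concrete, the sketch below reads the example dict shown under **Results Format**: it derives the region size from `bbox` and collects the recognized lines from `res`. `summarize` and the `min_score` threshold are hypothetical, and the tuple check assumes OCR results arrive as a tuple while table results arrive as an HTML string, as the field table states.

```python
# The example result from above, copied verbatim.
result = [
    {
        'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
                 [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663),
                 ('Tent  ', 0.465441)]),
    }
]


def summarize(region, min_score=0.5):
    """Summarize one region dict; min_score is an illustrative threshold."""
    left, top, right, bottom = region['bbox']
    summary = {'type': region['type'], 'width': right - left, 'height': bottom - top}
    if isinstance(region['res'], tuple):
        # OCR result: (detection coordinates, [(text, confidence), ...])
        _boxes, texts = region['res']
        summary['lines'] = [text for text, score in texts if score >= min_score]
    else:
        # Table result: an HTML string
        summary['html'] = region['res']
    return summary


print(summarize(result[0]))
```

With the sample data, the region is 311 pixels wide and 30 pixels high, and only the first text line passes the 0.5 threshold.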

<a name="22-use-by-code"></a>

### 2.2 Use by Code
<a name="221-chinese---english-model-and-multilingual-model"></a>

#### 2.2.1 Chinese & English Model and Multilingual Model

* Detection, angle classification and recognition:

```python
from paddleocr import PaddleOCR, draw_ocr

# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the parameter `lang` to `ch`, `en`, `fr`, `german`, `korean` or `japan`
# respectively to switch the language model.
ocr = PaddleOCR(use_angle_cls=True, lang='en')  # only needs to run once to download and load the model into memory
img_path = './imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)


# draw result
from PIL import Image

image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the text and the recognition confidence

```bash
[[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]], ('ACKNOWLEDGEMENTS', 0.9971134662628174)]
[[[403.0, 346.0], [1204.0, 348.0], [1204.0, 384.0], [402.0, 383.0]], ('We would like to thank all the designers and', 0.9761400818824768)]
[[[403.0, 396.0], [1204.0, 398.0], [1204.0, 434.0], [402.0, 433.0]], ('contributors who have been involved in the', 0.9791957139968872)]
......
```
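
Each bounding box is a quadrilateral given by four corner points. When an axis-aligned rectangle is easier to work with, it can be derived as in the sketch below, which also filters by confidence and sorts lines from top to bottom. `to_rect`, `lines_in_reading_order` and the `min_score` threshold are hypothetical, and the code assumes the flat result list printed above.

```python
# Two items copied from the output above, deliberately out of order.
result = [
    [[[403.0, 346.0], [1204.0, 348.0], [1204.0, 384.0], [402.0, 383.0]],
     ('We would like to thank all the designers and', 0.9761400818824768)],
    [[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]],
     ('ACKNOWLEDGEMENTS', 0.9971134662628174)],
]


def to_rect(box):
    """Convert four [x, y] corner points to (left, top, right, bottom)."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)


def lines_in_reading_order(items, min_score=0.9):
    """Keep confident lines and sort them by top edge, then left edge."""
    kept = [(to_rect(box), text) for box, (text, score) in items if score >= min_score]
    kept.sort(key=lambda entry: (entry[0][1], entry[0][0]))
    return kept


for rect, text in lines_in_reading_order(result):
    print(rect, text)
```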

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>
<a name="222-layoutAnalysis"></a>

#### 2.2.2 Layout Analysis

```python
import os
import cv2
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = './table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# save the Excel and recognition results under save_folder
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')  # remove the image entry so the printed dict stays readable
    print(line)

# draw result
from PIL import Image

font_path = './fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
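
The example above derives the output name with `os.path.basename(img_path).split('.')[0]`, which keeps only the text before the first dot. The sketch below compares it with `os.path.splitext`, which drops only the final extension; the second file name is a made-up example, and both helper names are hypothetical.

```python
import os


def stem_by_split(path):
    # Same expression as in the example above: text before the first dot.
    return os.path.basename(path).split('.')[0]


def stem_by_splitext(path):
    # Alternative: remove only the final extension.
    return os.path.splitext(os.path.basename(path))[0]


for path in ('./table/1.png', './table/report.v2.png'):
    print(path, stem_by_split(path), stem_by_splitext(path))
```

For `./table/1.png` both give `1`; they differ only when the file name itself contains extra dots.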

<a name="3"></a>

## 3. Summary

In this section, you learned how to use the PaddleOCR whl package and obtained recognition results.

PaddleOCR is a rich and practical OCR tool library that covers the whole process of data, model training, compression and inference deployment. The [next section](./paddleOCR_overview_en.md) first gives an overview of PaddleOCR, and then clones the PaddleOCR project to start the application journey of PaddleOCR.