# PaddleOCR Quick Start

* [1. Installation](#1installation)
  + [1.1 Install PaddlePaddle](#11-install-paddlepaddle)
  + [1.2 Install PaddleOCR Whl Package](#12-install-paddleocr-whl-package)
* [2. Easy-to-Use](#2-easy-to-use)
  + [2.1 Use by Command Line](#21-use-by-command-line)
    - [2.1.1 Chinese and English Model](#211-english-and-chinese-model)
    - [2.1.2 Multi-language Model](#212-multi-language-model)
    - [2.1.3 Layout Analysis](#213-layoutAnalysis)
  + [2.2 Use by Code](#22-use-by-code)
    - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese---english-model-and-multilingual-model)
    - [2.2.2 Layout Analysis](#222-layoutAnalysis)
* [3. Summary](#3)

<a name="1installation"></a>

## 1. Installation

<a name="11-install-paddlepaddle"></a>

### 1.1 Install PaddlePaddle

> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md).

- If you have CUDA 9 or CUDA 10 installed on your machine, run the following command to install the GPU version:

  ```bash
  python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
  ```

- If your machine has no available GPU, run the following command to install the CPU version:

  ```bash
  python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
  ```

For other software version requirements, refer to the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).
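After installing, it can be worth confirming that the package is importable before moving on. The following is a minimal sketch, not an official procedure: the helper name `paddle_is_installed` is illustrative only, and `paddle.utils.run_check()` (PaddlePaddle's built-in self-check) is called only when the import succeeds.

```python
import importlib.util


def paddle_is_installed() -> bool:
    """Return True if the `paddle` package can be found in this environment."""
    return importlib.util.find_spec("paddle") is not None


if paddle_is_installed():
    import paddle

    print("PaddlePaddle version:", paddle.__version__)
    paddle.utils.run_check()  # PaddlePaddle's built-in installation check
else:
    print("PaddlePaddle is not installed in this environment.")
```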

<a name="12-install-paddleocr-whl-package"></a>

### 1.2 Install PaddleOCR Whl Package

```bash
pip install "paddleocr>=2.0.1" # version 2.0.1 or later is recommended
```

- **For Windows users:** If you get the error `OSError: [WinError 126] The specified module could not be found` when installing Shapely on Windows, try downloading the Shapely whl file from [here](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely) and installing it.

  Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)

- **For layout analysis users**, run the following command to install **Layout-Parser**:

  ```bash
  pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
  ```

<a name="2-easy-to-use"></a>

## 2. Easy-to-Use

<a name="21-use-by-command-line"></a>

### 2.1 Use by Command Line

PaddleOCR provides a set of test images. Click [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip) to download them, then switch to the extracted directory in the terminal:

```bash
cd /path/to/ppocr_img
```

If you do not use the provided test images, replace the value of the `--image_dir` parameter in the commands below with the path to your own image.

<a name="211-english-and-chinese-model"></a>

#### 2.1.1 Chinese and English Model

* Detection, direction classification and recognition: set the parameter `--use_gpu false` to disable the GPU

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en --use_gpu false
  ```

  The output is a list; each item contains a bounding box, the recognized text and the recognition confidence:

  ```bash
  [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
  [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
  [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
  ......
  ```

* Only detection: set `--rec` to `false`

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --rec false
  ```

  The output is a list; each item contains only a bounding box:

  ```bash
  [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
  [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
  [[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
  ......
  ```

* Only recognition: set `--det` to `false`

  ```bash
  paddleocr --image_dir ./imgs_words_en/word_10.png --det false --lang en
  ```

  The output is a list; each item contains the recognized text and the recognition confidence:

  ```bash
  ['PAIN', 0.990372]
  ```
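The three output shapes above (full pipeline, detection only, recognition only) can be normalized into one record type when post-processing printed results. The following is only a sketch under that assumption: `OcrLine` and `parse_item` are hypothetical names, not part of the paddleocr package, and the sample values are copied from the outputs shown above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OcrLine:
    box: Optional[List[List[float]]]  # four [x, y] corners; None for recognition-only output
    text: Optional[str]               # None for detection-only output
    score: Optional[float]            # None for detection-only output


def parse_item(item) -> OcrLine:
    """Normalize one result item (any of the three shapes above) into an OcrLine."""
    # Recognition only: ['PAIN', 0.990372]
    if isinstance(item[0], str):
        return OcrLine(box=None, text=item[0], score=float(item[1]))
    # Detection only: [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
    if isinstance(item[0][0], (int, float)):
        return OcrLine(box=[list(p) for p in item], text=None, score=None)
    # Full pipeline: [box, [text, score]]
    box, (text, score) = item
    return OcrLine(box=[list(p) for p in box], text=text, score=float(score))


full = [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
det = [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
rec = ['PAIN', 0.990372]

for item in (full, det, rec):
    print(parse_item(item))
```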

By default, paddleocr uses the 2.1 model (`--version PP-OCRv2`). To use the 2.0 model, specify the parameter `--version PP-OCR`. More whl package usage can be found in [whl package](./whl_en.md).

<a name="212-multi-language-model"></a>

#### 2.1.2 Multi-language Model

PaddleOCR currently supports 80 languages, which can be switched by modifying the `--lang` parameter.

``` bash
paddleocr --image_dir ./doc/imgs_en/254.jpg --lang=en
```

<div align="center">
    <img src="../imgs_en/254.jpg" width="300" height="600">
    <img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600">
</div>

The result is a list; each item contains the recognized text with its confidence, followed by the text box:

```text
[('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]]
[('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]]
[('Montpelier Vermont', 0.97389287), [[69.0, 132.0], [501.0, 126.0], [501.0, 158.0], [70.0, 164.0]]]
[('8022256183', 0.99810505), [[71.0, 175.0], [363.0, 170.0], [364.0, 202.0], [72.0, 207.0]]]
[('REG 07-24-201706:59 PM', 0.93537045), [[73.0, 299.0], [653.0, 281.0], [654.0, 318.0], [74.0, 336.0]]]
[('045555', 0.99346405), [[509.0, 331.0], [651.0, 325.0], [652.0, 356.0], [511.0, 362.0]]]
[('CT1', 0.9988654), [[535.0, 367.0], [654.0, 367.0], [654.0, 406.0], [535.0, 406.0]]]
......
```
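A common follow-up step is to turn such a list back into plain text in reading order. The sketch below assumes the item layout printed above (`(text, confidence)` first, then the four-point box); the function name `to_plain_text` and the 10-pixel row tolerance are illustrative assumptions, not PaddleOCR features.

```python
def to_plain_text(items, line_tolerance=10.0):
    """Join recognized texts top-to-bottom, then left-to-right within a row."""

    def top_left(item):
        (_text, _score), box = item
        return min(p[1] for p in box), min(p[0] for p in box)

    rows = []  # each row: [row_top_y, [(x, text), ...]]
    for item in sorted(items, key=top_left):
        y, x = top_left(item)
        text = item[0][0]
        if rows and abs(y - rows[-1][0]) <= line_tolerance:
            rows[-1][1].append((x, text))  # same visual row as the previous item
        else:
            rows.append([y, [(x, text)]])  # start a new row
    return "\n".join(" ".join(t for _x, t in sorted(cells)) for _y, cells in rows)


# First three items of the output above, deliberately out of order.
sample = [
    [('Montpelier Vermont', 0.97389287), [[69.0, 132.0], [501.0, 126.0], [501.0, 158.0], [70.0, 164.0]]],
    [('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]],
    [('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]],
]
print(to_plain_text(sample))
```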

Commonly used language abbreviations include:

| Language            | Abbreviation |      | Language | Abbreviation |      | Language | Abbreviation |
| ------------------- | ------------ | ---- | -------- | ------------ | ---- | -------- | ------------ |
| Chinese & English   | ch           |      | French   | fr           |      | Japanese | japan        |
| English             | en           |      | German   | german       |      | Korean   | korean       |
| Chinese Traditional | chinese_cht  |      | Italian  | it           |      | Russian  | ru           |

A list of all languages and their corresponding abbreviations can be found in [Multi-Language Model Tutorial](./multi_languages_en.md)
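When scripting the command line, the abbreviations can be kept in a small lookup table. This is a sketch only: `LANG_CODES` and `build_command` are hypothetical helpers, the image path is a placeholder, and only the nine abbreviations from the table above are included.

```python
import shlex

# Abbreviations copied from the table above; the dict name is illustrative.
LANG_CODES = {
    "Chinese & English": "ch",
    "English": "en",
    "Chinese Traditional": "chinese_cht",
    "French": "fr",
    "German": "german",
    "Italian": "it",
    "Japanese": "japan",
    "Korean": "korean",
    "Russian": "ru",
}


def build_command(image_path: str, language: str) -> str:
    """Return a `paddleocr` command line for the given image and language name."""
    try:
        code = LANG_CODES[language]
    except KeyError:
        raise ValueError(f"unknown language: {language!r}") from None
    args = ["paddleocr", "--image_dir", image_path, "--lang", code]
    return " ".join(shlex.quote(a) for a in args)


# "./my_image.jpg" is a placeholder path, not a file shipped with PaddleOCR.
print(build_command("./my_image.jpg", "German"))
# prints: paddleocr --image_dir ./my_image.jpg --lang german
```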
<a name="213-layoutAnalysis"></a>

#### 2.1.3 Layout Analysis

Layout analysis divides a document into 5 types of regions: text, title, list, picture and table. For the first three types, the OCR model is applied directly to detect and recognize the text in the region, and the results are saved as txt. For table regions, the table structuring process converts the table image into an Excel file with the same table style. Picture regions are cropped out individually as images.

To use the layout analysis function of PaddleOCR, specify `--type=structure`:

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

- **Results Format**

  PP-Structure returns a list of dicts. An example is as follows:

  ```shell
  [
    {   'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                  [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
    }
  ]
  ```

  The fields of each dict are described below:

  | Parameter | Description                                                  |
  | --------- | ------------------------------------------------------------ |
  | type      | Type of image area                                           |
  | bbox      | The coordinates of the image area in the original image, in the order [upper-left x, upper-left y, lower-right x, lower-right y] |
  | res       | OCR or table recognition result of the image area.<br> Table: HTML string of the table; <br> OCR: A tuple containing the detection coordinates and recognition results of each single line of text |

- **Parameter Description:**

  | Parameter       | Description                                                  | Default value                                |
  | --------------- | ------------------------------------------------------------ | -------------------------------------------- |
  | output          | The path where excel and recognition results are saved       | ./output/table                               |
  | table_max_len   | Length to which the long side of the image is resized for the table structure model | 488                                          |
  | table_model_dir | Inference model path of the table structure model            | None                                         |
  | table_char_type | Dictionary path of the table structure model                 | ../ppocr/utils/dict/table_structure_dict.txt |
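The `table_max_len` description can be read as an aspect-ratio-preserving resize in which the longer image side becomes `table_max_len`. The sketch below illustrates that reading only: the function name `resize_long_side` is hypothetical, and PaddleOCR's actual preprocessing (rounding, padding) may differ.

```python
from typing import Tuple


def resize_long_side(height: int, width: int, max_len: int = 488) -> Tuple[int, int]:
    """Scale (height, width) so the longer side equals max_len, keeping the aspect ratio."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    ratio = max_len / max(height, width)
    # Assumption: simple rounding; the real pipeline may round or pad differently.
    return max(1, round(height * ratio)), max(1, round(width * ratio))


print(resize_long_side(976, 488))  # prints: (488, 244)
```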

<a name="22-use-by-code"></a>

### 2.2 Use by Code
<a name="221-chinese---english-model-and-multilingual-model"></a>

#### 2.2.1 Chinese & English Model and Multilingual Model

* Detection, angle classification and recognition:

```python
from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the parameter `lang` to `ch`, `en`, `fr`, `german`, `korean` or `japan`
# respectively to switch the language model.
ocr = PaddleOCR(use_angle_cls=True, lang='en')  # only needs to run once to download and load the model into memory
img_path = './imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# draw result
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the recognized text and the recognition confidence:

```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```
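Downstream code often keeps only confident lines and works with axis-aligned rectangles rather than four-point boxes. The sketch below assumes `result` has the flat shape printed above; `axis_aligned_box`, `confident_lines` and the 0.95 threshold are illustrative choices, not part of paddleocr.

```python
def axis_aligned_box(quad):
    """Return (x_min, y_min, x_max, y_max) enclosing a four-point box."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)


def confident_lines(result, min_score=0.95):
    """Keep (text, score, rectangle) for lines whose confidence is at least min_score."""
    return [
        (text, score, axis_aligned_box(box))
        for box, (text, score) in result
        if score >= min_score
    ]


# The three lines shown in the output above.
sample = [
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]],
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]],
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]],
]

for text, score, rect in confident_lines(sample):
    print(f"{score:.3f} {rect} {text}")
```

With the threshold at 0.95, the second line (confidence 0.9357758) is dropped and the other two are kept.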

Visualization of results:

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>

<a name="222-layoutAnalysis"></a>

#### 2.2.2 Layout Analysis

```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = './table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# save the results to save_folder, named after the image file (without extension)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')  # drop the image data before printing
    print(line)

# draw result
font_path = './fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
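The printed dicts can then be reduced to plain text per region. The sketch below relies only on the result format documented in 2.1.3 (`type`, `bbox`, `res`, with `res` being an HTML string for tables and a `(boxes, recognition results)` tuple for OCR regions); `region_text` and `summarize` are hypothetical helper names, and the sample is the example dict shown there.

```python
def region_text(region: dict) -> str:
    """Return the HTML string for a table region, or the joined text lines otherwise."""
    res = region["res"]
    if isinstance(res, str):  # table regions: HTML string of the table
        return res
    _boxes, rec = res         # OCR regions: (detection boxes, [(text, score), ...])
    return " ".join(text.strip() for text, _score in rec)


def summarize(result):
    """Return one (type, bbox, text) tuple per region."""
    return [(r["type"], tuple(r["bbox"]), region_text(r)) for r in result]


# Example dict copied from the results format shown in 2.1.3.
sample_result = [
    {
        'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
                 [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663),
                 ('Tent  ', 0.465441)]),
    }
]

for kind, bbox, text in summarize(sample_result):
    print(kind, bbox, text)
```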

<a name="3"></a>

## 3. Summary

In this section, you have learned how to use the PaddleOCR whl package and obtain results.

PaddleOCR is a rich and practical OCR tool library that covers the whole process of data, model training, compression and inference deployment. The [next section](./paddleOCR_overview_en.md) first gives an overview of PaddleOCR and then clones the PaddleOCR project to start using it.