# PaddleOCR Quick Start

[PaddleOCR Quick Start](#paddleocr-quick-start)

* [1. Install PaddleOCR Whl Package](#1-install-paddleocr-whl-package)
* [2. Easy-to-Use](#2-easy-to-use)
  + [2.1 Use by command line](#21-use-by-command-line)
    - [2.1.1 Chinese and English Model](#211-english-and-chinese-model)
    - [2.1.2 Multi-language Model](#212-multi-language-model)
    - [2.1.3 LayoutParser](#213-layoutparser)
  + [2.2 Use by Code](#22-use-by-code)
    - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese---english-model-and-multilingual-model)
    - [2.2.2 LayoutParser](#222-layoutparser)



<a name="1-install-paddleocr-whl-package"></a>

## 1. Install PaddleOCR Whl Package

```bash
pip install "paddleocr>=2.0.1" # Version 2.0.1 or later is recommended
```

- **For Windows users:** If you get the error `OSError: [WinError 126] The specified module could not be found` when installing Shapely on Windows, try downloading the Shapely whl file from [here](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).

  Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)

- **For layout analysis users:** Run the following command to install **Layout-Parser**:

  ```bash
  pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
  ```

<a name="2-easy-to-use"></a>

## 2. Easy-to-Use

<a name="21-use-by-command-line"></a>

### 2.1 Use by command line

PaddleOCR provides a set of test images. Click [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip) to download them, then switch to the extracted directory in a terminal:

```bash
cd /path/to/ppocr_img
```

If you do not use the provided test images, replace the value of the `--image_dir` parameter in the commands below with the path to your own image.

<a name="211-english-and-chinese-model"></a>

#### 2.1.1 Chinese and English Model

* Detection, direction classification and recognition: set the direction classifier parameter `--use_angle_cls true` to recognize vertical text.

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en
  ```

  The output is a list; each item contains a bounding box, the text, and the recognition confidence:

  ```bash
  [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
  [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
  [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
  ......
  ```

* Only detection: set `--rec` to `false`

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --rec false
  ```

  The output is a list; each item contains only a bounding box:

  ```bash
  [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
  [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
  [[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
  ......
  ```

* Only recognition: set `--det` to `false`

  ```bash
  paddleocr --image_dir ./imgs_words_en/word_10.png --det false --lang en
  ```

  The output is a list; each item contains the text and the recognition confidence:

  ```bash
  ['PAIN', 0.990372]
  ```
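
The three output shapes shown above can be told apart programmatically. The following is a minimal sketch, assuming the items have exactly the shapes printed in the examples; `describe_item` is a hypothetical helper name and is not part of PaddleOCR.

```python
def describe_item(item):
    """Return a (kind, payload) pair for one output item.

    Hypothetical helper: it only inspects the shapes shown in the
    examples above and does not call PaddleOCR itself.
    """
    # Recognition only: [text, confidence]
    if len(item) == 2 and isinstance(item[0], str):
        text, score = item
        return "rec", {"text": text, "score": score}
    # Detection + recognition: [box, [text, confidence]]
    if len(item) == 2 and isinstance(item[1][0], str):
        box, (text, score) = item
        return "det+rec", {"box": box, "text": text, "score": score}
    # Detection only: four [x, y] corner points
    return "det", {"box": item}


samples = [
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]],
     ['ACKNOWLEDGEMENTS', 0.99283075]],
    [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]],
    ['PAIN', 0.990372],
]
for sample in samples:
    kind, payload = describe_item(sample)
    print(kind, sorted(payload))
```

Checking the shape before unpacking keeps downstream code from silently mixing up a four-point box with a `[text, confidence]` pair.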

More usage of the whl package is described in [whl package](./whl_en.md).
<a name="212-multi-language-model"></a>

#### 2.1.2 Multi-language Model

PaddleOCR currently supports 80 languages; switch between them with the `--lang` parameter.

``` bash
paddleocr --image_dir ./doc/imgs_en/254.jpg --lang=en
```

<div align="center">
    <img src="../imgs_en/254.jpg" width="300" height="600">
    <img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600">
</div>

The result is a list; each item contains a text box, the text, and the recognition confidence:

```text
[('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]]
[('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]]
[('Montpelier Vermont', 0.97389287), [[69.0, 132.0], [501.0, 126.0], [501.0, 158.0], [70.0, 164.0]]]
[('8022256183', 0.99810505), [[71.0, 175.0], [363.0, 170.0], [364.0, 202.0], [72.0, 207.0]]]
[('REG 07-24-201706:59 PM', 0.93537045), [[73.0, 299.0], [653.0, 281.0], [654.0, 318.0], [74.0, 336.0]]]
[('045555', 0.99346405), [[509.0, 331.0], [651.0, 325.0], [652.0, 356.0], [511.0, 362.0]]]
[('CT1', 0.9988654), [[535.0, 367.0], [654.0, 367.0], [654.0, 406.0], [535.0, 406.0]]]
......
```
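
Note that in the listing above each item puts the `(text, confidence)` pair first and the four-point box second. As a rough sketch of how such items might be post-processed, the snippet below converts each box to an axis-aligned rectangle and sorts the lines top to bottom. The helper names `to_rect` and `sort_top_to_bottom` are hypothetical, and the item layout is assumed to match the listing exactly.

```python
def to_rect(box):
    """Axis-aligned (left, top, right, bottom) rectangle around a 4-point box."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)


def sort_top_to_bottom(items):
    """Sort [(text, confidence), box] items by top edge, then by left edge."""
    def key(item):
        left, top, _right, _bottom = to_rect(item[1])
        return top, left
    return sorted(items, key=key)


# Two items copied from the listing above, deliberately out of order.
items = [
    [('107 State Street', 0.96311164),
     [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]],
    [('PHO CAPITAL', 0.95723116),
     [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]],
]
for (text, score), box in sort_top_to_bottom(items):
    print(text, score, to_rect(box))
```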

Commonly used language abbreviations include:

| Language            | Abbreviation |      | Language | Abbreviation |      | Language | Abbreviation |
| ------------------- | ------------ | ---- | -------- | ------------ | ---- | -------- | ------------ |
| Chinese & English   | ch           |      | French   | fr           |      | Japanese | japan        |
| English             | en           |      | German   | german       |      | Korean   | korean       |
| Chinese Traditional | chinese_cht  |      | Italian  | it           |      | Russian  | ru           |

A list of all languages and their corresponding abbreviations can be found in the [Multi-Language Model Tutorial](./multi_languages_en.md).
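
If language names come from user input, a small lookup can translate them into the abbreviations expected by `--lang` (or by the `lang` argument in code). This sketch covers only the nine entries in the table above; `LANG_CODES` and `lang_code` are hypothetical names, not part of PaddleOCR.

```python
# Abbreviations copied from the table above; the full list is in the
# multi-language tutorial.
LANG_CODES = {
    "chinese & english": "ch",
    "english": "en",
    "chinese traditional": "chinese_cht",
    "french": "fr",
    "german": "german",
    "italian": "it",
    "japanese": "japan",
    "korean": "korean",
    "russian": "ru",
}


def lang_code(name):
    """Return the abbreviation for a language name (case-insensitive)."""
    key = name.strip().lower()
    if key not in LANG_CODES:
        raise ValueError(f"no abbreviation known for language: {name!r}")
    return LANG_CODES[key]


print(lang_code("French"))
```
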
<a name="213-layoutparser"></a>

#### 2.1.3 LayoutParser

To use the layout analysis function of PaddleOCR, specify `--type=structure`:

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

- **Results Format**

  PP-Structure returns a list of dicts, for example:

  ```shell
  [
    {   'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                  [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
    }
  ]
  ```

  Each field of the dict is described below:

  | Parameter | Description                                                  |
  | --------- | ------------------------------------------------------------ |
  | type      | Type of image area                                           |
  | bbox      | Coordinates of the image area in the original image, as [top-left x, top-left y, bottom-right x, bottom-right y] |
  | res       | OCR or table recognition result of the image area.<br> Table: HTML string of the table; <br> OCR: A tuple containing the detection coordinates and recognition results of each single line of text |

- **Parameter Description:**

  | Parameter       | Description                                                  | Default value                                |
  | --------------- | ------------------------------------------------------------ | -------------------------------------------- |
  | output          | Path where Excel files and recognition results are saved     | ./output/table                               |
  | table_max_len   | Length to which the long side of the image is resized for the table structure model | 488                   |
  | table_model_dir | Inference model path of the table structure model            | None                                         |
  | table_char_type | Dictionary path of the table structure model                 | ../ppocr/utils/dict/table_structure_dict.txt |
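
To make the result format described above concrete, the sketch below walks one region dict and pulls out its size and text lines. It assumes the field layout given in the results table (`res` is an HTML string for tables and a `(boxes, [(text, confidence), ...])` tuple for OCR regions); the exact layout may differ between PaddleOCR versions, and `summarize_region` is a hypothetical helper.

```python
def summarize_region(region):
    """Summarize one PP-Structure region dict (hypothetical helper)."""
    left, top, right, bottom = region["bbox"]
    summary = {
        "type": region["type"],
        "width": right - left,
        "height": bottom - top,
    }
    res = region["res"]
    if isinstance(res, str):
        # Table regions: `res` is assumed to be the HTML string of the table.
        summary["html"] = res
    else:
        # OCR regions: `res` is assumed to be (boxes, [(text, confidence), ...]).
        _boxes, texts = res
        summary["lines"] = [text for text, _score in texts]
    return summary


# Sample copied from the results example above.
result = [
    {
        "type": "Text",
        "bbox": [34, 432, 345, 462],
        "res": (
            [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
             [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
            [("Tigure-6. The performance of CNN and IPT models using difforen", 0.90060663),
             ("Tent  ", 0.465441)],
        ),
    }
]
for region in result:
    print(summarize_region(region))
```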

<a name="22-use-by-code"></a>

### 2.2 Use by Code
<a name="221-chinese---english-model-and-multilingual-model"></a>

#### 2.2.1 Chinese & English Model and Multilingual Model

* Detection, angle classification and recognition:

```python
from paddleocr import PaddleOCR, draw_ocr

# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the parameter `lang` to `ch`, `en`, `fr`, `german`, `korean` or `japan`
# to switch to the corresponding language model.
ocr = PaddleOCR(use_angle_cls=True, lang='en')  # only needs to run once to download and load the model into memory
img_path = './imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)


# draw result
from PIL import Image

image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the text, and the recognition confidence:

```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```
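
A common next step is to keep only the lines recognized with high confidence. The sketch below assumes `result` has exactly the shape printed above (`[box, [text, confidence]]` per item); `confident_text` and its `min_score` threshold are hypothetical and should be tuned per use case.

```python
def confident_text(result, min_score=0.9):
    """Return the recognized text of every item whose confidence is at least min_score."""
    return [text for _box, (text, score) in result if score >= min_score]


# Items copied from the output above.
result = [
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]],
     ['ACKNOWLEDGEMENTS', 0.99283075]],
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]],
     ['We would like to thank all the designers and', 0.9357758]],
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]],
     ['contributors whohave been involved in the', 0.9592447]],
]
for text in confident_text(result, min_score=0.95):
    print(text)
```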

Visualization of the results:

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>
<a name="222-layoutparser"></a>

#### 2.2.2 LayoutParser

```python
import os
import cv2
from paddleocr import PPStructure,draw_structure_result,save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = './table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)

from PIL import Image

font_path = './fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```