# PaddleOCR Quick Start

- [PaddleOCR Quick Start](#paddleocr-quick-start)
  - [1. Install PaddleOCR Whl Package](#1-install-paddleocr-whl-package)
  - [2. Easy-to-Use](#2-easy-to-use)
    - [2.1 Use by Command Line](#21-use-by-command-line)
      - [2.1.1 Chinese and English Model](#211-chinese-and-english-model)
      - [2.1.2 Multi-language Model](#212-multi-language-model)
      - [2.1.3 Layout Analysis](#213-layout-analysis)
    - [2.2 Use by Code](#22-use-by-code)
      - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese--english-model-and-multilingual-model)
      - [2.2.2 Layout Analysis](#222-layout-analysis)



<a name="1-install-paddleocr-whl-package"></a>

## 1. Install PaddleOCR Whl Package

```bash
pip install "paddleocr>=2.0.1" # Version 2.0.1 or later is recommended
```
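To confirm that the installed package meets the recommended minimum, a small check along these lines may help. This is only a sketch: `version_at_least` and `installed_paddleocr_version` are hypothetical helper names, and the comparison handles plain dotted numeric versions such as `2.0.1` only.

```python
from importlib import metadata


def version_at_least(installed, minimum):
    """Compare dotted numeric versions such as '2.0.1' (non-numeric parts are ignored)."""
    def to_tuple(version):
        return tuple(int(part) for part in version.split(".") if part.isdigit())
    return to_tuple(installed) >= to_tuple(minimum)


def installed_paddleocr_version():
    """Return the installed paddleocr version string, or None if it is not installed."""
    try:
        return metadata.version("paddleocr")
    except metadata.PackageNotFoundError:
        return None


found = installed_paddleocr_version()
if found is None:
    print("paddleocr is not installed")
else:
    print(found, "OK" if version_at_least(found, "2.0.1") else "older than 2.0.1")
```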

- **For Windows users:** If you get the error `OSError: [WinError 126] The specified module could not be found` when installing Shapely on Windows, try downloading the Shapely whl file from [here](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).

  Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)

- **For layout analysis users:** Run the following command to install **Layout-Parser**:

  ```bash
  pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
  ```

<a name="2-easy-to-use"></a>

## 2. Easy-to-Use

<a name="21-use-by-command-line"></a>

### 2.1 Use by Command Line

PaddleOCR provides a set of test images. Click [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip) to download them, then switch to the extracted directory in the terminal:

```bash
cd /path/to/ppocr_img
```

If you are not using the provided test images, replace the value of the `--image_dir` parameter in the commands below with the path to your own image.
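
The commands in this guide differ only in a few flags, so when scripting several runs it can help to assemble the argument list in code. The sketch below uses only flags that appear in this guide (`--image_dir`, `--lang`, `--use_angle_cls`, `--use_gpu`, `--det`, `--rec`). `build_paddleocr_command` is a hypothetical helper, not part of PaddleOCR, and it only builds the command without running it.

```python
import shlex


def build_paddleocr_command(image_path, lang=None, use_angle_cls=None,
                            use_gpu=None, det=None, rec=None):
    """Build the argument list for the `paddleocr` CLI; options left as None are omitted."""
    cmd = ["paddleocr", "--image_dir", str(image_path)]
    if lang is not None:
        cmd += ["--lang", lang]
    options = {
        "--use_angle_cls": use_angle_cls,
        "--use_gpu": use_gpu,
        "--det": det,
        "--rec": rec,
    }
    for flag, value in options.items():
        if value is not None:
            # The CLI examples in this guide pass booleans as lowercase true/false.
            cmd += [flag, "true" if value else "false"]
    return cmd


cmd = build_paddleocr_command("./imgs_en/img_12.jpg", lang="en",
                              use_angle_cls=True, use_gpu=False)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the `paddleocr` command is available on the `PATH`.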

<a name="211-english-and-chinese-model"></a>

#### 2.1.1 Chinese and English Model

* Detection, direction classification and recognition: set the parameter `--use_gpu false` to disable the GPU device

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en --use_gpu false
  ```

  The output is a list; each item contains a bounding box, the text and the recognition confidence:

  ```bash
  [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
  [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
  [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
  ......
  ```

* Only detection: set `--rec` to `false`

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --rec false
  ```

  The output is a list; each item contains only a bounding box:

  ```bash
  [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
  [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
  [[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
  ......
  ```

* Only recognition: set `--det` to `false`

  ```bash
  paddleocr --image_dir ./imgs_words_en/word_10.png --det false --lang en
  ```

  The output is a list; each item contains the text and the recognition confidence:

  ```bash
  ['PAIN', 0.990372]
  ```
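The three modes print differently shaped items, which matters when post-processing saved output. As a rough sketch, assuming each printed line is a valid Python literal as in the samples above, the lines can be parsed with `ast.literal_eval` and told apart by shape. `parse_output_line` is a hypothetical helper, not part of PaddleOCR.

```python
import ast


def parse_output_line(line):
    """Classify one printed result line by its shape and return a dict."""
    item = ast.literal_eval(line.strip())
    # Recognition only: ['PAIN', 0.990372]
    if isinstance(item[0], str):
        return {"text": item[0], "confidence": item[1]}
    # Detection only: four [x, y] corner points
    if isinstance(item[0][0], (int, float)):
        return {"box": item}
    # Detection + recognition: [box, [text, confidence]]
    box, (text, confidence) = item
    return {"box": box, "text": text, "confidence": confidence}


samples = [
    "[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]",
    "[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]",
    "['PAIN', 0.990372]",
]
for sample in samples:
    print(parse_output_line(sample))
```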

To use the 2.0 model, specify the parameter `--version PP-OCR`; paddleocr uses the 2.1 model by default (`--version PP-OCRv2`). More on whl package usage can be found in [whl package](./whl_en.md).

<a name="212-multi-language-model"></a>

#### 2.1.2 Multi-language Model

PaddleOCR currently supports 80 languages; switch between them by changing the `--lang` parameter.

``` bash
paddleocr --image_dir ./doc/imgs_en/254.jpg --lang=en
```

<div align="center">
    <img src="../imgs_en/254.jpg" width="300" height="600">
    <img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600">
</div>

The result is a list; each item contains the text with its recognition confidence, followed by the text box:

```text
[('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]]
[('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]]
[('Montpelier Vermont', 0.97389287), [[69.0, 132.0], [501.0, 126.0], [501.0, 158.0], [70.0, 164.0]]]
[('8022256183', 0.99810505), [[71.0, 175.0], [363.0, 170.0], [364.0, 202.0], [72.0, 207.0]]]
[('REG 07-24-201706:59 PM', 0.93537045), [[73.0, 299.0], [653.0, 281.0], [654.0, 318.0], [74.0, 336.0]]]
[('045555', 0.99346405), [[509.0, 331.0], [651.0, 325.0], [652.0, 356.0], [511.0, 362.0]]]
[('CT1', 0.9988654), [[535.0, 367.0], [654.0, 367.0], [654.0, 406.0], [535.0, 406.0]]]
......
```
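Note that in this sample the `(text, confidence)` tuple comes before the box. A minimal sketch of filtering such items by confidence follows. The `0.96` threshold and the `keep_confident` helper are illustrative assumptions, and the data is copied from the output above.

```python
def keep_confident(items, threshold):
    """Return the text of every item whose recognition confidence is at least `threshold`."""
    return [text for (text, confidence), _box in items if confidence >= threshold]


items = [
    [('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]],
    [('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]],
    [('8022256183', 0.99810505), [[71.0, 175.0], [363.0, 170.0], [364.0, 202.0], [72.0, 207.0]]],
]
print(keep_confident(items, 0.96))
```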

Commonly used language abbreviations include:

| Language            | Abbreviation |      | Language | Abbreviation |      | Language | Abbreviation |
| ------------------- | ------------ | ---- | -------- | ------------ | ---- | -------- | ------------ |
| Chinese & English   | ch           |      | French   | fr           |      | Japanese | japan        |
| English             | en           |      | German   | german       |      | Korean   | korean       |
| Chinese Traditional | chinese_cht  |      | Italian  | it           |      | Russian  | ru           |

A list of all languages and their corresponding abbreviations can be found in the [Multi-Language Model Tutorial](./multi_languages_en.md).

<a name="213-layoutAnalysis"></a>

#### 2.1.3 Layout Analysis

Layout analysis divides a document into 5 types of regions: text, title, list, picture and table. For the first three types, the OCR model is applied directly to detect and recognize the text in the region, and the results are saved as txt. For table regions, the table structure is recognized and the table image is converted into an Excel file with the same table style. Picture regions are cropped and saved as individual images.

To use the layout analysis function of PaddleOCR, specify `--type=structure`:

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

- **Results Format**

  PP-Structure returns a list of dicts. An example is as follows:

  ```shell
  [
    {   'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                  [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
    }
  ]
  ```

  The fields of each dict are described below:

  | Parameter | Description                                                  |
  | --------- | ------------------------------------------------------------ |
  | type      | Type of image area                                           |
  | bbox      | The coordinates of the image area in the original image, in the order [left upper x, left upper y, right bottom x, right bottom y] |
  | res       | OCR or table recognition result of the image area.<br> Table: HTML string of the table; <br> OCR: A tuple containing the detection coordinates and recognition results of each single line of text |

- **Parameter Description:**

  | Parameter       | Description                                                  | Default value                                |
  | --------------- | ------------------------------------------------------------ | -------------------------------------------- |
  | output          | The path where excel and recognition results are saved       | ./output/table                               |
  | table_max_len   | Length to which the long side of the image is resized for the table structure model | 488                                          |
  | table_model_dir | Inference model path of the table structure model            | None                                         |
  | table_char_dict_path | Dict path of the table structure model                  | ../ppocr/utils/dict/table_structure_dict.txt |
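
For an OCR region, the `res` tuple in the example under **Results Format** holds two parallel lists: one flat list of eight coordinates per text line, and one `(text, confidence)` pair per text line. A sketch of pairing them up is shown here, using the sample data verbatim. `iter_text_lines` is a hypothetical helper, and the layout is assumed from that single example.

```python
def iter_text_lines(region):
    """Yield (corner_points, text, confidence) for each text line of an OCR region."""
    coords_list, rec_list = region["res"]
    for flat, (text, confidence) in zip(coords_list, rec_list):
        # Regroup [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]
        points = list(zip(flat[0::2], flat[1::2]))
        yield points, text, confidence


region = {
    'type': 'Text',
    'bbox': [34, 432, 345, 462],
    'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
             [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
            [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663),
             ('Tent  ', 0.465441)]),
}
for points, text, confidence in iter_text_lines(region):
    print(points[0], repr(text), confidence)
```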

<a name="22-use-by-code"></a>

### 2.2 Use by Code
<a name="221-chinese---english-model-and-multilingual-model"></a>

#### 2.2.1 Chinese & English Model and Multilingual Model

* Detection, angle classification and recognition:

```python
from paddleocr import PaddleOCR, draw_ocr

# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the `lang` parameter to `ch`, `en`, `fr`, `german`, `korean` or `japan`,
# respectively, to switch the language model.
ocr = PaddleOCR(use_angle_cls=True, lang='en')  # only needs to run once to download the model and load it into memory
img_path = './imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# Draw the result
from PIL import Image

image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]      # bounding boxes
txts = [line[1][0] for line in result]    # recognized text
scores = [line[1][1] for line in result]  # recognition confidence
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the text and the recognition confidence:

```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```
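Each box here is a four-point quadrilateral, whereas the layout analysis results in this guide describe `bbox` in an axis-aligned `[left upper x, left upper y, right bottom x, right bottom y]` form. Converting the former to the latter is a matter of taking minima and maxima. `quad_to_bbox` below is a hypothetical helper shown as a sketch, not a PaddleOCR function.

```python
def quad_to_bbox(quad):
    """Convert four [x, y] corner points into [x_min, y_min, x_max, y_max]."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return [min(xs), min(ys), max(xs), max(ys)]


# One item copied from the sample output above
line = [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]],
        ['We would like to thank all the designers and', 0.9357758]]
box, (text, score) = line
print(quad_to_bbox(box), text, score)
```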

Visualization of the results:

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>
<a name="222-layoutAnalysis"></a>

#### 2.2.2 Layout Analysis

```python
import os
import cv2
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = './table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# Save the results under save_folder, named after the image file (without extension)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')  # remove the 'img' entry before printing
    print(line)

# Draw the result
from PIL import Image

font_path = './fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
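
Section 2.1.3 notes that for OCR regions `res` is a tuple of coordinates and recognition results, while for table regions it is an HTML string. A sketch of branching on that difference is shown below. It runs on hand-written sample data rather than real model output; `describe_region` is a hypothetical helper, and the `'Table'` type string and the HTML snippet are assumptions modeled on the `'Text'` example.

```python
def describe_region(region):
    """Return a one-line summary of a layout region based on the shape of its `res` field."""
    res = region["res"]
    if isinstance(res, str):
        # Table regions: `res` is an HTML string of the table.
        return f"{region['type']}: HTML table, {len(res)} characters"
    # OCR regions: `res` is (coordinates per line, (text, confidence) per line).
    _coords, rec = res
    text = " ".join(t.strip() for t, _score in rec)
    return f"{region['type']}: {len(rec)} line(s): {text}"


sample_result = [
    {"type": "Text", "bbox": [34, 432, 345, 462],
     "res": ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0]],
             [("The performance of CNN and IPT models", 0.9)])},
    # Assumed shape for a table region; the HTML content is made up for illustration.
    {"type": "Table", "bbox": [50, 500, 400, 700],
     "res": "<table><tr><td>a</td></tr></table>"},
]
for region in sample_result:
    print(describe_region(region))
```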