# Python Inference

- [1. Structure](#1)
  - [1.1 layout analysis + table recognition](#1.1)
  - [1.2 layout analysis](#1.2)
  - [1.3 table recognition](#1.3)
- [2. DocVQA](#2)

<a name="1"></a>
## 1. Structure
Go to the `ppstructure` directory

```bash
cd ppstructure
```

Download the models

```bash
mkdir inference && cd inference
# Download the PP-OCRv2 text detection model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_slim_quant_infer.tar && tar xf ch_PP-OCRv2_det_slim_quant_infer.tar
# Download the PP-OCRv2 text recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar && tar xf ch_PP-OCRv2_rec_slim_quant_infer.tar
# Download the ultra-lightweight English table structure model and unzip it
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar
cd ..
```
<a name="1.1"></a>
### 1.1 layout analysis + table recognition
```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \
                          --rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \
                          --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
                          --image_dir=./docs/table/1.png \
                          --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
                          --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
                          --output=../output \
                          --vis_font_path=../doc/fonts/simfang.ttf
```
After the command finishes, each input image has a directory of the same name in the `structure` directory under the directory specified by the `output` field. Each table in the image is saved as an Excel file, and each picture region is cropped and saved; the filenames of the Excel files and pictures are the corresponding regions' coordinates in the image. Detailed results are stored in the `res.txt` file.
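
For orientation, the listing below sketches what the output for `1.png` might look like with `--output=../output`; the coordinate-based filenames are made up for illustration and will differ on your run.

```bash
# Hypothetical output directory for 1.png under --output=../output;
# the coordinate-style filenames below are illustrative only.
ls ../output/structure/1
# [66,85,866,267].xlsx    -> a recognized table exported to Excel
# [103,302,987,525].jpg   -> a cropped picture region
# res.txt                 -> detailed detection and recognition results
```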

<a name="1.2"></a>
### 1.2 layout analysis
```bash
python3 predict_system.py --image_dir=./docs/table/1.png --table=false --ocr=false --output=../output/
```
After the command finishes, each input image has a directory of the same name in the `structure` directory under the directory specified by the `output` field. Each picture region in the image is cropped and saved, with filenames given by the region's coordinates in the image. Layout analysis results are stored in the `res.txt` file.

<a name="1.3"></a>
### 1.3 table recognition
```bash
python3 predict_system.py --det_model_dir=inference/ch_PP-OCRv2_det_slim_quant_infer \
                          --rec_model_dir=inference/ch_PP-OCRv2_rec_slim_quant_infer \
                          --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer \
                          --image_dir=./docs/table/table.jpg \
                          --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
                          --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
                          --output=../output \
                          --vis_font_path=../doc/fonts/simfang.ttf \
                          --layout=false
```
After the command finishes, each input image has a directory of the same name in the `structure` directory under the directory specified by the `output` field. Each table in the image is saved as an Excel file whose filename is the table's coordinates in the image.

<a name="2"></a>
## 2. DocVQA

```bash
cd ppstructure

# download model
mkdir inference && cd inference
wget https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar && tar xf PP-Layout_v1.0_ser_pretrained.tar
cd ..

python3 predict_system.py --model_name_or_path=vqa/PP-Layout_v1.0_ser_pretrained/ \
                          --mode=vqa \
                          --image_dir=vqa/images/input/zh_val_0.jpg  \
                          --vis_font_path=../doc/fonts/simfang.ttf
```
After the command finishes, the visualized result for each input image is saved in the `vqa` directory under the directory specified by the `output` field, with the same filename as the input image.
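
As a quick check, the sketch below lists the visualization produced for `zh_val_0.jpg`; the `output` path is an assumption (adjust it to the directory specified by the `output` field in your run).

```bash
# Hypothetical check of the DocVQA visualization; the output directory
# name is assumed and should match your own --output setting.
ls output/vqa
# zh_val_0.jpg   -> the visualized result, same filename as the input image
```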