# magic_pdf

## Installation

### The following demonstrates installing the PDF parsing module on node 223 (the image 1177ea7959ce can be used directly)

### 1. Download this project

`git clone http://developer.sourcefind.cn/codes/zhiAn123/magic_pdf.git`

### 2. Download the required models

Download the Qwen model: [fast download channel](http://113.200.138.88:18080/aimodels/qwen/Qwen2-VL-7B-Instruct.git)

Download the models required for PDF parsing, using either of the following:

(1) `git lfs clone https://www.modelscope.cn/opendatalab/PDF-Extract-Kit.git`

(2) Download with ModelScope:

`pip install modelscope`

`from modelscope import snapshot_download`

`model_dir = snapshot_download('opendatalab/PDF-Extract-Kit')`
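
The ModelScope lines above can be combined into one small script. This is a minimal sketch, assuming `snapshot_download` returns the local directory of the downloaded snapshot and that the weights sit in a `models` subfolder of it (the folder step 4 refers to). `models_dir_for` and `download_pdf_extract_kit` are hypothetical helper names, not part of this project.

```python
import os


def models_dir_for(snapshot_dir: str) -> str:
    """Return the `models` subfolder of a downloaded PDF-Extract-Kit snapshot.

    Assumption: the weights live under <snapshot>/models, which is the
    folder that "models-dir" in magic-pdf.json should point to.
    """
    return os.path.join(os.path.abspath(snapshot_dir), "models")


def download_pdf_extract_kit() -> str:
    """Download the snapshot with ModelScope and return its models folder."""
    # Imported lazily so models_dir_for() works without modelscope installed.
    from modelscope import snapshot_download

    model_dir = snapshot_download("opendatalab/PDF-Extract-Kit")
    return models_dir_for(model_dir)
```

Running `print(download_pdf_extract_kit())` would then print a path that can be pasted into `models-dir` in step 4.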

### 3. Install the required dependencies

#### Enter the project root directory (all following steps are run from there)

`cd magic_pdf`

Install from the local source:

`pip install -e .`

### 4. Edit magic-pdf.template.json

<div align=center>
    <img src="doc/image (9).png"/>
</div>

`"models-dir": "[model path]"`: the path must point to **the models folder inside the PDF parsing model directory downloaded in step 2**.

Copy magic-pdf.template.json to the /root directory and rename it to magic-pdf.json.

<div align=center>
    <img src="doc/image (10).png"/>
</div>
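
The edit-and-copy step can also be scripted. The sketch below is one possible way to do it, assuming the template is plain JSON with a top-level `"models-dir"` key; `write_magic_pdf_config` is a hypothetical helper, not something shipped with this project. By default it writes to `magic-pdf.json` in the current user's home directory, which is `/root` when running as root.

```python
import json
from pathlib import Path


def write_magic_pdf_config(template_path, models_dir, target_path=None):
    """Copy the template to magic-pdf.json with "models-dir" filled in."""
    # Default target: magic-pdf.json in the home directory (/root for root).
    target = Path(target_path) if target_path else Path.home() / "magic-pdf.json"
    # Load the template, keeping every other key unchanged.
    config = json.loads(Path(template_path).read_text(encoding="utf-8"))
    config["models-dir"] = str(models_dir)
    target.write_text(
        json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return target
```

For example, `write_magic_pdf_config("magic-pdf.template.json", "/path/to/PDF-Extract-Kit/models")`, where the second argument is a placeholder for the real models folder.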

### 5. Configure the service addresses in config.ini

`vim magic_pdf/config.ini`

The defaults are:

`pdf_server = http://0.0.0.0:4090`

`ocr_server = http://0.0.0.0:4080`

Change these addresses as needed.
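
To double-check the addresses after editing, the file can be read back with Python's standard `configparser`. This is only a sketch: the section layout of config.ini is not shown here, so the code prepends a dummy section and searches every section for the two keys. `read_server_addresses` is a hypothetical helper name.

```python
import configparser
from urllib.parse import urlsplit


def read_server_addresses(text):
    """Return {key: (host, port)} for pdf_server and ocr_server in config.ini text."""
    parser = configparser.ConfigParser(interpolation=None)
    # Assumption: the section name is unknown, so add a dummy section in case
    # the keys appear before any header, then look through every section.
    parser.read_string("[_top]\n" + text)
    found = {}
    for section in parser.sections():
        for key in ("pdf_server", "ocr_server"):
            if parser.has_option(section, key):
                url = urlsplit(parser.get(section, key))
                found[key] = (url.hostname, url.port)
    return found
```

Passing the contents of `magic_pdf/config.ini` to this function returns the host and port of each service, which makes typos in the URLs easy to spot.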

### 6. Start the qwen-ocr module

Edit the model path in magic_pdf/magic_pdf/dict2md/ocr_server.py:

<div align=center>
    <img src="doc/image11.png"/>
</div>

#### Command to start the qwen-ocr service:

`python magic_pdf/dict2md/ocr_server.py`

DCU card 0 is used by default. Use `--dcu_id` to select a different card, `-c` to specify the Qwen model path, and `--config_path` to specify the path to config.ini.

Output when the qwen-ocr module starts successfully:
<div align=center>
    <img src="doc/image (5).png"/>
</div>

### 7. Start the pdf-server parsing service

#### Command to start the pdf-server parsing service:

`python magic_pdf/tools/pdf_server.py`

DCU card 0 is used by default. Use `--dcu_id` to select a different card and `--config_path` to specify the path to config.ini.

<div align=center>
    <img src="doc/image (6).png"/>
</div>

Output after a successful start:
<div align=center>
    <img src="doc/image (7).png"/>
</div>
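
Besides watching the console output, a script can check that both services are accepting connections before parsing starts. The sketch below is a generic TCP readiness check, not part of this project; `wait_for_port` is a hypothetical helper. It assumes the default ports from step 5 (4080 for the OCR service, 4090 for the PDF service) and that a service bound to 0.0.0.0 is reachable locally via 127.0.0.1.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait and retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("127.0.0.1", 4080) and wait_for_port("127.0.0.1", 4090)` is true only when both ports accept connections. This shows that the ports are open, not that the models have finished loading.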

### 8. Parse a PDF

`python magic_pdf/parse/common_parse.py -p [file or directory path] -o [output path]`

`-p` specifies the PDF file or directory, `-o` specifies the output path, and `--config_path` specifies the path to config.ini.

<div align=center>
    <img src="doc/image12.png"/>
</div>

<div align=center>
    <img src="doc/image (8).png"/>
</div>
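
When parsing is triggered from another Python program, the command from this step can be wrapped in a small function. This is a sketch under assumptions: it uses only the `-p`, `-o`, and `--config_path` options described in this step, must be run from the project root, and `build_parse_command` and `parse_pdf` are hypothetical helper names.

```python
import subprocess
import sys


def build_parse_command(pdf_path, output_dir, config_path=None):
    """Build the argument list for magic_pdf/parse/common_parse.py."""
    cmd = [
        sys.executable, "magic_pdf/parse/common_parse.py",
        "-p", str(pdf_path),
        "-o", str(output_dir),
    ]
    # --config_path is optional; add it only when a path is given.
    if config_path is not None:
        cmd += ["--config_path", str(config_path)]
    return cmd


def parse_pdf(pdf_path, output_dir, config_path=None):
    """Run the parser and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_parse_command(pdf_path, output_dir, config_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.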