# magic_pdf

## Installation

### The steps below install the PDF parsing module on node 223 (alternatively, use the prebuilt image 1177ea7959ce directly)

Download this project:

`git clone http://developer.sourcefind.cn/codes/zhiAn123/magic_pdf.git`

Download the required model repository:

`git lfs clone https://www.modelscope.cn/opendatalab/PDF-Extract-Kit.git`

Alternatively, download it with ModelScope:

`pip install modelscope`

`from modelscope import snapshot_download`

`model_dir = snapshot_download('opendatalab/PDF-Extract-Kit')`
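
The three lines above can be wrapped into a small helper that also reports which directory to use as `"models-dir"` in section 2. This is a sketch, assuming the ModelScope snapshot has the same layout as the git clone (a `models` folder at its root); `models_dir_for` and `download_pdf_extract_kit` are hypothetical helper names, not part of magic_pdf.

```python
from pathlib import Path


def models_dir_for(kit_root):
    """Return the "models" sub-directory that "models-dir" in magic-pdf.json should point to."""
    return Path(kit_root) / "models"


def download_pdf_extract_kit():
    """Download opendatalab/PDF-Extract-Kit with ModelScope and return its models directory."""
    # Imported inside the function so this module can be imported without modelscope installed.
    from modelscope import snapshot_download

    # snapshot_download returns the local directory the repository was downloaded into.
    kit_root = snapshot_download('opendatalab/PDF-Extract-Kit')
    # Assumption: the snapshot mirrors the git clone, so the models live in <kit_root>/models.
    return models_dir_for(kit_root)
```

Running `print(download_pdf_extract_kit())` after `pip install modelscope` prints the path to copy into the configuration.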

### 1. Install the required dependencies

`cd magic_pdf-main` (if you cloned the repository with `git clone` above, the directory is named `magic_pdf`)

`pip install -e .`

### 2. Set up the required models

#### Edit magic-pdf.template.json

<div align=center>
    <img src="doc/image (9).png"/>
</div>

Note: the `"models-dir"` value (here `"/home/practice/model/PDF-Extract-Kit/models"`) must point to the `models` directory inside PDF-Extract-Kit.

Copy magic-pdf.template.json to the /root directory and rename it to magic-pdf.json:
<div align=center>
    <img src="doc/image (10).png"/>
</div>
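
The edit-and-copy steps above can also be scripted. This is a minimal sketch, assuming that only `"models-dir"` needs to change and that every other key in the template should be copied unchanged; `write_magic_pdf_config` is a hypothetical helper, not part of magic_pdf.

```python
import json
from pathlib import Path


def write_magic_pdf_config(template_path, kit_root, dest_path=Path("/root/magic-pdf.json")):
    """Copy magic-pdf.template.json to dest_path with "models-dir" set to <kit_root>/models."""
    template_path = Path(template_path)
    dest_path = Path(dest_path)
    config = json.loads(template_path.read_text(encoding="utf-8"))

    # "models-dir" must point at PDF-Extract-Kit/models, not at the repository root.
    models_dir = Path(kit_root) / "models"
    if not models_dir.is_dir():
        raise FileNotFoundError(f"{models_dir} does not exist; check the PDF-Extract-Kit download")
    config["models-dir"] = str(models_dir)

    # All other keys from the template are written back unchanged.
    dest_path.write_text(json.dumps(config, indent=4, ensure_ascii=False) + "\n", encoding="utf-8")
    return config
```

For example, `write_magic_pdf_config("magic-pdf.template.json", "/home/practice/model/PDF-Extract-Kit")` run from the directory that holds the template writes `/root/magic-pdf.json` (this needs write access to `/root`). Failing early on a missing `models` directory catches the most common mistake, pointing `"models-dir"` at the repository root.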

### 4. Start the qwen-ocr module
Download the Qwen model: [fast download link](http://113.200.138.88:18080/aimodels/qwen/Qwen2-VL-7B-Instruct.git)

Update the model path in magic_pdf-main/magic_pdf/dict2md/ocr_server.py to point to the downloaded Qwen model:

<div align=center>
    <img src="doc/image11.png"/>
</div>

#### qwen-ocr service startup command:

`python magic_pdf/dict2md/ocr_server.py`

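By default the service listens on port 6020 and runs on DCU card 0. Use `--dcu_id` to select the card, `--server_port` to set the port, and `-c` to specify the Qwen model path.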
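
To start the service with non-default settings from a deployment script, the command line can be assembled as below. This sketch uses only the three flags documented above and assumes it runs from the repository root so the relative script path resolves; `ocr_server_command` and `start_ocr_server` are hypothetical helpers.

```python
import subprocess
import sys


def ocr_server_command(dcu_id=0, server_port=6020, model_path=None):
    """Build the ocr_server.py command line from the flags documented in this README."""
    cmd = [sys.executable, "magic_pdf/dict2md/ocr_server.py",
           "--dcu_id", str(dcu_id),
           "--server_port", str(server_port)]
    # -c is optional; without it the server presumably falls back to the path set inside the file.
    if model_path is not None:
        cmd += ["-c", str(model_path)]
    return cmd


def start_ocr_server(**kwargs):
    """Launch the qwen-ocr server in the background and return its Popen handle."""
    return subprocess.Popen(ocr_server_command(**kwargs))
```

For example, `start_ocr_server(dcu_id=1, server_port=6021, model_path="/path/to/Qwen2-VL-7B-Instruct")` starts a second instance on card 1 (the model path here is a placeholder).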

Output when the qwen-ocr module starts successfully:
<div align=center>
    <img src="doc/image (5).png"/>
</div>

### 5. Start the pdf-server parsing service

#### pdf-server startup command:

`python magic_pdf/tools/pdf_server.py`

By default the service listens on port 6030 and runs on DCU card 0. Use `--dcu_id` to select the card and `--pdf_port` to set the port.

<div align=center>
    <img src="doc/image (6).png"/>
</div>

Output when pdf-server starts successfully:
<div align=center>
    <img src="doc/image (7).png"/>
</div>
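
Before parsing, it can help to confirm that both services are accepting connections. The sketch below only checks that a TCP connection to each port succeeds (it does not call any service endpoint) and assumes both servers are reachable on 127.0.0.1 at the default ports from this README; `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to (host, port) succeeds, or False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening; the socket is closed immediately.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Default ports from this README: 6020 for qwen-ocr, 6030 for pdf-server.
DEFAULT_SERVICES = {"qwen-ocr": 6020, "pdf-server": 6030}
```

For example, `all(wait_for_port("127.0.0.1", p, timeout=30) for p in DEFAULT_SERVICES.values())` is True once both services are up. Polling the port avoids guessing a fixed sleep while the models load.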

### 6. Parse PDFs
`python magic_pdf/parse/common_parse.py -p [file or directory path] -o [output path]`
<div align=center>
    <img src="doc/image (8).png"/>
</div>

`-p` specifies the PDF file or directory to parse, and `-o` specifies the output path.
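
The same command can be driven from Python, for example as the last step of a pipeline script. This is a sketch using only the documented `-p` and `-o` flags; it assumes it runs from the repository root, and pre-creating the output directory is a precaution, not a documented requirement. `parse_command` and `parse_pdfs` are hypothetical helpers.

```python
import subprocess
import sys
from pathlib import Path


def parse_command(input_path, output_dir):
    """Build the common_parse.py command line: -p is a PDF file or directory, -o the output path."""
    return [sys.executable, "magic_pdf/parse/common_parse.py",
            "-p", str(input_path), "-o", str(output_dir)]


def parse_pdfs(input_path, output_dir):
    """Run common_parse.py once; raises subprocess.CalledProcessError on a non-zero exit."""
    # Precaution: make sure the output directory exists before the parser writes to it.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(parse_command(input_path, output_dir), check=True)
```

For example, `parse_pdfs("/data/pdfs", "/data/pdf_output")` parses every PDF in a (placeholder) directory, with both services from steps 4 and 5 already running.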