# magic_pdf

## Installation

### The following demonstrates installing the PDF parsing module on node 223 (the image 1177ea7959ce can be used directly)

Download this project. Its code replaces the corresponding code in the official project.

### 1. Install the required dependencies

Download the official project:
`git clone https://github.com/opendatalab/MinerU.git`

#### Replace the `magic_pdf` directory of the cloned official project with this project's `magic_pdf`
#### `pip uninstall magic-pdf`
#### `pip install -e .`
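
The directory replacement above can also be scripted. The following is a minimal sketch, assuming both repositories were cloned side by side and that the official package lives at `MinerU/magic_pdf`; the function name `replace_package` and both paths are hypothetical and only serve as illustration.

```python
import shutil
from pathlib import Path


def replace_package(project_pkg: Path, official_pkg: Path) -> None:
    """Replace the official magic_pdf directory with this project's copy."""
    if not project_pkg.is_dir():
        raise FileNotFoundError(f"source package not found: {project_pkg}")
    # Remove the official copy first so that no stale files are left behind.
    if official_pkg.exists():
        shutil.rmtree(official_pkg)
    shutil.copytree(project_pkg, official_pkg)


# Example call (hypothetical paths, adjust to your checkout locations):
# replace_package(Path("magic_pdf"), Path("MinerU/magic_pdf"))
```

Removing the old directory before copying avoids mixing files from the two versions. The `pip uninstall magic-pdf` and `pip install -e .` steps still have to be run afterwards.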

### 2. Install the required models
`git clone https://www.modelscope.cn/opendatalab/PDF-Extract-Kit.git`
#### Edit magic-pdf.template.json
`cd MinerU`
<div align=center>
    <img src="doc/image (9).png"/>
</div>
Note that the path in `"models-dir":"/home/practice/model/PDF-Extract-Kit/models"` must point to the `PDF-Extract-Kit/models` directory.

Copy `magic-pdf.template.json` to the `/root` directory and rename it to `magic-pdf.json`.
<div align=center>
    <img src="doc/image (10).png"/>
</div>
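
The edit-and-copy step can be done in one pass. This is a minimal sketch, assuming the template is a plain JSON file with a top-level `"models-dir"` key as shown above; the helper name `write_config` is hypothetical and not part of the project.

```python
import json
from pathlib import Path


def write_config(template: Path, models_dir: str, dest: Path) -> dict:
    """Load the template, set "models-dir", and save the result as the live config."""
    config = json.loads(template.read_text(encoding="utf-8"))
    # Point the parser at the downloaded PDF-Extract-Kit/models directory.
    config["models-dir"] = models_dir
    dest.write_text(
        json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return config


# Example call using the paths from this README:
# write_config(
#     Path("magic-pdf.template.json"),
#     "/home/practice/model/PDF-Extract-Kit/models",
#     Path("/root/magic-pdf.json"),
# )
```

All other keys in the template are written back unchanged, so only the models path differs from the original file.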

### 4. Start the qwen-ocr module:

`python magic_pdf/dict2md/ocr_server.py`

By default the module uses port 6020 and DCU card 0. Use `--dcu_id` to select a different card and `--server_port` to set the port number.

The qwen-ocr module started successfully:
<div align=center>
    <img src="doc/image (5).png"/>
</div>
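
Instead of watching the log, the startup can be checked from a script. The sketch below is an illustration under several assumptions: the server accepts TCP connections on `127.0.0.1` at the configured port, and `--dcu_id` and `--server_port` take their values as the following argument. The helper names `ocr_server_command`, `wait_for_port`, and `start_ocr_server` are hypothetical.

```python
import socket
import subprocess
import sys
import time


def ocr_server_command(dcu_id: int = 0, server_port: int = 6020) -> list:
    """Build the launch command with the defaults described above."""
    return [
        sys.executable, "magic_pdf/dict2md/ocr_server.py",
        "--dcu_id", str(dcu_id),
        "--server_port", str(server_port),
    ]


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False


def start_ocr_server(dcu_id: int = 0, server_port: int = 6020,
                     timeout: float = 120.0) -> subprocess.Popen:
    """Launch the server and return its process once the port is reachable."""
    proc = subprocess.Popen(ocr_server_command(dcu_id, server_port))
    if not wait_for_port("127.0.0.1", server_port, timeout):
        proc.terminate()
        raise RuntimeError(f"qwen-ocr did not open port {server_port} in time")
    return proc
```

Model loading can take a while, so the timeout here is a guess and may need to be raised.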

### 5. Start the pdf-server parsing service:
`python magic_pdf/tools/pdf_server.py`
<div align=center>
    <img src="doc/image (6).png"/>
</div>
Started successfully:
<div align=center>
    <img src="doc/image (7).png"/>
</div>

### 6. Parse a PDF
`python magic_pdf/parse/common_parse.py -p [file or directory path] -o [output path]`
<div align=center>
    <img src="doc/image (8).png"/>
</div>
`-p` specifies the PDF path and `-o` specifies the output path.
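
When the parser is called from another script, checking the arguments first gives a clearer error than a failure inside the tool. This is a minimal sketch. The helper name `parse_command` is hypothetical, and creating the output directory in advance is a precaution, since this README does not say whether the script creates it.

```python
import sys
from pathlib import Path


def parse_command(pdf_path: Path, output_dir: Path) -> list:
    """Validate the input, prepare the output directory, and build the command."""
    if not pdf_path.exists():
        raise FileNotFoundError(f"input file or directory not found: {pdf_path}")
    # Create the output directory up front; harmless if it already exists.
    output_dir.mkdir(parents=True, exist_ok=True)
    return [
        sys.executable, "magic_pdf/parse/common_parse.py",
        "-p", str(pdf_path),
        "-o", str(output_dir),
    ]


# Example call (hypothetical paths), run from the directory containing magic_pdf:
# import subprocess
# subprocess.run(parse_command(Path("docs/sample.pdf"), Path("output")), check=True)
```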