# magic_pdf

## Installation

### The following demonstrates installing the PDF parsing module on node 223 (the image 1177ea7959ce can be used directly)

### 1. Start the container

`docker run -it --shm-size=1024G -v /parastor/home/zhougf/Qwen1.5-pytorch:/home/practice -v /opt/hyhal:/opt/hyhal --privileged=true --device=/dev/kfd --device=/dev/dri/ --network=host --group-add video --name pdf_tmp a4dd5be0ca23 bash`

### 2. Install the required dependencies

`pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i https://pypi.tuna.tsinghua.edu.cn/simple`
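Once the install finishes, it can help to list which installed packages look CUDA-specific or unadapted, since the note that follows says they should be uninstalled. The sketch below is only an illustration: the `nvidia-` prefix is an assumption generalized from the one example (nvidia-cudnn), and `is_unwanted` and `find_unwanted_packages` are hypothetical helper names, not part of magic-pdf.

```python
from importlib import metadata

# Assumption: only nvidia-cudnn and torchtext are named in this guide;
# the "nvidia-" prefix is a guess at the other CUDA-specific packages.
UNWANTED_PREFIXES = ("nvidia-",)
UNWANTED_NAMES = {"torchtext"}


def is_unwanted(name):
    """Return True if a distribution name matches the patterns above."""
    normalized = name.lower().replace("_", "-")
    return normalized in UNWANTED_NAMES or normalized.startswith(UNWANTED_PREFIXES)


def find_unwanted_packages():
    """List installed distributions whose names match the patterns above."""
    names = {dist.metadata["Name"] for dist in metadata.distributions()}
    return sorted(n for n in names if n and is_unwanted(n))


if __name__ == "__main__":
    # Print the uninstall commands instead of running them, so they can be reviewed first.
    for name in find_unwanted_packages():
        print(f"pip uninstall -y {name}")
```

Printing the commands rather than executing them keeps the decision with the operator, which seems safer when the exact set of packages to remove is not fully specified.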

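Note: this step also installs CUDA-related libraries (such as nvidia-cudnn) and libraries that have not been adapted (such as torchtext). After the installation finishes, uninstall these libraries, then install the DTK builds of torch and torchvision.

Download the official project:

`git clone https://github.com/opendatalab/MinerU.git`

#### Replace the magic_pdf directory in the official clone with this project's magic_pdf

`pip uninstall magic-pdf`

`pip install -e .`

### 3. Install the required models

`git clone https://www.modelscope.cn/opendatalab/PDF-Extract-Kit.git`

#### Edit magic-pdf.template.json

`cd MinerU`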

Make sure that `"models-dir":"/home/practice/model/PDF-Extract-Kit/models"` points to the PDF-Extract-Kit/models directory. Then copy magic-pdf.template.json to the /root directory and rename it to magic-pdf.json.
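The edit-and-copy step above can also be scripted. The following is a minimal sketch, assuming the template is plain JSON with a top-level `"models-dir"` key; `write_magic_pdf_config` is a hypothetical helper name, and the paths in the usage section are the ones given in this guide.

```python
import json
from pathlib import Path


def write_magic_pdf_config(template_path, models_dir, target_path):
    """Write a copy of the template to target_path with "models-dir" set to models_dir."""
    # Assumption: the template file is valid JSON (no comments or trailing commas).
    config = json.loads(Path(template_path).read_text(encoding="utf-8"))
    config["models-dir"] = str(models_dir)
    Path(target_path).write_text(
        json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return config


if __name__ == "__main__":
    # Run from the MinerU directory, where magic-pdf.template.json lives.
    write_magic_pdf_config(
        "magic-pdf.template.json",
        "/home/practice/model/PDF-Extract-Kit/models",
        "/root/magic-pdf.json",
    )
```

All other keys in the template are carried over unchanged, so only the models path differs between the template and the installed config.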

### 4. Start the qwen-ocr module: install qwen_vl_utils, update transformers to version 4.45, and uninstall flash_attn

(1) `pip install qwen_vl_utils -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple`

(2) `pip install transformers==4.45 -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple`

(3) `pip uninstall flash_attn`

The module uses port 6020 and DCU card 0 by default; use `--dcu_id` to select the card and `--server_port` to set the port.
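One way to confirm that the module is listening after startup is to try a TCP connection to its port. This is a sketch under assumptions: it supposes the service accepts TCP connections on localhost (the container runs with `--network=host`) at the default port 6020, and `port_is_open` is a hypothetical helper, not part of the project.

```python
import socket


def port_is_open(host="127.0.0.1", port=6020, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Pass a different port here if the module was started with --server_port.
    print("qwen-ocr port reachable:", port_is_open())
```

A successful connection only shows that something is listening on the port; it does not verify that the model has finished loading.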

The qwen-ocr module started successfully:

### 5. Start the pdf-server parsing service

`python magic_pdf/tools/pdf_server.py`

Started successfully:

### 6. Parse a PDF

`python magic_pdf/parse/common_parse.py -p other/接口人.xlsx -o other_res/`

`-p` specifies the path of the PDF to parse, and `-o` specifies the output path.
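To parse several files, the same command can be driven from a small script. The sketch below assumes the script is run from the MinerU directory and that `common_parse.py` takes one input file per invocation via `-p`; `build_parse_command` and `parse_all` are hypothetical helpers, and the `input_pdfs` directory is a placeholder.

```python
import subprocess
import sys
from pathlib import Path

# Script path as used in the command above, relative to the MinerU directory.
PARSE_SCRIPT = "magic_pdf/parse/common_parse.py"


def build_parse_command(input_path, output_dir, script=PARSE_SCRIPT):
    """Build the argument list for one parse run (-p input, -o output)."""
    return [sys.executable, script, "-p", str(input_path), "-o", str(output_dir)]


def parse_all(input_dir, output_dir, pattern="*.pdf"):
    """Run the parser once per matching file and stop on the first failure."""
    for path in sorted(Path(input_dir).glob(pattern)):
        subprocess.run(build_parse_command(path, output_dir), check=True)


if __name__ == "__main__":
    # "input_pdfs" is a placeholder directory name for illustration.
    parse_all("input_pdfs", "other_res/")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces or non-ASCII characters.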