Commit a565fa3a authored by luopl

Initial commit

*.js linguist-vendored
*.mjs linguist-vendored
*.html linguist-documentation
*.css linguist-vendored
*.scss linguist-vendored
*.tar
*.tar.gz
*.zip
venv*/
envs/
slurm_logs/
sync1.sh
data_preprocess_pj1
data-preparation1
__pycache__
*.log
*.pyc
.vscode
debug/
*.ipynb
.idea
# vscode history
.history
.DS_Store
.env
bad_words/
bak/
app/tests/*
temp/
tmp/
tmp
ocr_demo
.coveragerc
/app/common/__init__.py
/magic_pdf/config/__init__.py
source.dev.env
projects/web/node_modules
projects/web/dist
projects/web_demo/web_demo/static/
cli_debug/
debug_utils/
# sphinx docs
_build/
output/
# MinerU Contributor License Agreement
In order to clarify the intellectual property license granted with Contributions from any person or entity, the open source project MinerU ("MinerU") must have a Contributor License Agreement (CLA) on file that has been signed by each Contributor, indicating agreement to the license terms below. This license is for your protection as a Contributor as well as the protection of MinerU and its users; it does not change your rights to use your own Contributions for any other purpose.
You accept and agree to the following terms and conditions for Your present and future Contributions submitted to MinerU. Except for the license granted herein to MinerU and recipients of software distributed by MinerU, You reserve all right, title, and interest in and to Your Contributions.
1. Definitions. "You" (or "Your") shall mean the copyright owner or legal entity authorized by the copyright owner that is making this Agreement with MinerU. For legal entities, the entity making a Contribution and all other entities that control, are controlled by, or are under common control with that entity are considered to be a single Contributor. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "Contribution" shall mean the code, documentation or any original work of authorship, including any modifications or additions to an existing work, that is intentionally submitted by You to MinerU for inclusion in, or documentation of, any of the products owned or managed by MinerU (the "Work"). For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to MinerU or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, MinerU for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by You as "Not a Contribution."
2. Grant of Copyright License. Subject to the terms and conditions of this Agreement, You hereby grant to MinerU and to recipients of software distributed by MinerU a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Your Contributions and such derivative works.
3. Grant of Patent License. Subject to the terms and conditions of this Agreement, You hereby grant to MinerU and to recipients of software distributed by MinerU a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by You that are necessarily infringed by Your Contribution(s) alone or by combination of Your Contribution(s) with the Work to which such Contribution(s) was submitted. If any entity institutes patent litigation against You or any other entity (including a cross-claim or counterclaim in a lawsuit) alleging that Your Contribution, or the Work to which You have contributed, constitutes direct or contributory patent infringement, then any patent licenses granted to that entity under this Agreement for that Contribution or Work shall terminate as of the date such litigation is filed.
4. You represent that You are legally entitled to grant the above license. If You are an entity, You represent further that each of Your employees designated by You is authorized to submit Contributions on behalf of You. If You are an individual and Your employer(s) has rights to intellectual property that You create that includes Your Contributions, You represent further that You have received permission to make Contributions on behalf of that employer, that Your employer has waived such rights for Your Contributions to MinerU, or that Your employer has executed a separate CLA with MinerU.
5. If you do post content or submit material on MinerU and unless we indicate otherwise, you grant MinerU a nonexclusive, royalty-free, perpetual, irrevocable, and fully sublicensable right to use, reproduce, modify, adapt, publish, perform, translate, create derivative works from, distribute, and display such content throughout the world in any media. You grant MinerU and sublicensees the right to use your GitHub Public Profile, including but not limited to name, that you submit in connection with such content. You represent and warrant that you own or otherwise control all of the rights to the content that you post; that the content is accurate; that use of the content you supply does not violate this policy and will not cause injury to any person or entity; and that you will indemnify MinerU for all claims resulting from content you supply. MinerU has the right but not the obligation to monitor and edit or remove any activity or content. MinerU takes no responsibility and assumes no liability for any content posted by you or any third party.
6. You represent that each of Your Contributions is Your original creation. Should You wish to submit work that is not Your original creation, You may submit it to MinerU separately from any Contribution, identifying the complete details of its source and of any license or other restriction (including, but not limited to, related patents, trademarks, and license agreements) of which You are personally aware, and conspicuously marking the work as "Submitted on behalf of a third-party: [named here]".
7. You are not expected to provide support for Your Contributions, except to the extent You desire to provide support. You may provide support for free, for a fee, or not at all. Unless required by applicable law or agreed to in writing, You provide Your Contributions on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE.
8. You agree to notify MinerU of any facts or circumstances of which You become aware that would make these representations inaccurate in any respect.
9. MinerU reserves the right to update or change this Agreement at any time by posting the most current version of the Agreement on MinerU with a new Effective Date (currently Jul. 24th, 2024). All such changes are effective from the Effective Date. Your continued use of MinerU after we post any such changes signifies your agreement to those changes. If you do not agree to the then-current Agreement, you must immediately discontinue using MinerU.
# MinerU2.5
## Paper
`MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing`
- https://arxiv.org/abs/2509.22186
## Model Architecture
MinerU 2.5 focuses on the progress of the VLM backend since version 2.0; the earlier integrated processing scheme (Pipeline) is largely unchanged.
The core innovation of the VLM is a decoupled architecture: a coarse-to-fine, two-stage inference scheme that separates global layout analysis from local content recognition.
In the first stage, the model performs a fast, holistic layout analysis on a downsampled document image.
In the second stage, guided by the detected layout, key regions are cropped from the original high-resolution input and recognized at fine granularity within local windows (a minimal sketch of this flow follows the figure below).
<div align=center>
<img src="./assets/framework.png"/>
</div>
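Below is a minimal Python sketch of this coarse-to-fine flow. The `analyze_layout` and `recognize_region` callables and the `max_side` value are illustrative placeholders, not MinerU's actual internal API.
```
from PIL import Image

def parse_page(page: Image.Image, analyze_layout, recognize_region, max_side=1024):
    # Stage 1: fast, holistic layout analysis on a downsampled view of the page.
    scale = max_side / max(page.size)
    thumb = page.resize((round(page.width * scale), round(page.height * scale)))
    regions = analyze_layout(thumb)  # e.g. [(x0, y0, x1, y1), ...] in thumbnail coords
    # Stage 2: fine-grained recognition on native-resolution crops of each region.
    outputs = []
    for x0, y0, x1, y1 in regions:
        box = tuple(int(v / scale) for v in (x0, y0, x1, y1))
        outputs.append(recognize_region(page.crop(box)))
    return outputs
```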
## Algorithm
The MinerU 2.5 model consists of three components:
- Language model: document parsing places relatively modest demands on large-scale language models. To better handle the varied resolutions and aspect ratios encountered when parsing cropped images, the original 1D-RoPE of the 0.5B-parameter Qwen2-Instruct model is replaced with M-RoPE, improving generalization across resolutions.
- Vision encoder: inspired by Qwen2-VL, MinerU2.5 integrates a native-resolution encoding mechanism. Although the Qwen2.5-VL series adopts window attention for efficiency, that design degrades document-parsing performance, so a 675M-parameter NaViT initialized from Qwen2-VL is used instead. The encoder supports dynamic image resolution and uses 2D-RoPE for positional encoding, allowing it to flexibly handle inputs of various resolutions and aspect ratios.
- Patch merger: to balance efficiency and performance, the architecture applies pixel unshuffle to adjacent 2×2 visual tokens, preprocessing the aggregated tokens before passing them to the language model. This design trades computational efficiency against task performance effectively (see the sketch after this list).
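As a concrete illustration, here is a minimal PyTorch sketch of the 2×2 pixel-unshuffle merge; the token grid size, channel width, and the follow-on projection are assumptions for illustration, not the model's exact implementation.
```
import torch

def merge_2x2_tokens(tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Group each 2x2 neighborhood of visual tokens into one token by
    channel concatenation (pixel unshuffle), cutting the token count 4x."""
    b, n, c = tokens.shape
    assert n == h * w and h % 2 == 0 and w % 2 == 0
    x = tokens.view(b, h // 2, 2, w // 2, 2, c)      # expose the 2x2 blocks
    x = x.permute(0, 1, 3, 2, 4, 5)                  # (b, h/2, w/2, 2, 2, c)
    return x.reshape(b, (h // 2) * (w // 2), 4 * c)  # concat channels per block

# e.g. a 32x32 grid of 1024-d tokens becomes a 16x16 grid of 4096-d tokens;
# a projection layer would then map 4*c down to the LLM hidden size.
merged = merge_2x2_tokens(torch.randn(1, 32 * 32, 1024), 32, 32)
print(merged.shape)  # torch.Size([1, 256, 4096])
```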
## Environment Setup
### Hardware Requirements
DCU model: K100_AI; nodes: 1; cards: 1.
Adjust the `-v` mount path, the container `docker_name`, and the `imageID` in the commands below to match your setup.
### Docker (Option 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.5.1-ubuntu22.04-dtk25.04.2-py3.10
# Replace <your IMAGE ID> with the ID of the image pulled above
docker run -it --name mineru2.5 --shm-size=1024G --device=/dev/kfd --device=/dev/dri/ --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v $PWD/MinerU_pytorch:/home/MinerU2.5_pytorch <your IMAGE ID> /bin/bash
cd /home/MinerU2.5_pytorch
pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
pip install numpy==1.25.0
pip install -e .[all] --no-deps
```
### Dockerfile (Option 2)
```
cd /home/MinerU_pytorch
docker build --no-cache -t mineru2.5:latest .
# Use the image built above (mineru2.5:latest) in place of <your IMAGE ID>
docker run -it --name mineru2.5 --shm-size=1024G --device=/dev/kfd --device=/dev/dri/ --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v $PWD/MinerU_pytorch:/home/MinerU2.5_pytorch <your IMAGE ID> /bin/bash
cd /home/MinerU2.5_pytorch
pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
pip install numpy==1.25.0
pip install -e .[all] --no-deps
```
### Anaconda (Option 3)
1. The specialized deep-learning libraries this project requires for DCU cards can be downloaded from the 光合 developer community:
- https://developer.sourcefind.cn/tool/
```
DTK driver: dtk25.04.2
python: 3.10
torch: 2.5.1
torchvision: 0.20.1
triton: 3.1
flash-attn: 2.6.1
vllm: 0.9.2
lmslim: 0.3.1
```
`Tips: the DTK driver, python, torch, and other DCU-related tool versions above must match exactly.`
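A quick Python sanity check for the pinned versions (a sketch; it assumes the DTK torch build exposes DCU devices through the standard `torch.cuda` interface):
```
import torch
import torchvision

print(torch.__version__)          # expect 2.5.1 (DTK build)
print(torchvision.__version__)    # expect 0.20.1
print(torch.cuda.is_available())  # True if a DCU is visible to this build
```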
2. Install the other, non-specialized libraries. If any specialized deep-learning library is replaced, reinstall the matching versions listed above:
```
cd /home/MinerU2.5_pytorch
pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
pip install numpy==1.25.0
pip install -e .[all] --no-deps
```
## Dataset
`N/A`
## Training
`N/A`
## Inference
Model source configuration:
```
# Set the model download source via an environment variable:
export MINERU_MODEL_SOURCE=modelscope
# To use local models instead, select and download them with the interactive CLI tool:
mineru-models-download --help
# After downloading, the model path is printed in the current terminal and written to mineru.json in the user's home directory
```
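When driving MinerU from Python rather than the shell, the same model source can be set before any MinerU import; this mirrors the commented-out line in the demo script included in this commit:
```
import os

# Equivalent to `export MINERU_MODEL_SOURCE=modelscope` above.
os.environ["MINERU_MODEL_SOURCE"] = "modelscope"
```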
### Single Node, Single Card
```
cd /home/MinerU2.5_pytorch
# Default parsing using pipeline backend
# <input_path>: local PDF/image file or directory
# <output_path>: output directory
HIP_VISIBLE_DEVICES=0 mineru -p <input_path> -o <output_path>
# Or specify vlm backend for parsing
mineru -p <input_path> -o <output_path> -b vlm-transformers
# The vlm backend additionally supports vllm acceleration
mineru -p <input_path> -o <output_path> -b vlm-vllm-engine
# FastAPI calls
# Access http://127.0.0.1:8000/docs in your browser to view the API documentation.
mineru-api --host 0.0.0.0 --port 8000
# Start Gradio WebUI visual frontend
# Access http://127.0.0.1:7860 in your browser to use the Gradio WebUI.
# Using pipeline/vlm-transformers/vlm-http-client backends
mineru-gradio --server-name 0.0.0.0 --server-port 7860
# Or using vlm-vllm-engine/pipeline backends (requires vllm environment)
mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
# Using http-client/server method:
# Start vllm server (requires vllm environment)
mineru-vllm-server --port 30000
# In another terminal, connect to the vllm server via http client (only requires CPU and network, no vllm environment needed)
mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
```
For more details, see [`README_ori`](./README_orgin.md) from the upstream project.
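The same parses can also be driven from Python through the demo script in this commit; a minimal sketch, assuming the script is saved as `demo.py` alongside your code and an `example.pdf` exists:
```
from pathlib import Path
from demo import parse_doc  # the demo script from this commit, saved as demo.py

# Pipeline backend; results are written under ./output/
parse_doc([Path("example.pdf")], "output", backend="pipeline")
# Or connect to a running vllm server:
# parse_doc([Path("example.pdf")], "output", backend="vlm-http-client", server_url="http://127.0.0.1:30000")
```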
## Results
Parsing example:
Layout:
<div align=center>
<img src="./assets/layout.png"/>
</div>
Parsing result:
<div align=center>
<img src="./assets/result.png"/>
</div>
### Accuracy
DCU accuracy is consistent with GPU. Inference frameworks: PyTorch, vLLM.
## Application Scenarios
### Algorithm Category
`OCR`
### Key Application Industries
`Research, education, government, broadcast media`
## Pretrained Weights
ModelScope download: [OpenDataLab/PDF-Extract-Kit-1.0](https://www.modelscope.cn/models/OpenDataLab/PDF-Extract-Kit-1.0)
Model for the transformers/vllm backends: [OpenDataLab/MinerU2.5-2509-1.2B](https://www.modelscope.cn/models/OpenDataLab/MinerU2.5-2509-1.2B)
Note: `for automatic model downloads, using a mirror is recommended: export HF_ENDPOINT=https://hf-mirror.com`
## Source Repository and Feedback
- https://developer.sourcefind.cn/codes/modelzoo/mineru2.5_pytorch
## References
- https://github.com/opendatalab/MinerU
# Security Policy
## Supported Versions
latest
## Reporting a Vulnerability
Please do not report security vulnerabilities through public GitHub issues.
Instead, please report them at https://github.com/opendatalab/MinerU/security.
Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
* Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
* Full paths of source file(s) related to the manifestation of the issue
* The location of the affected source code (tag/branch/commit or direct URL)
* Any special configuration required to reproduce the issue
* Step-by-step instructions to reproduce the issue
* Proof-of-concept or exploit code (if possible)
* Impact of the issue, including how an attacker might exploit the issue
This information will help us triage your report more quickly.
## Preferred Languages
We prefer all communications to be in English or Chinese.
## Policy
We will fix security issues in the project's own code as quickly as possible. Until the fix is released, please do not disclose the vulnerability on any public platform.
# Copyright (c) Opendatalab. All rights reserved.
import copy
import json
import os
from pathlib import Path
from loguru import logger
from mineru.cli.common import convert_pdf_bytes_to_bytes_by_pypdfium2, prepare_env, read_fn
from mineru.data.data_reader_writer import FileBasedDataWriter
from mineru.utils.draw_bbox import draw_layout_bbox, draw_span_bbox
from mineru.utils.enum_class import MakeMode
from mineru.backend.vlm.vlm_analyze import doc_analyze as vlm_doc_analyze
from mineru.backend.pipeline.pipeline_analyze import doc_analyze as pipeline_doc_analyze
from mineru.backend.pipeline.pipeline_middle_json_mkcontent import union_make as pipeline_union_make
from mineru.backend.pipeline.model_json_to_middle_json import result_to_middle_json as pipeline_result_to_middle_json
from mineru.backend.vlm.vlm_middle_json_mkcontent import union_make as vlm_union_make
from mineru.utils.guess_suffix_or_lang import guess_suffix_by_path
def do_parse(
    output_dir,  # Output directory for storing parsing results
    pdf_file_names: list[str],  # List of PDF file names to be parsed
    pdf_bytes_list: list[bytes],  # List of PDF bytes to be parsed
    p_lang_list: list[str],  # List of languages for each PDF, default is 'ch' (Chinese)
    backend="pipeline",  # The backend for parsing PDF, default is 'pipeline'
    parse_method="auto",  # The method for parsing PDF, default is 'auto'
    formula_enable=True,  # Enable formula parsing
    table_enable=True,  # Enable table parsing
    server_url=None,  # Server URL for vlm-http-client backend
    f_draw_layout_bbox=True,  # Whether to draw layout bounding boxes
    f_draw_span_bbox=True,  # Whether to draw span bounding boxes
    f_dump_md=True,  # Whether to dump markdown files
    f_dump_middle_json=True,  # Whether to dump middle JSON files
    f_dump_model_output=True,  # Whether to dump model output files
    f_dump_orig_pdf=True,  # Whether to dump original PDF files
    f_dump_content_list=True,  # Whether to dump content list files
    f_make_md_mode=MakeMode.MM_MD,  # The mode for making markdown content, default is MM_MD
    start_page_id=0,  # Start page ID for parsing, default is 0
    end_page_id=None,  # End page ID for parsing, default is None (parse all pages until the end of the document)
):
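    # Dispatch on backend: "pipeline" runs the classical multi-model pipeline,
    # while any "vlm-*" backend falls through to the VLM analysis path below.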
    if backend == "pipeline":
        for idx, pdf_bytes in enumerate(pdf_bytes_list):
            new_pdf_bytes = convert_pdf_bytes_to_bytes_by_pypdfium2(pdf_bytes, start_page_id, end_page_id)
            pdf_bytes_list[idx] = new_pdf_bytes
        infer_results, all_image_lists, all_pdf_docs, lang_list, ocr_enabled_list = pipeline_doc_analyze(pdf_bytes_list, p_lang_list, parse_method=parse_method, formula_enable=formula_enable, table_enable=table_enable)
        for idx, model_list in enumerate(infer_results):
            model_json = copy.deepcopy(model_list)
            pdf_file_name = pdf_file_names[idx]
            local_image_dir, local_md_dir = prepare_env(output_dir, pdf_file_name, parse_method)
            image_writer, md_writer = FileBasedDataWriter(local_image_dir), FileBasedDataWriter(local_md_dir)
            images_list = all_image_lists[idx]
            pdf_doc = all_pdf_docs[idx]
            _lang = lang_list[idx]
            _ocr_enable = ocr_enabled_list[idx]
            middle_json = pipeline_result_to_middle_json(model_list, images_list, pdf_doc, image_writer, _lang, _ocr_enable, formula_enable)
            pdf_info = middle_json["pdf_info"]
            pdf_bytes = pdf_bytes_list[idx]
            _process_output(
                pdf_info, pdf_bytes, pdf_file_name, local_md_dir, local_image_dir,
                md_writer, f_draw_layout_bbox, f_draw_span_bbox, f_dump_orig_pdf,
                f_dump_md, f_dump_content_list, f_dump_middle_json, f_dump_model_output,
                f_make_md_mode, middle_json, model_json, is_pipeline=True
            )
    else:
        if backend.startswith("vlm-"):
            backend = backend[4:]
        f_draw_span_bbox = False
        parse_method = "vlm"
        for idx, pdf_bytes in enumerate(pdf_bytes_list):
            pdf_file_name = pdf_file_names[idx]
            pdf_bytes = convert_pdf_bytes_to_bytes_by_pypdfium2(pdf_bytes, start_page_id, end_page_id)
            local_image_dir, local_md_dir = prepare_env(output_dir, pdf_file_name, parse_method)
            image_writer, md_writer = FileBasedDataWriter(local_image_dir), FileBasedDataWriter(local_md_dir)
            middle_json, infer_result = vlm_doc_analyze(pdf_bytes, image_writer=image_writer, backend=backend, server_url=server_url)
            pdf_info = middle_json["pdf_info"]
            _process_output(
                pdf_info, pdf_bytes, pdf_file_name, local_md_dir, local_image_dir,
                md_writer, f_draw_layout_bbox, f_draw_span_bbox, f_dump_orig_pdf,
                f_dump_md, f_dump_content_list, f_dump_middle_json, f_dump_model_output,
                f_make_md_mode, middle_json, infer_result, is_pipeline=False
            )
def _process_output(
    pdf_info,
    pdf_bytes,
    pdf_file_name,
    local_md_dir,
    local_image_dir,
    md_writer,
    f_draw_layout_bbox,
    f_draw_span_bbox,
    f_dump_orig_pdf,
    f_dump_md,
    f_dump_content_list,
    f_dump_middle_json,
    f_dump_model_output,
    f_make_md_mode,
    middle_json,
    model_output=None,
    is_pipeline=True
):
"""处理输出文件"""
if f_draw_layout_bbox:
draw_layout_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_layout.pdf")
if f_draw_span_bbox:
draw_span_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_span.pdf")
if f_dump_orig_pdf:
md_writer.write(
f"{pdf_file_name}_origin.pdf",
pdf_bytes,
)
image_dir = str(os.path.basename(local_image_dir))
if f_dump_md:
make_func = pipeline_union_make if is_pipeline else vlm_union_make
md_content_str = make_func(pdf_info, f_make_md_mode, image_dir)
md_writer.write_string(
f"{pdf_file_name}.md",
md_content_str,
)
if f_dump_content_list:
make_func = pipeline_union_make if is_pipeline else vlm_union_make
content_list = make_func(pdf_info, MakeMode.CONTENT_LIST, image_dir)
md_writer.write_string(
f"{pdf_file_name}_content_list.json",
json.dumps(content_list, ensure_ascii=False, indent=4),
)
if f_dump_middle_json:
md_writer.write_string(
f"{pdf_file_name}_middle.json",
json.dumps(middle_json, ensure_ascii=False, indent=4),
)
if f_dump_model_output:
md_writer.write_string(
f"{pdf_file_name}_model.json",
json.dumps(model_output, ensure_ascii=False, indent=4),
)
logger.info(f"local output dir is {local_md_dir}")
def parse_doc(
    path_list: list[Path],
    output_dir,
    lang="ch",
    backend="pipeline",
    method="auto",
    server_url=None,
    start_page_id=0,
    end_page_id=None
):
"""
Parameter description:
path_list: List of document paths to be parsed, can be PDF or image files.
output_dir: Output directory for storing parsing results.
lang: Language option, default is 'ch', optional values include['ch', 'ch_server', 'ch_lite', 'en', 'korean', 'japan', 'chinese_cht', 'ta', 'te', 'ka']。
Input the languages in the pdf (if known) to improve OCR accuracy. Optional.
Adapted only for the case where the backend is set to "pipeline"
backend: the backend for parsing pdf:
pipeline: More general.
vlm-transformers: More general.
vlm-vllm-engine: Faster(engine).
vlm-http-client: Faster(client).
without method specified, pipeline will be used by default.
method: the method for parsing pdf:
auto: Automatically determine the method based on the file type.
txt: Use text extraction method.
ocr: Use OCR method for image-based PDFs.
Without method specified, 'auto' will be used by default.
Adapted only for the case where the backend is set to "pipeline".
server_url: When the backend is `http-client`, you need to specify the server_url, for example:`http://127.0.0.1:30000`
start_page_id: Start page ID for parsing, default is 0
end_page_id: End page ID for parsing, default is None (parse all pages until the end of the document)
"""
    try:
        file_name_list = []
        pdf_bytes_list = []
        lang_list = []
        for path in path_list:
            file_name = str(Path(path).stem)
            pdf_bytes = read_fn(path)
            file_name_list.append(file_name)
            pdf_bytes_list.append(pdf_bytes)
            lang_list.append(lang)
        do_parse(
            output_dir=output_dir,
            pdf_file_names=file_name_list,
            pdf_bytes_list=pdf_bytes_list,
            p_lang_list=lang_list,
            backend=backend,
            parse_method=method,
            server_url=server_url,
            start_page_id=start_page_id,
            end_page_id=end_page_id
        )
    except Exception as e:
        logger.exception(e)
if __name__ == '__main__':
    # args
    __dir__ = os.path.dirname(os.path.abspath(__file__))
    pdf_files_dir = os.path.join(__dir__, "pdfs")
    output_dir = os.path.join(__dir__, "output")
    pdf_suffixes = ["pdf"]
    image_suffixes = ["png", "jpeg", "jp2", "webp", "gif", "bmp", "jpg"]
    doc_path_list = []
    for doc_path in Path(pdf_files_dir).glob('*'):
        if guess_suffix_by_path(doc_path) in pdf_suffixes + image_suffixes:
            doc_path_list.append(doc_path)
    """If you cannot download models due to network issues, set the environment variable MINERU_MODEL_SOURCE to modelscope to download models from a proxy-free mirror."""
    # os.environ['MINERU_MODEL_SOURCE'] = "modelscope"
    """Use pipeline mode if your environment does not support VLM"""
    parse_doc(doc_path_list, output_dir, backend="pipeline")
    """To enable VLM mode, change the backend to 'vlm-xxx'"""
    # parse_doc(doc_path_list, output_dir, backend="vlm-transformers")  # more general.
    # parse_doc(doc_path_list, output_dir, backend="vlm-vllm-engine")  # faster (engine).
    # parse_doc(doc_path_list, output_dir, backend="vlm-http-client", server_url="http://127.0.0.1:30000")  # faster (client).
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.5.1-ubuntu22.04-dtk25.04.2-py3.10
<script type="module" src="https://gradio.s3-us-west-2.amazonaws.com/5.35.0/gradio.js"></script>
<gradio-app src="https://opendatalab-mineru.hf.space"></gradio-app>