Unverified commit 41d96cd8, authored by Xiaomeng Zhao, committed by GitHub

Merge pull request #2065 from opendatalab/release-1.3.0

Release 1.3.0
parents c3d43e52 dd96663c
...@@ -47,6 +47,20 @@ Easier to use: Just grab MinerU Desktop. No coding, no login, just a simple inte
</div>

# Changelog
- 2025/04/03 Release of version 1.3.0, with many changes in this version:
  - Installation and compatibility optimization
    - Replaced the Paddle framework and PaddleOCR throughout the project with paddleocr2torch, resolving conflicts between Paddle and PyTorch.
    - Removed layoutlmv3 from layout detection, resolving compatibility issues caused by `detectron2`.
    - Extended torch version compatibility to 2.2~2.6.
    - Extended CUDA compatibility to 11.8~12.6 (the CUDA version is determined by torch), addressing compatibility issues for some users with 50-series and H-series Nvidia GPUs.
    - Extended Python compatibility to 3.10~3.12, resolving the automatic downgrade to 0.6.1 when installing in non-3.10 environments.
  - Performance optimization (compared to version 1.0.1, formula parsing speed improved by over 1400%, and overall parsing speed improved by over 500%)
    - Improved parsing speed for batches of small PDF files ([script example](demo/batch_demo.py)).
    - Optimized the loading and usage of the mfr model, reducing VRAM usage and improving parsing speed (requires re-running the [model download process](docs/how_to_download_models_en.md) to obtain the incremental model file updates).
    - Optimized memory usage, allowing the project to run with as little as 6GB of VRAM.
    - Improved running speed on MPS devices.
  - Parsing quality optimization
    - Updated the mfr model to unimernet(2503), fixing missing line breaks in multi-line formulas.
- 2025/03/03 1.2.1 released, fixed several bugs:
  - Fixed the impact on punctuation marks during full-width to half-width conversion of letters and numbers
  - Fixed caption matching inaccuracies in certain scenarios
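The compatibility ranges listed above can be sanity-checked before installing. A minimal sketch (the bounds mirror the changelog; the helper name is ours, not part of magic-pdf):

```python
import sys

# Supported interpreter range from the 1.3.0 changelog (inclusive bounds assumed).
PYTHON_MIN = (3, 10)
PYTHON_MAX = (3, 12)

def python_supported(version=None):
    """Return True when the (major, minor) version falls inside the supported range."""
    major, minor = version if version is not None else sys.version_info[:2]
    return PYTHON_MIN <= (major, minor) <= PYTHON_MAX

# Example: 3.9 environments previously triggered an automatic downgrade to 0.6.1.
```

Running a check like this before `pip install` avoids the silent-downgrade failure mode described above.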
...@@ -215,7 +229,7 @@ There are three different ways to experience MinerU:
</tr>
<tr>
    <td colspan="3">Python Version</td>
    <td colspan="3">3.10~3.12</td>
</tr>
<tr>
    <td colspan="3">Nvidia Driver Version</td>
...@@ -225,8 +239,8 @@ There are three different ways to experience MinerU:
</tr>
<tr>
    <td colspan="3">CUDA Environment</td>
    <td>11.8/12.4/12.6</td>
    <td>11.8/12.4/12.6</td>
    <td>None</td>
</tr>
<tr>
...@@ -236,11 +250,11 @@ There are three different ways to experience MinerU:
    <td>None</td>
</tr>
<tr>
    <td rowspan="2">GPU/MPS Hardware Support List</td>
    <td colspan="2">GPU VRAM 6GB or more</td>
    <td colspan="2">All GPUs with Tensor Cores produced from Volta (2017) onwards,<br>
    with more than 6GB of VRAM</td>
    <td rowspan="2">Apple silicon</td>
</tr>
</table>
...@@ -257,9 +271,9 @@ Synced with dev branch updates:

#### 1. Install magic-pdf

```bash
conda create -n mineru 'python<3.13' -y
conda activate mineru
pip install -U "magic-pdf[full]"
```

#### 2. Download model weight files
...@@ -284,7 +298,7 @@ You can modify certain configurations in this file to enable or disable features
{
    // other config
    "layout-config": {
        "model": "doclayout_yolo"
    },
    "formula-config": {
        "mfd_model": "yolo_v8_mfd",
...@@ -292,8 +306,8 @@ You can modify certain configurations in this file to enable or disable features
        "enable": true // Formula recognition is enabled by default; to disable it, change this value to false.
    },
    "table-config": {
        "model": "rapid_table",
        "sub_model": "slanet_plus",
        "enable": true, // Table recognition is enabled by default; to disable it, change this value to false.
        "max_time": 400
    }
...@@ -308,7 +322,7 @@ If your device supports CUDA and meets the GPU requirements of the mainline envi
- [Windows 10/11 + GPU](docs/README_Windows_CUDA_Acceleration_en_US.md)
- Quick Deployment with Docker
> [!IMPORTANT]
> Docker requires a GPU with at least 6GB of VRAM, and all acceleration features are enabled by default.
>
> Before running this Docker image, you can use the following command to check whether your device supports CUDA acceleration in Docker.
>
...@@ -330,7 +344,7 @@ If your device has NPU acceleration hardware, you can follow the tutorial below

### Using MPS

If your device uses Apple silicon chips, you can enable MPS acceleration for your tasks.

You can enable MPS acceleration by setting the `device-mode` parameter to `mps` in the `magic-pdf.json` configuration file.

...@@ -341,10 +355,6 @@ You can enable MPS acceleration by setting the `device-mode` parameter to `mps`
}
```
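Because `magic-pdf.json` is plain JSON (strict parsers reject the `//` comments shown in the excerpt above, which are illustrative only), settings such as `device-mode` can be flipped with a short script. A sketch, assuming the file sits in the home directory as the docs describe:

```python
import json
import os

def set_config_value(config_path, key, value):
    """Rewrite one top-level key (e.g. 'device-mode') in magic-pdf.json."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    config[key] = value
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4, ensure_ascii=False)

# Example (not run here): enable MPS on an Apple-silicon machine.
# set_config_value(os.path.expanduser("~/magic-pdf.json"), "device-mode", "mps")
```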
## Usage
...@@ -418,6 +428,8 @@ This project currently uses PyMuPDF to achieve advanced functionality. However,
- [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy)
- [RapidTable](https://github.com/RapidAI/RapidTable)
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- [RapidOCR](https://github.com/RapidAI/RapidOCR)
- [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch)
- [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
- [layoutreader](https://github.com/ppaanngggg/layoutreader)
- [fast-langdetect](https://github.com/LlmKira/fast-langdetect)
......
...@@ -46,6 +46,21 @@
</div>

# Changelog
- 2025/04/03 Version 1.3.0 released, with many changes in this version:
  - Installation and compatibility optimization
    - Replaced the Paddle framework and PaddleOCR throughout the project with paddleocr2torch, resolving conflicts between Paddle and PyTorch.
    - Removed layoutlmv3 from layout detection, resolving compatibility issues caused by `detectron2`.
    - Extended torch version compatibility to 2.2~2.6.
    - Extended CUDA compatibility to 11.8~12.6 (the CUDA version is determined by torch), addressing compatibility issues for some users with 50-series and H-series Nvidia GPUs.
    - Extended Python compatibility to 3.10~3.12, resolving the automatic downgrade to 0.6.1 when installing in non-3.10 environments.
    - Optimized the offline deployment workflow; once deployment succeeds, no model files need to be downloaded from the network.
  - Performance optimization (compared to version 1.0.1, formula parsing speed improved by over 1400%, and overall parsing speed improved by over 500%)
    - Improved parsing speed for batches of small files by supporting batch processing of multiple PDF files ([script example](demo/batch_demo.py)).
    - Optimized the loading and usage of the mfr model, reducing VRAM usage and improving parsing speed (requires re-running the [model download process](docs/how_to_download_models_zh_cn.md) to obtain the incremental model file updates).
    - Optimized VRAM usage; the project can run with as little as 6GB.
    - Improved running speed on MPS devices.
  - Parsing quality optimization
    - Updated the mfr model to unimernet(2503), fixing missing line breaks in multi-line formulas.
- 2025/03/03 1.2.1 released, fixing several issues:
  - Fixed the impact on punctuation marks during full-width to half-width conversion of letters and numbers
  - Fixed inaccurate caption matching in certain scenarios
...@@ -216,7 +231,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
</tr>
<tr>
    <td colspan="3">Python Version</td>
    <td colspan="3">>=3.9,<=3.12</td>
</tr>
<tr>
    <td colspan="3">Nvidia Driver Version</td>
...@@ -226,8 +241,8 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
</tr>
<tr>
    <td colspan="3">CUDA Environment</td>
    <td>11.8/12.4/12.6</td>
    <td>11.8/12.4/12.6</td>
    <td>None</td>
</tr>
<tr>
...@@ -237,12 +252,12 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
    <td>None</td>
</tr>
<tr>
    <td rowspan="2">GPU/MPS Hardware Support List</td>
    <td colspan="2">6GB VRAM or more</td>
    <td colspan="2">All GPUs with Tensor Cores produced from Volta (2017) onwards,<br>
    with 6GB of VRAM or more</td>
    <td rowspan="2">Apple silicon</td>
</tr>
</table>
...@@ -262,9 +277,9 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c

> Syncing of the latest version to domestic (Chinese) mirror sources may be delayed; please be patient.

```bash
conda create -n mineru 'python<3.13' -y
conda activate mineru
pip install -U "magic-pdf[full]" -i https://mirrors.aliyun.com/pypi/simple
```

#### 2. Download model weight files
...@@ -288,7 +303,7 @@ pip install -U "magic-pdf[full]" -i https://mirrors.aliyun.com/pypi/simple
{
    // other config
    "layout-config": {
        "model": "doclayout_yolo"
    },
    "formula-config": {
        "mfd_model": "yolo_v8_mfd",
...@@ -296,8 +311,8 @@ pip install -U "magic-pdf[full]" -i https://mirrors.aliyun.com/pypi/simple
        "enable": true // Formula recognition is enabled by default; to disable it, change this value to false.
    },
    "table-config": {
        "model": "rapid_table",
        "sub_model": "slanet_plus",
        "enable": true, // Table recognition is enabled by default; to disable it, change this value to false.
        "max_time": 400
    }
...@@ -312,7 +327,7 @@ pip install -U "magic-pdf[full]" -i https://mirrors.aliyun.com/pypi/simple
- [Windows 10/11 + GPU](docs/README_Windows_CUDA_Acceleration_zh_CN.md)
- Quick deployment with Docker
> [!IMPORTANT]
> Docker requires a GPU with at least 6GB of VRAM, and all acceleration features are enabled by default.
>
> Before running this Docker image, you can use the following command to check whether your device supports CUDA acceleration in Docker.
>
...@@ -332,7 +347,7 @@ pip install -U "magic-pdf[full]" -i https://mirrors.aliyun.com/pypi/simple

[NPU Acceleration Tutorial](docs/README_Ascend_NPU_Acceleration_zh_CN.md)

### Using MPS

If your device uses an Apple silicon chip, you can enable MPS acceleration:

You can enable MPS acceleration by setting the `device-mode` parameter to `mps` in the `magic-pdf.json` configuration file.

...@@ -343,10 +358,6 @@ pip install -U "magic-pdf[full]" -i https://mirrors.aliyun.com/pypi/simple
}
```
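The device choice described above follows a simple precedence; a hedged sketch of that logic (the boolean probes stand in for `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, which a real setup script would consult):

```python
def pick_device_mode(cuda_available, mps_available):
    """Choose a `device-mode` value: CUDA first, then Apple-silicon MPS, else CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# An Apple-silicon machine without CUDA resolves to "mps".
```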
## Usage
...@@ -422,6 +433,8 @@ TODO
- [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy)
- [RapidTable](https://github.com/RapidAI/RapidTable)
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- [RapidOCR](https://github.com/RapidAI/RapidOCR)
- [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch)
- [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
- [layoutreader](https://github.com/ppaanngggg/layoutreader)
- [fast-langdetect](https://github.com/LlmKira/fast-langdetect)
......
import os
from pathlib import Path

from magic_pdf.data.batch_build_dataset import batch_build_dataset
from magic_pdf.tools.common import batch_do_parse


def batch(pdf_dir, output_dir, method, lang):
    os.makedirs(output_dir, exist_ok=True)
    doc_paths = [doc_path for doc_path in Path(pdf_dir).glob('*') if doc_path.suffix == '.pdf']

    # build the datasets with 4 parallel workers
    datasets = batch_build_dataset(doc_paths, 4, lang)
    # os.environ["MINERU_MIN_BATCH_INFERENCE_SIZE"] = "200"  # parse every 200 pages as one batch
    batch_do_parse(output_dir, [str(doc_path.stem) for doc_path in doc_paths], datasets, method)


if __name__ == '__main__':
    batch("pdfs", "output", "auto", "")
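The commented `MINERU_MIN_BATCH_INFERENCE_SIZE` variable above controls how many pages go into one inference batch. The packing itself can be sketched standalone (the function name and page counts are illustrative, not part of the magic-pdf API):

```python
def chunk_pages(page_counts, batch_size=200):
    """Greedily pack documents into batches of roughly `batch_size` pages."""
    batches, current, current_pages = [], [], 0
    for doc_id, pages in enumerate(page_counts):
        # flush the current batch once adding this document would overflow it
        if current and current_pages + pages > batch_size:
            batches.append(current)
            current, current_pages = [], 0
        current.append(doc_id)
        current_pages += pages
    if current:
        batches.append(current)
    return batches

# Ten 30-page PDFs with batch_size=200 pack into two batches: 6 docs + 4 docs.
```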
...@@ -7,18 +7,17 @@ from magic_pdf.model.doc_analyze_by_custom_model import doc_analyze
from magic_pdf.config.enums import SupportedPdfParseMethod

# args
__dir__ = os.path.dirname(os.path.abspath(__file__))
pdf_file_name = os.path.join(__dir__, "pdfs", "demo1.pdf")  # replace with the real pdf path
name_without_extension = os.path.basename(pdf_file_name).split('.')[0]

# prepare env
local_image_dir = os.path.join(__dir__, "output", name_without_extension, "images")
local_md_dir = os.path.join(__dir__, "output", name_without_extension)
image_dir = str(os.path.basename(local_image_dir))

os.makedirs(local_image_dir, exist_ok=True)

image_writer, md_writer = FileBasedDataWriter(local_image_dir), FileBasedDataWriter(local_md_dir)

# read bytes
reader1 = FileBasedDataReader("")
...@@ -41,32 +40,29 @@ else:

## pipeline
pipe_result = infer_result.pipe_txt_mode(image_writer)

### get model inference result
model_inference_result = infer_result.get_infer_res()

### draw layout result on each page
pipe_result.draw_layout(os.path.join(local_md_dir, f"{name_without_extension}_layout.pdf"))

### draw spans result on each page
pipe_result.draw_span(os.path.join(local_md_dir, f"{name_without_extension}_spans.pdf"))

### get markdown content
md_content = pipe_result.get_markdown(image_dir)

### dump markdown
pipe_result.dump_md(md_writer, f"{name_without_extension}.md", image_dir)

### get content list content
content_list_content = pipe_result.get_content_list(image_dir)

### dump content list
pipe_result.dump_content_list(md_writer, f"{name_without_extension}_content_list.json", image_dir)

### get middle json
middle_json_content = pipe_result.get_middle_json()

### dump middle json
pipe_result.dump_middle_json(md_writer, f'{name_without_extension}_middle.json')
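The script above writes all of its artifacts under `output/<pdf name>/`. That path derivation can be shown on its own (stdlib only; the helper name is ours, and the artifact names mirror the script):

```python
import os

def output_paths(pdf_path, output_root="output"):
    """Map a PDF path to the artifact paths the demo script produces."""
    name = os.path.basename(pdf_path).split(".")[0]
    base = os.path.join(output_root, name)
    return {
        "markdown": os.path.join(base, f"{name}.md"),
        "layout_pdf": os.path.join(base, f"{name}_layout.pdf"),
        "spans_pdf": os.path.join(base, f"{name}_spans.pdf"),
        "content_list": os.path.join(base, f"{name}_content_list.json"),
        "middle_json": os.path.join(base, f"{name}_middle.json"),
        "images_dir": os.path.join(base, "images"),
    }
```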
...@@ -34,10 +34,9 @@ RUN python3 -m venv /opt/mineru_venv
RUN /bin/bash -c "source /opt/mineru_venv/bin/activate && \
    pip3 install --upgrade pip -i https://mirrors.aliyun.com/pypi/simple && \
    wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/ascend_npu/requirements.txt -O requirements.txt && \
    pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple && \
    wget https://gitee.com/ascend/pytorch/releases/download/v6.0.rc2-pytorch2.3.1/torch_npu-2.3.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl && \
    pip3 install torch_npu-2.3.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl"

# Copy the configuration file template and install magic-pdf latest
RUN /bin/bash -c "wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/magic-pdf.template.json && \
......
boto3>=1.28.43
Brotli>=1.1.0
click>=8.1.7
PyMuPDF>=1.24.9,<1.25.0
loguru>=0.6.0
numpy>=1.21.6,<2.0.0
fast-langdetect>=0.2.3,<0.3.0
scikit-learn>=1.0.2
pdfminer.six==20231228
torch==2.3.1
torchvision==0.18.1
matplotlib
ultralytics>=8.3.48
rapid-table>=1.0.3,<2.0.0
doclayout-yolo==0.0.2b1
openai
pydantic>=2.7.2,<2.11
transformers>=4.49.0,<5.0.0
tqdm>=4.67.1
\ No newline at end of file
...@@ -31,8 +31,7 @@ RUN python3 -m venv /opt/mineru_venv
RUN /bin/bash -c "source /opt/mineru_venv/bin/activate && \
    pip3 install --upgrade pip -i https://mirrors.aliyun.com/pypi/simple && \
    wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/china/requirements.txt -O requirements.txt && \
    pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple"

# Copy the configuration file template and install magic-pdf latest
RUN /bin/bash -c "wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/magic-pdf.template.json && \
......
boto3>=1.28.43
Brotli>=1.1.0
click>=8.1.7
PyMuPDF>=1.24.9,<1.25.0
loguru>=0.6.0
numpy>=1.21.6,<2.0.0
fast-langdetect>=0.2.3,<0.3.0
scikit-learn>=1.0.2
pdfminer.six==20231228
torch>=2.2.2,!=2.5.0,!=2.5.1,<=2.6.0
torchvision
matplotlib
ultralytics>=8.3.48
rapid-table>=1.0.3,<2.0.0
doclayout-yolo==0.0.2b1
openai
pydantic>=2.7.2,<2.11
transformers>=4.49.0,<5.0.0
tqdm>=4.67.1
\ No newline at end of file
...@@ -31,8 +31,7 @@ RUN python3 -m venv /opt/mineru_venv
RUN /bin/bash -c "source /opt/mineru_venv/bin/activate && \
    pip3 install --upgrade pip && \
    wget https://github.com/opendatalab/MinerU/raw/master/docker/global/requirements.txt -O requirements.txt && \
    pip3 install -r requirements.txt"

# Copy the configuration file template and install magic-pdf latest
RUN /bin/bash -c "wget https://github.com/opendatalab/MinerU/raw/master/magic-pdf.template.json && \
......
boto3>=1.28.43
Brotli>=1.1.0
click>=8.1.7
PyMuPDF>=1.24.9,<1.25.0
loguru>=0.6.0
numpy>=1.21.6,<2.0.0
fast-langdetect>=0.2.3,<0.3.0
scikit-learn>=1.0.2
pdfminer.six==20231228
torch>=2.2.2,!=2.5.0,!=2.5.1,<=2.6.0
torchvision
matplotlib
ultralytics>=8.3.48
rapid-table>=1.0.3,<2.0.0
doclayout-yolo==0.0.2b1
openai
pydantic>=2.7.2,<2.11
transformers>=4.49.0,<5.0.0
tqdm>=4.67.1
\ No newline at end of file
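The torch pin above (`torch>=2.2.2,!=2.5.0,!=2.5.1,<=2.6.0`) excludes the 2.5.x releases while allowing everything else in the 2.2–2.6 range. A toy evaluator for this comma-separated subset of pip's specifier grammar (real tooling should use the `packaging` library instead; `satisfies` is our name):

```python
import operator

# Two-character operators must be tried before one-character ones.
OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
       "==": operator.eq, ">": operator.gt, "<": operator.lt}

def parse_version(text):
    return tuple(int(part) for part in text.split("."))

def satisfies(version, specifier):
    """Check a version string (e.g. '2.5.1') against a comma-separated specifier."""
    v = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for op in sorted(OPS, key=len, reverse=True):
            if clause.startswith(op):
                if not OPS[op](v, parse_version(clause[len(op):])):
                    return False
                break
    return True

# torch 2.5.1 is excluded by the pin above; 2.6.0 is allowed.
```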
...@@ -9,11 +9,11 @@ nvidia-smi
If you see information similar to the following, it means that the NVIDIA drivers are already installed, and you can skip Step 2.
> [!NOTE]
> `CUDA Version` should be >= 12.4. If the displayed version number is less than 12.4, please upgrade the driver.

```plaintext
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 572.83        CUDA Version: 12.8    |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 TCC/WDDM      | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf           Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
...@@ -31,7 +31,7 @@ If no driver is installed, use the following command:

```sh
sudo apt-get update
sudo apt-get install nvidia-driver-570-server
```

Install the proprietary driver and restart your computer after installation.
...@@ -53,17 +53,15 @@ In the final step, enter `yes`, close the terminal, and reopen it.

### 4. Create an Environment Using Conda

```bash
conda create -n mineru 'python<3.13' -y
conda activate mineru
```

### 5. Install Applications

```sh
pip install -U magic-pdf[full]
```

> [!IMPORTANT]
> After installation, make sure to check the version of `magic-pdf` using the following command:
...@@ -72,7 +70,7 @@ pip install -U magic-pdf[full]
> magic-pdf --version
> ```
>
> If the version number is less than 1.3.0, please report the issue.
### 6. Download Models
...@@ -94,13 +92,13 @@ You can find the `magic-pdf.json` file in your user directory.
Download a sample file from the repository and test it.

```sh
wget https://github.com/opendatalab/MinerU/raw/master/demo/pdfs/small_ocr.pdf
magic-pdf -p small_ocr.pdf -o ./output
```
### 9. Test CUDA Acceleration

If your graphics card has at least **6GB** of VRAM, follow these steps to test CUDA acceleration:

1. Modify the value of `"device-mode"` in the `magic-pdf.json` configuration file located in your home directory.
```json
...@@ -111,15 +109,4 @@ If your graphics card has at least **8GB** of VRAM, follow these steps to test C
2. Test CUDA acceleration with the following command:
```sh
magic-pdf -p small_ocr.pdf -o ./output
```
\ No newline at end of file
...@@ -9,11 +9,11 @@ nvidia-smi ...@@ -9,11 +9,11 @@ nvidia-smi
If you see output similar to the following, the NVIDIA driver is already installed and you can skip step 2.

> [!NOTE]
> The version shown for `CUDA Version` should be >= 12.4; if the version shown is lower than 12.4, please upgrade your driver.

```plaintext
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 572.83         CUDA Version: 12.8   |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
```
```bash
sudo apt-get update
sudo apt-get install nvidia-driver-570-server
```

This installs the proprietary driver; reboot the machine once installation completes.
## 4. Create an Environment Using conda

```bash
conda create -n mineru 'python<3.13' -y
conda activate mineru
```

## 5. Install the Application

```bash
pip install -U magic-pdf[full] -i https://mirrors.aliyun.com/pypi/simple
```
> [!IMPORTANT]
> After installation, verify the version with:
>
> ```bash
> magic-pdf --version
> ```
>
> If the version number is less than 1.3.0, please report it to us in the issues.
## 6. Download Models

Download a sample file from the repository and test it.

```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/demo/pdfs/small_ocr.pdf
magic-pdf -p small_ocr.pdf -o ./output
```
## 9. Test CUDA Acceleration

If your graphics card has at least **6GB** of VRAM, you can run the following steps to test CUDA-accelerated parsing.

**1. Modify the value of `"device-mode"` in the `magic-pdf.json` configuration file in your user directory**

**2. Run the following command to test CUDA acceleration**

```bash
magic-pdf -p small_ocr.pdf -o ./output
```
> [!TIP]
> You can roughly judge whether CUDA acceleration is working from the per-stage cost times in the log; normally, running with CUDA acceleration is faster than on CPU.
### 1. Install CUDA and cuDNN

You need to install a CUDA version that is compatible with torch's requirements. Currently, torch supports CUDA 11.8/12.4/12.6.

- CUDA 11.8: https://developer.nvidia.com/cuda-11-8-0-download-archive
- CUDA 12.4: https://developer.nvidia.com/cuda-12-4-0-download-archive
- CUDA 12.6: https://developer.nvidia.com/cuda-12-6-0-download-archive
### 2. Install Anaconda
### 3. Create an Environment Using Conda

```bash
conda create -n mineru 'python<3.13' -y
conda activate mineru
```
### 4. Install Applications

```
pip install -U magic-pdf[full]
```
> [!IMPORTANT]
> After installation, verify the version with:
>
> ```
> magic-pdf --version
> ```
>
> If the version number is less than 1.3.0, please report it in the issues section.
### 5. Download Models

You can find the `magic-pdf.json` file in your user directory.

Download a sample file from the repository and test it.

```powershell
wget https://github.com/opendatalab/MinerU/raw/master/demo/pdfs/small_ocr.pdf -O small_ocr.pdf
magic-pdf -p small_ocr.pdf -o ./output
```
### 8. Test CUDA Acceleration

If your graphics card has at least 6GB of VRAM, follow these steps to test CUDA-accelerated parsing performance.

1. **Overwrite the installation of torch and torchvision** with CUDA support. (Select the appropriate index-url for your CUDA version; for details, refer to the [PyTorch official website](https://pytorch.org/get-started/locally/).)

```
pip install --force-reinstall torch==2.6.0 torchvision==0.21.1 "numpy<2.0.0" --index-url https://download.pytorch.org/whl/cu124
```
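The `--index-url` suffix simply tracks the CUDA version (`cu118`, `cu124`, `cu126`). As an illustration only — the `torch_index_url` helper and its table below are hypothetical, not part of magic-pdf or torch — the mapping can be sketched as:

```python
# Hypothetical helper: map an installed CUDA version to the matching
# PyTorch wheel index URL. The table mirrors the CUDA versions that
# torch currently ships wheels for (11.8 / 12.4 / 12.6).
CUDA_WHEEL_TAGS = {'11.8': 'cu118', '12.4': 'cu124', '12.6': 'cu126'}


def torch_index_url(cuda_version):
    tag = CUDA_WHEEL_TAGS.get(cuda_version)
    if tag is None:
        raise ValueError(f'no prebuilt torch wheels for CUDA {cuda_version}')
    return f'https://download.pytorch.org/whl/{tag}'


print(torch_index_url('12.4'))  # → https://download.pytorch.org/whl/cu124
```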
2. **Modify the value of `"device-mode"`** in the `magic-pdf.json` configuration file located in your user directory.

3. **Run the following command to test CUDA acceleration**:

```
magic-pdf -p small_ocr.pdf -o ./output
```
## 1. Install CUDA and cuDNN

Install a CUDA version that meets torch's requirements; torch currently supports CUDA 11.8/12.4/12.6.

- CUDA 11.8: https://developer.nvidia.com/cuda-11-8-0-download-archive
- CUDA 12.4: https://developer.nvidia.com/cuda-12-4-0-download-archive
- CUDA 12.6: https://developer.nvidia.com/cuda-12-6-0-download-archive
## 2. Install Anaconda
## 3. Create an Environment Using conda

```bash
conda create -n mineru 'python<3.13' -y
conda activate mineru
```
## 4. Install the Application

```bash
pip install -U magic-pdf[full] -i https://mirrors.aliyun.com/pypi/simple
```
> [!IMPORTANT]
> After installation, verify the version with:
>
> ```bash
> magic-pdf --version
> ```
>
> If the version number is less than 1.3.0, please report it to us in the issues.
## 5. Download Models

Download a sample file from the repository and test it.

```powershell
wget https://github.com/opendatalab/MinerU/raw/master/demo/pdfs/small_ocr.pdf -O small_ocr.pdf
magic-pdf -p small_ocr.pdf -o ./output
```
## 8. Test CUDA Acceleration

If your graphics card has at least **6GB** of VRAM, you can run the following steps to test CUDA-accelerated parsing.

**1. Overwrite the installation of torch and torchvision with CUDA support** (select the appropriate index-url for your CUDA version; for details, refer to the [PyTorch official website](https://pytorch.org/get-started/locally/))

```bash
pip install --force-reinstall torch==2.6.0 torchvision==0.21.1 "numpy<2.0.0" --index-url https://download.pytorch.org/whl/cu124
```

**2. Modify the value of `"device-mode"` in the `magic-pdf.json` configuration file in your user directory**

**3. Run the following command to test CUDA acceleration**

```bash
magic-pdf -p small_ocr.pdf -o ./output
```
> [!TIP]
> You can roughly judge whether CUDA acceleration is working from the per-stage times in the log; normally, running with CUDA acceleration is faster than on CPU.
"enable": false "enable": false
} }
}, },
"config_version": "1.1.1" "config_version": "1.2.0"
} }
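For reference, a minimal `magic-pdf.json` shape consistent with the fragment above — everything other than `"config_version"` and the nested `"enable"` flag is illustrative, so check the file actually generated in your user directory rather than copying this verbatim:

```json
{
  "device-mode": "cpu",
  "table-config": {
    "enable": false
  },
  "config_version": "1.2.0"
}
```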
import concurrent.futures

import fitz  # PyMuPDF

from magic_pdf.data.dataset import PymuDocDataset
from magic_pdf.data.utils import fitz_doc_to_image
def partition_array_greedy(arr, k):
    """Partition an array of jobs into k parts using a simple greedy approach.

    Parameters:
    -----------
    arr : list of tuple
        The input jobs as (pdf_path, page_count) tuples; partitions are
        balanced on the page count (the second element)
    k : int
        Number of partitions to create

    Returns:
    --------
    partitions : list of lists
        The k partitions, each holding indices into arr
    """
    # Handle edge cases
    if k <= 0:
        raise ValueError('k must be a positive integer')
    if k > len(arr):
        k = len(arr)  # Adjust k if it's too large
    if k == 1:
        return [list(range(len(arr)))]
    if k == len(arr):
        return [[i] for i in range(len(arr))]

    # Sort the job indices by page count in descending order
    sorted_indices = sorted(range(len(arr)), key=lambda i: arr[i][1], reverse=True)

    # Initialize k empty partitions
    partitions = [[] for _ in range(k)]
    partition_sums = [0] * k

    # Assign each job to the partition with the smallest current sum
    for idx in sorted_indices:
        # Find the partition with the smallest sum
        min_sum_idx = partition_sums.index(min(partition_sums))

        # Add the job to this partition
        partitions[min_sum_idx].append(idx)  # Store the original index
        partition_sums[min_sum_idx] += arr[idx][1]

    return partitions
def process_pdf_batch(pdf_jobs, idx):
    """Render every page of a batch of PDFs to images.

    Parameters:
    -----------
    pdf_jobs : list of tuples
        List of (pdf_path, page_count) tuples
    idx : int
        Index of this partition, returned alongside the images so the
        caller can reassemble results in order

    Returns:
    --------
    (idx, images) : tuple
        The partition index and, for each PDF, the list of rendered page images
    """
    images = []

    for pdf_path, _ in pdf_jobs:
        doc = fitz.open(pdf_path)
        tmp = []
        for page_num in range(len(doc)):
            page = doc[page_num]
            tmp.append(fitz_doc_to_image(page))
        images.append(tmp)
    return (idx, images)
def batch_build_dataset(pdf_paths, k, lang=None):
    """Process multiple PDFs by partitioning them into k balanced parts and
    rendering each part in a separate worker process.

    Parameters:
    -----------
    pdf_paths : list
        List of paths to PDF files
    k : int
        Number of partitions (worker processes) to create
    lang : str or None
        Language hint passed through to PymuDocDataset

    Returns:
    --------
    results : list
        One PymuDocDataset per input PDF, in the original input order
    """
    # Get page counts for each PDF
    pdf_info = []
    total_pages = 0

    for pdf_path in pdf_paths:
        try:
            doc = fitz.open(pdf_path)
            num_pages = len(doc)
            pdf_info.append((pdf_path, num_pages))
            total_pages += num_pages
            doc.close()
        except Exception as e:
            print(f'Error opening {pdf_path}: {e}')

    # Partition the PDFs into k parts balanced by page count
    partitions = partition_array_greedy(pdf_info, k)

    # Process each partition in parallel
    all_images_h = {}

    with concurrent.futures.ProcessPoolExecutor(max_workers=k) as executor:
        # Submit one task per partition
        futures = []
        for sn, partition in enumerate(partitions):
            # Get the jobs for this partition
            partition_jobs = [pdf_info[idx] for idx in partition]

            # Submit the task
            future = executor.submit(
                process_pdf_batch,
                partition_jobs,
                sn
            )
            futures.append(future)

        # Collect results as they complete
        for i, future in enumerate(concurrent.futures.as_completed(futures)):
            try:
                idx, images = future.result()
                all_images_h[idx] = images
            except Exception as e:
                print(f'Error processing partition: {e}')

    # Rebuild one dataset per PDF, restoring the original input order
    results = [None] * len(pdf_paths)
    for i in range(len(partitions)):
        partition = partitions[i]
        for j in range(len(partition)):
            with open(pdf_info[partition[j]][0], 'rb') as f:
                pdf_bytes = f.read()
            dataset = PymuDocDataset(pdf_bytes, lang=lang)
            dataset.set_images(all_images_h[i][j])
            results[partition[j]] = dataset
    return results
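The greedy balancing that `batch_build_dataset` relies on can be exercised on its own. The sketch below restates the algorithm without the edge-case guards so it runs standalone (the file names are made up; only the page counts matter):

```python
# Standalone sketch of the greedy balancing used by batch_build_dataset:
# sort jobs by page count (descending), then always hand the next job to
# the partition with the smallest running page total. Edge-case guards
# (k <= 0, k >= len(jobs)) are omitted here for brevity.
def greedy_partition(jobs, k):
    order = sorted(range(len(jobs)), key=lambda i: jobs[i][1], reverse=True)
    parts = [[] for _ in range(k)]
    sums = [0] * k
    for idx in order:
        target = sums.index(min(sums))  # least-loaded partition so far
        parts[target].append(idx)
        sums[target] += jobs[idx][1]
    return parts


# Hypothetical file names; two workers end up with 110 and 100 pages.
pdf_info = [('a.pdf', 100), ('b.pdf', 30), ('c.pdf', 70), ('d.pdf', 10)]
print(greedy_partition(pdf_info, 2))  # → [[0, 3], [2, 1]]
```

Worker outputs are keyed by partition index, which is how `batch_build_dataset` maps the rendered images back onto the original input order afterwards.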