@@ -48,251 +51,344 @@ Easier to use: Just grab MinerU Desktop. No coding, no login, just a simple inte
</div>
# Changelog
- 2025/05/24 1.3.12 Released
- Added support for ppocrv5 model, updated `ch_server` model to `PP-OCRv5_rec_server` and `ch_lite` model to `PP-OCRv5_rec_mobile` (model update required)
- In testing, we found that ppocrv5(server) shows some improvement for handwritten documents, but slightly lower accuracy than v4_server_doc for other document types. Therefore, the default ch model remains unchanged as `PP-OCRv4_server_rec_doc`.
- Since ppocrv5 enhances recognition of handwritten text and special characters, you can manually select the ppocrv5 models for mixed Japanese and Traditional Chinese documents and for handwritten documents.
- You can select the appropriate model through the lang parameter `lang='ch_server'` (python api) or `--lang ch_server` (command line):
- `ch`: `PP-OCRv4_rec_server_doc` (default) (Chinese, English, Japanese, Traditional Chinese mixed/15k dictionary)
- `ch_server`: `PP-OCRv5_rec_server` (Chinese, English, Japanese, Traditional Chinese mixed + handwriting/18k dictionary)
- `ch_lite`: `PP-OCRv5_rec_mobile` (Chinese, English, Japanese, Traditional Chinese mixed + handwriting/18k dictionary)
- `ch_server_v4`: `PP-OCRv4_rec_server` (Chinese, English mixed/6k dictionary)
- `ch_lite_v4`: `PP-OCRv4_rec_mobile` (Chinese, English mixed/6k dictionary)
- Added support for handwritten documents by optimizing layout recognition of handwritten text areas
- This feature is supported by default, no additional configuration needed
- You can refer to the instructions above to manually select ppocrv5 model for better handwritten document parsing
- The demos on `huggingface` and `modelscope` have been updated to support handwriting recognition and ppocrv5 models, which you can experience online
- 2025/04/29 1.3.10 Released
- Added support for custom formula delimiters, configured by modifying the `latex-delimiter-config` item in the `magic-pdf.json` file under the user directory.
- 2025/04/27 1.3.9 Released
- Optimized the formula parsing function to improve the success rate of formula rendering
- 2025/04/23 1.3.8 Released
- The default `ocr` model (`ch`) has been updated to `PP-OCRv4_server_rec_doc` (model update required)
- `PP-OCRv4_server_rec_doc` is trained on a mix of more Chinese document data and PP-OCR training data, enhancing recognition capabilities for some traditional Chinese characters, Japanese, and special characters. It supports over 15,000 recognizable characters, improving text recognition in documents while also boosting general text recognition.
- [Performance comparison between PP-OCRv4_server_rec_doc, PP-OCRv4_server_rec, and PP-OCRv4_mobile_rec](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/text_recognition.html#ii-supported-model-list)
- Verified results show that the `PP-OCRv4_server_rec_doc` model significantly improves accuracy in both single-language (`Chinese`, `English`, `Japanese`, `Traditional Chinese`) and mixed-language scenarios, with speed comparable to `PP-OCRv4_server_rec`, making it suitable for most use cases.
- In a small number of pure English scenarios, the `PP-OCRv4_server_rec_doc` model may run words together, whereas `PP-OCRv4_server_rec` performs better in such cases. Therefore, we have retained the `PP-OCRv4_server_rec` model, which users can invoke by passing the parameter `lang='ch_server'` (python api) or `--lang ch_server` (command line).
- 2025/04/22 1.3.7 Released
- Fixed the issue where the `lang` parameter was ineffective during table parsing model initialization.
- Fixed the significant slowdown in OCR and table parsing speed in `cpu` mode.
- 2025/04/16 1.3.4 Released
- Slightly improved the speed of OCR detection by removing some unused blocks.
- Fixed page-level sorting errors caused by footnotes in certain cases.
- 2025/04/12 1.3.2 released
- Fixed the issue of incompatible dependency package versions when installing in Python 3.13 environment on Windows systems.
- Optimized memory usage during batch inference.
- Improved the parsing effect of tables rotated by 90 degrees.
- Enhanced the parsing accuracy for large tables in financial report samples.
- Fixed the occasional word concatenation issue in English text areas when the OCR language is not specified (model update required).
- 2025/04/08 1.3.1 released, fixed some compatibility issues
- Supported Python 3.13
- Made the final adaptation for some outdated Linux systems (e.g., CentOS 7), and no further support will be guaranteed for subsequent versions. [Installation Instructions](https://github.com/opendatalab/MinerU/issues/1004)
- 2025/04/03 1.3.0 released, in this version we made many optimizations and improvements:
- Installation and compatibility optimization
- By removing the use of `layoutlmv3` in layout, resolved compatibility issues caused by `detectron2`.
- Torch version compatibility extended to 2.2~2.6 (excluding 2.5).
- CUDA compatibility supports 11.8/12.4/12.6/12.8 (CUDA version determined by torch), resolving compatibility issues for some users with 50-series and H-series GPUs.
- Python compatible versions expanded to 3.10~3.12, solving the problem of automatic downgrade to 0.6.1 during installation in non-3.10 environments.
- Offline deployment process optimized; no internet connection required after successful deployment to download any model files.
- Performance optimization
- By supporting batch processing of multiple PDF files ([script example](demo/batch_demo.py)), improved parsing speed for small files in batches (compared to version 1.0.1, formula parsing speed increased by over 1400%, overall parsing speed increased by over 500%).
- Optimized loading and usage of the mfr model, reducing GPU memory usage and improving parsing speed (requires re-execution of the [model download process](docs/how_to_download_models_en.md) to obtain incremental updates of model files).
- Optimized GPU memory usage, requiring only a minimum of 6GB to run this project.
- Improved running speed on MPS devices.
- Parsing effect optimization
- Updated the mfr model to `unimernet(2503)`, solving the issue of lost line breaks in multi-line formulas.
- Usability Optimization
- By using `paddleocr2torch`, completely replaced the use of the `paddle` framework and `paddleocr` in the project, resolving conflicts between `paddle` and `torch`, as well as thread safety issues caused by the `paddle` framework.
- Added a real-time progress bar during the parsing process to accurately track progress, making the wait less painful.
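
The `lang` values listed in the 1.3.12 notes above can be kept as a small lookup table when scripting runs. The sketch below is illustrative only: `OCR_MODELS`, `pick_lang`, and `lang_cli_args` are hypothetical helper names and not part of the project; only the keys, the model names, and the `--lang` flag are taken from the notes.

```python
# Hypothetical helpers; keys and model names are copied from the 1.3.12 notes.
OCR_MODELS = {
    "ch": "PP-OCRv4_rec_server_doc",        # default, 15k dictionary
    "ch_server": "PP-OCRv5_rec_server",     # adds handwriting, 18k dictionary
    "ch_lite": "PP-OCRv5_rec_mobile",       # adds handwriting, 18k dictionary
    "ch_server_v4": "PP-OCRv4_rec_server",  # Chinese/English, 6k dictionary
    "ch_lite_v4": "PP-OCRv4_rec_mobile",    # Chinese/English, 6k dictionary
}


def pick_lang(handwritten: bool) -> str:
    """Follow the notes: ppocrv5 for handwriting, the v4 doc model otherwise."""
    return "ch_server" if handwritten else "ch"


def lang_cli_args(lang: str = "ch") -> list[str]:
    """Return the command-line arguments that select the given OCR model."""
    if lang not in OCR_MODELS:
        raise ValueError(f"unknown lang {lang!r}; expected one of {sorted(OCR_MODELS)}")
    return ["--lang", lang]


if __name__ == "__main__":
    lang = pick_lang(handwritten=True)
    print(lang, "->", OCR_MODELS[lang], lang_cli_args(lang))
```

Keeping the table in one place makes it easy to reject a mistyped key before a long parsing job starts.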
<details>
<summary>2025/03/03 1.2.1 released</summary>
<ul>
<li>Fixed the impact on punctuation marks during full-width to half-width conversion of letters and numbers</li>
<li>Fixed caption matching inaccuracies in certain scenarios</li>
<li>Fixed formula span loss issues in certain scenarios</li>
</ul>
</details>
<details>
<summary>2025/02/24 1.2.0 released</summary>
<p>This version includes several fixes and improvements to enhance parsing efficiency and accuracy:</p>
<ul>
<li><strong>Performance Optimization</strong>
<ul>
<li>Increased classification speed for PDF documents in auto mode.</li>
</ul>
</li>
<li><strong>Parsing Optimization</strong>
<ul>
<li>Improved parsing logic for documents containing watermarks, significantly enhancing the parsing results for such documents.</li>
<li>Enhanced the matching logic for multiple images/tables and captions within a single page, improving the accuracy of image-text matching in complex layouts.</li>
</ul>
</li>
<li><strong>Bug Fixes</strong>
<ul>
<li>Fixed an issue where image/table spans were incorrectly filled into text blocks under certain conditions.</li>
<li>Resolved an issue where title blocks were empty in some cases.</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2025/01/22 1.1.0 released</summary>
<p>In this version we have focused on improving parsing accuracy and efficiency:</p>
<ul>
<li><strong>Model capability upgrade</strong> (requires re-executing the <a href="https://github.com/opendatalab/MinerU/blob/master/docs/how_to_download_models_en.md">model download process</a> to obtain incremental updates of model files)
<ul>
<li>The layout recognition model has been upgraded to the latest <code>doclayout_yolo(2501)</code> model, improving layout recognition accuracy.</li>
<li>The formula parsing model has been upgraded to the latest <code>unimernet(2501)</code> model, improving formula recognition accuracy.</li>
</ul>
</li>
<li><strong>Performance optimization</strong>
<ul>
<li>On devices that meet certain configuration requirements (16GB+ VRAM), by optimizing resource usage and restructuring the processing pipeline, overall parsing speed has been increased by more than 50%.</li>
</ul>
</li>
<li><strong>Parsing effect optimization</strong>
<ul>
<li>Added a new heading classification feature (testing version, enabled by default) to the online demo (<a href="https://mineru.net/OpenSourceTools/Extractor">mineru.net</a>/<a href="https://huggingface.co/spaces/opendatalab/MinerU">huggingface</a>/<a href="https://www.modelscope.cn/studios/OpenDataLab/MinerU">modelscope</a>), which supports hierarchical classification of headings, thereby enhancing document structuring.</li>
</ul>
</li>
</ul>
</details>
- 2025/06/13 2.0.0 Released
- MinerU 2.0 represents a comprehensive reconstruction and upgrade from architecture to functionality, delivering a more streamlined design, enhanced performance, and more flexible user experience.
- **New Architecture**: MinerU 2.0 has been deeply restructured in code organization and interaction methods, significantly improving system usability, maintainability, and extensibility.
- **Removal of Third-party Dependency Limitations**: Completely eliminated the dependency on `pymupdf`, moving the project toward a more open and compliant open-source direction.
- **Ready-to-use, Easy Configuration**: No need to manually edit JSON configuration files; most parameters can now be set directly via command line or API.
- **Automatic Model Management**: Added automatic model download and update mechanisms, allowing users to complete model deployment without manual intervention.
- **Offline Deployment Friendly**: Provides built-in model download commands, supporting deployment requirements in completely offline environments.
- **Streamlined Code Structure**: Removed thousands of lines of redundant code, simplified class inheritance logic, significantly improving code readability and development efficiency.
- **Unified Intermediate Format Output**: Adopted standardized `middle_json` format, compatible with most secondary development scenarios based on this format, ensuring seamless ecosystem business migration.
- **Small Model, Big Capabilities**: With parameters under 1B, yet surpassing traditional 72B-level vision-language models (VLMs) in parsing accuracy.
- **Multiple Functions in One**: A single model covers multilingual recognition, handwriting recognition, layout analysis, table parsing, formula recognition, reading order sorting, and other core tasks.
- **Ultimate Inference Speed**: Achieves peak throughput exceeding 10,000 tokens/s through `sglang` acceleration on a single NVIDIA 4090 card, easily handling large-scale document processing requirements.
- **Online Experience**: You can experience this model online on our Hugging Face demo: [opendatalab/mineru2](https://huggingface.co/spaces/opendatalab/mineru2)
- **Incompatible Changes Notice**: To improve overall architectural rationality and long-term maintainability, this version contains some incompatible changes:
- Python package name changed from `magic-pdf` to `mineru`, and the command-line tool changed from `magic-pdf` to `mineru`. Please update your scripts and command calls accordingly.
- For modular system design and ecosystem consistency considerations, MinerU 2.0 no longer includes the LibreOffice document conversion module. If you need to process Office documents, we recommend converting them to PDF format through an independently deployed LibreOffice service before proceeding with subsequent parsing operations.
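
For the Office-to-PDF step mentioned above, a minimal sketch follows. It assumes a locally installed LibreOffice whose `soffice` binary is on `PATH`, rather than the independently deployed service the note recommends; `soffice_command` and `convert_to_pdf` are hypothetical helper names used only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def soffice_command(src: Path, out_dir: Path, binary: str = "soffice") -> list[str]:
    """Build the headless LibreOffice command that converts `src` to PDF."""
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(src)]


def convert_to_pdf(src: Path, out_dir: Path, binary: str = "soffice") -> Path:
    """Convert an Office document to PDF and return the expected output path."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH (assumed local LibreOffice)")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(soffice_command(src, out_dir, binary), check=True)
    # LibreOffice writes <stem>.pdf into the output directory.
    return out_dir / (src.stem + ".pdf")


if __name__ == "__main__":
    print(" ".join(soffice_command(Path("report.docx"), Path("out"))))
```

The resulting PDF can then be passed to the parser like any other PDF input.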
<details>
<summary>2025/01/10 1.0.1 released</summary>
<p>This is our first official release, where we have introduced a completely new API interface and enhanced compatibility through extensive refactoring, as well as a brand new automatic language identification feature:</p>
<ul>
<li><strong>New API Interface</strong>
<ul>
<li>For the data-side API, we have introduced the Dataset class, designed to provide a robust and flexible data processing framework. This framework currently supports a variety of document formats, including images (.jpg and .png), PDFs, Word documents (.doc and .docx), and PowerPoint presentations (.ppt and .pptx). It ensures effective support for data processing tasks ranging from simple to complex.</li>
<li>For the user-side API, we have meticulously designed the MinerU processing workflow as a series of composable Stages. Each Stage represents a specific processing step, allowing users to define new Stages according to their needs and creatively combine these stages to customize their data processing workflows.</li>
</ul>
</li>
<li><strong>Enhanced Compatibility</strong>
<ul>
<li>By optimizing the dependency environment and configuration items, we ensure stable and efficient operation on ARM architecture Linux systems.</li>
<li>We have deeply integrated with Huawei Ascend NPU acceleration, providing autonomous and controllable high-performance computing capabilities. This supports the localization and development of AI application platforms in China. <a href="https://github.com/opendatalab/MinerU/blob/master/docs/README_Ascend_NPU_Acceleration_zh_CN.md">Ascend NPU Acceleration</a></li>
</ul>
</li>
<li><strong>Automatic Language Identification</strong>
<ul>
<li>By introducing a new language recognition model, setting the <code>lang</code> configuration to <code>auto</code> during document parsing will automatically select the appropriate OCR language model, improving the accuracy of scanned document parsing.</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2024/11/22 0.10.0 released</summary>
<p>Introducing hybrid OCR text extraction capabilities:</p>
<ul>
<li>Significantly improved parsing performance in complex text distribution scenarios such as dense formulas, irregular span regions, and text represented by images.</li>
<li>Combines the dual advantages of accurate content extraction and faster speed in text mode, and more precise span/line region recognition in OCR mode.</li>
</ul>
</details>
<details>
<summary>2024/11/15 0.9.3 released</summary>
<p>Integrated <a href="https://github.com/RapidAI/RapidTable">RapidTable</a> for table recognition, improving single-table parsing speed by more than 10 times, with higher accuracy and lower GPU memory usage.</p>
</details>
<details>
<summary>2024/11/06 0.9.2 released</summary>
<p>Integrated the <a href="https://huggingface.co/U4R/StructTable-InternVL2-1B">StructTable-InternVL2-1B</a> model for table recognition functionality.</p>
</details>
<details>
<summary>2024/10/31 0.9.0 released</summary>
<p>This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability:</p>
<ul>
<li>Refactored the sorting module code to use <a href="https://github.com/ppaanngggg/layoutreader">layoutreader</a> for reading order sorting, ensuring high accuracy in various layouts.</li>
<li>Refactored the paragraph concatenation module to achieve good results in cross-column, cross-page, cross-figure, and cross-table scenarios.</li>
<li>Refactored the list and table of contents recognition functions, significantly improving the accuracy of list blocks and table of contents blocks, as well as the parsing of corresponding text paragraphs.</li>
<li>Refactored the matching logic for figures, tables, and descriptive text, greatly enhancing the accuracy of matching captions and footnotes to figures and tables, and reducing the loss rate of descriptive text to near zero.</li>
<li>Added multi-language support for OCR, supporting detection and recognition of 84 languages. For the list of supported languages, see <a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/ppocr/blog/multi_languages.html#5-support-languages-and-abbreviations">OCR Language Support List</a>.</li>
<li>Added memory recycling logic and other memory optimization measures, significantly reducing memory usage. The memory requirement for enabling all acceleration features except table acceleration (layout/formula/OCR) has been reduced from 16GB to 8GB, and the memory requirement for enabling all acceleration features has been reduced from 24GB to 10GB.</li>
<li>Optimized configuration file feature switches, adding an independent formula detection switch to significantly improve speed and parsing results when formula detection is not needed.</li>
<li>Added the self-developed <code>doclayout_yolo</code> model, which speeds up processing by more than 10 times compared to the original solution while maintaining similar parsing effects, and can be freely switched with <code>layoutlmv3</code> via the configuration file.</li>
<li>Upgraded formula parsing to <code>unimernet 0.2.1</code>, improving formula parsing accuracy while significantly reducing memory usage.</li>
<li>Due to the repository change for <code>PDF-Extract-Kit 1.0</code>, you need to re-download the model. Please refer to <a href="https://github.com/opendatalab/MinerU/blob/master/docs/how_to_download_models_en.md">How to Download Models</a> for detailed steps.</li>
</ul>
</details>
<details>
<summary>2024/09/27 Version 0.8.1 released</summary>
<p>Fixed some bugs and provided a <a href="https://github.com/opendatalab/MinerU/blob/master/projects/web_demo/README.md">localized deployment version</a> of the <a href="https://opendatalab.com/OpenSourceTools/Extractor/PDF/">online demo</a> and the <a href="https://github.com/opendatalab/MinerU/blob/master/projects/web/README.md">front-end interface</a>.</p>
</details>
<details>
<summary>2024/09/09 Version 0.8.0 released</summary>
<p>Added support for fast deployment with a Dockerfile and launched demos on Huggingface and Modelscope.</p>
</details>
<details>
<summary>2024/08/30 Version 0.7.1 released</summary>
</details>
<details>
<summary>2025/05/24 Release 1.3.12</summary>
<ul>
<li>Added support for PPOCRv5 models, updated the <code>ch_server</code> model to <code>PP-OCRv5_rec_server</code> and the <code>ch_lite</code> model to <code>PP-OCRv5_rec_mobile</code> (model update required)
<ul>
<li>In testing, we found that PPOCRv5(server) shows some improvement for handwritten documents but slightly lower accuracy than v4_server_doc for other document types, so the default ch model remains unchanged as <code>PP-OCRv4_server_rec_doc</code>.</li>
<li>Since PPOCRv5 has enhanced recognition capabilities for handwriting and special characters, you can manually choose the PPOCRv5 model for mixed Japanese and Traditional Chinese scenarios and for handwritten documents.</li>
<li>You can select the appropriate model through the lang parameter <code>lang='ch_server'</code> (Python API) or <code>--lang ch_server</code> (command line):
<ul>
<li><code>ch</code>: <code>PP-OCRv4_server_rec_doc</code> (default) (Chinese/English/Japanese/Traditional Chinese mixed/15K dictionary)</li>
<li><code>ch_server</code>: <code>PP-OCRv5_rec_server</code> (Chinese/English/Japanese/Traditional Chinese mixed + handwriting/18K dictionary)</li>
<li><code>ch_lite</code>: <code>PP-OCRv5_rec_mobile</code> (Chinese/English/Japanese/Traditional Chinese mixed + handwriting/18K dictionary)</li>
</ul>
</li>
</ul>
</li>
<li>Added support for handwritten documents through optimized layout recognition of handwritten text areas
<ul>
<li>This feature is supported by default, no additional configuration required</li>
<li>You can refer to the instructions above to manually select the PPOCRv5 model for better handwritten document parsing results</li>
</ul>
</li>
<li>The <code>huggingface</code> and <code>modelscope</code> demos have been updated to versions that support handwriting recognition and PPOCRv5 models, which you can experience online</li>
</ul>
</details>
<details>
<summary>2025/04/29 Release 1.3.10</summary>
<ul>
<li>Added support for custom formula delimiters, which can be configured by modifying the <code>latex-delimiter-config</code> section in the <code>magic-pdf.json</code> file in your user directory.</li>
</ul>
</details>
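
The delimiter setting from release 1.3.10 can also be written from a script. The sketch below is a rough illustration: the `display`/`inline` layout with `left`/`right` keys is an assumption about the shape of `latex-delimiter-config`, so check it against the template shipped with your version, and `set_latex_delimiters` is a hypothetical helper name.

```python
import json
from pathlib import Path


def set_latex_delimiters(config_path: Path,
                         display: tuple[str, str] = ("$$", "$$"),
                         inline: tuple[str, str] = ("$", "$")) -> dict:
    """Update the delimiter section of the config file and keep other keys."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumed layout: one left/right pair each for display and inline math.
    config["latex-delimiter-config"] = {
        "display": {"left": display[0], "right": display[1]},
        "inline": {"left": inline[0], "right": inline[1]},
    }
    config_path.write_text(json.dumps(config, indent=4, ensure_ascii=False),
                           encoding="utf-8")
    return config


if __name__ == "__main__":
    # The notes place magic-pdf.json under the user directory.
    path = Path.home() / "magic-pdf.json"
    print(f"would update {path}; call set_latex_delimiters(path, ...) to apply")
```

For example, `set_latex_delimiters(path, display=("\\[", "\\]"), inline=("\\(", "\\)"))` switches to bracket-style delimiters while leaving the rest of the file untouched.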
<details>
<summary>2025/04/27 Release 1.3.9</summary>
<ul>
<li>Optimized formula parsing functionality, improved formula rendering success rate</li>
</ul>
</details>
<details>
<summary>2025/04/23 Release 1.3.8</summary>
<ul>
<li>The default <code>ocr</code> model (<code>ch</code>) has been updated to <code>PP-OCRv4_server_rec_doc</code> (model update required)
<ul>
<li><code>PP-OCRv4_server_rec_doc</code> is trained on a mixture of more Chinese document data and PP-OCR training data based on <code>PP-OCRv4_server_rec</code>, adding recognition capabilities for some traditional Chinese characters, Japanese, and special characters. It can recognize over 15,000 characters and improves both document-specific and general text recognition abilities.</li>
<li><a href="https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/text_recognition.html#_3">Performance comparison of PP-OCRv4_server_rec_doc/PP-OCRv4_server_rec/PP-OCRv4_mobile_rec</a></li>
<li>After verification, the <code>PP-OCRv4_server_rec_doc</code> model shows significant accuracy improvements in Chinese/English/Japanese/Traditional Chinese in both single language and mixed language scenarios, with comparable speed to <code>PP-OCRv4_server_rec</code>, making it suitable for most use cases.</li>
<li>In some pure English scenarios, <code>PP-OCRv4_server_rec_doc</code> may have word adhesion issues, while <code>PP-OCRv4_server_rec</code> performs better in these cases. Therefore, we've kept the <code>PP-OCRv4_server_rec</code> model, which users can access by adding the parameter <code>lang='ch_server'</code> (Python API) or <code>--lang ch_server</code> (command line).</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2025/04/22 Release 1.3.7</summary>
<ul>
<li>Fixed the issue where the lang parameter was ineffective during table parsing model initialization</li>
<li>Fixed the significant speed reduction of OCR and table parsing in <code>cpu</code> mode</li>
</ul>
</details>
<details>
<summary>2025/04/16 Release 1.3.4</summary>
<ul>
<li>Slightly improved OCR-det speed by removing some unnecessary blocks</li>
<li>Fixed page-internal sorting errors caused by footnotes in certain cases</li>
</ul>
</details>
<details>
<summary>2025/04/12 Release 1.3.2</summary>
<ul>
<li>Fixed dependency version incompatibility issues when installing on Windows with Python 3.13</li>
<li>Optimized memory usage during batch inference</li>
<li>Improved parsing of tables rotated 90 degrees</li>
<li>Enhanced parsing of oversized tables in financial report samples</li>
<li>Fixed the occasional word adhesion issue in English text areas when OCR language is not specified (model update required)</li>
</ul>
</details>
<details>
<summary>2025/04/08 Release 1.3.1</summary>
<ul>
<li>Fixed several compatibility issues
<ul>
<li>Added support for Python 3.13</li>
<li>Made final adaptations for outdated Linux systems (such as CentOS 7), with no guarantee of continued support in future versions; see the <a href="https://github.com/opendatalab/MinerU/issues/1004">installation instructions</a></li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2025/04/03 Release 1.3.0</summary>
<ul>
<li>Installation and compatibility optimizations
<ul>
<li>Resolved compatibility issues caused by <code>detectron2</code> by removing <code>layoutlmv3</code> usage in layout</li>
<li>Extended torch version compatibility to 2.2~2.6 (excluding 2.5)</li>
<li>Added CUDA compatibility for versions 11.8/12.4/12.6/12.8 (CUDA version determined by torch), solving compatibility issues for users with 50-series and H-series GPUs</li>
<li>Extended Python compatibility to versions 3.10~3.12, fixing the issue of automatic downgrade to version 0.6.1 when installing in non-3.10 environments</li>
<li>Optimized offline deployment process, eliminating the need to download any model files after successful deployment</li>
</ul>
</li>
<li>Performance optimizations
<ul>
<li>Enhanced parsing speed for batches of small files by supporting batch processing of multiple PDF files (<a href="demo/batch_demo.py">script example</a>), with formula parsing speed improved by up to 1400% and overall parsing speed improved by up to 500% compared to version 1.0.1</li>
<li>Reduced memory usage and improved parsing speed by optimizing MFR model loading and usage (requires re-running the <a href="docs/how_to_download_models_zh_cn.md">model download process</a> to get incremental updates to model files)</li>
<li>Optimized GPU memory usage, requiring only 6GB minimum to run this project</li>
<li>Improved running speed on MPS devices</li>
</ul>
</li>
<li>Parsing effect optimizations
<ul>
<li>Updated MFR model to <code>unimernet(2503)</code>, fixing line break loss issues in multi-line formulas</li>
</ul>
</li>
<li>Usability optimizations
<ul>
<li>Completely replaced the <code>paddle</code> framework and <code>paddleocr</code> in the project by using <code>paddleocr2torch</code>, resolving conflicts between <code>paddle</code> and <code>torch</code>, as well as thread safety issues caused by the <code>paddle</code> framework</li>
<li>Added real-time progress bar display during parsing, allowing precise tracking of parsing progress and making the waiting process more bearable</li>
</ul>
</li>
</ul>
</details>
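
The torch range stated for 1.3.0 (2.2 through 2.6, excluding 2.5) is easy to get wrong in an environment check, so here is a small sketch of it. `torch_supported` is a hypothetical helper, not something the project provides, and it only looks at the major and minor version numbers.

```python
def torch_supported(version: str) -> bool:
    """Return True if a torch version string falls in 2.2~2.6, excluding 2.5."""
    # Drop a local build suffix such as "+cu124", then keep major.minor only.
    major, minor = (int(part) for part in version.split("+")[0].split(".")[:2])
    in_range = (2, 2) <= (major, minor) <= (2, 6)
    return in_range and (major, minor) != (2, 5)


if __name__ == "__main__":
    for candidate in ("2.1.2", "2.4.1+cu124", "2.5.0", "2.6.0"):
        print(candidate, torch_supported(candidate))
```

In practice the string would come from `torch.__version__`; it is passed in here so the check runs without torch installed.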
<details>
<summary>2024/10/31 0.9.0 released</summary>
<p>This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability:</p>
<ul>
<li>Refactored the sorting module code to use <a href="https://github.com/ppaanngggg/layoutreader">layoutreader</a> for reading order sorting, ensuring high accuracy in various layouts.</li>
<li>Refactored the paragraph concatenation module to achieve good results in cross-column, cross-page, cross-figure, and cross-table scenarios.</li>
<li>Refactored the list and table of contents recognition functions, significantly improving the accuracy of list blocks and table of contents blocks, as well as the parsing of corresponding text paragraphs.</li>
<li>Refactored the matching logic for figures, tables, and descriptive text, greatly enhancing the accuracy of matching captions and footnotes to figures and tables, and reducing the loss rate of descriptive text to near zero.</li>
<li>Added multi-language support for OCR, supporting detection and recognition of 84 languages. For the list of supported languages, see <a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/ppocr/blog/multi_languages.html#5-support-languages-and-abbreviations">OCR Language Support List</a>.</li>
<li>Added memory recycling logic and other memory optimization measures, significantly reducing memory usage. The memory requirement for enabling all acceleration features except table acceleration (layout/formula/OCR) has been reduced from 16GB to 8GB, and the memory requirement for enabling all acceleration features has been reduced from 24GB to 10GB.</li>
<li>Optimized configuration file feature switches, adding an independent formula detection switch to significantly improve speed and parsing results when formula detection is not needed.</li>
<li>Added the self-developed <code>doclayout_yolo</code> model, which is more than 10 times faster than the original solution while maintaining similar parsing quality, and can be swapped with <code>layoutlmv3</code> via the configuration file.</li>
<li>Upgraded formula parsing to <code>unimernet 0.2.1</code>, improving formula parsing accuracy while significantly reducing memory usage.</li>
<li>Due to the repository change for <code>PDF-Extract-Kit 1.0</code>, you need to re-download the model. Please refer to <a href="https://github.com/opendatalab/MinerU/blob/master/docs/how_to_download_models_en.md">How to Download Models</a> for detailed steps.</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2024/09/27 Version 0.8.1 released</summary>
<p>Fixed some bugs and provided a <a href="https://github.com/opendatalab/MinerU/blob/master/projects/web_demo/README.md">localized deployment version</a> of the <a href="https://opendatalab.com/OpenSourceTools/Extractor/PDF/">online demo</a> and the <a href="https://github.com/opendatalab/MinerU/blob/master/projects/web/README.md">front-end interface</a>.</p>
</details>
<details>
<summary>2024/09/09 Version 0.8.0 released</summary>
<p>Added support for fast deployment with a Dockerfile, and launched demos on Hugging Face and ModelScope.</p>
</details>
<details>
<summary>2024/08/30 Version 0.7.1 released</summary>
@@ -326,12 +422,9 @@ If you encounter any installation issues, please first consult the <a href="#faq
If the parsing results are not as expected, refer to the <a href="#known-issues">Known Issues</a>. <br>
There are three different ways to experience MinerU:
- [Online Demo (No Installation Required)](#online-demo)
- [Quick CPU Demo (Windows, Linux, Mac)](#quick-cpu-demo)
- Accelerate inference by using CUDA/CANN/MPS
  - [Linux/Windows + CUDA](#Using-GPU)
  - [Linux + CANN](#using-npu)
  - [macOS + MPS](#using-mps)
- [Online Demo](#online-demo)
- [Local Deployment](#local-deployment)
> [!WARNING]
> **Pre-installation Notice—Hardware and Software Environment Support**
...
...
@@ -342,182 +435,235 @@ There are three different ways to experience MinerU:
>
> In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.
<table>
<tr>
<td colspan="3" rowspan="2">Operating System</td>
</tr>
<table border="1">
<tr>
<td>Linux after 2019</td>
<td>Windows 10 / 11</td>
<td>macOS 11+</td>
<td>Parsing Backend</td>
<td>pipeline</td>
<td>vlm-transformers</td>
<td>vlm-sglang</td>
</tr>
<tr>
<td colspan="3">CPU</td>
<td>x86_64 / arm64</td>
<td>x86_64 (ARM Windows is not supported)</td>
<td>x86_64 / arm64</td>
<td>Operating System</td>
<td>windows/linux/mac</td>
<td>windows/linux</td>
<td>windows(wsl2)/linux</td>
</tr>
<tr>
<td colspan="3">Memory Requirements</td>
<td colspan="3">16GB or more, recommended 32GB+</td>
#### 1.3 Install full version (with sglang acceleration)
Refer to [How to Download Model Files](docs/how_to_download_models_en.md) for detailed instructions.
To use **sglang acceleration for VLM model inference**, install the full version:
#### 3. Modify the Configuration File for Additional Configuration
```bash
uv pip install "mineru[all]>=2.0.0"
```
After completing the [2. Download model weight files](#2-download-model-weight-files) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.
Or install from source:
> [!TIP]
> The user directory for Windows is "C:\\Users\\username", for Linux it is "/home/username", and for macOS it is "/Users/username".
```bash
uv pip install -e .[all]
```
You can modify certain configurations in this file to enable or disable features, such as table recognition:
---
### 2. Using MinerU
> [!NOTE]
> If the following items are not present in the JSON, please manually add the required items and remove the comment content (standard JSON does not support comments).
- `<input_path>`: Local PDF file or directory (supports pdf/png/jpg/jpeg)
- `<output_path>`: Output directory
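As a rough illustration of what `<input_path>` accepts, the sketch below gathers the supported files from a path in Python. The helper name `collect_inputs` is hypothetical and not part of MinerU; it only mirrors the extension list stated above and does not recurse into subdirectories.

```python
from pathlib import Path

# Extensions listed above for <input_path>; the helper itself is hypothetical.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def collect_inputs(input_path):
    """Return the supported files found at input_path (a file or a directory)."""
    path = Path(input_path)
    if path.is_file():
        # A single file is accepted only if its extension is supported.
        return [path] if path.suffix.lower() in SUPPORTED_SUFFIXES else []
    if path.is_dir():
        # Only the top level of the directory is scanned.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
        )
    # Nonexistent paths yield no inputs.
    return []
```

Checking the result before starting a run makes it easier to spot a wrong path or an unsupported file type early.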
If your device supports CUDA and meets the GPU requirements of the mainline environment, you can use GPU acceleration. Please select the appropriate guide based on your system:
> 💡 For more information about output files, please refer to [Output File Documentation](docs/output_file_zh_cn.md)
[Using MinerU via Command Line](https://mineru.readthedocs.io/en/latest/user_guide/usage/command_line.html)
---
> [!TIP]
> For more information about the output files, please refer to the [Output File Description](docs/output_file_en_us.md).
### 3. API Usage
### API
You can also call MinerU through Python code, see example code at:
👉 [Python Usage Example](demo/demo.py)
[Using MinerU via Python API](https://mineru.readthedocs.io/en/latest/user_guide/usage/api.html)
---
### 4. Deploy Derivative Projects
### Deploy Derived Projects
Community developers have created various extensions based on MinerU, including:
Derived projects are built on top of MinerU by project developers and community developers.
Examples include application interfaces based on Gradio, RAG based on llama, web demos similar to the official website, and lightweight multi-GPU load-balancing clients and servers.
These projects may offer more features and a better user experience.
For specific deployment methods, please refer to the [Derived Project README](projects/README.md)
- Graphical interface based on Gradio
- Web API based on FastAPI
- Client/server architecture with multi-GPU load balancing, etc.
These projects typically offer better user experience and additional features.
### Development Guide
For detailed deployment instructions, please refer to:
In the final step, enter `yes`, close the terminal, and reopen it.
### 4. Create an Environment Using Conda
```bash
conda create -n mineru 'python=3.12' -y
conda activate mineru
```
### 5. Install Applications
```sh
pip install -U "magic-pdf[full]"
```
> [!TIP]
> After installation, you can check the version of `magic-pdf` using the following command:
>
> ```sh
> magic-pdf --version
> ```
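The same check can be done from Python. This is a minimal sketch that assumes the package is installed under the distribution name `magic-pdf`, as in the install command above; `installed_version` is a hypothetical helper, not part of the project.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="magic-pdf"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```

A result of `None` usually means the package was installed into a different environment than the one currently active.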
### 6. Download Models
Refer to detailed instructions on [how to download model files](how_to_download_models_en.md).
### 7. Understand the Location of the Configuration File
After completing the [6. Download Models](#6-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.
> [!TIP]
> The user directory for Linux is "/home/username".
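To locate the file programmatically, the sketch below resolves the user directory with `Path.home()` and reads the JSON if it exists. It assumes only the file name `magic-pdf.json` stated above; the helper name `load_config` is hypothetical, and no particular configuration keys are assumed.

```python
import json
from pathlib import Path

CONFIG_NAME = "magic-pdf.json"  # file name as stated above


def load_config(user_dir=None):
    """Load magic-pdf.json from the user directory; return None if it is missing."""
    # Path.home() resolves to the current user's home directory on each platform.
    base = Path(user_dir) if user_dir is not None else Path.home()
    config_path = base / CONFIG_NAME
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)
```

Because standard JSON does not allow comments, `json.load` raises `json.JSONDecodeError` if any are left in the file.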
### 8. First Run
Download a sample file from the repository and test it.
You need to install a CUDA version that is compatible with torch's requirements. For details, please refer to the [official PyTorch website](https://pytorch.org/get-started/locally/).
- CUDA 11.8 https://developer.nvidia.com/cuda-11-8-0-download-archive
- CUDA 12.4 https://developer.nvidia.com/cuda-12-4-0-download-archive
- CUDA 12.6 https://developer.nvidia.com/cuda-12-6-0-download-archive
- CUDA 12.8 https://developer.nvidia.com/cuda-12-8-0-download-archive
### 2. Install Anaconda
If Anaconda is already installed, you can skip this step.
> After installation, you can check the version of `magic-pdf` using the following command:
>
> ```bash
> magic-pdf --version
> ```
### 5. Download Models
Refer to detailed instructions on [how to download model files](how_to_download_models_en.md).
### 6. Understand the Location of the Configuration File
After completing the [5. Download Models](#5-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.
> [!TIP]
> The user directory for Windows is "C:/Users/username".
### 7. First Run
Download a sample file from the repository and test it.
If your graphics card has at least 6GB of VRAM, follow these steps to test CUDA-accelerated parsing performance.
1. **Overwrite the installation of torch and torchvision** with builds that support CUDA. (Please select the appropriate index-url based on your CUDA version. For more details, refer to the [PyTorch official website](https://pytorch.org/get-started/locally/).)
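As a rough guide to picking the index-url, the sketch below maps a CUDA version to a PyTorch wheel index. It assumes the `https://download.pytorch.org/whl/cu<version>` naming shown on the PyTorch website and covers only CUDA 11.8, 12.4, 12.6 and 12.8. `torch_index_url` is a hypothetical helper, so confirm the URL against the PyTorch website before installing.

```python
# Assumed wheel index naming: https://download.pytorch.org/whl/cu<major><minor>
SUPPORTED_CUDA = ("11.8", "12.4", "12.6", "12.8")


def torch_index_url(cuda_version):
    """Build the pip --index-url for a CUDA version string such as '12.4'."""
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(f"unsupported CUDA version: {cuda_version}")
    # '12.4' becomes the tag 'cu124'.
    tag = "cu" + cuda_version.replace(".", "")
    return f"https://download.pytorch.org/whl/{tag}"


print(torch_index_url("12.4"))  # https://download.pytorch.org/whl/cu124
```

The returned URL is the value to pass to pip's `--index-url` option when reinstalling torch and torchvision.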
Model downloads are divided into initial downloads and updates to the model directory. Please refer to the corresponding documentation for instructions on how to proceed.
# Initial download of model files
### Download the Model from Hugging Face
Use a Python Script to Download Model Files from Hugging Face
The Python script will automatically download the model files and configure the model directory in the configuration file.
The configuration file can be found in the user directory, with the filename `magic-pdf.json`.
# How to update models previously downloaded
## 1. Models downloaded via Hugging Face or ModelScope
If you previously downloaded models via Hugging Face or ModelScope, you can rerun the Python script used for the initial download. This will automatically update the model directory to the latest version.