- This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:
- This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:
- **Performance Optimizations:**
- **Performance Optimizations:**
...
@@ -51,10 +59,10 @@
...
@@ -51,10 +59,10 @@
- Greatly improved post-processing speed when the `pipeline` backend batch-processes short documents (<10 pages).
- Greatly improved post-processing speed when the `pipeline` backend batch-processes short documents (<10 pages).
- Layout analysis speed of the `pipeline` backend has been increased by approximately 20%.
- Layout analysis speed of the `pipeline` backend has been increased by approximately 20%.
- **Experience Enhancements:**
- **Experience Enhancements:**
- Built-in ready-to-use `fastapi service` and `gradio webui`. For detailed usage instructions, please refer to [Documentation](#3-api-calls-or-visual-invocation).
- Built-in ready-to-use `fastapi service` and `gradio webui`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver).
- Adapted to `sglang` version `0.4.8`, significantly reducing the GPU memory requirements for the `vlm-sglang` backend. It can now run on graphics cards with as little as `8GB GPU memory` (Turing architecture or newer).
- Adapted to `sglang` version `0.4.8`, significantly reducing the GPU memory requirements for the `vlm-sglang` backend. It can now run on graphics cards with as little as `8GB GPU memory` (Turing architecture or newer).
- Added transparent parameter passing for all commands related to `sglang`, allowing the `sglang-engine` backend to receive all `sglang` parameters consistently with the `sglang-server`.
- Added transparent parameter passing for all commands related to `sglang`, allowing the `sglang-engine` backend to receive all `sglang` parameters consistently with the `sglang-server`.
- Supports feature extensions based on configuration files, including `custom formula delimiters`, `enabling heading classification`, and `customizing local model directories`. For detailed usage instructions, please refer to [Documentation](#4-extending-mineru-functionality-through-configuration-files).
- Supports feature extensions based on configuration files, including `custom formula delimiters`, `enabling heading classification`, and `customizing local model directories`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#extending-mineru-functionality-with-configuration-files).
- **New Features:**
- **New Features:**
- Updated the `pipeline` backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
- Updated the `pipeline` backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
- Introduced limited support for vertical text layout in the `pipeline` backend.
- Introduced limited support for vertical text layout in the `pipeline` backend.
...
@@ -517,6 +525,11 @@ You can get the [Docker Deployment Instructions](https://opendatalab.github.io/M
...
@@ -517,6 +525,11 @@ You can get the [Docker Deployment Instructions](https://opendatalab.github.io/M
### Using MinerU
### Using MinerU
The simplest command line invocation is:
```bash
mineru -p <input_path> -o <output_path>
```
You can use MinerU for PDF parsing through various methods such as command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](https://opendatalab.github.io/MinerU/usage/).
You can use MinerU for PDF parsing through various methods such as command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](https://opendatalab.github.io/MinerU/usage/).
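
For scripted or batch use, the same invocation can be wrapped from Python. The sketch below is only an illustration, not part of MinerU: `build_mineru_command` and `run_mineru` are hypothetical helper names, the paths are placeholders, and it assumes the `mineru` executable is available on `PATH`.

```python
import shlex
import subprocess


def build_mineru_command(input_path: str, output_path: str) -> list[str]:
    """Return the argument list for `mineru -p <input_path> -o <output_path>`."""
    return ["mineru", "-p", input_path, "-o", output_path]


def run_mineru(input_path: str, output_path: str) -> int:
    """Run the command and return its exit code (requires `mineru` on PATH)."""
    completed = subprocess.run(
        build_mineru_command(input_path, output_path), check=False
    )
    return completed.returncode


if __name__ == "__main__":
    # Placeholder paths, shown only to illustrate the resulting command line.
    print(shlex.join(build_mineru_command("demo.pdf", "output")))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.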
If your question is not listed, you can also use [DeepWiki](https://deepwiki.com/opendatalab/MinerU) to communicate with the AI assistant, which can solve most common problems.
If your question is not listed, try using [DeepWiki](https://deepwiki.com/opendatalab/MinerU)'s AI assistant for common issues.
If you still cannot resolve the issue, you can join the community through [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) to communicate with other users and developers.
For unresolved problems, join our [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) community for support.
??? question "Encountered the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` in Ubuntu 22.04 on WSL2"
??? question "Encountered the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` in Ubuntu 22.04 on WSL2"
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.8.post1-cu126` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper platforms.
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.8.post1-cu126` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper platforms.
> If you are using the newer `Blackwell` platform, please modify the base image to `lmsysorg/sglang:v0.4.8.post1-cu128-b200` before executing the build operation.
> If you are using the newer `Blackwell` platform, please modify the base image to `lmsysorg/sglang:v0.4.8.post1-cu128-b200` before executing the build operation.
---
## Docker Description
## Docker Description
MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sglang` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `sglang` to accelerate VLM model inference.
MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sglang` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `sglang` to accelerate VLM model inference.
...
@@ -28,9 +26,7 @@ MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sg
...
@@ -28,9 +26,7 @@ MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sg
>
>
> If your device doesn't meet the above requirements, you can still use other features of MinerU, but cannot use `sglang` to accelerate VLM model inference, meaning you cannot use the `vlm-sglang-engine` backend or start the `vlm-sglang-server` service.
> If your device doesn't meet the above requirements, you can still use other features of MinerU, but cannot use `sglang` to accelerate VLM model inference, meaning you cannot use the `vlm-sglang-engine` backend or start the `vlm-sglang-server` service.
---
## Start Docker Container
## Start Docker Container:
```bash
```bash
docker run --gpus all \
docker run --gpus all \
...
@@ -42,9 +38,7 @@ docker run --gpus all \
...
@@ -42,9 +38,7 @@ docker run --gpus all \
```
```
After executing this command, you will enter the Docker container's interactive terminal with some ports mapped for potential services. You can directly run MinerU-related commands within the container to use MinerU's features.
After executing this command, you will enter the Docker container's interactive terminal with some ports mapped for potential services. You can directly run MinerU-related commands within the container to use MinerU's features.
You can also directly start MinerU services by replacing `/bin/bash` with service startup commands. For detailed instructions, please refer to the [MinerU Usage Documentation](../usage/index.md).
You can also directly start MinerU services by replacing `/bin/bash` with service startup commands. For detailed instructions, please refer to [Start the service via command](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver).
If you encounter any installation issues, please check the [FAQ](../FAQ/index.md) first.
If you encounter any installation issues, please check the [FAQ](../faq/index.md) first.
## Online Experience
## Online Experience
...
@@ -93,4 +93,9 @@ You can get the [Docker Deployment Instructions](./docker_deployment.md) in the
...
@@ -93,4 +93,9 @@ You can get the [Docker Deployment Instructions](./docker_deployment.md) in the
### Using MinerU
### Using MinerU
The simplest command line invocation is:
```bash
mineru -p <input_path> -o <output_path>
```
You can use MinerU for PDF parsing through various methods such as command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](../usage/index.md).
You can use MinerU for PDF parsing through various methods such as command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](../usage/index.md).
> - If you encounter insufficient VRAM when using a single graphics card, you may need to reduce the KV cache size with `--mem-fraction-static 0.5`. If VRAM issues persist, try reducing it further to `0.4` or lower.
> - If you encounter insufficient VRAM when using a single graphics card, you may need to reduce the KV cache size with `--mem-fraction-static 0.5`. If VRAM issues persist, try reducing it further to `0.4` or lower.
> - If you have two or more graphics cards, you can try using tensor parallelism (TP) mode to simply expand available VRAM: `--tp-size 2`
> - If you have two or more graphics cards, you can try using tensor parallelism (TP) mode to simply expand available VRAM: `--tp-size 2`
---
### Performance Optimization Parameters
### Performance Optimization Parameters
> [!TIP]
> [!TIP]
> If you can already use SGLang normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
> If you can already use SGLang normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
...
@@ -20,15 +16,11 @@
...
@@ -20,15 +16,11 @@
> - If you have multiple graphics cards, you can use SGLang's multi-card parallel mode to increase throughput: `--dp-size 2`
> - If you have multiple graphics cards, you can use SGLang's multi-card parallel mode to increase throughput: `--dp-size 2`
> - You can also enable `torch.compile` to accelerate inference speed by approximately 15%: `--enable-torch-compile`
> - You can also enable `torch.compile` to accelerate inference speed by approximately 15%: `--enable-torch-compile`
---
### Parameter Passing Instructions
### Parameter Passing Instructions
> [!TIP]
> [!TIP]
> - All officially supported SGLang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`
> - All officially supported SGLang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`
> - If you want to learn more about `sglang` parameter usage, please refer to the [SGLang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
> - If you want to learn more about `sglang` parameter usage, please refer to the [SGLang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
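
As a rough illustration of the parameter passing described above, the following sketch appends SGLang options to a MinerU command before it is launched. Only the command name and the flags (`--mem-fraction-static`, `--tp-size`, `--dp-size`, `--enable-torch-compile`) come from this documentation; the helper `with_sglang_args` is a hypothetical name, and the values are examples rather than recommendations.

```python
from typing import Optional


def with_sglang_args(
    base_command: list[str],
    mem_fraction_static: Optional[float] = None,
    tp_size: Optional[int] = None,
    dp_size: Optional[int] = None,
    enable_torch_compile: bool = False,
) -> list[str]:
    """Return a copy of base_command with the selected SGLang flags appended."""
    command = list(base_command)
    if mem_fraction_static is not None:
        # Smaller values reduce the KV cache size when VRAM is tight.
        command += ["--mem-fraction-static", str(mem_fraction_static)]
    if tp_size is not None:
        # Tensor parallelism across several graphics cards.
        command += ["--tp-size", str(tp_size)]
    if dp_size is not None:
        # Multi-card parallel mode for higher throughput.
        command += ["--dp-size", str(dp_size)]
    if enable_torch_compile:
        command.append("--enable-torch-compile")
    return command


if __name__ == "__main__":
    print(with_sglang_args(["mineru-sglang-server"], mem_fraction_static=0.5, tp_size=2))
```

The resulting list can be handed to `subprocess.run` on a machine where MinerU and `sglang` are installed.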
---
## GPU Device Selection and Configuration
## GPU Device Selection and Configuration
### CUDA_VISIBLE_DEVICES Basic Usage
### CUDA_VISIBLE_DEVICES Basic Usage
...
@@ -39,8 +31,6 @@
...
@@ -39,8 +31,6 @@
> ```
> ```
> - This specification method is effective for all command line calls, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.
> - This specification method is effective for all command line calls, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.
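
When MinerU commands are launched from a script, the same device selection can be applied through the child process environment. This is a minimal sketch under that assumption; `env_with_visible_gpus` is a hypothetical helper, not a MinerU API.

```python
import os
from typing import Iterable, Mapping, Optional


def env_with_visible_gpus(
    device_ids: Iterable[int],
    base_env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Copy an environment and set CUDA_VISIBLE_DEVICES to the given GPU indices.

    An empty sequence yields an empty string, which makes no GPU visible.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


if __name__ == "__main__":
    # Show the value a child process would see for GPUs 0 and 1.
    print(env_with_visible_gpus([0, 1])["CUDA_VISIBLE_DEVICES"])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting `mineru`, `mineru-sglang-server`, `mineru-gradio`, or `mineru-api`.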
---
### Common Device Configuration Examples
### Common Device Configuration Examples
> [!TIP]
> [!TIP]
> Here are some common `CUDA_VISIBLE_DEVICES` setting examples:
> Here are some common `CUDA_VISIBLE_DEVICES` setting examples:
...
@@ -52,8 +42,6 @@
...
@@ -52,8 +42,6 @@
> CUDA_VISIBLE_DEVICES="" # No GPU will be visible
> CUDA_VISIBLE_DEVICES="" # No GPU will be visible
This section provides comprehensive usage instructions for the project. We will help you progressively master the project's usage from basic to advanced through the following sections:
MinerU uses `huggingface` as the default model source. If users cannot access `huggingface` due to network restrictions, they can conveniently switch the model source to `modelscope` through environment variables:
```bash
export MINERU_MODEL_SOURCE=modelscope
```
For more information about model source configuration and custom local model paths, please refer to the [Model Source Documentation](./model_source.md).
---
## Table of Contents
## Quick Usage via Command Line
- [Quick Usage](./quick_usage.md) - Quick setup and basic usage
MinerU has built-in command line tools that allow users to quickly use MinerU for PDF parsing through the command line:
- [Model Source Configuration](./model_source.md) - Detailed configuration instructions for model sources
```bash
- [Command Line Tools](./cli_tools.md) - Detailed parameter descriptions for command line tools
# Default parsing using pipeline backend
- [Advanced Optimization Parameters](./advanced_cli_parameters.md) - Advanced parameter descriptions for command line tool adaptation
mineru -p <input_path> -o <output_path>
```
> [!TIP]
>- `<input_path>`: Local PDF/image file or directory
>- `<output_path>`: Output directory
>
> For more information about output files, please refer to [Output File Documentation](../output_files.md).
> [!NOTE]
## Getting Started
> The command line tool will automatically attempt cuda/mps acceleration on Linux and macOS systems.
> Windows users who need cuda acceleration should visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to select the appropriate command for their cuda version to install acceleration-enabled `torch` and `torchvision`.
We recommend reading the documentation in the order listed above, which will help you better understand and use the project features.
```bash
If you encounter issues during usage, please check the [FAQ](../faq/index.md)
> The vlm backend additionally supports `sglang` acceleration. Compared to the `transformers` backend, `sglang` can achieve 20-30x speedup. You can check the installation method for the complete package supporting `sglang` acceleration in the [Extension Modules Installation Guide](../quick_start/extension_modules.md).
If you need to adjust parsing options through custom parameters, you can also check the more detailed [Command Line Tools Usage Instructions](./cli_tools.md) in the documentation.
---
## Advanced Usage via API, WebUI, sglang-client/server
- Direct Python API calls: [Python Usage Example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
- FastAPI calls:
```bash
mineru-api --host 127.0.0.1 --port 8000
```
>[!TIP]
>Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
- Start Gradio WebUI visual frontend:
```bash
# Using pipeline/vlm-transformers/vlm-sglang-client backends
> All officially supported sglang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
> We have compiled some commonly used parameters and usage methods for `sglang`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).
---
## Extending MinerU Functionality with Configuration Files
MinerU is now ready to use out of the box, but also supports extending functionality through configuration files. You can edit the `mineru.json` file in your user directory to add custom configurations.
>[!TIP]
>The `mineru.json` file will be automatically generated when you use the built-in model download command `mineru-models-download`, or you can create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
Here are some available configuration options:
- `latex-delimiter-config`: Configures the LaTeX formula delimiters. The default is the `$` symbol; it can be changed to other symbols or strings as needed.
- `llm-aided-config`: Configures LLM-assisted title hierarchy. It is compatible with any LLM that supports the `openai protocol` and defaults to Alibaba Cloud Bailian's `qwen2.5-32b-instruct` model. To enable this feature, configure your own API key and set `enable` to `true`.
- `models-dir`: Specifies the local model storage directory. Specify the model directories for the `pipeline` and `vlm` backends separately. After specifying the directories, you can use local models by setting the environment variable `export MINERU_MODEL_SOURCE=local`.
@@ -36,7 +36,7 @@ or use the interactive command line tool to select model downloads:
...
@@ -36,7 +36,7 @@ or use the interactive command line tool to select model downloads:
```bash
```bash
mineru-models-download
mineru-models-download
```
```
>[!TIP]
>[!NOTE]
>- After download completion, the model path will be output in the current terminal window and automatically written to `mineru.json` in the user directory.
>- After download completion, the model path will be output in the current terminal window and automatically written to `mineru.json` in the user directory.
>- You can also create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
>- You can also create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
>- After downloading models locally, you can freely move the model folder to other locations while updating the model path in `mineru.json`.
>- After downloading models locally, you can freely move the model folder to other locations while updating the model path in `mineru.json`.
MinerU uses `huggingface` as the default model source. If users cannot access `huggingface` due to network restrictions, they can conveniently switch the model source to `modelscope` through environment variables:
```bash
export MINERU_MODEL_SOURCE=modelscope
```
For more information about model source configuration and custom local model paths, please refer to the [Model Source Documentation](./model_source.md).
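
If MinerU is driven from a Python script rather than an interactive shell, the same switch can be made by setting the variable in the environment that the MinerU process will see. The sketch below is an assumption-laden illustration: `set_model_source` is a hypothetical helper, and it only accepts the two values that this documentation names explicitly (`modelscope` and `local`); leaving the variable unset keeps the default `huggingface` source.

```python
import os
from typing import MutableMapping, Optional

# Values named explicitly in this documentation.
DOCUMENTED_MODEL_SOURCES = ("modelscope", "local")


def set_model_source(
    source: str,
    env: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Set MINERU_MODEL_SOURCE in the given environment (default: os.environ)."""
    if source not in DOCUMENTED_MODEL_SOURCES:
        raise ValueError(
            f"unexpected model source {source!r}; expected one of {DOCUMENTED_MODEL_SOURCES}"
        )
    target = os.environ if env is None else env
    target["MINERU_MODEL_SOURCE"] = source
    return target


if __name__ == "__main__":
    # Build an environment for a child process without touching os.environ.
    child_env = set_model_source("modelscope", dict(os.environ))
    print(child_env["MINERU_MODEL_SOURCE"])
```

The variable needs to be set before the MinerU command or process starts, since it is read from the environment at that point.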
## Quick Usage via Command Line
MinerU has built-in command line tools that allow users to quickly use MinerU for PDF parsing through the command line:
```bash
# Default parsing using pipeline backend
mineru -p <input_path> -o <output_path>
```
> [!TIP]
>- `<input_path>`: Local PDF/image file or directory
>- `<output_path>`: Output directory
>
> For more information about output files, please refer to [Output File Documentation](../reference/output_files.md).
> [!NOTE]
> The command line tool will automatically attempt cuda/mps acceleration on Linux and macOS systems.
> Windows users who need cuda acceleration should visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to select the appropriate command for their cuda version to install acceleration-enabled `torch` and `torchvision`.
> The vlm backend additionally supports `sglang` acceleration. Compared to the `transformers` backend, `sglang` can achieve 20-30x speedup. You can check the installation method for the complete package supporting `sglang` acceleration in the [Extension Modules Installation Guide](../quick_start/extension_modules.md).
If you need to adjust parsing options through custom parameters, you can also check the more detailed [Command Line Tools Usage Instructions](./cli_tools.md) in the documentation.
## Advanced Usage via API, WebUI, sglang-client/server
- Direct Python API calls: [Python Usage Example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
- FastAPI calls:
```bash
mineru-api --host 0.0.0.0 --port 8000
```
>[!TIP]
>Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
- Start Gradio WebUI visual frontend:
```bash
# Using pipeline/vlm-transformers/vlm-sglang-client backends
> All officially supported sglang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
> We have compiled some commonly used parameters and usage methods for `sglang`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).
## Extending MinerU Functionality with Configuration Files
MinerU is now ready to use out of the box, but also supports extending functionality through configuration files. You can edit the `mineru.json` file in your user directory to add custom configurations.
>[!IMPORTANT]
>The `mineru.json` file will be automatically generated when you use the built-in model download command `mineru-models-download`, or you can create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
Here are some available configuration options:
- `latex-delimiter-config`: Configures the LaTeX formula delimiters. The default is the `$` symbol; it can be changed to other symbols or strings as needed.
- `llm-aided-config`: Configures LLM-assisted title hierarchy. It is compatible with any LLM that supports the `openai protocol` and defaults to Alibaba Cloud Bailian's `qwen2.5-32b-instruct` model. To enable this feature, configure your own API key and set `enable` to `true`.
- `models-dir`: Specifies the local model storage directory. Specify the model directories for the `pipeline` and `vlm` backends separately. After specifying the directories, you can use local models by setting the environment variable `export MINERU_MODEL_SOURCE=local`.