Commit 2d23d70e authored by Sidney233

Merge remote-tracking branch 'origin/dev' into dev

parents 2683991e 40c09296
@@ -44,6 +44,14 @@
# Changelog
- 2025/07/16 Version 2.1.1 Released
- Bug fixes
- Fixed text block content loss issue that could occur in certain `pipeline` scenarios #3005
- Fixed issue where `sglang-client` required unnecessary packages like `torch` #2968
- Updated the `dockerfile` to fix incomplete text parsing caused by missing fonts on Linux #2915
- Usability improvements
- Updated `compose.yaml` to facilitate direct startup of `sglang-server`, `mineru-api`, and `mineru-gradio` services
- Launched a brand-new [online documentation site](https://opendatalab.github.io/MinerU/) and simplified the README, providing a better documentation experience
- 2025/07/05 Version 2.1.0 Released
- This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:
- **Performance Optimizations:**
@@ -51,10 +59,10 @@
- Greatly enhanced post-processing speed when the `pipeline` backend handles batch processing of documents with fewer pages (<10 pages).
- Layout analysis speed of the `pipeline` backend has been increased by approximately 20%.
- **Experience Enhancements:**
- Built-in ready-to-use `fastapi service` and `gradio webui`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver).
- Adapted to `sglang` version `0.4.8`, significantly reducing the GPU memory requirements for the `vlm-sglang` backend. It can now run on graphics cards with as little as `8GB GPU memory` (Turing architecture or newer).
- Added transparent parameter passing for all commands related to `sglang`, allowing the `sglang-engine` backend to receive all `sglang` parameters consistently with the `sglang-server`.
- Supports feature extensions based on configuration files, including `custom formula delimiters`, `enabling heading classification`, and `customizing local model directories`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#extending-mineru-functionality-with-configuration-files).
- **New Features:**
- Updated the `pipeline` backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
- Introduced limited support for vertical text layout in the `pipeline` backend.
@@ -517,6 +525,11 @@ You can get the [Docker Deployment Instructions](https://opendatalab.github.io/M
### Using MinerU
The simplest command line invocation is:
```bash
mineru -p <input_path> -o <output_path>
```
You can use MinerU for PDF parsing through various methods such as command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](https://opendatalab.github.io/MinerU/usage/).
# TODO
@@ -617,4 +630,4 @@ Currently, some models in this project are trained based on YOLO. However, since
- [PDF-Extract-Kit (A Comprehensive Toolkit for High-Quality PDF Content Extraction)](https://github.com/opendatalab/PDF-Extract-Kit)
- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
\ No newline at end of file
@@ -43,17 +43,25 @@
</div>
# Changelog
- 2025/07/16 Version 2.1.1 Released
- Bug fixes
- Fixed a text block content loss issue that could occur in certain `pipeline` scenarios #3005
- Fixed an issue where `sglang-client` required unnecessary packages such as `torch` #2968
- Updated the `dockerfile` to fix incomplete text parsing caused by missing fonts on Linux #2915
- Usability improvements
- Updated `compose.yaml` to make it easier to start the `sglang-server`, `mineru-api`, and `mineru-gradio` services directly
- Launched a brand-new [online documentation site](https://opendatalab.github.io/MinerU/zh/) and simplified the README, providing a better documentation experience
- 2025/07/05 Version 2.1.0 Released
- This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering many performance optimizations, experience enhancements, and bug fixes. The details are as follows:
- Performance optimizations:
- Greatly improved preprocessing speed for documents at certain resolutions (around 2000 pixels on the long side)
- Greatly improved post-processing speed when the `pipeline` backend batch-processes large numbers of documents with few pages (<10)
- Layout analysis speed of the `pipeline` backend increased by approximately 20%
- Experience enhancements:
- Built-in ready-to-use `fastapi service` and `gradio webui`; for detailed usage, please refer to the [documentation](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuisglang-clientserver)
- Adapted `sglang` to version `0.4.8`, greatly reducing the GPU memory requirements of the `vlm-sglang` backend; it can now run on graphics cards with as little as `8GB` of VRAM (Turing architecture or newer)
- Added transparent `sglang` parameter passing to all related commands, so the `sglang-engine` backend can receive all `sglang` parameters consistently with `sglang-server`
- Supports feature extensions based on configuration files, including `custom formula delimiters`, `enabling heading classification`, and `customizing local model directories`; for detailed usage, please refer to the [documentation](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#mineru_1)
- New features:
- Updated the `pipeline` backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
- Introduced limited support for vertical text layout in the `pipeline` backend
@@ -503,6 +511,12 @@ MinerU provides a convenient Docker deployment method, which helps quickly set up the environment and
---
### Using MinerU
The simplest command line invocation is:
```bash
mineru -p <input_path> -o <output_path>
```
You can use MinerU for PDF parsing through various methods such as command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](https://opendatalab.github.io/MinerU/zh/usage/).
# TODO
...
@@ -3,14 +3,18 @@ FROM lmsysorg/sglang:v0.4.8.post1-cu126
# Install libgl for opencv support & Noto fonts for Chinese characters
RUN apt-get update && \
    apt-get install -y \
        fonts-noto-core \
        fonts-noto-cjk \
        fontconfig \
        libgl1 && \
    fc-cache -fv && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Install mineru latest
RUN python3 -m pip install -U 'mineru[core]' -i https://mirrors.aliyun.com/pypi/simple --break-system-packages && \
    python3 -m pip cache purge
# Download models and update the configuration file
RUN /bin/bash -c "mineru-models-download -s modelscope -m all"
...
# Use the official sglang image
FROM lmsysorg/sglang:v0.4.8.post1-cu126
# Install libgl for opencv support & Noto fonts for Chinese characters
RUN apt-get update && \
    apt-get install -y \
        fonts-noto-core \
        fonts-noto-cjk \
        fontconfig \
        libgl1 && \
    fc-cache -fv && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Install mineru latest
RUN python3 -m pip install -U 'mineru[core]' --break-system-packages && \
    python3 -m pip cache purge
# Download models and update the configuration file
RUN /bin/bash -c "mineru-models-download -s huggingface -m all"
...
# Frequently Asked Questions
If your question is not listed, try using [DeepWiki](https://deepwiki.com/opendatalab/MinerU)'s AI assistant for common issues.
For unresolved problems, join our [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) community for support.
??? question "Encountered the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` in Ubuntu 22.04 on WSL2"
...
@@ -13,8 +13,6 @@ docker build -t mineru-sglang:latest -f Dockerfile .
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.8.post1-cu126` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper platforms.
> If you are using the newer `Blackwell` platform, please modify the base image to `lmsysorg/sglang:v0.4.8.post1-cu128-b200` before executing the build operation.
## Docker Description
MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sglang` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `sglang` to accelerate VLM model inference.
@@ -28,9 +26,7 @@ MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sg
>
> If your device doesn't meet the above requirements, you can still use other features of MinerU, but cannot use `sglang` to accelerate VLM model inference, meaning you cannot use the `vlm-sglang-engine` backend or start the `vlm-sglang-server` service.
## Start Docker Container
```bash
docker run --gpus all \
@@ -42,9 +38,7 @@ docker run --gpus all \
```
After executing this command, you will enter the Docker container's interactive terminal with some ports mapped for potential services. You can directly run MinerU-related commands within the container to use MinerU's features.
You can also directly start MinerU services by replacing `/bin/bash` with service startup commands. For detailed instructions, please refer to [Start the service via command](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver).
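For example, a minimal sketch that starts the Web API directly, assuming the image was built as `mineru-sglang:latest` per the build command above and that port 8000 should be published:
```bash
# Replace /bin/bash with a service startup command, e.g. the Web API
docker run --gpus all -p 8000:8000 mineru-sglang:latest \
  mineru-api --host 0.0.0.0 --port 8000
```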
## Start Services Directly with Docker Compose
@@ -66,7 +60,7 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
### Start sglang-server service
Connect to `sglang-server` via the `vlm-sglang-client` backend:
```bash
docker compose -f compose.yaml --profile sglang-server up -d
```
>[!TIP]
>In another terminal, connect to sglang server via sglang client (only requires CPU and network, no sglang environment needed)
@@ -78,7 +72,7 @@ connect to `sglang-server` via `vlm-sglang-client` backend
### Start Web API service
```bash
docker compose -f compose.yaml --profile api up -d
```
>[!TIP]
>Access `http://<server_ip>:8000/docs` in your browser to view the API documentation.
@@ -87,7 +81,7 @@ connect to `sglang-server` via `vlm-sglang-client` backend
### Start Gradio WebUI service
```bash
docker compose -f compose.yaml --profile gradio up -d
```
>[!TIP]
>
...
# Quick Start
If you encounter any installation issues, please check the [FAQ](../faq/index.md) first.
## Online Experience
@@ -93,4 +93,9 @@ You can get the [Docker Deployment Instructions](./docker_deployment.md) in the
### Using MinerU
The simplest command line invocation is:
```bash
mineru -p <input_path> -o <output_path>
```
You can use MinerU for PDF parsing through various methods such as command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](../usage/index.md).
\ No newline at end of file
# Advanced Command Line Parameters
## SGLang Acceleration Parameter Optimization
### Memory Optimization Parameters
@@ -11,8 +9,6 @@
> - If you encounter insufficient VRAM when using a single graphics card, you may need to reduce the KV cache size with `--mem-fraction-static 0.5`. If VRAM issues persist, try reducing it further to `0.4` or lower.
> - If you have two or more graphics cards, you can try using tensor parallelism (TP) mode to simply expand available VRAM: `--tp-size 2`
### Performance Optimization Parameters
> [!TIP]
> If you can already use SGLang normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
@@ -20,15 +16,11 @@
> - If you have multiple graphics cards, you can use SGLang's multi-card parallel mode to increase throughput: `--dp-size 2`
> - You can also enable `torch.compile` to accelerate inference speed by approximately 15%: `--enable-torch-compile`
### Parameter Passing Instructions
> [!TIP]
> - All officially supported SGLang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`
> - If you want to learn more about `sglang` parameter usage, please refer to the [SGLang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
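As a sketch of what this passthrough looks like in practice (the parameter values are illustrative and taken from the tips above):
```bash
# Pass SGLang arguments straight through a MinerU command
mineru -p <input_path> -o <output_path> -b vlm-sglang-engine --mem-fraction-static 0.5
# The same arguments work when launching the standalone server
mineru-sglang-server --port 30000 --tp-size 2 --enable-torch-compile
```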
## GPU Device Selection and Configuration
### CUDA_VISIBLE_DEVICES Basic Usage
@@ -39,8 +31,6 @@
> ```
> - This specification method is effective for all command line calls, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.
### Common Device Configuration Examples
> [!TIP]
> Here are some common `CUDA_VISIBLE_DEVICES` setting examples:
@@ -52,8 +42,6 @@
> CUDA_VISIBLE_DEVICES="" # No GPU will be visible
> ```
## Practical Application Scenarios
> [!TIP]
> Here are some possible usage scenarios:
...
# Usage Guide
This section provides comprehensive usage instructions for the project. We will help you progressively master the project's usage from basic to advanced through the following sections:
## Table of Contents
- [Quick Usage](./quick_usage.md) - Quick setup and basic usage
- [Model Source Configuration](./model_source.md) - Detailed configuration instructions for model sources
- [Command Line Tools](./cli_tools.md) - Detailed parameter descriptions for command line tools
- [Advanced Optimization Parameters](./advanced_cli_parameters.md) - Advanced parameter descriptions for command line tool adaptation
## Getting Started
We recommend reading the documentation in the order listed above, which will help you better understand and use the project features.
If you encounter issues during usage, please check the [FAQ](../faq/index.md)
\ No newline at end of file
@@ -36,7 +36,7 @@ or use the interactive command line tool to select model downloads:
```bash
mineru-models-download
```
> [!NOTE]
>- After download completion, the model path will be output in the current terminal window and automatically written to `mineru.json` in the user directory.
>- You can also create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
>- After downloading models locally, you can freely move the model folder to other locations while updating the model path in `mineru.json`.
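For reference, the non-interactive form used in the project's Dockerfiles passes the model source and model type directly:
```bash
# Download all models from a chosen source and write mineru.json automatically
mineru-models-download -s huggingface -m all
```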
...
# Using MinerU
## Quick Model Source Configuration
MinerU uses `huggingface` as the default model source. If users cannot access `huggingface` due to network restrictions, they can conveniently switch the model source to `modelscope` through environment variables:
```bash
export MINERU_MODEL_SOURCE=modelscope
```
For more information about model source configuration and custom local model paths, please refer to the [Model Source Documentation](./model_source.md) in the documentation.
## Quick Usage via Command Line
MinerU provides built-in command line tools that let you quickly parse PDFs from the command line:
```bash
# Default parsing using pipeline backend
mineru -p <input_path> -o <output_path>
```
> [!TIP]
>- `<input_path>`: Local PDF/image file or directory
>- `<output_path>`: Output directory
>
> For more information about output files, please refer to [Output File Documentation](../reference/output_files.md).
> [!NOTE]
> The command line tool will automatically attempt cuda/mps acceleration on Linux and macOS systems.
> Windows users who need cuda acceleration should visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to select the appropriate command for their cuda version to install acceleration-enabled `torch` and `torchvision`.
```bash
# Or specify vlm backend for parsing
mineru -p <input_path> -o <output_path> -b vlm-transformers
```
> [!TIP]
> The vlm backend additionally supports `sglang` acceleration. Compared to the `transformers` backend, `sglang` can achieve 20-30x speedup. You can check the installation method for the complete package supporting `sglang` acceleration in the [Extension Modules Installation Guide](../quick_start/extension_modules.md).
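For instance, once the `sglang` extension is installed, the accelerated engine backend is selected the same way as the other backends (a sketch; it requires a working sglang environment):
```bash
# Parse with the sglang-accelerated vlm engine backend
mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
```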
If you need to adjust parsing options through custom parameters, you can also check the more detailed [Command Line Tools Usage Instructions](./cli_tools.md) in the documentation.
## Advanced Usage via API, WebUI, sglang-client/server
- Direct Python API calls: [Python Usage Example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
- FastAPI calls:
```bash
mineru-api --host 0.0.0.0 --port 8000
```
>[!TIP]
>Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
- Start Gradio WebUI visual frontend:
```bash
# Using pipeline/vlm-transformers/vlm-sglang-client backends
mineru-gradio --server-name 0.0.0.0 --server-port 7860
# Or using vlm-sglang-engine/pipeline backends (requires sglang environment)
mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-sglang-engine true
```
>[!TIP]
>
>- Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
>- Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.
- Using `sglang-client/server` method:
```bash
# Start sglang server (requires sglang environment)
mineru-sglang-server --port 30000
```
>[!TIP]
>In another terminal, connect to sglang server via sglang client (only requires CPU and network, no sglang environment needed)
> ```bash
> mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
> ```
> [!NOTE]
> All officially supported sglang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
> We have compiled some commonly used parameters and usage methods for `sglang`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).
## Extending MinerU Functionality with Configuration Files
MinerU is ready to use out of the box, but also supports extending functionality through configuration files. You can edit the `mineru.json` file in your user directory to add custom configurations.
>[!IMPORTANT]
>The `mineru.json` file will be automatically generated when you use the built-in model download command `mineru-models-download`, or you can create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
Here are some available configuration options:
- `latex-delimiter-config`: Used to configure LaTeX formula delimiters, defaults to `$` symbol, can be modified to other symbols or strings as needed.
- `llm-aided-config`: Used to configure parameters for LLM-assisted title hierarchy, compatible with all LLM models supporting `openai protocol`, defaults to using Alibaba Cloud Bailian's `qwen2.5-32b-instruct` model. You need to configure your own API key and set `enable` to `true` to enable this feature.
- `models-dir`: Used to specify local model storage directory, please specify model directories for `pipeline` and `vlm` backends separately. After specifying the directory, you can use local models by configuring the environment variable `export MINERU_MODEL_SOURCE=local`.
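As an illustration only, a `mineru.json` combining these options might look roughly like the sketch below; the nested field names are assumptions based on the configuration template linked above, so copy `mineru.template.json` for the authoritative schema:
```bash
# Hypothetical sketch of ~/mineru.json -- verify field names against mineru.template.json
cat > ~/mineru.json <<'EOF'
{
  "latex-delimiter-config": {
    "display": {"left": "$$", "right": "$$"},
    "inline": {"left": "$", "right": "$"}
  },
  "llm-aided-config": {
    "title_aided": {
      "api_key": "your_api_key",
      "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "model": "qwen2.5-32b-instruct",
      "enable": true
    }
  },
  "models-dir": {
    "pipeline": "/path/to/pipeline_models",
    "vlm": "/path/to/vlm_models"
  }
}
EOF
```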
...@@ -13,8 +13,6 @@ docker build -t mineru-sglang:latest -f Dockerfile . ...@@ -13,8 +13,6 @@ docker build -t mineru-sglang:latest -f Dockerfile .
> [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`lmsysorg/sglang:v0.4.8.post1-cu126`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper平台, > [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`lmsysorg/sglang:v0.4.8.post1-cu126`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper平台,
> 如您使用较新的`Blackwell`平台,请将基础镜像修改为`lmsysorg/sglang:v0.4.8.post1-cu128-b200` 再执行build操作。 > 如您使用较新的`Blackwell`平台,请将基础镜像修改为`lmsysorg/sglang:v0.4.8.post1-cu128-b200` 再执行build操作。
---
## Docker说明 ## Docker说明
Mineru的docker使用了`lmsysorg/sglang`作为基础镜像,因此在docker中默认集成了`sglang`推理加速框架和必需的依赖环境。因此在满足条件的设备上,您可以直接使用`sglang`加速VLM模型推理。 Mineru的docker使用了`lmsysorg/sglang`作为基础镜像,因此在docker中默认集成了`sglang`推理加速框架和必需的依赖环境。因此在满足条件的设备上,您可以直接使用`sglang`加速VLM模型推理。
...@@ -27,9 +25,7 @@ Mineru的docker使用了`lmsysorg/sglang`作为基础镜像,因此在docker中 ...@@ -27,9 +25,7 @@ Mineru的docker使用了`lmsysorg/sglang`作为基础镜像,因此在docker中
> >
> 如果您的设备不满足上述条件,您仍然可以使用MinerU的其他功能,但无法使用`sglang`加速VLM模型推理,即无法使用`vlm-sglang-engine`后端和启动`vlm-sglang-server`服务。 > 如果您的设备不满足上述条件,您仍然可以使用MinerU的其他功能,但无法使用`sglang`加速VLM模型推理,即无法使用`vlm-sglang-engine`后端和启动`vlm-sglang-server`服务。
--- ## 启动 Docker 容器
## 启动 Docker 容器:
```bash ```bash
docker run --gpus all \ docker run --gpus all \
...@@ -41,9 +37,7 @@ docker run --gpus all \ ...@@ -41,9 +37,7 @@ docker run --gpus all \
``` ```
执行该命令后,您将进入到Docker容器的交互式终端,并映射了一些端口用于可能会使用的服务,您可以直接在容器内运行MinerU相关命令来使用MinerU的功能。 执行该命令后,您将进入到Docker容器的交互式终端,并映射了一些端口用于可能会使用的服务,您可以直接在容器内运行MinerU相关命令来使用MinerU的功能。
您也可以直接通过替换`/bin/bash`为服务启动命令来启动MinerU服务,详细说明请参考[MinerU使用文档](../usage/index.md) 您也可以直接通过替换`/bin/bash`为服务启动命令来启动MinerU服务,详细说明请参考[通过命令启动服务](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuisglang-clientserver)
---
## 通过 Docker Compose 直接启动服务 ## 通过 Docker Compose 直接启动服务
...@@ -64,7 +58,7 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml ...@@ -64,7 +58,7 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
### 启动 sglang-server 服务 ### 启动 sglang-server 服务
并通过`vlm-sglang-client`后端连接`sglang-server` 并通过`vlm-sglang-client`后端连接`sglang-server`
```bash ```bash
docker compose -f compose.yaml --profile mineru-sglang-server up -d docker compose -f compose.yaml --profile sglang-server up -d
``` ```
>[!TIP] >[!TIP]
>在另一个终端中通过sglang client连接sglang server(只需cpu与网络,不需要sglang环境) >在另一个终端中通过sglang client连接sglang server(只需cpu与网络,不需要sglang环境)
...@@ -76,7 +70,7 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml ...@@ -76,7 +70,7 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
### 启动 Web API 服务 ### 启动 Web API 服务
```bash ```bash
docker compose -f compose.yaml --profile mineru-api up -d docker compose -f compose.yaml --profile api up -d
``` ```
>[!TIP] >[!TIP]
>在浏览器中访问 `http://<server_ip>:8000/docs` 查看API文档。 >在浏览器中访问 `http://<server_ip>:8000/docs` 查看API文档。
...@@ -85,7 +79,7 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml ...@@ -85,7 +79,7 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
### 启动 Gradio WebUI 服务 ### 启动 Gradio WebUI 服务
```bash ```bash
docker compose -f compose.yaml --profile mineru-gradio up -d docker compose -f compose.yaml --profile gradio up -d
``` ```
>[!TIP] >[!TIP]
> >
......
# Quick Start
If you encounter any installation issues, please consult the [FAQ](../faq/index.md) first.
## Online Experience
@@ -93,4 +93,9 @@ MinerU provides a convenient Docker deployment method, which helps quickly set up the environment and
### Using MinerU
The simplest command line invocation is:
```bash
mineru -p <input_path> -o <output_path>
```
You can use MinerU for PDF parsing through various methods such as command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](../usage/index.md).
\ No newline at end of file
# Advanced Command Line Parameters
## SGLang Acceleration Parameter Optimization
@@ -11,8 +9,6 @@
> - If you encounter insufficient VRAM when using a single graphics card, you may need to reduce the KV cache size with `--mem-fraction-static 0.5`. If VRAM issues persist, try reducing it further to `0.4` or lower
> - If you have two or more graphics cards, you can try using tensor parallelism (TP) mode to simply expand available VRAM: `--tp-size 2`
### Performance Optimization Parameters
> [!TIP]
> If you can already use sglang normally for accelerated VLM model inference but want to further improve inference speed, you can try the following parameters:
@@ -20,15 +16,11 @@
> - If you have multiple graphics cards, you can use sglang's multi-card parallel mode to increase throughput: `--dp-size 2`
> - You can also enable `torch.compile` to accelerate inference speed by approximately 15%: `--enable-torch-compile`
### Parameter Passing Instructions
> [!TIP]
> - All officially supported sglang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`
> - If you want to learn more about `sglang` parameter usage, please refer to the [sglang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
## GPU Device Selection and Configuration
### CUDA_VISIBLE_DEVICES Basic Usage
@@ -39,8 +31,6 @@
> ```
> - This specification method is effective for all command line calls, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both the `pipeline` and `vlm` backends.
### Common Device Configuration Examples
> [!TIP]
> Here are some common `CUDA_VISIBLE_DEVICES` setting examples:
@@ -52,8 +42,6 @@
> CUDA_VISIBLE_DEVICES="" # No GPU will be visible
> ```
## Practical Application Scenarios
> [!TIP]
...
@@ -31,33 +31,28 @@ mineru-api --help
Usage: mineru-api [OPTIONS]
Options:
  --host TEXT     Server host (default: 127.0.0.1)
  --port INTEGER  Server port (default: 8000)
  --reload        Enable auto-reload (development mode)
  --help          Show this message and exit.
```
```bash
mineru-gradio --help
Usage: mineru-gradio [OPTIONS]
Options:
  --enable-example BOOLEAN        Enable example file input (example files
                                  must be placed in the `example` folder in
                                  the directory where the command is run)
  --enable-sglang-engine BOOLEAN  Enable the SgLang engine backend for faster
                                  processing
  --enable-api BOOLEAN            Enable the Gradio API for serving the
                                  application
  --max-convert-pages INTEGER     Set the maximum number of pages to convert
                                  from PDF to Markdown
  --server-name TEXT              Set the server host name for the Gradio app
  --server-port INTEGER           Set the server port for the Gradio app
  --latex-delimiters-type [a|b|all]
                                  Set the type of LaTeX delimiters to use in
                                  Markdown rendering: 'a' for the '$' type,
                                  'b' for the '()[]' type, 'all' for both
  --help                          Show this message and exit.
```
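For example, combining the options above into one launch command (values are illustrative):
```bash
# Launch the WebUI with the SgLang engine backend and a page cap
mineru-gradio --server-name 0.0.0.0 --server-port 7860 \
  --enable-sglang-engine true --max-convert-pages 500 \
  --latex-delimiters-type all
```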
## Environment Variables
@@ -71,5 +66,3 @@ Some parameters of the MinerU command line tools have environment variable equivalents with the same function,
- `MINERU_TOOLS_CONFIG_JSON`: Specifies the configuration file path; defaults to `mineru.json` in the user directory. A different configuration file path can be set via this environment variable.
- `MINERU_FORMULA_ENABLE`: Enables formula parsing; defaults to `true`. Set it to `false` via this environment variable to disable formula parsing.
- `MINERU_TABLE_ENABLE`: Enables table parsing; defaults to `true`. Set it to `false` via this environment variable to disable table parsing.
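A quick sketch of how these variables combine with a normal invocation:
```bash
# Point MinerU at an alternative config file and disable formula parsing
export MINERU_TOOLS_CONFIG_JSON=/path/to/mineru.json
export MINERU_FORMULA_ENABLE=false
mineru -p <input_path> -o <output_path>
```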
# Usage Guide
This section provides complete usage instructions for the project. The following parts will help you progressively master its usage, from basic to advanced:
## Table of Contents
- [Quick Usage](./quick_usage.md) - Quick setup and basic usage
- [Model Source Configuration](./model_source.md) - Detailed configuration instructions for model sources
- [Command Line Tools](./cli_tools.md) - Detailed parameter descriptions for the command line tools
- [Advanced Optimization Parameters](./advanced_cli_parameters.md) - Descriptions of advanced parameters for the command line tools
## Getting Started
We recommend reading the documentation in the order listed above, which will help you better understand and use the project's features.
If you encounter problems during use, please check the [FAQ](../faq/index.md)
\ No newline at end of file
@@ -37,7 +37,7 @@ mineru-models-download --help
```bash
mineru-models-download
```
> [!NOTE]
>- After the download completes, the model path will be printed in the current terminal window and automatically written to `mineru.json` in the user directory.
>- You can also create the configuration file by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
>- After the models are downloaded locally, you can freely move the model folder to another location, as long as you update the model path in `mineru.json`.
...
# Using MinerU
## Quick Model Source Configuration
MinerU uses `huggingface` as the default model source. If your network cannot access `huggingface`, you can conveniently switch the model source to `modelscope` via an environment variable:
```bash
export MINERU_MODEL_SOURCE=modelscope
```
For more information about model source configuration and custom local model paths, please refer to the [Model Source Documentation](./model_source.md).
## Quick Usage via Command Line
MinerU provides built-in command line tools that let you quickly parse PDFs from the command line:
```bash
# Parse with the pipeline backend by default
mineru -p <input_path> -o <output_path>
```
> [!TIP]
> - `<input_path>`: local PDF/image file or directory
> - `<output_path>`: output directory
>
> For more information about output files, please refer to the [Output File Documentation](../reference/output_files.md).
> [!NOTE]
> The command line tool automatically attempts cuda/mps acceleration on Linux and macOS.
> Windows users who need cuda acceleration should visit the [PyTorch official website](https://pytorch.org/get-started/locally/) and select the command matching their cuda version to install acceleration-enabled `torch` and `torchvision`.
```bash
# Or specify the vlm backend for parsing
mineru -p <input_path> -o <output_path> -b vlm-transformers
```
> [!TIP]
> The vlm backend additionally supports `sglang` acceleration. Compared to the `transformers` backend, `sglang` can achieve a 20-30x speedup. You can find the installation method for the complete package supporting `sglang` acceleration in the [Extension Modules Installation Guide](../quick_start/extension_modules.md).
If you need to adjust parsing options through custom parameters, you can also check the more detailed [Command Line Tools Usage Instructions](./cli_tools.md) in the documentation.
## Advanced Usage via API, WebUI, sglang-client/server
- Direct Python API calls: [Python Usage Example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
- FastAPI calls:
```bash
mineru-api --host 0.0.0.0 --port 8000
```
>[!TIP]
>Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
- Start the Gradio WebUI visual frontend:
```bash
# Using the pipeline/vlm-transformers/vlm-sglang-client backends
mineru-gradio --server-name 0.0.0.0 --server-port 7860
# Or using the vlm-sglang-engine/pipeline backends (requires an sglang environment)
mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-sglang-engine true
```
>[!TIP]
>
>- Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
>- Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.
- Using the `sglang-client/server` method:
```bash
# Start the sglang server (requires an sglang environment)
mineru-sglang-server --port 30000
```
>[!TIP]
>In another terminal, connect to the sglang server via the sglang client (only requires CPU and network, no sglang environment needed)
> ```bash
> mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
> ```
> [!NOTE]
> All officially supported sglang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
> We have compiled some commonly used `sglang` parameters and usage methods, which can be found in the [Advanced Command Line Parameters](./advanced_cli_parameters.md) documentation.
## Extending MinerU Functionality with Configuration Files
MinerU is ready to use out of the box, but also supports extending functionality through configuration files. You can edit the `mineru.json` file in your user directory to add custom configurations.
> [!IMPORTANT]
> The `mineru.json` file will be generated automatically when you use the built-in model download command `mineru-models-download`; you can also create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
Here are some available configuration options:
- `latex-delimiter-config`: Configures LaTeX formula delimiters; defaults to the `$` symbol and can be changed to other symbols or strings as needed.
- `llm-aided-config`: Configures parameters for LLM-assisted heading classification; compatible with all LLM models supporting the `openai protocol`; defaults to Alibaba Cloud Bailian's `qwen2.5-32b-instruct` model. You need to configure your own API key and set `enable` to `true` to enable this feature.
- `models-dir`: Specifies local model storage directories; please specify model directories for the `pipeline` and `vlm` backends separately. After specifying the directories, you can use local models by setting the environment variable `export MINERU_MODEL_SOURCE=local`.
__version__ = "2.1.1"
@@ -56,12 +56,12 @@ extra:
      name: GitHub
    - icon: fontawesome/brands/x-twitter
      link: https://x.com/OpenDataLab_AI
      name: X-Twitter
    - icon: fontawesome/brands/discord
      link: https://discord.gg/Tdedn9GTXq
      name: Discord
    - icon: fontawesome/brands/weixin
      link: http://mineru.space/s/V85Yl
      name: WeChat
    - icon: material/email
      link: mailto:OpenDataLab@pjlab.org.cn
@@ -78,8 +78,9 @@ nav:
    - Docker Deployment: quick_start/docker_deployment.md
  - Usage:
    - Usage: usage/index.md
    - Quick Usage: usage/quick_usage.md
    - Model Source: usage/model_source.md
    - CLI Tools: usage/cli_tools.md
    - Advanced CLI Parameters: usage/advanced_cli_parameters.md
  - Reference:
    - Output File Format: reference/output_files.md
@@ -117,6 +118,7 @@ plugins:
          Extension Modules: 扩展模块安装
          Docker Deployment: Docker部署
          Usage: 使用方法
          Quick Usage: 快速使用
          CLI Tools: 命令行工具
          Model Source: 模型源
          Advanced CLI Parameters: 命令行进阶参数
...