Unverified commit da048cbf authored by Sidney233, committed by GitHub

Merge branch 'opendatalab:dev' into dev

parents 5685b22b 8a7fec50
@@ -14,7 +14,7 @@
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mineru)](https://pypi.org/project/mineru/)
[![Downloads](https://static.pepy.tech/badge/mineru)](https://pepy.tech/project/mineru)
[![Downloads](https://static.pepy.tech/badge/mineru/month)](https://pepy.tech/project/mineru)
[![OpenDataLab](https://img.shields.io/badge/Demo_on_OpenDataLab-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
[![OpenDataLab](https://img.shields.io/badge/webapp_on_mineru.net-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/3b3a00a4a0a61577b6c30f989092d20d/mineru_demo.ipynb)
@@ -31,9 +31,6 @@
<!-- hot link -->
<p align="center">
<a href="https://github.com/opendatalab/PDF-Extract-Kit">PDF-Extract-Kit: High-Quality PDF Extraction Toolkit</a>🔥🔥🔥
<br>
<br>
🚀<a href="https://mineru.net/?source=github">Access MinerU Now→✅ Zero-Install Web Version ✅ Full-Featured Desktop Client ✅ Instant API Access; Skip deployment headaches – get all product formats in one click. Developers, dive in!</a>
</p>
@@ -398,36 +395,6 @@
</details> </details>
</details> </details>
<!-- TABLE OF CONTENT -->
<details open="open">
<summary><h2 style="display: inline-block">Table of Contents</h2></summary>
<ol>
<li>
<a href="#mineru">MinerU</a>
<ul>
<li><a href="#project-introduction">Project Introduction</a></li>
<li><a href="#key-features">Key Features</a></li>
<li><a href="#quick-start">Quick Start</a>
<ul>
<li><a href="#online-demo">Online Demo</a></li>
<li><a href="#local-deployment">Local Deployment</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#todo">TODO</a></li>
<li><a href="#known-issues">Known Issues</a></li>
<li><a href="#faq">FAQ</a></li>
<li><a href="#all-thanks-to-our-contributors">All Thanks To Our Contributors</a></li>
<li><a href="#license-information">License Information</a></li>
<li><a href="#acknowledgments">Acknowledgments</a></li>
<li><a href="#citation">Citation</a></li>
<li><a href="#star-history">Star History</a></li>
<li><a href="#links">Links</a></li>
</ol>
</details>
# MinerU
## Project Introduction
@@ -453,14 +420,25 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
- Supports running in a pure CPU environment, and also supports GPU(CUDA)/NPU(CANN)/MPS acceleration
- Compatible with Windows, Linux, and Mac platforms.
## Quick Start
# Quick Start
If you encounter any installation issues, please first consult the <a href="#faq">FAQ</a>. </br>
If the parsing results are not as expected, refer to the <a href="#known-issues">Known Issues</a>. </br>
There are three different ways to experience MinerU:
- [Online Demo](#online-demo)
- [Local Deployment](#local-deployment)
## Online Experience
### Official online web application
The official online version offers the same functionality as the client, with a polished interface and rich features; login is required.
- [![OpenDataLab](https://img.shields.io/badge/webapp_on_mineru.net-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
### Gradio-based online demo
A Gradio-based WebUI with a simple interface and only the core parsing functionality; no login required.
- [![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
- [![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
## Local Deployment
> [!WARNING]
@@ -481,9 +459,9 @@ There are three different ways to experience MinerU:
</tr> </tr>
<tr>
<td>Operating System</td>
<td>windows/linux/mac</td>
<td>windows/linux</td>
<td>windows(wsl2)/linux</td>
<td>Linux / Windows / macOS</td>
<td>Linux / Windows</td>
<td>Linux / Windows (via WSL2)</td>
</tr>
<tr>
<td>CPU Inference Support</td>
@@ -492,12 +470,12 @@ There are three different ways to experience MinerU:
</tr>
<tr>
<td>GPU Requirements</td>
<td>Turing architecture or later, 6GB+ VRAM or Apple Silicon</td>
<td colspan="2">Turing architecture or later, 8GB+ VRAM</td>
<td>Turing architecture and later, 6GB+ VRAM or Apple Silicon</td>
<td colspan="2">Turing architecture and later, 8GB+ VRAM</td>
</tr>
<tr>
<td>Memory Requirements</td>
<td colspan="3">Minimum 16GB+, 32GB+ recommended</td>
<td colspan="3">Minimum 16GB+, recommended 32GB+</td>
</tr>
<tr>
<td>Disk Space Requirements</td>
@@ -509,280 +487,37 @@ There are three different ways to experience MinerU:
</tr>
</table> </table>
## Online Demo
[![OpenDataLab](https://img.shields.io/badge/Demo_on_OpenDataLab-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
## Local Deployment
### 1. Install MinerU
#### 1.1 Install via pip or uv
### Install MinerU
#### Install MinerU using pip or uv
```bash
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[core]"
```
#### 1.2 Install from source
#### Install MinerU from source code
```bash
git clone https://github.com/opendatalab/MinerU.git
cd MinerU
uv pip install -e .[core]
```
> [!NOTE]
> Linux and macOS systems automatically support CUDA/MPS acceleration after installation. For Windows users who want to use CUDA acceleration,
> please visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to install PyTorch with the appropriate CUDA version.
#### 1.3 Install Full Version (supports sglang acceleration; requires a GPU with Turing or newer architecture and at least 8GB of VRAM)
If you need to use **sglang to accelerate VLM model inference**, you can choose any of the following methods to install the full version:
- Install using uv or pip:
```bash
uv pip install -U "mineru[all]"
```
- Install from source:
```bash
uv pip install -e .[all]
```
> [!TIP]
> If any exceptions occur during the installation of `sglang`, please refer to the [official sglang documentation](https://docs.sglang.ai/start/install.html) for troubleshooting and solutions, or directly use Docker-based installation.
- Build image using Dockerfile:
```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
docker build -t mineru-sglang:latest -f Dockerfile .
```
Start Docker container:
```bash
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
--ipc=host \
mineru-sglang:latest \
mineru-sglang-server --host 0.0.0.0 --port 30000
```
Or start using Docker Compose:
```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
docker compose -f compose.yaml up -d
```
> [!TIP]
> The Dockerfile uses `lmsysorg/sglang:v0.4.8.post1-cu126` as the default base image, which supports the Turing/Ampere/Ada Lovelace/Hopper platforms.
> If you are using the newer Blackwell platform, please change the base image to `lmsysorg/sglang:v0.4.8.post1-cu128-b200`.
> `mineru[core]` includes all core features except `sglang` acceleration, is compatible with Windows / Linux / macOS, and is suitable for most users.
> If you need `sglang` acceleration for VLM model inference, or want to install a lightweight client on edge devices, please refer to the [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/quick_start/extension_modules/) in the documentation.
#### 1.4 Install client (for connecting to sglang-server on edge devices that require only CPU and network connectivity)
```bash
uv pip install -U mineru
mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<host_ip>:<port>
```
---
### 2. Using MinerU
#### Deploy MinerU using Docker
MinerU provides a convenient Docker deployment method, which helps you quickly set up the environment and resolve some tricky environment compatibility issues.
You can find the [Docker Deployment Instructions](https://opendatalab.github.io/MinerU/quick_start/docker_deployment/) in the documentation.
#### 2.1 Command Line Usage
##### Basic Usage
The simplest command line invocation is:
```bash
mineru -p <input_path> -o <output_path>
```
- `<input_path>`: Local PDF/Image file or directory (supports pdf/png/jpg/jpeg/webp/gif)
- `<output_path>`: Output directory
##### View Help Information
Get all available parameter descriptions:
```bash
mineru --help
```
##### Parameter Details
```text
Usage: mineru [OPTIONS]
Options:
-v, --version Show version and exit
-p, --path PATH Input file path or directory (required)
-o, --output PATH Output directory (required)
-m, --method [auto|txt|ocr] Parsing method: auto (default), txt, ocr (pipeline backend only)
-b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
Parsing backend (default: pipeline)
-l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|latin|arabic|east_slavic|cyrillic|devanagari]
Specify document language (improves OCR accuracy, pipeline backend only)
-u, --url TEXT Service address when using sglang-client
-s, --start INTEGER Starting page number (0-based)
-e, --end INTEGER Ending page number (0-based)
-f, --formula BOOLEAN Enable formula parsing (default: on)
-t, --table BOOLEAN Enable table parsing (default: on)
-d, --device TEXT Inference device (e.g., cpu/cuda/cuda:0/npu/mps, pipeline backend only)
--vram INTEGER Maximum GPU VRAM usage per process (GB)(pipeline backend only)
--source [huggingface|modelscope|local]
Model source, default: huggingface
--help Show help information
```
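The `-s`/`-e` options above take 0-based page numbers, while PDF viewers usually number pages from 1. A minimal shell sketch that converts a human page range into those two flags; the helper name `page_range_args` is hypothetical, and it assumes `-e` is inclusive, which the help text does not state, so verify the behavior against your MinerU version.

```bash
#!/bin/sh
# Hypothetical helper: convert a 1-based page range (as shown in a PDF viewer)
# into mineru's 0-based -s/-e options.
# Assumption: -e is inclusive; the help text only says "0-based".
page_range_args() {
  first=$1
  last=$2
  # Reject ranges that start before page 1 or end before they start.
  if [ "$first" -lt 1 ] || [ "$last" -lt "$first" ]; then
    echo "invalid page range: $first-$last" >&2
    return 1
  fi
  printf '%s\n' "-s $((first - 1)) -e $((last - 1))"
}

# Example: parse pages 3 to 5 (the unquoted expansion is intentionally word-split).
# mineru -p paper.pdf -o out $(page_range_args 3 5)
```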
---
### Using MinerU
You can use MinerU for PDF parsing through various methods such as the command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](https://opendatalab.github.io/MinerU/usage/).
#### 2.2 Model Source Configuration
MinerU automatically downloads the required models from HuggingFace on the first run. If HuggingFace is inaccessible, you can switch model sources:
##### Switch to ModelScope Source
```bash
mineru -p <input_path> -o <output_path> --source modelscope
```
Or set environment variable:
```bash
export MINERU_MODEL_SOURCE=modelscope
mineru -p <input_path> -o <output_path>
```
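Because `MINERU_MODEL_SOURCE` and `--source` accept the same three values listed in the help text (`huggingface`, `modelscope`, `local`), a wrapper script can validate the value before exporting it. A minimal sketch; `select_model_source` is a hypothetical helper, not part of MinerU.

```bash
#!/bin/sh
# Hypothetical helper: resolve the model source from an explicit argument,
# then from MINERU_MODEL_SOURCE, falling back to MinerU's default (huggingface).
select_model_source() {
  src=${1:-${MINERU_MODEL_SOURCE:-huggingface}}
  case "$src" in
    huggingface|modelscope|local)
      printf '%s\n' "$src"
      ;;
    *)
      echo "unsupported model source: $src (expected huggingface, modelscope or local)" >&2
      return 1
      ;;
  esac
}

# Example: switch to ModelScope for this shell session, then parse as usual.
# export MINERU_MODEL_SOURCE="$(select_model_source modelscope)"
# mineru -p <input_path> -o <output_path>
```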
##### Using Local Models
###### 1. Download Models Locally
```bash
mineru-models-download --help
```
Or use the interactive command-line tool to select models:
```bash
mineru-models-download
```
After the download, the model paths are displayed in the current terminal and automatically written to `mineru.json` in the user directory.
###### 2. Parse Using Local Models
```bash
mineru -p <input_path> -o <output_path> --source local
```
Or enable via environment variable:
```bash
export MINERU_MODEL_SOURCE=local
mineru -p <input_path> -o <output_path>
```
---
#### 2.3 Using sglang to Accelerate VLM Model Inference
##### Through the sglang-engine Mode
```bash
mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
```
##### Through the sglang-server/client Mode
1. Start Server:
```bash
mineru-sglang-server --port 30000
```
2. Use Client in another terminal:
```bash
mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
```
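In the server/client mode the client can only connect once the server is listening, and loading the model may take a while. A minimal sketch that polls the port before launching the client; it relies on bash's `/dev/tcp` redirection (so run it with bash), and `wait_for_port` is a hypothetical helper, not a MinerU command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll host:port once per second until a TCP connection
# succeeds or the timeout (in seconds) expires. Uses bash's /dev/tcp redirection.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-300} elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Opening the pseudo-device succeeds only if something accepts the connection;
    # the subshell keeps a failed redirection from terminating the caller.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out waiting for $host:$port" >&2
  return 1
}

# Example: start the server in the background, then run the client once it is reachable.
# mineru-sglang-server --port 30000 &
# wait_for_port 127.0.0.1 30000 600 &&
#   mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
```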
> [!TIP]
> For more information about output files, please refer to [Output File Documentation](docs/output_file_en_us.md)
---
### 3. API Calls or Visual Invocation
1. Directly invoke using Python API: [Python Invocation Example](demo/demo.py)
2. Invoke using FastAPI:
```bash
mineru-api --host 127.0.0.1 --port 8000
```
Visit http://127.0.0.1:8000/docs in your browser to view the API documentation.
3. Use Gradio WebUI or Gradio API:
```bash
# Using pipeline/vlm-transformers/vlm-sglang-client backend
mineru-gradio --server-name 127.0.0.1 --server-port 7860
# Or using vlm-sglang-engine/pipeline backend
mineru-gradio --server-name 127.0.0.1 --server-port 7860 --enable-sglang-engine true
```
Access http://127.0.0.1:7860 in your browser to use the Gradio WebUI, or visit http://127.0.0.1:7860/?view=api to use the Gradio API.
> [!TIP]
> Below are some suggestions and notes for using the sglang acceleration mode:
> - The sglang acceleration mode currently supports operation on Turing architecture GPUs with a minimum of 8GB VRAM, but you may encounter VRAM shortages on GPUs with less than 24GB VRAM. You can optimize VRAM usage with the following parameters:
> - If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by setting `--mem-fraction-static 0.5`. If VRAM issues persist, try lowering it further to `0.4` or below.
> - If you have more than one GPU, you can expand available VRAM using tensor parallelism (TP) mode: `--tp-size 2`
> - If you are already successfully using sglang to accelerate VLM inference but wish to further improve inference speed, consider the following parameters:
> - If using multiple GPUs, increase throughput using sglang's multi-GPU parallel mode: `--dp-size 2`
> - You can also enable `torch.compile` to accelerate inference speed by about 15%: `--enable-torch-compile`
> - For more information on using sglang parameters, please refer to the [sglang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
> - All sglang-supported parameters can be passed to MinerU via command-line arguments, including those used with the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`
> [!TIP]
> - In any case, you can specify visible GPU devices at the start of a command line by adding the `CUDA_VISIBLE_DEVICES` environment variable. For example:
> ```bash
> CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
> ```
> - This method works for all command-line calls, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.
> - Below are some common `CUDA_VISIBLE_DEVICES` settings:
> ```bash
> CUDA_VISIBLE_DEVICES=1 Only device 1 will be seen
> CUDA_VISIBLE_DEVICES=0,1 Devices 0 and 1 will be visible
> CUDA_VISIBLE_DEVICES="0,1" Same as above, quotation marks are optional
> CUDA_VISIBLE_DEVICES=0,2,3 Devices 0, 2, 3 will be visible; device 1 is masked
> CUDA_VISIBLE_DEVICES="" No GPU will be visible
> ```
> - Below are some possible use cases:
> - If you have multiple GPUs and need to use GPU 0 and GPU 1 to launch `sglang-server` in multi-GPU mode, you can use the following command:
> ```bash
> CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
> ```
> - If you have multiple GPUs and need to launch two `fastapi` services on GPU 0 and GPU 1 respectively, listening on different ports, you can use the following commands:
> ```bash
> # In terminal 1
> CUDA_VISIBLE_DEVICES=0 mineru-api --host 127.0.0.1 --port 8000
> # In terminal 2
> CUDA_VISIBLE_DEVICES=1 mineru-api --host 127.0.0.1 --port 8001
> ```
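The per-terminal pattern above generalizes to any number of GPUs. A minimal sketch that prints (or, with `DRY_RUN=0`, starts in the background) one `mineru-api` instance per GPU on consecutive ports; `launch_api_per_gpu` and `DRY_RUN` are hypothetical names, and only the `mineru-api --host/--port` options shown above are used.

```bash
#!/bin/sh
# Hypothetical helper: one mineru-api process per GPU, each pinned with
# CUDA_VISIBLE_DEVICES and listening on its own port (base_port, base_port+1, ...).
# With DRY_RUN=1 (the default here) the commands are only printed.
launch_api_per_gpu() {
  gpu_list=$1   # comma-separated GPU ids, e.g. "0,1"
  port=$2       # first port to use
  old_ifs=$IFS
  IFS=,         # split the GPU list on commas
  for gpu in $gpu_list; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      printf 'CUDA_VISIBLE_DEVICES=%s mineru-api --host 127.0.0.1 --port %s\n' "$gpu" "$port"
    else
      CUDA_VISIBLE_DEVICES=$gpu mineru-api --host 127.0.0.1 --port "$port" &
    fi
    port=$((port + 1))
  done
  IFS=$old_ifs
}

# Example: preview the commands for GPUs 0 and 1, then start them for real.
# launch_api_per_gpu 0,1 8000
# DRY_RUN=0 launch_api_per_gpu 0,1 8000; wait
```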
---
### 4. Extending MinerU Functionality Through Configuration Files
- MinerU is designed to work out-of-the-box, but also supports extending functionality through configuration files. You can create a `mineru.json` file in your home directory and add custom configurations.
- The `mineru.json` file will be automatically generated when you use the built-in model download command `mineru-models-download`. Alternatively, you can create it by copying the [configuration template file](./mineru.template.json) to your home directory and renaming it to `mineru.json`.
- Below are some available configuration options:
- `latex-delimiter-config`: Used to configure LaTeX formula delimiters, defaults to the `$` symbol, and can be modified to other symbols or strings as needed.
- `llm-aided-config`: Used to configure related parameters for LLM-assisted heading level detection, compatible with all LLM models supporting the `OpenAI protocol`. It defaults to Alibaba Cloud Qwen's `qwen2.5-32b-instruct` model. You need to configure an API key yourself and set `enable` to `true` to activate this feature.
- `models-dir`: Used to specify local model storage directories. Please specify separate model directories for the `pipeline` and `vlm` backends. After specifying these directories, you can use local models by setting the environment variable `export MINERU_MODEL_SOURCE=local`.
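The template-copy step described above can be scripted so that it never overwrites an existing `mineru.json` (for example one already written by `mineru-models-download`). A minimal sketch; `ensure_mineru_config` is a hypothetical helper, and it assumes the repository has been cloned so that `mineru.template.json` is available locally.

```bash
#!/bin/sh
# Hypothetical helper: create ~/mineru.json from the repository's mineru.template.json,
# leaving an existing configuration untouched.
ensure_mineru_config() {
  repo_dir=$1                  # path to a local clone of the MinerU repository
  target="$HOME/mineru.json"
  if [ -f "$target" ]; then
    echo "kept existing $target"
    return 0
  fi
  if [ ! -f "$repo_dir/mineru.template.json" ]; then
    echo "template not found in $repo_dir" >&2
    return 1
  fi
  cp "$repo_dir/mineru.template.json" "$target" && echo "created $target"
}

# Example: after `git clone https://github.com/opendatalab/MinerU.git`
# ensure_mineru_config ./MinerU
```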
---
# TODO
@@ -790,6 +525,9 @@ mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1
- [x] Recognition of `index` and `list` in the main text
- [x] Table recognition
- [x] Heading Classification
- [x] Handwritten Text Recognition
- [x] Vertical Text Recognition
- [x] Latin Accent Mark Recognition
- [ ] Code block recognition in the main text
- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- [ ] Geometric shape recognition
@@ -807,7 +545,7 @@ mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1
# FAQ
- If you encounter any issues during usage, you can first check the [FAQ](docs/FAQ_en_us.md) for solutions.
- If you encounter any issues during usage, you can first check the [FAQ](https://opendatalab.github.io/MinerU/faq/) for solutions.
- If your issue remains unresolved, you may also use [DeepWiki](https://deepwiki.com/opendatalab/MinerU) to interact with an AI assistant, which can address most common problems.
- If you still cannot resolve the issue, you are welcome to join our community via [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) to discuss with other users and developers.
@@ -872,11 +610,11 @@ Currently, some models in this project are trained based on YOLO. However, since
# Links
- [Easy Data Preparation with latest LLMs-based Operators and Pipelines](https://github.com/OpenDCAI/DataFlow)
- [Vis3 (OSS browser based on s3)](https://github.com/opendatalab/Vis3)
- [LabelU (A Lightweight Multi-modal Data Annotation Tool)](https://github.com/opendatalab/labelU)
- [LabelLLM (An Open-source LLM Dialogue Annotation Platform)](https://github.com/opendatalab/LabelLLM)
- [PDF-Extract-Kit (A Comprehensive Toolkit for High-Quality PDF Content Extraction)](https://github.com/opendatalab/PDF-Extract-Kit)
- [Vis3 (OSS browser based on s3)](https://github.com/opendatalab/Vis3)
- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
\ No newline at end of file
@@ -14,7 +14,7 @@
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mineru)](https://pypi.org/project/mineru/)
[![Downloads](https://static.pepy.tech/badge/mineru)](https://pepy.tech/project/mineru)
[![Downloads](https://static.pepy.tech/badge/mineru/month)](https://pepy.tech/project/mineru)
[![OpenDataLab](https://img.shields.io/badge/Demo_on_OpenDataLab-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
[![OpenDataLab](https://img.shields.io/badge/webapp_on_mineru.net-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/3b3a00a4a0a61577b6c30f989092d20d/mineru_demo.ipynb)
@@ -415,9 +415,16 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
## Online Experience
[![OpenDataLab](https://img.shields.io/badge/Demo_on_OpenDataLab-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
### Official online web application
The official online version offers the same functionality as the client, with a polished interface and rich features; login is required.
- [![OpenDataLab](https://img.shields.io/badge/webapp_on_mineru.net-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
### Gradio-based online demo
A Gradio-based WebUI with a simple interface and only the core parsing functionality; no login required.
- [![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
- [![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
## Local Deployment
@@ -476,8 +483,6 @@ pip install uv -i https://mirrors.aliyun.com/pypi/simple
uv pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple
```
---
#### Install MinerU from source code
```bash
git clone https://github.com/opendatalab/MinerU.git
@@ -485,73 +490,20 @@ cd MinerU
uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
```
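> [!TIP]
> `mineru[core]` includes all core features except `sglang` acceleration and is compatible with Windows / Linux / macOS, making it suitable for the vast majority of users.
> If you need `sglang` acceleration for VLM model inference, or want to install the lightweight client on edge devices, see the [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/zh/quick_start/extension_modules/) in the documentation.
---
#### Deploy MinerU using Docker
MinerU provides a convenient Docker deployment method, which helps you quickly set up the environment and resolve some tricky environment compatibility issues.
You can find the [Docker Deployment Instructions]() in the documentation.
You can find the [Docker Deployment Instructions](https://opendatalab.github.io/MinerU/zh/quick_start/docker_deployment/) in the documentation.
---
### Using MinerU
You can use MinerU for PDF parsing through the command line, API, WebUI, and other methods. For details, see the [Usage Guide](https://opendatalab.github.io/MinerU/zh/usage/).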
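#### Quickly configure the model source
MinerU uses `huggingface` as its model source by default; you can easily switch the model source to `modelscope` with an environment variable:
```bash
export MINERU_MODEL_SOURCE=modelscope
```
For more on model source configuration and custom local model paths, see the [model source documentation]() in the docs.
---
#### Quick start with the command line
MinerU ships with a built-in command-line tool for quickly parsing PDFs:
```bash
# Parse with the pipeline backend (default)
mineru -p <input_path> -o <output_path>
```
- `<input_path>`: local PDF/image file or directory (supports pdf/png/jpg/jpeg/webp/gif)
- `<output_path>`: output directory
> [!NOTE]
> On Linux and macOS this automatically enables CUDA/MPS acceleration. Windows users who want CUDA acceleration
> should visit the [PyTorch official website](https://pytorch.org/get-started/locally/) and install PyTorch built for the appropriate CUDA version.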
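```bash
# Or parse with the vlm backend
mineru -p <input_path> -o <output_path> -b vlm-transformers
```
> [!TIP]
> The vlm backend also supports `sglang` acceleration, which can be 20 to 30 times faster than the `transformers` backend. The [complete installation and usage instructions]() for `sglang` acceleration are available in the documentation.
To adjust parsing options with custom parameters, see the [detailed usage instructions]() for the command-line tool in the documentation.
---
#### Advanced usage via API, WebUI, and sglang-client/server
- Call MinerU directly through the Python API: [Python invocation example](demo/demo.py)
- Call MinerU through FastAPI:
```bash
mineru-api --host 127.0.0.1 --port 8000
```
- Launch the Gradio WebUI front end:
```bash
# Use the pipeline/vlm-transformers/vlm-sglang-client backend
mineru-gradio --server-name 127.0.0.1 --server-port 7860
# Or use the vlm-sglang-engine/pipeline backend (requires an sglang environment)
mineru-gradio --server-name 127.0.0.1 --server-port 7860 --enable-sglang-engine true
```
- Call MinerU in `sglang-client/server` mode:
```bash
# Start the sglang server (requires an sglang environment)
mineru-sglang-server --port 30000
# In another terminal, connect to the sglang server with the sglang client (needs only CPU and network access, no sglang environment)
mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
```
> [!TIP]
> All parameters officially supported by sglang can be passed to MinerU as command-line arguments, including with the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
> We have compiled commonly used `sglang` parameters and usage tips in the [Common sglang Parameters]() document.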
# TODO
@@ -579,7 +531,7 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers
# FAQ
- If you run into problems while using MinerU, first check whether the [FAQ](docs/FAQ_zh_cn.md) answers them.
- If you run into problems while using MinerU, first check whether the [FAQ](https://opendatalab.github.io/MinerU/zh/faq/) answers them.
- If that does not solve your problem, you can also chat with an AI assistant on [DeepWiki](https://deepwiki.com/opendatalab/MinerU), which can resolve most common issues.
- If you still cannot resolve the issue, you can join the community via [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) to talk with other users and developers.
@@ -644,11 +596,11 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers
# Links
- [Easy Data Preparation with latest LLMs-based Operators and Pipelines](https://github.com/OpenDCAI/DataFlow)
- [Vis3 (OSS browser based on s3)](https://github.com/opendatalab/Vis3)
- [LabelU (A Lightweight Multi-modal Data Annotation Tool)](https://github.com/opendatalab/labelU)
- [LabelLLM (An Open-source LLM Dialogue Annotation Platform)](https://github.com/opendatalab/LabelLLM)
- [PDF-Extract-Kit (A Comprehensive Toolkit for High-Quality PDF Content Extraction)](https://github.com/opendatalab/PDF-Extract-Kit)
- [Vis3 (OSS browser based on s3)](https://github.com/opendatalab/Vis3)
- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
\ No newline at end of file
# Frequently Asked Questions
If your question is not listed, you can also use [DeepWiki](https://deepwiki.com/opendatalab/MinerU) to chat with an AI assistant, which can solve most common problems.
If you still cannot resolve the issue, you can join the community through [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) to communicate with other users and developers.
## 1. Encountered the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` in Ubuntu 22.04 on WSL2
The `libgl` library is missing from Ubuntu 22.04 on WSL2. Install it with the following command to resolve the issue:
```bash
sudo apt-get install libgl1-mesa-glx
```
Reference: https://github.com/opendatalab/MinerU/issues/388
## 2. Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`
Starting with version 1.4.21, albumentations depends on simsimd. The prebuilt simsimd packages for Linux require glibc 2.28 or later, so installation fails on some Linux distributions released before 2019. You can work around this with the following commands:
```bash
conda create -n mineru python=3.11 -y
conda activate mineru
pip install -U "mineru[pipeline_old_linux]"
```
Reference: https://github.com/opendatalab/MinerU/issues/1004
## 3. Missing text in parsing results on Linux systems
Since version 2.0, MinerU uses `pypdfium2` instead of `pymupdf` as its PDF page rendering engine to avoid AGPLv3 licensing issues. On some Linux distributions, missing CJK fonts can cause text to be lost when PDF pages are rendered to images.
To solve this problem, install the Noto font packages with the following commands (for Ubuntu/Debian systems):
```bash
sudo apt update
sudo apt install fonts-noto-core
sudo apt install fonts-noto-cjk
fc-cache -fv
```
Alternatively, build the image with our [Docker deployment](../quick_start/docker_deployment.md) method; the image includes these font packages by default.
Reference: https://github.com/opendatalab/MinerU/issues/2915
<script type="module" src="https://gradio.s3-us-west-2.amazonaws.com/5.35.0/gradio.js"></script>
<gradio-app src="https://opendatalab-mineru.hf.space"></gradio-app>
# Frequently Asked Questions
If your question is not listed here, you can also ask the AI assistant on [DeepWiki](https://deepwiki.com/opendatalab/MinerU), which can resolve most common problems.
If you still cannot resolve the issue, join the community on [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) to discuss it with other users and developers.
??? question "Encountered the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` in Ubuntu 22.04 on WSL2"
    The `libgl` library is missing from Ubuntu 22.04 on WSL2. Install it with the following command to resolve the issue:
    ```bash
    sudo apt-get install libgl1-mesa-glx
    ```
    Reference: [#388](https://github.com/opendatalab/MinerU/issues/388)
??? question "Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`"
    Starting with version 1.4.21, albumentations depends on simsimd. The prebuilt simsimd packages for Linux require glibc 2.28 or later, so installation fails on some Linux distributions released before 2019. You can work around this with the following commands:
    ```bash
    conda create -n mineru python=3.11 -y
    conda activate mineru
    pip install -U "mineru[pipeline_old_linux]"
    ```
    Reference: [#1004](https://github.com/opendatalab/MinerU/issues/1004)
??? question "Missing text in parsing results on Linux systems"
    Since version 2.0, MinerU uses `pypdfium2` instead of `pymupdf` as its PDF page rendering engine to avoid AGPLv3 licensing issues. On some Linux distributions, missing CJK fonts can cause text to be lost when PDF pages are rendered to images.
    To solve this problem, install the Noto font packages with the following commands (for Ubuntu/Debian systems):
    ```bash
    sudo apt update
    sudo apt install fonts-noto-core
    sudo apt install fonts-noto-cjk
    fc-cache -fv
    ```
    Alternatively, build the image with our [Docker deployment](../quick_start/docker_deployment.md) method; the image includes these font packages by default.
    Reference: [#2915](https://github.com/opendatalab/MinerU/issues/2915)
<div align="center" xmlns="http://www.w3.org/1999/html">
<!-- logo -->
<p align="center">
<img src="https://opendatalab.github.io/MinerU/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
</p>
</div>
<!-- icon -->
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mineru)](https://pypi.org/project/mineru/)
[![Downloads](https://static.pepy.tech/badge/mineru)](https://pepy.tech/project/mineru)
[![Downloads](https://static.pepy.tech/badge/mineru/month)](https://pepy.tech/project/mineru)
[![OpenDataLab](https://img.shields.io/badge/webapp_on_mineru.net-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/3b3a00a4a0a61577b6c30f989092d20d/mineru_demo.ipynb)
[![arXiv](https://img.shields.io/badge/arXiv-2409.18839-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/opendatalab/MinerU)
<div align="center">
<a href="https://trendshift.io/repositories/11174" target="_blank"><img src="https://trendshift.io/api/badge/repositories/11174" alt="opendatalab%2FMinerU | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
<!-- hot link -->
......
## Build Docker Image using Dockerfile
```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
docker build -t mineru-sglang:latest -f Dockerfile .
```
> [!TIP]
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.8.post1-cu126` as the base image by default, which supports the Turing, Ampere, Ada Lovelace, and Hopper platforms.
> If you are using the newer `Blackwell` platform, change the base image to `lmsysorg/sglang:v0.4.8.post1-cu128-b200` before building.
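If you prefer to script the Blackwell change rather than edit the file by hand, the Python sketch below rewrites the `FROM` line of the downloaded `Dockerfile`. It assumes the base image appears literally as `lmsysorg/sglang:v0.4.8.post1-cu126` on a `FROM` line, as the tip states. The function names `switch_base_image` and `patch_dockerfile` are hypothetical.

```python
from pathlib import Path

DEFAULT_BASE = "lmsysorg/sglang:v0.4.8.post1-cu126"
BLACKWELL_BASE = "lmsysorg/sglang:v0.4.8.post1-cu128-b200"


def switch_base_image(dockerfile_text: str,
                      old: str = DEFAULT_BASE,
                      new: str = BLACKWELL_BASE) -> str:
    """Replace the base image on FROM lines; raise if nothing was replaced."""
    out_lines = []
    replaced = 0
    for line in dockerfile_text.splitlines(keepends=True):
        # Only touch FROM instructions, so comments mentioning the tag stay as-is
        if line.lstrip().upper().startswith("FROM ") and old in line:
            line = line.replace(old, new)
            replaced += 1
        out_lines.append(line)
    if replaced == 0:
        raise ValueError(f"no FROM line references {old!r}")
    return "".join(out_lines)


def patch_dockerfile(path: str = "Dockerfile") -> None:
    """Rewrite the Dockerfile in place for the Blackwell base image."""
    file = Path(path)
    file.write_text(switch_base_image(file.read_text()))
```

Run `patch_dockerfile()` in the directory where you downloaded the `Dockerfile`, then run the `docker build` command as usual. Raising an error when nothing matches means a future change to the default tag fails loudly instead of silently building the wrong image.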
## Docker Description
> [!NOTE]
> Requirements for using `sglang` to accelerate VLM model inference:
>
> - The device must have a Turing-architecture or newer GPU with at least 8 GB of available VRAM.
> - The host machine's graphics driver must support CUDA 12.6 or higher; the `Blackwell` platform requires CUDA 12.8 or higher. You can check the driver version with the `nvidia-smi` command.
> - The Docker container must have access to the host machine's GPU devices.
## Start Services Directly with Docker Compose
We provide a [compose.yaml](https://github.com/opendatalab/MinerU/blob/master/docker/compose.yaml) file that you can use to quickly start MinerU services.
```bash
# Download compose.yaml file
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
```
> [!NOTE]
>
> - The `compose.yaml` file defines several MinerU services; you can start only the ones you need.
> - Some services take additional parameters, which you can view and edit in the `compose.yaml` file.
> - Because the `sglang` inference acceleration framework pre-allocates GPU memory, you may not be able to run multiple `sglang` services on the same machine at the same time. Before starting the `vlm-sglang-server` service or using the `vlm-sglang-engine` backend, make sure that any other services that might use GPU memory have been stopped.
- Start the `sglang-server` service and connect to it via the `vlm-sglang-client` backend:
```bash
docker compose -f compose.yaml --profile mineru-sglang-server up -d
```
> [!TIP]
> In another terminal, connect to the `sglang` server with the `sglang` client (this requires only a CPU and network access; no `sglang` environment is needed):
> ```bash
> mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
> ```
- Start the API service:
```bash
docker compose -f compose.yaml --profile mineru-api up -d
```
> [!TIP]
> Access `http://<server_ip>:8000/docs` in your browser to view the API documentation.
- Start the Gradio WebUI service:
```bash
docker compose -f compose.yaml --profile mineru-gradio up -d
```
> [!TIP]
>
> - Access `http://<server_ip>:7860` in your browser to use the Gradio WebUI.
> - Access `http://<server_ip>:7860/?view=api` to use the Gradio API.
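For the `vlm-sglang-client` setup in the first item of this list, a client script may need to wait until the server container accepts connections before it calls `mineru`. The following sketch polls the TCP port (30000, as in the client command) and then runs the same CLI invocation through `subprocess`. An open port is only a coarse readiness signal. The helper names and the 120-second default timeout are hypothetical choices.

```python
import socket
import subprocess
import time


def wait_for_port(host: str, port: int = 30000, timeout: float = 120.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(1.0)  # not reachable yet; retry shortly
    return False


def build_client_command(input_path: str, output_path: str,
                         server_ip: str, port: int = 30000) -> list[str]:
    """Argument list for the vlm-sglang-client command shown in the tip."""
    return ["mineru", "-p", input_path, "-o", output_path,
            "-b", "vlm-sglang-client", "-u", f"http://{server_ip}:{port}"]


def run_client(input_path: str, output_path: str, server_ip: str) -> None:
    """Wait for the server port, then invoke the mineru CLI (must be on PATH)."""
    if not wait_for_port(server_ip):
        raise TimeoutError(f"{server_ip}:30000 did not accept connections in time")
    subprocess.run(build_client_command(input_path, output_path, server_ip), check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems with paths that contain spaces.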
## Online Experience
### Official online web application
The official online version has the same functionality as the client, with a polished interface and rich features. Login is required.
- [![OpenDataLab](https://img.shields.io/badge/webapp_on_mineru.net-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
### Gradio-based online demo
A Gradio-based WebUI with a simple interface that provides only the core parsing functionality. No login is required.
- [![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
- [![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
## Local Deployment
>
> In non-mainstream environments, the diversity of hardware and software configurations and compatibility issues with third-party dependencies mean that we cannot guarantee the project will work in every case. If you want to use this project in a non-recommended environment, we suggest reading the documentation and FAQ carefully first, since most issues already have solutions in the FAQ. We also welcome community feedback on issues so that we can gradually expand our support range.
<table border="1">
<tr>
<td>Parsing Backend</td>
<td>pipeline</td>
......
See the [Docker Deployment Instructions](./docker_deployment.md) in the documentation.
---
### Using MinerU
You can use MinerU for PDF parsing through several methods, such as the command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](../usage/index.md).
# MinerU Output Files Documentation
## Overview
In addition to the main Markdown file, the `mineru` command generates several auxiliary files for debugging, quality inspection, and further processing. These files include:
- **Visual debugging files**: Help users intuitively understand the document parsing process and results
- **Structured data files**: Contain detailed parsing data for secondary development
The following sections describe the purpose and format of each file.
## Visual Debugging Files
### Layout Analysis File (layout.pdf)
**File naming format**: `{original_filename}_layout.pdf`
**Functionality**:
- Visualizes layout analysis results for each page
- Numbers in the top-right corner of each detection box indicate the reading order
- Different background colors distinguish different types of content blocks
**Use cases**:
- Check whether the layout analysis is correct
- Verify that the reading order is reasonable
- Debug layout-related issues
![layout page example](../images/layout_example.png)
### Text Spans File (spans.pdf)
> [!NOTE]
> Only applicable to pipeline backend
**File naming format**: `{original_filename}_spans.pdf`
**Functionality**:
- Draws line boxes in different colors around page content according to span type
- Used for quality inspection and troubleshooting
**Use cases**:
- Quickly identify missing text
- Check inline formula recognition
- Verify text segmentation accuracy
![span page example](../images/spans_example.png)
## Structured Data Files
### Model Inference Results (model.json)
> [!NOTE]
> Only applicable to pipeline backend
**File naming format**: `{original_filename}_model.json`
#### Data Structure Definition
```python
from pydantic import BaseModel, Field
from enum import IntEnum

class CategoryType(IntEnum):
    """Content category enumeration"""
    title = 0             # Title
    plain_text = 1        # Text
    abandon = 2           # Headers, footers, page numbers, and page annotations
    figure = 3            # Image
    figure_caption = 4    # Image caption
    table = 5             # Table
    table_caption = 6     # Table caption
    table_footnote = 7    # Table footnote
    isolate_formula = 8   # Interline formula
    formula_caption = 9   # Interline formula number
    embedding = 13        # Inline formula
    isolated = 14         # Interline formula
    text = 15             # OCR recognition result

class PageInfo(BaseModel):
    """Page information"""
    page_no: int = Field(description="Page number, first page is 0", ge=0)
    height: int = Field(description="Page height", gt=0)
    width: int = Field(description="Page width", ge=0)

class ObjectInferenceResult(BaseModel):
    """Object recognition result"""
    category_id: CategoryType = Field(description="Category", ge=0)
    poly: list[float] = Field(description="Quadrilateral coordinates, format: [x0,y0,x1,y1,x2,y2,x3,y3]")
    score: float = Field(description="Confidence score of inference result")
    latex: str | None = Field(description="LaTeX parsing result", default=None)
    html: str | None = Field(description="HTML parsing result", default=None)

class PageInferenceResults(BaseModel):
    """Page inference results"""
    layout_dets: list[ObjectInferenceResult] = Field(description="Page recognition results")
    page_info: PageInfo = Field(description="Page metadata")

# Complete inference results:
# the results for all pages, ordered by page number, are stored in a single list
inference_result: list[PageInferenceResults] = []
```
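As a usage sketch, the following code reads a `model.json` file using only the standard library and counts detections per category on each page. It copies the `CategoryType` values above into a local enum so the example does not depend on pydantic. It assumes the file is a JSON list shaped like `inference_result`. The function names are hypothetical.

```python
import json
from collections import Counter
from enum import IntEnum


class CategoryType(IntEnum):
    """Local copy of the category values listed in the schema above."""
    title = 0
    plain_text = 1
    abandon = 2
    figure = 3
    figure_caption = 4
    table = 5
    table_caption = 6
    table_footnote = 7
    isolate_formula = 8
    formula_caption = 9
    embedding = 13
    isolated = 14
    text = 15


def category_name(category_id: int) -> str:
    """Name of a category id, or 'unknown_<id>' for values not in the enum."""
    try:
        return CategoryType(category_id).name
    except ValueError:
        return f"unknown_{category_id}"


def count_categories(model_json_text: str) -> dict[int, Counter]:
    """Map each page_no to a Counter of category names found on that page."""
    pages = json.loads(model_json_text)  # list shaped like inference_result
    return {
        page["page_info"]["page_no"]: Counter(
            category_name(det["category_id"]) for det in page["layout_dets"]
        )
        for page in pages
    }
```

Call it on the file contents, for example `count_categories(open("demo_model.json", encoding="utf-8").read())`, where `demo_model.json` is a placeholder name.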
#### Coordinate System Description
`poly` coordinate format: `[x0, y0, x1, y1, x2, y2, x3, y3]`
- The four points are the top-left, top-right, bottom-right, and bottom-left corners, in that order
- The coordinate origin is at the top-left corner of the page
![poly coordinate diagram](../images/poly.png)
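Because `poly` lists the four corners in a fixed order, you can derive an axis-aligned box `[x_min, y_min, x_max, y_max]` from the minimum and maximum x and y values. The sketch below does this and also filters detections by a confidence threshold. The helper names and the 0.5 default threshold are hypothetical choices, not MinerU settings.

```python
def poly_to_bbox(poly: list[float]) -> list[float]:
    """Convert [x0,y0,x1,y1,x2,y2,x3,y3] to [x_min, y_min, x_max, y_max]."""
    if len(poly) != 8:
        raise ValueError("poly must contain 8 numbers (4 corner points)")
    xs = poly[0::2]  # x0, x1, x2, x3
    ys = poly[1::2]  # y0, y1, y2, y3
    return [min(xs), min(ys), max(xs), max(ys)]


def confident_bboxes(layout_dets: list[dict], min_score: float = 0.5) -> list[list[float]]:
    """Boxes for detections whose score is at least min_score."""
    return [poly_to_bbox(det["poly"]) for det in layout_dets if det["score"] >= min_score]
```

Taking min and max also gives a sensible enclosing box if a quadrilateral is slightly skewed.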
#### Sample Data
```json
[
...
]
```
### VLM Output Results (model_output.txt)
> [!NOTE]
> Only applicable to VLM backend
**File naming format**: `{original_filename}_model_output.txt`
#### File Format Description
- `----` separates the output of each page
- Each page contains multiple text blocks, each starting with `<|box_start|>` and ending with `<|md_end|>`
#### Field Meanings
| Tag | Format | Description |
|-----|--------|-------------|
| Bounding box | `<\|box_start\|>x0 y0 x1 y1<\|box_end\|>` | Coordinates of the top-left and bottom-right points, after scaling the page to 1000×1000 |
| Type tag | `<\|ref_start\|>type<\|ref_end\|>` | Content block type identifier |
| Content | `<\|md_start\|>markdown content<\|md_end\|>` | Markdown content of the block |
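To make the format concrete, here is a minimal Python parser sketch. It makes several assumptions: the tags appear exactly as in the table, optionally separated by whitespace or newlines; pages are separated by a line containing only `----`; and box coordinates are integers on the 1000×1000 grid. The names `VlmBlock`, `parse_model_output`, and `scale_box` are hypothetical.

```python
import re
from dataclasses import dataclass

BLOCK_RE = re.compile(
    r"<\|box_start\|>\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*<\|box_end\|>\s*"
    r"<\|ref_start\|>(.*?)<\|ref_end\|>\s*"
    r"<\|md_start\|>(.*?)<\|md_end\|>",
    re.DOTALL,
)


@dataclass
class VlmBlock:
    box: tuple[int, int, int, int]  # x0, y0, x1, y1 on the 1000x1000 grid
    type: str                       # e.g. "text", "title", "table"
    content: str                    # raw Markdown (or otsl for tables)


def parse_model_output(text: str) -> list[list[VlmBlock]]:
    """Split the file into pages on '----' lines and extract each page's blocks."""
    pages = re.split(r"^\s*----\s*$", text, flags=re.MULTILINE)
    return [
        [
            VlmBlock(
                box=tuple(int(v) for v in m.group(1, 2, 3, 4)),
                type=m.group(5).strip(),
                content=m.group(6),
            )
            for m in BLOCK_RE.finditer(page)
        ]
        for page in pages
    ]


def scale_box(box: tuple[int, int, int, int],
              page_width: float, page_height: float) -> tuple[float, float, float, float]:
    """Map a normalized 1000x1000 box back to page coordinates."""
    x0, y0, x1, y1 = box
    return (x0 * page_width / 1000, y0 * page_height / 1000,
            x1 * page_width / 1000, y1 * page_height / 1000)
```

Splitting on a whole `----` line is a guess about the file layout. If block content could itself contain such a line (for example a Markdown horizontal rule), a stricter separator rule would be needed.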
#### Supported Content Types
```json
{
  "text": "Text",
  "title": "Title",
  "image": "Image",
  "image_caption": "Image caption",
  "image_footnote": "Image footnote",
  "table": "Table",
  "table_caption": "Table caption",
  "table_footnote": "Table footnote",
  "equation": "Interline formula"
}
```
#### Special Tags
- `<|txt_contd|>`: Appears at the end of a text block, indicating that the block can be joined with the following text block(s)
- Table content uses the `otsl` format and must be converted to HTML before it can be rendered in Markdown
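A small post-processing step can handle the `<|txt_contd|>` marker. The sketch below works on hypothetical `(type, content)` pairs in reading order. It strips the marker and joins a `text` block with the immediately following `text` block, chaining when several blocks continue in a row. Two behaviors are assumptions. First, it joins with a single space, which suits English; CJK text may call for `sep=""`. Second, it only merges adjacent blocks, so a continuation interrupted by an image is left unmerged.

```python
CONTD = "<|txt_contd|>"


def merge_continued_text(blocks: list[tuple[str, str]],
                         sep: str = " ") -> list[tuple[str, str]]:
    """Merge text blocks that end with <|txt_contd|> into the next text block."""
    merged: list[tuple[str, str]] = []
    pending = False  # True when the previous block asked to be continued
    for block_type, content in blocks:
        continues = block_type == "text" and content.rstrip().endswith(CONTD)
        if continues:
            # Drop the marker and any whitespace before it
            content = content.rstrip()[: -len(CONTD)].rstrip()
        if pending and block_type == "text" and merged and merged[-1][0] == "text":
            merged[-1] = ("text", merged[-1][1] + sep + content)
        else:
            merged.append((block_type, content))
        pending = continues
    return merged
```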
### Intermediate Processing Results (middle.json)
**File naming format**: `{original_filename}_middle.json`
#### Top-level Structure
| Field Name | Type | Description |
|------------|------|-------------|
| `pdf_info` | `list[dict]` | Array of parsing results for each page |
| `_backend` | `string` | Parsing mode: `pipeline` or `vlm` |
| `_version_name` | `string` | MinerU version number |
#### Page Information Structure (pdf_info)
| Field Name | Description |
|------------|-------------|
| `preproc_blocks` | Unsegmented intermediate results after PDF preprocessing |
| `layout_bboxes` | Layout segmentation results, including layout direction and bounding boxes, sorted by reading order |
| `page_idx` | Page number, starting from 0 |
| `page_size` | Page width and height `[width, height]` |
| `_layout_tree` | Layout tree structure |
| `images` | Image block information list |
| `tables` | Table block information list |
| `interline_equations` | Interline formula block information list |
| `discarded_blocks` | Block information to be discarded |
| `para_blocks` | Content block results after segmentation |
#### Block Structure Hierarchy
```
Level 1 blocks (table | image)
└── Level 2 blocks
└── Lines
└── Spans
```
#### Level 1 Block Fields
| Field Name | Description |
|------------|-------------|
| `type` | Block type: `table` or `image` |
| `bbox` | Rectangular box coordinates of the block `[x0, y0, x1, y1]` |
| `blocks` | List of contained level 2 blocks |
#### Level 2 Block Fields
| Field Name | Description |
|------------|-------------|
| `type` | Block type (see table below) |
| `bbox` | Rectangular box coordinates of the block |
| `lines` | List of contained line information |
#### Level 2 Block Types
|------|-------------|
| `image_body` | Image body |
| `image_caption` | Image caption text |
| `image_footnote` | Image footnote |
| `table_body` | Table body |
| `table_caption` | Table caption text |
| `table_footnote` | Table footnote |
| `text` | Text block |
| `title` | Title block |
| `index` | Index block |
| `list` | List block |
| `interline_equation` | Interline formula block |

#### Line and Span Structure

**Line fields**:

- `bbox`: Rectangular box coordinates of the line
- `spans`: List of contained spans

**Span fields**:

- `bbox`: Rectangular box coordinates of the span
- `type`: Span type (`image`, `table`, `text`, `inline_equation`, `interline_equation`)
- `content` or `img_path`: text spans store their text in `content`; image and table spans store the screenshot path in `img_path`
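
To make the hierarchy concrete, here is a minimal Python sketch that walks one page's `para_blocks` and collects span text. It assumes a page is a dict shaped as described above: level 1 blocks nest level 2 blocks under `blocks`, and level 2 blocks hold `lines`. `iter_spans`, `page_text`, and the sample page are hypothetical illustrations, not part of MinerU's API.

```python
from typing import Any, Dict, Iterator, Tuple

Block = Dict[str, Any]


def iter_spans(block: Block) -> Iterator[Tuple[str, Block]]:
    """Yield (level 2 block type, span) pairs for one entry of para_blocks.

    Level 1 blocks (table/image) nest level 2 blocks under "blocks";
    level 2 blocks hold "lines", and each line holds "spans".
    """
    if "blocks" in block:  # level 1 block: descend into its level 2 blocks
        for child in block["blocks"]:
            yield from iter_spans(child)
        return
    for line in block.get("lines", []):
        for span in line.get("spans", []):
            yield block["type"], span


def page_text(page: Block) -> str:
    """Join the `content` of every span on a page that stores text."""
    parts = []
    for block in page.get("para_blocks", []):
        for _block_type, span in iter_spans(block):
            if "content" in span:  # image/table spans use img_path instead
                parts.append(span["content"])
    return " ".join(parts)


# Hypothetical page: one text block and one image (level 1) block.
sample_page = {
    "para_blocks": [
        {"type": "text", "bbox": [0, 0, 100, 20], "lines": [
            {"bbox": [0, 0, 100, 20], "spans": [
                {"bbox": [0, 0, 50, 20], "type": "text", "content": "Energy is"},
                {"bbox": [50, 0, 100, 20], "type": "inline_equation", "content": "E=mc^2"},
            ]},
        ]},
        {"type": "image", "bbox": [0, 30, 100, 90], "blocks": [
            {"type": "image_body", "bbox": [0, 30, 100, 80], "lines": [
                {"bbox": [0, 30, 100, 80], "spans": [
                    {"bbox": [0, 30, 100, 80], "type": "image", "img_path": "a.jpg"},
                ]},
            ]},
            {"type": "image_caption", "bbox": [0, 80, 100, 90], "lines": [
                {"bbox": [0, 80, 100, 90], "spans": [
                    {"bbox": [0, 80, 100, 90], "type": "text", "content": "Figure 1"},
                ]},
            ]},
        ]},
    ]
}

print(page_text(sample_page))  # Energy is E=mc^2 Figure 1
```

The recursion only needs to distinguish the `blocks` key from the `lines` key. It therefore handles text blocks that sit directly at level 2 and table/image blocks that add the extra level 1 wrapper.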
#### Sample Data
```json
{
...@@ -354,29 +388,37 @@ First-level block (if any) -> Second-level block -> Line -> Span
}
```
### Content List (content_list.json)
**File naming format**: `{original_filename}_content_list.json`
#### Functionality
This is a simplified version of `middle.json` that stores all readable content blocks in reading order as a flat structure, removing complex layout information for easier subsequent processing.
#### Content Types
| Type | Description |
|------|-------------|
| `image` | Image |
| `table` | Table |
| `text` | Text or title |
| `equation` | Interline formula |

#### Text Level Identification

Titles and body text are both represented with the `text` type. Their hierarchy is distinguished through the `text_level` field:

- No `text_level` or `text_level: 0`: Body text
- `text_level: 1`: Level 1 heading
- `text_level: 2`: Level 2 heading
- And so on.
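
The `text_level` rule maps directly onto Markdown heading depth. The sketch below assumes each text entry stores its string under a `text` key; that is an assumption, so check your own `content_list.json`. The helper names are hypothetical, and MinerU's own Markdown output may be generated differently.

```python
from typing import Any, Dict, List

ContentBlock = Dict[str, Any]


def to_markdown_line(item: ContentBlock) -> str:
    """Render one `text` entry as Markdown.

    A missing text_level or text_level 0 is body text; level N becomes
    a heading prefixed with N '#' characters.
    """
    level = item.get("text_level", 0)
    text = item["text"]  # assumed key name for the block's text
    return f"{'#' * level} {text}" if level > 0 else text


def outline(content_list: List[ContentBlock]) -> List[str]:
    """Return only the headings (text_level >= 1) as Markdown lines."""
    return [
        to_markdown_line(item)
        for item in content_list
        if item.get("type") == "text" and item.get("text_level", 0) > 0
    ]


sample = [
    {"type": "text", "text": "Introduction", "text_level": 1, "page_idx": 0},
    {"type": "text", "text": "MinerU converts PDFs.", "page_idx": 0},
    {"type": "text", "text": "Background", "text_level": 2, "page_idx": 1},
]
print(outline(sample))              # ['# Introduction', '## Background']
print(to_markdown_line(sample[1]))  # MinerU converts PDFs.
```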

#### Common Fields

All content blocks include a `page_idx` field indicating the page number (starting from 0).
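
Because every block carries a 0-based `page_idx`, per-page processing reduces to a grouping step. The following hypothetical sketch uses sample entries trimmed to `type` and `page_idx`; real entries carry additional fields.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List

ContentBlock = Dict[str, Any]


def blocks_by_page(content_list: List[ContentBlock]) -> Dict[int, List[ContentBlock]]:
    """Group content blocks by their 0-based page_idx, keeping reading order."""
    pages: Dict[int, List[ContentBlock]] = defaultdict(list)
    for item in content_list:
        pages[item["page_idx"]].append(item)
    return dict(pages)


def type_counts(content_list: List[ContentBlock]) -> Counter:
    """Count blocks per content type (image, table, text, equation)."""
    return Counter(item["type"] for item in content_list)


sample = [
    {"type": "text", "page_idx": 0},
    {"type": "image", "page_idx": 0},
    {"type": "equation", "page_idx": 1},
    {"type": "text", "page_idx": 1},
]
pages = blocks_by_page(sample)
print(sorted(pages))                  # [0, 1]
print([b["type"] for b in pages[1]])  # ['equation', 'text']
print(type_counts(sample)["text"])    # 2
```

Since the array is already in reading order, appending in sequence preserves that order within each page. Add 1 to `page_idx` when you need human-facing page numbers.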

#### Sample Data

```json
[
...@@ -437,4 +479,13 @@ Each content contains the `page_idx` field, indicating the page number (starting
    "page_idx": 5
  }
]
```
## Summary
Together, these files make up MinerU's complete output. Choose the files that suit your downstream task:

- **Model outputs**: use the raw outputs (`model.json`, `model_output.txt`)
- **Debugging and verification**: use the visualization files (`layout.pdf`, `spans.pdf`)
- **Content extraction**: use the simplified files (`*.md`, `content_list.json`)
- **Secondary development**: use the structured file (`middle.json`)
...@@ -5,19 +5,21 @@
### Memory Optimization Parameters
> [!TIP]
> SGLang acceleration mode currently supports Turing-architecture GPUs with at least 8 GB of VRAM, but GPUs with less than 24 GB of VRAM may run out of memory. You can reduce memory usage with the following parameters:
>
> - If a single GPU runs out of VRAM, reduce the KV cache size with `--mem-fraction-static 0.5`. If VRAM issues persist, try reducing it further to `0.4` or lower.
> - If you have two or more GPUs, you can use tensor parallelism (TP) mode to expand the available VRAM: `--tp-size 2`

### Performance Optimization Parameters
> [!TIP]
> If SGLang-accelerated VLM inference already works for you and you want to further improve inference speed, you can try the following parameters:
>
> - If you have multiple GPUs, use SGLang's data parallel (DP) mode to increase throughput: `--dp-size 2`
> - You can also enable `torch.compile` to speed up inference by approximately 15%: `--enable-torch-compile`
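
The flags from the memory and performance tips can be combined in one launch. The helper below is a hypothetical sketch that only builds the `mineru-sglang-server` argument list. Its GPU check assumes each data-parallel replica uses `--tp-size` GPUs, so a launch needs `dp_size * tp_size` visible GPUs. This is consistent with the four-GPU `--dp-size 2 --tp-size 2` scenario in this document, but verify it against the SGLang documentation.

```python
from typing import List, Optional


def sglang_server_command(
    port: int = 30000,
    mem_fraction_static: Optional[float] = None,
    tp_size: int = 1,
    dp_size: int = 1,
    enable_torch_compile: bool = False,
    visible_gpus: Optional[int] = None,
) -> List[str]:
    """Build a mineru-sglang-server argument list (hypothetical helper).

    Assumes each data-parallel replica spans tp_size GPUs, so the launch
    needs dp_size * tp_size visible GPUs.
    """
    needed = dp_size * tp_size
    if visible_gpus is not None and visible_gpus < needed:
        raise ValueError(
            f"dp_size * tp_size needs {needed} GPUs, but only {visible_gpus} are visible"
        )
    cmd = ["mineru-sglang-server", "--port", str(port)]
    if mem_fraction_static is not None:  # smaller value -> smaller KV cache
        cmd += ["--mem-fraction-static", str(mem_fraction_static)]
    if tp_size > 1:
        cmd += ["--tp-size", str(tp_size)]
    if dp_size > 1:
        cmd += ["--dp-size", str(dp_size)]
    if enable_torch_compile:
        cmd.append("--enable-torch-compile")
    return cmd


print(" ".join(sglang_server_command(dp_size=2, tp_size=2, visible_gpus=4)))
# mineru-sglang-server --port 30000 --tp-size 2 --dp-size 2
```

Once the list looks right, you can pass it to `subprocess.run`.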

### Parameter Passing Instructions
> [!TIP]
> - For more details on `sglang` parameters, please refer to the [SGLang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands).
> - All officially supported SGLang parameters can be passed to MinerU as command line arguments of the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.

## GPU Device Selection and Configuration
...@@ -31,22 +33,29 @@
### Common Device Configuration Examples
> [!TIP]
> Here are some common `CUDA_VISIBLE_DEVICES` settings:
> ```bash
> CUDA_VISIBLE_DEVICES=1        # Only device 1 will be seen
> CUDA_VISIBLE_DEVICES=0,1      # Devices 0 and 1 will be visible
> CUDA_VISIBLE_DEVICES="0,1"    # Same as above; quotation marks are optional
> CUDA_VISIBLE_DEVICES=0,2,3    # Devices 0, 2, 3 will be visible; device 1 is masked
> CUDA_VISIBLE_DEVICES=""       # No GPU will be visible
> ```
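
Inside a process, CUDA renumbers the visible devices starting from 0. With `CUDA_VISIBLE_DEVICES=0,2,3`, for example, `cuda:1` refers to physical GPU 2. The simplified Python sketch below illustrates that mapping. It handles only integer indices (not GPU UUIDs), and the function names are hypothetical. The shell strips the quotation marks, so `"0,1"` reaches the process as `0,1`.

```python
import os
from typing import Dict, List, Optional


def visible_devices(value: Optional[str]) -> Optional[List[int]]:
    """Interpret a CUDA_VISIBLE_DEVICES value as physical GPU indices.

    Returns None when the variable is unset (all GPUs visible) and []
    when it is an empty string (no GPU visible).
    """
    if value is None:
        return None
    value = value.strip()
    if not value:
        return []
    return [int(part) for part in value.split(",")]


def logical_to_physical(value: Optional[str]) -> Dict[str, int]:
    """Map in-process device ids (cuda:0, cuda:1, ...) to physical indices.

    Returns {} when the variable is unset, because the physical list is
    then unknown without querying the driver.
    """
    devices = visible_devices(value) or []
    return {f"cuda:{i}": phys for i, phys in enumerate(devices)}


print(visible_devices("0,2,3"))      # [0, 2, 3]
print(visible_devices(""))           # []
print(logical_to_physical("0,2,3"))  # {'cuda:0': 0, 'cuda:1': 2, 'cuda:2': 3}

# Inspect the setting of the current process:
print(visible_devices(os.environ.get("CUDA_VISIBLE_DEVICES")))
```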

## Practical Application Scenarios
> [!TIP]
> Here are some possible usage scenarios:
>
> - If you have multiple GPUs and want to start `sglang-server` on GPUs 0 and 1 with multi-GPU data parallelism, you can use the following command:
> ```bash
> CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
> ```
>
> - If you have multiple GPUs and want to start `sglang-server` on GPUs 0 to 3 with both data parallelism and tensor parallelism, you can use the following command:
> ```bash
> CUDA_VISIBLE_DEVICES=0,1,2,3 mineru-sglang-server --port 30000 --dp-size 2 --tp-size 2
> ```
>
> - If you have multiple GPUs and want to start two `fastapi` services on GPUs 0 and 1, each listening on a different port, you can use the following commands:
> ```bash
> # In terminal 1
......
...@@ -63,6 +63,8 @@ Options:
## Environment Variables Description
Some parameters of the MinerU command line tools have equivalent environment variables. Generally, environment variables take priority over command line parameters and apply to all command line tools.
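
The precedence rule can be expressed as a small lookup. This is an illustrative sketch of the general rule stated above, not MinerU's implementation. `resolve_setting` is a hypothetical name, and because the rule only holds "generally", individual parameters may behave differently.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    env_name: str,
    cli_value: Optional[str],
    default: str,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Pick a value: environment variable first, then CLI parameter, then default."""
    env_value = environ.get(env_name)
    if env_value:  # treat unset or empty as "not provided"
        return env_value
    if cli_value is not None:
        return cli_value
    return default


env = {"MINERU_MODEL_SOURCE": "modelscope"}
print(resolve_setting("MINERU_MODEL_SOURCE", "huggingface", "huggingface", env))  # modelscope
print(resolve_setting("MINERU_MODEL_SOURCE", "local", "huggingface", {}))         # local
```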
Here are the environment variables and their descriptions:
- `MINERU_DEVICE_MODE`: Specifies the inference device. Supports device types such as `cpu`, `cuda`, `cuda:0`, `npu`, and `mps`. Only effective for the `pipeline` backend.
- `MINERU_VIRTUAL_VRAM_SIZE`: Specifies the maximum GPU VRAM usage per process (in GB). Only effective for the `pipeline` backend.
- `MINERU_MODEL_SOURCE`: Specifies the model source. Supports `huggingface`, `modelscope`, and `local`, and defaults to `huggingface`. Set it to `modelscope` or `local` to switch to ModelScope or local models.
......
...@@ -15,14 +15,16 @@ MinerU has built-in command line tools that allow users to quickly use MinerU fo
# Default parsing using pipeline backend
mineru -p <input_path> -o <output_path>
```
> [!TIP]
> - `<input_path>`: Local PDF/image file or directory
> - `<output_path>`: Output directory
>
> For more information about output files, please refer to [Output File Documentation](../output_files.md).

> [!NOTE]
> On Linux and macOS, the command line tool automatically attempts CUDA/MPS acceleration.
> Windows users who need CUDA acceleration should visit the [PyTorch official website](https://pytorch.org/get-started/locally/) and select the install command that matches their CUDA version to install CUDA-enabled `torch` and `torchvision`.
```bash
# Or specify vlm backend for parsing
...@@ -42,7 +44,8 @@ If you need to adjust parsing options through custom parameters, you can also ch
```bash
mineru-api --host 127.0.0.1 --port 8000
```
> [!TIP]
> Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
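
`/docs` is FastAPI's interactive documentation page. Assuming the default FastAPI configuration, the same server should also publish a machine-readable schema at `/openapi.json`. The sketch below uses hypothetical helper names and lists the available routes from that schema, so it does not have to hard-code endpoint names, which this page does not document.

```python
import json
import urllib.request
from typing import Any, Dict, List

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> Dict[str, Any]:
    """Download the OpenAPI schema, assuming FastAPI's default /openapi.json path."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_endpoints(schema: Dict[str, Any]) -> List[str]:
    """Return 'METHOD /path' strings for every operation in an OpenAPI schema."""
    return [
        f"{method.upper()} {path}"
        for path, operations in sorted(schema.get("paths", {}).items())
        for method in operations
        if method in HTTP_METHODS  # skip path-level keys such as "parameters"
    ]
```

With the server running, `list_endpoints(fetch_openapi())` returns the routes that your installed version actually exposes.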
- Start Gradio WebUI visual frontend:
```bash
# Using pipeline/vlm-transformers/vlm-sglang-client backends
...@@ -50,23 +53,32 @@ If you need to adjust parsing options through custom parameters, you can also ch
# Or using vlm-sglang-engine/pipeline backends (requires sglang environment)
mineru-gradio --server-name 127.0.0.1 --server-port 7860 --enable-sglang-engine true
```
> [!TIP]
>
> - Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
> - Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.
- Using `sglang-client/server` method:
```bash
# Start sglang server (requires sglang environment)
mineru-sglang-server --port 30000
```
> [!TIP]
> In another terminal, connect to the sglang server via the sglang client. This only requires CPU and network access; no sglang environment is needed:
> ```bash
> mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
> ```
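
When scripting this two-terminal workflow, start the client only after the server accepts connections. Here is a minimal standard-library sketch; `wait_for_port` is a hypothetical helper. An open port shows the server is listening, but it does not by itself prove that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 30000, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:  # connection refused or timed out: retry shortly
            time.sleep(0.5)
    return False


# Example (not executed; input.pdf and output are placeholder paths):
# import subprocess
# if wait_for_port(port=30000):
#     subprocess.run(["mineru", "-p", "input.pdf", "-o", "output",
#                     "-b", "vlm-sglang-client", "-u", "http://127.0.0.1:30000"],
#                    check=True)
```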

> [!TIP]
> All officially supported sglang parameters can be passed to MinerU as command line arguments of the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
> Commonly used `sglang` parameters and their usage are collected in [Advanced Command Line Parameters](./advanced_cli_parameters.md).
## Extending MinerU Functionality with Configuration Files
MinerU works out of the box, but it also supports extending its functionality through configuration files. You can create a `mineru.json` file in your user directory to add custom configurations.

The `mineru.json` file is generated automatically when you run the built-in model download command `mineru-models-download`. Alternatively, you can create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.

Here are some available configuration options:

- `latex-delimiter-config`: Configures the LaTeX formula delimiters. Defaults to the `$` symbol and can be changed to other symbols or strings as needed.
- `llm-aided-config`: Configures LLM-assisted title hierarchy. Compatible with all LLMs that support the OpenAI protocol; defaults to Alibaba Cloud Bailian's `qwen2.5-32b-instruct` model. To enable this feature, configure your own API key and set `enable` to `true`.
- `models-dir`: Specifies the local model storage directories. Specify the model directories for the `pipeline` and `vlm` backends separately. Once the directories are set, you can use local models by setting the environment variable `export MINERU_MODEL_SOURCE=local`.
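
To check which of these options your installation already defines, you can read `mineru.json` directly. This is a minimal sketch that assumes "user directory" means the home directory returned by `Path.home()`. The helper names are hypothetical, and the sketch deliberately inspects only the top-level keys listed above rather than guessing their inner structure.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Top-level options documented in this section.
KNOWN_KEYS = ("latex-delimiter-config", "llm-aided-config", "models-dir")


def load_mineru_config(path: Optional[Path] = None) -> Dict[str, Any]:
    """Load mineru.json (default: home directory); return {} if it is missing."""
    path = path or Path.home() / "mineru.json"
    if not path.is_file():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def configured_options(config: Dict[str, Any]) -> Dict[str, bool]:
    """Report which documented top-level options are present in the config."""
    return {key: key in config for key in KNOWN_KEYS}
```

Calling `configured_options(load_mineru_config())` returns `False` for every option when no `mineru.json` exists yet.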
...@@ -38,6 +38,7 @@ mineru-models-download
```
> [!TIP]
> - After the download completes, the model path is printed in the current terminal window and automatically written to `mineru.json` in the user directory.
> - You can also create `mineru.json` by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
> - After downloading models locally, you can freely move the model folder to another location, as long as you update the model path in `mineru.json`.
> - If you deploy the model folder to another server, make sure to also move the `mineru.json` file to the user directory of the new machine and configure the model path correctly.
> - To update model files, run the `mineru-models-download` command again. Model updates do not currently support custom paths. If you haven't moved the local model folder, model files are updated incrementally; if you have moved it, model files are re-downloaded to the default location and `mineru.json` is updated.
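
If you move the model folder, you can script the path update. This sketch assumes `models-dir` maps backend names such as `pipeline` and `vlm` to directory paths, which is suggested by the note that the two backends are configured separately. Confirm the layout in your own `mineru.json` before relying on it. `update_models_dir` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict


def update_models_dir(config_path: Path, new_dirs: Dict[str, str]) -> Dict[str, str]:
    """Rewrite selected models-dir entries in mineru.json after moving model folders.

    Assumes "models-dir" maps backend names (e.g. "pipeline", "vlm") to paths;
    entries not named in new_dirs and all other keys are left unchanged.
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    models_dir = config.setdefault("models-dir", {})
    models_dir.update(new_dirs)
    config_path.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")
    return models_dir


# Example with a placeholder path:
# update_models_dir(Path.home() / "mineru.json", {"pipeline": "/data/models/pipeline"})
```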
......
<?xml version="1.0" encoding="UTF-8" standalone="no"?> <svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"> <path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="url(#paint0_linear_8609_1645)"/>
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" width="516px" height="516px" viewBox="0 0 516 516" enable-background="new 0 0 516 516" xml:space="preserve"> <image id="image0" width="516" height="516" x="0" y="0" <path d="M19.7238 3.86898C19.7238 4.57597 19.1502 5.1491 18.4427 5.1491C17.7352 5.1491 17.1616 4.57597 17.1616 3.86898C17.1616 3.16199 17.7352 2.58887 18.4427 2.58887C19.1502 2.58887 19.7238 3.16199 19.7238 3.86898Z" fill="#010101"/>
xlink:href=" <path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="url(#paint1_linear_8609_1645)"/>
AAB1MAAA6mAAADqYAAAXcJy6UTwAAAACYktHRAD/h4/MvwAAAAlwSFlzAAAWJQAAFiUBSVIk8AAA <path d="M15.3681 5.1491C15.3681 5.85609 14.7945 6.42921 14.087 6.42921C13.3794 6.42921 12.8059 5.85609 12.8059 5.1491C12.8059 4.44211 13.3794 3.86898 14.087 3.86898C14.7945 3.86898 15.3681 4.44211 15.3681 5.1491Z" fill="#010101"/>
MLJJREFUeNrt3XecXFX9//HXmfSekN4I6SGVhJAAMQkkFAEF4QsCoiBFaYIIyE/Er/IFUbCioIB0 <path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="url(#paint2_linear_8609_1645)"/>
BemKikIoAjZAQGkhodcACSSkk5BkP78/Npvtm5k5955zZ+b93McDdjZ77/mcuzvvvfUcZ4hIpcvF <path fill-rule="evenodd" clip-rule="evenodd" d="M8.05175 11.2368C8.05175 13.4605 9.14375 15.4293 10.8211 16.6371C11.8241 15.7389 12.4551 14.4345 12.4551 12.9828V9.39673C12.4551 8.85661 12.8197 8.38448 13.3426 8.24757L19.8924 6.53265C20.6459 6.33534 21.3826 6.90341 21.3826 7.6818L21.3826 12.0452C21.3826 17.2179 17.1861 21.4111 12.0095 21.4111L11.9942 21.4111C6.81758 21.4111 2.62109 17.2179 2.62109 12.0452V9.03388C2.62109 8.49175 2.9884 8.01839 3.51385 7.88336L6.56677 7.09882C7.31904 6.9055 8.05175 7.47318 8.05175 8.24934V11.2368ZM3.9798 12.0452C3.9798 13.8476 4.57565 15.5108 5.58124 16.849C6.04996 17.4728 6.7655 17.8884 7.54573 17.8884V17.8884C8.28848 17.8884 8.9927 17.7236 9.62376 17.4286C7.83439 15.9596 6.69304 13.7314 6.69304 11.2368V8.46821L3.9798 9.16546V12.0452Z" fill="#010101"/>
LkBE4lMQiIiCQEQUBCKCgkBEUBCICAoCEUFBICIoCEQEBYGIoCAQERQEIoKCQERQEIgICgIRQUEg <defs>
IigIRAQFgYigIBARFAQigoJARFAQiAgKAhFBQSAiKAhEBAWBiKAgEBEUBCKCgkBEUBCICAoCEUFB <linearGradient id="paint0_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
ICIoCEQEBYGIoCAQERQEIoKCQERQEIgICgIRQUEgIigIRAQFgYigIBARFAQigoJARFAQiAgKAhFB <stop stop-color="white"/>
QSAiKAhEBAWBiKAgEBEUBCKCgkBEUBCICAoCEUFBICIoCEQEBYGIoCAQERQEIoKCQERQEIgICgIR <stop offset="1" stop-color="#2E2E2E"/>
QUEgIigIRAQFgYigIBARFAQigoJARFAQiAgKAhFBQSAiKAhEBAWBiKAgEBGgdaiGXL1XOaoAh8MA </linearGradient>
A3IYAIbDUVXne6q/Vvfrhm3+r2thiZrlar7eeInml2Pzko17UFMLdb4vt/lVwyXKXc1PomrLdqiq <linearGradient id="paint1_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
s/3qcpu3Ts3PoPY7an6SYeuuqWrT5t+72p9orl/VECYwMTfShtGVbjg2sJx3WWSP8JR7jcW2inrV <stop stop-color="white"/>
V21eY1VK1YbZNsGCQCTjtmEmn2BvNwHqvf3a04XB4A4EVvIP5vEA82MXmzQXKou1R1B+ymiPYKg7 <stop offset="1" stop-color="#2E2E2E"/>
jtnMyHMFr/EEv+auctojwAJ9NPWjcORwm38hqj9zm79a93uqv1b3627Lf1taomY56i1Xd4nml2v4 </linearGradient>
i1zz77W11P2+HLnNa68sNT+J2u1Qd/vV/cjV+SlT7zuIsOVymz9aVbfe3p3r3nBW4MfK3J/c9Nrf <linearGradient id="paint2_linear_8609_1645" x1="14.3898" y1="8.36821" x2="13.1876" y2="19.4461" gradientUnits="userSpaceOnUse">
rfROtgV6fyoIFATFKocgcLNzjxUcAjUfyzk/171mjWkJ8/7UVQOpXF04m3ttWtHLd3Pf4nZ2it2N <stop stop-color="white"/>
JCgIpFINst+479HWbyU2193GZ2N3xZ+CQCrTULvRDkhkTUPc9Xwzdnd86fKhVKJtuZWpia2tvV3g <stop offset="1" stop-color="#2E2E2E"/>
4HuxO+VDewRSeQZwjSUXAwDYtzk6drd8KAik0nS1K21u4mtt537C7NhdK56CQCqMO419U1lxd3c1 </linearGradient>
g2L3rlgKAqksu3F6ausezgWlejuJgkAqSXfOo1t6q3dHcnDsLhZHQSAVxB3jZqbcwjl0jt3LYigI </defs>
pHL04+zU25joDo/dzWIoCKQEuC3PB3h9nEyvAKUeQ9fY26twCgIpAUaV/0cn2zNIsdP5TOztVTgF
gVQIty+TwzTEvrSP3dtCKQikUuzv+4BRvtw+DInd2UIpCKQyDGB8sLa6MjF2dwulIJAS4fw+xjEm
YLFzS+3GIj19KCXCc+y+bV3A43Y3zdqxLlx7/rRHIJVhVNDWJtA9docLoyCQStCRkUHba822sbtc
GAWBVIL2bBO4xRJ7DlFBIJWgTZqPGjWpR+wuF0YnC6WkFHkyvm3wIOgeuD1PCgJJVjt6sB1TGUM/
BtIBA1ayhNd5jv+yiOVsilBVLvhveom9s0qsXMmwHOPYkX3Y2zX667tlirNXuJeHeYzXi22kyIuI
61jqBgfdGiV18VBBIEk5wB1us12/lr/JhnOiO5H57q92Bw8HrM7YGHh7LArcnicFgfjbx53CbDrm
/f3jGOcOZx4/5clAFX7EksDbZEXg9jzpqoH4GcAv3R3sU0AMVOvFEfyFrwR6Tm8NbwXeLi8Ebs+T
gkB8fNr91U6kQ5FL9+ESbmF0gDqreDnkZuE9lgVtz5uCQIqV40xu830bu/35AwcWsVyhHy+yIdym
sQf5KFxrSVAQSHHauWvsB7TzX5GN5iZ3UsFLFfqxkDcCbp3Ho1wk9aAgkGK0dVdzVGKP2rbjZ5yQ
csUv8VTKLdTaGPSKSCIUBFIw19ZdyRGJrrK1u5TjUi3aeCjV9ddt6t/F3ycRi4JACua+ypGJr7SV
+4nbPc2q7S+8meb667ir1E4VKgikcJ+x76Sy3i78MtUrCK/ZHSmuvdbr3BmknUQpCKQwQ90P6ZTS
use4s1Md4usXQW78/R0LArSSMAWBFCT3LUakuPrPJ3zuob5X7MoU115tiV2eehspUBBIIfa2Q1Nd
fyvOZUCK67+Yd1OtH67jpZRbSIWCQPLX1Z2d2mHBZm44R6e4+lft3DSrtyfs+2muPz0KAsnfHqQ8
lzCAO5XeKa7+am5Lbd2rOYflKdaeIgWB5KuV+1SQ35c+HJbi2jfZd9I6mWdncG+KladKQSD5GuH2
C9OQO6TgZxkLscCOTuXv9hXcmGLVKVMQSL72sD6BWprM1FTX/5idkPRDQXaVncGaVKtOlQYmiaxk
ZsbKBZzsu5Obbn9LtYVbrK27PLn9DruCU/k49e2SIgVBZJ4TeYXT1U0P1pZjV1zKm+Y3vM+lDE9g
TWa/5MzSjgEdGki+ptA5YGsjA0wsfg8H8VfvtSzhZE4ttaFKG1MQSH52CHoUMzDIlGHP2MH2HVYW
vwK73w6yy6gKtl1So0OD6ErkLMGEoK11c0XcX1jEscSHnGd/c99lGm0KXvY9+wFXsSq5Tqd9NNQS
BUF0VhpREHYKL2e9grX1EJ+wg9wpTC3g4OcVu5lLWJxYd6OfK1IQSH76Bm6vS9DWfsfvbH/3Saay
01a+80P+YfdzZ1JjG+QyclyhIMiAktgnCD13YNvgPfyj/ZEBTGAcOzGOoa7e/oG9zfPM5xkW8h/f
KwSOmp957P2AWgqCTCiBKAj9OxvnPfIO7zCPdnSlI11sG1phONa6paxlVTLnA1wGDgQaUxBIfkKP
yhvzetZ63k9nxdmNe10+zIjs/Y1owOMiW1HWxu5w8rIbA9ojkHylPaBHA67o4T+zF6lZDoAaCoLM
yN4vcD2vB/113mCJXZqTfOjQQPLzbNDW3g06L5EoCCRPTwVtbVHpTRHSlFI4KKimIJD8vMSrAVt7
PvjJyYS5EgoBUBBIvj6yB4K1tckejd3dSqMgkHzdHux85juEC50UlNa+QDUFgeTrGZ4P1NK/eS12
Z4uXy/r1n2aqFsnP+/b7MA3Zr2J3tRildlagPgWB5GsTt/JW+s3Y/dwXu6vFURBIZXg2wDy/VVyU
1L51uF/uUo6AagoCKYBdlvatxnYniVwxcHX+m7bSjwEFgRRmgX031fW/w/ms9luFa/TakdYRfGmf
F6hLQSCFucL+kNq6zb6f3h2MSb9lXQrrjEdBIIXZxKmp3WN4C6lfL0jiF7583v61FARSqDftSJam
sN5/2XF+g4Dl/wb1eSuXz+FAXQoCKdw/7fNJR4EtsGP85g4s7u3Z/MBhVmetaZ5nyAYFgRTjHjuC
DxNc30K+wAvFLpzz/Atf+99cnTd8yOsO8SkIpDjz7LO8ndC6/m6H8mTsDlU2BYEU6347hH8nsJ4b
7QCeKW7Rct5ZD0tBIMV71D5lv/Q6wfeenWLHFn+QkcWBwUuTgkB8vM+pdrA9VtzCdovtyaWsL2ZZ
7QkkS0EgfjbxJ2bbUcxnYwFLrbAHbV+O4LnY5Us1jWIs/tbza7uJo9wsZjBsq9/9Ag/YbTxUbGPZ
miqsXLhQm7T+rlz11I8113CN2uEcDIejqs73VH+t7tcN2/xf18ISNcvVfL3xEs0v19TRp9tSn235
vPr7cptfVfqvp8PGuZ1sGuMY4/rU/zeD13iahTzNE7zs0cJWfrI1Pzm3ld+Rpr9ef4mWfne23kLt
OlwTvahZqrae5s54hPmt0h6BJGk+87mOnvSkuw2gJ+0BY5V7j2UsY7Hf/EWlOfZPaVAQSPKWpnIL
cunqTDs60Io2wMdsZB2r2BC7qPoUBFICqnegS043RrA9kxjPALalE22ANaziHZ7jOZ7jpeIPkpKm
IBBJ3vZMZ4oby0TXu8G/dKKT9WOKA1jg/sHdBBoJsmUKAsmwmhNwJaMDu7O3m8YABuVxaX57trcj
3NN2Bb+OffpDQSDiqy3d2JZZfMpNpqO1K2jZjuziprkj7Tz+FjMMFASSSSVyt8BAtmMMuzHdjfSo
t5XNcbPsZ/yAJbE6oiCQzCmBJwjaMZXpbpztwFjXPpE1tnZnsCvHsiBOhxQEIoUYym7s5sYziD7+
K2tgF3eXncw9MbqlIBDZuo50YyJz2MONoJO1Sq2dYe52O5ZbwndQQSAZkrn7BXIMYSRjmcsM1yNI
i53c5fZB+ElgFQSSARk8NdjfJrmdGG/j3LjAtXV3V9qnmR+2uwoCyYQMnSCczF7s6kYwkG6RKhjq
rrJ9WB6ySQXB1uTQg4Upysy+QFe6M4N92N11p3PsYtiZs/hmyAYVBE1pzQAGMpBt6U87oIpVLOEN
3ubdtOf+qxwZ2QfozBBGMI05bueMRBIA7lS7h7+Fa09BUF9fdmac24FRbO/aNvrX5bzonuI/zOP1
2IXGkNxfb1dn/IBoxrCDm8Bkm+QGZCkCNuvkLrQ5rAvVnAYmqf53xy5uLtPYjmF03GpnXnD/sGv5
Z6UNTFK7BZt+47gtW7zh3/vaV2kPG5JHC13Y1e3FeEawbab/EG7i83ZzqIiq7CBo5bpU9WfX3F72
CboWeGS4xt1bdS7PVGoQVPe4vuaDgPzfpqQUBK1dl6oh7JGbbdPpTIfY2zIv8+wA1muEojT1YQjb
MYdZbmyRmdvJDnS72I+4JtEZf0pK3UG76kd93aG3crEjsjfbMZrZzHSjM3gI0JIp7Mi/wjRVaUHQ
lslMdBOZbJNdR+9fi37uR7Yn/4+nY3crrvoj7jlaHoMvmLaMZ7KbwI5MSeBnHUNvZioIkrYdc9nJ
7cAQ+iW5Wrc3YzmTW2N3T+roz2zmuvE2mIGxS/G0J5ezIkRD5R4EHejFJHZzMxlu3VLq7WC72rXi
pthdrXjt6cRY9mOu246utPVfYXxudxuoIPDRnmEMYmc+4XalU+o7hZ3tMmfcHLvTFWsIQ5jAXuzS
aGCwUpdje54P0VD5BcEwpjCWiezCgIDTYnXjF7Y+G6PPVZDu7MhYprAjE8p2CrRPcEeIZsonCDoz
jbluKtsxlDbhfy1sG/dLW8DC2JuhQkxkhpvFKBvmupfkacD8DQrTTKkHQRs6MZB9meMm0tXi3iPe
z11te/BR7E1SxjrShV3Yh5muH11Jb1SADHEjwrRTukEwkIFMYjaz3aDM/E3Y1Z1l/xe7iDLUiQGM
ZBqzmekq4u1fR/cwzZReEHRhMmPYkR2Z5LJX/an8gadiF1FGBrET45jKRDc0M3EfVqBOZ++t1JxW
jGOGm872NpyeGXlyrbFt3LF2SuwiykAbZjLD7cpgG+naVmgEBJX9IOhED2Yw182gn3UrgePCw7mq
0u80LFqOjgxiFvu58fSmS+xyKkl2g6Ar/RnFLnySca6Ubg7pySEKgoL1ZCCj2J1PuImxS6lMWQyC
UUxhe6YyyZXmDaIH8zPej11EyZjAFEaxK1NcVx0CxJOtIJjDvm4SQ2xk9EErPLjRNoG/xq4i8/qx
Gzu6KQxnSCn/tMtFNoKgNWP4AgfRt0yOC/dTEDSjDR2ZxAz2cqNtGwqbJVBSlIWBSSbb6e7wEjgN
mL+Ftn3sEtLQcGASmhk2pMmv92ewjeWTbnayT3+WvdeqhoVoJuoegeH6cbI7gV4xq0jBAAbzVuwi
MqIH2zOJSezI1LJ9HqAMRA0Ct6ddyJQyPDpsxyQFASOZ66YxzrajjyIg62IGwZf4cZmcE2ioTckP
iFG89nRkFru7XdnWepbVAV9ZixUEbflf963YnU9Njp6xSwiuPX3Zjl3Yhyl0LsO9vDIXJwjacJE7
LXbXU1VuZz1aMpjxjGU6OzBShwClKkoQuJPstNgdT1nGJvVNyVTmuClsbyNdB90JUNpiBMHnuCh2
t8VTJw7naEZb5R0ClangQZAbZz/WjSQlrQ97urNtXOwyJEmhg6A1l+mGkpJ2mDvNpuswoNwEDgJ3
uM2M3WUpWncudMeZLgmWobBB0I9SuGRorOc1u5v+7jB0GrzWRLuSabGLkHQEDQJ3NKNid7hFS1ls
85nHo24hm/gKh8cuKENmulvoH7sISUvIIOjNIbG724zVvMjT9ixP8F9Wb/mqdoFrzXI3KQbKWcgg
mMv42N1tyF7kQf7LAl5hUb2vxy4sW2a4W+kbuwhJU7ggaMOetInd3c02ssr+xr08ynu8VyE3/xQt
tw0XmGKgzAULAtebObE7yxqW2Os8wr08ysf6s5+nn9rs2CVI2oIFgW3rtovYz5dZwEv2Dx7jnYhV
lKIj7MjYJUj6wh0azIrRPVvKozzunraFLNSVwCJs674fuwQJIVwQjA3YK2OtvcA8HnXzbQmrArZc
ZtwJDI5dg4QQ7hzByACNGO/ztv2Hh/knb7ExVN/K1gg+G7sECSPcHkHvVNe+jOd41R7nMZ4M1qPy
9xmGxy5BwggXBAPSWKnBk/yLp3iBZ3QIkCzXjU/FrkFCCRcEyba0yd7jn9zDkyxmcbA+VBSb7GbE
rkFCCRcEyVy1X81i3rC/8CALWK9bgVLk2C0j099IAKXzo17ISzxnj/CQ0yEApP9YZFs3R3dcVY5S
CIJl9nOe4Vle1p0AtVJ/k3Zlcuw+SjilEATvcRHrYhdRccbQOXYJEk4udgF50a9keKlc5ZGsKo0g
0DFBeCFuAJPMKI0g2EwnrwLqEbsACamkgkACysrYERJECQaB9guC0GauKCUYBCKSNAWBiCgIRERB
ICIoCEQEBYGIoCAQERQEIoKCQESoqCDQrXIizamgIBCR5igIRERBICIVEgQ6OyDSsooIAhFpWZkH
gfYFRPJR5kEgIvlQEIiIgkBEFAQigoJARFAQiAgKAhFBQSAiKAhEBAWBiKAgEBEUBCKCgkBEUBCI
CAoCEUFBICIoCEQEBYGIoCAQERQEIoKCQERQEIgICgIRQUEgIigIWqJtIxVDv+zNW+WxrKZYkpLS
OnYBGbYcwxW3qGufdnFKGkmS9gia57NtuscuXqQQCoLmFbk3IFJ6FATN2+CxbJvYxYsUQucImreM
KloVuWx3HcNLKdEeQfM+9li2Y+ziRQqhIGjeEqqKXlZ7WlJSFATN8ztH0C52+SL5UxA070OPi/Xd
GBi7fJH8lf0urMc1wI9ZzTZFLtuatrF7LpI/7RG0ZEXRS3ahd+ziRfKnIGjeJpYUvWx7usUuXyR/
ZX9o4HE9v4rFRR9YtHPFHlSk3y+RRrRH0LxNvFf0sm3oE7t8kfyV/R6Blw89lu2tv9lSOrRH0JJ3
PJbVHoGUEAVBS171WLaPbjOW0qEgaEnx5wigDz1ily+Sr3BB4NOSx+G28/lY7fHgUT8FgZSOcCcL
1xR9r50r+mFgX+tYwqAilx3geqVbnE5GVoRAw+OEC4JlRf+FbEdXFhfbrNfbZSWvuGKDIGcDfJoW
AWBdmGZK4dAgF22PYI3XdYOxkaqWMmI+l7ALEC4Iiv/THC8I1vOSx9JD9OCReAt0BBguCIpPtg4R
xwT22SMYpwuI4m1TmGbCnSMo/vx7K7+hQL3Otrzv0e721pPlPo2LsCxMM8H2COyDohdtS2evln0+
3vX4QbRn2/S2p1QIn3GyChDu0KD4s5+eewReFvGWx9KTotUt5cJjn7QQ4YJgZdFLdog4yMf7HmMS
wIxodUu5WB2mmXDnCJYXv6jr5NOw12nXj3jTo+4drTUbfZqXirc8TDPhgqD4Yb+gl981FK/ThS96
LNuX4bzgVbpUuqVhmimFy4f4nSz0PF34ImuLbrgtU9LZmFIxiv/tK0i4IFjksWzXYFU29oxHhLV2
c70eetrKh1SAd8M0Ey4Ilnss2xevswReXi3+OQdggnX02h9p8UMqwPIwzYQLAp9jnW5+QeD5d3eh
R9MDGZHO5pSK8LELdNUgWBC49awveuEedPFp2/Mv79MeTQ9wO6e1RaX82RIru3ME6zxujdjGLwg8
Pehxv7fTTUXiYZnXnNwFCHeL8Wor/h69vm6biKfVnvW64jFZcx5J0d4vv/EI1vvcLGmdI55WW2+P
eyy9A6NS2Z5p07nILCjLIPA5XVjsOEFJMB71WLoD20esvXia8SIL3im/INjA6x5Lj4562fwBr6UP
pENahaV4l0Lf4FtZGvN50qUgIYcz99kj2Jb2AStt6A3zuK3DzSzJyU6qYhcg4BaF+vsX7vIhzifd
BkYNgg/5t8fSXZgdsfZipbYXI3n72JaGOlUTco/gveIfPHKjI95bCGv4h9fy+6ZVWIq/Jj3TW7Xk
abXXo3oFCXf5EFvh0a2u9AtVaZOe9rgdCje3BCc7ibu9BeCdUE8ahN4j8OnWmICVNjaf5z2W7uE+
nVZhKT3H0CbayNFSa2moh5BDnyz0GRN4h4CVNvau17gCrTg0avWFGxD1nIxUe5dVoZoKGQSbvI54
RgestDHjEa8D8h0YH7X+QnXU9LgZsCjcbV1hf9yvFL+o2zFopY3Y/V5zIw9wn4xbf4GG6KpBfB43
5Rcs5OVDeNXjEYquDA9Va5Oe5zWv5fdIZ3iVlG4nGqRDg+jWeo2gXaCQVw2wVz0ODtoyIVStzfTg
Qa/FZ6VzcJDSvuMA3WIc3VL+G66xsIcGL3gEQRumBq21sTu9RiTuQEoHB6mMfTQsyhaWuj4KNcsR
hA6CDz1mN8DtELTWxp70GtEYDiihuwn0pEF8r4e7nSh0EMBTHssOjTqIKZjd6LO4m5jeHYYJ6xxx
2lnZzGtkrIIFDgJ7zOOgtk/kewngz36Luy9Frj9fg0vyManyUuX1R7NgofcIXvZ4qq2ni/1k/6vm
98zBdHZKvqgUrhlsqxuMo9tEOe8R8ARril7WMTFwtQ2t4h6v5du7rydfVAonCwfSMdIWlhqrw86R
FfQ+Aodb7XXCbZTvnEfeHvIavxBmRX5mIj9jYxcgPOY2hZzGJux9BFiV/ctjJRPcyMhzAz3uNWwZ
9HGHJbtdU9CRkbFLEHvULOQ0NuHvKH/CY9m+tl3kuYE+5m6v5R2HsF2i25PEzxL0d7EPwQSeCdtc
+CB4zmsQrBlRxy4E7F7PGz/HcmDiNSX70S/5qJICbeTZsA2GD4J3eMlj6V0i30sAL3geHMBXMn5O
flzsAsSeCTXnYY3wQbDcPO6gdjsyIHjFDdj1Xrca44bx+WQrSvTAoI3bPebWFQAe8TwpXbDwQbCe
Jz2Wbsf04BU3dLfXA8mAO51tkiwo0QODDhnYwvKsxzR7RYkx/MSzXn9R94xQcX1VdqnnGvpzcuxO
NGs0Q2OXUPHWez7VUoQYQfC6z1Qnbm7UCVGr3egzfRuA+yL9kysnyUODtJ6RlAIs8DqPVpQ4QeAz
EGjvDOwTvGt3eK5hqDs+uXKSPDRw+0XdsgKwwGt0z6LECIL1LPSp2M2NUHN9m7iJ1V5rcByayYt0
w3UzUQY8E36eqShDVNqjrPVYfGYGxtP7D494rmGM+2zsTjRhH7rFLqHiLTefebWKFGes2n95nXcf
zh5Rqq5rNX/wu4gI7pTsPXfg5mg+g+gW81j4RuMEwWKvswQd3W5Rqq7HbvC998sGuXNi96KB0UyJ
XYLYcx5P6BYt0uj19nevxacnex2+KCvsCu91HM7OsbtRzyy2jV1CxaviTzGajTWNxd2s81h6avSx
igBu9T6328r9JJnzHYlcOmzrZsV+kkNYy8Mxmo0VBC96PbrTjtkZmInnQ7vKex07cVzsbmwx3JXK
mIrl7Fmfu2yKF+vttN7u91ncHZCJ4TVv9J6ksrX7ehLDgCRyD8FUi3/AVfHsrjjtxvu7eoPX0hMj
z3tU7UW72nsdg/ly7G4A0D53ROwSBPD6A1m8eEHwktf91I6jolVe1/Us8V2FOzkD90rCeNs7dgli
z/FqnJbjBcEK+5vP4m6vDFw5gOfxmusAgNa5izJwoPO52AUIcJf3wWaR4gXBx547QSP5dLTa67Cr
WOy9jsnu/MjdGOy+ELkCgY38M9gghQ3EPPf+BC/7LO4SHt6jSM9zWwJrOS7uGXv3RXrFbF8A+G+M
ewqrxQyC1z3ncpkUfZ4DAOw7vOu9kvacH/Gt2IEDorUttZ7xfby9eDGDYBP38rHH8r0zsk+wzH6c
wFqmuG/E6oD7XDYitcKt5b54jUe9Lcce8Bz0aw49Y9a/xVXmM/zaZu7Lcc56uN7uGNrEaFnqedM8
59b0Eff+vNc8R2+fRDYuea0ggX0C6+J+EWOYMNvVsvXEQ4Wyf3qOceElbhCY3eh1lrQ1h9M+ag9q
/IW/JrCWwe7i4I8Bt8sdm4HbtaWKm2I2H/tX4E9+04W4mYyO3INqK/hRInm+H18LW7ibY3uFbVGa
9Hq8KwYQPwjWeN5b3c0FfuM0x+4miSO8Vu7coPcZtnWn0C5ge9IMuyXmgQG4UPcvNPt86y7OZ1pU
+MhGhB/qsUkj3OOJ3CH4ktur6vXCFskVObOjO5YrNCZRBpiNb26wnjDv0Nh7BLDA84RhB7JyT9zL
9r1E1jOSX9ApSMXtOEkxkAX2cJyHj2vFD4Llvk/1u+MychERfuU9pCkAti/nhijXHcWEEO3IVt3h
NZxvAuIHATzseWfesCTnCPCywr7tO6RpNXeqOzL1aofyTd0/kAkv85fYJWQhCJ7jH559ODwzY+09
ZNcmsp62/CI3d2s/HEfOZ2yx4xkSbsNIC/4e6+HjWlkIgirvocHHk5X5eTbyw4TmrevMxWyfYqUz
3AlhNolsxWq7PXYJ2QgC7M+84LmKIwKdXtu6l7ggmRXZeK5jYEpVutwFmsokI17godglZCQIWG5/
8rtK4mZkZp8Au9HuTGhN09wlhVzlz/cwweFOsdmht4s0aaP9OvaJQsjCfQTV+riX6OrVwLM2JZkT
dQkY7B5LbLbjn9rpzf2Tw22+f6D6PgKXZ5q6CcxLcjZm8bDSxrR8srxS7iOotoQHPNcwgewc875l
30psGsuvcVHC1bXiHMVAVtgNCYxmkYCsBAH2Q68pTwB3embuJ4Cb7dakVuXO4ltJlpY7mkPDbxBp
0nouiV1CtcwEAU96ToMGQ9yJsTuxxVrOTujqAeDOd6fWzkjkabydG3XLSB12X/wLh9WyEwQf81vv
vmTnfgJ43c5lQ2Jr+6E7NJEg6O5+kdqVCCncVV5jdCUoO0EA9/C45xrGckjsTtRxk3e01Wprv8pn
h77lH6cjdwazYm8W2WKe95mxxGQpCN7jJt9TbLmT3IjY3ahlX0/m2QMAutplHOy5jgMtIw9tC7CB
yI8e15WlIMBu5jXPNQyzs2L3oo737XRLbsKKHnal1x7PeH6UmduuBF6LNc9hUzIVBLxrf/RdhTua
KbG7Ucej/G+Ca+tu13BYMQvmyHVxlzAs9uaQWnZbvMHLG8tWEMCVrPRcQ2v3I1rH7kYdl/GbBNfW
2a52JxexXGsuYLfYm0LqWMLFsUuoK2tBsCCBIRxncmzsbtRlp5HAYOdbdORSvl3wz+0cOyX2dpC6
7Od8ELuGurJyi3GtUe4x7wG/XrBPxh7xpZ4p7r6Ep2z9iZ2JNb7F2DV5ttXtz810iL0RpJa9xbR8
5/SorFuMa72YwBP9ozN0uzHAf+yrCf88T3cX5z2Q+z5cpxjImJ97Tu2TuOwFAVzjv9PkTkvqern3
nXzVbrCk5zs+ld8woKl/yNX/GOOuoUfCbYufF8nACAT1ZTEInkvgPEE7zkmqbwnc1gvwY36fTD1b
6jrY/ZFRW/mm/lxLv2TbFW+/zdSBK5DNIMCu8r+w4vbyHcnQNfN5kVbaVzwHbm/EdrTbmdPCNwyw
GzSdWea8ZlfHLqGxTAYBz/C7BNZyLpN9Fk/ooKDWO3Y8ixJe5wRuc19u5t/62VUtxoREYTfyduwa
GsveVYNqfdyz9PFu9I92EJuKXThHVZ26E9pOe7vb6JLMqrbYZD/NfcfW1lw12JztvbjOMjNqk2zx
tk0u7BxYpV41qLbEEphfmP0p+vDAFfj1PM2zk3zHXWiklTuTmxrcNdiL6xUDWWQXZOv+gRpZ3SOA
Tjzi/KffeMf2YEFx9bom9wgS2EP4X3eed78aW2hfc/ds3iPob9dmZMJ4qcce5EBWFLhMkMqyukcA
a/hRAmsZ4C5MZxIPj2sJP7A0RqUZ425w59Ia2NZuUAxk1A8KjYFQsrtHAD240yVwN4B9lZ8XU2/L
ewRu81eL2n6t+ZU72r9nTVT9a7vNfcNmpLFu8WXz+B/WFLxUkNqyHARwgLs57/vnmrfEPs2/C683
nyCo+ZeCt2KX3G/sAO+eSSlZYwcUMxBJpR8aAPwxkTnh+riLXe/0iizqIGGVfSXRR5Ek+27IznhE
jWU7CMy+7f1YMsAuaY/MU/gchPa27e85IbyUkvfsh7FLaEm2gwDmJ3IZEfd1Ppt2qQXvGbxjh7Iw
7aokG+wSXoldQ0uyfY4AoKd7gu0SKOAtt2dVXjMs1txIlO85gvr/XuAx3Vh3e6pTnUo2PGvT+ai4
RXWOoNpSOyuRWYMGc2UCJx6T9rwdzPOxi5CUrbZvFBsDoWQ/COD3ycwaZDNdQvMU5yfPfaDn7UDm
h6xLQrObuDt2DVtTCkGwkR8kND/c6XzZfyX5y/OswYvaKyhrb3BxoP17D6UQBPBfrkiou+e4aS1/
R9LPHLp81rrQDua5hBuWrLikFGI++ycLq3Vyf2Z2EnXYc3yypYeBa08S+p4srLtkbuunOUa6G5i2
tW+SknOP7eO3Ap0srGuNnVn8A8V1ufHuEtqF78BW9wxe4iCX2AzKkhGr7dTYJeSnVIIAnrDvJ7Sm
A90vkh91JD8tnjVYxFF2ZZy6JB12Pi/FriE/pRMEcClPJLSmY3KnREoCoIX7ENdxkv0kYmGSKLsv
qXNb6SulIFhsZ7E8kTU5/tcOjN2dJic538g3OS8rU2WLl2V8J6sPHTdWSkEAD5LQsI/Wy13JJ2J3
B4wcDc4crLfv2GezNCueFMXsxwnOhZ26UrlqUKOju49dEyrpRduPlxvXmc5Vg7ozEtXMS1QdBFXk
MKg/S9FcdyN9E+qnxPA324v1SaxIVw2astZOSWxO+VHuOvrH7lB9W34cD9hu3M2G2PVIkZbbkcnE
QCilFgTwH0tuxL8Z7lZ6xu5QQw5wuNfc/QqCUmXf4I3YNRSm1A4NADpyi/tUYmu7277A0rp1xjw0
yFGFw/q649mLaemMtihps5s5KrkTvhqqrHlj3J8YkVhll3NyzeF5VewgaFc1xp3AfgxOcnNJUE2e
eyqegqCltR3F1bRKbG2XcEb1bnjUIBhTNT13hM2mbaKbSsLaYAfy5yRXGOYd2jpIK4mz6914zkxs
bae4FXw74hNiw5nCnhzkemb+ITXZmvOSjYFQSnSPAOjEn10ijyEBsNH9H9+NsEcwIjfLdmcqY5Le
PBKDPcTuia8zSOWlGwQwzt2X4OW/Kvd/XFC1KUgQtLJ2Nsrtxz4Mp1ep7pVJQzafzyR5dmDzWoPU
XspBAIe53yT5NnK/rfq9e8ktrlrGx6kEQcdcL+ttE3K720yGlOClW2nJGjuMu5JfrYIgn7VexFkJ
r7LK/bdqIQvdy+6tqrdY4jZPWeoRBJ3omxtio2wg43PjbFQqG0Li+2Ziz8fWoyDIR6fcjanNGPQB
7/IBS3mVxbxpb/MeK1mWRxC0ogP9GOQGsQ0j6U0/ejOQbilVKZlg13NqInNwNF5zkPpLPQhgsLuH
samX/xEfsY4NrONDVvE+uOWsrHMB06wtPehOd7rTiVa0pwMddSGwYsy3XdOJAQVB/nZ28+gaqBsi
jb1pB/N4WivXQ0f5etROLq0HPKSsrLcT04uBUMohCOC3dmEik6CIFGqTncu9sYvwVw6HBgBt3HV8
LlBXRLawCzkn3T9COkdQmK7uLmYG6oxItXn2P6xJtwmdIyjMSjs8scFNRfLxTzsu7RgIpXz2CAB2
cHcxMFCHpNI9Z3vzTvrNaI+gcE/ZiSyLXYRUhGV2RogYCKW8ggD+ZCdlfQJqKQOr7aRyuFZQq9yC
AG6xU9za2EVIWVtlX+KW2EUkq/yCAK62i2KXIGVsg53FzbGLSFo5BgF2YelMNSUl5+dcHruE5JXX
VYNabdyVHBW2SakIl9opYRvUDUV+2rlfcWToRqXM/dqOZ13YJnX50M96O96ujV2ElBO7NnwMhFK+
QQDrOJHrYhchZeM6TizXGCjnQ4Nqbdyv+GKcpqWsXGdfjjMFnQ4NkrDBTkAHCOLrWjuhvGeiLPcg
gPV2ol0duwgpZXa1nVjuQ9+UfxDAek7mmthFSMm6hgoYAasSggDW2/E6QJCi/NxOKP8YqJQggI12
ApfFLkJKzDousq+W97mBGuV+1aC+8903NMGY5GmdnZaFW9V1Z2EajnEX0yV2EVICltlXuCl2EaAg
SKkKO8RdpVkQZCs+sBO5PXYR1XQfQTpusy/yWuwiJNNetP/JSgyEUnl7BAATuMLtErsWySZ7mC/z
Yuwq6tQTpJXKDALo425iTuxqJIP+bF/kg9hF1KVDgzQtsQO4nE2xy5BM2WQ/sUOzFQOhVGoQwGpO
tLP4OHYZkhkr7UzOKJd5CgpVqYcGNZ8f7C5keOyqJANesVP5S+wimqJzBKlUYQ0/n+RuY2TsuiQu
e4STeCp2Fc3UFqSVyj00qPG07W7Xxy5CYrJr+WRWYyAUBQEs4gS7kFWxy5AoVts5HM/K2GXEpkOD
Gp9xP2Pb2PVJWLaQM7J5ZqBOjUFa0R5BjTvZ2+6JXYSEZL9j36zHQCgKgloLOcjOK9/hKaWeNXY2
h+tm8xo6NGjw2h3MRQyLXaekbIF9jXmxi8iPDg3iuN0OsN/GLkLSZFfZ/qUSA6Foj6DBa4dBK77i
vs02sauVFHzIWXYNVbHLyJ9uKEqliryCAGCW+x4zYtcrCbvfznDPhPqdT4aCIJUq8g4C6M5p7lR6
xK5ZEvKu/Zifs8EFemslRUGQShUFBAHAHHeOHlcuC3fZeTwOjX/uWacgSKWKAoMAevBV91W6x65c
PHxo3+bamucKFQRNURA0eN3kr8kMznPaLyhNVXYPZ/NM7RcUBE1REDR43cyvSc6dzkkMjV2/FOg1
fmKXs7HulxQETVEQNHjd3K+Jw8a6MziCdrH7IHm7wn7mFjT8eSoImqIgaPC6hSAA2M+dy9TYvZA8
PGQXcH9TP08FQVMUBA1ebyUIoAdfdOfQM3ZPpAVv24+5pvrRYgVBfhQEDV5vNQgABvEt90UdJGTS
B/YHzueNmpcKgvwoCBq8zisIAHcIh3Ng7P5IA7fbpTxc9wsKgvwoCBq8zjsIsLYczGlup9h9kmr2
CD/kzw3HpVYQ5EdB0OB1AUEA0J3Pu//HoNj9qnjP2neZx4rG/6AgyI+CoMHrAoMAoCsnuyMYF7tv
FesFu45fNjfqoIIgPwqCBq+LCAKAXu5YjmNE7P5VnDe5zX7Koua/QUGQHwVBg9dFBgHAKA7nGKcB
UAOxJVzOrcxv+bsUBPlREDR47REEAH05yH1DoyGn7m27kutrLxI2T0GQHwVBg9eeQQDQhRPdUYyN
3dcyZfzbbueKfOehUBDkR0HQ4HUCQQCwDV9yc9kzdn/LzgP2W37DhvwXUBDkR0HQ4HVCQQDQ1e1s
x7uDYve5PNhabna/t783dYmwJQqC/CgIGrxOMAhwWAc31g5zx2ggVC9v2q3cwHy3sfDfVgVBfhQE
DV4nHATV39efY92+TNHTCQVbzfN2B5dVnxEo5i2sIMiPgqDB61SCAHBt7CC3N/vSN/Y2KBmvcq/d
7f7Y/E8sHwqC/CgIGrxOLQiq/7+jm2wHun1jb4eM22B/cXfZv3i+5Z9YPhQE+VEQNHidchDgsM6M
Z08+54bqUKGRj+0V7uQmXnFr62+3hp/nS0GQHwVBg9cBgqD6H9uxs/syOzGMVrG3SiZ8zMv2MHfw
YPUsRI22FwqCNCkIGrwOFgTVn/fmU25Xdq3o24828DRP2N+5k7W1X1QQ1FAQpFJFxoKg+pPhboJN
Zp/KG9vAHudu/s6Cxo8NKQi2bKMgrSgIGryOEgTVn/VkNLPZ302gY0Y2V1qqWG+Pcjf38RZLm/4W
BUENBUEqVWQ4CKrl3ADbn93dePqW4byLH/KePcZD7q+2qOUZiRUENRQEqVSR+SCo+bwTO7lPMI4x
bF8GVxfWs4CF9jSP8Sgf5bP1FAQ1FASpVFEyQVDz2SA3ysYwlulMzcY2LIStYj7z+Rev8yJvF7L1
FAQ1FASpVFFyQVD9X0cvt42NZmd2YozrQ5uMbNCmVLHRVvFvnuQpnmUZy+tPOZbf1lMQ1FAQpFJF
iQZB7f9ztGUEk5nGaDeILnSjS+ytCsBK1vC+vc18nuZpXmFdSxtIQZA/BUEqVZR8ENT9vrYMZSTD
3HAG051+DKBb0M35Ph+wlEV8aM/zMm/zbL5vMgVB/hQEqVRRVkFQ97PO9KU/3V0v68cgejGYPnRx
iT7iZCt4iw9ZwmLedu/Yct5lKcv4sOl+5PtzaPk7FARhqm0du5uSkNWs5pXNnzva0YkOtKED7Wwo
PWhPT3rTlS70xtGOHrRik+tG5/rrsBXkyLGSFaxlNatZyQes5V1WuEWsZz1rWM86Piqx95JsVbA9
AhHJrlzsAkQkPgWBiCgIRERBICIoCEQEBYGIoCAQERQEIoKCQERQEIgICgIRQUEgIigIRAQFgYig
IBARFAQigoJARFAQiAgKAhFBQSAiKAhEBAWBiKAgEBEUBCKCgkBEgP8PCPTCCyMAfxEAAAAldEVY
dGRhdGU6Y3JlYXRlADIwMjUtMDctMDhUMDI6Mjc6NTgrMDA6MDDf29LGAAAAJXRFWHRkYXRlOm1v
ZGlmeQAyMDI1LTA3LTA4VDAyOjI3OjU4KzAwOjAwroZqegAAACh0RVh0ZGF0ZTp0aW1lc3RhbXAA
MjAyNS0wNy0wOFQwMjoyNzo1OCswMDowMPmTS6UAAAAASUVORK5CYII=" />
</svg> </svg>
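# FAQ
If your question is not listed here, you can also ask the AI assistant on [DeepWiki](https://deepwiki.com/opendatalab/MinerU), which can resolve most common problems.
If you still cannot solve your problem, you can join the community on [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) to talk with other users and developers.
### 1. `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` on Ubuntu 22.04 under WSL2
Ubuntu 22.04 on WSL2 lacks the `libgl` library. Install it with the following command: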
```bash
sudo apt-get install libgl1-mesa-glx
```
Reference: https://github.com/opendatalab/MinerU/issues/388
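
To confirm whether the dynamic loader can actually see `libGL.so.1` (before or after installing), a quick check like the one below may help. This is a minimal sketch, assuming a glibc-based distribution with `ldconfig` on the `PATH`; `libgl_status` is a hypothetical helper, not part of MinerU.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MinerU): report whether libGL.so.1
# is present in the dynamic linker cache. Assumes ldconfig is on the PATH.
libgl_status() {
  if ldconfig -p 2>/dev/null | grep -q 'libGL\.so\.1'; then
    echo "libGL.so.1 found"
  else
    echo "libGL.so.1 missing: sudo apt-get install libgl1-mesa-glx"
  fi
}

libgl_status
```

On newer releases such as Ubuntu 24.04, the `libgl1-mesa-glx` package may no longer be available; there the `libgl1` package provides `libGL.so.1`.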
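### 2. `ERROR: Failed building wheel for simsimd` when installing MinerU on CentOS 7 or Ubuntu 18
albumentations 1.4.21 introduced a dependency on simsimd. Because simsimd's prebuilt Linux packages require glibc 2.28 or later, installation fails on some Linux distributions released before 2019. Install with the following commands instead:
```bash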
conda create -n mineru python=3.11 -y
conda activate mineru
pip install -U "mineru[pipeline_old_linux]"
```
Reference: https://github.com/opendatalab/MinerU/issues/1004
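
To see whether your system is affected, you can compare the local glibc version with the 2.28 requirement (for reference, CentOS 7 ships glibc 2.17 and Ubuntu 18.04 ships 2.27). A minimal sketch, assuming a glibc-based system where `getconf GNU_LIBC_VERSION` is supported and GNU `sort -V` is available; `version_ge` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: check whether local glibc meets the >= 2.28 requirement of
# simsimd's prebuilt Linux wheels (assumes a glibc-based distribution).

# version_ge A B: succeed if version A >= version B (uses GNU sort -V)
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

required="2.28"
# getconf prints e.g. "glibc 2.35"; keep only the version number
current="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"

if [ -z "$current" ]; then
  echo "Could not detect the glibc version (non-glibc system?)"
elif version_ge "$current" "$required"; then
  echo "glibc $current >= $required: the default install should work"
else
  echo "glibc $current < $required: install mineru[pipeline_old_linux]"
fi
```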
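### 3. Parsed output is missing some text when MinerU is installed and used on Linux
Since version 2.0, MinerU uses `pypdfium2` instead of `pymupdf` as its PDF page rendering engine to avoid AGPLv3 licensing issues. On some Linux distributions that lack CJK fonts, some text may be lost when PDF pages are rendered to images.
To fix this, install the Noto font packages with the following commands (this works on Ubuntu/Debian):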
```bash
sudo apt update
sudo apt install fonts-noto-core
sudo apt install fonts-noto-cjk
fc-cache -fv
```
Alternatively, build an image with our [Docker deployment](../quick_start/docker_deployment.md) method; the image includes these font packages by default.
Reference: https://github.com/opendatalab/MinerU/issues/2915
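
After installing the fonts, you can run a quick proxy check that fontconfig now reports at least one font covering Chinese. This is a minimal sketch, assuming `fc-list` from fontconfig is installed; it does not inspect PDFium directly, and `cjk_font_count` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MinerU): count font entries that
# fontconfig reports as covering Chinese (lang=zh). Prints 0 when fc-list
# is unavailable or no such font is installed.
cjk_font_count() {
  fc-list :lang=zh family 2>/dev/null | wc -l
}

if [ "$(cjk_font_count)" -gt 0 ]; then
  echo "CJK fonts are visible to fontconfig"
else
  echo "No CJK fonts found: install fonts-noto-cjk and run fc-cache -fv"
fi
```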
<iframe src="https://opendatalab-mineru.ms.show" frameborder="0" width="850" height="850"></iframe>
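# FAQ
If your question is not listed here, you can also ask the AI assistant on [DeepWiki](https://deepwiki.com/opendatalab/MinerU), which can resolve most common problems.
If you still cannot solve your problem, you can join the community on [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) to talk with other users and developers.
??? question "`ImportError: libGL.so.1: cannot open shared object file: No such file or directory` on Ubuntu 22.04 under WSL2"
Ubuntu 22.04 on WSL2 lacks the `libgl` library. Install it with the following command: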
```bash
sudo apt-get install libgl1-mesa-glx
```
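Reference: [#388](https://github.com/opendatalab/MinerU/issues/388)
??? question "`ERROR: Failed building wheel for simsimd` when installing MinerU on CentOS 7 or Ubuntu 18"
albumentations 1.4.21 introduced a dependency on simsimd. Because simsimd's prebuilt Linux packages require glibc 2.28 or later, installation fails on some Linux distributions released before 2019. Install with the following commands instead:
```bash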
conda create -n mineru python=3.11 -y
conda activate mineru
pip install -U "mineru[pipeline_old_linux]"
```
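Reference: [#1004](https://github.com/opendatalab/MinerU/issues/1004)
??? question "Parsed output is missing some text when MinerU is installed and used on Linux"
Since version 2.0, MinerU uses `pypdfium2` instead of `pymupdf` as its PDF page rendering engine to avoid AGPLv3 licensing issues. On some Linux distributions that lack CJK fonts, some text may be lost when PDF pages are rendered to images.
To fix this, install the Noto font packages with the following commands (this works on Ubuntu/Debian):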
```bash
sudo apt update
sudo apt install fonts-noto-core
sudo apt install fonts-noto-cjk
fc-cache -fv
```
Alternatively, build an image with our [Docker deployment](../quick_start/docker_deployment.md) method; the image includes these font packages by default.
Reference: [#2915](https://github.com/opendatalab/MinerU/issues/2915)
<div align="center" xmlns="http://www.w3.org/1999/html"> <div align="center" xmlns="http://www.w3.org/1999/html">
<!-- logo --> <!-- logo -->
<p align="center"> <p align="center">
<img src="../images/MinerU-logo.png" width="300px" style="vertical-align:middle;"> <img src="https://opendatalab.github.io/MinerU/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
</p> </p>
</div> </div>
...@@ -15,7 +15,7 @@ ...@@ -15,7 +15,7 @@
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mineru)](https://pypi.org/project/mineru/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mineru)](https://pypi.org/project/mineru/)
[![Downloads](https://static.pepy.tech/badge/mineru)](https://pepy.tech/project/mineru) [![Downloads](https://static.pepy.tech/badge/mineru)](https://pepy.tech/project/mineru)
[![Downloads](https://static.pepy.tech/badge/mineru/month)](https://pepy.tech/project/mineru) [![Downloads](https://static.pepy.tech/badge/mineru/month)](https://pepy.tech/project/mineru)
[![OpenDataLab](https://img.shields.io/badge/Demo_on_OpenDataLab-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github) [![OpenDataLab](https://img.shields.io/badge/webapp_on_mineru.net-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU) [![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU) [![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/3b3a00a4a0a61577b6c30f989092d20d/mineru_demo.ipynb) [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/3b3a00a4a0a61577b6c30f989092d20d/mineru_demo.ipynb)
......
...@@ -37,33 +37,45 @@ docker run --gpus all \ ...@@ -37,33 +37,45 @@ docker run --gpus all \
``` ```
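After running this command, you are in an interactive terminal inside the Docker container, with several ports mapped for services you may want to use. You can run MinerU commands directly inside the container. After running this command, you are in an interactive terminal inside the Docker container, with several ports mapped for services you may want to use. You can run MinerU commands directly inside the container.
You can also start a MinerU service directly by replacing `/bin/bash` with the service's start command. For details, see the [MinerU usage documentation](../usage/index_back.md). You can also start a MinerU service directly by replacing `/bin/bash` with the service's start command. For details, see the [MinerU usage documentation](../usage/index.md).
## Start services directly with Docker Compose ## Start services directly with Docker Compose
We provide a `compose.yaml` file that you can use to start MinerU services quickly. We provide a [compose.yaml](https://github.com/opendatalab/MinerU/blob/master/docker/compose.yaml) file that you can use to start MinerU services quickly.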
```bash ```bash
# Download the compose.yaml file # Download the compose.yaml file
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
``` ```
>[!NOTE]
>
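>- The `compose.yaml` file contains configurations for several MinerU services; you can start only the ones you need.
>- Some services accept additional parameters, which you can view and edit in `compose.yaml`.
>- Because the `sglang` inference acceleration framework preallocates GPU memory, you may not be able to run multiple `sglang` services on the same machine at once. Before starting the `vlm-sglang-server` service or using the `vlm-sglang-engine` backend, make sure that other services that may use GPU memory are stopped.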
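
Because `sglang` preallocates GPU memory, it can help to check how much memory is free before starting an `sglang` service. A minimal sketch, assuming NVIDIA GPUs with `nvidia-smi` on the `PATH`; `gpu_free_mib` is a hypothetical helper, not part of MinerU or the compose file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MinerU): read "memory.used, memory.total"
# CSV lines (MiB, no units) on stdin and print free MiB per GPU,
# numbered in the order nvidia-smi lists them.
gpu_free_mib() {
  awk -F', *' '{ printf "GPU %d: %d MiB free\n", NR - 1, $2 - $1 }'
}

# Feed it live data only when nvidia-smi is available
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.used,memory.total \
    --format=csv,noheader,nounits | gpu_free_mib
fi
```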
- Start the `sglang-server` service and connect to it with the `vlm-sglang-client` backend - Start the `sglang-server` service and connect to it with the `vlm-sglang-client` backend
```bash ```bash
docker compose -f compose.yaml --profile mineru-sglang-server up -d docker compose -f compose.yaml --profile mineru-sglang-server up -d
# In another terminal, connect to the sglang server with the sglang client (requires only CPU and network access; no sglang environment needed)
mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
``` ```
>[!TIP]
>In another terminal, connect to the sglang server with the sglang client (requires only CPU and network access; no sglang environment needed)
> ```bash
> mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
> ```
- Start the API service: - Start the API service:
```bash ```bash
docker compose -f compose.yaml --profile mineru-api up -d docker compose -f compose.yaml --profile mineru-api up -d
``` ```
>[!TIP] >[!TIP]
>Open `http://<server_ip>:8000/docs` in a browser to view the API documentation. >Open `http://<server_ip>:8000/docs` in a browser to view the API documentation.
- Start the Gradio WebUI service: - Start the Gradio WebUI service:
```bash ```bash
docker compose -f compose.yaml --profile mineru-gradio up -d docker compose -f compose.yaml --profile mineru-gradio up -d
``` ```
>[!TIP] >[!TIP]
>Open `http://<server_ip>:7860` in a browser to use the Gradio WebUI, or visit `http://<server_ip>:7860/?view=api` to use the Gradio API. >
\ No newline at end of file >- Open `http://<server_ip>:7860` in a browser to use the Gradio WebUI.
>- Visit `http://<server_ip>:7860/?view=api` to use the Gradio API.
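
Because the services start in the background (`-d`) and may need time to load models, the URLs above might not respond immediately. A minimal sketch of a readiness poll, assuming `curl` is installed; `wait_for_url` is a hypothetical helper, not part of MinerU.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MinerU): poll URL until it answers with
# an HTTP success status, giving up after TIMEOUT seconds (default 120).
wait_for_url() {
  local url="$1" timeout="${2:-120}" waited=0
  # -f makes curl fail on HTTP errors; --max-time avoids hanging requests
  until curl -fs --max-time 5 -o /dev/null "$url" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "$url is up"
}
```

For example, `wait_for_url "http://<server_ip>:8000/docs" 300` waits for the API service, and `wait_for_url "http://<server_ip>:7860" 300` waits for the Gradio WebUI.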
...@@ -4,12 +4,16 @@ ...@@ -4,12 +4,16 @@
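## Try it online ## Try it online
- Official website online demo: the online version on the official website offers the same features as the desktop client, with a polished interface and rich functionality; login required ### Official website online app
- [![OpenDataLab](https://img.shields.io/badge/Demo_on_OpenDataLab-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github) The online version on the official website offers the same features as the desktop client, with a polished interface and rich functionality; login required.
- [![OpenDataLab](https://img.shields.io/badge/webapp_on_mineru.net-blue?logo=&labelColor=white)](https://mineru.net/OpenSourceTools/Extractor?source=github)
- Gradio-based online demo: a WebUI built with Gradio, with a clean interface and only the core parsing features; no login required ### Gradio-based online demo
- [![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU) A WebUI built with Gradio, with a clean interface and only the core parsing features; no login required.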
- [![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
- [![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
- [![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
## Local deployment ## Local deployment
...@@ -22,7 +26,7 @@ ...@@ -22,7 +26,7 @@
> >
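> In non-mainstream environments, the diversity of hardware and software configurations and compatibility issues with third-party dependencies mean we cannot guarantee that the project will be fully usable. If you want to use the project in a non-recommended environment, we suggest reading the documentation and FAQ carefully first; most problems already have solutions in the FAQ. Beyond that, we encourage the community to report issues so that we can gradually expand the range of supported environments. > In non-mainstream environments, the diversity of hardware and software configurations and compatibility issues with third-party dependencies mean we cannot guarantee that the project will be fully usable. If you want to use the project in a non-recommended environment, we suggest reading the documentation and FAQ carefully first; most problems already have solutions in the FAQ. Beyond that, we encourage the community to report issues so that we can gradually expand the range of supported environments.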
<table> <table border="1">
<tr> <tr>
<td>Parsing backend</td> <td>Parsing backend</td>
<td>pipeline</td> <td>pipeline</td>
...@@ -85,4 +89,8 @@ uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple ...@@ -85,4 +89,8 @@ uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
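MinerU provides a convenient Docker deployment option, which helps you set up an environment quickly and work around some tricky environment compatibility issues. MinerU provides a convenient Docker deployment option, which helps you set up an environment quickly and work around some tricky environment compatibility issues.
See the [Docker deployment guide](./docker_deployment.md) in the documentation. See the [Docker deployment guide](./docker_deployment.md) in the documentation.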
--- ---
### Using MinerU
You can use MinerU to parse PDFs in several ways, including the command line, the API, and the WebUI; see the [usage guide](../usage/index.md) for details.