Merge pull request #3016 from myhloli/dev

Dev

Unverified Commit 03ea29bd authored Jul 14, 2025 by Xiaomeng Zhao Committed by GitHub Jul 14, 2025
10 changed files
--- a/docs/zh/usage/advanced_cli_parameters.md
+++ b/docs/zh/usage/advanced_cli_parameters.md
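+# Advanced CLI Parameters
+
+## SGLang Acceleration Parameter Tuning
+
+### Memory Optimization Parameters
+> [!TIP]
+> SGLang acceleration mode currently runs on Turing-architecture GPUs with at least 8 GB of VRAM, but GPUs with less than 24 GB of VRAM may run out of memory. The following parameters can reduce VRAM usage:
+>- If a single GPU runs out of VRAM, try reducing the KV cache size with `--mem-fraction-static 0.5`; if the problem persists, lower the value further to `0.4` or below.
+>- If you have two or more GPUs, you can expand the available VRAM with tensor parallelism (TP): `--tp-size 2`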
+
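+### Performance Optimization Parameters
+> [!TIP]
+> If SGLang-accelerated inference for the VLM model already works and you want more speed, try the following parameters:
+>- If you have multiple GPUs, use SGLang's multi-GPU parallel mode to increase throughput: `--dp-size 2`
+>- You can also enable `torch.compile` to speed up inference by roughly 15%: `--enable-torch-compile`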
+
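+### Passing Parameters
+> [!TIP]
+>- For more details on `sglang` parameters, see the [official SGLang documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+>- All parameters officially supported by SGLang can be passed to MinerU as command-line arguments through the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`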
+
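+## GPU Device Selection and Configuration
+
+### Basic Usage of CUDA_VISIBLE_DEVICES
+> [!TIP]
+> - You can always select the visible GPU devices by prefixing a command with the `CUDA_VISIBLE_DEVICES` environment variable. For example:
+>   ```bash
+>   CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
+>   ```
+> - This works for every command-line entry point, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both the `pipeline` and `vlm` backends.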
+
+### Common Device Configuration Examples
+> [!TIP]
+> - Some common `CUDA_VISIBLE_DEVICES` settings:
+>   ```bash
+>   CUDA_VISIBLE_DEVICES=1      # Only device 1 will be visible
+>   CUDA_VISIBLE_DEVICES=0,1    # Devices 0 and 1 will be visible
+>   CUDA_VISIBLE_DEVICES="0,1"  # Same as above; the quotation marks are optional
+>   CUDA_VISIBLE_DEVICES=0,2,3  # Devices 0, 2, and 3 will be visible; device 1 is masked
+>   CUDA_VISIBLE_DEVICES=""     # No GPU will be visible
+>   ```
+
+### Practical Scenarios
+> [!TIP]
+> Some possible usage scenarios:
+>- If you have multiple GPUs and want to start `sglang-server` on GPU 0 and GPU 1 with multi-GPU parallelism, use:
+>  ```bash
+>  CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
+>  ```
+>- If you have multiple GPUs and want to start two `fastapi` services on GPU 0 and GPU 1, each listening on a different port, use:
+>  ```bash
+>  # In terminal 1
+>  CUDA_VISIBLE_DEVICES=0 mineru-api --host 127.0.0.1 --port 8000
+>  # In terminal 2
+>  CUDA_VISIBLE_DEVICES=1 mineru-api --host 127.0.0.1 --port 8001
+>  ```
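
Editor's sketch (not part of the commit): the two-terminal scenario above could also be driven from one Python script. This is a minimal sketch that only assumes the `mineru-api --host/--port` options shown in the docs; the helper names `env_for_gpus`, `api_command`, and `launch_api_servers` are hypothetical and do not exist in MinerU.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Sequence


def env_for_gpus(gpu_ids: Sequence[int],
                 base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and restrict the GPUs visible to a child process."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty list yields "", which hides every GPU from the child process.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def api_command(port: int, host: str = "127.0.0.1") -> List[str]:
    """Build the argv for one `mineru-api` instance (options taken from the docs)."""
    return ["mineru-api", "--host", host, "--port", str(port)]


def launch_api_servers(gpu_to_port: Mapping[int, int]) -> List[subprocess.Popen]:
    """Start one server per GPU; requires `mineru-api` on PATH (assumption)."""
    return [
        subprocess.Popen(api_command(port), env=env_for_gpus([gpu]))
        for gpu, port in gpu_to_port.items()
    ]
```

Calling `launch_api_servers({0: 8000, 1: 8001})` would mirror the two commands above. Passing the variable through `env=` rather than mutating `os.environ` keeps the parent process's own GPU visibility unchanged.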
--- a/docs/zh/usage/api.md
+++ b/docs/zh/usage/api.md
-# API Calls or Visual Invocation
-
-1. Call the Python API directly: [Python usage example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
-2. Call via FastAPI:
-    ```bash
-    mineru-api --host 127.0.0.1 --port 8000
-    ```
-    Open http://127.0.0.1:8000/docs in a browser to view the API documentation.
-
-3. Use the Gradio WebUI or the Gradio API:
-    ```bash
-    # Use the pipeline/vlm-transformers/vlm-sglang-client backends
-    mineru-gradio --server-name 127.0.0.1 --server-port 7860
-    # Or use the vlm-sglang-engine/pipeline backends
-    mineru-gradio --server-name 127.0.0.1 --server-port 7860 --enable-sglang-engine true
-    ```
-    Open http://127.0.0.1:7860 in a browser to use the Gradio WebUI, or http://127.0.0.1:7860/?view=api to use the Gradio API.
-
-> [!TIP]
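-> - Suggestions and notes for SGLang acceleration mode:
-> - SGLang acceleration mode currently runs on Turing-architecture GPUs with at least 8 GB of VRAM, but GPUs with less than 24 GB of VRAM may run out of memory. The following parameters can reduce VRAM usage:
->   - If a single GPU runs out of VRAM, try reducing the KV cache size with `--mem-fraction-static 0.5`; if the problem persists, lower the value further to `0.4` or below.
->   - If you have two or more GPUs, you can expand the available VRAM with tensor parallelism (TP): `--tp-size 2`
-> - If SGLang-accelerated inference for the VLM model already works and you want more speed, try the following parameters:
->   - If you have multiple GPUs, use SGLang's multi-GPU parallel mode to increase throughput: `--dp-size 2`
->   - You can also enable `torch.compile` to speed up inference by roughly 15%: `--enable-torch-compile`
-> - For more details on `sglang` parameters, see the [official SGLang documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
-> - All parameters officially supported by SGLang can be passed to MinerU as command-line arguments through the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`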
-
-> [!TIP]
-> - You can always select the visible GPU devices by prefixing a command with the `CUDA_VISIBLE_DEVICES` environment variable. For example:
->   ```bash
->   CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
->   ```
-> - This works for every command-line entry point, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both the `pipeline` and `vlm` backends.
-> - Some common `CUDA_VISIBLE_DEVICES` settings:
->   ```bash
->   CUDA_VISIBLE_DEVICES=1 Only device 1 will be seen
->   CUDA_VISIBLE_DEVICES=0,1 Devices 0 and 1 will be visible
->   CUDA_VISIBLE_DEVICES=“0,1” Same as above, quotation marks are optional
->   CUDA_VISIBLE_DEVICES=0,2,3 Devices 0, 2, 3 will be visible; device 1 is masked
->   CUDA_VISIBLE_DEVICES="" No GPU will be visible
->   ```
-> - Some possible usage scenarios:
->   - If you have multiple GPUs and want to start 'sglang-server' on GPU 0 and GPU 1 with multi-GPU parallelism, use:
->   ```bash
->   CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
->   ```
->   - If you have multiple GPUs and want to start two `fastapi` services on GPU 0 and GPU 1, each listening on a different port, use:
->   ```bash
->   # In terminal 1
->   CUDA_VISIBLE_DEVICES=0 mineru-api --host 127.0.0.1 --port 8000
->   # In terminal 2
->   CUDA_VISIBLE_DEVICES=1 mineru-api --host 127.0.0.1 --port 8001
->   ```
-
---
--- a/docs/zh/usage/cli_tools.md
+++ b/docs/zh/usage/cli_tools.md
+# CLI Tools
+
+## Viewing Help Information
+To view help for the MinerU command-line tools, use the `--help` option. Sample help output for each tool follows:
+```bash
+mineru --help
+Usage: mineru [OPTIONS]
+
+Options:
+  -v, --version                   Show the version and exit
+  -p, --path PATH                 Input file path or directory (required)
+  -o, --output PATH               Output directory (required)
+  -m, --method [auto|txt|ocr]     Parsing method: auto (default), txt, ocr (pipeline backend only)
+  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
+                                  Parsing backend (default: pipeline)
+  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|latin|arabic|east_slavic|cyrillic|devanagari]
+                                  Document language (can improve OCR accuracy; pipeline backend only)
+  -u, --url TEXT                  Service URL, required when using sglang-client
+  -s, --start INTEGER             Page number to start parsing from (0-based)
+  -e, --end INTEGER               Page number to stop parsing at (0-based)
+  -f, --formula BOOLEAN           Enable formula parsing (on by default)
+  -t, --table BOOLEAN             Enable table parsing (on by default)
+  -d, --device TEXT               Inference device (e.g. cpu/cuda/cuda:0/npu/mps; pipeline backend only)
+  --vram INTEGER                  Maximum GPU VRAM usage per process, in GB (pipeline backend only)
+  --source [huggingface|modelscope|local]
+                                  Model source (default: huggingface)
+  --help                          Show this help message
+```
+```bash
+mineru-api --help
+Usage: mineru-api [OPTIONS]
+
+Options:
+  --host TEXT     Server host (default: 127.0.0.1)
+  --port INTEGER  Server port (default: 8000)
+  --reload        Enable auto-reload (development mode)
+  --help          Show this message and exit.
+```
+```bash
+mineru-gradio --help
+Usage: mineru-gradio [OPTIONS]
+
+Options:
+  --enable-example BOOLEAN        Enable example files for input.The example
+                                  files to be input need to be placed in the
+                                  `example` folder within the directory where
+                                  the command is currently executed.
+  --enable-sglang-engine BOOLEAN  Enable SgLang engine backend for faster
+                                  processing.
+  --enable-api BOOLEAN            Enable gradio API for serving the
+                                  application.
+  --max-convert-pages INTEGER     Set the maximum number of pages to convert
+                                  from PDF to Markdown.
+  --server-name TEXT              Set the server name for the Gradio app.
+  --server-port INTEGER           Set the server port for the Gradio app.
+  --latex-delimiters-type [a|b|all]
+                                  Set the type of LaTeX delimiters to use in
+                                  Markdown rendering:'a' for type '$', 'b' for
+                                  type '()[]', 'all' for both types.
+  --help                          Show this message and exit.
+```
+
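+## Environment Variables
+
+Some parameters of the MinerU command-line tools have equivalent environment variables. Environment variables generally take precedence over command-line arguments and apply to all command-line tools.
+- `MINERU_DEVICE_MODE`: specifies the inference device. Supports device types such as `cpu/cuda/cuda:0/npu/mps`. Applies only to the `pipeline` backend.
+- `MINERU_VIRTUAL_VRAM_SIZE`: specifies the maximum GPU VRAM usage per process, in GB. Applies only to the `pipeline` backend.
+- `MINERU_MODEL_SOURCE`: specifies the model source. Supports `huggingface/modelscope/local` and defaults to `huggingface`; set it to switch to `modelscope` or to use local models.
+- `MINERU_TOOLS_CONFIG_JSON`: specifies the configuration file path. Defaults to `mineru.json` in the user directory; set it to point to a different configuration file.
+- `MINERU_FORMULA_ENABLE`: enables formula parsing. Defaults to `true`; set it to `false` to disable formula parsing.
+- `MINERU_TABLE_ENABLE`: enables table parsing. Defaults to `true`; set it to `false` to disable table parsing.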
+
+
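
Editor's sketch (not part of the commit): the precedence rule described above, where an environment variable overrides the matching command-line argument, can be illustrated as follows. The function `resolve_model_source` is hypothetical; this diff does not show how MinerU actually resolves the value, so treat it only as a model of the documented behavior for `MINERU_MODEL_SOURCE`.

```python
import os
from typing import Mapping, Optional

# Values documented for --source / MINERU_MODEL_SOURCE.
VALID_SOURCES = ("huggingface", "modelscope", "local")


def resolve_model_source(cli_value: Optional[str] = None,
                         environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the model source: environment first, then CLI, then the default."""
    env = os.environ if environ is None else environ
    chosen = env.get("MINERU_MODEL_SOURCE") or cli_value or "huggingface"
    if chosen not in VALID_SOURCES:
        raise ValueError(f"unknown model source: {chosen!r}")
    return chosen
```

Under this model, `--source local` combined with `export MINERU_MODEL_SOURCE=modelscope` resolves to `modelscope`, which matches the documented statement that the command-line argument is ignored when both are set.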
--- a/docs/zh/usage/config.md
+++ b/docs/zh/usage/config.md
-
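-# Extending MinerU with a Configuration File
-
- MinerU now works out of the box, but it also supports extending functionality through a configuration file. You can create a `mineru.json` file in your user directory and add custom settings to it.
- The `mineru.json` file is generated automatically when you run the built-in model download command `mineru-models-download`. You can also create it by copying the [configuration template file](../../mineru.template.json) to your user directory and renaming it to `mineru.json`.
- Some of the available configuration options:
-  - `latex-delimiter-config`: configures the delimiters for LaTeX formulas. The default is the `$` symbol; change it to other symbols or strings as needed.
-  - `llm-aided-config`: configures the parameters for LLM-aided heading-level classification. It is compatible with any LLM that supports the `openai protocol` and defaults to the `qwen2.5-32b-instruct` model from `Alibaba Cloud Bailian`. To enable the feature, configure your own API key and set `enable` to `true`.
-  - `models-dir`: specifies the local model storage directory. Specify separate model directories for the `pipeline` and `vlm` backends; afterwards you can use local models by setting the environment variable `export MINERU_MODEL_SOURCE=local`.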
-
---
\ No newline at end of file
--- a/docs/zh/usage/index.md
+++ b/docs/zh/usage/index.md
 # Using MinerU

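-## Command-Line Usage
-
-### Basic Usage
-
-The simplest command-line invocation is:
-
-```bash
-mineru -p <input_path> -o <output_path>
-```
-
- `<input_path>`: local PDF/image file or directory (supports pdf/png/jpg/jpeg/webp/gif)
- `<output_path>`: output directory
-
-### Viewing Help Information
-
-List all available parameters:
-
+## Quick Model Source Configuration
+MinerU uses `huggingface` as its default model source. If your network cannot reach `huggingface`, you can switch the model source to `modelscope` with an environment variable: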
 ```bash
-mineru --help
-```
-
-### Parameter Reference
-
-```text
-Usage: mineru [OPTIONS]
-
-Options:
-  -v, --version                   Show the version and exit
-  -p, --path PATH                 Input file path or directory (required)
-  -o, --output PATH               Output directory (required)
-  -m, --method [auto|txt|ocr]     Parsing method: auto (default), txt, ocr (pipeline backend only)
-  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
-                                  Parsing backend (default: pipeline)
-  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|latin|arabic|east_slavic|cyrillic|devanagari]
-                                  Document language (can improve OCR accuracy; pipeline backend only)
-  -u, --url TEXT                  Service URL, required when using sglang-client
-  -s, --start INTEGER             Page number to start parsing from (0-based)
-  -e, --end INTEGER               Page number to stop parsing at (0-based)
-  -f, --formula BOOLEAN           Enable formula parsing (on by default)
-  -t, --table BOOLEAN             Enable table parsing (on by default)
-  -d, --device TEXT               Inference device (e.g. cpu/cuda/cuda:0/npu/mps; pipeline backend only)
-  --vram INTEGER                  Maximum GPU VRAM usage per process, in GB (pipeline backend only)
-  --source [huggingface|modelscope|local]
-                                  Model source (default: huggingface)
-  --help                          Show this help message
+export MINERU_MODEL_SOURCE=modelscope
 ```
+For more on configuring model sources and custom local model paths, see [Model Source](./model_source.md).

 ---

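-## Model Source Configuration
-
-By default, MinerU automatically downloads the required models from HuggingFace on first run. If HuggingFace is unreachable, switch the model source as follows:
-
-### Switching to the ModelScope Source
-
-```bash
-mineru -p <input_path> -o <output_path> --source modelscope
-```
-
-Or set the environment variable:
-
+## Quick Start from the Command Line
+MinerU ships with command-line tools, so you can parse PDFs directly from the command line:
 ```bash
-export MINERU_MODEL_SOURCE=modelscope
+# Parse with the default pipeline backend
 mineru -p <input_path> -o <output_path>
 ```
+- `<input_path>`: local PDF/image file or directory
+- `<output_path>`: output directory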

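-### Using Local Models
-
-#### 1. Download Models Locally
-
-```bash
-mineru-models-download --help
-```
-
-Or use the interactive command-line tool to choose which models to download:
-
-```bash
-mineru-models-download
-```
-
-After the download completes, the model path is printed in the current terminal window and automatically written to `mineru.json` in the user directory.
+> [!NOTE]
+> The command-line tools automatically try cuda/mps acceleration on Linux and macOS. Windows users who want cuda acceleration
+> should visit the [PyTorch website](https://pytorch.org/get-started/locally/) and use the command matching their cuda version to install acceleration-enabled builds of `torch` and `torchvision`.

-#### 2. Parse with Local Models
+> [!TIP]
+> For more about the output files, see [Output File Format](./output_file.md).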

 ```bash
-mineru -p <input_path> -o <output_path> --source local
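+# Or parse with the vlm backend
+mineru -p <input_path> -o <output_path> -b vlm-transformers
 ```
+> [!TIP]
+> The vlm backend additionally supports `sglang` acceleration, which can be 20 to 30 times faster than the `transformers` backend. See the [Extension Modules Installation Guide](../quick_start/extension_modules.md) for how to install the full package with `sglang` acceleration support.

-Or enable it via the environment variable:
-
-```bash
-export MINERU_MODEL_SOURCE=local
-mineru -p <input_path> -o <output_path>
-```
+To adjust parsing options with custom parameters, see the more detailed [CLI Tools](./cli_tools.md) documentation.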

 ---

-## Accelerating VLM Inference with sglang
-
-### Using sglang-engine Mode
-
-```bash
-mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
-```
-
-### Using sglang-server/client Mode
-
-1. Start the server:
-
-```bash
-mineru-sglang-server --port 30000
-```
-
-2. In another terminal, call it from the client:
+## Advanced Usage via API, WebUI, and sglang-client/server
+
+- Call the Python API directly: [Python usage example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
+- Call via FastAPI:
+  ```bash
+  mineru-api --host 127.0.0.1 --port 8000
+  ```
+  Open http://127.0.0.1:8000/docs in a browser to view the API documentation.
+- Start the Gradio WebUI visual frontend:
+  ```bash
+  # Use the pipeline/vlm-transformers/vlm-sglang-client backends
+  mineru-gradio --server-name 127.0.0.1 --server-port 7860
+  # Or use the vlm-sglang-engine/pipeline backends (requires the sglang environment)
+  mineru-gradio --server-name 127.0.0.1 --server-port 7860 --enable-sglang-engine true
+  ```
+  Open http://127.0.0.1:7860 in a browser to use the Gradio WebUI, or http://127.0.0.1:7860/?view=api to use the Gradio API.
+- Call via `sglang-client/server`:
+  ```bash
+  # Start the sglang server (requires the sglang environment)
+  mineru-sglang-server --port 30000
+  # In another terminal, connect to the sglang server with the sglang client (needs only a CPU and network access, not the sglang environment)
+  mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
+  ```
+> [!TIP]
+> All parameters officially supported by SGLang can be passed to MinerU as command-line arguments through the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
+> Common `sglang` parameters and usage tips are collected in [Advanced CLI Parameters](./advanced_cli_parameters.md).

-```bash
-mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
-```

-> [!TIP]
-> For more about the output files, see [Output File Format](../output_file.md)
+## Extending MinerU with a Configuration File

---
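+- MinerU now works out of the box, but it also supports extending functionality through a configuration file. You can create a `mineru.json` file in your user directory and add custom settings to it.
+- The `mineru.json` file is generated automatically when you run the built-in model download command `mineru-models-download`. You can also create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
+- Some of the available configuration options:
+  - `latex-delimiter-config`: configures the delimiters for LaTeX formulas. The default is the `$` symbol; change it to other symbols or strings as needed.
+  - `llm-aided-config`: configures the parameters for LLM-aided heading-level classification. It is compatible with any LLM that supports the `openai protocol` and defaults to the `qwen2.5-32b-instruct` model from `Alibaba Cloud Bailian`. To enable the feature, configure your own API key and set `enable` to `true`.
+  - `models-dir`: specifies the local model storage directory. Specify separate model directories for the `pipeline` and `vlm` backends; afterwards you can use local models by setting the environment variable `export MINERU_MODEL_SOURCE=local`.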
--- a/docs/zh/usage/model_source.md
+++ b/docs/zh/usage/model_source.md
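+# Model Source
+
+MinerU uses `HuggingFace` and `ModelScope` as model repositories. You can switch between them or use local models as needed.
+
+- `HuggingFace` is the default model source and offers excellent download speed and high stability worldwide.
+- `ModelScope` is the best choice for users in mainland China. It provides an SDK module that is seamlessly compatible with `hf` and suits users who cannot access HuggingFace.
+
+## Switching the Model Source
+
+### Via Command-Line Argument
+Currently only the `mineru` command-line tool supports switching the model source with a command-line argument; other tools such as `mineru-api` and `mineru-gradio` do not yet support it.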
+```bash
+mineru -p <input_path> -o <output_path> --source modelscope
+```
+
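+### Via Environment Variable
+You can always switch the model source by setting an environment variable; this works for all command-line tools and API calls.
+```bash
+export MINERU_MODEL_SOURCE=modelscope
+```
+or
+```python
+import os
+os.environ["MINERU_MODEL_SOURCE"] = "modelscope"
+```
+>[!TIP]
+> A model source set through the environment variable stays in effect for the current terminal session until the terminal is closed or the variable is changed. It also takes precedence over the command-line argument: if both are set, the command-line argument is ignored.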
+
+
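+## Using Local Models
+
+### 1. Download Models Locally
+```bash
+mineru-models-download --help
+```
+Or use the interactive command-line tool to choose which models to download:
+```bash
+mineru-models-download
+```
+>[!TIP]
+>- After the download completes, the model path is printed in the current terminal window and automatically written to `mineru.json` in the user directory.
+>- Once the models are downloaded, you can move the model folder anywhere you like, as long as you update the model path in `mineru.json`.
+>- If you deploy the model folder to another server, move `mineru.json` to the user directory on the new machine as well and make sure the model path is configured correctly.
+>- To update the model files, run `mineru-models-download` again. Updates do not yet support custom paths: if you have not moved the local model folder, the files are updated incrementally; if you have moved it, the files are downloaded again to the default location and `mineru.json` is updated.
+
+### 2. Parse with Local Models
+
+```bash
+mineru -p <input_path> -o <output_path> --source local
+```
+Or enable it via the environment variable: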
+```bash
+export MINERU_MODEL_SOURCE=local
+mineru -p <input_path> -o <output_path>
+```
\ No newline at end of file
--- a/docs/zh/output_file.md
+++ b/docs/zh/output_file.md
--- a/mineru/backend/vlm/token_to_middle_json.py
+++ b/mineru/backend/vlm/token_to_middle_json.py
@@ -12,14 +12,16 @@ from mineru.version import __version__

 heading_level_import_success = False
 llm_aided_config = get_llm_aided_config()
-if llm_aided_config and llm_aided_config.get('title_aided', {}).get('enable', False):
-    try:
-        from mineru.utils.llm_aided import llm_aided_title
-        from mineru.backend.pipeline.model_init import AtomModelSingleton
-        heading_level_import_success = True
-    except Exception as e:
-        logger.warning("The heading level feature cannot be used. If you need to use the heading level feature, "
-                        "please execute `pip install mineru[core]` to install the required packages.")
+if llm_aided_config:
+    title_aided_config = llm_aided_config.get('title_aided', {})
+    if title_aided_config.get('enable', False):
+        try:
+            from mineru.utils.llm_aided import llm_aided_title
+            from mineru.backend.pipeline.model_init import AtomModelSingleton
+            heading_level_import_success = True
+        except Exception as e:
+            logger.warning("The heading level feature cannot be used. If you need to use the heading level feature, "
+                            "please execute `pip install mineru[core]` to install the required packages.")


 def token_to_page_info(token, image_dict, page, image_writer, page_index) -> dict:

--- a/mineru/utils/model_utils.py
+++ b/mineru/utils/model_utils.py
@@ -206,37 +206,49 @@ def filter_nested_tables(table_res_list, overlap_threshold=0.8, area_threshold=0


 def remove_overlaps_min_blocks(res_list):
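-    #  For overlapping blocks, the smaller one cannot simply be deleted; it must be merged with the larger one into a bigger block.
-    #  Remove the smaller of the overlapping blocks
+    # For overlapping blocks, the smaller one cannot simply be deleted; it must be merged with the larger one into a bigger block.
+    # Remove the smaller of the overlapping blocks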
    need_remove = []
-    for res1 in res_list:
-        for res2 in res_list:
-            if res1 != res2:
-                overlap_box = get_minbox_if_overlap_by_ratio(
-                    res1['bbox'], res2['bbox'], 0.8
-                )
-                if overlap_box is not None:
-                    res_to_remove = next(
-                        (res for res in res_list if res['bbox'] == overlap_box),
-                        None,
-                    )
-                    if (
-                        res_to_remove is not None
-                        and res_to_remove not in need_remove
-                    ):
-                        large_res = res1 if res1 != res_to_remove else res2
-                        x1, y1, x2, y2 = large_res['bbox']
-                        sx1, sy1, sx2, sy2 = res_to_remove['bbox']
-                        x1 = min(x1, sx1)
-                        y1 = min(y1, sy1)
-                        x2 = max(x2, sx2)
-                        y2 = max(y2, sy2)
-                        large_res['bbox'] = [x1, y1, x2, y2]
-                        need_remove.append(res_to_remove)
-
-    if len(need_remove) > 0:
-        for res in need_remove:
-            res_list.remove(res)
+    for i in range(len(res_list)):
+        # Skip the current element if it is already marked for removal
+        if res_list[i] in need_remove:
+            continue
+
+        for j in range(i + 1, len(res_list)):
+            # Skip the element being compared if it is already marked for removal
+            if res_list[j] in need_remove:
+                continue
+
+            overlap_box = get_minbox_if_overlap_by_ratio(
+                res_list[i]['bbox'], res_list[j]['bbox'], 0.8
+            )
+
+            if overlap_box is not None:
+                res_to_remove = None
+                large_res = None
+
+                # Determine which block is the smaller one (the one to remove)
+                if overlap_box == res_list[i]['bbox']:
+                    res_to_remove = res_list[i]
+                    large_res = res_list[j]
+                elif overlap_box == res_list[j]['bbox']:
+                    res_to_remove = res_list[j]
+                    large_res = res_list[i]
+
+                if res_to_remove is not None and res_to_remove not in need_remove:
+                    # Expand the larger block's bbox to the union of the two
+                    x1, y1, x2, y2 = large_res['bbox']
+                    sx1, sy1, sx2, sy2 = res_to_remove['bbox']
+                    x1 = min(x1, sx1)
+                    y1 = min(y1, sy1)
+                    x2 = max(x2, sx2)
+                    y2 = max(y2, sy2)
+                    large_res['bbox'] = [x1, y1, x2, y2]
+                    need_remove.append(res_to_remove)
+
+    # Remove the marked elements from the list
+    for res in need_remove:
+        res_list.remove(res)

    return res_list, need_remove
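
Editor's sketch (not part of the commit): the rewritten loop above compares each unordered pair once (`j > i`) instead of every ordered pair, and folds the smaller box into the larger one. The self-contained version below illustrates that idea under stated assumptions: `smaller_box_if_overlap` is a hypothetical stand-in for `get_minbox_if_overlap_by_ratio`, whose real implementation is not shown in this diff, and unlike the committed code the sketch tracks removals by index and stops the inner loop once block `i` itself has been absorbed.

```python
from typing import Dict, List, Optional, Tuple

Box = List[float]  # [x1, y1, x2, y2]


def box_area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def smaller_box_if_overlap(a: Box, b: Box, ratio: float) -> Optional[Box]:
    """Stand-in helper: return the smaller box if the intersection covers
    more than `ratio` of its area, otherwise None."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return None
    small = a if box_area(a) <= box_area(b) else b
    return small if inter_w * inter_h / box_area(small) > ratio else None


def merge_overlapping_blocks(blocks: List[Dict], ratio: float = 0.8
                             ) -> Tuple[List[Dict], List[Dict]]:
    """Visit each unordered pair once; absorb the smaller block into the larger."""
    removed_idx = set()
    for i in range(len(blocks)):
        if i in removed_idx:
            continue
        for j in range(i + 1, len(blocks)):
            if j in removed_idx:
                continue
            small = smaller_box_if_overlap(blocks[i]["bbox"], blocks[j]["bbox"], ratio)
            if small is None:
                continue
            small_i, large_i = (i, j) if small is blocks[i]["bbox"] else (j, i)
            lx1, ly1, lx2, ly2 = blocks[large_i]["bbox"]
            sx1, sy1, sx2, sy2 = blocks[small_i]["bbox"]
            # Grow the larger block to the union of both boxes.
            blocks[large_i]["bbox"] = [min(lx1, sx1), min(ly1, sy1),
                                       max(lx2, sx2), max(ly2, sy2)]
            removed_idx.add(small_i)
            if small_i == i:
                break  # block i was absorbed; stop comparing it
    kept = [b for k, b in enumerate(blocks) if k not in removed_idx]
    removed = [blocks[k] for k in sorted(removed_idx)]
    return kept, removed
```

Tracking indices rather than testing `res in need_remove` avoids repeated dictionary equality scans, which may matter on pages with many detected blocks.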


--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -49,15 +49,16 @@ nav:
    - "MinerU": index.md
    - Quick Start:
      - quick_start/index.md
-      - Online Demo: quick_start/online_demo.md
-      - Local Deployment: quick_start/local_deployment.md
+      - Extension Modules: quick_start/extension_modules.md
+      - Docker Deployment: quick_start/docker_deployment.md
    - Usage:
      - usage/index.md
-      - API Calls or Visual Invocation: usage/api.md
-      - Extending MinerU Functionality Through Configuration Files: usage/config.md
+      - CLI Tools: usage/cli_tools.md
+      - Model Source: usage/model_source.md
+      - Advanced CLI Parameters: usage/advanced_cli_parameters.md
+      - Output File Format: usage/output_file.md
  - FAQ:
      - FAQ: FAQ/index.md
-  - Output File Format: output_file.md
  - Known Issues: known_issues.md
  - TODO: todo.md

@@ -76,14 +77,15 @@ plugins:
          nav_translations:
            Home: 主页
            Quick Start: 快速开始
-            Online Demo: 在线体验
-            Local Deployment: 本地部署
+            Extension Modules: 扩展模块
+            Docker Deployment: Docker部署
            Usage: 使用方法
-            API Calls or Visual Invocation: API 调用 或 可视化调用
-            Extending MinerU Functionality Through Configuration Files: 基于配置文件扩展 MinerU 功能
+            CLI Tools: 命令行工具
+            Model Source: 模型源
+            Advanced CLI Parameters: 命令行参数进阶技巧
            FAQ: FAQ
            Output File Format: 输出文件格式
-            Known Issues: Known Issues
+            Known Issues: 已知问题
            TODO: TODO
  - mkdocs-video