Unverified Commit 8a40ff3c authored by Xiaomeng Zhao, committed by GitHub

Merge pull request #2768 from myhloli/dev

Dev
parents 2e30cc17 f7b37684
@@ -439,7 +439,7 @@ There are three different ways to experience MinerU:
 <td>Parsing Backend</td>
 <td>pipeline</td>
 <td>vlm-transformers</td>
-<td>vlm-sgslang</td>
+<td>vlm-sglang</td>
 </tr>
 <tr>
 <td>Operating System</td>
@@ -502,7 +502,7 @@ cd MinerU
 uv pip install -e .[core]
 ```
-> [!TIP]
+> [!NOTE]
 > Linux and macOS systems automatically support CUDA/MPS acceleration after installation. For Windows users who want to use CUDA acceleration,
 > please visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to install PyTorch with the appropriate CUDA version.
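For the Windows step in the note above, a minimal sketch of what the PyTorch selector typically produces, assuming a CUDA 12.1 build (the exact index URL and package set depend on the version you choose on the PyTorch site):

```bash
# Assumed example: install a CUDA 12.1 build of PyTorch on Windows
# before installing MinerU, so CUDA acceleration is available.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```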
@@ -651,13 +651,13 @@ mineru -p <input_path> -o <output_path>
 #### 2.3 Using sglang to Accelerate VLM Model Inference
-##### Start sglang-engine Mode
+##### Through the sglang-engine Mode
 ```bash
 mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
 ```
-##### Start sglang-server/client Mode
+##### Through the sglang-server/client Mode
 1. Start Server:
@@ -666,10 +666,13 @@ mineru-sglang-server --port 30000
 ```
 > [!TIP]
-> sglang acceleration requires a GPU with Ampere architecture or newer, and at least 24GB VRAM. If you have two 12GB or 16GB GPUs, you can use Tensor Parallelism (TP) mode:
-> `mineru-sglang-server --port 30000 --tp 2`
->
-> If you still encounter out-of-memory errors with two GPUs, or if you need to improve throughput or inference speed using multi-GPU parallelism, please refer to the [sglang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands).
+> sglang-server has some commonly used parameters for configuration:
+> - If you have two GPUs with `12GB` or `16GB` VRAM, you can use Tensor Parallel (TP) mode: `--tp 2`
+> - If you have two GPUs with `11GB` VRAM, you need to reduce the KV cache size in addition to Tensor Parallel mode: `--tp 2 --mem-fraction-static 0.7`
+> - If you have more than two GPUs with `24GB` VRAM or more, you can use sglang's multi-GPU parallel mode to increase throughput: `--dp 2`
+> - You can also enable `torch.compile` to accelerate inference speed by approximately 15%: `--enable-torch-compile`
+> - To learn more about `sglang` parameters, please refer to the [official sglang documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
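Combining the flags from the tip, a sketch of a full launch command for the two-11GB-GPU case described above (example values only; tune them to your hardware):

```bash
# Example: tensor-parallel across two 11GB GPUs with a reduced
# KV cache fraction so the model fits in VRAM (flags from the tip above).
mineru-sglang-server --port 30000 --tp 2 --mem-fraction-static 0.7
```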
2. Use Client in another terminal:
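The client invocation itself is collapsed in this diff view; as a hedged sketch, assuming MinerU's `vlm-sglang-client` backend name and a `-u` flag for the server URL (both are assumptions here, not shown in this hunk):

```bash
# Assumed client call: parse via the sglang server started above.
mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
```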
......
@@ -429,7 +429,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
 <td>Parsing Backend</td>
 <td>pipeline</td>
 <td>vlm-transformers</td>
-<td>vlm-sgslang</td>
+<td>vlm-sglang</td>
 </tr>
 <tr>
 <td>Operating System</td>
@@ -492,7 +492,7 @@ cd MinerU
 uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
 ```
-> [!TIP]
+> [!NOTE]
 > Linux and macOS systems automatically support CUDA/MPS acceleration after installation. Windows users who want to use CUDA acceleration
 > should visit the [PyTorch official website](https://pytorch.org/get-started/locally/) and install PyTorch with the appropriate CUDA version.
@@ -640,13 +640,13 @@ mineru -p <input_path> -o <output_path>
 #### 2.3 Using sglang to Accelerate VLM Model Inference
-##### Start sglang-engine Mode
+##### Through the sglang-engine Mode
 ```bash
 mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
 ```
-##### Start sglang-server/client Mode
+##### Through the sglang-server/client Mode
 1. Start Server:
@@ -655,10 +655,12 @@ mineru-sglang-server --port 30000
 ```
 > [!TIP]
-> sglang acceleration requires a GPU with Ampere architecture or newer and at least 24GB of VRAM. If you have two 12GB or 16GB GPUs, you can use Tensor Parallel (TP) mode:
-> `mineru-sglang-server --port 30000 --tp 2`
->
-> If you still run out of VRAM with two GPUs, or need multi-GPU parallelism to increase throughput or inference speed, please refer to the [official sglang documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+> sglang-server has some commonly used parameters for configuration:
+> - If you have two GPUs with `12G` or `16G` VRAM, you can use Tensor Parallel (TP) mode: `--tp 2`
+> - If you have two `11G` GPUs, you also need to reduce the KV cache size in addition to Tensor Parallel mode: `--tp 2 --mem-fraction-static 0.7`
+> - If you have more than two GPUs with `24G` VRAM or more, you can use sglang's multi-GPU parallel mode to increase throughput: `--dp 2`
+> - You can also enable `torch.compile` to speed up inference by approximately 15%: `--enable-torch-compile`
+> - To learn more about `sglang` parameter usage, please refer to the [official sglang documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)

2. Use Client in another terminal:
......