Unverified commit 94410962 authored by Xiaomeng Zhao, committed by GitHub

Merge pull request #2777 from opendatalab/dev

update docs
parents 07b4d6dc 9e6256c5
...@@ -439,7 +439,7 @@ There are three different ways to experience MinerU:
    <td>Parsing Backend</td>
    <td>pipeline</td>
    <td>vlm-transformers</td>
-   <td>vlm-sgslang</td>
+   <td>vlm-sglang</td>
  </tr>
  <tr>
    <td>Operating System</td>
...@@ -502,7 +502,7 @@ cd MinerU
uv pip install -e .[core]
```
-> [!TIP]
+> [!NOTE]
> Linux and macOS systems automatically support CUDA/MPS acceleration after installation. For Windows users who want to use CUDA acceleration,
> please visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to install PyTorch with the appropriate CUDA version.
...@@ -651,13 +651,13 @@ mineru -p <input_path> -o <output_path>
#### 2.3 Using sglang to Accelerate VLM Model Inference

-##### Start sglang-engine Mode
+##### Through the sglang-engine Mode

```bash
mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
```

-##### Start sglang-server/client Mode
+##### Through the sglang-server/client Mode

1. Start Server:
...@@ -666,10 +666,13 @@ mineru-sglang-server --port 30000
```

> [!TIP]
-> sglang acceleration requires a GPU with Ampere architecture or newer, and at least 24GB VRAM. If you have two 12GB or 16GB GPUs, you can use Tensor Parallelism (TP) mode:
-> `mineru-sglang-server --port 30000 --tp 2`
->
-> If you still encounter out-of-memory errors with two GPUs, or if you need to improve throughput or inference speed using multi-GPU parallelism, please refer to the [sglang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands).
+> sglang-server has some commonly used parameters for configuration:
+> - If you have two GPUs with `12GB` or `16GB` VRAM, you can use the Tensor Parallel (TP) mode: `--tp 2`
+> - If you have two GPUs with `11GB` VRAM, in addition to Tensor Parallel mode, you need to reduce the KV cache size: `--tp 2 --mem-fraction-static 0.7`
+> - If you have more than two GPUs with `24GB` VRAM or above, you can use sglang's multi-GPU parallel mode to increase throughput: `--dp 2`
+> - You can also enable `torch.compile` to accelerate inference speed by approximately 15%: `--enable-torch-compile`
+> - If you want to learn more about the usage of `sglang` parameters, please refer to the [official sglang documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
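Taken together, the flags above can be combined in a single launch command. A sketch, assuming two 11GB GPUs; the port value follows the example above, and which flags you keep depends on your hardware:

```shell
# Hypothetical launch for two 11GB GPUs:
# tensor parallelism across both cards, reduced KV cache,
# and torch.compile for roughly 15% faster inference.
mineru-sglang-server --port 30000 \
  --tp 2 \
  --mem-fraction-static 0.7 \
  --enable-torch-compile
```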
2. Use Client in another terminal:
......
...@@ -429,7 +429,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
    <td>Parsing Backend</td>
    <td>pipeline</td>
    <td>vlm-transformers</td>
-   <td>vlm-sgslang</td>
+   <td>vlm-sglang</td>
  </tr>
  <tr>
    <td>Operating System</td>
...@@ -492,7 +492,7 @@ cd MinerU
uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
```
-> [!TIP]
+> [!NOTE]
> Linux and macOS systems automatically support CUDA/MPS acceleration after installation. Windows users who want to use CUDA acceleration
> should visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to install PyTorch with the appropriate CUDA version.
...@@ -640,13 +640,13 @@ mineru -p <input_path> -o <output_path>
#### 2.3 Using sglang to Accelerate VLM Model Inference

-##### Start sglang-engine Mode
+##### Through the sglang-engine Mode

```bash
mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
```

-##### Start sglang-server/client Mode
+##### Through the sglang-server/client Mode

1. Start the Server:
...@@ -655,10 +655,12 @@ mineru-sglang-server --port 30000
```

> [!TIP]
-> sglang acceleration requires a GPU with Ampere architecture or newer and at least 24GB VRAM. If you have two 12GB or 16GB GPUs, you can use Tensor Parallel (TP) mode:
-> `mineru-sglang-server --port 30000 --tp 2`
->
-> If you still encounter out-of-memory errors with two GPUs, or need multi-GPU parallelism to increase throughput or inference speed, please refer to the [official sglang documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+> sglang-server has some commonly used parameters for configuration:
+> - If you have two GPUs with `12GB` or `16GB` VRAM, you can use the Tensor Parallel (TP) mode: `--tp 2`
+> - If you have two GPUs with `11GB` VRAM, in addition to Tensor Parallel mode, you also need to reduce the KV cache size: `--tp 2 --mem-fraction-static 0.7`
+> - If you have more than two GPUs with `24GB` VRAM or above, you can use sglang's multi-GPU parallel mode to increase throughput: `--dp 2`
+> - You can also enable `torch.compile` to accelerate inference speed by approximately 15%: `--enable-torch-compile`
+> - To learn more about `sglang` parameter usage, please refer to the [official sglang documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)

2. Call the Client in another terminal:
......
+# Documentation:
+# https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands
services:
  mineru-sglang:
    image: mineru-sglang:latest
...@@ -11,6 +13,10 @@ services:
    command:
      --host 0.0.0.0
      --port 30000
+      # --enable-torch-compile  # You can also enable torch.compile to accelerate inference speed by approximately 15%
+      # --dp 2  # If you have more than two GPUs with 24GB VRAM or above, you can use sglang's multi-GPU parallel mode to increase throughput
+      # --tp 2  # If you have two GPUs with 12GB or 16GB VRAM, you can use the Tensor Parallel (TP) mode
+      # --mem-fraction-static 0.7  # If you have two GPUs with 11GB VRAM, in addition to Tensor Parallel mode, you need to reduce the KV cache size
    ulimits:
      memlock: -1
      stack: 67108864
......
...@@ -9,7 +9,7 @@ from ...utils.config_reader import get_formula_enable, get_table_enable
from ...utils.model_utils import crop_img, get_res_list_from_layout_res
from ...utils.ocr_utils import get_adjusted_mfdetrec_res, get_ocr_result_list, OcrConfidence

-YOLO_LAYOUT_BASE_BATCH_SIZE = 1
+YOLO_LAYOUT_BASE_BATCH_SIZE = 8
MFD_BASE_BATCH_SIZE = 1
MFR_BASE_BATCH_SIZE = 16
......
+from typing import List, Dict, Union
+
from doclayout_yolo import YOLOv10
from tqdm import tqdm
+import numpy as np
+from PIL import Image

-class DocLayoutYOLOModel(object):
-    def __init__(self, weight, device):
-        self.model = YOLOv10(weight)
+class DocLayoutYOLOModel:
+    def __init__(
+        self,
+        weight: str,
+        device: str = "cuda",
+        imgsz: int = 1280,
+        conf: float = 0.1,
+        iou: float = 0.45,
+    ):
+        self.model = YOLOv10(weight).to(device)
        self.device = device
+        self.imgsz = imgsz
+        self.conf = conf
+        self.iou = iou

-    def predict(self, image):
+    def _parse_prediction(self, prediction) -> List[Dict]:
        layout_res = []
-        doclayout_yolo_res = self.model.predict(
-            image,
-            imgsz=1280,
-            conf=0.10,
-            iou=0.45,
-            verbose=False, device=self.device
-        )[0]
-        for xyxy, conf, cla in zip(
-            doclayout_yolo_res.boxes.xyxy.cpu(),
-            doclayout_yolo_res.boxes.conf.cpu(),
-            doclayout_yolo_res.boxes.cls.cpu(),
+
+        # Fault tolerance: return an empty result if no boxes were predicted
+        if not hasattr(prediction, "boxes") or prediction.boxes is None:
+            return layout_res
+
+        for xyxy, conf, cls in zip(
+            prediction.boxes.xyxy.cpu(),
+            prediction.boxes.conf.cpu(),
+            prediction.boxes.cls.cpu(),
        ):
-            xmin, ymin, xmax, ymax = [int(p.item()) for p in xyxy]
-            new_item = {
-                "category_id": int(cla.item()),
+            coords = list(map(int, xyxy.tolist()))
+            xmin, ymin, xmax, ymax = coords
+            layout_res.append({
+                "category_id": int(cls.item()),
                "poly": [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
                "score": round(float(conf.item()), 3),
-            }
-            layout_res.append(new_item)
+            })
        return layout_res

-    def batch_predict(self, images: list, batch_size: int) -> list:
-        images_layout_res = []
-        # for index in range(0, len(images), batch_size):
-        for index in tqdm(range(0, len(images), batch_size), desc="Layout Predict"):
-            doclayout_yolo_res = [
-                image_res.cpu()
-                for image_res in self.model.predict(
-                    images[index : index + batch_size],
-                    imgsz=1280,
-                    conf=0.10,
-                    iou=0.45,
-                    verbose=False,
-                    device=self.device,
-                )
-            ]
-            for image_res in doclayout_yolo_res:
-                layout_res = []
-                for xyxy, conf, cla in zip(
-                    image_res.boxes.xyxy,
-                    image_res.boxes.conf,
-                    image_res.boxes.cls,
-                ):
-                    xmin, ymin, xmax, ymax = [int(p.item()) for p in xyxy]
-                    new_item = {
-                        "category_id": int(cla.item()),
-                        "poly": [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
-                        "score": round(float(conf.item()), 3),
-                    }
-                    layout_res.append(new_item)
-                images_layout_res.append(layout_res)
-        return images_layout_res
+    def predict(self, image: Union[np.ndarray, Image.Image]) -> List[Dict]:
+        prediction = self.model.predict(
+            image,
+            imgsz=self.imgsz,
+            conf=self.conf,
+            iou=self.iou,
+            verbose=False
+        )[0]
+        return self._parse_prediction(prediction)
+
+    def batch_predict(
+        self,
+        images: List[Union[np.ndarray, Image.Image]],
+        batch_size: int = 4
+    ) -> List[List[Dict]]:
+        results = []
+        with tqdm(total=len(images), desc="Layout Predict") as pbar:
+            for idx in range(0, len(images), batch_size):
+                batch = images[idx: idx + batch_size]
+                predictions = self.model.predict(
+                    batch,
+                    imgsz=self.imgsz,
+                    conf=self.conf,
+                    iou=self.iou,
+                    verbose=False,
+                )
+                for pred in predictions:
+                    results.append(self._parse_prediction(pred))
+                pbar.update(len(batch))
+        return results
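The refactor above centralizes box parsing in `_parse_prediction`, which flattens each `xyxy` box into the 8-point `poly` format. That conversion can be sketched standalone; the helper name and sample box values here are made up for illustration:

```python
def xyxy_to_poly(xyxy):
    """Convert an [xmin, ymin, xmax, ymax] box into the 8-point polygon
    format used by the layout results: TL, TR, BR, BL corner coordinates."""
    xmin, ymin, xmax, ymax = map(int, xyxy)
    return [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]

# Hypothetical detector output (float tensor values truncated to int)
box = [10.6, 20.2, 110.9, 220.7]
print(xyxy_to_poly(box))  # → [10, 20, 110, 20, 110, 220, 10, 220]
```

Truncation via `int()` matches the original code's `[int(p.item()) for p in xyxy]`; if rounding were preferred, `round()` would change the corner values slightly.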
+from typing import List, Union
+
from tqdm import tqdm
from ultralytics import YOLO
+import numpy as np
+from PIL import Image

-class YOLOv8MFDModel(object):
-    def __init__(self, weight, device="cpu"):
-        self.mfd_model = YOLO(weight)
+class YOLOv8MFDModel:
+    def __init__(
+        self,
+        weight: str,
+        device: str = "cpu",
+        imgsz: int = 1888,
+        conf: float = 0.25,
+        iou: float = 0.45,
+    ):
+        self.model = YOLO(weight).to(device)
        self.device = device
+        self.imgsz = imgsz
+        self.conf = conf
+        self.iou = iou

-    def predict(self, image):
-        mfd_res = self.mfd_model.predict(
-            image, imgsz=1888, conf=0.25, iou=0.45, verbose=False, device=self.device
-        )[0]
-        return mfd_res
-
-    def batch_predict(self, images: list, batch_size: int) -> list:
-        images_mfd_res = []
-        # for index in range(0, len(images), batch_size):
-        for index in tqdm(range(0, len(images), batch_size), desc="MFD Predict"):
-            mfd_res = [
-                image_res.cpu()
-                for image_res in self.mfd_model.predict(
-                    images[index : index + batch_size],
-                    imgsz=1888,
-                    conf=0.25,
-                    iou=0.45,
-                    verbose=False,
-                    device=self.device,
-                )
-            ]
-            for image_res in mfd_res:
-                images_mfd_res.append(image_res)
-        return images_mfd_res
+    def _run_predict(
+        self,
+        inputs: Union[np.ndarray, Image.Image, List],
+        is_batch: bool = False
+    ) -> List:
+        preds = self.model.predict(
+            inputs,
+            imgsz=self.imgsz,
+            conf=self.conf,
+            iou=self.iou,
+            verbose=False,
+            device=self.device
+        )
+        return [pred.cpu() for pred in preds] if is_batch else preds[0].cpu()
+
+    def predict(self, image: Union[np.ndarray, Image.Image]):
+        return self._run_predict(image)
+
+    def batch_predict(
+        self,
+        images: List[Union[np.ndarray, Image.Image]],
+        batch_size: int = 4
+    ) -> List:
+        results = []
+        with tqdm(total=len(images), desc="MFD Predict") as pbar:
+            for idx in range(0, len(images), batch_size):
+                batch = images[idx: idx + batch_size]
+                batch_preds = self._run_predict(batch, is_batch=True)
+                results.extend(batch_preds)
+                pbar.update(len(batch))
+        return results
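Both refactored `batch_predict` methods use the same chunk-and-collect loop: slice the input list into fixed-size batches, run the model once per batch, and accumulate per-image results. A minimal sketch of that pattern, with `fake_predict` as a hypothetical stand-in for the model call:

```python
def fake_predict(batch):
    # Hypothetical stand-in for model.predict: one result per input image.
    return [len(x) for x in batch]

def batch_predict(images, batch_size=4):
    """Chunk `images` into slices of `batch_size` and collect all results."""
    results = []
    for idx in range(0, len(images), batch_size):
        batch = images[idx: idx + batch_size]
        results.extend(fake_predict(batch))
    return results

print(batch_predict(["ab", "cde", "f", "gh", "ijk"], batch_size=2))
# → [2, 3, 1, 2, 3]
```

Note that `range(0, len(images), batch_size)` handles a final partial batch automatically, so no padding is needed.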
...@@ -15,7 +15,7 @@ def page_to_image(
    scale = dpi / 72
    long_side_length = max(*page.get_size())
-    if long_side_length > max_width_or_height:
+    if (long_side_length*scale) > max_width_or_height:
        scale = max_width_or_height / long_side_length
    bitmap: PdfBitmap = page.render(scale=scale)  # type: ignore
......
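The fix in this hunk compares the *scaled* long side against the limit, so large pages are only downscaled when the rendered bitmap would actually exceed the maximum. A standalone sketch of the corrected logic; the function name and the default values here are hypothetical:

```python
def compute_render_scale(page_size, dpi=200, max_width_or_height=4500):
    """Pick a render scale: DPI-based, capped so the rendered long side
    does not exceed max_width_or_height pixels."""
    scale = dpi / 72  # PDF user space is 72 units per inch
    long_side_length = max(*page_size)
    # Corrected check: compare the scaled long side, not the raw page size
    if (long_side_length * scale) > max_width_or_height:
        scale = max_width_or_height / long_side_length
    return scale

# A 612x792pt (US Letter) page at 200 DPI: 792 * (200/72) = 2200 <= 4500,
# so the DPI-based scale is kept.
print(round(compute_render_scale((612, 792)), 3))  # → 2.778
```

With the old check, any page whose raw long side exceeded the limit was downscaled even when the rendered bitmap would have fit, producing needlessly low-resolution images.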