Unverified commit 94410962 authored by Xiaomeng Zhao, committed by GitHub

Merge pull request #2777 from opendatalab/dev

update docs
parents 07b4d6dc 9e6256c5
......@@ -439,7 +439,7 @@ There are three different ways to experience MinerU:
<td>Parsing Backend</td>
<td>pipeline</td>
<td>vlm-transformers</td>
-<td>vlm-sgslang</td>
+<td>vlm-sglang</td>
</tr>
<tr>
<td>Operating System</td>
......@@ -502,7 +502,7 @@ cd MinerU
uv pip install -e .[core]
```
-> [!TIP]
+> [!NOTE]
> Linux and macOS systems automatically support CUDA/MPS acceleration after installation. For Windows users who want to use CUDA acceleration,
> please visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to install PyTorch with the appropriate CUDA version.
......@@ -651,13 +651,13 @@ mineru -p <input_path> -o <output_path>
#### 2.3 Using sglang to Accelerate VLM Model Inference
-##### Start sglang-engine Mode
+##### Through the sglang-engine Mode
```bash
mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
```
-##### Start sglang-server/client Mode
+##### Through the sglang-server/client Mode
1. Start Server:
......@@ -666,10 +666,13 @@ mineru-sglang-server --port 30000
```
> [!TIP]
-> sglang acceleration requires a GPU with Ampere architecture or newer, and at least 24GB VRAM. If you have two 12GB or 16GB GPUs, you can use Tensor Parallelism (TP) mode:
-> `mineru-sglang-server --port 30000 --tp 2`
->
-> If you still encounter out-of-memory errors with two GPUs, or if you need to improve throughput or inference speed using multi-GPU parallelism, please refer to the [sglang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands).
+> sglang-server has some commonly used configuration parameters:
+> - If you have two GPUs with `12GB` or `16GB` VRAM, you can use Tensor Parallel (TP) mode: `--tp 2`
+> - If you have two GPUs with `11GB` VRAM, you also need to reduce the KV cache size in addition to using Tensor Parallel mode: `--tp 2 --mem-fraction-static 0.7`
+> - If you have two or more GPUs with `24GB` VRAM or more, you can use sglang's multi-GPU data parallel (DP) mode to increase throughput: `--dp 2`
+> - You can also enable `torch.compile` to speed up inference by approximately 15%: `--enable-torch-compile`
+> - For more details on `sglang` parameters, please refer to the [official sglang documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
2. Use Client in another terminal:
......
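The GPU-specific flags in the tip above amount to a small decision table. The following sketch encodes it, with several assumptions. `suggest_sglang_server_args` is a hypothetical helper and is not part of MinerU or sglang. The VRAM thresholds come from the tip text. Running one data-parallel replica per GPU (`--dp` equal to the GPU count) generalizes the documented `--dp 2`. The Ampere-or-newer requirement is not checked.

```python
from typing import List


def suggest_sglang_server_args(
    num_gpus: int,
    vram_gb_per_gpu: float,
    enable_torch_compile: bool = False,
) -> List[str]:
    """Hypothetical helper mapping a GPU setup to mineru-sglang-server flags.

    Thresholds follow the documentation tip; the Ampere-or-newer requirement
    is not checked here.
    """
    args = ["mineru-sglang-server", "--port", "30000"]
    if num_gpus >= 2 and vram_gb_per_gpu >= 24:
        # Each card can hold the model: one data-parallel replica per GPU
        # (assumption: the tip itself only shows --dp 2).
        args += ["--dp", str(num_gpus)]
    elif num_gpus >= 2 and vram_gb_per_gpu >= 12:
        # 12GB/16GB cards: shard the model across two GPUs (tensor parallel).
        args += ["--tp", "2"]
    elif num_gpus >= 2 and vram_gb_per_gpu >= 11:
        # 11GB cards: tensor parallel plus a smaller static memory fraction,
        # which shrinks the KV cache pool.
        args += ["--tp", "2", "--mem-fraction-static", "0.7"]
    elif vram_gb_per_gpu < 24:
        raise ValueError("GPU setup is below what the documentation describes")
    if enable_torch_compile:
        # Optional: roughly 15% faster inference according to the tip.
        args.append("--enable-torch-compile")
    return args


print(" ".join(suggest_sglang_server_args(2, 16)))
# mineru-sglang-server --port 30000 --tp 2
```

The checks run from the largest per-card VRAM down, so the first matching row of the tip wins. A single GPU with at least 24GB gets no extra flags.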
......@@ -429,7 +429,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
<td>Parsing Backend</td>
<td>pipeline</td>
<td>vlm-transformers</td>
-<td>vlm-sgslang</td>
+<td>vlm-sglang</td>
</tr>
<tr>
<td>Operating System</td>
......@@ -492,7 +492,7 @@ cd MinerU
uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
```
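-> [!TIP]
+> [!NOTE]
> On Linux and macOS, CUDA/MPS acceleration is supported automatically after installation. Windows users who want CUDA acceleration
> should visit the [PyTorch official website](https://pytorch.org/get-started/locally/) and install PyTorch built for the appropriate CUDA version.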
......@@ -640,13 +640,13 @@ mineru -p <input_path> -o <output_path>
#### 2.3 Using sglang to Accelerate VLM Model Inference
-##### Start sglang-engine Mode
+##### Through the sglang-engine Mode
```bash
mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
```
-##### Start sglang-server/client Mode
+##### Through the sglang-server/client Mode
1. Start the Server:
......@@ -655,10 +655,12 @@ mineru-sglang-server --port 30000
```
> [!TIP]
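-> sglang acceleration requires a GPU with Ampere architecture or newer and at least 24GB VRAM. If you have two 12GB or 16GB GPUs, you can use Tensor Parallel (TP) mode:
-> `mineru-sglang-server --port 30000 --tp 2`
->
-> If you still get out-of-memory errors with two GPUs, or need multi-GPU parallelism to increase throughput or inference speed, please refer to the [sglang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
+> sglang-server has some commonly used configuration parameters:
+> - If you have two GPUs with `12GB` or `16GB` VRAM, you can use Tensor Parallel (TP) mode: `--tp 2`
+> - If you have two `11GB` GPUs, you also need to reduce the KV cache size in addition to using Tensor Parallel mode: `--tp 2 --mem-fraction-static 0.7`
+> - If you have two or more GPUs with `24GB` VRAM or more, you can use sglang's multi-GPU data parallel (DP) mode to increase throughput: `--dp 2`
+> - You can also enable `torch.compile` to speed up inference by about 15%: `--enable-torch-compile`
+> - For more details on `sglang` parameters, please refer to the [sglang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)
2. Use the Client in another terminal: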
......
+# Documentation:
+# https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands
 services:
   mineru-sglang:
     image: mineru-sglang:latest
......@@ -11,6 +13,10 @@ services:
     command:
       --host 0.0.0.0
       --port 30000
+      # --enable-torch-compile # You can also enable torch.compile to speed up inference by approximately 15%
+      # --dp 2 # If you have two or more GPUs with 24GB VRAM or more, you can use sglang's data parallel (DP) mode to increase throughput
+      # --tp 2 # If you have two GPUs with 12GB or 16GB VRAM, you can use Tensor Parallel (TP) mode
+      # --mem-fraction-static 0.7 # If you have two GPUs with 11GB VRAM, you also need to reduce the KV cache size in addition to using Tensor Parallel mode
     ulimits:
       memlock: -1
       stack: 67108864
......@@ -23,4 +29,4 @@ services:
           devices:
             - driver: nvidia
               device_ids: ["0"]
-              capabilities: [gpu]
\ No newline at end of file
+              capabilities: [gpu]
......@@ -9,7 +9,7 @@ from ...utils.config_reader import get_formula_enable, get_table_enable
from ...utils.model_utils import crop_img, get_res_list_from_layout_res
from ...utils.ocr_utils import get_adjusted_mfdetrec_res, get_ocr_result_list, OcrConfidence
-YOLO_LAYOUT_BASE_BATCH_SIZE = 1
+YOLO_LAYOUT_BASE_BATCH_SIZE = 8
MFD_BASE_BATCH_SIZE = 1
MFR_BASE_BATCH_SIZE = 16
......
+from typing import List, Dict, Union
 from doclayout_yolo import YOLOv10
 from tqdm import tqdm
+import numpy as np
+from PIL import Image
-class DocLayoutYOLOModel(object):
-    def __init__(self, weight, device):
-        self.model = YOLOv10(weight)
+class DocLayoutYOLOModel:
+    def __init__(
+        self,
+        weight: str,
+        device: str = "cuda",
+        imgsz: int = 1280,
+        conf: float = 0.1,
+        iou: float = 0.45,
+    ):
+        self.model = YOLOv10(weight).to(device)
         self.device = device
+        self.imgsz = imgsz
+        self.conf = conf
+        self.iou = iou
-    def predict(self, image):
+    def _parse_prediction(self, prediction) -> List[Dict]:
         layout_res = []
-        doclayout_yolo_res = self.model.predict(
-            image,
-            imgsz=1280,
-            conf=0.10,
-            iou=0.45,
-            verbose=False, device=self.device
-        )[0]
-        for xyxy, conf, cla in zip(
-            doclayout_yolo_res.boxes.xyxy.cpu(),
-            doclayout_yolo_res.boxes.conf.cpu(),
-            doclayout_yolo_res.boxes.cls.cpu(),
+        # Fault-tolerance handling
+        if not hasattr(prediction, "boxes") or prediction.boxes is None:
+            return layout_res
+        for xyxy, conf, cls in zip(
+            prediction.boxes.xyxy.cpu(),
+            prediction.boxes.conf.cpu(),
+            prediction.boxes.cls.cpu(),
         ):
-            xmin, ymin, xmax, ymax = [int(p.item()) for p in xyxy]
-            new_item = {
-                "category_id": int(cla.item()),
+            coords = list(map(int, xyxy.tolist()))
+            xmin, ymin, xmax, ymax = coords
+            layout_res.append({
+                "category_id": int(cls.item()),
                 "poly": [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
                 "score": round(float(conf.item()), 3),
-            }
-            layout_res.append(new_item)
+            })
         return layout_res
-    def batch_predict(self, images: list, batch_size: int) -> list:
-        images_layout_res = []
-        # for index in range(0, len(images), batch_size):
-        for index in tqdm(range(0, len(images), batch_size), desc="Layout Predict"):
-            doclayout_yolo_res = [
-                image_res.cpu()
-                for image_res in self.model.predict(
-                    images[index : index + batch_size],
-                    imgsz=1280,
-                    conf=0.10,
-                    iou=0.45,
+    def predict(self, image: Union[np.ndarray, Image.Image]) -> List[Dict]:
+        prediction = self.model.predict(
+            image,
+            imgsz=self.imgsz,
+            conf=self.conf,
+            iou=self.iou,
+            verbose=False
+        )[0]
+        return self._parse_prediction(prediction)
+    def batch_predict(
+        self,
+        images: List[Union[np.ndarray, Image.Image]],
+        batch_size: int = 4
+    ) -> List[List[Dict]]:
+        results = []
+        with tqdm(total=len(images), desc="Layout Predict") as pbar:
+            for idx in range(0, len(images), batch_size):
+                batch = images[idx: idx + batch_size]
+                predictions = self.model.predict(
+                    batch,
+                    imgsz=self.imgsz,
+                    conf=self.conf,
+                    iou=self.iou,
                     verbose=False,
                     device=self.device,
                 )
-            ]
-            for image_res in doclayout_yolo_res:
-                layout_res = []
-                for xyxy, conf, cla in zip(
-                    image_res.boxes.xyxy,
-                    image_res.boxes.conf,
-                    image_res.boxes.cls,
-                ):
-                    xmin, ymin, xmax, ymax = [int(p.item()) for p in xyxy]
-                    new_item = {
-                        "category_id": int(cla.item()),
-                        "poly": [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
-                        "score": round(float(conf.item()), 3),
-                    }
-                    layout_res.append(new_item)
-                images_layout_res.append(layout_res)
-        return images_layout_res
+                for pred in predictions:
+                    results.append(self._parse_prediction(pred))
+                pbar.update(len(batch))
+        return results
\ No newline at end of file
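The refactored `_parse_prediction` turns each detected box into a dict with an eight-value `poly`. Below is a dependency-free sketch of that conversion. It uses plain Python sequences instead of torch tensors so that it runs without `doclayout_yolo`. The names `box_to_layout_item` and `parse_boxes` are hypothetical, and the `None` check stands in for the `hasattr(prediction, "boxes")` guard.

```python
from typing import Dict, List, Optional, Sequence


def box_to_layout_item(xyxy: Sequence[float], score: float, category_id: float) -> Dict:
    """Build one layout entry in the shape produced by _parse_prediction."""
    # int() truncates toward zero, as list(map(int, xyxy.tolist())) does in the diff.
    xmin, ymin, xmax, ymax = (int(v) for v in xyxy)
    return {
        "category_id": int(category_id),
        # Corners in order: top-left, top-right, bottom-right, bottom-left.
        "poly": [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
        "score": round(float(score), 3),
    }


def parse_boxes(
    boxes: Optional[Sequence[Sequence[float]]],
    scores: Sequence[float],
    classes: Sequence[float],
) -> List[Dict]:
    """Plain-list stand-in for _parse_prediction, including its empty-result guard."""
    if boxes is None:
        return []
    return [box_to_layout_item(b, s, c) for b, s, c in zip(boxes, scores, classes)]


print(parse_boxes([[10.7, 20.2, 110.9, 220.5]], [0.9876], [3.0]))
# [{'category_id': 3, 'poly': [10, 20, 110, 20, 110, 220, 10, 220], 'score': 0.988}]
```

Keeping the parsing in one helper is what lets the new `predict` and `batch_predict` share it instead of duplicating the loop, as the old code did.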
+from typing import List, Union
 from tqdm import tqdm
 from ultralytics import YOLO
+import numpy as np
+from PIL import Image
-class YOLOv8MFDModel(object):
-    def __init__(self, weight, device="cpu"):
-        self.mfd_model = YOLO(weight)
+class YOLOv8MFDModel:
+    def __init__(
+        self,
+        weight: str,
+        device: str = "cpu",
+        imgsz: int = 1888,
+        conf: float = 0.25,
+        iou: float = 0.45,
+    ):
+        self.model = YOLO(weight).to(device)
         self.device = device
+        self.imgsz = imgsz
+        self.conf = conf
+        self.iou = iou
-    def predict(self, image):
-        mfd_res = self.mfd_model.predict(
-            image, imgsz=1888, conf=0.25, iou=0.45, verbose=False, device=self.device
-        )[0]
-        return mfd_res
+    def _run_predict(
+        self,
+        inputs: Union[np.ndarray, Image.Image, List],
+        is_batch: bool = False
+    ) -> List:
+        preds = self.model.predict(
+            inputs,
+            imgsz=self.imgsz,
+            conf=self.conf,
+            iou=self.iou,
+            verbose=False,
+            device=self.device
+        )
+        return [pred.cpu() for pred in preds] if is_batch else preds[0].cpu()
-    def batch_predict(self, images: list, batch_size: int) -> list:
-        images_mfd_res = []
-        # for index in range(0, len(images), batch_size):
-        for index in tqdm(range(0, len(images), batch_size), desc="MFD Predict"):
-            mfd_res = [
-                image_res.cpu()
-                for image_res in self.mfd_model.predict(
-                    images[index : index + batch_size],
-                    imgsz=1888,
-                    conf=0.25,
-                    iou=0.45,
-                    verbose=False,
-                    device=self.device,
-                )
-            ]
-            for image_res in mfd_res:
-                images_mfd_res.append(image_res)
-        return images_mfd_res
+    def predict(self, image: Union[np.ndarray, Image.Image]):
+        return self._run_predict(image)
+    def batch_predict(
+        self,
+        images: List[Union[np.ndarray, Image.Image]],
+        batch_size: int = 4
+    ) -> List:
+        results = []
+        with tqdm(total=len(images), desc="MFD Predict") as pbar:
+            for idx in range(0, len(images), batch_size):
+                batch = images[idx: idx + batch_size]
+                batch_preds = self._run_predict(batch, is_batch=True)
+                results.extend(batch_preds)
+                pbar.update(len(batch))
+        return results
\ No newline at end of file
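In both new `batch_predict` methods, the progress bar is created with `total=len(images)` and advanced by `len(batch)`. As a result, it counts images rather than batches, and Python slicing handles a shorter final batch naturally. The sketch below shows the same loop without `tqdm` or a model, so it can run standalone. The name `batched_predict` and the callback-based progress reporting are hypothetical stand-ins for `pbar.update`.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched_predict(
    items: Sequence[T],
    predict_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int = 4,
    on_progress: Optional[Callable[[int], None]] = None,
) -> List[R]:
    """Run predict_batch over fixed-size slices, reporting progress per item."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    results: List[R] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]  # the last slice may be shorter
        results.extend(predict_batch(batch))
        if on_progress is not None:
            on_progress(len(batch))  # same role as pbar.update(len(batch))
    return results


progress: List[int] = []
doubled = batched_predict(list(range(10)), lambda b: [x * 2 for x in b],
                          batch_size=4, on_progress=progress.append)
print(progress)  # [4, 4, 2]
```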
......@@ -15,7 +15,7 @@ def page_to_image(
     scale = dpi / 72
     long_side_length = max(*page.get_size())
-    if long_side_length > max_width_or_height:
+    if (long_side_length*scale) > max_width_or_height:
         scale = max_width_or_height / long_side_length
     bitmap: PdfBitmap = page.render(scale=scale) # type: ignore
......
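This hunk fixes the clamp condition in `page_to_image`. The old check compared the page's long side in PDF points against the pixel limit. A page rendered at a high DPI could therefore exceed `max_width_or_height` without being scaled down. The sketch below shows the corrected arithmetic and assumes page sizes in points (1/72 inch), as `page.get_size()` returns. `render_scale` is a hypothetical name, and the page size, DPI, and limit in the usage are illustrative values only.

```python
def render_scale(page_width_pt: float, page_height_pt: float,
                 dpi: float, max_width_or_height: float) -> float:
    """Corrected clamp from page_to_image, using plain numbers instead of a PDF page."""
    scale = dpi / 72  # PDF sizes are in points, 72 points per inch
    long_side_length = max(page_width_pt, page_height_pt)
    # Compare the size after rendering (points * scale) with the pixel limit.
    if (long_side_length * scale) > max_width_or_height:
        scale = max_width_or_height / long_side_length
    return scale


# Illustrative A4 page (595 x 842 pt) rendered at 600 DPI with a 3500 px cap.
s = render_scale(595, 842, dpi=600, max_width_or_height=3500)
print(round(842 * s))  # 3500; the old check (842 > 3500) never fired, giving about 7017 px
```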