Commit 3c8385c2
Authored Jun 20, 2025 by Xiaomeng Zhao
Committed by GitHub, Jun 20, 2025

Merge pull request #2751 from myhloli/dev

Dev

Parents: 6162ae2b, d29cf4e0
Changes: 5 changed files with 39 additions and 27 deletions (+39 -27)

.github/ISSUE_TEMPLATE/bug_report.yml                   +2  -5
README.md                                               +5  -1
README_zh-CN.md                                         +5  -1
mineru/backend/vlm/vlm_magic_model.py                   +26 -19
mineru/model/ocr/paddleocr2pytorch/pytorch_paddle.py    +1  -1
.github/ISSUE_TEMPLATE/bug_report.yml

    @@ -109,14 +109,11 @@ body:
       - type: dropdown
         id: software_version
         attributes:
    -      label: Software version | 软件版本 (magic-pdf --version)
    +      label: Software version | 软件版本 (mineru --version)
           #multiple: false
           options:
             -
    -        - "1.0.x"
    -        - "1.1.x"
    -        - "1.2.x"
    -        - "1.3.x"
    +        - "2.0.x"
         validations:
           required: true
README.md

    @@ -502,7 +502,11 @@ cd MinerU
     uv pip install -e .[core]
     ```
    -#### 1.3 Install the Full Version (Supports sglang Acceleration)
    +> [!TIP]
    +> Linux and macOS systems automatically support CUDA/MPS acceleration after installation. For Windows users who want to use CUDA acceleration,
    +> please visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to install PyTorch with the appropriate CUDA version.
    +#### 1.3 Install Full Version (supports sglang acceleration) (requires device with Ampere or newer architecture and at least 24GB GPU memory)
     If you need to use **sglang to accelerate VLM model inference**, you can choose any of the following methods to install the full version:
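The hardware requirement added to the 1.3 heading can be expressed as a small check. This is a minimal sketch, not part of the commit: `meets_sglang_requirements` is a hypothetical helper, "Ampere or newer" is read as CUDA compute capability 8.0 or higher, and "24GB" is read as decimal gigabytes, which is an assumption.

```python
AMPERE_MAJOR = 8                # Ampere starts at CUDA compute capability 8.0
MIN_MEMORY_BYTES = 24 * 10**9   # assumption: "24GB" read as decimal gigabytes


def meets_sglang_requirements(capability, total_memory_bytes):
    """Hypothetical check: Ampere-or-newer GPU with at least 24 GB of memory.

    capability is a (major, minor) compute-capability tuple;
    total_memory_bytes is the GPU memory size in bytes.
    """
    major, _minor = capability
    return major >= AMPERE_MAJOR and total_memory_bytes >= MIN_MEMORY_BYTES
```

On a machine with PyTorch and a CUDA device, the two inputs could be taken from `torch.cuda.get_device_capability(0)` and `torch.cuda.get_device_properties(0).total_memory`.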
README_zh-CN.md

    @@ -492,7 +492,11 @@ cd MinerU
     uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
     ```
    -#### 1.3 安装完整版(支持 sglang 加速)
    +> [!TIP]
    +> Linux和macOS系统安装后自动支持cuda/mps加速,Windows用户如需使用cuda加速,请前往 [Pytorch官网](https://pytorch.org/get-started/locally/)
    +> 选择合适的cuda版本安装pytorch。
    +#### 1.3 安装完整版(支持 sglang 加速)(需确保设备有Ampere及以后架构,24G显存及以上显卡)
     如需使用 **sglang 加速 VLM 模型推理**,请选择合适的方式安装完整版本:
mineru/backend/vlm/vlm_magic_model.py

     import re
     from typing import Literal
    +from loguru import logger
     from mineru.utils.boxbase import bbox_distance, is_in
     from mineru.utils.enum_class import ContentType, BlockType, SplitFlag
     from mineru.backend.vlm.vlm_middle_json_mkcontent import merge_para_with_text
    @@ -22,6 +24,7 @@ class MagicModel:
             # 解析每个块
             for index, block_info in enumerate(block_infos):
                 block_bbox = block_info[0].strip()
    +            try:
                     x1, y1, x2, y2 = map(int, block_bbox.split())
                     x_1, y_1, x_2, y_2 = (
                         int(x1 * width / 1000),
    @@ -41,6 +44,10 @@ class MagicModel:
                     # print(f"类型: {block_type}")
                     # print(f"内容: {block_content}")
                     # print("-" * 50)
    +            except Exception as e:
    +                # 如果解析失败,可能是因为格式不正确,跳过这个块
    +                logger.warning(f"Invalid block format: {block_info}, error: {e}")
    +                continue
                 span_type = "unknown"
                 if block_type in [
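The guarded parsing this file gains (wrap the bounding-box parse in `try`, log a warning, skip the block on failure) can be tried in isolation. The following is a minimal sketch under stated assumptions, not the project's code: `parse_bbox` and `parse_all` are hypothetical helpers, the standard `logging` module stands in for loguru, the exception is narrowed to `ValueError` where the commit catches `Exception`, and scaling the y coordinates by `height` is assumed because the diff only shows the `x1 * width / 1000` line.

```python
import logging

logger = logging.getLogger(__name__)


def parse_bbox(block_bbox, width, height):
    """Hypothetical helper: scale an "x1 y1 x2 y2" string from a 0-1000 grid to pixels.

    Returns None after logging a warning when the string is malformed,
    mirroring the skip-on-failure behaviour shown in the diff.
    """
    try:
        x1, y1, x2, y2 = map(int, block_bbox.strip().split())
    except ValueError as e:
        # Wrong number of fields or a non-integer field: skip this block.
        logger.warning(f"Invalid block format: {block_bbox!r}, error: {e}")
        return None
    return (
        int(x1 * width / 1000),
        int(y1 * height / 1000),  # assumption: y is scaled by page height
        int(x2 * width / 1000),
        int(y2 * height / 1000),
    )


def parse_all(bboxes, width, height):
    """Keep only the boxes that parse; malformed entries are dropped."""
    parsed = (parse_bbox(b, width, height) for b in bboxes)
    return [p for p in parsed if p is not None]
```

With this shape, one malformed block costs a warning and a skipped entry rather than aborting the whole page, which is the trade-off the `continue` in the diff makes.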
mineru/model/ocr/paddleocr2pytorch/pytorch_paddle.py

    @@ -58,7 +58,7 @@ class PytorchPaddleOCR(TextSystem):
             device = get_device()
             if device == 'cpu' and self.lang in ['ch', 'ch_server', 'japan', 'chinese_cht']:
    -            logger.warning("The current device in use is CPU. To ensure the speed of parsing, the language is automatically switched to ch_lite.")
    +            # logger.warning("The current device in use is CPU. To ensure the speed of parsing, the language is automatically switched to ch_lite.")
                 self.lang = 'ch_lite'
             if self.lang in latin_lang:
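The language fallback in this hunk is unchanged by the commit; only the warning is commented out, so the switch to `ch_lite` on CPU now happens silently. A minimal sketch of the rule as a standalone function, where `resolve_ocr_lang` and `HEAVY_LANGS` are hypothetical names and the language list is copied from the diff:

```python
# Languages that are swapped for the lighter model on CPU (list taken from the diff).
HEAVY_LANGS = {"ch", "ch_server", "japan", "chinese_cht"}


def resolve_ocr_lang(device, lang):
    """Hypothetical helper: pick the OCR language model for a given device."""
    if device == "cpu" and lang in HEAVY_LANGS:
        # No warning is emitted here, matching the behaviour after this commit.
        return "ch_lite"
    return lang
```

For example, `resolve_ocr_lang("cpu", "japan")` yields `"ch_lite"`, while any other device or language passes through untouched.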