wangsen / MinerU

Commit 546be00a
authored Jun 05, 2025 by myhloli

refactor: update OCR score handling to filter low-confidence results

parent 3334157f

Showing 1 changed file with 6 additions and 2 deletions

mineru/backend/pipeline/model_json_to_middle_json.py @ 546be00a (+6 -2)
@@ -197,8 +197,12 @@ def result_to_middle_json(model_list, images_list, pdf_doc, image_writer, lang=N
                 need_ocr_list), f'ocr_res_list: {len(ocr_res_list)}, need_ocr_list: {len(need_ocr_list)}'
             for index, span in enumerate(need_ocr_list):
                 ocr_text, ocr_score = ocr_res_list[index]
-                span['content'] = ocr_text
-                span['score'] = float(f"{ocr_score:.3f}")
+                if ocr_score > 0.6:
+                    span['content'] = ocr_text
+                    span['score'] = float(f"{ocr_score:.3f}")
+                else:
+                    span['content'] = ''
+                    span['score'] = 0.0

     """Paragraph splitting"""
     para_split(middle_json["pdf_info"])
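
The behavior change in this hunk can be tried out in isolation. The following is a minimal sketch, not the code from the commit: the helper name `apply_ocr_results`, the `min_score` parameter, and the `OCR_SCORE_THRESHOLD` constant are hypothetical (the commit hardcodes `0.6` inline), and the equal-length assertion is an assumption about the statement that the hunk shows only in part.

```python
from typing import Dict, List, Tuple

# Hypothetical constant; the commit writes the literal 0.6 inline.
OCR_SCORE_THRESHOLD = 0.6


def apply_ocr_results(
    need_ocr_list: List[Dict],
    ocr_res_list: List[Tuple[str, float]],
    min_score: float = OCR_SCORE_THRESHOLD,
) -> None:
    """Write OCR text into each span in place, blanking low-confidence results."""
    # Assumed form of the length check whose tail appears in the hunk.
    assert len(ocr_res_list) == len(need_ocr_list), (
        f'ocr_res_list: {len(ocr_res_list)}, need_ocr_list: {len(need_ocr_list)}'
    )
    for span, (ocr_text, ocr_score) in zip(need_ocr_list, ocr_res_list):
        if ocr_score > min_score:
            # Confident result: keep the text, round the score to 3 decimals.
            span['content'] = ocr_text
            span['score'] = float(f"{ocr_score:.3f}")
        else:
            # Low-confidence result: keep the span but empty its content.
            span['content'] = ''
            span['score'] = 0.0


spans = [{'type': 'text'}, {'type': 'text'}, {'type': 'text'}]
results = [('Hello', 0.98765), ('w0r1d', 0.42), ('edge', 0.6)]
apply_ocr_results(spans, results)
```

Because the comparison is strict (`>`), a result scoring exactly 0.6 is treated as low confidence and blanked. Spans are never removed from the list, so anything downstream that relies on span count or order is unaffected; only `content` and `score` change.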