wangsen / MinerU
Commit 97bcc8b2, authored Nov 25, 2024 by myhloli
refactor(pdf_parse): improve code readability and maintainability
parent 034c59a8
Changes: 1 changed file with 9 additions and 10 deletions
magic_pdf/pdf_parse_union_core_v2.py (+9 -10)
...
@@ -182,8 +182,7 @@ def txt_spans_extract_v2(pdf_page, spans, all_bboxes, all_discarded_blocks, lang
 for block in all_bboxes + all_discarded_blocks:
     if block[7] in [BlockType.ImageBody, BlockType.TableBody, BlockType.InterlineEquation]:
         continue
-    overlap_ratio = calculate_overlap_area_in_bbox1_area_ratio(span['bbox'], block[0:4])
-    if overlap_ratio > 0.5:
+    if calculate_overlap_area_in_bbox1_area_ratio(span['bbox'], block[0:4]) > 0.5:
         if block in all_bboxes:
             useful_spans.append(span)
         else:
...
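For readers unfamiliar with the overlap test in this hunk, the following is a minimal sketch of what a check like `calculate_overlap_area_in_bbox1_area_ratio(...) > 0.5` might compute. The body of the real helper is not part of this diff, so `overlap_ratio_in_bbox1` below is a hypothetical stand-in, and the `(x0, y0, x1, y1)` box layout is an assumption inferred from `span['bbox']` and `block[0:4]`.

```python
def overlap_ratio_in_bbox1(bbox1, bbox2):
    """Hypothetical stand-in: fraction of bbox1's area that bbox2 covers.

    Boxes are assumed to be (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
    """
    # Corners of the intersection rectangle.
    x0 = max(bbox1[0], bbox2[0])
    y0 = max(bbox1[1], bbox2[1])
    x1 = min(bbox1[2], bbox2[2])
    y1 = min(bbox1[3], bbox2[3])
    if x1 <= x0 or y1 <= y0:
        return 0.0  # the boxes do not overlap

    bbox1_area = (bbox1[2] - bbox1[0]) * (bbox1[3] - bbox1[1])
    if bbox1_area <= 0:
        return 0.0  # degenerate span box
    return (x1 - x0) * (y1 - y0) / bbox1_area


# Illustrative data only: a span and a block whose first four fields are its box.
span = {'bbox': (0, 0, 10, 10)}
block = [4, 0, 20, 10, None, None, None, 'text']

# Same shape as the condition in the diff: the span counts as belonging
# to the block when more than half of the span's area lies inside it.
span_in_block = overlap_ratio_in_bbox1(span['bbox'], block[0:4]) > 0.5
```

Because the ratio is taken relative to the span's own area, a small span fully inside a large block scores 1.0 regardless of how big the block is.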