Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
Menu
Open sidebar
wangsen
MinerU
Commits
43581df6
"git@developer.sourcefind.cn:OpenDAS/ollama.git" did not exist on "734b57da0e04d363e70ea47a3a6f372c0b9dac82"
Commit
43581df6
authored
Mar 12, 2024
by
许瑞
Browse files
feat: add remove bbox overlap
parent
61a0c62c
Changes
1
Hide whitespace changes
Inline
Side-by-side
Showing
1 changed file
with
43 additions
and
0 deletions
+43
-0
magic_pdf/pre_proc/remove_bbox_overlap.py
magic_pdf/pre_proc/remove_bbox_overlap.py
+43
-0
No files found.
magic_pdf/pre_proc/remove_bbox_overlap.py
0 → 100644
View file @
43581df6
from
magic_pdf.libs.boxbase
import
_is_in_or_part_overlap
,
_is_in
def
_remove_overlap_between_bbox
(
spans
):
res
=
[]
for
v
in
spans
:
for
i
in
range
(
len
(
res
)):
if
_is_in
(
res
[
i
][
"bbox"
],
v
[
"bbox"
]):
continue
if
_is_in_or_part_overlap
(
res
[
i
][
"bbox"
],
v
[
"bbox"
]):
ix0
,
iy0
,
ix1
,
iy1
=
res
[
i
][
"bbox"
]
x0
,
y0
,
x1
,
y1
=
v
[
"bbox"
]
diff_x
=
min
(
x1
,
ix1
)
-
max
(
x0
,
ix0
)
diff_y
=
min
(
y1
,
iy1
)
-
max
(
y0
,
iy0
)
if
diff_x
>
diff_y
:
if
x1
>=
ix1
:
mid
=
(
x0
+
ix1
)
//
2
ix1
=
min
(
mid
,
ix1
)
x0
=
max
(
mid
+
1
,
x0
)
else
:
mid
=
(
ix0
+
x1
)
//
2
ix0
=
max
(
mid
+
1
,
ix0
)
x1
=
min
(
mid
,
x1
)
else
:
if
y1
>=
iy1
:
mid
=
(
y0
+
iy1
)
//
2
y0
=
max
(
mid
+
1
,
y0
)
iy1
=
min
(
iy1
,
mid
)
else
:
mid
=
(
iy0
+
y1
)
//
2
y1
=
min
(
y1
,
mid
)
iy0
=
max
(
mid
+
1
,
iy0
)
res
[
i
][
"bbox"
]
=
[
ix0
,
iy0
,
ix1
,
iy1
]
v
[
"bbox"
]
=
[
x0
,
y0
,
x1
,
y1
]
res
.
append
(
v
)
return
res
def
remove_overlap_between_bbox
(
spans
):
return
_remove_overlap_between_bbox
(
spans
)
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment