wangsen / MinerU
Commit 37483f0a, authored Apr 22, 2024 by liukaiwen
Updated para_split
Parent: 4cc88d2b
Changes: 1 changed file, with 7 additions and 1 deletion (+7 -1)
magic_pdf/para/para_split_v2.py (+7 -1)
```diff
@@ -696,4 +696,10 @@ def para_split(pdf_info_dict, debug_mode, lang="en"):
         page_paras = page['para_blocks']
         new_layout_bbox = new_layout_of_pages[page_num]
         __connect_middle_align_text(page_paras, new_layout_bbox, page_num, lang, debug_mode=debug_mode)
-        __merge_signle_list_text(page_paras, new_layout_bbox, page_num, lang)
\ No newline at end of file
+        __merge_signle_list_text(page_paras, new_layout_bbox, page_num, lang)
+
+    # flatten the layouts
+    for page_num, page in enumerate(pdf_info_dict.values()):
+        page_paras = page['para_blocks']
+        page_blocks = [block for layout in page_paras for block in layout]
+        page["para_blocks"] = page_blocks
```
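The added loop flattens each page's `para_blocks`. After `__connect_middle_align_text` and `__merge_signle_list_text` have run, each entry of `para_blocks` is a layout holding a list of blocks, and the nested comprehension concatenates those lists, in order, into a single list per page. The sketch below reproduces only that step. It assumes, as the comprehension implies, that `para_blocks` is a list of lists. The function name `flatten_para_blocks` and the toy page data are hypothetical and are not part of MinerU.

```python
from typing import Any


def flatten_para_blocks(pdf_info_dict: dict[str, dict[str, Any]]) -> None:
    """Replace each page's per-layout block lists with one flat list (in place).

    Hypothetical helper: assumes page["para_blocks"] is a list of layouts,
    where each layout is a list of blocks, as the commit's comprehension implies.
    """
    for page in pdf_info_dict.values():
        # Outer loop walks the layouts, inner loop walks the blocks inside each
        # layout, so the block order within the page is preserved.
        page["para_blocks"] = [block for layout in page["para_blocks"] for block in layout]


# Hypothetical toy input: page keys and block contents are illustrative only.
pdf_info = {
    "page_0": {"para_blocks": [[{"text": "A"}, {"text": "B"}], [{"text": "C"}]]},
    "page_1": {"para_blocks": [[], [{"text": "D"}]]},
}
flatten_para_blocks(pdf_info)
print([b["text"] for b in pdf_info["page_0"]["para_blocks"]])  # ['A', 'B', 'C']
print([b["text"] for b in pdf_info["page_1"]["para_blocks"]])  # ['D']
```

The comprehension is equivalent to `list(itertools.chain.from_iterable(layouts))`. The commit's loop also unpacks `page_num` from `enumerate`, but the flattening itself never uses it, so the sketch omits it. The pass also rewrites the structure in place, so it is not idempotent. A second run would iterate over each block dict and yield its keys, which means the step should run only once, as the final pass.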