wangsen / MinerU
Commit f70246d6 (unverified)
Authored Nov 22, 2024 by Xiaomeng Zhao
Committed by GitHub on Nov 22, 2024
Merge pull request #1058 from myhloli/dev
refactor(para): improve line stop flag and remove unused debug mode
Parents: 93208f44, 5d6cbcb1
Showing 2 changed files with 3 additions and 3 deletions
magic_pdf/para/para_split_v3.py (+1 -1)
magic_pdf/pdf_parse_union_core_v2.py (+2 -2)
magic_pdf/para/para_split_v3.py
@@ -352,7 +352,7 @@ def __para_merge_page(blocks):
             continue
 
 
-def para_split(pdf_info_dict, debug_mode=False):
+def para_split(pdf_info_dict):
     all_blocks = []
     for page_num, page in pdf_info_dict.items():
         blocks = copy.deepcopy(page['preproc_blocks'])
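Review note on the signature change: once `debug_mode` is dropped, a caller that still passes it fails with a `TypeError`, which is presumably why the call site in `pdf_parse_union_core_v2.py` changes in the same commit. A minimal sketch of that behavior, using a hypothetical stand-in `para_split_stub` rather than the real `para_split`:

```python
def para_split_stub(pdf_info_dict):
    """Hypothetical stand-in with the new one-argument signature.

    It only counts pages; the real para_split does the actual merging.
    """
    return len(pdf_info_dict)


def call_with_legacy_kwarg():
    """Return True if passing the removed keyword raises TypeError."""
    try:
        para_split_stub({}, debug_mode=False)
    except TypeError:
        return True
    return False


legacy_call_fails = call_with_legacy_kwarg()
```

Any out-of-tree caller that still passes `debug_mode` would need the same one-line update as the call site in this commit.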
magic_pdf/pdf_parse_union_core_v2.py
@@ -114,7 +114,7 @@ def chars_to_content(span):
     del span['chars']
 
 
-LINE_STOP_FLAG = ('.', '!', '?', '。', '!', '?', ')', ')', '"', '”', ':', ':', ';', ';', ']', '】', '}', '}', '>', '》', '、', ',', ',')
+LINE_STOP_FLAG = ('.', '!', '?', '。', '!', '?', ')', ')', '"', '”', ':', ':', ';', ';', ']', '】', '}', '}', '>', '》', '、', ',', ',', '-', '—', '–',)
 def fill_char_in_spans(spans, all_chars):
 
     for char in all_chars:
@@ -830,7 +830,7 @@ def pdf_parse_union(
         pdf_info_dict[f'page_{page_id}'] = page_info
 
     """Paragraph splitting"""
-    para_split(pdf_info_dict, debug_mode=debug_mode)
+    para_split(pdf_info_dict)
 
     """Convert dict to list"""
     pdf_info_list = dict_to_list(pdf_info_dict)
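Review note on `LINE_STOP_FLAG`: this diff does not show how the tuple is consumed, so the following is only a sketch of one plausible use. `str.endswith` accepts a tuple of suffixes, so a single call can test whether a line ends in any stop character. The helper name `ends_with_stop_flag` is hypothetical, and the tuple here is an abbreviated subset that includes the three characters added by this commit (`'-'`, `'—'`, `'–'`).

```python
# Abbreviated subset of the tuple for illustration; the last three entries
# are the hyphen, em dash, and en dash added in this commit.
LINE_STOP_FLAG = ('.', '!', '?', '。', ';', ':', ',', '-', '—', '–')


def ends_with_stop_flag(line_text):
    """Hypothetical helper: True if the line, ignoring trailing
    whitespace, ends with one of the stop characters."""
    return line_text.rstrip().endswith(LINE_STOP_FLAG)


hyphenated = ends_with_stop_flag('a line broken at a hy-')
plain = ends_with_stop_flag('a line that simply runs on')
```

Under this reading, a line ending in a hyphen or dash now counts as ending on a stop character, whereas before the change it would not have.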