zhougaofeng / magic_pdf
Commit 1e548892, authored Dec 31, 2024 by zhougaofeng
Update pdf_parse_union_core_v2.py
Parent: b9336031
Showing 1 changed file with 9 additions and 0 deletions
magic_pdf/pdf_parse_union_core_v2.py (+9 -0)
@@ -6,6 +6,7 @@ from typing import List
 import torch
 from loguru import logger
+import configparser
 from magic_pdf.config.enums import SupportedPdfParseMethod
 from magic_pdf.data.dataset import Dataset, PageableData
@@ -555,6 +556,14 @@ def pdf_parse_union(config_path,local_image_dir,
     """Initialize the start time"""
     start_time = time.time()
+    config = configparser.ConfigParser()
+    config.read(config_path)
+    url = config.get('server', 'ocr_server')
+    client = PredictClient(url)
+    ocr_status = client.check_health()
+    if not ocr_status:
+        logger.warning(f'Health check failed. The server at "{url}" is not responding as expected.')
+        logger.info(f'The Qwen OCR parsing service is not running properly; the Qwen table-parsing service will not be used for now')
     for page_id, page in enumerate(dataset):
         """When debugging, output the parsing time for each page."""
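The added lines read an OCR server URL from the `[server]` section of an INI file and probe it once before the page loop starts. The sketch below illustrates that pattern in isolation. It rests on several assumptions: `PredictClient` is not shown in this commit, so `StubPredictClient` is a hypothetical stand-in; the URL in `EXAMPLE_CONFIG` is a placeholder; and `resolve_ocr_client` is an illustrative helper, not a function from magic_pdf. Note that `ConfigParser.read()` silently skips files it cannot open, so a wrong `config_path` surfaces later as a `NoSectionError` from `config.get()`; passing `fallback=` is one way to handle that case explicitly.

```python
import configparser

# Hypothetical config contents. Only the [server] section name and the
# ocr_server key come from the diff; the URL is a placeholder.
EXAMPLE_CONFIG = """
[server]
ocr_server = http://127.0.0.1:8000
"""


class StubPredictClient:
    """Stand-in for PredictClient, whose real implementation is not shown."""

    def __init__(self, url, healthy=True):
        self.url = url
        self._healthy = healthy

    def check_health(self):
        # The real client presumably contacts the server; this stub just
        # returns the flag it was constructed with.
        return self._healthy


def resolve_ocr_client(config_text, client_factory):
    """Return (url, client); client is None when OCR should be skipped."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    # fallback=None avoids NoSectionError / NoOptionError when the
    # section or key is missing.
    url = config.get("server", "ocr_server", fallback=None)
    if url is None:
        return None, None
    client = client_factory(url)
    if not client.check_health():
        # Mirrors the diff: warn and carry on without the OCR service.
        return url, None
    return url, client


url, client = resolve_ocr_client(EXAMPLE_CONFIG, StubPredictClient)
print(url, client is not None)
```

Returning `None` for the client gives callers a single value to test before attempting table OCR, whereas the committed code only logs the failure and leaves `ocr_status` for later use.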