wangsen / MinerU
Commits · 2acd1ecc466627ae7bc56abf047d726b422ffd97
mineru / magic_pdf / pipeline.py
19 Mar, 2024
1 commit
Customize output for QA requirements · ef267e09
赵小蒙 authored Mar 19, 2024
18 Mar, 2024
2 commits
After OCR, need_drop no longer needs to be checked again; after the ocr_dropped_parse_pdf logic runs, need_drop must be set to false · f5b9cff4
赵小蒙 authored Mar 18, 2024
Add uni_parse_pdf logic · b7c12891
赵小蒙 authored Mar 18, 2024
16 Mar, 2024
1 commit
Assemble the parsing results of text-based PDFs in the unified format · d5ea44f9
xuchao authored Mar 16, 2024
15 Mar, 2024
3 commits
Add assembly logic for the standard format · 051ee3c3
赵小蒙 authored Mar 15, 2024
Configure s3_image_save_path in one place · f10b4a50
赵小蒙 authored Mar 15, 2024
Update book_name generation logic · b1ac8d03
赵小蒙 authored Mar 15, 2024
14 Mar, 2024
2 commits
Update the Spark pipeline for OCR mode · 9bd6294b
赵小蒙 authored Mar 14, 2024
Add compatibility handling for data_type/bookid/data_source · 8a52ada3
赵小蒙 authored Mar 14, 2024
01 Mar, 2024
1 commit
Restructure directories · f7a7206e
赵小蒙 authored Mar 02, 2024