wangsen
MinerU
Commits
b94fd7f068d241df502ea0d3931e205bd7f599a1
mineru
magic_pdf
pipeline.py
21 Mar, 2024
4 commits
feat: add layout
· 4f1f7d62
许瑞
authored
Mar 21, 2024
Keep pdf_intermediate_dict info in the final stage of the QA version
· 1d5d7781
赵小蒙
authored
Mar 21, 2024
fix: fix typo
· 390fdb2c
许瑞
authored
Mar 21, 2024
feat: add extract_train_data
· 09269c84
许瑞
authored
Mar 20, 2024
19 Mar, 2024
1 commit
Customize output for QA requirements
· ef267e09
赵小蒙
authored
Mar 19, 2024
18 Mar, 2024
2 commits
No need to re-check need_drop after OCR; need_drop must be set to false after the ocr_dropped_parse_pdf logic
· f5b9cff4
赵小蒙
authored
Mar 18, 2024
Add uni_parse_pdf logic
· b7c12891
赵小蒙
authored
Mar 18, 2024
16 Mar, 2024
1 commit
Assemble the parsing results of text-based PDFs in the unified format
· d5ea44f9
xuchao
authored
Mar 16, 2024
15 Mar, 2024
3 commits
Add assembly logic for the standard format
· 051ee3c3
赵小蒙
authored
Mar 15, 2024
Unify s3_image_save_path configuration
· f10b4a50
赵小蒙
authored
Mar 15, 2024
Update book_name generation logic
· b1ac8d03
赵小蒙
authored
Mar 15, 2024
14 Mar, 2024
2 commits
Update spark pipeline for OCR mode
· 9bd6294b
赵小蒙
authored
Mar 14, 2024
Compatibility handling for data_type/bookid/data_source
· 8a52ada3
赵小蒙
authored
Mar 14, 2024
01 Mar, 2024
1 commit
Restructure directories
· f7a7206e
赵小蒙
authored
Mar 02, 2024