wangsen
MinerU
Commits
d5ea44f944d73349c6a012fad426cfff0c2a2584
mineru
magic_pdf
16 Mar, 2024
1 commit
Assemble parsing results for text-based PDFs in the unified format
· d5ea44f9
xuchao
authored
Mar 16, 2024
d5ea44f9
15 Mar, 2024
7 commits
Add assembly logic for the standard format
· 051ee3c3
赵小蒙
authored
Mar 15, 2024
051ee3c3
Fix IndexError: list index out of range caused by spans being an empty list
· a0135640
赵小蒙
authored
Mar 15, 2024
a0135640
Unify configuration of s3_image_save_path
· f10b4a50
赵小蒙
authored
Mar 15, 2024
f10b4a50
Update book_name generation logic
· b1ac8d03
赵小蒙
authored
Mar 15, 2024
b1ac8d03
Fix join_path logic
· 84867933
赵小蒙
authored
Mar 15, 2024
84867933
Update span_type classification in mk_mm_markdown2
· 195998a0
赵小蒙
authored
Mar 15, 2024
195998a0
Use the full path for image addresses when making multimodal markdown
· f06a3213
赵小蒙
authored
Mar 15, 2024
f06a3213
14 Mar, 2024
9 commits
Implement paragraph splitting within a layout
· 084e9328
xuchao
authored
Mar 14, 2024
084e9328
update code
· f68c6629
xuchao
authored
Mar 01, 2024
f68c6629
Escape special characters when making markdown
· 59b0b0c3
赵小蒙
authored
Mar 14, 2024
59b0b0c3
Add S3 upload logic for screenshots; remove spans with zero width or height
· 8a2736a5
赵小蒙
authored
Mar 14, 2024
8a2736a5
Remove spans with zero height or width
· 0b35b73c
赵小蒙
authored
Mar 14, 2024
0b35b73c
Update the spark pipeline for OCR mode
· 9bd6294b
赵小蒙
authored
Mar 14, 2024
9bd6294b
Add compatibility handling for data_type/bookid/data_source
· 8a52ada3
赵小蒙
authored
Mar 14, 2024
8a52ada3
Abstract content type in OCR mode
· 26c23782
赵小蒙
authored
Mar 14, 2024
26c23782
Draw dropped bboxes in layout.pdf
· b6f051d8
赵小蒙
authored
Mar 14, 2024
b6f051d8
13 Mar, 2024
10 commits
Add fields required for QA to the dict
· 85587b25
赵小蒙
authored
Mar 13, 2024
85587b25
fix import
· b560c18f
赵小蒙
authored
Mar 13, 2024
b560c18f
add modify inline equation y axis
· 21cfaf4c
liukaiwen
authored
Mar 13, 2024
add false displayed equation to inline equation
21cfaf4c
fix import
· 6f7aa890
赵小蒙
authored
Mar 13, 2024
6f7aa890
add modify inline equation y axis
· 64d67b5c
liukaiwen
authored
Mar 13, 2024
add false displayed equation to inline equation
64d67b5c
add modify inline equation y axis
· 1f468bed
liukaiwen
authored
Mar 13, 2024
add false displayed equation to inline equation
1f468bed
Move span operations into ocr_span_list_modify; add position adjustment logic for blocks that occupy a whole line
· 32fd7f95
赵小蒙
authored
Mar 13, 2024
32fd7f95
Adjust remove_overlaps_min_spans threshold 0.8->0.65
· 86dc22ca
赵小蒙
authored
Mar 13, 2024
86dc22ca
Update draw_bbox utility class logic
· 07abba71
赵小蒙
authored
Mar 13, 2024
07abba71
Move the position of modify_y_axis in the pipeline
· 63969109
赵小蒙
authored
Mar 13, 2024
63969109
12 Mar, 2024
12 commits
Refactor drow_bbox into a utility class
· 7512baaa
赵小蒙
authored
Mar 12, 2024
7512baaa
Change span overlap removal threshold 0.8->0.5
· 070139a5
赵小蒙
authored
Mar 12, 2024
070139a5
feat: complete self check
· 2611e853
许瑞
authored
Mar 12, 2024
2611e853
feat: add remove bbox overlap
· 43581df6
许瑞
authored
Mar 12, 2024
43581df6
Adjust the intermediate-state structure of pdf_info_dict
· 61a0c62c
赵小蒙
authored
Mar 12, 2024
61a0c62c
Automatically draw layout regions and text regions when debugging
· f31117de
赵小蒙
authored
Mar 12, 2024
f31117de
Add logic for generating multimodal markdown
· ec1a6ef7
赵小蒙
authored
Mar 12, 2024
ec1a6ef7
Fix the separation algorithm for intersecting layouts and fix incorrect layout ordering
· 3c8b2545
赵小蒙
authored
Mar 12, 2024
3c8b2545
add modify inline equation y axis
· 5513e48a
liukaiwen
authored
Mar 12, 2024
add false displayed equation to inline equation
5513e48a
lkw
· f9f36c10
liukaiwen
authored
Mar 12, 2024
f9f36c10
lkw
· 94a7ba3d
liukaiwen
authored
Mar 12, 2024
94a7ba3d
Update the logic for clearing overlapping spans
· 9cc53a5e
赵小蒙
authored
Mar 12, 2024
9cc53a5e
11 Mar, 2024
1 commit
lkw
· 83deab21
liukaiwen
authored
Mar 11, 2024
83deab21