wangsen / MinerU / Commits
Ref: a4a9fd693403f1ebcd0438f3f2b90cea35e50df4
Path: mineru / magic_pdf / pdf_parse_by_ocr.py
20 Mar, 2024
1 commit
Add an online mode to the OCR-mode demo; tweak the pipeline to support online mode
· ce96c3f6
赵小蒙
authored
Mar 20, 2024
19 Mar, 2024
1 commit
Implement merging of paragraphs across page boundaries
· acabae56
xuchao
authored
Mar 19, 2024
18 Mar, 2024
1 commit
OCR mode: record a tag for every dropped span and classify them
· 5eab010b
赵小蒙
authored
Mar 18, 2024
14 Mar, 2024
5 commits
Implement paragraph splitting within a layout region
· 084e9328
xuchao
authored
Mar 14, 2024
Add S3 upload logic for screenshots; remove spans with zero width or height
· 8a2736a5
赵小蒙
authored
Mar 14, 2024
Delete spans whose height or width is 0
· 0b35b73c
赵小蒙
authored
Mar 14, 2024
Abstract content types in OCR mode
· 26c23782
赵小蒙
authored
Mar 14, 2024
Draw the bboxes of dropped content in layout.pdf
· b6f051d8
赵小蒙
authored
Mar 14, 2024
13 Mar, 2024
5 commits
Add the fields required by QA to the dict
· 85587b25
赵小蒙
authored
Mar 13, 2024
add modify inline equation y axis
· 64d67b5c
liukaiwen
authored
Mar 13, 2024
add false displayed equation to inline equation
add modify inline equation y axis
· 1f468bed
liukaiwen
authored
Mar 13, 2024
add false displayed equation to inline equation
Move span operations into ocr_span_list_modify; add position-adjustment logic for blocks that occupy a line on their own
· 32fd7f95
赵小蒙
authored
Mar 13, 2024
Move modify_y_axis to a different position in the pipeline
· 63969109
赵小蒙
authored
Mar 13, 2024
12 Mar, 2024
5 commits
Refactor drow_bbox into a utility class
· 7512baaa
赵小蒙
authored
Mar 12, 2024
feat: complete self check
· 2611e853
许瑞
authored
Mar 12, 2024
Restructure the intermediate-state layout of pdf_info_dict
· 61a0c62c
赵小蒙
authored
Mar 12, 2024
Automatically draw layout regions and text regions in debug mode
· f31117de
赵小蒙
authored
Mar 12, 2024
lkw
· 94a7ba3d
liukaiwen
authored
Mar 12, 2024
11 Mar, 2024
1 commit
lkw
· 83deab21
liukaiwen
authored
Mar 11, 2024
08 Mar, 2024
6 commits
Add screenshot functionality to OCR mode
· a5f8de98
赵小蒙
authored
Mar 08, 2024
Update the OCR pipeline
· 17b09f71
赵小蒙
authored
Mar 08, 2024
span->line concatenation is now based on the model's layout
· 864e9535
赵小蒙
authored
Mar 08, 2024
Convert the coordinates of the model's layout
· f9bd0040
赵小蒙
authored
Mar 08, 2024
Extract the conversion logic between model and pymu (PyMuPDF) coordinates into a method
· f62d1aa7
赵小蒙
authored
Mar 08, 2024
Remove header/page number/footnote/footer in OCR mode
· 388223f2
赵小蒙
authored
Mar 08, 2024
07 Mar, 2024
2 commits
Add layout parsing to OCR mode
· fcea39d3
赵小蒙
authored
Mar 07, 2024
Update the OCR concatenation logic
· caa1588a
赵小蒙
authored
Mar 07, 2024
06 Mar, 2024
2 commits
Update parse_pdf_by_ocr logic
· a0be4652
赵小蒙
authored
Mar 06, 2024
Add OCR-based parsing
· 701f3849
赵小蒙
authored
Mar 06, 2024