wangsen / MinerU
Commit c9c14bea
Authored Mar 04, 2024 by 赵小蒙
Update README
Parent: 9fe81795
Showing 2 changed files with 43 additions and 11 deletions
README.md: +24 -11
others/README.md: +19 -0
README.md
~~~diff
-# pdf_toolbox
-Basic PDF parsing functions
-## Distinguishing text-based PDFs from scanned PDFs
+# Magic-PDF
-```shell
-cat s3_pdf_path.example.pdf | parallel --colsep ' ' -j 10 "python pdf_meta_scan.py --s3-pdf-path {2} --s3-profile {1} >> {/}.jsonl"
+Convert PDFs to Markdown documents conveniently and accurately
-find dir/to/jsonl/ -type f -name "*.jsonl" | parallel -j 10 "python pdf_classfy_by_type.py --json_file {} >> {/}.jsonl"
-```
+### Getting Started
+###### Prerequisites
+python 3.9+
-```shell
-# To run the script on its own (required after merging into code-clean), see the following example:
-python -m pdf_meta_scan --s3-pdf-path "D:\pdf_files\内容排序测试_pdf\p3_图文混排 5.pdf" --s3-profile s2
+###### **Installation**
+1. Clone the repo
+```sh
+git clone https://github.com/myhloli/Magic-PDF.git
 ```
-## pdf
+### License
+This project is licensed under the MIT License; see [LICENSE.txt](https://github.com/shaojintian/Best_README_template/blob/master/LICENSE.txt) for details.
+### Acknowledgments
+- [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
~~~
others/README.md
0 → 100644
# pdf_toolbox
Basic PDF parsing functions
## Distinguishing text-based PDFs from scanned PDFs
```shell
cat s3_pdf_path.example.pdf | parallel --colsep ' ' -j 10 "python pdf_meta_scan.py --s3-pdf-path {2} --s3-profile {1} >> {/}.jsonl"

find dir/to/jsonl/ -type f -name "*.jsonl" | parallel -j 10 "python pdf_classfy_by_type.py --json_file {} >> {/}.jsonl"
```
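As a rough illustration of what the first pipeline does with `--colsep ' '`: each input line is split on spaces, column 1 is substituted for `{1}` (the S3 profile) and column 2 for `{2}` (the PDF path). The Python sketch below mimics only that substitution. The function name `build_scan_command` and the sample input line are hypothetical, and the sketch ignores GNU parallel's job scheduling and the `>> {/}.jsonl` redirection.

```python
import shlex


def build_scan_command(line: str) -> list[str]:
    """Split one input line on spaces and map the columns onto the
    pdf_meta_scan.py arguments: column 1 -> --s3-profile, column 2 -> --s3-pdf-path."""
    columns = line.rstrip("\n").split(" ")
    if len(columns) < 2:
        raise ValueError(f"expected '<profile> <path>', got: {line!r}")
    profile, pdf_path = columns[0], columns[1]
    return [
        "python", "pdf_meta_scan.py",
        "--s3-pdf-path", pdf_path,
        "--s3-profile", profile,
    ]


# Hypothetical input line: "<profile> <path>"
sample = "s2 s3://example-bucket/docs/sample.pdf"
print(shlex.join(build_scan_command(sample)))
```

Because the split happens on every space, a path that itself contains a space would be spread over more than two columns, so input paths for this pipeline presumably need to be space-free.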
```shell
# To run the script on its own (required after merging into code-clean), see the following example:
python -m pdf_meta_scan --s3-pdf-path "D:\pdf_files\内容排序测试_pdf\p3_图文混排 5.pdf" --s3-profile s2
```
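The README does not show how `pdf_classfy_by_type.py` decides whether a PDF is text-based or scanned. Purely as an illustration of the kind of rule such a step might apply, here is a minimal sketch that assumes per-page statistics have already been collected. The `PageStats` fields, the thresholds, and `classify_pdf` are all hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    # Hypothetical per-page measurements, not the project's actual schema.
    text_chars: int          # number of extractable text characters on the page
    image_area_ratio: float  # fraction of the page area covered by images (0.0 to 1.0)


def classify_pdf(
    pages: list[PageStats],
    min_chars: int = 50,
    max_image_ratio: float = 0.8,
    text_page_fraction: float = 0.5,
) -> str:
    """Return "text" if enough pages look text-based, otherwise "scanned".

    A page counts as text-like when it has at least `min_chars` extractable
    characters and images cover less than `max_image_ratio` of its area.
    """
    if not pages:
        raise ValueError("no pages to classify")
    text_like = sum(
        1
        for page in pages
        if page.text_chars >= min_chars and page.image_area_ratio < max_image_ratio
    )
    return "text" if text_like / len(pages) >= text_page_fraction else "scanned"


# Two text-like pages out of three: classified as "text".
mixed = [PageStats(1200, 0.1), PageStats(900, 0.0), PageStats(0, 1.0)]
print(classify_pdf(mixed))
```

The thresholds are arbitrary starting points; in practice they would need tuning against a labeled sample of PDFs.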
## pdf