Commit 4d6dcb00 (wangsen / MinerU)
Authored Jul 13, 2024 by myhloli

docs: update readme

Parent: 19fd0a40
Changes: 2 changed files with 24 additions and 26 deletions

- README.md: +12 -13
- README_zh-CN.md: +12 -13
README.md @ 4d6dcb00

@@ -94,9 +94,9 @@ Alternatively, for built-in high-precision model parsing capabilities, use:
```bash
pip install magic-pdf[full-cpu]
```
The high-precision models depend on detectron2, which must be compiled at install time.
If you need to compile it yourself, refer to https://github.com/facebookresearch/detectron2/issues/5114
Alternatively, use our pre-compiled wheel packages (Python 3.10 only):
```bash
pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/
```
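Because the pre-compiled wheels are described as limited to Python 3.10, it may help to check the interpreter before running the install. The following is a minimal sketch; the function name `matches_prebuilt_wheel` is hypothetical and is not part of magic-pdf or detectron2.

```python
import sys


def matches_prebuilt_wheel(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is Python 3.10.x.

    Hypothetical helper: it only compares the major and minor version
    against the 3.10 restriction stated in the README.
    """
    return tuple(version_info[:2]) == (3, 10)


if __name__ == "__main__":
    if matches_prebuilt_wheel():
        print("Python 3.10 detected: the pre-compiled detectron2 wheel should apply.")
    else:
        print("Not Python 3.10: detectron2 will likely need to be compiled from source.")
```

If the check fails, the compile-it-yourself route linked above is the remaining option.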
@@ -104,7 +104,7 @@ pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/
#### 2. Downloading model weights files
For details, see [how_to_download_models](docs/how_to_download_models_en.md)
After downloading the model weights, move the 'models' directory to a disk with more free space, preferably an SSD.
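The README goes on to set "models-dir" in magic-pdf.json so that it points at the moved directory. A minimal sketch of that edit is shown here, assuming the configuration file is plain JSON located at `~/magic-pdf.json`; the helper name `set_models_dir` is hypothetical and not part of magic-pdf.

```python
import json
from pathlib import Path


def set_models_dir(config_path, models_dir):
    """Write the "models-dir" key into a JSON config file and return the config.

    Hypothetical helper: other keys already in the file are preserved,
    and a missing file is treated as an empty configuration.
    """
    config_path = Path(config_path).expanduser()
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config["models-dir"] = str(Path(models_dir).expanduser())
    config_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config
```

A call such as `set_models_dir("~/magic-pdf.json", "/path/to/models")` would update the file in place; the target path here is only a placeholder.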
@@ -130,9 +130,9 @@ In magic-pdf.json, configure "models-dir" to point to the directory where the mo
```bash
magic-pdf pdf-command --pdf "pdf_path" --inside_model true
```
After the program has finished, you can find the generated markdown files under the directory "/tmp/magic-pdf".
The corresponding xxx_model.json file is in the markdown directory.
If you intend to do secondary development on the post-processing pipeline, you can use the command:
```bash
magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
```
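To find a value for "model_json_path", the xxx_model.json files written by an earlier run can be located with a short script. This is a sketch only: the exact layout under "/tmp/magic-pdf" is not described here, so the search is recursive, and the function name `find_model_json` is hypothetical.

```python
from pathlib import Path


def find_model_json(output_dir="/tmp/magic-pdf"):
    """Return all *_model.json files found anywhere under output_dir, sorted.

    Hypothetical helper: the directory layout is an assumption, so every
    subdirectory is searched rather than a fixed path.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*_model.json"))


if __name__ == "__main__":
    for path in find_model_json():
        print(path)
```

Each printed path can then be passed to the `--model` option shown above.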
@@ -150,12 +150,12 @@ magic-pdf --help
##### CUDA
You need to install the PyTorch version that corresponds to your CUDA version.
This example installs the CUDA 11.8 version. For more information, see https://pytorch.org/get-started/locally/
```bash
# When using the GPU solution, reinstall PyTorch for the corresponding CUDA version.
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
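For a CUDA version other than 11.8, the index URL follows the same `cu<major><minor>` pattern. The sketch below builds the equivalent command; both function names are hypothetical, and it is an assumption that a wheel for the pinned torch and torchvision versions exists for every CUDA version, so check the PyTorch page linked above before relying on the result.

```python
def pytorch_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as "11.8" to the matching PyTorch wheel index URL."""
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


def pip_install_command(cuda_version: str, torch="2.3.1", torchvision="0.18.1"):
    """Return the pip argument list for reinstalling PyTorch against cuda_version.

    The default pins mirror the README example; other combinations are
    an assumption and may not be published.
    """
    return [
        "pip", "install", "--force-reinstall",
        f"torch=={torch}", f"torchvision=={torchvision}",
        "--index-url", pytorch_index_url(cuda_version),
    ]


if __name__ == "__main__":
    print(" ".join(pip_install_command("11.8")))
```

For "11.8" this reproduces the command shown in the README.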
Also, you need to modify the value of "device-mode" in the configuration file magic-pdf.json.
```json
{
  "device-mode": "cuda"
```
@@ -164,9 +164,8 @@ Also, you need to modify the value of "device-mode" in the configuration file ma
##### MPS
On macOS devices with M-series chips, you can use MPS for inference acceleration.
You also need to modify the value of "device-mode" in the configuration file magic-pdf.json.
```json
{
  "device-mode": "mps"
```
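The "device-mode" values from the CUDA and MPS hunks above can also be chosen by probing PyTorch. The following is a minimal sketch, not part of magic-pdf: the function names are hypothetical, and the "cpu" fallback value is an assumption, since only "cuda" and "mps" appear in this diff.

```python
def pick_device_mode(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then MPS, then fall back to "cpu" (assumed value)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device_mode() -> str:
    """Probe PyTorch for available accelerators; return "cpu" if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent in older PyTorch releases, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device_mode(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device_mode())
```

The printed value is what would go into the "device-mode" field of magic-pdf.json.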
README_zh-CN.md @ 4d6dcb00

@@ -70,7 +70,7 @@ https://github.com/opendatalab/MinerU/assets/11393164/618937cb-dc6a-4646-b433-e3
python >= 3.9
A virtual environment is recommended to avoid possible dependency conflicts; both venv and conda work.
For example:
```bash
conda create -n MinerU python=3.10
```
@@ -90,19 +90,19 @@ pip install magic-pdf
```bash
pip install magic-pdf[full-cpu]
```
The high-precision models depend on detectron2, which must be compiled at install time. If you need to compile it yourself, refer to https://github.com/facebookresearch/detectron2/issues/5114
Alternatively, use our pre-compiled whl packages (Python 3.10 only):
```bash
pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/
```
#### 2. Download the model weight files
For details, see [How to download model files](docs/how_to_download_models_zh_cn.md)
After downloading, move the models directory to a directory on an SSD with ample free space.
#### 3. Copy the configuration file and configure it
The [magic-pdf.template.json](magic-pdf.template.json) file is available in the repository root.
```bash
cp magic-pdf.template.json ~/magic-pdf.json
```
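On systems without `cp`, the same copy can be done from Python. This is a minimal sketch with a hypothetical helper name, `install_config`; unlike the plain `cp` above, it refuses to overwrite an existing configuration unless asked, which is a design choice of the sketch and not something the README specifies.

```python
import shutil
from pathlib import Path


def install_config(template, target="~/magic-pdf.json", overwrite=False) -> bool:
    """Copy the template config to target; return True if a copy was made.

    Hypothetical helper: an existing target is left untouched unless
    overwrite=True, so a configured file is not lost by accident.
    """
    src = Path(template)
    dst = Path(target).expanduser()
    if dst.exists() and not overwrite:
        return False
    shutil.copyfile(src, dst)
    return True
```

Calling `install_config("magic-pdf.template.json")` from the repository root mirrors the shell command.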
@@ -120,8 +120,8 @@ cp magic-pdf.template.json ~/magic-pdf.json
```bash
magic-pdf pdf-command --pdf "pdf_path" --inside_model true
```
After the program finishes, you can find the generated markdown files under the "/tmp/magic-pdf" directory; the corresponding xxx_model.json file is in the markdown directory.
If you intend to do secondary development on the post-processing pipeline, you can use the command:
```bash
magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
```
@@ -138,9 +138,9 @@ magic-pdf --help
###### CUDA
You need to install the PyTorch version that corresponds to your CUDA version.
The following command installs the build for CUDA 11.8. For more information, see https://pytorch.org/get-started/locally/
```bash
# When using the GPU solution, reinstall PyTorch for the corresponding CUDA version.
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
@@ -152,9 +152,8 @@ pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https
###### MPS
On macOS devices with M-series chips, MPS can be used to accelerate inference.
You need to modify the value of "device-mode" in the configuration file magic-pdf.json.
```json
{
  "device-mode": "mps"
```