Commit 4d6dcb00 authored by myhloli

docs: update readme

parent 19fd0a40
@@ -94,9 +94,9 @@ Alternatively, for built-in high-precision model parsing capabilities, use:
```bash
pip install magic-pdf[full-cpu]
```
The high-precision models depend on detectron2, which requires a compiled installation.
If you need to compile it yourself, refer to https://github.com/facebookresearch/detectron2/issues/5114
Or directly use our pre-compiled wheel packages (limited to python 3.10):
```bash
pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/
```
@@ -104,7 +104,7 @@ pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/
#### 2. Downloading model weights files
-For detailed references, please see below[how_to_download_models](docs/how_to_download_models_en.md)
+For detailed references, please see below [how_to_download_models](docs/how_to_download_models_en.md)
After downloading the model weights, move the 'models' directory to a directory on a larger disk space, preferably an SSD.
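Once the 'models' directory has been moved, the configuration file has to point at the new location. The following is a minimal sketch, assuming the configuration is a flat JSON file (the README copies it to `~/magic-pdf.json`) with a top-level "models-dir" key. The helper name `set_models_dir` and the example path are hypothetical and not part of magic-pdf.

```python
import json
from pathlib import Path


def set_models_dir(config_path: Path, models_dir: Path) -> dict:
    """Set the "models-dir" key in a magic-pdf.json-style file and return the config."""
    # Start from the existing config so other keys are preserved.
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config["models-dir"] = str(models_dir)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call (hypothetical path; replace with the real location of 'models'):
# set_models_dir(Path.home() / "magic-pdf.json", Path("/path/to/models"))
```

Editing the file by hand works just as well; the helper is only convenient when the step needs to be scripted.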
@@ -130,9 +130,9 @@ In magic-pdf.json, configure "models-dir" to point to the directory where the mo
```bash
magic-pdf pdf-command --pdf "pdf_path" --inside_model true
```
After the program has finished, you can find the generated markdown files under the directory "/tmp/magic-pdf".
You can find the corresponding xxx_model.json file in the markdown directory.
If you intend to do secondary development on the post-processing pipeline, you can use the command:
```bash
magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
```
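The two invocations above differ only in whether the built-in model is used or an existing model JSON is passed in. As a hedged sketch, the wrapper below assembles either command line from Python and runs it over a folder of PDFs. It uses only the flags shown above; the names `build_command` and `run_all` are hypothetical, and it assumes `magic-pdf` is on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(pdf: Path, model_json: Optional[Path] = None) -> List[str]:
    """Build the magic-pdf command line for one PDF."""
    cmd = ["magic-pdf", "pdf-command", "--pdf", str(pdf)]
    if model_json is None:
        # No precomputed model output: let magic-pdf run its built-in model.
        cmd += ["--inside_model", "true"]
    else:
        # Reuse an existing xxx_model.json for post-processing work.
        cmd += ["--model", str(model_json)]
    return cmd


def run_all(pdf_dir: Path) -> None:
    """Run the built-in-model command on every PDF in pdf_dir, one at a time."""
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        subprocess.run(build_command(pdf), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.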
@@ -150,12 +150,12 @@ magic-pdf --help
##### CUDA
You need to install the corresponding PyTorch version according to your CUDA version.
+This example installs the CUDA 11.8 version. More information: https://pytorch.org/get-started/locally/
```bash
-# When using the GPU solution, you need to reinstall PyTorch for the corresponding CUDA version. This example installs the CUDA 11.8 version.
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
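After reinstalling PyTorch, it can help to confirm which accelerator the installed build actually sees before changing the configuration. The check below is a minimal sketch using the standard PyTorch availability calls. The function name `pick_device_mode` is hypothetical, and the "cpu" fallback value is an assumption: the README only shows "cuda" and "mps".

```python
def pick_device_mode() -> str:
    """Return "cuda", "mps", or "cpu" based on what the installed PyTorch can use."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no accelerator can be used.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing from older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device_mode())
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed wheel most likely does not match the local CUDA version.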
Also, you need to modify the value of "device-mode" in the configuration file magic-pdf.json.
```json
{
"device-mode":"cuda"
@@ -164,9 +164,8 @@ Also, you need to modify the value of "device-mode" in the configuration file ma
##### MPS
For macOS users with M-series chip devices, you can use MPS for inference acceleration.
You also need to modify the value of "device-mode" in the configuration file magic-pdf.json.
```json
{
"device-mode":"mps"
......
@@ -70,7 +70,7 @@ https://github.com/opendatalab/MinerU/assets/11393164/618937cb-dc6a-4646-b433-e3
python >= 3.9
Using a virtual environment is recommended to avoid possible dependency conflicts; both venv and conda work.
For example:
```bash
conda create -n MinerU python=3.10
@@ -90,19 +90,19 @@ pip install magic-pdf
```bash
pip install magic-pdf[full-cpu]
```
-The high-precision models depend on detectron2, which must be compiled to install. To compile it yourself, refer tohttps://github.com/facebookresearch/detectron2/issues/5114
+The high-precision models depend on detectron2, which must be compiled to install. To compile it yourself, refer to https://github.com/facebookresearch/detectron2/issues/5114
Or directly use our pre-compiled whl packages (python 3.10 only):
```bash
pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/
```
#### 2. Downloading model weights files
-For details, see[How to download model files](docs/how_to_download_models_zh_cn.md)
+For details, see [How to download model files](docs/how_to_download_models_zh_cn.md)
After downloading, move the models directory to a directory on an SSD with ample free space.
#### 3. Copying the configuration file and configuring it
-You can get the[magic-pdf.template.json](magic-pdf.template.json)file from the repository root directory
+You can get the [magic-pdf.template.json](magic-pdf.template.json) file from the repository root directory
```bash
cp magic-pdf.template.json ~/magic-pdf.json
```
@@ -120,8 +120,8 @@ cp magic-pdf.template.json ~/magic-pdf.json
```bash
magic-pdf pdf-command --pdf "pdf_path" --inside_model true
```
After the program has finished, you can find the generated markdown files under the "/tmp/magic-pdf" directory, and the corresponding xxx_model.json file in the markdown directory.
If you intend to do secondary development on the post-processing pipeline, you can use the command:
```bash
magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
```
@@ -138,9 +138,9 @@ magic-pdf --help
###### CUDA
You need to install the PyTorch version that matches your CUDA version.
+The following is the installation command for CUDA 11.8. For more information, see https://pytorch.org/get-started/locally/
```bash
-# When using the GPU solution, you need to reinstall the PyTorch build for your CUDA version; this example installs the CUDA 11.8 build
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
@@ -152,9 +152,8 @@ pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https
```
###### MPS
On macOS (M-series chip devices), you can use MPS for inference acceleration.
You need to modify the value of "device-mode" in the configuration file magic-pdf.json.
```json
{
"device-mode":"mps"
......