"vscode:/vscode.git/clone" did not exist on "1e877b1917c0d458d0c76af98932f99ed7b10757"
Commit 77596ad0 authored by Xiaomeng Zhao, committed by GitHub

docs: update model download instructions to use python script (#560)

parent a1c0d535
### 1. Install Git LFS
Before you begin, make sure Git Large File Storage (Git LFS) is installed on your system, then initialize it with the following command:
```bash
git lfs install
```
### 2. Download the Model from Hugging Face
To download the `PDF-Extract-Kit` model from Hugging Face, use the following command:
### 1. Download the Model from Hugging Face
Use a Python Script to Download Model Files from Hugging Face
```bash
git lfs clone https://huggingface.co/opendatalab/PDF-Extract-Kit
pip install huggingface_hub
wget https://github.com/opendatalab/MinerU/raw/master/docs/download_models_hf.py
python download_models_hf.py
```
Ensure that Git LFS is enabled during the clone to properly download all large files.
### 3. Additional steps
After the Python script finishes executing, it will output the directory where the models are downloaded.
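The contents of `download_models_hf.py` are not shown in this diff. As a rough sketch of what such a script might do, the snippet below assumes it wraps `snapshot_download` from `huggingface_hub` and that the weights live in a `models` subfolder of the downloaded snapshot (mirroring the ModelScope snippet in this same commit). The function name `download_models` and the `downloader` parameter are hypothetical and exist only so the logic can be exercised without network access.

```python
from pathlib import Path


def download_models(repo_id="opendatalab/PDF-Extract-Kit", downloader=None):
    """Download a model repository and return the path to its `models` folder.

    `downloader` is a hypothetical injection point: any callable that takes a
    repo id and returns the local snapshot directory. When omitted, the real
    `huggingface_hub.snapshot_download` is used.
    """
    if downloader is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    snapshot_dir = Path(downloader(repo_id))
    # Assumption: model weights sit in a `models` subfolder of the snapshot.
    return snapshot_dir / "models"


# Example usage (performs a real download, so it is left commented out):
# print(f"Models downloaded to: {download_models()}")
```

The returned path is what would then be written into `~/magic-pdf.json`.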
### 2. Additional steps
#### 1. Check whether the model directory is downloaded completely.
......@@ -65,6 +57,7 @@ The structure of the model folder is as follows, including configuration files a
Check that the size of each model file in the directory matches the size listed on the web page. If possible, also verify that the download is complete with a SHA-256 checksum.
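One possible way to run this check is to build a manifest of file sizes and SHA-256 digests and compare it by eye against the values listed on the model page. This is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `manifest` are illustrative and not part of MinerU.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(models_dir):
    """Map each file path (relative to models_dir) to (size in bytes, SHA-256)."""
    root = Path(models_dir)
    return {
        p.relative_to(root).as_posix(): (p.stat().st_size, sha256_of(p))
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


# Example usage (the path is a placeholder for your own models directory):
# for name, (size, digest) in manifest("/path/to/models").items():
#     print(f"{size:>12}  {digest}  {name}")
```

Reading in chunks keeps memory use flat even for multi-gigabyte weight files.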
#### 3. Move the model to the solid-state drive
#### 3.
Additionally, in `~/magic-pdf.json`, update the model directory path to the absolute path of the `models` directory output by the previous Python script. Otherwise, you will encounter an error indicating that the model cannot be loaded.
Move the 'models' directory to a directory with large disk space, preferably on a solid-state drive (SSD). In addition, modify the model directory in `~/magic-pdf.json` to point to the final model storage location, otherwise the model cannot be loaded.
\ No newline at end of file
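The `~/magic-pdf.json` update described above can also be scripted. The sketch below assumes the model directory is stored under a top-level key named `models-dir`; that key name is an assumption and is not confirmed by this diff, so check your own config file and pass a different `key` if needed. The helper `set_models_dir` is hypothetical.

```python
import json
from pathlib import Path


def set_models_dir(models_dir, config_path="~/magic-pdf.json", key="models-dir"):
    """Write the absolute path of models_dir into the config and return it.

    Assumption: the config is a flat JSON object and `key` names the model
    directory entry. All other entries are preserved unchanged.
    """
    config_file = Path(config_path).expanduser()
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Store an absolute path, as the instructions above require.
    config[key] = str(Path(models_dir).expanduser().resolve())
    config_file.write_text(
        json.dumps(config, indent=4, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config[key]


# Example usage (the path is a placeholder for the directory the script printed):
# set_models_dir("/path/to/PDF-Extract-Kit/models")
```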
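# How to Download Model Files
Model files can be downloaded from Hugging Face or ModelScope. Because of network restrictions, users in China may be unable to reach Hugging Face and should use ModelScope instead.
Method 1: [Download the Model from Hugging Face](#方法一从-hugging-face-下载模型)
Method 2: [Download the Model from ModelScope](#方法二从-modelscope-下载模型)
## Method 1: Download the Model from Hugging Face
Use Git LFS to download the model files from Hugging Face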
```bash
git lfs install # Set up the Git Large File Storage (Git LFS) extension
git lfs clone https://huggingface.co/opendatalab/PDF-Extract-Kit # Download the PDF-Extract-Kit model from Hugging Face
```
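Model files can be downloaded from Hugging Face or ModelScope. Because of network restrictions, users in China may be unable to reach Hugging Face and should use ModelScope instead.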
<details>
<summary>Method 1: Download the Model from Hugging Face</summary>
<p>Use a Python script to download the model files from Hugging Face</p>
<pre><code>pip install huggingface_hub
wget https://gitee.com/myhloli/MinerU/raw/master/docs/download_models_hf.py
python download_models_hf.py</code></pre>
<p>After the Python script finishes, it prints the directory the models were downloaded to.</p>
</details>
## Method 2: Download the Model from ModelScope
ModelScope supports downloading with either Git LFS or the SDK; choose one.
[Git LFS download](#1利用git-lsf下载)
[SDK download](#2利用sdk下载)
### 1) Download with Git LFS
```bash
git lfs install
git lfs clone https://www.modelscope.cn/opendatalab/PDF-Extract-Kit.git
```
### 2) Download with the SDK
### Use a Python Script to Download Model Files from ModelScope
```bash
# First, install modelscope
pip install modelscope
wget https://gitee.com/myhloli/MinerU/raw/master/docs/download_models.py
python download_models.py
```
```python
# Download the model with the modelscope SDK
from modelscope import snapshot_download
model_dir = snapshot_download('opendatalab/PDF-Extract-Kit')
print(f"Model files downloaded to: {model_dir}/models")
```
After the Python script finishes, it prints the directory the models were downloaded to.
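## [❗️Required❗️] Additional Steps (be sure to complete the following after the model download finishes)
### 1. Check whether the model directory was downloaded completely
......@@ -95,6 +71,5 @@ print(f"Model files downloaded to: {model_dir}/models")
### 2. Check whether the model files were downloaded completely
Check that the size of each model file in the directory matches the size listed on the web page. If possible, also verify that the download is complete with a SHA-256 checksum.
### 3. Move the model to a solid-state drive
Move the 'models' directory to a location with ample disk space, preferably on a solid-state drive (SSD).
In addition, update the model directory in `~/magic-pdf.json` to point to the final model storage location; otherwise the model will fail to load.
### 3. Update the model path in magic-pdf.json
In addition, update the model directory in `~/magic-pdf.json` to the absolute path of the `models` directory printed by the Python script; otherwise the model will fail to load.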