### 1. Install Git LFS
Before you begin, make sure Git Large File Storage (Git LFS) is installed on your system. Then enable it for your Git user account with the following command:

```bash
git lfs install
```
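`git lfs install` fails if the Git LFS extension itself is missing. As a minimal sketch, the bash function below checks that both `git` and `git lfs` are available before you run the command above. `require_git_lfs` is a hypothetical helper, not part of Git LFS or PDF-Extract-Kit.

```bash
# Hypothetical helper: verify that Git and the Git LFS extension are available.
# Prints the Git LFS version on success; returns 1 with a message on stderr otherwise.
require_git_lfs() {
  if ! command -v git >/dev/null 2>&1; then
    echo "error: git is not installed or not on PATH" >&2
    return 1
  fi
  # `git lfs version` succeeds only when the git-lfs extension is installed.
  if ! git lfs version >/dev/null 2>&1; then
    echo "error: Git LFS is not installed; install the git-lfs package first" >&2
    return 1
  fi
  git lfs version
}
```

For example, `require_git_lfs && git lfs install` runs the setup only when the check passes.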

### 2. Download the Model from Hugging Face
To download the `PDF-Extract-Kit` model from Hugging Face, use the following command:

```bash
# With Git LFS set up (step 1), a plain `git clone` also downloads the LFS-tracked files.
git clone https://huggingface.co/opendatalab/PDF-Extract-Kit
```

Make sure Git LFS has been set up as described in step 1 before you clone; otherwise the large model files are not downloaded correctly.
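If Git LFS was not active during the clone, the large weight files are left as small text pointer files whose first line is `version https://git-lfs.github.com/spec/v1`. The bash sketch below lists any such files so you can spot an incomplete download. The function name `find_lfs_pointers` is hypothetical, and the sketch assumes bash with a `find` that supports `-print0` (GNU or BSD).

```bash
# Hypothetical helper: list files under a directory that are still Git LFS
# pointer files instead of real content. Returns 0 if none are found, 1 otherwise.
find_lfs_pointers() {
  local dir="${1:-.}" found=0 f
  # Pointer files are small text files, so only files under 1024 bytes are inspected.
  while IFS= read -r -d '' f; do
    # A pointer file's first line is the Git LFS spec URL.
    if head -n 1 "$f" | grep -q '^version https://git-lfs.github.com/spec/v1'; then
      printf '%s\n' "$f"
      found=1
    fi
  done < <(find "$dir" -type f -size -1024c -not -path '*/.git/*' -print0)
  [ "$found" -eq 0 ]
}
```

For example, `find_lfs_pointers PDF-Extract-Kit || (cd PDF-Extract-Kit && git lfs pull)` fetches the real file contents when pointers are found; `git lfs pull` downloads the LFS objects for the current checkout.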

### 3. Additional Steps

#### 1. Check that the model directory is complete

The model folder should have the following structure, with the configuration and weight files for each component:
```
../
├── Layout
│   ├── config.json
│   └── model_final.pth
├── MFD
│   └── weights.pt
├── MFR
│   └── UniMERNet
│       ├── config.json
│       ├── preprocessor_config.json
│       ├── pytorch_model.bin
│       ├── README.md
│       ├── tokenizer_config.json
│       └── tokenizer.json
├── TabRec
│   ├── StructEqTable
│   │   ├── config.json
│   │   ├── generation_config.json
│   │   ├── model.safetensors
│   │   ├── preprocessor_config.json
│   │   ├── special_tokens_map.json
│   │   ├── spiece.model
│   │   ├── tokenizer.json
│   │   └── tokenizer_config.json
│   └── TableMaster
│       ├── ch_PP-OCRv3_det_infer
│       │   ├── inference.pdiparams
│       │   ├── inference.pdiparams.info
│       │   └── inference.pdmodel
│       ├── ch_PP-OCRv3_rec_infer
│       │   ├── inference.pdiparams
│       │   ├── inference.pdiparams.info
│       │   └── inference.pdmodel
│       ├── table_structure_tablemaster_infer
│       │   ├── inference.pdiparams
│       │   ├── inference.pdiparams.info
│       │   └── inference.pdmodel
│       ├── ppocr_keys_v1.txt
│       └── table_master_structure_dict.txt
└── README.md
```
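To check this layout without reading the tree by eye, the bash sketch below compares a directory against the expected file list. The function `check_model_tree` and the array `EXPECTED_MODEL_FILES` are hypothetical; the list is transcribed from the tree above, so update it if the model repository changes. The check only confirms that each file exists, not that its contents are complete.

```bash
# Hypothetical list of files expected under the model directory,
# transcribed from the tree above (paths are relative to that directory).
# Brace expansion turns the TableMaster line into 3 folders x 3 files = 9 entries.
EXPECTED_MODEL_FILES=(
  Layout/config.json
  Layout/model_final.pth
  MFD/weights.pt
  MFR/UniMERNet/{config.json,preprocessor_config.json,pytorch_model.bin,README.md,tokenizer_config.json,tokenizer.json}
  TabRec/StructEqTable/{config.json,generation_config.json,model.safetensors,preprocessor_config.json}
  TabRec/StructEqTable/{special_tokens_map.json,spiece.model,tokenizer.json,tokenizer_config.json}
  TabRec/TableMaster/{ch_PP-OCRv3_det_infer,ch_PP-OCRv3_rec_infer,table_structure_tablemaster_infer}/inference.{pdiparams,pdiparams.info,pdmodel}
  TabRec/TableMaster/{ppocr_keys_v1.txt,table_master_structure_dict.txt}
  README.md
)

# Report every expected file that is missing under the given directory.
# Returns 0 when all files exist, 1 otherwise.
check_model_tree() {
  local root="$1" rel missing=0
  for rel in "${EXPECTED_MODEL_FILES[@]}"; do
    if [ ! -f "$root/$rel" ]; then
      echo "missing: $rel" >&2
      missing=$((missing + 1))
    fi
  done
  if [ "$missing" -gt 0 ]; then
    echo "$missing of ${#EXPECTED_MODEL_FILES[@]} expected files are missing under $root" >&2
    return 1
  fi
  echo "all ${#EXPECTED_MODEL_FILES[@]} expected files are present under $root"
}
```

For example, `check_model_tree PDF-Extract-Kit/models`; this path is an assumption, so point it at whichever folder matches the tree above.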
#### 2. Check that the model files are fully downloaded

Check that the size of each model file matches the size listed on the Hugging Face model page. If possible, also verify each file's SHA-256 checksum to confirm that the download is complete.
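The bash sketch below prints each file's size in bytes and SHA-256 digest for comparison with the model page, and can verify the files against a manifest in `sha256sum` format that you assemble yourself from the published values. Both function names are hypothetical, and the sketch assumes GNU coreutils `sha256sum`; on macOS the `shasum -a 256` equivalent may need adjusting.

```bash
# Hypothetical helper: print "<size in bytes>  <sha256>  <relative path>" for
# every file under a directory, for comparison with the model page.
list_model_checksums() {
  local dir="$1" f size hash
  while IFS= read -r -d '' f; do
    size=$(( $(wc -c < "$f") ))   # arithmetic strips any padding from wc
    hash=$(sha256sum < "$f")      # output is "<hash>  -"
    hash=${hash%% *}              # keep only the hash
    printf '%s  %s  %s\n' "$size" "$hash" "${f#"$dir"/}"
  done < <(find "$dir" -type f -not -path '*/.git/*' -print0 | sort -z)
}

# Hypothetical helper: verify files against a manifest in `sha256sum` format
# ("<sha256>  <relative path>" per line). Returns non-zero on any mismatch
# or missing file.
verify_model_checksums() {
  local manifest dir="$2"
  # Resolve the manifest to an absolute path before changing directory.
  manifest="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
  (cd "$dir" && sha256sum --check --quiet "$manifest")
}
```

For example, `list_model_checksums PDF-Extract-Kit/models` prints the values to compare by hand, and `verify_model_checksums checksums.sha256 PDF-Extract-Kit/models` checks them in one pass; the manifest name and model path are placeholders.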

#### 3. Move the models to a solid-state drive

Move the `models` directory to a location with plenty of free disk space, preferably on a solid-state drive (SSD). Then update the model directory in `~/magic-pdf.json` to point to the new location; otherwise the models cannot be loaded.
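A minimal bash sketch of this step, using a hypothetical helper `move_models`: it moves the directory, refuses to overwrite an existing one, and then checks whether the config file already mentions the new absolute path. It does not edit `~/magic-pdf.json` and does not assume which key holds the model directory; it only searches the file for the quoted path string.

```bash
# Hypothetical helper: move the model directory to a new parent directory
# (for example on an SSD), then check whether the config file mentions the
# new absolute path. It does NOT edit the config file.
# Returns 0 if moved and referenced, 2 if moved but the config still needs
# updating, 1 on any error.
move_models() {
  local src="$1" dest_parent="$2" config="${3:-$HOME/magic-pdf.json}"
  local name new_path
  name=$(basename "$src")
  if [ ! -d "$src" ]; then
    echo "error: $src is not a directory" >&2
    return 1
  fi
  mkdir -p "$dest_parent" || return 1
  if [ -e "$dest_parent/$name" ]; then
    echo "error: $dest_parent/$name already exists; not overwriting" >&2
    return 1
  fi
  mv "$src" "$dest_parent/$name" || return 1
  new_path=$(cd "$dest_parent/$name" && pwd) || return 1
  if [ -f "$config" ] && grep -qF "\"$new_path\"" "$config"; then
    echo "$config already points to $new_path"
    return 0
  fi
  echo "moved; now set the model directory in $config to: $new_path" >&2
  return 2
}
```

For example, `move_models PDF-Extract-Kit/models /mnt/ssd` (both paths are placeholders) moves the folder and prints the absolute path to enter in `~/magic-pdf.json` if the file does not reference it yet.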