

Config
=========

The **magic-pdf.json** file is typically located in the **${HOME}** directory on Linux or in the **C:\\Users\\{username}** directory on Windows.
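
As a minimal sketch of the lookup described above, the following Python snippet builds the expected path from the user's home directory and loads the file if it exists. The helper names ``default_config_path`` and ``load_config`` are hypothetical and are not part of MinerU's API.

.. code:: python

    import json
    from pathlib import Path


    def default_config_path() -> Path:
        # Path.home() is the user's home directory: ${HOME} on Linux,
        # typically C:\Users\{username} on Windows.
        return Path.home() / "magic-pdf.json"


    def load_config(path: Path) -> dict:
        # Read and parse the JSON config file.
        with path.open(encoding="utf-8") as f:
            return json.load(f)


    if __name__ == "__main__":
        path = default_config_path()
        if path.exists():
            print(load_config(path).get("config_version"))
        else:
            print(f"{path} not found")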


magic-pdf.json
----------------

.. code:: json 

    {
        "bucket_info":{
            "bucket-name-1":["ak", "sk", "endpoint"],
            "bucket-name-2":["ak", "sk", "endpoint"]
        },
        "models-dir":"/tmp/models",
        "layoutreader-model-dir":"/tmp/layoutreader",
        "device-mode":"cpu",
        "layout-config": {
            "model": "layoutlmv3"
        },
        "formula-config": {
            "mfd_model": "yolo_v8_mfd",
            "mfr_model": "unimernet_small",
            "enable": true
        },
        "table-config": {
            "model": "rapid_table",
            "enable": false,
            "max_time": 400    
        },
        "config_version": "1.0.0"
    }




bucket_info
^^^^^^^^^^^^^^
Stores the access key, secret key, and endpoint for each AWS S3-compatible storage bucket, keyed by bucket name.

Example: 

.. code:: text

        {
            "image_bucket":[{access_key}, {secret_key}, {endpoint}],
            "video_bucket":[{access_key}, {secret_key}, {endpoint}]
        }
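
To illustrate the layout, the sketch below unpacks each ``bucket_info`` entry into named fields. The ``BucketCredentials`` type, the ``parse_bucket_info`` helper, and the endpoint URL are hypothetical and introduced only for illustration; they are not part of MinerU.

.. code:: python

    from typing import NamedTuple


    class BucketCredentials(NamedTuple):
        access_key: str
        secret_key: str
        endpoint: str


    def parse_bucket_info(bucket_info: dict) -> dict:
        # Each value must be a three-item list: [access_key, secret_key, endpoint].
        parsed = {}
        for bucket, values in bucket_info.items():
            if len(values) != 3:
                raise ValueError(f"{bucket}: expected [access_key, secret_key, endpoint]")
            parsed[bucket] = BucketCredentials(*values)
        return parsed


    # Placeholder credentials and endpoint, for illustration only.
    creds = parse_bucket_info({"image_bucket": ["ak", "sk", "https://s3.example.com"]})
    print(creds["image_bucket"].endpoint)  # https://s3.example.com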


models-dir
^^^^^^^^^^^^

The directory where the models downloaded from **Hugging Face** or **ModelScope** are stored. You do not need to modify this field if you download the models using the scripts shipped with **MinerU**.


layoutreader-model-dir
^^^^^^^^^^^^^^^^^^^^^^^

The directory where the layoutreader model downloaded from **Hugging Face** or **ModelScope** is stored. You do not need to modify this field if you download the model using the scripts shipped with **MinerU**.


device-mode
^^^^^^^^^^^^^^

This field has two options, **cpu** or **cuda**.

**cpu**: run inference on the CPU

**cuda**: use CUDA to accelerate inference


layout-config 
^^^^^^^^^^^^^^^

.. code:: json

    {
        "model": "layoutlmv3"  
    }

The layout model cannot be disabled at present, and only one kind of layout model is currently available.


formula-config
^^^^^^^^^^^^^^^^

.. code:: json

    {
        "mfd_model": "yolo_v8_mfd",   
        "mfr_model": "unimernet_small",
        "enable": true 
    }


mfd_model
""""""""""

Specifies the formula detection model. Options are ['yolo_v8_mfd'].


mfr_model
""""""""""
Specifies the formula recognition model. Options are ['unimernet_small'].

Check `UniMERNet <https://github.com/opendatalab/UniMERNet>`_ for more details


enable
""""""""

On/off flag. Options are [true, false]: **true** enables formula inference, **false** disables it.


table-config
^^^^^^^^^^^^^^^^

.. code:: json

    {
        "model": "rapid_table",
        "enable": false,
        "max_time": 400    
    }

model
""""""""

Specifies the table inference model. Options are ['rapid_table', 'tablemaster', 'struct_eqtable'].


max_time
"""""""""

Table recognition is time-consuming, so a timeout is applied. If recognition runs longer than **max_time**, it is terminated.
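
The timeout behaviour can be pictured with the following sketch. It is not MinerU's implementation: ``run_with_timeout`` and ``fake_table_recognition`` are hypothetical stand-ins, and treating the limit as seconds is an assumption. A Python worker thread cannot be forcibly killed, so the caller here simply stops waiting once the limit is reached.

.. code:: python

    import time
    from concurrent.futures import ThreadPoolExecutor, TimeoutError


    def run_with_timeout(func, max_time, *args):
        # Run func(*args) in a worker thread and wait at most max_time seconds
        # (the unit is an assumption). Returns None if the limit is exceeded.
        executor = ThreadPoolExecutor(max_workers=1)
        future = executor.submit(func, *args)
        try:
            return future.result(timeout=max_time)
        except TimeoutError:
            return None
        finally:
            # Do not block on the worker; it finishes in the background.
            executor.shutdown(wait=False)


    def fake_table_recognition(duration):
        # Stand-in for a slow table recognition call.
        time.sleep(duration)
        return "<table></table>"


    print(run_with_timeout(fake_table_recognition, 1.0, 0.01))  # <table></table>
    print(run_with_timeout(fake_table_recognition, 0.05, 0.5))  # None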



enable
"""""""

On/off flag. Options are [true, false]: **true** enables table inference, **false** disables it.


config_version
^^^^^^^^^^^^^^^^

The version of config schema.
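
Pulling together the options listed on this page, the sketch below checks a loaded config against them. The ``validate_config`` and ``lookup`` helpers are hypothetical, and the allowed values are only those documented here, so the list may lag behind the actual schema.

.. code:: python

    # Allowed values as documented on this page (may not match the latest schema).
    ALLOWED = {
        ("device-mode",): ("cpu", "cuda"),
        ("layout-config", "model"): ("layoutlmv3",),
        ("formula-config", "mfd_model"): ("yolo_v8_mfd",),
        ("formula-config", "mfr_model"): ("unimernet_small",),
        ("table-config", "model"): ("rapid_table", "tablemaster", "struct_eqtable"),
    }
    BOOLEAN_FLAGS = [("formula-config", "enable"), ("table-config", "enable")]


    def lookup(config, path):
        # Walk nested dictionaries; return None if any key is missing.
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        return node


    def validate_config(config):
        # Return a list of human-readable problems; an empty list means no issues found.
        problems = []
        for path, allowed in ALLOWED.items():
            value = lookup(config, path)
            if value not in allowed:
                problems.append(f"{'.'.join(path)}: {value!r} not in {list(allowed)}")
        for path in BOOLEAN_FLAGS:
            if not isinstance(lookup(config, path), bool):
                problems.append(f"{'.'.join(path)}: expected true or false")
        return problems


    sample = {
        "device-mode": "cpu",
        "layout-config": {"model": "layoutlmv3"},
        "formula-config": {"mfd_model": "yolo_v8_mfd", "mfr_model": "unimernet_small", "enable": True},
        "table-config": {"model": "rapid_table", "enable": False, "max_time": 400},
    }
    print(validate_config(sample))  # []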


.. admonition:: Tip
    :class: tip
    
    Check `Config Schema <https://github.com/opendatalab/MinerU/blob/master/magic-pdf.template.json>`_ for the latest details