

Command Line
===================

.. code:: bash

   magic-pdf --help
   Usage: magic-pdf [OPTIONS]

   Options:
     -v, --version                display the version and exit
     -p, --path PATH              local filepath or directory. support PDF, PPT,
                                  PPTX, DOC, DOCX, PNG, JPG files  [required]
     -o, --output-dir PATH        output local directory  [required]
     -m, --method [ocr|txt|auto]  the method for parsing pdf. ocr: using ocr
                                  technique to extract information from pdf. txt:
                                  suitable for the text-based pdf only and
                                  outperform ocr. auto: automatically choose the
                                  best method for parsing pdf from ocr and txt.
                                  without method specified, auto will be used by
                                  default.
     -l, --lang TEXT              Input the languages in the pdf (if known) to
                                  improve OCR accuracy.  Optional. You should
                                  input "Abbreviation" with language form url: ht
                                  tps://paddlepaddle.github.io/PaddleOCR/en/ppocr
                                  /blog/multi_languages.html#5-support-languages-
                                  and-abbreviations
     -d, --debug BOOLEAN          Enables detailed debugging information during
                                  the execution of the CLI commands.
     -s, --start INTEGER          The starting page for PDF parsing, beginning
                                  from 0.
     -e, --end INTEGER            The ending page for PDF parsing, beginning from
                                  0.
     --help                       Show this message and exit.


   ## show version
   magic-pdf -v

   ## command line example
   magic-pdf -p {some_pdf} -o {some_output_dir} -m auto
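
When the tool is driven from a script, the same invocation can be assembled programmatically. The following is a minimal sketch, not part of MinerU: the helper name ``build_magic_pdf_command`` and the sample paths are hypothetical, and only options listed in the help text above are used.

.. code:: python

   import subprocess
   from typing import List, Optional


   def build_magic_pdf_command(
       input_path: str,
       output_dir: str,
       method: str = "auto",
       start_page: Optional[int] = None,
       end_page: Optional[int] = None,
   ) -> List[str]:
       """Assemble an argument list from the options shown in ``--help``."""
       if method not in ("ocr", "txt", "auto"):
           raise ValueError(f"unsupported method: {method!r}")
       command = ["magic-pdf", "-p", input_path, "-o", output_dir, "-m", method]
       if start_page is not None:
           command += ["-s", str(start_page)]  # page numbering begins at 0
       if end_page is not None:
           command += ["-e", str(end_page)]
       return command


   if __name__ == "__main__":
       # "demo.pdf" and "output" are placeholder paths for illustration.
       cmd = build_magic_pdf_command("demo.pdf", "output", start_page=0, end_page=4)
       print(" ".join(cmd))
       # Uncomment on a machine where magic-pdf is installed:
       # subprocess.run(cmd, check=True)

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.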


.. admonition:: Important
    :class: tip

    The input file name must end with one of the following suffixes:

    - ``.pdf``
    - ``.png``
    - ``.jpg``
    - ``.ppt``
    - ``.pptx``
    - ``.doc``
    - ``.docx``
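
A script can screen inputs against this suffix list before calling the tool. The sketch below is illustrative only: ``SUPPORTED_SUFFIXES`` and ``has_supported_suffix`` are hypothetical names, and the comparison is an exact match because the documentation does not say whether upper-case suffixes are accepted.

.. code:: python

   from pathlib import Path

   # Suffixes copied from the list above.
   SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".ppt", ".pptx", ".doc", ".docx"}


   def has_supported_suffix(path: str) -> bool:
       """Return True if the file name ends with one of the listed suffixes."""
       # Exact, case-sensitive match: an assumption of this sketch.
       return Path(path).suffix in SUPPORTED_SUFFIXES


   if __name__ == "__main__":
       for name in ("report.pdf", "slides.pptx", "notes.txt"):
           print(name, has_supported_suffix(name))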


``{some_pdf}`` can be a single PDF file or a directory containing
multiple PDFs. The results will be saved in the ``{some_output_dir}``
directory. The output file list is as follows:

.. code:: text

   ├── some_pdf.md                          # markdown file
   ├── images                               # directory for storing images
   ├── some_pdf_layout.pdf                  # layout diagram
   ├── some_pdf_middle.json                 # MinerU intermediate processing result
   ├── some_pdf_model.json                  # model inference result
   ├── some_pdf_origin.pdf                  # original PDF file
   ├── some_pdf_spans.pdf                   # smallest granularity bbox position information diagram
   └── some_pdf_content_list.json           # Rich text JSON arranged in reading order
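
The listing above can be turned into a quick completeness check after a run. This is a sketch under stated assumptions: the helper names are hypothetical, and ``result_dir`` must be whichever directory actually holds the listed files, since this page does not say whether that is ``{some_output_dir}`` itself or a subdirectory of it.

.. code:: python

   from pathlib import Path
   from typing import Dict, List


   def expected_outputs(result_dir: str, stem: str) -> Dict[str, Path]:
       """Map a short label to each path named in the output listing."""
       base = Path(result_dir)
       return {
           "markdown": base / f"{stem}.md",
           "images": base / "images",
           "layout": base / f"{stem}_layout.pdf",
           "middle": base / f"{stem}_middle.json",
           "model": base / f"{stem}_model.json",
           "origin": base / f"{stem}_origin.pdf",
           "spans": base / f"{stem}_spans.pdf",
           "content_list": base / f"{stem}_content_list.json",
       }


   def missing_outputs(result_dir: str, stem: str) -> List[str]:
       """Return the labels of expected outputs that do not exist on disk."""
       outputs = expected_outputs(result_dir, stem)
       return [label for label, path in outputs.items() if not path.exists()]


   if __name__ == "__main__":
       # "output" and "some_pdf" are placeholders for illustration.
       print(missing_outputs("output", "some_pdf"))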

.. admonition:: Tip
   :class: tip
   
   For more information about the output files, see :doc:`../inference_result` or :doc:`../pipe_result`.