wangsen / MinerU
Commit 91d825b2
Authored Dec 02, 2024 by xu rui
docs: fix table format
Parent: f6bd47de
Changes: 2 changed files with 64 additions and 67 deletions
next_docs/requirements.txt (+5 -1)
next_docs/zh_cn/user_guide/tutorial/output_file_description.rst (+59 -66)
next_docs/requirements.txt
+numpy==1.26.4
+click==8.1.7
+fast-langdetect==0.2.2
+Brotli==1.1.0
 boto3>=1.28.43
 loguru>=0.6.0
 myst-parser
...
@@ -9,4 +13,4 @@ sphinx-argparse>=0.5.2
 sphinx-book-theme>=1.1.3
 sphinx-copybutton>=0.5.2
 sphinx_rtd_theme>=3.0.1
-autodoc_pydantic>=2.2.0
+autodoc_pydantic>=2.2.0
\ No newline at end of file
next_docs/zh_cn/user_guide/tutorial/output_file_description.rst
...
@@ -137,49 +137,45 @@ poly coordinate format [x0, y0, x1, y1, x2, y2, x3, y3],
 some_pdf_middle.json
 ~~~~~~~~~~~~~~~~~~~~
-+-----------+----------------------------------------------------------+
-| Field     | Description                                              |
-+===========+==========================================================+
-| pdf_info  | list; each                                               |
-|           | element is a dict: one page's parse result, see below    |
-+-----------+----------------------------------------------------------+
-|           | ocr \| txt; mode of this parse's intermediate state      |
-| \_parse_type |                                                          |
-+-----------+----------------------------------------------------------+
-|           | string; version of magic-pdf used for this parse         |
-| \_version_name |                                                          |
-+-----------+----------------------------------------------------------+
++--------------------+----------------------------------------------------------+
+| Field              | Description                                              |
++====================+==========================================================+
+| pdf_info           | list; each element is a                                  |
+|                    | dict: one page's parse result, see below                 |
++--------------------+----------------------------------------------------------+
+| \_parse_type       | ocr \| txt; mode of this parse's intermediate state      |
++--------------------+----------------------------------------------------------+
+| \_version_name     | string; version of magic-pdf used for this parse         |
++-------------------------------------------------------------------------------+
 **pdf_info** field structure
-+--------------+-------------------------------------------------------+
-| Field        | Description                                           |
-+==============+=======================================================+
-|              | unsegmented intermediate result of PDF preprocessing  |
-| preeproc_blocks |                                                       |
-+--------------+-------------------------------------------------------+
-|              | layout segmentation result, with the layout direction |
-| layout_bboxes | (vertical or horizontal) and bbox, in reading order   |
-+--------------+-------------------------------------------------------+
-| page_idx     | page number, starting from 0                          |
-+--------------+-------------------------------------------------------+
-| page_size    | page width and height                                 |
-+--------------+-------------------------------------------------------+
-| \            | layout tree structure                                 |
-| _layout_tree |                                                       |
-+--------------+-------------------------------------------------------+
-| images       | list of dicts, each representing one img_block        |
-+--------------+-------------------------------------------------------+
-| tables       | list of dicts, each representing one table_block      |
-+--------------+-------------------------------------------------------+
-|              | list of dicts,                                        |
-| interline_equations | each representing one interline_equation_block        |
-+--------------+-------------------------------------------------------+
-|              | List; info on blocks the model flagged to drop        |
-| discarded_blocks |                                                       |
-+--------------+-------------------------------------------------------+
-| para_blocks  | result of segmenting preproc_blocks into paragraphs   |
-+--------------+-------------------------------------------------------+
++---------------------+-------------------------------------------------------+
+| Field               | Description                                           |
++=====================+=======================================================+
+| preproc_blocks      | unsegmented intermediate result of PDF preprocessing  |
++---------------------+-------------------------------------------------------+
+|                     | layout segmentation result, with the layout direction |
+| layout_bboxes       | (vertical or horizontal) and bbox, in reading order   |
++---------------------+-------------------------------------------------------+
+| page_idx            | page number, starting from 0                          |
++---------------------+-------------------------------------------------------+
+| page_size           | page width and height                                 |
++---------------------+-------------------------------------------------------+
+| \_layout_tree       | layout tree structure                                 |
++---------------------+-------------------------------------------------------+
+| images              | list of dicts, each representing one img_block        |
++---------------------+-------------------------------------------------------+
+| tables              | list of dicts, each representing one table_block      |
++---------------------+-------------------------------------------------------+
+|                     | list of dicts, each                                   |
+| interline_equations | representing one interline_equation_block             |
++---------------------+-------------------------------------------------------+
+|                     | List; info on blocks the model flagged to drop        |
+| discarded_blocks    |                                                       |
++---------------------+-------------------------------------------------------+
+| para_blocks         | result of segmenting preproc_blocks into paragraphs   |
++---------------------+-------------------------------------------------------+
 In the table above, ``para_blocks``
 is an array of dicts; each dict is a block structure, and blocks can be nested at most one level deep
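As a rough illustration of the layout these two tables describe, the sketch below summarizes a document shaped like some_pdf_middle.json in Python. Only the field names (pdf_info, page_idx, page_size, para_blocks, images, tables, \_parse_type, \_version_name) come from the tables. The function name summarize_middle, the sample values, and the assumption that page_size is a [width, height] pair are hypothetical.

```python
import json


def summarize_middle(middle: dict) -> list:
    """Return one summary dict per page of a middle.json-shaped document."""
    rows = []
    for page in middle.get("pdf_info", []):
        rows.append({
            "page_idx": page.get("page_idx"),      # page number, starting from 0
            "page_size": page.get("page_size"),    # page width and height
            "n_para_blocks": len(page.get("para_blocks", [])),
            "n_images": len(page.get("images", [])),
            "n_tables": len(page.get("tables", [])),
        })
    return rows


# Hypothetical sample data; a real file is written by magic-pdf.
sample = json.loads("""
{
  "pdf_info": [
    {"page_idx": 0, "page_size": [612, 792],
     "para_blocks": [{"type": "text"}], "images": [], "tables": []}
  ],
  "_parse_type": "txt",
  "_version_name": "0.0.0"
}
""")

summary = summarize_middle(sample)
print(sample["_parse_type"], summary)
```

Reading every field with .get() keeps the sketch tolerant of pages that omit a list, at the cost of hiding genuinely malformed input.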
...
@@ -200,20 +196,18 @@ blocks list; each element is a second-level block in dict format
 Fields of a second-level block:
-+-----+----------------------------------------------------------------+
-| Field | Description                                                    |
-| name |                                                                |
-|     |                                                                |
-+=====+================================================================+
-|     | block type                                                     |
-| type |                                                                |
-+-----+----------------------------------------------------------------+
-|     | block rectangle coordinates                                    |
-| bbox |                                                                |
-+-----+----------------------------------------------------------------+
-|     | list of line dicts, each describing the content of one line    |
-| lines |                                                                |
-+-----+----------------------------------------------------------------+
++----------+----------------------------------------------------------------+
+| Field    | Description                                                    |
+| name     |                                                                |
+|          |                                                                |
++==========+================================================================+
+|          | block type                                                     |
+| type     |                                                                |
++----------+----------------------------------------------------------------+
+| bbox     | block rectangle coordinates                                    |
++----------+----------------------------------------------------------------+
+| lines    | list of line dicts, each describing the content of one line    |
++----------+----------------------------------------------------------------+
 Second-level block types in detail
...
@@ -237,22 +231,21 @@ interline_equation interline equation block
 Fields of a line:
-+----+-----------------------------------------------------------------+
-| Field | Description                                                     |
-| name |                                                                 |
-|    |                                                                 |
-+====+=================================================================+
-| bbox | rectangle coordinates of the line                               |
-|    |                                                                 |
-+----+-----------------------------------------------------------------+
-| spans | list;                                                           |
-|    | each element is a span dict describing one minimal unit         |
-+----+-----------------------------------------------------------------+
++-----------+-----------------------------------------------------------------+
+| Field     | Description                                                     |
+| name      |                                                                 |
+|           |                                                                 |
++===========+=================================================================+
+| bbox      | rectangle coordinates of the line                               |
++-----------+-----------------------------------------------------------------+
+| spans     | list;                                                           |
+|           | each element is a span dict describing one minimal unit         |
++-----------+-----------------------------------------------------------------+
 **span**
 +------------+---------------------------------------------------------+
-| Field | Description                                                 |
+| Field      | Description                                             |
 +============+=========================================================+
 | bbox       | rectangle coordinates of the span                       |
 +------------+---------------------------------------------------------+
...
...
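The tables in this diff describe a nesting of para_blocks, optional second-level blocks, lines, and spans. A minimal Python sketch of walking that nesting follows. It assumes a first-level block either holds second-level blocks under a "blocks" key or carries "lines" directly. The helper name iter_spans, the sample page, and the type and bbox values in it are hypothetical placeholders, not output from magic-pdf.

```python
from typing import Iterator, Optional, Tuple


def iter_spans(page: dict) -> Iterator[Tuple[Optional[str], dict]]:
    """Yield (block_type, span) for every span on one pdf_info page."""
    for block in page.get("para_blocks", []):
        # Blocks nest at most once: use the second-level blocks when
        # present, otherwise treat the block itself as the line holder.
        for inner in block.get("blocks", [block]):
            for line in inner.get("lines", []):
                for span in line.get("spans", []):
                    yield inner.get("type"), span


# Hypothetical page: one flat block and one block with a nested block.
page = {
    "para_blocks": [
        {"type": "text", "bbox": [0, 0, 100, 20],
         "lines": [{"bbox": [0, 0, 100, 20],
                    "spans": [{"bbox": [0, 0, 50, 20]},
                              {"bbox": [50, 0, 100, 20]}]}]},
        {"type": "image", "bbox": [0, 30, 100, 90],
         "blocks": [{"type": "image_body", "bbox": [0, 30, 100, 80],
                     "lines": [{"bbox": [0, 30, 100, 80],
                                "spans": [{"bbox": [0, 30, 100, 80]}]}]}]},
    ]
}

found = [(block_type, span["bbox"]) for block_type, span in iter_spans(page)]
print(found)
```

A generator keeps the traversal lazy, so a caller can stop at the first span of interest without visiting the rest of the page.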