"examples/vscode:/vscode.git/clone" did not exist on "bcb476797ccb7523f3e114f7440b4c8d9bb7154b"
Unverified commit cb9b3056, authored by Xiaomeng Zhao and committed by GitHub

Merge pull request #3030 from myhloli/dev

Dev
parents 5644672a bedb2fd8
@@ -90,3 +90,7 @@ MinerU provides a convenient Docker deployment method, which helps quickly set u
 You can get the [Docker Deployment Instructions](./docker_deployment.md) in the documentation.
 ---
+### Using MinerU
+You can use MinerU for PDF parsing through various methods such as command line, API, and WebUI. For detailed instructions, please refer to the [Usage Guide](../usage/index.md).
\ No newline at end of file
@@ -42,7 +42,7 @@
 > CUDA_VISIBLE_DEVICES="" # No GPU will be visible
 > ```
-### Practical Application Scenarios
+## Practical Application Scenarios
 > [!TIP]
 > Here are some possible usage scenarios:
 >
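As context for the `CUDA_VISIBLE_DEVICES` tip in the hunk above, here is a minimal sketch of how that environment variable is commonly combined with the `mineru` command; the GPU index and paths are placeholders rather than values from this commit:

```bash
# Hypothetical example: restrict MinerU to the first GPU on a multi-GPU Linux host.
# "0" and the <input_path>/<output_path> placeholders should be adapted to your setup.
CUDA_VISIBLE_DEVICES=0 mineru -p <input_path> -o <output_path>

# Hide all GPUs to force CPU execution, as in the tip above.
CUDA_VISIBLE_DEVICES="" mineru -p <input_path> -o <output_path>
```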
@@ -19,7 +19,7 @@ mineru -p <input_path> -o <output_path>
 >- `<input_path>`: Local PDF/image file or directory
 >- `<output_path>`: Output directory
 >
-> For more information about output files, please refer to [Output File Documentation](./output_file.md).
+> For more information about output files, please refer to [Output File Documentation](../output_files.md).
 > [!NOTE]
 > The command line tool will automatically attempt cuda/mps acceleration on Linux and macOS systems.
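As a concrete illustration of the basic invocation referenced in this hunk, here is a hedged example; the file and directory names are hypothetical:

```bash
# Hypothetical invocation: parse a single PDF and write results to ./output.
# "demo.pdf" and "./output" are placeholder names, not files from this commit.
mineru -p demo.pdf -o ./output

# A directory can be passed as well; the PDFs/images inside it are parsed in turn.
mineru -p ./papers/ -o ./output
```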
@@ -67,16 +67,18 @@ If you need to adjust parsing options through custom parameters, you can also ch
 > ```bash
 > mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
 > ```
 > [!TIP]
 > All officially supported sglang parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, `mineru-api`.
 > We have compiled some commonly used parameters and usage methods for `sglang`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).
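To make the `vlm-sglang-client` invocation above more concrete, here is a hedged sketch of the two-step flow. It assumes that `mineru-sglang-server` forwards sglang's usual `--host`/`--port` options (as the tip suggests) and that the client runs on the same machine:

```bash
# Step 1 (assumed flags): start the sglang inference server.
# --host/--port are sglang options presumed to be passed through, per the tip above.
mineru-sglang-server --host 127.0.0.1 --port 30000

# Step 2: in another shell, point the client backend at that server (command taken from the docs).
mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
```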
 ## Extending MinerU Functionality with Configuration Files
-- MinerU is now ready to use out of the box, but also supports extending functionality through configuration files. You can create a `mineru.json` file in your user directory to add custom configurations.
+MinerU is now ready to use out of the box, but also supports extending functionality through configuration files. You can create a `mineru.json` file in your user directory to add custom configurations.
-- The `mineru.json` file will be automatically generated when you use the built-in model download command `mineru-models-download`, or you can create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
+The `mineru.json` file will be automatically generated when you use the built-in model download command `mineru-models-download`, or you can create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
-- Here are some available configuration options:
+Here are some available configuration options:
 - `latex-delimiter-config`: Used to configure LaTeX formula delimiters; defaults to the `$` symbol and can be modified to other symbols or strings as needed.
 - `llm-aided-config`: Used to configure parameters for LLM-assisted title hierarchy; compatible with all LLM models supporting the `openai protocol`; defaults to Alibaba Cloud Bailian's `qwen2.5-32b-instruct` model. You need to configure your own API key and set `enable` to `true` to enable this feature.
 - `models-dir`: Used to specify the local model storage directory; specify model directories for the `pipeline` and `vlm` backends separately. After specifying the directory, you can use local models by setting the environment variable `export MINERU_MODEL_SOURCE=local` (see the sketch after this list).
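Following the configuration options listed above, here is a hedged sketch of that setup. It assumes the "user directory" is your home directory (`~/mineru.json`) and that the template's raw URL follows GitHub's usual raw.githubusercontent.com pattern; neither detail is stated in this commit:

```bash
# Hypothetical setup: create mineru.json from the template, then point MinerU at local models.
# The raw URL and the ~/mineru.json location are assumptions based on the documentation above.
curl -L -o ~/mineru.json \
  https://raw.githubusercontent.com/opendatalab/MinerU/master/mineru.template.json

# Edit ~/mineru.json to set `models-dir` (and, if needed, `llm-aided-config` or
# `latex-delimiter-config`), then tell MinerU to load models from local directories.
export MINERU_MODEL_SOURCE=local
mineru -p <input_path> -o <output_path>
```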
@@ -89,4 +89,8 @@ uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
 MinerU provides a convenient Docker deployment method, which helps quickly set up the environment and resolve some tricky environment compatibility issues.
 You can find the [Docker Deployment Instructions](./docker_deployment.md) in the documentation.
 ---
\ No newline at end of file
+### Using MinerU
+You can use MinerU for PDF parsing via the command line, API, WebUI, and other methods. For detailed usage, please refer to the [Usage Guide](../usage/index.md).
\ No newline at end of file
@@ -42,7 +42,7 @@
 > CUDA_VISIBLE_DEVICES="" # No GPU will be visible
 > ```
-### Practical Application Scenarios
+## Practical Application Scenarios
 > [!TIP]
 > Here are some possible usage scenarios:
@@ -19,7 +19,7 @@ mineru -p <input_path> -o <output_path>
 > - `<input_path>`: Local PDF/image file or directory
 > - `<output_path>`: Output directory
 >
-> For more information about output files, please refer to the [Output File Documentation](./output_file.md).
+> For more information about output files, please refer to the [Output File Documentation](../output_files.md).
 > [!NOTE]
 > The command line tool will automatically attempt cuda/mps acceleration on Linux and macOS systems. Windows users who want to use cuda acceleration,
@@ -75,10 +75,9 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers
 ## Extending MinerU Functionality with Configuration Files
-- MinerU is now ready to use out of the box, but also supports extending functionality through configuration files. You can create a `mineru.json` file in your user directory to add custom configurations.
+MinerU is now ready to use out of the box, but also supports extending functionality through configuration files. You can create a `mineru.json` file in your user directory to add custom configurations.
-- The `mineru.json` file will be automatically generated when you use the built-in model download command `mineru-models-download`, or you can create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
+The `mineru.json` file will be automatically generated when you use the built-in model download command `mineru-models-download`, or you can create it by copying the [configuration template file](https://github.com/opendatalab/MinerU/blob/master/mineru.template.json) to your user directory and renaming it to `mineru.json`.
-- Here are some available configuration options:
+Here are some available configuration options:
 - `latex-delimiter-config`: Used to configure LaTeX formula delimiters; defaults to the `$` symbol and can be changed to other symbols or strings as needed.
 - `llm-aided-config`: Used to configure parameters for LLM-assisted title hierarchy; compatible with all LLM models supporting the `openai protocol`; defaults to Alibaba Cloud Bailian's `qwen2.5-32b-instruct` model. You need to configure your own API key and set `enable` to `true` to enable this feature.
 - `models-dir`: Used to specify the local model storage directory; specify model directories for the `pipeline` and `vlm` backends separately. After specifying the directory, you can use local models by setting the environment variable `export MINERU_MODEL_SOURCE=local`.