Commits · 0405461d35c94ebdc7780c7c4e160322636886ea · wangsen / MinerU

10 Aug, 2024 1 commit

docs(faq): add solution for libGL.so.1 missing on WSL2 Ubuntu 22.04 · 0405461d

myhloli authored Aug 11, 2024

Add FAQ entries in both English and Chinese to address the issue where the
libGL.so.1 library is missing on Ubuntu 22.04 when running under WSL2. The
FAQ now includes instructions on how to install the missing library, resolving the corresponding ImportError.

Closes https://github.com/opendatalab/MinerU/issues/388
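
One way to confirm that the ImportError really comes from a missing libGL.so.1 is to try loading the shared library directly. The sketch below uses only the standard-library `ctypes` module and is meant for Linux or WSL2. The function name `libgl_available` is hypothetical and not part of MinerU or its FAQ.

```python
import ctypes


def libgl_available(soname: str = "libGL.so.1") -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        # CDLL raises OSError when the loader cannot find or open the library.
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    status = "found" if libgl_available() else "missing"
    print(f"libGL.so.1 is {status}")
```

If the library is reported as missing, install it as described in the FAQ entry and rerun the check.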

09 Aug, 2024 7 commits
- Update README_Windows_CUDA_Acceleration_en_US.md · 24503530
  sfk authored Aug 09, 2024
- Update README_Windows_CUDA_Acceleration_zh_CN.md · ece8dac4
  sfk authored Aug 09, 2024
- Update README_Ubuntu_CUDA_Acceleration_en_US.md · 409ece82
  sfk authored Aug 09, 2024
- Update README_Ubuntu_CUDA_Acceleration_zh_CN.md · 18f82ab7
  sfk authored Aug 09, 2024
- Create FAQ_en_us.md · 85e36358
  sfk authored Aug 09, 2024
- Create output_file_en_us.md · cf704253
  sfk authored Aug 09, 2024
- Update FAQ_zh_cn.md · b03b5cdd
  Xiaomeng Zhao authored Aug 09, 2024

08 Aug, 2024 1 commit

docs(cuda-acceleration): add tips to verify CUDA acceleration effectiveness · 048e0952

myhloli authored Aug 08, 2024

Add notes in the Ubuntu and Windows CUDA acceleration guides on how to
determine if CUDA acceleration is working. This includes checking for
significant reductions in `layout detection cost`, `mfr time`, and `ocr cost`
as indicators of successful acceleration.

07 Aug, 2024 4 commits

docs(zh-cn): emphasize additional steps in model download guide · 8da5328f

myhloli authored Aug 07, 2024

Add an exclamation mark to the section title to stress the importance of completing the
additional steps after downloading a model. This change is made in the Chinese
documentation to ensure users are aware of the necessary post-download actions.

fix(models-download-path): correct the download path for PDF-Extract-Kit · 2ff63b7c

myhloli authored Aug 07, 2024

Adjust the print statement in the how_to_download_models_zh_cn.md guide to reflect
the correct model download location. The path has been updated to specify the 'models'
directory where the model is actually downloaded.

docs(models_zh_cn): add print statement to download models example · c7067c85

赵小蒙 authored Aug 07, 2024

Add a print statement to the example code in 'how_to_download_models_zh_cn.md' to
output the downloaded model directory path. This enhancement aids users in locating
the model files as it provides a clear indication of where they are saved on the
user's file system.

docs(readme): update acknowledgment section and project description · 361f5042

myhloli authored Aug 07, 2024

docs(readme): update acknowledgment section and project description

- Streamline the Acknowledgments section in the README by removing redundant entries.
- Clarify the project's current use of PyMuPDF and future plans for exploring a more permissively licensed PDF processing library in the project description.
- Ensure all modifications adhere to the project's documentation standards and improve reader understanding.

06 Aug, 2024 7 commits

docs(models-download): update steps and remove deprecated sections · d2a8cb42

myhloli authored Aug 06, 2024

docs(models-download): update steps and remove deprecated sections

Update the model download instructions to reflect the current process, removing
unnecessary sections and simplifying the steps. The updated guide now includes
clearer instructions on installing Git LFS, downloading models from Hugging Face,
and additional checks for model file completeness. This change ensures that the
documentation is up-to-date and provides a streamlined experience for users
downloading models.

docs: correct path format description in Windows CUDA docs · c723cc65

myhloli authored Aug 06, 2024

docs: correct path format description in Windows CUDA docs

Update the instructions in the Windows CUDA Acceleration documentation to
reflect the correct path format. Specifically, clarify that Windows paths
should include the drive letter and replace backslashes with forward slashes.

docs(cuda-acceleration): update PowerShell examples and formatting in README · 4f5689a4

myhloli authored Aug 06, 2024

docs: update URLs to gitee for Windows CUDA acceleration guides · d3e42e08

myhloli authored Aug 06, 2024

Update the URLs for downloading the `magic-pdf.template.json` and `small_ocr.pdf`
files in the Windows CUDA acceleration guides. The links now point to the Gitee
repository instead of GitHub, ensuring users have access to the necessary files
from the correct source.

docs(zh_CN): update Ubuntu CUDA Acceleration guide · 020602eb

myhloli authored Aug 06, 2024

- Streamline the installation process by removing the redundant apt update step.
- Adjust the numbering of installation steps throughout the document.
- Update download URLs to gitee for the configuration template and demo file.
- Ensure consistency in the model directory configuration advice.

docs: add Ubuntu 22.04 LTS CUDA acceleration setup guide · 4d7dc065

myhloli authored Aug 06, 2024

Add a new README_Ubuntu_CUDA_Acceleration_en_US.md document to provide users with a
setup guide for enabling and testing CUDA acceleration on Ubuntu 22.04 LTS. The guide
includes steps to check and install NVIDIA drivers, install Anaconda, create a conda
environment, install required applications, download and verify models, configure the
environment, and test CUDA acceleration.

This addition addresses the need for clear, concise instructions on achieving better
performance with CUDA-enabled graphics cards and

docs(FAQ): update troubleshooting sections for offline deployment and Mac issues · 2eaa9ca1

myhloli authored Aug 06, 2024

- Note the fix in version 0.6.2b1 for the network error during the first run of offline deployment and clarify the model download requirement.
- Update the dependency installation guide for users on macOS with Intel CPUs.
- Indicate the resolution in version 0.6.2b1 for compatibility issues with paddlepaddle
  version 2.6.1 on certain Linux systems.

This change aims to make the FAQ more informative and easier to navigate for users
experiencing similar issues, providing direct solutions and links where applicable.

05 Aug, 2024 1 commit

mirror(conda): use tuna mirror for Anaconda download · 29e48c73

myhloli authored Aug 05, 2024

Update the download links for Anaconda in both Ubuntu and Windows CUDA
Acceleration documents to use the Tuna mirror. This change helps ensure that
users in China have faster access to the Anaconda distribution.

02 Aug, 2024 9 commits

docs: specify absolute path for model weights configuration · 9778a461

myhloli authored Aug 03, 2024

Update the README documents to clarify that the "models-dir" in the
configuration should be an absolute path. Also, provide additional guidance
for Windows users on how to correctly format the path to avoid common issues
with path escaping in JSON files.
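
As an illustration of the escaping issue, the sketch below writes a Windows path into a JSON config with Python's `json` module: the raw backslash form has to be doubled inside JSON, whereas the forward-slash form needs no escaping. Only the `models-dir` key comes from the commit; the helper name `to_models_dir` and the example path `D:\models` are hypothetical.

```python
import json
import ntpath


def to_models_dir(path: str) -> str:
    """Normalize a Windows path for the "models-dir" setting (hypothetical helper)."""
    # ntpath.isabs accepts drive-letter paths such as "D:\models"; relative paths are rejected.
    if not ntpath.isabs(path):
        raise ValueError(f"models-dir must be an absolute path, got {path!r}")
    # Forward slashes avoid backslash escaping inside JSON strings.
    return path.replace("\\", "/")


raw = r"D:\models"
print(json.dumps({"models-dir": raw}))                 # {"models-dir": "D:\\models"}
print(json.dumps({"models-dir": to_models_dir(raw)}))  # {"models-dir": "D:/models"}
```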

docs: add wget command for Ubuntu and powershell script for Windows · 44a2dc37

myhloli authored Aug 03, 2024

Add instructions to download the magic-pdf.template.json file using wget on
Ubuntu and a PowerShell script on Windows in the respective README files.
This is to facilitate the setup process by providing direct download options,
replacing manual file transfers.

fix(docs): pin Magic-PDF version to 0.6.2b1 in install commands · a0c62b26

myhloli authored Aug 02, 2024

Update the install commands in both Ubuntu and Windows CUDA Acceleration
guides to specify Magic-PDF version 0.6.2b1, ensuring consistency and
avoiding potential version mismatches.
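
To confirm that the pinned version is the one actually installed in the active environment, the package metadata can be queried with the standard-library `importlib.metadata`. This is a sketch; it assumes the distribution name is `magic-pdf`, as in the install commands, and the helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

PINNED = "0.6.2b1"  # version pinned in the CUDA acceleration guides


def installed_magic_pdf_version(dist: str = "magic-pdf") -> Optional[str]:
    """Return the installed version of `dist`, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_magic_pdf_version()
    if found is None:
        print("magic-pdf is not installed")
    elif found != PINNED:
        print(f"magic-pdf {found} installed, but the guides pin {PINNED}")
    else:
        print(f"magic-pdf {found} matches the pinned version")
```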

docs(FAQ): update dependency installation troubleshooting · 961330f7

myhloli authored Aug 02, 2024

Update the FAQ to clarify the dependency installation issue when using magic-pdf. Ensure
users are directed to install the specific version of magic-pdf that resolves the dependency
error, rather than listing all individual dependencies. This simplifies the troubleshooting process
and provides a direct solution for users encountering the "Required dependency not installed"
error.

docs(models_zh_cn): update download methods from ModelScope · a24890b1

myhloli authored Aug 02, 2024

Update the download methods for models in the Chinese documentation to reflect
the latest options available from ModelScope. Simplify the section titles and
revise download instructions for clarity and consistency.

docs: update model download instructions and CUDA acceleration setup · 3ef4d054

myhloli authored Aug 02, 2024

Update the documentation to reflect the latest model download procedures, an emphasis on
model file integrity checks, and expanded instructions for setting up CUDA acceleration
on Ubuntu and Windows environments. The README files for various OSes have been
enhanced with additional details to assist users in configuring and verifying their
environments for optimal performance.

Make the documentation on how to download the model more concise · 2a06e0c8

xuchao authored Aug 02, 2024

feat(model inference): add table recognition and conversion to LaTeX (#284) · 37925f36

Kaiwen Liu authored Aug 02, 2024

* # add table recognition using struct-eqtable
## Changelog
31/07/2024
- Support table recognition. Table images will be converted into HTML.

### How to use the new feature:
Set the attribute 'table-mode' to 'true' in magic-pdf.json.

### Caution:
It takes 200 to 500 seconds to convert a single table image using the CPU.

* # add table recognition using struct-eqtable
## Changelog
31/07/2024
- Support table recognition. Table images will be converted into LaTeX.

### How to use the new feature:
Set the attribute 'table-mode' to 'true' in magic-pdf.json.

### Caution:
It takes 200 to 500 seconds to convert a single table image using the CPU.

* # feat(model inference): add table recognition and conversion to LaTeX

# What's Changed

### New Features

- Add table content recognition: we use the weights of [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) to convert table images to LaTeX.

### Instruction

- pip install pypandoc struct-eqtable==0.1.0
- Download the [StructEqTable weights](https://huggingface.co/wanderkid/PDF-Extract-Kit/tree/main/models/TabRec) and put them under the models/ directory.
- Edit the 'table-mode' value to turn on table recognition, which is turned off by default.
- If you have not downloaded any models before, refer to [how to download models](docs/how_to_download_models_zh_cn.md).

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

---------
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>

docs(output-file): correct poly coordinate format and update table descriptions · 41737adf

myhloli authored Aug 02, 2024

- Fix the description of the 'poly' coordinate format in the output file documentation to correctly reflect the order of coordinates: left-top, right-top, right-bottom,
  left-bottom.
- Update various table-related descriptions for clarity and consistency, including
  field names and their corresponding explanations.
- Add version name field description in 'middle.json' structure to document the
  version of the magic-pdf used in the parsing process.
- Refactor the block and line description tables to improve readability and alignment
  with the rest of the documentation.
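
The corrected corner order can be illustrated with a small helper that expands an axis-aligned box into a `poly`. It assumes a flat list of eight numbers `[x0, y0, x1, y0, x1, y1, x0, y1]`, with `(x0, y0)` the left-top and `(x1, y1)` the right-bottom corner in image coordinates (y grows downward). The function names are hypothetical; check the output file documentation for the exact schema.

```python
from typing import List, Sequence


def bbox_to_poly(bbox: Sequence[float]) -> List[float]:
    """Expand [x0, y0, x1, y1] into left-top, right-top, right-bottom, left-bottom."""
    x0, y0, x1, y1 = bbox
    return [x0, y0,   # left-top
            x1, y0,   # right-top
            x1, y1,   # right-bottom
            x0, y1]   # left-bottom


def poly_to_bbox(poly: Sequence[float]) -> List[float]:
    """Recover [x0, y0, x1, y1] from a poly in the same corner order."""
    xs, ys = poly[0::2], poly[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]


print(bbox_to_poly([10, 20, 110, 70]))  # [10, 20, 110, 20, 110, 70, 10, 70]
```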

01 Aug, 2024 9 commits

docs: update README for Ubuntu CUDA Acceleration · 15125623

myhloli authored Aug 01, 2024

- Adjust the installation command format for PaddlePaddle GPU.
- Clarify instruction numbering for testing OCR acceleration.

docs(zh_CN): update Ubuntu CUDA setup guide for accuracy · a09291ad

myhloli authored Aug 01, 2024

Update the Ubuntu CUDA Acceleration setup guide to reflect the correct user directory
path and improve the clarity of instructions. Remove references to Windows and macOS
as they are out of scope for this document. Ensure the configuration file copying
command is correctly represented for Linux users.

fix(docs): correct link to magic-pdf.template.json in README · 51a0bf4a

myhloli authored Aug 01, 2024

Update the link to the magic-pdf.template.json configuration template file in the
README_Ubuntu_CUDA_Acceleration_zh_CN.md document. The file path was previously
incorrect and has been amended to point to the correct location.

docs(magic-pdf): update model directory reference in configuration · 866e47a0

myhloli authored Aug 01, 2024

Update the instruction in README_Ubuntu_CUDA_Acceleration_zh_CN.md to reference
the correct section number for downloading the model weights. This change ensures
that users are directed to the correct location in the document for setting up the
model directory in the magic-pdf.json configuration.

docs: update Ubuntu CUDA acceleration guide for version 0.6.2 · fc18a5cf

myhloli authored Aug 01, 2024

docs: update Ubuntu CUDA acceleration guide for version 0.6.2

- Add steps for Ubuntu 22.04 LTS installation.
- Detail the process of checking, installing, and configuring NVIDIA drivers.
- Include instructions for installing Anaconda and creating a specific environment.
- Provide guidance on installing magic-pdf and its dependencies.
- Add a note to verify magic-pdf version and report issues if necessary.
- Describe the process of downloading models and configuring the application.
- Include a sample command to run the application with CUDA acceleration.
- Add a note for enabling OCR CUDA acceleration with specific GPU requirements.

This update ensures users have the latest information for setting up CUDA acceleration
with magic-pdf on Ubuntu 22.04 LTS, specifically for version 0.6.2, and provides clearer
instructions on the installation and configuration process.

docs: restructure download guide and add ModelScope options · b4b2a099

myhloli authored Aug 01, 2024

Restructured the how-to download models document for better clarity and
added sections on downloading models from ModelScope, including SDK and
Git download methods. Provided detailed steps for installing Git LFS and
checking model integrity after download. Also included recommendations
for moving the models to an SSD for better performance.
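
A common cause of incomplete model downloads is cloning without Git LFS, which leaves small text pointer files in place of the weights. A minimal integrity sketch scans a model directory for files that begin with the Git LFS pointer header. The function name and the size threshold are assumptions for illustration; the guide's own completeness checks may differ.

```python
from pathlib import Path
from typing import List

# Git LFS pointer files start with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir: str, max_pointer_size: int = 1024) -> List[Path]:
    """Return files under model_dir that look like un-fetched Git LFS pointers."""
    suspects = []
    for path in Path(model_dir).rglob("*"):
        # Pointer files are tiny text files, so skip directories and large files early.
        if not path.is_file() or path.stat().st_size > max_pointer_size:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_HEADER)) == LFS_HEADER:
                suspects.append(path)
    return sorted(suspects)
```

Calling `find_lfs_pointers("/path/to/models")` (a placeholder path) and getting an empty list means no pointer stubs were found; any returned path still needs `git lfs pull` or a re-download.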

Feat/impl cli (#264) · 40e0827e

icecraft authored Aug 01, 2024

* feat: refactor cli command

* feat: add docs to describe the output files of cli

* feat: resolve review comments

* feat: update docs about middle.json

---------
Co-authored-by: shenguanlin <shenguanlin@pjlab.org.cn>

Update how_to_download_models_en.md · c30a1abd

Richard Li authored Aug 01, 2024

# feat(model inference): add table recognition and conversion to LaTeX · d04f3f22

liukaiwen authored Aug 01, 2024

# What's Changed

### New Features

- Add table content recognition: we use the weights of [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) to convert table images to LaTeX.

### Instruction

- pip install pypandoc struct-eqtable==0.1.0
- Download the [StructEqTable weights](https://huggingface.co/wanderkid/PDF-Extract-Kit/tree/main/models/TabRec) and put them under the models/ directory.
- Edit the 'table-mode' value to turn on table recognition, which is turned off by default.
- If you have not downloaded any models before, refer to [how to download models](docs/how_to_download_models_zh_cn.md).
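
The 'table-mode' edit can also be scripted. This is a sketch that assumes magic-pdf.json is a JSON object in which 'table-mode' is a top-level key taking a JSON boolean (commit 37925f36 says to set it to 'true'). The helper name is hypothetical, and the config path is passed in explicitly because its location depends on your setup.

```python
import json
from pathlib import Path


def set_table_mode(config_path: str, enabled: bool = True) -> dict:
    """Set the top-level "table-mode" flag in a magic-pdf.json file and save it."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["table-mode"] = enabled  # serialized as a JSON boolean (true/false)
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")
    return config
```

For example, `set_table_mode("/path/to/magic-pdf.json")` turns table recognition on and `set_table_mode("/path/to/magic-pdf.json", False)` turns it off again; other keys in the file are preserved.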

31 Jul, 2024 1 commit

docs: add installation guide for git lfs on various platforms · 808563ce

myhloli authored Jul 31, 2024

Add detailed instructions for installing git lfs on Linux, macOS, and Windows
to help users download models from the ModelScope repository. The guide
is included in the `how_to_download_models_zh_cn.md` document.
