Commits · 2eaa9ca1d3361c2908aecfbe0d385abce37b74fd · wangsen / MinerU

06 Aug, 2024 1 commit

docs(FAQ): update troubleshooting sections for offline deployment and Mac issues · 2eaa9ca1

myhloli authored Aug 06, 2024

- Note the fix in version 0.6.2b1 for the network error during the first run of an offline deployment and clarify the model download requirement.
- Update the dependency installation guide for users on macOS with Intel CPUs.
- Indicate the resolution in version 0.6.2b1 for compatibility issues with paddlepaddle
  version 2.6.1 on certain Linux systems.

This change aims to make the FAQ more informative and easier to navigate for users
experiencing similar issues, providing direct solutions and links where applicable.

2eaa9ca1

05 Aug, 2024 1 commit

mirror(conda): use tuna mirror for Anaconda download · 29e48c73

myhloli authored Aug 05, 2024

Update the download links for Anaconda in both Ubuntu and Windows CUDA
Acceleration documents to use the Tuna mirror. This change helps ensure that
users in China have faster access to the Anaconda distribution.

29e48c73

02 Aug, 2024 9 commits

docs: specify absolute path for model weights configuration · 9778a461

myhloli authored Aug 03, 2024

Update the README documents to clarify that the "models-dir" in the
configuration should be an absolute path. Also, provide additional guidance
for Windows users on how to correctly format the path to avoid common issues
with path escaping in JSON files.
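
The path-escaping issue mentioned above can be illustrated with a minimal sketch. The Windows path below is hypothetical, and only the "models-dir" key comes from the commit message; the point is that each backslash in a Windows path must be doubled inside a JSON file, which `json.dumps` does automatically.

```python
import json
from pathlib import PureWindowsPath

# Hypothetical Windows location of the model weights (illustration only).
models_dir = r"C:\Users\alice\models"

# The commit asks for an absolute path; PureWindowsPath can check this on any OS.
is_absolute = PureWindowsPath(models_dir).is_absolute()

# json.dumps doubles each backslash, producing text that is valid inside a JSON file.
snippet = json.dumps({"models-dir": models_dir}, indent=4)
print(snippet)

# Reading the text back yields the original path unchanged.
round_trip = json.loads(snippet)["models-dir"]
```

Writing the path by hand with single backslashes would either fail to parse or silently produce a different string, which is why generating or double-checking the escaped form is the safer route.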

9778a461

docs: add wget command for Ubuntu and powershell script for Windows · 44a2dc37

myhloli authored Aug 03, 2024

Add instructions to download the magic-pdf.template.json file using wget on
Ubuntu and a PowerShell script on Windows in the respective README files.
This is to facilitate the setup process by providing direct download options,
replacing manual file transfers.

44a2dc37

fix(docs): pin Magic-PDF version to 0.6.2b1 in install commands · a0c62b26

myhloli authored Aug 02, 2024

Update the install commands in both Ubuntu and Windows CUDA Acceleration
guides to specify Magic-PDF version 0.6.2b1, ensuring consistency and avoiding
potential version mismatches.
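
As a rough sketch of how a user might confirm that the pinned version is the one actually installed, the snippet below queries the package metadata of the active environment. It assumes the distribution is published under the name `magic-pdf`; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED = "0.6.2b1"  # version pinned in the install commands


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("magic-pdf")  # assumed distribution name
if found is None:
    print("magic-pdf is not installed in this environment")
elif found != EXPECTED:
    print(f"found magic-pdf {found}, expected {EXPECTED}")
else:
    print(f"magic-pdf {found} matches the pinned version")
```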

a0c62b26

docs(FAQ): update dependency installation troubleshooting · 961330f7

myhloli authored Aug 02, 2024

Update the FAQ to clarify the dependency installation issue when using magic-pdf. Ensure
users are directed to install the specific version of magic-pdf that resolves the dependency
error, rather than listing all individual dependencies. This simplifies the troubleshooting process
and provides a direct solution for users encountering the "Required dependency not installed"
error.

961330f7

docs(models_zh_cn): update download methods from ModelScope · a24890b1

myhloli authored Aug 02, 2024

Update the download methods for models in the Chinese documentation to reflect
the latest options available from ModelScope. Simplify the section titles and
revise download instructions for clarity and consistency.

a24890b1

docs: update model download instructions and CUDA acceleration setup · 3ef4d054

myhloli authored Aug 02, 2024

Update the documentation to reflect the latest model download procedures, emphasize
model file integrity checks, and expand the instructions for setting up CUDA acceleration
on Ubuntu and Windows. The README files for each OS now include additional details
to help users configure and verify their environments for optimal performance.

3ef4d054

Make the documentation on how to download the model more concise · 2a06e0c8
xuchao authored Aug 02, 2024

2a06e0c8

feat(model inference): add table recognition and conversion to LaTeX (#284) · 37925f36

Kaiwen Liu authored Aug 02, 2024

* # add table recognition using struct-eqtable
## Changelog
31/07/2024
- Support table recognition. Table images will be converted into HTML.

### how to use the new feature:
set the attribute 'table-mode' to 'true' in magic-pdf.json

### caution:
it takes 200s to 500s to convert a single table image using cpu

* # add table recognition using struct-eqtable
## Changelog
31/07/2024
- Support table recognition. Table images will be converted into LaTeX.

### how to use the new feature:
set the attribute 'table-mode' to 'true' in magic-pdf.json

### caution:
it takes 200s to 500s to convert a single table image using cpu

* # feat(model inference): add table recognition and conversion to LaTeX

# What's Changed

### New Features

- Add table content recognition, using the weights of [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) to convert table images to LaTeX.

### Instruction

- pip install pypandoc struct-eqtable==0.1.0
- Download the [StructEqTable weights](https://huggingface.co/wanderkid/PDF-Extract-Kit/tree/main/models/TabRec) and put them under the models/ directory.
- Set the 'table-mode' value to turn on table recognition, which is off by default.
- If you have not downloaded any models before, refer to [how to download models](docs/how_to_download_models_zh_cn.md).

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

---------
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>

37925f36

docs(output-file): correct poly coordinate format and update table descriptions · 41737adf

myhloli authored Aug 02, 2024

- Fix the description of the 'poly' coordinate format in the output file documentation to correctly reflect the order of coordinates: left-top, right-top, right-bottom,
  left-bottom.
- Update various table-related descriptions for clarity and consistency, including
  field names and their corresponding explanations.
- Add version name field description in 'middle.json' structure to document the
  version of the magic-pdf used in the parsing process.
- Refactor the block and line description tables to improve readability and alignment
  with the rest of the documentation.
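
The corrected corner order can be illustrated with a small sketch. It assumes that 'poly' is stored as a flat list of eight numbers and that coordinates use a top-left origin with y increasing downward; neither detail is stated in this commit message, and the helper name `bbox_to_poly` is invented for the example.

```python
def bbox_to_poly(x0, y0, x1, y1):
    """Expand a box given by its left-top (x0, y0) and right-bottom (x1, y1) corners.

    Corners are emitted in the documented order:
    left-top, right-top, right-bottom, left-bottom.
    """
    return [x0, y0, x1, y0, x1, y1, x0, y1]


poly = bbox_to_poly(10, 20, 110, 70)
print(poly)
# → [10, 20, 110, 20, 110, 70, 10, 70]
```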

41737adf

01 Aug, 2024 9 commits

docs: update README for Ubuntu CUDA Acceleration · 15125623

myhloli authored Aug 01, 2024

- Adjust command installation format for PaddlePaddle GPU.
- Clarify instruction numbering for testing OCR acceleration.

15125623

docs(zh_CN): update Ubuntu CUDA setup guide for accuracy · a09291ad

myhloli authored Aug 01, 2024

Update the Ubuntu CUDA Acceleration setup guide to reflect the correct user directory
path and improve the clarity of instructions. Remove references to Windows and macOS
as they are out of scope for this document. Ensure the configuration file copying
command is correctly represented for Linux users.

a09291ad

fix(docs): correct link to magic-pdf.template.json in README · 51a0bf4a

myhloli authored Aug 01, 2024

Update the link to the magic-pdf.template.json configuration template file in the
README_Ubuntu_CUDA_Acceleration_zh_CN.md document. The file path was previously
incorrect and has been amended to point to the correct location.

51a0bf4a

docs(magic-pdf): update model directory reference in configuration · 866e47a0

myhloli authored Aug 01, 2024

Update the instruction in README_Ubuntu_CUDA_Acceleration_zh_CN.md to reference
the correct section number for downloading the model weights. This change ensures
that users are directed to the correct location in the document for setting up the
model directory in the magic-pdf.json configuration.

866e47a0

docs: update Ubuntu CUDA acceleration guide for version 0.6.2- Add steps for... · fc18a5cf

myhloli authored Aug 01, 2024

docs: update Ubuntu CUDA acceleration guide for version 0.6.2

- Add steps for Ubuntu 22.04 LTS installation.
- Detail the process of checking, installing, and configuring NVIDIA drivers.
- Include instructions for installing Anaconda and creating a specific environment.
- Provide guidance on installing magic-pdf and its dependencies.
- Add a note to verify magic-pdf version and report issues if necessary.
- Describe the process of downloading models and configuring the application.
- Include a sample command to run the application with CUDA acceleration.
- Add a note for enabling OCR CUDA acceleration with specific GPU requirements.

This update ensures users have the latest information for setting up CUDA acceleration
with magic-pdf on Ubuntu 22.04 LTS, specifically for version 0.6.2, and provides clearer
instructions on the installation and configuration process.

fc18a5cf

docs: restructure download guide and add ModelScope options · b4b2a099

myhloli authored Aug 01, 2024

Restructured the how-to download models document for better clarity and
added sections on downloading models from ModelScope, including SDK and
Git download methods. Provided detailed steps for installing Git LFS and
checking model integrity after download. Also included recommendations
for moving the models to an SSD for better performance.
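
The commit message does not say how model integrity is checked, so the following is only one possible approach: computing a SHA-256 digest of a downloaded file and comparing it with a value published alongside the model. The helper name `sha256_of` and the demo file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use flat even for multi-gigabyte weights.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a small temporary file standing in for a model weight file.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.bin"
    demo_file.write_bytes(b"abc")
    demo_digest = sha256_of(demo_file)

print(demo_digest)
```

A mismatch between the computed digest and the published one would suggest an incomplete or corrupted download, for example a Git LFS pointer file fetched in place of the real weights.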

b4b2a099

Feat/impl cli (#264) · 40e0827e

icecraft authored Aug 01, 2024

* feat: refactor cli command

* feat: add docs to describe the output files of cli

* feat: resolve review comments

* feat: update docs about middle.json

---------
Co-authored-by: shenguanlin <shenguanlin@pjlab.org.cn>

40e0827e

Update how_to_download_models_en.md · c30a1abd
Richard Li authored Aug 01, 2024

c30a1abd

# feat(model inference): add table recognition and conversion to LaTeX · d04f3f22

liukaiwen authored Aug 01, 2024

# What's Changed

### New Features

- Add table content recognition, using the weights of [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) to convert table images to LaTeX.

### Instruction

- pip install pypandoc struct-eqtable==0.1.0
- Download the [StructEqTable weights](https://huggingface.co/wanderkid/PDF-Extract-Kit/tree/main/models/TabRec) and put them under the models/ directory.
- Set the 'table-mode' value to turn on table recognition, which is off by default.
- If you have not downloaded any models before, refer to [how to download models](docs/how_to_download_models_zh_cn.md).
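
The 'table-mode' step above could be scripted roughly as follows. This is a sketch that assumes the setting is a top-level JSON boolean in a magic-pdf.json style file; the helper name `enable_table_mode` and the temporary demo file are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def enable_table_mode(config_path):
    """Set 'table-mode' to true in a JSON config file and return the new config."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["table-mode"] = True  # assumed to be a JSON boolean
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")
    return config


# Demo on a throwaway file rather than a real user configuration.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "magic-pdf.json"
    demo.write_text('{"table-mode": false}', encoding="utf-8")
    updated = enable_table_mode(demo)
    reloaded = json.loads(demo.read_text(encoding="utf-8"))

print(updated)
```

Other keys already present in the file are preserved, since the helper only overwrites the one setting.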

d04f3f22

31 Jul, 2024 2 commits

docs: add installation guide for git lfs on various platforms · 808563ce

myhloli authored Jul 31, 2024

Add detailed instructions for installing git lfs on Linux, macOS, and Windows
to facilitate users in downloading models from ModelScope repository. The guide
is included in the `how_to_download_models_zh_cn.md` document.

808563ce

Update how_to_download_models_zh_cn.md · b7cd875f
Richard Li authored Jul 31, 2024
```
use git lfs clone to download model from ModelScope
```
b7cd875f

30 Jul, 2024 1 commit
- docs(readme): update version requirement and GPU usage links · 3cdac5e4
  myhloli authored Jul 30, 2024
  
  3cdac5e4
29 Jul, 2024 1 commit
- Update FAQ_zh_cn.md · 230e37ed
  Xiaomeng Zhao authored Jul 29, 2024
  
  230e37ed
26 Jul, 2024 2 commits
- reduce size of minerU logo · f2b4b8ff
  徐超 authored Jul 26, 2024
  
  f2b4b8ff
- add MinerU logo · 9ec91339
  徐超 authored Jul 26, 2024
  
  9ec91339
24 Jul, 2024 2 commits
- Update FAQ_zh_cn.md · 90328b56
  Xiaomeng Zhao authored Jul 24, 2024
  
  90328b56
- docs: add FAQ section on Linux dependency issue · 0d9427ca
  myhloli authored Jul 24, 2024
  
  0d9427ca
23 Jul, 2024 2 commits
- move cla to root · 4f967dcc
  myhloli authored Jul 24, 2024
  
  4f967dcc
- feat(docs): add MinerU Contributor License Agreement · 52b74df4
  myhloli authored Jul 24, 2024
  
  52b74df4
19 Jul, 2024 1 commit
- add opendatalab logo · a9d2ef8d
  徐超 authored Jul 19, 2024
  
  a9d2ef8d
17 Jul, 2024 5 commits
- docs(FAQ_zh_cn): update · 6a9ad924
  myhloli authored Jul 17, 2024
  
  6a9ad924
- docs(FAQ_zh_cn): update markdown headers and add new questions for installation and usage issues · 549940d0
  myhloli authored Jul 17, 2024
  
  549940d0
- docs(FAQ_zh_cn): update markdown headers · d6eb101d
  myhloli authored Jul 17, 2024
  
  d6eb101d
- docs(FAQ_zh_cn): update markdown headers and add new question for MPS acceleration issue · c28e9dd6
  myhloli authored Jul 17, 2024
  
  c28e9dd6
- docs(readme): add link to FAQ for common issue resolution · e3ef9b20
  myhloli authored Jul 17, 2024
  
  e3ef9b20
15 Jul, 2024 1 commit
- Add support for model downloads from ModelScope · 01ec4614
  wangbinDL authored Jul 15, 2024
  
  01ec4614
13 Jul, 2024 1 commit
- docs: update download model instructions for Chinese users · 19fd0a40
  myhloli authored Jul 13, 2024
  
  19fd0a40
12 Jul, 2024 2 commits

docs(readme): update instructions for model download and environment setup · 21d7a693
myhloli authored Jul 12, 2024

21d7a693

feat(config-reader): add models-dir and device-mode configurations · 695b3579

myhloli authored Jul 12, 2024

Add new configuration options for custom model directories and device mode
selection. This allows users to specify the directory where models are stored
and choose between CPU and GPU modes for model inference. The configurations
are read from a JSON file and can be easily extended to support additional
options in the future.
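
A minimal sketch of reading such a configuration is shown below. Only the option names "models-dir" and "device-mode" come from the commit title; the default values, the accepted mode strings ("cpu" and "cuda"), and the function name `read_config` are assumptions made for illustration and may differ from the project's actual reader.

```python
import json

# Assumed defaults, for illustration only.
DEFAULTS = {"models-dir": "/tmp/models", "device-mode": "cpu"}
ALLOWED_MODES = ("cpu", "cuda")  # assumed spelling of the CPU and GPU modes


def read_config(text):
    """Parse JSON config text, falling back to defaults for missing keys."""
    config = dict(DEFAULTS)
    config.update(json.loads(text))
    if config["device-mode"] not in ALLOWED_MODES:
        raise ValueError(f"unsupported device-mode: {config['device-mode']!r}")
    return config


cfg = read_config('{"device-mode": "cuda"}')
print(cfg)
# → {'models-dir': '/tmp/models', 'device-mode': 'cuda'}
```

Merging parsed values over a defaults dictionary is one way to keep the reader easy to extend: a new option only needs a new default entry.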

695b3579