Commits · 3ef4d054cf019d6dce4e03e10f698726b7c14ac0 · wangsen / MinerU

02 Aug, 2024 3 commits

docs: update model download instructions and CUDA acceleration setup · 3ef4d054

myhloli authored Aug 02, 2024

Update the documentation to reflect the latest model download procedures, emphasize
model file integrity checks, and expand the instructions for setting up CUDA acceleration
on Ubuntu and Windows environments. The README files for each OS have been
enhanced with additional details to help users configure and verify their
environments for optimal performance.

3ef4d054

feat(model inference): add table recognition and conversion to LaTeX (#284) · 37925f36

Kaiwen Liu authored Aug 02, 2024

* # add table recognition using struct-eqtable
## Changelog
31/07/2024
- Support table recognition. Table images will be converted into HTML.

### how to use the new feature:
set the attribute 'table-mode' to 'true' in magic-pdf.json

### caution:
it takes 200 s to 500 s to convert a single table image on a CPU

* # add table recognition using struct-eqtable
## Changelog
31/07/2024
- Support table recognition. Table images will be converted into LaTeX.

### how to use the new feature:
set the attribute 'table-mode' to 'true' in magic-pdf.json

### caution:
it takes 200 s to 500 s to convert a single table image on a CPU

* # feat(model inference): add table recognition and conversion to LaTeX

# What's Changed

### New Features

- Add table content recognition, using the weights of [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) to convert table images to LaTeX.

### Instruction

- Run `pip install pypandoc struct-eqtable==0.1.0`.
- Download the [StructEqTable weights](https://huggingface.co/wanderkid/PDF-Extract-Kit/tree/main/models/TabRec) and put them under the models/ directory.
- Set the 'table-mode' value to turn on table recognition, which is off by default.
- If you have not downloaded any models before, refer to [how to download models](docs/how_to_download_models_zh_cn.md).

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

---------
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>

37925f36
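
As an illustration of the 'table-mode' switch described in the commit above, the following minimal sketch turns that attribute on in a magic-pdf.json file. The key name comes from the commit message; the helper name `enable_table_mode`, the use of a JSON boolean rather than the string 'true', and the path passed in are assumptions, not the project's own code.

```python
import json
from pathlib import Path


def enable_table_mode(config_path):
    """Set 'table-mode' to true in a magic-pdf.json style config file.

    Assumption: the value is stored as a JSON boolean. All other keys
    in the file are preserved unchanged.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["table-mode"] = True  # table recognition is off by default
    path.write_text(
        json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return config
```

Given the CPU timing caution in the commit message, it may be sensible to enable this only where a GPU is available.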

docs(output-file): correct poly coordinate format and update table descriptions · 41737adf

myhloli authored Aug 02, 2024

- Fix the description of the 'poly' coordinate format in the output file documentation to correctly reflect the order of coordinates: left-top, right-top, right-bottom,
  left-bottom.
- Update various table-related descriptions for clarity and consistency, including
  field names and their corresponding explanations.
- Add version name field description in 'middle.json' structure to document the
  version of the magic-pdf used in the parsing process.
- Refactor the block and line description tables to improve readability and alignment
  with the rest of the documentation.

41737adf
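
To make the corrected 'poly' ordering concrete, here is a small sketch that expands an axis-aligned bounding box into corner points in the order left-top, right-top, right-bottom, left-bottom. The flat eight-number layout, the top-left image origin, and the function name `bbox_to_poly` are assumptions for illustration; only the corner order is taken from the commit message.

```python
def bbox_to_poly(x0, y0, x1, y1):
    """Expand a bbox (left, top, right, bottom) into a flat poly list.

    Corner order: left-top, right-top, right-bottom, left-bottom.
    Assumes image coordinates with the origin at the top-left corner,
    so y0 is the top edge and y1 is the bottom edge.
    """
    return [
        x0, y0,  # left-top
        x1, y0,  # right-top
        x1, y1,  # right-bottom
        x0, y1,  # left-bottom
    ]
```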

01 Aug, 2024 8 commits

docs: update README for Ubuntu CUDA Acceleration · 15125623

myhloli authored Aug 01, 2024

- Adjust the format of the PaddlePaddle GPU installation command.
- Clarify instruction numbering for testing OCR acceleration.

15125623

docs(zh_CN): update Ubuntu CUDA setup guide for accuracy · a09291ad

myhloli authored Aug 01, 2024

Update the Ubuntu CUDA Acceleration setup guide to reflect the correct user directory
path and improve the clarity of instructions. Remove references to Windows and macOS
as they are out of scope for this document. Ensure the configuration file copying
command is correctly represented for Linux users.

a09291ad

fix(docs): correct link to magic-pdf.template.json in README · 51a0bf4a

myhloli authored Aug 01, 2024

Update the link to the magic-pdf.template.json configuration template file in the
README_Ubuntu_CUDA_Acceleration_zh_CN.md document. The file path was previously
incorrect and has been amended to point to the correct location.

51a0bf4a

docs(magic-pdf): update model directory reference in configuration · 866e47a0

myhloli authored Aug 01, 2024

Update the instruction in README_Ubuntu_CUDA_Acceleration_zh_CN.md to reference
the correct section number for downloading the model weights. This change ensures
that users are directed to the correct location in the document for setting up the
model directory in the magic-pdf.json configuration.

866e47a0

docs: update Ubuntu CUDA acceleration guide for version 0.6.2- Add steps for... · fc18a5cf

myhloli authored Aug 01, 2024

docs: update Ubuntu CUDA acceleration guide for version 0.6.2
- Add steps for Ubuntu 22.04 LTS installation.
- Detail the process of checking, installing, and configuring NVIDIA drivers.
- Include instructions for installing Anaconda and creating a specific environment.
- Provide guidance on installing magic-pdf and its dependencies.
- Add a note to verify magic-pdf version and report issues if necessary.
- Describe the process of downloading models and configuring the application.
- Include a sample command to run the application with CUDA acceleration.
- Add a note for enabling OCR CUDA acceleration with specific GPU requirements.

This update ensures users have the latest information for setting up CUDA acceleration
with magic-pdf on Ubuntu 22.04 LTS, specifically for version 0.6.2, and provides clearer
instructions on the installation and configuration process.

fc18a5cf

docs: restructure download guide and add ModelScope options · b4b2a099

myhloli authored Aug 01, 2024

Restructured the model download guide for better clarity and
added sections on downloading models from ModelScope, including SDK and
Git download methods. Provided detailed steps for installing Git LFS and
checking model integrity after download. Also included a recommendation
to move the models to an SSD for better performance.

b4b2a099

Feat/impl cli (#264) · 40e0827e

icecraft authored Aug 01, 2024



* feat: refactor cli command

* feat: add docs to describe the output files of cli

* feat: resolve review comments

* feat: update docs about middle.json

---------
Co-authored-by: shenguanlin <shenguanlin@pjlab.org.cn>

40e0827e

Update how_to_download_models_en.md · c30a1abd
Richard Li authored Aug 01, 2024

c30a1abd

31 Jul, 2024 2 commits

docs: add installation guide for git lfs on various platforms · 808563ce

myhloli authored Jul 31, 2024

Add detailed instructions for installing git lfs on Linux, macOS, and Windows
to help users download models from the ModelScope repository. The guide
is included in the `how_to_download_models_zh_cn.md` document.

808563ce

Update how_to_download_models_zh_cn.md · b7cd875f
Richard Li authored Jul 31, 2024

use git lfs clone to download model from ModelScope

b7cd875f

30 Jul, 2024 1 commit
- docs(readme): update version requirement and GPU usage links · 3cdac5e4
  myhloli authored Jul 30, 2024
  
  3cdac5e4
29 Jul, 2024 1 commit
- Update FAQ_zh_cn.md · 230e37ed
  Xiaomeng Zhao authored Jul 29, 2024
  
  230e37ed
26 Jul, 2024 2 commits
- reduce size of minerU logo · f2b4b8ff
  徐超 authored Jul 26, 2024
  
  f2b4b8ff
- add MinerU logo · 9ec91339
  徐超 authored Jul 26, 2024
  
  9ec91339
24 Jul, 2024 2 commits
- Update FAQ_zh_cn.md · 90328b56
  Xiaomeng Zhao authored Jul 24, 2024
  
  90328b56
- docs: add FAQ section on Linux dependency issue · 0d9427ca
  myhloli authored Jul 24, 2024
  
  0d9427ca
23 Jul, 2024 2 commits
- move cla to root · 4f967dcc
  myhloli authored Jul 24, 2024
  
  4f967dcc
- feat(docs): add MinerU Contributor License Agreement · 52b74df4
  myhloli authored Jul 24, 2024
  
  52b74df4
19 Jul, 2024 1 commit
- add opendatalab logo · a9d2ef8d
  徐超 authored Jul 19, 2024
  
  a9d2ef8d
17 Jul, 2024 5 commits
- docs(FAQ_zh_cn): update · 6a9ad924
  myhloli authored Jul 17, 2024
  
  6a9ad924
- docs(FAQ_zh_cn): update markdown headers and add new questions for installation and usage issues · 549940d0
  myhloli authored Jul 17, 2024
  
  549940d0
- docs(FAQ_zh_cn): update markdown headers · d6eb101d
  myhloli authored Jul 17, 2024
  
  d6eb101d
- docs(FAQ_zh_cn): update markdown headers and add new question for MPS acceleration issue · c28e9dd6
  myhloli authored Jul 17, 2024
  
  c28e9dd6
- docs(readme): add link to FAQ for common issue resolution · e3ef9b20
  myhloli authored Jul 17, 2024
  
  e3ef9b20
15 Jul, 2024 1 commit
- Add support for model downloads from ModelScope · 01ec4614
  wangbinDL authored Jul 15, 2024
  
  01ec4614
13 Jul, 2024 1 commit
- docs: update download model instructions for Chinese users · 19fd0a40
  myhloli authored Jul 13, 2024
  
  19fd0a40
12 Jul, 2024 2 commits

docs(readme): update instructions for model download and environment setup · 21d7a693
myhloli authored Jul 12, 2024

21d7a693

feat(config-reader): add models-dir and device-mode configurations · 695b3579

myhloli authored Jul 12, 2024

Add new configuration options for custom model directories and device mode
selection. This allows users to specify the directory where models are stored
and choose between CPU and GPU modes for model inference. The configurations
are read from a JSON file and can be easily extended to support additional
options in the future.

695b3579
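
The commit above describes reading 'models-dir' and 'device-mode' from a JSON file. A minimal sketch of such a reader might look like the following; the key names come from the commit title, while the function name `read_config`, the default values, and the "cpu"/"cuda" strings are hypothetical and do not reflect the project's actual implementation.

```python
import json
from pathlib import Path

# Hypothetical defaults, used when a key is missing from the file.
DEFAULT_CONFIG = {
    "models-dir": "/tmp/models",
    "device-mode": "cpu",
}


def read_config(config_path):
    """Load a JSON config and fill in defaults for missing keys.

    Keys present in the file override DEFAULT_CONFIG; unknown keys are
    kept as-is, so new options can be added without changing the reader.
    """
    loaded = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return {**DEFAULT_CONFIG, **loaded}
```

Merging over a defaults dictionary is one simple way to keep the reader extensible, since a new option only needs a new default entry.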

05 Jul, 2024 1 commit
- update readme · fb27361e
  赵小蒙 authored Jul 05, 2024
  
  fb27361e
28 Jun, 2024 1 commit
- update images · f84eb897
  赵小蒙 authored Jun 28, 2024
  
  f84eb897
27 Jun, 2024 4 commits
- update: remove watermark and use transparent background · bfd4bfeb
  赵小蒙 authored Jun 27, 2024
  
  bfd4bfeb
- update: remove video file · 2640875c
  赵小蒙 authored Jun 27, 2024
  
  2640875c
- update: · 9eca87f4
  赵小蒙 authored Jun 27, 2024

  add demo video

  9eca87f4
- update Flowchart and Submodule Repositories · a8730cc9
  赵小蒙 authored Jun 27, 2024
  
  a8730cc9
26 Jun, 2024 2 commits
- update project panorama · 5334d3a9
  赵小蒙 authored Jun 26, 2024
  
  5334d3a9
- add Project Panorama · c1bc9d83
  赵小蒙 authored Jun 26, 2024
  
  c1bc9d83