Commits · 87fd4c280609edc5cf98e8cd13408bb7092af864 · wangsen / MinerU

08 Apr, 2025 1 commit
- Update bug_report.yml · 87fd4c28
  Xiaomeng Zhao authored Apr 08, 2025
  
07 Apr, 2025 3 commits
- Merge pull request #2125 from opendatalab/dev · 980f5c8c
  Xiaomeng Zhao authored Apr 07, 2025
```
docs: update torchvision version in CUDA installation guide
```
- Merge pull request #2124 from myhloli/dev · f442adfc
  Xiaomeng Zhao authored Apr 07, 2025
```
docs: update torchvision version in CUDA installation guide
```
- docs: update torchvision version in CUDA installation guide · d4cda0a8
  myhloli authored Apr 07, 2025
```
- Update torchvision version from 0.21.1 to 0.21.0 in Windows CUDA acceleration guides
- Update both English and Chinese versions of the documentation
```
06 Apr, 2025 4 commits

Merge pull request #2115 from myhloli/dev · 60fdf851
Xiaomeng Zhao authored Apr 06, 2025
```
build: remove accelerate dependency
```

build: remove accelerate dependency · a10b9aec

myhloli authored Apr 06, 2025

- Remove accelerate package from requirements.txt
- This change ensures only necessary external dependencies are introduced

Merge pull request #2114 from myhloli/dev · e3261b0e

Xiaomeng Zhao authored Apr 06, 2025

build(deps): add accelerate package and update requirements https://github.com/opendatalab/MinerU/issues/2112

build(deps): add accelerate package and update requirements · 09632ddd

myhloli authored Apr 06, 2025

- Add accelerate package to support model training acceleration
- Update requirements.txt to include new dependency

03 Apr, 2025 29 commits

Merge pull request #2093 from opendatalab/master · c5329a07
Xiaomeng Zhao authored Apr 03, 2025
```
master -> dev
```
Update version.py with new version · d629ce04
myhloli authored Apr 03, 2025

Merge pull request #2091 from opendatalab/release-1.3.0 · 3963b965
Xiaomeng Zhao authored Apr 03, 2025
```
Release 1.3.0
```
Merge pull request #2090 from opendatalab/dev · 1cd50125
Xiaomeng Zhao authored Apr 03, 2025
```
docs(readme): update release notes for version 1.3.0
```
Merge pull request #2089 from myhloli/dev · 1a1b8fdb
Xiaomeng Zhao authored Apr 03, 2025
```
docs(readme): update release notes for version 1.3.0 
```

docs(readme): update changelog and highlight usability improvements · 4067f6fd

myhloli authored Apr 03, 2025

- Remove duplicate entries for paddleocr2torch and thread safety
- Add new entry for real-time progress bar implementation
- Update mfr model to unimernet(2503)
- Extend torch version compatibility
- Enhance CUDA support for various GPU models
- Improve parsing speed on MPS devices

docs(readme): update release notes for version 1.3.0 · 5c2e25ac

myhloli authored Apr 03, 2025

- Update release notes in both English and Chinese README files
- Highlight major optimizations and improvements in version 1.3.0
- Clarify compatibility changes for torch, CUDA, and Python versions
- Emphasize performance improvements and parsing speed enhancements
- Mention specific bug fixes and parsing effect optimizations

Merge pull request #2065 from opendatalab/release-1.3.0 · 41d96cd8
Xiaomeng Zhao authored Apr 03, 2025
```
Release 1.3.0
```
Merge pull request #2088 from opendatalab/dev · dd96663c
Xiaomeng Zhao authored Apr 03, 2025
```
fix: support non-pdf file in batch mode
```
Merge pull request #2087 from icecraft/fix/convert_image_with_pymupdf · bb40b9b6
Xiaomeng Zhao authored Apr 03, 2025
```
fix: convert image with pymupdf
```
fix: convert image with pymupdf · 3e8ee23e
icecraft authored Apr 03, 2025

Merge pull request #2086 from icecraft/fix/support_non_pdf_in_batch · 14097d4e
Xiaomeng Zhao authored Apr 03, 2025
```
fix: support non-pdf file in batch mode
```
fix: support non-pdf file in batch mode · 3379f3b3
icecraft authored Apr 03, 2025

Merge pull request #2084 from opendatalab/dev · efbd00bf
Xiaomeng Zhao authored Apr 03, 2025
```
feat(web_api): update configuration and remove unused code 
```
Merge pull request #2083 from myhloli/dev · e38efb97
Xiaomeng Zhao authored Apr 03, 2025
```
feat(web_api): update configuration and remove unused code
```

feat(web_api): update configuration and remove unused code · 3a820305

myhloli authored Apr 03, 2025

- Comment out PaddlePaddle GPU installation in Dockerfile
- Add OCR model download URL in download_models.py
- Update config version in magic-pdf.json
- Remove outdated information and simplify README.md
- Remove volume creation for PaddleOCR models in Dockerfile

Merge pull request #2082 from opendatalab/dev · 01054426
Xiaomeng Zhao authored Apr 03, 2025
```
docs(user_guide): update installation guide and CUDA support
```
Merge pull request #2081 from myhloli/dev · 5c46c791
Xiaomeng Zhao authored Apr 03, 2025
```
docs(user_guide): update installation guide and CUDA support
```

docs(user_guide): update installation guide and CUDA support · b51ac110

myhloli authored Apr 03, 2025

- Update CUDA version requirements to 12.4 and higher
- Add support for CUDA 12.6 and CANN environments
- Update Python version requirements to 3.10-3.12
- Remove paddlepaddle-gpu installation and related instructions
- Update magic-pdf installation command to use Aliyun mirror
- Add storage requirements and update memory requirements
- Update GPU hardware support list to include all GPUs with Tensor Cores
- Add support for Apple Silicon

Merge pull request #2080 from opendatalab/dev · 579057dd
Xiaomeng Zhao authored Apr 03, 2025
```
docs(readme): update changelog and compatibility information 
```
Merge pull request #2079 from myhloli/dev · 9ffdd0df
Xiaomeng Zhao authored Apr 03, 2025
```
docs(readme): update changelog and compatibility information
```

docs(readme): update changelog and compatibility information · 0544996f

myhloli authored Apr 03, 2025

- Update changelog for version 1.3.0 release
- Clarify CUDA and GPU compatibility improvements
- Add information about batch processing speed improvements
- Update model download process and memory usage optimizations
- Include link to batch processing demo script

Merge pull request #2078 from opendatalab/dev · 6a30b5bd
Xiaomeng Zhao authored Apr 03, 2025
```
feat(model): add tqdm progress bar to model prediction loops
```
Merge pull request #2077 from myhloli/dev · fe4e62a7
Xiaomeng Zhao authored Apr 03, 2025
```
feat(model): add tqdm progress bar to model prediction loops
```

refactor(magic_pdf): optimize table recognition and layout detection · 1fd72f5f

myhloli authored Apr 03, 2025

- Update table recognition logic to process each table individually
- Refactor layout detection to use tqdm for progress tracking
- Optimize OCR recognition by using a single tqdm wrapper
- Improve MFR prediction with a more accurate progress bar
- Simplify MFD prediction by removing unnecessary total calculation

refactor(magic_pdf): remove OCR timing measurement code · 795233d1

myhloli authored Apr 03, 2025

- Comment out OCR timing measurement code to improve readability and performance
- Remove unnecessary logging of OCR processing time

refactor(magic_pdf): optimize code and improve logging · 553f250f

myhloli authored Apr 03, 2025

- Remove unused imports and comments
- Increase MIN_BATCH_INFERENCE_SIZE from 100 to 200
- Comment out VRAM cleaning and logging in batch_analyze.py
- Simplify code in doc_analyze_by_custom_model.py
- Add tqdm progress bar in pdf_parse_union_core_v2.py
- Enable tqdm in OCR processing

docs(README): update model config examples and add tqdm dependency · 86058278
myhloli authored Apr 03, 2025
```
- Remove outdated comments in table-config examples
- Add tqdm to requirements in all Docker environments
```

feat(model): add tqdm progress bar to model prediction loops · 8e1c2339

myhloli authored Apr 03, 2025

- Add tqdm progress bar to batch prediction loops in multiple model modules
- Improve logging and error handling in batch analysis script
- Update table model initialization to use default sub-model if none specified
- Add tqdm dependency to requirements.txt

02 Apr, 2025 3 commits
- Merge pull request #2074 from opendatalab/dev · 13742c38
  Xiaomeng Zhao authored Apr 03, 2025
```
feat(model): update Chinese OCR detection model to PP-OCRv3 
```
- Merge pull request #2073 from myhloli/dev · 09bd890e
  Xiaomeng Zhao authored Apr 03, 2025
```
feat(model): update Chinese OCR detection model to PP-OCRv3
```
- feat(model): update Chinese OCR detection model to PP-OCRv3 · ddfeea94
  myhloli authored Apr 03, 2025
```
- Replace ch_PP-OCRv4_det_infer.pth with ch_PP-OCRv3_det_infer.pth in models_config.yml
- Add new ch_PP-OCRv3_det_infer model configuration in arch_config.yaml
```