Commits · 977e05ae2a7c677960cd308cb140f016826e7e80 · wangsen / MinerU

06 Jan, 2025 24 commits

Merge pull request #1425 from myhloli/dev · 977e05ae
Xiaomeng Zhao authored Jan 06, 2025
```
docs(ascend): update documentation to add environment requirements for running Docker
```
977e05ae

docs(ascend): update documentation to add environment requirements for running Docker · a11e26bd

myhloli authored Jan 06, 2025

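- State explicitly in the documentation that, before running MinerU with Docker, the physical host must have drivers and firmware installed that support CANN 8.0.RC2
- This update helps users prepare an environment suited to the Ascend NPU and avoid potential runtime problems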

a11e26bd

Merge pull request #1424 from myhloli/dev · 8474b898

Xiaomeng Zhao authored Jan 06, 2025

feat: enable table recognition by default
- Set table recognition to enabled by default in the UI

8474b898

feat: enable table recognition by default · bdfdfea6
myhloli authored Jan 06, 2025
```
- Set table recognition to enabled by default in the UI
- Change default layout model to 'doclayout_yolo'
- Enable table recognition in the magic-pdf template
```
bdfdfea6
Merge pull request #1423 from myhloli/dev · 0665355a
Xiaomeng Zhao authored Jan 06, 2025
```
Dev
```
0665355a

docs(readme): update Docker commands to auto-activate virtual environment · a3de866d

myhloli authored Jan 06, 2025

- Update Docker run command in both README.md and README_zh-CN.md
- Add command to automatically activate the virtual environment upon container start
- Ensure users have the correct environment setup when accessing the container

a3de866d

style(docker): format Dockerfile for better readability · cc62ae8b

myhloli authored Jan 06, 2025

- Align python3.10-dev and python3-pip for improved visual consistency
- Enhance Dockerfile formatting without changing functionality

cc62ae8b

Update Dockerfile · 57b5999e
Xiaomeng Zhao authored Jan 06, 2025

57b5999e
Merge pull request #1422 from myhloli/dev · bc39fa87
Xiaomeng Zhao authored Jan 06, 2025
```
docs(README): update for 1.0.0 release and improve documentation
```
bc39fa87

docs(README): update for 1.0.0 release and improve documentation · 29dde7c2

myhloli authored Jan 06, 2025

- Update README.md and README_zh-CN.md for 1.0.0 release
- Add new API and compatibility information
- Update links to user guide and documentation
- Improve NPU acceleration section

29dde7c2

Merge pull request #1421 from myhloli/dev · 57e0af48
Xiaomeng Zhao authored Jan 06, 2025
```
fix(table): handle empty OCR result in rapidtable
```
57e0af48

fix(table): handle empty OCR result in rapidtable · 12caa784

myhloli authored Jan 06, 2025

- Add check for empty OCR result when using PaddleOCR model
- Assign None to ocr_result if no text is detected, preventing further errors

12caa784
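
The guard described in 12caa784 can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name `normalize_ocr_result` is hypothetical, and the nested-list result shape (an outer list per image, which may be `None` or empty when no text is found) is an assumption based on how PaddleOCR-style engines commonly return detections.

```python
from typing import List, Optional, Tuple

# Hypothetical shape of one detection: (box points, text, confidence).
OcrItem = Tuple[list, str, float]


def normalize_ocr_result(raw) -> Optional[List[OcrItem]]:
    """Return a flat list of detections, or None when nothing was detected."""
    # Assumption: the engine wraps per-image results in an outer list and may
    # return None, [], [None], or [[]] for an image that contains no text.
    if not raw:
        return None
    page = raw[0]
    if not page:
        return None
    # Each item is assumed to look like [box, (text, score)].
    return [(box, text, score) for box, (text, score) in page]
```

Downstream table code can then test `if ocr_result is None` once, instead of indexing into an empty structure and failing later.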

Merge pull request #1420 from myhloli/dev · d76ec7da
Xiaomeng Zhao authored Jan 06, 2025
```
docs(README): add Ascend NPU acceleration guide
```
d76ec7da

docs(README): add Ascend NPU acceleration guide · 4d110d31

myhloli authored Jan 06, 2025

- Add new file README_Ascend_NPU_Acceleration_zh_CN.md in docs folder
- Update README.md and README_zh-CN.md to include link to new NPU acceleration guide
- Provide instructions for building and running Docker image for Ascend NPU
- List known issues and limitations when using Ascend NPU

4d110d31

Merge pull request #1414 from icecraft/refactor/remove_unused_code · 6b2b6132
Xiaomeng Zhao authored Jan 06, 2025
```
Refactor/remove unused code
```
6b2b6132
Merge pull request #1419 from myhloli/dev · a8831ba6
Xiaomeng Zhao authored Jan 06, 2025
```
feat: Add NPU support
```
a8831ba6
Merge branch 'dev' into dev · 8a0aa7a4
Xiaomeng Zhao authored Jan 06, 2025

8a0aa7a4

build(docker): update Dockerfiles for China and Huawei NPU versions · 2e1bf881

myhloli authored Jan 06, 2025

- Update package sources to use Aliyun mirrors for faster downloads
- Upgrade pip and install Python packages in virtual environment
- Add python3.10-dev package to Huawei NPU Dockerfile
- Update requirements file URLs to master branch
- Install specific version of torch_npu in Huawei NPU Dockerfile
- Update magic-pdf installation method
- Improve modelscope installation process
- Optimize model download and configuration update steps

2e1bf881

fix: remove unused code · b557b458
icecraft authored Jan 06, 2025

b557b458
refactor: remove unused method in MagicModel class · 0622356e
icecraft authored Jan 06, 2025

0622356e
refactor: remove unused method in MagicModel class · d13f3c6d
icecraft authored Jan 06, 2025

d13f3c6d

build(docker): update Dockerfiles and download scripts · 36c3ad6f

myhloli authored Jan 06, 2025

- Update Dockerfiles in china, global, and huawei_npu directories
- Improve wget commands by specifying output file names
- Update READMEs to reflect new Dockerfile locations

36c3ad6f

Merge remote-tracking branch 'origin/dev' into dev · 0f1dff1e
myhloli authored Jan 06, 2025

0f1dff1e

build(docker): add Dockerfiles for global and Huawei NPU setups · ad099808

myhloli authored Jan 06, 2025

- Add Dockerfile for global setup with Ubuntu base image
- Add Dockerfile for Huawei NPU setup with Ascend base image
- Update requirements file structure:
  - Rename requirements-docker.txt to docker/china/requirements.txt
  - Add new requirements files for global and Huawei NPU setups
- Install necessary packages and dependencies in both Dockerfiles
- Set up virtual environment and install Python packages
- Download models and configure magic-pdf for both setups

ad099808

05 Jan, 2025 4 commits

docs(README): update documentation for NPU support · 2e8601ab

myhloli authored Jan 05, 2025

- Add section for using NPU acceleration in both English and Chinese README files
- Update system requirements to include CANN environment for NPU support
- Enhance the "Quick Start" guide with NPU-related information
- Modify hardware requirements to specify "Ascend 910b" for NPU acceleration

2e8601ab

feat(tools): add character bounding box drawing functionality · f911a102

myhloli authored Jan 05, 2025

- Add `draw_char_bbox` function to `draw_bbox.py` for drawing character bounding boxes
- Integrate `draw_char_bbox` into `common.py` for use in PDF processing pipeline
- Include option to draw character bounding boxes in debug mode

f911a102

style(pdf_parse_union_core_v2): remove unnecessary spaces and improve code... · 9951a170

myhloli authored Jan 05, 2025

style(pdf_parse_union_core_v2): remove unnecessary spaces and improve code formatting
- Remove extra space in conditional statement for character spacing logic
- Adjust spacing in trigonometric checks for line direction
- Improve overall code readability and consistency

9951a170

fix(magic-pdf): update OCR model selection logic · 16a0a350

myhloli authored Jan 05, 2025

- Add missing 'else' statement in OCR model selection logic
- Ensure consistent formatting of 'if' statements for better readability
- Remove unnecessary empty line in the 'app.py' file

16a0a350

03 Jan, 2025 4 commits
- refactor(ocr): comment out unnecessary log statement · 04febf52
  myhloli authored Jan 03, 2025
```
- Remove logger.info() call for additional_ocr_params to reduce log verbosity
```
  04febf52
- feat(model): add onnxruntime support for paddleocr on cpu · 512adb67
  myhloli authored Jan 03, 2025
```
- Implement ONNXModelSingleton to manage ONNX models
- Modify ModifiedPaddleOCR to use ONNX models on ARM CPUs without CUDA
- Update RapidTableModel to use RapidOCR with ONNXRuntime on CPU
- Add rapidocr_onnxruntime dependency in setup.py
```
  512adb67
- Merge pull request #1398 from yzztin/dev · ad9abc32
  Xiaomeng Zhao authored Jan 03, 2025
```
fix(web_api): Modify the import path of InferenceResult
```
  ad9abc32
- fix(web_api): Modify the import path of InferenceResult · 05109c36
  yzz authored Jan 03, 2025
  
  05109c36
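
Commit 512adb67 above mentions an `ONNXModelSingleton` for managing ONNX models. The log does not show its implementation, so the following is only a sketch of the general pattern such a class might follow: one process-wide object that builds each model once per key and reuses it afterwards. The class name `ModelCache`, the method `get_model`, and the factory-callable interface are all hypothetical.

```python
import threading
from typing import Any, Callable, Dict


class ModelCache:
    """Process-wide cache that builds each model at most once per key."""

    _instance = None
    # Reentrant lock so a factory may itself call ModelCache() safely.
    _lock = threading.RLock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._models = {}  # type: Dict[str, Any]
        return cls._instance

    def get_model(self, key: str, factory: Callable[[], Any]) -> Any:
        """Return the cached model for key, calling factory() on first use."""
        with self._lock:
            if key not in self._models:
                self._models[key] = factory()
            return self._models[key]
```

In a real pipeline the factory would typically construct an inference session (for example `onnxruntime.InferenceSession(model_path)`), which is expensive enough that loading it once per process is worthwhile.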
02 Jan, 2025 3 commits

Merge pull request #1386 from myhloli/fix-char-without-space · 26f8cbac
Xiaomeng Zhao authored Jan 02, 2025
```
refactor(pdf_parse): improve character spacing handling in PDF text extraction
```
26f8cbac

refactor(pdf_parse): improve character spacing handling in PDF text extraction · c93950dc

myhloli authored Jan 02, 2025

- Update the logic for inserting spaces between characters
- Consider the next character's position instead of the previous one
- Adjust the spacing threshold to 25% of the average character width
- Ignore spaces at the end of lines to prevent double spaces

c93950dc

refactor(pdf_parse): improve character spacing handling in PDF text extraction · 7c5cdcd4

myhloli authored Jan 02, 2025

- Update the logic for inserting spaces between characters
- Consider the next character's position instead of the previous one
- Adjust the spacing threshold to 25% of the average character width
- Ignore spaces at the end of lines to prevent double spaces

7c5cdcd4
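
The spacing rule described in the two commits above (look at the next character, insert a space when the gap exceeds 25% of the average character width, ignore spaces at the end of a line) can be sketched roughly as below. The `Char` type, its fields, and the function `join_chars` are hypothetical; the actual implementation in `pdf_parse` may compute the average width and the gap differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Char:
    text: str
    x0: float  # left edge of the glyph box
    x1: float  # right edge of the glyph box


def join_chars(chars: List[Char], gap_ratio: float = 0.25) -> str:
    """Join one line of characters, inserting spaces at wide gaps."""
    # Ignore space characters at the end of the line to avoid double spaces
    # when lines are later concatenated.
    while chars and chars[-1].text == " ":
        chars = chars[:-1]
    if not chars:
        return ""
    avg_width = sum(c.x1 - c.x0 for c in chars) / len(chars)
    out = []
    for i, ch in enumerate(chars):
        out.append(ch.text)
        if i + 1 < len(chars):
            nxt = chars[i + 1]
            # Compare against the NEXT character's left edge.
            gap = nxt.x0 - ch.x1
            if gap > gap_ratio * avg_width and ch.text != " " and nxt.text != " ":
                out.append(" ")
    return "".join(out)
```

With an average width of 10 units, a gap of 10 between two glyphs yields a space, while a gap of 2 (below the 2.5 threshold) does not.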

30 Dec, 2024 3 commits

refactor(magic_pdf): comment out npu-related code · 88b909e2

myhloli authored Dec 30, 2024

- Remove use_npu variable initialization
- Comment out device assignment and npu check
- Comment out use_npu parameter in ModifiedPaddleOCR constructor

88b909e2

fix(npu): correct module name for NPU operations · 2684e775

myhloli authored Dec 30, 2024

- Update `clean_memory.py` to use `torch_npu.npu` instead of `torch.npu`
- Update `model_utils.py` to use `torch_npu.npu` instead of `torch.npu`
- Simplify NPU availability check and bfloat16 support in `pdf_parse_union_core_v2.py`

2684e775
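
The module-name fix in 2684e775 amounts to calling NPU functions through `torch_npu.npu` rather than `torch.npu`. A minimal sketch of a guarded availability check and cache cleanup follows. The helper names `npu_available` and `clean_npu_memory` are hypothetical, and the sketch assumes `torch_npu` is an optional dependency that may be absent on machines without Ascend hardware.

```python
def npu_available() -> bool:
    """Report whether an Ascend NPU is usable; False if torch_npu is absent."""
    try:
        import torch_npu  # Ascend adapter for PyTorch (optional dependency)
    except Exception:
        # Not installed, or the underlying CANN runtime could not be loaded.
        return False
    return bool(torch_npu.npu.is_available())


def clean_npu_memory() -> bool:
    """Release cached NPU memory when an NPU is present; return True if run."""
    if not npu_available():
        return False
    import torch_npu
    torch_npu.npu.empty_cache()
    return True
```

Keeping the import inside the functions lets the same code run unchanged on CPU-only or CUDA machines.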

build(deps): update pydantic to latest version · 2e87e649

myhloli authored Dec 30, 2024

- Remove upper version limit for pydantic dependency
- This change allows for the use of the latest pydantic version

2e87e649

27 Dec, 2024 2 commits
- Merge pull request #1370 from icecraft/fix/path_delimiter · e72709cc
  Xiaomeng Zhao authored Dec 27, 2024
```
fix: s3 path join method
```
  e72709cc
- fix: s3 path join method · d637dab3
  icecraft authored Dec 27, 2024
  
  d637dab3