- 06 Jan, 2025 24 commits
-
-
Xiaomeng Zhao authored
docs(ascend): update documentation to add environment requirements before running Docker
-
myhloli authored
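- State explicitly in the documentation that, before running MinerU with Docker, the physical host must have drivers and firmware that support CANN 8.0.RC2 installed - This update helps users prepare an environment suited to the Ascend NPU and avoid potential runtime problems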
-
Xiaomeng Zhao authored
feat: enable table recognition by default - Set table recognition to enabled by default in the UI
-
myhloli authored
- Change default layout model to 'doclayout_yolo' - Enable table recognition in the magic-pdf template
-
Xiaomeng Zhao authored
Dev
-
myhloli authored
- Update Docker run command in both README.md and README_zh-CN.md - Add command to automatically activate the virtual environment upon container start - Ensure users have the correct environment setup when accessing the container
-
myhloli authored
- Align python3.10-dev and python3-pip for improved visual consistency - Enhance Dockerfile formatting without changing functionality
-
Xiaomeng Zhao authored
-
Xiaomeng Zhao authored
docs(README): update for 1.0.0 release and improve documentation
-
myhloli authored
- Update README.md and README_zh-CN.md for 1.0.0 release - Add new API and compatibility information - Update links to user guide and documentation - Improve NPU acceleration section
-
Xiaomeng Zhao authored
fix(table): handle empty OCR result in rapidtable
-
myhloli authored
- Add check for empty OCR result when using PaddleOCR model - Assign None to ocr_result if no text is detected, preventing further errors
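The empty-result guard described above can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `normalize_ocr_result` is hypothetical, and the input shape (a list with one entry per image, where an entry is `None` or empty when no text is found, and each item is `(box, (text, score))`) is an assumption about what the OCR engine returns.

```python
from typing import List, Optional


def normalize_ocr_result(raw_pages: Optional[list]) -> Optional[List[list]]:
    """Return [box, text, score] rows for the first image, or None if no text was found.

    Hypothetical helper; the (box, (text, score)) item shape is an assumption.
    """
    # An empty or missing result means nothing was detected.
    if not raw_pages or not raw_pages[0]:
        return None

    rows = []
    for item in raw_pages[0]:
        # Keep only items that match the assumed (box, (text, score)) shape.
        if len(item) == 2 and isinstance(item[1], (tuple, list)) and len(item[1]) == 2:
            box, (text, score) = item
            rows.append([box, text, score])

    # If every item was malformed, report "no text" rather than an empty list.
    return rows or None
```

Returning `None` instead of an empty list lets a downstream caller use a single `is None` check to skip table text matching when the image contains no recognizable text.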
-
Xiaomeng Zhao authored
docs(README): add Ascend NPU acceleration guide
-
myhloli authored
- Add new file README_Ascend_NPU_Acceleration_zh_CN.md in docs folder - Update README.md and README_zh-CN.md to include link to new NPU acceleration guide - Provide instructions for building and running Docker image for Ascend NPU - List known issues and limitations when using Ascend NPU
-
Xiaomeng Zhao authored
Refactor/remove unused code
-
Xiaomeng Zhao authored
feat: Add NPU support
-
Xiaomeng Zhao authored
-
myhloli authored
- Update package sources to use Aliyun mirrors for faster downloads - Upgrade pip and install Python packages in virtual environment - Add python3.10-dev package to Huawei NPU Dockerfile - Update requirements file URLs to master branch - Install specific version of torch_npu in Huawei NPU Dockerfile - Update magic-pdf installation method - Improve modelscope installation process - Optimize model download and configuration update steps
-
icecraft authored
-
icecraft authored
-
icecraft authored
-
myhloli authored
- Update Dockerfiles in china, global, and huawei_npu directories - Improve wget commands by specifying output file names - Update READMEs to reflect new Dockerfile locations
-
myhloli authored
-
myhloli authored
- Add Dockerfile for global setup with Ubuntu base image - Add Dockerfile for Huawei NPU setup with Ascend base image - Update requirements file structure: - Rename requirements-docker.txt to docker/china/requirements.txt - Add new requirements files for global and Huawei NPU setups - Install necessary packages and dependencies in both Dockerfiles - Set up virtual environment and install Python packages - Download models and configure magic-pdf for both setups
-
- 05 Jan, 2025 4 commits
-
-
myhloli authored
- Add section for using NPU acceleration in both English and Chinese README files - Update system requirements to include CANN environment for NPU support - Enhance the "Quick Start" guide with NPU-related information - Modify hardware requirements to specify "Ascend 910b" for NPU acceleration
-
myhloli authored
- Add `draw_char_bbox` function to `draw_bbox.py` for drawing character bounding boxes - Integrate `draw_char_bbox` into `common.py` for use in PDF processing pipeline - Include option to draw character bounding boxes in debug mode
-
myhloli authored
style(pdf_parse_union_core_v2): remove unnecessary spaces and improve code formatting - Remove extra space in conditional statement for character spacing logic - Adjust spacing in trigonometric checks for line direction - Improve overall code readability and consistency
-
myhloli authored
- Add missing 'else' statement in OCR model selection logic - Ensure consistent formatting of 'if' statements for better readability - Remove unnecessary empty line in the 'app.py' file
-
- 03 Jan, 2025 4 commits
-
-
myhloli authored
- Remove logger.info() call for additional_ocr_params to reduce log verbosity
-
myhloli authored
- Implement ONNXModelSingleton to manage ONNX models - Modify ModifiedPaddleOCR to use ONNX models on ARM CPUs without CUDA - Update RapidTableModel to use RapidOCR with ONNXRuntime on CPU - Add rapidocr_onnxruntime dependency in setup.py
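A model singleton of the kind named above typically caches each loaded model under a key so that repeated requests reuse one instance. The sketch below shows that pattern only; the class name `ModelSingleton`, the `get_model` method, and the `loader` callable are hypothetical and do not reflect the actual `ONNXModelSingleton` interface.

```python
import threading
from typing import Any, Callable, Dict, Hashable


class ModelSingleton:
    """Process-wide cache that builds each model at most once per key (illustrative)."""

    _instance = None
    # Re-entrant lock so a loader may safely touch the singleton again.
    _lock = threading.RLock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._models = {}  # type: Dict[Hashable, Any]
        return cls._instance

    def get_model(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached model for `key`, calling `loader` only on first use."""
        with self._lock:
            if key not in self._models:
                self._models[key] = loader()
            return self._models[key]
```

In a real setup the `loader` might wrap something like `onnxruntime.InferenceSession(model_path, providers=["CPUExecutionProvider"])`, so that an ARM CPU without CUDA pays the model-loading cost once per process.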
-
Xiaomeng Zhao authored
fix(web_api): Modify the import path of InferenceResult
-
yzz authored
-
- 02 Jan, 2025 3 commits
-
-
Xiaomeng Zhao authored
refactor(pdf_parse): improve character spacing handling in PDF text extraction
-
myhloli authored
- Update the logic for inserting spaces between characters - Consider the next character's position instead of the previous one - Adjust the spacing threshold to 25% of the average character width - Ignore spaces at the end of lines to prevent double spaces
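The spacing rule in this commit message can be illustrated with a small sketch. It assumes each character is a `(char, x0, x1)` tuple giving its left and right edges on one horizontal line; the function name `join_line_chars` and that data layout are hypothetical, and only the 25% threshold, the look-ahead to the next character, and the dropped end-of-line space come from the message itself.

```python
from typing import List, Tuple

# (character, x0, x1): left and right edges of the glyph (assumed layout).
Char = Tuple[str, float, float]


def join_line_chars(chars: List[Char], gap_ratio: float = 0.25) -> str:
    """Join one line's characters, inserting a space where the gap to the NEXT
    character exceeds gap_ratio times the average character width."""
    if not chars:
        return ""

    avg_width = sum(x1 - x0 for _, x0, x1 in chars) / len(chars)
    parts = []
    for i, (c, _x0, x1) in enumerate(chars):
        is_last = i == len(chars) - 1
        if c == " " and is_last:
            continue  # ignore a space at the end of the line
        parts.append(c)
        if is_last or c == " ":
            continue  # nothing follows, or a space is already present
        next_c, next_x0, _ = chars[i + 1]
        # Look ahead: compare the gap to the next character with the threshold.
        if next_c != " " and next_x0 - x1 > avg_width * gap_ratio:
            parts.append(" ")
    return "".join(parts)
```

For example, with three characters of width 10 where the gap before the third is 9.5, the threshold is 2.5, so `join_line_chars([("a", 0, 10), ("b", 10.5, 20.5), ("c", 30, 40)])` yields `"ab c"`.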
-
myhloli authored
- Update the logic for inserting spaces between characters - Consider the next character's position instead of the previous one - Adjust the spacing threshold to 25% of the average character width - Ignore spaces at the end of lines to prevent double spaces
-
- 30 Dec, 2024 3 commits
-
-
myhloli authored
- Remove use_npu variable initialization - Comment out device assignment and npu check - Comment out use_npu parameter in ModifiedPaddleOCR constructor
-
myhloli authored
- Update `clean_memory.py` to use `torch_npu.npu` instead of `torch.npu` - Update `model_utils.py` to use `torch_npu.npu` instead of `torch.npu` - Simplify NPU availability check and bfloat16 support in `pdf_parse_union_core_v2.py`
-
myhloli authored
- Remove upper version limit for pydantic dependency - This change allows for the use of the latest pydantic version
-
- 27 Dec, 2024 2 commits
-
-
Xiaomeng Zhao authored
fix: s3 path join method
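The commit message does not say what was wrong with the join, so the following is only a sketch of one common pitfall: S3 keys always use `/` as a separator, so joining them with OS-dependent path helpers, or concatenating without trimming slashes, can produce malformed keys. The helper name `join_s3_key` is hypothetical and is not taken from the repository.

```python
def join_s3_key(prefix: str, *parts: str) -> str:
    """Join S3 path segments with '/' regardless of the host OS (illustrative).

    Trailing slashes on the prefix and surrounding slashes on each part are
    trimmed so that no doubled or missing separators appear in the result.
    """
    segments = [prefix.rstrip("/")] + [p.strip("/") for p in parts]
    # Skip empty segments so an empty prefix or part adds no stray slash.
    return "/".join(s for s in segments if s)
```

For example, `join_s3_key("s3://bucket/prefix/", "/images/", "a.png")` returns `"s3://bucket/prefix/images/a.png"`.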
-
icecraft authored
-