- 09 Jan, 2025 2 commits
-
-
Xiaomeng Zhao authored
refactor(langdetect): simplify language detection model
-
myhloli authored
- Remove LangDetectMode and related conditional logic
- Use a single model weight for language detection
- Add logging for language detection results
- Update model initialization and prediction methods
-
- 08 Jan, 2025 9 commits
-
-
myhloli authored
- Add language detection model initialization and integration
- Update model list to include language detection
- Refactor language detection utils for better model management
-
myhloli authored
- Add separate models for Chinese/Japanese and English/French/German detection
- Implement mode-based detection to use appropriate models for different languages
- Update language detection process to use higher DPI for better accuracy
- Modify model initialization and prediction logic to support new language-specific models
-
Xiaomeng Zhao authored
refactor(docs): consolidate MS Office document conversion guides
-
myhloli authored
-
Xiaomeng Zhao authored
docs/update_docs
-
icecraft authored
-
Xiaomeng Zhao authored
fix(pdf_parse): ensure block bounding boxes do not have negative values
-
myhloli authored
- Add logic to set any negative values in block['bbox'] to 0
- This prevents potential errors when processing PDF blocks
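The clamping described in this commit can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes a block is a dict whose `bbox` holds four numbers, and the helper name `clamp_bbox` is hypothetical.

```python
def clamp_bbox(block):
    """Return a copy of the block with negative bbox coordinates set to 0.

    Assumption: block["bbox"] is a sequence of numbers (x0, y0, x1, y1).
    """
    clamped = dict(block)  # shallow copy so the caller's dict is untouched
    clamped["bbox"] = [max(0, v) for v in block["bbox"]]
    return clamped


if __name__ == "__main__":
    block = {"bbox": [-3.5, 10, 200, -1]}
    print(clamp_bbox(block)["bbox"])  # prints [0, 10, 200, 0]
```

Returning a copy rather than mutating in place is a choice made for the sketch; the commit message does not say which approach the real code takes.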
-
Xiaomeng Zhao authored
add test case
-
- 07 Jan, 2025 12 commits
-
-
myhloli authored
-
myhloli authored
-
dt-yy authored
-
dt-yy authored
-
dt-yy authored
-
dt-yy authored
-
Xiaomeng Zhao authored
fix(clear_bu): remove unused input from clear button
-
myhloli authored
Remove 'table_enable' input from the clear button's function call. This change ensures that only necessary inputs are included in the clear operation, improving code efficiency and maintainability.
-
Xiaomeng Zhao authored
feat(api): simplify markdown and content list generation
-
myhloli authored
- Remove DropMode and MakeMode imports from user code
- Set default drop_mode to DropMode.NONE in get_markdown and get_content_list methods
- Remove md_make_mode parameter from get_content_list method
- Add dump_middle_json method to PipeResult
- Update examples in API documentation and demo script
-
Xiaomeng Zhao authored
-
Xiaomeng Zhao authored
-
- 06 Jan, 2025 17 commits
-
-
Xiaomeng Zhao authored
docs(Ascend): update known issues notes
-
myhloli authored
- Revise the description of paddlepaddle using the embedded onnx model to make clear that only Chinese and English OCR is supported
-
Xiaomeng Zhao authored
docs: update README files for v0.10.0 release
-
myhloli authored
- Update README.md and README_zh-CN.md to reflect the latest features and improvements
- Highlight new hybrid OCR text extraction capabilities and performance enhancements
- Emphasize optimized compatibility for ARM architecture Linux systems
- Mention integration with Huawei Ascend NPU acceleration
-
Xiaomeng Zhao authored
docs(ascend): update documentation to add environment requirements for running with docker
-
myhloli authored
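- State explicitly in the documentation that, before running MinerU with docker, the physical machine must have drivers and firmware that support CANN 8.0.RC2 installed
- This update helps users prepare an environment suited to the Ascend NPU and avoid potential runtime problems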
-
Xiaomeng Zhao authored
feat: enable table recognition by default
- Set table recognition to enabled by default in the UI
-
myhloli authored
- Change default layout model to 'doclayout_yolo'
- Enable table recognition in the magic-pdf template
-
Xiaomeng Zhao authored
Dev
-
myhloli authored
- Update Docker run command in both README.md and README_zh-CN.md
- Add command to automatically activate the virtual environment upon container start
- Ensure users have the correct environment setup when accessing the container
-
myhloli authored
- Align python3.10-dev and python3-pip for improved visual consistency
- Enhance Dockerfile formatting without changing functionality
-
Xiaomeng Zhao authored
-
Xiaomeng Zhao authored
docs(README): update for 1.0.0 release and improve documentation
-
myhloli authored
- Update README.md and README_zh-CN.md for 1.0.0 release
- Add new API and compatibility information
- Update links to user guide and documentation
- Improve NPU acceleration section
-
Xiaomeng Zhao authored
fix(table): handle empty OCR result in rapidtable
-
myhloli authored
- Add check for empty OCR result when using PaddleOCR model
- Assign None to ocr_result if no text is detected, preventing further errors
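The empty-result check described in this commit might look roughly like the sketch below. It rests on assumptions the commit message does not confirm: that the OCR call returns a list with one entry per image, and that this entry is `None` or empty when no text is detected. The helper name `normalize_ocr_result` is hypothetical.

```python
def normalize_ocr_result(raw):
    """Return the detections for the first image, or None if there are none.

    Assumption: raw is a list with one entry per image, and that entry is
    None or an empty list when the OCR engine found no text.
    """
    if not raw:          # covers None and an empty outer list
        return None
    first = raw[0]
    if not first:        # covers None and an empty detection list
        return None
    return first


if __name__ == "__main__":
    ocr_result = normalize_ocr_result([None])
    if ocr_result is None:
        print("no text detected; skipping table text matching")
```

Downstream code can then test `ocr_result is None` once, instead of guarding every access to the nested list.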
-
Xiaomeng Zhao authored
docs(README): add Ascend NPU acceleration guide
-