- 21 Feb, 2025 3 commits
myhloli authored
- Add ImportError handling to silence known import-related exceptions
- Improve generic exception handling to log error messages
- Maintain existing specific exception handlers for license-related issues
-
myhloli authored
- Add license verification logic for Ascend plugin
- Handle different license-related exceptions with appropriate error messages
- Log success message with license expiration date if verification passes
- Fall back to CPU model if license verification fails or plugin is not available
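The fallback behaviour described in the license-related commits above can be sketched as follows. This is a minimal illustration, not the project's code. The commit messages do not name the plugin's API, so the following are all assumptions: the module name `_hypothetical_ascend_plugin`, the `LicenseError` class, the injectable `verify` callable, and the "npu"/"cpu" return strings.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class LicenseError(Exception):
    """Hypothetical stand-in for a plugin's license-related exception."""


def _verify_ascend_license() -> str:
    """Hypothetical check that would return the license expiration date.

    The module name is a placeholder; importing a missing module raises
    ModuleNotFoundError, which is a subclass of ImportError.
    """
    import _hypothetical_ascend_plugin  # noqa: F401
    return "unknown"


def select_device(verify: Callable[[], str] = _verify_ascend_license) -> str:
    """Return "npu" if the license check passes, otherwise fall back to "cpu"."""
    try:
        expiration = verify()
    except ImportError:
        # Plugin not installed: fall back quietly instead of logging an error.
        return "cpu"
    except LicenseError as exc:
        # Specific handler for license problems (expired, invalid, ...).
        logger.error("Ascend license verification failed: %s", exc)
        return "cpu"
    except Exception as exc:
        # Generic handler: log the message and keep running on CPU.
        logger.error("Unexpected error while checking the Ascend plugin: %s", exc)
        return "cpu"
    logger.info("Ascend license verified; expires on %s", expiration)
    return "npu"
```

Handling ImportError first keeps a missing plugin silent. Only genuine license or runtime failures produce log output.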
-
myhloli authored
- Update instructions for optimizing AI-generated titles
- Use ast.literal_eval() instead of json.loads() for parsing completion content
- Refactor variable names and logging for better code readability
- Add error handling for JSON decoding issues
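Switching from json.loads() to ast.literal_eval() matters because LLM completions often come back as Python-style literals, such as single-quoted strings, which strict JSON rejects. ast.literal_eval() only evaluates literals, so unlike eval() it will not run code. The sketch below is a minimal illustration that assumes the completion should contain a dict literal. The function name `parse_completion` and the JSON fallback are illustrative and not taken from the commit.

```python
import ast
import json


def parse_completion(content: str):
    """Parse an LLM completion expected to hold a dict literal; None on failure."""
    try:
        # Accepts Python literal syntax, e.g. {'title_1': 1}.
        return ast.literal_eval(content)
    except (ValueError, SyntaxError, TypeError):
        # Fall back to strict JSON, which spells booleans/null as true/false/null.
        try:
            return json.loads(content)
        except json.JSONDecodeError:
            return None
```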
-
- 18 Feb, 2025 3 commits
- 14 Feb, 2025 1 commit
myhloli authored
-
- 11 Feb, 2025 2 commits
myhloli authored
- Move environment variable settings for NPU, MPS, and other configurations to the global scope in doc_analyze_by_custom_model.py
- Remove redundant environment variable settings in pdf_extract_kit.py
- This change ensures consistent configuration across the application and avoids potential conflicts or duplicate settings
-
myhloli authored
-
- 10 Feb, 2025 2 commits
myhloli authored
- Remove redundant imports for StructTableModel and TableMasterPaddleModel
- Reorder imports to group related modules together
- Update import structure for better readability and maintainability
-
myhloli authored
- Remove unused utility functions
- Update import statements for better readability
- Add conditional imports for Ascend plugin
- Refactor table model initialization to support NPU
-
- 09 Feb, 2025 4 commits
myhloli authored
- Update calculate_contrast function to support both RGB and BGR image modes
- Add input validation for image mode in calculate_contrast function
- Modify usage of calculate_contrast function in OCR processing to specify image mode
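Supporting both channel orders comes down to reading the correct channel as red before computing luminance. The sketch below reuses the function name from the commit, but its signature, the "rgb"/"bgr" mode strings, and the contrast formula are assumptions. The formula used here is the standard deviation of the grayscale image divided by its mean. With OpenCV available, cv2.cvtColor with cv2.COLOR_RGB2GRAY or cv2.COLOR_BGR2GRAY would normally handle the grayscale step; plain NumPy keeps this example dependency-light.

```python
import numpy as np


def calculate_contrast(img: np.ndarray, img_mode: str) -> float:
    """Sketch: contrast of an H x W x 3 image given in RGB or BGR order."""
    mode = img_mode.lower()
    # Input validation: reject anything other than the two supported orders.
    if mode not in ("rgb", "bgr"):
        raise ValueError(f"img_mode must be 'rgb' or 'bgr', got {img_mode!r}")
    img = img.astype(np.float64)
    if mode == "bgr":
        img = img[..., ::-1]  # reorder so channel 0 is red
    # ITU-R BT.601 luma weights: 0.299 R + 0.587 G + 0.114 B.
    gray = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    mean = gray.mean()
    if mean == 0:
        return 0.0  # an all-black image has no measurable contrast
    return float(gray.std() / mean)
```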
-
myhloli authored
- Increase batch size from 8 to 256 for language detection inference
- Add timing measurement for language detection process
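Raising the batch size from 8 to 256 changes how many samples each inference call receives. The generic chunking helper below illustrates that change. `batched_predict` and the `predict` callable are hypothetical names, and the language detection model itself is left out.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched_predict(
    items: Sequence[T],
    predict: Callable[[List[T]], List[R]],
    batch_size: int = 256,
) -> List[R]:
    """Call `predict` on consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List[R] = []
    for start in range(0, len(items), batch_size):
        # Slicing past the end is safe; the last chunk may be shorter.
        results.extend(predict(list(items[start:start + batch_size])))
    return results
```

With 600 inputs, a batch size of 256 means three calls (256, 256, and 88 items) instead of 75 calls at a batch size of 8.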
-
myhloli authored
-
myhloli authored
-
- 08 Feb, 2025 2 commits
myhloli authored
- Rename empty_spans to need_ocr_spans for better clarity
- Add calculate_contrast function to measure image contrast
- Filter out low-contrast spans to improve OCR accuracy
- Update OCR processing workflow to use new filtering method
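The filtering step can be sketched as cropping each candidate span from the page image and discarding crops whose contrast falls below a cutoff. This is a minimal illustration under stated assumptions. Each span is a dict with a pixel-space "bbox" of (x0, y0, x1, y1), and the page is a 2-D grayscale array. The 0.2 threshold is an arbitrary placeholder, because the commit does not give a value. The function name `filter_low_contrast_spans` is hypothetical.

```python
import numpy as np


def filter_low_contrast_spans(need_ocr_spans, page_gray, threshold=0.2):
    """Keep spans whose grayscale crop has std/mean contrast >= threshold."""
    kept = []
    for span in need_ocr_spans:
        x0, y0, x1, y1 = (int(v) for v in span["bbox"])
        crop = page_gray[y0:y1, x0:x1].astype(np.float64)
        if crop.size == 0:
            continue  # degenerate bbox: nothing to OCR
        mean = crop.mean()
        contrast = crop.std() / mean if mean > 0 else 0.0
        if contrast >= threshold:
            kept.append(span)
    return kept
```

Blank or near-uniform regions have a contrast close to zero, so they are skipped before any OCR call is made.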
-
myhloli authored
- Uncomment detect_invalid_chars_by_pymupdf function call
- Comment out detect_invalid_chars function call
-
- 07 Feb, 2025 1 commit
myhloli authored
- Update batch ratio calculation logic to better utilize available GPU memory
- Improve logging for all GPU memory sizes
-
- 27 Jan, 2025 2 commits
- 23 Jan, 2025 1 commit
myhloli authored
-
- 22 Jan, 2025 3 commits
myhloli authored
- Add timing measurement for formula, text, and title optimization using LLM
- Log the execution time for each LLM-aided process
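Per-process timing of this kind is commonly implemented with a small context manager around each LLM call. The sketch below is a hypothetical helper and not the commit's code. The name `log_elapsed`, the `timings` dict, and the log format are assumptions.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def log_elapsed(label: str, timings: dict):
    """Time the enclosed block, store seconds in timings[label], and log it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed LLM calls are timed too.
        elapsed = time.perf_counter() - start
        timings[label] = elapsed
        logger.info("%s took %.2f s", label, elapsed)
```

Usage: wrap each stage separately, for example `with log_elapsed("title optimization", timings): ...`, so that formula, text, and title timings are each logged individually.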
-
myhloli authored
- Add a check to return 0 when either bbox1_area or bbox2_area is zero
- This prevents division by zero errors when calculating IoU
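IoU divides the intersection area by the union area. If both boxes have zero area, the union is zero as well. The sketch below adds the guard described in the commit to a standard IoU computation for (x0, y0, x1, y1) boxes. The function name `calculate_iou` is assumed for illustration, and the code is not copied from the project.

```python
def calculate_iou(bbox1, bbox2):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    x_left = max(bbox1[0], bbox2[0])
    y_top = max(bbox1[1], bbox2[1])
    x_right = min(bbox1[2], bbox2[2])
    y_bottom = min(bbox1[3], bbox2[3])
    if x_right < x_left or y_bottom < y_top:
        return 0.0  # no overlap
    intersection_area = (x_right - x_left) * (y_bottom - y_top)
    bbox1_area = (bbox1[2] - bbox1[0]) * (bbox1[3] - bbox1[1])
    bbox2_area = (bbox2[2] - bbox2[0]) * (bbox2[3] - bbox2[1])
    # Guard from the commit: with two zero-area boxes the union below would
    # be zero; returning 0 whenever either area is zero avoids that case.
    if bbox1_area == 0 or bbox2_area == 0:
        return 0.0
    return intersection_area / float(bbox1_area + bbox2_area - intersection_area)
```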
-
myhloli authored
- Restore commented-out code for filtering out characters with invalid bounding boxes
- This change may affect the filtering of unnecessary characters in PDF parsing
-
- 21 Jan, 2025 7 commits
myhloli authored
- Update conditions for batch ratio assignment:
  - 8 <= gpu_memory < 10: batch_ratio = 2
  - 10 <= gpu_memory <= 12: batch_ratio = 4
- This fix ensures proper batch ratio selection for these GPU memory sizes
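The two ranges in this commit map directly onto a pair of comparisons. The commit message does not describe memory sizes below 8 GB or above 12 GB. In the sketch below, those cases fall through to a `default` argument, which is a hypothetical placeholder, and the name `batch_ratio_for` is illustrative.

```python
def batch_ratio_for(gpu_memory_gb: float, default: int = 1) -> int:
    """Map GPU memory (GB) to a batch ratio using the ranges from the commit."""
    if 8 <= gpu_memory_gb < 10:
        return 2
    if 10 <= gpu_memory_gb <= 12:
        return 4
    # Ranges outside 8-12 GB are not covered by the commit message.
    return default
```

The half-open first range and the closed second range mean that exactly 10 GB selects 4, not 2.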
-
myhloli authored
- Improve batch ratio calculation based on GPU memory
- Enhance performance for devices with 8 GB or more VRAM
-
myhloli authored
- Reduce batch_ratio by 1 for better performance and stability
- This change ensures more consistent memory usage when processing documents
-
myhloli authored
refactor(magic_pdf): adjust VRAM allocation and MFR batch size
- Update VRAM allocation logic to use 'VIRTUAL_VRAM_SIZE' environment variable
- Reduce MFR (Math Formula Recognition) batch size from 64 to 32
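An override variable such as VIRTUAL_VRAM_SIZE lets a user cap or simulate the memory budget without changing code. This sketch rests on three assumptions that the commit does not confirm: the variable holds a number of gigabytes, it takes precedence over the detected size, and a malformed value falls back to detection. In a PyTorch setup, the detected size typically comes from torch.cuda.get_device_properties(0).total_memory, which is given in bytes. That call is left out here so the example runs without a GPU. The function name `effective_vram_gb` is hypothetical.

```python
import os
from typing import Optional


def effective_vram_gb(detected_gb: Optional[float]) -> Optional[float]:
    """Prefer VIRTUAL_VRAM_SIZE (assumed to be in GB) over the detected size."""
    raw = os.environ.get("VIRTUAL_VRAM_SIZE")
    if raw:
        try:
            return float(raw)
        except ValueError:
            pass  # ignore malformed values and use the detected size
    return detected_gb
```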
-
myhloli authored
- Update GPU memory check and batch ratio calculation logic
- Add support for virtual VRAM size environment variable
- Improve logging for GPU memory and batch ratio
-
myhloli authored
- Reduce YOLO_LAYOUT_BASE_BATCH_SIZE from 4 to 1
- Simplify batch ratio calculation for formula detection
- Remove unused conditional logic in batch ratio determination
-
myhloli authored
- Update model path from 'unimernet_small' to 'unimernet_small_2501' in multiple scripts and configuration files
- This change affects download_models.py, download_models_hf.py, and model_configs.yaml
-
- 20 Jan, 2025 3 commits
myhloli authored
- Add key length validation for ONNX model initialization
- Move import statements to the top of the file
- Wrap model initialization in a try-except block for better error handling
- Refactor code to improve readability and maintainability
-
myhloli authored
- Add remove_tilted_line function to filter out lines with angles between 2 and 88 degrees
- Integrate the new function into the text extraction process
- Improve the accuracy of text block processing by removing lines that are neither horizontal nor vertical
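One way to implement the 2 to 88 degree filter is to compute each line's angle from its writing-direction vector and fold it into the 0 to 90 range. PyMuPDF's dict-style text extraction gives each line a "dir" entry holding a (cos, sin) pair. This sketch assumes that input shape and treats the bounds as exclusive. The commit does not confirm either choice, so this is an illustration rather than the project's function.

```python
import math


def remove_tilted_line(lines, low=2.0, high=88.0):
    """Drop lines whose direction lies strictly between `low` and `high` degrees."""
    kept = []
    for line in lines:
        cos_a, sin_a = line["dir"]  # assumed PyMuPDF-style unit direction vector
        angle = abs(math.degrees(math.atan2(sin_a, cos_a)))  # 0..180
        # Fold into 0..90 so that, e.g., 178 degrees counts as near-horizontal.
        if angle > 90:
            angle = 180 - angle
        if not (low < angle < high):
            kept.append(line)
    return kept
```

Taking the absolute angle makes the test independent of the y-axis direction, which points down in PDF page coordinates as PyMuPDF reports them.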
-
陆逊 authored
-
- 17 Jan, 2025 3 commits
myhloli authored
- Added instructions for checking whether heading levels are reasonable
- Included guidelines for making fine adjustments based on context and logic
- Emphasized the importance of aligning the final result with the document's actual structure
-
myhloli authored
- Commented out the original batch ratio calculation
- Set a fixed batch ratio of 2 for GPUs with less than 8 GB memory
- Increased batch ratio to 4 for GPUs with 8 GB or more memory
-
myhloli authored
- Import get_device function from magic_pdf.libs.config_reader
- Update RapidTableModel initialization to include device parameter for Unitable model
-
- 16 Jan, 2025 3 commits
myhloli authored
- Modify the batch analysis process to handle the rapid table model's output
- Add logic_points variable to capture additional output from rapid table prediction
-
myhloli authored
- Update RapidTable dependency to version 1.0.3
- Add support for sub-models in RapidTable
- Update magic-pdf configuration to include table sub-model
- Modify table model initialization to support sub-models
- Update table prediction logic to handle new output format
-
myhloli authored
- Adjust end_page_id calculation to prevent IndexError when accessing pages
- Enhance error handling in LLM post-processing by specifically catching JSONDecodeError
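The IndexError fix amounts to clamping the requested end page to the last valid index before any page is accessed. This is a minimal sketch, not the project's code. It assumes 0-based page ids and assumes that None or a negative value means "through the last page". The function name `resolve_end_page_id` is hypothetical.

```python
def resolve_end_page_id(end_page_id, page_count):
    """Clamp a requested 0-based end page id to the last valid page."""
    last = page_count - 1
    # None or negative: process through the last page (assumed convention).
    # Past the end: clamp instead of raising IndexError on page access later.
    if end_page_id is None or end_page_id < 0 or end_page_id > last:
        return last
    return end_page_id
```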
-