- 03 Mar, 2025 1 commit
-
-
myhloli authored
- Add performance_stats module to measure and print execution time statistics - Implement measure_time decorator to track execution time of key functions - Remove multi-threading in pdf parsing for better resource management - Optimize pdf parsing logic for improved performance
-
- 28 Feb, 2025 1 commit
-
-
myhloli authored
- Add ThreadPoolExecutor to process PDF pages in parallel - Create separate function for page processing to improve readability and maintainability - Include error handling for individual page processing tasks - Log total page processing time for performance monitoring
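The pattern described here (a separate per-page function, per-task error handling, and a logged total time) can be sketched roughly as follows. The function names, the `max_workers` default, and the string "pages" are assumptions for illustration only; the 03 Mar commit in this log later removes multi-threading from PDF parsing.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_page(page_index, page):
    # Hypothetical per-page work; a real parser would do far more here.
    if page is None:
        raise ValueError(f"page {page_index} is empty")
    return page.upper()


def process_pages(pages, max_workers=4):
    """Process pages in parallel, isolating failures to individual pages."""
    results, errors = {}, {}
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its page index.
        futures = {pool.submit(process_page, i, p): i
                   for i, p in enumerate(pages)}
        for future in as_completed(futures):
            index = futures[future]
            try:
                results[index] = future.result()
            except Exception as exc:
                # One failing page does not abort the others.
                errors[index] = exc
    elapsed = time.perf_counter() - start
    print(f"processed {len(pages)} pages in {elapsed:.4f}s")
    return results, errors
```

Results are keyed by page index rather than returned in completion order, so the caller can reassemble pages deterministically.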
-
- 27 Feb, 2025 1 commit
-
-
myhloli authored
- Update condition to only convert full-width letters and numbers - Remove separate case for full-width space
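As an illustration of the narrowed condition, the sketch below converts only full-width Latin letters and digits and leaves everything else, including the full-width space (U+3000) and full-width punctuation, untouched. The name `full_to_half` comes from the 25 Feb commit in this log; the body is an assumption, not the project's code. Full-width ASCII variants sit at a fixed offset of 0xFEE0 from their half-width counterparts.

```python
def full_to_half(text: str) -> str:
    """Convert full-width letters and digits to half-width; keep the rest."""
    out = []
    for ch in text:
        code = ord(ch)
        is_upper = 0xFF21 <= code <= 0xFF3A   # full-width A-Z
        is_lower = 0xFF41 <= code <= 0xFF5A   # full-width a-z
        is_digit = 0xFF10 <= code <= 0xFF19   # full-width 0-9
        if is_upper or is_lower or is_digit:
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)
```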
-
- 26 Feb, 2025 5 commits
-
-
Xiaomeng Zhao authored
-
icecraft authored
-
Xiaomeng Zhao authored
refactor(magic_pdf): remove bfloat16 support checks and usage
-
myhloli authored
- Replace complex device selection logic with a single line using torch.device - Remove redundant checks and imports for better readability and maintainability
-
myhloli authored
- Remove supports_bfloat16 variable and related checks - Remove model.bfloat16() call for LayoutLMv3ForTokenClassification - Simplify device selection logic
-
- 25 Feb, 2025 5 commits
-
-
Xiaomeng Zhao authored
perf(model): optimize batch analyze process
-
myhloli authored
- Implement full_to_half function to convert full-width characters to half-width - Apply conversion to span content before merging paragraphs - Improve text processing for better readability and consistency
-
myhloli authored
- Move batch model initialization outside the loop - Collect page dimensions before analyzing - Update page info dictionary structure - Add null dimensions for non-analyzed pages
-
Xiaomeng Zhao authored
docs(ascend): update Ascend NPU acceleration documentation
-
myhloli authored
- Add information about OS and CANN version compatibility - Include details on high-performance mode and its requirements - Update paddlepaddle-gpu installation instructions for CUDA acceleration - Remove unnecessary empty line from changelog
-
- 24 Feb, 2025 11 commits
-
-
Xiaomeng Zhao authored
docs(README): update release notes for version 1.2.0
-
myhloli authored
- Update English and Chinese README files with the changelog for version 1.2.0 - Include details on performance optimizations, parsing improvements, and bug fixes - Highlight specific enhancements for PDF document classification, watermark handling, and layout matching
-
myhloli authored
-
myhloli authored
- Update English and Chinese README files with the changelog for version 1.2.0 - Include details on performance optimizations, parsing improvements, and bug fixes - Highlight specific enhancements for PDF document classification, watermark handling, and layout matching
-
Xiaomeng Zhao authored
feat(pre_proc): add block type compatibility check for span allocation
-
myhloli authored
- Introduce span_block_type_compatible function to check compatibility between span and block types - Update fill_spans_in_blocks function to use the new compatibility check - Improve accuracy of span allocation to blocks based on content type
-
Xiaomeng Zhao authored
fix(llm_aided): update prompt
-
myhloli authored
-
myhloli authored
- Update the logic for determining `end_page_id` to handle negative values - This change ensures proper behavior when `end_page_id` is set to -1 or other negative values
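One plausible reading of this change, sketched under stated assumptions: a negative (or missing) `end_page_id` means "through the last page". The helper name `resolve_end_page_id` and the clamping of oversized values are hypothetical and not confirmed by the commit message.

```python
def resolve_end_page_id(end_page_id, page_count):
    """Return a valid zero-based index for the last page to process."""
    last = page_count - 1
    # Assumption: None and any negative value select the final page.
    if end_page_id is None or end_page_id < 0:
        return last
    # Assumption: values past the end are clamped rather than rejected.
    return min(end_page_id, last)
```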
-
Xiaomeng Zhao authored
fix #1747
-
Xiaomeng Zhao authored
Update ext.py is_pdf function to support PDFs with Chinese characters and special characters
-
- 23 Feb, 2025 2 commits
-
-
Xiaomeng Zhao authored
chore(magic_pdf): enhance license logging information
-
myhloli authored
- Add license ID information to the log for better traceability - Improve logging format to include both license ID and expiration date
-
- 22 Feb, 2025 3 commits
-
-
Nathan Dahlberg authored
-
github-actions[bot] authored
-
sayThQ199 authored
Determine whether a file name with a .pdf extension may include special characters or Chinese characters.
-
- 21 Feb, 2025 6 commits
-
-
Xiaomeng Zhao authored
fix(model): handle import errors and improve exception logging
-
myhloli authored
- Add ImportError handling to silence known import-related exceptions - Improve generic exception handling to log error messages - Maintain existing specific exception handlers for license-related issues
-
Xiaomeng Zhao authored
feat(model_init): implement license verification for Ascend plugin
-
myhloli authored
- Add license verification logic for Ascend plugin - Handle different license-related exceptions with appropriate error messages - Log success message with license expiration date if verification passes - Fall back to CPU model if license verification fails or plugin is not available
-
Xiaomeng Zhao authored
refactor(magic_pdf): improve title optimization process
-
myhloli authored
- Update instructions for AI-generated titles optimization - Use ast.literal_eval() instead of json.loads() for parsing completion content - Refactor variable names and logging for better code readability - Add error handling for JSON decoding issues
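A rough sketch of parsing model output with `ast.literal_eval()` and guarding against malformed content. The function name `parse_completion` and the choice to return `None` on failure are assumptions for illustration. `ast.literal_eval()` accepts Python literals such as single-quoted strings and trailing commas, which `json.loads()` rejects, but it does not accept the JSON keywords `true`, `false`, or `null`.

```python
import ast


def parse_completion(content: str):
    """Parse a Python-literal string from a completion; None if malformed."""
    try:
        return ast.literal_eval(content.strip())
    except (ValueError, SyntaxError):
        # Malformed or non-literal content; the caller decides how to recover.
        return None
```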
-
- 19 Feb, 2025 2 commits
-
-
Xiaomeng Zhao authored
docs(windows): add numpy version limit for CUDA installation
-
Xiaomeng Zhao authored
-
- 18 Feb, 2025 3 commits
-
-
Xiaomeng Zhao authored
fix: ut
-
Xiaomeng Zhao authored
Fix/caption match
-
myhloli authored
- Add numpy version limit (<2.0.0) in the pip installation command for Windows CUDA acceleration - Update both English and Chinese versions of the README
-