- 04 Mar, 2025 1 commit
-
-
myhloli authored
- Optimize paragraph splitting algorithm for better text block separation - Update fast-langdetect dependency to ensure compatibility
-
- 24 Feb, 2025 1 commit
-
-
myhloli authored
-
- 21 Feb, 2025 1 commit
-
-
myhloli authored
- Update instructions for AI-generated title optimization - Use ast.literal_eval() instead of json.loads() for parsing completion content - Refactor variable names and logging for better code readability - Add error handling for JSON decoding issues
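
The switch from `json.loads()` to `ast.literal_eval()` can be illustrated with a minimal sketch. It assumes the model replies with a Python-style dict literal using integer keys, which is not valid JSON. The function name `parse_completion` and the sample reply are hypothetical and not taken from the repository.

```python
import ast
import json


def parse_completion(content: str):
    """Parse a Python-literal reply; return None if it cannot be parsed.

    Hypothetical helper for illustration only.
    """
    try:
        return ast.literal_eval(content.strip())
    except (ValueError, SyntaxError):
        # literal_eval rejects anything that is not a plain literal,
        # including JSON-only tokens such as true/false/null.
        return None


# Assumed reply shape: title index -> heading level, with integer keys.
raw = "{0: 1, 1: 2, 2: 2}"

try:
    json.loads(raw)
    json_ok = True
except json.JSONDecodeError:
    # JSON requires double-quoted string keys, so this reply is rejected.
    json_ok = False

parsed = parse_completion(raw)
```

Note the trade-off: `ast.literal_eval()` accepts integer keys and single-quoted strings, but it does not understand JSON's `true`, `false`, or `null`.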
-
- 17 Jan, 2025 1 commit
-
-
myhloli authored
- Added instructions for checking the reasonability of heading levels - Included guidelines for making fine adjustments based on context and logic - Emphasized the importance of aligning the final result with the document's actual structure
-
- 16 Jan, 2025 1 commit
-
-
myhloli authored
- Adjust end_page_id calculation to prevent IndexError when accessing pages - Enhance error handling in LLM post-processing by specifically catching JSONDecodeError
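
One way to guard against this kind of `IndexError` is to clamp the requested end page to the last valid index before iterating. The sketch below is an assumption about the general technique, not the repository's code. The function name `clamp_end_page_id` and its handling of `None` and negative values are hypothetical.

```python
from typing import Optional


def clamp_end_page_id(end_page_id: Optional[int], page_count: int) -> int:
    """Return a valid last-page index for a document with page_count pages.

    Hypothetical helper: a missing, negative, or too-large value falls
    back to the last page.
    """
    last_index = page_count - 1
    if end_page_id is None or end_page_id < 0 or end_page_id > last_index:
        return last_index
    return end_page_id


pages = ["page-0", "page-1", "page-2"]
end = clamp_end_page_id(10, len(pages))
# Safe: end is never larger than the last valid index.
selected = pages[: end + 1]
```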
-
- 15 Jan, 2025 1 commit
-
-
myhloli authored
- Clarify the expected format for the optimized title list JSON output - Emphasize the need to return only the title levels in the specified format
-
- 14 Jan, 2025 1 commit
-
-
myhloli authored
- Add average line height calculation for title blocks - Include page number in title dictionary - Improve title optimization prompt for better hierarchy - Implement retry mechanism for JSON decoding errors - Add error logging for title count mismatch
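
The retry and count-check behaviour described here can be sketched as a small loop. This is a minimal illustration under stated assumptions: the `ask_model` callable, the `max_retries` default, and the list-of-levels reply format are all hypothetical, and the real code may call the OpenAI SDK and log differently.

```python
import json
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


def levels_with_retry(
    ask_model: Callable[[], str],
    expected_count: int,
    max_retries: int = 3,
) -> Optional[List[int]]:
    """Ask for title levels until the reply parses and has the right length."""
    for attempt in range(1, max_retries + 1):
        try:
            levels = json.loads(ask_model())
        except json.JSONDecodeError as exc:
            logger.warning("attempt %d: invalid JSON (%s)", attempt, exc)
            continue
        if isinstance(levels, list) and len(levels) == expected_count:
            return levels
        # The reply parsed, but does not match the number of titles sent.
        logger.error("attempt %d: title count mismatch", attempt)
    return None


# Usage with canned replies standing in for a real model call.
replies = iter(["not json", "[1, 2]", "[1, 2, 2]"])
result = levels_with_retry(lambda: next(replies), expected_count=3)
```

Returning `None` after the last attempt lets the caller keep the original, unoptimized title levels instead of failing the whole parse.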
-
- 03 Jan, 2025 1 commit
-
-
myhloli authored
- Implement ONNXModelSingleton to manage ONNX models - Modify ModifiedPaddleOCR to use ONNX models on ARM CPUs without CUDA - Update RapidTableModel to use RapidOCR with ONNXRuntime on CPU - Add rapidocr_onnxruntime dependency in setup.py
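
The singleton idea behind a class like `ONNXModelSingleton` can be sketched without any ONNX dependency: one shared instance holds a cache so each model is loaded at most once. The class name `ModelCache`, the `get` method, and the `loader` callable are hypothetical stand-ins, and the repository's actual interface may differ.

```python
from typing import Any, Callable, Dict, Hashable


class ModelCache:
    """Process-wide cache: every ModelCache() call returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._models: Dict[Hashable, Any] = {}
        return cls._instance

    def get(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached model for key, calling loader only on first use."""
        if key not in self._models:
            self._models[key] = loader()
        return self._models[key]


load_calls = []


def fake_loader():
    # Stand-in for an expensive step such as creating an inference session.
    load_calls.append(1)
    return object()


first = ModelCache().get("det-model", fake_loader)
second = ModelCache().get("det-model", fake_loader)
```

Both lookups return the same object and the loader runs once, which is the point of the pattern when model initialization is slow.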
-
- 25 Dec, 2024 2 commits
-
-
myhloli authored
- Comment out logging statements for title list, title completion, and length comparison - Improve code readability and reduce clutter by removing unused debug information
-
myhloli authored
- Implement llm_aided_title function to optimize document titles using LLM - Update pdf_parse_union_core_v2.py to include title optimization - Modify ocr_mkcontent.py to use optimized title levels - Add openai SDK dependency in setup.py
-
- 24 Dec, 2024 1 commit
-
-
myhloli authored
- Add LLM-aided formula and text correction functionality - Update config reader to include LLM-aided settings - Create new LLM-aided processing module - Update main processing script to incorporate LLM-aided corrections - Modify download scripts to check for new config version
-
- 12 Dec, 2024 1 commit
-
-
myhloli authored
- Add initial setup for layout detection - Implement conditional cropping for tall images - Skip cropping for wide images to improve performance - Reuse Image object across layout detection steps
-
- 26 Nov, 2024 1 commit
-
-
myhloli authored
-
- 19 Nov, 2024 1 commit
-
-
icecraft authored
-
- 11 Apr, 2024 1 commit
-
-
赵小蒙 authored
2. Implement UNIPipe
-
- 05 Mar, 2024 1 commit
-
-
赵小蒙 authored
-
- 01 Mar, 2024 1 commit
-
-
赵小蒙 authored
-