- 28 Nov, 2024 6 commits
-
-
Xiaomeng Zhao authored
refactor(para): improve language detection and block splitting
-
myhloli authored
- Add language detection for each block of text
- Implement language-specific logic for right margin alignment
- Introduce logging for debugging purposes
-
Xiaomeng Zhao authored
fix(Hybrid OCR): Enable Hybrid OCR for Empty Spans That Contain a Certain Number of Placeholders but No Actual Text
-
myhloli authored
fix(Hybrid OCR): Enable Hybrid OCR for Empty Spans That Contain a Certain Number of Placeholders but No Actual Text
-
Xiaomeng Zhao authored
fix(lite_model): Adapt lite Mode to the Hybrid OCR Mode in Version 0.10
-
myhloli authored
-
- 27 Nov, 2024 18 commits
-
-
Xiaomeng Zhao authored
master -> dev
-
myhloli authored
-
Xiaomeng Zhao authored
Release 0.10.2
-
Xiaomeng Zhao authored
refactor(pdf_parse_union_core_v2): optimize page processing time logging
-
myhloli authored
-
Xiaomeng Zhao authored
Feat/add s3 read write example
-
xu rui authored
-
Xiaomeng Zhao authored
docs(README): remove code examples and redirect to documentation
-
myhloli authored
- Remove command line and API code examples from README files
- Add links to online documentation for command line and API usage
- Update content to point users to the new locations for detailed information
-
icecraft authored
-
Xiaomeng Zhao authored
refactor(ocr): remove unused functions and optimize OCR processing loop
-
myhloli authored
- Remove unused function `calculate_angle_degrees`
- Refactor `calculate_is_angle` so it is used directly in OCR processing
- Eliminate unnecessary loop index `idx` in OCR processing loops
-
myhloli authored
- Remove commented-out code in ocr_dict_merge.py
- Improve imports and code organization in ocr_detect_all_bboxes.py
- Delete unnecessary empty lines and improve code readability
-
Xiaomeng Zhao authored
refactor(libs): remove unused imports and functions
-
myhloli authored
- Remove unused imports from commons.py
- Delete unused functions related to AWS and S3 operations
- Update import statements in other modules to reflect changes in commons.py
- Remove redundant code and improve code readability
-
Xiaomeng Zhao authored
test: json minify
-
myhloli authored
-
Xiaomeng Zhao authored
fix: test_tools unittest
-
- 26 Nov, 2024 16 commits
-
-
Xiaomeng Zhao authored
perf(image_processing): reduce maximum image size for analysis
-
myhloli authored
- Decrease the maximum image size threshold from 9000 to 4500 pixels
- This change aims to improve performance and reduce memory usage
- Affects the custom model document analysis process
-
Xiaomeng Zhao authored
fix: test_rag
-
icecraft authored
-
icecraft authored
-
Xiaomeng Zhao authored
refactor: remove deprecated markdown_utils function
-
myhloli authored
-
Xiaomeng Zhao authored
test: Skip some failing test cases
-
myhloli authored
-
Xiaomeng Zhao authored
refactor(pre_proc): remove unused functions and simplify code
-
myhloli authored
- Remove unused imports and functions across multiple files
- Simplify code by deleting unnecessary comments and empty lines
- Update function signatures to match actual usage
- Replace redundant code with more efficient alternatives
-
Xiaomeng Zhao authored
refactor(magic_pdf): remove unused functions and simplify code
-
myhloli authored
-
Xiaomeng Zhao authored
refactor(magic_pdf): remove unused functions and simplify code
-
myhloli authored
-
Xiaomeng Zhao authored
feat(pdf_parse): improve text extraction for vertical spans
-