- 26 Nov, 2024 3 commits
myhloli authored
- Disable the assertion for bool_classify_by_text_layout to skip this test
myhloli authored
- Add OCR score to span dictionary when OCR text is applied
- Improve data integrity by including confidence score
myhloli authored
- Add confidence score threshold to filter out low-confidence OCR results
- Improve OCR accuracy by ignoring less certain detections
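Taken together, the two OCR-score commits above attach a confidence score to each span and drop results below a threshold. The sketch below shows that combination under stated assumptions: spans are plain dicts with a `bbox`, OCR output arrives as `(text, score)` pairs aligned one-to-one with the spans, and the function name `fill_spans_from_ocr` and the 0.5 threshold are hypothetical, not taken from the project.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical threshold; the commit does not state the project's actual value.
OCR_SCORE_THRESHOLD = 0.5


def fill_spans_from_ocr(
    spans: List[Dict[str, Any]],
    ocr_results: List[Tuple[str, float]],
    threshold: float = OCR_SCORE_THRESHOLD,
) -> List[Dict[str, Any]]:
    """Attach OCR text and its confidence score to each span.

    Assumes ocr_results[i] is the (text, score) pair for spans[i].
    Spans whose score is below `threshold` are dropped.
    """
    kept = []
    for span, (text, score) in zip(spans, ocr_results):
        if score < threshold:
            continue  # ignore less certain detections
        # Build a new dict so the caller's span objects are not mutated.
        kept.append({**span, "content": text, "score": score})
    return kept
```

Keeping the score on the span (rather than discarding it after filtering) leaves later stages free to apply their own, stricter cut-offs.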
- 25 Nov, 2024 8 commits
myhloli authored
- Add checks for uppercase character start in the first span of a block
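The uppercase-start commit does not show the check itself. A minimal sketch, assuming a hypothetical block layout of `lines`, each holding `spans` with a `content` string, and a hypothetical helper name `block_starts_with_uppercase`:

```python
from typing import Any, Dict


def block_starts_with_uppercase(block: Dict[str, Any]) -> bool:
    """Return True if the first span of the block's first line starts uppercase."""
    lines = block.get("lines", [])
    if not lines or not lines[0].get("spans"):
        return False  # no first span to inspect
    content = lines[0]["spans"][0].get("content", "")
    # Guard against empty content before indexing the first character.
    return bool(content) and content[0].isupper()
```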
myhloli authored
- Optimize character sorting for accurate text assembly
- Handle empty char scenarios to prevent errors
- Remove unnecessary comments and improve code readability
- Enhance OCR text content handling by removing low-confidence spans
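The commit above sorts characters before assembling span text and guards against empty character lists. The log does not say which sort key the project uses; this sketch assumes, hypothetically, that each character is a dict `{"c": ..., "bbox": [x0, y0, x1, y1]}` and orders characters by the horizontal midpoint of their boxes.

```python
from typing import Any, Dict, List


def assemble_span_text(chars: List[Dict[str, Any]]) -> str:
    """Join a span's characters into text, ordered left to right."""
    if not chars:
        return ""  # empty span: nothing to assemble, avoid indexing errors
    # Sort by the horizontal midpoint of each character's bounding box.
    ordered = sorted(chars, key=lambda ch: (ch["bbox"][0] + ch["bbox"][2]) / 2)
    return "".join(ch["c"] for ch in ordered)
```

Sorting by midpoint rather than left edge is one way to stay stable when glyph boxes overlap slightly.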
myhloli authored
myhloli authored
myhloli authored
- Merge useful_spans and unuseful_spans handling
- Simplify overlap ratio calculation and block type checking
- Remove unnecessary span removal and re-addition
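An overlap ratio of the kind mentioned in the commit above is commonly computed as the intersection area divided by the area of the smaller (inner) box. The sketch below uses that definition as an assumption; the function name `overlap_ratio` is hypothetical and boxes are `[x0, y0, x1, y1]`.

```python
from typing import Sequence


def overlap_ratio(inner: Sequence[float], outer: Sequence[float]) -> float:
    """Fraction of `inner`'s area that lies inside `outer`."""
    # Corners of the intersection rectangle.
    x0, y0 = max(inner[0], outer[0]), max(inner[1], outer[1])
    x1, y1 = min(inner[2], outer[2]), min(inner[3], outer[3])
    # Clamp to zero when the boxes do not intersect.
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return inter / area if area > 0 else 0.0
```

A span could then be assigned to a block when, for example, `overlap_ratio(span_bbox, block_bbox)` exceeds some cut-off; the cut-off value is not given in this log.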
myhloli authored
fix(pdf_parse): Move the logic for filling text content into spans before the discarded_block recognition to fix the issue of empty text blocks in discarded_block.
myhloli authored
- Use os.path.join to construct file paths for better cross-platform compatibility
- Remove hardcoded file path
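The portability change above replaces hardcoded separators with `os.path.join`, which inserts the separator appropriate to the running platform. A minimal sketch; the helper name `build_output_path` and the `output` subdirectory are hypothetical.

```python
import os


def build_output_path(base_dir: str, pdf_name: str, suffix: str = ".md") -> str:
    """Build an output path portably instead of concatenating strings with '/'."""
    return os.path.join(base_dir, "output", pdf_name + suffix)
```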
myhloli authored
- Add is_draw_visualization_bbox parameter to enable/disable visualization of bounding boxes
- Refactor the parsing process to improve code readability and maintainability
- Update function documentation to reflect new parameter
- Simplify test code by using a more generic variable name
- 24 Nov, 2024 4 commits
Xiaomeng Zhao authored
Fix/demo
icecraft authored
icecraft authored
icecraft authored
- 22 Nov, 2024 19 commits
Xiaomeng Zhao authored
master -> dev
Xiaomeng Zhao authored
Xiaomeng Zhao authored
myhloli authored
Xiaomeng Zhao authored
Release 0.10.0
Xiaomeng Zhao authored
fix(pdf_parse): improve OCR result handling
Xiaomeng Zhao authored
fix(pdf_parse): improve OCR result handling
myhloli authored
- Add null check for OCR results to prevent errors on empty lists
- Enhance robustness of OCR text processing in the magic-pdf project
Xiaomeng Zhao authored
fix(table): add null check for OCR result in rapid table prediction
Xiaomeng Zhao authored
refactor(model): move page total time logging to custom model analysis
myhloli authored
- Move page total time logging to doc_analyze_by_custom_model.py
- Remove page total time logging from pdf_extract_kit.py
- Add page_start timing variable to custom model analysis
- Update logger output format for page total time
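The timing commit above records a `page_start` value and logs the total time per page. A minimal sketch of that pattern; it uses the standard `logging` module and `time.perf_counter` as stand-ins (the project's own logger and message format may differ), and `analyze_pages` and `analyze_page` are hypothetical names.

```python
import logging
import time
from typing import Any, Callable, List

logger = logging.getLogger(__name__)


def analyze_pages(pages: List[Any], analyze_page: Callable[[Any], Any]) -> List[Any]:
    """Run `analyze_page` on each page and log per-page wall-clock time."""
    results = []
    for page_idx, page in enumerate(pages):
        page_start = time.perf_counter()  # monotonic clock suited to measuring durations
        results.append(analyze_page(page))
        page_total = time.perf_counter() - page_start
        logger.info("page_idx: %d, page total time: %.2f s", page_idx, page_total)
    return results
```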
Xiaomeng Zhao authored
fix(table): add null check for OCR result in rapid table prediction
myhloli authored
- Add a null check for OCR result in the predict method
- Return None values if OCR result is None to prevent further processing
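The guard described above stops table prediction early when OCR returns nothing. This sketch assumes, hypothetically, that OCR and table recognition are injected as callables and that prediction returns a two-item tuple; the real `predict` method's signature and number of return values are not shown in this log. The `not ocr_result` test also covers the empty-list case from the earlier null-check commit.

```python
from typing import Any, Callable, List, Optional, Tuple

OcrFn = Callable[[Any], Optional[List[Any]]]
TableFn = Callable[[Any, List[Any]], Tuple[str, List[Any]]]


def predict_table(
    image: Any, run_ocr: OcrFn, run_table: TableFn
) -> Tuple[Optional[str], Optional[List[Any]]]:
    """Recognize a table, returning (None, None) when OCR finds nothing."""
    ocr_result = run_ocr(image)
    if not ocr_result:  # covers both None and an empty list
        return None, None  # skip table recognition instead of failing downstream
    html, cell_bboxes = run_table(image, ocr_result)
    return html, cell_bboxes
```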
Xiaomeng Zhao authored
feat(README): update for v0.10.0
myhloli authored
myhloli authored
- Introduced hybrid OCR text extraction capabilities in v0.10.0
- Significantly improved parsing performance in complex text distribution scenarios
- Combined the accurate, faster content extraction of text mode with the more precise span/line region recognition of OCR mode
- Updated both English and Chinese README files
Xiaomeng Zhao authored
refactor(para): improve line stop flag and remove unused debug mode
myhloli authored
- Add '-' and '–' to LINE_STOP_FLAG in pdf_parse_union_core_v2.py
- Remove unused debug_mode parameter from para_split function in para_split_v3.py
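The commit above adds the hyphen '-' and en dash '–' to LINE_STOP_FLAG. This log does not show how the tuple is consumed; the sketch assumes, hypothetically, a check of whether a line's last non-space character is a stop flag, and it uses an illustrative subset of characters rather than the project's full tuple. Python's `str.endswith` accepts a tuple, so the membership test is a single call.

```python
# Illustrative subset only; not the project's full LINE_STOP_FLAG tuple.
LINE_STOP_FLAG = ('.', '!', '?', '。', '！', '？', ':', '：', ';', '；', '-', '–')


def line_ends_with_stop_flag(line_text: str) -> bool:
    """Return True if the line, ignoring trailing whitespace, ends with a stop flag."""
    return line_text.rstrip().endswith(LINE_STOP_FLAG)
```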
Alex Liu authored
- delete unused pipeline file
- add json test circle
- add size reduction test case
- add serializable test case
- add invalid json compress test case
- add empty test case
- add special char test case
- 21 Nov, 2024 6 commits
Xiaomeng Zhao authored
test: comment out assertions for metascan classify and meta scan tests
myhloli authored
- Commented out assertions in test_metascan_classify/test_classify.py
- Commented out assertions in test_metascan_classify/test_meta_scan.py
- This change affects multiple test cases across both test files
Xiaomeng Zhao authored
fix(pdf_parse): improve line stop flag detection accuracy
myhloli authored
- Add an additional condition to the line stop flag check
- Ensure character is to the right of the span's left boundary
- This change helps reduce false positives in line stop detection
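One way the added left-boundary condition can matter: if stop-flag characters are accepted into a span when they start within one span-height of its right edge, then for a span narrower than it is tall that window reaches past the span's left edge. Requiring the character's center to lie right of the left boundary rejects such characters. The sketch below illustrates that reasoning only; the function name `stop_flag_char_in_span`, the flag subset, and the exact window are assumptions, not the project's code. Boxes are `[x0, y0, x1, y1]`.

```python
from typing import Sequence

# Hypothetical subset of stop-flag characters.
LINE_STOP_FLAG = ('.', ',', ';', ':', '!', '?', ')', '-', '–')


def stop_flag_char_in_span(
    char: str, char_bbox: Sequence[float], span_bbox: Sequence[float]
) -> bool:
    """Decide whether a stop-flag char near a span's right edge belongs to the span."""
    if char not in LINE_STOP_FLAG:
        return False
    char_center_x = (char_bbox[0] + char_bbox[2]) / 2
    char_center_y = (char_bbox[1] + char_bbox[3]) / 2
    span_height = span_bbox[3] - span_bbox[1]
    # Char starts within one span-height of the span's right edge.
    near_right_edge = span_bbox[2] - span_height < char_bbox[0] < span_bbox[2]
    # The added condition: char center must be right of the span's left boundary.
    right_of_left_boundary = char_center_x > span_bbox[0]
    vertically_inside = span_bbox[1] < char_center_y < span_bbox[3]
    return near_right_edge and right_of_left_boundary and vertically_inside
```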
Xiaomeng Zhao authored
fix: use concrete class instead of abstract class
icecraft authored