- 16 Apr, 2025 1 commit
myhloli authored
- 14 Apr, 2025 1 commit
myhloli authored
- 12 Apr, 2025 1 commit
myhloli authored
- 08 Apr, 2025 1 commit
myhloli authored
- 03 Apr, 2025 1 commit
myhloli authored
- 01 Apr, 2025 1 commit
myhloli authored
  - Enhance the logging of execution times by adding more detailed function identification
  - Implement class name and module name inclusion for better traceability
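The commit above mentions including class and module names in the timing logs. A minimal sketch of one way to build such an identifier, assuming Python's standard `__module__` and `__qualname__` attributes are used. The helper name `qualified_name` and the `Parser` class are hypothetical and not taken from the repository:

```python
def qualified_name(func):
    """Return a 'module.Class.method' style identifier for a callable."""
    # __qualname__ already contains the enclosing class name for methods,
    # so only the module name has to be prepended.
    return f"{func.__module__}.{func.__qualname__}"


class Parser:
    """Hypothetical class used only to demonstrate the identifier."""

    def parse(self):
        return "ok"


if __name__ == "__main__":
    # Prints the module name followed by "Parser.parse".
    print(qualified_name(Parser.parse))
```

Using `__qualname__` rather than `__name__` keeps methods of different classes with the same name apart in the logs.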
- 07 Mar, 2025 1 commit
myhloli authored
  - Remove PIL usage across multiple files
  - Convert image processing functions to use NumPy arrays
  - Update crop_img function to work with NumPy arrays
  - Modify image loading and resizing to use NumPy and OpenCV
  - Clean up unused imports and comments related to PIL
- 04 Mar, 2025 1 commit
myhloli authored
- 03 Mar, 2025 2 commits
myhloli authored
myhloli authored
  - Add performance_stats module to measure and print execution time statistics
  - Implement measure_time decorator to track execution time of key functions
  - Remove multi-threading in pdf parsing for better resource management
  - Optimize pdf parsing logic for improved performance
- 27 Feb, 2025 1 commit
myhloli authored
- 09 Feb, 2025 1 commit
myhloli authored
  - Increase batch size from 8 to 256 for language detection inference
  - Add timing measurement for language detection process
- 23 Jan, 2025 1 commit
myhloli authored
- 22 Jan, 2025 1 commit
myhloli authored
  - Add a check to return 0 when either bbox1_area or bbox2_area is zero
  - This prevents division by zero errors when calculating IoU
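The zero-area guard from this commit can be illustrated with a small IoU function. This is a sketch under assumptions: boxes are taken to be `(x0, y0, x1, y1)` tuples, and the function name and body are written for illustration rather than copied from the repository. Only the names `bbox1_area` and `bbox2_area` come from the commit message.

```python
def calculate_iou(bbox1, bbox2):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    # Corners of the intersection rectangle.
    x_left = max(bbox1[0], bbox2[0])
    y_top = max(bbox1[1], bbox2[1])
    x_right = min(bbox1[2], bbox2[2])
    y_bottom = min(bbox1[3], bbox2[3])

    # The boxes do not overlap at all.
    if x_right < x_left or y_bottom < y_top:
        return 0.0

    intersection_area = (x_right - x_left) * (y_bottom - y_top)
    bbox1_area = (bbox1[2] - bbox1[0]) * (bbox1[3] - bbox1[1])
    bbox2_area = (bbox2[2] - bbox2[0]) * (bbox2[3] - bbox2[1])

    # Guard described in the commit: a degenerate box yields IoU 0.
    # Without it, two identical zero-area boxes would divide by zero below.
    if bbox1_area == 0 or bbox2_area == 0:
        return 0

    return intersection_area / (bbox1_area + bbox2_area - intersection_area)
```

For example, two coincident point-like boxes such as `(1, 1, 1, 1)` have a union of zero, which is the case the guard protects against.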
- 15 Jan, 2025 1 commit
myhloli authored
  - Add `remove_invalid_surrogates` function to filter out invalid UTF-16 surrogate pairs
  - Integrate the new function into the `detect_lang` workflow
  - Include a test case with UTF-16 surrogates to verify the fix
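One plausible shape for `remove_invalid_surrogates` is sketched below. The body is an assumption, since the commit message gives only the function name and purpose. It relies on the fact that in a Python `str` a valid surrogate pair is already stored as a single code point above U+FFFF, so any code point left in the range U+D800 to U+DFFF is an unpaired surrogate.

```python
def remove_invalid_surrogates(text):
    """Drop lone UTF-16 surrogate code points (U+D800..U+DFFF) from text."""
    # Characters outside the surrogate range, including emoji and other
    # astral-plane characters, are kept unchanged.
    return "".join(ch for ch in text if not 0xD800 <= ord(ch) <= 0xDFFF)


if __name__ == "__main__":
    dirty = "abc\ud800def"
    clean = remove_invalid_surrogates(dirty)
    # The cleaned string can be encoded to UTF-8; the dirty one cannot.
    print(clean.encode("utf-8"))
```

Filtering before language detection avoids the `UnicodeEncodeError` that a lone surrogate raises when the text is encoded as UTF-8.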
- 14 Jan, 2025 1 commit
myhloli authored
  - Merge title blocks that are close to each other horizontally
  - Adjust line insertion logic for title blocks
  - Increase image size and decrease confidence threshold for layout detection
  - Update DocLayoutYOLO model weights
  - Refactor drawing of bounding boxes for different block types
- 10 Jan, 2025 3 commits
- 09 Jan, 2025 1 commit
myhloli authored
  - Improve language detection by removing newline characters from the input text
  - Add error handling and fallback mechanism to deal with text containing control characters
- 05 Jan, 2025 1 commit
myhloli authored
  - Add `draw_char_bbox` function to `draw_bbox.py` for drawing character bounding boxes
  - Integrate `draw_char_bbox` into `common.py` for use in PDF processing pipeline
  - Include option to draw character bounding boxes in debug mode
- 30 Dec, 2024 1 commit
myhloli authored
  - Update `clean_memory.py` to use `torch_npu.npu` instead of `torch.npu`
  - Update `model_utils.py` to use `torch_npu.npu` instead of `torch.npu`
  - Simplify NPU availability check and bfloat16 support in `pdf_parse_union_core_v2.py`
- 26 Dec, 2024 2 commits
myhloli authored
  - Update clean_memory function to support both CUDA and NPU devices
  - Implement get_device function to centralize device selection logic
  - Modify model initialization and memory cleaning to use the selected device
  - Update RapidTableModel to support both RapidOCR and PaddleOCR engines
myhloli authored
  - Add NPU support for memory cleaning and model initialization
  - Optimize table model initialization and prediction process
  - Update memory utils to support NPU
  - Add language parameter for table model
- 24 Dec, 2024 1 commit
myhloli authored
  - Add LLM-aided formula and text correction functionality
  - Update config reader to include LLM-aided settings
  - Create new LLM-aided processing module
  - Update main processing script to incorporate LLM-aided corrections
  - Modify download scripts to check for new config version
- 11 Dec, 2024 2 commits
- 10 Dec, 2024 1 commit
myhloli authored
  - Replace MuPDF with pdfminer for detecting invalid characters in PDFs
  - Uncomment and update the detect_invalid_chars function to use pdfminer
  - Update the check_invalid_chars function in pdf_meta_scan.py to use the new implementation
- 03 Dec, 2024 2 commits
- 02 Dec, 2024 1 commit
myhloli authored
- 29 Nov, 2024 2 commits
- 28 Nov, 2024 1 commit
myhloli authored
  - Replace pdfminer with PyMuPDF for character detection
  - Implement new method detect_invalid_chars_by_pymupdf
  - Update check_invalid_chars in pdf_meta_scan.py to use new method
  - Add __replace_0xfffd function in pdf_parse_union_core_v2.py to handle special characters
  - Remove unused imports and update requirements.txt
- 27 Nov, 2024 2 commits
- 26 Nov, 2024 3 commits
- 25 Nov, 2024 1 commit
myhloli authored