- 09 Feb, 2025 1 commit
myhloli authored
  - Increase batch size from 8 to 256 for language detection inference
  - Add timing measurement for the language detection process
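A minimal sketch of what batched inference with a timing measurement could look like. Only the batch size of 256 comes from the commit message; `detect_languages` and `predict_batch` are hypothetical names used for illustration, not the repository's actual functions.

```python
import time
from typing import Callable, List, Sequence


def detect_languages(
    items: Sequence[object],
    predict_batch: Callable[[Sequence[object]], List[str]],  # hypothetical model call
    batch_size: int = 256,  # value from the commit message (previously 8)
) -> List[str]:
    """Run a batch predictor over all inputs and report the elapsed time."""
    start = time.perf_counter()
    results: List[str] = []
    # Slice the input into consecutive chunks of at most batch_size items.
    for i in range(0, len(items), batch_size):
        results.extend(predict_batch(items[i:i + batch_size]))
    elapsed = time.perf_counter() - start
    print(f"language detection: {len(items)} items in {elapsed:.3f}s")
    return results
```

Larger batches reduce the number of model calls for the same input, which is usually where the speedup comes from; the timing line makes that effect measurable.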
- 10 Dec, 2024 1 commit
myhloli authored
  - Replace MuPDF with pdfminer for detecting invalid characters in PDFs
  - Uncomment and update the detect_invalid_chars function to use pdfminer
  - Update the check_invalid_chars function in pdf_meta_scan.py to use the new implementation
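One way such a check can work, sketched under assumptions: pdfminer writes glyphs it cannot map to Unicode as `(cid:N)` markers, so the share of those markers in the extracted text is a usable signal. The function names and the 0.05 threshold below are illustrative and not taken from the repository; the sketch operates on already-extracted text.

```python
import re

# pdfminer emits "(cid:<number>)" for glyphs without a Unicode mapping.
CID_PATTERN = re.compile(r"\(cid:\d+\)")


def cid_ratio(text: str) -> float:
    """Fraction of glyphs that are unmapped, counting each marker as one glyph."""
    markers = CID_PATTERN.findall(text)
    marker_chars = sum(len(m) for m in markers)
    total_glyphs = len(markers) + (len(text) - marker_chars)
    return len(markers) / total_glyphs if total_glyphs else 0.0


def looks_garbled(text: str, threshold: float = 0.05) -> bool:
    """Flag text whose unmapped-glyph ratio exceeds the (assumed) threshold."""
    return cid_ratio(text) > threshold
```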
- 28 Nov, 2024 1 commit
myhloli authored
  - Replace pdfminer with PyMuPDF for character detection
  - Implement new method detect_invalid_chars_by_pymupdf
  - Update check_invalid_chars in pdf_meta_scan.py to use new method
  - Add __replace_0xfffd function in pdf_parse_union_core_v2.py to handle special characters
  - Remove unused imports and update requirements.txt
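A hedged sketch of handling U+FFFD (the Unicode replacement character, which the name `__replace_0xfffd` refers to) in extracted text. The helper names here are illustrative, and substituting a space is an assumption; the commit message does not say what the character is replaced with.

```python
REPLACEMENT_CHAR = "\ufffd"  # Unicode replacement character


def replacement_ratio(text: str) -> float:
    """Share of characters in the text that are U+FFFD."""
    return text.count(REPLACEMENT_CHAR) / len(text) if text else 0.0


def replace_0xfffd(text: str, substitute: str = " ") -> str:
    """Swap every U+FFFD for a substitute (a space here, by assumption)."""
    return text.replace(REPLACEMENT_CHAR, substitute) if text else text
```

For example, `replace_0xfffd("a\ufffdb")` yields `"a b"`, and a high `replacement_ratio` could be used to flag a page as unreadable.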
- 20 Jun, 2024 1 commit
赵小蒙 authored
- 19 Jun, 2024 2 commits