1. 10 Dec, 2024 2 commits
    • myhloli
      fix(magic_pdf): disable PaddlePaddle signal handler · dd7f6781
      myhloli authored
      - Import paddle module and disable its signal handler to prevent interference with other components
      - This change addresses potential conflicts between PaddlePaddle and other libraries or system signals
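PaddlePaddle installs its own signal handlers by default so it can print a C++ stack trace on a crash, and `paddle.disable_signal_handler()` is the call PaddlePaddle provides to turn that off when other frameworks need the signals. The sketch below shows the pattern; the wrapper name `disable_paddle_signal_handler` and the fallback when paddle is missing are assumptions for illustration, not the magic_pdf code.

```python
def disable_paddle_signal_handler() -> bool:
    """Disable PaddlePaddle's crash signal handler if paddle is available.

    Hypothetical helper: returns True if the handler was disabled and
    False if paddle is not installed or does not expose the call.
    """
    try:
        import paddle
    except ImportError:
        return False  # paddle not installed: nothing to disable
    if not hasattr(paddle, "disable_signal_handler"):
        return False  # very old paddle build without this API
    # Stop paddle from hooking process signals so other libraries keep them.
    paddle.disable_signal_handler()
    return True
```

Calling this once, right after import and before any model is loaded, keeps the change global and side-effect free for the rest of the process.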
    • myhloli
      refactor: comment out clean_memory function call · 2b6e9442
      myhloli authored
      - Comment out the call to clean_memory() in pdf_parse_union_core_v2.py
      - This may affect memory usage, so it needs testing to confirm that parsing still works correctly
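One way to do the testing this commit calls for is to compare peak memory for the same workload with and without the cleanup step. The sketch below uses the standard-library `tracemalloc`; the function name `peak_traced_bytes` and the idea of passing the cleanup as a callable are assumptions, and `tracemalloc` only sees Python-heap allocations, not GPU memory that a real `clean_memory()` might release.

```python
import gc
import tracemalloc
from typing import Callable, Optional


def peak_traced_bytes(
    work: Callable[[], object],
    cleanup: Optional[Callable[[], object]] = None,
    repeats: int = 3,
) -> int:
    """Run `work` several times and return the peak traced Python heap size.

    `cleanup` stands in for a hypothetical clean_memory()-style hook that is
    called after each run; pass None to measure the "commented out" case.
    """
    tracemalloc.start()
    try:
        for _ in range(repeats):
            work()
            if cleanup is not None:
                cleanup()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def fake_page_parse() -> None:
    # Placeholder workload: allocate and drop some buffers.
    pages = [bytearray(100_000) for _ in range(5)]
    del pages


with_cleanup = peak_traced_bytes(fake_page_parse, cleanup=gc.collect)
without_cleanup = peak_traced_bytes(fake_page_parse)
```

Comparing `with_cleanup` and `without_cleanup` on real documents gives a rough signal of whether dropping the call changes Python-side memory; GPU usage would need a separate check.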
  2. 09 Dec, 2024 3 commits
  3. 07 Dec, 2024 2 commits
  4. 06 Dec, 2024 10 commits
  5. 05 Dec, 2024 1 commit
    • myhloli
      perf(model): add threading lock for OCR model initialization · 04478095
      myhloli authored
      - Introduce a lock to synchronize access to OCR model initialization
      - This change improves thread safety when multiple threads access the OCR model concurrently
      - The lock ensures that the OCR model is initialized only once, even in multi-threaded scenarios
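The "initialize only once, even with many threads" behavior described above is commonly implemented with double-checked locking around a model cache. The sketch below assumes a module-level dict keyed by configuration (for example language) and a caller-supplied `factory`; the names `get_model`, `_models`, and `_models_lock` are hypothetical and not taken from magic_pdf.

```python
import threading
from typing import Any, Callable, Dict, Hashable

# Hypothetical cache: one initialized model per configuration key.
_models: Dict[Hashable, Any] = {}
_models_lock = threading.Lock()


def get_model(key: Hashable, factory: Callable[[], Any]) -> Any:
    """Return the cached model for `key`, building it with `factory` at most once."""
    model = _models.get(key)  # fast path: no locking once the model exists
    if model is None:
        with _models_lock:
            # Re-check inside the lock: another thread may have finished
            # initialization while this one was waiting for the lock.
            model = _models.get(key)
            if model is None:
                model = factory()
                _models[key] = model
    return model
```

The outer check keeps the common case lock-free after startup, while the inner check is what actually guarantees a single `factory()` call when several threads race on first use.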
  6. 03 Dec, 2024 7 commits
  7. 02 Dec, 2024 3 commits
  8. 30 Nov, 2024 1 commit
  9. 29 Nov, 2024 8 commits
  10. 28 Nov, 2024 3 commits
    • myhloli
      feat(pdf_parse): add line start flag detection and optimize line stop flag logic · 949d0867
      myhloli authored
      - Add LINE_START_FLAG tuple to identify starting flags of a line
      - Modify calculate_char_in_span function to handle both line start and stop flags
      - Remove redundant char_is_line_stop_flag variable and simplify logic
      - Improve line flag detection to enhance text extraction accuracy
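The start/stop flag idea can be illustrated as a rule for deciding whether a character belongs to a span: characters whose center lies inside the span are kept, and an opening bracket just left of the span or a closing punctuation mark just right of it is also accepted. This is a minimal sketch under assumptions: the flag tuples below are illustrative rather than the exact `LINE_START_FLAG` and stop-flag values in pdf_parse_union_core_v2.py, and `char_belongs_to_span` and the `tol` tolerance are hypothetical.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

# Illustrative flag sets; the real tuples in magic_pdf may differ.
LINE_START_FLAG = ("(", "（", "“", "[", "【", "《", "{")
LINE_STOP_FLAG = (".", ",", "!", "?", ")", "）", "。", "，", "！", "？", "”", "]", "】", "》", "}", ":", ";")


def char_belongs_to_span(char: str, char_bbox: BBox, span_bbox: BBox, tol: float = 2.0) -> bool:
    """Hypothetical rule: does this character belong to the span's text?"""
    cx0, cy0, cx1, cy1 = char_bbox
    sx0, sy0, sx1, sy1 = span_bbox
    center_x = (cx0 + cx1) / 2
    center_y = (cy0 + cy1) / 2
    # The character must sit on the span's text line vertically.
    if not (sy0 <= center_y <= sy1):
        return False
    # Ordinary case: horizontal center falls inside the span.
    if sx0 <= center_x <= sx1:
        return True
    # Opening flag whose right edge touches the span's left edge.
    if char in LINE_START_FLAG and abs(cx1 - sx0) <= tol:
        return True
    # Closing flag whose left edge touches the span's right edge.
    if char in LINE_STOP_FLAG and abs(cx0 - sx1) <= tol:
        return True
    return False
```

Handling both edges in one function mirrors the commit's move to a single `calculate_char_in_span` check instead of a separate stop-flag variable.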
    • myhloli
      refactor(pdf_check): improve character detection using PyMuPDF · ac888156
      myhloli authored
      - Replace pdfminer with PyMuPDF for character detection
      - Implement new method detect_invalid_chars_by_pymupdf
      - Update check_invalid_chars in pdf_meta_scan.py to use new method
      - Add __replace_0xfffd function in pdf_parse_union_core_v2.py to handle U+FFFD replacement characters
      - Remove unused imports and update requirements.txt
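A PyMuPDF-based invalid-character check can be sketched as extracting text and measuring how much of it is U+FFFD, the code point text extractors emit when a glyph cannot be mapped to Unicode. The sketch assumes a ratio threshold of 0.01 and a 10-page sample; both numbers, and the function names `invalid_char_ratio` and `detect_invalid_chars_in_pdf`, are hypothetical rather than the values used by `detect_invalid_chars_by_pymupdf`. Only `fitz.open(stream=..., filetype="pdf")`, `Document.page_count`, and `Page.get_text("text")` are real PyMuPDF calls.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, emitted for unmappable glyphs


def invalid_char_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are U+FFFD."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    return sum(c == REPLACEMENT_CHAR for c in visible) / len(visible)


def detect_invalid_chars_in_pdf(pdf_bytes: bytes, max_pages: int = 10, threshold: float = 0.01) -> bool:
    """Hypothetical check: True if sampled pages contain too many U+FFFD characters."""
    import fitz  # PyMuPDF; imported lazily so the ratio helper works without it

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        pages = range(min(max_pages, doc.page_count))
        text = "".join(doc[i].get_text("text") for i in pages)
    return invalid_char_ratio(text) > threshold
```

A PDF flagged this way would typically be routed to OCR instead of text extraction, since its embedded text layer cannot be trusted.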
    • myhloli
      refactor(ocr): improve text processing and span handling · 88c0854a
      myhloli authored
      - Remove unused language detection code
      - Simplify text content processing logic
      - Update span sorting and text extraction in pdf_parse_union_core_v2.py
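Span sorting and text extraction for a single line can be sketched as ordering spans by their left x coordinate and joining their text. The dict keys `bbox` and `content` and the helper name `line_text` are assumptions for illustration; joining with a space suits Latin-script text, while CJK text would usually be joined without separators.

```python
from typing import Any, Dict, List


def line_text(spans: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: order a line's spans left to right and join their text."""
    # bbox is assumed to be (x0, y0, x1, y1); sort by the left edge.
    ordered = sorted(spans, key=lambda span: span["bbox"][0])
    parts = [span.get("content", "").strip() for span in ordered]
    # Drop empty spans so they do not introduce double spaces.
    return " ".join(part for part in parts if part)
```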