1. 28 Apr, 2025 1 commit
    • feat(latex): enhance LaTeX delimiter support and configurability · 100e9c17
      myhloli authored
      - Add support for \(\) and \[\] delimiters in addition to $ and $$
      - Make LaTeX delimiter configuration more flexible and user-defined
      - Update configuration file to include LaTeX delimiter settings
      - Modify OCR content generation to use configurable delimiters
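A minimal sketch of configurable delimiters, assuming the configuration holds a left/right pair for inline and for display formulas. The key names (`display`, `inline`, `left`, `right`) and the helper `wrap_formula` are illustrative assumptions, not taken from this commit.

```python
from typing import Dict

# Hypothetical default delimiter table (key names are assumptions).
DEFAULT_DELIMITERS: Dict[str, Dict[str, str]] = {
    "display": {"left": "$$", "right": "$$"},
    "inline": {"left": "$", "right": "$"},
}

# A user-defined alternative using \( \) for inline and \[ \] for display math.
BRACKET_DELIMITERS: Dict[str, Dict[str, str]] = {
    "display": {"left": "\\[", "right": "\\]"},
    "inline": {"left": "\\(", "right": "\\)"},
}


def wrap_formula(
    latex: str,
    display: bool,
    delimiters: Dict[str, Dict[str, str]] = DEFAULT_DELIMITERS,
) -> str:
    """Wrap a LaTeX string in the configured inline or display delimiters."""
    style = delimiters["display" if display else "inline"]
    return f"{style['left']}{latex}{style['right']}"
```

With this shape, switching a whole document from `$...$` to `\(...\)` is a configuration change rather than a code change: `wrap_formula("x^2", False, BRACKET_DELIMITERS)` yields `\(x^2\)`.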
  2. 27 Apr, 2025 1 commit
  3. 23 Apr, 2025 1 commit
  4. 22 Apr, 2025 1 commit
  5. 21 Apr, 2025 1 commit
  6. 17 Apr, 2025 1 commit
  7. 16 Apr, 2025 1 commit
  8. 14 Apr, 2025 1 commit
  9. 12 Apr, 2025 1 commit
  10. 08 Apr, 2025 1 commit
  11. 03 Apr, 2025 1 commit
  12. 01 Apr, 2025 1 commit
  13. 07 Mar, 2025 1 commit
    • refactor(magic_pdf): replace PIL with NumPy for image processing · 1b34f7e4
      myhloli authored
      - Remove PIL usage across multiple files
      - Convert image processing functions to use NumPy arrays
      - Update crop_img function to work with NumPy arrays
      - Modify image loading and resizing to use NumPy and OpenCV
      - Clean up unused imports and comments related to PIL
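Cropping a NumPy image reduces to array slicing with bounds clamping. The sketch below uses a simplified, hypothetical signature (`bbox` plus `padding`). The project's actual `crop_img` arguments are not given in this commit message.

```python
import numpy as np


def crop_img(img: np.ndarray, bbox, padding: int = 0) -> np.ndarray:
    """Crop an H x W (x C) image array to bbox=(x0, y0, x1, y1), clamped to bounds.

    Hypothetical signature for illustration only.
    """
    h, w = img.shape[:2]
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    # Expand by the padding, then clamp so the slice never leaves the image.
    x0 = max(0, x0 - padding)
    y0 = max(0, y0 - padding)
    x1 = min(w, x1 + padding)
    y1 = min(h, y1 + padding)
    # Rows are indexed first (y), then columns (x). Slicing returns a view,
    # so copy to let callers modify the crop without touching the source.
    return img[y0:y1, x0:x1].copy()
```

For resizing, OpenCV's `cv2.resize(img, (new_w, new_h))` takes the target size as (width, height), the reverse of NumPy's (rows, cols) shape order, which is a common source of bugs when moving away from PIL.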
  14. 04 Mar, 2025 1 commit
  15. 03 Mar, 2025 2 commits
  16. 27 Feb, 2025 1 commit
  17. 09 Feb, 2025 1 commit
  18. 23 Jan, 2025 1 commit
  19. 22 Jan, 2025 1 commit
  20. 15 Jan, 2025 1 commit
  21. 14 Jan, 2025 1 commit
    • feat(layout): improve title block handling and layout detection · c20e9a1e
      myhloli authored
      - Merge title blocks that are close to each other horizontally
      - Adjust line insertion logic for title blocks
      - Increase image size and decrease confidence threshold for layout detection
      - Update DocLayoutYOLO model weights
      - Refactor drawing of bounding boxes for different block types
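One way to merge title blocks that are horizontally close is to treat two boxes as neighbors when they overlap vertically (same text line) and the horizontal gap between them is small. This is a sketch under assumed criteria. The overlap ratio, the `max_gap` value, and the function names are hypothetical, not the project's actual rule.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def _same_line(a: BBox, b: BBox, min_overlap: float = 0.5) -> bool:
    # Vertical overlap measured relative to the shorter box's height.
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return shorter > 0 and overlap / shorter >= min_overlap


def _h_gap(a: BBox, b: BBox) -> float:
    # Horizontal distance between the two x-intervals (0 if they overlap).
    return max(0.0, b[0] - a[2], a[0] - b[2])


def merge_close_titles(boxes: List[BBox], max_gap: float = 10.0) -> List[BBox]:
    """Merge title boxes on the same line whose horizontal gap is <= max_gap."""
    merged: List[BBox] = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        if merged:
            last = merged[-1]
            if _same_line(last, box) and _h_gap(last, box) <= max_gap:
                # Replace the previous box with the union of both boxes.
                merged[-1] = (min(last[0], box[0]), min(last[1], box[1]),
                              max(last[2], box[2]), max(last[3], box[3]))
                continue
        merged.append(box)
    return merged
```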
  22. 10 Jan, 2025 3 commits
  23. 09 Jan, 2025 1 commit
  24. 05 Jan, 2025 1 commit
    • feat(tools): add character bounding box drawing functionality · f911a102
      myhloli authored
      - Add `draw_char_bbox` function to `draw_bbox.py` for drawing character bounding boxes
      - Integrate `draw_char_bbox` into `common.py` for use in PDF processing pipeline
      - Include option to draw character bounding boxes in debug mode
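Character boxes are available from PyMuPDF's `page.get_text("rawdict")`, whose text blocks nest lines, spans, and per-character entries carrying `c` and `bbox`. The helper below is a hypothetical sketch of the collection step only. It is not the project's `draw_char_bbox`, whose signature this commit does not show.

```python
from typing import Iterator, Tuple


def iter_char_bboxes(raw: dict) -> Iterator[Tuple[str, Tuple[float, ...]]]:
    """Yield (char, bbox) pairs from a PyMuPDF page.get_text("rawdict") result."""
    for block in raw.get("blocks", []):
        if block.get("type") != 0:  # 0 = text block, 1 = image block
            continue
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                for ch in span.get("chars", []):
                    yield ch["c"], tuple(ch["bbox"])
```

Drawing could then loop over `iter_char_bboxes(page.get_text("rawdict"))` and call `page.draw_rect(fitz.Rect(bbox), color=(1, 0, 0), width=0.5)` for each box, gated behind a debug flag as the commit describes.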
  25. 30 Dec, 2024 1 commit
    • fix(npu): correct module name for NPU operations · 2684e775
      myhloli authored
      - Update `clean_memory.py` to use `torch_npu.npu` instead of `torch.npu`
      - Update `model_utils.py` to use `torch_npu.npu` instead of `torch.npu`
      - Simplify NPU availability check and bfloat16 support in `pdf_parse_union_core_v2.py`
  26. 26 Dec, 2024 2 commits
    • refactor(device): optimize memory cleaning and device selection · 50f48417
      myhloli authored
      - Update clean_memory function to support both CUDA and NPU devices
      - Implement get_device function to centralize device selection logic
      - Modify model initialization and memory cleaning to use the selected device
      - Update RapidTableModel to support both RapidOCR and PaddleOCR engines
    • feat(model): add npu support and optimize table model · 7990e7df
      myhloli authored
      - Add NPU support for memory cleaning and model initialization
      - Optimize table model initialization and prediction process
      - Update memory utils to support NPU
      - Add language parameter for table model
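Centralized device selection and memory cleaning could look roughly like this. It is a sketch that assumes CUDA is preferred over an Ascend NPU and falls back to CPU. It calls `torch_npu.npu` rather than `torch.npu`, matching the module-name fix in the 30 Dec commit. The preference order and exact function bodies are assumptions, not the project's code.

```python
import gc


def get_device() -> str:
    """Pick a device string: CUDA if available, then NPU, else CPU (assumed order)."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    try:
        import torch_npu  # Ascend plugin; only installed on NPU machines
        if torch_npu.npu.is_available():
            return "npu"
    except ImportError:
        pass
    return "cpu"


def clean_memory(device: str) -> None:
    """Release cached accelerator memory for the given device, then run Python GC."""
    if device.startswith("cuda"):
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
    elif device.startswith("npu"):
        import torch_npu
        if torch_npu.npu.is_available():
            torch_npu.npu.empty_cache()
    gc.collect()
```

Calling `clean_memory(get_device())` between pages keeps device handling in one place, so model initialization and cleanup cannot disagree about which accelerator is in use.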
  27. 24 Dec, 2024 1 commit
    • feat(llm): add LLM-aided formula and text correction · c660fdc8
      myhloli authored
      - Add LLM-aided formula and text correction functionality
      - Update config reader to include LLM-aided settings
      - Create new LLM-aided processing module
      - Update main processing script to incorporate LLM-aided corrections
      - Modify download scripts to check for new config version
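The config-reader and version-check changes can be illustrated with a small sketch. The JSON layout below (`config_version`, `llm-aided-config`, per-task `enable` flags) is a hypothetical example. The real configuration keys are not spelled out in this commit.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical configuration layout; the real key names may differ.
EXAMPLE_CONFIG: Dict[str, Any] = json.loads("""
{
  "config_version": "1.1.0",
  "llm-aided-config": {
    "formula_aided": {"enable": false},
    "text_aided": {"enable": true, "model": "example-model"}
  }
}
""")


def get_llm_aided_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the LLM-aided sub-tasks whose 'enable' flag is true."""
    section = config.get("llm-aided-config") or {}
    return {name: opts for name, opts in section.items() if opts.get("enable")}


def _parse_version(v: str) -> Tuple[int, ...]:
    # "1.10.0" -> (1, 10, 0), so versions compare numerically, not lexically.
    return tuple(int(part) for part in v.split("."))


def config_is_outdated(config: Dict[str, Any], required: str) -> bool:
    """True if the config's version is older than the required version."""
    return _parse_version(config.get("config_version", "0.0.0")) < _parse_version(required)
```

A download script could use `config_is_outdated` to decide whether to regenerate the user's config file when new settings, such as the LLM-aided section, are introduced.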
  28. 11 Dec, 2024 2 commits
  29. 10 Dec, 2024 1 commit
  30. 03 Dec, 2024 2 commits
  31. 02 Dec, 2024 1 commit
  32. 29 Nov, 2024 2 commits
  33. 28 Nov, 2024 1 commit
    • refactor(pdf_check): improve character detection using PyMuPDF · ac888156
      myhloli authored
      - Replace pdfminer with PyMuPDF for character detection
      - Implement new method detect_invalid_chars_by_pymupdf
      - Update check_invalid_chars in pdf_meta_scan.py to use new method
      - Add __replace_0xfffd function in pdf_parse_union_core_v2.py to handle special characters
      - Remove unused imports and update requirements.txt
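Unmappable glyphs typically surface in extracted text as U+FFFD (the Unicode replacement character, code point 0xFFFD). The sketch below assumes that detection works by measuring the share of U+FFFD characters in a page's text and that the replacement step substitutes them with a placeholder. The threshold, the placeholder, and the function names are hypothetical stand-ins for `detect_invalid_chars_by_pymupdf` and `__replace_0xfffd`, whose internals this commit does not describe.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, emitted when a glyph cannot be mapped to Unicode


def invalid_char_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are U+FFFD."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c == REPLACEMENT_CHAR for c in chars) / len(chars)


def has_invalid_chars(text: str, threshold: float = 0.01) -> bool:
    """Flag text whose U+FFFD ratio exceeds a (hypothetical) threshold."""
    return invalid_char_ratio(text) > threshold


def replace_0xfffd(text: str, replacement: str = " ") -> str:
    """Substitute every U+FFFD with a placeholder string."""
    return text.replace(REPLACEMENT_CHAR, replacement)
```

With PyMuPDF, the input text would come from `page.get_text("text")`. A page flagged by `has_invalid_chars` is a candidate for OCR rather than direct text extraction.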