  1. 04 Mar, 2025 1 commit
  2. 03 Mar, 2025 2 commits
  3. 27 Feb, 2025 1 commit
  4. 09 Feb, 2025 1 commit
  5. 23 Jan, 2025 1 commit
  6. 22 Jan, 2025 1 commit
  7. 15 Jan, 2025 1 commit
  8. 14 Jan, 2025 1 commit
    • feat(layout): improve title block handling and layout detection · c20e9a1e
      myhloli authored
      - Merge title blocks that are close to each other horizontally
      - Adjust line insertion logic for title blocks
      - Increase image size and decrease confidence threshold for layout detection
      - Update DocLayoutYOLO model weights
      - Refactor drawing of bounding boxes for different block types
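As an illustration of the first bullet in c20e9a1e (merging title blocks that sit close together horizontally), here is a minimal sketch. It is not the project's implementation: the function names, the `(x0, y0, x1, y1)` box format, and the `max_gap` / `min_overlap` thresholds are all assumptions chosen for the example.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in page coordinates.
BBox = Tuple[float, float, float, float]


def _vertical_overlap_ratio(a: BBox, b: BBox) -> float:
    """Vertical overlap of two boxes, relative to the shorter box's height."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    if shorter <= 0:
        return 0.0
    return max(overlap, 0.0) / shorter


def merge_close_title_blocks(
    blocks: List[BBox],
    max_gap: float = 20.0,      # assumed threshold, not taken from the commit
    min_overlap: float = 0.8,   # assumed threshold, not taken from the commit
) -> List[BBox]:
    """Merge boxes that share a text line and are separated by a small horizontal gap."""
    merged: List[BBox] = []
    # Process left to right so each box only needs to look at boxes starting before it.
    for box in sorted(blocks, key=lambda b: b[0]):
        for i, prev in enumerate(merged):
            gap = box[0] - prev[2]
            if gap <= max_gap and _vertical_overlap_ratio(prev, box) >= min_overlap:
                # Replace the earlier box with the union of the two.
                merged[i] = (
                    min(prev[0], box[0]),
                    min(prev[1], box[1]),
                    max(prev[2], box[2]),
                    max(prev[3], box[3]),
                )
                break
        else:
            merged.append(box)
    return merged
```

Requiring vertical overlap in addition to a small gap keeps titles on different lines from being fused just because their x-ranges are near each other.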
  9. 10 Jan, 2025 3 commits
  10. 09 Jan, 2025 1 commit
  11. 05 Jan, 2025 1 commit
    • feat(tools): add character bounding box drawing functionality · f911a102
      myhloli authored
      - Add `draw_char_bbox` function to `draw_bbox.py` for drawing character bounding boxes
      - Integrate `draw_char_bbox` into `common.py` for use in PDF processing pipeline
      - Include option to draw character bounding boxes in debug mode
  12. 30 Dec, 2024 1 commit
    • fix(npu): correct module name for NPU operations · 2684e775
      myhloli authored
      - Update `clean_memory.py` to use `torch_npu.npu` instead of `torch.npu`
      - Update `model_utils.py` to use `torch_npu.npu` instead of `torch.npu`
      - Simplify NPU availability check and bfloat16 support in `pdf_parse_union_core_v2.py`
  13. 26 Dec, 2024 2 commits
    • refactor(device): optimize memory cleaning and device selection · 50f48417
      myhloli authored
      - Update clean_memory function to support both CUDA and NPU devices
      - Implement get_device function to centralize device selection logic
      - Modify model initialization and memory cleaning to use the selected device
      - Update RapidTableModel to support both RapidOCR and PaddleOCR engines
    • feat(model): add npu support and optimize table model · 7990e7df
      myhloli authored
      - Add NPU support for memory cleaning and model initialization
      - Optimize table model initialization and prediction process
      - Update memory utils to support NPU
      - Add language parameter for table model
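The two 26 Dec commits centralize device selection in `get_device` and make `clean_memory` work for both CUDA and NPU; the later fix 2684e775 switches the NPU calls to `torch_npu.npu`. The sketch below shows one way such helpers could look. The function bodies, the selection order (CUDA, then NPU, then CPU), and the `_try_import` helper are assumptions for illustration, not the repository's code.

```python
import gc
import importlib
from types import ModuleType
from typing import Optional


def _try_import(name: str) -> Optional[ModuleType]:
    """Hypothetical helper: return the module if it is installed, else None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def get_device() -> str:
    """Pick a device string in one place (assumed order: CUDA, NPU, CPU)."""
    torch = _try_import("torch")
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    # Ascend NPU support lives in the separate torch_npu package,
    # so the NPU API is reached through torch_npu.npu.
    torch_npu = _try_import("torch_npu")
    if torch_npu is not None and torch_npu.npu.is_available():
        return "npu"
    return "cpu"


def clean_memory(device: str = "cuda") -> None:
    """Release cached accelerator memory for the given device, then run the GC."""
    if device.startswith("cuda"):
        torch = _try_import("torch")
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
    elif device.startswith("npu"):
        torch_npu = _try_import("torch_npu")
        if torch_npu is not None and torch_npu.npu.is_available():
            torch_npu.npu.empty_cache()
    gc.collect()
```

A caller would typically resolve the device once, for example `device = get_device()`, and pass the same string to model initialization and to `clean_memory(device)` so that both stay in agreement.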
  14. 24 Dec, 2024 1 commit
    • feat(llm): add LLM-aided formula and text correction · c660fdc8
      myhloli authored
      - Add LLM-aided formula and text correction functionality
      - Update config reader to include LLM-aided settings
      - Create new LLM-aided processing module
      - Update main processing script to incorporate LLM-aided corrections
      - Modify download scripts to check for new config version
  15. 11 Dec, 2024 2 commits
  16. 10 Dec, 2024 1 commit
  17. 03 Dec, 2024 2 commits
  18. 02 Dec, 2024 1 commit
  19. 29 Nov, 2024 2 commits
  20. 28 Nov, 2024 1 commit
    • refactor(pdf_check): improve character detection using PyMuPDF · ac888156
      myhloli authored
      - Replace pdfminer with PyMuPDF for character detection
      - Implement new method detect_invalid_chars_by_pymupdf
      - Update check_invalid_chars in pdf_meta_scan.py to use new method
      - Add __replace_0xfffd function in pdf_parse_union_core_v2.py to handle special characters
      - Remove unused imports and update requirements.txt
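To make the idea behind ac888156 concrete, the sketch below extracts text with PyMuPDF and measures how much of it is the Unicode replacement character U+FFFD, which usually signals a broken font mapping. This is only an illustration: the function names, the 5% threshold, and the page limit are assumptions, and the project's own `detect_invalid_chars_by_pymupdf` and `__replace_0xfffd` may work differently.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, emitted when a glyph cannot be decoded


def extract_text_with_pymupdf(pdf_path: str, max_pages: int = 10) -> str:
    """Collect plain text from the first pages of a PDF (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so the helpers below work without it

    chunks = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            if index >= max_pages:
                break
            chunks.append(page.get_text())
    return "".join(chunks)


def invalid_char_ratio(text: str) -> float:
    """Share of non-whitespace characters that are U+FFFD."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(ch == REPLACEMENT_CHAR for ch in visible) / len(visible)


def looks_garbled(text: str, threshold: float = 0.05) -> bool:
    """Flag text as garbled above an assumed 5% threshold (hypothetical value)."""
    return invalid_char_ratio(text) > threshold


def replace_0xfffd(text: str, placeholder: str = " ") -> str:
    """Swap U+FFFD for a placeholder so downstream text stays readable."""
    return text.replace(REPLACEMENT_CHAR, placeholder)
```

A pipeline could call `looks_garbled(extract_text_with_pymupdf(path))` to decide whether to fall back to OCR, and apply `replace_0xfffd` to text that is mostly clean.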
  21. 27 Nov, 2024 2 commits
  22. 26 Nov, 2024 3 commits
  23. 25 Nov, 2024 1 commit
  24. 22 Nov, 2024 1 commit
  25. 21 Nov, 2024 1 commit
  26. 19 Nov, 2024 1 commit
  27. 18 Nov, 2024 1 commit
  28. 15 Nov, 2024 1 commit
  29. 08 Nov, 2024 2 commits