1. 09 Jan, 2025 1 commit
  2. 05 Jan, 2025 1 commit
    • feat(tools): add character bounding box drawing functionality · f911a102
      myhloli authored
      - Add `draw_char_bbox` function to `draw_bbox.py` for drawing character bounding boxes
      - Integrate `draw_char_bbox` into `common.py` for use in PDF processing pipeline
      - Include option to draw character bounding boxes in debug mode
  3. 30 Dec, 2024 1 commit
    • fix(npu): correct module name for NPU operations · 2684e775
      myhloli authored
      - Update `clean_memory.py` to use `torch_npu.npu` instead of `torch.npu`
      - Update `model_utils.py` to use `torch_npu.npu` instead of `torch.npu`
      - Simplify NPU availability check and bfloat16 support in `pdf_parse_union_core_v2.py`
  4. 26 Dec, 2024 2 commits
    • refactor(device): optimize memory cleaning and device selection · 50f48417
      myhloli authored
      - Update clean_memory function to support both CUDA and NPU devices
      - Implement get_device function to centralize device selection logic
      - Modify model initialization and memory cleaning to use the selected device
      - Update RapidTableModel to support both RapidOCR and PaddleOCR engines
    • feat(model): add npu support and optimize table model · 7990e7df
      myhloli authored
      - Add NPU support for memory cleaning and model initialization
      - Optimize table model initialization and prediction process
      - Update memory utils to support NPU
      - Add language parameter for table model
  5. 24 Dec, 2024 1 commit
    • feat(llm): add LLM-aided formula and text correction · c660fdc8
      myhloli authored
      - Add LLM-aided formula and text correction functionality
      - Update config reader to include LLM-aided settings
      - Create new LLM-aided processing module
      - Update main processing script to incorporate LLM-aided corrections
      - Modify download scripts to check for new config version
  6. 11 Dec, 2024 2 commits
  7. 10 Dec, 2024 1 commit
  8. 03 Dec, 2024 2 commits
  9. 02 Dec, 2024 1 commit
  10. 29 Nov, 2024 2 commits
  11. 28 Nov, 2024 1 commit
    • refactor(pdf_check): improve character detection using PyMuPDF · ac888156
      myhloli authored
      - Replace pdfminer with PyMuPDF for character detection
      - Implement new method detect_invalid_chars_by_pymupdf
      - Update check_invalid_chars in pdf_meta_scan.py to use new method
      - Add __replace_0xfffd function in pdf_parse_union_core_v2.py to handle special characters
      - Remove unused imports and update requirements.txt
  12. 27 Nov, 2024 2 commits
  13. 26 Nov, 2024 3 commits
  14. 25 Nov, 2024 1 commit
  15. 22 Nov, 2024 1 commit
  16. 21 Nov, 2024 1 commit
  17. 19 Nov, 2024 1 commit
  18. 18 Nov, 2024 1 commit
  19. 15 Nov, 2024 1 commit
  20. 08 Nov, 2024 2 commits
  21. 07 Nov, 2024 1 commit
    • feat(model): add xycut algorithm for block sorting · 7d5850e3
      myhloli authored
      - Implement xycut algorithm to sort blocks when layoutreader fails
      - Add recursive_xy_cut function to perform the xycut algorithm
      - Update pdf_parse_union_core_v2.py to use xycut when layoutreader fails
      - Modify draw_bbox.py to handle cases where layoutreader fails to sort blocks
  22. 06 Nov, 2024 2 commits
  23. 01 Nov, 2024 1 commit
    • feat(pdf_parse): improve span filtering and add new block types · 149132d6
      myhloli authored
      - Refactor remove_outside_spans function to filter spans more accurately
      - Add image_footnote, index, and list block types to output file documentation
      - Update draw_span_bbox to use preproc_blocks instead of para_blocks
      - Bump version to 0.9.0
  24. 28 Oct, 2024 1 commit
  25. 26 Oct, 2024 1 commit
    • feat(draw_bbox): update bounding box drawing for tables and images · 0e8d5893
      myhloli authored
      - Add support for drawing bounding boxes of table and image sub-blocks
      - Implement sorting of table blocks based on type order
      - Update bounding box drawing for text and title blocks
      - Refactor code to handle different block types and their sub-blocks
  26. 24 Oct, 2024 1 commit
  27. 23 Oct, 2024 1 commit
    • feat(model): add support for DocLayout-YOLO model · 1279f2cd
      myhloli authored
      - Add new layout model option: DocLayout-YOLO
      - Implement model initialization and prediction for DocLayout-YOLO
      - Update configuration options to include new model
      - Modify existing code to support both LayoutLMv3 and DocLayout-YOLO models
      - Update Gradio app to support more custom switches
  28. 17 Oct, 2024 1 commit
  29. 14 Oct, 2024 2 commits
    • fix(magic_pdf): include List and Index block types in processing · 0a9a6d3e
      myhloli authored
      Add List and Index to the list of block types being processed in the draw_bbox.py file. This inclusion ensures that these block types are handled similarly to other text-containing blocks, improving the overall document processing accuracy and consistency.
    • feat(list&index block): detect and merge list and index blocks · 1f1dd353
      myhloli authored
      - Add detection for list and index blocks in OCR processing
      - Implement merging of list and index blocks across pages
      - Update block types to include list and index categories
      - Adjust text merging logic to handle new block types
      - Modify layout drawing to distinguish list and index blocks
  30. 08 Oct, 2024 1 commit