1. 01 Nov, 2024 4 commits
  2. 31 Oct, 2024 1 commit
  3. 30 Oct, 2024 2 commits
    • fix(magic_pdf): handle missing image_path in spans · faf8c286
      myhloli authored
      - Add check for 'image_path' in spans to avoid errors when it's missing
      - Update image handling in both paragraph text and content dictionary
      - Improve error handling and make the code more robust
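The guard described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual magic_pdf code: the function name `span_image_markdown`, the `img_dir` parameter, and the Markdown output format are assumptions made for the example.

```python
def span_image_markdown(span: dict, img_dir: str = "images") -> str:
    """Return a Markdown image reference, or '' when the span has no image_path.

    Hypothetical helper: the name and output format are illustrative only.
    """
    # dict.get avoids a KeyError when 'image_path' is absent from the span.
    image_path = span.get("image_path")
    if not image_path:
        return ""
    return f"![]({img_dir}/{image_path})"
```

A caller could then skip empty results instead of wrapping every span access in a try/except.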
    • fix(ocr): improve image and table content extraction · b7e9d454
      myhloli authored
      - Update image content extraction to iterate through all spans in a block
      - Add support for extracting table content from spans within a block
      - Handle multiple content types within table spans (latex, html, image)
      - Refactor code to be more modular and easier to maintain
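One way to picture the span iteration this commit describes is the sketch below. It assumes a block is a dict with `lines`, each holding `spans`, and that a table span may carry `latex`, `html`, or `image_path` keys. These key names and the function `extract_table_content` are assumptions for illustration, not the project's confirmed data model.

```python
def extract_table_content(block: dict) -> list[tuple[str, str]]:
    """Collect (kind, value) pairs from every table span in a block.

    Hypothetical helper: key names are assumed for the example.
    """
    found = []
    for line in block.get("lines", []):
        for span in line.get("spans", []):
            if span.get("type") != "table":
                continue
            # A table span may carry several representations; keep each one present.
            for kind in ("latex", "html", "image_path"):
                value = span.get(kind)
                if value:
                    found.append((kind, value))
    return found
```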
  4. 28 Oct, 2024 5 commits
  5. 27 Oct, 2024 2 commits
  6. 26 Oct, 2024 1 commit
    • feat(draw_bbox): update bounding box drawing for tables and images · 0e8d5893
      myhloli authored
      - Add support for drawing bounding boxes of table and image sub-blocks
      - Implement sorting of table blocks based on type order
      - Update bounding box drawing for text and title blocks
      - Refactor code to handle different block types and their sub-blocks
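Sorting table sub-blocks by a fixed type order might look like the sketch below. The type names (`table_caption`, `table_body`, `table_footnote`), their ordering, and the function name are assumptions for the example; the commit does not state the actual order used.

```python
# Assumed ordering for illustration; unknown types sort last.
TYPE_ORDER = {"table_caption": 0, "table_body": 1, "table_footnote": 2}


def sort_table_sub_blocks(sub_blocks: list[dict]) -> list[dict]:
    """Return sub-blocks ordered by TYPE_ORDER; ties keep their input order."""
    return sorted(
        sub_blocks,
        key=lambda b: TYPE_ORDER.get(b.get("type"), len(TYPE_ORDER)),
    )
```

Because `sorted` is stable, sub-blocks of the same type stay in their original relative order.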
  7. 25 Oct, 2024 7 commits
  8. 24 Oct, 2024 3 commits
  9. 23 Oct, 2024 1 commit
    • feat(model): add support for DocLayout-YOLO model · 1279f2cd
      myhloli authored
      - Add new layout model option: DocLayout-YOLO
      - Implement model initialization and prediction for DocLayout-YOLO
      - Update configuration options to include new model
      - Modify existing code to support both LayoutLMv3 and DocLayout-YOLO models
      - Update Gradio app to support more custom switches
  10. 21 Oct, 2024 2 commits
  11. 18 Oct, 2024 1 commit
  12. 17 Oct, 2024 2 commits
  13. 15 Oct, 2024 4 commits
  14. 14 Oct, 2024 2 commits
    • fix(magic_pdf): include List and Index block types in processing · 0a9a6d3e
      myhloli authored
      Add List and Index to the list of block types being processed in the draw_bbox.py file. This inclusion ensures that these block types are handled similarly to other text-containing blocks, improving the overall document processing accuracy and consistency.
    • feat(list&index block): detect and merge list and index blocks · 1f1dd353
      myhloli authored
      - Add detection for list and index blocks in OCR processing
      - Implement merging of list and index blocks across pages
      - Update block types to include list and index categories
      - Adjust text merging logic to handle new block types
      - Modify layout drawing to distinguish list and index blocks
  15. 10 Oct, 2024 2 commits
  16. 08 Oct, 2024 1 commit