1. 15 Nov, 2024 1 commit
  2. 14 Nov, 2024 1 commit
  3. 13 Nov, 2024 1 commit
  4. 11 Nov, 2024 1 commit
  5. 08 Nov, 2024 5 commits
  6. 07 Nov, 2024 1 commit
    • feat(model): add xycut algorithm for block sorting · 7d5850e3
      myhloli authored
      - Implement xycut algorithm to sort blocks when layoutreader fails
      - Add recursive_xy_cut function to perform the xycut algorithm
      - Update pdf_parse_union_core_v2.py to use xycut when layoutreader fails
      - Modify draw_bbox.py to handle cases where layoutreader fails to sort blocks
  7. 06 Nov, 2024 1 commit
  8. 05 Nov, 2024 1 commit
  9. 04 Nov, 2024 4 commits
  10. 03 Nov, 2024 2 commits
  11. 02 Nov, 2024 2 commits
    • feat(list): improve list detection algorithm · 2bf6c268
      myhloli authored
      - Add center_close_num and external_sides_not_close_num variables to analyze line positioning
      - Implement new list detection condition for centered lines
      - Enhance existing list detection logic with additional checks
    • fix(list): improve list identification accuracy · a8f2e7d6
      myhloli authored
      - Adjust the threshold for determining right-side spacing to 0.26 * block_weight
      - Add TODO comment for special list identification with all centered lines
      - Modify the condition for recognizing short item lists with left alignment
      - Update the condition for identifying the end of a list item
  12. 01 Nov, 2024 8 commits
  13. 31 Oct, 2024 1 commit
  14. 30 Oct, 2024 2 commits
    • fix(magic_pdf): handle missing image_path in spans · faf8c286
      myhloli authored
      - Add check for 'image_path' in spans to avoid errors when it's missing
      - Update image handling in both paragraph text and content dictionary
      - Improve error handling and make the code more robust
    • fix(ocr): improve image and table content extraction · b7e9d454
      myhloli authored
      - Update image content extraction to iterate through all spans in a block
      - Add support for extracting table content from spans within a block
      - Handle multiple content types within table spans (latex, html, image)
      - Refactor code to be more modular and easier to maintain
  15. 28 Oct, 2024 5 commits
  16. 27 Oct, 2024 2 commits
  17. 26 Oct, 2024 1 commit
    • feat(draw_bbox): update bounding box drawing for tables and images · 0e8d5893
      myhloli authored
      - Add support for drawing bounding boxes of table and image sub-blocks
      - Implement sorting of table blocks based on type order
      - Update bounding box drawing for text and title blocks
      - Refactor code to handle different block types and their sub-blocks
  18. 25 Oct, 2024 1 commit
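Commit 7d5850e3 falls back to an XY-cut to sort blocks when layoutreader fails. The Python sketch below illustrates the general recursive XY-cut technique only. It is not the project's recursive_xy_cut. It assumes each block is an axis-aligned (x0, y0, x1, y1) box with y increasing downward, tries horizontal cuts (gaps between rows) before vertical cuts (gaps between columns), and falls back to top-left ordering when neither axis has a clean gap. The names recursive_xy_cut_sketch and _split_on_axis are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format (assumption): (x0, y0, x1, y1), y grows downward.
Box = Tuple[float, float, float, float]


def _split_on_axis(indices: List[int], boxes: List[Box], axis: int) -> List[List[int]]:
    """Group box indices into runs separated by gaps in their projection on one axis.

    axis=1 projects onto y (horizontal cuts -> rows); axis=0 projects onto x
    (vertical cuts -> columns). Groups come back in increasing coordinate order.
    """
    lo, hi = axis, axis + 2  # (x0, x1) for axis 0, (y0, y1) for axis 1
    ordered = sorted(indices, key=lambda i: boxes[i][lo])
    groups: List[List[int]] = []
    current_end = None
    for i in ordered:
        start, end = boxes[i][lo], boxes[i][hi]
        if current_end is None or start > current_end:
            groups.append([i])  # a gap precedes this box: start a new group
            current_end = end
        else:
            groups[-1].append(i)  # overlaps the running interval: same group
            current_end = max(current_end, end)
    return groups


def recursive_xy_cut_sketch(boxes: List[Box]) -> List[int]:
    """Return box indices in reading order using a recursive XY-cut."""

    def cut(indices: List[int]) -> List[int]:
        if len(indices) <= 1:
            return list(indices)
        for axis in (1, 0):  # try horizontal cuts first, then vertical cuts
            groups = _split_on_axis(indices, boxes, axis)
            if len(groups) > 1:
                order: List[int] = []
                for group in groups:  # each group is strictly smaller, so recursion ends
                    order.extend(cut(group))
                return order
        # No clean gap on either axis: fall back to top-to-bottom, left-to-right.
        return sorted(indices, key=lambda i: (boxes[i][1], boxes[i][0]))

    return cut(list(range(len(boxes))))
```

For a two-column page with a full-width title and footer, the sketch reads the left column before the right one:

```python
boxes = [
    (0, 0, 200, 20),      # 0: full-width title
    (110, 30, 200, 100),  # 1: right column
    (0, 30, 90, 100),     # 2: left column
    (0, 110, 200, 130),   # 3: full-width footer
]
print(recursive_xy_cut_sketch(boxes))  # [0, 2, 1, 3]
```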