1. 21 Nov, 2024 5 commits
  2. 18 Nov, 2024 3 commits
    • refactor(para): adjust right margin threshold based on block width · 69805f4b
      myhloli authored
      - Introduce a variable threshold for right margin based on block width
      - Use 0.26 * block_weight for wider blocks (block_weight_radio >= 0.5)
      - Use 0.36 * block_weight for narrower blocks
      - This change aims to improve paragraph splitting accuracy for different block widths
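The width-dependent threshold described in this commit can be sketched as follows. This is a minimal illustration, not the project's code: the function name `right_margin_threshold` and its parameters are hypothetical, and it assumes the ratio (`block_weight_radio` in the commit message) is the block width divided by the page width.

```python
def right_margin_threshold(block_width: float, page_width: float) -> float:
    """Return a right-margin threshold scaled by block width (hypothetical helper)."""
    if page_width <= 0:
        raise ValueError("page_width must be positive")
    # Assumption: the ratio is block width relative to page width.
    block_width_ratio = block_width / page_width
    # Wider blocks (ratio >= 0.5) use 0.26; narrower blocks use 0.36.
    factor = 0.26 if block_width_ratio >= 0.5 else 0.36
    return factor * block_width
```

Under these assumptions, a 400-unit block on a 600-unit page gets a threshold of about 104 units, while a 200-unit block on the same page gets about 72.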
    • refactor(para): improve paragraph splitting logic · 517fbe5b
      myhloli authored
      - Add page size information to blocks
      - Calculate block width ratio relative to page width
      - Adjust threshold for determining right side indentation
      - Implement additional checks for merging blocks across pages
      - Improve logic for identifying list structures
    • feat(ocr): improve handling of angled text boxes · 4fd966eb
      myhloli authored
      - Add calculate_is_angle function to detect angled text boxes
      - Update update_det_boxes and merge_det_boxes functions to handle angled text boxes
      - Modify angle detection logic in various parts of the code
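One way to detect an angled text box is to measure the slope of its top edge. The sketch below is an assumption for illustration only: it is not the `calculate_is_angle` function from the commit, and the name `is_angled_box`, the point ordering (top-left, top-right, bottom-right, bottom-left), and the 5-degree tolerance are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def is_angled_box(poly: Sequence[Point], tolerance_deg: float = 5.0) -> bool:
    """Return True if the top edge of a 4-point box is tilted beyond the tolerance."""
    (x1, y1), (x2, y2) = poly[0], poly[1]  # assumed: top-left, top-right
    # Angle of the top edge relative to the horizontal axis, in degrees.
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return abs(angle) > tolerance_deg
```

An axis-aligned box such as `[(0, 0), (100, 0), (100, 20), (0, 20)]` is reported as not angled, while a box whose top edge rises 30 units over 100 is reported as angled.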
  3. 15 Nov, 2024 1 commit
  4. 14 Nov, 2024 1 commit
  5. 13 Nov, 2024 1 commit
  6. 11 Nov, 2024 1 commit
  7. 08 Nov, 2024 5 commits
  8. 07 Nov, 2024 1 commit
    • feat(model): add xycut algorithm for block sorting · 7d5850e3
      myhloli authored
      - Implement xycut algorithm to sort blocks when layoutreader fails
      - Add recursive_xy_cut function to perform the xycut algorithm
      - Update pdf_parse_union_core_v2.py to use xycut when layoutreader fails
      - Modify draw_bbox.py to handle cases where layoutreader fails to sort blocks
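For readers unfamiliar with XY cut, the general idea can be sketched as below. This is a simplified illustration of the technique, not the `recursive_xy_cut` function added in this commit: the names `xy_cut_order` and `_split_by_gaps` are hypothetical, boxes are assumed to be `(x0, y0, x1, y1)` tuples with the origin at the top left, and any empty gap counts as a cut.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def _split_by_gaps(boxes: Sequence[Box], indices: List[int], axis: str) -> List[List[int]]:
    """Group box indices into runs separated by empty gaps along one axis."""
    lo, hi = (0, 2) if axis == "x" else (1, 3)
    ordered = sorted(indices, key=lambda i: boxes[i][lo])
    groups = [[ordered[0]]]
    reach = boxes[ordered[0]][hi]  # farthest extent covered so far
    for i in ordered[1:]:
        if boxes[i][lo] >= reach:
            groups.append([i])  # gap found: start a new group
        else:
            groups[-1].append(i)
        reach = max(reach, boxes[i][hi])
    return groups


def xy_cut_order(boxes: Sequence[Box]) -> List[int]:
    """Return box indices in reading order using recursive projection cuts."""

    def recurse(indices: List[int], axis: str) -> List[int]:
        if len(indices) <= 1:
            return list(indices)
        other = "x" if axis == "y" else "y"
        groups = _split_by_gaps(boxes, indices, axis)
        if len(groups) == 1:
            # No cut on this axis: try the other one.
            groups = _split_by_gaps(boxes, indices, other)
            if len(groups) == 1:
                # Boxes overlap on both axes: fall back to a plain sort.
                return sorted(indices, key=lambda i: (boxes[i][1], boxes[i][0]))
            axis, other = other, axis
        result: List[int] = []
        for group in groups:
            result.extend(recurse(group, other))
        return result

    return recurse(list(range(len(boxes))), "y")
```

On a page with a full-width title above two columns whose paragraph gaps do not line up, this ordering reads the title first, then the left column top to bottom, then the right column.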
  9. 06 Nov, 2024 1 commit
  10. 05 Nov, 2024 1 commit
  11. 04 Nov, 2024 4 commits
  12. 03 Nov, 2024 2 commits
  13. 02 Nov, 2024 2 commits
    • feat(list): improve list detection algorithm · 2bf6c268
      myhloli authored
      - Add center_close_num and external_sides_not_close_num variables to analyze line positioning
      - Implement new list detection condition for centered lines
      - Enhance existing list detection logic with additional checks
    • fix(list): improve list identification accuracy · a8f2e7d6
      myhloli authored
      - Adjust the threshold for determining right-side spacing to 0.26 * block_weight
      - Add TODO comment for special list identification with all centered lines
      - Modify the condition for recognizing short item lists with left alignment
      - Update the condition for identifying the end of a list item
  14. 01 Nov, 2024 8 commits
  15. 31 Oct, 2024 1 commit
  16. 30 Oct, 2024 2 commits
    • fix(magic_pdf): handle missing image_path in spans · faf8c286
      myhloli authored
      - Add check for 'image_path' in spans to avoid errors when it's missing
      - Update image handling in both paragraph text and content dictionary
      - Improve error handling and make the code more robust
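The guard described in this commit might look roughly like the sketch below. It is an assumption for illustration: the helper name `image_markdown`, the `image_dir` parameter, and the Markdown output format are hypothetical, and only the `image_path` key comes from the commit message.

```python
def image_markdown(span: dict, image_dir: str) -> str:
    """Render an image span as Markdown, or return '' if it has no image_path."""
    # Use .get() so a span without 'image_path' does not raise KeyError.
    image_path = span.get("image_path")
    if not image_path:
        return ""
    return f"![]({image_dir}/{image_path})"
```

A span that lacks the key (or carries an empty value) simply contributes no image to the output instead of aborting the conversion.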
    • fix(ocr): improve image and table content extraction · b7e9d454
      myhloli authored
      - Update image content extraction to iterate through all spans in a block
      - Add support for extracting table content from spans within a block
      - Handle multiple content types within table spans (latex, html, image)
      - Refactor code to be more modular and easier to maintain
  17. 28 Oct, 2024 1 commit
    • refactor(table): disable StructEqTable support and add TableMaster support · 377b09cf
      myhloli authored
      - Remove import and usage of StructTableModel
      - Add support for TableMaster model
      - Update table model initialization logic to support TableMaster
      - Log an error and exit if StructEqTable is selected, as it is being upgraded
      - Update README files to reflect changes in table parsing capabilities