  1. 24 Nov, 2024 1 commit
  2. 22 Nov, 2024 5 commits
  3. 21 Nov, 2024 7 commits
  4. 20 Nov, 2024 1 commit
  5. 19 Nov, 2024 2 commits
  6. 18 Nov, 2024 4 commits
  7. 15 Nov, 2024 2 commits
  8. 14 Nov, 2024 1 commit
  9. 13 Nov, 2024 1 commit
  10. 11 Nov, 2024 1 commit
  11. 08 Nov, 2024 5 commits
  12. 07 Nov, 2024 1 commit
    • feat(model): add xycut algorithm for block sorting · 7d5850e3
      myhloli authored
      - Implement xycut algorithm to sort blocks when layoutreader fails
      - Add recursive_xy_cut function to perform the xycut algorithm
      - Update pdf_parse_union_core_v2.py to use xycut when layoutreader fails
      - Modify draw_bbox.py to handle cases where layoutreader fails to sort blocks
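The fallback described in this commit can be illustrated with a minimal sketch of a recursive XY cut. This is not the repository's `recursive_xy_cut` implementation: the function name `xy_cut_order`, the helper `_split_by_gaps`, and the `(x0, y0, x1, y1)` box format are assumptions made for illustration. The idea is to split blocks into horizontal bands wherever a whitespace gap crosses the page, split each band into columns the same way, and recurse.

```python
from typing import List, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # assumed format: (x0, y0, x1, y1)


def _split_by_gaps(indices: Sequence[int], boxes: Sequence[BBox],
                   lo: int, hi: int) -> List[List[int]]:
    """Group box indices into runs whose [lo, hi] intervals overlap on one axis.

    A new group starts whenever a box begins at or after the end of every
    box seen so far, i.e. when there is a gap in the projection profile.
    """
    ordered = sorted(indices, key=lambda i: boxes[i][lo])
    groups: List[List[int]] = []
    current_end: Optional[float] = None
    for i in ordered:
        start, end = boxes[i][lo], boxes[i][hi]
        if current_end is None or start >= current_end:
            groups.append([i])
            current_end = end
        else:
            groups[-1].append(i)
            current_end = max(current_end, end)
    return groups


def xy_cut_order(boxes: Sequence[BBox],
                 indices: Optional[Sequence[int]] = None) -> List[int]:
    """Return box indices in reading order using a recursive XY cut."""
    if indices is None:
        indices = list(range(len(boxes)))
    if len(indices) <= 1:
        return list(indices)

    # First try to cut along the y axis (horizontal bands), then along x (columns).
    for lo, hi in ((1, 3), (0, 2)):
        groups = _split_by_gaps(indices, boxes, lo, hi)
        if len(groups) > 1:
            order: List[int] = []
            for group in groups:
                order.extend(xy_cut_order(boxes, group))
            return order

    # No gap on either axis: fall back to top-to-bottom, left-to-right.
    return sorted(indices, key=lambda i: (boxes[i][1], boxes[i][0]))


# A title above a two-column layout, listed in scrambled order.
page = [
    (60, 44, 100, 70),  # 0: right column, bottom
    (0, 0, 100, 10),    # 1: title
    (0, 52, 40, 70),    # 2: left column, bottom
    (60, 20, 100, 42),  # 3: right column, top
    (0, 20, 40, 48),    # 4: left column, top
]
print(xy_cut_order(page))  # [1, 4, 2, 3, 0]
```

In this example the title is read first, then the whole left column, then the right column, because no horizontal gap spans both columns below the title.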
  13. 06 Nov, 2024 3 commits
  14. 05 Nov, 2024 1 commit
  15. 04 Nov, 2024 5 commits
    • fix(merge_text): add ligature replacement functionality · bd755962
      myhloli authored
      - Implement __replace_ligatures function to split ligature characters
      - Integrate ligature replacement into the merge_para_with_text function
      - Handle common ligatures such as fi, fl, ff, ffi, and ffl
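As a rough illustration of what splitting ligature characters involves, the sketch below maps the five Unicode ligature code points named in the commit message to their plain-letter equivalents. The function name `replace_ligatures` and the mapping table are assumptions for illustration. The repository's `__replace_ligatures` may be written differently.

```python
import re

# Unicode "Alphabetic Presentation Forms" for the ligatures listed above.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# One alternation pattern so the text is scanned only once.
_LIGATURE_RE = re.compile("|".join(map(re.escape, LIGATURES)))


def replace_ligatures(text: str) -> str:
    """Replace each ligature code point with its component letters."""
    return _LIGATURE_RE.sub(lambda m: LIGATURES[m.group()], text)


print(replace_ligatures("e\ufb03cient \ufb01le"))  # efficient file
```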
    • feat(model): add HTML minification to StructTableModel · b5117e72
      myhloli authored
      - Import 're' module for regular expression operations
      - Implement HTML minification for 'output_format=html'
      - Add 'minify_html' method to remove unnecessary whitespace and format HTML
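A regex-based minifier of the kind this commit describes might look like the sketch below. The name `minify_html_sketch` and the two substitution rules are assumptions. The actual `minify_html` method in StructTableModel may apply different rules.

```python
import re


def minify_html_sketch(html: str) -> str:
    """Remove whitespace between tags and collapse remaining whitespace runs."""
    html = re.sub(r">\s+<", "><", html)  # drop whitespace between adjacent tags
    html = re.sub(r"\s+", " ", html)     # collapse runs inside text to one space
    return html.strip()


table = "<table>\n  <tr>\n    <td> a   b </td>\n  </tr>\n</table>\n"
print(minify_html_sketch(table))  # <table><tr><td> a b </td></tr></table>
```

This approach suits generated table markup. It is not safe for whitespace-sensitive elements such as `<pre>`, where collapsing whitespace changes the rendered content.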
    • refactor(model): comment out unused code in ppTableModel · 5ee02a99
      myhloli authored
      - Comment out an unused code block in the ppTableModel.py file
      - Improve code readability and maintainability by removing unnecessary code
    • feat(table): upgrade StructEqTable model and integrate into PDF Extract Kit · 11f23843
      myhloli authored
      - Update StructTableModel to use the latest struct-eqtable library
      - Add support for HTML table extraction in PDF Extract Kit
      - Improve error handling and model initialization
      - Update dependencies in setup.py for struct-eqtable
    • Update pdf_extract_kit.py · fb6cb8b0
      ciaran authored
      Modify line 397 to support CPU execution. Previously, specifying 'cpu' in config.json still raised a ValueError during demo execution, because a cuda device was expected but 'cpu' was received.