  1. 05 Nov, 2024 1 commit
  2. 04 Nov, 2024 5 commits
    • fix(merge_text): add ligature replacement functionality · bd755962
      myhloli authored
      - Implement __replace_ligatures function to split ligature characters
      - Integrate ligature replacement into the merge_para_with_text function
      - Handle common ligatures such as fi, fl, ff, ffi, and ffl
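The ligature handling described in this commit can be sketched as below. This is a minimal illustration, not the code from the commit: the mapping table and the public name `replace_ligatures` are assumptions, and only the five ligatures named in the message are covered.

```python
# Hypothetical mapping: the commit's actual table is not shown in this log.
# Each key is a single Unicode ligature code point (U+FB00 to U+FB04).
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def replace_ligatures(text: str) -> str:
    """Expand each ligature character into its component letters."""
    return "".join(LIGATURES.get(ch, ch) for ch in text)


print(replace_ligatures("e\ufb03cient work\ufb02ow"))  # prints "efficient workflow"
```

Expanding ligatures before merging text keeps words searchable, since PDF text layers often store "fi" or "ffl" as one code point.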
    • feat(model): add HTML minification to StructTableModel · b5117e72
      myhloli authored
      - Import 're' module for regular expression operations
      - Implement HTML minification for 'output_format=html'
      - Add 'minify_html' method to remove unnecessary whitespace and format HTML
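A regex-based minifier of the kind this commit describes might look like the sketch below. The two substitution patterns are assumptions chosen for illustration; the commit's actual `minify_html` method may differ. This naive approach also collapses whitespace inside elements such as `<pre>`, which is usually acceptable for table markup.

```python
import re


def minify_html(html: str) -> str:
    """Remove unnecessary whitespace from an HTML fragment."""
    # Collapse any run of whitespace (including newlines) into one space.
    html = re.sub(r"\s+", " ", html)
    # Drop the whitespace that sits between adjacent tags.
    html = re.sub(r">\s+<", "><", html)
    return html.strip()


table = """
<table>
  <tr>
    <td>a   b</td>
  </tr>
</table>
"""
print(minify_html(table))  # prints "<table><tr><td>a b</td></tr></table>"
```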
    • refactor(model): comment out unused code in ppTableModel · 5ee02a99
      myhloli authored
      - Comment out an unused code block in the ppTableModel.py file
      - Improve code readability and maintainability by removing unnecessary code
    • feat(table): upgrade StructEqTable model and integrate into PDF Extract Kit · 11f23843
      myhloli authored
      - Update StructTableModel to use the latest struct-eqtable library
      - Add support for HTML table extraction in PDF Extract Kit
      - Improve error handling and model initialization
      - Update dependencies in setup.py for struct-eqtable
    • Update pdf_extract_kit.py · fb6cb8b0
      ciaran authored
      Modify line 397 to ensure compatibility with CPU execution. Previously, specifying 'cpu' in config.json still raised a ValueError during demo execution, because a cuda device was expected but 'cpu' was received.
  3. 03 Nov, 2024 2 commits
  4. 02 Nov, 2024 2 commits
    • feat(list): improve list detection algorithm · 2bf6c268
      myhloli authored
      - Add center_close_num and external_sides_not_close_num variables to analyze line positioning
      - Implement new list detection condition for centered lines
      - Enhance existing list detection logic with additional checks
    • fix(list): improve list identification accuracy · a8f2e7d6
      myhloli authored
      - Adjust the threshold for determining right-side spacing to 0.26 * block_weight
      - Add TODO comment for special list identification with all centered lines
      - Modify the condition for recognizing short item lists with left alignment
      - Update the condition for identifying the end of a list item
  5. 01 Nov, 2024 8 commits
  6. 31 Oct, 2024 1 commit
  7. 30 Oct, 2024 2 commits
    • fix(magic_pdf): handle missing image_path in spans · faf8c286
      myhloli authored
      - Add check for 'image_path' in spans to avoid errors when it's missing
      - Update image handling in both paragraph text and content dictionary
      - Improve error handling and make the code more robust
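The guard described in this commit can be illustrated with the sketch below. It assumes spans are plain dictionaries and that an image span is rendered as a Markdown image reference; the helper name `image_markdown` is hypothetical and does not come from the commit.

```python
def image_markdown(span: dict) -> str:
    """Return a Markdown image reference, or '' when the span has no path."""
    # .get() avoids a KeyError when 'image_path' is missing from the span.
    image_path = span.get("image_path")
    if not image_path:
        return ""
    return f"![]({image_path})"


print(image_markdown({"type": "image", "image_path": "images/a.jpg"}))  # prints "![](images/a.jpg)"
print(repr(image_markdown({"type": "image"})))  # prints "''"
```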
    • fix(ocr): improve image and table content extraction · b7e9d454
      myhloli authored
      - Update image content extraction to iterate through all spans in a block
      - Add support for extracting table content from spans within a block
      - Handle multiple content types within table spans (latex, html, image)
      - Refactor code to be more modular and easier to maintain
  8. 28 Oct, 2024 5 commits
  9. 27 Oct, 2024 2 commits
  10. 26 Oct, 2024 1 commit
    • feat(draw_bbox): update bounding box drawing for tables and images · 0e8d5893
      myhloli authored
      - Add support for drawing bounding boxes of table and image sub-blocks
      - Implement sorting of table blocks based on type order
      - Update bounding box drawing for text and title blocks
      - Refactor code to handle different block types and their sub-blocks
  11. 25 Oct, 2024 7 commits
  12. 24 Oct, 2024 3 commits
  13. 23 Oct, 2024 1 commit
    • feat(model): add support for DocLayout-YOLO model · 1279f2cd
      myhloli authored
      - Add new layout model option: DocLayout-YOLO
      - Implement model initialization and prediction for DocLayout-YOLO
      - Update configuration options to include new model
      - Modify existing code to support both LayoutLMv3 and DocLayout-YOLO models
      - Update Gradio app to support more custom switches