1. 03 Dec, 2024 2 commits
    • fix(vram): improve VRAM checking logic · 104273cc
      myhloli authored
      - Update VRAM checking logic in app.py and model_utils.py
      - Add None and type checks for VRAM values
      - Adjust concurrency limit calculation in app.py
      - Modify clean_vram function to handle cases with no VRAM information
    • feat(gradio_app): implement dynamic concurrency limit based on VRAM · b1fe9d4f
      myhloli authored
      - Add get_concurrency_limit function to calculate concurrency limit based on VRAM
      - Update clean_vram function and rename to get_vram for better clarity
      - Apply concurrency limit to the to_markdown function in the Gradio app
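The two commits above describe deriving a concurrency limit from available VRAM while tolerating missing or malformed VRAM values. A minimal sketch of that idea follows. The function name `concurrency_limit_from_vram`, the 8 GB-per-worker ratio, and the fallback of 1 are assumptions chosen for illustration; they are not taken from the repository's `get_concurrency_limit` or `get_vram`.

```python
import math


def concurrency_limit_from_vram(vram_gb, gb_per_worker=8.0, default=1):
    """Hypothetical mapping from total VRAM (in GB) to a concurrency limit."""
    # No VRAM information (for example, no GPU present): use the fallback.
    if vram_gb is None or isinstance(vram_gb, bool):
        return default
    # Reject anything that is not a plain number, such as a string.
    if not isinstance(vram_gb, (int, float)):
        return default
    # Guard against NaN, infinity, and non-positive readings.
    if not math.isfinite(vram_gb) or vram_gb <= 0:
        return default
    # Assumed ratio: one concurrent job per gb_per_worker GB, at least `default`.
    return max(default, int(vram_gb // gb_per_worker))


if __name__ == "__main__":
    for value in (None, "24", 6, 16.0, 24):
        print(repr(value), "->", concurrency_limit_from_vram(value))
```

In a Gradio 4 or later app, a value like this could plausibly be passed as the `concurrency_limit` argument of the event listener that wraps `to_markdown`; the exact wiring used in app.py is not shown in the commit messages.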
  2. 29 Nov, 2024 1 commit
  3. 28 Nov, 2024 1 commit
  4. 27 Nov, 2024 2 commits
  5. 26 Nov, 2024 2 commits
  6. 24 Nov, 2024 2 commits
  7. 22 Nov, 2024 2 commits
  8. 21 Nov, 2024 2 commits
    • refactor(txt_parse): improve text extraction accuracy with new algorithm · 309be741
      myhloli authored
      - Implement new text extraction method (txt_spans_extract_v2) to enhance accuracy
      - Add character filling in spans for better text reconstruction
      - Introduce empty span handling using OCR for missed text
      - Optimize span filtering and overlap removal
    • feat(ocr): improve text detection and OCR accuracy · b2e37a2d
      myhloli authored
      - Update OCR utils to handle different box formats and improve angle calculation
      - Modify PDF extraction kit to support OCR option and optimize processing flow
      - Enhance PPOCR model to sort and filter detection boxes, improving text splitting accuracy
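The OCR commit above mentions sorting detection boxes. One common way to do this, shown here only as a sketch, is to group boxes into text lines by their top edge and then order each line left to right. The function name `sort_boxes_reading_order`, the `(x0, y0, x1, y1)` box layout, and the 10-pixel tolerance are assumptions, not the repository's implementation; the filtering step the commit mentions is not covered.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


def sort_boxes_reading_order(boxes: List[Box], y_tol: float = 10.0) -> List[Box]:
    """Order boxes top to bottom, then left to right within each text line."""
    # First pass: rough order by top edge, then by left edge.
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    lines: List[List[Box]] = []
    for box in ordered:
        # A box joins the current line if its top edge is within y_tol
        # of the top edge of that line's first box.
        if lines and abs(box[1] - lines[-1][0][1]) < y_tol:
            lines[-1].append(box)
        else:
            lines.append([box])
    # Second pass: sort each line by left edge and flatten.
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]


if __name__ == "__main__":
    sample = [(60, 8, 100, 30), (0, 50, 50, 70), (0, 10, 50, 30)]
    print(sort_boxes_reading_order(sample))
```

The tolerance trades robustness against slightly skewed lines for the risk of merging closely spaced lines, so it would need tuning per document type.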
  9. 19 Nov, 2024 1 commit
  10. 18 Nov, 2024 2 commits
  11. 15 Nov, 2024 1 commit
  12. 08 Nov, 2024 2 commits
    • feat(table): add RapidOCR support for RapidTable model · fe2c2c0d
      myhloli authored
      - Integrate RapidOCR with RapidTable model for table recognition
      - Improve memory management for devices with <= 8GB VRAM
      - Update table recognition process to use RapidOCR for RapidTable
      - Add rapidocr-paddle dependency in setup.py
    • feat(table): integrate RapidTable model for table recognition · 240fe99e
      myhloli authored
      - Add RapidTable model support for table recognition
      - Update table model configuration and initialization
      - Modify table recognition process to use RapidTable when specified
      - Add RapidTable dependency to setup.py
  13. 07 Nov, 2024 1 commit
    • feat(model): add xycut algorithm for block sorting · 7d5850e3
      myhloli authored
      - Implement xycut algorithm to sort blocks when layoutreader fails
      - Add recursive_xy_cut function to perform the xycut algorithm
      - Update pdf_parse_union_core_v2.py to use xycut when layoutreader fails
      - Modify draw_bbox.py to handle cases where layoutreader fails to sort blocks
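The commit above adds an XY-cut fallback for ordering blocks. The sketch below illustrates the general recursive XY-cut idea in simplified form: split the page into bands wherever the projection onto one axis has a gap, then recurse on each band along the other axis. The name `xy_cut_order`, the `(x0, y0, x1, y1)` box layout, and the gap rule are assumptions for illustration; this is not the repository's `recursive_xy_cut`.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


def _split_by_gaps(indices: List[int], boxes: Sequence[Box], axis: int) -> List[List[int]]:
    """Group indices whose projections on `axis` (0 = x, 1 = y) overlap."""
    ordered = sorted(indices, key=lambda i: boxes[i][axis])
    groups = []
    current = [ordered[0]]
    end = boxes[ordered[0]][axis + 2]
    for i in ordered[1:]:
        lo, hi = boxes[i][axis], boxes[i][axis + 2]
        if lo >= end:  # a gap in the projection: start a new group
            groups.append(current)
            current, end = [i], hi
        else:
            current.append(i)
            end = max(end, hi)
    groups.append(current)
    return groups


def xy_cut_order(boxes: Sequence[Box], indices: Optional[List[int]] = None,
                 axis: int = 1, stuck: bool = False) -> List[int]:
    """Return box indices in an approximate reading order."""
    if indices is None:
        indices = list(range(len(boxes)))
    if len(indices) <= 1:
        return list(indices)
    groups = _split_by_gaps(indices, boxes, axis)
    if len(groups) == 1:
        if stuck:  # no cut possible on either axis: fall back to a plain sort
            return sorted(indices, key=lambda i: (boxes[i][1], boxes[i][0]))
        return xy_cut_order(boxes, indices, 1 - axis, stuck=True)
    order: List[int] = []
    for group in groups:
        order.extend(xy_cut_order(boxes, group, 1 - axis))
    return order


if __name__ == "__main__":
    page = [(0, 0, 100, 10), (0, 20, 45, 40), (55, 20, 100, 30),
            (0, 50, 45, 70), (55, 35, 100, 70)]
    print(xy_cut_order(page))
```

On a page with a full-width title above two columns, this reads the title first, then the left column top to bottom, then the right column.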
  14. 06 Nov, 2024 1 commit
  15. 05 Nov, 2024 1 commit
  16. 04 Nov, 2024 4 commits
    • feat(model): add HTML minification to StructTableModel · b5117e72
      myhloli authored
      - Import 're' module for regular expression operations
      - Implement HTML minification for 'output_format=html'
      - Add 'minify_html' method to remove unnecessary whitespace and format HTML
    • refactor(model): comment out unused code in ppTableModel · 5ee02a99
      myhloli authored
      - Comment out an unused code block in the ppTableModel.py file
      - Improve code readability and maintainability by removing unnecessary code
    • feat(table): upgrade StructEqTable model and integrate into PDF Extract Kit · 11f23843
      myhloli authored
      - Update StructTableModel to use the latest struct-eqtable library
      - Add support for HTML table extraction in PDF Extract Kit
      - Improve error handling and model initialization
      - Update dependencies in setup.py for struct-eqtable
    • Update pdf_extract_kit.py · fb6cb8b0
      ciaran authored
      Modify line 397 so the demo runs on CPU. Previously, specifying 'cpu' in config.json still raised a ValueError because a cuda device was expected but 'cpu' was received.
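The HTML minification commit (b5117e72) in this group describes using the `re` module to strip unnecessary whitespace from table HTML. A minimal sketch of that kind of regex-based minifier is shown below. It is written as a standalone function with assumed rules (collapse whitespace runs, drop whitespace between adjacent tags); the actual `minify_html` method in StructTableModel may apply different patterns.

```python
import re


def minify_html(html: str) -> str:
    """Illustrative whitespace minifier for simple table HTML."""
    # Collapse every run of whitespace (including newlines) to one space.
    html = re.sub(r"\s+", " ", html)
    # Remove the whitespace that separates a closing '>' from the next '<'.
    html = re.sub(r">\s+<", "><", html)
    return html.strip()


if __name__ == "__main__":
    raw = "<table>\n  <tr>\n    <td> a  b </td>\n  </tr>\n</table>\n"
    print(minify_html(raw))
```

A regex approach like this is only reasonable for generated table markup; it does not respect whitespace-sensitive elements such as `<pre>`.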
  17. 28 Oct, 2024 5 commits
  18. 25 Oct, 2024 4 commits
  19. 24 Oct, 2024 3 commits
  20. 23 Oct, 2024 1 commit
    • feat(model): add support for DocLayout-YOLO model · 1279f2cd
      myhloli authored
      - Add new layout model option: DocLayout-YOLO
      - Implement model initialization and prediction for DocLayout-YOLO
      - Update configuration options to include new model
      - Modify existing code to support both LayoutLMv3 and DocLayout-YOLO models
      - Update Gradio app to support more custom switches