  1. 15 Oct, 2024 2 commits
    • refactor(pdf): adjust span filling threshold in block construction · 7e301b84
      myhloli authored
      Increased the threshold for filling spans in blocks from 0.3 to 0.5 to improve the accuracy of block formation. This change helps refine the grouping of spans into blocks, potentially enhancing the overall structure and readability of the PDF content.
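      To illustrate what raising the threshold from 0.3 to 0.5 does, here is a minimal sketch. It assumes a span is assigned to a block when the fraction of the span's area lying inside the block exceeds the threshold. The names `overlap_ratio` and `fill_spans_in_blocks` and the dictionary layout are hypothetical, chosen for illustration only, and are not taken from the repository.

```python
def overlap_ratio(span_bbox, block_bbox):
    """Fraction of the span's area that lies inside the block.

    Boxes are (x0, y0, x1, y1) tuples. Hypothetical helper for illustration.
    """
    sx0, sy0, sx1, sy1 = span_bbox
    bx0, by0, bx1, by1 = block_bbox
    width = min(sx1, bx1) - max(sx0, bx0)
    height = min(sy1, by1) - max(sy0, by0)
    span_area = (sx1 - sx0) * (sy1 - sy0)
    if width <= 0 or height <= 0 or span_area <= 0:
        return 0.0
    return (width * height) / span_area


def fill_spans_in_blocks(blocks, spans, threshold=0.5):
    """Attach to each block the spans whose overlap ratio exceeds the threshold."""
    filled = []
    for block in blocks:
        members = [
            span for span in spans
            if overlap_ratio(span["bbox"], block["bbox"]) > threshold
        ]
        filled.append({**block, "spans": members})
    return filled


blocks = [{"bbox": (0, 0, 100, 50)}]
spans = [
    {"bbox": (10, 10, 50, 20), "text": "inside"},       # fully inside the block
    {"bbox": (80, 10, 130, 20), "text": "straddling"},  # 40% of its area inside
]

loose = fill_spans_in_blocks(blocks, spans, threshold=0.3)
strict = fill_spans_in_blocks(blocks, spans, threshold=0.5)
```

      Under these assumptions, the straddling span joins the block at a threshold of 0.3 but is left out at 0.5, so the higher value only admits spans that lie mostly inside a block.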
    • refactor(para_split_v3): merge list and index block detection · fdcb49d3
      myhloli authored
      - Combine __is_list_block() and __is_index_block() into a single function __is_list_or_index_block()
      - Simplify block type determination logic
      - Remove redundant code and improve readability
      - Optimize block merging process
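      The idea of folding two boolean checks into one classifier can be sketched as follows. This is an illustration only: the heuristics (trailing page numbers for index blocks, leading markers for list blocks), the function name `is_list_or_index_block`, and the `min_fraction` parameter are assumptions and do not reproduce the logic in `para_split_v3`.

```python
import re

# Hypothetical patterns, chosen for illustration only.
LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")  # "- item", "1. item", "2) item"
TRAILING_NUMBER = re.compile(r"\d+\s*$")               # "Introduction .... 12"


def is_list_or_index_block(lines, min_fraction=0.8):
    """Classify a block's text lines as "index", "list", or "text".

    A single pass replaces two separate yes/no checks, so the caller
    receives one block type instead of combining two booleans.
    """
    if len(lines) < 2:
        return "text"
    total = len(lines)
    index_hits = sum(1 for line in lines if TRAILING_NUMBER.search(line))
    list_hits = sum(1 for line in lines if LIST_MARKER.match(line))
    if index_hits / total >= min_fraction:
        return "index"
    if list_hits / total >= min_fraction:
        return "list"
    return "text"


toc_type = is_list_or_index_block(["Introduction .... 1", "Methods .... 5"])
list_type = is_list_or_index_block(["- apples", "- pears", "- plums"])
prose_type = is_list_or_index_block(["Some prose here.", "More prose follows."])
```

      Returning a single label keeps the type decision in one place, which is one way the block type determination described above could be simplified.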
  2. 14 Oct, 2024 2 commits
    • fix(magic_pdf): include List and Index block types in processing · 0a9a6d3e
      myhloli authored
      Add List and Index to the list of block types being processed in the draw_bbox.py file. This inclusion ensures that these block types are handled similarly to other text-containing blocks, improving the overall document processing accuracy and consistency.
    • feat(list&index block): detect and merge list and index blocks · 1f1dd353
      myhloli authored
      - Add detection for list and index blocks in OCR processing
      - Implement merging of list and index blocks across pages
      - Update block types to include list and index categories
      - Adjust text merging logic to handle new block types
      - Modify layout drawing to distinguish list and index blocks
  3. 10 Oct, 2024 2 commits
  4. 09 Oct, 2024 2 commits
  5. 08 Oct, 2024 20 commits
  6. 06 Oct, 2024 2 commits
  7. 30 Sep, 2024 6 commits
  8. 29 Sep, 2024 2 commits
  9. 28 Sep, 2024 2 commits
    • refactor(magic_pdf): import model helpers directly for clarity · 42a7d792
      myhloli authored
      Update import statements in `pdf_parse_union_core_v2.py` to directly import
      `prepare_inputs`, `boxes2inputs`, and `parse_logits` from `magic_pdf.model.v3.helpers`
      instead of from `magic_pdf.model.v3`. This change streamlines the imports, making the
      code more readable and maintaining a cleaner approach to modular design.
    • refactor(pdf_parse_union_core_v2): update import paths to use new package structure · 5522d0a3
      myhloli authored
      Adapt import statements in `pdf_parse_union_core_v2.py` to reflect the updated package
      structure, changing from the `magic_pdf.v3.helpers` module to the `magic_pdf.model.v3`
      module. This ensures compatibility with the revised directory layout.