  1. 04 Mar, 2025 1 commit
  2. 24 Dec, 2024 1 commit
    • feat(llm): add LLM-aided formula and text correction · c660fdc8
      myhloli authored
      - Add LLM-aided formula and text correction functionality
      - Update config reader to include LLM-aided settings
      - Create new LLM-aided processing module
      - Update main processing script to incorporate LLM-aided corrections
      - Modify download scripts to check for new config version
  3. 30 Nov, 2024 1 commit
  4. 28 Nov, 2024 1 commit
  5. 25 Nov, 2024 1 commit
  6. 22 Nov, 2024 1 commit
  7. 19 Nov, 2024 1 commit
  8. 18 Nov, 2024 2 commits
    • refactor(para): adjust right margin threshold based on block width · 69805f4b
      myhloli authored
      - Introduce a variable threshold for right margin based on block width
      - Use 0.26 * block_weight for wider blocks (block_weight_radio >= 0.5)
      - Use 0.36 * block_weight for narrower blocks
      - This change aims to improve paragraph splitting accuracy for different block widths
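The width-dependent rule described in 69805f4b can be sketched roughly as below. This is a minimal illustration, not the project's code: the function name `right_margin_threshold` is hypothetical, and it assumes `block_weight` is the block's width and `block_weight_radio` is that width divided by the page width, following the identifiers used in the commit message.

```python
def right_margin_threshold(block_weight: float, block_weight_radio: float) -> float:
    """Return a right-margin threshold for a block (hypothetical helper)."""
    # Blocks spanning at least half the page width use the 0.26 factor;
    # narrower blocks use the larger 0.36 factor, per the commit message.
    factor = 0.26 if block_weight_radio >= 0.5 else 0.36
    return factor * block_weight
```

Under these assumptions, a line whose right-side gap exceeds the returned value would be treated as ending short of the block's right edge.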
    • refactor(para): improve paragraph splitting logic · 517fbe5b
      myhloli authored
      - Add page size information to blocks
      - Calculate block width ratio relative to page width
      - Adjust threshold for determining right side indentation
      - Implement additional checks for merging blocks across pages
      - Improve logic for identifying list structures
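The width-ratio calculation mentioned in 517fbe5b might look like the sketch below. The helper name `block_width_ratio` is hypothetical, and the layouts `bbox = (x0, y0, x1, y1)` and `page_size = (width, height)` are assumptions made for illustration, not taken from the commit.

```python
def block_width_ratio(bbox, page_size):
    """Return block width divided by page width (hypothetical helper)."""
    x0, _, x1, _ = bbox          # assumed bbox layout: (x0, y0, x1, y1)
    page_width, _ = page_size    # assumed page_size layout: (width, height)
    # Guard against a degenerate page width to avoid division by zero.
    if page_width <= 0:
        return 0.0
    return (x1 - x0) / page_width
```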
  9. 11 Nov, 2024 1 commit
  10. 03 Nov, 2024 1 commit
  11. 02 Nov, 2024 2 commits
    • feat(list): improve list detection algorithm · 2bf6c268
      myhloli authored
      - Add center_close_num and external_sides_not_close_num variables to analyze line positioning
      - Implement new list detection condition for centered lines
      - Enhance existing list detection logic with additional checks
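One plausible reading of the two counters named in 2bf6c268 is sketched below. The commit message does not define them, so the interpretation is an assumption drawn from the names alone: `center_close_num` counts lines whose horizontal center is near the block's center, and `external_sides_not_close_num` counts lines whose left and right edges both sit away from the block's edges. The function name, the bbox layout `(x0, y0, x1, y1)`, and the `tol` parameter are hypothetical.

```python
def count_line_positions(block_bbox, line_bboxes, tol):
    """Count centered lines and lines inset on both sides (illustrative only)."""
    bx0, _, bx1, _ = block_bbox
    block_center = (bx0 + bx1) / 2
    center_close_num = 0
    external_sides_not_close_num = 0
    for lx0, _, lx1, _ in line_bboxes:
        # Line center lies within tol of the block center.
        if abs((lx0 + lx1) / 2 - block_center) < tol:
            center_close_num += 1
        # Both the left and right edges are farther than tol from the block edges.
        if abs(lx0 - bx0) > tol and abs(bx1 - lx1) > tol:
            external_sides_not_close_num += 1
    return center_close_num, external_sides_not_close_num
```

A block where most lines are centered and inset on both sides could then be flagged as a candidate centered list, which is the kind of condition the commit describes.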
    • fix(list): improve list identification accuracy · a8f2e7d6
      myhloli authored
      - Adjust the threshold for determining right-side spacing to 0.26 * block_weight
      - Add TODO comment for special list identification with all centered lines
      - Modify the condition for recognizing short item lists with left alignment
      - Update the condition for identifying the end of a list item
  12. 21 Oct, 2024 1 commit
    • refactor(para): improve paragraph splitting algorithm · 8cc76c49
      myhloli authored
      - Adjust the threshold for identifying index blocks from 3 lines to 2 lines
      - Add a new function __is_list_group to detect if a group of blocks is a list
      - Modify the paragraph merging logic to handle list groups differently
  13. 15 Oct, 2024 3 commits
  14. 14 Oct, 2024 1 commit
    • feat(list&index block): detect and merge list and index blocks · 1f1dd353
      myhloli authored
      - Add detection for list and index blocks in OCR processing
      - Implement merging of list and index blocks across pages
      - Update block types to include list and index categories
      - Adjust text merging logic to handle new block types
      - Modify layout drawing to distinguish list and index blocks
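The cross-page merging described in 1f1dd353 could be sketched as follows. The commit does not state the merge condition, so this sketch assumes the simplest one: the first block on a page is merged into the last block carried over from earlier pages when both have the same type and that type is `list` or `index`. The dict layout (`type`, `lines`) and the function name are hypothetical.

```python
MERGEABLE_TYPES = {"list", "index"}  # assumed type labels

def merge_across_pages(pages):
    """Flatten pages into one block list, merging list/index blocks at page breaks."""
    merged = []
    for page in pages:
        for i, block in enumerate(page):
            prev = merged[-1] if merged else None
            if (
                i == 0                        # only the first block on a page
                and prev is not None
                and block["type"] in MERGEABLE_TYPES
                and prev["type"] == block["type"]
            ):
                # Continue the previous block instead of starting a new one.
                prev["lines"].extend(block["lines"])
            else:
                # Copy the lines so the input pages are not mutated.
                merged.append({"type": block["type"], "lines": list(block["lines"])})
    return merged
```

A real implementation would likely also check geometry (for example, that the earlier block ends near the bottom of its page), which this sketch leaves out.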
  15. 10 Oct, 2024 2 commits