- 15 Oct, 2024 3 commits
-
-
myhloli authored
- Update list block detection logic to require at least 2 numeric start lines - Ensure the number of numeric start lines matches the number of end lines - Remove detection of non-border starting lines for simplicity
-
myhloli authored
-
myhloli authored
- Combine __is_list_block() and __is_index_block() into a single function __is_list_or_index_block() - Simplify block type determination logic - Remove redundant code and improve readability - Optimize block merging process
-
- 14 Oct, 2024 1 commit
-
-
myhloli authored
- Add detection for list and index blocks in OCR processing- Implement merging of list and index blocks across pages - Update block types to include list and index categories - Adjust text merging logic to handle new block types - Modify layout drawing to distinguish list and index blocks
-
- 10 Oct, 2024 2 commits
-
-
myhloli authored
-
myhloli authored
- Reintegrate para_split_v3 into the pdf_parse_union_core_v2 process - Add support for specifying page range in doc_analyze_by_custom_model - Implement garbage collection and memory cleaning after processing - Refine image loading from PDF, including handling out-of-range pages
-