- 18 Nov, 2024 1 commit
-
-
myhloli authored
- Add page size information to blocks
- Calculate block width ratio relative to page width
- Adjust threshold for determining right side indentation
- Implement additional checks for merging blocks across pages
- Improve logic for identifying list structures
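As a rough illustration of the width-ratio item in this commit, a minimal sketch follows. The helper name `block_width_ratio`, the `(x0, y0, x1, y1)` bbox layout, and the `(page_w, page_h)` tuple are assumptions for illustration and are not taken from the commit itself.

```python
def block_width_ratio(bbox, page_size):
    """Return the block's width as a fraction of the page width.

    Hypothetical helper: bbox is assumed to be (x0, y0, x1, y1) and
    page_size is assumed to be (page_w, page_h), both in the same units.
    """
    x0, _, x1, _ = bbox
    page_w, _ = page_size
    if page_w <= 0:
        raise ValueError("page width must be positive")
    return (x1 - x0) / page_w
```

A ratio near 1.0 would indicate a block spanning the full page width; how the actual code uses this value is not stated in the commit message.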
-
- 11 Nov, 2024 1 commit
-
-
hyastar authored
-
- 03 Nov, 2024 1 commit
-
-
myhloli authored
- Add block_height calculation to determine block aspect ratio
- Update list identification condition to include aspect ratio check
- Improve code readability with better formatting and line breaks
-
- 02 Nov, 2024 2 commits
-
-
myhloli authored
feat(list): improve list detection algorithm
- Add center_close_num and external_sides_not_close_num variables to analyze line positioning
- Implement new list detection condition for centered lines
- Enhance existing list detection logic with additional checks
-
myhloli authored
fix(list): improve list identification accuracy
- Adjust the threshold for determining right-side spacing to 0.26 * block_weight
- Add TODO comment for special list identification with all centered lines
- Modify the condition for recognizing short item lists with left alignment
- Update the condition for identifying the end of a list item
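The right-side spacing threshold mentioned in this commit could look roughly like the sketch below. Only the `0.26 * block_weight` factor comes from the commit message; the function name `right_side_not_close`, the bbox layout, the reading of `block_weight` as the block's width, and the direction of the comparison are assumptions.

```python
def right_side_not_close(line_bbox, block_bbox, ratio=0.26):
    """Check whether a line ends well short of its block's right edge.

    Hypothetical helper: both bboxes are assumed to be (x0, y0, x1, y1).
    """
    # Assumed to be the block's width; the name follows the commit message.
    block_weight = block_bbox[2] - block_bbox[0]
    # Horizontal gap between the line's right edge and the block's right edge.
    gap = block_bbox[2] - line_bbox[2]
    return gap > ratio * block_weight
```

Under these assumptions, a line that stops 30 units short of the right edge of a 100-unit-wide block counts as not close, while one that stops 20 units short does not.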
-
- 21 Oct, 2024 1 commit
-
-
myhloli authored
- Adjust the threshold for identifying index blocks from 3 lines to 2 lines
- Add a new function __is_list_group to detect if a group of blocks is a list
- Modify the paragraph merging logic to handle list groups differently
-
- 15 Oct, 2024 3 commits
-
-
myhloli authored
- Update list block detection logic to require at least 2 numeric start lines
- Ensure the number of numeric start lines matches the number of end lines
- Remove detection of non-border starting lines for simplicity
-
myhloli authored
-
myhloli authored
- Combine __is_list_block() and __is_index_block() into a single function __is_list_or_index_block()
- Simplify block type determination logic
- Remove redundant code and improve readability
- Optimize block merging process
-
- 14 Oct, 2024 1 commit
-
-
myhloli authored
- Add detection for list and index blocks in OCR processing
- Implement merging of list and index blocks across pages
- Update block types to include list and index categories
- Adjust text merging logic to handle new block types
- Modify layout drawing to distinguish list and index blocks
-
- 10 Oct, 2024 2 commits
-
-
myhloli authored
-
myhloli authored
- Reintegrate para_split_v3 into the pdf_parse_union_core_v2 process
- Add support for specifying page range in doc_analyze_by_custom_model
- Implement garbage collection and memory cleaning after processing
- Refine image loading from PDF, including handling out-of-range pages
-