"docs/vscode:/vscode.git/clone" did not exist on "71f58f8db08dfd0d869e0c0bbe0a7fd197df1660"
Commit 97bcc8b2 authored by myhloli's avatar myhloli
Browse files

refactor(pdf_parse): improve code readability and maintainability

parent 034c59a8
...@@ -182,8 +182,7 @@ def txt_spans_extract_v2(pdf_page, spans, all_bboxes, all_discarded_blocks, lang ...@@ -182,8 +182,7 @@ def txt_spans_extract_v2(pdf_page, spans, all_bboxes, all_discarded_blocks, lang
for block in all_bboxes + all_discarded_blocks: for block in all_bboxes + all_discarded_blocks:
if block[7] in [BlockType.ImageBody, BlockType.TableBody, BlockType.InterlineEquation]: if block[7] in [BlockType.ImageBody, BlockType.TableBody, BlockType.InterlineEquation]:
continue continue
overlap_ratio = calculate_overlap_area_in_bbox1_area_ratio(span['bbox'], block[0:4]) if calculate_overlap_area_in_bbox1_area_ratio(span['bbox'], block[0:4]) > 0.5:
if overlap_ratio > 0.5:
if block in all_bboxes: if block in all_bboxes:
useful_spans.append(span) useful_spans.append(span)
else: else:
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment