Commit 71efb101 authored by myhloli

refactor(pdf_parse): adjust line calculation for block height

- Remove unnecessary addition of 1 when calculating lines for block height
- This change affects the logic for both potential double-column and triple-column structures
parent be505a95
@@ -452,14 +452,14 @@ def insert_lines_into_block(block_bbox, line_height, page_w, page_h):
 if (
     block_height > page_h * 0.25 and page_w * 0.5 > block_weight > page_w * 0.25
 ):  # possibly a two-column layout, so it can be split more finely
-    lines = int(block_height / line_height) + 1
+    lines = int(block_height / line_height)
 else:
     # if the block is wider than 0.4 of the page width, split it into 3 lines (a complex layout; figures must not be split too finely)
     if block_weight > page_w * 0.4:
         lines = 3
         line_height = (y1 - y0) / lines
     elif block_weight > page_w * 0.25:  # (possibly a three-column layout, so split finely as well)
-        lines = int(block_height / line_height) + 1
+        lines = int(block_height / line_height)
     else:  # check the aspect ratio
         if block_height / block_weight > 1.2:  # tall and narrow: do not split
             return [[x0, y0, x1, y1]]
......
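
To make the effect of dropping the `+ 1` concrete, here is a minimal sketch that compares the two line-count formulas from the hunk on a single example block. The helper names `line_count_before` and `line_count_after` and the sample numbers are hypothetical, chosen only for illustration; the surrounding logic of `insert_lines_into_block` is not reproduced here.

```python
def line_count_before(block_height, line_height):
    # Formula removed by this commit: always adds one extra line.
    return int(block_height / line_height) + 1


def line_count_after(block_height, line_height):
    # Formula introduced by this commit: whole lines that fit in the block.
    return int(block_height / line_height)


# Hypothetical sample values, not taken from the repository.
block_height = 100.0
line_height = 12.0

before = line_count_before(block_height, line_height)
after = line_count_after(block_height, line_height)

# Total height covered if every line keeps the nominal line_height.
covered_before = before * line_height
covered_after = after * line_height

print(before, covered_before)
print(after, covered_after)
```

With these sample values the old formula yields one more line than the new one, and the old count multiplied by `line_height` exceeds `block_height` while the new count does not. Whether the generated lines actually keep the nominal `line_height` depends on code outside the hunk shown, so treat this as arithmetic on the two formulas rather than a description of the full function.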