Commit 2c2fcbe8 (unverified), authored by Xiaomeng Zhao, committed by GitHub

Merge pull request #2403 from myhloli/dev

feat(model_utils): adjust table detection threshold and add features
parents 5e15d9b6 9c37d65f
@@ -48,6 +48,9 @@ Easier to use: Just grab MinerU Desktop. No coding, no login, just a simple inte
 </div>
 # Changelog
+- 2025/04/29 1.3.10 Released
+  - Support for custom formula delimiters can be achieved by modifying the `latex-delimiter-config` item in the `magic-pdf.json` file under the user directory.
+  - Pinned `pdfminer.six` to version `20250324` to prevent parsing failures caused by new versions.
 - 2025/04/27 1.3.9 Released
   - Optimized the formula parsing function to improve the success rate of formula rendering
   - Updated `pdfminer.six` to the latest version, fixing some abnormal PDF parsing issues
......
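The 1.3.10 changelog entry above names the `latex-delimiter-config` item in `magic-pdf.json` but does not show its shape. The following is a minimal sketch of updating that item from Python. The nested `display`/`inline` layout with `left`/`right` keys is an assumption, not something this diff confirms, and the helper names `with_latex_delimiters` and `update_config_file` are hypothetical.

```python
import json
from pathlib import Path


def with_latex_delimiters(config, display=("$$", "$$"), inline=("$", "$")):
    """Return a copy of `config` with a `latex-delimiter-config` item set.

    ASSUMPTION: the display/inline + left/right layout below is a guess at
    the schema; check the project's config template before relying on it.
    """
    updated = dict(config)  # shallow copy so the caller's dict is untouched
    updated["latex-delimiter-config"] = {
        "display": {"left": display[0], "right": display[1]},
        "inline": {"left": inline[0], "right": inline[1]},
    }
    return updated


def update_config_file(path: Path, **delimiters) -> None:
    """Rewrite the JSON config at `path` with the new delimiter item."""
    config = json.loads(path.read_text(encoding="utf-8"))
    new_config = with_latex_delimiters(config, **delimiters)
    path.write_text(
        json.dumps(new_config, indent=4, ensure_ascii=False),
        encoding="utf-8",
    )
```

A call such as `update_config_file(Path.home() / "magic-pdf.json", display=("\\[", "\\]"), inline=("\\(", "\\)"))` would then switch to bracket-style delimiters, assuming the file lives directly in the user's home directory.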
@@ -47,6 +47,9 @@
 </div>
 # Changelog
+- 2025/04/29 1.3.10 Released
+  - Custom formula delimiters are now supported; set them by modifying the `latex-delimiter-config` item in the `magic-pdf.json` file under the user directory.
+  - Pinned `pdfminer.six` to version `20250324` to avoid parsing failures caused by newer versions.
 - 2025/04/27 1.3.9 Released
   - Optimized formula parsing to improve the success rate of formula rendering
   - Updated `pdfminer.six` to the latest version, fixing some abnormal PDF parsing issues
......
@@ -172,8 +172,8 @@ def filter_nested_tables(table_res_list, overlap_threshold=0.8, area_threshold=0
         tables_inside = [j for j in range(len(table_res_list))
                          if i != j and is_inside(table_info[j], table_info[i], overlap_threshold)]
-        # Continue if there are at least 2 tables inside
-        if len(tables_inside) >= 2:
+        # Continue if there are at least 3 tables inside
+        if len(tables_inside) >= 3:
             # Check if inside tables overlap with each other
             tables_overlap = any(do_overlap(table_info[tables_inside[idx1]], table_info[tables_inside[idx2]])
                                  for idx1 in range(len(tables_inside))
......
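To illustrate what raising the threshold from 2 to 3 changes, here is a small self-contained sketch of the counting step only. It is not the project's implementation: the `(x0, y0, x1, y1)` box layout, the `is_inside` semantics (share of the inner box covered by the outer one), and the helper `candidates_for_nested_check` are all assumptions made for this example, and the logic that follows the overlap check in the real function is elided in the diff and is not reproduced here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); assumed layout


def area(box: Box) -> float:
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def is_inside(inner: Box, outer: Box, overlap_threshold: float = 0.8) -> bool:
    # ASSUMPTION: "inside" means the fraction of `inner` covered by `outer`
    # reaches `overlap_threshold`; the real helper may differ.
    intersection = (
        max(inner[0], outer[0]),
        max(inner[1], outer[1]),
        min(inner[2], outer[2]),
        min(inner[3], outer[3]),
    )
    inner_area = area(inner)
    return inner_area > 0 and area(intersection) / inner_area >= overlap_threshold


def candidates_for_nested_check(boxes: List[Box], min_inside: int = 3) -> List[int]:
    """Indices of boxes that contain at least `min_inside` other boxes."""
    result = []
    for i, outer in enumerate(boxes):
        tables_inside = [
            j for j, inner in enumerate(boxes) if i != j and is_inside(inner, outer)
        ]
        if len(tables_inside) >= min_inside:
            result.append(i)
    return result
```

With one large box enclosing two small ones, `min_inside=2` (the old value) selects the large box for further checks, while `min_inside=3` (the new value) leaves it alone, so a table containing only two detected sub-tables is no longer considered for filtering.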