fix(pdf_parse): improve OCR result handling

- Add null check for OCR results to prevent errors on empty lists
- Enhance robustness of OCR text processing in the magic-pdf project

Commit 6b296ee2 authored Nov 22, 2024 by myhloli

Showing with 1 addition and 1 deletion

magic_pdf/pdf_parse_union_core_v2.py +1 -1

--- a/magic_pdf/pdf_parse_union_core_v2.py
+++ b/magic_pdf/pdf_parse_union_core_v2.py
@@ -222,7 +222,7 @@ def txt_spans_extract_v2(pdf_page, spans, all_bboxes, all_discarded_blocks, lang
            ocr_res = ocr_model.ocr(span_img, det=False)
            # logger.info(f"ocr_res: {ocr_res}")
            # logger.info(f"empty_span: {span}")
-            if len(ocr_res) > 0:
+            if ocr_res and len(ocr_res) > 0:
                if len(ocr_res[0]) > 0:
                    ocr_text, ocr_score = ocr_res[0][0]
                    if ocr_score > 0.5 and len(ocr_text) > 0:
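
The guard in this hunk can be illustrated in isolation. The sketch below is not code from magic-pdf: `first_ocr_text` and its `min_score` parameter are hypothetical names, and it assumes the OCR call may return `None`, an empty list, or a list whose first entry is empty or `None`. It mirrors the checks in the diff and shows why `len(ocr_res)` alone fails when the result is `None`.

```python
from typing import Optional


def first_ocr_text(ocr_res, min_score: float = 0.5) -> Optional[str]:
    """Return the first recognized text if it passes the score threshold.

    Hypothetical helper for illustration only. `ocr_res` is assumed to have
    the shape [[(text, score), ...]], or to be None / empty when nothing
    was recognized.
    """
    # `not ocr_res` is True for both None and an empty list, so this one
    # test covers what `ocr_res and len(ocr_res) > 0` checks in the diff.
    if not ocr_res:
        return None

    first = ocr_res[0]
    # Assumption beyond the diff: also tolerate a None first entry, which
    # `len(ocr_res[0]) > 0` would reject with a TypeError.
    if not first:
        return None

    ocr_text, ocr_score = first[0]
    if ocr_score > min_score and len(ocr_text) > 0:
        return ocr_text
    return None


# The pre-fix check fails on None because len() needs a sized object.
try:
    len(None)
except TypeError:
    unguarded_check_raises = True
else:
    unguarded_check_raises = False
```

Because a truthiness test already rejects both `None` and `[]`, the `len(ocr_res) > 0` half of the new condition is redundant but harmless; the commit keeps it, presumably to keep the change minimal.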