    fix(pdf): improve ligature handling and text extraction · c638fc5d
    myhloli authored
    - Move ligature replacement function to pdf_parse_union_core_v2.py
    - Optimize ligature replacement using a more efficient approach
    - Modify text extraction flags to preserve ligatures in PDF content
    - Remove unnecessary function from ocr_mkcontent.py
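
The commit message does not show the new replacement function, so the following is only a minimal sketch of one way a ligature replacement can be made efficient: a translation table built once and applied in a single pass, rather than chained `str.replace` calls. The names `LIGATURES` and `replace_ligatures` and the contents of the mapping are assumptions for illustration, not the code in pdf_parse_union_core_v2.py.

```python
# Hypothetical ligature table; the mapping used in the actual commit may differ.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
}

# Build the translation table once at import time so each call is a single pass.
_LIGATURE_TABLE = str.maketrans(LIGATURES)


def replace_ligatures(text: str) -> str:
    """Expand ligature code points into their plain-letter sequences."""
    return text.translate(_LIGATURE_TABLE)


if __name__ == "__main__":
    print(replace_ligatures("e\ufb03cient \ufb01le"))
```

This step only has something to expand if the extractor keeps ligature code points in its output, which is what the third bullet describes. PyMuPDF, for example, exposes a `TEXT_PRESERVE_LIGATURES` extraction flag; whether this commit uses that library or flag is an assumption here.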
ocr_mkcontent.py 12.4 KB