refactor(ocr): Increase the dilation factor in OCR to address word concatenation
- Remove unused functions such as split_long_words, ocr_mk_mm_markdown_with_para, etc.
- Simplify ocr_mk_markdown_with_para_core_v2 by removing unnecessary language detection and word splitting logic
- Remove wordninja dependency from requirements
- Update ocr_model_init to include additional parameters for OCR model configuration
@@ -8,7 +8,6 @@ pdfminer.six==20231228
 pydantic>=2.7.2,<2.8.0
 PyMuPDF>=1.24.9
 scikit-learn>=1.0.2
-wordninja>=2.0.0
 torch>=2.2.2,<=2.3.1
 transformers
 # The requirements.txt must ensure that only necessary external dependencies are introduced. If there are new dependencies to add, please contact the project administrator.
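
One way to sanity-check a dependency removal like the one in this hunk is to parse the requirement lines and confirm the package name is gone. The following is only a minimal sketch: `requirement_names` and the `AFTER` string are hypothetical helpers written for illustration (they are not part of this repository), and the parser handles only simple `name<specifier>` lines such as those shown above.

```python
import re


def requirement_names(text):
    """Return the lowercased package names found in requirements-style text.

    Hypothetical helper: handles plain `name<specifier>` lines only,
    not options such as `-r other.txt` or URL requirements.
    """
    names = set()
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # The package name is the leading run of name characters,
        # i.e. everything before the first version specifier.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


# Post-change content of the hunk shown above (context lines only).
AFTER = """\
pydantic>=2.7.2,<2.8.0
PyMuPDF>=1.24.9
scikit-learn>=1.0.2
torch>=2.2.2,<=2.3.1
transformers
# The requirements.txt must ensure that only necessary external dependencies are introduced.
"""

names = requirement_names(AFTER)
print("wordninja" in names)
```

A check like this only covers the requirements file; any remaining `import wordninja` statements in the code would still need to be searched for separately.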