Commit 089a63e0 authored by Jannik Streidl

feat: added ocr functionality to the pdf loader

parent eb51ad14
@@ -419,7 +419,7 @@ def get_loader(filename: str, file_content_type: str, file_path: str):
     ]
 
     if file_ext == "pdf":
-        loader = PyPDFLoader(file_path)
+        loader = PyPDFLoader(file_path, extract_images=True)
     elif file_ext == "csv":
         loader = CSVLoader(file_path)
     elif file_ext == "rst":
......
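The first hunk changes only the PDF branch of `get_loader`: `PyPDFLoader(file_path, extract_images=True)` asks the loader to also pull images embedded in each PDF page and run OCR on them, so scanned pages yield text instead of nothing. The sketch below is not the project's code; it only shows how an extension-based dispatch can carry that per-type flag. `StubLoader`, `make`, `LOADERS`, and the `TextLoader` fallback are hypothetical stand-ins chosen so the example runs without LangChain installed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class StubLoader:
    """Hypothetical stand-in for a LangChain document loader.

    It only records which loader was chosen and with which keyword arguments.
    """
    name: str
    file_path: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


def make(name: str, **fixed: Any) -> Callable[[str], StubLoader]:
    # Bind a loader name plus any fixed keyword arguments (e.g. extract_images).
    return lambda path: StubLoader(name, path, dict(fixed))


# Hypothetical dispatch table mirroring the if/elif chain in get_loader.
LOADERS: Dict[str, Callable[[str], StubLoader]] = {
    "pdf": make("PyPDFLoader", extract_images=True),  # OCR on embedded images
    "csv": make("CSVLoader"),
}


def get_loader(filename: str, file_path: str) -> StubLoader:
    # Use the text after the last dot as the extension, case-insensitively.
    file_ext = filename.split(".")[-1].lower()
    # Hypothetical fallback for extensions not in the table.
    factory = LOADERS.get(file_ext, make("TextLoader"))
    return factory(file_path)
```

Keeping options such as `extract_images=True` in the same table entry as the loader they configure makes the per-type behavior easy to see; for a handful of types the original if/elif chain is equivalent.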
@@ -33,6 +33,7 @@ pandas
 openpyxl
 pyxlsb
 xlrd
+rapidocr-onnxruntime
 faster-whisper
......
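The second hunk adds `rapidocr-onnxruntime` to the requirements because LangChain's PDF image extraction delegates the OCR step to that package. A minimal sketch, assuming one wanted OCR to stay optional rather than mandatory: check whether the module is importable and only then request image extraction. `ocr_available` and `pdf_loader_kwargs` are hypothetical helpers; the commit itself simply makes the dependency required.

```python
import importlib.util


def ocr_available() -> bool:
    """Return True if the rapidocr-onnxruntime package can be imported.

    The PyPI distribution is named "rapidocr-onnxruntime", but the
    importable module is "rapidocr_onnxruntime".
    """
    return importlib.util.find_spec("rapidocr_onnxruntime") is not None


def pdf_loader_kwargs() -> dict:
    # Hypothetical helper: request image extraction (OCR) only when the
    # OCR backend is installed, so PDF ingestion falls back to text-only
    # extraction instead of failing when the package is missing.
    return {"extract_images": ocr_available()}
```

A caller would then write `PyPDFLoader(file_path, **pdf_loader_kwargs())`.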