Commit 8fb5f547 (unverified), authored by Timothy Jaeryang Baek, committed by GitHub

Merge pull request #1050 from jannikstdl/rag-pdf-ocr

feat: added ocr functionality to the pdf loader
parents 2111398d 089a63e0
......@@ -425,7 +425,7 @@ def get_loader(filename: str, file_content_type: str, file_path: str):
     ]
     if file_ext == "pdf":
-        loader = PyPDFLoader(file_path)
+        loader = PyPDFLoader(file_path, extract_images=True)
     elif file_ext == "csv":
         loader = CSVLoader(file_path)
     elif file_ext == "rst":
......
......@@ -34,6 +34,7 @@ pandas
 openpyxl
 pyxlsb
 xlrd
+rapidocr-onnxruntime
 faster-whisper
......
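
The two hunks above go together: passing extract_images=True makes PyPDFLoader run OCR on images embedded in the PDF, and that path relies on the rapidocr-onnxruntime package added to the requirements. As a minimal sketch (not part of this commit), a more defensive variant could enable image extraction only when the OCR package is importable. The helper names ocr_available, pdf_loader_kwargs, and build_pdf_loader are hypothetical, and the langchain_community import path is an assumption about where PyPDFLoader comes from in this codebase.

```python
import importlib.util


def ocr_available() -> bool:
    """Return True if the rapidocr_onnxruntime module can be found."""
    # find_spec only locates the module; it does not import or run it.
    return importlib.util.find_spec("rapidocr_onnxruntime") is not None


def pdf_loader_kwargs() -> dict:
    """Hypothetical helper: turn on image extraction only when OCR is installed."""
    return {"extract_images": ocr_available()}


def build_pdf_loader(file_path: str):
    """Hypothetical helper mirroring the "pdf" branch of get_loader."""
    # Assumed import path; imported lazily so this module still loads
    # in environments where LangChain is not installed.
    from langchain_community.document_loaders import PyPDFLoader

    return PyPDFLoader(file_path, **pdf_loader_kwargs())
```

With this shape, get_loader could call build_pdf_loader(file_path) in its "pdf" branch and fall back to text-only extraction when the OCR dependency is missing. The commit itself takes the simpler route of always passing extract_images=True and listing rapidocr-onnxruntime as a requirement.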