Commit 8fb5f547 authored by Timothy Jaeryang Baek, committed by GitHub

Merge pull request #1050 from jannikstdl/rag-pdf-ocr

feat: added ocr functionality to the pdf loader
parents 2111398d 089a63e0
@@ -425,7 +425,7 @@ def get_loader(filename: str, file_content_type: str, file_path: str):
     ]
     if file_ext == "pdf":
-        loader = PyPDFLoader(file_path)
+        loader = PyPDFLoader(file_path, extract_images=True)
     elif file_ext == "csv":
         loader = CSVLoader(file_path)
     elif file_ext == "rst":
......
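For readers who want to try the changed call in isolation, the following is a minimal sketch rather than the project's code. The helper names `pdf_loader_kwargs` and `load_pdf` are hypothetical, and the import path `langchain_community.document_loaders` is an assumption about the installed LangChain version. Only `PyPDFLoader(file_path, extract_images=True)` comes from the diff above.

```python
from typing import Any, Dict


def pdf_loader_kwargs(enable_ocr: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for PyPDFLoader (hypothetical helper).

    extract_images=True asks the loader to also run OCR on images
    embedded in the PDF, which is the behavior this change enables.
    """
    return {"extract_images": enable_ocr}


def load_pdf(file_path: str, enable_ocr: bool = True):
    """Load a PDF into LangChain documents (hypothetical helper)."""
    # Imported lazily so the kwargs helper can be used without LangChain.
    # Assumption: PyPDFLoader is exposed by langchain_community here.
    from langchain_community.document_loaders import PyPDFLoader

    loader = PyPDFLoader(file_path, **pdf_loader_kwargs(enable_ocr))
    return loader.load()
```

Calling `load_pdf("scan.pdf")` on a hypothetical scanned file would then return documents whose text includes whatever the OCR step could recover from the page images, provided an OCR backend is installed.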
@@ -34,6 +34,7 @@ pandas
 openpyxl
 pyxlsb
 xlrd
+rapidocr-onnxruntime
 faster-whisper
......
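The new `rapidocr-onnxruntime` requirement is presumably the OCR backend that image extraction relies on. A small check like the one below could confirm it is installed before enabling OCR. This is a sketch: the function name `ocr_backend_available` is hypothetical, and the importable module name `rapidocr_onnxruntime` is assumed from the package name.

```python
import importlib.util


def ocr_backend_available(module_name: str = "rapidocr_onnxruntime") -> bool:
    """Return True if the given top-level module can be imported.

    The default module name is an assumption derived from the
    rapidocr-onnxruntime package added to the requirements.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if ocr_backend_available():
        print("OCR backend found; PDF image extraction can be enabled.")
    else:
        print("OCR backend missing; install rapidocr-onnxruntime first.")
```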