feat: add support for JPEG images and update documentation

- Add '.jpeg' to the list of supported image extensions in app.py and read_api.py
- Update projects READMEs to indicate that web_demo is deprecated

Commit fcb5660f authored Apr 21, 2025 by myhloli
Showing with 5 additions and 5 deletions

magic_pdf/data/read_api.py +1 -1
projects/README.md +1 -1
projects/README_zh-CN.md +1 -1
projects/web_api/app.py +2 -2

--- a/magic_pdf/data/read_api.py
+++ b/magic_pdf/data/read_api.py
@@ -116,7 +116,7 @@ def read_local_office(path: str) -> list[PymuDocDataset]:
    shutil.rmtree(temp_dir)
    return ret

-def read_local_images(path: str, suffixes: list[str]=['.png', '.jpg']) -> list[ImageDataset]:
+def read_local_images(path: str, suffixes: list[str]=['.png', '.jpg', '.jpeg']) -> list[ImageDataset]:
    """Read images from path or directory.

    Args:
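
Note on the hunk above: the function body is not shown in this diff, so how `suffixes` is applied is an assumption here. A minimal sketch of suffix-based filtering, using a hypothetical helper `list_images` that is not part of magic_pdf, might look like this:

```python
from pathlib import Path
from typing import List, Sequence


def list_images(
    directory: str,
    suffixes: Sequence[str] = (".png", ".jpg", ".jpeg"),
) -> List[Path]:
    """Hypothetical helper: return files in `directory` whose extension is allowed."""
    # Compare in lower case so that "photo.JPEG" also matches ".jpeg".
    allowed = {s.lower() for s in suffixes}
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in allowed
    )
```

The sketch uses a tuple default rather than a list because a list default is created once and shared between calls in Python. That is harmless as long as the function never mutates it. The case-insensitive comparison is an addition of this sketch and may not match the behavior of `read_local_images`.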

--- a/projects/README.md
+++ b/projects/README.md
@@ -4,6 +4,6 @@

 - [llama_index_rag](./llama_index_rag/README.md): Build a lightweight RAG system based on llama_index
 - [gradio_app](./gradio_app/README.md): Build a web app based on gradio
-- [web_demo](./web_demo/README.md): MinerU online [demo](https://opendatalab.com/OpenSourceTools/Extractor/PDF/) localized deployment version
+- ~~[web_demo](./web_demo/README.md): MinerU online [demo](https://opendatalab.com/OpenSourceTools/Extractor/PDF/) localized deployment version~~(Deprecated)
 - [web_api](./web_api/README.md): Web API Based on FastAPI
 - [multi_gpu](./multi_gpu/README.md): Multi-GPU parallel processing based on LitServe
--- a/projects/README_zh-CN.md
+++ b/projects/README_zh-CN.md
@@ -4,6 +4,6 @@

 - [llama_index_rag](./llama_index_rag/README_zh-CN.md): 基于 llama_index 构建轻量级 RAG 系统
 - [gradio_app](./gradio_app/README_zh-CN.md): 基于 Gradio 的 Web 应用
-- [web_demo](./web_demo/README_zh-CN.md): MinerU在线[demo](https://opendatalab.com/OpenSourceTools/Extractor/PDF/)本地化部署版本
+- ~~[web_demo](./web_demo/README_zh-CN.md): MinerU在线[demo](https://opendatalab.com/OpenSourceTools/Extractor/PDF/)本地化部署版本~~(已过时)
 - [web_api](./web_api/README.md): 基于 FastAPI 的 Web API
 - [multi_gpu](./multi_gpu/README.md): 基于 LitServe 的多 GPU 并行处理
--- a/projects/web_api/app.py
+++ b/projects/web_api/app.py
@@ -28,7 +28,7 @@ app = FastAPI()

 pdf_extensions = [".pdf"]
 office_extensions = [".ppt", ".pptx", ".doc", ".docx"]
-image_extensions = [".png", ".jpg"]
+image_extensions = [".png", ".jpg", ".jpeg"]

 class MemoryDataWriter(DataWriter):
    def __init__(self):
@@ -128,7 +128,7 @@ def process_file(
        Tuple[InferenceResult, PipeResult]: Returns inference result and pipeline result
    """

-    ds = Union[PymuDocDataset, ImageDataset]
+    ds: Union[PymuDocDataset, ImageDataset] = None
    if file_extension in pdf_extensions:
        ds = PymuDocDataset(file_bytes)
    elif file_extension in office_extensions:
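
Note on the second app.py hunk: the removed line `ds = Union[PymuDocDataset, ImageDataset]` assigned the `typing.Union` object itself to `ds`, so an unmatched extension would leave `ds` holding a type construct rather than a dataset. The replacement turns it into an annotation with a `None` initial value. A minimal sketch of the pattern follows. The classes `FakePdfDataset` and `FakeImageDataset` and the function `select_dataset` are hypothetical stand-ins, not the project's code, and the sketch annotates with `Optional[...]` because the variable starts as `None`:

```python
from typing import Optional, Union


class FakePdfDataset:
    """Hypothetical stand-in for PymuDocDataset."""

    def __init__(self, data: bytes) -> None:
        self.data = data


class FakeImageDataset:
    """Hypothetical stand-in for ImageDataset."""

    def __init__(self, data: bytes) -> None:
        self.data = data


PDF_EXTENSIONS = [".pdf"]
IMAGE_EXTENSIONS = [".png", ".jpg", ".jpeg"]


def select_dataset(
    file_bytes: bytes, file_extension: str
) -> Optional[Union[FakePdfDataset, FakeImageDataset]]:
    # Annotation plus None: ds is "no dataset yet", not the Union type itself.
    ds: Optional[Union[FakePdfDataset, FakeImageDataset]] = None
    if file_extension in PDF_EXTENSIONS:
        ds = FakePdfDataset(file_bytes)
    elif file_extension in IMAGE_EXTENSIONS:
        ds = FakeImageDataset(file_bytes)
    return ds
```

With this shape a caller can test `ds is None` to detect an unsupported extension, which the old assignment would not have allowed.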