Add results and Update README

Commit 92a367a9 authored Oct 31, 2025 by chenych
15 changed files
--- a/Contributors.md
+++ b/Contributors.md
+# Contributors
+None
\ No newline at end of file
--- a/LICENSE
+++ b/LICENSE
+MIT License
+
+Copyright (c) 2025 DeepSeek
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/README.md
+++ b/README.md
+# PaddleOCR-VL
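+## Paper
+[PaddleOCR-VL](https://arxiv.org/abs/2510.14528)
+
+## Model Architecture
+PaddleOCR-VL is an ultra-lightweight vision-language model released by Baidu's PaddlePaddle team in October 2025 and optimized for document parsing. It is among the strongest models derived from the ERNIE-4.5 series. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model for accurate element recognition. The model supports 109 languages and performs well on complex elements such as text, tables, formulas, and charts, while keeping resource consumption very low.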
+<div align=center>
+    <img src="./doc/model.png"/>
+</div>
+
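+## Algorithm
+PaddleOCR-VL splits document parsing into two stages. In the first stage, PP-DocLayoutV2 performs layout analysis: it locates semantic regions and predicts their reading order. In the second stage, PaddleOCR-VL-0.9B uses these layout predictions to perform fine-grained recognition of text, tables, formulas, charts, and other content. Finally, a lightweight post-processing module aggregates the outputs of both stages and formats the final document as structured Markdown and JSON.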
+<div align=center>
+    <img src="./doc/method.png"/>
+</div>
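
The aggregation step described above can be pictured with a small sketch. This is not PaddleOCR-VL's actual post-processing code: the `Region` class, its field names, and the label values are hypothetical, chosen only to show how regions carrying a predicted reading order might be joined into Markdown.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One layout region; the class and its fields are hypothetical."""
    order: int    # reading-order index predicted by the layout stage
    label: str    # region type, e.g. "title", "text", "table", "formula"
    content: str  # content recognized for this region by the VLM stage


def assemble_markdown(regions):
    """Join recognized regions into Markdown, following reading order."""
    parts = []
    for region in sorted(regions, key=lambda r: r.order):
        if region.label == "title":
            parts.append(f"# {region.content}")
        elif region.label == "formula":
            parts.append(f"$$\n{region.content}\n$$")
        else:
            # Plain text and tables are passed through unchanged.
            parts.append(region.content)
    return "\n\n".join(parts)
```

Sorting by the predicted order before formatting is what lets the second stage recognize regions in any order, or in parallel, without affecting the final document.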
+
+## Environment Setup
+### Hardware Requirements
+DCU model: K100AI; nodes: 1; cards: 1.
+
+Adjust the `-v` mount paths and `docker_name` to match your environment.
+
+### Docker (Method 1)
+```bash
+docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.9.2-ubuntu22.04-dtk25.04.2-py3.10-paddleocr-vl
+
+docker run -it --shm-size 200g --network=host --name {docker_name} \
+    --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd \
+    --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
+    -u root \
+    -v /path/your_code_data/:/path/your_code_data/ \
+    -v /opt/hyhal/:/opt/hyhal/:ro \
+    image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.9.2-ubuntu22.04-dtk25.04.2-py3.10-paddleocr-vl bash
+
+cd /your_code_path/paddleocr-vl_paddle
+```
+
+### Dockerfile (Method 2)
+```bash
+cd docker
+docker build --no-cache -t paddleocr-vl:latest .
+
+docker run -it --shm-size 200g --network=host --name {docker_name} \
+    --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd \
+    --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
+    -u root \
+    -v /path/your_code_data/:/path/your_code_data/ \
+    -v /opt/hyhal/:/opt/hyhal/:ro \
+    paddleocr-vl:latest bash
+
+cd /your_code_path/paddleocr-vl_paddle
+```
+
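+### Anaconda (Method 3)
+The DCU-specific deep learning libraries this project requires can be downloaded and installed from the [Guanghe (光合)](https://developer.sourcefind.cn/tool/) developer community.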
+```bash
+DTK: 25.04.2
+python: 3.10.12
+vllm: 0.9.2+das.opt1.dtk25042
+transformers: 4.57.1
+```
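+`Tips: the versions of the DTK driver, PyTorch, and other DCU-related tools listed above must match one another exactly.` Install the remaining non-deep-learning libraries according to requirements.txt: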
+```bash
+python -m pip install paddlepaddle-dcu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/dcu/
+python -m pip install -U "paddleocr[doc-parser]"
+python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl
+```
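
Because the version pins above must match exactly, it can help to verify an environment before running inference. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes the packages are registered under the distribution names `vllm`, `transformers`, and `paddlepaddle-dcu`.

```python
from importlib import metadata

# Versions taken from the list and install commands above.
EXPECTED = {
    "vllm": "0.9.2+das.opt1.dtk25042",
    "transformers": "4.57.1",
    "paddlepaddle-dcu": "3.2.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(EXPECTED)` returns an empty dict when every pin is satisfied. The `lookup` parameter exists only so the check can be exercised without the real packages installed.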
+
+## Dataset
+None
+
+## Training
+None
+
+## Inference
+> Adjust the model path, test image path, and output path to match your environment.
+### Command-Line Inference
+```bash
+export PADDLE_PDX_DISABLE_DEV_MODEL_WL=1
+
+paddleocr doc_parser -i ./doc/paddleocr_vl_demo.png --device DCU --precision fp32 --save_path ./output
+```
+### vLLM
+Server
+```bash
+export PADDLE_PDX_DISABLE_DEV_MODEL_WL=true
+
+vllm serve PaddlePaddle/PaddleOCR-VL --trust-remote-code --max-model-len 16384 --max-num-batched-tokens 16384 --gpu-memory-utilization 0.8 --served-model-name PaddleOCR-VL-0.9B
+```
+Client
+```bash
+curl http://localhost:8000/v1/chat/completions   \
+    -H "Content-Type:application/json"  \
+    -d '{
+        "messages": [
+            {
+                "role": "user",
+                "content": [
+                    {"type": "image_url", "image_url": {"url": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png"}},
+                    {"type": "text", "text": "OCR:"}
+                ]
+            }
+        ],
+        "temperature": 0.7
+    }'
+```
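
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the `curl` call above. The function names are hypothetical, and the endpoint default assumes the server started by `vllm serve` is listening on `localhost:8000`.

```python
import json
import urllib.request


def build_payload(image_url, prompt="OCR:", temperature=0.7):
    """Build the chat-completions request body used in the curl example."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "temperature": temperature,
    }


def send(payload, endpoint="http://localhost:8000/v1/chat/completions", timeout=120):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_payload(url))` returns the parsed response. For an OpenAI-compatible endpoint the recognized text would normally be found under `choices[0]["message"]["content"]`, though that shape is an assumption here and not something the source shows.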
+
+## Results
+
+<div align=center>
+    <img src="./doc/result-dcu.png"/>
+</div>
+
+### Accuracy
+Accuracy on DCU matches that on GPU. Inference framework: paddle.
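
The source does not say how the DCU and GPU results were compared. One simple way to spot-check agreement, offered only as an assumption, is to compare the Markdown text produced on each device with a string-similarity ratio. The function name is hypothetical.

```python
import difflib


def similarity(reference_text, candidate_text):
    """Return a ratio in [0, 1]; 1.0 means the two texts are identical."""
    matcher = difflib.SequenceMatcher(
        None, reference_text, candidate_text, autojunk=False
    )
    return matcher.ratio()
```

A ratio of 1.0 for the same input image on both devices would indicate identical output. Lower values point to the pages worth inspecting by hand.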
+
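+## Application Scenarios
+### Algorithm Category
+OCR
+
+### Key Application Industries
+`Manufacturing, finance, transportation, education, healthcare`
+
+## Pretrained Weights
+- [PaddleOCR-VL](https://huggingface.co/PaddlePaddle/PaddleOCR-VL)
+
+## Source Repository and Issue Feedback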
+- https://developer.sourcefind.cn/codes/modelzoo/deepseek-ocr_pytorch
+
+## References
+- https://github.com/PaddlePaddle/PaddleOCR
+- https://www.paddleocr.ai/latest/version3.x/pipeline_usage/PaddleOCR-VL.html
--- a/doc/PaddleOCR-VL.pdf
+++ b/doc/PaddleOCR-VL.pdf
--- a/doc/method.png
+++ b/doc/method.png
--- a/doc/model.png
+++ b/doc/model.png
--- a/doc/paddleocr_vl_demo.png
+++ b/doc/paddleocr_vl_demo.png
--- a/doc/result-dcu.png
+++ b/doc/result-dcu.png
--- a/doc/钢笔中文手写_000050_crop_9.jpg
+++ b/doc/钢笔中文手写_000050_crop_9.jpg
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
+FROM image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.9.2-ubuntu22.04-dtk25.04.2-py3.10-paddleocr-vl
\ No newline at end of file
--- a/icon.png
+++ b/icon.png
--- a/model.properties
+++ b/model.properties
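+# Unique model identifier
+modelCode=1791
+# Model name
+modelName=paddleocr-vl_paddle
+# Model description
+modelDescription=PaddlePaddle has officially released PaddleOCR-VL, a new-generation multimodal document parsing solution. With only 0.9B parameters, it sets new records on several authoritative document parsing benchmarks and can parse documents in 109 languages.
+# Application scenarios
+appScenario=推理,OCR,制造,金融,交通,教育,医疗
+# Framework type
+frameType=paddle
+# Accelerator card type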
+accelerateType=K100AI,BW1000
\ No newline at end of file
--- a/paddleocr-vl-image.py
+++ b/paddleocr-vl-image.py
+from paddleocr import PaddleOCRVL
+
+pipeline = PaddleOCRVL(device='DCU')
+# pipeline = PaddleOCRVL(use_doc_orientation_classify=True)  # use_doc_orientation_classify toggles the document orientation classification model
+# pipeline = PaddleOCRVL(use_doc_unwarping=True)  # use_doc_unwarping toggles the text image unwarping module
+# pipeline = PaddleOCRVL(use_layout_detection=False)  # use_layout_detection toggles the layout region detection and ordering module
+output = pipeline.predict("./doc/paddleocr_vl_demo.png")
+for res in output:
+    res.print()  # Print the predicted structured output
+    res.save_to_json(save_path="output")  # Save the structured JSON result for the current image
+    res.save_to_markdown(save_path="output")  # Save the Markdown result for the current image
--- a/paddleocr-vl-pdf.py
+++ b/paddleocr-vl-pdf.py
+from pathlib import Path
+from paddleocr import PaddleOCRVL
+
+# Adjust the input PDF and the output directory to match your environment.
+input_file = "./your_pdf_file.pdf"
+output_path = Path("./output")
+
+pipeline = PaddleOCRVL(device='DCU')
+output = pipeline.predict(input=input_file)
+
+# Collect the Markdown data and the embedded images of every page.
+markdown_list = []
+markdown_images = []
+
+for res in output:
+    md_info = res.markdown
+    markdown_list.append(md_info)
+    markdown_images.append(md_info.get("markdown_images", {}))
+
+# Merge the per-page Markdown into a single document.
+markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)
+
+# Write the merged Markdown to <output_path>/<input file stem>.md.
+mkd_file_path = output_path / f"{Path(input_file).stem}.md"
+mkd_file_path.parent.mkdir(parents=True, exist_ok=True)
+
+with open(mkd_file_path, "w", encoding="utf-8") as f:
+    f.write(markdown_texts)
+
+# Save each image referenced by the Markdown under its relative path.
+for item in markdown_images:
+    if item:
+        for path, image in item.items():
+            file_path = output_path / path
+            file_path.parent.mkdir(parents=True, exist_ok=True)
+            image.save(file_path)
\ No newline at end of file
--- a/requirements.txt
+++ b/requirements.txt
+shapely
+scikit-image
+pyclipper
+lmdb
+tqdm
+numpy
+rapidfuzz
+opencv-python
+opencv-contrib-python
+cython
+Pillow
+pyyaml
+requests
+albumentations
+# to be compatible with albumentations
+albucore
+packaging