Commit 1ad6cb11 authored by raojy

Docs: update config instructions for vLLM compatibility

parent bd1df840
...@@ -232,8 +232,6 @@ torchrun ./LLaMA-Factory/src/train.py \
### Single node, single GPU
```
python inference.py
```
...@@ -244,18 +242,63 @@ python inference.py
CUDA_VISIBLE_DEVICES=0,1,2,3 python inference.py
```
### vLLM
If you hit a `ValidationError` or `KeyError` configuration error when starting the server, it is usually because the current vLLM version is not yet fully compatible with the configuration fields used by the newer model.

**Fix:** manually edit the `config.json` file in the model directory and rename the `type` field inside the `rope_scaling` section to `rope_type`.

Before the fix:
<div align=center>
<img src="./images/before_fix.png"/>
</div>
After the fix:
<div align=center>
<img src="./images/after_fix.png"/>
</div>
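If you would rather script the rename than edit `config.json` by hand, a short standalone Python helper can do it. This is a sketch, not part of the repo; the function name `fix_rope_scaling` and the example path are illustrative only.

```python
import json
from pathlib import Path


def fix_rope_scaling(config_path: Path) -> bool:
    """Rename `type` to `rope_type` inside the rope_scaling section.

    Returns True if the file was modified, False if nothing needed fixing.
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    rope = config.get("rope_scaling")
    if rope is None or "type" not in rope:
        return False
    rope["rope_type"] = rope.pop("type")
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return True


if __name__ == "__main__":
    # Hypothetical local model directory; adjust to wherever the weights live.
    fix_rope_scaling(Path("Qwen/Qwen2.5-VL-3B-Instruct") / "config.json")
```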
#### Single-node inference
```
## Start the server
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1
# Launch command
vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
  --trust-remote-code \
  --max-model-len 32768 \
  --served-model-name qwen-vl \
  --dtype bfloat16 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.9

## Query from a client
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-vl",
    "messages": [
      {
        "role": "user",
        "content": "Which three laws of motion did Newton propose? Briefly explain them."
      }
    ]
  }'
```
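The curl request above can also be issued from Python using only the standard library. This is a minimal sketch against vLLM's OpenAI-compatible `/v1/chat/completions` endpoint; the `chat` helper is illustrative, and the `qwen-vl` model name must match whatever was passed to `--served-model-name`.

```python
import json
from urllib.request import Request, urlopen


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send one chat-completion request and return the reply text."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = Request(
        f"{base_url}/v1/chat/completions",
        data=payload,  # a body makes urllib issue a POST
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("http://localhost:8000", "qwen-vl",
               "Which three laws of motion did Newton propose?"))
```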
### Example output
![result1](./images/result1.png)
### Accuracy
DCU accuracy matches GPU accuracy; inference framework: vLLM.
...@@ -264,6 +307,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python inference.py
`Conversational Q&A`
### Key application industries
`Research, Education, Government, Finance`
## Pretrained weights
[ModelScope](https://modelscope.cn/)
- [Qwen2.5-VL-3B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-VL-3B-Instruct)
...@@ -279,4 +323,3 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python inference.py
- https://qwenlm.github.io/zh/blog/qwen2.5-vl/
- https://github.com/QwenLM/Qwen2.5-VL