updata

Commit f8829eaa authored Jan 28, 2026 by raojy

Showing with 52 additions and 19 deletions

README.md +52 -19

images/result1.png +0 -0

--- a/README.md
+++ b/README.md
@@ -247,26 +247,11 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python inference.py
 ### vllm
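-If you hit a `ValidationError` or `KeyError` configuration error when starting the service, the usual cause is that the installed vLLM version does not yet fully support the config fields used by the newer model.
-**Fix:** Manually edit `config.json` in the model directory and rename the `type` field in the `rope_scaling` section to `rope_type`.
+#### Single-GPU inference
-Before: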
-<div align=center>
-    <img src="./images/before_fix.png"/>
-</div>
-After:
-<div align=center>
-    <img src="./images/after_fix.png"/>
-</div>
-#### Single-node inference
 ```
+# For the 3B/7B models
 # Launch command
 vllm serve Qwen/Qwen2.5-VL-3B-Instruct \
    --trust-remote-code \
@@ -291,12 +276,60 @@ curl http://localhost:8000/v1/chat/completions   \
 ```
+### Multi-GPU inference
+```
+# For the 72B model
+# Launch command
+python3 -m vllm.entrypoints.openai.api_server \
+  --model "Qwen/Qwen2.5-VL-72B-Instruct" \
+  --served-model-name "qwen-vl" \
+  --tensor-parallel-size 4 \
+  --gpu-memory-utilization 0.95 \
+  --max-model-len 4096 \
+  --dtype bfloat16 \
+  --enforce-eager \
+  --trust-remote-code \
+  --port 8000
+## Client request
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer EMPTY" \
+  -d '{
+    "model": "qwen-vl",
+    "messages": [
+      {
+        "role": "user",
+        "content": [
+          {
+            "type": "image_url",
+            "image_url": {
+              "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
+            }
+          },
+          {
+            "type": "text",
+            "text": "Describe the content of this image."
+          }
+        ]
+      }
+    ],
+    "max_tokens": 512,
+    "temperature": 0.7,
+    "top_p": 0.8
+  }'
+```
 ### Results
 <div align=center>
-    <img src="./images/perform.png"/>
+    <img src="./images/result1.png"/>
 </div>
 ### Accuracy
 Accuracy on DCU matches GPU. Inference framework: vllm.
@@ -307,7 +340,7 @@ Accuracy on DCU matches GPU. Inference framework: vllm.
 | --------------------------- | ---------- | ----------------- | ---------------- | ------------------------------------------------------------ |
 | **Qwen2.5-VL-3B-Instruct**  | 3B         | K100AI, BW1000, etc. | 1                | [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) |
 | **Qwen2.5-VL-7B-Instruct**  | 7B         | K100AI, BW1000, etc. | 1                | [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) |
-| **Qwen2.5-VL-72B-Instruct** | 72B        | K100AI, BW1000, etc. | 8                | [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) |
+| **Qwen2.5-VL-72B-Instruct** | 72B        | K100AI, BW1000, etc. | 4                | [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) |
 ## Source repository and issue feedback
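
Note on the 72B change: the new launch command sets `--tensor-parallel-size 4`, and the table row for the 72B model drops the card count from 8 to 4. The rough estimate below is one way to sanity-check that choice. It is only a sketch under stated assumptions: the parameter count is the nominal 72B from the model name, weights are assumed to split roughly evenly across tensor-parallel ranks, and `GPU_MEMORY_GB = 64.0` is a hypothetical per-card capacity that is not stated in this diff.

```python
def weights_per_gpu_gb(num_params: float, bytes_per_param: int, tensor_parallel_size: int) -> float:
    """Approximate weight memory per GPU in GB (10^9 bytes), assuming an even split."""
    return num_params * bytes_per_param / tensor_parallel_size / 1e9


def remaining_budget_gb(gpu_memory_gb: float, gpu_memory_utilization: float, weights_gb: float) -> float:
    """Memory left inside the utilization budget after the weights are loaded."""
    return gpu_memory_gb * gpu_memory_utilization - weights_gb


NUM_PARAMS = 72e9              # nominal size from the model name; the real count differs slightly
BYTES_PER_PARAM = 2            # --dtype bfloat16 stores 2 bytes per parameter
TENSOR_PARALLEL_SIZE = 4       # --tensor-parallel-size 4
GPU_MEMORY_UTILIZATION = 0.95  # --gpu-memory-utilization 0.95
GPU_MEMORY_GB = 64.0           # hypothetical per-card capacity; replace with your hardware's value

weights_gb = weights_per_gpu_gb(NUM_PARAMS, BYTES_PER_PARAM, TENSOR_PARALLEL_SIZE)
headroom_gb = remaining_budget_gb(GPU_MEMORY_GB, GPU_MEMORY_UTILIZATION, weights_gb)

print(f"weights per GPU: {weights_gb:.1f} GB")
print(f"left for KV cache and activations: {headroom_gb:.1f} GB")
```

Under these assumptions the headroom is positive but not large, which fits the conservative `--max-model-len 4096` in the command. Real usage also depends on the vision encoder, activation peaks, and framework overhead, so treat the numbers as a first-pass check only.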

--- a/images/result1.png
+++ b/images/result1.png
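
Note on the client example: the `curl` request added to the README can also be issued from Python. The sketch below uses only the standard library and mirrors the request shape in the diff (one `image_url` part, one `text` part, and the same sampling parameters). The helper names `build_payload` and `send_chat_request` are illustrative and not part of the repository; the URL and the `qwen-vl` model name assume the server was started with the multi-GPU command shown in this commit.

```python
import json
import urllib.request

# Assumed values taken from the launch command in this diff; adjust to your deployment.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "qwen-vl"  # must match --served-model-name


def build_payload(image_url: str, prompt: str, model: str = MODEL_NAME) -> dict:
    """Build an OpenAI-style chat completion body with one image part and one text part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.8,
    }


def send_chat_request(payload: dict, url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST the payload to the server and return the first choice's message text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer EMPTY"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_payload(
    "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
    "Describe the content of this image.",
)
# Only the request body is printed here; nothing is sent until send_chat_request() is called.
print(json.dumps(payload, indent=2))
```

Once the server is up, `print(send_chat_request(payload))` should return the model's description. If the server was started with `vllm serve` and no `--served-model-name`, pass the full model path as `model` instead.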