updata

Commit 6d5cbdb8 authored Jan 30, 2026 by raojy

Showing with 197 additions and 45 deletions

README.md +197 -45

--- a/README.md
+++ b/README.md
@@ -37,19 +37,48 @@ Visual Coding Boost: generate Draw.io/HTML/CSS/JS from images/videos.
 | torchvision  | 0.20.1+das.opt1.dtk25042 |
 |  flash_attn  |  2.6.1+das.opt1.dtk2504  |
 |      av      |          16.0.1          |
+|     vllm     |          0.11.0+das.opt1.alpha.dtk25042.20251225.gca4598a4          |
+## Hardware requirements
+DCU model: K100AI; nodes: 2; cards: 16.

-Recommended image:

- Adjust the mount path `-v`, `{docker_name}`, and `{docker_image_name}` to match your model and environment
+Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226
+
+- Mount path: `-v`

 ```bash
-docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_path/:/path/your_code_path/ -v /opt/hyhal/:/opt/hyhal/:ro {docker_image_name} bash
+docker run -it \
+    --shm-size 60g \
+    --network=host \
+    --name {docker_name} \
+    --privileged \
+    --device=/dev/kfd \
+    --device=/dev/dri \
+    --device=/dev/mkfd \
+    --group-add video \
+    --cap-add=SYS_PTRACE \
+    --security-opt seccomp=unconfined \
+    -u root \
+    -v /opt/hyhal/:/opt/hyhal/:ro \
+    -v /path/your_code_data/:/path/your_code_data/ \
+    {docker_image_name} bash

 Example:
-docker pull harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226
-docker run -it --shm-size 200g --network=host --name qwen3vl --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_path/:/path/your_code_path/ -v /opt/hyhal/:/opt/hyhal/:ro harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226 bash
-# Install the PyAV backend dependency for video inference
-pip install av
+docker run -it \
+    --shm-size 60g \
+    --network=host \
+    --name qwen3 \
+    --privileged \
+    --device=/dev/kfd \
+    --device=/dev/dri \
+    --device=/dev/mkfd \
+    --group-add video \
+    --cap-add=SYS_PTRACE \
+    --security-opt seccomp=unconfined \
+    -u root \
+    -v /opt/hyhal/:/opt/hyhal/:ro \
+    -v /path/your_code_data/:/path/your_code_data/ \
+    image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10 bash
 ```

 More images are available for download from [SourceFind (光源)](https://sourcefind.cn/#/service-list).
@@ -80,88 +109,210 @@ HIP_VISIBLE_DEVICES=0 python qwen3vl_infer_multi_images.py
 HIP_VISIBLE_DEVICES=0 python qwen3vl_infer_video.py
 ```

+## vllm
+
+#### Single-node inference
+
+```bash
+## Start the server
+export HF_HUB_OFFLINE=1
+export TRANSFORMERS_OFFLINE=1
+
+vllm serve Qwen3-VL-8B-Instruct \
+--trust-remote-code \
+--max-model-len 32768 \
+--served-model-name qwen-vl \
+--dtype bfloat16 \
+--tensor-parallel-size 1 \
+--gpu-memory-utilization 0.9
+
+## Client request
+curl http://localhost:8000/v1/chat/completions   \
+    -H "Content-Type: application/json"  \
+    -d '{
+        "model": "qwen-vl",
+        "messages": [
+            {
+                "role": "user",
+                "content": "What are the three laws of motion proposed by Newton? Please explain briefly."
+            }
+        ]
+    }'
+```
+
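+### Multi-node, multi-card inference
+Example model: [Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking)
+
+1. Set the environment variables
+> Note:
+> On every node, write the environment variables into a `.sh` file, save it, and `source` that file on each compute node.
+>
+> VLLM_HOST_IP: the IP of the node's local communication interface. Prefer the IP of the IB NIC **to avoid RCCL timeouts**.
+>
+> NCCL_SOCKET_IFNAME and GLOO_SOCKET_IFNAME: the name of the network interface that carries that IP.
+>
+> To list interfaces and their IPs, run `ifconfig`.
+>
+> To check the IB port state, run `ibstat`. The port must be in the Active state to be usable, and the configuration must be consistent across all nodes.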
+
+```bash
+export ALLREDUCE_STREAM_WITH_COMPUTE=1
+export VLLM_HOST_IP=x.x.x.x # IP of this compute node; use the address bound to the IB interface named in SOCKET_IFNAME
+export NCCL_SOCKET_IFNAME=ibxxxx
+export GLOO_SOCKET_IFNAME=ibxxxx
+export NCCL_IB_HCA=mlx5_0:1 # name of the IB adapter in your environment
+unset NCCL_ALGO
+export NCCL_MIN_NCHANNELS=16
+export NCCL_MAX_NCHANNELS=16
+export NCCL_NET_GDR_READ=1
+export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export VLLM_SPEC_DECODE_EAGER=1
+export VLLM_MLA_DISABLE=0
+export VLLM_USE_FLASH_MLA=1
+
+# Additional environment variables recommended for K100_AI clusters:
+export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
+export VLLM_RPC_TIMEOUT=1800000
+
+# NUMA/core binding for Hygon CPUs
+export VLLM_NUMA_BIND=1
+export VLLM_RANK0_NUMA=0
+export VLLM_RANK1_NUMA=1
+export VLLM_RANK2_NUMA=2
+export VLLM_RANK3_NUMA=3
+export VLLM_RANK4_NUMA=4
+export VLLM_RANK5_NUMA=5
+export VLLM_RANK6_NUMA=6
+export VLLM_RANK7_NUMA=7
+```
+
+2. Start the Ray cluster
+> x.x.x.x is the head node's VLLM_HOST_IP from step 1
+
+```bash
+# Run on the head node
+ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=32
+# Run on each worker node
+ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
+```
+3. Start the vLLM server
+> On Intel CPUs, add the `--enforce-eager` flag.
+
+```bash
+vllm serve Qwen/Qwen3-VL-235B-A22B-Thinking \
+  --host *.*.*.* \
+  --port 8000 \
+  --distributed-executor-backend ray \
+  --tensor-parallel-size 8 \
+  --pipeline-parallel-size 2 \
+  --trust-remote-code \
+  --dtype bfloat16 \
+  --max-model-len 32768 \
+  --max-num-seqs 128 \
+  --block-size 64 \
+  --gpu-memory-utilization 0.90 \
+  --enforce-eager \
+  --allowed-local-media-path / \
+  --served-model-name qwen-vl \
+  --override-generation-config '{"temperature": 0.7, "top_p":0.8, "top_k":20, "repetition_penalty": 1.05}'
+  ```
+
+Once the server is up, it can be queried as follows:
+```bash
+curl http://x.x.x.x:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "qwen-vl",
+    "messages": [
+      {
+        "role": "user",
+        "content": [
+          {
+            "type": "image_url",
+            "image_url": {
+              "url": "file://test22.png"
+            }
+          },
+          {
+            "type": "text",
+            "text": "Describe the content of this image in detail."
+          }
+        ]
+      }
+    ],
+    "max_tokens": 512,
+    "temperature": 0.7
+  }'
+```
+
+
+
+
+## vLLM results
+
+
 ## Results
+
 **Scenario 1**: Standard image-and-text chat
 Input:
+
 - image:
+
 <div align=center>
    <img src="./doc/demo.jpeg"/>
 </div>

+
 - text: "Describe this image."

 Output:
+
 <div align=center>
    <img src="./doc/result.png"/>
 </div>

+
 **Scenario 2**: Multi-image inference
 Input:
+
 - image1:
+
 <div align=center>
    <img src="./doc/demo.jpeg"/>
 </div>

+
 - image2:
+
 <div align=center>
    <img src="./doc/dog.jpg"/>
 </div>

+
 - text: "Identify the similarities between these images."

 Output:
+
 <div align=center>
    <img src="./doc/result_multi_images.png"/>
 </div>

+
 **Scenario 3**: Video inference
+
 - Video:
-![space_woaudio](./doc/space_woaudio.mp4)
+  ![space_woaudio](./doc/space_woaudio.mp4)

 - text: "Describe this video."

 Output:
+
 <div align=center>
    <img src="./doc/result_vedio.png"/>
 </div>


-## vllm
-
-#### Single-node inference
-
-```bash
-## Start the server
-export HF_HUB_OFFLINE=1
-export TRANSFORMERS_OFFLINE=1
-
-vllm serve Qwen3-VL-8B-Instruct \
--trust-remote-code \
--max-model-len 32768 \
--served-model-name qwen-vl \
--dtype bfloat16 \
--tensor-parallel-size 1 \
--gpu-memory-utilization 0.9
-
-## Client request
-curl http://localhost:8000/v1/chat/completions   \
-    -H "Content-Type: application/json"  \
-    -d '{
-        "model": "qwen-vl",
-        "messages": [
-            {
-                "role": "user",
-                "content": "What are the three laws of motion proposed by Newton? Please explain briefly."
-            }
-        ]
-    }'
-```
-
-## vLLM results

-<div align=center>
-    <img src="./doc/perform.png"/>
-</div>

 ### Accuracy

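Review note on the image request added in this hunk: the curl example passes `file://test22.png`, which is not an absolute file URI. Below is a minimal Python sketch of building the same request body with an absolute `file://` URL. It assumes the server resolves local files by absolute path under the directory given to `--allowed-local-media-path`; the helper names `local_image_url` and `build_image_chat_payload` are hypothetical and not part of this repository or of vLLM.

```python
import json
from pathlib import Path


def local_image_url(image_path: str) -> str:
    """Turn a local path into an absolute file:// URI (hypothetical helper)."""
    # resolve() makes the path absolute; as_uri() only accepts absolute paths.
    return Path(image_path).resolve().as_uri()


def build_image_chat_payload(image_path: str, prompt: str, model: str = "qwen-vl") -> dict:
    """Build a chat-completions body shaped like the curl example in the diff."""
    return {
        "model": model,  # must match --served-model-name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": local_image_url(image_path)}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 512,
        "temperature": 0.7,
    }


if __name__ == "__main__":
    payload = build_image_chat_payload("test22.png", "Describe the content of this image in detail.")
    # Print the JSON so it can be saved and sent with: curl ... -d @payload.json
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The printed JSON can be written to a file and posted with `curl -d @payload.json`, which also avoids shell-quoting problems when the prompt contains quotes.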
@@ -170,8 +321,9 @@ curl http://localhost:8000/v1/chat/completions   \
 ## Pretrained weights
 |      Model name      | Weight size | DCU model | Minimum cards | Download |
 |:--------------------:|:----:|:----------:|:------:|:----------:|
-| Qwen3-VL-4B-Instruct |  4B  | BW1000|   1    | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
-| Qwen3-VL-8B-Instruct |  8B  | BW1000|   1    | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
+| Qwen3-VL-4B-Instruct |  4B  | K100AI|   1    | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
+| Qwen3-VL-8B-Instruct |  8B  | K100AI|   1    | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
+| Qwen3-VL-235B-A22B-Thinking |  235B  | K100AI|   16    | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) |

 ## Source repository and issue feedback
 - https://developer.sourcefind.cn/codes/modelzoo/qwen3-vl_pytorch
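Review note on the 16-card minimum listed for Qwen3-VL-235B-A22B-Thinking: it matches the launch flags in this commit, `--tensor-parallel-size 8` times `--pipeline-parallel-size 2`, spread over two Ray nodes started with `--num-gpus=8`. The sketch below checks that arithmetic before launching. It assumes no data parallelism is in use, so the GPU count is simply the product of the two sizes; `required_gpus` and `check_cluster` are hypothetical helper names.

```python
def required_gpus(tensor_parallel_size: int, pipeline_parallel_size: int) -> int:
    """GPUs needed for one model replica, assuming no data parallelism."""
    return tensor_parallel_size * pipeline_parallel_size


def check_cluster(nodes: int, gpus_per_node: int, tp: int, pp: int) -> int:
    """Return the GPU count needed, or raise if the cluster is too small."""
    available = nodes * gpus_per_node
    needed = required_gpus(tp, pp)
    if needed > available:
        raise ValueError(f"need {needed} GPUs but the cluster only has {available}")
    return needed


if __name__ == "__main__":
    # Values taken from the README in this commit: 2 nodes x 8 cards, TP=8, PP=2.
    print(check_cluster(nodes=2, gpus_per_node=8, tp=8, pp=2))
```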