Commit 893f761a authored by raojy

update

parent d190605a
@@ -44,26 +44,9 @@ DCU model: K100AI, nodes: 2, cards: 16.
Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226
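If the image is not already present on the host, it can be pulled ahead of time. This is just the standard `docker pull` of the image named above; depending on how the registry is configured, a prior `docker login harbor.sourcefind.cn:5443` may be required:

```bash
# Pull the recommended inference image listed above
docker pull harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226
```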
- Mount path `-v`: adjust to match where your model and data actually reside
```bash
docker run -it \
--shm-size 60g \
--network=host \
--name {docker_name} \
--privileged \
--device=/dev/kfd \
--device=/dev/dri \
--device=/dev/mkfd \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-u root \
-v /opt/hyhal/:/opt/hyhal/:ro \
-v /path/your_code_data/:/path/your_code_data/ \
{docker_image_name} bash
# Example:
docker run -it \
--shm-size 60g \
--network=host \
@@ -118,7 +101,7 @@ HIP_VISIBLE_DEVICES=0 python qwen3vl_infer_video.py
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
vllm serve Qwen/Qwen3-VL-8B-Instruct \
--trust-remote-code \
--max-model-len 32768 \
--served-model-name qwen-vl \
@@ -196,11 +179,10 @@ ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=3
ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
```
3. Start the vllm server
> On Intel CPU hosts, the additional parameter `--enforce-eager` is required.
```bash
vllm serve Qwen/Qwen3-VL-235B-A22B-Thinking \
--host x.x.x.x \
--port 8000 \
--distributed-executor-backend ray \
--tensor-parallel-size 8 \
@@ -211,14 +193,14 @@ vllm serve Qwen/Qwen3-VL-235B-A22B-Thinking \
--max-num-seqs 128 \
--block-size 64 \
--gpu-memory-utilization 0.90 \
--enforce-eager \
--allowed-local-media-path / \
--served-model-name qwen-vl \
--override-generation-config '{"temperature": 0.7, "top_p":0.8, "top_k":20, "repetition_penalty": 1.05}'
```
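If the server fails to come up, or requests later hang, two quick checks can help separate cluster problems from service problems. This is an optional sketch; it assumes the head-node IP is substituted for `x.x.x.x` and the default port 8000 used above:

```bash
# On the head node: both machines (16 DCUs in total) should appear in the cluster resources
ray status

# Probe the OpenAI-compatible endpoint; the served model name "qwen-vl" should be listed
curl http://x.x.x.x:8000/v1/models
```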
Once the server has started, it can be accessed as follows:
```bash
# Change /path/to/your/project to the directory where the image files are stored
curl http://x.x.x.x:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
@@ -230,7 +212,7 @@ curl http://x.x.x.x:8000/v1/chat/completions \
{
"type": "image_url",
"image_url": {
"url": "file://test22.png"
"url": "file:///path/to/your/project/doc/test.png"
}
},
{
@@ -246,8 +228,6 @@ curl http://x.x.x.x:8000/v1/chat/completions \
```
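The response comes back as JSON with the generated text under `choices[0].message.content`. As an optional convenience, the reply can be extracted directly on the command line; the sketch below assumes the request body from the example above has been saved to a file named `request.json` (a name chosen here for illustration) and that `python3` is available in the container:

```bash
# Send the saved request body and print only the model's reply text
curl -s http://x.x.x.x:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```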
## vllm Results
@@ -310,20 +290,24 @@ Output:
<div align=center>
<img src="./doc/result_vedio.png"/>
</div>
### Accuracy
`DCU accuracy is consistent with GPU. Supported inference frameworks: transformers, vllm.`
## Pretrained Weights
| Model | Parameters | DCU Model | Minimum Cards Required | Download |
|:--------------------:|:----:|:----------:|:------:|:----------:|
| Qwen3-VL-4B-Instruct | 4B | K100AI| 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
| Qwen3-VL-8B-Instruct | 8B | K100AI| 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
| Qwen3-VL-235B-A22B-Thinking | 235B | K100AI| 16 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) |
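The weights can be fetched from the Hugging Face links in the table above. One common way, shown here only as a sketch, is `huggingface-cli`; it assumes `huggingface_hub` is installed and that the target directory below, chosen for illustration, is writable (a mirror or proxy may be needed depending on network access):

```bash
# Install the CLI and download Qwen3-VL-8B-Instruct into a local directory
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-VL-8B-Instruct --local-dir ./Qwen3-VL-8B-Instruct
```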
## Complete Qwen3-VL Model List
| **Model** | **Parameters** | **Minimum Cards (K100AI)** | **Download (Hugging Face)** |
| ------------------------------- | ------------ | ------------------------- | ------------------------------------------------------------ |
| **Qwen3-VL-2B-Instruct** | 2B | 1 | [Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct) |
| **Qwen3-VL-4B-Instruct** | 4B | 1 | [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
| **Qwen3-VL-8B-Instruct** | 8B | 1 | [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
| **Qwen3-VL-32B-Instruct** | 32B | 4 | [Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) |
| **Qwen3-VL-30B-A3B-Instruct** | 30B | 1-2 | [Qwen3-VL-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct) |
| **Qwen3-VL-30B-A3B-Thinking** | 30B | 2 | [Qwen3-VL-30B-A3B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking) |
| **Qwen3-VL-235B-A22B-Instruct** | 235B | 8 | [Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct) |
| **Qwen3-VL-235B-A22B-Thinking** | 235B | 16 | [Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) |
## Source Repository and Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/qwen3-vl_pytorch
@@ -11,4 +11,4 @@ appCategory=多模态
# Framework type
frameType=pytorch
# Accelerator card type
accelerateType=BW1000.K100AI