ModelZoo / Qwen3-VL_pytorch / Commits / 893f761a

Commit 893f761a, authored Jan 30, 2026 by raojy
Commit message: "updata"
Parent: d190605a
Changes: 2 files, 20 additions and 36 deletions (README.md: +19 -35; model.properties: +1 -1)

README.md @ 893f761a
...

````diff
@@ -44,26 +44,9 @@ DCU model: K100AI; nodes: 2; cards: 16.
 Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226
-Mount paths (`-v`)
-Mount paths (`-v`): modify according to your actual model and data locations
 ```bash
 docker run -it \
     --shm-size 60g \
     --network=host \
     --name {docker_name} \
     --privileged \
     --device=/dev/kfd \
     --device=/dev/dri \
     --device=/dev/mkfd \
     --group-add video \
     --cap-add=SYS_PTRACE \
     --security-opt seccomp=unconfined \
     -u root \
     -v /opt/hyhal/:/opt/hyhal/:ro \
     -v /path/your_code_data/:/path/your_code_data/ \
     {docker_image_name} bash
 ```
 Example:
 ```bash
 docker run -it \
     --shm-size 60g \
     --network=host \
````

...
````diff
@@ -118,7 +101,7 @@ HIP_VISIBLE_DEVICES=0 python qwen3vl_infer_video.py
 export HF_HUB_OFFLINE=1
 export TRANSFORMERS_OFFLINE=1
-vllm serve Qwen3-VL-8B-Instruct \
+vllm serve Qwen/Qwen3-VL-8B-Instruct \
 --trust-remote-code \
 --max-model-len 32768 \
 --served-model-name qwen-vl \
````

...
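Besides curl, the OpenAI-compatible endpoint that `vllm serve` exposes can be called programmatically. A minimal sketch, assuming a server started as above with `--served-model-name qwen-vl`; the helper only builds the JSON request body (the URL and path are placeholders), so it runs without a live server:

```python
import json

def build_chat_request(prompt: str, image_url: str, model: str = "qwen-vl") -> str:
    """Build an OpenAI-compatible /v1/chat/completions body for one image plus one question."""
    body = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }
    return json.dumps(body, ensure_ascii=False)

# Placeholder image path, matching the README's example layout.
payload = build_chat_request("Describe this image.",
                             "file:///path/to/your/project/doc/test.png")
```

The string returned by `build_chat_request` can be POSTed to `http://x.x.x.x:8000/v1/chat/completions` with a `Content-Type: application/json` header.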
````diff
@@ -196,11 +179,10 @@ ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=3
 ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
 ```
 3. Start the vLLM server
 > On Intel CPUs, add the extra flag `--enforce-eager`.
 ```bash
 vllm serve Qwen/Qwen3-VL-235B-A22B-Thinking \
---host *.*.*.* \
+--host x.x.x.x \
 --port 8000 \
 --distributed-executor-backend ray \
 --tensor-parallel-size 8 \
````

...
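As a sanity check on the Ray topology above: each of the two nodes starts with `--num-gpus=8`, which together must cover the 16-card minimum this README states for Qwen3-VL-235B-A22B-Thinking. The figures below are the ones quoted in this document, not measured values:

```python
# Sizing check using only the numbers quoted in this README.
nodes = 2
cards_per_node = 8          # --num-gpus=8 passed to each `ray start`
total_cards = nodes * cards_per_node

min_cards_required = 16     # stated minimum for Qwen3-VL-235B-A22B-Thinking
assert total_cards >= min_cards_required
```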
````diff
@@ -211,14 +193,14 @@ vllm serve Qwen/Qwen3-VL-235B-A22B-Thinking \
 --max-num-seqs 128 \
 --block-size 64 \
 --gpu-memory-utilization 0.90 \
 --enforce-eager \
 --allowed-local-media-path / \
 --served-model-name qwen-vl \
 --override-generation-config '{"temperature": 0.7, "top_p":0.8, "top_k":20, "repetition_penalty": 1.05}'
 ```
 Once the server is up, it can be queried as follows:
 ```bash
 # Replace /path/to/your/project with the directory where the image files are stored
 curl http://x.x.x.x:8000/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
````

...
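`--override-generation-config` takes a JSON object, and shell quoting is a common way to mangle it, so the string is worth validating before launch. A quick check using the exact value from the command above:

```python
import json

# Sampling overrides exactly as passed on the vllm serve command line above.
raw = '{"temperature": 0.7, "top_p":0.8, "top_k":20, "repetition_penalty": 1.05}'
cfg = json.loads(raw)   # raises ValueError if the quoting broke the JSON
assert 0.0 <= cfg["temperature"] <= 2.0
```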
````diff
@@ -230,7 +212,7 @@ curl http://x.x.x.x:8000/v1/chat/completions \
 {
 "type": "image_url",
 "image_url": {
-"url": "file://test22.png"
+"url": "file:///path/to/your/project/doc/test.png"
 }
 },
 {
````

...
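The corrected request uses a `file://` URL with an absolute path, which must fall under the directory given to `--allowed-local-media-path`. Python's `pathlib` builds such URLs reliably; a small sketch using the README's placeholder path (not a real file):

```python
from pathlib import PurePosixPath

# Placeholder path from the README; substitute your actual image location.
image = PurePosixPath("/path/to/your/project/doc/test.png")
url = image.as_uri()   # requires an absolute path; percent-encodes as needed
```

`as_uri()` raises `ValueError` on relative paths, which catches the kind of malformed URL (`file://test22.png`) that this commit fixes.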
````diff
@@ -246,8 +228,6 @@ curl http://x.x.x.x:8000/v1/chat/completions \
 ```
 ## vLLM results showcase
````

...
````diff
@@ -310,20 +290,24 @@ Output:
 <div align=center><img src="./doc/result_vedio.png" /></div>
 ### Accuracy
 `DCU accuracy is consistent with GPU; supported inference frameworks: transformers, vllm.`
-## Pretrained weights
-| Model name | Weight size | DCU model | Minimum cards | Download |
-|:---:|:---:|:---:|:---:|:---:|
-| Qwen3-VL-4B-Instruct | 4B | K100AI | 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
-| Qwen3-VL-8B-Instruct | 8B | K100AI | 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
-| Qwen3-VL-235B-A22B-Thinking | 235B | K100AI | 16 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) |
+## Qwen3-VL full series model list
+| **Model name** | **Weight size** | **Minimum cards (K100AI)** | **Download (Hugging Face)** |
+| --- | --- | --- | --- |
+| **Qwen3-VL-2B-Instruct** | 2B | 1 | [Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct) |
+| **Qwen3-VL-4B-Instruct** | 4B | 1 | [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
+| **Qwen3-VL-8B-Instruct** | 8B | 1 | [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
+| **Qwen3-VL-32B-Instruct** | 32B | 4 | [Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) |
+| **Qwen3-VL-30B-A3B-Instruct** | 30B | 1-2 | [Qwen3-VL-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct) |
+| **Qwen3-VL-30B-A3B-Thinking** | 30B | 2 | [Qwen3-VL-30B-A3B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking) |
+| **Qwen3-VL-235B-A22B-Instruct** | 235B | 8 | [Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct) |
+| **Qwen3-VL-235B-A22B-Thinking** | 235B | 16 | [Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) |
 ## Source repository and issue feedback
 - https://developer.sourcefind.cn/codes/modelzoo/qwen3-vl_pytorch
````

...
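Every download link in the model list above follows the same pattern: the repo lives under the `Qwen` organization on Hugging Face, so the repo id for `huggingface_hub` can be derived from the model name alone. A hypothetical helper (the `snapshot_download` call is left commented out because it needs network access):

```python
# Hypothetical helper: every model in the table above lives under the Qwen
# org on Hugging Face, so the repo id is just "Qwen/<model name>".
def hf_repo_id(model_name: str) -> str:
    return f"Qwen/{model_name}"

repo = hf_repo_id("Qwen3-VL-8B-Instruct")
# from huggingface_hub import snapshot_download
# snapshot_download(repo_id=repo)  # downloads the weights (network required)
```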
model.properties @ 893f761a

...

````diff
@@ -11,4 +11,4 @@ appCategory=多模态
 # Framework type
 frameType=pytorch
 # Accelerator type
-accelerateType=BW1000、K100AI
+accelerateType=BW1000.K100AI
````