Commit 893f761a authored by raojy

updata

parent d190605a
@@ -44,26 +44,9 @@ DCU model: K100AI, number of nodes: 2, number of cards: 16.
 Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226
-- Mount path `-v`
+- Mount path `-v`: adjust according to the actual model location
 ```bash
-docker run -it \
---shm-size 60g \
---network=host \
---name {docker_name} \
---privileged \
---device=/dev/kfd \
---device=/dev/dri \
---device=/dev/mkfd \
---group-add video \
---cap-add=SYS_PTRACE \
---security-opt seccomp=unconfined \
--u root \
--v /opt/hyhal/:/opt/hyhal/:ro \
--v /path/your_code_data/:/path/your_code_data/ \
-{docker_image_name} bash
-Example:
 docker run -it \
 --shm-size 60g \
 --network=host \
@@ -118,7 +101,7 @@ HIP_VISIBLE_DEVICES=0 python qwen3vl_infer_video.py
 export HF_HUB_OFFLINE=1
 export TRANSFORMERS_OFFLINE=1
-vllm serve Qwen3-VL-8B-Instruct \
+vllm serve Qwen/Qwen3-VL-8B-Instruct \
 --trust-remote-code \
 --max-model-len 32768 \
 --served-model-name qwen-vl \
@@ -196,11 +179,10 @@ ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=3
 ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
 ```
 3. Start the vllm server
-> For Intel CPUs, add the parameter `--enforce-eager`
 ```bash
 vllm serve Qwen/Qwen3-VL-235B-A22B-Thinking \
---host *.*.*.* \
+--host x.x.x.x \
 --port 8000 \
 --distributed-executor-backend ray \
 --tensor-parallel-size 8 \
@@ -211,14 +193,14 @@ vllm serve Qwen/Qwen3-VL-235B-A22B-Thinking \
 --max-num-seqs 128 \
 --block-size 64 \
 --gpu-memory-utilization 0.90 \
---enforce-eager \
 --allowed-local-media-path / \
 --served-model-name qwen-vl \
 --override-generation-config '{"temperature": 0.7, "top_p":0.8, "top_k":20, "repetition_penalty": 1.05}'
 ```
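Before launching the `vllm serve` command above, it can help to confirm that both nodes have actually joined the Ray cluster and that all 16 cards are visible. A minimal sketch of such a check, assuming the cluster was started with the `ray start` commands shown earlier (run on the head node):

```bash
# Print the Ray cluster resource summary; with both nodes joined it should
# report the combined CPU count and 16 accelerator cards (8 per node).
ray status
```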
 Once the server is up, it can be accessed as follows:
 ```bash
+# Replace /path/to/your/project with the directory where the image file is stored
 curl http://x.x.x.x:8000/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
@@ -230,7 +212,7 @@ curl http://x.x.x.x:8000/v1/chat/completions \
 {
 "type": "image_url",
 "image_url": {
-"url": "file://test22.png"
+"url": "file:///path/to/your/project/doc/test.png"
 }
 },
 {
@@ -246,8 +228,6 @@ curl http://x.x.x.x:8000/v1/chat/completions \
 ```
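For a quick smoke test of the OpenAI-compatible endpoint, the model list route and a text-only request can be used before moving on to image inputs. A minimal sketch, assuming the server address and the `--served-model-name qwen-vl` setting from the command above:

```bash
# The response should list "qwen-vl", the name registered via --served-model-name.
curl http://x.x.x.x:8000/v1/models

# Text-only chat completion against the same server, with no image attached.
curl http://x.x.x.x:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen-vl",
        "messages": [
          {"role": "user", "content": "Give a one-sentence summary of what Qwen3-VL can do."}
        ]
      }'
```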
 ## vllm inference showcase
@@ -310,20 +290,24 @@ Output:
 <div align=center>
 <img src="./doc/result_vedio.png"/>
 </div>
 ### Accuracy
 `DCU accuracy matches GPU; supported inference frameworks: transformers, vllm.`
 ## Pretrained weights
-| Model | Size | DCU model | Minimum cards required | Download |
-|:---:|:---:|:---:|:---:|:---:|
-| Qwen3-VL-4B-Instruct | 4B | K100AI | 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
-| Qwen3-VL-8B-Instruct | 8B | K100AI | 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
-| Qwen3-VL-235B-A22B-Thinking | 235B | K100AI | 16 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) |
+## Qwen3-VL full model series
+| **Model** | **Size** | **Minimum cards required (K100AI)** | **Download (Hugging Face)** |
+| --- | --- | --- | --- |
+| **Qwen3-VL-2B-Instruct** | 2B | 1 | [Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct) |
+| **Qwen3-VL-4B-Instruct** | 4B | 1 | [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
+| **Qwen3-VL-8B-Instruct** | 8B | 1 | [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
+| **Qwen3-VL-32B-Instruct** | 32B | 4 | [Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) |
+| **Qwen3-VL-30B-A3B-Instruct** | 30B | 1-2 | [Qwen3-VL-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct) |
+| **Qwen3-VL-30B-A3B-Thinking** | 30B | 2 | [Qwen3-VL-30B-A3B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking) |
+| **Qwen3-VL-235B-A22B-Instruct** | 235B | 8 | [Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct) |
+| **Qwen3-VL-235B-A22B-Thinking** | 235B | 16 | [Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) |
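Since the serving examples above set `HF_HUB_OFFLINE=1`, the weights listed in the table need to be available locally before launch. A minimal sketch for fetching a checkpoint with the Hugging Face CLI, assuming the `huggingface_hub` package is installed and using the 8B model as an example; adjust the repository name and target directory to the model you need:

```bash
# Install the CLI extra of huggingface_hub, then download the checkpoint
# into a local directory that can be passed to `vllm serve`.
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-VL-8B-Instruct --local-dir ./Qwen3-VL-8B-Instruct
```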
 ## Source repository and issue feedback
 - https://developer.sourcefind.cn/codes/modelzoo/qwen3-vl_pytorch
@@ -11,4 +11,4 @@ appCategory=多模态
 # Framework type
 frameType=pytorch
 # Accelerator type
-accelerateType=BW1000K100AI
+accelerateType=BW1000.K100AI