Commit 893f761a authored by raojy

update

parent d190605a
@@ -44,26 +44,9 @@ DCU model: K100AI, nodes: 2, cards: 16.
Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226
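If the image is not already present on the host, it can be pulled ahead of time. This is just the standard `docker pull` of the image named above; depending on how the registry is configured, a prior `docker login harbor.sourcefind.cn:5443` may be required:

```bash
# Pull the recommended inference image listed above
docker pull harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226
```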
- Mount path `-v`: adjust to match where your model and data actually reside
```bash
docker run -it \
--shm-size 60g \
--network=host \
--name {docker_name} \
--privileged \
--device=/dev/kfd \
--device=/dev/dri \
--device=/dev/mkfd \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-u root \
-v /opt/hyhal/:/opt/hyhal/:ro \
-v /path/your_code_data/:/path/your_code_data/ \
{docker_image_name} bash
# Example:
docker run -it \
--shm-size 60g \
--network=host \
@@ -118,7 +101,7 @@ HIP_VISIBLE_DEVICES=0 python qwen3vl_infer_video.py
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
vllm serve Qwen/Qwen3-VL-8B-Instruct \
--trust-remote-code \
--max-model-len 32768 \
--served-model-name qwen-vl \
@@ -196,11 +179,10 @@ ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=3
ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
```
3. Start the vllm server
> On Intel CPU hosts, the additional parameter `--enforce-eager` is required.
```bash
vllm serve Qwen/Qwen3-VL-235B-A22B-Thinking \
--host x.x.x.x \
--port 8000 \
--distributed-executor-backend ray \
--tensor-parallel-size 8 \
@@ -211,14 +193,14 @@ vllm serve Qwen/Qwen3-VL-235B-A22B-Thinking \
--max-num-seqs 128 \
--block-size 64 \
--gpu-memory-utilization 0.90 \
--enforce-eager \
--allowed-local-media-path / \
--served-model-name qwen-vl \
--override-generation-config '{"temperature": 0.7, "top_p":0.8, "top_k":20, "repetition_penalty": 1.05}'
```
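If the server fails to come up, or requests later hang, two quick checks can help separate cluster problems from service problems. This is an optional sketch; it assumes the head-node IP is substituted for `x.x.x.x` and the default port 8000 used above:

```bash
# On the head node: both machines (16 DCUs in total) should appear in the cluster resources
ray status

# Probe the OpenAI-compatible endpoint; the served model name "qwen-vl" should be listed
curl http://x.x.x.x:8000/v1/models
```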
Once the server has started, it can be accessed as follows:
```bash
# Change /path/to/your/project to the directory where the image files are stored
curl http://x.x.x.x:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
@@ -230,7 +212,7 @@ curl http://x.x.x.x:8000/v1/chat/completions \
{
"type": "image_url",
"image_url": {
"url": "file://test22.png"
"url": "file:///path/to/your/project/doc/test.png"
}
},
{
@@ -246,8 +228,6 @@ curl http://x.x.x.x:8000/v1/chat/completions \
```
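The response comes back as JSON with the generated text under `choices[0].message.content`. As an optional convenience, the reply can be extracted directly on the command line; the sketch below assumes the request body from the example above has been saved to a file named `request.json` (a name chosen here for illustration) and that `python3` is available in the container:

```bash
# Send the saved request body and print only the model's reply text
curl -s http://x.x.x.x:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```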
## vllm Results
@@ -310,20 +290,24 @@ Output:
<div align=center>
<img src="./doc/result_vedio.png"/>
</div>
### Accuracy
`DCU accuracy is consistent with GPU. Supported inference frameworks: transformers, vllm.`
## Pretrained Weights
| Model | Parameters | DCU Model | Minimum Cards Required | Download |
|:--------------------:|:----:|:----------:|:------:|:----------:|
| Qwen3-VL-4B-Instruct | 4B | K100AI| 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
| Qwen3-VL-8B-Instruct | 8B | K100AI| 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
| Qwen3-VL-235B-A22B-Thinking | 235B | K100AI| 16 | [Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) |
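The weights can be fetched from the Hugging Face links in the table above. One common way, shown here only as a sketch, is `huggingface-cli`; it assumes `huggingface_hub` is installed and that the target directory below, chosen for illustration, is writable (a mirror or proxy may be needed depending on network access):

```bash
# Install the CLI and download Qwen3-VL-8B-Instruct into a local directory
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-VL-8B-Instruct --local-dir ./Qwen3-VL-8B-Instruct
```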
## Complete Qwen3-VL Model List
| **Model** | **Parameters** | **Minimum Cards (K100AI)** | **Download (Hugging Face)** |
| ------------------------------- | ------------ | ------------------------- | ------------------------------------------------------------ |
| **Qwen3-VL-2B-Instruct** | 2B | 1 | [Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct) |
| **Qwen3-VL-4B-Instruct** | 4B | 1 | [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
| **Qwen3-VL-8B-Instruct** | 8B | 1 | [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
| **Qwen3-VL-32B-Instruct** | 32B | 4 | [Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) |
| **Qwen3-VL-30B-A3B-Instruct** | 30B | 1-2 | [Qwen3-VL-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct) |
| **Qwen3-VL-30B-A3B-Thinking** | 30B | 2 | [Qwen3-VL-30B-A3B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking) |
| **Qwen3-VL-235B-A22B-Instruct** | 235B | 8 | [Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct) |
| **Qwen3-VL-235B-A22B-Thinking** | 235B | 16 | [Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) |
## Source Repository and Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/qwen3-vl_pytorch
@@ -11,4 +11,4 @@ appCategory=多模态
# Framework type
frameType=pytorch
# Accelerator card type
accelerateType=BW1000.K100AI