Commit 09c6cf7c, authored Jun 04, 2026 by raojy (parent 89613cf4): Update README.md

Showing 1 changed file (README.md) with 75 additions and 3 deletions.
@@ -21,9 +21,10 @@ Qwen3.5 achieves efficient native multimodal training through heterogeneous infrastructure: in the vision
| vllm | 0.18.1+das.dtk2604 |
| triton | 3.6.0+das.opt1.dtk2604 |
| torch | 2.10.0+das.opt1.dtk2604 |
| SGLang | 0.5.10rc0+das.opt2.alpha.dtk2604 |
- **vLLM currently supports only this custom image:** harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.18.1-ubuntu22.04-dtk26.04-py3.10-20260510-2242
- **For SGLang inference, use:** harbor.sourcefind.cn:5443/dcu/admin/base/custom:sglang0.5.10rc0-ubuntu22.04-dtk26.04-py3.10-20260518
- Adjust the `-v` mount path to match where your model files actually live.
```bash
docker run -it \
...
```
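The two image bullets and the `-v` note above can be tied together in a small helper. This is a minimal sketch, not part of this repository: `select_image` and `build_docker_cmd` are hypothetical names, `/data/models` and `/models` are placeholder paths, and the DCU device and runtime flags of the full `docker run` command (which fall outside this diff) are deliberately left out. The helper only prints the command; it does not run it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): choose the custom image for a
# framework and print a docker run command with the model directory bind-mounted.

# Map an inference framework to the custom image listed in this README.
select_image() {
  case "$1" in
    vllm)   echo "harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.18.1-ubuntu22.04-dtk26.04-py3.10-20260510-2242" ;;
    sglang) echo "harbor.sourcefind.cn:5443/dcu/admin/base/custom:sglang0.5.10rc0-ubuntu22.04-dtk26.04-py3.10-20260518" ;;
    *)      echo "unknown framework: $1 (expected vllm or sglang)" >&2; return 1 ;;
  esac
}

# Print (do not execute) a docker run command. The DCU device/runtime flags of the
# full command in this README are intentionally omitted from this sketch.
# Usage: build_docker_cmd FRAMEWORK [HOST_MODEL_DIR] [CONTAINER_MODEL_DIR]
build_docker_cmd() {
  local framework="$1"
  local host_dir="${2:-/data/models}"    # hypothetical default; point this at your model files
  local container_dir="${3:-/models}"    # hypothetical mount point inside the container
  local image
  image="$(select_image "$framework")" || return 1
  printf '%s\n' "docker run -it -v ${host_dir}:${container_dir} ${image}"
}
```

For example, `build_docker_cmd sglang /path/to/Qwen3.5-35B-A3B /models` prints the SGLang variant with that directory mounted; keeping the image lookup in one function means the vLLM and SGLang images cannot drift apart from the mount logic.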
@@ -178,14 +179,85 @@ curl http://localhost:8001/v1/chat/completions \
```bash
...
}'
```
### SGLang
#### Single-node inference
##### BF16
1. Start the server with `sglang serve`, using `Qwen/Qwen3.5-35B-A3B` as an example (this command applies to non-K100AI chips):
```bash
export SGLANG_ENABLE_SPEC_V2=1
export SGLANG_USE_FUSED_TOPK_SOFTMAX=1
export SGLANG_USE_LIGHTOP=1
export SGLANG_USE_CAUSAL_CONV1D=1
export SGLANG_USE_AITER_LINEAR_ATTN=1
export SGLANG_USE_CUDA_IPC_TRANSPORT=1

# --port 8001 matches the client request below (SGLang's default port is 30000)
sglang serve --model-path Qwen/Qwen3.5-35B-A3B \
    --port 8001 \
    --attention-backend fa3 \
    --mm-attention-backend fa3 \
    --enable-piecewise-cuda-graph \
    --tp-size 1 --pp-size 1 \
    --page-size 64 \
    --mem-fraction-static 0.95 \
    --mamba-scheduler-strategy extra_buffer \
    --kv-cache-dtype fp8_e5m2 \
    --trust-remote-code \
    --chunked-prefill-size -1 --context-length 8192
```
2. Start the server with `sglang serve`, using `Qwen/Qwen3.5-35B-A3B` as an example (this command applies to K100AI chips):
```bash
export SGLANG_ENABLE_SPEC_V2=1
export SGLANG_USE_FUSED_TOPK_SOFTMAX=1
export SGLANG_USE_LIGHTOP=1
export SGLANG_USE_CAUSAL_CONV1D=1
export SGLANG_USE_AITER_LINEAR_ATTN=1
export SGLANG_USE_CUDA_IPC_TRANSPORT=1
export HIP_VISIBLE_DEVICES=4,5,6,7
export SGLANG_KV_LAYOUT_DCU_FA=0

# --port 8001 matches the client request below (SGLang's default port is 30000)
sglang serve --model-path Qwen/Qwen3.5-35B-A3B \
    --port 8001 \
    --attention-backend fa3 \
    --mm-attention-backend fa3 \
    --disable-cuda-graph \
    --tp-size 2 --pp-size 1 \
    --page-size 64 \
    --mem-fraction-static 0.9 \
    --kv-cache-dtype bf16 \
    --trust-remote-code \
    --chunked-prefill-size -1 \
    --disable-radix-cache
```
3. Send a request from the client:
```bash
curl http://localhost:8001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen3.5-35B-A3B",
        "messages": [
            {"role": "user", "content": "Type \"I love Qwen3.5\" backwards"}
        ],
        "temperature": 0.8,
        "chat_template_kwargs": {
            "enable_thinking": true
        }
    }'
```
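Model loading can take a while, so a request sent immediately after `sglang serve` may fail. A minimal sketch of two client-side helpers, assuming the server listens on `localhost:8001` as in the request above and that `python3` is available for JSON parsing (the py3.10-based images suggest it is, but that is an assumption). `wait_for_server` and `extract_content` are hypothetical names; the readiness probe uses the OpenAI-compatible `GET /v1/models` endpoint.

```bash
#!/usr/bin/env bash
# Hypothetical client-side helpers for the SGLang server started above.

# Poll GET /v1/models until the server answers, or give up after TIMEOUT seconds.
# Usage: wait_for_server BASE_URL [TIMEOUT_SECONDS]
wait_for_server() {
  local base_url="$1" timeout="${2:-600}" waited=0
  # -s: silent, -f: treat HTTP errors as failures, --max-time: cap each probe
  until curl -sf --max-time 5 -o /dev/null "${base_url}/v1/models"; do
    if (( waited >= timeout )); then
      echo "server at ${base_url} not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$(( waited + 2 ))
  done
}

# Read a chat.completions JSON response on stdin and print the assistant message text.
extract_content() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}
```

For example, run `wait_for_server http://localhost:8001 600` after launching `sglang serve`, then add `-s` to the `curl` request above (to silence the progress meter) and pipe its output into `extract_content`.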
## Results
<div align=center>
    <img src="./doc/result-dcu.png"/>
</div>
### Accuracy
- Inference framework: SGLang
- Test datasets: HumanEval, GSM8K
- Accelerator: BW1000
| Model | HumanEval | GSM8K |
| :------: | :------: | :------: |
| Qwen3.5-27B | 0.92 | 0.98 |
| Qwen3.5-35B-A3B | 0.92 | 0.98 |
| Qwen3.5-122B-A10B | 0.93 | 0.98 |
## Source Repository and Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/qwen3.5
...