Commit 09c6cf7c authored by raojy

Update README.md

parent 89613cf4
@@ -21,9 +21,10 @@ Qwen3.5 achieves efficient native multimodal training through heterogeneous infrastructure: in vision
| vllm | 0.18.1+das.dtk2604 |
| triton | 3.6.0+das.opt1.dtk2604 |
| torch | 2.10.0+das.opt1.dtk2604 |
| SGLang | 0.5.10rc0+das.opt2.alpha.dtk2604 |
- **vLLM currently supports only the custom image:** harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.18.1-ubuntu22.04-dtk26.04-py3.10-20260510-2242
- **For SGLang inference, use:** harbor.sourcefind.cn:5443/dcu/admin/base/custom:sglang0.5.10rc0-ubuntu22.04-dtk26.04-py3.10-20260518
- Adjust the `-v` mount path to match where your model is actually stored
```bash
docker run -it \
@@ -178,14 +179,85 @@ curl http://localhost:8001/v1/chat/completions \
}'
```
### SGLang
#### Single-node inference
##### BF16
1. Start the server with `sglang serve`, using `Qwen/Qwen3.5-35B-A3B` as an example (this command is for chips other than K100AI)
```bash
export SGLANG_ENABLE_SPEC_V2=1
export SGLANG_USE_FUSED_TOPK_SOFTMAX=1
export SGLANG_USE_LIGHTOP=1
export SGLANG_USE_CAUSAL_CONV1D=1
export SGLANG_USE_AITER_LINEAR_ATTN=1
export SGLANG_USE_CUDA_IPC_TRANSPORT=1
sglang serve --model-path Qwen/Qwen3.5-35B-A3B \
--attention-backend fa3 \
--mm-attention-backend fa3 \
--enable-piecewise-cuda-graph \
--tp-size 1 --pp-size 1 \
--page-size 64 \
--mem-fraction-static 0.95 \
--mamba-scheduler-strategy extra_buffer \
--kv-cache-dtype fp8_e5m2 \
--trust-remote-code \
--chunked-prefill-size -1 --context-length 8192
```
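The serve commands in this section export six `SGLANG_*` flags set to `1` before launching. A minimal sketch of a pre-launch check follows; the function name `check_sglang_flags` is hypothetical and not part of SGLang, and the flag list is simply copied from the command above.

```bash
# Hypothetical helper (not part of SGLang): report every flag from the
# serve command above that is not currently set to 1.
check_sglang_flags() {
  missing=0
  for name in SGLANG_ENABLE_SPEC_V2 SGLANG_USE_FUSED_TOPK_SOFTMAX \
              SGLANG_USE_LIGHTOP SGLANG_USE_CAUSAL_CONV1D \
              SGLANG_USE_AITER_LINEAR_ATTN SGLANG_USE_CUDA_IPC_TRANSPORT; do
    # Read the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ "$value" != "1" ]; then
      echo "not set to 1: $name" >&2
      missing=1
    fi
  done
  # Returns 0 when all flags are set to 1, otherwise 1.
  return "$missing"
}
```

Calling `check_sglang_flags && echo "flags ok"` in the same shell that will run `sglang serve` shows whether any export was skipped.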
2. Start the server with `sglang serve`, using `Qwen/Qwen3.5-35B-A3B` as an example (this command is for K100AI chips)
```bash
export SGLANG_ENABLE_SPEC_V2=1
export SGLANG_USE_FUSED_TOPK_SOFTMAX=1
export SGLANG_USE_LIGHTOP=1
export SGLANG_USE_CAUSAL_CONV1D=1
export SGLANG_USE_AITER_LINEAR_ATTN=1
export SGLANG_USE_CUDA_IPC_TRANSPORT=1
export HIP_VISIBLE_DEVICES=4,5,6,7
export SGLANG_KV_LAYOUT_DCU_FA=0
sglang serve --model-path Qwen/Qwen3.5-35B-A3B \
--attention-backend fa3 \
--mm-attention-backend fa3 \
--disable-cuda-graph \
--tp-size 2 --pp-size 1 \
--page-size 64 \
--mem-fraction-static 0.9 \
--kv-cache-dtype bf16 \
--trust-remote-code \
--chunked-prefill-size -1 \
--disable-radix-cache
```
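The K100AI command exposes four devices through `HIP_VISIBLE_DEVICES=4,5,6,7` and uses `--tp-size 2`. The tensor-parallel size cannot exceed the number of visible devices, so a quick sanity check before launch can be sketched as below; `count_visible_devices` and `tp_fits` are hypothetical helper names introduced only for illustration.

```bash
# Hypothetical helper: count the comma-separated device IDs in a list
# such as the value of HIP_VISIBLE_DEVICES.
count_visible_devices() {
  if [ -z "${1:-}" ]; then
    echo 0
    return
  fi
  # Turn commas into newlines, count the lines, strip wc padding.
  printf '%s\n' "$1" | tr ',' '\n' | wc -l | tr -d ' '
}

# Hypothetical helper: succeed when <tp-size> fits into <device-list>.
# Usage: tp_fits <tp-size> <device-list>
tp_fits() {
  [ "$1" -le "$(count_visible_devices "$2")" ]
}
```

For example, `tp_fits 2 "$HIP_VISIBLE_DEVICES" || echo "tp-size exceeds visible devices"` could be run right before the serve command.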
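3. Client access (the example assumes the server listens on port 8001; upstream SGLang defaults to port 30000 unless `--port` is passed to the serve command)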
```bash
curl http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3.5-35B-A3B",
"messages": [
{"role": "user", "content": "Type \"I love Qwen3.5\" backwards"}
],
"temperature": 0.8,
"chat_template_kwargs": {
"enable_thinking": true
}
}'
```
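The `model` field of the request should normally match the path given to `--model-path` when the server was started. One way to keep the two in sync is to generate the request body from a single variable. The sketch below assumes a hypothetical helper named `build_payload`; the prompt and sampling settings are taken from the request above.

```bash
# Model path shared by server and client; override via the environment.
MODEL="${MODEL:-Qwen/Qwen3.5-35B-A3B}"

# Hypothetical helper: print the chat-completions request body.
# Usage: build_payload <model> <enable_thinking: true|false>
build_payload() {
  cat <<EOF
{
  "model": "$1",
  "messages": [
    {"role": "user", "content": "Type \"I love Qwen3.5\" backwards"}
  ],
  "temperature": 0.8,
  "chat_template_kwargs": {"enable_thinking": $2}
}
EOF
}

# Example call (left commented out because it needs a running server):
# curl http://localhost:8001/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$(build_payload "$MODEL" true)"
```

Passing `false` as the second argument produces the same request with thinking disabled.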
## Results
<div align=center>
<img src="./doc/result-dcu.png"/>
</div>
### Accuracy
- Inference framework: SGLang
- Test datasets: humaneval, gsm8k
- Accelerator used: BW1000
| model name| humaneval | gsm8k |
| :------: | :------: | :------: |
| Qwen3.5-27B | 0.92 | 0.98 |
| Qwen3.5-35B-A3B | 0.92 | 0.98 |
| Qwen3.5-122B-A10B | 0.93 | 0.98 |
## Source repository and issue feedback
- https://developer.sourcefind.cn/codes/modelzoo/qwen3.5