ModelZoo / Qwen3.5 / Commits / 92b6a63b

Commit 92b6a63b, authored Apr 09, 2026 by chenych

Update docker image

Parents: 33bc4671, 7df363c8

Showing 1 changed file with 49 additions and 31 deletions

README.md (+49 -31), viewed at 92b6a63b
...
...
@@ -3,7 +3,6 @@
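[Qwen3.5](https://qwen.ai/blog?id=qwen3.5)
## Model Overview
Qwen3.5 achieves efficient native multimodal training on heterogeneous infrastructure: parallelism strategies for the vision and language components are decoupled, avoiding the inefficiency of a single unified scheme. Sparse activations are used to overlap computation across modules, so that training on mixed text-image-video data reaches nearly 100% of the throughput of a text-only baseline. On top of this, a native FP8 pipeline applies low precision to activations, MoE routing, and GEMM operations, while runtime monitoring keeps sensitive layers in BF16. This yields roughly a 50% reduction in activation memory and a speedup of more than 10%, and scales stably to several trillion tokens.
To keep unlocking the potential of reinforcement learning, a scalable asynchronous RL framework was built that supports Qwen3.5 models of every size and covers text, multimodal, and multi-turn interaction scenarios. Its decoupled design, which separates training from inference, significantly improves hardware utilization and enables dynamic load balancing and fine-grained fault recovery. Combined with FP8 training and inference, rollout router replay, speculative sampling, and multi-turn rollout locking, it further improves system throughput and training-inference consistency. Through system-algorithm co-design, the framework mitigates the long-tail data problem while strictly bounding sample staleness, improving both the stability of the training curves and the performance ceiling. The framework is also designed for native agent workflows: it supports stable, seamless multi-turn environment interaction and eliminates scheduling interruptions at the framework layer. This decoupled design lets the system scale to million-scale agent scaffolds and environments, significantly strengthening model generalization. Together, these optimizations deliver a 3×–5× end-to-end speedup, demonstrating excellent stability, efficiency, and scalability.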
...
...
@@ -22,7 +21,7 @@ Qwen3.5 achieves efficient native multimodal training on heterogeneous infrastructure: parallelism strategies for the vision
| triton | 3.3.0+das.opt2.dtk2604.20260203.g393ad86c |
| torch | 2.9.0+das.opt1.dtk2604.20260126.g22910426 |
Only the custom image is currently supported: image.sourcefind.cn:5000/dcu/admin/base/custom:pytorch2.9.1-ubuntu22.04-dtk26.04-0130-py3.10-20260204-qwen3_5
Only the custom image is currently supported: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm015-ubuntu22.04-dtk26.04-glm5-0408
- Adjust the `-v` mount paths to match where your model is actually stored
```bash
...
...
@@ -40,36 +39,46 @@ docker run -it \
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    image.sourcefind.cn:5000/dcu/admin/base/custom:pytorch2.9.1-ubuntu22.04-dtk26.04-0130-py3.10-20260204-qwen3_5 bash
    harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm015-ubuntu22.04-dtk26.04-glm5-0408 bash
```
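Since the `-v` paths have to be adapted to each machine, a quick pre-flight check can catch a wrong path before the container starts. The following is only a minimal sketch: `check_mounts` is a hypothetical helper that is not part of this project, and `/path/your_code_data/` is the README's own placeholder.

```bash
#!/usr/bin/env bash
# Sketch: confirm that host directories exist before passing them to `docker run -v`.
# check_mounts is a hypothetical helper name, not something the project ships.
check_mounts() {
    local missing=0 dir
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing host path: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# /opt/hyhal/ is taken from the docker command; the second path is a placeholder.
if check_mounts /opt/hyhal/ /path/your_code_data/; then
    echo "all mount sources exist"
else
    echo "fix the -v paths before starting the container"
fi
```

The function returns non-zero if any directory is missing, so it can also gate the `docker run` call in a wrapper script.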
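More images are available for download from [光源](https://sourcefind.cn/#/service-list).
The special deep learning libraries that this project needs for DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community. The numpy and transformers libraries must be replaced with the following installs: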
```
pip install git+https://github.com/huggingface/transformers.git
pip install numpy==1.25.0
```
## Dataset
None yet
`None yet`
## Training
None yet
`None yet`
## Inference
### vllm
**Note**:
- When starting the service on `K100 AI`, add the `--disable-custom-all-reduce` argument
- When starting the service with a W8A8 model, add the `-cc.mode=3` and `-cc.inductor_compile_config='{"combo_kernels": false, "benchmark_combo_kernel": false}'` arguments
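The two notes above can be folded into a small wrapper so that the extra flags are added only when they apply. This is a sketch under stated assumptions: `build_extra_args` and the labels `K100_AI` and `W8A8` are hypothetical names chosen for illustration, while the flags themselves are copied from the notes.

```bash
#!/usr/bin/env bash
# Sketch: collect the conditional `vllm serve` flags described in the notes.
# build_extra_args and its two label arguments are illustrative assumptions.
build_extra_args() {
    local hardware=$1 quant=$2
    EXTRA_ARGS=()
    if [ "$hardware" = "K100_AI" ]; then
        EXTRA_ARGS+=(--disable-custom-all-reduce)
    fi
    if [ "$quant" = "W8A8" ]; then
        EXTRA_ARGS+=(-cc.mode=3)
        EXTRA_ARGS+=('-cc.inductor_compile_config={"combo_kernels": false, "benchmark_combo_kernel": false}')
    fi
}

build_extra_args K100_AI W8A8
# Print one flag per line to show what would be appended.
printf '%s\n' "${EXTRA_ARGS[@]}"
```

Keeping the flags in an array preserves the JSON value as a single argument, so it can be appended to the serve command as `"${EXTRA_ARGS[@]}"` without further quoting.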
#### Single-node inference
```bash
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_USE_PIECEWISE=1
export VLLM_USE_FLASH_MLA=1
export USE_FUSED_RMS_QUANT=0
export USE_FUSED_SILU_MUL_QUANT=1
**Note**: When starting the service on a `K100 AI` cluster, add the `--disable-custom-all-reduce` argument
export VLLM_USE_GLOBAL_CACHE13=1
export VLLM_FUSED_MOE_CHUNK_SIZE=16384
export VLLM_CUSTOM_CACHE=1
export VLLM_USE_OPT_CAT=1
export VLLM_USE_FUSED_FILL_RMS_CAT=1
export VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD=0
export VLLM_USE_LIGHTOP_RMS_ROPE_CONCAT=0
```bash
## Start the server
vllm serve Qwen/Qwen3.5-35B-A3B \
    --port 8001 \
    --tensor-parallel-size 2 \
    --max-model-len 262144 \
    --reasoning-parser qwen3
    --gpu-memory-utilization 0.9 \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder
## Client access
curl http://localhost:8001/v1/chat/completions \
...
...
@@ -82,6 +91,7 @@ curl http://localhost:8001/v1/chat/completions \
"temperature": 0.6
}'
```
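Loading a large model takes a while, so a client request sent right after `vllm serve` may be refused. One possible approach, sketched below, is to poll the OpenAI-compatible `/v1/models` endpoint until it answers. `wait_for_server` is a hypothetical helper, and the retry counts in the usage comment are assumptions to be tuned for the actual model size.

```bash
#!/usr/bin/env bash
# Sketch: poll an HTTP endpoint until it responds or the retries run out.
# wait_for_server URL RETRIES DELAY_SECONDS  (hypothetical helper)
wait_for_server() {
    local url=$1 retries=$2 delay=$3 attempt
    for ((attempt = 1; attempt <= retries; attempt++)); do
        # -s: silent, -f: treat HTTP error statuses as failures
        if curl -sf --max-time 5 "$url" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Example usage (port 8001 matches the --port value used above):
#   wait_for_server http://localhost:8001/v1/models 60 10 && echo "server is ready"
```

The function returns 0 as soon as the endpoint answers and 1 otherwise, so it can be chained in front of the `curl` request with `&&`.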
#### Multi-node inference
1. Set the environment variables
> Please note:
...
...
@@ -97,6 +107,7 @@ curl http://localhost:8001/v1/chat/completions \
```bash
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_HOST_IP=x.x.x.x # IP of this compute node; use the address of the IB interface named in SOCKET_IFNAME
export NCCL_SOCKET_IFNAME=ibxxxx
export GLOO_SOCKET_IFNAME=ibxxxx
...
...
@@ -105,14 +116,21 @@ unset NCCL_ALGO
export NCCL_MIN_NCHANNELS=16
export NCCL_MAX_NCHANNELS=16
export NCCL_NET_GDR_READ=1
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_SPEC_DECODE_EAGER=1
export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1
export VLLM_RPC_TIMEOUT=1800000
# Additional environment variables recommended for K100_AI clusters:
export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
export VLLM_USE_PIECEWISE=1
export USE_FUSED_RMS_QUANT=0
export USE_FUSED_SILU_MUL_QUANT=1
export VLLM_USE_GLOBAL_CACHE13=1
export VLLM_FUSED_MOE_CHUNK_SIZE=16384
export VLLM_CUSTOM_CACHE=1
export VLLM_USE_OPT_CAT=1
export VLLM_USE_FUSED_FILL_RMS_CAT=1
export VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD=0
export VLLM_USE_LIGHTOP_RMS_ROPE_CONCAT=0
# Must not be enabled together with kvfp8
# Hygon CPU core binding
export VLLM_NUMA_BIND=1
...
...
@@ -136,21 +154,20 @@ ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=3
ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
```
3. Start the vllm server
**Note**: When starting the service on a `K100 AI` cluster, add the `--disable-custom-all-reduce` argument
3. Start vllm serve
```bash
## Start the server
vllm serve Qwen/Qwen3.5-397B-A17B \
    --port 8001 \
    --tensor-parallel-size 16 \
    --distributed-executor-backend ray \
    --max-model-len 262144 \
    --reasoning-parser qwen3
    --gpu-memory-utilization 0.9 \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder
```
## Client access
4. Client access
```bash
curl http://localhost:8001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
...
...
@@ -174,6 +191,7 @@ DCU accuracy matches GPU; inference framework: vllm.
| Model | Weight size | DCU model | Minimum number of cards | Download |
|:------:|:----:|:----------:|:------:|:---------------------:|
| Qwen3.5-397B-A17B | 397B | K100AI, BW1000 | 16 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) |
| Qwen3.5-397B-A17B-INT8 | 397B | BW1000 | 8 | [ModelScope](https://www.modelscope.cn/models/metax-tech/Qwen3.5-397B-A17B-W8A8) |
| Qwen3.5-122B-A10B | 122B | K100AI, BW1000 | 8 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) |
| Qwen3.5-35B-A3B | 35B | K100AI, BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) |
| Qwen3.5-27B | 27B | K100AI, BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-27B) |
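The minimum card count in the table decides whether a model fits on one machine or needs the multi-node Ray setup. The sketch below assumes 8 cards per node, which is what `--num-gpus=8` and `HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7` in the commands suggest; `nodes_needed` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: number of nodes required for a given card count (ceiling division).
# nodes_needed CARDS [CARDS_PER_NODE]; the default of 8 cards per node is an assumption.
nodes_needed() {
    local cards=$1 per_node=${2:-8}
    echo $(( (cards + per_node - 1) / per_node ))
}

# Card counts taken from the table: 16, 8 and 2.
for cards in 16 8 2; do
    echo "cards=$cards nodes=$(nodes_needed "$cards")"
done
# prints:
# cards=16 nodes=2
# cards=8 nodes=1
# cards=2 nodes=1
```

Under that assumption, only the 16-card configuration spans two nodes, which is consistent with the multi-node example using `--tensor-parallel-size 16` with the Ray backend.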
...
...