raojy / Qwen3.6_vllm, commit 9986bee1
Authored Apr 17, 2026 by raojy (parent 9a37f010)
Update README.md: 1 changed file with 24 additions and 116 deletions

README.md @ 9986bee1
...

Qwen3.6-35B-A3B is a multimodal causal language model that uses a Mixture-of-Experts (MoE) architecture and includes a vision encoder. It has 35B total parameters, with 3B activated during inference. This release focuses on improving Agentic Coding accuracy and introduces a Thinking Preservation mechanism, making it particularly well suited to long-context work and complex code-repository development tasks.
<div align="center">
<img src="./doc/1.png" />
</div>

## Environment Dependencies
...

| :------: |:-----------------------------------------:|
| DTK | 26.04 |
| python | 3.10.12 |
| transformers | 5.2.0 |
| vllm | 0.15.1+das.opt1.alpha.dtk2604.torch290.2604081832.gbcb2ba |
| triton | 3.3.0+das.opt2.dtk2604.torch290.20260331.g31542e |
| torch | 2.9.0+das.opt1.dtk2604.20260331.g4e3c1e7 |

Currently recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm015-ubuntu22.04-dtk26.04-0409-modelzoo
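If the image is not present locally, it can be pulled first. A minimal sketch using the recommended tag above, assuming the registry is reachable (a `docker login harbor.sourcefind.cn:5443` may be required beforehand):

```bash
# Pull the recommended DCU image (tag taken from the recommendation above).
docker pull harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm015-ubuntu22.04-dtk26.04-0409-modelzoo
```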
- Adjust the `-v` mount paths according to your actual model and data locations.

```bash
docker run -it \
    ...
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm015-ubuntu22.04-dtk26.04-0409-modelzoo bash
```
More images are available for download from [光源](https://sourcefind.cn/#/service-list).
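Once inside the container, the installed versions can be compared against the dependency table above. A minimal check, assuming `python` is on the container's PATH:

```bash
# Print the versions of the key Python packages from the dependency table.
python - <<'EOF'
import torch, transformers, triton, vllm
print("torch        ", torch.__version__)
print("transformers ", transformers.__version__)
print("triton       ", triton.__version__)
print("vllm         ", vllm.__version__)
EOF
```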
...

None at present.

## Inference
> If the error `ImportError: librocm_smi64.so.2: cannot open shared object file: No such file or directory` appears, the machine's hyhal version is too old; please upgrade it.
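Before upgrading, you can confirm whether the library named in the error is visible to the dynamic loader. A minimal check; the `/opt/hyhal` path follows the mount used in the docker command above and is otherwise an assumption:

```bash
# Check whether librocm_smi64 is registered with the loader or shipped under /opt/hyhal.
ldconfig -p | grep librocm_smi64
find /opt/hyhal -name 'librocm_smi64*' 2>/dev/null
```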
### vllm
Enabling the **mtp** feature requires adding the following arguments:

```
--speculative-config.method mtp \
--speculative-config.num_speculative_tokens 2
```
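For illustration, the sketch below appends these arguments to the single-node command from the next section; combining mtp with exactly these serve options is an assumption, not something stated above:

```bash
# Sketch: single-node serve with mtp speculative decoding enabled
# (speculative-config arguments copied from the snippet above; the combination is illustrative).
vllm serve Qwen/Qwen3.6-35B-A3B \
    --port 8001 \
    --trust-remote-code \
    --dtype bfloat16 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.925 \
    --speculative-config.method mtp \
    --speculative-config.num_speculative_tokens 2
```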
#### Single-node Inference

```bash
## Start the server
vllm serve Qwen/Qwen3.6-35B-A3B \
    --port 8001 \
    --trust-remote-code \
    --dtype bfloat16 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.925

## Client access
curl -X POST "http://localhost:8001/v1/chat/completions" -H "Content-Type: application/json" -d '{
    "model": "Qwen/Qwen3.6-35B-A3B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, please give a brief introduction of yourself."}
    ],
    "stream": false
}'
```
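After the server reports it is ready, the endpoint can be verified by listing the served models over the standard OpenAI-compatible route:

```bash
# The response should list Qwen/Qwen3.6-35B-A3B as an available model.
curl http://localhost:8001/v1/models
```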
#### Multi-node Inference

1. Set the environment variables

> Note:
> Write the environment variables on every node into a `.sh` file, save it, and then `source` the `.sh` file on each compute node.
>
> VLLM_HOST_IP: the node's local communication IP; prefer the IP of an IB NIC **to avoid RCCL timeout issues**.
>
> NCCL_SOCKET_IFNAME and GLOO_SOCKET_IFNAME: the names of the interfaces that carry that local communication IP.
>
> Interfaces and IPs can be queried with `ifconfig`.
>
> IB port state can be queried with `ibstat`; the port must be in the Active state to be usable, and all nodes must be consistent. A quick check is sketched after the export block below.
```bash
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_HOST_IP=x.x.x.x        # the compute node's IP: the address of the IB interface named in SOCKET_IFNAME
export NCCL_SOCKET_IFNAME=ibxxxx
export GLOO_SOCKET_IFNAME=ibxxxx
export NCCL_IB_HCA=mlx5_x:1        # name of the IB NIC in this environment
unset NCCL_ALGO
export NCCL_MIN_NCHANNELS=16
export NCCL_MAX_NCHANNELS=16
export NCCL_NET_GDR_READ=1
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_RPC_TIMEOUT=1800000
# Core binding on Hygon CPUs
export VLLM_NUMA_BIND=1
export VLLM_RANK0_NUMA=0
export VLLM_RANK1_NUMA=1
export VLLM_RANK2_NUMA=2
export VLLM_RANK3_NUMA=3
export VLLM_RANK4_NUMA=4
export VLLM_RANK5_NUMA=5
export VLLM_RANK6_NUMA=6
export VLLM_RANK7_NUMA=7
```
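As referenced in the note above, the interface address and IB port state can be checked quickly on every node; `ibxxxx` is the same placeholder interface name used in the exports:

```bash
# Confirm the IP bound to the chosen interface and that the IB port is Active.
ifconfig ibxxxx | grep 'inet '
ibstat | grep -E 'State|Rate'
```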
2. Start the Ray cluster

> x.x.x.x corresponds to VLLM_HOST_IP from step 1.
```bash
# Run on the head node
ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=32
# Run on each worker node
ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
```
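Before starting the server, it is worth confirming that all worker nodes have joined; for a two-node setup with 8 GPUs per node, the resource summary should show 16 GPUs:

```bash
# Run on the head node; the cluster resource summary should include every node's GPUs.
ray status
```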
3. Start the vLLM server

```bash
## Start the server
vllm serve Qwen/Qwen3.5-397B-A17B \
    --port 8001 \
    --tensor-parallel-size 16 \
    --distributed-executor-backend ray \
    --trust-remote-code \
    --dtype bfloat16 \
    --gpu-memory-utilization 0.925 \
    --default-chat-template-kwargs '{"enable_thinking": false}' \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder

## Client access
curl http://localhost:8001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen3.5-397B-A17B",
        "messages": [
            {"role": "user", "content": "Type \"I love Qwen3.5\" backwards"}
        ],
        "temperature": 0.6
    }'
```
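When the multi-node deployment is torn down, the Ray processes can be stopped again on each node:

```bash
# Run on every node (head and workers) after shutting down the vllm server.
ray stop
```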
## Results

<div align="center">
<img src="./doc/2.png" />
</div>

### Accuracy
DCU accuracy is consistent with GPU; inference framework: vllm.

## Pretrained Weights
| Model | Weight Size | DCU Model | Min. Cards | Download |
|:------:|:----:|:----------:|:------:|:---------------------:|
| Qwen3.6-35B-A3B | 35B | K100AI, BW1000 | 4 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) |
| Qwen3.6-35B-A3B-FP8 | 35B | BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8) |
| Qwen3.5-122B-A10B | 122B | K100AI, BW1000 | 8 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) |
| Qwen3.5-35B-A3B | 35B | K100AI, BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) |
| Qwen3.5-27B | 27B | K100AI, BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-27B) |
| Qwen3.5-9B | 9B | K100AI, BW1000 | 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-9B) |
| Qwen3.5-4B | 4B | K100AI, BW1000 | 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-4B) |
| Qwen3.5-2B | 2B | K100AI, BW1000 | 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-2B) |
| Qwen3.5-0.8B | 0.8B | K100AI, BW1000 | 1 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
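The weights can also be downloaded ahead of time instead of letting `vllm serve` fetch them on first start. A minimal sketch using the Hugging Face CLI; the local directory is an arbitrary example:

```bash
# Download the Qwen3.6-35B-A3B weights to a local directory (example path),
# then point `vllm serve` at that directory instead of the repo id.
huggingface-cli download Qwen/Qwen3.6-35B-A3B --local-dir ./Qwen3.6-35B-A3B
```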
## Source Repository and Issue Feedback

- https://developer.sourcefind.cn/codes/modelzoo/qwen3.6_vllm
## References

- https://github.com/QwenLM/Qwen3.6