ModelZoo / Qwen3.5
Commit 045f9041, authored Feb 27, 2026 by luopl; committed by chenych on Mar 03, 2026

add the K100 AI inference method

parent cb168f56

Showing 2 changed files with 35 additions and 6 deletions:
README.md (+34, -5)
model.properties (+1, -1)
README.md
View file @ 045f9041
-# Qwen3.5_vllm
+# Qwen3.5
## Paper
[Qwen3.5](https://qwen.ai/blog?id=qwen3.5)
...
@@ -58,6 +58,30 @@ pip install numpy==1.25.0
## Inference
### vllm
#### Single-node inference
**Note**: when starting the service on a `K100 AI` cluster, add the `--disable-custom-all-reduce` flag.
```bash
## Start the server
vllm serve Qwen/Qwen3.5-35B-A3B \
    --port 8001 \
    --tensor-parallel-size 2 \
    --max-model-len 262144 \
    --reasoning-parser qwen3

## Client request
curl http://localhost:8001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen3.5-35B-A3B",
        "messages": [
            {"role": "user", "content": "Type \"I love Qwen3.5\" backwards"}
        ],
        "temperature": 0.6
    }'
```
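The note above means the serve command differs by accelerator. A minimal sketch of how a launch script might handle that, assuming a hypothetical `build_serve_cmd` helper that takes the card type as its argument (the helper and the way the card type is passed in are not part of the README; the strings `K100AI` and `BW1000` are borrowed from its weights table). It only prints the command, so it can be tried without vllm installed:

```shell
#!/bin/sh
# Hypothetical helper: build_serve_cmd and its argument are illustrative
# assumptions; the flags themselves are copied from the README.
build_serve_cmd() {
    cmd="vllm serve Qwen/Qwen3.5-35B-A3B --port 8001 --tensor-parallel-size 2"
    cmd="$cmd --max-model-len 262144 --reasoning-parser qwen3"
    # The README asks for this flag when serving on a K100 AI cluster.
    if [ "$1" = "K100AI" ]; then
        cmd="$cmd --disable-custom-all-reduce"
    fi
    printf '%s\n' "$cmd"
}

# Dry run: print the command instead of executing it.
build_serve_cmd K100AI
```

Any other card type yields the command without the extra flag, which matches the README's original `BW1000` instructions.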
#### Multi-node inference
1. Add environment variables
> Please note:
...
@@ -87,6 +111,9 @@ export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1
export VLLM_RPC_TIMEOUT=1800000
# Additional environment variables recommended for K100_AI clusters:
export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
# Hygon CPU core binding
export VLLM_NUMA_BIND=1
export VLLM_RANK0_NUMA=0
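Since the extra variable is recommended only for K100_AI clusters, a shared setup script may want to gate it on the card type. The sketch below assumes a hypothetical `apply_k100ai_env` function and a card-type argument; only the variable name and its value come from the README.

```shell
#!/bin/sh
# Hypothetical helper: the function name and its argument are illustrative.
# Only VLLM_ENFORCE_EAGER_BS_THRESHOLD=44 is taken from the README.
apply_k100ai_env() {
    if [ "$1" = "K100AI" ]; then
        # Extra setting the README recommends for K100_AI clusters.
        export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
    else
        # Make sure a value left over from an earlier run does not leak through.
        unset VLLM_ENFORCE_EAGER_BS_THRESHOLD
    fi
}

apply_k100ai_env K100AI
echo "VLLM_ENFORCE_EAGER_BS_THRESHOLD=${VLLM_ENFORCE_EAGER_BS_THRESHOLD:-unset}"
```

The function has to be called in the same shell that later starts Ray or vllm, because exported variables are inherited only by child processes.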
...
...
@@ -111,6 +138,8 @@ ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
3. Start the vllm server
**Note**: when starting the service on a `K100 AI` cluster, add the `--disable-custom-all-reduce` flag.
```bash
## Start the server
...
```
@@ -144,10 +173,10 @@ DCU precision is consistent with GPU; inference framework: vllm.
## Pretrained weights
| Model name | Weight size | DCU model | Minimum number of cards | Download link |
|:------:|:----:|:----------:|:------:|:---------------------:|
-| Qwen3.5-397B-A17B | 397B | BW1000 | 16 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) |
-| Qwen3.5-122B-A10B | 122B | BW1000 | 8 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) |
-| Qwen3.5-35B-A3B | 35B | BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) |
-| Qwen3.5-27B | 27B | BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-27B) |
+| Qwen3.5-397B-A17B | 397B | K100AI,BW1000 | 16 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) |
+| Qwen3.5-122B-A10B | 122B | K100AI,BW1000 | 8 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) |
+| Qwen3.5-35B-A3B | 35B | K100AI,BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) |
+| Qwen3.5-27B | 27B | K100AI,BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-27B) |
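The minimum card counts in the table can be turned into a rough node count for planning a multi-node launch. This is a sketch only: `min_cards` and `nodes_needed` are hypothetical helpers, and the default of 8 cards per node is an assumption taken from the `--num-gpus=8` in the `ray start` line shown in an earlier hunk header.

```shell
#!/bin/sh
# Hypothetical helpers; the card counts are copied from the README table.
min_cards() {
    case "$1" in
        Qwen3.5-397B-A17B) echo 16 ;;
        Qwen3.5-122B-A10B) echo 8 ;;
        Qwen3.5-35B-A3B|Qwen3.5-27B) echo 2 ;;
        *) echo "unknown model: $1" >&2; return 1 ;;
    esac
}

# $1: model name, $2: cards per node (assumed default: 8).
nodes_needed() {
    cards=$(min_cards "$1") || return 1
    per_node="${2:-8}"
    # Integer ceiling of cards / per_node.
    echo $(( (cards + per_node - 1) / per_node ))
}

nodes_needed Qwen3.5-397B-A17B   # prints 2
```

Under that assumption only the 397B model needs more than one node; the others fit on a single 8-card machine.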
## Source repository and issue feedback
- https://developer.sourcefind.cn/codes/modelzoo/qwen3.5_vllm
...
model.properties
View file @ 045f9041
...
@@ -11,4 +11,4 @@ appCategory=对话问答
# Framework type
frameType=vllm
# Accelerator card type
-accelerateType=BW1000
\ No newline at end of file
+accelerateType=K100AI,BW1000
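With this change `accelerateType` holds more than one card name. A minimal sketch of how a consumer might test for a given card, assuming the value is a plain comma-separated list (the diff does not document the format, and `supports_accelerator` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: treats the accelerateType value as a comma-separated
# list, which is an assumption about the format.
# $1: accelerateType value, $2: card name to look for.
supports_accelerator() {
    # Wrap both sides in commas so only whole entries match.
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

if supports_accelerator "K100AI,BW1000" "K100AI"; then
    echo "K100AI supported"
fi
```

Matching on whole entries keeps a partial name such as `K100` from being accepted as `K100AI`.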