ModelZoo / Qwen3.5_vllm
Commit 045f9041
Authored Feb 27, 2026 by luopl
Committed by chenych, Mar 03, 2026
add the K100 AI inference method
Parent: cb168f56
Showing 2 changed files with 35 additions and 6 deletions
README.md +34 -5
model.properties +1 -1
README.md
-# Qwen3.5_vllm
+# Qwen3.5
## Paper
[Qwen3.5](https://qwen.ai/blog?id=qwen3.5)
...
...
@@ -58,6 +58,30 @@ pip install numpy==1.25.0
## Inference
### vllm
#### Single-node inference
**Note**: when starting the service on a `K100 AI` cluster, add the `--disable-custom-all-reduce` parameter.
```bash
## Start the server
vllm serve Qwen/Qwen3.5-35B-A3B \
    --port 8001 \
    --tensor-parallel-size 2 \
    --max-model-len 262144 \
    --reasoning-parser qwen3

## Client request
curl http://localhost:8001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen3.5-35B-A3B",
        "messages": [
            {"role": "user", "content": "Type \"I love Qwen3.5\" backwards"}
        ],
        "temperature": 0.6
    }'
```
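
The K100 AI note can be folded into a small launch wrapper. The following is a minimal sketch, not part of the commit: `build_serve_cmd` and the `DCU_TYPE` variable are hypothetical names for this example (vllm does not read `DCU_TYPE`), and the function only prints the command so it can be inspected before it is run.

```bash
# Sketch: build the single-node `vllm serve` command from the README and
# append --disable-custom-all-reduce only for K100 AI clusters.
# DCU_TYPE is a hypothetical variable for this example, not a vllm setting.
build_serve_cmd() {
    dcu_type="$1"
    cmd="vllm serve Qwen/Qwen3.5-35B-A3B --port 8001 --tensor-parallel-size 2 --max-model-len 262144 --reasoning-parser qwen3"
    case "$dcu_type" in
        K100AI|K100_AI|"K100 AI")
            cmd="$cmd --disable-custom-all-reduce"
            ;;
    esac
    printf '%s\n' "$cmd"
}

# Print the command for the current cluster; falls back to BW1000 when unset.
build_serve_cmd "${DCU_TYPE:-BW1000}"
```

Running it with `DCU_TYPE=K100AI` set in the environment prints the command with the extra flag; any other value prints the command unchanged.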
#### Multi-node inference
1. Add environment variables
> Note:
...
...
@@ -87,6 +111,9 @@ export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1
export VLLM_RPC_TIMEOUT=1800000
# Additional environment variables recommended for K100_AI clusters:
export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
# Hygon CPU core binding
export VLLM_NUMA_BIND=1
export VLLM_RANK0_NUMA=0
...
...
@@ -111,6 +138,8 @@ ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
3. Start the vllm server

**Note**: when starting the service on a `K100 AI` cluster, add the `--disable-custom-all-reduce` parameter.
```bash
## Start the server
...
...
@@ -144,10 +173,10 @@ DCU accuracy matches GPU accuracy; inference framework: vllm.
## Pretrained weights
| Model | Weight size | DCU model | Minimum cards required | Download |
|:------:|:----:|:----------:|:------:|:---------------------:|
-| Qwen3.5-397B-A17B | 397B | BW1000 | 16 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) |
-| Qwen3.5-122B-A10B | 122B | BW1000 | 8 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) |
-| Qwen3.5-35B-A3B | 35B | BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) |
-| Qwen3.5-27B | 27B | BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-27B) |
+| Qwen3.5-397B-A17B | 397B | K100AI, BW1000 | 16 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) |
+| Qwen3.5-122B-A10B | 122B | K100AI, BW1000 | 8 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) |
+| Qwen3.5-35B-A3B | 35B | K100AI, BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) |
+| Qwen3.5-27B | 27B | K100AI, BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-27B) |
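
The minimum card counts in the table can be checked before launching a server. The following is a rough sketch, not part of the commit: `min_cards` and `check_cards` are hypothetical helper names, and the numbers are copied from the table rows above.

```bash
# Sketch: look up the minimum card count listed in the table and compare it
# with the number of cards available. Helper names are hypothetical.
min_cards() {
    case "$1" in
        Qwen3.5-397B-A17B) echo 16 ;;
        Qwen3.5-122B-A10B) echo 8 ;;
        Qwen3.5-35B-A3B|Qwen3.5-27B) echo 2 ;;
        *) return 1 ;;
    esac
}

check_cards() {
    model="$1"
    cards="$2"
    # Fail with status 2 when the model is not in the table.
    need=$(min_cards "$model") || { echo "unknown model: $model" >&2; return 2; }
    if [ "$cards" -lt "$need" ]; then
        echo "$model needs at least $need cards, got $cards" >&2
        return 1
    fi
    return 0
}

check_cards Qwen3.5-35B-A3B 2 && echo "card count is sufficient"
```

A non-zero return status means either too few cards (1) or a model that the table does not list (2).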
## Source repository and issue feedback
- https://developer.sourcefind.cn/codes/modelzoo/qwen3.5_vllm
...
...
model.properties
...
...
@@ -11,4 +11,4 @@ appCategory=对话问答
# Framework type
frameType=vllm
# Accelerator type
-accelerateType=BW1000
\ No newline at end of file
+accelerateType=K100AI,BW1000
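
Since `accelerateType` now holds a comma-separated list, a script that consumes `model.properties` has to split the value instead of comparing it as a whole. The following is a minimal sketch under that assumption: `supports_accelerator` is a hypothetical helper, and it assumes `key=value` lines with no spaces around `=` or the commas.

```bash
# Sketch: test whether a card type appears in the accelerateType list.
# supports_accelerator is a hypothetical helper, not part of the repository.
supports_accelerator() {
    props_file="$1"
    wanted="$2"
    # Extract the value after "accelerateType=" (works without a trailing newline).
    value=$(sed -n 's/^accelerateType=//p' "$props_file" | tr -d '\r')
    old_ifs=$IFS
    IFS=,
    for card in $value; do
        if [ "$card" = "$wanted" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

For example, `supports_accelerator model.properties K100AI && echo "K100AI listed"` prints the message only when `K100AI` is one of the listed values.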