add the K100 AI inference method

Commit 045f9041 authored Feb 27, 2026 by luopl, committed by chenych Mar 03, 2026

Showing with 35 additions and 6 deletions

README.md +34 -5

model.properties +1 -1

--- a/README.md
+++ b/README.md
-# Qwen3.5_vllm
+# Qwen3.5
 ## Paper
 [Qwen3.5](https://qwen.ai/blog?id=qwen3.5)

@@ -58,6 +58,30 @@ pip install numpy==1.25.0

 ## Inference
 ### vllm
+#### Single-node inference
+
+**Note**: When starting the service on a `K100 AI` cluster, add the `--disable-custom-all-reduce` flag.
+
+```bash
+## Start the server
+
+vllm serve Qwen/Qwen3.5-35B-A3B \
+    --port 8001 \
+    --tensor-parallel-size 2 \
+    --max-model-len 262144 \
+    --reasoning-parser qwen3
+
+## Client request
+curl http://localhost:8001/v1/chat/completions   \
+    -H "Content-Type: application/json"  \
+    -d '{
+        "model": "Qwen/Qwen3.5-35B-A3B",
+        "messages": [
+          {"role": "user", "content": "Type \"I love Qwen3.5\" backwards"}
+        ],
+        "temperature": 0.6
+    }'
+```
 #### Multi-node inference
 1. Add the environment variables
 > Note:
@@ -87,6 +111,9 @@ export VLLM_MLA_DISABLE=0
 export VLLM_USE_FLASH_MLA=1
 export VLLM_RPC_TIMEOUT=1800000

+# Additional environment variable recommended for K100_AI clusters:
+export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
+
 # Bind CPU cores on Hygon CPUs
 export VLLM_NUMA_BIND=1
 export VLLM_RANK0_NUMA=0
@@ -111,6 +138,8 @@ ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32

 3. Start the vllm server

+**Note**: When starting the service on a `K100 AI` cluster, add the `--disable-custom-all-reduce` flag.
+
 ```bash
 ## Start the server

@@ -144,10 +173,10 @@ DCU accuracy is consistent with GPU; inference framework: vllm.
 ## Pretrained weights
 |  Model name  | Weight size | DCU model  | Minimum cards required |         Download link          |
 |:------:|:----:|:----------:|:------:|:---------------------:|
-| Qwen3.5-397B-A17B | 397B | BW1000 |   16   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) |
-| Qwen3.5-122B-A10B | 122B | BW1000 |   8   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) |
-| Qwen3.5-35B-A3B | 35B | BW1000 |   2   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) |
-| Qwen3.5-27B | 27B | BW1000 |   2   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-27B) |
+| Qwen3.5-397B-A17B | 397B | K100AI,BW1000 |   16   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) |
+| Qwen3.5-122B-A10B | 122B | K100AI,BW1000 |   8   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) |
+| Qwen3.5-35B-A3B | 35B | K100AI,BW1000 |   2   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) |
+| Qwen3.5-27B | 27B | K100AI,BW1000 |   2   | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-27B) |

 ## Source repository and issue feedback
 - https://developer.sourcefind.cn/codes/modelzoo/qwen3.5_vllm
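The two K100 AI-specific notes added to the README (pass `--disable-custom-all-reduce` when serving, and set `VLLM_ENFORCE_EAGER_BS_THRESHOLD=44` for multi-node runs) can be folded into a launch script. The following is a minimal sketch, not part of this commit, assuming bash 4+ and a hypothetical accelerator label (`K100AI` or `BW1000`) passed as the first argument. `serve_args` and `k100ai_env` are illustrative helper names, not part of this repository; the helpers only print or export values and do not start vLLM themselves.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the single-node `vllm serve` arguments from the
# README, one per line. $1 is an assumed accelerator label (K100AI or BW1000).
serve_args() {
  local accel="$1"
  local args=(
    serve Qwen/Qwen3.5-35B-A3B
    --port 8001
    --tensor-parallel-size 2
    --max-model-len 262144
    --reasoning-parser qwen3
  )
  # The README asks K100 AI clusters to disable vLLM's custom all-reduce.
  if [ "$accel" = "K100AI" ]; then
    args+=(--disable-custom-all-reduce)
  fi
  printf '%s\n' "${args[@]}"
}

# Hypothetical helper: export the extra variable the README recommends
# for K100_AI clusters; other accelerator labels leave the environment as-is.
k100ai_env() {
  if [ "$1" = "K100AI" ]; then
    export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
  fi
}

# Example usage (not executed here):
#   mapfile -t ARGS < <(serve_args K100AI)
#   k100ai_env K100AI
#   vllm "${ARGS[@]}"
```

Collecting the arguments in an array rather than a single string keeps each flag a separate word, so the command does not depend on shell word splitting.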

--- a/model.properties
+++ b/model.properties
@@ -11,4 +11,4 @@ appCategory=对话问答
 # Framework type
 frameType=vllm
 # Accelerator card type
-accelerateType=BW1000
\ No newline at end of file
+accelerateType=K100AI,BW1000
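Because `accelerateType` now holds several comma-separated values instead of one, any script that reads `model.properties` has to split the value before comparing it. The following is a minimal sketch, assuming the file uses plain `key=value` lines as shown above. `accel_types` and `supports_accel` are hypothetical helper names, not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the accelerator types declared in a
# model.properties file, one per line.
accel_types() {
  # Take the first accelerateType= line, drop the key, split on commas.
  grep -m1 '^accelerateType=' "$1" | cut -d= -f2- | tr ',' '\n'
}

# Hypothetical helper: succeed (exit 0) if accelerator type $2 is listed in file $1.
supports_accel() {
  # -x matches whole lines only, so "K100" does not match "K100AI".
  accel_types "$1" | grep -qx -- "$2"
}

# Example usage (not executed here):
#   supports_accel model.properties K100AI && echo "K100AI supported"
```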