Commit 045f9041 authored by luopl, committed by chenych

add the K100 AI inference method

parent cb168f56
# Qwen3.5_vllm
# Qwen3.5
## Paper
[Qwen3.5](https://qwen.ai/blog?id=qwen3.5)
@@ -58,6 +58,30 @@ pip install numpy==1.25.0
## Inference
### vllm
#### Single-node inference
**Note**: when launching the service on a `K100 AI` cluster, add the `--disable-custom-all-reduce` flag.
```bash
## Start the server
vllm serve Qwen/Qwen3.5-35B-A3B \
--port 8001 \
--tensor-parallel-size 2 \
--max-model-len 262144 \
--reasoning-parser qwen3
## Query the server from a client
curl http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3.5-35B-A3B",
"messages": [
{"role": "user", "content": "Type \"I love Qwen3.5\" backwards"}
],
"temperature": 0.6
}'
```
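
The note above can be folded into a small wrapper so the extra flag is only added on `K100 AI`. This is a minimal sketch, not part of the original README: the function name `build_serve_cmd` and the `DCU_TYPE` variable are hypothetical, and vLLM does not read `DCU_TYPE` itself. The model name, port, and flags are copied from the command above.

```shell
# Hypothetical helper: prints the vllm serve command for a given accelerator type.
# "K100AI" and "BW1000" follow the DCU names used in this README.
build_serve_cmd() {
  cmd="vllm serve Qwen/Qwen3.5-35B-A3B --port 8001 --tensor-parallel-size 2 --max-model-len 262144 --reasoning-parser qwen3"
  # Only K100 AI clusters need custom all-reduce disabled, per the note above.
  if [ "$1" = "K100AI" ]; then
    cmd="$cmd --disable-custom-all-reduce"
  fi
  printf '%s\n' "$cmd"
}

# Print the command for review; DCU_TYPE is an assumed variable that defaults to BW1000.
build_serve_cmd "${DCU_TYPE:-BW1000}"
```

Printing the command rather than running it makes it easy to inspect before launch; it could then be run with `eval "$(build_serve_cmd K100AI)"`.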
#### Multi-node inference
1. Set the environment variables
> Please note:
@@ -87,6 +111,9 @@ export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1
export VLLM_RPC_TIMEOUT=1800000
# Additional environment variables recommended for K100_AI clusters:
export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
# Hygon CPU core binding
export VLLM_NUMA_BIND=1
export VLLM_RANK0_NUMA=0
@@ -111,6 +138,8 @@ ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32
3. Start the vllm server
**Note**: when launching the service on a `K100 AI` cluster, add the `--disable-custom-all-reduce` flag.
```bash
## Start the server
@@ -144,10 +173,10 @@ Accuracy on DCU matches GPU; inference framework: vllm.
## Pretrained weights
| Model | Parameters | DCU model | Minimum cards | Download |
|:------:|:----:|:----------:|:------:|:---------------------:|
| Qwen3.5-397B-A17B | 397B | BW1000 | 16 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) |
| Qwen3.5-122B-A10B | 122B | BW1000 | 8 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) |
| Qwen3.5-35B-A3B | 35B | BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) |
| Qwen3.5-27B | 27B | BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-27B) |
| Qwen3.5-397B-A17B | 397B | K100AI,BW1000 | 16 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) |
| Qwen3.5-122B-A10B | 122B | K100AI,BW1000 | 8 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) |
| Qwen3.5-35B-A3B | 35B | K100AI,BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) |
| Qwen3.5-27B | 27B | K100AI,BW1000 | 2 | [Hugging Face](https://huggingface.co/Qwen/Qwen3.5-27B) |
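
The minimum card counts in the table can be checked before launching a server. The sketch below is illustrative only: `min_cards` and `check_cards` are hypothetical helper names, and the numbers are copied from the table rather than queried from any tool.

```shell
# Hypothetical lookup of the minimum DCU count per model, copied from the table above.
min_cards() {
  case "$1" in
    Qwen/Qwen3.5-397B-A17B) echo 16 ;;
    Qwen/Qwen3.5-122B-A10B) echo 8 ;;
    Qwen/Qwen3.5-35B-A3B|Qwen/Qwen3.5-27B) echo 2 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Compare the number of cards you plan to use against the table's minimum.
check_cards() {
  need=$(min_cards "$1") || return 1
  if [ "$2" -lt "$need" ]; then
    echo "$1 needs at least $need cards, got $2" >&2
    return 1
  fi
  echo "ok: $1 with $2 cards"
}
```

For example, `check_cards Qwen/Qwen3.5-35B-A3B 2` passes, while `check_cards Qwen/Qwen3.5-122B-A10B 4` reports that 8 cards are needed.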
## Source repository and issue feedback
- https://developer.sourcefind.cn/codes/modelzoo/qwen3.5_vllm
......
@@ -11,4 +11,4 @@ appCategory=对话问答
# Framework type
frameType=vllm
# Accelerator type
accelerateType=BW1000
\ No newline at end of file
accelerateType=K100AI,BW1000