Commit ed315608 authored by raojy

Update README.md

parent 313413b4
......@@ -13,11 +13,11 @@ Qwen3.6 uses a Mixture-of-Experts (MoE) architecture and includes a vision encoder; it is a multimod…
| Software | Version |
| :------: |:-----------------------------------------:|
| DTK | 26.04 |
- | python | 3.10.12 |
- | transformers | 5.5.0 |
- | vllm | 0.18.1+das.fa71803.dtk2604 |
- | triton | 3.6.0+gitc73250c4.staging |
- | torch | 2.10.0+das.opt1.dtk2604.20260325.g6b060a |
+ | Python | 3.10.12 |
+ | Transformers | 5.5.0 |
+ | vLLm | 0.18.1+das.fa71803.dtk2604 |
+ | Triton | 3.6.0+gitc73250c4.staging |
+ | Torch | 2.10.0+das.opt1.dtk2604.20260325.g6b060a |
Currently recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm018-ubuntu22.04-dtk26.04-qwen3.6-20260423
......@@ -56,7 +56,7 @@ docker run -it \
None yet
## Inference
- ### vllm
+ ### vLLm
#### Single-node inference
```bash
## Start the server with vllm serve
......@@ -65,7 +65,7 @@ vllm serve Qwen/Qwen3.6-27B \
--trust-remote-code \
--dtype bfloat16 \
--tensor-parallel-size 2 \
- --gpu-memory-utilization 0.925 \
+ --gpu-memory-utilization 0.9 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder
......@@ -103,7 +103,7 @@ curl http://localhost:8001/v1/chat/completions \
</div>
### Accuracy
- - Inference framework: vllm
+ - Inference framework: vLLm
- Test data: humaneval, gsm8k
- Accelerator used: bw1000
......
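
Note on the `--gpu-memory-utilization` change (0.925 to 0.9): in vLLM this flag sets the fraction of each device's memory that the engine may use, so the lower value leaves more headroom at the cost of a somewhat smaller KV cache. The sketch below only illustrates the arithmetic. The `budget_gib` helper and the 64 GiB device size are assumptions for illustration; the commit does not state the memory size of the bw1000 card.

```bash
#!/usr/bin/env bash
# Rough per-device memory budget implied by --gpu-memory-utilization.
# budget_gib is a hypothetical helper, not part of vLLM.
budget_gib() {
    local total_gib="$1" fraction="$2"
    # awk handles the floating-point multiplication that plain bash cannot do.
    awk -v t="$total_gib" -v f="$fraction" 'BEGIN { printf "%.2f\n", t * f }'
}

total_gib=64   # assumed device memory in GiB, for illustration only

old=$(budget_gib "$total_gib" 0.925)   # value before this commit
new=$(budget_gib "$total_gib" 0.9)     # value after this commit
echo "old budget: ${old} GiB, new budget: ${new} GiB"
```

Under the assumed 64 GiB, the budget drops from 59.20 GiB to 57.60 GiB per device. Substitute the real device memory to see the actual headroom gained.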