ModelZoo / Kimi-K2.6
Commit 5f8fc7ef
Authored Jun 10, 2026 by weishb
Update template and add SGLang deployment instructions
Parent: 3e468042
Showing 2 changed files with 74 additions and 12 deletions
README.md +73 -11
model.properties +1 -1
README.md @ 5f8fc7ef
...
...
@@ -19,13 +19,13 @@ Kimi K2.6 is an open-source, native multimodal agent model; in long-horizon coding
 | Software | Version |
 | :------: |:-----------------------------------------:|
 | DTK | 26.04 |
-| python | 3.10.12 |
+| Python | 3.10.12 |
 | Transformers | 4.57.6 |
 | vLLM | 0.15.1+das.opt1.alpha.dtk2604.20260220.g2799735a |
 | triton | 3.3.0+das.opt2.dtk2604.torch291.20260210.g1329924c |
 | torch | 2.9.0+das.opt1.dtk2604.20260206.g275d08c2 |
 | SGLang | 0.5.10rc0+das.opt2.alpha.dtk2604 |
-Currently only the following image is supported: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.15.1-ubuntu22.04-dtk26.04-0130-py3.10-20260220
+- **For vLLM inference, use:** harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.15.1-ubuntu22.04-dtk26.04-0130-py3.10-20260220
+- **For SGLang inference, use:** harbor.sourcefind.cn:5443/dcu/admin/base/custom:sglang0.5.10rc0-ubuntu22.04-dtk26.04-py3.10-20260428
 - Adjust the `-v` mount path to match where your model is actually stored
 ```bash
...
...
@@ -52,6 +52,14 @@ docker run -it \
 pip install pycountry
 ```
+## Pretrained Weights
+**Choose the model to download according to `Supported DCU Model`. FP8 models are supported only on BW1100/BW1101; do not use them on other models!**
+| Model | Weight Size | Data Type | Supported DCU Model | Minimum Cards Required | Download |
+|:-----:|:----------:|:----------:|:----------:|:---------------------:|:----------:|
+| Kimi-K2.6 | 1.1T | INT4 | BW1000 | 16 | [Hugging Face](https://huggingface.co/moonshotai/Kimi-K2.6) |
+| Kimi-K2.6 | 1.1T | INT4 | BW1100 | 8 | [Hugging Face](https://huggingface.co/moonshotai/Kimi-K2.6) |
 ## Dataset
 None
...
...
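The pretrained-weights table in the hunk above ties each DCU model to a minimum card count. As a minimal sketch, a launch script could compare the device list it is about to export in `HIP_VISIBLE_DEVICES` against that minimum before starting a server. The helper names `min_cards`, `count_visible_devices`, and `has_enough_cards` are hypothetical and not part of the repository; the card counts are copied from the table.

```shell
# Hypothetical pre-flight helpers; the card counts mirror the README table.

# Print the minimum number of cards for a DCU model; fail for unknown models.
min_cards() {
    case "$1" in
        BW1000) echo 16 ;;
        BW1100) echo 8 ;;
        *) echo "unknown DCU model: $1" >&2; return 1 ;;
    esac
}

# Count the device ids in a comma-separated list such as "0,1,2,3".
count_visible_devices() {
    local list="$1"
    if [ -z "$list" ]; then
        echo 0
        return 0
    fi
    local IFS=','
    # Split the list on commas into positional parameters, then count them.
    set -- $list
    echo "$#"
}

# Succeed only if the device list satisfies the minimum for the given model.
has_enough_cards() {
    local need have
    need="$(min_cards "$1")" || return 1
    have="$(count_visible_devices "$2")"
    [ "$have" -ge "$need" ]
}
```

For instance, `has_enough_cards BW1100 0,1,2,3,4,5,6,7` succeeds, while the same eight-device list fails for BW1000, which the table lists at 16 cards.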
@@ -162,19 +170,73 @@ curl http://localhost:8000/v1/chat/completions \
 }'
 ```
+### SGLang
+#### Single-node inference
+```bash
+# Start the server
+export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export SGLANG_USE_LIGHTOP=1
+export SGLANG_USE_OPT_CAT=1
+export USE_DCU_CUSTOM_ALLREDUCE=1
+export SGL_CHUNKED_PREFIX_CACHE_THRESHOLD=0
+export SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT=1200
+export GLIBC_TUNABLES=glibc.rtld.optional_static_tls=0x40000
+export HIP_GRAPH_ACCUMULATE_DISPATCH=0
+export SGLANG_TORCH_PROFILER_DIR=/workspace/profiling
+export SGLANG_KVALLOC_KERNEL=1
+export SGLANG_CREATE_EXTEND_AFTER_DECODE_SPEC_INFO=1
+export SGLANG_ASSIGN_EXTEND_CACHE_LOCS=1
+export SGLANG_ASSIGN_REQ_TO_TOKEN_POOL=1
+export SGLANG_GET_LAST_LOC=1
+export SGLANG_CREATE_FLASHMLA_KV_INDICES_TRITON=1
+export SGLANG_CREATE_CHUNKED_PREFIX_CACHE_KV_INDICES=1
+export NCCL_MAX_NCHANNELS=16
+export NCCL_MIN_NCHANNELS=16
+export ALLREDUCE_STREAM_WITH_COMPUTE=1
+python3 -m sglang.launch_server \
+    --model-path /path/to/model/Kimi-K2.6 \
+    --kv-cache-dtype fp8_e4m3 \
+    --host $(hostname -I | awk '{print $1}') \
+    --port 30000 \
+    --trust-remote-code \
+    --page-size 64 \
+    --dist-init-addr $(hostname -I | awk '{print $1}'):5001 \
+    --reasoning-parser kimi_k2 \
+    --tool-call-parser kimi_k2 \
+    --nnodes 1 \
+    --node-rank 0 \
+    --dtype bfloat16 \
+    --tp-size 8 \
+    --pp-size 1 \
+    --mem-fraction-static 0.98 \
+    --attention-backend dcu_mla \
+    --enable-torch-compile \
+    --numa-node 0 0 0 0 1 1 1 1 \
+    --chunked-prefill-size -1 \
+    --max-running-requests 512 \
+    --context-length 65536
+# Client request (replace 10.16.1.15 with the server's IP address)
+curl http://10.16.1.15:30000/v1/chat/completions \
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "/path/to/model/Kimi-K2.6",
+        "messages": [
+            {"role": "user", "content": "Hello, please introduce yourself in one sentence."}
+        ],
+        "max_tokens": 128,
+        "temperature": 0.6
+    }'
+```
 ## Results
 <div align=center>
 <img src="./doc/result-dcu.png"/>
 </div>
 ### Accuracy
-DCU accuracy is consistent with GPU accuracy. Inference framework: vllm.
-## Pretrained Weights
-| Model | Weight Size | DCU Model | Minimum Cards Required | Download |
-|:------:|:----:|:----------:|:------:|:---------------------:|
-| Kimi-K2.6 | 1.1T | BW1000 | 16 | [Hugging Face](https://huggingface.co/moonshotai/Kimi-K2.6) |
-| Kimi-K2.6 | 1.1T | BW1100 | 8 | [Hugging Face](https://huggingface.co/moonshotai/Kimi-K2.6) |
+DCU accuracy is consistent with GPU accuracy. Inference frameworks: vllm, sglang
 ## Source Repository and Issue Feedback
 - https://developer.sourcefind.cn/codes/modelzoo/kimi-k2.6
...
...
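The client request in the SGLang hunk hard-codes the server address, while the server itself binds to the first address reported by `hostname -I`. As a rough sketch, the request URL and JSON body could be built by small helpers so the address is passed in rather than typed by hand. The function names `chat_url`, `json_escape`, and `chat_payload` are hypothetical, the escaping handles only backslashes and double quotes in single-line strings, and the `model` value is assumed to mirror `--model-path` as in the README's example.

```shell
# Hypothetical client helpers; host, port, and model path are placeholders.

# Build the chat-completions URL for a given host and port.
chat_url() {
    printf 'http://%s:%s/v1/chat/completions' "$1" "$2"
}

# Escape backslashes and double quotes so a single-line string fits in JSON.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Emit a request body with the same fields as the README's curl example.
# Arguments: model, prompt, max_tokens (default 128), temperature (default 0.6).
chat_payload() {
    local model prompt
    model="$(json_escape "$1")"
    prompt="$(json_escape "$2")"
    printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": %s, "temperature": %s}' \
        "$model" "$prompt" "${3:-128}" "${4:-0.6}"
}
```

On the server host, something like `curl "$(chat_url "$(hostname -I | awk '{print $1}')" 30000)" -H "Content-Type: application/json" -d "$(chat_payload /path/to/model "Hello")"` would then send a request of the same shape as the README's example.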
model.properties @ 5f8fc7ef
...
...
@@ -9,6 +9,6 @@ processType=推理
 # Algorithm category
 appCategory=对话问答
 # Framework type
-frameType=vllm
+frameType=vllm,sglang
 # Accelerator card type
 accelerateType=BW1000,BW1100