ModelZoo / DeepSeek-V3.2-Exp_vllm

Commit a025e014, authored Mar 31, 2026 by chenych

Update VLLM

Parent: d10c1d90

Showing 4 changed files with 58 additions and 61 deletions (+58 -61)
- README.md (+35 -50)
- inference/config.json (+2 -2)
- inference/fp8_cast_bf16.py (+16 -6)
- model.properties (+5 -3)
README.md
@@ -2,54 +2,47 @@
## Paper
[DeepSeek_V3.2](./DeepSeek_V3_2.pdf)
```diff
-## Model Architecture
-DeepSeek-V3.2-Exp is an experimental release, an intermediate step toward the next-generation architecture. Building on V3.1-Terminus, V3.2-Exp introduces DeepSeek Sparse Attention, a sparse attention mechanism designed to explore and validate training and inference efficiency optimizations in long-context scenarios.
+## Model Overview
+DeepSeek-V3.2-Exp is an experimental release, an intermediate step toward the next-generation architecture. Building on V3.1-Terminus, V3.2-Exp introduces DeepSeek Sparse Attention (DSA), the first fine-grained sparse attention mechanism, which substantially improves long-context training and inference efficiency while keeping model output quality virtually unchanged.
+This experimental release reflects the DeepSeek team's continued research into more efficient transformer architectures, with a particular focus on computational efficiency when processing extended text sequences.
```
<div align="center"><img src="./doc/arch.png" /></div>
## Algorithm
DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, substantially improving long-context training and inference efficiency while keeping model output quality virtually unchanged.
## Environment Setup
### Hardware Requirements
DCU model: K100AI; nodes: 4; cards: 32.
## 环境依赖
| 软件 | 版本 |
| :------: | :------: |
| DTK | 26.04 |
| python | 3.10.12 |
| transformers | 4.57.6 |
| vllm | 0.11.0+das.opt1.rc2.dtk2604.20260128.g0bf89b0c |
| torch | 2.5.1+das.opt1.dtk2604.20260116.g78471bfd |
Adjust the `-v` mount path, `docker_name`, and `imageID` to your environment.

Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-py3.10
### Docker (Method 1)

- Adjust the `-v` mount path to your actual model location.

```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.1-rc5-rocblas104381-0915-das1.6-py3.10-20250916-rc2-ds3.2
docker run -it --shm-size 200g --network=host --name {docker_name} --privileged \
    --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video \
    --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root \
    -v /path/your_code_data/:/path/your_code_data/ \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    {imageID} bash
cd /your_code_path/deepseek-v3.2-exp_vllm
```
### Dockerfile (Method 2)

```bash
cd docker
docker build --no-cache -t deepseek-v3.2-exp:latest .
docker run -it --shm-size 200g --network=host --name {docker_name} --privileged \
    --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video \
    --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root \
    -v /path/your_code_data/:/path/your_code_data/ \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    {imageID} bash
cd /your_code_path/deepseek-v3.2-exp_vllm
docker run -it \
    --shm-size 200g \
    --network=host \
    --name deepseek-v32 \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mkfd \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-py3.10 bash
```
More images are available for download from [光源](https://sourcefind.cn/#/service-list).
### Anaconda (Method 3)

The specialized deep-learning libraries this project requires for DCU cards can be downloaded from the [光合](https://developer.sourcefind.cn/tool/) developer community.

```bash
DTK: 25.04.1
python: 3.10.12
torch: 2.5.1+das.opt1.dtk25041
transformers: 4.56.1
```

`Tips: the DTK driver, pytorch, and other DCU-related tool versions above must match each other exactly`. Install the remaining packages as follows:

```bash
wget http://112.11.119.99:18000/temp/vllm-0.9.2%2Bdas.opt1.rc2.51af08a.dtk25041-cp310-cp310-linux_x86_64.whl
pip install vllm-0.9.2+das.opt1.rc2.51af08a.dtk25041-cp310-cp310-linux_x86_64.whl
```
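After installing the wheel, it can help to confirm what pip actually registered. A small standard-library sketch (the helper name is illustrative, not part of this repo):

```python
from importlib import metadata

def installed_version(pkg: str):
    """Return the installed version string for pkg, or None if it is not installed."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# After `pip install vllm-0.9.2+...whl`, this should report the das.opt1 build.
print(installed_version("vllm"))
```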
## Dataset
None
...
...
@@ -120,9 +113,6 @@ export VLLM_RANK5_NUMA=5
export VLLM_RANK6_NUMA=6
export VLLM_RANK7_NUMA=7
# Additional environment variables required on BW clusters
export NCCL_NET_GDR_LEVEL=7
export NCCL_SDMA_COPY_ENABLE=0
```
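The per-rank exports above follow a simple rank-to-NUMA-node pattern. A hypothetical Python helper that generates the same mapping (the helper name and the one-node-per-rank assumption are illustrative, not part of this repo):

```python
import os

def numa_env(num_ranks: int = 8) -> dict:
    """Map each vLLM rank i to NUMA node i, mirroring the exports above."""
    return {f"VLLM_RANK{i}_NUMA": str(i) for i in range(num_ranks)}

env = numa_env()
os.environ.update(env)  # equivalent to the eight `export VLLM_RANK*_NUMA=*` lines
```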
2. Start the RAY cluster
...
...
@@ -155,7 +145,7 @@ vllm serve /path/to/DeepSeek-V3.2-Exp-bf16 \
curl http://127.0.0.1:8001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
-        "model": "/path/to/DeepSeek-V3.2-Exp-bf16",
+        "model": "ds32",
         "messages": [
           {
             "role": "user",
...
...
@@ -170,7 +160,7 @@ curl http://127.0.0.1:8001/v1/chat/completions \
}'
```
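The curl call above targets vLLM's OpenAI-compatible `/v1/chat/completions` endpoint. A minimal Python sketch that builds the same request (the `ds32` model name and port 8001 come from the example above; the helper name is illustrative, and actually sending the request requires the server to be running):

```python
import json

def build_chat_request(base_url: str, model: str, prompt: str):
    """Build the URL, headers, and JSON body for a chat-completions call."""
    url = f"{base_url}/v1/chat/completions"
    headers = {"Content-Type": "application/json"}
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_chat_request("http://127.0.0.1:8001", "ds32", "Hello")
# e.g. requests.post(url, headers=headers, data=payload) once the server is up
```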
```diff
-## result
+## Results
```
<div align="center"><img src="./doc/results_dcu.png" /></div>
...
...
@@ -178,16 +168,11 @@ curl http://127.0.0.1:8001/v1/chat/completions \
### Accuracy
DCU accuracy is consistent with GPU; inference framework: vllm.
## Application Scenarios
### Algorithm Category
`Conversational QA`
### Key Application Industries
`Manufacturing, finance, education, broadcast media`
## Pretrained Weights
- [DeepSeek-V3.2-Exp](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp)
- [DeepSeek-V3.2-Exp-Base](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base)

| Model | Weight Size | DCU Model | Min. Cards Required | Download |
|:-----:|:----------:|:----------:|:---------------------:|:----------:|
| DeepSeek-V3.2-Exp | 685B | BW1000 | 32 | [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp) |
| DeepSeek-V3.2-Exp-Base | 685B | BW1000 | 32 | [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base) |
## Source Repository & Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/deepseek-v3.2-exp_vllm
...
...
inference/config.json
```diff
 {
   "architectures": [
-    "DeepseekV3ForCausalLM"
+    "DeepseekV32ForCausalLM"
   ],
   "attention_bias": false,
   "attention_dropout": 0.0,
```
...
```diff
@@ -17,7 +17,7 @@
   "intermediate_size": 18432,
   "kv_lora_rank": 512,
   "max_position_embeddings": 163840,
-  "model_type": "deepseek_v3",
+  "model_type": "deepseek_v32",
   "moe_intermediate_size": 2048,
   "moe_layer_freq": 1,
   "n_group": 8,
```
...
...
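The config.json edits rename both the architecture class and the model type for V3.2. A quick standard-library check that the two renamed fields agree (the inline JSON stands in for whatever path your checkpoint uses):

```python
import json

config_text = """
{
  "architectures": ["DeepseekV32ForCausalLM"],
  "model_type": "deepseek_v32"
}
"""
config = json.loads(config_text)
# Both renamed fields from the diff should carry the v3.2 identifiers.
assert config["architectures"] == ["DeepseekV32ForCausalLM"]
assert config["model_type"] == "deepseek_v32"
```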
inference/fp8_cast_bf16.py
...
...
@@ -13,8 +13,15 @@ def main(fp8_path, bf16_path):
```python
torch.set_default_dtype(torch.bfloat16)
os.makedirs(bf16_path, exist_ok=True)
model_index_file = os.path.join(fp8_path, "model.safetensors.index.json")
config_file = os.path.join(fp8_path, "config.json")
with open(model_index_file, "r") as f:
    model_index = json.load(f)
with open(config_file, "r") as f:
    config = json.load(f)
# Drop the FP8 quantization block so the saved config describes a plain BF16 checkpoint
if "quantization_config" in config:
    config.pop("quantization_config")
weight_map = model_index["weight_map"]
# Cache for loaded safetensor files
```
...
...
@@ -64,6 +71,7 @@ def main(fp8_path, bf16_path):
```python
# Update model index
new_model_index_file = os.path.join(bf16_path, "model.safetensors.index.json")
new_config_file = os.path.join(bf16_path, "config.json")
for weight_name in fp8_weight_names:
    scale_inv_name = f"{weight_name}_scale_inv"
    if scale_inv_name in weight_map:
```
...
...
@@ -71,6 +79,8 @@ def main(fp8_path, bf16_path):
```python
with open(new_model_index_file, "w") as f:
    json.dump({"metadata": {}, "weight_map": weight_map}, f, indent=2)
with open(new_config_file, "w") as f:
    json.dump(config, f, indent=2)

if __name__ == "__main__":
    parser = ArgumentParser()
```
...
...
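The fp8_cast_bf16.py changes amount to two pieces of bookkeeping: dropping the FP8 `quantization_config` block before re-saving config.json, and pruning `_scale_inv` entries from the weight map once weights are cast to BF16. A standalone sketch of that logic on toy dicts (function names are illustrative, not from the script):

```python
def strip_quantization(config: dict) -> dict:
    """Remove the FP8 quantization block so the config describes a BF16 checkpoint."""
    config = dict(config)  # work on a copy
    config.pop("quantization_config", None)
    return config

def prune_scale_inv(weight_map: dict, fp8_weight_names) -> dict:
    """Drop the per-weight inverse-scale entries that BF16 weights no longer need."""
    pruned = dict(weight_map)
    for name in fp8_weight_names:
        pruned.pop(f"{name}_scale_inv", None)
    return pruned

config = {"model_type": "deepseek_v32", "quantization_config": {"fmt": "fp8"}}
wmap = {"w1": "a.safetensors", "w1_scale_inv": "a.safetensors"}
config_bf16 = strip_quantization(config)
wmap_bf16 = prune_scale_inv(wmap, ["w1"])
```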
model.properties
...
...
@@ -4,9 +4,11 @@ modelCode=1761
```diff
 modelName=deepseek-v3.2-exp_vllm
 # Model description
 modelDescription=DeepSeek-V3.2-Exp is an experimental release, an intermediate step toward the next-generation architecture.
 # Application scenarios
 appScenario=Inference, conversational QA, manufacturing, finance, education, broadcast media
 # Process type
 processType=Inference
 # Algorithm category
 appCategory=Conversational QA
 # Framework type
 frameType=vllm
 # Accelerator type
-accelerateType=K100AI
\ No newline at end of file
+accelerateType=BW1000
```
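model.properties is a plain `key=value` file with `#` comments, so the changed `accelerateType` field is easy to read back programmatically. A minimal parser sketch (the function name is illustrative):

```python
def parse_properties(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

sample = "modelName=deepseek-v3.2-exp_vllm\n# Accelerator type\naccelerateType=BW1000"
props = parse_properties(sample)
```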