ModelZoo / DeepSeek-V3.2-Exp_vllm
Commit 1fac49dc, authored Sep 30, 2025 by chenych
modify bugs
Parent: 468b77ef
Showing 4 changed files with 9 additions and 8 deletions
README.md +8 -7
doc/arch.png +0 -0
inference/config_671B_v3.2.json +1 -1
start_vllm.sh +0 -0
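README.md
@@ -3,11 +3,15 @@
 [DeepSeek_V3.2](./DeepSeek_V3_2.pdf)
 ## Model Architecture
+DeepSeek-V3.2-Exp is an experimental release. As an intermediate step toward the next-generation architecture, V3.2-Exp builds on V3.1-Terminus by introducing DeepSeek Sparse Attention, a sparse attention mechanism designed to explore and validate efficiency optimizations for training and inference in long-context scenarios.
+This experimental release reflects the DeepSeek team's ongoing research into more efficient Transformer architectures, with a particular focus on improving computational efficiency when processing extended text sequences.
+<div align=center>
+    <img src="./doc/arch.png"/>
+</div>
 ## Algorithm Principles
+DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, significantly improving long-context training and inference efficiency while keeping model output quality virtually unchanged.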
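As a rough illustration of what fine-grained sparse attention means, the sketch below lets one query attend to only the `k` keys ranked highest by a per-key score, instead of to every key. It is not DeepSeek's implementation: the function name, the `index_scores` input, and the top-k selection rule are assumptions made for illustration, and the README text does not describe how DSA selects tokens.

```python
import math


def topk_sparse_attention(query, keys, values, index_scores, k):
    """Attend from one query to only the k keys with the highest index scores.

    index_scores is a hypothetical per-key relevance score (one float per key).
    Returns the selected key positions and the attention output vector.
    """
    k = min(k, len(keys))
    # Pick the k highest-scoring key positions, then restore sequence order.
    ranked = sorted(range(len(keys)), key=lambda i: index_scores[i], reverse=True)
    selected = sorted(ranked[:k])

    # Scaled dot-product logits, computed over the selected keys only.
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * x for q, x in zip(query, keys[i])) for i in selected]

    # Numerically stable softmax over the selected logits.
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Weighted sum of the selected value vectors.
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, selected)) for d in range(dim)]
    return selected, output
```

With `k` equal to the number of keys, the function reduces to ordinary dense attention for that query. The per-query cost of the attention step scales with `k` rather than with the sequence length, which is the kind of saving a sparse scheme aims for on long contexts.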
 ## Environment Setup
 ### Hardware Requirements
...
@@ -44,11 +48,7 @@ torch: 2.5.1+das.opt1.dtk25041
 vllm: 0.9.2+das.opt1.rc2.dtk25041
 transformers: 4.55.0
 ```
-`Tips: the versions of the DCU-related tools listed above (dtk driver, pytorch, etc.) must match each other exactly`, other libraries are installed as follows:
+`Tips: the versions of the DCU-related tools listed above (dtk driver, pytorch, etc.) must match each other exactly`
-```bash
-```
 ## Dataset
 None
...
@@ -63,6 +63,7 @@ cd inference
 # Convert fp8 to bf16
 python fp8_cast_bf16.py --input-fp8-hf-path /path/to/DeepSeek-V3.2-Exp --output-bf16-hf-path /path/to/DeepSeek-V3.2-Exp-bf16
 ```
 2. Partition the model
 ```bash
 python convert.py --hf-ckpt-path /path/to/DeepSeek-V3.2-Exp-bf16 --save-path /path/to/DeepSeek-V3.2-Demo --n-experts 256 --model-parallel 32
...
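To make the fp8-to-bf16 step in the README hunk above more concrete, here is a minimal sketch of block-wise dequantization, where each stored value is multiplied by the scale of the block it falls in. This is an assumption about the general technique, not a copy of `fp8_cast_bf16.py`, whose contents are not part of this diff. The function name, the `block_size` parameter, and the use of plain Python floats in place of real fp8 and bf16 tensors are all illustrative.

```python
def dequantize_blockwise(weight, scales, block_size):
    """Multiply each element by the scale of the block it belongs to.

    weight:     2-D list of quantized values (floats stand in for fp8 here).
    scales:     2-D list with one scale per (block_size x block_size) tile.
    block_size: hypothetical tile edge length; the real script may differ.
    """
    restored = []
    for row_index, row in enumerate(weight):
        # All rows inside the same tile row share one row of scales.
        scale_row = scales[row_index // block_size]
        restored.append(
            [value * scale_row[col_index // block_size] for col_index, value in enumerate(row)]
        )
    return restored
```

A real conversion would also cast the result to bf16 and write it back in the checkpoint format, which this sketch leaves out.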
doc/arch.png
Image replaced: 120 KB (468b77ef) → 89.2 KB (1fac49dc)
inference/config_671B_v3.2.json
@@ -18,7 +18,7 @@
     "qk_nope_head_dim": 128,
     "qk_rope_head_dim": 64,
     "v_head_dim": 128,
-    "dtype": "fp8",
+    "dtype": "bf16",
     "scale_fmt": "ue8m0",
     "index_n_heads": 64,
     "index_head_dim": 128,
...
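The `dtype` change in this hunk lines up with the fp8-to-bf16 conversion described in the README. A small check like the one below could catch a mismatch between the config and the converted weights before launching inference. It is only a sketch: the `check_dtype` helper is hypothetical, the JSON fragment repeats just the fields visible in the hunk, and the idea that the config `dtype` must match the checkpoint format is an assumption, since the commit message says only "modify bugs".

```python
import json

# Fields copied from the visible part of the hunk; the full file has more keys.
CONFIG_FRAGMENT = """
{
    "qk_nope_head_dim": 128,
    "qk_rope_head_dim": 64,
    "v_head_dim": 128,
    "dtype": "bf16",
    "scale_fmt": "ue8m0",
    "index_n_heads": 64,
    "index_head_dim": 128
}
"""


def check_dtype(config, expected):
    """Raise ValueError if the config's dtype differs from the expected one."""
    actual = config.get("dtype")
    if actual != expected:
        raise ValueError(f"config dtype is {actual!r}, expected {expected!r}")
    return actual


config = json.loads(CONFIG_FRAGMENT)
check_dtype(config, "bf16")  # passes for the updated config
```

Running the same check against the previous version of the file, where `dtype` was `"fp8"`, would raise a `ValueError`.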
start_vllm.sh
New file, mode 0 → 100644