ModelZoo / Step-3.5-Flash_vllm

Commit 64678777
Authored Mar 16, 2026 by luopl
Parent: a11e3ef2

add Step-3.5-Flash-FP8

Showing 3 changed files with 41 additions and 6 deletions:

- README.md (+40 -5)
- lmslim-0.3.1+das.opt4.dtk2604-cp310-cp310-linux_x86_64.whl (+0 -0)
- model.properties (+1 -1)
README.md (view file @ 64678777)

````diff
@@ -56,9 +56,11 @@ docker run -it \
 ```
 More images are available for download from [光源](https://sourcefind.cn/#/service-list).
-The special deep learning libraries that this project needs for DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community; the pycountry library must be installed separately:
+The special deep learning libraries that this project needs for DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community; pycountry must be installed separately, and the lmslim library must be uninstalled and reinstalled:
 ```
 pip install pycountry
+pip uninstall lmslim
+pip install lmslim-0.3.1+das.opt4.dtk2604-cp310-cp310-linux_x86_64.whl --no-deps
 ```
 ## Dataset
````
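The wheel added in this commit encodes its compatibility in its filename. As a minimal sketch (not part of the commit), the name can be split into the standard wheel fields to check what it targets before running the reinstall step. The variable names are illustrative, and the split assumes the filename has no optional build tag, which holds for this file.

```shell
#!/bin/sh
# Split a wheel filename into its standard fields:
#   {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl
# Assumption: no optional build tag is present (true for this file).
wheel="lmslim-0.3.1+das.opt4.dtk2604-cp310-cp310-linux_x86_64.whl"

rest="${wheel%.whl}"                  # drop the extension
platform_tag="${rest##*-}"; rest="${rest%-*}"
abi_tag="${rest##*-}";      rest="${rest%-*}"
python_tag="${rest##*-}";   rest="${rest%-*}"
version="${rest##*-}"
name="${rest%-*}"

echo "name=$name version=$version python=$python_tag abi=$abi_tag platform=$platform_tag"
```

A `cp310` python tag and ABI tag mean the wheel installs only under CPython 3.10, so `python3 --version` inside the container should report 3.10.x before the `pip install ... --no-deps` step is attempted.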
````diff
@@ -71,12 +73,12 @@ pip install pycountry
 ### vllm
 #### Single-node inference
+**1. Step-3.5-Flash model inference:**
 ```bash
 ## Start the server
 vllm serve stepfun-ai/Step-3.5-Flash \
   --port 8001 \
-  --tensor-parallel-size 8 \
+  --tensor-parallel-size 4 \
   --enable-expert-parallel \
   --disable-cascade-attn \
   --reasoning-parser step3p5 \
````
````diff
@@ -100,6 +102,38 @@ curl http://localhost:8001/v1/chat/completions \
 }'
 ```
+**2. Step-3.5-Flash-FP8 model inference:**
+```bash
+## Start the server
+vllm serve stepfun-ai/Step-3.5-Flash-FP8 \
+  --port 8001 \
+  --tensor-parallel-size 2 \
+  --enable-expert-parallel \
+  --disable-cascade-attn \
+  --reasoning-parser step3p5 \
+  --enable-auto-tool-choice \
+  --tool-call-parser step3p5 \
+  --hf-overrides '{"num_nextn_predict_layers": 1}' \
+  --speculative_config '{"method": "step3p5_mtp", "num_speculative_tokens": 1}' \
+  --trust-remote-code \
+  --quantization fp8 \
+  --compilation-config '{"pass_config": {"fuse_act_quant": false}}'
+## Client request
+curl http://localhost:8001/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "stepfun-ai/Step-3.5-Flash-FP8",
+    "messages": [
+      {
+        "role": "user",
+        "content": "What are Newton'\''s three laws of motion? Please explain briefly."
+      }
+    ]
+  }'
+```
 ## Results
 <div align=center>
     <img src="./doc/result-dcu.png"/>
````
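The README runs the client `curl` right after `vllm serve`, but loading the weights can take a while, so a request sent too early will simply fail to connect. A minimal sketch of a readiness loop follows. It is not part of the commit: the function name, retry count, and delay are hypothetical, and it assumes the server answers on a `/health` path, as vLLM's OpenAI-compatible server is understood to do.

```shell
# Poll a URL until it answers successfully; return 0 on success,
# 1 if the server never came up within the allowed attempts.
# Assumptions: the /health path, attempt count, and delay are
# illustrative and are not taken from the README.
wait_for_server() {
    url="$1"
    attempts="${2:-60}"   # how many times to try
    delay="${3:-5}"       # seconds between tries
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: non-zero exit on HTTP errors, -o: discard body
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

With this in place, the client call could be gated as `wait_for_server http://localhost:8001/health && curl http://localhost:8001/v1/chat/completions ...`, using the port from the README.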
````diff
@@ -110,8 +144,9 @@ DCU and GPU accuracy match; inference framework: vllm.
 ## Pretrained weights
 | Model | Weight size | DCU model | Minimum cards required | Download |
-|:------:|:----:|:----------:|:------:|:---------------------:|
+|:------:|:----:|:------:|:------:|:---------------------:|
-| Step-3.5-Flash | 199B | BW1000 | 8 | [Hugging Face](https://huggingface.co/stepfun-ai/Step-3.5-Flash) |
+| Step-3.5-Flash | 199B | BW1100 | 4 | [Hugging Face](https://huggingface.co/stepfun-ai/Step-3.5-Flash) |
+| Step-3.5-Flash-FP8 | 199B | BW1100 | 2 | [Hugging Face](https://huggingface.co/stepfun-ai/Step-3.5-Flash-FP8) |
 ## Source repository and issue feedback
 - https://developer.sourcefind.cn/codes/modelzoo/step-3.5-flash_vllm
````
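The updated table and the two `vllm serve` commands use the same numbers: 4 cards for Step-3.5-Flash and 2 for Step-3.5-Flash-FP8 on BW1100. As a sketch only, a launcher script could keep that mapping in one place. The function name is hypothetical and the values are copied from the table rows above.

```shell
# Map a model name to the minimum BW1100 card count from the README table.
# The function name is hypothetical; the values come from the table rows.
min_cards_for() {
    case "$1" in
        stepfun-ai/Step-3.5-Flash)     echo 4 ;;
        stepfun-ai/Step-3.5-Flash-FP8) echo 2 ;;
        *) echo "unknown model: $1" >&2; return 1 ;;
    esac
}
```

A launcher could then pass `--tensor-parallel-size "$(min_cards_for "$MODEL")"` to `vllm serve "$MODEL"`, so the card count stays consistent with the table when switching variants.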
lmslim-0.3.1+das.opt4.dtk2604-cp310-cp310-linux_x86_64.whl (view file @ 64678777)

File added (0 → 100644).
model.properties (view file @ 64678777)

```diff
@@ -11,4 +11,4 @@ appCategory=对话问答
 # Framework type
 frameType=vllm
 # Accelerator type
-accelerateType=BW1000
+accelerateType=BW1100
```
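Since `model.properties` now records `BW1100` as the accelerator, a script that depends on the card type could read the value rather than hard-coding it. A minimal sketch, assuming plain `key=value` lines with no spaces around `=` (as in the hunk header above); the helper name is hypothetical and the key is used as a raw pattern, so it should not contain regex metacharacters.

```shell
# Print the value of a key from a key=value properties file.
# Assumptions: no spaces around '=', and the key contains no
# regex metacharacters. The helper name is hypothetical.
get_prop() {
    key="$1"
    file="$2"
    sed -n "s/^${key}=//p" "$file"
}
```

For example, `get_prop accelerateType model.properties` would print the accelerator type recorded in the file.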