add Step-3.5-Flash-FP8

Commit 64678777 authored Mar 16, 2026 by luopl
Showing with 41 additions and 6 deletions

README.md +40 -5

lmslim-0.3.1+das.opt4.dtk2604-cp310-cp310-linux_x86_64.whl +0 -0

model.properties +1 -1

--- a/README.md
+++ b/README.md
@@ -56,9 +56,11 @@ docker run -it \
 ```
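 More images are available for download from [光源](https://sourcefind.cn/#/service-list).
-The special deep learning libraries that this project requires for DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community; the pycountry library must be installed separately:
+The special deep learning libraries that this project requires for DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community; pycountry must be installed separately, and the lmslim library must be uninstalled and reinstalled: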
 ```
 pip install pycountry
+pip uninstall lmslim
+pip install lmslim-0.3.1+das.opt4.dtk2604-cp310-cp310-linux_x86_64.whl --no-deps
 ```
 ## Dataset
@@ -71,12 +73,12 @@ pip install pycountry
 ### vllm
 #### Single-node inference
+**1. Step-3.5-Flash model inference:**
 ```bash
 ## Start the server
 vllm serve stepfun-ai/Step-3.5-Flash \
    --port 8001 \
-    --tensor-parallel-size 8 \
+    --tensor-parallel-size 4 \
    --enable-expert-parallel \
    --disable-cascade-attn \
    --reasoning-parser step3p5 \
@@ -100,6 +102,38 @@ curl http://localhost:8001/v1/chat/completions   \
    }'
 ```
+**2. Step-3.5-Flash-FP8 model inference:**
+```bash
+## Start the server
+vllm serve stepfun-ai/Step-3.5-Flash-FP8 \
+  --port 8001 \
+  --tensor-parallel-size 2 \
+  --enable-expert-parallel \
+  --disable-cascade-attn \
+  --reasoning-parser step3p5 \
+  --enable-auto-tool-choice \
+  --tool-call-parser step3p5 \
+  --hf-overrides '{"num_nextn_predict_layers": 1}' \
+  --speculative_config '{"method": "step3p5_mtp", "num_speculative_tokens": 1}' \
+  --trust-remote-code \
+  --quantization fp8 \
+  --compilation-config '{"pass_config": {"fuse_act_quant": false}}'
+## Client request
+curl http://localhost:8001/v1/chat/completions   \
+    -H "Content-Type: application/json"  \
+    -d '{
+        "model": "stepfun-ai/Step-3.5-Flash-FP8",
+        "messages": [
+            {
+                "role": "user",
+                "content": "What three laws of motion did Newton propose? Please explain them briefly."
+            }
+        ]
+    }'
+```
 ## Results
 <div align=center>
    <img src="./doc/result-dcu.png"/>
@@ -110,8 +144,9 @@ DCU and GPU accuracy are consistent; inference framework: vllm.
 ## Pretrained weights
 |  Model name  | Weight size | DCU model  | Minimum number of cards |         Download link          |
-|:------:|:----:|:----------:|:------:|:---------------------:|
+|:------:|:----:|:------:|:------:|:---------------------:|
-| Step-3.5-Flash | 199B | BW1000 |   8    | [Hugging Face](https://huggingface.co/stepfun-ai/Step-3.5-Flash) |
+| Step-3.5-Flash | 199B | BW1100 |   4    | [Hugging Face](https://huggingface.co/stepfun-ai/Step-3.5-Flash) |
+| Step-3.5-Flash-FP8 | 199B | BW1100 |   2    | [Hugging Face](https://huggingface.co/stepfun-ai/Step-3.5-Flash-FP8) |
 ## Source repository and issue feedback
 - https://developer.sourcefind.cn/codes/modelzoo/step-3.5-flash_vllm
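
Reviewer sketch (not part of the commit): the README changes above pair each model with a tensor-parallel size that matches the minimum card count in the weights table (4 for Step-3.5-Flash, 2 for Step-3.5-Flash-FP8). A minimal shell sketch of that mapping follows. The helper names `tp_size_for` and `print_serve_cmd` are hypothetical, and the printed command is a dry run that deliberately omits the parser, speculative-decoding, and quantization flags shown in the README.

```shell
#!/bin/sh
# Hypothetical helper: maps each model in the weights table to the
# tensor-parallel size used in this commit's README examples.
tp_size_for() {
  case "$1" in
    stepfun-ai/Step-3.5-Flash-FP8) echo 2 ;;
    stepfun-ai/Step-3.5-Flash)     echo 4 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Prints an abbreviated launch command instead of running it (dry run).
# Only flags that appear in the README are used; the rest are left out.
print_serve_cmd() {
  _tp="$(tp_size_for "$1")" || return 1
  printf 'vllm serve %s --port 8001 --tensor-parallel-size %s --enable-expert-parallel\n' "$1" "$_tp"
}

print_serve_cmd stepfun-ai/Step-3.5-Flash-FP8
```

Keeping the mapping in one place makes it harder for the serve command and the table to drift apart when a new variant is added.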

--- a/lmslim-0.3.1+das.opt4.dtk2604-cp310-cp310-linux_x86_64.whl
+++ b/lmslim-0.3.1+das.opt4.dtk2604-cp310-cp310-linux_x86_64.whl
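
Reviewer sketch (not part of the commit): the wheel added here is a binary file, so the diff shows no content, but its filename follows the standard `name-version-pythontag-abitag-platformtag` wheel layout. The tags `cp310` and `linux_x86_64` indicate a build for CPython 3.10 on x86-64 Linux. A minimal sketch, assuming a POSIX shell and illustrative variable names, that splits the filename so the tags can be checked before running `pip install`:

```shell
#!/bin/sh
# Split a wheel filename into its dash-separated components.
# Assumes no optional build tag is present, which holds for this file.
whl="lmslim-0.3.1+das.opt4.dtk2604-cp310-cp310-linux_x86_64.whl"
base="${whl%.whl}"   # strip the .whl extension

IFS=- read -r name version pytag abitag plat <<EOF
$base
EOF

printf 'name=%s version=%s python=%s abi=%s platform=%s\n' \
  "$name" "$version" "$pytag" "$abitag" "$plat"
```

If the interpreter in the container is not CPython 3.10, pip would be expected to reject this wheel as unsupported on the platform, so a check like this can save a confusing install error.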
--- a/model.properties
+++ b/model.properties
@@ -11,4 +11,4 @@ appCategory=对话问答
 # Framework type
 frameType=vllm
 # Accelerator card type
-accelerateType=BW1000
+accelerateType=BW1100