# Step3

## Paper
`Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding`
- https://arxiv.org/abs/2507.19427

## Model Architecture
Step3 is an advanced multimodal reasoning model built on a mixture-of-experts architecture, with 321B total parameters and 38B activated per token.
It is designed end-to-end to minimize decoding cost while delivering top-tier performance in vision-language reasoning.
Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains excellent efficiency on both high-end and low-end accelerators.
The AFD architecture is shown in the figure below.
<div align=center>
    <img src="./figures/AFD_architecture.png"/>
</div>
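
To make the disaggregation concrete, here is a toy single-process sketch (our own illustration, not the Step-3 implementation): attention sublayers and FFN sublayers are held by two separate worker pools, and activations are handed off between the pools at every layer. In the real system the two pools are separate groups of accelerators exchanging activations over the network.
```
import torch

# Toy illustration of Attention-FFN Disaggregation (AFD): the attention pool
# and the FFN pool would live on separate groups of accelerators; here they
# are just two lists of modules in one process.
d_model, n_layers = 64, 4

attn_pool = [torch.nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
             for _ in range(n_layers)]                    # "attention instances"
ffn_pool = [torch.nn.Sequential(torch.nn.Linear(d_model, 4 * d_model),
                                torch.nn.GELU(),
                                torch.nn.Linear(4 * d_model, d_model))
            for _ in range(n_layers)]                     # "FFN instances"

x = torch.randn(2, 8, d_model)                            # (batch, seq, d_model)
for attn, ffn in zip(attn_pool, ffn_pool):
    a, _ = attn(x, x, x)      # computed on the attention pool
    x = x + a                 # residual connection
    x = x + ffn(x)            # activations handed off to the FFN pool and back
print(x.shape)                # torch.Size([2, 8, 64])
```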

## Algorithm Principles

Step-3 introduces two major optimizations:

- On the model side, Multi-matrix Factorization Attention (MFA), whose arithmetic intensity is more balanced by design, achieves a lower decoding cost than MHA, GQA, and MLA (see the sketch after this list).
- On the system side, Attention-FFN Disaggregation (AFD) separates attention and FFN computation, with parallelism strategies chosen according to the specific hardware configuration.
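
A minimal PyTorch sketch of the factorized-attention idea follows. This is a simplified illustration with made-up dimensions, not the exact Step-3 attention: per-head queries are produced through a shared low-rank factorization, while a single key/value head is shared by all query heads, shrinking the per-token KV-cache traffic during decoding.
```
import torch

# Sketch of a factorized-attention layout (hypothetical dimensions): per-head
# queries come from a shared low-rank factorization, while one key/value head
# is shared across all query heads.
batch, seq, d_model, n_heads, head_dim, rank = 2, 32, 1024, 16, 64, 128

x = torch.randn(batch, seq, d_model)
W_down = torch.randn(d_model, rank)           # shared low-rank "down" projection
W_up = torch.randn(n_heads, rank, head_dim)   # per-head "up" projection
W_k = torch.randn(d_model, head_dim)          # single shared key head
W_v = torch.randn(d_model, head_dim)          # single shared value head

q = torch.einsum("bsd,dr,hre->bhse", x, W_down, W_up)    # factorized queries
k = x @ W_k                                              # (batch, seq, head_dim)
v = x @ W_v

scores = q @ k.unsqueeze(1).transpose(-1, -2) / head_dim ** 0.5
attn = torch.softmax(scores, dim=-1)                     # (batch, heads, seq, seq)
out = attn @ v.unsqueeze(1)                              # shared V broadcast over heads
print(out.shape)                                         # torch.Size([2, 16, 32, 64])
```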


## Environment Setup

### Docker (Method 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm-ubuntu22.04-dtk25.04.1-rc5-das1.6-py3.10-20250802-step3
# Replace <your IMAGE ID> below with the ID of the image pulled above
docker run -it --shm-size=1024G -v $PWD/Step3_pytorch:/home/Step3_pytorch -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name step3 <your IMAGE ID> bash

```
### Dockerfile (Method 2)
```
cd $PWD/Step3_pytorch/docker
docker build --no-cache -t step3:latest .
docker run --shm-size=1024G --name step3 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/Step3_pytorch:/home/Step3_pytorch -it step3 bash

```
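
Once inside the container, a quick sanity check (assuming the image ships a DTK/ROCm build of PyTorch, which exposes DCU devices through the `torch.cuda` API):
```
import torch

# DTK/ROCm builds of PyTorch expose DCU devices through the torch.cuda API.
print(torch.__version__)
print(torch.cuda.is_available())   # expect True inside the container
print(torch.cuda.device_count())   # expect 8 devices per node in this setup
```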

## Dataset
`None`

## Training
`None`

## Inference

Note: running this model requires 16 × 64 GB of GPU memory.
### Multi-Node, Multi-GPU
Start the Ray cluster:
```
# Run on the head node
ray start --head --node-ip-address=*.*.*.* (head-node IP) --port=*** --num-gpus=8 --num-cpus=**
```

```
# Run on every other node
ray start --address='*.*.*.*:***' (*.*.*.*: head-node IP, ***: port) --num-gpus=8 --num-cpus=**
```
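
Before launching the server, you can verify from the head node that all 16 GPUs registered with the cluster; a small check using Ray's Python API:
```
import ray

# Attach to the already-running cluster started above.
ray.init(address="auto")
print(ray.cluster_resources())   # expect "GPU": 16.0 across both nodes
```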

vLLM deployment (upstream vLLM does not yet support AFD; only non-disaggregated deployment is supported):

```
# Run on the head node
# Tensor parallelism
VLLM_USE_NN=0 VLLM_USE_FLASH_ATTN_PA=0 vllm serve /path/to/step3 \
    --reasoning-parser step3 \
    --enable-auto-tool-choice \
    --tool-call-parser step3 \
    --trust-remote-code \
    --max-num-batched-tokens 4096 \
    --distributed-executor-backend ray  \
    --dtype float16  \
    -tp 16 \
    --port $PORT_SERVING 

```
`Attention data parallelism is not yet supported.`
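
Once the server is up, a quick readiness check (assuming the default port 8000; change it if `$PORT_SERVING` differs):
```
from openai import OpenAI

# Point the client at the vLLM server; adjust the port to $PORT_SERVING.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
print([m.id for m in client.models.list().data])   # expect ["/path/to/step3"]
```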

- Client Request Examples
```
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="/path/to/step3",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://xxxxx.png"
                    },
                },
                {"type": "text", "text": "Please describe the image."},
            ],
        },
    ],
)
print("Chat response:", chat_response)
```
- You can also upload base64-encoded local images:
```
import base64
from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
image_path = "/path/to/local/image.png"
with open(image_path, "rb") as f:
    encoded_image = base64.b64encode(f.read())
encoded_image_text = encoded_image.decode("utf-8")
base64_step = f"data:image;base64,{encoded_image_text}"
chat_response = client.chat.completions.create(
    model="step3",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": base64_step
                    },
                },
                {"type": "text", "text": "Please describe the image."},
            ],
        },
    ],
)
print("Chat response:", chat_response)
```
- Text only:
```
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="/path/to/step3",  # 模型路径
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "The capital of France is"},
    ],
)
print("Chat response:", chat_response.choices[0].message.content)
```
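
For long replies you can stream tokens as they are generated; a minimal variant of the call above, reusing the `client` from the previous snippet (streaming is part of the standard OpenAI-compatible API):
```
# Stream the reply token-by-token instead of waiting for the full message.
stream = client.chat.completions.create(
    model="/path/to/step3",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "The capital of France is"},
    ],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```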

## Results
Example 1:
- text: Please describe the image.
- image:

<div align=center>
    <img src="./figures/bee.jpg"/>
</div>


- Output:


<div align=center>
    <img src="./figures/result.png"/>
</div>

### Accuracy
`None`
## Application Scenarios
### Algorithm Category
`Conversational QA`
### Key Application Industries
`E-commerce, education, broadcast media`
## Pretrained Weights
The Hugging Face weights can be downloaded from:

- [stepfun-ai/step3](https://huggingface.co/stepfun-ai/step3)

`Note: using a mirror is recommended for the download: export HF_ENDPOINT=https://hf-mirror.com`
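
For example, with the `huggingface_hub` library (a sketch; the target directory is a placeholder, and `HF_ENDPOINT` set as above is honored by the library):
```
from huggingface_hub import snapshot_download

# Download the full weight repository; /path/to/step3 is a placeholder.
snapshot_download(repo_id="stepfun-ai/step3", local_dir="/path/to/step3")
```
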
## Source Repository and Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/step3_pytorch
## References
- https://github.com/stepfun-ai/Step3/tree/main