# Step3

## Paper

`Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding`

- https://arxiv.org/abs/2507.19427

## Model Architecture

Step3 is an advanced multimodal reasoning model built on a Mixture-of-Experts architecture, with 321B total parameters and 38B activated per token. It is designed end to end to minimize decoding cost while delivering top-tier performance on vision-language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains excellent efficiency on both high-end and low-end accelerators. The AFD architecture is illustrated in the figure below.
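As a rough illustration of the factorization idea behind MFA (this is not Step-3's actual implementation; the shared single key/value head, head count, rank, and all dimensions below are illustrative assumptions), the query projection can be factorized through a shared low-rank bottleneck while one key/value head is reused by every query head, which shrinks the KV cache and decode-time memory traffic:

```python
import numpy as np

# Illustrative sizes only, not Step-3's real configuration.
d_model, d_head, n_heads, rank, seq = 64, 16, 8, 8, 10

rng = np.random.default_rng(0)
x = rng.standard_normal((seq, d_model))

# Factorized query projection: a shared down-projection to a low-rank
# bottleneck, then a small per-head up-projection ("multi-matrix factorization").
W_dq = rng.standard_normal((d_model, rank)) / np.sqrt(d_model)
W_uq = rng.standard_normal((n_heads, rank, d_head)) / np.sqrt(rank)

# A single key/value head (MQA-style), shared across all query heads.
W_k = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
W_v = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
W_o = rng.standard_normal((n_heads * d_head, d_model)) / np.sqrt(n_heads * d_head)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

q = np.einsum("sd,dr,hre->hse", x, W_dq, W_uq)     # (n_heads, seq, d_head)
k, v = x @ W_k, x @ W_v                            # (seq, d_head) each, shared
att = softmax(q @ k.T / np.sqrt(d_head))           # (n_heads, seq, seq)
out = np.einsum("hst,te->hse", att, v)             # (n_heads, seq, d_head)
y = out.transpose(1, 0, 2).reshape(seq, -1) @ W_o  # (seq, d_model)
print(y.shape)  # (10, 64)
```

Note that the KV cache holds only one `(seq, d_head)` key and value tensor regardless of how many query heads the factorization produces, which is the lever behind the lower decode cost.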
## Algorithm Design

Step-3 introduces two major optimizations:

- On the model side, MFA (Multi-Matrix Factorization Attention), whose more balanced arithmetic-intensity design achieves lower decoding cost than MHA, GQA, and MLA.
- On the system side, Attention-FFN Disaggregation (AFD), with parallelism strategies chosen to match the specific hardware configuration.

## Environment Setup

### Docker (Option 1)

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm-ubuntu22.04-dtk25.04.1-rc5-das1.6-py3.10-20250802-step3

# Replace <IMAGE_ID> with the ID of the image pulled above
docker run -it --shm-size=1024G -v $PWD/Step3_pytorch:/home/Step3_pytorch -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name step3 <IMAGE_ID> bash
```

### Dockerfile (Option 2)

```
cd $PWD/Step3_pytorch/docker
docker build --no-cache -t step3:latest .
docker run --shm-size=1024G --name step3 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/Step3_pytorch:/home/Step3_pytorch -it step3 bash
```

## Dataset

`None`

## Training

`None`

## Inference

Note: running this model requires 16 × 64 GB of GPU memory (16 GPUs with 64 GB each).

### Multi-node, Multi-GPU

Start the Ray cluster:

```
# Run on the head node
# *.*.*.*: head-node IP, ***: port
ray start --head --node-ip-address=*.*.*.* --port=*** --num-gpus=8 --num-cpus=**
```

```
# Run on every other node
# *.*.*.*: head-node IP, ***: port
ray start --address='*.*.*.*:***' --num-gpus=8 --num-cpus=**
```

vLLM deployment (upstream vLLM does not yet support AFD; only non-disaggregated deployment is supported):

```
# Run on the head node
VLLM_USE_NN=0 VLLM_USE_FLASH_ATTN_PA=0 vllm serve /path/to/step3 \
    --reasoning-parser step3 \
    --enable-auto-tool-choice \
    --tool-call-parser step3 \
    --trust-remote-code \
    --max-num-batched-tokens 4096 \
    --distributed-executor-backend ray \
    --dtype float16 \
    -tp 16 \
    --port $PORT_SERVING
```

- Client request examples

```
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="/path/to/step3",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://xxxxx.png"},
                },
                {"type": "text", "text": "Please describe the image."},
            ],
        },
    ],
)
print("Chat response:", chat_response)
```

- You can also upload base64-encoded local images:

```
import base64
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

image_path = "/path/to/local/image.png"
with open(image_path, "rb") as f:
    encoded_image = base64.b64encode(f.read())
encoded_image_text = encoded_image.decode("utf-8")
base64_step = f"data:image;base64,{encoded_image_text}"

chat_response = client.chat.completions.create(
    model="/path/to/step3",  # must match the path passed to `vllm serve`
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": base64_step},
                },
                {"type": "text", "text": "Please describe the image."},
            ],
        },
    ],
)
print("Chat response:", chat_response)
```

- Text only:

```
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="/path/to/step3",  # model path
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "The capital of France is"},
    ],
)
print("Chat response:", chat_response.choices[0].message.content)
```

## Result

### Example 1

- text: Please describe the image.
- image:
- Output:
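Since the server is started with `--reasoning-parser step3`, vLLM separates the model's reasoning trace from the final answer and exposes it via an extra `reasoning_content` field on the response message (a vLLM extension, not part of the official OpenAI schema). A minimal sketch of reading both fields, using a stubbed response object in place of a live server (the stub contents are made up):

```python
from types import SimpleNamespace

# Stub standing in for a live `chat_response` from the server above;
# the field layout mirrors vLLM's OpenAI-compatible response.
message = SimpleNamespace(
    reasoning_content="The user wants a description; the image shows ...",  # made-up
    content="A cat resting on a sofa.",  # made-up
)
chat_response = SimpleNamespace(choices=[SimpleNamespace(message=message)])

msg = chat_response.choices[0].message
reasoning = getattr(msg, "reasoning_content", None)  # vLLM extension field
answer = msg.content
print("reasoning:", reasoning)
print("answer:", answer)
```

Using `getattr` with a default keeps the client code working even when the server is launched without a reasoning parser and the field is absent.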
### Accuracy

`None`

## Application Scenarios

### Algorithm Category

`Dialogue / Q&A`

### Key Application Industries

`E-commerce, education, broadcast media`

## Pretrained Weights

Hugging Face weights can be downloaded from:

- [stepfun-ai/step3](https://huggingface.co/stepfun-ai/step3)

`Note: a mirror is recommended for downloading: export HF_ENDPOINT=https://hf-mirror.com`

## Source Repository and Issue Reporting

- https://developer.sourcefind.cn/codes/modelzoo/step3_pytorch

## References

- https://github.com/stepfun-ai/Step3/tree/main
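The mirror suggested in the pretrained-weights section can be combined with a full download roughly as follows (a sketch assuming the `huggingface_hub` CLI is installed, e.g. via `pip install -U huggingface_hub`; the local directory is a placeholder):

```shell
# Route Hugging Face downloads through the mirror for this shell session
export HF_ENDPOINT=https://hf-mirror.com

# Fetch the Step3 weights into a local directory of your choice
huggingface-cli download stepfun-ai/step3 --local-dir /path/to/step3
```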