# MiniMax-M2.5
## Paper
[MiniMax-M2.5](https://www.minimax.io/news/forge-scalable-agent-rl-framework-and-algorithm)

## Model Overview
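MiniMax-M2.5 was trained extensively with reinforcement learning across hundreds of thousands of complex real-world environments. It reaches state-of-the-art (SOTA) performance on coding, agentic tool use and search, office tasks, and a range of other economically valuable tasks, with results that include 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp (with context management).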

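MiniMax-M2.5 was trained with Forge, an agent-native RL framework developed in-house by MiniMax. Forge adds an intermediate layer that fully decouples the underlying training and inference engines from the agent. This lets any agent be plugged in and helps the model generalize across different agent scaffolds and tools. To raise system throughput, the asynchronous scheduling strategy was tuned to balance throughput against how far samples drift off-policy, and a tree-structured strategy for merging training samples was designed, yielding roughly a 40x training speedup.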

<div align=center>
    <img src="./doc/rl_1.png"/>
</div>

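On the algorithm side, training continues to use CISPO, the algorithm MiniMax introduced last year, to keep large-scale MoE training stable. To address the credit-assignment problem caused by long contexts in agent rollouts, a process reward mechanism was introduced to monitor generation quality end to end. In addition, to align closely with user experience, task completion time is evaluated on agent trajectories to strike the best trade-off between model intelligence and response speed.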
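For reference, below is the CISPO objective as we understand it from the MiniMax-M1 technical report; it is a summary, not the exact configuration used to train M2.5. Instead of clipping token updates as PPO/GRPO do, CISPO clips the importance-sampling weight $r_{i,t}$ and applies a stop-gradient $\mathrm{sg}(\cdot)$ to it, so every token still contributes a gradient through $\log\pi_\theta$. Here $G$ responses $o_i$ are sampled per prompt $q$, and $\hat{A}_{i,t}$ is the group-relative advantage.

$$
\mathcal{J}_{\mathrm{CISPO}}(\theta) = \mathbb{E}_{(q,a)\sim\mathcal{D},\ \{o_i\}_{i=1}^{G}\sim\pi_{\theta_{\mathrm{old}}}(\cdot\mid q)}\left[\frac{1}{\sum_{i=1}^{G}|o_i|}\sum_{i=1}^{G}\sum_{t=1}^{|o_i|}\mathrm{sg}\big(\hat{r}_{i,t}(\theta)\big)\,\hat{A}_{i,t}\,\log\pi_{\theta}(o_{i,t}\mid q, o_{i,<t})\right]
$$

$$
\hat{r}_{i,t}(\theta) = \mathrm{clip}\Big(r_{i,t}(\theta),\ 1-\epsilon^{\mathrm{IS}}_{\mathrm{low}},\ 1+\epsilon^{\mathrm{IS}}_{\mathrm{high}}\Big),\qquad r_{i,t}(\theta) = \frac{\pi_{\theta}(o_{i,t}\mid q, o_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(o_{i,t}\mid q, o_{i,<t})}
$$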

<div align=center>
    <img src="./doc/rl_2.png"/>
</div>

## Dependencies
| Software |   Version        |
| :------: |:---------:|
| DTK |      26.04        |
| python |   3.10.12     |
| transformers |   4.57.6    |
| vllm |   0.11.0+das.opt1.rc3.dtk2604     |
| torch | 2.5.1+das.opt1.dtk2604.20260116.g78471bfd |

Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260202

- Adjust the `-v` mount paths to match where your model and code are stored.
```bash
docker run -it \
    --shm-size 60g \
    --network=host \
    --name minimax-m2.5 \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mkfd \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260202 bash
```
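Before running the command above, it can help to confirm that the device nodes and the `/opt/hyhal` directory it maps actually exist on the host: Docker refuses to start when a `--device` node is missing, and a missing `-v` source directory is silently created empty. A minimal sketch; `check_host_paths` is a hypothetical helper, not part of DTK or the image.

```bash
# Pre-flight check (sketch): report every given host path that does not exist.
# Returns 0 only if all paths are present.
check_host_paths() {
    local missing=0 p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, using the device nodes and driver directory from the docker run command:
# check_host_paths /dev/kfd /dev/dri /dev/mkfd /opt/hyhal && echo "host looks ready"
```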
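More images are available from [光源](https://sourcefind.cn/#/service-list).

The DCU-specific deep learning libraries required by this project can be downloaded from the [光合](https://developer.sourcefind.cn/tool/) developer community. The installed vllm package must be replaced with the DCU build:
```bash
# The .whl file is downloaded from the 光合 developer community
pip uninstall -y vllm
pip install vllm-0.11.0+das.opt1.rc3.dtk2604-cp310-cp310-linux_x86_64.whl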
```
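To confirm the replacement took effect, one option is to compare the installed distribution version with the one in the dependency table. A minimal sketch, assuming `python3` is the interpreter vLLM runs under; `check_pkg_version` is a hypothetical helper built on the standard-library `importlib.metadata`.

```bash
# Print the installed version of a Python distribution and compare it with the
# expected one. Returns non-zero if the package is missing or the version differs.
check_pkg_version() {
    local pkg="$1" expected="$2" actual
    actual=$(python3 -c 'import sys, importlib.metadata as m; print(m.version(sys.argv[1]))' "$pkg" 2>/dev/null) || {
        echo "$pkg is not installed" >&2
        return 1
    }
    if [ "$actual" != "$expected" ]; then
        echo "$pkg: found $actual, expected $expected" >&2
        return 1
    fi
    echo "$pkg $actual"
}

# Example (version string from the dependency table):
# check_pkg_version vllm 0.11.0+das.opt1.rc3.dtk2604
```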

## Dataset
Not available.

## Training
Not available.

## Inference
1. Convert the FP8 model weights to BF16:

```bash
python cast_model_dtype/fp8_cast_bf16.py \
    --input-fp8-hf-path /path/of/MiniMax/MiniMax-M2.5/ \
    --output-bf16-hf-path /path/of/MiniMax/MiniMax-M2.5-bf16
```
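After the conversion finishes, a quick sanity check is that the output directory contains a safetensors index and every shard it references. This sketch assumes `fp8_cast_bf16.py` writes a `model.safetensors.index.json` with a `weight_map` (as the DeepSeek-V3 script of the same name does); `check_converted_ckpt` is a hypothetical helper.

```bash
# Check a converted checkpoint directory: the safetensors index must exist and
# every shard file listed in its weight_map must be present.
check_converted_ckpt() {
    local dir="$1"
    python3 - "$dir" <<'EOF'
import json, os, sys

ckpt_dir = sys.argv[1]
index_path = os.path.join(ckpt_dir, "model.safetensors.index.json")
if not os.path.isfile(index_path):
    sys.exit(f"index not found: {index_path}")
with open(index_path) as f:
    shards = sorted(set(json.load(f)["weight_map"].values()))
missing = [s for s in shards if not os.path.isfile(os.path.join(ckpt_dir, s))]
if missing:
    sys.exit(f"missing shards: {missing}")
print(f"all {len(shards)} shard(s) present")
EOF
}

# Example:
# check_converted_ckpt /path/of/MiniMax/MiniMax-M2.5-bf16
```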
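2. Copy the remaining model files (configuration, tokenizer, and chat template) from the original directory into the BF16 directory: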
```bash
cp /path/of/MiniMax/MiniMax-M2.5/config.json /path/of/MiniMax/MiniMax-M2.5-bf16
cp /path/of/MiniMax/MiniMax-M2.5/chat_template.jinja /path/of/MiniMax/MiniMax-M2.5-bf16
cp /path/of/MiniMax/MiniMax-M2.5/configuration.json /path/of/MiniMax/MiniMax-M2.5-bf16
cp /path/of/MiniMax/MiniMax-M2.5/generation_config.json  /path/of/MiniMax/MiniMax-M2.5-bf16
cp /path/of/MiniMax/MiniMax-M2.5/configuration_minimax_m2.py  /path/of/MiniMax/MiniMax-M2.5-bf16
cp /path/of/MiniMax/MiniMax-M2.5/tokenizer* /path/of/MiniMax/MiniMax-M2.5-bf16
cp /path/of/MiniMax/MiniMax-M2.5/vocab.json /path/of/MiniMax/MiniMax-M2.5-bf16
```
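The `cp` commands above give no summary if a file is absent from the download. A sketch of the same copy as a loop that reports anything it could not find; `copy_model_meta` is a hypothetical helper and the file list is taken from the commands above.

```bash
# Copy the non-weight files from the FP8 directory into the BF16 directory and
# report any file or pattern that matches nothing. Returns non-zero on any problem.
copy_model_meta() {
    local src="$1" dst="$2" pattern f matched status=0
    for pattern in config.json chat_template.jinja configuration.json \
                   generation_config.json configuration_minimax_m2.py \
                   'tokenizer*' vocab.json; do
        matched=0
        # $pattern is left unquoted on purpose so that 'tokenizer*' is expanded.
        for f in "$src"/$pattern; do
            [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
            cp "$f" "$dst"/ || status=1
            matched=1
        done
        if [ "$matched" -eq 0 ]; then
            echo "not found in $src: $pattern" >&2
            status=1
        fi
    done
    return "$status"
}

# Example:
# copy_model_meta /path/of/MiniMax/MiniMax-M2.5 /path/of/MiniMax/MiniMax-M2.5-bf16
```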

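**3. Delete the `quantization_config` field from `/path/of/MiniMax/MiniMax-M2.5-bf16/config.json`, as shown in the figure. The weights are now BF16, so the FP8 quantization settings no longer apply.**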
<div align=center>
    <img src="./doc/quant.png"/>
</div>
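Instead of editing the file by hand, the `quantization_config` field can be removed with a short script. A minimal sketch using Python's standard `json` module from bash; `strip_quant_config` is a hypothetical helper, and it keeps a `.bak` copy because it rewrites the file's formatting.

```bash
# Remove the top-level "quantization_config" key from a config.json in place,
# keeping the original as <file>.bak.
strip_quant_config() {
    local cfg="$1"
    cp "$cfg" "$cfg.bak" || return 1
    python3 - "$cfg" <<'EOF'
import json, sys

path = sys.argv[1]
with open(path) as f:
    cfg = json.load(f)
removed = cfg.pop("quantization_config", None) is not None
with open(path, "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
    f.write("\n")
print("quantization_config removed" if removed else "quantization_config not present")
EOF
}

# Example:
# strip_quant_config /path/of/MiniMax/MiniMax-M2.5-bf16/config.json
```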

### vLLM
#### Single-node inference

```bash
## Start the server
vllm serve /path/of/MiniMax/MiniMax-M2.5-bf16 \
    --trust-remote-code \
    --served-model-name minimax-m2.5 \
    --max-model-len 32768 \
    --dtype bfloat16 \
    -tp 8 \
    --port 8001 \
    --enable-auto-tool-choice \
    --tool-call-parser minimax-m2 \
    --enable-expert-parallel 


## Client request (run in a second terminal once the server is ready)
curl http://localhost:8001/v1/chat/completions   \
    -H "Content-Type: application/json"  \
    -d '{
        "model": "minimax-m2.5",
        "messages": [
            {
                "role": "user",
                "content": "Which three laws of motion did Newton propose? Please explain briefly."
            }
        ]
    }'
```
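Loading a checkpoint this size takes a while, so a script may want to wait until the server answers before sending requests, and then print only the reply text. A minimal sketch: `wait_for_server` and `extract_reply` are hypothetical helpers, and the polling relies on the `/health` endpoint of vLLM's OpenAI-compatible server.

```bash
# Poll a URL until it returns HTTP success, or give up after a timeout in seconds.
wait_for_server() {
    local url="$1" timeout="${2:-600}" waited=0
    until curl -sf "$url" >/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "server not ready after ${timeout}s: $url" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
}

# Read a chat-completions JSON response on stdin and print choices[0].message.content
# (prints "None" if the reply is a tool call without text content).
extract_reply() {
    python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Example:
# wait_for_server http://localhost:8001/health 1800 &&
#   curl -s http://localhost:8001/v1/chat/completions -H "Content-Type: application/json" \
#     -d '{"model": "minimax-m2.5", "messages": [{"role": "user", "content": "Hello"}]}' | extract_reply
```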

## Results
<div align=center>
    <img src="./doc/result-dcu.png"/>
</div>

### Accuracy
Accuracy on DCU is consistent with GPU (inference framework: vLLM).

## Pretrained Weights
|          Model          | Parameters | DCU Model  | Min. Cards |Download|
|:----------------------:|:----:|:----------:|:------:|:----------:|
|  MiniMax-M2.5   | 229B | BW1000 |   8    | [Hugging Face](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) |
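The table gives a parameter count rather than a file size. As a rough check, the BF16 weights alone take about 229 x 10^9 x 2 bytes, roughly 458 GB, which is about 57 GB per card when split evenly over 8 cards. A back-of-the-envelope sketch, assuming an even split and ignoring KV cache, activations, and runtime overhead; `weight_gb_per_card` is a hypothetical helper.

```bash
# Approximate weight memory per card in GB (10^9 bytes):
# params (billions) x bytes per param / number of cards.
weight_gb_per_card() {
    awk -v p="$1" -v b="$2" -v n="$3" 'BEGIN { printf "%.2f\n", p * b / n }'
}

# BF16 weights (2 bytes per parameter) split across 8 cards:
# weight_gb_per_card 229 2 8
```

With the values above this prints `57.25`.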

## Source Repository and Issue Reporting
- https://developer.sourcefind.cn/codes/modelzoo/minimax-m2.5_vllm

## References
- https://github.com/MiniMax-AI/MiniMax-M2.5