Initial commit

Commit 02d50b70 authored Feb 14, 2026 by luopl
9 changed files
--- a/README.md
+++ b/README.md
+# MiniMax-M2.5
+## Paper
+[MiniMax-M2.5](https://www.minimax.io/news/forge-scalable-agent-rl-framework-and-algorithm)
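+## Model Overview
+MiniMax-M2.5 was trained extensively with reinforcement learning across hundreds of thousands of complex real-world environments. It reaches state-of-the-art (SOTA) results in coding, agentic tool use and search, office work, and a range of other economically valuable tasks, scoring 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp (with context management).
+For MiniMax-M2.5, MiniMax developed Forge, an agent-native RL framework, in-house. Forge introduces an intermediate layer that fully decouples the underlying training-inference engine from the agent, so arbitrary agents can be integrated and the model's generalization across agent scaffolds and tools can be optimized. To raise system throughput, the asynchronous scheduling strategy was tuned to balance throughput against how far samples drift off-policy, and a tree-structured strategy for merging training samples yields roughly a 40x training speedup.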
+<div align=center>
+    <img src="./doc/rl_1.png"/>
+</div>
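+On the algorithm side, training continues to use the CISPO algorithm proposed early last year to keep MoE training stable at scale. To address the credit-assignment challenge that long contexts pose in agent rollouts, a process reward mechanism is introduced to monitor generation quality end to end. In addition, to align closely with user experience, task completion time is evaluated from agent trajectories to reach the best trade-off between model intelligence and response speed.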
+<div align=center>
+    <img src="./doc/rl_2.png"/>
+</div>
+## Environment Dependencies
+| Software |   Version        |
+| :------: |:---------:|
+| DTK |      26.04        |
+| python |   3.10.12     |
+| transformers |   4.57.6    |
+| vllm |   0.11.0+das.opt1.rc3.dtk2604     |
+| torch | 2.5.1+das.opt1.dtk2604.20260116.g78471bfd |
+Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260202
+- Adjust the `-v` mount paths to match where your model and code are actually stored
+```bash
+docker run -it \
+    --shm-size 60g \
+    --network=host \
+    --name minimax-m2.5 \
+    --privileged \
+    --device=/dev/kfd \
+    --device=/dev/dri \
+    --device=/dev/mkfd \
+    --group-add video \
+    --cap-add=SYS_PTRACE \
+    --security-opt seccomp=unconfined \
+    -u root \
+    -v /opt/hyhal/:/opt/hyhal/:ro \
+    -v /path/your_code_data/:/path/your_code_data/ \
+    harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-0130-py3.10-20260202 bash
+```
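+More images are available for download from [光源](https://sourcefind.cn/#/service-list).
+The DCU-specific deep learning libraries this project needs can be downloaded from the [光合](https://developer.sourcefind.cn/tool/) developer community. The vllm package must be replaced with the DCU build: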
+```bash
+pip uninstall vllm
+pip install vllm-0.11.0+das.opt1.rc3.dtk2604-cp310-cp310-linux_x86_64.whl
+```
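After swapping in the vllm wheel, it can help to confirm that the installed packages match the table above. The following is a minimal sketch and not part of this repository: `EXPECTED`, `installed_version`, and `find_mismatches` are hypothetical names, and the lookup relies on the standard-library `importlib.metadata`. It assumes the version strings are reported exactly as written in the table, which may not hold if a package normalizes its version.

```python
from importlib import metadata

# Versions copied from the environment table; the dict name is illustrative.
EXPECTED = {
    "transformers": "4.57.6",
    "vllm": "0.11.0+das.opt1.rc3.dtk2604",
    "torch": "2.5.1+das.opt1.dtk2604.20260116.g78471bfd",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(EXPECTED)` inside the container returns an empty dict when all three packages match; DTK and the Python interpreter are not pip packages and are not covered by this check.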
+## Dataset
+None yet
+## Training
+None yet
+## Inference
+1. Convert the FP8 model weights to BF16:
+```bash
+python cast_model_dtype/fp8_cast_bf16.py --input-fp8-hf-path /path/of/MiniMax/MiniMax-M2.5/ --output-bf16-hf-path /path/of/MiniMax/MiniMax-M2.5-bf16
+```
+2. Copy the supporting model files:
+```bash
+cp /path/of/MiniMax/MiniMax-M2.5/config.json /path/of/MiniMax/MiniMax-M2.5-bf16
+cp /path/of/MiniMax/MiniMax-M2.5/chat_template.jinja /path/of/MiniMax/MiniMax-M2.5-bf16
+cp /path/of/MiniMax/MiniMax-M2.5/configuration.json /path/of/MiniMax/MiniMax-M2.5-bf16
+cp /path/of/MiniMax/MiniMax-M2.5/generation_config.json  /path/of/MiniMax/MiniMax-M2.5-bf16
+cp /path/of/MiniMax/MiniMax-M2.5/configuration_minimax_m2.py  /path/of/MiniMax/MiniMax-M2.5-bf16
+cp /path/of/MiniMax/MiniMax-M2.5/tokenizer* /path/of/MiniMax/MiniMax-M2.5-bf16
+cp /path/of/MiniMax/MiniMax-M2.5/vocab.json /path/of/MiniMax/MiniMax-M2.5-bf16
+```
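The conversion in step 1 rescales each FP8 weight tile by a per-tile factor stored in a companion `*_scale_inv` tensor. As a toy illustration of that block-wise scaling, the sketch below uses plain Python lists and a hypothetical 2x2 block instead of the script's 128x128 tensors; `dequant_blockwise` is an illustrative name, not a function from this repository.

```python
def dequant_blockwise(weight, scale, block):
    """Multiply each block x block tile of `weight` by its entry in `scale`.

    weight: list of rows; row and column counts are multiples of `block`
    scale:  list of rows holding one factor per tile
    """
    rows, cols = len(weight), len(weight[0])
    assert rows % block == 0 and cols % block == 0
    return [
        [weight[r][c] * scale[r // block][c // block] for c in range(cols)]
        for r in range(rows)
    ]


# A 4x4 matrix made of four 2x2 tiles, each with its own scale factor.
weight = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
scale = [
    [10, 100],
    [0.5, 2],
]
restored = dequant_blockwise(weight, scale, block=2)
```

Element `(r, c)` is multiplied by the factor at `(r // block, c // block)`, which is the same tile-to-scale mapping that `weight_dequant` expresses through reshapes.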
+**In `/path/of/MiniMax/MiniMax-M2.5-bf16/config.json`, delete the contents of the `quantization_config` field, as shown in the figure:**
+<div align=center>
+    <img src="./doc/quant.png"/>
+</div>
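The manual edit to `config.json` could also be scripted. This is a minimal sketch under the assumption that removing the whole `quantization_config` key is what the figure shows; `strip_quantization_config` and `rewrite_config` are hypothetical helpers, not part of this repository.

```python
import json
from pathlib import Path


def strip_quantization_config(config):
    """Return a copy of the config dict without the `quantization_config` key."""
    return {k: v for k, v in config.items() if k != "quantization_config"}


def rewrite_config(config_path):
    """Remove `quantization_config` from a config.json file in place."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    cleaned = strip_quantization_config(config)
    path.write_text(
        json.dumps(cleaned, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return cleaned
```

For example, `rewrite_config("/path/of/MiniMax/MiniMax-M2.5-bf16/config.json")` rewrites the copied file; keeping a backup of the original first is a sensible precaution.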
+### vllm
+#### Single-node inference
+```bash
+# Start the server
+vllm serve /path/of/MiniMax/MiniMax-M2.5-bf16 \
+    --trust-remote-code \
+    --served-model-name minimax-m2.5 \
+    --max-model-len 32768 \
+    --dtype bfloat16 \
+    -tp 8 \
+    --port 8001 \
+    --enable-auto-tool-choice \
+    --tool-call-parser minimax-m2 \
+    --enable-expert-parallel
+# Query the server from a client
+curl http://localhost:8001/v1/chat/completions   \
+    -H "Content-Type: application/json"  \
+    -d '{
+        "model": "minimax-m2.5",
+        "messages": [
+            {
+                "role": "user",
+                "content": "What are Newton's three laws of motion? Briefly explain each."
+            }
+        ]
+    }'
+```
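The same request can be sent from Python. The sketch below uses only the standard library and assumes the server was started with the flags above (`--port 8001`, `--served-model-name minimax-m2.5`); `build_chat_request` and `ask` are hypothetical helper names, and the response parsing assumes the usual OpenAI-style chat completions layout.

```python
import json
import urllib.request

# Assumed to match the `vllm serve` flags: --port 8001, --served-model-name minimax-m2.5.
BASE_URL = "http://localhost:8001/v1"
MODEL_NAME = "minimax-m2.5"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_NAME):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=600):
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What are Newton's three laws of motion?"))` prints the model's reply. Building the request separately from sending it keeps the payload easy to inspect without a live server.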
+## Results
+<div align=center>
+    <img src="./doc/result-dcu.png"/>
+</div>
+### Accuracy
+Accuracy on DCU is consistent with GPU. Inference framework: vllm.
+## Pretrained Weights
+|          Model name          | Model size | DCU model  | Minimum cards |Download|
+|:----------------------:|:----:|:----------:|:------:|:----------:|
+|  MiniMax-M2.5   | 230B | BW1000 |   8    | [Hugging Face](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) |
+## Source Repository and Issue Reporting
+- https://developer.sourcefind.cn/codes/modelzoo/minimax-m2.5_vllm
+## References
+- https://github.com/MiniMax-AI/MiniMax-M2.5
--- a/cast_model_dtype/fp8_cast_bf16.py
+++ b/cast_model_dtype/fp8_cast_bf16.py
+import os
+import json
+from argparse import ArgumentParser
+from glob import glob
+from tqdm import tqdm
+import torch
+from safetensors.torch import load_file, save_file
+# from kernel import weight_dequant
+block_size = 128
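+def weight_dequant(weight, scale):
+    """Dequantize a block-wise FP8 weight: scale each block_size x block_size tile by its factor."""
+    shape = weight.shape
+    assert weight.dim() == 2
+    # Both dimensions must be multiples of block_size for the tiling views below.
+    assert shape[0] % block_size == 0 and shape[1] % block_size == 0
+    # Regroup into one row per tile: (row_blocks * col_blocks, block_size * block_size).
+    weight = weight.view(shape[0] // block_size, block_size, shape[1] // block_size, block_size).transpose(1, 2).contiguous().view(-1, block_size * block_size)
+    # Scale every tile, cast to the default dtype, then undo the tiling to restore the original shape.
+    weight = (weight.float() * scale.view(-1, 1).float()).to(torch.get_default_dtype()).view(shape[0] // block_size, shape[1] // block_size, block_size, block_size).transpose(1, 2).contiguous().view(shape)
+    return weight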
+def main(fp8_path, bf16_path):
+    torch.set_default_dtype(torch.bfloat16)
+    os.makedirs(bf16_path, exist_ok=True)
+    model_index_file = os.path.join(fp8_path, "model.safetensors.index.json")
+    with open(model_index_file, "r") as f:
+        model_index = json.load(f)
+    weight_map = model_index["weight_map"]
+    # Cache for loaded safetensor files
+    loaded_files = {}
+    fp8_weight_names = []
+    # Helper function to get tensor from the correct file
+    def get_tensor(tensor_name):
+        file_name = weight_map[tensor_name]
+        if file_name not in loaded_files:
+            file_path = os.path.join(fp8_path, file_name)
+            loaded_files[file_name] = load_file(file_path, device="cuda")
+        return loaded_files[file_name][tensor_name]
+    safetensor_files = list(glob(os.path.join(fp8_path, "*.safetensors")))
+    safetensor_files.sort()
+    for safetensor_file in tqdm(safetensor_files):
+        file_name = os.path.basename(safetensor_file)
+        current_state_dict = load_file(safetensor_file, device="cuda")
+        loaded_files[file_name] = current_state_dict
+        new_state_dict = {}
+        for weight_name, weight in current_state_dict.items():
+            if weight_name.endswith("_scale_inv"):
+                continue
+            elif weight.element_size() == 1:  # FP8 weight
+                scale_inv_name = f"{weight_name}_scale_inv"
+                try:
+                    # Get scale_inv from the correct file
+                    scale_inv = get_tensor(scale_inv_name)
+                    fp8_weight_names.append(weight_name)
+                    new_state_dict[weight_name] = weight_dequant(weight, scale_inv)
+                except KeyError:
+                    print(f"Warning: Missing scale_inv tensor for {weight_name}, skipping conversion")
+                    new_state_dict[weight_name] = weight
+            else:
+                new_state_dict[weight_name] = weight
+        new_safetensor_file = os.path.join(bf16_path, file_name)
+        save_file(new_state_dict, new_safetensor_file)
+        # Memory management: evict the oldest-loaded files until at most 2 remain cached
+        while len(loaded_files) > 2:
+            oldest_file = next(iter(loaded_files))
+            del loaded_files[oldest_file]
+            torch.cuda.empty_cache()
+    # Update model index
+    new_model_index_file = os.path.join(bf16_path, "model.safetensors.index.json")
+    for weight_name in fp8_weight_names:
+        scale_inv_name = f"{weight_name}_scale_inv"
+        if scale_inv_name in weight_map:
+            weight_map.pop(scale_inv_name)
+    with open(new_model_index_file, "w") as f:
+        json.dump({"metadata": {}, "weight_map": weight_map}, f, indent=2)
+if __name__ == "__main__":
+    parser = ArgumentParser()
+    parser.add_argument("--input-fp8-hf-path", type=str, required=True)
+    parser.add_argument("--output-bf16-hf-path", type=str, required=True)
+    args = parser.parse_args()
+    main(args.input_fp8_hf_path, args.output_bf16_hf_path)
--- a/doc/quant.png
+++ b/doc/quant.png
--- a/doc/result-dcu.png
+++ b/doc/result-dcu.png
--- a/doc/rl_1.png
+++ b/doc/rl_1.png
--- a/doc/rl_2.png
+++ b/doc/rl_2.png
--- a/icon.png
+++ b/icon.png
--- a/model.properties
+++ b/model.properties
+# Unique model identifier
+modelCode=2094
+# Model name
+modelName=minimax-m2.5_vllm
+# Model description
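+modelDescription=MiniMax-M2.5 was trained extensively with reinforcement learning across hundreds of thousands of complex real-world environments. It reaches state-of-the-art (SOTA) results in coding, agentic tool use and search, office work, and a range of other economically valuable tasks, with strong scores on multiple benchmarks.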
+# Process type
+processType=推理
+# Algorithm category
+appCategory=代码生成
+# Framework type
+frameType=vllm
+# Accelerator card type
+accelerateType=K100AI,BW1000
\ No newline at end of file
--- a/vllm-0.11.0+das.opt1.rc3.dtk2604-cp310-cp310-linux_x86_64.whl
+++ b/vllm-0.11.0+das.opt1.rc3.dtk2604-cp310-cp310-linux_x86_64.whl