# Contributors
This file contains the list of everyone who contributed to this repository.
<br>
<table>
<tr><th>Contributors1</th><th>Contributors2</th></tr>
<tr>
<td><img src="xxx1">
<br>
<a href="xxx1">xxx1</a></td>
<td><img src="xxx2">
<br>
<a href="xxx2">xxx2</a></td>
</tr>
</table>
<br>
### Thanks to everyone who helped build this repository :)
MIT License
Copyright 2025 MiniMax AI.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Our only modification is that, if the Software (or any derivative works
thereof) is used for any of your commercial products or services that have more
than 100 million monthly active users, or more than 30 million US dollars (or
equivalent in other currencies) in annual recurring revenue, you shall
prominently display “MiniMax M2” on the user interface of such product or
service.
# MiniMax-M2
## Paper
[MiniMax-M2](https://www.minimax.io/news/minimax-m2)
## Model Overview
MiniMax-M2 redefines efficiency for agents. It is a compact, fast, and cost-effective MoE model (230 billion total parameters, 10 billion activated) built to deliver elite performance on coding and agentic tasks while maintaining strong general intelligence. With only 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool-use performance expected of today's leading models, while its slim footprint makes it easier than ever to deploy and scale.
- **Superior intelligence:** According to benchmarks from Artificial Analysis, MiniMax-M2 demonstrates highly competitive general intelligence across mathematics, science, instruction following, coding, and agentic tool use. Its composite score ranks first among open-source models worldwide.
- **Advanced coding:** Built for end-to-end developer workflows, MiniMax-M2 excels at multi-file edits, code-run-fix loops, and test-validated repairs. Strong results on Terminal-Bench and (Multi-)SWE-Bench-style tasks demonstrate its practical effectiveness across languages in terminals, IDEs, and CI.
- **Agentic performance:** MiniMax-M2 plans and executes complex, long-horizon toolchains across shells, browsers, retrieval, and code runners. In BrowseComp-style evaluations, it consistently locates hard-to-surface sources, keeps evidence traceable, and recovers gracefully from flaky steps.
- **Efficient by design:** With 10 billion activated parameters (230 billion total), MiniMax-M2 delivers lower latency, lower cost, and higher throughput for interactive agents and batched sampling, perfectly aligned with the shift toward models that are highly deployable yet still shine on coding and agentic tasks.
<div align=center>
<img src="./doc/Bench.png"/>
</div>
## Environment Dependencies
| Software | Version |
| :------: | :------: |
| DTK | 25.04.2 |
| python | 3.10.12 |
| transformers | 4.57.1 |
| vllm | 0.11.0+das.opt1.alpha.8e22ded.dtk25042 |
| torch | 2.5.1+das.opt1.dtk25042 |
| triton | 3.1+das.opt1.3c5d12d.dtk25041 |
| flash_attn | 2.6.1+das.opt1.dtk2504 |
| flash_mla | 1.0.0+das.opt1.dtk25042 |
Recommended Docker image:
- Adjust the mount paths (`-v`) to match the actual locations of your model and code
```bash
docker run -it --shm-size 60g --network=host --name minimax_m2 --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro -v /path/your_code_path/:/path/your_code_path/ image.sourcefind.cn:5000/dcu/admin/custom/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10-minimax-m2 bash
```
More images are available for download from [SourceFind (光源)](https://sourcefind.cn/#/service-list).
The specialized deep-learning libraries required for DCU cards in this project can be downloaded from the [Guanghe (光合)](https://developer.sourcefind.cn/tool/) developer community.
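As a quick sanity check inside the container, the sketch below prints the versions of the key packages; they should roughly match the dependency table above (import names are assumed to match the package names listed there):
```python
# Quick environment sanity check inside the container; versions should
# roughly match the dependency table above.
import torch
import transformers
import vllm

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("vllm:", vllm.__version__)
print("accelerator visible:", torch.cuda.is_available())  # DCU exposes the CUDA API through DTK
```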
## Dataset
None at this time.
## Training
None at this time.
## Inference
1. Convert the FP8 model weights to BF16 as follows:
```bash
python cast_model_dtype/fp8_cast_bf16.py --input-fp8-hf-path /path/of/MiniMax/MiniMax-M2/ --output-bf16-hf-path /path/of/MiniMax/MiniMax-M2-bf16
```
2. Copy the related model files:
```bash
cp config.json /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/chat_template.jinja /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/configuration.json /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/generation_config.json /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/configuration_minimax_m2.py /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/tokenizer* /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/vocab.json /path/of/MiniMax/MiniMax-M2-bf16
```
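As a quick check after steps 1 and 2, the sketch below verifies that the copied files are present and spot-checks one converted shard; the directory path and shard filename are placeholders:
```python
# Verify the BF16 output directory after conversion and copying.
# Paths and the shard filename below are placeholders; adjust as needed.
import os
from safetensors.torch import load_file

bf16_dir = "/path/of/MiniMax/MiniMax-M2-bf16"

# 1) The files copied in step 2 should all be present.
required = [
    "config.json",
    "chat_template.jinja",
    "configuration.json",
    "generation_config.json",
    "configuration_minimax_m2.py",
    "vocab.json",
]
missing = [f for f in required if not os.path.exists(os.path.join(bf16_dir, f))]
print("missing files:", missing or "none")

# 2) A converted shard should hold BF16 weights and no *_scale_inv tensors.
shard = os.path.join(bf16_dir, "model-00001-of-000XX.safetensors")  # illustrative name
state = load_file(shard)
assert not any(name.endswith("_scale_inv") for name in state)
for name, tensor in list(state.items())[:5]:
    print(name, tensor.dtype, tuple(tensor.shape))
```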
### vllm
#### Single-Node Inference
```bash
## Start the server
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1
vllm serve /path/of/MiniMax/MiniMax-M2-bf16/ \
--trust-remote-code \
--max-model-len 32768 \
--served-model-name minimax \
--dtype bfloat16 \
-tp 8
## Query from a client
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "minimax",
"messages": [
{
"role": "user",
"content": "牛顿提出了哪三大运动定律?请简要说明。"
}
]
}'
```
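The same request can also be issued from Python through the OpenAI-compatible API that vllm exposes (a minimal sketch, assuming the `openai` package is installed; the API key is a placeholder, since a local vllm server accepts any value):
```python
# Minimal client for the OpenAI-compatible endpoint served by vllm.
# The API key is a placeholder; local vllm servers accept any value.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="minimax",
    messages=[{"role": "user", "content": "What are Newton's three laws of motion? Please explain briefly."}],
)
print(response.choices[0].message.content)
```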
## Results
<div align=center>
<img src="./doc/results.png"/>
</div>
### Accuracy
Accuracy on DCU matches GPU. Inference framework: vllm.
## Pretrained Weights
| Model | Weight Size | DCU Model | Minimum Cards | Download |
|:-----:|:----------:|:----------:|:---------------------:|:----------:|
| MiniMax-M2 | 230B | K100AI | 8 | [Download](https://huggingface.co/MiniMaxAI/MiniMax-M2) |
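One way to fetch the original FP8 weights locally (a sketch, assuming the `huggingface_hub` package is installed; the target path is a placeholder):
```python
# Download the FP8 weights from Hugging Face (the local path is a placeholder).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="MiniMaxAI/MiniMax-M2", local_dir="/path/of/MiniMax/MiniMax-M2")
```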
## Source Repository & Issue Feedback
-
## References
- https://github.com/MiniMax-AI/MiniMax-M2
import os
import json
from argparse import ArgumentParser
from glob import glob
from tqdm import tqdm
import torch
from safetensors.torch import load_file, save_file

# Block size of the per-block FP8 quantization scales
block_size = 128

def weight_dequant(weight, scale):
    """Dequantize a block-quantized FP8 weight with its per-block inverse scales."""
    shape = weight.shape
    assert weight.dim() == 2
    # Split the weight into (block_size x block_size) tiles, one scale per tile.
    tiles = (weight.view(shape[0] // block_size, block_size, shape[1] // block_size, block_size)
             .transpose(1, 2).contiguous().view(-1, block_size * block_size))
    # Scale each tile, then restore the original 2-D layout.
    out = ((tiles.float() * scale.view(-1, 1).float()).to(torch.get_default_dtype())
           .view(shape[0] // block_size, shape[1] // block_size, block_size, block_size)
           .transpose(1, 2).contiguous().view(shape))
    return out
def main(fp8_path, bf16_path):
torch.set_default_dtype(torch.bfloat16)
os.makedirs(bf16_path, exist_ok=True)
model_index_file = os.path.join(fp8_path, "model.safetensors.index.json")
with open(model_index_file, "r") as f:
model_index = json.load(f)
weight_map = model_index["weight_map"]
# Cache for loaded safetensor files
loaded_files = {}
fp8_weight_names = []
# Helper function to get tensor from the correct file
def get_tensor(tensor_name):
file_name = weight_map[tensor_name]
if file_name not in loaded_files:
file_path = os.path.join(fp8_path, file_name)
loaded_files[file_name] = load_file(file_path, device="cuda")
return loaded_files[file_name][tensor_name]
safetensor_files = list(glob(os.path.join(fp8_path, "*.safetensors")))
safetensor_files.sort()
for safetensor_file in tqdm(safetensor_files):
file_name = os.path.basename(safetensor_file)
current_state_dict = load_file(safetensor_file, device="cuda")
loaded_files[file_name] = current_state_dict
new_state_dict = {}
for weight_name, weight in current_state_dict.items():
if weight_name.endswith("_scale_inv"):
continue
elif weight.element_size() == 1: # FP8 weight
scale_inv_name = f"{weight_name}_scale_inv"
try:
# Get scale_inv from the correct file
scale_inv = get_tensor(scale_inv_name)
fp8_weight_names.append(weight_name)
new_state_dict[weight_name] = weight_dequant(weight, scale_inv)
except KeyError:
print(f"Warning: Missing scale_inv tensor for {weight_name}, skipping conversion")
new_state_dict[weight_name] = weight
else:
new_state_dict[weight_name] = weight
new_safetensor_file = os.path.join(bf16_path, file_name)
save_file(new_state_dict, new_safetensor_file)
# Memory management: keep only the 2 most recently used files
if len(loaded_files) > 2:
oldest_file = next(iter(loaded_files))
del loaded_files[oldest_file]
torch.cuda.empty_cache()
# Update model index
new_model_index_file = os.path.join(bf16_path, "model.safetensors.index.json")
for weight_name in fp8_weight_names:
scale_inv_name = f"{weight_name}_scale_inv"
if scale_inv_name in weight_map:
weight_map.pop(scale_inv_name)
with open(new_model_index_file, "w") as f:
json.dump({"metadata": {}, "weight_map": weight_map}, f, indent=2)
if __name__ == "__main__":
parser = ArgumentParser()
parser.add_argument("--input-fp8-hf-path", type=str, required=True)
parser.add_argument("--output-bf16-hf-path", type=str, required=True)
args = parser.parse_args()
main(args.input_fp8_hf_path, args.output_bf16_hf_path)
{
"architectures": [
"MiniMaxM2ForCausalLM"
],
"attention_dropout": 0.0,
"attn_type_list": [
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1
],
"auto_map": {
"AutoConfig": "configuration_minimax_m2.MiniMaxM2Config",
"AutoModelForCausalLM": "modeling_minimax_m2.MiniMaxM2ForCausalLM"
},
"bos_token_id": null,
"eos_token_id": null,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 3072,
"initializer_range": 0.02,
"intermediate_size": 1536,
"layernorm_full_attention_beta": 1.0,
"layernorm_linear_attention_beta": 1.0,
"layernorm_mlp_beta": 1.0,
"max_position_embeddings": 196608,
"mlp_intermediate_size": 8192,
"model_type": "minimax_m2",
"mtp_transformer_layers": 1,
"num_attention_heads": 48,
"num_experts_per_tok": 8,
"num_hidden_layers": 62,
"num_key_value_heads": 8,
"num_local_experts": 256,
"num_mtp_modules": 3,
"output_router_logits": false,
"qk_norm_type": "per_layer",
"rms_norm_eps": 1e-06,
"rope_theta": 5000000,
"rotary_dim": 64,
"router_aux_loss_coef": 0.001,
"router_jitter_noise": 0.0,
"scoring_func": "sigmoid",
"shared_intermediate_size": 0,
"shared_moe_mode": "sigmoid",
"sliding_window": null,
"tie_word_embeddings": false,
"transformers_version": "4.57.1",
"use_cache": true,
"use_mtp": true,
"use_qk_norm": true,
"use_routing_bias": true,
"vocab_size": 200064
}
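The config above is wired to custom classes through `auto_map`, so it loads with `trust_remote_code` (a minimal sketch; the local path is a placeholder for the converted BF16 model directory):
```python
# Load the custom MiniMax-M2 config via transformers' auto_map mechanism.
# The path is a placeholder for the converted BF16 model directory.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("/path/of/MiniMax/MiniMax-M2-bf16", trust_remote_code=True)
print(config.model_type, config.num_hidden_layers, config.num_local_experts)
```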
# Model unique identifier
modelCode=1813
# Model name
modelName=minimax-m2_vllm
# Model description
modelDescription=MiniMax-M2 is a compact, fast, and cost-effective MoE model (230 billion total parameters, 10 billion activated) built to deliver elite performance on coding and agentic tasks while maintaining strong general intelligence.
# Application scenario
processType=Inference
# Algorithm category
appScenario=Code Generation
# Framework type
frameType=vllm
# Accelerator type
accelerateType=K100AI