Commit 7f2360be authored by weishb

Initial commit
NON-COMMERCIAL LICENSE
Non-commercial use permitted based on MIT-style terms; commercial use requires prior written authorization.
Copyright (c) 2026 MiniMax
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software for non-commercial purposes, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or provide copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
1. The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
2. If the Software (or any derivative works thereof) is used for any Commercial Use, you shall prominently display "Built with MiniMax M2.7" on a related website, user interface, blogpost, about page or product documentation.
3. Any Commercial Use of the Software or any derivative work thereof is prohibited without obtaining a separate, prior written authorization from MiniMax. To request such authorization, please contact api@minimax.io with the subject line "M2.7 licensing".
4. "Commercial Use" means any use of the Software or any derivative work thereof that is primarily intended for commercial advantage or monetary compensation, which includes, without limitation: (i) offering products or services to third parties for a fee, which utilize, incorporate, or rely on the Software or its derivatives, (ii) the commercial use of APIs provided by or for the Software or its derivatives, including to support or enable commercial products, services, or operations, whether in a cloud-based, hosted, or other similar environment, and (iii) the deployment or provision of the Software or its derivatives that have been subjected to post-training, fine-tuning, instruction-tuning, or any other form of modification, for any commercial purpose.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Appendix: Prohibited Uses
You agree you will not use, or allow others to use, the Software or any derivatives of the Software to:
1. Generate or disseminate content prohibited by applicable laws or regulations.
2. Assist with, engage in or otherwise support any military purpose.
3. Exploit, harm, or attempt to exploit or harm minors.
4. Generate or disseminate false or misleading information with the intent to cause harm.
5. Promote discrimination, hate speech, or harmful behavior against individuals or groups based on race or ethnic origin, religion, disability, age, nationality and national origin, veteran status, sexual orientation, gender or gender identity, caste, immigration status, or any other characteristic that is associated with systemic discrimination or marginalization.
# MiniMax-M2.7
## Paper
None yet.
## Model Overview
MiniMax-M2.7 is one of the open-source MiniMax-M2 series models from MiniMax, targeting code generation, agent workflows, and complex tool-calling scenarios. According to the official public materials, the MiniMax-M2 series uses an MoE architecture with roughly 230B total parameters and about 10B activated parameters; while retaining general text-generation ability, it focuses on strengthening coding, multi-step planning, and tool use.
<div align=center>
<img src="./doc/01.png"/>
</div>
The series is also an interleaved thinking model. The official documentation requires that the `<think>...</think>` content in historical assistant messages be preserved across multi-turn conversations; stripping it degrades model performance.
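As a hedged illustration of that requirement (the helper below is our own sketch, not part of any official SDK), a multi-turn history that keeps the `<think>...</think>` segments in assistant turns might be assembled like this:

```python
# Minimal sketch: keep <think>...</think> blocks when replaying assistant turns.
# The message format follows the OpenAI-compatible chat API; the helper name
# `append_assistant_turn` is illustrative only.

def append_assistant_turn(messages, thinking, answer):
    """Store the assistant reply with its reasoning segment intact."""
    messages.append({
        "role": "assistant",
        # Do NOT strip the <think> block before sending the next request.
        "content": f"<think>{thinking}</think>{answer}",
    })
    return messages

messages = [{"role": "user", "content": "What is 2 + 2?"}]
append_assistant_turn(messages, "Simple arithmetic.", "2 + 2 = 4.")
messages.append({"role": "user", "content": "And doubled?"})

assert "<think>" in messages[1]["content"]  # reasoning preserved for turn 2
```

The next request should send `messages` as-is, so the model sees its earlier reasoning.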
## Environment Dependencies
| Software | Version |
| :------: |:-----------------------------------------:|
| DTK | 26.04 |
| python | 3.10.12 |
| transformers | 5.2.0.dev0 |
| vllm | 0.15.1+das.opt1.alpha.dtk2604.torch290.2604081832.gbcb2ba |
| triton | 3.3.0+das.opt2.dtk2604.torch290.20260331.g31542e |
| torch | 2.9.0+das.opt1.dtk2604.20260331.g4e3c1e7 |

Currently only this custom image is supported: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm015-ubuntu22.04-dtk26.04-0409-modelzoo
- Adjust the `-v` mount paths below to match the actual locations of your model and data.
```bash
docker run -it \
--shm-size 60g \
--network=host \
--name minimax-m2.7 \
--privileged \
--device=/dev/kfd \
--device=/dev/dri \
--device=/dev/mkfd \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-u root \
-v /opt/hyhal/:/opt/hyhal/:ro \
-v /path/your_code_data/:/path/your_code_data/ \
harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm015-ubuntu22.04-dtk26.04-0409-modelzoo bash
```
More images are available for download from [SourceFind (光源)](https://sourcefind.cn/#/service-list).
The specialized deep-learning libraries required by the DCU cards used in this project can be downloaded from the [光合 (Guanghe)](https://developer.sourcefind.cn/tool/) developer community.
## Dataset
None yet.
## Training
None yet.
## Inference
1. Convert the FP8 model weights to BF16 as follows:
```bash
python cast_model_dtype/fp8_cast_bf16.py --input-fp8-hf-path /path/of/MiniMax/MiniMax-M2.7/ --output-bf16-hf-path /path/of/MiniMax/MiniMax-M2.7-bf16
```
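For intuition, the conversion script stores one inverse scale per 128×128 block of each FP8 weight; dequantization expands every per-block scale over its block and multiplies. A minimal NumPy sketch of that block-wise dequantization, on a toy 4×4 tensor with block size 2 rather than the real 128:

```python
import numpy as np

BLOCK = 2  # toy block size for illustration; the real script uses 128

def block_dequant(weight, scale):
    """Multiply each BLOCK x BLOCK tile of `weight` by its per-block scale."""
    # Expand the per-block scale map to full resolution, then scale elementwise.
    expanded = np.kron(scale, np.ones((BLOCK, BLOCK)))
    return weight * expanded

w = np.ones((4, 4))                      # stand-in for the FP8 payload
s = np.array([[2.0, 3.0], [4.0, 5.0]])   # one scale per 2x2 block
out = block_dequant(w, s)
assert out[0, 0] == 2.0 and out[3, 3] == 5.0  # each tile picked up its scale
```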
2. Copy the related model files:
```bash
cp /path/of/MiniMax/MiniMax-M2.7/config.json /path/of/MiniMax/MiniMax-M2.7-bf16
cp /path/of/MiniMax/MiniMax-M2.7/chat_template.jinja /path/of/MiniMax/MiniMax-M2.7-bf16
cp /path/of/MiniMax/MiniMax-M2.7/configuration.json /path/of/MiniMax/MiniMax-M2.7-bf16
cp /path/of/MiniMax/MiniMax-M2.7/generation_config.json /path/of/MiniMax/MiniMax-M2.7-bf16
cp /path/of/MiniMax/MiniMax-M2.7/configuration_minimax_m2.py /path/of/MiniMax/MiniMax-M2.7-bf16
cp /path/of/MiniMax/MiniMax-M2.7/tokenizer* /path/of/MiniMax/MiniMax-M2.7-bf16
cp /path/of/MiniMax/MiniMax-M2.7/vocab.json /path/of/MiniMax/MiniMax-M2.7-bf16
```
**Delete the `quantization_config` field from `/path/of/MiniMax/MiniMax-M2.7-bf16/config.json`, as shown in the figure.**
<div align=center>
<img src="./doc/quant.png"/>
</div>
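If you prefer to make that edit programmatically rather than by hand, a small sketch (the demo path below is a throwaway; point it at the real `config.json` in practice):

```python
import json
import os
import tempfile

def drop_quantization_config(config_path):
    """Remove the quantization_config field from a HF config.json, if present."""
    with open(config_path) as f:
        config = json.load(f)
    config.pop("quantization_config", None)  # no-op if the field is absent
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)

# Demo on a throwaway copy; use the real BF16 config.json path in practice.
demo_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(demo_path, "w") as f:
    json.dump({"model_type": "minimax", "quantization_config": {"fmt": "fp8"}}, f)
drop_quantization_config(demo_path)
with open(demo_path) as f:
    cleaned = json.load(f)
assert "quantization_config" not in cleaned
```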
### vllm
#### Single-node inference
```bash
## start the server
vllm serve /path/MiniMax-M2.7-bf16 \
--trust-remote-code \
--served-model-name minimax-m2.7 \
--gpu-memory-utilization 0.85 \
--max-model-len 32768 \
--dtype bfloat16 \
-tp 8 \
--port 8001 \
--enable-auto-tool-choice \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2 \
--enable-expert-parallel
## client request
curl http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "minimax-m2.7",
"messages": [
{
"role": "user",
"content": "What are Newton's three laws of motion? Please explain them briefly."
}
]
}'
```
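The same request can be issued from Python. A hedged sketch using only the standard library (the endpoint and model name mirror the serve command above; adjust them to your deployment, and note the helper names are our own):

```python
import json
from urllib import request

def build_chat_payload(model, user_content):
    """Assemble an OpenAI-compatible chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }

payload = build_chat_payload("minimax-m2.7",
                             "What are Newton's three laws of motion?")

def post_chat(url, payload):
    """POST the payload to a running vLLM server and return the parsed reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# With the server from above running:
# reply = post_chat("http://localhost:8001/v1/chat/completions", payload)
# print(reply["choices"][0]["message"]["content"])
```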
## Results
<div align=center>
<img src="./doc/result.png"/>
</div>
### Accuracy
DCU accuracy is consistent with GPU; inference framework: vllm.
## Pretrained Weights
| Model | Weight Size | DCU Model | Minimum Cards | Download |
|:-----:|:----------:|:----------:|:---------------------:|:----------:|
| MiniMax-M2.7 | 229B | BW1000,BW1100 | 8 | [ModelScope](https://www.modelscope.cn/models/MiniMax/MiniMax-M2.7/summary) |
## Source Repository and Issue Reporting
- http://developer.sourcefind.cn/codes/modelzoo/minimax-m2.7_vllm
## References
- https://www.modelscope.cn/models/MiniMax/MiniMax-M2.7/summary
- https://huggingface.co/MiniMaxAI/MiniMax-M2.7
- https://github.com/MiniMax-AI/MiniMax-M2
- https://docs.vllm.ai/projects/recipes/en/latest/MiniMax/MiniMax-M2.html
# cast_model_dtype/fp8_cast_bf16.py: convert block-quantized FP8 safetensors
# weights to BF16, folding in the per-block *_scale_inv tensors and dropping
# them from the output index.
import os
import json
from argparse import ArgumentParser
from glob import glob

from tqdm import tqdm
import torch
from safetensors.torch import load_file, save_file

block_size = 128


def weight_dequant(weight, scale):
    """Dequantize a 2-D FP8 weight using one inverse scale per 128x128 block."""
    shape = weight.shape
    assert weight.dim() == 2
    # Regroup into (row_blocks * col_blocks, block_size * block_size) tiles.
    weight = weight.view(shape[0] // block_size, block_size, shape[1] // block_size, block_size).transpose(1, 2).contiguous().view(-1, block_size * block_size)
    # Scale each tile by its inverse scale, then restore the original layout.
    weight = (weight.float() * scale.view(-1, 1).float()).to(torch.get_default_dtype()).view(shape[0] // block_size, shape[1] // block_size, block_size, block_size).transpose(1, 2).contiguous().view(shape)
    return weight


def main(fp8_path, bf16_path):
    torch.set_default_dtype(torch.bfloat16)
    os.makedirs(bf16_path, exist_ok=True)
    model_index_file = os.path.join(fp8_path, "model.safetensors.index.json")
    with open(model_index_file, "r") as f:
        model_index = json.load(f)
    weight_map = model_index["weight_map"]

    # Cache for loaded safetensor files
    loaded_files = {}
    fp8_weight_names = []

    # Helper function to get a tensor from the correct shard file
    def get_tensor(tensor_name):
        file_name = weight_map[tensor_name]
        if file_name not in loaded_files:
            file_path = os.path.join(fp8_path, file_name)
            loaded_files[file_name] = load_file(file_path, device="cuda")
        return loaded_files[file_name][tensor_name]

    safetensor_files = list(glob(os.path.join(fp8_path, "*.safetensors")))
    safetensor_files.sort()
    for safetensor_file in tqdm(safetensor_files):
        file_name = os.path.basename(safetensor_file)
        current_state_dict = load_file(safetensor_file, device="cuda")
        loaded_files[file_name] = current_state_dict
        new_state_dict = {}
        for weight_name, weight in current_state_dict.items():
            if weight_name.endswith("_scale_inv"):
                continue  # scales are consumed below, not copied through
            elif weight.element_size() == 1:  # FP8 weight (1 byte per element)
                scale_inv_name = f"{weight_name}_scale_inv"
                try:
                    # Get scale_inv from the correct shard file
                    scale_inv = get_tensor(scale_inv_name)
                    fp8_weight_names.append(weight_name)
                    new_state_dict[weight_name] = weight_dequant(weight, scale_inv)
                except KeyError:
                    print(f"Warning: Missing scale_inv tensor for {weight_name}, skipping conversion")
                    new_state_dict[weight_name] = weight
            else:
                new_state_dict[weight_name] = weight
        new_safetensor_file = os.path.join(bf16_path, file_name)
        save_file(new_state_dict, new_safetensor_file)

        # Memory management: keep only the 2 most recently used shard files
        if len(loaded_files) > 2:
            oldest_file = next(iter(loaded_files))
            del loaded_files[oldest_file]
            torch.cuda.empty_cache()

    # Update the model index, dropping scale tensors that were folded in
    new_model_index_file = os.path.join(bf16_path, "model.safetensors.index.json")
    for weight_name in fp8_weight_names:
        scale_inv_name = f"{weight_name}_scale_inv"
        if scale_inv_name in weight_map:
            weight_map.pop(scale_inv_name)
    with open(new_model_index_file, "w") as f:
        json.dump({"metadata": {}, "weight_map": weight_map}, f, indent=2)


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--input-fp8-hf-path", type=str, required=True)
    parser.add_argument("--output-bf16-hf-path", type=str, required=True)
    args = parser.parse_args()
    main(args.input_fp8_hf_path, args.output_bf16_hf_path)
# Unique model identifier
modelCode=2356
# Model name
modelName=MiniMax-M2.7_vllm
# Model description
modelDescription=MiniMax-M2.7 is an MoE text-generation model in the MiniMax-M2 series, targeting code generation and agent workflows.
# Run type
processType=Inference
# Algorithm category
appCategory=Text generation
# Framework type
frameType=vllm
# Accelerator type
accelerateType=BW1100,BW1000