# Contributors
This file contains the list of everyone who contributed to this repository.
<br>
<table>
<tr><th>Contributors1</th><th>Contributors2</th></tr>
<tr>
<td><img src="xxx1">
<br>
<a href="xxx1">xxx1</a></td>
<td><img src="xxx2">
<br>
<a href="xxx2">xxx2</a></td>
</tr>
</table>
<br>
### Thanks to everyone who helped build this repository :)
MIT License
Copyright 2025 MiniMax AI.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Our only modification is that, if the Software (or any derivative works
thereof) is used for any of your commercial products or services that have more
than 100 million monthly active users, or more than 30 million US dollars (or
equivalent in other currencies) in annual recurring revenue, you shall
prominently display “MiniMax M2” on the user interface of such product or
service.
# MiniMax-M2
## Paper
[MiniMax-M2](https://www.minimax.io/news/minimax-m2)
## Model Overview
MiniMax-M2 redefines efficiency for agents. It is a compact, fast, and cost-effective MoE model (230 billion total parameters, 10 billion activated) built to deliver elite performance on coding and agentic tasks while maintaining strong general intelligence. With only 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool-use performance expected of today's leading models, while its slim footprint makes it easier than ever to deploy and scale.
- **Superior intelligence:** According to benchmarks from Artificial Analysis, MiniMax-M2 demonstrates highly competitive general intelligence across mathematics, science, instruction following, coding, and agentic tool use. Its composite score ranks first among open-source models worldwide.
- **Advanced coding:** Built for end-to-end developer workflows, MiniMax-M2 excels at multi-file edits, code-run-fix loops, and test-validated repairs. Strong results on Terminal-Bench and (Multi-)SWE-Bench-style tasks demonstrate its practical effectiveness across languages in terminals, IDEs, and CI.
- **Agentic performance:** MiniMax-M2 plans and executes complex, long-horizon toolchains across shells, browsers, retrieval, and code runners. In BrowseComp-style evaluations, it consistently locates hard-to-surface sources, keeps evidence traceable, and recovers gracefully from flaky steps.
- **Efficient by design:** With 10 billion activated parameters (230 billion total), MiniMax-M2 delivers lower latency, lower cost, and higher throughput for interactive agents and batched sampling, perfectly aligned with the shift toward models that are highly deployable yet still shine on coding and agentic tasks.
<div align=center>
<img src="./doc/Bench.png"/>
</div>
## Environment Dependencies
| Software | Version |
| :------: | :------: |
| DTK | 25.04.2 |
| python | 3.10.12 |
| transformers | 4.57.1 |
| vllm | 0.11.0+das.opt1.alpha.8e22ded.dtk25042 |
| torch | 2.5.1+das.opt1.dtk25042 |
| triton | 3.1+das.opt1.3c5d12d.dtk25041 |
| flash_attn | 2.6.1+das.opt1.dtk2504 |
| flash_mla | 1.0.0+das.opt1.dtk25042 |
Recommended Docker image:
- Adjust the mount paths (`-v`) to match the actual locations of your model and code
```bash
docker run -it --shm-size 60g --network=host --name minimax_m2 --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro -v /path/your_code_path/:/path/your_code_path/ image.sourcefind.cn:5000/dcu/admin/custom/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10-minimax-m2 bash
```
More images are available for download from [SourceFind (光源)](https://sourcefind.cn/#/service-list).
The specialized deep-learning libraries required for DCU cards in this project can be downloaded from the [Guanghe (光合)](https://developer.sourcefind.cn/tool/) developer community.
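As a quick sanity check inside the container, the sketch below prints the versions of the key packages; they should roughly match the dependency table above (import names are assumed to match the package names listed there):
```python
# Quick environment sanity check inside the container; versions should
# roughly match the dependency table above.
import torch
import transformers
import vllm

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("vllm:", vllm.__version__)
print("accelerator visible:", torch.cuda.is_available())  # DCU exposes the CUDA API through DTK
```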
## Dataset
None at this time.
## Training
None at this time.
## Inference
1. Convert the FP8 model weights to BF16 as follows:
```bash
python cast_model_dtype/fp8_cast_bf16.py --input-fp8-hf-path /path/of/MiniMax/MiniMax-M2/ --output-bf16-hf-path /path/of/MiniMax/MiniMax-M2-bf16
```
2. Copy the related model files:
```bash
cp config.json /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/chat_template.jinja /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/configuration.json /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/generation_config.json /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/configuration_minimax_m2.py /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/tokenizer* /path/of/MiniMax/MiniMax-M2-bf16
cp /path/of/MiniMax/MiniMax-M2/vocab.json /path/of/MiniMax/MiniMax-M2-bf16
```
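As a quick check after steps 1 and 2, the sketch below verifies that the copied files are present and spot-checks one converted shard; the directory path and shard filename are placeholders:
```python
# Verify the BF16 output directory after conversion and copying.
# Paths and the shard filename below are placeholders; adjust as needed.
import os
from safetensors.torch import load_file

bf16_dir = "/path/of/MiniMax/MiniMax-M2-bf16"

# 1) The files copied in step 2 should all be present.
required = [
    "config.json",
    "chat_template.jinja",
    "configuration.json",
    "generation_config.json",
    "configuration_minimax_m2.py",
    "vocab.json",
]
missing = [f for f in required if not os.path.exists(os.path.join(bf16_dir, f))]
print("missing files:", missing or "none")

# 2) A converted shard should hold BF16 weights and no *_scale_inv tensors.
shard = os.path.join(bf16_dir, "model-00001-of-000XX.safetensors")  # illustrative name
state = load_file(shard)
assert not any(name.endswith("_scale_inv") for name in state)
for name, tensor in list(state.items())[:5]:
    print(name, tensor.dtype, tuple(tensor.shape))
```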
### vllm
#### Single-Node Inference
```bash
## Start the server
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1
vllm serve /path/of/MiniMax/MiniMax-M2-bf16/ \
--trust-remote-code \
--max-model-len 32768 \
--served-model-name minimax \
--dtype bfloat16 \
-tp 8
## Query from a client
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "minimax",
"messages": [
{
"role": "user",
"content": "牛顿提出了哪三大运动定律?请简要说明。"
}
]
}'
```
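The same request can also be issued from Python through the OpenAI-compatible API that vllm exposes (a minimal sketch, assuming the `openai` package is installed; the API key is a placeholder, since a local vllm server accepts any value):
```python
# Minimal client for the OpenAI-compatible endpoint served by vllm.
# The API key is a placeholder; local vllm servers accept any value.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="minimax",
    messages=[{"role": "user", "content": "What are Newton's three laws of motion? Please explain briefly."}],
)
print(response.choices[0].message.content)
```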
## Results
<div align=center>
<img src="./doc/results.png"/>
</div>
### Accuracy
Accuracy on DCU matches GPU. Inference framework: vllm.
## Pretrained Weights
| Model | Weight Size | DCU Model | Minimum Cards | Download |
|:-----:|:----------:|:----------:|:---------------------:|:----------:|
| MiniMax-M2 | 230B | K100AI | 8 | [Download](https://huggingface.co/MiniMaxAI/MiniMax-M2) |
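One way to fetch the original FP8 weights locally (a sketch, assuming the `huggingface_hub` package is installed; the target path is a placeholder):
```python
# Download the FP8 weights from Hugging Face (the local path is a placeholder).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="MiniMaxAI/MiniMax-M2", local_dir="/path/of/MiniMax/MiniMax-M2")
```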
## Source Repository & Issue Feedback
-
## References
- https://github.com/MiniMax-AI/MiniMax-M2
import os
import json
from argparse import ArgumentParser
from glob import glob
from tqdm import tqdm
import torch
from safetensors.torch import load_file, save_file

# Block size of the per-block FP8 quantization scales
block_size = 128

def weight_dequant(weight, scale):
    """Dequantize a block-quantized FP8 weight with its per-block inverse scales."""
    shape = weight.shape
    assert weight.dim() == 2
    # Split the weight into (block_size x block_size) tiles, one scale per tile.
    tiles = (weight.view(shape[0] // block_size, block_size, shape[1] // block_size, block_size)
             .transpose(1, 2).contiguous().view(-1, block_size * block_size))
    # Scale each tile, then restore the original 2-D layout.
    out = ((tiles.float() * scale.view(-1, 1).float()).to(torch.get_default_dtype())
           .view(shape[0] // block_size, shape[1] // block_size, block_size, block_size)
           .transpose(1, 2).contiguous().view(shape))
    return out
def main(fp8_path, bf16_path):
torch.set_default_dtype(torch.bfloat16)
os.makedirs(bf16_path, exist_ok=True)
model_index_file = os.path.join(fp8_path, "model.safetensors.index.json")
with open(model_index_file, "r") as f:
model_index = json.load(f)
weight_map = model_index["weight_map"]
# Cache for loaded safetensor files
loaded_files = {}
fp8_weight_names = []
# Helper function to get tensor from the correct file
def get_tensor(tensor_name):
file_name = weight_map[tensor_name]
if file_name not in loaded_files:
file_path = os.path.join(fp8_path, file_name)
loaded_files[file_name] = load_file(file_path, device="cuda")
return loaded_files[file_name][tensor_name]
safetensor_files = list(glob(os.path.join(fp8_path, "*.safetensors")))
safetensor_files.sort()
for safetensor_file in tqdm(safetensor_files):
file_name = os.path.basename(safetensor_file)
current_state_dict = load_file(safetensor_file, device="cuda")
loaded_files[file_name] = current_state_dict
new_state_dict = {}
for weight_name, weight in current_state_dict.items():
if weight_name.endswith("_scale_inv"):
continue
elif weight.element_size() == 1: # FP8 weight
scale_inv_name = f"{weight_name}_scale_inv"
try:
# Get scale_inv from the correct file
scale_inv = get_tensor(scale_inv_name)
fp8_weight_names.append(weight_name)
new_state_dict[weight_name] = weight_dequant(weight, scale_inv)
except KeyError:
print(f"Warning: Missing scale_inv tensor for {weight_name}, skipping conversion")
new_state_dict[weight_name] = weight
else:
new_state_dict[weight_name] = weight
new_safetensor_file = os.path.join(bf16_path, file_name)
save_file(new_state_dict, new_safetensor_file)
# Memory management: keep only the 2 most recently used files
if len(loaded_files) > 2:
oldest_file = next(iter(loaded_files))
del loaded_files[oldest_file]
torch.cuda.empty_cache()
# Update model index
new_model_index_file = os.path.join(bf16_path, "model.safetensors.index.json")
for weight_name in fp8_weight_names:
scale_inv_name = f"{weight_name}_scale_inv"
if scale_inv_name in weight_map:
weight_map.pop(scale_inv_name)
with open(new_model_index_file, "w") as f:
json.dump({"metadata": {}, "weight_map": weight_map}, f, indent=2)
if __name__ == "__main__":
parser = ArgumentParser()
parser.add_argument("--input-fp8-hf-path", type=str, required=True)
parser.add_argument("--output-bf16-hf-path", type=str, required=True)
args = parser.parse_args()
main(args.input_fp8_hf_path, args.output_bf16_hf_path)
{
"architectures": [
"MiniMaxM2ForCausalLM"
],
"attention_dropout": 0.0,
"attn_type_list": [
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1
],
"auto_map": {
"AutoConfig": "configuration_minimax_m2.MiniMaxM2Config",
"AutoModelForCausalLM": "modeling_minimax_m2.MiniMaxM2ForCausalLM"
},
"bos_token_id": null,
"eos_token_id": null,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 3072,
"initializer_range": 0.02,
"intermediate_size": 1536,
"layernorm_full_attention_beta": 1.0,
"layernorm_linear_attention_beta": 1.0,
"layernorm_mlp_beta": 1.0,
"max_position_embeddings": 196608,
"mlp_intermediate_size": 8192,
"model_type": "minimax_m2",
"mtp_transformer_layers": 1,
"num_attention_heads": 48,
"num_experts_per_tok": 8,
"num_hidden_layers": 62,
"num_key_value_heads": 8,
"num_local_experts": 256,
"num_mtp_modules": 3,
"output_router_logits": false,
"qk_norm_type": "per_layer",
"rms_norm_eps": 1e-06,
"rope_theta": 5000000,
"rotary_dim": 64,
"router_aux_loss_coef": 0.001,
"router_jitter_noise": 0.0,
"scoring_func": "sigmoid",
"shared_intermediate_size": 0,
"shared_moe_mode": "sigmoid",
"sliding_window": null,
"tie_word_embeddings": false,
"transformers_version": "4.57.1",
"use_cache": true,
"use_mtp": true,
"use_qk_norm": true,
"use_routing_bias": true,
"vocab_size": 200064
}
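The config above is wired to custom classes through `auto_map`, so it loads with `trust_remote_code` (a minimal sketch; the local path is a placeholder for the converted BF16 model directory):
```python
# Load the custom MiniMax-M2 config via transformers' auto_map mechanism.
# The path is a placeholder for the converted BF16 model directory.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("/path/of/MiniMax/MiniMax-M2-bf16", trust_remote_code=True)
print(config.model_type, config.num_hidden_layers, config.num_local_experts)
```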
# Model unique identifier
modelCode=1813
# Model name
modelName=minimax-m2_vllm
# Model description
modelDescription=MiniMax-M2 is a compact, fast, and cost-effective MoE model (230 billion total parameters, 10 billion activated) built to deliver elite performance on coding and agentic tasks while maintaining strong general intelligence.
# Application scenario
processType=Inference
# Algorithm category
appScenario=Code Generation
# Framework type
frameType=vllm
# Accelerator type
accelerateType=K100AI