Commit 4d49d792 authored by chenzk

v1.0

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2025 MiniMax
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# MiniMax-M1
MiniMax-M1 is an open-weight model with an exceptionally long context window: 1M input tokens and 80K output tokens, comparable to Gemini 2.5 Pro.
## Paper
`None`
## Model Architecture
MiniMax-M1 uses a standard decoder-only architecture built on MoE, replacing most softmax attention with Lightning Attention, which incorporates a linear attention mechanism, at a 7:1 ratio to reduce computation.
<div align=center>
<img src="./doc/MiniMax.png"/>
</div>
## Algorithm Principles
MiniMax-M1 introduces linear attention to reduce computation; for RL training it also introduces the self-proposed CISPO algorithm, which converges roughly twice as fast as DAPO.
Lightning Attention: splits the attention computation into intra-block and inter-block parts; intra-block attention is computed conventionally, while inter-block attention uses the linear-attention kernel trick, avoiding the cumulative-sum (cumsum) operation that would otherwise slow things down.
CISPO: clips the importance-sampling weights rather than the token updates, so every token keeps contributing gradients, which is especially important for long responses.
<div align=center>
<img src="./doc/Lightning_Attention.png"/>
</div>
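To make the distinction concrete, below is a minimal PyTorch-style sketch of the CISPO idea (clip and detach the importance-sampling weight while keeping every token's log-probability in the gradient path). The function name, tensor shapes, and clipping bounds are illustrative assumptions, not the actual MiniMax-M1 training code.
```
import torch

def cispo_loss(logp_new, logp_old, advantages, mask, eps_low=0.2, eps_high=0.2):
    """Schematic CISPO-style policy-gradient loss (illustrative sketch only)."""
    # Importance-sampling weight between the current and behaviour policies.
    ratio = torch.exp(logp_new - logp_old)
    # Clip the weight itself and detach it, instead of clipping the token update,
    # so every token (including low-probability ones) still contributes a gradient.
    weight = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    per_token = weight * advantages * logp_new
    # Token-level average over valid (unmasked) positions.
    return -(per_token * mask).sum() / mask.sum()
```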
## Environment Setup
```
mv MiniMax-M1_vllm MiniMax-M1 # drop the framework-name suffix
```
### Hardware Requirements
DCU model: BW1000; nodes: 2; cards: 2 × 8.
### Communication Setup
Part 1: Basic inter-node communication
`Configure the following on the host machine:`
1. Disable the firewall:
```
systemctl stop firewalld # on CentOS
ufw disable              # on Ubuntu
```
2. Set `amd_iommu=on`:
```
vim /etc/default/grub
```
<div align=center>
<img src="./doc/amd_iommu.png"/>
</div>
Update the GRUB configuration:
```
grub2-mkconfig -o /boot/efi/EFI/rocky/grub.cfg
```
Reboot the machine, then verify that the setting took effect (check that `amd_iommu=on iommu=pt` appears in the kernel command line, e.g. via `cat /proc/cmdline`):
```
BOOT_IMAGE=(hd0,gpt3)/vmlinuz-4.18.0-372.9.1.el8.x86_64 root=UUID=80974f58-7d23-49bb-bd8b-8e299eb0d188 ro crashkernel=auto rhgb quiet systemd.unified_cgroup_hierachy=1 systemd.unified_cgroup_hierarchy=1 amd_iommu=on iommu=pt
```
`Configure the following inside the container started in a later step:`
```
apt update
apt install openssh-server -y
```
Edit `/etc/ssh/sshd_config` and set `PermitRootLogin` to `yes`:
```
# Uncomment the following four lines
RSAAuthentication yes                       # enable RSA authentication
PubkeyAuthentication yes                    # enable public/private-key authentication
AuthorizedKeysFile ~/.ssh/authorized_keys   # path of the public-key file (the file populated below)
PermitRootLogin yes                         # allow root to log in via ssh
```
Restart the ssh service and enable it at boot:
```
service sshd restart
chkconfig sshd on
# check sshd status:     service ssh status
# start the sshd service: /etc/init.d/ssh restart
```
Next, set up the keys for password-less communication between the nodes:
1. Generate a key pair with `ssh-keygen`
```
ssh-keygen -t ed25519 # ed25519 is used here as an example; you may pick another key type or file name. Press Enter at every prompt to accept the defaults.
```
2. Collect the public keys from every node that will be used and copy them into each node's `~/.ssh/authorized_keys`, so that every node ends up with the same complete set of keys. The format looks similar to this:
<div align=center>
<img src="./doc/id_rsa.png"/>
</div>
3. Set the communication port between nodes
```
/usr/sbin/sshd -p 10085 # different nodes may use different ports; once the keys and ports are set up, verify connectivity with something like `ssh -p <port>`, otherwise re-check the previous steps.
```
The steps above are not a standardized procedure; they differ noticeably between servers and clusters and cannot be copied verbatim, so adapt them to your own machines. The overall goal is to enable `amd_iommu` and to allow password-less ssh login between the containers on different nodes.
Part 2: Ray-related communication
`Configure the following inside the container started in a later step:`
```
vim ~/.bashrc
```
Append the following commands to the end of `.bashrc` (using a cluster with BW200 cards as an example):
```
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_HOST_IP=x.x.x.x
export NCCL_SOCKET_IFNAME=enp33s0f3u1
export GLOO_SOCKET_IFNAME=enp33s0f3u1
unset NCCL_ALGO
export NCCL_MIN_NCHANNELS=16
export NCCL_MAX_NCHANNELS=16
export NCCL_NET_GDR_READ=1
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_SPEC_DECODE_EAGER=1
export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1
# For BW cards, also add the following:
export NCCL_NET_GDR_LEVEL=7
export NCCL_SDMA_COPY_ENABLE=0
export NCCL_IB_HCA=mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1,mlx5_9:1
# For K100_AI cards, add the following instead (commented out here because this walkthrough uses BW cards as the example):
# export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
```
`VLLM_HOST_IP` and `NCCL_SOCKET_IFNAME` must be replaced with the values found on each of your own machines; every node has a different IP. Query them as follows:
```
# Find the interface and IP with: ifconfig
# VLLM_HOST_IP: the node's local communication IP
# NCCL_SOCKET_IFNAME and GLOO_SOCKET_IFNAME: the node's local network interface name
```
`Example:`
<div align=center>
<img src="./doc/ip.png"/>
</div>
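If you prefer to look these values up programmatically instead of reading `ifconfig` output, a small helper along the following lines can print candidate interface names and IPv4 addresses. This is only a convenience sketch and assumes the `psutil` package is available (`pip install psutil`):
```
import socket
import psutil  # assumed available: pip install psutil

# Print each interface name with its IPv4 address; pick the address on your cluster
# network for VLLM_HOST_IP, and its name for NCCL_SOCKET_IFNAME / GLOO_SOCKET_IFNAME.
for ifname, addrs in psutil.net_if_addrs().items():
    for addr in addrs:
        if addr.family == socket.AF_INET:
            print(f"{ifname}: {addr.address}")
```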
On clusters with BW cards, `VLLM_HOST_IP` must be set to the IP of the ib (InfiniBand) interface to avoid RCCL timeouts:
<div align=center>
<img src="./doc/ip_bw.png"/>
</div>
Note: after adding the settings above you must exit the container, restart it, and re-enter it before they take effect; otherwise NCCL communication errors will appear later. Restart the container with:
```
docker restart minimax # required
```
`Tip: communication setup is normally handled by operations staff and may be unfamiliar to others; we recommend asking your ops team to perform the configuration above.`
### Docker (Method 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.8.5-ubuntu22.04-dtk25.04.1-rc4-das1.6-py3.10-20250620-fixpy
# replace <your IMAGE ID> with the ID of the image pulled above; for this image it is e99e26dbb33b
docker run -it --shm-size=192G --network=host --ipc=host -p 8000:8000 -v $PWD/MiniMax-M1:/home/MiniMax-M1 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri --group-add video --name minimax <your IMAGE ID> bash
```
### Dockerfile (Method 2)
```
cd /home/MiniMax-M1/docker
docker build --no-cache -t minimax:latest .
docker run --shm-size=192G --name minimax --network=host --ipc=host -p 8000:8000 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri --group-add video -v $PWD/../../MiniMax-M1:/home/MiniMax-M1 -it minimax bash
# If installing the environment through the Dockerfile takes too long, comment out the pip install inside the Dockerfile and install the Python packages after the container starts: pip install -r requirements.txt
```
### Anaconda (Method 3)
1. The DCU-specific deep-learning libraries required by this project can be downloaded from the 光合 developer community:
- https://developer.hpccube.com/tool/
```
DTK driver:   dtk2504
python:       python3.10
torch:        2.4.1
torchvision:  0.19.1
triton:       3.0.0
vllm:         0.8.5
flash-attn:   2.6.1
deepspeed:    0.14.2
apex:         1.4.0
transformers: 4.51.1
```
`Tip: the versions of the DTK driver, python, torch, and the other DCU-related tools above must correspond exactly.`
2. Install the remaining, non-DCU-specific libraries according to requirements.txt
`None`
## Dataset
`None`
## Training
`None`
## Inference
Directory layout for the pretrained weights:
```
/home/MiniMax-M1/
└── MiniMax/MiniMax-M1-40k
```
After downloading the weights, change `architectures` in `MiniMax/MiniMax-M1-40k/config.json` to:
```
"architectures": [
"MiniMaxText01ForCausalLM"
],
```
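The same edit can be applied programmatically. A minimal sketch, assuming the weights sit at the path implied by the directory layout above:
```
import json
from pathlib import Path

cfg_path = Path("/home/MiniMax-M1/MiniMax/MiniMax-M1-40k/config.json")
cfg = json.loads(cfg_path.read_text())

# Point the checkpoint at the model class expected by this vllm setup.
cfg["architectures"] = ["MiniMaxText01ForCausalLM"]

cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False))
print("architectures ->", cfg["architectures"])
```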
### Multi-Node Multi-Card
```
cd /home/MiniMax-M1
# Start ray
ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=16 # start ray on the head node; x.x.x.x is the head node's IP (VLLM_HOST_IP) found earlier with ifconfig.
ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=16 # start ray on the other nodes; x.x.x.x is the head node's IP (VLLM_HOST_IP) found earlier with ifconfig.
# Use `ray status` to check the status of the ray cluster.
# This project uses MiniMax-M1-40k as the example; other MiniMax-M1 models work the same way, and MiniMax-M1-80k needs more cards.
# Method 1: vllm online inference
export SAFETENSORS_FAST_GPU=1
export VLLM_USE_V1=0
# Start the server
vllm serve MiniMax/MiniMax-M1-40k --distributed-executor-backend ray --host 0.0.0.0 --port 8000 --tensor-parallel-size 16 --max_model_len 4096 --dtype bfloat16 --enforce-eager --gpu-memory-utilization 0.99 --trust-remote-code
# Example client test command:
curl http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "MiniMax/MiniMax-M1-40k",
"messages": [
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
{"role": "user", "content": [{"type": "text", "text": "美国的国土面积多大?"}]}
]
}'
# Method 2: vllm offline inference
python infer_vllm.py # MiniMax-M1-40k is used as the example
# For the error: AttributeError: 'NoneType' object has no attribute 'info'
# comment out the logger call in the original code at /usr/local/lib/python3.10/dist-packages/vllm/executor/ray_distributed_executor.py, line 127
```
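Since `vllm serve` exposes an OpenAI-compatible API, the same test can also be issued from Python. A minimal sketch, assuming the `openai` client package is installed and the server from the step above is reachable on port 8000:
```
from openai import OpenAI

# The vllm server does not validate the API key, but the client requires a value.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="MiniMax/MiniMax-M1-40k",
    messages=[
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
        {"role": "user", "content": [{"type": "text", "text": "美国的国土面积多大?"}]},
    ],
    temperature=1.0,
    top_p=0.95,
)
print(response.choices[0].message.content)
```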
For more information, see [`README_orgin`](./README_orgin.md) from the upstream project.
## Result
Example of vllm inference output:
`Input:`
```
prompt: "美国的国土面积多大?"
```
`Output:`
```
Generated text: '<think>\n好的,我现在需要回答用户的问题:“美国的国土面积多大?”首先,我得确认自己对这个问题的了解程度。我记得美国是世界上面积较大的国家之一,但具体数字可能记不太准了。可能需要查证一下。\n\n首先,我应该回想一下美国的基本地理知识。美国位于北美洲,东临大西洋,西接太平洋,北边加拿大,南接墨西哥。国土面积包括本土的48个州和阿拉斯加、夏威夷两个州,以及一些海外领土。不过通常所说的国土面积可能指的是陆地和水域的总面积,或者只是陆地面积?\n\n接下来,我需要确定正确的单位。通常国 国土面积会用平方公里或者平方英里来表示。比如,中国是约960万平方公里,美国可能比中国小一些?或者更大?我记得之前学过的数据可能有些混淆,需要确认。\n\n然后,可能需要考虑不同的数据来源是否一致。比如维基百科的数据,或者其他权威网站的数据。另外,是否 有最新的数据,因为有时候国家的面积可能会有调整,比如通过领土争端解决或者测量技术的改进。\n\n另外,需要注意美国国土面积的组成部分。比如,本土48州的面积,阿拉斯加的面积,夏威夷的面积,以及其他海外领土如波多黎各、关岛等的面积是否包含在内。通常来说 说,国土面积可能指的是总领土,包括所有州和领土,但有时候可能只算主要部分。\n\n比如,根据我之前的记忆,美国的面积大约是9,833,517平方公里,或者约3,796,742平方英里。但不确定这个数字是否准确。或者可能更接近9.6 million平方公里?需要核实。\n\n另外,可 能存在不同的统计方式,比如总土地面积、水域面积等。例如,美国的陆地面积可能比总国土面积小,因为包括内陆水域如五大湖等。\n\n现在,我需要找到可靠的数据来源。比如,可以查阅世界银行的数据,或者美国政府官方网站的数据,或者权威的地理数据库。\n\n根据世 世界银行的数据,美国的国土面积是9,833,517平方公里。而中国的面积是9,388,210平方公里(可能不包括台湾?)。所以美国比中国稍大?或者可能我的记忆有误?\n\n不过,可能存在不同的数据来源,比如有的资料说美国面积约9.6百万平方公里,而中国约9.6百万,但可能统 统计方式不同。需要确认。\n\n另外,需要注意单位转换是否正确。比如,1平方英里等于2.58999平方公里。所以如果美国的面积是3,796,742平方英里,那么换算成平方公里就是3,796,742 × 2.58999 ≈ 9,833,517平方公里,这和世界银行的数据一致。\n\n所以,正确的答案应该 该是大约9,833'
```
### Accuracy
The accuracy on DCU is consistent with GPU; inference framework: vllm.
## Application Scenarios
### Algorithm Category
`Conversational Q&A`
### Key Application Industries
`Manufacturing, broadcast media, finance, energy, healthcare, smart home, education`
## Pretrained Weights
HuggingFace download: [MiniMaxAI/MiniMax-M1-40k](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k)
## Source Repository & Issue Reporting
- http://developer.sourcefind.cn/codes/modelzoo/MiniMax-M1_vllm.git
## References
- https://github.com/MiniMax-AI/MiniMax-M1.git
<div align="center">
<picture>
<source srcset="figures/MiniMaxLogo-Dark.png" media="(prefers-color-scheme: dark)">
<img src="figures/MiniMaxLogo-Light.png" width="60%" alt="MiniMax">
</source>
</picture>
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.minimax.io" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://img.shields.io/badge/_Homepage-MiniMax-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgNDkwLjE2IDQxMS43Ij48ZGVmcz48c3R5bGU+LmNscy0xe2ZpbGw6I2ZmZjt9PC9zdHlsZT48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMjMzLjQ1LDQwLjgxYTE3LjU1LDE3LjU1LDAsMSwwLTM1LjEsMFYzMzEuNTZhNDAuODIsNDAuODIsMCwwLDEtODEuNjMsMFYxNDVhMTcuNTUsMTcuNTUsMCwxLDAtMzUuMDksMHY3OS4wNmE0MC44Miw0MC44MiwwLDAsMS04MS42MywwVjE5NS40MmExMS42MywxMS42MywwLDAsMSwyMy4yNiwwdjI4LjY2YTE3LjU1LDE3LjU1LDAsMCwwLDM1LjEsMFYxNDVBNDAuODIsNDAuODIsMCwwLDEsMTQwLDE0NVYzMzEuNTZhMTcuNTUsMTcuNTUsMCwwLDAsMzUuMSwwVjIxNy41aDBWNDAuODFhNDAuODEsNDAuODEsMCwxLDEsODEuNjIsMFYyODEuNTZhMTEuNjMsMTEuNjMsMCwxLDEtMjMuMjYsMFptMjE1LjksNjMuNEE0MC44Niw0MC44NiwwLDAsMCw0MDguNTMsMTQ1VjMwMC44NWExNy41NSwxNy41NSwwLDAsMS0zNS4wOSwwdi0yNjBhNDAuODIsNDAuODIsMCwwLDAtODEuNjMsMFYzNzAuODlhMTcuNTUsMTcuNTUsMCwwLDEtMzUuMSwwVjMzMGExMS42MywxMS42MywwLDEsMC0yMy4yNiwwdjQwLjg2YTQwLjgxLDQwLjgxLDAsMCwwLDgxLjYyLDBWNDAuODFhMTcuNTUsMTcuNTUsMCwwLDEsMzUuMSwwdjI2MGE0MC44Miw0MC44MiwwLDAsMCw4MS42MywwVjE0NWExNy41NSwxNy41NSwwLDEsMSwzNS4xLDBWMjgxLjU2YTExLjYzLDExLjYzLDAsMCwwLDIzLjI2LDBWMTQ1QTQwLjg1LDQwLjg1LDAsMCwwLDQ0OS4zNSwxMDQuMjFaIi8+PC9zdmc+&logoWidth=20" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://arxiv.org/abs/2506.13585" target="_blank" style="margin: 2px;">
<img alt="Paper" src="https://img.shields.io/badge/📖_Paper-MiniMax--M1-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://chat.minimax.io/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/_MiniMax_Chat-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgNDkwLjE2IDQxMS43Ij48ZGVmcz48c3R5bGU+LmNscy0xe2ZpbGw6I2ZmZjt9PC9zdHlsZT48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMjMzLjQ1LDQwLjgxYTE3LjU1LDE3LjU1LDAsMSwwLTM1LjEsMFYzMzEuNTZhNDAuODIsNDAuODIsMCwwLDEtODEuNjMsMFYxNDVhMTcuNTUsMTcuNTUsMCwxLDAtMzUuMDksMHY3OS4wNmE0MC44Miw0MC44MiwwLDAsMS04MS42MywwVjE5NS40MmExMS42MywxMS42MywwLDAsMSwyMy4yNiwwdjI4LjY2YTE3LjU1LDE3LjU1LDAsMCwwLDM1LjEsMFYxNDVBNDAuODIsNDAuODIsMCwwLDEsMTQwLDE0NVYzMzEuNTZhMTcuNTUsMTcuNTUsMCwwLDAsMzUuMSwwVjIxNy41aDBWNDAuODFhNDAuODEsNDAuODEsMCwxLDEsODEuNjIsMFYyODEuNTZhMTEuNjMsMTEuNjMsMCwxLDEtMjMuMjYsMFptMjE1LjksNjMuNEE0MC44Niw0MC44NiwwLDAsMCw0MDguNTMsMTQ1VjMwMC44NWExNy41NSwxNy41NSwwLDAsMS0zNS4wOSwwdi0yNjBhNDAuODIsNDAuODIsMCwwLDAtODEuNjMsMFYzNzAuODlhMTcuNTUsMTcuNTUsMCwwLDEtMzUuMSwwVjMzMGExMS42MywxMS42MywwLDEsMC0yMy4yNiwwdjQwLjg2YTQwLjgxLDQwLjgxLDAsMCwwLDgxLjYyLDBWNDAuODFhMTcuNTUsMTcuNTUsMCwwLDEsMzUuMSwwdjI2MGE0MC44Miw0MC44MiwwLDAsMCw4MS42MywwVjE0NWExNy41NSwxNy41NSwwLDEsMSwzNS4xLDBWMjgxLjU2YTExLjYzLDExLjYzLDAsMCwwLDIzLjI2LDBWMTQ1QTQwLjg1LDQwLjg1LDAsMCwwLDQ0OS4zNSwxMDQuMjFaIi8+PC9zdmc+&logoWidth=20" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://www.minimax.io/platform" style="margin: 2px;">
<img alt="API" src="https://img.shields.io/badge/⚡_API-Platform-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-MCP" style="margin: 2px;">
<img alt="MCP" src="https://img.shields.io/badge/🚀_MCP-MiniMax_MCP-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/MiniMaxAI" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/🤗_Hugging_Face-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-M1" target="_blank" style="margin: 2px;">
<img alt="GitHub" src="https://img.shields.io/badge/🐙_GitHub-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://www.modelscope.cn/organization/MiniMax" target="_blank" style="margin: 2px;">
<img alt="ModelScope" src="https://img.shields.io/badge/🤖️_ModelScope-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-M1/blob/main/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/⚖️_License-Apache_2.0-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-01/blob/main/figures/wechat-qrcode.jpeg" target="_blank" style="margin: 2px;">
<img alt="WeChat" src="https://img.shields.io/badge/💬_WeChat-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
# MiniMax-M1
## 1. Model Overview
We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning
attention mechanism. The model is developed based on our previous [MiniMax-Text-01 model](https://huggingface.co/MiniMaxAI/MiniMax-Text-01),
which contains a total of 456 billion parameters with 45.9 billion parameters activated
per token. Consistent with MiniMax-Text-01, the M1 model natively supports a context length of 1
million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism
in MiniMax-M1 enables efficient scaling of test-time compute – For example, compared to DeepSeek
R1, M1 consumes 25% of the FLOPs at a generation length of 100K tokens. These properties make M1
particularly suitable for complex tasks that require processing long inputs and thinking extensively.
MiniMax-M1 is trained using large-scale reinforcement learning (RL) on diverse problems ranging from
traditional mathematical reasoning to sandbox-based, real-world software engineering environments.
We develop an efficient RL scaling framework for M1 highlighting two perspectives: (1) We propose
CISPO, a novel algorithm that clips importance sampling weights instead of token updates, which
outperforms other competitive RL variants; (2) Our hybrid-attention design naturally enhances the
efficiency of RL, where we address unique challenges when scaling RL with the hybrid architecture. We
train two versions of MiniMax-M1 models with [40K](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k) and
[80K](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k) thinking budgets respectively. Experiments
on standard benchmarks show that our models outperform other strong open-weight models such as
the original DeepSeek-R1 and Qwen3-235B, particularly on complex software engineering, tool using,
and long context tasks. With efficient scaling of test-time compute, MiniMax-M1 serves as a strong
foundation for next-generation language model agents to reason and tackle real-world challenges.
<p align="center">
<img width="100%" src="figures/TextBench.png">
<br>
<small><em>Benchmark performance comparison of leading commercial and open-weight models across competition-level mathematics, coding, software engineering, agentic tool use, and long-context understanding tasks. We use the MiniMax-M1-80k model here for MiniMax-M1.</em></small>
</p>
## 2. Evaluation
**Performance of MiniMax-M1 on core benchmarks.**
| **Category** | **Task** | **MiniMax-M1-80K** | **MiniMax-M1-40K** | **Qwen3-235B-A22B** | **DeepSeek-R1-0528** | **DeepSeek-R1** | **Seed-Thinking-v1.5** | **Claude 4 Opus** | **Gemini 2.5 Pro (06-05)** | **OpenAI-o3** |
|:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| | *Extended Thinking* | *80K* | *40K* | *32k* | *64k* | *32k* | *32k* | *64k* | *64k* | *100k* |
| ***Mathematics*** | AIME 2024 | 86.0 | 83.3 | 85.7 | 91.4 | 79.8 | 86.7 | 76.0 | 92.0 | 91.6 |
| | AIME 2025 | 76.9 | 74.6 | 81.5 | 87.5 | 70.0 | 74.0 | 75.5 | 88.0 | 88.9 |
| | MATH-500 | 96.8 | 96.0 | 96.2 | 98.0 | 97.3 | 96.7 | 98.2 | 98.8 | 98.1 |
| ***General Coding*** | LiveCodeBench *(24/8~25/5)* | 65.0 | 62.3 | 65.9 | 73.1 | 55.9 | 67.5 | 56.6 | 77.1 | 75.8 |
| | FullStackBench | 68.3 | 67.6 | 62.9 | 69.4 | 70.1 | 69.9 | 70.3 | -- | 69.3 |
| ***Reasoning & Knowledge***| GPQA Diamond | 70.0 | 69.2 | 71.1 | 81.0 | 71.5 | 77.3 | 79.6 | 86.4 | 83.3 |
| | HLE *(no tools)* | 8.4\* | 7.2\* | 7.6\* | 17.7\* | 8.6\* | 8.2 | 10.7 | 21.6 | 20.3 |
| | ZebraLogic | 86.8 | 80.1 | 80.3 | 95.1 | 78.7 | 84.4 | 95.1 | 91.6 | 95.8 |
| | MMLU-Pro | 81.1 | 80.6 | 83.0 | 85.0 | 84.0 | 87.0 | 85.0 | 86.0 | 85.0 |
| ***Software Engineering***| SWE-bench Verified| 56.0 | 55.6 | 34.4 | 57.6 | 49.2 | 47.0 | 72.5 | 67.2 | 69.1 |
| ***Long Context*** | OpenAI-MRCR *(128k)* | 73.4 | 76.1 | 27.7 | 51.5 | 35.8 | 54.3 | 48.9 | 76.8 | 56.5 |
| | OpenAI-MRCR *(1M)* | 56.2 | 58.6 | -- | -- | -- | -- | -- | 58.8 | -- |
| | LongBench-v2 | 61.5 | 61.0 | 50.1 | 52.1 | 58.3 | 52.5 | 55.6 | 65.0 | 58.8 |
| ***Agentic Tool Use***| TAU-bench *(airline)* | 62.0 | 60.0 | 34.7 | 53.5 | -- | 44.0 | 59.6 | 50.0 | 52.0 |
| | TAU-bench *(retail)* | 63.5 | 67.8 | 58.6 | 63.9 | -- | 55.7 | 81.4 | 67.0 | 73.9 |
| ***Factuality*** | SimpleQA | 18.5 | 17.9 | 11.0 | 27.8 | 30.1 | 12.9 | -- | 54.0 | 49.4 |
| ***General Assistant***| MultiChallenge | 44.7 | 44.7 | 40.0 | 45.0 | 40.7 | 43.0 | 45.8 | 51.8 | 56.5 |
\* conducted on the text-only HLE subset.
Our models are evaluated with `temperature=1.0`, `top_p=0.95`.
### SWE-bench methodology
We report results derived from the Agentless scaffold. Departing from the original pipeline, our methodology employs a two-stage localization process (without any embedding-based retrieval mechanisms): initial coarse-grained file localization followed by fine-grained localization to specific files and code elements. The values for our models are calculated on the subset of n=486 verified tasks which work on our infrastructure. The excluded 14 test cases that were incompatible with our internal infrastructure are:
`"astropy__astropy-7606"`,
`"astropy__astropy-8707"`,
`"astropy__astropy-8872"`,
`"django__django-10097"`,
`"matplotlib__matplotlib-20488"`,
`"psf__requests-2317"`,
`"psf__requests-2931"`,
`"psf__requests-5414"`,
`"pylint-dev__pylint-6528"`,
`"pylint-dev__pylint-7277"`,
`"sphinx-doc__sphinx-10435"`,
`"sphinx-doc__sphinx-7985"`,
`"sphinx-doc__sphinx-8269"`,
`"sphinx-doc__sphinx-8475"`
### TAU-bench methodology
We evaluate TAU-Bench with GPT-4.1 as user model and without any custom tools. The maximum number of interaction steps is 40.
Our general system prompt is:
```
- In each round, you need to carefully examine the tools provided to you to determine if any can be used.
- You must adhere to all of the policies. Pay attention to the details in the terms. Solutions for most situations can be found within these policies.
```
## 3. Recommendations for Minimax-M1 Model Usage
To achieve the best results with the Minimax-M1 model, we suggest focusing on two key points: Inference Parameters and the System Prompt.
### 3.1. Inference Parameters
- Temperature: **`1.0`**
- Top_p: **`0.95`**
This setting is optimal for encouraging creativity and diversity in the model's responses. It allows the model to explore a wider range of linguistic possibilities, preventing outputs that are too rigid or repetitive, while still maintaining strong logical coherence.
### 3.2. System Prompt
Tailoring your system prompt to the specific task is crucial for guiding the model effectively. Below are suggested settings for different scenarios.
#### A. General-Purpose Scenarios
For common tasks like summarization, translation, Q&A, or creative writing:
```
You are a helpful assistant.
```
#### B. Web Development Scenarios
For complex tasks like generating code for web pages:
```
You are a web development engineer, writing web pages according to the instructions below. You are a powerful code editing assistant capable of writing code and creating artifacts in conversations with users, or modifying and updating existing artifacts as requested by users.
All code is written in a single code block to form a complete code file for display, without separating HTML and JavaScript code. An artifact refers to a runnable complete code snippet, you prefer to integrate and output such complete runnable code rather than breaking it down into several code blocks. For certain types of code, they can render graphical interfaces in a UI window. After generation, please check the code execution again to ensure there are no errors in the output.
Output only the HTML, without any additional descriptive text. Make the UI looks modern and beautiful.
```
#### C. Mathematical Scenarios
When dealing with problems that require calculation or logical deduction:
```
Please reason step by step, and put your final answer within \boxed{}.
```
## 4. Deployment Guide
Download the model from HuggingFace repository:
- [MiniMax-M1-40k](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k)
- [MiniMax-M1-80k](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k)
For production deployment, we recommend using [vLLM](https://docs.vllm.ai/en/latest/) to serve MiniMax-M1. vLLM provides excellent performance for serving large language models with the following features:
- 🔥 Outstanding serving throughput performance
- ⚡ Efficient and intelligent memory management
- 📦 Powerful batch request processing capability
- ⚙️ Deeply optimized underlying performance
For detailed vLLM deployment instructions, please refer to our [vLLM Deployment Guide](./docs/vllm_deployment_guide.md).
Alternatively, you can also deploy using Transformers directly. For detailed Transformers deployment instructions, you can see our [MiniMax-M1 Transformers Deployment Guide](./docs/transformers_deployment_guide.md).
## 5. Function Calling
The MiniMax-M1 model supports function calling capabilities, enabling the model to identify when external functions need to be called and output function call parameters in a structured format. [MiniMax-M1 Function Call Guide](./docs/function_call_guide.md) provides detailed instructions on how to use the function calling feature of MiniMax-M1.
## 6. Chatbot & API
For general use and evaluation, we provide a [Chatbot](https://chat.minimax.io/) with online search capabilities and the [online API](https://www.minimax.io/platform/) for developers. For general use and evaluation, we provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP) with video generation, image generation, speech synthesis, and voice cloning for developers.
## 7. Citation
```
@misc{minimax2025minimaxm1scalingtesttimecompute,
title={MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention},
author={MiniMax},
year={2025},
eprint={2506.13585},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.13585},
}
```
## 8. Contact Us
Contact us at [model@minimax.io](mailto:model@minimax.io).
{
"architectures": [
"MiniMaxM1ForCausalLM"
],
"attention_dropout": 0.0,
"attn_type_list": [
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1
],
"auto_map": {
"AutoConfig": "configuration_minimax_m1.MiniMaxM1Config",
"AutoModelForCausalLM": "modeling_minimax_m1.MiniMaxM1ForCausalLM"
},
"bos_token_id": null,
"eos_token_id": null,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 6144,
"initializer_range": 0.02,
"intermediate_size": 9216,
"layernorm_full_attention_alpha": 3.5565588200778455,
"layernorm_full_attention_beta": 1.0,
"layernorm_linear_attention_alpha": 3.5565588200778455,
"layernorm_linear_attention_beta": 1.0,
"layernorm_mlp_alpha": 3.5565588200778455,
"layernorm_mlp_beta": 1.0,
"max_position_embeddings": 10240000,
"model_type": "minimax_m1",
"num_attention_heads": 64,
"num_experts_per_tok": 2,
"num_hidden_layers": 80,
"num_key_value_heads": 8,
"num_local_experts": 32,
"output_router_logits": false,
"postnorm": true,
"rms_norm_eps": 1e-05,
"rope_theta": 10000000,
"rotary_dim": 64,
"router_aux_loss_coef": 0.001,
"router_jitter_noise": 0.0,
"shared_intermediate_size": 0,
"shared_moe_mode": "sigmoid",
"sliding_window": null,
"tie_word_embeddings": false,
"transformers_version": "4.45.2",
"use_cache": true,
"vocab_size": 200064
}
""" MiniMaxM1 model configuration"""
from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging
logger = logging.get_logger(__name__)
class MiniMaxM1Config(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`MiniMaxM1Model`]. It is used to instantiate a
MiniMaxM1 model according to the specified arguments, defining the model architecture. Instantiating a configuration
with the defaults will yield a similar configuration to that of the MiniMaxM1.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 32000):
Vocabulary size of the MiniMaxM1 model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`MiniMaxM1Model`]
hidden_size (`int`, *optional*, defaults to 4096):
Dimension of the hidden representations.
intermediate_size (`int`, *optional*, defaults to 14336):
Dimension of the MLP representations.
num_hidden_layers (`int`, *optional*, defaults to 32):
Number of hidden layers in the Transformer encoder.
num_attention_heads (`int`, *optional*, defaults to 32):
Number of attention heads for each attention layer in the Transformer encoder.
num_key_value_heads (`int`, *optional*, defaults to 8):
This is the number of key_value heads that should be used to implement Grouped Query Attention. If
`num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
`num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA is used. When
converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
by meanpooling all the original heads within that group. For more details, check out [this
paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to `8`.
hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
The non-linear activation function (function or string) in the decoder.
max_position_embeddings (`int`, *optional*, defaults to `4096*32`):
The maximum sequence length that this model might ever be used with. MiniMaxM1's sliding window attention
allows sequences of up to 4096*32 tokens.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
rms_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the rms normalization layers.
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Only
relevant if `config.is_decoder=True`.
pad_token_id (`int`, *optional*):
The id of the padding token.
bos_token_id (`int`, *optional*, defaults to 1):
The id of the "beginning-of-sequence" token.
eos_token_id (`int`, *optional*, defaults to 2):
The id of the "end-of-sequence" token.
tie_word_embeddings (`bool`, *optional*, defaults to `False`):
Whether the model's input and output word embeddings should be tied.
rope_theta (`float`, *optional*, defaults to 1000000.0):
The base period of the RoPE embeddings.
sliding_window (`int`, *optional*):
Sliding window attention window size. If not specified, will default to `4096`.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
num_experts_per_tok (`int`, *optional*, defaults to 2):
The number of experts to route per-token, can be also interpreted as the `top-k` routing
parameter
num_local_experts (`int`, *optional*, defaults to 8):
Number of experts per Sparse MLP layer.
output_router_logits (`bool`, *optional*, defaults to `False`):
Whether or not the router logits should be returned by the model. Enabling this will also
allow the model to output the auxiliary loss. See [here]() for more details
router_aux_loss_coef (`float`, *optional*, defaults to 0.001):
The aux loss factor for the total loss.
router_jitter_noise (`float`, *optional*, defaults to 0.0):
Amount of noise to add to the router.
```python
>>> from transformers import MiniMaxM1Model, MiniMaxM1Config
>>> # Initializing a MiniMaxM1 style configuration
>>> configuration = MiniMaxM1Config()
>>> # Initializing a model from the MiniMaxM1 style configuration
>>> model = MiniMaxM1Model(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "MiniMaxM1"
keys_to_ignore_at_inference = ["past_key_values"]
def __init__(
self,
vocab_size=32000,
hidden_size=4096,
intermediate_size=14336,
num_hidden_layers=32,
num_attention_heads=32,
num_key_value_heads=8,
hidden_act="silu",
max_position_embeddings=4096 * 32,
initializer_range=0.02,
rms_norm_eps=1e-5,
use_cache=True,
pad_token_id=None,
bos_token_id=None,
eos_token_id=None,
tie_word_embeddings=False,
rope_theta=1e6,
sliding_window=None,
attention_dropout=0.0,
num_experts_per_tok=2,
num_local_experts=8,
output_router_logits=False,
router_aux_loss_coef=0.001,
router_jitter_noise=0.0,
**kwargs,
):
self.vocab_size = vocab_size
self.max_position_embeddings = max_position_embeddings
self.hidden_size = hidden_size
self.intermediate_size = intermediate_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.sliding_window = sliding_window
# for backward compatibility
if num_key_value_heads is None:
num_key_value_heads = num_attention_heads
self.num_key_value_heads = num_key_value_heads
self.hidden_act = hidden_act
self.initializer_range = initializer_range
self.rms_norm_eps = rms_norm_eps
self.use_cache = use_cache
self.rope_theta = rope_theta
self.attention_dropout = attention_dropout
self.num_experts_per_tok = num_experts_per_tok
self.num_local_experts = num_local_experts
self.output_router_logits = output_router_logits
self.router_aux_loss_coef = router_aux_loss_coef
self.router_jitter_noise = router_jitter_noise
super().__init__(
pad_token_id=pad_token_id,
bos_token_id=bos_token_id,
eos_token_id=eos_token_id,
tie_word_embeddings=tie_word_embeddings,
**kwargs,
)
FROM image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.8.5-ubuntu22.04-dtk25.04.1-rc4-das1.6-py3.10-20250620-fixpy
ENV DEBIAN_FRONTEND=noninteractive
# RUN yum update && yum install -y git cmake wget build-essential
# RUN source /opt/dtk-dtk25.04.1/env.sh
# Install pip-related dependencies
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
# MiniMax-M1 Function Call Guide
[FunctionCall中文使用指南](./function_call_guide_cn.md)
## 📖 Introduction
The MiniMax-M1 model supports function calling capabilities, enabling the model to identify when external functions need to be called and output function call parameters in a structured format. This document provides detailed instructions on how to use the function calling feature of MiniMax-M1.
## 🚀 Quick Start
### Using Chat Template
MiniMax-M1 uses a specific chat template format to handle function calls. The chat template is defined in `tokenizer_config.json`, and you can use it in your code through the template.
```python
from transformers import AutoTokenizer
def get_default_tools():
    return [
        {
            "name": "get_current_weather",
            "description": "Get the latest weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "A certain city, such as Beijing, Shanghai"
                    }
                },
                "required": ["location"]
            }
        }
    ]
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "What's the weather like in Shanghai today?"
messages = [
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by Minimax based on MiniMax-M1 model."}]},
{"role": "user", "content": [{"type": "text", "text": prompt}]},
]
# Enable function call tools
tools = get_default_tools()
# Apply chat template and add tool definitions
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
tools=tools
)
```
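The templated string `text` still has to be sent to an inference backend. As a rough illustration (the endpoint, port, and `max_tokens` value are assumptions, not part of this guide), you could pass it to a vLLM server's OpenAI-compatible completions endpoint:
```python
from openai import OpenAI

# Assumed local vLLM server; adjust base_url and model to your own deployment.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model=model_id,   # the same identifier the server was launched with
    prompt=text,      # the chat-template output built above
    max_tokens=512,
    temperature=1.0,
    top_p=0.95,
)
print(completion.choices[0].text)
```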
## 🛠️ Function Call Definition
### Function Structure
Function calls need to be defined in the `tools` field of the request body. Each function consists of the following components:
```json
{
"tools": [
{
"name": "search_web",
"description": "Search function.",
"parameters": {
"properties": {
"query_list": {
"description": "Keywords for search, with list element count of 1.",
"items": { "type": "string" },
"type": "array"
},
"query_tag": {
"description": "Classification of the query",
"items": { "type": "string" },
"type": "array"
}
},
"required": [ "query_list", "query_tag" ],
"type": "object"
}
}
]
}
```
**Field Descriptions:**
- `name`: Function name
- `description`: Function description
- `parameters`: Function parameter definition
- `properties`: Parameter property definitions, where key is the parameter name and value contains detailed parameter description
- `required`: List of required parameters
- `type`: Parameter type (usually "object")
### Internal Model Processing Format
When processed internally by the model, function definitions are converted to a special format and concatenated to the input text:
```
]~!b[]~b]system ai_setting=MiniMax AI
MiniMax AI is an AI assistant independently developed by MiniMax. [e~[
]~b]system tool_setting=tools
You are provided with these tools:
<tools>
{"name": "search_web", "description": "Search function.", "parameters": {"properties": {"query_list": {"description": "Keywords for search, with list element count of 1.", "items": {"type": "string"}, "type": "array"}, "query_tag": {"description": "Classification of the query", "items": {"type": "string"}, "type": "array"}}, "required": ["query_list", "query_tag"], "type": "object"}}
</tools>
If you need to call tools, please respond with <tool_calls></tool_calls> XML tags, and provide tool-name and json-object of arguments, following the format below:
<tool_calls>
{"name": <tool-name>, "arguments": <args-json-object>}
...
</tool_calls>[e~[
]~b]user name=User
When were the most recent launch events for OpenAI and Gemini?[e~[
]~b]ai name=MiniMax AI
```
### Model Output Format
The model outputs function calls in the following format:
```xml
<think>
Okay, I will search for the OpenAI and Gemini latest release.
</think>
<tool_calls>
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"OpenAI\" \"latest\" \"release\""]}}
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"Gemini\" \"latest\" \"release\""]}}
</tool_calls>
```
## 📥 Function Call Result Processing
### Parsing Function Calls
You can use the following code to parse function calls from the model output:
```python
import re
import json
def parse_function_calls(content: str):
"""
Parse function calls from model output
"""
function_calls = []
# Match content within <tool_calls> tags
tool_calls_pattern = r"<tool_calls>(.*?)</tool_calls>"
tool_calls_match = re.search(tool_calls_pattern, content, re.DOTALL)
if not tool_calls_match:
return function_calls
tool_calls_content = tool_calls_match.group(1).strip()
# Parse each function call (one JSON object per line)
for line in tool_calls_content.split('\n'):
line = line.strip()
if not line:
continue
try:
# Parse JSON format function call
call_data = json.loads(line)
function_name = call_data.get("name")
arguments = call_data.get("arguments", {})
function_calls.append({
"name": function_name,
"arguments": arguments
})
print(f"Function call: {function_name}, Arguments: {arguments}")
except json.JSONDecodeError as e:
print(f"Parameter parsing failed: {line}, Error: {e}")
return function_calls
# Example: Handle weather query function
def execute_function_call(function_name: str, arguments: dict):
"""
Execute function call and return result
"""
if function_name == "get_current_weather":
location = arguments.get("location", "Unknown location")
# Build function execution result
return {
"role": "tool",
"name": function_name,
"content": json.dumps({
"location": location,
"temperature": "25",
"unit": "celsius",
"weather": "Sunny"
}, ensure_ascii=False)
}
elif function_name == "search_web":
query_list = arguments.get("query_list", [])
query_tag = arguments.get("query_tag", [])
# Simulate search results
return {
"role": "tool",
"name": function_name,
"content": f"Search keywords: {query_list}, Categories: {query_tag}\nSearch results: Relevant information found"
}
return None
```
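Putting the two helpers above together, an illustrative driver loop over a raw model completion could look like this (the `model_output` string is a made-up example):
```python
# Parse the completion, execute each tool call, and collect the tool messages
# that will be appended to the conversation history.
model_output = """<tool_calls>
{"name": "get_current_weather", "arguments": {"location": "Shanghai"}}
</tool_calls>"""

tool_messages = []
for call in parse_function_calls(model_output):
    result = execute_function_call(call["name"], call["arguments"])
    if result is not None:
        tool_messages.append(result)

print(tool_messages)
```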
### Returning Function Execution Results to the Model
After successfully parsing function calls, you should add the function execution results to the conversation history so that the model can access and utilize this information in subsequent interactions.
#### Single Result
If the model decides to call `search_web`, we suggest you return the function result in the following format, with the `name` field set to the specific tool name.
```json
{
"data": [
{
"role": "tool",
"name": "search_web",
"content": "search_result"
}
]
}
```
Corresponding model input format:
```
]~b]tool name=search_web
search_result[e~[
```
#### Multiple Results
If the model decides to call `search_web` and `get_current_weather` at the same time, we suggest returning the multiple function results in the following format, with the `name` field set to "tools" and the `content` field containing all of the results.
```json
{
"data": [
{
"role": "tool",
"name": "tools",
"content": "Tool name: search_web\nTool result: test_result1\n\nTool name: get_current_weather\nTool result: test_result2"
}
]
}
```
Corresponding model input format:
```
]~b]tool name=tools
Tool name: search_web
Tool result: test_result1
Tool name: get_current_weather
Tool result: test_result2[e~[
```
While we suggest following the formats above, as long as the model input is easy to understand, the specific values of `name` and `content` are entirely up to you as the caller.
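As a rough sketch of the full round trip (how you wire this up is entirely up to you; the snippet below builds on the Quick Start example and is an assumption, not a prescribed API), the tool result can be appended to the conversation and the chat template re-applied before the next generation:
```python
# Continue the Quick Start conversation: append the tool result, then rebuild
# the prompt for the next model call.
messages.append({
    "role": "tool",
    "name": "search_web",
    "content": "search_result",
})

next_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools,
)
```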
# MiniMax-M1 函数调用(Function Call)功能指南
## 📖 简介
MiniMax-M1 模型支持函数调用功能,使模型能够识别何时需要调用外部函数,并以结构化格式输出函数调用参数。本文档详细介绍了如何使用 MiniMax-M1 的函数调用功能。
## 🚀 快速开始
### 聊天模板使用
MiniMax-M1 使用特定的聊天模板格式处理函数调用。聊天模板定义在 `tokenizer_config.json` 中,你可以在代码中通过 template 来进行使用。
```python
from transformers import AutoTokenizer
def get_default_tools():
    return [
        {
            "name": "get_current_weather",
            "description": "Get the latest weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "A certain city, such as Beijing, Shanghai"
                    }
                },
                "required": ["location"]
            }
        }
    ]
# 加载模型和分词器
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "What's the weather like in Shanghai today?"
messages = [
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by Minimax based on MiniMax-M1 model."}]},
{"role": "user", "content": [{"type": "text", "text": prompt}]},
]
# 启用函数调用工具
tools = get_default_tools()
# 应用聊天模板,并加入工具定义
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
tools=tools
)
```
## 🛠️ 函数调用的定义
### 函数结构体
函数调用需要在请求体中定义 `tools` 字段,每个函数由以下部分组成:
```json
{
"tools": [
{
"name": "search_web",
"description": "搜索函数。",
"parameters": {
"properties": {
"query_list": {
"description": "进行搜索的关键词,列表元素个数为1。",
"items": { "type": "string" },
"type": "array"
},
"query_tag": {
"description": "query的分类",
"items": { "type": "string" },
"type": "array"
}
},
"required": [ "query_list", "query_tag" ],
"type": "object"
}
}
]
}
```
**字段说明:**
- `name`: 函数名称
- `description`: 函数功能描述
- `parameters`: 函数参数定义
- `properties`: 参数属性定义,key 是参数名,value 包含参数的详细描述
- `required`: 必填参数列表
- `type`: 参数类型(通常为 "object")
### 模型内部处理格式
在模型内部处理时,函数定义会被转换为特殊格式并拼接到输入文本中:
```
]~!b[]~b]system ai_setting=MiniMax AI
MiniMax AI是由上海稀宇科技有限公司(MiniMax)自主研发的AI助理。[e~[
]~b]system tool_setting=tools
You are provided with these tools:
<tools>
{"name": "search_web", "description": "搜索函数。", "parameters": {"properties": {"query_list": {"description": "进行搜索的关键词,列表元素个数为1。", "items": {"type": "string"}, "type": "array"}, "query_tag": {"description": "query的分类", "items": {"type": "string"}, "type": "array"}}, "required": ["query_list", "query_tag"], "type": "object"}}
</tools>
If you need to call tools, please respond with <tool_calls></tool_calls> XML tags, and provide tool-name and json-object of arguments, following the format below:
<tool_calls>
{"name": <tool-name>, "arguments": <args-json-object>}
...
</tool_calls>[e~[
]~b]user name=用户
OpenAI 和 Gemini 的最近一次发布会都是什么时候?[e~[
]~b]ai name=MiniMax AI
```
### 模型输出格式
模型会以以下格式输出函数调用:
```xml
<think>
Okay, I will search for the OpenAI and Gemini latest release.
</think>
<tool_calls>
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"OpenAI\" \"latest\" \"release\""]}}
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"Gemini\" \"latest\" \"release\""]}}
</tool_calls>
```
## 📥 函数调用结果处理
### 解析函数调用
您可以使用以下代码解析模型输出的函数调用:
```python
import re
import json
def parse_function_calls(content: str):
"""
解析模型输出中的函数调用
"""
function_calls = []
# 匹配 <tool_calls> 标签内的内容
tool_calls_pattern = r"<tool_calls>(.*?)</tool_calls>"
tool_calls_match = re.search(tool_calls_pattern, content, re.DOTALL)
if not tool_calls_match:
return function_calls
tool_calls_content = tool_calls_match.group(1).strip()
# 解析每个函数调用(每行一个JSON对象)
for line in tool_calls_content.split('\n'):
line = line.strip()
if not line:
continue
try:
# 解析JSON格式的函数调用
call_data = json.loads(line)
function_name = call_data.get("name")
arguments = call_data.get("arguments", {})
function_calls.append({
"name": function_name,
"arguments": arguments
})
print(f"调用函数: {function_name}, 参数: {arguments}")
except json.JSONDecodeError as e:
print(f"参数解析失败: {line}, 错误: {e}")
return function_calls
# 示例:处理天气查询函数
def execute_function_call(function_name: str, arguments: dict):
"""
执行函数调用并返回结果
"""
if function_name == "get_current_weather":
location = arguments.get("location", "未知位置")
# 构建函数执行结果
return {
"role": "tool",
"name": function_name,
"content": json.dumps({
"location": location,
"temperature": "25",
"unit": "celsius",
"weather": "晴朗"
}, ensure_ascii=False)
}
elif function_name == "search_web":
query_list = arguments.get("query_list", [])
query_tag = arguments.get("query_tag", [])
# 模拟搜索结果
return {
"role": "tool",
"name": function_name,
"content": f"搜索关键词: {query_list}, 分类: {query_tag}\n搜索结果: 相关信息已找到"
}
return None
```
### 将函数执行结果返回给模型
成功解析函数调用后,您应将函数执行结果添加到对话历史中,以便模型在后续交互中能够访问和利用这些信息。
#### 单个结果
假如模型调用了 `search_web` 函数,您可以参考如下格式添加执行结果,`name` 字段为具体的函数名称。
```json
{
"data": [
{
"role": "tool",
"name": "search_web",
"content": "search_result"
}
]
}
```
对应如下的模型输入格式:
```
]~b]tool name=search_web
search_result[e~[
```
#### 多个结果
假如模型同时调用了 `search_web` 和 `get_current_weather` 函数,您可以参考如下格式添加执行结果,`name` 字段为"tools",`content`包含多个结果。
```json
{
"data": [
{
"role": "tool",
"name": "tools",
"content": "Tool name: search_web\nTool result: test_result1\n\nTool name: get_current_weather\nTool result: test_result2"
}
]
}
```
对应如下的模型输入格式:
```
]~b]tool name=tools
Tool name: search_web
Tool result: test_result1
Tool name: get_current_weather
Tool result: test_result2[e~[
```
虽然我们建议您参考以上格式,但只要返回给模型的输入易于理解,`name` 和 `content` 的具体内容完全由您自主决定。
# Guia de Uso de Function Call no MiniMax-M1
[FunctionCall中文使用指南](./function_call_guide_cn.md)
## 📖 Introdução
O modelo MiniMax-M1 possui suporte para chamadas de funções (Function Call), permitindo que o modelo identifique quando funções externas precisam ser chamadas e gere os parâmetros dessas chamadas em um formato estruturado. Este documento fornece instruções detalhadas sobre como utilizar o recurso de chamadas de funções do MiniMax-M1.
## 🚀 Início Rápido
### Usando o Template de Chat
O MiniMax-M1 utiliza um template específico de chat para lidar com chamadas de funções. Este template é definido no arquivo `tokenizer_config.json` e pode ser utilizado no seu código através do template.
```python
from transformers import AutoTokenizer
def get_default_tools():
    return [
        {
            "name": "get_current_weather",
            "description": "Get the latest weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "A certain city, such as Beijing, Shanghai"
                    }
                },
                "required": ["location"]
            }
        }
    ]
# Modelo de carga e tokenizador
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "What's the weather like in Shanghai today?"
messages = [
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by Minimax based on MiniMax-M1 model."}]},
{"role": "user", "content": [{"type": "text", "text": prompt}]},
]
# Habilitar ferramentas de chamada de função
tools = get_default_tools()
# Aplicar modelo de bate-papo e adicionar definições de ferramentas
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
tools=tools
)
```
## 🛠️ Definição de Function Call
### Estrutura da Função
As funções precisam ser definidas no campo `tools` do corpo da requisição. Cada função é composta pelos seguintes elementos:
```json
{
"tools": [
{
"name": "search_web",
"description": "Search function.",
"parameters": {
"properties": {
"query_list": {
"description": "Keywords for search, with list element count of 1.",
"items": { "type": "string" },
"type": "array"
},
"query_tag": {
"description": "Classification of the query",
"items": { "type": "string" },
"type": "array"
}
},
"required": [ "query_list", "query_tag" ],
"type": "object"
}
}
]
}
```
**Descrição dos Campos:**
* `name`: Nome da função
* `description`: Descrição da função
* `parameters`: Definição dos parâmetros da função
* `properties`: Definições dos parâmetros, onde a chave é o nome do parâmetro e o valor contém a descrição
* `required`: Lista de parâmetros obrigatórios
* `type`: Tipo de dado (geralmente "object")
### Formato Interno de Processamento do Modelo
Internamente, as definições de funções são convertidas para um formato especial e concatenadas ao texto de entrada:
```
]~!b[]~b]system ai_setting=MiniMax AI
MiniMax AI is an AI assistant independently developed by MiniMax. [e~[
]~b]system tool_setting=tools
You are provided with these tools:
<tools>
{"name": "search_web", "description": "Search function.", "parameters": {"properties": {"query_list": {"description": "Keywords for search, with list element count of 1.", "items": {"type": "string"}, "type": "array"}, "query_tag": {"description": "Classification of the query", "items": {"type": "string"}, "type": "array"}}, "required": ["query_list", "query_tag"], "type": "object"}}
</tools>
If you need to call tools, please respond with <tool_calls></tool_calls> XML tags, and provide tool-name and json-object of arguments, following the format below:
<tool_calls>
{"name": <tool-name>, "arguments": <args-json-object>}
...
</tool_calls>[e~[
]~b]user name=User
When were the most recent launch events for OpenAI and Gemini?[e~[
]~b]ai name=MiniMax AI
```
### Model Output Format
The model generates function calls in the following format:
```xml
<think>
Okay, let me look up the latest releases from OpenAI and Gemini.
</think>
<tool_calls>
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"OpenAI\" \"latest\" \"release\""]}}
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"Gemini\" \"latest\" \"release\""]}}
</tool_calls>
```
## 📥 Processing Function Call Results
### Parsing Function Calls
You can use the code below to extract the function calls from the model output:
```python
import re
import json
def parse_function_calls(content: str):
"""
Parse function calls from model output
"""
function_calls = []
    # Match the content inside the <tool_calls> tags
tool_calls_pattern = r"<tool_calls>(.*?)</tool_calls>"
tool_calls_match = re.search(tool_calls_pattern, content, re.DOTALL)
if not tool_calls_match:
return function_calls
tool_calls_content = tool_calls_match.group(1).strip()
    # Parse each function call (one JSON object per line)
for line in tool_calls_content.split('\n'):
line = line.strip()
if not line:
continue
try:
            # Parse the JSON-formatted function call
call_data = json.loads(line)
function_name = call_data.get("name")
arguments = call_data.get("arguments", {})
function_calls.append({
"name": function_name,
"arguments": arguments
})
print(f"Function call: {function_name}, Arguments: {arguments}")
except json.JSONDecodeError as e:
print(f"Parameter parsing failed: {line}, Error: {e}")
return function_calls
# Example: handle a weather query function
def execute_function_call(function_name: str, arguments: dict):
"""
Execute function call and return result
"""
if function_name == "get_current_weather":
location = arguments.get("location", "Unknown location")
        # Build the function execution result
return {
"role": "tool",
"name": function_name,
"content": json.dumps({
"location": location,
"temperature": "25",
"unit": "celsius",
"weather": "Sunny"
}, ensure_ascii=False)
}
elif function_name == "search_web":
query_list = arguments.get("query_list", [])
query_tag = arguments.get("query_tag", [])
        # Simulate search results
return {
"role": "tool",
"name": function_name,
"content": f"Search keywords: {query_list}, Categories: {query_tag}\nSearch results: Relevant information found"
}
return None
```
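A short usage sketch tying the two helpers together; the sample output string is illustrative:
```python
# Illustrative usage: parse a <tool_calls> block and execute each call
model_output = (
    "<tool_calls>\n"
    '{"name": "get_current_weather", "arguments": {"location": "Shanghai"}}\n'
    "</tool_calls>"
)

tool_messages = []
for call in parse_function_calls(model_output):
    result = execute_function_call(call["name"], call["arguments"])
    if result is not None:
        tool_messages.append(result)

print(tool_messages)
```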
### Returning Function Results to the Model
After parsing and executing the functions, you should add the results to the message sequence so that the model can use them in its subsequent responses.
#### Single Result
If the model calls the `search_web` function, return the result in the following format, with the `name` field set to the specific tool name:
```json
{
"data": [
{
"role": "tool",
"name": "search_web",
"content": "search_result"
}
]
}
```
This corresponds to the following model input format:
```
]~b]tool name=search_web
search_result[e~[
```
#### Multiple Results
If the model calls `search_web` and `get_current_weather` at the same time, send the results as follows, setting `name` to "tools" and placing all results in the `content` field:
```json
{
"data": [
{
"role": "tool",
"name": "tools",
"content": "Tool name: search_web\nTool result: test_result1\n\nTool name: get_current_weather\nTool result: test_result2"
}
]
}
```
This corresponds to the following model input format:
```
]~b]tool name=tools
Tool name: search_web
Tool result: test_result1

Tool name: get_current_weather
Tool result: test_result2[e~[
```
Although this is the recommended format, as long as the input returned to the model is easy to understand, the exact contents of `name` and `content` are entirely up to you.
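One way to feed these results back for the next turn, sketched under the same assumptions as the quick-start code (`tokenizer`, `messages`, and `tools` already defined), with `assistant_output` holding the model's `<tool_calls>` response:
```python
# Sketch, not an official API: append the assistant's tool-call turn and the
# combined tool results, then build the prompt for the follow-up generation.
messages.append({"role": "assistant", "content": [{"type": "text", "text": assistant_output}]})
messages.append({
    "role": "tool",
    "name": "tools",
    "content": (
        "Tool name: search_web\nTool result: test_result1\n\n"
        "Tool name: get_current_weather\nTool result: test_result2"
    ),
})

next_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools,
)
```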
# 🚀 MiniMax Model Transformers Deployment Guide
[Transformers中文版部署指南](./transformers_deployment_guide_cn.md)
## 📖 Introduction
This guide will help you deploy the MiniMax-M1 model using the [Transformers](https://huggingface.co/docs/transformers/index) library. Transformers is a widely used deep learning library that provides a rich collection of pre-trained models and flexible model operation interfaces.
## 🛠️ Environment Setup
### Installing Transformers
```bash
pip install transformers torch accelerate
```
## 📋 Basic Usage Example
The pre-trained model can be used as follows:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
MODEL_PATH = "{MODEL_PATH}"
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
messages = [
{"role": "user", "content": [{"type": "text", "text": "What is your favourite condiment?"}]},
{"role": "assistant", "content": [{"type": "text", "text": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"}]},
{"role": "user", "content": [{"type": "text", "text": "Do you have mayonnaise recipes?"}]}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
generation_config = GenerationConfig(
max_new_tokens=20,
eos_token_id=tokenizer.eos_token_id,
use_cache=True,
)
generated_ids = model.generate(**model_inputs, generation_config=generation_config)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
## ⚡ Performance Optimization
### Speeding up with Flash Attention
The code snippet above showcases inference without any optimization tricks. However, one can drastically speed up the model by leveraging [Flash Attention](https://huggingface.co/docs/transformers/perf_train_gpu_one#flash-attention-2), which is a faster implementation of the attention mechanism used inside the model.
First, make sure to install the latest version of Flash Attention 2:
```bash
pip install -U flash-attn --no-build-isolation
```
Also make sure that you have hardware that is compatible with Flash-Attention 2. Read more about it in the official documentation of the [Flash Attention repository](https://github.com/Dao-AILab/flash-attention). Additionally, ensure you load your model in half-precision (e.g. `torch.float16`).
To load and run a model using Flash Attention 2, refer to the snippet below:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
MODEL_PATH = "{MODEL_PATH}"
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, trust_remote_code=True, torch_dtype=torch.float16, attn_implementation="flash_attention_2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
prompt = "My favourite condiment is"
model_inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
response = tokenizer.batch_decode(generated_ids)[0]
print(response)
```
## 📮 Getting Support
If you encounter any issues while deploying the MiniMax-M1 model:
- Please check our official documentation
- Contact our technical support team through official channels
- Submit an Issue on our GitHub repository
We continuously optimize the deployment experience on Transformers and welcome your feedback!