Commit 4d49d792 authored by chenzk

v1.0

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2025 MiniMax
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# MiniMax-M1
MiniMax-M1 is an open-weight model with an exceptionally long context window: 1M input tokens and 80K output tokens, comparable to Gemini 2.5 Pro.
## Paper
`None`
## Model Architecture
MiniMax-M1 uses a standard decoder-only architecture built on MoE, replacing most softmax attention with Lightning Attention, which incorporates a linear attention mechanism, at a 7:1 ratio to reduce computation.
<div align=center>
<img src="./doc/MiniMax.png"/>
</div>
## Algorithm Principles
MiniMax-M1 introduces linear attention to reduce computation; for RL training it also introduces the self-proposed CISPO algorithm, which converges roughly twice as fast as DAPO.
Lightning Attention: splits the attention computation into intra-block and inter-block parts; intra-block attention is computed conventionally, while inter-block attention uses the linear-attention kernel trick, avoiding the cumulative-sum (cumsum) operation that would otherwise slow things down.
CISPO: clips the importance-sampling weights rather than the token updates, so every token keeps contributing gradients, which is especially important for long responses.
<div align=center>
<img src="./doc/Lightning_Attention.png"/>
</div>
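To make the distinction concrete, below is a minimal PyTorch-style sketch of the CISPO idea (clip and detach the importance-sampling weight while keeping every token's log-probability in the gradient path). The function name, tensor shapes, and clipping bounds are illustrative assumptions, not the actual MiniMax-M1 training code.
```
import torch

def cispo_loss(logp_new, logp_old, advantages, mask, eps_low=0.2, eps_high=0.2):
    """Schematic CISPO-style policy-gradient loss (illustrative sketch only)."""
    # Importance-sampling weight between the current and behaviour policies.
    ratio = torch.exp(logp_new - logp_old)
    # Clip the weight itself and detach it, instead of clipping the token update,
    # so every token (including low-probability ones) still contributes a gradient.
    weight = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    per_token = weight * advantages * logp_new
    # Token-level average over valid (unmasked) positions.
    return -(per_token * mask).sum() / mask.sum()
```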
## Environment Setup
```
mv MiniMax-M1_vllm MiniMax-M1 # drop the framework-name suffix
```
### Hardware Requirements
DCU model: BW1000; nodes: 2; cards: 2 × 8.
### Communication Setup
Part 1: Basic inter-node communication
`Configure the following on the host machine:`
1. Disable the firewall:
```
systemctl stop firewalld # on CentOS
ufw disable              # on Ubuntu
```
2. Set `amd_iommu=on`:
```
vim /etc/default/grub
```
<div align=center>
<img src="./doc/amd_iommu.png"/>
</div>
Update the GRUB configuration:
```
grub2-mkconfig -o /boot/efi/EFI/rocky/grub.cfg
```
Reboot the machine, then verify that the setting took effect (check that `amd_iommu=on iommu=pt` appears in the kernel command line, e.g. via `cat /proc/cmdline`):
```
BOOT_IMAGE=(hd0,gpt3)/vmlinuz-4.18.0-372.9.1.el8.x86_64 root=UUID=80974f58-7d23-49bb-bd8b-8e299eb0d188 ro crashkernel=auto rhgb quiet systemd.unified_cgroup_hierachy=1 systemd.unified_cgroup_hierarchy=1 amd_iommu=on iommu=pt
```
`Configure the following inside the container started in a later step:`
```
apt update
apt install openssh-server -y
```
Edit `/etc/ssh/sshd_config` and set `PermitRootLogin` to `yes`:
```
# Uncomment the following four lines
RSAAuthentication yes                       # enable RSA authentication
PubkeyAuthentication yes                    # enable public/private-key authentication
AuthorizedKeysFile ~/.ssh/authorized_keys   # path of the public-key file (the file populated below)
PermitRootLogin yes                         # allow root to log in via ssh
```
Restart the ssh service and enable it at boot:
```
service sshd restart
chkconfig sshd on
# check sshd status:     service ssh status
# start the sshd service: /etc/init.d/ssh restart
```
Next, set up the keys for password-less communication between the nodes:
1. Generate a key pair with `ssh-keygen`
```
ssh-keygen -t ed25519 # ed25519 is used here as an example; you may pick another key type or file name. Press Enter at every prompt to accept the defaults.
```
2. Collect the public keys from every node that will be used and copy them into each node's `~/.ssh/authorized_keys`, so that every node ends up with the same complete set of keys. The format looks similar to this:
<div align=center>
<img src="./doc/id_rsa.png"/>
</div>
3. Set the communication port between nodes
```
/usr/sbin/sshd -p 10085 # different nodes may use different ports; once the keys and ports are set up, verify connectivity with something like `ssh -p <port>`, otherwise re-check the previous steps.
```
The steps above are not a standardized procedure; they differ noticeably between servers and clusters and cannot be copied verbatim, so adapt them to your own machines. The overall goal is to enable `amd_iommu` and to allow password-less ssh login between the containers on different nodes.
Part 2: Ray-related communication
`Configure the following inside the container started in a later step:`
```
vim ~/.bashrc
```
Append the following commands to the end of `.bashrc` (using a cluster with BW200 cards as an example):
```
export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_HOST_IP=x.x.x.x
export NCCL_SOCKET_IFNAME=enp33s0f3u1
export GLOO_SOCKET_IFNAME=enp33s0f3u1
unset NCCL_ALGO
export NCCL_MIN_NCHANNELS=16
export NCCL_MAX_NCHANNELS=16
export NCCL_NET_GDR_READ=1
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_SPEC_DECODE_EAGER=1
export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1
# For BW cards, also add the following:
export NCCL_NET_GDR_LEVEL=7
export NCCL_SDMA_COPY_ENABLE=0
export NCCL_IB_HCA=mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1,mlx5_9:1
# For K100_AI cards, add the following instead (commented out here because this walkthrough uses BW cards as the example):
# export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
```
`VLLM_HOST_IP` and `NCCL_SOCKET_IFNAME` must be replaced with the values found on each of your own machines; every node has a different IP. Query them as follows:
```
# Find the interface and IP with: ifconfig
# VLLM_HOST_IP: the node's local communication IP
# NCCL_SOCKET_IFNAME and GLOO_SOCKET_IFNAME: the node's local network interface name
```
`Example:`
<div align=center>
<img src="./doc/ip.png"/>
</div>
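If you prefer to look these values up programmatically instead of reading `ifconfig` output, a small helper along the following lines can print candidate interface names and IPv4 addresses. This is only a convenience sketch and assumes the `psutil` package is available (`pip install psutil`):
```
import socket
import psutil  # assumed available: pip install psutil

# Print each interface name with its IPv4 address; pick the address on your cluster
# network for VLLM_HOST_IP, and its name for NCCL_SOCKET_IFNAME / GLOO_SOCKET_IFNAME.
for ifname, addrs in psutil.net_if_addrs().items():
    for addr in addrs:
        if addr.family == socket.AF_INET:
            print(f"{ifname}: {addr.address}")
```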
On clusters with BW cards, `VLLM_HOST_IP` must be set to the IP of the ib (InfiniBand) interface to avoid RCCL timeouts:
<div align=center>
<img src="./doc/ip_bw.png"/>
</div>
Note: after adding the settings above you must exit the container, restart it, and re-enter it before they take effect; otherwise NCCL communication errors will appear later. Restart the container with:
```
docker restart minimax # required
```
`Tip: communication setup is normally handled by operations staff and may be unfamiliar to others; we recommend asking your ops team to perform the configuration above.`
### Docker (Method 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.8.5-ubuntu22.04-dtk25.04.1-rc4-das1.6-py3.10-20250620-fixpy
# replace <your IMAGE ID> with the ID of the image pulled above; for this image it is e99e26dbb33b
docker run -it --shm-size=192G --network=host --ipc=host -p 8000:8000 -v $PWD/MiniMax-M1:/home/MiniMax-M1 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri --group-add video --name minimax <your IMAGE ID> bash
```
### Dockerfile (Method 2)
```
cd /home/MiniMax-M1/docker
docker build --no-cache -t minimax:latest .
docker run --shm-size=192G --name minimax --network=host --ipc=host -p 8000:8000 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri --group-add video -v $PWD/../../MiniMax-M1:/home/MiniMax-M1 -it minimax bash
# If installing the environment through the Dockerfile takes too long, comment out the pip install inside the Dockerfile and install the Python packages after the container starts: pip install -r requirements.txt
```
### Anaconda (Method 3)
1. The DCU-specific deep-learning libraries required by this project can be downloaded from the 光合 developer community:
- https://developer.hpccube.com/tool/
```
DTK driver:   dtk2504
python:       python3.10
torch:        2.4.1
torchvision:  0.19.1
triton:       3.0.0
vllm:         0.8.5
flash-attn:   2.6.1
deepspeed:    0.14.2
apex:         1.4.0
transformers: 4.51.1
```
`Tip: the versions of the DTK driver, python, torch, and the other DCU-related tools above must correspond exactly.`
2. Install the remaining, non-DCU-specific libraries according to requirements.txt
`None`
## Dataset
`None`
## Training
`None`
## Inference
Directory layout for the pretrained weights:
```
/home/MiniMax-M1/
└── MiniMax/MiniMax-M1-40k
```
After downloading the weights, change `architectures` in `MiniMax/MiniMax-M1-40k/config.json` to:
```
"architectures": [
"MiniMaxText01ForCausalLM"
],
```
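The same edit can be applied programmatically. A minimal sketch, assuming the weights sit at the path implied by the directory layout above:
```
import json
from pathlib import Path

cfg_path = Path("/home/MiniMax-M1/MiniMax/MiniMax-M1-40k/config.json")
cfg = json.loads(cfg_path.read_text())

# Point the checkpoint at the model class expected by this vllm setup.
cfg["architectures"] = ["MiniMaxText01ForCausalLM"]

cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False))
print("architectures ->", cfg["architectures"])
```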
### Multi-Node Multi-Card
```
cd /home/MiniMax-M1
# Start ray
ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=16 # start ray on the head node; x.x.x.x is the head node's IP (VLLM_HOST_IP) found earlier with ifconfig.
ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=16 # start ray on the other nodes; x.x.x.x is the head node's IP (VLLM_HOST_IP) found earlier with ifconfig.
# Use `ray status` to check the status of the ray cluster.
# This project uses MiniMax-M1-40k as the example; other MiniMax-M1 models work the same way, and MiniMax-M1-80k needs more cards.
# Method 1: vllm online inference
export SAFETENSORS_FAST_GPU=1
export VLLM_USE_V1=0
# Start the server
vllm serve MiniMax/MiniMax-M1-40k --distributed-executor-backend ray --host 0.0.0.0 --port 8000 --tensor-parallel-size 16 --max_model_len 4096 --dtype bfloat16 --enforce-eager --gpu-memory-utilization 0.99 --trust-remote-code
# Example client test command:
curl http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "MiniMax/MiniMax-M1-40k",
"messages": [
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
{"role": "user", "content": [{"type": "text", "text": "美国的国土面积多大?"}]}
]
}'
# Method 2: vllm offline inference
python infer_vllm.py # MiniMax-M1-40k is used as the example
# For the error: AttributeError: 'NoneType' object has no attribute 'info'
# comment out the logger call in the original code at /usr/local/lib/python3.10/dist-packages/vllm/executor/ray_distributed_executor.py, line 127
```
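Since `vllm serve` exposes an OpenAI-compatible API, the same test can also be issued from Python. A minimal sketch, assuming the `openai` client package is installed and the server from the step above is reachable on port 8000:
```
from openai import OpenAI

# The vllm server does not validate the API key, but the client requires a value.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="MiniMax/MiniMax-M1-40k",
    messages=[
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
        {"role": "user", "content": [{"type": "text", "text": "美国的国土面积多大?"}]},
    ],
    temperature=1.0,
    top_p=0.95,
)
print(response.choices[0].message.content)
```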
For more information, see [`README_orgin`](./README_orgin.md) from the upstream project.
## Result
Example of vllm inference output:
`Input:`
```
prompt: "美国的国土面积多大?"
```
`Output:`
```
Generated text: '<think>\n好的,我现在需要回答用户的问题:“美国的国土面积多大?”首先,我得确认自己对这个问题的了解程度。我记得美国是世界上面积较大的国家之一,但具体数字可能记不太准了。可能需要查证一下。\n\n首先,我应该回想一下美国的基本地理知识。美国位于北美洲,东临大西洋,西接太平洋,北边加拿大,南接墨西哥。国土面积包括本土的48个州和阿拉斯加、夏威夷两个州,以及一些海外领土。不过通常所说的国土面积可能指的是陆地和水域的总面积,或者只是陆地面积?\n\n接下来,我需要确定正确的单位。通常国 国土面积会用平方公里或者平方英里来表示。比如,中国是约960万平方公里,美国可能比中国小一些?或者更大?我记得之前学过的数据可能有些混淆,需要确认。\n\n然后,可能需要考虑不同的数据来源是否一致。比如维基百科的数据,或者其他权威网站的数据。另外,是否 有最新的数据,因为有时候国家的面积可能会有调整,比如通过领土争端解决或者测量技术的改进。\n\n另外,需要注意美国国土面积的组成部分。比如,本土48州的面积,阿拉斯加的面积,夏威夷的面积,以及其他海外领土如波多黎各、关岛等的面积是否包含在内。通常来说 说,国土面积可能指的是总领土,包括所有州和领土,但有时候可能只算主要部分。\n\n比如,根据我之前的记忆,美国的面积大约是9,833,517平方公里,或者约3,796,742平方英里。但不确定这个数字是否准确。或者可能更接近9.6 million平方公里?需要核实。\n\n另外,可 能存在不同的统计方式,比如总土地面积、水域面积等。例如,美国的陆地面积可能比总国土面积小,因为包括内陆水域如五大湖等。\n\n现在,我需要找到可靠的数据来源。比如,可以查阅世界银行的数据,或者美国政府官方网站的数据,或者权威的地理数据库。\n\n根据世 世界银行的数据,美国的国土面积是9,833,517平方公里。而中国的面积是9,388,210平方公里(可能不包括台湾?)。所以美国比中国稍大?或者可能我的记忆有误?\n\n不过,可能存在不同的数据来源,比如有的资料说美国面积约9.6百万平方公里,而中国约9.6百万,但可能统 统计方式不同。需要确认。\n\n另外,需要注意单位转换是否正确。比如,1平方英里等于2.58999平方公里。所以如果美国的面积是3,796,742平方英里,那么换算成平方公里就是3,796,742 × 2.58999 ≈ 9,833,517平方公里,这和世界银行的数据一致。\n\n所以,正确的答案应该 该是大约9,833'
```
### Accuracy
The accuracy on DCU is consistent with GPU; inference framework: vllm.
## Application Scenarios
### Algorithm Category
`Conversational Q&A`
### Key Application Industries
`Manufacturing, broadcast media, finance, energy, healthcare, smart home, education`
## Pretrained Weights
HuggingFace download: [MiniMaxAI/MiniMax-M1-40k](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k)
## Source Repository & Issue Reporting
- http://developer.sourcefind.cn/codes/modelzoo/MiniMax-M1_vllm.git
## References
- https://github.com/MiniMax-AI/MiniMax-M1.git
<div align="center">
<picture>
<source srcset="figures/MiniMaxLogo-Dark.png" media="(prefers-color-scheme: dark)">
<img src="figures/MiniMaxLogo-Light.png" width="60%" alt="MiniMax">
</source>
</picture>
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.minimax.io" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://img.shields.io/badge/_Homepage-MiniMax-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgNDkwLjE2IDQxMS43Ij48ZGVmcz48c3R5bGU+LmNscy0xe2ZpbGw6I2ZmZjt9PC9zdHlsZT48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMjMzLjQ1LDQwLjgxYTE3LjU1LDE3LjU1LDAsMSwwLTM1LjEsMFYzMzEuNTZhNDAuODIsNDAuODIsMCwwLDEtODEuNjMsMFYxNDVhMTcuNTUsMTcuNTUsMCwxLDAtMzUuMDksMHY3OS4wNmE0MC44Miw0MC44MiwwLDAsMS04MS42MywwVjE5NS40MmExMS42MywxMS42MywwLDAsMSwyMy4yNiwwdjI4LjY2YTE3LjU1LDE3LjU1LDAsMCwwLDM1LjEsMFYxNDVBNDAuODIsNDAuODIsMCwwLDEsMTQwLDE0NVYzMzEuNTZhMTcuNTUsMTcuNTUsMCwwLDAsMzUuMSwwVjIxNy41aDBWNDAuODFhNDAuODEsNDAuODEsMCwxLDEsODEuNjIsMFYyODEuNTZhMTEuNjMsMTEuNjMsMCwxLDEtMjMuMjYsMFptMjE1LjksNjMuNEE0MC44Niw0MC44NiwwLDAsMCw0MDguNTMsMTQ1VjMwMC44NWExNy41NSwxNy41NSwwLDAsMS0zNS4wOSwwdi0yNjBhNDAuODIsNDAuODIsMCwwLDAtODEuNjMsMFYzNzAuODlhMTcuNTUsMTcuNTUsMCwwLDEtMzUuMSwwVjMzMGExMS42MywxMS42MywwLDEsMC0yMy4yNiwwdjQwLjg2YTQwLjgxLDQwLjgxLDAsMCwwLDgxLjYyLDBWNDAuODFhMTcuNTUsMTcuNTUsMCwwLDEsMzUuMSwwdjI2MGE0MC44Miw0MC44MiwwLDAsMCw4MS42MywwVjE0NWExNy41NSwxNy41NSwwLDEsMSwzNS4xLDBWMjgxLjU2YTExLjYzLDExLjYzLDAsMCwwLDIzLjI2LDBWMTQ1QTQwLjg1LDQwLjg1LDAsMCwwLDQ0OS4zNSwxMDQuMjFaIi8+PC9zdmc+&logoWidth=20" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://arxiv.org/abs/2506.13585" target="_blank" style="margin: 2px;">
<img alt="Paper" src="https://img.shields.io/badge/📖_Paper-MiniMax--M1-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://chat.minimax.io/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/_MiniMax_Chat-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgNDkwLjE2IDQxMS43Ij48ZGVmcz48c3R5bGU+LmNscy0xe2ZpbGw6I2ZmZjt9PC9zdHlsZT48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMjMzLjQ1LDQwLjgxYTE3LjU1LDE3LjU1LDAsMSwwLTM1LjEsMFYzMzEuNTZhNDAuODIsNDAuODIsMCwwLDEtODEuNjMsMFYxNDVhMTcuNTUsMTcuNTUsMCwxLDAtMzUuMDksMHY3OS4wNmE0MC44Miw0MC44MiwwLDAsMS04MS42MywwVjE5NS40MmExMS42MywxMS42MywwLDAsMSwyMy4yNiwwdjI4LjY2YTE3LjU1LDE3LjU1LDAsMCwwLDM1LjEsMFYxNDVBNDAuODIsNDAuODIsMCwwLDEsMTQwLDE0NVYzMzEuNTZhMTcuNTUsMTcuNTUsMCwwLDAsMzUuMSwwVjIxNy41aDBWNDAuODFhNDAuODEsNDAuODEsMCwxLDEsODEuNjIsMFYyODEuNTZhMTEuNjMsMTEuNjMsMCwxLDEtMjMuMjYsMFptMjE1LjksNjMuNEE0MC44Niw0MC44NiwwLDAsMCw0MDguNTMsMTQ1VjMwMC44NWExNy41NSwxNy41NSwwLDAsMS0zNS4wOSwwdi0yNjBhNDAuODIsNDAuODIsMCwwLDAtODEuNjMsMFYzNzAuODlhMTcuNTUsMTcuNTUsMCwwLDEtMzUuMSwwVjMzMGExMS42MywxMS42MywwLDEsMC0yMy4yNiwwdjQwLjg2YTQwLjgxLDQwLjgxLDAsMCwwLDgxLjYyLDBWNDAuODFhMTcuNTUsMTcuNTUsMCwwLDEsMzUuMSwwdjI2MGE0MC44Miw0MC44MiwwLDAsMCw4MS42MywwVjE0NWExNy41NSwxNy41NSwwLDEsMSwzNS4xLDBWMjgxLjU2YTExLjYzLDExLjYzLDAsMCwwLDIzLjI2LDBWMTQ1QTQwLjg1LDQwLjg1LDAsMCwwLDQ0OS4zNSwxMDQuMjFaIi8+PC9zdmc+&logoWidth=20" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://www.minimax.io/platform" style="margin: 2px;">
<img alt="API" src="https://img.shields.io/badge/⚡_API-Platform-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-MCP" style="margin: 2px;">
<img alt="MCP" src="https://img.shields.io/badge/🚀_MCP-MiniMax_MCP-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/MiniMaxAI" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/🤗_Hugging_Face-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-M1" target="_blank" style="margin: 2px;">
<img alt="GitHub" src="https://img.shields.io/badge/🐙_GitHub-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://www.modelscope.cn/organization/MiniMax" target="_blank" style="margin: 2px;">
<img alt="ModelScope" src="https://img.shields.io/badge/🤖️_ModelScope-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-M1/blob/main/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/⚖️_License-Apache_2.0-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-01/blob/main/figures/wechat-qrcode.jpeg" target="_blank" style="margin: 2px;">
<img alt="WeChat" src="https://img.shields.io/badge/💬_WeChat-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
# MiniMax-M1
## 1. Model Overview
We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning
attention mechanism. The model is developed based on our previous [MiniMax-Text-01 model](https://huggingface.co/MiniMaxAI/MiniMax-Text-01),
which contains a total of 456 billion parameters with 45.9 billion parameters activated
per token. Consistent with MiniMax-Text-01, the M1 model natively supports a context length of 1
million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism
in MiniMax-M1 enables efficient scaling of test-time compute – For example, compared to DeepSeek
R1, M1 consumes 25% of the FLOPs at a generation length of 100K tokens. These properties make M1
particularly suitable for complex tasks that require processing long inputs and thinking extensively.
MiniMax-M1 is trained using large-scale reinforcement learning (RL) on diverse problems ranging from
traditional mathematical reasoning to sandbox-based, real-world software engineering environments.
We develop an efficient RL scaling framework for M1 highlighting two perspectives: (1) We propose
CISPO, a novel algorithm that clips importance sampling weights instead of token updates, which
outperforms other competitive RL variants; (2) Our hybrid-attention design naturally enhances the
efficiency of RL, where we address unique challenges when scaling RL with the hybrid architecture. We
train two versions of MiniMax-M1 models with [40K](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k) and
[80K](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k) thinking budgets respectively. Experiments
on standard benchmarks show that our models outperform other strong open-weight models such as
the original DeepSeek-R1 and Qwen3-235B, particularly on complex software engineering, tool using,
and long context tasks. With efficient scaling of test-time compute, MiniMax-M1 serves as a strong
foundation for next-generation language model agents to reason and tackle real-world challenges.
<p align="center">
<img width="100%" src="figures/TextBench.png">
<br>
<small><em>Benchmark performance comparison of leading commercial and open-weight models across competition-level mathematics, coding, software engineering, agentic tool use, and long-context understanding tasks. We use the MiniMax-M1-80k model here for MiniMax-M1.</em></small>
</p>
## 2. Evaluation
**Performance of MiniMax-M1 on core benchmarks.**
| **Category** | **Task** | **MiniMax-M1-80K** | **MiniMax-M1-40K** | **Qwen3-235B-A22B** | **DeepSeek-R1-0528** | **DeepSeek-R1** | **Seed-Thinking-v1.5** | **Claude 4 Opus** | **Gemini 2.5 Pro (06-05)** | **OpenAI-o3** |
|:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| | *Extended Thinking* | *80K* | *40K* | *32k* | *64k* | *32k* | *32k* | *64k* | *64k* | *100k* |
| ***Mathematics*** | AIME 2024 | 86.0 | 83.3 | 85.7 | 91.4 | 79.8 | 86.7 | 76.0 | 92.0 | 91.6 |
| | AIME 2025 | 76.9 | 74.6 | 81.5 | 87.5 | 70.0 | 74.0 | 75.5 | 88.0 | 88.9 |
| | MATH-500 | 96.8 | 96.0 | 96.2 | 98.0 | 97.3 | 96.7 | 98.2 | 98.8 | 98.1 |
| ***General Coding*** | LiveCodeBench *(24/8~25/5)* | 65.0 | 62.3 | 65.9 | 73.1 | 55.9 | 67.5 | 56.6 | 77.1 | 75.8 |
| | FullStackBench | 68.3 | 67.6 | 62.9 | 69.4 | 70.1 | 69.9 | 70.3 | -- | 69.3 |
| ***Reasoning & Knowledge***| GPQA Diamond | 70.0 | 69.2 | 71.1 | 81.0 | 71.5 | 77.3 | 79.6 | 86.4 | 83.3 |
| | HLE *(no tools)* | 8.4\* | 7.2\* | 7.6\* | 17.7\* | 8.6\* | 8.2 | 10.7 | 21.6 | 20.3 |
| | ZebraLogic | 86.8 | 80.1 | 80.3 | 95.1 | 78.7 | 84.4 | 95.1 | 91.6 | 95.8 |
| | MMLU-Pro | 81.1 | 80.6 | 83.0 | 85.0 | 84.0 | 87.0 | 85.0 | 86.0 | 85.0 |
| ***Software Engineering***| SWE-bench Verified| 56.0 | 55.6 | 34.4 | 57.6 | 49.2 | 47.0 | 72.5 | 67.2 | 69.1 |
| ***Long Context*** | OpenAI-MRCR *(128k)* | 73.4 | 76.1 | 27.7 | 51.5 | 35.8 | 54.3 | 48.9 | 76.8 | 56.5 |
| | OpenAI-MRCR *(1M)* | 56.2 | 58.6 | -- | -- | -- | -- | -- | 58.8 | -- |
| | LongBench-v2 | 61.5 | 61.0 | 50.1 | 52.1 | 58.3 | 52.5 | 55.6 | 65.0 | 58.8 |
| ***Agentic Tool Use***| TAU-bench *(airline)* | 62.0 | 60.0 | 34.7 | 53.5 | -- | 44.0 | 59.6 | 50.0 | 52.0 |
| | TAU-bench *(retail)* | 63.5 | 67.8 | 58.6 | 63.9 | -- | 55.7 | 81.4 | 67.0 | 73.9 |
| ***Factuality*** | SimpleQA | 18.5 | 17.9 | 11.0 | 27.8 | 30.1 | 12.9 | -- | 54.0 | 49.4 |
| ***General Assistant***| MultiChallenge | 44.7 | 44.7 | 40.0 | 45.0 | 40.7 | 43.0 | 45.8 | 51.8 | 56.5 |
\* conducted on the text-only HLE subset.
Our models are evaluated with `temperature=1.0`, `top_p=0.95`.
### SWE-bench methodology
We report results derived from the Agentless scaffold. Departing from the original pipeline, our methodology employs a two-stage localization process (without any embedding-based retrieval mechanisms): initial coarse-grained file localization followed by fine-grained localization to specific files and code elements. The values for our models are calculated on the subset of n=486 verified tasks which work on our infrastructure. The excluded 14 test cases that were incompatible with our internal infrastructure are:
`"astropy__astropy-7606"`,
`"astropy__astropy-8707"`,
`"astropy__astropy-8872"`,
`"django__django-10097"`,
`"matplotlib__matplotlib-20488"`,
`"psf__requests-2317"`,
`"psf__requests-2931"`,
`"psf__requests-5414"`,
`"pylint-dev__pylint-6528"`,
`"pylint-dev__pylint-7277"`,
`"sphinx-doc__sphinx-10435"`,
`"sphinx-doc__sphinx-7985"`,
`"sphinx-doc__sphinx-8269"`,
`"sphinx-doc__sphinx-8475"`
### TAU-bench methodology
We evaluate TAU-Bench with GPT-4.1 as user model and without any custom tools. The maximum number of interaction steps is 40.
Our general system prompt is:
```
- In each round, you need to carefully examine the tools provided to you to determine if any can be used.
- You must adhere to all of the policies. Pay attention to the details in the terms. Solutions for most situations can be found within these policies.
```
## 3. Recommendations for Minimax-M1 Model Usage
To achieve the best results with the Minimax-M1 model, we suggest focusing on two key points: Inference Parameters and the System Prompt.
### 3.1. Inference Parameters
- Temperature: **`1.0`**
- Top_p: **`0.95`**
This setting is optimal for encouraging creativity and diversity in the model's responses. It allows the model to explore a wider range of linguistic possibilities, preventing outputs that are too rigid or repetitive, while still maintaining strong logical coherence.
### 3.2. System Prompt
Tailoring your system prompt to the specific task is crucial for guiding the model effectively. Below are suggested settings for different scenarios.
#### A. General-Purpose Scenarios
For common tasks like summarization, translation, Q&A, or creative writing:
```
You are a helpful assistant.
```
#### B. Web Development Scenarios
For complex tasks like generating code for web pages:
```
You are a web development engineer, writing web pages according to the instructions below. You are a powerful code editing assistant capable of writing code and creating artifacts in conversations with users, or modifying and updating existing artifacts as requested by users.
All code is written in a single code block to form a complete code file for display, without separating HTML and JavaScript code. An artifact refers to a runnable complete code snippet, you prefer to integrate and output such complete runnable code rather than breaking it down into several code blocks. For certain types of code, they can render graphical interfaces in a UI window. After generation, please check the code execution again to ensure there are no errors in the output.
Output only the HTML, without any additional descriptive text. Make the UI looks modern and beautiful.
```
#### C. Mathematical Scenarios
When dealing with problems that require calculation or logical deduction:
```
Please reason step by step, and put your final answer within \boxed{}.
```
## 4. Deployment Guide
Download the model from HuggingFace repository:
- [MiniMax-M1-40k](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k)
- [MiniMax-M1-80k](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k)
For production deployment, we recommend using [vLLM](https://docs.vllm.ai/en/latest/) to serve MiniMax-M1. vLLM provides excellent performance for serving large language models with the following features:
- 🔥 Outstanding serving throughput performance
- ⚡ Efficient and intelligent memory management
- 📦 Powerful batch request processing capability
- ⚙️ Deeply optimized underlying performance
For detailed vLLM deployment instructions, please refer to our [vLLM Deployment Guide](./docs/vllm_deployment_guide.md).
Alternatively, you can also deploy using Transformers directly. For detailed Transformers deployment instructions, you can see our [MiniMax-M1 Transformers Deployment Guide](./docs/transformers_deployment_guide.md).
## 5. Function Calling
The MiniMax-M1 model supports function calling capabilities, enabling the model to identify when external functions need to be called and output function call parameters in a structured format. [MiniMax-M1 Function Call Guide](./docs/function_call_guide.md) provides detailed instructions on how to use the function calling feature of MiniMax-M1.
## 6. Chatbot & API
For general use and evaluation, we provide a [Chatbot](https://chat.minimax.io/) with online search capabilities and the [online API](https://www.minimax.io/platform/) for developers. For general use and evaluation, we provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP) with video generation, image generation, speech synthesis, and voice cloning for developers.
## 7. Citation
```
@misc{minimax2025minimaxm1scalingtesttimecompute,
title={MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention},
author={MiniMax},
year={2025},
eprint={2506.13585},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.13585},
}
```
## 8. Contact Us
Contact us at [model@minimax.io](mailto:model@minimax.io).
{
"architectures": [
"MiniMaxM1ForCausalLM"
],
"attention_dropout": 0.0,
"attn_type_list": [
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1
],
"auto_map": {
"AutoConfig": "configuration_minimax_m1.MiniMaxM1Config",
"AutoModelForCausalLM": "modeling_minimax_m1.MiniMaxM1ForCausalLM"
},
"bos_token_id": null,
"eos_token_id": null,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 6144,
"initializer_range": 0.02,
"intermediate_size": 9216,
"layernorm_full_attention_alpha": 3.5565588200778455,
"layernorm_full_attention_beta": 1.0,
"layernorm_linear_attention_alpha": 3.5565588200778455,
"layernorm_linear_attention_beta": 1.0,
"layernorm_mlp_alpha": 3.5565588200778455,
"layernorm_mlp_beta": 1.0,
"max_position_embeddings": 10240000,
"model_type": "minimax_m1",
"num_attention_heads": 64,
"num_experts_per_tok": 2,
"num_hidden_layers": 80,
"num_key_value_heads": 8,
"num_local_experts": 32,
"output_router_logits": false,
"postnorm": true,
"rms_norm_eps": 1e-05,
"rope_theta": 10000000,
"rotary_dim": 64,
"router_aux_loss_coef": 0.001,
"router_jitter_noise": 0.0,
"shared_intermediate_size": 0,
"shared_moe_mode": "sigmoid",
"sliding_window": null,
"tie_word_embeddings": false,
"transformers_version": "4.45.2",
"use_cache": true,
"vocab_size": 200064
}
""" MiniMaxM1 model configuration"""
from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging
logger = logging.get_logger(__name__)
class MiniMaxM1Config(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`MiniMaxM1Model`]. It is used to instantiate a
MiniMaxM1 model according to the specified arguments, defining the model architecture. Instantiating a configuration
with the defaults will yield a similar configuration to that of the MiniMaxM1.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 32000):
Vocabulary size of the MiniMaxM1 model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`MiniMaxM1Model`]
hidden_size (`int`, *optional*, defaults to 4096):
Dimension of the hidden representations.
intermediate_size (`int`, *optional*, defaults to 14336):
Dimension of the MLP representations.
num_hidden_layers (`int`, *optional*, defaults to 32):
Number of hidden layers in the Transformer encoder.
num_attention_heads (`int`, *optional*, defaults to 32):
Number of attention heads for each attention layer in the Transformer encoder.
num_key_value_heads (`int`, *optional*, defaults to 8):
This is the number of key_value heads that should be used to implement Grouped Query Attention. If
`num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
`num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA is used. When
converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
by meanpooling all the original heads within that group. For more details, check out [this
paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to `8`.
hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
The non-linear activation function (function or string) in the decoder.
max_position_embeddings (`int`, *optional*, defaults to `4096*32`):
The maximum sequence length that this model might ever be used with. MiniMaxM1's sliding window attention
allows sequences of up to 4096*32 tokens.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
rms_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the rms normalization layers.
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Only
relevant if `config.is_decoder=True`.
pad_token_id (`int`, *optional*):
The id of the padding token.
bos_token_id (`int`, *optional*, defaults to 1):
The id of the "beginning-of-sequence" token.
eos_token_id (`int`, *optional*, defaults to 2):
The id of the "end-of-sequence" token.
tie_word_embeddings (`bool`, *optional*, defaults to `False`):
Whether the model's input and output word embeddings should be tied.
rope_theta (`float`, *optional*, defaults to 1000000.0):
The base period of the RoPE embeddings.
sliding_window (`int`, *optional*):
Sliding window attention window size. If not specified, will default to `4096`.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
num_experts_per_tok (`int`, *optional*, defaults to 2):
The number of experts to route per-token, can be also interpreted as the `top-k` routing
parameter
num_local_experts (`int`, *optional*, defaults to 8):
Number of experts per Sparse MLP layer.
output_router_logits (`bool`, *optional*, defaults to `False`):
Whether or not the router logits should be returned by the model. Enabling this will also
allow the model to output the auxiliary loss. See [here]() for more details
router_aux_loss_coef (`float`, *optional*, defaults to 0.001):
The aux loss factor for the total loss.
router_jitter_noise (`float`, *optional*, defaults to 0.0):
Amount of noise to add to the router.
```python
>>> from transformers import MiniMaxM1Model, MiniMaxM1Config
>>> # Initializing a MiniMaxM1 style configuration
>>> configuration = MiniMaxM1Config()
>>> # Initializing a model from the MiniMaxM1 style configuration
>>> model = MiniMaxM1Model(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "MiniMaxM1"
keys_to_ignore_at_inference = ["past_key_values"]
def __init__(
self,
vocab_size=32000,
hidden_size=4096,
intermediate_size=14336,
num_hidden_layers=32,
num_attention_heads=32,
num_key_value_heads=8,
hidden_act="silu",
max_position_embeddings=4096 * 32,
initializer_range=0.02,
rms_norm_eps=1e-5,
use_cache=True,
pad_token_id=None,
bos_token_id=None,
eos_token_id=None,
tie_word_embeddings=False,
rope_theta=1e6,
sliding_window=None,
attention_dropout=0.0,
num_experts_per_tok=2,
num_local_experts=8,
output_router_logits=False,
router_aux_loss_coef=0.001,
router_jitter_noise=0.0,
**kwargs,
):
self.vocab_size = vocab_size
self.max_position_embeddings = max_position_embeddings
self.hidden_size = hidden_size
self.intermediate_size = intermediate_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.sliding_window = sliding_window
# for backward compatibility
if num_key_value_heads is None:
num_key_value_heads = num_attention_heads
self.num_key_value_heads = num_key_value_heads
self.hidden_act = hidden_act
self.initializer_range = initializer_range
self.rms_norm_eps = rms_norm_eps
self.use_cache = use_cache
self.rope_theta = rope_theta
self.attention_dropout = attention_dropout
self.num_experts_per_tok = num_experts_per_tok
self.num_local_experts = num_local_experts
self.output_router_logits = output_router_logits
self.router_aux_loss_coef = router_aux_loss_coef
self.router_jitter_noise = router_jitter_noise
super().__init__(
pad_token_id=pad_token_id,
bos_token_id=bos_token_id,
eos_token_id=eos_token_id,
tie_word_embeddings=tie_word_embeddings,
**kwargs,
)
FROM image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.8.5-ubuntu22.04-dtk25.04.1-rc4-das1.6-py3.10-20250620-fixpy
ENV DEBIAN_FRONTEND=noninteractive
# RUN yum update && yum install -y git cmake wget build-essential
# RUN source /opt/dtk-dtk25.04.1/env.sh
# Install pip-related dependencies
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
# MiniMax-M1 Function Call Guide
[FunctionCall中文使用指南](./function_call_guide_cn.md)
## 📖 Introduction
The MiniMax-M1 model supports function calling capabilities, enabling the model to identify when external functions need to be called and output function call parameters in a structured format. This document provides detailed instructions on how to use the function calling feature of MiniMax-M1.
## 🚀 Quick Start
### Using Chat Template
MiniMax-M1 uses a specific chat template format to handle function calls. The chat template is defined in `tokenizer_config.json`, and you can use it in your code through the template.
```python
from transformers import AutoTokenizer
def get_default_tools():
    return [
        {
            "name": "get_current_weather",
            "description": "Get the latest weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "A certain city, such as Beijing, Shanghai"
                    }
                },
                "required": ["location"]
            }
        }
    ]
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "What's the weather like in Shanghai today?"
messages = [
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by Minimax based on MiniMax-M1 model."}]},
{"role": "user", "content": [{"type": "text", "text": prompt}]},
]
# Enable function call tools
tools = get_default_tools()
# Apply chat template and add tool definitions
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
tools=tools
)
```
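The templated string `text` still has to be sent to an inference backend. As a rough illustration (the endpoint, port, and `max_tokens` value are assumptions, not part of this guide), you could pass it to a vLLM server's OpenAI-compatible completions endpoint:
```python
from openai import OpenAI

# Assumed local vLLM server; adjust base_url and model to your own deployment.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model=model_id,   # the same identifier the server was launched with
    prompt=text,      # the chat-template output built above
    max_tokens=512,
    temperature=1.0,
    top_p=0.95,
)
print(completion.choices[0].text)
```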
## 🛠️ Function Call Definition
### Function Structure
Function calls need to be defined in the `tools` field of the request body. Each function consists of the following components:
```json
{
"tools": [
{
"name": "search_web",
"description": "Search function.",
"parameters": {
"properties": {
"query_list": {
"description": "Keywords for search, with list element count of 1.",
"items": { "type": "string" },
"type": "array"
},
"query_tag": {
"description": "Classification of the query",
"items": { "type": "string" },
"type": "array"
}
},
"required": [ "query_list", "query_tag" ],
"type": "object"
}
}
]
}
```
**Field Descriptions:**
- `name`: Function name
- `description`: Function description
- `parameters`: Function parameter definition
- `properties`: Parameter property definitions, where key is the parameter name and value contains detailed parameter description
- `required`: List of required parameters
- `type`: Parameter type (usually "object")
### Internal Model Processing Format
When processed internally by the model, function definitions are converted to a special format and concatenated to the input text:
```
]~!b[]~b]system ai_setting=MiniMax AI
MiniMax AI is an AI assistant independently developed by MiniMax. [e~[
]~b]system tool_setting=tools
You are provided with these tools:
<tools>
{"name": "search_web", "description": "Search function.", "parameters": {"properties": {"query_list": {"description": "Keywords for search, with list element count of 1.", "items": {"type": "string"}, "type": "array"}, "query_tag": {"description": "Classification of the query", "items": {"type": "string"}, "type": "array"}}, "required": ["query_list", "query_tag"], "type": "object"}}
</tools>
If you need to call tools, please respond with <tool_calls></tool_calls> XML tags, and provide tool-name and json-object of arguments, following the format below:
<tool_calls>
{"name": <tool-name>, "arguments": <args-json-object>}
...
</tool_calls>[e~[
]~b]user name=User
When were the most recent launch events for OpenAI and Gemini?[e~[
]~b]ai name=MiniMax AI
```
### Model Output Format
The model outputs function calls in the following format:
```xml
<think>
Okay, I will search for the OpenAI and Gemini latest release.
</think>
<tool_calls>
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"OpenAI\" \"latest\" \"release\""]}}
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"Gemini\" \"latest\" \"release\""]}}
</tool_calls>
```
## 📥 Function Call Result Processing
### Parsing Function Calls
You can use the following code to parse function calls from the model output:
```python
import re
import json
def parse_function_calls(content: str):
"""
Parse function calls from model output
"""
function_calls = []
# Match content within <tool_calls> tags
tool_calls_pattern = r"<tool_calls>(.*?)</tool_calls>"
tool_calls_match = re.search(tool_calls_pattern, content, re.DOTALL)
if not tool_calls_match:
return function_calls
tool_calls_content = tool_calls_match.group(1).strip()
# Parse each function call (one JSON object per line)
for line in tool_calls_content.split('\n'):
line = line.strip()
if not line:
continue
try:
# Parse JSON format function call
call_data = json.loads(line)
function_name = call_data.get("name")
arguments = call_data.get("arguments", {})
function_calls.append({
"name": function_name,
"arguments": arguments
})
print(f"Function call: {function_name}, Arguments: {arguments}")
except json.JSONDecodeError as e:
print(f"Parameter parsing failed: {line}, Error: {e}")
return function_calls
# Example: Handle weather query function
def execute_function_call(function_name: str, arguments: dict):
"""
Execute function call and return result
"""
if function_name == "get_current_weather":
location = arguments.get("location", "Unknown location")
# Build function execution result
return {
"role": "tool",
"name": function_name,
"content": json.dumps({
"location": location,
"temperature": "25",
"unit": "celsius",
"weather": "Sunny"
}, ensure_ascii=False)
}
elif function_name == "search_web":
query_list = arguments.get("query_list", [])
query_tag = arguments.get("query_tag", [])
# Simulate search results
return {
"role": "tool",
"name": function_name,
"content": f"Search keywords: {query_list}, Categories: {query_tag}\nSearch results: Relevant information found"
}
return None
```
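Putting the two helpers above together, an illustrative driver loop over a raw model completion could look like this (the `model_output` string is a made-up example):
```python
# Parse the completion, execute each tool call, and collect the tool messages
# that will be appended to the conversation history.
model_output = """<tool_calls>
{"name": "get_current_weather", "arguments": {"location": "Shanghai"}}
</tool_calls>"""

tool_messages = []
for call in parse_function_calls(model_output):
    result = execute_function_call(call["name"], call["arguments"])
    if result is not None:
        tool_messages.append(result)

print(tool_messages)
```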
### Returning Function Execution Results to the Model
After successfully parsing function calls, you should add the function execution results to the conversation history so that the model can access and utilize this information in subsequent interactions.
#### Single Result
If the model decides to call `search_web`, we suggest you return the function result in the following format, with the `name` field set to the specific tool name.
```json
{
"data": [
{
"role": "tool",
"name": "search_web",
"content": "search_result"
}
]
}
```
Corresponding model input format:
```
]~b]tool name=search_web
search_result[e~[
```
#### Multiple Results
If the model decides to call `search_web` and `get_current_weather` at the same time, we suggest returning the multiple function results in the following format, with the `name` field set to "tools" and the `content` field containing all of the results.
```json
{
"data": [
{
"role": "tool",
"name": "tools",
"content": "Tool name: search_web\nTool result: test_result1\n\nTool name: get_current_weather\nTool result: test_result2"
}
]
}
```
Corresponding model input format:
```
]~b]tool name=tools
Tool name: search_web
Tool result: test_result1
Tool name: get_current_weather
Tool result: test_result2[e~[
```
While we suggest following the formats above, as long as the model input is easy to understand, the specific values of `name` and `content` are entirely up to you as the caller.
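As a rough sketch of the full round trip (how you wire this up is entirely up to you; the snippet below builds on the Quick Start example and is an assumption, not a prescribed API), the tool result can be appended to the conversation and the chat template re-applied before the next generation:
```python
# Continue the Quick Start conversation: append the tool result, then rebuild
# the prompt for the next model call.
messages.append({
    "role": "tool",
    "name": "search_web",
    "content": "search_result",
})

next_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools,
)
```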
# MiniMax-M1 函数调用(Function Call)功能指南
## 📖 简介
MiniMax-M1 模型支持函数调用功能,使模型能够识别何时需要调用外部函数,并以结构化格式输出函数调用参数。本文档详细介绍了如何使用 MiniMax-M1 的函数调用功能。
## 🚀 快速开始
### 聊天模板使用
MiniMax-M1 使用特定的聊天模板格式处理函数调用。聊天模板定义在 `tokenizer_config.json` 中,你可以在代码中通过 template 来进行使用。
```python
from transformers import AutoTokenizer
def get_default_tools():
    return [
        {
            "name": "get_current_weather",
            "description": "Get the latest weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "A certain city, such as Beijing, Shanghai"
                    }
                },
                "required": ["location"]
            }
        }
    ]
# 加载模型和分词器
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "What's the weather like in Shanghai today?"
messages = [
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by Minimax based on MiniMax-M1 model."}]},
{"role": "user", "content": [{"type": "text", "text": prompt}]},
]
# 启用函数调用工具
tools = get_default_tools()
# 应用聊天模板,并加入工具定义
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
tools=tools
)
```
## 🛠️ 函数调用的定义
### 函数结构体
函数调用需要在请求体中定义 `tools` 字段,每个函数由以下部分组成:
```json
{
"tools": [
{
"name": "search_web",
"description": "搜索函数。",
"parameters": {
"properties": {
"query_list": {
"description": "进行搜索的关键词,列表元素个数为1。",
"items": { "type": "string" },
"type": "array"
},
"query_tag": {
"description": "query的分类",
"items": { "type": "string" },
"type": "array"
}
},
"required": [ "query_list", "query_tag" ],
"type": "object"
}
}
]
}
```
**字段说明:**
- `name`: 函数名称
- `description`: 函数功能描述
- `parameters`: 函数参数定义
- `properties`: 参数属性定义,key 是参数名,value 包含参数的详细描述
- `required`: 必填参数列表
- `type`: 参数类型(通常为 "object")
### 模型内部处理格式
在模型内部处理时,函数定义会被转换为特殊格式并拼接到输入文本中:
```
]~!b[]~b]system ai_setting=MiniMax AI
MiniMax AI是由上海稀宇科技有限公司(MiniMax)自主研发的AI助理。[e~[
]~b]system tool_setting=tools
You are provided with these tools:
<tools>
{"name": "search_web", "description": "搜索函数。", "parameters": {"properties": {"query_list": {"description": "进行搜索的关键词,列表元素个数为1。", "items": {"type": "string"}, "type": "array"}, "query_tag": {"description": "query的分类", "items": {"type": "string"}, "type": "array"}}, "required": ["query_list", "query_tag"], "type": "object"}}
</tools>
If you need to call tools, please respond with <tool_calls></tool_calls> XML tags, and provide tool-name and json-object of arguments, following the format below:
<tool_calls>
{"name": <tool-name>, "arguments": <args-json-object>}
...
</tool_calls>[e~[
]~b]user name=用户
OpenAI 和 Gemini 的最近一次发布会都是什么时候?[e~[
]~b]ai name=MiniMax AI
```
### 模型输出格式
模型会以以下格式输出函数调用:
```xml
<think>
Okay, I will search for the OpenAI and Gemini latest release.
</think>
<tool_calls>
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"OpenAI\" \"latest\" \"release\""]}}
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"Gemini\" \"latest\" \"release\""]}}
</tool_calls>
```
## 📥 函数调用结果处理
### 解析函数调用
您可以使用以下代码解析模型输出的函数调用:
```python
import re
import json
def parse_function_calls(content: str):
"""
解析模型输出中的函数调用
"""
function_calls = []
# 匹配 <tool_calls> 标签内的内容
tool_calls_pattern = r"<tool_calls>(.*?)</tool_calls>"
tool_calls_match = re.search(tool_calls_pattern, content, re.DOTALL)
if not tool_calls_match:
return function_calls
tool_calls_content = tool_calls_match.group(1).strip()
# 解析每个函数调用(每行一个JSON对象)
for line in tool_calls_content.split('\n'):
line = line.strip()
if not line:
continue
try:
# 解析JSON格式的函数调用
call_data = json.loads(line)
function_name = call_data.get("name")
arguments = call_data.get("arguments", {})
function_calls.append({
"name": function_name,
"arguments": arguments
})
print(f"调用函数: {function_name}, 参数: {arguments}")
except json.JSONDecodeError as e:
print(f"参数解析失败: {line}, 错误: {e}")
return function_calls
# 示例:处理天气查询函数
def execute_function_call(function_name: str, arguments: dict):
"""
执行函数调用并返回结果
"""
if function_name == "get_current_weather":
location = arguments.get("location", "未知位置")
# 构建函数执行结果
return {
"role": "tool",
"name": function_name,
"content": json.dumps({
"location": location,
"temperature": "25",
"unit": "celsius",
"weather": "晴朗"
}, ensure_ascii=False)
}
elif function_name == "search_web":
query_list = arguments.get("query_list", [])
query_tag = arguments.get("query_tag", [])
# 模拟搜索结果
return {
"role": "tool",
"name": function_name,
"content": f"搜索关键词: {query_list}, 分类: {query_tag}\n搜索结果: 相关信息已找到"
}
return None
```
### 将函数执行结果返回给模型
成功解析函数调用后,您应将函数执行结果添加到对话历史中,以便模型在后续交互中能够访问和利用这些信息。
#### 单个结果
假如模型调用了 `search_web` 函数,您可以参考如下格式添加执行结果,`name` 字段为具体的函数名称。
```json
{
"data": [
{
"role": "tool",
"name": "search_web",
"content": "search_result"
}
]
}
```
对应如下的模型输入格式:
```
]~b]tool name=search_web
search_result[e~[
```
#### 多个结果
假如模型同时调用了 `search_web` 和 `get_current_weather` 函数,您可以参考如下格式添加执行结果,`name` 字段为"tools",`content`包含多个结果。
```json
{
"data": [
{
"role": "tool",
"name": "tools",
"content": "Tool name: search_web\nTool result: test_result1\n\nTool name: get_current_weather\nTool result: test_result2"
}
]
}
```
对应如下的模型输入格式:
```
]~b]tool name=tools
Tool name: search_web
Tool result: test_result1
Tool name: get_current_weather
Tool result: test_result2[e~[
```
虽然我们建议您参考以上格式,但只要返回给模型的输入易于理解,`name` 和 `content` 的具体内容完全由您自主决定。
# Guia de Uso de Function Call no MiniMax-M1
[FunctionCall中文使用指南](./function_call_guide_cn.md)
## 📖 Introdução
O modelo MiniMax-M1 possui suporte para chamadas de funções (Function Call), permitindo que o modelo identifique quando funções externas precisam ser chamadas e gere os parâmetros dessas chamadas em um formato estruturado. Este documento fornece instruções detalhadas sobre como utilizar o recurso de chamadas de funções do MiniMax-M1.
## 🚀 Início Rápido
### Usando o Template de Chat
O MiniMax-M1 utiliza um template específico de chat para lidar com chamadas de funções. Este template é definido no arquivo `tokenizer_config.json` e pode ser utilizado no seu código através do template.
```python
from transformers import AutoTokenizer
def get_default_tools():
    return [
        {
            "name": "get_current_weather",
            "description": "Get the latest weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "A certain city, such as Beijing, Shanghai"
                    }
                },
                "required": ["location"]
            }
        }
    ]
# Modelo de carga e tokenizador
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "What's the weather like in Shanghai today?"
messages = [
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by Minimax based on MiniMax-M1 model."}]},
{"role": "user", "content": [{"type": "text", "text": prompt}]},
]
# Habilitar ferramentas de chamada de função
tools = get_default_tools()
# Aplicar modelo de bate-papo e adicionar definições de ferramentas
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
tools=tools
)
```
## 🛠️ Definição de Function Call
### Estrutura da Função
As funções precisam ser definidas no campo `tools` do corpo da requisição. Cada função é composta pelos seguintes elementos:
```json
{
"tools": [
{
"name": "search_web",
"description": "Search function.",
"parameters": {
"properties": {
"query_list": {
"description": "Keywords for search, with list element count of 1.",
"items": { "type": "string" },
"type": "array"
},
"query_tag": {
"description": "Classification of the query",
"items": { "type": "string" },
"type": "array"
}
},
"required": [ "query_list", "query_tag" ],
"type": "object"
}
}
]
}
```
**Descrição dos Campos:**
* `name`: Nome da função
* `description`: Descrição da função
* `parameters`: Definição dos parâmetros da função
* `properties`: Definições dos parâmetros, onde a chave é o nome do parâmetro e o valor contém a descrição
* `required`: Lista de parâmetros obrigatórios
* `type`: Tipo de dado (geralmente "object")
### Formato Interno de Processamento do Modelo
Internamente, as definições de funções são convertidas para um formato especial e concatenadas ao texto de entrada:
```
]~!b[]~b]system ai_setting=MiniMax AI
MiniMax AI is an AI assistant independently developed by MiniMax. [e~[
]~b]system tool_setting=tools
You are provided with these tools:
<tools>
{"name": "search_web", "description": "Search function.", "parameters": {"properties": {"query_list": {"description": "Keywords for search, with list element count of 1.", "items": {"type": "string"}, "type": "array"}, "query_tag": {"description": "Classification of the query", "items": {"type": "string"}, "type": "array"}}, "required": ["query_list", "query_tag"], "type": "object"}}
</tools>
If you need to call tools, please respond with <tool_calls></tool_calls> XML tags, and provide tool-name and json-object of arguments, following the format below:
<tool_calls>
{"name": <tool-name>, "arguments": <args-json-object>}
...
</tool_calls>[e~[
]~b]user name=User
When were the most recent launch events for OpenAI and Gemini?[e~[
]~b]ai name=MiniMax AI
```
### Model Output Format
The model generates function calls in the following format:
```xml
<think>
Okay, let me look up the latest releases from OpenAI and Gemini.
</think>
<tool_calls>
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"OpenAI\" \"latest\" \"release\""]}}
{"name": "search_web", "arguments": {"query_tag": ["technology", "events"], "query_list": ["\"Gemini\" \"latest\" \"release\""]}}
</tool_calls>
```
## 📥 Processing Function Call Results
### Parsing Function Calls
You can use the code below to extract the function calls from the model output:
```python
import re
import json
def parse_function_calls(content: str):
"""
Parse function calls from model output
"""
function_calls = []
    # Match the content inside the <tool_calls> tags
tool_calls_pattern = r"<tool_calls>(.*?)</tool_calls>"
tool_calls_match = re.search(tool_calls_pattern, content, re.DOTALL)
if not tool_calls_match:
return function_calls
tool_calls_content = tool_calls_match.group(1).strip()
    # Parse each function call (one JSON object per line)
for line in tool_calls_content.split('\n'):
line = line.strip()
if not line:
continue
try:
            # Parse the JSON-formatted function call
call_data = json.loads(line)
function_name = call_data.get("name")
arguments = call_data.get("arguments", {})
function_calls.append({
"name": function_name,
"arguments": arguments
})
print(f"Function call: {function_name}, Arguments: {arguments}")
except json.JSONDecodeError as e:
print(f"Parameter parsing failed: {line}, Error: {e}")
return function_calls
# Example: handle a weather query function
def execute_function_call(function_name: str, arguments: dict):
"""
Execute function call and return result
"""
if function_name == "get_current_weather":
location = arguments.get("location", "Unknown location")
        # Build the function execution result
return {
"role": "tool",
"name": function_name,
"content": json.dumps({
"location": location,
"temperature": "25",
"unit": "celsius",
"weather": "Sunny"
}, ensure_ascii=False)
}
elif function_name == "search_web":
query_list = arguments.get("query_list", [])
query_tag = arguments.get("query_tag", [])
        # Simulate search results
return {
"role": "tool",
"name": function_name,
"content": f"Search keywords: {query_list}, Categories: {query_tag}\nSearch results: Relevant information found"
}
return None
```
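A short usage sketch tying the two helpers together; the sample output string is illustrative:
```python
# Illustrative usage: parse a <tool_calls> block and execute each call
model_output = (
    "<tool_calls>\n"
    '{"name": "get_current_weather", "arguments": {"location": "Shanghai"}}\n'
    "</tool_calls>"
)

tool_messages = []
for call in parse_function_calls(model_output):
    result = execute_function_call(call["name"], call["arguments"])
    if result is not None:
        tool_messages.append(result)

print(tool_messages)
```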
### Returning Function Results to the Model
After parsing and executing the functions, you should add the results to the message sequence so that the model can use them in its subsequent responses.
#### Single Result
If the model calls the `search_web` function, return the result in the following format, with the `name` field set to the specific tool name:
```json
{
"data": [
{
"role": "tool",
"name": "search_web",
"content": "search_result"
}
]
}
```
This corresponds to the following model input format:
```
]~b]tool name=search_web
search_result[e~[
```
#### Multiple Results
If the model calls `search_web` and `get_current_weather` at the same time, send the results as follows, setting `name` to "tools" and placing all results in the `content` field:
```json
{
"data": [
{
"role": "tool",
"name": "tools",
"content": "Tool name: search_web\nTool result: test_result1\n\nTool name: get_current_weather\nTool result: test_result2"
}
]
}
```
This corresponds to the following model input format:
```
]~b]tool name=tools
Tool name: search_web
Tool result: test_result1

Tool name: get_current_weather
Tool result: test_result2[e~[
```
Although this is the recommended format, as long as the input returned to the model is easy to understand, the exact contents of `name` and `content` are entirely up to you.
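One way to feed these results back for the next turn, sketched under the same assumptions as the quick-start code (`tokenizer`, `messages`, and `tools` already defined), with `assistant_output` holding the model's `<tool_calls>` response:
```python
# Sketch, not an official API: append the assistant's tool-call turn and the
# combined tool results, then build the prompt for the follow-up generation.
messages.append({"role": "assistant", "content": [{"type": "text", "text": assistant_output}]})
messages.append({
    "role": "tool",
    "name": "tools",
    "content": (
        "Tool name: search_web\nTool result: test_result1\n\n"
        "Tool name: get_current_weather\nTool result: test_result2"
    ),
})

next_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools,
)
```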
# 🚀 MiniMax Model Transformers Deployment Guide
[Transformers中文版部署指南](./transformers_deployment_guide_cn.md)
## 📖 Introduction
This guide will help you deploy the MiniMax-M1 model using the [Transformers](https://huggingface.co/docs/transformers/index) library. Transformers is a widely used deep learning library that provides a rich collection of pre-trained models and flexible model operation interfaces.
## 🛠️ Environment Setup
### Installing Transformers
```bash
pip install transformers torch accelerate
```
## 📋 Basic Usage Example
The pre-trained model can be used as follows:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
MODEL_PATH = "{MODEL_PATH}"
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
messages = [
{"role": "user", "content": [{"type": "text", "text": "What is your favourite condiment?"}]},
{"role": "assistant", "content": [{"type": "text", "text": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"}]},
{"role": "user", "content": [{"type": "text", "text": "Do you have mayonnaise recipes?"}]}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
generation_config = GenerationConfig(
max_new_tokens=20,
eos_token_id=tokenizer.eos_token_id,
use_cache=True,
)
generated_ids = model.generate(**model_inputs, generation_config=generation_config)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
## ⚡ Performance Optimization
### Speeding up with Flash Attention
The code snippet above showcases inference without any optimization tricks. However, one can drastically speed up the model by leveraging [Flash Attention](https://huggingface.co/docs/transformers/perf_train_gpu_one#flash-attention-2), which is a faster implementation of the attention mechanism used inside the model.
First, make sure to install the latest version of Flash Attention 2:
```bash
pip install -U flash-attn --no-build-isolation
```
Also make sure that you have hardware that is compatible with Flash-Attention 2. Read more about it in the official documentation of the [Flash Attention repository](https://github.com/Dao-AILab/flash-attention). Additionally, ensure you load your model in half-precision (e.g. `torch.float16`).
To load and run a model using Flash Attention 2, refer to the snippet below:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
MODEL_PATH = "{MODEL_PATH}"
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, trust_remote_code=True, torch_dtype=torch.float16, attn_implementation="flash_attention_2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
prompt = "My favourite condiment is"
model_inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
response = tokenizer.batch_decode(generated_ids)[0]
print(response)
```
## 📮 Getting Support
If you encounter any issues while deploying the MiniMax-M1 model:
- Please check our official documentation
- Contact our technical support team through official channels
- Submit an Issue on our GitHub repository
We continuously optimize the deployment experience on Transformers and welcome your feedback!