---
license: apache-2.0
library_name: transformers
inference: false
---
<p align="center">
<img src="assets/logo.png" width="150" style="margin-bottom: 0.2;"/>
</p>
# 🍓 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
<!-- Broader Real-World Applications -->
<!-- # 🍓 Marco-o1: An Open Large Reasoning Model for Real-World Solutions -->
<!-- <h2 align="center"> <a href="https://github.com/AIDC-AI/Marco-o1/">Marco-o1</a></h2> -->
<!-- <h5 align="center"> If you appreciate our project, please consider giving us a star ⭐ on GitHub to stay updated with the latest developments. </h2> -->
<div align="center">
<!-- **Affiliations:** -->
⭐ _**MarcoPolo Team**_ ⭐
[_**AI Business, Alibaba International Digital Commerce**_](https://aidc-ai.com)
[**Github**](https://github.com/AIDC-AI/Marco-o1) 🤗 [**Hugging Face**](https://huggingface.co/AIDC-AI/Marco-o1) 📝 [**Paper**](https://arxiv.org/abs/2411.14405) 🧑‍💻 [**Model**](https://huggingface.co/AIDC-AI/Marco-o1) 🗂️ [**Data**](https://github.com/AIDC-AI/Marco-o1/tree/main/data) 📽️ [**Demo**](https://huggingface.co/AIDC-AI/Marco-o1)
</div>
🎯 **Marco-o1** not only focuses on disciplines with standard answers, such as mathematics, physics, and coding—which are well-suited for reinforcement learning (RL)—but also places greater emphasis on **open-ended resolutions**. We aim to address the question: _"Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?"_
Currently, the Marco-o1 Large Language Model (LLM) is powered by _Chain-of-Thought (CoT) fine-tuning_, _Monte Carlo Tree Search (MCTS)_, _reflection mechanisms_, and _innovative reasoning strategies_—optimized for complex real-world problem-solving tasks.
⚠️ **Limitations:** <ins>We would like to emphasize that this research work is inspired by OpenAI's o1 (from which the name is also derived). This work aims to explore potential approaches to shed light on the currently unclear technical roadmap for large reasoning models. In addition, our focus is on open-ended questions, and we have observed interesting phenomena in multilingual applications. However, we must acknowledge that the current model primarily exhibits o1-like reasoning characteristics and that its performance still falls short of a fully realized "o1" model. This is not a one-time effort, and we remain committed to continuous optimization and ongoing improvement.</ins>
![img.png](assets/img.png)
## 🚀 Highlights
Currently, our work is distinguished by the following highlights:
- 🍀 Fine-Tuning with CoT Data: We develop Marco-o1-CoT by performing full-parameter fine-tuning on the base model using an open-source CoT dataset combined with our self-developed synthetic data.
- 🍀 Solution Space Expansion via MCTS: We integrate LLMs with MCTS (Marco-o1-MCTS), using the model's output confidence to guide the search and expand the solution space.
- 🍀 Reasoning Action Strategy: We implement novel reasoning action strategies and a reflection mechanism (Marco-o1-MCTS Mini-Step), including exploring different action granularities within the MCTS framework and prompting the model to self-reflect, thereby significantly enhancing the model's ability to solve complex problems.
- 🍀 Application in Translation Tasks: We are the first to apply Large Reasoning Models (LRM) to the machine translation task, exploring inference-time scaling laws in the multilingual and translation domains.
OpenAI recently introduced the groundbreaking o1 model, renowned for its exceptional reasoning capabilities. This model has demonstrated outstanding performance on platforms such as AIME and CodeForces, surpassing other leading models. Inspired by this success, we aimed to push the boundaries of LLMs even further, enhancing their reasoning abilities to tackle complex, real-world challenges.
🌍 Marco-o1 leverages advanced techniques like CoT fine-tuning, MCTS, and Reasoning Action Strategies to enhance its reasoning power. As shown in Figure 2, by fine-tuning Qwen2-7B-Instruct with a combination of the filtered Open-O1 CoT dataset, Marco-o1 CoT dataset, and Marco-o1 Instruction dataset, Marco-o1 improved its handling of complex tasks. MCTS allows exploration of multiple reasoning paths using confidence scores derived from softmax-applied log probabilities of the top-k alternative tokens, guiding the model to optimal solutions. Moreover, our reasoning action strategy involves varying the granularity of actions within steps and mini-steps to optimize search efficiency and accuracy.
<div align="center">
<img src="assets/intro_2.jpg" alt="Figure Description or Alt Text" width="80%">
<p><strong>Figure 2: </strong>The overview of Marco-o1.</p>
</div>
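The confidence signal described above (a softmax over the log probabilities of the top-k alternative tokens) can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `confidence_score` and `path_reward` are hypothetical helper names, and averaging per-token confidences is a simplification of the search reward.

```python
import math

def confidence_score(top_k_logprobs):
    # Softmax over the log probabilities of the top-k alternative
    # tokens; index 0 is assumed to hold the chosen token, so its
    # normalized weight serves as that token's confidence.
    exps = [math.exp(lp) for lp in top_k_logprobs]
    return exps[0] / sum(exps)

def path_reward(per_token_confidences):
    # Averaging confidences over a rollout yields a scalar value that
    # can guide node selection in the tree search.
    return sum(per_token_confidences) / len(per_token_confidences)
```

A token whose log probability clearly dominates its alternatives yields a confidence close to 1, steering the search toward high-certainty reasoning paths.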
🌏 As shown in Figure 3, Marco-o1 achieved accuracy improvements of +6.17% on the MGSM (English) dataset and +5.60% on the MGSM (Chinese) dataset, showcasing enhanced reasoning capabilities.
<div align="center">
<img src="assets/results.jpg" alt="Figure Description or Alt Text" width="80%">
<p><strong>Figure 3: </strong>The main results of Marco-o1.</p>
</div>
🌎 Additionally, in translation tasks, Marco-o1 excels at translating slang expressions, such as rendering "这个鞋拥有踩屎感" (literal translation: "This shoe offers a stepping-on-poop sensation.") as "This shoe has a comfortable sole," showing its superior grasp of colloquial nuances.
<div align="center">
<img src="assets/translation.jpg" alt="Figure Description or Alt Text" width="80%">
<p><strong>Figure 4: </strong>The demonstration of translation task using Marco-o1.</p>
</div>
For more information, please visit our [**Github**](https://github.com/AIDC-AI/Marco-o1).
## Usage
1. **Load Marco-o1-CoT model:**
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-o1")
model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Marco-o1")
```
2. **Inference:**
Run one of the inference scripts (you can edit custom inputs inside the script):
```shell
python ./src/talk_with_model.py
# Use vLLM
python ./src/talk_with_model_vllm.py
```
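If you prefer not to use the repo scripts, a single chat turn can also be run directly with the `transformers` API. The sketch below is illustrative only: the prompt and generation parameters are our own choices, not the authors' settings, and `build_messages` is a hypothetical helper.

```python
def build_messages(user_prompt):
    # Plain single-turn chat format; the model emits its reasoning
    # inside <Thought> tags followed by an <Output> section.
    return [{"role": "user", "content": user_prompt}]

def generate(prompt, model_id="AIDC-AI/Marco-o1", max_new_tokens=512):
    # Imported lazily so the sketch can be read without the heavy deps.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("How many 'r's are in 'strawberry'?"))
```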
# 👨🏻‍💻 Acknowledgement
## Main Contributors
From MarcoPolo Team, AI Business, Alibaba International Digital Commerce:
- Yu Zhao
- [Huifeng Yin](https://github.com/HuifengYin)
- Hao Wang
- [Longyue Wang](http://www.longyuewang.com)
## Citation
If you find Marco-o1 useful for your research and applications, please cite:
```bibtex
@misc{zhao2024marcoo1openreasoningmodels,
title={Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions},
author={Yu Zhao and Huifeng Yin and Bo Zeng and Hao Wang and Tianqi Shi and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang},
year={2024},
eprint={2411.14405},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.14405},
}
```
## LICENSE
This project is licensed under [Apache License Version 2](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) (SPDX-License-identifier: Apache-2.0).
## DISCLAIMER
We used compliance checking algorithms during the training process, to ensure the compliance of the trained model and dataset to the best of our ability. Due to complex data and the diversity of language model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.
Copyright (C) 2024 AIDC-AI
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Copyright 2018- The Hugging Face team. All rights reserved.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
This model was trained from the following base model:
Qwen2.5-7B-Instruct (https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), license: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/LICENSE (SPDX-License-Identifier: Apache-2.0)
# Marco-o1
By combining CoT fine-tuning, the MCTS algorithm, and reasoning action strategies, Marco-o1 outperforms some closed-source products.
## Paper
`Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions`
- https://arxiv.org/pdf/2411.14405
## Model Architecture
This project performs full-parameter fine-tuning on Qwen2-7B-Instruct. The architecture is similar to the Llama family: a minimalist decoder-only design. Llama derives from the basic Transformer: the body is attention (QKV self dot-product) plus an FFN (fully connected layers), with a final softmax converting the logits into output probabilities. To normalize the data distribution and help training converge, an RMS Norm is inserted before the attention, the FFN, and the softmax, respectively.
<div align=center>
<img src="./doc/llama3.png"/>
</div>
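The RMS Norm mentioned above fits in a few lines; the following is a minimal NumPy sketch for illustration, not the project's implementation:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize by the root mean square over the last axis, then apply
    # a learned per-channel scale; unlike LayerNorm, no mean is
    # subtracted and no bias is added.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight
```

Dropping the mean subtraction makes RMSNorm cheaper than LayerNorm while keeping activations at a stable scale.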
## Algorithm
Marco-o1 employs Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), a reflection mechanism, and innovative reasoning action strategies, optimized specifically for complex real-world problem-solving tasks.
MCTS is used to generate the Marco-o1 CoT dataset, which helps form complex reasoning paths and strengthens the model's reasoning ability; the more MCTS simulations are run, the closer the search result comes to the optimal solution.
<div align=center>
<img src="./doc/algorithm.png"/>
</div>
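Node selection in MCTS is commonly driven by the UCT rule, which trades off a node's average value against how often it has been visited. The function below shows the standard formula as a generic sketch; Marco-o1's actual scoring is confidence-based, so treat this as background rather than the project's exact variant.

```python
import math

def uct_score(total_value, visits, parent_visits, c=1.41):
    # Standard UCT: exploitation (mean value so far) plus an
    # exploration bonus that shrinks as a node is visited more.
    if visits == 0:
        return float("inf")  # always try unexplored children first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

During selection, the child with the highest UCT score is expanded; raising `c` biases the search toward exploring less-visited reasoning steps.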
## Environment Setup
```shell
mv Marco-o1_pytorch Marco-o1 # drop the framework-name suffix
```
### Docker (Option 1)
```shell
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-py3.10-dtk24.04.3-ubuntu20.04
# Replace <your IMAGE ID> with the ID of the image pulled above; for this image it is b272aae8ec72
docker run -it --shm-size=64G -v $PWD/Marco-o1:/home/Marco-o1 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name marcoo1 <your IMAGE ID> bash
cd /home/Marco-o1
pip install -r requirements.txt
pip install whl/lmslim-0.1.2+das.dtk24043-cp310-cp310-linux_x86_64.whl # install lmslim==0.1.2
pip install whl/vllm-0.6.2+das.opt1.cd549d3.dtk24043-cp310-cp310-linux_x86_64.whl # install vllm==0.6.2
```
### Dockerfile (Option 2)
```shell
cd /home/Marco-o1/docker
docker build --no-cache -t marco-o1:latest .
docker run --shm-size=64G --name marcoo1 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/../../Marco-o1:/home/Marco-o1 -it marco-o1:latest bash
# If installing the environment via the Dockerfile takes too long, comment out the pip installs inside it and install the Python libraries after the container starts: pip install -r requirements.txt
cd /home/Marco-o1
pip install whl/lmslim-0.1.2+das.dtk24043-cp310-cp310-linux_x86_64.whl # install lmslim==0.1.2
pip install whl/vllm-0.6.2+das.opt1.cd549d3.dtk24043-cp310-cp310-linux_x86_64.whl # install vllm==0.6.2
```
### Anaconda (Option 3)
1. The specialized deep-learning libraries required for the DCU GPUs used by this project can be downloaded and installed from the HPC Cube developer community:
- https://developer.hpccube.com/tool/
```
DTK driver: dtk24.04.3
python: 3.10
torch: 2.3.0
torchvision: 0.18.1
torchaudio: 2.1.2
triton: 2.1.0
vllm: 0.6.2
flash-attn: 2.6.1
deepspeed: 0.14.2
apex: 1.3.0
xformers: 0.0.25
transformers: 4.48.0
```
`Tips: the DTK driver, python, torch, and other DCU-related tool versions listed above must match each other exactly.`
2. Install the remaining, non-specialized libraries according to requirements.txt:
```shell
cd /home/Marco-o1
pip install -r requirements.txt
pip install whl/lmslim-0.1.2+das.dtk24043-cp310-cp310-linux_x86_64.whl # install lmslim==0.1.2
pip install whl/vllm-0.6.2+das.opt1.cd549d3.dtk24043-cp310-cp310-linux_x86_64.whl # install vllm==0.6.2
```
## Dataset
`None`
The demo data directory structure is as follows:
```
/home/Marco-o1/data
├── CoT_demo.json
```
## Inference
### Single node, multiple GPUs
**Option 1**
```shell
python src/talk_with_model.py
```
**Option 2**
```shell
python src/talk_with_model_vllm.py
```
For more details, refer to the original project's [`README_origin`](./README_origin.md).
## Result
`Input` ("What is the Earth's circumference?"):
```
User: 地球的周长是多少
```
`Output:`
```
<Thought>
Alright, I need to figure out the Earth's circumference. Let's start by recalling some basic geography. The Earth is an oblate spheroid, which means it's slightly flattened at the poles and bulging at the equator. But for simplicity, I'll consider it a perfect sphere to estimate the circumference.
First, I remember that the Earth's equatorial circumference is approximately 40,075 kilometers. That's a standard figure I've heard before. But wait, is that accurate? Maybe I should double-check that number. Let me think... Yes, 40,075 kilometers is the commonly accepted value for the Earth's circumference at the equator.
But what about the polar circumference? I think it's slightly less due to the Earth's oblate shape. Let me recall... The polar circumference is roughly 40,008 kilometers. Okay, so there's a slight difference between the equatorial and polar circumferences.
Now, if I wanted to calculate the Earth's circumference using its radius, I could use the formula for the circumference of a circle: C = 2πr. I know the Earth's average radius is about 6,371 kilometers. Plugging that in: C = 2 * π * 6,371 km ≈ 40,075 km. That matches the number I initially thought of.
But wait, is the Earth's radius exactly 6,371 kilometers? I think it varies slightly depending on where you measure it. The Earth is not a perfect sphere; it's slightly flattened at the poles and bulging at the equator. So, the radius at the equator is a bit different from the radius at the poles.
Let me think about the difference. The Earth's equatorial radius is approximately 6,378 kilometers, while the polar radius is about 6,357 kilometers. That's a difference of about 21 kilometers. So, if I use the average radius, 6,371 kilometers, it makes sense that the circumference comes out to around 40,075 kilometers.
But what if I use the polar radius instead? Let's try that: C = 2 * π * 6,357 km ≈ 40,008 km. That's the polar circumference. So, depending on where you measure the circumference, it can vary slightly.
Is there a standard value that's widely accepted? Yes, the equatorial circumference is generally cited as 40,075 kilometers, which is the most commonly used figure. However, it's good to be aware of the slight variations due to the Earth's shape.
Another thought: how does this compare to other planets? For example, Mars has a circumference of about 21,280 kilometers, which is much smaller. Earth's circumference is significantly larger, which makes sense given its size.
Also, considering the Earth's circumference, it's interesting to think about how distances are measured on our planet. For instance, one degree of latitude is approximately 111 kilometers, which is useful for navigation and mapping.
But back to the main question: what's the Earth's circumference? Based on the information I have, the equatorial circumference is approximately 40,075 kilometers, and the polar circumference is about 40,008 kilometers. The average circumference, using the average radius, is also around 40,075 kilometers.
Is there any other factor I should consider? Maybe the Earth's rotation affects its circumference, but I think that's negligible for this calculation. The Earth's rotation causes the equator to bulge outward, but the difference in circumference is already accounted for in the measurements.
In conclusion, the Earth's circumference is approximately 40,075 kilometers at the equator, with a slight variation at the poles.
</Thought>
<Output>
The Earth's circumference is approximately 40,075 kilometers at the equator.
</Output>
```
### Precision
DCU precision is consistent with GPU; inference framework: PyTorch.
## Application Scenarios
### Algorithm Category
`Dialogue / Q&A`
### Key Application Industries
`Manufacturing, broadcast media, finance, energy, healthcare, smart home, education`
## Pretrained Weights
Pretrained-weight quick-download center: [SCNet AIModels](http://113.200.138.88:18080/aimodels); the pretrained weights for this project can be downloaded via the fast channel: [AIDC-AI/Marco-o1](http://113.200.138.88:18080/aimodels/aidc-ai-group/Marco-o1.git)
The Hugging Face download address is: [AIDC-AI/Marco-o1](https://huggingface.co/AIDC-AI/Marco-o1)
## Source Repository and Issue Feedback
- http://developer.sourcefind.cn/codes/modelzoo/marco-o1_pytorch.git
## References
- https://github.com/AIDC-AI/Marco-o1.git
<p align="center">
<img src="assets/logo.png" width="150" style="margin-bottom: 0.2;"/>
</p>
# 🍓 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
<!-- Broader Real-World Applications -->
<!-- # 🍓 Marco-o1: An Open Large Reasoning Model for Real-World Solutions -->
<!-- <h2 align="center"> <a href="https://github.com/AIDC-AI/Marco-o1/">Marco-o1</a></h2> -->
<!-- <h5 align="center"> If you appreciate our project, please consider giving us a star ⭐ on GitHub to stay updated with the latest developments. </h2> -->
<h4 align="center">
<!-- [![🤗Hugging Face](https://img.shields.io/badge/🤗Hugging_Face-Marco_o1-yellow)](https://huggingface.co/) [![Project Page](https://img.shields.io/badge/Project_Page-Marco_o1-blue)](https://github.com/AIDC-AI/Marco-o1/) -->
<div align="center">
<img src="https://img.shields.io/badge/Version-1.0.0-blue.svg" alt="Version">
<img src="https://img.shields.io/badge/License-Apache%202.0-green.svg" alt="License">
<img src="https://img.shields.io/github/stars/AIDC-AI/Marco-o1?color=yellow" alt="Stars">
<img src="https://img.shields.io/github/issues/AIDC-AI/Marco-o1?color=red" alt="Issues">
<img src="https://img.shields.io/badge/python-3.8-purple.svg" alt="Python">
</div>
</h4>
<div align="center">
<!-- **Affiliations:** -->
⭐ _**MarcoPolo Team**_ ⭐
[_**Alibaba International Digital Commerce**_](https://aidc-ai.com)
:octocat: [**Github**](https://github.com/AIDC-AI/Marco-o1) 🤗 [**Hugging Face**](https://huggingface.co/AIDC-AI/Marco-o1) 📝 [**Paper**](https://arxiv.org/abs/2411.14405) 🧑‍💻 [**Model**](https://huggingface.co/AIDC-AI/Marco-o1) 🗂️ [**Data**](https://github.com/AIDC-AI/Marco-o1/tree/main/data) 📽️ [**Demo**](https://huggingface.co/AIDC-AI/Marco-o1)
</div>
🎯 **Marco-o1** not only focuses on disciplines with standard answers, such as mathematics, physics, and coding—which are well-suited for reinforcement learning (RL)—but also places greater emphasis on **open-ended resolutions**. We aim to address the question: _"Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?"_
Currently, the Marco-o1 Large Language Model (LLM) is powered by _Chain-of-Thought (CoT) fine-tuning_, _Monte Carlo Tree Search (MCTS)_, _reflection mechanisms_, and _innovative reasoning strategies_—optimized for complex real-world problem-solving tasks.
⚠️ **Limitations:** <ins>We would like to emphasize that this research work is inspired by OpenAI's o1 (from which the name is also derived). This work aims to explore potential approaches to shed light on the currently unclear technical roadmap for large reasoning models. In addition, our focus is on open-ended questions, and we have observed interesting phenomena in multilingual applications. However, we must acknowledge that the current model primarily exhibits o1-like reasoning characteristics and that its performance still falls short of a fully realized "o1" model. This is not a one-time effort, and we remain committed to continuous optimization and ongoing improvement.</ins>
<div align="center">
<img src="assets/strawberry_2.jpg" alt="Figure Description or Alt Text" width="80%">
<p><strong>Figure 1: </strong> A classic 'strawberry' question reasoned by our Marco-o1 model: "How many 'r' are in strawberry". Although the answer is correct, the final letter 'y' is overlooked during CoT. This is an interesting finding, which is discussed in issue https://github.com/AIDC-AI/Marco-o1/issues/3.</p>
</div>
## 🚀 Highlights
Currently, our work is distinguished by the following highlights:
- 🍀 **Fine-Tuning with CoT Data:** We develop <ins>Marco-o1-CoT</ins> by performing full-parameter fine-tuning on the base model using an open-source CoT dataset combined with our self-developed synthetic data.
- 🍀 **Solution Space Expansion via MCTS:** We integrate LLMs with MCTS (<ins>Marco-o1-MCTS</ins>), using the model's output confidence to guide the search and expand the solution space.
- 🍀 **Reasoning Action Strategy:** We implement novel reasoning action strategies and a reflection mechanism (<ins>Marco-o1-MCTS mini-step</ins>), including exploring different action granularities within the MCTS framework and prompting the model to self-reflect, thereby significantly enhancing the model's ability to solve complex problems.
- 🍀 **Application in Translation Tasks:** We are the first to apply Large Reasoning Models (LRM) to the <ins>machine translation task</ins>, exploring inference-time scaling laws in the multilingual and translation domains.
## 🔥 News
<!-- ## Coming Soon -->
<!-- This is our initial version, and we will continue to update and enhance the model's reasoning capabilities. -->
1. [Coming Soon] 🏃 **Reward Models:** We are working on training reward models, including Outcome Reward Modeling (ORM) and Process Reward Modeling (PRM), to provide a more accurate reward signal for MCTS. A more precise reward function will help reduce randomness in tree search results and improve overall performance.
2. [Coming Soon] 🏃 **Reinforcement Learning:** We are conducting reinforcement learning training to develop better reasoning models. By utilizing RL techniques, we aim to refine the model's decision-making processes and further enhance its problem-solving abilities.
- [2024/11/13] 🔥 We released **Marco-o1**. This initial release includes our reasoning model, optimized for complex problem-solving and versatile applications across various domains.
## 🔔 Introduction
OpenAI recently introduced the groundbreaking o1 model, renowned for its exceptional reasoning capabilities. This model has demonstrated outstanding performance on platforms such as AIME and CodeForces, surpassing other leading models. Inspired by this success, we aimed to push the boundaries of LLMs even further, enhancing their reasoning abilities to tackle complex, real-world challenges.
🌍 Marco-o1 leverages advanced techniques like CoT fine-tuning, MCTS, and Reasoning Action Strategies to enhance its reasoning power. As shown in Figure 2, by fine-tuning Qwen2-7B-Instruct with a combination of the filtered Open-O1 CoT dataset, Marco-o1 CoT dataset, and Marco-o1 Instruction dataset, Marco-o1 improved its handling of complex tasks. MCTS allows exploration of multiple reasoning paths using confidence scores derived from softmax-applied log probabilities of the top-k alternative tokens, guiding the model to optimal solutions. Moreover, our reasoning action strategy involves varying the granularity of actions within steps and mini-steps to optimize search efficiency and accuracy.
<div align="center">
<img src="assets/intro_2.jpg" alt="Figure Description or Alt Text" width="80%">
<p><strong>Figure 2: </strong>The overview of Marco-o1.</p>
</div>
🌏 As shown in Figure 3, Marco-o1 achieved accuracy improvements of +6.17% on the MGSM (English) dataset and +5.60% on the MGSM (Chinese) dataset, showcasing enhanced reasoning capabilities.
<div align="center">
<img src="assets/results.jpg" alt="Figure Description or Alt Text" width="80%">
<p><strong>Figure 3: </strong>The main results of Marco-o1.</p>
</div>
🌎 Additionally, in translation tasks, Marco-o1 excels at slang expressions: it renders "这个鞋拥有踩屎感" (literally, "This shoe offers a stepping-on-poop sensation.") as "This shoe has a comfortable sole," demonstrating a superior grasp of colloquial nuance.
<div align="center">
<img src="assets/translation.jpg" alt="Figure Description or Alt Text" width="80%">
<p><strong>Figure 4: </strong>The demonstration of translation task using Marco-o1.</p>
</div>
<!-- ## 🎨 Case Show
This is a classic example from our Marco-o1 model, "How many 'r's are in the word 'strawberry'?" Through multiple search steps, the correct answer was obtained, as shown in Figure 2. Although we tested general reasoning capabilities, our primary focus is on tackling challenging translation problems. An example of this focus is shown in Figure 3, illustrating the translation of the sentence "This shoe has a comfortable sole and is highly recommended for purchase."
-->
<!-- <div align="center">
<img src="assets/strawberry_2.jpg" alt="Figure Description or Alt Text" width="80%">
<p><strong>Figure 2: </strong>Reasoning example of "How many 'r' are in strawberry"</p>
</div> -->
## 📚 Marco Reasoning Datasets
To enhance the reasoning capabilities of the Marco-o1 model, we employed a supervised fine-tuning (SFT) strategy using a variety of datasets.
- 📊 **Open-O1 CoT Dataset (Filtered):** We refined the Open-O1 project's CoT Dataset by applying heuristic and quality filtering processes. This enhancement allowed the model to adopt structured reasoning patterns effectively.
- 📊 **Marco-o1 CoT Dataset (Synthetic):** We generated the Marco-o1 CoT Dataset using MCTS, which helped to formulate complex reasoning pathways, further bolstering the model's reasoning capabilities.
- 📊 **Marco Instruction Dataset:** Recognizing the critical role of robust instruction-following capabilities in executing complex tasks, we incorporated a set of instruction-following data. This integration ensures the model remains competent across a wide range of tasks, maintaining its general effectiveness while significantly boosting its reasoning flair.
| Dataset | \#Samples |
|--------------------|-----------------|
| Open-O1 CoT Dataset (Filtered) | 45,125 |
| Marco-o1 CoT Dataset (Synthetic) | 10,000 |
| Marco Instruction Dataset| 5,141 |
| **Total** | **60,266** |
📥 [Marco Reasoning Dataset](https://github.com/AIDC-AI/Marco-o1/blob/main/data/CoT_demo.json) (Our Partial Dataset)
## 🌟 Solution Space Expansion via MCTS
We integrated LLMs with MCTS to enhance the reasoning capabilities of our Marco-o1 model:
- 💎 **Nodes as Reasoning States:** In the MCTS framework, each node represents a reasoning state of the problem-solving process.
- 💎 **Actions as LLM Outputs:** The possible actions from a node are the outputs generated by the LLM. These outputs represent potential steps or mini-steps in the reasoning chain.
- 💎 **Rollout and Reward Calculation:** During the rollout phase, the LLM continues the reasoning process to a terminal state, and a reward score $v$ is computed from the confidence of the generated tokens.
- 💎 **Guiding MCTS:** This reward score $v$ is used to evaluate and select promising paths within the MCTS, effectively guiding the search towards more confident and reliable reasoning chains.
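The selection step of MCTS is not spelled out above; as a hedged illustration, a standard UCT rule could be used to decide which reasoning state to expand next. The function name and the exploration constant `c` are our assumptions, not from the paper:

```python
import math

def uct_score(total_value: float, visits: int, parent_visits: int, c: float = 1.41) -> float:
    """Standard UCT score for picking a child node during MCTS selection.

    total_value: sum of reward scores backed up through this node.
    visits: number of times this node has been visited.
    parent_visits: visit count of the parent node.
    c: exploration constant (an assumption; not specified in the paper).
    """
    if visits == 0:
        return float("inf")  # always try unvisited reasoning states first
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore
```

In practice the child with the highest UCT score is expanded, trading off high-reward paths (exploitation) against rarely visited ones (exploration).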
Furthermore, we obtain the value of each state by computing a confidence score using the following formulas:
1. **Confidence Score ($c_i$):**
For each token $t_i$ generated during the rollout, we calculate its confidence score by applying the softmax function to its log probability and the log probabilities of the top 5 alternative tokens. This is given by:
$$c_i = \frac{\exp(p(t_i))}{\sum_{k=1}^{5} \exp(p(t_k))}$$
**where:**
- $c_i$ is the confidence score for the $i^{th}$ token in the rollout.
- $p(t_i)$ is the log probability of the $i^{th}$ token generated by the LLM.
- $p(t_k)$ for $k = 1$ to $5$ are the log probabilities of the top 5 predicted tokens at the $i^{th}$ step.
- $n$ is the total number of tokens in the rollout sequence.
This equation ensures that the confidence score reflects the relative probability of the chosen token compared to the top alternatives, effectively normalizing the scores between 0 and 1.
2. **Reward Score ($v$):**
After obtaining the confidence scores for all tokens in the rollout sequence, we compute the average confidence score across all tokens to derive the overall reward score:
$$v = \frac{1}{n} \sum_{i=1}^{n} c_i$$
**where:** $v$ is the overall reward score for the rollout path.
This average serves as the reward signal that evaluates the quality of the reasoning path taken during the rollout. A higher $v$ indicates a more confident and likely accurate reasoning path.
By employing this method, we effectively expand the solution space, allowing the model to explore a vast array of reasoning paths and select the most probable ones based on calculated confidence scores.
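The two formulas above can be sketched directly in code, assuming the chosen token's log probability and the top-5 candidate log probabilities (assumed here to include the chosen token) are already available from the model:

```python
import math

def confidence_score(chosen_logprob: float, topk_logprobs: list) -> float:
    """c_i: softmax of the chosen token's log probability against the
    top-5 candidate log probabilities."""
    return math.exp(chosen_logprob) / sum(math.exp(lp) for lp in topk_logprobs)

def reward_score(chosen_logprobs: list, topk_logprobs: list) -> float:
    """v: average confidence over all n tokens of the rollout."""
    scores = [confidence_score(lp, topk)
              for lp, topk in zip(chosen_logprobs, topk_logprobs)]
    return sum(scores) / len(scores)
```

Because the denominator sums the exponentiated log probabilities of the top-5 candidates, each $c_i$ falls between 0 and 1, and the rollout reward $v$ is simply their mean.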
## 🌟 Reasoning Action Strategy
### ✨ Action Selection
We observed that using actions as the granularity for MCTS search is relatively coarse, often causing the model to overlook nuanced reasoning paths crucial for solving complex problems. To address this, we explored different levels of granularity in the MCTS search. Initially, we used whole steps as the unit of search. To further expand the model's search space and enhance its problem-solving capabilities, we experimented with dividing these steps into smaller units of 64 or 32 tokens, referred to as "mini-steps". This finer granularity allowed the model to explore reasoning paths in greater detail. While token-level search offers the theoretical maximum in flexibility and granularity, it is currently impractical due to the significant computational resources required and the challenges of designing an effective reward model at this level.
In our experiments, we implemented the following strategies within the MCTS framework:
- 💎 **step as Action:** We allowed the model to generate complete reasoning steps as actions. Each MCTS node represents an entire thought or action label. This method enables efficient exploration but may miss finer-grained reasoning paths essential for complex problem-solving.
- 💎 **mini-step as Action:** We used mini-steps of 32 or 64 tokens as actions. This finer granularity expands the solution space and improves the model's ability to navigate complex reasoning tasks by considering more nuanced steps in the search process. By exploring the solution space at this level, the model is better equipped to find correct answers that might be overlooked with larger action units.
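A minimal sketch of how a full reasoning step could be carved into mini-step actions; the helper name and the token-id representation are our assumptions:

```python
def split_into_ministeps(token_ids, ministep_size=32):
    """Split one reasoning step (a sequence of token ids) into
    fixed-size mini-steps to use as finer-grained MCTS actions.
    The final chunk may be shorter than ministep_size."""
    return [token_ids[i:i + ministep_size]
            for i in range(0, len(token_ids), ministep_size)]
```

Each returned chunk becomes a candidate action in the tree, so a smaller `ministep_size` produces more branching points and a larger searchable solution space.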
### ✨ Reflection after Thinking
We introduced a reflection mechanism by adding the phrase **"Wait! Maybe I made some mistakes! I need to rethink from scratch."** at the end of each thought process. This prompts the model to self-reflect and reevaluate its reasoning steps. Implementing this reflection has yielded significant improvements, especially on difficult problems that the original model initially solved incorrectly. With the addition of reflection, approximately half of these challenging problems were answered correctly.
From the self-critic perspective, this approach allows the model to act as its own critic, identifying potential errors in its reasoning. By explicitly prompting the model to question its initial conclusions, we encourage it to re-express and refine its thought process. This self-critical mechanism leverages the model's capacity to detect inconsistencies or mistakes in its own output, leading to more accurate and reliable problem-solving. The reflection step serves as an internal feedback loop, enhancing the model's ability to self-correct without external intervention.
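A minimal sketch of this reflection mechanism, appending the exact trigger phrase quoted above to each completed thought (the helper name is our own):

```python
REFLECTION_TRIGGER = "Wait! Maybe I made some mistakes! I need to rethink from scratch."

def add_reflection(thought: str) -> str:
    """Append the reflection trigger so the model re-examines its own
    reasoning before committing to an answer."""
    return thought.rstrip() + "\n" + REFLECTION_TRIGGER
```

The augmented text is then fed back to the model, which continues generating and may revise its earlier conclusion.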
## 🎯 Experimental Results
Based on 💡 [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct), we performed SFT using our training data to create 💡 **Marco-o1-CoT**. In addition, we employed Marco-o1-CoT within the framework of MCTS tree search, differentiating configurations by action granularity:
- 💡 **Marco-o1-MCTS (step)**: using each inference step as an action (step).
- 💡 **Marco-o1-MCTS (mini-step of 64 tokens)**: using a 64-token mini-step as an action (64 tokens).
- 💡 **Marco-o1-MCTS (mini-step of 32 tokens)**: using a 32-token mini-step as an action (32 tokens).
During testing, each model utilized a CoT prompt to ensure consistency in reasoning processes. We then tested these configurations on the English (En) and Chinese (Zh) subsets of the [MGSM dataset](https://huggingface.co/datasets/juletxara/mgsm), obtaining the following results:
| **Model** | **MGSM-En (Acc.)** | **MGSM-Zh (Acc.)** |
|--------------------------|--------------------|--------------------|
| Qwen2-7B-Instruct | 84.00% | 76.80% |
| Marco-o1-CoT | 85.60% | 71.20% |
| Marco-o1-MCTS (step) | 90.40% | 80.00% |
| Marco-o1-MCTS (mini-step of 64 tokens) | 88.40% | 80.40% |
| Marco-o1-MCTS (mini-step of 32 tokens) | 87.60% | 82.40% |
📥 [Marco-o1-CoT](https://huggingface.co/AIDC-AI/Marco-o1) (Our Latest Model)
🔬 These results demonstrate that:
1. **Performance of Marco-o1-CoT vs. Qwen2-7B-Instruct**:
- In the MGSM-en dataset, Marco-o1-CoT shows an advantage over Qwen2-7B-Instruct, as shown in Figure 4, which is expected due to the fine-tuning with English CoT data.
- In the MGSM-zh dataset, however, Marco-o1-CoT exhibits a decrease in performance compared to Qwen2-7B-Instruct. This decline is attributed to the fact that the CoT data used for fine-tuning was in English, which may not transfer effectively to the Chinese dataset.
2. **Impact of MCTS Integration**:
- The three MCTS-enhanced models demonstrate improvements over Marco-o1-CoT, indicating that incorporating MCTS helps to expand the model's solution space and increase the probability of obtaining correct answers.
- However, since we use the Confidence Score as the reward, the tree search results exhibit significant randomness. In MGSM-en, the "step as Action" strategy performs the best, while in MGSM-zh, the "mini-step as Action (32)" strategy yields the highest accuracy.
- Currently, as shown in Figure 5-6, we cannot draw definitive conclusions about which action strategy is superior. We believe that as the reward becomes more accurate, the larger solution space provided by MCTS will demonstrate greater potential.
<div align="center">
<img src="assets/cot-step.jpg" alt="Figure Description or Alt Text" width="100%">
<p><strong>Figure 4: </strong>MCTS Expands the Solution Space for Correct Answers. Comparison between Marco-o1-CoT (left) and Marco-o1-MCTS (step) (right) on the MGSM dataset. While Marco-o1-CoT failed to provide the correct answer, integrating MCTS with step-level actions allowed the model to explore a broader solution space, increasing the likelihood of arriving at the correct solution.</p>
</div>
<div align="center">
<img src="assets/step-ministep32.jpg" alt="Figure Description or Alt Text" width="100%">
<p><strong>Figure 5: </strong>Finer Granularity with mini-steps Enhances Problem-Solving. Comparison between Marco-o1-MCTS (step) (left) and Marco-o1-MCTS (mini-step of 32 tokens) (right) on the MGSM dataset. The step-level action strategy did not yield the correct answer, but by using a finer-grained mini-step of 32 tokens, the model successfully navigated the solution space to find the correct answer, demonstrating the effectiveness of increased action granularity.</p>
</div>
<div align="center">
<img src="assets/ministep64-step.jpg" alt="Figure Description or Alt Text" width="100%">
<p><strong>Figure 6: </strong>Optimal Action Granularity Depends on Problem Complexity. Comparison between Marco-o1-MCTS (mini-step of 64 tokens) (left) and Marco-o1-MCTS (step) (right) on the MGSM dataset. The model with a mini-step of 64 tokens failed to find the correct answer, whereas using step-level actions enabled the model to correctly solve the problem. This highlights that we cannot draw definitive conclusions about which action strategy is superior. We believe that as the reward becomes more accurate, the larger solution space provided by MCTS will demonstrate greater potential. </p>
</div>
Furthermore, we use Test@N to denote the percentage of problems solved correctly at least once when allowing the model to make N separate guesses for each problem. We evaluated solve rates at Test@1, Test@8, and Test@32. The results demonstrate that MCTS shows an advantage with a lower number of separate guesses (Test@1). This reveals the potential of MCTS. In future work, we plan to train the reward model (RM) in conjunction with MCTS to continue optimizing our approach.
<div align="center">
<img src="assets/test@k.jpg" alt="Figure Description or Alt Text" width="80%">
</div>
These results demonstrate the effectiveness of our approach in enhancing the reasoning capabilities of the model across different languages and configurations.
## Case Study of Translation Task
We have also conducted open-ended tests on our models, such as translation tasks, and achieved some positive results. In the future, we will continue to explore other areas and improve the model's performance on them.
<div align="center">
<img src="assets/trans-case-1.jpg" alt="Figure Description or Alt Text" width="60%">
<p><strong>Figure 7: </strong>Translation comparison of a colloquial expression “It’s so beautiful that it’s captivating,
the upper part has a distinctly Korean style, the soft and fluffy material is perfectly thick, and it’s
complemented by a base layer, creating a unique and everyday-wear outfit”.</p>
</div>
<div align="center">
<img src="assets/trans-case-2.jpg" alt="Figure Description or Alt Text" width="60%">
<p><strong>Figure 8: </strong> Translation comparison of a colloquial expression “It’s so beautiful! And it’s so cheap, super
straight and doesn’t curl. Buy it, buy it!”.</p>
</div>
# ⚡️ Released Resources
## Models and Datasets
📥 [Marco-o1-CoT](https://huggingface.co/AIDC-AI/Marco-o1) (Our Latest Model)
📥 [Marco Reasoning Dataset](https://github.com/AIDC-AI/Marco-o1/blob/main/data/CoT_demo.json) (Our Partial Dataset)
## Installation
To install Marco-o1, follow these steps:
```bash
# Clone the repository
git clone https://github.com/AIDC-AI/Marco-o1
# Change to the Marco-o1 directory
cd Marco-o1
# Install required packages
pip install -r requirements.txt
```
## Usage
1. **Load Marco-o1-CoT model:**
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-o1")
model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Marco-o1")
```
2. **Inference:**
Execute the inference script (you can give any customized inputs inside):
```bash
./src/talk_with_model.py
# Use vLLM
./src/talk_with_model_vllm.py
```
3. **Deploy using FastAPI:**
Check the `README.md` file in the `examples` folder.
# 👨🏻‍💻 Acknowledgement
## Main Contributors
From MarcoPolo Team, AI Business, Alibaba International Digital Commerce:
- Yu Zhao
- [Huifeng Yin](https://github.com/HuifengYin)
- Hao Wang
- [Longyue Wang](http://www.longyuewang.com)
## Citation
If you find Marco-o1 useful for your research and applications, please cite:
```bibtex
@misc{zhao2024marcoo1openreasoningmodels,
title={Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions},
author={Yu Zhao and Huifeng Yin and Bo Zeng and Hao Wang and Tianqi Shi and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang},
year={2024},
eprint={2411.14405},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.14405},
}
```
## LICENSE
This project is licensed under the [Apache License, Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) (SPDX-License-Identifier: Apache-2.0).
## DISCLAIMER
We used compliance-checking algorithms during the training process to ensure the compliance of the trained model and dataset to the best of our ability. Due to the complexity of the data and the diversity of language-model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.