# EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

[![GitHub Repo stars](https://img.shields.io/github/stars/hiyouga/EasyR1)](https://github.com/hiyouga/EasyR1/stargazers)
[![Twitter](https://img.shields.io/twitter/follow/llamafactory_ai)](https://twitter.com/llamafactory_ai)

This project is a clean fork of the original [veRL](https://github.com/volcengine/verl) project with support for vision language models. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the design of **[HybridEngine](https://arxiv.org/abs/2409.19256)** and the latest release of **[vLLM](https://github.com/vllm-project/vllm)**'s SPMD mode.

## Features

- Supported models
  - Llama3/Qwen2/Qwen2.5 language models
  - Qwen2/Qwen2.5-VL vision language models
  - DeepSeek-R1 distill models

- Supported algorithms
  - GRPO
  - Reinforce++
  - ReMax
  - RLOO

- Supported datasets
  - Any text or vision-text dataset in a [specific format](#custom-dataset)

- Supported tricks
  - Padding-free training
  - Resuming from checkpoint
  - Wandb, SwanLab, MLflow & TensorBoard tracking

## Requirements

### Software Requirements

- Python 3.9+
- transformers>=4.49.0
- flash-attn>=2.4.3
- vllm>=0.7.3

We provide a [Dockerfile](./Dockerfile) to easily build environments.
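
If you prefer to build the image yourself, a typical invocation looks like the following (the image tag is just an example):

```bash
# Build the training environment from the provided Dockerfile in the repository root (tag is arbitrary).
docker build -t easyr1:latest .
```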

We recommend using the [pre-built Docker image](https://hub.docker.com/r/hiyouga/verl) for EasyR1.

```bash
# stable
docker pull hiyouga/verl:ngc-th2.5.1-cu120-vllm0.7.4-hotfix
# nightly
docker pull hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2
```
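
A typical way to start a container from this image (flags are illustrative; adjust GPU visibility, shared memory, and mounts to your setup):

```bash
# Start an interactive container with all GPUs visible and the repository mounted (illustrative flags).
docker run -it --rm --gpus all --shm-size 16g \
    -v "$(pwd)":/workspace/EasyR1 \
    hiyouga/verl:ngc-th2.5.1-cu120-vllm0.7.4-hotfix bash
```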

### Hardware Requirements

\* *estimated*

| Method                   | Bits |  1.5B  |   3B   |   7B   |   32B   |
| ------------------------ | ---- | ------ | ------ | ------ | ------- |
| GRPO Full Fine-Tuning    |  AMP | 2*24GB | 4*40GB | 8*40GB | 16*80GB |
| GRPO Full Fine-Tuning    | BF16 | 1*24GB | 1*40GB | 4*40GB |  8*80GB |

> [!NOTE]
> Use `worker.actor.fsdp.torch_dtype=bf16` and `worker.actor.optim.strategy=adamw_bf16` to enable bf16 training.
>
> We are working hard to reduce the VRAM usage of RL training; LoRA support will be integrated in upcoming updates.
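
For example, assuming the trainer accepts these dot-notation overrides on the command line alongside a config file (a sketch, not a verbatim script; check the example scripts for the actual entry point), bf16 training can be enabled like this:

```bash
# A minimal sketch: enable bf16 full fine-tuning via the two keys from the note above.
# The entry point and config path are illustrative; in practice, add these overrides to the example script you run.
python3 -m verl.trainer.main \
    config=examples/config.yaml \
    worker.actor.fsdp.torch_dtype=bf16 \
    worker.actor.optim.strategy=adamw_bf16
```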

## Tutorial: Run Qwen2.5-VL GRPO on [Geometry3K](https://huggingface.co/datasets/hiyouga/geometry3k) Dataset in Just 3 Steps

![image](assets/qwen2_5_vl_7b_geo.png)

### Installation

```bash
git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
```
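
After installation, an optional sanity check is to confirm that the core dependencies listed in the requirements above are importable:

```bash
# Optional sanity check: verify the key dependencies and print their versions.
python3 -c "import torch, transformers, vllm, flash_attn; print(torch.__version__, transformers.__version__, vllm.__version__)"
```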

### GRPO Training

```bash
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
```
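
To run on a subset of the GPUs on a machine, you can restrict device visibility before launching (standard CUDA behavior; note that the number of GPUs per node configured inside the script may also need to match):

```bash
# Example: expose only the first four GPUs to the training job.
CUDA_VISIBLE_DEVICES=0,1,2,3 bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
```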

### Merge Checkpoint in Hugging Face Format

```bash
python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
```
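
The merged checkpoint is a standard Hugging Face-format model directory, so it can be used directly for inference. For example, a quick smoke test with vLLM's CLI (replace the placeholder with the directory the merge script wrote the Hugging Face files to):

```bash
# Hypothetical smoke test: serve the merged Hugging Face-format checkpoint with vLLM.
vllm serve <merged_model_dir> --port 8000
```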

> [!TIP]
> If you encounter issues with connecting to Hugging Face, consider using `export HF_ENDPOINT=https://hf-mirror.com`.
>
> If you want to use the SwanLab logger, run `bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh` instead.

## Custom Dataset

Please refer to the example datasets to prepare your own dataset.

- Text dataset: https://huggingface.co/datasets/hiyouga/math12k
- Vision-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k

> [!TIP]
> EasyR1 already supports multi-image datasets.
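
To see the exact column layout EasyR1 expects, it is easiest to inspect one of the reference datasets above. A quick (hypothetical) check could look like this; the column names you see are what your own dataset should mirror:

```bash
# Inspect the vision-text reference dataset to see the expected columns (requires the `datasets` package).
python3 - <<'EOF'
from datasets import load_dataset

dataset = load_dataset("hiyouga/geometry3k")
print(dataset)                      # available splits and columns
first_split = next(iter(dataset.values()))
print(first_split.column_names)     # mirror these columns in your own dataset
print(first_split[0])               # one full example, including the image field(s)
EOF
```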

## How to Understand GRPO in EasyR1

![image](assets/easyr1_grpo.png)

- To learn about the GRPO algorithm, you can refer to [Hugging Face's blog](https://huggingface.co/docs/trl/v0.15.2/en/grpo_trainer).
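
In brief, GRPO samples a group of $G$ responses per prompt and uses the group itself as the baseline, so no separate value model is needed: the advantage of each response is its reward normalized by the group statistics (the formula below follows the common GRPO formulation, not necessarily EasyR1's exact implementation details):

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}{\operatorname{std}(\{r_1, \dots, r_G\})}, \qquad i = 1, \dots, G
$$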

## How to Run a 70B+ Model in a Multi-node Environment

Please see **[veRL's official documentation](https://verl.readthedocs.io/en/latest/start/multinode.html)** for multi-node training and the Ray debugger.
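
At a high level, multi-node training means bringing up a Ray cluster first and then launching the training script from the head node. A rough sketch (the port and address are examples; the linked documentation is authoritative):

```bash
# On the head node (illustrative port):
ray start --head --port=6379

# On each worker node, pointing at the head node's address:
ray start --address=<head_node_ip>:6379

# Then launch the training script from the head node as usual.
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
```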

## Other Baselines

We have also reproduced the following two baselines from the [R1-V](https://github.com/deep-agent/R1-V) project.

- [CLEVR-70k-Counting](examples/baselines/qwen2_5_vl_3b_clevr.sh): Train the Qwen2.5-VL-3B-Instruct model on the counting problem.
- [GeoQA-8k](examples/baselines/qwen2_5_vl_3b_geoqa8k.sh): Train the Qwen2.5-VL-3B-Instruct model on the GeoQA problem.

## Awesome Work using EasyR1

- **MMR1**: Advancing the Frontiers of Multimodal Reasoning. [![[code]](https://img.shields.io/github/stars/LengSicong/MMR1)](https://github.com/LengSicong/MMR1)
- **Vision-R1**: Incentivizing Reasoning Capability in Multimodal Large Language Models. [![[code]](https://img.shields.io/github/stars/Osilly/Vision-R1)](https://github.com/Osilly/Vision-R1) [![[arxiv]](https://img.shields.io/badge/arxiv-2503.06749-blue)](https://arxiv.org/abs/2503.06749)
- **Seg-Zero**: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement. [![[code]](https://img.shields.io/github/stars/dvlab-research/Seg-Zero)](https://github.com/dvlab-research/Seg-Zero) [![[arxiv]](https://img.shields.io/badge/arxiv-2503.06520-blue)](https://arxiv.org/abs/2503.06520)
- **MetaSpatial**: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse. [![[code]](https://img.shields.io/github/stars/PzySeere/MetaSpatial)](https://github.com/PzySeere/MetaSpatial) [![[arxiv]](https://img.shields.io/badge/arxiv-2503.18470-blue)](https://arxiv.org/abs/2503.18470)
- **Temporal-R1**: Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward [![[code]](https://img.shields.io/github/stars/appletea233/Temporal-R1)](https://github.com/appletea233/Temporal-R1)
## TODO

- Support LoRA (high priority).
- Support Ulysses sequence parallelism for VLMs (medium priority).
- Support more VLM architectures.

> [!NOTE]
> We will not provide scripts for supervised fine-tuning and inference in this project. If you have such requirements, we recommend using [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).

### Known Bugs

These features are temporarily disabled; we plan to fix them one by one in future updates.

- Vision language models are not compatible with Ulysses sequence parallelism yet.

## Discussion Group

👋 Join our [WeChat group](assets/wechat.jpg).

## FAQs

> RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62

Reduce the `worker.rollout.gpu_memory_utilization`.
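
This value is the fraction of GPU memory that vLLM pre-allocates for rollout, so lowering it leaves more room for training. A hedged example of passing the override (the entry point and config path are illustrative; see the example scripts for the actual launch command):

```bash
# Illustrative: lower the rollout engine's GPU memory fraction (key taken from the FAQ above).
python3 -m verl.trainer.main config=examples/config.yaml worker.rollout.gpu_memory_utilization=0.5
```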

## Citation

Core contributors: [Yaowei Zheng](https://github.com/hiyouga), [Junting Lu](https://github.com/AL-377), [Shenzhi Wang](https://github.com/Shenzhi-Wang), [Zhangchi Feng](https://github.com/BUAADreamer), [Dongdong Kuang](https://github.com/Kuangdd01) and Yuwen Xiong.

We also thank Guangming Sheng and Chi Zhang for helpful discussions.

```bibtex
@misc{zheng2025easyr1,
  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Zhangchi Feng and Dongdong Kuang and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year         = {2025}
}
```

We also recommend citing the original work.

```bibtex
@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv:2409.19256}
}
```