# EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

[![GitHub Repo stars](https://img.shields.io/github/stars/hiyouga/EasyR1)](https://github.com/hiyouga/EasyR1/stargazers)
[![Twitter](https://img.shields.io/twitter/follow/llamafactory_ai)](https://twitter.com/llamafactory_ai)

This project is a clean fork of the original [veRL](https://github.com/volcengine/verl) project to support vision language models. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the design of **[HybridEngine](https://arxiv.org/abs/2409.19256)** and the latest release of **[vLLM](https://github.com/vllm-project/vllm)**'s SPMD mode.

## Features

- Supported models
  - Llama3/Qwen2/Qwen2.5 language models
  - Qwen2/Qwen2.5-VL vision language models
  - DeepSeek-R1 distill models

- Supported algorithms
  - GRPO
  - Reinforce++
  - ReMax
  - RLOO

- Supported datasets
  - Any text or vision-text dataset in a [specific format](#custom-dataset)

- Supported tricks
  - Padding-free training
  - Resuming from checkpoint
  - Wandb & SwanLab & Mlflow & Tensorboard tracking

## Requirements

### Software Requirements

- Python 3.9+
- transformers>=4.51.0
- flash-attn>=2.4.3
- vllm>=0.8.3
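
If you set up the environment manually rather than via Docker, the dependencies above can be installed with pip. This is a minimal sketch; the pins simply mirror the list above, and `flash-attn` usually needs a matching CUDA toolchain or a prebuilt wheel for your platform.

```bash
# Install the core dependencies listed above (adjust versions to your CUDA setup).
pip install "transformers>=4.51.0" "vllm>=0.8.3"
# flash-attn builds from source by default; --no-build-isolation is commonly required.
pip install "flash-attn>=2.4.3" --no-build-isolation
```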

We provide a [Dockerfile](./Dockerfile) to easily build environments.

We recommend using the [pre-built Docker image](https://hub.docker.com/r/hiyouga/verl) for EasyR1.

```bash
docker pull hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.4-flashinfer0.2.2-cxx11abi0
```

### Hardware Requirements

\* *estimated*

| Method                   | Bits |  1.5B  |   3B   |   7B   |   32B   |
| ------------------------ | ---- | ------ | ------ | ------ | ------- |
| GRPO Full Fine-Tuning    |  AMP | 2*24GB | 4*40GB | 8*40GB | 16*80GB |
| GRPO Full Fine-Tuning    | BF16 | 1*24GB | 1*40GB | 4*40GB |  8*80GB |

> [!NOTE]
> Use `worker.actor.fsdp.torch_dtype=bf16` and `worker.actor.optim.strategy=adamw_bf16` to enable bf16 training.
>
> We are working hard to reduce VRAM usage in RL training; LoRA support will be integrated in upcoming updates.
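
As a rough illustration of how these overrides can be passed, the sketch below appends them to a training command. The entry point and config path shown are assumptions modeled on the example scripts, so adapt them to the script you actually launch.

```bash
# Hypothetical invocation with bf16 overrides appended (adjust entry point/config to your setup).
python3 -m verl.trainer.main \
    config=examples/config.yaml \
    worker.actor.fsdp.torch_dtype=bf16 \
    worker.actor.optim.strategy=adamw_bf16
```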

## Tutorial: Run Qwen2.5-VL GRPO on [Geometry3K](https://huggingface.co/datasets/hiyouga/geometry3k) Dataset in Just 3 Steps

![image](assets/qwen2_5_vl_7b_geo.png)

### Installation

```bash
git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
```

### GRPO Training

```bash
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
```

### Merge Checkpoint in Hugging Face Format

```bash
python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
```

> [!TIP]
> If you encounter issues with connecting to Hugging Face, consider using `export HF_ENDPOINT=https://hf-mirror.com`.
>
> If you want to use SwanLab logger, consider using `bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh`.

## Custom Dataset

Please refer to the example datasets to prepare your own dataset.

- Text dataset: https://huggingface.co/datasets/hiyouga/math12k
- Image-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
- Multi-image-text dataset: https://huggingface.co/datasets/hiyouga/journeybench-multi-image-vqa
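
If you are unsure which fields your own dataset needs, a quick check is to load one of the example datasets and inspect its columns. A minimal sketch, assuming the Hugging Face `datasets` package is installed:

```bash
# Print the column names and one record of the example image-text dataset,
# then mirror the same fields when building your own dataset.
python3 -c "
from datasets import load_dataset
ds = load_dataset('hiyouga/geometry3k', split='train')
print(ds.column_names)
print(ds[0])
"
```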

## How to Understand GRPO in EasyR1

![image](assets/easyr1_grpo.png)

- To learn about the GRPO algorithm, you can refer to [Hugging Face's blog](https://huggingface.co/docs/trl/v0.16.1/en/grpo_trainer).

## How to Run 70B+ Model in Multi-node Environment

1. Start the Ray head node.

```bash
ray start --head --port=6379 --dashboard-host=0.0.0.0
```

2. Start the Ray worker node and connect to the head node.

```bash
ray start --address=<head_node_ip>:6379
```

3. Check the Ray resource pool.

```bash
ray status
```

4. Run the training script on the Ray head node only.

```bash
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
```

See the **[veRL's official doc](https://verl.readthedocs.io/en/latest/start/multinode.html)** for more details about multi-node training and Ray debugger.

## Other Baselines

We also reproduced the following two baselines of the [R1-V](https://github.com/deep-agent/R1-V) project.
- [CLEVR-70k-Counting](examples/baselines/qwen2_5_vl_3b_clevr.sh): Train the Qwen2.5-VL-3B-Instruct model on the counting problem.
- [GeoQA-8k](examples/baselines/qwen2_5_vl_3b_geoqa8k.sh): Train the Qwen2.5-VL-3B-Instruct model on the GeoQA problem.

## Performance Baselines

See [baselines.md](assets/baselines.md).

## Awesome Work using EasyR1

- **MMR1**: Advancing the Frontiers of Multimodal Reasoning. [![[code]](https://img.shields.io/github/stars/LengSicong/MMR1)](https://github.com/LengSicong/MMR1)
- **Vision-R1**: Incentivizing Reasoning Capability in Multimodal Large Language Models. [![[code]](https://img.shields.io/github/stars/Osilly/Vision-R1)](https://github.com/Osilly/Vision-R1) [![[arxiv]](https://img.shields.io/badge/arxiv-2503.06749-blue)](https://arxiv.org/abs/2503.06749)
- **Seg-Zero**: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement. [![[code]](https://img.shields.io/github/stars/dvlab-research/Seg-Zero)](https://github.com/dvlab-research/Seg-Zero) [![[arxiv]](https://img.shields.io/badge/arxiv-2503.06520-blue)](https://arxiv.org/abs/2503.06520)
- **MetaSpatial**: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse. [![[code]](https://img.shields.io/github/stars/PzySeere/MetaSpatial)](https://github.com/PzySeere/MetaSpatial) [![[arxiv]](https://img.shields.io/badge/arxiv-2503.18470-blue)](https://arxiv.org/abs/2503.18470)
- **Temporal-R1**: Evolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward. [![[code]](https://img.shields.io/github/stars/appletea233/Temporal-R1)](https://github.com/appletea233/Temporal-R1)
- **NoisyRollout**: Reinforcing Visual Reasoning with Data Augmentation. [![[code]](https://img.shields.io/github/stars/John-AI-Lab/NoisyRollout)](https://github.com/John-AI-Lab/NoisyRollout) [![[arxiv]](https://img.shields.io/badge/arxiv-2504.13055-blue)](https://arxiv.org/pdf/2504.13055)
- **GUI-R1**: A Generalist R1-Style Vision-Language Action Model For GUI Agents. [![[code]](https://img.shields.io/github/stars/ritzz-ai/GUI-R1)](https://github.com/ritzz-ai/GUI-R1) [![[arxiv]](https://img.shields.io/badge/arxiv-2504.10458-blue)](https://arxiv.org/abs/2504.10458)

## TODO

- Support LoRA (high priority).
- Support Ulysses parallelism for VLMs (medium priority).
- Support more VLM architectures.

> [!NOTE]
> We will not provide scripts for supervised fine-tuning and inference in this project. If you have such requirements, we recommend using [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).

### Known bugs

These features are temporarily disabled; we plan to fix them one by one in future updates.

- Vision language models are not compatible with ulysses parallelism yet.

## Discussion Group

👋 Join our [WeChat group](assets/wechat.jpg).

## FAQs

> ValueError: Image features and image tokens do not match: tokens: 8192, features 9800

Increase `data.max_prompt_length` or reduce `data.max_pixels`.
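
For example, the overrides could be appended to the training command as below; the values are illustrative only and the entry point and config path are assumptions, so adapt them to the script you run.

```bash
# Hypothetical overrides with illustrative values; tune them for your data.
python3 -m verl.trainer.main \
    config=examples/config.yaml \
    data.max_prompt_length=4096 \
    data.max_pixels=1048576
```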

> RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62

Reduce `worker.rollout.gpu_memory_utilization` and enable `worker.actor.offload.offload_params`.

> RuntimeError: 0 active drivers ([]). There should only be one.

Uninstall `deepspeed` from the current python environment.

## Citation

Core contributors: [Yaowei Zheng](https://github.com/hiyouga), [Junting Lu](https://github.com/AL-377), [Shenzhi Wang](https://github.com/Shenzhi-Wang), [Zhangchi Feng](https://github.com/BUAADreamer), [Dongdong Kuang](https://github.com/Kuangdd01) and Yuwen Xiong

We also thank Guangming Sheng and Chi Zhang for helpful discussions.

```bibtex
@misc{zheng2025easyr1,
  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Zhangchi Feng and Dongdong Kuang and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year         = {2025}
}
```

We also recommend citing the original work.

```bibtex
@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv: 2409.19256}
}
```