# EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

This project is a clean fork of the original [veRL](https://github.com/volcengine/verl) project with support for vision language models.

Built on the **[HybridEngine](https://arxiv.org/abs/2409.19256)** design and the SPMD mode of the latest **[vLLM](https://github.com/vllm-project/vllm)** release, EasyR1 is both efficient and scalable.

## Features

- Supported models
  - Qwen2/Qwen2.5 language models
  - Qwen2/Qwen2.5-VL vision language models
  - DeepSeek-R1 distill models

- Supported algorithms
  - GRPO
  - Reinforce++
  - ReMax
  - RLOO

- Supported datasets
  - Any text, vision-text dataset in a [specific format](#custom-dataset).

## Requirements

### Software Requirements

- Python 3.10+
- transformers>=4.49.0
- flash-attn==2.6.1+das.opt4.dtk2504
- vllm>=0.7.3

### Hardware Requirements

\* *Estimated values*

| Method                   | Bits |  1.5B  |   3B   |   7B   |   32B   |
| ------------------------ | ---- | ------ | ------ | ------ | ------- |
| GRPO Full Fine-Tuning    |  AMP | 2*24GB | 4*40GB | 8*40GB | 16*80GB |
| GRPO Full Fine-Tuning    | BF16 | 1*24GB | 1*40GB | 4*40GB |  8*80GB |

> [!NOTE]
> Use the `worker.actor.fsdp.torch_dtype=bf16` and `worker.actor.optim.strategy=adamw_bf16` options to ensure training in bf16.
>
> We are working to reduce VRAM usage in RL training; LoRA support will be integrated in the next update.
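
The two dotted overrides in the note above map onto nested config keys. As a rough sketch, the corresponding fragment of a training config could look like the following; the surrounding file layout is an assumption, so match it against the example configs in the repository:

```yaml
# Hypothetical config fragment; only the key paths come from the note above.
worker:
  actor:
    fsdp:
      torch_dtype: bf16      # train in bf16 rather than AMP
    optim:
      strategy: adamw_bf16
```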
## Tutorial: Train Qwen2.5-VL with GRPO on the [Geometry3K](https://huggingface.co/datasets/hiyouga/geometry3k) dataset in just three steps

![image](assets/qwen2_5_vl_7b_geo.png)

### Environment Setup

Adjust the `-v` path, `docker_name`, and `imageID` below to match your actual environment.

#### Docker (Method 1)

Based on the SourceFind (光源) pytorch2.4.1+dtk25.04 base image. Image download address: [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch); choose the image version that matches pytorch2.4.1 and your python, dtk, and operating system versions.

```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10

docker run -it --shm-size 200g --network=host --name docker_name --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro imageID bash

## Install the required packages
cd EasyR1
pip install -r requirements.txt --no-deps
## Comment out accelerate, liger-kernel, and tensordict in requirements.txt, then run:
pip install -r requirements.txt
# Build and install
pip install -e .
```
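Once inside the container, a quick sanity check can confirm that the DCU device nodes forwarded by the `--device` flags above are actually visible before installing anything. This is a sketch of our own, not part of the original instructions; the device paths come from the `docker run` command above.

```shell
# Sanity-check that device nodes passed via --device are visible in the container.
check_devices() {
  local missing=0
  for dev in "$@"; do
    if [ ! -e "$dev" ]; then
      echo "missing: $dev"
      missing=1
    fi
  done
  return $missing
}

check_devices /dev/kfd /dev/dri /dev/mkfd || echo "re-check the docker run --device flags"
```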
#### Dockerfile (Method 2)

```bash
cd docker
docker build --no-cache -t llama-factory:latest .
docker run -it --shm-size 200g --network=host --name docker_name --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro imageID bash

## Install the required packages
cd EasyR1
pip install -r requirements.txt --no-deps
## Comment out accelerate, liger-kernel, and tensordict in requirements.txt, then run:
pip install -r requirements.txt
# Build and install
pip install -e .
```

#### Anaconda (Method 3)

The special deep learning libraries this project requires for DCU GPUs can be downloaded from the [光合 developer community](https://developer.hpccube.com/tool/).
```bash
DTK driver: dtk25.04
python: 3.10
torch: 2.4.1
deepspeed: 0.14.2+das.opt2.dtk2504
flash-attn: 2.6.1+das.opt4.dtk2504
```
`Tip: the dtk driver, python, torch, and other DCU-related tool versions above must match exactly, one to one.`

```bash
cd EasyR1
pip install -r requirements.txt --no-deps
## Comment out accelerate, liger-kernel, and tensordict in requirements.txt, then run:
pip install -r requirements.txt
# Build and install
pip install -e .
```
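The "comment out accelerate, liger-kernel, tensordict, then reinstall" step that each method above repeats can also be automated. This is a sketch under our own assumptions: the package names come from the instructions, while the `comment_pkgs` helper name is hypothetical.

```shell
# Comment out the given packages in a requirements file so that the second
# `pip install -r requirements.txt` pass skips them.
comment_pkgs() {
  local req="$1"; shift
  for pkg in "$@"; do
    # Prefix matching lines with "# " (version pins after the name are kept).
    sed -i "s/^${pkg}/# ${pkg}/" "$req"
  done
}

if [ -f requirements.txt ]; then
  comment_pkgs requirements.txt accelerate liger-kernel tensordict
fi
```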

### GRPO Training

```bash
bash examples/qwen2_5_7b_math_grpo.sh
```

### Merge the Checkpoint into Hugging Face Format

```bash
python3 scripts/model_merger.py --local_dir path_to_your_last_actor_checkpoint
```
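After merging, it can be worth verifying that the output directory has the usual Hugging Face layout before loading it elsewhere. The file names below follow the standard HF convention and the `check_hf_checkpoint` helper is our own sketch, not something the merger script documents:

```shell
# Verify a directory looks like a merged Hugging Face checkpoint.
check_hf_checkpoint() {
  local dir="$1"
  [ -f "$dir/config.json" ] || { echo "missing: config.json"; return 1; }
  ls "$dir"/*.safetensors >/dev/null 2>&1 || { echo "missing: safetensors shards"; return 1; }
  echo "checkpoint layout looks complete"
}
```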

> [!NOTE]
> If you want to use the SwanLab logger, consider running `bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh`.

## Custom Dataset

A custom dataset must strictly follow the example data format.

- Text dataset: https://huggingface.co/datasets/hiyouga/math12k
    - Required columns: problem, answer

- Vision-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
    - Required columns: images, problem, answer
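For a quick local check that a JSONL export of a custom dataset carries the required columns, something like the following sketch can help. The sample record and file name are illustrative, not taken from the real datasets:

```shell
# Write one illustrative vision-text record to a scratch file.
cat > sample.jsonl <<'EOF'
{"images": ["geo_001.png"], "problem": "Find the measure of angle ABC.", "answer": "30"}
EOF

# Assert every record carries the required vision-text columns listed above.
python3 - <<'EOF'
import json

required = {"images", "problem", "answer"}
with open("sample.jsonl") as f:
    for line in f:
        record = json.loads(line)
        missing = required - record.keys()
        assert not missing, f"missing columns: {missing}"
print("all required columns present")
EOF
```

For a text-only dataset, drop `images` from the required set.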

## Other Baselines

- [CLEVR-70k-Counting](examples/run_qwen2_5_vl_2b_clevr.sh): train the Qwen2.5-VL-3B-Instruct model on counting problems.

### Known Issues

These features are temporarily disabled; we plan to fix them one by one in future updates.

- Vision language models are not yet compatible with padding-free training or the DeepSpeed Ulysses parallelism method.