# EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

This project is a clean fork of the original [veRL](https://github.com/volcengine/verl) project that adds support for vision-language models.

EasyR1 is efficient and scalable because it builds on **[HybridEngine](https://arxiv.org/abs/2409.19256)** and the SPMD mode of the latest **[vLLM](https://github.com/vllm-project/vllm)** release.

## Features

- Supported models
  - Qwen2/Qwen2.5 language models
  - Qwen2/Qwen2.5-VL vision language models
  - DeepSeek-R1 distill models

- Supported algorithms
  - GRPO
  - Reinforce++
  - ReMax
  - RLOO

- Supported datasets
  - Any text or vision-text dataset in a [specific format](#custom-dataset).

## Software and Hardware Requirements

### Software Requirements

- Python 3.10+
- transformers>=4.49.0
- flash-attn==2.6.1+das.opt4.dtk2504
- vllm>=0.7.3

### Hardware Requirements

\* *estimated*

| Method                   | Bits |  1.5B  |   3B   |   7B   |   32B   |
| ------------------------ | ---- | ------ | ------ | ------ | ------- |
| GRPO Full Fine-Tuning    |  AMP | 2*24GB | 4*40GB | 8*40GB | 16*80GB |
| GRPO Full Fine-Tuning    | BF16 | 1*24GB | 1*40GB | 4*40GB |  8*80GB |

> [!NOTE]
> Use `worker.actor.fsdp.torch_dtype=bf16` and `worker.actor.optim.strategy=adamw_bf16` to enable bf16 training.
>
> Training requires wandb. After setting up the environment, log in to wandb first.

## Tutorial: Train Qwen2.5-VL with GRPO on the [Geometry3K](https://huggingface.co/datasets/hiyouga/geometry3k) Dataset in Just 3 Steps

![image](assets/qwen2_5_vl_7b_geo.png)

### Environment Setup

Adjust the `-v` paths, `docker_name`, and `imageID` in the commands below to match your environment.

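#### Docker (Option 1)

This option uses the 光源 pytorch2.4.1+dtk25.04 base image. Images are available at [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch); download the image version that matches pytorch2.4.1 and your python, dtk, and operating system.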

```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10

docker run -it --shm-size 200g --network=host --name docker_name --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro imageID bash

# Install the required packages
cd EasyR1
pip install -r requirements.txt
pip install "tensordict<0.6"
# Build and install
pip install -e .
```
#### Dockerfile (Option 2)

```bash
cd docker
docker build --no-cache -t llama-factory:latest .
docker run -it --shm-size 200g --network=host --name docker_name --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro imageID bash

# Install the required packages
cd EasyR1
pip install -r requirements.txt
pip install "tensordict<0.6"
# Build and install
pip install -e .
```

#### Anaconda (Option 3)

The DCU-specific deep learning libraries this project requires can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
```bash
DTK driver: dtk25.04
python: 3.10
torch: 2.4.1
deepspeed: 0.14.2+das.opt2.dtk2504
flash-attn: 2.6.1+das.opt4.dtk2504
vllm: 0.8.3
```
`Tips: the versions of the DCU-related tools listed above (DTK driver, python, torch, etc.) must match each other exactly.`

```bash
cd EasyR1
pip install -r requirements.txt
pip install "tensordict<0.6"
# Build and install
pip install -e .
```

### Datasets

Use the example datasets below as templates for building your own dataset.

- Text dataset: https://huggingface.co/datasets/hiyouga/math12k
- Image-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
- Multi-image-text dataset: https://huggingface.co/datasets/hiyouga/journeybench-multi-image-vqa

### GRPO Training

```bash
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
```

### Merge Checkpoint in Hugging Face Format

```bash
python3 scripts/model_merger.py --local_dir path_to_your_actor_checkpoint
```

> [!NOTE]
> If you cannot connect to Hugging Face, first run `pip install -U huggingface_hub hf_transfer`, then run `export HF_ENDPOINT=https://hf-mirror.com` before launching.
>
> If you want to use the SwanLab logger, consider using `bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh`.

## Custom Dataset

Custom datasets must strictly follow the format of the example datasets.

- Text dataset: https://huggingface.co/datasets/hiyouga/math12k
    - Required columns: problem, answer

- Vision-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
    - Required columns: images, problem, answer
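
The required columns listed above can be checked before launching a training run. The following is a minimal sketch and not part of EasyR1: `missing_columns` and its `has_images` flag are hypothetical names introduced for illustration, and only the column names themselves come from the list above.

```python
# Required columns, taken from the dataset descriptions above.
REQUIRED_TEXT = ("problem", "answer")
REQUIRED_VISION = ("images", "problem", "answer")


def missing_columns(column_names, has_images):
    """Return the required columns that are absent from column_names.

    column_names: iterable of column names found in the dataset.
    has_images:   True for a vision-text dataset, False for text-only.
    """
    required = REQUIRED_VISION if has_images else REQUIRED_TEXT
    present = set(column_names)
    # Preserve the order in which the required columns are listed.
    return [name for name in required if name not in present]


# A vision-text dataset that lacks the "images" column.
print(missing_columns(["problem", "answer"], has_images=True))   # ['images']
# A text dataset with an extra column is still valid.
print(missing_columns(["problem", "answer", "id"], has_images=False))  # []
```

For a single split loaded with the Hugging Face `datasets` library, the first argument would typically be `dataset.column_names`.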

## Other Baselines

- [CLEVR-70k-Counting](examples/baselines/qwen2_5_vl_3b_clevr.sh): Train the Qwen2.5-VL-3B-Instruct model on counting problems.
- [GeoQA-8k](examples/baselines/qwen2_5_vl_3b_geoqa8k.sh): Train the Qwen2.5-VL-3B-Instruct model on GeoQA problems.

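### Known Issues

These features are temporarily disabled. We plan to fix them one by one in future updates.

- Vision-language models are currently incompatible with padding-free training and the DeepSpeed Ulysses parallelism method.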

### FAQ and Solutions

> ValueError: Image features and image tokens do not match: tokens: 8192, features 9800

Increase `data.max_prompt_length` or reduce `data.max_pixels`.
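
One way to choose values for these two settings is to compare the visual token budget with the prompt length. The sketch below is a rough estimate under a stated assumption, not EasyR1's implementation: it assumes Qwen2.5-VL spends about one visual token per 28x28 pixel region (14x14 patches merged 2x2), so an image capped at `data.max_pixels` needs at most `max_pixels // 784` tokens. The helper names are hypothetical.

```python
# Assumption: one visual token covers a 28x28 pixel region
# (14x14 patches, merged 2x2). This is an estimate, not an EasyR1 constant.
PIXELS_PER_TOKEN = 28 * 28


def max_image_tokens(max_pixels):
    """Upper bound on visual tokens for one image capped at max_pixels."""
    return max_pixels // PIXELS_PER_TOKEN


def prompt_fits(max_pixels, max_prompt_length, text_tokens=0, num_images=1):
    """Check whether the worst-case image tokens plus text fit in the prompt."""
    needed = num_images * max_image_tokens(max_pixels) + text_tokens
    return needed <= max_prompt_length


# An image producing 9800 features cannot fit in an 8192-token prompt.
print(max_image_tokens(9800 * 784))                        # 9800
print(prompt_fits(9800 * 784, 8192))                       # False
# Capping the image at 4096 tokens' worth of pixels leaves room for text.
print(prompt_fits(4096 * 784, 8192, text_tokens=512))      # True
```

Under this assumption, the prompt length must cover the worst-case image tokens plus the text tokens; otherwise either raise `data.max_prompt_length` or lower `data.max_pixels`.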

> RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62

Reduce `worker.rollout.gpu_memory_utilization` and make sure `worker.actor.offload.offload_params` is enabled.

> RuntimeError: 0 active drivers ([]). There should only be one.

Uninstall `deepspeed` from the current Python environment.