# Best Practice for DeepSeek-V2-MoE Models in Pai-Megatron-Patch

## Table of Contents
   * [Installation](#installation)
   * [Dataset and Model Download](#dataset-and-model-download)
   * [Megatron-Core-MoE Training Workflow](#megatron-core-moe-training-workflow)
      * [Checkpoint Format Conversion](#megatron-core-moe-checkpoint-format-conversion)
      * [Continued Pretraining](#pretraining-example)
      * [Instruction Fine-tuning](#instruction-fine-tuning-example)
   * [Downstream Task Evaluation](#downstream-task-evaluation)
      * [Checkpoint Conversion](#checkpoint-conversion-for-evaluation)
      * [Running the Evaluation Toolkit](#running-the-evaluation-toolkit)


## Installation

On the Alibaba Cloud PAI platform, use the dedicated image address: `dsw-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pai-megatron-patch:24.07`

Run the following commands to clone Pai-Megatron-Patch:
```bash
git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
cd Pai-Megatron-Patch
```

DeepSeek-V2-MoE now supports FlashAttention-3 for faster computation, but FA3 only runs on Hopper-architecture GPUs. To use FA3 on Hopper (H-series) cards, install it inside the DSW container with the following commands and save the image:
```bash
pip install "git+https://github.com/Dao-AILab/flash-attention.git#egg=flashattn-hopper&subdirectory=hopper"
python_path=`python -c "import site; print(site.getsitepackages()[0])"`
mkdir -p $python_path/flashattn_hopper
wget -P $python_path/flashattn_hopper https://raw.githubusercontent.com/Dao-AILab/flash-attention/main/hopper/flash_attn_interface.py
```
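
As a quick sanity check (a sketch; it assumes the installation above produced an importable `flashattn_hopper` package), verify that the FA3 interface can be imported:

```bash
# If this import fails, the pip build above did not complete successfully
python -c "import flashattn_hopper.flash_attn_interface as fa3; print(fa3.__file__)"
```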
## Dataset and Model Download

```bash
cd /mnt
mkdir deepseek-ckpts
cd deepseek-ckpts
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-ckpts/DeepSeek-V2-Lite.tgz
tar -zxf DeepSeek-V2-Lite.tgz

mkdir deepseek-datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/SlimPajama.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/alpaca_zh-train.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/alpaca_zh-valid.json
```

The script for building the idxmap-format pretraining dataset is shown below:
```bash
cd /workspace/Pai-Megatron-Patch/toolkits/pretrain_data_preprocessing
sh run_make_pretraining_dataset_megatron.sh \
/mnt/deepseek-datasets/SlimPajama.json \
DeepSeekV2Tokenizer \
text \
/mnt/deepseek-datasets/ \
/mnt/deepseek-ckpts/DeepSeek-V2-Lite
```
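
If preprocessing succeeds, the output directory should contain a `.bin`/`.idx` file pair (the exact file names depend on the output prefix used by the script):

```bash
ls /mnt/deepseek-datasets/*.bin /mnt/deepseek-datasets/*.idx
```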
For convenience, we also provide a pre-processed idxmap dataset for subsequent testing:
```bash
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/mmap_deepseekv2_datasets_text_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/mmap_deepseekv2_datasets_text_document.idx
```


## Megatron-Core-MoE Training Workflow
### Megatron-Core-MoE Checkpoint Format Conversion
The `hf2mcore_deepseek_v2_moe_convertor.sh` script takes the following arguments:
```
MODEL_SIZE=$1                  # Model size: A2.4B / A21B
SOURCE_CKPT_PATH=$2            # Source checkpoint path
TARGET_CKPT_PATH=$3            # Target checkpoint path
TP=$4                          # Tensor parallel size
PP=$5                          # Pipeline parallel size
EP=$6                          # Expert parallel size
PR=$7                          # Conversion precision
USE_TE=$8                      # Whether to build the model with Transformer Engine
mg2hf=$9                       # Whether to run the mcore-to-HF conversion
HG_CKPT_PATH=${10}             # Path to the HF checkpoint
```
For example, use the following command to convert the checkpoint to the MCore format and check the output:

```bash
cd /workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/deepseek
bash hf2mcore_deepseek_v2_moe_convertor.sh \
A2.4B \
/mnt/deepseek-ckpts/DeepSeek-V2-Lite \
/mnt/deepseek-ckpts/DeepSeek-V2-Lite-to-mcore-tp2-pp1-ep4  \
2  \
1  \
4 \
fp32 \
true \
false 
```
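
As a quick check (a sketch based on the usual Megatron-Core checkpoint layout; details may vary with the converter version), the target directory should now contain `latest_checkpointed_iteration.txt` plus per-rank sub-directories:

```bash
ls /mnt/deepseek-ckpts/DeepSeek-V2-Lite-to-mcore-tp2-pp1-ep4
cat /mnt/deepseek-ckpts/DeepSeek-V2-Lite-to-mcore-tp2-pp1-ep4/latest_checkpointed_iteration.txt
```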

### Megatron-Core Pretraining and Instruction Fine-tuning
For DeepSeek-V2, pretraining and fine-tuning are unified in the `run_mcore_deepseek.sh` script; the meaning of some arguments differs between the two use cases.

#### Unified Pretraining & Fine-tuning Command Reference
The full argument list is as follows:
```bash
ENV=$1                          # Environment switch: dsw for single-node training, dlc for multi-node training
MODEL_SIZE=$2                   # Model size: A2.4B, A21B
BATCH_SIZE=$3                   # Number of samples per data-parallel rank per iteration (micro batch size)
GLOBAL_BATCH_SIZE=$4            # Total number of samples across all data-parallel ranks per iteration
LR=$5                           # Learning rate
MIN_LR=$6                       # Minimum learning rate
SEQ_LEN=$7                      # Sequence length
PAD_LEN=$8                      # Padding length
PR=${9}                         # Training precision: fp16, bf16, fp8
TP=${10}                        # Tensor parallel size
PP=${11}                        # Pipeline parallel size
CP=${12}                        # Context parallel size
EP=${13}                        # Expert parallel size
SP=${14}                        # Enable sequence parallelism: true, false
DO=${15}                        # Enable Megatron's Zero-1 distributed optimizer to reduce memory: true, false
FL=${16}                        # Prefer Flash Attention: true, false
SFT=${17}                       # Run fine-tuning instead of pretraining: true, false
AC=${18}                        # Activation checkpointing mode: sel, full, offload, false
OPTIMIZER_OFFLOAD=${19}         # Optimizer offload mode: false, static, auto
SAVE_INTERVAL=${20}             # Checkpoint save interval
DATASET_PATH=${21}              # Training dataset path
VALID_DATASET_PATH=${22}        # Validation dataset path
PRETRAIN_CHECKPOINT_PATH=${23}  # Pretrained model path
TRAIN_TOKENS_OR_ITERS=${24}     # Number of training tokens or iterations
WARMUP_TOKENS_OR_ITERS=${25}    # Number of warmup tokens or iterations
OUTPUT_BASEPATH=${26}           # Base path for training logs and outputs
```

#### Pretraining Example
Use the following command to launch continued pretraining of DeepSeek-V2-MoE.
Note: when `AC=offload` or `AC=full`, the `MP_AC_LAYERS` environment variable controls how many TransformerLayer blocks are checkpointed or offloaded as a group (default: `1`).
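For example (a minimal sketch; the value `2` is illustrative and should be chosen to fit your memory budget):

```bash
# Checkpoint/offload activations in groups of 2 TransformerLayers
export MP_AC_LAYERS=2
```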

```bash
cd /workspace/Pai-Megatron-Patch/examples/deepseek_v2
sh run_mcore_deepseek.sh  \
dsw  \
A2.4B   \
1    \
8 \
1e-5   \
1e-6   \
128  \
128  \
bf16  \
2   \
1  \
1 \
4 \
true \
true   \
true \
false \
false   \
false \
100000  \
/mnt/deepseek-datasets/mmap_deepseekv2_datasets_text_document   \
/mnt/deepseek-datasets/mmap_deepseekv2_datasets_text_document   \
/mnt/deepseek-ckpts/DeepSeek-V2-Lite-to-mcore-tp2-pp1-ep4  \
10000  \
100   \
/workspace/output_mcore_deepseek_pretrain
```

#### Instruction Fine-tuning Example
To build an idxmap-format dataset for fine-tuning, see [this guide](https://github.com/alibaba/Pai-Megatron-Patch/tree/main/toolkits/sft_data_preprocessing).
Once the fine-tuning dataset is ready, set the `SFT` switch to `true` to run instruction fine-tuning.

```bash
cd /workspace/Pai-Megatron-Patch/examples/deepseek_v2
sh run_mcore_deepseek.sh  \
dsw  \
A2.4B   \
1    \
8 \
1e-5   \
1e-6   \
128  \
128  \
bf16  \
2   \
1  \
1 \
4 \
true \
true   \
true \
true \
false   \
false \
100000  \
/mnt/deepseek-datasets/path_to_your_dataset   \
/mnt/deepseek-datasets/path_to_your_dataset   \
/path/to/pretraining/checkpoint  \
10000  \
100   \
/workspace/output_mcore_deepseek_finetune
```
By setting the `MP_DATASET_TYPE` environment variable, the script can also run instruction fine-tuning on a JSON-format (raw) dataset:
```bash
export MP_DATASET_TYPE="raw"
cd /workspace/Pai-Megatron-Patch/examples/deepseek_v2
sh run_mcore_deepseek.sh  \
dsw  \
A2.4B   \
1    \
8 \
1e-5   \
1e-6   \
128  \
128  \
bf16  \
1   \
1  \
1 \
1 \
true \
true   \
true \
true \
false   \
false \
100000  \
/mnt/deepseek-datasets/alpaca_zh-train.json    \
/mnt/deepseek-datasets/alpaca_zh-train.json   \
/mnt/deepseek-ckpts/DeepSeek-V2-Lite-to-mcore-tp2-pp1-ep4  \
10000  \
100   \
/workspace/output_mcore_deepseek_finetune
```
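
To see what a raw-format record looks like, you can print the first entry of the downloaded Alpaca-style dataset (a sketch; the exact fields the preprocessor expects may differ):

```bash
# Pretty-print the first record of the raw JSON dataset
python -c "import json; print(json.dumps(json.load(open('/mnt/deepseek-datasets/alpaca_zh-train.json'))[0], ensure_ascii=False, indent=2))"
```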

## Downstream Task Evaluation

### Checkpoint Conversion for Evaluation
To run inference and evaluation, convert the Megatron-Core checkpoint saved after training or fine-tuning back to the HuggingFace format.

```bash
cd /workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/deepseek
bash hf2mcore_deepseek_v2_moe_convertor.sh \
A2.4B \
/mnt/deepseek-ckpts/DeepSeek-V2-Lite-to-mcore-tp2-pp1-ep4  \
/mnt/deepseek-ckpts/DeepSeek-V2-Lite-mcore-te-to-hf    \
2  \
1  \
4 \
fp32 \
true \
true \
/mnt/deepseek-ckpts/DeepSeek-V2-Lite
```
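
Before evaluating, you can sanity-check the converted checkpoint by loading it with `transformers` (a minimal sketch; `trust_remote_code=True` is required because DeepSeek-V2 ships custom model code):

```bash
python -c "
from transformers import AutoConfig, AutoTokenizer
path = '/mnt/deepseek-ckpts/DeepSeek-V2-Lite-mcore-te-to-hf'
# Loading the config and tokenizer verifies the converted files are readable
print(AutoConfig.from_pretrained(path, trust_remote_code=True))
print(AutoTokenizer.from_pretrained(path, trust_remote_code=True))
"
```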

### Running the Evaluation Toolkit
Download the evaluation datasets:
```bash
# In container
cd /workspace

wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/evaluation-datasets/evaluate.tgz 
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/evaluation-datasets/cmmlu.tgz 
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/evaluation-datasets/ceval.tgz 

tar -xvzf cmmlu.tgz 
tar -xvzf ceval.tgz 
tar -xvzf evaluate.tgz
```
Run the following command to evaluate the converted model:
```bash
cd /workspace/Pai-Megatron-Patch/LM-Evaluation-Harness-240310
accelerate launch --main_process_port 29051 -m lm_eval \
--model hf \
--model_args pretrained=/mnt/deepseek-ckpts/DeepSeek-V2-Lite-mcore-te-to-hf,trust_remote_code=True \
--tasks cmmlu,ceval-valid  \
--batch_size 16
```