# Generative Pre-Training 2 (GPT2)

## Paper

`Language Models are Unsupervised Multitask Learners`

-   https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf

## Model Architecture

![gpt2](gpt2.jpg)

```
GPT2 uses the decoder part of the Transformer, with several modifications to the original Transformer decoder: the layer-normalization layers are moved to the input of each block, an additional layer normalization is added after the final self-attention block, and the vocabulary size is enlarged, among other changes.
```

## Algorithm Principle

![image-gpt](image-gpt.png)

## Environment Setup

### Docker (Option 1)

Running in Docker is recommended. The following Docker image, pulled from the [光源 (SourceFind)](https://www.sourcefind.cn/) image repository, is provided:

```
# Pull the image
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-23.04-py37-latest
# Start the container
docker run -dit --network=host --name=gpt2_pytorch --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-23.04-py37-latest
# Enter the container
docker exec -it gpt2_pytorch /bin/bash
# Install the remaining dependencies inside the container
pip install -r requirements.txt  -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com
```
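
Optionally, you can confirm that the DCUs are visible inside the container. This is a minimal sketch that assumes the `rocm-smi` utility ships with the DTK image, which may not be the case for every release:

```
# List the accelerators visible inside the container (rocm-smi availability is an assumption)
rocm-smi
```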

### Anaconda (Option 2)

Taking DTK 23.04, Python 3.7 and torch 1.10 as an example: visit the [光合开发者社区](https://cancon.hpccube.com:65024/4/main/) developer community, navigate to pytorch -> dtk23.04, and download torch-1.10.0+gite378c3c.abi0.dtk2304-cp37-cp37m-manylinux2014_x86_64.whl. Then set up the environment as follows:

```
# Create a virtual environment
conda create -n venv_gpt2 python=3.7
# Activate the venv_gpt2 environment
source activate venv_gpt2
# Load DTK and other environment settings
source env.sh
# Install the DTK-specific wheels
pip install torch-1.10.0+gite378c3c.abi0.dtk2304-cp37-cp37m-manylinux2014_x86_64.whl
pip install deepspeed-0.9.2+git25d5540.abi0.dtk2304.torch1.10.0-cp37-cp37m-manylinux2014_x86_64.whl
# Install the remaining dependencies
pip install -r requirements.txt  -i http://pypi.tuna.tsinghua.edu.cn/simple  --trusted-host pypi.tuna.tsinghua.edu.cn
```
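
As a quick sanity check (a minimal sketch, assuming the DTK build of PyTorch exposes the usual `torch.cuda` device API), verify that the wheels installed correctly and the accelerators are visible:

```
# Print the torch version and the number of visible accelerators
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
```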



## Dataset

`oscar-1GB`

-   https://huggingface.co/bigscience/misc-test-data/resolve/main/stas/oscar-1GB.jsonl.xz

```
# Download the dataset
wget https://huggingface.co/bigscience/misc-test-data/resolve/main/stas/oscar-1GB.jsonl.xz
# Download the vocab and merges files
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt
# Decompress the dataset
xz -d oscar-1GB.jsonl.xz

# Arguments for preprocessing the dataset
--input             Path to the input dataset, i.e. the file obtained by decompressing oscar-1GB.jsonl.xz
--output-prefix     Output path prefix; the suffix _text_document is appended automatically after processing
--vocab             Path to the downloaded gpt2-vocab.json vocabulary file
--dataset-impl      Dataset implementation type
--tokenizer-type    Tokenizer type
--merge-file        Path to the downloaded gpt2-merges.txt file
--append-eod        Append an end-of-document token
--workers           Number of worker processes

# Preprocess the dataset
sh creat-data.sh
```
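
For reference, a minimal sketch of what `creat-data.sh` presumably wraps, based on the arguments listed above and the `tools/preprocess_data.py` entry point of upstream Megatron-DeepSpeed (the script path, output prefix and argument values shown here are assumptions):

```
# Hypothetical contents of creat-data.sh: tokenize oscar-1GB.jsonl into the .bin/.idx format used for training
python tools/preprocess_data.py \
    --input oscar-1GB.jsonl \
    --output-prefix my-gpt2 \
    --vocab gpt2-vocab.json \
    --dataset-impl mmap \
    --tokenizer-type GPT2BPETokenizer \
    --merge-file gpt2-merges.txt \
    --append-eod \
    --workers 8
```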

```
# Layout of the processed dataset
├── my-gpt2_text_document.bin
├── my-gpt2_text_document.idx
└── oscar-1GB.jsonl
```



## GPT2 Pre-training

### GPT2 Single-Node Training

```
# np is the number of processes launched; it must equal the number of GPUs in use, and TP*PP must not exceed np.
# With 4 cards you can use tp=2/pp=2, tp=1/pp=4 or tp=4/pp=1; within a single node, TP usually performs better.

mpirun -np 4 run-one-node.sh    # single node with 4 cards
```

```
# Important parameters
MODEL_NAME              Model name (user-defined)
CHECKPOINT_PATH         Path for saving & loading checkpoints
DATA_PATH               Path to the (preprocessed) dataset
TENSORBOARD_PATH        TensorBoard output path
CODECARBON_PATH         CodeCarbon output path

N_GPUS                  Number of accelerator cards to use
TP_SIZE                 Tensor-parallel (TP) degree
PP_SIZE                 Pipeline-parallel (PP) degree
MICRO_BATCH_SIZE        Micro batch size
GLOBAL_BATCH_SIZE       Global batch size
NLAYERS                 Number of model layers
NHIDDEN                 Hidden dimension
NHEADS                  Number of attention heads
SEQ_LEN                 Maximum sequence length
SAVE_INTERVAL           Checkpoint save interval

--train-samples         Number of training samples
--eval-interval         Evaluation interval
--eval-iters            Number of evaluation iterations
```
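
As an illustration only (a hedged sketch; the authoritative values live in `run-one-node.sh`), the key variables for a 4-card single-node run with tp=2/pp=2 might look like this, with `DATA_PATH` pointing at the `my-gpt2_text_document` prefix produced above. All paths and model sizes below are example values:

```
# Example settings for a single node with 4 cards (tp=2, pp=2); all values are illustrative
MODEL_NAME=gpt2-demo                          # user-defined model name
CHECKPOINT_PATH=./checkpoints/${MODEL_NAME}   # where checkpoints are saved and loaded
DATA_PATH=./data/my-gpt2_text_document        # prefix of the .bin/.idx pair produced by creat-data.sh
TENSORBOARD_PATH=./tensorboard/${MODEL_NAME}
CODECARBON_PATH=./codecarbon/${MODEL_NAME}

N_GPUS=4
TP_SIZE=2                                     # tensor parallelism
PP_SIZE=2                                     # pipeline parallelism; TP_SIZE*PP_SIZE must not exceed N_GPUS
MICRO_BATCH_SIZE=4
GLOBAL_BATCH_SIZE=16
NLAYERS=24                                    # example model size; the 16B configuration uses larger values
NHIDDEN=1024
NHEADS=16
SEQ_LEN=1024
SAVE_INTERVAL=500
```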

### GPT2 16B Multi-Node Training

```
# Multi-node run
# The main parameters are in single-16B.sh and have the same meaning as in the single-node case.
# Training defaults to fp32; to train with fp16, run "sh mpi-16B-fp16.sh" instead.
sh mpi-run-16B.sh
```

## GPT2 Text Generation

### Converting for Multi-GPU Inference

```
# Trained models are saved in DeepSpeed format. For inference they must be converted to Megatron format;
# the deepspeed -> megatron conversion requires the TP degree to be the same before and after conversion.
# Conversion script
sh conver-model_to_megatron.sh
```

```
# Important parameters
The project root must be added to PYTHONPATH,
e.g.: export PYTHONPATH=/home/megatron-deepspeed_dtk23.04:$PYTHONPATH

CHECKPOINT_PATH  Path of the checkpoint to convert (down to the saved global_step directory)
output_folder	 Output path for the converted model
target_tp		 TP degree after conversion; keep it the same as in training, or set it to 1
target_pp		 PP degree after conversion; keep it the same as in training, or set it to 1
```
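
A hedged sketch of what `conver-model_to_megatron.sh` might invoke, assuming the `tools/convert_checkpoint/deepspeed_to_megatron.py` converter from upstream Megatron-DeepSpeed (the script path, argument names and the `global_step1000` directory are assumptions; keep the TP/PP values consistent with your training run):

```
# Make the project importable
export PYTHONPATH=/home/megatron-deepspeed_dtk23.04:$PYTHONPATH

# Hypothetical deepspeed -> megatron conversion call (TP must match training; PP may be kept or set to 1)
python tools/convert_checkpoint/deepspeed_to_megatron.py \
    --input_folder ${CHECKPOINT_PATH}/global_step1000 \
    --output_folder ./megatron-ckpt \
    --target_tp 2 \
    --target_pp 1
```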

### Converting for Single-GPU Inference

```
# The original model is saved in DeepSpeed format. Since the deepspeed -> megatron conversion requires the TP degree
# to stay the same before and after, first convert deepspeed -> deepspeed (changing TP to 1), and then convert
# deepspeed -> megatron to obtain a format usable for inference.

# Conversion script
sh conver-model-1tp.sh
```


### Unconditional Text Generation

```
# Multi-GPU inference
mpirun -np 4 run-inf-gpus.sh
# Single-GPU inference
mpirun -np 1 run-inf.sh
```

```
# At generation time the model parameters must match those used for training (including TP)
--micro-batch-size      Micro batch size
--out-seq-length        Length of the generated output text
--genfile               Path where the generated text is saved
--num-samples           Number of samples to generate
```
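
For reference, `run-inf.sh` presumably wraps a call along these lines, assuming the `tools/generate_samples_gpt.py` generation entry point of upstream Megatron-DeepSpeed (the script name and all model arguments shown are assumptions and must be replaced with the values used during training):

```
# Hypothetical unconditional generation call; model-size and TP arguments must match the training configuration
python tools/generate_samples_gpt.py \
    --load ./megatron-ckpt \
    --tensor-model-parallel-size 1 \
    --num-layers 24 --hidden-size 1024 --num-attention-heads 16 \
    --seq-length 1024 --max-position-embeddings 1024 \
    --tokenizer-type GPT2BPETokenizer \
    --vocab-file gpt2-vocab.json --merge-file gpt2-merges.txt \
    --micro-batch-size 1 \
    --out-seq-length 256 \
    --num-samples 10 \
    --genfile unconditional_samples.json
```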

## Application Scenarios

### Algorithm Category

`Text generation`

### Key Application Industries

`Internet`

## Results

Training loss of the 16B model:

|   Cards    |        Configuration        |   lm loss    |
| :--------: | :-------------------------: | :----------: |
| 32 x 4 DCU | tp=4, pp=8, 16 GB per card  | 1.965622E+00 |

Validation of the 16B model:

|   Cards    |        Configuration        | lm loss value | lm loss PPL  |
| :--------: | :-------------------------: | :-----------: | :----------: |
| 32 x 4 DCU | tp=4, pp=8, 16 GB per card  | 4.299443E+00  | 7.365877E+01 |

Convergence curves of the 16B model:

![image-20230524143710566](image-gpt-loss.png)

![image-20230524143830580](image-gpt-loss2.png)

## Source Repository & Issue Reporting

https://developer.hpccube.com/codes/modelzoo/gpt2-pytorch/

## References

https://github.com/bigscience-workshop/Megatron-DeepSpeed