Commit 81a090ed authored by hepj987

Add result

parent d0d55509
Pipeline #588 failed with stage
@@ -45,6 +45,8 @@ source env.sh
# Install the DTK-specific dependencies
pip install torch-1.10.0+gite378c3c.abi0.dtk2304-cp37-cp37m-manylinux2014_x86_64.whl
pip install deepspeed-0.9.2+git25d5540.abi0.dtk2304.torch1.10.0-cp37-cp37m-manylinux2014_x86_64.whl
pip install apex-0.1+f49ddd4.abi0.dtk2304.torch1.10-cp37-cp37m-manylinux2014_x86_64.whl
pip install torchvision-0.10.0+git48e6bbb.abi0.dtk2304.torch1.10-cp37-cp37m-manylinux2014_x86_64.whl
# Install the remaining dependencies
pip install -r requirements.txt -i http://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
```
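As an optional, hedged sanity check (not part of the original README), the snippet below confirms that the DTK builds of torch, deepspeed, torchvision and apex import correctly; `torch.cuda.is_available()` should report `True` once a DCU device is visible in the environment prepared by env.sh.

```
# Optional sanity check (assumption: run inside the environment prepared by env.sh).
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import deepspeed; print(deepspeed.__version__)"
python -c "import torchvision, apex; print(torchvision.__version__)"
```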
@@ -88,7 +90,7 @@ sh creat-data.sh
## GPT2 pre-training
## Training
### GPT2 single-node training
@@ -129,7 +131,19 @@ SAVE_INTERVAL save frequency
sh mpi-run-16B.sh  (the main parameters are in single-16B.sh and have the same types as in single-node training; training defaults to fp32 precision, and to train with fp16 precision run sh mpi-16B-fp16.sh)
```
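For orientation only, the sketch below shows roughly how a multi-node launch wrapped by mpi-run-16B.sh could be issued; the hostfile name, rank count and rendezvous variables are assumptions rather than values taken from the repository scripts.

```
# Hypothetical multi-node launch; mpi-run-16B.sh wraps something similar.
# One MPI rank per node is assumed here; the real script may launch one rank per DCU instead.
# MASTER_ADDR/MASTER_PORT are the usual torch.distributed rendezvous variables.
mpirun -np 4 --hostfile hostfile \
       -x MASTER_ADDR=node0 -x MASTER_PORT=29500 \
       bash single-16B.sh
```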
## GPT2 text generation
## Inference
### Notes
```
Note: for inference the pp degree must be 1, and the tp degree must either match the value used in training or be 1 (tp > 1 means multi-GPU inference; tp = 1 means single-GPU inference).
tools/convert_checkpoint/deepspeed_to_deepspeed.py    converts the model's tp degree
tools/convert_checkpoint/deepspeed_to_megatron.py     converts the model's pp degree and produces an inference-ready format (this step is required for inference)
The examples below convert a model trained on multiple nodes with 4tp 4pp into 4tp 1pp for multi-GPU inference, and a model trained on a single node with 4tp 1pp into 1tp 1pp for single-GPU inference.
The 16B model runs out of memory for single-GPU inference, so no example is given here; if you need to convert a multi-tp, multi-pp model for single-GPU inference, refer to the single-GPU inference script conver-model-1tp.sh.
```
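As a hedged illustration of the 4tp 4pp to 4tp 1pp conversion described above, the commands below follow the argument names of the upstream Megatron-DeepSpeed converters (--input_folder, --output_folder, --target_tp, --target_pp); the exact flags in this repository may differ, and all paths are placeholders. The repository's own conversion scripts (e.g. conver-model-1tp.sh) are the authoritative reference.

```
# Hypothetical conversion flow (flag names and paths are assumptions).
# Step 1: reshape the DeepSpeed checkpoint to the target tp degree.
python tools/convert_checkpoint/deepspeed_to_deepspeed.py \
       --input_folder checkpoints/gpt2-16B-tp4-pp4 \
       --output_folder checkpoints/gpt2-16B-tp4 \
       --target_tp 4 --target_pp 1
# Step 2: collapse pp to 1 and emit the inference-ready Megatron format (required for inference).
python tools/convert_checkpoint/deepspeed_to_megatron.py \
       --input_folder checkpoints/gpt2-16B-tp4 \
       --output_folder checkpoints/gpt2-16B-inference \
       --target_tp 4 --target_pp 1
```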
### Convert for multi-GPU inference
@@ -180,6 +194,14 @@ mpirun -np 1 run-inf.sh
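The hunk header above carries the single-GPU launch `mpirun -np 1 run-inf.sh` as context; as a hedged sketch, serving the 4tp 1pp checkpoint from the multi-GPU example would start one MPI rank per tensor-parallel shard.

```
# Hypothetical multi-GPU inference launch: one rank per tp shard (tp=4 assumed from the example above).
mpirun -np 4 run-inf.sh
```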
## result
An inference example is shown below (abnormal generated words are caused by insufficient training, which leads to some token-merging issues):
![result](result.jpg)
## Accuracy
Training loss of the 16B model:
| Number of GPUs | Configuration | lm loss |
......
requirements.txt
@@ -11,3 +11,4 @@ transformers
black==21.4b0
isort>=5.5.4
ninja
mpi4py
result.jpg (new file, 25.1 KB)