Commit 81a090ed authored by hepj987

Add result

parent d0d55509
Pipeline #588 failed with stage
@@ -45,6 +45,8 @@ source env.sh
# Install the DTK-specific dependencies
pip install torch-1.10.0+gite378c3c.abi0.dtk2304-cp37-cp37m-manylinux2014_x86_64.whl
pip install deepspeed-0.9.2+git25d5540.abi0.dtk2304.torch1.10.0-cp37-cp37m-manylinux2014_x86_64.whl
pip install apex-0.1+f49ddd4.abi0.dtk2304.torch1.10-cp37-cp37m-manylinux2014_x86_64.whl
pip install torchvision-0.10.0+git48e6bbb.abi0.dtk2304.torch1.10-cp37-cp37m-manylinux2014_x86_64.whl
# Install the remaining dependencies
pip install -r requirements.txt -i http://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
```
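As an optional, hedged sanity check (not part of the original README), the snippet below confirms that the DTK builds of torch, deepspeed, torchvision and apex import correctly; `torch.cuda.is_available()` should report `True` once a DCU device is visible in the environment prepared by env.sh.

```
# Optional sanity check (assumption: run inside the environment prepared by env.sh).
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import deepspeed; print(deepspeed.__version__)"
python -c "import torchvision, apex; print(torchvision.__version__)"
```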
@@ -88,7 +90,7 @@ sh creat-data.sh
## GPT2 pre-training
## Training
### GPT2 single-node training
@@ -129,7 +131,19 @@ SAVE_INTERVAL save frequency
sh mpi-run-16B.sh  (the main parameters are in single-16B.sh and have the same types as in single-node training; training defaults to fp32 precision, and to train with fp16 precision run sh mpi-16B-fp16.sh)
```
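For orientation only, the sketch below shows roughly how a multi-node launch wrapped by mpi-run-16B.sh could be issued; the hostfile name, rank count and rendezvous variables are assumptions rather than values taken from the repository scripts.

```
# Hypothetical multi-node launch; mpi-run-16B.sh wraps something similar.
# One MPI rank per node is assumed here; the real script may launch one rank per DCU instead.
# MASTER_ADDR/MASTER_PORT are the usual torch.distributed rendezvous variables.
mpirun -np 4 --hostfile hostfile \
       -x MASTER_ADDR=node0 -x MASTER_PORT=29500 \
       bash single-16B.sh
```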
## GPT2 text generation
## Inference
### Notes
```
Note: for inference the pp degree must be 1, and the tp degree must either match the value used in training or be 1 (tp > 1 means multi-GPU inference; tp = 1 means single-GPU inference).
tools/convert_checkpoint/deepspeed_to_deepspeed.py    converts the model's tp degree
tools/convert_checkpoint/deepspeed_to_megatron.py     converts the model's pp degree and produces an inference-ready format (this step is required for inference)
The examples below convert a model trained on multiple nodes with 4tp 4pp into 4tp 1pp for multi-GPU inference, and a model trained on a single node with 4tp 1pp into 1tp 1pp for single-GPU inference.
The 16B model runs out of memory for single-GPU inference, so no example is given here; if you need to convert a multi-tp, multi-pp model for single-GPU inference, refer to the single-GPU inference script conver-model-1tp.sh.
```
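As a hedged illustration of the 4tp 4pp to 4tp 1pp conversion described above, the commands below follow the argument names of the upstream Megatron-DeepSpeed converters (--input_folder, --output_folder, --target_tp, --target_pp); the exact flags in this repository may differ, and all paths are placeholders. The repository's own conversion scripts (e.g. conver-model-1tp.sh) are the authoritative reference.

```
# Hypothetical conversion flow (flag names and paths are assumptions).
# Step 1: reshape the DeepSpeed checkpoint to the target tp degree.
python tools/convert_checkpoint/deepspeed_to_deepspeed.py \
       --input_folder checkpoints/gpt2-16B-tp4-pp4 \
       --output_folder checkpoints/gpt2-16B-tp4 \
       --target_tp 4 --target_pp 1
# Step 2: collapse pp to 1 and emit the inference-ready Megatron format (required for inference).
python tools/convert_checkpoint/deepspeed_to_megatron.py \
       --input_folder checkpoints/gpt2-16B-tp4 \
       --output_folder checkpoints/gpt2-16B-inference \
       --target_tp 4 --target_pp 1
```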
### Convert for multi-GPU inference
@@ -180,6 +194,14 @@ mpirun -np 1 run-inf.sh
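The hunk header above carries the single-GPU launch `mpirun -np 1 run-inf.sh` as context; as a hedged sketch, serving the 4tp 1pp checkpoint from the multi-GPU example would start one MPI rank per tensor-parallel shard.

```
# Hypothetical multi-GPU inference launch: one rank per tp shard (tp=4 assumed from the example above).
mpirun -np 4 run-inf.sh
```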
## result
An inference example is shown below (abnormal generated words are caused by insufficient training, which leads to some token-merging issues):
![result](result.jpg)
## Accuracy
Training loss of the 16B model:
| Number of GPUs | Configuration | lm loss |
......
requirements.txt
@@ -11,3 +11,4 @@ transformers
black==21.4b0
isort>=5.5.4
ninja
mpi4py
result.jpg (new file, 25.1 KB)