ModelZoo / GPT2_oneflow

Commit 065ccf6e, authored Oct 10, 2023 by “yuguo”

update

parent 91454f7c
Showing 1 changed file with 3 additions and 3 deletions.

README.md (+3 −3) @ 065ccf6e
@@ -61,7 +61,7 @@ GPT-2 uses masked self-attention, whereas ordinary self-attention…
 pip3 install pybind11 -i https://mirrors.aliyun.com/pypi/simple
 pip3 install -e . -i https://mirrors.aliyun.com/pypi/simple
-## GPT2 pre-training
+## Training
 The pre-training script runs on 1 node with 4 DCU-Z100-16G cards.
@@ -79,7 +79,7 @@ train.dist.pipeline_parallel_size = 1
 bash tools/train.sh tools/train_net.py configs/gpt2_pretrain.py 4
-### Accuracy
+## Accuracy
 Training data: [link](https://oneflow-static.oss-cn-beijing.aliyuncs.com/ci-files/dataset/libai/gpt_dataset)
@@ -91,7 +91,7 @@ train.dist.pipeline_parallel_size = 1
 | :--: | :--------: | :---------------------------: |
 | 4 | Libai-main | total_loss: 4.336/10000 iters |
-### Hybrid parallelism configuration guide
+## Hybrid parallelism configuration guide
 First, model-parallel partitioning can be applied across the multiple cards within a single node. Model parallelism has a high communication overhead (both the forward and backward passes may require all-reduce communication), whereas bandwidth between devices inside a node is high; in addition, the larger the model-parallel group, the fewer pipeline stages are needed, which in turn reduces pipeline bubbles. For these reasons, all the devices in one node are usually used as a single model-parallel group.
…
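The guidance above interacts with the `train.dist.pipeline_parallel_size = 1` setting visible in the hunk context. Below is a minimal, self-contained sketch of how the 1-node / 4-card layout could be laid out, assuming companion fields named `data_parallel_size` and `tensor_parallel_size`; only `pipeline_parallel_size` is confirmed by this diff, and the real values live in `configs/gpt2_pretrain.py`.

```python
# Illustrative stand-in for the train.dist section of a LiBai-style config,
# not the project's actual config machinery. Field names other than
# pipeline_parallel_size are assumptions.
from types import SimpleNamespace

dist = SimpleNamespace(
    data_parallel_size=1,      # assumed: no data-parallel replication
    tensor_parallel_size=4,    # assumed: whole node = one model-parallel group
    pipeline_parallel_size=1,  # value shown in the hunk context above
)

# The product of the three sizes must equal the device count:
# 1 node x 4 DCU-Z100-16G cards = 4.
num_devices = (
    dist.data_parallel_size
    * dist.tensor_parallel_size
    * dist.pipeline_parallel_size
)
assert num_devices == 4
```

Keeping the model-parallel group confined to one node follows the reasoning in the guide: growing it across nodes would route all-reduce traffic over the slower inter-node links.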