Upload New File

Commit 478602ba authored Mar 29, 2023 by yuguo960516yuguo

Showing with 69 additions and 0 deletions

README.md +69 -0

--- a/README.md
+++ b/README.md
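+# Bidirectional Encoder Representations from Transformers (BERT)
+## Model Introduction
+BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language representation model. Instead of pre-training with a conventional unidirectional language model, or with a shallow concatenation of two unidirectional language models, it uses a **masked language model (MLM)** objective, which lets it produce **deep bidirectional** language representations.
+## Model Architecture
+Earlier pre-trained models were constrained by unidirectional language modeling *(left-to-right or right-to-left)*, which limited their representational power because they could only capture context from one direction. BERT is pre-trained with MLM and is built from deep bidirectional Transformer blocks *(a unidirectional Transformer is usually called a Transformer decoder: each token attends only to the tokens to its left. A bidirectional Transformer is called a Transformer encoder: each token attends to all tokens.)*, so it produces deep bidirectional language representations that **fuse left and right context**.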
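As a rough illustration of the decoder/encoder distinction described above, the sketch below builds the two attention masks for a short sequence. The helper names `causal_mask` and `bidirectional_mask` are hypothetical and are not part of LiBai or OneFlow. A 1 means the row token may attend to the column token; the causal mask here also lets each token attend to itself, which is the usual convention.

```python
def causal_mask(seq_len):
    """Decoder-style mask: token i may attend to tokens 0..i (itself and those to its left)."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


def bidirectional_mask(seq_len):
    """Encoder-style mask: every token may attend to every token."""
    return [[1] * seq_len for _ in range(seq_len)]


if __name__ == "__main__":
    # Print both masks for a 4-token sequence so the difference is visible.
    for name, mask in (("causal", causal_mask(4)), ("bidirectional", bidirectional_mask(4))):
        print(name)
        for row in mask:
            print(" ".join(str(v) for v in row))
```

The causal mask is lower-triangular, while the bidirectional mask is all ones, which is why an encoder can use context on both sides of a masked token.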
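+To let users quickly verify BERT pre-training with OneFlow-Libai, and to measure performance or check accuracy, we provide an example BERT network. Its main parameters are: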
+```
+model.cfg.num_attention_heads = 16
+model.cfg.hidden_size = 768
+model.cfg.hidden_layers = 8
+```
+The full Bert-Large network configuration is in configs/common/model/bert.py.
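One quick sanity check on these parameters is the per-head dimension, which is the hidden size divided by the number of attention heads. The sketch below is plain Python, not LiBai's config API; the class name `BertExampleConfig` is hypothetical and only mirrors the three values listed above. For comparison, the BERT-Large configuration from the original BERT release uses a hidden size of 1024 with 24 layers and 16 heads, so the example network here is smaller.

```python
from dataclasses import dataclass


@dataclass
class BertExampleConfig:
    # Values copied from the README; the class itself is illustrative only.
    num_attention_heads: int = 16
    hidden_size: int = 768
    hidden_layers: int = 8

    def head_dim(self) -> int:
        """Return the width of a single attention head."""
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads


cfg = BertExampleConfig()
print(cfg.head_dim())  # 768 // 16
```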
+## Dataset
+We bundle a few small datasets under the libai directory for quick verification:
+    ./nlp_data
+## BERT Pre-training
+### Environment Setup
+We recommend running in Docker. A Docker image can be pulled from [光源 (sourcefind.cn)](https://www.sourcefind.cn/#/service-details): image.sourcefind.cn:5000/dcu/admin/base/oneflow:0.9.1-centos7.6-dtk-22.10.1-py39-latest
+Inside the Docker container:
+    cd libai
+    pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
+    pip3 install pybind11 -i https://mirrors.aliyun.com/pypi/simple
+    pip3 install -e . -i https://mirrors.aliyun.com/pypi/simple
+    pip3 install oneflow-0.9.1+dtk2210.git.8ea46d6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+### Training
+This pre-training script runs on 1 node with 4 DCU-Z100-16G cards.
+The parallelism settings are in configs/bert_large_pretrain.py, with automatic mixed precision enabled:
+```
+train.amp.enabled = True
+train.train_micro_batch_size = 16
+train.dist.data_parallel_size = 4
+train.dist.tensor_parallel_size = 1
+train.dist.pipeline_parallel_size = 1
+```
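The three parallel sizes multiply to the number of devices the job needs, and the micro batch size times the data-parallel size gives the number of samples processed per step. The sketch below works this out for the values above. The function names are hypothetical, and the assumption of no gradient accumulation (`accumulation_steps=1`) is not stated in the README.

```python
def world_size(data_parallel, tensor_parallel, pipeline_parallel):
    """Devices required: one per (data, tensor, pipeline) coordinate."""
    return data_parallel * tensor_parallel * pipeline_parallel


def global_batch_size(micro_batch, data_parallel, accumulation_steps=1):
    """Samples consumed per optimizer step across all data-parallel replicas."""
    return micro_batch * data_parallel * accumulation_steps


devices = world_size(data_parallel=4, tensor_parallel=1, pipeline_parallel=1)
batch = global_batch_size(micro_batch=16, data_parallel=4)
print(devices, batch)
```

With these settings the job is pure data parallelism: each of the 4 cards holds a full model replica and processes 16 samples per step.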
+Pre-training command:
+    cd libai
+    bash tools/train.sh tools/train_net.py configs/bert_large_pretrain.py 4
+### Performance and Convergence
+Training data: [link](https://oneflow-static.oss-cn-beijing.aliyuncs.com/ci-files/dataset/libai/gpt_dataset)
+GPGPU used: 4 DCU-Z100-16G cards.
+Model performance and convergence:
+| Cards | Distributed tool |   Performance    |                         Convergence                          |
+| :--: | :--------: | :--------------: | :----------------------------------------------------------: |
+|  4   | Libai-main | 161.23 samples/s | total_loss: 6.555, lm_loss: 5.973, sop_loss: 0.583 (at 10000 iters) |
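To put the throughput figure in context, the sketch below derives the per-card rate and a rough wall-clock estimate for the 10000 iterations reported in the table. The global batch size of 64 (micro batch 16 on 4 data-parallel cards, no gradient accumulation) is an assumption taken from the training configuration, not a figure reported with the benchmark, and the estimate ignores evaluation, checkpointing, and warm-up time.

```python
SAMPLES_PER_SECOND = 161.23    # reported total throughput on 4 cards
NUM_CARDS = 4
ITERS = 10000                  # iteration count at which the losses were reported
ASSUMED_GLOBAL_BATCH = 16 * 4  # assumption: micro batch 16 x 4 data-parallel cards

# Average throughput contributed by each card.
per_card = SAMPLES_PER_SECOND / NUM_CARDS

# Total samples divided by throughput gives an approximate training time.
est_seconds = ITERS * ASSUMED_GLOBAL_BATCH / SAMPLES_PER_SECOND

print(f"{per_card:.2f} samples/s per card")
print(f"about {est_seconds / 60:.0f} minutes for {ITERS} iterations")
```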
+## References
+* https://libai.readthedocs.io/en/latest/tutorials/get_started/quick_run.html
+* https://github.com/Oneflow-Inc/oneflow
+* https://github.com/Oneflow-Inc/libai/blob/main/docs/source/notes/FAQ.md
\ No newline at end of file