"...git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "135567f18edc0bf02d515d5c76cc736d1ebddad3"
Commit 124faa46 authored by liangjing

Delete readme.bak.md

parent 1dfa0c01
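# BERT Introduction
## Application domain
Large model for natural language understanding
## Target accuracy
Masked-LM accuracy reaches 0.72
## Basic model parameter settings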
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"max_position_embeddings": 512,
"num_attention_heads": 16,
"num_hidden_layers": 24,
"type_vocab_size": 2,
"vocab_size": 30522
# Preparation before testing
## Dataset preparation
### progress bars in model download and training scripts
boto3==1.14.0
gdown==3.13.0
git+https://github.com/mlcommons/logging.git@2.0.0-rc2
h5py==2.10.0
html2text==2020.1.16
ipdb==0.13.2
nltk==3.5
onnxruntime==1.3.0
parameterized
progressbar==2.5
requests==2.23.0
six==1.15.0
tensorflow==2.2.0
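When preprocessing the data, match the library versions listed above as closely as possible, to avoid MD5 checksum mismatches.
See README.md in the bert directory for how to build the dataset.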
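
To confirm that a preprocessed file matches its expected checksum, the MD5 can be computed locally and compared. The following is a minimal sketch using only the standard library; the helper names `md5sum` and `verify` are hypothetical, and the expected digests have to come from the dataset instructions, since none are listed here.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 equals the expected hex digest."""
    return md5sum(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat for large dataset shards.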
## Environment setup
1. Prepare the DTK 21.04 environment.
2. The MLPerf bert folder contains paddlepaddle_rocm-0.0.0-cp36-cp36m-linux_x86_64.whl; install it:
python3 -m pip install paddlepaddle_rocm-0.0.0-cp36-cp36m-linux_x86_64.whl
## Install Python dependencies
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
# Test scripts
## 8-GPU test with exchange padding enabled
cp rundir_8gpu_exchange/* .
sbatch run_sbatch.sh
## 1024-GPU large-scale parallel test
cp rundir_8gpu_exchange/* .
sbatch run_sbatch.sh
Output is written to the worker.* files.
# Summary of optimization test results
Test data is stored in: result.log
## Scalability test
| GPUs | Per-GPU batch_size | gradient_accumulation | Throughput (seq/s) | Parallel efficiency |
|-------|--------------|-----------------------|------------------|--------------------------|
| 4 | 4 | 1 | 36.69 | 100% |
| 8 | 4 | 1 | 65.7 | 89.53% |
| 1024 | 4 | 1 | 7723.38 | 82.23% |
| 1024 | 8 | 1 | 9362.93-9416.84 | 99.6%-100.25% (relative to the single-node 4-GPU baseline) |
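
The efficiency figures for the batch-size-4 rows are consistent with per-GPU throughput divided by the per-GPU throughput of the 4-GPU run. The formula is inferred from the numbers rather than stated in the source, so the sketch below is an assumption; the function name is hypothetical. The batch-size-8 row is left out because its baseline throughput is not given explicitly.

```python
def parallel_efficiency(throughput, gpus, base_throughput, base_gpus):
    """Per-GPU throughput relative to the baseline's per-GPU throughput, in percent."""
    return 100.0 * (throughput / gpus) / (base_throughput / base_gpus)


# Baseline: the 4-GPU row of the table above.
BASE_GPUS, BASE_THROUGHPUT = 4, 36.69

for gpus, throughput in [(8, 65.7), (1024, 7723.38)]:
    eff = parallel_efficiency(throughput, gpus, BASE_THROUGHPUT, BASE_GPUS)
    print(f"{gpus} GPUs: {eff:.2f}%")
# 8 GPUs: 89.53%
# 1024 GPUs: 82.23%
```
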
## Performance optimization test
| GPUs | Per-GPU batch_size | gradient_accumulation_steps | global batch size | Optimization | Mixed precision | GEMM optimization | softmax+softmax_cross_entropy | distributed_fused_lamb | GeLU approximation | exchange padding | global_steps to converge | walltime(s) |
|-------|--------------|-----------------------------|-------------------|------|---------------|--------------|--------------------------------|------------------------|---------------|----------------------|----------------|-------------|
| 8 | 4 | 14 | 448 | Before: | | 51.26seq/s | 85.3seq/s | 89.59seq/s | | 6697 (global steps) | 6697 | 32522.67 |
| | | | | After: | 91.92seq/s | 85.3seq/s | 89.59seq/s | 91.92seq/s | 91.92seq/s | 5692 (global steps) | 5692 | |
| 1024 | 4 | 1 | 4096 | Before: | 4458.04seq/s | | 7461seq/s | 5174.44seq/s | 7353.08seq/s | must be off | 684 | 369.325 |
| | | | | After: | 7723.38seq/s | 5174.44seq/s | 7723.38seq/s | 7461seq/s | 7723.38seq/s | | | |
| 1024 | 8 | 2 | 16384 | Before: | --- | | 10634seq/s | 9083seq/s | | must be off | 794 | 580.618 |
| | | | | After: | 11330.07seq/s | 9083seq/s | 11330.07seq/s | 10634seq/s | 11330.07seq/s | | | |
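
The global batch size column follows from the first three columns: number of GPUs times per-GPU batch size times gradient accumulation steps. A short sketch that checks the three configurations in the table; the function name is illustrative.

```python
def global_batch_size(gpus, per_gpu_batch, grad_accum_steps):
    """Sequences consumed per optimizer step across all GPUs."""
    return gpus * per_gpu_batch * grad_accum_steps


# (GPUs, per-GPU batch_size, gradient_accumulation_steps, listed global batch size)
rows = [
    (8, 4, 14, 448),
    (1024, 4, 1, 4096),
    (1024, 8, 2, 16384),
]

for gpus, batch, accum, listed in rows:
    computed = global_batch_size(gpus, batch, accum)
    print(f"{gpus} GPUs: computed={computed}, listed={listed}")
```
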