"src/include/vscode:/vscode.git/clone" did not exist on "233540751a5d06369c8424427f291493cf148a29"
# BERT

## Paper

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

* https://arxiv.org/abs/1810.04805

## Model Architecture

The core of the BERT model is the Transformer encoder. BERT-large is a larger, more complex variant of BERT: it stacks 24 Transformer encoder layers with a hidden size of 1024 each, for a total of roughly 340M parameters. During pre-training, BERT-large is trained on a large amount of unlabeled text and is optimized with two tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP).

The figure below illustrates the BERT model architecture:

![figure1](figure1.png)
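
The layer count, hidden size, and parameter total quoted above can be sanity-checked with a short tally. The vocabulary size (30522), feed-forward size (4096), and maximum sequence length (512) used below are the values from the original BERT paper, not values read from this repository's configuration:

```
# Rough parameter tally for BERT-large (embeddings + encoder only; the
# commonly quoted ~340M figure also includes task-specific heads).
vocab_size   = 30522   # WordPiece vocabulary
hidden       = 1024    # hidden size
layers       = 24      # Transformer encoder layers
intermediate = 4096    # feed-forward inner size
max_pos, type_vocab = 512, 2

embeddings = (vocab_size + max_pos + type_vocab) * hidden + 2 * hidden  # + LayerNorm

per_layer = (
    4 * (hidden * hidden + hidden)            # Q, K, V and output projections
    + 2 * hidden                              # attention LayerNorm
    + hidden * intermediate + intermediate    # feed-forward up-projection
    + intermediate * hidden + hidden          # feed-forward down-projection
    + 2 * hidden                              # output LayerNorm
)

print(f"~{(embeddings + layers * per_layer) / 1e6:.0f}M parameters")  # ~334M
```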

## Algorithm

BERT is trained on large amounts of unlabeled text in a self-supervised fashion, encoding the linguistic knowledge contained in that text (lexical, syntactic, and semantic features) into the parameters of the Transformer encoder layers. Two objectives are used: Masked LM captures word-level representations, and Next Sentence Prediction captures sentence-level representations.
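
To make the Masked LM objective concrete, the sketch below applies the 15% selection and 80/10/10 replacement rule described in the BERT paper. It is an illustration only and is not the tokenization or masking code used by this repository's preprocessing scripts:

```
import random

def mask_tokens(tokens, vocab, mask_prob=0.15):
    # BERT-style MLM corruption: select ~15% of positions; of those,
    # 80% become [MASK], 10% become a random token, 10% stay unchanged.
    # The original token at each selected position is the prediction target.
    out, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                    # target the model must recover
            r = random.random()
            if r < 0.8:
                out[i] = "[MASK]"
            elif r < 0.9:
                out[i] = random.choice(vocab)  # random replacement
            # else: keep the original token unchanged
    return out, labels

corrupted, targets = mask_tokens(
    ["the", "cat", "sat", "on", "the", "mat"],
    vocab=["the", "cat", "sat", "on", "mat", "dog"],
)
```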

## Target Accuracy

0.72 Mask-LM accuracy

## MLPerf Reference Code Version

Version: v2.1

Original code location:
* https://github.com/mlcommons/training_results_v2.1/tree/main/Baidu/benchmarks/bert/implementations/8_node_64_A100_PaddlePaddle

## Environment Setup

A Docker image for training is provided and can be pulled from [光源](https://www.sourcefind.cn/#/service-details):

    docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:mlperf_paddle_bert_mpirun
    # <Image ID>: replace with the ID of the image pulled above
    # <Host Path>: path on the host
    # <Container Path>: mount path inside the container
    docker run -it --name mlperf_bert --shm-size=32G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash

Image version dependencies:
* DTK driver: dtk21.04
* Python: python3.6.8

Test directory:

```
/root/mlperf-paddle_bert.20220919-training-bert/training/bert
```

## Pretrained Model

The /workspace/bert_data directory holds the pretrained model as follows:

    ├── /workspace/bert_data/phase1
    │   └── model.ckpt-28252.tf_pickled  # pretrained model

## Dataset

The training dataset comes from the Wikipedia dump of 2020/01/01, a commonly used natural language processing corpus containing Wikipedia articles and their abstracts (i.e. the first paragraph of each article). It can be used for a variety of text-related tasks such as text classification, summarization, and named entity recognition.

The data can be downloaded and preprocessed as follows; the resulting input data is shown in the figure below:

    ./input_preprocessing/prepare_data.sh --outputdir /workspace/bert_data 

![dataset](dataset.png)

The downloaded pretrained checkpoint then needs to be processed as follows:

     python3 models/load_tf_checkpoint.py \
        /workspace/bert_data/phase1/model.ckpt-28252 \
        /workspace/bert_data/phase1/model.ckpt-28252.tf_pickled
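
This step turns the TensorFlow checkpoint into the pickled file (model.ckpt-28252.tf_pickled) consumed by training. The repository's models/load_tf_checkpoint.py is the authoritative implementation; the sketch below only shows the general shape such a conversion usually takes and assumes TensorFlow is installed:

```
# Illustrative only -- NOT the repository's models/load_tf_checkpoint.py.
import pickle
import sys

import tensorflow as tf

ckpt_path, out_path = sys.argv[1], sys.argv[2]     # e.g. .../model.ckpt-28252  .../model.ckpt-28252.tf_pickled

reader = tf.train.load_checkpoint(ckpt_path)       # open the TF checkpoint
weights = {name: reader.get_tensor(name)           # variable name -> numpy array
           for name in reader.get_variable_to_shape_map()}

with open(out_path, "wb") as f:
    pickle.dump(weights, f)                        # pickled dict of weights
```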

This yields the pretrained model under /workspace/bert_data as follows:

    ├── /workspace/bert_data/phase1
    │   └── model.ckpt-28252.tf_pickled  # pretrained model

## Training

### Single-Node Multi-Card

Run the performance and accuracy test on a single node with 8 cards:

    bash run_8gpu.sh
    
    # Configuration and data paths differ between environments; adjust the following line in the run_benchmark_8gpu.sh script as needed:
    BASE_DATA_DIR=${BASE_DATA_DIR:-"/public/DL_DATA/mlperf/bert"}  # set to the actual data path

### Results

With the input data above and 8 Z100L accelerator cards, training reaches the official convergence requirement, i.e. the target accuracy of 0.72 Mask-LM accuracy.

| Cards | Precision       | Processes | Accuracy reached      |
| ----- | --------------- | --------- | --------------------- |
| 8     | Mixed precision | 8         | 0.72 Mask-LM accuracy |

## Application Scenarios

### Algorithm Category

Natural language processing

### Key Application Industries

Retail, broadcasting and media

## Source Repository and Issue Feedback

* https://developer.hpccube.com/codes/modelzoo/mlperf_bert-large

## References
* https://mlcommons.org/en/
* https://github.com/mlcommons