# Introduction

This example implements the BERT network with the PyTorch framework.

* BERT training comes in two forms, pre-training and fine-tuning; pre-training is further split into two phases.
* BERT inference accuracy can be validated on different datasets.
* For details on data generation and model conversion, see [README.md](http://10.0.100.3/dcutoolkit/deeplearing/dlexamples/-/blob/develop/PyTorch/NLP/BERT/scripts/README.md).

# Running the examples

Code examples are currently provided for both pre-training phases on the English Wikipedia dataset and for fine-tuning on the SQuAD dataset.

## pre-train phase 1

|Variable|Description|Example|
|:---:|:---:|:---:|
|PATH_PHRASE1|Path to the phase-1 training dataset|/workspace/lower_case_1_seq_len_128_max_pred_20_masked_lm_prob_0.15_random_seed_12345_dupe_factor_5_shard_1472_test_split_10|
|OUTPUT_DIR|Output path|/workspace/results|
|PATH_CONFIG|Path to the config directory (the commands append `bert_config.json` directly, so include a trailing slash)|/workspace/bert_large_uncased|
|PATH_PHRASE2|Path to the phase-2 training dataset|/workspace/lower_case_1_seq_len_512_max_pred_80_masked_lm_prob_0.15_random_seed_12345_dupe_factor_5_shard_1472_test_split_10|
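
The commands below reference these variables through shell expansion (`${PATH_PHRASE1}`, `${OUTPUT_DIR}`, and so on), so they need to be set in the shell first. A minimal setup sketch using the example values from the table above; adjust the paths to your own environment:

```
# Example values taken from the table above; adjust to your environment.
export PATH_PHRASE1=/workspace/lower_case_1_seq_len_128_max_pred_20_masked_lm_prob_0.15_random_seed_12345_dupe_factor_5_shard_1472_test_split_10
export PATH_PHRASE2=/workspace/lower_case_1_seq_len_512_max_pred_80_masked_lm_prob_0.15_random_seed_12345_dupe_factor_5_shard_1472_test_split_10
export OUTPUT_DIR=/workspace/results
# The commands concatenate ${PATH_CONFIG} and bert_config.json directly,
# so PATH_CONFIG ends with a trailing slash here.
export PATH_CONFIG=/workspace/bert_large_uncased/
```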
### Single GPU
```
export HIP_VISIBLE_DEVICES=0
python3 run_pretraining_v1.py \
    --input_dir=${PATH_PHRASE1} \
    --output_dir=${OUTPUT_DIR}/checkpoints1 \
    --config_file=${PATH_CONFIG}bert_config.json \
    --bert_model=bert-large-uncased \
    --train_batch_size=16 \
    --max_seq_length=128 \
    --max_predictions_per_seq=20 \
    --max_steps=100000 \
    --warmup_proportion=0.0 \
    --num_steps_per_checkpoint=20000 \
    --learning_rate=4.0e-4 \
    --seed=12439 \
    --gradient_accumulation_steps=1 \
    --allreduce_post_accumulation \
    --do_train \
    --json-summary dllogger.json
```

### Multi-GPU
* Method 1
```
export HIP_VISIBLE_DEVICES=0,1,2,3
python3 run_pretraining_v1.py \
    --input_dir=${PATH_PHRASE1} \
    --output_dir=${OUTPUT_DIR}/checkpoints \
    --config_file=${PATH_CONFIG}bert_config.json \
    --bert_model=bert-large-uncased \
    --train_batch_size=16 \
    --max_seq_length=128 \
    --max_predictions_per_seq=20 \
    --max_steps=100000 \
    --warmup_proportion=0.0 \
    --num_steps_per_checkpoint=20000 \
    --learning_rate=4.0e-4 \
    --seed=12439 \
    --gradient_accumulation_steps=1 \
    --allreduce_post_accumulation \
    --do_train \
    --json-summary dllogger.json
```
* Method 2

hostfile:
```
node1 slots=4
node2 slots=4
```
```
# The scripts/run_pretrain.sh script defaults to four GPUs per node
cd scripts; bash run_pretrain.sh
```

## pre-train phase 2

### Single GPU
```
HIP_VISIBLE_DEVICES=0 python3 run_pretraining_v1.py --input_dir=${PATH_PHRASE2} \
    --output_dir=${OUTPUT_DIR}/checkpoints2 \
    --config_file=${PATH_CONFIG}bert_config.json \
    --bert_model=bert-large-uncased \
    --train_batch_size=4 \
    --max_seq_length=512 \
    --max_predictions_per_seq=80 \
    --max_steps=400000 \
    --warmup_proportion=0.128 \
    --num_steps_per_checkpoint=200000 \
    --learning_rate=4e-3 \
    --seed=12439 \
    --gradient_accumulation_steps=1 \
    --allreduce_post_accumulation \
    --do_train \
    --phase2 \
    --phase1_end_step=0 \
    --json-summary dllogger.json
```

### Multi-GPU
* Method 1
```
export HIP_VISIBLE_DEVICES=0,1,2,3
python3 run_pretraining_v1.py --input_dir=${PATH_PHRASE2} \
    --output_dir=${OUTPUT_DIR}/checkpoints2 \
    --config_file=${PATH_CONFIG}bert_config.json \
    --bert_model=bert-large-uncased \
    --train_batch_size=4 \
    --max_seq_length=512 \
    --max_predictions_per_seq=80 \
    --max_steps=400000 \
    --warmup_proportion=0.128 \
    --num_steps_per_checkpoint=200000 \
    --learning_rate=4e-3 \
    --seed=12439 \
    --gradient_accumulation_steps=1 \
    --allreduce_post_accumulation \
    --do_train \
    --phase2 \
    --phase1_end_step=0 \
    --json-summary dllogger.json
```
* Method 2

hostfile:
```
node1 slots=4
node2 slots=4
```
```
# The scripts/run_pretrain2.sh script defaults to four GPUs per node
cd scripts; bash run_pretrain2.sh
```

## fine-tune training

### Single GPU
```
python3 run_squad_v1.py \
    --train_file squad/v1.1/train-v1.1.json \
    --init_checkpoint model.ckpt-28252.pt \
    --vocab_file vocab.txt \
    --output_dir SQuAD \
    --config_file bert_config.json \
    --bert_model=bert-large-uncased \
    --do_train \
    --train_batch_size 1 \
    --gpus_per_node 1
```

### Multi-GPU

hostfile:
```
node1 slots=4
node2 slots=4
```
```
# The scripts/run_squad_1.sh script defaults to four GPUs per node
bash run_squad_1.sh
```

# References

[https://github.com/mlperf/training_results_v0.7/blob/master/NVIDIA/benchmarks/bert/implementations/pytorch](https://github.com/mlperf/training_results_v0.7/blob/master/NVIDIA/benchmarks/bert/implementations/pytorch)

[https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT)