# Test Preparation

## 1. Dataset preparation

GLUE dataset download: https://pan.baidu.com/s/1tLd8opr08Nw5PzUBh7lXsQ (extraction code: fyvy). Classification uses the MNLI dataset from GLUE.

Question-answering data:

- [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
- [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
- [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)

## 2. Environment setup

```
virtualenv -p python3 --system-site-packages venv_2
source venv_2/bin/activate
```

Install the Python dependencies:

```
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
pip install tensorflow-2.7.0-cp36-cp36m-linux_x86_64.whl
pip install horovod-0.21.3-cp36-cp36m-linux_x86_64.whl
pip install apex-0.1-cp36-cp36m-linux_x86_64.whl
```

Set the environment variables:

```
module rm compiler/rocm/2.9
export ROCM_PATH=/public/home/hepj/job_env/apps/dtk-21.10.1
export HIP_PATH=${ROCM_PATH}/hip
export AMDGPU_TARGETS="gfx900;gfx906"
export PATH=${ROCM_PATH}/bin:${ROCM_PATH}/llvm/bin:${ROCM_PATH}/hcc/bin:${ROCM_PATH}/hip/bin:$PATH
```
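Before starting the tests, it may help to confirm that the TensorFlow wheel installed above is importable and that it can see the GPU/DCU devices. This is a minimal sanity check (not part of the original procedure), assuming the virtualenv is activated and the environment variables above have been exported:

```
# print the TensorFlow version and the visible GPU/DCU devices
python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.config.list_physical_devices('GPU'))"
```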
## 3. MNLI classification test

### 3.1 Single-card test (single precision)

#### 3.1.1 Data conversion

TF 2.x reads data differently from TF 1.x, so the raw data must be converted to tf_record format:

```
python ../data/create_finetuning_data.py \
  --input_data_dir=/public/home/hepj/data/MNLI \
  --vocab_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/vocab.txt \
  --train_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
  --eval_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record \
  --meta_data_file_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data \
  --fine_tuning_task_type=classification --max_seq_length=32 \
  --classification_task_name=MNLI
```
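As a quick sanity check on the conversion (an optional step, not part of the original procedure), you can count the examples written to train.tf_record and look at the generated meta data; the paths below are the ones used in the command above:

```
# count the examples in the generated TFRecord file (eager iteration)
python3 -c "
import tensorflow as tf
path = '/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record'
print(sum(1 for _ in tf.data.TFRecordDataset(path)), 'records in', path)
"
# meta_data is expected to be a small text/JSON file describing the task
cat /public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data
```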
#### 3.1.2 Model conversion

TF 2.7.0 and TF 1.15.0 store and load checkpoints in different formats. The BERT checkpoints published officially are generally TF 1.x models, so they must be converted:

```
python3 tf2_encoder_checkpoint_converter.py \
  --bert_config_file /public/home/hepj/model_source/uncased_L-12_H-768_A-12/bert_config.json \
  --checkpoint_to_convert /public/home/hepj/model_source/uncased_L-12_H-768_A-12/bert_model.ckpt \
  --converted_checkpoint_path pre_tf2x/
```

#### 3.1.3 bert_class.sh

```
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
export MIOPEN_ENABLE_LOGGING_CMD=1
export ROCBLAS_LAYER=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_classifier.py \
  --mode=train_and_eval \
  --input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data \
  --train_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
  --eval_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record \
  --bert_config_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_config.json \
  --init_checkpoint=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_model.ckpt \
  --train_batch_size=320 \
  --eval_batch_size=32 \
  --steps_per_loop=1000 \
  --learning_rate=2e-5 \
  --num_train_epochs=3 \
  --model_dir=/public/home/hepj/model/tf2/out1 \
  --distribution_strategy=mirrored
```

#### 3.1.4 Run

```
sh bert_class.sh
```

### 3.2 Four-card test (single precision)

#### 3.2.1 Data conversion

Same as single-card (3.1.1).

#### 3.2.2 Model conversion

Same as single-card (3.1.2).

#### 3.2.3 bert_class4.sh

```
# --train_batch_size here is the global train batch size
# launching multi-card runs via mpirun still has some issues
export HIP_VISIBLE_DEVICES=0,1,2,3
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_classifier.py \
  --mode=train_and_eval \
  --input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data \
  --train_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
  --eval_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record \
  --bert_config_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_config.json \
  --init_checkpoint=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_model.ckpt \
  --train_batch_size=1280 \
  --eval_batch_size=32 \
  --steps_per_loop=10 \
  --learning_rate=2e-5 \
  --num_train_epochs=3 \
  --num_gpus=4 \
  --model_dir=/public/home/hepj/outdir/tf2/class4 \
  --distribution_strategy=mirrored
```

#### 3.2.4 Run

```
sh bert_class4.sh
```

## 4. SQuAD 1.1 question-answering test

### 4.1 Single-card test (single precision)

#### 4.1.1 Data conversion

```
python3 create_finetuning_data.py \
  --squad_data_file=/public/home/hepj/model/model_source/sq1.1/train-v1.1.json \
  --vocab_file=/public/home/hepj/model_source/bert-large-uncased-TF2/uncased_L-24_H-1024_A-16/vocab.txt \
  --train_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train_new.tf_record \
  --meta_data_file_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data_new \
  --eval_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/eval_new.tf_record \
  --fine_tuning_task_type=squad \
  --do_lower_case=False \
  --max_seq_length=384
```

#### 4.1.2 Model conversion

```
python3 tf2_encoder_checkpoint_converter.py \
  --bert_config_file /public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
  --checkpoint_to_convert /public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_model.ckpt \
  --converted_checkpoint_path /public/home/hepj/model_source/bert-large-uncased-TF2/
```

#### 4.1.3 bert_squad.sh

```
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
export MIOPEN_ENABLE_LOGGING_CMD=1
export ROCBLAS_LAYER=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_squad_xuan.py \
  --mode=train_and_eval \
  --vocab_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/vocab.txt \
  --bert_config_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
  --input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data \
  --train_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train.tf_record \
  --predict_file=/public/home/hepj/model/model_source/sq1.1/dev-v1.1.json \
  --init_checkpoint=/public/home/hepj/model_source/bert-large-uncased-TF2/bert_model.ckpt \
  --train_batch_size=4 \
  --predict_batch_size=4 \
  --learning_rate=2e-5 \
  --log_steps=1 \
  --num_gpus=1 \
  --distribution_strategy=mirrored \
  --model_dir=/public/home/hepj/model/tf2/squad1 \
  --run_eagerly=False
```

#### 4.1.4 Run

```
sh bert_squad.sh
```

### 4.2 Four-card test (single precision)

#### 4.2.1 Data conversion

Same as single-card (4.1.1).

#### 4.2.2 Model conversion

Same as single-card (4.1.2).

#### 4.2.3 bert_squad4.sh

```
# --train_batch_size here is the global train batch size
# launching multi-card runs via mpirun still has some issues
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
export HIP_VISIBLE_DEVICES=0,1,2,3
python3 run_squad_xuan.py \
  --mode=train_and_eval \
  --vocab_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/vocab.txt \
  --bert_config_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
  --input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data \
  --train_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train.tf_record \
  --predict_file=/public/home/hepj/model/model_source/sq1.1/dev-v1.1.json \
  --init_checkpoint=/public/home/hepj/model_source/bert-large-uncased-TF2/bert_model.ckpt \
  --train_batch_size=16 \
  --predict_batch_size=4 \
  --learning_rate=2e-5 \
  --log_steps=1 \
  --num_gpus=4 \
  --distribution_strategy=mirrored \
  --model_dir=/public/home/hepj/outdir/tf2/squad4 \
  --run_eagerly=False
```

#### 4.2.4 Run

```
sh bert_squad4.sh
```
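Section 1 downloads evaluate-v1.1.py, and the SQuAD runs above can be scored with it once prediction is finished. The sketch below assumes the predictions are written as predictions.json under --model_dir of the single-card run; the actual file name and location depend on run_squad_xuan.py, so adjust the second argument accordingly:

```
# score the SQuAD predictions with the official evaluation script
# (prediction file name/location are assumed, not confirmed by the run script)
python3 evaluate-v1.1.py \
  /public/home/hepj/model/model_source/sq1.1/dev-v1.1.json \
  /public/home/hepj/model/tf2/squad1/predictions.json
```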