# Pre-test preparation

## 1. Dataset preparation

Download the GLUE dataset: https://pan.baidu.com/s/1tLd8opr08Nw5PzUBh7lXsQ

Extraction code: fyvy

The classification test uses the MNLI dataset from this archive.

Question-answering data:

[train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)

[dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)

[evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)
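
Before converting the data, it can be worth confirming that the downloaded SQuAD files are complete. The sketch below counts articles, paragraphs, and questions, assuming the standard SQuAD v1.1 JSON layout (`data` → `paragraphs` → `qas`). The function name `summarize_squad` and the tiny in-memory sample are illustrative only and are not part of this repository.

```python
import json


def summarize_squad(dataset):
    """Count articles, paragraphs and questions in a parsed SQuAD v1.1 dict."""
    articles = dataset["data"]
    paragraphs = [p for a in articles for p in a["paragraphs"]]
    questions = [q for p in paragraphs for q in p["qas"]]
    return {
        "version": dataset.get("version"),
        "articles": len(articles),
        "paragraphs": len(paragraphs),
        "questions": len(questions),
    }


# Tiny hand-written sample with the same layout as train-v1.1.json.
sample = json.loads("""
{"version": "1.1",
 "data": [{"title": "t",
           "paragraphs": [{"context": "BERT is a model.",
                           "qas": [{"id": "q1",
                                    "question": "What is BERT?",
                                    "answers": [{"text": "a model",
                                                 "answer_start": 8}]}]}]}]}
""")
print(summarize_squad(sample))
```

To check a real file, load it with `json.load(open("train-v1.1.json", encoding="utf-8"))` and pass the result to `summarize_squad`.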

## 2. Environment setup

```
virtualenv -p python3 --system-site-packages venv_2
source venv_2/bin/activate
```

Install the Python dependencies:

```
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
pip install tensorflow-2.7.0-cp36-cp36m-linux_x86_64.whl
pip install horovod-0.21.3-cp36-cp36m-linux_x86_64.whl
pip install apex-0.1-cp36-cp36m-linux_x86_64.whl
```
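
The wheel file names above carry a `cp36` tag, which means they target CPython 3.6. A minimal sketch for checking that the interpreter inside `venv_2` matches that tag before installing; the helper names `wheel_python_tag` and `interpreter_tag` are hypothetical, and the parsing assumes the standard wheel naming scheme in which the last three dash-separated fields are the Python, ABI, and platform tags.

```python
import sys


def wheel_python_tag(wheel_filename):
    """Return the Python tag (e.g. 'cp36') from a wheel file name."""
    stem = wheel_filename
    if stem.endswith(".whl"):
        stem = stem[:-len(".whl")]
    # The last three fields are: python tag, ABI tag, platform tag.
    return stem.split("-")[-3]


def interpreter_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp36'."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


wheels = [
    "tensorflow-2.7.0-cp36-cp36m-linux_x86_64.whl",
    "horovod-0.21.3-cp36-cp36m-linux_x86_64.whl",
    "apex-0.1-cp36-cp36m-linux_x86_64.whl",
]
for name in wheels:
    tag = wheel_python_tag(name)
    status = "ok" if tag == interpreter_tag() else "mismatch"
    print(name, tag, status)
```

A `mismatch` line suggests the virtual environment was created with a different Python version than the wheels were built for.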

Set the environment variables:

```
module rm compiler/rocm/2.9
export ROCM_PATH=/public/home/hepj/job_env/apps/dtk-21.10.1
export HIP_PATH=${ROCM_PATH}/hip
export AMDGPU_TARGETS="gfx900;gfx906"
export PATH=${ROCM_PATH}/bin:${ROCM_PATH}/llvm/bin:${ROCM_PATH}/hcc/bin:${ROCM_PATH}/hip/bin:$PATH
```
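
If later steps fail to find the ROCm/DTK tools, a quick check of the exported variables may help. The following sketch reports which of the directories from the `PATH` export above are missing; the function name `rocm_path_problems` is hypothetical, and the check only compares strings without verifying that the directories exist on disk.

```python
import os


def rocm_path_problems(env):
    """Return a list of problems with ROCM_PATH / PATH in the given env mapping."""
    rocm = env.get("ROCM_PATH", "")
    if not rocm:
        return ["ROCM_PATH is not set"]
    # Sub-directories taken from the PATH export in this section.
    expected = [os.path.join(rocm, sub)
                for sub in ("bin", "llvm/bin", "hcc/bin", "hip/bin")]
    on_path = env.get("PATH", "").split(os.pathsep)
    return ["not on PATH: " + d for d in expected if d not in on_path]


problems = rocm_path_problems(dict(os.environ))
print(problems if problems else "PATH is consistent with ROCM_PATH")
```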

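## 3. MNLI classification test

### 3.1 Single-GPU test (single precision)

#### 3.1.1 Data conversion

TF 2.x reads input data differently from TF 1.x, so the data must first be converted to the tf_record format.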

```
python ../data/create_finetuning_data.py \
 --input_data_dir=/public/home/hepj/data/MNLI \
 --vocab_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/vocab.txt \
 --train_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
 --eval_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record \
 --meta_data_file_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data \
 --fine_tuning_task_type=classification \
 --max_seq_length=32 \
 --classification_task_name=MNLI
```

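#### 3.1.2 Model conversion

TF 2.7.0 and TF 1.15.0 store and load models in different formats. The officially published BERT checkpoints are generally TF 1.x models, so they must be converted before use.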

```
python3 tf2_encoder_checkpoint_converter.py \
--bert_config_file /public/home/hepj/model_source/uncased_L-12_H-768_A-12/bert_config.json \
--checkpoint_to_convert /public/home/hepj/model_source/uncased_L-12_H-768_A-12/bert_model.ckpt \
--converted_checkpoint_path pre_tf2x/
```
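
A TensorFlow checkpoint is addressed by a prefix rather than by a single file: a prefix such as `bert_model.ckpt` corresponds to a `bert_model.ckpt.index` file plus one or more `bert_model.ckpt.data-*` shards. The sketch below checks whether a given prefix is usable and lists the prefixes found in a directory. Both helper names are hypothetical, and the exact prefix the converter writes may differ (for example, a save counter may be appended), so listing the output directory is the safer check.

```python
import glob
import os


def checkpoint_prefix_exists(prefix):
    """True if the prefix has both an .index file and at least one data shard."""
    has_index = os.path.isfile(prefix + ".index")
    has_data = bool(glob.glob(glob.escape(prefix) + ".data-*"))
    return has_index and has_data


def list_checkpoint_prefixes(directory):
    """Return checkpoint prefixes (file names without '.index') in a directory."""
    if not os.path.isdir(directory):
        return []
    return sorted(name[:-len(".index")]
                  for name in os.listdir(directory)
                  if name.endswith(".index"))


# Prefix used as an example only; adjust it to the converter's output location.
prefix = "pre_tf2x/bert_model.ckpt"
print(prefix, checkpoint_prefix_exists(prefix))
print(list_checkpoint_prefixes("pre_tf2x"))
```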

#### 3.1.3 bert_class.sh

```
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
export MIOPEN_ENABLE_LOGGING_CMD=1
export ROCBLAS_LAYER=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_classifier.py \
  --mode=train_and_eval \
  --input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data \
  --train_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
  --eval_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record \
  --bert_config_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_config.json \
  --init_checkpoint=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_model.ckpt \
  --train_batch_size=320 \
  --eval_batch_size=32 \
  --steps_per_loop=1000 \
  --learning_rate=2e-5 \
  --num_train_epochs=3 \
  --model_dir=/public/home/hepj/model/tf2/out1 \
  --distribution_strategy=mirrored
```
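
The value of `--train_batch_size` determines how many training steps one epoch takes. A rough sketch of that arithmetic, assuming the meta data file written during data conversion is JSON with a `train_data_size` field (an assumption about the file layout, worth confirming by opening the file). The value 392702 is the number of examples in the MNLI training set, and the function name is illustrative.

```python
import json


def steps_per_epoch(meta, train_batch_size):
    """Number of full training steps per epoch for a given global batch size."""
    return meta["train_data_size"] // train_batch_size


# Assumed meta data content, written inline instead of read from disk.
meta = json.loads('{"train_data_size": 392702, "max_seq_length": 32}')
print(steps_per_epoch(meta, 320))
```

With the batch size of 320 used in `bert_class.sh`, this gives 1227 steps per epoch.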

#### 3.1.4 Run

sh bert_class.sh

### 3.2 Four-GPU test (single precision)

#### 3.2.1 Data conversion

Same as the single-GPU test (3.1.1).

#### 3.2.2 Model conversion

Same as the single-GPU test (3.1.2).

#### 3.2.3 bert_class4.sh

```
# --train_batch_size here is the global train_batch_size
# Launching multi-GPU runs with mpirun has some problems
export HIP_VISIBLE_DEVICES=0,1,2,3
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_classifier.py \
  --mode=train_and_eval \
  --input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data  \
  --train_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
  --eval_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record  \
  --bert_config_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_config.json \
  --init_checkpoint=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_model.ckpt \
  --train_batch_size=1280 \
  --eval_batch_size=32 \
  --steps_per_loop=10 \
  --learning_rate=2e-5 \
  --num_train_epochs=3 \
  --num_gpus=4 \
  --model_dir=/public/home/hepj/outdir/tf2/class4 \
  --distribution_strategy=mirrored
```
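
Because `--train_batch_size` is the global batch size, each GPU processes only a share of it. A small sketch of the relationship, using the values from `bert_class.sh` and `bert_class4.sh`; the helper name `per_replica_batch_size` is hypothetical, and the even split is the usual behavior of a mirrored strategy rather than something this repository documents.

```python
def per_replica_batch_size(global_batch_size, num_gpus):
    """Split a global batch size evenly across GPUs."""
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size must be divisible by num_gpus")
    return global_batch_size // num_gpus


for script, global_bs, gpus in [("bert_class.sh", 320, 1),
                                ("bert_class4.sh", 1280, 4)]:
    print(script, per_replica_batch_size(global_bs, gpus))
```

Both scripts end up with 320 examples per GPU per step, so the four-GPU run keeps the per-device workload of the single-GPU run.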

#### 3.2.4 Run

```
sh bert_class4.sh
```


## 4. SQuAD 1.1 question-answering test

### 4.1 Single-GPU test (single precision)

#### 4.1.1 Data conversion

```
python3 create_finetuning_data.py \
 --squad_data_file=/public/home/hepj/model/model_source/sq1.1/train-v1.1.json \
 --vocab_file=/public/home/hepj/model_source/bert-large-uncased-TF2/uncased_L-24_H-1024_A-16/vocab.txt \
 --train_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train_new.tf_record \
 --meta_data_file_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data_new \
 --eval_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/eval_new.tf_record \
 --fine_tuning_task_type=squad \
 --do_lower_case=False \
 --max_seq_length=384
```

#### 4.1.2 Model conversion

```
python3 tf2_encoder_checkpoint_converter.py \
--bert_config_file /public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
--checkpoint_to_convert /public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_model.ckpt \
--converted_checkpoint_path /public/home/hepj/model_source/bert-large-uncased-TF2/
```

#### 4.1.3 bert_squad.sh

```
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
export MIOPEN_ENABLE_LOGGING_CMD=1
export ROCBLAS_LAYER=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_squad_xuan.py \
--mode=train_and_eval \
--vocab_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/vocab.txt \
--bert_config_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
--input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data \
--train_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train.tf_record \
--predict_file=/public/home/hepj/model/model_source/sq1.1/dev-v1.1.json \
--init_checkpoint=/public/home/hepj/model_source/bert-large-uncased-TF2/bert_model.ckpt \
--train_batch_size=4 \
--predict_batch_size=4 \
--learning_rate=2e-5 \
--log_steps=1 \
--num_gpus=1 \
--distribution_strategy=mirrored \
--model_dir=/public/home/hepj/model/tf2/squad1 \
--run_eagerly=False
```

#### 4.1.4 Run

```
sh bert_squad.sh
```

### 4.2 Four-GPU test (single precision)

#### 4.2.1 Data conversion

Same as the single-GPU test (4.1.1).

#### 4.2.2 Model conversion

Same as the single-GPU test (4.1.2).

#### 4.2.3 bert_squad4.sh

```
# --train_batch_size here is the global train_batch_size
# Launching multi-GPU runs with mpirun has some problems
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
export HIP_VISIBLE_DEVICES=0,1,2,3
python3 run_squad_xuan.py \
  --mode=train_and_eval \
  --vocab_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/vocab.txt \
  --bert_config_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
  --input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data \
  --train_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train.tf_record \
  --predict_file=/public/home/hepj/model/model_source/sq1.1/dev-v1.1.json \
  --init_checkpoint=/public/home/hepj/model_source/bert-large-uncased-TF2/bert_model.ckpt \
  --train_batch_size=16 \
  --predict_batch_size=4 \
  --learning_rate=2e-5 \
  --log_steps=1 \
  --num_gpus=4 \
  --distribution_strategy=mirrored \
  --model_dir=/public/home/hepj/outdir/tf2/squad4 \
  --run_eagerly=False
```

#### 4.2.4 Run

```
sh bert_squad4.sh
```
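
Predictions from the SQuAD runs can be scored with the `evaluate-v1.1.py` script linked in section 1, which reports exact match and F1. The sketch below illustrates only the exact-match idea (normalize both strings, then compare). It follows the normalization steps of the official SQuAD v1.1 evaluation as understood here, omits F1, and is meant for illustration rather than as a substitute for the script.

```python
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truths):
    """1.0 if the normalized prediction equals any normalized reference answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(gt) for gt in ground_truths))


print(exact_match("The Denver Broncos!", ["Denver Broncos"]))
```

The example counts as a match because casing, the leading article, and the trailing punctuation are all removed before comparison.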