Commit d3910de2 authored by sunxx1

Merge branch 'hepj-test' into 'main'

Update the README for the TF2 BERT model

See merge request dcutoolkit/deeplearing/dlexamples_new!61
parents 557ae9c4 d0ec3908
# Preparation Before Testing

## 1. Dataset Preparation

Download the GLUE datasets from https://pan.baidu.com/s/1tLd8opr08Nw5PzUBh7lXsQ (extraction code: fyvy).
The classification test uses the MNLI dataset from this collection.

Question answering data:

* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)

## 2. Environment Setup

```
virtualenv -p python3 --system-site-packages venv_2
source venv_2/bin/activate
```

Install the Python dependencies:

```
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
pip install tensorflow-2.7.0-cp36-cp36m-linux_x86_64.whl
pip install horovod-0.21.3-cp36-cp36m-linux_x86_64.whl
pip install apex-0.1-cp36-cp36m-linux_x86_64.whl
```

Set the environment variables:

```
module rm compiler/rocm/2.9
export ROCM_PATH=/public/home/hepj/job_env/apps/dtk-21.10.1
export HIP_PATH=${ROCM_PATH}/hip
export AMDGPU_TARGETS="gfx900;gfx906"
export PATH=${ROCM_PATH}/bin:${ROCM_PATH}/llvm/bin:${ROCM_PATH}/hcc/bin:${ROCM_PATH}/hip/bin:$PATH
```
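After sourcing these variables, a quick sanity check that the installed TensorFlow wheel works and that the DCU devices are visible can look like the following minimal sketch (the device count depends on `HIP_VISIBLE_DEVICES` and your node allocation):

```python
# Minimal sketch: verify the TensorFlow install and the visible accelerator devices.
import tensorflow as tf

print(tf.__version__)                           # expected to match the installed 2.7.0 wheel
print(tf.config.list_physical_devices("GPU"))   # DCU cards are reported as GPU devices
```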
## 3. MNLI Classification Test

### 3.1 Single-card Test (Single Precision)

#### 3.1.1 Data Conversion

TF 2.0 reads data differently from TF 1.0, so the raw data must first be converted to the tf_record format:

```
python ../data/create_finetuning_data.py \
--input_data_dir=/public/home/hepj/data/MNLI \
--vocab_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/vocab.txt \
--train_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
--eval_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record \
--meta_data_file_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data \
--fine_tuning_task_type=classification \
--max_seq_length=32 \
--classification_task_name=MNLI
```
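Before training, it can be useful to sanity-check the conversion output by counting the serialized examples and reading the accompanying meta data. The sketch below assumes the `meta_data` file is plain JSON, which is how `create_finetuning_data.py` typically writes it:

```python
# Sketch: inspect the converted MNLI data (paths match the command above).
import json
import tensorflow as tf

train_path = "/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record"
meta_path = "/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data"

num_examples = sum(1 for _ in tf.data.TFRecordDataset(train_path))
print("training examples:", num_examples)

with open(meta_path) as f:
    print(json.load(f))  # e.g. task type, max sequence length, label count (field names may vary)
```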
#### 3.1.2 Model Conversion

TF 2.7.2 and TF 1.15.0 store and load model checkpoints in different formats. The official BERT checkpoints are generally TF 1.0 models, so they must be converted:

```
python3 tf2_encoder_checkpoint_converter.py \
--bert_config_file /public/home/hepj/model_source/uncased_L-12_H-768_A-12/bert_config.json \
--checkpoint_to_convert /public/home/hepj/model_source/uncased_L-12_H-768_A-12/bert_model.ckpt \
--converted_checkpoint_path pre_tf2x/
```
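To confirm that the converter produced a usable TF 2.x checkpoint, you can list a few of its variables. This is only a sketch; the checkpoint prefix below matches the `--init_checkpoint` used in `bert_class.sh`:

```python
# Sketch: list a few variables from the converted TF2 checkpoint.
import tensorflow as tf

ckpt_prefix = "pre_tf2x/bert_model.ckpt"  # same prefix as --init_checkpoint in bert_class.sh
for name, shape in tf.train.list_variables(ckpt_prefix)[:10]:
    print(name, shape)
```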
#### 3.1.3 bert_class.sh

```
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
export MIOPEN_ENABLE_LOGGING_CMD=1
export ROCBLAS_LAYER=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_classifier.py \
--mode=train_and_eval \
--input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data \
--train_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
--eval_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record \
--bert_config_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_config.json \
--init_checkpoint=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_model.ckpt \
--train_batch_size=320 \
--eval_batch_size=32 \
--steps_per_loop=1000 \
--learning_rate=2e-5 \
--num_train_epochs=3 \
--model_dir=/public/home/hepj/model/tf2/out1 \
--distribution_strategy=mirrored
```
#### 3.1.4 Run

```
sh bert_class.sh
```

### 3.2 Four-card Test (Single Precision)

#### 3.2.1 Data Conversion

Same as the single-card test (3.1.1).

#### 3.2.2 Model Conversion

Same as the single-card test (3.1.2).

#### 3.2.3 bert_class4.sh

```
# --train_batch_size here is the global train_batch_size.
# There are still some issues when launching multiple cards via mpirun.
export HIP_VISIBLE_DEVICES=0,1,2,3
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_classifier.py \
--mode=train_and_eval \
--input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data \
--train_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
--eval_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record \
--bert_config_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_config.json \
--init_checkpoint=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_model.ckpt \
--train_batch_size=1280 \
--eval_batch_size=32 \
--steps_per_loop=10 \
--learning_rate=2e-5 \
--num_train_epochs=3 \
--num_gpus=4 \
--model_dir=/public/home/hepj/outdir/tf2/class4 \
--distribution_strategy=mirrored
```
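For reference, with `--distribution_strategy=mirrored` the `--train_batch_size` above is the global batch size, so each card processes an equal share of it. A small sketch of the arithmetic (assuming an even split across replicas):

```python
# Global vs. per-replica batch size under MirroredStrategy (assumed even split).
global_batch_size = 1280   # --train_batch_size in bert_class4.sh
num_replicas = 4           # --num_gpus
per_replica_batch_size = global_batch_size // num_replicas
print(per_replica_batch_size)  # 320, the same as the single-card run in 3.1.3
```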
#### 3.2.4 Run

```
sh bert_class4.sh
```
## 4. SQuAD 1.1 Question Answering Test

### 4.1 Single-card Test (Single Precision)

#### 4.1.1 Data Conversion

```
python3 create_finetuning_data.py \
--squad_data_file=/public/home/hepj/model/model_source/sq1.1/train-v1.1.json \
--vocab_file=/public/home/hepj/model_source/bert-large-uncased-TF2/uncased_L-24_H-1024_A-16/vocab.txt \
--train_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train_new.tf_record \
--meta_data_file_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data_new \
--eval_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/eval_new.tf_record \
--fine_tuning_task_type=squad \
--do_lower_case=False \
--max_seq_length=384
```
#### 4.1.2 Model Conversion

```
python3 tf2_encoder_checkpoint_converter.py \
--bert_config_file /public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
--checkpoint_to_convert /public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_model.ckpt \
--converted_checkpoint_path /public/home/hepj/model_source/bert-large-uncased-TF2/
```

#### 4.1.3 bert_squad.sh

```
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
export MIOPEN_ENABLE_LOGGING_CMD=1
export ROCBLAS_LAYER=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_squad_xuan.py \
--mode=train_and_eval \
--vocab_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/vocab.txt \
--bert_config_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
--input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data \
--train_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train.tf_record \
--predict_file=/public/home/hepj/model/model_source/sq1.1/dev-v1.1.json \
--init_checkpoint=/public/home/hepj/model_source/bert-large-uncased-TF2/bert_model.ckpt \
--train_batch_size=4 \
--predict_batch_size=4 \
--learning_rate=2e-5 \
--log_steps=1 \
--num_gpus=1 \
--distribution_strategy=mirrored \
--model_dir=/public/home/hepj/model/tf2/squad1 \
--run_eagerly=False
```
#### 4.1.4 Run

```
sh bert_squad.sh
```

### 4.2 Four-card Test (Single Precision)

#### 4.2.1 Data Conversion

Same as the single-card test (4.1.1).

#### 4.2.2 Model Conversion

Same as the single-card test (4.1.2).

#### 4.2.3 bert_squad4.sh

```
# --train_batch_size here is the global train_batch_size.
# There are still some issues when launching multiple cards via mpirun.
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
export HIP_VISIBLE_DEVICES=0,1,2,3
python3 run_squad_xuan.py \
--mode=train_and_eval \
--vocab_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/vocab.txt \
--bert_config_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
--input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data \
--train_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train.tf_record \
--predict_file=/public/home/hepj/model/model_source/sq1.1/dev-v1.1.json \
--init_checkpoint=/public/home/hepj/model_source/bert-large-uncased-TF2/bert_model.ckpt \
--train_batch_size=16 \
--predict_batch_size=4 \
--learning_rate=2e-5 \
--log_steps=1 \
--num_gpus=4 \
--distribution_strategy=mirrored \
--model_dir=/public/home/hepj/outdir/tf2/squad4 \
--run_eagerly=False
```

#### 4.2.4 Run

```
sh bert_squad4.sh
```
# BERT (Bidirectional Encoder Representations from Transformers)
The academic paper which describes BERT in detail and provides full results on a
number of tasks can be found here: https://arxiv.org/abs/1810.04805.
This repository contains a TensorFlow 2.x implementation of BERT.
## Contents
* [Contents](#contents)
* [Pre-trained Models](#pre-trained-models)
* [Restoring from Checkpoints](#restoring-from-checkpoints)
* [Set Up](#set-up)
* [Process Datasets](#process-datasets)
* [Fine-tuning with BERT](#fine-tuning-with-bert)
* [Cloud GPUs and TPUs](#cloud-gpus-and-tpus)
* [Sentence and Sentence-pair Classification Tasks](#sentence-and-sentence-pair-classification-tasks)
* [SQuAD 1.1](#squad-1.1)
## Pre-trained Models
We released both checkpoints and tf.hub modules as the pretrained models for
fine-tuning. They are TF 2.x compatible and were converted from the checkpoints
released in the TF 1.x official BERT repository,
[google-research/bert](https://github.com/google-research/bert),
in order to stay consistent with the BERT paper.
### Access to Pretrained Checkpoints
Pretrained checkpoints can be found in the following links:
**Note: We have switched the BERT implementation
to use Keras functional-style networks in [nlp/modeling](../modeling).
The new checkpoints are:**
* **[`BERT-Large, Uncased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_uncased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Large, Cased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_cased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12.tar.gz)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-12_H-768_A-12.tar.gz)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
We recommend hosting checkpoints on Google Cloud Storage buckets when using
Cloud GPU/TPU.
### Restoring from Checkpoints
`tf.train.Checkpoint` is used to manage model checkpoints in TF 2. To restore
weights from provided pre-trained checkpoints, you can use the following code:
```python
import tensorflow as tf

init_checkpoint = 'the pretrained model checkpoint path.'
model = tf.keras.Model()  # BERT pre-trained model used as a feature extractor.
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.restore(init_checkpoint)
```
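The value returned by `restore()` is a status object that can be used to check what was actually matched. For example (a small sketch using the standard `tf.train.Checkpoint` status API):

```python
status = checkpoint.restore(init_checkpoint)
status.expect_partial()  # silence warnings about variables that are intentionally not restored
# or, to be strict about the model variables:
# status.assert_existing_objects_matched()
```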
Checkpoints featuring native serialized Keras models
(i.e. model.load()/load_weights()) will be available soon.
### Access to Pretrained hub modules
Pretrained tf.hub modules in TF 2.x SavedModel format can be found in the
following links:
* **[`BERT-Large, Uncased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/1)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Large, Cased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/1)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/1)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/1)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/1)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Multilingual Cased`](https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/1)**:
104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Base, Chinese`](https://tfhub.dev/tensorflow/bert_zh_L-12_H-768_A-12/1)**:
Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads,
110M parameters
## Set Up
```shell
export PYTHONPATH="$PYTHONPATH:/path/to/models"
```
Install `tf-nightly` to get the latest updates:
```shell
pip install tf-nightly-gpu
```
When using a TPU, GPU support is not necessary. First, you need to create a `tf-nightly`
TPU with the [ctpu tool](https://github.com/tensorflow/tpu/tree/master/tools/ctpu):
```shell
ctpu up -name <instance name> --tf-version="nightly"
```
Second, you need to install TF 2 `tf-nightly` on your VM:
```shell
pip install tf-nightly
```
## Process Datasets
### Pre-training
Generating pre-training data is unchanged. Please use the script
[`../data/create_pretraining_data.py`](../data/create_pretraining_data.py),
which is essentially branched from the [BERT research repo](https://github.com/google-research/bert)
and adapted to TF2 symbols and Python 3 compatibility, to produce the processed
pre-training data.
### Fine-tuning
To prepare the fine-tuning data for final model training, use the
[`../data/create_finetuning_data.py`](../data/create_finetuning_data.py) script.
The resulting datasets in `tf_record` format and the training meta data are later
passed to the training or evaluation scripts. The task-specific arguments are
described in the following sections (a short sketch for inspecting the generated
records follows the SQuAD example):
* GLUE
Users can download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.
Also, users can download a [pretrained checkpoint](#access-to-pretrained-checkpoints) and place it in some directory `$BERT_DIR` instead of using the checkpoints on Google Cloud Storage.
```shell
export GLUE_DIR=~/glue
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export TASK_NAME=MNLI
export OUTPUT_DIR=gs://some_bucket/datasets
python ../data/create_finetuning_data.py \
--input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \
--vocab_file=${BERT_DIR}/vocab.txt \
--train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
--eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
--meta_data_file_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
--fine_tuning_task_type=classification --max_seq_length=128 \
--classification_task_name=${TASK_NAME}
```
* SQUAD
The [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/) contains
detailed information about the SQuAD datasets and evaluation.
The necessary files can be found here:
* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)
* [train-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json)
* [dev-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json)
* [evaluate-v2.0.py](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/)
```shell
export SQUAD_DIR=~/squad
export SQUAD_VERSION=v1.1
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export OUTPUT_DIR=gs://some_bucket/datasets
python ../data/create_finetuning_data.py \
--squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
--vocab_file=${BERT_DIR}/vocab.txt \
--train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
--meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
--fine_tuning_task_type=squad --max_seq_length=384
```
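If you want to verify what the generated fine-tuning records contain, you can decode one serialized example. The feature names below (`input_ids`, `input_mask`, `segment_ids`, `label_ids`) are the ones conventionally written by the BERT classification pipeline and may differ between versions, so treat this as a sketch for the GLUE records created above:

```python
# Sketch: decode one example from a tf_record produced by create_finetuning_data.py.
import tensorflow as tf

record_path = "datasets/MNLI_train.tf_record"  # adjust to your --train_data_output_path
max_seq_length = 128                           # must match --max_seq_length used above

# Feature names conventionally used for BERT classification data (may vary by version).
name_to_features = {
    "input_ids": tf.io.FixedLenFeature([max_seq_length], tf.int64),
    "input_mask": tf.io.FixedLenFeature([max_seq_length], tf.int64),
    "segment_ids": tf.io.FixedLenFeature([max_seq_length], tf.int64),
    "label_ids": tf.io.FixedLenFeature([], tf.int64),
}

for raw_record in tf.data.TFRecordDataset(record_path).take(1):
    example = tf.io.parse_single_example(raw_record, name_to_features)
    print({name: tensor.shape for name, tensor in example.items()})
```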
## Fine-tuning with BERT
### Cloud GPUs and TPUs
* Cloud Storage
The unzipped pre-trained model files can also be found in the Google Cloud
Storage folder `gs://cloud-tpu-checkpoints/bert/keras_bert`. For example:
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export MODEL_DIR=gs://some_bucket/my_output_dir
```
Currently, users are able to access `tf-nightly` TPUs, and the following TPU
script should run with `tf-nightly`.
* GPU -> TPU
Just add the following flags to `run_classifier.py` or `run_squad.py`:
```shell
--distribution_strategy=tpu
--tpu=grpc://${TPU_IP_ADDRESS}:8470
```
### Sentence and Sentence-pair Classification Tasks
This example code fine-tunes `BERT-Large` on the Microsoft Research Paraphrase
Corpus (MRPC), which contains only 3,600 examples and can be fine-tuned in a
few minutes on most GPUs.
We use the `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the
workflow.
For GPU memory of 16GB or smaller, you may try to use `BERT-Base`
(uncased_L-12_H-768_A-12).
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export MODEL_DIR=gs://some_bucket/my_output_dir
export GLUE_DIR=gs://some_bucket/datasets
export TASK=MRPC
python run_classifier.py \
--mode='train_and_eval' \
--input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
--train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \
--eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
--bert_config_file=${BERT_DIR}/bert_config.json \
--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
--train_batch_size=4 \
--eval_batch_size=4 \
--steps_per_loop=1 \
--learning_rate=2e-5 \
--num_train_epochs=3 \
--model_dir=${MODEL_DIR} \
--distribution_strategy=mirrored
```
Alternatively, instead of specifying `init_checkpoint`, you can specify
`hub_module_url` to employ a pretrained BERT hub module, e.g.,
` --hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/1`.
After training a model, to get predictions from the classifier, you can set
`--mode=predict` and pass the test set tfrecords to `--eval_data_path`.
Output will be created in a file called test_results.tsv in the output folder.
Each line contains the output for one sample; the columns are the class
probabilities.
```shell
python run_classifier.py \
--mode='predict' \
--input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
--eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
--bert_config_file=${BERT_DIR}/bert_config.json \
--eval_batch_size=4 \
--model_dir=${MODEL_DIR} \
--distribution_strategy=mirrored
```
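To turn those per-class probabilities into predicted label indices, you can read the TSV and take the argmax of each row. A minimal sketch, assuming one tab-separated row of probabilities per test example:

```python
# Sketch: convert test_results.tsv rows of class probabilities into predicted label indices.
import csv

predicted_labels = []
with open("test_results.tsv") as f:
    for row in csv.reader(f, delimiter="\t"):
        probs = [float(p) for p in row]
        predicted_labels.append(max(range(len(probs)), key=probs.__getitem__))  # argmax

print(predicted_labels[:10])
```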
To use a TPU, you only need to switch the distribution strategy type to `tpu`, provide
the TPU information, and use remote storage for model checkpoints.
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export TPU_IP_ADDRESS='???'
export MODEL_DIR=gs://some_bucket/my_output_dir
export GLUE_DIR=gs://some_bucket/datasets
export TASK=MRPC
python run_classifier.py \
--mode='train_and_eval' \
--input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
--train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \
--eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
--bert_config_file=${BERT_DIR}/bert_config.json \
--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
--train_batch_size=32 \
--eval_batch_size=32 \
--steps_per_loop=1000 \
--learning_rate=2e-5 \
--num_train_epochs=3 \
--model_dir=${MODEL_DIR} \
--distribution_strategy=tpu \
--tpu=grpc://${TPU_IP_ADDRESS}:8470
```
Note that we specify `steps_per_loop=1000` for TPU, because running a loop of
training steps inside a `tf.function` can significantly increase TPU utilization;
callbacks will not be called inside the loop.
### SQuAD 1.1
The Stanford Question Answering Dataset (SQuAD) is a popular question answering
benchmark dataset. See more in [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/).
We use the `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the
workflow.
For GPU memory of 16GB or smaller, you may try to use `BERT-Base`
(uncased_L-12_H-768_A-12).
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export SQUAD_DIR=gs://some_bucket/datasets
export MODEL_DIR=gs://some_bucket/my_output_dir
export SQUAD_VERSION=v1.1
python run_squad.py \
--input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \
--train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
--predict_file=${SQUAD_DIR}/dev-v1.1.json \
--vocab_file=${BERT_DIR}/vocab.txt \
--bert_config_file=${BERT_DIR}/bert_config.json \
--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
--train_batch_size=4 \
--predict_batch_size=4 \
--learning_rate=8e-5 \
--num_train_epochs=2 \
--model_dir=${MODEL_DIR} \
--distribution_strategy=mirrored
```
Similarly, you can replace the `init_checkpoint` flag with `hub_module_url` to
specify a hub module path.
`run_squad.py` writes the predictions for `--predict_file` by default. If you set
`--mode=predict` and provide the SQuAD test data, the script will generate
the prediction json file.
To use a TPU, you need to switch the distribution strategy type to `tpu` and provide
the TPU information.
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export TPU_IP_ADDRESS='???'
export MODEL_DIR=gs://some_bucket/my_output_dir
export SQUAD_DIR=gs://some_bucket/datasets
export SQUAD_VERSION=v1.1
python run_squad.py \
--input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \
--train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
--predict_file=${SQUAD_DIR}/dev-v1.1.json \
--vocab_file=${BERT_DIR}/vocab.txt \
--bert_config_file=${BERT_DIR}/bert_config.json \
--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
--train_batch_size=32 \
--learning_rate=8e-5 \
--num_train_epochs=2 \
--model_dir=${MODEL_DIR} \
--distribution_strategy=tpu \
--tpu=grpc://${TPU_IP_ADDRESS}:8470
```
The dev set predictions will be saved to a file called predictions.json in the
`model_dir`:
```shell
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ./squad/predictions.json
```
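To spot-check the output before running the official evaluation script, you can load the file directly; a minimal sketch, assuming the standard SQuAD prediction format of a question-id-to-answer-text mapping:

```python
# Sketch: print a few (question id, predicted answer) pairs from predictions.json.
import itertools
import json

with open("./squad/predictions.json") as f:
    predictions = json.load(f)

for qid, answer in itertools.islice(predictions.items(), 5):
    print(qid, "->", answer)
```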