"tools/vscode:/vscode.git/clone" did not exist on "89cecc5e1a62c3cc18d68a9a0f13035b19e35ea7"
Commit d0ec3908 authored by hepj987

Modify the TF2-framework BERT model README

parent 0a159036
# Preparation Before Testing
## 1. Dataset Preparation
Download the GLUE dataset from https://pan.baidu.com/s/1tLd8opr08Nw5PzUBh7lXsQ (extraction code: fyvy).
The MNLI dataset from GLUE is used for the classification test.
Question-answering data:
* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)
## 2. Environment Setup
Create and activate a Python virtual environment:
```
virtualenv -p python3 --system-site-packages venv_2
source venv_2/bin/activate
```
Install the Python dependencies:
```
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
pip install tensorflow-2.7.0-cp36-cp36m-linux_x86_64.whl
pip install horovod-0.21.3-cp36-cp36m-linux_x86_64.whl
pip install apex-0.1-cp36-cp36m-linux_x86_64.whl
```
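After the wheels are installed, it can be worth a quick sanity check that TensorFlow imports and sees the accelerator devices (a minimal check added here for convenience, not part of the original instructions):

```python
# Minimal environment sanity check (illustrative, not from the original README).
import tensorflow as tf

print(tf.__version__)                          # expect 2.7.0 from the wheel above
print(tf.config.list_physical_devices("GPU"))  # the visible GPU/DCU devices
```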
Set the environment variables:
```shell
export PYTHONPATH="$PYTHONPATH:/path/to/models"
module rm compiler/rocm/2.9
export ROCM_PATH=/public/home/hepj/job_env/apps/dtk-21.10.1
export HIP_PATH=${ROCM_PATH}/hip
export AMDGPU_TARGETS="gfx900;gfx906"
export PATH=${ROCM_PATH}/bin:${ROCM_PATH}/llvm/bin:${ROCM_PATH}/hcc/bin:${ROCM_PATH}/hip/bin:$PATH
```
## 3. MNLI Classification Test
### 3.1 Single-card test (single precision)
#### 3.1.1 Data conversion
TF 2.x reads input data differently from TF 1.x, so the raw MNLI data must first be converted to tf_record format:
```
python ../data/create_finetuning_data.py \
--input_data_dir=/public/home/hepj/data/MNLI \
--vocab_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/vocab.txt \
--train_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
--eval_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record \
--meta_data_file_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data \
--fine_tuning_task_type=classification \
--max_seq_length=32 \
--classification_task_name=MNLI
```
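To confirm that the conversion produced usable files, a short sketch like the one below can count the serialized examples and print the generated meta data (paths are the ones used above; this helper is not part of the repository):

```python
# Illustrative check of the converted MNLI data (assumed paths from the command above).
import json
import tensorflow as tf

train_path = "/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record"
meta_path = "/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data"

# Count the serialized tf.Example records in the training file.
num_examples = sum(1 for _ in tf.data.TFRecordDataset(train_path))
print("train examples:", num_examples)

# The meta data file is JSON (fields such as train_data_size and max_seq_length).
with open(meta_path) as f:
    print(json.dumps(json.load(f), indent=2))
```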
#### 3.1.2 Model conversion
The checkpoint formats of TF 2.7.0 and TF 1.15.0 differ. The official BERT checkpoints are generally TF 1.x models, so they must be converted before use:
```
python3 tf2_encoder_checkpoint_converter.py \
--bert_config_file /public/home/hepj/model_source/uncased_L-12_H-768_A-12/bert_config.json \
--checkpoint_to_convert /public/home/hepj/model_source/uncased_L-12_H-768_A-12/bert_model.ckpt \
--converted_checkpoint_path pre_tf2x/
```
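To verify the conversion, you can list the variables stored in the converted TF2 checkpoint (an illustrative snippet, not part of the repository; the directory is the `--converted_checkpoint_path` used above):

```python
# Illustrative inspection of the converted checkpoint.
import tensorflow as tf

ckpt_dir = "pre_tf2x/"  # --converted_checkpoint_path from the command above
ckpt = tf.train.latest_checkpoint(ckpt_dir) or ckpt_dir + "bert_model.ckpt"
print("checkpoint prefix:", ckpt)
# Print the first few variable names and shapes stored in the checkpoint.
for name, shape in tf.train.list_variables(ckpt)[:10]:
    print(name, shape)
```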
#### 3.1.3 bert_class.sh
```
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
export MIOPEN_ENABLE_LOGGING_CMD=1
export ROCBLAS_LAYER=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_classifier.py \
--mode=train_and_eval \
--input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data \
--train_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
--eval_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record \
--bert_config_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_config.json \
--init_checkpoint=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_model.ckpt \
--train_batch_size=320 \
--eval_batch_size=32 \
--steps_per_loop=1000 \
--learning_rate=2e-5 \
--num_train_epochs=3 \
--model_dir=/public/home/hepj/model/tf2/out1 \
--distribution_strategy=mirrored
```
#### 3.1.4 Run
```
sh bert_class.sh
```
### 3.2 Four-card test (single precision)
#### 3.2.1 Data conversion
Same as the single-card test (3.1.1).
#### 3.2.2 Model conversion
Same as the single-card test (3.1.2).
#### 3.2.3 bert_class4.sh
```
# --train_batch_size here is the global batch size; with 4 cards under the
# mirrored strategy each card receives 1280 / 4 = 320 examples per step.
# Launching multi-card training via mpirun still has some issues.
export HIP_VISIBLE_DEVICES=0,1,2,3
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_classifier.py \
--mode=train_and_eval \
--input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/meta_data \
--train_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/train.tf_record \
--eval_data_path=/public/home/hepj/model/tf2.7.0_Bert/MNLI/eval.tf_record \
--bert_config_file=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_config.json \
--init_checkpoint=/public/home/hepj/model/tf2.7.0_Bert/pre_tf2x/bert_model.ckpt \
--train_batch_size=1280 \
--eval_batch_size=32 \
--steps_per_loop=10 \
--learning_rate=2e-5 \
--num_train_epochs=3 \
--num_gpus=4 \
--model_dir=/public/home/hepj/outdir/tf2/class4 \
--distribution_strategy=mirrored
```
#### 3.2.4 Run
```
sh bert_class4.sh
```
## 4. SQuAD 1.1 Question-Answering Test
### 4.1 Single-card test (single precision)
#### 4.1.1 Data conversion
```
python3 create_finetuning_data.py \
--squad_data_file=/public/home/hepj/model/model_source/sq1.1/train-v1.1.json \
--vocab_file=/public/home/hepj/model_source/bert-large-uncased-TF2/uncased_L-24_H-1024_A-16/vocab.txt \
--train_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train_new.tf_record \
--meta_data_file_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data_new \
--eval_data_output_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/eval_new.tf_record \
--fine_tuning_task_type=squad \
--do_lower_case=False \
--max_seq_length=384
```
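If you want a rough sense of how much data went into the conversion, a small sketch (not part of the repository) can count the question/answer pairs in the raw SQuAD file used above; note that long paragraphs may be split into several training features:

```python
# Illustrative count of questions in the raw SQuAD v1.1 training file.
import json

with open("/public/home/hepj/model/model_source/sq1.1/train-v1.1.json") as f:
    squad = json.load(f)

num_questions = sum(
    len(paragraph["qas"])
    for article in squad["data"]
    for paragraph in article["paragraphs"]
)
print("questions:", num_questions)  # roughly 87k for SQuAD v1.1 train
```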
#### 4.1.2 Model conversion
```
python3 tf2_encoder_checkpoint_converter.py \
--bert_config_file /public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
--checkpoint_to_convert /public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_model.ckpt \
--converted_checkpoint_path /public/home/hepj/model_source/bert-large-uncased-TF2/
```
#### 4.1.3 bert_squad.sh
```
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
export MIOPEN_ENABLE_LOGGING_CMD=1
export ROCBLAS_LAYER=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
comm_rank=$OMPI_COMM_WORLD_RANK
comm_size=$OMPI_COMM_WORLD_SIZE
python3 run_squad_xuan.py \
--mode=train_and_eval \
--vocab_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/vocab.txt \
--bert_config_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
--input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data \
--train_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train.tf_record \
--predict_file=/public/home/hepj/model/model_source/sq1.1/dev-v1.1.json \
--init_checkpoint=/public/home/hepj/model_source/bert-large-uncased-TF2/bert_model.ckpt \
--train_batch_size=4 \
--predict_batch_size=4 \
--learning_rate=2e-5 \
--log_steps=1 \
--num_gpus=1 \
--distribution_strategy=mirrored \
--model_dir=/public/home/hepj/model/tf2/squad1 \
--run_eagerly=False
```
#### 4.1.4 Run
```
sh bert_squad.sh
```
### 4.2 Four-card test (single precision)
#### 4.2.1 Data conversion
Same as the single-card test (4.1.1).
#### 4.2.2 Model conversion
Same as the single-card test (4.1.2).
#### 4.2.3 bert_squad4.sh
```
# --train_batch_size here is the global batch size; with 4 cards under the
# mirrored strategy each card receives 16 / 4 = 4 examples per step.
# Launching multi-card training via mpirun still has some issues.
export HSA_FORCE_FINE_GRAIN_PCIE=1
export MIOPEN_FIND_MODE=3
module unload compiler/rocm/2.9
echo "MIOPEN_FIND_MODE=$MIOPEN_FIND_MODE"
export HIP_VISIBLE_DEVICES=0,1,2,3
python3 run_squad_xuan.py \
--mode=train_and_eval \
--vocab_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/vocab.txt \
--bert_config_file=/public/home/hepj/model/model_source/uncased_L-24_H-1024_A-16/bert_config.json \
--input_meta_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/meta_data \
--train_data_path=/public/home/hepj/model/tf2.7.0_Bert/squad1.1/train.tf_record \
--predict_file=/public/home/hepj/model/model_source/sq1.1/dev-v1.1.json \
--init_checkpoint=/public/home/hepj/model_source/bert-large-uncased-TF2/bert_model.ckpt \
--train_batch_size=16 \
--predict_batch_size=4 \
--learning_rate=2e-5 \
--log_steps=1 \
--num_gpus=4 \
--distribution_strategy=mirrored \
--model_dir=/public/home/hepj/outdir/tf2/squad4 \
--run_eagerly=False
```
#### 4.2.4 Run
```
sh bert_squad4.sh
```
# BERT (Bidirectional Encoder Representations from Transformers)
The academic paper which describes BERT in detail and provides full results on a
number of tasks can be found here: https://arxiv.org/abs/1810.04805.
This repository contains TensorFlow 2.x implementation for BERT.
## Contents
* [Contents](#contents)
* [Pre-trained Models](#pre-trained-models)
* [Restoring from Checkpoints](#restoring-from-checkpoints)
* [Set Up](#set-up)
* [Process Datasets](#process-datasets)
* [Fine-tuning with BERT](#fine-tuning-with-bert)
* [Cloud GPUs and TPUs](#cloud-gpus-and-tpus)
* [Sentence and Sentence-pair Classification Tasks](#sentence-and-sentence-pair-classification-tasks)
* [SQuAD 1.1](#squad-1.1)
## Pre-trained Models
We released both checkpoints and tf.hub modules as the pretrained models for
fine-tuning. They are TF 2.x compatible and are converted from the checkpoints
released in the TF 1.x official BERT repository
[google-research/bert](https://github.com/google-research/bert)
in order to stay consistent with the BERT paper.
### Access to Pretrained Checkpoints
Pretrained checkpoints can be found in the following links:
**Note: We have switched BERT implementation
to use Keras functional-style networks in [nlp/modeling](../modeling).
The new checkpoints are:**
* **[`BERT-Large, Uncased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_uncased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Large, Cased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_cased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12.tar.gz)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-12_H-768_A-12.tar.gz)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
We recommend hosting checkpoints on Google Cloud Storage buckets when you use
Cloud GPU/TPU.
### Restoring from Checkpoints
`tf.train.Checkpoint` is used to manage model checkpoints in TF 2. To restore
weights from provided pre-trained checkpoints, you can use the following code:
```python
import tensorflow as tf

init_checkpoint = 'the pretrained model checkpoint path.'
model = tf.keras.Model()  # BERT pre-trained model as feature extractor.
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.restore(init_checkpoint)
```
Checkpoints featuring native serialized Keras models
(i.e. model.load()/load_weights()) will be available soon.
### Access to Pretrained Hub Modules
Pretrained tf.hub modules in TF 2.x SavedModel format can be found in the
following links:
* **[`BERT-Large, Uncased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/1)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Large, Cased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/1)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/1)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/1)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/1)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Multilingual Cased`](https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/1)**:
104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Base, Chinese`](https://tfhub.dev/tensorflow/bert_zh_L-12_H-768_A-12/1)**:
Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads,
110M parameters
## Set Up
```shell
export PYTHONPATH="$PYTHONPATH:/path/to/models"
```
Install `tf-nightly` to get the latest updates:
```shell
pip install tf-nightly-gpu
```
With TPU, GPU support is not necessary. First, you need to create a `tf-nightly`
TPU with [ctpu tool](https://github.com/tensorflow/tpu/tree/master/tools/ctpu):
```shell
ctpu up -name <instance name> --tf-version="nightly"
```
Second, you need to install TF 2 `tf-nightly` on your VM:
```shell
pip install tf-nightly
```
## Process Datasets
### Pre-training
There is no change to how pre-training data is generated. Please use the script
[`../data/create_pretraining_data.py`](../data/create_pretraining_data.py),
which is essentially branched from the [BERT research repo](https://github.com/google-research/bert)
and adapted to TF2 symbols and Python 3 compatibility, to get the processed
pre-training data.
### Fine-tuning
To prepare the fine-tuning data for final model training, use the
[`../data/create_finetuning_data.py`](../data/create_finetuning_data.py) script.
Resulting datasets in `tf_record` format and training meta data should be later
passed to training or evaluation scripts. The task-specific arguments are
described in following sections:
* GLUE
Users can download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.
Also, users can download a [Pretrained Checkpoint](#access-to-pretrained-checkpoints) and place it in some directory `$BERT_DIR` instead of using checkpoints on Google Cloud Storage.
```shell
export GLUE_DIR=~/glue
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export TASK_NAME=MNLI
export OUTPUT_DIR=gs://some_bucket/datasets
python ../data/create_finetuning_data.py \
--input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \
--vocab_file=${BERT_DIR}/vocab.txt \
--train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
--eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
--meta_data_file_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
--fine_tuning_task_type=classification --max_seq_length=128 \
--classification_task_name=${TASK_NAME}
```
* SQUAD
The [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/) contains
detailed information about the SQuAD datasets and evaluation.
The necessary files can be found here:
* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)
* [train-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json)
* [dev-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json)
* [evaluate-v2.0.py](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/)
```shell
export SQUAD_DIR=~/squad
export SQUAD_VERSION=v1.1
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export OUTPUT_DIR=gs://some_bucket/datasets
python ../data/create_finetuning_data.py \
--squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
--vocab_file=${BERT_DIR}/vocab.txt \
--train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
--meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
--fine_tuning_task_type=squad --max_seq_length=384
```
## Fine-tuning with BERT
### Cloud GPUs and TPUs
* Cloud Storage
The unzipped pre-trained model files can also be found in the Google Cloud
Storage folder `gs://cloud-tpu-checkpoints/bert/keras_bert`. For example:
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export MODEL_DIR=gs://some_bucket/my_output_dir
```
Currently, users are able to access `tf-nightly` TPUs, and the following TPU
script should run with `tf-nightly`.
* GPU -> TPU
Just add the following flags to `run_classifier.py` or `run_squad.py`:
```shell
--distribution_strategy=tpu
--tpu=grpc://${TPU_IP_ADDRESS}:8470
```
### Sentence and Sentence-pair Classification Tasks
This example code fine-tunes `BERT-Large` on the Microsoft Research Paraphrase
Corpus (MRPC), which contains only 3,600 examples and can fine-tune in a
few minutes on most GPUs.
We use the `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the
workflow.
If your GPU has 16GB of memory or less, you may try `BERT-Base`
(uncased_L-12_H-768_A-12).
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export MODEL_DIR=gs://some_bucket/my_output_dir
export GLUE_DIR=gs://some_bucket/datasets
export TASK=MRPC
python run_classifier.py \
--mode='train_and_eval' \
--input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
--train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \
--eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
--bert_config_file=${BERT_DIR}/bert_config.json \
--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
--train_batch_size=4 \
--eval_batch_size=4 \
--steps_per_loop=1 \
--learning_rate=2e-5 \
--num_train_epochs=3 \
--model_dir=${MODEL_DIR} \
--distribution_strategy=mirrored
```
Alternatively, instead of specifying `init_checkpoint`, you can specify
`hub_module_url` to employ a pretrained BERT hub module, e.g.,
` --hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/1`.
After training a model, to get predictions from the classifier, set
`--mode=predict` and pass the test-set tfrecords to `--eval_data_path`.
Output will be written to a file called test_results.tsv in the output folder;
each line contains the output for one sample, and the columns are the class
probabilities.
```shell
python run_classifier.py \
--mode='predict' \
--input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
--eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
--bert_config_file=${BERT_DIR}/bert_config.json \
--eval_batch_size=4 \
--model_dir=${MODEL_DIR} \
--distribution_strategy=mirrored
```
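For illustration, a minimal sketch (not part of the repository) that turns the per-class probabilities in test_results.tsv into predicted label indices, assuming a plain tab-separated file of floats:

```python
# Illustrative post-processing of test_results.tsv (assumed tab-separated floats).
import csv

with open("test_results.tsv") as f:
    for i, row in enumerate(csv.reader(f, delimiter="\t")):
        probs = [float(p) for p in row]
        # Predicted class is the column with the highest probability.
        print(i, probs.index(max(probs)))
```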
To use TPU, you only need to switch distribution strategy type to `tpu` with TPU
information and use remote storage for model checkpoints.
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export TPU_IP_ADDRESS='???'
export MODEL_DIR=gs://some_bucket/my_output_dir
export GLUE_DIR=gs://some_bucket/datasets
export TASK=MRPC
python run_classifier.py \
--mode='train_and_eval' \
--input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
--train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \
--eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
--bert_config_file=${BERT_DIR}/bert_config.json \
--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
--train_batch_size=32 \
--eval_batch_size=32 \
--steps_per_loop=1000 \
--learning_rate=2e-5 \
--num_train_epochs=3 \
--model_dir=${MODEL_DIR} \
--distribution_strategy=tpu \
--tpu=grpc://${TPU_IP_ADDRESS}:8470
```
Note that we specify `steps_per_loop=1000` for TPU, because running a loop of
training steps inside a `tf.function` can significantly increase TPU utilization,
and callbacks will not be called inside the loop.
### SQuAD 1.1
The Stanford Question Answering Dataset (SQuAD) is a popular question answering
benchmark dataset. See more in [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/).
We use the `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the
workflow.
If your GPU has 16GB of memory or less, you may try `BERT-Base`
(uncased_L-12_H-768_A-12).
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export SQUAD_DIR=gs://some_bucket/datasets
export MODEL_DIR=gs://some_bucket/my_output_dir
export SQUAD_VERSION=v1.1
python run_squad.py \
--input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \
--train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
--predict_file=${SQUAD_DIR}/dev-v1.1.json \
--vocab_file=${BERT_DIR}/vocab.txt \
--bert_config_file=${BERT_DIR}/bert_config.json \
--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
--train_batch_size=4 \
--predict_batch_size=4 \
--learning_rate=8e-5 \
--num_train_epochs=2 \
--model_dir=${MODEL_DIR} \
--distribution_strategy=mirrored
```
Similarly, you can replace the `init_checkpoint` FLAG with `hub_module_url` to
specify a hub module path.
`run_squad.py` writes the prediction for `--predict_file` by default. If you set
`--mode=predict` and offer the SQuAD test data, the script will generate
the prediction JSON file.
To use TPU, you need to switch the distribution strategy type to `tpu` and provide
the TPU information.
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export TPU_IP_ADDRESS='???'
export MODEL_DIR=gs://some_bucket/my_output_dir
export SQUAD_DIR=gs://some_bucket/datasets
export SQUAD_VERSION=v1.1
python run_squad.py \
--input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \
--train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
--predict_file=${SQUAD_DIR}/dev-v1.1.json \
--vocab_file=${BERT_DIR}/vocab.txt \
--bert_config_file=${BERT_DIR}/bert_config.json \
--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
--train_batch_size=32 \
--learning_rate=8e-5 \
--num_train_epochs=2 \
--model_dir=${MODEL_DIR} \
--distribution_strategy=tpu \
--tpu=grpc://${TPU_IP_ADDRESS}:8470
```
The dev set predictions will be saved into a file called predictions.json in the
model_dir:
```shell
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ./squad/predictions.json
```
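As a quick illustration (assuming the standard SQuAD prediction format of a JSON object mapping question ids to answer strings), you can peek at a few predictions like this; the snippet is not part of the repository:

```python
# Illustrative peek at the generated predictions.json (question_id -> answer text).
import json

with open("./squad/predictions.json") as f:
    predictions = json.load(f)
for qid, answer in list(predictions.items())[:5]:
    print(qid, "->", answer)
```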