Commit c857cb8a authored by Hongkun Yu's avatar Hongkun Yu Committed by A. Unique TensorFlower

Add docs folder in nlp/ and provides pretrained_models.md.

PiperOrigin-RevId: 350052731
parent bf4a5360
@@ -51,4 +51,12 @@ READMEs for specific papers.
### Common Training Driver
We provide a single common driver [train.py](train.py) to train the above SoTA
models on popular tasks. Please see [docs/train.md](docs/train.md) for
more details.
### Pre-trained models with checkpoints and TF-Hub
We provide a large collection of baselines and checkpoints for NLP pre-trained
models. Please see [docs/pretrained_models.md](docs/pretrained_models.md) for
more details.
@@ -2,7 +2,7 @@
**WARNING**: We are in the process of deprecating most of the code in this directory.
Please see
[this link](https://github.com/tensorflow/models/blob/master/official/nlp/docs/train.md)
for the new tutorial.
The academic paper which describes BERT in detail and provides full results on a
......
# Pre-trained Models
We provide a large collection of baselines and checkpoints for NLP pre-trained
models.
## How to Load Pretrained Models
### How to Initialize from Checkpoint
**Note:** TF-Hub/SavedModel is the preferred way to distribute models, as it is
self-contained. Please consider using TF-Hub for fine-tuning tasks first.
If you use the [NLP training library](train.md),
you can specify the checkpoint path directly when launching your job. For
example, to initialize the model from the checkpoint, you can specify
`--params_override=task.init_checkpoint=PATH_TO_INIT_CKPT` as:
```
python3 train.py \
--params_override=task.init_checkpoint=PATH_TO_INIT_CKPT
```
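Here `PATH_TO_INIT_CKPT` is a placeholder for the path or prefix of a checkpoint, for example one extracted from the checkpoint archives listed in the tables below.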
### How to Load a TF-Hub SavedModel
Fine-tuning tasks such as question answering (SQuAD) and sentence
prediction (GLUE) support loading a model from TF-Hub. These built-in tasks
support a specific `task.hub_module_url` parameter. To set this parameter,
replace `--params_override=task.init_checkpoint=...` with
`--params_override=task.hub_module_url=TF_HUB_URL`, as shown below:
```
python3 train.py \
--params_override=task.hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3
```
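Any of the TF-Hub SavedModel URLs listed in the tables below can be substituted for the URL in this example.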
## BERT
Public BERT pre-trained models released by the BERT authors.
We release both checkpoints and TF-Hub modules as pre-trained models for
fine-tuning. They are TF 2.x compatible and were converted from the checkpoints
released in the TF 1.x official BERT repository,
[google-research/bert](https://github.com/google-research/bert),
in order to stay consistent with the BERT paper.
### Checkpoints
Model | Configuration | Training Data | Checkpoint & Vocabulary | TF-HUB SavedModels
---------------------------------------- | :--------------------------: | ------------: | ----------------------: | ------:
BERT-base uncased English | uncased_L-12_H-768_A-12 | Wiki + Books | [uncased_L-12_H-768_A-12](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/uncased_L-12_H-768_A-12.tar.gz) | [`BERT-Base, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/)
BERT-base cased English | cased_L-12_H-768_A-12 | Wiki + Books | [cased_L-12_H-768_A-12](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/cased_L-12_H-768_A-12.tar.gz) | [`BERT-Base, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/)
BERT-large uncased English | uncased_L-24_H-1024_A-16 | Wiki + Books | [uncased_L-24_H-1024_A-16](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/uncased_L-24_H-1024_A-16.tar.gz) | [`BERT-Large, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/)
BERT-large cased English | cased_L-24_H-1024_A-16 | Wiki + Books | [cased_L-24_H-1024_A-16](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/cased_L-24_H-1024_A-16.tar.gz) | [`BERT-Large, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/)
BERT-large, Uncased (Whole Word Masking) | wwm_uncased_L-24_H-1024_A-16 | Wiki + Books | [wwm_uncased_L-24_H-1024_A-16](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/wwm_uncased_L-24_H-1024_A-16.tar.gz) | [`BERT-Large, Uncased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/)
BERT-large, Cased (Whole Word Masking) | wwm_cased_L-24_H-1024_A-16 | Wiki + Books | [wwm_cased_L-24_H-1024_A-16](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/wwm_cased_L-24_H-1024_A-16.tar.gz) | [`BERT-Large, Cased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/)
BERT-base MultiLingual | multi_cased_L-12_H-768_A-12 | Wiki + Books | [multi_cased_L-12_H-768_A-12](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/multi_cased_L-12_H-768_A-12.tar.gz) | [`BERT-Base, Multilingual Cased`](https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/)
BERT-base Chinese | chinese_L-12_H-768_A-12 | Wiki + Books | [chinese_L-12_H-768_A-12](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/chinese_L-12_H-768_A-12.tar.gz) | [`BERT-Base, Chinese`](https://tfhub.dev/tensorflow/bert_zh_L-12_H-768_A-12/)
You may explore more models in the [TF-Hub BERT collection](https://tfhub.dev/google/collections/bert/1).
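To fine-tune from one of these checkpoints, you can download and unpack the archive locally and pass its checkpoint prefix to `task.init_checkpoint`. Below is a minimal sketch using the BERT-base uncased archive from the table above; the local paths and the file names inside the archive are assumptions, so inspect the extracted files before use.
```
# Sketch only: the extracted directory layout and checkpoint prefix name are
# assumptions; check the archive contents after unpacking.
wget https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/uncased_L-12_H-768_A-12.tar.gz
tar -xzf uncased_L-12_H-768_A-12.tar.gz
# Then point the trainer at the extracted checkpoint prefix, e.g.:
# --params_override=task.init_checkpoint=./uncased_L-12_H-768_A-12/bert_model.ckpt
```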
### BERT variants
We also provide pre-trained BERT models with variants in both network architecture
and training methodology. These models achieve higher downstream accuracy
scores.
Model | Configuration | Training Data | TF-HUB SavedModels | Comment
-------------------------------- | :----------------------: | -----------------------: | ------------------------------------------------------------------------------------: | ------:
BERT-base talking heads + ggelu | uncased_L-12_H-768_A-12 | Wiki + Books | [talkheads_ggelu_base](https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1) | BERT-base trained with [talking heads attention](https://arxiv.org/abs/2003.02436) and [gated GeLU](https://arxiv.org/abs/2002.05202).
BERT-large talking heads + ggelu | uncased_L-24_H-1024_A-16 | Wiki + Books | [talkheads_ggelu_large](https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_large/1) | BERT-large trained with [talking heads attention](https://arxiv.org/abs/2003.02436) and [gated GeLU](https://arxiv.org/abs/2002.05202).
LAMBERT-large uncased English | uncased_L-24_H-1024_A-16 | Wiki + Books | [lambert](https://tfhub.dev/tensorflow/lambert_en_uncased_L-24_H-1024_A-16/1) | BERT trained with LAMB and techniques from RoBERTa.
# Model Garden NLP Common Training Driver
[train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py) is the common training driver that supports multiple
NLP tasks (e.g., pre-training, GLUE and SQuAD fine-tuning, etc.) and multiple
models (e.g., BERT, ALBERT, MobileBERT, etc.).
## Experiment Configuration
[train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py) is driven by configs defined by [ExperimentConfig](https://github.com/tensorflow/models/blob/master/official/core/config_definitions.py),
including configurations for `task`, `trainer` and `runtime`. The pre-defined
NLP-related [ExperimentConfig](https://github.com/tensorflow/models/blob/master/official/core/config_definitions.py) instances can be found in
[configs/experiment_configs.py](https://github.com/tensorflow/models/blob/master/official/nlp/configs/experiment_configs.py).
## Experiment Registry
We use an [experiment registry](https://github.com/tensorflow/models/blob/master/official/core/exp_factory.py) to build a mapping
from experiment type to experiment configuration instance. For example,
[configs/finetuning_experiments.py](https://github.com/tensorflow/models/blob/master/official/nlp/configs/finetuning_experiments.py)
registers the `bert/sentence_prediction` and `bert/squad` experiments. Users can use the
`--experiment` flag to invoke a registered experiment configuration,
e.g., `--experiment=bert/sentence_prediction`.
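Putting the registry and the configuration overrides together, a launch could look like the sketch below; the `trainer.train_steps` field and its value are illustrative assumptions rather than values taken from this document.
```
# Sketch only: the override field `trainer.train_steps` and its value are
# assumptions; `--experiment` and `--params_override` are described above.
python3 train.py \
  --experiment=bert/sentence_prediction \
  --params_override="task.init_checkpoint=PATH_TO_INIT_CKPT,trainer.train_steps=10000"
```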
@@ -39,7 +39,7 @@ In addition, experiment configuration can be further overridden by
## Run on Cloud TPUs
Next, we will describe how to run [train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py) on Cloud TPUs.
### Setup
First, you need to create a `tf-nightly` TPU with
@@ -64,9 +64,9 @@ This example fine-tunes BERT-base from TF-Hub on the Multi-Genre Natural
Language Inference (MultiNLI) corpus using TPUs.
First, you can prepare the fine-tuning data using the
[`data/create_finetuning_data.py`](https://github.com/tensorflow/models/blob/master/official/nlp/data/create_finetuning_data.py) script.
The resulting training and evaluation datasets in `tf_record` format will later be
passed to [train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py).
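As a rough, non-authoritative sketch of this data-preparation step (the flag names, task name, and paths below are assumptions; consult the script's `--help` for the authoritative interface):
```
# Sketch only: flag names and paths are assumptions, not taken from this
# document; verify them against the script before running.
python3 data/create_finetuning_data.py \
  --fine_tuning_task_type=classification \
  --classification_task_name=MNLI \
  --input_data_dir=${GLUE_DIR}/MNLI \
  --vocab_file=${BERT_DIR}/vocab.txt \
  --train_data_output_path=${OUTPUT_DIR}/mnli_train.tf_record \
  --eval_data_output_path=${OUTPUT_DIR}/mnli_eval.tf_record \
  --meta_data_file_path=${OUTPUT_DIR}/metadata
```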
Then you can execute the following commands to start the training and evaluation
job.
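Purely as a non-authoritative sketch of such a launch (the experiment name, TPU name, model directory, data paths, and override fields below are placeholder assumptions):
```
# Sketch only: every concrete value below is a placeholder assumption.
python3 train.py \
  --experiment=bert/sentence_prediction \
  --mode=train_and_eval \
  --model_dir=gs://some-bucket/mnli_model_dir \
  --tpu=my-tpu-name \
  --distribution_strategy=tpu \
  --params_override="task.hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3,task.train_data.input_path=gs://some-bucket/mnli_train.tf_record,task.validation_data.input_path=gs://some-bucket/mnli_eval.tf_record"
```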
......