Commit 684c0d4d authored by Hongkun Yu's avatar Hongkun Yu Committed by A. Unique TensorFlower

Add docs folder in nlp/ and provides pretrained_models.md.

PiperOrigin-RevId: 350052731
parent c5234326
...@@ -51,4 +51,12 @@ READMEs for specific papers.
### Common Training Driver
We provide a single common driver [train.py](train.py) to train the above SoTA
models on popular tasks. Please see [docs/train.md](docs/train.md) for
more details.
### Pre-trained models with checkpoints and TF-Hub
We provide a large collection of baselines and checkpoints for NLP pre-trained
models. Please see [docs/pretrained_models.md](docs/pretrained_models.md) for
more details.
...@@ -2,7 +2,7 @@
**WARNING**: We are in the process of deprecating most of the code in this directory.
Please see
[this link](https://github.com/tensorflow/models/blob/master/official/nlp/docs/train.md)
for the new tutorial.
The academic paper which describes BERT in detail and provides full results on a
...
# Pre-trained Models
We provide a large collection of baselines and checkpoints for NLP pre-trained
models.
## How to Load Pretrained Models
### How to Initialize from Checkpoint
**Note:** A TF-HUB SavedModel is the preferred way to distribute models, as it is
self-contained. Please consider using TF-HUB for fine-tuning tasks first.
If you use the [NLP training library](train.md),
you can specify the checkpoint path directly when launching your job. For
example, to initialize the model from a checkpoint, specify
`--params_override=task.init_checkpoint=PATH_TO_INIT_CKPT`:
```
python3 train.py \
--params_override=task.init_checkpoint=PATH_TO_INIT_CKPT
```
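Before launching a job, you can sanity-check a downloaded checkpoint with the
standard TensorFlow checkpoint reader. This is a minimal sketch, not part of the
training driver; `PATH_TO_INIT_CKPT` is the same placeholder path used above.
```python
# Minimal sketch: list a few variables in a checkpoint before pointing
# task.init_checkpoint at it. PATH_TO_INIT_CKPT is a placeholder path.
import tensorflow as tf

reader = tf.train.load_checkpoint('PATH_TO_INIT_CKPT')
shape_map = reader.get_variable_to_shape_map()
for name in sorted(shape_map)[:10]:
    print(name, shape_map[name])
```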
### How to Load a TF-HUB SavedModel
Fine-tuning tasks such as question answering (SQuAD) and sentence
prediction (GLUE) support loading a model from TF-HUB. These built-in tasks
support a dedicated `task.hub_module_url` parameter. To set this parameter,
replace `--params_override=task.init_checkpoint=...` with
`--params_override=task.hub_module_url=TF_HUB_URL`, as shown below:
```
python3 train.py \
--params_override=task.hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3
```
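Outside the training driver, the same TF-HUB SavedModel can also be loaded
directly in Python via the `tensorflow_hub` library, e.g., as a Keras layer.
The sketch below assumes `tensorflow_hub` is installed and uses the standard
input/output dictionary keys of the TF2 BERT SavedModels.
```python
# Minimal sketch: load a BERT encoder from TF-Hub as a Keras layer and run
# it on dummy inputs. Assumes the tensorflow_hub package is installed.
import tensorflow as tf
import tensorflow_hub as hub

encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3",
    trainable=True)

# Dummy batch of one sequence of length 8.
inputs = dict(
    input_word_ids=tf.zeros([1, 8], tf.int32),
    input_mask=tf.ones([1, 8], tf.int32),
    input_type_ids=tf.zeros([1, 8], tf.int32),
)
outputs = encoder(inputs)
print(outputs["pooled_output"].shape)    # (1, 768)
print(outputs["sequence_output"].shape)  # (1, 8, 768)
```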
## BERT
Public BERT pre-trained models released by the BERT authors.
We release both checkpoints and TF-Hub modules as the pre-trained models for
fine-tuning. They are TF 2.x compatible and are converted from the checkpoints
released in the TF 1.x official BERT repository
[google-research/bert](https://github.com/google-research/bert)
in order to stay consistent with the BERT paper.
### Checkpoints
Model | Configuration | Training Data | Checkpoint & Vocabulary | TF-HUB SavedModels
---------------------------------------- | :--------------------------: | ------------: | ----------------------: | ------:
BERT-base uncased English | uncased_L-12_H-768_A-12 | Wiki + Books | [uncased_L-12_H-768_A-12](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/uncased_L-12_H-768_A-12.tar.gz) | [`BERT-Base, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/)
BERT-base cased English | cased_L-12_H-768_A-12 | Wiki + Books | [cased_L-12_H-768_A-12](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/cased_L-12_H-768_A-12.tar.gz) | [`BERT-Base, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/)
BERT-large uncased English | uncased_L-24_H-1024_A-16 | Wiki + Books | [uncased_L-24_H-1024_A-16](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/uncased_L-24_H-1024_A-16.tar.gz) | [`BERT-Large, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/)
BERT-large cased English | cased_L-24_H-1024_A-16 | Wiki + Books | [cased_L-24_H-1024_A-16](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/cased_L-24_H-1024_A-16.tar.gz) | [`BERT-Large, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/)
BERT-large, Uncased (Whole Word Masking) | wwm_uncased_L-24_H-1024_A-16 | Wiki + Books | [wwm_uncased_L-24_H-1024_A-16](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/wwm_uncased_L-24_H-1024_A-16.tar.gz) | [`BERT-Large, Uncased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/)
BERT-large, Cased (Whole Word Masking) | wwm_cased_L-24_H-1024_A-16 | Wiki + Books | [wwm_cased_L-24_H-1024_A-16](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/wwm_cased_L-24_H-1024_A-16.tar.gz) | [`BERT-Large, Cased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/)
BERT-base MultiLingual | multi_cased_L-12_H-768_A-12 | Wiki + Books | [multi_cased_L-12_H-768_A-12](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/multi_cased_L-12_H-768_A-12.tar.gz) | [`BERT-Base, Multilingual Cased`](https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/)
BERT-base Chinese | chinese_L-12_H-768_A-12 | Wiki + Books | [chinese_L-12_H-768_A-12](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/v3/chinese_L-12_H-768_A-12.tar.gz) | [`BERT-Base, Chinese`](https://tfhub.dev/tensorflow/bert_zh_L-12_H-768_A-12/)
You may explore more models in the
[TF-Hub BERT collection](https://tfhub.dev/google/collections/bert/1).
### BERT variants
We also provide pre-trained BERT models with variations in both network
architecture and training methodology. These models achieve higher downstream
accuracy than the standard BERT models above.
Model | Configuration | Training Data | TF-HUB SavedModels | Comment
-------------------------------- | :----------------------: | -----------------------: | ------------------------------------------------------------------------------------: | ------:
BERT-base talking heads + ggelu | uncased_L-12_H-768_A-12 | Wiki + Books | [talkheads_ggelu_base](https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1) | BERT-base trained with [talking heads attention](https://arxiv.org/abs/2003.02436) and [gated GeLU](https://arxiv.org/abs/2002.05202).
BERT-large talking heads + ggelu | uncased_L-24_H-1024_A-16 | Wiki + Books | [talkheads_ggelu_large](https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_large/1) | BERT-large trained with [talking heads attention](https://arxiv.org/abs/2003.02436) and [gated GeLU](https://arxiv.org/abs/2002.05202).
LAMBERT-large uncased English | uncased_L-24_H-1024_A-16 | Wiki + Books | [lambert](https://tfhub.dev/tensorflow/lambert_en_uncased_L-24_H-1024_A-16/1) | BERT trained with LAMB and techniques from RoBERTa.
# Model Garden NLP Common Training Driver
[train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py) is the common training driver that supports multiple
NLP tasks (e.g., pre-training, GLUE and SQuAD fine-tuning, etc.) and multiple
models (e.g., BERT, ALBERT, MobileBERT, etc.).
## Experiment Configuration
[train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py) is driven by configs defined by [ExperimentConfig](https://github.com/tensorflow/models/blob/master/official/core/config_definitions.py),
which includes configurations for `task`, `trainer` and `runtime`. The pre-defined
NLP-related experiment configurations can be found in
[configs/experiment_configs.py](https://github.com/tensorflow/models/blob/master/official/nlp/configs/experiment_configs.py).
## Experiment Registry
We use an [experiment registry](https://github.com/tensorflow/models/blob/master/official/core/exp_factory.py) to build a mapping
from experiment type to experiment configuration instance. For example,
[configs/finetuning_experiments.py](https://github.com/tensorflow/models/blob/master/official/nlp/configs/finetuning_experiments.py)
registers the `bert/sentence_prediction` and `bert/squad` experiments. Users can pass
the `--experiment` flag to invoke a registered experiment configuration,
e.g., `--experiment=bert/sentence_prediction`.
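The registry can also be queried directly in Python, which mirrors what
`--experiment` does inside `train.py`. The sketch below assumes the Model Garden
`official` package is importable and that importing
`official.nlp.configs.experiment_configs` registers the NLP experiments.
```python
# Minimal sketch (assumes the Model Garden `official` package is on the
# PYTHONPATH): look up a registered experiment and override a few fields,
# mirroring what --experiment and --params_override do in train.py.
from official.core import exp_factory
from official.nlp.configs import experiment_configs  # registers NLP experiments

config = exp_factory.get_exp_config('bert/sentence_prediction')
config.task.init_checkpoint = 'PATH_TO_INIT_CKPT'  # placeholder path
config.trainer.train_steps = 10000
print(config.as_dict()['trainer'])
```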
...@@ -39,7 +39,7 @@ In addition, experiment configuration can be further overridden by
## Run on Cloud TPUs
Next, we will describe how to run [train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py) on Cloud TPUs.
### Setup
First, you need to create a `tf-nightly` TPU with
...@@ -64,9 +64,9 @@ This example fine-tunes BERT-base from TF-Hub on the Multi-Genre Natural
Language Inference (MultiNLI) corpus using TPUs.
First, you can prepare the fine-tuning data using the
[`data/create_finetuning_data.py`](https://github.com/tensorflow/models/blob/master/official/nlp/data/create_finetuning_data.py) script.
The resulting training and evaluation datasets in `tf_record` format will later be
passed to [train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py).
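The generated files are plain TFRecord datasets, so you can sanity-check them
with standard TensorFlow APIs before starting a TPU job; the file name in the
sketch below is a hypothetical placeholder.
```python
# Minimal sketch: peek at one serialized example from a generated
# fine-tuning TFRecord file (the file name is a hypothetical placeholder).
import tensorflow as tf

dataset = tf.data.TFRecordDataset('mnli_train.tf_record')
for raw_record in dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(sorted(example.features.feature.keys()))
```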
Then you can execute the following commands to start the training and evaluation
job.
...