Unverified Commit ca552843 authored by Srihari Humbarwadi, committed by GitHub

Merge branch 'panoptic-segmentation' into panoptic-segmentation

parents 7e2f7a35 6b90e134
# NLP example project
This is a tutorial for setting up your project using the TF-NLP library. Here
we focus on the scaffolding of the project and pay little attention to modeling
aspects. Below we use classification as an example.

## Set up your codebase

First you need to define the
[Task](https://github.com/tensorflow/models/blob/master/official/core/base_task.py)
by inheriting from it. A Task is an abstraction of a machine learning task;
here we focus on two things: the inputs and the optimization target.

NOTE: We use BertClassifier as the base model. You can find other models
[here](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/models).
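Concretely, the scaffolding below fills in a handful of methods on a
`base_task.Task` subclass. A minimal sketch (method bodies elided; the full
implementation appears in `classification_example.py` later on this page) looks
roughly like:
```python
from official.core import base_task


class ClassificationExampleTask(base_task.Task):
  """Skeleton only; see the full file below for the real implementation."""

  def build_model(self):
    # Create and return the tf.keras.Model, e.g. a BertClassifier.
    ...

  def build_inputs(self, params, input_context=None):
    # Return the tf.data.Dataset produced by the data loader.
    ...

  def build_losses(self, labels, model_outputs, aux_losses=None):
    # Return a scalar loss tensor used as the optimization target.
    ...
```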
#### Step 1: build\_inputs
Here we use [CoLA](https://nyu-mll.github.io/CoLA/), a binary classification
task, as an example.

TODO(saberkun): Add demo data instructions.

There are four fields we care about in the tf.Example: `input_ids`,
`input_mask`, `segment_ids`, and `label_ids`. We start with a simple data
loader by inheriting from the
[DataLoader](https://github.com/tensorflow/models/blob/master/official/nlp/data/data_loader.py)
interface.
```python
class ClassificationDataLoader(data_loader.DataLoader):
  ...

  def _parse(self, record: Mapping[str, tf.Tensor]):
    """Parses raw tensors into a dict of tensors to be consumed by the model."""
    x = {
        'input_word_ids': record['input_ids'],
        'input_mask': record['input_mask'],
        'input_type_ids': record['segment_ids']
    }
    y = record['label_ids']
    return (x, y)

  ...
```
Overall, the loader translates the tf.Example into the appropriate format for
the model to consume. Then, in `Task.build_inputs`, hook up the dataset like
this:
```python
def build_inputs(self, params, input_context=None):
  ...
  loader = classification_data_loader.ClassificationDataLoader(params)
  return loader.load(input_context)
```
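The `params` argument is the task's data config. In this project it is a small
dataclass extending `cfg.DataConfig` (copied from
`classification_data_loader.py` below), so `seq_length` plus the generic fields
of `cfg.DataConfig` such as `input_path` and `global_batch_size` are available
to the loader:
```python
import dataclasses

from official.core import config_definitions as cfg


@dataclasses.dataclass
class ClassificationExampleDataConfig(cfg.DataConfig):
  """Data config for token classification task."""
  seq_length: int = 128
```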
#### Step 2: build\_losses
We use the standard cross-entropy loss and make sure `build_losses()` returns a
scalar float Tensor.
```python
def build_losses(self, labels, model_outputs, aux_losses=None):
  loss = tf.keras.losses.sparse_categorical_crossentropy(
      labels, tf.cast(model_outputs, tf.float32), from_logits=True)
  ...
```
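The snippet above is abridged; the full method (see `classification_example.py`
later on this page) reduces the per-example losses to a scalar with
`tf_utils.safe_mean`:
```python
def build_losses(self, labels, model_outputs, aux_losses=None) -> tf.Tensor:
  loss = tf.keras.losses.sparse_categorical_crossentropy(
      labels, tf.cast(model_outputs, tf.float32), from_logits=True)
  return tf_utils.safe_mean(loss)
```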
#### Try the workflow locally.
We use a small BERT model for local trial and error. Below is the command:
```shell
# Assume you are under official/nlp/projects.
python3 example/train.py \
  --experiment=example_bert_classification_example \
  --config_file=example/local_example.yaml \
  --mode=train \
  --model_dir=/tmp/example_project_test/
```
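For reference, the local config (presumably `example/local_example.yaml`,
included in full later on this page) shrinks the encoder and the schedule so
the run finishes quickly; an abridged view:
```yaml
task:
  model:
    encoder:
      type: bert
      bert:
        hidden_size: 288
        num_layers: 2
trainer:
  train_steps: 500
```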
The train binary translates the config file into the experiment configuration.
Usually you only need to change the task import and construction logic:
```python
task_config = classification_example.ClassificationExampleConfig()
task = classification_example.ClassificationExampleTask(task_config)
```
TIP: You can also check the [unittest](https://github.com/tensorflow/models/blob/master/official/nlp/projects/example/classification_example_test.py)
for a better understanding.
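For instance, the test swaps `build_inputs` for a dummy dataset and runs a
single train step (abridged from `classification_example_test.py`, also
included later on this page):
```python
# Abridged from ClassificationExampleTest.test_task_with_dummy_data.
train_data_config = (
    classification_data_loader.ClassificationExampleDataConfig(
        input_path='dummy', seq_length=128, global_batch_size=1))
task_config = classification_example.ClassificationExampleConfig(
    model=self.get_model_config())
task = classification_example.ClassificationExampleTask(task_config)
task.build_inputs = self.get_dummy_dataset  # Swap in synthetic data.
model = task.build_model()
metrics = task.build_metrics()
dataset = task.build_inputs(train_data_config)
optimizer = tf.keras.optimizers.SGD(lr=0.1)
task.initialize(model)
task.train_step(next(iter(dataset)), model, optimizer, metrics=metrics)
```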
### Finetune
TF-NLP makes it easy to start from a [pretrained checkpoint](https://github.com/tensorflow/models/blob/master/official/nlp/docs/pretrained_models.md);
try the command below. This is done by configuring `task.init_checkpoint` in
the YAML config; see the [base_task.initialize](https://github.com/tensorflow/models/blob/master/official/core/base_task.py)
method for more details.
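For reference, the first YAML config included later on this page (presumably
`example/experiments/classification_ft_cola.yaml`) wires in the checkpoint like
this, with the path left as a placeholder:
```yaml
task:
  # Placeholder; point this at the downloaded pretrained BERT checkpoint.
  init_checkpoint: 'BERT checkpoint'
```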
We use a GCP TPU to demonstrate this:
```shell
EXP_NAME=bert_base_cola
EXP_TYPE=example_bert_classification_example
CONFIG_FILE=example/experiments/classification_ft_cola.yaml
TPU_NAME=experiment01
MODEL_DIR=<your GCS bucket folder>
python3 example/train.py \
  --experiment=$EXP_TYPE \
  --mode=train_and_eval \
  --tpu=$TPU_NAME \
  --model_dir=${MODEL_DIR} \
  --config_file=${CONFIG_FILE}
```
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Loads dataset for classification tasks."""
from typing import Dict, Mapping, Optional, Tuple
import dataclasses
import tensorflow as tf
from official.core import config_definitions as cfg
from official.core import input_reader
from official.nlp.data import data_loader
@dataclasses.dataclass
class ClassificationExampleDataConfig(cfg.DataConfig):
  """Data config for token classification task."""
  seq_length: int = 128


class ClassificationDataLoader(data_loader.DataLoader):
  """A class to load dataset for sentence prediction (classification) task."""

  def __init__(self, params):
    self._params = params
    self._seq_length = params.seq_length

  def _decode(self, record: tf.Tensor) -> Dict[str, tf.Tensor]:
    """Decodes a serialized tf.Example."""
    name_to_features = {
        'input_ids': tf.io.FixedLenFeature([self._seq_length], tf.int64),
        'input_mask': tf.io.FixedLenFeature([self._seq_length], tf.int64),
        'segment_ids': tf.io.FixedLenFeature([self._seq_length], tf.int64),
        'label_ids': tf.io.FixedLenFeature([], tf.int64),
    }
    example = tf.io.parse_single_example(record, name_to_features)

    # tf.Example only supports tf.int64, but the TPU only supports tf.int32.
    # So cast all int64 to int32.
    for name in example:
      t = example[name]
      if t.dtype == tf.int64:
        t = tf.cast(t, tf.int32)
      example[name] = t

    return example

  def _parse(
      self,
      record: Mapping[str,
                      tf.Tensor]) -> Tuple[Dict[str, tf.Tensor], tf.Tensor]:
    """Parses raw tensors into a dict of tensors to be consumed by the model."""
    x = {
        'input_word_ids': record['input_ids'],
        'input_mask': record['input_mask'],
        'input_type_ids': record['segment_ids']
    }
    y = record['label_ids']
    return (x, y)

  def load(
      self,
      input_context: Optional[tf.distribute.InputContext] = None
  ) -> tf.data.Dataset:
    """Returns a tf.data.Dataset."""
    reader = input_reader.InputReader(
        params=self._params, decoder_fn=self._decode, parser_fn=self._parse)
    return reader.read(input_context)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Classifcation Task Showcase."""
import dataclasses
from typing import List, Mapping, Text
from seqeval import metrics as seqeval_metrics
import tensorflow as tf
from official.core import base_task
from official.core import config_definitions as cfg
from official.core import exp_factory
from official.modeling import optimization
from official.modeling import tf_utils
from official.modeling.hyperparams import base_config
from official.nlp.configs import encoders
from official.nlp.modeling import models
from official.nlp.projects.example import classification_data_loader
from official.nlp.tasks import utils
@dataclasses.dataclass
class ModelConfig(base_config.Config):
  """A base span labeler configuration."""
  encoder: encoders.EncoderConfig = encoders.EncoderConfig()
  head_dropout: float = 0.1
  head_initializer_range: float = 0.02


@dataclasses.dataclass
class ClassificationExampleConfig(cfg.TaskConfig):
  """The model config."""
  # At most one of `init_checkpoint` and `hub_module_url` can be specified.
  init_checkpoint: str = ''
  hub_module_url: str = ''
  model: ModelConfig = ModelConfig()
  num_classes = 2
  class_names = ['A', 'B']
  train_data: cfg.DataConfig = classification_data_loader.ClassificationExampleDataConfig(
  )
  validation_data: cfg.DataConfig = classification_data_loader.ClassificationExampleDataConfig(
  )


class ClassificationExampleTask(base_task.Task):
  """Task object for classification."""

  def build_model(self) -> tf.keras.Model:
    if self.task_config.hub_module_url and self.task_config.init_checkpoint:
      raise ValueError('At most one of `hub_module_url` and '
                       '`init_checkpoint` can be specified.')
    if self.task_config.hub_module_url:
      encoder_network = utils.get_encoder_from_hub(
          self.task_config.hub_module_url)
    else:
      encoder_network = encoders.build_encoder(self.task_config.model.encoder)

    return models.BertClassifier(
        network=encoder_network,
        num_classes=len(self.task_config.class_names),
        initializer=tf.keras.initializers.TruncatedNormal(
            stddev=self.task_config.model.head_initializer_range),
        dropout_rate=self.task_config.model.head_dropout)

  def build_losses(self, labels, model_outputs, aux_losses=None) -> tf.Tensor:
    loss = tf.keras.losses.sparse_categorical_crossentropy(
        labels, tf.cast(model_outputs, tf.float32), from_logits=True)
    return tf_utils.safe_mean(loss)

  def build_inputs(self,
                   params: cfg.DataConfig,
                   input_context=None) -> tf.data.Dataset:
    """Returns tf.data.Dataset for sentence_prediction task."""
    loader = classification_data_loader.ClassificationDataLoader(params)
    return loader.load(input_context)

  def inference_step(self, inputs,
                     model: tf.keras.Model) -> Mapping[str, tf.Tensor]:
    """Performs the forward step."""
    logits = model(inputs, training=False)
    return {
        'logits': logits,
        'predict_ids': tf.argmax(logits, axis=-1, output_type=tf.int32)
    }

  def validation_step(self,
                      inputs,
                      model: tf.keras.Model,
                      metrics=None) -> Mapping[str, tf.Tensor]:
    """Validation step.

    Args:
      inputs: a dictionary of input tensors.
      model: the keras.Model.
      metrics: a nested structure of metrics objects.

    Returns:
      A dictionary of logs.
    """
    features, labels = inputs
    outputs = self.inference_step(features, model)
    loss = self.build_losses(labels=labels, model_outputs=outputs['logits'])
    # Negative label ids are padding labels which should be ignored.
    real_label_index = tf.where(tf.greater_equal(labels, 0))
    predict_ids = tf.gather_nd(outputs['predict_ids'], real_label_index)
    label_ids = tf.gather_nd(labels, real_label_index)
    return {
        self.loss: loss,
        'predict_ids': predict_ids,
        'label_ids': label_ids,
    }

  def aggregate_logs(self,
                     state=None,
                     step_outputs=None) -> Mapping[Text, List[List[Text]]]:
    """Aggregates over logs returned from a validation step."""
    if state is None:
      state = {'predict_class': [], 'label_class': []}

    def id_to_class_name(batched_ids):
      class_names = []
      for per_example_ids in batched_ids:
        class_names.append([])
        for per_token_id in per_example_ids.numpy().tolist():
          class_names[-1].append(self.task_config.class_names[per_token_id])
      return class_names

    # Convert id to class names, because `seqeval_metrics` relies on the class
    # name to decide IOB tags.
    state['predict_class'].extend(id_to_class_name(step_outputs['predict_ids']))
    state['label_class'].extend(id_to_class_name(step_outputs['label_ids']))
    return state

  def reduce_aggregated_logs(self,
                             aggregated_logs,
                             global_step=None) -> Mapping[Text, float]:
    """Reduces aggregated logs over validation steps."""
    label_class = aggregated_logs['label_class']
    predict_class = aggregated_logs['predict_class']
    return {
        'f1':
            seqeval_metrics.f1_score(label_class, predict_class),
        'precision':
            seqeval_metrics.precision_score(label_class, predict_class),
        'recall':
            seqeval_metrics.recall_score(label_class, predict_class),
        'accuracy':
            seqeval_metrics.accuracy_score(label_class, predict_class),
    }


@exp_factory.register_config_factory('example_bert_classification_example')
def bert_classification_example() -> cfg.ExperimentConfig:
  """Return a minimum experiment config for Bert token classification."""
  return cfg.ExperimentConfig(
      task=ClassificationExampleConfig(),
      trainer=cfg.TrainerConfig(
          optimizer_config=optimization.OptimizationConfig({
              'optimizer': {
                  'type': 'adamw',
              },
              'learning_rate': {
                  'type': 'polynomial',
              },
              'warmup': {
                  'type': 'polynomial'
              }
          })),
      restrictions=[
          'task.train_data.is_training != None',
          'task.validation_data.is_training != None'
      ])
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for nlp.projects.example.classification_example."""
import tensorflow as tf
from official.core import config_definitions as cfg
from official.nlp.configs import encoders
from official.nlp.projects.example import classification_data_loader
from official.nlp.projects.example import classification_example
class ClassificationExampleTest(tf.test.TestCase):

  def get_model_config(self):
    return classification_example.ModelConfig(
        encoder=encoders.EncoderConfig(
            bert=encoders.BertEncoderConfig(vocab_size=30522, num_layers=2)))

  def get_dummy_dataset(self, params: cfg.DataConfig):

    def dummy_data(_):
      dummy_ids = tf.zeros((1, params.seq_length), dtype=tf.int32)
      x = dict(
          input_word_ids=dummy_ids,
          input_mask=dummy_ids,
          input_type_ids=dummy_ids)
      y = tf.zeros((1, 1), dtype=tf.int32)
      return (x, y)

    dataset = tf.data.Dataset.range(1)
    dataset = dataset.repeat()
    dataset = dataset.map(
        dummy_data, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    return dataset

  def test_task_with_dummy_data(self):
    train_data_config = (
        classification_data_loader.ClassificationExampleDataConfig(
            input_path='dummy', seq_length=128, global_batch_size=1))
    task_config = classification_example.ClassificationExampleConfig(
        model=self.get_model_config(),)
    task = classification_example.ClassificationExampleTask(task_config)
    task.build_inputs = self.get_dummy_dataset
    model = task.build_model()
    metrics = task.build_metrics()
    dataset = task.build_inputs(train_data_config)
    iterator = iter(dataset)
    optimizer = tf.keras.optimizers.SGD(lr=0.1)
    task.initialize(model)
    task.train_step(next(iterator), model, optimizer, metrics=metrics)


if __name__ == '__main__':
  tf.test.main()
task:
  model:
    encoder:
      type: bert
      bert:
        attention_dropout_rate: 0.1
        dropout_rate: 0.1
        hidden_activation: gelu
        hidden_size: 768
        initializer_range: 0.02
        intermediate_size: 3072
        max_position_embeddings: 512
        num_attention_heads: 12
        num_layers: 12
        type_vocab_size: 2
        vocab_size: 30522
  init_checkpoint: 'BERT checkpoint'
  train_data:
    input_path: 'YourData/COLA_train.tf_record'
    is_training: true
    global_batch_size: 32
  validation_data:
    input_path: 'YourData/COLA_eval.tf_record'
    is_training: false
    global_batch_size: 32
trainer:
  checkpoint_interval: 5000
  max_to_keep: 5
  steps_per_loop: 100
  summary_interval: 100
  train_steps: 10000
  validation_interval: 100
  validation_steps: -1
  optimizer_config:
    learning_rate:
      polynomial:
        initial_learning_rate: 0.00002
        decay_steps: 10000
    warmup:
      polynomial:
        warmup_steps: 100
task:
  model:
    encoder:
      type: bert
      bert:
        attention_dropout_rate: 0.1
        dropout_rate: 0.1
        hidden_activation: gelu
        hidden_size: 288
        initializer_range: 0.02
        intermediate_size: 256
        max_position_embeddings: 512
        num_attention_heads: 6
        num_layers: 2
        type_vocab_size: 4
        vocab_size: 114507
  train_data:
    input_path: 'YourData/COLA_train.tf_record'
    is_training: true
    global_batch_size: 32
  validation_data:
    input_path: 'YourData/COLA_eval.tf_record'
    is_training: false
    global_batch_size: 32
trainer:
  checkpoint_interval: 500
  max_to_keep: 5
  steps_per_loop: 100
  summary_interval: 100
  train_steps: 500
  validation_interval: 100
  validation_steps: -1
  optimizer_config:
    learning_rate:
      polynomial:
        initial_learning_rate: 0.001
        decay_steps: 740000
    warmup:
      polynomial:
        warmup_steps: 100
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""A customized training library for the specific task."""
from absl import app
from absl import flags
import gin
from official.common import distribute_utils
from official.common import flags as tfm_flags
from official.core import train_lib
from official.core import train_utils
from official.modeling import performance
from official.nlp.projects.example import classification_example
FLAGS = flags.FLAGS
def main(_):
  gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params)
  params = train_utils.parse_configuration(FLAGS)
  model_dir = FLAGS.model_dir
  if 'train' in FLAGS.mode:
    # Pure eval modes do not output yaml files. Otherwise continuous eval job
    # may race against the train job for writing the same file.
    train_utils.serialize_config(params, model_dir)

  # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16'
  # can have significant impact on model speeds by utilizing float16 in case of
  # GPUs, and bfloat16 in the case of TPUs. loss_scale takes effect only when
  # dtype is float16.
  if params.runtime.mixed_precision_dtype:
    performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype)
  distribution_strategy = distribute_utils.get_distribution_strategy(
      distribution_strategy=params.runtime.distribution_strategy,
      all_reduce_alg=params.runtime.all_reduce_alg,
      num_gpus=params.runtime.num_gpus,
      tpu_address=params.runtime.tpu,
      **params.runtime.model_parallelism())
  with distribution_strategy.scope():
    task = classification_example.ClassificationExampleTask(params.task)
    train_lib.run_experiment(
        distribution_strategy=distribution_strategy,
        task=task,
        mode=FLAGS.mode,
        params=params,
        model_dir=model_dir)

  train_utils.save_gin_config(FLAGS.mode, model_dir)


if __name__ == '__main__':
  tfm_flags.define_flags()
  app.run(main)
@@ -13,18 +13,18 @@
# limitations under the License.
"""Progressive distillation for MobileBERT student model."""
import dataclasses
from typing import List, Optional
from absl import logging
import dataclasses
import orbit
import tensorflow as tf
from official.core import base_task
from official.core import config_definitions as cfg
from official.modeling import optimization
from official.modeling import tf_utils
from official.modeling.fast_training.progressive import policies
from official.modeling.hyperparams import base_config
from official.modeling.progressive import policies
from official.nlp import keras_nlp
from official.nlp.configs import bert
from official.nlp.configs import encoders
@@ -22,7 +22,7 @@ import tensorflow as tf
from official.core import config_definitions as cfg
from official.modeling import optimization
from official.modeling import tf_utils
from official.modeling.progressive import trainer as prog_trainer_lib
from official.modeling.fast_training.progressive import trainer as prog_trainer_lib
from official.nlp.configs import bert
from official.nlp.configs import encoders
from official.nlp.data import pretrain_dataloader
@@ -28,8 +28,8 @@ from official.core import train_utils
from official.modeling import hyperparams
from official.modeling import optimization
from official.modeling import performance
from official.modeling.progressive import train_lib
from official.modeling.progressive import trainer as prog_trainer_lib
from official.modeling.fast_training.progressive import train_lib
from official.modeling.fast_training.progressive import trainer as prog_trainer_lib
from official.nlp.data import pretrain_dataloader
from official.nlp.projects.mobilebert import distillation
task:
  hub_module_url: ''
  model:
    num_classes: 3
  train_data:
    drop_remainder: true
    global_batch_size: 32
    input_path: ''
    is_training: true
    seq_length: 128
  validation_data:
    drop_remainder: false
    global_batch_size: 32
    input_path: ''
    is_training: false
    seq_length: 128
trainer:
  checkpoint_interval: 1000
  optimizer_config:
    learning_rate:
      polynomial:
        decay_steps: 61359
        end_learning_rate: 0.0
        initial_learning_rate: 3.0e-05
        power: 1.0
      type: polynomial
    optimizer:
      type: adamw
    warmup:
      polynomial:
        power: 1
        warmup_steps: 6136
      type: polynomial
  steps_per_loop: 1000
  summary_interval: 1000
  # Training data size 392,702 examples, 5 epochs.
  train_steps: 61359
  validation_interval: 2000
  validation_steps: 307
task:
  hub_module_url: ''
  max_answer_length: 30
  n_best_size: 20
  null_score_diff_threshold: 0.0
  train_data:
    drop_remainder: true
    global_batch_size: 32
    input_path: ''
    is_training: true
    seq_length: 384
  validation_data:
    do_lower_case: true
    doc_stride: 128
    drop_remainder: false
    global_batch_size: 32
    input_path: ''
    is_training: false
    query_length: 64
    seq_length: 384
    tokenization: WordPiece
    version_2_with_negative: false
    vocab_file: ''
trainer:
  checkpoint_interval: 500
  max_to_keep: 5
  optimizer_config:
    learning_rate:
      polynomial:
        decay_steps: 5549
        end_learning_rate: 0.0
        initial_learning_rate: 5.0e-05
        power: 1.0
      type: polynomial
    optimizer:
      type: adamw
    warmup:
      polynomial:
        power: 1
        warmup_steps: 555
      type: polynomial
  steps_per_loop: 500
  summary_interval: 500
  train_steps: 5549
  validation_interval: 500
  validation_steps: 339
task:
  hub_module_url: ''
  max_answer_length: 30
  n_best_size: 20
  null_score_diff_threshold: 0.0
  train_data:
    drop_remainder: true
    global_batch_size: 32
    input_path: ''
    is_training: true
    seq_length: 384
  validation_data:
    do_lower_case: true
    doc_stride: 128
    drop_remainder: false
    global_batch_size: 32
    input_path: ''
    is_training: false
    query_length: 64
    seq_length: 384
    tokenization: WordPiece
    version_2_with_negative: true
    vocab_file: ''
trainer:
  checkpoint_interval: 500
  max_to_keep: 5
  optimizer_config:
    learning_rate:
      polynomial:
        decay_steps: 8160
        end_learning_rate: 0.0
        initial_learning_rate: 5.0e-05
        power: 1.0
      type: polynomial
    optimizer:
      type: adamw
    warmup:
      polynomial:
        name: polynomial
        power: 1
        warmup_steps: 816
      type: polynomial
  steps_per_loop: 500
  summary_interval: 500
  train_steps: 8160
  validation_interval: 500
  validation_steps: 383
task:
  model:
    cls_heads: [{activation: tanh, cls_token_idx: 0, dropout_rate: 0.1, inner_dim: 768,
                 name: next_sentence, num_classes: 2}]
    generator_encoder:
      bert:
        attention_dropout_rate: 0.1
        dropout_rate: 0.1
        embedding_size: 768
        hidden_activation: gelu
        hidden_size: 256
        initializer_range: 0.02
        intermediate_size: 1024
        max_position_embeddings: 512
        num_attention_heads: 4
        num_layers: 12
        type_vocab_size: 2
        vocab_size: 30522
    num_masked_tokens: 76
    sequence_length: 512
    num_classes: 2
    discriminator_encoder:
      bert:
        attention_dropout_rate: 0.1
        dropout_rate: 0.1
        embedding_size: 768
        hidden_activation: gelu
        hidden_size: 768
        initializer_range: 0.02
        intermediate_size: 3072
        max_position_embeddings: 512
        num_attention_heads: 12
        num_layers: 12
        type_vocab_size: 2
        vocab_size: 30522
    discriminator_loss_weight: 50.0
    disallow_correct: false
    tie_embeddings: true
  train_data:
    drop_remainder: true
    global_batch_size: 256
    input_path: ''
    is_training: true
    max_predictions_per_seq: 76
    seq_length: 512
    use_next_sentence_label: false
    use_position_id: false
  validation_data:
    drop_remainder: true
    global_batch_size: 256
    input_path: ''
    is_training: false
    max_predictions_per_seq: 76
    seq_length: 512
    use_next_sentence_label: false
    use_position_id: false
trainer:
  checkpoint_interval: 6000
  max_to_keep: 50
  optimizer_config:
    learning_rate:
      polynomial:
        cycle: false
        decay_steps: 1000000
        end_learning_rate: 0.0
        initial_learning_rate: 0.0002
        power: 1.0
      type: polynomial
    optimizer:
      type: adamw
    warmup:
      polynomial:
        power: 1
        warmup_steps: 10000
      type: polynomial
  steps_per_loop: 1000
  summary_interval: 1000
  train_steps: 1000000
  validation_interval: 100
  validation_steps: 64
task:
  hub_module_url: ''
  model:
    num_classes: 3
  train_data:
    drop_remainder: true
    global_batch_size: 32
    input_path: ''
    is_training: true
    seq_length: 128
  validation_data:
    drop_remainder: false
    global_batch_size: 32
    input_path: ''
    is_training: false
    seq_length: 128
trainer:
  checkpoint_interval: 1000
  optimizer_config:
    learning_rate:
      polynomial:
        decay_steps: 61359
        end_learning_rate: 0.0
        initial_learning_rate: 1.0e-04
        power: 1.0
      type: polynomial
    optimizer:
      type: adamw
    warmup:
      polynomial:
        power: 1
        warmup_steps: 6136
      type: polynomial
  steps_per_loop: 1000
  summary_interval: 1000
  # Training data size 392,702 examples, 5 epochs.
  train_steps: 61359
  validation_interval: 2000
  # Eval data size = 9815 examples.
  validation_steps: 307
task:
  hub_module_url: ''
  max_answer_length: 30
  n_best_size: 20
  null_score_diff_threshold: 0.0
  train_data:
    drop_remainder: true
    global_batch_size: 48
    input_path: ''
    is_training: true
    seq_length: 384
  validation_data:
    do_lower_case: true
    doc_stride: 128
    drop_remainder: false
    global_batch_size: 48
    input_path: ''
    is_training: false
    query_length: 64
    seq_length: 384
    tokenization: WordPiece
    version_2_with_negative: false
    vocab_file: ''
trainer:
  checkpoint_interval: 500
  max_to_keep: 5
  optimizer_config:
    learning_rate:
      polynomial:
        decay_steps: 9248
        end_learning_rate: 0.0
        initial_learning_rate: 8.0e-05
        power: 1.0
      type: polynomial
    optimizer:
      type: adamw
    warmup:
      polynomial:
        power: 1
        warmup_steps: 925
      type: polynomial
  steps_per_loop: 500
  summary_interval: 500
  train_steps: 9248
  validation_interval: 500
  validation_steps: 226
task:
  hub_module_url: ''
  max_answer_length: 30
  n_best_size: 20
  null_score_diff_threshold: 0.0
  train_data:
    drop_remainder: true
    global_batch_size: 48
    input_path: ''
    is_training: true
    seq_length: 384
  validation_data:
    do_lower_case: true
    doc_stride: 128
    drop_remainder: false
    global_batch_size: 48
    input_path: ''
    is_training: false
    query_length: 64
    seq_length: 384
    tokenization: WordPiece
    version_2_with_negative: true
    vocab_file: ''
trainer:
  checkpoint_interval: 500
  max_to_keep: 5
  optimizer_config:
    learning_rate:
      polynomial:
        decay_steps: 13601
        end_learning_rate: 0.0
        initial_learning_rate: 8.0e-05
        power: 1.0
      type: polynomial
    optimizer:
      type: adamw
    warmup:
      polynomial:
        name: polynomial
        power: 1
        warmup_steps: 1360
      type: polynomial
  steps_per_loop: 500
  summary_interval: 500
  train_steps: 13601
  validation_interval: 500
  validation_steps: 255
task:
  model:
    candidate_size: 5
    num_shared_generator_hidden_layers: 3
    num_discriminator_task_agnostic_layers: 11
    tie_embeddings: true
    generator:
      attention_dropout_rate: 0.1
      dropout_rate: 0.1
      embedding_size: 128
      hidden_activation: gelu
      hidden_size: 256
      initializer_range: 0.02
      intermediate_size: 1024
      max_position_embeddings: 512
      num_attention_heads: 4
      num_layers: 6
      type_vocab_size: 2
      vocab_size: 30522
    discriminator:
      attention_dropout_rate: 0.1
      dropout_rate: 0.1
      embedding_size: 128
      hidden_activation: gelu
      hidden_size: 256
      initializer_range: 0.02
      intermediate_size: 1024
      max_position_embeddings: 512
      num_attention_heads: 4
      num_layers: 12
      type_vocab_size: 2
      vocab_size: 30522
  train_data:
    drop_remainder: true
    global_batch_size: 256
    input_path: ''
    is_training: true
    max_predictions_per_seq: 76
    seq_length: 512
    use_next_sentence_label: false
    use_position_id: false
  validation_data:
    drop_remainder: true
    global_batch_size: 256
    input_path: ''
    is_training: false
    max_predictions_per_seq: 76
    seq_length: 512
    use_next_sentence_label: false
    use_position_id: false
trainer:
  checkpoint_interval: 4000
  max_to_keep: 5
  optimizer_config:
    learning_rate:
      polynomial:
        cycle: false
        decay_steps: 500000
        end_learning_rate: 0.0
        initial_learning_rate: 0.0005
        power: 1.0
      type: polynomial
    optimizer:
      type: adamw
    warmup:
      polynomial:
        power: 1
        warmup_steps: 10000
      type: polynomial
  steps_per_loop: 4000
  summary_interval: 4000
  train_steps: 500000
  validation_interval: 100
  validation_steps: 64
task:
  model:
    encoder:
      bert:
        attention_dropout_rate: 0.1
        dropout_rate: 0.1
        embedding_size: 768
        hidden_activation: gelu
        hidden_size: 768
        initializer_range: 0.02
        intermediate_size: 3072
        max_position_embeddings: 512
        num_attention_heads: 12
        num_layers: 12
        type_vocab_size: 2
        vocab_size: 30522
task:
  model:
    encoder:
      bert:
        attention_dropout_rate: 0.1
        dropout_rate: 0.1
        embedding_size: 128
        hidden_activation: gelu
        hidden_size: 256
        initializer_range: 0.02
        intermediate_size: 1024
        max_position_embeddings: 512
        num_attention_heads: 4
        num_layers: 12
        type_vocab_size: 2
        vocab_size: 30522