Commit e2385734 authored by Kaushik Shivakumar

Merge branch 'master' of https://github.com/tensorflow/models into latest

parents 30c14aa9 1bfb577d
...@@ -10,11 +10,13 @@ can take full advantage of TensorFlow for their research and product development
| [official](official) | • A collection of example implementations for SOTA models using the latest TensorFlow 2's high-level APIs<br />• Officially maintained, supported, and kept up to date with the latest TensorFlow 2 APIs by TensorFlow<br />• Reasonably optimized for fast performance while still being easy to read |
| [research](research) | • A collection of research model implementations in TensorFlow 1 or 2 by researchers<br />• Maintained and supported by researchers |
| [community](community) | • A curated list of the GitHub repositories with machine learning models and implementations powered by TensorFlow 2 |
| [orbit](orbit) | • A flexible and lightweight library that users can easily use or fork when writing customized training loop code in TensorFlow 2.x. It seamlessly integrates with `tf.distribute` and supports running on different device types (CPU, GPU, and TPU). |
## [Announcements](https://github.com/tensorflow/models/wiki/Announcements)
| Date | News |
|------|------|
| July 10, 2020 | TensorFlow 2 meets the [Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) ([Blog](https://blog.tensorflow.org/2020/07/tensorflow-2-meets-object-detection-api.html)) |
| June 30, 2020 | [SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization](https://github.com/tensorflow/models/tree/master/official/vision/detection#train-a-spinenet-49-based-mask-r-cnn) released ([Tweet](https://twitter.com/GoogleAI/status/1278016712978264064)) |
| June 17, 2020 | [Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection](https://github.com/tensorflow/models/tree/master/research/object_detection#june-17th-2020) released ([Tweet](https://twitter.com/GoogleAI/status/1276571419422253057)) |
| May 21, 2020 | [Unifying Deep Local and Global Features for Image Search (DELG)](https://github.com/tensorflow/models/tree/master/research/delf#delg) code released |
...@@ -23,12 +25,6 @@ can take full advantage of TensorFlow for their research and product development
| May 1, 2020 | [DELF: DEep Local Features](https://github.com/tensorflow/models/tree/master/research/delf) updated to support TensorFlow 2.1 |
| March 31, 2020 | [Introducing the Model Garden for TensorFlow 2](https://blog.tensorflow.org/2020/03/introducing-model-garden-for-tensorflow-2.html) ([Tweet](https://twitter.com/TensorFlow/status/1245029834633297921)) |
## [Milestones](https://github.com/tensorflow/models/milestones)
| Date | Milestone |
|------|-----------|
| July 8, 2020 | [![GitHub milestone](https://img.shields.io/github/milestones/progress/tensorflow/models/1)](https://github.com/tensorflow/models/milestone/1) |
## Contributions
[![help wanted:paper implementation](https://img.shields.io/github/issues/tensorflow/models/help%20wanted%3Apaper%20implementation)](https://github.com/tensorflow/models/labels/help%20wanted%3Apaper%20implementation)
......
...@@ -93,8 +93,11 @@ class Unet3DAccuracyBenchmark(keras_benchmark.KerasBenchmark):
"""Runs and reports the benchmark given the provided configuration."""
params = unet_training_lib.extract_params(FLAGS)
strategy = unet_training_lib.create_distribution_strategy(params)
if params.use_bfloat16:
policy = tf.keras.mixed_precision.experimental.Policy('mixed_bfloat16')
input_dtype = params.dtype
if input_dtype == 'float16' or input_dtype == 'bfloat16':
policy = tf.keras.mixed_precision.experimental.Policy(
'mixed_bfloat16' if input_dtype == 'bfloat16' else 'mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)
stats = {}
......
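For reference, a minimal sketch of the dtype-driven policy selection this hunk introduces, written against the `tf.keras.mixed_precision.experimental` API used above (the `dtype` string stands in for `params.dtype`):

```python
import tensorflow as tf

def maybe_set_mixed_precision_policy(dtype: str):
  """Enables a global mixed-precision policy for 'float16' or 'bfloat16'."""
  if dtype in ('float16', 'bfloat16'):
    name = 'mixed_bfloat16' if dtype == 'bfloat16' else 'mixed_float16'
    policy = tf.keras.mixed_precision.experimental.Policy(name)
    tf.keras.mixed_precision.experimental.set_policy(policy)

maybe_set_mixed_precision_policy('bfloat16')  # e.g. params.dtype on TPU runs
```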
...@@ -126,10 +126,10 @@ class Config(params_dict.ParamsDict):
subconfig_type = Config
if k in cls.__annotations__:
# Directly Config subtype.
type_annotation = cls.__annotations__[k]  # pytype: disable=invalid-annotation
if (isinstance(type_annotation, type) and
issubclass(type_annotation, Config)):
subconfig_type = cls.__annotations__[k]  # pytype: disable=invalid-annotation
else:
# Check if the field is a sequence of subtypes.
field_type = getattr(type_annotation, '__origin__', type(None))
......
...@@ -20,6 +20,20 @@ import dataclasses
from official.modeling.hyperparams import base_config
@dataclasses.dataclass
class ConstantLrConfig(base_config.Config):
"""Configuration for constant learning rate.
This class is a container for the constant learning rate configs.
Attributes:
name: The name of the learning rate schedule. Defaults to Constant.
learning_rate: A float. The learning rate. Defaults to 0.1.
"""
name: str = 'Constant'
learning_rate: float = 0.1
@dataclasses.dataclass
class StepwiseLrConfig(base_config.Config):
"""Configuration for stepwise learning rate decay.
......
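A short usage sketch of the new `ConstantLrConfig`; the import path below is an assumption, and `as_dict()` comes from the `base_config.Config`/`ParamsDict` machinery shown in the hunk above:

```python
# Assumed import path for the learning-rate configs referenced as `lr_cfg` later in this diff.
from official.modeling.optimization.configs import learning_rate_config as lr_cfg

constant = lr_cfg.ConstantLrConfig(learning_rate=0.05)  # override the 0.1 default
print(constant.name)       # 'Constant'
print(constant.as_dict())  # e.g. {'name': 'Constant', 'learning_rate': 0.05}
```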
...@@ -55,12 +55,14 @@ class LrConfig(oneof.OneOfConfig):
Attributes:
type: 'str', type of lr schedule to be used, one of the fields below.
constant: constant learning rate config.
stepwise: stepwise learning rate config.
exponential: exponential learning rate config.
polynomial: polynomial learning rate config.
cosine: cosine learning rate config.
"""
type: Optional[str] = None
constant: lr_cfg.ConstantLrConfig = lr_cfg.ConstantLrConfig()
stepwise: lr_cfg.StepwiseLrConfig = lr_cfg.StepwiseLrConfig()
exponential: lr_cfg.ExponentialLrConfig = lr_cfg.ExponentialLrConfig()
polynomial: lr_cfg.PolynomialLrConfig = lr_cfg.PolynomialLrConfig()
......
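For context, a sketch of how the one-of resolution works with the new `constant` branch; the `type`/`get()` accessors are the same ones the optimizer factory uses later in this diff:

```python
# Hypothetical: pick the constant schedule through the one-of config.
lr_config = LrConfig(type='constant')
active = lr_config.get()       # resolves to the ConstantLrConfig instance
print(lr_config.type)          # 'constant'
print(active.learning_rate)    # 0.1, the default from ConstantLrConfig
```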
...@@ -28,13 +28,11 @@ class SGDConfig(base_config.Config):
Attributes:
name: name of the optimizer.
learning_rate: learning_rate for SGD optimizer.
decay: decay rate for SGD optimizer.
nesterov: nesterov for SGD optimizer.
momentum: momentum for SGD optimizer.
"""
name: str = "SGD"
learning_rate: float = 0.01
decay: float = 0.0
nesterov: bool = False
momentum: float = 0.0
...@@ -49,14 +47,12 @@ class RMSPropConfig(base_config.Config):
Attributes:
name: name of the optimizer.
learning_rate: learning_rate for RMSprop optimizer.
rho: discounting factor for RMSprop optimizer.
momentum: momentum for RMSprop optimizer.
epsilon: epsilon value for RMSprop optimizer, helps with numerical stability.
centered: Whether to normalize gradients or not.
"""
name: str = "RMSprop"
learning_rate: float = 0.001
rho: float = 0.9
momentum: float = 0.0
epsilon: float = 1e-7
...@@ -72,7 +68,6 @@ class AdamConfig(base_config.Config):
Attributes:
name: name of the optimizer.
learning_rate: learning_rate for Adam optimizer.
beta_1: decay rate for 1st order moments.
beta_2: decay rate for 2nd order moments.
epsilon: epsilon value used for numerical stability in Adam optimizer.
...@@ -80,7 +75,6 @@ class AdamConfig(base_config.Config):
the paper "On the Convergence of Adam and beyond".
"""
name: str = "Adam"
learning_rate: float = 0.001
beta_1: float = 0.9
beta_2: float = 0.999
epsilon: float = 1e-07
...@@ -93,7 +87,6 @@ class AdamWeightDecayConfig(base_config.Config):
Attributes:
name: name of the optimizer.
learning_rate: learning_rate for the optimizer.
beta_1: decay rate for 1st order moments.
beta_2: decay rate for 2nd order moments.
epsilon: epsilon value used for numerical stability in the optimizer.
...@@ -106,7 +99,6 @@ class AdamWeightDecayConfig(base_config.Config):
include in weight decay.
"""
name: str = "AdamWeightDecay"
learning_rate: float = 0.001
beta_1: float = 0.9
beta_2: float = 0.999
epsilon: float = 1e-07
...@@ -125,7 +117,6 @@ class LAMBConfig(base_config.Config):
Attributes:
name: name of the optimizer.
learning_rate: learning_rate for Adam optimizer.
beta_1: decay rate for 1st order moments.
beta_2: decay rate for 2nd order moments.
epsilon: epsilon value used for numerical stability in LAMB optimizer.
...@@ -139,7 +130,6 @@ class LAMBConfig(base_config.Config):
be excluded.
"""
name: str = "LAMB"
learning_rate: float = 0.001
beta_1: float = 0.9
beta_2: float = 0.999
epsilon: float = 1e-6
......
...@@ -60,7 +60,7 @@ class OptimizerFactory(object):
params = {
'optimizer': {
'type': 'sgd',
'sgd': {'learning_rate': 0.1, 'momentum': 0.9}
'sgd': {'momentum': 0.9}
},
'learning_rate': {
'type': 'stepwise',
...@@ -88,12 +88,15 @@ class OptimizerFactory(object):
self._optimizer_config = config.optimizer.get()
self._optimizer_type = config.optimizer.type
if self._optimizer_config is None:
if self._optimizer_type is None:
raise ValueError('Optimizer type must be specified')
self._lr_config = config.learning_rate.get()
self._lr_type = config.learning_rate.type
if self._lr_type is None:
raise ValueError('Learning rate type must be specified')
self._warmup_config = config.warmup.get()
self._warmup_type = config.warmup.type
...@@ -101,18 +104,15 @@ class OptimizerFactory(object):
"""Build learning rate.
Builds learning rate from config. Learning rate schedule is built according
to the learning rate config. If there is no learning rate config, optimizer
learning rate is returned.
to the learning rate config. If learning rate type is constant,
lr_config.learning_rate is returned.
Returns:
tf.keras.optimizers.schedules.LearningRateSchedule instance. If no
learning rate schedule defined, optimizer_config.learning_rate is
returned.
tf.keras.optimizers.schedules.LearningRateSchedule instance. If
learning rate type is constant, lr_config.learning_rate is returned.
"""
# TODO(arashwan): Explore if we want to only allow explicit const lr sched.
if not self._lr_config:
lr = self._optimizer_config.learning_rate
if self._lr_type == 'constant':
lr = self._lr_config.learning_rate
else:
lr = LR_CLS[self._lr_type](**self._lr_config.as_dict())
......
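Putting the factory changes together, a hedged end-to-end sketch using the new layout, where the SGD learning rate now lives under `learning_rate` rather than the optimizer config (import paths and the `build_learning_rate`/`build_optimizer` method names are assumed from the docstring and the tests below):

```python
from official.modeling.optimization import optimizer_factory
from official.modeling.optimization.configs import optimization_config

params = {
    'optimizer': {
        'type': 'sgd',
        'sgd': {'momentum': 0.9},            # no learning_rate key anymore
    },
    'learning_rate': {
        'type': 'constant',
        'constant': {'learning_rate': 0.1},
    },
}
opt_config = optimization_config.OptimizationConfig(params)
opt_factory = optimizer_factory.OptimizerFactory(opt_config)
lr = opt_factory.build_learning_rate()       # 0.1 for the 'constant' type
optimizer = opt_factory.build_optimizer(lr)
```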
...@@ -35,10 +35,17 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase):
params = {
'optimizer': {
'type': optimizer_type
},
'learning_rate': {
'type': 'constant',
'constant': {
'learning_rate': 0.1
}
}
}
optimizer_cls = optimizer_factory.OPTIMIZERS_CLS[optimizer_type]
expected_optimizer_config = optimizer_cls().get_config()
expected_optimizer_config['learning_rate'] = 0.1
opt_config = optimization_config.OptimizationConfig(params)
opt_factory = optimizer_factory.OptimizerFactory(opt_config)
...@@ -48,11 +55,32 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase):
self.assertIsInstance(optimizer, optimizer_cls)
self.assertEqual(expected_optimizer_config, optimizer.get_config())
def test_missing_types(self):
params = {
'optimizer': {
'type': 'sgd',
'sgd': {'momentum': 0.9}
}
}
with self.assertRaises(ValueError):
optimizer_factory.OptimizerFactory(
optimization_config.OptimizationConfig(params))
params = {
'learning_rate': {
'type': 'stepwise',
'stepwise': {'boundaries': [10000, 20000],
'values': [0.1, 0.01, 0.001]}
}
}
with self.assertRaises(ValueError):
optimizer_factory.OptimizerFactory(
optimization_config.OptimizationConfig(params))
def test_stepwise_lr_schedule(self):
params = {
'optimizer': {
'type': 'sgd',
'sgd': {'learning_rate': 0.1, 'momentum': 0.9}
'sgd': {'momentum': 0.9}
},
'learning_rate': {
'type': 'stepwise',
...@@ -79,7 +107,7 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase):
params = {
'optimizer': {
'type': 'sgd',
'sgd': {'learning_rate': 0.1, 'momentum': 0.9}
'sgd': {'momentum': 0.9}
},
'learning_rate': {
'type': 'stepwise',
...@@ -112,7 +140,7 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase):
params = {
'optimizer': {
'type': 'sgd',
'sgd': {'learning_rate': 0.1, 'momentum': 0.9}
'sgd': {'momentum': 0.9}
},
'learning_rate': {
'type': 'exponential',
...@@ -142,7 +170,7 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase):
params = {
'optimizer': {
'type': 'sgd',
'sgd': {'learning_rate': 0.1, 'momentum': 0.9}
'sgd': {'momentum': 0.9}
},
'learning_rate': {
'type': 'polynomial',
...@@ -166,7 +194,7 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase):
params = {
'optimizer': {
'type': 'sgd',
'sgd': {'learning_rate': 0.1, 'momentum': 0.9}
'sgd': {'momentum': 0.9}
},
'learning_rate': {
'type': 'cosine',
...@@ -192,7 +220,13 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase):
params = {
'optimizer': {
'type': 'sgd',
'sgd': {'learning_rate': 0.1, 'momentum': 0.9}
'sgd': {'momentum': 0.9}
},
'learning_rate': {
'type': 'constant',
'constant': {
'learning_rate': 0.1
}
},
'warmup': {
'type': 'linear',
...@@ -216,7 +250,7 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase):
params = {
'optimizer': {
'type': 'sgd',
'sgd': {'learning_rate': 0.1, 'momentum': 0.9}
'sgd': {'momentum': 0.9}
},
'learning_rate': {
'type': 'stepwise',
......
...@@ -88,7 +88,6 @@ def is_special_none_tensor(tensor):
return tensor.shape.ndims == 0 and tensor.dtype == tf.int32
# TODO(hongkuny): consider moving custom string-map lookup to keras api.
def get_activation(identifier):
"""Maps an identifier to a Python function, e.g., "relu" => `tf.nn.relu`.
......
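The docstring above describes a plain string-to-callable lookup; a tiny illustrative use (the `official.modeling.tf_utils` import path is taken from the encoder config hunk later in this diff):

```python
import tensorflow as tf
from official.modeling import tf_utils

act = tf_utils.get_activation('relu')    # a callable implementing ReLU
print(act(tf.constant([-1.0, 2.0])))     # [0. 2.]
```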
...@@ -14,23 +14,61 @@
# ==============================================================================
"""ALBERT classification finetuning runner in tf2.x."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import json
import os
from absl import app
from absl import flags
from absl import logging
import tensorflow as tf
from official.nlp.albert import configs as albert_configs
from official.nlp.bert import bert_models
from official.nlp.bert import run_classifier as run_classifier_bert
from official.utils.misc import distribution_utils
FLAGS = flags.FLAGS
def predict(strategy, albert_config, input_meta_data, predict_input_fn):
"""Function outputs both the ground truth predictions as .tsv files."""
with strategy.scope():
classifier_model = bert_models.classifier_model(
albert_config, input_meta_data['num_labels'])[0]
checkpoint = tf.train.Checkpoint(model=classifier_model)
latest_checkpoint_file = (
FLAGS.predict_checkpoint_path or
tf.train.latest_checkpoint(FLAGS.model_dir))
assert latest_checkpoint_file
logging.info('Checkpoint file %s found and restoring from '
'checkpoint', latest_checkpoint_file)
checkpoint.restore(
latest_checkpoint_file).assert_existing_objects_matched()
preds, ground_truth = run_classifier_bert.get_predictions_and_labels(
strategy, classifier_model, predict_input_fn, return_probs=True)
output_predict_file = os.path.join(FLAGS.model_dir, 'test_results.tsv')
with tf.io.gfile.GFile(output_predict_file, 'w') as writer:
logging.info('***** Predict results *****')
for probabilities in preds:
output_line = '\t'.join(
str(class_probability)
for class_probability in probabilities) + '\n'
writer.write(output_line)
ground_truth_labels_file = os.path.join(FLAGS.model_dir,
'output_labels.tsv')
with tf.io.gfile.GFile(ground_truth_labels_file, 'w') as writer:
logging.info('***** Ground truth results *****')
for label in ground_truth:
output_line = '\t'.join(str(label)) + '\n'
writer.write(output_line)
return
def main(_):
with tf.io.gfile.GFile(FLAGS.input_meta_data_path, 'rb') as reader:
input_meta_data = json.loads(reader.read().decode('utf-8'))
...@@ -56,9 +94,14 @@ def main(_):
albert_config = albert_configs.AlbertConfig.from_json_file(
FLAGS.bert_config_file)
if FLAGS.mode == 'train_and_eval':
run_classifier_bert.run_bert(strategy, input_meta_data, albert_config,
train_input_fn, eval_input_fn)
elif FLAGS.mode == 'predict':
predict(strategy, albert_config, input_meta_data, eval_input_fn)
else:
raise ValueError('Unsupported mode is specified: %s' % FLAGS.mode)
return
if __name__ == '__main__':
flags.mark_flag_as_required('bert_config_file')
......
...@@ -79,7 +79,7 @@ def export_bert_tfhub(bert_config: configs.BertConfig,
do_lower_case, vocab_file)
core_model, encoder = create_bert_model(bert_config)
checkpoint = tf.train.Checkpoint(model=encoder)
checkpoint.restore(model_checkpoint_path).assert_consumed()
checkpoint.restore(model_checkpoint_path).assert_existing_objects_matched()
core_model.vocab_file = tf.saved_model.Asset(vocab_file)
core_model.do_lower_case = tf.Variable(do_lower_case, trainable=False)
core_model.save(hub_destination, include_optimizer=False, save_format="tf")
......
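The one-line change above relaxes the restore check: `assert_existing_objects_matched()` only requires that every object in the restoring graph found a value, whereas `assert_consumed()` also fails if the checkpoint holds entries (e.g. optimizer slots) that the freshly built encoder never uses. A standalone sketch of the distinction with plain `tf.train.Checkpoint` objects:

```python
import tensorflow as tf

# Checkpoint with an extra variable the restoring object will not declare.
src = tf.train.Checkpoint(w=tf.Variable(1.0), extra=tf.Variable(2.0))
path = src.save('/tmp/ckpt_demo')

dst = tf.train.Checkpoint(w=tf.Variable(0.0))
status = dst.restore(path)
status.assert_existing_objects_matched()  # passes: everything in `dst` matched
# status.assert_consumed()                # would raise: `extra` was never restored
```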
...@@ -559,7 +559,6 @@ def run_customized_training_loop(
for metric in model.metrics:
training_summary[metric.name] = _float_metric_value(metric)
if eval_metrics:
# TODO(hongkuny): Cleans up summary reporting in text.
training_summary['last_train_metrics'] = _float_metric_value(
train_metrics[0])
training_summary['eval_metrics'] = _float_metric_value(eval_metrics[0])
......
...@@ -17,8 +17,8 @@
Includes configurations and instantiation methods.
"""
import dataclasses
import gin
import tensorflow as tf
from official.modeling import tf_utils
...@@ -42,10 +42,43 @@ class TransformerEncoderConfig(base_config.Config):
initializer_range: float = 0.02
def instantiate_encoder_from_cfg(
config: TransformerEncoderConfig) -> networks.TransformerEncoder:
@gin.configurable
def instantiate_encoder_from_cfg(config: TransformerEncoderConfig,
encoder_cls=networks.TransformerEncoder):
"""Instantiate a Transformer encoder network from TransformerEncoderConfig."""
encoder_network = networks.TransformerEncoder(
if encoder_cls.__name__ == "EncoderScaffold":
embedding_cfg = dict(
vocab_size=config.vocab_size,
type_vocab_size=config.type_vocab_size,
hidden_size=config.hidden_size,
seq_length=None,
max_seq_length=config.max_position_embeddings,
initializer=tf.keras.initializers.TruncatedNormal(
stddev=config.initializer_range),
dropout_rate=config.dropout_rate,
)
hidden_cfg = dict(
num_attention_heads=config.num_attention_heads,
intermediate_size=config.intermediate_size,
intermediate_activation=tf_utils.get_activation(
config.hidden_activation),
dropout_rate=config.dropout_rate,
attention_dropout_rate=config.attention_dropout_rate,
kernel_initializer=tf.keras.initializers.TruncatedNormal(
stddev=config.initializer_range),
)
kwargs = dict(
embedding_cfg=embedding_cfg,
hidden_cfg=hidden_cfg,
num_hidden_instances=config.num_layers,
pooled_output_dim=config.hidden_size,
pooler_layer_initializer=tf.keras.initializers.TruncatedNormal(
stddev=config.initializer_range))
return encoder_cls(**kwargs)
if encoder_cls.__name__ != "TransformerEncoder":
raise ValueError("Unknown encoder network class. %s" % str(encoder_cls))
encoder_network = encoder_cls(
vocab_size=config.vocab_size,
hidden_size=config.hidden_size,
num_layers=config.num_layers,
......
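A hedged sketch of the new `encoder_cls` hook: the same `TransformerEncoderConfig` can now drive either the default `TransformerEncoder` or an `EncoderScaffold`, and with `@gin.configurable` the class can also be swapped from a gin file. The config module path below is an assumption; the class and argument names come from the hunk itself:

```python
from official.nlp.modeling import networks
# Assumed location of TransformerEncoderConfig / instantiate_encoder_from_cfg.
from official.nlp.configs import encoders

cfg = encoders.TransformerEncoderConfig(num_layers=4, hidden_size=256)

# Default path: a networks.TransformerEncoder.
default_encoder = encoders.instantiate_encoder_from_cfg(cfg)

# New path: the same config expands into EncoderScaffold kwargs.
scaffold_encoder = encoders.instantiate_encoder_from_cfg(
    cfg, encoder_cls=networks.EncoderScaffold)
```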
...@@ -51,7 +51,8 @@ flags.DEFINE_string(
flags.DEFINE_enum("classification_task_name", "MNLI",
["COLA", "MNLI", "MRPC", "PAWS-X", "QNLI", "QQP", "RTE",
"SST-2", "WNLI", "XNLI", "XTREME-XNLI", "XTREME-PAWS-X"],
"SST-2", "STS-B", "WNLI", "XNLI", "XTREME-XNLI",
"XTREME-PAWS-X"],
"The name of the task to train BERT classifier. The " "The name of the task to train BERT classifier. The "
"difference between XTREME-XNLI and XNLI is: 1. the format " "difference between XTREME-XNLI and XNLI is: 1. the format "
"of input tsv files; 2. the dev set for XTREME is english " "of input tsv files; 2. the dev set for XTREME is english "
...@@ -187,6 +188,8 @@ def generate_classifier_dataset(): ...@@ -187,6 +188,8 @@ def generate_classifier_dataset():
"rte": classifier_data_lib.RteProcessor, "rte": classifier_data_lib.RteProcessor,
"sst-2": "sst-2":
classifier_data_lib.SstProcessor, classifier_data_lib.SstProcessor,
"sts-b":
classifier_data_lib.StsBProcessor,
"xnli": "xnli":
functools.partial(classifier_data_lib.XnliProcessor, functools.partial(classifier_data_lib.XnliProcessor,
language=FLAGS.xnli_language), language=FLAGS.xnli_language),
......
...@@ -28,6 +28,7 @@ class TaggingDataConfig(cfg.DataConfig):
"""Data config for tagging (tasks/tagging)."""
is_training: bool = True
seq_length: int = 128
include_sentence_id: bool = False
@data_loader_factory.register_data_loader_cls(TaggingDataConfig)
...@@ -37,6 +38,7 @@ class TaggingDataLoader:
def __init__(self, params: TaggingDataConfig):
self._params = params
self._seq_length = params.seq_length
self._include_sentence_id = params.include_sentence_id
def _decode(self, record: tf.Tensor):
"""Decodes a serialized tf.Example."""
...@@ -46,6 +48,9 @@ class TaggingDataLoader:
'segment_ids': tf.io.FixedLenFeature([self._seq_length], tf.int64),
'label_ids': tf.io.FixedLenFeature([self._seq_length], tf.int64),
}
if self._include_sentence_id:
name_to_features['sentence_id'] = tf.io.FixedLenFeature([], tf.int64)
example = tf.io.parse_single_example(record, name_to_features)
# tf.Example only supports tf.int64, but the TPU only supports tf.int32.
...@@ -65,6 +70,8 @@ class TaggingDataLoader:
'input_mask': record['input_mask'],
'input_type_ids': record['segment_ids']
}
if self._include_sentence_id:
x['sentence_id'] = record['sentence_id']
y = record['label_ids']
return (x, y)
......
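A brief sketch of the new flag: with `include_sentence_id=True` the loader parses an extra scalar `sentence_id` feature and passes it through in `x`. The `input_path` field is assumed to come from the base `cfg.DataConfig`:

```python
# Hypothetical construction of the updated config and loader.
config = TaggingDataConfig(
    input_path='/path/to/train.tf_record',  # assumed base DataConfig field
    seq_length=128,
    include_sentence_id=True)
loader = TaggingDataLoader(config)
# Parsed examples now expose x['sentence_id'] next to input_mask / input_type_ids.
```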
# NLP Modeling Library
This library provides a set of Keras primitives (Layers, Networks, and Models)
that can be assembled into transformer-based models. They are
flexible, validated, interoperable, and both TF1 and TF2 compatible.
...@@ -16,6 +16,11 @@ standardized configuration.
* [`losses`](losses) contains common loss computation used in NLP tasks.
Please see the colab [nlp_modeling_library_intro.ipynb](https://colab.sandbox.google.com/github/tensorflow/models/blob/master/official/colab/nlp/nlp_modeling_library_intro.ipynb) for how to build transformer-based NLP models using the above primitives.
Besides the pre-defined primitives, it also provides scaffold classes to allow
easy experimentation with novel architectures, e.g., you don’t need to fork a whole Transformer object to try a different kind of attention primitive.
...@@ -33,11 +38,9 @@ embedding subnetwork (which will replace the standard embedding logic) and/or a
custom hidden layer (which will replace the Transformer instantiation in the
encoder).
Please see the colab [customize_encoder.ipynb](https://colab.sandbox.google.com/github/tensorflow/models/blob/master/official/colab/nlp/customize_encoder.ipynb) for how to use scaffold classes to build novel architectures.
BERT and ALBERT models in this repo are implemented using this library. Code examples can be found in the corresponding model folder.
...@@ -3,11 +3,6 @@
Layers are the fundamental building blocks for NLP models. They can be used to
assemble new layers, networks, or models.
* [DenseEinsum](dense_einsum.py) implements a feedforward network using
tf.einsum. This layer contains the einsum op, the associated weight, and the
logic required to generate the einsum expression for the given
initialization parameters.
* [MultiHeadAttention](attention.py) implements an optionally masked attention
between query, key, value tensors as described in
["Attention Is All You Need"](https://arxiv.org/abs/1706.03762). If
......