Merge branch 'master' into amp_resnet50

a35e09d2 · Vinh Nguyen · GitHub · d5722dcd · 1f5a5e9d · a35e09d2
Unverified Commit a35e09d2 authored Aug 28, 2019 by Vinh Nguyen Committed by GitHub Aug 28, 2019
20 changed files
--- a/official/README.md
+++ b/official/README.md
 # TensorFlow Official Models

-The TensorFlow official models are a collection of example models that use TensorFlow's high-level APIs. They are intended to be well-maintained, tested, and kept up to date with the latest TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read.
+The TensorFlow official models are a collection of example models that use
+TensorFlow's high-level APIs. They are intended to be well-maintained, tested,
+and kept up to date with the latest TensorFlow API. They should also be
+reasonably optimized for fast performance while still being easy to read.

-These models are used as end-to-end tests, ensuring that the models run with the same speed and performance with each new TensorFlow build.
+These models are used as end-to-end tests, ensuring that the models run with the
+same speed and performance with each new TensorFlow build.

 ## Tensorflow releases
-The master branch of the models are **in development**, and they target the [nightly binaries](https://github.com/tensorflow/tensorflow#installation) built from the [master branch of TensorFlow](https://github.com/tensorflow/tensorflow/tree/master). We aim to keep them backwards compatible with the latest release when possible (currently TensorFlow 1.5), but we cannot always guarantee compatibility.

-**Stable versions** of the official models targeting releases of TensorFlow are available as tagged branches or [downloadable releases](https://github.com/tensorflow/models/releases). Model repository version numbers match the target TensorFlow release, such that [branch r1.4.0](https://github.com/tensorflow/models/tree/r1.4.0) and [release v1.4.0](https://github.com/tensorflow/models/releases/tag/v1.4.0) are compatible with [TensorFlow v1.4.0](https://github.com/tensorflow/tensorflow/releases/tag/v1.4.0).
-
-If you are on a version of TensorFlow earlier than 1.4, please [update your installation](https://www.tensorflow.org/install/).
+The master branch of the models is **in development**, and it targets the
+[nightly binaries](https://github.com/tensorflow/tensorflow#installation) built
+from the
+[master branch of TensorFlow](https://github.com/tensorflow/tensorflow/tree/master).
+We aim to keep them backwards compatible with the latest release when possible
+(currently TensorFlow 1.5), but we cannot always guarantee compatibility.
+
+**Stable versions** of the official models targeting releases of TensorFlow are
+available as tagged branches or
+[downloadable releases](https://github.com/tensorflow/models/releases). Model
+repository version numbers match the target TensorFlow release, such that
+[branch r1.4.0](https://github.com/tensorflow/models/tree/r1.4.0) and
+[release v1.4.0](https://github.com/tensorflow/models/releases/tag/v1.4.0) are
+compatible with
+[TensorFlow v1.4.0](https://github.com/tensorflow/tensorflow/releases/tag/v1.4.0).
+
+If you are on a version of TensorFlow earlier than 1.4, please
+[update your installation](https://www.tensorflow.org/install/).

 ## Requirements
-Please follow the below steps before running models in this repo:
-
-
-1. TensorFlow [nightly binaries](https://github.com/tensorflow/tensorflow#installation)

-2. Add the top-level ***/models*** folder to the Python path with the command:
-   ```
-   export PYTHONPATH="$PYTHONPATH:/path/to/models"
-   ```
+Please follow the below steps before running models in this repo:

-   Using Colab:
-   ```
-   import os
-   os.environ['PYTHONPATH'] += ":/path/to/models"
-   ```
+1.  TensorFlow
+    [nightly binaries](https://github.com/tensorflow/tensorflow#installation)

-3. Install dependencies:
-   ```
-   pip3 install --user -r official/requirements.txt
-   ```
-   or
-   ```
-   pip install --user -r official/requirements.txt
-   ```
+2.  Add the top-level ***/models*** folder to the Python path with the command:
+    `export PYTHONPATH="$PYTHONPATH:/path/to/models"`

+    Using Colab: `import os; os.environ['PYTHONPATH'] += ":/path/to/models"`

-To make Official Models easier to use, we are planning to create a pip installable Official Models package. This is being tracked in [#917](https://github.com/tensorflow/models/issues/917).
+3.  Install dependencies: `pip3 install --user -r official/requirements.txt` or
+    `pip install --user -r official/requirements.txt`

+To make Official Models easier to use, we are planning to create a pip
+installable Official Models package. This is being tracked in
+[#917](https://github.com/tensorflow/models/issues/917).

 ## Available models

-**NOTE:** Please make sure to follow the steps in the [Requirements](#requirements) section.
+**NOTE:** Please make sure to follow the steps in the
+[Requirements](#requirements) section.

-* [bert](bert): A powerful pre-trained language representation model: BERT, which
-  stands for Bidirectional Encoder Representations from Transformers.
-* [mnist](mnist): A basic model to classify digits from the MNIST dataset.
-* [resnet](resnet): A deep residual network that can be used to classify both CIFAR-10 and ImageNet's dataset of 1000 classes.
-* [transformer](transformer): A transformer model to translate the WMT English to German dataset.
-* [wide_deep](wide_deep): A model that combines a wide model and deep network to classify census income data.
-* More models to come!
+*   [bert](bert): A powerful pre-trained language representation model: BERT,
+    which stands for Bidirectional Encoder Representations from Transformers.
+*   [mnist](mnist): A basic model to classify digits from the MNIST dataset.
+*   [resnet](vision/image_classification): A deep residual network that can be
+    used to classify both CIFAR-10 and ImageNet's dataset of 1000 classes.
+*   [transformer](transformer): A transformer model to translate the WMT English
+    to German dataset.
+*   [ncf](recommendation): Neural Collaborative Filtering model for
+    recommendation tasks.

 Models that will not update to TensorFlow 2.x stay inside R1 directory:

-* [boosted_trees](r1/boosted_trees): A Gradient Boosted Trees model to classify
-  higgs boson process from HIGGS Data Set.
+*   [boosted_trees](r1/boosted_trees): A Gradient Boosted Trees model to
+    classify higgs boson process from HIGGS Data Set.
+*   [wide_deep](r1/wide_deep): A model that combines a wide model and deep
+    network to classify census income data.
+
+## More models to come!
+
+We are in the process of revamping the official model garden with TensorFlow
+2.0 and Keras. In the near future, we will bring:

+*   State-of-the-art language understanding models: XLNet, GPT2, and more
+    members of the Transformer family.
+*   State-of-the-art image classification models: EfficientNet, MnasNet and
+    variants.
+*   A set of excellent object detection models.

-If you would like to make any fixes or improvements to the models, please [submit a pull request](https://github.com/tensorflow/models/compare).
+If you would like to make any fixes or improvements to the models, please
+[submit a pull request](https://github.com/tensorflow/models/compare).

 ## New Models

-The team is actively working to add new models to the repository. Every model should follow the following guidelines, to uphold the
-our objectives of readable, usable, and maintainable code.
+The team is actively working to add new models to the repository. Every model
+should follow the guidelines below to uphold our objectives of
+readable, usable, and maintainable code.

-**General guidelines**
-* Code should be well documented and tested.
-* Runnable from a blank environment with relative ease.
-* Trainable on: single GPU/CPU (baseline), multiple GPUs, TPU
-* Compatible with Python 2 and 3 (using [six](https://pythonhosted.org/six/) when necessary)
-* Conform to [Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)
+**General guidelines**
+
+*   Code should be well documented and tested.
+*   Runnable from a blank environment with relative ease.
+*   Trainable on: single GPU/CPU (baseline), multiple GPUs, TPU
+*   Compatible with Python 2 and 3 (using [six](https://pythonhosted.org/six/)
+    when necessary)
+*   Conform to
+    [Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)

 **Implementation guidelines**

-These guidelines exist so the model implementations are consistent for better readability and maintainability.
+These guidelines exist so the model implementations are consistent for better
+readability and maintainability.

-* Use [common utility functions](utils)
-* Export SavedModel at the end of training.
-* Consistent flags and flag-parsing library ([read more here](utils/flags/guidelines.md))
-* Produce benchmarks and logs ([read more here](utils/logs/guidelines.md))
+*   Use [common utility functions](utils)
+*   Export SavedModel at the end of training.
+*   Consistent flags and flag-parsing library
+    ([read more here](utils/flags/guidelines.md))
+*   Produce benchmarks and logs ([read more here](utils/logs/guidelines.md))
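
Note on the Colab step in the README above: `os.environ['PYTHONPATH'] += ...` raises `KeyError` when `PYTHONPATH` is not already set, and changing the environment variable does not alter `sys.path` of the interpreter that is already running. A more defensive variant might look like the sketch below. `add_models_to_path` is a hypothetical helper that is not part of the repository, and `/path/to/models` is the same placeholder the README uses.

```python
import os
import sys


def add_models_to_path(models_dir):
  """Makes `models_dir` importable now and in child processes."""
  # Affects imports in the running interpreter (e.g. a Colab kernel).
  if models_dir not in sys.path:
    sys.path.append(models_dir)
  # Affects subprocesses started later; tolerates an unset PYTHONPATH.
  current = os.environ.get('PYTHONPATH', '')
  parts = [p for p in current.split(os.pathsep) if p]
  if models_dir not in parts:
    parts.append(models_dir)
  os.environ['PYTHONPATH'] = os.pathsep.join(parts)


add_models_to_path('/path/to/models')
```

Calling the helper twice is harmless because both updates check for an existing entry first.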
--- a/official/benchmark/keras_imagenet_benchmark.py
+++ b/official/benchmark/keras_imagenet_benchmark.py
@@ -1043,7 +1043,7 @@ class Resnet50MultiWorkerKerasBenchmark(Resnet50KerasBenchmarkBase):


 class Resnet50MultiWorkerKerasBenchmarkSynth(Resnet50MultiWorkerKerasBenchmark):
-  """Resnet50 multi-worker synthetic benchmark tests."""
+  """Resnet50 multi-worker synthetic data benchmark tests."""

  def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
    def_flags = {}
@@ -1057,5 +1057,20 @@ class Resnet50MultiWorkerKerasBenchmarkSynth(Resnet50MultiWorkerKerasBenchmark):
        output_dir=output_dir, default_flags=def_flags)


+class Resnet50MultiWorkerKerasBenchmarkReal(Resnet50MultiWorkerKerasBenchmark):
+  """Resnet50 multi-worker real data benchmark tests."""
+
+  def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
+    def_flags = {}
+    def_flags['skip_eval'] = True
+    def_flags['report_accuracy_metrics'] = False
+    def_flags['data_dir'] = os.path.join(root_data_dir, 'imagenet')
+    def_flags['train_steps'] = 110
+    def_flags['log_steps'] = 10
+
+    super(Resnet50MultiWorkerKerasBenchmarkReal, self).__init__(
+        output_dir=output_dir, default_flags=def_flags)
+
+
 if __name__ == '__main__':
  tf.test.main()
--- a/official/utils/data/__init__.py
+++ b/official/utils/data/__init__.py
--- a/official/vision/image_classification/trivial_model.py
+++ b/official/vision/image_classification/trivial_model.py
--- a/official/bert/benchmark/bert_squad_benchmark.py
+++ b/official/bert/benchmark/bert_squad_benchmark.py
@@ -43,6 +43,7 @@ SQUAD_FULL_INPUT_META_DATA_PATH = 'gs://tf-perfzero-data/bert/squad/squad_full_m
 MODEL_CONFIG_FILE_PATH = 'gs://cloud-tpu-checkpoints/bert/tf_20/uncased_L-24_H-1024_A-16/bert_config'
 # pylint: enable=line-too-long

+TMP_DIR = os.getenv('TMPDIR')
 FLAGS = flags.FLAGS


@@ -116,7 +117,7 @@ class BertSquadBenchmarkReal(BertSquadBenchmarkBase):
  `benchmark_(number of gpus)_gpu` format.
  """

-  def __init__(self, output_dir=None, **kwargs):
+  def __init__(self, output_dir=TMP_DIR, **kwargs):
    super(BertSquadBenchmarkReal, self).__init__(output_dir=output_dir)

  def _setup(self):
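
Note on `TMP_DIR`: `os.getenv('TMPDIR')` returns `None` when the variable is unset, so the new `output_dir=TMP_DIR` default can still be `None` on such machines. The sketch below shows one possible fallback. `resolve_output_dir` is a hypothetical helper and is not part of this change.

```python
import os
import tempfile


def resolve_output_dir(output_dir=None):
  """Returns `output_dir`, else $TMPDIR, else the platform temp directory."""
  if output_dir:
    return output_dir
  # tempfile.gettempdir() consults TMPDIR itself and falls back to a
  # platform default when the variable is not set.
  return os.getenv('TMPDIR') or tempfile.gettempdir()


print(resolve_output_dir('/my/benchmark/output'))
print(resolve_output_dir())
```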

--- a/official/bert/common_flags.py
+++ b/official/bert/common_flags.py
@@ -27,15 +27,19 @@ def define_common_bert_flags():
  flags.DEFINE_string('model_dir', None, (
      'The directory where the model weights and training/evaluation summaries '
      'are stored. If not specified, save to /tmp/bert20/.'))
+  flags.DEFINE_string(
+      'model_export_path', None,
+      'Path to the directory where the trained model will be '
+      'exported.')
  flags.DEFINE_string('tpu', '', 'TPU address to connect to.')
  flags.DEFINE_string(
      'init_checkpoint', None,
      'Initial checkpoint (usually from a pre-trained BERT model).')
  flags.DEFINE_enum(
-      'strategy_type', 'mirror', ['tpu', 'mirror'],
+      'strategy_type', 'mirror', ['tpu', 'mirror', 'multi_worker_mirror'],
      'Distribution Strategy type to use for training. `tpu` uses '
-      'TPUStrategy for running on TPUs, `mirror` uses GPUs with '
-      'single host.')
+      'TPUStrategy for running on TPUs, `mirror` uses GPUs with single host, '
+      '`multi_worker_mirror` uses CPUs or GPUs with multiple hosts.')
  flags.DEFINE_integer('num_train_epochs', 3,
                       'Total number of training epochs to perform.')
  flags.DEFINE_integer(

--- a/official/bert/modeling.py
+++ b/official/bert/modeling.py
@@ -165,6 +165,7 @@ class BertModel(tf.keras.layers.Layer):
        max_position_embeddings=self.config.max_position_embeddings,
        dropout_prob=self.config.hidden_dropout_prob,
        initializer_range=self.config.initializer_range,
+        dtype=tf.float32,
        name="embedding_postprocessor")
    self.encoder = Transformer(
        num_hidden_layers=self.config.num_hidden_layers,
@@ -316,8 +317,9 @@ class EmbeddingPostprocessor(tf.keras.layers.Layer):
          dtype=self.dtype)

    self.output_layer_norm = tf.keras.layers.LayerNormalization(
-        name="layer_norm", axis=-1, epsilon=1e-12)
-    self.output_dropout = tf.keras.layers.Dropout(rate=self.dropout_prob)
+        name="layer_norm", axis=-1, epsilon=1e-12, dtype=tf.float32)
+    self.output_dropout = tf.keras.layers.Dropout(rate=self.dropout_prob,
+                                                  dtype=tf.float32)
    super(EmbeddingPostprocessor, self).build(input_shapes)

  def __call__(self, word_embeddings, token_type_ids=None, **kwargs):
@@ -714,11 +716,15 @@ class TransformerBlock(tf.keras.layers.Layer):
        rate=self.hidden_dropout_prob)
    self.attention_layer_norm = (
        tf.keras.layers.LayerNormalization(
-            name="self_attention_layer_norm", axis=-1, epsilon=1e-12))
+            name="self_attention_layer_norm", axis=-1, epsilon=1e-12,
+            # We do layer norm in float32 for numeric stability.
+            dtype=tf.float32))
    self.intermediate_dense = Dense2DProjection(
        output_size=self.intermediate_size,
        kernel_initializer=get_initializer(self.initializer_range),
        activation=self.intermediate_activation,
+        # Uses float32 so that gelu activation is done in float32.
+        dtype=tf.float32,
        name="intermediate")
    self.output_dense = Dense2DProjection(
        output_size=self.hidden_size,
@@ -726,7 +732,7 @@ class TransformerBlock(tf.keras.layers.Layer):
        name="output")
    self.output_dropout = tf.keras.layers.Dropout(rate=self.hidden_dropout_prob)
    self.output_layer_norm = tf.keras.layers.LayerNormalization(
-        name="output_layer_norm", axis=-1, epsilon=1e-12)
+        name="output_layer_norm", axis=-1, epsilon=1e-12, dtype=tf.float32)
    super(TransformerBlock, self).build(unused_input_shapes)

  def common_layers(self):
@@ -753,6 +759,10 @@ class TransformerBlock(tf.keras.layers.Layer):
    attention_output = self.attention_dropout(attention_output)
    # Use float32 in keras layer norm and the gelu activation in the
    # intermediate dense layer for numeric stability
+    # TODO(reedwm): These casts are probably unnecessary, as we passed
+    # dtype=tf.float32 to the layer norm constructor, so it will cast its inputs
+    # to float32 automatically. These manual casts additionally do the "+"
+    # operator in float32, but "+" is numerically stable in float16.
    if self.float_type == tf.float16:
      input_tensor = tf.cast(input_tensor, tf.float32)
      attention_output = tf.cast(attention_output, tf.float32)
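
One concrete reason to keep layer normalization in float32, offered here as an illustration rather than as the authors' stated rationale: the `epsilon=1e-12` passed to these layers is below the smallest positive float16 value, so it rounds to zero in half precision. The sketch uses only the standard library (`struct` format `'e'` is IEEE 754 binary16). `to_float16` is a hypothetical helper.

```python
import struct


def to_float16(x):
  """Rounds a Python float to IEEE 754 half precision and back."""
  return struct.unpack('e', struct.pack('e', x))[0]


epsilon = 1e-12
# epsilon underflows to zero when stored as float16.
print(to_float16(epsilon) == 0.0)
# A larger value such as 1e-3 survives the round trip as a nonzero number.
print(to_float16(1e-3) > 0.0)
```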

--- a/official/bert/optimization.py
+++ b/official/bert/optimization.py
@@ -105,12 +105,14 @@ class AdamWeightDecay(tf.keras.optimizers.Adam):
               epsilon=1e-7,
               amsgrad=False,
               weight_decay_rate=0.0,
+               include_in_weight_decay=None,
               exclude_from_weight_decay=None,
               name='AdamWeightDecay',
               **kwargs):
    super(AdamWeightDecay, self).__init__(
        learning_rate, beta_1, beta_2, epsilon, amsgrad, name, **kwargs)
    self.weight_decay_rate = weight_decay_rate
+    self._include_in_weight_decay = include_in_weight_decay
    self._exclude_from_weight_decay = exclude_from_weight_decay

  @classmethod
@@ -178,6 +180,12 @@ class AdamWeightDecay(tf.keras.optimizers.Adam):
    """Whether to use L2 weight decay for `param_name`."""
    if self.weight_decay_rate == 0:
      return False
+
+    if self._include_in_weight_decay:
+      for r in self._include_in_weight_decay:
+        if re.search(r, param_name) is not None:
+          return True
+
    if self._exclude_from_weight_decay:
      for r in self._exclude_from_weight_decay:
        if re.search(r, param_name) is not None:
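
The precedence introduced by `include_in_weight_decay` can be sketched without TensorFlow. The standalone function below mirrors the visible part of the method. The hunk is cut off inside the exclude loop, so the `return False` in that loop and the final `return True` are assumptions, and `use_weight_decay` and the example parameter names are hypothetical.

```python
import re


def use_weight_decay(param_name, weight_decay_rate,
                     include_patterns=None, exclude_patterns=None):
  """Decides whether L2 weight decay applies to `param_name`."""
  if weight_decay_rate == 0:
    return False
  # Include patterns are checked first, so they override exclude patterns.
  if include_patterns:
    for pattern in include_patterns:
      if re.search(pattern, param_name) is not None:
        return True
  if exclude_patterns:
    for pattern in exclude_patterns:
      if re.search(pattern, param_name) is not None:
        return False  # Assumed: the diff is truncated before this line.
  return True  # Assumed default when no pattern matches.


exclude = ['layer_norm', 'bias']
include = ['special/bias']
print(use_weight_decay('encoder/dense/kernel', 0.01, include, exclude))
print(use_weight_decay('encoder/dense/bias', 0.01, include, exclude))
print(use_weight_decay('special/bias', 0.01, include, exclude))
```

Under these assumptions, `special/bias` is decayed even though it also matches the `bias` exclude pattern, because the include check runs first.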

--- a/official/bert/run_classifier.py
+++ b/official/bert/run_classifier.py
@@ -48,10 +48,6 @@ flags.DEFINE_string('train_data_path', None,
                    'Path to training data for BERT classifier.')
 flags.DEFINE_string('eval_data_path', None,
                    'Path to evaluation data for BERT classifier.')
-flags.DEFINE_string(
-    'model_export_path', None,
-    'Path to the directory, where trainined model will be '
-    'exported.')
 # Model training specific flags.
 flags.DEFINE_string(
    'input_meta_data_path', None,

--- a/official/bert/run_squad.py
+++ b/official/bert/run_squad.py
@@ -31,6 +31,7 @@ import tensorflow as tf
 from official.bert import bert_models
 from official.bert import common_flags
 from official.bert import input_pipeline
+from official.bert import model_saving_utils
 from official.bert import model_training_utils
 from official.bert import modeling
 from official.bert import optimization
@@ -39,8 +40,13 @@ from official.bert import tokenization
 from official.utils.misc import keras_utils
 from official.utils.misc import tpu_lib

-flags.DEFINE_bool('do_train', False, 'Whether to run training.')
-flags.DEFINE_bool('do_predict', False, 'Whether to run eval on the dev set.')
+flags.DEFINE_enum(
+    'mode', 'train', ['train', 'predict', 'export_only'],
+    'One of {"train", "predict", "export_only"}. `train`: '
+    'trains the model and evaluates in the meantime. '
+    '`predict`: predict answers from the squad json file. '
+    '`export_only`: will take the latest checkpoint inside '
+    'model_dir and export a `SavedModel`.')
 flags.DEFINE_string('train_data_path', '',
                    'Training data path with train tfrecords.')
 flags.DEFINE_string(
@@ -139,6 +145,8 @@ def predict_squad_customized(strategy, input_meta_data, bert_config,
        strategy.experimental_distribute_dataset(predict_dataset))

    with strategy.scope():
+      # Prediction always uses float32, even if training uses mixed precision.
+      tf.keras.mixed_precision.experimental.set_policy('float32')
      squad_model, _ = bert_models.squad_model(
          bert_config, input_meta_data['max_seq_length'], float_type=tf.float32)

@@ -187,7 +195,7 @@ def train_squad(strategy,

  use_float16 = common_flags.use_float16()
  if use_float16:
-    policy = tf.keras.mixed_precision.experimental.Policy('infer_float32_vars')
+    policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
    tf.keras.mixed_precision.experimental.set_policy(policy)

  bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)
@@ -212,6 +220,9 @@ def train_squad(strategy,
    squad_model.optimizer = optimization.create_optimizer(
        FLAGS.learning_rate, steps_per_epoch * epochs, warmup_steps)
    if use_float16:
+      # Wraps optimizer with a LossScaleOptimizer. This is done automatically
+      # in compile() with the "mixed_float16" policy, but since we do not call
+      # compile(), we must wrap the optimizer manually.
      squad_model.optimizer = (
          tf.keras.mixed_precision.experimental.LossScaleOptimizer(
              squad_model.optimizer, loss_scale=common_flags.get_loss_scale()))
@@ -306,6 +317,26 @@ def predict_squad(strategy, input_meta_data):
      verbose=FLAGS.verbose_logging)


+def export_squad(model_export_path, input_meta_data):
+  """Exports a trained model as a `SavedModel` for inference.
+
+  Args:
+    model_export_path: a string specifying the path to the SavedModel directory.
+    input_meta_data: dictionary containing meta data about input and model.
+
+  Raises:
+    ValueError: If the export path is not specified (an empty string or None).
+  """
+  if not model_export_path:
+    raise ValueError('Export path is not specified: %s' % model_export_path)
+  bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)
+
+  squad_model, _ = bert_models.squad_model(
+      bert_config, input_meta_data['max_seq_length'], float_type=tf.float32)
+  model_saving_utils.export_bert_model(
+      model_export_path, model=squad_model, checkpoint_dir=FLAGS.model_dir)
+
+
 def main(_):
  # Users should always run this script under TF 2.x
  assert tf.version.VERSION.startswith('2.')
@@ -313,9 +344,15 @@ def main(_):
  with tf.io.gfile.GFile(FLAGS.input_meta_data_path, 'rb') as reader:
    input_meta_data = json.loads(reader.read().decode('utf-8'))

+  if FLAGS.mode == 'export_only':
+    export_squad(FLAGS.model_export_path, input_meta_data)
+    return
+
  strategy = None
  if FLAGS.strategy_type == 'mirror':
    strategy = tf.distribute.MirroredStrategy()
+  elif FLAGS.strategy_type == 'multi_worker_mirror':
+    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
  elif FLAGS.strategy_type == 'tpu':
    # Initialize TPU System.
    cluster_resolver = tpu_lib.tpu_initialize(FLAGS.tpu)
@@ -323,9 +360,9 @@ def main(_):
  else:
    raise ValueError('The distribution strategy type is not supported: %s' %
                     FLAGS.strategy_type)
-  if FLAGS.do_train:
+  if FLAGS.mode == 'train':
    train_squad(strategy, input_meta_data)
-  if FLAGS.do_predict:
+  if FLAGS.mode == 'predict':
    predict_squad(strategy, input_meta_data)
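
Replacing the two booleans with a single enum makes the modes mutually exclusive: a run is either `train`, `predict`, or `export_only`, whereas `do_train` and `do_predict` were checked by two independent `if` statements and could both be set. The sketch below imitates that dispatch with the standard-library `argparse` as a stand-in for absl flags. The handler bodies are placeholders, and `build_parser` and `run` are hypothetical names.

```python
import argparse


def build_parser():
  parser = argparse.ArgumentParser()
  # Stand-in for flags.DEFINE_enum('mode', 'train', [...]).
  parser.add_argument('--mode', default='train',
                      choices=['train', 'predict', 'export_only'])
  return parser


def run(argv):
  """Returns the name of the single action selected by --mode."""
  args = build_parser().parse_args(argv)
  if args.mode == 'export_only':
    # Export returns early, before any distribution strategy is created.
    return 'export'
  if args.mode == 'train':
    return 'train'
  return 'predict'


print(run([]))
print(run(['--mode', 'export_only']))
```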



--- a/official/bert/tools/tf1_to_keras_checkpoint_converter.py
+++ b/official/bert/tools/tf1_to_keras_checkpoint_converter.py
+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+r"""Convert checkpoints created by Estimator (tf1) to be Keras compatible.
+
+Keras manages variable names internally, which results in subtly different names
+for variables between the Estimator and Keras version.
+The script should be run with TF 1.x.
+
+Usage:
+
+  python tf1_to_keras_checkpoint_converter.py \
+      --checkpoint_from_path="/path/to/checkpoint" \
+      --checkpoint_to_path="/path/to/new_checkpoint"
+"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from absl import app
+import tensorflow as tf
+
+flags = tf.flags
+
+FLAGS = flags.FLAGS
+
+## Required parameters
+flags.DEFINE_string("checkpoint_from_path", None,
+                    "Source BERT checkpoint path.")
+flags.DEFINE_string("checkpoint_to_path", None,
+                    "Destination BERT checkpoint path.")
+flags.DEFINE_string(
+    "exclude_patterns", None,
+    "Comma-delimited string of a list of patterns to exclude"
+    " variables from source checkpoint.")
+
+# Mapping between old <=> new names. The source pattern in original variable
+# name will be replaced by destination pattern.
+BERT_NAME_REPLACEMENTS = [
+    ("bert", "bert_model"),
+    ("embeddings/word_embeddings", "word_embeddings/embeddings"),
+    ("embeddings/token_type_embeddings",
+     "embedding_postprocessor/type_embeddings"),
+    ("embeddings/position_embeddings",
+     "embedding_postprocessor/position_embeddings"),
+    ("embeddings/LayerNorm", "embedding_postprocessor/layer_norm"),
+    ("attention/self", "self_attention"),
+    ("attention/output/dense", "self_attention_output"),
+    ("attention/output/LayerNorm", "self_attention_layer_norm"),
+    ("intermediate/dense", "intermediate"),
+    ("output/dense", "output"),
+    ("output/LayerNorm", "output_layer_norm"),
+    ("pooler/dense", "pooler_transform"),
+]
+
+
+def _bert_name_replacement(var_name):
+  for src_pattern, tgt_pattern in BERT_NAME_REPLACEMENTS:
+    if src_pattern in var_name:
+      old_var_name = var_name
+      var_name = var_name.replace(src_pattern, tgt_pattern)
+      tf.logging.info("Converted: %s --> %s", old_var_name, var_name)
+  return var_name
+
+
+def _has_exclude_patterns(name, exclude_patterns):
+  """Checks if a string contains substrings that match patterns to exclude."""
+  for p in exclude_patterns:
+    if p in name:
+      return True
+  return False
+
+
+def convert_names(checkpoint_from_path,
+                  checkpoint_to_path,
+                  exclude_patterns=None):
+  """Migrates the names of variables within a checkpoint.
+
+  Args:
+    checkpoint_from_path: Path to source checkpoint to be read in.
+    checkpoint_to_path: Path to checkpoint to be written out.
+    exclude_patterns: A list of string patterns to exclude variables from
+      checkpoint conversion.
+
+  Returns:
+    A dictionary that maps the new variable names to the Variable objects.
+    A dictionary that maps the old variable names to the new variable names.
+  """
+  with tf.Graph().as_default():
+    tf.logging.info("Reading checkpoint_from_path %s", checkpoint_from_path)
+    reader = tf.train.NewCheckpointReader(checkpoint_from_path)
+    name_shape_map = reader.get_variable_to_shape_map()
+    new_variable_map = {}
+    conversion_map = {}
+    for var_name in name_shape_map:
+      if exclude_patterns and _has_exclude_patterns(var_name, exclude_patterns):
+        continue
+      new_var_name = _bert_name_replacement(var_name)
+      tensor = reader.get_tensor(var_name)
+      var = tf.Variable(tensor, name=var_name)
+      new_variable_map[new_var_name] = var
+      if new_var_name != var_name:
+        conversion_map[var_name] = new_var_name
+
+    saver = tf.train.Saver(new_variable_map)
+
+    with tf.Session() as sess:
+      sess.run(tf.global_variables_initializer())
+      tf.logging.info("Writing checkpoint_to_path %s", checkpoint_to_path)
+      saver.save(sess, checkpoint_to_path)
+
+  tf.logging.info("Summary:")
+  tf.logging.info("  Converted %d variable name(s).", len(new_variable_map))
+  tf.logging.info("  Converted: %s", str(conversion_map))
+
+
+def main(_):
+  exclude_patterns = None
+  if FLAGS.exclude_patterns:
+    exclude_patterns = FLAGS.exclude_patterns.split(",")
+  convert_names(FLAGS.checkpoint_from_path, FLAGS.checkpoint_to_path,
+                exclude_patterns)
+
+
+if __name__ == "__main__":
+  flags.mark_flag_as_required("checkpoint_from_path")
+  flags.mark_flag_as_required("checkpoint_to_path")
+  app.run(main)
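
The renaming step can be exercised on its own, without TensorFlow or a checkpoint. The sketch below copies `BERT_NAME_REPLACEMENTS` from the file above and applies it the same way `_bert_name_replacement` does, minus the logging. `rename` is a hypothetical name, and the sample variable names are illustrative rather than read from a real checkpoint.

```python
BERT_NAME_REPLACEMENTS = [
    ("bert", "bert_model"),
    ("embeddings/word_embeddings", "word_embeddings/embeddings"),
    ("embeddings/token_type_embeddings",
     "embedding_postprocessor/type_embeddings"),
    ("embeddings/position_embeddings",
     "embedding_postprocessor/position_embeddings"),
    ("embeddings/LayerNorm", "embedding_postprocessor/layer_norm"),
    ("attention/self", "self_attention"),
    ("attention/output/dense", "self_attention_output"),
    ("attention/output/LayerNorm", "self_attention_layer_norm"),
    ("intermediate/dense", "intermediate"),
    ("output/dense", "output"),
    ("output/LayerNorm", "output_layer_norm"),
    ("pooler/dense", "pooler_transform"),
]


def rename(var_name):
  """Applies every matching replacement, in list order."""
  for src_pattern, tgt_pattern in BERT_NAME_REPLACEMENTS:
    if src_pattern in var_name:
      var_name = var_name.replace(src_pattern, tgt_pattern)
  return var_name


for name in ["bert/embeddings/word_embeddings",
             "bert/encoder/layer_0/attention/output/dense/kernel",
             "bert/encoder/layer_0/output/LayerNorm/gamma"]:
  print(name, "-->", rename(name))
```

List order matters here: `attention/output/dense` has to be tried before `output/dense`, otherwise the shorter pattern would rewrite the name first and the attention-specific rule would no longer match.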
--- a/official/bert/tools/tf2_checkpoint_converter.py
+++ b/official/bert/tools/tf2_checkpoint_converter.py
+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""A converter for BERT name-based checkpoint to object-based checkpoint.
+
+The conversion will yield an object-based checkpoint for TF2 BERT models
+when BertConfig.backward_compatible is true.
+The variable/tensor shapes match the TF1 BERT model, but backward compatibility
+introduces unnecessary reshape computation.
+"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from absl import app
+from absl import flags
+
+import tensorflow as tf
+from official.bert import modeling
+
+FLAGS = flags.FLAGS
+
+flags.DEFINE_string("bert_config_file", None,
+                    "Bert configuration file to define core bert layers.")
+flags.DEFINE_string(
+    "init_checkpoint", None,
+    "Initial checkpoint (usually from a pre-trained BERT model).")
+flags.DEFINE_string("converted_checkpoint", None,
+                    "Path to object-based V2 checkpoint.")
+flags.DEFINE_bool(
+    "export_bert_as_layer", False,
+    "Whether to use a layer rather than a model inside the checkpoint.")
+
+
+def create_bert_model(bert_config):
+  """Creates a BERT keras core model from BERT configuration.
+
+  Args:
+    bert_config: A `BertConfig` to create the core model.
+  Returns:
+    A keras model.
+  """
+  max_seq_length = bert_config.max_position_embeddings
+
+  # Adds input layers just as placeholders.
+  input_word_ids = tf.keras.layers.Input(
+      shape=(max_seq_length,), dtype=tf.int32, name="input_word_ids")
+  input_mask = tf.keras.layers.Input(
+      shape=(max_seq_length,), dtype=tf.int32, name="input_mask")
+  input_type_ids = tf.keras.layers.Input(
+      shape=(max_seq_length,), dtype=tf.int32, name="input_type_ids")
+  core_model = modeling.get_bert_model(
+      input_word_ids,
+      input_mask,
+      input_type_ids,
+      config=bert_config,
+      name="bert_model",
+      float_type=tf.float32)
+  return core_model
+
+
+def convert_checkpoint():
+  """Converts a name-based matched TF V1 checkpoint to TF V2 checkpoint."""
+  bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)
+
+  # Sets backward_compatible to true to convert TF1 BERT checkpoints.
+  bert_config.backward_compatible = True
+  core_model = create_bert_model(bert_config)
+
+  # Uses streaming-restore in eager mode to read V1 name-based checkpoints.
+  core_model.load_weights(FLAGS.init_checkpoint)
+  if FLAGS.export_bert_as_layer:
+    bert_layer = core_model.get_layer("bert_model")
+    checkpoint = tf.train.Checkpoint(bert_layer=bert_layer)
+  else:
+    checkpoint = tf.train.Checkpoint(model=core_model)
+
+  checkpoint.save(FLAGS.converted_checkpoint)
+
+
+def main(_):
+  tf.enable_eager_execution()
+  convert_checkpoint()
+
+
+if __name__ == "__main__":
+  app.run(main)
--- a/official/mnist/README.md
+++ b/official/mnist/README.md
@@ -12,7 +12,13 @@ APIs.
 ## Setup

 To begin, you'll simply need the latest version of TensorFlow installed.
-First make sure you've [added the models folder to your Python path](/official/#running-the-models); otherwise you may encounter an error like `ImportError: No module named official.mnist`.
+First make sure you've added the models folder to your Python path:
+
+```shell
+export PYTHONPATH="$PYTHONPATH:/path/to/models"
+```
+
+Otherwise you may encounter an error like `ImportError: No module named official.mnist`.

 Then to train the model, run the following:


--- a/official/mnist/mnist.py
+++ b/official/mnist/mnist.py
@@ -89,7 +89,9 @@ def create_model(data_format):

 def define_mnist_flags():
  flags_core.define_base()
-  flags_core.define_performance(num_parallel_calls=False)
+  flags_core.define_performance(inter_op=True, intra_op=True,
+                                num_parallel_calls=False,
+                                all_reduce_alg=True)
  flags_core.define_image()
  flags.adopt_module_key_flags(flags_core)
  flags_core.set_defaults(data_dir='/tmp/mnist_data',

--- a/official/r1/resnet/resnet_run_loop.py
+++ b/official/r1/resnet/resnet_run_loop.py
@@ -33,7 +33,7 @@ import tensorflow as tf

 from official.r1.resnet import imagenet_preprocessing
 from official.r1.resnet import resnet_model
-from official.utils.export import export
+from official.r1.utils import export
 from official.utils.flags import core as flags_core
 from official.utils.logs import hooks_helper
 from official.utils.logs import logger
@@ -725,6 +725,12 @@ def define_resnet_flags(resnet_size_choices=None, dynamic_loss_scale=False,
  """Add flags and validators for ResNet."""
  flags_core.define_base()
  flags_core.define_performance(num_parallel_calls=False,
+                                inter_op=True,
+                                intra_op=True,
+                                synthetic_data=True,
+                                dtype=True,
+                                all_reduce_alg=True,
+                                num_packs=True,
                                tf_gpu_thread_mode=True,
                                datasets_num_private_threads=True,
                                dynamic_loss_scale=dynamic_loss_scale,

--- a/official/utils/export/__init__.py
+++ b/official/utils/export/__init__.py
--- a/official/r1/utils/data/__init__.py
+++ b/official/r1/utils/data/__init__.py
--- a/official/utils/data/file_io.py
+++ b/official/utils/data/file_io.py
@@ -20,6 +20,7 @@ from __future__ import print_function

 import atexit
 import multiprocessing
+import multiprocessing.dummy
 import os
 import tempfile
 import uuid
@@ -78,8 +79,8 @@ def iter_shard_dataframe(df, rows_per_core=1000):
  It yields a list of dataframes with length equal to the number of CPU cores,
  with each dataframe having rows_per_core rows. (Except for the last batch
  which may have fewer rows in the dataframes.) Passing vectorized inputs to
-  a multiprocessing pool is much more effecient than iterating through a
-  dataframe in serial and passing a list of inputs to the pool.
+  a pool is more efficient than iterating through a dataframe in serial and
+  passing a list of inputs to the pool.

  Args:
    df: Pandas dataframe to be sharded.
@@ -134,7 +135,7 @@ def _serialize_shards(df_shards, columns, pool, writer):
  Args:
    df_shards: A list of pandas dataframes. (Should be of similar size)
    columns: The dataframe columns to be serialized.
-    pool: A multiprocessing pool to serialize in parallel.
+    pool: A pool to serialize in parallel.
    writer: A TFRecordWriter to write the serialized shards.
  """
  # Pandas does not store columns of arrays as nd arrays. stack remedies this.
@@ -190,7 +191,7 @@ def write_to_buffer(dataframe, buffer_path, columns, expected_size=None):
                            .format(buffer_path))

  count = 0
-  pool = multiprocessing.Pool(multiprocessing.cpu_count())
+  pool = multiprocessing.dummy.Pool(multiprocessing.cpu_count())
  try:
    with tf.io.TFRecordWriter(buffer_path) as writer:
      for df_shards in iter_shard_dataframe(df=dataframe,
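
`multiprocessing.dummy.Pool` exposes the same interface as `multiprocessing.Pool` but is backed by threads, so arguments and results are not pickled and passed between processes. The commit does not state its motivation, so the sketch below only illustrates that the calling code can stay the same. `shard_lengths` and the sample data are hypothetical.

```python
import multiprocessing
import multiprocessing.dummy


def shard_lengths(shards):
  """Maps `len` over shards using a thread-backed pool."""
  pool = multiprocessing.dummy.Pool(multiprocessing.cpu_count())
  try:
    # Same map() call as with multiprocessing.Pool; the workers are threads.
    return pool.map(len, shards)
  finally:
    pool.close()
    pool.join()


print(shard_lengths([[1, 2, 3], [4, 5], [6]]))
```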

--- a/official/utils/data/file_io_test.py
+++ b/official/utils/data/file_io_test.py
@@ -27,7 +27,7 @@ import pandas as pd
 import tensorflow as tf
 # pylint: enable=wrong-import-order

-from official.utils.data import file_io
+from official.r1.utils.data import file_io
 from official.utils.misc import keras_utils



--- a/official/utils/export/export.py
+++ b/official/utils/export/export.py