Merge pull request #7 from tensorflow/master

updated

965cc3ee · Ayushman Kumar · GitHub · 1f3247f4 · 1f685c54 · 965cc3ee
Unverified Commit 965cc3ee authored Apr 21, 2020 by Ayushman Kumar Committed by GitHub Apr 21, 2020
20 changed files
--- a/CODEOWNERS
+++ b/CODEOWNERS
-* @tensorflow/tf-garden-team
+* @tensorflow/tf-garden-team @tensorflow/tf-model-garden-team
 /official/ @rachellj218 @saberkun
 /official/bert @saberkun @hongjunChoi @rachellj218
+/research/adv_imagenet_models/ @alexeykurakin
 /research/adversarial_crypto/ @dave-andersen
-/research/adversarial_logit_pairing/ @AlexeyKurakin
+/research/adversarial_logit_pairing/ @alexeykurakin
 /research/adversarial_text/ @rsepassi @a-dai
-/research/adv_imagenet_models/ @AlexeyKurakin
 /research/attention_ocr/ @alexgorban
 /research/audioset/ @plakal @dpwe
 /research/autoaugment/* @barretzoph
@@ -14,10 +14,13 @@
 /research/compression/ @nmjohn
 /research/cvt_text/ @clarkkev @lmthang
 /research/deep_contextual_bandits/ @rikel
+/research/deep_speech/ @yhliang2018
 /research/deeplab/ @aquariusjay @yknzhu @gpapan
 /research/delf/ @andrefaraujo
 /research/domain_adaptation/ @bousmalis @dmrd
 /research/efficient-hrl/ @ofirnachum
+/research/feelvos/ @pvoigtlaender @yuningchai @aquariusjay
+/research/fivo/ @dieterichlawson
 /research/global_objectives/ @mackeya-google
 /research/im2txt/ @cshallue
 /research/inception/ @shlens @vincentvanhoucke
@@ -26,7 +29,7 @@
 /research/learning_to_remember_rare_events/ @lukaszkaiser @ofirnachum
 /research/learning_unsupervised_learning/ @lukemetz @nirum
 /research/lexnet_nc/ @vered1986 @waterson
-/research/lfads/ @jazcollins @susillo
+/research/lfads/ @jazcollins @sussillo
 /research/lm_1b/ @oriolvinyals @panyx0718
 /research/lm_commonsense/ @thtrieu
 /research/lstm_object_detection/ @dreamdragon @masonliuw @yinxiaoli @yongzhe2160
@@ -39,9 +42,10 @@
 /research/object_detection/ @jch1 @tombstone @derekjchow @jesu9 @dreamdragon @pkulzc
 /research/pcl_rl/ @ofirnachum
 /research/ptn/ @xcyan @arkanath @hellojas @honglaklee
+/research/qa_kg/ @yuyuz
 /research/real_nvp/ @laurent-dinh
 /research/rebar/ @gjtucker
-/research/resnet/ @panyx0718
+/research/sentiment_analysis/ @sculd
 /research/seq2species/ @apbusia @depristo
 /research/skip_thoughts/ @cshallue
 /research/slim/ @sguada @nathansilberman
@@ -50,15 +54,7 @@
 /research/struct2depth/ @aneliaangelova
 /research/swivel/ @waterson
 /research/tcn/ @coreylynch @sermanet
-/research/tensorrt/ @karmel
 /research/textsum/ @panyx0718 @peterjliu
 /research/transformer/ @daviddao
 /research/vid2depth/ @rezama
 /research/video_prediction/ @cbfinn
-/research/fivo/ @dieterichlawson
-/samples/ @MarkDaoust @lamberta
-/samples/languages/java/ @asimshankar
-/tutorials/embedding/ @zffchen78 @a-dai
-/tutorials/image/ @sherrym @shlens
-/tutorials/image/cifar10_estimator/ @protoget
-/tutorials/rnn/ @lukaszkaiser @ebrevdo
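Review note on the CODEOWNERS change above: GitHub resolves owners by taking the last pattern in the file that matches a path, so the catch-all `*` rule only applies where no later directory rule matches. The sketch below illustrates that rule on a few of the post-change entries. It is a simplified model, not GitHub's matcher: it handles only `*`, directory prefixes, and a trailing `/*`, and the helper names and example file paths are hypothetical.

```python
from typing import List, Tuple

# A few entries copied from the updated CODEOWNERS file in this diff.
SAMPLE = """\
* @tensorflow/tf-garden-team @tensorflow/tf-model-garden-team
/official/ @rachellj218 @saberkun
/official/bert @saberkun @hongjunChoi @rachellj218
/research/lfads/ @jazcollins @sussillo
"""


def parse_codeowners(text: str) -> List[Tuple[str, List[str]]]:
    """Returns (pattern, owners) pairs, skipping blank lines and comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def matches(pattern: str, path: str) -> bool:
    """Simplified matching: only the pattern shapes used in this file."""
    if pattern == "*":
        return True
    if pattern.endswith("/*"):
        # Files directly inside the directory, not in nested directories.
        prefix = pattern[:-1]
        rest = path[len(prefix):]
        return path.startswith(prefix) and rest != "" and "/" not in rest
    if pattern.endswith("/"):
        return path.startswith(pattern)
    # No trailing slash: the path itself or anything beneath it.
    return path == pattern or path.startswith(pattern + "/")


def owners_for(rules: List[Tuple[str, List[str]]], path: str) -> List[str]:
    """Last matching rule wins, mirroring CODEOWNERS precedence."""
    owners: List[str] = []
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners
    return owners


RULES = parse_codeowners(SAMPLE)
print(owners_for(RULES, "/research/lfads/example.py"))
print(owners_for(RULES, "/README.md"))
```

Under this model, a file under `/research/lfads/` resolves to the two lfads owners, while a top-level file falls back to the two teams on the `*` line.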
--- a/README.md
+++ b/README.md
-# TensorFlow Models
+![Logo](https://storage.googleapis.com/model_garden_artifacts/TF_Model_Garden.png)

-This repository contains a number of different models implemented in [TensorFlow](https://www.tensorflow.org):
+# Welcome to the Model Garden for TensorFlow

-The [official models](official) are a collection of example models that use TensorFlow 2's high-level APIs. They are intended to be well-maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here.
+The TensorFlow Model Garden is a repository with a number of different implementations of state-of-the-art (SOTA) models and modeling solutions for TensorFlow users. We aim to demonstrate the best practices for modeling so that TensorFlow users can take full advantage of TensorFlow for their research and product development.

-The [research models](https://github.com/tensorflow/models/tree/master/research) are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests.
+## Structure
+
+| Folder | Description |
+|-----------|-------------|
+| [official](official) | • **A collection of example implementations for SOTA models using the latest TensorFlow 2's high-level APIs**<br />• Officially maintained, supported, and kept up to date with the latest TensorFlow 2 APIs<br />• Reasonably optimized for fast performance while still being easy to read |
+| [research](research) | • A collection of research model implementations in TensorFlow 1 or 2 by researchers<br />• Up to the individual researchers to maintain the model implementations and/or provide support on issues and pull requests |

 ## Contribution guidelines

-If you want to contribute to models, be sure to review the [contribution guidelines](CONTRIBUTING.md).
+If you want to contribute to models, please review the [contribution guidelines](CONTRIBUTING.md).

 ## License


--- a/official/README-TPU.md
+++ b/official/README-TPU.md
-# Offically Supported TensorFlow 2.1 Models on Cloud TPU
+# Officially Supported TensorFlow 2.1+ Models on Cloud TPU

 ## Natural Language Processing

 *   [bert](nlp/bert): A powerful pre-trained language representation model:
    BERT, which stands for Bidirectional Encoder Representations from
    Transformers.
-    [BERT FineTuning with Cloud TPU](https://cloud.google.com/tpu/docs/tutorials/bert-2.x) provides step by step instructions on Cloud TPU training. You can look [Bert MNLI Tensorboard.dev metrics](https://tensorboard.dev/experiment/mIah5lppTASvrHqWrdr6NA) for MNLI fine tuning task.
+    [BERT FineTuning with Cloud TPU](https://cloud.google.com/tpu/docs/tutorials/bert-2.x) provides step-by-step instructions on Cloud TPU training. See the [BERT MNLI Tensorboard.dev metrics](https://tensorboard.dev/experiment/LijZ1IrERxKALQfr76gndA) for the MNLI fine-tuning task.
 *   [transformer](nlp/transformer): A transformer model to translate the WMT
    English to German dataset.
        [Training transformer on Cloud TPU](https://cloud.google.com/tpu/docs/tutorials/transformer-2.x) for step by step instructions on Cloud TPU training.

 ## Computer Vision

+*   [efficientnet](vision/image_classification): A family of convolutional
+    neural networks that scale by balancing network depth, width, and
+    resolution and can be used to classify ImageNet's dataset of 1000 classes.
+    See [Tensorboard.dev training metrics](https://tensorboard.dev/experiment/KnaWjrq5TXGfv0NW5m7rpg/#scalars).
 *   [mnist](vision/image_classification): A basic model to classify digits
    from the MNIST dataset. See [Running MNIST on Cloud TPU](https://cloud.google.com/tpu/docs/tutorials/mnist-2.x) tutorial and [Tensorboard.dev metrics](https://tensorboard.dev/experiment/mIah5lppTASvrHqWrdr6NA).
+*   [mask-rcnn](vision/detection): An object detection and instance segmentation model. See [Tensorboard.dev training metrics](https://tensorboard.dev/experiment/LH7k0fMsRwqUAcE09o9kPA).
 *   [resnet](vision/image_classification): A deep residual network that can
    be used to classify ImageNet's dataset of 1000 classes.
    See [Training ResNet on Cloud TPU](https://cloud.google.com/tpu/docs/tutorials/resnet-2.x) tutorial and [Tensorboard.dev metrics](https://tensorboard.dev/experiment/CxlDK8YMRrSpYEGtBRpOhg).

--- a/official/README.md
+++ b/official/README.md
-# TensorFlow Official Models
+![Logo](https://storage.googleapis.com/model_garden_artifacts/TF_Model_Garden.png)

-The TensorFlow official models are a collection of models that use
-TensorFlow's high-level APIs. They are intended to be well-maintained, tested,
-and kept up to date with the latest TensorFlow API. They should also be
-reasonably optimized for fast performance while still being easy to read.
+# TensorFlow Official Models

-These models are used as end-to-end tests, ensuring that the models run with the
-same or improved speed and performance with each new TensorFlow build.
+The TensorFlow official models are a collection of models
+that use TensorFlow’s high-level APIs.
+They are intended to be well-maintained, tested, and kept up to date
+with the latest TensorFlow API.
+They should also be reasonably optimized for fast performance while still
+being easy to read.
+These models are used as end-to-end tests, ensuring that the models run
+with the same or improved speed and performance with each new TensorFlow build.

-## Tensorflow releases
+## Model Implementations

-The master branch of the models are **in development** with TensorFlow 2.x, and
-they target the
-[nightly binaries](https://github.com/tensorflow/tensorflow#installation) built
-from the
-[master branch of TensorFlow](https://github.com/tensorflow/tensorflow/tree/master).
-You may start from installing with pip:
+### Natural Language Processing

-```shell
-pip3 install tf-nightly
-```
+| Model | Description | Reference |
+| ----- | ----------- | --------- |
+| [ALBERT](nlp/albert) | A Lite BERT for Self-supervised Learning of Language Representations | [arXiv:1909.11942](https://arxiv.org/abs/1909.11942) |
+| [BERT](nlp/bert) | A powerful pre-trained language representation model: BERT (Bidirectional Encoder Representations from Transformers) | [arXiv:1810.04805](https://arxiv.org/abs/1810.04805) |
+| [NHNet](nlp/nhnet) | A transformer-based multi-sequence to sequence model: Generating Representative Headlines for News Stories | [arXiv:2001.09386](https://arxiv.org/abs/2001.09386) |
+| [Transformer](nlp/transformer) | A transformer model to translate the WMT English to German dataset | [arXiv:1706.03762](https://arxiv.org/abs/1706.03762) |
+| [XLNet](nlp/xlnet) | XLNet: Generalized Autoregressive Pretraining for Language Understanding | [arXiv:1906.08237](https://arxiv.org/abs/1906.08237) |

-**Stable versions** of the official models targeting releases of TensorFlow are
-available as tagged branches or
-[downloadable releases](https://github.com/tensorflow/models/releases). Model
-repository version numbers match the target TensorFlow release, such that
-[release v2.1.0](https://github.com/tensorflow/models/releases/tag/v2.1.0) are
-compatible with
-[TensorFlow v2.1.0](https://github.com/tensorflow/tensorflow/releases/tag/v2.1.0).
+### Computer Vision

-If you are on a version of TensorFlow earlier than 1.4, please
-[update your installation](https://www.tensorflow.org/install/).
+| Model | Description | Reference |
+| ----- | ----------- | --------- |
+| [MNIST](vision/image_classification) | A basic model to classify digits from the MNIST dataset | [Link](http://yann.lecun.com/exdb/mnist/) |
+| [ResNet](vision/image_classification) | A deep residual network for image recognition | [arXiv:1512.03385](https://arxiv.org/abs/1512.03385) |
+| [RetinaNet](vision/detection) | A fast and powerful object detector | [arXiv:1708.02002](https://arxiv.org/abs/1708.02002) |
+| [Mask R-CNN](vision/detection) | An object detection and instance segmentation model | [arXiv:1703.06870](https://arxiv.org/abs/1703.06870) |

-## Requirements
+### Other models

-Please follow the below steps before running models in this repo:
+| Model | Description | Reference |
+| ----- | ----------- | --------- |
+| [NCF](recommendation) | Neural Collaborative Filtering model for recommendation tasks | [arXiv:1708.05031](https://arxiv.org/abs/1708.05031) |

-1.  TensorFlow
-    [nightly binaries](https://github.com/tensorflow/tensorflow#installation)
+---

-2.  If users would like to clone this repo but do not care about change history,
-please consider:
+## How to get started with the Model Garden official models

-  ```shell
-  export repo_version="master"
-  git clone -b ${repo_version} https://github.com/tensorflow/models.git --depth=1
-  ```
+* The models in the master branch are developed using TensorFlow 2,
+and they target the TensorFlow [nightly binaries](https://github.com/tensorflow/tensorflow#installation)
+built from the
+[master branch of TensorFlow](https://github.com/tensorflow/tensorflow/tree/master).
+* The stable versions targeting releases of TensorFlow are available
+as tagged branches or [downloadable releases](https://github.com/tensorflow/models/releases).
+* Model repository version numbers match the target TensorFlow release,
+such that
+[release v2.1.0](https://github.com/tensorflow/models/releases/tag/v2.1.0)
+is compatible with
+[TensorFlow v2.1.0](https://github.com/tensorflow/tensorflow/releases/tag/v2.1.0).

-3.  Add the top-level ***/models*** folder to the Python path with the command:
+Please follow the below steps before running models in this repository.

-  ```shell
-  export PYTHONPATH=$PYTHONPATH:/path/to/models
-  ```
+### Requirements

-  Using Colab:
+* The latest TensorFlow Model Garden release and TensorFlow 2
+  * If you are on a version of TensorFlow earlier than 2.1, please
+upgrade your TensorFlow to [the latest TensorFlow 2](https://www.tensorflow.org/install/).

-  ```python
-  import os
-  os.environ['PYTHONPATH'] += ":/path/to/models"
-  ```
+```shell
+pip3 install tf-nightly
+```

-4.  Install dependencies:
+### Installation

-  ```shell
-  pip3 install --user -r official/requirements.txt
-  ```
+#### Method 1: Install the TensorFlow Model Garden pip package

+**tf-models-nightly** is the nightly Model Garden package
+created daily automatically. pip will install all models
+and dependencies automatically.

-To make Official Models easier to use, we are planning to create a pip
-installable Official Models package. This is being tracked in
-[#917](https://github.com/tensorflow/models/issues/917).
+```shell
+pip install tf-models-nightly
+```

-## Available models
+Please check out our [example](colab/bert.ipynb)
+to learn how to use the pip package.

-**NOTE: For Officially Supported TPU models please check [README-TPU](README-TPU.md).**
+#### Method 2: Clone the source

-**NOTE:** Please make sure to follow the steps in the
-[Requirements](#requirements) section.
+1. Clone the GitHub repository:

-### Natural Language Processing
+```shell
+git clone https://github.com/tensorflow/models.git
+```

-*   [albert](nlp/albert): A Lite BERT for Self-supervised Learning of Language
-    Representations.
-*   [bert](nlp/bert): A powerful pre-trained language representation model:
-    BERT, which stands for Bidirectional Encoder Representations from
-    Transformers.
-*   [transformer](nlp/transformer): A transformer model to translate the WMT English
-    to German dataset.
-*   [xlnet](nlp/xlnet): XLNet: Generalized Autoregressive Pretraining for
-    Language Understanding.
+2. Add the top-level ***/models*** folder to the Python path.

-### Computer Vision
+```shell
+export PYTHONPATH=$PYTHONPATH:/path/to/models
+```

-*   [mnist](vision/image_classification): A basic model to classify digits from
-    the MNIST dataset.
-*   [resnet](vision/image_classification): A deep residual network that can be
-    used to classify both CIFAR-10 and ImageNet's dataset of 1000 classes.
-*   [retinanet](vision/detection): A fast and powerful object detector.
+If you are using a Colab notebook, please set the Python path with os.environ.

-### Others
+```python
+import os
+os.environ['PYTHONPATH'] += ":/path/to/models"
+```

-*   [ncf](recommendation): Neural Collaborative Filtering model for
-    recommendation tasks.
+3. Install other dependencies

-Models that will not update to TensorFlow 2.x stay inside R1 directory:
+```shell
+pip3 install --user -r official/requirements.txt
+```

-*   [boosted_trees](r1/boosted_trees): A Gradient Boosted Trees model to
-    classify higgs boson process from HIGGS Data Set.
-*   [wide_deep](r1/wide_deep): A model that combines a wide model and deep
-    network to classify census income data.
+---

 ## More models to come!

-We are in the progress to revamp official model garden with TensorFlow 2.0 and
-Keras. In the near future, we will bring:
+The team is actively developing new models.
+In the near future, we will add:

-*   State-of-the-art language understanding models: XLNet, GPT2, and more
-    members in Transformer family.
-*   Start-of-the-art image classification models: EfficientNet, MnasNet and
-    variants.
-*   A set of excellent objection detection models.
+- State-of-the-art language understanding models:
+  More members of the Transformer family
+- State-of-the-art image classification models:
+  EfficientNet, MnasNet and variants.
+- A set of excellent object detection models.

 If you would like to make any fixes or improvements to the models, please
 [submit a pull request](https://github.com/tensorflow/models/compare).

-## New Models
+---
+
+## Contributions

-The team is actively working to add new models to the repository. Every model
-should follow the following guidelines, to uphold the our objectives of
-readable, usable, and maintainable code.
+Every model should follow our guidelines to uphold our objectives of readable,
+usable, and maintainable code.

-**General guidelines**
+### General Guidelines

-* Code should be well documented and tested.
-* Runnable from a blank environment with relative ease.
-* Trainable on: single GPU/CPU (baseline), multiple GPUs, TPU
-* Compatible with Python 3 (using [six](https://pythonhosted.org/six/) when
-  being compatible with Python 2 is necessary)
-* Conform to [Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)
+- Code should be well documented and tested.
+- Runnable from a blank environment with ease.
+- Trainable on: single GPU/CPU (baseline), multiple GPUs & TPUs
+- Compatible with Python 3 (using [six](https://pythonhosted.org/six/)
+when being compatible with Python 2 is necessary)
+- Conform to
+  [Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)

-**Implementation guidelines**
+### Implementation Guidelines

-These guidelines exist so the model implementations are consistent for better
-readability and maintainability.
+These guidelines are to ensure consistent model implementations for
+better readability and maintainability.

-*   Use [common utility functions](utils)
-*   Export SavedModel at the end of training.
-*   Consistent flags and flag-parsing library
-    ([read more here](utils/flags/guidelines.md))
-*   Produce benchmarks and logs ([read more here](utils/logs/guidelines.md))
+- Use [common utility functions](utils)
+- Export SavedModel at the end of the training.
+- Consistent flags and flag-parsing library ([read more here](utils/flags/guidelines.md))
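Review note on the Colab step in "Method 2" of the README above: `os.environ['PYTHONPATH'] += ...` raises `KeyError` when the variable is not already set, and changing the environment variable affects only subprocesses started afterwards, not the import path of the interpreter that is already running. A minimal sketch that covers both cases follows. The helper name `add_models_to_path` and the example directory are hypothetical, not part of the repository.

```python
import os
import sys


def add_models_to_path(models_dir: str) -> None:
    """Makes `models_dir` importable here and in child processes."""
    models_dir = os.path.abspath(models_dir)

    # The running interpreter reads sys.path, not PYTHONPATH.
    if models_dir not in sys.path:
        sys.path.append(models_dir)

    # Subprocesses (for example, shell commands launched from a notebook)
    # inherit PYTHONPATH, so update it too without assuming it is set.
    existing = os.environ.get("PYTHONPATH", "")
    parts = [p for p in existing.split(os.pathsep) if p]
    if models_dir not in parts:
        parts.append(models_dir)
    os.environ["PYTHONPATH"] = os.pathsep.join(parts)


# Placeholder location; replace with the actual clone directory.
add_models_to_path("/path/to/models")
```

Calling the helper twice with the same directory leaves both `sys.path` and `PYTHONPATH` unchanged the second time.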
--- a/official/utils/testing/benchmark_wrappers.py
+++ b/official/utils/testing/benchmark_wrappers.py
 # Lint as: python3
+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
 """Utils to annotate and trace benchmarks."""

 from __future__ import absolute_import

--- a/official/benchmark/bert_benchmark.py
+++ b/official/benchmark/bert_benchmark.py
@@ -34,7 +34,7 @@ from official.benchmark import bert_benchmark_utils as benchmark_utils
 from official.nlp.bert import configs
 from official.nlp.bert import run_classifier
 from official.utils.misc import distribution_utils
-from official.utils.testing import benchmark_wrappers
+from official.benchmark import benchmark_wrappers

 # pylint: disable=line-too-long
 PRETRAINED_CHECKPOINT_PATH = 'gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16/bert_model.ckpt'
@@ -56,6 +56,7 @@ class BertClassifyBenchmarkBase(benchmark_utils.BertBenchmarkBase):
    self.num_epochs = None
    self.num_steps_per_epoch = None
    self.tpu = tpu
+    FLAGS.steps_per_loop = 50

  @flagsaver.flagsaver
  def _run_bert_classifier(self, callbacks=None, use_ds=True):
@@ -81,8 +82,6 @@ class BertClassifyBenchmarkBase(benchmark_utils.BertBenchmarkBase):
          distribution_strategy='mirrored' if use_ds else 'off',
          num_gpus=self.num_gpus)

-    steps_per_loop = 50
-
    max_seq_length = input_meta_data['max_seq_length']
    train_input_fn = run_classifier.get_dataset_fn(
        FLAGS.train_data_path,
@@ -101,7 +100,7 @@ class BertClassifyBenchmarkBase(benchmark_utils.BertBenchmarkBase):
        FLAGS.model_dir,
        epochs,
        steps_per_epoch,
-        steps_per_loop,
+        FLAGS.steps_per_loop,
        eval_steps,
        warmup_steps,
        FLAGS.learning_rate,

--- a/official/benchmark/bert_benchmark_utils.py
+++ b/official/benchmark/bert_benchmark_utils.py
@@ -23,11 +23,11 @@ import time
 # pylint: disable=g-bad-import-order
 import numpy as np
 from absl import flags
-import tensorflow.compat.v2 as tf
+import tensorflow as tf
 # pylint: enable=g-bad-import-order

 from official.utils.flags import core as flags_core
-from official.utils.testing.perfzero_benchmark import PerfZeroBenchmark
+from official.benchmark.perfzero_benchmark import PerfZeroBenchmark

 FLAGS = flags.FLAGS


--- a/official/benchmark/bert_squad_benchmark.py
+++ b/official/benchmark/bert_squad_benchmark.py
@@ -33,7 +33,7 @@ from official.benchmark import bert_benchmark_utils as benchmark_utils
 from official.nlp.bert import run_squad
 from official.utils.misc import distribution_utils
 from official.utils.misc import keras_utils
-from official.utils.testing import benchmark_wrappers
+from official.benchmark import benchmark_wrappers


 # pylint: disable=line-too-long
@@ -104,7 +104,6 @@ class BertSquadBenchmarkBase(benchmark_utils.BertBenchmarkBase):
  @flagsaver.flagsaver
  def _train_squad(self, run_eagerly=False, ds_type='mirrored'):
    """Runs BERT SQuAD training. Uses mirrored strategy by default."""
-    assert tf.version.VERSION.startswith('2.')
    self._init_gpu_and_data_threads()
    input_meta_data = self._read_input_meta_data_from_file()
    strategy = self._get_distribution_strategy(ds_type)
@@ -118,7 +117,6 @@ class BertSquadBenchmarkBase(benchmark_utils.BertBenchmarkBase):
  @flagsaver.flagsaver
  def _evaluate_squad(self, ds_type='mirrored'):
    """Runs BERT SQuAD evaluation. Uses mirrored strategy by default."""
-    assert tf.version.VERSION.startswith('2.')
    self._init_gpu_and_data_threads()
    input_meta_data = self._read_input_meta_data_from_file()
    strategy = self._get_distribution_strategy(ds_type)
@@ -128,7 +126,7 @@ class BertSquadBenchmarkBase(benchmark_utils.BertBenchmarkBase):
    eval_metrics = run_squad.eval_squad(strategy=strategy,
                                        input_meta_data=input_meta_data)
    # Use F1 score as reported evaluation metric.
-    self.eval_metrics = eval_metrics['f1']
+    self.eval_metrics = eval_metrics['final_f1']


 class BertSquadBenchmarkReal(BertSquadBenchmarkBase):
@@ -254,7 +252,7 @@ class BertSquadBenchmarkReal(BertSquadBenchmarkBase):
    self._setup()
    self.num_gpus = 8
    FLAGS.model_dir = self._get_model_dir('benchmark_8_gpu_squad')
-    FLAGS.train_batch_size = 32
+    FLAGS.train_batch_size = 24
    FLAGS.tf_gpu_thread_mode = 'gpu_private'

    self._run_and_report_benchmark()

--- a/official/benchmark/keras_benchmark.py
+++ b/official/benchmark/keras_benchmark.py
@@ -19,9 +19,8 @@ from __future__ import division
 from __future__ import print_function

 import tensorflow as tf
-
+from official.benchmark.perfzero_benchmark import PerfZeroBenchmark
 from official.utils.flags import core as flags_core
-from official.utils.testing.perfzero_benchmark import PerfZeroBenchmark


 class KerasBenchmark(PerfZeroBenchmark):
@@ -32,7 +31,6 @@ class KerasBenchmark(PerfZeroBenchmark):
               default_flags=None,
               flag_methods=None,
               tpu=None):
-    assert tf.version.VERSION.startswith('2.')
    super(KerasBenchmark, self).__init__(
        output_dir=output_dir,
        default_flags=default_flags,

--- a/official/benchmark/keras_cifar_benchmark.py
+++ b/official/benchmark/keras_cifar_benchmark.py
@@ -23,7 +23,7 @@ from absl import flags
 import tensorflow as tf  # pylint: disable=g-bad-import-order

 from official.benchmark import keras_benchmark
-from official.utils.testing import benchmark_wrappers
+from official.benchmark import benchmark_wrappers
 from official.benchmark.models import resnet_cifar_main

 MIN_TOP_1_ACCURACY = 0.929

--- a/official/benchmark/keras_imagenet_benchmark.py
+++ b/official/benchmark/keras_imagenet_benchmark.py
+# Lint as: python3
 # Copyright 2018 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -13,17 +14,22 @@
 # limitations under the License.
 # ==============================================================================
 """Executes Keras benchmarks and accuracy tests."""
+# pylint: disable=line-too-long
 from __future__ import print_function

+import json
 import os
 import time

+from typing import Any, MutableMapping, Optional
+
 from absl import flags
 import tensorflow as tf  # pylint: disable=g-bad-import-order

+from official.benchmark import benchmark_wrappers
 from official.benchmark import keras_benchmark
-from official.utils.testing import benchmark_wrappers
-from official.vision.image_classification.resnet import resnet_imagenet_main
+from official.benchmark.models import resnet_imagenet_main
+from official.vision.image_classification import classifier_trainer

 MIN_TOP_1_ACCURACY = 0.76
 MAX_TOP_1_ACCURACY = 0.77
@@ -41,10 +47,74 @@ MODEL_OPTIMIZATION_TOP_1_ACCURACY = {
 FLAGS = flags.FLAGS


+def _get_classifier_parameters(
+    num_gpus: int = 0,
+    builder: str = 'records',
+    skip_eval: bool = False,
+    distribution_strategy: str = 'mirrored',
+    per_replica_batch_size: int = 128,
+    epochs: int = 90,
+    steps: int = 0,
+    epochs_between_evals: int = 1,
+    dtype: str = 'float32',
+    enable_xla: bool = False,
+    run_eagerly: bool = False,
+    gpu_thread_mode: Optional[str] = None,
+    dataset_num_private_threads: Optional[int] = None,
+    loss_scale: Optional[str] = None) -> MutableMapping[str, Any]:
+  """Gets classifier trainer's ResNet parameters."""
+  return {
+      'runtime': {
+          'num_gpus': num_gpus,
+          'distribution_strategy': distribution_strategy,
+          'run_eagerly': run_eagerly,
+          'enable_xla': enable_xla,
+          'dataset_num_private_threads': dataset_num_private_threads,
+          'gpu_thread_mode': gpu_thread_mode,
+          'loss_scale': loss_scale,
+      },
+      'train_dataset': {
+          'builder': builder,
+          'use_per_replica_batch_size': True,
+          'batch_size': per_replica_batch_size,
+          'image_size': 224,
+          'dtype': dtype,
+      },
+      'validation_dataset': {
+          'builder': builder,
+          'batch_size': per_replica_batch_size,
+          'use_per_replica_batch_size': True,
+          'image_size': 224,
+          'dtype': dtype,
+      },
+      'train': {
+          'epochs': epochs,
+          'steps': steps,
+          'callbacks': {
+              'enable_tensorboard': False,
+              'enable_checkpoint_and_export': False,
+              'enable_time_history': True,
+          },
+      },
+      'model': {
+          'loss': {
+              'label_smoothing': 0.1,
+          },
+      },
+      'evaluation': {
+          'epochs_between_evals': epochs_between_evals,
+          'skip_eval': skip_eval,
+      },
+  }
+
+
 class Resnet50KerasAccuracy(keras_benchmark.KerasBenchmark):
  """Benchmark accuracy tests for ResNet50 in Keras."""

-  def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
+  def __init__(self,
+               output_dir: Optional[str] = None,
+               root_data_dir: Optional[str] = None,
+               **kwargs):
    """A benchmark class.

    Args:
@@ -55,97 +125,54 @@ class Resnet50KerasAccuracy(keras_benchmark.KerasBenchmark):
                named arguments before updating the constructor.
    """

-    flag_methods = [resnet_imagenet_main.define_imagenet_keras_flags]
+    flag_methods = [classifier_trainer.define_classifier_flags]

    self.data_dir = os.path.join(root_data_dir, 'imagenet')
    super(Resnet50KerasAccuracy, self).__init__(
        output_dir=output_dir, flag_methods=flag_methods)

-  def benchmark_8_gpu(self):
-    """Test Keras model with eager, dist_strat and 8 GPUs."""
-    self._setup()
-    FLAGS.num_gpus = 8
-    FLAGS.data_dir = self.data_dir
-    FLAGS.batch_size = 128 * 8
-    FLAGS.train_epochs = 90
-    FLAGS.epochs_between_evals = 10
-    FLAGS.model_dir = self._get_model_dir('benchmark_8_gpu')
-    FLAGS.dtype = 'fp32'
-    FLAGS.enable_eager = True
-    # Add some thread tunings to improve performance.
-    FLAGS.datasets_num_private_threads = 14
-    self._run_and_report_benchmark()
-
-  def benchmark_8_gpu_amp(self):
-    """Test Keras model with eager, dist_strat and 8 GPUs with automatic mixed precision."""
-    self._setup()
-    FLAGS.num_gpus = 8
-    FLAGS.data_dir = self.data_dir
-    FLAGS.batch_size = 128 * 8
-    FLAGS.train_epochs = 90
-    FLAGS.epochs_between_evals = 10
-    FLAGS.model_dir = self._get_model_dir('benchmark_8_gpu_amp')
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = True
-    FLAGS.fp16_implementation = 'graph_rewrite'
-    # Add some thread tunings to improve performance.
-    FLAGS.datasets_num_private_threads = 14
-    self._run_and_report_benchmark()
-
-  def benchmark_8_gpu_fp16(self):
-    """Test Keras model with eager, dist_strat, 8 GPUs, and fp16."""
-    self._setup()
-    FLAGS.num_gpus = 8
-    FLAGS.data_dir = self.data_dir
-    FLAGS.batch_size = 256 * 8
-    FLAGS.train_epochs = 90
-    FLAGS.epochs_between_evals = 10
-    FLAGS.model_dir = self._get_model_dir('benchmark_8_gpu_fp16')
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = True
-    # Thread tuning to improve performance.
-    FLAGS.tf_gpu_thread_mode = 'gpu_private'
-    self._run_and_report_benchmark()
-
-  def benchmark_xla_8_gpu_fp16(self):
-    """Test Keras model with XLA, eager, dist_strat, 8 GPUs and fp16."""
-    self._setup()
-    FLAGS.num_gpus = 8
-    FLAGS.data_dir = self.data_dir
-    FLAGS.batch_size = 256 * 8
-    FLAGS.train_epochs = 90
-    FLAGS.epochs_between_evals = 10
-    FLAGS.model_dir = self._get_model_dir('benchmark_xla_8_gpu_fp16')
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = True
-    FLAGS.enable_xla = True
-    # Thread tuning to improve performance.
-    FLAGS.tf_gpu_thread_mode = 'gpu_private'
-    self._run_and_report_benchmark()
-
-  def benchmark_xla_8_gpu_fp16_dynamic(self):
-    """Test Keras model with XLA, eager, dist_strat, 8 GPUs, dynamic fp16."""
-    self._setup()
-    FLAGS.num_gpus = 8
+  @benchmark_wrappers.enable_runtime_flags
+  def _run_and_report_benchmark(
+      self,
+      experiment_name: str,
+      top_1_min: float = MIN_TOP_1_ACCURACY,
+      top_1_max: float = MAX_TOP_1_ACCURACY,
+      num_gpus: int = 0,
+      distribution_strategy: str = 'mirrored',
+      per_replica_batch_size: int = 128,
+      epochs: int = 90,
+      steps: int = 0,
+      epochs_between_evals: int = 1,
+      dtype: str = 'float32',
+      enable_xla: bool = False,
+      run_eagerly: bool = False,
+      gpu_thread_mode: Optional[str] = None,
+      dataset_num_private_threads: Optional[int] = None,
+      loss_scale: Optional[str] = None):
+    """Runs and reports the benchmark given the provided configuration."""
+    FLAGS.model_type = 'resnet'
+    FLAGS.dataset = 'imagenet'
+    FLAGS.mode = 'train_and_eval'
    FLAGS.data_dir = self.data_dir
-    FLAGS.batch_size = 256 * 8
-    FLAGS.train_epochs = 90
-    FLAGS.epochs_between_evals = 10
-    FLAGS.model_dir = self._get_model_dir('benchmark_xla_8_gpu_fp16_dynamic')
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = True
-    FLAGS.enable_xla = True
-    FLAGS.loss_scale = 'dynamic'
-    # Thread tuning to improve performance.
-    FLAGS.tf_gpu_thread_mode = 'gpu_private'
-    self._run_and_report_benchmark(top_1_min=0.736)
+    FLAGS.model_dir = self._get_model_dir(experiment_name)
+    parameters = _get_classifier_parameters(
+        num_gpus=num_gpus,
+        distribution_strategy=distribution_strategy,
+        per_replica_batch_size=per_replica_batch_size,
+        epochs=epochs,
+        steps=steps,
+        epochs_between_evals=epochs_between_evals,
+        dtype=dtype,
+        enable_xla=enable_xla,
+        run_eagerly=run_eagerly,
+        gpu_thread_mode=gpu_thread_mode,
+        dataset_num_private_threads=dataset_num_private_threads,
+        loss_scale=loss_scale)
+    FLAGS.params_override = json.dumps(parameters)
+    total_batch_size = num_gpus * per_replica_batch_size

-  @benchmark_wrappers.enable_runtime_flags
-  def _run_and_report_benchmark(self,
-                                top_1_min=MIN_TOP_1_ACCURACY,
-                                top_1_max=MAX_TOP_1_ACCURACY):
    start_time_sec = time.time()
-    stats = resnet_imagenet_main.run(flags.FLAGS)
+    stats = classifier_trainer.run(flags.FLAGS)
    wall_time_sec = time.time() - start_time_sec

    super(Resnet50KerasAccuracy, self)._report_benchmark(
@@ -153,9 +180,56 @@ class Resnet50KerasAccuracy(keras_benchmark.KerasBenchmark):
        wall_time_sec,
        top_1_min=top_1_min,
        top_1_max=top_1_max,
-        total_batch_size=FLAGS.batch_size,
+        total_batch_size=total_batch_size,
        log_steps=100)

+  def benchmark_8_gpu(self):
+    """Tests Keras model with eager, dist_strat and 8 GPUs."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_8_gpu',
+        num_gpus=8,
+        per_replica_batch_size=128,
+        epochs=90,
+        epochs_between_evals=10,
+        dtype='float32')
+
+  def benchmark_8_gpu_fp16(self):
+    """Tests Keras model with eager, dist_strat, 8 GPUs, and fp16."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_8_gpu_fp16',
+        num_gpus=8,
+        per_replica_batch_size=256,
+        epochs=90,
+        epochs_between_evals=10,
+        dtype='float16')
+
+  def benchmark_xla_8_gpu_fp16(self):
+    """Tests Keras model with XLA, eager, dist_strat, 8 GPUs and fp16."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_8_gpu_fp16',
+        num_gpus=8,
+        per_replica_batch_size=256,
+        epochs=90,
+        epochs_between_evals=10,
+        dtype='float16',
+        enable_xla=True)
+
+  def benchmark_xla_8_gpu_fp16_dynamic(self):
+    """Tests Keras model with XLA, eager, dist_strat, 8 GPUs, dynamic fp16."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_8_gpu_fp16_dynamic',
+        top_1_min=0.736,
+        num_gpus=8,
+        per_replica_batch_size=256,
+        epochs=90,
+        epochs_between_evals=10,
+        dtype='float16',
+        enable_xla=True, loss_scale='dynamic')
+
  def _get_model_dir(self, folder_name):
    return os.path.join(self.output_dir, folder_name)

@@ -197,8 +271,6 @@ class MobilenetV1KerasAccuracy(keras_benchmark.KerasBenchmark):
    FLAGS.model_dir = self._get_model_dir('benchmark_8_gpu')
    FLAGS.dtype = 'fp32'
    FLAGS.enable_eager = True
-    # Add some thread tunings to improve performance.
-    FLAGS.datasets_num_private_threads = 14
    self._run_and_report_benchmark()

  @benchmark_wrappers.enable_runtime_flags
@@ -221,6 +293,348 @@ class MobilenetV1KerasAccuracy(keras_benchmark.KerasBenchmark):
    return os.path.join(self.output_dir, folder_name)


+class Resnet50KerasClassifierBenchmarkBase(keras_benchmark.KerasBenchmark):
+  """Resnet50 (classifier_trainer) benchmarks."""
+
+  def __init__(self, output_dir=None, default_flags=None,
+               tpu=None, dataset_builder='records', train_epochs=1,
+               train_steps=110, data_dir=None):
+    flag_methods = [classifier_trainer.define_classifier_flags]
+
+    self.dataset_builder = dataset_builder
+    self.train_epochs = train_epochs
+    self.train_steps = train_steps
+    self.data_dir = data_dir
+
+    super(Resnet50KerasClassifierBenchmarkBase, self).__init__(
+        output_dir=output_dir,
+        flag_methods=flag_methods,
+        default_flags=default_flags,
+        tpu=tpu)
+
+  @benchmark_wrappers.enable_runtime_flags
+  def _run_and_report_benchmark(
+      self,
+      experiment_name: str,
+      skip_steps: Optional[int] = None,
+      top_1_min: float = MIN_TOP_1_ACCURACY,
+      top_1_max: float = MAX_TOP_1_ACCURACY,
+      num_gpus: int = 0,
+      num_tpus: int = 0,
+      distribution_strategy: str = 'mirrored',
+      per_replica_batch_size: int = 128,
+      epochs_between_evals: int = 1,
+      dtype: str = 'float32',
+      enable_xla: bool = False,
+      run_eagerly: bool = False,
+      gpu_thread_mode: Optional[str] = None,
+      dataset_num_private_threads: Optional[int] = None,
+      loss_scale: Optional[str] = None):
+    """Runs and reports the benchmark given the provided configuration."""
+    FLAGS.model_type = 'resnet'
+    FLAGS.dataset = 'imagenet'
+    FLAGS.mode = 'train_and_eval'
+    FLAGS.data_dir = self.data_dir
+    FLAGS.model_dir = self._get_model_dir(experiment_name)
+    parameters = _get_classifier_parameters(
+        builder=self.dataset_builder,
+        skip_eval=True,
+        num_gpus=num_gpus,
+        distribution_strategy=distribution_strategy,
+        per_replica_batch_size=per_replica_batch_size,
+        epochs=self.train_epochs,
+        steps=self.train_steps,
+        epochs_between_evals=epochs_between_evals,
+        dtype=dtype,
+        enable_xla=enable_xla, run_eagerly=run_eagerly,
+        gpu_thread_mode=gpu_thread_mode,
+        dataset_num_private_threads=dataset_num_private_threads,
+        loss_scale=loss_scale)
+    FLAGS.params_override = json.dumps(parameters)
+    if distribution_strategy == 'tpu':
+      total_batch_size = num_tpus * per_replica_batch_size
+    else:
+      total_batch_size = num_gpus * per_replica_batch_size
+
+    start_time_sec = time.time()
+    stats = classifier_trainer.run(flags.FLAGS)
+    wall_time_sec = time.time() - start_time_sec
+    # Number of logged step-time entries excluded from the performance report.
+    # By default only the last 100 steps are kept; if skip_steps is set, that
+    # many leading steps are skipped instead.
+    warmup = (skip_steps or (self.train_steps - 100)) // FLAGS.log_steps
+
+    super(Resnet50KerasClassifierBenchmarkBase, self)._report_benchmark(
+        stats,
+        wall_time_sec,
+        total_batch_size=total_batch_size,
+        log_steps=FLAGS.log_steps,
+        warmup=warmup,
+        start_time_sec=start_time_sec)
+
+  def benchmark_1_gpu_no_dist_strat(self):
+    """Tests Keras model with 1 GPU, no distribution strategy."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_1_gpu_no_dist_strat',
+        num_gpus=1,
+        distribution_strategy='off',
+        per_replica_batch_size=128)
+
+  def benchmark_1_gpu_no_dist_strat_run_eagerly(self):
+    """Tests Keras model with 1 GPU, no distribution strategy, run eagerly."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_1_gpu_no_dist_strat_run_eagerly',
+        num_gpus=1,
+        run_eagerly=True,
+        distribution_strategy='off',
+        per_replica_batch_size=64)
+
+  def benchmark_1_gpu_no_dist_strat_run_eagerly_fp16(self):
+    """Tests with 1 GPU, no distribution strategy, fp16, run eagerly."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_1_gpu_no_dist_strat_run_eagerly_fp16',
+        num_gpus=1,
+        run_eagerly=True,
+        distribution_strategy='off',
+        dtype='float16',
+        per_replica_batch_size=128)
+
+  def benchmark_1_gpu(self):
+    """Tests Keras model with 1 GPU."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_1_gpu',
+        num_gpus=1,
+        distribution_strategy='one_device',
+        per_replica_batch_size=128)
+
+  def benchmark_xla_1_gpu(self):
+    """Tests Keras model with XLA and 1 GPU."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_1_gpu',
+        num_gpus=1,
+        enable_xla=True,
+        distribution_strategy='one_device',
+        per_replica_batch_size=128)
+
+  def benchmark_1_gpu_fp16(self):
+    """Tests Keras model with 1 GPU and fp16."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_1_gpu_fp16',
+        num_gpus=1,
+        distribution_strategy='one_device',
+        dtype='float16',
+        per_replica_batch_size=256)
+
+  def benchmark_1_gpu_fp16_dynamic(self):
+    """Tests Keras model with 1 GPU, fp16, and dynamic loss scaling."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_1_gpu_fp16_dynamic',
+        num_gpus=1,
+        distribution_strategy='one_device',
+        dtype='float16',
+        per_replica_batch_size=256,
+        loss_scale='dynamic')
+
+  def benchmark_xla_1_gpu_fp16(self):
+    """Tests Keras model with XLA, 1 GPU and fp16."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_1_gpu_fp16',
+        num_gpus=1,
+        enable_xla=True,
+        distribution_strategy='one_device',
+        dtype='float16',
+        per_replica_batch_size=256)
+
+  def benchmark_xla_1_gpu_fp16_tweaked(self):
+    """Tests Keras model with XLA, 1 GPU, fp16, and manual config tuning."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_1_gpu_fp16_tweaked',
+        num_gpus=1,
+        enable_xla=True,
+        distribution_strategy='one_device',
+        dtype='float16',
+        per_replica_batch_size=256,
+        gpu_thread_mode='gpu_private')
+
+  def benchmark_xla_1_gpu_fp16_dynamic(self):
+    """Tests Keras model with XLA, 1 GPU, fp16, and dynamic loss scaling."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_1_gpu_fp16_dynamic',
+        num_gpus=1,
+        enable_xla=True,
+        distribution_strategy='one_device',
+        dtype='float16',
+        per_replica_batch_size=256,
+        loss_scale='dynamic')
+
+  def benchmark_8_gpu(self):
+    """Tests Keras model with 8 GPUs."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_8_gpu',
+        num_gpus=8,
+        distribution_strategy='mirrored',
+        per_replica_batch_size=128)
+
+  def benchmark_8_gpu_tweaked(self):
+    """Tests Keras model with manual config tuning and 8 GPUs."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_8_gpu_tweaked',
+        num_gpus=8,
+        distribution_strategy='mirrored',
+        per_replica_batch_size=128,
+        dataset_num_private_threads=14)
+
+  def benchmark_xla_8_gpu(self):
+    """Tests Keras model with XLA and 8 GPUs."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_8_gpu',
+        num_gpus=8,
+        enable_xla=True,
+        distribution_strategy='mirrored',
+        per_replica_batch_size=128)
+
+  def benchmark_xla_8_gpu_tweaked(self):
+    """Tests Keras model with manual config tuning, 8 GPUs, and XLA."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_8_gpu_tweaked',
+        num_gpus=8,
+        enable_xla=True,
+        distribution_strategy='mirrored',
+        per_replica_batch_size=128,
+        gpu_thread_mode='gpu_private',
+        dataset_num_private_threads=24)
+
+  def benchmark_8_gpu_fp16(self):
+    """Tests Keras model with 8 GPUs and fp16."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_8_gpu_fp16',
+        num_gpus=8,
+        dtype='float16',
+        distribution_strategy='mirrored',
+        per_replica_batch_size=256)
+
+  def benchmark_8_gpu_fp16_tweaked(self):
+    """Tests Keras model with 8 GPUs, fp16, and manual config tuning."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_8_gpu_fp16_tweaked',
+        num_gpus=8,
+        dtype='float16',
+        distribution_strategy='mirrored',
+        per_replica_batch_size=256,
+        gpu_thread_mode='gpu_private',
+        dataset_num_private_threads=40)
+
+  def benchmark_8_gpu_fp16_dynamic_tweaked(self):
+    """Tests Keras model with 8 GPUs, fp16, dynamic loss scaling, and tuned."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_8_gpu_fp16_dynamic_tweaked',
+        num_gpus=8,
+        dtype='float16',
+        distribution_strategy='mirrored',
+        per_replica_batch_size=256,
+        loss_scale='dynamic',
+        gpu_thread_mode='gpu_private',
+        dataset_num_private_threads=40)
+
+  def benchmark_xla_8_gpu_fp16(self):
+    """Tests Keras model with XLA, 8 GPUs and fp16."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_8_gpu_fp16',
+        dtype='float16',
+        num_gpus=8,
+        enable_xla=True,
+        distribution_strategy='mirrored',
+        per_replica_batch_size=256)
+
+  def benchmark_xla_8_gpu_fp16_tweaked(self):
+    """Tests Keras model with manual config tuning, XLA, 8 GPUs and fp16."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_8_gpu_fp16_tweaked',
+        dtype='float16',
+        num_gpus=8,
+        enable_xla=True,
+        distribution_strategy='mirrored',
+        per_replica_batch_size=256,
+        gpu_thread_mode='gpu_private',
+        dataset_num_private_threads=48)
+
+  def benchmark_xla_8_gpu_fp16_tweaked_delay_measure(self):
+    """Tests with manual config tuning, XLA, 8 GPUs and fp16.
+
+    Delay performance measurement for stable performance on 96 vCPU platforms.
+    """
+    self._setup()
+    self.train_steps = 310  # Run longer so measurement starts later.
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_8_gpu_fp16_tweaked_delay_measure',
+        dtype='float16',
+        num_gpus=8,
+        enable_xla=True,
+        distribution_strategy='mirrored',
+        per_replica_batch_size=256,
+        gpu_thread_mode='gpu_private',
+        dataset_num_private_threads=48)
+
+  def benchmark_xla_8_gpu_fp16_dynamic_tweaked(self):
+    """Tests Keras model with config tuning, XLA, 8 GPUs and dynamic fp16."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_xla_8_gpu_fp16_dynamic_tweaked',
+        dtype='float16',
+        num_gpus=8,
+        enable_xla=True,
+        distribution_strategy='mirrored',
+        per_replica_batch_size=256,
+        gpu_thread_mode='gpu_private',
+        loss_scale='dynamic',
+        dataset_num_private_threads=48)
+
+  def benchmark_2x2_tpu_fp16(self):
+    """Tests Keras model with 2x2 TPU and bfloat16."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_2x2_tpu_fp16',
+        dtype='bfloat16',
+        num_tpus=8,
+        distribution_strategy='tpu',
+        per_replica_batch_size=128)
+
+  def benchmark_4x4_tpu_fp16(self):
+    """Tests Keras model with 4x4 TPU and bfloat16."""
+    self._setup()
+    self._run_and_report_benchmark(
+        experiment_name='benchmark_4x4_tpu_fp16',
+        dtype='bfloat16',
+        num_tpus=32,
+        distribution_strategy='tpu',
+        per_replica_batch_size=128)
+
+  def fill_report_object(self, stats):
+    super(Resnet50KerasClassifierBenchmarkBase, self).fill_report_object(
+        stats,
+        total_batch_size=FLAGS.batch_size,
+        log_steps=FLAGS.log_steps)
+
+
 class Resnet50KerasBenchmarkBase(keras_benchmark.KerasBenchmark):
  """Resnet50 benchmarks."""

@@ -318,18 +732,6 @@ class Resnet50KerasBenchmarkBase(keras_benchmark.KerasBenchmark):
    FLAGS.batch_size = 128
    self._run_and_report_benchmark()

-  def benchmark_graph_1_gpu_no_dist_strat(self):
-    """Test Keras model in legacy graph mode with 1 GPU, no dist strat."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.enable_eager = False
-    FLAGS.distribution_strategy = 'off'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_1_gpu_no_dist_strat')
-    FLAGS.batch_size = 96  # BatchNorm is less efficient in legacy graph mode
-    # due to its reliance on v1 cond.
-    self._run_and_report_benchmark()
-
  def benchmark_1_gpu(self):
    """Test Keras model with 1 GPU."""
    self._setup()
@@ -446,69 +848,6 @@ class Resnet50KerasBenchmarkBase(keras_benchmark.KerasBenchmark):
    FLAGS.loss_scale = 'dynamic'
    self._run_and_report_benchmark()

-  def benchmark_graph_1_gpu(self):
-    """Test Keras model in legacy graph mode with 1 GPU."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.enable_eager = False
-    FLAGS.distribution_strategy = 'one_device'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_1_gpu')
-    FLAGS.batch_size = 128
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_xla_1_gpu(self):
-    """Test Keras model in legacy graph mode with XLA and 1 GPU."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.enable_eager = False
-    FLAGS.enable_xla = True
-    FLAGS.distribution_strategy = 'one_device'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_xla_1_gpu')
-    FLAGS.batch_size = 128
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_1_gpu_fp16(self):
-    """Test Keras model in legacy graph mode with 1 GPU and fp16."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = False
-    FLAGS.distribution_strategy = 'one_device'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_1_gpu_fp16')
-    FLAGS.batch_size = 256
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_xla_1_gpu_fp16(self):
-    """Test Keras model in legacy graph mode with 1 GPU, fp16 and XLA."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = False
-    FLAGS.enable_xla = True
-    FLAGS.distribution_strategy = 'one_device'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_xla_1_gpu_fp16')
-    FLAGS.batch_size = 256
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_xla_1_gpu_fp16_tweaked(self):
-    """Test Keras model in legacy graph with 1 GPU, fp16, XLA, and tuning."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.enable_eager = False
-    FLAGS.enable_xla = True
-    FLAGS.distribution_strategy = 'one_device'
-    FLAGS.model_dir = self._get_model_dir(
-        'benchmark_graph_xla_1_gpu_fp16_tweaked')
-    FLAGS.dtype = 'fp16'
-    FLAGS.batch_size = 256
-    FLAGS.tf_gpu_thread_mode = 'gpu_private'
-    self._run_and_report_benchmark()
-
  def benchmark_8_gpu(self):
    """Test Keras model with 8 GPUs."""
    self._setup()
@@ -608,6 +947,7 @@ class Resnet50KerasBenchmarkBase(keras_benchmark.KerasBenchmark):
    FLAGS.model_dir = self._get_model_dir('benchmark_8_gpu_fp16_tweaked')
    FLAGS.batch_size = 256 * 8  # 8 GPUs
    FLAGS.tf_gpu_thread_mode = 'gpu_private'
+    FLAGS.datasets_num_private_threads = 40
    self._run_and_report_benchmark()

  def benchmark_8_gpu_fp16_dynamic_tweaked(self):
@@ -623,6 +963,7 @@ class Resnet50KerasBenchmarkBase(keras_benchmark.KerasBenchmark):
    FLAGS.batch_size = 256 * 8  # 8 GPUs
    FLAGS.loss_scale = 'dynamic'
    FLAGS.tf_gpu_thread_mode = 'gpu_private'
+    FLAGS.datasets_num_private_threads = 40
    self._run_and_report_benchmark()

  def benchmark_xla_8_gpu_fp16(self):
@@ -669,6 +1010,7 @@ class Resnet50KerasBenchmarkBase(keras_benchmark.KerasBenchmark):
        'benchmark_xla_8_gpu_fp16_tweaked_delay_measure')
    FLAGS.batch_size = 256 * 8
    FLAGS.tf_gpu_thread_mode = 'gpu_private'
+    FLAGS.datasets_num_private_threads = 48
    FLAGS.train_steps = 310
    self._run_and_report_benchmark()

@@ -689,132 +1031,6 @@ class Resnet50KerasBenchmarkBase(keras_benchmark.KerasBenchmark):
    FLAGS.datasets_num_private_threads = 48
    self._run_and_report_benchmark()

-  def benchmark_graph_8_gpu(self):
-    """Test Keras model in legacy graph mode with 8 GPUs."""
-    self._setup()
-
-    FLAGS.num_gpus = 8
-    FLAGS.enable_eager = False
-    FLAGS.distribution_strategy = 'mirrored'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_8_gpu')
-    FLAGS.batch_size = 128 * 8  # 8 GPUs
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_xla_8_gpu(self):
-    """Test Keras model in legacy graph mode with XLA and 8 GPUs."""
-    self._setup()
-
-    FLAGS.num_gpus = 8
-    FLAGS.enable_eager = False
-    FLAGS.enable_xla = True
-    FLAGS.distribution_strategy = 'mirrored'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_xla_8_gpu')
-    FLAGS.batch_size = 128 * 8  # 8 GPUs
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_8_gpu_fp16(self):
-    """Test Keras model in legacy graph mode with 8 GPUs and fp16."""
-    self._setup()
-
-    FLAGS.num_gpus = 8
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = False
-    FLAGS.distribution_strategy = 'mirrored'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_8_gpu_fp16')
-    FLAGS.batch_size = 256 * 8  # 8 GPUs
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_xla_8_gpu_fp16(self):
-    """Test Keras model in legacy graph mode with XLA, 8 GPUs and fp16."""
-    self._setup()
-
-    FLAGS.num_gpus = 8
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = False
-    FLAGS.enable_xla = True
-    FLAGS.distribution_strategy = 'mirrored'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_xla_8_gpu_fp16')
-    FLAGS.batch_size = 256 * 8  # 8 GPUs
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_8_gpu_fp16_tweaked(self):
-    """Test Keras model in legacy graph mode, tuning, 8 GPUs, and FP16."""
-    self._setup()
-
-    FLAGS.num_gpus = 8
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = False
-    FLAGS.distribution_strategy = 'mirrored'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_8_gpu_fp16_tweaked')
-    FLAGS.batch_size = 256 * 8  # 8 GPUs
-    FLAGS.tf_gpu_thread_mode = 'gpu_private'
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_xla_8_gpu_fp16_tweaked(self):
-    """Test Keras model in legacy graph tuning, XLA_FP16, 8 GPUs and fp16."""
-    self._setup()
-
-    FLAGS.num_gpus = 8
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = False
-    FLAGS.enable_xla = True
-    FLAGS.distribution_strategy = 'mirrored'
-    FLAGS.model_dir = self._get_model_dir(
-        'benchmark_graph_xla_8_gpu_fp16_tweaked')
-    FLAGS.batch_size = 256 * 8  # 8 GPUs
-    FLAGS.tf_gpu_thread_mode = 'gpu_private'
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_xla_8_gpu_fp16_tweaked_delay_measure(self):
-    """Test in legacy graph mode with manual config tuning, XLA, 8 GPUs, fp16.
-
-    Delay performance measurement for stable performance on 96 vCPU platforms.
-    """
-    self._setup()
-
-    FLAGS.num_gpus = 8
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = False
-    FLAGS.enable_xla = True
-    FLAGS.distribution_strategy = 'mirrored'
-    FLAGS.model_dir = self._get_model_dir(
-        'benchmark_graph_xla_8_gpu_fp16_tweaked_delay_measure')
-    FLAGS.batch_size = 256 * 8
-    FLAGS.tf_gpu_thread_mode = 'gpu_private'
-    FLAGS.train_steps = 310
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_8_gpu_fp16_dynamic_tweaked(self):
-    """Test graph Keras with config tuning, 8 GPUs and dynamic fp16."""
-    self._setup()
-
-    FLAGS.num_gpus = 8
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = False
-    FLAGS.distribution_strategy = 'mirrored'
-    FLAGS.model_dir = self._get_model_dir(
-        'benchmark_graph_8_gpu_fp16_dynamic_tweaked')
-    FLAGS.batch_size = 256 * 8  # 8 GPUs
-    FLAGS.loss_scale = 'dynamic'
-    FLAGS.tf_gpu_thread_mode = 'gpu_private'
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_xla_8_gpu_fp16_dynamic_tweaked(self):
-    """Test graph Keras with config tuning, XLA, 8 GPUs and dynamic fp16."""
-    self._setup()
-
-    FLAGS.num_gpus = 8
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = False
-    FLAGS.enable_xla = True
-    FLAGS.distribution_strategy = 'mirrored'
-    FLAGS.model_dir = self._get_model_dir(
-        'benchmark_graph_xla_8_gpu_fp16_dynamic_tweaked')
-    FLAGS.batch_size = 256 * 8  # 8 GPUs
-    FLAGS.loss_scale = 'dynamic'
-    FLAGS.tf_gpu_thread_mode = 'gpu_private'
-    self._run_and_report_benchmark()
-
  def benchmark_2x2_tpu_fp16(self):
    """Test Keras model with 2x2 TPU, fp16."""
    self._setup()
@@ -842,34 +1058,30 @@ class Resnet50KerasBenchmarkBase(keras_benchmark.KerasBenchmark):
        log_steps=FLAGS.log_steps)


-class Resnet50KerasBenchmarkSynth(Resnet50KerasBenchmarkBase):
+class Resnet50KerasBenchmarkSynth(Resnet50KerasClassifierBenchmarkBase):
  """Resnet50 synthetic benchmark tests."""

  def __init__(self, output_dir=None, root_data_dir=None, tpu=None, **kwargs):
    def_flags = {}
-    def_flags['skip_eval'] = True
-    def_flags['report_accuracy_metrics'] = False
-    def_flags['use_synthetic_data'] = True
-    def_flags['train_steps'] = 110
    def_flags['log_steps'] = 10

    super(Resnet50KerasBenchmarkSynth, self).__init__(
-        output_dir=output_dir, default_flags=def_flags, tpu=tpu)
+        output_dir=output_dir, default_flags=def_flags, tpu=tpu,
+        dataset_builder='synthetic', train_epochs=1, train_steps=110)


-class Resnet50KerasBenchmarkReal(Resnet50KerasBenchmarkBase):
+class Resnet50KerasBenchmarkReal(Resnet50KerasClassifierBenchmarkBase):
  """Resnet50 real data benchmark tests."""

  def __init__(self, output_dir=None, root_data_dir=None, tpu=None, **kwargs):
+    data_dir = os.path.join(root_data_dir, 'imagenet')
    def_flags = {}
-    def_flags['skip_eval'] = True
-    def_flags['report_accuracy_metrics'] = False
-    def_flags['data_dir'] = os.path.join(root_data_dir, 'imagenet')
-    def_flags['train_steps'] = 110
    def_flags['log_steps'] = 10

    super(Resnet50KerasBenchmarkReal, self).__init__(
-        output_dir=output_dir, default_flags=def_flags, tpu=tpu)
+        output_dir=output_dir, default_flags=def_flags, tpu=tpu,
+        dataset_builder='records', train_epochs=1, train_steps=110,
+        data_dir=data_dir)


 class Resnet50KerasBenchmarkRemoteData(Resnet50KerasBenchmarkBase):
@@ -970,19 +1182,6 @@ class Resnet50KerasBenchmarkRemoteData(Resnet50KerasBenchmarkBase):
    self._override_flags_to_run_test_shorter()
    self._run_and_report_benchmark()

-  def benchmark_graph_1_gpu_no_dist_strat(self):
-    """Test Keras model in legacy graph mode with 1 GPU, no dist strat."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.enable_eager = False
-    FLAGS.distribution_strategy = 'off'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_1_gpu_no_dist_strat')
-    FLAGS.batch_size = 96  # BatchNorm is less efficient in legacy graph mode
-    # due to its reliance on v1 cond.
-    self._override_flags_to_run_test_shorter()
-    self._run_and_report_benchmark()
-
  def benchmark_1_gpu(self):
    """Test Keras model with 1 GPU."""
    self._setup()
@@ -1108,74 +1307,6 @@ class Resnet50KerasBenchmarkRemoteData(Resnet50KerasBenchmarkBase):
    self._override_flags_to_run_test_shorter()
    self._run_and_report_benchmark()

-  def benchmark_graph_1_gpu(self):
-    """Test Keras model in legacy graph mode with 1 GPU."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.enable_eager = False
-    FLAGS.distribution_strategy = 'one_device'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_1_gpu')
-    FLAGS.batch_size = 128
-    self._override_flags_to_run_test_shorter()
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_xla_1_gpu(self):
-    """Test Keras model in legacy graph mode with XLA and 1 GPU."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.enable_eager = False
-    FLAGS.enable_xla = True
-    FLAGS.distribution_strategy = 'one_device'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_xla_1_gpu')
-    FLAGS.batch_size = 128
-    self._override_flags_to_run_test_shorter()
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_1_gpu_fp16(self):
-    """Test Keras model in legacy graph mode with 1 GPU and fp16."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = False
-    FLAGS.distribution_strategy = 'one_device'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_1_gpu_fp16')
-    FLAGS.batch_size = 256
-    self._override_flags_to_run_test_shorter()
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_xla_1_gpu_fp16(self):
-    """Test Keras model in legacy graph mode with 1 GPU, fp16 and XLA."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.dtype = 'fp16'
-    FLAGS.enable_eager = False
-    FLAGS.enable_xla = True
-    FLAGS.distribution_strategy = 'one_device'
-    FLAGS.model_dir = self._get_model_dir('benchmark_graph_xla_1_gpu_fp16')
-    FLAGS.batch_size = 256
-    self._override_flags_to_run_test_shorter()
-    self._run_and_report_benchmark()
-
-  def benchmark_graph_xla_1_gpu_fp16_tweaked(self):
-    """Test Keras model in legacy graph with 1 GPU, fp16, XLA, and tuning."""
-    self._setup()
-
-    FLAGS.num_gpus = 1
-    FLAGS.enable_eager = False
-    FLAGS.enable_xla = True
-    FLAGS.distribution_strategy = 'one_device'
-    FLAGS.model_dir = self._get_model_dir(
-        'benchmark_graph_xla_1_gpu_fp16_tweaked')
-    FLAGS.dtype = 'fp16'
-    FLAGS.batch_size = 256
-    FLAGS.tf_gpu_thread_mode = 'gpu_private'
-    self._override_flags_to_run_test_shorter()
-    self._run_and_report_benchmark()
-
  @benchmark_wrappers.enable_runtime_flags
  def _run_and_report_benchmark(self):
    if FLAGS.num_gpus == 1 or FLAGS.run_eagerly:
@@ -1245,7 +1376,7 @@ class Resnet50MultiWorkerKerasAccuracy(keras_benchmark.KerasBenchmark):
  """Resnet50 distributed accuracy tests with multiple workers."""

  def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
-    flag_methods = [resnet_imagenet_main.define_imagenet_keras_flags]
+    flag_methods = [classifier_trainer.define_classifier_flags]
    self.data_dir = os.path.join(root_data_dir, 'imagenet')
    super(Resnet50MultiWorkerKerasAccuracy, self).__init__(
        output_dir=output_dir, flag_methods=flag_methods)
@@ -1278,7 +1409,7 @@ class Resnet50MultiWorkerKerasAccuracy(keras_benchmark.KerasBenchmark):
                                top_1_min=MIN_TOP_1_ACCURACY,
                                top_1_max=MAX_TOP_1_ACCURACY):
    start_time_sec = time.time()
-    stats = resnet_imagenet_main.run(flags.FLAGS)
+    stats = classifier_trainer.run(flags.FLAGS)
    wall_time_sec = time.time() - start_time_sec

    super(Resnet50MultiWorkerKerasAccuracy, self)._report_benchmark(

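Review note: the arithmetic that `Resnet50KerasClassifierBenchmarkBase._run_and_report_benchmark` uses for the reported batch size and the warmup cutoff can be sketched in isolation. This is a minimal sketch under stated assumptions, not the benchmark code itself: the helper names `total_batch_size` and `warmup_entries` are hypothetical, and only the two formulas are taken from the added lines in the diff above.

```python
from typing import Optional


def total_batch_size(distribution_strategy: str,
                     num_gpus: int,
                     num_tpus: int,
                     per_replica_batch_size: int) -> int:
  """Global batch size: replica count times per-replica batch size.

  Hypothetical helper; mirrors the if/else on distribution_strategy.
  """
  replicas = num_tpus if distribution_strategy == 'tpu' else num_gpus
  return replicas * per_replica_batch_size


def warmup_entries(train_steps: int,
                   log_steps: int,
                   skip_steps: Optional[int] = None) -> int:
  """Number of logged step-time entries dropped from the report.

  Hypothetical helper. Because of `or`, skip_steps=None and skip_steps=0
  both fall back to keeping only the last 100 steps.
  """
  return (skip_steps or (train_steps - 100)) // log_steps


if __name__ == '__main__':
  print(total_batch_size('mirrored', 8, 0, 256))
  print(total_batch_size('tpu', 0, 8, 128))
  print(warmup_entries(train_steps=110, log_steps=10))
  print(warmup_entries(train_steps=310, log_steps=10))
```

With the defaults used by the synthetic and real-data classes (`train_steps=110`, `log_steps=10`), one logged entry is dropped. A 310-step run drops 21 entries, which is what the delay-measure variant relies on.
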
--- a/official/vision/image_classification/resnet/cifar_preprocessing.py
+++ b/official/vision/image_classification/resnet/cifar_preprocessing.py
--- a/official/benchmark/models/resnet_cifar_main.py
+++ b/official/benchmark/models/resnet_cifar_main.py
@@ -23,12 +23,13 @@ from absl import flags
 from absl import logging
 import numpy as np
 import tensorflow as tf
+from official.benchmark.models import cifar_preprocessing
 from official.benchmark.models import resnet_cifar_model
+from official.benchmark.models import synthetic_util
 from official.utils.flags import core as flags_core
 from official.utils.logs import logger
 from official.utils.misc import distribution_utils
 from official.utils.misc import keras_utils
-from official.vision.image_classification.resnet import cifar_preprocessing
 from official.vision.image_classification.resnet import common


@@ -159,7 +160,7 @@ def run(flags_obj):
  strategy_scope = distribution_utils.get_strategy_scope(strategy)

  if flags_obj.use_synthetic_data:
-    distribution_utils.set_up_synthetic_data()
+    synthetic_util.set_up_synthetic_data()
    input_fn = common.get_synth_input_fn(
        height=cifar_preprocessing.HEIGHT,
        width=cifar_preprocessing.WIDTH,
@@ -168,7 +169,7 @@ def run(flags_obj):
        dtype=flags_core.get_tf_dtype(flags_obj),
        drop_remainder=True)
  else:
-    distribution_utils.undo_set_up_synthetic_data()
+    synthetic_util.undo_set_up_synthetic_data()
    input_fn = cifar_preprocessing.input_fn

  train_input_dataset = input_fn(

--- a/official/benchmark/models/resnet_cifar_test.py
+++ b/official/benchmark/models/resnet_cifar_test.py
@@ -24,10 +24,10 @@ import tensorflow as tf

 from tensorflow.python.eager import context
 from tensorflow.python.platform import googletest
+from official.benchmark.models import cifar_preprocessing
 from official.benchmark.models import resnet_cifar_main
 from official.utils.misc import keras_utils
 from official.utils.testing import integration
-from official.vision.image_classification.resnet import cifar_preprocessing


 class KerasCifarTest(googletest.TestCase):

--- a/official/vision/image_classification/resnet/resnet_imagenet_main.py
+++ b/official/vision/image_classification/resnet/resnet_imagenet_main.py
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -98,7 +98,6 @@ def run(flags_obj):

  # pylint: disable=protected-access
  if flags_obj.use_synthetic_data:
-    distribution_utils.set_up_synthetic_data()
    input_fn = common.get_synth_input_fn(
        height=imagenet_preprocessing.DEFAULT_IMAGE_SIZE,
        width=imagenet_preprocessing.DEFAULT_IMAGE_SIZE,
@@ -107,7 +106,6 @@ def run(flags_obj):
        dtype=dtype,
        drop_remainder=True)
  else:
-    distribution_utils.undo_set_up_synthetic_data()
    input_fn = imagenet_preprocessing.input_fn

  # When `enable_xla` is True, we always drop the remainder of the batches

--- a/official/benchmark/models/resnet_imagenet_test.py
+++ b/official/benchmark/models/resnet_imagenet_test.py
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Test the keras ResNet model with ImageNet data."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from absl.testing import parameterized
+import tensorflow as tf
+
+from tensorflow.python.eager import context
+from official.benchmark.models import resnet_imagenet_main
+from official.utils.misc import keras_utils
+from official.utils.testing import integration
+from official.vision.image_classification.resnet import imagenet_preprocessing
+
+
+@parameterized.parameters(
+    "resnet",
+    # "resnet_polynomial_decay",  b/151854314
+    "mobilenet",
+    # "mobilenet_polynomial_decay"  b/151854314
+)
+class KerasImagenetTest(tf.test.TestCase):
+  """Unit tests for Keras Models with ImageNet."""
+  _default_flags_dict = [
+      "-batch_size", "4",
+      "-train_steps", "1",
+      "-use_synthetic_data", "true",
+      "-data_format", "channels_last",
+  ]
+  _extra_flags_dict = {
+      "resnet": [
+          "-model", "resnet50_v1.5",
+          "-optimizer", "resnet50_default",
+      ],
+      "resnet_polynomial_decay": [
+          "-model", "resnet50_v1.5",
+          "-optimizer", "resnet50_default",
+          "-pruning_method", "polynomial_decay",
+      ],
+      "mobilenet": [
+          "-model", "mobilenet",
+          "-optimizer", "mobilenet_default",
+      ],
+      "mobilenet_polynomial_decay": [
+          "-model", "mobilenet",
+          "-optimizer", "mobilenet_default",
+          "-pruning_method", "polynomial_decay",
+      ],
+  }
+  _tempdir = None
+
+  @classmethod
+  def setUpClass(cls):  # pylint: disable=invalid-name
+    super(KerasImagenetTest, cls).setUpClass()
+    resnet_imagenet_main.define_imagenet_keras_flags()
+
+  def setUp(self):
+    super(KerasImagenetTest, self).setUp()
+    imagenet_preprocessing.NUM_IMAGES["validation"] = 4
+    self.policy = \
+        tf.compat.v2.keras.mixed_precision.experimental.global_policy()
+
+  def tearDown(self):
+    super(KerasImagenetTest, self).tearDown()
+    tf.io.gfile.rmtree(self.get_temp_dir())
+    tf.compat.v2.keras.mixed_precision.experimental.set_policy(self.policy)
+
+  def get_extra_flags_dict(self, flags_key):
+    return self._extra_flags_dict[flags_key] + self._default_flags_dict
+
+  def test_end_to_end_no_dist_strat(self, flags_key):
+    """Test Keras model with 1 GPU, no distribution strategy."""
+    config = keras_utils.get_config_proto_v1()
+    tf.compat.v1.enable_eager_execution(config=config)
+
+    extra_flags = [
+        "-distribution_strategy", "off",
+    ]
+    extra_flags = extra_flags + self.get_extra_flags_dict(flags_key)
+
+    integration.run_synthetic(
+        main=resnet_imagenet_main.run,
+        tmp_root=self.get_temp_dir(),
+        extra_flags=extra_flags
+    )
+
+  def test_end_to_end_graph_no_dist_strat(self, flags_key):
+    """Test Keras model in legacy graph mode with 1 GPU, no dist strat."""
+    extra_flags = [
+        "-enable_eager", "false",
+        "-distribution_strategy", "off",
+    ]
+    extra_flags = extra_flags + self.get_extra_flags_dict(flags_key)
+
+    integration.run_synthetic(
+        main=resnet_imagenet_main.run,
+        tmp_root=self.get_temp_dir(),
+        extra_flags=extra_flags
+    )
+
+  def test_end_to_end_1_gpu(self, flags_key):
+    """Test Keras model with 1 GPU."""
+    config = keras_utils.get_config_proto_v1()
+    tf.compat.v1.enable_eager_execution(config=config)
+
+    if context.num_gpus() < 1:
+      self.skipTest(
+          "{} GPUs are not available for this test. {} GPUs are available".
+          format(1, context.num_gpus()))
+
+    extra_flags = [
+        "-num_gpus", "1",
+        "-distribution_strategy", "mirrored",
+        "-enable_checkpoint_and_export", "1",
+    ]
+    extra_flags = extra_flags + self.get_extra_flags_dict(flags_key)
+
+    integration.run_synthetic(
+        main=resnet_imagenet_main.run,
+        tmp_root=self.get_temp_dir(),
+        extra_flags=extra_flags
+    )
+
+  def test_end_to_end_1_gpu_fp16(self, flags_key):
+    """Test Keras model with 1 GPU and fp16."""
+    config = keras_utils.get_config_proto_v1()
+    tf.compat.v1.enable_eager_execution(config=config)
+
+    if context.num_gpus() < 1:
+      self.skipTest(
+          "{} GPUs are not available for this test. {} GPUs are available"
+          .format(1, context.num_gpus()))
+
+    extra_flags = [
+        "-num_gpus", "1",
+        "-dtype", "fp16",
+        "-distribution_strategy", "mirrored",
+    ]
+    extra_flags = extra_flags + self.get_extra_flags_dict(flags_key)
+
+    if "polynomial_decay" in extra_flags:
+      self.skipTest("Pruning with fp16 is not currently supported.")
+
+    integration.run_synthetic(
+        main=resnet_imagenet_main.run,
+        tmp_root=self.get_temp_dir(),
+        extra_flags=extra_flags
+    )
+
+  def test_end_to_end_2_gpu(self, flags_key):
+    """Test Keras model with 2 GPUs."""
+    config = keras_utils.get_config_proto_v1()
+    tf.compat.v1.enable_eager_execution(config=config)
+
+    if context.num_gpus() < 2:
+      self.skipTest(
+          "{} GPUs are not available for this test. {} GPUs are available".
+          format(2, context.num_gpus()))
+
+    extra_flags = [
+        "-num_gpus", "2",
+        "-distribution_strategy", "mirrored",
+    ]
+    extra_flags = extra_flags + self.get_extra_flags_dict(flags_key)
+
+    integration.run_synthetic(
+        main=resnet_imagenet_main.run,
+        tmp_root=self.get_temp_dir(),
+        extra_flags=extra_flags
+    )
+
+  def test_end_to_end_xla_2_gpu(self, flags_key):
+    """Test Keras model with XLA and 2 GPUs."""
+    config = keras_utils.get_config_proto_v1()
+    tf.compat.v1.enable_eager_execution(config=config)
+
+    if context.num_gpus() < 2:
+      self.skipTest(
+          "{} GPUs are not available for this test. {} GPUs are available".
+          format(2, context.num_gpus()))
+
+    extra_flags = [
+        "-num_gpus", "2",
+        "-enable_xla", "true",
+        "-distribution_strategy", "mirrored",
+    ]
+    extra_flags = extra_flags + self.get_extra_flags_dict(flags_key)
+
+    integration.run_synthetic(
+        main=resnet_imagenet_main.run,
+        tmp_root=self.get_temp_dir(),
+        extra_flags=extra_flags
+    )
+
+  def test_end_to_end_2_gpu_fp16(self, flags_key):
+    """Test Keras model with 2 GPUs and fp16."""
+    config = keras_utils.get_config_proto_v1()
+    tf.compat.v1.enable_eager_execution(config=config)
+
+    if context.num_gpus() < 2:
+      self.skipTest(
+          "{} GPUs are not available for this test. {} GPUs are available".
+          format(2, context.num_gpus()))
+
+    extra_flags = [
+        "-num_gpus", "2",
+        "-dtype", "fp16",
+        "-distribution_strategy", "mirrored",
+    ]
+    extra_flags = extra_flags + self.get_extra_flags_dict(flags_key)
+
+    if "polynomial_decay" in extra_flags:
+      self.skipTest("Pruning with fp16 is not currently supported.")
+
+    integration.run_synthetic(
+        main=resnet_imagenet_main.run,
+        tmp_root=self.get_temp_dir(),
+        extra_flags=extra_flags
+    )
+
+  def test_end_to_end_xla_2_gpu_fp16(self, flags_key):
+    """Test Keras model with XLA, 2 GPUs and fp16."""
+    config = keras_utils.get_config_proto_v1()
+    tf.compat.v1.enable_eager_execution(config=config)
+
+    if context.num_gpus() < 2:
+      self.skipTest(
+          "{} GPUs are not available for this test. {} GPUs are available".
+          format(2, context.num_gpus()))
+
+    extra_flags = [
+        "-num_gpus", "2",
+        "-dtype", "fp16",
+        "-enable_xla", "true",
+        "-distribution_strategy", "mirrored",
+    ]
+    extra_flags = extra_flags + self.get_extra_flags_dict(flags_key)
+
+    if "polynomial_decay" in extra_flags:
+      self.skipTest("Pruning with fp16 is not currently supported.")
+
+    integration.run_synthetic(
+        main=resnet_imagenet_main.run,
+        tmp_root=self.get_temp_dir(),
+        extra_flags=extra_flags
+    )
+
+
+if __name__ == "__main__":
+  tf.test.main()
--- a/official/benchmark/models/resnet_imagenet_test_tpu.py
+++ b/official/benchmark/models/resnet_imagenet_test_tpu.py
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Test the keras ResNet model with ImageNet data on TPU."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from absl.testing import parameterized
+import tensorflow as tf
+from official.benchmark.models import resnet_imagenet_main
+from official.utils.misc import keras_utils
+from official.utils.testing import integration
+from official.vision.image_classification.resnet import imagenet_preprocessing
+
+
+class KerasImagenetTest(tf.test.TestCase, parameterized.TestCase):
+  """Unit tests for Keras Models with ImageNet."""
+
+  _extra_flags_dict = {
+      "resnet": [
+          "-batch_size", "4",
+          "-train_steps", "1",
+          "-use_synthetic_data", "true",
+          "-model", "resnet50_v1.5",
+          "-optimizer", "resnet50_default",
+      ],
+      "resnet_polynomial_decay": [
+          "-batch_size", "4",
+          "-train_steps", "1",
+          "-use_synthetic_data", "true",
+          "-model", "resnet50_v1.5",
+          "-optimizer", "resnet50_default",
+          "-pruning_method", "polynomial_decay",
+      ],
+  }
+  _tempdir = None
+
+  @classmethod
+  def setUpClass(cls):  # pylint: disable=invalid-name
+    super(KerasImagenetTest, cls).setUpClass()
+    resnet_imagenet_main.define_imagenet_keras_flags()
+
+  def setUp(self):
+    super(KerasImagenetTest, self).setUp()
+    imagenet_preprocessing.NUM_IMAGES["validation"] = 4
+    self.policy = \
+        tf.compat.v2.keras.mixed_precision.experimental.global_policy()
+
+  def tearDown(self):
+    super(KerasImagenetTest, self).tearDown()
+    tf.io.gfile.rmtree(self.get_temp_dir())
+    tf.compat.v2.keras.mixed_precision.experimental.set_policy(self.policy)
+
+  @parameterized.parameters([
+      "resnet",
+      # "resnet_polynomial_decay"  b/151854314
+  ])
+  def test_end_to_end_tpu(self, flags_key):
+    """Test Keras model with TPU distribution strategy."""
+    config = keras_utils.get_config_proto_v1()
+    tf.compat.v1.enable_eager_execution(config=config)
+
+    extra_flags = [
+        "-distribution_strategy", "tpu",
+        "-data_format", "channels_last",
+        "-enable_checkpoint_and_export", "1",
+    ]
+    extra_flags = extra_flags + self._extra_flags_dict[flags_key]
+
+    integration.run_synthetic(
+        main=resnet_imagenet_main.run,
+        tmp_root=self.get_temp_dir(),
+        extra_flags=extra_flags
+    )
+
+  @parameterized.parameters(["resnet"])
+  def test_end_to_end_tpu_bf16(self, flags_key):
+    """Test Keras model with TPU and bfloat16 activation."""
+    config = keras_utils.get_config_proto_v1()
+    tf.compat.v1.enable_eager_execution(config=config)
+
+    extra_flags = [
+        "-distribution_strategy", "tpu",
+        "-data_format", "channels_last",
+        "-dtype", "bf16",
+    ]
+    extra_flags = extra_flags + self._extra_flags_dict[flags_key]
+
+    integration.run_synthetic(
+        main=resnet_imagenet_main.run,
+        tmp_root=self.get_temp_dir(),
+        extra_flags=extra_flags
+    )
+
+
+if __name__ == "__main__":
+  tf.test.main()
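
Review note: the flag lists in these tests are flat Python lists of adjacent string literals, and Python silently concatenates two adjacent literals when the comma between them is missing. The sketch below shows that behaviour and one possible sanity check. `find_malformed_flags` is a hypothetical helper that is not part of this change, and it assumes every flag in the list is followed by exactly one value.

```python
def find_malformed_flags(flags):
    """Return a list of problems in a flat ["-name", "value", ...] list.

    Hypothetical helper for illustration; assumes each flag takes one value.
    """
    problems = []
    if len(flags) % 2 != 0:
        problems.append("odd number of entries: %d" % len(flags))
    # Every even position should hold a flag name starting with "-".
    for i in range(0, len(flags) - 1, 2):
        if not flags[i].startswith("-"):
            problems.append("entry %d is not a flag name: %r" % (i, flags[i]))
    return problems


# Missing comma after "true": Python joins it with the next literal.
broken = [
    "-batch_size", "4",
    "-use_synthetic_data", "true"
    "-model", "resnet50_v1.5",
]

fixed = [
    "-batch_size", "4",
    "-use_synthetic_data", "true",
    "-model", "resnet50_v1.5",
]

print(broken[3])                     # the fused entry
print(find_malformed_flags(broken))  # reports the odd length
print(find_malformed_flags(fixed))   # no problems reported
```

A check like this could run in `setUpClass` over each value of `_extra_flags_dict`, though it would need adjusting if any flag is passed without a value.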
--- a/official/staging/shakespeare/README.md
+++ b/official/staging/shakespeare/README.md
--- a/official/staging/shakespeare/__init__.py
+++ b/official/staging/shakespeare/__init__.py
--- a/official/staging/shakespeare/shakespeare_main.py
+++ b/official/staging/shakespeare/shakespeare_main.py
@@ -47,7 +47,6 @@ def define_flags():
                         epochs_between_evals=False,
                         stop_threshold=False,
                         num_gpu=True,
-                         hooks=False,
                         export_dir=False,
                         run_eagerly=True,
                         distribution_strategy=True)