Commit b6215c6f authored by Abdullah Rashwan's avatar Abdullah Rashwan Committed by A. Unique TensorFlower

Internal change

PiperOrigin-RevId: 329754787
parent a5b38a72
@@ -15,5 +15,6 @@
"""All necessary imports for registration."""
# pylint: disable=unused-import
from official.nlp import tasks
from official.nlp import tasks as nlp_task
from official.utils.testing import mock_task
from official.vision import beta
# TF Vision Model Garden
## Introduction
The TF Vision Model Garden provides a large collection of baselines and checkpoints for image classification, object detection, and instance segmentation.
## Image Classification
### Common Settings and Notes
* We provide ImageNet checkpoints for [ResNet](https://arxiv.org/abs/1512.03385) models.
* Training details:
* All models are trained from scratch for 90 epochs with a global batch size of 4096 and an initial learning rate of 1.6 that follows a stepwise decay schedule (see the sketch below).
* Unless noted, all models are trained with l2 weight regularization and ReLU activation.
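The 1.6 figure follows the linear learning-rate scaling rule used by the `resnet_imagenet` config factory later in this commit; a minimal sketch:

```python
# Linear scaling rule: reference LR of 0.1 at batch size 256.
train_batch_size = 4096
initial_lr = 0.1 * train_batch_size / 256                # = 1.6
decay_values = [initial_lr * 0.1**i for i in range(4)]   # [1.6, 0.16, 0.016, 0.0016]
```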
### ImageNet Baselines
| model | resolution | epochs | FLOPs (B) | params (M) | Top-1 | Top-5 | download |
| ------------ |:-------------:| ---------:|-----------:|--------:|--------:|---------:|---------:|
| ResNet-50 | 224x224 | 90 | 4.1 | 25.6 | 76.1 | 92.9 | config |
## Object Detection and Instance Segmentation
### Common Settings and Notes
* We provide models based on two detection frameworks, [RetinaNet](https://arxiv.org/abs/1708.02002) and [Mask R-CNN](https://arxiv.org/abs/1703.06870), and two backbones, [ResNet-FPN](https://arxiv.org/abs/1612.03144) and [SpineNet](https://arxiv.org/abs/1912.05027).
* Models are all trained on COCO train2017 and evaluated on COCO val2017.
* Training details:
* Models fine-tuned from ImageNet-pretrained checkpoints follow a 12- or 36-epoch schedule. Models trained from scratch follow a 350-epoch schedule.
* The default training data augmentation implements horizontal flipping and scale jittering with a random scale between [0.5, 2.0].
* Unless noted, all models are trained with l2 weight regularization and ReLU activation.
* We use a global batch size of 256 and a stepwise learning rate that decays at 30 and 10 epochs before the end of training.
* We use square images as input, resizing the long side of an image to the target size and then padding the short side with zeros (see the sketch after this list).
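The square-input preprocessing in the last bullet can be sketched as follows. This is an illustration of the idea only; the pipeline itself uses `preprocess_ops.resize_and_crop_image`, and `resize_long_side_and_pad` is a hypothetical helper name:

```python
import tensorflow as tf

def resize_long_side_and_pad(image: tf.Tensor, target_size: int) -> tf.Tensor:
  """Resizes the long side to `target_size`, then zero-pads to a square."""
  height_width = tf.cast(tf.shape(image)[0:2], tf.float32)
  scale = tf.cast(target_size, tf.float32) / tf.reduce_max(height_width)
  new_size = tf.cast(tf.round(height_width * scale), tf.int32)
  resized = tf.image.resize(resize_image := image, new_size) if False else tf.image.resize(image, new_size)
  return tf.image.pad_to_bounding_box(resized, 0, 0, target_size, target_size)
```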
### COCO Object Detection Baselines
#### RetinaNet (ImageNet pretrained)
| backbone | resolution | epochs | FLOPs (B) | params (M) | box AP | download |
| ------------ |:-------------:| ---------:|-----------:|--------:|--------:|-----------:|
| R50-FPN | 640x640 | 12 | 97.0 | 34.0 | 34.3 | config |
| R50-FPN | 640x640 | 36 | 97.0 | 34.0 | 37.3 | config |
#### RetinaNet (Trained from scratch)
| backbone | resolution | epochs | FLOPs (B) | params (M) | box AP | download |
| ------------ |:-------------:| ---------:|-----------:|--------:|---------:|-----------:|
| SpineNet-49 | 640x640 | 350 | 85.4 | 28.5 | 42.4 | config |
| SpineNet-96 | 1024x1024 | 350 | 265.4 | 43.0 | 46.0 | config |
| SpineNet-143 | 1280x1280 | 350 | 524.0 | 67.0 | 46.8 | config |
### Instance Segmentation Baselines
#### Mask R-CNN (ImageNet pretrained)
#### Mask R-CNN (Trained from scratch)
| backbone | resolution | epochs | FLOPs (B) | params (M) | box AP | mask AP | download |
| ------------ |:-------------:| ---------:|-----------:|--------:|--------:|-----------:|-----------:|
| SpineNet-49 | 640x640 | 350 | 215.7 | 40.8 | 42.6 | 37.9 | config |
This directory contains the new design of the TF Model Garden vision framework.
Stay tuned.
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""NLP package definition."""
# Lint as: python3
# pylint: disable=unused-import
from official.vision.beta import configs
from official.vision.beta import tasks
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Configs package definition."""
from official.vision.beta.configs import image_classification
from official.vision.beta.configs import maskrcnn
from official.vision.beta.configs import retinanet
from official.vision.beta.configs import video_classification
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Backbones configurations."""
from typing import Optional
# Import libraries
import dataclasses
from official.modeling import hyperparams
@dataclasses.dataclass
class ResNet(hyperparams.Config):
"""ResNet config."""
model_id: int = 50
@dataclasses.dataclass
class EfficientNet(hyperparams.Config):
"""EfficientNet config."""
model_id: str = 'b0'
stochastic_depth_drop_rate: float = 0.0
se_ratio: float = 0.0
@dataclasses.dataclass
class SpineNet(hyperparams.Config):
"""SpineNet config."""
model_id: str = '49'
@dataclasses.dataclass
class RevNet(hyperparams.Config):
"""RevNet config."""
# Specifies the depth of RevNet.
model_id: int = 56
@dataclasses.dataclass
class Backbone(hyperparams.OneOfConfig):
"""Configuration for backbones.
Attributes:
type: 'str', type of backbone to be used, one of the fields below.
resnet: resnet backbone config.
revnet: revnet backbone config.
efficientnet: efficientnet backbone config.
spinenet: spinenet backbone config.
"""
type: Optional[str] = None
resnet: ResNet = ResNet()
revnet: RevNet = RevNet()
efficientnet: EfficientNet = EfficientNet()
spinenet: SpineNet = SpineNet()
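# A minimal usage sketch (illustration only, not part of this commit). It
# assumes hyperparams.OneOfConfig selects the active sub-config from `type`
# via a `get()` accessor; verify against official.modeling.hyperparams.
def _example_backbone_oneof():
  backbone = Backbone(type='spinenet', spinenet=SpineNet(model_id='96'))
  return backbone.get().model_id  # '96', from the active SpineNet sub-config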
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""3D Backbones configurations."""
from typing import Optional, Tuple
# Import libraries
import dataclasses
from official.modeling import hyperparams
@dataclasses.dataclass
class ResNet3DBlock(hyperparams.Config):
"""Configuration of a ResNet 3D block."""
temporal_strides: int = 1
temporal_kernel_sizes: Tuple[int, ...] = ()
use_self_gating: bool = False
@dataclasses.dataclass
class ResNet3D(hyperparams.Config):
"""ResNet config."""
model_id: int = 50
stem_temporal_conv_stride: int = 2
stem_temporal_pool_stride: int = 2
block_specs: Tuple[ResNet3DBlock, ...] = ()
@dataclasses.dataclass
class ResNet3D50(ResNet3D):
"""Block specifications of the Resnet50 (3D) model."""
model_id: int = 50
block_specs: Tuple[
ResNet3DBlock, ResNet3DBlock, ResNet3DBlock, ResNet3DBlock] = (
ResNet3DBlock(temporal_strides=1,
temporal_kernel_sizes=(3, 3, 3),
use_self_gating=True),
ResNet3DBlock(temporal_strides=1,
temporal_kernel_sizes=(3, 1, 3, 1),
use_self_gating=True),
ResNet3DBlock(temporal_strides=1,
temporal_kernel_sizes=(3, 1, 3, 1, 3, 1),
use_self_gating=True),
ResNet3DBlock(temporal_strides=1,
temporal_kernel_sizes=(1, 3, 1),
use_self_gating=True))
@dataclasses.dataclass
class Backbone3D(hyperparams.OneOfConfig):
"""Configuration for backbones.
Attributes:
type: 'str', type of backbone to be used, one of the fields below.
    resnet_3d: resnet3d backbone config.
"""
type: Optional[str] = None
resnet_3d: ResNet3D = ResNet3D50()
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Common configurations."""
# Import libraries
import dataclasses
from official.modeling import hyperparams
@dataclasses.dataclass
class NormActivation(hyperparams.Config):
activation: str = 'relu'
use_sync_bn: bool = False
norm_momentum: float = 0.99
norm_epsilon: float = 0.001
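# A hedged sketch (not part of this commit) of mapping a NormActivation config
# to a Keras normalization layer.
import tensorflow as tf

def build_norm_layer(config: NormActivation):
  """Builds a (Sync)BatchNormalization layer from a NormActivation config."""
  if config.use_sync_bn:
    # Cross-replica batch norm; an experimental Keras layer in TF 2.x.
    return tf.keras.layers.experimental.SyncBatchNormalization(
        momentum=config.norm_momentum, epsilon=config.norm_epsilon)
  return tf.keras.layers.BatchNormalization(
      momentum=config.norm_momentum, epsilon=config.norm_epsilon)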
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Decoders configurations."""
from typing import Optional
# Import libraries
import dataclasses
from official.modeling import hyperparams
@dataclasses.dataclass
class Identity(hyperparams.Config):
"""Identity config."""
pass
@dataclasses.dataclass
class FPN(hyperparams.Config):
"""FPN config."""
num_filters: int = 256
use_separable_conv: bool = False
@dataclasses.dataclass
class Decoder(hyperparams.OneOfConfig):
"""Configuration for decoders.
Attributes:
type: 'str', type of decoder to be used, one of the fields below.
    fpn: fpn config.
    identity: identity config.
"""
type: Optional[str] = None
fpn: FPN = FPN()
identity: Identity = Identity()
runtime:
distribution_strategy: 'mirrored'
mixed_precision_dtype: 'float16'
loss_scale: 'dynamic'
task:
model:
num_classes: 1001
input_size: [224, 224, 3]
backbone:
type: 'resnet'
resnet:
model_id: 50
losses:
l2_weight_decay: 0.0001
one_hot: True
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: True
global_batch_size: 2048
dtype: 'float16'
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: False
global_batch_size: 2048
dtype: 'float16'
drop_remainder: False
trainer:
train_steps: 56160
validation_steps: 25
validation_interval: 625
steps_per_loop: 625
summary_interval: 625
checkpoint_interval: 625
optimizer_config:
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'stepwise'
stepwise:
boundaries: [18750, 37500, 50000]
values: [0.8, 0.08, 0.008, 0.0008]
warmup:
type: 'linear'
linear:
warmup_steps: 3125
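# A hedged sketch of consuming this file (Python shown as comments to keep the
# YAML valid). `override()` is assumed from official.modeling.hyperparams, and
# 'gpu_float16.yaml' is a hypothetical name for this file:
#   import yaml
#   from official.core import exp_factory
#   config = exp_factory.get_exp_config('resnet_imagenet')
#   with open('gpu_float16.yaml') as f:
#     config.override(yaml.safe_load(f), is_strict=False)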
runtime:
distribution_strategy: 'tpu'
mixed_precision_dtype: 'bfloat16'
task:
model:
num_classes: 1001
input_size: [224, 224, 3]
backbone:
type: 'resnet'
resnet:
model_id: 50
losses:
l2_weight_decay: 0.0001
one_hot: True
label_smoothing: 0.1
train_data:
input_path: 'imagenet-2012-tfrecord/train*'
is_training: True
global_batch_size: 4096
dtype: 'bfloat16'
validation_data:
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: False
global_batch_size: 4096
dtype: 'bfloat16'
drop_remainder: False
trainer:
train_steps: 28080
validation_steps: 13
validation_interval: 312
steps_per_loop: 312
summary_interval: 312
checkpoint_interval: 312
optimizer_config:
optimizer:
type: 'sgd'
sgd:
momentum: 0.9
learning_rate:
type: 'stepwise'
stepwise:
boundaries: [9360, 18720, 24960]
values: [1.6, 0.16, 0.016, 0.0016]
warmup:
type: 'linear'
linear:
warmup_steps: 1560
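# Where the schedule numbers above come from (1281167 ImageNet train examples):
#   steps_per_epoch = 1281167 // 4096 = 312
#   train_steps = 90 * 312 = 28080
#   boundaries = [30, 60, 80] epochs -> [9360, 18720, 24960]
#   warmup_steps = 5 * 312 = 1560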
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Image classification configuration definition."""
import os
from typing import List
import dataclasses
from official.core import exp_factory
from official.modeling import hyperparams
from official.modeling import optimization
from official.modeling.hyperparams import config_definitions as cfg
from official.vision.beta.configs import backbones
from official.vision.beta.configs import common
@dataclasses.dataclass
class DataConfig(cfg.DataConfig):
"""Input config for training."""
input_path: str = ''
global_batch_size: int = 0
is_training: bool = True
dtype: str = 'float32'
shuffle_buffer_size: int = 10000
cycle_length: int = 10
@dataclasses.dataclass
class ImageClassificationModel(hyperparams.Config):
num_classes: int = 0
input_size: List[int] = dataclasses.field(default_factory=list)
backbone: backbones.Backbone = backbones.Backbone(
type='resnet', resnet=backbones.ResNet())
dropout_rate: float = 0.0
norm_activation: common.NormActivation = common.NormActivation()
# Adds a BatchNormalization layer before GlobalAveragePooling in the
  # classification head.
add_head_batch_norm: bool = False
@dataclasses.dataclass
class Losses(hyperparams.Config):
one_hot: bool = True
label_smoothing: float = 0.0
l2_weight_decay: float = 0.0
@dataclasses.dataclass
class ImageClassificationTask(cfg.TaskConfig):
"""The model config."""
model: ImageClassificationModel = ImageClassificationModel()
train_data: DataConfig = DataConfig(is_training=True)
validation_data: DataConfig = DataConfig(is_training=False)
losses: Losses = Losses()
gradient_clip_norm: float = 0.0
@exp_factory.register_config_factory('image_classification')
def image_classification() -> cfg.ExperimentConfig:
"""Image classification general."""
return cfg.ExperimentConfig(
task=ImageClassificationTask(),
trainer=cfg.TrainerConfig(),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
IMAGENET_TRAIN_EXAMPLES = 1281167
IMAGENET_VAL_EXAMPLES = 50000
IMAGENET_INPUT_PATH_BASE = 'imagenet-2012-tfrecord'
@exp_factory.register_config_factory('resnet_imagenet')
def image_classification_imagenet() -> cfg.ExperimentConfig:
"""Image classification on imagenet with resnet."""
train_batch_size = 4096
eval_batch_size = 4096
steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
config = cfg.ExperimentConfig(
task=ImageClassificationTask(
model=ImageClassificationModel(
num_classes=1001,
input_size=[224, 224, 3],
norm_activation=common.NormActivation(
norm_momentum=0.9, norm_epsilon=1e-5)),
losses=Losses(l2_weight_decay=1e-4),
train_data=DataConfig(
input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=train_batch_size),
validation_data=DataConfig(
input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
is_training=False,
global_batch_size=eval_batch_size)),
trainer=cfg.TrainerConfig(
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
train_steps=90 * steps_per_epoch,
validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
validation_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [
30 * steps_per_epoch, 60 * steps_per_epoch,
80 * steps_per_epoch
],
'values': [
0.1 * train_batch_size / 256,
0.01 * train_batch_size / 256,
0.001 * train_batch_size / 256,
0.0001 * train_batch_size / 256,
]
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 5 * steps_per_epoch,
'warmup_learning_rate': 0
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
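# Minimal usage sketch (illustration only): the registered experiment can be
# fetched by name, as the config test in this commit does.
#   config = exp_factory.get_exp_config('resnet_imagenet')
#   assert config.trainer.train_steps == 90 * (1281167 // 4096)  # 28080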
@exp_factory.register_config_factory('revnet_imagenet')
def image_classification_imagenet_revnet() -> cfg.ExperimentConfig:
"""Returns a revnet config for image classification on imagenet."""
train_batch_size = 4096
eval_batch_size = 4096
steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
config = cfg.ExperimentConfig(
task=ImageClassificationTask(
model=ImageClassificationModel(
num_classes=1001,
input_size=[224, 224, 3],
backbone=backbones.Backbone(
type='revnet', revnet=backbones.RevNet(model_id=56)),
norm_activation=common.NormActivation(
norm_momentum=0.9, norm_epsilon=1e-5),
add_head_batch_norm=True),
losses=Losses(l2_weight_decay=1e-4),
train_data=DataConfig(
input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=train_batch_size),
validation_data=DataConfig(
input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
is_training=False,
global_batch_size=eval_batch_size)),
trainer=cfg.TrainerConfig(
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
train_steps=90 * steps_per_epoch,
validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
validation_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [
30 * steps_per_epoch, 60 * steps_per_epoch,
80 * steps_per_epoch
],
'values': [0.8, 0.08, 0.008, 0.0008]
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 5 * steps_per_epoch,
'warmup_learning_rate': 0
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for image_classification."""
# pylint: disable=unused-import
from absl.testing import parameterized
import tensorflow as tf
from official.core import exp_factory
from official.modeling.hyperparams import config_definitions as cfg
from official.vision import beta
from official.vision.beta.configs import image_classification as exp_cfg
class ImageClassificationConfigTest(tf.test.TestCase, parameterized.TestCase):
@parameterized.parameters(('resnet_imagenet',),
('revnet_imagenet',))
def test_image_classification_configs(self, config_name):
config = exp_factory.get_exp_config(config_name)
self.assertIsInstance(config, cfg.ExperimentConfig)
self.assertIsInstance(config.task, exp_cfg.ImageClassificationTask)
self.assertIsInstance(config.task.model,
exp_cfg.ImageClassificationModel)
self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig)
config.task.train_data.is_training = None
with self.assertRaises(KeyError):
config.validate()
if __name__ == '__main__':
tf.test.main()
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Mask R-CNN configuration definition."""
import os
from typing import List, Optional
import dataclasses
from official.core import exp_factory
from official.modeling import hyperparams
from official.modeling import optimization
from official.modeling.hyperparams import config_definitions as cfg
from official.vision.beta.configs import backbones
from official.vision.beta.configs import common
from official.vision.beta.configs import decoders
# pylint: disable=missing-class-docstring
@dataclasses.dataclass
class TfExampleDecoder(hyperparams.Config):
regenerate_source_id: bool = False
@dataclasses.dataclass
class TfExampleDecoderLabelMap(hyperparams.Config):
regenerate_source_id: bool = False
label_map: str = ''
@dataclasses.dataclass
class DataDecoder(hyperparams.OneOfConfig):
type: Optional[str] = 'simple_decoder'
simple_decoder: TfExampleDecoder = TfExampleDecoder()
label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap()
@dataclasses.dataclass
class Parser(hyperparams.Config):
num_channels: int = 3
match_threshold: float = 0.5
unmatched_threshold: float = 0.5
aug_rand_hflip: bool = False
aug_scale_min: float = 1.0
aug_scale_max: float = 1.0
skip_crowd_during_training: bool = True
max_num_instances: int = 100
rpn_match_threshold: float = 0.7
rpn_unmatched_threshold: float = 0.3
rpn_batch_size_per_im: int = 256
rpn_fg_fraction: float = 0.5
mask_crop_size: int = 112
@dataclasses.dataclass
class DataConfig(cfg.DataConfig):
"""Input config for training."""
input_path: str = ''
global_batch_size: int = 0
is_training: bool = False
dtype: str = 'bfloat16'
decoder: DataDecoder = DataDecoder()
parser: Parser = Parser()
shuffle_buffer_size: int = 10000
@dataclasses.dataclass
class Anchor(hyperparams.Config):
num_scales: int = 1
aspect_ratios: List[float] = dataclasses.field(
default_factory=lambda: [0.5, 1.0, 2.0])
anchor_size: float = 8.0
@dataclasses.dataclass
class RPNHead(hyperparams.Config):
num_convs: int = 1
num_filters: int = 256
use_separable_conv: bool = False
@dataclasses.dataclass
class DetectionHead(hyperparams.Config):
num_convs: int = 4
num_filters: int = 256
use_separable_conv: bool = False
num_fcs: int = 1
fc_dims: int = 1024
@dataclasses.dataclass
class ROIGenerator(hyperparams.Config):
pre_nms_top_k: int = 2000
pre_nms_score_threshold: float = 0.0
pre_nms_min_size_threshold: float = 0.0
nms_iou_threshold: float = 0.7
num_proposals: int = 1000
test_pre_nms_top_k: int = 1000
test_pre_nms_score_threshold: float = 0.0
test_pre_nms_min_size_threshold: float = 0.0
test_nms_iou_threshold: float = 0.7
test_num_proposals: int = 1000
use_batched_nms: bool = False
@dataclasses.dataclass
class ROISampler(hyperparams.Config):
mix_gt_boxes: bool = True
num_sampled_rois: int = 512
foreground_fraction: float = 0.25
foreground_iou_threshold: float = 0.5
background_iou_high_threshold: float = 0.5
background_iou_low_threshold: float = 0.0
@dataclasses.dataclass
class ROIAligner(hyperparams.Config):
crop_size: int = 7
sample_offset: float = 0.5
@dataclasses.dataclass
class DetectionGenerator(hyperparams.Config):
pre_nms_top_k: int = 5000
pre_nms_score_threshold: float = 0.05
nms_iou_threshold: float = 0.5
max_num_detections: int = 100
use_batched_nms: bool = False
@dataclasses.dataclass
class MaskHead(hyperparams.Config):
upsample_factor: int = 2
num_convs: int = 4
num_filters: int = 256
use_separable_conv: bool = False
@dataclasses.dataclass
class MaskSampler(hyperparams.Config):
num_sampled_masks: int = 128
@dataclasses.dataclass
class MaskROIAligner(hyperparams.Config):
crop_size: int = 14
sample_offset: float = 0.5
@dataclasses.dataclass
class MaskRCNN(hyperparams.Config):
num_classes: int = 0
input_size: List[int] = dataclasses.field(default_factory=list)
min_level: int = 2
max_level: int = 6
anchor: Anchor = Anchor()
include_mask: bool = True
backbone: backbones.Backbone = backbones.Backbone(
type='resnet', resnet=backbones.ResNet())
decoder: decoders.Decoder = decoders.Decoder(
type='fpn', fpn=decoders.FPN())
rpn_head: RPNHead = RPNHead()
detection_head: DetectionHead = DetectionHead()
roi_generator: ROIGenerator = ROIGenerator()
roi_sampler: ROISampler = ROISampler()
roi_aligner: ROIAligner = ROIAligner()
detection_generator: DetectionGenerator = DetectionGenerator()
mask_head: Optional[MaskHead] = MaskHead()
mask_sampler: Optional[MaskSampler] = MaskSampler()
mask_roi_aligner: Optional[MaskROIAligner] = MaskROIAligner()
norm_activation: common.NormActivation = common.NormActivation(
norm_momentum=0.997,
norm_epsilon=0.0001,
use_sync_bn=True)
@dataclasses.dataclass
class Losses(hyperparams.Config):
rpn_huber_loss_delta: float = 1. / 9.
frcnn_huber_loss_delta: float = 1.
l2_weight_decay: float = 0.0
rpn_score_weight: float = 1.0
rpn_box_weight: float = 1.0
frcnn_class_weight: float = 1.0
frcnn_box_weight: float = 1.0
mask_weight: float = 1.0
@dataclasses.dataclass
class MaskRCNNTask(cfg.TaskConfig):
model: MaskRCNN = MaskRCNN()
train_data: DataConfig = DataConfig(is_training=True)
validation_data: DataConfig = DataConfig(is_training=False)
losses: Losses = Losses()
init_checkpoint: Optional[str] = None
init_checkpoint_modules: str = 'all' # all or backbone
annotation_file: Optional[str] = None
gradient_clip_norm: float = 0.0
COCO_INPUT_PATH_BASE = 'coco'
@exp_factory.register_config_factory('fasterrcnn_resnetfpn_coco')
def fasterrcnn_resnetfpn_coco() -> cfg.ExperimentConfig:
"""COCO object detection with Faster R-CNN."""
steps_per_epoch = 500
coco_val_samples = 5000
config = cfg.ExperimentConfig(
runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'),
task=MaskRCNNTask(
init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080',
init_checkpoint_modules='backbone',
annotation_file=os.path.join(COCO_INPUT_PATH_BASE,
'instances_val2017.json'),
model=MaskRCNN(
num_classes=91,
input_size=[1024, 1024, 3],
include_mask=False,
mask_head=None,
mask_sampler=None,
mask_roi_aligner=None),
losses=Losses(l2_weight_decay=0.00004),
train_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=64,
parser=Parser(
aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)),
validation_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'),
is_training=False,
global_batch_size=8)),
trainer=cfg.TrainerConfig(
train_steps=22500,
validation_steps=coco_val_samples // 8,
validation_interval=steps_per_epoch,
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [15000, 20000],
'values': [0.12, 0.012, 0.0012],
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 500,
'warmup_learning_rate': 0.0067
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
@exp_factory.register_config_factory('maskrcnn_resnetfpn_coco')
def maskrcnn_resnetfpn_coco() -> cfg.ExperimentConfig:
"""COCO object detection with Mask R-CNN."""
steps_per_epoch = 500
coco_val_samples = 5000
config = cfg.ExperimentConfig(
runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'),
task=MaskRCNNTask(
init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080',
init_checkpoint_modules='backbone',
annotation_file=os.path.join(COCO_INPUT_PATH_BASE,
'instances_val2017.json'),
model=MaskRCNN(
num_classes=91,
input_size=[1024, 1024, 3],
include_mask=True),
losses=Losses(l2_weight_decay=0.00004),
train_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=64,
parser=Parser(
aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)),
validation_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'),
is_training=False,
global_batch_size=8)),
trainer=cfg.TrainerConfig(
train_steps=22500,
validation_steps=coco_val_samples // 8,
validation_interval=steps_per_epoch,
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [15000, 20000],
'values': [0.12, 0.012, 0.0012],
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 500,
'warmup_learning_rate': 0.0067
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
@exp_factory.register_config_factory('maskrcnn_spinenet_coco')
def maskrcnn_spinenet_coco() -> cfg.ExperimentConfig:
"""COCO object detection with Mask R-CNN with SpineNet backbone."""
steps_per_epoch = 463
coco_val_samples = 5000
config = cfg.ExperimentConfig(
runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'),
task=MaskRCNNTask(
annotation_file=os.path.join(COCO_INPUT_PATH_BASE,
'instances_val2017.json'),
model=MaskRCNN(
backbone=backbones.Backbone(
type='spinenet', spinenet=backbones.SpineNet(model_id='49')),
decoder=decoders.Decoder(
type='identity', identity=decoders.Identity()),
anchor=Anchor(anchor_size=3),
norm_activation=common.NormActivation(use_sync_bn=True),
num_classes=91,
input_size=[640, 640, 3],
min_level=3,
max_level=7,
include_mask=True),
losses=Losses(l2_weight_decay=0.00004),
train_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=256,
parser=Parser(
aug_rand_hflip=True, aug_scale_min=0.5, aug_scale_max=2.0)),
validation_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'),
is_training=False,
global_batch_size=8)),
trainer=cfg.TrainerConfig(
train_steps=steps_per_epoch * 350,
validation_steps=coco_val_samples // 8,
validation_interval=steps_per_epoch,
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [
steps_per_epoch * 320, steps_per_epoch * 340
],
'values': [0.28, 0.028, 0.0028],
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 2000,
'warmup_learning_rate': 0.0067
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""RetinaNet configuration definition."""
import os
from typing import List, Optional
import dataclasses
from official.core import exp_factory
from official.modeling import hyperparams
from official.modeling import optimization
from official.modeling.hyperparams import config_definitions as cfg
from official.vision.beta.configs import backbones
from official.vision.beta.configs import common
from official.vision.beta.configs import decoders
# pylint: disable=missing-class-docstring
@dataclasses.dataclass
class TfExampleDecoder(hyperparams.Config):
regenerate_source_id: bool = False
@dataclasses.dataclass
class TfExampleDecoderLabelMap(hyperparams.Config):
regenerate_source_id: bool = False
label_map: str = ''
@dataclasses.dataclass
class DataDecoder(hyperparams.OneOfConfig):
type: Optional[str] = 'simple_decoder'
simple_decoder: TfExampleDecoder = TfExampleDecoder()
label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap()
@dataclasses.dataclass
class Parser(hyperparams.Config):
num_channels: int = 3
match_threshold: float = 0.5
unmatched_threshold: float = 0.5
aug_rand_hflip: bool = False
aug_scale_min: float = 1.0
aug_scale_max: float = 1.0
skip_crowd_during_training: bool = True
max_num_instances: int = 100
@dataclasses.dataclass
class DataConfig(cfg.DataConfig):
"""Input config for training."""
input_path: str = ''
global_batch_size: int = 0
is_training: bool = False
dtype: str = 'bfloat16'
decoder: DataDecoder = DataDecoder()
parser: Parser = Parser()
shuffle_buffer_size: int = 10000
@dataclasses.dataclass
class Anchor(hyperparams.Config):
num_scales: int = 3
aspect_ratios: List[float] = dataclasses.field(
default_factory=lambda: [0.5, 1.0, 2.0])
anchor_size: float = 4.0
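# For intuition, a hedged sketch (not part of this commit) of the anchor edges
# this config implies: base edge anchor_size * 2**level, expanded by
# `num_scales` octave scales per aspect ratio, per the Parser docstrings.
def _example_anchor_edges(level: int = 3):
  anchor_size, num_scales = 4.0, 3
  base = anchor_size * 2**level  # 32.0 pixels at level 3
  # Octave scales at level 3: [32.0, ~40.3, ~50.8].
  return [base * 2**(scale / num_scales) for scale in range(num_scales)]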
@dataclasses.dataclass
class Losses(hyperparams.Config):
focal_loss_alpha: float = 0.25
focal_loss_gamma: float = 1.5
huber_loss_delta: float = 0.1
box_loss_weight: int = 50
l2_weight_decay: float = 0.0
@dataclasses.dataclass
class RetinaNetHead(hyperparams.Config):
num_convs: int = 4
num_filters: int = 256
use_separable_conv: bool = False
@dataclasses.dataclass
class DetectionGenerator(hyperparams.Config):
pre_nms_top_k: int = 5000
pre_nms_score_threshold: float = 0.05
nms_iou_threshold: float = 0.5
max_num_detections: int = 100
use_batched_nms: bool = False
@dataclasses.dataclass
class RetinaNet(hyperparams.Config):
num_classes: int = 0
input_size: List[int] = dataclasses.field(default_factory=list)
min_level: int = 3
max_level: int = 7
anchor: Anchor = Anchor()
backbone: backbones.Backbone = backbones.Backbone(
type='resnet', resnet=backbones.ResNet())
decoder: decoders.Decoder = decoders.Decoder(
type='fpn', fpn=decoders.FPN())
head: RetinaNetHead = RetinaNetHead()
detection_generator: DetectionGenerator = DetectionGenerator()
norm_activation: common.NormActivation = common.NormActivation()
@dataclasses.dataclass
class RetinaNetTask(cfg.TaskConfig):
model: RetinaNet = RetinaNet()
train_data: DataConfig = DataConfig(is_training=True)
validation_data: DataConfig = DataConfig(is_training=False)
losses: Losses = Losses()
init_checkpoint: Optional[str] = None
init_checkpoint_modules: str = 'all' # all or backbone
gradient_clip_norm: float = 0.0
@exp_factory.register_config_factory('retinanet')
def retinanet() -> cfg.ExperimentConfig:
"""RetinaNet general config."""
return cfg.ExperimentConfig(
task=RetinaNetTask(),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
COCO_INPUT_PATH_BASE = 'coco'
COCO_TRAIN_EXAMPLES = 118287
COCO_VAL_EXAMPLES = 5000
@exp_factory.register_config_factory('retinanet_resnetfpn_coco')
def retinanet_resnetfpn_coco() -> cfg.ExperimentConfig:
"""COCO object detection with RetinaNet."""
train_batch_size = 256
eval_batch_size = 8
steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size
config = cfg.ExperimentConfig(
runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'),
task=RetinaNetTask(
init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080',
init_checkpoint_modules='backbone',
model=RetinaNet(
num_classes=91,
input_size=[640, 640, 3],
min_level=3,
max_level=7),
losses=Losses(l2_weight_decay=1e-4),
train_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=train_batch_size,
parser=Parser(
aug_rand_hflip=True, aug_scale_min=0.5, aug_scale_max=2.0)),
validation_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'),
is_training=False,
global_batch_size=eval_batch_size)),
trainer=cfg.TrainerConfig(
train_steps=72 * steps_per_epoch,
validation_steps=COCO_VAL_EXAMPLES // eval_batch_size,
validation_interval=steps_per_epoch,
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [
57 * steps_per_epoch, 67 * steps_per_epoch
],
'values': [
0.28 * train_batch_size / 256.0,
0.028 * train_batch_size / 256.0,
0.0028 * train_batch_size / 256.0
],
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 500,
'warmup_learning_rate': 0.0067
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
@exp_factory.register_config_factory('retinanet_spinenet_coco')
def retinanet_spinenet_coco() -> cfg.ExperimentConfig:
"""COCO object detection with RetinaNet using SpineNet backbone."""
train_batch_size = 256
eval_batch_size = 8
steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size
input_size = 640
config = cfg.ExperimentConfig(
runtime=cfg.RuntimeConfig(mixed_precision_dtype='float32'),
task=RetinaNetTask(
model=RetinaNet(
backbone=backbones.Backbone(
type='spinenet',
spinenet=backbones.SpineNet(model_id='49')),
decoder=decoders.Decoder(
type='identity', identity=decoders.Identity()),
anchor=Anchor(anchor_size=3),
norm_activation=common.NormActivation(use_sync_bn=True),
num_classes=91,
input_size=[input_size, input_size, 3],
min_level=3,
max_level=7),
losses=Losses(l2_weight_decay=4e-5),
train_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'),
is_training=True,
global_batch_size=train_batch_size,
parser=Parser(
aug_rand_hflip=True, aug_scale_min=0.5, aug_scale_max=2.0)),
validation_data=DataConfig(
input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'),
is_training=False,
global_batch_size=eval_batch_size)),
trainer=cfg.TrainerConfig(
train_steps=350 * steps_per_epoch,
validation_steps=COCO_VAL_EXAMPLES // eval_batch_size,
validation_interval=steps_per_epoch,
steps_per_loop=steps_per_epoch,
summary_interval=steps_per_epoch,
checkpoint_interval=steps_per_epoch,
optimizer_config=optimization.OptimizationConfig({
'optimizer': {
'type': 'sgd',
'sgd': {
'momentum': 0.9
}
},
'learning_rate': {
'type': 'stepwise',
'stepwise': {
'boundaries': [
320 * steps_per_epoch, 340 * steps_per_epoch
],
'values': [
0.28 * train_batch_size / 256.0,
0.028 * train_batch_size / 256.0,
0.0028 * train_batch_size / 256.0
],
}
},
'warmup': {
'type': 'linear',
'linear': {
'warmup_steps': 2000,
'warmup_learning_rate': 0.0067
}
}
})),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
return config
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Classification decoder and parser."""
# Import libraries
import tensorflow as tf
from official.vision.beta.dataloaders import decoder
from official.vision.beta.dataloaders import parser
from official.vision.beta.ops import preprocess_ops
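# Per-channel ImageNet mean/stddev (RGB order), scaled from [0, 1] to [0, 255].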
MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255)
STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255)
class Decoder(decoder.Decoder):
"""A tf.Example decoder for classification task."""
def __init__(self):
self._keys_to_features = {
'image/encoded': tf.io.FixedLenFeature((), tf.string, default_value=''),
'image/class/label': (
tf.io.FixedLenFeature((), tf.int64, default_value=-1))
}
def decode(self, serialized_example):
return tf.io.parse_single_example(
serialized_example, self._keys_to_features)
class Parser(parser.Parser):
"""Parser to parse an image and its annotations into a dictionary of tensors."""
def __init__(self,
output_size,
num_classes,
aug_rand_hflip=True,
dtype='float32'):
"""Initializes parameters for parsing annotations in the dataset.
Args:
output_size: `Tensor` or `list` for [height, width] of the output image.
      num_classes: `int`, number of classes.
aug_rand_hflip: `bool`, if True, augment training with random
horizontal flip.
dtype: `str`, cast output image in dtype. It can be 'float32', 'float16',
or 'bfloat16'.
"""
self._output_size = output_size
self._aug_rand_hflip = aug_rand_hflip
self._num_classes = num_classes
if dtype == 'float32':
self._dtype = tf.float32
elif dtype == 'float16':
self._dtype = tf.float16
elif dtype == 'bfloat16':
self._dtype = tf.bfloat16
else:
raise ValueError('dtype {!r} is not supported!'.format(dtype))
def _parse_train_data(self, decoded_tensors):
"""Parses data for training."""
label = tf.cast(decoded_tensors['image/class/label'], dtype=tf.int32)
image_bytes = decoded_tensors['image/encoded']
image_shape = tf.image.extract_jpeg_shape(image_bytes)
# Crops image.
# TODO(pengchong): support image format other than JPEG.
cropped_image = preprocess_ops.random_crop_image_v2(
image_bytes, image_shape)
image = tf.cond(
tf.reduce_all(tf.equal(tf.shape(cropped_image), image_shape)),
lambda: preprocess_ops.center_crop_image_v2(image_bytes, image_shape),
lambda: cropped_image)
if self._aug_rand_hflip:
image = tf.image.random_flip_left_right(image)
# Resizes image.
image = tf.image.resize(
image, self._output_size, method=tf.image.ResizeMethod.BILINEAR)
# Normalizes image with mean and std pixel values.
image = preprocess_ops.normalize_image(image,
offset=MEAN_RGB,
scale=STDDEV_RGB)
# Convert image to self._dtype.
image = tf.image.convert_image_dtype(image, self._dtype)
return image, label
def _parse_eval_data(self, decoded_tensors):
"""Parses data for evaluation."""
label = tf.cast(decoded_tensors['image/class/label'], dtype=tf.int32)
image_bytes = decoded_tensors['image/encoded']
image_shape = tf.image.extract_jpeg_shape(image_bytes)
# Center crops and resizes image.
image = preprocess_ops.center_crop_image_v2(image_bytes, image_shape)
image = tf.image.resize(
image, self._output_size, method=tf.image.ResizeMethod.BILINEAR)
image = tf.reshape(image, [self._output_size[0], self._output_size[1], 3])
# Normalizes image with mean and std pixel values.
image = preprocess_ops.normalize_image(image,
offset=MEAN_RGB,
scale=STDDEV_RGB)
# Convert image to self._dtype.
image = tf.image.convert_image_dtype(image, self._dtype)
return image, label
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests classification_input.py."""
import io
# Import libraries
from absl.testing import parameterized
import numpy as np
from PIL import Image
import tensorflow as tf
from official.core import input_reader
from official.modeling.hyperparams import config_definitions as cfg
from official.vision.beta.dataloaders import classification_input
def _encode_image(image_array, fmt):
image = Image.fromarray(image_array)
with io.BytesIO() as output:
image.save(output, format=fmt)
return output.getvalue()
class DecoderTest(tf.test.TestCase, parameterized.TestCase):
@parameterized.parameters(
(100, 100, 0), (100, 100, 1), (100, 100, 2),
)
def test_decoder(self, image_height, image_width, num_instances):
decoder = classification_input.Decoder()
image = _encode_image(
np.uint8(np.random.rand(image_height, image_width, 3) * 255),
fmt='JPEG')
label = 2
serialized_example = tf.train.Example(
features=tf.train.Features(
feature={
'image/encoded': (tf.train.Feature(
bytes_list=tf.train.BytesList(value=[image]))),
'image/class/label': (
tf.train.Feature(
int64_list=tf.train.Int64List(value=[label]))),
})).SerializeToString()
decoded_tensors = decoder.decode(tf.convert_to_tensor(serialized_example))
results = tf.nest.map_structure(lambda x: x.numpy(), decoded_tensors)
self.assertCountEqual(
['image/encoded', 'image/class/label'], results.keys())
self.assertEqual(label, results['image/class/label'])
class ParserTest(parameterized.TestCase, tf.test.TestCase):
@parameterized.parameters(
([224, 224, 3], 'float32', True),
([224, 224, 3], 'float16', True),
([224, 224, 3], 'float32', False),
([224, 224, 3], 'float16', False),
([512, 640, 3], 'float32', True),
([512, 640, 3], 'float16', True),
([512, 640, 3], 'float32', False),
([512, 640, 3], 'float16', False),
([640, 640, 3], 'float32', True),
([640, 640, 3], 'bfloat16', True),
([640, 640, 3], 'float32', False),
([640, 640, 3], 'bfloat16', False),
)
def test_parser(self, output_size, dtype, is_training):
params = cfg.DataConfig(
input_path='/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*',
global_batch_size=2,
is_training=True,
examples_consume=4)
decoder = classification_input.Decoder()
parser = classification_input.Parser(
output_size=output_size[:2],
num_classes=1001,
aug_rand_hflip=False,
dtype=dtype)
reader = input_reader.InputReader(
params,
dataset_fn=tf.data.TFRecordDataset,
decoder_fn=decoder.decode,
parser_fn=parser.parse_fn(params.is_training))
dataset = reader.read()
images, labels = next(iter(dataset))
self.assertAllEqual(images.numpy().shape,
[params.global_batch_size] + output_size)
self.assertAllEqual(labels.numpy().shape, [params.global_batch_size])
if dtype == 'float32':
self.assertAllEqual(images.dtype, tf.float32)
elif dtype == 'float16':
self.assertAllEqual(images.dtype, tf.float16)
elif dtype == 'bfloat16':
self.assertAllEqual(images.dtype, tf.bfloat16)
if __name__ == '__main__':
tf.test.main()
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""The generic decoder interface."""
import abc
class Decoder(metaclass=abc.ABCMeta):
  """Decodes the raw data into tensors."""
@abc.abstractmethod
def decode(self, serialized_example):
"""Decodes the serialized example into tensors.
Args:
serialized_example: a serialized string tensor that encodes the data.
Returns:
decoded_tensors: a dict of Tensors.
"""
pass
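# A minimal concrete decoder, as an illustration only (not part of this
# commit); the feature key 'x' is a hypothetical placeholder.
import tensorflow as tf

class FloatFeatureDecoder(Decoder):
  """Example decoder that parses a single float feature from a tf.Example."""

  def decode(self, serialized_example):
    keys_to_features = {
        'x': tf.io.FixedLenFeature((), tf.float32, default_value=0.0),
    }
    return tf.io.parse_single_example(serialized_example, keys_to_features)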
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Data parser and processing for Mask R-CNN."""
# Import libraries
import tensorflow as tf
from official.vision.beta.dataloaders import parser
from official.vision.beta.dataloaders import utils
from official.vision.beta.ops import anchor
from official.vision.beta.ops import box_ops
from official.vision.beta.ops import preprocess_ops
class Parser(parser.Parser):
"""Parser to parse an image and its annotations into a dictionary of tensors."""
def __init__(self,
output_size,
min_level,
max_level,
num_scales,
aspect_ratios,
anchor_size,
rpn_match_threshold=0.7,
rpn_unmatched_threshold=0.3,
rpn_batch_size_per_im=256,
rpn_fg_fraction=0.5,
aug_rand_hflip=False,
aug_scale_min=1.0,
aug_scale_max=1.0,
skip_crowd_during_training=True,
max_num_instances=100,
include_mask=False,
mask_crop_size=112,
dtype='float32'):
"""Initializes parameters for parsing annotations in the dataset.
Args:
output_size: `Tensor` or `list` for [height, width] of output image. The
        output_size should be divisible by the largest feature stride
        2^max_level.
      min_level: `int` number of minimum level of the output feature pyramid.
      max_level: `int` number of maximum level of the output feature pyramid.
      num_scales: `int` number representing intermediate scales added
        on each level. For instance, num_scales=2 adds one additional
        intermediate anchor scale [2^0, 2^0.5] on each level.
      aspect_ratios: `list` of float numbers representing the aspect ratio
        anchors added on each level. The number indicates the ratio of width to
        height. For instance, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors
        on each scale level.
      anchor_size: `float` number representing the scale of size of the base
        anchor to the feature stride 2^level.
      rpn_match_threshold: `float`, IoU threshold at or above which an anchor
        is assigned to a groundtruth box as a positive RPN example.
      rpn_unmatched_threshold: `float`, IoU threshold below which an anchor is
        treated as a negative RPN example.
      rpn_batch_size_per_im: `int`, number of anchors sampled per image for
        the RPN loss.
      rpn_fg_fraction: `float`, desired fraction of positive anchors in the
        sampled RPN batch.
      aug_rand_hflip: `bool`, if True, augment training with random
        horizontal flip.
      aug_scale_min: `float`, the minimum scale applied to `output_size` for
        data augmentation during training.
      aug_scale_max: `float`, the maximum scale applied to `output_size` for
        data augmentation during training.
      skip_crowd_during_training: `bool`, if True, skip annotations labeled
        with `is_crowd` equal to 1.
      max_num_instances: `int`, maximum number of instances in an image. The
        groundtruth data will be padded to `max_num_instances`.
      include_mask: a bool to indicate whether to parse mask groundtruth.
      mask_crop_size: the size to which the groundtruth mask is cropped.
      dtype: `str`, data type. One of {`bfloat16`, `float32`, `float16`}.
"""
self._max_num_instances = max_num_instances
self._skip_crowd_during_training = skip_crowd_during_training
# Anchor.
self._output_size = output_size
self._min_level = min_level
self._max_level = max_level
self._num_scales = num_scales
self._aspect_ratios = aspect_ratios
self._anchor_size = anchor_size
# Target assigning.
self._rpn_match_threshold = rpn_match_threshold
self._rpn_unmatched_threshold = rpn_unmatched_threshold
self._rpn_batch_size_per_im = rpn_batch_size_per_im
self._rpn_fg_fraction = rpn_fg_fraction
# Data augmentation.
self._aug_rand_hflip = aug_rand_hflip
self._aug_scale_min = aug_scale_min
self._aug_scale_max = aug_scale_max
# Mask.
self._include_mask = include_mask
self._mask_crop_size = mask_crop_size
# Image output dtype.
self._dtype = dtype
def _parse_train_data(self, data):
"""Parses data for training.
Args:
data: the decoded tensor dictionary from TfExampleDecoder.
Returns:
image: image tensor that is preprocessed to have normalized value and
        dimension [output_size[0], output_size[1], 3]
      labels: a dictionary of tensors used for training. The following describes
        {key: value} pairs in the dictionary.
        image_info: a 2D `Tensor` that encodes the information of the image and
          the applied preprocessing. It is in the format of
          [[original_height, original_width], [scaled_height, scaled_width],
          [y_scale, x_scale], [y_offset, x_offset]].
        anchor_boxes: ordered dictionary with keys
          [min_level, min_level+1, ..., max_level]. The values are tensors with
          shape [height_l, width_l, 4] representing anchor boxes at each level.
        rpn_score_targets: ordered dictionary with keys
          [min_level, min_level+1, ..., max_level]. The values are tensors with
          shape [height_l, width_l, anchors_per_location]. The height_l and
          width_l represent the dimension of class logits at l-th level.
        rpn_box_targets: ordered dictionary with keys
          [min_level, min_level+1, ..., max_level]. The values are tensors with
          shape [height_l, width_l, anchors_per_location * 4]. The height_l and
          width_l represent the dimension of bounding box regression output at
          l-th level.
        gt_boxes: Groundtruth bounding box annotations. The box is represented
          in [y1, x1, y2, x2] format. The coordinates are w.r.t the scaled
          image that is fed to the network. The tensor is padded with -1 to
          the fixed dimension [self._max_num_instances, 4].
        gt_classes: Groundtruth classes annotations. The tensor is padded
          with -1 to the fixed dimension [self._max_num_instances].
        gt_masks: Groundtruth masks cropped by the bounding box and
          resized to a fixed size determined by mask_crop_size.
"""
classes = data['groundtruth_classes']
boxes = data['groundtruth_boxes']
if self._include_mask:
masks = data['groundtruth_instance_masks']
is_crowds = data['groundtruth_is_crowd']
# Skips annotations with `is_crowd` = True.
if self._skip_crowd_during_training:
num_groundtruths = tf.shape(classes)[0]
with tf.control_dependencies([num_groundtruths, is_crowds]):
indices = tf.cond(
tf.greater(tf.size(is_crowds), 0),
lambda: tf.where(tf.logical_not(is_crowds))[:, 0],
lambda: tf.cast(tf.range(num_groundtruths), tf.int64))
classes = tf.gather(classes, indices)
boxes = tf.gather(boxes, indices)
if self._include_mask:
masks = tf.gather(masks, indices)
# Gets original image and its size.
image = data['image']
image_shape = tf.shape(image)[0:2]
# Normalizes image with mean and std pixel values.
image = preprocess_ops.normalize_image(image)
# Flips image randomly during training.
if self._aug_rand_hflip:
if self._include_mask:
image, boxes, masks = preprocess_ops.random_horizontal_flip(
image, boxes, masks)
else:
image, boxes, _ = preprocess_ops.random_horizontal_flip(
image, boxes)
# Converts boxes from normalized coordinates to pixel coordinates.
# Now the coordinates of boxes are w.r.t. the original image.
boxes = box_ops.denormalize_boxes(boxes, image_shape)
# Resizes and crops image.
image, image_info = preprocess_ops.resize_and_crop_image(
image,
self._output_size,
padded_size=preprocess_ops.compute_padded_size(
self._output_size, 2 ** self._max_level),
aug_scale_min=self._aug_scale_min,
aug_scale_max=self._aug_scale_max)
image_height, image_width, _ = image.get_shape().as_list()
# Resizes and crops boxes.
# Now the coordinates of boxes are w.r.t the scaled image.
image_scale = image_info[2, :]
offset = image_info[3, :]
boxes = preprocess_ops.resize_and_crop_boxes(
boxes, image_scale, image_info[1, :], offset)
# Filters out ground truth boxes that are all zeros.
indices = box_ops.get_non_empty_box_indices(boxes)
boxes = tf.gather(boxes, indices)
classes = tf.gather(classes, indices)
if self._include_mask:
masks = tf.gather(masks, indices)
# Transfers boxes back to the original image space and normalizes them.
cropped_boxes = boxes + tf.tile(tf.expand_dims(offset, axis=0), [1, 2])
cropped_boxes /= tf.tile(tf.expand_dims(image_scale, axis=0), [1, 2])
cropped_boxes = box_ops.normalize_boxes(cropped_boxes, image_shape)
num_masks = tf.shape(masks)[0]
masks = tf.image.crop_and_resize(
tf.expand_dims(masks, axis=-1),
cropped_boxes,
box_indices=tf.range(num_masks, dtype=tf.int32),
crop_size=[self._mask_crop_size, self._mask_crop_size],
method='bilinear')
masks = tf.squeeze(masks, axis=-1)
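# `tf.image.crop_and_resize` expects boxes in normalized coordinates of its
# input, and the instance masks still live in the original image space, so
# the boxes are mapped back (undoing offset and scale) before cropping.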
# Assigns anchor targets.
# Note that after the target assignment, box targets are absolute pixel
# offsets w.r.t. the scaled image.
input_anchor = anchor.build_anchor_generator(
min_level=self._min_level,
max_level=self._max_level,
num_scales=self._num_scales,
aspect_ratios=self._aspect_ratios,
anchor_size=self._anchor_size)
anchor_boxes = input_anchor(image_size=(image_height, image_width))
anchor_labeler = anchor.RpnAnchorLabeler(
self._rpn_match_threshold,
self._rpn_unmatched_threshold,
self._rpn_batch_size_per_im,
self._rpn_fg_fraction)
rpn_score_targets, rpn_box_targets = anchor_labeler.label_anchors(
anchor_boxes, boxes,
tf.cast(tf.expand_dims(classes, axis=-1), dtype=tf.float32))
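# `label_anchors` matches the generated anchors against the scaled
# groundtruth boxes to produce the per-level RPN score and box targets
# described in the docstring above.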
# Casts input image to self._dtype
image = tf.cast(image, dtype=self._dtype)
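# The cast happens after all preprocessing, so the ops above run in float32;
# the test below exercises this path with dtype='bfloat16'.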
# Packs labels for model_fn outputs.
labels = {
'anchor_boxes':
anchor_boxes,
'image_info':
image_info,
'rpn_score_targets':
rpn_score_targets,
'rpn_box_targets':
rpn_box_targets,
'gt_boxes':
preprocess_ops.clip_or_pad_to_fixed_size(boxes,
self._max_num_instances,
-1),
'gt_classes':
preprocess_ops.clip_or_pad_to_fixed_size(classes,
self._max_num_instances,
-1),
}
if self._include_mask:
labels['gt_masks'] = preprocess_ops.clip_or_pad_to_fixed_size(
masks, self._max_num_instances, -1)
return image, labels
def _parse_eval_data(self, data):
"""Parses data for evaluation.
Args:
data: the decoded tensor dictionary from TfExampleDecoder.
Returns:
A dictionary of {'images': image, 'labels': labels} where
image: image tensor that is preprocessed to have normalized value and
dimension [output_size[0], output_size[1], 3]
labels: a dictionary of tensors used for evaluation. The following
describes {key: value} pairs in the dictionary.
source_ids: Source image id. Default value -1 if the source id is
empty in the groundtruth annotation.
image_info: a 2D `Tensor` that encodes the information of the image
and the applied preprocessing. It is in the format of
[[original_height, original_width], [scaled_height, scaled_width],
[y_scale, x_scale], [y_offset, x_offset]].
anchor_boxes: ordered dictionary with keys
[min_level, min_level+1, ..., max_level]. The values are tensors with
shape [height_l, width_l, 4] representing anchor boxes at each
level.
groundtruths: a dictionary of padded groundtruth tensors consumed by
the evaluator (source_id, height, width, num_detections, boxes,
classes, areas, is_crowds).
"""
# Gets original image and its size.
image = data['image']
image_shape = tf.shape(image)[0:2]
# Normalizes image with mean and std pixel values.
image = preprocess_ops.normalize_image(image)
# Resizes and crops image.
image, image_info = preprocess_ops.resize_and_crop_image(
image,
self._output_size,
padded_size=preprocess_ops.compute_padded_size(
self._output_size, 2 ** self._max_level),
aug_scale_min=1.0,
aug_scale_max=1.0)
image_height, image_width, _ = image.get_shape().as_list()
# Casts input image to self._dtype
image = tf.cast(image, dtype=self._dtype)
# Converts boxes from normalized coordinates to pixel coordinates.
boxes = box_ops.denormalize_boxes(data['groundtruth_boxes'], image_shape)
# Computes anchor boxes.
input_anchor = anchor.build_anchor_generator(
min_level=self._min_level,
max_level=self._max_level,
num_scales=self._num_scales,
aspect_ratios=self._aspect_ratios,
anchor_size=self._anchor_size)
anchor_boxes = input_anchor(image_size=(image_height, image_width))
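# Since image_height and image_width come from the statically padded shape,
# the eval anchors are identical for every example at a given output size.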
labels = {
'image_info': image_info,
'anchor_boxes': anchor_boxes,
}
groundtruths = {
'source_id': data['source_id'],
'height': data['height'],
'width': data['width'],
'num_detections': tf.shape(data['groundtruth_classes']),
'boxes': boxes,
'classes': data['groundtruth_classes'],
'areas': data['groundtruth_area'],
'is_crowds': tf.cast(data['groundtruth_is_crowd'], tf.int32),
}
groundtruths['source_id'] = utils.process_source_id(
groundtruths['source_id'])
groundtruths = utils.pad_groundtruths_to_fixed_size(
groundtruths, self._max_num_instances)
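# Pads the variable-length groundtruth fields to max_num_instances so the
# per-image groundtruths can be batched into dense tensors.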
labels['groundtruths'] = groundtruths
return image, labels
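A small worked sketch (illustrative, not part of this change) of the `image_info` bookkeeping when `aug_scale_min == aug_scale_max == 1.0`, assuming the aspect-preserving resize-and-pad behavior of `preprocess_ops.resize_and_crop_image`; the image size and box values below are made up:
# Worked example: a 600x800 image resized into a 1024x1024 padded canvas
# with no scale jitter (all values illustrative).
original_height, original_width = 600, 800
scale = 1024.0 / 800.0                        # long side to target: 1.28
scaled_height = int(original_height * scale)  # 768
scaled_width = int(original_width * scale)    # 1024
offset_y, offset_x = 0.0, 0.0                 # no random crop without jitter
image_info = [[original_height, original_width],
              [scaled_height, scaled_width],
              [scale, scale],
              [offset_y, offset_x]]
# A groundtruth box in original pixel coordinates maps onto the scaled image
# by scaling then shifting, mirroring resize_and_crop_boxes.
y1, x1, y2, x2 = 100.0, 200.0, 300.0, 400.0
scaled_box = [y1 * scale - offset_y, x1 * scale - offset_x,
              y2 * scale - offset_y, x2 * scale - offset_x]  # [128, 256, 384, 512]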
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for maskrcnn_input."""
# Import libraries
from absl.testing import parameterized
import tensorflow as tf
from official.core import input_reader
from official.modeling.hyperparams import config_definitions as cfg
from official.vision.beta.dataloaders import maskrcnn_input
from official.vision.beta.dataloaders import tf_example_decoder
class InputReaderTest(parameterized.TestCase, tf.test.TestCase):
@parameterized.parameters(
([1024, 1024], True, True, True),
([1024, 1024], True, False, True),
([1024, 1024], False, True, True),
([1024, 1024], False, False, True),
([1024, 1024], True, True, False),
([1024, 1024], True, False, False),
([1024, 1024], False, True, False),
([1024, 1024], False, False, False),
)
def testMaskRCNNInputReader(self,
output_size,
skip_crowd_during_training,
include_mask,
is_training):
min_level = 3
max_level = 7
num_scales = 3
aspect_ratios = [1.0, 2.0, 0.5]
max_num_instances = 100
batch_size = 2
mask_crop_size = 112
anchor_size = 4.0
params = cfg.DataConfig(
input_path='/placer/prod/home/snaggletooth/test/data/coco/val*',
global_batch_size=batch_size,
is_training=is_training)
parser = maskrcnn_input.Parser(
output_size=output_size,
min_level=min_level,
max_level=max_level,
num_scales=num_scales,
aspect_ratios=aspect_ratios,
anchor_size=anchor_size,
rpn_match_threshold=0.7,
rpn_unmatched_threshold=0.3,
rpn_batch_size_per_im=256,
rpn_fg_fraction=0.5,
aug_rand_hflip=True,
aug_scale_min=0.8,
aug_scale_max=1.2,
skip_crowd_during_training=skip_crowd_during_training,
max_num_instances=max_num_instances,
include_mask=include_mask,
mask_crop_size=mask_crop_size,
dtype='bfloat16')
decoder = tf_example_decoder.TfExampleDecoder(include_mask=include_mask)
reader = input_reader.InputReader(
params,
dataset_fn=tf.data.TFRecordDataset,
decoder_fn=decoder.decode,
parser_fn=parser.parse_fn(params.is_training))
dataset = reader.read()
iterator = iter(dataset)
images, labels = next(iterator)
np_images = images.numpy()
np_labels = tf.nest.map_structure(lambda x: x.numpy(), labels)
if is_training:
self.assertAllEqual(np_images.shape,
[batch_size, output_size[0], output_size[1], 3])
self.assertAllEqual(np_labels['image_info'].shape, [batch_size, 4, 2])
self.assertAllEqual(np_labels['gt_boxes'].shape,
[batch_size, max_num_instances, 4])
self.assertAllEqual(np_labels['gt_classes'].shape,
[batch_size, max_num_instances])
if include_mask:
self.assertAllEqual(np_labels['gt_masks'].shape,
[batch_size, max_num_instances,
mask_crop_size, mask_crop_size])
for level in range(min_level, max_level + 1):
stride = 2 ** level
output_size_l = [output_size[0] / stride, output_size[1] / stride]
anchors_per_location = num_scales * len(aspect_ratios)
self.assertAllEqual(
np_labels['rpn_score_targets'][level].shape,
[batch_size, output_size_l[0], output_size_l[1],
anchors_per_location])
self.assertAllEqual(
np_labels['rpn_box_targets'][level].shape,
[batch_size, output_size_l[0], output_size_l[1],
4 * anchors_per_location])
self.assertAllEqual(
np_labels['anchor_boxes'][level].shape,
[batch_size, output_size_l[0], output_size_l[1],
4 * anchors_per_location])
else:
self.assertAllEqual(np_images.shape,
[batch_size, output_size[0], output_size[1], 3])
self.assertAllEqual(np_labels['image_info'].shape, [batch_size, 4, 2])
if __name__ == '__main__':
tf.test.main()
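As a quick sanity check on the shape assertions above, the expected per-level target shapes can be derived directly from the test config (a standalone sketch, not part of the test file):
# Derives the per-level shapes asserted in testMaskRCNNInputReader above.
batch_size = 2
output_size = [1024, 1024]
num_scales = 3
aspect_ratios = [1.0, 2.0, 0.5]
anchors_per_location = num_scales * len(aspect_ratios)  # 9
for level in range(3, 8):  # min_level .. max_level
  stride = 2 ** level
  h, w = output_size[0] // stride, output_size[1] // stride
  # rpn_score_targets[level]: [batch, h, w, 9]
  # rpn_box_targets[level] and anchor_boxes[level]: [batch, h, w, 36]
  print(level, [batch_size, h, w, anchors_per_location],
        [batch_size, h, w, 4 * anchors_per_location])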