Unverified Commit 1f8b5b27 authored by Simon Geisler, committed by GitHub

Merge branch 'master' into master

parents 0eeeaf98 8fcf177e
## Download and preprocess Criteo TB dataset
[Apache Beam](https://beam.apache.org) enables distributed preprocessing of the
dataset and can be run on
[Google Cloud Dataflow](https://cloud.google.com/dataflow/). The preprocessing
scripts can also be run locally via Beam's DirectRunner, provided that the
local host has enough CPU, memory, and storage.
Install the required packages:
```bash
python3 setup.py install
```
Set up the following environment variables, replacing `bucket-name` with the
name of your Cloud Storage bucket and `my-gcp-project` with your GCP project ID.
```bash
export STORAGE_BUCKET=gs://bucket-name
export PROJECT=my-gcp-project
export REGION=us-central1
```
Note: When running locally, the environment variables above are not needed,
and a local path can be used instead of `gs://bucket-name`. Also consider
passing a smaller `--max_vocab_size` argument.
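For example, a fully local vocabulary-generation run (illustrative; the paths
and vocab size are placeholders, the flags come from `criteo_preprocess.py`
below) could look like:
```bash
export LOCAL_DATA=/tmp/criteo
python3 criteo_preprocess.py \
  --input_path "${LOCAL_DATA}/criteo_raw_sharded/*/*" \
  --output_path "${LOCAL_DATA}/criteo/" \
  --temp_dir "${LOCAL_DATA}/criteo_vocab/" \
  --vocab_gen_mode --runner DirectRunner --max_vocab_size 100000
```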
1. Download the raw
[Criteo TB dataset](https://labs.criteo.com/2013/12/download-terabyte-click-logs/)
to a GCS bucket and organize the data in the following way (see the upload
example below):
* The files `day_0.gz`, `day_1.gz`, ..., `day_22.gz` in
`${STORAGE_BUCKET}/criteo_raw/train/`
* The file `day_23.gz` in `${STORAGE_BUCKET}/criteo_raw/test/`
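Assuming the `day_*.gz` files were downloaded to the local working directory,
they can be uploaded with `gsutil`, for instance:
```bash
gsutil -m cp day_{0..22}.gz ${STORAGE_BUCKET}/criteo_raw/train/
gsutil cp day_23.gz ${STORAGE_BUCKET}/criteo_raw/test/
```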
2. Shard the raw training/test data into multiple files.
```bash
python3 shard_rebalancer.py \
--input_path "${STORAGE_BUCKET}/criteo_raw/train/*" \
--output_path "${STORAGE_BUCKET}/criteo_raw_sharded/train/train" \
--num_output_files 1024 --filetype csv --runner DataflowRunner \
--project ${PROJECT} --region ${REGION}
```
```bash
python3 shard_rebalancer.py \
--input_path "${STORAGE_BUCKET}/criteo_raw/test/*" \
--output_path "${STORAGE_BUCKET}/criteo_raw_sharded/test/test" \
--num_output_files 64 --filetype csv --runner DataflowRunner \
--project ${PROJECT} --region ${REGION}
```
3. Generate the vocabulary and preprocess the data. First, generate the
vocabulary:
```bash
python3 criteo_preprocess.py \
--input_path "${STORAGE_BUCKET}/criteo_raw_sharded/*/*" \
--output_path "${STORAGE_BUCKET}/criteo/" \
--temp_dir "${STORAGE_BUCKET}/criteo_vocab/" \
--vocab_gen_mode --runner DataflowRunner --max_vocab_size 5000000 \
--project ${PROJECT} --region ${REGION}
```
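This pass writes one vocabulary file per categorical feature under the temp
directory; per `apply_vocab_fn` in `criteo_preprocess.py` below, they end up at
paths of the form `${STORAGE_BUCKET}/criteo_vocab/tftransform_tmp/feature_<idx>_vocab`.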
Then preprocess the training and test data:
```bash
python3 criteo_preprocess.py \
--input_path "${STORAGE_BUCKET}/criteo_raw_sharded/train/*" \
--output_path "${STORAGE_BUCKET}/criteo/train/train" \
--temp_dir "${STORAGE_BUCKET}/criteo_vocab/" \
--runner DataflowRunner --max_vocab_size 5000000 \
--project ${PROJECT} --region ${REGION}
```
```bash
python3 criteo_preprocess.py \
--input_path "${STORAGE_BUCKET}/criteo_raw_sharded/test/*" \
--output_path "${STORAGE_BUCKET}/criteo/test/test" \
--temp_dir "${STORAGE_BUCKET}/criteo_vocab/" \
--runner DataflowRunner --max_vocab_size 5000000 \
--project ${PROJECT} --region ${REGION}
```
4. (Optional) Re-balance the dataset.
```bash
python3 shard_rebalancer.py \
--input_path "${STORAGE_BUCKET}/criteo/train/*" \
--output_path "${STORAGE_BUCKET}/criteo_balanced/train/train" \
--num_output_files 8192 --filetype csv --runner DataflowRunner \
--project ${PROJECT} --region ${REGION}
```
```bash
python3 shard_rebalancer.py \
--input_path "${STORAGE_BUCKET}/criteo/test/*" \
--output_path "${STORAGE_BUCKET}/criteo_balanced/test/test" \
--num_output_files 1024 --filetype csv --runner DataflowRunner \
--project ${PROJECT} --region ${REGION}
```
At this point the preprocessed training and test data are under:
* `${STORAGE_BUCKET}/criteo_balanced/train/`
* `${STORAGE_BUCKET}/criteo_balanced/test/`
All other intermediate paths can be removed.
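To spot-check the preprocessed output (illustrative):
```bash
gsutil ls "${STORAGE_BUCKET}/criteo_balanced/train/" | head
gsutil ls "${STORAGE_BUCKET}/criteo_balanced/test/" | wc -l
```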
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""TFX beam preprocessing pipeline for Criteo data.
Preprocessing util for criteo data. Transformations:
1. Fill missing features with zeros.
2. Set negative integer features to zeros.
3. Normalize integer features using log(x+1).
4. For categorical features (hex), convert to integer and take value modulus the
max_vocab_size value.
Usage:
For raw Criteo data, this script should be run twice.
First run should set vocab_gen_mode to true. This run is used to generate
vocabulary files in the temp_dir location.
Second run should set vocab_gen_mode to false. It is necessary to point to the
same temp_dir used during the first run.
"""
import argparse
import datetime
import os
from absl import logging
import apache_beam as beam
import numpy as np
import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import schema_utils
from tfx_bsl.public import tfxio
parser = argparse.ArgumentParser()
parser.add_argument(
    "--input_path",
    default=None,
    required=True,
    help="Input path. Be sure to set this to cover all data, to ensure "
    "that sparse vocabs are complete.")
parser.add_argument(
    "--output_path",
    default=None,
    required=True,
    help="Output path.")
parser.add_argument(
    "--temp_dir",
    default=None,
    required=True,
    help="Directory to store temporary metadata. Important because vocab "
    "dictionaries will be stored here. Co-located with data, ideally.")
parser.add_argument(
    "--csv_delimeter",
    default="\t",
    help="Delimiter string for input and output.")
parser.add_argument(
    "--vocab_gen_mode",
    action="store_true",
    default=False,
    help="If set, process the full dataset and do not write CSV output. In "
    "this mode, see temp_dir for the vocab files. input_path should cover "
    "all data, e.g. train, test, eval.")
parser.add_argument(
    "--runner",
    help="Runner for Apache Beam; needs to be one of {DirectRunner, "
    "DataflowRunner}.",
    default="DirectRunner")
parser.add_argument(
    "--project",
    default=None,
    help="ID of your project. Ignored by DirectRunner.")
parser.add_argument(
    "--region",
    default=None,
    help="Region. Ignored by DirectRunner.")
parser.add_argument(
    "--max_vocab_size",
    type=int,
    default=10_000_000,
    help="Max index range; categorical features are converted to integers "
    "and taken modulo max_vocab_size.")
args = parser.parse_args()
NUM_NUMERIC_FEATURES = 13

NUMERIC_FEATURE_KEYS = [
    f"int-feature-{x + 1}" for x in range(NUM_NUMERIC_FEATURES)]
CATEGORICAL_FEATURE_KEYS = [
    "categorical-feature-%d" % x for x in range(NUM_NUMERIC_FEATURES + 1, 40)]
LABEL_KEY = "clicked"

# Data is first preprocessed in pure Apache Beam using numpy.
# This removes missing values and hexadecimal-encoded values.
# For the TF schema, we can thus specify the schema as FixedLenFeature
# for TensorFlow Transform.
FEATURE_SPEC = dict([(name, tf.io.FixedLenFeature([], dtype=tf.int64))
                     for name in CATEGORICAL_FEATURE_KEYS] +
                    [(name, tf.io.FixedLenFeature([], dtype=tf.float32))
                     for name in NUMERIC_FEATURE_KEYS] +
                    [(LABEL_KEY, tf.io.FixedLenFeature([], tf.float32))])
INPUT_METADATA = dataset_metadata.DatasetMetadata(
    schema_utils.schema_from_feature_spec(FEATURE_SPEC))
def apply_vocab_fn(inputs):
  """Preprocessing fn for sparse features.

  Applies vocab to bucketize sparse features. This function operates using
  previously-created vocab files.

  Pre-condition: Full vocab has been materialized.

  Args:
    inputs: Input features to transform.

  Returns:
    Output dict with transformed features.
  """
  outputs = {}
  outputs[LABEL_KEY] = inputs[LABEL_KEY]
  for key in NUMERIC_FEATURE_KEYS:
    outputs[key] = inputs[key]
  for idx, key in enumerate(CATEGORICAL_FEATURE_KEYS):
    vocab_fn = os.path.join(
        args.temp_dir, "tftransform_tmp", "feature_{}_vocab".format(idx))
    outputs[key] = tft.apply_vocabulary(inputs[key], vocab_fn)
  return outputs
def compute_vocab_fn(inputs):
  """Preprocessing fn for sparse features.

  This function computes unique IDs for the sparse features. We rely on
  implicit behavior which writes the vocab files to the vocab_filename
  specified in tft.compute_and_apply_vocabulary.

  Pre-condition: Sparse features have been converted to integer and mod'ed with
  args.max_vocab_size.

  Args:
    inputs: Input features to transform.

  Returns:
    Output dict with transformed features.
  """
  outputs = {}
  outputs[LABEL_KEY] = inputs[LABEL_KEY]
  for key in NUMERIC_FEATURE_KEYS:
    outputs[key] = inputs[key]
  for idx, key in enumerate(CATEGORICAL_FEATURE_KEYS):
    outputs[key] = tft.compute_and_apply_vocabulary(
        x=inputs[key],
        vocab_filename="feature_{}_vocab".format(idx))
  return outputs
class FillMissing(beam.DoFn):
  """Fills missing elements with zero string value."""

  def process(self, element):
    elem_list = element.split(args.csv_delimeter)
    out_list = []
    for val in elem_list:
      new_val = "0" if not val else val
      out_list.append(new_val)
    yield (args.csv_delimeter).join(out_list)


class NegsToZeroLog(beam.DoFn):
  """For int features, sets negative values to zero and takes log(x+1)."""

  def process(self, element):
    elem_list = element.split(args.csv_delimeter)
    out_list = []
    for i, val in enumerate(elem_list):
      if i > 0 and i <= NUM_NUMERIC_FEATURES:
        new_val = "0" if int(val) < 0 else val
        new_val = np.log(int(new_val) + 1)
        new_val = str(new_val)
      else:
        new_val = val
      out_list.append(new_val)
    yield (args.csv_delimeter).join(out_list)


class HexToIntModRange(beam.DoFn):
  """For categorical features, converts hex to int modulo max_vocab_size."""

  def process(self, element):
    elem_list = element.split(args.csv_delimeter)
    out_list = []
    for i, val in enumerate(elem_list):
      if i > NUM_NUMERIC_FEATURES:
        new_val = int(val, 16) % args.max_vocab_size
      else:
        new_val = val
      out_list.append(str(new_val))
    yield str.encode((args.csv_delimeter).join(out_list))
def transform_data(data_path, output_path):
  """Preprocesses Criteo data.

  Two processing modes are supported. Raw data will require two passes.
  If full vocab files already exist, only one pass is necessary.

  Args:
    data_path: File(s) to read.
    output_path: Path to which output CSVs are written, if necessary.
  """
  preprocessing_fn = compute_vocab_fn if args.vocab_gen_mode else apply_vocab_fn

  gcp_project = args.project
  region = args.region
  job_name = (f"criteo-preprocessing-"
              f"{datetime.datetime.now().strftime('%y%m%d-%H%M%S')}")

  # Set up the Beam pipeline.
  pipeline_options = None
  if args.runner == "DataflowRunner":
    options = {
        "staging_location": os.path.join(output_path, "tmp", "staging"),
        "temp_location": os.path.join(output_path, "tmp"),
        "job_name": job_name,
        "project": gcp_project,
        "save_main_session": True,
        "region": region,
        "setup_file": "./setup.py",
    }
    pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)
  elif args.runner == "DirectRunner":
    pipeline_options = beam.options.pipeline_options.DirectOptions(
        direct_num_workers=os.cpu_count(),
        direct_running_mode="multi_threading")

  with beam.Pipeline(args.runner, options=pipeline_options) as pipeline:
    with tft_beam.Context(temp_dir=args.temp_dir):
      processed_lines = (
          pipeline
          # Read in TSV data.
          | beam.io.ReadFromText(data_path, coder=beam.coders.StrUtf8Coder())
          # Fill in missing elements with the defaults (zeros).
          | "FillMissing" >> beam.ParDo(FillMissing())
          # For numerical features, set negatives to zero. Then take log(x+1).
          | "NegsToZeroLog" >> beam.ParDo(NegsToZeroLog())
          # For categorical features, mod the values with vocab size.
          | "HexToIntModRange" >> beam.ParDo(HexToIntModRange()))

      # CSV reader: List the cols in order, as dataset schema is not ordered.
      ordered_columns = [LABEL_KEY
                        ] + NUMERIC_FEATURE_KEYS + CATEGORICAL_FEATURE_KEYS

      csv_tfxio = tfxio.BeamRecordCsvTFXIO(
          physical_format="text",
          column_names=ordered_columns,
          delimiter=args.csv_delimeter,
          schema=INPUT_METADATA.schema)

      converted_data = (
          processed_lines
          | "DecodeData" >> csv_tfxio.BeamSource())

      raw_dataset = (converted_data, csv_tfxio.TensorAdapterConfig())

      # The TFXIO output format is chosen for improved performance.
      transformed_dataset, _ = (
          raw_dataset | tft_beam.AnalyzeAndTransformDataset(
              preprocessing_fn, output_record_batches=False))

      # Transformed metadata is not necessary for encoding.
      transformed_data, transformed_metadata = transformed_dataset

      if not args.vocab_gen_mode:
        # Write to CSV.
        transformed_csv_coder = tft.coders.CsvCoder(
            ordered_columns, transformed_metadata.schema,
            delimiter=args.csv_delimeter)
        _ = (
            transformed_data
            | "EncodeDataCsv" >> beam.Map(transformed_csv_coder.encode)
            | "WriteDataCsv" >> beam.io.WriteToText(output_path))


if __name__ == "__main__":
  logging.set_verbosity(logging.INFO)
  transform_data(data_path=args.input_path,
                 output_path=args.output_path)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Setup configuration for criteo dataset preprocessing.
This is used while running Tensorflow transform on Cloud Dataflow.
"""
import setuptools
version = "0.1.0"
if __name__ == "__main__":
setuptools.setup(
name="criteo_preprocessing",
version=version,
install_requires=["tensorflow-transform"],
packages=setuptools.find_packages(),
)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Rebalance a set of CSV/TFRecord shards to a target number of files.
"""
import argparse
import datetime
import os
import apache_beam as beam
import tensorflow as tf
parser = argparse.ArgumentParser()
parser.add_argument(
    "--input_path",
    default=None,
    required=True,
    help="Input path.")
parser.add_argument(
    "--output_path",
    default=None,
    required=True,
    help="Output path.")
parser.add_argument(
    "--num_output_files",
    type=int,
    default=256,
    help="Number of output file shards.")
parser.add_argument(
    "--filetype",
    default="tfrecord",
    help="File type, needs to be one of {tfrecord, csv}.")
parser.add_argument(
    "--project",
    default=None,
    help="ID (not name) of your project. Ignored by DirectRunner.")
parser.add_argument(
    "--runner",
    help="Runner for Apache Beam, needs to be one of "
    "{DirectRunner, DataflowRunner}.",
    default="DirectRunner")
parser.add_argument(
    "--region",
    default=None,
    help="Region. Ignored by DirectRunner.")
args = parser.parse_args()
def rebalance_data_shards():
  """Rebalances data shards."""

  def csv_pipeline(pipeline: beam.Pipeline):
    """Rebalances CSV dataset.

    Args:
      pipeline: Beam pipeline object.
    """
    _ = (
        pipeline
        | beam.io.ReadFromText(args.input_path)
        | beam.io.WriteToText(args.output_path,
                              num_shards=args.num_output_files))

  def tfrecord_pipeline(pipeline: beam.Pipeline):
    """Rebalances TFRecords dataset.

    Args:
      pipeline: Beam pipeline object.
    """
    example_coder = beam.coders.ProtoCoder(tf.train.Example)
    _ = (
        pipeline
        | beam.io.ReadFromTFRecord(args.input_path, coder=example_coder)
        | beam.io.WriteToTFRecord(args.output_path,
                                  file_name_suffix="tfrecord",
                                  coder=example_coder,
                                  num_shards=args.num_output_files))

  job_name = (
      f"shard-rebalancer-{datetime.datetime.now().strftime('%y%m%d-%H%M%S')}")

  # Set up the Beam pipeline.
  options = {
      "staging_location": os.path.join(args.output_path, "tmp", "staging"),
      "temp_location": os.path.join(args.output_path, "tmp"),
      "job_name": job_name,
      "project": args.project,
      "save_main_session": True,
      "region": args.region,
  }
  opts = beam.pipeline.PipelineOptions(flags=[], **options)

  with beam.Pipeline(args.runner, options=opts) as pipeline:
    if args.filetype == "tfrecord":
      tfrecord_pipeline(pipeline)
    elif args.filetype == "csv":
      csv_pipeline(pipeline)


if __name__ == "__main__":
  rebalance_data_shards()
@@ -46,16 +46,13 @@ flags.DEFINE_bool('search_hints', True,
 flags.DEFINE_string('site_path', '/api_docs/python',
                     'Path prefix in the _toc.yaml')
-flags.DEFINE_bool('gen_report', False,
-                  'Generate an API report containing the health of the '
-                  'docstrings of the public API.')
 PROJECT_SHORT_NAME = 'tfnlp'
 PROJECT_FULL_NAME = 'TensorFlow Official Models - NLP Modeling Library'
-def gen_api_docs(code_url_prefix, site_path, output_dir, gen_report,
-                 project_short_name, project_full_name, search_hints):
+def gen_api_docs(code_url_prefix, site_path, output_dir, project_short_name,
+                 project_full_name, search_hints):
   """Generates api docs for the tensorflow docs package."""
   build_api_docs_lib.hide_module_model_and_layer_methods()
   del tfnlp.layers.MultiHeadAttention
@@ -68,7 +65,6 @@ def gen_api_docs(code_url_prefix, site_path, output_dir, gen_report,
       code_url_prefix=code_url_prefix,
       search_hints=search_hints,
       site_path=site_path,
-      gen_report=gen_report,
       callbacks=[public_api.explicit_package_contents_filter],
   )
@@ -84,7 +80,6 @@ def main(argv):
       code_url_prefix=FLAGS.code_url_prefix,
       site_path=FLAGS.site_path,
       output_dir=FLAGS.output_dir,
-      gen_report=FLAGS.gen_report,
      project_short_name=PROJECT_SHORT_NAME,
      project_full_name=PROJECT_FULL_NAME,
      search_hints=FLAGS.search_hints)
...
@@ -46,16 +46,12 @@ flags.DEFINE_bool('search_hints', True,
 flags.DEFINE_string('site_path', 'tfvision/api_docs/python',
                     'Path prefix in the _toc.yaml')
-flags.DEFINE_bool('gen_report', False,
-                  'Generate an API report containing the health of the '
-                  'docstrings of the public API.')
 PROJECT_SHORT_NAME = 'tfvision'
 PROJECT_FULL_NAME = 'TensorFlow Official Models - Vision Modeling Library'
-def gen_api_docs(code_url_prefix, site_path, output_dir, gen_report,
-                 project_short_name, project_full_name, search_hints):
+def gen_api_docs(code_url_prefix, site_path, output_dir, project_short_name,
+                 project_full_name, search_hints):
   """Generates api docs for the tensorflow docs package."""
   build_api_docs_lib.hide_module_model_and_layer_methods()
@@ -66,7 +62,6 @@ def gen_api_docs(code_url_prefix, site_path, output_dir, gen_report,
       code_url_prefix=code_url_prefix,
       search_hints=search_hints,
       site_path=site_path,
-      gen_report=gen_report,
       callbacks=[public_api.explicit_package_contents_filter],
   )
@@ -82,7 +77,6 @@ def main(argv):
      code_url_prefix=FLAGS.code_url_prefix,
      site_path=FLAGS.site_path,
      output_dir=FLAGS.output_dir,
-      gen_report=FLAGS.gen_report,
      project_short_name=PROJECT_SHORT_NAME,
      project_full_name=PROJECT_FULL_NAME,
      search_hints=FLAGS.search_hints)
...
@@ -32,6 +32,7 @@ class ResNet(hyperparams.Config):
   stochastic_depth_drop_rate: float = 0.0
   resnetd_shortcut: bool = False
   replace_stem_max_pool: bool = False
+  bn_trainable: bool = True
 @dataclasses.dataclass
...
@@ -15,15 +15,44 @@
 # Lint as: python3
 """Common configurations."""
-from typing import Optional, List
-# Import libraries
 import dataclasses
+from typing import Optional
+# Import libraries
 from official.core import config_definitions as cfg
 from official.modeling import hyperparams
+@dataclasses.dataclass
+class TfExampleDecoder(hyperparams.Config):
+  """A simple TF Example decoder config."""
+  regenerate_source_id: bool = False
+  mask_binarize_threshold: Optional[float] = None
+@dataclasses.dataclass
+class TfExampleDecoderLabelMap(hyperparams.Config):
+  """TF Example decoder with label map config."""
+  regenerate_source_id: bool = False
+  mask_binarize_threshold: Optional[float] = None
+  label_map: str = ''
+@dataclasses.dataclass
+class DataDecoder(hyperparams.OneOfConfig):
+  """Data decoder config.
+  Attributes:
+    type: 'str', type of data decoder to be used, one of the fields below.
+    simple_decoder: simple TF Example decoder config.
+    label_map_decoder: TF Example decoder with label map config.
+  """
+  type: Optional[str] = 'simple_decoder'
+  simple_decoder: TfExampleDecoder = TfExampleDecoder()
+  label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap()
 @dataclasses.dataclass
 class RandAugment(hyperparams.Config):
   """Configuration for RandAugment."""
...
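A minimal sketch of how the new `OneOfConfig`-based decoder could be selected
(hypothetical usage; the class and field names are taken from the diff above,
and the label-map path is a placeholder):
```python
from official.vision.beta.configs import common

# Pick the label-map decoder variant of the one-of config.
decoder = common.DataDecoder(
    type='label_map_decoder',
    label_map_decoder=common.TfExampleDecoderLabelMap(
        label_map='gs://my-bucket/label_map.csv'))
```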
@@ -14,11 +14,10 @@
 # Lint as: python3
 """Image classification configuration definition."""
+import dataclasses
 import os
 from typing import List, Optional
-import dataclasses
 from official.core import config_definitions as cfg
 from official.core import exp_factory
 from official.modeling import hyperparams
@@ -47,6 +46,7 @@ class DataConfig(cfg.DataConfig):
   label_field_key: str = 'image/class/label'
   decode_jpeg_only: bool = True
   mixup_and_cutmix: Optional[common.MixupAndCutmix] = None
+  decoder: Optional[common.DataDecoder] = common.DataDecoder()
   # Keep for backward compatibility.
   aug_policy: Optional[str] = None  # None, 'autoaug', or 'randaug'.
...
@@ -17,7 +17,7 @@
 import dataclasses
 import os
-from typing import List, Optional
+from typing import List, Optional, Union
 from official.core import config_definitions as cfg
 from official.core import exp_factory
@@ -29,26 +29,6 @@ from official.vision.beta.configs import backbones
 # pylint: disable=missing-class-docstring
-@dataclasses.dataclass
-class TfExampleDecoder(hyperparams.Config):
-  regenerate_source_id: bool = False
-  mask_binarize_threshold: Optional[float] = None
-@dataclasses.dataclass
-class TfExampleDecoderLabelMap(hyperparams.Config):
-  regenerate_source_id: bool = False
-  mask_binarize_threshold: Optional[float] = None
-  label_map: str = ''
-@dataclasses.dataclass
-class DataDecoder(hyperparams.OneOfConfig):
-  type: Optional[str] = 'simple_decoder'
-  simple_decoder: TfExampleDecoder = TfExampleDecoder()
-  label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap()
 @dataclasses.dataclass
 class Parser(hyperparams.Config):
   num_channels: int = 3
@@ -73,7 +53,7 @@ class DataConfig(cfg.DataConfig):
   global_batch_size: int = 0
   is_training: bool = False
   dtype: str = 'bfloat16'
-  decoder: DataDecoder = DataDecoder()
+  decoder: common.DataDecoder = common.DataDecoder()
   parser: Parser = Parser()
   shuffle_buffer_size: int = 10000
   file_type: str = 'tfrecord'
@@ -221,7 +201,8 @@ class MaskRCNNTask(cfg.TaskConfig):
       drop_remainder=False)
   losses: Losses = Losses()
   init_checkpoint: Optional[str] = None
-  init_checkpoint_modules: str = 'all'  # all or backbone
+  init_checkpoint_modules: Union[
+      str, List[str]] = 'all'  # all, backbone, and/or decoder
   annotation_file: Optional[str] = None
   per_category_metrics: bool = False
   # If set, we only use masks for the specified class IDs.
...
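A hypothetical sketch of what the widened `init_checkpoint_modules` field now
permits (names from the diff above; the checkpoint path is a placeholder):
```python
task = MaskRCNNTask(
    init_checkpoint='gs://my-bucket/pretrained/ckpt',
    init_checkpoint_modules=['backbone', 'decoder'])  # previously a single str
```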
@@ -15,9 +15,9 @@
 # Lint as: python3
 """RetinaNet configuration definition."""
-import os
-from typing import List, Optional
 import dataclasses
+import os
+from typing import List, Optional, Union
 from official.core import config_definitions as cfg
 from official.core import exp_factory
@@ -29,22 +29,22 @@ from official.vision.beta.configs import backbones
 # pylint: disable=missing-class-docstring
+# Keep for backward compatibility.
 @dataclasses.dataclass
-class TfExampleDecoder(hyperparams.Config):
-  regenerate_source_id: bool = False
+class TfExampleDecoder(common.TfExampleDecoder):
+  """A simple TF Example decoder config."""
+# Keep for backward compatibility.
 @dataclasses.dataclass
-class TfExampleDecoderLabelMap(hyperparams.Config):
-  regenerate_source_id: bool = False
-  label_map: str = ''
+class TfExampleDecoderLabelMap(common.TfExampleDecoderLabelMap):
+  """TF Example decoder with label map config."""
+# Keep for backward compatibility.
 @dataclasses.dataclass
-class DataDecoder(hyperparams.OneOfConfig):
-  type: Optional[str] = 'simple_decoder'
-  simple_decoder: TfExampleDecoder = TfExampleDecoder()
-  label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap()
+class DataDecoder(common.DataDecoder):
+  """Data decoder config."""
 @dataclasses.dataclass
@@ -55,6 +55,7 @@ class Parser(hyperparams.Config):
   aug_rand_hflip: bool = False
   aug_scale_min: float = 1.0
   aug_scale_max: float = 1.0
+  aug_policy: Optional[str] = None
   skip_crowd_during_training: bool = True
   max_num_instances: int = 100
@@ -66,7 +67,7 @@ class DataConfig(cfg.DataConfig):
   global_batch_size: int = 0
   is_training: bool = False
   dtype: str = 'bfloat16'
-  decoder: DataDecoder = DataDecoder()
+  decoder: common.DataDecoder = common.DataDecoder()
   parser: Parser = Parser()
   shuffle_buffer_size: int = 10000
   file_type: str = 'tfrecord'
@@ -144,7 +145,8 @@ class RetinaNetTask(cfg.TaskConfig):
   validation_data: DataConfig = DataConfig(is_training=False)
   losses: Losses = Losses()
   init_checkpoint: Optional[str] = None
-  init_checkpoint_modules: str = 'all'  # all or backbone
+  init_checkpoint_modules: Union[
+      str, List[str]] = 'all'  # all, backbone, and/or decoder
   annotation_file: Optional[str] = None
   per_category_metrics: bool = False
   export_config: ExportConfig = ExportConfig()
...
@@ -14,10 +14,10 @@
 # Lint as: python3
 """Semantic segmentation configuration definition."""
+import dataclasses
 import os
 from typing import List, Optional, Union
-import dataclasses
 import numpy as np
 from official.core import exp_factory
@@ -50,8 +50,10 @@ class DataConfig(cfg.DataConfig):
   aug_scale_min: float = 1.0
   aug_scale_max: float = 1.0
   aug_rand_hflip: bool = True
+  aug_policy: Optional[str] = None
   drop_remainder: bool = True
   file_type: str = 'tfrecord'
+  decoder: Optional[common.DataDecoder] = common.DataDecoder()
 @dataclasses.dataclass
@@ -120,7 +122,7 @@ class SemanticSegmentationTask(cfg.TaskConfig):
 def semantic_segmentation() -> cfg.ExperimentConfig:
   """Semantic segmentation general."""
   return cfg.ExperimentConfig(
-      task=SemanticSegmentationModel(),
+      task=SemanticSegmentationTask(),
       trainer=cfg.TrainerConfig(),
       restrictions=[
           'task.train_data.is_training != None',
...
@@ -127,6 +127,7 @@ class ResNet(tf.keras.Model):
                kernel_initializer: str = 'VarianceScaling',
                kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
                bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
+               bn_trainable: bool = True,
                **kwargs):
     """Initializes a ResNet model.
@@ -153,6 +154,8 @@ class ResNet(tf.keras.Model):
         Conv2D. Default to None.
       bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D.
         Default to None.
+      bn_trainable: A `bool` that indicates whether batch norm layers should be
+        trainable. Default to True.
       **kwargs: Additional keyword arguments to be passed.
     """
     self._model_id = model_id
@@ -174,6 +177,7 @@ class ResNet(tf.keras.Model):
     self._kernel_initializer = kernel_initializer
     self._kernel_regularizer = kernel_regularizer
     self._bias_regularizer = bias_regularizer
+    self._bn_trainable = bn_trainable
     if tf.keras.backend.image_data_format() == 'channels_last':
       bn_axis = -1
@@ -195,7 +199,10 @@ class ResNet(tf.keras.Model):
           bias_regularizer=self._bias_regularizer)(
              inputs)
       x = self._norm(
-          axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)(
+          axis=bn_axis,
+          momentum=norm_momentum,
+          epsilon=norm_epsilon,
+          trainable=bn_trainable)(
              x)
       x = tf_utils.get_activation(activation, use_keras_layer=True)(x)
     elif stem_type == 'v1':
@@ -210,7 +217,10 @@ class ResNet(tf.keras.Model):
           bias_regularizer=self._bias_regularizer)(
              inputs)
       x = self._norm(
-          axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)(
+          axis=bn_axis,
+          momentum=norm_momentum,
+          epsilon=norm_epsilon,
+          trainable=bn_trainable)(
              x)
       x = tf_utils.get_activation(activation, use_keras_layer=True)(x)
       x = layers.Conv2D(
@@ -224,7 +234,10 @@ class ResNet(tf.keras.Model):
          bias_regularizer=self._bias_regularizer)(
              x)
       x = self._norm(
-          axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)(
+          axis=bn_axis,
+          momentum=norm_momentum,
+          epsilon=norm_epsilon,
+          trainable=bn_trainable)(
              x)
       x = tf_utils.get_activation(activation, use_keras_layer=True)(x)
       x = layers.Conv2D(
@@ -238,7 +251,10 @@ class ResNet(tf.keras.Model):
          bias_regularizer=self._bias_regularizer)(
              x)
       x = self._norm(
-          axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)(
+          axis=bn_axis,
+          momentum=norm_momentum,
+          epsilon=norm_epsilon,
+          trainable=bn_trainable)(
              x)
       x = tf_utils.get_activation(activation, use_keras_layer=True)(x)
     else:
@@ -256,7 +272,10 @@ class ResNet(tf.keras.Model):
          bias_regularizer=self._bias_regularizer)(
              x)
       x = self._norm(
-          axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)(
+          axis=bn_axis,
+          momentum=norm_momentum,
+          epsilon=norm_epsilon,
+          trainable=bn_trainable)(
              x)
       x = tf_utils.get_activation(activation, use_keras_layer=True)(x)
     else:
@@ -324,7 +343,8 @@ class ResNet(tf.keras.Model):
         activation=self._activation,
         use_sync_bn=self._use_sync_bn,
         norm_momentum=self._norm_momentum,
-        norm_epsilon=self._norm_epsilon)(
+        norm_epsilon=self._norm_epsilon,
+        bn_trainable=self._bn_trainable)(
            inputs)
     for _ in range(1, block_repeats):
@@ -341,7 +361,8 @@ class ResNet(tf.keras.Model):
         activation=self._activation,
         use_sync_bn=self._use_sync_bn,
         norm_momentum=self._norm_momentum,
-        norm_epsilon=self._norm_epsilon)(
+        norm_epsilon=self._norm_epsilon,
+        bn_trainable=self._bn_trainable)(
            x)
     return tf.keras.layers.Activation('linear', name=name)(x)
@@ -362,6 +383,7 @@ class ResNet(tf.keras.Model):
         'kernel_initializer': self._kernel_initializer,
         'kernel_regularizer': self._kernel_regularizer,
         'bias_regularizer': self._bias_regularizer,
+        'bn_trainable': self._bn_trainable
     }
     return config_dict
@@ -400,4 +422,5 @@ def build_resnet(
       use_sync_bn=norm_activation_config.use_sync_bn,
       norm_momentum=norm_activation_config.norm_momentum,
       norm_epsilon=norm_activation_config.norm_epsilon,
-      kernel_regularizer=l2_regularizer)
+      kernel_regularizer=l2_regularizer,
+      bn_trainable=backbone_cfg.bn_trainable)
@@ -135,6 +135,7 @@ class ResNetTest(parameterized.TestCase, tf.test.TestCase):
         kernel_initializer='VarianceScaling',
         kernel_regularizer=None,
         bias_regularizer=None,
+        bn_trainable=True
     )
     network = resnet.ResNet(**kwargs)
...
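A minimal sketch of the new backbone knob (assuming the constructor shown in
the diff above; useful, e.g., to freeze batch-norm layers when fine-tuning):
```python
backbone = resnet.ResNet(model_id=50, bn_trainable=False)
```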
@@ -72,6 +72,7 @@ class ResidualBlock(tf.keras.layers.Layer):
                use_sync_bn=False,
                norm_momentum=0.99,
                norm_epsilon=0.001,
+               bn_trainable=True,
                **kwargs):
     """Initializes a residual block with BN after convolutions.
@@ -99,6 +100,8 @@ class ResidualBlock(tf.keras.layers.Layer):
       use_sync_bn: A `bool`. If True, use synchronized batch normalization.
       norm_momentum: A `float` of normalization momentum for the moving average.
       norm_epsilon: A `float` added to variance to avoid dividing by zero.
+      bn_trainable: A `bool` that indicates whether batch norm layers should be
+        trainable. Default to True.
       **kwargs: Additional keyword arguments to be passed.
     """
     super(ResidualBlock, self).__init__(**kwargs)
@@ -126,6 +129,7 @@ class ResidualBlock(tf.keras.layers.Layer):
     else:
       self._bn_axis = 1
     self._activation_fn = tf_utils.get_activation(activation)
+    self._bn_trainable = bn_trainable
   def build(self, input_shape):
     if self._use_projection:
@@ -140,7 +144,8 @@ class ResidualBlock(tf.keras.layers.Layer):
       self._norm0 = self._norm(
           axis=self._bn_axis,
           momentum=self._norm_momentum,
-          epsilon=self._norm_epsilon)
+          epsilon=self._norm_epsilon,
+          trainable=self._bn_trainable)
     self._conv1 = tf.keras.layers.Conv2D(
         filters=self._filters,
@@ -154,7 +159,8 @@ class ResidualBlock(tf.keras.layers.Layer):
     self._norm1 = self._norm(
         axis=self._bn_axis,
         momentum=self._norm_momentum,
-        epsilon=self._norm_epsilon)
+        epsilon=self._norm_epsilon,
+        trainable=self._bn_trainable)
     self._conv2 = tf.keras.layers.Conv2D(
         filters=self._filters,
@@ -168,7 +174,8 @@ class ResidualBlock(tf.keras.layers.Layer):
     self._norm2 = self._norm(
         axis=self._bn_axis,
         momentum=self._norm_momentum,
-        epsilon=self._norm_epsilon)
+        epsilon=self._norm_epsilon,
+        trainable=self._bn_trainable)
     if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1:
       self._squeeze_excitation = nn_layers.SqueezeExcitation(
@@ -203,7 +210,8 @@ class ResidualBlock(tf.keras.layers.Layer):
         'activation': self._activation,
         'use_sync_bn': self._use_sync_bn,
         'norm_momentum': self._norm_momentum,
-        'norm_epsilon': self._norm_epsilon
+        'norm_epsilon': self._norm_epsilon,
+        'bn_trainable': self._bn_trainable
     }
     base_config = super(ResidualBlock, self).get_config()
     return dict(list(base_config.items()) + list(config.items()))
@@ -249,6 +257,7 @@ class BottleneckBlock(tf.keras.layers.Layer):
                use_sync_bn=False,
                norm_momentum=0.99,
                norm_epsilon=0.001,
+               bn_trainable=True,
                **kwargs):
     """Initializes a standard bottleneck block with BN after convolutions.
@@ -277,6 +286,8 @@ class BottleneckBlock(tf.keras.layers.Layer):
       use_sync_bn: A `bool`. If True, use synchronized batch normalization.
       norm_momentum: A `float` of normalization momentum for the moving average.
       norm_epsilon: A `float` added to variance to avoid dividing by zero.
+      bn_trainable: A `bool` that indicates whether batch norm layers should be
+        trainable. Default to True.
       **kwargs: Additional keyword arguments to be passed.
     """
     super(BottleneckBlock, self).__init__(**kwargs)
@@ -303,6 +314,7 @@ class BottleneckBlock(tf.keras.layers.Layer):
       self._bn_axis = -1
     else:
       self._bn_axis = 1
+    self._bn_trainable = bn_trainable
   def build(self, input_shape):
     if self._use_projection:
@@ -330,7 +342,8 @@ class BottleneckBlock(tf.keras.layers.Layer):
       self._norm0 = self._norm(
           axis=self._bn_axis,
           momentum=self._norm_momentum,
-          epsilon=self._norm_epsilon)
+          epsilon=self._norm_epsilon,
+          trainable=self._bn_trainable)
     self._conv1 = tf.keras.layers.Conv2D(
         filters=self._filters,
@@ -343,7 +356,8 @@ class BottleneckBlock(tf.keras.layers.Layer):
     self._norm1 = self._norm(
         axis=self._bn_axis,
         momentum=self._norm_momentum,
-        epsilon=self._norm_epsilon)
+        epsilon=self._norm_epsilon,
+        trainable=self._bn_trainable)
     self._activation1 = tf_utils.get_activation(
         self._activation, use_keras_layer=True)
@@ -360,7 +374,8 @@ class BottleneckBlock(tf.keras.layers.Layer):
     self._norm2 = self._norm(
         axis=self._bn_axis,
         momentum=self._norm_momentum,
-        epsilon=self._norm_epsilon)
+        epsilon=self._norm_epsilon,
+        trainable=self._bn_trainable)
     self._activation2 = tf_utils.get_activation(
         self._activation, use_keras_layer=True)
@@ -375,7 +390,8 @@ class BottleneckBlock(tf.keras.layers.Layer):
     self._norm3 = self._norm(
         axis=self._bn_axis,
         momentum=self._norm_momentum,
-        epsilon=self._norm_epsilon)
+        epsilon=self._norm_epsilon,
+        trainable=self._bn_trainable)
     self._activation3 = tf_utils.get_activation(
         self._activation, use_keras_layer=True)
@@ -414,7 +430,8 @@ class BottleneckBlock(tf.keras.layers.Layer):
         'activation': self._activation,
         'use_sync_bn': self._use_sync_bn,
         'norm_momentum': self._norm_momentum,
-        'norm_epsilon': self._norm_epsilon
+        'norm_epsilon': self._norm_epsilon,
+        'bn_trainable': self._bn_trainable
     }
     base_config = super(BottleneckBlock, self).get_config()
     return dict(list(base_config.items()) + list(config.items()))
...
@@ -425,7 +425,7 @@ class PositionalEncoding(tf.keras.layers.Layer):
     self._rezero = Scale(initializer=initializer, name='rezero')
     state_prefix = state_prefix if state_prefix is not None else ''
     self._state_prefix = state_prefix
-    self._frame_count_name = f'{state_prefix}/pos_enc_frame_count'
+    self._frame_count_name = f'{state_prefix}_pos_enc_frame_count'
   def get_config(self):
     """Returns a dictionary containing the config used for initialization."""
@@ -523,7 +523,7 @@ class PositionalEncoding(tf.keras.layers.Layer):
       inputs: An input `tf.Tensor`.
       states: A `dict` of states such that, if any of the keys match for this
         layer, will overwrite the contents of the buffer(s). Expected keys
-        include `state_prefix + '/pos_enc_frame_count'`.
+        include `state_prefix + '_pos_enc_frame_count'`.
       output_states: A `bool`. If True, returns the output tensor and output
         states. Returns just the output tensor otherwise.
@@ -587,8 +587,8 @@ class GlobalAveragePool3D(tf.keras.layers.Layer):
     state_prefix = state_prefix if state_prefix is not None else ''
     self._state_prefix = state_prefix
-    self._state_name = f'{state_prefix}/pool_buffer'
-    self._frame_count_name = f'{state_prefix}/pool_frame_count'
+    self._state_name = f'{state_prefix}_pool_buffer'
+    self._frame_count_name = f'{state_prefix}_pool_frame_count'
   def get_config(self):
     """Returns a dictionary containing the config used for initialization."""
@@ -611,8 +611,8 @@ class GlobalAveragePool3D(tf.keras.layers.Layer):
       inputs: An input `tf.Tensor`.
       states: A `dict` of states such that, if any of the keys match for this
         layer, will overwrite the contents of the buffer(s).
-        Expected keys include `state_prefix + '/pool_buffer'` and
-        `state_prefix + '/pool_frame_count'`.
+        Expected keys include `state_prefix + '_pool_buffer'` and
+        `state_prefix + '_pool_frame_count'`.
       output_states: A `bool`. If True, returns the output tensor and output
         states. Returns just the output tensor otherwise.
...
@@ -384,7 +384,7 @@ class MaskRCNNModelTest(parameterized.TestCase, tf.test.TestCase):
     ckpt.save(os.path.join(save_dir, 'ckpt'))
     partial_ckpt = tf.train.Checkpoint(backbone=backbone)
-    partial_ckpt.restore(tf.train.latest_checkpoint(
+    partial_ckpt.read(tf.train.latest_checkpoint(
         save_dir)).expect_partial().assert_existing_objects_matched()
     if include_mask:
...
...@@ -646,3 +646,183 @@ def _saturation(image: tf.Tensor, ...@@ -646,3 +646,183 @@ def _saturation(image: tf.Tensor,
return augment.blend(tf.repeat(tf.image.rgb_to_grayscale(image), 3, axis=-1), return augment.blend(tf.repeat(tf.image.rgb_to_grayscale(image), 3, axis=-1),
image, image,
saturation) saturation)
def random_crop_image_with_boxes_and_labels(img, boxes, labels, min_scale,
aspect_ratio_range,
min_overlap_params, max_retry):
"""Crops a random slice from the input image.
The function will correspondingly recompute the bounding boxes and filter out
outside boxes and their labels.
References:
[1] End-to-End Object Detection with Transformers
https://arxiv.org/abs/2005.12872
The preprocessing steps:
1. Sample a minimum IoU overlap.
2. For each trial, sample the new image width, height, and top-left corner.
3. Compute the IoUs of bounding boxes with the cropped image and retry if
the maximum IoU is below the sampled threshold.
4. Find boxes whose centers are in the cropped image.
5. Compute new bounding boxes in the cropped region and only select those
boxes' labels.
Args:
img: a 'Tensor' of shape [height, width, 3] representing the input image.
boxes: a 'Tensor' of shape [N, 4] representing the ground-truth bounding
boxes with (ymin, xmin, ymax, xmax).
labels: a 'Tensor' of shape [N,] representing the class labels of the boxes.
min_scale: a 'float' in [0.0, 1.0) indicating the lower bound of the random
scale variable.
aspect_ratio_range: a list of two 'float' that specifies the lower and upper
bound of the random aspect ratio.
min_overlap_params: a list of four 'float' representing the min value, max
value, step size, and offset for the minimum overlap sample.
max_retry: an 'int' representing the number of trials for cropping. If it is
exhausted, no cropping will be performed.
Returns:
img: a Tensor representing the random cropped image. Can be the
original image if max_retry is exhausted.
boxes: a Tensor representing the bounding boxes in the cropped image.
labels: a Tensor representing the new bounding boxes' labels.
"""
shape = tf.shape(img)
original_h = shape[0]
original_w = shape[1]
minval, maxval, step, offset = min_overlap_params
min_overlap = tf.math.floordiv(
tf.random.uniform([], minval=minval, maxval=maxval), step) * step - offset
min_overlap = tf.clip_by_value(min_overlap, 0.0, 1.1)
if min_overlap > 1.0:
return img, boxes, labels
aspect_ratio_low = aspect_ratio_range[0]
aspect_ratio_high = aspect_ratio_range[1]
for _ in tf.range(max_retry):
scale_h = tf.random.uniform([], min_scale, 1.0)
scale_w = tf.random.uniform([], min_scale, 1.0)
new_h = tf.cast(
scale_h * tf.cast(original_h, dtype=tf.float32), dtype=tf.int32)
new_w = tf.cast(
scale_w * tf.cast(original_w, dtype=tf.float32), dtype=tf.int32)
# Aspect ratio has to be in the prespecified range
aspect_ratio = new_h / new_w
if aspect_ratio_low > aspect_ratio or aspect_ratio > aspect_ratio_high:
continue
left = tf.random.uniform([], 0, original_w - new_w, dtype=tf.int32)
right = left + new_w
top = tf.random.uniform([], 0, original_h - new_h, dtype=tf.int32)
bottom = top + new_h
normalized_left = tf.cast(
left, dtype=tf.float32) / tf.cast(
original_w, dtype=tf.float32)
normalized_right = tf.cast(
right, dtype=tf.float32) / tf.cast(
original_w, dtype=tf.float32)
normalized_top = tf.cast(
top, dtype=tf.float32) / tf.cast(
original_h, dtype=tf.float32)
normalized_bottom = tf.cast(
bottom, dtype=tf.float32) / tf.cast(
original_h, dtype=tf.float32)
cropped_box = tf.expand_dims(
tf.stack([
normalized_top,
normalized_left,
normalized_bottom,
normalized_right,
]),
axis=0)
iou = box_ops.bbox_overlap(
tf.expand_dims(cropped_box, axis=0),
tf.expand_dims(boxes, axis=0)) # (1, 1, n_ground_truth)
iou = tf.squeeze(iou, axis=[0, 1])
# If not a single bounding box has a Jaccard overlap of greater than
# the minimum, try again
if tf.reduce_max(iou) < min_overlap:
continue
centroids = box_ops.yxyx_to_cycxhw(boxes)
mask = tf.math.logical_and(
tf.math.logical_and(centroids[:, 0] > normalized_top,
centroids[:, 0] < normalized_bottom),
tf.math.logical_and(centroids[:, 1] > normalized_left,
centroids[:, 1] < normalized_right))
# If not a single bounding box has its center in the crop, try again.
if tf.reduce_sum(tf.cast(mask, dtype=tf.int32)) > 0:
indices = tf.squeeze(tf.where(mask), axis=1)
filtered_boxes = tf.gather(boxes, indices)
boxes = tf.clip_by_value(
(filtered_boxes[..., :] * tf.cast(
tf.stack([original_h, original_w, original_h, original_w]),
dtype=tf.float32) -
tf.cast(tf.stack([top, left, top, left]), dtype=tf.float32)) /
tf.cast(tf.stack([new_h, new_w, new_h, new_w]), dtype=tf.float32),
0.0, 1.0)
img = tf.image.crop_to_bounding_box(img, top, left, bottom - top,
right - left)
labels = tf.gather(labels, indices)
break
return img, boxes, labels
def random_crop(image,
boxes,
labels,
min_scale=0.3,
aspect_ratio_range=(0.5, 2.0),
min_overlap_params=(0.0, 1.4, 0.2, 0.1),
max_retry=50,
seed=None):
"""Randomly crop the image and boxes, filtering labels.
Args:
image: a 'Tensor' of shape [height, width, 3] representing the input image.
boxes: a 'Tensor' of shape [N, 4] representing the ground-truth bounding
boxes with (ymin, xmin, ymax, xmax).
labels: a 'Tensor' of shape [N,] representing the class labels of the boxes.
min_scale: a 'float' in [0.0, 1.0) indicating the lower bound of the random
scale variable.
aspect_ratio_range: a list of two 'float' that specifies the lower and upper
bound of the random aspect ratio.
min_overlap_params: a list of four 'float' representing the min value, max
value, step size, and offset for the minimum overlap sample.
max_retry: an 'int' representing the number of trials for cropping. If it is
exhausted, no cropping will be performed.
seed: the random number seed of int, but could be None.
Returns:
image: a Tensor representing the random cropped image. Can be the
original image if max_retry is exhausted.
boxes: a Tensor representing the bounding boxes in the cropped image.
labels: a Tensor representing the new bounding boxes' labels.
"""
with tf.name_scope('random_crop'):
do_crop = tf.greater(tf.random.uniform([], seed=seed), 0.5)
if do_crop:
return random_crop_image_with_boxes_and_labels(image, boxes, labels,
min_scale,
aspect_ratio_range,
min_overlap_params,
max_retry)
else:
return image, boxes, labels
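
# A minimal usage sketch (dummy data, assuming eager execution; not part of
# the library):
#
#   image = tf.convert_to_tensor(np.random.rand(480, 640, 3), tf.float32)
#   boxes = tf.constant([[0.1, 0.1, 0.5, 0.4],   # normalized (ymin, xmin,
#                        [0.3, 0.2, 0.9, 0.8]])  # ymax, xmax)
#   labels = tf.constant([3, 7], tf.int64)
#   image, boxes, labels = random_crop(image, boxes, labels, seed=1)
#
# With probability 0.5 the inputs are returned unchanged; otherwise only the
# boxes whose centers fall inside the sampled crop are kept, re-normalized to
# the crop, and returned together with their labels.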
...@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for preprocess_ops.py."""
import io
...@@ -42,7 +41,7 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
      ([12, 2], 10),
      ([13, 2, 3], 10),
  )
  def test_pad_to_fixed_size(self, input_shape, output_size):
    # Copies input shape to padding shape.
    clip_shape = input_shape[:]
    clip_shape[0] = min(output_size, clip_shape[0])
...@@ -63,16 +62,11 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
      (100, 256, 128, 256, 32, 1.0, 1.0, 128, 256),
      (200, 512, 200, 128, 32, 0.25, 0.25, 224, 128),
  )
  def test_resize_and_crop_image_rectangluar_case(self, input_height,
                                                  input_width, desired_height,
                                                  desired_width, stride,
                                                  scale_y, scale_x,
                                                  output_height, output_width):
    image = tf.convert_to_tensor(
        np.random.rand(input_height, input_width, 3))
...@@ -98,16 +92,10 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
      (100, 200, 220, 220, 32, 1.1, 1.1, 224, 224),
      (512, 512, 1024, 1024, 32, 2.0, 2.0, 1024, 1024),
  )
  def test_resize_and_crop_image_square_case(self, input_height, input_width,
                                             desired_height, desired_width,
                                             stride, scale_y, scale_x,
                                             output_height, output_width):
    image = tf.convert_to_tensor(
        np.random.rand(input_height, input_width, 3))
...@@ -135,18 +123,10 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
      (100, 200, 80, 100, 32, 0.5, 0.5, 50, 100, 96, 128),
      (200, 100, 80, 100, 32, 0.5, 0.5, 100, 50, 128, 96),
  )
  def test_resize_and_crop_image_v2(self, input_height, input_width, short_side,
                                    long_side, stride, scale_y, scale_x,
                                    desired_height, desired_width,
                                    output_height, output_width):
    image = tf.convert_to_tensor(
        np.random.rand(input_height, input_width, 3))
    image_shape = tf.shape(image)[0:2]
...@@ -176,9 +156,7 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
  @parameterized.parameters(
      (400, 600), (600, 400),
  )
  def test_center_crop_image(self, input_height, input_width):
    image = tf.convert_to_tensor(
        np.random.rand(input_height, input_width, 3))
    cropped_image = preprocess_ops.center_crop_image(image)
...@@ -188,9 +166,7 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
  @parameterized.parameters(
      (400, 600), (600, 400),
  )
  def test_center_crop_image_v2(self, input_height, input_width):
    image_bytes = tf.constant(
        _encode_image(
            np.uint8(np.random.rand(input_height, input_width, 3) * 255),
...@@ -204,9 +180,7 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
  @parameterized.parameters(
      (400, 600), (600, 400),
  )
  def test_random_crop_image(self, input_height, input_width):
    image = tf.convert_to_tensor(
        np.random.rand(input_height, input_width, 3))
    _ = preprocess_ops.random_crop_image(image)
...@@ -214,9 +188,7 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
  @parameterized.parameters(
      (400, 600), (600, 400),
  )
  def test_random_crop_image_v2(self, input_height, input_width):
    image_bytes = tf.constant(
        _encode_image(
            np.uint8(np.random.rand(input_height, input_width, 3) * 255),
...@@ -244,6 +216,21 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
    jittered_image = preprocess_ops._saturation(image, saturation)
    assert jittered_image.shape == image.shape
  @parameterized.parameters((640, 640, 20), (1280, 1280, 30))
  def test_random_crop(self, input_height, input_width, num_boxes):
    image = tf.convert_to_tensor(np.random.rand(input_height, input_width, 3))
    # Draw random pixel extents, then normalize: random_crop expects boxes in
    # normalized (ymin, xmin, ymax, xmax) coordinates.
    boxes_height = np.random.randint(0, input_height, size=(num_boxes, 1))
    top = np.random.randint(0, high=(input_height - boxes_height))
    down = top + boxes_height
    boxes_width = np.random.randint(0, input_width, size=(num_boxes, 1))
    left = np.random.randint(0, high=(input_width - boxes_width))
    right = left + boxes_width
    boxes = tf.constant(
        np.concatenate([top / input_height, left / input_width,
                        down / input_height, right / input_width], axis=-1),
        tf.float32)
    labels = tf.constant(
        np.random.randint(low=0, high=num_boxes, size=(num_boxes,)), tf.int64)
    _ = preprocess_ops.random_crop(image, boxes, labels)
if __name__ == '__main__':
  tf.test.main()
...@@ -82,6 +82,8 @@ $ python3 -m official.vision.beta.projects.deepmac_maskrcnn.train \
```
`CONFIG_FILE` can be any file in the `configs/experiments` directory.
When using SpineNet models, please specify
`--experiment=deep_mask_head_rcnn_spinenet_coco`.
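For instance, a hypothetical SpineNet launch might look as follows; the
`--config_file`, `--mode`, and `--model_dir` flags are illustrative
assumptions based on the standard Model Garden `train.py` interface, and the
config file is the SpineNet entry from the table below:

```bash
$ python3 -m official.vision.beta.projects.deepmac_maskrcnn.train \
  --experiment=deep_mask_head_rcnn_spinenet_coco \
  --config_file=configs/experiments/deep_mask_head_rcnn_voc_spinenet143_hg52.yaml \
  --mode=train \
  --model_dir=/tmp/deepmac_spinenet
```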
**Note:** The default eval batch size of 32 discards some samples during
validation. For accurate validation statistics, launch a dedicated eval job on
...@@ -93,11 +95,12 @@ In the following table, we report the Mask mAP of our models on the non-VOC
classes when only training with masks for the VOC classes. Performance is
measured on the `coco-val2017` set.
Backbone     | Mask head    | Config name                                     | Mask mAP
:----------- | :----------- | :---------------------------------------------- | -------:
ResNet-50    | Default      | `deep_mask_head_rcnn_voc_r50.yaml`              | 25.9
ResNet-50    | Hourglass-52 | `deep_mask_head_rcnn_voc_r50_hg52.yaml`         | 33.1
ResNet-101   | Hourglass-52 | `deep_mask_head_rcnn_voc_r101_hg52.yaml`        | 34.4
SpineNet-143 | Hourglass-52 | `deep_mask_head_rcnn_voc_spinenet143_hg52.yaml` | 38.7
## See also