Commit 9cbdcd35 authored by anivegesana's avatar anivegesana

Merge branch 'yolo' of https://github.com/PurdueCAM2Project/tf-models into yolo

parents 35e9d291 71ef4530
......@@ -15,7 +15,7 @@ This repository is the unofficial implementation of the following papers. Howeve
Yolo v1, the original implementation, was released in 2015, providing a groundbreaking algorithm that could quickly process images and locate objects in a single pass through the detector. The original implementation used a backbone derived from the state-of-the-art image classifiers of the time, like [GoogLeNet](https://arxiv.org/abs/1409.4842) and [VGG](https://arxiv.org/abs/1409.1556). More attention was given to the novel Yolo detection head that allowed for object detection in a single pass over an image. Though limited, the network could predict up to 90 bounding boxes per image and was tested on about 80 classes per box. Also, the model could only make predictions at one scale. These attributes made Yolo v1 more limited and less versatile, so as the years passed, the developers continued to update and improve the model.
Yolo v3 and v4 serve as the most up to date and capable versions of the Yolo network group. These model uses a custom backbone called Darknet53 that uses knowledge gained from the ResNet paper to improve its predictions. The new backbone also allows for objects to be detected at multiple scales. As for the new detection head, the model now predicts the bounding boxes using a set of anchor box priors (Anchor Boxes) as suggestions. The multiscale predictions in combination with the Anchor boxes allows for the network to make up to 1000 object predictions on a single image. Finally, the new loss function forces the network to make better prediction by using Intersection Over Union (IOU) to inform the models confidence rather than relying on the mean squared error for the entire output.
Yolo v3 and v4 serve as the most up-to-date and capable versions of the Yolo network group. These models use a custom backbone called Darknet53 that applies lessons from the ResNet paper to improve their predictions. The new backbone also allows objects to be detected at multiple scales. As for the new detection head, the model now predicts bounding boxes using a set of anchor box priors (Anchor Boxes) as suggestions. The multiscale predictions, combined with the anchor boxes, allow the network to make up to 1000 object predictions on a single image. Finally, the new loss function forces the network to make better predictions by using Intersection Over Union (IOU) to inform the model's confidence rather than relying on the mean squared error of the entire output.
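For a concrete sense of the IOU term, here is a minimal sketch (not code from this repository) of the box overlap computation that informs the confidence score:

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two unit boxes overlapping by half: intersection 0.5, union 1.5, IOU = 1/3.
print(iou((0, 0, 1, 1), (0.5, 0, 1.5, 1)))  # ~0.333
```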
## Authors
......@@ -33,7 +33,8 @@ Yolo v3 and v4 serve as the most up to date and capable versions of the Yolo net
## Our Goal
Our goal with this model conversion is to provide highly versatile implementations of the Backbone and Yolo Head. We have tried to build the model in such a way that the Yolo head could easily be connected to a new, more powerful backbone if a person chose to.
Our goal with this model conversion is to provide implementations of the Backbone and Yolo Head. We have built the model in such a way that the Yolo head can easily be connected to a new, more powerful backbone if one chooses to do so.
## Models in the library
......
......@@ -21,3 +21,12 @@ from official.vision import beta
from official.vision.beta.projects import yolo
from official.vision.beta.projects.yolo.modeling.backbones import Darknet
from official.vision.beta.projects.yolo.configs import darknet_classification
from official.vision.beta.projects.yolo.configs.darknet_classification import image_classification
from official.vision.beta.projects.yolo.configs.darknet_classification import ImageClassificationTask
from official.vision.beta.projects.yolo.tasks.image_classification import ImageClassificationTask
# task_factory.register_task_cls(ImageClassificationTask)(ImageClassificationTask)
# print(task_factory._REGISTERED_TASK_CLS)
\ No newline at end of file
"""Backbones configurations."""
# Import libraries
import dataclasses
from typing import Optional
from official.modeling import hyperparams
# from official.vision.beta.configs import backbones
from official.modeling import hyperparams
from official.vision.beta.configs import backbones
@dataclasses.dataclass
class DarkNet(hyperparams.Config):
"""DarkNet config."""
model_id: str = "darknet53"
# # we could not get this to work
# @dataclasses.dataclass
# class Backbone(backbones.Backbone):
# darknet: DarkNet = DarkNet()
@dataclasses.dataclass
class Backbone(backbones.Backbone):
darknet: DarkNet = DarkNet()
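A usage sketch (assuming the parent `backbones.Backbone` keeps the library's one-of semantics, where `type` selects the active sub-config and `get()` returns it); these lines are not part of the commit:

example = Backbone(type='darknet', darknet=DarkNet(model_id='darknet53'))
print(example.get().model_id)  # 'darknet53'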
import os
from typing import List
import dataclasses
from official.core import config_definitions as cfg
from official.core import exp_factory
from official.modeling import hyperparams
from official.modeling import optimization
from official.vision.beta.projects.yolo.configs import backbones
from official.vision.beta.configs import common
from official.vision.beta.configs import image_classification as imc
@dataclasses.dataclass
class ImageClassificationModel(hyperparams.Config):
num_classes: int = 0
input_size: List[int] = dataclasses.field(default_factory=list)
backbone: backbones.Backbone = backbones.Backbone(
type='darknet', darknet=backbones.DarkNet())
dropout_rate: float = 0.0
norm_activation: common.NormActivation = common.NormActivation()
# Adds a BatchNormalization layer pre-GlobalAveragePooling in classification
add_head_batch_norm: bool = False
@dataclasses.dataclass
class Losses(hyperparams.Config):
one_hot: bool = True
label_smoothing: float = 0.0
l2_weight_decay: float = 0.0
@dataclasses.dataclass
class ImageClassificationTask(cfg.TaskConfig):
"""The model config."""
model: ImageClassificationModel = ImageClassificationModel()
train_data: imc.DataConfig = imc.DataConfig(is_training=True)
validation_data: imc.DataConfig = imc.DataConfig(is_training=False)
losses: Losses = Losses()
gradient_clip_norm: float = 0.0
logging_dir: str = None
@exp_factory.register_config_factory('darknet_classification')
def image_classification() -> cfg.ExperimentConfig:
"""Image classification general."""
return cfg.ExperimentConfig(
task=ImageClassificationTask(),
trainer=cfg.TrainerConfig(),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
\ No newline at end of file
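With the module above imported (so the `register_config_factory` decorator has run), the experiment is retrievable by name. A quick sketch, assuming the standard Model Garden factory API:

from official.core import exp_factory

config = exp_factory.get_exp_config('darknet_classification')
print(type(config.task).__name__)  # ImageClassificationTask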
runtime:
distribution_strategy: 'mirrored'
mixed_precision_dtype: 'float32'
loss_scale: 'dynamic'
task:
model:
num_classes: 1001
......@@ -9,32 +8,28 @@ task:
backbone:
type: 'darknet'
darknet:
model_id: 'darknet53'
model_id: 'cspdarknettiny'
losses:
l2_weight_decay: 0.0005
one_hot: True
train_data:
tfds_name: 'imagenet_a'
tfds_split: 'test'
tfds_download: True
is_training: True
global_batch_size: 2
input_path: 'imagenet-2012-tfrecord/train*'
is_training: true
global_batch_size: 128
dtype: 'float16'
validation_data:
tfds_name: 'imagenet_a'
tfds_split: 'test'
tfds_download: True
is_training: False
global_batch_size: 2
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: true
global_batch_size: 128
dtype: 'float16'
drop_remainder: False
drop_remainder: false
trainer:
train_steps: 51200000 # in the paper
validation_steps: 25600 # size of validation data
validation_interval: 150
steps_per_loop: 150
summary_interval: 150
checkpoint_interval: 150
train_steps: 800000 # in the paper
validation_steps: 400 # size of validation data
validation_interval: 10000
steps_per_loop: 10000
summary_interval: 10000
checkpoint_interval: 10000
optimizer_config:
optimizer:
type: 'sgd'
......@@ -46,8 +41,8 @@ trainer:
initial_learning_rate: 0.1
end_learning_rate: 0.0001
power: 4.0
decay_steps: 51136000
decay_steps: 799000
warmup:
type: 'linear'
linear:
warmup_steps: 64000 #lr rise from 0 to 0.1 over 1000 steps
warmup_steps: 1000 #learning rate rises from 0 to 0.1 over 1000 steps
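For intuition, the schedule above (linear warmup to 0.1 over 1,000 steps, then polynomial decay with power 4 down to 0.0001) can be approximated with stock Keras pieces; this is an illustrative sketch, not the trainer's actual wiring:

import tensorflow as tf

decay = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=0.1,
    decay_steps=799000,
    end_learning_rate=0.0001,
    power=4.0)

def lr_at(step, warmup_steps=1000):
  # Ramp linearly to the initial rate, then follow the polynomial decay.
  if step < warmup_steps:
    return 0.1 * step / warmup_steps
  return float(decay(step - warmup_steps))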
"""Contains definitions of Darknet Backbone Networks.
The models are inspired by ResNet and CSPNet.
Residual networks (ResNets) were proposed in:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Deep Residual Learning for Image Recognition. arXiv:1512.03385
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, Ping-Yang Chen, Jun-Wei Hsieh
CSPNet: A New Backbone that can Enhance Learning Capability of CNN. arXiv:1911.11929
DarkNets are used mainly for object detection in:
[1] Joseph Redmon, Ali Farhadi
YOLOv3: An Incremental Improvement. arXiv:1804.02767
[2] Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao
YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv:2004.10934
"""
import tensorflow as tf
import tensorflow.keras as ks
import collections
......@@ -8,33 +28,39 @@ from official.vision.beta.projects.yolo.modeling import building_blocks as nn_bl
# builder required classes
class BlockConfig(object):
def __init__(self, layer, stack, reps, bottleneck, filters, kernel_size,
'''
Layer config to make the model-builder code more readable.

Args:
layer: string layer name
stack: the type of layer ordering to use for this specific level
reps: integer for the number of times to repeat the block
bottleneck: boolean for whether this stack has a bottleneck layer
filters: integer for the output depth of the level
pool_size: integer pool size for max pool layers
kernel_size: optional integer convolution kernel size
strides: integer or tuple indicating the convolution strides
padding: the padding to apply to layers in this stack
activation: string for the activation to use for this stack
route: integer for which level to route from to get the next input
output_name: the name to use for this output
is_output: whether this layer is an output in the default model
'''
def __init__(self, layer, stack, reps, bottleneck, filters, pool_size, kernel_size,
strides, padding, activation, route, output_name, is_output):
'''
get layer config to make code more readable
Args:
layer: string layer name
reps: integer for the number of times to repeat block
filters: integer for the filter for this layer, or the output depth
kernel_size: integer or none, if none, it implies that the the building block handles this automatically. not a layer input
downsample: boolean, to down sample the input width and height
output: boolean, true if the layer is required as an output
'''
self.layer = layer
self.stack = stack
self.repetitions = reps
self.bottleneck = bottleneck
self.filters = filters
self.kernel_size = kernel_size
self.pool_size = pool_size
self.strides = strides
self.padding = padding
self.activation = activation
self.route = route
self.output_name = output_name
self.is_output = is_output
return
def build_block_specs(config):
......@@ -43,48 +69,46 @@ def build_block_specs(config):
specs.append(BlockConfig(*layer))
return specs
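# Illustration (not part of this module): each row of the model tables below
# is unpacked positionally into a BlockConfig, so the row order must match the
# constructor signature (and LISTNAMES) exactly. For example:
#   row = ["DarkConv", None, 1, False, 32, None, 3, 1, "same", "mish", -1, 0, False]
#   spec = BlockConfig(*row)
#   spec.filters, spec.kernel_size  # -> 32, 3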
class layer_factory(object):
"""
class for quick look up of default layers used by darknet to
connect, introduce or exit a level. Used in place of an if condition
or switch to make adding new layers easier and to reduce redundant code
"""
def __init__(self):
self._layer_dict = {
"DarkTiny": (nn_blocks.DarkTiny, self.darktiny_config_todict),
"DarkConv": (nn_blocks.DarkConv, self.darkconv_config_todict),
"MaxPool": (tf.keras.layers.MaxPool2D, self.maxpool_config_todict)
}
def darkconv_config_todict(config, kwargs):
dictvals = {
"filters": config.filters,
"kernel_size": config.kernel_size,
"strides": config.strides,
"padding": config.padding
}
dictvals.update(kwargs)
return dictvals
def darktiny_config_todict(config, kwargs):
dictvals = {"filters": config.filters, "strides": config.strides}
dictvals.update(kwargs)
return dictvals
def darkconv_config_todict(self, config, kwargs):
dictvals = {
"filters": config.filters,
"kernel_size": config.kernel_size,
"strides": config.strides,
"padding": config.padding
}
dictvals.update(kwargs)
return dictvals
def maxpool_config_todict(config, kwargs):
return {
"pool_size": config.kernel_size,
"strides": config.strides,
"padding": config.padding,
"name": kwargs["name"]
}
def darktiny_config_todict(self, config, kwargs):
dictvals = {"filters": config.filters, "strides": config.strides}
dictvals.update(kwargs)
return dictvals
class layer_registry(object):
def __init__(self):
self._layer_dict = {
"DarkTiny": (nn_blocks.DarkTiny, darktiny_config_todict),
"DarkConv": (nn_blocks.DarkConv, darkconv_config_todict),
"MaxPool": (tf.keras.layers.MaxPool2D, maxpool_config_todict)
def maxpool_config_todict(self, config, kwargs):
return {
"pool_size": config.pool_size,
"strides": config.strides,
"padding": config.padding,
"name": kwargs["name"]
}
return
def _get_layer(self, key):
return self._layer_dict[key]
def __call__(self, config, kwargs):
layer, get_param_dict = self._get_layer(config.layer)
layer, get_param_dict = self._layer_dict[config.layer]
param_dict = get_param_dict(config, kwargs)
return layer(**param_dict)
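# Usage sketch (not part of this module): the factory maps a BlockConfig row
# straight to a constructed Keras layer.
#   factory = layer_factory()
#   spec = BlockConfig("DarkConv", None, 1, False, 32, None, 3, 1, "same",
#                      "mish", -1, 0, False)
#   conv = factory(spec, {"name": "conv_0"})  # an nn_blocks.DarkConv instance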
......@@ -92,7 +116,7 @@ class layer_registry(object):
# model configs
LISTNAMES = [
"default_layer_name", "level_type", "number_of_layers_in_level",
"bottleneck", "filters", "kernal_size", "strides", "padding",
"bottleneck", "filters", "kernal_size", "pool_size", "strides", "padding",
"default_activation", "route", "level/name", "is_output"
]
......@@ -101,12 +125,12 @@ CSPDARKNET53 = {
"splits": {"backbone_split": 106,
"neck_split": 138},
"backbone": [
["DarkConv", None, 1, False, 32, 3, 1, "same", "mish", -1, 0, False], # 1
["DarkRes", "csp", 1, True, 64, None, None, None, "mish", -1, 1, False], # 3
["DarkRes", "csp", 2, False, 128, None, None, None, "mish", -1, 2, False], # 2
["DarkRes", "csp", 8, False, 256, None, None, None, "mish", -1, 3, True],
["DarkRes", "csp", 8, False, 512, None, None, None, "mish", -1, 4, True], # 3
["DarkRes", "csp", 4, False, 1024, None, None, None, "mish", -1, 5, True], # 6 #route
["DarkConv", None, 1, False, 32, None, 3, 1, "same", "mish", -1, 0, False],
["DarkRes", "csp", 1, True, 64, None, None, None, None, "mish", -1, 1, False],
["DarkRes", "csp", 2, False, 128, None, None, None, None, "mish", -1, 2, False],
["DarkRes", "csp", 8, False, 256, None, None, None, None, "mish", -1, 3, True],
["DarkRes", "csp", 8, False, 512, None, None, None, None, "mish", -1, 4, True],
["DarkRes", "csp", 4, False, 1024, None, None, None, None, "mish", -1, 5, True],
]
}
......@@ -114,12 +138,12 @@ DARKNET53 = {
"list_names": LISTNAMES,
"splits": {"backbone_split": 76},
"backbone": [
["DarkConv", None, 1, False, 32, 3, 1, "same", "leaky", -1, 0, False], # 1
["DarkRes", "residual", 1, True, 64, None, None, None, "leaky", -1, 1, False], # 3
["DarkRes", "residual", 2, False, 128, None, None, None, "leaky", -1, 2, False], # 2
["DarkRes", "residual", 8, False, 256, None, None, None, "leaky", -1, 3, True],
["DarkRes", "residual", 8, False, 512, None, None, None, "leaky", -1, 4, True], # 3
["DarkRes", "residual", 4, False, 1024, None, None, None, "leaky", -1, 5, True], # 6
["DarkConv", None, 1, False, 32, None, 3, 1, "same", "leaky", -1, 0, False],
["DarkRes", "residual", 1, True, 64, None, None, None, None, "leaky", -1, 1, False],
["DarkRes", "residual", 2, False, 128, None, None, None, None, "leaky", -1, 2, False],
["DarkRes", "residual", 8, False, 256, None, None, None, None, "leaky", -1, 3, True],
["DarkRes", "residual", 8, False, 512, None, None, None, None, "leaky", -1, 4, True],
["DarkRes", "residual", 4, False, 1024, None, None, None, None, "leaky", -1, 5, True],
]
}
......@@ -127,12 +151,12 @@ CSPDARKNETTINY = {
"list_names": LISTNAMES,
"splits": {"backbone_split": 28},
"backbone": [
["DarkConv", None, 1, False, 32, 3, 2, "same", "leaky", -1, 0, False], # 1
["DarkConv", None, 1, False, 64, 3, 2, "same", "leaky", -1, 1, False], # 1
["CSPTiny", "csp_tiny", 1, False, 64, 3, 2, "same", "leaky", -1, 2, False], # 3
["CSPTiny", "csp_tiny", 1, False, 128, 3, 2, "same", "leaky", -1, 3, False], # 3
["CSPTiny", "csp_tiny", 1, False, 256, 3, 2, "same", "leaky", -1, 4, True], # 3
["DarkConv", None, 1, False, 512, 3, 1, "same", "leaky", -1, 5, True], # 1
["DarkConv", None, 1, False, 32, None, 3, 2, "same", "leaky", -1, 0, False],
["DarkConv", None, 1, False, 64, None, 3, 2, "same", "leaky", -1, 1, False],
["CSPTiny", "csp_tiny", 1, False, 64, None, 3, 2, "same", "leaky", -1, 2, False],
["CSPTiny", "csp_tiny", 1, False, 128, None, 3, 2, "same", "leaky", -1, 3, False],
["CSPTiny", "csp_tiny", 1, False, 256, None, 3, 2, "same", "leaky", -1, 4, True],
["DarkConv", None, 1, False, 512, None, 3, 1, "same", "leaky", -1, 5, True],
]
}
......@@ -140,13 +164,13 @@ DARKNETTINY = {
"list_names": LISTNAMES,
"splits": {"backbone_split": 14},
"backbone": [
["DarkConv", None, 1, False, 16, 3, 1, "same", "leaky", -1, 0, False], # 1
["DarkTiny", None, 1, True, 32, 3, 2, "same", "leaky", -1, 1, False], # 3
["DarkTiny", None, 1, True, 64, 3, 2, "same", "leaky", -1, 2, False], # 3
["DarkTiny", None, 1, False, 128, 3, 2, "same", "leaky", -1, 3, False], # 2
["DarkTiny", None, 1, False, 256, 3, 2, "same", "leaky", -1, 4, True],
["DarkTiny", None, 1, False, 512, 3, 2, "same", "leaky", -1, 5, False], # 3
["DarkTiny", None, 1, False, 1024, 3, 1, "same", "leaky", -1, 5, True], # 6 #route
["DarkConv", None, 1, False, 16, None, 3, 1, "same", "leaky", -1, 0, False],
["DarkTiny", None, 1, True, 32, None, 3, 2, "same", "leaky", -1, 1, False],
["DarkTiny", None, 1, True, 64, None, 3, 2, "same", "leaky", -1, 2, False],
["DarkTiny", None, 1, False, 128, None, 3, 2, "same", "leaky", -1, 3, False],
["DarkTiny", None, 1, False, 256, None, 3, 2, "same", "leaky", -1, 4, True],
["DarkTiny", None, 1, False, 512, None, 3, 2, "same", "leaky", -1, 5, False],
["DarkTiny", None, 1, False, 1024, None, 3, 1, "same", "leaky", -1, 5, True],
]
}
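# Sanity-check sketch (not part of this module): multiplying the strides
# column (index 7) of a table shows the cumulative downsampling behind the
# output levels. For example:
#   import math
#   math.prod(row[7] for row in DARKNETTINY["backbone"])  # 32 == 2**5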
......@@ -164,9 +188,9 @@ class Darknet(ks.Model):
def __init__(
self,
model_id="darknet53",
input_shape=tf.keras.layers.InputSpec(shape=[None, None, None, 3]),
min_size=None,
max_size=5,
input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]),
min_level=None,
max_level=5,
activation=None,
use_sync_bn=False,
norm_momentum=0.99,
......@@ -174,19 +198,18 @@ class Darknet(ks.Model):
kernel_initializer='glorot_uniform',
kernel_regularizer=None,
bias_regularizer=None,
config=None,
**kwargs):
layer_specs, splits = Darknet.get_model_config(model_id)
self._model_name = model_id
self._splits = splits
self._input_shape = input_shape
self._registry = layer_registry()
self._input_shape = input_specs
self._registry = layer_factory()
# default layer look up
self._min_size = min_size
self._max_size = max_size
self._min_size = min_level
self._max_size = max_level
self._output_specs = None
self._kernel_initializer = kernel_initializer
......@@ -195,11 +218,11 @@ class Darknet(ks.Model):
self._norm_epislon = norm_epsilon
self._use_sync_bn = use_sync_bn
self._activation = activation
self._weight_decay = kernel_regularizer
self._kernel_regularizer = kernel_regularizer
self._default_dict = {
"kernel_initializer": self._kernel_initializer,
"weight_decay": self._weight_decay,
"kernel_regularizer": self._kernel_regularizer,
"bias_regularizer": self._bias_regularizer,
"norm_momentum": self._norm_momentum,
"norm_epsilon": self._norm_epislon,
......@@ -211,7 +234,6 @@ class Darknet(ks.Model):
inputs = ks.layers.Input(shape=self._input_shape.shape[1:])
output = self._build_struct(layer_specs, inputs)
super().__init__(inputs=inputs, outputs=output, name=self._model_name)
return
@property
def input_specs(self):
......@@ -250,10 +272,10 @@ class Darknet(ks.Model):
name=f"{config.layer}_{i}")
stack_outputs.append(x_pass)
if (config.is_output and
self._min_size == None): # or isinstance(config.output_name, str):
endpoints[config.output_name] = x
self._min_size == None):
endpoints[str(config.output_name)] = x
elif self._min_size != None and config.output_name >= self._min_size and config.output_name <= self._max_size:
endpoints[config.output_name] = x
endpoints[str(config.output_name)] = x
self._output_specs = {l: endpoints[l].get_shape() for l in endpoints.keys()}
return endpoints
......@@ -334,7 +356,30 @@ class Darknet(ks.Model):
backbone = BACKBONES[name]["backbone"]
splits = BACKBONES[name]["splits"]
return build_block_specs(backbone), splits
@property
def model_id(self):
return self._model_name
@classmethod
def from_config(cls, config, custom_objects=None):
return cls(**config)
def get_config(self):
layer_config = {
"model_id": self._model_name,
"min_level": self._min_size,
"max_level": self._max_size,
"kernel_initializer": self._kernel_initializer,
"kernel_regularizer": self._kernel_regularizer,
"bias_regularizer": self._bias_regularizer,
"norm_momentum": self._norm_momentum,
"norm_epsilon": self._norm_epislon,
"use_sync_bn": self._use_sync_bn,
"activation": self._activation
}
#layer_config.update(super().get_config())
return layer_config
@factory.register_backbone_builder('darknet')
def build_darknet(
......
......@@ -14,7 +14,7 @@ class CSPConnect(ks.layers.Layer):
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
bias_regularizer=None,
weight_decay=None, # default find where is it is stated
kernel_regularizer=None,
use_bn=True,
use_sync_bn=False,
norm_momentum=0.99,
......@@ -30,7 +30,7 @@ class CSPConnect(ks.layers.Layer):
# convolution params
self._kernel_initializer = kernel_initializer
self._bias_initializer = bias_initializer
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
self._bias_regularizer = bias_regularizer
self._use_bn = use_bn
self._use_sync_bn = use_sync_bn
......@@ -45,7 +45,7 @@ class CSPConnect(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -58,7 +58,7 @@ class CSPConnect(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......
......@@ -14,7 +14,7 @@ class CSPDownSample(ks.layers.Layer):
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
bias_regularizer=None,
weight_decay=None, # default find where is it is stated
kernel_regularizer=None,
use_bn=True,
use_sync_bn=False,
norm_momentum=0.99,
......@@ -30,7 +30,7 @@ class CSPDownSample(ks.layers.Layer):
# convolution params
self._kernel_initializer = kernel_initializer
self._bias_initializer = bias_initializer
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
self._bias_regularizer = bias_regularizer
self._use_bn = use_bn
self._use_sync_bn = use_sync_bn
......@@ -45,7 +45,7 @@ class CSPDownSample(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -57,7 +57,7 @@ class CSPDownSample(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -70,7 +70,7 @@ class CSPDownSample(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......
......@@ -14,7 +14,7 @@ class CSPTiny(ks.layers.Layer):
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
bias_regularizer=None,
weight_decay=None, # default find where is it is stated
kernel_regularizer=None,
use_bn=True,
use_sync_bn=False,
group_id=1,
......@@ -34,7 +34,7 @@ class CSPTiny(ks.layers.Layer):
self._bias_regularizer = bias_regularizer
self._use_bn = use_bn
self._use_sync_bn = use_sync_bn
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
self._groups = groups
self._group_id = group_id
self._downsample = downsample
......@@ -59,7 +59,7 @@ class CSPTiny(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -75,7 +75,7 @@ class CSPTiny(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -91,7 +91,7 @@ class CSPTiny(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -107,7 +107,7 @@ class CSPTiny(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -143,7 +143,7 @@ class CSPTiny(ks.layers.Layer):
"strides": self._strides,
"kernel_initializer": self._kernel_initializer,
"bias_initializer": self._bias_initializer,
"weight_decay": self._weight_decay,
"kernel_regularizer": self._kernel_regularizer,
"use_bn": self._use_bn,
"use_sync_bn": self._use_sync_bn,
"norm_moment": self._norm_moment,
......
......@@ -23,7 +23,7 @@ class DarkConv(ks.layers.Layer):
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
bias_regularizer=None,
weight_decay=None, # default find where is it is stated
kernel_regularizer=None,  # Specify the weight decay; the default of None will not work.
use_bn=True,
use_sync_bn=False,
norm_momentum=0.99,
......@@ -66,7 +66,7 @@ class DarkConv(ks.layers.Layer):
self._use_bias = use_bias
self._kernel_initializer = kernel_initializer
self._bias_initializer = bias_initializer
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
self._bias_regularizer = bias_regularizer
# batchnorm params
......@@ -99,7 +99,7 @@ class DarkConv(ks.layers.Layer):
self._kernel_size) == int else self._kernel_size[0]
if self._padding == "same" and kernel_size != 1:
self._zeropad = ks.layers.ZeroPadding2D(
((1, 1), (1, 1))) # symetric padding
((1, 1), (1, 1))) # symmetric padding
else:
self._zeropad = Identity()
......@@ -107,12 +107,12 @@ class DarkConv(ks.layers.Layer):
filters=self._filters,
kernel_size=self._kernel_size,
strides=self._strides,
padding="valid", #self._padding,
padding="valid",
dilation_rate=self._dilation_rate,
use_bias=self._use_bias,
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
kernel_regularizer=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
bias_regularizer=self._bias_regularizer)
#self.conv =tf.nn.convolution(filters=self._filters, strides=self._strides, padding=self._padding
......@@ -136,8 +136,6 @@ class DarkConv(ks.layers.Layer):
self._activation_fn = mish()
else:
self._activation_fn = ks.layers.Activation(activation=self._activation)
super(DarkConv, self).build(input_shape)
return
def call(self, inputs):
......@@ -148,7 +146,7 @@ class DarkConv(ks.layers.Layer):
return x
def get_config(self):
# used to store/share parameters to reconsturct the model
# used to store/share parameters to reconstruct the model
layer_config = {
"filters": self._filters,
"kernel_size": self._kernel_size,
......@@ -159,7 +157,7 @@ class DarkConv(ks.layers.Layer):
"kernel_initializer": self._kernel_initializer,
"bias_initializer": self._bias_initializer,
"bias_regularizer": self._bias_regularizer,
"l2_regularization": self._l2_regularization,
"kernel_regularizer": self._kernel_regularizer,
"use_bn": self._use_bn,
"use_sync_bn": self._use_sync_bn,
"norm_moment": self._norm_moment,
......
......@@ -14,7 +14,7 @@ class DarkResidual(ks.layers.Layer):
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
weight_decay=None,
kernel_regularizer=None,
bias_regularizer=None,
use_bn=True,
use_sync_bn=False,
......@@ -59,7 +59,7 @@ class DarkResidual(ks.layers.Layer):
self._bias_regularizer = bias_regularizer
self._use_bn = use_bn
self._use_sync_bn = use_sync_bn
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
# normal params
self._norm_moment = norm_momentum
......@@ -88,7 +88,7 @@ class DarkResidual(ks.layers.Layer):
norm_momentum=self._norm_moment,
norm_epsilon=self._norm_epsilon,
activation=self._conv_activation,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
leaky_alpha=self._leaky_alpha)
else:
self._dconv = Identity()
......@@ -106,7 +106,7 @@ class DarkResidual(ks.layers.Layer):
norm_momentum=self._norm_moment,
norm_epsilon=self._norm_epsilon,
activation=self._conv_activation,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
leaky_alpha=self._leaky_alpha)
self._conv2 = DarkConv(filters=self._filters,
kernel_size=(3, 3),
......@@ -121,7 +121,7 @@ class DarkResidual(ks.layers.Layer):
norm_momentum=self._norm_moment,
norm_epsilon=self._norm_epsilon,
activation=self._conv_activation,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
leaky_alpha=self._leaky_alpha)
self._shortcut = ks.layers.Add()
......@@ -138,13 +138,13 @@ class DarkResidual(ks.layers.Layer):
return self._activation_fn(x)
def get_config(self):
# used to store/share parameters to reconsturct the model
# used to store/share parameters to reconstruct the model
layer_config = {
"filters": self._filters,
"use_bias": self._use_bias,
"kernel_initializer": self._kernel_initializer,
"bias_initializer": self._bias_initializer,
"weight_decay": self._weight_decay,
"kernel_regularizer": self._kernel_regularizer,
"use_bn": self._use_bn,
"use_sync_bn": self._use_sync_bn,
"norm_moment": self._norm_moment,
......
......@@ -15,7 +15,7 @@ class DarkTiny(ks.layers.Layer):
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
bias_regularizer=None,
weight_decay=None, # default find where is it is stated
kernel_regularizer=None,  # TODO: find where the default is stated
use_bn=True,
use_sync_bn=False,
norm_momentum=0.99,
......@@ -34,7 +34,7 @@ class DarkTiny(ks.layers.Layer):
self._use_bn = use_bn
self._use_sync_bn = use_sync_bn
self._strides = strides
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
# normal params
self._norm_moment = norm_momentum
......@@ -68,7 +68,7 @@ class DarkTiny(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for resnet."""
# Import libraries
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from tensorflow.python.distribute import combinations
from tensorflow.python.distribute import strategy_combinations
from official.vision.beta.projects.yolo.modeling.backbones import Darknet
class DarkNetTest(parameterized.TestCase, tf.test.TestCase):
@parameterized.parameters(
(224, "darknet53", 2, 1),
(224, "darknettiny", 1, 2),
(224, "cspdarknettiny", 1, 1),
(224, "cspdarknet53", 2, 1),
)
def test_network_creation(self, input_size, model_id,
endpoint_filter_scale, scale_final):
"""Test creation of ResNet family models."""
tf.keras.backend.set_image_data_format('channels_last')
network = Darknet.Darknet(model_id=model_id, min_level=3, max_level=5)
print(network.model_id)
self.assertEqual(network.model_id, model_id)
inputs = tf.keras.Input(shape=(input_size, input_size, 3), batch_size=1)
endpoints = network(inputs)
self.assertAllEqual(
[1, input_size / 2**3, input_size / 2**3, 128 * endpoint_filter_scale],
endpoints['3'].shape.as_list())
self.assertAllEqual(
[1, input_size / 2**4, input_size / 2**4, 256 * endpoint_filter_scale],
endpoints['4'].shape.as_list())
self.assertAllEqual(
[1, input_size / 2**5, input_size / 2**5, 512 * endpoint_filter_scale * scale_final],
endpoints['5'].shape.as_list())
@combinations.generate(
combinations.combine(
strategy=[
strategy_combinations.tpu_strategy,
strategy_combinations.one_device_strategy_gpu,
],
use_sync_bn=[False, True],
))
def test_sync_bn_multiple_devices(self, strategy, use_sync_bn):
"""Test for sync bn on TPU and GPU devices."""
inputs = np.random.rand(1, 224, 224, 3)
tf.keras.backend.set_image_data_format('channels_last')
with strategy.scope():
network = Darknet.Darknet(model_id="darknet53", min_size=3, max_size=5)
_ = network(inputs)
@parameterized.parameters(1, 3, 4)
def test_input_specs(self, input_dim):
"""Test different input feature dimensions."""
tf.keras.backend.set_image_data_format('channels_last')
input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, input_dim])
network = Darknet.Darknet(model_id="darknet53", min_level=3, max_level=5, input_specs=input_specs)
inputs = tf.keras.Input(shape=(224, 224, input_dim), batch_size=1)
_ = network(inputs)
def test_serialize_deserialize(self):
# Create a network object that sets all of its config options.
kwargs = dict(
model_id="darknet53",
min_level = 3,
max_level = 5,
use_sync_bn=False,
activation='relu',
norm_momentum=0.99,
norm_epsilon=0.001,
kernel_initializer='VarianceScaling',
kernel_regularizer=None,
bias_regularizer=None,
)
network = Darknet.Darknet(**kwargs)
expected_config = dict(kwargs)
self.assertEqual(network.get_config(), expected_config)
# Create another network object from the first object's config.
new_network = Darknet.Darknet.from_config(network.get_config())
# Validate that the config can be forced to JSON.
_ = new_network.to_json()
# If the serialization was successful, the new config should match the old.
self.assertAllEqual(network.get_config(), new_network.get_config())
if __name__ == '__main__':
tf.test.main()
......@@ -54,19 +54,5 @@ class DarkConvTest(tf.test.TestCase, parameterized.TestCase):
self.assertNotIn(None, grad)
return
# @parameterized.named_parameters(("filters", 3), ("filters", 20), ("filters", 512))
# def test_time(self, filters):
# # finish the test for time
# dataset = tfds.load("mnist")
# model = ks.Sequential([
# DarkConv(7, kernel_size=(3,3), strides = (2,2), activation='relu'),
# DarkConv(10, kernel_size=(3,3), strides = (2,2), activation='relu'),
# DarkConv(filters, kernel_size=(3,3), strides = (1,1), activation='relu'),
# DarkConv(9, kernel_size=(3,3), strides = (2,2), activation='relu'),
# ks.layers.GlobalAveragePooling2D(),
# ks.layers.Dense(10, activation='softmax')], name='test')
# return
if __name__ == "__main__":
tf.test.main()
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Image classification task definition."""
import tensorflow as tf
from official.core import base_task
from official.core import input_reader
from official.core import task_factory
from official.modeling import tf_utils
from official.vision.beta.projects.yolo.configs import darknet_classification as exp_cfg
from official.vision.beta.dataloaders import classification_input
from official.vision.beta.modeling import factory
@task_factory.register_task_cls(exp_cfg.ImageClassificationTask)
class ImageClassificationTask(base_task.Task):
"""A task for image classification."""
def build_model(self):
"""Builds classification model."""
input_specs = tf.keras.layers.InputSpec(
shape=[None] + self.task_config.model.input_size)
l2_weight_decay = self.task_config.losses.l2_weight_decay
# Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss.
# (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2)
# (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss)
l2_regularizer = (tf.keras.regularizers.l2(
l2_weight_decay / 2.0) if l2_weight_decay else None)
model = factory.build_classification_model(
input_specs=input_specs,
model_config=self.task_config.model,
l2_regularizer=l2_regularizer)
return model
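# Worked example of the factor-of-two comment above (illustration only, not
# code from this commit): keras l2(l) penalizes l * sum(w ** 2), while
# tf.nn.l2_loss(w) returns sum(w ** 2) / 2. With w = [3., 4.]:
#   tf.keras.regularizers.l2(0.05)(w)  # 0.05 * 25  = 1.25
#   0.1 * tf.nn.l2_loss(w)             # 0.1 * 12.5 = 1.25
# so l2(decay / 2.0) reproduces decay * tf.nn.l2_loss(w).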
def build_inputs(self, params, input_context=None):
"""Builds classification input."""
num_classes = self.task_config.model.num_classes
input_size = self.task_config.model.input_size
decoder = classification_input.Decoder()
parser = classification_input.Parser(
output_size=input_size[:2],
num_classes=num_classes,
dtype=params.dtype)
reader = input_reader.InputReader(
params,
dataset_fn=tf.data.TFRecordDataset,
decoder_fn=decoder.decode,
parser_fn=parser.parse_fn(params.is_training))
dataset = reader.read(input_context=input_context)
return dataset
def build_losses(self, labels, model_outputs, aux_losses=None):
"""Sparse categorical cross entropy loss.
Args:
labels: labels.
model_outputs: Output logits of the classifier.
aux_losses: auxiliary loss tensors, i.e. `losses` in keras.Model.
Returns:
The total loss tensor.
"""
losses_config = self.task_config.losses
if losses_config.one_hot:
total_loss = tf.keras.losses.categorical_crossentropy(
labels,
model_outputs,
from_logits=True,
label_smoothing=losses_config.label_smoothing)
else:
total_loss = tf.keras.losses.sparse_categorical_crossentropy(
labels, model_outputs, from_logits=True)
total_loss = tf_utils.safe_mean(total_loss)
if aux_losses:
total_loss += tf.add_n(aux_losses)
return total_loss
def build_metrics(self, training=True):
"""Gets streaming metrics for training/validation."""
if self.task_config.losses.one_hot:
metrics = [
tf.keras.metrics.CategoricalAccuracy(name='accuracy'),
tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='top_5_accuracy')]
else:
metrics = [
tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'),
tf.keras.metrics.SparseTopKCategoricalAccuracy(
k=5, name='top_5_accuracy')]
return metrics
def train_step(self, inputs, model, optimizer, metrics=None):
"""Does forward and backward.
Args:
inputs: a dictionary of input tensors.
model: the model, forward pass definition.
optimizer: the optimizer for this training step.
metrics: a nested structure of metrics objects.
Returns:
A dictionary of logs.
"""
features, labels = inputs
if self.task_config.losses.one_hot:
labels = tf.one_hot(labels, self.task_config.model.num_classes)
num_replicas = tf.distribute.get_strategy().num_replicas_in_sync
with tf.GradientTape() as tape:
outputs = model(features, training=True)
# Casting output layer as float32 is necessary when mixed_precision is
# mixed_float16 or mixed_bfloat16 to ensure the output is cast as float32.
outputs = tf.nest.map_structure(
lambda x: tf.cast(x, tf.float32), outputs)
# Computes per-replica loss.
loss = self.build_losses(
model_outputs=outputs, labels=labels, aux_losses=model.losses)
# Scales loss as the default gradients allreduce performs sum inside the
# optimizer.
scaled_loss = loss / num_replicas
# For mixed_precision policy, when LossScaleOptimizer is used, loss is
# scaled for numerical stability.
if isinstance(
optimizer, tf.keras.mixed_precision.experimental.LossScaleOptimizer):
scaled_loss = optimizer.get_scaled_loss(scaled_loss)
tvars = model.trainable_variables
grads = tape.gradient(scaled_loss, tvars)
# Scales back gradient before apply_gradients when LossScaleOptimizer is
# used.
if isinstance(
optimizer, tf.keras.mixed_precision.experimental.LossScaleOptimizer):
grads = optimizer.get_unscaled_gradients(grads)
# Apply gradient clipping.
if self.task_config.gradient_clip_norm > 0:
grads, _ = tf.clip_by_global_norm(
grads, self.task_config.gradient_clip_norm)
optimizer.apply_gradients(list(zip(grads, tvars)))
logs = {self.loss: loss}
if metrics:
self.process_metrics(metrics, labels, outputs)
logs.update({m.name: m.result() for m in metrics})
elif model.compiled_metrics:
self.process_compiled_metrics(model.compiled_metrics, labels, outputs)
logs.update({m.name: m.result() for m in model.metrics})
return logs
def validation_step(self, inputs, model, metrics=None):
"""Validatation step.
Args:
inputs: a dictionary of input tensors.
model: the keras.Model.
metrics: a nested structure of metrics objects.
Returns:
A dictionary of logs.
"""
features, labels = inputs
if self.task_config.losses.one_hot:
labels = tf.one_hot(labels, self.task_config.model.num_classes)
outputs = self.inference_step(features, model)
outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs)
loss = self.build_losses(model_outputs=outputs, labels=labels,
aux_losses=model.losses)
logs = {self.loss: loss}
if metrics:
self.process_metrics(metrics, labels, outputs)
logs.update({m.name: m.result() for m in metrics})
elif model.compiled_metrics:
self.process_compiled_metrics(model.compiled_metrics, labels, outputs)
logs.update({m.name: m.result() for m in model.metrics})
return logs
def inference_step(self, inputs, model):
"""Performs the forward step."""
return model(inputs, training=False)
......@@ -18,6 +18,7 @@
from absl import app
from absl import flags
import gin
import sys
from official.core import train_utils
# pylint: disable=unused-import
......@@ -31,9 +32,21 @@ from official.modeling import performance
FLAGS = flags.FLAGS
'''
python3 -m official.vision.beta.projects.yolo.train --mode=train_and_eval --experiment=darknet_classification --model_dir=training_dir --config_file=official/vision/beta/projects/yolo/configs/experiments/darknet53.yaml
'''
def import_overrides():
print(sys.modules["official.vision.beta.configs.backbones"])
return
def main(_):
import_overrides()
gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params)
print(FLAGS.experiment)
params = train_utils.parse_configuration(FLAGS)
model_dir = FLAGS.model_dir
if 'train' in FLAGS.mode:
# Pure eval modes do not output yaml files. Otherwise continuous eval job
......
runtime:
all_reduce_alg: null
batchnorm_spatial_persistent: false
dataset_num_private_threads: null
default_shard_dim: -1
distribution_strategy: mirrored
enable_xla: false
gpu_thread_mode: null
loss_scale: null
mixed_precision_dtype: float32
num_cores_per_replica: 1
num_gpus: 0
num_packs: 1
per_gpu_thread_count: 0
run_eagerly: false
task_index: -1
tpu: null
worker_hosts: null
task:
gradient_clip_norm: 0.0
init_checkpoint: ''
logging_dir: null
losses:
l2_weight_decay: 0.0005
label_smoothing: 0.0
one_hot: true
model:
add_head_batch_norm: false
backbone:
darknet:
model_id: cspdarknettiny
type: darknet
dropout_rate: 0.0
input_size: [224, 224, 3]
norm_activation:
activation: relu
norm_epsilon: 0.001
norm_momentum: 0.99
use_sync_bn: false
num_classes: 1001
train_data:
block_length: 1
cache: false
cycle_length: 10
deterministic: null
drop_remainder: true
dtype: float16
enable_tf_data_service: false
global_batch_size: 128
input_path: imagenet-2012-tfrecord/train*
is_training: true
sharding: true
shuffle_buffer_size: 10000
tf_data_service_address: null
tf_data_service_job_name: null
tfds_as_supervised: false
tfds_data_dir: ''
tfds_download: false
tfds_name: ''
tfds_skip_decoding_feature: ''
tfds_split: ''
validation_data:
block_length: 1
cache: false
cycle_length: 10
deterministic: null
drop_remainder: false
dtype: float16
enable_tf_data_service: false
global_batch_size: 128
input_path: imagenet-2012-tfrecord/valid*
is_training: true
sharding: true
shuffle_buffer_size: 10000
tf_data_service_address: null
tf_data_service_job_name: null
tfds_as_supervised: false
tfds_data_dir: ''
tfds_download: false
tfds_name: ''
tfds_skip_decoding_feature: ''
tfds_split: ''
trainer:
allow_tpu_summary: false
best_checkpoint_eval_metric: ''
best_checkpoint_export_subdir: ''
best_checkpoint_metric_comp: higher
checkpoint_interval: 10000
continuous_eval_timeout: 3600
eval_tf_function: true
max_to_keep: 5
optimizer_config:
ema: null
learning_rate:
polynomial:
cycle: false
decay_steps: 799000
end_learning_rate: 0.0001
initial_learning_rate: 0.1
name: PolynomialDecay
power: 4.0
type: polynomial
optimizer:
sgd:
clipnorm: null
clipvalue: null
decay: 0.0
momentum: 0.9
name: SGD
nesterov: false
type: sgd
warmup:
linear:
name: linear
warmup_learning_rate: 0
warmup_steps: 1000
type: linear
steps_per_loop: 10000
summary_interval: 10000
train_steps: 800000
train_tf_function: true
train_tf_while_loop: true
validation_interval: 10000
validation_steps: 400