Commit 64f16d61 authored by Akhil Chinnakotla's avatar Akhil Chinnakotla
Browse files

Grammar & Spelling Fixes

parent c02980f4
......@@ -14,30 +14,30 @@ repository.
## Description
YOLO v1, the original implementation, was released in 2015, providing a groundbreaking
algorithm that would quickly process images and locate objects in a
single pass through the detector. The original implementation used a
backbone derived from state-of-the-art object classifiers of the time, like
[GoogLeNet](https://arxiv.org/abs/1409.4842) and
[VGG](https://arxiv.org/abs/1409.1556). More attention was given to the novel
YOLO Detection head that allowed for Object Detection with a single pass of an
image. Though limited, the network could predict up to 90 bounding boxes per
image, and was tested for about 80 classes per box. Also, the model could only
make predictions at one scale. These attributes made YOLO v1 more
limited and less versatile, so as the years passed, the developers continued to
update and develop this model.
YOLO v3 and v4 serve as the most up-to-date and capable versions of the YOLO
network group. These models use a custom backbone called Darknet53 that uses
knowledge gained from the ResNet paper to improve its predictions. The new backbone
also allows for objects to be detected at multiple scales. As for the new detection head,
the model now predicts the bounding boxes using a set of anchor box priors (Anchor
Boxes) as suggestions. Multiscale predictions in combination with Anchor boxes allow
for the network to make up to 1000 object predictions on a single image. Finally,
the new loss function forces the network to make better predictions by using Intersection
Over Union (IOU) to inform the model's confidence rather than relying on the mean
squared error for the entire output.
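The IOU term described above can be made concrete with a minimal pure-Python sketch (illustration only; the repository's actual implementation is the vectorized TensorFlow version in `box_ops` further down this diff):

```python
def iou(box1, box2):
    """Intersection over union of two boxes in (ymin, xmin, ymax, xmax) format.

    Scalar sketch for illustration; the project's version is vectorized
    over tf.Tensors and numerically guarded.
    """
    ymin = max(box1[0], box2[0])
    xmin = max(box1[1], box2[1])
    ymax = min(box1[2], box2[2])
    xmax = min(box1[3], box2[3])
    # Clamp to zero when the boxes do not overlap.
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return intersection / union if union > 0 else 0.0
```

Using IOU this way lets the confidence target reflect how well a predicted box actually overlaps the ground truth, rather than penalizing raw coordinate error uniformly.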
## Authors
......@@ -56,9 +56,9 @@ the entire output.
## Our Goal
Our goal with this model conversion is to provide implementations of the Backbone
and YOLO Head. We have built the model in such a way that the YOLO head could be
connected to a new, more powerful backbone if a person chose to.
## Models in the library
......
......@@ -35,7 +35,7 @@ class ImageClassificationModel(hyperparams.Config):
type='darknet', darknet=backbones.Darknet())
dropout_rate: float = 0.0
norm_activation: common.NormActivation = common.NormActivation()
# Adds a Batch Normalization layer pre-GlobalAveragePooling in classification.
add_head_batch_norm: bool = False
......
......@@ -16,7 +16,7 @@
"""Contains definitions of Darknet Backbone Networks.
These models are inspired by ResNet and CSPNet.
Residual networks (ResNets) were proposed in:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
......@@ -49,7 +49,7 @@ from official.vision.beta.projects.yolo.modeling.layers import nn_blocks
class BlockConfig:
"""
This is a class to store layer config to make code more readable.
"""
def __init__(self, layer, stack, reps, bottleneck, filters, pool_size,
......@@ -69,7 +69,7 @@ class BlockConfig:
padding: An `int` for the padding to apply to layers in this stack.
activation: A `str` for the activation to use for this stack.
route: An `int` for the level to route from to get the next input.
dilation_rate: An `int` for the scale used in dilated Darknet.
output_name: A `str` for the name to use for this output.
is_output: A `bool` for whether this layer is an output in the default
model.
......@@ -99,9 +99,10 @@ def build_block_specs(config):
class LayerBuilder:
"""
This is a class used for quick look up of the default layers used
by Darknet to connect, introduce, or exit a level. Used in place of an
if condition or switch to make adding new layers easier and to reduce
redundant code.
"""
def __init__(self):
......@@ -377,7 +378,7 @@ BACKBONES = {
@tf.keras.utils.register_keras_serializable(package='yolo')
class Darknet(tf.keras.Model):
""" The Darknet backbone architecture """
""" The Darknet backbone architecture. """
def __init__(
self,
......
......@@ -13,7 +13,7 @@
# limitations under the License.
# Lint as: python3
"""Tests for yolo."""
"""Tests for YOLO."""
from absl.testing import parameterized
import numpy as np
......
......@@ -13,7 +13,7 @@
# limitations under the License.
# Lint as: python3
"""Feature Pyramid Network and Path Aggregation variants used in YOLO"""
"""Feature Pyramid Network and Path Aggregation variants used in YOLO."""
import tensorflow as tf
from official.vision.beta.projects.yolo.modeling.layers import nn_blocks
......@@ -23,8 +23,10 @@ from official.vision.beta.projects.yolo.modeling.layers import nn_blocks
class _IdentityRoute(tf.keras.layers.Layer):
def __init__(self, **kwargs):
"""Private class to mirror the outputs of blocks in nn_blocks for an easier
programatic generation of the feature pyramid network"""
"""
Private class to mirror the outputs of blocks in nn_blocks for easier
programmatic generation of the feature pyramid network.
"""
super().__init__(**kwargs)
......@@ -125,7 +127,7 @@ class YoloFPN(tf.keras.layers.Layer):
# directly connect to an input path and process it
self.preprocessors = dict()
# resample an input and merge it with the output of another path
# in order to aggregate backbone outputs
self.resamples = dict()
# set of convolution layers and upsample layers that are used to
# prepare the FPN processors for output
......@@ -214,7 +216,7 @@ class YoloPAN(tf.keras.layers.Layer):
kernel_initializer: kernel_initializer for convolutional layers.
kernel_regularizer: tf.keras.regularizers.Regularizer object for Conv2D.
bias_regularizer: tf.keras.regularizers.Regularizer object for Conv2D.
fpn_input: `bool`, for whether the input into this function is an FPN or
a backbone.
fpn_filter_scale: `int`, scaling factor for the FPN filters.
**kwargs: keyword arguments to be passed.
......@@ -268,7 +270,7 @@ class YoloPAN(tf.keras.layers.Layer):
# directly connect to an input path and process it
self.preprocessors = dict()
# resample an input and merge it with the output of another path
# in order to aggregate backbone outputs
self.resamples = dict()
# FPN will reverse the key process order for the backbone, so we need
......
......@@ -13,7 +13,7 @@
# limitations under the License.
# Lint as: python3
"""Tests for yolo heads."""
"""Tests for YOLO heads."""
# Import libraries
from absl.testing import parameterized
......@@ -44,7 +44,6 @@ class YoloDecoderTest(parameterized.TestCase, tf.test.TestCase):
inputs[key] = tf.ones(input_shape[key], dtype=tf.float32)
endpoints = head(inputs)
for key in endpoints.keys():
expected_input_shape = input_shape[key]
......
......@@ -14,7 +14,7 @@
# Lint as: python3
"""Contains common building blocks for yolo neural networks."""
"""Contains common building blocks for YOLO neural networks."""
from typing import Callable, List
import tensorflow as tf
from official.modeling import tf_utils
......@@ -35,9 +35,9 @@ class Identity(tf.keras.layers.Layer):
class ConvBN(tf.keras.layers.Layer):
"""
Modified Convolution layer to match that of the Darknet Library.
The Layer is a standard combination of Conv, BatchNorm, and Activation;
however, the use of bias in the Conv is determined by the use of
batch normalization.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
Ping-Yang Chen, Jun-Wei Hsieh
......@@ -71,16 +71,16 @@ class ConvBN(tf.keras.layers.Layer):
use.
padding: string 'valid' or 'same', if same, then pad the image, else do
not.
dilation_rate: tuple to indicate how much to modulate kernel weights and
how many pixels in a feature map to skip.
kernel_initializer: string to indicate which function to use to initialize
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
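The bias rule described in the ConvBN docstring (the conv's bias is determined by the use of batch normalization) can be sketched with a hypothetical, illustration-only helper; `conv_bn_act_config` is not part of the repository:

```python
def conv_bn_act_config(filters, kernel_size, use_bn=True, activation="leaky"):
    """Sketch of the Conv -> BatchNorm -> Activation block configuration.

    Hypothetical helper for illustration only; the real ConvBN is a
    tf.keras.layers.Layer. The key detail is that the conv bias is disabled
    whenever batch normalization is enabled, since BN's learned per-channel
    shift (beta) makes a conv bias redundant.
    """
    return {
        "conv": {"filters": filters, "kernel_size": kernel_size,
                 "use_bias": not use_bn},
        "batch_norm": use_bn,
        "activation": activation,
    }
```

This mirrors the common Keras pattern of passing `use_bias=not use_bn` to `tf.keras.layers.Conv2D` when a BatchNormalization layer follows.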
......@@ -191,7 +191,7 @@ class ConvBN(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class DarkResidual(tf.keras.layers.Layer):
"""
Darknet block with Residual connection for YOLO v3 Backbone
"""
def __init__(self,
......@@ -228,8 +228,6 @@ class DarkResidual(tf.keras.layers.Layer):
(across all input batches).
norm_momentum: float for momentum to use for batch normalization.
norm_epsilon: float for batch normalization epsilon.
leaky_alpha: float to use as alpha if activation function is leaky.
sc_activation: string for activation function to use in layer.
downsample: boolean for if image input is larger than layer output, set
......@@ -352,10 +350,10 @@ class DarkResidual(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class CSPTiny(tf.keras.layers.Layer):
"""
A small size convolution block proposed in the CSPNet. The layer uses shortcuts,
routing (concatenation), and feature grouping in order to improve gradient
variability and allow for high efficiency, low power residual learning for small
networks.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
Ping-Yang Chen, Jun-Wei Hsieh
......@@ -387,11 +385,11 @@ class CSPTiny(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
(across all input batches).
......@@ -401,12 +399,12 @@ class CSPTiny(tf.keras.layers.Layer):
feature stack output.
norm_momentum: float for momentum to use for batch normalization.
norm_epsilon: float for batch normalization epsilon.
downsample: boolean; if the image input is larger than the layer output, set
downsample to True so the dimensions are forced to match.
leaky_alpha: float to use as alpha if activation function is leaky.
sc_activation: string for activation function to use in layer.
conv_activation: string or None for activation function to use in layer,
if None activation is replaced by linear.
**kwargs: Keyword Arguments.
"""
......@@ -505,18 +503,18 @@ class CSPTiny(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class CSPRoute(tf.keras.layers.Layer):
"""
Down sampling layer to take the place of down sampling done in Residual
networks. This is the first of 2 layers needed to convert any Residual Network
model to a CSPNet. At the start of a new level change, this CSPRoute layer
creates a learned identity that will act as a cross stage connection that
is used to inform the inputs to the next stage. This is called cross stage
partial because the number of filters required in every intermittent residual
layer is reduced by half. The sister layer will take the partial generated by
this layer and concatenate it with the output of the final residual layer in the
stack to create a fully feature level output. This concatenation merges the
partial blocks of 2 levels as input to the next allowing the gradients of each
level to be more unique, and reducing the number of parameters required by each
level by 50% while keeping accuracy consistent.
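The partial split and merge described above can be sketched with plain Python lists standing in for channels; `csp_split_merge` is a hypothetical name for illustration, not the repository's Keras layers:

```python
def csp_split_merge(channels, filter_scale=2, stack=None):
    """Illustrates cross stage partial channel flow with plain lists.

    `channels` stands in for a feature map's channel slots; `stack` is the
    residual stack being wrapped. The stack only ever sees
    len(channels) // filter_scale channels (the CSPRoute split), and the
    untouched partial is concatenated back afterwards (the CSPConnect merge),
    which is where the parameter savings come from.
    """
    partial = len(channels) // filter_scale
    route, identity = channels[:partial], channels[partial:]
    # The residual stack processes only the routed half of the channels.
    processed = stack(route) if stack else route
    # CSPConnect: concatenate the untouched partial back onto the output.
    return processed + identity
```

With `filter_scale = 2`, each residual layer in the wrapped stack works on half the channels, matching the roughly 50% parameter reduction claimed above.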
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
......@@ -544,24 +542,24 @@ class CSPRoute(tf.keras.layers.Layer):
"""
Args:
filters: integer for output depth, or the number of features to learn
filter_scale: integer dictating (filters//2) or the number of filters in
the partial feature stack.
activation: string for activation function to use in layer.
kernel_initializer: string to indicate which function to use to
initialize weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
(across all input batches).
norm_momentum: float for momentum to use for batch normalization.
norm_epsilon: float for batch normalization epsilon.
downsample: down_sample the input.
**kwargs: Keyword Arguments.
"""
......@@ -571,7 +569,7 @@ class CSPRoute(tf.keras.layers.Layer):
self._filter_scale = filter_scale
self._activation = activation
# convolution params
self._kernel_initializer = kernel_initializer
self._bias_initializer = bias_initializer
self._kernel_regularizer = kernel_regularizer
......@@ -638,7 +636,7 @@ class CSPRoute(tf.keras.layers.Layer):
class CSPConnect(tf.keras.layers.Layer):
"""
Sister Layer to the CSPRoute layer. Merges the partial feature stacks
generated by the CSPDownsampling layer, and the final output of the
residual stack. Suggested in the CSPNet paper.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
......@@ -675,10 +673,10 @@ class CSPConnect(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global
......@@ -750,13 +748,13 @@ class CSPConnect(tf.keras.layers.Layer):
class CSPStack(tf.keras.layers.Layer):
"""
CSP full stack; combines the route and the connect so you can just quickly
wrap an existing callable or list of layers to make it a cross stage
partial. Added for ease of use. You should be able to wrap any layer
stack with a CSP independent of whether it belongs to the Darknet family. If
filter_scale = 2, then the blocks in the stack passed into the CSP stack
should also have filters = filters/filter_scale.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
Ping-Yang Chen, Jun-Wei Hsieh
......@@ -781,11 +779,10 @@ class CSPStack(tf.keras.layers.Layer):
**kwargs):
"""
Args:
model_to_wrap: callable Model or a list of callable objects that will
process the output of CSPRoute, and be input into CSPConnect.
A list will be called sequentially.
filters: integer for output depth, or the number of features to learn.
filter_scale: integer dictating (filters//2) or the number of filters in
the partial feature stack.
activation: string for activation function to use in layer.
......@@ -793,10 +790,11 @@ class CSPStack(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
downsample: down_sample the input.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
......@@ -891,10 +889,10 @@ class PathAggregationBlock(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
......@@ -905,8 +903,8 @@ class PathAggregationBlock(tf.keras.layers.Layer):
activation: string or None for activation function to use in layer,
if None activation is replaced by linear.
leaky_alpha: float to use as alpha if activation function is leaky.
downsample: `bool` for whether to downsample and merge.
upsample: `bool` for whether to upsample and merge.
upsample_size: `int` how much to upsample in order to match shapes.
**kwargs: Keyword Arguments.
"""
......@@ -1050,7 +1048,7 @@ class PathAggregationBlock(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class SPP(tf.keras.layers.Layer):
"""
A non-aggregated SPP layer that uses Pooling to gain more performance.
"""
def __init__(self, sizes, **kwargs):
......@@ -1090,7 +1088,7 @@ class SAM(tf.keras.layers.Layer):
[1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon
CBAM: Convolutional Block Attention Module. arXiv:1807.06521
Implementation of the Spatial Attention Model (SAM)
"""
def __init__(self,
......@@ -1167,7 +1165,7 @@ class CAM(tf.keras.layers.Layer):
[1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon
CBAM: Convolutional Block Attention Module. arXiv:1807.06521
Implementation of the Channel Attention Model (CAM)
"""
def __init__(self,
......@@ -1253,7 +1251,7 @@ class CBAM(tf.keras.layers.Layer):
[1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon
CBAM: Convolutional Block Attention Module. arXiv:1807.06521
Implementation of the Convolution Block Attention Module (CBAM)
"""
def __init__(self,
......@@ -1321,8 +1319,9 @@ class CBAM(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class DarkRouteProcess(tf.keras.layers.Layer):
"""
Processes darknet outputs and connects the backbone to the head for more
generalizability and abstracts the repetition of DarkConv objects that is
common in YOLO.
It is used like the following:
......@@ -1357,18 +1356,18 @@ class DarkRouteProcess(tf.keras.layers.Layer):
filters: the number of filters to be used in all subsequent layers;
filters should be the depth of the tensor input into this layer,
as no downsampling can be done within this layer object.
repetitions: number of times to repeat the processing nodes
for tiny: 1 repetition, no spp allowed
for spp: insert_spp = True, and allow for 3+ repetitions
for regular: insert_spp = False, and allow for 3+ repetitions.
insert_spp: bool if true add the spatial pyramid pooling layer.
kernel_initializer: method to use to initialize kernel weights.
bias_initializer: method to use to initialize the bias of the conv
layers.
norm_momentum: batch norm parameter; see TensorFlow documentation.
norm_epsilon: batch norm parameter; see TensorFlow documentation.
activation: activation function to use in processing.
leaky_alpha: if leaky activation function, the alpha to use in
processing the relu input.
Returns:
......
......@@ -4,13 +4,13 @@ import math
def yxyx_to_xcycwh(box: tf.Tensor):
"""Converts boxes from ymin, xmin, ymax, xmax to x_center, y_center, width,
"""Converts boxes from ymin, xmin, ymax, xmax to x_center, y_center, width,
height.
Args:
box: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes in ymin, xmin, ymax, xmax.
Returns:
box: a `Tensor` whose shape is the same as `box` in new format.
"""
......@@ -52,13 +52,13 @@ def _xcycwh_to_yxyx(box: tf.Tensor, scale):
def xcycwh_to_yxyx(box: tf.Tensor, darknet=False):
"""Converts boxes from x_center, y_center, width, height to ymin, xmin, ymax,
"""Converts boxes from x_center, y_center, width, height to ymin, xmin, ymax,
xmax.
Args:
box: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes in x_center, y_center, width, height.
Returns:
box: a `Tensor` whose shape is the same as `box` in new format.
"""
......@@ -75,9 +75,9 @@ def intersect_and_union(box1, box2, yxyx=False):
"""Calculates the intersection and union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
......@@ -109,15 +109,15 @@ def smallest_encompassing_box(box1, box2, yxyx=False):
box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
Returns:
box_c: a `Tensor` whose last dimension is 4 representing the coordinates of
boxes; the return format is y_min, x_min, y_max, x_max if yxyx is set
to True. In other words it will match the input format.
"""
......@@ -145,9 +145,9 @@ def compute_iou(box1, box2, yxyx=False):
"""Calculates the intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
......@@ -167,13 +167,13 @@ def compute_giou(box1, box2, yxyx=False, darknet=False):
"""Calculates the General intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
darknet: a `bool` indicating whether the calling function is the YOLO
darknet loss.
Returns:
......@@ -208,15 +208,15 @@ def compute_diou(box1, box2, beta=1.0, yxyx=False, darknet=False):
"""Calculates the distance intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
beta: a `float` indicating the amount to scale the distance iou
regularization term.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
darknet: a `bool` indicating whether the calling function is the YOLO
darknet loss.
Returns:
......@@ -256,13 +256,13 @@ def compute_ciou(box1, box2, yxyx=False, darknet=False):
"""Calculates the complete intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
darknet: a `bool` indicating whether the calling function is the YOLO
darknet loss.
Returns:
......@@ -297,23 +297,22 @@ def aggregated_comparitive_iou(boxes1,
boxes2=None,
iou_type=0,
beta=0.6):
"""Calculates the intersection over union between every box in boxes1 and
"""Calculates the intersection over union between every box in boxes1 and
every box in boxes2.
Args:
boxes1: a `Tensor` of shape [batch size, N, 4] representing the coordinates
of boxes.
boxes2: a `Tensor` of shape [batch size, N, 4] representing the coordinates
of boxes.
iou_type: `integer` representing the iou version to use, 0 is distance iou,
1 is the general iou, 2 is the complete iou, any other number uses the
standard iou.
beta: `float` for the scaling quantity to apply to distance iou
regularization.
Returns:
iou: a `Tensor` that represents the intersection over union of the
expected/input type.
"""
boxes1 = tf.expand_dims(boxes1, axis=-2)
......
"""A set of private math operations used to safely implement the yolo loss"""
"""A set of private math operations used to safely implement the YOLO loss."""
import tensorflow as tf
def rm_nan_inf(x, val=0.0):
"""remove nan and infinity
"""remove nan and infinity
Args:
Return:
a `Tensor` with nan and infinity removed.
......@@ -19,11 +19,11 @@ def rm_nan_inf(x, val=0.0):
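The replacement behavior of `rm_nan_inf` can be sketched over a plain Python list (an illustrative scalar stand-in for the elementwise tensor op, not the library code):

```python
import math

def rm_nan_inf(xs, val=0.0):
    # Replace every nan or +/-inf entry with val, leaving finite values as-is,
    # mirroring the tensor op documented above.
    return [val if (math.isnan(x) or math.isinf(x)) else x for x in xs]

print(rm_nan_inf([1.0, float('nan'), float('inf'), -2.0]))  # [1.0, 0.0, 0.0, -2.0]
```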
def rm_nan(x, val=0.0):
"""Remove nan.
Args:
x: any `Tensor` of any type.
val: value to replace nan.
Return:
a `Tensor` with nan removed.
......@@ -35,32 +35,32 @@ def rm_nan(x, val=0.0):
def divide_no_nan(a, b):
"""NaN-safe divide operation built to allow model compilation in tflite.
Args:
a: any `Tensor` of any type.
b: any `Tensor` of any type with the same shape as tensor a.
Return:
a `Tensor` representing a divided by b, with all nan values removed.
"""
zero = tf.cast(0.0, b.dtype)
return tf.where(b == zero, zero, a / b)
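For scalars, the zero-denominator behavior reduces to the following sketch (illustrative only; the real op applies `tf.where` elementwise over tensors):

```python
def divide_no_nan(a, b):
    # Where the denominator is zero, return 0 instead of inf or nan,
    # matching the tf.where-based implementation above.
    return 0.0 if b == 0.0 else a / b

print(divide_no_nan(6.0, 3.0))  # 2.0
print(divide_no_nan(1.0, 0.0))  # 0.0 rather than inf
print(divide_no_nan(0.0, 0.0))  # 0.0 rather than nan
```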
def mul_no_nan(x, y):
"""NaN-safe multiply operation built to allow model compilation in tflite and
to allow one tensor to mask another. Wherever x is zero the
multiplication is not computed and the value is replaced with a zero. This is
required because 0 * nan = nan, which can make computation unstable in
cases where the intended behavior is for zero to mean ignore.
Args:
x: any `Tensor` of any type.
y: any `Tensor` of any type with the same shape as tensor x.
Return:
a `Tensor` representing x times y, where x is used to safely mask the
tensor y.
"""
return tf.where(x == 0, tf.cast(0, x.dtype), x * y)
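The masking semantics can be checked with a scalar sketch (illustrative; the tensor op applies the same rule elementwise):

```python
import math

def mul_no_nan(x, y):
    # Where x is zero the product is forced to 0, so 0 * nan yields 0
    # instead of nan, letting x safely mask y.
    return 0.0 if x == 0.0 else x * y

print(mul_no_nan(0.0, float('nan')))  # 0.0, while plain 0.0 * nan would be nan
print(mul_no_nan(2.0, 3.0))           # 6.0
```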
......@@ -8,13 +8,12 @@ class TiledNMS():
IOU_TYPES = {'diou': 0, 'giou': 1, 'ciou': 2, 'iou': 3}
def __init__(self, iou_type='diou', beta=0.6):
'''Initialization for all non max suppression operations, mainly used to
select hyperparameters for the iou type and scaling.
Args:
iou_type: `str` for the version of IOU to use {diou, giou, ciou, iou}.
beta: `float` for the amount to scale regularization on distance iou.
'''
self._iou_type = TiledNMS.IOU_TYPES[iou_type]
self._beta = beta
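The string-to-integer lookup above can be exercised standalone; `TiledNMSConfig` is a hypothetical stand-in for the `TiledNMS` constructor, not the class itself:

```python
IOU_TYPES = {'diou': 0, 'giou': 1, 'ciou': 2, 'iou': 3}

class TiledNMSConfig:
    # Resolve the iou variant string to its integer code and store the
    # distance-iou scaling factor beta.
    def __init__(self, iou_type='diou', beta=0.6):
        self._iou_type = IOU_TYPES[iou_type]
        self._beta = beta

cfg = TiledNMSConfig('ciou', beta=0.5)
print(cfg._iou_type)  # 2
```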
......@@ -54,7 +53,7 @@ class TiledNMS():
overlap too much with respect to IOU.
output_size: an int32 tensor of size [batch_size]. Representing the number
of selected boxes for each batch.
idx: an integer scalar representing an induction variable.
Returns:
boxes: updated boxes.
......@@ -111,10 +110,10 @@ class TiledNMS():
Assumption:
* The boxes are sorted by scores unless the box is a dot (all coordinates
are zero).
* Boxes with higher scores can be used to suppress boxes with lower
scores.
The overall design of the algorithm is to handle boxes tile-by-tile:
boxes = boxes.pad_to_multiply_of(tile_size)
num_tiles = len(boxes) // tile_size
......@@ -126,7 +125,7 @@ class TiledNMS():
iou = bbox_overlap(box_tile, suppressing_tile)
# if the box is suppressed in iou, clear it to a dot
box_tile *= _update_boxes(iou)
# Iteratively handle the diagonal tile.
iou = _box_overlap(box_tile, box_tile)
iou_changed = True
while iou_changed:
......@@ -232,16 +231,16 @@ class TiledNMS():
This implementation unrolls classes dimension while using the tf.while_loop
to implement the batched NMS, so that it can be parallelized at the batch
dimension. It should give better performance compared to the v1 implementation.
It is TPU compatible.
Args:
boxes: a tensor with shape [batch_size, N, num_classes, 4] or [batch_size,
N, 1, 4], which stacks box predictions on all feature levels. The N is the
number of total anchors on all levels.
scores: a tensor with shape [batch_size, N, num_classes], which stacks
class probability on all feature levels. The N is the number of total
anchors on all levels. The num_classes is the number of classes the
model predicted. Note that the class_outputs here is the raw score.
pre_nms_top_k: an int number of top candidate detections per class
before NMS.
......@@ -327,21 +326,21 @@ def sorted_non_max_suppression_padded(scores, boxes, max_output_size,
def sort_drop(objectness, box, classificationsi, k):
"""This function sorts and drops boxes such that only the k boxes with the
highest objectness (confidence) scores are kept.
Args:
objectness: a `Tensor` of shape [batch size, N] that needs to be
filtered.
box: a `Tensor` of shape [batch size, N, 4] that needs to be filtered.
classificationsi: a `Tensor` of shape [batch size, N, num_classes] that
needs to be filtered.
k: an `integer` for the maximum number of boxes to keep after filtering.
Return:
objectness: filtered `Tensor` of shape [batch size, k]
boxes: filtered `Tensor` of shape [batch size, k, 4]
classifications: filtered `Tensor` of shape [batch size, k, num_classes]
"""
# find the indices for the boxes based on the scores
objectness, ind = tf.math.top_k(objectness, k=k)
......@@ -364,25 +363,25 @@ def sort_drop(objectness, box, classificationsi, k):
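The sort-and-drop step can be sketched in pure Python (using built-in sorting in place of `tf.math.top_k` and the gather step; names and shapes are illustrative, with plain lists standing in for batched tensors):

```python
def sort_drop(objectness, boxes, classes, k):
    # Rank detections by objectness (descending), keep the top k indices,
    # and gather boxes and class scores with the same permutation.
    order = sorted(range(len(objectness)),
                   key=lambda i: objectness[i], reverse=True)[:k]
    return ([objectness[i] for i in order],
            [boxes[i] for i in order],
            [classes[i] for i in order])

obj, box, cls = sort_drop([0.1, 0.9, 0.5],
                          [[0] * 4, [1] * 4, [2] * 4],
                          [[1], [2], [3]], k=2)
print(obj)  # [0.9, 0.5]
print(box)  # [[1, 1, 1, 1], [2, 2, 2, 2]]
```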
def segment_nms(boxes, classes, confidence, k, iou_thresh):
"""A quick nms that works well for small values of k. It was developed to
operate for tflite models, as the tiled NMS is far too slow and typically
is not able to compile with tflite. This NMS does not account for classes,
and only works to quickly filter boxes on phones.
Args:
boxes: a `Tensor` of shape [batch size, N, 4] that needs to be filtered.
classes: a `Tensor` of shape [batch size, N, num_classes] that needs to be
filtered.
confidence: a `Tensor` of shape [batch size, N] that needs to be
filtered.
k: an `integer` for the maximum number of boxes to keep after filtering.
iou_thresh: a `float` for the value above which boxes are considered to be
too similar, the closer to 1.0 the less that gets through.
Return:
boxes: filtered `Tensor` of shape [batch size, k, 4]
classes: filtered `Tensor` of shape [batch size, k, num_classes]
confidence: filtered `Tensor` of shape [batch size, k]
"""
mrange = tf.range(k)
mask_x = tf.tile(
......@@ -416,27 +415,27 @@ def nms(boxes,
pre_nms_thresh,
nms_thresh,
prenms_top_k=500):
"""A quick nms that works well for small values of k. It was developed to
operate for tflite models, as the tiled NMS is far too slow and typically
is not able to compile with tflite. This NMS does not account for classes,
and only works to quickly filter boxes on phones.
Args:
boxes: a `Tensor` of shape [batch size, N, 4] that needs to be filtered.
classes: a `Tensor` of shape [batch size, N, num_classes] that needs to be
filtered.
confidence: a `Tensor` of shape [batch size, N] that needs to be
filtered.
k: an `integer` for the maximum number of boxes to keep after filtering.
nms_thresh: a `float` for the value above which boxes are considered to be
too similar, the closer to 1.0 the less that gets through.
pre_nms_top_k: an int number of top candidate detections per class
before NMS.
Return:
boxes: filtered `Tensor` of shape [batch size, k, 4]
classes: filtered `Tensor` of shape [batch size, k, num_classes]
confidence: filtered `Tensor` of shape [batch size, k]
"""
# sort the boxes
......
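The class-agnostic filtering that `segment_nms` and `nms` describe can be sketched as a greedy loop over score-sorted boxes (a simplified pure-Python stand-in, not the masked tensor implementation; boxes are (y_min, x_min, y_max, x_max) tuples):

```python
def greedy_nms(boxes, scores, iou_thresh, k):
    # Keep a box only if its IOU with every already-kept box stays below
    # iou_thresh, visiting boxes in descending score order, up to k keeps.
    def iou(a, b):
        y1, x1 = max(a[0], b[0]), max(a[1], b[1])
        y2, x2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, y2 - y1) * max(0.0, x2 - x1)
        area = lambda c: (c[2] - c[0]) * (c[3] - c[1])
        union = area(a) + area(b) - inter
        return inter / union if union else 0.0

    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
        if len(keep) == k:
            break
    return keep

boxes = [(0, 0, 2, 2), (0, 0, 2, 2.1), (5, 5, 7, 7)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, iou_thresh=0.5, k=2))  # [0, 2]
```

The second box overlaps the first almost completely (IOU near 0.95), so it is suppressed, while the distant third box survives.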