Internal change

PiperOrigin-RevId: 389665560

Commit 1df58b60 authored Aug 09, 2021 by Abdullah Rashwan Committed by A. Unique TensorFlower Aug 09, 2021
10 changed files
--- a/official/vision/beta/projects/yolo/README.md
+++ b/official/vision/beta/projects/yolo/README.md
@@ -17,30 +17,31 @@ repository.

 ## Description

-Yolo v1 the original implementation was released in 2015 providing a ground
-breaking algorithm that would quickly process images, and locate objects in a
-single pass through the detector. The original implementation based used a
-backbone derived from state of the art object classifier of the time, like
+YOLO v1, the original implementation, was released in 2015, providing a
+groundbreaking algorithm that would quickly process images and locate objects
+in a single pass through the detector. The original implementation used a
+backbone derived from state-of-the-art object classifiers of the time, like
 [GoogLeNet](https://arxiv.org/abs/1409.4842) and
 [VGG](https://arxiv.org/abs/1409.1556). More attention was given to the novel
-Yolo Detection head that allowed for Object Detection with a single pass of an
+YOLO Detection head that allowed for Object Detection with a single pass of an
 image. Though limited, the network could predict up to 90 bounding boxes per
-image, and was tested for about 80 classes per box. Also, the model could only
-make prediction at one scale. These attributes caused yolo v1 to be more
-limited, and less versatile, so as the year passed, the Developers continued to
+image, and was tested for about 80 classes per box. Also, the model could only
+make predictions at one scale. These attributes caused YOLO v1 to be more
+limited and less versatile, so as the years passed, the developers continued to
 update and develop this model.

-Yolo v3 and v4 serve as the most up to date and capable versions of the Yolo
-network group. These model uses a custom backbone called Darknet53 that uses
+YOLO v3 and v4 serve as the most up-to-date and capable versions of the YOLO
+network group. These models use a custom backbone called Darknet53 that uses
 knowledge gained from the ResNet paper to improve its predictions. The new
 backbone also allows for objects to be detected at multiple scales. As for the
 new detection head, the model now predicts the bounding boxes using a set of
-anchor box priors (Anchor Boxes) as suggestions. The multiscale predictions in
-combination with the Anchor boxes allows for the network to make up to 1000
-object predictions on a single image. Finally, the new loss function forces the
-network to make better prediction by using Intersection Over Union (IOU) to
-inform the model's confidence rather than relying on the mean squared error for
-the entire output.
+anchor box priors (Anchor Boxes) as suggestions. Multiscale predictions in
+combination with Anchor boxes allow for the network to make up to 1000 object
+predictions on a single image. Finally, the new loss function forces the network
+to make better predictions by using Intersection Over Union (IOU) to inform the
+model's confidence rather than relying on the mean squared error for the entire
+output.
+

 ## Authors

@@ -59,9 +60,9 @@ the entire output.

 ## Our Goal

-Our goal with this model conversion is to provide implementations of the
-Backbone and Yolo Head. We have built the model in such a way that the Yolo
-head could be connected to a new, more powerful backbone if a person chose to.
+Our goal with this model conversion is to provide implementations of the
+Backbone and YOLO Head. We have built the model in such a way that the YOLO
+head could be connected to a new, more powerful backbone if a person chose to.

 ## Models in the library

@@ -79,3 +80,5 @@ head could be connected to a new, more powerful backbone if a person chose to.
 [![Python 3.8](https://img.shields.io/badge/Python-3.8-3776AB)](https://www.python.org/downloads/release/python-380/)


+DISCLAIMER: this YOLO implementation is still under development. No support
+will be provided during the development phase.
--- a/official/vision/beta/projects/yolo/configs/backbones.py
+++ b/official/vision/beta/projects/yolo/configs/backbones.py
@@ -30,6 +30,8 @@ class Darknet(hyperparams.Config):
  width_scale: float = 1.0
  depth_scale: float = 1.0
  dilate: bool = False
+  min_level: int = 3
+  max_level: int = 5


 @dataclasses.dataclass
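
Note on the new `min_level` and `max_level` fields: the sketch below assumes the usual feature-pyramid convention in which level `l` has a stride of `2**l` relative to the input image. The helper name `level_shapes` and the 416-pixel input size are illustrative assumptions, not part of this commit.

```python
def level_shapes(image_size, min_level=3, max_level=5):
    """Maps each pyramid level to (stride, feature map side length).

    Assumes level l downsamples the input by a factor of 2**l.
    """
    shapes = {}
    for level in range(min_level, max_level + 1):
        stride = 2 ** level
        shapes[level] = (stride, image_size // stride)
    return shapes


# Hypothetical square input of 416 pixels per side.
shapes = level_shapes(416)
for level, (stride, side) in shapes.items():
    print(f"level {level}: stride {stride}, {side}x{side} feature map")
```

Under that assumption, the defaults of 3 and 5 select three output scales.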

--- a/official/vision/beta/projects/yolo/configs/darknet_classification.py
+++ b/official/vision/beta/projects/yolo/configs/darknet_classification.py
@@ -15,9 +15,8 @@
 # Lint as: python3
 """Image classification with darknet configs."""

-from typing import List, Optional
-
 import dataclasses
+from typing import List, Optional

 from official.core import config_definitions as cfg
 from official.core import exp_factory
@@ -35,7 +34,7 @@ class ImageClassificationModel(hyperparams.Config):
      type='darknet', darknet=backbones.Darknet())
  dropout_rate: float = 0.0
  norm_activation: common.NormActivation = common.NormActivation()
-  # Adds a BatchNormalization layer pre-GlobalAveragePooling in classification
+  # Adds a Batch Normalization layer pre-GlobalAveragePooling in classification.
  add_head_batch_norm: bool = False



--- a/official/vision/beta/projects/yolo/dataloaders/yolo_detection_input.py
+++ b/official/vision/beta/projects/yolo/dataloaders/yolo_detection_input.py
@@ -67,7 +67,7 @@ class Parser(parser.Parser):
      max_level: `int` number of maximum level of the output feature pyramid.
      masks: a `Tensor`, `List` or `numpy.ndarray` for anchor masks.
      max_process_size: an `int` for maximum image width and height.
-      min_process_size: an `int` for minimum image width and height ,
+      min_process_size: an `int` for minimum image width and height.
      max_num_instances: an `int` number of maximum number of instances in an
        image.
      random_flip: a `bool` if True, augment training with random horizontal

--- a/official/vision/beta/projects/yolo/modeling/yolo_model.py
+++ b/official/vision/beta/projects/yolo/modeling/yolo_model.py
+# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Yolo models."""
+
+import tensorflow as tf
+
+
+# Static base Yolo Models that do not require configuration
+# similar to a backbone model id.
+
+# This is done to greatly simplify the model config.
+# The structure is as follows: the model version {v3, v4, v#, ...}, then
+# the model config type {regular, tiny, small, large, ...}.
+YOLO_MODELS = {
+    "v4":
+        dict(
+            regular=dict(
+                embed_spp=False,
+                use_fpn=True,
+                max_level_process_len=None,
+                path_process_len=6),
+            tiny=dict(
+                embed_spp=False,
+                use_fpn=False,
+                max_level_process_len=2,
+                path_process_len=1),
+            csp=dict(
+                embed_spp=False,
+                use_fpn=True,
+                max_level_process_len=None,
+                csp_stack=5,
+                fpn_depth=5,
+                path_process_len=6),
+            csp_large=dict(
+                embed_spp=False,
+                use_fpn=True,
+                max_level_process_len=None,
+                csp_stack=7,
+                fpn_depth=7,
+                path_process_len=8,
+                fpn_filter_scale=2),
+        ),
+    "v3":
+        dict(
+            regular=dict(
+                embed_spp=False,
+                use_fpn=False,
+                max_level_process_len=None,
+                path_process_len=6),
+            tiny=dict(
+                embed_spp=False,
+                use_fpn=False,
+                max_level_process_len=2,
+                path_process_len=1),
+            spp=dict(
+                embed_spp=True,
+                use_fpn=False,
+                max_level_process_len=2,
+                path_process_len=1),
+        ),
+}
+
+
+class Yolo(tf.keras.Model):
+  """The YOLO model class."""
+
+  def __init__(self,
+               backbone=None,
+               decoder=None,
+               head=None,
+               detection_generator=None,
+               **kwargs):
+    """Detection initialization function.
+
+    Args:
+      backbone: `tf.keras.Model`, a backbone network.
+      decoder: `tf.keras.Model`, a decoder network.
+      head: `YoloHead`, the YOLO head.
+      detection_generator: `tf.keras.Model`, the detection generator.
+      **kwargs: keyword arguments to be passed.
+    """
+    super().__init__(**kwargs)
+
+    self._config_dict = {
+        "backbone": backbone,
+        "decoder": decoder,
+        "head": head,
+        "detection_generator": detection_generator
+    }
+
+    # model components
+    self._backbone = backbone
+    self._decoder = decoder
+    self._head = head
+    self._detection_generator = detection_generator
+
+  def call(self, inputs, training=False):
+    maps = self._backbone(inputs)
+    decoded_maps = self._decoder(maps)
+    raw_predictions = self._head(decoded_maps)
+    if training:
+      return {"raw_output": raw_predictions}
+    else:
+      # Post-processing.
+      predictions = self._detection_generator(raw_predictions)
+      predictions.update({"raw_output": raw_predictions})
+      return predictions
+
+  @property
+  def backbone(self):
+    return self._backbone
+
+  @property
+  def decoder(self):
+    return self._decoder
+
+  @property
+  def head(self):
+    return self._head
+
+  @property
+  def detection_generator(self):
+    return self._detection_generator
+
+  def get_config(self):
+    return self._config_dict
+
+  @classmethod
+  def from_config(cls, config):
+    return cls(**config)
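
Note on `Yolo.call`: the control flow can be illustrated without TensorFlow. The sketch below is a plain-Python stand-in, not the class from this commit. The stub components and the output keys `bbox`, `classes` and `confidence` are hypothetical placeholders chosen only to show which keys appear in training versus inference.

```python
class YoloSketch:
    """Plain-Python stand-in that mirrors the branching in `Yolo.call`."""

    def __init__(self, backbone, decoder, head, detection_generator):
        self._backbone = backbone
        self._decoder = decoder
        self._head = head
        self._detection_generator = detection_generator

    def __call__(self, inputs, training=False):
        maps = self._backbone(inputs)
        decoded_maps = self._decoder(maps)
        raw_predictions = self._head(decoded_maps)
        if training:
            # Training returns only the raw head output for the loss.
            return {"raw_output": raw_predictions}
        # Inference adds post-processed detections and keeps the raw output.
        predictions = self._detection_generator(raw_predictions)
        predictions.update({"raw_output": raw_predictions})
        return predictions


# Hypothetical stub components; real ones would be Keras models.
model = YoloSketch(
    backbone=lambda x: {"3": x, "4": x, "5": x},
    decoder=lambda maps: maps,
    head=lambda maps: {level: f"raw[{level}]" for level in maps},
    detection_generator=lambda raw: {
        "bbox": [], "classes": [], "confidence": []},
)

train_out = model("image", training=True)
infer_out = model("image", training=False)
print(sorted(train_out))
print(sorted(infer_out))
```

The detection generator runs only when `training` is False, so a training step does not pay for post-processing.
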
--- a/official/vision/beta/projects/yolo/ops/box_ops.py
+++ b/official/vision/beta/projects/yolo/ops/box_ops.py
@@ -12,28 +12,21 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-"""Bounding box utils."""
-
+"""Yolo box ops."""
 import math
-
 import tensorflow as tf
+from official.vision.beta.projects.yolo.ops import math_ops


 def yxyx_to_xcycwh(box: tf.Tensor):
-  """Converts boxes from ymin, xmin, ymax, xmax.
-
-  to x_center, y_center, width, height.
+  """Converts boxes from yxyx to x_center, y_center, width, height.

  Args:
-    box: `Tensor` whose shape is [..., 4] and represents the coordinates
-      of boxes in ymin, xmin, ymax, xmax.
+    box: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes in ymin, xmin, ymax, xmax.

  Returns:
-    `Tensor` whose shape is [..., 4] and contains the new format.
-
-  Raises:
-    ValueError: If the last dimension of box is not 4 or if box's dtype isn't
-      a floating point type.
+    box: a `Tensor` whose shape is the same as `box` in new format.
  """
  with tf.name_scope('yxyx_to_xcycwh'):
    ymin, xmin, ymax, xmax = tf.split(box, 4, axis=-1)
@@ -45,23 +38,9 @@ def yxyx_to_xcycwh(box: tf.Tensor):
  return box


-def xcycwh_to_yxyx(box: tf.Tensor, split_min_max: bool = False):
-  """Converts boxes from x_center, y_center, width, height.
-
-  to ymin, xmin, ymax, xmax.
-
-  Args:
-    box: a `Tensor` whose shape is [..., 4] and represents the coordinates
-      of boxes in x_center, y_center, width, height.
-    split_min_max: bool, whether or not to split x, y min and max values.
-
-  Returns:
-    box: a `Tensor` whose shape is [..., 4] and contains the new format.
-
-  Raises:
-    ValueError: If the last dimension of box is not 4 or if box's dtype isn't
-      a floating point type.
-  """
+@tf.custom_gradient
+def _xcycwh_to_yxyx(box: tf.Tensor, scale):
+  """Private function to allow custom gradients with defaults."""
  with tf.name_scope('xcycwh_to_yxyx'):
    xy, wh = tf.split(box, 2, axis=-1)
    xy_min = xy - wh / 2
@@ -69,229 +48,297 @@ def xcycwh_to_yxyx(box: tf.Tensor, split_min_max: bool = False):
    x_min, y_min = tf.split(xy_min, 2, axis=-1)
    x_max, y_max = tf.split(xy_max, 2, axis=-1)
    box = tf.concat([y_min, x_min, y_max, x_max], axis=-1)
-    if split_min_max:
-      box = tf.split(box, 2, axis=-1)
-  return box

+    def delta(dbox):
+      # y_min = top, x_min = left, y_max = bottom, x_max = right
+      dt, dl, db, dr = tf.split(dbox, 4, axis=-1)
+      dx = dl + dr
+      dy = dt + db
+      dw = (dr - dl) / scale
+      dh = (db - dt) / scale
+
+      dbox = tf.concat([dx, dy, dw, dh], axis=-1)
+      return dbox, 0.0

-def xcycwh_to_xyxy(box: tf.Tensor, split_min_max: bool = False):
-  """Converts boxes from x_center, y_center, width, height to.
+  return box, delta

-  xmin, ymin, xmax, ymax.
+
+def xcycwh_to_yxyx(box: tf.Tensor, darknet=False):
+  """Converts boxes from x_center, y_center, width, height to yxyx format.

  Args:
-    box: box: a `Tensor` whose shape is [..., 4] and represents the
-      coordinates of boxes in x_center, y_center, width, height.
-    split_min_max: bool, whether or not to split x, y min and max values.
+    box: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes in x_center, y_center, width, height.
+    darknet: `bool`, if True a scale of 1.0 is used.

  Returns:
-    box: a `Tensor` whose shape is [..., 4] and contains the new format.
-
-  Raises:
-    ValueError: If the last dimension of box is not 4 or if box's dtype isn't
-      a floating point type.
+    box: a `Tensor` whose shape is the same as `box` in new format.
  """
-  with tf.name_scope('xcycwh_to_yxyx'):
-    xy, wh = tf.split(box, 2, axis=-1)
-    xy_min = xy - wh / 2
-    xy_max = xy + wh / 2
-    box = (xy_min, xy_max)
-    if not split_min_max:
-      box = tf.concat(box, axis=-1)
+  if darknet:
+    scale = 1.0
+  else:
+    scale = 2.0
+  box = _xcycwh_to_yxyx(box, scale)
  return box
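
Note on the box conversion and its custom gradient: the following is a scalar, plain-Python sketch of the forward conversion and of the backward rule in `delta`, written for a single box rather than a tensor. The function name `corner_grad_to_center_grad` is an illustrative assumption.

```python
def xcycwh_to_yxyx(box):
    """Forward pass: (x_center, y_center, width, height) -> corners."""
    x, y, w, h = box
    return (y - h / 2, x - w / 2, y + h / 2, x + w / 2)


def yxyx_to_xcycwh(box):
    """Inverse conversion: (ymin, xmin, ymax, xmax) -> center and size."""
    ymin, xmin, ymax, xmax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin)


def corner_grad_to_center_grad(dbox, scale=2.0):
    """Backward pass mirroring `delta` in the diff.

    dbox holds upstream gradients for (top, left, bottom, right).
    """
    dt, dl, db, dr = dbox
    return (dl + dr, dt + db, (dr - dl) / scale, (db - dt) / scale)


box = (4.0, 3.0, 2.0, 6.0)  # x_center, y_center, width, height
corners = xcycwh_to_yxyx(box)
round_trip = yxyx_to_xcycwh(corners)

# Upstream gradient of 1.0 on x_max only.
grad_default = corner_grad_to_center_grad((0.0, 0.0, 0.0, 1.0), scale=2.0)
grad_darknet = corner_grad_to_center_grad((0.0, 0.0, 0.0, 1.0), scale=1.0)
print(corners, round_trip, grad_default, grad_darknet)
```

Since `x_max = x_center + width / 2`, a scale of 2.0 reproduces the analytic derivative with respect to width and height. With `darknet=True` the scale is 1.0, which doubles those two gradient components.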


-def center_distance(center_1: tf.Tensor, center_2: tf.Tensor):
-  """Calculates the squared distance between two points.
+# IOU
+def intersect_and_union(box1, box2, yxyx=False):
+  """Calculates the intersection and union between box1 and box2.
+
+  Args:
+    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    yxyx: a `bool`; if True, the input boxes are in y_min, x_min, y_max, x_max
+      format, otherwise in x_center, y_center, width, height format.
+
+  Returns:
+    intersection: a `Tensor` that represents the intersection area.
+    union: a `Tensor` that represents the union area.
+  """
+
+  if not yxyx:
+    box1 = xcycwh_to_yxyx(box1)
+    box2 = xcycwh_to_yxyx(box2)
+
+  b1mi, b1ma = tf.split(box1, 2, axis=-1)
+  b2mi, b2ma = tf.split(box2, 2, axis=-1)
+  intersect_mins = tf.math.maximum(b1mi, b2mi)
+  intersect_maxes = tf.math.minimum(b1ma, b2ma)
+  intersect_wh = tf.math.maximum(intersect_maxes - intersect_mins, 0.0)
+  intersection = tf.reduce_prod(intersect_wh, axis=-1)
+
+  box1_area = tf.reduce_prod(b1ma - b1mi, axis=-1)
+  box2_area = tf.reduce_prod(b2ma - b2mi, axis=-1)
+  union = box1_area + box2_area - intersection
+  return intersection, union

-  This function is mathematically equivalent to the following code, but has
-  smaller rounding errors.

-  tf.norm(center_1 - center_2, axis=-1)**2
+def smallest_encompassing_box(box1, box2, yxyx=False):
+  """Calculates the smallest box that encompasses box1 and box2.

  Args:
-    center_1: a `Tensor` whose shape is [..., 2] and represents a point.
-    center_2: a `Tensor` whose shape is [..., 2] and represents a point.
+    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    yxyx: a `bool`; if True, the input boxes are in y_min, x_min, y_max, x_max
+      format, otherwise in x_center, y_center, width, height format.

  Returns:
-    dist: a `Tensor` whose shape is [...] and value represents the squared
-      distance between center_1 and center_2.
-
-  Raises:
-    ValueError: If the last dimension of either center_1 or center_2 is not 2.
+    box_c: a `Tensor` whose last dimension is 4 representing the coordinates of
+      the enclosing boxes. The return format is y_min, x_min, y_max, x_max if
+      yxyx is True; in other words, it matches the input format.
  """
-  with tf.name_scope('center_distance'):
-    dist = (center_1[..., 0] - center_2[..., 0])**2 + (center_1[..., 1] -
-                                                       center_2[..., 1])**2
-  return dist
+  if not yxyx:
+    box1 = xcycwh_to_yxyx(box1)
+    box2 = xcycwh_to_yxyx(box2)
+
+  b1mi, b1ma = tf.split(box1, 2, axis=-1)
+  b2mi, b2ma = tf.split(box2, 2, axis=-1)
+
+  bcmi = tf.math.minimum(b1mi, b2mi)
+  bcma = tf.math.maximum(b1ma, b2ma)
+
+  bca = tf.reduce_prod(bcma - bcmi, keepdims=True, axis=-1)
+  box_c = tf.concat([bcmi, bcma], axis=-1)
+
+  if not yxyx:
+    box_c = yxyx_to_xcycwh(box_c)
+
+  box_c = tf.where(bca == 0.0, tf.zeros_like(box_c), box_c)
+  return box_c


 def compute_iou(box1, box2, yxyx=False):
-  """Calculates the intersection of union between box1 and box2.
+  """Calculates the intersection over union between box1 and box2.

  Args:
-    box1: a `Tensor` whose shape is [..., 4] and represents the coordinates of
-      boxes in x_center, y_center, width, height.
-    box2: a `Tensor` whose shape is [..., 4] and represents the coordinates of
-      boxes in x_center, y_center, width, height.
-    yxyx: `bool`, whether or not box1, and box2 are in yxyx format.
+    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    yxyx: a `bool`; if True, the input boxes are in y_min, x_min, y_max, x_max
+      format, otherwise in x_center, y_center, width, height format.

  Returns:
-    iou: a `Tensor` whose shape is [...] and value represents the intersection
-      over union.
-
-  Raises:
-    ValueError: If the last dimension of either box1 or box2 is not 4.
+    iou: a `Tensor` that represents the intersection over union.
  """
-  # Get box corners
+  # get box corners
  with tf.name_scope('iou'):
-    if not yxyx:
-      box1 = xcycwh_to_yxyx(box1)
-      box2 = xcycwh_to_yxyx(box2)
-
-    b1mi, b1ma = tf.split(box1, 2, axis=-1)
-    b2mi, b2ma = tf.split(box2, 2, axis=-1)
-    intersect_mins = tf.math.maximum(b1mi, b2mi)
-    intersect_maxes = tf.math.minimum(b1ma, b2ma)
-    intersect_wh = tf.math.maximum(intersect_maxes - intersect_mins,
-                                   tf.zeros_like(intersect_mins))
-    intersection = tf.reduce_prod(
-        intersect_wh, axis=-1)  # intersect_wh[..., 0] * intersect_wh[..., 1]
-
-    box1_area = tf.math.abs(tf.reduce_prod(b1ma - b1mi, axis=-1))
-    box2_area = tf.math.abs(tf.reduce_prod(b2ma - b2mi, axis=-1))
-    union = box1_area + box2_area - intersection
-
-    iou = intersection / (union + 1e-7)
-    iou = tf.clip_by_value(iou, clip_value_min=0.0, clip_value_max=1.0)
+    intersection, union = intersect_and_union(box1, box2, yxyx=yxyx)
+    iou = math_ops.divide_no_nan(intersection, union)
+    iou = math_ops.rm_nan_inf(iou, val=0.0)
  return iou
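
Note on `intersect_and_union` and `compute_iou`: a worked scalar example may help when checking the tensor code by hand. The sketch below is plain Python for one pair of boxes in (ymin, xmin, ymax, xmax) format and returns 0.0 when the union is zero, which is the behavior the safe divide is meant to provide.

```python
def intersect_and_union(box1, box2):
    """Returns (intersection, union) for boxes in (ymin, xmin, ymax, xmax)."""
    overlap_h = max(min(box1[2], box2[2]) - max(box1[0], box2[0]), 0.0)
    overlap_w = max(min(box1[3], box2[3]) - max(box1[1], box2[1]), 0.0)
    intersection = overlap_h * overlap_w
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return intersection, area1 + area2 - intersection


def compute_iou(box1, box2):
    """Intersection over union, with 0.0 for a zero-area union."""
    intersection, union = intersect_and_union(box1, box2)
    return 0.0 if union == 0.0 else intersection / union


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1).
iou_overlap = compute_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
# Two degenerate boxes: the union is zero, so the result is 0.0.
iou_empty = compute_iou((0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0))
print(iou_overlap, iou_empty)
```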


-def compute_giou(box1, box2):
-  """Calculates the generalized intersection of union between box1 and box2.
+def compute_giou(box1, box2, yxyx=False, darknet=False):
+  """Calculates the generalized intersection over union between box1 and box2.

  Args:
-    box1: a `Tensor` whose shape is [..., 4] and represents the coordinates of
-      boxes in x_center, y_center, width, height.
-    box2: a `Tensor` whose shape is [..., 4] and represents the coordinates of
-      boxes in x_center, y_center, width, height.
+    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    yxyx: a `bool`; if True, the input boxes are in y_min, x_min, y_max, x_max
+      format, otherwise in x_center, y_center, width, height format.
+    darknet: a `bool` indicating whether the calling function is the YOLO
+      darknet loss.

  Returns:
-    iou: a `Tensor` whose shape is [...] and value represents the generalized
-      intersection over union.
-
-  Raises:
-    ValueError: If the last dimension of either box1 or box2 is not 4.
+    iou: a `Tensor` that represents the intersection over union.
+    giou: a `Tensor` that represents the generalized intersection over union.
  """
  with tf.name_scope('giou'):
-    # get box corners
-    box1 = xcycwh_to_yxyx(box1)
-    box2 = xcycwh_to_yxyx(box2)
-
-    # compute IOU
-    intersect_mins = tf.math.maximum(box1[..., 0:2], box2[..., 0:2])
-    intersect_maxes = tf.math.minimum(box1[..., 2:4], box2[..., 2:4])
-    intersect_wh = tf.math.maximum(intersect_maxes - intersect_mins,
-                                   tf.zeros_like(intersect_mins))
-    intersection = intersect_wh[..., 0] * intersect_wh[..., 1]
-
-    box1_area = tf.math.abs(
-        tf.reduce_prod(box1[..., 2:4] - box1[..., 0:2], axis=-1))
-    box2_area = tf.math.abs(
-        tf.reduce_prod(box2[..., 2:4] - box2[..., 0:2], axis=-1))
-    union = box1_area + box2_area - intersection
+    # get IOU
+    if not yxyx:
+      box1 = xcycwh_to_yxyx(box1, darknet=darknet)
+      box2 = xcycwh_to_yxyx(box2, darknet=darknet)
+      yxyx = True

-    iou = tf.math.divide_no_nan(intersection, union)
-    iou = tf.clip_by_value(iou, clip_value_min=0.0, clip_value_max=1.0)
+    intersection, union = intersect_and_union(box1, box2, yxyx=yxyx)
+    iou = math_ops.divide_no_nan(intersection, union)
+    iou = math_ops.rm_nan_inf(iou, val=0.0)

    # find the smallest box to encompass both box1 and box2
-    c_mins = tf.math.minimum(box1[..., 0:2], box2[..., 0:2])
-    c_maxes = tf.math.maximum(box1[..., 2:4], box2[..., 2:4])
-    c = tf.math.abs(tf.reduce_prod(c_mins - c_maxes, axis=-1))
+    boxc = smallest_encompassing_box(box1, box2, yxyx=yxyx)
+    if yxyx:
+      boxc = yxyx_to_xcycwh(boxc)
+    _, cwch = tf.split(boxc, 2, axis=-1)
+    c = tf.math.reduce_prod(cwch, axis=-1)

    # compute giou
-    giou = iou - tf.math.divide_no_nan((c - union), c)
+    regularization = math_ops.divide_no_nan((c - union), c)
+    giou = iou - regularization
+    giou = tf.clip_by_value(giou, clip_value_min=-1.0, clip_value_max=1.0)
  return iou, giou
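
Note on the GIoU term: the value is the IoU minus the fraction of the smallest enclosing box that is not covered by the union. The sketch below is a plain-Python, single-pair version in (ymin, xmin, ymax, xmax) format, intended only as a hand-checkable reference.

```python
def compute_giou(box1, box2):
    """Returns (iou, giou) for boxes in (ymin, xmin, ymax, xmax)."""
    overlap_h = max(min(box1[2], box2[2]) - max(box1[0], box2[0]), 0.0)
    overlap_w = max(min(box1[3], box2[3]) - max(box1[1], box2[1]), 0.0)
    intersection = overlap_h * overlap_w
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    iou = 0.0 if union == 0.0 else intersection / union

    # Area of the smallest box that encloses both inputs.
    c_h = max(box1[2], box2[2]) - min(box1[0], box2[0])
    c_w = max(box1[3], box2[3]) - min(box1[1], box2[1])
    c = c_h * c_w

    regularization = 0.0 if c == 0.0 else (c - union) / c
    giou = max(-1.0, min(1.0, iou - regularization))
    return iou, giou


# Overlapping boxes: iou = 1/7, enclosing area 9, giou = 1/7 - 2/9.
iou_a, giou_a = compute_giou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
# Disjoint boxes: iou = 0, enclosing area 3, union 2, giou = -1/3.
iou_b, giou_b = compute_giou((0.0, 0.0, 1.0, 1.0), (0.0, 2.0, 1.0, 3.0))
print(iou_a, giou_a, iou_b, giou_b)
```

Unlike plain IoU, the value stays informative for disjoint boxes: it becomes more negative as the gap between them grows.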


-def compute_diou(box1, box2):
-  """Calculates the distance intersection of union between box1 and box2.
+def compute_diou(box1, box2, beta=1.0, yxyx=False, darknet=False):
+  """Calculates the distance intersection over union between box1 and box2.

  Args:
-    box1: a `Tensor` whose shape is [..., 4] and represents the coordinates of
-      boxes in x_center, y_center, width, height.
-    box2: a `Tensor` whose shape is [..., 4] and represents the coordinates of
-      boxes in x_center, y_center, width, height.
+    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    beta: a `float` indicating the amount to scale the distance iou
+      regularization term.
+    yxyx: a `bool`; if True, the input boxes are in y_min, x_min, y_max, x_max
+      format, otherwise in x_center, y_center, width, height format.
+    darknet: a `bool` indicating whether the calling function is the YOLO
+      darknet loss.

  Returns:
-    iou: a `Tensor` whose shape is [...] and value represents the distance
-      intersection over union.
-
-  Raises:
-    ValueError: If the last dimension of either box1 or box2 is not 4.
+    iou: a `Tensor` that represents the intersection over union.
+    diou: a `Tensor` that represents the distance intersection over union.
  """
  with tf.name_scope('diou'):
    # compute center distance
-    dist = center_distance(box1[..., 0:2], box2[..., 0:2])
+    if not yxyx:
+      box1 = xcycwh_to_yxyx(box1, darknet=darknet)
+      box2 = xcycwh_to_yxyx(box2, darknet=darknet)
+      yxyx = True
+
+    intersection, union = intersect_and_union(box1, box2, yxyx=yxyx)
+    boxc = smallest_encompassing_box(box1, box2, yxyx=yxyx)
+
+    iou = math_ops.divide_no_nan(intersection, union)
+    iou = math_ops.rm_nan_inf(iou, val=0.0)
+    if yxyx:
+      boxc = yxyx_to_xcycwh(boxc)
+      box1 = yxyx_to_xcycwh(box1)
+      box2 = yxyx_to_xcycwh(box2)
+
+    b1xy, _ = tf.split(box1, 2, axis=-1)
+    b2xy, _ = tf.split(box2, 2, axis=-1)
+    _, bcwh = tf.split(boxc, 2, axis=-1)
+
+    center_dist = tf.reduce_sum((b1xy - b2xy)**2, axis=-1)
+    c_diag = tf.reduce_sum(bcwh**2, axis=-1)
+
+    regularization = math_ops.divide_no_nan(center_dist, c_diag)
+    diou = iou - regularization**beta
+    diou = tf.clip_by_value(diou, clip_value_min=-1.0, clip_value_max=1.0)
+  return iou, diou

-    # get box corners
-    box1 = xcycwh_to_yxyx(box1)
-    box2 = xcycwh_to_yxyx(box2)

-    # compute IOU
-    intersect_mins = tf.math.maximum(box1[..., 0:2], box2[..., 0:2])
-    intersect_maxes = tf.math.minimum(box1[..., 2:4], box2[..., 2:4])
-    intersect_wh = tf.math.maximum(intersect_maxes - intersect_mins,
-                                   tf.zeros_like(intersect_mins))
-    intersection = intersect_wh[..., 0] * intersect_wh[..., 1]
+def compute_ciou(box1, box2, yxyx=False, darknet=False):
+  """Calculates the complete intersection over union between box1 and box2.

-    box1_area = tf.math.abs(
-        tf.reduce_prod(box1[..., 2:4] - box1[..., 0:2], axis=-1))
-    box2_area = tf.math.abs(
-        tf.reduce_prod(box2[..., 2:4] - box2[..., 0:2], axis=-1))
-    union = box1_area + box2_area - intersection
+  Args:
+    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
+      boxes.
+    yxyx: a `bool`; if True, the input boxes are in y_min, x_min, y_max, x_max
+      format, otherwise in x_center, y_center, width, height format.
+    darknet: a `bool` indicating whether the calling function is the YOLO
+      darknet loss.

-    iou = tf.math.divide_no_nan(intersection, union)
-    iou = tf.clip_by_value(iou, clip_value_min=0.0, clip_value_max=1.0)
+  Returns:
+    iou: a `Tensor` that represents the intersection over union.
+    ciou: a `Tensor` that represents the complete intersection over union.
+  """
+  with tf.name_scope('ciou'):
+    # compute DIOU and IOU
+    iou, diou = compute_diou(box1, box2, yxyx=yxyx, darknet=darknet)

-    # compute max diagnal of the smallest enclosing box
-    c_mins = tf.math.minimum(box1[..., 0:2], box2[..., 0:2])
-    c_maxes = tf.math.maximum(box1[..., 2:4], box2[..., 2:4])
+    if yxyx:
+      box1 = yxyx_to_xcycwh(box1)
+      box2 = yxyx_to_xcycwh(box2)

-    diag_dist = tf.reduce_sum((c_maxes - c_mins)**2, axis=-1)
+    _, _, b1w, b1h = tf.split(box1, 4, axis=-1)
+    _, _, b2w, b2h = tf.split(box2, 4, axis=-1)

-    regularization = tf.math.divide_no_nan(dist, diag_dist)
-    diou = iou + regularization
-  return iou, diou
+    # compute aspect ratio consistency
+    terma = tf.cast(math_ops.divide_no_nan(b1w, b1h), tf.float32)
+    termb = tf.cast(math_ops.divide_no_nan(b2w, b2h), tf.float32)
+    arcterm = tf.square(tf.math.atan(terma) - tf.math.atan(termb))
+    v = tf.squeeze(4 * arcterm / (math.pi**2), axis=-1)
+    v = tf.cast(v, b1w.dtype)
+
+    a = tf.stop_gradient(math_ops.divide_no_nan(v, ((1 - iou) + v)))
+    ciou = diou - (v * a)
+    ciou = tf.clip_by_value(ciou, clip_value_min=-1.0, clip_value_max=1.0)
+  return iou, ciou
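
Note on the DIoU and CIoU terms: the sketch below is a plain-Python, single-pair reference that follows the standard definitions, with the aspect-ratio term computed from each box's own width and height. The helper names `safe_div` and `iou_diou_ciou` are illustrative assumptions. Boxes are in (ymin, xmin, ymax, xmax) format.

```python
import math


def safe_div(a, b):
    """Division that returns 0.0 when the denominator is zero."""
    return 0.0 if b == 0.0 else a / b


def iou_diou_ciou(box1, box2, beta=1.0):
    """Returns (iou, diou, ciou) for boxes in (ymin, xmin, ymax, xmax)."""
    overlap_h = max(min(box1[2], box2[2]) - max(box1[0], box2[0]), 0.0)
    overlap_w = max(min(box1[3], box2[3]) - max(box1[1], box2[1]), 0.0)
    intersection = overlap_h * overlap_w
    h1, w1 = box1[2] - box1[0], box1[3] - box1[1]
    h2, w2 = box2[2] - box2[0], box2[3] - box2[1]
    iou = safe_div(intersection, h1 * w1 + h2 * w2 - intersection)

    # Squared center distance over squared diagonal of the enclosing box.
    dy = (box1[0] + box1[2]) / 2 - (box2[0] + box2[2]) / 2
    dx = (box1[1] + box1[3]) / 2 - (box2[1] + box2[3]) / 2
    c_h = max(box1[2], box2[2]) - min(box1[0], box2[0])
    c_w = max(box1[3], box2[3]) - min(box1[1], box2[1])
    regularization = safe_div(dx**2 + dy**2, c_w**2 + c_h**2)
    diou = iou - regularization**beta

    # Aspect-ratio consistency term and its weight.
    angle_gap = math.atan(safe_div(w1, h1)) - math.atan(safe_div(w2, h2))
    v = 4 * angle_gap**2 / math.pi**2
    alpha = safe_div(v, (1 - iou) + v)
    ciou = diou - v * alpha

    def clip(t):
        return max(-1.0, min(1.0, t))

    return iou, clip(diou), clip(ciou)


# Same aspect ratio: the CIoU penalty vanishes, so ciou equals diou.
same_shape = iou_diou_ciou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
# Different aspect ratios: ciou drops below diou.
diff_shape = iou_diou_ciou((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 4.0))
print(same_shape, diff_shape)
```

If both widths and heights were read from the same box, `v` would always be zero and CIoU would reduce to DIoU, so the second case is a useful regression check.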


-def compute_ciou(box1, box2):
-  """Calculates the complete intersection of union between box1 and box2.
+def aggregated_comparitive_iou(boxes1,
+                               boxes2=None,
+                               iou_type=0,
+                               beta=0.6):
+  """Calculates the IOU between two sets of boxes.
+
+  Similar to bbox_overlap but far more versatile.

  Args:
-    box1: a `Tensor` whose shape is [..., 4] and represents the coordinates
-      of boxes in x_center, y_center, width, height.
-    box2: a `Tensor` whose shape is [..., 4] and represents the coordinates of
-      boxes in x_center, y_center, width, height.
+    boxes1: a `Tensor` of shape [batch size, N, 4] representing the coordinates
+      of boxes.
+    boxes2: a `Tensor` of shape [batch size, N, 4] representing the coordinates
+      of boxes.
+    iou_type: `integer` representing the iou version to use, 0 is distance iou,
+      1 is the general iou, 2 is the complete iou, any other number uses the
+      standard iou.
+    beta: `float` for the scaling quantity to apply to distance iou
+      regularization.

  Returns:
-    iou: a `Tensor` whose shape is [...] and value represents the complete
-      intersection over union.
-
-  Raises:
-    ValueError: If the last dimension of either box1 or box2 is not 4.
+    iou: a `Tensor` that represents the intersection over union of the
+      requested type.
  """
-  with tf.name_scope('ciou'):
-    # compute DIOU and IOU
-    iou, diou = compute_diou(box1, box2)
-
-    # computer aspect ratio consistency
-    arcterm = (
-        tf.math.atan(tf.math.divide_no_nan(box1[..., 2], box1[..., 3])) -
-        tf.math.atan(tf.math.divide_no_nan(box2[..., 2], box2[..., 3])))**2
-    v = 4 * arcterm / (math.pi)**2
-
-    # compute IOU regularization
-    a = tf.math.divide_no_nan(v, ((1 - iou) + v))
-    ciou = diou + v * a
-  return iou, ciou
+  boxes1 = tf.expand_dims(boxes1, axis=-2)
+
+  if boxes2 is not None:
+    boxes2 = tf.expand_dims(boxes2, axis=-3)
+  else:
+    boxes2 = tf.transpose(boxes1, perm=(0, 2, 1, 3))
+
+  if iou_type == 0:  # diou
+    _, iou = compute_diou(boxes1, boxes2, beta=beta, yxyx=True)
+  elif iou_type == 1:  # giou
+    _, iou = compute_giou(boxes1, boxes2, yxyx=True)
+  elif iou_type == 2:  # ciou
+    _, iou = compute_ciou(boxes1, boxes2, yxyx=True)
+  else:
+    iou = compute_iou(boxes1, boxes2, yxyx=True)
+  return iou
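
Note on the pairwise comparison: the two `tf.expand_dims` calls reshape `[batch, N, 4]` and `[batch, M, 4]` inputs to `[batch, N, 1, 4]` and `[batch, 1, M, 4]`, so broadcasting should yield one value per (i, j) pair. The sketch below shows the equivalent computation for a single batch element in plain Python, using standard IoU (the fallback branch). The name `pairwise_iou` is an illustrative assumption.

```python
def iou(box1, box2):
    """Standard IoU for boxes in (ymin, xmin, ymax, xmax)."""
    overlap_h = max(min(box1[2], box2[2]) - max(box1[0], box2[0]), 0.0)
    overlap_w = max(min(box1[3], box2[3]) - max(box1[1], box2[1]), 0.0)
    intersection = overlap_h * overlap_w
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return 0.0 if union == 0.0 else intersection / union


def pairwise_iou(boxes1, boxes2=None):
    """Returns matrix[i][j] = iou(boxes1[i], boxes2[j]).

    When boxes2 is omitted, boxes1 is compared against itself.
    """
    if boxes2 is None:
        boxes2 = boxes1
    return [[iou(a, b) for b in boxes2] for a in boxes1]


boxes = [
    (0.0, 0.0, 2.0, 2.0),
    (1.0, 1.0, 3.0, 3.0),
    (5.0, 5.0, 6.0, 6.0),
]
matrix = pairwise_iou(boxes)
for row in matrix:
    print(row)
```

A self-comparison matrix of this kind is the typical input to non-maximum suppression, where each row is thresholded against a chosen overlap limit.
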
--- a/official/vision/beta/projects/yolo/ops/box_ops_test.py
+++ b/official/vision/beta/projects/yolo/ops/box_ops_test.py
@@ -12,6 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+"""box_ops tests."""
 from absl.testing import parameterized
 import numpy as np
 import tensorflow as tf
@@ -27,10 +28,8 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
    expected_shape = np.array([num_boxes, 4])
    xywh_box = box_ops.yxyx_to_xcycwh(boxes)
    yxyx_box = box_ops.xcycwh_to_yxyx(boxes)
-    xyxy_box = box_ops.xcycwh_to_xyxy(boxes)
    self.assertAllEqual(tf.shape(xywh_box).numpy(), expected_shape)
    self.assertAllEqual(tf.shape(yxyx_box).numpy(), expected_shape)
-    self.assertAllEqual(tf.shape(xyxy_box).numpy(), expected_shape)

  @parameterized.parameters((1), (5), (7))
  def test_ious(self, num_boxes):
@@ -51,6 +50,5 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase):
    self.assertArrayNear(ciou, expected_iou, 0.001)
    self.assertArrayNear(diou, expected_iou, 0.001)

-
 if __name__ == '__main__':
  tf.test.main()
--- a/official/vision/beta/projects/yolo/ops/math_ops.py
+++ b/official/vision/beta/projects/yolo/ops/math_ops.py
+# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""A set of private math operations used to safely implement the YOLO loss."""
+import tensorflow as tf
+
+
+def rm_nan_inf(x, val=0.0):
+  """Remove nan and infinity.
+
+  Args:
+    x: any `Tensor` of any type.
+    val: value to replace nan and infinity with.
+
+  Returns:
+    a `Tensor` with nan and infinity removed.
+  """
+  cond = tf.math.logical_or(tf.math.is_nan(x), tf.math.is_inf(x))
+  val = tf.cast(val, dtype=x.dtype)
+  x = tf.where(cond, val, x)
+  return x
+
+
+def rm_nan(x, val=0.0):
+  """Remove nan.
+
+  Args:
+    x: any `Tensor` of any type.
+    val: value to replace nan with.
+
+  Returns:
+    a `Tensor` with nan removed.
+  """
+  cond = tf.math.is_nan(x)
+  val = tf.cast(val, dtype=x.dtype)
+  x = tf.where(cond, val, x)
+  return x
+
+
+def divide_no_nan(a, b):
+  """Nan safe divide operation built to allow model compilation in tflite.
+
+  Args:
+    a: any `Tensor` of any type.
+    b: any `Tensor` of any type with the same shape as tensor a.
+
+  Returns:
+    a `Tensor` representing a divided by b, with 0 wherever b is 0.
+  """
+  zero = tf.cast(0.0, b.dtype)
+  return tf.where(b == zero, zero, a / b)
+
+
+def mul_no_nan(x, y):
+  """Nan safe multiply operation.
+
+  Built to allow model compilation in tflite and
+  to allow one tensor to mask another. Wherever x is zero, the product is
+  replaced with zero. This is required because 0 * nan = nan, which can make
+  computation unstable in cases where a zero is intended to mean ignore.
+
+  Args:
+    x: any `Tensor` of any type.
+    y: any `Tensor` of any type with the same shape as tensor x.
+
+  Returns:
+    a `Tensor` representing x times y, where x is used to safely mask the
+    tensor y.
+  """
+  return tf.where(x == 0, tf.cast(0, x.dtype), x * y)
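Note on the masking helpers above: the reason for `mul_no_nan` is that a zero mask entry does not neutralize a nan under ordinary multiplication. The sketch below reproduces the selection semantics of `mul_no_nan` and `divide_no_nan` on plain Python lists so the effect can be checked without TensorFlow. It is an illustration under that assumption, not the implementation in this change: the list-based functions reuse the names only for readability, and unlike the `tf.where` versions (which compute `x * y` or `a / b` everywhere and then select) they skip the masked elements entirely.

```python
import math


def mul_no_nan(x, y):
  """Elementwise x * y over two equal-length lists, forcing 0 where x == 0."""
  return [0.0 if a == 0 else a * b for a, b in zip(x, y)]


def divide_no_nan(a, b):
  """Elementwise a / b over two equal-length lists, forcing 0 where b == 0."""
  return [0.0 if d == 0 else n / d for n, d in zip(a, b)]


mask = [1.0, 0.0, 1.0]
loss = [2.0, math.nan, 3.0]

# Ordinary multiplication lets the nan leak through the zero mask entry.
naive = [m * v for m, v in zip(mask, loss)]
print(math.isnan(naive[1]))    # True

# The masked version drops the nan because the mask entry is zero.
print(mul_no_nan(mask, loss))  # [2.0, 0.0, 3.0]

# Division by zero yields 0 instead of raising or producing inf/nan.
print(divide_no_nan([1.0, 4.0], [0.0, 2.0]))  # [0.0, 2.0]
```

Only the first argument acts as the mask in `mul_no_nan`: a nan in `x` itself, or a nan in `y` where `x` is nonzero, still propagates.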
--- a/official/vision/beta/projects/yolo/ops/preprocess_ops.py
+++ b/official/vision/beta/projects/yolo/ops/preprocess_ops.py
@@ -194,11 +194,11 @@ def get_best_anchor(y_true, anchors, width=1, height=1):
  """Gets the correct anchor that is associated with each box using IOU.

  Args:
-    y_true: tf.Tensor[] for the list of bounding boxes in the yolo format
+    y_true: `tf.Tensor[]` for the list of bounding boxes in the yolo format.
    anchors: list or tensor for the anchor boxes to be used in prediction
-      found via Kmeans
-    width: int for the image width
-    height: int for the image height
+      found via Kmeans.
+    width: int for the image width.
+    height: int for the image height.

  Returns:
    tf.Tensor: y_true with the anchor associated with each ground truth
@@ -263,7 +263,7 @@ def build_grided_gt(y_true, mask, size, dtype, use_tie_breaker):

  Args:
    y_true: tf.Tensor[] ground truth
-      [box coords[0:4], classes_onehot[0:-1], best_fit_anchor_box]
+      [box coords[0:4], classes_onehot[0:-1], best_fit_anchor_box].
    mask: list of the anchor boxes corresponding to the output,
      ex. [1, 2, 3] tells this layer to predict only the first 3
      anchors in the total.
@@ -273,7 +273,7 @@ def build_grided_gt(y_true, mask, size, dtype, use_tie_breaker):
    use_tie_breaker: boolean value for whether or not to use the tie_breaker.

  Returns:
-    tf.Tensor[] of shape [size, size, #of_anchors, 4, 1, num_classes]
+    tf.Tensor[] of shape [size, size, #of_anchors, 4, 1, num_classes].
  """
  # unpack required components from the input ground truth
  boxes = tf.cast(y_true['bbox'], dtype)
@@ -391,18 +391,18 @@ def build_batch_grided_gt(y_true, mask, size, dtype, use_tie_breaker):

  Args:
    y_true: tf.Tensor[] ground truth
-      [batch, box coords[0:4], classes_onehot[0:-1], best_fit_anchor_box]
+      [batch, box coords[0:4], classes_onehot[0:-1], best_fit_anchor_box].
    mask: list of the anchor boxes corresponding to the output,
      ex. [1, 2, 3] tells this layer to predict only the first 3 anchors
      in the total.
    size: the dimensions of this output, for regular, it progresses from
-      13, to 26, to 52
-    dtype: expected output datatype
-    use_tie_breaker: boolean value for wether or not to use the tie
-      breaker
+      13, to 26, to 52.
+    dtype: expected output datatype.
+    use_tie_breaker: boolean value for whether or not to use the tie
+      breaker.

  Returns:
-    tf.Tensor[] of shape [batch, size, size, #of_anchors, 4, 1, num_classes]
+    tf.Tensor[] of shape [batch, size, size, #of_anchors, 4, 1, num_classes].
  """
  # unpack required components from the input ground truth
  boxes = tf.cast(y_true['bbox'], dtype)
@@ -521,4 +521,3 @@ def build_batch_grided_gt(y_true, mask, size, dtype, use_tie_breaker):
    update = update.stack()
    full = tf.tensor_scatter_nd_update(full, update_index, update)
  return full
-
--- a/official/vision/beta/projects/yolo/ops/preprocess_ops_test.py
+++ b/official/vision/beta/projects/yolo/ops/preprocess_ops_test.py
@@ -12,6 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+"""preprocess_ops tests."""
 from absl.testing import parameterized
 import numpy as np
 import tensorflow as tf