Commit 64f16d61 authored by Akhil Chinnakotla's avatar Akhil Chinnakotla
Browse files

Grammar & Spelling Fixes

parent c02980f4
......@@ -14,30 +14,30 @@ repository.
## Description
YOLO v1, the original implementation, was released in 2015, providing a groundbreaking
algorithm that would quickly process images and locate objects in a
single pass through the detector. The original implementation used a
backbone derived from state-of-the-art object classifiers of the time, like
[GoogLeNet](https://arxiv.org/abs/1409.4842) and
[VGG](https://arxiv.org/abs/1409.1556). More attention was given to the novel
YOLO Detection head that allowed for Object Detection with a single pass of an
image. Though limited, the network could predict up to 90 bounding boxes per
image, and was tested for about 80 classes per box. Also, the model could only
make predictions at one scale. These attributes made YOLO v1 more
limited and less versatile, so as the years passed, the developers continued to
update and develop this model.
YOLO v3 and v4 serve as the most up-to-date and capable versions of the YOLO
network group. These models use a custom backbone called Darknet53 that uses
knowledge gained from the ResNet paper to improve its predictions. The new backbone
also allows for objects to be detected at multiple scales. As for the new detection head,
the model now predicts the bounding boxes using a set of anchor box priors (Anchor
Boxes) as suggestions. Multiscale predictions in combination with Anchor boxes allow
for the network to make up to 1000 object predictions on a single image. Finally,
the new loss function forces the network to make better predictions by using Intersection
Over Union (IOU) to inform the model's confidence rather than relying on the mean
squared error for the entire output.
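The IOU term described above can be made concrete with a minimal pure-Python sketch (illustration only; the repository's actual implementation is the vectorized TensorFlow version in `box_ops` further down this diff):

```python
def iou(box1, box2):
    """Intersection over union of two boxes in (ymin, xmin, ymax, xmax) format.

    Scalar sketch for illustration; the project's version is vectorized
    over tf.Tensors and numerically guarded.
    """
    ymin = max(box1[0], box2[0])
    xmin = max(box1[1], box2[1])
    ymax = min(box1[2], box2[2])
    xmax = min(box1[3], box2[3])
    # Clamp to zero when the boxes do not overlap.
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return intersection / union if union > 0 else 0.0
```

Using IOU this way lets the confidence target reflect how well a predicted box actually overlaps the ground truth, rather than penalizing raw coordinate error uniformly.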
## Authors
......@@ -56,9 +56,9 @@ the entire output.
## Our Goal
Our goal with this model conversion is to provide implementations of the Backbone
and YOLO Head. We have built the model in such a way that the YOLO head could be
connected to a new, more powerful backbone if a person chose to.
## Models in the library
......
......@@ -35,7 +35,7 @@ class ImageClassificationModel(hyperparams.Config):
type='darknet', darknet=backbones.Darknet())
dropout_rate: float = 0.0
norm_activation: common.NormActivation = common.NormActivation()
# Adds a Batch Normalization layer pre-GlobalAveragePooling in classification.
add_head_batch_norm: bool = False
......
......@@ -16,7 +16,7 @@
"""Contains definitions of Darknet Backbone Networks.
These models are inspired by ResNet and CSPNet.
Residual networks (ResNets) were proposed in:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
......@@ -49,7 +49,7 @@ from official.vision.beta.projects.yolo.modeling.layers import nn_blocks
class BlockConfig:
"""
This is a class to store layer config to make code more readable.
"""
def __init__(self, layer, stack, reps, bottleneck, filters, pool_size,
......@@ -69,7 +69,7 @@ class BlockConfig:
padding: An `int` for the padding to apply to layers in this stack.
activation: A `str` for the activation to use for this stack.
route: An `int` for the level to route from to get the next input.
dilation_rate: An `int` for the scale used in dilated Darknet.
output_name: A `str` for the name to use for this output.
is_output: A `bool` for whether this layer is an output in the default
model.
......@@ -99,9 +99,10 @@ def build_block_specs(config):
class LayerBuilder:
"""
This is a class used for quick look up of the default layers used
by Darknet to connect, introduce, or exit a level. Used in place of an
if condition or switch to make adding new layers easier and to reduce
redundant code.
"""
def __init__(self):
......@@ -377,7 +378,7 @@ BACKBONES = {
@tf.keras.utils.register_keras_serializable(package='yolo')
class Darknet(tf.keras.Model):
""" The Darknet backbone architecture """
""" The Darknet backbone architecture. """
def __init__(
self,
......
......@@ -13,7 +13,7 @@
# limitations under the License.
# Lint as: python3
"""Tests for yolo."""
"""Tests for YOLO."""
from absl.testing import parameterized
import numpy as np
......
......@@ -13,7 +13,7 @@
# limitations under the License.
# Lint as: python3
"""Feature Pyramid Network and Path Aggregation variants used in YOLO"""
"""Feature Pyramid Network and Path Aggregation variants used in YOLO."""
import tensorflow as tf
from official.vision.beta.projects.yolo.modeling.layers import nn_blocks
......@@ -23,8 +23,10 @@ from official.vision.beta.projects.yolo.modeling.layers import nn_blocks
class _IdentityRoute(tf.keras.layers.Layer):
def __init__(self, **kwargs):
"""Private class to mirror the outputs of blocks in nn_blocks for an easier
programatic generation of the feature pyramid network"""
"""
Private class to mirror the outputs of blocks in nn_blocks for easier
programmatic generation of the feature pyramid network.
"""
super().__init__(**kwargs)
......@@ -125,7 +127,7 @@ class YoloFPN(tf.keras.layers.Layer):
# directly connect to an input path and process it
self.preprocessors = dict()
# resample an input and merge it with the output of another path
# in order to aggregate backbone outputs
self.resamples = dict()
# set of convolution layers and upsample layers that are used to
# prepare the FPN processors for output
......@@ -214,7 +216,7 @@ class YoloPAN(tf.keras.layers.Layer):
kernel_initializer: kernel_initializer for convolutional layers.
kernel_regularizer: tf.keras.regularizers.Regularizer object for Conv2D.
bias_regularizer: tf.keras.regularizers.Regularizer object for Conv2D.
fpn_input: `bool`, for whether the input into this function is an FPN or
a backbone.
fpn_filter_scale: `int`, scaling factor for the FPN filters.
**kwargs: keyword arguments to be passed.
......@@ -268,7 +270,7 @@ class YoloPAN(tf.keras.layers.Layer):
# directly connect to an input path and process it
self.preprocessors = dict()
# resample an input and merge it with the output of another path
# in order to aggregate backbone outputs
self.resamples = dict()
# FPN will reverse the key process order for the backbone, so we need
......
......@@ -13,7 +13,7 @@
# limitations under the License.
# Lint as: python3
"""Tests for yolo heads."""
"""Tests for YOLO heads."""
# Import libraries
from absl.testing import parameterized
......@@ -44,7 +44,6 @@ class YoloDecoderTest(parameterized.TestCase, tf.test.TestCase):
inputs[key] = tf.ones(input_shape[key], dtype=tf.float32)
endpoints = head(inputs)
for key in endpoints.keys():
expected_input_shape = input_shape[key]
......
......@@ -14,7 +14,7 @@
# Lint as: python3
"""Contains common building blocks for yolo neural networks."""
"""Contains common building blocks for YOLO neural networks."""
from typing import Callable, List
import tensorflow as tf
from official.modeling import tf_utils
......@@ -35,9 +35,9 @@ class Identity(tf.keras.layers.Layer):
class ConvBN(tf.keras.layers.Layer):
"""
Modified Convolution layer to match that of the Darknet Library.
The Layer is a standard combination of Conv, BatchNorm, and Activation;
however, the use of bias in the Conv is determined by the use of
batch normalization.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
Ping-Yang Chen, Jun-Wei Hsieh
......@@ -71,16 +71,16 @@ class ConvBN(tf.keras.layers.Layer):
use.
padding: string 'valid' or 'same', if same, then pad the image, else do
not.
dilation_rate: tuple to indicate how much to modulate kernel weights and
how many pixels in a feature map to skip.
kernel_initializer: string to indicate which function to use to initialize
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
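The bias rule described in the ConvBN docstring (the conv's bias is determined by the use of batch normalization) can be sketched with a hypothetical, illustration-only helper; `conv_bn_act_config` is not part of the repository:

```python
def conv_bn_act_config(filters, kernel_size, use_bn=True, activation="leaky"):
    """Sketch of the Conv -> BatchNorm -> Activation block configuration.

    Hypothetical helper for illustration only; the real ConvBN is a
    tf.keras.layers.Layer. The key detail is that the conv bias is disabled
    whenever batch normalization is enabled, since BN's learned per-channel
    shift (beta) makes a conv bias redundant.
    """
    return {
        "conv": {"filters": filters, "kernel_size": kernel_size,
                 "use_bias": not use_bn},
        "batch_norm": use_bn,
        "activation": activation,
    }
```

This mirrors the common Keras pattern of passing `use_bias=not use_bn` to `tf.keras.layers.Conv2D` when a BatchNormalization layer follows.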
......@@ -191,7 +191,7 @@ class ConvBN(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class DarkResidual(tf.keras.layers.Layer):
"""
Darknet block with Residual connection for YOLO v3 Backbone
"""
def __init__(self,
......@@ -228,8 +228,6 @@ class DarkResidual(tf.keras.layers.Layer):
(across all input batches).
norm_momentum: float for momentum to use for batch normalization.
norm_epsilon: float for batch normalization epsilon.
leaky_alpha: float to use as alpha if activation function is leaky.
sc_activation: string for activation function to use in layer.
downsample: boolean for if image input is larger than layer output, set
......@@ -352,10 +350,10 @@ class DarkResidual(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class CSPTiny(tf.keras.layers.Layer):
"""
A small size convolution block proposed in the CSPNet. The layer uses shortcuts,
routing (concatenation), and feature grouping in order to improve gradient
variability and allow for high efficiency, low power residual learning for small
networks.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
Ping-Yang Chen, Jun-Wei Hsieh
......@@ -387,11 +385,11 @@ class CSPTiny(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
(across all input batches).
......@@ -401,12 +399,12 @@ class CSPTiny(tf.keras.layers.Layer):
feature stack output.
norm_momentum: float for momentum to use for batch normalization.
norm_epsilon: float for batch normalization epsilon.
downsample: boolean; if the image input is larger than the layer output, set
downsample to True so the dimensions are forced to match.
leaky_alpha: float to use as alpha if activation function is leaky.
sc_activation: string for activation function to use in layer.
conv_activation: string or None for activation function to use in layer,
if None activation is replaced by linear.
**kwargs: Keyword Arguments.
"""
......@@ -505,18 +503,18 @@ class CSPTiny(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class CSPRoute(tf.keras.layers.Layer):
"""
Down sampling layer to take the place of down sampling done in Residual
networks. This is the first of 2 layers needed to convert any Residual Network
model to a CSPNet. At the start of a new level change, this CSPRoute layer
creates a learned identity that will act as a cross stage connection that
is used to inform the inputs to the next stage. This is called cross stage
partial because the number of filters required in every intermittent residual
layer is reduced by half. The sister layer will take the partial generated by
this layer and concatenate it with the output of the final residual layer in the
stack to create a fully feature level output. This concatenation merges the
partial blocks of 2 levels as input to the next allowing the gradients of each
level to be more unique, and reducing the number of parameters required by each
level by 50% while keeping accuracy consistent.
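The partial split and merge described above can be sketched with plain Python lists standing in for channels; `csp_split_merge` is a hypothetical name for illustration, not the repository's Keras layers:

```python
def csp_split_merge(channels, filter_scale=2, stack=None):
    """Illustrates cross stage partial channel flow with plain lists.

    `channels` stands in for a feature map's channel slots; `stack` is the
    residual stack being wrapped. The stack only ever sees
    len(channels) // filter_scale channels (the CSPRoute split), and the
    untouched partial is concatenated back afterwards (the CSPConnect merge),
    which is where the parameter savings come from.
    """
    partial = len(channels) // filter_scale
    route, identity = channels[:partial], channels[partial:]
    # The residual stack processes only the routed half of the channels.
    processed = stack(route) if stack else route
    # CSPConnect: concatenate the untouched partial back onto the output.
    return processed + identity
```

With `filter_scale = 2`, each residual layer in the wrapped stack works on half the channels, matching the roughly 50% parameter reduction claimed above.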
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
......@@ -544,24 +542,24 @@ class CSPRoute(tf.keras.layers.Layer):
"""
Args:
filters: integer for output depth, or the number of features to learn
filter_scale: integer dictating (filters//2) or the number of filters in
the partial feature stack.
activation: string for activation function to use in layer.
kernel_initializer: string to indicate which function to use to
initialize weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
(across all input batches).
norm_momentum: float for momentum to use for batch normalization.
norm_epsilon: float for batch normalization epsilon.
downsample: down_sample the input.
**kwargs: Keyword Arguments.
"""
......@@ -571,7 +569,7 @@ class CSPRoute(tf.keras.layers.Layer):
self._filter_scale = filter_scale
self._activation = activation
# convolution params
self._kernel_initializer = kernel_initializer
self._bias_initializer = bias_initializer
self._kernel_regularizer = kernel_regularizer
......@@ -638,7 +636,7 @@ class CSPRoute(tf.keras.layers.Layer):
class CSPConnect(tf.keras.layers.Layer):
"""
Sister Layer to the CSPRoute layer. Merges the partial feature stacks
generated by the CSPDownsampling layer, and the final output of the
residual stack. Suggested in the CSPNet paper.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
......@@ -675,10 +673,10 @@ class CSPConnect(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global
......@@ -750,13 +748,13 @@ class CSPConnect(tf.keras.layers.Layer):
class CSPStack(tf.keras.layers.Layer):
"""
CSP full stack; combines the route and the connect so you can just quickly
wrap an existing callable or list of layers to make it a cross stage
partial. Added for ease of use. You should be able to wrap any layer
stack with a CSP independent of whether it belongs to the Darknet family. If
filter_scale = 2, then the blocks in the stack passed into the CSP stack
should also have filters = filters/filter_scale.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
Ping-Yang Chen, Jun-Wei Hsieh
......@@ -781,11 +779,10 @@ class CSPStack(tf.keras.layers.Layer):
**kwargs):
"""
Args:
model_to_wrap: callable Model or a list of callable objects that will
process the output of CSPRoute, and be input into CSPConnect.
A list will be called sequentially.
filters: integer for output depth, or the number of features to learn.
filter_scale: integer dictating (filters//2) or the number of filters in
the partial feature stack.
activation: string for activation function to use in layer.
......@@ -793,10 +790,11 @@ class CSPStack(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
downsample: down_sample the input.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
......@@ -891,10 +889,10 @@ class PathAggregationBlock(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
......@@ -905,8 +903,8 @@ class PathAggregationBlock(tf.keras.layers.Layer):
activation: string or None for activation function to use in layer,
if None activation is replaced by linear.
leaky_alpha: float to use as alpha if activation function is leaky.
downsample: `bool` for whether to downsample and merge.
upsample: `bool` for whether to upsample and merge.
upsample_size: `int` how much to upsample in order to match shapes.
**kwargs: Keyword Arguments.
"""
......@@ -1050,7 +1048,7 @@ class PathAggregationBlock(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class SPP(tf.keras.layers.Layer):
"""
A non-aggregated SPP layer that uses Pooling to gain more performance.
"""
def __init__(self, sizes, **kwargs):
......@@ -1090,7 +1088,7 @@ class SAM(tf.keras.layers.Layer):
[1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon
CBAM: Convolutional Block Attention Module. arXiv:1807.06521
Implementation of the Spatial Attention Model (SAM)
"""
def __init__(self,
......@@ -1167,7 +1165,7 @@ class CAM(tf.keras.layers.Layer):
[1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon
CBAM: Convolutional Block Attention Module. arXiv:1807.06521
Implementation of the Channel Attention Model (CAM)
"""
def __init__(self,
......@@ -1253,7 +1251,7 @@ class CBAM(tf.keras.layers.Layer):
[1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon
CBAM: Convolutional Block Attention Module. arXiv:1807.06521
Implementation of the Convolution Block Attention Module (CBAM)
"""
def __init__(self,
......@@ -1321,8 +1319,9 @@ class CBAM(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class DarkRouteProcess(tf.keras.layers.Layer):
"""
Processes darknet outputs and connects the backbone to the head for more
generalizability and abstracts the repetition of DarkConv objects that is
common in YOLO.
It is used like the following:
......@@ -1357,18 +1356,18 @@ class DarkRouteProcess(tf.keras.layers.Layer):
filters: the number of filters to be used in all subsequent layers;
filters should be the depth of the tensor input into this layer,
as no downsampling can be done within this layer object.
repetitions: number of times to repeat the processing nodes
for tiny: 1 repetition, no spp allowed
for spp: insert_spp = True, and allow for 3+ repetitions
for regular: insert_spp = False, and allow for 3+ repetitions.
insert_spp: bool if true add the spatial pyramid pooling layer.
kernel_initializer: method to use to initialize kernel weights.
bias_initializer: method to use to initialize the bias of the conv
layers.
norm_momentum: batch norm parameter; see TensorFlow documentation.
norm_epsilon: batch norm parameter; see TensorFlow documentation.
activation: activation function to use in processing.
leaky_alpha: if leaky activation function, the alpha to use in
processing the relu input.
Returns:
......
......@@ -4,13 +4,13 @@ import math
def yxyx_to_xcycwh(box: tf.Tensor):
"""Converts boxes from ymin, xmin, ymax, xmax to x_center, y_center, width,
"""Converts boxes from ymin, xmin, ymax, xmax to x_center, y_center, width,
height.
Args:
box: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes in ymin, xmin, ymax, xmax.
Returns:
box: a `Tensor` whose shape is the same as `box` in new format.
"""
......@@ -52,13 +52,13 @@ def _xcycwh_to_yxyx(box: tf.Tensor, scale):
def xcycwh_to_yxyx(box: tf.Tensor, darknet=False):
"""Converts boxes from x_center, y_center, width, height to ymin, xmin, ymax,
"""Converts boxes from x_center, y_center, width, height to ymin, xmin, ymax,
xmax.
Args:
box: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes in x_center, y_center, width, height.
Returns:
box: a `Tensor` whose shape is the same as `box` in new format.
"""
......@@ -75,9 +75,9 @@ def intersect_and_union(box1, box2, yxyx=False):
"""Calculates the intersection and union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
......@@ -109,15 +109,15 @@ def smallest_encompassing_box(box1, box2, yxyx=False):
box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
Returns:
box_c: a `Tensor` whose last dimension is 4 representing the coordinates of
boxes; the return format is y_min, x_min, y_max, x_max if yxyx is set
to True. In other words it will match the input format.
"""
......@@ -145,9 +145,9 @@ def compute_iou(box1, box2, yxyx=False):
"""Calculates the intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
......@@ -167,13 +167,13 @@ def compute_giou(box1, box2, yxyx=False, darknet=False):
"""Calculates the General intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
darknet: a `bool` indicating whether the calling function is the YOLO
darknet loss.
Returns:
......@@ -208,15 +208,15 @@ def compute_diou(box1, box2, beta=1.0, yxyx=False, darknet=False):
"""Calculates the distance intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
beta: a `float` indicating the amount to scale the distance iou
regularization term.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
darknet: a `bool` indicating whether the calling function is the YOLO
darknet loss.
Returns:
......@@ -256,13 +256,13 @@ def compute_ciou(box1, box2, yxyx=False, darknet=False):
"""Calculates the complete intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
darknet: a `bool` indicating whether the calling function is the YOLO
darknet loss.
Returns:
......@@ -297,23 +297,22 @@ def aggregated_comparitive_iou(boxes1,
boxes2=None,
iou_type=0,
beta=0.6):
"""Calculates the intersection over union between every box in boxes1 and
"""Calculates the intersection over union between every box in boxes1 and
every box in boxes2.
Args:
boxes1: a `Tensor` of shape [batch size, N, 4] representing the coordinates
of boxes.
boxes2: a `Tensor` of shape [batch size, N, 4] representing the coordinates
of boxes.
iou_type: `integer` representing the iou version to use, 0 is distance iou,
1 is the general iou, 2 is the complete iou, any other number uses the
standard iou.
beta: `float` for the scaling quantity to apply to distance iou
regularization.
Returns:
iou: a `Tensor` that represents the intersection over union of the
expected/input type.
"""
boxes1 = tf.expand_dims(boxes1, axis=-2)
......
"""A set of private math operations used to safely implement the yolo loss"""
"""A set of private math operations used to safely implement the YOLO loss."""
import tensorflow as tf
def rm_nan_inf(x, val=0.0):
"""remove nan and infinity
"""remove nan and infinity
Args:
Return:
a `Tensor` with nan and infinity removed.
......@@ -19,11 +19,11 @@ def rm_nan_inf(x, val=0.0):
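The replacement behavior of `rm_nan_inf` can be sketched over a plain Python list (an illustrative scalar stand-in for the elementwise tensor op, not the library code):

```python
import math

def rm_nan_inf(xs, val=0.0):
    # Replace every nan or +/-inf entry with val, leaving finite values as-is,
    # mirroring the tensor op documented above.
    return [val if (math.isnan(x) or math.isinf(x)) else x for x in xs]

print(rm_nan_inf([1.0, float('nan'), float('inf'), -2.0]))  # [1.0, 0.0, 0.0, -2.0]
```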
def rm_nan(x, val=0.0):
"""Remove nan.
Args:
x: any `Tensor` of any type.
val: value to replace nan.
Return:
a `Tensor` with nan removed.
......@@ -35,32 +35,32 @@ def rm_nan(x, val=0.0):
def divide_no_nan(a, b):
"""NaN-safe divide operation built to allow model compilation in tflite.
Args:
a: any `Tensor` of any type.
b: any `Tensor` of any type with the same shape as tensor a.
Return:
a `Tensor` representing a divided by b, with all nan values removed.
"""
zero = tf.cast(0.0, b.dtype)
return tf.where(b == zero, zero, a / b)
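For scalars, the zero-denominator behavior reduces to the following sketch (illustrative only; the real op applies `tf.where` elementwise over tensors):

```python
def divide_no_nan(a, b):
    # Where the denominator is zero, return 0 instead of inf or nan,
    # matching the tf.where-based implementation above.
    return 0.0 if b == 0.0 else a / b

print(divide_no_nan(6.0, 3.0))  # 2.0
print(divide_no_nan(1.0, 0.0))  # 0.0 rather than inf
print(divide_no_nan(0.0, 0.0))  # 0.0 rather than nan
```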
def mul_no_nan(x, y):
"""NaN-safe multiply operation built to allow model compilation in tflite and
to allow one tensor to mask another. Wherever x is zero the
multiplication is not computed and the value is replaced with a zero. This is
required because 0 * nan = nan, which can make computation unstable in
cases where the intended behavior is for zero to mean ignore.
Args:
x: any `Tensor` of any type.
y: any `Tensor` of any type with the same shape as tensor x.
Return:
a `Tensor` representing x times y, where x is used to safely mask the
tensor y.
"""
return tf.where(x == 0, tf.cast(0, x.dtype), x * y)
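The masking semantics can be checked with a scalar sketch (illustrative; the tensor op applies the same rule elementwise):

```python
import math

def mul_no_nan(x, y):
    # Where x is zero the product is forced to 0, so 0 * nan yields 0
    # instead of nan, letting x safely mask y.
    return 0.0 if x == 0.0 else x * y

print(mul_no_nan(0.0, float('nan')))  # 0.0, while plain 0.0 * nan would be nan
print(mul_no_nan(2.0, 3.0))           # 6.0
```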
......@@ -8,13 +8,12 @@ class TiledNMS():
IOU_TYPES = {'diou': 0, 'giou': 1, 'ciou': 2, 'iou': 3}
def __init__(self, iou_type='diou', beta=0.6):
'''Initialization for all non max suppression operations, mainly used to
select hyperparameters for the iou type and scaling.
Args:
iou_type: `str` for the version of IOU to use {diou, giou, ciou, iou}.
beta: `float` for the amount to scale regularization on distance iou.
'''
self._iou_type = TiledNMS.IOU_TYPES[iou_type]
self._beta = beta
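The string-to-integer lookup above can be exercised standalone; `TiledNMSConfig` is a hypothetical stand-in for the `TiledNMS` constructor, not the class itself:

```python
IOU_TYPES = {'diou': 0, 'giou': 1, 'ciou': 2, 'iou': 3}

class TiledNMSConfig:
    # Resolve the iou variant string to its integer code and store the
    # distance-iou scaling factor beta.
    def __init__(self, iou_type='diou', beta=0.6):
        self._iou_type = IOU_TYPES[iou_type]
        self._beta = beta

cfg = TiledNMSConfig('ciou', beta=0.5)
print(cfg._iou_type)  # 2
```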
......@@ -54,7 +53,7 @@ class TiledNMS():
overlap too much with respect to IOU.
output_size: an int32 tensor of size [batch_size]. Representing the number
of selected boxes for each batch.
idx: an integer scalar representing an induction variable.
Returns:
boxes: updated boxes.
......@@ -111,10 +110,10 @@ class TiledNMS():
Assumption:
* The boxes are sorted by scores unless the box is a dot (all coordinates
are zero).
* Boxes with higher scores can be used to suppress boxes with lower
scores.
The overall design of the algorithm is to handle boxes tile-by-tile:
boxes = boxes.pad_to_multiply_of(tile_size)
num_tiles = len(boxes) // tile_size
......@@ -126,7 +125,7 @@ class TiledNMS():
iou = bbox_overlap(box_tile, suppressing_tile)
# if the box is suppressed in iou, clear it to a dot
box_tile *= _update_boxes(iou)
# Iteratively handle the diagonal tile.
iou = _box_overlap(box_tile, box_tile)
iou_changed = True
while iou_changed:
......@@ -232,16 +231,16 @@ class TiledNMS():
This implementation unrolls classes dimension while using the tf.while_loop
to implement the batched NMS, so that it can be parallelized at the batch
dimension. It should give better performance compared to the v1 implementation.
It is TPU compatible.
Args:
boxes: a tensor with shape [batch_size, N, num_classes, 4] or [batch_size,
N, 1, 4], which stacks box predictions on all feature levels. The N is the
number of total anchors on all levels.
scores: a tensor with shape [batch_size, N, num_classes], which stacks
class probability on all feature levels. The N is the number of total
anchors on all levels. The num_classes is the number of classes the
model predicted. Note that the class_outputs here is the raw score.
pre_nms_top_k: an int number of top candidate detections per class
before NMS.
......@@ -327,21 +326,21 @@ def sorted_non_max_suppression_padded(scores, boxes, max_output_size,
def sort_drop(objectness, box, classificationsi, k):
"""This function sorts and drops boxes such that only the k boxes with the
highest objectness (confidence) scores are kept.
Args:
objectness: a `Tensor` of shape [batch size, N] that needs to be
filtered.
box: a `Tensor` of shape [batch size, N, 4] that needs to be filtered.
classificationsi: a `Tensor` of shape [batch size, N, num_classes] that
needs to be filtered.
k: an `integer` for the maximum number of boxes to keep after filtering.
Return:
objectness: filtered `Tensor` of shape [batch size, k]
boxes: filtered `Tensor` of shape [batch size, k, 4]
classifications: filtered `Tensor` of shape [batch size, k, num_classes]
"""
# find the indices for the boxes based on the scores
objectness, ind = tf.math.top_k(objectness, k=k)
......@@ -364,25 +363,25 @@ def sort_drop(objectness, box, classificationsi, k):
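The sort-and-drop step can be sketched in pure Python (using built-in sorting in place of `tf.math.top_k` and the gather step; names and shapes are illustrative, with plain lists standing in for batched tensors):

```python
def sort_drop(objectness, boxes, classes, k):
    # Rank detections by objectness (descending), keep the top k indices,
    # and gather boxes and class scores with the same permutation.
    order = sorted(range(len(objectness)),
                   key=lambda i: objectness[i], reverse=True)[:k]
    return ([objectness[i] for i in order],
            [boxes[i] for i in order],
            [classes[i] for i in order])

obj, box, cls = sort_drop([0.1, 0.9, 0.5],
                          [[0] * 4, [1] * 4, [2] * 4],
                          [[1], [2], [3]], k=2)
print(obj)  # [0.9, 0.5]
print(box)  # [[1, 1, 1, 1], [2, 2, 2, 2]]
```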
def segment_nms(boxes, classes, confidence, k, iou_thresh):
"""A quick nms that works well for small values of k. It was developed to
operate for tflite models, as the tiled NMS is far too slow and typically
is not able to compile with tflite. This NMS does not account for classes,
and only works to quickly filter boxes on phones.
Args:
boxes: a `Tensor` of shape [batch size, N, 4] that needs to be filtered.
classes: a `Tensor` of shape [batch size, N, num_classes] that needs to be
filtered.
confidence: a `Tensor` of shape [batch size, N] that needs to be
filtered.
k: an `integer` for the maximum number of boxes to keep after filtering.
iou_thresh: a `float` for the value above which boxes are considered to be
too similar, the closer to 1.0 the less that gets through.
Return:
boxes: filtered `Tensor` of shape [batch size, k, 4]
classes: filtered `Tensor` of shape [batch size, k, num_classes]
confidence: filtered `Tensor` of shape [batch size, k]
"""
mrange = tf.range(k)
mask_x = tf.tile(
......@@ -416,27 +415,27 @@ def nms(boxes,
pre_nms_thresh,
nms_thresh,
prenms_top_k=500):
"""A quick nms that works well for small values of k. It was developed to
operate for tflite models, as the tiled NMS is far too slow and typically
is not able to compile with tflite. This NMS does not account for classes,
and only works to quickly filter boxes on phones.
Args:
boxes: a `Tensor` of shape [batch size, N, 4] that needs to be filtered.
classes: a `Tensor` of shape [batch size, N, num_classes] that needs to be
filtered.
confidence: a `Tensor` of shape [batch size, N] that needs to be
filtered.
k: an `integer` for the maximum number of boxes to keep after filtering.
nms_thresh: a `float` for the value above which boxes are considered to be
too similar, the closer to 1.0 the less that gets through.
pre_nms_top_k: an int number of top candidate detections per class
before NMS.
Return:
boxes: filtered `Tensor` of shape [batch size, k, 4]
classes: filtered `Tensor` of shape [batch size, k, num_classes]
confidence: filtered `Tensor` of shape [batch size, k]
"""
# sort the boxes
......
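The class-agnostic filtering that `segment_nms` and `nms` describe can be sketched as a greedy loop over score-sorted boxes (a simplified pure-Python stand-in, not the masked tensor implementation; boxes are (y_min, x_min, y_max, x_max) tuples):

```python
def greedy_nms(boxes, scores, iou_thresh, k):
    # Keep a box only if its IOU with every already-kept box stays below
    # iou_thresh, visiting boxes in descending score order, up to k keeps.
    def iou(a, b):
        y1, x1 = max(a[0], b[0]), max(a[1], b[1])
        y2, x2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, y2 - y1) * max(0.0, x2 - x1)
        area = lambda c: (c[2] - c[0]) * (c[3] - c[1])
        union = area(a) + area(b) - inter
        return inter / union if union else 0.0

    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
        if len(keep) == k:
            break
    return keep

boxes = [(0, 0, 2, 2), (0, 0, 2, 2.1), (5, 5, 7, 7)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, iou_thresh=0.5, k=2))  # [0, 2]
```

The second box overlaps the first almost completely (IOU near 0.95), so it is suppressed, while the distant third box survives.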