Commit 64f16d61 authored by Akhil Chinnakotla

Grammar & Spelling Fixes

parent c02980f4
@@ -14,30 +14,30 @@ repository.
## Description
YOLO v1, the original implementation, was released in 2015, providing a groundbreaking
algorithm that could quickly process images and locate objects in a
single pass through the detector. The original implementation used a
backbone derived from state-of-the-art object classifiers of the time, like
[GoogLeNet](https://arxiv.org/abs/1409.4842) and
[VGG](https://arxiv.org/abs/1409.1556). More attention was given to the novel
YOLO detection head that allowed for object detection with a single pass of an
image. Though limited, the network could predict up to 90 bounding boxes per
image, and was tested for about 80 classes per box. Also, the model could only
make predictions at one scale. These attributes made YOLO v1 more
limited and less versatile, so as the years passed, the developers continued to
update and develop this model.

YOLO v3 and v4 serve as the most up-to-date and capable versions of the YOLO
network group. These models use a custom backbone called Darknet53 that uses
knowledge gained from the ResNet paper to improve their predictions. The new backbone
also allows for objects to be detected at multiple scales. As for the new detection head,
the models now predict the bounding boxes using a set of anchor box priors (Anchor
Boxes) as suggestions. Multiscale predictions in combination with Anchor Boxes allow
the network to make up to 1000 object predictions on a single image. Finally,
the new loss function forces the network to make better predictions by using Intersection
over Union (IOU) to inform the model's confidence rather than relying on the mean
squared error for the entire output.
## Authors
@@ -56,9 +56,9 @@ the entire output.
## Our Goal
Our goal with this model conversion is to provide implementations of the
Backbone and YOLO Head. We have built the model in such a way that the YOLO
head could be connected to a new, more powerful backbone if a person chose to.
## Models in the library
...
@@ -35,7 +35,7 @@ class ImageClassificationModel(hyperparams.Config):
type='darknet', darknet=backbones.Darknet())
dropout_rate: float = 0.0
norm_activation: common.NormActivation = common.NormActivation()
# Adds a Batch Normalization layer pre-GlobalAveragePooling in classification.
add_head_batch_norm: bool = False
...
@@ -16,7 +16,7 @@
"""Contains definitions of Darknet Backbone Networks.
These models are inspired by ResNet and CSPNet.
Residual networks (ResNets) were proposed in:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
@@ -49,7 +49,7 @@ from official.vision.beta.projects.yolo.modeling.layers import nn_blocks
class BlockConfig:
"""
A class to store a layer config to make the code more readable.
"""
def __init__(self, layer, stack, reps, bottleneck, filters, pool_size,
@@ -69,7 +69,7 @@ class BlockConfig:
padding: An `int` for the padding to apply to layers in this stack.
activation: A `str` for the activation to use for this stack.
route: An `int` for the level to route from to get the next input.
dilation_rate: An `int` for the scale used in dilated Darknet.
output_name: A `str` for the name to use for this output.
is_output: A `bool` for whether this layer is an output in the default
model.
@@ -99,9 +99,10 @@ def build_block_specs(config):
class LayerBuilder:
"""
A class for quick lookup of the default layers used by Darknet to
connect, introduce, or exit a level. Used in place of an if condition
or switch to make adding new layers easier and to reduce redundant
code.
"""
def __init__(self):
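The lookup-table pattern this docstring describes can be sketched in a few lines; the table contents below are hypothetical stand-ins, not the class's real entries:

```python
import tensorflow as tf

# Hypothetical table: maps a layer-type name to a constructor so that adding
# a new block type means adding one entry instead of another elif branch.
_LAYER_TABLE = {
    'convbn': lambda cfg: tf.keras.layers.Conv2D(cfg['filters'], 3, padding='same'),
    'maxpool': lambda cfg: tf.keras.layers.MaxPool2D(cfg['pool_size']),
}

def build_layer(name, cfg):
  """Constructs a layer by name via dictionary dispatch."""
  return _LAYER_TABLE[name](cfg)
```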
@@ -377,7 +378,7 @@ BACKBONES = {
@tf.keras.utils.register_keras_serializable(package='yolo')
class Darknet(tf.keras.Model):
"""The Darknet backbone architecture."""
def __init__(
self,
...
@@ -13,7 +13,7 @@
# limitations under the License.
# Lint as: python3
"""Tests for YOLO."""
from absl.testing import parameterized
import numpy as np
...
@@ -13,7 +13,7 @@
# limitations under the License.
# Lint as: python3
"""Feature Pyramid Network and Path Aggregation variants used in YOLO."""
import tensorflow as tf
from official.vision.beta.projects.yolo.modeling.layers import nn_blocks
@@ -23,8 +23,10 @@ from official.vision.beta.projects.yolo.modeling.layers import nn_blocks
class _IdentityRoute(tf.keras.layers.Layer):
def __init__(self, **kwargs):
"""
Private class to mirror the outputs of blocks in nn_blocks for easier
programmatic generation of the feature pyramid network.
"""
super().__init__(**kwargs)
@@ -125,7 +127,7 @@ class YoloFPN(tf.keras.layers.Layer):
# directly connect to an input path and process it
self.preprocessors = dict()
# resample an input and merge it with the output of another path
# in order to aggregate backbone outputs
self.resamples = dict()
# set of convolution layers and upsample layers that are used to
# prepare the FPN processors for output
@@ -214,7 +216,7 @@ class YoloPAN(tf.keras.layers.Layer):
kernel_initializer: kernel_initializer for convolutional layers.
kernel_regularizer: tf.keras.regularizers.Regularizer object for Conv2D.
bias_regularizer: tf.keras.regularizers.Regularizer object for Conv2D.
fpn_input: `bool`, for whether the input into this function is an FPN or
a backbone.
fpn_filter_scale: `int`, scaling factor for the FPN filters.
**kwargs: keyword arguments to be passed.
@@ -268,7 +270,7 @@ class YoloPAN(tf.keras.layers.Layer):
# directly connect to an input path and process it
self.preprocessors = dict()
# resample an input and merge it with the output of another path
# in order to aggregate backbone outputs
self.resamples = dict()
# FPN will reverse the key process order for the backbone, so we need
...
@@ -13,7 +13,7 @@
# limitations under the License.
# Lint as: python3
"""Tests for YOLO heads."""
# Import libraries
from absl.testing import parameterized
@@ -44,7 +44,6 @@ class YoloDecoderTest(parameterized.TestCase, tf.test.TestCase):
inputs[key] = tf.ones(input_shape[key], dtype=tf.float32)
endpoints = head(inputs)
for key in endpoints.keys():
expected_input_shape = input_shape[key]
...
@@ -14,7 +14,7 @@
# Lint as: python3
"""Contains common building blocks for YOLO neural networks."""
from typing import Callable, List
import tensorflow as tf
from official.modeling import tf_utils
@@ -35,9 +35,9 @@ class Identity(tf.keras.layers.Layer):
class ConvBN(tf.keras.layers.Layer):
"""
Modified convolution layer to match that of the Darknet library.
The layer is a standard combination of Conv, BatchNorm, and Activation;
however, the use of bias in the Conv is determined by the use of
batch normalization.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
Ping-Yang Chen, Jun-Wei Hsieh
@@ -71,16 +71,16 @@ class ConvBN(tf.keras.layers.Layer):
use.
padding: string 'valid' or 'same', if same, then pad the image, else do
not.
dilation_rate: tuple to indicate how much to modulate kernel weights and
how many pixels in a feature map to skip.
kernel_initializer: string to indicate which function to use to initialize
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
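A minimal sketch of the Conv + BatchNorm + Activation pattern this docstring describes, assuming only that `use_bn` disables the conv bias (BatchNorm's learned shift makes a separate bias redundant); this is not the library's actual implementation:

```python
import tensorflow as tf

class ConvBNSketch(tf.keras.layers.Layer):
  """Illustrative Conv -> BatchNorm -> LeakyReLU block (sketch only)."""

  def __init__(self, filters, kernel_size=3, use_bn=True, leaky_alpha=0.1,
               **kwargs):
    super().__init__(**kwargs)
    # The conv bias is dropped whenever batch norm is on, since BN's beta
    # parameter would absorb it anyway.
    self._conv = tf.keras.layers.Conv2D(
        filters, kernel_size, padding='same', use_bias=not use_bn)
    self._bn = tf.keras.layers.BatchNormalization() if use_bn else None
    self._act = tf.keras.layers.LeakyReLU(alpha=leaky_alpha)

  def call(self, inputs, training=None):
    x = self._conv(inputs)
    if self._bn is not None:
      x = self._bn(x, training=training)
    return self._act(x)
```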
@@ -191,7 +191,7 @@ class ConvBN(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class DarkResidual(tf.keras.layers.Layer):
"""
Darknet block with a residual connection for the YOLO v3 backbone.
"""
def __init__(self,
@@ -228,8 +228,6 @@ class DarkResidual(tf.keras.layers.Layer):
(across all input batches).
norm_momentum: float for the momentum to use for batch normalization.
norm_epsilon: float for batch normalization epsilon.
leaky_alpha: float to use as alpha if activation function is leaky.
sc_activation: string for activation function to use in layer.
downsample: boolean for if image input is larger than layer output, set
@@ -352,10 +350,10 @@ class DarkResidual(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class CSPTiny(tf.keras.layers.Layer):
"""
A small convolution block proposed in CSPNet. The layer uses shortcuts,
routing (concatenation), and feature grouping in order to improve gradient
variability and allow for high-efficiency, low-power residual learning for
small networks.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
Ping-Yang Chen, Jun-Wei Hsieh
@@ -387,11 +385,11 @@ class CSPTiny(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
(across all input batches).
@@ -401,12 +399,12 @@ class CSPTiny(tf.keras.layers.Layer):
feature stack output.
norm_momentum: float for the momentum to use for batch normalization.
norm_epsilon: float for batch normalization epsilon.
downsample: boolean for if image input is larger than layer output, set
downsample to True so the dimensions are forced to match.
leaky_alpha: float to use as alpha if activation function is leaky.
sc_activation: string for activation function to use in layer.
conv_activation: string or None for activation function to use in layer,
if None activation is replaced by linear.
**kwargs: Keyword Arguments.
"""
@@ -505,18 +503,18 @@ class CSPTiny(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class CSPRoute(tf.keras.layers.Layer):
"""
Downsampling layer to take the place of the downsampling done in residual
networks. This is the first of 2 layers needed to convert any Residual Network
model to a CSPNet. At the start of a new level change, this CSPRoute layer
creates a learned identity that will act as a cross stage connection that
is used to inform the inputs to the next stage. This is called cross stage
partial because the number of filters required in every intermediate residual
layer is reduced by half. The sister layer will take the partial generated by
this layer and concatenate it with the output of the final residual layer in the
stack to create a full feature-level output. This concatenation merges the
partial blocks of 2 levels as input to the next, allowing the gradients of each
level to be more unique and reducing the number of parameters required by each
level by 50% while keeping accuracy consistent.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
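To make the split-and-merge idea concrete, here is a toy functional sketch (all names hypothetical): the input is projected into two half-width branches, only one of which runs through the residual stack, and the sister "connect" step concatenates the partial branch back in.

```python
import tensorflow as tf

def csp_wrap_sketch(x, stack_fn, filters):
  """Toy CSPRoute/CSPConnect pair; `stack_fn` stands in for a residual stack."""
  half = filters // 2
  partial = tf.keras.layers.Conv2D(half, 1)(x)  # cross stage shortcut (route)
  main = tf.keras.layers.Conv2D(half, 1)(x)     # branch fed to the stack
  main = stack_fn(main)                         # e.g. a few residual blocks
  merged = tf.concat([main, partial], axis=-1)  # connect: merge the partials
  return tf.keras.layers.Conv2D(filters, 1)(merged)
```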
@@ -544,24 +542,24 @@ class CSPRoute(tf.keras.layers.Layer):
"""
Args:
filters: integer for output depth, or the number of features to learn.
filter_scale: integer dictating (filters//2) or the number of filters in
the partial feature stack.
activation: string for activation function to use in layer.
kernel_initializer: string to indicate which function to use to
initialize weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
(across all input batches).
norm_momentum: float for the momentum to use for batch normalization.
norm_epsilon: float for batch normalization epsilon.
downsample: boolean for whether to downsample the input.
**kwargs: Keyword Arguments.
"""
@@ -571,7 +569,7 @@ class CSPRoute(tf.keras.layers.Layer):
self._filter_scale = filter_scale
self._activation = activation
# convolution params
self._kernel_initializer = kernel_initializer
self._bias_initializer = bias_initializer
self._kernel_regularizer = kernel_regularizer
@@ -638,7 +636,7 @@ class CSPRoute(tf.keras.layers.Layer):
class CSPConnect(tf.keras.layers.Layer):
"""
Sister layer to the CSPRoute layer. Merges the partial feature stacks
generated by the CSPDownsampling layer and the final output of the
residual stack. Suggested in the CSPNet paper.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
@@ -675,10 +673,10 @@ class CSPConnect(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global
@@ -750,13 +748,13 @@ class CSPConnect(tf.keras.layers.Layer):
class CSPStack(tf.keras.layers.Layer):
"""
CSP full stack; combines the route and the connect in case you don't want
to just quickly wrap an existing callable or list of layers to make it a
cross stage partial. Added for ease of use. You should be able to wrap any
layer stack with a CSP independent of whether it belongs to the Darknet
family. If filter_scale = 2, then the blocks in the stack passed into the
CSP stack should also have filters = filters/filter_scale.
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu,
Ping-Yang Chen, Jun-Wei Hsieh
@@ -781,11 +779,10 @@ class CSPStack(tf.keras.layers.Layer):
**kwargs):
"""
Args:
model_to_wrap: callable Model or a list of callable objects that will
process the output of CSPRoute, and be input into CSPConnect. The
list will be called sequentially.
filters: integer for output depth, or the number of features to learn.
filter_scale: integer dictating (filters//2) or the number of filters in
the partial feature stack.
activation: string for activation function to use in layer.
@@ -793,10 +790,11 @@ class CSPStack(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
downsample: boolean for whether to downsample the input.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
@@ -891,10 +889,10 @@ class PathAggregationBlock(tf.keras.layers.Layer):
weights.
bias_initializer: string to indicate which function to use to initialize
bias.
bias_regularizer: string to indicate which function to use to regularize
bias.
kernel_regularizer: string to indicate which function to use to
regularize weights.
use_bn: boolean for whether to use batch normalization.
use_sync_bn: boolean for whether to sync batch normalization statistics
of all batch norm layers to the model's global statistics
@@ -905,8 +903,8 @@ class PathAggregationBlock(tf.keras.layers.Layer):
activation: string or None for activation function to use in layer,
if None activation is replaced by linear.
leaky_alpha: float to use as alpha if activation function is leaky.
downsample: `bool` for whether to downsample and merge.
upsample: `bool` for whether to upsample and merge.
upsample_size: `int` how much to upsample in order to match shapes.
**kwargs: Keyword Arguments.
"""
@@ -1050,7 +1048,7 @@ class PathAggregationBlock(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class SPP(tf.keras.layers.Layer):
"""
A non-aggregated SPP layer that uses pooling to gain more performance.
"""
def __init__(self, sizes, **kwargs):
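The computation an SPP block performs can be sketched as parallel stride-1 max pools concatenated with the input; `sizes` is the real constructor argument, while the default pool sizes below are the ones commonly used in YOLO and are an assumption here:

```python
import tensorflow as tf

def spp_sketch(x, sizes=(5, 9, 13)):
  """Stride-1 max pooling at several window sizes, concatenated (sketch)."""
  pools = [
      tf.keras.layers.MaxPool2D(pool_size=s, strides=1, padding='same')(x)
      for s in sizes
  ]
  return tf.concat(pools + [x], axis=-1)  # channels grow by len(sizes) + 1
```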
@@ -1090,7 +1088,7 @@ class SAM(tf.keras.layers.Layer):
[1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon
CBAM: Convolutional Block Attention Module. arXiv:1807.06521
Implementation of the Spatial Attention Model (SAM).
"""
def __init__(self,
@@ -1167,7 +1165,7 @@ class CAM(tf.keras.layers.Layer):
[1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon
CBAM: Convolutional Block Attention Module. arXiv:1807.06521
Implementation of the Channel Attention Model (CAM).
"""
def __init__(self,
@@ -1253,7 +1251,7 @@ class CBAM(tf.keras.layers.Layer):
[1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon
CBAM: Convolutional Block Attention Module. arXiv:1807.06521
Implementation of the Convolution Block Attention Module (CBAM).
"""
def __init__(self,
@@ -1321,8 +1319,9 @@ class CBAM(tf.keras.layers.Layer):
@tf.keras.utils.register_keras_serializable(package='yolo')
class DarkRouteProcess(tf.keras.layers.Layer):
"""
Processes darknet outputs and connects the backbone to the head for more
generalizability and abstracts the repetition of DarkConv objects that is
common in YOLO.
It is used like the following:
@@ -1357,18 +1356,18 @@ class DarkRouteProcess(tf.keras.layers.Layer):
filters: the number of filters to be used in all subsequent layers;
filters should be the depth of the tensor input into this layer,
as no downsampling can be done within this layer object.
repetitions: number of times to repeat the processing nodes
for tiny: 1 repetition, no spp allowed
for spp: insert_spp = True, and allow for 3+ repetitions
for regular: insert_spp = False, and allow for 3+ repetitions.
insert_spp: bool if true add the spatial pyramid pooling layer.
kernel_initializer: method to use to initialize kernel weights.
bias_initializer: method to use to initialize the bias of the conv
layers.
norm_momentum: batch norm parameter; see TensorFlow documentation.
norm_epsilon: batch norm parameter; see TensorFlow documentation.
activation: activation function to use in processing.
leaky_alpha: if leaky activation function, the alpha to use in
processing the relu input.
Returns:
...
@@ -4,13 +4,13 @@ import math
def yxyx_to_xcycwh(box: tf.Tensor):
"""Converts boxes from ymin, xmin, ymax, xmax to x_center, y_center, width,
height.
Args:
box: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes in ymin, xmin, ymax, xmax.
Returns:
box: a `Tensor` whose shape is the same as `box` in new format.
"""
@@ -52,13 +52,13 @@ def _xcycwh_to_yxyx(box: tf.Tensor, scale):
def xcycwh_to_yxyx(box: tf.Tensor, darknet=False):
"""Converts boxes from x_center, y_center, width, height to ymin, xmin, ymax,
xmax.
Args:
box: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes in x_center, y_center, width, height.
Returns:
box: a `Tensor` whose shape is the same as `box` in new format.
"""
@@ -75,9 +75,9 @@ def intersect_and_union(box1, box2, yxyx=False):
"""Calculates the intersection and union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
@@ -109,15 +109,15 @@ def smallest_encompassing_box(box1, box2, yxyx=False):
box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
Returns:
box_c: a `Tensor` whose last dimension is 4 representing the coordinates of
boxes; the return format is y_min, x_min, y_max, x_max if yxyx is set
to True. In other words it will match the input format.
"""
@@ -145,9 +145,9 @@ def compute_iou(box1, box2, yxyx=False):
"""Calculates the intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
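For reference, a minimal IOU for boxes already in y_min, x_min, y_max, x_max format looks like the following sketch (the real function also handles the x_center, y_center, width, height case via the `yxyx` flag):

```python
import tensorflow as tf

def iou_sketch(box1, box2):
  """Elementwise IOU for boxes in [ymin, xmin, ymax, xmax] format (sketch)."""
  ymin = tf.maximum(box1[..., 0], box2[..., 0])
  xmin = tf.maximum(box1[..., 1], box2[..., 1])
  ymax = tf.minimum(box1[..., 2], box2[..., 2])
  xmax = tf.minimum(box1[..., 3], box2[..., 3])
  intersection = tf.maximum(ymax - ymin, 0.0) * tf.maximum(xmax - xmin, 0.0)
  area1 = (box1[..., 2] - box1[..., 0]) * (box1[..., 3] - box1[..., 1])
  area2 = (box2[..., 2] - box2[..., 0]) * (box2[..., 3] - box2[..., 1])
  union = area1 + area2 - intersection
  return tf.math.divide_no_nan(intersection, union)
```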
@@ -167,13 +167,13 @@ def compute_giou(box1, box2, yxyx=False, darknet=False):
"""Calculates the generalized intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
darknet: a `bool` indicating whether the calling function is the YOLO
darknet loss.
Returns:
@@ -208,15 +208,15 @@ def compute_diou(box1, box2, beta=1.0, yxyx=False, darknet=False):
"""Calculates the distance intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
beta: a `float` indicating the amount to scale the distance iou
regularization term.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
darknet: a `bool` indicating whether the calling function is the YOLO
darknet loss.
Returns:
@@ -256,13 +256,13 @@ def compute_ciou(box1, box2, yxyx=False, darknet=False):
"""Calculates the complete intersection over union between box1 and box2.
Args:
box1: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
box2: any `Tensor` whose last dimension is 4 representing the coordinates of
boxes.
yxyx: a `bool` indicating whether the input box is of the format x_center,
y_center, width, height or y_min, x_min, y_max, x_max.
darknet: a `bool` indicating whether the calling function is the YOLO
darknet loss.
Returns:
@@ -297,23 +297,22 @@ def aggregated_comparitive_iou(boxes1,
boxes2=None,
iou_type=0,
beta=0.6):
"""Calculates the intersection over union between every box in boxes1 and
every box in boxes2.
Args:
boxes1: a `Tensor` of shape [batch size, N, 4] representing the coordinates
of boxes.
boxes2: a `Tensor` of shape [batch size, N, 4] representing the coordinates
of boxes.
iou_type: `integer` representing the iou version to use, 0 is distance iou,
1 is the general iou, 2 is the complete iou, any other number uses the
standard iou.
beta: `float` for the scaling quantity to apply to distance iou
regularization.
Returns:
iou: a `Tensor` that represents the intersection over union of the
expected/input type.
"""
boxes1 = tf.expand_dims(boxes1, axis=-2)
...
"""A set of private math operations used to safely implement the YOLO loss."""
import tensorflow as tf
def rm_nan_inf(x, val=0.0):
"""Removes nan and infinity.
Args:
x: any `Tensor` of any type.
val: value to replace nan and infinity with.
Return:
a `Tensor` with nan and infinity removed.
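One straightforward way to realize this contract, assuming float inputs (a sketch, not necessarily the file's body):

```python
import tensorflow as tf

def rm_nan_inf_sketch(x, val=0.0):
  """Replaces nan and +/-inf entries of a float tensor with `val` (sketch)."""
  # tf.math.is_finite is False for both nan and infinity.
  return tf.where(tf.math.is_finite(x), x, tf.cast(val, x.dtype))
```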
@@ -19,11 +19,11 @@ def rm_nan_inf(x, val=0.0):
def rm_nan(x, val=0.0):
"""Removes nan.
Args:
x: any `Tensor` of any type.
val: value to replace nan.
Return:
a `Tensor` with nan removed.
@@ -35,32 +35,32 @@ def rm_nan(x, val=0.0):
def divide_no_nan(a, b):
"""Nan safe divide operation built to allow model compilation in tflite.
Args:
a: any `Tensor` of any type.
b: any `Tensor` of any type with the same shape as tensor a.
Return:
a `Tensor` representing a divided by b, with all nan values removed.
"""
zero = tf.cast(0.0, b.dtype)
return tf.where(b == zero, zero, a / b)
def mul_no_nan(x, y):
"""Nan safe multiply operation built to allow model compilation in tflite and
to allow one tensor to mask another. Wherever x is zero the
multiplication is not computed and the value is replaced with a zero. This is
required because 0 * nan = nan. This can make computation unstable in some
cases where the intended behavior is for zero to mean ignore.
Args:
x: any `Tensor` of any type.
y: any `Tensor` of any type with the same shape as tensor x.
Return:
a `Tensor` representing x times y, where x is used to safely mask the
tensor y.
"""
return tf.where(x == 0, tf.cast(0, x.dtype), x * y)
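A quick usage example of the two helpers above, showing the masking behavior that plain `*` and `/` would break with nans:

```python
import tensorflow as tf

x = tf.constant([0.0, 2.0])
y = tf.constant([float('nan'), 3.0])
print(mul_no_nan(x, y).numpy())     # [0. 6.]  -- nan masked where x == 0
print(divide_no_nan(y, x).numpy())  # [0. 1.5] -- no inf/nan from zero divisor
```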
@@ -8,13 +8,12 @@ class TiledNMS():
IOU_TYPES = {'diou': 0, 'giou': 1, 'ciou': 2, 'iou': 3}
def __init__(self, iou_type='diou', beta=0.6):
'''Initialization for all non max suppression operations, mainly used to
select hyperparameters for the iou type and scaling.
Args:
iou_type: `str` for the version of IOU to use {diou, giou, ciou, iou}.
beta: `float` for the amount to scale regularization on distance iou.
'''
self._iou_type = TiledNMS.IOU_TYPES[iou_type]
self._beta = beta
@@ -54,7 +53,7 @@ class TiledNMS():
overlap too much with respect to IOU.
output_size: an int32 tensor of size [batch_size], representing the number
of selected boxes for each batch.
idx: an integer scalar representing an induction variable.
Returns:
boxes: updated boxes.
@@ -111,10 +110,10 @@ class TiledNMS():
Assumption:
* The boxes are sorted by scores unless the box is a dot (all coordinates
are zero).
* Boxes with higher scores can be used to suppress boxes with lower
scores.
The overall design of the algorithm is to handle boxes tile-by-tile:
boxes = boxes.pad_to_multiple_of(tile_size)
num_tiles = len(boxes) // tile_size
@@ -126,7 +125,7 @@ class TiledNMS():
iou = bbox_overlap(box_tile, suppressing_tile)
# if the box is suppressed in iou, clear it to a dot
box_tile *= _update_boxes(iou)
# Iteratively handle the diagonal tile.
iou = _box_overlap(box_tile, box_tile)
iou_changed = True
while iou_changed:
@@ -232,16 +231,16 @@ class TiledNMS():
This implementation unrolls the classes dimension while using the tf.while_loop
to implement the batched NMS, so that it can be parallelized at the batch
dimension. It should give better performance compared to the v1 implementation.
It is TPU compatible.
Args:
boxes: a tensor with shape [batch_size, N, num_classes, 4] or [batch_size,
N, 1, 4], which contains box predictions on all feature levels. The N is
the number of total anchors on all levels.
scores: a tensor with shape [batch_size, N, num_classes], which stacks
class probability on all feature levels. The N is the number of total
anchors on all levels. The num_classes is the number of classes the
model predicted. Note that the class_outputs here is the raw score.
pre_nms_top_k: an int number of top candidate detections per class
before NMS.
@@ -327,21 +326,21 @@ def sorted_non_max_suppression_padded(scores, boxes, max_output_size,
def sort_drop(objectness, box, classificationsi, k):
"""This function sorts and drops boxes such that only k boxes remain,
sorted by the objectness or confidence score.
Args:
objectness: a `Tensor` of shape [batch size, N] that needs to be
filtered.
box: a `Tensor` of shape [batch size, N, 4] that needs to be filtered.
classificationsi: a `Tensor` of shape [batch size, N, num_classes] that
needs to be filtered.
k: an `integer` for the maximum number of boxes to keep after filtering.
Return:
objectness: filtered `Tensor` of shape [batch size, k]
boxes: filtered `Tensor` of shape [batch size, k, 4]
classifications: filtered `Tensor` of shape [batch size, k, num_classes]
"""
# find the indexes for the boxes based on the scores
objectness, ind = tf.math.top_k(objectness, k=k)
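A plausible completion of this function under the documented contract (the gather step is an assumption; only the `top_k` line above is from the source):

```python
import tensorflow as tf

def sort_drop_sketch(objectness, box, classifications, k):
  """Keeps the k highest-objectness boxes per batch element (sketch)."""
  objectness, ind = tf.math.top_k(objectness, k=k)
  # Gather the boxes and per-class scores at the surviving indices,
  # independently for each batch element.
  box = tf.gather(box, ind, batch_dims=1)
  classifications = tf.gather(classifications, ind, batch_dims=1)
  return objectness, box, classifications
```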
@@ -364,25 +363,25 @@ def sort_drop(objectness, box, classificationsi, k):
def segment_nms(boxes, classes, confidence, k, iou_thresh):
"""This is a quick nms that works very well for small values of k; it
was developed to operate for tflite models, as the tiled NMS is far too slow
and typically is not able to compile with tflite. This NMS does not account
for classes, and only works to quickly filter boxes on phones.
Args:
boxes: a `Tensor` of shape [batch size, N, 4] that needs to be filtered.
classes: a `Tensor` of shape [batch size, N, num_classes] that needs to be
filtered.
confidence: a `Tensor` of shape [batch size, N] that needs to be
filtered.
k: an `integer` for the maximum number of boxes to keep after filtering.
iou_thresh: a `float` for the value above which boxes are considered to be
too similar; the closer to 1.0, the less that gets through.
Return:
boxes: filtered `Tensor` of shape [batch size, k, 4]
classes: filtered `Tensor` of shape [batch size, k, num_classes]
confidence: filtered `Tensor` of shape [batch size, k]
"""
mrange = tf.range(k)
mask_x = tf.tile(
@@ -416,27 +415,27 @@ def nms(boxes,
pre_nms_thresh,
nms_thresh,
prenms_top_k=500):
"""This is a quick nms that works very well for small values of k; it
was developed to operate for tflite models, as the tiled NMS is far too slow
and typically is not able to compile with tflite. This NMS does not account
for classes, and only works to quickly filter boxes on phones.
Args:
boxes: a `Tensor` of shape [batch size, N, 4] that needs to be filtered.
classes: a `Tensor` of shape [batch size, N, num_classes] that needs to be
filtered.
confidence: a `Tensor` of shape [batch size, N] that needs to be
filtered.
k: an `integer` for the maximum number of boxes to keep after filtering.
nms_thresh: a `float` for the value above which boxes are considered to be
too similar; the closer to 1.0, the less that gets through.
pre_nms_top_k: an int number of top candidate detections per class
before NMS.
Return:
boxes: filtered `Tensor` of shape [batch size, k, 4]
classes: filtered `Tensor` of shape [batch size, k, num_classes]
confidence: filtered `Tensor` of shape [batch size, k]
"""
# sort the boxes
...