Commit 9cbdcd35 authored by anivegesana's avatar anivegesana

Merge branch 'yolo' of https://github.com/PurdueCAM2Project/tf-models into yolo

parents 35e9d291 71ef4530
......@@ -15,7 +15,7 @@ This repository is the unofficial implementation of the following papers. Howeve
Yolo v1, the original implementation, was released in 2015, providing a groundbreaking algorithm that could quickly process images and locate objects in a single pass through the detector. The original implementation used a backbone derived from the state-of-the-art image classifiers of the time, like [GoogLeNet](https://arxiv.org/abs/1409.4842) and [VGG](https://arxiv.org/abs/1409.1556). More attention was given to the novel Yolo detection head that allowed for object detection in a single pass over an image. Though limited, the network could predict up to 90 bounding boxes per image and was tested on about 80 classes per box. Also, the model could only make predictions at one scale. These attributes made Yolo v1 more limited and less versatile, so as the years passed, the developers continued to update and improve the model.
Yolo v3 and v4 serve as the most up to date and capable versions of the Yolo network group. These model uses a custom backbone called Darknet53 that uses knowledge gained from the ResNet paper to improve its predictions. The new backbone also allows for objects to be detected at multiple scales. As for the new detection head, the model now predicts the bounding boxes using a set of anchor box priors (Anchor Boxes) as suggestions. The multiscale predictions in combination with the Anchor boxes allows for the network to make up to 1000 object predictions on a single image. Finally, the new loss function forces the network to make better prediction by using Intersection Over Union (IOU) to inform the models confidence rather than relying on the mean squared error for the entire output.
Yolo v3 and v4 serve as the most up-to-date and capable versions of the Yolo network group. These models use a custom backbone called Darknet53 that applies lessons from the ResNet paper to improve their predictions. The new backbone also allows objects to be detected at multiple scales. As for the new detection head, the model now predicts bounding boxes using a set of anchor box priors (Anchor Boxes) as suggestions. The multiscale predictions, combined with the anchor boxes, allow the network to make up to 1000 object predictions on a single image. Finally, the new loss function forces the network to make better predictions by using Intersection Over Union (IOU) to inform the model's confidence rather than relying on the mean squared error of the entire output.
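For a concrete sense of the IOU term, here is a minimal sketch (not code from this repository) of the box overlap computation that informs the confidence score:

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two unit boxes overlapping by half: intersection 0.5, union 1.5, IOU = 1/3.
print(iou((0, 0, 1, 1), (0.5, 0, 1.5, 1)))  # ~0.333
```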
## Authors
......@@ -33,7 +33,8 @@ Yolo v3 and v4 serve as the most up to date and capable versions of the Yolo net
## Our Goal
Our goal with this model conversion is to provide highly versatile implementations of the Backbone and Yolo Head. We have tried to build the model in such a way that the Yolo head could easily be connected to a new, more powerful backbone if a person chose to.
Our goal with this model conversion is to provide implementations of the Backbone and Yolo Head. We have built the model in such a way that the Yolo head can easily be connected to a new, more powerful backbone if one chooses to do so.
## Models in the library
......
......@@ -21,3 +21,12 @@ from official.vision import beta
from official.vision.beta.projects import yolo
from official.vision.beta.projects.yolo.modeling.backbones import Darknet
from official.vision.beta.projects.yolo.configs import darknet_classification
from official.vision.beta.projects.yolo.configs.darknet_classification import image_classification
from official.vision.beta.projects.yolo.configs.darknet_classification import ImageClassificationTask
from official.vision.beta.projects.yolo.tasks.image_classification import ImageClassificationTask
# task_factory.register_task_cls(ImageClassificationTask)(ImageClassificationTask)
# print(task_factory._REGISTERED_TASK_CLS)
\ No newline at end of file
"""Backbones configurations."""
# Import libraries
import dataclasses
from typing import Optional
from official.modeling import hyperparams
# from official.vision.beta.configs import backbones
from official.modeling import hyperparams
from official.vision.beta.configs import backbones
@dataclasses.dataclass
class DarkNet(hyperparams.Config):
"""DarkNet config."""
model_id: str = "darknet53"
# # we could not get this to work
# @dataclasses.dataclass
# class Backbone(backbones.Backbone):
# darknet: DarkNet = DarkNet()
@dataclasses.dataclass
class Backbone(backbones.Backbone):
darknet: DarkNet = DarkNet()
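A usage sketch (assuming the parent `backbones.Backbone` keeps the library's one-of semantics, where `type` selects the active sub-config and `get()` returns it); these lines are not part of the commit:

example = Backbone(type='darknet', darknet=DarkNet(model_id='darknet53'))
print(example.get().model_id)  # 'darknet53'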
import os
from typing import List
import dataclasses
from official.core import config_definitions as cfg
from official.core import exp_factory
from official.modeling import hyperparams
from official.modeling import optimization
from official.vision.beta.projects.yolo.configs import backbones
from official.vision.beta.configs import common
from official.vision.beta.configs import image_classification as imc
@dataclasses.dataclass
class ImageClassificationModel(hyperparams.Config):
num_classes: int = 0
input_size: List[int] = dataclasses.field(default_factory=list)
backbone: backbones.Backbone = backbones.Backbone(
type='darknet', darknet=backbones.DarkNet())
dropout_rate: float = 0.0
norm_activation: common.NormActivation = common.NormActivation()
# Adds a BatchNormalization layer pre-GlobalAveragePooling in classification
add_head_batch_norm: bool = False
@dataclasses.dataclass
class Losses(hyperparams.Config):
one_hot: bool = True
label_smoothing: float = 0.0
l2_weight_decay: float = 0.0
@dataclasses.dataclass
class ImageClassificationTask(cfg.TaskConfig):
"""The model config."""
model: ImageClassificationModel = ImageClassificationModel()
train_data: imc.DataConfig = imc.DataConfig(is_training=True)
validation_data: imc.DataConfig = imc.DataConfig(is_training=False)
losses: Losses = Losses()
gradient_clip_norm: float = 0.0
logging_dir: str = None
@exp_factory.register_config_factory('darknet_classification')
def image_classification() -> cfg.ExperimentConfig:
"""Image classification general."""
return cfg.ExperimentConfig(
task=ImageClassificationTask(),
trainer=cfg.TrainerConfig(),
restrictions=[
'task.train_data.is_training != None',
'task.validation_data.is_training != None'
])
\ No newline at end of file
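With the module above imported (so the `register_config_factory` decorator has run), the experiment is retrievable by name. A quick sketch, assuming the standard Model Garden factory API:

from official.core import exp_factory

config = exp_factory.get_exp_config('darknet_classification')
print(type(config.task).__name__)  # ImageClassificationTask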
runtime:
distribution_strategy: 'mirrored'
mixed_precision_dtype: 'float32'
loss_scale: 'dynamic'
task:
model:
num_classes: 1001
......@@ -9,32 +8,28 @@ task:
backbone:
type: 'darknet'
darknet:
model_id: 'darknet53'
model_id: 'cspdarknettiny'
losses:
l2_weight_decay: 0.0005
one_hot: True
train_data:
tfds_name: 'imagenet_a'
tfds_split: 'test'
tfds_download: True
is_training: True
global_batch_size: 2
input_path: 'imagenet-2012-tfrecord/train*'
is_training: true
global_batch_size: 128
dtype: 'float16'
validation_data:
tfds_name: 'imagenet_a'
tfds_split: 'test'
tfds_download: True
is_training: False
global_batch_size: 2
input_path: 'imagenet-2012-tfrecord/valid*'
is_training: true
global_batch_size: 128
dtype: 'float16'
drop_remainder: False
drop_remainder: false
trainer:
train_steps: 51200000 # in the paper
validation_steps: 25600 # size of validation data
validation_interval: 150
steps_per_loop: 150
summary_interval: 150
checkpoint_interval: 150
train_steps: 800000 # in the paper
validation_steps: 400 # size of validation data
validation_interval: 10000
steps_per_loop: 10000
summary_interval: 10000
checkpoint_interval: 10000
optimizer_config:
optimizer:
type: 'sgd'
......@@ -46,8 +41,8 @@ trainer:
initial_learning_rate: 0.1
end_learning_rate: 0.0001
power: 4.0
decay_steps: 51136000
decay_steps: 799000
warmup:
type: 'linear'
linear:
warmup_steps: 64000 #lr rise from 0 to 0.1 over 1000 steps
warmup_steps: 1000 #learning rate rises from 0 to 0.1 over 1000 steps
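For intuition, the schedule above (linear warmup to 0.1 over 1,000 steps, then polynomial decay with power 4 down to 0.0001) can be approximated with stock Keras pieces; this is an illustrative sketch, not the trainer's actual wiring:

import tensorflow as tf

decay = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=0.1,
    decay_steps=799000,
    end_learning_rate=0.0001,
    power=4.0)

def lr_at(step, warmup_steps=1000):
  # Ramp linearly to the initial rate, then follow the polynomial decay.
  if step < warmup_steps:
    return 0.1 * step / warmup_steps
  return float(decay(step - warmup_steps))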
"""Contains definitions of Darknet Backbone Networks.
The models are inspired by ResNet and CSPNet.
Residual networks (ResNets) were proposed in:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Deep Residual Learning for Image Recognition. arXiv:1512.03385
Cross Stage Partial networks (CSPNets) were proposed in:
[1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, Ping-Yang Chen, Jun-Wei Hsieh
CSPNet: A New Backbone that can Enhance Learning Capability of CNN. arXiv:1911.11929
DarkNets are used mainly for object detection in:
[1] Joseph Redmon, Ali Farhadi
YOLOv3: An Incremental Improvement. arXiv:1804.02767
[2] Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao
YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv:2004.10934
"""
import tensorflow as tf
import tensorflow.keras as ks
import collections
......@@ -8,33 +28,39 @@ from official.vision.beta.projects.yolo.modeling import building_blocks as nn_bl
# builder required classes
class BlockConfig(object):
def __init__(self, layer, stack, reps, bottleneck, filters, kernel_size,
'''
Layer config to make the model-builder code more readable.

Args:
layer: string layer name
stack: the type of layer ordering to use for this specific level
reps: integer for the number of times to repeat the block
bottleneck: boolean for whether this stack has a bottleneck layer
filters: integer for the output depth of the level
pool_size: integer pool size for max pool layers
kernel_size: optional integer convolution kernel size
strides: integer or tuple indicating the convolution strides
padding: the padding to apply to layers in this stack
activation: string for the activation to use for this stack
route: integer for which level to route from to get the next input
output_name: the name to use for this output
is_output: whether this layer is an output in the default model
'''
def __init__(self, layer, stack, reps, bottleneck, filters, pool_size, kernel_size,
strides, padding, activation, route, output_name, is_output):
'''
get layer config to make code more readable
Args:
layer: string layer name
reps: integer for the number of times to repeat block
filters: integer for the filter for this layer, or the output depth
kernel_size: integer or none, if none, it implies that the the building block handles this automatically. not a layer input
downsample: boolean, to down sample the input width and height
output: boolean, true if the layer is required as an output
'''
self.layer = layer
self.stack = stack
self.repetitions = reps
self.bottleneck = bottleneck
self.filters = filters
self.kernel_size = kernel_size
self.pool_size = pool_size
self.strides = strides
self.padding = padding
self.activation = activation
self.route = route
self.output_name = output_name
self.is_output = is_output
return
def build_block_specs(config):
......@@ -43,48 +69,46 @@ def build_block_specs(config):
specs.append(BlockConfig(*layer))
return specs
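# Illustration (not part of this module): each row of the model tables below
# is unpacked positionally into a BlockConfig, so the row order must match the
# constructor signature (and LISTNAMES) exactly. For example:
#   row = ["DarkConv", None, 1, False, 32, None, 3, 1, "same", "mish", -1, 0, False]
#   spec = BlockConfig(*row)
#   spec.filters, spec.kernel_size  # -> 32, 3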
class layer_factory(object):
"""
class for quick look up of default layers used by darknet to
connect, introduce or exit a level. Used in place of an if condition
or switch to make adding new layers easier and to reduce redundant code
"""
def __init__(self):
self._layer_dict = {
"DarkTiny": (nn_blocks.DarkTiny, self.darktiny_config_todict),
"DarkConv": (nn_blocks.DarkConv, self.darkconv_config_todict),
"MaxPool": (tf.keras.layers.MaxPool2D, self.maxpool_config_todict)
}
def darkconv_config_todict(config, kwargs):
dictvals = {
"filters": config.filters,
"kernel_size": config.kernel_size,
"strides": config.strides,
"padding": config.padding
}
dictvals.update(kwargs)
return dictvals
def darktiny_config_todict(config, kwargs):
dictvals = {"filters": config.filters, "strides": config.strides}
dictvals.update(kwargs)
return dictvals
def darkconv_config_todict(self, config, kwargs):
dictvals = {
"filters": config.filters,
"kernel_size": config.kernel_size,
"strides": config.strides,
"padding": config.padding
}
dictvals.update(kwargs)
return dictvals
def maxpool_config_todict(config, kwargs):
return {
"pool_size": config.kernel_size,
"strides": config.strides,
"padding": config.padding,
"name": kwargs["name"]
}
def darktiny_config_todict(self, config, kwargs):
dictvals = {"filters": config.filters, "strides": config.strides}
dictvals.update(kwargs)
return dictvals
class layer_registry(object):
def __init__(self):
self._layer_dict = {
"DarkTiny": (nn_blocks.DarkTiny, darktiny_config_todict),
"DarkConv": (nn_blocks.DarkConv, darkconv_config_todict),
"MaxPool": (tf.keras.layers.MaxPool2D, maxpool_config_todict)
def maxpool_config_todict(self, config, kwargs):
return {
"pool_size": config.pool_size,
"strides": config.strides,
"padding": config.padding,
"name": kwargs["name"]
}
return
def _get_layer(self, key):
return self._layer_dict[key]
def __call__(self, config, kwargs):
layer, get_param_dict = self._get_layer(config.layer)
layer, get_param_dict = self._layer_dict[config.layer]
param_dict = get_param_dict(config, kwargs)
return layer(**param_dict)
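# Usage sketch (not part of this module): the factory maps a BlockConfig row
# straight to a constructed Keras layer.
#   factory = layer_factory()
#   spec = BlockConfig("DarkConv", None, 1, False, 32, None, 3, 1, "same",
#                      "mish", -1, 0, False)
#   conv = factory(spec, {"name": "conv_0"})  # an nn_blocks.DarkConv instance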
......@@ -92,7 +116,7 @@ class layer_registry(object):
# model configs
LISTNAMES = [
"default_layer_name", "level_type", "number_of_layers_in_level",
"bottleneck", "filters", "kernal_size", "strides", "padding",
"bottleneck", "filters", "kernal_size", "pool_size", "strides", "padding",
"default_activation", "route", "level/name", "is_output"
]
......@@ -101,12 +125,12 @@ CSPDARKNET53 = {
"splits": {"backbone_split": 106,
"neck_split": 138},
"backbone": [
["DarkConv", None, 1, False, 32, 3, 1, "same", "mish", -1, 0, False], # 1
["DarkRes", "csp", 1, True, 64, None, None, None, "mish", -1, 1, False], # 3
["DarkRes", "csp", 2, False, 128, None, None, None, "mish", -1, 2, False], # 2
["DarkRes", "csp", 8, False, 256, None, None, None, "mish", -1, 3, True],
["DarkRes", "csp", 8, False, 512, None, None, None, "mish", -1, 4, True], # 3
["DarkRes", "csp", 4, False, 1024, None, None, None, "mish", -1, 5, True], # 6 #route
["DarkConv", None, 1, False, 32, None, 3, 1, "same", "mish", -1, 0, False],
["DarkRes", "csp", 1, True, 64, None, None, None, None, "mish", -1, 1, False],
["DarkRes", "csp", 2, False, 128, None, None, None, None, "mish", -1, 2, False],
["DarkRes", "csp", 8, False, 256, None, None, None, None, "mish", -1, 3, True],
["DarkRes", "csp", 8, False, 512, None, None, None, None, "mish", -1, 4, True],
["DarkRes", "csp", 4, False, 1024, None, None, None, None, "mish", -1, 5, True],
]
}
......@@ -114,12 +138,12 @@ DARKNET53 = {
"list_names": LISTNAMES,
"splits": {"backbone_split": 76},
"backbone": [
["DarkConv", None, 1, False, 32, 3, 1, "same", "leaky", -1, 0, False], # 1
["DarkRes", "residual", 1, True, 64, None, None, None, "leaky", -1, 1, False], # 3
["DarkRes", "residual", 2, False, 128, None, None, None, "leaky", -1, 2, False], # 2
["DarkRes", "residual", 8, False, 256, None, None, None, "leaky", -1, 3, True],
["DarkRes", "residual", 8, False, 512, None, None, None, "leaky", -1, 4, True], # 3
["DarkRes", "residual", 4, False, 1024, None, None, None, "leaky", -1, 5, True], # 6
["DarkConv", None, 1, False, 32, None, 3, 1, "same", "leaky", -1, 0, False],
["DarkRes", "residual", 1, True, 64, None, None, None, None, "leaky", -1, 1, False],
["DarkRes", "residual", 2, False, 128, None, None, None, None, "leaky", -1, 2, False],
["DarkRes", "residual", 8, False, 256, None, None, None, None, "leaky", -1, 3, True],
["DarkRes", "residual", 8, False, 512, None, None, None, None, "leaky", -1, 4, True],
["DarkRes", "residual", 4, False, 1024, None, None, None, None, "leaky", -1, 5, True],
]
}
......@@ -127,12 +151,12 @@ CSPDARKNETTINY = {
"list_names": LISTNAMES,
"splits": {"backbone_split": 28},
"backbone": [
["DarkConv", None, 1, False, 32, 3, 2, "same", "leaky", -1, 0, False], # 1
["DarkConv", None, 1, False, 64, 3, 2, "same", "leaky", -1, 1, False], # 1
["CSPTiny", "csp_tiny", 1, False, 64, 3, 2, "same", "leaky", -1, 2, False], # 3
["CSPTiny", "csp_tiny", 1, False, 128, 3, 2, "same", "leaky", -1, 3, False], # 3
["CSPTiny", "csp_tiny", 1, False, 256, 3, 2, "same", "leaky", -1, 4, True], # 3
["DarkConv", None, 1, False, 512, 3, 1, "same", "leaky", -1, 5, True], # 1
["DarkConv", None, 1, False, 32, None, 3, 2, "same", "leaky", -1, 0, False],
["DarkConv", None, 1, False, 64, None, 3, 2, "same", "leaky", -1, 1, False],
["CSPTiny", "csp_tiny", 1, False, 64, None, 3, 2, "same", "leaky", -1, 2, False],
["CSPTiny", "csp_tiny", 1, False, 128, None, 3, 2, "same", "leaky", -1, 3, False],
["CSPTiny", "csp_tiny", 1, False, 256, None, 3, 2, "same", "leaky", -1, 4, True],
["DarkConv", None, 1, False, 512, None, 3, 1, "same", "leaky", -1, 5, True],
]
}
......@@ -140,13 +164,13 @@ DARKNETTINY = {
"list_names": LISTNAMES,
"splits": {"backbone_split": 14},
"backbone": [
["DarkConv", None, 1, False, 16, 3, 1, "same", "leaky", -1, 0, False], # 1
["DarkTiny", None, 1, True, 32, 3, 2, "same", "leaky", -1, 1, False], # 3
["DarkTiny", None, 1, True, 64, 3, 2, "same", "leaky", -1, 2, False], # 3
["DarkTiny", None, 1, False, 128, 3, 2, "same", "leaky", -1, 3, False], # 2
["DarkTiny", None, 1, False, 256, 3, 2, "same", "leaky", -1, 4, True],
["DarkTiny", None, 1, False, 512, 3, 2, "same", "leaky", -1, 5, False], # 3
["DarkTiny", None, 1, False, 1024, 3, 1, "same", "leaky", -1, 5, True], # 6 #route
["DarkConv", None, 1, False, 16, None, 3, 1, "same", "leaky", -1, 0, False],
["DarkTiny", None, 1, True, 32, None, 3, 2, "same", "leaky", -1, 1, False],
["DarkTiny", None, 1, True, 64, None, 3, 2, "same", "leaky", -1, 2, False],
["DarkTiny", None, 1, False, 128, None, 3, 2, "same", "leaky", -1, 3, False],
["DarkTiny", None, 1, False, 256, None, 3, 2, "same", "leaky", -1, 4, True],
["DarkTiny", None, 1, False, 512, None, 3, 2, "same", "leaky", -1, 5, False],
["DarkTiny", None, 1, False, 1024, None, 3, 1, "same", "leaky", -1, 5, True],
]
}
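# Sanity-check sketch (not part of this module): multiplying the strides
# column (index 7) of a table shows the cumulative downsampling behind the
# output levels. For example:
#   import math
#   math.prod(row[7] for row in DARKNETTINY["backbone"])  # 32 == 2**5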
......@@ -164,9 +188,9 @@ class Darknet(ks.Model):
def __init__(
self,
model_id="darknet53",
input_shape=tf.keras.layers.InputSpec(shape=[None, None, None, 3]),
min_size=None,
max_size=5,
input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]),
min_level=None,
max_level=5,
activation=None,
use_sync_bn=False,
norm_momentum=0.99,
......@@ -174,19 +198,18 @@ class Darknet(ks.Model):
kernel_initializer='glorot_uniform',
kernel_regularizer=None,
bias_regularizer=None,
config=None,
**kwargs):
layer_specs, splits = Darknet.get_model_config(model_id)
self._model_name = model_id
self._splits = splits
self._input_shape = input_shape
self._registry = layer_registry()
self._input_shape = input_specs
self._registry = layer_factory()
# default layer look up
self._min_size = min_size
self._max_size = max_size
self._min_size = min_level
self._max_size = max_level
self._output_specs = None
self._kernel_initializer = kernel_initializer
......@@ -195,11 +218,11 @@ class Darknet(ks.Model):
self._norm_epislon = norm_epsilon
self._use_sync_bn = use_sync_bn
self._activation = activation
self._weight_decay = kernel_regularizer
self._kernel_regularizer = kernel_regularizer
self._default_dict = {
"kernel_initializer": self._kernel_initializer,
"weight_decay": self._weight_decay,
"kernel_regularizer": self._kernel_regularizer,
"bias_regularizer": self._bias_regularizer,
"norm_momentum": self._norm_momentum,
"norm_epsilon": self._norm_epislon,
......@@ -211,7 +234,6 @@ class Darknet(ks.Model):
inputs = ks.layers.Input(shape=self._input_shape.shape[1:])
output = self._build_struct(layer_specs, inputs)
super().__init__(inputs=inputs, outputs=output, name=self._model_name)
return
@property
def input_specs(self):
......@@ -250,10 +272,10 @@ class Darknet(ks.Model):
name=f"{config.layer}_{i}")
stack_outputs.append(x_pass)
if (config.is_output and
self._min_size == None): # or isinstance(config.output_name, str):
endpoints[config.output_name] = x
self._min_size == None):
endpoints[str(config.output_name)] = x
elif self._min_size != None and config.output_name >= self._min_size and config.output_name <= self._max_size:
endpoints[config.output_name] = x
endpoints[str(config.output_name)] = x
self._output_specs = {l: endpoints[l].get_shape() for l in endpoints.keys()}
return endpoints
......@@ -334,7 +356,30 @@ class Darknet(ks.Model):
backbone = BACKBONES[name]["backbone"]
splits = BACKBONES[name]["splits"]
return build_block_specs(backbone), splits
@property
def model_id(self):
return self._model_name
@classmethod
def from_config(cls, config, custom_objects=None):
return cls(**config)
def get_config(self):
layer_config = {
"model_id": self._model_name,
"min_level": self._min_size,
"max_level": self._max_size,
"kernel_initializer": self._kernel_initializer,
"kernel_regularizer": self._kernel_regularizer,
"bias_regularizer": self._bias_regularizer,
"norm_momentum": self._norm_momentum,
"norm_epsilon": self._norm_epislon,
"use_sync_bn": self._use_sync_bn,
"activation": self._activation
}
#layer_config.update(super().get_config())
return layer_config
@factory.register_backbone_builder('darknet')
def build_darknet(
......
......@@ -14,7 +14,7 @@ class CSPConnect(ks.layers.Layer):
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
bias_regularizer=None,
weight_decay=None, # default find where is it is stated
kernel_regularizer=None,
use_bn=True,
use_sync_bn=False,
norm_momentum=0.99,
......@@ -30,7 +30,7 @@ class CSPConnect(ks.layers.Layer):
# convolution params
self._kernel_initializer = kernel_initializer
self._bias_initializer = bias_initializer
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
self._bias_regularizer = bias_regularizer
self._use_bn = use_bn
self._use_sync_bn = use_sync_bn
......@@ -45,7 +45,7 @@ class CSPConnect(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -58,7 +58,7 @@ class CSPConnect(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......
......@@ -14,7 +14,7 @@ class CSPDownSample(ks.layers.Layer):
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
bias_regularizer=None,
weight_decay=None, # default find where is it is stated
kernel_regularizer=None,
use_bn=True,
use_sync_bn=False,
norm_momentum=0.99,
......@@ -30,7 +30,7 @@ class CSPDownSample(ks.layers.Layer):
# convolution params
self._kernel_initializer = kernel_initializer
self._bias_initializer = bias_initializer
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
self._bias_regularizer = bias_regularizer
self._use_bn = use_bn
self._use_sync_bn = use_sync_bn
......@@ -45,7 +45,7 @@ class CSPDownSample(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -57,7 +57,7 @@ class CSPDownSample(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -70,7 +70,7 @@ class CSPDownSample(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......
......@@ -14,7 +14,7 @@ class CSPTiny(ks.layers.Layer):
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
bias_regularizer=None,
weight_decay=None, # default find where is it is stated
kernel_regularizer=None,
use_bn=True,
use_sync_bn=False,
group_id=1,
......@@ -34,7 +34,7 @@ class CSPTiny(ks.layers.Layer):
self._bias_regularizer = bias_regularizer
self._use_bn = use_bn
self._use_sync_bn = use_sync_bn
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
self._groups = groups
self._group_id = group_id
self._downsample = downsample
......@@ -59,7 +59,7 @@ class CSPTiny(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -75,7 +75,7 @@ class CSPTiny(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -91,7 +91,7 @@ class CSPTiny(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -107,7 +107,7 @@ class CSPTiny(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......@@ -143,7 +143,7 @@ class CSPTiny(ks.layers.Layer):
"strides": self._strides,
"kernel_initializer": self._kernel_initializer,
"bias_initializer": self._bias_initializer,
"weight_decay": self._weight_decay,
"kernel_regularizer": self._kernel_regularizer,
"use_bn": self._use_bn,
"use_sync_bn": self._use_sync_bn,
"norm_moment": self._norm_moment,
......
......@@ -23,7 +23,7 @@ class DarkConv(ks.layers.Layer):
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
bias_regularizer=None,
weight_decay=None, # default find where is it is stated
kernel_regularizer=None,  # Specify the weight decay; the default of None will not work.
use_bn=True,
use_sync_bn=False,
norm_momentum=0.99,
......@@ -66,7 +66,7 @@ class DarkConv(ks.layers.Layer):
self._use_bias = use_bias
self._kernel_initializer = kernel_initializer
self._bias_initializer = bias_initializer
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
self._bias_regularizer = bias_regularizer
# batchnorm params
......@@ -99,7 +99,7 @@ class DarkConv(ks.layers.Layer):
self._kernel_size) == int else self._kernel_size[0]
if self._padding == "same" and kernel_size != 1:
self._zeropad = ks.layers.ZeroPadding2D(
((1, 1), (1, 1))) # symetric padding
((1, 1), (1, 1))) # symmetric padding
else:
self._zeropad = Identity()
......@@ -107,12 +107,12 @@ class DarkConv(ks.layers.Layer):
filters=self._filters,
kernel_size=self._kernel_size,
strides=self._strides,
padding="valid", #self._padding,
padding="valid",
dilation_rate=self._dilation_rate,
use_bias=self._use_bias,
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
kernel_regularizer=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
bias_regularizer=self._bias_regularizer)
#self.conv =tf.nn.convolution(filters=self._filters, strides=self._strides, padding=self._padding
......@@ -136,8 +136,6 @@ class DarkConv(ks.layers.Layer):
self._activation_fn = mish()
else:
self._activation_fn = ks.layers.Activation(activation=self._activation)
super(DarkConv, self).build(input_shape)
return
def call(self, inputs):
......@@ -148,7 +146,7 @@ class DarkConv(ks.layers.Layer):
return x
def get_config(self):
# used to store/share parameters to reconsturct the model
# used to store/share parameters to reconstruct the model
layer_config = {
"filters": self._filters,
"kernel_size": self._kernel_size,
......@@ -159,7 +157,7 @@ class DarkConv(ks.layers.Layer):
"kernel_initializer": self._kernel_initializer,
"bias_initializer": self._bias_initializer,
"bias_regularizer": self._bias_regularizer,
"l2_regularization": self._l2_regularization,
"kernel_regularizer": self._kernel_regularizer,
"use_bn": self._use_bn,
"use_sync_bn": self._use_sync_bn,
"norm_moment": self._norm_moment,
......
......@@ -14,7 +14,7 @@ class DarkResidual(ks.layers.Layer):
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
weight_decay=None,
kernel_regularizer=None,
bias_regularizer=None,
use_bn=True,
use_sync_bn=False,
......@@ -59,7 +59,7 @@ class DarkResidual(ks.layers.Layer):
self._bias_regularizer = bias_regularizer
self._use_bn = use_bn
self._use_sync_bn = use_sync_bn
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
# normal params
self._norm_moment = norm_momentum
......@@ -88,7 +88,7 @@ class DarkResidual(ks.layers.Layer):
norm_momentum=self._norm_moment,
norm_epsilon=self._norm_epsilon,
activation=self._conv_activation,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
leaky_alpha=self._leaky_alpha)
else:
self._dconv = Identity()
......@@ -106,7 +106,7 @@ class DarkResidual(ks.layers.Layer):
norm_momentum=self._norm_moment,
norm_epsilon=self._norm_epsilon,
activation=self._conv_activation,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
leaky_alpha=self._leaky_alpha)
self._conv2 = DarkConv(filters=self._filters,
kernel_size=(3, 3),
......@@ -121,7 +121,7 @@ class DarkResidual(ks.layers.Layer):
norm_momentum=self._norm_moment,
norm_epsilon=self._norm_epsilon,
activation=self._conv_activation,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
leaky_alpha=self._leaky_alpha)
self._shortcut = ks.layers.Add()
......@@ -138,13 +138,13 @@ class DarkResidual(ks.layers.Layer):
return self._activation_fn(x)
def get_config(self):
# used to store/share parameters to reconsturct the model
# used to store/share parameters to reconstruct the model
layer_config = {
"filters": self._filters,
"use_bias": self._use_bias,
"kernel_initializer": self._kernel_initializer,
"bias_initializer": self._bias_initializer,
"weight_decay": self._weight_decay,
"kernel_regularizer": self._kernel_regularizer,
"use_bn": self._use_bn,
"use_sync_bn": self._use_sync_bn,
"norm_moment": self._norm_moment,
......
......@@ -15,7 +15,7 @@ class DarkTiny(ks.layers.Layer):
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
bias_regularizer=None,
weight_decay=None, # default find where is it is stated
kernel_regularizer=None,  # TODO: find where the default is stated
use_bn=True,
use_sync_bn=False,
norm_momentum=0.99,
......@@ -34,7 +34,7 @@ class DarkTiny(ks.layers.Layer):
self._use_bn = use_bn
self._use_sync_bn = use_sync_bn
self._strides = strides
self._weight_decay = weight_decay
self._kernel_regularizer = kernel_regularizer
# normal params
self._norm_moment = norm_momentum
......@@ -68,7 +68,7 @@ class DarkTiny(ks.layers.Layer):
kernel_initializer=self._kernel_initializer,
bias_initializer=self._bias_initializer,
bias_regularizer=self._bias_regularizer,
weight_decay=self._weight_decay,
kernel_regularizer=self._kernel_regularizer,
use_bn=self._use_bn,
use_sync_bn=self._use_sync_bn,
norm_momentum=self._norm_moment,
......
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for resnet."""
# Import libraries
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from tensorflow.python.distribute import combinations
from tensorflow.python.distribute import strategy_combinations
from official.vision.beta.projects.yolo.modeling.backbones import Darknet
class DarkNetTest(parameterized.TestCase, tf.test.TestCase):
@parameterized.parameters(
(224, "darknet53", 2, 1),
(224, "darknettiny", 1, 2),
(224, "cspdarknettiny", 1, 1),
(224, "cspdarknet53", 2, 1),
)
def test_network_creation(self, input_size, model_id,
endpoint_filter_scale, scale_final):
"""Test creation of ResNet family models."""
tf.keras.backend.set_image_data_format('channels_last')
network = Darknet.Darknet(model_id=model_id, min_level=3, max_level=5)
print(network.model_id)
self.assertEqual(network.model_id, model_id)
inputs = tf.keras.Input(shape=(input_size, input_size, 3), batch_size=1)
endpoints = network(inputs)
self.assertAllEqual(
[1, input_size / 2**3, input_size / 2**3, 128 * endpoint_filter_scale],
endpoints['3'].shape.as_list())
self.assertAllEqual(
[1, input_size / 2**4, input_size / 2**4, 256 * endpoint_filter_scale],
endpoints['4'].shape.as_list())
self.assertAllEqual(
[1, input_size / 2**5, input_size / 2**5, 512 * endpoint_filter_scale * scale_final],
endpoints['5'].shape.as_list())
@combinations.generate(
combinations.combine(
strategy=[
strategy_combinations.tpu_strategy,
strategy_combinations.one_device_strategy_gpu,
],
use_sync_bn=[False, True],
))
def test_sync_bn_multiple_devices(self, strategy, use_sync_bn):
"""Test for sync bn on TPU and GPU devices."""
inputs = np.random.rand(1, 224, 224, 3)
tf.keras.backend.set_image_data_format('channels_last')
with strategy.scope():
network = Darknet.Darknet(model_id="darknet53", min_size=3, max_size=5)
_ = network(inputs)
@parameterized.parameters(1, 3, 4)
def test_input_specs(self, input_dim):
"""Test different input feature dimensions."""
tf.keras.backend.set_image_data_format('channels_last')
input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, input_dim])
network = Darknet.Darknet(model_id="darknet53", min_level=3, max_level=5, input_specs=input_specs)
inputs = tf.keras.Input(shape=(224, 224, input_dim), batch_size=1)
_ = network(inputs)
def test_serialize_deserialize(self):
# Create a network object that sets all of its config options.
kwargs = dict(
model_id="darknet53",
min_level = 3,
max_level = 5,
use_sync_bn=False,
activation='relu',
norm_momentum=0.99,
norm_epsilon=0.001,
kernel_initializer='VarianceScaling',
kernel_regularizer=None,
bias_regularizer=None,
)
network = Darknet.Darknet(**kwargs)
expected_config = dict(kwargs)
self.assertEqual(network.get_config(), expected_config)
# Create another network object from the first object's config.
new_network = Darknet.Darknet.from_config(network.get_config())
# Validate that the config can be forced to JSON.
_ = new_network.to_json()
# If the serialization was successful, the new config should match the old.
self.assertAllEqual(network.get_config(), new_network.get_config())
if __name__ == '__main__':
tf.test.main()
......@@ -54,19 +54,5 @@ class DarkConvTest(tf.test.TestCase, parameterized.TestCase):
self.assertNotIn(None, grad)
return
# @parameterized.named_parameters(("filters", 3), ("filters", 20), ("filters", 512))
# def test_time(self, filters):
# # finish the test for time
# dataset = tfds.load("mnist")
# model = ks.Sequential([
# DarkConv(7, kernel_size=(3,3), strides = (2,2), activation='relu'),
# DarkConv(10, kernel_size=(3,3), strides = (2,2), activation='relu'),
# DarkConv(filters, kernel_size=(3,3), strides = (1,1), activation='relu'),
# DarkConv(9, kernel_size=(3,3), strides = (2,2), activation='relu'),
# ks.layers.GlobalAveragePooling2D(),
# ks.layers.Dense(10, activation='softmax')], name='test')
# return
if __name__ == "__main__":
tf.test.main()
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Image classification task definition."""
import tensorflow as tf
from official.core import base_task
from official.core import input_reader
from official.core import task_factory
from official.modeling import tf_utils
from official.vision.beta.projects.yolo.configs import darknet_classification as exp_cfg
from official.vision.beta.dataloaders import classification_input
from official.vision.beta.modeling import factory
@task_factory.register_task_cls(exp_cfg.ImageClassificationTask)
class ImageClassificationTask(base_task.Task):
"""A task for image classification."""
def build_model(self):
"""Builds classification model."""
input_specs = tf.keras.layers.InputSpec(
shape=[None] + self.task_config.model.input_size)
l2_weight_decay = self.task_config.losses.l2_weight_decay
# Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss.
# (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2)
# (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss)
l2_regularizer = (tf.keras.regularizers.l2(
l2_weight_decay / 2.0) if l2_weight_decay else None)
model = factory.build_classification_model(
input_specs=input_specs,
model_config=self.task_config.model,
l2_regularizer=l2_regularizer)
return model
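# Worked example of the factor-of-two comment above (illustration only, not
# code from this commit): keras l2(l) penalizes l * sum(w ** 2), while
# tf.nn.l2_loss(w) returns sum(w ** 2) / 2. With w = [3., 4.]:
#   tf.keras.regularizers.l2(0.05)(w)  # 0.05 * 25  = 1.25
#   0.1 * tf.nn.l2_loss(w)             # 0.1 * 12.5 = 1.25
# so l2(decay / 2.0) reproduces decay * tf.nn.l2_loss(w).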
def build_inputs(self, params, input_context=None):
"""Builds classification input."""
num_classes = self.task_config.model.num_classes
input_size = self.task_config.model.input_size
decoder = classification_input.Decoder()
parser = classification_input.Parser(
output_size=input_size[:2],
num_classes=num_classes,
dtype=params.dtype)
reader = input_reader.InputReader(
params,
dataset_fn=tf.data.TFRecordDataset,
decoder_fn=decoder.decode,
parser_fn=parser.parse_fn(params.is_training))
dataset = reader.read(input_context=input_context)
return dataset
def build_losses(self, labels, model_outputs, aux_losses=None):
"""Sparse categorical cross entropy loss.
Args:
labels: labels.
model_outputs: Output logits of the classifier.
aux_losses: auxiliary loss tensors, i.e. `losses` in keras.Model.
Returns:
The total loss tensor.
"""
losses_config = self.task_config.losses
if losses_config.one_hot:
total_loss = tf.keras.losses.categorical_crossentropy(
labels,
model_outputs,
from_logits=True,
label_smoothing=losses_config.label_smoothing)
else:
total_loss = tf.keras.losses.sparse_categorical_crossentropy(
labels, model_outputs, from_logits=True)
total_loss = tf_utils.safe_mean(total_loss)
if aux_losses:
total_loss += tf.add_n(aux_losses)
return total_loss
def build_metrics(self, training=True):
"""Gets streaming metrics for training/validation."""
if self.task_config.losses.one_hot:
metrics = [
tf.keras.metrics.CategoricalAccuracy(name='accuracy'),
tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='top_5_accuracy')]
else:
metrics = [
tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'),
tf.keras.metrics.SparseTopKCategoricalAccuracy(
k=5, name='top_5_accuracy')]
return metrics
def train_step(self, inputs, model, optimizer, metrics=None):
"""Does forward and backward.
Args:
inputs: a dictionary of input tensors.
model: the model, forward pass definition.
optimizer: the optimizer for this training step.
metrics: a nested structure of metrics objects.
Returns:
A dictionary of logs.
"""
features, labels = inputs
if self.task_config.losses.one_hot:
labels = tf.one_hot(labels, self.task_config.model.num_classes)
num_replicas = tf.distribute.get_strategy().num_replicas_in_sync
with tf.GradientTape() as tape:
outputs = model(features, training=True)
# Casting output layer as float32 is necessary when mixed_precision is
# mixed_float16 or mixed_bfloat16 to ensure the output is cast as float32.
outputs = tf.nest.map_structure(
lambda x: tf.cast(x, tf.float32), outputs)
# Computes per-replica loss.
loss = self.build_losses(
model_outputs=outputs, labels=labels, aux_losses=model.losses)
# Scales loss as the default gradients allreduce performs sum inside the
# optimizer.
scaled_loss = loss / num_replicas
# For mixed_precision policy, when LossScaleOptimizer is used, loss is
# scaled for numerical stability.
if isinstance(
optimizer, tf.keras.mixed_precision.experimental.LossScaleOptimizer):
scaled_loss = optimizer.get_scaled_loss(scaled_loss)
tvars = model.trainable_variables
grads = tape.gradient(scaled_loss, tvars)
# Scales back gradient before apply_gradients when LossScaleOptimizer is
# used.
if isinstance(
optimizer, tf.keras.mixed_precision.experimental.LossScaleOptimizer):
grads = optimizer.get_unscaled_gradients(grads)
# Apply gradient clipping.
if self.task_config.gradient_clip_norm > 0:
grads, _ = tf.clip_by_global_norm(
grads, self.task_config.gradient_clip_norm)
optimizer.apply_gradients(list(zip(grads, tvars)))
logs = {self.loss: loss}
if metrics:
self.process_metrics(metrics, labels, outputs)
logs.update({m.name: m.result() for m in metrics})
elif model.compiled_metrics:
self.process_compiled_metrics(model.compiled_metrics, labels, outputs)
logs.update({m.name: m.result() for m in model.metrics})
return logs
def validation_step(self, inputs, model, metrics=None):
"""Validatation step.
Args:
inputs: a dictionary of input tensors.
model: the keras.Model.
metrics: a nested structure of metrics objects.
Returns:
A dictionary of logs.
"""
features, labels = inputs
if self.task_config.losses.one_hot:
labels = tf.one_hot(labels, self.task_config.model.num_classes)
outputs = self.inference_step(features, model)
outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs)
loss = self.build_losses(model_outputs=outputs, labels=labels,
aux_losses=model.losses)
logs = {self.loss: loss}
if metrics:
self.process_metrics(metrics, labels, outputs)
logs.update({m.name: m.result() for m in metrics})
elif model.compiled_metrics:
self.process_compiled_metrics(model.compiled_metrics, labels, outputs)
logs.update({m.name: m.result() for m in model.metrics})
return logs
def inference_step(self, inputs, model):
"""Performs the forward step."""
return model(inputs, training=False)
......@@ -18,6 +18,7 @@
from absl import app
from absl import flags
import gin
import sys
from official.core import train_utils
# pylint: disable=unused-import
......@@ -31,9 +32,21 @@ from official.modeling import performance
FLAGS = flags.FLAGS
'''
python3 -m official.vision.beta.projects.yolo.train --mode=train_and_eval --experiment=darknet_classification --model_dir=training_dir --config_file=official/vision/beta/projects/yolo/configs/experiments/darknet53.yaml
'''
def import_overrides():
print(sys.modules["official.vision.beta.configs.backbones"])
return
def main(_):
import_overrides()
gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params)
print(FLAGS.experiment)
params = train_utils.parse_configuration(FLAGS)
model_dir = FLAGS.model_dir
if 'train' in FLAGS.mode:
# Pure eval modes do not output yaml files. Otherwise continuous eval job
......
runtime:
all_reduce_alg: null
batchnorm_spatial_persistent: false
dataset_num_private_threads: null
default_shard_dim: -1
distribution_strategy: mirrored
enable_xla: false
gpu_thread_mode: null
loss_scale: null
mixed_precision_dtype: float32
num_cores_per_replica: 1
num_gpus: 0
num_packs: 1
per_gpu_thread_count: 0
run_eagerly: false
task_index: -1
tpu: null
worker_hosts: null
task:
gradient_clip_norm: 0.0
init_checkpoint: ''
logging_dir: null
losses:
l2_weight_decay: 0.0005
label_smoothing: 0.0
one_hot: true
model:
add_head_batch_norm: false
backbone:
darknet:
model_id: cspdarknettiny
type: darknet
dropout_rate: 0.0
input_size: [224, 224, 3]
norm_activation:
activation: relu
norm_epsilon: 0.001
norm_momentum: 0.99
use_sync_bn: false
num_classes: 1001
train_data:
block_length: 1
cache: false
cycle_length: 10
deterministic: null
drop_remainder: true
dtype: float16
enable_tf_data_service: false
global_batch_size: 128
input_path: imagenet-2012-tfrecord/train*
is_training: true
sharding: true
shuffle_buffer_size: 10000
tf_data_service_address: null
tf_data_service_job_name: null
tfds_as_supervised: false
tfds_data_dir: ''
tfds_download: false
tfds_name: ''
tfds_skip_decoding_feature: ''
tfds_split: ''
validation_data:
block_length: 1
cache: false
cycle_length: 10
deterministic: null
drop_remainder: false
dtype: float16
enable_tf_data_service: false
global_batch_size: 128
input_path: imagenet-2012-tfrecord/valid*
is_training: true
sharding: true
shuffle_buffer_size: 10000
tf_data_service_address: null
tf_data_service_job_name: null
tfds_as_supervised: false
tfds_data_dir: ''
tfds_download: false
tfds_name: ''
tfds_skip_decoding_feature: ''
tfds_split: ''
trainer:
allow_tpu_summary: false
best_checkpoint_eval_metric: ''
best_checkpoint_export_subdir: ''
best_checkpoint_metric_comp: higher
checkpoint_interval: 10000
continuous_eval_timeout: 3600
eval_tf_function: true
max_to_keep: 5
optimizer_config:
ema: null
learning_rate:
polynomial:
cycle: false
decay_steps: 799000
end_learning_rate: 0.0001
initial_learning_rate: 0.1
name: PolynomialDecay
power: 4.0
type: polynomial
optimizer:
sgd:
clipnorm: null
clipvalue: null
decay: 0.0
momentum: 0.9
name: SGD
nesterov: false
type: sgd
warmup:
linear:
name: linear
warmup_learning_rate: 0
warmup_steps: 1000
type: linear
steps_per_loop: 10000
summary_interval: 10000
train_steps: 800000
train_tf_function: true
train_tf_while_loop: true
validation_interval: 10000
validation_steps: 400