Commit e79232f9 (unverified), authored May 22, 2018 by Lukasz Kaiser, committed by GitHub on May 22, 2018

Merge pull request #4334 from gariel-google/master

Added the MorphNet library

Parents: 81d77669, 79680288
Changes: 29
Showing 20 changed files with 2396 additions and 0 deletions (+2396, -0)
CODEOWNERS (+1, -0)
research/morph_net/README.md (+84, -0)
research/morph_net/__init__.py (+0, -0)
research/morph_net/framework/README.md (+367, -0)
research/morph_net/framework/__init__.py (+0, -0)
research/morph_net/framework/concat_and_slice_regularizers.py (+108, -0)
research/morph_net/framework/concat_and_slice_regularizers_test.py (+59, -0)
research/morph_net/framework/generic_regularizers.py (+107, -0)
research/morph_net/framework/grouping_regularizers.py (+134, -0)
research/morph_net/framework/grouping_regularizers_test.py (+101, -0)
research/morph_net/framework/op_regularizer_manager.py (+333, -0)
research/morph_net/framework/op_regularizer_manager_test.py (+160, -0)
research/morph_net/g3doc/grouping.png (+0, -0)
research/morph_net/g3doc/histogram.png (+0, -0)
research/morph_net/g3doc/tensorboard.png (+0, -0)
research/morph_net/network_regularizers/__init__.py (+0, -0)
research/morph_net/network_regularizers/bilinear_cost_utils.py (+213, -0)
research/morph_net/network_regularizers/bilinear_cost_utils_test.py (+117, -0)
research/morph_net/network_regularizers/flop_regularizer.py (+59, -0)
research/morph_net/network_regularizers/flop_regularizer_test.py (+553, -0)
CODEOWNERS
@@ -24,6 +24,7 @@
 /research/lm_1b/ @oriolvinyals @panyx0718
 /research/marco/ @vincentvanhoucke
 /research/maskgan/ @a-dai
+/research/morph_net/ @gariel-google
 /research/namignizer/ @knathanieltucker
 /research/neural_gpu/ @lukaszkaiser
 /research/neural_programmer/ @arvind2505
 ...
research/morph_net/README.md (new file, mode 100644)
# MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
[TOC]
## What is MorphNet?
MorphNet is a method for learning deep network structure during training. The
key principle is continuous relaxation of the network-structure learning
problem. Specifically, we use regularizers that induce sparsity in the space of
activations of the network. The regularizers can be tailored to target the
consumption of specific resources by the network, such as FLOPs or model size.
When such a regularizer is added to the training loss and their sum is
minimized via stochastic gradient descent or a similar optimizer, the learning
problem also becomes a constrained optimization of the structure of the network,
under the constraint represented by the regularizer. The method is described in
detail in [this paper](https://arxiv.org/abs/1711.06798), to appear in
[CVPR 2018](http://cvpr2018.thecvf.com/).
## Adding a MorphNet regularizer to your training code
Your interaction with the MorphNet codebase will most likely be through
subclasses of `NetworkRegularizer`. Each subclass represents a resource that we
wish to target/constrain when optimizing the network. The MorphNet package
provides several `NetworkRegularizer`s in the `network_regularizers` directory,
as well as a framework for writing your own. The framework is described in
detail [here](g3doc/regularizers_framework.md). The interface of
`NetworkRegularizer` is given
[here](g3doc/regularizers_framework.md?#network-regularizers).

To apply a `NetworkRegularizer` to your network, your code would look similar to
the example below. The example refers to a specific type of `NetworkRegularizer`
that targets FLOPs. To keep the discussion simple we henceforth restrict it to
this case, but generalization to an arbitrary constrained resource and an
arbitrary regularization method that targets that resource is straightforward.
```python
my_gamma_threshold = 1e-3
regularizer_strength = 1e-9
network_reg = network_regularizers.GammaFlopsRegularizer(
    [my_network_output.op], my_gamma_threshold)
my_training_loss += regularizer_strength * network_reg.get_regularization_term()
tf.summary.scalar('FLOPs', network_reg.get_cost())
```
Once you start your training, your TensorBoard will display the effective FLOP
count of the model. "Effective" is in the sense that as activations are zeroed
out by the regularizer, their impact on the FLOP count is discounted.
*(figure: TensorBoard screenshot showing the effective FLOP cost)*
The larger the `regularizer_strength`, the smaller the effective FLOP count to
which the network will converge. If `regularizer_strength` is large enough, the
FLOP count will collapse to zero, whereas if it is small enough, the FLOP count
will remain at its initial value and the network structure will not vary.
`regularizer_strength` is your knob to control where you want to be on the
price-performance curve. The `my_gamma_threshold` parameter is used for
determining when an activation is alive. It is described in more detail
[here](framework/README.md?#the-opregularizer-interface), including an
explanation for how to tune it.
## Extracting the architecture learned by MorphNet
One way to extract the structure is by querying the `network_reg` object created
above. To query which activations in a given op were kept alive (as opposed to
removed) by MorphNet, your code would look similar to:
```python
alive = sess.run(network_reg.opreg_manager.get_regularizer(op).alive_vector)
```
where `op` is the TensorFlow op in question, and `sess` is a `tf.Session`
object. The result is a vector of booleans, designating which activations were
kept alive (more details can be found
[here](framework/README.md?#the-opregularizer-interface)). Typically one would
be interested in the number of alive activations, which can be obtained by
counting the `True` values in `alive`. Looping over all convolutions and/or
fully connected layers (as `op`) is typically sufficient to extract the full
structure learned by MorphNet; a minimal sketch follows.
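The sketch below is an illustration, not part of the MorphNet API. It assumes
the `network_reg` and `sess` objects from the snippets above, and that the
layers of interest are `Conv2D` and `MatMul` ops:

```python
import tensorflow as tf

alive_counts = {}
for op in tf.get_default_graph().get_operations():
  if op.type not in ('Conv2D', 'MatMul'):
    continue
  try:
    regularizer = network_reg.opreg_manager.get_regularizer(op)
  except ValueError:
    continue  # the manager never reached this op while traversing the graph
  if regularizer is None:
    continue  # the op is un-regularized
  # Number of output activations MorphNet kept alive for this layer.
  alive_counts[op.name] = int(sess.run(regularizer.alive_vector).sum())
```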
## Maintainers
*   Elad Eban
*   Ariel Gordon, github: [gariel-google](https://github.com/gariel-google).
research/morph_net/__init__.py (new file, mode 100644)
research/morph_net/framework/README.md (new file, mode 100644)
# Regularizers Framework
[TOC]
## Goal
The goal of this framework is to facilitate building sparsifying regularizers
for deep networks. A regularizer targets a certain cost (***targeted cost***),
such as the FLOP cost of inference, model size, latency, memory footprint, etc.
In order to form such a regularizer, we traverse the TensorFlow graph and find
the ops that contribute to the *targeted cost*. For each op, we apply a
sparsifying regularizer that induces sparsity *among the activations*. The
sparsifying regularizer of each activation is weighted by its marginal
contribution to the *targeted cost*.
Calculating this weight may be a nontrivial task. For example, for a fully
connected layer the FLOP cost is proportional to the number of inputs times
the number of outputs, which means that the marginal cost of each output
is proportional to the number of inputs. Some of the inputs may have already
been regularized away, which means that the calculation of one op's FLOP
regularizer depends on the regularization of the output of other ops. Moreover,
if an op receives as input a concatenation or a sum of several other ops,
figuring out the regularizer requires some bookkeeping.
The goal of this framework is to take care of this bookkeeping in a general way,
to facilitate building a wide variety of regularizers, targeting a wide variety
of *targeted costs*, with little effort and fewer opportunities to err. In what
follows we outline the framework, building it from the bottom up: from a single
activation all the way to a full complex network.
## `OpRegularizers` and how they are assigned
### The `OpRegularizer` interface
`OpRegularizer` is the most primitive element in the framework. An
`OpRegularizer` refers to a TensorFlow op, and has two methods,
`regularization_vector` and `alive_vector`, both returning `tf.Tensor`s of
rank 1 (vectors).

`regularization_vector` is of type float, and its `i`-th entry is the
regularizer of the `i`-th activation of the op the `OpRegularizer` refers to.
In order to regularize away that activation, one would need to add the `i`-th
entry of `regularization_vector`, multiplied by some coefficient, to the
training loss. The stronger we want to penalize it, the larger the coefficient
is. Assuming that the regularizer is of a sparsifying nature (e.g. L1 norm),
with a large enough coefficient, the `i`-th activation will eventually vanish.
Loosely speaking, if we were to target the total number of activations in the
network, we would add the sum of all `regularization_vector`s from all
`OpRegularizer`s to the training loss.

Since `OpRegularizer` is an abstract interface, with no awareness of the nature
of the regularization used, the decision when an activation can be considered
alive is also deferred to the `OpRegularizer`, via the `alive_vector` method.
The `i`-th entry evaluates to a boolean that indicates whether the activation
is alive.
```python
class OpRegularizer(object):

  @abc.abstractproperty
  def regularization_vector(self):
    """Returns a vector of floats with a regularizer for each activation."""
    pass

  @abc.abstractproperty
  def alive_vector(self):
    """Returns a bool vector indicating which activations are alive."""
    pass
```
As an example, we can consider a fully connected layer that has `m` inputs and
`n` outputs. The layer is represented by an `m * n` matrix, and one way to
impose a sparsifying regularizer on the `i`-th output is by grouping all weights
associated with it into a group LASSO regularizer, such as the L2 norm of the
`i`-th row of the matrix. That would therefore be the `i`-th entry of the
`regularization_vector`.

When such a regularization is added to the training loss, the L2 norms of the
rows of the matrix tend to form a bimodal distribution with one peak near "zero"
(up to numerical noise), another peak away from zero, and a void in between. A
natural way to determine whether the `i`-th activation is alive is thus by
comparing the `i`-th entry of the `regularization_vector` to some threshold that
lies in that void: if it's above the threshold, it's alive. A toy numeric
illustration is sketched below.
*(figure: histogram of per-output L2 norms, showing the bimodal distribution and the void in between)*
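The following toy example (plain NumPy, not MorphNet library code) builds the
`regularization_vector` and `alive_vector` of a small fully connected layer,
following the row-norm convention described above; the weights and the
threshold are made up for the illustration:

```python
import numpy as np

# One row per output activation (3 outputs, 4 inputs each).
weights = np.array([[0.50, -0.30, 0.20, 0.10],    # output 0: clearly alive
                    [0.01, 0.02, -0.01, 0.00],    # output 1: essentially zeroed out
                    [0.40, 0.10, -0.20, 0.30]])   # output 2: clearly alive

# Group LASSO per output: the L2 norm of each row.
regularization_vector = np.linalg.norm(weights, axis=1)  # ~[0.62, 0.02, 0.55]
threshold = 0.1  # chosen to lie in the "void" of the bimodal histogram
alive_vector = regularization_vector > threshold         # [True, False, True]
```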
There are ops that are not regularized, such as constants, or the input to the
network. For an un-regularized op, the `OpRegularizer` is set to `None`, which
implies an all-zero `regularization_vector` and an all-`True` `alive_vector`.
### Rules for assigning `OpRegularizer`s to ops
As we traverse the TensorFlow graph, we assign an `OpRegularizer` to each op we
encounter, according to the set of rules outlined in this section. We first
explain "default rules", rules that address propagating `OpRegularizer`s across
connections in the TensorFlow graph. Then we discuss client-specified rules,
which can augment and override the default rules.
#### Pass-through ops
Many TensorFlow ops inherit the `OpRegularizer` of their input. These are ops
that:

*   Don't change the alive status of activations.
*   The only way an activation can be eliminated from their output is if
    it's eliminated from their input.

An example is adding a bias to the output of a convolution. After adding a bias
to it, an activation will be alive (that is, have nonzero variance) if and only
if it was alive before adding the bias. If we want to regularize away an
activation at the output of a `BiasAdd` op, the only way to do so is to penalize
the same activation in the preceding convolution.
Since both the `regularization_vector` and the `alive_vector` of such an op are
identical to those of its input, so is the entire `OpRegularizer`. We refer to
such ops as *pass-through* ops. Shape-preserving unary ops (e.g. ReLU) are
generally *pass-through*, but some binary ops are too. In our framework ops are
assumed to be *pass-through* by default. Exceptions to this rule are discussed
below.
#### Grouping
When learning the number of outputs of ops in a TensorFlow graph, some ops are
constrained to maintain the same number of outputs as others. Elementwise ops
that are performed on two (or more) tensors, such as addition, multiplication,
or maximum, constrain their input tensors to have the same size. Common use
cases are attention maps, recurrent models, and residual connections. An example
of a residual connection is illustrated in the diagram below. It would be
problematic if the activations of op1 and op2 didn't live or die together. For
example, if the `i`-th activation of op1 is alive but for op2 it's dead, we
still cannot eliminate the `i`-th activation from op2 without breaking the
topology of the network.
*(figure: diagram of a residual connection, where op1 and op2 feed an elementwise addition)*
In our framework we choose to impose preservation of the topology. That is, ops
that are connected with addition (or other elementwise binary ops) are
constrained to have their activations live and die together. The `i`-th
activations of each of those ops are grouped together in a single LASSO group.
The default grouping mechanism is maximum for the `regularization_vector` and
elementwise logical OR for the `alive_vector`: to regularize away the `i`-th
element of the group one needs to penalize the maximum of the `i`-th
regularization terms of all ops comprising the group, and to declare the entire
`i`-th group dead, the `i`-th element in all ops comprising the group must be
dead. However, the framework admits other forms of grouping, and user-defined
grouping methods can easily be plugged into it (a minimal sketch of the default
grouping is given below).
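As a small illustration of the default grouping just described (restated in
NumPy for clarity; the library's TensorFlow implementation appears in
`grouping_regularizers.py` later in this change), the group takes the
elementwise maximum of the regularization vectors and the elementwise OR of the
alive vectors:

```python
import numpy as np

reg_op1 = np.array([0.6, 0.02, 0.4])
reg_op2 = np.array([0.1, 0.50, 0.3])
alive_op1 = np.array([True, False, True])
alive_op2 = np.array([False, True, True])

group_regularization = np.maximum(reg_op1, reg_op2)  # [0.6, 0.5, 0.4]
group_alive = np.logical_or(alive_op1, alive_op2)    # [True, True, True]
```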
One property of the grouping, which may seem confusing initially, is that once
two (or more) `OpRegularizer`s are grouped and the `OpRegularizer` of the group
is formed, the `OpRegularizer`s comprising the group are all 'replaced' by the
`OpRegularizer` of the group. For example, in the diagram above, the
`OpRegularizer`s of op1 and op2 have to be grouped. Therefore if the `i`-th
output of op1 is alive and that of op2 is dead, and we use the default grouping
described above, the `i`-th output of the group is *alive*.

Now, consider op4, which receives only op2 as input. From the point of view of
op4, the `i`-th activation of op2 must be considered *alive*, even though the
original op2 regularizer deemed it *dead*. This is because we already know that
we won't be able to do away with the `i`-th activation of op2 - it is tied to
that of op1, which is alive. Therefore, after the grouping, the `OpRegularizer`s
of all constituents of the group are henceforth *replaced* by the
`OpRegularizer` of the group.
#### Concatenation
Often outputs of several ops are concatenated into a single tensor. For example,
in Inception networks, the outputs of various convolutional 'towers' are
concatenated along the channels dimension. In such a case, it is obvious that
the `regularization_vector` (`alive_vector`) of the concatenation is a
concatenation of the `regularization_vector`s (`alive_vector`s) of the
concatenated ops.

Similarly to the logic of grouping, once the concatenation of the
`OpRegularizer`s has happened, the concatenated `OpRegularizer`s cease to exist
and are replaced by slices of their concatenation. For example, if op1 has 3
outputs and op2 has 4, and op3 is their concatenation, op3 has 7 outputs. After
the concatenation, the `alive_vector` of op1 will be a slice (from index 0 to
index 2) of the `alive_vector` of op3, whereas for op2 it will be another slice
(from index 3 to index 6).

If op3 is later grouped with op4, as happens in Inception ResNet architectures,
a group will be formed, and the `alive_vector` of op1 will henceforth be a slice
(from index 0 to index 2) of the `alive_vector` of *the new group*. This is for
the same reasons as the ones described in the section above.
#### Client-specified rules
The client code of the framework has the opportunity to specify rules for
creating `OpRegularizer`s. For example, for ops of type `MatMul`, which are the
common implementation of fully-connected layers, the client can choose to assign
group LASSO regularizers similar to the one described above. Typically the
client code would choose to do that for 'interesting' ops, like convolutions and
fully-connected layers, but the choice of rules is ultimately deferred to the
client code.

The client code may also choose to override the *default rules*. Ops are
considered *pass-through* by default, and obviously there are cases where this
is not true, such as reshaping, slicing, sparse matrix operations etc.
TensorFlow is much too expressive for us to be able to anticipate every usage
pattern of its ops and to properly regularize them. The set of default rules
covers most of the common published convolutional networks, but we do not
presume to cover *all* networks. More complex networks may require adding some
custom rules.
### OpRegularizerManager
`OpRegularizerManager` is the class responsible for assigning an `OpRegularizer`
to each op in the TensorFlow graph. Its constructor crawls the TensorFlow graph,
starting from the ops listed in the `ops` argument (typically the output of the
network), recursively, and assigns `OpRegularizer`s to each op encountered. Once
the object is constructed, it provides read-only methods that allow querying the
`OpRegularizer` for any op that was encountered during construction, and a list
of the latter ops for convenience.
```python
class OpRegularizerManager(object):
  """Assigns OpRegularizers to ops in a graph and bookkeeps the mapping."""

  def __init__(self, ops, op_regularizer_factory_dict,
               create_grouping_regularizer=None):
    """Creates an instance.

    Args:
      ops: A list of tf.Operation. An OpRegularizer will be created for all the
        ops in `ops`, and recursively for all ops they depend on via data
        dependency. Typically `ops` would contain a single tf.Operation, which
        is the output of the network.
      op_regularizer_factory_dict: A dictionary, where the keys are strings
        representing TensorFlow Op types, and the values are callables that
        create the respective OpRegularizers. For every op encountered during
        the recursion, if op.type is in op_regularizer_factory_dict, the
        respective callable will be used to create an OpRegularizer. The
        callables have the following args:
          op: a tf.Operation for which to create a regularizer.
          opreg_manager: A reference to an OpRegularizerManager object. Can be
            None if the callable does not need access to OpRegularizerManager.
      create_grouping_regularizer: A callable that has the signature of
        grouping_regularizers.MaxGroupingRegularizer's constructor. Will be
        called whenever a grouping op (see _GROUPING_OPS) is encountered.
        Defaults to MaxGroupingRegularizer if None.

    Raises:
      ValueError: If ops is not a list.
    """
    ...

  def get_regularizer(self, op):
    """Returns the OpRegularizer object pertaining to `op`.

    Args:
      op: a tf.Operation object.

    Returns:
      An OpRegularizer object, or None if `op` does not have one.

    Raises:
      ValueError: The OpRegularizerManager object did not encounter `op` when
        it was constructed and the graph was traversed, and thus does not know
        the answer.
    """
    ...

  @property
  def ops(self):
    """Returns all tf.Operations for which `get_regularizer` is known."""
    ...
```
As the constructor crawls the graph, it invokes the following set of rules for
any op encountered:

*   If `op_regularizer_factory_dict` has a rule on how to create an
    `OpRegularizer` for the type of the op encountered, invoke the rule. These
    are the user-specified rules. Otherwise:
*   If the op has no inputs, return `None`. Examples are constants and
    variables. Otherwise:
*   If the op is a concatenation, invoke the rule for concatenation described
    above. Otherwise:
*   If the op has more than one regularized input (that is, input that has a
    non-`None` `OpRegularizer`), perform grouping. Being conservative, we first
    check that the op is whitelisted for being a grouping op (elementwise
    addition, subtraction etc). Otherwise:
*   The op is a *pass-through*. That is, its OpRegularizer is the same as that
    of its input.

The implementation is recursive: we start from the output node(s) of the graph.
To build an `OpRegularizer` for each op, we need to know the `OpRegularizer` of
its inputs, so we make a recursive call to find out those, and so on. A usage
sketch is given below.
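In the sketch that follows, `my_conv_regularizer_factory` and
`MyGroupLassoRegularizer` are hypothetical names used only to illustrate the
expected shape of `op_regularizer_factory_dict` (an op type mapped to a callable
taking `(op, opreg_manager)`), and `my_network_output` is assumed to be the
output tensor of your network:

```python
def my_conv_regularizer_factory(op, opreg_manager):
  del opreg_manager  # this particular rule does not need the manager
  return MyGroupLassoRegularizer(op)  # hypothetical OpRegularizer subclass

manager = op_regularizer_manager.OpRegularizerManager(
    [my_network_output.op],
    op_regularizer_factory_dict={'Conv2D': my_conv_regularizer_factory})

for op in manager.ops:
  regularizer = manager.get_regularizer(op)  # None for un-regularized ops
```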
<!-- TODO: Explain how to change the grouping mechanism. -->
## Network Regularizers
A `NetworkRegularizer` object targets a certain *targeted cost* of an entire
network. Its interface is:
```python
class NetworkRegularizer(object):
  """An interface for Network Regularizers."""

  @abc.abstractmethod
  def get_regularization_term(self, ops=None):
    """Computes the regularization term.

    Args:
      ops: A list of tf.Operation. If specified, only the regularization term
        associated with the ops in `ops` will be returned. Otherwise, all
        relevant ops in the default TensorFlow graph will be included.

    Returns:
      A tf.Tensor scalar of floating point type that evaluates to the
      regularization term.
    """
    pass

  @abc.abstractmethod
  def get_cost(self, ops=None):
    """Calculates the cost targeted by the Regularizer.

    Args:
      ops: A list of tf.Operation. If specified, only the cost pertaining to
        the ops in `ops` will be returned. Otherwise, all relevant ops in the
        default TensorFlow graph will be included.

    Returns:
      A tf.Tensor scalar that evaluates to the cost.
    """
    pass
```
The TensorFlow scalar returned by `get_cost` evaluates to the *targeted cost*,
and is typically used for monitoring (e.g. displaying it in TensorBoard). The
scalar returned by `get_regularization_term` is the one that has to be added to
the training loss, multiplied by a coefficient controlling its strength.

`OpRegularizerManager` and the `OpRegularizer`s it provides for ops in the graph
are intended to facilitate easy implementation of `NetworkRegularizer`s. We
exemplify it here in the context of targeting FLOPs for a convolutional network,
but the same principles apply for other *targeted costs*.
Most of the consumption of FLOPs in convolutional networks happens in the
convolutions. As a first approximation, we can neglect the FLOP impact of the
other ops in the graph, even though the framework readily allows including the
FLOP contribution of all ops, even the ones that have negligible cost. Within
this approximation, in order to build the FLOP `NetworkRegularizer`, its
constructor needs to:

*   Crawl the graph, starting from the output of the network, and find all
    convolution ops on which the output depends.
*   For each of these convolution ops, create an `OpRegularizer`.
*   Find the `OpRegularizer` of the *input* of each convolution op.
*   Implement Eq. (6) in the [MorphNet paper](https://arxiv.org/abs/1711.06798)
    to calculate the total FLOP cost of all convolutions, and an equation
    similar to Eq. (9) to calculate the respective regularization term. We say
    'similar' because Eq. (9) refers to a specific type of regularization, where
    the `regularization_vector` of a convolution is the absolute value of the
    respective batch-norm gamma vector. However, the exact nature of the
    `regularization_vector` is delegated to the `OpRegularizer`. (A rough sketch
    of this cost accounting follows below.)
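To make the FLOP accounting concrete, here is a rough sketch of the bilinear
cost of a single convolution when only alive channels are counted. It is an
illustration of the idea behind Eq. (6) under stated assumptions, not a
transcription of the paper:

```python
def approx_conv2d_flops(kernel_h, kernel_w, alive_inputs, alive_outputs,
                        output_h, output_w):
  """Approximate multiply-add count of one convolution over alive channels.

  The cost is bilinear in the number of alive input and output channels, which
  is why the marginal cost of one activation depends on the regularization of
  the neighboring ops.
  """
  return (2 * kernel_h * kernel_w * alive_inputs * alive_outputs *
          output_h * output_w)
```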
research/morph_net/framework/__init__.py (new file, mode 100644)
research/morph_net/framework/concat_and_slice_regularizers.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""OpRegularizers that concatenate and slice other OpRegularizers.
When we have a concatenation op in the network, which concatenates several
tensors, the regularizers of the concatenated ops (that is, the
regularization_vector-s and the alive_vector-s) should be concatenated as well.
Slicing is the complementary op - if regularizers Ra and Rb were concatenated
into a regularizer Rc, Ra and Rb can be obtained from Rc by slicing.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from morph_net.framework import generic_regularizers


class ConcatRegularizer(generic_regularizers.OpRegularizer):
  """An OpRegularizer that concatenates others, to reflect a Concat op."""

  def __init__(self, regularizers_to_concatenate):
    for r in regularizers_to_concatenate:
      if not generic_regularizers.dimensions_are_compatible(r):
        raise ValueError('Bad regularizer: dimensions are not compatible')

    self._alive_vector = tf.concat(
        [r.alive_vector for r in regularizers_to_concatenate], 0)
    self._regularization_vector = tf.concat(
        [r.regularization_vector for r in regularizers_to_concatenate], 0)

  @property
  def regularization_vector(self):
    return self._regularization_vector

  @property
  def alive_vector(self):
    return self._alive_vector


class SlicingReferenceRegularizer(generic_regularizers.OpRegularizer):
  """An OpRegularizer that slices a segment of another regularizer.

  This is useful to complement the ConcatRegularizer. For example, suppose that
  we have two ops, one with 3 outputs (Op1) and the other with 4 outputs (Op2).
  Each has its own regularizer, Reg1 and Reg2.

  Now suppose that a concat op concatenated Op1 and Op2 into OpC. Reg1 and Reg2
  should be concatenated to RegC. To make the situation more complicated, RegC
  was grouped in a group LASSO with another op in the graph, resulting in RegG.

  What happens next? All references to RegC should obviously be replaced by
  RegG. But what about Reg1? The latter could be the first 3 outputs of RegG,
  and Reg2 would be the last 4 outputs of RegG.

  SlicingReferenceRegularizer is a regularizer that picks a segment of outputs
  from an existing OpRegularizer. When OpRegularizers are concatenated, they
  are replaced by SlicingReferenceRegularizer-s.
  """

  def __init__(self, get_regularizer_to_slice, begin, size):
    """Creates an instance.

    Args:
      get_regularizer_to_slice: A callable, such that get_regularizer_to_slice()
        returns an OpRegularizer that has to be sliced.
      begin: An integer, where to begin the slice.
      size: An integer, the length of the slice (so the slice ends at
        begin + size).
    """
    self._get_regularizer_to_slice = get_regularizer_to_slice
    self._begin = begin
    self._size = size
    self._alive_vector = None
    self._regularization_vector = None

  @property
  def regularization_vector(self):
    if self._regularization_vector is None:
      regularizer_to_slice = self._get_regularizer_to_slice()
      self._regularization_vector = tf.slice(
          regularizer_to_slice.regularization_vector, [self._begin],
          [self._size])
    return self._regularization_vector

  @property
  def alive_vector(self):
    if self._alive_vector is None:
      regularizer_to_slice = self._get_regularizer_to_slice()
      assert regularizer_to_slice is not self
      self._alive_vector = tf.slice(regularizer_to_slice.alive_vector,
                                    [self._begin], [self._size])
    return self._alive_vector
research/morph_net/framework/concat_and_slice_regularizers_test.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for framework.concat_and_slice_regularizers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from morph_net.framework import concat_and_slice_regularizers
from morph_net.testing import op_regularizer_stub


class ConcatAndSliceRegularizersTest(tf.test.TestCase):

  def setUp(self):
    self._reg_vec1 = [0.1, 0.3, 0.6, 0.2]
    self._alive_vec1 = [False, True, True, False]
    self._reg_vec2 = [0.2, 0.4, 0.5]
    self._alive_vec2 = [False, True, False]
    self._reg1 = op_regularizer_stub.OpRegularizerStub(self._reg_vec1,
                                                       self._alive_vec1)
    self._reg2 = op_regularizer_stub.OpRegularizerStub(self._reg_vec2,
                                                       self._alive_vec2)

  def testConcatRegularizer(self):
    concat_reg = concat_and_slice_regularizers.ConcatRegularizer(
        [self._reg1, self._reg2])
    with self.test_session():
      self.assertAllEqual(self._alive_vec1 + self._alive_vec2,
                          concat_reg.alive_vector.eval())
      self.assertAllClose(self._reg_vec1 + self._reg_vec2,
                          concat_reg.regularization_vector.eval(), 1e-5)

  def testSliceRegularizer(self):
    concat_reg = concat_and_slice_regularizers.SlicingReferenceRegularizer(
        lambda: self._reg1, 1, 2)
    with self.test_session():
      self.assertAllEqual(self._alive_vec1[1:3],
                          concat_reg.alive_vector.eval())
      self.assertAllClose(self._reg_vec1[1:3],
                          concat_reg.regularization_vector.eval(), 1e-5)


if __name__ == '__main__':
  tf.test.main()
research/morph_net/framework/generic_regularizers.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Interface for MorphNet regularizers framework.
A subclasses of Regularizer represent a regularizer that targets a certain
quantity: Number of flops, model size, number of activations etc. The
Regularizer interface has two methods:
1. `get_regularization_term`, which returns a regularization term that should be
included in the total loss to target the quantity.
2. `get_cost`, the quantity itself (for example, the number of flops). This is
useful for display in TensorBoard, and later, to to provide feedback for
automatically tuning the coefficient that multplies the regularization term,
until the cost reaches (or goes below) its target value.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import abc


class OpRegularizer(object):
  """An interface for Op Regularizers.

  An OpRegularizer object corresponds to a tf.Operation, and provides
  a regularizer for the output of the op (we assume that the op has one output
  of interest in the context of MorphNet).
  """
  __metaclass__ = abc.ABCMeta

  @abc.abstractproperty
  def regularization_vector(self):
    """Returns a vector of floats, with regularizers.

    The length of the vector is the number of "output activations" (call them
    neurons, nodes, filters etc) of the op. For a convolutional network, it's
    the number of filters (aka "depth"). For a fully-connected layer, it's
    usually the second (and last) dimension - assuming the first one is the
    batch size.
    """
    pass

  @abc.abstractproperty
  def alive_vector(self):
    """Returns a vector of booleans, indicating which activations are alive
    (call them activations, neurons, nodes, filters etc). This vector is of the
    same length as the regularization_vector.
    """
    pass


class NetworkRegularizer(object):
  """An interface for Network Regularizers."""
  __metaclass__ = abc.ABCMeta

  @abc.abstractmethod
  def get_regularization_term(self, ops=None):
    """Compute the regularization term.

    Args:
      ops: A list of tf.Operation objects. If specified, only the regularization
        term associated with the ops in `ops` will be returned. Otherwise, all
        relevant ops in the default TensorFlow graph will be included.

    Returns:
      A tf.Tensor scalar of floating point type that evaluates to the
      regularization term (that should be added to the total loss, with a
      suitable coefficient).
    """
    pass

  @abc.abstractmethod
  def get_cost(self, ops=None):
    """Calculates the cost targeted by the Regularizer.

    Args:
      ops: A list of tf.Operation objects. If specified, only the cost
        pertaining to the ops in `ops` will be returned. Otherwise, all
        relevant ops in the default TensorFlow graph will be included.

    Returns:
      A tf.Tensor scalar that evaluates to the cost.
    """
    pass


def dimensions_are_compatible(op_regularizer):
  """Checks if op_regularizer's alive_vector matches regularization_vector."""
  return op_regularizer.alive_vector.shape.with_rank(1).dims[
      0].is_compatible_with(
          op_regularizer.regularization_vector.shape.with_rank(1).dims[0])
research/morph_net/framework/grouping_regularizers.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Regularizers that group other regularizers for residual connections.
An Elementwise operation between two tensors (addition, multiplication, maximum
etc) imposes a constraint of equality of the shapes of the constituents. For
example, if A, B are convolutions, and another op in the network
receives A + B as input, it means that the i-th output of A is tied to the i-th
output of B. Only if the i-th output was regularized away by the regularizer in
both A and B can we discard the i-th activation in both.
Therefore we group the i-th output of A and the i-th output of B in a group
LASSO, a group for each i. The grouping methods can vary, and this file offers
several variants.
Residual connections, in ResNet or in RNNs, are examples where this kind of
grouping is needed.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from morph_net.framework import generic_regularizers


DEFAULT_THRESHOLD = 0.01


class MaxGroupingRegularizer(generic_regularizers.OpRegularizer):
  """A regularizer that groups others by taking their maximum."""

  def __init__(self, regularizers_to_group):
    """Creates an instance.

    Args:
      regularizers_to_group: A list of generic_regularizers.OpRegularizer
        objects. Their regularization_vector (alive_vector) are expected to be
        of the same length.

    Raises:
      ValueError: regularizers_to_group is not of length 2 (TODO:
        support arbitrary length if needed).
    """
    _raise_if_length_is_not2(regularizers_to_group)
    self._regularization_vector = tf.maximum(
        regularizers_to_group[0].regularization_vector,
        regularizers_to_group[1].regularization_vector)
    self._alive_vector = tf.logical_or(regularizers_to_group[0].alive_vector,
                                       regularizers_to_group[1].alive_vector)

  @property
  def regularization_vector(self):
    return self._regularization_vector

  @property
  def alive_vector(self):
    return self._alive_vector


class L2GroupingRegularizer(generic_regularizers.OpRegularizer):
  r"""A regularizer that groups others by taking their L2 norm.

  R_j = sqrt(\sum_i r_{ij}^2)

  Where r_i is the i-th regularization vector, r_{ij} is its j-th element, and
  R_j is the j-th element of the resulting regularization vector.
  """

  def __init__(self, regularizers_to_group, threshold=DEFAULT_THRESHOLD):
    """Creates an instance.

    Args:
      regularizers_to_group: A list of generic_regularizers.OpRegularizer
        objects. Their regularization_vector (alive_vector) are expected to be
        of the same length.
      threshold: A float. A group of activations will be considered alive if
        its L2 norm is greater than `threshold`.

    Raises:
      ValueError: regularizers_to_group is not of length 2 (TODO:
        support arbitrary length if needed).
    """
    _raise_if_length_is_not2(regularizers_to_group)
    self._regularization_vector = tf.sqrt(
        lazy_square(regularizers_to_group[0].regularization_vector) +
        lazy_square(regularizers_to_group[1].regularization_vector))
    self._alive_vector = self._regularization_vector > threshold

  @property
  def regularization_vector(self):
    return self._regularization_vector

  @property
  def alive_vector(self):
    return self._alive_vector


def _raise_if_length_is_not2(regularizers_to_group):
  if len(regularizers_to_group) != 2:
    raise ValueError('Currently only groups of size 2 are supported.')


def lazy_square(tensor):
  """Computes the square of a tensor in a lazy way.

  This function is lazy in the following sense: for
    tensor = tf.sqrt(input)
  it will return input (and not tf.square(tensor)).

  Args:
    tensor: A `Tensor` of floats to compute the square of.

  Returns:
    The square of the input tensor.
  """
  if tensor.op.type == 'Sqrt':
    return tensor.op.inputs[0]
  else:
    return tf.square(tensor)
research/morph_net/framework/grouping_regularizers_test.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for framework.grouping_regularizers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from absl.testing import parameterized
import numpy as np
import tensorflow as tf

from morph_net.framework import grouping_regularizers
from morph_net.testing import op_regularizer_stub


def _l2_reg_with_025_threshold(regularizers_to_group):
  return grouping_regularizers.L2GroupingRegularizer(regularizers_to_group,
                                                     0.25)


class GroupingRegularizersTest(parameterized.TestCase, tf.test.TestCase):
  # TODO: Add parametrized tests.

  def setUp(self):
    self._reg_vec1 = [0.1, 0.3, 0.6, 0.2]
    self._alive_vec1 = [False, True, True, False]
    self._reg_vec2 = [0.2, 0.4, 0.5, 0.1]
    self._alive_vec2 = [False, True, False, True]
    self._reg_vec3 = [0.3, 0.2, 0.0, 0.25]
    self._alive_vec3 = [False, True, False, True]

    self._reg1 = op_regularizer_stub.OpRegularizerStub(self._reg_vec1,
                                                       self._alive_vec1)
    self._reg2 = op_regularizer_stub.OpRegularizerStub(self._reg_vec2,
                                                       self._alive_vec2)
    self._reg3 = op_regularizer_stub.OpRegularizerStub(self._reg_vec3,
                                                       self._alive_vec3)

  def testMaxGroupingRegularizer(self):
    group_reg = grouping_regularizers.MaxGroupingRegularizer(
        [self._reg1, self._reg2])
    with self.test_session():
      self.assertAllEqual(
          [x or y for x, y in zip(self._alive_vec1, self._alive_vec2)],
          group_reg.alive_vector.eval())
      self.assertAllClose(
          [max(x, y) for x, y in zip(self._reg_vec1, self._reg_vec2)],
          group_reg.regularization_vector.eval(), 1e-5)

  def testL2GroupingRegularizer(self):
    group_reg = grouping_regularizers.L2GroupingRegularizer(
        [self._reg1, self._reg2], 0.25)
    expected_reg_vec = [
        np.sqrt(x**2 + y**2) for x, y in zip(self._reg_vec1, self._reg_vec2)
    ]
    with self.test_session():
      self.assertAllEqual([x > 0.25 for x in expected_reg_vec],
                          group_reg.alive_vector.eval())
      self.assertAllClose(expected_reg_vec,
                          group_reg.regularization_vector.eval(), 1e-5)

  @parameterized.named_parameters(
      ('Max', grouping_regularizers.MaxGroupingRegularizer),
      ('L2', _l2_reg_with_025_threshold))
  def testOrderDoesNotMatter(self, create_reg):
    group12 = create_reg([self._reg1, self._reg2])
    group13 = create_reg([self._reg1, self._reg3])
    group23 = create_reg([self._reg2, self._reg3])
    group123 = create_reg([group12, self._reg3])
    group132 = create_reg([group13, self._reg2])
    group231 = create_reg([group23, self._reg1])
    with self.test_session():
      self.assertAllEqual(group123.alive_vector.eval(),
                          group132.alive_vector.eval())
      self.assertAllEqual(group123.alive_vector.eval(),
                          group231.alive_vector.eval())
      self.assertAllClose(group123.regularization_vector.eval(),
                          group132.regularization_vector.eval())
      self.assertAllClose(group123.regularization_vector.eval(),
                          group231.regularization_vector.eval())


if __name__ == '__main__':
  tf.test.main()
research/morph_net/framework/op_regularizer_manager.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A class for managing OpRegularizers.
OpRegularizerManager creates the required regularizers and manages the
association between ops and their regularizers. OpRegularizerManager handles the
logic associated with the graph topology:
- Concatenating tensors is reflected in concatenating their regularizers.
- Skip-connections (aka residual connections), RNNs and other structures where
the shapes of two (or more) tensors are tied together are reflected in
grouping their regularizers together.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import logging

import tensorflow as tf

from morph_net.framework import concat_and_slice_regularizers
from morph_net.framework import generic_regularizers
from morph_net.framework import grouping_regularizers


# When an op has two (or more) inputs that have regularizers, the latter need to
# be grouped. _GROUPING_OPS is a whitelist of ops that are allowed to group, as
# a form of verification of the correctness of the code. The list is not
# exhaustive, feel free to add other grouping ops as needed.
_GROUPING_OPS = ('Add', 'Sub', 'Mul', 'Div', 'Maximum', 'Minimum',
                 'SquaredDifference', 'RealDiv')
# TODO: Is Div needed?

# Ops that are not pass-through, that necessarily modify the regularizer.
# These are the Ops that should not have a regularizer that is identical to
# one of their inputs. When we recursively look for regularizers along the graph
# the recursion will always stop at these Ops even if no regularizer factory
# is provided, and never assume that they pass the regularizer of their input
# through.
NON_PASS_THROUGH_OPS = ('Conv2D', 'Conv2DBackpropInput', 'MatMul')


def _remove_nones_and_dups(items):
  result = []
  for i in items:
    if i is not None and i not in result:
      result.append(i)
  return result


def _raise_type_error_if_not_operation(op):
  if not isinstance(op, tf.Operation):
    raise TypeError('\'op\' must be of type tf.Operation, not %s' %
                    str(type(op)))


class OpRegularizerManager(object):
  """A class for managing OpRegularizers."""

  # Public methods -----------------------------------------------------------

  def __init__(self, ops, op_regularizer_factory_dict,
               create_grouping_regularizer=None):
    """Creates an instance.

    Args:
      ops: A list of tf.Operation-s. An OpRegularizer will be created for all
        the ops in `ops`, and recursively for all ops they depend on via data
        dependency. Typically `ops` would contain a single tf.Operation, which
        is the output of the network.
      op_regularizer_factory_dict: A dictionary, where the keys are strings
        representing TensorFlow Op types, and the values are callables that
        create the respective OpRegularizers. For every op encountered during
        the recursion, if op.type is in op_regularizer_factory_dict, the
        respective callable will be used to create an OpRegularizer. The
        callables have the following args:
          op: a tf.Operation for which to create a regularizer.
          opreg_manager: A reference to an OpRegularizerManager object. Can be
            None if the callable does not need access to OpRegularizerManager.
      create_grouping_regularizer: A callable that has the signature of
        grouping_regularizers.MaxGroupingRegularizer's constructor. Will be
        called whenever a grouping op (see _GROUPING_OPS) is encountered.
        Defaults to MaxGroupingRegularizer if None.

    Raises:
      ValueError: If ops is not a list.
    """
    self._constructed = False
    if not isinstance(ops, list):
      raise ValueError('Input %s ops is not a list. Should probably use []' %
                       str(ops))
    self._op_to_regularizer = {}
    self._regularizer_to_ops = collections.defaultdict(list)
    self._op_regularizer_factory_dict = op_regularizer_factory_dict
    for op_type in NON_PASS_THROUGH_OPS:
      if op_type not in self._op_regularizer_factory_dict:
        self._op_regularizer_factory_dict[op_type] = lambda x, y: None
    self._create_grouping_regularizer = (
        create_grouping_regularizer or
        grouping_regularizers.MaxGroupingRegularizer)
    self._visited = set()
    for op in ops:
      self._get_regularizer(op)
    self._constructed = True

  def get_regularizer(self, op):
    """Looks up or creates an OpRegularizer for a tf.Operation.

    Args:
      op: A tf.Operation.

    - If `self` has an OpRegularizer for `op`, it will be returned. Otherwise:
    - If called before construction of `self` was completed (that is, from the
      constructor), an attempt to create an OpRegularizer for `op` will be made.
      Otherwise:
    - If called after construction of `self` was completed, an exception will
      be raised.

    Returns:
      An OpRegularizer for `op`. Can be None if `op` is not regularized (e.g.
      `op` is a constant).

    Raises:
      RuntimeError: If `self` object has no OpRegularizer for `op` in its
        lookup table, and the construction of `self` has already been completed
        (because then `self` is immutable and an OpRegularizer cannot be
        created).
    """
    try:
      return self._op_to_regularizer[op]
    except KeyError:
      if self._constructed:
        raise ValueError('Op %s does not have a regularizer.' % op.name)
      else:
        return self._get_regularizer(op)

  @property
  def ops(self):
    return self._op_to_regularizer.keys()

  # ---- Public MUTABLE methods -----------------------------------------------
  #
  # These methods are intended to be called by OpRegularizer factory functions,
  # in the constructor of OpRegularizerManager. OpRegularizerManager is
  # immutable after construction, so calling these methods after construction
  # has been completed will raise an exception.

  def group_and_replace_regularizers(self, regularizers):
    """Groups a list of OpRegularizers and replaces them by the grouped one.

    Args:
      regularizers: A list of OpRegularizer objects to be grouped.

    Returns:
      An OpRegularizer object formed by the grouping.

    Raises:
      RuntimeError: group_and_replace_regularizers was called after
        construction of the OpRegularizerManager object was completed.
    """
    if self._constructed:
      raise RuntimeError('group_and_replace_regularizers can only be called '
                         'before construction of the OpRegularizerManager was '
                         'completed.')
    grouped = self._create_grouping_regularizer(regularizers)
    # Replace all the references to the regularizers by the new grouped
    # regularizer.
    for r in regularizers:
      self._replace_regularizer(r, grouped)
    return grouped

  # Private methods -----------------------------------------------------------

  def _get_regularizer(self, op):
    """Fetches the regularizer of `op` if it exists, creates it otherwise.

    This function calls itself recursively, directly or via _create_regularizer
    (which in turn calls _get_regularizer). It performs DFS along the data
    dependencies of the graph, and uses a self._visited set to detect loops. The
    use of self._visited makes it not thread safe, but _get_regularizer is a
    private method that is supposed to only be called from the constructor, so
    execution in multiple threads (for the same object) is not expected.

    Args:
      op: A tf.Operation.

    Returns:
      An OpRegularizer that corresponds to `op`, or None if op does not have
      a regularizer (e.g. it's a constant op).
    """
    _raise_type_error_if_not_operation(op)
    if op not in self._op_to_regularizer:
      if op in self._visited:
        # In while loops, the data dependencies form a loop.
        # TODO: RNNs have "legit" loops - will this still work?
        return None
      self._visited.add(op)
      regularizer = self._create_regularizer(op)
      self._op_to_regularizer[op] = regularizer
      self._regularizer_to_ops[regularizer].append(op)
      # Make sure that there is a regularizer (or None) for every op on which
      # `op` depends via data dependency.
      for i in op.inputs:
        self._get_regularizer(i.op)
      self._visited.remove(op)
    return self._op_to_regularizer[op]

  def _create_regularizer(self, op):
    """Creates an OpRegularizer for `op`.

    Args:
      op: A tf.Operation.

    Returns:
      An OpRegularizer that corresponds to `op`, or None if op does not have
      a regularizer.

    Raises:
      RuntimeError: Grouping is attempted at an op which is not whitelisted for
        grouping (in _GROUPING_OPS).
    """
    # First we see if there is a factory function for creating the regularizer
    # in the op_regularizer_factory_dict (supplied in the constructor).
    if op.type in self._op_regularizer_factory_dict:
      regularizer = self._op_regularizer_factory_dict[op.type](op, self)
      if regularizer is None:
        logging.warning('Failed to create regularizer for %s.', op.name)
      else:
        logging.info('Created regularizer for %s.', op.name)
      return regularizer
    # Unless overridden in op_regularizer_factory_dict, we assume that ops
    # without inputs have no regularizers. These are 'leaf' ops, typically
    # constants and variables.
    if not op.inputs:
      return None
    if op.type == 'ConcatV2':
      return self._create_concat_regularizer(op)
    inputs_regularizers = _remove_nones_and_dups(
        [self._get_regularizer(i.op) for i in op.inputs])
    # Ops whose inputs have no regularizers, and that are not in
    # op_regularizer_factory_dict, have no regularizer either (think of ops that
    # only involve constants as an example).
    if not inputs_regularizers:
      return None
    # Ops that have one input with a regularizer, and are not in
    # op_regularizer_factory_dict, are assumed to be pass-through, that is, to
    # carry over the regularizer of their inputs. Examples:
    # - Unary ops, such as RELU.
    # - BiasAdd, or similar ops, that involve a constant/variable and a
    #   regularized op (e.g. the convolution that comes before the bias).
    elif len(inputs_regularizers) == 1:
      return inputs_regularizers[0]
    # Group if we have more than one regularizer in the inputs of `op` and if it
    # is white-listed for grouping.
    elif op.type in _GROUPING_OPS:
      return self.group_and_replace_regularizers(inputs_regularizers)
    raise RuntimeError('Grouping is attempted at op which is not whitelisted '
                       'for grouping: %s' % str(op.type))

  def _create_concat_regularizer(self, concat_op):
    """Creates an OpRegularizer for a concat op.

    Args:
      concat_op: A tf.Operation of type ConcatV2.

    Returns:
      An OpRegularizer for `concat_op`.
    """
    # We omit the last input, because it's the concat dimension. Others are
    # the tensors to be concatenated.
    input_ops = [i.op for i in concat_op.inputs[:-1]]
    regularizers_to_concat = [self._get_regularizer(op) for op in input_ops]
    # If all inputs have no regularizer, neither does the concat op.
    if regularizers_to_concat == [None] * len(regularizers_to_concat):
      return None
    offset = 0
    # Replace the regularizers_to_concat by SlicingReferenceRegularizer-s that
    # slice the concatenated regularizer.
    ops_to_concat = []
    for r, op in zip(regularizers_to_concat, input_ops):
      if r is None:
        length = op.outputs[0].shape.as_list()[-1]
        offset += length
        ops_to_concat.append(self._ConstantOpReg(length))
      else:
        length = tf.shape(r.alive_vector)[0]
        slice_ref = concat_and_slice_regularizers.SlicingReferenceRegularizer(
            lambda: self._get_regularizer(concat_op), offset, length)
        offset += length
        self._replace_regularizer(r, slice_ref)
        ops_to_concat.append(r)
    # Create the concatenated regularizer itself.
    return concat_and_slice_regularizers.ConcatRegularizer(ops_to_concat)

  def _replace_regularizer(self, source, target):
    """Replaces `source` by `target` in self's lookup tables."""
    for op in self._regularizer_to_ops[source]:
      assert self._op_to_regularizer[op] is source
      self._op_to_regularizer[op] = target
      self._regularizer_to_ops[target].append(op)
    del self._regularizer_to_ops[source]

  class _ConstantOpReg(generic_regularizers.OpRegularizer):
    """A class with the constant alive property, and zero regularization."""

    def __init__(self, size):
      self._regularization_vector = tf.zeros(size)
      self._alive_vector = tf.cast(tf.ones(size), tf.bool)

    @property
    def regularization_vector(self):
      return self._regularization_vector

    @property
    def alive_vector(self):
      return self._alive_vector
research/morph_net/framework/op_regularizer_manager_test.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for op_regularizer_manager."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from absl.testing import parameterized
import numpy as np
import tensorflow as tf

from morph_net.framework import op_regularizer_manager as orm
from morph_net.testing import op_regularizer_stub

layers = tf.contrib.layers


def _get_op(name):
  return tf.get_default_graph().get_operation_by_name(name)


class TestOpRegularizerManager(parameterized.TestCase, tf.test.TestCase):

  def setUp(self):
    tf.reset_default_graph()
    tf.set_random_seed(12)
    np.random.seed(665544)

  def _batch_norm_scope(self):
    params = {
        'trainable': True,
        'normalizer_fn': layers.batch_norm,
        'normalizer_params': {
            'scale': True
        }
    }

    with tf.contrib.framework.arg_scope([layers.conv2d], **params) as sc:
      return sc

  @parameterized.named_parameters(
      ('Batch_no_par1', True, False, 'conv1'),
      ('Batch_par1', True, True, 'conv1'),
      ('NoBatch_no_par1', False, False, 'conv1'),
      ('NoBatch_par2', False, True, 'conv2'),
      ('Batch_no_par2', True, False, 'conv2'),
      ('Batch_par2', True, True, 'conv2'),
      ('Batch_par3', True, True, 'conv3'),
      ('NoBatch_par3', False, True, 'conv3'),
      ('NoBatch_no_par3', False, False, 'conv3'))
  def testSimpleOpGetRegularizer(self, use_batch_norm, use_partitioner, scope):
    # Tests the alive pattern of the conv and relu ops.
    # use_batch_norm: A Boolean. Indicates if batch norm should be used.
    # use_partitioner: A Boolean. Indicates if a fixed_size_partitioner should
    #   be used.
    # scope: A String with the scope to test.
    sc = self._batch_norm_scope() if use_batch_norm else []
    partitioner = tf.fixed_size_partitioner(2) if use_partitioner else None
    with tf.contrib.framework.arg_scope(sc):
      with tf.variable_scope(tf.get_variable_scope(), partitioner=partitioner):
        final_op = op_regularizer_stub.build_model()

    op_reg_manager = orm.OpRegularizerManager(
        [final_op], op_regularizer_stub.MOCK_REG_DICT)
    expected_alive = op_regularizer_stub.expected_alive()
    with self.test_session():
      conv_reg = op_reg_manager.get_regularizer(_get_op(scope + '/Conv2D'))
      self.assertAllEqual(expected_alive[scope], conv_reg.alive_vector.eval())

      relu_reg = op_reg_manager.get_regularizer(_get_op(scope + '/Relu'))
      self.assertAllEqual(expected_alive[scope], relu_reg.alive_vector.eval())

  @parameterized.named_parameters(
      ('Batch_no_par', True, False),
      ('Batch_par', True, True),
      ('NoBatch_no_par', False, False),
      ('NoBatch_par', False, True))
  def testConcatOpGetRegularizer(self, use_batch_norm, use_partitioner):
    sc = self._batch_norm_scope() if use_batch_norm else []
    partitioner = tf.fixed_size_partitioner(2) if use_partitioner else None
    with tf.contrib.framework.arg_scope(sc):
      with tf.variable_scope(tf.get_variable_scope(), partitioner=partitioner):
        final_op = op_regularizer_stub.build_model()

    op_reg_manager = orm.OpRegularizerManager(
        [final_op], op_regularizer_stub.MOCK_REG_DICT)
    expected_alive = op_regularizer_stub.expected_alive()

    expected = np.logical_or(expected_alive['conv4'], expected_alive['concat'])
    with self.test_session():
      conv_reg = op_reg_manager.get_regularizer(_get_op('conv4/Conv2D'))
      self.assertAllEqual(expected, conv_reg.alive_vector.eval())

      relu_reg = op_reg_manager.get_regularizer(_get_op('conv4/Relu'))
      self.assertAllEqual(expected, relu_reg.alive_vector.eval())

  @parameterized.named_parameters(
      ('Concat_5', True, 5),
      ('Concat_7', True, 7),
      ('Add_6', False, 6))
  def testGetRegularizerForConcatWithNone(self, test_concat, depth):
    image = tf.constant(0.0, shape=[1, 17, 19, 3])
    conv2 = layers.conv2d(image, 5, [1, 1], padding='SAME', scope='conv2')
    other_input = tf.add(
        tf.identity(tf.constant(3.0, shape=[1, 17, 19, depth])), 3.0)
    # other_input has None as regularizer.
    concat = tf.concat([other_input, conv2], 3)
    output = tf.add(concat, concat, name='output_out')
    op = concat.op if test_concat else output.op
    op_reg_manager = orm.OpRegularizerManager(
        [output.op], op_regularizer_stub.MOCK_REG_DICT)
    expected_alive = op_regularizer_stub.expected_alive()
    with self.test_session():
      alive = op_reg_manager.get_regularizer(op).alive_vector.eval()
      self.assertAllEqual([True] * depth, alive[:depth])
      self.assertAllEqual(expected_alive['conv2'], alive[depth:])

  @parameterized.named_parameters(
      ('add', tf.add),
      ('div', tf.divide),
      ('mul', tf.multiply),
      ('max', tf.maximum),
      ('min', tf.minimum),
      ('l2', tf.squared_difference))
  def testGroupingOps(self, tested_op):
    th, size = 0.5, 11
    image = tf.constant(0.5, shape=[1, 17, 19, 3])

    conv1 = layers.conv2d(image, 5, [1, 1], padding='SAME', scope='conv1')
    conv2 = layers.conv2d(image, 5, [1, 1], padding='SAME', scope='conv2')
    res = tested_op(conv1, conv2)

    reg = {'conv1': np.random.random(size), 'conv2': np.random.random(size)}

    def regularizer(conv_op, manager=None):
      del manager  # unused
      for prefix in ['conv1', 'conv2']:
        if conv_op.name.startswith(prefix):
          return op_regularizer_stub.OpRegularizerStub(
              reg[prefix], reg[prefix] > th)

    op_reg_manager = orm.OpRegularizerManager(
        [res.op], {'Conv2D': regularizer})
    with self.test_session():
      alive = op_reg_manager.get_regularizer(res.op).alive_vector.eval()
      self.assertAllEqual(
          alive, np.logical_or(reg['conv1'] > th, reg['conv2'] > th))


if __name__ == '__main__':
  tf.test.main()
research/morph_net/g3doc/grouping.png  0 → 100644  (20.5 KB)
research/morph_net/g3doc/histogram.png  0 → 100644  (6.54 KB)
research/morph_net/g3doc/tensorboard.png  0 → 100644  (61.3 KB)
research/morph_net/network_regularizers/__init__.py  0 → 100644
research/morph_net/network_regularizers/bilinear_cost_utils.py  0 → 100644
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Helpers for Network Regularizers that are bilinear in their inputs/outputs.

Examples: The number of FLOPs and the number of weights of a convolution are
both a bilinear expression in the number of its inputs and outputs.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from morph_net.framework import generic_regularizers


_CONV2D_OPS = ('Conv2D', 'Conv2DBackpropInput', 'DepthwiseConv2dNative')
_SUPPORTED_OPS = _CONV2D_OPS + ('MatMul',)


def _raise_if_not_supported(op):
  if not isinstance(op, tf.Operation):
    raise ValueError('conv_op must be a tf.Operation, not %s' % type(op))
  if op.type not in _SUPPORTED_OPS:
    raise ValueError('conv_op must be a Conv2D or a MatMul, not %s' % op.type)


def _get_conv_filter_size(conv_op):
  assert conv_op.type in _CONV2D_OPS
  conv_weights = conv_op.inputs[1]
  filter_shape = conv_weights.shape.as_list()[:2]
  return filter_shape[0] * filter_shape[1]


def flop_coeff(op):
  """Computes the coefficient of number of flops associated with a convolution.

  The FLOPs cost of a convolution is given by C * output_depth * input_depth,
  where C = 2 * output_width * output_height * filter_size. The 2 is because we
  have one multiplication and one addition for each convolution weight and
  pixel. This function returns C.

  Args:
    op: A tf.Operation of type 'Conv2D' or 'MatMul'.

  Returns:
    A float, the coefficient that when multiplied by the input depth and by the
    output depth gives the number of flops needed to compute the convolution.

  Raises:
    ValueError: conv_op is not a tf.Operation of type Conv2D.
  """
  _raise_if_not_supported(op)
  if op.type in _CONV2D_OPS:
    # Looking at the output shape makes it easy to automatically take into
    # account strides and the type of padding.
    if op.type == 'Conv2D' or op.type == 'DepthwiseConv2dNative':
      shape = op.outputs[0].shape.as_list()
    else:
      # Conv2DBackpropInput
      # For a transposed convolution, the input and the output are swapped (as
      # far as shapes are concerned). In other words, for a given filter shape
      # and stride, if Conv2D maps from shapeX to shapeY, Conv2DBackpropInput
      # maps from shapeY to shapeX. Therefore wherever we use the output shape
      # for Conv2D, we use the input shape for Conv2DBackpropInput.
      shape = _get_input(op).shape.as_list()
    size = shape[1] * shape[2]
    return 2.0 * size * _get_conv_filter_size(op)
  else:
    # MatMul
    # A MatMul is like a 1x1 conv with an output size of 1x1, so from the
    # factor above only the 2.0 remains.
    return 2.0


def num_weights_coeff(op):
  """The number of weights of a conv is C * output_depth * input_depth. Finds C.

  Args:
    op: A tf.Operation of type 'Conv2D' or 'MatMul'

  Returns:
    A float, the coefficient that when multiplied by the input depth and by the
    output depth gives the number of flops needed to compute the convolution.

  Raises:
    ValueError: conv_op is not a tf.Operation of type Conv2D.
  """
  _raise_if_not_supported(op)
  return _get_conv_filter_size(op) if op.type in _CONV2D_OPS else 1.0


class BilinearNetworkRegularizer(generic_regularizers.NetworkRegularizer):
  """A NetworkRegularizer with bilinear cost and loss.

  Can be used for FLOPs regularization or for model size regularization.
  """

  def __init__(self, opreg_manager, coeff_func):
    """Creates an instance.

    Args:
      opreg_manager: An OpRegularizerManager object that will be used to query
        OpRegularizers of the various ops in the graph.
      coeff_func: A callable that receives a tf.Operation of type Conv2D and
        returns a bilinear coefficient of its cost. Examples:
        - Use conv_flop_coeff for a FLOP regularizer.
        - Use conv_num_weights_coeff for a number-of-weights regularizer.
    """
    self._opreg_manager = opreg_manager
    self._coeff_func = coeff_func

  def _get_cost_or_regularization_term(self, is_regularization, ops=None):
    total = 0.0
    if not ops:
      ops = self._opreg_manager.ops
    for op in ops:
      if op.type not in _SUPPORTED_OPS:
        continue
      # We use the following expression for the regularizer:
      #
      # coeff * (number_of_inputs_alive * sum_of_output_regularizers +
      #          number_of_outputs_alive * sum_of_input_regularizers)
      #
      # where 'coeff' is a coefficient (for a particular convolution) such that
      # the number of flops of that convolution is given by:
      # number_of_flops = coeff * number_of_inputs * number_of_outputs.
      input_op = _get_input(op).op
      input_op_reg = self._opreg_manager.get_regularizer(input_op)
      output_op_reg = self._opreg_manager.get_regularizer(op)
      coeff = self._coeff_func(op)
      num_alive_inputs = _count_alive(input_op, input_op_reg)
      num_alive_outputs = _count_alive(op, output_op_reg)
      if op.type == 'DepthwiseConv2dNative':
        if is_regularization:
          reg_inputs = _sum_of_reg_vector(input_op_reg)
          reg_outputs = _sum_of_reg_vector(output_op_reg)
          # reg_inputs and reg_outputs are often identical since they should
          # come from the same regularizer. Duplicate them for symmetry.
          # When the input doesn't have a regularizer (e.g. input), only the
          # second term is used.
          # TODO: revisit this expression after experiments.
          total += coeff * (reg_inputs + reg_outputs)
        else:
          # num_alive_inputs may not always equal num_alive_outputs because the
          # input (e.g. the image) may not have a gamma regularizer. In this
          # case the computation is proportional only to num_alive_outputs.
          total += coeff * num_alive_outputs
      else:
        if is_regularization:
          reg_inputs = _sum_of_reg_vector(input_op_reg)
          reg_outputs = _sum_of_reg_vector(output_op_reg)
          total += coeff * (num_alive_inputs * reg_outputs +
                            num_alive_outputs * reg_inputs)
        else:
          total += coeff * num_alive_inputs * num_alive_outputs
    return total

  def get_cost(self, ops=None):
    return self._get_cost_or_regularization_term(False, ops)

  def get_regularization_term(self, ops=None):
    return self._get_cost_or_regularization_term(True, ops)


def _get_input(op):
  """Returns the input to that op that represents the activations.

  (as opposed to e.g. weights.)

  Args:
    op: A tf.Operation object with type in _SUPPORTED_OPS.

  Returns:
    A tf.Tensor representing the input activations.

  Raises:
    ValueError: MatMul is used with transposition (unsupported).
  """
  assert op.type in _SUPPORTED_OPS, 'Op type %s is not supported.' % op.type
  if op.type == 'Conv2D' or op.type == 'DepthwiseConv2dNative':
    return op.inputs[0]
  if op.type == 'Conv2DBackpropInput':
    return op.inputs[2]
  if op.type == 'MatMul':
    if op.get_attr('transpose_a') or op.get_attr('transpose_b'):
      raise ValueError('MatMul with transposition is not yet supported.')
    return op.inputs[0]


def _count_alive(op, opreg):
  if opreg:
    return tf.reduce_sum(tf.cast(opreg.alive_vector, tf.float32))
  else:
    return float(op.outputs[0].shape.as_list()[-1])


def _sum_of_reg_vector(opreg):
  if opreg:
    return tf.reduce_sum(opreg.regularization_vector)
  else:
    return 0.0
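As a quick illustration of the bilinear relationship documented in flop_coeff() above (this sketch is not part of the commit; the layer shape follows the test file below), the FLOP count of a convolution is recovered as coeff * input_depth * output_depth:

# Sketch only -- the same relationship that bilinear_cost_utils_test.py checks.
import tensorflow as tf
from morph_net.network_regularizers import bilinear_cost_utils

layers = tf.contrib.layers
image = tf.constant(0.0, shape=[1, 11, 13, 17])  # 17 input channels
net = layers.conv2d(image, 19, [7, 5], stride=2, padding='SAME', scope='conv1')
conv_op = tf.get_default_graph().get_operation_by_name('conv1/Conv2D')

coeff = bilinear_cost_utils.flop_coeff(conv_op)
approx_flops = coeff * 17 * 19  # coeff * input_depth * output_depth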
research/morph_net/network_regularizers/bilinear_cost_utils_test.py  0 → 100644
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for compute_cost_estimator.

Note that BilinearNetworkRegularizer is not tested here - its specific
instantiation is tested in flop_regularizer_test.py.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from tensorflow.python.framework import ops
from morph_net.network_regularizers import bilinear_cost_utils

layers = tf.contrib.layers


def _flops(op):
  """Get the number of flops of a convolution, from the ops stats registry.

  Args:
    op: A tf.Operation object.

  Returns:
    The number of flops needed to evaluate conv_op.
  """
  return ops.get_stats_for_node_def(
      tf.get_default_graph(), op.node_def, 'flops').value


def _output_depth(conv_op):
  return conv_op.outputs[0].shape.as_list()[-1]


def _input_depth(conv_op):
  conv_weights = conv_op.inputs[1]
  return conv_weights.shape.as_list()[2]


class BilinearCostUtilTest(tf.test.TestCase):

  def setUp(self):
    tf.reset_default_graph()
    image = tf.constant(0.0, shape=[1, 11, 13, 17])
    net = layers.conv2d(
        image, 19, [7, 5], stride=2, padding='SAME', scope='conv1')
    layers.conv2d_transpose(
        image, 29, [7, 5], stride=2, padding='SAME', scope='convt2')
    net = tf.reduce_mean(net, axis=(1, 2))
    layers.fully_connected(net, 23, scope='FC')
    net = layers.conv2d(
        image, 10, [7, 5], stride=2, padding='SAME', scope='conv2')
    layers.separable_conv2d(
        net, None, [3, 2], depth_multiplier=1, padding='SAME', scope='dw1')
    self.conv_op = tf.get_default_graph().get_operation_by_name('conv1/Conv2D')
    self.convt_op = tf.get_default_graph().get_operation_by_name(
        'convt2/conv2d_transpose')
    self.matmul_op = tf.get_default_graph().get_operation_by_name('FC/MatMul')
    self.dw_op = tf.get_default_graph().get_operation_by_name('dw1/depthwise')

  def assertNearRelatively(self, expected, actual):
    self.assertNear(expected, actual, expected * 1e-6)

  def testConvFlopsCoeff(self):
    # Divide by the input depth and the output depth to get the coefficient.
    expected_coeff = _flops(self.conv_op) / (17.0 * 19.0)
    actual_coeff = bilinear_cost_utils.flop_coeff(self.conv_op)
    self.assertNearRelatively(expected_coeff, actual_coeff)

  def testConvTransposeFlopsCoeff(self):
    # Divide by the input depth and the output depth to get the coefficient.
    expected_coeff = _flops(self.convt_op) / (17.0 * 29.0)
    actual_coeff = bilinear_cost_utils.flop_coeff(self.convt_op)
    self.assertNearRelatively(expected_coeff, actual_coeff)

  def testFcFlopsCoeff(self):
    expected_coeff = _flops(self.matmul_op) / (19.0 * 23.0)
    actual_coeff = bilinear_cost_utils.flop_coeff(self.matmul_op)
    self.assertNearRelatively(expected_coeff, actual_coeff)

  def testConvNumWeightsCoeff(self):
    actual_coeff = bilinear_cost_utils.num_weights_coeff(self.conv_op)
    # The coefficient is just the filter size - 7 * 5 = 35:
    self.assertNearRelatively(35, actual_coeff)

  def testFcNumWeightsCoeff(self):
    actual_coeff = bilinear_cost_utils.num_weights_coeff(self.matmul_op)
    # The coefficient is 1.0, the number of weights is just inputs x outputs.
    self.assertNearRelatively(1.0, actual_coeff)

  def testDepthwiseConvFlopsCoeff(self):
    # Divide by the input depth (which is also the output depth) to get the
    # coefficient.
    expected_coeff = _flops(self.dw_op) / (10.0)
    actual_coeff = bilinear_cost_utils.flop_coeff(self.dw_op)
    self.assertNearRelatively(expected_coeff, actual_coeff)


if __name__ == '__main__':
  tf.test.main()
research/morph_net/network_regularizers/flop_regularizer.py  0 → 100644
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A NetworkRegularizer that targets the number of FLOPs."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from morph_net.framework import op_regularizer_manager
from morph_net.network_regularizers import bilinear_cost_utils
from morph_net.op_regularizers import conv_group_lasso_regularizer
from morph_net.op_regularizers import gamma_l1_regularizer


class GammaFlopsRegularizer(bilinear_cost_utils.BilinearNetworkRegularizer):
  """A NetworkRegularizer that targets FLOPs using Gamma L1 as OpRegularizer."""

  def __init__(self, ops, gamma_threshold):
    gamma_l1_reg_factory = gamma_l1_regularizer.GammaL1RegularizerFactory(
        gamma_threshold)
    opreg_manager = op_regularizer_manager.OpRegularizerManager(
        ops, {
            'Conv2D': gamma_l1_reg_factory.create_regularizer,
            'DepthwiseConv2dNative': gamma_l1_reg_factory.create_regularizer
        })
    super(GammaFlopsRegularizer, self).__init__(
        opreg_manager, bilinear_cost_utils.flop_coeff)


class GroupLassoFlopsRegularizer(
    bilinear_cost_utils.BilinearNetworkRegularizer):
  """A NetworkRegularizer that targets FLOPs using L1 group lasso."""

  def __init__(self, ops, threshold):
    # Regularizer factories for convolution and fully connected layers.
    conv_regularizer_factory = (
        conv_group_lasso_regularizer.ConvGroupLassoRegularizerFactory(
            threshold))
    regularizer_factories = {
        'Conv2D': conv_regularizer_factory.create_regularizer,
        'Conv2DBackpropInput': conv_regularizer_factory.create_regularizer,
    }
    # Create OpRegularizerManager instance.
    opreg_manager = op_regularizer_manager.OpRegularizerManager(
        ops, regularizer_factories)
    super(GroupLassoFlopsRegularizer, self).__init__(
        opreg_manager, bilinear_cost_utils.flop_coeff)
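A possible way to use these classes in training (a sketch under assumed settings, not part of the commit; the toy network, shapes and the 1e-9 strength are illustrative): build the regularizer from the network's output ops, add its regularization term to the task loss, and monitor get_cost() as the estimated FLOP count.

# Sketch only -- not part of this commit; values are illustrative.
import tensorflow as tf
from morph_net.network_regularizers import flop_regularizer

layers = tf.contrib.layers
images = tf.zeros([8, 32, 32, 3])
labels = tf.zeros([8], dtype=tf.int64)
with tf.contrib.framework.arg_scope(
    [layers.conv2d], normalizer_fn=layers.batch_norm,
    normalizer_params={'scale': True}):
  net = layers.conv2d(images, 16, [3, 3], scope='conv1')
  net = layers.conv2d(net, 32, [3, 3], scope='conv2')
logits = layers.fully_connected(
    tf.reduce_mean(net, axis=(1, 2)), 10, activation_fn=None, scope='logits')
train_loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)

network_reg = flop_regularizer.GammaFlopsRegularizer(
    [logits.op], gamma_threshold=0.45)
regularization_strength = 1e-9  # illustrative; tuned per model in practice
total_loss = train_loss + regularization_strength * (
    network_reg.get_regularization_term())
current_flops = network_reg.get_cost()  # scalar tensor to monitor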
research/morph_net/network_regularizers/flop_regularizer_test.py  0 → 100644
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for flop_regularizer."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import abc

import numpy as np
import tensorflow as tf

from tensorflow.contrib.slim.nets import resnet_v1
from morph_net.network_regularizers import bilinear_cost_utils
from morph_net.network_regularizers import flop_regularizer

arg_scope = tf.contrib.framework.arg_scope
layers = tf.contrib.layers
_coeff = bilinear_cost_utils.flop_coeff

NUM_CHANNELS = 3


class GammaFlopLossTest(tf.test.TestCase):

  def setUp(self):
    tf.reset_default_graph()
    self.BuildWithBatchNorm()
    with self.test_session():
      self.Init()

  def BuildWithBatchNorm(self):
    params = {
        'trainable': True,
        'normalizer_fn': layers.batch_norm,
        'normalizer_params': {
            'scale': True
        }
    }

    with arg_scope([layers.conv2d], **params):
      self.BuildModel()

  def BuildModel(self):
    # Our test model is:
    #
    #         -> conv1 --+        -> conv3 -->
    #        /           |       /
    #  image          [concat]
    #        \           |       \
    #         -> conv2 --+        -> conv4 -->
    #
    # (the model has two "outputs", conv3 and conv4).
    #
    image = tf.constant(0.0, shape=[1, 17, 19, NUM_CHANNELS])
    conv1 = layers.conv2d(image, 13, [7, 5], padding='SAME', scope='conv1')
    conv2 = layers.conv2d(image, 23, [1, 1], padding='SAME', scope='conv2')
    concat = tf.concat([conv1, conv2], 3)
    self.conv3 = layers.conv2d(
        concat, 29, [3, 3], stride=2, padding='SAME', scope='conv3')
    self.conv4 = layers.conv2d(
        concat, 31, [1, 1], stride=1, padding='SAME', scope='conv4')
    self.name_to_var = {v.op.name: v for v in tf.global_variables()}

    self.gamma_flop_reg = flop_regularizer.GammaFlopsRegularizer(
        [self.conv3.op, self.conv4.op], gamma_threshold=0.45)

  def GetConv(self, name):
    return tf.get_default_graph().get_operation_by_name(name + '/Conv2D')

  def Init(self):
    tf.global_variables_initializer().run()
    gamma1 = self.name_to_var['conv1/BatchNorm/gamma']
    gamma1.assign([0.8] * 7 + [0.2] * 6).eval()
    gamma2 = self.name_to_var['conv2/BatchNorm/gamma']
    gamma2.assign([-0.7] * 11 + [0.1] * 12).eval()
    gamma3 = self.name_to_var['conv3/BatchNorm/gamma']
    gamma3.assign([0.6] * 10 + [-0.3] * 19).eval()
    gamma4 = self.name_to_var['conv4/BatchNorm/gamma']
    gamma4.assign([-0.5] * 17 + [-0.4] * 14).eval()

  def cost(self, conv):
    with self.test_session():
      return self.gamma_flop_reg.get_cost(conv).eval()

  def loss(self, conv):
    with self.test_session():
      return self.gamma_flop_reg.get_regularization_term(conv).eval()

  def testCost(self):
    # Conv1 has 7 gammas above 0.45, and NUM_CHANNELS inputs (from the image).
    conv = self.GetConv('conv1')
    self.assertEqual(_coeff(conv) * 7 * NUM_CHANNELS, self.cost([conv]))

    # Conv2 has 11 gammas above 0.45, and NUM_CHANNELS inputs (from the image).
    conv = self.GetConv('conv2')
    self.assertEqual(_coeff(conv) * 11 * NUM_CHANNELS, self.cost([conv]))

    # Conv3 has 10 gammas above 0.45, and 7 + 11 inputs from conv1 and conv2.
    conv = self.GetConv('conv3')
    self.assertEqual(_coeff(conv) * 10 * 18, self.cost([conv]))

    # Conv4 has 17 gammas above 0.45, and 7 + 11 inputs from conv1 and conv2.
    conv = self.GetConv('conv4')
    self.assertEqual(_coeff(conv) * 17 * 18, self.cost([conv]))

    # Test that passing a list of convs sums their contributions:
    convs = [self.GetConv('conv3'), self.GetConv('conv4')]
    self.assertEqual(
        self.cost(convs[:1]) + self.cost(convs[1:]), self.cost(convs))


class GammaFlopLossWithDepthwiseConvTestBase(object):
  """Test flop_regularizer for a network with depthwise convolutions."""
  __metaclass__ = abc.ABCMeta

  @abc.abstractmethod
  def GetSession(self):
    return

  def BuildWithBatchNorm(self):
    params = {
        'trainable': True,
        'normalizer_fn': layers.batch_norm,
        'normalizer_params': {
            'scale': True
        }
    }
    ops_with_batchnorm = [layers.conv2d]
    if self._depthwise_use_batchnorm:
      ops_with_batchnorm.append(layers.separable_conv2d)

    with arg_scope(ops_with_batchnorm, **params):
      self.BuildModel()

  def BuildModel(self):
    # Our test model is:
    #
    #         -> dw1 --> conv1 --+
    #        /                   |
    #  image                  [concat] --> conv3
    #        \                   |
    #         -> conv2 --> dw2 --+
    #
    # (the model has one "output", conv3).
    #
    image = tf.constant(0.0, shape=[1, 17, 19, NUM_CHANNELS])
    dw1 = layers.separable_conv2d(
        image, None, [3, 3], depth_multiplier=1, stride=1, scope='dw1')
    conv1 = layers.conv2d(dw1, 13, [7, 5], padding='SAME', scope='conv1')
    conv2 = layers.conv2d(image, 23, [1, 1], padding='SAME', scope='conv2')
    dw2 = layers.separable_conv2d(
        conv2, None, [5, 5], depth_multiplier=1, stride=1, scope='dw2')
    concat = tf.concat([conv1, dw2], 3)
    self.conv3 = layers.conv2d(
        concat, 29, [3, 3], stride=2, padding='SAME', scope='conv3')
    self.name_to_var = {v.op.name: v for v in tf.global_variables()}

    self.gamma_flop_reg = flop_regularizer.GammaFlopsRegularizer(
        [self.conv3.op], gamma_threshold=0.45)

  def GetConv(self, name):
    return tf.get_default_graph().get_operation_by_name(
        name + ('/Conv2D' if 'conv' in name else '/depthwise'))

  def GetGammaAbsValue(self, name):
    gamma_op = tf.get_default_graph().get_operation_by_name(
        name + '/BatchNorm/gamma')
    with self.GetSession():  # pylint: disable=not-context-manager
      gamma = gamma_op.outputs[0].eval()
    return np.abs(gamma)

  def Init(self):
    tf.global_variables_initializer().run()
    gamma1 = self.name_to_var['conv1/BatchNorm/gamma']
    gamma1.assign([0.8] * 7 + [0.2] * 6).eval()
    gamma2 = self.name_to_var['conv2/BatchNorm/gamma']
    gamma2.assign([-0.7] * 11 + [0.1] * 12).eval()
    gamma3 = self.name_to_var['conv3/BatchNorm/gamma']
    gamma3.assign([0.6] * 10 + [-0.3] * 19).eval()
    # Initialize gamma for depthwise convs only if there are Batchnorm for them.
    if self._depthwise_use_batchnorm:
      gammad1 = self.name_to_var['dw1/BatchNorm/gamma']
      gammad1.assign([-0.3] * 1 + [-0.9] * 2).eval()
      gammad2 = self.name_to_var['dw2/BatchNorm/gamma']
      gammad2.assign([0.3] * 5 + [0.9] * 10 + [-0.1] * 8).eval()

  def cost(self, conv):  # pylint: disable=invalid-name
    with self.GetSession():  # pylint: disable=not-context-manager
      cost = self.gamma_flop_reg.get_cost(conv)
      return cost.eval() if isinstance(cost, tf.Tensor) else cost

  def loss(self, conv):  # pylint: disable=invalid-name
    with self.GetSession():  # pylint: disable=not-context-manager
      reg = self.gamma_flop_reg.get_regularization_term(conv)
      return reg.eval() if isinstance(reg, tf.Tensor) else reg


class GammaFlopLossWithDepthwiseConvTest(
    tf.test.TestCase, GammaFlopLossWithDepthwiseConvTestBase):
  """Test flop_regularizer for a network with depthwise convolutions."""

  def setUp(self):
    self._depthwise_use_batchnorm = True
    tf.reset_default_graph()
    self.BuildWithBatchNorm()
    with self.test_session():
      self.Init()

  def GetSession(self):
    return self.test_session()

  def testCost(self):
    # Dw1 has 2 gammas above 0.45 out of NUM_CHANNELS inputs (from the image),
    # but because the input doesn't have a regularizer, it has no way of
    # removing the channels, so the channel count is still NUM_CHANNELS.
    conv = self.GetConv('dw1')
    self.assertEqual(_coeff(conv) * NUM_CHANNELS, self.cost([conv]))

    # Conv1 has 7 gammas above 0.45, and NUM_CHANNELS inputs (from dw1).
    conv = self.GetConv('conv1')
    self.assertEqual(_coeff(conv) * 7 * NUM_CHANNELS, self.cost([conv]))

    # Conv2 has 11 active + 12 inactive, while Dw2 has 5 inactive, 10 active
    # and 8 inactive. Their max (or) has 15 active and 8 inactive.
    # Conv2 has NUM_CHANNELS inputs (from the image).
    conv = self.GetConv('conv2')
    self.assertEqual(_coeff(conv) * 15 * NUM_CHANNELS, self.cost([conv]))

    # Dw2 has 15 out of 23 inputs (from the Conv2).
    conv = self.GetConv('dw2')
    self.assertEqual(_coeff(conv) * 15, self.cost([conv]))

    # Conv3 has 10 gammas above 0.45, and 7 + 15 inputs from conv1 and dw2.
    conv = self.GetConv('conv3')
    self.assertEqual(_coeff(conv) * 10 * 22, self.cost([conv]))

  def testRegularizer(self):
    # Dw1 depthwise convolution is connected to the input (no regularizer).
    conv = self.GetConv('dw1')
    # Although the effective regularizer for dw is computed as below:
    # gamma = self.GetGammaAbsValue('dw1')
    # expected_loss = _coeff(conv) * gamma.sum()
    # Since the input is not regularized, dw does not return a regularizer.
    expected_loss = 0.0
    self.assertNear(expected_loss, self.loss([conv]), expected_loss * 1e-5)

    # Conv1 takes Dw1 as input, its input regularizer is from dw1.
    conv = self.GetConv('conv1')
    gamma = self.GetGammaAbsValue('conv1')
    # The effective size for dw can be computed from its gamma, and
    # the loss may be computed as follows:
    # gamma_dw = self.GetGammaAbsValue('dw1')
    # expected_loss = _coeff(conv) * (
    #     gamma.sum() * (gamma_dw > 0.45).sum() + gamma_dw.sum() *
    #     (gamma > 0.45).sum())
    # However, since dw cannot change shape because its input doesn't have a
    # regularizer, the real loss we expect should be:
    expected_loss = _coeff(conv) * (gamma.sum() * NUM_CHANNELS)
    self.assertNear(expected_loss, self.loss([conv]), expected_loss * 1e-5)

    # Dw2 depthwise convolution is connected to conv2 (grouped regularizer).
    conv = self.GetConv('conv2')
    gamma_conv = self.GetGammaAbsValue('conv2')
    dw = self.GetConv('dw2')
    gamma_dw = self.GetGammaAbsValue('dw2')
    gamma = np.maximum(gamma_dw, gamma_conv).sum()
    expected_loss = _coeff(conv) * (gamma * 3 + (gamma > 0.45).sum() * 0)
    self.assertNear(expected_loss, self.loss([conv]), expected_loss * 1e-5)
    expected_loss = _coeff(dw) * gamma * 2
    self.assertNear(expected_loss, self.loss([dw]), expected_loss * 1e-5)


class GammaFlopLossWithDepthwiseConvNoBatchNormTest(
    tf.test.TestCase, GammaFlopLossWithDepthwiseConvTestBase):
  """Test flop_regularizer for un-batchnormed depthwise convolutions.

  This test is used to confirm that when depthwise convolution is not BNed, it
  will not be considered towards the regularizer, but it will be counted
  towards the cost. This design choice is for backward compatibility for users
  who did not regularize depthwise convolutions. However, the cost will be
  reported regardless in order to be faithful to the real computation
  complexity.
  """

  def setUp(self):
    self._depthwise_use_batchnorm = False
    tf.reset_default_graph()
    self.BuildWithBatchNorm()
    with self.test_session():
      self.Init()

  def GetSession(self):
    return self.test_session()

  def testCost(self):
    # Dw1 has NUM_CHANNELS inputs (from the image).
    conv = self.GetConv('dw1')
    self.assertEqual(_coeff(conv) * 3, self.cost([conv]))

    # Conv1 has 7 gammas above 0.45, and 3 inputs (from dw1).
    conv = self.GetConv('conv1')
    self.assertEqual(_coeff(conv) * 7 * 3, self.cost([conv]))

    # Conv2 has 11 active outputs and NUM_CHANNELS inputs (from the image).
    conv = self.GetConv('conv2')
    self.assertEqual(_coeff(conv) * 11 * NUM_CHANNELS, self.cost([conv]))

    # Dw2 has 11 inputs (pass-through from the Conv2).
    conv = self.GetConv('dw2')
    self.assertEqual(_coeff(conv) * 11, self.cost([conv]))

    # Conv3 has 10 gammas above 0.45, and 7 + 11 inputs from conv1 and dw2.
    conv = self.GetConv('conv3')
    self.assertEqual(_coeff(conv) * 10 * 18, self.cost([conv]))

  def testRegularizer(self):
    # Dw1 depthwise convolution is connected to the input (no regularizer).
    conv = self.GetConv('dw1')
    expected_loss = 0.0
    self.assertNear(expected_loss, self.loss([conv]), expected_loss * 1e-5)

    # Conv1 takes Dw1 as input, but it's not affected by dw1 because depthwise
    # is not BNed.
    conv = self.GetConv('conv1')
    gamma = self.GetGammaAbsValue('conv1')
    expected_loss = _coeff(conv) * (gamma.sum() * NUM_CHANNELS)
    self.assertNear(expected_loss, self.loss([conv]), expected_loss * 1e-5)

    # Dw2 depthwise convolution is connected to conv2 (pass through).
    dw = self.GetConv('dw2')
    gamma = self.GetGammaAbsValue('conv2')
    expected_loss = _coeff(dw) * gamma.sum() * 2
    self.assertNear(expected_loss, self.loss([dw]), expected_loss * 1e-5)


class GammaFlopResidualConnectionsLossTest(tf.test.TestCase):
  """Tests flop_regularizer for a network with residual connections."""

  def setUp(self):
    tf.reset_default_graph()
    tf.set_random_seed(7)
    self._threshold = 0.6

  def buildModel(self, resnet_fn, block_fn):
    # We use this model as a test case because the slim.nets.resnet module is
    # used in some production.
    #
    # The model looks as follows:
    #
    # Image --> unit_1/shortcut
    # Image --> unit_1/conv1 --> unit_1/conv2 --> unit_1/conv3
    #
    # unit_1/shortcut + unit_1/conv3 --> unit_1 (residual connection)
    #
    # unit_1 --> unit_2/conv1 -> unit_2/conv2 --> unit_2/conv3
    #
    # unit_1 + unit_2/conv3 --> unit_2 (residual connection)
    #
    # In between, there are strided convolutions and pooling ops, but these
    # should not affect the regularizer.
    blocks = [
        block_fn('block1', base_depth=7, num_units=2, stride=2),
    ]
    image = tf.constant(0.0, shape=[1, 2, 2, NUM_CHANNELS])
    net = resnet_fn(
        image, blocks, include_root_block=False, is_training=False)[0]
    net = tf.reduce_mean(net, axis=(1, 2))
    return layers.fully_connected(net, 23, scope='FC')

  def buildGraphWithBatchNorm(self, resnet_fn, block_fn):
    params = {
        'trainable': True,
        'normalizer_fn': layers.batch_norm,
        'normalizer_params': {
            'scale': True
        }
    }

    with arg_scope([layers.conv2d, layers.separable_conv2d], **params):
      self.net = self.buildModel(resnet_fn, block_fn)

  def initGamma(self):
    assignments = []
    gammas = {}
    for v in tf.global_variables():
      if v.op.name.endswith('/gamma'):
        assignments.append(v.assign(tf.random_uniform(v.shape)))
        gammas[v.op.name] = v
    with self.test_session() as s:
      s.run(assignments)
      self._gammas = s.run(gammas)

  def getGamma(self, short_name):
    tokens = short_name.split('/')
    name = ('resnet_v1/block1/' + tokens[0] + '/bottleneck_v1/' + tokens[1] +
            '/BatchNorm/gamma')
    return self._gammas[name]

  def getOp(self, short_name):
    if short_name == 'FC':
      return tf.get_default_graph().get_operation_by_name('FC/MatMul')
    tokens = short_name.split('/')
    name = ('resnet_v1/block1/' + tokens[0] + '/bottleneck_v1/' + tokens[1] +
            '/Conv2D')
    return tf.get_default_graph().get_operation_by_name(name)

  def numAlive(self, short_name):
    return np.sum(self.getGamma(short_name) > self._threshold)

  def getCoeff(self, short_name):
    return _coeff(self.getOp(short_name))

  def testCost(self):
    self.buildGraphWithBatchNorm(resnet_v1.resnet_v1, resnet_v1.resnet_v1_block)
    self.initGamma()
    res_alive = np.logical_or(
        np.logical_or(
            self.getGamma('unit_1/shortcut') > self._threshold,
            self.getGamma('unit_1/conv3') > self._threshold),
        self.getGamma('unit_2/conv3') > self._threshold)

    self.gamma_flop_reg = flop_regularizer.GammaFlopsRegularizer(
        [self.net.op], self._threshold)

    expected = {}
    expected['unit_1/shortcut'] = (
        self.getCoeff('unit_1/shortcut') * np.sum(res_alive) * NUM_CHANNELS)
    expected['unit_1/conv1'] = (
        self.getCoeff('unit_1/conv1') * self.numAlive('unit_1/conv1') *
        NUM_CHANNELS)
    expected['unit_1/conv2'] = (
        self.getCoeff('unit_1/conv2') * self.numAlive('unit_1/conv2') *
        self.numAlive('unit_1/conv1'))
    expected['unit_1/conv3'] = (
        self.getCoeff('unit_1/conv3') * np.sum(res_alive) *
        self.numAlive('unit_1/conv2'))
    expected['unit_2/conv1'] = (
        self.getCoeff('unit_2/conv1') * self.numAlive('unit_2/conv1') *
        np.sum(res_alive))
    expected['unit_2/conv2'] = (
        self.getCoeff('unit_2/conv2') * self.numAlive('unit_2/conv2') *
        self.numAlive('unit_2/conv1'))
    expected['unit_2/conv3'] = (
        self.getCoeff('unit_2/conv3') * np.sum(res_alive) *
        self.numAlive('unit_2/conv2'))
    expected['FC'] = 2.0 * np.sum(res_alive) * 23.0

    # TODO: Is there a way to use Parametrized Tests to make this more
    # elegant?
    with self.test_session():
      for short_name in expected:
        cost = self.gamma_flop_reg.get_cost([self.getOp(short_name)]).eval()
        self.assertEqual(expected[short_name], cost)

      self.assertEqual(
          sum(expected.values()), self.gamma_flop_reg.get_cost().eval())


class GroupLassoFlopRegTest(tf.test.TestCase):

  def assertNearRelatively(self, expected, actual):
    self.assertNear(expected, actual, expected * 1e-6)

  def testFlopRegularizer(self):
    tf.reset_default_graph()
    tf.set_random_seed(7907)
    with arg_scope(
        [layers.conv2d, layers.conv2d_transpose],
        weights_initializer=tf.random_normal_initializer):
      # Our test model is:
      #
      #         -> conv1 --+
      #        /           |--[concat]
      #  image --> conv2 --+
      #        \
      #         -> convt
      #
      # (the model has two "outputs", convt and concat).
      #
      image = tf.constant(0.0, shape=[1, 17, 19, NUM_CHANNELS])
      conv1 = layers.conv2d(image, 13, [7, 5], padding='SAME', scope='conv1')
      conv2 = layers.conv2d(image, 23, [1, 1], padding='SAME', scope='conv2')
      self.concat = tf.concat([conv1, conv2], 3)
      self.convt = layers.conv2d_transpose(
          image, 29, [7, 5], stride=3, padding='SAME', scope='convt')
      self.name_to_var = {v.op.name: v for v in tf.global_variables()}
    with self.test_session():
      tf.global_variables_initializer().run()

    threshold = 1.0
    flop_reg = flop_regularizer.GroupLassoFlopsRegularizer(
        [self.concat.op, self.convt.op], threshold=threshold)

    with self.test_session() as s:
      evaluated_vars = s.run(self.name_to_var)

    def group_norm(weights, axis=(0, 1, 2)):  # pylint: disable=invalid-name
      return np.sqrt(np.mean(weights**2, axis=axis))

    reg_vectors = {
        'conv1': group_norm(evaluated_vars['conv1/weights'], (0, 1, 2)),
        'conv2': group_norm(evaluated_vars['conv2/weights'], (0, 1, 2)),
        'convt': group_norm(evaluated_vars['convt/weights'], (0, 1, 3))
    }

    num_alive = {k: np.sum(r > threshold) for k, r in reg_vectors.iteritems()}
    total_outputs = (
        reg_vectors['conv1'].shape[0] + reg_vectors['conv2'].shape[0])
    total_alive_outputs = sum(num_alive.values())
    assert total_alive_outputs > 0, (
        'All outputs are dead - test is trivial. Decrease the threshold.')
    assert total_alive_outputs < total_outputs, (
        'All outputs are alive - test is trivial. Increase the threshold.')

    coeff1 = _coeff(_get_op('conv1/Conv2D'))
    coeff2 = _coeff(_get_op('conv2/Conv2D'))
    coefft = _coeff(_get_op('convt/conv2d_transpose'))

    expected_flop_cost = NUM_CHANNELS * (
        coeff1 * num_alive['conv1'] + coeff2 * num_alive['conv2'] +
        coefft * num_alive['convt'])
    expected_reg_term = NUM_CHANNELS * (
        coeff1 * np.sum(reg_vectors['conv1']) +
        coeff2 * np.sum(reg_vectors['conv2']) +
        coefft * np.sum(reg_vectors['convt']))
    with self.test_session():
      self.assertEqual(
          round(expected_flop_cost), round(flop_reg.get_cost().eval()))
      self.assertNearRelatively(expected_reg_term,
                                flop_reg.get_regularization_term().eval())


def _get_op(name):  # pylint: disable=invalid-name
  return tf.get_default_graph().get_operation_by_name(name)


if __name__ == '__main__':
  tf.test.main()