Commit e79232f9 (unverified), authored May 22, 2018 by Lukasz Kaiser, committed by GitHub on May 22, 2018

Merge pull request #4334 from gariel-google/master

Added the MorphNet library

Parents: 81d77669, 79680288
Changes: 29
Showing 20 changed files with 2396 additions and 0 deletions (+2396, -0)
CODEOWNERS (+1, -0)
research/morph_net/README.md (+84, -0)
research/morph_net/__init__.py (+0, -0)
research/morph_net/framework/README.md (+367, -0)
research/morph_net/framework/__init__.py (+0, -0)
research/morph_net/framework/concat_and_slice_regularizers.py (+108, -0)
research/morph_net/framework/concat_and_slice_regularizers_test.py (+59, -0)
research/morph_net/framework/generic_regularizers.py (+107, -0)
research/morph_net/framework/grouping_regularizers.py (+134, -0)
research/morph_net/framework/grouping_regularizers_test.py (+101, -0)
research/morph_net/framework/op_regularizer_manager.py (+333, -0)
research/morph_net/framework/op_regularizer_manager_test.py (+160, -0)
research/morph_net/g3doc/grouping.png (+0, -0)
research/morph_net/g3doc/histogram.png (+0, -0)
research/morph_net/g3doc/tensorboard.png (+0, -0)
research/morph_net/network_regularizers/__init__.py (+0, -0)
research/morph_net/network_regularizers/bilinear_cost_utils.py (+213, -0)
research/morph_net/network_regularizers/bilinear_cost_utils_test.py (+117, -0)
research/morph_net/network_regularizers/flop_regularizer.py (+59, -0)
research/morph_net/network_regularizers/flop_regularizer_test.py (+553, -0)
CODEOWNERS
@@ -24,6 +24,7 @@
 /research/lm_1b/ @oriolvinyals @panyx0718
 /research/marco/ @vincentvanhoucke
 /research/maskgan/ @a-dai
+/research/morph_net/ @gariel-google
 /research/namignizer/ @knathanieltucker
 /research/neural_gpu/ @lukaszkaiser
 /research/neural_programmer/ @arvind2505
 ...
research/morph_net/README.md (new file, mode 100644)
# MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
[TOC]
## What is MorphNet?
MorphNet is a method for learning deep network structure during training. The
key principle is continuous relaxation of the network-structure learning
problem. Specifically, we use regularizers that induce sparsity in the space of
activations of the network. The regularizers can be tailored to target the
consumption of specific resources by the network, such as FLOPs or model size.
When such a regularizer is added to the training loss and their sum is
minimized via stochastic gradient descent or a similar optimizer, the learning
problem also becomes a constrained optimization of the structure of the network,
under the constraint represented by the regularizer. The method is described in
detail in [this paper](https://arxiv.org/abs/1711.06798), to appear in
[CVPR 2018](http://cvpr2018.thecvf.com/).
## Adding a MorphNet regularizer to your training code
Your interaction with the MorphNet codebase will most likely be through
subclasses of `NetworkRegularizer`. Each subclass represents a resource that we
wish to target/constrain when optimizing the network. The MorphNet package
provides several `NetworkRegularizer`s in the `network_regularizers` directory,
as well as a framework for writing your own. The framework is described in
detail [here](g3doc/regularizers_framework.md). The interface of
`NetworkRegularizer` is given
[here](g3doc/regularizers_framework.md?#network-regularizers).

To apply a `NetworkRegularizer` to your network, your code would look similar to
the example below. The example refers to a specific type of `NetworkRegularizer`
that targets FLOPs. To keep the discussion simple we henceforth restrict it to
this case, but generalization to an arbitrary constrained resource and an
arbitrary regularization method that targets that resource is straightforward.
```python
my_gamma_threshold = 1e-3
regularizer_strength = 1e-9
network_reg = network_regularizers.GammaFlopsRegularizer(
    [my_network_output.op], my_gamma_threshold)
my_training_loss += regularizer_strength * network_reg.get_regularization_term()
tf.summary.scalar('FLOPs', network_reg.get_cost())
```
Once you start your training, your TensorBoard will display the effective FLOP
count of the model. "Effective" is in the sense that as activations are zeroed
out by the regularizer, their impact on the FLOP count is discounted.
*(figure: TensorBoard screenshot showing the effective FLOP cost)*
The larger the `regularizer_strength`, the smaller the effective FLOP count to
which the network will converge. If `regularizer_strength` is large enough, the
FLOP count will collapse to zero, whereas if it is small enough, the FLOP count
will remain at its initial value and the network structure will not vary.
`regularizer_strength` is your knob to control where you want to be on the
price-performance curve. The `my_gamma_threshold` parameter is used for
determining when an activation is alive. It is described in more detail
[here](framework/README.md?#the-opregularizer-interface), including an
explanation for how to tune it.
## Extracting the architecture learned by MorphNet
One way to extract the structure is by querying the `network_reg` object created
above. To query which activations in a given op were kept alive (as opposed to
removed) by MorphNet, your code would look similar to:
```python
alive = sess.run(network_reg.opreg_manager.get_regularizer(op).alive_vector)
```
where `op` is the TensorFlow op in question, and `sess` is a `tf.Session`
object. The result is a vector of booleans, designating which activations were
kept alive (more details can be found
[here](framework/README.md?#the-opregularizer-interface)). Typically one would
be interested in the number of alive activations, which can be obtained by
counting the `True` values in `alive`. Looping over all convolutions and/or
fully connected layers (as `op`) is typically sufficient to extract the full
structure learned by MorphNet; a minimal sketch follows.
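The sketch below is an illustration, not part of the MorphNet API. It assumes
the `network_reg` and `sess` objects from the snippets above, and that the
layers of interest are `Conv2D` and `MatMul` ops:

```python
import tensorflow as tf

alive_counts = {}
for op in tf.get_default_graph().get_operations():
  if op.type not in ('Conv2D', 'MatMul'):
    continue
  try:
    regularizer = network_reg.opreg_manager.get_regularizer(op)
  except ValueError:
    continue  # the manager never reached this op while traversing the graph
  if regularizer is None:
    continue  # the op is un-regularized
  # Number of output activations MorphNet kept alive for this layer.
  alive_counts[op.name] = int(sess.run(regularizer.alive_vector).sum())
```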
## Maintainers
*   Elad Eban
*   Ariel Gordon, github: [gariel-google](https://github.com/gariel-google).
research/morph_net/__init__.py (new file, mode 100644)
research/morph_net/framework/README.md (new file, mode 100644)
# Regularizers Framework
[TOC]
## Goal
The goal of this framework is to facilitate building sparsifying regularizers
for deep networks. A regularizer targets a certain cost (***targeted cost***),
such as the FLOP cost of inference, model size, latency, memory footprint, etc.
In order to form such a regularizer, we traverse the TensorFlow graph and find
the ops that contribute to the *targeted cost*. For each op, we apply a
sparsifying regularizer that induces sparsity *among the activations*. The
sparsifying regularizer of each activation is weighted by its marginal
contribution to the *targeted cost*.
Calculating this weight may be a nontrivial task. For example, for a fully
connected layer the FLOP cost is proportional to the number of inputs times
the number of outputs, which means that the marginal cost of each output
is proportional to the number of inputs. Some of the inputs may have already
been regularized away, which means that the calculation of one op's FLOP
regularizer depends on the regularization of the output of other ops. Moreover,
if an op receives as input a concatenation or a sum of several other ops,
figuring out the regularizer requires some bookkeeping.
The goal of this framework is to take care of this bookkeeping in a general way,
to facilitate building a wide variety of regularizers, targeting a wide variety
of *targeted costs*, with little effort and fewer opportunities to err. In what
follows we outline the framework, building it from the bottom up: from a single
activation all the way to a full complex network.
## `OpRegularizers` and how they are assigned
### The `OpRegularizer` interface
`OpRegularizer` is the most primitive element in the framework. An
`OpRegularizer` refers to a TensorFlow op, and has two methods,
`regularization_vector` and `alive_vector`, both returning `tf.Tensor`s of
rank 1 (vectors).

`regularization_vector` is of type float, and its `i`-th entry is the
regularizer of the `i`-th activation of the op the `OpRegularizer` refers to.
In order to regularize away that activation, one would need to add the `i`-th
entry of `regularization_vector`, multiplied by some coefficient, to the
training loss. The stronger we want to penalize it, the larger the coefficient
is. Assuming that the regularizer is of a sparsifying nature (e.g. L1 norm),
with a large enough coefficient, the `i`-th activation will eventually vanish.
Loosely speaking, if we were to target the total number of activations in the
network, we would add the sum of all `regularization_vector`s from all
`OpRegularizer`s to the training loss.

Since `OpRegularizer` is an abstract interface, with no awareness of the nature
of the regularization used, the decision when an activation can be considered
alive is also deferred to the `OpRegularizer`, via the `alive_vector` method.
The `i`-th entry evaluates to a boolean that indicates whether the activation
is alive.
```python
class OpRegularizer(object):

  @abc.abstractproperty
  def regularization_vector(self):
    """Returns a vector of floats with a regularizer for each activation."""
    pass

  @abc.abstractproperty
  def alive_vector(self):
    """Returns a bool vector indicating which activations are alive."""
    pass
```
As an example, we can consider a fully connected layer that has `m` inputs and
`n` outputs. The layer is represented by an `m * n` matrix, and one way to
impose a sparsifying regularizer on the `i`-th output is by grouping all weights
associated with it into a group LASSO regularizer, such as the L2 norm of the
`i`-th row of the matrix. That would therefore be the `i`-th entry of the
`regularization_vector`.

When such a regularization is added to the training loss, the L2 norms of the
rows of the matrix tend to form a bimodal distribution with one peak near "zero"
(up to numerical noise), another peak away from zero, and a void in between. A
natural way to determine whether the `i`-th activation is alive is thus by
comparing the `i`-th entry of the `regularization_vector` to some threshold that
lies in that void: if it's above the threshold, it's alive. A toy numeric
illustration is sketched below.
*(figure: histogram of per-output L2 norms, showing the bimodal distribution and the void in between)*
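The following toy example (plain NumPy, not MorphNet library code) builds the
`regularization_vector` and `alive_vector` of a small fully connected layer,
following the row-norm convention described above; the weights and the
threshold are made up for the illustration:

```python
import numpy as np

# One row per output activation (3 outputs, 4 inputs each).
weights = np.array([[0.50, -0.30, 0.20, 0.10],    # output 0: clearly alive
                    [0.01, 0.02, -0.01, 0.00],    # output 1: essentially zeroed out
                    [0.40, 0.10, -0.20, 0.30]])   # output 2: clearly alive

# Group LASSO per output: the L2 norm of each row.
regularization_vector = np.linalg.norm(weights, axis=1)  # ~[0.62, 0.02, 0.55]
threshold = 0.1  # chosen to lie in the "void" of the bimodal histogram
alive_vector = regularization_vector > threshold         # [True, False, True]
```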
There are ops that are not regularized, such as constants, or the input to the
network. For an un-regularized op, the `OpRegularizer` is set to `None`, which
implies an all-zero `regularization_vector` and an all-`True` `alive_vector`.
### Rules for assigning `OpRegularizer`s to ops
As we traverse the TensorFlow graph, we assign an `OpRegularizer` to each op we
encounter, according to the set of rules outlined in this section. We first
explain "default rules", rules that address propagating `OpRegularizer`s across
connections in the TensorFlow graph. Then we discuss client-specified rules,
which can augment and override the default rules.
#### Pass-through ops
Many TensorFlow ops inherit the `OpRegularizer` of their input. These are ops
that:

*   Don't change the alive status of activations.
*   The only way an activation can be eliminated from their output is if
    it's eliminated from their input.

An example is adding a bias to the output of a convolution. After adding a bias
to it, an activation will be alive (that is, have nonzero variance) if and only
if it was alive before adding the bias. If we want to regularize away an
activation at the output of a `BiasAdd` op, the only way to do so is to penalize
the same activation in the preceding convolution.
Since both the `regularization_vector` and the `alive_vector` of such an op are
identical to those of its input, so is the entire `OpRegularizer`. We refer to
such ops as *pass-through* ops. Shape-preserving unary ops (e.g. ReLU) are
generally *pass-through*, but some binary ops are too. In our framework ops are
assumed to be *pass-through* by default. Exceptions to this rule are discussed
below.
#### Grouping
When learning the number of outputs of ops in a TensorFlow graph, some ops are
constrained to maintain the same number of outputs as others. Elementwise ops
that are performed on two (or more) tensors, such as addition, multiplication,
or maximum, constrain their input tensors to have the same size. Common use
cases are attention maps, recurrent models, and residual connections. An example
of a residual connection is illustrated in the diagram below. It would be
problematic if the activations of op1 and op2 didn't live or die together. For
example, if the `i`-th activation of op1 is alive but for op2 it's dead, we
still cannot eliminate the `i`-th activation from op2 without breaking the
topology of the network.
*(figure: diagram of a residual connection, where op1 and op2 feed an elementwise addition)*
In our framework we choose to impose preservation of the topology. That is, ops
that are connected with addition (or other elementwise binary ops) are
constrained to have their activations live and die together. The `i`-th
activations of each of those ops are grouped together in a single LASSO group.
The default grouping mechanism is maximum for the `regularization_vector` and
elementwise logical OR for the `alive_vector`: to regularize away the `i`-th
element of the group one needs to penalize the maximum of the `i`-th
regularization terms of all ops comprising the group, and to declare the entire
`i`-th group dead, the `i`-th element in all ops comprising the group must be
dead. However, the framework admits other forms of grouping, and user-defined
grouping methods can easily be plugged into it (a minimal sketch of the default
grouping is given below).
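As a small illustration of the default grouping just described (restated in
NumPy for clarity; the library's TensorFlow implementation appears in
`grouping_regularizers.py` later in this change), the group takes the
elementwise maximum of the regularization vectors and the elementwise OR of the
alive vectors:

```python
import numpy as np

reg_op1 = np.array([0.6, 0.02, 0.4])
reg_op2 = np.array([0.1, 0.50, 0.3])
alive_op1 = np.array([True, False, True])
alive_op2 = np.array([False, True, True])

group_regularization = np.maximum(reg_op1, reg_op2)  # [0.6, 0.5, 0.4]
group_alive = np.logical_or(alive_op1, alive_op2)    # [True, True, True]
```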
One property of the grouping, which may seem confusing initially, is that once
two (or more) `OpRegularizer`s are grouped and the `OpRegularizer` of the group
is formed, the `OpRegularizer`s comprising the group are all 'replaced' by the
`OpRegularizer` of the group. For example, in the diagram above, the
`OpRegularizer`s of op1 and op2 have to be grouped. Therefore if the `i`-th
output of op1 is alive and that of op2 is dead, and we use the default grouping
described above, the `i`-th output of the group is *alive*.

Now, consider op4, which receives only op2 as input. From the point of view of
op4, the `i`-th activation of op2 must be considered *alive*, even though the
original op2 regularizer deemed it *dead*. This is because we already know that
we won't be able to do away with the `i`-th activation of op2 - it is tied to
that of op1, which is alive. Therefore, after the grouping, the `OpRegularizer`s
of all constituents of the group are henceforth *replaced* by the
`OpRegularizer` of the group.
#### Concatenation
Often outputs of several ops are concatenated into a single tensor. For example,
in Inception networks, the outputs of various convolutional 'towers' are
concatenated along the channels dimension. In such a case, it is obvious that
the `regularization_vector` (`alive_vector`) of the concatenation is a
concatenation of the `regularization_vector`s (`alive_vector`s) of the
concatenated ops.

Similarly to the logic of grouping, once the concatenation of the
`OpRegularizer`s has happened, the concatenated `OpRegularizer`s cease to exist
and are replaced by slices of their concatenation. For example, if op1 has 3
outputs and op2 has 4, and op3 is their concatenation, op3 has 7 outputs. After
the concatenation, the `alive_vector` of op1 will be a slice (from index 0 to
index 2) of the `alive_vector` of op3, whereas for op2 it will be another slice
(from index 3 to index 6).

If op3 is later grouped with op4, as happens in Inception ResNet architectures,
a group will be formed, and the `alive_vector` of op1 will henceforth be a slice
(from index 0 to index 2) of the `alive_vector` of *the new group*. This is for
the same reasons as the ones described in the section above.
#### Client-specified rules
The client code of the framework has the opportunity to specify rules for
creating `OpRegularizer`s. For example, for ops of type `MatMul`, which are the
common implementation of fully-connected layers, the client can choose to assign
group LASSO regularizers similar to the one described above. Typically the
client code would choose to do that for 'interesting' ops, like convolutions and
fully-connected layers, but the choice of rules is ultimately deferred to the
client code.

The client code may also choose to override the *default rules*. Ops are
considered *pass-through* by default, and obviously there are cases where this
is not true, such as reshaping, slicing, sparse matrix operations etc.
TensorFlow is much too expressive for us to be able to anticipate every usage
pattern of its ops and to properly regularize them. The set of default rules
covers most of the common published convolutional networks, but we do not
presume to cover *all* networks. More complex networks may require adding some
custom rules.
### OpRegularizerManager
`OpRegularizerManager` is the class responsible for assigning an `OpRegularizer`
to each op in the TensorFlow graph. Its constructor crawls the TensorFlow graph,
starting from the ops listed in the `ops` argument (typically the output of the
network), recursively, and assigns `OpRegularizer`s to each op encountered. Once
the object is constructed, it provides read-only methods that allow querying the
`OpRegularizer` for any op that was encountered during construction, and a list
of the latter ops for convenience.
```python
class OpRegularizerManager(object):
  """Assigns OpRegularizers to ops in a graph and bookkeeps the mapping."""

  def __init__(self, ops, op_regularizer_factory_dict,
               create_grouping_regularizer=None):
    """Creates an instance.

    Args:
      ops: A list of tf.Operation. An OpRegularizer will be created for all the
        ops in `ops`, and recursively for all ops they depend on via data
        dependency. Typically `ops` would contain a single tf.Operation, which
        is the output of the network.
      op_regularizer_factory_dict: A dictionary, where the keys are strings
        representing TensorFlow Op types, and the values are callables that
        create the respective OpRegularizers. For every op encountered during
        the recursion, if op.type is in op_regularizer_factory_dict, the
        respective callable will be used to create an OpRegularizer. The
        callables have the following args:
          op: a tf.Operation for which to create a regularizer.
          opreg_manager: A reference to an OpRegularizerManager object. Can be
            None if the callable does not need access to OpRegularizerManager.
      create_grouping_regularizer: A callable that has the signature of
        grouping_regularizers.MaxGroupingRegularizer's constructor. Will be
        called whenever a grouping op (see _GROUPING_OPS) is encountered.
        Defaults to MaxGroupingRegularizer if None.

    Raises:
      ValueError: If ops is not a list.
    """
    ...

  def get_regularizer(self, op):
    """Returns the OpRegularizer object pertaining to `op`.

    Args:
      op: a tf.Operation object.

    Returns:
      An OpRegularizer object, or None if `op` does not have one.

    Raises:
      ValueError: The OpRegularizerManager object did not encounter `op` when
        it was constructed and the graph was traversed, and thus does not know
        the answer.
    """
    ...

  @property
  def ops(self):
    """Returns all tf.Operations for which `get_regularizer` is known."""
    ...
```
As the constructor crawls the graph, it invokes the following set of rules for
any op encountered:

*   If `op_regularizer_factory_dict` has a rule on how to create an
    `OpRegularizer` for the type of the op encountered, invoke the rule. These
    are the user-specified rules. Otherwise:
*   If the op has no inputs, return `None`. Examples are constants and
    variables. Otherwise:
*   If the op is a concatenation, invoke the rule for concatenation described
    above. Otherwise:
*   If the op has more than one regularized input (that is, input that has a
    non-`None` `OpRegularizer`), perform grouping. Being conservative, we first
    check that the op is whitelisted for being a grouping op (elementwise
    addition, subtraction etc). Otherwise:
*   The op is a *pass-through*. That is, its OpRegularizer is the same as that
    of its input.

The implementation is recursive: we start from the output node(s) of the graph.
To build an `OpRegularizer` for each op, we need to know the `OpRegularizer` of
its inputs, so we make a recursive call to find out those, and so on. A usage
sketch is given below.
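In the sketch that follows, `my_conv_regularizer_factory` and
`MyGroupLassoRegularizer` are hypothetical names used only to illustrate the
expected shape of `op_regularizer_factory_dict` (an op type mapped to a callable
taking `(op, opreg_manager)`), and `my_network_output` is assumed to be the
output tensor of your network:

```python
def my_conv_regularizer_factory(op, opreg_manager):
  del opreg_manager  # this particular rule does not need the manager
  return MyGroupLassoRegularizer(op)  # hypothetical OpRegularizer subclass

manager = op_regularizer_manager.OpRegularizerManager(
    [my_network_output.op],
    op_regularizer_factory_dict={'Conv2D': my_conv_regularizer_factory})

for op in manager.ops:
  regularizer = manager.get_regularizer(op)  # None for un-regularized ops
```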
<!-- TODO: Explain how to change the grouping mechanism. -->
## Network Regularizers
A `NetworkRegularizer` object targets a certain *targeted cost* of an entire
network. Its interface is:
```python
class NetworkRegularizer(object):
  """An interface for Network Regularizers."""

  @abc.abstractmethod
  def get_regularization_term(self, ops=None):
    """Computes the regularization term.

    Args:
      ops: A list of tf.Operation. If specified, only the regularization term
        associated with the ops in `ops` will be returned. Otherwise, all
        relevant ops in the default TensorFlow graph will be included.

    Returns:
      A tf.Tensor scalar of floating point type that evaluates to the
      regularization term.
    """
    pass

  @abc.abstractmethod
  def get_cost(self, ops=None):
    """Calculates the cost targeted by the Regularizer.

    Args:
      ops: A list of tf.Operation. If specified, only the cost pertaining to
        the ops in `ops` will be returned. Otherwise, all relevant ops in the
        default TensorFlow graph will be included.

    Returns:
      A tf.Tensor scalar that evaluates to the cost.
    """
    pass
```
The TensorFlow scalar returned by `get_cost` evaluates to the *targeted cost*,
and is typically used for monitoring (e.g. displaying it in TensorBoard). The
scalar returned by `get_regularization_term` is the one that has to be added to
the training loss, multiplied by a coefficient controlling its strength.

`OpRegularizerManager` and the `OpRegularizer`s it provides for ops in the graph
are intended to facilitate easy implementation of `NetworkRegularizer`s. We
exemplify it here in the context of targeting FLOPs for a convolutional network,
but the same principles apply for other *targeted costs*.
Most of the consumption of FLOPs in convolutional networks happens in the
convolutions. As a first approximation, we can neglect the FLOP impact of the
other ops in the graph, even though the framework readily allows including the
FLOP contribution of all ops, even the ones that have negligible cost. Within
this approximation, in order to build the FLOP `NetworkRegularizer`, its
constructor needs to:

*   Crawl the graph, starting from the output of the network, and find all
    convolution ops on which the output depends.
*   For each of these convolution ops, create an `OpRegularizer`.
*   Find the `OpRegularizer` of the *input* of each convolution op.
*   Implement Eq. (6) in the [MorphNet paper](https://arxiv.org/abs/1711.06798)
    to calculate the total FLOP cost of all convolutions, and an equation
    similar to Eq. (9) to calculate the respective regularization term. We say
    'similar' because Eq. (9) refers to a specific type of regularization, where
    the `regularization_vector` of a convolution is the absolute value of the
    respective batch-norm gamma vector. However, the exact nature of the
    `regularization_vector` is delegated to the `OpRegularizer`. (A rough sketch
    of this cost accounting follows below.)
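To make the FLOP accounting concrete, here is a rough sketch of the bilinear
cost of a single convolution when only alive channels are counted. It is an
illustration of the idea behind Eq. (6) under stated assumptions, not a
transcription of the paper:

```python
def approx_conv2d_flops(kernel_h, kernel_w, alive_inputs, alive_outputs,
                        output_h, output_w):
  """Approximate multiply-add count of one convolution over alive channels.

  The cost is bilinear in the number of alive input and output channels, which
  is why the marginal cost of one activation depends on the regularization of
  the neighboring ops.
  """
  return (2 * kernel_h * kernel_w * alive_inputs * alive_outputs *
          output_h * output_w)
```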
research/morph_net/framework/__init__.py (new file, mode 100644)
research/morph_net/framework/concat_and_slice_regularizers.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""OpRegularizers that concatenate and slice other OpRegularizers.
When we have a concatenation op in the network, which concatenates several
tensors, the regularizers of the concatenated ops (that is, the
regularization_vector-s and the alive_vector-s) should be concatenated as well.
Slicing is the complementary op - if regularizers Ra and Rb were concatenated
into a regularizer Rc, Ra and Rb can be obtained from Rc by slicing.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from morph_net.framework import generic_regularizers


class ConcatRegularizer(generic_regularizers.OpRegularizer):
  """An OpRegularizer that concatenates others, to reflect a Concat op."""

  def __init__(self, regularizers_to_concatenate):
    for r in regularizers_to_concatenate:
      if not generic_regularizers.dimensions_are_compatible(r):
        raise ValueError('Bad regularizer: dimensions are not compatible')

    self._alive_vector = tf.concat(
        [r.alive_vector for r in regularizers_to_concatenate], 0)
    self._regularization_vector = tf.concat(
        [r.regularization_vector for r in regularizers_to_concatenate], 0)

  @property
  def regularization_vector(self):
    return self._regularization_vector

  @property
  def alive_vector(self):
    return self._alive_vector


class SlicingReferenceRegularizer(generic_regularizers.OpRegularizer):
  """An OpRegularizer that slices a segment of another regularizer.

  This is useful to complement the ConcatRegularizer. For example, suppose that
  we have two ops, one with 3 outputs (Op1) and the other with 4 outputs (Op2).
  Each has its own regularizer, Reg1 and Reg2.

  Now suppose that a concat op concatenated Op1 and Op2 into OpC. Reg1 and Reg2
  should be concatenated to RegC. To make the situation more complicated, RegC
  was grouped in a group LASSO with another op in the graph, resulting in RegG.

  What happens next? All references to RegC should obviously be replaced by
  RegG. But what about Reg1? The latter could be the first 3 outputs of RegG,
  and Reg2 would be the last 4 outputs of RegG.

  SlicingReferenceRegularizer is a regularizer that picks a segment of outputs
  from an existing OpRegularizer. When OpRegularizers are concatenated, they
  are replaced by SlicingReferenceRegularizer-s.
  """

  def __init__(self, get_regularizer_to_slice, begin, size):
    """Creates an instance.

    Args:
      get_regularizer_to_slice: A callable, such that get_regularizer_to_slice()
        returns an OpRegularizer that has to be sliced.
      begin: An integer, where to begin the slice.
      size: An integer, the length of the slice (so the slice ends at
        begin + size).
    """
    self._get_regularizer_to_slice = get_regularizer_to_slice
    self._begin = begin
    self._size = size
    self._alive_vector = None
    self._regularization_vector = None

  @property
  def regularization_vector(self):
    if self._regularization_vector is None:
      regularizer_to_slice = self._get_regularizer_to_slice()
      self._regularization_vector = tf.slice(
          regularizer_to_slice.regularization_vector, [self._begin],
          [self._size])
    return self._regularization_vector

  @property
  def alive_vector(self):
    if self._alive_vector is None:
      regularizer_to_slice = self._get_regularizer_to_slice()
      assert regularizer_to_slice is not self
      self._alive_vector = tf.slice(regularizer_to_slice.alive_vector,
                                    [self._begin], [self._size])
    return self._alive_vector
research/morph_net/framework/concat_and_slice_regularizers_test.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for framework.concat_and_slice_regularizers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from morph_net.framework import concat_and_slice_regularizers
from morph_net.testing import op_regularizer_stub


class ConcatAndSliceRegularizersTest(tf.test.TestCase):

  def setUp(self):
    self._reg_vec1 = [0.1, 0.3, 0.6, 0.2]
    self._alive_vec1 = [False, True, True, False]
    self._reg_vec2 = [0.2, 0.4, 0.5]
    self._alive_vec2 = [False, True, False]
    self._reg1 = op_regularizer_stub.OpRegularizerStub(self._reg_vec1,
                                                       self._alive_vec1)
    self._reg2 = op_regularizer_stub.OpRegularizerStub(self._reg_vec2,
                                                       self._alive_vec2)

  def testConcatRegularizer(self):
    concat_reg = concat_and_slice_regularizers.ConcatRegularizer(
        [self._reg1, self._reg2])
    with self.test_session():
      self.assertAllEqual(self._alive_vec1 + self._alive_vec2,
                          concat_reg.alive_vector.eval())
      self.assertAllClose(self._reg_vec1 + self._reg_vec2,
                          concat_reg.regularization_vector.eval(), 1e-5)

  def testSliceRegularizer(self):
    concat_reg = concat_and_slice_regularizers.SlicingReferenceRegularizer(
        lambda: self._reg1, 1, 2)
    with self.test_session():
      self.assertAllEqual(self._alive_vec1[1:3],
                          concat_reg.alive_vector.eval())
      self.assertAllClose(self._reg_vec1[1:3],
                          concat_reg.regularization_vector.eval(), 1e-5)


if __name__ == '__main__':
  tf.test.main()
research/morph_net/framework/generic_regularizers.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Interface for MorphNet regularizers framework.
A subclasses of Regularizer represent a regularizer that targets a certain
quantity: Number of flops, model size, number of activations etc. The
Regularizer interface has two methods:
1. `get_regularization_term`, which returns a regularization term that should be
included in the total loss to target the quantity.
2. `get_cost`, the quantity itself (for example, the number of flops). This is
useful for display in TensorBoard, and later, to to provide feedback for
automatically tuning the coefficient that multplies the regularization term,
until the cost reaches (or goes below) its target value.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import abc


class OpRegularizer(object):
  """An interface for Op Regularizers.

  An OpRegularizer object corresponds to a tf.Operation, and provides
  a regularizer for the output of the op (we assume that the op has one output
  of interest in the context of MorphNet).
  """
  __metaclass__ = abc.ABCMeta

  @abc.abstractproperty
  def regularization_vector(self):
    """Returns a vector of floats, with regularizers.

    The length of the vector is the number of "output activations" (call them
    neurons, nodes, filters etc) of the op. For a convolutional network, it's
    the number of filters (aka "depth"). For a fully-connected layer, it's
    usually the second (and last) dimension - assuming the first one is the
    batch size.
    """
    pass

  @abc.abstractproperty
  def alive_vector(self):
    """Returns a vector of booleans, indicating which activations are alive
    (call them activations, neurons, nodes, filters etc). This vector is of the
    same length as the regularization_vector.
    """
    pass


class NetworkRegularizer(object):
  """An interface for Network Regularizers."""
  __metaclass__ = abc.ABCMeta

  @abc.abstractmethod
  def get_regularization_term(self, ops=None):
    """Compute the regularization term.

    Args:
      ops: A list of tf.Operation objects. If specified, only the regularization
        term associated with the ops in `ops` will be returned. Otherwise, all
        relevant ops in the default TensorFlow graph will be included.

    Returns:
      A tf.Tensor scalar of floating point type that evaluates to the
      regularization term (that should be added to the total loss, with a
      suitable coefficient).
    """
    pass

  @abc.abstractmethod
  def get_cost(self, ops=None):
    """Calculates the cost targeted by the Regularizer.

    Args:
      ops: A list of tf.Operation objects. If specified, only the cost
        pertaining to the ops in `ops` will be returned. Otherwise, all
        relevant ops in the default TensorFlow graph will be included.

    Returns:
      A tf.Tensor scalar that evaluates to the cost.
    """
    pass


def dimensions_are_compatible(op_regularizer):
  """Checks if op_regularizer's alive_vector matches regularization_vector."""
  return op_regularizer.alive_vector.shape.with_rank(1).dims[
      0].is_compatible_with(
          op_regularizer.regularization_vector.shape.with_rank(1).dims[0])
research/morph_net/framework/grouping_regularizers.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Regularizers that group other regularizers for residual connections.
An Elementwise operation between two tensors (addition, multiplication, maximum
etc) imposes a constraint of equality of the shapes of the constituents. For
example, if A, B are convolutions, and another op in the network
receives A + B as input, it means that the i-th output of A is tied to the i-th
output of B. Only if the i-th output was regularized away by the regularizer in
both A and B can we discard the i-th activation in both.
Therefore we group the i-th output of A and the i-th output of B in a group
LASSO, a group for each i. The grouping methods can vary, and this file offers
several variants.
Residual connections, in ResNet or in RNNs, are examples where this kind of
grouping is needed.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from morph_net.framework import generic_regularizers


DEFAULT_THRESHOLD = 0.01


class MaxGroupingRegularizer(generic_regularizers.OpRegularizer):
  """A regularizer that groups others by taking their maximum."""

  def __init__(self, regularizers_to_group):
    """Creates an instance.

    Args:
      regularizers_to_group: A list of generic_regularizers.OpRegularizer
        objects. Their regularization_vector (alive_vector) are expected to be
        of the same length.

    Raises:
      ValueError: regularizers_to_group is not of length 2 (TODO:
        support arbitrary length if needed).
    """
    _raise_if_length_is_not2(regularizers_to_group)
    self._regularization_vector = tf.maximum(
        regularizers_to_group[0].regularization_vector,
        regularizers_to_group[1].regularization_vector)
    self._alive_vector = tf.logical_or(regularizers_to_group[0].alive_vector,
                                       regularizers_to_group[1].alive_vector)

  @property
  def regularization_vector(self):
    return self._regularization_vector

  @property
  def alive_vector(self):
    return self._alive_vector


class L2GroupingRegularizer(generic_regularizers.OpRegularizer):
  r"""A regularizer that groups others by taking their L2 norm.

  R_j = sqrt(\sum_i r_{ij}^2)

  Where r_i is the i-th regularization vector, r_{ij} is its j-th element, and
  R_j is the j-th element of the resulting regularization vector.
  """

  def __init__(self, regularizers_to_group, threshold=DEFAULT_THRESHOLD):
    """Creates an instance.

    Args:
      regularizers_to_group: A list of generic_regularizers.OpRegularizer
        objects. Their regularization_vector (alive_vector) are expected to be
        of the same length.
      threshold: A float. A group of activations will be considered alive if
        its L2 norm is greater than `threshold`.

    Raises:
      ValueError: regularizers_to_group is not of length 2 (TODO:
        support arbitrary length if needed).
    """
    _raise_if_length_is_not2(regularizers_to_group)
    self._regularization_vector = tf.sqrt(
        lazy_square(regularizers_to_group[0].regularization_vector) +
        lazy_square(regularizers_to_group[1].regularization_vector))
    self._alive_vector = self._regularization_vector > threshold

  @property
  def regularization_vector(self):
    return self._regularization_vector

  @property
  def alive_vector(self):
    return self._alive_vector


def _raise_if_length_is_not2(regularizers_to_group):
  if len(regularizers_to_group) != 2:
    raise ValueError('Currently only groups of size 2 are supported.')


def lazy_square(tensor):
  """Computes the square of a tensor in a lazy way.

  This function is lazy in the following sense: for
    tensor = tf.sqrt(input)
  it will return input (and not tf.square(tensor)).

  Args:
    tensor: A `Tensor` of floats to compute the square of.

  Returns:
    The square of the input tensor.
  """
  if tensor.op.type == 'Sqrt':
    return tensor.op.inputs[0]
  else:
    return tf.square(tensor)
research/morph_net/framework/grouping_regularizers_test.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for framework.grouping_regularizers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from absl.testing import parameterized
import numpy as np
import tensorflow as tf

from morph_net.framework import grouping_regularizers
from morph_net.testing import op_regularizer_stub


def _l2_reg_with_025_threshold(regularizers_to_group):
  return grouping_regularizers.L2GroupingRegularizer(regularizers_to_group,
                                                     0.25)


class GroupingRegularizersTest(parameterized.TestCase, tf.test.TestCase):
  # TODO: Add parametrized tests.

  def setUp(self):
    self._reg_vec1 = [0.1, 0.3, 0.6, 0.2]
    self._alive_vec1 = [False, True, True, False]
    self._reg_vec2 = [0.2, 0.4, 0.5, 0.1]
    self._alive_vec2 = [False, True, False, True]
    self._reg_vec3 = [0.3, 0.2, 0.0, 0.25]
    self._alive_vec3 = [False, True, False, True]

    self._reg1 = op_regularizer_stub.OpRegularizerStub(self._reg_vec1,
                                                       self._alive_vec1)
    self._reg2 = op_regularizer_stub.OpRegularizerStub(self._reg_vec2,
                                                       self._alive_vec2)
    self._reg3 = op_regularizer_stub.OpRegularizerStub(self._reg_vec3,
                                                       self._alive_vec3)

  def testMaxGroupingRegularizer(self):
    group_reg = grouping_regularizers.MaxGroupingRegularizer(
        [self._reg1, self._reg2])
    with self.test_session():
      self.assertAllEqual(
          [x or y for x, y in zip(self._alive_vec1, self._alive_vec2)],
          group_reg.alive_vector.eval())
      self.assertAllClose(
          [max(x, y) for x, y in zip(self._reg_vec1, self._reg_vec2)],
          group_reg.regularization_vector.eval(), 1e-5)

  def testL2GroupingRegularizer(self):
    group_reg = grouping_regularizers.L2GroupingRegularizer(
        [self._reg1, self._reg2], 0.25)
    expected_reg_vec = [
        np.sqrt(x**2 + y**2) for x, y in zip(self._reg_vec1, self._reg_vec2)
    ]
    with self.test_session():
      self.assertAllEqual([x > 0.25 for x in expected_reg_vec],
                          group_reg.alive_vector.eval())
      self.assertAllClose(expected_reg_vec,
                          group_reg.regularization_vector.eval(), 1e-5)

  @parameterized.named_parameters(
      ('Max', grouping_regularizers.MaxGroupingRegularizer),
      ('L2', _l2_reg_with_025_threshold))
  def testOrderDoesNotMatter(self, create_reg):
    group12 = create_reg([self._reg1, self._reg2])
    group13 = create_reg([self._reg1, self._reg3])
    group23 = create_reg([self._reg2, self._reg3])
    group123 = create_reg([group12, self._reg3])
    group132 = create_reg([group13, self._reg2])
    group231 = create_reg([group23, self._reg1])
    with self.test_session():
      self.assertAllEqual(group123.alive_vector.eval(),
                          group132.alive_vector.eval())
      self.assertAllEqual(group123.alive_vector.eval(),
                          group231.alive_vector.eval())
      self.assertAllClose(group123.regularization_vector.eval(),
                          group132.regularization_vector.eval())
      self.assertAllClose(group123.regularization_vector.eval(),
                          group231.regularization_vector.eval())


if __name__ == '__main__':
  tf.test.main()
research/morph_net/framework/op_regularizer_manager.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A class for managing OpRegularizers.
OpRegularizerManager creates the required regularizers and manages the
association between ops and their regularizers. OpRegularizerManager handles the
logic associated with the graph topology:
- Concatenating tensors is reflected in concatenating their regularizers.
- Skip-connections (aka residual connections), RNNs and other structures where
the shapes of two (or more) tensors are tied together are reflected in
grouping their regularizers together.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import logging

import tensorflow as tf

from morph_net.framework import concat_and_slice_regularizers
from morph_net.framework import generic_regularizers
from morph_net.framework import grouping_regularizers


# When an op has two (or more) inputs that have regularizers, the latter need to
# be grouped. _GROUPING_OPS is a whitelist of ops that are allowed to group, as
# a form of verification of the correctness of the code. The list is not
# exhaustive, feel free to add other grouping ops as needed.
_GROUPING_OPS = ('Add', 'Sub', 'Mul', 'Div', 'Maximum', 'Minimum',
                 'SquaredDifference', 'RealDiv')
# TODO: Is Div needed?

# Ops that are not pass-through, that necessarily modify the regularizer.
# These are the Ops that should not have a regularizer that is identical to
# one of their inputs. When we recursively look for regularizers along the graph
# the recursion will always stop at these Ops even if no regularizer factory
# is provided, and never assume that they pass the regularizer of their input
# through.
NON_PASS_THROUGH_OPS = ('Conv2D', 'Conv2DBackpropInput', 'MatMul')


def _remove_nones_and_dups(items):
  result = []
  for i in items:
    if i is not None and i not in result:
      result.append(i)
  return result


def _raise_type_error_if_not_operation(op):
  if not isinstance(op, tf.Operation):
    raise TypeError('\'op\' must be of type tf.Operation, not %s' %
                    str(type(op)))


class OpRegularizerManager(object):
  """A class for managing OpRegularizers."""

  # Public methods -----------------------------------------------------------

  def __init__(self, ops, op_regularizer_factory_dict,
               create_grouping_regularizer=None):
    """Creates an instance.

    Args:
      ops: A list of tf.Operation-s. An OpRegularizer will be created for all
        the ops in `ops`, and recursively for all ops they depend on via data
        dependency. Typically `ops` would contain a single tf.Operation, which
        is the output of the network.
      op_regularizer_factory_dict: A dictionary, where the keys are strings
        representing TensorFlow Op types, and the values are callables that
        create the respective OpRegularizers. For every op encountered during
        the recursion, if op.type is in op_regularizer_factory_dict, the
        respective callable will be used to create an OpRegularizer. The
        callables have the following args:
          op: a tf.Operation for which to create a regularizer.
          opreg_manager: A reference to an OpRegularizerManager object. Can be
            None if the callable does not need access to OpRegularizerManager.
      create_grouping_regularizer: A callable that has the signature of
        grouping_regularizers.MaxGroupingRegularizer's constructor. Will be
        called whenever a grouping op (see _GROUPING_OPS) is encountered.
        Defaults to MaxGroupingRegularizer if None.

    Raises:
      ValueError: If ops is not a list.
    """
    self._constructed = False
    if not isinstance(ops, list):
      raise ValueError('Input %s ops is not a list. Should probably use []' %
                       str(ops))
    self._op_to_regularizer = {}
    self._regularizer_to_ops = collections.defaultdict(list)
    self._op_regularizer_factory_dict = op_regularizer_factory_dict
    for op_type in NON_PASS_THROUGH_OPS:
      if op_type not in self._op_regularizer_factory_dict:
        self._op_regularizer_factory_dict[op_type] = lambda x, y: None
    self._create_grouping_regularizer = (
        create_grouping_regularizer or
        grouping_regularizers.MaxGroupingRegularizer)
    self._visited = set()
    for op in ops:
      self._get_regularizer(op)
    self._constructed = True

  def get_regularizer(self, op):
    """Looks up or creates an OpRegularizer for a tf.Operation.

    Args:
      op: A tf.Operation.

    - If `self` has an OpRegularizer for `op`, it will be returned. Otherwise:
    - If called before construction of `self` was completed (that is, from the
      constructor), an attempt to create an OpRegularizer for `op` will be made.
      Otherwise:
    - If called after construction of `self` was completed, an exception will
      be raised.

    Returns:
      An OpRegularizer for `op`. Can be None if `op` is not regularized (e.g.
      `op` is a constant).

    Raises:
      RuntimeError: If `self` object has no OpRegularizer for `op` in its
        lookup table, and the construction of `self` has already been completed
        (because then `self` is immutable and an OpRegularizer cannot be
        created).
    """
    try:
      return self._op_to_regularizer[op]
    except KeyError:
      if self._constructed:
        raise ValueError('Op %s does not have a regularizer.' % op.name)
      else:
        return self._get_regularizer(op)

  @property
  def ops(self):
    return self._op_to_regularizer.keys()

  # ---- Public MUTABLE methods -----------------------------------------------
  #
  # These methods are intended to be called by OpRegularizer factory functions,
  # in the constructor of OpRegularizerManager. OpRegularizerManager is
  # immutable after construction, so calling these methods after construction
  # has been completed will raise an exception.

  def group_and_replace_regularizers(self, regularizers):
    """Groups a list of OpRegularizers and replaces them by the grouped one.

    Args:
      regularizers: A list of OpRegularizer objects to be grouped.

    Returns:
      An OpRegularizer object formed by the grouping.

    Raises:
      RuntimeError: group_and_replace_regularizers was called after
        construction of the OpRegularizerManager object was completed.
    """
    if self._constructed:
      raise RuntimeError('group_and_replace_regularizers can only be called '
                         'before construction of the OpRegularizerManager was '
                         'completed.')
    grouped = self._create_grouping_regularizer(regularizers)
    # Replace all the references to the regularizers by the new grouped
    # regularizer.
    for r in regularizers:
      self._replace_regularizer(r, grouped)
    return grouped

  # Private methods -----------------------------------------------------------

  def _get_regularizer(self, op):
    """Fetches the regularizer of `op` if it exists, creates it otherwise.

    This function calls itself recursively, directly or via _create_regularizer
    (which in turn calls _get_regularizer). It performs DFS along the data
    dependencies of the graph, and uses a self._visited set to detect loops. The
    use of self._visited makes it not thread safe, but _get_regularizer is a
    private method that is supposed to only be called from the constructor, so
    execution in multiple threads (for the same object) is not expected.

    Args:
      op: A tf.Operation.

    Returns:
      An OpRegularizer that corresponds to `op`, or None if op does not have
      a regularizer (e.g. it's a constant op).
    """
    _raise_type_error_if_not_operation(op)
    if op not in self._op_to_regularizer:
      if op in self._visited:
        # In while loops, the data dependencies form a loop.
        # TODO: RNNs have "legit" loops - will this still work?
        return None
      self._visited.add(op)
      regularizer = self._create_regularizer(op)
      self._op_to_regularizer[op] = regularizer
      self._regularizer_to_ops[regularizer].append(op)
      # Make sure that there is a regularizer (or None) for every op on which
      # `op` depends via data dependency.
      for i in op.inputs:
        self._get_regularizer(i.op)
      self._visited.remove(op)
    return self._op_to_regularizer[op]

  def _create_regularizer(self, op):
    """Creates an OpRegularizer for `op`.

    Args:
      op: A tf.Operation.

    Returns:
      An OpRegularizer that corresponds to `op`, or None if op does not have
      a regularizer.

    Raises:
      RuntimeError: Grouping is attempted at an op which is not whitelisted for
        grouping (in _GROUPING_OPS).
    """
    # First we see if there is a factory function for creating the regularizer
    # in the op_regularizer_factory_dict (supplied in the constructor).
    if op.type in self._op_regularizer_factory_dict:
      regularizer = self._op_regularizer_factory_dict[op.type](op, self)
      if regularizer is None:
        logging.warning('Failed to create regularizer for %s.', op.name)
      else:
        logging.info('Created regularizer for %s.', op.name)
      return regularizer
    # Unless overridden in op_regularizer_factory_dict, we assume that ops
    # without inputs have no regularizers. These are 'leaf' ops, typically
    # constants and variables.
    if not op.inputs:
      return None
    if op.type == 'ConcatV2':
      return self._create_concat_regularizer(op)
    inputs_regularizers = _remove_nones_and_dups(
        [self._get_regularizer(i.op) for i in op.inputs])
    # Ops whose inputs have no regularizers, and that are not in
    # op_regularizer_factory_dict, have no regularizer either (think of ops that
    # only involve constants as an example).
    if not inputs_regularizers:
      return None
    # Ops that have one input with a regularizer, and are not in
    # op_regularizer_factory_dict, are assumed to be pass-through, that is, to
    # carry over the regularizer of their inputs. Examples:
    # - Unary ops, such as RELU.
    # - BiasAdd, or similar ops, that involve a constant/variable and a
    #   regularized op (e.g. the convolution that comes before the bias).
    elif len(inputs_regularizers) == 1:
      return inputs_regularizers[0]
    # Group if we have more than one regularizer in the inputs of `op` and if it
    # is white-listed for grouping.
    elif op.type in _GROUPING_OPS:
      return self.group_and_replace_regularizers(inputs_regularizers)
    raise RuntimeError('Grouping is attempted at op which is not whitelisted '
                       'for grouping: %s' % str(op.type))

  def _create_concat_regularizer(self, concat_op):
    """Creates an OpRegularizer for a concat op.

    Args:
      concat_op: A tf.Operation of type ConcatV2.

    Returns:
      An OpRegularizer for `concat_op`.
    """
    # We omit the last input, because it's the concat dimension. Others are
    # the tensors to be concatenated.
    input_ops = [i.op for i in concat_op.inputs[:-1]]
    regularizers_to_concat = [self._get_regularizer(op) for op in input_ops]
    # If all inputs have no regularizer, neither does the concat op.
    if regularizers_to_concat == [None] * len(regularizers_to_concat):
      return None
    offset = 0
    # Replace the regularizers_to_concat by SlicingReferenceRegularizer-s that
    # slice the concatenated regularizer.
    ops_to_concat = []
    for r, op in zip(regularizers_to_concat, input_ops):
      if r is None:
        length = op.outputs[0].shape.as_list()[-1]
        offset += length
        ops_to_concat.append(self._ConstantOpReg(length))
      else:
        length = tf.shape(r.alive_vector)[0]
        slice_ref = concat_and_slice_regularizers.SlicingReferenceRegularizer(
            lambda: self._get_regularizer(concat_op), offset, length)
        offset += length
        self._replace_regularizer(r, slice_ref)
        ops_to_concat.append(r)
    # Create the concatenated regularizer itself.
    return concat_and_slice_regularizers.ConcatRegularizer(ops_to_concat)

  def _replace_regularizer(self, source, target):
    """Replaces `source` by `target` in self's lookup tables."""
    for op in self._regularizer_to_ops[source]:
      assert self._op_to_regularizer[op] is source
      self._op_to_regularizer[op] = target
      self._regularizer_to_ops[target].append(op)
    del self._regularizer_to_ops[source]

  class _ConstantOpReg(generic_regularizers.OpRegularizer):
    """A class with the constant alive property, and zero regularization."""

    def __init__(self, size):
      self._regularization_vector = tf.zeros(size)
      self._alive_vector = tf.cast(tf.ones(size), tf.bool)

    @property
    def regularization_vector(self):
      return self._regularization_vector

    @property
    def alive_vector(self):
      return self._alive_vector
research/morph_net/framework/op_regularizer_manager_test.py (new file, mode 100644)
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for op_regularizer_manager."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from absl.testing import parameterized
import numpy as np
import tensorflow as tf

from morph_net.framework import op_regularizer_manager as orm
from morph_net.testing import op_regularizer_stub

layers = tf.contrib.layers


def _get_op(name):
  return tf.get_default_graph().get_operation_by_name(name)


class TestOpRegularizerManager(parameterized.TestCase, tf.test.TestCase):

  def setUp(self):
    tf.reset_default_graph()
    tf.set_random_seed(12)
    np.random.seed(665544)

  def _batch_norm_scope(self):
    params = {
        'trainable': True,
        'normalizer_fn': layers.batch_norm,
        'normalizer_params': {
            'scale': True
        }
    }

    with tf.contrib.framework.arg_scope([layers.conv2d], **params) as sc:
      return sc

  @parameterized.named_parameters(
      ('Batch_no_par1', True, False, 'conv1'),
      ('Batch_par1', True, True, 'conv1'),
      ('NoBatch_no_par1', False, False, 'conv1'),
      ('NoBatch_par2', False, True, 'conv2'),
      ('Batch_no_par2', True, False, 'conv2'),
      ('Batch_par2', True, True, 'conv2'),
      ('Batch_par3', True, True, 'conv3'),
      ('NoBatch_par3', False, True, 'conv3'),
      ('NoBatch_no_par3', False, False, 'conv3'))
  def testSimpleOpGetRegularizer(self, use_batch_norm, use_partitioner, scope):
    # Tests the alive pattern of the conv and relu ops.
    # use_batch_norm: A Boolean. Indicates if batch norm should be used.
    # use_partitioner: A Boolean. Indicates if a fixed_size_partitioner should
    #   be used.
    # scope: A String with the scope to test.
    sc = self._batch_norm_scope() if use_batch_norm else []
    partitioner = tf.fixed_size_partitioner(2) if use_partitioner else None
    with tf.contrib.framework.arg_scope(sc):
      with tf.variable_scope(tf.get_variable_scope(), partitioner=partitioner):
        final_op = op_regularizer_stub.build_model()

    op_reg_manager = orm.OpRegularizerManager(
        [final_op], op_regularizer_stub.MOCK_REG_DICT)
    expected_alive = op_regularizer_stub.expected_alive()
    with self.test_session():
      conv_reg = op_reg_manager.get_regularizer(_get_op(scope + '/Conv2D'))
      self.assertAllEqual(expected_alive[scope], conv_reg.alive_vector.eval())

      relu_reg = op_reg_manager.get_regularizer(_get_op(scope + '/Relu'))
      self.assertAllEqual(expected_alive[scope], relu_reg.alive_vector.eval())

  @parameterized.named_parameters(
      ('Batch_no_par', True, False),
      ('Batch_par', True, True),
      ('NoBatch_no_par', False, False),
      ('NoBatch_par', False, True))
  def testConcatOpGetRegularizer(self, use_batch_norm, use_partitioner):
    sc = self._batch_norm_scope() if use_batch_norm else []
    partitioner = tf.fixed_size_partitioner(2) if use_partitioner else None
    with tf.contrib.framework.arg_scope(sc):
      with tf.variable_scope(tf.get_variable_scope(), partitioner=partitioner):
        final_op = op_regularizer_stub.build_model()

    op_reg_manager = orm.OpRegularizerManager(
        [final_op], op_regularizer_stub.MOCK_REG_DICT)
    expected_alive = op_regularizer_stub.expected_alive()

    expected = np.logical_or(expected_alive['conv4'], expected_alive['concat'])
    with self.test_session():
      conv_reg = op_reg_manager.get_regularizer(_get_op('conv4/Conv2D'))
      self.assertAllEqual(expected, conv_reg.alive_vector.eval())

      relu_reg = op_reg_manager.get_regularizer(_get_op('conv4/Relu'))
      self.assertAllEqual(expected, relu_reg.alive_vector.eval())

  @parameterized.named_parameters(
      ('Concat_5', True, 5),
      ('Concat_7', True, 7),
      ('Add_6', False, 6))
  def testGetRegularizerForConcatWithNone(self, test_concat, depth):
    image = tf.constant(0.0, shape=[1, 17, 19, 3])
    conv2 = layers.conv2d(image, 5, [1, 1], padding='SAME', scope='conv2')
    other_input = tf.add(
        tf.identity(tf.constant(3.0, shape=[1, 17, 19, depth])), 3.0)
    # other_input has None as regularizer.
    concat = tf.concat([other_input, conv2], 3)
    output = tf.add(concat, concat, name='output_out')
    op = concat.op if test_concat else output.op
    op_reg_manager = orm.OpRegularizerManager(
        [output.op], op_regularizer_stub.MOCK_REG_DICT)
    expected_alive = op_regularizer_stub.expected_alive()
    with self.test_session():
      alive = op_reg_manager.get_regularizer(op).alive_vector.eval()
      self.assertAllEqual([True] * depth, alive[:depth])
      self.assertAllEqual(expected_alive['conv2'], alive[depth:])

  @parameterized.named_parameters(
      ('add', tf.add),
      ('div', tf.divide),
      ('mul', tf.multiply),
      ('max', tf.maximum),
      ('min', tf.minimum),
      ('l2', tf.squared_difference))
  def testGroupingOps(self, tested_op):
    th, size = 0.5, 11
    image = tf.constant(0.5, shape=[1, 17, 19, 3])

    conv1 = layers.conv2d(image, 5, [1, 1], padding='SAME', scope='conv1')
    conv2 = layers.conv2d(image, 5, [1, 1], padding='SAME', scope='conv2')
    res = tested_op(conv1, conv2)

    reg = {'conv1': np.random.random(size), 'conv2': np.random.random(size)}

    def regularizer(conv_op, manager=None):
      del manager  # unused
      for prefix in ['conv1', 'conv2']:
        if conv_op.name.startswith(prefix):
          return op_regularizer_stub.OpRegularizerStub(
              reg[prefix], reg[prefix] > th)

    op_reg_manager = orm.OpRegularizerManager(
        [res.op], {'Conv2D': regularizer})
    with self.test_session():
      alive = op_reg_manager.get_regularizer(res.op).alive_vector.eval()
      self.assertAllEqual(
          alive, np.logical_or(reg['conv1'] > th, reg['conv2'] > th))


if __name__ == '__main__':
  tf.test.main()
research/morph_net/g3doc/grouping.png  0 → 100644  (20.5 KB)
research/morph_net/g3doc/histogram.png  0 → 100644  (6.54 KB)
research/morph_net/g3doc/tensorboard.png  0 → 100644  (61.3 KB)
research/morph_net/network_regularizers/__init__.py  0 → 100644
research/morph_net/network_regularizers/bilinear_cost_utils.py  0 → 100644
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Helpers for Network Regularizers that are bilinear in their inputs/outputs.

Examples: The number of FLOPs and the number of weights of a convolution are
both a bilinear expression in the number of its inputs and outputs.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from morph_net.framework import generic_regularizers


_CONV2D_OPS = ('Conv2D', 'Conv2DBackpropInput', 'DepthwiseConv2dNative')
_SUPPORTED_OPS = _CONV2D_OPS + ('MatMul',)


def _raise_if_not_supported(op):
  if not isinstance(op, tf.Operation):
    raise ValueError('conv_op must be a tf.Operation, not %s' % type(op))
  if op.type not in _SUPPORTED_OPS:
    raise ValueError('conv_op must be a Conv2D or a MatMul, not %s' % op.type)


def _get_conv_filter_size(conv_op):
  assert conv_op.type in _CONV2D_OPS
  conv_weights = conv_op.inputs[1]
  filter_shape = conv_weights.shape.as_list()[:2]
  return filter_shape[0] * filter_shape[1]


def flop_coeff(op):
  """Computes the coefficient of number of flops associated with a convolution.

  The FLOPs cost of a convolution is given by C * output_depth * input_depth,
  where C = 2 * output_width * output_height * filter_size. The 2 is because we
  have one multiplication and one addition for each convolution weight and
  pixel. This function returns C.

  Args:
    op: A tf.Operation of type 'Conv2D' or 'MatMul'.

  Returns:
    A float, the coefficient that when multiplied by the input depth and by the
    output depth gives the number of flops needed to compute the convolution.

  Raises:
    ValueError: conv_op is not a tf.Operation of type Conv2D.
  """
  _raise_if_not_supported(op)
  if op.type in _CONV2D_OPS:
    # Looking at the output shape makes it easy to automatically take into
    # account strides and the type of padding.
    if op.type == 'Conv2D' or op.type == 'DepthwiseConv2dNative':
      shape = op.outputs[0].shape.as_list()
    else:
      # Conv2DBackpropInput
      # For a transposed convolution, the input and the output are swapped (as
      # far as shapes are concerned). In other words, for a given filter shape
      # and stride, if Conv2D maps from shapeX to shapeY, Conv2DBackpropInput
      # maps from shapeY to shapeX. Therefore wherever we use the output shape
      # for Conv2D, we use the input shape for Conv2DBackpropInput.
      shape = _get_input(op).shape.as_list()
    size = shape[1] * shape[2]
    return 2.0 * size * _get_conv_filter_size(op)
  else:
    # MatMul
    # A MatMul is like a 1x1 conv with an output size of 1x1, so from the
    # factor above only the 2.0 remains.
    return 2.0


def num_weights_coeff(op):
  """The number of weights of a conv is C * output_depth * input_depth. Finds C.

  Args:
    op: A tf.Operation of type 'Conv2D' or 'MatMul'

  Returns:
    A float, the coefficient that when multiplied by the input depth and by the
    output depth gives the number of flops needed to compute the convolution.

  Raises:
    ValueError: conv_op is not a tf.Operation of type Conv2D.
  """
  _raise_if_not_supported(op)
  return _get_conv_filter_size(op) if op.type in _CONV2D_OPS else 1.0


class BilinearNetworkRegularizer(generic_regularizers.NetworkRegularizer):
  """A NetworkRegularizer with bilinear cost and loss.

  Can be used for FLOPs regularization or for model size regularization.
  """

  def __init__(self, opreg_manager, coeff_func):
    """Creates an instance.

    Args:
      opreg_manager: An OpRegularizerManager object that will be used to query
        OpRegularizers of the various ops in the graph.
      coeff_func: A callable that receives a tf.Operation of type Conv2D and
        returns a bilinear coefficient of its cost. Examples:
        - Use conv_flop_coeff for a FLOP regularizer.
        - Use conv_num_weights_coeff for a number-of-weights regularizer.
    """
    self._opreg_manager = opreg_manager
    self._coeff_func = coeff_func

  def _get_cost_or_regularization_term(self, is_regularization, ops=None):
    total = 0.0
    if not ops:
      ops = self._opreg_manager.ops
    for op in ops:
      if op.type not in _SUPPORTED_OPS:
        continue
      # We use the following expression for the regularizer:
      #
      # coeff * (number_of_inputs_alive * sum_of_output_regularizers +
      #          number_of_outputs_alive * sum_of_input_regularizers)
      #
      # where 'coeff' is a coefficient (for a particular convolution) such that
      # the number of flops of that convolution is given by:
      # number_of_flops = coeff * number_of_inputs * number_of_outputs.
      input_op = _get_input(op).op
      input_op_reg = self._opreg_manager.get_regularizer(input_op)
      output_op_reg = self._opreg_manager.get_regularizer(op)
      coeff = self._coeff_func(op)
      num_alive_inputs = _count_alive(input_op, input_op_reg)
      num_alive_outputs = _count_alive(op, output_op_reg)
      if op.type == 'DepthwiseConv2dNative':
        if is_regularization:
          reg_inputs = _sum_of_reg_vector(input_op_reg)
          reg_outputs = _sum_of_reg_vector(output_op_reg)
          # reg_inputs and reg_outputs are often identical since they should
          # come from the same regularizer. Duplicate them for symmetry.
          # When the input doesn't have a regularizer (e.g. input), only the
          # second term is used.
          # TODO: revisit this expression after experiments.
          total += coeff * (reg_inputs + reg_outputs)
        else:
          # num_alive_inputs may not always equal num_alive_outputs because the
          # input (e.g. the image) may not have a gamma regularizer. In this
          # case the computation is proportional only to num_alive_outputs.
          total += coeff * num_alive_outputs
      else:
        if is_regularization:
          reg_inputs = _sum_of_reg_vector(input_op_reg)
          reg_outputs = _sum_of_reg_vector(output_op_reg)
          total += coeff * (num_alive_inputs * reg_outputs +
                            num_alive_outputs * reg_inputs)
        else:
          total += coeff * num_alive_inputs * num_alive_outputs
    return total

  def get_cost(self, ops=None):
    return self._get_cost_or_regularization_term(False, ops)

  def get_regularization_term(self, ops=None):
    return self._get_cost_or_regularization_term(True, ops)


def _get_input(op):
  """Returns the input to that op that represents the activations.

  (as opposed to e.g. weights.)

  Args:
    op: A tf.Operation object with type in _SUPPORTED_OPS.

  Returns:
    A tf.Tensor representing the input activations.

  Raises:
    ValueError: MatMul is used with transposition (unsupported).
  """
  assert op.type in _SUPPORTED_OPS, 'Op type %s is not supported.' % op.type
  if op.type == 'Conv2D' or op.type == 'DepthwiseConv2dNative':
    return op.inputs[0]
  if op.type == 'Conv2DBackpropInput':
    return op.inputs[2]
  if op.type == 'MatMul':
    if op.get_attr('transpose_a') or op.get_attr('transpose_b'):
      raise ValueError('MatMul with transposition is not yet supported.')
    return op.inputs[0]


def _count_alive(op, opreg):
  if opreg:
    return tf.reduce_sum(tf.cast(opreg.alive_vector, tf.float32))
  else:
    return float(op.outputs[0].shape.as_list()[-1])


def _sum_of_reg_vector(opreg):
  if opreg:
    return tf.reduce_sum(opreg.regularization_vector)
  else:
    return 0.0
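As a quick illustration of the bilinear relationship documented in flop_coeff() above (this sketch is not part of the commit; the layer shape follows the test file below), the FLOP count of a convolution is recovered as coeff * input_depth * output_depth:

# Sketch only -- the same relationship that bilinear_cost_utils_test.py checks.
import tensorflow as tf
from morph_net.network_regularizers import bilinear_cost_utils

layers = tf.contrib.layers
image = tf.constant(0.0, shape=[1, 11, 13, 17])  # 17 input channels
net = layers.conv2d(image, 19, [7, 5], stride=2, padding='SAME', scope='conv1')
conv_op = tf.get_default_graph().get_operation_by_name('conv1/Conv2D')

coeff = bilinear_cost_utils.flop_coeff(conv_op)
approx_flops = coeff * 17 * 19  # coeff * input_depth * output_depth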
research/morph_net/network_regularizers/bilinear_cost_utils_test.py  0 → 100644
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for compute_cost_estimator.

Note that BilinearNetworkRegularizer is not tested here - its specific
instantiation is tested in flop_regularizer_test.py.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from tensorflow.python.framework import ops
from morph_net.network_regularizers import bilinear_cost_utils

layers = tf.contrib.layers


def _flops(op):
  """Get the number of flops of a convolution, from the ops stats registry.

  Args:
    op: A tf.Operation object.

  Returns:
    The number of flops needed to evaluate conv_op.
  """
  return ops.get_stats_for_node_def(
      tf.get_default_graph(), op.node_def, 'flops').value


def _output_depth(conv_op):
  return conv_op.outputs[0].shape.as_list()[-1]


def _input_depth(conv_op):
  conv_weights = conv_op.inputs[1]
  return conv_weights.shape.as_list()[2]


class BilinearCostUtilTest(tf.test.TestCase):

  def setUp(self):
    tf.reset_default_graph()
    image = tf.constant(0.0, shape=[1, 11, 13, 17])
    net = layers.conv2d(
        image, 19, [7, 5], stride=2, padding='SAME', scope='conv1')
    layers.conv2d_transpose(
        image, 29, [7, 5], stride=2, padding='SAME', scope='convt2')
    net = tf.reduce_mean(net, axis=(1, 2))
    layers.fully_connected(net, 23, scope='FC')
    net = layers.conv2d(
        image, 10, [7, 5], stride=2, padding='SAME', scope='conv2')
    layers.separable_conv2d(
        net, None, [3, 2], depth_multiplier=1, padding='SAME', scope='dw1')
    self.conv_op = tf.get_default_graph().get_operation_by_name('conv1/Conv2D')
    self.convt_op = tf.get_default_graph().get_operation_by_name(
        'convt2/conv2d_transpose')
    self.matmul_op = tf.get_default_graph().get_operation_by_name('FC/MatMul')
    self.dw_op = tf.get_default_graph().get_operation_by_name('dw1/depthwise')

  def assertNearRelatively(self, expected, actual):
    self.assertNear(expected, actual, expected * 1e-6)

  def testConvFlopsCoeff(self):
    # Divide by the input depth and the output depth to get the coefficient.
    expected_coeff = _flops(self.conv_op) / (17.0 * 19.0)
    actual_coeff = bilinear_cost_utils.flop_coeff(self.conv_op)
    self.assertNearRelatively(expected_coeff, actual_coeff)

  def testConvTransposeFlopsCoeff(self):
    # Divide by the input depth and the output depth to get the coefficient.
    expected_coeff = _flops(self.convt_op) / (17.0 * 29.0)
    actual_coeff = bilinear_cost_utils.flop_coeff(self.convt_op)
    self.assertNearRelatively(expected_coeff, actual_coeff)

  def testFcFlopsCoeff(self):
    expected_coeff = _flops(self.matmul_op) / (19.0 * 23.0)
    actual_coeff = bilinear_cost_utils.flop_coeff(self.matmul_op)
    self.assertNearRelatively(expected_coeff, actual_coeff)

  def testConvNumWeightsCoeff(self):
    actual_coeff = bilinear_cost_utils.num_weights_coeff(self.conv_op)
    # The coefficient is just the filter size - 7 * 5 = 35:
    self.assertNearRelatively(35, actual_coeff)

  def testFcNumWeightsCoeff(self):
    actual_coeff = bilinear_cost_utils.num_weights_coeff(self.matmul_op)
    # The coefficient is 1.0, the number of weights is just inputs x outputs.
    self.assertNearRelatively(1.0, actual_coeff)

  def testDepthwiseConvFlopsCoeff(self):
    # Divide by the input depth (which is also the output depth) to get the
    # coefficient.
    expected_coeff = _flops(self.dw_op) / (10.0)
    actual_coeff = bilinear_cost_utils.flop_coeff(self.dw_op)
    self.assertNearRelatively(expected_coeff, actual_coeff)


if __name__ == '__main__':
  tf.test.main()
research/morph_net/network_regularizers/flop_regularizer.py  0 → 100644
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A NetworkRegularizer that targets the number of FLOPs."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from morph_net.framework import op_regularizer_manager
from morph_net.network_regularizers import bilinear_cost_utils
from morph_net.op_regularizers import conv_group_lasso_regularizer
from morph_net.op_regularizers import gamma_l1_regularizer


class GammaFlopsRegularizer(bilinear_cost_utils.BilinearNetworkRegularizer):
  """A NetworkRegularizer that targets FLOPs using Gamma L1 as OpRegularizer."""

  def __init__(self, ops, gamma_threshold):
    gamma_l1_reg_factory = gamma_l1_regularizer.GammaL1RegularizerFactory(
        gamma_threshold)
    opreg_manager = op_regularizer_manager.OpRegularizerManager(
        ops, {
            'Conv2D': gamma_l1_reg_factory.create_regularizer,
            'DepthwiseConv2dNative': gamma_l1_reg_factory.create_regularizer
        })
    super(GammaFlopsRegularizer, self).__init__(
        opreg_manager, bilinear_cost_utils.flop_coeff)


class GroupLassoFlopsRegularizer(
    bilinear_cost_utils.BilinearNetworkRegularizer):
  """A NetworkRegularizer that targets FLOPs using L1 group lasso."""

  def __init__(self, ops, threshold):
    # Regularizer factories for convolution and fully connected layers.
    conv_regularizer_factory = (
        conv_group_lasso_regularizer.ConvGroupLassoRegularizerFactory(
            threshold))
    regularizer_factories = {
        'Conv2D': conv_regularizer_factory.create_regularizer,
        'Conv2DBackpropInput': conv_regularizer_factory.create_regularizer,
    }
    # Create OpRegularizerManager instance.
    opreg_manager = op_regularizer_manager.OpRegularizerManager(
        ops, regularizer_factories)
    super(GroupLassoFlopsRegularizer, self).__init__(
        opreg_manager, bilinear_cost_utils.flop_coeff)
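A possible way to use these classes in training (a sketch under assumed settings, not part of the commit; the toy network, shapes and the 1e-9 strength are illustrative): build the regularizer from the network's output ops, add its regularization term to the task loss, and monitor get_cost() as the estimated FLOP count.

# Sketch only -- not part of this commit; values are illustrative.
import tensorflow as tf
from morph_net.network_regularizers import flop_regularizer

layers = tf.contrib.layers
images = tf.zeros([8, 32, 32, 3])
labels = tf.zeros([8], dtype=tf.int64)
with tf.contrib.framework.arg_scope(
    [layers.conv2d], normalizer_fn=layers.batch_norm,
    normalizer_params={'scale': True}):
  net = layers.conv2d(images, 16, [3, 3], scope='conv1')
  net = layers.conv2d(net, 32, [3, 3], scope='conv2')
logits = layers.fully_connected(
    tf.reduce_mean(net, axis=(1, 2)), 10, activation_fn=None, scope='logits')
train_loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)

network_reg = flop_regularizer.GammaFlopsRegularizer(
    [logits.op], gamma_threshold=0.45)
regularization_strength = 1e-9  # illustrative; tuned per model in practice
total_loss = train_loss + regularization_strength * (
    network_reg.get_regularization_term())
current_flops = network_reg.get_cost()  # scalar tensor to monitor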
research/morph_net/network_regularizers/flop_regularizer_test.py  0 → 100644
# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for flop_regularizer."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import abc

import numpy as np
import tensorflow as tf

from tensorflow.contrib.slim.nets import resnet_v1
from morph_net.network_regularizers import bilinear_cost_utils
from morph_net.network_regularizers import flop_regularizer

arg_scope = tf.contrib.framework.arg_scope
layers = tf.contrib.layers
_coeff = bilinear_cost_utils.flop_coeff

NUM_CHANNELS = 3


class GammaFlopLossTest(tf.test.TestCase):

  def setUp(self):
    tf.reset_default_graph()
    self.BuildWithBatchNorm()
    with self.test_session():
      self.Init()

  def BuildWithBatchNorm(self):
    params = {
        'trainable': True,
        'normalizer_fn': layers.batch_norm,
        'normalizer_params': {
            'scale': True
        }
    }

    with arg_scope([layers.conv2d], **params):
      self.BuildModel()

  def BuildModel(self):
    # Our test model is:
    #
    #         -> conv1 --+        -> conv3 -->
    #        /           |       /
    #  image          [concat]
    #        \           |       \
    #         -> conv2 --+        -> conv4 -->
    #
    # (the model has two "outputs", conv3 and conv4).
    #
    image = tf.constant(0.0, shape=[1, 17, 19, NUM_CHANNELS])
    conv1 = layers.conv2d(image, 13, [7, 5], padding='SAME', scope='conv1')
    conv2 = layers.conv2d(image, 23, [1, 1], padding='SAME', scope='conv2')
    concat = tf.concat([conv1, conv2], 3)
    self.conv3 = layers.conv2d(
        concat, 29, [3, 3], stride=2, padding='SAME', scope='conv3')
    self.conv4 = layers.conv2d(
        concat, 31, [1, 1], stride=1, padding='SAME', scope='conv4')
    self.name_to_var = {v.op.name: v for v in tf.global_variables()}

    self.gamma_flop_reg = flop_regularizer.GammaFlopsRegularizer(
        [self.conv3.op, self.conv4.op], gamma_threshold=0.45)

  def GetConv(self, name):
    return tf.get_default_graph().get_operation_by_name(name + '/Conv2D')

  def Init(self):
    tf.global_variables_initializer().run()
    gamma1 = self.name_to_var['conv1/BatchNorm/gamma']
    gamma1.assign([0.8] * 7 + [0.2] * 6).eval()
    gamma2 = self.name_to_var['conv2/BatchNorm/gamma']
    gamma2.assign([-0.7] * 11 + [0.1] * 12).eval()
    gamma3 = self.name_to_var['conv3/BatchNorm/gamma']
    gamma3.assign([0.6] * 10 + [-0.3] * 19).eval()
    gamma4 = self.name_to_var['conv4/BatchNorm/gamma']
    gamma4.assign([-0.5] * 17 + [-0.4] * 14).eval()

  def cost(self, conv):
    with self.test_session():
      return self.gamma_flop_reg.get_cost(conv).eval()

  def loss(self, conv):
    with self.test_session():
      return self.gamma_flop_reg.get_regularization_term(conv).eval()

  def testCost(self):
    # Conv1 has 7 gammas above 0.45, and NUM_CHANNELS inputs (from the image).
    conv = self.GetConv('conv1')
    self.assertEqual(_coeff(conv) * 7 * NUM_CHANNELS, self.cost([conv]))

    # Conv2 has 11 gammas above 0.45, and NUM_CHANNELS inputs (from the image).
    conv = self.GetConv('conv2')
    self.assertEqual(_coeff(conv) * 11 * NUM_CHANNELS, self.cost([conv]))

    # Conv3 has 10 gammas above 0.45, and 7 + 11 inputs from conv1 and conv2.
    conv = self.GetConv('conv3')
    self.assertEqual(_coeff(conv) * 10 * 18, self.cost([conv]))

    # Conv4 has 17 gammas above 0.45, and 7 + 11 inputs from conv1 and conv2.
    conv = self.GetConv('conv4')
    self.assertEqual(_coeff(conv) * 17 * 18, self.cost([conv]))

    # Test that passing a list of convs sums their contributions:
    convs = [self.GetConv('conv3'), self.GetConv('conv4')]
    self.assertEqual(
        self.cost(convs[:1]) + self.cost(convs[1:]), self.cost(convs))


class GammaFlopLossWithDepthwiseConvTestBase(object):
  """Test flop_regularizer for a network with depthwise convolutions."""
  __metaclass__ = abc.ABCMeta

  @abc.abstractmethod
  def GetSession(self):
    return

  def BuildWithBatchNorm(self):
    params = {
        'trainable': True,
        'normalizer_fn': layers.batch_norm,
        'normalizer_params': {
            'scale': True
        }
    }
    ops_with_batchnorm = [layers.conv2d]
    if self._depthwise_use_batchnorm:
      ops_with_batchnorm.append(layers.separable_conv2d)

    with arg_scope(ops_with_batchnorm, **params):
      self.BuildModel()

  def BuildModel(self):
    # Our test model is:
    #
    #         -> dw1 --> conv1 --+
    #        /                   |
    #  image                  [concat] --> conv3
    #        \                   |
    #         -> conv2 --> dw2 --+
    #
    # (the model has one "output", conv3).
    #
    image = tf.constant(0.0, shape=[1, 17, 19, NUM_CHANNELS])
    dw1 = layers.separable_conv2d(
        image, None, [3, 3], depth_multiplier=1, stride=1, scope='dw1')
    conv1 = layers.conv2d(dw1, 13, [7, 5], padding='SAME', scope='conv1')
    conv2 = layers.conv2d(image, 23, [1, 1], padding='SAME', scope='conv2')
    dw2 = layers.separable_conv2d(
        conv2, None, [5, 5], depth_multiplier=1, stride=1, scope='dw2')
    concat = tf.concat([conv1, dw2], 3)
    self.conv3 = layers.conv2d(
        concat, 29, [3, 3], stride=2, padding='SAME', scope='conv3')
    self.name_to_var = {v.op.name: v for v in tf.global_variables()}

    self.gamma_flop_reg = flop_regularizer.GammaFlopsRegularizer(
        [self.conv3.op], gamma_threshold=0.45)

  def GetConv(self, name):
    return tf.get_default_graph().get_operation_by_name(
        name + ('/Conv2D' if 'conv' in name else '/depthwise'))

  def GetGammaAbsValue(self, name):
    gamma_op = tf.get_default_graph().get_operation_by_name(
        name + '/BatchNorm/gamma')
    with self.GetSession():  # pylint: disable=not-context-manager
      gamma = gamma_op.outputs[0].eval()
    return np.abs(gamma)

  def Init(self):
    tf.global_variables_initializer().run()
    gamma1 = self.name_to_var['conv1/BatchNorm/gamma']
    gamma1.assign([0.8] * 7 + [0.2] * 6).eval()
    gamma2 = self.name_to_var['conv2/BatchNorm/gamma']
    gamma2.assign([-0.7] * 11 + [0.1] * 12).eval()
    gamma3 = self.name_to_var['conv3/BatchNorm/gamma']
    gamma3.assign([0.6] * 10 + [-0.3] * 19).eval()
    # Initialize gamma for depthwise convs only if there are Batchnorm for them.
    if self._depthwise_use_batchnorm:
      gammad1 = self.name_to_var['dw1/BatchNorm/gamma']
      gammad1.assign([-0.3] * 1 + [-0.9] * 2).eval()
      gammad2 = self.name_to_var['dw2/BatchNorm/gamma']
      gammad2.assign([0.3] * 5 + [0.9] * 10 + [-0.1] * 8).eval()

  def cost(self, conv):  # pylint: disable=invalid-name
    with self.GetSession():  # pylint: disable=not-context-manager
      cost = self.gamma_flop_reg.get_cost(conv)
      return cost.eval() if isinstance(cost, tf.Tensor) else cost

  def loss(self, conv):  # pylint: disable=invalid-name
    with self.GetSession():  # pylint: disable=not-context-manager
      reg = self.gamma_flop_reg.get_regularization_term(conv)
      return reg.eval() if isinstance(reg, tf.Tensor) else reg


class GammaFlopLossWithDepthwiseConvTest(
    tf.test.TestCase, GammaFlopLossWithDepthwiseConvTestBase):
  """Test flop_regularizer for a network with depthwise convolutions."""

  def setUp(self):
    self._depthwise_use_batchnorm = True
    tf.reset_default_graph()
    self.BuildWithBatchNorm()
    with self.test_session():
      self.Init()

  def GetSession(self):
    return self.test_session()

  def testCost(self):
    # Dw1 has 2 gammas above 0.45 out of NUM_CHANNELS inputs (from the image),
    # but because the input doesn't have a regularizer, it has no way of
    # removing the channels, so the channel count is still NUM_CHANNELS.
    conv = self.GetConv('dw1')
    self.assertEqual(_coeff(conv) * NUM_CHANNELS, self.cost([conv]))

    # Conv1 has 7 gammas above 0.45, and NUM_CHANNELS inputs (from dw1).
    conv = self.GetConv('conv1')
    self.assertEqual(_coeff(conv) * 7 * NUM_CHANNELS, self.cost([conv]))

    # Conv2 has 11 active + 12 inactive, while Dw2 has 5 inactive, 10 active
    # and 8 inactive. Their max (or) has 15 active and 8 inactive.
    # Conv2 has NUM_CHANNELS inputs (from the image).
    conv = self.GetConv('conv2')
    self.assertEqual(_coeff(conv) * 15 * NUM_CHANNELS, self.cost([conv]))

    # Dw2 has 15 out of 23 inputs (from the Conv2).
    conv = self.GetConv('dw2')
    self.assertEqual(_coeff(conv) * 15, self.cost([conv]))

    # Conv3 has 10 gammas above 0.45, and 7 + 15 inputs from conv1 and dw2.
    conv = self.GetConv('conv3')
    self.assertEqual(_coeff(conv) * 10 * 22, self.cost([conv]))

  def testRegularizer(self):
    # Dw1 depthwise convolution is connected to the input (no regularizer).
    conv = self.GetConv('dw1')
    # Although the effective regularizer for dw is computed as below:
    # gamma = self.GetGammaAbsValue('dw1')
    # expected_loss = _coeff(conv) * gamma.sum()
    # Since the input is not regularized, dw does not return a regularizer.
    expected_loss = 0.0
    self.assertNear(expected_loss, self.loss([conv]), expected_loss * 1e-5)

    # Conv1 takes Dw1 as input, its input regularizer is from dw1.
    conv = self.GetConv('conv1')
    gamma = self.GetGammaAbsValue('conv1')
    # The effective size for dw can be computed from its gamma, and
    # the loss may be computed as follows:
    # gamma_dw = self.GetGammaAbsValue('dw1')
    # expected_loss = _coeff(conv) * (
    #     gamma.sum() * (gamma_dw > 0.45).sum() + gamma_dw.sum() *
    #     (gamma > 0.45).sum())
    # However, since dw cannot change shape because its input doesn't have a
    # regularizer, the real loss we expect should be:
    expected_loss = _coeff(conv) * (gamma.sum() * NUM_CHANNELS)
    self.assertNear(expected_loss, self.loss([conv]), expected_loss * 1e-5)

    # Dw2 depthwise convolution is connected to conv2 (grouped regularizer).
    conv = self.GetConv('conv2')
    gamma_conv = self.GetGammaAbsValue('conv2')
    dw = self.GetConv('dw2')
    gamma_dw = self.GetGammaAbsValue('dw2')
    gamma = np.maximum(gamma_dw, gamma_conv).sum()
    expected_loss = _coeff(conv) * (gamma * 3 + (gamma > 0.45).sum() * 0)
    self.assertNear(expected_loss, self.loss([conv]), expected_loss * 1e-5)
    expected_loss = _coeff(dw) * gamma * 2
    self.assertNear(expected_loss, self.loss([dw]), expected_loss * 1e-5)


class GammaFlopLossWithDepthwiseConvNoBatchNormTest(
    tf.test.TestCase, GammaFlopLossWithDepthwiseConvTestBase):
  """Test flop_regularizer for un-batchnormed depthwise convolutions.

  This test is used to confirm that when depthwise convolution is not BNed, it
  will not be considered towards the regularizer, but it will be counted
  towards the cost. This design choice is for backward compatibility for users
  who did not regularize depthwise convolutions. However, the cost will be
  reported regardless in order to be faithful to the real computation
  complexity.
  """

  def setUp(self):
    self._depthwise_use_batchnorm = False
    tf.reset_default_graph()
    self.BuildWithBatchNorm()
    with self.test_session():
      self.Init()

  def GetSession(self):
    return self.test_session()

  def testCost(self):
    # Dw1 has NUM_CHANNELS inputs (from the image).
    conv = self.GetConv('dw1')
    self.assertEqual(_coeff(conv) * 3, self.cost([conv]))

    # Conv1 has 7 gammas above 0.45, and 3 inputs (from dw1).
    conv = self.GetConv('conv1')
    self.assertEqual(_coeff(conv) * 7 * 3, self.cost([conv]))

    # Conv2 has 11 active outputs and NUM_CHANNELS inputs (from the image).
    conv = self.GetConv('conv2')
    self.assertEqual(_coeff(conv) * 11 * NUM_CHANNELS, self.cost([conv]))

    # Dw2 has 11 inputs (pass-through from the Conv2).
    conv = self.GetConv('dw2')
    self.assertEqual(_coeff(conv) * 11, self.cost([conv]))

    # Conv3 has 10 gammas above 0.45, and 7 + 11 inputs from conv1 and dw2.
    conv = self.GetConv('conv3')
    self.assertEqual(_coeff(conv) * 10 * 18, self.cost([conv]))

  def testRegularizer(self):
    # Dw1 depthwise convolution is connected to the input (no regularizer).
    conv = self.GetConv('dw1')
    expected_loss = 0.0
    self.assertNear(expected_loss, self.loss([conv]), expected_loss * 1e-5)

    # Conv1 takes Dw1 as input, but it's not affected by dw1 because depthwise
    # is not BNed.
    conv = self.GetConv('conv1')
    gamma = self.GetGammaAbsValue('conv1')
    expected_loss = _coeff(conv) * (gamma.sum() * NUM_CHANNELS)
    self.assertNear(expected_loss, self.loss([conv]), expected_loss * 1e-5)

    # Dw2 depthwise convolution is connected to conv2 (pass through).
    dw = self.GetConv('dw2')
    gamma = self.GetGammaAbsValue('conv2')
    expected_loss = _coeff(dw) * gamma.sum() * 2
    self.assertNear(expected_loss, self.loss([dw]), expected_loss * 1e-5)


class GammaFlopResidualConnectionsLossTest(tf.test.TestCase):
  """Tests flop_regularizer for a network with residual connections."""

  def setUp(self):
    tf.reset_default_graph()
    tf.set_random_seed(7)
    self._threshold = 0.6

  def buildModel(self, resnet_fn, block_fn):
    # We use this model as a test case because the slim.nets.resnet module is
    # used in some production.
    #
    # The model looks as follows:
    #
    # Image --> unit_1/shortcut
    # Image --> unit_1/conv1 --> unit_1/conv2 --> unit_1/conv3
    #
    # unit_1/shortcut + unit_1/conv3 --> unit_1 (residual connection)
    #
    # unit_1 --> unit_2/conv1 -> unit_2/conv2 --> unit_2/conv3
    #
    # unit_1 + unit_2/conv3 --> unit_2 (residual connection)
    #
    # In between, there are strided convolutions and pooling ops, but these
    # should not affect the regularizer.
    blocks = [
        block_fn('block1', base_depth=7, num_units=2, stride=2),
    ]
    image = tf.constant(0.0, shape=[1, 2, 2, NUM_CHANNELS])
    net = resnet_fn(
        image, blocks, include_root_block=False, is_training=False)[0]
    net = tf.reduce_mean(net, axis=(1, 2))
    return layers.fully_connected(net, 23, scope='FC')

  def buildGraphWithBatchNorm(self, resnet_fn, block_fn):
    params = {
        'trainable': True,
        'normalizer_fn': layers.batch_norm,
        'normalizer_params': {
            'scale': True
        }
    }

    with arg_scope([layers.conv2d, layers.separable_conv2d], **params):
      self.net = self.buildModel(resnet_fn, block_fn)

  def initGamma(self):
    assignments = []
    gammas = {}
    for v in tf.global_variables():
      if v.op.name.endswith('/gamma'):
        assignments.append(v.assign(tf.random_uniform(v.shape)))
        gammas[v.op.name] = v
    with self.test_session() as s:
      s.run(assignments)
      self._gammas = s.run(gammas)

  def getGamma(self, short_name):
    tokens = short_name.split('/')
    name = ('resnet_v1/block1/' + tokens[0] + '/bottleneck_v1/' + tokens[1] +
            '/BatchNorm/gamma')
    return self._gammas[name]

  def getOp(self, short_name):
    if short_name == 'FC':
      return tf.get_default_graph().get_operation_by_name('FC/MatMul')
    tokens = short_name.split('/')
    name = ('resnet_v1/block1/' + tokens[0] + '/bottleneck_v1/' + tokens[1] +
            '/Conv2D')
    return tf.get_default_graph().get_operation_by_name(name)

  def numAlive(self, short_name):
    return np.sum(self.getGamma(short_name) > self._threshold)

  def getCoeff(self, short_name):
    return _coeff(self.getOp(short_name))

  def testCost(self):
    self.buildGraphWithBatchNorm(resnet_v1.resnet_v1, resnet_v1.resnet_v1_block)
    self.initGamma()
    res_alive = np.logical_or(
        np.logical_or(
            self.getGamma('unit_1/shortcut') > self._threshold,
            self.getGamma('unit_1/conv3') > self._threshold),
        self.getGamma('unit_2/conv3') > self._threshold)

    self.gamma_flop_reg = flop_regularizer.GammaFlopsRegularizer(
        [self.net.op], self._threshold)

    expected = {}
    expected['unit_1/shortcut'] = (
        self.getCoeff('unit_1/shortcut') * np.sum(res_alive) * NUM_CHANNELS)
    expected['unit_1/conv1'] = (
        self.getCoeff('unit_1/conv1') * self.numAlive('unit_1/conv1') *
        NUM_CHANNELS)
    expected['unit_1/conv2'] = (
        self.getCoeff('unit_1/conv2') * self.numAlive('unit_1/conv2') *
        self.numAlive('unit_1/conv1'))
    expected['unit_1/conv3'] = (
        self.getCoeff('unit_1/conv3') * np.sum(res_alive) *
        self.numAlive('unit_1/conv2'))
    expected['unit_2/conv1'] = (
        self.getCoeff('unit_2/conv1') * self.numAlive('unit_2/conv1') *
        np.sum(res_alive))
    expected['unit_2/conv2'] = (
        self.getCoeff('unit_2/conv2') * self.numAlive('unit_2/conv2') *
        self.numAlive('unit_2/conv1'))
    expected['unit_2/conv3'] = (
        self.getCoeff('unit_2/conv3') * np.sum(res_alive) *
        self.numAlive('unit_2/conv2'))
    expected['FC'] = 2.0 * np.sum(res_alive) * 23.0

    # TODO: Is there a way to use Parametrized Tests to make this more
    # elegant?
    with self.test_session():
      for short_name in expected:
        cost = self.gamma_flop_reg.get_cost([self.getOp(short_name)]).eval()
        self.assertEqual(expected[short_name], cost)

      self.assertEqual(
          sum(expected.values()), self.gamma_flop_reg.get_cost().eval())


class GroupLassoFlopRegTest(tf.test.TestCase):

  def assertNearRelatively(self, expected, actual):
    self.assertNear(expected, actual, expected * 1e-6)

  def testFlopRegularizer(self):
    tf.reset_default_graph()
    tf.set_random_seed(7907)
    with arg_scope(
        [layers.conv2d, layers.conv2d_transpose],
        weights_initializer=tf.random_normal_initializer):
      # Our test model is:
      #
      #         -> conv1 --+
      #        /           |--[concat]
      #  image --> conv2 --+
      #        \
      #         -> convt
      #
      # (the model has two "outputs", convt and concat).
      #
      image = tf.constant(0.0, shape=[1, 17, 19, NUM_CHANNELS])
      conv1 = layers.conv2d(image, 13, [7, 5], padding='SAME', scope='conv1')
      conv2 = layers.conv2d(image, 23, [1, 1], padding='SAME', scope='conv2')
      self.concat = tf.concat([conv1, conv2], 3)
      self.convt = layers.conv2d_transpose(
          image, 29, [7, 5], stride=3, padding='SAME', scope='convt')
      self.name_to_var = {v.op.name: v for v in tf.global_variables()}
    with self.test_session():
      tf.global_variables_initializer().run()

    threshold = 1.0
    flop_reg = flop_regularizer.GroupLassoFlopsRegularizer(
        [self.concat.op, self.convt.op], threshold=threshold)

    with self.test_session() as s:
      evaluated_vars = s.run(self.name_to_var)

    def group_norm(weights, axis=(0, 1, 2)):  # pylint: disable=invalid-name
      return np.sqrt(np.mean(weights**2, axis=axis))

    reg_vectors = {
        'conv1': group_norm(evaluated_vars['conv1/weights'], (0, 1, 2)),
        'conv2': group_norm(evaluated_vars['conv2/weights'], (0, 1, 2)),
        'convt': group_norm(evaluated_vars['convt/weights'], (0, 1, 3))
    }

    num_alive = {k: np.sum(r > threshold) for k, r in reg_vectors.iteritems()}
    total_outputs = (
        reg_vectors['conv1'].shape[0] + reg_vectors['conv2'].shape[0])
    total_alive_outputs = sum(num_alive.values())
    assert total_alive_outputs > 0, (
        'All outputs are dead - test is trivial. Decrease the threshold.')
    assert total_alive_outputs < total_outputs, (
        'All outputs are alive - test is trivial. Increase the threshold.')

    coeff1 = _coeff(_get_op('conv1/Conv2D'))
    coeff2 = _coeff(_get_op('conv2/Conv2D'))
    coefft = _coeff(_get_op('convt/conv2d_transpose'))

    expected_flop_cost = NUM_CHANNELS * (
        coeff1 * num_alive['conv1'] + coeff2 * num_alive['conv2'] +
        coefft * num_alive['convt'])
    expected_reg_term = NUM_CHANNELS * (
        coeff1 * np.sum(reg_vectors['conv1']) +
        coeff2 * np.sum(reg_vectors['conv2']) +
        coefft * np.sum(reg_vectors['convt']))
    with self.test_session():
      self.assertEqual(
          round(expected_flop_cost), round(flop_reg.get_cost().eval()))
      self.assertNearRelatively(expected_reg_term,
                                flop_reg.get_regularization_term().eval())


def _get_op(name):  # pylint: disable=invalid-name
  return tf.get_default_graph().get_operation_by_name(name)


if __name__ == '__main__':
  tf.test.main()