ModelZoo / ResNet50_tensorflow · Commits

Commit 6c6f3f3a, authored Jun 13, 2018 by Alan Mackey
Added new model, global objectives.
Parent: cac3a298

Showing 8 changed files with 3387 additions and 0 deletions
CODEOWNERS (+1, -0)
research/global_objectives/README.md (+148, -0)
research/global_objectives/loss_layers.py (+930, -0)
research/global_objectives/loss_layers_example.py (+211, -0)
research/global_objectives/loss_layers_test.py (+1379, -0)
research/global_objectives/test_all.py (+37, -0)
research/global_objectives/util.py (+348, -0)
research/global_objectives/util_test.py (+333, -0)
CODEOWNERS

@@ -14,6 +14,7 @@
 /research/differential_privacy/ @ilyamironov @ananthr
 /research/domain_adaptation/ @bousmalis @dmrd
 /research/gan/ @joel-shor
+/research/global_objectives/ @mackeya-google
 /research/im2txt/ @cshallue
 /research/inception/ @shlens @vincentvanhoucke
 /research/learned_optimizer/ @olganw @nirum
research/global_objectives/README.md (new file, 0 → 100644)
# Global Objectives
The Global Objectives library provides TensorFlow loss functions that optimize
directly for a variety of objectives including AUC, recall at precision, and
more. The global objectives losses can be used as drop-in replacements for
TensorFlow's standard multilabel loss functions:
`tf.nn.sigmoid_cross_entropy_with_logits` and `tf.losses.sigmoid_cross_entropy`.
Many machine learning classification models are optimized for classification
accuracy, even though the real objective the user cares about is different: it
may be precision at a fixed recall, precision-recall AUC, ROC AUC, or a similar metric.
These are referred to as "global objectives" because they depend on how the
model classifies the dataset as a whole and do not decouple across data points
as accuracy does.
Because these objectives are combinatorial, discontinuous, and essentially
intractable to optimize directly, the functions in this library approximate
their corresponding objectives. This approximation approach follows the same
pattern as optimizing for accuracy, where a surrogate objective such as
cross-entropy or the hinge loss is used as an upper bound on the error rate.
## Getting Started
For a full example of how to use the loss functions in practice, see
`loss_layers_example.py`.
Briefly, global objective losses can be used to replace
`tf.nn.sigmoid_cross_entropy_with_logits` by providing the relevant additional
arguments. For example,
```python
tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
```
could be replaced with
```python
global_objectives.recall_at_precision_loss(
    labels=labels, logits=logits, target_precision=0.95)[0]
```
Just as minimizing the cross-entropy loss will maximize accuracy, the loss
functions in `loss_layers.py` were written so that minimizing the loss will
maximize the corresponding objective.

The global objective losses have two return values -- the loss tensor and
additional quantities for debugging and customization -- which is why the first
value is used above. For more information, see
[Visualization & Debugging](#visualization-debugging).
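
As a minimal sketch of the wiring (assuming a TF 1.x graph in which `labels` and
`logits` are already defined, and following the import style of the snippet
above), the loss tensor can be reduced to a scalar and handed to any standard
optimizer:

```python
import tensorflow as tf
import global_objectives

# The first return value is the loss tensor; the second is a dictionary of
# debugging/customization tensors (see Visualization & Debugging below).
loss, other_outputs = global_objectives.recall_at_precision_loss(
    labels=labels, logits=logits, target_precision=0.95)

# Reduce to a scalar and minimize with any standard TF 1.x optimizer; the dual
# variables created by the loss are updated by the same optimizer.
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
    tf.reduce_mean(loss))
```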
## Binary Label Format
Binary classification problems can be represented as a multi-class problem with
two classes, or as a multi-label problem with one label. (Recall that multiclass
problems have mutually exclusive classes, e.g. 'cat xor dog', and multilabel
problems have classes which are not mutually exclusive, e.g. an image can
contain a cat, a dog, both, or neither.) The softmax loss
(`tf.nn.softmax_cross_entropy_with_logits`) is used for multi-class problems,
while the sigmoid loss (`tf.nn.sigmoid_cross_entropy_with_logits`) is used for
multi-label problems.

A multiclass label format for binary classification might represent positives
with the label [1, 0] and negatives with the label [0, 1], while the multilabel
format for the same problem would use [1] and [0], respectively.

All global objectives loss functions assume that the multilabel format is used.
Accordingly, if your current loss function is softmax, the labels will have to
be reformatted for the loss to work properly.
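
As a hedged illustration of the reformatting (the tensor names here are
hypothetical), a one-hot multiclass label tensor whose first column is the
positive class can be sliced down to the multilabel format:

```python
import tensorflow as tf

# Multiclass one-hot labels: positives are [1, 0], negatives are [0, 1].
multiclass_labels = tf.constant([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

# Multilabel format for the same problem: keep only the positive-class column,
# giving shape [batch_size, 1] with entries in {0, 1}.
multilabel_labels = multiclass_labels[:, 0:1]
```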
## Dual Variables
Global objectives losses (except for `roc_auc_loss`) use internal variables
called dual variables or Lagrange multipliers to enforce the desired constraint
(e.g. if optimizing for recall at precision, the constraint is on precision).

These dual variables are created and initialized internally by the loss
functions, and are updated during training by the same optimizer used for the
model's other variables. To initialize the dual variables to a particular value,
use the `lambdas_initializer` argument. The dual variables can be found under
the key `lambdas` in the `other_outputs` dictionary returned by the losses.
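
For example, a sketch (assuming, as the argument name suggests, that a standard
TF initializer object is accepted) that starts the multipliers at 1.0 and reads
them back out:

```python
import tensorflow as tf
import global_objectives

loss, other_outputs = global_objectives.recall_at_precision_loss(
    labels=labels,
    logits=logits,
    target_precision=0.95,
    lambdas_initializer=tf.constant_initializer(1.0))

# The dual variables can be fetched (e.g. for logging) during training.
lambdas = other_outputs['lambdas']
```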
## Loss Function Arguments
The following arguments are common to all loss functions in the library, and are
either required or very important.
* `labels`: Corresponds directly to the `labels` argument of
  `tf.nn.sigmoid_cross_entropy_with_logits`.
* `logits`: Corresponds directly to the `logits` argument of
  `tf.nn.sigmoid_cross_entropy_with_logits`.
* `dual_rate_factor`: A floating point value which controls the step size for
  the Lagrange multipliers. Setting this value less than 1.0 will cause the
  constraint to be enforced more gradually and will result in more stable
  training.

In addition, the objectives with a single constraint (e.g.
`recall_at_precision_loss`) have an argument (e.g. `target_precision`) used to
specify the value of the constraint. The optional `precision_range` argument to
`precision_recall_auc_loss` is used to specify the range of precision values
over which to optimize the AUC, and defaults to the interval [0, 1].

Optional arguments:

* `weights`: A tensor which acts as coefficients for the loss. If a weight of x
  is provided for a datapoint and that datapoint is a true (false) positive
  (negative), it will be counted as x true (false) positives (negatives).
  Defaults to 1.0.
* `label_priors`: A tensor specifying the fraction of positive datapoints for
  each label. If not provided, it will be computed inside the loss function.
* `surrogate_type`: Either 'xent' or 'hinge', specifying which upper bound
  should be used for indicator functions.
* `lambdas_initializer`: An initializer for the dual variables (Lagrange
  multipliers). See also the Dual Variables section.
* `num_anchors` (precision_recall_auc_loss only): The number of grid points used
  when approximating the AUC as a Riemann sum.
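
Putting several of these arguments together, a hedged sketch (the values and the
`example_weights` tensor are purely illustrative):

```python
import global_objectives

loss, other_outputs = global_objectives.recall_at_precision_loss(
    labels=labels,
    logits=logits,
    target_precision=0.95,
    dual_rate_factor=0.1,      # enforce the constraint more gradually
    weights=example_weights,   # hypothetical per-datapoint coefficients
    surrogate_type='hinge')    # hinge upper bound instead of the default 'xent'
```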
## Hyperparameters
While the functional form of the global objectives losses allows them to be
easily substituted in place of `sigmoid_cross_entropy_with_logits`, model
hyperparameters such as learning rate, weight decay, etc. may need to be
fine-tuned to the new loss. Fortunately, the amount of hyperparameter re-tuning
is usually minor.

The most important hyperparameters to modify are the learning rate and
`dual_rate_factor` (see the section on Loss Function Arguments, above).
## Visualization & Debugging
The global objectives losses return two values. The first is a tensor
representing the numerical value of the loss, which can be passed to an
optimizer. The second is a dictionary of tensors created by the loss function
which are not necessary for optimization but useful in debugging. These vary
depending on the loss function, but usually include `lambdas` (the Lagrange
multipliers) as well as the lower bound on true positives and upper bound on
false positives.

When visualizing the loss during training, note that the global objectives
losses differ from standard losses in some important ways:

* The global losses may be negative. This is because the value returned by the
  loss includes terms involving the Lagrange multipliers, which may be negative.
* The global losses may not decrease over the course of training. To enforce the
  constraints in the objective, the loss changes over time and may increase.
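
A minimal sketch of monitoring these quantities in a TF 1.x training loop
(assuming `train_op`, `loss`, and `other_outputs` were built as in the earlier
snippets; the step counts are illustrative):

```python
import tensorflow as tf

mean_loss = tf.reduce_mean(loss)
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for step in range(1000):
    # Fetching a dict of tensors returns a dict of numpy values.
    _, loss_value, debug_values = sess.run([train_op, mean_loss, other_outputs])
    if step % 100 == 0:
      tf.logging.info('step %d: loss = %f', step, loss_value)
      for name, value in debug_values.items():
        tf.logging.info('\t%s = %s', name, value)
```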
## More Info
For more details, see the
[Global Objectives paper](https://arxiv.org/abs/1608.04802).

## Maintainers

* Mariano Schain
* Elad Eban
* [Alan Mackey](https://github.com/mackeya-google)
research/global_objectives/loss_layers.py (new file, 0 → 100644)
This diff is collapsed.
research/global_objectives/loss_layers_example.py (new file, 0 → 100644)
# Copyright 2018 The TensorFlow Global Objectives Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Example for using global objectives.
Illustrate, using synthetic data, how using the precision_at_recall loss
significanly improves the performace of a linear classifier.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# Dependency imports
import numpy as np
from sklearn.metrics import precision_score
import tensorflow as tf

from global_objectives import loss_layers

# When optimizing using global_objectives, if set to True then the saddle point
# optimization steps are performed internally by the Tensorflow optimizer,
# otherwise by dedicated saddle-point steps as part of the optimization loop.
USE_GO_SADDLE_POINT_OPT = False

TARGET_RECALL = 0.98
TRAIN_ITERATIONS = 150
LEARNING_RATE = 1.0
GO_DUAL_RATE_FACTOR = 15.0
NUM_CHECKPOINTS = 6

EXPERIMENT_DATA_CONFIG = {
    'positives_centers': [[0, 1.0], [1, -0.5]],
    'negatives_centers': [[0, -0.5], [1, 1.0]],
    'positives_variances': [0.15, 0.1],
    'negatives_variances': [0.15, 0.1],
    'positives_counts': [500, 50],
    'negatives_counts': [3000, 100]
}


def create_training_and_eval_data_for_experiment(**data_config):
  """Creates train and eval data sets.

  Note: The synthesized binary-labeled data is a mixture of four Gaussians - two
  positives and two negatives. The centers, variances, and sizes for each of
  the two positives and negatives mixtures are passed in the respective keys
  of data_config:

  Args:
    **data_config: Dictionary with Array entries as follows:
      positives_centers - float [2,2] two centers of positives data sets.
      negatives_centers - float [2,2] two centers of negatives data sets.
      positives_variances - float [2] Variances for the positives sets.
      negatives_variances - float [2] Variances for the negatives sets.
      positives_counts - int [2] Counts for each of the two positives sets.
      negatives_counts - int [2] Counts for each of the two negatives sets.

  Returns:
    A dictionary with two shuffled data sets created - one for training and one
    for eval. The dictionary keys are 'train_data', 'train_labels', 'eval_data',
    and 'eval_labels'. The data points are two-dimensional floats, and the
    labels are in {0,1}.
  """
  def data_points(is_positives, index):
    variance = data_config['positives_variances'
                           if is_positives else 'negatives_variances'][index]
    center = data_config['positives_centers'
                         if is_positives else 'negatives_centers'][index]
    count = data_config['positives_counts'
                        if is_positives else 'negatives_counts'][index]
    return variance * np.random.randn(count, 2) + np.array([center])

  def create_data():
    return np.concatenate([data_points(False, 0),
                           data_points(True, 0),
                           data_points(True, 1),
                           data_points(False, 1)], axis=0)

  def create_labels():
    """Creates an array of 0.0 or 1.0 labels for the data_config batches."""
    return np.array([0.0] * data_config['negatives_counts'][0] +
                    [1.0] * data_config['positives_counts'][0] +
                    [1.0] * data_config['positives_counts'][1] +
                    [0.0] * data_config['negatives_counts'][1])

  permutation = np.random.permutation(
      sum(data_config['positives_counts'] + data_config['negatives_counts']))

  train_data = create_data()[permutation, :]
  eval_data = create_data()[permutation, :]
  train_labels = create_labels()[permutation]
  eval_labels = create_labels()[permutation]

  return {
      'train_data': train_data,
      'train_labels': train_labels,
      'eval_data': eval_data,
      'eval_labels': eval_labels
  }


def train_model(data, use_global_objectives):
  """Trains a linear model for maximal accuracy or precision at given recall."""

  def precision_at_recall(scores, labels, target_recall):
    """Computes precision - at target recall - over data."""
    positive_scores = scores[labels == 1.0]
    threshold = np.percentile(positive_scores, 100 - target_recall * 100)
    predicted = scores >= threshold
    return precision_score(labels, predicted)

  w = tf.Variable(tf.constant([-1.0, -1.0], shape=[2, 1]), trainable=True,
                  name='weights', dtype=tf.float32)
  b = tf.Variable(tf.zeros([1]), trainable=True, name='biases',
                  dtype=tf.float32)

  logits = tf.matmul(tf.cast(data['train_data'], tf.float32), w) + b

  labels = tf.constant(
      data['train_labels'],
      shape=[len(data['train_labels']), 1],
      dtype=tf.float32)

  if use_global_objectives:
    loss, other_outputs = loss_layers.precision_at_recall_loss(
        labels, logits, TARGET_RECALL, dual_rate_factor=GO_DUAL_RATE_FACTOR)
    loss = tf.reduce_mean(loss)
  else:
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))

  global_step = tf.Variable(0, trainable=False)

  learning_rate = tf.train.polynomial_decay(
      LEARNING_RATE,
      global_step,
      TRAIN_ITERATIONS, (LEARNING_RATE / TRAIN_ITERATIONS),
      power=1.0,
      cycle=False,
      name='learning_rate')

  optimizer = tf.train.GradientDescentOptimizer(learning_rate)

  if (not use_global_objectives) or USE_GO_SADDLE_POINT_OPT:
    training_op = optimizer.minimize(loss, global_step=global_step)
  else:
    lambdas = other_outputs['lambdas']
    primal_update_op = optimizer.minimize(loss, var_list=[w, b])
    dual_update_op = optimizer.minimize(
        loss, global_step=global_step, var_list=[lambdas])

  # Training loop:
  with tf.Session() as sess:
    checkpoint_step = TRAIN_ITERATIONS // NUM_CHECKPOINTS
    sess.run(tf.global_variables_initializer())
    step = sess.run(global_step)

    while step <= TRAIN_ITERATIONS:
      if (not use_global_objectives) or USE_GO_SADDLE_POINT_OPT:
        _, step, loss_value, w_value, b_value = sess.run(
            [training_op, global_step, loss, w, b])
      else:
        _, w_value, b_value = sess.run([primal_update_op, w, b])
        _, loss_value, step = sess.run([dual_update_op, loss, global_step])

      if use_global_objectives:
        go_outputs = sess.run(other_outputs.values())

      if step % checkpoint_step == 0:
        precision = precision_at_recall(
            np.dot(data['train_data'], w_value) + b_value,
            data['train_labels'], TARGET_RECALL)

        tf.logging.info('Loss = %f Precision = %f', loss_value, precision)
        if use_global_objectives:
          for i, output_name in enumerate(other_outputs.keys()):
            tf.logging.info('\t%s = %f', output_name, go_outputs[i])

    w_value, b_value = sess.run([w, b])
    return precision_at_recall(np.dot(data['eval_data'], w_value) + b_value,
                               data['eval_labels'],
                               TARGET_RECALL)


def main(unused_argv):
  del unused_argv
  experiment_data = create_training_and_eval_data_for_experiment(
      **EXPERIMENT_DATA_CONFIG)
  global_objectives_loss_precision = train_model(experiment_data, True)
  tf.logging.info('global_objectives precision at requested recall is %f',
                  global_objectives_loss_precision)
  cross_entropy_loss_precision = train_model(experiment_data, False)
  tf.logging.info('cross_entropy precision at requested recall is %f',
                  cross_entropy_loss_precision)


if __name__ == '__main__':
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()
research/global_objectives/loss_layers_test.py (new file, 0 → 100644)
This diff is collapsed.
research/global_objectives/test_all.py (new file, 0 → 100644)
# Copyright 2018 The TensorFlow Global Objectives Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Runs all unit tests in the Global Objectives package.
Requires that TensorFlow and abseil (https://github.com/abseil/abseil-py) be
installed on your machine. Command to run the tests:
python test_all.py
"""

import os
import sys
import unittest

this_file = os.path.realpath(__file__)
start_dir = os.path.dirname(this_file)
parent_dir = os.path.dirname(start_dir)
sys.path.append(parent_dir)

loader = unittest.TestLoader()
suite = loader.discover(start_dir, pattern='*_test.py')
runner = unittest.TextTestRunner(verbosity=2)
runner.run(suite)
research/global_objectives/util.py (new file, 0 → 100644)
# Copyright 2018 The TensorFlow Global Objectives Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains utility functions for the global objectives library."""

# Dependency imports
import tensorflow as tf


def weighted_sigmoid_cross_entropy_with_logits(labels,
                                               logits,
                                               positive_weights=1.0,
                                               negative_weights=1.0,
                                               name=None):
  """Computes a weighting of sigmoid cross entropy given `logits`.

  Measures the weighted probability error in discrete classification tasks in
  which classes are independent and not mutually exclusive. For instance, one
  could perform multilabel classification where a picture can contain both an
  elephant and a dog at the same time. The class weight multiplies the
  different types of errors.
  For brevity, let `x = logits`, `z = labels`, `c = positive_weights`,
  `d = negative_weights`. The weighted logistic loss is

  ```
  c * z * -log(sigmoid(x)) + d * (1 - z) * -log(1 - sigmoid(x))
  = c * z * -log(1 / (1 + exp(-x))) - d * (1 - z) * log(exp(-x) / (1 + exp(-x)))
  = c * z * log(1 + exp(-x)) + d * (1 - z) * (-log(exp(-x)) + log(1 + exp(-x)))
  = c * z * log(1 + exp(-x)) + d * (1 - z) * (x + log(1 + exp(-x)))
  = (1 - z) * x * d + (1 - z + c * z ) * log(1 + exp(-x))
  = - d * x * z + d * x + (d - d * z + c * z ) * log(1 + exp(-x))
  ```

  To ensure stability and avoid overflow, the implementation uses the identity
      log(1 + exp(-x)) = max(0,-x) + log(1 + exp(-abs(x)))
  and the result is computed as

  ```
  = -d * x * z + d * x
    + (d - d * z + c * z ) * (max(0,-x) + log(1 + exp(-abs(x))))
  ```

  Note that the loss is NOT an upper bound on the 0-1 loss, unless it is divided
  by log(2).

  Args:
    labels: A `Tensor` of type `float32` or `float64`. `labels` can be a 2D
      tensor with shape [batch_size, num_labels] or a 3D tensor with shape
      [batch_size, num_labels, K].
    logits: A `Tensor` of the same type and shape as `labels`. If `logits` has
      shape [batch_size, num_labels, K], the loss is computed separately on
      each slice [:, :, k] of `logits`.
    positive_weights: A `Tensor` that holds positive weights and has the
      following semantics according to its shape:
        scalar - A global positive weight.
        1D tensor - must be of size K, a weight for each 'attempt'
        2D tensor - of size [num_labels, K'] where K' is either K or 1.
      The `positive_weights` will be expanded to the left to match the
      dimensions of logits and labels.
    negative_weights: A `Tensor` that holds positive weight and has the
      semantics identical to positive_weights.
    name: A name for the operation (optional).

  Returns:
    A `Tensor` of the same shape as `logits` with the componentwise
      weighted logistic losses.
  """
  with tf.name_scope(
      name,
      'weighted_logistic_loss',
      [logits, labels, positive_weights, negative_weights]) as name:
    labels, logits, positive_weights, negative_weights = prepare_loss_args(
        labels, logits, positive_weights, negative_weights)

    softplus_term = tf.add(tf.maximum(-logits, 0.0),
                           tf.log(1.0 + tf.exp(-tf.abs(logits))))
    weight_dependent_factor = (
        negative_weights + (positive_weights - negative_weights) * labels)
    return (negative_weights * (logits - labels * logits) +
            weight_dependent_factor * softplus_term)


def weighted_hinge_loss(labels,
                        logits,
                        positive_weights=1.0,
                        negative_weights=1.0,
                        name=None):
  """Computes weighted hinge loss given logits `logits`.

  The loss applies to multi-label classification tasks where labels are
  independent and not mutually exclusive. See also
  `weighted_sigmoid_cross_entropy_with_logits`.

  Args:
    labels: A `Tensor` of type `float32` or `float64`. Each entry must be
      either 0 or 1. `labels` can be a 2D tensor with shape
      [batch_size, num_labels] or a 3D tensor with shape
      [batch_size, num_labels, K].
    logits: A `Tensor` of the same type and shape as `labels`. If `logits` has
      shape [batch_size, num_labels, K], the loss is computed separately on
      each slice [:, :, k] of `logits`.
    positive_weights: A `Tensor` that holds positive weights and has the
      following semantics according to its shape:
        scalar - A global positive weight.
        1D tensor - must be of size K, a weight for each 'attempt'
        2D tensor - of size [num_labels, K'] where K' is either K or 1.
      The `positive_weights` will be expanded to the left to match the
      dimensions of logits and labels.
    negative_weights: A `Tensor` that holds positive weight and has the
      semantics identical to positive_weights.
    name: A name for the operation (optional).

  Returns:
    A `Tensor` of the same shape as `logits` with the componentwise
      weighted hinge loss.
  """
  with tf.name_scope(
      name, 'weighted_hinge_loss',
      [logits, labels, positive_weights, negative_weights]) as name:
    labels, logits, positive_weights, negative_weights = prepare_loss_args(
        labels, logits, positive_weights, negative_weights)

    positives_term = positive_weights * labels * tf.maximum(1.0 - logits, 0)
    negatives_term = (negative_weights * (1.0 - labels)
                      * tf.maximum(1.0 + logits, 0))
    return positives_term + negatives_term


def weighted_surrogate_loss(labels,
                            logits,
                            surrogate_type='xent',
                            positive_weights=1.0,
                            negative_weights=1.0,
                            name=None):
  """Returns either weighted cross-entropy or hinge loss.

  For example, a `surrogate_type` of 'xent' returns the weighted cross
  entropy loss.

  Args:
    labels: A `Tensor` of type `float32` or `float64`. Each entry must be
      between 0 and 1. `labels` can be a 2D tensor with shape
      [batch_size, num_labels] or a 3D tensor with shape
      [batch_size, num_labels, K].
    logits: A `Tensor` of the same type and shape as `labels`. If `logits` has
      shape [batch_size, num_labels, K], each slice [:, :, k] represents an
      'attempt' to predict `labels` and the loss is computed per slice.
    surrogate_type: A string that determines which loss to return, supports
      'xent' for cross-entropy and 'hinge' for hinge loss.
    positive_weights: A `Tensor` that holds positive weights and has the
      following semantics according to its shape:
        scalar - A global positive weight.
        1D tensor - must be of size K, a weight for each 'attempt'
        2D tensor - of size [num_labels, K'] where K' is either K or 1.
      The `positive_weights` will be expanded to the left to match the
      dimensions of logits and labels.
    negative_weights: A `Tensor` that holds positive weight and has the
      semantics identical to positive_weights.
    name: A name for the operation (optional).

  Returns:
    The weighted loss.

  Raises:
    ValueError: If value of `surrogate_type` is not supported.
  """
  with tf.name_scope(
      name, 'weighted_loss',
      [logits, labels, surrogate_type, positive_weights,
       negative_weights]) as name:
    if surrogate_type == 'xent':
      return weighted_sigmoid_cross_entropy_with_logits(
          logits=logits,
          labels=labels,
          positive_weights=positive_weights,
          negative_weights=negative_weights,
          name=name)
    elif surrogate_type == 'hinge':
      return weighted_hinge_loss(
          logits=logits,
          labels=labels,
          positive_weights=positive_weights,
          negative_weights=negative_weights,
          name=name)
    raise ValueError('surrogate_type %s not supported.' % surrogate_type)


def expand_outer(tensor, rank):
  """Expands the given `Tensor` outwards to a target rank.

  For example if rank = 3 and tensor.shape is [3, 4], this function will expand
  `tensor` such that the resulting shape will be [1, 3, 4].

  Args:
    tensor: The tensor to expand.
    rank: The target dimension.

  Returns:
    The expanded tensor.

  Raises:
    ValueError: If rank of `tensor` is unknown, or if `rank` is smaller than
      the rank of `tensor`.
  """
  if tensor.get_shape().ndims is None:
    raise ValueError('tensor dimension must be known.')
  if len(tensor.get_shape()) > rank:
    raise ValueError(
        '`rank` must be at least the current tensor dimension: (%s vs %s).' %
        (rank, len(tensor.get_shape())))
  while len(tensor.get_shape()) < rank:
    tensor = tf.expand_dims(tensor, 0)
  return tensor


def build_label_priors(labels,
                       weights=None,
                       positive_pseudocount=1.0,
                       negative_pseudocount=1.0,
                       variables_collections=None):
  """Creates an op to maintain and update label prior probabilities.

  For each label, the label priors are estimated as
      (P + sum_i w_i y_i) / (P + N + sum_i w_i),
  where y_i is the ith label, w_i is the ith weight, P is a pseudo-count of
  positive labels, and N is a pseudo-count of negative labels. The index i
  ranges over all labels observed during all evaluations of the returned op.

  Args:
    labels: A `Tensor` with shape [batch_size, num_labels]. Entries should be
      in [0, 1].
    weights: Coefficients representing the weight of each label. Must be either
      a Tensor of shape [batch_size, num_labels] or `None`, in which case each
      weight is treated as 1.0.
    positive_pseudocount: Number of positive labels used to initialize the label
      priors.
    negative_pseudocount: Number of negative labels used to initialize the label
      priors.
    variables_collections: Optional list of collections for created variables.

  Returns:
    label_priors: An op to update the weighted label_priors. Gives the
      current value of the label priors when evaluated.
  """
  dtype = labels.dtype.base_dtype
  num_labels = get_num_labels(labels)

  if weights is None:
    weights = tf.ones_like(labels)

  # We disable partitioning while constructing dual variables because they will
  # be updated with assign, which is not available for partitioned variables.
  partitioner = tf.get_variable_scope().partitioner
  try:
    tf.get_variable_scope().set_partitioner(None)
    # Create variable and update op for weighted label counts.
    weighted_label_counts = tf.contrib.framework.model_variable(
        name='weighted_label_counts',
        shape=[num_labels],
        dtype=dtype,
        initializer=tf.constant_initializer(
            [positive_pseudocount] * num_labels, dtype=dtype),
        collections=variables_collections,
        trainable=False)
    weighted_label_counts_update = weighted_label_counts.assign_add(
        tf.reduce_sum(weights * labels, 0))

    # Create variable and update op for the sum of the weights.
    weight_sum = tf.contrib.framework.model_variable(
        name='weight_sum',
        shape=[num_labels],
        dtype=dtype,
        initializer=tf.constant_initializer(
            [positive_pseudocount + negative_pseudocount] * num_labels,
            dtype=dtype),
        collections=variables_collections,
        trainable=False)
    weight_sum_update = weight_sum.assign_add(tf.reduce_sum(weights, 0))
  finally:
    tf.get_variable_scope().set_partitioner(partitioner)

  label_priors = tf.div(
      weighted_label_counts_update,
      weight_sum_update)
  return label_priors


def convert_and_cast(value, name, dtype):
  """Convert input to tensor and cast to dtype.

  Args:
    value: An object whose type has a registered Tensor conversion function,
      e.g. python numerical type or numpy array.
    name: Name to use for the new Tensor, if one is created.
    dtype: Optional element type for the returned tensor.

  Returns:
    A tensor.
  """
  return tf.cast(tf.convert_to_tensor(value, name=name), dtype=dtype)


def prepare_loss_args(labels, logits, positive_weights, negative_weights):
  """Prepare arguments for weighted loss functions.

  If needed, will convert given arguments to appropriate type and shape.

  Args:
    labels: Labels of the loss function.
    logits: Logits of the loss function.
    positive_weights: Weight on the positive examples.
    negative_weights: Weight on the negative examples.

  Returns:
    Converted labels, logits, positive_weights, negative_weights.
  """
  logits = tf.convert_to_tensor(logits, name='logits')
  labels = convert_and_cast(labels, 'labels', logits.dtype)
  if len(labels.get_shape()) == 2 and len(logits.get_shape()) == 3:
    labels = tf.expand_dims(labels, [2])

  positive_weights = convert_and_cast(positive_weights, 'positive_weights',
                                      logits.dtype)
  positive_weights = expand_outer(positive_weights, logits.get_shape().ndims)
  negative_weights = convert_and_cast(negative_weights, 'negative_weights',
                                      logits.dtype)
  negative_weights = expand_outer(negative_weights, logits.get_shape().ndims)
  return labels, logits, positive_weights, negative_weights


def get_num_labels(labels_or_logits):
  """Returns the number of labels inferred from labels_or_logits."""
  if labels_or_logits.get_shape().ndims <= 1:
    return 1
  return labels_or_logits.get_shape()[1].value
research/global_objectives/util_test.py (new file, 0 → 100644)
# Copyright 2018 The TensorFlow Global Objectives Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for global objectives util functions."""

# Dependency imports
from absl.testing import parameterized
import numpy as np
import tensorflow as tf

from global_objectives import util


def weighted_sigmoid_cross_entropy(targets, logits, weight):
  return (weight * targets * np.log(1.0 + np.exp(-logits)) +
          ((1.0 - targets) * np.log(1.0 + 1.0 / np.exp(-logits))))


def hinge_loss(labels, logits):
  # Mostly copied from tensorflow.python.ops.losses but with loss per datapoint.
  labels = tf.to_float(labels)
  all_ones = tf.ones_like(labels)
  labels = tf.subtract(2 * labels, all_ones)
  return tf.nn.relu(tf.subtract(all_ones, tf.multiply(labels, logits)))


class WeightedSigmoidCrossEntropyTest(parameterized.TestCase,
                                      tf.test.TestCase):

  def testTrivialCompatibilityWithSigmoidCrossEntropy(self):
    """Tests compatibility with unweighted function with weight 1.0."""
    x_shape = [300, 10]
    targets = np.random.random_sample(x_shape).astype(np.float32)
    logits = np.random.randn(*x_shape).astype(np.float32)
    weighted_loss = util.weighted_sigmoid_cross_entropy_with_logits(
        targets, logits)
    expected_loss = (
        tf.contrib.nn.deprecated_flipped_sigmoid_cross_entropy_with_logits(
            logits, targets))
    with self.test_session():
      self.assertAllClose(expected_loss.eval(),
                          weighted_loss.eval(),
                          atol=0.000001)

  def testNonTrivialCompatibilityWithSigmoidCrossEntropy(self):
    """Tests use of an arbitrary weight (4.12)."""
    x_shape = [300, 10]
    targets = np.random.random_sample(x_shape).astype(np.float32)
    logits = np.random.randn(*x_shape).astype(np.float32)
    weight = 4.12
    weighted_loss = util.weighted_sigmoid_cross_entropy_with_logits(
        targets, logits, weight, weight)
    expected_loss = (
        weight *
        tf.contrib.nn.deprecated_flipped_sigmoid_cross_entropy_with_logits(
            logits, targets))
    with self.test_session():
      self.assertAllClose(expected_loss.eval(),
                          weighted_loss.eval(),
                          atol=0.000001)

  def testDifferentSizeWeightedSigmoidCrossEntropy(self):
    """Tests correctness on 3D tensors.

    Tests that the function works as expected when logits is a 3D tensor and
    targets is a 2D tensor.
    """
    targets_shape = [30, 4]
    logits_shape = [targets_shape[0], targets_shape[1], 3]
    targets = np.random.random_sample(targets_shape).astype(np.float32)
    logits = np.random.randn(*logits_shape).astype(np.float32)

    weight_vector = [2.0, 3.0, 13.0]
    loss = util.weighted_sigmoid_cross_entropy_with_logits(targets,
                                                           logits,
                                                           weight_vector)

    with self.test_session():
      loss = loss.eval()
      for i in range(0, len(weight_vector)):
        expected = weighted_sigmoid_cross_entropy(targets, logits[:, :, i],
                                                  weight_vector[i])
        self.assertAllClose(loss[:, :, i], expected, atol=0.000001)

  @parameterized.parameters((300, 10, 0.3), (20, 4, 2.0), (30, 4, 3.9))
  def testWeightedSigmoidCrossEntropy(self, batch_size, num_labels, weight):
    """Tests that the tf and numpy functions agree on many instances."""
    x_shape = [batch_size, num_labels]
    targets = np.random.random_sample(x_shape).astype(np.float32)
    logits = np.random.randn(*x_shape).astype(np.float32)

    with self.test_session():
      loss = util.weighted_sigmoid_cross_entropy_with_logits(
          targets, logits, weight, 1.0, name='weighted-loss')
      expected = weighted_sigmoid_cross_entropy(targets, logits, weight)
      self.assertAllClose(expected, loss.eval(), atol=0.000001)

  def testGradients(self):
    """Tests that weighted loss gradients behave as expected."""
    dummy_tensor = tf.constant(1.0)

    positives_shape = [10, 1]
    positives_logits = dummy_tensor * tf.Variable(
        tf.random_normal(positives_shape) + 1.0)
    positives_targets = tf.ones(positives_shape)
    positives_weight = 4.6
    positives_loss = (
        tf.contrib.nn.deprecated_flipped_sigmoid_cross_entropy_with_logits(
            positives_logits, positives_targets) * positives_weight)

    negatives_shape = [190, 1]
    negatives_logits = dummy_tensor * tf.Variable(
        tf.random_normal(negatives_shape))
    negatives_targets = tf.zeros(negatives_shape)
    negatives_weight = 0.9
    negatives_loss = (
        tf.contrib.nn.deprecated_flipped_sigmoid_cross_entropy_with_logits(
            negatives_logits, negatives_targets) * negatives_weight)

    all_logits = tf.concat([positives_logits, negatives_logits], 0)
    all_targets = tf.concat([positives_targets, negatives_targets], 0)
    weighted_loss = tf.reduce_sum(
        util.weighted_sigmoid_cross_entropy_with_logits(
            all_targets, all_logits, positives_weight, negatives_weight))
    weighted_gradients = tf.gradients(weighted_loss, dummy_tensor)

    expected_loss = tf.add(
        tf.reduce_sum(positives_loss),
        tf.reduce_sum(negatives_loss))
    expected_gradients = tf.gradients(expected_loss, dummy_tensor)

    with tf.Session() as session:
      tf.global_variables_initializer().run()
      grad, expected_grad = session.run(
          [weighted_gradients, expected_gradients])
      self.assertAllClose(grad, expected_grad)

  def testDtypeFlexibility(self):
    """Tests the loss on inputs of varying data types."""
    shape = [20, 3]
    logits = np.random.randn(*shape)
    targets = tf.truncated_normal(shape)
    positive_weights = tf.constant(3, dtype=tf.int64)
    negative_weights = 1

    loss = util.weighted_sigmoid_cross_entropy_with_logits(
        targets, logits, positive_weights, negative_weights)

    with self.test_session():
      self.assertEqual(loss.eval().dtype, np.float)


class WeightedHingeLossTest(tf.test.TestCase):

  def testTrivialCompatibilityWithHinge(self):
    # Tests compatibility with unweighted hinge loss.
    x_shape = [55, 10]
    logits = tf.constant(np.random.randn(*x_shape).astype(np.float32))
    targets = tf.to_float(tf.constant(np.random.random_sample(x_shape) > 0.3))
    weighted_loss = util.weighted_hinge_loss(targets, logits)
    expected_loss = hinge_loss(targets, logits)
    with self.test_session():
      self.assertAllClose(expected_loss.eval(), weighted_loss.eval())

  def testLessTrivialCompatibilityWithHinge(self):
    # Tests compatibility with a constant weight for positives and negatives.
    x_shape = [56, 11]
    logits = tf.constant(np.random.randn(*x_shape).astype(np.float32))
    targets = tf.to_float(tf.constant(np.random.random_sample(x_shape) > 0.7))
    weight = 1.0 + 1.0 / 2 + 1.0 / 3 + 1.0 / 4 + 1.0 / 5 + 1.0 / 6 + 1.0 / 7
    weighted_loss = util.weighted_hinge_loss(targets, logits, weight, weight)
    expected_loss = hinge_loss(targets, logits) * weight
    with self.test_session():
      self.assertAllClose(expected_loss.eval(), weighted_loss.eval())

  def testNontrivialCompatibilityWithHinge(self):
    # Tests compatibility with different positive and negative weights.
    x_shape = [23, 8]
    logits_positives = tf.constant(np.random.randn(*x_shape).astype(np.float32))
    logits_negatives = tf.constant(np.random.randn(*x_shape).astype(np.float32))
    targets_positives = tf.ones(x_shape)
    targets_negatives = tf.zeros(x_shape)
    logits = tf.concat([logits_positives, logits_negatives], 0)
    targets = tf.concat([targets_positives, targets_negatives], 0)

    raw_loss = util.weighted_hinge_loss(targets,
                                        logits,
                                        positive_weights=3.4,
                                        negative_weights=1.2)
    loss = tf.reduce_sum(raw_loss, 0)
    positives_hinge = hinge_loss(targets_positives, logits_positives)
    negatives_hinge = hinge_loss(targets_negatives, logits_negatives)
    expected = tf.add(tf.reduce_sum(3.4 * positives_hinge, 0),
                      tf.reduce_sum(1.2 * negatives_hinge, 0))

    with self.test_session():
      self.assertAllClose(loss.eval(), expected.eval())

  def test3DLogitsAndTargets(self):
    # Tests correctness when logits is 3D and targets is 2D.
    targets_shape = [30, 4]
    logits_shape = [targets_shape[0], targets_shape[1], 3]
    targets = tf.to_float(
        tf.constant(np.random.random_sample(targets_shape) > 0.7))
    logits = tf.constant(np.random.randn(*logits_shape).astype(np.float32))
    weight_vector = [1.0, 1.0, 1.0]
    loss = util.weighted_hinge_loss(targets, logits, weight_vector)

    with self.test_session():
      loss_value = loss.eval()
      for i in range(len(weight_vector)):
        expected = hinge_loss(targets, logits[:, :, i]).eval()
        self.assertAllClose(loss_value[:, :, i], expected)


class BuildLabelPriorsTest(tf.test.TestCase):

  def testLabelPriorConsistency(self):
    # Checks that, with zero pseudocounts, the returned label priors reproduce
    # label frequencies in the batch.
    batch_shape = [4, 10]
    labels = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.678)))

    label_priors_update = util.build_label_priors(
        labels=labels, positive_pseudocount=0, negative_pseudocount=0)
    expected_priors = tf.reduce_mean(labels, 0)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(label_priors_update.eval(), expected_priors.eval())

  def testLabelPriorsUpdate(self):
    # Checks that the update of label priors behaves as expected.
    batch_shape = [1, 5]
    labels = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))
    label_priors_update = util.build_label_priors(labels)

    label_sum = np.ones(shape=batch_shape)
    weight_sum = 2.0 * np.ones(shape=batch_shape)

    with self.test_session() as session:
      tf.global_variables_initializer().run()

      for _ in range(3):
        label_sum += labels.eval()
        weight_sum += np.ones(shape=batch_shape)
        expected_posteriors = label_sum / weight_sum
        label_priors = label_priors_update.eval().reshape(batch_shape)
        self.assertAllClose(label_priors, expected_posteriors)

        # Re-initialize labels to get a new random sample.
        session.run(labels.initializer)

  def testLabelPriorsUpdateWithWeights(self):
    # Checks the update of label priors with per-example weights.
    batch_size = 6
    num_labels = 5
    batch_shape = [batch_size, num_labels]
    labels = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.6)))
    weights = tf.Variable(tf.random_uniform(batch_shape) * 6.2)

    update_op = util.build_label_priors(labels, weights=weights)

    expected_weighted_label_counts = 1.0 + tf.reduce_sum(weights * labels, 0)
    expected_weight_sum = 2.0 + tf.reduce_sum(weights, 0)
    expected_label_posteriors = tf.divide(expected_weighted_label_counts,
                                          expected_weight_sum)

    with self.test_session() as session:
      tf.global_variables_initializer().run()

      updated_priors, expected_posteriors = session.run(
          [update_op, expected_label_posteriors])
      self.assertAllClose(updated_priors, expected_posteriors)


class WeightedSurrogateLossTest(parameterized.TestCase, tf.test.TestCase):

  @parameterized.parameters(
      ('hinge', util.weighted_hinge_loss),
      ('xent', util.weighted_sigmoid_cross_entropy_with_logits))
  def testCompatibilityLoss(self, loss_name, loss_fn):
    x_shape = [28, 4]
    logits = tf.constant(np.random.randn(*x_shape).astype(np.float32))
    targets = tf.to_float(tf.constant(np.random.random_sample(x_shape) > 0.5))
    positive_weights = 0.66
    negative_weights = 11.1
    expected_loss = loss_fn(
        targets,
        logits,
        positive_weights=positive_weights,
        negative_weights=negative_weights)
    computed_loss = util.weighted_surrogate_loss(
        targets,
        logits,
        loss_name,
        positive_weights=positive_weights,
        negative_weights=negative_weights)
    with self.test_session():
      self.assertAllClose(expected_loss.eval(), computed_loss.eval())

  def testSurrogatgeError(self):
    x_shape = [7, 3]
    logits = tf.constant(np.random.randn(*x_shape).astype(np.float32))
    targets = tf.to_float(tf.constant(np.random.random_sample(x_shape) > 0.5))

    with self.assertRaises(ValueError):
      util.weighted_surrogate_loss(logits, targets, 'bug')


if __name__ == '__main__':
  tf.test.main()