ModelZoo / ResNet50_tensorflow / Commits

Commit 6c6f3f3a authored Jun 13, 2018 by Alan Mackey

Added new model, global objectives.

parent cac3a298

Showing 8 changed files with 3387 additions and 0 deletions (+3387 -0)
CODEOWNERS                                          +1    -0
research/global_objectives/README.md                +148  -0
research/global_objectives/loss_layers.py           +930  -0
research/global_objectives/loss_layers_example.py   +211  -0
research/global_objectives/loss_layers_test.py      +1379 -0
research/global_objectives/test_all.py              +37   -0
research/global_objectives/util.py                  +348  -0
research/global_objectives/util_test.py             +333  -0
CODEOWNERS

@@ -14,6 +14,7 @@
 /research/differential_privacy/ @ilyamironov @ananthr
 /research/domain_adaptation/ @bousmalis @dmrd
 /research/gan/ @joel-shor
+/research/global_objectives/ @mackeya-google
 /research/im2txt/ @cshallue
 /research/inception/ @shlens @vincentvanhoucke
 /research/learned_optimizer/ @olganw @nirum

research/global_objectives/README.md
0 → 100644
# Global Objectives

The Global Objectives library provides TensorFlow loss functions that optimize
directly for a variety of objectives including AUC, recall at precision, and
more. The global objectives losses can be used as drop-in replacements for
TensorFlow's standard multilabel loss functions:
`tf.nn.sigmoid_cross_entropy_with_logits` and `tf.losses.sigmoid_cross_entropy`.
Many machine learning classification models are optimized for classification
accuracy, even when the objective the user actually cares about is different,
such as precision at a fixed recall, precision-recall AUC, ROC AUC, or a similar
metric. These are referred to as "global objectives" because they depend on how
the model classifies the dataset as a whole and do not decouple across data
points as accuracy does.
Because these objectives are combinatorial, discontinuous, and essentially
intractable to optimize directly, the functions in this library approximate
their corresponding objectives. This approximation approach follows the same
pattern as optimizing for accuracy, where a surrogate objective such as
cross-entropy or the hinge loss is used as an upper bound on the error rate.
## Getting Started

For a full example of how to use the loss functions in practice, see
loss_layers_example.py.

Briefly, global objective losses can be used to replace
`tf.nn.sigmoid_cross_entropy_with_logits` by providing the relevant additional
arguments. For example,

```python
tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
```

could be replaced with

```python
global_objectives.recall_at_precision_loss(
    labels=labels, logits=logits, target_precision=0.95)[0]
```
Just as minimizing the cross-entropy loss will maximize accuracy, the loss
functions in loss_layers.py were written so that minimizing the loss will
maximize the corresponding objective.

The global objective losses have two return values -- the loss tensor and
additional quantities for debugging and customization -- which is why the first
value is used above. For more information, see
[Visualization & Debugging](#visualization--debugging).
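For orientation, here is a minimal sketch of how the two return values are
typically wired into a TF 1.x graph. The placeholder shapes and the plain
gradient-descent optimizer are illustrative choices, not part of the library:

```python
import tensorflow as tf
from global_objectives import loss_layers

# Placeholder model: one logit per example for a single binary label.
labels = tf.placeholder(tf.float32, shape=[None, 1])
logits = tf.placeholder(tf.float32, shape=[None, 1])

# First return value: the loss tensor; second: a dict of debugging tensors.
loss, other_outputs = loss_layers.recall_at_precision_loss(
    labels=labels, logits=logits, target_precision=0.95)

# The loss has the same shape as `logits`, so reduce it before optimizing.
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(tf.reduce_mean(loss))
```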
## Binary Label Format

Binary classification problems can be represented as a multi-class problem with
two classes, or as a multi-label problem with one label. (Recall that multiclass
problems have mutually exclusive classes, e.g. 'cat xor dog', and multilabel
problems have classes which are not mutually exclusive, e.g. an image can
contain a cat, a dog, both, or neither.) The softmax loss
(`tf.nn.softmax_cross_entropy_with_logits`) is used for multi-class problems,
while the sigmoid loss (`tf.nn.sigmoid_cross_entropy_with_logits`) is used for
multi-label problems.
A multiclass label format for binary classification might represent positives
with the label [1, 0] and negatives with the label [0, 1], while the multilabel
format for the same problem would use [1] and [0], respectively.

All global objectives loss functions assume that the multilabel format is used.
Accordingly, if your current loss function is softmax, the labels will have to
be reformatted for the loss to work properly.
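As a concrete illustration of that reformatting, the sketch below assumes the
existing pipeline produces two-column one-hot labels with the positive class in
column 0 (the column convention is an assumption; adapt it to your data):

```python
import tensorflow as tf

# Hypothetical multiclass labels for a binary problem:
# [1, 0] = positive, [0, 1] = negative.
multiclass_labels = tf.constant([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

# Multilabel format expected by the global objectives losses:
# a single column where 1.0 marks a positive example.
multilabel_labels = multiclass_labels[:, 0:1]  # shape [batch_size, 1]
```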
## Dual Variables

Global objectives losses (except for `roc_auc_loss`) use internal variables
called dual variables or Lagrange multipliers to enforce the desired constraint
(e.g. if optimizing for recall at precision, the constraint is on precision).

These dual variables are created and initialized internally by the loss
functions, and are updated during training by the same optimizer used for the
model's other variables. To initialize the dual variables to a particular value,
use the `lambdas_initializer` argument. The dual variables can be found under
the key `lambdas` in the `other_outputs` dictionary returned by the losses.
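For example, a small sketch of setting the initial value of the dual variables
and pulling them back out for inspection (the initial value 2.0 and the
placeholder tensors are arbitrary choices for illustration):

```python
import tensorflow as tf
from global_objectives import loss_layers

labels = tf.placeholder(tf.float32, shape=[None, 1])
logits = tf.placeholder(tf.float32, shape=[None, 1])

loss, other_outputs = loss_layers.recall_at_precision_loss(
    labels=labels,
    logits=logits,
    target_precision=0.9,
    # Start the Lagrange multipliers at 2.0 instead of the default 1.0.
    lambdas_initializer=tf.constant_initializer(2.0))

# The dual variables; the optimizer updates them along with the model weights.
lambdas = other_outputs['lambdas']
```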
## Loss Function Arguments

The following arguments are common to all loss functions in the library, and are
either required or very important.

* `labels`: Corresponds directly to the `labels` argument of
  `tf.nn.sigmoid_cross_entropy_with_logits`.
* `logits`: Corresponds directly to the `logits` argument of
  `tf.nn.sigmoid_cross_entropy_with_logits`.
* `dual_rate_factor`: A floating point value which controls the step size for
  the Lagrange multipliers. Setting this value less than 1.0 will cause the
  constraint to be enforced more gradually and will result in more stable
  training.

In addition, the objectives with a single constraint
(e.g. `recall_at_precision_loss`) have an argument (e.g. `target_precision`)
used to specify the value of the constraint. The optional `precision_range`
argument to `precision_recall_auc_loss` is used to specify the range of
precision values over which to optimize the AUC, and defaults to the
interval [0, 1].

Optional arguments:

* `weights`: A tensor which acts as coefficients for the loss. If a weight of x
  is provided for a datapoint and that datapoint is a true (false) positive
  (negative), it will be counted as x true (false) positives (negatives).
  Defaults to 1.0.
* `label_priors`: A tensor specifying the fraction of positive datapoints for
  each label. If not provided, it will be computed inside the loss function.
* `surrogate_type`: Either 'xent' or 'hinge', specifying which upper bound
  should be used for indicator functions.
* `lambdas_initializer`: An initializer for the dual variables (Lagrange
  multipliers). See also the Dual Variables section.
* `num_anchors` (precision_recall_auc_loss only): The number of grid points used
  when approximating the AUC as a Riemann sum.
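Putting several of these arguments together, a hedged sketch (the target recall,
weights, and other values are placeholders, not recommendations):

```python
import tensorflow as tf
from global_objectives import loss_layers

labels = tf.placeholder(tf.float32, shape=[None, 1])
logits = tf.placeholder(tf.float32, shape=[None, 1])
# Per-example coefficients, e.g. to count some examples more than once.
weights = tf.placeholder(tf.float32, shape=[None, 1])

loss, other_outputs = loss_layers.precision_at_recall_loss(
    labels=labels,
    logits=logits,
    target_recall=0.98,
    weights=weights,
    dual_rate_factor=0.5,     # enforce the recall constraint more gradually
    surrogate_type='hinge')   # hinge upper bound instead of cross-entropy
```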
## Hyperparameters

While the functional form of the global objectives losses allows them to be
easily substituted in place of `sigmoid_cross_entropy_with_logits`, model
hyperparameters such as learning rate, weight decay, etc. may need to be
fine-tuned to the new loss. Fortunately, the amount of hyperparameter re-tuning
is usually minor.

The most important hyperparameters to modify are the learning rate and
`dual_rate_factor` (see the section on Loss Function Arguments, above).
## Visualization & Debugging

The global objectives losses return two values. The first is a tensor
representing the numerical value of the loss, which can be passed to an
optimizer. The second is a dictionary of tensors created by the loss function
which are not necessary for optimization but useful in debugging. These vary
depending on the loss function, but usually include `lambdas` (the Lagrange
multipliers) as well as the lower bound on true positives and upper bound on
false positives.

When visualizing the loss during training, note that the global objectives
losses differ from standard losses in some important ways:

* The global losses may be negative. This is because the value returned by the
  loss includes terms involving the Lagrange multipliers, which may be negative.
* The global losses may not decrease over the course of training. To enforce the
  constraints in the objective, the loss changes over time and may increase.
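One way to watch these quantities during training is to wire the debugging
tensors into TensorBoard summaries; a rough sketch (the summary names and the
mean reduction are arbitrary choices, not part of the library):

```python
import tensorflow as tf
from global_objectives import loss_layers

labels = tf.placeholder(tf.float32, shape=[None, 1])
logits = tf.placeholder(tf.float32, shape=[None, 1])

loss, other_outputs = loss_layers.recall_at_precision_loss(
    labels=labels, logits=logits, target_precision=0.95)

tf.summary.scalar('global_objectives/loss', tf.reduce_mean(loss))
for name, tensor in other_outputs.items():
  # Most debugging outputs are per-label tensors; log their means.
  tf.summary.scalar('global_objectives/' + name, tf.reduce_mean(tensor))
merged_summaries = tf.summary.merge_all()
```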
## More Info

For more details, see the
[Global Objectives paper](https://arxiv.org/abs/1608.04802).

## Maintainers

* Mariano Schain
* Elad Eban
* [Alan Mackey](https://github.com/mackeya-google)

research/global_objectives/loss_layers.py
0 → 100644
# Copyright 2018 The TensorFlow Global Objectives Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Loss functions for learning global objectives.
These functions have two return values: a Tensor with the value of
the loss, and a dictionary of internal quantities for customizability.
"""
# Dependency imports
import numpy
import tensorflow as tf

from global_objectives import util
def precision_recall_auc_loss(
    labels,
    logits,
    precision_range=(0.0, 1.0),
    num_anchors=20,
    weights=1.0,
    dual_rate_factor=0.1,
    label_priors=None,
    surrogate_type='xent',
    lambdas_initializer=tf.constant_initializer(1.0),
    reuse=None,
    variables_collections=None,
    trainable=True,
    scope=None):
"""Computes precision-recall AUC loss.
The loss is based on a sum of losses for recall at a range of
precision values (anchor points). This sum is a Riemann sum that
approximates the area under the precision-recall curve.
The per-example `weights` argument changes not only the coefficients of
individual training examples, but how the examples are counted toward the
constraint. If `label_priors` is given, it MUST take `weights` into account.
That is,
label_priors = P / (P + N)
where
P = sum_i (wt_i on positives)
N = sum_i (wt_i on negatives).
Args:
labels: A `Tensor` of shape [batch_size] or [batch_size, num_labels].
logits: A `Tensor` with the same shape as `labels`.
precision_range: A length-two tuple, the range of precision values over
which to compute AUC. The entries must be nonnegative, increasing, and
less than or equal to 1.0.
num_anchors: The number of grid points used to approximate the Riemann sum.
weights: Coefficients for the loss. Must be a scalar or `Tensor` of shape
[batch_size] or [batch_size, num_labels].
dual_rate_factor: A floating point value which controls the step size for
the Lagrange multipliers.
label_priors: None, or a floating point `Tensor` of shape [num_labels]
containing the prior probability of each label (i.e. the fraction of the
training data consisting of positive examples). If None, the label
priors are computed from `labels` with a moving average. See the notes
above regarding the interaction with `weights` and do not set this unless
you have a good reason to do so.
surrogate_type: Either 'xent' or 'hinge', specifying which upper bound
should be used for indicator functions.
lambdas_initializer: An initializer for the Lagrange multipliers.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional list of collections for the variables.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
scope: Optional scope for `variable_scope`.
Returns:
loss: A `Tensor` of the same shape as `logits` with the component-wise
loss.
other_outputs: A dictionary of useful internal quantities for debugging. For
more details, see http://arxiv.org/pdf/1608.04802.pdf.
lambdas: A Tensor of shape [1, num_labels, num_anchors] consisting of the
Lagrange multipliers.
biases: A Tensor of shape [1, num_labels, num_anchors] consisting of the
learned bias term for each.
label_priors: A Tensor of shape [1, num_labels, 1] consisting of the prior
probability of each label learned by the loss, if not provided.
true_positives_lower_bound: Lower bound on the number of true positives
given `labels` and `logits`. This is the same lower bound which is used
in the loss expression to be optimized.
false_positives_upper_bound: Upper bound on the number of false positives
given `labels` and `logits`. This is the same upper bound which is used
in the loss expression to be optimized.
Raises:
ValueError: If `surrogate_type` is not `xent` or `hinge`.
"""
  with tf.variable_scope(scope,
                         'precision_recall_auc',
                         [labels, logits, label_priors],
                         reuse=reuse):
    labels, logits, weights, original_shape = _prepare_labels_logits_weights(
        labels, logits, weights)
    num_labels = util.get_num_labels(logits)

    # Convert other inputs to tensors and standardize dtypes.
    dual_rate_factor = util.convert_and_cast(
        dual_rate_factor, 'dual_rate_factor', logits.dtype)

    # Create Tensor of anchor points and distance between anchors.
    precision_values, delta = _range_to_anchors_and_delta(
        precision_range, num_anchors, logits.dtype)
    # Create lambdas with shape [1, num_labels, num_anchors].
    lambdas, lambdas_variable = _create_dual_variable(
        'lambdas',
        shape=[1, num_labels, num_anchors],
        dtype=logits.dtype,
        initializer=lambdas_initializer,
        collections=variables_collections,
        trainable=trainable,
        dual_rate_factor=dual_rate_factor)
    # Create biases with shape [1, num_labels, num_anchors].
    biases = tf.contrib.framework.model_variable(
        name='biases',
        shape=[1, num_labels, num_anchors],
        dtype=logits.dtype,
        initializer=tf.zeros_initializer(),
        collections=variables_collections,
        trainable=trainable)
    # Maybe create label_priors.
    label_priors = maybe_create_label_priors(
        label_priors, labels, weights, variables_collections)
    label_priors = tf.reshape(label_priors, [1, num_labels, 1])

    # Expand logits, labels, and weights to shape [batch_size, num_labels, 1].
    logits = tf.expand_dims(logits, 2)
    labels = tf.expand_dims(labels, 2)
    weights = tf.expand_dims(weights, 2)

    # Calculate weighted loss and other outputs. The log(2.0) term corrects for
    # logloss not being an upper bound on the indicator function.
    loss = weights * util.weighted_surrogate_loss(
        labels,
        logits + biases,
        surrogate_type=surrogate_type,
        positive_weights=1.0 + lambdas * (1.0 - precision_values),
        negative_weights=lambdas * precision_values)
    maybe_log2 = tf.log(2.0) if surrogate_type == 'xent' else 1.0
    maybe_log2 = tf.cast(maybe_log2, logits.dtype.base_dtype)
    lambda_term = lambdas * (1.0 - precision_values) * label_priors * maybe_log2
    per_anchor_loss = loss - lambda_term
    per_label_loss = delta * tf.reduce_sum(per_anchor_loss, 2)
    # Normalize the AUC such that a perfect score function will have AUC 1.0.
    # Because precision_range is discretized into num_anchors + 1 intervals
    # but only num_anchors terms are included in the Riemann sum, the
    # effective length of the integration interval is `delta` less than the
    # length of precision_range.
    scaled_loss = tf.div(per_label_loss,
                         precision_range[1] - precision_range[0] - delta,
                         name='AUC_Normalize')
    scaled_loss = tf.reshape(scaled_loss, original_shape)

    other_outputs = {
        'lambdas': lambdas_variable,
        'biases': biases,
        'label_priors': label_priors,
        'true_positives_lower_bound': true_positives_lower_bound(
            labels, logits, weights, surrogate_type),
        'false_positives_upper_bound': false_positives_upper_bound(
            labels, logits, weights, surrogate_type)}

    return scaled_loss, other_outputs


def roc_auc_loss(
    labels, logits, weights=1.0, surrogate_type='xent', scope=None):
"""Computes ROC AUC loss.
The area under the ROC curve is the probability p that a randomly chosen
positive example will be scored higher than a randomly chosen negative
example. This loss approximates 1-p by using a surrogate (either hinge loss or
cross entropy) for the indicator function. Specifically, the loss is:
sum_i sum_j w_i*w_j*loss(logit_i - logit_j)
where i ranges over the positive datapoints, j ranges over the negative
datapoints, logit_k denotes the logit (or score) of the k-th datapoint, and
loss is either the hinge or log loss given a positive label.
Args:
labels: A `Tensor` of shape [batch_size] or [batch_size, num_labels].
logits: A `Tensor` with the same shape and dtype as `labels`.
weights: Coefficients for the loss. Must be a scalar or `Tensor` of shape
[batch_size] or [batch_size, num_labels].
surrogate_type: Either 'xent' or 'hinge', specifying which upper bound
should be used for the indicator function.
scope: Optional scope for `name_scope`.
Returns:
loss: A `Tensor` of the same shape as `logits` with the component-wise loss.
other_outputs: An empty dictionary, for consistency.
Raises:
ValueError: If `surrogate_type` is not `xent` or `hinge`.
"""
  with tf.name_scope(scope, 'roc_auc', [labels, logits, weights]):
    # Convert inputs to tensors and standardize dtypes.
    labels, logits, weights, original_shape = _prepare_labels_logits_weights(
        labels, logits, weights)

    # Create tensors of pairwise differences for logits and labels, and
    # pairwise products of weights. These have shape
    # [batch_size, batch_size, num_labels].
    logits_difference = tf.expand_dims(logits, 0) - tf.expand_dims(logits, 1)
    labels_difference = tf.expand_dims(labels, 0) - tf.expand_dims(labels, 1)
    weights_product = tf.expand_dims(weights, 0) * tf.expand_dims(weights, 1)

    signed_logits_difference = labels_difference * logits_difference
    raw_loss = util.weighted_surrogate_loss(
        labels=tf.ones_like(signed_logits_difference),
        logits=signed_logits_difference,
        surrogate_type=surrogate_type)
    weighted_loss = weights_product * raw_loss

    # Zero out entries of the loss where labels_difference zero (so loss is only
    # computed on pairs with different labels).
    loss = tf.reduce_mean(tf.abs(labels_difference) * weighted_loss, 0) * 0.5
    loss = tf.reshape(loss, original_shape)
    return loss, {}


def recall_at_precision_loss(
    labels,
    logits,
    target_precision,
    weights=1.0,
    dual_rate_factor=0.1,
    label_priors=None,
    surrogate_type='xent',
    lambdas_initializer=tf.constant_initializer(1.0),
    reuse=None,
    variables_collections=None,
    trainable=True,
    scope=None):
"""Computes recall at precision loss.
The loss is based on a surrogate of the form
wt * w(+) * loss(+) + wt * w(-) * loss(-) - c * pi,
where:
- w(+) = 1 + lambdas * (1 - target_precision)
- loss(+) is the cross-entropy loss on the positive examples
- w(-) = lambdas * target_precision
- loss(-) is the cross-entropy loss on the negative examples
- wt is a scalar or tensor of per-example weights
- c = lambdas * (1 - target_precision)
- pi is the label_priors.
The per-example weights change not only the coefficients of individual
training examples, but how the examples are counted toward the constraint.
If `label_priors` is given, it MUST take `weights` into account. That is,
label_priors = P / (P + N)
where
P = sum_i (wt_i on positives)
N = sum_i (wt_i on negatives).
Args:
labels: A `Tensor` of shape [batch_size] or [batch_size, num_labels].
logits: A `Tensor` with the same shape as `labels`.
target_precision: The precision at which to compute the loss. Can be a
floating point value between 0 and 1 for a single precision value, or a
`Tensor` of shape [num_labels], holding each label's target precision
value.
weights: Coefficients for the loss. Must be a scalar or `Tensor` of shape
[batch_size] or [batch_size, num_labels].
dual_rate_factor: A floating point value which controls the step size for
the Lagrange multipliers.
label_priors: None, or a floating point `Tensor` of shape [num_labels]
containing the prior probability of each label (i.e. the fraction of the
training data consisting of positive examples). If None, the label
priors are computed from `labels` with a moving average. See the notes
above regarding the interaction with `weights` and do not set this unless
you have a good reason to do so.
surrogate_type: Either 'xent' or 'hinge', specifying which upper bound
should be used for indicator functions.
lambdas_initializer: An initializer for the Lagrange multipliers.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional list of collections for the variables.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
scope: Optional scope for `variable_scope`.
Returns:
loss: A `Tensor` of the same shape as `logits` with the component-wise
loss.
other_outputs: A dictionary of useful internal quantities for debugging. For
more details, see http://arxiv.org/pdf/1608.04802.pdf.
lambdas: A Tensor of shape [num_labels] consisting of the Lagrange
multipliers.
label_priors: A Tensor of shape [num_labels] consisting of the prior
probability of each label learned by the loss, if not provided.
true_positives_lower_bound: Lower bound on the number of true positives
given `labels` and `logits`. This is the same lower bound which is used
in the loss expression to be optimized.
false_positives_upper_bound: Upper bound on the number of false positives
given `labels` and `logits`. This is the same upper bound which is used
in the loss expression to be optimized.
Raises:
ValueError: If `logits` and `labels` do not have the same shape.
"""
  with tf.variable_scope(scope,
                         'recall_at_precision',
                         [logits, labels, label_priors],
                         reuse=reuse):
    labels, logits, weights, original_shape = _prepare_labels_logits_weights(
        labels, logits, weights)
    num_labels = util.get_num_labels(logits)

    # Convert other inputs to tensors and standardize dtypes.
    target_precision = util.convert_and_cast(
        target_precision, 'target_precision', logits.dtype)
    dual_rate_factor = util.convert_and_cast(
        dual_rate_factor, 'dual_rate_factor', logits.dtype)

    # Create lambdas.
    lambdas, lambdas_variable = _create_dual_variable(
        'lambdas',
        shape=[num_labels],
        dtype=logits.dtype,
        initializer=lambdas_initializer,
        collections=variables_collections,
        trainable=trainable,
        dual_rate_factor=dual_rate_factor)
    # Maybe create label_priors.
    label_priors = maybe_create_label_priors(
        label_priors, labels, weights, variables_collections)

    # Calculate weighted loss and other outputs. The log(2.0) term corrects for
    # logloss not being an upper bound on the indicator function.
    weighted_loss = weights * util.weighted_surrogate_loss(
        labels,
        logits,
        surrogate_type=surrogate_type,
        positive_weights=1.0 + lambdas * (1.0 - target_precision),
        negative_weights=lambdas * target_precision)
    maybe_log2 = tf.log(2.0) if surrogate_type == 'xent' else 1.0
    maybe_log2 = tf.cast(maybe_log2, logits.dtype.base_dtype)
    lambda_term = lambdas * (1.0 - target_precision) * label_priors * maybe_log2
    loss = tf.reshape(weighted_loss - lambda_term, original_shape)
    other_outputs = {
        'lambdas': lambdas_variable,
        'label_priors': label_priors,
        'true_positives_lower_bound': true_positives_lower_bound(
            labels, logits, weights, surrogate_type),
        'false_positives_upper_bound': false_positives_upper_bound(
            labels, logits, weights, surrogate_type)}

    return loss, other_outputs


def precision_at_recall_loss(
    labels,
    logits,
    target_recall,
    weights=1.0,
    dual_rate_factor=0.1,
    label_priors=None,
    surrogate_type='xent',
    lambdas_initializer=tf.constant_initializer(1.0),
    reuse=None,
    variables_collections=None,
    trainable=True,
    scope=None):
"""Computes precision at recall loss.
The loss is based on a surrogate of the form
wt * loss(-) + lambdas * (pi * (b - 1) + wt * loss(+))
where:
- loss(-) is the cross-entropy loss on the negative examples
- loss(+) is the cross-entropy loss on the positive examples
- wt is a scalar or tensor of per-example weights
- b is the target recall
- pi is the label_priors.
The per-example weights change not only the coefficients of individual
training examples, but how the examples are counted toward the constraint.
If `label_priors` is given, it MUST take `weights` into account. That is,
label_priors = P / (P + N)
where
P = sum_i (wt_i on positives)
N = sum_i (wt_i on negatives).
Args:
labels: A `Tensor` of shape [batch_size] or [batch_size, num_labels].
logits: A `Tensor` with the same shape as `labels`.
target_recall: The recall at which to compute the loss. Can be a floating
point value between 0 and 1 for a single target recall value, or a
`Tensor` of shape [num_labels] holding each label's target recall value.
weights: Coefficients for the loss. Must be a scalar or `Tensor` of shape
[batch_size] or [batch_size, num_labels].
dual_rate_factor: A floating point value which controls the step size for
the Lagrange multipliers.
label_priors: None, or a floating point `Tensor` of shape [num_labels]
containing the prior probability of each label (i.e. the fraction of the
training data consisting of positive examples). If None, the label
priors are computed from `labels` with a moving average. See the notes
above regarding the interaction with `weights` and do not set this unless
you have a good reason to do so.
surrogate_type: Either 'xent' or 'hinge', specifying which upper bound
should be used for indicator functions.
lambdas_initializer: An initializer for the Lagrange multipliers.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional list of collections for the variables.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
scope: Optional scope for `variable_scope`.
Returns:
loss: A `Tensor` of the same shape as `logits` with the component-wise
loss.
other_outputs: A dictionary of useful internal quantities for debugging. For
more details, see http://arxiv.org/pdf/1608.04802.pdf.
lambdas: A Tensor of shape [num_labels] consisting of the Lagrange
multipliers.
label_priors: A Tensor of shape [num_labels] consisting of the prior
probability of each label learned by the loss, if not provided.
true_positives_lower_bound: Lower bound on the number of true positives
given `labels` and `logits`. This is the same lower bound which is used
in the loss expression to be optimized.
false_positives_upper_bound: Upper bound on the number of false positives
given `labels` and `logits`. This is the same upper bound which is used
in the loss expression to be optimized.
"""
  with tf.variable_scope(scope,
                         'precision_at_recall',
                         [logits, labels, label_priors],
                         reuse=reuse):
    labels, logits, weights, original_shape = _prepare_labels_logits_weights(
        labels, logits, weights)
    num_labels = util.get_num_labels(logits)

    # Convert other inputs to tensors and standardize dtypes.
    target_recall = util.convert_and_cast(
        target_recall, 'target_recall', logits.dtype)
    dual_rate_factor = util.convert_and_cast(
        dual_rate_factor, 'dual_rate_factor', logits.dtype)

    # Create lambdas.
    lambdas, lambdas_variable = _create_dual_variable(
        'lambdas',
        shape=[num_labels],
        dtype=logits.dtype,
        initializer=lambdas_initializer,
        collections=variables_collections,
        trainable=trainable,
        dual_rate_factor=dual_rate_factor)
    # Maybe create label_priors.
    label_priors = maybe_create_label_priors(
        label_priors, labels, weights, variables_collections)

    # Calculate weighted loss and other outputs. The log(2.0) term corrects for
    # logloss not being an upper bound on the indicator function.
    weighted_loss = weights * util.weighted_surrogate_loss(
        labels,
        logits,
        surrogate_type,
        positive_weights=lambdas,
        negative_weights=1.0)
    maybe_log2 = tf.log(2.0) if surrogate_type == 'xent' else 1.0
    maybe_log2 = tf.cast(maybe_log2, logits.dtype.base_dtype)
    lambda_term = lambdas * label_priors * (target_recall - 1.0) * maybe_log2
    loss = tf.reshape(weighted_loss + lambda_term, original_shape)
    other_outputs = {
        'lambdas': lambdas_variable,
        'label_priors': label_priors,
        'true_positives_lower_bound': true_positives_lower_bound(
            labels, logits, weights, surrogate_type),
        'false_positives_upper_bound': false_positives_upper_bound(
            labels, logits, weights, surrogate_type)}

    return loss, other_outputs


def false_positive_rate_at_true_positive_rate_loss(
    labels,
    logits,
    target_rate,
    weights=1.0,
    dual_rate_factor=0.1,
    label_priors=None,
    surrogate_type='xent',
    lambdas_initializer=tf.constant_initializer(1.0),
    reuse=None,
    variables_collections=None,
    trainable=True,
    scope=None):
"""Computes false positive rate at true positive rate loss.
Note that `true positive rate` is a synonym for Recall, and that minimizing
the false positive rate and maximizing precision are equivalent for a fixed
Recall. Therefore, this function is identical to precision_at_recall_loss.
The per-example weights change not only the coefficients of individual
training examples, but how the examples are counted toward the constraint.
If `label_priors` is given, it MUST take `weights` into account. That is,
label_priors = P / (P + N)
where
P = sum_i (wt_i on positives)
N = sum_i (wt_i on negatives).
Args:
labels: A `Tensor` of shape [batch_size] or [batch_size, num_labels].
logits: A `Tensor` with the same shape as `labels`.
target_rate: The true positive rate at which to compute the loss. Can be a
floating point value between 0 and 1 for a single true positive rate, or
a `Tensor` of shape [num_labels] holding each label's true positive rate.
weights: Coefficients for the loss. Must be a scalar or `Tensor` of shape
[batch_size] or [batch_size, num_labels].
dual_rate_factor: A floating point value which controls the step size for
the Lagrange multipliers.
label_priors: None, or a floating point `Tensor` of shape [num_labels]
containing the prior probability of each label (i.e. the fraction of the
training data consisting of positive examples). If None, the label
priors are computed from `labels` with a moving average. See the notes
above regarding the interaction with `weights` and do not set this unless
you have a good reason to do so.
surrogate_type: Either 'xent' or 'hinge', specifying which upper bound
should be used for indicator functions. 'xent' will use the cross-entropy
loss surrogate, and 'hinge' will use the hinge loss.
lambdas_initializer: An initializer op for the Lagrange multipliers.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional list of collections for the variables.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
scope: Optional scope for `variable_scope`.
Returns:
loss: A `Tensor` of the same shape as `logits` with the component-wise
loss.
other_outputs: A dictionary of useful internal quantities for debugging. For
more details, see http://arxiv.org/pdf/1608.04802.pdf.
lambdas: A Tensor of shape [num_labels] consisting of the Lagrange
multipliers.
label_priors: A Tensor of shape [num_labels] consisting of the prior
probability of each label learned by the loss, if not provided.
true_positives_lower_bound: Lower bound on the number of true positives
given `labels` and `logits`. This is the same lower bound which is used
in the loss expression to be optimized.
false_positives_upper_bound: Upper bound on the number of false positives
given `labels` and `logits`. This is the same upper bound which is used
in the loss expression to be optimized.
Raises:
ValueError: If `surrogate_type` is not `xent` or `hinge`.
"""
  return precision_at_recall_loss(labels=labels,
                                  logits=logits,
                                  target_recall=target_rate,
                                  weights=weights,
                                  dual_rate_factor=dual_rate_factor,
                                  label_priors=label_priors,
                                  surrogate_type=surrogate_type,
                                  lambdas_initializer=lambdas_initializer,
                                  reuse=reuse,
                                  variables_collections=variables_collections,
                                  trainable=trainable,
                                  scope=scope)


def true_positive_rate_at_false_positive_rate_loss(
    labels,
    logits,
    target_rate,
    weights=1.0,
    dual_rate_factor=0.1,
    label_priors=None,
    surrogate_type='xent',
    lambdas_initializer=tf.constant_initializer(1.0),
    reuse=None,
    variables_collections=None,
    trainable=True,
    scope=None):
"""Computes true positive rate at false positive rate loss.
The loss is based on a surrogate of the form
wt * loss(+) + lambdas * (wt * loss(-) - r * (1 - pi))
where:
- loss(-) is the loss on the negative examples
- loss(+) is the loss on the positive examples
- wt is a scalar or tensor of per-example weights
- r is the target rate
- pi is the label_priors.
The per-example weights change not only the coefficients of individual
training examples, but how the examples are counted toward the constraint.
If `label_priors` is given, it MUST take `weights` into account. That is,
label_priors = P / (P + N)
where
P = sum_i (wt_i on positives)
N = sum_i (wt_i on negatives).
Args:
labels: A `Tensor` of shape [batch_size] or [batch_size, num_labels].
logits: A `Tensor` with the same shape as `labels`.
target_rate: The false positive rate at which to compute the loss. Can be a
floating point value between 0 and 1 for a single false positive rate, or
a `Tensor` of shape [num_labels] holding each label's false positive rate.
weights: Coefficients for the loss. Must be a scalar or `Tensor` of shape
[batch_size] or [batch_size, num_labels].
dual_rate_factor: A floating point value which controls the step size for
the Lagrange multipliers.
label_priors: None, or a floating point `Tensor` of shape [num_labels]
containing the prior probability of each label (i.e. the fraction of the
training data consisting of positive examples). If None, the label
priors are computed from `labels` with a moving average. See the notes
above regarding the interaction with `weights` and do not set this unless
you have a good reason to do so.
surrogate_type: Either 'xent' or 'hinge', specifying which upper bound
should be used for indicator functions. 'xent' will use the cross-entropy
loss surrogate, and 'hinge' will use the hinge loss.
lambdas_initializer: An initializer op for the Lagrange multipliers.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional list of collections for the variables.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
scope: Optional scope for `variable_scope`.
Returns:
loss: A `Tensor` of the same shape as `logits` with the component-wise
loss.
other_outputs: A dictionary of useful internal quantities for debugging. For
more details, see http://arxiv.org/pdf/1608.04802.pdf.
lambdas: A Tensor of shape [num_labels] consisting of the Lagrange
multipliers.
label_priors: A Tensor of shape [num_labels] consisting of the prior
probability of each label learned by the loss, if not provided.
true_positives_lower_bound: Lower bound on the number of true positives
given `labels` and `logits`. This is the same lower bound which is used
in the loss expression to be optimized.
false_positives_upper_bound: Upper bound on the number of false positives
given `labels` and `logits`. This is the same upper bound which is used
in the loss expression to be optimized.
Raises:
ValueError: If `surrogate_type` is not `xent` or `hinge`.
"""
  with tf.variable_scope(scope,
                         'tpr_at_fpr',
                         [labels, logits, label_priors],
                         reuse=reuse):
    labels, logits, weights, original_shape = _prepare_labels_logits_weights(
        labels, logits, weights)
    num_labels = util.get_num_labels(logits)

    # Convert other inputs to tensors and standardize dtypes.
    target_rate = util.convert_and_cast(
        target_rate, 'target_rate', logits.dtype)
    dual_rate_factor = util.convert_and_cast(
        dual_rate_factor, 'dual_rate_factor', logits.dtype)

    # Create lambdas.
    lambdas, lambdas_variable = _create_dual_variable(
        'lambdas',
        shape=[num_labels],
        dtype=logits.dtype,
        initializer=lambdas_initializer,
        collections=variables_collections,
        trainable=trainable,
        dual_rate_factor=dual_rate_factor)
    # Maybe create label_priors.
    label_priors = maybe_create_label_priors(
        label_priors, labels, weights, variables_collections)

    # Loss op and other outputs. The log(2.0) term corrects for
    # logloss not being an upper bound on the indicator function.
    weighted_loss = weights * util.weighted_surrogate_loss(
        labels,
        logits,
        surrogate_type=surrogate_type,
        positive_weights=1.0,
        negative_weights=lambdas)
    maybe_log2 = tf.log(2.0) if surrogate_type == 'xent' else 1.0
    maybe_log2 = tf.cast(maybe_log2, logits.dtype.base_dtype)
    lambda_term = lambdas * target_rate * (1.0 - label_priors) * maybe_log2
    loss = tf.reshape(weighted_loss - lambda_term, original_shape)
    other_outputs = {
        'lambdas': lambdas_variable,
        'label_priors': label_priors,
        'true_positives_lower_bound': true_positives_lower_bound(
            labels, logits, weights, surrogate_type),
        'false_positives_upper_bound': false_positives_upper_bound(
            labels, logits, weights, surrogate_type)}

    return loss, other_outputs


def _prepare_labels_logits_weights(labels, logits, weights):
"""Validates labels, logits, and weights.
Converts inputs to tensors, checks shape compatibility, and casts dtype if
necessary.
Args:
labels: A `Tensor` of shape [batch_size] or [batch_size, num_labels].
logits: A `Tensor` with the same shape as `labels`.
weights: Either `None` or a `Tensor` with shape broadcastable to `logits`.
Returns:
labels: Same as `labels` arg after possible conversion to tensor, cast, and
reshape.
logits: Same as `logits` arg after possible conversion to tensor and
reshape.
weights: Same as `weights` arg after possible conversion, cast, and reshape.
original_shape: Shape of `labels` and `logits` before reshape.
Raises:
ValueError: If `labels` and `logits` do not have the same shape.
"""
  # Convert `labels` and `logits` to Tensors and standardize dtypes.
  logits = tf.convert_to_tensor(logits, name='logits')
  labels = util.convert_and_cast(labels, 'labels', logits.dtype.base_dtype)
  weights = util.convert_and_cast(weights, 'weights', logits.dtype.base_dtype)

  try:
    labels.get_shape().merge_with(logits.get_shape())
  except ValueError:
    raise ValueError('logits and labels must have the same shape (%s vs %s)'
                     % (logits.get_shape(), labels.get_shape()))

  original_shape = labels.get_shape().as_list()
  if labels.get_shape().ndims > 0:
    original_shape[0] = -1
  if labels.get_shape().ndims <= 1:
    labels = tf.reshape(labels, [-1, 1])
    logits = tf.reshape(logits, [-1, 1])

  if weights.get_shape().ndims == 1:
    # Weights has shape [batch_size]. Reshape to [batch_size, 1].
    weights = tf.reshape(weights, [-1, 1])
  if weights.get_shape().ndims == 0:
    # Weights is a scalar. Change shape of weights to match logits.
    weights *= tf.ones_like(logits)

  return labels, logits, weights, original_shape


def _range_to_anchors_and_delta(precision_range, num_anchors, dtype):
"""Calculates anchor points from precision range.
Args:
precision_range: As required in precision_recall_auc_loss.
num_anchors: int, number of equally spaced anchor points.
dtype: Data type of returned tensors.
Returns:
precision_values: A `Tensor` of data type dtype with equally spaced values
in the interval precision_range.
delta: The spacing between the values in precision_values.
Raises:
ValueError: If precision_range is invalid.
"""
  # Validate precision_range.
  if not 0 <= precision_range[0] <= precision_range[-1] <= 1:
    raise ValueError('precision values must obey 0 <= %f <= %f <= 1' %
                     (precision_range[0], precision_range[-1]))
  if not 0 < len(precision_range) < 3:
    raise ValueError('length of precision_range (%d) must be 1 or 2' %
                     len(precision_range))

  # Sets precision_values uniformly between min_precision and max_precision.
  values = numpy.linspace(start=precision_range[0],
                          stop=precision_range[1],
                          num=num_anchors+2)[1:-1]
  precision_values = util.convert_and_cast(
      values, 'precision_values', dtype)
  delta = util.convert_and_cast(
      values[0] - precision_range[0], 'delta', dtype)
  # Makes precision_values [1, 1, num_anchors].
  precision_values = util.expand_outer(precision_values, 3)
  return precision_values, delta


def _create_dual_variable(name, shape, dtype, initializer, collections,
                          trainable, dual_rate_factor):
"""Creates a new dual variable.
Dual variables are required to be nonnegative. If trainable, their gradient
is reversed so that they are maximized (rather than minimized) by the
optimizer.
Args:
name: A string, the name for the new variable.
shape: Shape of the new variable.
dtype: Data type for the new variable.
initializer: Initializer for the new variable.
collections: List of graph collections keys. The new variable is added to
these collections. Defaults to `[GraphKeys.GLOBAL_VARIABLES]`.
trainable: If `True`, the default, also adds the variable to the graph
collection `GraphKeys.TRAINABLE_VARIABLES`. This collection is used as
the default list of variables to use by the `Optimizer` classes.
dual_rate_factor: A floating point value or `Tensor`. The learning rate for
the dual variable is scaled by this factor.
Returns:
dual_value: An op that computes the absolute value of the dual variable
and reverses its gradient.
dual_variable: The underlying variable itself.
"""
  # We disable partitioning while constructing dual variables because they will
  # be updated with assign, which is not available for partitioned variables.
  partitioner = tf.get_variable_scope().partitioner
  try:
    tf.get_variable_scope().set_partitioner(None)
    dual_variable = tf.contrib.framework.model_variable(
        name=name,
        shape=shape,
        dtype=dtype,
        initializer=initializer,
        collections=collections,
        trainable=trainable)
  finally:
    tf.get_variable_scope().set_partitioner(partitioner)
  # Using the absolute value enforces nonnegativity.
  dual_value = tf.abs(dual_variable)

  if trainable:
    # To reverse the gradient on the dual variable, multiply the gradient by
    # -dual_rate_factor
    dual_value = (tf.stop_gradient((1.0 + dual_rate_factor) * dual_value)
                  - dual_rate_factor * dual_value)
  return dual_value, dual_variable


def maybe_create_label_priors(label_priors,
                              labels,
                              weights,
                              variables_collections):
"""Creates moving average ops to track label priors, if necessary.
Args:
label_priors: As required in e.g. precision_recall_auc_loss.
labels: A `Tensor` of shape [batch_size] or [batch_size, num_labels].
weights: As required in e.g. precision_recall_auc_loss.
variables_collections: Optional list of collections for the variables, if
any must be created.
Returns:
label_priors: A Tensor of shape [num_labels] consisting of the
weighted label priors, after updating with moving average ops if created.
"""
  if label_priors is not None:
    label_priors = util.convert_and_cast(
        label_priors, name='label_priors', dtype=labels.dtype.base_dtype)
    return tf.squeeze(label_priors)

  label_priors = util.build_label_priors(
      labels,
      weights,
      variables_collections=variables_collections)
  return label_priors


def true_positives_lower_bound(labels, logits, weights, surrogate_type):
"""Calculate a lower bound on the number of true positives.
This lower bound on the number of true positives given `logits` and `labels`
is the same one used in the global objectives loss functions.
Args:
labels: A `Tensor` of shape [batch_size] or [batch_size, num_labels].
logits: A `Tensor` of shape [batch_size, num_labels] or
[batch_size, num_labels, num_anchors]. If the third dimension is present,
the lower bound is computed on each slice [:, :, k] independently.
weights: Per-example loss coefficients, with shape broadcast-compatible with
that of `labels`.
surrogate_type: Either 'xent' or 'hinge', specifying which upper bound
should be used for indicator functions.
Returns:
A `Tensor` of shape [num_labels] or [num_labels, num_anchors].
"""
  maybe_log2 = tf.log(2.0) if surrogate_type == 'xent' else 1.0
  maybe_log2 = tf.cast(maybe_log2, logits.dtype.base_dtype)
  if logits.get_shape().ndims == 3 and labels.get_shape().ndims < 3:
    labels = tf.expand_dims(labels, 2)
  loss_on_positives = util.weighted_surrogate_loss(
      labels, logits, surrogate_type, negative_weights=0.0) / maybe_log2
  return tf.reduce_sum(weights * (labels - loss_on_positives), 0)


def false_positives_upper_bound(labels, logits, weights, surrogate_type):
"""Calculate an upper bound on the number of false positives.
This upper bound on the number of false positives given `logits` and `labels`
is the same one used in the global objectives loss functions.
Args:
labels: A `Tensor` of shape [batch_size, num_labels]
logits: A `Tensor` of shape [batch_size, num_labels] or
[batch_size, num_labels, num_anchors]. If the third dimension is present,
the lower bound is computed on each slice [:, :, k] independently.
weights: Per-example loss coefficients, with shape broadcast-compatible with
that of `labels`.
surrogate_type: Either 'xent' or 'hinge', specifying which upper bound
should be used for indicator functions.
Returns:
A `Tensor` of shape [num_labels] or [num_labels, num_anchors].
"""
  maybe_log2 = tf.log(2.0) if surrogate_type == 'xent' else 1.0
  maybe_log2 = tf.cast(maybe_log2, logits.dtype.base_dtype)
  loss_on_negatives = util.weighted_surrogate_loss(
      labels, logits, surrogate_type, positive_weights=0.0) / maybe_log2
  return tf.reduce_sum(weights * loss_on_negatives, 0)
research/global_objectives/loss_layers_example.py
0 → 100644
# Copyright 2018 The TensorFlow Global Objectives Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Example for using global objectives.
Illustrate, using synthetic data, how using the precision_at_recall loss
significanly improves the performace of a linear classifier.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Dependency imports
import numpy as np
from sklearn.metrics import precision_score
import tensorflow as tf

from global_objectives import loss_layers
# When optimizing using global_objectives, if set to True then the saddle point
# optimization steps are performed internally by the Tensorflow optimizer,
# otherwise by dedicated saddle-point steps as part of the optimization loop.
USE_GO_SADDLE_POINT_OPT = False

TARGET_RECALL = 0.98
TRAIN_ITERATIONS = 150
LEARNING_RATE = 1.0
GO_DUAL_RATE_FACTOR = 15.0
NUM_CHECKPOINTS = 6

EXPERIMENT_DATA_CONFIG = {
    'positives_centers': [[0, 1.0], [1, -0.5]],
    'negatives_centers': [[0, -0.5], [1, 1.0]],
    'positives_variances': [0.15, 0.1],
    'negatives_variances': [0.15, 0.1],
    'positives_counts': [500, 50],
    'negatives_counts': [3000, 100]
}
def create_training_and_eval_data_for_experiment(**data_config):
"""Creates train and eval data sets.
Note: The synthesized binary-labeled data is a mixture of four Gaussians - two
positives and two negatives. The centers, variances, and sizes for each of
the two positives and negatives mixtures are passed in the respective keys
of data_config:
Args:
**data_config: Dictionary with Array entries as follows:
positives_centers - float [2,2] two centers of positives data sets.
negatives_centers - float [2,2] two centers of negatives data sets.
positives_variances - float [2] Variances for the positives sets.
negatives_variances - float [2] Variances for the negatives sets.
positives_counts - int [2] Counts for each of the two positives sets.
negatives_counts - int [2] Counts for each of the two negatives sets.
Returns:
A dictionary with two shuffled data sets created - one for training and one
for eval. The dictionary keys are 'train_data', 'train_labels', 'eval_data',
and 'eval_labels'. The data points are two-dimensional floats, and the
labels are in {0,1}.
"""
  def data_points(is_positives, index):
    variance = data_config['positives_variances'
                           if is_positives else 'negatives_variances'][index]
    center = data_config['positives_centers'
                         if is_positives else 'negatives_centers'][index]
    count = data_config['positives_counts'
                        if is_positives else 'negatives_counts'][index]
    return variance * np.random.randn(count, 2) + np.array([center])

  def create_data():
    return np.concatenate([data_points(False, 0),
                           data_points(True, 0),
                           data_points(True, 1),
                           data_points(False, 1)], axis=0)

  def create_labels():
    """Creates an array of 0.0 or 1.0 labels for the data_config batches."""
    return np.array([0.0] * data_config['negatives_counts'][0] +
                    [1.0] * data_config['positives_counts'][0] +
                    [1.0] * data_config['positives_counts'][1] +
                    [0.0] * data_config['negatives_counts'][1])

  permutation = np.random.permutation(
      sum(data_config['positives_counts'] + data_config['negatives_counts']))

  train_data = create_data()[permutation, :]
  eval_data = create_data()[permutation, :]
  train_labels = create_labels()[permutation]
  eval_labels = create_labels()[permutation]

  return {
      'train_data': train_data,
      'train_labels': train_labels,
      'eval_data': eval_data,
      'eval_labels': eval_labels
  }
def train_model(data, use_global_objectives):
  """Trains a linear model for maximal accuracy or precision at given recall."""

  def precision_at_recall(scores, labels, target_recall):
    """Computes precision - at target recall - over data."""
    positive_scores = scores[labels == 1.0]
    threshold = np.percentile(positive_scores, 100 - target_recall * 100)
    predicted = scores >= threshold
    return precision_score(labels, predicted)
  w = tf.Variable(tf.constant([-1.0, -1.0], shape=[2, 1]), trainable=True,
                  name='weights', dtype=tf.float32)
  b = tf.Variable(tf.zeros([1]), trainable=True, name='biases',
                  dtype=tf.float32)

  logits = tf.matmul(tf.cast(data['train_data'], tf.float32), w) + b

  labels = tf.constant(
      data['train_labels'],
      shape=[len(data['train_labels']), 1],
      dtype=tf.float32)

  if use_global_objectives:
    loss, other_outputs = loss_layers.precision_at_recall_loss(
        labels, logits,
        TARGET_RECALL,
        dual_rate_factor=GO_DUAL_RATE_FACTOR)
    loss = tf.reduce_mean(loss)
  else:
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
  global_step = tf.Variable(0, trainable=False)

  learning_rate = tf.train.polynomial_decay(
      LEARNING_RATE,
      global_step,
      TRAIN_ITERATIONS, (LEARNING_RATE / TRAIN_ITERATIONS),
      power=1.0, cycle=False, name='learning_rate')

  optimizer = tf.train.GradientDescentOptimizer(learning_rate)

  if (not use_global_objectives) or USE_GO_SADDLE_POINT_OPT:
    training_op = optimizer.minimize(loss, global_step=global_step)
  else:
    lambdas = other_outputs['lambdas']
    primal_update_op = optimizer.minimize(loss, var_list=[w, b])
    dual_update_op = optimizer.minimize(
        loss, global_step=global_step, var_list=[lambdas])
  # Training loop:
  with tf.Session() as sess:
    checkpoint_step = TRAIN_ITERATIONS // NUM_CHECKPOINTS
    sess.run(tf.global_variables_initializer())
    step = sess.run(global_step)

    while step <= TRAIN_ITERATIONS:
      if (not use_global_objectives) or USE_GO_SADDLE_POINT_OPT:
        _, step, loss_value, w_value, b_value = sess.run(
            [training_op, global_step, loss, w, b])
      else:
        _, w_value, b_value = sess.run([primal_update_op, w, b])
        _, loss_value, step = sess.run([dual_update_op, loss, global_step])

      if use_global_objectives:
        go_outputs = sess.run(other_outputs.values())

      if step % checkpoint_step == 0:
        precision = precision_at_recall(
            np.dot(data['train_data'], w_value) + b_value,
            data['train_labels'], TARGET_RECALL)

        tf.logging.info('Loss = %f Precision = %f', loss_value, precision)
        if use_global_objectives:
          for i, output_name in enumerate(other_outputs.keys()):
            tf.logging.info('\t%s = %f', output_name, go_outputs[i])

    w_value, b_value = sess.run([w, b])
    return precision_at_recall(np.dot(data['eval_data'], w_value) + b_value,
                               data['eval_labels'],
                               TARGET_RECALL)
def main(unused_argv):
  del unused_argv
  experiment_data = create_training_and_eval_data_for_experiment(
      **EXPERIMENT_DATA_CONFIG)
  global_objectives_loss_precision = train_model(experiment_data, True)
  tf.logging.info('global_objectives precision at requested recall is %f',
                  global_objectives_loss_precision)
  cross_entropy_loss_precision = train_model(experiment_data, False)
  tf.logging.info('cross_entropy precision at requested recall is %f',
                  cross_entropy_loss_precision)


if __name__ == '__main__':
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()

research/global_objectives/loss_layers_test.py
0 → 100644
# Copyright 2018 The TensorFlow Global Objectives Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for global objectives loss layers."""
# Dependency imports
from absl.testing import parameterized
import numpy
import tensorflow as tf

from global_objectives import loss_layers
from global_objectives import util
# TODO: Include weights in the lagrange multiplier update tests.
class PrecisionRecallAUCLossTest(parameterized.TestCase, tf.test.TestCase):

  @parameterized.named_parameters(
      ('_xent', 'xent', 0.7),
      ('_hinge', 'hinge', 0.7),
      ('_hinge_2', 'hinge', 0.5)
  )
  def testSinglePointAUC(self, surrogate_type, target_precision):
    # Tests a case with only one anchor point, where the loss should equal
    # recall_at_precision_loss
    batch_shape = [10, 2]
    logits = tf.Variable(tf.random_normal(batch_shape))
    labels = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))

    auc_loss, _ = loss_layers.precision_recall_auc_loss(
        labels,
        logits,
        precision_range=(target_precision - 0.01, target_precision + 0.01),
        num_anchors=1,
        surrogate_type=surrogate_type)
    point_loss, _ = loss_layers.recall_at_precision_loss(
        labels, logits, target_precision=target_precision,
        surrogate_type=surrogate_type)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(auc_loss.eval(), point_loss.eval())
  def testThreePointAUC(self):
    # Tests a case with three anchor points against a weighted sum of recall
    # at precision losses.
    batch_shape = [11, 3]
    logits = tf.Variable(tf.random_normal(batch_shape))
    labels = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))

    # TODO: Place the hing/xent loss in a for loop.
    auc_loss, _ = loss_layers.precision_recall_auc_loss(
        labels, logits, num_anchors=1)
    first_point_loss, _ = loss_layers.recall_at_precision_loss(
        labels, logits, target_precision=0.25)
    second_point_loss, _ = loss_layers.recall_at_precision_loss(
        labels, logits, target_precision=0.5)
    third_point_loss, _ = loss_layers.recall_at_precision_loss(
        labels, logits, target_precision=0.75)
    expected_loss = (first_point_loss + second_point_loss +
                     third_point_loss) / 3

    auc_loss_hinge, _ = loss_layers.precision_recall_auc_loss(
        labels, logits, num_anchors=1, surrogate_type='hinge')
    first_point_hinge, _ = loss_layers.recall_at_precision_loss(
        labels, logits, target_precision=0.25, surrogate_type='hinge')
    second_point_hinge, _ = loss_layers.recall_at_precision_loss(
        labels, logits, target_precision=0.5, surrogate_type='hinge')
    third_point_hinge, _ = loss_layers.recall_at_precision_loss(
        labels, logits, target_precision=0.75, surrogate_type='hinge')
    expected_hinge = (first_point_hinge + second_point_hinge +
                      third_point_hinge) / 3

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(auc_loss.eval(), expected_loss.eval())
      self.assertAllClose(auc_loss_hinge.eval(), expected_hinge.eval())

  def testLagrangeMultiplierUpdateDirection(self):
    for target_precision in [0.35, 0.65]:
      precision_range = (target_precision - 0.01, target_precision + 0.01)

      for surrogate_type in ['xent', 'hinge']:
        kwargs = {'precision_range': precision_range,
                  'num_anchors': 1,
                  'surrogate_type': surrogate_type,
                  'scope': 'pr-auc_{}_{}'.format(target_precision,
                                                 surrogate_type)}
        run_lagrange_multiplier_test(
            global_objective=loss_layers.precision_recall_auc_loss,
            objective_kwargs=kwargs,
            data_builder=_multilabel_data,
            test_object=self)

        kwargs['scope'] = 'other-' + kwargs['scope']
        run_lagrange_multiplier_test(
            global_objective=loss_layers.precision_recall_auc_loss,
            objective_kwargs=kwargs,
            data_builder=_other_multilabel_data(surrogate_type),
            test_object=self)


class ROCAUCLossTest(parameterized.TestCase, tf.test.TestCase):

  def testSimpleScores(self):
    # Tests the loss on data with only one negative example with score zero.
    # In this case, the loss should equal the surrogate loss on the scores
    # with positive labels.
    num_positives = 10
    scores_positives = tf.constant(3.0 * numpy.random.randn(num_positives),
                                   shape=[num_positives, 1])
    labels = tf.constant([0.0] + [1.0] * num_positives,
                         shape=[num_positives + 1, 1])
    scores = tf.concat([[[0.0]], scores_positives], 0)

    loss = tf.reduce_sum(
        loss_layers.roc_auc_loss(labels, scores, surrogate_type='hinge')[0])
    expected_loss = tf.reduce_sum(
        tf.maximum(1.0 - scores_positives, 0)) / (num_positives + 1)

    with self.test_session():
      self.assertAllClose(expected_loss.eval(), loss.eval())

  def testRandomROCLoss(self):
    # Checks that random Bernoulli scores and labels give ~25% swaps.
    shape = [1000, 30]
    scores = tf.constant(numpy.random.randint(0, 2, size=shape),
                         shape=shape, dtype=tf.float32)
    labels = tf.constant(numpy.random.randint(0, 2, size=shape),
                         shape=shape, dtype=tf.float32)
    loss = tf.reduce_mean(
        loss_layers.roc_auc_loss(labels, scores, surrogate_type='hinge')[0])
    with self.test_session():
      self.assertAllClose(0.25, loss.eval(), 1e-2)

  @parameterized.named_parameters(
      ('_zero_hinge', 'xent',
       [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
       [-5.0, -7.0, -9.0, 8.0, 10.0, 14.0],
       0.0),
      ('_zero_xent', 'hinge',
       [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
       [-0.2, 0, -0.1, 1.0, 1.1, 1.0],
       0.0),
      ('_xent', 'xent',
       [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
       [0.0, -17.0, -19.0, 1.0, 14.0, 14.0],
       numpy.log(1.0 + numpy.exp(-1.0)) / 6),
      ('_hinge', 'hinge',
       [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
       [-0.2, -0.05, 0.0, 0.95, 0.8, 1.0],
       0.4 / 6)
  )
  def testManualROCLoss(self, surrogate_type, labels, logits, expected_value):
    labels = tf.constant(labels)
    logits = tf.constant(logits)
    loss, _ = loss_layers.roc_auc_loss(
        labels=labels, logits=logits, surrogate_type=surrogate_type)

    with self.test_session():
      self.assertAllClose(expected_value, tf.reduce_sum(loss).eval())

  def testMultiLabelROCLoss(self):
    # Tests the loss on multi-label data against manually computed loss.
    targets = numpy.array([[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]])
    scores = numpy.array([[0.1, 1.0, 1.1, 1.0], [1.0, 0.0, 1.3, 1.1]])
    class_1_auc = tf.reduce_sum(
        loss_layers.roc_auc_loss(targets[0], scores[0])[0])
    class_2_auc = tf.reduce_sum(
        loss_layers.roc_auc_loss(targets[1], scores[1])[0])
    total_auc = tf.reduce_sum(
        loss_layers.roc_auc_loss(targets.transpose(), scores.transpose())[0])

    with self.test_session():
      self.assertAllClose(total_auc.eval(),
                          class_1_auc.eval() + class_2_auc.eval())

  def testWeights(self):
    # Test the loss with per-example weights.
    # The logits_negatives below are repeated, so that setting half their
    # weights to 2 and the other half to 0 should leave the loss unchanged.
    logits_positives = tf.constant([2.54321, -0.26, 3.334334], shape=[3, 1])
    logits_negatives = tf.constant([-0.6, 1, -1.3, -1.3, -0.6, 1],
                                   shape=[6, 1])
    logits = tf.concat([logits_positives, logits_negatives], 0)
    targets = tf.constant([1, 1, 1, 0, 0, 0, 0, 0, 0],
                          shape=[9, 1], dtype=tf.float32)
    weights = tf.constant([1, 1, 1, 0, 0, 0, 2, 2, 2],
                          shape=[9, 1], dtype=tf.float32)

    loss = tf.reduce_sum(loss_layers.roc_auc_loss(targets, logits)[0])
    weighted_loss = tf.reduce_sum(
        loss_layers.roc_auc_loss(targets, logits, weights)[0])

    with self.test_session():
      self.assertAllClose(loss.eval(), weighted_loss.eval())


class RecallAtPrecisionTest(tf.test.TestCase):

  def testEqualWeightLoss(self):
    # Tests a special case where the loss should equal cross entropy loss.
    target_precision = 1.0
    num_labels = 5
    batch_shape = [20, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.7)))
    label_priors = tf.constant(0.34, shape=[num_labels])

    loss, _ = loss_layers.recall_at_precision_loss(
        targets, logits, target_precision, label_priors=label_priors)
    expected_loss = (
        tf.contrib.nn.deprecated_flipped_sigmoid_cross_entropy_with_logits(
            logits, targets))

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      loss_val, expected_val = session.run([loss, expected_loss])
      self.assertAllClose(loss_val, expected_val)

  def testEqualWeightLossWithMultiplePrecisions(self):
    """Tests a case where the loss equals xent loss with multiple precisions."""
    target_precision = [1.0, 1.0]
    num_labels = 2
    batch_size = 20
    target_shape = [batch_size, num_labels]
    logits = tf.Variable(tf.random_normal(target_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(target_shape), 0.7)))
    label_priors = tf.constant([0.34], shape=[num_labels])

    loss, _ = loss_layers.recall_at_precision_loss(
        targets,
        logits,
        target_precision,
        label_priors=label_priors,
        surrogate_type='xent',
    )
    expected_loss = (
        tf.contrib.nn.deprecated_flipped_sigmoid_cross_entropy_with_logits(
            logits, targets))

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      loss_val, expected_val = session.run([loss, expected_loss])
      self.assertAllClose(loss_val, expected_val)

  def testPositivesOnlyLoss(self):
    # Tests a special case where the loss should equal cross entropy loss
    # on the positives only.
    target_precision = 1.0
    num_labels = 3
    batch_shape = [30, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))
    label_priors = tf.constant(0.45, shape=[num_labels])

    loss, _ = loss_layers.recall_at_precision_loss(
        targets, logits, target_precision, label_priors=label_priors,
        lambdas_initializer=tf.zeros_initializer())
    expected_loss = util.weighted_sigmoid_cross_entropy_with_logits(
        targets, logits, positive_weights=1.0, negative_weights=0.0)

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      loss_val, expected_val = session.run([loss, expected_loss])
      self.assertAllClose(loss_val, expected_val)

  def testEquivalenceBetweenSingleAndMultiplePrecisions(self):
    """Checks recall at precision with different precision values.

    Runs recall at precision with multiple precision values, and runs each
    label separately with its own precision value as a scalar. Validates that
    the returned loss values are the same.
    """
    target_precision = [0.2, 0.9, 0.4]
    num_labels = 3
    batch_shape = [30, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))
    label_priors = tf.constant([0.45, 0.8, 0.3], shape=[num_labels])

    multi_label_loss, _ = loss_layers.recall_at_precision_loss(
        targets, logits, target_precision, label_priors=label_priors,
    )

    single_label_losses = [
        loss_layers.recall_at_precision_loss(
            tf.expand_dims(targets[:, i], -1),
            tf.expand_dims(logits[:, i], -1),
            target_precision[i],
            label_priors=label_priors[i])[0]
        for i in range(num_labels)
    ]

    single_label_losses = tf.concat(single_label_losses, 1)

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      multi_label_loss_val, single_label_loss_val = session.run(
          [multi_label_loss, single_label_losses])
      self.assertAllClose(multi_label_loss_val, single_label_loss_val)

  def testEquivalenceBetweenSingleAndEqualMultiplePrecisions(self):
    """Compares single and multiple target precisions with the same value.

    Checks that using a single target precision and multiple target precisions
    with the same value would result in the same loss value.
    """
    num_labels = 2
    target_shape = [20, num_labels]
    logits = tf.Variable(tf.random_normal(target_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(target_shape), 0.7)))
    label_priors = tf.constant([0.34], shape=[num_labels])

    multi_precision_loss, _ = loss_layers.recall_at_precision_loss(
        targets,
        logits,
        [0.75, 0.75],
        label_priors=label_priors,
        surrogate_type='xent',
    )

    single_precision_loss, _ = loss_layers.recall_at_precision_loss(
        targets,
        logits,
        0.75,
        label_priors=label_priors,
        surrogate_type='xent',
    )

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      multi_precision_loss_val, single_precision_loss_val = session.run(
          [multi_precision_loss, single_precision_loss])
      self.assertAllClose(multi_precision_loss_val, single_precision_loss_val)

  def testLagrangeMultiplierUpdateDirection(self):
    for target_precision in [0.35, 0.65]:
      for surrogate_type in ['xent', 'hinge']:
        kwargs = {'target_precision': target_precision,
                  'surrogate_type': surrogate_type,
                  'scope': 'r-at-p_{}_{}'.format(target_precision,
                                                 surrogate_type)}
        run_lagrange_multiplier_test(
            global_objective=loss_layers.recall_at_precision_loss,
            objective_kwargs=kwargs,
            data_builder=_multilabel_data,
            test_object=self)

        kwargs['scope'] = 'other-' + kwargs['scope']
        run_lagrange_multiplier_test(
            global_objective=loss_layers.recall_at_precision_loss,
            objective_kwargs=kwargs,
            data_builder=_other_multilabel_data(surrogate_type),
            test_object=self)

  def testLagrangeMultiplierUpdateDirectionWithMultiplePrecisions(self):
    """Runs Lagrange multiplier test with multiple precision values."""
    target_precision = [0.65, 0.35]

    for surrogate_type in ['xent', 'hinge']:
      scope_str = 'r-at-p_{}_{}'.format(
          '_'.join([str(precision) for precision in target_precision]),
          surrogate_type)
      kwargs = {'target_precision': target_precision,
                'surrogate_type': surrogate_type,
                'scope': scope_str,
                }
      run_lagrange_multiplier_test(
          global_objective=loss_layers.recall_at_precision_loss,
          objective_kwargs=kwargs,
          data_builder=_multilabel_data,
          test_object=self)

      kwargs['scope'] = 'other-' + kwargs['scope']
      run_lagrange_multiplier_test(
          global_objective=loss_layers.recall_at_precision_loss,
          objective_kwargs=kwargs,
          data_builder=_other_multilabel_data(surrogate_type),
          test_object=self)


class PrecisionAtRecallTest(tf.test.TestCase):

  def testCrossEntropyEquivalence(self):
    # Checks a special case where the loss should equal cross-entropy loss.
    target_recall = 1.0
    num_labels = 3
    batch_shape = [10, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))

    loss, _ = loss_layers.precision_at_recall_loss(
        targets, logits, target_recall,
        lambdas_initializer=tf.constant_initializer(1.0))
    expected_loss = util.weighted_sigmoid_cross_entropy_with_logits(
        targets, logits)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(loss.eval(), expected_loss.eval())

  def testNegativesOnlyLoss(self):
    # Checks a special case where the loss should equal the loss on
    # the negative examples only.
    target_recall = 0.61828
    num_labels = 4
    batch_shape = [8, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.6)))

    loss, _ = loss_layers.precision_at_recall_loss(
        targets,
        logits,
        target_recall,
        surrogate_type='hinge',
        lambdas_initializer=tf.constant_initializer(0.0),
        scope='negatives_only_test')
    expected_loss = util.weighted_hinge_loss(
        targets, logits, positive_weights=0.0, negative_weights=1.0)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(expected_loss.eval(), loss.eval())

  def testLagrangeMultiplierUpdateDirection(self):
    for target_recall in [0.34, 0.66]:
      for surrogate_type in ['xent', 'hinge']:
        kwargs = {'target_recall': target_recall,
                  'dual_rate_factor': 1.0,
                  'surrogate_type': surrogate_type,
                  'scope': 'p-at-r_{}_{}'.format(target_recall,
                                                 surrogate_type)}
        run_lagrange_multiplier_test(
            global_objective=loss_layers.precision_at_recall_loss,
            objective_kwargs=kwargs,
            data_builder=_multilabel_data,
            test_object=self)

        kwargs['scope'] = 'other-' + kwargs['scope']
        run_lagrange_multiplier_test(
            global_objective=loss_layers.precision_at_recall_loss,
            objective_kwargs=kwargs,
            data_builder=_other_multilabel_data(surrogate_type),
            test_object=self)

  def testCrossEntropyEquivalenceWithMultipleRecalls(self):
    """Checks a case where the loss equals xent loss with multiple recalls."""
    num_labels = 3
    target_recall = [1.0] * num_labels
    batch_shape = [10, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))

    loss, _ = loss_layers.precision_at_recall_loss(
        targets, logits, target_recall,
        lambdas_initializer=tf.constant_initializer(1.0))
    expected_loss = util.weighted_sigmoid_cross_entropy_with_logits(
        targets, logits)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(loss.eval(), expected_loss.eval())

  def testNegativesOnlyLossWithMultipleRecalls(self):
    """Tests a case where the loss equals the loss on the negative examples.

    Checks this special case using multiple target recall values.
    """
    num_labels = 4
    target_recall = [0.61828] * num_labels
    batch_shape = [8, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.6)))

    loss, _ = loss_layers.precision_at_recall_loss(
        targets,
        logits,
        target_recall,
        surrogate_type='hinge',
        lambdas_initializer=tf.constant_initializer(0.0),
        scope='negatives_only_test')
    expected_loss = util.weighted_hinge_loss(
        targets, logits, positive_weights=0.0, negative_weights=1.0)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(expected_loss.eval(), loss.eval())

  def testLagrangeMultiplierUpdateDirectionWithMultipleRecalls(self):
    """Runs Lagrange multiplier test with multiple recall values."""
    target_recall = [0.34, 0.66]

    for surrogate_type in ['xent', 'hinge']:
      scope_str = 'p-at-r_{}_{}'.format(
          '_'.join([str(recall) for recall in target_recall]),
          surrogate_type)
      kwargs = {'target_recall': target_recall,
                'dual_rate_factor': 1.0,
                'surrogate_type': surrogate_type,
                'scope': scope_str}
      run_lagrange_multiplier_test(
          global_objective=loss_layers.precision_at_recall_loss,
          objective_kwargs=kwargs,
          data_builder=_multilabel_data,
          test_object=self)

      kwargs['scope'] = 'other-' + kwargs['scope']
      run_lagrange_multiplier_test(
          global_objective=loss_layers.precision_at_recall_loss,
          objective_kwargs=kwargs,
          data_builder=_other_multilabel_data(surrogate_type),
          test_object=self)

  def testEquivalenceBetweenSingleAndMultipleRecalls(self):
    """Checks precision at recall with multiple different recall values.

    Runs precision at recall with multiple recall values, and runs each label
    separately with its own recall value as a scalar. Validates that the
    returned loss values are the same.
    """
    target_precision = [0.7, 0.9, 0.4]
    num_labels = 3
    batch_shape = [30, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))
    label_priors = tf.constant(0.45, shape=[num_labels])

    multi_label_loss, _ = loss_layers.precision_at_recall_loss(
        targets, logits, target_precision, label_priors=label_priors)

    single_label_losses = [
        loss_layers.precision_at_recall_loss(
            tf.expand_dims(targets[:, i], -1),
            tf.expand_dims(logits[:, i], -1),
            target_precision[i],
            label_priors=label_priors[i])[0]
        for i in range(num_labels)
    ]

    single_label_losses = tf.concat(single_label_losses, 1)

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      multi_label_loss_val, single_label_loss_val = session.run(
          [multi_label_loss, single_label_losses])
      self.assertAllClose(multi_label_loss_val, single_label_loss_val)

  def testEquivalenceBetweenSingleAndEqualMultipleRecalls(self):
    """Compares single and multiple target recalls of the same value.

    Checks that using a single target recall and multiple recalls with the
    same value would result in the same loss value.
    """
    num_labels = 2
    target_shape = [20, num_labels]
    logits = tf.Variable(tf.random_normal(target_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(target_shape), 0.7)))
    label_priors = tf.constant([0.34], shape=[num_labels])

    multi_precision_loss, _ = loss_layers.precision_at_recall_loss(
        targets,
        logits,
        [0.75, 0.75],
        label_priors=label_priors,
        surrogate_type='xent',
    )

    single_precision_loss, _ = loss_layers.precision_at_recall_loss(
        targets,
        logits,
        0.75,
        label_priors=label_priors,
        surrogate_type='xent',
    )

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      multi_precision_loss_val, single_precision_loss_val = session.run(
          [multi_precision_loss, single_precision_loss])
      self.assertAllClose(multi_precision_loss_val, single_precision_loss_val)


class FalsePositiveRateAtTruePositiveRateTest(tf.test.TestCase):

  def testNegativesOnlyLoss(self):
    # Checks a special case where the loss returned should be the loss on the
    # negative examples.
    target_recall = 0.6
    num_labels = 3
    batch_shape = [3, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))
    label_priors = tf.constant(numpy.random.uniform(size=[num_labels]),
                               dtype=tf.float32)

    xent_loss, _ = loss_layers.false_positive_rate_at_true_positive_rate_loss(
        targets, logits, target_recall, label_priors=label_priors,
        lambdas_initializer=tf.constant_initializer(0.0))
    xent_expected = util.weighted_sigmoid_cross_entropy_with_logits(
        targets,
        logits,
        positive_weights=0.0,
        negative_weights=1.0)
    hinge_loss, _ = loss_layers.false_positive_rate_at_true_positive_rate_loss(
        targets, logits, target_recall, label_priors=label_priors,
        lambdas_initializer=tf.constant_initializer(0.0),
        surrogate_type='hinge')
    hinge_expected = util.weighted_hinge_loss(
        targets, logits, positive_weights=0.0, negative_weights=1.0)

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      xent_val, xent_expected = session.run([xent_loss, xent_expected])
      self.assertAllClose(xent_val, xent_expected)
      hinge_val, hinge_expected = session.run([hinge_loss, hinge_expected])
      self.assertAllClose(hinge_val, hinge_expected)

  def testPositivesOnlyLoss(self):
    # Checks a special case where the loss returned should be the loss on the
    # positive examples only.
    target_recall = 1.0
    num_labels = 5
    batch_shape = [5, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.ones_like(logits)
    label_priors = tf.constant(numpy.random.uniform(size=[num_labels]),
                               dtype=tf.float32)

    loss, _ = loss_layers.false_positive_rate_at_true_positive_rate_loss(
        targets, logits, target_recall, label_priors=label_priors)
    expected_loss = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=targets, logits=logits)
    hinge_loss, _ = loss_layers.false_positive_rate_at_true_positive_rate_loss(
        targets, logits, target_recall, label_priors=label_priors,
        surrogate_type='hinge')
    expected_hinge = util.weighted_hinge_loss(targets, logits)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(loss.eval(), expected_loss.eval())
      self.assertAllClose(hinge_loss.eval(), expected_hinge.eval())

  def testEqualWeightLoss(self):
    # Checks a special case where the loss returned should be proportional to
    # the ordinary loss.
    target_recall = 1.0
    num_labels = 4
    batch_shape = [40, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.6)))
    label_priors = tf.constant(0.5, shape=[num_labels])

    loss, _ = loss_layers.false_positive_rate_at_true_positive_rate_loss(
        targets, logits, target_recall, label_priors=label_priors)
    expected_loss = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=targets, logits=logits)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(loss.eval(), expected_loss.eval())

  def testLagrangeMultiplierUpdateDirection(self):
    for target_rate in [0.35, 0.65]:
      for surrogate_type in ['xent', 'hinge']:
        kwargs = {'target_rate': target_rate,
                  'surrogate_type': surrogate_type,
                  'scope': 'fpr-at-tpr_{}_{}'.format(target_rate,
                                                     surrogate_type)}
        # True positive rate is a synonym for recall, so we use the
        # recall constraint data.
        run_lagrange_multiplier_test(
            global_objective=(
                loss_layers.false_positive_rate_at_true_positive_rate_loss),
            objective_kwargs=kwargs,
            data_builder=_multilabel_data,
            test_object=self)

        kwargs['scope'] = 'other-' + kwargs['scope']
        run_lagrange_multiplier_test(
            global_objective=(
                loss_layers.false_positive_rate_at_true_positive_rate_loss),
            objective_kwargs=kwargs,
            data_builder=_other_multilabel_data(surrogate_type),
            test_object=self)

  def testLagrangeMultiplierUpdateDirectionWithMultipleRates(self):
    """Runs Lagrange multiplier test with multiple target rates."""
    target_rate = [0.35, 0.65]
    for surrogate_type in ['xent', 'hinge']:
      kwargs = {'target_rate': target_rate,
                'surrogate_type': surrogate_type,
                'scope': 'fpr-at-tpr_{}_{}'.format(
                    '_'.join([str(target) for target in target_rate]),
                    surrogate_type)}
      # True positive rate is a synonym for recall, so we use the
      # recall constraint data.
      run_lagrange_multiplier_test(
          global_objective=(
              loss_layers.false_positive_rate_at_true_positive_rate_loss),
          objective_kwargs=kwargs,
          data_builder=_multilabel_data,
          test_object=self)

      kwargs['scope'] = 'other-' + kwargs['scope']
      run_lagrange_multiplier_test(
          global_objective=(
              loss_layers.false_positive_rate_at_true_positive_rate_loss),
          objective_kwargs=kwargs,
          data_builder=_other_multilabel_data(surrogate_type),
          test_object=self)

  def testEquivalenceBetweenSingleAndEqualMultipleRates(self):
    """Compares single and multiple target rates of the same value.

    Checks that using a single target rate and multiple rates with the
    same value would result in the same loss value.
    """
    num_labels = 2
    target_shape = [20, num_labels]
    logits = tf.Variable(tf.random_normal(target_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(target_shape), 0.7)))
    label_priors = tf.constant([0.34], shape=[num_labels])

    multi_label_loss, _ = (
        loss_layers.false_positive_rate_at_true_positive_rate_loss(
            targets, logits, [0.75, 0.75], label_priors=label_priors))

    single_label_loss, _ = (
        loss_layers.false_positive_rate_at_true_positive_rate_loss(
            targets, logits, 0.75, label_priors=label_priors))

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      multi_label_loss_val, single_label_loss_val = session.run(
          [multi_label_loss, single_label_loss])
      self.assertAllClose(multi_label_loss_val, single_label_loss_val)

  def testEquivalenceBetweenSingleAndMultipleRates(self):
    """Compares single and multiple target rates of different values.

    Runs false_positive_rate_at_true_positive_rate_loss with multiple target
    rates, and runs each label separately with its own target rate as a
    scalar. Validates that the returned loss values are the same.
    """
    target_precision = [0.7, 0.9, 0.4]
    num_labels = 3
    batch_shape = [30, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))
    label_priors = tf.constant(0.45, shape=[num_labels])

    multi_label_loss, _ = (
        loss_layers.false_positive_rate_at_true_positive_rate_loss(
            targets, logits, target_precision, label_priors=label_priors))

    single_label_losses = [
        loss_layers.false_positive_rate_at_true_positive_rate_loss(
            tf.expand_dims(targets[:, i], -1),
            tf.expand_dims(logits[:, i], -1),
            target_precision[i],
            label_priors=label_priors[i])[0]
        for i in range(num_labels)
    ]

    single_label_losses = tf.concat(single_label_losses, 1)

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      multi_label_loss_val, single_label_loss_val = session.run(
          [multi_label_loss, single_label_losses])
      self.assertAllClose(multi_label_loss_val, single_label_loss_val)


class TruePositiveRateAtFalsePositiveRateTest(tf.test.TestCase):

  def testPositivesOnlyLoss(self):
    # A special case where the loss should equal the loss on the positive
    # examples.
    target_rate = numpy.random.uniform()
    num_labels = 3
    batch_shape = [20, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.6)))
    label_priors = tf.constant(numpy.random.uniform(size=[num_labels]),
                               dtype=tf.float32)

    xent_loss, _ = loss_layers.true_positive_rate_at_false_positive_rate_loss(
        targets, logits, target_rate, label_priors=label_priors,
        lambdas_initializer=tf.constant_initializer(0.0))
    xent_expected = util.weighted_sigmoid_cross_entropy_with_logits(
        targets, logits, positive_weights=1.0, negative_weights=0.0)
    hinge_loss, _ = loss_layers.true_positive_rate_at_false_positive_rate_loss(
        targets, logits, target_rate, label_priors=label_priors,
        lambdas_initializer=tf.constant_initializer(0.0),
        surrogate_type='hinge')
    hinge_expected = util.weighted_hinge_loss(
        targets, logits, positive_weights=1.0, negative_weights=0.0)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(xent_expected.eval(), xent_loss.eval())
      self.assertAllClose(hinge_expected.eval(), hinge_loss.eval())

  def testNegativesOnlyLoss(self):
    # A special case where the loss should equal the loss on the negative
    # examples, minus target_rate * (1 - label_priors) * maybe_log2.
    target_rate = numpy.random.uniform()
    num_labels = 3
    batch_shape = [25, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.zeros_like(logits)
    label_priors = tf.constant(numpy.random.uniform(size=[num_labels]),
                               dtype=tf.float32)

    xent_loss, _ = loss_layers.true_positive_rate_at_false_positive_rate_loss(
        targets, logits, target_rate, label_priors=label_priors)
    xent_expected = tf.subtract(
        util.weighted_sigmoid_cross_entropy_with_logits(
            targets, logits, positive_weights=0.0, negative_weights=1.0),
        target_rate * (1.0 - label_priors) * numpy.log(2))
    hinge_loss, _ = loss_layers.true_positive_rate_at_false_positive_rate_loss(
        targets, logits, target_rate, label_priors=label_priors,
        surrogate_type='hinge')
    hinge_expected = util.weighted_hinge_loss(
        targets, logits) - target_rate * (1.0 - label_priors)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(xent_expected.eval(), xent_loss.eval())
      self.assertAllClose(hinge_expected.eval(), hinge_loss.eval())

  def testLagrangeMultiplierUpdateDirection(self):
    for target_rate in [0.35, 0.65]:
      for surrogate_type in ['xent', 'hinge']:
        kwargs = {'target_rate': target_rate,
                  'surrogate_type': surrogate_type,
                  'scope': 'tpr-at-fpr_{}_{}'.format(target_rate,
                                                     surrogate_type)}
        run_lagrange_multiplier_test(
            global_objective=(
                loss_layers.true_positive_rate_at_false_positive_rate_loss),
            objective_kwargs=kwargs,
            data_builder=_multilabel_data,
            test_object=self)

        kwargs['scope'] = 'other-' + kwargs['scope']
        run_lagrange_multiplier_test(
            global_objective=(
                loss_layers.true_positive_rate_at_false_positive_rate_loss),
            objective_kwargs=kwargs,
            data_builder=_other_multilabel_data(surrogate_type),
            test_object=self)

  def testLagrangeMultiplierUpdateDirectionWithMultipleRates(self):
    """Runs Lagrange multiplier test with multiple target rates."""
    target_rate = [0.35, 0.65]
    for surrogate_type in ['xent', 'hinge']:
      kwargs = {'target_rate': target_rate,
                'surrogate_type': surrogate_type,
                'scope': 'tpr-at-fpr_{}_{}'.format(
                    '_'.join([str(target) for target in target_rate]),
                    surrogate_type)}
      run_lagrange_multiplier_test(
          global_objective=(
              loss_layers.true_positive_rate_at_false_positive_rate_loss),
          objective_kwargs=kwargs,
          data_builder=_multilabel_data,
          test_object=self)

      kwargs['scope'] = 'other-' + kwargs['scope']
      run_lagrange_multiplier_test(
          global_objective=(
              loss_layers.true_positive_rate_at_false_positive_rate_loss),
          objective_kwargs=kwargs,
          data_builder=_other_multilabel_data(surrogate_type),
          test_object=self)

  def testEquivalenceBetweenSingleAndEqualMultipleRates(self):
    """Compares single and multiple target rates of the same value.

    Checks that using a single target rate and multiple rates with the
    same value would result in the same loss value.
    """
    num_labels = 2
    target_shape = [20, num_labels]
    logits = tf.Variable(tf.random_normal(target_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(target_shape), 0.7)))
    label_priors = tf.constant([0.34], shape=[num_labels])

    multi_label_loss, _ = (
        loss_layers.true_positive_rate_at_false_positive_rate_loss(
            targets, logits, [0.75, 0.75], label_priors=label_priors))

    single_label_loss, _ = (
        loss_layers.true_positive_rate_at_false_positive_rate_loss(
            targets, logits, 0.75, label_priors=label_priors))

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      multi_label_loss_val, single_label_loss_val = session.run(
          [multi_label_loss, single_label_loss])
      self.assertAllClose(multi_label_loss_val, single_label_loss_val)

  def testEquivalenceBetweenSingleAndMultipleRates(self):
    """Compares single and multiple target rates of different values.

    Runs true_positive_rate_at_false_positive_rate_loss with multiple target
    rates, and runs each label separately with its own target rate as a
    scalar. Validates that the returned loss values are the same.
    """
    target_precision = [0.7, 0.9, 0.4]
    num_labels = 3
    batch_shape = [30, num_labels]
    logits = tf.Variable(tf.random_normal(batch_shape))
    targets = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))
    label_priors = tf.constant(0.45, shape=[num_labels])

    multi_label_loss, _ = (
        loss_layers.true_positive_rate_at_false_positive_rate_loss(
            targets, logits, target_precision, label_priors=label_priors))

    single_label_losses = [
        loss_layers.true_positive_rate_at_false_positive_rate_loss(
            tf.expand_dims(targets[:, i], -1),
            tf.expand_dims(logits[:, i], -1),
            target_precision[i],
            label_priors=label_priors[i])[0]
        for i in range(num_labels)
    ]

    single_label_losses = tf.concat(single_label_losses, 1)

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      multi_label_loss_val, single_label_loss_val = session.run(
          [multi_label_loss, single_label_losses])
      self.assertAllClose(multi_label_loss_val, single_label_loss_val)


class UtilityFunctionsTest(tf.test.TestCase):

  def testTrainableDualVariable(self):
    # Confirm correct behavior of a trainable dual variable.
    x = tf.get_variable('primal', dtype=tf.float32, initializer=2.0)
    y_value, y = loss_layers._create_dual_variable(
        'dual', shape=None, dtype=tf.float32, initializer=1.0,
        collections=None, trainable=True, dual_rate_factor=0.3)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
    update = optimizer.minimize(0.5 * tf.square(x - y_value))

    with self.test_session():
      tf.global_variables_initializer().run()
      update.run()
      self.assertAllClose(0.7, y.eval())

  def testUntrainableDualVariable(self):
    # Confirm correct behavior of dual variable which is not trainable.
    x = tf.get_variable('primal', dtype=tf.float32, initializer=-2.0)
    y_value, y = loss_layers._create_dual_variable(
        'dual', shape=None, dtype=tf.float32, initializer=1.0,
        collections=None, trainable=False, dual_rate_factor=0.8)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
    update = optimizer.minimize(tf.square(x) * y_value + tf.exp(y_value))

    with self.test_session():
      tf.global_variables_initializer().run()
      update.run()
      self.assertAllClose(1.0, y.eval())


class BoundTest(parameterized.TestCase, tf.test.TestCase):

  @parameterized.named_parameters(
      ('_xent', 'xent', 1.0, [2.0, 1.0]),
      ('_xent_weighted', 'xent',
       numpy.array([0, 2, 0.5, 1, 2, 3]).reshape(6, 1), [2.5, 0]),
      ('_hinge', 'hinge', 1.0, [2.0, 1.0]),
      ('_hinge_weighted', 'hinge',
       numpy.array([1.0, 2, 3, 4, 5, 6]).reshape(6, 1), [5.0, 1]))
  def testLowerBoundMultilabel(self, surrogate_type, weights, expected):
    labels, logits, _ = _multilabel_data()
    lower_bound = loss_layers.true_positives_lower_bound(
        labels, logits, weights, surrogate_type)

    with self.test_session():
      self.assertAllClose(lower_bound.eval(), expected)

  @parameterized.named_parameters(('_xent', 'xent'), ('_hinge', 'hinge'))
  def testLowerBoundOtherMultilabel(self, surrogate_type):
    labels, logits, _ = _other_multilabel_data(surrogate_type)()
    lower_bound = loss_layers.true_positives_lower_bound(
        labels, logits, 1.0, surrogate_type)

    with self.test_session():
      self.assertAllClose(lower_bound.eval(), [4.0, 2.0], atol=1e-5)

  @parameterized.named_parameters(
      ('_xent', 'xent', 1.0, [1.0, 2.0]),
      ('_xent_weighted', 'xent',
       numpy.array([3.0, 2, 1, 0, 1, 2]).reshape(6, 1), [2.0, 1.0]),
      ('_hinge', 'hinge', 1.0, [1.0, 2.0]),
      ('_hinge_weighted', 'hinge',
       numpy.array([13, 12, 11, 0.5, 0, 0.5]).reshape(6, 1), [0.5, 0.5]))
  def testUpperBoundMultilabel(self, surrogate_type, weights, expected):
    labels, logits, _ = _multilabel_data()
    upper_bound = loss_layers.false_positives_upper_bound(
        labels, logits, weights, surrogate_type)

    with self.test_session():
      self.assertAllClose(upper_bound.eval(), expected)

  @parameterized.named_parameters(('_xent', 'xent'), ('_hinge', 'hinge'))
  def testUpperBoundOtherMultilabel(self, surrogate_type):
    labels, logits, _ = _other_multilabel_data(surrogate_type)()
    upper_bound = loss_layers.false_positives_upper_bound(
        labels, logits, 1.0, surrogate_type)

    with self.test_session():
      self.assertAllClose(upper_bound.eval(), [2.0, 4.0], atol=1e-5)

  @parameterized.named_parameters(('_lower', 'lower'), ('_upper', 'upper'))
  def testThreeDimensionalLogits(self, bound):
    bound_function = loss_layers.false_positives_upper_bound
    if bound == 'lower':
      bound_function = loss_layers.true_positives_lower_bound
    random_labels = numpy.float32(numpy.random.uniform(size=[2, 3]) > 0.5)
    random_logits = numpy.float32(numpy.random.randn(2, 3, 2))
    first_slice_logits = random_logits[:, :, 0].reshape(2, 3)
    second_slice_logits = random_logits[:, :, 1].reshape(2, 3)

    full_bound = bound_function(
        tf.constant(random_labels), tf.constant(random_logits), 1.0, 'xent')
    first_slice_bound = bound_function(tf.constant(random_labels),
                                       tf.constant(first_slice_logits),
                                       1.0,
                                       'xent')
    second_slice_bound = bound_function(tf.constant(random_labels),
                                        tf.constant(second_slice_logits),
                                        1.0,
                                        'xent')
    stacked_bound = tf.stack([first_slice_bound, second_slice_bound], axis=1)

    with self.test_session():
      self.assertAllClose(full_bound.eval(), stacked_bound.eval())


def run_lagrange_multiplier_test(global_objective,
                                 objective_kwargs,
                                 data_builder,
                                 test_object):
  """Runs a test for the Lagrange multiplier update of `global_objective`.

  The test checks that the constraint for `global_objective` is satisfied on
  the first label of the data produced by `data_builder` but not the second.

  Args:
    global_objective: One of the global objectives.
    objective_kwargs: A dictionary of keyword arguments to pass to
      `global_objective`. Must contain an entry for the constraint argument
      of `global_objective`, e.g. 'target_rate' or 'target_precision'.
    data_builder: A function which returns tensors corresponding to labels,
      logits, and label priors.
    test_object: An instance of tf.test.TestCase.
  """
  # Construct global objective kwargs from a copy of `objective_kwargs`.
  kwargs = dict(objective_kwargs)
  targets, logits, priors = data_builder()
  kwargs['labels'] = targets
  kwargs['logits'] = logits
  kwargs['label_priors'] = priors

  loss, output_dict = global_objective(**kwargs)
  lambdas = tf.squeeze(output_dict['lambdas'])
  opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
  update_op = opt.minimize(loss, var_list=[output_dict['lambdas']])

  with test_object.test_session() as session:
    tf.global_variables_initializer().run()
    lambdas_before = session.run(lambdas)
    session.run(update_op)
    lambdas_after = session.run(lambdas)

    test_object.assertLess(lambdas_after[0], lambdas_before[0])
    test_object.assertGreater(lambdas_after[1], lambdas_before[1])


class CrossFunctionTest(parameterized.TestCase, tf.test.TestCase):

  @parameterized.named_parameters(
      ('_auc01xent', loss_layers.precision_recall_auc_loss,
       {'precision_range': (0.0, 1.0), 'surrogate_type': 'xent'}),
      ('_auc051xent', loss_layers.precision_recall_auc_loss,
       {'precision_range': (0.5, 1.0), 'surrogate_type': 'xent'}),
      ('_auc01)hinge', loss_layers.precision_recall_auc_loss,
       {'precision_range': (0.0, 1.0), 'surrogate_type': 'hinge'}),
      ('_ratp04', loss_layers.recall_at_precision_loss,
       {'target_precision': 0.4, 'surrogate_type': 'xent'}),
      ('_ratp066', loss_layers.recall_at_precision_loss,
       {'target_precision': 0.66, 'surrogate_type': 'xent'}),
      ('_ratp07_hinge', loss_layers.recall_at_precision_loss,
       {'target_precision': 0.7, 'surrogate_type': 'hinge'}),
      ('_fpattp066',
       loss_layers.false_positive_rate_at_true_positive_rate_loss,
       {'target_rate': 0.66, 'surrogate_type': 'xent'}),
      ('_fpattp046',
       loss_layers.false_positive_rate_at_true_positive_rate_loss,
       {'target_rate': 0.46, 'surrogate_type': 'xent'}),
      ('_fpattp076_hinge',
       loss_layers.false_positive_rate_at_true_positive_rate_loss,
       {'target_rate': 0.76, 'surrogate_type': 'hinge'}),
      ('_fpattp036_hinge',
       loss_layers.false_positive_rate_at_true_positive_rate_loss,
       {'target_rate': 0.36, 'surrogate_type': 'hinge'}),
  )
  def testWeigtedGlobalObjective(self, global_objective, objective_kwargs):
    """Runs a test of `global_objective` with per-example weights.

    Args:
      global_objective: One of the global objectives.
      objective_kwargs: A dictionary of keyword arguments to pass to
        `global_objective`. Must contain keys 'surrogate_type', and the
        keyword for the constraint argument of `global_objective`, e.g.
        'target_rate' or 'target_precision'.
    """
    logits_positives = tf.constant([1, -0.5, 3], shape=[3, 1])
    logits_negatives = tf.constant([-0.5, 1, -1, -1, -0.5, 1], shape=[6, 1])

    # Dummy tensor is used to compute the gradients.
    dummy = tf.constant(1.0)
    logits = tf.concat([logits_positives, logits_negatives], 0)
    logits = tf.multiply(logits, dummy)
    targets = tf.constant([1, 1, 1, 0, 0, 0, 0, 0, 0],
                          shape=[9, 1], dtype=tf.float32)
    priors = tf.constant(1.0 / 3.0, shape=[1])
    weights = tf.constant([1, 1, 1, 0, 0, 0, 2, 2, 2],
                          shape=[9, 1], dtype=tf.float32)

    # Construct global objective kwargs.
    objective_kwargs['labels'] = targets
    objective_kwargs['logits'] = logits
    objective_kwargs['label_priors'] = priors
    scope = 'weighted_test'

    # Unweighted loss.
    objective_kwargs['scope'] = scope + '_plain'
    raw_loss, update = global_objective(**objective_kwargs)
    loss = tf.reduce_sum(raw_loss)

    # Weighted loss.
    objective_kwargs['weights'] = weights
    objective_kwargs['scope'] = scope + '_weighted'
    raw_weighted_loss, weighted_update = global_objective(**objective_kwargs)
    weighted_loss = tf.reduce_sum(raw_weighted_loss)

    lambdas = tf.contrib.framework.get_unique_variable(
        scope + '_plain/lambdas')
    weighted_lambdas = tf.contrib.framework.get_unique_variable(
        scope + '_weighted/lambdas')

    logits_gradient = tf.gradients(loss, dummy)
    weighted_logits_gradient = tf.gradients(weighted_loss, dummy)

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      self.assertAllClose(loss.eval(), weighted_loss.eval())

      logits_grad, weighted_logits_grad = session.run(
          [logits_gradient, weighted_logits_gradient])
      self.assertAllClose(logits_grad, weighted_logits_grad)

      session.run([update, weighted_update])
      lambdas_value, weighted_lambdas_value = session.run(
          [lambdas, weighted_lambdas])
      self.assertAllClose(lambdas_value, weighted_lambdas_value)

  @parameterized.named_parameters(
      ('_prauc051xent', loss_layers.precision_recall_auc_loss,
       {'precision_range': (0.5, 1.0), 'surrogate_type': 'xent'}),
      ('_prauc01hinge', loss_layers.precision_recall_auc_loss,
       {'precision_range': (0.0, 1.0), 'surrogate_type': 'hinge'}),
      ('_rocxent', loss_layers.roc_auc_loss, {'surrogate_type': 'xent'}),
      ('_rochinge', loss_layers.roc_auc_loss, {'surrogate_type': 'xent'}),
      ('_ratp04', loss_layers.recall_at_precision_loss,
       {'target_precision': 0.4, 'surrogate_type': 'xent'}),
      ('_ratp07_hinge', loss_layers.recall_at_precision_loss,
       {'target_precision': 0.7, 'surrogate_type': 'hinge'}),
      ('_patr05', loss_layers.precision_at_recall_loss,
       {'target_recall': 0.4, 'surrogate_type': 'xent'}),
      ('_patr08_hinge', loss_layers.precision_at_recall_loss,
       {'target_recall': 0.7, 'surrogate_type': 'hinge'}),
      ('_fpattp046',
       loss_layers.false_positive_rate_at_true_positive_rate_loss,
       {'target_rate': 0.46, 'surrogate_type': 'xent'}),
      ('_fpattp036_hinge',
       loss_layers.false_positive_rate_at_true_positive_rate_loss,
       {'target_rate': 0.36, 'surrogate_type': 'hinge'}),
      ('_tpatfp076',
       loss_layers.true_positive_rate_at_false_positive_rate_loss,
       {'target_rate': 0.76, 'surrogate_type': 'xent'}),
      ('_tpatfp036_hinge',
       loss_layers.true_positive_rate_at_false_positive_rate_loss,
       {'target_rate': 0.36, 'surrogate_type': 'hinge'}),
  )
  def testVectorAndMatrixLabelEquivalence(self, global_objective,
                                          objective_kwargs):
    """Tests equivalence between label shape [batch_size] or [batch_size, 1]."""
    vector_labels = tf.constant([1.0, 1.0, 0.0, 0.0], shape=[4])
    vector_logits = tf.constant([1.0, 0.1, 0.1, -1.0], shape=[4])

    # Construct vector global objective kwargs and loss.
    vector_kwargs = objective_kwargs.copy()
    vector_kwargs['labels'] = vector_labels
    vector_kwargs['logits'] = vector_logits
    vector_loss, _ = global_objective(**vector_kwargs)
    vector_loss_sum = tf.reduce_sum(vector_loss)

    # Construct matrix global objective kwargs and loss.
    matrix_kwargs = objective_kwargs.copy()
    matrix_kwargs['labels'] = tf.expand_dims(vector_labels, 1)
    matrix_kwargs['logits'] = tf.expand_dims(vector_logits, 1)
    matrix_loss, _ = global_objective(**matrix_kwargs)
    matrix_loss_sum = tf.reduce_sum(matrix_loss)

    self.assertEqual(1, vector_loss.get_shape().ndims)
    self.assertEqual(2, matrix_loss.get_shape().ndims)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(vector_loss_sum.eval(), matrix_loss_sum.eval())

  @parameterized.named_parameters(
      ('_prauc', loss_layers.precision_recall_auc_loss, None),
      ('_roc', loss_layers.roc_auc_loss, None),
      ('_rap', loss_layers.recall_at_precision_loss,
       {'target_precision': 0.8}),
      ('_patr', loss_layers.precision_at_recall_loss, {'target_recall': 0.7}),
      ('_fpattp',
       loss_layers.false_positive_rate_at_true_positive_rate_loss,
       {'target_rate': 0.9}),
      ('_tpatfp',
       loss_layers.true_positive_rate_at_false_positive_rate_loss,
       {'target_rate': 0.1})
  )
  def testUnknownBatchSize(self, global_objective, objective_kwargs):
    # Tests that there are no errors when the batch size is not known.
    batch_shape = [5, 2]
    logits = tf.placeholder(tf.float32)
    logits_feed = numpy.random.randn(*batch_shape)
    labels = tf.placeholder(tf.float32)
    labels_feed = logits_feed > 0.1
    logits.set_shape([None, 2])
    labels.set_shape([None, 2])

    if objective_kwargs is None:
      objective_kwargs = {}

    placeholder_kwargs = objective_kwargs.copy()
    placeholder_kwargs['labels'] = labels
    placeholder_kwargs['logits'] = logits
    placeholder_loss, _ = global_objective(**placeholder_kwargs)

    kwargs = objective_kwargs.copy()
    kwargs['labels'] = labels_feed
    kwargs['logits'] = logits_feed
    loss, _ = global_objective(**kwargs)

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      feed_loss_val = session.run(placeholder_loss,
                                  feed_dict={logits: logits_feed,
                                             labels: labels_feed})
      loss_val = session.run(loss)
      self.assertAllClose(feed_loss_val, loss_val)


# Both sets of logits below are designed so that the surrogate precision and
# recall (true positive rate) of class 1 is ~ 2/3, and the same surrogates for
# class 2 are ~ 1/3. The false positive rate surrogates are ~ 1/3 and 2/3.
def _multilabel_data():
  targets = tf.constant([1.0, 1.0, 1.0, 0.0, 0.0, 0.0], shape=[6, 1])
  targets = tf.concat([targets, targets], 1)
  logits_positives = tf.constant([[0.0, 15],
                                  [16, 0.0],
                                  [14, 0.0]], shape=[3, 2])
  logits_negatives = tf.constant([[-17, 0.0],
                                  [-15, 0.0],
                                  [0.0, -101]], shape=[3, 2])
  logits = tf.concat([logits_positives, logits_negatives], 0)
  priors = tf.constant(0.5, shape=[2])

  return targets, logits, priors


def _other_multilabel_data(surrogate_type):
  targets = tf.constant(
      [1.0] * 6 + [0.0] * 6, shape=[12, 1])
  targets = tf.concat([targets, targets], 1)
  logits_positives = tf.constant([[0.0, 13],
                                  [12, 0.0],
                                  [15, 0.0],
                                  [0.0, 30],
                                  [13, 0.0],
                                  [18, 0.0]], shape=[6, 2])
  # A score of cost_2 incurs a loss of ~2.0.
  cost_2 = 1.0 if surrogate_type == 'hinge' else 1.09861229
  logits_negatives = tf.constant([[-16, cost_2],
                                  [-15, cost_2],
                                  [cost_2, -111],
                                  [-133, -14,],
                                  [-14.0100101, -16,],
                                  [-19.888828882, -101]], shape=[6, 2])
  logits = tf.concat([logits_positives, logits_negatives], 0)
  priors = tf.constant(0.5, shape=[2])

  def builder():
    return targets, logits, priors

  return builder


if __name__ == '__main__':
  tf.test.main()
research/global_objectives/test_all.py
0 → 100644
View file @
6c6f3f3a
# Copyright 2018 The TensorFlow Global Objectives Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Runs all unit tests in the Global Objectives package.
Requires that TensorFlow and abseil (https://github.com/abseil/abseil-py) be
installed on your machine. Command to run the tests:
python test_all.py
"""
import os
import sys
import unittest

this_file = os.path.realpath(__file__)
start_dir = os.path.dirname(this_file)
parent_dir = os.path.dirname(start_dir)
sys.path.append(parent_dir)

loader = unittest.TestLoader()
suite = loader.discover(start_dir, pattern='*_test.py')

runner = unittest.TextTestRunner(verbosity=2)
runner.run(suite)
research/global_objectives/util.py
0 → 100644
View file @
6c6f3f3a
# Copyright 2018 The TensorFlow Global Objectives Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains utility functions for the global objectives library."""
# Dependency imports
import tensorflow as tf


def weighted_sigmoid_cross_entropy_with_logits(labels,
                                               logits,
                                               positive_weights=1.0,
                                               negative_weights=1.0,
                                               name=None):
"""Computes a weighting of sigmoid cross entropy given `logits`.
Measures the weighted probability error in discrete classification tasks in
which classes are independent and not mutually exclusive. For instance, one
could perform multilabel classification where a picture can contain both an
elephant and a dog at the same time. The class weight multiplies the
different types of errors.
For brevity, let `x = logits`, `z = labels`, `c = positive_weights`,
  `d = negative_weights`. The weighted logistic loss is
```
c * z * -log(sigmoid(x)) + d * (1 - z) * -log(1 - sigmoid(x))
= c * z * -log(1 / (1 + exp(-x))) - d * (1 - z) * log(exp(-x) / (1 + exp(-x)))
= c * z * log(1 + exp(-x)) + d * (1 - z) * (-log(exp(-x)) + log(1 + exp(-x)))
= c * z * log(1 + exp(-x)) + d * (1 - z) * (x + log(1 + exp(-x)))
= (1 - z) * x * d + (1 - z + c * z ) * log(1 + exp(-x))
= - d * x * z + d * x + (d - d * z + c * z ) * log(1 + exp(-x))
```
To ensure stability and avoid overflow, the implementation uses the identity
log(1 + exp(-x)) = max(0,-x) + log(1 + exp(-abs(x)))
and the result is computed as
```
= -d * x * z + d * x
+ (d - d * z + c * z ) * (max(0,-x) + log(1 + exp(-abs(x))))
```
Note that the loss is NOT an upper bound on the 0-1 loss, unless it is divided
by log(2).
Args:
labels: A `Tensor` of type `float32` or `float64`. `labels` can be a 2D
tensor with shape [batch_size, num_labels] or a 3D tensor with shape
[batch_size, num_labels, K].
logits: A `Tensor` of the same type and shape as `labels`. If `logits` has
shape [batch_size, num_labels, K], the loss is computed separately on each
slice [:, :, k] of `logits`.
positive_weights: A `Tensor` that holds positive weights and has the
following semantics according to its shape:
scalar - A global positive weight.
1D tensor - must be of size K, a weight for each 'attempt'
2D tensor - of size [num_labels, K'] where K' is either K or 1.
The `positive_weights` will be expanded to the left to match the
dimensions of logits and labels.
negative_weights: A `Tensor` that holds positive weight and has the
semantics identical to positive_weights.
name: A name for the operation (optional).
Returns:
A `Tensor` of the same shape as `logits` with the componentwise
weighted logistic losses.
"""
  with tf.name_scope(
      name,
      'weighted_logistic_loss',
      [logits, labels, positive_weights, negative_weights]) as name:
    labels, logits, positive_weights, negative_weights = prepare_loss_args(
        labels, logits, positive_weights, negative_weights)

    softplus_term = tf.add(tf.maximum(-logits, 0.0),
                           tf.log(1.0 + tf.exp(-tf.abs(logits))))
    weight_dependent_factor = (
        negative_weights + (positive_weights - negative_weights) * labels)
    return (negative_weights * (logits - labels * logits) +
            weight_dependent_factor * softplus_term)
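
# Illustrative numeric check of the docstring derivation above (an editor's
# sketch, not part of the original module): with scalar weights c and d, the
# weighted loss should equal c*z*softplus(-x) + d*(1-z)*softplus(x), and the
# stabilized form computed by the function should match it. Assumes NumPy is
# available; run it separately, e.g. pasted into a Python shell.
#
#   import numpy as np
#   softplus = lambda t: max(t, 0) + np.log1p(np.exp(-abs(t)))
#   x, c, d = 3.7, 2.0, 0.5
#   for z in (0.0, 1.0):
#     naive = c * z * softplus(-x) + d * (1 - z) * softplus(x)
#     stable = (d * (x - z * x) +
#               (d + (c - d) * z) * (max(-x, 0) + np.log1p(np.exp(-abs(x)))))
#     assert abs(naive - stable) < 1e-12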


def weighted_hinge_loss(labels,
                        logits,
                        positive_weights=1.0,
                        negative_weights=1.0,
                        name=None):
"""Computes weighted hinge loss given logits `logits`.
The loss applies to multi-label classification tasks where labels are
independent and not mutually exclusive. See also
`weighted_sigmoid_cross_entropy_with_logits`.
Args:
labels: A `Tensor` of type `float32` or `float64`. Each entry must be
either 0 or 1. `labels` can be a 2D tensor with shape
[batch_size, num_labels] or a 3D tensor with shape
[batch_size, num_labels, K].
logits: A `Tensor` of the same type and shape as `labels`. If `logits` has
shape [batch_size, num_labels, K], the loss is computed separately on each
slice [:, :, k] of `logits`.
positive_weights: A `Tensor` that holds positive weights and has the
following semantics according to its shape:
scalar - A global positive weight.
1D tensor - must be of size K, a weight for each 'attempt'
2D tensor - of size [num_labels, K'] where K' is either K or 1.
The `positive_weights` will be expanded to the left to match the
dimensions of logits and labels.
negative_weights: A `Tensor` that holds positive weight and has the
semantics identical to positive_weights.
name: A name for the operation (optional).
Returns:
A `Tensor` of the same shape as `logits` with the componentwise
weighted hinge loss.
"""
with
tf
.
name_scope
(
name
,
'weighted_hinge_loss'
,
[
logits
,
labels
,
positive_weights
,
negative_weights
])
as
name
:
labels
,
logits
,
positive_weights
,
negative_weights
=
prepare_loss_args
(
labels
,
logits
,
positive_weights
,
negative_weights
)
positives_term
=
positive_weights
*
labels
*
tf
.
maximum
(
1.0
-
logits
,
0
)
negatives_term
=
(
negative_weights
*
(
1.0
-
labels
)
*
tf
.
maximum
(
1.0
+
logits
,
0
))
return
positives_term
+
negatives_term
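
# Illustrative sketch (hypothetical shapes): with `logits` of shape
# [batch_size, num_labels, K], a length-K weight vector assigns a separate
# positive weight to each of the K 'attempts'.
#
#   # logits: [batch_size, num_labels, 3], labels: [batch_size, num_labels]
#   loss = weighted_hinge_loss(
#       labels, logits, positive_weights=[1.0, 2.0, 4.0], negative_weights=1.0)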


def weighted_surrogate_loss(labels,
                            logits,
                            surrogate_type='xent',
                            positive_weights=1.0,
                            negative_weights=1.0,
                            name=None):
  """Returns either weighted cross-entropy or hinge loss.

  For example, if `surrogate_type` is 'xent', this returns the weighted
  cross-entropy loss.

  Args:
    labels: A `Tensor` of type `float32` or `float64`. Each entry must be
      between 0 and 1. `labels` can be a 2D tensor with shape
      [batch_size, num_labels] or a 3D tensor with shape
      [batch_size, num_labels, K].
    logits: A `Tensor` of the same type and shape as `labels`. If `logits` has
      shape [batch_size, num_labels, K], each slice [:, :, k] represents an
      'attempt' to predict `labels` and the loss is computed per slice.
    surrogate_type: A string that determines which loss to return, supports
      'xent' for cross-entropy and 'hinge' for hinge loss.
    positive_weights: A `Tensor` that holds positive weights and has the
      following semantics according to its shape:
        scalar - A global positive weight.
        1D tensor - must be of size K, a weight for each 'attempt'.
        2D tensor - of size [num_labels, K'] where K' is either K or 1.
      The `positive_weights` will be expanded to the left to match the
      dimensions of `logits` and `labels`.
    negative_weights: A `Tensor` that holds negative weights with semantics
      identical to `positive_weights`.
    name: A name for the operation (optional).

  Returns:
    The weighted loss.

  Raises:
    ValueError: If value of `surrogate_type` is not supported.
  """
  with tf.name_scope(
      name, 'weighted_loss',
      [logits, labels, surrogate_type, positive_weights,
       negative_weights]) as name:
    if surrogate_type == 'xent':
      return weighted_sigmoid_cross_entropy_with_logits(
          logits=logits,
          labels=labels,
          positive_weights=positive_weights,
          negative_weights=negative_weights,
          name=name)
    elif surrogate_type == 'hinge':
      return weighted_hinge_loss(
          logits=logits,
          labels=labels,
          positive_weights=positive_weights,
          negative_weights=negative_weights,
          name=name)
    raise ValueError('surrogate_type %s not supported.' % surrogate_type)
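
# Illustrative sketch (not part of the library): the same call site can switch
# between the two surrogates above with a single string argument.
#
#   xent = weighted_surrogate_loss(labels, logits, surrogate_type='xent')
#   hinge = weighted_surrogate_loss(labels, logits, surrogate_type='hinge')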


def expand_outer(tensor, rank):
  """Expands the given `Tensor` outwards to a target rank.

  For example, if rank = 3 and tensor.shape is [3, 4], this function will
  expand the tensor such that the resulting shape will be [1, 3, 4].

  Args:
    tensor: The tensor to expand.
    rank: The target dimension.

  Returns:
    The expanded tensor.

  Raises:
    ValueError: If rank of `tensor` is unknown, or if `rank` is smaller than
      the rank of `tensor`.
  """
  if tensor.get_shape().ndims is None:
    raise ValueError('tensor dimension must be known.')
  if len(tensor.get_shape()) > rank:
    raise ValueError(
        '`rank` must be at least the current tensor dimension: (%s vs %s).' %
        (rank, len(tensor.get_shape())))
  while len(tensor.get_shape()) < rank:
    tensor = tf.expand_dims(tensor, 0)
  return tensor
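
# Illustrative sketch: a [3, 4] tensor expanded to rank 3 gains a leading unit
# dimension, matching how the loss functions broadcast their weight arguments.
#
#   t = tf.ones([3, 4])
#   expanded = expand_outer(t, 3)  # expanded has shape [1, 3, 4]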


def build_label_priors(labels,
                       weights=None,
                       positive_pseudocount=1.0,
                       negative_pseudocount=1.0,
                       variables_collections=None):
  """Creates an op to maintain and update label prior probabilities.

  For each label, the label priors are estimated as
      (P + sum_i w_i y_i) / (P + N + sum_i w_i),
  where y_i is the ith label, w_i is the ith weight, P is a pseudo-count of
  positive labels, and N is a pseudo-count of negative labels. The index i
  ranges over all labels observed during all evaluations of the returned op.

  Args:
    labels: A `Tensor` with shape [batch_size, num_labels]. Entries should be
      in [0, 1].
    weights: Coefficients representing the weight of each label. Must be
      either a Tensor of shape [batch_size, num_labels] or `None`, in which
      case each weight is treated as 1.0.
    positive_pseudocount: Number of positive labels used to initialize the
      label priors.
    negative_pseudocount: Number of negative labels used to initialize the
      label priors.
    variables_collections: Optional list of collections for created variables.

  Returns:
    label_priors: An op to update the weighted label_priors. Gives the
      current value of the label priors when evaluated.
  """
  dtype = labels.dtype.base_dtype
  num_labels = get_num_labels(labels)

  if weights is None:
    weights = tf.ones_like(labels)

  # We disable partitioning while constructing dual variables because they
  # will be updated with assign, which is not available for partitioned
  # variables.
  partitioner = tf.get_variable_scope().partitioner
  try:
    tf.get_variable_scope().set_partitioner(None)

    # Create variable and update op for weighted label counts.
    weighted_label_counts = tf.contrib.framework.model_variable(
        name='weighted_label_counts',
        shape=[num_labels],
        dtype=dtype,
        initializer=tf.constant_initializer(
            [positive_pseudocount] * num_labels, dtype=dtype),
        collections=variables_collections,
        trainable=False)
    weighted_label_counts_update = weighted_label_counts.assign_add(
        tf.reduce_sum(weights * labels, 0))

    # Create variable and update op for the sum of the weights.
    weight_sum = tf.contrib.framework.model_variable(
        name='weight_sum',
        shape=[num_labels],
        dtype=dtype,
        initializer=tf.constant_initializer(
            [positive_pseudocount + negative_pseudocount] * num_labels,
            dtype=dtype),
        collections=variables_collections,
        trainable=False)
    weight_sum_update = weight_sum.assign_add(tf.reduce_sum(weights, 0))
  finally:
    tf.get_variable_scope().set_partitioner(partitioner)

  label_priors = tf.div(weighted_label_counts_update, weight_sum_update)
  return label_priors
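
# Illustrative sketch (assumes a TF1-style session loop; `batch_labels` and
# `num_batches` are hypothetical): each evaluation of the returned op folds
# the current batch into the running estimate of the label priors.
#
#   label_priors = build_label_priors(batch_labels)
#   with tf.Session() as sess:
#     sess.run(tf.global_variables_initializer())
#     for _ in range(num_batches):
#       current_priors = sess.run(label_priors)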


def convert_and_cast(value, name, dtype):
  """Convert input to tensor and cast to dtype.

  Args:
    value: An object whose type has a registered Tensor conversion function,
      e.g. python numerical type or numpy array.
    name: Name to use for the new Tensor, if one is created.
    dtype: Optional element type for the returned tensor.

  Returns:
    A tensor.
  """
  return tf.cast(tf.convert_to_tensor(value, name=name), dtype=dtype)


def prepare_loss_args(labels, logits, positive_weights, negative_weights):
  """Prepares arguments for weighted loss functions.

  If needed, converts the given arguments to the appropriate type and shape.

  Args:
    labels: Labels of the loss function.
    logits: Logits of the loss function.
    positive_weights: Weight on the positive examples.
    negative_weights: Weight on the negative examples.

  Returns:
    Converted labels, logits, positive_weights, negative_weights.
  """
  logits = tf.convert_to_tensor(logits, name='logits')
  labels = convert_and_cast(labels, 'labels', logits.dtype)
  if len(labels.get_shape()) == 2 and len(logits.get_shape()) == 3:
    labels = tf.expand_dims(labels, [2])

  positive_weights = convert_and_cast(positive_weights, 'positive_weights',
                                      logits.dtype)
  positive_weights = expand_outer(positive_weights, logits.get_shape().ndims)
  negative_weights = convert_and_cast(negative_weights, 'negative_weights',
                                      logits.dtype)
  negative_weights = expand_outer(negative_weights, logits.get_shape().ndims)
  return labels, logits, positive_weights, negative_weights


def get_num_labels(labels_or_logits):
  """Returns the number of labels inferred from labels_or_logits."""
  if labels_or_logits.get_shape().ndims <= 1:
    return 1
  return labels_or_logits.get_shape()[1].value
research/global_objectives/util_test.py
0 → 100644
View file @
6c6f3f3a
# Copyright 2018 The TensorFlow Global Objectives Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for global objectives util functions."""
# Dependency imports
from absl.testing import parameterized
import numpy as np
import tensorflow as tf

from global_objectives import util


def weighted_sigmoid_cross_entropy(targets, logits, weight):
  return (weight * targets * np.log(1.0 + np.exp(-logits)) +
          ((1.0 - targets) * np.log(1.0 + 1.0 / np.exp(-logits))))


def hinge_loss(labels, logits):
  # Mostly copied from tensorflow.python.ops.losses but with loss per
  # datapoint.
  labels = tf.to_float(labels)
  all_ones = tf.ones_like(labels)
  labels = tf.subtract(2 * labels, all_ones)
  return tf.nn.relu(tf.subtract(all_ones, tf.multiply(labels, logits)))


class WeightedSigmoidCrossEntropyTest(parameterized.TestCase,
                                      tf.test.TestCase):

  def testTrivialCompatibilityWithSigmoidCrossEntropy(self):
    """Tests compatibility with unweighted function with weight 1.0."""
    x_shape = [300, 10]
    targets = np.random.random_sample(x_shape).astype(np.float32)
    logits = np.random.randn(*x_shape).astype(np.float32)
    weighted_loss = util.weighted_sigmoid_cross_entropy_with_logits(
        targets, logits)
    expected_loss = (
        tf.contrib.nn.deprecated_flipped_sigmoid_cross_entropy_with_logits(
            logits, targets))
    with self.test_session():
      self.assertAllClose(expected_loss.eval(),
                          weighted_loss.eval(),
                          atol=0.000001)

  def testNonTrivialCompatibilityWithSigmoidCrossEntropy(self):
    """Tests use of an arbitrary weight (4.12)."""
    x_shape = [300, 10]
    targets = np.random.random_sample(x_shape).astype(np.float32)
    logits = np.random.randn(*x_shape).astype(np.float32)
    weight = 4.12
    weighted_loss = util.weighted_sigmoid_cross_entropy_with_logits(
        targets, logits, weight, weight)
    expected_loss = (
        weight *
        tf.contrib.nn.deprecated_flipped_sigmoid_cross_entropy_with_logits(
            logits, targets))
    with self.test_session():
      self.assertAllClose(expected_loss.eval(),
                          weighted_loss.eval(),
                          atol=0.000001)

  def testDifferentSizeWeightedSigmoidCrossEntropy(self):
    """Tests correctness on 3D tensors.

    Tests that the function works as expected when logits is a 3D tensor and
    targets is a 2D tensor.
    """
    targets_shape = [30, 4]
    logits_shape = [targets_shape[0], targets_shape[1], 3]
    targets = np.random.random_sample(targets_shape).astype(np.float32)
    logits = np.random.randn(*logits_shape).astype(np.float32)

    weight_vector = [2.0, 3.0, 13.0]
    loss = util.weighted_sigmoid_cross_entropy_with_logits(targets, logits,
                                                           weight_vector)

    with self.test_session():
      loss = loss.eval()
      for i in range(0, len(weight_vector)):
        expected = weighted_sigmoid_cross_entropy(targets, logits[:, :, i],
                                                  weight_vector[i])
        self.assertAllClose(loss[:, :, i], expected, atol=0.000001)

  @parameterized.parameters((300, 10, 0.3), (20, 4, 2.0), (30, 4, 3.9))
  def testWeightedSigmoidCrossEntropy(self, batch_size, num_labels, weight):
    """Tests that the tf and numpy functions agree on many instances."""
    x_shape = [batch_size, num_labels]
    targets = np.random.random_sample(x_shape).astype(np.float32)
    logits = np.random.randn(*x_shape).astype(np.float32)

    with self.test_session():
      loss = util.weighted_sigmoid_cross_entropy_with_logits(
          targets, logits, weight, 1.0, name='weighted-loss')
      expected = weighted_sigmoid_cross_entropy(targets, logits, weight)
      self.assertAllClose(expected, loss.eval(), atol=0.000001)

  def testGradients(self):
    """Tests that weighted loss gradients behave as expected."""
    dummy_tensor = tf.constant(1.0)

    positives_shape = [10, 1]
    positives_logits = dummy_tensor * tf.Variable(
        tf.random_normal(positives_shape) + 1.0)
    positives_targets = tf.ones(positives_shape)
    positives_weight = 4.6
    positives_loss = (
        tf.contrib.nn.deprecated_flipped_sigmoid_cross_entropy_with_logits(
            positives_logits, positives_targets) * positives_weight)

    negatives_shape = [190, 1]
    negatives_logits = dummy_tensor * tf.Variable(
        tf.random_normal(negatives_shape))
    negatives_targets = tf.zeros(negatives_shape)
    negatives_weight = 0.9
    negatives_loss = (
        tf.contrib.nn.deprecated_flipped_sigmoid_cross_entropy_with_logits(
            negatives_logits, negatives_targets) * negatives_weight)

    all_logits = tf.concat([positives_logits, negatives_logits], 0)
    all_targets = tf.concat([positives_targets, negatives_targets], 0)
    weighted_loss = tf.reduce_sum(
        util.weighted_sigmoid_cross_entropy_with_logits(
            all_targets, all_logits, positives_weight, negatives_weight))
    weighted_gradients = tf.gradients(weighted_loss, dummy_tensor)

    expected_loss = tf.add(
        tf.reduce_sum(positives_loss),
        tf.reduce_sum(negatives_loss))
    expected_gradients = tf.gradients(expected_loss, dummy_tensor)

    with tf.Session() as session:
      tf.global_variables_initializer().run()
      grad, expected_grad = session.run(
          [weighted_gradients, expected_gradients])
      self.assertAllClose(grad, expected_grad)

  def testDtypeFlexibility(self):
    """Tests the loss on inputs of varying data types."""
    shape = [20, 3]
    logits = np.random.randn(*shape)
    targets = tf.truncated_normal(shape)
    positive_weights = tf.constant(3, dtype=tf.int64)
    negative_weights = 1

    loss = util.weighted_sigmoid_cross_entropy_with_logits(
        targets, logits, positive_weights, negative_weights)

    with self.test_session():
      self.assertEqual(loss.eval().dtype, np.float)


class WeightedHingeLossTest(tf.test.TestCase):

  def testTrivialCompatibilityWithHinge(self):
    # Tests compatibility with unweighted hinge loss.
    x_shape = [55, 10]
    logits = tf.constant(np.random.randn(*x_shape).astype(np.float32))
    targets = tf.to_float(tf.constant(np.random.random_sample(x_shape) > 0.3))
    weighted_loss = util.weighted_hinge_loss(targets, logits)
    expected_loss = hinge_loss(targets, logits)
    with self.test_session():
      self.assertAllClose(expected_loss.eval(), weighted_loss.eval())

  def testLessTrivialCompatibilityWithHinge(self):
    # Tests compatibility with a constant weight for positives and negatives.
    x_shape = [56, 11]
    logits = tf.constant(np.random.randn(*x_shape).astype(np.float32))
    targets = tf.to_float(tf.constant(np.random.random_sample(x_shape) > 0.7))
    weight = 1.0 + 1.0 / 2 + 1.0 / 3 + 1.0 / 4 + 1.0 / 5 + 1.0 / 6 + 1.0 / 7
    weighted_loss = util.weighted_hinge_loss(targets, logits, weight, weight)
    expected_loss = hinge_loss(targets, logits) * weight
    with self.test_session():
      self.assertAllClose(expected_loss.eval(), weighted_loss.eval())

  def testNontrivialCompatibilityWithHinge(self):
    # Tests compatibility with different positive and negative weights.
    x_shape = [23, 8]
    logits_positives = tf.constant(np.random.randn(
        *x_shape).astype(np.float32))
    logits_negatives = tf.constant(np.random.randn(
        *x_shape).astype(np.float32))
    targets_positives = tf.ones(x_shape)
    targets_negatives = tf.zeros(x_shape)
    logits = tf.concat([logits_positives, logits_negatives], 0)
    targets = tf.concat([targets_positives, targets_negatives], 0)

    raw_loss = util.weighted_hinge_loss(targets,
                                        logits,
                                        positive_weights=3.4,
                                        negative_weights=1.2)
    loss = tf.reduce_sum(raw_loss, 0)
    positives_hinge = hinge_loss(targets_positives, logits_positives)
    negatives_hinge = hinge_loss(targets_negatives, logits_negatives)
    expected = tf.add(tf.reduce_sum(3.4 * positives_hinge, 0),
                      tf.reduce_sum(1.2 * negatives_hinge, 0))

    with self.test_session():
      self.assertAllClose(loss.eval(), expected.eval())

  def test3DLogitsAndTargets(self):
    # Tests correctness when logits is 3D and targets is 2D.
    targets_shape = [30, 4]
    logits_shape = [targets_shape[0], targets_shape[1], 3]
    targets = tf.to_float(
        tf.constant(np.random.random_sample(targets_shape) > 0.7))
    logits = tf.constant(np.random.randn(*logits_shape).astype(np.float32))
    weight_vector = [1.0, 1.0, 1.0]
    loss = util.weighted_hinge_loss(targets, logits, weight_vector)

    with self.test_session():
      loss_value = loss.eval()
      for i in range(len(weight_vector)):
        expected = hinge_loss(targets, logits[:, :, i]).eval()
        self.assertAllClose(loss_value[:, :, i], expected)


class BuildLabelPriorsTest(tf.test.TestCase):

  def testLabelPriorConsistency(self):
    # Checks that, with zero pseudocounts, the returned label priors reproduce
    # label frequencies in the batch.
    batch_shape = [4, 10]
    labels = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.678)))

    label_priors_update = util.build_label_priors(
        labels=labels, positive_pseudocount=0, negative_pseudocount=0)
    expected_priors = tf.reduce_mean(labels, 0)

    with self.test_session():
      tf.global_variables_initializer().run()
      self.assertAllClose(label_priors_update.eval(), expected_priors.eval())

  def testLabelPriorsUpdate(self):
    # Checks that the update of label priors behaves as expected.
    batch_shape = [1, 5]
    labels = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.4)))
    label_priors_update = util.build_label_priors(labels)

    label_sum = np.ones(shape=batch_shape)
    weight_sum = 2.0 * np.ones(shape=batch_shape)

    with self.test_session() as session:
      tf.global_variables_initializer().run()

      for _ in range(3):
        label_sum += labels.eval()
        weight_sum += np.ones(shape=batch_shape)
        expected_posteriors = label_sum / weight_sum
        label_priors = label_priors_update.eval().reshape(batch_shape)
        self.assertAllClose(label_priors, expected_posteriors)

        # Re-initialize labels to get a new random sample.
        session.run(labels.initializer)

  def testLabelPriorsUpdateWithWeights(self):
    # Checks the update of label priors with per-example weights.
    batch_size = 6
    num_labels = 5
    batch_shape = [batch_size, num_labels]
    labels = tf.Variable(
        tf.to_float(tf.greater(tf.random_uniform(batch_shape), 0.6)))
    weights = tf.Variable(tf.random_uniform(batch_shape) * 6.2)

    update_op = util.build_label_priors(labels, weights=weights)

    expected_weighted_label_counts = 1.0 + tf.reduce_sum(weights * labels, 0)
    expected_weight_sum = 2.0 + tf.reduce_sum(weights, 0)
    expected_label_posteriors = tf.divide(expected_weighted_label_counts,
                                          expected_weight_sum)

    with self.test_session() as session:
      tf.global_variables_initializer().run()
      updated_priors, expected_posteriors = session.run(
          [update_op, expected_label_posteriors])
      self.assertAllClose(updated_priors, expected_posteriors)


class WeightedSurrogateLossTest(parameterized.TestCase, tf.test.TestCase):

  @parameterized.parameters(
      ('hinge', util.weighted_hinge_loss),
      ('xent', util.weighted_sigmoid_cross_entropy_with_logits))
  def testCompatibilityLoss(self, loss_name, loss_fn):
    x_shape = [28, 4]
    logits = tf.constant(np.random.randn(*x_shape).astype(np.float32))
    targets = tf.to_float(tf.constant(np.random.random_sample(x_shape) > 0.5))
    positive_weights = 0.66
    negative_weights = 11.1
    expected_loss = loss_fn(
        targets,
        logits,
        positive_weights=positive_weights,
        negative_weights=negative_weights)
    computed_loss = util.weighted_surrogate_loss(
        targets,
        logits,
        loss_name,
        positive_weights=positive_weights,
        negative_weights=negative_weights)
    with self.test_session():
      self.assertAllClose(expected_loss.eval(), computed_loss.eval())

  def testSurrogateError(self):
    x_shape = [7, 3]
    logits = tf.constant(np.random.randn(*x_shape).astype(np.float32))
    targets = tf.to_float(tf.constant(np.random.random_sample(x_shape) > 0.5))

    with self.assertRaises(ValueError):
      util.weighted_surrogate_loss(logits, targets, 'bug')


if __name__ == '__main__':
  tf.test.main()