Merge pull request #5992 from npapernot/master

remove code that was migrated to tf/privacy

Merge pull request #5992 from npapernot/master
remove code that was migrated to tf/privacy
fd2802cf · Ilya Mironov · GitHub · fc2f94b3 · 63c0aa5f · fd2802cf
Unverified Commit fd2802cf authored Jan 04, 2019 by Ilya Mironov Committed by GitHub Jan 04, 2019
16 changed files
--- a/research/differential_privacy/README.md
+++ b/research/differential_privacy/README.md
-<font size=4><b>Deep Learning with Differential Privacy</b></font>
+# Deep Learning with Differential Privacy
-Open Sourced By: Xin Pan
-### Introduction for [dp_sgd/README.md](dp_sgd/README.md)
-Machine learning techniques based on neural networks are achieving remarkable 
-results in a wide variety of domains. Often, the training of models requires 
-large, representative datasets, which may be crowdsourced and contain sensitive 
-information. The models should not expose private information in these datasets. 
-Addressing this goal, we develop new algorithmic techniques for learning and a 
-refined analysis of privacy costs within the framework of differential privacy. 
-Our implementation and experiments demonstrate that we can train deep neural 
-networks with non-convex objectives, under a modest privacy budget, and at a 
-manageable cost in software complexity, training efficiency, and model quality.
-paper: https://arxiv.org/abs/1607.00133
+Most of the content from this directory has moved to the [tensorflow/privacy](https://github.com/tensorflow/privacy) repository, which is dedicated to learning with (differential) privacy. The remaining code is related to the PATE papers from ICLR 2017 and 2018.
 ### Introduction for [multiple_teachers/README.md](multiple_teachers/README.md)
@@ -29,3 +13,10 @@ private manner by noisily aggregating the teacher decisions before feeding them
 to the student during training.
 paper: https://arxiv.org/abs/1610.05755
+### Introduction for [pate/README.md](pate/README.md)
+Implementation of an RDP privacy accountant and smooth sensitivity analysis for the PATE framework. The underlying theory and supporting experiments appear in "Scalable Private Learning with PATE" by Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Ulfar Erlingsson (ICLR 2018) 
+paper: https://arxiv.org/abs/1802.08908
--- a/research/differential_privacy/dp_sgd/README.md
+++ b/research/differential_privacy/dp_sgd/README.md
-<font size=4><b>Deep Learning with Differential Privacy</b></font>
-Authors:
-Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang
-Open Sourced By: Xin Pan (xpan@google.com, github: panyx0718)
-<Introduction>
-Machine learning techniques based on neural networks are achieving remarkable
-results in a wide variety of domains. Often, the training of models requires
-large, representative datasets, which may be crowdsourced and contain sensitive
-information. The models should not expose private information in these datasets.
-Addressing this goal, we develop new algorithmic techniques for learning and a
-refined analysis of privacy costs within the framework of differential privacy.
-Our implementation and experiments demonstrate that we can train deep neural
-networks with non-convex objectives, under a modest privacy budget, and at a
-manageable cost in software complexity, training efficiency, and model quality.
-paper: https://arxiv.org/abs/1607.00133
-<b>Requirements:</b>
-1. Tensorflow 0.10.0 (master branch)
-Note: r0.11 might experience some problems
-2. Bazel 0.3.1 (<em>Optional</em>)
-3. Download MNIST data (tfrecord format) <br>
-   ```shell
-   cd models/research/slim
-   DATA_DIR=/tmp/mnist/
-   mkdir /tmp/mnist
-   python download_and_convert_data.py --dataset_name=mnist --dataset_dir="${DATA_DIR}"
-   ```
-<b>How to run:</b>
-```shell
-# Clone the codes under differential_privacy.
-# Create an empty WORKSPACE file.
-# List the codes (Optional).
-$ ls -R differential_privacy/
-differential_privacy/:
-dp_sgd  __init__.py  privacy_accountant  README.md
-differential_privacy/dp_sgd:
-dp_mnist  dp_optimizer  per_example_gradients  README.md
-differential_privacy/dp_sgd/dp_mnist:
-BUILD  dp_mnist.py
-differential_privacy/dp_sgd/dp_optimizer:
-BUILD  dp_optimizer.py  dp_pca.py  sanitizer.py  utils.py
-differential_privacy/dp_sgd/per_example_gradients:
-BUILD  per_example_gradients.py
-differential_privacy/privacy_accountant:
-python  tf
-differential_privacy/privacy_accountant/python:
-BUILD  gaussian_moments.py
-differential_privacy/privacy_accountant/tf:
-accountant.py  accountant_test.py  BUILD
-# List the data (optional).
-$ mv /tmp/mnist/mnist_train.tfrecord data
-$ mv /tmp/mnist/mnist_test.tfrecord data
-$ ls -R data/
-./data:
-mnist_test.tfrecord  mnist_train.tfrecord
-# Build the codes (optional).
-$ bazel build -c opt differential_privacy/...
-# Run the mnist differential privacy training codes.
-# 1. With bazel
-$ bazel-bin/differential_privacy/dp_sgd/dp_mnist/dp_mnist \
-    --training_data_path=data/mnist_train.tfrecord \
-    --eval_data_path=data/mnist_test.tfrecord \
-    --save_path=/tmp/mnist_dir
-# 2. Or without (by default data is in /tmp/mnist)
-python dp_sgd/dp_mnist/dp_mnist.py  
-...
-step: 1
-step: 2
-...
-step: 9
-spent privacy: eps 0.1250 delta 0.72709
-spent privacy: eps 0.2500 delta 0.24708
-spent privacy: eps 0.5000 delta 0.0029139
-spent privacy: eps 1.0000 delta 6.494e-10
-spent privacy: eps 2.0000 delta 8.2242e-24
-spent privacy: eps 4.0000 delta 1.319e-51
-spent privacy: eps 8.0000 delta 3.3927e-107
-train_accuracy: 0.53
-eval_accuracy: 0.53
-...
-$ ls /tmp/mnist_dir/
-checkpoint  ckpt  ckpt.meta  results-0.json
-```
--- a/research/differential_privacy/dp_sgd/dp_mnist/BUILD
+++ b/research/differential_privacy/dp_sgd/dp_mnist/BUILD
-package(default_visibility = [":internal"])
-licenses(["notice"])  # Apache 2.0
-exports_files(["LICENSE"])
-package_group(
-    name = "internal",
-    packages = [
-        "//differential_privacy/...",
-    ],
-)
-py_binary(
-    name = "dp_mnist",
-    srcs = [
-        "dp_mnist.py",
-    ],
-    deps = [
-        "//differential_privacy/dp_sgd/dp_optimizer",
-        "//differential_privacy/dp_sgd/dp_optimizer:dp_pca",
-        "//differential_privacy/dp_sgd/dp_optimizer:utils",
-    ],
-)
--- a/research/differential_privacy/dp_sgd/dp_mnist/dp_mnist.py
+++ b/research/differential_privacy/dp_sgd/dp_mnist/dp_mnist.py
--- a/research/differential_privacy/dp_sgd/dp_optimizer/BUILD
+++ b/research/differential_privacy/dp_sgd/dp_optimizer/BUILD
-package(default_visibility = [":internal"])
-licenses(["notice"])  # Apache 2.0
-exports_files(["LICENSE"])
-package_group(
-    name = "internal",
-    packages = [
-        "//differential_privacy/...",
-    ],
-)
-py_library(
-    name = "utils",
-    srcs = [
-        "utils.py",
-    ],
-    deps = [
-    ],
-)
-py_library(
-    name = "dp_pca",
-    srcs = [
-        "dp_pca.py",
-    ],
-    deps = [
-    ],
-)
-py_library(
-    name = "dp_optimizer",
-    srcs = [
-        "dp_optimizer.py",
-        "sanitizer.py",
-    ],
-    deps = [
-        ":utils",
-        "//differential_privacy/dp_sgd/per_example_gradients",
-        "//differential_privacy/privacy_accountant/tf:accountant",
-    ],
-)
--- a/research/differential_privacy/dp_sgd/dp_optimizer/dp_optimizer.py
+++ b/research/differential_privacy/dp_sgd/dp_optimizer/dp_optimizer.py
-# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Differentially private optimizers.
-"""
-from __future__ import division
-import tensorflow as tf
-from differential_privacy.dp_sgd.dp_optimizer import utils
-from differential_privacy.dp_sgd.per_example_gradients import per_example_gradients
-class DPGradientDescentOptimizer(tf.train.GradientDescentOptimizer):
-  """Differentially private gradient descent optimizer.
-  """
-  def __init__(self, learning_rate, eps_delta, sanitizer,
-               sigma=None, use_locking=False, name="DPGradientDescent",
-               batches_per_lot=1):
-    """Construct a differentially private gradient descent optimizer.
-    The optimizer uses fixed privacy budget for each batch of training.
-    Args:
-      learning_rate: for GradientDescentOptimizer.
-      eps_delta: EpsDelta pair for each epoch.
-      sanitizer: for sanitizing the graident.
-      sigma: noise sigma. If None, use eps_delta pair to compute sigma;
-        otherwise use supplied sigma directly.
-      use_locking: use locking.
-      name: name for the object.
-      batches_per_lot: Number of batches in a lot.
-    """
-    super(DPGradientDescentOptimizer, self).__init__(learning_rate,
-                                                     use_locking, name)
-    # Also, if needed, define the gradient accumulators
-    self._batches_per_lot = batches_per_lot
-    self._grad_accum_dict = {}
-    if batches_per_lot > 1:
-      self._batch_count = tf.Variable(1, dtype=tf.int32, trainable=False,
-                                      name="batch_count")
-      var_list = tf.trainable_variables()
-      with tf.variable_scope("grad_acc_for"):
-        for var in var_list:
-          v_grad_accum = tf.Variable(tf.zeros_like(var),
-                                     trainable=False,
-                                     name=utils.GetTensorOpName(var))
-          self._grad_accum_dict[var.name] = v_grad_accum
-    self._eps_delta = eps_delta
-    self._sanitizer = sanitizer
-    self._sigma = sigma
-  def compute_sanitized_gradients(self, loss, var_list=None,
-                                  add_noise=True):
-    """Compute the sanitized gradients.
-    Args:
-      loss: the loss tensor.
-      var_list: the optional variables.
-      add_noise: if true, then add noise. Always clip.
-    Returns:
-      a pair of (list of sanitized gradients) and privacy spending accumulation
-      operations.
-    Raises:
-      TypeError: if var_list contains non-variable.
-    """
-    self._assert_valid_dtypes([loss])
-    xs = [tf.convert_to_tensor(x) for x in var_list]
-    px_grads = per_example_gradients.PerExampleGradients(loss, xs)
-    sanitized_grads = []
-    for px_grad, v in zip(px_grads, var_list):
-      tensor_name = utils.GetTensorOpName(v)
-      sanitized_grad = self._sanitizer.sanitize(
-          px_grad, self._eps_delta, sigma=self._sigma,
-          tensor_name=tensor_name, add_noise=add_noise,
-          num_examples=self._batches_per_lot * tf.slice(
-              tf.shape(px_grad), [0], [1]))
-      sanitized_grads.append(sanitized_grad)
-    return sanitized_grads
-  def minimize(self, loss, global_step=None, var_list=None,
-               name=None):
-    """Minimize using sanitized gradients.
-    This gets a var_list which is the list of trainable variables.
-    For each var in var_list, we defined a grad_accumulator variable
-    during init. When batches_per_lot > 1, we accumulate the gradient
-    update in those. At the end of each lot, we apply the update back to
-    the variable. This has the effect that for each lot we compute
-    gradients at the point at the beginning of the lot, and then apply one
-    update at the end of the lot. In other words, semantically, we are doing
-    SGD with one lot being the equivalent of one usual batch of size
-    batch_size * batches_per_lot.
-    This allows us to simulate larger batches than our memory size would permit.
-    The lr and the num_steps are in the lot world.
-    Args:
-      loss: the loss tensor.
-      global_step: the optional global step.
-      var_list: the optional variables.
-      name: the optional name.
-    Returns:
-      the operation that runs one step of DP gradient descent.
-    """
-    # First validate the var_list
-    if var_list is None:
-      var_list = tf.trainable_variables()
-    for var in var_list:
-      if not isinstance(var, tf.Variable):
-        raise TypeError("Argument is not a variable.Variable: %s" % var)
-    # Modification: apply gradient once every batches_per_lot many steps.
-    # This may lead to smaller error
-    if self._batches_per_lot == 1:
-      sanitized_grads = self.compute_sanitized_gradients(
-          loss, var_list=var_list)
-      grads_and_vars = list(zip(sanitized_grads, var_list))
-      self._assert_valid_dtypes([v for g, v in grads_and_vars if g is not None])
-      apply_grads = self.apply_gradients(grads_and_vars,
-                                         global_step=global_step, name=name)
-      return apply_grads
-    # Condition for deciding whether to accumulate the gradient
-    # or actually apply it.
-    # we use a private self_batch_count to keep track of number of batches.
-    # global step will count number of lots processed.
-    update_cond = tf.equal(tf.constant(0),
-                           tf.mod(self._batch_count,
-                                  tf.constant(self._batches_per_lot)))
-    # Things to do for batches other than last of the lot.
-    # Add non-noisy clipped grads to shadow variables.
-    def non_last_in_lot_op(loss, var_list):
-      """Ops to do for a typical batch.
-      For a batch that is not the last one in the lot, we simply compute the
-      sanitized gradients and apply them to the grad_acc variables.
-      Args:
-        loss: loss function tensor
-        var_list: list of variables
-      Returns:
-        A tensorflow op to do the updates to the gradient accumulators
-      """
-      sanitized_grads = self.compute_sanitized_gradients(
-          loss, var_list=var_list, add_noise=False)
-      update_ops_list = []
-      for var, grad in zip(var_list, sanitized_grads):
-        grad_acc_v = self._grad_accum_dict[var.name]
-        update_ops_list.append(grad_acc_v.assign_add(grad))
-      update_ops_list.append(self._batch_count.assign_add(1))
-      return tf.group(*update_ops_list)
-    # Things to do for last batch of a lot.
-    # Add noisy clipped grads to accumulator.
-    # Apply accumulated grads to vars.
-    def last_in_lot_op(loss, var_list, global_step):
-      """Ops to do for last batch in a lot.
-      For the last batch in the lot, we first add the sanitized gradients to
-      the gradient acc variables, and then apply these
-      values over to the original variables (via an apply gradient)
-      Args:
-        loss: loss function tensor
-        var_list: list of variables
-        global_step: optional global step to be passed to apply_gradients
-      Returns:
-        A tensorflow op to push updates from shadow vars to real vars.
-      """
-      # We add noise in the last lot. This is why we need this code snippet
-      # that looks almost identical to the non_last_op case here.
-      sanitized_grads = self.compute_sanitized_gradients(
-          loss, var_list=var_list, add_noise=True)
-      normalized_grads = []
-      for var, grad in zip(var_list, sanitized_grads):
-        grad_acc_v = self._grad_accum_dict[var.name]
-        # To handle the lr difference per lot vs per batch, we divide the
-        # update by number of batches per lot.
-        normalized_grad = tf.div(grad_acc_v.assign_add(grad),
-                                 tf.to_float(self._batches_per_lot))
-        normalized_grads.append(normalized_grad)
-      with tf.control_dependencies(normalized_grads):
-        grads_and_vars = zip(normalized_grads, var_list)
-        self._assert_valid_dtypes(
-            [v for g, v in grads_and_vars if g is not None])
-        apply_san_grads = self.apply_gradients(grads_and_vars,
-                                               global_step=global_step,
-                                               name="apply_grads")
-      # Now reset the accumulators to zero
-      resets_list = []
-      with tf.control_dependencies([apply_san_grads]):
-        for _, acc in self._grad_accum_dict.items():
-          reset = tf.assign(acc, tf.zeros_like(acc))
-          resets_list.append(reset)
-      resets_list.append(self._batch_count.assign_add(1))
-      last_step_update = tf.group(*([apply_san_grads] + resets_list))
-      return last_step_update
-    # pylint: disable=g-long-lambda
-    update_op = tf.cond(update_cond,
-                        lambda: last_in_lot_op(
-                            loss, var_list,
-                            global_step),
-                        lambda: non_last_in_lot_op(
-                            loss, var_list))
-    return tf.group(update_op)
--- a/research/differential_privacy/dp_sgd/dp_optimizer/dp_pca.py
+++ b/research/differential_privacy/dp_sgd/dp_optimizer/dp_pca.py
-# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Differentially private optimizers.
-"""
-import tensorflow as tf
-from differential_privacy.dp_sgd.dp_optimizer import sanitizer as san
-def ComputeDPPrincipalProjection(data, projection_dims,
-                                 sanitizer, eps_delta, sigma):
-  """Compute differentially private projection.
-  Args:
-    data: the input data, each row is a data vector.
-    projection_dims: the projection dimension.
-    sanitizer: the sanitizer used for achieving privacy.
-    eps_delta: (eps, delta) pair.
-    sigma: if not None, use noise sigma; otherwise compute it using
-      eps_delta pair.
-  Returns:
-    A projection matrix with projection_dims columns.
-  """
-  eps, delta = eps_delta
-  # Normalize each row.
-  normalized_data = tf.nn.l2_normalize(data, 1)
-  covar = tf.matmul(tf.transpose(normalized_data), normalized_data)
-  saved_shape = tf.shape(covar)
-  num_examples = tf.slice(tf.shape(data), [0], [1])
-  if eps > 0:
-    # Since the data is already normalized, there is no need to clip
-    # the covariance matrix.
-    assert delta > 0
-    saned_covar = sanitizer.sanitize(
-        tf.reshape(covar, [1, -1]), eps_delta, sigma=sigma,
-        option=san.ClipOption(1.0, False), num_examples=num_examples)
-    saned_covar = tf.reshape(saned_covar, saved_shape)
-    # Symmetrize saned_covar. This also reduces the noise variance.
-    saned_covar = 0.5 * (saned_covar + tf.transpose(saned_covar))
-  else:
-    saned_covar = covar
-  # Compute the eigen decomposition of the covariance matrix, and
-  # return the top projection_dims eigen vectors, represented as columns of
-  # the projection matrix.
-  eigvals, eigvecs = tf.self_adjoint_eig(saned_covar)
-  _, topk_indices = tf.nn.top_k(eigvals, projection_dims)
-  topk_indices = tf.reshape(topk_indices, [projection_dims])
-  # Gather and return the corresponding eigenvectors.
-  return tf.transpose(tf.gather(tf.transpose(eigvecs), topk_indices))
--- a/research/differential_privacy/dp_sgd/dp_optimizer/sanitizer.py
+++ b/research/differential_privacy/dp_sgd/dp_optimizer/sanitizer.py
-# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Defines Sanitizer class for sanitizing tensors.
-A sanitizer first limits the sensitivity of a tensor and then adds noise
-to the tensor. The parameters are determined by the privacy_spending and the
-other parameters. It also uses an accountant to keep track of the privacy
-spending.
-"""
-from __future__ import division
-import collections
-import tensorflow as tf
-from differential_privacy.dp_sgd.dp_optimizer import utils
-ClipOption = collections.namedtuple("ClipOption",
-                                    ["l2norm_bound", "clip"])
-class AmortizedGaussianSanitizer(object):
-  """Sanitizer with Gaussian noise and amoritzed privacy spending accounting.
-  This sanitizes a tensor by first clipping the tensor, summing the tensor
-  and then adding appropriate amount of noise. It also uses an amortized
-  accountant to keep track of privacy spending.
-  """
-  def __init__(self, accountant, default_option):
-    """Construct an AmortizedGaussianSanitizer.
-    Args:
-      accountant: the privacy accountant. Expect an amortized one.
-      default_option: the default ClipOptoin.
-    """
-    self._accountant = accountant
-    self._default_option = default_option
-    self._options = {}
-  def set_option(self, tensor_name, option):
-    """Set options for an individual tensor.
-    Args:
-      tensor_name: the name of the tensor.
-      option: clip option.
-    """
-    self._options[tensor_name] = option
-  def sanitize(self, x, eps_delta, sigma=None,
-               option=ClipOption(None, None), tensor_name=None,
-               num_examples=None, add_noise=True):
-    """Sanitize the given tensor.
-    This santize a given tensor by first applying l2 norm clipping and then
-    adding Gaussian noise. It calls the privacy accountant for updating the
-    privacy spending.
-    Args:
-      x: the tensor to sanitize.
-      eps_delta: a pair of eps, delta for (eps,delta)-DP. Use it to
-        compute sigma if sigma is None.
-      sigma: if sigma is not None, use sigma.
-      option: a ClipOption which, if supplied, used for
-        clipping and adding noise.
-      tensor_name: the name of the tensor.
-      num_examples: if None, use the number of "rows" of x.
-      add_noise: if True, then add noise, else just clip.
-    Returns:
-      a pair of sanitized tensor and the operation to accumulate privacy
-      spending.
-    """
-    if sigma is None:
-      # pylint: disable=unpacking-non-sequence
-      eps, delta = eps_delta
-      with tf.control_dependencies(
-          [tf.Assert(tf.greater(eps, 0),
-                     ["eps needs to be greater than 0"]),
-           tf.Assert(tf.greater(delta, 0),
-                     ["delta needs to be greater than 0"])]):
-        # The following formula is taken from
-        #   Dwork and Roth, The Algorithmic Foundations of Differential
-        #   Privacy, Appendix A.
-        #   http://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
-        sigma = tf.sqrt(2.0 * tf.log(1.25 / delta)) / eps
-    l2norm_bound, clip = option
-    if l2norm_bound is None:
-      l2norm_bound, clip = self._default_option
-      if ((tensor_name is not None) and
-          (tensor_name in self._options)):
-        l2norm_bound, clip = self._options[tensor_name]
-    if clip:
-      x = utils.BatchClipByL2norm(x, l2norm_bound)
-    if add_noise:
-      if num_examples is None:
-        num_examples = tf.slice(tf.shape(x), [0], [1])
-      privacy_accum_op = self._accountant.accumulate_privacy_spending(
-          eps_delta, sigma, num_examples)
-      with tf.control_dependencies([privacy_accum_op]):
-        saned_x = utils.AddGaussianNoise(tf.reduce_sum(x, 0),
-                                         sigma * l2norm_bound)
-    else:
-      saned_x = tf.reduce_sum(x, 0)
-    return saned_x
--- a/research/differential_privacy/dp_sgd/dp_optimizer/utils.py
+++ b/research/differential_privacy/dp_sgd/dp_optimizer/utils.py
-# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Utils for building and training NN models.
-"""
-from __future__ import division
-import math
-import numpy
-import tensorflow as tf
-class LayerParameters(object):
-  """class that defines a non-conv layer."""
-  def __init__(self):
-    self.name = ""
-    self.num_units = 0
-    self._with_bias = False
-    self.relu = False
-    self.gradient_l2norm_bound = 0.0
-    self.bias_gradient_l2norm_bound = 0.0
-    self.trainable = True
-    self.weight_decay = 0.0
-class ConvParameters(object):
-  """class that defines a conv layer."""
-  def __init__(self):
-    self.patch_size = 5
-    self.stride = 1
-    self.in_channels = 1
-    self.out_channels = 0
-    self.with_bias = True
-    self.relu = True
-    self.max_pool = True
-    self.max_pool_size = 2
-    self.max_pool_stride = 2
-    self.trainable = False
-    self.in_size = 28
-    self.name = ""
-    self.num_outputs = 0
-    self.bias_stddev = 0.1
-# Parameters for a layered neural network.
-class NetworkParameters(object):
-  """class that define the overall model structure."""
-  def __init__(self):
-    self.input_size = 0
-    self.projection_type = 'NONE'  # NONE, RANDOM, PCA
-    self.projection_dimensions = 0
-    self.default_gradient_l2norm_bound = 0.0
-    self.layer_parameters = []  # List of LayerParameters
-    self.conv_parameters = []  # List of ConvParameters
-def GetTensorOpName(x):
-  """Get the name of the op that created a tensor.
-  Useful for naming related tensors, as ':' in name field of op is not permitted
-  Args:
-    x: the input tensor.
-  Returns:
-    the name of the op.
-  """
-  t = x.name.rsplit(":", 1)
-  if len(t) == 1:
-    return x.name
-  else:
-    return t[0]
-def BuildNetwork(inputs, network_parameters):
-  """Build a network using the given parameters.
-  Args:
-    inputs: a Tensor of floats containing the input data.
-    network_parameters: NetworkParameters object
-      that describes the parameters for the network.
-  Returns:
-    output, training_parameters: where the outputs (a tensor) is the output
-      of the network, and training_parameters (a dictionary that maps the
-      name of each variable to a dictionary of parameters) is the parameters
-      used during training.
-  """
-  training_parameters = {}
-  num_inputs = network_parameters.input_size
-  outputs = inputs
-  projection = None
-  # First apply convolutions, if needed
-  for conv_param in network_parameters.conv_parameters:
-    outputs = tf.reshape(
-        outputs,
-        [-1, conv_param.in_size, conv_param.in_size,
-         conv_param.in_channels])
-    conv_weights_name = "%s_conv_weight" % (conv_param.name)
-    conv_bias_name = "%s_conv_bias" % (conv_param.name)
-    conv_std_dev = 1.0 / (conv_param.patch_size
-                          * math.sqrt(conv_param.in_channels))
-    conv_weights = tf.Variable(
-        tf.truncated_normal([conv_param.patch_size,
-                             conv_param.patch_size,
-                             conv_param.in_channels,
-                             conv_param.out_channels],
-                            stddev=conv_std_dev),
-        trainable=conv_param.trainable,
-        name=conv_weights_name)
-    conv_bias = tf.Variable(
-        tf.truncated_normal([conv_param.out_channels],
-                            stddev=conv_param.bias_stddev),
-        trainable=conv_param.trainable,
-        name=conv_bias_name)
-    training_parameters[conv_weights_name] = {}
-    training_parameters[conv_bias_name] = {}
-    conv = tf.nn.conv2d(outputs, conv_weights,
-                        strides=[1, conv_param.stride,
-                                 conv_param.stride, 1],
-                        padding="SAME")
-    relud = tf.nn.relu(conv + conv_bias)
-    mpd = tf.nn.max_pool(relud, ksize=[1,
-                                       conv_param.max_pool_size,
-                                       conv_param.max_pool_size, 1],
-                         strides=[1, conv_param.max_pool_stride,
-                                  conv_param.max_pool_stride, 1],
-                         padding="SAME")
-    outputs = mpd
-    num_inputs = conv_param.num_outputs
-    # this should equal
-    # in_size * in_size * out_channels / (stride * max_pool_stride)
-  # once all the convs are done, reshape to make it flat
-  outputs = tf.reshape(outputs, [-1, num_inputs])
-  # Now project, if needed
-  if network_parameters.projection_type is not "NONE":
-    projection = tf.Variable(tf.truncated_normal(
-        [num_inputs, network_parameters.projection_dimensions],
-        stddev=1.0 / math.sqrt(num_inputs)), trainable=False, name="projection")
-    num_inputs = network_parameters.projection_dimensions
-    outputs = tf.matmul(outputs, projection)
-  # Now apply any other layers
-  for layer_parameters in network_parameters.layer_parameters:
-    num_units = layer_parameters.num_units
-    hidden_weights_name = "%s_weight" % (layer_parameters.name)
-    hidden_weights = tf.Variable(
-        tf.truncated_normal([num_inputs, num_units],
-                            stddev=1.0 / math.sqrt(num_inputs)),
-        name=hidden_weights_name, trainable=layer_parameters.trainable)
-    training_parameters[hidden_weights_name] = {}
-    if layer_parameters.gradient_l2norm_bound:
-      training_parameters[hidden_weights_name]["gradient_l2norm_bound"] = (
-          layer_parameters.gradient_l2norm_bound)
-    if layer_parameters.weight_decay:
-      training_parameters[hidden_weights_name]["weight_decay"] = (
-          layer_parameters.weight_decay)
-    outputs = tf.matmul(outputs, hidden_weights)
-    if layer_parameters.with_bias:
-      hidden_biases_name = "%s_bias" % (layer_parameters.name)
-      hidden_biases = tf.Variable(tf.zeros([num_units]),
-                                  name=hidden_biases_name)
-      training_parameters[hidden_biases_name] = {}
-      if layer_parameters.bias_gradient_l2norm_bound:
-        training_parameters[hidden_biases_name][
-            "bias_gradient_l2norm_bound"] = (
-                layer_parameters.bias_gradient_l2norm_bound)
-      outputs += hidden_biases
-    if layer_parameters.relu:
-      outputs = tf.nn.relu(outputs)
-    # num_inputs for the next layer is num_units in the current layer.
-    num_inputs = num_units
-  return outputs, projection, training_parameters
-def VaryRate(start, end, saturate_epochs, epoch):
-  """Compute a linearly varying number.
-  Decrease linearly from start to end until epoch saturate_epochs.
-  Args:
-    start: the initial number.
-    end: the end number.
-    saturate_epochs: after this we do not reduce the number; if less than
-      or equal to zero, just return start.
-    epoch: the current learning epoch.
-  Returns:
-    the caculated number.
-  """
-  if saturate_epochs <= 0:
-    return start
-  step = (start - end) / (saturate_epochs - 1)
-  if epoch < saturate_epochs:
-    return start - step * epoch
-  else:
-    return end
-def BatchClipByL2norm(t, upper_bound, name=None):
-  """Clip an array of tensors by L2 norm.
-  Shrink each dimension-0 slice of tensor (for matrix it is each row) such
-  that the l2 norm is at most upper_bound. Here we clip each row as it
-  corresponds to each example in the batch.
-  Args:
-    t: the input tensor.
-    upper_bound: the upperbound of the L2 norm.
-    name: optional name.
-  Returns:
-    the clipped tensor.
-  """
-  assert upper_bound > 0
-  with tf.name_scope(values=[t, upper_bound], name=name,
-                     default_name="batch_clip_by_l2norm") as name:
-    saved_shape = tf.shape(t)
-    batch_size = tf.slice(saved_shape, [0], [1])
-    t2 = tf.reshape(t, tf.concat(axis=0, values=[batch_size, [-1]]))
-    upper_bound_inv = tf.fill(tf.slice(saved_shape, [0], [1]),
-                              tf.constant(1.0/upper_bound))
-    # Add a small number to avoid divide by 0
-    l2norm_inv = tf.rsqrt(tf.reduce_sum(t2 * t2, [1]) + 0.000001)
-    scale = tf.minimum(l2norm_inv, upper_bound_inv) * upper_bound
-    clipped_t = tf.matmul(tf.diag(scale), t2)
-    clipped_t = tf.reshape(clipped_t, saved_shape, name=name)
-  return clipped_t
-def SoftThreshold(t, threshold_ratio, name=None):
-  """Soft-threshold a tensor by the mean value.
-  Softthreshold each dimension-0 vector (for matrix it is each column) by
-  the mean of absolute value multiplied by the threshold_ratio factor. Here
-  we soft threshold each column as it corresponds to each unit in a layer.
-  Args:
-    t: the input tensor.
-    threshold_ratio: the threshold ratio.
-    name: the optional name for the returned tensor.
-  Returns:
-    the thresholded tensor, where each entry is soft-thresholded by
-    threshold_ratio times the mean of the aboslute value of each column.
-  """
-  assert threshold_ratio >= 0
-  with tf.name_scope(values=[t, threshold_ratio], name=name,
-                     default_name="soft_thresholding") as name:
-    saved_shape = tf.shape(t)
-    t2 = tf.reshape(t, tf.concat(axis=0, values=[tf.slice(saved_shape, [0], [1]), -1]))
-    t_abs = tf.abs(t2)
-    t_x = tf.sign(t2) * tf.nn.relu(t_abs -
-                                   (tf.reduce_mean(t_abs, [0],
-                                                   keep_dims=True) *
-                                    threshold_ratio))
-    return tf.reshape(t_x, saved_shape, name=name)
-def AddGaussianNoise(t, sigma, name=None):
-  """Add i.i.d. Gaussian noise (0, sigma^2) to every entry of t.
-  Args:
-    t: the input tensor.
-    sigma: the stddev of the Gaussian noise.
-    name: optional name.
-  Returns:
-    the noisy tensor.
-  """
-  with tf.name_scope(values=[t, sigma], name=name,
-                     default_name="add_gaussian_noise") as name:
-    noisy_t = t + tf.random_normal(tf.shape(t), stddev=sigma)
-  return noisy_t
-def GenerateBinomialTable(m):
-  """Generate binomial table.
-  Args:
-    m: the size of the table.
-  Returns:
-    A two dimensional array T where T[i][j] = (i choose j),
-    for 0<= i, j <=m.
-  """
-  table = numpy.zeros((m + 1, m + 1), dtype=numpy.float64)
-  for i in range(m + 1):
-    table[i, 0] = 1
-  for i in range(1, m + 1):
-    for j in range(1, m + 1):
-      v = table[i - 1, j] + table[i - 1, j -1]
-      assert not math.isnan(v) and not math.isinf(v)
-      table[i, j] = v
-  return tf.convert_to_tensor(table)
--- a/research/differential_privacy/dp_sgd/per_example_gradients/BUILD
+++ b/research/differential_privacy/dp_sgd/per_example_gradients/BUILD
-package(default_visibility = [":internal"])
-licenses(["notice"])  # Apache 2.0
-exports_files(["LICENSE"])
-package_group(
-    name = "internal",
-    packages = [
-        "//differential_privacy/...",
-    ],
-)
-py_library(
-    name = "per_example_gradients",
-    srcs = [
-        "per_example_gradients.py",
-    ],
-    deps = [
-    ],
-)
--- a/research/differential_privacy/dp_sgd/per_example_gradients/per_example_gradients.py
+++ b/research/differential_privacy/dp_sgd/per_example_gradients/per_example_gradients.py
-# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Per-example gradients for selected ops."""
-import collections
-from six.moves import xrange
-import tensorflow as tf
-OrderedDict = collections.OrderedDict
-def _ListUnion(list_1, list_2):
-  """Returns the union of two lists.
-  Python sets can have a non-deterministic iteration order. In some
-  contexts, this could lead to TensorFlow producing two different
-  programs when the same Python script is run twice. In these contexts
-  we use lists instead of sets.
-  This function is not designed to be especially fast and should only
-  be used with small lists.
-  Args:
-    list_1: A list
-    list_2: Another list
-  Returns:
-    A new list containing one copy of each unique element of list_1 and
-    list_2. Uniqueness is determined by "x in union" logic; e.g. two
-    string of that value appearing in the union.
-  Raises:
-    TypeError: The arguments are not lists.
-  """
-  if not (isinstance(list_1, list) and isinstance(list_2, list)):
-    raise TypeError("Arguments must be lists.")
-  union = []
-  for x in list_1 + list_2:
-    if x not in union:
-      union.append(x)
-  return union
-def Interface(ys, xs):
-  """Maps xs to consumers.
-    Returns a dict mapping each element of xs to any of its consumers that are
-    indirectly consumed by ys.
-  Args:
-    ys: The outputs
-    xs: The inputs
-  Returns:
-    out: Dict mapping each member x of `xs` to a list of all Tensors that are
-         direct consumers of x and are eventually consumed by a member of
-         `ys`.
-  """
-  if isinstance(ys, (list, tuple)):
-    queue = list(ys)
-  else:
-    queue = [ys]
-  out = OrderedDict()
-  if isinstance(xs, (list, tuple)):
-    for x in xs:
-      out[x] = []
-  else:
-    out[xs] = []
-  done = set()
-  while queue:
-    y = queue.pop()
-    if y in done:
-      continue
-    done = done.union(set([y]))
-    for x in y.op.inputs:
-      if x in out:
-        out[x].append(y)
-      else:
-        assert id(x) not in [id(foo) for foo in out]
-    queue.extend(y.op.inputs)
-  return out
-class PXGRegistry(object):
-  """Per-Example Gradient registry.
-  Maps names of ops to per-example gradient rules for those ops.
-  These rules are only needed for ops that directly touch values that
-  are shared between examples. For most machine learning applications,
-  this means only ops that directly operate on the parameters.
-  See http://arxiv.org/abs/1510.01799 for more information, and please
-  consider citing that tech report if you use this function in published
-  research.
-  """
-  def __init__(self):
-    self.d = OrderedDict()
-  def __call__(self, op,
-               colocate_gradients_with_ops=False,
-               gate_gradients=False):
-    if op.node_def.op not in self.d:
-      raise NotImplementedError("No per-example gradient rule registered "
-                                "for " + op.node_def.op + " in pxg_registry.")
-    return self.d[op.node_def.op](op,
-                                  colocate_gradients_with_ops,
-                                  gate_gradients)
-  def Register(self, op_name, pxg_class):
-    """Associates `op_name` key with `pxg_class` value.
-    Registers `pxg_class` as the class that will be called to perform
-    per-example differentiation through ops with `op_name`.
-    Args:
-      op_name: String op name.
-      pxg_class: An instance of any class with the same signature as MatMulPXG.
-    """
-    self.d[op_name] = pxg_class
-pxg_registry = PXGRegistry()
-class MatMulPXG(object):
-  """Per-example gradient rule for MatMul op.
-  """
-  def __init__(self, op,
-               colocate_gradients_with_ops=False,
-               gate_gradients=False):
-    """Construct an instance of the rule for `op`.
-    Args:
-      op: The Operation to differentiate through.
-      colocate_gradients_with_ops: currently unsupported
-      gate_gradients: currently unsupported
-    """
-    assert op.node_def.op == "MatMul"
-    self.op = op
-    self.colocate_gradients_with_ops = colocate_gradients_with_ops
-    self.gate_gradients = gate_gradients
-  def __call__(self, x, z_grads):
-    """Build the graph for the per-example gradient through the op.
-    Assumes that the MatMul was called with a design matrix with examples
-    in rows as the first argument and parameters as the second argument.
-    Args:
-      x: The Tensor to differentiate with respect to. This tensor must
-         represent the weights.
-      z_grads: The list of gradients on the output of the op.
-    Returns:
-      x_grads: A Tensor containing the gradient with respect to `x` for
-       each example. This is a 3-D tensor, with the first axis corresponding
-       to examples and the remaining axes matching the shape of x.
-    """
-    idx = list(self.op.inputs).index(x)
-    assert idx != -1
-    assert len(z_grads) == len(self.op.outputs)
-    assert idx == 1  # We expect weights to be arg 1
-    # We don't expect anyone to per-example differentiate with repsect
-    # to anything other than the weights.
-    x, _ = self.op.inputs
-    z_grads, = z_grads
-    x_expanded = tf.expand_dims(x, 2)
-    z_grads_expanded = tf.expand_dims(z_grads, 1)
-    return tf.multiply(x_expanded, z_grads_expanded)
-pxg_registry.Register("MatMul", MatMulPXG)
-class Conv2DPXG(object):
-  """Per-example gradient rule of Conv2d op.
-  Same interface as MatMulPXG.
-  """
-  def __init__(self, op,
-               colocate_gradients_with_ops=False,
-               gate_gradients=False):
-    assert op.node_def.op == "Conv2D"
-    self.op = op
-    self.colocate_gradients_with_ops = colocate_gradients_with_ops
-    self.gate_gradients = gate_gradients
-  def _PxConv2DBuilder(self, input_, w, strides, padding):
-    """conv2d run separately per example, to help compute per-example gradients.
-    Args:
-      input_: tensor containing a minibatch of images / feature maps.
-              Shape [batch_size, rows, columns, channels]
-      w: convolution kernels. Shape
-        [kernel rows, kernel columns, input channels, output channels]
-      strides: passed through to regular conv_2d
-      padding: passed through to regular conv_2d
-    Returns:
-      conv: the output of the convolution.
-         single tensor, same as what regular conv_2d does
-      w_px: a list of batch_size copies of w. each copy was used
-          for the corresponding example in the minibatch.
-           calling tf.gradients on the copy gives the gradient for just
-                  that example.
-    """
-    input_shape = [int(e) for e in input_.get_shape()]
-    batch_size = input_shape[0]
-    input_px = [tf.slice(
-        input_, [example] + [0] * 3, [1] + input_shape[1:]) for example
-                in xrange(batch_size)]
-    for input_x in input_px:
-      assert int(input_x.get_shape()[0]) == 1
-    w_px = [tf.identity(w) for example in xrange(batch_size)]
-    conv_px = [tf.nn.conv2d(input_x, w_x,
-                            strides=strides,
-                            padding=padding)
-               for input_x, w_x in zip(input_px, w_px)]
-    for conv_x in conv_px:
-      num_x = int(conv_x.get_shape()[0])
-      assert num_x == 1, num_x
-    assert len(conv_px) == batch_size
-    conv = tf.concat(axis=0, values=conv_px)
-    assert int(conv.get_shape()[0]) == batch_size
-    return conv, w_px
-  def __call__(self, w, z_grads):
-    idx = list(self.op.inputs).index(w)
-    # Make sure that `op` was actually applied to `w`
-    assert idx != -1
-    assert len(z_grads) == len(self.op.outputs)
-    # The following assert may be removed when we are ready to use this
-    # for general purpose code.
-    # This assert is only expected to hold in the contex of our preliminary
-    # MNIST experiments.
-    assert idx == 1  # We expect convolution weights to be arg 1
-    images, filters = self.op.inputs
-    strides = self.op.get_attr("strides")
-    padding = self.op.get_attr("padding")
-    # Currently assuming that one specifies at most these four arguments and
-    # that all other arguments to conv2d are set to default.
-    conv, w_px = self._PxConv2DBuilder(images, filters, strides, padding)
-    z_grads, = z_grads
-    gradients_list = tf.gradients(conv, w_px, z_grads,
-                                  colocate_gradients_with_ops=
-                                  self.colocate_gradients_with_ops,
-                                  gate_gradients=self.gate_gradients)
-    return tf.stack(gradients_list)
-pxg_registry.Register("Conv2D", Conv2DPXG)
-class AddPXG(object):
-  """Per-example gradient rule for Add op.
-  Same interface as MatMulPXG.
-  """
-  def __init__(self, op,
-               colocate_gradients_with_ops=False,
-               gate_gradients=False):
-    assert op.node_def.op == "Add"
-    self.op = op
-    self.colocate_gradients_with_ops = colocate_gradients_with_ops
-    self.gate_gradients = gate_gradients
-  def __call__(self, x, z_grads):
-    idx = list(self.op.inputs).index(x)
-    # Make sure that `op` was actually applied to `x`
-    assert idx != -1
-    assert len(z_grads) == len(self.op.outputs)
-    # The following assert may be removed when we are ready to use this
-    # for general purpose code.
-    # This assert is only expected to hold in the contex of our preliminary
-    # MNIST experiments.
-    assert idx == 1 # We expect biases to be arg 1
-    # We don't expect anyone to per-example differentiate with respect
-    # to anything other than the biases.
-    x, _ = self.op.inputs
-    z_grads, = z_grads
-    return z_grads
-pxg_registry.Register("Add", AddPXG)
-def PerExampleGradients(ys, xs, grad_ys=None, name="gradients",
-                        colocate_gradients_with_ops=False,
-                        gate_gradients=False):
-  """Symbolic differentiation, separately for each example.
-  Matches the interface of tf.gradients, but the return values each have an
-  additional axis corresponding to the examples.
-  Assumes that the cost in `ys` is additive across examples.
-  e.g., no batch normalization.
-  Individual rules for each op specify their own assumptions about how
-  examples are put into tensors.
-  """
-  # Find the interface between the xs and the cost
-  for x in xs:
-    assert isinstance(x, tf.Tensor), type(x)
-  interface = Interface(ys, xs)
-  merged_interface = []
-  for x in xs:
-    merged_interface = _ListUnion(merged_interface, interface[x])
-  # Differentiate with respect to the interface
-  interface_gradients = tf.gradients(ys, merged_interface, grad_ys=grad_ys,
-                                     name=name,
-                                     colocate_gradients_with_ops=
-                                     colocate_gradients_with_ops,
-                                     gate_gradients=gate_gradients)
-  grad_dict = OrderedDict(zip(merged_interface, interface_gradients))
-  # Build the per-example gradients with respect to the xs
-  if colocate_gradients_with_ops:
-    raise NotImplementedError("The per-example gradients are not yet "
-                              "colocated with ops.")
-  if gate_gradients:
-    raise NotImplementedError("The per-example gradients are not yet "
-                              "gated.")
-  out = []
-  for x in xs:
-    zs = interface[x]
-    ops = []
-    for z in zs:
-      ops = _ListUnion(ops, [z.op])
-    if len(ops) != 1:
-      raise NotImplementedError("Currently we only support the case "
-                                "where each x is consumed by exactly "
-                                "one op. but %s is consumed by %d ops."
-                                % (x.name, len(ops)))
-    op = ops[0]
-    pxg_rule = pxg_registry(op, colocate_gradients_with_ops, gate_gradients)
-    x_grad = pxg_rule(x, [grad_dict[z] for z in zs])
-    out.append(x_grad)
-  return out
--- a/research/differential_privacy/privacy_accountant/python/BUILD
+++ b/research/differential_privacy/privacy_accountant/python/BUILD
-package(default_visibility = [":internal"])
-licenses(["notice"])  # Apache 2.0
-exports_files(["LICENSE"])
-package_group(
-    name = "internal",
-    packages = [
-        "//third_party/tensorflow_models/...",
-    ],
-)
-py_binary(
-    name = "gaussian_moments",
-    srcs = [
-        "gaussian_moments.py",
-    ],
-    deps = [
-    ],
-)
--- a/research/differential_privacy/privacy_accountant/python/rdp_accountant.py
+++ b/research/differential_privacy/privacy_accountant/python/rdp_accountant.py
-# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""RDP analysis of the Gaussian-with-sampling mechanism.
-Functionality for computing Renyi differential privacy of an additive Gaussian
-mechanism with sampling. Its public interface consists of two methods:
-  compute_rdp(q, sigma, T, orders) computes RDP with the sampling rate q,
-                                   noise sigma, T steps at the list of orders.
-  get_privacy_spent(orders, rdp, target_eps, target_delta) computes delta
-                                   (or eps) given RDP at multiple orders and
-                                   a target value for eps (or delta).
-Example use:
-Suppose that we have run an algorithm with parameters, an array of
-(q1, sigma1, T1) ... (qk, sigma_k, Tk), and we wish to compute eps for a given
-delta. The example code would be:
-  max_order = 32
-  orders = range(2, max_order + 1)
-  rdp = np.zeros_like(orders, dtype=float)
-  for q, sigma, T in parameters:
-   rdp += rdp_accountant.compute_rdp(q, sigma, T, orders)
-  eps, _, opt_order = rdp_accountant.get_privacy_spent(rdp, target_delta=delta)
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import sys
-from absl import app
-from absl import flags
-import math
-import numpy as np
-from scipy import special
-FLAGS = flags.FLAGS
-flags.DEFINE_boolean("rdp_verbose", False,
-                     "Output intermediate results for RDP computation.")
-FLAGS(sys.argv)  # Load the flags (including on import)
-########################
-# LOG-SPACE ARITHMETIC #
-########################
-def _log_add(logx, logy):
-  """Add two numbers in the log space."""
-  a, b = min(logx, logy), max(logx, logy)
-  if a == -np.inf:  # adding 0
-    return b
-  # Use exp(a) + exp(b) = (exp(a - b) + 1) * exp(b)
-  return math.log1p(math.exp(a - b)) + b  # log1p(x) = log(x + 1)
-def _log_sub(logx, logy):
-  """Subtract two numbers in the log space. Answer must be positive."""
-  if logy == -np.inf:  # subtracting 0
-    return logx
-  assert logx > logy
-  try:
-    # Use exp(x) - exp(y) = (exp(x - y) - 1) * exp(y).
-    return math.log(math.expm1(logx - logy)) + logy  # expm1(x) = exp(x) - 1
-  except OverflowError:
-    return logx
-def _log_print(logx):
-  """Pretty print."""
-  if logx < math.log(sys.float_info.max):
-    return "{}".format(math.exp(logx))
-  else:
-    return "exp({})".format(logx)
-def _compute_log_a_int(q, sigma, alpha):
-  """Compute log(A_alpha) for integer alpha."""
-  assert isinstance(alpha, (int, long))
-  # The first and second terms of A_alpha in the log space:
-  log_a1, log_a2 = -np.inf, -np.inf
-  for i in range(alpha + 1):
-    # Compute in the log space. Extra care needed for q = 0 or 1.
-    log_coef_i = math.log(special.binom(alpha, i))
-    if q > 0:
-      log_coef_i += i * math.log(q)
-    elif i > 0:
-      continue  # The term is 0, skip the rest.
-    if q < 1.0:
-      log_coef_i += (alpha - i) * math.log(1 - q)
-    elif i < alpha:
-      continue  # The term is 0, skip the rest.
-    s1 = log_coef_i + (i * i - i) / (2.0 * (sigma ** 2))
-    s2 = log_coef_i + (i * i + i) / (2.0 * (sigma ** 2))
-    log_a1 = _log_add(log_a1, s1)
-    log_a2 = _log_add(log_a2, s2)
-  log_a = _log_add(math.log(1 - q) + log_a1, math.log(q) + log_a2)
-  if FLAGS.rdp_verbose:
-    print("A: by binomial expansion    {} = {} + {}".format(
-        _log_print(log_a),
-        _log_print(math.log(1 - q) + log_a1), _log_print(math.log(q) + log_a2)))
-  return float(log_a)
-def _compute_log_a_frac(q, sigma, alpha):
-  """Compute log(A_alpha) for fractional alpha."""
-  # The four parts of A_alpha in the log space:
-  log_a11, log_a12 = -np.inf, -np.inf
-  log_a21, log_a22 = -np.inf, -np.inf
-  i = 0
-  z0, _ = _compute_zs(sigma, q)
-  while True:  # do ... until loop
-    coef = special.binom(alpha, i)
-    log_coef = math.log(abs(coef))
-    j = alpha - i
-    log_t1 = log_coef + i * math.log(q) + j * math.log(1 - q)
-    log_t2 = log_coef + j * math.log(q) + i * math.log(1 - q)
-    log_e11 = math.log(.5) + _log_erfc((i - z0) / (math.sqrt(2) * sigma))
-    log_e12 = math.log(.5) + _log_erfc((z0 - j) / (math.sqrt(2) * sigma))
-    log_e21 = math.log(.5) + _log_erfc((i - (z0 - 1)) / (math.sqrt(2) * sigma))
-    log_e22 = math.log(.5) + _log_erfc((z0 - 1 - j) / (math.sqrt(2) * sigma))
-    log_s11 = log_t1 + (i * i - i) / (2 * (sigma ** 2)) + log_e11
-    log_s12 = log_t2 + (j * j - j) / (2 * (sigma ** 2)) + log_e12
-    log_s21 = log_t1 + (i * i + i) / (2 * (sigma ** 2)) + log_e21
-    log_s22 = log_t2 + (j * j + j) / (2 * (sigma ** 2)) + log_e22
-    if coef > 0:
-      log_a11 = _log_add(log_a11, log_s11)
-      log_a12 = _log_add(log_a12, log_s12)
-      log_a21 = _log_add(log_a21, log_s21)
-      log_a22 = _log_add(log_a22, log_s22)
-    else:
-      log_a11 = _log_sub(log_a11, log_s11)
-      log_a12 = _log_sub(log_a12, log_s12)
-      log_a21 = _log_sub(log_a21, log_s21)
-      log_a22 = _log_sub(log_a22, log_s22)
-    i += 1
-    if max(log_s11, log_s21, log_s21, log_s22) < -30:
-      break
-  log_a = _log_add(
-      math.log(1. - q) + _log_add(log_a11, log_a12),
-      math.log(q) + _log_add(log_a21, log_a22))
-  return log_a
-def _compute_log_a(q, sigma, alpha):
-  """Compute log(A_alpha) for any positive finite alpha."""
-  if float(alpha).is_integer():
-    return _compute_log_a_int(q, sigma, int(alpha))
-  else:
-    return _compute_log_a_frac(q, sigma, alpha)
-def _log_erfc(x):
-  # Can be replaced with a single call to log_ntdr if available:
-  # return np.log(2.) + special.log_ntdr(-x * 2**.5)
-  r = special.erfc(x)
-  if r == 0.0:
-    # Using the Laurent series at infinity for the tail of the erfc function:
-    #     erfc(x) ~ exp(-x^2-.5/x^2+.625/x^4)/(x*pi^.5)
-    # To verify in Mathematica:
-    #     Series[Log[Erfc[x]] + Log[x] + Log[Pi]/2 + x^2, {x, Infinity, 6}]
-    return (-math.log(math.pi) / 2 - math.log(x) - x ** 2 - .5 * x ** -2 +
-            .625 * x ** -4 - 37. / 24. * x ** -6 + 353. / 64. * x ** -8)
-  else:
-    return math.log(r)
-def _compute_zs(sigma, q):
-  z0 = sigma ** 2 * math.log(1 / q - 1) + .5
-  z1 = min(z0 - 2, z0 / 2)
-  return z0, z1
-def _compute_log_b0(sigma, q, alpha, z1):
-  """Return an approximation to log(B0) or None if failed to converge."""
-  z0, _ = _compute_zs(sigma, q)
-  s, log_term, log_b0, k, sign, max_log_term = 0, 1., 0, 0, 1, -np.inf
-  # Keep adding new terms until precision is no longer preserved.
-  # Don't stop on the negative.
-  while (k < alpha or (log_term > max_log_term - 36 and log_term > -30) or
-         sign < 0.):
-    log_b1 = k * (k - 2 * z0) / (2 * sigma ** 2)
-    log_b2 = _log_erfc((k - z1) / (math.sqrt(2) * sigma))
-    log_term = log_b0 + log_b1 + log_b2
-    max_log_term = max(max_log_term, log_term)
-    s += sign * math.exp(log_term)
-    k += 1
-    # Maintain invariant: sign * exp(log_b0) = {-alpha choose k}
-    log_b0 += math.log(abs(-alpha - k + 1)) - math.log(k)
-    sign *= -1
-  if s == 0:  # May happen if all terms are < 1e-324.
-    return -np.inf
-  if s < 0 or math.log(s) < max_log_term - 25:  # The series failed to converge.
-    return None
-  c = math.log(.5) - math.log(1 - q) * alpha
-  return c + math.log(s)
-def _bound_log_b1(sigma, q, alpha, z1):
-  log_c = _log_add(math.log(1 - q),
-                   math.log(q) + (2 * z1 - 1.) / (2 * sigma ** 2))
-  return math.log(.5) - log_c * alpha + _log_erfc(z1 / (math.sqrt(2) * sigma))
-def _bound_log_b(q, sigma, alpha):
-  """Compute a numerically stable bound on log(B_alpha)."""
-  if q == 1.:  # If the sampling rate is 100%, A and B are symmetric.
-    return _compute_log_a(q, sigma, alpha)
-  z0, z1 = _compute_zs(sigma, q)
-  log_b_bound = np.inf
-  # Puts a lower bound on B1: it cannot be less than its value at z0.
-  log_lb_b1 = _bound_log_b1(sigma, q, alpha, z0)
-  while z0 - z1 > 1e-3:
-    m = (z0 + z1) / 2
-    log_b0 = _compute_log_b0(sigma, q, alpha, m)
-    if log_b0 is None:
-      z0 = m
-      continue
-    log_b1 = _bound_log_b1(sigma, q, alpha, m)
-    log_b_bound = min(log_b_bound, _log_add(log_b0, log_b1))
-    log_b_min_bound = _log_add(log_b0, log_lb_b1)
-    if (log_b_bound < 0 or
-        log_b_min_bound < 0 or
-        log_b_bound > log_b_min_bound + .01):
-      # If the bound is likely to be too loose, move z1 closer to z0 and repeat.
-      z1 = m
-    else:
-      break
-  return log_b_bound
-def _log_bound_b_elementary(q, alpha):
-  return -math.log(1 - q) * alpha
-def _compute_delta(orders, rdp, eps):
-  """Compute delta given an RDP curve and target epsilon.
-  Args:
-    orders: An array (or a scalar) of orders.
-    rdp: A list (or a scalar) of RDP guarantees.
-    eps: The target epsilon.
-  Returns:
-    Pair of (delta, optimal_order).
-  Raises:
-    ValueError: If input is malformed.
-  """
-  orders_vec = np.atleast_1d(orders)
-  rdp_vec = np.atleast_1d(rdp)
-  if len(orders_vec) != len(rdp_vec):
-    raise ValueError("Input lists must have the same length.")
-  deltas = np.exp((rdp_vec - eps) * (orders_vec - 1))
-  idx_opt = np.argmin(deltas)
-  return min(deltas[idx_opt], 1.), orders_vec[idx_opt]
-def _compute_eps(orders, rdp, delta):
-  """Compute epsilon given an RDP curve and target delta.
-  Args:
-    orders: An array (or a scalar) of orders.
-    rdp: A list (or a scalar) of RDP guarantees.
-    delta: The target delta.
-  Returns:
-    Pair of (eps, optimal_order).
-  Raises:
-    ValueError: If input is malformed.
-  """
-  orders_vec = np.atleast_1d(orders)
-  rdp_vec = np.atleast_1d(rdp)
-  if len(orders_vec) != len(rdp_vec):
-    raise ValueError("Input lists must have the same length.")
-  eps = rdp_vec - math.log(delta) / (orders_vec - 1)
-  idx_opt = np.nanargmin(eps)  # Ignore NaNs
-  return eps[idx_opt], orders_vec[idx_opt]
-def _compute_rdp(q, sigma, alpha):
-  """Compute RDP of the Gaussian mechanism with sampling at order alpha.
-  Args:
-    q: The sampling rate.
-    sigma: The std of the additive Gaussian noise.
-    alpha: The order at which RDP is computed.
-  Returns:
-    RDP at alpha, can be np.inf.
-  """
-  if np.isinf(alpha):
-    return np.inf
-  log_moment_a = _compute_log_a(q, sigma, alpha - 1)
-  log_bound_b = _log_bound_b_elementary(q, alpha - 1)  # does not require sigma
-  if log_bound_b < log_moment_a:
-    if FLAGS.rdp_verbose:
-      print("Elementary bound suffices   : {} < {}".format(
-          _log_print(log_bound_b), _log_print(log_moment_a)))
-  else:
-    log_bound_b2 = _bound_log_b(q, sigma, alpha - 1)
-    if math.isnan(log_bound_b2):
-      if FLAGS.rdp_verbose:
-        print("B bound failed to converge")
-    else:
-      if FLAGS.rdp_verbose and (log_bound_b2 < log_bound_b):
-        print("Elementary bound is stronger: {} < {}".format(
-            _log_print(log_bound_b2), _log_print(log_bound_b)))
-      log_bound_b = min(log_bound_b, log_bound_b2)
-  return max(log_moment_a, log_bound_b) / (alpha - 1)
-def compute_rdp(q, sigma, steps, orders):
-  """Compute RDP of Gaussian mechanism with sampling for given parameters.
-  Args:
-    q: The sampling rate.
-    sigma: The std of the additive Gaussian noise.
-    steps: The number of steps.
-    orders: An array (or a scalar) of RDP orders.
-  Returns:
-    The RDPs at all orders, can be np.inf.
-  """
-  if np.isscalar(orders):
-    rdp = _compute_rdp(q, sigma, orders)
-  else:
-    rdp = np.array([_compute_rdp(q, sigma, order) for order in orders])
-  return rdp * steps
-def get_privacy_spent(orders, rdp, target_eps=None, target_delta=None):
-  """Compute delta (or eps) for given eps (or delta) from the RDP curve.
-  Args:
-    orders: An array (or a scalar) of RDP orders.
-    rdp: An array of RDP values. Must be of the same length as the orders list.
-    target_eps: If not None, the epsilon for which we compute the corresponding
-              delta.
-    target_delta: If not None, the delta for which we compute the corresponding
-              epsilon. Exactly one of target_eps and target_delta must be None.
-  Returns:
-    eps, delta, opt_order.
-  Raises:
-    ValueError: If target_eps and target_delta are messed up.
-  """
-  if target_eps is None and target_delta is None:
-    raise ValueError(
-        "Exactly one out of eps and delta must be None. (Both are).")
-  if target_eps is not None and target_delta is not None:
-    raise ValueError(
-        "Exactly one out of eps and delta must be None. (None is).")
-  if target_eps is not None:
-    delta, opt_order = _compute_delta(orders, rdp, target_eps)
-    return target_eps, delta, opt_order
-  else:
-    eps, opt_order = _compute_eps(orders, rdp, target_delta)
-    return eps, target_delta, opt_order
-def main(_):
-  pass
-if __name__ == "__main__":
-  app.run(main)
--- a/research/differential_privacy/privacy_accountant/python/rdp_accountant_test.py
+++ b/research/differential_privacy/privacy_accountant/python/rdp_accountant_test.py
-# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for rdp_accountant.py."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-from absl.testing import absltest
-import mpmath as mp
-import numpy as np
-import rdp_accountant
-class TestGaussianMoments(absltest.TestCase):
-  ##############################
-  # MULTI-PRECISION ARITHMETIC #
-  ##############################
-  def _pdf_gauss_mp(self, x, sigma, mean):
-    return 1. / mp.sqrt(2. * sigma ** 2 * mp.pi) * mp.exp(
-        -(x - mean) ** 2 / (2. * sigma ** 2))
-  def _integral_inf_mp(self, fn):
-    integral, _ = mp.quad(
-        fn, [-mp.inf, mp.inf], error=True,
-        maxdegree=7)  # maxdegree = 6 is not enough
-    return integral
-  def _integral_bounded_mp(self, fn, lb, ub):
-    integral, _ = mp.quad(fn, [lb, ub], error=True)
-    return integral
-  def _distributions_mp(self, sigma, q):
-    mu0 = lambda y: self._pdf_gauss_mp(y, sigma=sigma, mean=0)
-    mu1 = lambda y: self._pdf_gauss_mp(y, sigma=sigma, mean=1.)
-    mu = lambda y: (1 - q) * mu0(y) + q * mu1(y)
-    return mu0, mu1, mu
-  def compute_a_mp(self, sigma, q, order, verbose=False):
-    """Compute A_lambda for arbitrary lambda by numerical integration."""
-    mp.dps = 100
-    mu0, mu1, mu = self._distributions_mp(sigma, q)
-    a_lambda_fn = lambda z: mu(z) * (mu(z) / mu0(z)) ** order
-    a_lambda = self._integral_inf_mp(a_lambda_fn)
-    if verbose:
-      a_lambda_first_term_fn = lambda z: mu0(z) * (mu(z) / mu0(z)) ** order
-      a_lambda_second_term_fn = lambda z: mu1(z) * (mu(z) / mu0(z)) ** order
-      a_lambda_first_term = self._integral_inf_mp(a_lambda_first_term_fn)
-      a_lambda_second_term = self._integral_inf_mp(a_lambda_second_term_fn)
-      print("A: by numerical integration {} = {} + {}".format(
-          a_lambda, (1 - q) * a_lambda_first_term, q * a_lambda_second_term))
-    return a_lambda
-  def compute_b_mp(self, sigma, q, order, verbose=False):
-    """Compute B_lambda for arbitrary lambda by numerical integration."""
-    mu0, _, mu = self._distributions_mp(sigma, q)
-    b_lambda_fn = lambda z: mu0(z) * (mu0(z) / mu(z)) ** order
-    b_numeric = self._integral_inf_mp(b_lambda_fn)
-    if verbose:
-      _, z1 = rdp_accountant._compute_zs(sigma, q)
-      print("z1 = ", z1)
-      print("x in the Taylor series = ", q / (1 - q) * np.exp(
-          (2 * z1 - 1) / (2 * sigma ** 2)))
-      b0_numeric = self._integral_bounded_mp(b_lambda_fn, -np.inf, z1)
-      b1_numeric = self._integral_bounded_mp(b_lambda_fn, z1, +np.inf)
-      print("B: numerically {} = {} + {}".format(b_numeric, b0_numeric,
-                                                 b1_numeric))
-    return float(b_numeric)
-  def _compute_rdp_mp(self, q, sigma, order):
-    log_a_mp = float(mp.log(self.compute_a_mp(sigma, q, order)))
-    log_b_mp = float(mp.log(self.compute_b_mp(sigma, q, order)))
-    return log_a_mp, log_b_mp
-  # TEST ROUTINES
-  def _almost_equal(self, a, b, rtol, atol=0.):
-    # Analogue of np.testing.assert_allclose(a, b, rtol, atol).
-    self.assertBetween(a, b * (1 - rtol) - atol, b * (1 + rtol) + atol)
-  def _compare_bounds(self, q, sigma, order):
-    log_a_mp, log_b_mp = self._compute_rdp_mp(q, sigma, order)
-    log_a = rdp_accountant._compute_log_a(q, sigma, order)
-    log_bound_b = rdp_accountant._bound_log_b(q, sigma, order)
-    if log_a_mp < 1000 and log_a_mp > 1e-6:
-      self._almost_equal(log_a, log_a_mp, rtol=1e-6)
-    else:  # be more tolerant for _very_ large or small logarithms
-      self._almost_equal(log_a, log_a_mp, rtol=1e-3, atol=1e-14)
-    if np.isfinite(log_bound_b):
-      # Ignore divergence between the bound and exact value of B if
-      # they don't matter anyway (bound on A is larger) or q > .5
-      if log_bound_b > log_a and q <= .5:
-        self._almost_equal(log_b_mp, log_bound_b, rtol=1e-6, atol=1e-14)
-    if np.isfinite(log_a_mp) and np.isfinite(log_b_mp):
-      # We hypothesize that this assertion is always true; no proof yet.
-      self.assertLessEqual(log_b_mp, log_a_mp + 1e-6)
-  def test_compute_rdp(self):
-    rdp_scalar = rdp_accountant.compute_rdp(0.1, 2, 10, 5)
-    self.assertAlmostEqual(rdp_scalar, 0.07737, places=5)
-    rdp_vec = rdp_accountant.compute_rdp(0.01, 2.5, 50, [1.5, 2.5, 5, 50, 100,
-                                                         np.inf])
-    correct = [0.00065, 0.001085, 0.00218075, 0.023846, 167.416307, np.inf]
-    for i in range(len(rdp_vec)):
-      self.assertAlmostEqual(rdp_vec[i], correct[i], places=5)
-  def test_compare_with_mp(self):
-    # Compare the cheap computation with an expensive, multi-precision
-    # computation for a few parameters. Takes a few seconds.
-    self._compare_bounds(q=.01, sigma=.1, order=.5)
-    self._compare_bounds(q=.1, sigma=1., order=5)
-    self._compare_bounds(q=.5, sigma=2., order=32.5)
-    for q in (1e-6, .1, .999):
-      for sigma in (.1, 10., 100.):
-        for order in (1.01, 2, 255.9, 256):
-          self._compare_bounds(q, sigma, order)
-  def test_get_privacy_spent(self):
-    orders = range(2, 33)
-    rdp = rdp_accountant.compute_rdp(0.01, 4, 10000, orders)
-    eps, delta, opt_order = rdp_accountant.get_privacy_spent(orders, rdp,
-                                                             target_delta=1e-5)
-    self.assertAlmostEqual(eps, 1.258575, places=5)
-    self.assertEqual(opt_order, 20)
-    eps, delta, _ = rdp_accountant.get_privacy_spent(orders, rdp,
-                                                     target_eps=1.258575)
-    self.assertAlmostEqual(delta, 1e-5)
-  def test_compute_privacy_loss(self):
-    parameters = [(0.01, 4, 10000), (0.1, 2, 100)]
-    delta = 1e-5
-    orders = (1.25, 1.5, 1.75, 2., 2.5, 3., 4., 5., 6., 7., 8., 10., 12., 14.,
-              16., 20., 24., 28., 32., 64., 256.)
-    rdp = np.zeros_like(orders, dtype=float)
-    for q, sigma, steps in parameters:
-      rdp += rdp_accountant.compute_rdp(q, sigma, steps, orders)
-    eps, delta, opt_order = rdp_accountant.get_privacy_spent(orders, rdp,
-                                                             target_delta=delta)
-    self.assertAlmostEqual(eps, 3.276237, places=5)
-    self.assertEqual(opt_order, 8)
-if __name__ == "__main__":
-  absltest.main()
--- a/research/differential_privacy/privacy_accountant/tf/BUILD
+++ b/research/differential_privacy/privacy_accountant/tf/BUILD
-package(default_visibility = [":internal"])
-licenses(["notice"])  # Apache 2.0
-exports_files(["LICENSE"])
-package_group(
-    name = "internal",
-    packages = [
-        "//differential_privacy/...",
-    ],
-)
-py_library(
-    name = "accountant",
-    srcs = [
-        "accountant.py",
-    ],
-    deps = [
-        "//differential_privacy/dp_sgd/dp_optimizer:utils",
-    ],
-)
--- a/research/differential_privacy/privacy_accountant/tf/accountant.py
+++ b/research/differential_privacy/privacy_accountant/tf/accountant.py
-# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Defines Accountant class for keeping track of privacy spending.
-A privacy accountant keeps track of privacy spendings. It has methods
-accumulate_privacy_spending and get_privacy_spent. Here we only define
-AmortizedAccountant which tracks the privacy spending in the amortized
-way. It uses privacy amplication via sampling to compute the privacy
-spending for each batch and strong composition (specialized for Gaussian
-noise) for accumulate the privacy spending.
-"""
-from __future__ import division
-import abc
-import collections
-import math
-import sys
-import numpy
-import tensorflow as tf
-from differential_privacy.dp_sgd.dp_optimizer import utils
-EpsDelta = collections.namedtuple("EpsDelta", ["spent_eps", "spent_delta"])
-# TODO(liqzhang) To ensure the same API for AmortizedAccountant and
-# MomentsAccountant, we pass the union of arguments to both, so we
-# have unused_sigma for AmortizedAccountant and unused_eps_delta for
-# MomentsAccountant. Consider to revise the API to avoid the unused
-# arguments.  It would be good to use @abc.abstractmethod, etc, to
-# define the common interface as a base class.
-class AmortizedAccountant(object):
-  """Keep track of privacy spending in an amortized way.
-  AmortizedAccountant accumulates the privacy spending by assuming
-  all the examples are processed uniformly at random so the spending is
-  amortized among all the examples. And we assume that we use Gaussian noise
-  so the accumulation is on eps^2 and delta, using advanced composition.
-  """
-  def __init__(self, total_examples):
-    """Initialization. Currently only support amortized tracking.
-    Args:
-      total_examples: total number of examples.
-    """
-    assert total_examples > 0
-    self._total_examples = total_examples
-    self._eps_squared_sum = tf.Variable(tf.zeros([1]), trainable=False,
-                                        name="eps_squared_sum")
-    self._delta_sum = tf.Variable(tf.zeros([1]), trainable=False,
-                                  name="delta_sum")
-  def accumulate_privacy_spending(self, eps_delta, unused_sigma,
-                                  num_examples):
-    """Accumulate the privacy spending.
-    Currently only support approximate privacy. Here we assume we use Gaussian
-    noise on randomly sampled batch so we get better composition: 1. the per
-    batch privacy is computed using privacy amplication via sampling bound;
-    2. the composition is done using the composition with Gaussian noise.
-    TODO(liqzhang) Add a link to a document that describes the bounds used.
-    Args:
-      eps_delta: EpsDelta pair which can be tensors.
-      unused_sigma: the noise sigma. Unused for this accountant.
-      num_examples: the number of examples involved.
-    Returns:
-      a TensorFlow operation for updating the privacy spending.
-    """
-    eps, delta = eps_delta
-    with tf.control_dependencies(
-        [tf.Assert(tf.greater(delta, 0),
-                   ["delta needs to be greater than 0"])]):
-      amortize_ratio = (tf.cast(num_examples, tf.float32) * 1.0 /
-                        self._total_examples)
-      # Use privacy amplification via sampling bound.
-      # See Lemma 2.2 in http://arxiv.org/pdf/1405.7085v2.pdf
-      # TODO(liqzhang) Add a link to a document with formal statement
-      # and proof.
-      amortize_eps = tf.reshape(tf.log(1.0 + amortize_ratio * (
-          tf.exp(eps) - 1.0)), [1])
-      amortize_delta = tf.reshape(amortize_ratio * delta, [1])
-      return tf.group(*[tf.assign_add(self._eps_squared_sum,
-                                      tf.square(amortize_eps)),
-                        tf.assign_add(self._delta_sum, amortize_delta)])
-  def get_privacy_spent(self, sess, target_eps=None):
-    """Report the spending so far.
-    Args:
-      sess: the session to run the tensor.
-      target_eps: the target epsilon. Unused.
-    Returns:
-      the list containing a single EpsDelta, with values as Python floats (as
-      opposed to numpy.float64). This is to be consistent with
-      MomentAccountant which can return a list of (eps, delta) pair.
-    """
-    # pylint: disable=unused-argument
-    unused_target_eps = target_eps
-    eps_squared_sum, delta_sum = sess.run([self._eps_squared_sum,
-                                           self._delta_sum])
-    return [EpsDelta(math.sqrt(eps_squared_sum), float(delta_sum))]
-class MomentsAccountant(object):
-  """Privacy accountant which keeps track of moments of privacy loss.
-  Note: The constructor of this class creates tf.Variables that must
-  be initialized with tf.global_variables_initializer() or similar calls.
-  MomentsAccountant accumulates the high moments of the privacy loss. It
-  requires a method for computing differenital moments of the noise (See
-  below for the definition). So every specific accountant should subclass
-  this class by implementing _differential_moments method.
-  Denote by X_i the random variable of privacy loss at the i-th step.
-  Consider two databases D, D' which differ by one item. X_i takes value
-  log Pr[M(D')==x]/Pr[M(D)==x] with probability Pr[M(D)==x].
-  In MomentsAccountant, we keep track of y_i(L) = log E[exp(L X_i)] for some
-  large enough L. To compute the final privacy spending,  we apply Chernoff
-  bound (assuming the random noise added at each step is independent) to
-  bound the total privacy loss Z = sum X_i as follows:
-    Pr[Z > e] = Pr[exp(L Z) > exp(L e)]
-              < E[exp(L Z)] / exp(L e)
-              = Prod_i E[exp(L X_i)] / exp(L e)
-              = exp(sum_i log E[exp(L X_i)]) / exp(L e)
-              = exp(sum_i y_i(L) - L e)
-  Hence the mechanism is (e, d)-differentially private for
-    d =  exp(sum_i y_i(L) - L e).
-  We require d < 1, i.e. e > sum_i y_i(L) / L. We maintain y_i(L) for several
-  L to compute the best d for any give e (normally should be the lowest L
-  such that 2 * sum_i y_i(L) / L < e.
-  We further assume that at each step, the mechanism operates on a random
-  sample with sampling probability q = batch_size / total_examples. Then
-    E[exp(L X)] = E[(Pr[M(D)==x / Pr[M(D')==x])^L]
-  By distinguishing two cases of whether D < D' or D' < D, we have
-  that
-    E[exp(L X)] <= max (I1, I2)
-  where
-    I1 = (1-q) E ((1-q) + q P(X+1) / P(X))^L + q E ((1-q) + q P(X) / P(X-1))^L
-    I2 = E (P(X) / ((1-q) + q P(X+1)))^L
-  In order to compute I1 and I2, one can consider to
-    1. use an asymptotic bound, which recovers the advance composition theorem;
-    2. use the closed formula (like GaussianMomentsAccountant);
-    3. use numerical integration or random sample estimation.
-  Dependent on the distribution, we can often obtain a tigher estimation on
-  the moments and hence a more accurate estimation of the privacy loss than
-  obtained using generic composition theorems.
-  """
-  __metaclass__ = abc.ABCMeta
-  def __init__(self, total_examples, moment_orders=32):
-    """Initialize a MomentsAccountant.
-    Args:
-      total_examples: total number of examples.
-      moment_orders: the order of moments to keep.
-    """
-    assert total_examples > 0
-    self._total_examples = total_examples
-    self._moment_orders = (moment_orders
-                           if isinstance(moment_orders, (list, tuple))
-                           else range(1, moment_orders + 1))
-    self._max_moment_order = max(self._moment_orders)
-    assert self._max_moment_order < 100, "The moment order is too large."
-    self._log_moments = [tf.Variable(numpy.float64(0.0),
-                                     trainable=False,
-                                     name=("log_moments-%d" % moment_order))
-                         for moment_order in self._moment_orders]
-  @abc.abstractmethod
-  def _compute_log_moment(self, sigma, q, moment_order):
-    """Compute high moment of privacy loss.
-    Args:
-      sigma: the noise sigma, in the multiples of the sensitivity.
-      q: the sampling ratio.
-      moment_order: the order of moment.
-    Returns:
-      log E[exp(moment_order * X)]
-    """
-    pass
-  def accumulate_privacy_spending(self, unused_eps_delta,
-                                  sigma, num_examples):
-    """Accumulate privacy spending.
-    In particular, accounts for privacy spending when we assume there
-    are num_examples, and we are releasing the vector
-    (sum_{i=1}^{num_examples} x_i) + Normal(0, stddev=l2norm_bound*sigma)
-    where l2norm_bound is the maximum l2_norm of each example x_i, and
-    the num_examples have been randomly selected out of a pool of
-    self.total_examples.
-    Args:
-      unused_eps_delta: EpsDelta pair which can be tensors. Unused
-        in this accountant.
-      sigma: the noise sigma, in the multiples of the sensitivity (that is,
-        if the l2norm sensitivity is k, then the caller must have added
-        Gaussian noise with stddev=k*sigma to the result of the query).
-      num_examples: the number of examples involved.
-    Returns:
-      a TensorFlow operation for updating the privacy spending.
-    """
-    q = tf.cast(num_examples, tf.float64) * 1.0 / self._total_examples
-    moments_accum_ops = []
-    for i in range(len(self._log_moments)):
-      moment = self._compute_log_moment(sigma, q, self._moment_orders[i])
-      moments_accum_ops.append(tf.assign_add(self._log_moments[i], moment))
-    return tf.group(*moments_accum_ops)
-  def _compute_delta(self, log_moments, eps):
-    """Compute delta for given log_moments and eps.
-    Args:
-      log_moments: the log moments of privacy loss, in the form of pairs
-        of (moment_order, log_moment)
-      eps: the target epsilon.
-    Returns:
-      delta
-    """
-    min_delta = 1.0
-    for moment_order, log_moment in log_moments:
-      if math.isinf(log_moment) or math.isnan(log_moment):
-        sys.stderr.write("The %d-th order is inf or Nan\n" % moment_order)
-        continue
-      if log_moment < moment_order * eps:
-        min_delta = min(min_delta,
-                        math.exp(log_moment - moment_order * eps))
-    return min_delta
-  def _compute_eps(self, log_moments, delta):
-    min_eps = float("inf")
-    for moment_order, log_moment in log_moments:
-      if math.isinf(log_moment) or math.isnan(log_moment):
-        sys.stderr.write("The %d-th order is inf or Nan\n" % moment_order)
-        continue
-      min_eps = min(min_eps, (log_moment - math.log(delta)) / moment_order)
-    return min_eps
-  def get_privacy_spent(self, sess, target_eps=None, target_deltas=None):
-    """Compute privacy spending in (e, d)-DP form for a single or list of eps.
-    Args:
-      sess: the session to run the tensor.
-      target_eps: a list of target epsilon's for which we would like to
-        compute corresponding delta value.
-      target_deltas: a list of target deltas for which we would like to
-        compute the corresponding eps value. Caller must specify
-        either target_eps or target_delta.
-    Returns:
-      A list of EpsDelta pairs.
-    """
-    assert (target_eps is None) ^ (target_deltas is None)
-    eps_deltas = []
-    log_moments = sess.run(self._log_moments)
-    log_moments_with_order = zip(self._moment_orders, log_moments)
-    if target_eps is not None:
-      for eps in target_eps:
-        eps_deltas.append(
-            EpsDelta(eps, self._compute_delta(log_moments_with_order, eps)))
-    else:
-      assert target_deltas
-      for delta in target_deltas:
-        eps_deltas.append(
-            EpsDelta(self._compute_eps(log_moments_with_order, delta), delta))
-    return eps_deltas
-class GaussianMomentsAccountant(MomentsAccountant):
-  """MomentsAccountant which assumes Gaussian noise.
-  GaussianMomentsAccountant assumes the noise added is centered Gaussian
-  noise N(0, sigma^2 I). In this case, we can compute the differential moments
-  accurately using a formula.
-  For asymptotic bound, for Gaussian noise with variance sigma^2, we can show
-  for L < sigma^2,  q L < sigma,
-    log E[exp(L X)] = O(q^2 L^2 / sigma^2).
-  Using this we derive that for training T epoches, with batch ratio q,
-  the Gaussian mechanism with variance sigma^2 (with q < 1/sigma) is (e, d)
-  private for d = exp(T/q q^2 L^2 / sigma^2 - L e). Setting L = sigma^2,
-  Tq = e/2, the mechanism is (e, exp(-e sigma^2/2))-DP. Equivalently, the
-  mechanism is (e, d)-DP if sigma = sqrt{2 log(1/d)}/e, q < 1/sigma,
-  and T < e/(2q). This bound is better than the bound obtained using general
-  composition theorems, by an Omega(sqrt{log k}) factor on epsilon, if we run
-  k steps. Since we use direct estimate, the obtained privacy bound has tight
-  constant.
-  For GaussianMomentAccountant, it suffices to compute I1, as I1 >= I2,
-  which reduce to computing E(P(x+s)/P(x+s-1) - 1)^i for s = 0 and 1. In the
-  companion gaussian_moments.py file, we supply procedure for computing both
-  I1 and I2 (the computation of I2 is through multi-precision integration
-  package). It can be verified that indeed I1 >= I2 for wide range of parameters
-  we have tried, though at the moment we are unable to prove this claim.
-  We recommend that when using this accountant, users independently verify
-  using gaussian_moments.py that for their parameters, I1 is indeed larger
-  than I2. This can be done by following the instructions in
-  gaussian_moments.py.
-  """
-  def __init__(self, total_examples, moment_orders=32):
-    """Initialization.
-    Args:
-      total_examples: total number of examples.
-      moment_orders: the order of moments to keep.
-    """
-    super(self.__class__, self).__init__(total_examples, moment_orders)
-    self._binomial_table = utils.GenerateBinomialTable(self._max_moment_order)
-  def _differential_moments(self, sigma, s, t):
-    """Compute 0 to t-th differential moments for Gaussian variable.
-        E[(P(x+s)/P(x+s-1)-1)^t]
-      = sum_{i=0}^t (t choose i) (-1)^{t-i} E[(P(x+s)/P(x+s-1))^i]
-      = sum_{i=0}^t (t choose i) (-1)^{t-i} E[exp(-i*(2*x+2*s-1)/(2*sigma^2))]
-      = sum_{i=0}^t (t choose i) (-1)^{t-i} exp(i(i+1-2*s)/(2 sigma^2))
-    Args:
-      sigma: the noise sigma, in the multiples of the sensitivity.
-      s: the shift.
-      t: 0 to t-th moment.
-    Returns:
-      0 to t-th moment as a tensor of shape [t+1].
-    """
-    assert t <= self._max_moment_order, ("The order of %d is out "
-                                         "of the upper bound %d."
-                                         % (t, self._max_moment_order))
-    binomial = tf.slice(self._binomial_table, [0, 0],
-                        [t + 1, t + 1])
-    signs = numpy.zeros((t + 1, t + 1), dtype=numpy.float64)
-    for i in range(t + 1):
-      for j in range(t + 1):
-        signs[i, j] = 1.0 - 2 * ((i - j) % 2)
-    exponents = tf.constant([j * (j + 1.0 - 2.0 * s) / (2.0 * sigma * sigma)
-                             for j in range(t + 1)], dtype=tf.float64)
-    # x[i, j] = binomial[i, j] * signs[i, j] = (i choose j) * (-1)^{i-j}
-    x = tf.multiply(binomial, signs)
-    # y[i, j] = x[i, j] * exp(exponents[j])
-    #         = (i choose j) * (-1)^{i-j} * exp(j(j-1)/(2 sigma^2))
-    # Note: this computation is done by broadcasting pointwise multiplication
-    # between [t+1, t+1] tensor and [t+1] tensor.
-    y = tf.multiply(x, tf.exp(exponents))
-    # z[i] = sum_j y[i, j]
-    #      = sum_j (i choose j) * (-1)^{i-j} * exp(j(j-1)/(2 sigma^2))
-    z = tf.reduce_sum(y, 1)
-    return z
-  def _compute_log_moment(self, sigma, q, moment_order):
-    """Compute high moment of privacy loss.
-    Args:
-      sigma: the noise sigma, in the multiples of the sensitivity.
-      q: the sampling ratio.
-      moment_order: the order of moment.
-    Returns:
-      log E[exp(moment_order * X)]
-    """
-    assert moment_order <= self._max_moment_order, ("The order of %d is out "
-                                                    "of the upper bound %d."
-                                                    % (moment_order,
-                                                       self._max_moment_order))
-    binomial_table = tf.slice(self._binomial_table, [moment_order, 0],
-                              [1, moment_order + 1])
-    # qs = [1 q q^2 ... q^L] = exp([0 1 2 ... L] * log(q))
-    qs = tf.exp(tf.constant([i * 1.0 for i in range(moment_order + 1)],
-                            dtype=tf.float64) * tf.cast(
-                                tf.log(q), dtype=tf.float64))
-    moments0 = self._differential_moments(sigma, 0.0, moment_order)
-    term0 = tf.reduce_sum(binomial_table * qs * moments0)
-    moments1 = self._differential_moments(sigma, 1.0, moment_order)
-    term1 = tf.reduce_sum(binomial_table * qs * moments1)
-    return tf.squeeze(tf.log(tf.cast(q * term0 + (1.0 - q) * term1,
-                                     tf.float64)))
-class DummyAccountant(object):
-  """An accountant that does no accounting."""
-  def accumulate_privacy_spending(self, *unused_args):
-    return tf.no_op()
-  def get_privacy_spent(self, unused_sess, **unused_kwargs):
-    return [EpsDelta(numpy.inf, 1.0)]