Unverified commit 22e248ce authored by Lukasz Kaiser, committed by GitHub

Merge pull request #4885 from norouzi/master

Adding keypointnet to research/models.
parents 696b69a4 a2dca9f5
@@ -20,6 +20,7 @@
 /research/global_objectives/ @mackeya-google
 /research/im2txt/ @cshallue
 /research/inception/ @shlens @vincentvanhoucke
+/research/keypointnet/ @mnorouzi
 /research/learned_optimizer/ @olganw @nirum
 /research/learning_to_remember_rare_events/ @lukaszkaiser @ofirnachum
 /research/learning_unsupervised_learning/ @lukemetz @nirum
@@ -32,6 +32,8 @@ request.
 - [gan](gan): generative adversarial networks.
 - [im2txt](im2txt): image-to-text neural network for image captioning.
 - [inception](inception): deep convolutional networks for computer vision.
+- [keypointnet](keypointnet): discovery of latent 3D keypoints via end-to-end
+  geometric reasoning [[demo](https://keypointnet.github.io/)].
 - [learning_to_remember_rare_events](learning_to_remember_rare_events): a
   large-scale life-long memory module for use in deep learning.
 - [learning_unsupervised_learning](learning_unsupervised_learning): a
# How to Contribute
We'd love to accept your patches and contributions to this project. There are
just a few small guidelines you need to follow.
## Contributor License Agreement
Contributions to this project must be accompanied by a Contributor License
Agreement. You (or your employer) retain the copyright to your contribution;
this simply gives us permission to use and redistribute your contributions as
part of the project. Head over to <https://cla.developers.google.com/> to see
your current agreements on file or to sign a new one.
You generally only need to submit a CLA once, so if you've already submitted one
(even if it was for a different project), you probably don't need to do it
again.
## Code reviews
All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose. Consult
[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
information on using pull requests.
## Community Guidelines
This project follows [Google's Open Source Community
Guidelines](https://opensource.google.com/conduct/).
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# KeypointNet
This is an implementation of the keypoint network proposed in "Discovery of
Latent 3D Keypoints via End-to-end Geometric Reasoning"
[[pdf](https://arxiv.org/pdf/1807.03146.pdf)]. Given a single 2D image of a
known class, this network can predict a set of 3D keypoints that are consistent
across viewing angles of the same object and across object instances. These
keypoints and their detectors are discovered and learned automatically without
keypoint location supervision [[demo](https://keypointnet.github.io)].
## Datasets:
ShapeNet renderings of
[Cars](https://storage.googleapis.com/discovery-3dkeypoints-data/cars_with_keypoints.zip),
[Planes](https://storage.googleapis.com/discovery-3dkeypoints-data/planes_with_keypoints.zip),
[Chairs](https://storage.googleapis.com/discovery-3dkeypoints-data/chairs_with_keypoints.zip).
Each set contains:
1. tfrecords
2. train.txt, a list of tfrecords used for training.
3. dev.txt, a list of tfrecords used for validation.
4. test.txt, a list of tfrecords used for testing.
5. projection.txt, storing the global 4x4 camera projection matrix (see the
   snippet below).
6. job.txt, storing ShapeNet's object IDs in each tfrecord.
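projection.txt is plain text, one matrix row per line with space-separated
values, which is the format `Transformer` in `main.py` parses. A minimal
sketch of reading it directly (the path below is only a placeholder):

```python
import numpy as np

def load_projection(path="cars_with_keypoints/projection.txt"):
  """Reads the global 4x4 camera projection matrix from projection.txt."""
  with open(path, "r") as f:
    rows = [[float(v) for v in line.strip().split(" ")] for line in f]
  return np.array(rows)  # shape (4, 4)
```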
## Training:
Run `main.py --model_dir=MODEL_DIR --dset=DSET`
where MODEL_DIR is a folder for storing model checkpoints (see [tf.estimator](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator)), and DSET should point to the folder containing the tfrecords (downloaded above).
## Inference:
Run `main.py --model_dir=MODEL_DIR --input=INPUT --predict`
where MODEL_DIR is the model checkpoint folder, and INPUT is a folder containing PNG or JPEG test images. Copies of the images with the predicted keypoints drawn on them are written to an `output` subfolder inside INPUT.
We trained the network with a total batch size of 256 (8 x 32 replicas). You may have to tune the learning rate if your batch size is different.
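Hyperparameters can be overridden through the `--hparams` flag defined in `main.py`, which accepts comma-separated `name=value` pairs. For example, a hypothetical run with a larger per-replica batch and a smaller learning rate (the paths are placeholders) would be `main.py --model_dir=/tmp/keypointnet --dset=/data/cars_with_keypoints --batch_size=32 --hparams=learning_rate=2e-4`.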
## Code credit:
Supasorn Suwajanakorn
## Contact:
supasorn@gmail.com, [snavely,tompson,mnorouzi]@google.com
(This is not an officially supported Google product)
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================
"""KeypointNet!!
A reimplementation of 'Discovery of Latent 3D Keypoints via End-to-end
Geometric Reasoning' keypoint network. Given a single 2D image of a known class,
this network can predict a set of 3D keypoints that are consistent across
viewing angles of the same object and across object instances. These keypoints
and their detectors are discovered and learned automatically without
keypoint location supervision.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import matplotlib.pyplot as plt
import numpy as np
import os
from scipy import misc
import sys
import tensorflow as tf
import tensorflow.contrib.slim as slim
import utils
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_boolean("predict", False, "Running inference if true")
tf.app.flags.DEFINE_string(
"input",
"",
"Input folder containing images")
tf.app.flags.DEFINE_string("model_dir", None, "Estimator model_dir")
tf.app.flags.DEFINE_string(
"dset",
"",
"Path to the directory containing the dataset.")
tf.app.flags.DEFINE_integer("steps", 200000, "Training steps")
tf.app.flags.DEFINE_integer("batch_size", 8, "Size of mini-batch.")
tf.app.flags.DEFINE_string(
"hparams", "",
"A comma-separated list of `name=value` hyperparameter values. This flag "
"is used to override hyperparameter settings either when manually "
"selecting hyperparameters or when using Vizier.")
tf.app.flags.DEFINE_integer(
"sync_replicas", -1,
"If > 0, use SyncReplicasOptimizer and use this many replicas per sync.")
# Fixed input size 128 x 128.
vw = vh = 128
def create_input_fn(split, batch_size):
"""Returns input_fn for tf.estimator.Estimator.
Reads tfrecords and constructs input_fn for either training or eval. All
tfrecords not listed in test.txt or dev.txt are assigned to the training set.
Args:
split: A string indicating the split. Can be either 'train' or 'validation'.
batch_size: The batch size!
Returns:
input_fn for tf.estimator.Estimator.
Raises:
IOError: If test.txt or dev.txt are not found.
"""
if (not os.path.exists(os.path.join(FLAGS.dset, "test.txt")) or
not os.path.exists(os.path.join(FLAGS.dset, "dev.txt"))):
raise IOError("test.txt or dev.txt not found")
with open(os.path.join(FLAGS.dset, "test.txt"), "r") as f:
testset = [x.strip() for x in f.readlines()]
with open(os.path.join(FLAGS.dset, "dev.txt"), "r") as f:
validset = [x.strip() for x in f.readlines()]
files = os.listdir(FLAGS.dset)
filenames = []
for f in files:
sp = os.path.splitext(f)
if sp[1] != ".tfrecord" or sp[0] in testset:
continue
if ((split == "validation" and sp[0] in validset) or
(split == "train" and sp[0] not in validset)):
filenames.append(os.path.join(FLAGS.dset, f))
def input_fn():
"""input_fn for tf.estimator.Estimator."""
def parser(serialized_example):
"""Parses a single tf.Example into image and label tensors."""
fs = tf.parse_single_example(
serialized_example,
features={
"img0": tf.FixedLenFeature([], tf.string),
"img1": tf.FixedLenFeature([], tf.string),
"mv0": tf.FixedLenFeature([16], tf.float32),
"mvi0": tf.FixedLenFeature([16], tf.float32),
"mv1": tf.FixedLenFeature([16], tf.float32),
"mvi1": tf.FixedLenFeature([16], tf.float32),
})
fs["img0"] = tf.div(tf.to_float(tf.image.decode_png(fs["img0"], 4)), 255)
fs["img1"] = tf.div(tf.to_float(tf.image.decode_png(fs["img1"], 4)), 255)
fs["img0"].set_shape([vh, vw, 4])
fs["img1"].set_shape([vh, vw, 4])
# fs["lr0"] = [fs["mv0"][0]]
# fs["lr1"] = [fs["mv1"][0]]
fs["lr0"] = tf.convert_to_tensor([fs["mv0"][0]])
fs["lr1"] = tf.convert_to_tensor([fs["mv1"][0]])
return fs
np.random.shuffle(filenames)
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(parser, num_parallel_calls=4)
dataset = dataset.shuffle(400).repeat().batch(batch_size)
dataset = dataset.prefetch(buffer_size=256)
return dataset.make_one_shot_iterator().get_next(), None
return input_fn
class Transformer(object):
"""A utility for projecting 3D points to 2D coordinates and vice versa.
3D points are represented in 4D-homogeneous world coordinates. The pixel
coordinates are represented in normalized device coordinates [-1, 1].
See https://learnopengl.com/Getting-started/Coordinate-Systems.
"""
def __get_matrix(self, lines):
return np.array([[float(y) for y in x.strip().split(" ")] for x in lines])
def __read_projection_matrix(self, filename):
if not os.path.exists(filename):
filename = "/cns/vz-d/home/supasorn/datasets/cars/projection.txt"
with open(filename, "r") as f:
lines = f.readlines()
return self.__get_matrix(lines)
def __init__(self, w, h, dataset_dir):
self.w = w
self.h = h
p = self.__read_projection_matrix(dataset_dir + "projection.txt")
# Transpose of the inverse projection matrix.
self.pinv_t = tf.constant([[1.0 / p[0, 0], 0, 0,
0], [0, 1.0 / p[1, 1], 0, 0], [0, 0, 1, 0],
[0, 0, 0, 1]])
self.f = p[0, 0]
def project(self, xyzw):
"""Projects homogeneous 3D coordinates to normalized device coordinates."""
z = xyzw[:, :, 2:3] + 1e-8
return tf.concat([-self.f * xyzw[:, :, :2] / z, z], axis=2)
def unproject(self, xyz):
"""Unprojects normalized device coordinates with depth to 3D coordinates."""
z = xyz[:, :, 2:]
xy = -xyz * z
def batch_matmul(a, b):
return tf.reshape(
tf.matmul(tf.reshape(a, [-1, a.shape[2].value]), b),
[-1, a.shape[1].value, a.shape[2].value])
return batch_matmul(
tf.concat([xy[:, :, :2], z, tf.ones_like(z)], axis=2), self.pinv_t)
def meshgrid(h):
"""Returns a meshgrid ranging from [-1, 1] in x, y axes."""
r = np.arange(0.5, h, 1) / (h / 2) - 1
ranx, rany = tf.meshgrid(r, -r)
return tf.to_float(ranx), tf.to_float(rany)
def estimate_rotation(xyz0, xyz1, pconf, noise):
"""Estimates the rotation between two sets of keypoints.
The rotation is estimated by subtracting the weighted mean from each set of
keypoints and computing the SVD of their weighted covariance matrix (the
orthogonal Procrustes / Kabsch solution).
Args:
xyz0: [batch, num_kp, 3] The first set of keypoints.
xyz1: [batch, num_kp, 3] The second set of keypoints.
pconf: [batch, num_kp] The weights used to compute the rotation estimate.
noise: A number indicating the noise added to the keypoints.
Returns:
[batch, 3, 3] A batch of transposed 3 x 3 rotation matrices.
"""
xyz0 += tf.random_normal(tf.shape(xyz0), mean=0, stddev=noise)
xyz1 += tf.random_normal(tf.shape(xyz1), mean=0, stddev=noise)
pconf2 = tf.expand_dims(pconf, 2)
cen0 = tf.reduce_sum(xyz0 * pconf2, 1, keepdims=True)
cen1 = tf.reduce_sum(xyz1 * pconf2, 1, keepdims=True)
x = xyz0 - cen0
y = xyz1 - cen1
cov = tf.matmul(tf.matmul(x, tf.matrix_diag(pconf), transpose_a=True), y)
_, u, v = tf.svd(cov, full_matrices=True)
d = tf.matrix_determinant(tf.matmul(v, u, transpose_b=True))
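# d = det(v u^T); flipping the sign of u's last column when d is negative
# ensures the result is a proper rotation (no reflection), as in the Kabsch
# algorithm.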
ud = tf.concat(
[u[:, :, :-1], u[:, :, -1:] * tf.expand_dims(tf.expand_dims(d, 1), 1)],
axis=2)
return tf.matmul(ud, v, transpose_b=True)
def relative_pose_loss(xyz0, xyz1, rot, pconf, noise):
"""Computes the relative pose loss (chordal, angular).
Args:
xyz0: [batch, num_kp, 3] The first set of keypoints.
xyz1: [batch, num_kp, 3] The second set of keypoints.
rot: [batch, 4, 4] The ground-truth rotation matrices.
pconf: [batch, num_kp] The weights used to compute the rotation estimate.
noise: A number indicating the noise added to the keypoints.
Returns:
A tuple (chordal loss, angular loss).
"""
r_transposed = estimate_rotation(xyz0, xyz1, pconf, noise)
rotation = rot[:, :3, :3]
frob_sqr = tf.reduce_sum(tf.square(r_transposed - rotation), axis=[1, 2])
frob = tf.sqrt(frob_sqr)
return tf.reduce_mean(frob_sqr), \
2.0 * tf.reduce_mean(tf.asin(tf.minimum(1.0, frob / (2 * math.sqrt(2)))))
def separation_loss(xyz, delta):
"""Computes the separation loss.
Args:
xyz: [batch, num_kp, 3] Input keypoints.
delta: A separation threshold. No cost is incurred once the squared
distance between a pair of keypoints reaches delta.
Returns:
The separation loss.
"""
num_kp = tf.shape(xyz)[1]
t1 = tf.tile(xyz, [1, num_kp, 1])
t2 = tf.reshape(tf.tile(xyz, [1, 1, num_kp]), tf.shape(t1))
diffsq = tf.square(t1 - t2)
# -> [batch, num_kp ^ 2]
lensqr = tf.reduce_sum(diffsq, axis=2)
return (tf.reduce_sum(tf.maximum(-lensqr + delta, 0.0)) / tf.to_float(
num_kp * FLAGS.batch_size * 2))
def consistency_loss(uv0, uv1, pconf):
"""Computes multi-view consistency loss between two sets of keypoints.
Args:
uv0: [batch, num_kp, 2] The first set of keypoint 2D coordinates.
uv1: [batch, num_kp, 2] The second set of keypoint 2D coordinates.
pconf: [batch, num_kp] The weights used to compute the rotation estimate.
Returns:
The consistency loss.
"""
# [batch, num_kp, 2]
wd = tf.square(uv0 - uv1) * tf.expand_dims(pconf, 2)
wd = tf.reduce_sum(wd, axis=[1, 2])
return tf.reduce_mean(wd)
def variance_loss(probmap, ranx, rany, uv):
"""Computes the variance loss as part of Sillhouette consistency.
Args:
probmap: [batch, num_kp, h, w] The distribution map of keypoint locations.
ranx: X-axis meshgrid.
rany: Y-axis meshgrid.
uv: [batch, num_kp, 2] Keypoint locations (in NDC).
Returns:
The variance loss.
"""
ran = tf.stack([ranx, rany], axis=2)
sh = tf.shape(ran)
# [batch, num_kp, vh, vw, 2]
ran = tf.reshape(ran, [1, 1, sh[0], sh[1], 2])
sh = tf.shape(uv)
uv = tf.reshape(uv, [sh[0], sh[1], 1, 1, 2])
diff = tf.reduce_sum(tf.square(uv - ran), axis=4)
diff *= probmap
return tf.reduce_mean(tf.reduce_sum(diff, axis=[2, 3]))
def dilated_cnn(images, num_filters, is_training):
"""Constructs a base dilated convolutional network.
Args:
images: [batch, h, w, 3] Input RGB images.
num_filters: The number of filters for all layers.
is_training: True if this function is called during training.
Returns:
Output of this dilated CNN.
"""
net = images
with slim.arg_scope(
[slim.conv2d, slim.fully_connected],
normalizer_fn=slim.batch_norm,
activation_fn=lambda x: tf.nn.leaky_relu(x, alpha=0.1),
normalizer_params={"is_training": is_training}):
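# Stacks of 3x3 convolutions with exponentially increasing dilation rates
# grow the receptive field without any downsampling, so the output keeps the
# input's spatial resolution.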
for i, r in enumerate([1, 1, 2, 4, 8, 16, 1, 2, 4, 8, 16, 1]):
net = slim.conv2d(net, num_filters, [3, 3], rate=r, scope="dconv%d" % i)
return net
def orientation_network(images, num_filters, is_training):
"""Constructs a network that infers the orientation of an object.
Args:
images: [batch, h, w, 3] Input RGB images.
num_filters: The number of filters for all layers.
is_training: True if this function is called during training.
Returns:
Output of the orientation network.
"""
with tf.variable_scope("OrientationNetwork"):
net = dilated_cnn(images, num_filters, is_training)
modules = 2
prob = slim.conv2d(net, 2, [3, 3], rate=1, activation_fn=None)
prob = tf.transpose(prob, [0, 3, 1, 2])
prob = tf.reshape(prob, [-1, modules, vh * vw])
prob = tf.nn.softmax(prob)
ranx, rany = meshgrid(vh)
prob = tf.reshape(prob, [-1, 2, vh, vw])
sx = tf.reduce_sum(prob * ranx, axis=[2, 3])
sy = tf.reduce_sum(prob * rany, axis=[2, 3]) # -> batch x modules
out_xy = tf.reshape(tf.stack([sx, sy], -1), [-1, modules, 2])
return out_xy
def keypoint_network(rgba,
num_filters,
num_kp,
is_training,
lr_gt=None,
anneal=1):
"""Constructs our main keypoint network that predicts 3D keypoints.
Args:
rgba: [batch, h, w, 4] Input RGB images with alpha channel.
num_filters: The number of filters for all layers.
num_kp: The number of keypoints.
is_training: True if this function is called during training.
lr_gt: The ground-truth orientation flag used at the beginning of training;
the network's own estimate is linearly annealed in over time.
anneal: A number between [0, 1] where 1 means using the ground-truth
orientation and 0 means using our estimate.
Returns:
uv: [batch, num_kp, 2] 2D locations of keypoints.
z: [batch, num_kp] The depth of keypoints.
orient: [batch, 2, 2] Two 2D coordinates that correspond to [1, 0, 0] and
[-1, 0, 0] in object space.
sill: The silhouette loss.
variance: The variance loss.
prob_viz: A visualization of all predicted keypoints.
prob_vizs: A list of visualizations of each keypoint.
"""
images = rgba[:, :, :, :3]
# [batch, 2, 2]
orient = orientation_network(images, num_filters * 0.5, is_training)
# [batch, 1]
lr_estimated = tf.maximum(0.0, tf.sign(orient[:, 0, :1] - orient[:, 1, :1]))
if lr_gt is None:
lr = lr_estimated
else:
lr_gt = tf.maximum(0.0, tf.sign(lr_gt[:, :1]))
lr = tf.round(lr_gt * anneal + lr_estimated * (1 - anneal))
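# Broadcast the binary orientation flag over the spatial dimensions and
# append it to the RGB input as an extra channel.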
lrtiled = tf.tile(
tf.expand_dims(tf.expand_dims(lr, 1), 1),
[1, images.shape[1], images.shape[2], 1])
images = tf.concat([images, lrtiled], axis=3)
mask = rgba[:, :, :, 3]
mask = tf.cast(tf.greater(mask, tf.zeros_like(mask)), dtype=tf.float32)
net = dilated_cnn(images, num_filters, is_training)
# The probability distribution map.
prob = slim.conv2d(
net, num_kp, [3, 3], rate=1, scope="conv_xy", activation_fn=None)
# We add the fixed camera distance as a bias.
z = -30 + slim.conv2d(
net, num_kp, [3, 3], rate=1, scope="conv_z", activation_fn=None)
prob = tf.transpose(prob, [0, 3, 1, 2])
z = tf.transpose(z, [0, 3, 1, 2])
prob = tf.reshape(prob, [-1, num_kp, vh * vw])
prob = tf.nn.softmax(prob, name="softmax")
ranx, rany = meshgrid(vh)
prob = tf.reshape(prob, [-1, num_kp, vh, vw])
# These are for visualizing the distribution maps.
prob_viz = tf.expand_dims(tf.reduce_sum(prob, 1), 3)
prob_vizs = [tf.expand_dims(prob[:, i, :, :], 3) for i in range(num_kp)]
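# Expected (soft-argmax) 2D coordinates under each keypoint's probability map.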
sx = tf.reduce_sum(prob * ranx, axis=[2, 3])
sy = tf.reduce_sum(prob * rany, axis=[2, 3]) # -> batch x num_kp
# [batch, num_kp]
sill = tf.reduce_sum(prob * tf.expand_dims(mask, 1), axis=[2, 3])
sill = tf.reduce_mean(-tf.log(sill + 1e-12))
z = tf.reduce_sum(prob * z, axis=[2, 3])
uv = tf.reshape(tf.stack([sx, sy], -1), [-1, num_kp, 2])
variance = variance_loss(prob, ranx, rany, uv)
return uv, z, orient, sill, variance, prob_viz, prob_vizs
def model_fn(features, labels, mode, hparams):
"""Returns model_fn for tf.estimator.Estimator."""
del labels
is_training = (mode == tf.estimator.ModeKeys.TRAIN)
t = Transformer(vw, vh, FLAGS.dset)
def func1(x):
return tf.transpose(tf.reshape(features[x], [-1, 4, 4]), [0, 2, 1])
mv = [func1("mv%d" % i) for i in range(2)]
mvi = [func1("mvi%d" % i) for i in range(2)]
uvz = [None] * 2
uvz_proj = [None] * 2 # uvz coordinates projected on to the other view.
viz = [None] * 2
vizs = [None] * 2
loss_sill = 0
loss_variance = 0
loss_con = 0
loss_sep = 0
loss_lr = 0
for i in range(2):
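# Run the keypoint network on each of the two views with shared weights
# (reuse=i > 0) and accumulate the per-view losses.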
with tf.variable_scope("KeypointNetwork", reuse=i > 0):
# anneal: 1 = use the ground-truth orientation, 0 = use our estimate.
anneal = tf.to_float(hparams.lr_anneal_end - tf.train.get_global_step())
anneal = tf.clip_by_value(
anneal / (hparams.lr_anneal_end - hparams.lr_anneal_start), 0.0, 1.0)
uv, z, orient, sill, variance, viz[i], vizs[i] = keypoint_network(
features["img%d" % i],
hparams.num_filters,
hparams.num_kp,
is_training,
lr_gt=features["lr%d" % i],
anneal=anneal)
# x-positive/negative axes (dominant direction).
xp_axis = tf.tile(
tf.constant([[[1.0, 0, 0, 1], [-1.0, 0, 0, 1]]]),
[tf.shape(orient)[0], 1, 1])
# [batch, 2, 4] = [batch, 2, 4] x [batch, 4, 4]
xp = tf.matmul(xp_axis, mv[i])
# [batch, 2, 3]
xp = t.project(xp)
loss_lr += tf.losses.mean_squared_error(orient[:, :, :2], xp[:, :, :2])
loss_variance += variance
loss_sill += sill
uv = tf.reshape(uv, [-1, hparams.num_kp, 2])
z = tf.reshape(z, [-1, hparams.num_kp, 1])
# [batch, num_kp, 3]
uvz[i] = tf.concat([uv, z], axis=2)
world_coords = tf.matmul(t.unproject(uvz[i]), mvi[i])
# [batch, num_kp, 3]
uvz_proj[i] = t.project(tf.matmul(world_coords, mv[1 - i]))
pconf = tf.ones(
[tf.shape(uv)[0], tf.shape(uv)[1]], dtype=tf.float32) / hparams.num_kp
for i in range(2):
loss_con += consistency_loss(uvz_proj[i][:, :, :2], uvz[1 - i][:, :, :2],
pconf)
loss_sep += separation_loss(
t.unproject(uvz[i])[:, :, :3], hparams.sep_delta)
chordal, angular = relative_pose_loss(
t.unproject(uvz[0])[:, :, :3],
t.unproject(uvz[1])[:, :, :3], tf.matmul(mvi[0], mv[1]), pconf,
hparams.noise)
loss = (
hparams.loss_pose * angular +
hparams.loss_con * loss_con +
hparams.loss_sep * loss_sep +
hparams.loss_sill * loss_sill +
hparams.loss_lr * loss_lr +
hparams.loss_variance * loss_variance
)
def touint8(img):
return tf.cast(img * 255.0, tf.uint8)
with tf.variable_scope("output"):
tf.summary.image("0_img0", touint8(features["img0"][:, :, :, :3]))
tf.summary.image("1_combined", viz[0])
for i in range(hparams.num_kp):
tf.summary.image("2_f%02d" % i, vizs[0][i])
with tf.variable_scope("stats"):
tf.summary.scalar("anneal", anneal)
tf.summary.scalar("closs", loss_con)
tf.summary.scalar("seploss", loss_sep)
tf.summary.scalar("angular", angular)
tf.summary.scalar("chordal", chordal)
tf.summary.scalar("lrloss", loss_lr)
tf.summary.scalar("sill", loss_sill)
tf.summary.scalar("vloss", loss_variance)
return {
"loss": loss,
"predictions": {
"img0": features["img0"],
"img1": features["img1"],
"uvz0": uvz[0],
"uvz1": uvz[1]
},
"eval_metric_ops": {
"closs": tf.metrics.mean(loss_con),
"angular_loss": tf.metrics.mean(angular),
"chordal_loss": tf.metrics.mean(chordal),
}
}
def predict(input_folder, hparams):
"""Predicts keypoints on all images in input_folder."""
cols = plt.cm.get_cmap("rainbow")(
np.linspace(0, 1.0, hparams.num_kp))[:, :4]
img = tf.placeholder(tf.float32, shape=(1, 128, 128, 4))
with tf.variable_scope("KeypointNetwork"):
ret = keypoint_network(
img, hparams.num_filters, hparams.num_kp, False)
uv = tf.reshape(ret[0], [-1, hparams.num_kp, 2])
z = tf.reshape(ret[1], [-1, hparams.num_kp, 1])
uvz = tf.concat([uv, z], axis=2)
sess = tf.Session()
saver = tf.train.Saver()
ckpt = tf.train.get_checkpoint_state(FLAGS.model_dir)
print("loading model: ", ckpt.model_checkpoint_path)
saver.restore(sess, ckpt.model_checkpoint_path)
files = [x for x in os.listdir(input_folder)
if x[-3:] in ["jpg", "png"]]
output_folder = os.path.join(input_folder, "output")
if not os.path.exists(output_folder):
os.mkdir(output_folder)
for f in files:
orig = misc.imread(os.path.join(input_folder, f)).astype(float) / 255
if orig.shape[2] == 3:
orig = np.concatenate((orig, np.ones_like(orig[:, :, :1])), axis=2)
uv_ret = sess.run(uvz, feed_dict={img: np.expand_dims(orig, 0)})
utils.draw_ndc_points(orig, uv_ret.reshape(hparams.num_kp, 3), cols)
misc.imsave(os.path.join(output_folder, f), orig)
def _default_hparams():
"""Returns default or overridden user-specified hyperparameters."""
hparams = tf.contrib.training.HParams(
num_filters=64, # Number of filters.
num_kp=10, # Number of keypoints.
loss_pose=0.2, # Pose Loss.
loss_con=1.0, # Multiview consistency Loss.
loss_sep=1.0, # Separation Loss.
loss_sill=1.0, # Silhouette Loss.
loss_lr=1.0, # Orientation Loss.
loss_variance=0.5, # Variance Loss (part of silhouette loss).
sep_delta=0.05, # Separation threshold.
noise=0.1, # Noise added during estimating rotation.
learning_rate=1.0e-3,
lr_anneal_start=30000, # When to anneal in the orientation prediction.
lr_anneal_end=60000, # When to use the prediction completely.
)
if FLAGS.hparams:
hparams = hparams.parse(FLAGS.hparams)
return hparams
def main(argv):
del argv
hparams = _default_hparams()
if FLAGS.predict:
predict(FLAGS.input, hparams)
else:
utils.train_and_eval(
model_dir=FLAGS.model_dir,
model_fn=model_fn,
input_fn=create_input_fn,
hparams=hparams,
steps=FLAGS.steps,
batch_size=FLAGS.batch_size,
save_checkpoints_secs=600,
eval_throttle_secs=1800,
eval_steps=5,
sync_replicas=FLAGS.sync_replicas,
)
if __name__ == "__main__":
sys.excepthook = utils.colored_hook(
os.path.dirname(os.path.realpath(__file__)))
tf.app.run()
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================
"""An example script to generate a tfrecord file from a folder containing the
renderings.
Example usage:
python gen_tfrecords.py --input=FOLDER --output=output.tfrecord
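The input folder is expected to contain the paired renderings produced by
render.py: for each index i, %06d.txt holds two stacked 4x4 model-view
matrices, and the matching images are the PNGs numbered 2*i and 2*i + 1.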
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import os
from scipy import misc
import tensorflow as tf
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string("input", "", "Input folder containing images")
tf.app.flags.DEFINE_string("output", "", "Output tfrecord.")
def get_matrix(lines):
return np.array([[float(y) for y in x.strip().split(" ")] for x in lines])
def read_model_view_matrices(filename):
with open(filename, "r") as f:
lines = f.readlines()
return get_matrix(lines[:4]), get_matrix(lines[4:])
def bytes_feature(values):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[values]))
def generate():
with tf.python_io.TFRecordWriter(FLAGS.output) as tfrecord_writer:
with tf.Graph().as_default():
im0 = tf.placeholder(dtype=tf.uint8)
im1 = tf.placeholder(dtype=tf.uint8)
encoded0 = tf.image.encode_png(im0)
encoded1 = tf.image.encode_png(im1)
with tf.Session() as sess:
count = 0
indir = FLAGS.input + "/"
while tf.gfile.Exists(indir + "%06d.txt" % count):
print("saving %06d" % count)
image0 = misc.imread(indir + "%06d.png" % (count * 2))
image1 = misc.imread(indir + "%06d.png" % (count * 2 + 1))
mat0, mat1 = read_model_view_matrices(indir + "%06d.txt" % count)
mati0 = np.linalg.inv(mat0).flatten()
mati1 = np.linalg.inv(mat1).flatten()
mat0 = mat0.flatten()
mat1 = mat1.flatten()
st0, st1 = sess.run([encoded0, encoded1],
feed_dict={im0: image0, im1: image1})
example = tf.train.Example(features=tf.train.Features(feature={
'img0': bytes_feature(st0),
'img1': bytes_feature(st1),
'mv0': tf.train.Feature(
float_list=tf.train.FloatList(value=mat0)),
'mvi0': tf.train.Feature(
float_list=tf.train.FloatList(value=mati0)),
'mv1': tf.train.Feature(
float_list=tf.train.FloatList(value=mat1)),
'mvi1': tf.train.Feature(
float_list=tf.train.FloatList(value=mati1)),
}))
tfrecord_writer.write(example.SerializeToString())
count += 1
def main(argv):
del argv
generate()
if __name__ == "__main__":
tf.app.run()
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================
"""Script to render object views from ShapeNet obj models.
Example usage:
blender -b --python render.py -- -m model.obj -o output/ -s 128 -n 120 -fov 5
"""
from __future__ import print_function
import argparse
import itertools
import json
from math import pi
import os
import random
import sys
from mathutils import Vector
import math
import mathutils
import time
import copy
import bpy
sys.path.append(os.path.dirname(__file__))
BG_LUMINANCE = 0
def look_at(obj_camera, point):
loc_camera = obj_camera.location
direction = point - loc_camera
# point the cameras '-Z' and use its 'Y' as up
rot_quat = direction.to_track_quat('-Z', 'Y')
obj_camera.rotation_euler = rot_quat.to_euler()
def roll_camera(obj_camera):
roll_rotate = mathutils.Euler(
(0, 0, random.random() * math.pi - math.pi * 0.5), 'XYZ')
obj_camera.rotation_euler = (obj_camera.rotation_euler.to_matrix() *
roll_rotate.to_matrix()).to_euler()
def norm(x):
return math.sqrt(x[0] * x[0] + x[1] * x[1] + x[2] * x[2])
def normalize(x):
n = norm(x)
x[0] /= n
x[1] /= n
x[2] /= n
def random_top_sphere():
xyz = [random.normalvariate(0, 1) for x in range(3)]
normalize(xyz)
if xyz[2] < 0:
xyz[2] *= -1
return xyz
def perturb_sphere(loc, size):
while True:
xyz = [random.normalvariate(0, 1) for x in range(3)]
normalize(xyz)
nloc = [loc[i] + xyz[i] * random.random() * size for i in range(3)]
normalize(nloc)
if nloc[2] >= 0:
return nloc
def perturb(loc, size):
while True:
nloc = [loc[i] + random.random() * size * 2 - size for i in range(3)]
if nloc[2] >= 0:
return nloc
bpy.ops.object.mode_set()
def delete_all_objects():
bpy.ops.object.select_by_type(type="MESH")
bpy.ops.object.delete(use_global=False)
def set_scene(render_size, fov, alpha=False):
"""Set up default scene properties."""
delete_all_objects()
cam = bpy.data.cameras["Camera"]
cam.angle = fov * pi / 180
light = bpy.data.objects["Lamp"]
light.location = (0, 0, 1)
look_at(light, Vector((0.0, 0, 0)))
bpy.data.lamps['Lamp'].type = "HEMI"
bpy.data.lamps['Lamp'].energy = 1
bpy.data.lamps['Lamp'].use_specular = False
bpy.data.lamps['Lamp'].use_diffuse = True
bpy.context.scene.world.horizon_color = (
BG_LUMINANCE, BG_LUMINANCE, BG_LUMINANCE)
bpy.context.scene.render.resolution_x = render_size
bpy.context.scene.render.resolution_y = render_size
bpy.context.scene.render.resolution_percentage = 100
bpy.context.scene.render.use_antialiasing = True
bpy.context.scene.render.antialiasing_samples = '5'
def get_modelview_matrix():
cam = bpy.data.objects["Camera"]
bpy.context.scene.update()
# Applying to_blender to a point in CV (computer vision) coordinates,
# i.e. to_blender * p, gives the point in Blender coordinates.
to_blender = mathutils.Matrix(
((1., 0., 0., 0.),
(0., 0., -1., 0.),
(0., 1., 0., 0.),
(0., 0., 0., 1.)))
return cam.matrix_world.inverted() * to_blender
def print_matrix(f, mat):
for i in range(4):
for j in range(4):
f.write("%lf " % mat[i][j])
f.write("\n")
def mul(loc, v):
return [loc[i] * v for i in range(3)]
def merge_all():
bpy.ops.object.select_by_type(type="MESH")
bpy.context.scene.objects.active = bpy.context.selected_objects[0]
bpy.ops.object.join()
obj = bpy.context.scene.objects.active
bpy.ops.object.origin_set(type="ORIGIN_CENTER_OF_MASS")
return obj
def insert_frame(obj, frame_number):
obj.keyframe_insert(data_path="location", frame=frame_number)
obj.keyframe_insert(data_path="rotation_euler", frame=frame_number)
obj.keyframe_insert(data_path="scale", frame=frame_number)
def render(output_prefix):
bpy.context.scene.render.filepath = output_prefix
bpy.context.scene.render.image_settings.file_format = "PNG"
bpy.context.scene.render.alpha_mode = "TRANSPARENT"
bpy.context.scene.render.image_settings.color_mode = "RGBA"
bpy.ops.render.render(write_still=True, animation=True)
def render_obj(
obj_fn, save_dir, n, perturb_size, rotate=False, roll=False, scale=1.0):
# Load object.
bpy.ops.import_scene.obj(filepath=obj_fn)
cur_obj = merge_all()
scale = 2.0 / max(cur_obj.dimensions) * scale
cur_obj.scale = (scale, scale, scale)
# Using the center of mass as the origin doesn't really work, because Blender
# assumes the object is a solid shell. This seems to generate better-looking
# rotations.
bpy.ops.object.origin_set(type='ORIGIN_GEOMETRY', center='BOUNDS')
# bpy.ops.mesh.primitive_cube_add(location=(0, 0, 1))
# cube = bpy.data.objects["Cube"]
# cube.scale = (0.2, 0.2, 0.2)
for polygon in cur_obj.data.polygons:
polygon.use_smooth = True
bpy.ops.object.select_all(action="DESELECT")
camera = bpy.data.objects["Camera"]
# os.system("mkdir " + save_dir)
for i in range(n):
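# Each index i yields a pair of camera poses (animation frames 2*i and
# 2*i + 1); gen_tfrecords.py later packs each pair into a single example.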
fo = open(save_dir + "/%06d.txt" % i, "w")
d = 30
shift = 0.2
if rotate:
t = 1.0 * i / (n-1) * 2 * math.pi
loc = [math.sin(t), math.cos(t), 1]
normalize(loc)
camera.location = mul(loc, d)
look_at(camera, Vector((0.0, 0, 0)))
print_matrix(fo, get_modelview_matrix())
print_matrix(fo, get_modelview_matrix())
insert_frame(camera, 2 * i)
insert_frame(camera, 2 * i + 1)
else:
loc = random_top_sphere()
camera.location = mul(loc, d)
look_at(camera, Vector((0.0, 0, 0)))
if roll:
roll_camera(camera)
camera.location = perturb(mul(loc, d), shift)
print_matrix(fo, get_modelview_matrix())
insert_frame(camera, 2 * i)
if perturb_size > 0:
loc = perturb_sphere(loc, perturb_size)
else:
loc = random_top_sphere()
camera.location = mul(loc, d)
look_at(camera, Vector((0.0, 0, 0)))
if roll:
roll_camera(camera)
camera.location = perturb(mul(loc, d), shift)
print_matrix(fo, get_modelview_matrix())
insert_frame(camera, 2 * i + 1)
fo.close()
# Create a bunch of views of the object
bpy.context.scene.frame_start = 0
bpy.context.scene.frame_end = 2 * n - 1
stem = os.path.join(save_dir, '######')
render(stem)
def main():
parser = argparse.ArgumentParser()
parser.add_argument('-m', '--model', dest='model',
required=True,
help='Path to model obj file.')
parser.add_argument('-o', '--output_dir', dest='output_dir',
required=True,
help='Where to output files.')
parser.add_argument('-s', '--output_size', dest='output_size',
required=True,
help='Width and height of the (square) output in pixels, e.g. 128.')
parser.add_argument('-n', '--num_frames', dest='n', type=int,
required=True,
help='Number of frames to generate per clip.')
parser.add_argument('-scale', '--scale', dest='scale', type=float,
help='object scaling', default=1)
parser.add_argument('-perturb', '--perturb', dest='perturb', type=float,
help='sphere perturbation', default=0)
parser.add_argument('-rotate', '--rotate', dest='rotate', action='store_true',
help='render rotating test set')
parser.add_argument('-roll', '--roll', dest='roll', action='store_true',
help='add roll')
parser.add_argument(
'-fov', '--fov', dest='fov', type=float, required=True,
help='field of view')
if '--' not in sys.argv:
parser.print_help()
exit(1)
argv = sys.argv[sys.argv.index('--') + 1:]
args, _ = parser.parse_known_args(argv)
random.seed(args.model + str(time.time()) + str(os.getpid()))
# random.seed(0)
set_scene(int(args.output_size), args.fov)
render_obj(
args.model, args.output_dir, args.n, args.perturb, args.rotate,
args.roll, args.scale)
exit()
if __name__ == '__main__':
main()
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================
"""Utility functions for KeypointNet.
These are helper / tensorflow related functions. The actual implementation and
algorithm is in main.py.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import numpy as np
import os
import re
import tensorflow as tf
import tensorflow.contrib.slim as slim
import time
import traceback
class TrainingHook(tf.train.SessionRunHook):
"""A utility for displaying training information such as the loss, percent
completed, estimated finish date and time."""
def __init__(self, steps):
self.steps = steps
self.last_time = time.time()
self.last_est = self.last_time
self.eta_interval = int(math.ceil(0.1 * self.steps))
self.current_interval = 0
def before_run(self, run_context):
graph = tf.get_default_graph()
return tf.train.SessionRunArgs(
{"loss": graph.get_collection("total_loss")[0]})
def after_run(self, run_context, run_values):
step = run_context.session.run(tf.train.get_global_step())
now = time.time()
if self.current_interval < self.eta_interval:
self.duration = now - self.last_est
self.current_interval += 1
if step % self.eta_interval == 0:
self.duration = now - self.last_est
self.last_est = now
eta_time = float(self.steps - step) / self.current_interval * \
self.duration
m, s = divmod(eta_time, 60)
h, m = divmod(m, 60)
eta = "%d:%02d:%02d" % (h, m, s)
print("%.2f%% (%d/%d): %.3e t %.3f @ %s (%s)" % (
step * 100.0 / self.steps,
step,
self.steps,
run_values.results["loss"],
now - self.last_time,
time.strftime("%a %d %H:%M:%S", time.localtime(time.time() + eta_time)),
eta))
self.last_time = now
def standard_model_fn(
func, steps, run_config=None, sync_replicas=0, optimizer_fn=None):
"""Creates model_fn for tf.Estimator.
Args:
func: A model_fn with prototype model_fn(features, labels, mode, hparams).
steps: Training steps.
run_config: tf.estimator.RunConfig (usually passed in from TF_CONFIG).
sync_replicas: The number of replicas used to compute gradients for
synchronous training.
optimizer_fn: The type of the optimizer. Defaults to Adam.
Returns:
model_fn for tf.estimator.Estimator.
"""
def fn(features, labels, mode, params):
"""Returns model_fn for tf.estimator.Estimator."""
is_training = (mode == tf.estimator.ModeKeys.TRAIN)
ret = func(features, labels, mode, params)
tf.add_to_collection("total_loss", ret["loss"])
train_op = None
training_hooks = []
if is_training:
training_hooks.append(TrainingHook(steps))
if optimizer_fn is None:
optimizer = tf.train.AdamOptimizer(params.learning_rate)
else:
optimizer = optimizer_fn
if run_config is not None and run_config.num_worker_replicas > 1:
sr = sync_replicas
if sr <= 0:
sr = run_config.num_worker_replicas
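# Aggregate gradients from `sr` replicas before each update; the chief
# session-run hook appended below coordinates the synchronization.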
optimizer = tf.train.SyncReplicasOptimizer(
optimizer,
replicas_to_aggregate=sr,
total_num_replicas=run_config.num_worker_replicas)
training_hooks.append(
optimizer.make_session_run_hook(
run_config.is_chief, num_tokens=run_config.num_worker_replicas))
optimizer = tf.contrib.estimator.clip_gradients_by_norm(optimizer, 5)
train_op = slim.learning.create_train_op(ret["loss"], optimizer)
if "eval_metric_ops" not in ret:
ret["eval_metric_ops"] = {}
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=ret["predictions"],
loss=ret["loss"],
train_op=train_op,
eval_metric_ops=ret["eval_metric_ops"],
training_hooks=training_hooks)
return fn
def train_and_eval(
model_dir,
steps,
batch_size,
model_fn,
input_fn,
hparams,
keep_checkpoint_every_n_hours=0.5,
save_checkpoints_secs=180,
save_summary_steps=50,
eval_steps=20,
eval_start_delay_secs=10,
eval_throttle_secs=300,
sync_replicas=0):
"""Trains and evaluates our model. Supports local and distributed training.
Args:
model_dir: The output directory for trained parameters, checkpoints, etc.
steps: Training steps.
batch_size: Batch size.
model_fn: A func with prototype model_fn(features, labels, mode, hparams).
input_fn: A input function for the tf.estimator.Estimator.
hparams: tf.HParams containing a set of hyperparameters.
keep_checkpoint_every_n_hours: Number of hours between each checkpoint
to be saved.
save_checkpoints_secs: Save checkpoints every this many seconds.
save_summary_steps: Save summaries every this many steps.
eval_steps: Number of steps to evaluate model.
eval_start_delay_secs: Start evaluating after waiting for this many seconds.
eval_throttle_secs: Do not re-evaluate unless the last evaluation was
started at least this many seconds ago.
sync_replicas: Number of synchronous replicas for distributed training.
Returns:
None
"""
run_config = tf.estimator.RunConfig(
keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours,
save_checkpoints_secs=save_checkpoints_secs,
save_summary_steps=save_summary_steps)
estimator = tf.estimator.Estimator(
model_dir=model_dir,
model_fn=standard_model_fn(
model_fn,
steps,
run_config,
sync_replicas=sync_replicas),
params=hparams, config=run_config)
train_spec = tf.estimator.TrainSpec(
input_fn=input_fn(split="train", batch_size=batch_size),
max_steps=steps)
eval_spec = tf.estimator.EvalSpec(
input_fn=input_fn(split="validation", batch_size=batch_size),
steps=eval_steps,
start_delay_secs=eval_start_delay_secs,
throttle_secs=eval_throttle_secs)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
def draw_circle(rgb, u, v, col, r):
"""Draws a simple anti-aliasing circle in-place.
Args:
rgb: Input image to be modified.
u: Horizontal coordinate.
v: Vertical coordinate.
col: Color.
r: Radius.
"""
ir = int(math.ceil(r))
for i in range(-ir-1, ir+2):
for j in range(-ir-1, ir+2):
nu = int(round(u + i))
nv = int(round(v + j))
if nu < 0 or nu >= rgb.shape[1] or nv < 0 or nv >= rgb.shape[0]:
continue
du = abs(nu - u)
dv = abs(nv - v)
# need sqrt to keep scale
t = math.sqrt(du * du + dv * dv) - math.sqrt(r * r)
if t < 0:
rgb[nv, nu, :] = col
else:
t = 1 - t
if t > 0:
# t = t ** 0.3
rgb[nv, nu, :] = col * t + rgb[nv, nu, :] * (1-t)
def draw_ndc_points(rgb, xy, cols):
"""Draws keypoints onto an input image.
Args:
rgb: Input image to be modified.
xy: [n x 2] matrix of 2D locations.
cols: A list of colors for the keypoints.
"""
vh, vw = rgb.shape[0], rgb.shape[1]
for j in range(len(cols)):
x, y = xy[j, :2]
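# Map NDC coordinates in [-1, 1] to pixel space; the y-axis is flipped so
# +y in NDC corresponds to the top of the image.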
x = (min(max(x, -1), 1) * vw / 2 + vw / 2) - 0.5
y = vh - 0.5 - (min(max(y, -1), 1) * vh / 2 + vh / 2)
x = int(round(x))
y = int(round(y))
if x < 0 or y < 0 or x >= vw or y >= vh:
continue
rad = 1.5
rad *= rgb.shape[0] / 128.0
draw_circle(rgb, x, y, np.array([0.0, 0.0, 0.0, 1.0]), rad * 1.5)
draw_circle(rgb, x, y, cols[j], rad)
def colored_hook(home_dir):
"""Colorizes python's error message.
Args:
home_dir: directory where code resides (to highlight your own files).
Returns:
The traceback hook.
"""
def hook(type_, value, tb):
def colorize(text, color, own=0):
"""Returns colorized text."""
endcolor = "\x1b[0m"
codes = {
"green": "\x1b[0;32m",
"green_own": "\x1b[1;32;40m",
"red": "\x1b[0;31m",
"red_own": "\x1b[1;31m",
"yellow": "\x1b[0;33m",
"yellow_own": "\x1b[1;33m",
"black": "\x1b[0;90m",
"black_own": "\x1b[1;90m",
"cyan": "\033[1;36m",
}
return codes[color + ("_own" if own else "")] + text + endcolor
for filename, line_num, func, text in traceback.extract_tb(tb):
basename = os.path.basename(filename)
own = (home_dir in filename) or ("/" not in filename)
print(colorize("\"" + basename + '"', "green", own) + " in " + func)
print("%s: %s" % (
colorize("%5d" % line_num, "red", own),
colorize(text, "yellow", own)))
print(" %s" % colorize(filename, "black", own))
print(colorize("%s: %s" % (type_.__name__, value), "cyan"))
return hook