Commit 026ca58a authored by Reza Mahjourian

Add vid2depth model.

parent a1adc50b
@@ -44,6 +44,7 @@
/research/tensorrt/ @karmel
/research/textsum/ @panyx0718 @peterjliu
/research/transformer/ @daviddao
/research/vid2depth/ @rezama
/research/video_prediction/ @cbfinn
/research/fivo/ @dieterichlawson
/samples/ @MarkDaoust @lamberta
@@ -70,5 +70,7 @@ request.
summarization.
- [transformer](transformer): spatial transformer network, which allows the
spatial manipulation of data within the network.
- [vid2depth](vid2depth): unsupervised learning of depth and ego-motion from
  raw monocular video.
- [video_prediction](video_prediction): predicting future video frames with
neural advection.
# For projects which use TensorFlow as part of a Bazel build process, putting
# nothing in a bazelrc will default to a monolithic build. The following line
# opts in to modular op registration support by default.
build --define framework_shared_object=true
build --copt=-DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK
build --define=grpc_no_ares=true
package(default_visibility = ["//visibility:public"])
# vid2depth
**Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints**
Reza Mahjourian, Martin Wicke, Anelia Angelova
CVPR 2018
Project website: [https://sites.google.com/view/vid2depth](https://sites.google.com/view/vid2depth)
arXiv: [https://arxiv.org/abs/1802.05522](https://arxiv.org/abs/1802.05522)
<p align="center">
<a href="http://sites.google.com/corp/view/vid2depth"><img src='https://storage.googleapis.com/vid2depth/media/sample_video.gif'></a>
</p>
<p align="center">
<a href="http://sites.google.com/corp/view/vid2depth"><img src='https://storage.googleapis.com/vid2depth/media/approach.png' width=400></a>
</p>
## 1. Installation
### Requirements
#### Python Packages
```shell
mkvirtualenv venv # Optionally create a virtual environment.
pip install absl-py
pip install matplotlib
pip install numpy
pip install scipy
pip install tensorflow
```
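The code targets the TensorFlow 1.x API (it uses `tf.contrib.slim` and `tf.gfile`), so a quick sanity check of the environment can help before going further:
```shell
python -c "import tensorflow as tf; print(tf.__version__)"
python -c "import absl, matplotlib, numpy, scipy"
```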
#### For building the ICP op (work in progress)
* Bazel: https://bazel.build/
### Download vid2depth
```shell
git clone --depth 1 https://github.com/tensorflow/models.git
```
## 2. Datasets
### Download KITTI dataset (174GB)
```shell
mkdir -p ~/vid2depth/kitti-raw-uncompressed
cd ~/vid2depth/kitti-raw-uncompressed
wget https://raw.githubusercontent.com/mrharicot/monodepth/master/utils/kitti_archives_to_download.txt
wget -i kitti_archives_to_download.txt
unzip "*.zip"
```
### Download Cityscapes dataset (110GB) (optional)
You will need to register to download the data. Download the following files:
* leftImg8bit_sequence_trainvaltest.zip
* camera_trainvaltest.zip
### Download Bike dataset (17GB) (optional)
```shell
mkdir -p ~/vid2depth/bike-uncompressed
cd ~/vid2depth/bike-uncompressed
wget https://storage.googleapis.com/brain-robotics-data/bike/BikeVideoDataset.tar
tar xvf BikeVideoDataset.tar
```
## 3. Inference
### Download trained model
```shell
mkdir -p ~/vid2depth/trained-model
cd ~/vid2depth/trained-model
wget https://storage.cloud.google.com/vid2depth/model/model-119496.zip
unzip model-119496.zip
```
### Run inference
```shell
cd tensorflow/models/research/vid2depth
python inference.py \
--kitti_dir ~/vid2depth/kitti-raw-uncompressed \
--output_dir ~/vid2depth/inference \
--kitti_video 2011_09_26/2011_09_26_drive_0009_sync \
--model_ckpt ~/vid2depth/trained-model/model-119496
```
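Each output PNG stacks the resized input frame on top of its color-mapped depth estimate (see `inference.py`). If you would rather call the model from Python directly, the following is a minimal sketch mirroring `inference.py`; the checkpoint and image paths are placeholders for the files downloaded above:
```python
# Minimal sketch of programmatic depth inference (paths are placeholders).
from absl import app
import numpy as np
import scipy.misc
import tensorflow as tf

import model
import util

CKPT = 'trained-model/model-119496'  # placeholder checkpoint path


def main(_):
  # Build the inference graph: depth and egomotion networks fed by placeholders.
  inference_model = model.Model(is_training=False, batch_size=1,
                                img_height=128, img_width=416, seq_length=3)
  saver = tf.train.Saver(util.get_vars_to_restore(CKPT))
  sv = tf.train.Supervisor(logdir='/tmp/', saver=None)
  with sv.managed_session() as sess:
    saver.restore(sess, CKPT)
    im = scipy.misc.imresize(scipy.misc.imread('frame.png'), (128, 416))
    inputs = np.expand_dims(im, axis=0).astype(np.uint8)
    depth = inference_model.inference(inputs, sess, mode='depth')['depth']
    print(depth.shape)  # (batch, height, width, 1)


if __name__ == '__main__':
  app.run(main)
```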
## 4. Training
### Prepare KITTI training sequences
```shell
# Prepare training sequences.
cd tensorflow/models/research/vid2depth
python dataset/gen_data.py \
--dataset_name kitti_raw_eigen \
--dataset_dir ~/vid2depth/kitti-raw-uncompressed \
--data_dir ~/vid2depth/data/kitti_raw_eigen \
--seq_length 3
```
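Each generated example is a JPEG with the `seq_length` frames stacked side by side, accompanied by a `*_cam.txt` file holding the flattened 3x3 camera intrinsics (`fx,0,cx,0,fy,cy,0,0,1`); `train.txt` and `val.txt` list the training/validation split (see `dataset/gen_data.py`). A small sketch for inspecting one example, with placeholder folder and frame names:
```python
# Sketch: inspect one generated example (folder/frame names are placeholders).
import os

import numpy as np
import scipy.misc

data_dir = os.path.expanduser('~/vid2depth/data/kitti_raw_eigen')
example = os.path.join(data_dir, '<folder>', '<frame>')  # placeholder names

img = scipy.misc.imread(example + '.jpg')  # seq_length frames, side by side
width = img.shape[1] // 3                  # seq_length == 3 above
frames = [img[:, i * width:(i + 1) * width] for i in range(3)]

with open(example + '_cam.txt') as f:      # fx,0,cx,0,fy,cy,0,0,1
  intrinsics = np.array([float(v) for v in f.read().split(',')]).reshape(3, 3)
```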
### Prepare Cityscapes training sequences (optional)
```shell
# Prepare training sequences.
cd tensorflow/models/research/vid2depth
python dataset/gen_data.py \
--dataset_name cityscapes \
--dataset_dir ~/vid2depth/cityscapes-uncompressed \
--data_dir ~/vid2depth/data/cityscapes \
--seq_length 3
```
### Prepare Bike training sequences (optional)
```shell
# Prepare training sequences.
cd tensorflow/models/research/vid2depth
python dataset/gen_data.py \
--dataset_name bike \
--dataset_dir ~/vid2depth/bike-uncompressed \
--data_dir ~/vid2depth/data/bike \
--seq_length 3
```
### Compile the ICP op (work in progress)
The ICP op depends on multiple software packages (TensorFlow, Point Cloud
Library, FLANN, Boost, HDF5). The Bazel build system requires individual BUILD
files for each of these packages. We have included partial BUILD files in the
third_party directory, but they are not yet sufficient to compile the op. If you
manage to build the op, please let us know so we can include your contribution.
```shell
cd tensorflow/models/research/vid2depth
bazel build ops:pcl_demo # Build test program using PCL only.
bazel build ops:icp_op.so
```
For the time being, you can run inference with the pre-trained model and run
training without the ICP loss (i.e., with `--icp_weight 0`, as in the training
command below).
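If you do get `icp_op.so` to build, the usual way to expose a custom op to Python is `tf.load_op_library`. The sketch below is only an assumption about how `ops/icp_op.py` would wrap the op, based on the `Icp` op name and its `transform`/`residual` outputs referenced in `ops/icp_grad.py`:
```python
# Hypothetical wrapper for the compiled ICP op; assumes a successful Bazel
# build with the shared object under bazel-bin/ops/.
import tensorflow as tf

_icp_module = tf.load_op_library('bazel-bin/ops/icp_op.so')


def icp(source_cloud, ego_motion, target_cloud):
  """Returns (transform, residual) aligning source_cloud to target_cloud."""
  return _icp_module.icp(source_cloud, ego_motion, target_cloud)
```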
### Run training
```shell
# Train
cd tensorflow/models/research/vid2depth
python train.py \
--data_dir ~/vid2depth/data/kitti_raw_eigen \
--seq_length 3 \
--reconstr_weight 0.85 \
--smooth_weight 0.05 \
--ssim_weight 0.15 \
--icp_weight 0 \
--checkpoint_dir ~/vid2depth/checkpoints
```
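`model.py` defines scalar and image summaries for the losses, warped images, and disparity maps. Assuming `train.py` writes its event files next to the checkpoints, training can be monitored with TensorBoard:
```shell
tensorboard --logdir ~/vid2depth/checkpoints
```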
## Reference
If you find our work useful in your research, please consider citing our paper:
```
@inproceedings{mahjourian2018unsupervised,
title={Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints},
author={Mahjourian, Reza and Wicke, Martin and Angelova, Anelia},
booktitle = {CVPR},
year={2018}
}
```
## Contact
To ask questions or report issues, please open an issue on the tensorflow/models
[issue tracker](https://github.com/tensorflow/models/issues) and assign it to
[@rezama](https://github.com/rezama).
## Credits
This implementation is derived from [SfMLearner](https://github.com/tinghuiz/SfMLearner) by [Tinghui Zhou](https://github.com/tinghuiz).
workspace(name = "vid2depth")
# To update TensorFlow to a new revision.
# 1. Update the 'git_commit' args below to include the new git hash.
# 2. Get the sha256 hash of the archive with a command such as...
# curl -L https://github.com/tensorflow/tensorflow/archive/<git hash>.tar.gz | sha256sum
# and update the 'sha256' arg with the result.
# 3. Request the new archive to be mirrored on mirror.bazel.build for more
# reliable downloads.
load(":repo.bzl", "tensorflow_http_archive")
tensorflow_http_archive(
name = "org_tensorflow",
git_commit = "bc69c4ceed6544c109be5693eb40ddcf3a4eb95d",
sha256 = "21d6ac553adcfc9d089925f6d6793fee6a67264a0ce717bc998636662df4ca7e",
)
# TensorFlow depends on "io_bazel_rules_closure" so we need this here.
# Needs to be kept in sync with the same target in TensorFlow's WORKSPACE file.
http_archive(
name = "io_bazel_rules_closure",
sha256 = "dbe0da2cca88194d13dc5a7125a25dd7b80e1daec7839f33223de654d7a1bcc8",
strip_prefix = "rules_closure-ba3e07cb88be04a2d4af7009caa0ff3671a79d06",
urls = [
"https://mirror.bazel.build/github.com/bazelbuild/rules_closure/archive/ba3e07cb88be04a2d4af7009caa0ff3671a79d06.tar.gz",
"https://github.com/bazelbuild/rules_closure/archive/ba3e07cb88be04a2d4af7009caa0ff3671a79d06.tar.gz", # 2017-10-31
],
)
load("@org_tensorflow//tensorflow:workspace.bzl", "tf_workspace")
tf_workspace(
path_prefix = "",
tf_repo_name = "org_tensorflow",
)
bind(
name = "libssl",
actual = "@boringssl//:ssl",
)
bind(
name = "zlib",
actual = "@zlib_archive//:zlib",
)
# gRPC wants a c-ares dependency to exist, but its contents are not actually
# important since we have set GRPC_ARES=0 in tools/bazel.rc.
bind(
name = "cares",
actual = "@grpc//third_party/nanopb:nanopb",
)
# Specify the minimum required bazel version.
load("@org_tensorflow//tensorflow:workspace.bzl", "check_bazel_version_at_least")
check_bazel_version_at_least("0.5.4")
# TODO(rodrigoq): rename to com_github_antonovvk_bazel_rules to match cartographer.
http_archive(
name = "bazel_rules",
sha256 = "b6e1b6cfc17f676c70045deb6d46bb330490693e65c8d541aae265ea34a48c8c",
strip_prefix = "bazel_rules-0394a3b108412b8e543fd90255daa416e988c4a1",
urls = [
"https://mirror.bazel.build/github.com/drigz/bazel_rules/archive/0394a3b108412b8e543fd90255daa416e988c4a1.tar.gz",
"https://github.com/drigz/bazel_rules/archive/0394a3b108412b8e543fd90255daa416e988c4a1.tar.gz",
],
)
# Point Cloud Library (PCL)
new_http_archive(
name = "com_github_pointcloudlibrary_pcl",
build_file = "//third_party:pcl.BUILD",
sha256 = "5a102a2fbe2ba77c775bf92c4a5d2e3d8170be53a68c3a76cfc72434ff7b9783",
strip_prefix = "pcl-pcl-1.8.1",
urls = [
"https://mirror.bazel.build/github.com/PointCloudLibrary/pcl/archive/pcl-1.8.1.tar.gz",
"https://github.com/PointCloudLibrary/pcl/archive/pcl-1.8.1.tar.gz",
],
)
# FLANN
new_http_archive(
name = "flann",
build_file = "//third_party:flann.BUILD",
strip_prefix = "flann-1.8.4-src",
urls = [
"https://www.cs.ubc.ca/research/flann/uploads/FLANN/flann-1.8.4-src.zip",
],
)
# HDF5
new_http_archive(
name = "hdf5",
url = "https://support.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.10.1.tar.gz",
strip_prefix = "hdf5-1.10.1",
build_file = "third_party/hdf5.BUILD",
)
# Boost
# http_archive(
# name = "com_github_nelhage_boost",
# sha256 = "5c88fc077f6b8111e997fec5146e5f9940ae9a2016eb9949447fcb4b482bcdb3",
# strip_prefix = "rules_boost-7289bb1d8f938fdf98078297768c122ee9e11c9e",
# urls = [
# "https://mirror.bazel.build/github.com/nelhage/rules_boost/archive/7289bb1d8f938fdf98078297768c122ee9e11c9e.tar.gz",
# "https://github.com/nelhage/rules_boost/archive/7289bb1d8f938fdf98078297768c122ee9e11c9e.tar.gz",
# ],
# )
#
# load("@com_github_nelhage_boost//:boost/boost.bzl", "boost_deps")
# boost_deps()
git_repository(
name = "com_github_nelhage_rules_boost",
commit = "239ce40e42ab0e3fe7ce84c2e9303ff8a277c41a",
remote = "https://github.com/nelhage/rules_boost",
)
load("@com_github_nelhage_rules_boost//:boost/boost.bzl", "boost_deps")
boost_deps()
# Eigen
# Based on https://github.com/tensorflow/tensorflow/blob/master/third_party/eigen.BUILD
new_http_archive(
name = "eigen_repo",
build_file = "//third_party:eigen.BUILD",
sha256 = "ca7beac153d4059c02c8fc59816c82d54ea47fe58365e8aded4082ded0b820c4",
strip_prefix = "eigen-eigen-f3a22f35b044",
urls = [
"http://mirror.bazel.build/bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
"https://bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
],
)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Generates data for training/validation and save it to disk."""
# Example usage:
#
# python dataset/gen_data.py \
# --alsologtostderr \
# --dataset_name kitti_raw_eigen \
# --dataset_dir ~/vid2depth/dataset/kitti-raw-uncompressed \
# --data_dir ~/vid2depth/data/kitti_raw_eigen_s3 \
# --seq_length 3 \
# --num_threads 12
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
import multiprocessing
import os
from absl import app
from absl import flags
from absl import logging
import dataset_loader
import numpy as np
import scipy.misc
import tensorflow as tf
gfile = tf.gfile
FLAGS = flags.FLAGS
DATASETS = [
'kitti_raw_eigen', 'kitti_raw_stereo', 'kitti_odom', 'cityscapes', 'bike'
]
flags.DEFINE_enum('dataset_name', None, DATASETS, 'Dataset name.')
flags.DEFINE_string('dataset_dir', None, 'Location for dataset source files.')
flags.DEFINE_string('data_dir', None, 'Where to save the generated data.')
# Note: Training time grows linearly with sequence length. Use 2 or 3.
flags.DEFINE_integer('seq_length', 3, 'Length of each training sequence.')
flags.DEFINE_integer('img_height', 128, 'Image height.')
flags.DEFINE_integer('img_width', 416, 'Image width.')
flags.DEFINE_integer(
'num_threads', None, 'Number of worker threads. '
'Defaults to number of CPU cores.')
flags.mark_flag_as_required('dataset_name')
flags.mark_flag_as_required('dataset_dir')
flags.mark_flag_as_required('data_dir')
# Process data in chunks for reporting progress.
NUM_CHUNKS = 100
def _generate_data():
"""Extract sequences from dataset_dir and store them in data_dir."""
if not gfile.Exists(FLAGS.data_dir):
gfile.MakeDirs(FLAGS.data_dir)
global dataloader # pylint: disable=global-variable-undefined
if FLAGS.dataset_name == 'bike':
dataloader = dataset_loader.Bike(FLAGS.dataset_dir,
img_height=FLAGS.img_height,
img_width=FLAGS.img_width,
seq_length=FLAGS.seq_length)
elif FLAGS.dataset_name == 'kitti_odom':
dataloader = dataset_loader.KittiOdom(FLAGS.dataset_dir,
img_height=FLAGS.img_height,
img_width=FLAGS.img_width,
seq_length=FLAGS.seq_length)
elif FLAGS.dataset_name == 'kitti_raw_eigen':
dataloader = dataset_loader.KittiRaw(FLAGS.dataset_dir,
split='eigen',
img_height=FLAGS.img_height,
img_width=FLAGS.img_width,
seq_length=FLAGS.seq_length)
elif FLAGS.dataset_name == 'kitti_raw_stereo':
dataloader = dataset_loader.KittiRaw(FLAGS.dataset_dir,
split='stereo',
img_height=FLAGS.img_height,
img_width=FLAGS.img_width,
seq_length=FLAGS.seq_length)
elif FLAGS.dataset_name == 'cityscapes':
dataloader = dataset_loader.Cityscapes(FLAGS.dataset_dir,
img_height=FLAGS.img_height,
img_width=FLAGS.img_width,
seq_length=FLAGS.seq_length)
else:
raise ValueError('Unknown dataset')
# The default loop below uses multiprocessing, which can make it difficult
# to locate the source of errors in the data loader classes.
# Uncomment this loop for easier debugging:
# all_examples = {}
# for i in range(dataloader.num_train):
# _gen_example(i, all_examples)
# logging.info('Generated: %d', len(all_examples))
all_frames = range(dataloader.num_train)
frame_chunks = np.array_split(all_frames, NUM_CHUNKS)
manager = multiprocessing.Manager()
all_examples = manager.dict()
num_cores = multiprocessing.cpu_count()
num_threads = num_cores if FLAGS.num_threads is None else FLAGS.num_threads
pool = multiprocessing.Pool(num_threads)
# Split into training/validation sets. Fixed seed for repeatability.
np.random.seed(8964)
if not gfile.Exists(FLAGS.data_dir):
gfile.MakeDirs(FLAGS.data_dir)
with gfile.Open(os.path.join(FLAGS.data_dir, 'train.txt'), 'w') as train_f:
with gfile.Open(os.path.join(FLAGS.data_dir, 'val.txt'), 'w') as val_f:
logging.info('Generating data...')
for index, frame_chunk in enumerate(frame_chunks):
all_examples.clear()
pool.map(_gen_example_star,
itertools.izip(frame_chunk, itertools.repeat(all_examples)))
logging.info('Chunk %d/%d: saving %s entries...', index + 1, NUM_CHUNKS,
len(all_examples))
for _, example in all_examples.items():
if example:
s = example['folder_name']
frame = example['file_name']
if np.random.random() < 0.1:
val_f.write('%s %s\n' % (s, frame))
else:
train_f.write('%s %s\n' % (s, frame))
pool.close()
pool.join()
def _gen_example(i, all_examples):
"""Saves one example to file. Also adds it to all_examples dict."""
example = dataloader.get_example_with_index(i)
if not example:
return
image_seq_stack = _stack_image_seq(example['image_seq'])
example.pop('image_seq', None) # Free up memory.
intrinsics = example['intrinsics']
fx = intrinsics[0, 0]
fy = intrinsics[1, 1]
cx = intrinsics[0, 2]
cy = intrinsics[1, 2]
save_dir = os.path.join(FLAGS.data_dir, example['folder_name'])
if not gfile.Exists(save_dir):
gfile.MakeDirs(save_dir)
img_filepath = os.path.join(save_dir, '%s.jpg' % example['file_name'])
scipy.misc.imsave(img_filepath, image_seq_stack.astype(np.uint8))
cam_filepath = os.path.join(save_dir, '%s_cam.txt' % example['file_name'])
example['cam'] = '%f,0.,%f,0.,%f,%f,0.,0.,1.' % (fx, cx, fy, cy)
with open(cam_filepath, 'w') as cam_f:
cam_f.write(example['cam'])
key = example['folder_name'] + '_' + example['file_name']
all_examples[key] = example
def _gen_example_star(params):
return _gen_example(*params)
def _stack_image_seq(seq):
for i, im in enumerate(seq):
if i == 0:
res = im
else:
res = np.hstack((res, im))
return res
def main(_):
_generate_data()
if __name__ == '__main__':
app.run(main)
training/image_2/000000_10.png
training/image_2/000001_10.png
training/image_2/000002_10.png
training/image_2/000003_10.png
training/image_2/000004_10.png
training/image_2/000005_10.png
training/image_2/000006_10.png
training/image_2/000007_10.png
training/image_2/000008_10.png
training/image_2/000009_10.png
training/image_2/000010_10.png
training/image_2/000011_10.png
training/image_2/000012_10.png
training/image_2/000013_10.png
training/image_2/000014_10.png
training/image_2/000015_10.png
training/image_2/000016_10.png
training/image_2/000017_10.png
training/image_2/000018_10.png
training/image_2/000019_10.png
training/image_2/000020_10.png
training/image_2/000021_10.png
training/image_2/000022_10.png
training/image_2/000023_10.png
training/image_2/000024_10.png
training/image_2/000025_10.png
training/image_2/000026_10.png
training/image_2/000027_10.png
training/image_2/000028_10.png
training/image_2/000029_10.png
training/image_2/000030_10.png
training/image_2/000031_10.png
training/image_2/000032_10.png
training/image_2/000033_10.png
training/image_2/000034_10.png
training/image_2/000035_10.png
training/image_2/000036_10.png
training/image_2/000037_10.png
training/image_2/000038_10.png
training/image_2/000039_10.png
training/image_2/000040_10.png
training/image_2/000041_10.png
training/image_2/000042_10.png
training/image_2/000043_10.png
training/image_2/000044_10.png
training/image_2/000045_10.png
training/image_2/000046_10.png
training/image_2/000047_10.png
training/image_2/000048_10.png
training/image_2/000049_10.png
training/image_2/000050_10.png
training/image_2/000051_10.png
training/image_2/000052_10.png
training/image_2/000053_10.png
training/image_2/000054_10.png
training/image_2/000055_10.png
training/image_2/000056_10.png
training/image_2/000057_10.png
training/image_2/000058_10.png
training/image_2/000059_10.png
training/image_2/000060_10.png
training/image_2/000061_10.png
training/image_2/000062_10.png
training/image_2/000063_10.png
training/image_2/000064_10.png
training/image_2/000065_10.png
training/image_2/000066_10.png
training/image_2/000067_10.png
training/image_2/000068_10.png
training/image_2/000069_10.png
training/image_2/000070_10.png
training/image_2/000071_10.png
training/image_2/000072_10.png
training/image_2/000073_10.png
training/image_2/000074_10.png
training/image_2/000075_10.png
training/image_2/000076_10.png
training/image_2/000077_10.png
training/image_2/000078_10.png
training/image_2/000079_10.png
training/image_2/000080_10.png
training/image_2/000081_10.png
training/image_2/000082_10.png
training/image_2/000083_10.png
training/image_2/000084_10.png
training/image_2/000085_10.png
training/image_2/000086_10.png
training/image_2/000087_10.png
training/image_2/000088_10.png
training/image_2/000089_10.png
training/image_2/000090_10.png
training/image_2/000091_10.png
training/image_2/000092_10.png
training/image_2/000093_10.png
training/image_2/000094_10.png
training/image_2/000095_10.png
training/image_2/000096_10.png
training/image_2/000097_10.png
training/image_2/000098_10.png
training/image_2/000099_10.png
training/image_2/000100_10.png
training/image_2/000101_10.png
training/image_2/000102_10.png
training/image_2/000103_10.png
training/image_2/000104_10.png
training/image_2/000105_10.png
training/image_2/000106_10.png
training/image_2/000107_10.png
training/image_2/000108_10.png
training/image_2/000109_10.png
training/image_2/000110_10.png
training/image_2/000111_10.png
training/image_2/000112_10.png
training/image_2/000113_10.png
training/image_2/000114_10.png
training/image_2/000115_10.png
training/image_2/000116_10.png
training/image_2/000117_10.png
training/image_2/000118_10.png
training/image_2/000119_10.png
training/image_2/000120_10.png
training/image_2/000121_10.png
training/image_2/000122_10.png
training/image_2/000123_10.png
training/image_2/000124_10.png
training/image_2/000125_10.png
training/image_2/000126_10.png
training/image_2/000127_10.png
training/image_2/000128_10.png
training/image_2/000129_10.png
training/image_2/000130_10.png
training/image_2/000131_10.png
training/image_2/000132_10.png
training/image_2/000133_10.png
training/image_2/000134_10.png
training/image_2/000135_10.png
training/image_2/000136_10.png
training/image_2/000137_10.png
training/image_2/000138_10.png
training/image_2/000139_10.png
training/image_2/000140_10.png
training/image_2/000141_10.png
training/image_2/000142_10.png
training/image_2/000143_10.png
training/image_2/000144_10.png
training/image_2/000145_10.png
training/image_2/000146_10.png
training/image_2/000147_10.png
training/image_2/000148_10.png
training/image_2/000149_10.png
training/image_2/000150_10.png
training/image_2/000151_10.png
training/image_2/000152_10.png
training/image_2/000153_10.png
training/image_2/000154_10.png
training/image_2/000155_10.png
training/image_2/000156_10.png
training/image_2/000157_10.png
training/image_2/000158_10.png
training/image_2/000159_10.png
training/image_2/000160_10.png
training/image_2/000161_10.png
training/image_2/000162_10.png
training/image_2/000163_10.png
training/image_2/000164_10.png
training/image_2/000165_10.png
training/image_2/000166_10.png
training/image_2/000167_10.png
training/image_2/000168_10.png
training/image_2/000169_10.png
training/image_2/000170_10.png
training/image_2/000171_10.png
training/image_2/000172_10.png
training/image_2/000173_10.png
training/image_2/000174_10.png
training/image_2/000175_10.png
training/image_2/000176_10.png
training/image_2/000177_10.png
training/image_2/000178_10.png
training/image_2/000179_10.png
training/image_2/000180_10.png
training/image_2/000181_10.png
training/image_2/000182_10.png
training/image_2/000183_10.png
training/image_2/000184_10.png
training/image_2/000185_10.png
training/image_2/000186_10.png
training/image_2/000187_10.png
training/image_2/000188_10.png
training/image_2/000189_10.png
training/image_2/000190_10.png
training/image_2/000191_10.png
training/image_2/000192_10.png
training/image_2/000193_10.png
training/image_2/000194_10.png
training/image_2/000195_10.png
training/image_2/000196_10.png
training/image_2/000197_10.png
training/image_2/000198_10.png
training/image_2/000199_10.png
2011_09_26_drive_0117
2011_09_28_drive_0002
2011_09_26_drive_0052
2011_09_30_drive_0016
2011_09_26_drive_0059
2011_09_26_drive_0027
2011_09_26_drive_0020
2011_09_26_drive_0009
2011_09_26_drive_0013
2011_09_26_drive_0101
2011_09_26_drive_0046
2011_09_26_drive_0029
2011_09_26_drive_0064
2011_09_26_drive_0048
2011_10_03_drive_0027
2011_09_26_drive_0002
2011_09_26_drive_0036
2011_09_29_drive_0071
2011_10_03_drive_0047
2011_09_30_drive_0027
2011_09_26_drive_0086
2011_09_26_drive_0084
2011_09_26_drive_0096
2011_09_30_drive_0018
2011_09_26_drive_0106
2011_09_26_drive_0056
2011_09_26_drive_0023
2011_09_26_drive_0093
2011_09_26_drive_0005
2011_09_26_drive_0009
2011_09_26_drive_0011
2011_09_26_drive_0013
2011_09_26_drive_0014
2011_09_26_drive_0015
2011_09_26_drive_0017
2011_09_26_drive_0018
2011_09_26_drive_0019
2011_09_26_drive_0022
2011_09_26_drive_0027
2011_09_26_drive_0028
2011_09_26_drive_0029
2011_09_26_drive_0032
2011_09_26_drive_0036
2011_09_26_drive_0046
2011_09_26_drive_0051
2011_09_26_drive_0056
2011_09_26_drive_0057
2011_09_26_drive_0059
2011_09_26_drive_0070
2011_09_26_drive_0084
2011_09_26_drive_0096
2011_09_26_drive_0101
2011_09_26_drive_0104
2011_09_28_drive_0002
2011_09_29_drive_0004
2011_09_29_drive_0071
2011_10_03_drive_0047
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Generates depth estimates for an entire KITTI video."""
# Example usage:
#
# python inference.py \
# --logtostderr \
# --kitti_dir ~/vid2depth/kitti-raw-uncompressed \
# --kitti_video 2011_09_26/2011_09_26_drive_0009_sync \
# --output_dir ~/vid2depth/inference \
# --model_ckpt ~/vid2depth/trained-model/model-119496
#
# python inference.py \
# --logtostderr \
# --kitti_dir ~/vid2depth/kitti-raw-uncompressed \
# --kitti_video test_files_eigen \
# --output_dir ~/vid2depth/inference \
# --model_ckpt ~/vid2depth/trained-model/model-119496
#
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl import app
from absl import flags
from absl import logging
import matplotlib.pyplot as plt
import model
import numpy as np
import scipy.misc
import tensorflow as tf
import util
gfile = tf.gfile
HOME_DIR = os.path.expanduser('~')
DEFAULT_OUTPUT_DIR = os.path.join(HOME_DIR, 'vid2depth/inference')
DEFAULT_KITTI_DIR = os.path.join(HOME_DIR, 'kitti-raw-uncompressed')
flags.DEFINE_string('output_dir', DEFAULT_OUTPUT_DIR,
'Directory to store estimated depth maps.')
flags.DEFINE_string('kitti_dir', DEFAULT_KITTI_DIR, 'KITTI dataset directory.')
flags.DEFINE_string('model_ckpt', None, 'Model checkpoint to load.')
flags.DEFINE_string('kitti_video', None, 'KITTI video directory name.')
flags.DEFINE_integer('batch_size', 4, 'The size of a sample batch.')
flags.DEFINE_integer('img_height', 128, 'Image height.')
flags.DEFINE_integer('img_width', 416, 'Image width.')
flags.DEFINE_integer('seq_length', 3, 'Sequence length for each example.')
FLAGS = flags.FLAGS
flags.mark_flag_as_required('kitti_video')
flags.mark_flag_as_required('model_ckpt')
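# Matplotlib colormap used to render depth maps for display.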
CMAP = 'plasma'
def _run_inference():
"""Runs all images through depth model and saves depth maps."""
ckpt_basename = os.path.basename(FLAGS.model_ckpt)
ckpt_modelname = os.path.basename(os.path.dirname(FLAGS.model_ckpt))
output_dir = os.path.join(FLAGS.output_dir,
FLAGS.kitti_video.replace('/', '_') + '_' +
ckpt_modelname + '_' + ckpt_basename)
if not gfile.Exists(output_dir):
gfile.MakeDirs(output_dir)
inference_model = model.Model(is_training=False,
seq_length=FLAGS.seq_length,
batch_size=FLAGS.batch_size,
img_height=FLAGS.img_height,
img_width=FLAGS.img_width)
vars_to_restore = util.get_vars_to_restore(FLAGS.model_ckpt)
saver = tf.train.Saver(vars_to_restore)
sv = tf.train.Supervisor(logdir='/tmp/', saver=None)
with sv.managed_session() as sess:
saver.restore(sess, FLAGS.model_ckpt)
if FLAGS.kitti_video == 'test_files_eigen':
im_files = util.read_text_lines(
util.get_resource_path('dataset/kitti/test_files_eigen.txt'))
im_files = [os.path.join(FLAGS.kitti_dir, f) for f in im_files]
else:
video_path = os.path.join(FLAGS.kitti_dir, FLAGS.kitti_video)
im_files = gfile.Glob(os.path.join(video_path, 'image_02/data', '*.png'))
im_files = [f for f in im_files if 'disp' not in f]
im_files = sorted(im_files)
for i in range(0, len(im_files), FLAGS.batch_size):
if i % 100 == 0:
logging.info('Generating from %s: %d/%d', ckpt_basename, i,
len(im_files))
inputs = np.zeros(
(FLAGS.batch_size, FLAGS.img_height, FLAGS.img_width, 3),
dtype=np.uint8)
for b in range(FLAGS.batch_size):
idx = i + b
if idx >= len(im_files):
break
im = scipy.misc.imread(im_files[idx])
inputs[b] = scipy.misc.imresize(im, (FLAGS.img_height, FLAGS.img_width))
results = inference_model.inference(inputs, sess, mode='depth')
for b in range(FLAGS.batch_size):
idx = i + b
if idx >= len(im_files):
break
if FLAGS.kitti_video == 'test_files_eigen':
depth_path = os.path.join(output_dir, '%03d.png' % idx)
else:
depth_path = os.path.join(output_dir, '%04d.png' % idx)
depth_map = results['depth'][b]
depth_map = np.squeeze(depth_map)
colored_map = _normalize_depth_for_display(depth_map, cmap=CMAP)
input_float = inputs[b].astype(np.float32) / 255.0
vertical_stack = np.concatenate((input_float, colored_map), axis=0)
scipy.misc.imsave(depth_path, vertical_stack)
def _gray2rgb(im, cmap=CMAP):
cmap = plt.get_cmap(cmap)
rgba_img = cmap(im.astype(np.float32))
rgb_img = np.delete(rgba_img, 3, 2)
return rgb_img
def _normalize_depth_for_display(depth,
pc=95,
crop_percent=0,
normalizer=None,
cmap=CMAP):
"""Converts a depth map to an RGB image."""
# Convert to disparity.
disp = 1.0 / (depth + 1e-6)
if normalizer is not None:
disp /= normalizer
else:
disp /= (np.percentile(disp, pc) + 1e-6)
disp = np.clip(disp, 0, 1)
disp = _gray2rgb(disp, cmap=cmap)
keep_h = int(disp.shape[0] * (1 - crop_percent))
disp = disp[:keep_h]
return disp
def main(_):
_run_inference()
if __name__ == '__main__':
app.run(main)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Build model for inference or training."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import logging
import nets
from ops import icp_grad # pylint: disable=unused-import
from ops.icp_op import icp
import project
import reader
import tensorflow as tf
import util
gfile = tf.gfile
slim = tf.contrib.slim
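# Number of scales in the multi-scale disparity/depth pyramid used by the losses.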
NUM_SCALES = 4
class Model(object):
"""Model code from SfMLearner."""
def __init__(self,
data_dir=None,
is_training=True,
learning_rate=0.0002,
beta1=0.9,
reconstr_weight=0.85,
smooth_weight=0.05,
ssim_weight=0.15,
icp_weight=0.0,
batch_size=4,
img_height=128,
img_width=416,
seq_length=3,
legacy_mode=False):
self.data_dir = data_dir
self.is_training = is_training
self.learning_rate = learning_rate
self.reconstr_weight = reconstr_weight
self.smooth_weight = smooth_weight
self.ssim_weight = ssim_weight
self.icp_weight = icp_weight
self.beta1 = beta1
self.batch_size = batch_size
self.img_height = img_height
self.img_width = img_width
self.seq_length = seq_length
self.legacy_mode = legacy_mode
logging.info('data_dir: %s', data_dir)
logging.info('learning_rate: %s', learning_rate)
logging.info('beta1: %s', beta1)
logging.info('smooth_weight: %s', smooth_weight)
logging.info('ssim_weight: %s', ssim_weight)
logging.info('icp_weight: %s', icp_weight)
logging.info('batch_size: %s', batch_size)
logging.info('img_height: %s', img_height)
logging.info('img_width: %s', img_width)
logging.info('seq_length: %s', seq_length)
logging.info('legacy_mode: %s', legacy_mode)
if self.is_training:
self.reader = reader.DataReader(self.data_dir, self.batch_size,
self.img_height, self.img_width,
self.seq_length, NUM_SCALES)
self.build_train_graph()
else:
self.build_depth_test_graph()
self.build_egomotion_test_graph()
# At this point, the model is ready. Print some info on model params.
util.count_parameters()
def build_train_graph(self):
self.build_inference_for_training()
self.build_loss()
self.build_train_op()
self.build_summaries()
def build_inference_for_training(self):
"""Invokes depth and ego-motion networks and computes clouds if needed."""
(self.image_stack, self.intrinsic_mat, self.intrinsic_mat_inv) = (
self.reader.read_data())
with tf.name_scope('egomotion_prediction'):
self.egomotion, _ = nets.egomotion_net(self.image_stack, is_training=True,
legacy_mode=self.legacy_mode)
with tf.variable_scope('depth_prediction'):
# Organized by ...[i][scale]. Note that the indexing order is flipped in
# the variables in build_loss() below.
self.disp = {}
self.depth = {}
if self.icp_weight > 0:
self.cloud = {}
for i in range(self.seq_length):
image = self.image_stack[:, :, :, 3 * i:3 * (i + 1)]
multiscale_disps_i, _ = nets.disp_net(image, is_training=True)
multiscale_depths_i = [1.0 / d for d in multiscale_disps_i]
self.disp[i] = multiscale_disps_i
self.depth[i] = multiscale_depths_i
if self.icp_weight > 0:
multiscale_clouds_i = [
project.get_cloud(d,
self.intrinsic_mat_inv[:, s, :, :],
name='cloud%d_%d' % (s, i))
for (s, d) in enumerate(multiscale_depths_i)
]
self.cloud[i] = multiscale_clouds_i
# Reuse the same depth graph for all images.
tf.get_variable_scope().reuse_variables()
logging.info('disp: %s', util.info(self.disp))
def build_loss(self):
"""Adds ops for computing loss."""
with tf.name_scope('compute_loss'):
self.reconstr_loss = 0
self.smooth_loss = 0
self.ssim_loss = 0
self.icp_transform_loss = 0
self.icp_residual_loss = 0
# self.images is organized by ...[scale][B, h, w, seq_len * 3].
self.images = [{} for _ in range(NUM_SCALES)]
# Following nested lists are organized by ...[scale][source-target].
self.warped_image = [{} for _ in range(NUM_SCALES)]
self.warp_mask = [{} for _ in range(NUM_SCALES)]
self.warp_error = [{} for _ in range(NUM_SCALES)]
self.ssim_error = [{} for _ in range(NUM_SCALES)]
self.icp_transform = [{} for _ in range(NUM_SCALES)]
self.icp_residual = [{} for _ in range(NUM_SCALES)]
self.middle_frame_index = util.get_seq_middle(self.seq_length)
# Compute losses at each scale.
for s in range(NUM_SCALES):
# Scale image stack.
height_s = int(self.img_height / (2**s))
width_s = int(self.img_width / (2**s))
self.images[s] = tf.image.resize_area(self.image_stack,
[height_s, width_s])
# Smoothness.
if self.smooth_weight > 0:
for i in range(self.seq_length):
# In legacy mode, use the depth map from the middle frame only.
if not self.legacy_mode or i == self.middle_frame_index:
self.smooth_loss += 1.0 / (2**s) * self.depth_smoothness(
self.disp[i][s], self.images[s][:, :, :, 3 * i:3 * (i + 1)])
for i in range(self.seq_length):
for j in range(self.seq_length):
# Only consider adjacent frames.
if i == j or abs(i - j) != 1:
continue
# In legacy mode, only consider the middle frame as target.
if self.legacy_mode and j != self.middle_frame_index:
continue
source = self.images[s][:, :, :, 3 * i:3 * (i + 1)]
target = self.images[s][:, :, :, 3 * j:3 * (j + 1)]
target_depth = self.depth[j][s]
key = '%d-%d' % (i, j)
# Extract ego-motion from i to j
egomotion_index = min(i, j)
egomotion_mult = 1
if i > j:
# Need to invert the egomotion when going back in the sequence.
egomotion_mult *= -1
# For compatibility with SfMLearner, interpret all egomotion vectors
# as pointing toward the middle frame. Note that unlike SfMLearner,
# each vector captures the motion to/from its next frame, and not
# the center frame. Although with seq_length == 3, there is no
# difference.
if self.legacy_mode:
if egomotion_index >= self.middle_frame_index:
egomotion_mult *= -1
egomotion = egomotion_mult * self.egomotion[:, egomotion_index, :]
# Inverse warp the source image to the target image frame for
# photometric consistency loss.
self.warped_image[s][key], self.warp_mask[s][key] = (
project.inverse_warp(source,
target_depth,
egomotion,
self.intrinsic_mat[:, s, :, :],
self.intrinsic_mat_inv[:, s, :, :]))
# Reconstruction loss.
self.warp_error[s][key] = tf.abs(self.warped_image[s][key] - target)
self.reconstr_loss += tf.reduce_mean(
self.warp_error[s][key] * self.warp_mask[s][key])
# SSIM.
if self.ssim_weight > 0:
self.ssim_error[s][key] = self.ssim(self.warped_image[s][key],
target)
# TODO(rezama): This should be min_pool2d().
ssim_mask = slim.avg_pool2d(self.warp_mask[s][key], 3, 1, 'VALID')
self.ssim_loss += tf.reduce_mean(
self.ssim_error[s][key] * ssim_mask)
# 3D loss.
if self.icp_weight > 0:
cloud_a = self.cloud[j][s]
cloud_b = self.cloud[i][s]
self.icp_transform[s][key], self.icp_residual[s][key] = icp(
cloud_a, egomotion, cloud_b)
self.icp_transform_loss += 1.0 / (2**s) * tf.reduce_mean(
tf.abs(self.icp_transform[s][key]))
self.icp_residual_loss += 1.0 / (2**s) * tf.reduce_mean(
tf.abs(self.icp_residual[s][key]))
self.total_loss = self.reconstr_weight * self.reconstr_loss
if self.smooth_weight > 0:
self.total_loss += self.smooth_weight * self.smooth_loss
if self.ssim_weight > 0:
self.total_loss += self.ssim_weight * self.ssim_loss
if self.icp_weight > 0:
self.total_loss += self.icp_weight * (self.icp_transform_loss +
self.icp_residual_loss)
def gradient_x(self, img):
return img[:, :, :-1, :] - img[:, :, 1:, :]
def gradient_y(self, img):
return img[:, :-1, :, :] - img[:, 1:, :, :]
def depth_smoothness(self, depth, img):
"""Computes image-aware depth smoothness loss."""
depth_dx = self.gradient_x(depth)
depth_dy = self.gradient_y(depth)
image_dx = self.gradient_x(img)
image_dy = self.gradient_y(img)
weights_x = tf.exp(-tf.reduce_mean(tf.abs(image_dx), 3, keepdims=True))
weights_y = tf.exp(-tf.reduce_mean(tf.abs(image_dy), 3, keepdims=True))
smoothness_x = depth_dx * weights_x
smoothness_y = depth_dy * weights_y
return tf.reduce_mean(abs(smoothness_x)) + tf.reduce_mean(abs(smoothness_y))
def ssim(self, x, y):
"""Computes a differentiable structured image similarity measure."""
c1 = 0.01**2
c2 = 0.03**2
mu_x = slim.avg_pool2d(x, 3, 1, 'VALID')
mu_y = slim.avg_pool2d(y, 3, 1, 'VALID')
sigma_x = slim.avg_pool2d(x**2, 3, 1, 'VALID') - mu_x**2
sigma_y = slim.avg_pool2d(y**2, 3, 1, 'VALID') - mu_y**2
sigma_xy = slim.avg_pool2d(x * y, 3, 1, 'VALID') - mu_x * mu_y
ssim_n = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
ssim_d = (mu_x**2 + mu_y**2 + c1) * (sigma_x + sigma_y + c2)
ssim = ssim_n / ssim_d
return tf.clip_by_value((1 - ssim) / 2, 0, 1)
def build_train_op(self):
with tf.name_scope('train_op'):
optim = tf.train.AdamOptimizer(self.learning_rate, self.beta1)
self.train_op = slim.learning.create_train_op(self.total_loss, optim)
self.global_step = tf.Variable(0, name='global_step', trainable=False)
self.incr_global_step = tf.assign(self.global_step, self.global_step + 1)
def build_summaries(self):
"""Adds scalar and image summaries for TensorBoard."""
tf.summary.scalar('total_loss', self.total_loss)
tf.summary.scalar('reconstr_loss', self.reconstr_loss)
if self.smooth_weight > 0:
tf.summary.scalar('smooth_loss', self.smooth_loss)
if self.ssim_weight > 0:
tf.summary.scalar('ssim_loss', self.ssim_loss)
if self.icp_weight > 0:
tf.summary.scalar('icp_transform_loss', self.icp_transform_loss)
tf.summary.scalar('icp_residual_loss', self.icp_residual_loss)
for i in range(self.seq_length - 1):
tf.summary.histogram('tx%d' % i, self.egomotion[:, i, 0])
tf.summary.histogram('ty%d' % i, self.egomotion[:, i, 1])
tf.summary.histogram('tz%d' % i, self.egomotion[:, i, 2])
tf.summary.histogram('rx%d' % i, self.egomotion[:, i, 3])
tf.summary.histogram('ry%d' % i, self.egomotion[:, i, 4])
tf.summary.histogram('rz%d' % i, self.egomotion[:, i, 5])
for s in range(NUM_SCALES):
for i in range(self.seq_length):
tf.summary.image('scale%d_image%d' % (s, i),
self.images[s][:, :, :, 3 * i:3 * (i + 1)])
if i in self.depth:
tf.summary.histogram('scale%d_depth%d' % (s, i), self.depth[i][s])
tf.summary.histogram('scale%d_disp%d' % (s, i), self.disp[i][s])
tf.summary.image('scale%d_disparity%d' % (s, i), self.disp[i][s])
for key in self.warped_image[s]:
tf.summary.image('scale%d_warped_image%s' % (s, key),
self.warped_image[s][key])
tf.summary.image('scale%d_warp_mask%s' % (s, key),
self.warp_mask[s][key])
tf.summary.image('scale%d_warp_error%s' % (s, key),
self.warp_error[s][key])
if self.ssim_weight > 0:
tf.summary.image('scale%d_ssim_error%s' % (s, key),
self.ssim_error[s][key])
if self.icp_weight > 0:
tf.summary.image('scale%d_icp_residual%s' % (s, key),
self.icp_residual[s][key])
transform = self.icp_transform[s][key]
tf.summary.histogram('scale%d_icp_tx%s' % (s, key), transform[:, 0])
tf.summary.histogram('scale%d_icp_ty%s' % (s, key), transform[:, 1])
tf.summary.histogram('scale%d_icp_tz%s' % (s, key), transform[:, 2])
tf.summary.histogram('scale%d_icp_rx%s' % (s, key), transform[:, 3])
tf.summary.histogram('scale%d_icp_ry%s' % (s, key), transform[:, 4])
tf.summary.histogram('scale%d_icp_rz%s' % (s, key), transform[:, 5])
def build_depth_test_graph(self):
"""Builds depth model reading from placeholders."""
with tf.name_scope('depth_prediction'):
with tf.variable_scope('depth_prediction'):
input_uint8 = tf.placeholder(
tf.uint8, [self.batch_size, self.img_height, self.img_width, 3],
name='raw_input')
input_float = tf.image.convert_image_dtype(input_uint8, tf.float32)
# TODO(rezama): Retrain published model with batchnorm params and set
# is_training to False.
est_disp, _ = nets.disp_net(input_float, is_training=True)
est_depth = 1.0 / est_disp[0]
self.inputs_depth = input_uint8
self.est_depth = est_depth
def build_egomotion_test_graph(self):
"""Builds egomotion model reading from placeholders."""
input_uint8 = tf.placeholder(
tf.uint8,
[self.batch_size, self.img_height, self.img_width * self.seq_length, 3],
name='raw_input')
input_float = tf.image.convert_image_dtype(input_uint8, tf.float32)
image_seq = input_float
image_stack = self.unpack_image_batches(image_seq)
with tf.name_scope('egomotion_prediction'):
# TODO(rezama): Retrain published model with batchnorm params and set
# is_training to False.
egomotion, _ = nets.egomotion_net(image_stack, is_training=True,
legacy_mode=self.legacy_mode)
self.inputs_egomotion = input_uint8
self.est_egomotion = egomotion
def unpack_image_batches(self, image_seq):
"""[B, h, w * seq_length, 3] -> [B, h, w, 3 * seq_length]."""
with tf.name_scope('unpack_images'):
image_list = [
image_seq[:, :, i * self.img_width:(i + 1) * self.img_width, :]
for i in range(self.seq_length)
]
image_stack = tf.concat(image_list, axis=3)
image_stack.set_shape([
self.batch_size, self.img_height, self.img_width, self.seq_length * 3
])
return image_stack
def inference(self, inputs, sess, mode):
"""Runs depth or egomotion inference from placeholders."""
fetches = {}
if mode == 'depth':
fetches['depth'] = self.est_depth
inputs_ph = self.inputs_depth
if mode == 'egomotion':
fetches['egomotion'] = self.est_egomotion
inputs_ph = self.inputs_egomotion
results = sess.run(fetches, feed_dict={inputs_ph: inputs})
return results
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Depth and Ego-Motion networks."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import flags
import numpy as np
import tensorflow as tf
import util
slim = tf.contrib.slim
# TODO(rezama): Move flag to main, pass as argument to functions.
flags.DEFINE_bool('use_bn', True, 'Add batch norm layers.')
FLAGS = flags.FLAGS
# Weight regularization.
WEIGHT_REG = 0.05
# Disparity (inverse depth) values range from 0.01 to 10.
DISP_SCALING = 10
MIN_DISP = 0.01
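# Each ego-motion vector has 6 components: 3 for translation and 3 for rotation.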
EGOMOTION_VEC_SIZE = 6
def egomotion_net(image_stack, is_training=True, legacy_mode=False):
"""Predict ego-motion vectors from a stack of frames.
Args:
image_stack: Input tensor with shape [B, h, w, seq_length * 3]. Regardless
of the value of legacy_mode, the input image sequence passed to the
function should be in normal order, e.g. [1, 2, 3].
is_training: Whether the model is being trained or not.
legacy_mode: Setting legacy_mode to True enables compatibility with
SfMLearner checkpoints. When legacy_mode is on, egomotion_net()
rearranges the input tensor to place the target (middle) frame first in
sequence. This is the arrangement of inputs that legacy models have
received during training. In legacy mode, the client program
(model.Model.build_loss()) interprets the outputs of this network
differently as well. For example:
When legacy_mode == True,
Network inputs will be [2, 1, 3]
Network outputs will be [1 -> 2, 3 -> 2]
When legacy_mode == False,
Network inputs will be [1, 2, 3]
Network outputs will be [1 -> 2, 2 -> 3]
Returns:
Egomotion vectors with shape [B, seq_length - 1, 6].
"""
seq_length = image_stack.get_shape()[3].value // 3 # 3 == RGB.
if legacy_mode:
# Put the target frame at the beginning of stack.
with tf.name_scope('rearrange_stack'):
mid_index = util.get_seq_middle(seq_length)
left_subset = image_stack[:, :, :, :mid_index * 3]
target_frame = image_stack[:, :, :, mid_index * 3:(mid_index + 1) * 3]
right_subset = image_stack[:, :, :, (mid_index + 1) * 3:]
image_stack = tf.concat([target_frame, left_subset, right_subset], axis=3)
batch_norm_params = {'is_training': is_training}
num_egomotion_vecs = seq_length - 1
with tf.variable_scope('pose_exp_net') as sc:
end_points_collection = sc.original_name_scope + '_end_points'
normalizer_fn = slim.batch_norm if FLAGS.use_bn else None
normalizer_params = batch_norm_params if FLAGS.use_bn else None
with slim.arg_scope([slim.conv2d, slim.conv2d_transpose],
normalizer_fn=normalizer_fn,
weights_regularizer=slim.l2_regularizer(WEIGHT_REG),
normalizer_params=normalizer_params,
activation_fn=tf.nn.relu,
outputs_collections=end_points_collection):
cnv1 = slim.conv2d(image_stack, 16, [7, 7], stride=2, scope='cnv1')
cnv2 = slim.conv2d(cnv1, 32, [5, 5], stride=2, scope='cnv2')
cnv3 = slim.conv2d(cnv2, 64, [3, 3], stride=2, scope='cnv3')
cnv4 = slim.conv2d(cnv3, 128, [3, 3], stride=2, scope='cnv4')
cnv5 = slim.conv2d(cnv4, 256, [3, 3], stride=2, scope='cnv5')
# Ego-motion specific layers
with tf.variable_scope('pose'):
cnv6 = slim.conv2d(cnv5, 256, [3, 3], stride=2, scope='cnv6')
cnv7 = slim.conv2d(cnv6, 256, [3, 3], stride=2, scope='cnv7')
pred_channels = EGOMOTION_VEC_SIZE * num_egomotion_vecs
egomotion_pred = slim.conv2d(cnv7,
pred_channels,
[1, 1],
scope='pred',
stride=1,
normalizer_fn=None,
activation_fn=None)
egomotion_avg = tf.reduce_mean(egomotion_pred, [1, 2])
# Tinghui found that scaling by a small constant facilitates training.
egomotion_final = 0.01 * tf.reshape(
egomotion_avg, [-1, num_egomotion_vecs, EGOMOTION_VEC_SIZE])
end_points = slim.utils.convert_collection_to_dict(end_points_collection)
return egomotion_final, end_points
def disp_net(target_image, is_training=True):
"""Predict inverse of depth from a single image."""
batch_norm_params = {'is_training': is_training}
h = target_image.get_shape()[1].value
w = target_image.get_shape()[2].value
inputs = target_image
with tf.variable_scope('depth_net') as sc:
end_points_collection = sc.original_name_scope + '_end_points'
normalizer_fn = slim.batch_norm if FLAGS.use_bn else None
normalizer_params = batch_norm_params if FLAGS.use_bn else None
with slim.arg_scope([slim.conv2d, slim.conv2d_transpose],
normalizer_fn=normalizer_fn,
normalizer_params=normalizer_params,
weights_regularizer=slim.l2_regularizer(WEIGHT_REG),
activation_fn=tf.nn.relu,
outputs_collections=end_points_collection):
cnv1 = slim.conv2d(inputs, 32, [7, 7], stride=2, scope='cnv1')
cnv1b = slim.conv2d(cnv1, 32, [7, 7], stride=1, scope='cnv1b')
cnv2 = slim.conv2d(cnv1b, 64, [5, 5], stride=2, scope='cnv2')
cnv2b = slim.conv2d(cnv2, 64, [5, 5], stride=1, scope='cnv2b')
cnv3 = slim.conv2d(cnv2b, 128, [3, 3], stride=2, scope='cnv3')
cnv3b = slim.conv2d(cnv3, 128, [3, 3], stride=1, scope='cnv3b')
cnv4 = slim.conv2d(cnv3b, 256, [3, 3], stride=2, scope='cnv4')
cnv4b = slim.conv2d(cnv4, 256, [3, 3], stride=1, scope='cnv4b')
cnv5 = slim.conv2d(cnv4b, 512, [3, 3], stride=2, scope='cnv5')
cnv5b = slim.conv2d(cnv5, 512, [3, 3], stride=1, scope='cnv5b')
cnv6 = slim.conv2d(cnv5b, 512, [3, 3], stride=2, scope='cnv6')
cnv6b = slim.conv2d(cnv6, 512, [3, 3], stride=1, scope='cnv6b')
cnv7 = slim.conv2d(cnv6b, 512, [3, 3], stride=2, scope='cnv7')
cnv7b = slim.conv2d(cnv7, 512, [3, 3], stride=1, scope='cnv7b')
up7 = slim.conv2d_transpose(cnv7b, 512, [3, 3], stride=2, scope='upcnv7')
# There might be a dimension mismatch due to uneven down/up-sampling.
up7 = _resize_like(up7, cnv6b)
i7_in = tf.concat([up7, cnv6b], axis=3)
icnv7 = slim.conv2d(i7_in, 512, [3, 3], stride=1, scope='icnv7')
up6 = slim.conv2d_transpose(icnv7, 512, [3, 3], stride=2, scope='upcnv6')
up6 = _resize_like(up6, cnv5b)
i6_in = tf.concat([up6, cnv5b], axis=3)
icnv6 = slim.conv2d(i6_in, 512, [3, 3], stride=1, scope='icnv6')
up5 = slim.conv2d_transpose(icnv6, 256, [3, 3], stride=2, scope='upcnv5')
up5 = _resize_like(up5, cnv4b)
i5_in = tf.concat([up5, cnv4b], axis=3)
icnv5 = slim.conv2d(i5_in, 256, [3, 3], stride=1, scope='icnv5')
up4 = slim.conv2d_transpose(icnv5, 128, [3, 3], stride=2, scope='upcnv4')
i4_in = tf.concat([up4, cnv3b], axis=3)
icnv4 = slim.conv2d(i4_in, 128, [3, 3], stride=1, scope='icnv4')
disp4 = (slim.conv2d(icnv4, 1, [3, 3], stride=1, activation_fn=tf.sigmoid,
normalizer_fn=None, scope='disp4')
* DISP_SCALING + MIN_DISP)
disp4_up = tf.image.resize_bilinear(disp4, [np.int(h / 4), np.int(w / 4)])
up3 = slim.conv2d_transpose(icnv4, 64, [3, 3], stride=2, scope='upcnv3')
i3_in = tf.concat([up3, cnv2b, disp4_up], axis=3)
icnv3 = slim.conv2d(i3_in, 64, [3, 3], stride=1, scope='icnv3')
disp3 = (slim.conv2d(icnv3, 1, [3, 3], stride=1, activation_fn=tf.sigmoid,
normalizer_fn=None, scope='disp3')
* DISP_SCALING + MIN_DISP)
disp3_up = tf.image.resize_bilinear(disp3, [np.int(h / 2), np.int(w / 2)])
up2 = slim.conv2d_transpose(icnv3, 32, [3, 3], stride=2, scope='upcnv2')
i2_in = tf.concat([up2, cnv1b, disp3_up], axis=3)
icnv2 = slim.conv2d(i2_in, 32, [3, 3], stride=1, scope='icnv2')
disp2 = (slim.conv2d(icnv2, 1, [3, 3], stride=1, activation_fn=tf.sigmoid,
normalizer_fn=None, scope='disp2')
* DISP_SCALING + MIN_DISP)
disp2_up = tf.image.resize_bilinear(disp2, [h, w])
up1 = slim.conv2d_transpose(icnv2, 16, [3, 3], stride=2, scope='upcnv1')
i1_in = tf.concat([up1, disp2_up], axis=3)
icnv1 = slim.conv2d(i1_in, 16, [3, 3], stride=1, scope='icnv1')
disp1 = (slim.conv2d(icnv1, 1, [3, 3], stride=1, activation_fn=tf.sigmoid,
normalizer_fn=None, scope='disp1')
* DISP_SCALING + MIN_DISP)
end_points = slim.utils.convert_collection_to_dict(end_points_collection)
return [disp1, disp2, disp3, disp4], end_points
def _resize_like(inputs, ref):
i_h, i_w = inputs.get_shape()[1], inputs.get_shape()[2]
r_h, r_w = ref.get_shape()[1], ref.get_shape()[2]
if i_h == r_h and i_w == r_w:
return inputs
else:
return tf.image.resize_nearest_neighbor(inputs, [r_h.value, r_w.value])
load("@org_tensorflow//tensorflow:tensorflow.bzl", "tf_custom_op_library")
package(default_visibility = ["//visibility:public"])
filegroup(
name = "test_data",
srcs = glob(["testdata/**"]),
)
cc_library(
name = "icp_op_kernel",
srcs = ["icp_op_kernel.cc"],
copts = [
"-fexceptions",
"-Wno-sign-compare",
"-D_GLIBCXX_USE_CXX11_ABI=0",
],
deps = [
"@com_github_pointcloudlibrary_pcl//:common",
"@com_github_pointcloudlibrary_pcl//:registration",
"@com_google_protobuf//:protobuf",
"@org_tensorflow//tensorflow/core:framework_headers_lib",
],
)
tf_custom_op_library(
name = "icp_op.so",
linkopts = ["-llz4"],
deps = [
":icp_op_kernel",
],
)
py_library(
name = "icp_op",
srcs = ["icp_op.py"],
data = [
":icp_op.so",
],
srcs_version = "PY2AND3",
deps = [
],
)
py_library(
name = "icp_util",
srcs = ["icp_util.py"],
data = [":test_data"],
srcs_version = "PY2AND3",
deps = [
"@org_tensorflow//tensorflow:tensorflow_py",
],
)
py_library(
name = "icp_grad",
srcs = ["icp_grad.py"],
data = [
":icp_op.so",
],
srcs_version = "PY2AND3",
deps = [
"@org_tensorflow//tensorflow:tensorflow_py",
":icp_op",
],
)
cc_binary(
name = "pcl_demo",
srcs = ["pcl_demo.cc"],
deps = [
"@com_github_pointcloudlibrary_pcl//:common",
"@com_github_pointcloudlibrary_pcl//:registration",
],
)
py_binary(
name = "icp_train_demo",
srcs = ["icp_train_demo.py"],
data = [
":icp_op.so",
],
srcs_version = "PY2AND3",
deps = [
"@org_tensorflow//tensorflow:tensorflow_py",
":icp_op",
":icp_grad",
":icp_util",
],
)
py_test(
name = "icp_test",
size = "small",
srcs = ["icp_test.py"],
data = [
":icp_op.so",
],
srcs_version = "PY2AND3",
deps = [
"@org_tensorflow//tensorflow:tensorflow_py",
":icp_op",
":icp_util",
],
)
py_test(
name = "icp_grad_test",
size = "small",
srcs = ["icp_grad_test.py"],
data = [
":icp_op.so",
],
srcs_version = "PY2AND3",
deps = [
"@org_tensorflow//tensorflow:tensorflow_py",
":icp_op",
":icp_grad",
":icp_test",
],
)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""The gradient of the icp op."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from tensorflow.python.framework import ops
@ops.RegisterGradient('Icp')
def _icp_grad(op, grad_transform, grad_residual):
"""The gradients for `icp`.
Args:
op: The `icp` `Operation` that we are differentiating, which we can use
to find the inputs and outputs of the original op.
grad_transform: Gradient with respect to `transform` output of the `icp` op.
grad_residual: Gradient with respect to `residual` output of the
`icp` op.
Returns:
Gradients with respect to the inputs of `icp`.
"""
unused_transform = op.outputs[0]
unused_residual = op.outputs[1]
unused_source = op.inputs[0]
unused_ego_motion = op.inputs[1]
unused_target = op.inputs[2]
grad_p = -grad_residual
grad_ego_motion = -grad_transform
return [grad_p, grad_ego_motion, None]