Open-source FEELVOS model, which was developed by Paul Voigtlaender during his 2018 summer internship at Google. The work has been accepted to CVPR 2019. (#6274)

Commit e1ae37c4, authored Feb 27, 2019 by aquariusjay, committed by GitHub Feb 27, 2019
20 changed files
--- a/research/feelvos/CONTRIBUTING.md
+++ b/research/feelvos/CONTRIBUTING.md
+# How to Contribute
+We'd love to accept your patches and contributions to this project. There are
+just a few small guidelines you need to follow.
+## Contributor License Agreement
+Contributions to this project must be accompanied by a Contributor License
+Agreement. You (or your employer) retain the copyright to your contribution;
+this simply gives us permission to use and redistribute your contributions as
+part of the project. Head over to <https://cla.developers.google.com/> to see
+your current agreements on file or to sign a new one.
+You generally only need to submit a CLA once, so if you've already submitted one
+(even if it was for a different project), you probably don't need to do it
+again.
+## Code reviews
+All submissions, including submissions by project members, require review. We
+use GitHub pull requests for this purpose. Consult
+[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
+information on using pull requests.
+## Community Guidelines
+This project follows [Google's Open Source Community
+Guidelines](https://opensource.google.com/conduct/).
--- a/research/feelvos/LICENSE
+++ b/research/feelvos/LICENSE
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   APPENDIX: How to apply the Apache License to your work.
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+   Copyright [yyyy] [name of copyright owner]
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/research/feelvos/README.md
+++ b/research/feelvos/README.md
+# FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation
+FEELVOS is a fast model for video object segmentation which does not rely on fine-tuning on the
+first frame.
+For details, please refer to our paper. If you find the code useful, please
+also consider citing it.
+* FEELVOS:
+```
+@inproceedings{feelvos2019,
+    title={FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation},
+    author={Paul Voigtlaender and Yuning Chai and Florian Schroff and Hartwig Adam and Bastian Leibe and Liang-Chieh Chen},
+    booktitle={CVPR},
+    year={2019}
+}
+```
+## Dependencies
+FEELVOS requires a GPU with around 12 GB of memory and depends on the following libraries:
+* TensorFlow
+* Pillow
+* NumPy
+* SciPy
+* scikit-image
+* tf Slim (which is included in the "tensorflow/models/research/" checkout)
+* DeepLab (which is included in the "tensorflow/models/research/" checkout)
+* correlation_cost (optional, see below)
+For detailed steps to install TensorFlow, follow the [TensorFlow installation
+instructions](https://www.tensorflow.org/install/). A typical user can install
+TensorFlow using the following command:
+```bash
+pip install tensorflow-gpu
+```
+The remaining libraries can also be installed with pip using:
+```bash
+pip install pillow scipy scikit-image
+```
+## Dependency on correlation_cost
+For fast cross-correlation, we use correlation_cost as an optional external dependency. By default,
+FEELVOS uses a slow, memory-hungry fallback implementation that does not need correlation_cost. If
+you care about performance, set up correlation_cost by following the instructions in
+correlation_cost/README.md and then set ```USE_CORRELATION_COST = True``` in
+utils/embedding_utils.py.
+## Pre-trained Models
+We provide two pre-trained FEELVOS models, both based on Xception-65:
+* [Trained on DAVIS 2017](http://download.tensorflow.org/models/feelvos_davis17_trained.tar.gz)
+* [Trained on DAVIS 2017 and YouTube-VOS](http://download.tensorflow.org/models/feelvos_davis17_and_youtubevos_trained.tar.gz)
+Additionally, we provide a [DeepLab checkpoint for Xception-65 pre-trained on ImageNet and COCO](http://download.tensorflow.org/models/xception_65_coco_pretrained_2018_10_02.tar.gz),
+which can be used as an initialization for training FEELVOS.
+## Pre-computed Segmentation Masks
+We provide [pre-computed segmentation masks](http://download.tensorflow.org/models/feelvos_precomputed_masks.zip)
+from FEELVOS models trained both with and without YouTube-VOS data, for the following datasets:
+* DAVIS 2017 validation set
+* DAVIS 2017 test-dev set
+* YouTube-Objects dataset
+## Local Inference
+For a demo of local inference on DAVIS 2017, run:
+```bash
+# From tensorflow/models/research/feelvos
+sh eval.sh
+```
+## Local Training
+For a demo of local training on DAVIS 2017, run:
+```bash
+# From tensorflow/models/research/feelvos
+sh train.sh
+```
+## Contacts (Maintainers)
+*   Paul Voigtlaender, github: [pvoigtlaender](https://github.com/pvoigtlaender)
+*   Yuning Chai, github: [yuningchai](https://github.com/yuningchai)
+*   Liang-Chieh Chen, github: [aquariusjay](https://github.com/aquariusjay)
+## License
+All code in the feelvos folder is covered by the [LICENSE](https://github.com/tensorflow/models/blob/master/LICENSE)
+under tensorflow/models. Please refer to the LICENSE for details.
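Before running eval.sh or train.sh, it can help to confirm that the libraries listed under Dependencies are importable. The following is a minimal Python 3 sketch of such a check. The helper name `missing_modules` is hypothetical and not part of FEELVOS; the import names are the usual module names for the packages above (`PIL` for Pillow, `skimage` for scikit-image).

```python
import importlib.util

# Import names for the packages listed in the README. PIL is provided by
# Pillow and skimage by scikit-image.
REQUIRED_MODULES = ('tensorflow', 'PIL', 'numpy', 'scipy', 'skimage')


def missing_modules(module_names=REQUIRED_MODULES):
  """Returns the names in module_names that cannot be located for import."""
  missing = []
  for name in module_names:
    # find_spec returns None for a top-level module that is not installed.
    if importlib.util.find_spec(name) is None:
      missing.append(name)
  return missing


if __name__ == '__main__':
  not_found = missing_modules()
  if not_found:
    print('Missing modules: ' + ', '.join(not_found))
  else:
    print('All required modules were found.')
```

The check only locates the modules and does not import them, so it does not verify GPU support or library versions.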
--- a/research/feelvos/__init__.py
+++ b/research/feelvos/__init__.py
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
--- a/research/feelvos/common.py
+++ b/research/feelvos/common.py
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Provides flags that are common to scripts.
+Common flags from train/vis_video.py are collected in this script.
+"""
+import tensorflow as tf
+from deeplab import common
+flags = tf.app.flags
+flags.DEFINE_enum(
+    'classification_loss', 'softmax_with_attention',
+    ['softmax', 'triplet', 'softmax_with_attention'],
+    'Type of loss function used for classifying pixels, can be either softmax, '
+    'softmax_with_attention, or triplet.')
+flags.DEFINE_integer('k_nearest_neighbors', 1,
+                     'The number of nearest neighbors to use.')
+flags.DEFINE_integer('embedding_dimension', 100, 'The dimension used for the '
+                                                 'learned embedding')
+flags.DEFINE_boolean('use_softmax_feedback', True,
+                     'Whether to give the softmax predictions of the last '
+                     'frame as additional input to the segmentation head.')
+flags.DEFINE_boolean('sample_adjacent_and_consistent_query_frames', True,
+                     'If true, the query frames (all but the first frame '
+                     'which is the reference frame) will be sampled such '
+                     'that they are adjacent video frames and have the same '
+                     'crop coordinates and flip augmentation. Note that if '
+                     'use_softmax_feedback is True, this option will '
+                     'automatically be activated.')
+flags.DEFINE_integer('embedding_seg_feature_dimension', 256,
+                     'The dimensionality used in the segmentation head layers.')
+flags.DEFINE_integer('embedding_seg_n_layers', 4, 'The number of layers in the '
+                                                  'segmentation head.')
+flags.DEFINE_integer('embedding_seg_kernel_size', 7, 'The kernel size used in '
+                                                     'the segmentation head.')
+flags.DEFINE_multi_integer('embedding_seg_atrous_rates', [],
+                           'The atrous rates to use for the segmentation head.')
+flags.DEFINE_boolean('normalize_nearest_neighbor_distances', True,
+                     'Whether to normalize the nearest neighbor distances '
+                     'to [0,1] using sigmoid, scale and shift.')
+flags.DEFINE_boolean('also_attend_to_previous_frame', True, 'Whether to also '
+                     'use nearest neighbor attention with respect to the '
+                     'previous frame.')
+flags.DEFINE_bool('use_local_previous_frame_attention', True,
+                  'Whether to restrict the previous frame attention to a local '
+                  'search window. Only has an effect, if '
+                  'also_attend_to_previous_frame is True.')
+flags.DEFINE_integer('previous_frame_attention_window_size', 15,
+                     'The window size used for local previous frame attention,'
+                     ' if use_local_previous_frame_attention is True.')
+flags.DEFINE_boolean('use_first_frame_matching', True, 'Whether to extract '
+                     'features by matching to the reference frame. This should '
+                     'always be true except for ablation experiments.')
+FLAGS = flags.FLAGS
+# Constants
+# Perform semantic segmentation predictions.
+OUTPUT_TYPE = common.OUTPUT_TYPE
+# Semantic segmentation item names.
+LABELS_CLASS = common.LABELS_CLASS
+IMAGE = common.IMAGE
+HEIGHT = common.HEIGHT
+WIDTH = common.WIDTH
+IMAGE_NAME = common.IMAGE_NAME
+SOURCE_ID = 'source_id'
+VIDEO_ID = 'video_id'
+LABEL = common.LABEL
+ORIGINAL_IMAGE = common.ORIGINAL_IMAGE
+PRECEDING_FRAME_LABEL = 'preceding_frame_label'
+# Test set name.
+TEST_SET = common.TEST_SET
+# Internal constants.
+OBJECT_LABEL = 'object_label'
+class VideoModelOptions(common.ModelOptions):
+  """Internal version of immutable class to hold model options."""
+  def __new__(cls,
+              outputs_to_num_classes,
+              crop_size=None,
+              atrous_rates=None,
+              output_stride=8):
+    """Constructor to set default values.
+    Args:
+      outputs_to_num_classes: A dictionary from output type to the number of
+        classes. For example, for the task of semantic segmentation with 21
+        semantic classes, we would have outputs_to_num_classes['semantic'] = 21.
+      crop_size: A tuple [crop_height, crop_width].
+      atrous_rates: A list of atrous convolution rates for ASPP.
+      output_stride: The ratio of input to output spatial resolution.
+    Returns:
+      A new VideoModelOptions instance.
+    """
+    self = super(VideoModelOptions, cls).__new__(
+        cls,
+        outputs_to_num_classes,
+        crop_size,
+        atrous_rates,
+        output_stride)
+    # Add internal flags.
+    self.classification_loss = FLAGS.classification_loss
+    return self
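The pattern used by `VideoModelOptions` above, overriding `__new__` on an immutable base class and attaching one extra attribute, can be illustrated in isolation. The sketch below assumes the DeepLab base class behaves like a `collections.namedtuple` subclass, as its "immutable class" docstring suggests. `BaseOptions` and `VideoOptions` are hypothetical stand-ins, and the loss is passed as an argument instead of being read from `FLAGS`, so the example runs without TensorFlow or DeepLab.

```python
import collections


# Hypothetical stand-in for deeplab.common.ModelOptions (an assumption, not
# the real class).
class BaseOptions(collections.namedtuple(
    'BaseOptions',
    ['outputs_to_num_classes', 'crop_size', 'atrous_rates', 'output_stride'])):
  """Immutable tuple of base model options."""

  __slots__ = ()

  def __new__(cls, outputs_to_num_classes, crop_size=None, atrous_rates=None,
              output_stride=8):
    return super(BaseOptions, cls).__new__(
        cls, outputs_to_num_classes, crop_size, atrous_rates, output_stride)


# Hypothetical stand-in for VideoModelOptions.
class VideoOptions(BaseOptions):
  """Base options plus one extra attribute set at construction time."""

  def __new__(cls, outputs_to_num_classes, crop_size=None, atrous_rates=None,
              output_stride=8, classification_loss='softmax_with_attention'):
    self = super(VideoOptions, cls).__new__(
        cls, outputs_to_num_classes, crop_size, atrous_rates, output_stride)
    # The subclass defines no __slots__, so instances have a __dict__ and can
    # carry attributes beyond the tuple fields.
    self.classification_loss = classification_loss
    return self


options = VideoOptions({'semantic': 2}, crop_size=(465, 465))
```

The tuple fields stay read-only, while the extra attribute lives in the instance dictionary and is not part of the tuple itself.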
--- a/research/feelvos/correlation_cost/README.md
+++ b/research/feelvos/correlation_cost/README.md
+# correlation_cost
+FEELVOS uses correlation_cost as an optional dependency to improve the speed and memory consumption
+of cross-correlation.
+## Installation
+Unfortunately we cannot provide the code for correlation_cost directly, so you
+will have to copy some files from this pull request
+https://github.com/tensorflow/tensorflow/pull/21392/. For your convenience we
+prepared scripts to download and adjust the code automatically.
+In the best case, all you need to do is run build.sh with the path to your
+CUDA installation (tested only with CUDA 9).
+Note that the path must point to the folder that contains the cuda folder, not
+to the cuda folder itself. For example, if CUDA is installed in
+/usr/local/cuda-9.0, you can create a symlink /usr/local/cuda pointing to
+/usr/local/cuda-9.0 and then run
+```bash
+sh build.sh /usr/local/
+```
+This will
+* Download the code via ```sh get_code.sh ```
+* Apply minor adjustments to the code via ```sh fix_code.sh```
+* Clone the dependencies cub and thrust from github via ```sh clone_dependencies.sh```
+* Compile a shared library correlation_cost.so for correlation_cost via
+```sh compile.sh "${CUDA_DIR}"```
+Please review the licenses of correlation_cost, cub, and thrust.
+## Enabling correlation_cost
+Once the correlation_cost.so file has been built, set
+```USE_CORRELATION_COST = True``` in feelvos/utils/embedding_utils.py and run
+```sh eval.sh```.
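Before changing `USE_CORRELATION_COST`, one way to confirm that the build produced the shared library is to check for the file at the path that fix_code.sh writes into the `load_op_library` call. This is a small sketch only: the helper name `correlation_cost_is_built` is hypothetical, and it assumes the path is resolved relative to the tensorflow/models/research directory.

```python
import os

# Relative path that fix_code.sh inserts into the load_op_library call.
SO_RELATIVE_PATH = os.path.join(
    'feelvos', 'correlation_cost', 'correlation_cost.so')


def correlation_cost_is_built(research_dir='.'):
  """Returns True if correlation_cost.so exists below research_dir."""
  return os.path.isfile(os.path.join(research_dir, SO_RELATIVE_PATH))
```

Calling `correlation_cost_is_built()` from the research directory returns `False` until compile.sh has finished successfully.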
--- a/research/feelvos/correlation_cost/build.sh
+++ b/research/feelvos/correlation_cost/build.sh
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# This script is used to download and build the code for correlation_cost.
+#
+# Usage:
+#   sh ./build.sh cuda_dir
+# Where cuda_dir points to a directory containing the cuda folder (not the cuda folder itself).
+#
+#
+if [ "$#" -ne 1 ]; then
+  echo "Illegal number of parameters, usage: ./build.sh cuda_dir"
+  echo "Where cuda_dir points to a directory containing the cuda folder (not the cuda folder itself)"
+  exit 1
+fi
+set -e
+set -x
+sh ./get_code.sh
+sh ./fix_code.sh
+sh ./clone_dependencies.sh
+sh ./compile.sh "$1"
--- a/research/feelvos/correlation_cost/clone_dependencies.sh
+++ b/research/feelvos/correlation_cost/clone_dependencies.sh
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# This script is used to clone the dependencies, i.e. cub and thrust, of correlation_cost from github.
+#
+# Usage:
+#   sh ./clone_dependencies.sh
+#
+#
+# Clone cub.
+if [ ! -d cub ] ; then
+  git clone https://github.com/dmlc/cub.git
+fi
+# Clone thrust.
+if [ ! -d thrust ] ; then
+  git clone https://github.com/thrust/thrust.git
+fi
--- a/research/feelvos/correlation_cost/compile.sh
+++ b/research/feelvos/correlation_cost/compile.sh
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# This script is used to compile the code for correlation_cost and create correlation_cost.so.
+#
+#  Usage:
+#    sh ./compile.sh cuda_dir
+#  Where cuda_dir points to a directory containing the cuda folder (not the cuda folder itself).
+#
+#
+if [ "$#" -ne 1 ]; then
+  echo "Illegal number of parameters, usage: ./compile.sh cuda_dir"
+  exit 1
+fi
+CUDA_DIR=$1
+if [ ! -d "${CUDA_DIR}/cuda" ]; then
+  echo "cuda_dir must point to a directory containing the cuda folder, not to the cuda folder itself"
+  exit 1
+fi
+TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
+TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
+CUB_DIR=cub
+THRUST_DIR=thrust
+# Depending on the versions of your nvcc and gcc, the flag --expt-relaxed-constexpr might be required or should be removed.
+# If nvcc complains about a too new gcc version, you can point it to another gcc
+# version by using something like nvcc -ccbin /path/to/your/gcc6
+nvcc -std=c++11 --expt-relaxed-constexpr -I ./ -I ${CUB_DIR}/../ -I ${THRUST_DIR} -I ${CUDA_DIR}/ -c -o correlation_cost_op_gpu.o kernels/correlation_cost_op_gpu.cu.cc ${TF_CFLAGS[@]} -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
+g++ -std=c++11 -I ./ -L ${CUDA_DIR}/cuda/lib64 -shared -o correlation_cost.so ops/correlation_cost_op.cc kernels/correlation_cost_op.cc correlation_cost_op_gpu.o ${TF_CFLAGS[@]} -fPIC -lcudart ${TF_LFLAGS[@]} -D GOOGLE_CUDA=1
--- a/research/feelvos/correlation_cost/fix_code.sh
+++ b/research/feelvos/correlation_cost/fix_code.sh
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# This script is used to modify the downloaded code.
+#
+#  Usage:
+#    sh ./fix_code.sh
+#
+#
+sed -i "s/tensorflow\/contrib\/correlation_cost\///g" kernels/correlation_cost_op_gpu.cu.cc
+sed -i "s/tensorflow\/contrib\/correlation_cost\///g" kernels/correlation_cost_op.cc
+sed -i "s/external\/cub_archive\//cub\//g" kernels/correlation_cost_op_gpu.cu.cc
+sed -i "s/from tensorflow.contrib.util import loader/import tensorflow as tf/g" python/ops/correlation_cost_op.py
+grep -v "from tensorflow" python/ops/correlation_cost_op.py | grep -v resource_loader.get_path_to_datafile > correlation_cost_op.py.tmp && mv correlation_cost_op.py.tmp python/ops/correlation_cost_op.py
+sed -i "s/array_ops/tf/g" python/ops/correlation_cost_op.py
+sed -i "s/ops/tf/g" python/ops/correlation_cost_op.py
+sed -i "s/loader.load_op_library(/tf.load_op_library('feelvos\/correlation_cost\/correlation_cost.so')/g" python/ops/correlation_cost_op.py
+sed -i "s/gen_correlation_cost_op/_correlation_cost_op_so/g" python/ops/correlation_cost_op.py
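The effect of the first three `sed` commands in fix_code.sh on include paths can be mirrored in Python. This is an illustrative sketch, not part of the commit: `rewrite_includes` is a hypothetical helper and the sample include lines used to exercise it are assumptions about what the downloaded sources contain. Unlike the script, which applies the cub rewrite only to the GPU kernel file, this helper applies both rewrites to any text it is given.

```python
# (old, new) pairs mirroring the path substitutions in fix_code.sh. The sed
# patterns contain no regex metacharacters, so plain string replacement is
# equivalent.
REWRITES = (
    ('tensorflow/contrib/correlation_cost/', ''),
    ('external/cub_archive/', 'cub/'),
)


def rewrite_includes(source_text):
  """Applies every path substitution to all occurrences in source_text."""
  for old, new in REWRITES:
    source_text = source_text.replace(old, new)
  return source_text


# Hypothetical include lines, used only to show the transformation.
sample = (
    '#include "tensorflow/contrib/correlation_cost/kernels/correlation_cost_op.h"\n'
    '#include "external/cub_archive/cub/device/device_reduce.cuh"\n')
rewritten = rewrite_includes(sample)
```

After the rewrite the headers are addressed relative to the correlation_cost directory, which matches the `-I ./` and `-I ${CUB_DIR}/../` include paths passed to nvcc in compile.sh.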
--- a/research/feelvos/correlation_cost/get_code.sh
+++ b/research/feelvos/correlation_cost/get_code.sh
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# This script is used to download the code for correlation_cost.
+#
+#  Usage:
+#    sh ./get_code.sh
+#
+#
+mkdir -p kernels ops python/ops
+touch __init__.py
+touch python/__init__.py
+touch python/ops/__init__.py
+wget https://raw.githubusercontent.com/tensorflow/tensorflow/91b163b9bd8dd0f8c2631b4245a67dfd387536a6/tensorflow/contrib/correlation_cost/ops/correlation_cost_op.cc -O ops/correlation_cost_op.cc
+wget https://raw.githubusercontent.com/tensorflow/tensorflow/91b163b9bd8dd0f8c2631b4245a67dfd387536a6/tensorflow/contrib/correlation_cost/python/ops/correlation_cost_op.py -O python/ops/correlation_cost_op.py
+wget https://raw.githubusercontent.com/tensorflow/tensorflow/91b163b9bd8dd0f8c2631b4245a67dfd387536a6/tensorflow/contrib/correlation_cost/kernels/correlation_cost_op.cc -O kernels/correlation_cost_op.cc
+wget https://raw.githubusercontent.com/tensorflow/tensorflow/91b163b9bd8dd0f8c2631b4245a67dfd387536a6/tensorflow/contrib/correlation_cost/kernels/correlation_cost_op.h -O kernels/correlation_cost_op.h
+wget https://raw.githubusercontent.com/tensorflow/tensorflow/91b163b9bd8dd0f8c2631b4245a67dfd387536a6/tensorflow/contrib/correlation_cost/kernels/correlation_cost_op_gpu.cu.cc -O kernels/correlation_cost_op_gpu.cu.cc
--- a/research/feelvos/datasets/__init__.py
+++ b/research/feelvos/datasets/__init__.py
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
--- a/research/feelvos/datasets/build_davis2017_data.py
+++ b/research/feelvos/datasets/build_davis2017_data.py
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Converts DAVIS 2017 data to TFRecord file format with SequenceExample protos.
+"""
+import io
+import math
+import os
+from StringIO import StringIO
+import numpy as np
+import PIL.Image
+import tensorflow as tf
+FLAGS = tf.app.flags.FLAGS
+tf.app.flags.DEFINE_string('data_folder', 'DAVIS2017/',
+                           'Folder containing the DAVIS 2017 data')
+tf.app.flags.DEFINE_string('imageset', 'val',
+                           'Which subset to use, either train or val')
+tf.app.flags.DEFINE_string(
+    'output_dir', './tfrecord',
+    'Path to save converted TFRecords of TensorFlow examples.')
+_NUM_SHARDS_TRAIN = 10
+_NUM_SHARDS_VAL = 1
+def read_image(path):
+  # Read raw bytes; the encoded image is stored unchanged in the TFRecord.
+  with open(path, 'rb') as fid:
+    image_str = fid.read()
+    image = PIL.Image.open(io.BytesIO(image_str))
+    w, h = image.size
+  return image_str, (h, w)
+def read_annotation(path):
+  """Reads a single image annotation from a png image.
+  Args:
+    path: Path to the png image.
+  Returns:
+    png_string: The png encoded as string.
+    size: Tuple of (height, width).
+  """
+  with open(path, 'rb') as fid:
+    x = np.array(PIL.Image.open(fid))
+    h, w = x.shape
+    im = PIL.Image.fromarray(x)
+  # PNG data is binary, so use a bytes buffer.
+  output = io.BytesIO()
+  im.save(output, format='png')
+  png_string = output.getvalue()
+  output.close()
+  return png_string, (h, w)
+def process_video(key, input_dir, anno_dir):
+  """Creates a SequenceExample for the video.
+  Args:
+    key: Name of the video.
+    input_dir: Directory which contains the image files.
+    anno_dir: Directory which contains the annotation files.
+  Returns:
+    The created SequenceExample.
+  """
+  frame_names = sorted(tf.gfile.ListDirectory(input_dir))
+  anno_files = sorted(tf.gfile.ListDirectory(anno_dir))
+  assert len(frame_names) == len(anno_files)
+  sequence = tf.train.SequenceExample()
+  context = sequence.context.feature
+  features = sequence.feature_lists.feature_list
+  for i, name in enumerate(frame_names):
+    image_str, image_shape = read_image(
+        os.path.join(input_dir, name))
+    anno_str, anno_shape = read_annotation(
+        os.path.join(anno_dir, name[:-4] + '.png'))
+    image_encoded = features['image/encoded'].feature.add()
+    image_encoded.bytes_list.value.append(image_str)
+    segmentation_encoded = features['segmentation/object/encoded'].feature.add()
+    segmentation_encoded.bytes_list.value.append(anno_str)
+    np.testing.assert_array_equal(np.array(image_shape), np.array(anno_shape))
+    if i == 0:
+      first_shape = np.array(image_shape)
+    else:
+      np.testing.assert_array_equal(np.array(image_shape), first_shape)
+  context['video_id'].bytes_list.value.append(key.encode('ascii'))
+  context['clip/frames'].int64_list.value.append(len(frame_names))
+  context['image/format'].bytes_list.value.append(b'JPEG')
+  context['image/channels'].int64_list.value.append(3)
+  context['image/height'].int64_list.value.append(first_shape[0])
+  context['image/width'].int64_list.value.append(first_shape[1])
+  context['segmentation/object/format'].bytes_list.value.append(b'PNG')
+  context['segmentation/object/height'].int64_list.value.append(first_shape[0])
+  context['segmentation/object/width'].int64_list.value.append(first_shape[1])
+  return sequence
+def convert(data_folder, imageset, output_dir, num_shards):
+  """Converts the specified subset of DAVIS 2017 to TFRecord format.
+  Args:
+    data_folder: The path to the DAVIS 2017 data.
+    imageset: The subset to use, either train or val.
+    output_dir: Where to store the TFRecords.
+    num_shards: The number of shards used for storing the data.
+  """
+  sets_file = os.path.join(data_folder, 'ImageSets', '2017', imageset + '.txt')
+  vids = [x.strip() for x in open(sets_file).readlines()]
+  num_vids = len(vids)
+  # Round up so that the last (possibly smaller) shard holds the remainder.
+  num_vids_per_shard = int(math.ceil(num_vids / float(num_shards)))
+  for shard_id in range(num_shards):
+    output_filename = os.path.join(
+        output_dir,
+        '%s-%05d-of-%05d.tfrecord' % (imageset, shard_id, num_shards))
+    with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
+      start_idx = shard_id * num_vids_per_shard
+      end_idx = min((shard_id + 1) * num_vids_per_shard, num_vids)
+      for i in range(start_idx, end_idx):
+        print('Converting video %d/%d shard %d video %s' % (
+            i + 1, num_vids, shard_id, vids[i]))
+        img_dir = os.path.join(data_folder, 'JPEGImages', '480p', vids[i])
+        anno_dir = os.path.join(data_folder, 'Annotations', '480p', vids[i])
+        example = process_video(vids[i], img_dir, anno_dir)
+        tfrecord_writer.write(example.SerializeToString())
+def main(unused_argv):
+  imageset = FLAGS.imageset
+  assert imageset in ('train', 'val')
+  if imageset == 'train':
+    num_shards = _NUM_SHARDS_TRAIN
+  else:
+    num_shards = _NUM_SHARDS_VAL
+  convert(FLAGS.data_folder, FLAGS.imageset, FLAGS.output_dir, num_shards)
+if __name__ == '__main__':
+  tf.app.run()
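The shard boundaries in `convert` depend on rounding the per-shard video count up, so that no video is left out when the count does not divide evenly. The following is a minimal standalone sketch of that arithmetic, not part of the commit; the helper name `shard_ranges` is hypothetical.

```python
import math


def shard_ranges(num_items, num_shards):
  """Returns one (start, end) index pair per shard, end exclusive.

  Hypothetical helper: it mirrors the start_idx/end_idx computation in
  convert(), with the per-shard count rounded up.
  """
  per_shard = int(math.ceil(num_items / float(num_shards)))
  ranges = []
  for shard_id in range(num_shards):
    # Clamp both ends so trailing shards may be short or empty.
    start = min(shard_id * per_shard, num_items)
    end = min((shard_id + 1) * per_shard, num_items)
    ranges.append((start, end))
  return ranges


if __name__ == '__main__':
  # The 60 DAVIS 2017 training videos split into 10 shards of 6 videos each.
  print(shard_ranges(60, 10))
  # An uneven case: the last shard holds the single leftover item.
  print(shard_ranges(7, 3))
```

Rounding the quotient, rather than the video count, is what keeps the leftover items in the uneven case.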
--- a/research/feelvos/datasets/download_and_convert_davis17.sh
+++ b/research/feelvos/datasets/download_and_convert_davis17.sh
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# Script to download and preprocess the DAVIS 2017 dataset.
+#
+# Usage:
+#   bash ./download_and_convert_davis17.sh
+# Exit immediately if a command exits with a non-zero status.
+set -e
+CURRENT_DIR=$(pwd)
+WORK_DIR="./davis17"
+mkdir -p "${WORK_DIR}"
+cd "${WORK_DIR}"
+# Helper function to download and unpack the DAVIS 2017 dataset.
+download_and_uncompress() {
+  local BASE_URL=${1}
+  local FILENAME=${2}
+  if [ ! -f "${FILENAME}" ]; then
+    echo "Downloading ${FILENAME} to ${WORK_DIR}"
+    wget -nd -c "${BASE_URL}/${FILENAME}"
+    echo "Uncompressing ${FILENAME}"
+    unzip "${FILENAME}"
+  fi
+}
+BASE_URL="https://data.vision.ee.ethz.ch/csergi/share/davis/"
+FILENAME="DAVIS-2017-trainval-480p.zip"
+download_and_uncompress "${BASE_URL}" "${FILENAME}"
+cd "${CURRENT_DIR}"
+# Root path for DAVIS 2017 dataset.
+DAVIS_ROOT="${WORK_DIR}/DAVIS"
+# Build TFRecords of the dataset.
+# First, create output directory for storing TFRecords.
+OUTPUT_DIR="${WORK_DIR}/tfrecord"
+mkdir -p "${OUTPUT_DIR}"
+IMAGE_FOLDER="${DAVIS_ROOT}/JPEGImages"
+LIST_FOLDER="${DAVIS_ROOT}/ImageSets/Segmentation"
+# Convert validation set.
+if [ ! -f "${OUTPUT_DIR}/val-00000-of-00001.tfrecord" ]; then
+  echo "Converting DAVIS 2017 dataset (val)..."
+  python ./build_davis2017_data.py \
+    --data_folder="${DAVIS_ROOT}" \
+    --imageset=val \
+    --output_dir="${OUTPUT_DIR}"
+fi
+# Convert training set.
+if [ ! -f "${OUTPUT_DIR}/train-00009-of-00010.tfrecord" ]; then
+  echo "Converting DAVIS 2017 dataset (train)..."
+  python ./build_davis2017_data.py \
+    --data_folder="${DAVIS_ROOT}" \
+    --imageset=train \
+    --output_dir="${OUTPUT_DIR}"
+fi
--- a/research/feelvos/datasets/tfsequence_example_decoder.py
+++ b/research/feelvos/datasets/tfsequence_example_decoder.py
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Contains the TFSequenceExampleDecoder.
+The TFSequenceExampleDecoder is a DataDecoder used to decode TensorFlow
+SequenceExample protos. In order to do so each requested item must be paired
+with one or more SequenceExample features that are parsed to produce the
+Tensor-based manifestation of the item.
+"""
+import tensorflow as tf
+slim = tf.contrib.slim
+data_decoder = slim.data_decoder
+class TFSequenceExampleDecoder(data_decoder.DataDecoder):
+  """A decoder for TensorFlow SequenceExamples.
+  Decoding SequenceExample proto buffers comprises two stages:
+  (1) Example parsing and (2) tensor manipulation.
+  In the first stage, the tf.parse_single_sequence_example function is called
+  with the context and sequence feature specifications (tf.FixedLenFeature,
+  tf.VarLenFeature, and tf.FixedLenSequenceFeature instances). These tell TF
+  how to parse the example. The output of this stage is a set of tensors.
+  In the second stage, the resulting tensors are manipulated to provide the
+  requested 'item' tensors.
+  To perform this decoding operation, a TFSequenceExampleDecoder is given a
+  dictionary of ItemHandlers. Each ItemHandler indicates the set of features
+  for stage 1 and contains the instructions for post-processing its tensors
+  for stage 2.
+  """
+  def __init__(self, keys_to_context_features, keys_to_sequence_features,
+               items_to_handlers):
+    """Constructs the decoder.
+    Args:
+      keys_to_context_features: a dictionary from TF-SequenceExample context
+        keys to either tf.VarLenFeature or tf.FixedLenFeature instances.
+        See tensorflow's parsing_ops.py.
+      keys_to_sequence_features: a dictionary from TF-SequenceExample sequence
+        keys to either tf.VarLenFeature or tf.FixedLenSequenceFeature instances.
+        See tensorflow's parsing_ops.py.
+      items_to_handlers: a dictionary from items (strings) to ItemHandler
+        instances. Note that the ItemHandler's are provided the keys that they
+        use to return the final item Tensors.
+    Raises:
+      ValueError: if the same key is present for context features and sequence
+        features.
+    """
+    unique_keys = set()
+    unique_keys.update(keys_to_context_features)
+    unique_keys.update(keys_to_sequence_features)
+    if len(unique_keys) != (
+        len(keys_to_context_features) + len(keys_to_sequence_features)):
+      # This situation is ambiguous in the decoder's keys_to_tensors variable.
+      raise ValueError('Context and sequence keys are not unique. \n'
+                       ' Context keys: %s \n Sequence keys: %s' %
+                       (list(keys_to_context_features.keys()),
+                        list(keys_to_sequence_features.keys())))
+    self._keys_to_context_features = keys_to_context_features
+    self._keys_to_sequence_features = keys_to_sequence_features
+    self._items_to_handlers = items_to_handlers
+  def list_items(self):
+    """See base class."""
+    return self._items_to_handlers.keys()
+  def decode(self, serialized_example, items=None):
+    """Decodes the given serialized TF-SequenceExample.
+    Args:
+      serialized_example: a serialized TF-SequenceExample tensor.
+      items: the list of items to decode. These must be a subset of the item
+        keys in self._items_to_handlers. If `items` is left as None, then all
+        of the items in self._items_to_handlers are decoded.
+    Returns:
+      the decoded items, a list of tensors.
+    """
+    context, feature_list = tf.parse_single_sequence_example(
+        serialized_example, self._keys_to_context_features,
+        self._keys_to_sequence_features)
+    # Reshape non-sparse elements just once:
+    for k in self._keys_to_context_features:
+      v = self._keys_to_context_features[k]
+      if isinstance(v, tf.FixedLenFeature):
+        context[k] = tf.reshape(context[k], v.shape)
+    if not items:
+      items = self._items_to_handlers.keys()
+    outputs = []
+    for item in items:
+      handler = self._items_to_handlers[item]
+      keys_to_tensors = {
+          key: context[key] if key in context else feature_list[key]
+          for key in handler.keys
+      }
+      outputs.append(handler.tensors_to_item(keys_to_tensors))
+    return outputs
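The decoder above does two things that are easy to check in isolation: it rejects keys that appear in both the context and the sequence features, and it routes each handler key to whichever parsed dictionary contains it. The following sketch reproduces that logic with plain Python dictionaries instead of tensors. The function names and sample values are hypothetical and only serve as illustration.

```python
def check_unique_keys(context_keys, sequence_keys):
  """Raises ValueError if a key is used for both context and sequence."""
  overlap = set(context_keys) & set(sequence_keys)
  if overlap:
    raise ValueError('Context and sequence keys are not unique: %s' %
                     sorted(overlap))


def gather_tensors(handler_keys, context, feature_list):
  """Looks up each key in the context first, then in the feature lists."""
  return {
      key: context[key] if key in context else feature_list[key]
      for key in handler_keys
  }


if __name__ == '__main__':
  # Stand-ins for the parsed context and per-frame features of one video.
  context = {'video_id': 'bear', 'image/height': 480}
  feature_list = {'image/encoded': ['frame0', 'frame1']}
  check_unique_keys(context, feature_list)
  print(gather_tensors(['video_id', 'image/encoded'], context, feature_list))
```

Because lookups prefer the context dictionary, a key shared by both dictionaries would silently hide the sequence feature, which is presumably why the constructor refuses such input.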
--- a/research/feelvos/datasets/video_dataset.py
+++ b/research/feelvos/datasets/video_dataset.py
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Provides data from video object segmentation datasets.
+This file provides both images and annotations (instance segmentations) for
+TensorFlow. Currently, we support the following datasets:
+1. DAVIS 2017 (https://davischallenge.org/davis2017/code.html).
+2. DAVIS 2016 (https://davischallenge.org/davis2016/code.html).
+3. YouTube-VOS (https://youtube-vos.org/dataset/download).
+"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import collections
+import os.path
+import tensorflow as tf
+from feelvos.datasets import tfsequence_example_decoder
+slim = tf.contrib.slim
+dataset = slim.dataset
+tfexample_decoder = slim.tfexample_decoder
+_ITEMS_TO_DESCRIPTIONS = {
+    'image': 'A color image of varying height and width.',
+    'labels_class': ('A semantic segmentation label whose size matches image. '
+                     'Its values range from 0 (background) to num_classes.'),
+}
+# Named tuple to describe the dataset properties.
+DatasetDescriptor = collections.namedtuple(
+    'DatasetDescriptor',
+    ['splits_to_sizes',   # Splits of the dataset into training, val, and test.
+     'num_classes',   # Number of semantic classes.
+     'ignore_label',  # Ignore label value.
+    ]
+)
+_DAVIS_2016_INFORMATION = DatasetDescriptor(
+    splits_to_sizes={'train': [30, 1830],
+                     'val': [20, 1376]},
+    num_classes=2,
+    ignore_label=255,
+)
+_DAVIS_2017_INFORMATION = DatasetDescriptor(
+    splits_to_sizes={'train': [60, 4219],
+                     'val': [30, 2023],
+                     'test-dev': [30, 2037]},
+    num_classes=None,  # Number of instances per video differs.
+    ignore_label=255,
+)
+_YOUTUBE_VOS_2018_INFORMATION = DatasetDescriptor(
+    # Leave these sizes as None to allow for different splits into
+    # training and validation sets.
+    splits_to_sizes={'train': [None, None],
+                     'val': [None, None]},
+    num_classes=None,  # Number of instances per video differs.
+    ignore_label=255,
+)
+_DATASETS_INFORMATION = {
+    'davis_2016': _DAVIS_2016_INFORMATION,
+    'davis_2017': _DAVIS_2017_INFORMATION,
+    'youtube_vos_2018': _YOUTUBE_VOS_2018_INFORMATION,
+}
+# Default file pattern of the TFRecord files. Note we include '-' to avoid
+# confusion between `train-` and `trainval-` sets.
+_FILE_PATTERN = '%s-*'
+def get_dataset(dataset_name,
+                split_name,
+                dataset_dir,
+                file_pattern=None,
+                data_type='tf_sequence_example',
+                decode_video_frames=False):
+  """Gets an instance of slim Dataset.
+  Args:
+    dataset_name: String, dataset name.
+    split_name: String, the train/val split name.
+    dataset_dir: String, the directory of the dataset sources.
+    file_pattern: String, file pattern of the TFRecord files.
+    data_type: String, data type. Currently only 'tf_sequence_example' is
+      supported.
+    decode_video_frames: Boolean, decode the images or not. Not decoding them
+        here is useful if we subsample later.
+  Returns:
+    An instance of slim Dataset.
+  Raises:
+    ValueError: If the dataset_name or split_name is not recognized, or if
+      the data_type is not supported.
+  """
+  if dataset_name not in _DATASETS_INFORMATION:
+    raise ValueError('The specified dataset is not supported yet.')
+  splits_to_sizes = _DATASETS_INFORMATION[dataset_name].splits_to_sizes
+  if split_name not in splits_to_sizes:
+    raise ValueError('data split name %s not recognized' % split_name)
+  # Prepare the variables for different datasets.
+  num_classes = _DATASETS_INFORMATION[dataset_name].num_classes
+  ignore_label = _DATASETS_INFORMATION[dataset_name].ignore_label
+  if file_pattern is None:
+    file_pattern = _FILE_PATTERN
+  file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
+  if data_type == 'tf_sequence_example':
+    keys_to_context_features = {
+        'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'),
+        'image/height': tf.FixedLenFeature((), tf.int64, default_value=0),
+        'image/width': tf.FixedLenFeature((), tf.int64, default_value=0),
+        'segmentation/object/format': tf.FixedLenFeature(
+            (), tf.string, default_value='png'),
+        'video_id': tf.FixedLenFeature((), tf.string, default_value='unknown')
+    }
+    label_name = 'class' if dataset_name == 'davis_2016' else 'object'
+    keys_to_sequence_features = {
+        'image/encoded': tf.FixedLenSequenceFeature((), dtype=tf.string),
+        'segmentation/{}/encoded'.format(label_name):
+            tf.FixedLenSequenceFeature((), tf.string),
+    }
+    items_to_handlers = {
+        'height': tfexample_decoder.Tensor('image/height'),
+        'width': tfexample_decoder.Tensor('image/width'),
+        'video_id': tfexample_decoder.Tensor('video_id')
+    }
+    if decode_video_frames:
+      decode_image_handler = tfexample_decoder.Image(
+          image_key='image/encoded',
+          format_key='image/format',
+          channels=3,
+          repeated=True)
+      items_to_handlers['image'] = decode_image_handler
+      decode_label_handler = tfexample_decoder.Image(
+          image_key='segmentation/{}/encoded'.format(label_name),
+          format_key='segmentation/{}/format'.format(label_name),
+          channels=1,
+          repeated=True)
+      items_to_handlers['labels_class'] = decode_label_handler
+    else:
+      items_to_handlers['image/encoded'] = tfexample_decoder.Tensor(
+          'image/encoded')
+      items_to_handlers[
+          'segmentation/object/encoded'] = tfexample_decoder.Tensor(
+              'segmentation/{}/encoded'.format(label_name))
+    decoder = tfsequence_example_decoder.TFSequenceExampleDecoder(
+        keys_to_context_features, keys_to_sequence_features, items_to_handlers)
+  else:
+    raise ValueError('Unknown data type.')
+  size = splits_to_sizes[split_name]
+  if isinstance(size, collections.Sequence):
+    num_videos = size[0]
+    num_samples = size[1]
+  else:
+    num_videos = 0
+    num_samples = size
+  return dataset.Dataset(
+      data_sources=file_pattern,
+      reader=tf.TFRecordReader,
+      decoder=decoder,
+      num_samples=num_samples,
+      num_videos=num_videos,
+      items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
+      ignore_label=ignore_label,
+      num_classes=num_classes,
+      name=dataset_name,
+      multi_label=True)
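The default file pattern `'%s-*'` includes the dash so that a split name such as `train` does not also pick up files of a `trainval` split. The sketch below illustrates this with the standard-library `fnmatch` module. The helper name `build_file_pattern` and the file names are assumptions made for the example, and the actual file matching inside slim may differ in detail.

```python
import fnmatch
import os.path

_FILE_PATTERN = '%s-*'


def build_file_pattern(dataset_dir, split_name, file_pattern=None):
  """Mirrors how get_dataset() assembles the glob for one split."""
  if file_pattern is None:
    file_pattern = _FILE_PATTERN
  return os.path.join(dataset_dir, file_pattern % split_name)


if __name__ == '__main__':
  # Hypothetical directory listing with two splits side by side.
  files = [
      os.path.join('tfrecord', 'train-00000-of-00010.tfrecord'),
      os.path.join('tfrecord', 'train-00001-of-00010.tfrecord'),
      os.path.join('tfrecord', 'trainval-00000-of-00001.tfrecord'),
  ]
  pattern = build_file_pattern('tfrecord', 'train')
  # Only the two `train-` shards match; the `trainval-` file is excluded.
  print(fnmatch.filter(files, pattern))
```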
--- a/research/feelvos/eval.sh
+++ b/research/feelvos/eval.sh
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# This script runs inference on DAVIS 2017 locally. Users can also adapt
+# this script to their own use case. See train.sh for an example of
+# local training.
+#
+# Usage:
+#   # From the tensorflow/models/research/feelvos directory.
+#   sh ./eval.sh
+#
+#
+# Exit immediately if a command exits with a non-zero status.
+set -e
+# Move one-level up to tensorflow/models/research directory.
+cd ..
+# Update PYTHONPATH.
+export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim:`pwd`/feelvos
+# Set up the working environment.
+CURRENT_DIR=$(pwd)
+WORK_DIR="${CURRENT_DIR}/feelvos"
+# Run embedding_utils_test first to make sure the PYTHONPATH is correctly set.
+python "${WORK_DIR}"/utils/embedding_utils_test.py -v
+# Go to datasets folder and download and convert the DAVIS 2017 dataset.
+DATASET_DIR="datasets"
+cd "${WORK_DIR}/${DATASET_DIR}"
+sh download_and_convert_davis17.sh
+# Go to models folder and download and unpack the DAVIS 2017 trained model.
+MODELS_DIR="models"
+mkdir -p "${WORK_DIR}/${MODELS_DIR}"
+cd "${WORK_DIR}/${MODELS_DIR}"
+if [ ! -d "feelvos_davis17_trained" ]; then
+  wget http://download.tensorflow.org/models/feelvos_davis17_trained.tar.gz
+  tar -xvf feelvos_davis17_trained.tar.gz
+  echo "model_checkpoint_path: \"model.ckpt-200004\"" > feelvos_davis17_trained/checkpoint
+  rm feelvos_davis17_trained.tar.gz
+fi
+CHECKPOINT_DIR="${WORK_DIR}/${MODELS_DIR}/feelvos_davis17_trained/"
+# Go back to original directory.
+cd "${CURRENT_DIR}"
+# Set up the working directories.
+DAVIS_FOLDER="davis17"
+EXP_FOLDER="exp/eval_on_val_set"
+VIS_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${DAVIS_FOLDER}/${EXP_FOLDER}/eval"
+mkdir -p "${VIS_LOGDIR}"
+DAVIS_DATASET="${WORK_DIR}/${DATASET_DIR}/${DAVIS_FOLDER}/tfrecord"
+python "${WORK_DIR}"/vis_video.py \
+  --dataset=davis_2017 \
+  --dataset_dir="${DAVIS_DATASET}" \
+  --vis_logdir="${VIS_LOGDIR}" \
+  --checkpoint_dir="${CHECKPOINT_DIR}" \
+  --logtostderr \
+  --atrous_rates=12 \
+  --atrous_rates=24 \
+  --atrous_rates=36 \
+  --decoder_output_stride=4 \
+  --model_variant=xception_65 \
+  --multi_grid=1 \
+  --multi_grid=1 \
+  --multi_grid=1 \
+  --output_stride=8 \
+  --save_segmentations
--- a/research/feelvos/input_preprocess.py
+++ b/research/feelvos/input_preprocess.py
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Prepare the data used for FEELVOS training/evaluation."""
+import tensorflow as tf
+from deeplab.core import feature_extractor
+from deeplab.core import preprocess_utils
+# The probability of flipping the images and labels
+# left-right during training.
+_PROB_OF_FLIP = 0.5
+get_random_scale = preprocess_utils.get_random_scale
+randomly_scale_image_and_label = (
+    preprocess_utils.randomly_scale_image_and_label)
+def preprocess_image_and_label(image,
+                               label,
+                               crop_height,
+                               crop_width,
+                               min_resize_value=None,
+                               max_resize_value=None,
+                               resize_factor=None,
+                               min_scale_factor=1.,
+                               max_scale_factor=1.,
+                               scale_factor_step_size=0,
+                               ignore_label=255,
+                               is_training=True,
+                               model_variant=None):
+  """Preprocesses the image and label.
+  Args:
+    image: Input image.
+    label: Ground truth annotation label.
+    crop_height: The height value used to crop the image and label.
+    crop_width: The width value used to crop the image and label.
+    min_resize_value: Desired size of the smaller image side.
+    max_resize_value: Maximum allowed size of the larger image side.
+    resize_factor: Resized dimensions are multiple of factor plus one.
+    min_scale_factor: Minimum scale factor value.
+    max_scale_factor: Maximum scale factor value.
+    scale_factor_step_size: The step size from min scale factor to max scale
+      factor. The input is randomly scaled based on the value of
+      (min_scale_factor, max_scale_factor, scale_factor_step_size).
+    ignore_label: The label value which will be ignored for training and
+      evaluation.
+    is_training: If the preprocessing is used for training or not.
+    model_variant: Model variant (string) for choosing how to mean-subtract the
+      images. See feature_extractor.network_map for supported model variants.
+  Returns:
+    original_image: Original image (could be resized).
+    processed_image: Preprocessed image.
+    label: Preprocessed ground truth segmentation label.
+  Raises:
+    ValueError: Ground truth label not provided during training.
+  """
+  if is_training and label is None:
+    raise ValueError('During training, label must be provided.')
+  if model_variant is None:
+    tf.logging.warning('Default mean-subtraction is performed. Please specify '
+                       'a model_variant. See feature_extractor.network_map for '
+                       'supported model variants.')
+  # Keep reference to original image.
+  original_image = image
+  processed_image = tf.cast(image, tf.float32)
+  if label is not None:
+    label = tf.cast(label, tf.int32)
+  # Resize image and label to the desired range.
+  if min_resize_value is not None or max_resize_value is not None:
+    [processed_image, label] = (
+        preprocess_utils.resize_to_range(
+            image=processed_image,
+            label=label,
+            min_size=min_resize_value,
+            max_size=max_resize_value,
+            factor=resize_factor,
+            align_corners=True))
+    # The `original_image` becomes the resized image.
+    original_image = tf.identity(processed_image)
+  # Data augmentation by randomly scaling the inputs.
+  scale = get_random_scale(
+      min_scale_factor, max_scale_factor, scale_factor_step_size)
+  processed_image, label = randomly_scale_image_and_label(
+      processed_image, label, scale)
+  processed_image.set_shape([None, None, 3])
+  if crop_height is not None and crop_width is not None:
+    # Pad image and label to have dimensions >= [crop_height, crop_width].
+    image_shape = tf.shape(processed_image)
+    image_height = image_shape[0]
+    image_width = image_shape[1]
+    target_height = image_height + tf.maximum(crop_height - image_height, 0)
+    target_width = image_width + tf.maximum(crop_width - image_width, 0)
+    # Pad image with mean pixel value.
+    mean_pixel = tf.reshape(
+        feature_extractor.mean_pixel(model_variant), [1, 1, 3])
+    processed_image = preprocess_utils.pad_to_bounding_box(
+        processed_image, 0, 0, target_height, target_width, mean_pixel)
+    if label is not None:
+      label = preprocess_utils.pad_to_bounding_box(
+          label, 0, 0, target_height, target_width, ignore_label)
+    # Randomly crop the image and label.
+    if is_training and label is not None:
+      processed_image, label = preprocess_utils.random_crop(
+          [processed_image, label], crop_height, crop_width)
+    processed_image.set_shape([crop_height, crop_width, 3])
+    if label is not None:
+      label.set_shape([crop_height, crop_width, 1])
+  if is_training:
+    # Randomly left-right flip the image and label.
+    processed_image, label, _ = preprocess_utils.flip_dim(
+        [processed_image, label], _PROB_OF_FLIP, dim=1)
+  return original_image, processed_image, label
+def preprocess_images_and_labels_consistently(images,
+                                              labels,
+                                              crop_height,
+                                              crop_width,
+                                              min_resize_value=None,
+                                              max_resize_value=None,
+                                              resize_factor=None,
+                                              min_scale_factor=1.,
+                                              max_scale_factor=1.,
+                                              scale_factor_step_size=0,
+                                              ignore_label=255,
+                                              is_training=True,
+                                              model_variant=None):
+  """Preprocesses images and labels in a consistent way.
+  Similar to preprocess_image_and_label, but works on a list of images
+  and a list of labels. The same random scale and the same crop coordinates
+  are applied to every image and label, and either all images and labels are
+  flipped or none of them.
+  Args:
+    images: List of input images.
+    labels: List of ground truth annotation labels.
+    crop_height: The height value used to crop the image and label.
+    crop_width: The width value used to crop the image and label.
+    min_resize_value: Desired size of the smaller image side.
+    max_resize_value: Maximum allowed size of the larger image side.
+    resize_factor: Resized dimensions are a multiple of factor plus one.
+    min_scale_factor: Minimum scale factor value.
+    max_scale_factor: Maximum scale factor value.
+    scale_factor_step_size: The step size from min scale factor to max scale
+      factor. The input is randomly scaled based on the value of
+      (min_scale_factor, max_scale_factor, scale_factor_step_size).
+    ignore_label: The label value which will be ignored for training and
+      evaluation.
+    is_training: Whether the preprocessing is used for training.
+    model_variant: Model variant (string) for choosing how to mean-subtract the
+      images. See feature_extractor.network_map for supported model variants.
+  Returns:
+    original_images: Original images (resized if min_resize_value or
+      max_resize_value is set).
+    processed_images: Preprocessed images.
+    labels: Preprocessed ground truth segmentation labels.
+  Raises:
+    ValueError: Ground truth label not provided during training.
+  """
+  if is_training and labels is None:
+    raise ValueError('During training, labels must be provided.')
+  if model_variant is None:
+    tf.logging.warning('Default mean-subtraction is performed. Please specify '
+                       'a model_variant. See feature_extractor.network_map for '
+                       'supported model variants.')
+  if labels is not None:
+    assert len(images) == len(labels)
+  num_imgs = len(images)
+  # Keep reference to original images.
+  original_images = images
+  processed_images = [tf.cast(image, tf.float32) for image in images]
+  if labels is not None:
+    labels = [tf.cast(label, tf.int32) for label in labels]
+  # Resize images and labels to the desired range.
+  if min_resize_value is not None or max_resize_value is not None:
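+    # `labels` may be None outside of training; pair each image with None then.
+    labels_or_nones = labels if labels is not None else [None] * num_imgs
+    resized = [
+        preprocess_utils.resize_to_range(
+            image=processed_image,
+            label=label,
+            min_size=min_resize_value,
+            max_size=max_resize_value,
+            factor=resize_factor,
+            align_corners=True) for processed_image, label
+        in zip(processed_images, labels_or_nones)]
+    processed_images = [image for image, _ in resized]
+    if labels is not None:
+      labels = [label for _, label in resized]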
+    # The `original_images` becomes the resized images.
+    original_images = [tf.identity(processed_image)
+                       for processed_image in processed_images]
+  # Data augmentation by randomly scaling the inputs.
+  scale = get_random_scale(
+      min_scale_factor, max_scale_factor, scale_factor_step_size)
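+  # `labels` may be None outside of training; pair each image with None then.
+  scaled = [
+      randomly_scale_image_and_label(processed_image, label, scale)
+      for processed_image, label in zip(
+          processed_images,
+          labels if labels is not None else [None] * num_imgs)]
+  processed_images = [image for image, _ in scaled]
+  if labels is not None:
+    labels = [label for _, label in scaled]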
+  for processed_image in processed_images:
+    processed_image.set_shape([None, None, 3])
+  if crop_height is not None and crop_width is not None:
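+    # Pad images and labels to have dimensions >= [crop_height, crop_width].
+    # The target size is derived from the first image, which assumes that all
+    # images share the same spatial size.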
+    image_shape = tf.shape(processed_images[0])
+    image_height = image_shape[0]
+    image_width = image_shape[1]
+    target_height = image_height + tf.maximum(crop_height - image_height, 0)
+    target_width = image_width + tf.maximum(crop_width - image_width, 0)
+    # Pad images with the mean pixel value.
+    mean_pixel = tf.reshape(
+        feature_extractor.mean_pixel(model_variant), [1, 1, 3])
+    processed_images = [preprocess_utils.pad_to_bounding_box(
+        processed_image, 0, 0, target_height, target_width, mean_pixel)
+                        for processed_image in processed_images]
+    if labels is not None:
+      labels = [preprocess_utils.pad_to_bounding_box(
+          label, 0, 0, target_height, target_width, ignore_label)
+                for label in labels]
+    # Randomly crop the images and labels.
+    if is_training and labels is not None:
+      cropped = preprocess_utils.random_crop(
+          processed_images + labels, crop_height, crop_width)
+      assert len(cropped) == 2 * num_imgs
+      processed_images = cropped[:num_imgs]
+      labels = cropped[num_imgs:]
+    for processed_image in processed_images:
+      processed_image.set_shape([crop_height, crop_width, 3])
+    if labels is not None:
+      for label in labels:
+        label.set_shape([crop_height, crop_width, 1])
+  if is_training:
+    # Randomly left-right flip all images and labels together, or none of them.
+    res = preprocess_utils.flip_dim(
+        list(processed_images + labels), _PROB_OF_FLIP, dim=1)
+    maybe_flipped = res[:-1]
+    assert len(maybe_flipped) == 2 * num_imgs
+    processed_images = maybe_flipped[:num_imgs]
+    labels = maybe_flipped[num_imgs:]
+  return original_images, processed_images, labels
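The idea behind the consistent preprocessing above can be illustrated outside of TensorFlow. The following is a minimal sketch, not the FEELVOS implementation: it uses plain nested lists instead of tensors, and the helper name `crop_and_flip_consistently` and its parameters are hypothetical. It only shows that the crop offset and the flip decision are sampled once and then reused for every frame, so images and labels stay pixel-aligned.

```python
import random


def crop_and_flip_consistently(frames, crop_height, crop_width, flip_prob=0.5,
                               rng=random):
  """Applies one shared random crop and one shared flip decision to all frames.

  Hypothetical helper for illustration only. Each frame is a 2-D nested list;
  all frames must share the same height and width, which must be at least the
  crop size.
  """
  height, width = len(frames[0]), len(frames[0][0])
  # Sample the crop offset once so every frame is cut at the same location.
  top = rng.randint(0, height - crop_height)
  left = rng.randint(0, width - crop_width)
  # Sample the flip decision once so either all frames are mirrored or none.
  do_flip = rng.random() < flip_prob
  results = []
  for frame in frames:
    rows = [row[left:left + crop_width]
            for row in frame[top:top + crop_height]]
    if do_flip:
      rows = [row[::-1] for row in rows]
    results.append(rows)
  return results, (top, left, do_flip)


# Usage: the label is a per-pixel function of the image, so the relation must
# still hold after the shared crop and flip.
image = [[10 * r + c for c in range(4)] for r in range(3)]
label = [[value % 2 for value in row] for row in image]
(cropped_image, cropped_label), _ = crop_and_flip_consistently(
    [image, label], crop_height=2, crop_width=2, rng=random.Random(0))
assert cropped_label == [[value % 2 for value in row] for row in cropped_image]
```

Sampling the random parameters per frame instead would break the correspondence between an image and its label, and between consecutive video frames, which is presumably why the function above passes all tensors to `random_crop` and `flip_dim` in a single call.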
--- a/research/feelvos/model.py
+++ b/research/feelvos/model.py
--- a/research/feelvos/train.py
+++ b/research/feelvos/train.py