Commit 026ca58a authored by Reza Mahjourian

Add vid2depth model.

parent a1adc50b
@@ -44,6 +44,7 @@
/research/tensorrt/ @karmel
/research/textsum/ @panyx0718 @peterjliu
/research/transformer/ @daviddao
/research/vid2depth/ @rezama
/research/video_prediction/ @cbfinn
/research/fivo/ @dieterichlawson
/samples/ @MarkDaoust @lamberta
@@ -70,5 +70,7 @@ request.
summarization.
- [transformer](transformer): spatial transformer network, which allows the
spatial manipulation of data within the network.
- [vid2depth](vid2depth): unsupervised learning of depth and ego-motion from
  raw monocular video.
- [video_prediction](video_prediction): predicting future video frames with
neural advection.
# For projects which use TensorFlow as part of a Bazel build process, putting
# nothing in a bazelrc will default to a monolithic build. The following line
# opts in to modular op registration support by default.
build --define framework_shared_object=true
build --copt=-DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK
build --define=grpc_no_ares=true
package(default_visibility = ["//visibility:public"])
# vid2depth
**Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints**
Reza Mahjourian, Martin Wicke, Anelia Angelova
CVPR 2018
Project website: [https://sites.google.com/view/vid2depth](https://sites.google.com/view/vid2depth)
arXiv: [https://arxiv.org/abs/1802.05522](https://arxiv.org/abs/1802.05522)
<p align="center">
<a href="http://sites.google.com/corp/view/vid2depth"><img src='https://storage.googleapis.com/vid2depth/media/sample_video.gif'></a>
</p>
<p align="center">
<a href="http://sites.google.com/corp/view/vid2depth"><img src='https://storage.googleapis.com/vid2depth/media/approach.png' width=400></a>
</p>
## 1. Installation
### Requirements
#### Python Packages
```shell
mkvirtualenv venv # Optionally create a virtual environment.
pip install absl-py
pip install matplotlib
pip install numpy
pip install scipy
pip install tensorflow
```
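The code targets the TensorFlow 1.x API (it uses `tf.contrib.slim` and `tf.gfile`), so a quick sanity check of the environment can help before going further:
```shell
python -c "import tensorflow as tf; print(tf.__version__)"
python -c "import absl, matplotlib, numpy, scipy"
```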
#### For building the ICP op (work in progress)
* Bazel: https://bazel.build/
### Download vid2depth
```shell
git clone --depth 1 https://github.com/tensorflow/models.git
```
## 2. Datasets
### Download KITTI dataset (174GB)
```shell
mkdir -p ~/vid2depth/kitti-raw-uncompressed
cd ~/vid2depth/kitti-raw-uncompressed
wget https://raw.githubusercontent.com/mrharicot/monodepth/master/utils/kitti_archives_to_download.txt
wget -i kitti_archives_to_download.txt
unzip "*.zip"
```
### Download Cityscapes dataset (110GB) (optional)
You will need to register to download the data. Download the following files:
* leftImg8bit_sequence_trainvaltest.zip
* camera_trainvaltest.zip
### Download Bike dataset (17GB) (optional)
```shell
mkdir -p ~/vid2depth/bike-uncompressed
cd ~/vid2depth/bike-uncompressed
wget https://storage.googleapis.com/brain-robotics-data/bike/BikeVideoDataset.tar
tar xvf BikeVideoDataset.tar
```
## 3. Inference
### Download trained model
```shell
mkdir -p ~/vid2depth/trained-model
cd ~/vid2depth/trained-model
wget https://storage.cloud.google.com/vid2depth/model/model-119496.zip
unzip model-119496.zip
```
### Run inference
```shell
cd tensorflow/models/research/vid2depth
python inference.py \
--kitti_dir ~/vid2depth/kitti-raw-uncompressed \
--output_dir ~/vid2depth/inference \
--kitti_video 2011_09_26/2011_09_26_drive_0009_sync \
--model_ckpt ~/vid2depth/trained-model/model-119496
```
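Each output PNG stacks the resized input frame on top of its color-mapped depth estimate (see `inference.py`). If you would rather call the model from Python directly, the following is a minimal sketch mirroring `inference.py`; the checkpoint and image paths are placeholders for the files downloaded above:
```python
# Minimal sketch of programmatic depth inference (paths are placeholders).
from absl import app
import numpy as np
import scipy.misc
import tensorflow as tf

import model
import util

CKPT = 'trained-model/model-119496'  # placeholder checkpoint path


def main(_):
  # Build the inference graph: depth and egomotion networks fed by placeholders.
  inference_model = model.Model(is_training=False, batch_size=1,
                                img_height=128, img_width=416, seq_length=3)
  saver = tf.train.Saver(util.get_vars_to_restore(CKPT))
  sv = tf.train.Supervisor(logdir='/tmp/', saver=None)
  with sv.managed_session() as sess:
    saver.restore(sess, CKPT)
    im = scipy.misc.imresize(scipy.misc.imread('frame.png'), (128, 416))
    inputs = np.expand_dims(im, axis=0).astype(np.uint8)
    depth = inference_model.inference(inputs, sess, mode='depth')['depth']
    print(depth.shape)  # (batch, height, width, 1)


if __name__ == '__main__':
  app.run(main)
```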
## 4. Training
### Prepare KITTI training sequences
```shell
# Prepare training sequences.
cd tensorflow/models/research/vid2depth
python dataset/gen_data.py \
--dataset_name kitti_raw_eigen \
--dataset_dir ~/vid2depth/kitti-raw-uncompressed \
--data_dir ~/vid2depth/data/kitti_raw_eigen \
--seq_length 3
```
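Each generated example is a JPEG with the `seq_length` frames stacked side by side, accompanied by a `*_cam.txt` file holding the flattened 3x3 camera intrinsics (`fx,0,cx,0,fy,cy,0,0,1`); `train.txt` and `val.txt` list the training/validation split (see `dataset/gen_data.py`). A small sketch for inspecting one example, with placeholder folder and frame names:
```python
# Sketch: inspect one generated example (folder/frame names are placeholders).
import os

import numpy as np
import scipy.misc

data_dir = os.path.expanduser('~/vid2depth/data/kitti_raw_eigen')
example = os.path.join(data_dir, '<folder>', '<frame>')  # placeholder names

img = scipy.misc.imread(example + '.jpg')  # seq_length frames, side by side
width = img.shape[1] // 3                  # seq_length == 3 above
frames = [img[:, i * width:(i + 1) * width] for i in range(3)]

with open(example + '_cam.txt') as f:      # fx,0,cx,0,fy,cy,0,0,1
  intrinsics = np.array([float(v) for v in f.read().split(',')]).reshape(3, 3)
```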
### Prepare Cityscapes training sequences (optional)
```shell
# Prepare training sequences.
cd tensorflow/models/research/vid2depth
python dataset/gen_data.py \
--dataset_name cityscapes \
--dataset_dir ~/vid2depth/cityscapes-uncompressed \
--data_dir ~/vid2depth/data/cityscapes \
--seq_length 3
```
### Prepare Bike training sequences (optional)
```shell
# Prepare training sequences.
cd tensorflow/models/research/vid2depth
python dataset/gen_data.py \
--dataset_name bike \
--dataset_dir ~/vid2depth/bike-uncompressed \
--data_dir ~/vid2depth/data/bike \
--seq_length 3
```
### Compile the ICP op (work in progress)
The ICP op depends on multiple software packages (TensorFlow, Point Cloud
Library, FLANN, Boost, HDF5). The Bazel build system requires individual BUILD
files for each of these packages. We have included partial BUILD files in the
third_party directory, but they are not yet sufficient to compile the op. If you
manage to build the op, please let us know so we can include your contribution.
```shell
cd tensorflow/models/research/vid2depth
bazel build ops:pcl_demo # Build test program using PCL only.
bazel build ops:icp_op.so
```
For the time being, you can run inference with the pre-trained model and run
training without the ICP loss (i.e., with `--icp_weight 0`, as in the training
command below).
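If you do get `icp_op.so` to build, the usual way to expose a custom op to Python is `tf.load_op_library`. The sketch below is only an assumption about how `ops/icp_op.py` would wrap the op, based on the `Icp` op name and its `transform`/`residual` outputs referenced in `ops/icp_grad.py`:
```python
# Hypothetical wrapper for the compiled ICP op; assumes a successful Bazel
# build with the shared object under bazel-bin/ops/.
import tensorflow as tf

_icp_module = tf.load_op_library('bazel-bin/ops/icp_op.so')


def icp(source_cloud, ego_motion, target_cloud):
  """Returns (transform, residual) aligning source_cloud to target_cloud."""
  return _icp_module.icp(source_cloud, ego_motion, target_cloud)
```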
### Run training
```shell
# Train
cd tensorflow/models/research/vid2depth
python train.py \
--data_dir ~/vid2depth/data/kitti_raw_eigen \
--seq_length 3 \
--reconstr_weight 0.85 \
--smooth_weight 0.05 \
--ssim_weight 0.15 \
--icp_weight 0 \
--checkpoint_dir ~/vid2depth/checkpoints
```
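`model.py` defines scalar and image summaries for the losses, warped images, and disparity maps. Assuming `train.py` writes its event files next to the checkpoints, training can be monitored with TensorBoard:
```shell
tensorboard --logdir ~/vid2depth/checkpoints
```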
## Reference
If you find our work useful in your research, please consider citing our paper:
```
@inproceedings{mahjourian2018unsupervised,
title={Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints},
author={Mahjourian, Reza and Wicke, Martin and Angelova, Anelia},
booktitle = {CVPR},
year={2018}
}
```
## Contact
To ask questions or report issues, please open an issue on the tensorflow/models
[issue tracker](https://github.com/tensorflow/models/issues) and assign it to
[@rezama](https://github.com/rezama).
## Credits
This implementation is derived from [SfMLearner](https://github.com/tinghuiz/SfMLearner) by [Tinghui Zhou](https://github.com/tinghuiz).
workspace(name = "vid2depth")
# To update TensorFlow to a new revision.
# 1. Update the 'git_commit' args below to include the new git hash.
# 2. Get the sha256 hash of the archive with a command such as...
# curl -L https://github.com/tensorflow/tensorflow/archive/<git hash>.tar.gz | sha256sum
# and update the 'sha256' arg with the result.
# 3. Request the new archive to be mirrored on mirror.bazel.build for more
# reliable downloads.
load(":repo.bzl", "tensorflow_http_archive")
tensorflow_http_archive(
name = "org_tensorflow",
git_commit = "bc69c4ceed6544c109be5693eb40ddcf3a4eb95d",
sha256 = "21d6ac553adcfc9d089925f6d6793fee6a67264a0ce717bc998636662df4ca7e",
)
# TensorFlow depends on "io_bazel_rules_closure" so we need this here.
# Needs to be kept in sync with the same target in TensorFlow's WORKSPACE file.
http_archive(
name = "io_bazel_rules_closure",
sha256 = "dbe0da2cca88194d13dc5a7125a25dd7b80e1daec7839f33223de654d7a1bcc8",
strip_prefix = "rules_closure-ba3e07cb88be04a2d4af7009caa0ff3671a79d06",
urls = [
"https://mirror.bazel.build/github.com/bazelbuild/rules_closure/archive/ba3e07cb88be04a2d4af7009caa0ff3671a79d06.tar.gz",
"https://github.com/bazelbuild/rules_closure/archive/ba3e07cb88be04a2d4af7009caa0ff3671a79d06.tar.gz", # 2017-10-31
],
)
load("@org_tensorflow//tensorflow:workspace.bzl", "tf_workspace")
tf_workspace(
path_prefix = "",
tf_repo_name = "org_tensorflow",
)
bind(
name = "libssl",
actual = "@boringssl//:ssl",
)
bind(
name = "zlib",
actual = "@zlib_archive//:zlib",
)
# gRPC wants a c-ares dependency to exist, but its contents are not actually
# important since we have set GRPC_ARES=0 in tools/bazel.rc.
bind(
name = "cares",
actual = "@grpc//third_party/nanopb:nanopb",
)
# Specify the minimum required bazel version.
load("@org_tensorflow//tensorflow:workspace.bzl", "check_bazel_version_at_least")
check_bazel_version_at_least("0.5.4")
# TODO(rodrigoq): rename to com_github_antonovvk_bazel_rules to match cartographer.
http_archive(
name = "bazel_rules",
sha256 = "b6e1b6cfc17f676c70045deb6d46bb330490693e65c8d541aae265ea34a48c8c",
strip_prefix = "bazel_rules-0394a3b108412b8e543fd90255daa416e988c4a1",
urls = [
"https://mirror.bazel.build/github.com/drigz/bazel_rules/archive/0394a3b108412b8e543fd90255daa416e988c4a1.tar.gz",
"https://github.com/drigz/bazel_rules/archive/0394a3b108412b8e543fd90255daa416e988c4a1.tar.gz",
],
)
# Point Cloud Library (PCL)
new_http_archive(
name = "com_github_pointcloudlibrary_pcl",
build_file = "//third_party:pcl.BUILD",
sha256 = "5a102a2fbe2ba77c775bf92c4a5d2e3d8170be53a68c3a76cfc72434ff7b9783",
strip_prefix = "pcl-pcl-1.8.1",
urls = [
"https://mirror.bazel.build/github.com/PointCloudLibrary/pcl/archive/pcl-1.8.1.tar.gz",
"https://github.com/PointCloudLibrary/pcl/archive/pcl-1.8.1.tar.gz",
],
)
# FLANN
new_http_archive(
name = "flann",
build_file = "//third_party:flann.BUILD",
strip_prefix = "flann-1.8.4-src",
urls = [
"https://www.cs.ubc.ca/research/flann/uploads/FLANN/flann-1.8.4-src.zip",
],
)
# HDF5
new_http_archive(
name = "hdf5",
url = "https://support.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.10.1.tar.gz",
strip_prefix = "hdf5-1.10.1",
build_file = "third_party/hdf5.BUILD",
)
# Boost
# http_archive(
# name = "com_github_nelhage_boost",
# sha256 = "5c88fc077f6b8111e997fec5146e5f9940ae9a2016eb9949447fcb4b482bcdb3",
# strip_prefix = "rules_boost-7289bb1d8f938fdf98078297768c122ee9e11c9e",
# urls = [
# "https://mirror.bazel.build/github.com/nelhage/rules_boost/archive/7289bb1d8f938fdf98078297768c122ee9e11c9e.tar.gz",
# "https://github.com/nelhage/rules_boost/archive/7289bb1d8f938fdf98078297768c122ee9e11c9e.tar.gz",
# ],
# )
#
# load("@com_github_nelhage_boost//:boost/boost.bzl", "boost_deps")
# boost_deps()
git_repository(
name = "com_github_nelhage_rules_boost",
commit = "239ce40e42ab0e3fe7ce84c2e9303ff8a277c41a",
remote = "https://github.com/nelhage/rules_boost",
)
load("@com_github_nelhage_rules_boost//:boost/boost.bzl", "boost_deps")
boost_deps()
# Eigen
# Based on https://github.com/tensorflow/tensorflow/blob/master/third_party/eigen.BUILD
new_http_archive(
name = "eigen_repo",
build_file = "//third_party:eigen.BUILD",
sha256 = "ca7beac153d4059c02c8fc59816c82d54ea47fe58365e8aded4082ded0b820c4",
strip_prefix = "eigen-eigen-f3a22f35b044",
urls = [
"http://mirror.bazel.build/bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
"https://bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
],
)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Generates data for training/validation and save it to disk."""
# Example usage:
#
# python dataset/gen_data.py \
# --alsologtostderr \
# --dataset_name kitti_raw_eigen \
# --dataset_dir ~/vid2depth/dataset/kitti-raw-uncompressed \
# --data_dir ~/vid2depth/data/kitti_raw_eigen_s3 \
# --seq_length 3 \
# --num_threads 12
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
import multiprocessing
import os
from absl import app
from absl import flags
from absl import logging
import dataset_loader
import numpy as np
import scipy.misc
import tensorflow as tf
gfile = tf.gfile
FLAGS = flags.FLAGS
DATASETS = [
'kitti_raw_eigen', 'kitti_raw_stereo', 'kitti_odom', 'cityscapes', 'bike'
]
flags.DEFINE_enum('dataset_name', None, DATASETS, 'Dataset name.')
flags.DEFINE_string('dataset_dir', None, 'Location for dataset source files.')
flags.DEFINE_string('data_dir', None, 'Where to save the generated data.')
# Note: Training time grows linearly with sequence length. Use 2 or 3.
flags.DEFINE_integer('seq_length', 3, 'Length of each training sequence.')
flags.DEFINE_integer('img_height', 128, 'Image height.')
flags.DEFINE_integer('img_width', 416, 'Image width.')
flags.DEFINE_integer(
'num_threads', None, 'Number of worker threads. '
'Defaults to number of CPU cores.')
flags.mark_flag_as_required('dataset_name')
flags.mark_flag_as_required('dataset_dir')
flags.mark_flag_as_required('data_dir')
# Process data in chunks for reporting progress.
NUM_CHUNKS = 100
def _generate_data():
"""Extract sequences from dataset_dir and store them in data_dir."""
if not gfile.Exists(FLAGS.data_dir):
gfile.MakeDirs(FLAGS.data_dir)
global dataloader # pylint: disable=global-variable-undefined
if FLAGS.dataset_name == 'bike':
dataloader = dataset_loader.Bike(FLAGS.dataset_dir,
img_height=FLAGS.img_height,
img_width=FLAGS.img_width,
seq_length=FLAGS.seq_length)
elif FLAGS.dataset_name == 'kitti_odom':
dataloader = dataset_loader.KittiOdom(FLAGS.dataset_dir,
img_height=FLAGS.img_height,
img_width=FLAGS.img_width,
seq_length=FLAGS.seq_length)
elif FLAGS.dataset_name == 'kitti_raw_eigen':
dataloader = dataset_loader.KittiRaw(FLAGS.dataset_dir,
split='eigen',
img_height=FLAGS.img_height,
img_width=FLAGS.img_width,
seq_length=FLAGS.seq_length)
elif FLAGS.dataset_name == 'kitti_raw_stereo':
dataloader = dataset_loader.KittiRaw(FLAGS.dataset_dir,
split='stereo',
img_height=FLAGS.img_height,
img_width=FLAGS.img_width,
seq_length=FLAGS.seq_length)
elif FLAGS.dataset_name == 'cityscapes':
dataloader = dataset_loader.Cityscapes(FLAGS.dataset_dir,
img_height=FLAGS.img_height,
img_width=FLAGS.img_width,
seq_length=FLAGS.seq_length)
else:
raise ValueError('Unknown dataset')
# The default loop below uses multiprocessing, which can make it difficult
# to locate the source of errors in the data loader classes.
# Uncomment this loop for easier debugging:
# all_examples = {}
# for i in range(dataloader.num_train):
# _gen_example(i, all_examples)
# logging.info('Generated: %d', len(all_examples))
all_frames = range(dataloader.num_train)
frame_chunks = np.array_split(all_frames, NUM_CHUNKS)
manager = multiprocessing.Manager()
all_examples = manager.dict()
num_cores = multiprocessing.cpu_count()
num_threads = num_cores if FLAGS.num_threads is None else FLAGS.num_threads
pool = multiprocessing.Pool(num_threads)
# Split into training/validation sets. Fixed seed for repeatability.
np.random.seed(8964)
if not gfile.Exists(FLAGS.data_dir):
gfile.MakeDirs(FLAGS.data_dir)
with gfile.Open(os.path.join(FLAGS.data_dir, 'train.txt'), 'w') as train_f:
with gfile.Open(os.path.join(FLAGS.data_dir, 'val.txt'), 'w') as val_f:
logging.info('Generating data...')
for index, frame_chunk in enumerate(frame_chunks):
all_examples.clear()
pool.map(_gen_example_star,
itertools.izip(frame_chunk, itertools.repeat(all_examples)))
logging.info('Chunk %d/%d: saving %s entries...', index + 1, NUM_CHUNKS,
len(all_examples))
for _, example in all_examples.items():
if example:
s = example['folder_name']
frame = example['file_name']
if np.random.random() < 0.1:
val_f.write('%s %s\n' % (s, frame))
else:
train_f.write('%s %s\n' % (s, frame))
pool.close()
pool.join()
def _gen_example(i, all_examples):
"""Saves one example to file. Also adds it to all_examples dict."""
example = dataloader.get_example_with_index(i)
if not example:
return
image_seq_stack = _stack_image_seq(example['image_seq'])
example.pop('image_seq', None) # Free up memory.
intrinsics = example['intrinsics']
fx = intrinsics[0, 0]
fy = intrinsics[1, 1]
cx = intrinsics[0, 2]
cy = intrinsics[1, 2]
save_dir = os.path.join(FLAGS.data_dir, example['folder_name'])
if not gfile.Exists(save_dir):
gfile.MakeDirs(save_dir)
img_filepath = os.path.join(save_dir, '%s.jpg' % example['file_name'])
scipy.misc.imsave(img_filepath, image_seq_stack.astype(np.uint8))
cam_filepath = os.path.join(save_dir, '%s_cam.txt' % example['file_name'])
example['cam'] = '%f,0.,%f,0.,%f,%f,0.,0.,1.' % (fx, cx, fy, cy)
with open(cam_filepath, 'w') as cam_f:
cam_f.write(example['cam'])
key = example['folder_name'] + '_' + example['file_name']
all_examples[key] = example
def _gen_example_star(params):
return _gen_example(*params)
def _stack_image_seq(seq):
for i, im in enumerate(seq):
if i == 0:
res = im
else:
res = np.hstack((res, im))
return res
def main(_):
_generate_data()
if __name__ == '__main__':
app.run(main)
training/image_2/000000_10.png
training/image_2/000001_10.png
training/image_2/000002_10.png
training/image_2/000003_10.png
training/image_2/000004_10.png
training/image_2/000005_10.png
training/image_2/000006_10.png
training/image_2/000007_10.png
training/image_2/000008_10.png
training/image_2/000009_10.png
training/image_2/000010_10.png
training/image_2/000011_10.png
training/image_2/000012_10.png
training/image_2/000013_10.png
training/image_2/000014_10.png
training/image_2/000015_10.png
training/image_2/000016_10.png
training/image_2/000017_10.png
training/image_2/000018_10.png
training/image_2/000019_10.png
training/image_2/000020_10.png
training/image_2/000021_10.png
training/image_2/000022_10.png
training/image_2/000023_10.png
training/image_2/000024_10.png
training/image_2/000025_10.png
training/image_2/000026_10.png
training/image_2/000027_10.png
training/image_2/000028_10.png
training/image_2/000029_10.png
training/image_2/000030_10.png
training/image_2/000031_10.png
training/image_2/000032_10.png
training/image_2/000033_10.png
training/image_2/000034_10.png
training/image_2/000035_10.png
training/image_2/000036_10.png
training/image_2/000037_10.png
training/image_2/000038_10.png
training/image_2/000039_10.png
training/image_2/000040_10.png
training/image_2/000041_10.png
training/image_2/000042_10.png
training/image_2/000043_10.png
training/image_2/000044_10.png
training/image_2/000045_10.png
training/image_2/000046_10.png
training/image_2/000047_10.png
training/image_2/000048_10.png
training/image_2/000049_10.png
training/image_2/000050_10.png
training/image_2/000051_10.png
training/image_2/000052_10.png
training/image_2/000053_10.png
training/image_2/000054_10.png
training/image_2/000055_10.png
training/image_2/000056_10.png
training/image_2/000057_10.png
training/image_2/000058_10.png
training/image_2/000059_10.png
training/image_2/000060_10.png
training/image_2/000061_10.png
training/image_2/000062_10.png
training/image_2/000063_10.png
training/image_2/000064_10.png
training/image_2/000065_10.png
training/image_2/000066_10.png
training/image_2/000067_10.png
training/image_2/000068_10.png
training/image_2/000069_10.png
training/image_2/000070_10.png
training/image_2/000071_10.png
training/image_2/000072_10.png
training/image_2/000073_10.png
training/image_2/000074_10.png
training/image_2/000075_10.png
training/image_2/000076_10.png
training/image_2/000077_10.png
training/image_2/000078_10.png
training/image_2/000079_10.png
training/image_2/000080_10.png
training/image_2/000081_10.png
training/image_2/000082_10.png
training/image_2/000083_10.png
training/image_2/000084_10.png
training/image_2/000085_10.png
training/image_2/000086_10.png
training/image_2/000087_10.png
training/image_2/000088_10.png
training/image_2/000089_10.png
training/image_2/000090_10.png
training/image_2/000091_10.png
training/image_2/000092_10.png
training/image_2/000093_10.png
training/image_2/000094_10.png
training/image_2/000095_10.png
training/image_2/000096_10.png
training/image_2/000097_10.png
training/image_2/000098_10.png
training/image_2/000099_10.png
training/image_2/000100_10.png
training/image_2/000101_10.png
training/image_2/000102_10.png
training/image_2/000103_10.png
training/image_2/000104_10.png
training/image_2/000105_10.png
training/image_2/000106_10.png
training/image_2/000107_10.png
training/image_2/000108_10.png
training/image_2/000109_10.png
training/image_2/000110_10.png
training/image_2/000111_10.png
training/image_2/000112_10.png
training/image_2/000113_10.png
training/image_2/000114_10.png
training/image_2/000115_10.png
training/image_2/000116_10.png
training/image_2/000117_10.png
training/image_2/000118_10.png
training/image_2/000119_10.png
training/image_2/000120_10.png
training/image_2/000121_10.png
training/image_2/000122_10.png
training/image_2/000123_10.png
training/image_2/000124_10.png
training/image_2/000125_10.png
training/image_2/000126_10.png
training/image_2/000127_10.png
training/image_2/000128_10.png
training/image_2/000129_10.png
training/image_2/000130_10.png
training/image_2/000131_10.png
training/image_2/000132_10.png
training/image_2/000133_10.png
training/image_2/000134_10.png
training/image_2/000135_10.png
training/image_2/000136_10.png
training/image_2/000137_10.png
training/image_2/000138_10.png
training/image_2/000139_10.png
training/image_2/000140_10.png
training/image_2/000141_10.png
training/image_2/000142_10.png
training/image_2/000143_10.png
training/image_2/000144_10.png
training/image_2/000145_10.png
training/image_2/000146_10.png
training/image_2/000147_10.png
training/image_2/000148_10.png
training/image_2/000149_10.png
training/image_2/000150_10.png
training/image_2/000151_10.png
training/image_2/000152_10.png
training/image_2/000153_10.png
training/image_2/000154_10.png
training/image_2/000155_10.png
training/image_2/000156_10.png
training/image_2/000157_10.png
training/image_2/000158_10.png
training/image_2/000159_10.png
training/image_2/000160_10.png
training/image_2/000161_10.png
training/image_2/000162_10.png
training/image_2/000163_10.png
training/image_2/000164_10.png
training/image_2/000165_10.png
training/image_2/000166_10.png
training/image_2/000167_10.png
training/image_2/000168_10.png
training/image_2/000169_10.png
training/image_2/000170_10.png
training/image_2/000171_10.png
training/image_2/000172_10.png
training/image_2/000173_10.png
training/image_2/000174_10.png
training/image_2/000175_10.png
training/image_2/000176_10.png
training/image_2/000177_10.png
training/image_2/000178_10.png
training/image_2/000179_10.png
training/image_2/000180_10.png
training/image_2/000181_10.png
training/image_2/000182_10.png
training/image_2/000183_10.png
training/image_2/000184_10.png
training/image_2/000185_10.png
training/image_2/000186_10.png
training/image_2/000187_10.png
training/image_2/000188_10.png
training/image_2/000189_10.png
training/image_2/000190_10.png
training/image_2/000191_10.png
training/image_2/000192_10.png
training/image_2/000193_10.png
training/image_2/000194_10.png
training/image_2/000195_10.png
training/image_2/000196_10.png
training/image_2/000197_10.png
training/image_2/000198_10.png
training/image_2/000199_10.png
2011_09_26_drive_0117
2011_09_28_drive_0002
2011_09_26_drive_0052
2011_09_30_drive_0016
2011_09_26_drive_0059
2011_09_26_drive_0027
2011_09_26_drive_0020
2011_09_26_drive_0009
2011_09_26_drive_0013
2011_09_26_drive_0101
2011_09_26_drive_0046
2011_09_26_drive_0029
2011_09_26_drive_0064
2011_09_26_drive_0048
2011_10_03_drive_0027
2011_09_26_drive_0002
2011_09_26_drive_0036
2011_09_29_drive_0071
2011_10_03_drive_0047
2011_09_30_drive_0027
2011_09_26_drive_0086
2011_09_26_drive_0084
2011_09_26_drive_0096
2011_09_30_drive_0018
2011_09_26_drive_0106
2011_09_26_drive_0056
2011_09_26_drive_0023
2011_09_26_drive_0093
2011_09_26_drive_0005
2011_09_26_drive_0009
2011_09_26_drive_0011
2011_09_26_drive_0013
2011_09_26_drive_0014
2011_09_26_drive_0015
2011_09_26_drive_0017
2011_09_26_drive_0018
2011_09_26_drive_0019
2011_09_26_drive_0022
2011_09_26_drive_0027
2011_09_26_drive_0028
2011_09_26_drive_0029
2011_09_26_drive_0032
2011_09_26_drive_0036
2011_09_26_drive_0046
2011_09_26_drive_0051
2011_09_26_drive_0056
2011_09_26_drive_0057
2011_09_26_drive_0059
2011_09_26_drive_0070
2011_09_26_drive_0084
2011_09_26_drive_0096
2011_09_26_drive_0101
2011_09_26_drive_0104
2011_09_28_drive_0002
2011_09_29_drive_0004
2011_09_29_drive_0071
2011_10_03_drive_0047
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Generates depth estimates for an entire KITTI video."""
# Example usage:
#
# python inference.py \
# --logtostderr \
# --kitti_dir ~/vid2depth/kitti-raw-uncompressed \
# --kitti_video 2011_09_26/2011_09_26_drive_0009_sync \
# --output_dir ~/vid2depth/inference \
# --model_ckpt ~/vid2depth/trained-model/model-119496
#
# python inference.py \
# --logtostderr \
# --kitti_dir ~/vid2depth/kitti-raw-uncompressed \
# --kitti_video test_files_eigen \
# --output_dir ~/vid2depth/inference \
# --model_ckpt ~/vid2depth/trained-model/model-119496
#
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl import app
from absl import flags
from absl import logging
import matplotlib.pyplot as plt
import model
import numpy as np
import scipy.misc
import tensorflow as tf
import util
gfile = tf.gfile
HOME_DIR = os.path.expanduser('~')
DEFAULT_OUTPUT_DIR = os.path.join(HOME_DIR, 'vid2depth/inference')
DEFAULT_KITTI_DIR = os.path.join(HOME_DIR, 'kitti-raw-uncompressed')
flags.DEFINE_string('output_dir', DEFAULT_OUTPUT_DIR,
'Directory to store estimated depth maps.')
flags.DEFINE_string('kitti_dir', DEFAULT_KITTI_DIR, 'KITTI dataset directory.')
flags.DEFINE_string('model_ckpt', None, 'Model checkpoint to load.')
flags.DEFINE_string('kitti_video', None, 'KITTI video directory name.')
flags.DEFINE_integer('batch_size', 4, 'The size of a sample batch.')
flags.DEFINE_integer('img_height', 128, 'Image height.')
flags.DEFINE_integer('img_width', 416, 'Image width.')
flags.DEFINE_integer('seq_length', 3, 'Sequence length for each example.')
FLAGS = flags.FLAGS
flags.mark_flag_as_required('kitti_video')
flags.mark_flag_as_required('model_ckpt')
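# Matplotlib colormap used to render depth maps for display.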
CMAP = 'plasma'
def _run_inference():
"""Runs all images through depth model and saves depth maps."""
ckpt_basename = os.path.basename(FLAGS.model_ckpt)
ckpt_modelname = os.path.basename(os.path.dirname(FLAGS.model_ckpt))
output_dir = os.path.join(FLAGS.output_dir,
FLAGS.kitti_video.replace('/', '_') + '_' +
ckpt_modelname + '_' + ckpt_basename)
if not gfile.Exists(output_dir):
gfile.MakeDirs(output_dir)
inference_model = model.Model(is_training=False,
seq_length=FLAGS.seq_length,
batch_size=FLAGS.batch_size,
img_height=FLAGS.img_height,
img_width=FLAGS.img_width)
vars_to_restore = util.get_vars_to_restore(FLAGS.model_ckpt)
saver = tf.train.Saver(vars_to_restore)
sv = tf.train.Supervisor(logdir='/tmp/', saver=None)
with sv.managed_session() as sess:
saver.restore(sess, FLAGS.model_ckpt)
if FLAGS.kitti_video == 'test_files_eigen':
im_files = util.read_text_lines(
util.get_resource_path('dataset/kitti/test_files_eigen.txt'))
im_files = [os.path.join(FLAGS.kitti_dir, f) for f in im_files]
else:
video_path = os.path.join(FLAGS.kitti_dir, FLAGS.kitti_video)
im_files = gfile.Glob(os.path.join(video_path, 'image_02/data', '*.png'))
im_files = [f for f in im_files if 'disp' not in f]
im_files = sorted(im_files)
for i in range(0, len(im_files), FLAGS.batch_size):
if i % 100 == 0:
logging.info('Generating from %s: %d/%d', ckpt_basename, i,
len(im_files))
inputs = np.zeros(
(FLAGS.batch_size, FLAGS.img_height, FLAGS.img_width, 3),
dtype=np.uint8)
for b in range(FLAGS.batch_size):
idx = i + b
if idx >= len(im_files):
break
im = scipy.misc.imread(im_files[idx])
inputs[b] = scipy.misc.imresize(im, (FLAGS.img_height, FLAGS.img_width))
results = inference_model.inference(inputs, sess, mode='depth')
for b in range(FLAGS.batch_size):
idx = i + b
if idx >= len(im_files):
break
if FLAGS.kitti_video == 'test_files_eigen':
depth_path = os.path.join(output_dir, '%03d.png' % idx)
else:
depth_path = os.path.join(output_dir, '%04d.png' % idx)
depth_map = results['depth'][b]
depth_map = np.squeeze(depth_map)
colored_map = _normalize_depth_for_display(depth_map, cmap=CMAP)
input_float = inputs[b].astype(np.float32) / 255.0
vertical_stack = np.concatenate((input_float, colored_map), axis=0)
scipy.misc.imsave(depth_path, vertical_stack)
def _gray2rgb(im, cmap=CMAP):
cmap = plt.get_cmap(cmap)
rgba_img = cmap(im.astype(np.float32))
rgb_img = np.delete(rgba_img, 3, 2)
return rgb_img
def _normalize_depth_for_display(depth,
pc=95,
crop_percent=0,
normalizer=None,
cmap=CMAP):
"""Converts a depth map to an RGB image."""
# Convert to disparity.
disp = 1.0 / (depth + 1e-6)
if normalizer is not None:
disp /= normalizer
else:
disp /= (np.percentile(disp, pc) + 1e-6)
disp = np.clip(disp, 0, 1)
disp = _gray2rgb(disp, cmap=cmap)
keep_h = int(disp.shape[0] * (1 - crop_percent))
disp = disp[:keep_h]
return disp
def main(_):
_run_inference()
if __name__ == '__main__':
app.run(main)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Build model for inference or training."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import logging
import nets
from ops import icp_grad # pylint: disable=unused-import
from ops.icp_op import icp
import project
import reader
import tensorflow as tf
import util
gfile = tf.gfile
slim = tf.contrib.slim
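# Number of scales in the multi-scale disparity/depth pyramid used by the losses.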
NUM_SCALES = 4
class Model(object):
"""Model code from SfMLearner."""
def __init__(self,
data_dir=None,
is_training=True,
learning_rate=0.0002,
beta1=0.9,
reconstr_weight=0.85,
smooth_weight=0.05,
ssim_weight=0.15,
icp_weight=0.0,
batch_size=4,
img_height=128,
img_width=416,
seq_length=3,
legacy_mode=False):
self.data_dir = data_dir
self.is_training = is_training
self.learning_rate = learning_rate
self.reconstr_weight = reconstr_weight
self.smooth_weight = smooth_weight
self.ssim_weight = ssim_weight
self.icp_weight = icp_weight
self.beta1 = beta1
self.batch_size = batch_size
self.img_height = img_height
self.img_width = img_width
self.seq_length = seq_length
self.legacy_mode = legacy_mode
logging.info('data_dir: %s', data_dir)
logging.info('learning_rate: %s', learning_rate)
logging.info('beta1: %s', beta1)
logging.info('smooth_weight: %s', smooth_weight)
logging.info('ssim_weight: %s', ssim_weight)
logging.info('icp_weight: %s', icp_weight)
logging.info('batch_size: %s', batch_size)
logging.info('img_height: %s', img_height)
logging.info('img_width: %s', img_width)
logging.info('seq_length: %s', seq_length)
logging.info('legacy_mode: %s', legacy_mode)
if self.is_training:
self.reader = reader.DataReader(self.data_dir, self.batch_size,
self.img_height, self.img_width,
self.seq_length, NUM_SCALES)
self.build_train_graph()
else:
self.build_depth_test_graph()
self.build_egomotion_test_graph()
# At this point, the model is ready. Print some info on model params.
util.count_parameters()
def build_train_graph(self):
self.build_inference_for_training()
self.build_loss()
self.build_train_op()
self.build_summaries()
def build_inference_for_training(self):
"""Invokes depth and ego-motion networks and computes clouds if needed."""
(self.image_stack, self.intrinsic_mat, self.intrinsic_mat_inv) = (
self.reader.read_data())
with tf.name_scope('egomotion_prediction'):
self.egomotion, _ = nets.egomotion_net(self.image_stack, is_training=True,
legacy_mode=self.legacy_mode)
with tf.variable_scope('depth_prediction'):
# Organized by ...[i][scale]. Note that the indexing order is flipped in
# the variables in build_loss() below.
self.disp = {}
self.depth = {}
if self.icp_weight > 0:
self.cloud = {}
for i in range(self.seq_length):
image = self.image_stack[:, :, :, 3 * i:3 * (i + 1)]
multiscale_disps_i, _ = nets.disp_net(image, is_training=True)
multiscale_depths_i = [1.0 / d for d in multiscale_disps_i]
self.disp[i] = multiscale_disps_i
self.depth[i] = multiscale_depths_i
if self.icp_weight > 0:
multiscale_clouds_i = [
project.get_cloud(d,
self.intrinsic_mat_inv[:, s, :, :],
name='cloud%d_%d' % (s, i))
for (s, d) in enumerate(multiscale_depths_i)
]
self.cloud[i] = multiscale_clouds_i
# Reuse the same depth graph for all images.
tf.get_variable_scope().reuse_variables()
logging.info('disp: %s', util.info(self.disp))
def build_loss(self):
"""Adds ops for computing loss."""
with tf.name_scope('compute_loss'):
self.reconstr_loss = 0
self.smooth_loss = 0
self.ssim_loss = 0
self.icp_transform_loss = 0
self.icp_residual_loss = 0
# self.images is organized by ...[scale][B, h, w, seq_len * 3].
self.images = [{} for _ in range(NUM_SCALES)]
# Following nested lists are organized by ...[scale][source-target].
self.warped_image = [{} for _ in range(NUM_SCALES)]
self.warp_mask = [{} for _ in range(NUM_SCALES)]
self.warp_error = [{} for _ in range(NUM_SCALES)]
self.ssim_error = [{} for _ in range(NUM_SCALES)]
self.icp_transform = [{} for _ in range(NUM_SCALES)]
self.icp_residual = [{} for _ in range(NUM_SCALES)]
self.middle_frame_index = util.get_seq_middle(self.seq_length)
# Compute losses at each scale.
for s in range(NUM_SCALES):
# Scale image stack.
height_s = int(self.img_height / (2**s))
width_s = int(self.img_width / (2**s))
self.images[s] = tf.image.resize_area(self.image_stack,
[height_s, width_s])
# Smoothness.
if self.smooth_weight > 0:
for i in range(self.seq_length):
# In legacy mode, use the depth map from the middle frame only.
if not self.legacy_mode or i == self.middle_frame_index:
self.smooth_loss += 1.0 / (2**s) * self.depth_smoothness(
self.disp[i][s], self.images[s][:, :, :, 3 * i:3 * (i + 1)])
for i in range(self.seq_length):
for j in range(self.seq_length):
# Only consider adjacent frames.
if i == j or abs(i - j) != 1:
continue
# In legacy mode, only consider the middle frame as target.
if self.legacy_mode and j != self.middle_frame_index:
continue
source = self.images[s][:, :, :, 3 * i:3 * (i + 1)]
target = self.images[s][:, :, :, 3 * j:3 * (j + 1)]
target_depth = self.depth[j][s]
key = '%d-%d' % (i, j)
# Extract ego-motion from i to j
egomotion_index = min(i, j)
egomotion_mult = 1
if i > j:
# Need to invert the egomotion when going back in the sequence.
egomotion_mult *= -1
# For compatibility with SfMLearner, interpret all egomotion vectors
# as pointing toward the middle frame. Note that unlike SfMLearner,
# each vector captures the motion to/from its next frame, and not
# the center frame. Although with seq_length == 3, there is no
# difference.
if self.legacy_mode:
if egomotion_index >= self.middle_frame_index:
egomotion_mult *= -1
egomotion = egomotion_mult * self.egomotion[:, egomotion_index, :]
# Inverse warp the source image to the target image frame for
# photometric consistency loss.
self.warped_image[s][key], self.warp_mask[s][key] = (
project.inverse_warp(source,
target_depth,
egomotion,
self.intrinsic_mat[:, s, :, :],
self.intrinsic_mat_inv[:, s, :, :]))
# Reconstruction loss.
self.warp_error[s][key] = tf.abs(self.warped_image[s][key] - target)
self.reconstr_loss += tf.reduce_mean(
self.warp_error[s][key] * self.warp_mask[s][key])
# SSIM.
if self.ssim_weight > 0:
self.ssim_error[s][key] = self.ssim(self.warped_image[s][key],
target)
# TODO(rezama): This should be min_pool2d().
ssim_mask = slim.avg_pool2d(self.warp_mask[s][key], 3, 1, 'VALID')
self.ssim_loss += tf.reduce_mean(
self.ssim_error[s][key] * ssim_mask)
# 3D loss.
if self.icp_weight > 0:
cloud_a = self.cloud[j][s]
cloud_b = self.cloud[i][s]
self.icp_transform[s][key], self.icp_residual[s][key] = icp(
cloud_a, egomotion, cloud_b)
self.icp_transform_loss += 1.0 / (2**s) * tf.reduce_mean(
tf.abs(self.icp_transform[s][key]))
self.icp_residual_loss += 1.0 / (2**s) * tf.reduce_mean(
tf.abs(self.icp_residual[s][key]))
self.total_loss = self.reconstr_weight * self.reconstr_loss
if self.smooth_weight > 0:
self.total_loss += self.smooth_weight * self.smooth_loss
if self.ssim_weight > 0:
self.total_loss += self.ssim_weight * self.ssim_loss
if self.icp_weight > 0:
self.total_loss += self.icp_weight * (self.icp_transform_loss +
self.icp_residual_loss)
def gradient_x(self, img):
return img[:, :, :-1, :] - img[:, :, 1:, :]
def gradient_y(self, img):
return img[:, :-1, :, :] - img[:, 1:, :, :]
def depth_smoothness(self, depth, img):
"""Computes image-aware depth smoothness loss."""
depth_dx = self.gradient_x(depth)
depth_dy = self.gradient_y(depth)
image_dx = self.gradient_x(img)
image_dy = self.gradient_y(img)
weights_x = tf.exp(-tf.reduce_mean(tf.abs(image_dx), 3, keepdims=True))
weights_y = tf.exp(-tf.reduce_mean(tf.abs(image_dy), 3, keepdims=True))
smoothness_x = depth_dx * weights_x
smoothness_y = depth_dy * weights_y
return tf.reduce_mean(abs(smoothness_x)) + tf.reduce_mean(abs(smoothness_y))
def ssim(self, x, y):
"""Computes a differentiable structured image similarity measure."""
c1 = 0.01**2
c2 = 0.03**2
mu_x = slim.avg_pool2d(x, 3, 1, 'VALID')
mu_y = slim.avg_pool2d(y, 3, 1, 'VALID')
sigma_x = slim.avg_pool2d(x**2, 3, 1, 'VALID') - mu_x**2
sigma_y = slim.avg_pool2d(y**2, 3, 1, 'VALID') - mu_y**2
sigma_xy = slim.avg_pool2d(x * y, 3, 1, 'VALID') - mu_x * mu_y
ssim_n = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
ssim_d = (mu_x**2 + mu_y**2 + c1) * (sigma_x + sigma_y + c2)
ssim = ssim_n / ssim_d
return tf.clip_by_value((1 - ssim) / 2, 0, 1)
def build_train_op(self):
with tf.name_scope('train_op'):
optim = tf.train.AdamOptimizer(self.learning_rate, self.beta1)
self.train_op = slim.learning.create_train_op(self.total_loss, optim)
self.global_step = tf.Variable(0, name='global_step', trainable=False)
self.incr_global_step = tf.assign(self.global_step, self.global_step + 1)
def build_summaries(self):
"""Adds scalar and image summaries for TensorBoard."""
tf.summary.scalar('total_loss', self.total_loss)
tf.summary.scalar('reconstr_loss', self.reconstr_loss)
if self.smooth_weight > 0:
tf.summary.scalar('smooth_loss', self.smooth_loss)
if self.ssim_weight > 0:
tf.summary.scalar('ssim_loss', self.ssim_loss)
if self.icp_weight > 0:
tf.summary.scalar('icp_transform_loss', self.icp_transform_loss)
tf.summary.scalar('icp_residual_loss', self.icp_residual_loss)
for i in range(self.seq_length - 1):
tf.summary.histogram('tx%d' % i, self.egomotion[:, i, 0])
tf.summary.histogram('ty%d' % i, self.egomotion[:, i, 1])
tf.summary.histogram('tz%d' % i, self.egomotion[:, i, 2])
tf.summary.histogram('rx%d' % i, self.egomotion[:, i, 3])
tf.summary.histogram('ry%d' % i, self.egomotion[:, i, 4])
tf.summary.histogram('rz%d' % i, self.egomotion[:, i, 5])
for s in range(NUM_SCALES):
for i in range(self.seq_length):
tf.summary.image('scale%d_image%d' % (s, i),
self.images[s][:, :, :, 3 * i:3 * (i + 1)])
if i in self.depth:
tf.summary.histogram('scale%d_depth%d' % (s, i), self.depth[i][s])
tf.summary.histogram('scale%d_disp%d' % (s, i), self.disp[i][s])
tf.summary.image('scale%d_disparity%d' % (s, i), self.disp[i][s])
for key in self.warped_image[s]:
tf.summary.image('scale%d_warped_image%s' % (s, key),
self.warped_image[s][key])
tf.summary.image('scale%d_warp_mask%s' % (s, key),
self.warp_mask[s][key])
tf.summary.image('scale%d_warp_error%s' % (s, key),
self.warp_error[s][key])
if self.ssim_weight > 0:
tf.summary.image('scale%d_ssim_error%s' % (s, key),
self.ssim_error[s][key])
if self.icp_weight > 0:
tf.summary.image('scale%d_icp_residual%s' % (s, key),
self.icp_residual[s][key])
transform = self.icp_transform[s][key]
tf.summary.histogram('scale%d_icp_tx%s' % (s, key), transform[:, 0])
tf.summary.histogram('scale%d_icp_ty%s' % (s, key), transform[:, 1])
tf.summary.histogram('scale%d_icp_tz%s' % (s, key), transform[:, 2])
tf.summary.histogram('scale%d_icp_rx%s' % (s, key), transform[:, 3])
tf.summary.histogram('scale%d_icp_ry%s' % (s, key), transform[:, 4])
tf.summary.histogram('scale%d_icp_rz%s' % (s, key), transform[:, 5])
def build_depth_test_graph(self):
"""Builds depth model reading from placeholders."""
with tf.name_scope('depth_prediction'):
with tf.variable_scope('depth_prediction'):
input_uint8 = tf.placeholder(
tf.uint8, [self.batch_size, self.img_height, self.img_width, 3],
name='raw_input')
input_float = tf.image.convert_image_dtype(input_uint8, tf.float32)
# TODO(rezama): Retrain published model with batchnorm params and set
# is_training to False.
est_disp, _ = nets.disp_net(input_float, is_training=True)
est_depth = 1.0 / est_disp[0]
self.inputs_depth = input_uint8
self.est_depth = est_depth
def build_egomotion_test_graph(self):
"""Builds egomotion model reading from placeholders."""
input_uint8 = tf.placeholder(
tf.uint8,
[self.batch_size, self.img_height, self.img_width * self.seq_length, 3],
name='raw_input')
input_float = tf.image.convert_image_dtype(input_uint8, tf.float32)
image_seq = input_float
image_stack = self.unpack_image_batches(image_seq)
with tf.name_scope('egomotion_prediction'):
# TODO(rezama): Retrain published model with batchnorm params and set
# is_training to False.
egomotion, _ = nets.egomotion_net(image_stack, is_training=True,
legacy_mode=self.legacy_mode)
self.inputs_egomotion = input_uint8
self.est_egomotion = egomotion
def unpack_image_batches(self, image_seq):
"""[B, h, w * seq_length, 3] -> [B, h, w, 3 * seq_length]."""
with tf.name_scope('unpack_images'):
image_list = [
image_seq[:, :, i * self.img_width:(i + 1) * self.img_width, :]
for i in range(self.seq_length)
]
image_stack = tf.concat(image_list, axis=3)
image_stack.set_shape([
self.batch_size, self.img_height, self.img_width, self.seq_length * 3
])
return image_stack
def inference(self, inputs, sess, mode):
"""Runs depth or egomotion inference from placeholders."""
fetches = {}
if mode == 'depth':
fetches['depth'] = self.est_depth
inputs_ph = self.inputs_depth
if mode == 'egomotion':
fetches['egomotion'] = self.est_egomotion
inputs_ph = self.inputs_egomotion
results = sess.run(fetches, feed_dict={inputs_ph: inputs})
return results
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Depth and Ego-Motion networks."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import flags
import numpy as np
import tensorflow as tf
import util
slim = tf.contrib.slim
# TODO(rezama): Move flag to main, pass as argument to functions.
flags.DEFINE_bool('use_bn', True, 'Add batch norm layers.')
FLAGS = flags.FLAGS
# Weight regularization.
WEIGHT_REG = 0.05
# Disparity (inverse depth) values range from 0.01 to 10.
DISP_SCALING = 10
MIN_DISP = 0.01
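# Each ego-motion vector has 6 components: 3 for translation and 3 for rotation.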
EGOMOTION_VEC_SIZE = 6
def egomotion_net(image_stack, is_training=True, legacy_mode=False):
"""Predict ego-motion vectors from a stack of frames.
Args:
image_stack: Input tensor with shape [B, h, w, seq_length * 3]. Regardless
of the value of legacy_mode, the input image sequence passed to the
function should be in normal order, e.g. [1, 2, 3].
is_training: Whether the model is being trained or not.
legacy_mode: Setting legacy_mode to True enables compatibility with
SfMLearner checkpoints. When legacy_mode is on, egomotion_net()
rearranges the input tensor to place the target (middle) frame first in
sequence. This is the arrangement of inputs that legacy models have
received during training. In legacy mode, the client program
(model.Model.build_loss()) interprets the outputs of this network
differently as well. For example:
When legacy_mode == True,
Network inputs will be [2, 1, 3]
Network outputs will be [1 -> 2, 3 -> 2]
When legacy_mode == False,
Network inputs will be [1, 2, 3]
Network outputs will be [1 -> 2, 2 -> 3]
Returns:
Egomotion vectors with shape [B, seq_length - 1, 6].
"""
seq_length = image_stack.get_shape()[3].value // 3 # 3 == RGB.
if legacy_mode:
# Put the target frame at the beginning of stack.
with tf.name_scope('rearrange_stack'):
mid_index = util.get_seq_middle(seq_length)
left_subset = image_stack[:, :, :, :mid_index * 3]
target_frame = image_stack[:, :, :, mid_index * 3:(mid_index + 1) * 3]
right_subset = image_stack[:, :, :, (mid_index + 1) * 3:]
image_stack = tf.concat([target_frame, left_subset, right_subset], axis=3)
batch_norm_params = {'is_training': is_training}
num_egomotion_vecs = seq_length - 1
with tf.variable_scope('pose_exp_net') as sc:
end_points_collection = sc.original_name_scope + '_end_points'
normalizer_fn = slim.batch_norm if FLAGS.use_bn else None
normalizer_params = batch_norm_params if FLAGS.use_bn else None
with slim.arg_scope([slim.conv2d, slim.conv2d_transpose],
normalizer_fn=normalizer_fn,
weights_regularizer=slim.l2_regularizer(WEIGHT_REG),
normalizer_params=normalizer_params,
activation_fn=tf.nn.relu,
outputs_collections=end_points_collection):
cnv1 = slim.conv2d(image_stack, 16, [7, 7], stride=2, scope='cnv1')
cnv2 = slim.conv2d(cnv1, 32, [5, 5], stride=2, scope='cnv2')
cnv3 = slim.conv2d(cnv2, 64, [3, 3], stride=2, scope='cnv3')
cnv4 = slim.conv2d(cnv3, 128, [3, 3], stride=2, scope='cnv4')
cnv5 = slim.conv2d(cnv4, 256, [3, 3], stride=2, scope='cnv5')
# Ego-motion specific layers
with tf.variable_scope('pose'):
cnv6 = slim.conv2d(cnv5, 256, [3, 3], stride=2, scope='cnv6')
cnv7 = slim.conv2d(cnv6, 256, [3, 3], stride=2, scope='cnv7')
pred_channels = EGOMOTION_VEC_SIZE * num_egomotion_vecs
egomotion_pred = slim.conv2d(cnv7,
pred_channels,
[1, 1],
scope='pred',
stride=1,
normalizer_fn=None,
activation_fn=None)
egomotion_avg = tf.reduce_mean(egomotion_pred, [1, 2])
# Tinghui found that scaling by a small constant facilitates training.
egomotion_final = 0.01 * tf.reshape(
egomotion_avg, [-1, num_egomotion_vecs, EGOMOTION_VEC_SIZE])
end_points = slim.utils.convert_collection_to_dict(end_points_collection)
return egomotion_final, end_points
def disp_net(target_image, is_training=True):
"""Predict inverse of depth from a single image."""
batch_norm_params = {'is_training': is_training}
h = target_image.get_shape()[1].value
w = target_image.get_shape()[2].value
inputs = target_image
with tf.variable_scope('depth_net') as sc:
end_points_collection = sc.original_name_scope + '_end_points'
normalizer_fn = slim.batch_norm if FLAGS.use_bn else None
normalizer_params = batch_norm_params if FLAGS.use_bn else None
with slim.arg_scope([slim.conv2d, slim.conv2d_transpose],
normalizer_fn=normalizer_fn,
normalizer_params=normalizer_params,
weights_regularizer=slim.l2_regularizer(WEIGHT_REG),
activation_fn=tf.nn.relu,
outputs_collections=end_points_collection):
cnv1 = slim.conv2d(inputs, 32, [7, 7], stride=2, scope='cnv1')
cnv1b = slim.conv2d(cnv1, 32, [7, 7], stride=1, scope='cnv1b')
cnv2 = slim.conv2d(cnv1b, 64, [5, 5], stride=2, scope='cnv2')
cnv2b = slim.conv2d(cnv2, 64, [5, 5], stride=1, scope='cnv2b')
cnv3 = slim.conv2d(cnv2b, 128, [3, 3], stride=2, scope='cnv3')
cnv3b = slim.conv2d(cnv3, 128, [3, 3], stride=1, scope='cnv3b')
cnv4 = slim.conv2d(cnv3b, 256, [3, 3], stride=2, scope='cnv4')
cnv4b = slim.conv2d(cnv4, 256, [3, 3], stride=1, scope='cnv4b')
cnv5 = slim.conv2d(cnv4b, 512, [3, 3], stride=2, scope='cnv5')
cnv5b = slim.conv2d(cnv5, 512, [3, 3], stride=1, scope='cnv5b')
cnv6 = slim.conv2d(cnv5b, 512, [3, 3], stride=2, scope='cnv6')
cnv6b = slim.conv2d(cnv6, 512, [3, 3], stride=1, scope='cnv6b')
cnv7 = slim.conv2d(cnv6b, 512, [3, 3], stride=2, scope='cnv7')
cnv7b = slim.conv2d(cnv7, 512, [3, 3], stride=1, scope='cnv7b')
up7 = slim.conv2d_transpose(cnv7b, 512, [3, 3], stride=2, scope='upcnv7')
# There might be a dimension mismatch due to uneven down/up-sampling.
up7 = _resize_like(up7, cnv6b)
i7_in = tf.concat([up7, cnv6b], axis=3)
icnv7 = slim.conv2d(i7_in, 512, [3, 3], stride=1, scope='icnv7')
up6 = slim.conv2d_transpose(icnv7, 512, [3, 3], stride=2, scope='upcnv6')
up6 = _resize_like(up6, cnv5b)
i6_in = tf.concat([up6, cnv5b], axis=3)
icnv6 = slim.conv2d(i6_in, 512, [3, 3], stride=1, scope='icnv6')
up5 = slim.conv2d_transpose(icnv6, 256, [3, 3], stride=2, scope='upcnv5')
up5 = _resize_like(up5, cnv4b)
i5_in = tf.concat([up5, cnv4b], axis=3)
icnv5 = slim.conv2d(i5_in, 256, [3, 3], stride=1, scope='icnv5')
up4 = slim.conv2d_transpose(icnv5, 128, [3, 3], stride=2, scope='upcnv4')
i4_in = tf.concat([up4, cnv3b], axis=3)
icnv4 = slim.conv2d(i4_in, 128, [3, 3], stride=1, scope='icnv4')
disp4 = (slim.conv2d(icnv4, 1, [3, 3], stride=1, activation_fn=tf.sigmoid,
normalizer_fn=None, scope='disp4')
* DISP_SCALING + MIN_DISP)
disp4_up = tf.image.resize_bilinear(disp4, [np.int(h / 4), np.int(w / 4)])
up3 = slim.conv2d_transpose(icnv4, 64, [3, 3], stride=2, scope='upcnv3')
i3_in = tf.concat([up3, cnv2b, disp4_up], axis=3)
icnv3 = slim.conv2d(i3_in, 64, [3, 3], stride=1, scope='icnv3')
disp3 = (slim.conv2d(icnv3, 1, [3, 3], stride=1, activation_fn=tf.sigmoid,
normalizer_fn=None, scope='disp3')
* DISP_SCALING + MIN_DISP)
disp3_up = tf.image.resize_bilinear(disp3, [np.int(h / 2), np.int(w / 2)])
up2 = slim.conv2d_transpose(icnv3, 32, [3, 3], stride=2, scope='upcnv2')
i2_in = tf.concat([up2, cnv1b, disp3_up], axis=3)
icnv2 = slim.conv2d(i2_in, 32, [3, 3], stride=1, scope='icnv2')
disp2 = (slim.conv2d(icnv2, 1, [3, 3], stride=1, activation_fn=tf.sigmoid,
normalizer_fn=None, scope='disp2')
* DISP_SCALING + MIN_DISP)
disp2_up = tf.image.resize_bilinear(disp2, [h, w])
up1 = slim.conv2d_transpose(icnv2, 16, [3, 3], stride=2, scope='upcnv1')
i1_in = tf.concat([up1, disp2_up], axis=3)
icnv1 = slim.conv2d(i1_in, 16, [3, 3], stride=1, scope='icnv1')
disp1 = (slim.conv2d(icnv1, 1, [3, 3], stride=1, activation_fn=tf.sigmoid,
normalizer_fn=None, scope='disp1')
* DISP_SCALING + MIN_DISP)
end_points = slim.utils.convert_collection_to_dict(end_points_collection)
return [disp1, disp2, disp3, disp4], end_points
def _resize_like(inputs, ref):
i_h, i_w = inputs.get_shape()[1], inputs.get_shape()[2]
r_h, r_w = ref.get_shape()[1], ref.get_shape()[2]
if i_h == r_h and i_w == r_w:
return inputs
else:
return tf.image.resize_nearest_neighbor(inputs, [r_h.value, r_w.value])
load("@org_tensorflow//tensorflow:tensorflow.bzl", "tf_custom_op_library")
package(default_visibility = ["//visibility:public"])
filegroup(
name = "test_data",
srcs = glob(["testdata/**"]),
)
cc_library(
name = "icp_op_kernel",
srcs = ["icp_op_kernel.cc"],
copts = [
"-fexceptions",
"-Wno-sign-compare",
"-D_GLIBCXX_USE_CXX11_ABI=0",
],
deps = [
"@com_github_pointcloudlibrary_pcl//:common",
"@com_github_pointcloudlibrary_pcl//:registration",
"@com_google_protobuf//:protobuf",
"@org_tensorflow//tensorflow/core:framework_headers_lib",
],
)
tf_custom_op_library(
name = "icp_op.so",
linkopts = ["-llz4"],
deps = [
":icp_op_kernel",
],
)
py_library(
name = "icp_op",
srcs = ["icp_op.py"],
data = [
":icp_op.so",
],
srcs_version = "PY2AND3",
deps = [
],
)
py_library(
name = "icp_util",
srcs = ["icp_util.py"],
data = [":test_data"],
srcs_version = "PY2AND3",
deps = [
"@org_tensorflow//tensorflow:tensorflow_py",
],
)
py_library(
name = "icp_grad",
srcs = ["icp_grad.py"],
data = [
":icp_op.so",
],
srcs_version = "PY2AND3",
deps = [
"@org_tensorflow//tensorflow:tensorflow_py",
":icp_op",
],
)
cc_binary(
name = "pcl_demo",
srcs = ["pcl_demo.cc"],
deps = [
"@com_github_pointcloudlibrary_pcl//:common",
"@com_github_pointcloudlibrary_pcl//:registration",
],
)
py_binary(
name = "icp_train_demo",
srcs = ["icp_train_demo.py"],
data = [
":icp_op.so",
],
srcs_version = "PY2AND3",
deps = [
"@org_tensorflow//tensorflow:tensorflow_py",
":icp_op",
":icp_grad",
":icp_util",
],
)
py_test(
name = "icp_test",
size = "small",
srcs = ["icp_test.py"],
data = [
":icp_op.so",
],
srcs_version = "PY2AND3",
deps = [
"@org_tensorflow//tensorflow:tensorflow_py",
":icp_op",
":icp_util",
],
)
py_test(
name = "icp_grad_test",
size = "small",
srcs = ["icp_grad_test.py"],
data = [
":icp_op.so",
],
srcs_version = "PY2AND3",
deps = [
"@org_tensorflow//tensorflow:tensorflow_py",
":icp_op",
":icp_grad",
":icp_test",
],
)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""The gradient of the icp op."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from tensorflow.python.framework import ops
@ops.RegisterGradient('Icp')
def _icp_grad(op, grad_transform, grad_residual):
"""The gradients for `icp`.
Args:
op: The `icp` `Operation` that we are differentiating, which we can use
to find the inputs and outputs of the original op.
grad_transform: Gradient with respect to `transform` output of the `icp` op.
grad_residual: Gradient with respect to `residual` output of the
`icp` op.
Returns:
Gradients with respect to the inputs of `icp`.
"""
unused_transform = op.outputs[0]
unused_residual = op.outputs[1]
unused_source = op.inputs[0]
unused_ego_motion = op.inputs[1]
unused_target = op.inputs[2]
grad_p = -grad_residual
grad_ego_motion = -grad_transform
return [grad_p, grad_ego_motion, None]