Commit 5b9d9097 authored by Saurabh Gupta

Implementation for Cognitive Mapping and Planning paper.

parent c136af63
# Cognitive Mapping and Planning for Visual Navigation
**Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik**
**Computer Vision and Pattern Recognition (CVPR) 2017.**
**[ArXiv](https://arxiv.org/abs/1702.03920),
[Project Website](https://sites.google.com/corp/view/cognitive-mapping-and-planning/)**
### Citing
If you find this code base and models useful in your research, please consider
citing the following paper:
```
@inproceedings{gupta2017cognitive,
title={Cognitive Mapping and Planning for Visual Navigation},
author={Gupta, Saurabh and Davidson, James and Levine, Sergey and
Sukthankar, Rahul and Malik, Jitendra},
booktitle={CVPR},
year={2017}
}
```
### Contents
1. [Requirements: software](#requirements-software)
2. [Requirements: data](#requirements-data)
3. [Test Pre-trained Models](#test-pre-trained-models)
4. [Train Your Own Models](#train-your-own-models)
### Requirements: software
1. Python Virtual Env Setup: All code is implemented in Python but depends on a
small number of Python packages and a couple of C libraries. We recommend
using a virtual environment to install these Python packages and the Python
bindings for these C libraries.
```Shell
VENV_DIR=venv
pip install virtualenv
virtualenv $VENV_DIR
source $VENV_DIR/bin/activate
# You may need to upgrade pip for installing opencv-python.
pip install --upgrade pip
# Install simple dependencies.
pip install -r requirements.txt
# Patch bugs in dependencies.
sh patches/apply_patches.sh
```
2. Install [Tensorflow](https://www.tensorflow.org/) inside this virtual
environment. Typically done with `pip install --upgrade tensorflow-gpu`.
3. Swiftshader: We use
[Swiftshader](https://github.com/google/swiftshader.git), a CPU-based
renderer, to render the meshes. It is possible to use other renderers;
replace `SwiftshaderRenderer` in `render/swiftshader_renderer.py` with
bindings to your renderer.
```Shell
mkdir -p deps
git clone --recursive https://github.com/google/swiftshader.git deps/swiftshader-src
cd deps/swiftshader-src && git checkout 91da6b00584afd7dcaed66da88e2b617429b3950
mkdir build && cd build && cmake .. && make -j 16 libEGL libGLESv2
cd ../../../
cp deps/swiftshader-src/build/libEGL* libEGL.so.1
cp deps/swiftshader-src/build/libGLESv2* libGLESv2.so.2
```
4. PyAssimp: We use [PyAssimp](https://github.com/assimp/assimp.git) to load
meshes. It is possible to use other libraries to load meshes; replace
`Shape` in `render/swiftshader_renderer.py` with bindings to your library
for loading meshes.
```Shell
mkdir -p deps
git clone https://github.com/assimp/assimp.git deps/assimp-src
cd deps/assimp-src
git checkout 2afeddd5cb63d14bc77b53740b38a54a97d94ee8
cmake CMakeLists.txt -G 'Unix Makefiles' && make -j 16
cd port/PyAssimp && python setup.py install
cd ../../../..
cp deps/assimp-src/lib/libassimp* .
```
5. graph-tool: We use the [graph-tool](https://git.skewed.de/count0/graph-tool)
library for graph processing. A quick sanity check for PyAssimp and
graph-tool is sketched after this list.
```Shell
mkdir -p deps
# If the following git clone command fails, you can also download the source
# from https://downloads.skewed.de/graph-tool/graph-tool-2.2.44.tar.bz2
git clone https://git.skewed.de/count0/graph-tool deps/graph-tool-src
cd deps/graph-tool-src && git checkout 178add3a571feb6666f4f119027705d95d2951ab
bash autogen.sh
./configure --disable-cairo --disable-sparsehash --prefix=$HOME/.local
make -j 16
make install
cd ../../
```
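A quick import check (a sketch, not part of the repository's scripts) can catch
missing native libraries early. The `.obj` path below is a placeholder; if
graph-tool was installed with `--prefix=$HOME/.local`, that prefix's
`site-packages` directory may need to be added to `PYTHONPATH`.
```Python
# Sanity check for the mesh loader and the graph library (run inside the venv).
import pyassimp
import graph_tool as gt

scene = pyassimp.load('path/to/some_mesh.obj')  # placeholder path, any small mesh works
print('meshes loaded:', len(scene.meshes))
pyassimp.release(scene)

g = gt.Graph(directed=False)
v0, v1 = g.add_vertex(), g.add_vertex()
g.add_edge(v0, v1)
print('graph has', g.num_vertices(), 'vertices and', g.num_edges(), 'edges')
```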
### Requirements: data
1. Download the Stanford 3D Indoor Spaces Dataset (S3DIS Dataset) and ImageNet
pre-trained models for initializing the different models. Follow the
instructions in `data/README.md`.
### Test Pre-trained Models
1. Download pre-trained models using
`scripts/script_download_pretrained_models.sh`.
2. Test models using `scripts/script_test_pretrained_models.sh`.
### Train Your Own Models
All models were trained asynchronously with 16 workers, each worker using data
from a single floor. The default hyper-parameters correspond to this setting.
See [distributed training with
Tensorflow](https://www.tensorflow.org/deploy/distributed) for setting up
distributed training. Training with a single worker is possible with the current
code base but will require some minor changes to allow each worker to load all
training environments. A minimal sketch of the standard distributed setup is
given below.
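Below is a minimal sketch of the standard TensorFlow (1.x) between-graph
replication setup that asynchronous training assumes; the host:port addresses
are placeholders, and wiring it into this code base (the `num_workers`, `task`,
`ps_tasks` and `master` fields of the solver config) is not shown.
```Python
# Minimal between-graph replication sketch (TensorFlow 1.x). Addresses are
# placeholders; the paper's setting used 16 worker tasks.
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    'ps': ['host0:2222'],
    'worker': ['host1:2222', 'host2:2222'],
})
# Each training process runs one server, selected by job_name/task_index.
server = tf.train.Server(cluster, job_name='worker', task_index=0)

with tf.device(tf.train.replica_device_setter(
    worker_device='/job:worker/task:0', cluster=cluster)):
  # Build the model here; variables are placed on the parameter server.
  global_step = tf.train.get_or_create_global_step()

with tf.train.MonitoredTrainingSession(master=server.target) as sess:
  pass  # run training ops here
```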
### Contact
For questions or issues, open an issue on the tensorflow/models [issues
tracker](https://github.com/tensorflow/models/issues). Please assign issues to
@s-gupta.
### Credits
This code was written by Saurabh Gupta (@s-gupta).
# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import os, sys
import numpy as np
from tensorflow.python.platform import app
from tensorflow.python.platform import flags
import logging
import src.utils as utils
import cfgs.config_common as cc
import tensorflow as tf
rgb_resnet_v2_50_path = 'data/init_models/resnet_v2_50/model.ckpt-5136169'
d_resnet_v2_50_path = 'data/init_models/distill_rgb_to_d_resnet_v2_50/model.ckpt-120002'
def get_default_args():
summary_args = utils.Foo(display_interval=1, test_iters=26,
arop_full_summary_iters=14)
control_args = utils.Foo(train=False, test=False,
force_batchnorm_is_training_at_test=False,
reset_rng_seed=False, only_eval_when_done=False,
test_mode=None)
return summary_args, control_args
def get_default_cmp_args():
batch_norm_param = {'center': True, 'scale': True,
'activation_fn':tf.nn.relu}
mapper_arch_args = utils.Foo(
dim_reduce_neurons=64,
fc_neurons=[1024, 1024],
fc_out_size=8,
fc_out_neurons=64,
encoder='resnet_v2_50',
deconv_neurons=[64, 32, 16, 8, 4, 2],
deconv_strides=[2, 2, 2, 2, 2, 2],
deconv_layers_per_block=2,
deconv_kernel_size=4,
fc_dropout=0.5,
combine_type='wt_avg_logits',
batch_norm_param=batch_norm_param)
readout_maps_arch_args = utils.Foo(
num_neurons=[],
strides=[],
kernel_size=None,
layers_per_block=None)
arch_args = utils.Foo(
vin_val_neurons=8, vin_action_neurons=8, vin_ks=3, vin_share_wts=False,
pred_neurons=[64, 64], pred_batch_norm_param=batch_norm_param,
conv_on_value_map=0, fr_neurons=16, fr_ver='v2', fr_inside_neurons=64,
fr_stride=1, crop_remove_each=30, value_crop_size=4,
action_sample_type='sample', action_sample_combine_type='one_or_other',
sample_gt_prob_type='inverse_sigmoid_decay', dagger_sample_bn_false=True,
vin_num_iters=36, isd_k=750., use_agent_loc=False, multi_scale=True,
readout_maps=False, rom_arch=readout_maps_arch_args)
return arch_args, mapper_arch_args
def get_arch_vars(arch_str):
if arch_str == '': vals = []
else: vals = arch_str.split('_')
ks = ['var1', 'var2', 'var3']
ks = ks[:len(vals)]
# Exp Ver.
if len(vals) == 0: ks.append('var1'); vals.append('v0')
# custom arch.
if len(vals) == 1: ks.append('var2'); vals.append('')
# map scale for projection baseline.
if len(vals) == 2: ks.append('var3'); vals.append('fr2')
assert(len(vals) == 3)
vars = utils.Foo()
for k, v in zip(ks, vals):
setattr(vars, k, v)
logging.error('arch_vars: %s', vars)
return vars
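# Worked example (a sketch based on the released config names, e.g.
# 'cmp.lmap_Msc.clip5.sbpd_d_r2r'): the arch string 'lmap_Msc' parses to
# var1='lmap', var2='Msc' and the default var3='fr2', which selects the
# learned-map, multi-scale architecture handled in process_arch_learned_map below.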
def process_arch_str(args, arch_str):
# This function modifies args.
args.arch, args.mapper_arch = get_default_cmp_args()
arch_vars = get_arch_vars(arch_str)
args.navtask.task_params.outputs.ego_maps = True
args.navtask.task_params.outputs.ego_goal_imgs = True
args.navtask.task_params.outputs.egomotion = True
args.navtask.task_params.toy_problem = False
if arch_vars.var1 == 'lmap':
args = process_arch_learned_map(args, arch_vars)
elif arch_vars.var1 == 'pmap':
args = process_arch_projected_map(args, arch_vars)
else:
logging.fatal('arch_vars.var1 should be lmap or pmap, but is %s', arch_vars.var1)
assert(False)
return args
def process_arch_learned_map(args, arch_vars):
# Multiscale vision based system.
args.navtask.task_params.input_type = 'vision'
args.navtask.task_params.outputs.images = True
if args.navtask.camera_param.modalities[0] == 'rgb':
args.solver.pretrained_path = rgb_resnet_v2_50_path
elif args.navtask.camera_param.modalities[0] == 'depth':
args.solver.pretrained_path = d_resnet_v2_50_path
if arch_vars.var2 == 'Ssc':
sc = 1./args.navtask.task_params.step_size
args.arch.vin_num_iters = 40
args.navtask.task_params.map_scales = [sc]
max_dist = args.navtask.task_params.max_dist * \
args.navtask.task_params.num_goals
args.navtask.task_params.map_crop_sizes = [2*max_dist]
args.arch.fr_stride = 1
args.arch.vin_action_neurons = 8
args.arch.vin_val_neurons = 3
args.arch.fr_inside_neurons = 32
args.mapper_arch.pad_map_with_zeros_each = [24]
args.mapper_arch.deconv_neurons = [64, 32, 16]
args.mapper_arch.deconv_strides = [1, 2, 1]
elif (arch_vars.var2 == 'Msc' or arch_vars.var2 == 'MscROMms' or
arch_vars.var2 == 'MscROMss' or arch_vars.var2 == 'MscNoVin'):
# Code for multi-scale planner.
args.arch.vin_num_iters = 8
args.arch.crop_remove_each = 4
args.arch.value_crop_size = 8
sc = 1./args.navtask.task_params.step_size
max_dist = args.navtask.task_params.max_dist * \
args.navtask.task_params.num_goals
n_scales = np.log2(float(max_dist) / float(args.arch.vin_num_iters))
n_scales = int(np.ceil(n_scales)+1)
args.navtask.task_params.map_scales = \
list(sc*(0.5**(np.arange(n_scales))[::-1]))
args.navtask.task_params.map_crop_sizes = [16 for x in range(n_scales)]
args.arch.fr_stride = 1
args.arch.vin_action_neurons = 8
args.arch.vin_val_neurons = 3
args.arch.fr_inside_neurons = 32
args.mapper_arch.pad_map_with_zeros_each = [0 for _ in range(n_scales)]
args.mapper_arch.deconv_neurons = [64*n_scales, 32*n_scales, 16*n_scales]
args.mapper_arch.deconv_strides = [1, 2, 1]
if arch_vars.var2 == 'MscNoVin':
# No planning version.
args.arch.fr_stride = [1, 2, 1, 2]
args.arch.vin_action_neurons = None
args.arch.vin_val_neurons = 16
args.arch.fr_inside_neurons = 32
args.arch.crop_remove_each = 0
args.arch.value_crop_size = 4
args.arch.vin_num_iters = 0
elif arch_vars.var2 == 'MscROMms' or arch_vars.var2 == 'MscROMss':
# Code with read outs, MscROMms flattens and reads out,
# MscROMss does not flatten and produces output at multiple scales.
args.navtask.task_params.outputs.readout_maps = True
args.navtask.task_params.map_resize_method = 'antialiasing'
args.arch.readout_maps = True
if arch_vars.var2 == 'MscROMms':
args.arch.rom_arch.num_neurons = [64, 1]
args.arch.rom_arch.kernel_size = 4
args.arch.rom_arch.strides = [2,2]
args.arch.rom_arch.layers_per_block = 2
args.navtask.task_params.readout_maps_crop_sizes = [64]
args.navtask.task_params.readout_maps_scales = [sc]
elif arch_vars.var2 == 'MscROMss':
args.arch.rom_arch.num_neurons = \
[64, len(args.navtask.task_params.map_scales)]
args.arch.rom_arch.kernel_size = 4
args.arch.rom_arch.strides = [1,1]
args.arch.rom_arch.layers_per_block = 1
args.navtask.task_params.readout_maps_crop_sizes = \
args.navtask.task_params.map_crop_sizes
args.navtask.task_params.readout_maps_scales = \
args.navtask.task_params.map_scales
else:
logging.fatal('arch_vars.var2 not one of Msc, MscROMms, MscROMss, MscNoVin.')
assert(False)
map_channels = args.mapper_arch.deconv_neurons[-1] / \
(2*len(args.navtask.task_params.map_scales))
args.navtask.task_params.map_channels = map_channels
return args
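# Worked example (sketch) for the multi-scale ('Msc') branch above: with the
# default task params (max_dist=32, num_goals=1, step_size=8) and
# vin_num_iters=8, n_scales = ceil(log2(32/8)) + 1 = 3, so
# map_scales = [1/32., 1/16., 1/8.], map_crop_sizes = [16, 16, 16], and
# map_channels = deconv_neurons[-1] / (2*n_scales) = 48 / 6 = 8.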
def process_arch_projected_map(args, arch_vars):
# Single scale vision based system which does not use a mapper but instead
# uses an analytically estimated map.
ds = int(arch_vars.var3[2])
args.navtask.task_params.input_type = 'analytical_counts'
args.navtask.task_params.outputs.analytical_counts = True
assert(args.navtask.task_params.modalities[0] == 'depth')
args.navtask.camera_param.img_channels = None
analytical_counts = utils.Foo(map_sizes=[512/ds],
xy_resolution=[5.*ds],
z_bins=[[-10, 10, 150, 200]],
non_linearity=[arch_vars.var2])
args.navtask.task_params.analytical_counts = analytical_counts
sc = 1./ds
args.arch.vin_num_iters = 36
args.navtask.task_params.map_scales = [sc]
args.navtask.task_params.map_crop_sizes = [512/ds]
args.arch.fr_stride = [1,2]
args.arch.vin_action_neurons = 8
args.arch.vin_val_neurons = 3
args.arch.fr_inside_neurons = 32
map_channels = len(analytical_counts.z_bins[0]) + 1
args.navtask.task_params.map_channels = map_channels
args.solver.freeze_conv = False
return args
def get_args_for_config(config_name):
args = utils.Foo()
args.summary, args.control = get_default_args()
exp_name, mode_str = config_name.split('+')
arch_str, solver_str, navtask_str = exp_name.split('.')
logging.error('config_name: %s', config_name)
logging.error('arch_str: %s', arch_str)
logging.error('navtask_str: %s', navtask_str)
logging.error('solver_str: %s', solver_str)
logging.error('mode_str: %s', mode_str)
args.solver = cc.process_solver_str(solver_str)
args.navtask = cc.process_navtask_str(navtask_str)
args = process_arch_str(args, arch_str)
args.arch.isd_k = args.solver.isd_k
# Train, test, etc.
mode, imset = mode_str.split('_')
args = cc.adjust_args_for_mode(args, mode)
args.navtask.building_names = args.navtask.dataset.get_split(imset)
args.control.test_name = '{:s}_on_{:s}'.format(mode, imset)
# Log the arguments
logging.error('%s', args)
return args
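# Usage sketch (assuming this file is cfgs/config_cmp.py; the '+bench_test'
# suffix selects benchmarking on the test areas via adjust_args_for_mode and
# get_split in cfgs/config_common.py and the dataset loader):
#
#   import cfgs.config_cmp as config_cmp
#   args = config_cmp.get_args_for_config('cmp.lmap_Msc.clip5.sbpd_d_r2r+bench_test')
#   # args.solver, args.navtask and args.arch are now fully populated.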
# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import os
import numpy as np
import logging
import src.utils as utils
import datasets.nav_env_config as nec
from datasets import factory
def adjust_args_for_mode(args, mode):
if mode == 'train':
args.control.train = True
elif mode == 'val1':
# Same settings as for training, to make sure nothing wonky is happening
# there.
args.control.test = True
args.control.test_mode = 'val'
args.navtask.task_params.batch_size = 32
elif mode == 'val2':
# No data augmentation, not sampling but taking the argmax action, not
# sampling from the ground truth at all.
args.control.test = True
args.arch.action_sample_type = 'argmax'
args.arch.sample_gt_prob_type = 'zero'
args.navtask.task_params.data_augment = \
utils.Foo(lr_flip=0, delta_angle=0, delta_xy=0, relight=False,
relight_fast=False, structured=False)
args.control.test_mode = 'val'
args.navtask.task_params.batch_size = 32
elif mode == 'bench':
# Actually testing the agent in settings that are kept same between
# different runs.
args.navtask.task_params.batch_size = 16
args.control.test = True
args.arch.action_sample_type = 'argmax'
args.arch.sample_gt_prob_type = 'zero'
args.navtask.task_params.data_augment = \
utils.Foo(lr_flip=0, delta_angle=0, delta_xy=0, relight=False,
relight_fast=False, structured=False)
args.summary.test_iters = 250
args.control.only_eval_when_done = True
args.control.reset_rng_seed = True
args.control.test_mode = 'test'
else:
logging.fatal('Unknown mode: %s.', mode)
assert(False)
return args
def get_solver_vars(solver_str):
if solver_str == '': vals = [];
else: vals = solver_str.split('_')
ks = ['clip', 'dlw', 'long', 'typ', 'isdk', 'adam_eps', 'init_lr'];
ks = ks[:len(vals)]
# Gradient clipping or not.
if len(vals) == 0: ks.append('clip'); vals.append('noclip');
# data loss weight.
if len(vals) == 1: ks.append('dlw'); vals.append('dlw20')
# how long to train for.
if len(vals) == 2: ks.append('long'); vals.append('nolong')
# Adam
if len(vals) == 3: ks.append('typ'); vals.append('adam2')
# reg loss wt
if len(vals) == 4: ks.append('rlw'); vals.append('rlw1')
# isd_k
if len(vals) == 5: ks.append('isdk'); vals.append('isdk415') # 415, inflexion at 2.5k.
# adam eps
if len(vals) == 6: ks.append('adam_eps'); vals.append('aeps1en8')
# init lr
if len(vals) == 7: ks.append('init_lr'); vals.append('lr1en3')
assert(len(vals) == 8)
vars = utils.Foo()
for k, v in zip(ks, vals):
setattr(vars, k, v)
logging.error('solver_vars: %s', vars)
return vars
def process_solver_str(solver_str):
solver = utils.Foo(
seed=0, learning_rate_decay=None, clip_gradient_norm=None, max_steps=None,
initial_learning_rate=None, momentum=None, steps_per_decay=None,
logdir=None, sync=False, adjust_lr_sync=True, wt_decay=0.0001,
data_loss_wt=None, reg_loss_wt=None, freeze_conv=True, num_workers=1,
task=0, ps_tasks=0, master='local', typ=None, momentum2=None,
adam_eps=None)
# Clobber with overrides from solver str.
solver_vars = get_solver_vars(solver_str)
solver.data_loss_wt = float(solver_vars.dlw[3:].replace('x', '.'))
solver.adam_eps = float(solver_vars.adam_eps[4:].replace('x', '.').replace('n', '-'))
solver.initial_learning_rate = float(solver_vars.init_lr[2:].replace('x', '.').replace('n', '-'))
solver.reg_loss_wt = float(solver_vars.rlw[3:].replace('x', '.'))
solver.isd_k = float(solver_vars.isdk[4:].replace('x', '.'))
long = solver_vars.long
if long == 'long':
solver.steps_per_decay = 40000
solver.max_steps = 120000
elif long == 'long2':
solver.steps_per_decay = 80000
solver.max_steps = 120000
elif long == 'nolong' or long == 'nol':
solver.steps_per_decay = 20000
solver.max_steps = 60000
else:
logging.fatal('solver_vars.long should be long, long2, nolong or nol.')
assert(False)
clip = solver_vars.clip
if clip == 'noclip' or clip == 'nocl':
solver.clip_gradient_norm = 0
elif clip[:4] == 'clip':
solver.clip_gradient_norm = float(clip[4:].replace('x', '.'))
else:
logging.fatal('Unknown solver_vars.clip: %s', clip)
assert(False)
typ = solver_vars.typ
if typ == 'adam':
solver.typ = 'adam'
solver.momentum = 0.9
solver.momentum2 = 0.999
solver.learning_rate_decay = 1.0
elif typ == 'adam2':
solver.typ = 'adam'
solver.momentum = 0.9
solver.momentum2 = 0.999
solver.learning_rate_decay = 0.1
elif typ == 'sgd':
solver.typ = 'sgd'
solver.momentum = 0.99
solver.momentum2 = None
solver.learning_rate_decay = 0.1
else:
logging.fatal('Unknown solver_vars.typ: %s', typ)
assert(False)
logging.error('solver: %s', solver)
return solver
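# Worked example (sketch): the solver string 'clip5', as used in the released
# cmp.lmap_Msc.clip5.* configs, expands with the defaults above to
# clip='clip5', dlw='dlw20', long='nolong', typ='adam2', rlw='rlw1',
# isdk='isdk415', adam_eps='aeps1en8', init_lr='lr1en3'. process_solver_str then
# yields clip_gradient_norm=5.0, data_loss_wt=20.0, reg_loss_wt=1.0,
# isd_k=415.0, adam_eps=1e-8, initial_learning_rate=1e-3,
# steps_per_decay=20000, max_steps=60000, and Adam with learning_rate_decay=0.1.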
def get_navtask_vars(navtask_str):
if navtask_str == '': vals = []
else: vals = navtask_str.split('_')
ks_all = ['dataset_name', 'modality', 'task', 'history', 'max_dist',
'num_steps', 'step_size', 'n_ori', 'aux_views', 'data_aug']
ks = ks_all[:len(vals)]
# All data or not.
if len(vals) == 0: ks.append('dataset_name'); vals.append('sbpd')
# modality
if len(vals) == 1: ks.append('modality'); vals.append('rgb')
# semantic task?
if len(vals) == 2: ks.append('task'); vals.append('r2r')
# number of history frames.
if len(vals) == 3: ks.append('history'); vals.append('h0')
# max steps
if len(vals) == 4: ks.append('max_dist'); vals.append('32')
# num steps
if len(vals) == 5: ks.append('num_steps'); vals.append('40')
# step size
if len(vals) == 6: ks.append('step_size'); vals.append('8')
# n_ori
if len(vals) == 7: ks.append('n_ori'); vals.append('4')
# Auxiliary views.
if len(vals) == 8: ks.append('aux_views'); vals.append('nv0')
# Normal data augmentation as opposed to structured data augmentation (if set
# to straug).
if len(vals) == 9: ks.append('data_aug'); vals.append('straug')
assert(len(vals) == 10)
for i in range(len(ks)):
assert(ks[i] == ks_all[i])
vars = utils.Foo()
for k, v in zip(ks, vals):
setattr(vars, k, v)
logging.error('navtask_vars: %s', vals)
return vars
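# Worked example (sketch): the navtask string 'sbpd_d_r2r', as used in the
# released configs, expands with the defaults above to dataset_name='sbpd',
# modality='d' (depth), task='r2r' (room to room), history='h0', max_dist='32',
# num_steps='40', step_size='8', n_ori='4', aux_views='nv0', data_aug='straug'.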
def process_navtask_str(navtask_str):
navtask = nec.nav_env_base_config()
# Clobber with overrides from strings.
navtask_vars = get_navtask_vars(navtask_str)
navtask.task_params.n_ori = int(navtask_vars.n_ori)
navtask.task_params.max_dist = int(navtask_vars.max_dist)
navtask.task_params.num_steps = int(navtask_vars.num_steps)
navtask.task_params.step_size = int(navtask_vars.step_size)
navtask.task_params.data_augment.delta_xy = int(navtask_vars.step_size)/2.
n_aux_views_each = int(navtask_vars.aux_views[2])
aux_delta_thetas = np.concatenate((np.arange(n_aux_views_each) + 1,
-1 -np.arange(n_aux_views_each)))
aux_delta_thetas = aux_delta_thetas*np.deg2rad(navtask.camera_param.fov)
navtask.task_params.aux_delta_thetas = aux_delta_thetas
if navtask_vars.data_aug == 'aug':
navtask.task_params.data_augment.structured = False
elif navtask_vars.data_aug == 'straug':
navtask.task_params.data_augment.structured = True
else:
logging.fatal('Unknown navtask_vars.data_aug %s.', navtask_vars.data_aug)
assert(False)
navtask.task_params.num_history_frames = int(navtask_vars.history[1:])
navtask.task_params.n_views = 1+navtask.task_params.num_history_frames
navtask.task_params.goal_channels = int(navtask_vars.n_ori)
if navtask_vars.task == 'hard':
navtask.task_params.type = 'rng_rejection_sampling_many'
navtask.task_params.rejection_sampling_M = 2000
navtask.task_params.min_dist = 10
elif navtask_vars.task == 'r2r':
navtask.task_params.type = 'room_to_room_many'
elif navtask_vars.task == 'ST':
# Semantic task at hand.
navtask.task_params.goal_channels = \
len(navtask.task_params.semantic_task.class_map_names)
navtask.task_params.rel_goal_loc_dim = \
len(navtask.task_params.semantic_task.class_map_names)
navtask.task_params.type = 'to_nearest_obj_acc'
else:
logging.fatal('navtask_vars.task should be one of hard, r2r or ST.')
assert(False)
if navtask_vars.modality == 'rgb':
navtask.camera_param.modalities = ['rgb']
navtask.camera_param.img_channels = 3
elif navtask_vars.modality == 'd':
navtask.camera_param.modalities = ['depth']
navtask.camera_param.img_channels = 2
navtask.task_params.img_height = navtask.camera_param.height
navtask.task_params.img_width = navtask.camera_param.width
navtask.task_params.modalities = navtask.camera_param.modalities
navtask.task_params.img_channels = navtask.camera_param.img_channels
navtask.task_params.img_fov = navtask.camera_param.fov
navtask.dataset = factory.get_dataset(navtask_vars.dataset_name)
return navtask
# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import pprint
import copy
import os
from tensorflow.python.platform import app
from tensorflow.python.platform import flags
import logging
import src.utils as utils
import cfgs.config_common as cc
import tensorflow as tf
rgb_resnet_v2_50_path = 'cache/resnet_v2_50_inception_preprocessed/model.ckpt-5136169'
def get_default_args():
robot = utils.Foo(radius=15, base=10, height=140, sensor_height=120,
camera_elevation_degree=-15)
camera_param = utils.Foo(width=225, height=225, z_near=0.05, z_far=20.0,
fov=60., modalities=['rgb', 'depth'])
env = utils.Foo(padding=10, resolution=5, num_point_threshold=2,
valid_min=-10, valid_max=200, n_samples_per_face=200)
data_augment = utils.Foo(lr_flip=0, delta_angle=1, delta_xy=4, relight=False,
relight_fast=False, structured=False)
task_params = utils.Foo(num_actions=4, step_size=4, num_steps=0,
batch_size=32, room_seed=0, base_class='Building',
task='mapping', n_ori=6, data_augment=data_augment,
output_transform_to_global_map=False,
output_canonical_map=False,
output_incremental_transform=False,
output_free_space=False, move_type='shortest_path',
toy_problem=0)
buildinger_args = utils.Foo(building_names=['area1_gates_wingA_floor1_westpart'],
env_class=None, robot=robot,
task_params=task_params, env=env,
camera_param=camera_param)
solver_args = utils.Foo(seed=0, learning_rate_decay=0.1,
clip_gradient_norm=0, max_steps=120000,
initial_learning_rate=0.001, momentum=0.99,
steps_per_decay=40000, logdir=None, sync=False,
adjust_lr_sync=True, wt_decay=0.0001,
data_loss_wt=1.0, reg_loss_wt=1.0,
num_workers=1, task=0, ps_tasks=0, master='local')
summary_args = utils.Foo(display_interval=1, test_iters=100)
control_args = utils.Foo(train=False, test=False,
force_batchnorm_is_training_at_test=False)
arch_args = utils.Foo(rgb_encoder='resnet_v2_50', d_encoder='resnet_v2_50')
return utils.Foo(solver=solver_args,
summary=summary_args, control=control_args, arch=arch_args,
buildinger=buildinger_args)
def get_vars(config_name):
vars = config_name.split('_')
if len(vars) == 1: # All data or not.
vars.append('noall')
if len(vars) == 2: # n_ori
vars.append('4')
logging.error('vars: %s', vars)
return vars
def get_args_for_config(config_name):
args = get_default_args()
config_name, mode = config_name.split('+')
vars = get_vars(config_name)
logging.info('config_name: %s, mode: %s', config_name, mode)
args.buildinger.task_params.n_ori = int(vars[2])
args.solver.freeze_conv = True
args.solver.pretrained_path = rgb_resnet_v2_50_path
args.buildinger.task_params.img_channels = 5
args.solver.data_loss_wt = 0.00001
if vars[0] == 'v0':
None
else:
logging.error('config_name: %s undefined', config_name)
args.buildinger.task_params.height = args.buildinger.camera_param.height
args.buildinger.task_params.width = args.buildinger.camera_param.width
args.buildinger.task_params.modalities = args.buildinger.camera_param.modalities
if vars[1] == 'all':
args = cc.get_args_for_mode_building_all(args, mode)
elif vars[1] == 'noall':
args = cc.get_args_for_mode_building(args, mode)
# Log the arguments
logging.error('%s', args)
return args
# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import pprint
import os
import numpy as np
from tensorflow.python.platform import app
from tensorflow.python.platform import flags
import logging
import src.utils as utils
import cfgs.config_common as cc
import datasets.nav_env_config as nec
import tensorflow as tf
FLAGS = flags.FLAGS
get_solver_vars = cc.get_solver_vars
get_navtask_vars = cc.get_navtask_vars
rgb_resnet_v2_50_path = 'data/init_models/resnet_v2_50/model.ckpt-5136169'
d_resnet_v2_50_path = 'data/init_models/distill_rgb_to_d_resnet_v2_50/model.ckpt-120002'
def get_default_args():
summary_args = utils.Foo(display_interval=1, test_iters=26,
arop_full_summary_iters=14)
control_args = utils.Foo(train=False, test=False,
force_batchnorm_is_training_at_test=False,
reset_rng_seed=False, only_eval_when_done=False,
test_mode=None)
return summary_args, control_args
def get_default_baseline_args():
batch_norm_param = {'center': True, 'scale': True,
'activation_fn':tf.nn.relu}
arch_args = utils.Foo(
pred_neurons=[], goal_embed_neurons=[], img_embed_neurons=[],
batch_norm_param=batch_norm_param, dim_reduce_neurons=64, combine_type='',
encoder='resnet_v2_50', action_sample_type='sample',
action_sample_combine_type='one_or_other',
sample_gt_prob_type='inverse_sigmoid_decay', dagger_sample_bn_false=True,
isd_k=750., use_visit_count=False, lstm_output=False, lstm_ego=False,
lstm_img=False, fc_dropout=0.0, embed_goal_for_state=False,
lstm_output_init_state_from_goal=False)
return arch_args
def get_arch_vars(arch_str):
if arch_str == '': vals = []
else: vals = arch_str.split('_')
ks = ['ver', 'lstm_dim', 'dropout']
# Exp Ver
if len(vals) == 0: vals.append('v0')
# LSTM dimensions
if len(vals) == 1: vals.append('lstm2048')
# Dropout
if len(vals) == 2: vals.append('noDO')
assert(len(vals) == 3)
vars = utils.Foo()
for k, v in zip(ks, vals):
setattr(vars, k, v)
logging.error('arch_vars: %s', vars)
return vars
def process_arch_str(args, arch_str):
# This function modifies args.
args.arch = get_default_baseline_args()
arch_vars = get_arch_vars(arch_str)
args.navtask.task_params.outputs.rel_goal_loc = True
args.navtask.task_params.input_type = 'vision'
args.navtask.task_params.outputs.images = True
if args.navtask.camera_param.modalities[0] == 'rgb':
args.solver.pretrained_path = rgb_resnet_v2_50_path
elif args.navtask.camera_param.modalities[0] == 'depth':
args.solver.pretrained_path = d_resnet_v2_50_path
else:
logging.fatal('Modality should be one of rgb or depth.')
if arch_vars.dropout == 'DO':
args.arch.fc_dropout = 0.5
args.tfcode = 'B'
exp_ver = arch_vars.ver
if exp_ver == 'v0':
# Multiplicative interaction between goal loc and image features.
args.arch.combine_type = 'multiply'
args.arch.pred_neurons = [256, 256]
args.arch.goal_embed_neurons = [64, 8]
args.arch.img_embed_neurons = [1024, 512, 256*8]
elif exp_ver == 'v1':
# Additive interaction between goal and image features.
args.arch.combine_type = 'add'
args.arch.pred_neurons = [256, 256]
args.arch.goal_embed_neurons = [64, 256]
args.arch.img_embed_neurons = [1024, 512, 256]
elif exp_ver == 'v2':
# LSTM at the output on top of multiple interactions.
args.arch.combine_type = 'multiply'
args.arch.goal_embed_neurons = [64, 8]
args.arch.img_embed_neurons = [1024, 512, 256*8]
args.arch.lstm_output = True
args.arch.lstm_output_dim = int(arch_vars.lstm_dim[4:])
args.arch.pred_neurons = [256] # The other is inside the LSTM.
elif exp_ver == 'v0blind':
# LSTM only on the goal location.
args.arch.combine_type = 'goalonly'
args.arch.goal_embed_neurons = [64, 256]
args.arch.img_embed_neurons = [2] # I don't know what it will do otherwise.
args.arch.lstm_output = True
args.arch.lstm_output_dim = 256
args.arch.pred_neurons = [256] # The other is inside the LSTM.
else:
logging.fatal('exp_ver: %s undefined', exp_ver)
assert(False)
# Log the arguments
logging.error('%s', args)
return args
def get_args_for_config(config_name):
args = utils.Foo()
args.summary, args.control = get_default_args()
exp_name, mode_str = config_name.split('+')
arch_str, solver_str, navtask_str = exp_name.split('.')
logging.error('config_name: %s', config_name)
logging.error('arch_str: %s', arch_str)
logging.error('navtask_str: %s', navtask_str)
logging.error('solver_str: %s', solver_str)
logging.error('mode_str: %s', mode_str)
args.solver = cc.process_solver_str(solver_str)
args.navtask = cc.process_navtask_str(navtask_str)
args = process_arch_str(args, arch_str)
args.arch.isd_k = args.solver.isd_k
# Train, test, etc.
mode, imset = mode_str.split('_')
args = cc.adjust_args_for_mode(args, mode)
args.navtask.building_names = args.navtask.dataset.get_split(imset)
args.control.test_name = '{:s}_on_{:s}'.format(mode, imset)
# Log the arguments
logging.error('%s', args)
return args
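# Usage sketch (the module path cfgs/config_vision_baseline.py is a guess based
# on the 'bl.*' config names in the pre-trained model table; '+bench_test'
# selects benchmarking on the test areas via cfgs/config_common.py):
#
#   import cfgs.config_vision_baseline as cvb
#   args = cvb.get_args_for_config('bl.v2.noclip.sbpd_d_r2r+bench_test')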
stanford_building_parser_dataset_raw
stanford_building_parser_dataset
init_models
This directory contains the data needed for training and benchmarking various
navigation models.
1. Download the data from the [dataset website](http://buildingparser.stanford.edu/dataset.html).
1. [Raw meshes](https://goo.gl/forms/2YSPaO2UKmn5Td5m2). We need the meshes
which are in the noXYZ folder. Download the tar files and place them in
the `stanford_building_parser_dataset_raw` folder. You need to download
`area_1_noXYZ.tar`, `area_3_noXYZ.tar`, `area_5a_noXYZ.tar`,
`area_5b_noXYZ.tar`, `area_6_noXYZ.tar` for training and
`area_4_noXYZ.tar` for evaluation.
2. [Annotations](https://goo.gl/forms/4SoGp4KtH1jfRqEj2) for setting up
tasks. We will need the file called `Stanford3dDataset_v1.2.zip`. Place
the file in the directory `stanford_building_parser_dataset_raw`.
2. Preprocess the data.
1. Extract meshes using `scripts/script_preprocess_meshes_S3DIS.sh`. After
this, `ls data/stanford_building_parser_dataset/mesh` should show 6
folders: `area1`, `area3`, `area4`, `area5a`, `area5b`, `area6`, with
textures and obj files within each directory.
2. Extract room information and semantics from the zip file using
`scripts/script_preprocess_annoations_S3DIS.sh`. After this there should
be `room-dimension` and `class-maps` folders in
`data/stanford_building_parser_dataset`. (If this script crashes because
of an exception in np.loadtxt while processing
`Area_5/office_19/Annotations/ceiling_1.txt`, there is a special
character on line 323474 that should be removed manually.)
3. Download ImageNet pre-trained models. We used ResNet-v2-50 for representing
images. For RGB images this model is pre-trained on ImageNet. For depth
images we [distill](https://arxiv.org/abs/1507.00448) the RGB model to
depth images using paired RGB-D images. Both these models are available
through `scripts/script_download_init_models.sh`.
# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Wrapper for selecting the navigation environment that we want to train and
test on.
"""
import numpy as np
import os, glob
import platform
import logging
from tensorflow.python.platform import app
from tensorflow.python.platform import flags
import render.swiftshader_renderer as renderer
import src.file_utils as fu
import src.utils as utils
def get_dataset(dataset_name):
if dataset_name == 'sbpd':
dataset = StanfordBuildingParserDataset(dataset_name)
else:
logging.fatal('Not one of sbpd')
return dataset
class Loader():
def get_data_dir(self):
pass
def get_meta_data(self, file_name, data_dir=None):
if data_dir is None:
data_dir = self.get_data_dir()
full_file_name = os.path.join(data_dir, 'meta', file_name)
assert(fu.exists(full_file_name)), \
'{:s} does not exist'.format(full_file_name)
ext = os.path.splitext(full_file_name)[1]
if ext == '.txt':
ls = []
with fu.fopen(full_file_name, 'r') as f:
for l in f:
ls.append(l.rstrip())
elif ext == '.pkl':
ls = utils.load_variables(full_file_name)
return ls
def load_building(self, name, data_dir=None):
if data_dir is None:
data_dir = self.get_data_dir()
out = {}
out['name'] = name
out['data_dir'] = data_dir
out['room_dimension_file'] = os.path.join(data_dir, 'room-dimension',
name+'.pkl')
out['class_map_folder'] = os.path.join(data_dir, 'class-maps')
return out
def load_building_meshes(self, building):
dir_name = os.path.join(building['data_dir'], 'mesh', building['name'])
mesh_file_name = glob.glob1(dir_name, '*.obj')[0]
mesh_file_name_full = os.path.join(dir_name, mesh_file_name)
logging.error('Loading building from obj file: %s', mesh_file_name_full)
shape = renderer.Shape(mesh_file_name_full, load_materials=True,
name_prefix=building['name']+'_')
return [shape]
class StanfordBuildingParserDataset(Loader):
def __init__(self, ver):
self.ver = ver
self.data_dir = None
def get_data_dir(self):
if self.data_dir is None:
self.data_dir = 'data/stanford_building_parser_dataset/'
return self.data_dir
def get_benchmark_sets(self):
return self._get_benchmark_sets()
def get_split(self, split_name):
if self.ver == 'sbpd':
return self._get_split(split_name)
else:
logging.fatal('Unknown version.')
def _get_benchmark_sets(self):
sets = ['train1', 'val', 'test']
return sets
def _get_split(self, split_name):
train = ['area1', 'area5a', 'area5b', 'area6']
train1 = ['area1']
val = ['area3']
test = ['area4']
sets = {}
sets['train'] = train
sets['train1'] = train1
sets['val'] = val
sets['test'] = test
sets['all'] = sorted(list(set(train + val + test)))
return sets[split_name]
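# Usage sketch (assuming this file is datasets/factory.py, matching the
# 'from datasets import factory' import in cfgs/config_common.py):
#
#   from datasets import factory
#   dataset = factory.get_dataset('sbpd')
#   dataset.get_split('train')   # ['area1', 'area5a', 'area5b', 'area6']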
# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Configs for stanford navigation environment.
Base config for stanford navigation environment.
"""
import numpy as np
import src.utils as utils
import datasets.nav_env as nav_env
def nav_env_base_config():
"""Returns the base config for stanford navigation environment.
Returns:
Base config for stanford navigation environment.
"""
robot = utils.Foo(radius=15,
base=10,
height=140,
sensor_height=120,
camera_elevation_degree=-15)
env = utils.Foo(padding=10,
resolution=5,
num_point_threshold=2,
valid_min=-10,
valid_max=200,
n_samples_per_face=200)
camera_param = utils.Foo(width=225,
height=225,
z_near=0.05,
z_far=20.0,
fov=60.,
modalities=['rgb'],
img_channels=3)
data_augment = utils.Foo(lr_flip=0,
delta_angle=0.5,
delta_xy=4,
relight=True,
relight_fast=False,
structured=False) # if True, uses the same perturb for the whole episode.
outputs = utils.Foo(images=True,
rel_goal_loc=False,
loc_on_map=True,
gt_dist_to_goal=True,
ego_maps=False,
ego_goal_imgs=False,
egomotion=False,
visit_count=False,
analytical_counts=False,
node_ids=True,
readout_maps=False)
# class_map_names=['board', 'chair', 'door', 'sofa', 'table']
class_map_names = ['chair', 'door', 'table']
semantic_task = utils.Foo(class_map_names=class_map_names, pix_distance=16,
sampling='uniform')
# time per iteration for cmp is 0.82 seconds per episode with 3.4s overhead per batch.
task_params = utils.Foo(max_dist=32,
step_size=8,
num_steps=40,
num_actions=4,
batch_size=4,
building_seed=0,
num_goals=1,
img_height=None,
img_width=None,
img_channels=None,
modalities=None,
outputs=outputs,
map_scales=[1.],
map_crop_sizes=[64],
rel_goal_loc_dim=4,
base_class='Building',
task='map+plan',
n_ori=4,
type='room_to_room_many',
data_augment=data_augment,
room_regex='^((?!hallway).)*$',
toy_problem=False,
map_channels=1,
gt_coverage=False,
input_type='maps',
full_information=False,
aux_delta_thetas=[],
semantic_task=semantic_task,
num_history_frames=0,
node_ids_dim=1,
perturbs_dim=4,
map_resize_method='linear_noantialiasing',
readout_maps_channels=1,
readout_maps_scales=[],
readout_maps_crop_sizes=[],
n_views=1,
reward_time_penalty=0.1,
reward_at_goal=1.,
discount_factor=0.99,
rejection_sampling_M=100,
min_dist=None)
navtask_args = utils.Foo(
building_names=['area1_gates_wingA_floor1_westpart'],
env_class=nav_env.VisualNavigationEnv,
robot=robot,
task_params=task_params,
env=env,
camera_param=camera_param,
cache_rooms=True)
return navtask_args
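# Example (sketch): cfgs/config_common.py starts from this base config and
# clobbers fields using the navtask string, e.g. 'sbpd_d_r2r' sets
# camera_param.modalities=['depth'] with img_channels=2, and keeps
# step_size=8, max_dist=32, num_steps=40, n_ori=4 (equal to the defaults above).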
### Pre-Trained Models
We provide the following pre-trained models:
Config Name | Checkpoint | Mean Dist. | 50%ile Dist. | 75%ile Dist. | Success %age |
:-: | :-: | :-: | :-: | :-: | :-: |
cmp.lmap_Msc.clip5.sbpd_d_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_d_r2r.tar) | 4.79 | 0 | 1 | 78.9 |
cmp.lmap_Msc.clip5.sbpd_rgb_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_rgb_r2r.tar) | 7.74 | 0 | 14 | 62.4 |
cmp.lmap_Msc.clip5.sbpd_d_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_d_ST.tar) | 10.67 | 9 | 19 | 39.7 |
cmp.lmap_Msc.clip5.sbpd_rgb_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_rgb_ST.tar) | 11.27 | 10 | 19 | 35.6 |
cmp.lmap_Msc.clip5.sbpd_d_r2r_h0_64_80 | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_d_r2r_h0_64_80.tar) | 11.6 | 0 | 19 | 66.9 |
bl.v2.noclip.sbpd_d_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_d_r2r.tar) | 5.90 | 0 | 6 | 71.2 |
bl.v2.noclip.sbpd_rgb_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_rgb_r2r.tar) | 10.21 | 1 | 21 | 53.4 |
bl.v2.noclip.sbpd_d_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_d_ST.tar) | 13.29 | 14 | 23 | 28.0 |
bl.v2.noclip.sbpd_rgb_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_rgb_ST.tar) | 13.37 | 13 | 20 | 24.2 |
bl.v2.noclip.sbpd_d_r2r_h0_64_80 | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_d_r2r_h0_64_80.tar) | 15.30 | 0 | 29 | 57.9 |
10c10
< from OpenGL import platform, constant, arrays
---
> from OpenGL import platform, constant, arrays, contextdata
249a250
> from OpenGL._bytes import _NULL_8_BYTE
399c400
< array = ArrayDatatype.asArray( pointer, type )
---
> array = arrays.ArrayDatatype.asArray( pointer, type )
405c406
< ArrayDatatype.voidDataPointer( array )
---
> arrays.ArrayDatatype.voidDataPointer( array )
# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
echo $VIRTUAL_ENV
patch $VIRTUAL_ENV/local/lib/python2.7/site-packages/OpenGL/GLES2/VERSION/GLES2_2_0.py patches/GLES2_2_0.py.patch
patch $VIRTUAL_ENV/local/lib/python2.7/site-packages/OpenGL/platform/ctypesloader.py patches/ctypesloader.py.patch
45c45,46
< return dllType( name, mode )
---
> print './' + name
> return dllType( './' + name, mode )
47,48c48,53
< err.args += (name,fullName)
< raise
---
> try:
> print name
> return dllType( name, mode )
> except:
> err.args += (name,fullName)
> raise