Commit 92aa2fa8 authored by zachteed

initial commit

__pycache__
build
dist
*.egg-info
*.vscode/
*.pth
tests
checkpoints
datasets
runs
a.out
cache
*.g2o
[submodule "eigen"]
path = eigen
url = https://gitlab.com/libeigen/eigen
BSD 3-Clause License
Copyright (c) 2021, princeton-vl
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# LieTorch: Tangent Space Backpropagation
## Introduction
The LieTorch library generalizes PyTorch to 3D transformation groups. Just as `torch.Tensor` is a multi-dimensional matrix of scalar elements, `lietorch.SE3` is a multi-dimensional matrix of SE3 elements. We support common tensor manipulations such as indexing, reshaping, and broadcasting. Group operations can be composed into computation graphs and backpropagation is automatically performed in the tangent space of each element. For more details, please see our paper:
<center><img src="lietorch.png" width="480" style="center"></center>
Tangent Space Backpropagation for 3D Transformation Groups
Zachary Teed and Jia Deng, CVPR 2021
## Installation
### Requirements:
* Cuda >= 10.1 (with nvcc compiler)
* PyTorch >= 1.6
We recommend installing within a virtual environment. Make sure you clone using the `--recursive` flag. If you are using Anaconda, the following commands can be used to install all dependencies:
```
git clone --recursive https://github.com/princeton-vl/lietorch.git
cd lietorch
conda create -n lie_env
conda activate lie_env
conda install scipy pyyaml pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
```
To run the examples, you will need OpenCV and Open3D. Depending on your operating system, they can either be installed with pip or may need to be built from source:
```
pip install opencv-python open3d
```
### Installing:
Clone the repo using the `--recursive` flag and install using `setup.py` (may take up to 10 minutes)
```
git clone --recursive https://github.com/princeton-vl/lietorch.git
python setup.py install
./run_tests.sh
```
## Overview
LieTorch currently supports the following 3D transformation groups:
| Group | Dimension | Action |
| -------| --------- | ------------- |
| SO3 | 3 | rotation |
| RxSO3 | 4 | rotation + scaling |
| SE3 | 6 | rotation + translation |
| Sim3 | 7 | rotation + translation + scaling |
Each group supports the following operations:
| Operation | Map | Description |
| -------| --------| ------------- |
| exp | g -> G | exponential map |
| log | G -> g | logarithm map |
| inv | G -> G | group inverse |
| mul | G x G -> G | group multiplication |
| adj | G x g -> g | adjoint |
| adjT | G x g* -> g* | dual adjoint |
| act | G x R3 -> R3 | action on point (set) |
| act4 | G x P3 -> P3 | action on homogeneous point (set) |
&nbsp;
#### Simple Example:
Compute the angles between all pairs of rotation matrices
```python
import torch
from lietorch import SO3
phi = torch.randn(8000, 3, device='cuda', requires_grad=True)
R = SO3.exp(phi)
# relative rotation matrix, SO3 ^ {8000 x 8000}
dR = R[:,None].inv() * R[None,:]
# 8000 x 8000 matrix of angles
ang = dR.log().norm(dim=-1)
# backpropagation in tangent space
loss = ang.sum()
loss.backward()
```
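The `act` operation from the table above works the same way. Here is a minimal sketch (untested, and assuming the shapes broadcast as in the example above):
```python
import torch
from lietorch import SE3

# random rigid-body transforms from 6-dof tangent vectors (SE3 has dimension 6)
xi = torch.randn(100, 6, device='cuda', requires_grad=True)
T = SE3.exp(xi)

# transform a batch of 3D points; gradients flow back to the tangent vectors
pts = torch.randn(100, 3, device='cuda')
pts_transformed = T.act(pts)
pts_transformed.sum().backward()
```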
## Examples
We provide real use cases in the examples directory:
1. Pose Graph Optimization
2. Deep SE3/Sim3 Registration
3. RGB-D SLAM / VO
### Acknowledgements
Many of the Lie Group implementations are adapted from [Sophus](https://github.com/strasdat/Sophus).
Subproject commit 824272cde8ca2541e8b67b0887f5ded92b128d1f
import torch
import torchvision.transforms as transforms
import numpy as np
import torch.nn.functional as F
class RGBDAugmentor:
""" perform augmentation on RGB-D video """
def __init__(self, crop_size):
self.crop_size = crop_size
self.augcolor = transforms.Compose([
transforms.ToPILImage(),
transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.4/3.14),
transforms.ToTensor()])
self.max_scale = 0.25
def spatial_transform(self, images, depths, poses, intrinsics):
""" cropping and resizing """
ht, wd = images.shape[2:]
max_scale = self.max_scale
min_scale = np.log2(np.maximum(
(self.crop_size[0] + 1) / float(ht),
(self.crop_size[1] + 1) / float(wd)))
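        # sample a log2-uniform resize factor; min_scale guarantees the resized image still covers crop_size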
scale = 2 ** np.random.uniform(min_scale, max_scale)
intrinsics = scale * intrinsics
depths = depths.unsqueeze(dim=1)
images = F.interpolate(images, scale_factor=scale, mode='bilinear',
align_corners=True, recompute_scale_factor=True)
depths = F.interpolate(depths, scale_factor=scale, recompute_scale_factor=True)
# always perform center crop (TODO: try non-center crops)
y0 = (images.shape[2] - self.crop_size[0]) // 2
x0 = (images.shape[3] - self.crop_size[1]) // 2
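        # shift the principal point (cx, cy) to account for the crop offset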
intrinsics = intrinsics - torch.tensor([0.0, 0.0, x0, y0])
images = images[:, :, y0:y0+self.crop_size[0], x0:x0+self.crop_size[1]]
depths = depths[:, :, y0:y0+self.crop_size[0], x0:x0+self.crop_size[1]]
depths = depths.squeeze(dim=1)
return images, poses, depths, intrinsics
def color_transform(self, images):
""" color jittering """
num, ch, ht, wd = images.shape
images = images.permute(1, 2, 3, 0).reshape(ch, ht, wd*num)
images = 255 * self.augcolor(images[[2,1,0]] / 255.0)
return images[[2,1,0]].reshape(ch, ht, wd, num).permute(3,0,1,2).contiguous()
def __call__(self, images, poses, depths, intrinsics):
images = self.color_transform(images)
return self.spatial_transform(images, depths, poses, intrinsics)
import numpy as np
import torch
import torch.utils.data as data
import torch.nn.functional as F
import csv
import os
import cv2
import math
import random
import json
import pickle
import os.path as osp
from .augmentation import RGBDAugmentor
from .rgbd_utils import *
class RGBDDataset(data.Dataset):
def __init__(self, root, name, n_frames=4, crop_size=[384,512], fmin=8.0, fmax=75.0, do_aug=True):
""" Base class for RGBD dataset """
self.aug = None
self.root = root
self.name = name
self.n_frames = n_frames
self.fmin = fmin # exclude very easy examples
self.fmax = fmax # exclude very hard examples
if do_aug:
self.aug = RGBDAugmentor(crop_size=crop_size)
# building dataset is expensive, cache so only needs to be performed once
cur_path = osp.dirname(osp.abspath(__file__))
cache_path = osp.join(cur_path, 'cache', '{}.pickle'.format(self.name))
if not osp.isdir(osp.join(cur_path, 'cache')):
os.mkdir(osp.join(cur_path, 'cache'))
if osp.isfile(cache_path):
scene_info = pickle.load(open(cache_path, 'rb'))[0]
else:
scene_info = self._build_dataset()
with open(cache_path, 'wb') as cachefile:
pickle.dump((scene_info,), cachefile)
self.scene_info = scene_info
self._build_dataset_index()
def _build_dataset_index(self):
self.dataset_index = []
for scene in self.scene_info:
if not self.__class__.is_test_scene(scene):
graph = self.scene_info[scene]['graph']
for i in graph:
if len(graph[i][0]) > self.n_frames:
self.dataset_index.append((scene, i))
@staticmethod
def image_read(image_file):
return cv2.imread(image_file)
@staticmethod
def depth_read(depth_file):
return np.load(depth_file)
def build_frame_graph(self, poses, depths, intrinsics, f=16, max_flow=256):
""" compute optical flow distance between all pairs of frames """
def read_disp(fn):
depth = self.__class__.depth_read(fn)[f//2::f, f//2::f]
depth[depth < 0.01] = np.mean(depth)
return 1.0 / depth
poses = np.array(poses)
intrinsics = np.array(intrinsics) / f
disps = np.stack(list(map(read_disp, depths)), 0)
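        # flow is computed on the f-times downsampled grid; scale by f to get distances in full-resolution pixels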
d = f * compute_distance_matrix_flow(poses, disps, intrinsics)
# uncomment for nice visualization
# import matplotlib.pyplot as plt
# plt.imshow(d)
# plt.show()
graph = {}
for i in range(d.shape[0]):
j, = np.where(d[i] < max_flow)
graph[i] = (j, d[i,j])
return graph
def __getitem__(self, index):
""" return training video """
index = index % len(self.dataset_index)
scene_id, ix = self.dataset_index[index]
frame_graph = self.scene_info[scene_id]['graph']
images_list = self.scene_info[scene_id]['images']
depths_list = self.scene_info[scene_id]['depths']
poses_list = self.scene_info[scene_id]['poses']
intrinsics_list = self.scene_info[scene_id]['intrinsics']
inds = [ ix ]
while len(inds) < self.n_frames:
# get other frames within flow threshold
k = (frame_graph[ix][1] > self.fmin) & (frame_graph[ix][1] < self.fmax)
frames = frame_graph[ix][0][k]
# prefer frames forward in time
if np.count_nonzero(frames[frames > ix]):
ix = np.random.choice(frames[frames > ix])
elif np.count_nonzero(frames):
ix = np.random.choice(frames)
inds += [ ix ]
images, depths, poses, intrinsics = [], [], [], []
for i in inds:
images.append(self.__class__.image_read(images_list[i]))
depths.append(self.__class__.depth_read(depths_list[i]))
poses.append(poses_list[i])
intrinsics.append(intrinsics_list[i])
images = np.stack(images).astype(np.float32)
depths = np.stack(depths).astype(np.float32)
poses = np.stack(poses).astype(np.float32)
intrinsics = np.stack(intrinsics).astype(np.float32)
images = torch.from_numpy(images).float()
images = images.permute(0, 3, 1, 2)
depths = torch.from_numpy(depths)
poses = torch.from_numpy(poses)
intrinsics = torch.from_numpy(intrinsics)
if self.aug is not None:
images, poses, depths, intrinsics = \
self.aug(images, poses, depths, intrinsics)
return images, poses, depths, intrinsics
def __len__(self):
return len(self.dataset_index)
def __imul__(self, x):
self.dataset_index *= x
return self
import numpy as np
import torch
import torch.utils.data as data
import torch.nn.functional as F
import csv
import os
import cv2
import math
import random
import json
import pickle
import os.path as osp
from lietorch import SE3
from .base import RGBDDataset
from .stream import RGBDStream
from .augmentation import RGBDAugmentor
from .rgbd_utils import loadtum, all_pairs_distance_matrix
class ETH3D(RGBDDataset):
def __init__(self, **kwargs):
super(ETH3D, self).__init__(root='datasets/ETH3D', name='ETH3D', **kwargs)
@staticmethod
def is_test_scene(scene):
return False
def _build_dataset(self):
from tqdm import tqdm
print("Building ETH3D dataset")
scene_info = {}
dataset_index = []
for scene in tqdm(os.listdir(self.root)):
scene_path = osp.join(self.root, scene)
if not osp.isdir(scene_path):
continue
# don't use scenes with no rgb info
if 'dark' in scene or 'kidnap' in scene:
continue
scene_data, graph = {}, {}
images, depths, poses, intrinsics = loadtum(scene_path, skip=2)
# graph of co-visible frames based on flow
graph = self.build_frame_graph(poses, depths, intrinsics)
scene_info[scene] = {'images': images, 'depths': depths,
'poses': poses, 'intrinsics': intrinsics, 'graph': graph}
return scene_info
@staticmethod
def image_read(image_file):
return cv2.imread(image_file)
@staticmethod
def depth_read(depth_file):
depth = cv2.imread(depth_file, cv2.IMREAD_ANYDEPTH)
return depth.astype(np.float32) / 5000.0
class ETH3DStream(RGBDStream):
def __init__(self, datapath, **kwargs):
super(ETH3DStream, self).__init__(datapath=datapath, **kwargs)
def _build_dataset_index(self):
""" build list of images, poses, depths, and intrinsics """
images, depths, poses, intrinsics = loadtum(self.datapath, self.frame_rate)
# set first pose to identity
poses = SE3(torch.as_tensor(poses))
poses = poses[[0]].inv() * poses
poses = poses.data.cpu().numpy()
self.images = images
self.poses = poses
self.depths = depths
self.intrinsics = intrinsics
@staticmethod
def image_read(image_file):
return cv2.imread(image_file)
@staticmethod
def depth_read(depth_file):
depth = cv2.imread(depth_file, cv2.IMREAD_ANYDEPTH)
return depth.astype(np.float32) / 5000.0
import pickle
import os
import os.path as osp
# RGBD-Dataset
from .tartan import TartanAir
from .nyu2 import NYUv2
from .eth3d import ETH3D
from .scannet import ScanNet
# streaming datasets for inference
from .eth3d import ETH3DStream
from .tum import TUMStream
from .tartan import TartanAirStream
def dataset_factory(dataset_list, **kwargs):
""" create a combined dataset """
from torch.utils.data import ConcatDataset
dataset_map = {
'tartan': (TartanAir, 1),
'nyu': (NYUv2, 2),
'eth': (ETH3D, 5),
'scannet': (ScanNet, 1)}
db_list = []
for key in dataset_list:
# cache datasets for faster future loading
db = dataset_map[key][0](**kwargs)
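        # replicate the dataset index (via RGBDDataset.__imul__) to weight datasets in the concatenation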
db *= dataset_map[key][1]
print("Dataset {} has {} images".format(key, len(db)))
db_list.append(db)
return ConcatDataset(db_list)
def create_datastream(dataset_path, **kwargs):
""" create data_loader to stream images 1 by 1 """
from torch.utils.data import DataLoader
if osp.isfile(osp.join(dataset_path, 'calibration.txt')):
db = ETH3DStream(dataset_path, **kwargs)
elif osp.isfile(osp.join(dataset_path, 'rgb.txt')):
db = TUMStream(dataset_path, **kwargs)
elif osp.isdir(osp.join(dataset_path, 'image_left')):
        db = TartanAirStream(dataset_path, **kwargs)
stream = DataLoader(db, shuffle=False, batch_size=1, num_workers=4)
return stream
import numpy as np
import torch
import glob
import cv2
import os
import os.path as osp
from .base import RGBDDataset
from .augmentation import RGBDAugmentor
from .rgbd_utils import all_pairs_distance_matrix, loadtum
class NYUv2(RGBDDataset):
def __init__(self, **kwargs):
super(NYUv2, self).__init__(root='datasets/NYUv2', name='NYUv2', **kwargs)
@staticmethod
def is_test_scene(scene):
return False
def _build_dataset(self):
from tqdm import tqdm
print("Building NYUv2 dataset")
scene_info = {}
dataset_index = []
scenes = os.listdir(self.root)
for scene in tqdm(scenes):
scene_path = osp.join(self.root, scene)
images, depths, poses, intrinsics = loadtum(scene_path, frame_rate=10)
# filter out some errors in dataset
if images is None or len(images) < 8:
continue
intrinsic = NYUv2.calib_read()
intrinsics = [intrinsic] * len(images)
# graph of co-visible frames based on flow
graph = self.build_frame_graph(poses, depths, intrinsics)
scene_info[scene] = {'images': images, 'depths': depths,
'poses': poses, 'intrinsics': intrinsics, 'graph': graph}
return scene_info
@staticmethod
def calib_read():
fx = 5.1885790117450188e+02
fy = 5.1946961112127485e+02
cx = 3.2558244941119034e+02
cy = 2.5373616633400465e+02
return np.array([fx, fy, cx, cy])
@staticmethod
def image_read(image_file):
return cv2.imread(image_file)
@staticmethod
def depth_read(depth_file):
depth = cv2.imread(depth_file, cv2.IMREAD_ANYDEPTH)
return depth.astype(np.float32) / 5000.0
import numpy as np
import os.path as osp
import torch
from lietorch import SE3
import geom.projective_ops as pops
from scipy.spatial.transform import Rotation
def parse_list(filepath, skiprows=0):
""" read list data """
data = np.loadtxt(filepath, delimiter=' ', dtype=np.unicode_, skiprows=skiprows)
return data
def associate_frames(tstamp_image, tstamp_depth, tstamp_pose, max_dt=0.08):
""" pair images, depths, and poses """
associations = []
for i, t in enumerate(tstamp_image):
if tstamp_pose is None:
j = np.argmin(np.abs(tstamp_depth - t))
if (np.abs(tstamp_depth[j] - t) < max_dt):
associations.append((i, j))
else:
j = np.argmin(np.abs(tstamp_depth - t))
k = np.argmin(np.abs(tstamp_pose - t))
if (np.abs(tstamp_depth[j] - t) < max_dt) and \
(np.abs(tstamp_pose[k] - t) < max_dt):
associations.append((i, j, k))
return associations
def loadtum(datapath, frame_rate=-1):
""" read video data in tum-rgbd format """
if osp.isfile(osp.join(datapath, 'groundtruth.txt')):
pose_list = osp.join(datapath, 'groundtruth.txt')
elif osp.isfile(osp.join(datapath, 'pose.txt')):
pose_list = osp.join(datapath, 'pose.txt')
image_list = osp.join(datapath, 'rgb.txt')
depth_list = osp.join(datapath, 'depth.txt')
calib_path = osp.join(datapath, 'calibration.txt')
intrinsic = None
if osp.isfile(calib_path):
intrinsic = np.loadtxt(calib_path, delimiter=' ')
intrinsic = intrinsic.astype(np.float64)
image_data = parse_list(image_list)
depth_data = parse_list(depth_list)
pose_data = parse_list(pose_list, skiprows=1)
pose_vecs = pose_data[:,1:].astype(np.float64)
tstamp_image = image_data[:,0].astype(np.float64)
tstamp_depth = depth_data[:,0].astype(np.float64)
tstamp_pose = pose_data[:,0].astype(np.float64)
associations = associate_frames(tstamp_image, tstamp_depth, tstamp_pose)
indicies = [ 0 ]
for i in range(1, len(associations)):
t0 = tstamp_image[associations[indicies[-1]][0]]
t1 = tstamp_image[associations[i][0]]
if t1 - t0 > 1.0 / frame_rate:
indicies += [ i ]
images, poses, depths, intrinsics = [], [], [], []
for ix in indicies:
(i, j, k) = associations[ix]
images += [ osp.join(datapath, image_data[i,1]) ]
depths += [ osp.join(datapath, depth_data[j,1]) ]
poses += [ pose_vecs[k] ]
if intrinsic is not None:
intrinsics += [ intrinsic ]
return images, depths, poses, intrinsics
def all_pairs_distance_matrix(poses, beta=2.5):
""" compute distance matrix between all pairs of poses """
poses = np.array(poses, dtype=np.float32)
    poses[:,:3] *= beta # scale to balance rot + trans
poses = SE3(torch.from_numpy(poses))
r = (poses[:,None].inv() * poses[None,:]).log()
return r.norm(dim=-1).cpu().numpy()
def pose_matrix_to_quaternion(pose):
""" convert 4x4 pose matrix to (t, q) """
q = Rotation.from_matrix(pose[:3, :3]).as_quat()
return np.concatenate([pose[:3, 3], q], axis=0)
def compute_distance_matrix_flow(poses, disps, intrinsics):
""" compute flow magnitude between all pairs of frames """
if not isinstance(poses, SE3):
poses = torch.from_numpy(poses).float().cuda()[None]
poses = SE3(poses).inv()
disps = torch.from_numpy(disps).float().cuda()[None]
intrinsics = torch.from_numpy(intrinsics).float().cuda()[None]
N = poses.shape[1]
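    # all ordered frame pairs (ii, jj) for which flow magnitude will be evaluated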
ii, jj = torch.meshgrid(torch.arange(N), torch.arange(N))
ii = ii.reshape(-1).cuda()
jj = jj.reshape(-1).cuda()
MAX_FLOW = 100.0
matrix = np.zeros((N, N), dtype=np.float32)
s = 2048
for i in range(0, ii.shape[0], s):
flow1, val1 = pops.induced_flow(poses, disps, intrinsics, ii[i:i+s], jj[i:i+s])
flow2, val2 = pops.induced_flow(poses, disps, intrinsics, jj[i:i+s], ii[i:i+s])
flow = torch.stack([flow1, flow2], dim=2)
val = torch.stack([val1, val2], dim=2)
mag = flow.norm(dim=-1).clamp(max=MAX_FLOW)
mag = mag.view(mag.shape[1], -1)
val = val.view(val.shape[1], -1)
mag = (mag * val).mean(-1) / val.mean(-1)
mag[val.mean(-1) < 0.7] = np.inf
i1 = ii[i:i+s].cpu().numpy()
j1 = jj[i:i+s].cpu().numpy()
matrix[i1, j1] = mag.cpu().numpy()
return matrix
import re
import numpy as np
import pickle
import torch
import glob
import cv2
import os
import os.path as osp
from .base import RGBDDataset
from .augmentation import RGBDAugmentor
from .rgbd_utils import pose_matrix_to_quaternion
from .rgbd_utils import all_pairs_distance_matrix
class ScanNet(RGBDDataset):
def __init__(self, **kwargs):
super(ScanNet, self).__init__(root='datasets/ScanNet', name='ScanNet', **kwargs)
@staticmethod
def is_test_scene(scene):
scanid = int(re.findall(r'scene(.+?)_', scene)[0])
return scanid > 660
def _build_dataset_index(self):
""" construct scene_info and dataset_index objects """
from tqdm import tqdm
print("Building ScanNet dataset")
scene_info = {}
dataset_index = []
for scene in tqdm(os.listdir(self.root)):
scene_path = osp.join(self.root, scene)
depth_glob = osp.join(scene_path, 'depth', '*.png')
depth_list = glob.glob(depth_glob)
get_indicies = lambda x: int(osp.basename(x).split('.')[0])
get_images = lambda i: osp.join(scene_path, 'color', '%d.jpg' % i)
get_depths = lambda i: osp.join(scene_path, 'depth', '%d.png' % i)
get_poses = lambda i: osp.join(scene_path, 'pose', '%d.txt' % i)
indicies = sorted(map(get_indicies, depth_list))[::2]
image_list = list(map(get_images, indicies))
depth_list = list(map(get_depths, indicies))
pose_list = map(get_poses, indicies)
pose_list = list(map(ScanNet.pose_read, pose_list))
# remove nan poses
pvecs = np.stack(pose_list, 0)
keep, = np.where(~np.any(np.isnan(pvecs) | np.isinf(pvecs), axis=1))
images = [image_list[i] for i in keep]
depths = [depth_list[i] for i in keep]
poses = [pose_list[i] for i in keep]
intrinsic = ScanNet.calib_read(scene_path)
intrinsics = [intrinsic] * len(images)
graph = self.build_frame_graph(poses, depths, intrinsics)
scene_info[scene] = {'images': images, 'depths': depths,
'poses': poses, 'intrinsics': intrinsics, 'graph': graph}
for i in range(len(images)):
if len(graph[i][0]) > 1:
dataset_index.append((scene, i))
return scene_info, dataset_index
@staticmethod
def calib_read(scene_path):
intrinsic_file = osp.join(scene_path, 'intrinsic', 'intrinsic_depth.txt')
K = np.loadtxt(intrinsic_file, delimiter=' ')
return np.array([K[0,0], K[1,1], K[0,2], K[1,2]])
@staticmethod
def pose_read(pose_file):
pose = np.loadtxt(pose_file, delimiter=' ').astype(np.float64)
return pose_matrix_to_quaternion(pose)
@staticmethod
def image_read(image_file):
image = cv2.imread(image_file)
return cv2.resize(image, (640, 480))
@staticmethod
def depth_read(depth_file):
depth = cv2.imread(depth_file, cv2.IMREAD_ANYDEPTH)
return depth.astype(np.float32) / 1000.0
import numpy as np
import torch
import torch.utils.data as data
import torch.nn.functional as F
import csv
import os
import cv2
import math
import random
import json
import pickle
import os.path as osp
from .rgbd_utils import *
class RGBDStream(data.Dataset):
def __init__(self, datapath, frame_rate=-1, crop_size=[384,512]):
self.datapath = datapath
self.frame_rate = frame_rate
self.crop_size = crop_size
self._build_dataset_index()
@staticmethod
def image_read(image_file):
return cv2.imread(image_file)
@staticmethod
def depth_read(depth_file):
return np.load(depth_file)
def __len__(self):
return len(self.images)
def __getitem__(self, index):
""" return training video """
image = self.__class__.image_read(self.images[index])
image = torch.from_numpy(image).float()
image = image.permute(2, 0, 1)
depth = self.__class__.depth_read(self.depths[index])
depth = torch.from_numpy(depth).float()
pose = torch.from_numpy(self.poses[index]).float()
intrinsic = torch.from_numpy(self.intrinsics[index]).float()
sx = self.crop_size[1] / depth.shape[1]
sy = self.crop_size[0] / depth.shape[0]
image = F.interpolate(image[None], self.crop_size, mode='bilinear', align_corners=True)[0]
depth = F.interpolate(depth[None,None], self.crop_size, mode='nearest')[0,0]
image = image[..., 8:-8, 8:-8]
depth = depth[..., 8:-8, 8:-8]
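        # the resize and 8-pixel border crop change the camera model: rescale focal lengths and shift the principal point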
fx, fy, cx, cy = intrinsic.unbind(dim=0)
intrinsic = torch.stack([sx*fx, sy*fy, sx*cx - 8, sy*cy - 8])
# intrinsic *= torch.as_tensor([sx, sy, sx, sy])
return index, image, depth, pose, intrinsic
import numpy as np
import torch
import glob
import cv2
import os
import os.path as osp
from .base import RGBDDataset
from .stream import RGBDStream
class TartanAir(RGBDDataset):
# scale depths to balance rot & trans
DEPTH_SCALE = 5.0
TEST_SET = ['westerndesert', 'seasidetown', 'seasonsforest_winter', 'office2', 'gascola']
def __init__(self, mode='training', **kwargs):
self.mode = mode
self.n_frames = 2
super(TartanAir, self).__init__(root='datasets/TartanAir', name='TartanAir', **kwargs)
@staticmethod
def is_test_scene(scene):
return scene.split('/')[-3] in TartanAir.TEST_SET
def _build_dataset(self):
from tqdm import tqdm
print("Building TartanAir dataset")
scene_info = {}
scenes = glob.glob(osp.join(self.root, '*/*/*/*'))
for scene in tqdm(sorted(scenes)):
images = sorted(glob.glob(osp.join(scene, 'image_left/*.png')))
depths = sorted(glob.glob(osp.join(scene, 'depth_left/*.npy')))
poses = np.loadtxt(osp.join(scene, 'pose_left.txt'), delimiter=' ')
poses = poses[:, [1, 2, 0, 4, 5, 3, 6]]
poses[:,:3] /= TartanAir.DEPTH_SCALE
intrinsics = [TartanAir.calib_read()] * len(images)
# graph of co-visible frames based on flow
graph = self.build_frame_graph(poses, depths, intrinsics)
scene = '/'.join(scene.split('/'))
scene_info[scene] = {'images': images, 'depths': depths,
'poses': poses, 'intrinsics': intrinsics, 'graph': graph}
return scene_info
@staticmethod
def calib_read():
return np.array([320.0, 320.0, 320.0, 240.0])
@staticmethod
def image_read(image_file):
return cv2.imread(image_file)
@staticmethod
def depth_read(depth_file):
depth = np.load(depth_file) / TartanAir.DEPTH_SCALE
        depth[np.isnan(depth)] = 1.0
        depth[np.isinf(depth)] = 1.0
return depth
class TartanAirTest(torch.utils.data.Dataset):
def __init__(self, root='datasets/Tartan'):
self.root = root
self.dataset_index = []
self.scene_info = {}
scenes = glob.glob(osp.join(self.root, '*/*/*/*'))
for scene in sorted(scenes):
image_glob = osp.join(scene, 'image_left/*.png')
depth_glob = osp.join(scene, 'depth_left/*.npy')
images = sorted(glob.glob(image_glob))
depths = sorted(glob.glob(depth_glob))
poses = np.loadtxt(osp.join(scene, 'pose_left.txt'), delimiter=' ')
poses = poses[:, [1, 2, 0, 4, 5, 3, 6]]
poses[:,:3] /= TartanAir.DEPTH_SCALE
intrinsics = [TartanAir.calib_read()] * len(images)
self.scene_info[scene] = {'images': images,
'depths': depths, 'poses': poses, 'intrinsics': intrinsics}
with open('assets/tartan_test.txt') as f:
self.dataset_index = f.readlines()
def __getitem__(self, index):
""" load test example """
scene_id, ix1, ix2 = self.dataset_index[index].split()
inds = [int(ix1), int(ix2)]
images_list = self.scene_info[scene_id]['images']
depths_list = self.scene_info[scene_id]['depths']
poses_list = self.scene_info[scene_id]['poses']
intrinsics_list = self.scene_info[scene_id]['intrinsics']
images, depths, poses, intrinsics = [], [], [], []
for i in inds:
images.append(TartanAir.image_read(images_list[i]))
depths.append(TartanAir.depth_read(depths_list[i]))
poses.append(poses_list[i])
intrinsics.append(intrinsics_list[i])
images = np.stack(images).astype(np.float32)
depths = np.stack(depths).astype(np.float32)
poses = np.stack(poses).astype(np.float32)
intrinsics = np.stack(intrinsics).astype(np.float32)
images = torch.from_numpy(images).float()
images = images.permute(0, 3, 1, 2)
depths = torch.from_numpy(depths)
poses = torch.from_numpy(poses)
intrinsics = torch.from_numpy(intrinsics)
return images, poses, depths, intrinsics
def __len__(self):
return len(self.dataset_index)
class TartanAirStream(RGBDStream):
def __init__(self, datapath, **kwargs):
super(TartanAirStream, self).__init__(datapath=datapath, **kwargs)
def _build_dataset_index(self):
""" build list of images, poses, depths, and intrinsics """
        images, depths, poses, intrinsics = loadtum(self.datapath)
        intrinsic = TartanAirStream.calib_read(self.datapath)
        intrinsics = np.tile(intrinsic[None], (len(images), 1))
self.images = images
self.poses = poses
self.depths = depths
self.intrinsics = intrinsics
@staticmethod
def calib_read(datapath):
return np.array([320.0, 320.0, 320.0, 240.0])
@staticmethod
def image_read(image_file):
return cv2.imread(image_file)
@staticmethod
def depth_read(depth_file):
depth = cv2.imread(depth_file, cv2.IMREAD_ANYDEPTH)
return depth.astype(np.float32) / 5000.0
import numpy as np
import torch
import csv
import os
import cv2
import math
import random
import json
import pickle
import os.path as osp
from lietorch import SE3
from .stream import RGBDStream
from .rgbd_utils import loadtum
intrinsics_dict = {
'freiburg1': [517.3, 516.5, 318.6, 255.3],
'freiburg2': [520.9, 521.0, 325.1, 249.7],
'freiburg3': [535.4, 539.2, 320.1, 247.6],
}
distortion_dict = {
'freiburg1': [0.2624, -0.9531, -0.0054, 0.0026, 1.1633],
'freiburg2': [0.2312, -0.7849, -0.0033, -0.0001, 0.9172],
'freiburg3': [0, 0, 0, 0, 0],
}
def as_intrinsics_matrix(intrinsics):
K = np.eye(3)
K[0,0] = intrinsics[0]
K[1,1] = intrinsics[1]
K[0,2] = intrinsics[2]
K[1,2] = intrinsics[3]
return K
class TUMStream(RGBDStream):
def __init__(self, datapath, **kwargs):
super(TUMStream, self).__init__(datapath=datapath, **kwargs)
def _build_dataset_index(self):
""" build list of images, poses, depths, and intrinsics """
images, depths, poses, intrinsics = loadtum(self.datapath, self.frame_rate)
intrinsic, _ = TUMStream.calib_read(self.datapath)
intrinsics = np.tile(intrinsic[None], (len(images), 1))
# set first pose to identity
poses = SE3(torch.as_tensor(poses))
poses = poses[[0]].inv() * poses
poses = poses.data.cpu().numpy()
self.images = images
self.poses = poses
self.depths = depths
self.intrinsics = intrinsics
@staticmethod
def calib_read(datapath):
if 'freiburg1' in datapath:
intrinsic = intrinsics_dict['freiburg1']
d_coef = distortion_dict['freiburg1']
elif 'freiburg2' in datapath:
intrinsic = intrinsics_dict['freiburg2']
d_coef = distortion_dict['freiburg2']
elif 'freiburg3' in datapath:
intrinsic = intrinsics_dict['freiburg3']
d_coef = distortion_dict['freiburg3']
return np.array(intrinsic), np.array(d_coef)
@staticmethod
def image_read(image_file):
intrinsics, d_coef = TUMStream.calib_read(image_file)
K = as_intrinsics_matrix(intrinsics)
image = cv2.imread(image_file)
return cv2.undistort(image, K, d_coef)
@staticmethod
def depth_read(depth_file):
depth = cv2.imread(depth_file, cv2.IMREAD_ANYDEPTH)
return depth.astype(np.float32) / 5000.0
import lietorch
import torch
import torch.nn.functional as F
from .chol import block_solve
import geom.projective_ops as pops
# utility functions for scattering ops
def safe_scatter_add_mat(H, data, ii, jj, B, M, D):
v = (ii >= 0) & (jj >= 0)
H.scatter_add_(1, (ii[v]*M + jj[v]).view(1,-1,1,1).repeat(B,1,D,D), data[:,v])
def safe_scatter_add_vec(b, data, ii, B, M, D):
v = ii >= 0
b.scatter_add_(1, ii[v].view(1,-1,1).repeat(B,1,D), data[:,v])
def MoBA(target, weight, poses, disps, intrinsics, ii, jj, fixedp=1, lm=0.0001, ep=0.1):
""" MoBA: Motion Only Bundle Adjustment """
B, M = poses.shape[:2]
D = poses.manifold_dim
N = ii.shape[0]
    ### 1: compute jacobians and residuals ###
coords, valid, (Ji, Jj) = pops.projective_transform(
poses, disps, intrinsics, ii, jj, jacobian=True)
r = (target - coords).view(B, N, -1, 1)
w = (valid * weight).view(B, N, -1, 1)
### 2: construct linear system ###
Ji = Ji.view(B, N, -1, D)
Jj = Jj.view(B, N, -1, D)
wJiT = (.001 * w * Ji).transpose(2,3)
wJjT = (.001 * w * Jj).transpose(2,3)
Hii = torch.matmul(wJiT, Ji)
Hij = torch.matmul(wJiT, Jj)
Hji = torch.matmul(wJjT, Ji)
Hjj = torch.matmul(wJjT, Jj)
vi = torch.matmul(wJiT, r).squeeze(-1)
vj = torch.matmul(wJjT, r).squeeze(-1)
# only optimize keyframe poses
M = M - fixedp
ii = ii - fixedp
jj = jj - fixedp
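    # observations involving fixed poses now have negative indices and are dropped by safe_scatter_add_*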
H = torch.zeros(B, M*M, D, D, device=target.device)
safe_scatter_add_mat(H, Hii, ii, ii, B, M, D)
safe_scatter_add_mat(H, Hij, ii, jj, B, M, D)
safe_scatter_add_mat(H, Hji, jj, ii, B, M, D)
safe_scatter_add_mat(H, Hjj, jj, jj, B, M, D)
H = H.reshape(B, M, M, D, D)
v = torch.zeros(B, M, D, device=target.device)
safe_scatter_add_vec(v, vi, ii, B, M, D)
safe_scatter_add_vec(v, vj, jj, B, M, D)
### 3: solve the system + apply retraction ###
dx = block_solve(H, v, ep=ep, lm=lm)
poses1, poses2 = poses[:,:fixedp], poses[:,fixedp:]
poses2 = poses2.retr(dx)
poses = lietorch.cat([poses1, poses2], dim=1)
return poses
def SLessBA(target, weight, poses, disps, intrinsics, ii, jj, fixedp=1):
""" Structureless Bundle Adjustment """
pass
def BA(target, weight, poses, disps, intrinsics, ii, jj, fixedp=1):
""" Full Bundle Adjustment """
pass
import torch
import torch.nn.functional as F
import geom.projective_ops as pops
class CholeskySolver(torch.autograd.Function):
@staticmethod
def forward(ctx, H, b):
# don't crash training if cholesky decomp fails
try:
U = torch.cholesky(H)
xs = torch.cholesky_solve(b, U)
ctx.save_for_backward(U, xs)
ctx.failed = False
except Exception as e:
print(e)
ctx.failed = True
xs = torch.zeros_like(b)
return xs
@staticmethod
def backward(ctx, grad_x):
if ctx.failed:
return None, None
U, xs = ctx.saved_tensors
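        # x = H^-1 b, so grad_b = H^-1 grad_x and grad_H = -x grad_b^T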
dz = torch.cholesky_solve(grad_x, U)
dH = -torch.matmul(xs, dz.transpose(-1,-2))
return dH, dz
def block_solve(H, b, ep=0.1, lm=0.0001):
""" solve normal equations """
B, N, _, D, _ = H.shape
I = torch.eye(D).to(H.device)
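    # damp the diagonal of each DxD block: absolute (ep) plus relative (lm) damping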
H = H + (ep + lm*H) * I
H = H.permute(0,1,3,2,4)
H = H.reshape(B, N*D, N*D)
b = b.reshape(B, N*D, 1)
x = CholeskySolver.apply(H,b)
return x.reshape(B, N, D)