# Copyright (c) Facebook, Inc. and its affiliates. All rights reserved.

import math
import warnings
from typing import Optional, Sequence, Tuple

import numpy as np
import torch
import torch.nn.functional as F
from pytorch3d.transforms import Rotate, Transform3d, Translate

from .utils import TensorProperties, convert_to_tensors_and_broadcast


# Default values for rotation and translation matrices.
_R = torch.eye(3)[None]  # (1, 3, 3)
_T = torch.zeros(1, 3)  # (1, 3)


class CamerasBase(TensorProperties):
    """
    `CamerasBase` implements a base class for all cameras.

    For cameras, there are four different coordinate systems (or spaces)
    - World coordinate system: This is the system in which the object/scene lives - the world.
    - Camera view coordinate system: This is the system that has its origin on the image plane
        and the Z-axis perpendicular to the image plane.
        In PyTorch3D, we assume that +X points left, +Y points up and
        +Z points out from the image plane.
        The transformation from world -> view happens after applying a rotation (R)
        and a translation (T).
    - NDC coordinate system: This is the normalized coordinate system that confines
        in a volume the rendered part of the object or scene. Also known as the view volume.
        Given the PyTorch3D convention, (+1, +1, znear) is the top left near corner,
        and (-1, -1, zfar) is the bottom right far corner of the volume.
        The transformation from view -> NDC happens after applying the camera
        projection matrix (P).
    - Screen coordinate system: This is another representation of the view volume with
        the XY coordinates defined in pixel space instead of a normalized space.

    A better illustration of the coordinate systems can be found in pytorch3d/docs/notes/cameras.md.

    It defines methods that are common to all camera models:
        - `get_camera_center` that returns the optical center of the camera in
    world coordinates
        - `get_world_to_view_transform` which returns a 3D transform from
    world coordinates to the camera view coordinates (R, T)
        - `get_full_projection_transform` which composes the projection
    transform (P) with the world-to-view transform (R, T)
        - `transform_points` which takes a set of input points in world coordinates and
    projects to NDC coordinates ranging from [-1, -1, znear] to [+1, +1, zfar].
        - `transform_points_screen` which takes a set of input points in world coordinates and
    projects them to the screen coordinates ranging from [0, 0, znear] to [W-1, H-1, zfar].

    For each new camera, one should implement the `get_projection_transform`
    routine that returns the mapping from camera view coordinates to NDC coordinates.

    Another useful function that is specific to each camera model is
    `unproject_points` which sends points from NDC coordinates back to
    camera view or world coordinates depending on the `world_coordinates`
    boolean argument of the function.
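
    A minimal usage sketch (illustrative only, assuming a concrete subclass such as
    `FoVPerspectiveCameras` and a hypothetical tensor `verts` of world-space points
    of shape (N, V, 3); `R`, `T` are the extrinsics described above):

    .. code-block:: python

        cameras = FoVPerspectiveCameras(R=R, T=T)
        verts_ndc = cameras.transform_points(verts)
        verts_screen = cameras.transform_points_screen(verts, image_size=((128, 128),))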
    """

    def get_projection_transform(self, **kwargs):
        """
        Calculate the projective transformation matrix.

        Args:
            **kwargs: parameters for the projection can be passed in as keyword
                arguments to override the default values set in `__init__`.

        Return:
            a `Transform3d` object which represents a batch of projection
            matrices of shape (N, 4, 4)
        """
        raise NotImplementedError()

    def unproject_points(self):
        """
        Transform input points from NDC coordinates
        to the world / camera coordinates.

        Each of the input points `xy_depth` of shape (..., 3) is
        a concatenation of the x, y location and its depth.

        For instance, for an input 2D tensor of shape `(num_points, 3)`
        `xy_depth` takes the following form:
            `xy_depth[i] = [x[i], y[i], depth[i]]`,
        for each point at index `i`.

        The following example demonstrates the relationship between
        `transform_points` and `unproject_points`:

        .. code-block:: python

            cameras = # camera object derived from CamerasBase
            xyz = # 3D points of shape (batch_size, num_points, 3)
            # transform xyz to the camera view coordinates
            xyz_cam = cameras.get_world_to_view_transform().transform_points(xyz)
            # extract the depth of each point as the 3rd coord of xyz_cam
            depth = xyz_cam[:, :, 2:]
            # project the points xyz to the camera
            xy = cameras.transform_points(xyz)[:, :, :2]
            # append depth to xy
            xy_depth = torch.cat((xy, depth), dim=2)
            # unproject to the world coordinates
            xyz_unproj_world = cameras.unproject_points(xy_depth, world_coordinates=True)
            print(torch.allclose(xyz, xyz_unproj_world)) # True
            # unproject to the camera coordinates
            xyz_unproj = cameras.unproject_points(xy_depth, world_coordinates=False)
            print(torch.allclose(xyz_cam, xyz_unproj)) # True

        Args:
            xy_depth: torch tensor of shape (..., 3).
            world_coordinates: If `True`, unprojects the points back to world
                coordinates using the camera extrinsics `R` and `T`.
                `False` ignores `R` and `T` and unprojects to
                the camera view coordinates.

        Returns
            new_points: unprojected points with the same shape as `xy_depth`.
        """
        raise NotImplementedError()

    def get_camera_center(self, **kwargs) -> torch.Tensor:
        """
        Return the 3D location of the camera optical center
        in the world coordinates.

        Args:
            **kwargs: parameters for the camera extrinsics can be passed in
                as keyword arguments to override the default values
                set in __init__.

        Setting T here will update the values set in init as this
        value may be needed later on in the rendering pipeline e.g. for
        lighting calculations.

        Returns:
            C: a batch of 3D locations of shape (N, 3) denoting
            the locations of the center of each camera in the batch.
        """
        w2v_trans = self.get_world_to_view_transform(**kwargs)
        P = w2v_trans.inverse().get_matrix()
        # the camera center is the translation component (the first 3 elements
        # of the last row) of the inverted world-to-view
        # transform (4x4 RT matrix)
        C = P[:, 3, :3]
        return C

    def get_world_to_view_transform(self, **kwargs) -> Transform3d:
        """
        Return the world-to-view transform.

        Args:
            **kwargs: parameters for the camera extrinsics can be passed in
                as keyword arguments to override the default values
                set in __init__.

        Setting R and T here will update the values set in init as these
        values may be needed later on in the rendering pipeline e.g. for
        lighting calculations.

        Returns:
            A Transform3d object which represents a batch of transforms
            of shape (N, 4, 4)
        """
        self.R = kwargs.get("R", self.R)  # pyre-ignore[16]
        self.T = kwargs.get("T", self.T)  # pyre-ignore[16]
        world_to_view_transform = get_world_to_view_transform(R=self.R, T=self.T)
        return world_to_view_transform

    def get_full_projection_transform(self, **kwargs) -> Transform3d:
        """
        Return the full world-to-NDC transform composing the
        world-to-view and view-to-NDC transforms.

        Args:
            **kwargs: parameters for the projection transforms can be passed in
                as keyword arguments to override the default values
                set in __init__.

        Setting R and T here will update the values set in init as these
        values may be needed later on in the rendering pipeline e.g. for
        lighting calculations.

        Returns:
            a Transform3d object which represents a batch of transforms
            of shape (N, 4, 4)
        """
        self.R = kwargs.get("R", self.R)  # pyre-ignore[16]
        self.T = kwargs.get("T", self.T)  # pyre-ignore[16]
        world_to_view_transform = self.get_world_to_view_transform(R=self.R, T=self.T)
        view_to_ndc_transform = self.get_projection_transform(**kwargs)
        return world_to_view_transform.compose(view_to_ndc_transform)

    def transform_points(
        self, points, eps: Optional[float] = None, **kwargs
    ) -> torch.Tensor:
        """
        Transform input points from world to NDC space.

        Args:
            points: torch tensor of shape (..., 3).
            eps: If eps!=None, the argument is used to clamp the
                divisor in the homogeneous normalization of the points
                transformed to the ndc space. Please see
                `transforms.Transform3d.transform_points` for details.

                For `CamerasBase.transform_points`, setting `eps > 0`
                stabilizes gradients since it avoids division
                by excessively small numbers for points close to the
                camera plane.

        Returns
            new_points: transformed points with the same shape as the input.
        """
        world_to_ndc_transform = self.get_full_projection_transform(**kwargs)
        return world_to_ndc_transform.transform_points(points, eps=eps)

    def transform_points_screen(
        self, points, image_size, eps: Optional[float] = None, **kwargs
    ) -> torch.Tensor:
        """
        Transform input points from world to screen space.

        Args:
            points: torch tensor of shape (N, V, 3).
            image_size: torch tensor of shape (N, 2)
            eps: If eps!=None, the argument is used to clamp the
                divisor in the homogeneous normalization of the points
                transformed to the ndc space. Please see
                `transforms.Transform3d.transform_points` for details.

                For `CamerasBase.transform_points`, setting `eps > 0`
                stabilizes gradients since it avoids division
                by excessively small numbers for points close to the
                camera plane.

        Returns
            new_points: transformed points with the same shape as the input.
        """

        ndc_points = self.transform_points(points, eps=eps, **kwargs)

        if not torch.is_tensor(image_size):
            image_size = torch.tensor(
                image_size, dtype=torch.int64, device=points.device
            )
        if (image_size < 1).any():
            raise ValueError("Provided image size is invalid.")

        image_width, image_height = image_size.unbind(1)
        image_width = image_width.view(-1, 1)  # (N, 1)
        image_height = image_height.view(-1, 1)  # (N, 1)

        ndc_z = ndc_points[..., 2]
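        # NDC -> screen mapping: ndc_x = +1 (the left edge in the PyTorch3D
        # convention) maps to pixel column 0 and ndc_x = -1 maps to W-1;
        # similarly ndc_y = +1 (top) maps to row 0, so (0, 0) is the top-left pixel.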
        screen_x = (image_width - 1.0) / 2.0 * (1.0 - ndc_points[..., 0])
        screen_y = (image_height - 1.0) / 2.0 * (1.0 - ndc_points[..., 1])

        return torch.stack((screen_x, screen_y, ndc_z), dim=2)

    def clone(self):
        """
        Returns a copy of `self`.
        """
        cam_type = type(self)
        other = cam_type(device=self.device)
        return super().clone(other)


############################################################
#             Field of View Camera Classes                 #
############################################################


def OpenGLPerspectiveCameras(
    znear=1.0,
    zfar=100.0,
    aspect_ratio=1.0,
    fov=60.0,
    degrees: bool = True,
    R=_R,
    T=_T,
    device="cpu",
):
    """
    OpenGLPerspectiveCameras has been DEPRECATED. Use FoVPerspectiveCameras instead.
    Preserving OpenGLPerspectiveCameras for backward compatibility.
    """

    warnings.warn(
        """OpenGLPerspectiveCameras is deprecated,
        Use FoVPerspectiveCameras instead.
        OpenGLPerspectiveCameras will be removed in future releases.""",
        PendingDeprecationWarning,
    )

    return FoVPerspectiveCameras(
        znear=znear,
        zfar=zfar,
        aspect_ratio=aspect_ratio,
        fov=fov,
        degrees=degrees,
        R=R,
        T=T,
        device=device,
    )


class FoVPerspectiveCameras(CamerasBase):
    """
    A class which stores a batch of parameters to generate a batch of
    projection matrices by specifying the field of view.
    The definition of the parameters follows the OpenGL perspective camera.

    The extrinsics of the camera (R and T matrices) can also be set in the
    initializer or passed in to `get_full_projection_transform` to get
    the full transformation from world -> ndc.

    The `transform_points` method calculates the full world -> ndc transform
    and then applies it to the input points.

    The transforms can also be returned separately as Transform3d objects.
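
    An illustrative sketch of typical usage (parameter values here are arbitrary):

    .. code-block:: python

        cameras = FoVPerspectiveCameras(znear=0.1, zfar=10.0, fov=60.0, R=R, T=T)
        P = cameras.get_projection_transform()                # view -> ndc
        points_ndc = cameras.transform_points(points_world)   # world -> ndc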
    """

    def __init__(
        self,
        znear=1.0,
        zfar=100.0,
        aspect_ratio=1.0,
        fov=60.0,
        degrees: bool = True,
        R=_R,
        T=_T,
        K=None,
        device="cpu",
    ):
        """

        Args:
            znear: near clipping plane of the view frustum.
            zfar: far clipping plane of the view frustum.
            aspect_ratio: ratio of screen_width/screen_height.
            fov: field of view angle of the camera.
            degrees: bool, set to True if fov is specified in degrees.
            R: Rotation matrix of shape (N, 3, 3)
            T: Translation matrix of shape (N, 3)
            K: (optional) A calibration matrix of shape (N, 4, 4)
                If provided, don't need znear, zfar, fov, aspect_ratio, degrees
            device: torch.device or string
        """
        # The initializer formats all inputs to torch tensors and broadcasts
        # all the inputs to have the same batch dimension where necessary.
        super().__init__(
            device=device,
            znear=znear,
            zfar=zfar,
            aspect_ratio=aspect_ratio,
            fov=fov,
            R=R,
            T=T,
            K=K,
        )

        # No need to convert to tensor or broadcast.
        self.degrees = degrees

    def compute_projection_matrix(
        self, znear, zfar, fov, aspect_ratio, degrees
    ) -> torch.Tensor:
        """
        Compute the calibration matrix K of shape (N, 4, 4)

        Args:
            znear: near clipping plane of the view frustum.
            zfar: far clipping plane of the view frustum.
            fov: field of view angle of the camera.
            aspect_ratio: ratio of screen_width/screen_height.
            degrees: bool, set to True if fov is specified in degrees.

        Returns:
            torch.FloatTensor of the calibration matrix with shape (N, 4, 4)
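
        As a worked example (illustrative only): with fov=60 degrees, znear=1.0
        and aspect_ratio=1.0, tanHalfFov = tan(30 deg) ~= 0.577, so
        K[:, 0, 0] = K[:, 1, 1] = 1 / 0.577 ~= 1.732.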
        """
        K = torch.zeros((self._N, 4, 4), device=self.device, dtype=torch.float32)
        ones = torch.ones((self._N), dtype=torch.float32, device=self.device)
        if degrees:
            fov = (np.pi / 180) * fov

        if not torch.is_tensor(fov):
            fov = torch.tensor(fov, device=self.device)
        tanHalfFov = torch.tan((fov / 2))
        max_y = tanHalfFov * znear
        min_y = -max_y
        max_x = max_y * aspect_ratio
        min_x = -max_x

        # NOTE: In OpenGL the projection matrix changes the handedness of the
        # coordinate frame. i.e. the NDC space positive z direction is the
        # camera space negative z direction. This is because the sign of the z
        # in the projection matrix is set to -1.0.
        # In PyTorch3D we maintain a right handed coordinate system throughout
        # so the z sign is 1.0.
        z_sign = 1.0

        K[:, 0, 0] = 2.0 * znear / (max_x - min_x)
        K[:, 1, 1] = 2.0 * znear / (max_y - min_y)
        K[:, 0, 2] = (max_x + min_x) / (max_x - min_x)
        K[:, 1, 2] = (max_y + min_y) / (max_y - min_y)
        K[:, 3, 2] = z_sign * ones

        # NOTE: This maps the z coordinate from [0, 1] where z = 0 if the point
        # is at the near clipping plane and z = 1 when the point is at the far
        # clipping plane.
        K[:, 2, 2] = z_sign * zfar / (zfar - znear)
        K[:, 2, 3] = -(zfar * znear) / (zfar - znear)

        return K

    def get_projection_transform(self, **kwargs) -> Transform3d:
        """
        Calculate the perspective projection matrix with a symmetric
        viewing frustum. Use column major order.
        The viewing frustum will be projected into ndc, s.t.
        (max_x, max_y) -> (+1, +1)
        (min_x, min_y) -> (-1, -1)

        Args:
            **kwargs: parameters for the projection can be passed in as keyword
                arguments to override the default values set in `__init__`.

        Return:
            a Transform3d object which represents a batch of projection
            matrices of shape (N, 4, 4)

        .. code-block:: python

            h1 = (max_y + min_y)/(max_y - min_y)
            w1 = (max_x + min_x)/(max_x - min_x)
            tanhalffov = tan((fov/2))
            s1 = 1/tanhalffov
            s2 = 1/(tanhalffov * (aspect_ratio))

            # To map z to the range [0, 1] use:
            f1 =  far / (far - near)
            f2 = -(far * near) / (far - near)

            # Projection matrix
            K = [
                    [s1,   0,   w1,   0],
                    [0,   s2,   h1,   0],
                    [0,    0,   f1,  f2],
                    [0,    0,    1,   0],
            ]
        """
        K = kwargs.get("K", self.K)  # pyre-ignore[16]
        if K is not None:
            if K.shape != (self._N, 4, 4):
                msg = "Expected K to have shape of (%r, 4, 4)"
                raise ValueError(msg % (self._N))
        else:
            K = self.compute_projection_matrix(
                kwargs.get("znear", self.znear),  # pyre-ignore[16]
                kwargs.get("zfar", self.zfar),  # pyre-ignore[16]
                kwargs.get("fov", self.fov),  # pyre-ignore[16]
                kwargs.get("aspect_ratio", self.aspect_ratio),  # pyre-ignore[16]
                kwargs.get("degrees", self.degrees),
            )

        # Transpose the projection matrix as PyTorch3D transforms use row vectors.
        transform = Transform3d(device=self.device)
        transform._matrix = K.transpose(1, 2).contiguous()
        return transform

    def unproject_points(
        self,
        xy_depth: torch.Tensor,
        world_coordinates: bool = True,
        scaled_depth_input: bool = False,
        **kwargs
    ) -> torch.Tensor:
        """
        FoV cameras further allow for passing depth in world units
        (`scaled_depth_input=False`) or in the [0, 1]-normalized units
        (`scaled_depth_input=True`).

        Args:
            scaled_depth_input: If `True`, assumes the input depth is in
                the [0, 1]-normalized units. If `False`, the input depth is in
                world units.
        """

        # obtain the relevant transformation to ndc
        if world_coordinates:
            to_ndc_transform = self.get_full_projection_transform()
        else:
            to_ndc_transform = self.get_projection_transform()

        if scaled_depth_input:
            # the input is scaled depth, so we don't have to do anything
            xy_sdepth = xy_depth
        else:
            # parse out important values from the projection matrix
            K_matrix = self.get_projection_transform(**kwargs.copy()).get_matrix()
            # parse out f1, f2 from K_matrix
            unsqueeze_shape = [1] * xy_depth.dim()
            unsqueeze_shape[0] = K_matrix.shape[0]
            f1 = K_matrix[:, 2, 2].reshape(unsqueeze_shape)
            f2 = K_matrix[:, 3, 2].reshape(unsqueeze_shape)
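            # The perspective projection maps view-space depth z to ndc depth as
            # z_ndc = f1 + f2 / z (a perspective divide), so the world-unit depth
            # below is converted to its [0, 1]-scaled equivalent before inverting
            # the transform.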
            # get the scaled depth
            sdepth = (f1 * xy_depth[..., 2:3] + f2) / xy_depth[..., 2:3]
            # concatenate xy + scaled depth
            xy_sdepth = torch.cat((xy_depth[..., 0:2], sdepth), dim=-1)

        # unproject with inverse of the projection
        unprojection_transform = to_ndc_transform.inverse()
        return unprojection_transform.transform_points(xy_sdepth)


def OpenGLOrthographicCameras(
    znear=1.0,
    zfar=100.0,
    top=1.0,
    bottom=-1.0,
    left=-1.0,
    right=1.0,
    scale_xyz=((1.0, 1.0, 1.0),),  # (1, 3)
    R=_R,
    T=_T,
    device="cpu",
):
    """
    OpenGLOrthographicCameras has been DEPRECATED. Use FoVOrthographicCameras instead.
    Preserving OpenGLOrthographicCameras for backward compatibility.
    """

    warnings.warn(
        """OpenGLOrthographicCameras is deprecated,
        Use FoVOrthographicCameras instead.
        OpenGLOrthographicCameras will be removed in future releases.""",
        PendingDeprecationWarning,
    )

    return FoVOrthographicCameras(
        znear=znear,
        zfar=zfar,
        max_y=top,
        min_y=bottom,
        max_x=right,
        min_x=left,
        scale_xyz=scale_xyz,
        R=R,
        T=T,
        device=device,
    )


class FoVOrthographicCameras(CamerasBase):
    """
    A class which stores a batch of parameters to generate a batch of
    projection matrices by specifying the field of view.
    The definition of the parameters follows the OpenGL orthographic camera.
    """

    def __init__(
        self,
        znear=1.0,
        zfar=100.0,
        max_y=1.0,
        min_y=-1.0,
        max_x=1.0,
        min_x=-1.0,
        scale_xyz=((1.0, 1.0, 1.0),),  # (1, 3)
        R=_R,
        T=_T,
        K=None,
        device="cpu",
    ):
        """

        Args:
            znear: near clipping plane of the view frustum.
            zfar: far clipping plane of the view frustum.
            max_y: maximum y coordinate of the frustum.
            min_y: minimum y coordinate of the frustum.
            max_x: maximum x coordinate of the frustum.
            min_x: minimum x coordinate of the frustum.
            scale_xyz: scale factors for each axis of shape (N, 3).
            R: Rotation matrix of shape (N, 3, 3).
            T: Translation of shape (N, 3).
            K: (optional) A calibration matrix of shape (N, 4, 4)
                If provided, don't need znear, zfar, max_y, min_y, max_x, min_x, scale_xyz
            device: torch.device or string.

        Only need to set min_x, max_x, min_y, max_y for viewing frustums
        which are not symmetric about the origin.
        """
        # The initializer formats all inputs to torch tensors and broadcasts
        # all the inputs to have the same batch dimension where necessary.
        super().__init__(
            device=device,
            znear=znear,
            zfar=zfar,
            max_y=max_y,
            min_y=min_y,
            max_x=max_x,
            min_x=min_x,
            scale_xyz=scale_xyz,
            R=R,
            T=T,
            K=K,
        )

    def compute_projection_matrix(
        self, znear, zfar, max_x, min_x, max_y, min_y, scale_xyz
    ) -> torch.Tensor:
        """
        Compute the calibration matrix K of shape (N, 4, 4)

        Args:
            znear: near clipping plane of the view frustum.
            zfar: far clipping plane of the view frustum.
            max_x: maximum x coordinate of the frustum.
            min_x: minimum x coordinate of the frustum.
            max_y: maximum y coordinate of the frustum.
            min_y: minimum y coordinate of the frustum.
            scale_xyz: scale factors for each axis of shape (N, 3).
        """
        K = torch.zeros((self._N, 4, 4), dtype=torch.float32, device=self.device)
        ones = torch.ones((self._N), dtype=torch.float32, device=self.device)
        # NOTE: OpenGL flips handedness of coordinate system between camera
        # space and NDC space so z sign is -ve. In PyTorch3D we maintain a
        # right handed coordinate system throughout.
        z_sign = +1.0

        K[:, 0, 0] = (2.0 / (max_x - min_x)) * scale_xyz[:, 0]
        K[:, 1, 1] = (2.0 / (max_y - min_y)) * scale_xyz[:, 1]
        K[:, 0, 3] = -(max_x + min_x) / (max_x - min_x)
        K[:, 1, 3] = -(max_y + min_y) / (max_y - min_y)
        K[:, 3, 3] = ones

        # NOTE: This maps the z coordinate to the range [0, 1] and replaces
        # the OpenGL z normalization to [-1, 1].
        K[:, 2, 2] = z_sign * (1.0 / (zfar - znear)) * scale_xyz[:, 2]
        K[:, 2, 3] = -znear / (zfar - znear)

        return K

    def get_projection_transform(self, **kwargs) -> Transform3d:
        """
        Calculate the orthographic projection matrix.
        Use column major order.

        Args:
            **kwargs: parameters for the projection can be passed in to
                      override the default values set in __init__.
        Return:
            a Transform3d object which represents a batch of projection
               matrices of shape (N, 4, 4)

        .. code-block:: python

            scale_x = 2 / (max_x - min_x)
            scale_y = 2 / (max_y - min_y)
            scale_z = 2 / (far - near)
            mid_x = (max_x + min_x) / (max_x - min_x)
            mid_y = (max_y + min_y) / (max_y - min_y)
            mid_z = (far + near) / (far - near)

            K = [
                    [scale_x,        0,         0,  -mid_x],
                    [0,        scale_y,         0,  -mid_y],
                    [0,              0,  -scale_z,  -mid_z],
                    [0,              0,         0,       1],
            ]
        """
        K = kwargs.get("K", self.K)  # pyre-ignore[16]
        if K is not None:
            if K.shape != (self._N, 4, 4):
                msg = "Expected K to have shape of (%r, 4, 4)"
                raise ValueError(msg % (self._N))
        else:
            K = self.compute_projection_matrix(
                kwargs.get("znear", self.znear),  # pyre-ignore[16]
                kwargs.get("zfar", self.zfar),  # pyre-ignore[16]
                kwargs.get("max_x", self.max_x),  # pyre-ignore[16]
                kwargs.get("min_x", self.min_x),  # pyre-ignore[16]
                kwargs.get("max_y", self.max_y),  # pyre-ignore[16]
                kwargs.get("min_y", self.min_y),  # pyre-ignore[16]
                kwargs.get("scale_xyz", self.scale_xyz),  # pyre-ignore[16]
            )

        transform = Transform3d(device=self.device)
        transform._matrix = K.transpose(1, 2).contiguous()
        return transform

    def unproject_points(
        self,
        xy_depth: torch.Tensor,
        world_coordinates: bool = True,
        scaled_depth_input: bool = False,
        **kwargs
    ) -> torch.Tensor:
        """
        FoV cameras further allow for passing depth in world units
        (`scaled_depth_input=False`) or in the [0, 1]-normalized units
        (`scaled_depth_input=True`).

        Args:
            scaled_depth_input: If `True`, assumes the input depth is in
                the [0, 1]-normalized units. If `False`, the input depth is in
                world units.
        """

        if world_coordinates:
            to_ndc_transform = self.get_full_projection_transform(**kwargs.copy())
        else:
            to_ndc_transform = self.get_projection_transform(**kwargs.copy())

        if scaled_depth_input:
            # the input depth is already scaled
            xy_sdepth = xy_depth
        else:
            # we have to obtain the scaled depth first
            K = self.get_projection_transform(**kwargs).get_matrix()
            unsqueeze_shape = [1] * K.dim()
            unsqueeze_shape[0] = K.shape[0]
            mid_z = K[:, 3, 2].reshape(unsqueeze_shape)
            scale_z = K[:, 2, 2].reshape(unsqueeze_shape)
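            # The orthographic projection maps view-space depth linearly:
            # z_ndc = scale_z * z + mid_z, which is applied below to obtain the
            # [0, 1]-scaled depth before inverting the transform.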
            scaled_depth = scale_z * xy_depth[..., 2:3] + mid_z
            # cat xy and scaled depth
            xy_sdepth = torch.cat((xy_depth[..., :2], scaled_depth), dim=-1)
        # finally invert the transform
        unprojection_transform = to_ndc_transform.inverse()
        return unprojection_transform.transform_points(xy_sdepth)


############################################################
#             MultiView Camera Classes                     #
############################################################
"""
Note that the MultiView Cameras accept parameters in both
screen and NDC space.
If the user specifies `image_size` at construction time then
we assume the parameters are in screen space.
"""


def SfMPerspectiveCameras(
    focal_length=1.0, principal_point=((0.0, 0.0),), R=_R, T=_T, device="cpu"
):
    """
    SfMPerspectiveCameras has been DEPRECATED. Use PerspectiveCameras instead.
    Preserving SfMPerspectiveCameras for backward compatibility.
    """

    warnings.warn(
        """SfMPerspectiveCameras is deprecated,
        Use PerspectiveCameras instead.
        SfMPerspectiveCameras will be removed in future releases.""",
        PendingDeprecationWarning,
    )

    return PerspectiveCameras(
        focal_length=focal_length,
        principal_point=principal_point,
        R=R,
        T=T,
        device=device,
    )


class PerspectiveCameras(CamerasBase):
    """
    A class which stores a batch of parameters to generate a batch of
    transformation matrices using the multi-view geometry convention for
    a perspective camera.

    Parameters for this camera can be specified in NDC or in screen space.
    If you wish to provide parameters in screen space, you NEED to provide
    the image_size = (imwidth, imheight).
    If you wish to provide parameters in NDC space, you should NOT provide
    image_size. Providing a valid image_size will trigger a screen space to
    NDC space transformation in the camera.

    For example, here is how to define cameras on the two spaces.

    .. code-block:: python

        # camera defined in screen space
        cameras = PerspectiveCameras(
            focal_length=((22.0, 15.0),),  # (fx_screen, fy_screen)
            principal_point=((192.0, 128.0),),  # (px_screen, py_screen)
            image_size=((256, 256),),  # (imwidth, imheight)
        )

        # the equivalent camera defined in NDC space
        cameras = PerspectiveCameras(
            focal_length=((0.171875, 0.1171875),),  # fx = fx_screen / half_imwidth,
                                                    # fy = fy_screen / half_imheight
            principal_point=((-0.5, 0),),  # px = - (px_screen - half_imwidth) / half_imwidth,
                                           # py = - (py_screen - half_imheight) / half_imheight
        )
    """

    def __init__(
        self,
        focal_length=1.0,
        principal_point=((0.0, 0.0),),
        R=_R,
        T=_T,
        K=None,
        device="cpu",
        image_size=((-1, -1),),
    ):
        """

        Args:
            focal_length: Focal length of the camera in world units.
                A tensor of shape (N, 1) or (N, 2) for
                square and non-square pixels respectively.
            principal_point: xy coordinates of the center of
                the principal point of the camera in pixels.
                A tensor of shape (N, 2).
            R: Rotation matrix of shape (N, 3, 3)
            T: Translation matrix of shape (N, 3)
            K: (optional) A calibration matrix of shape (N, 4, 4)
                If provided, don't need focal_length, principal_point, image_size

            device: torch.device or string
            image_size: If image_size = (imwidth, imheight) with imwidth, imheight > 0
                is provided, the camera parameters are assumed to be in screen
                space. They will be converted to NDC space.
                If image_size is not provided, the parameters are assumed to
                be in NDC space.
        """
        # The initializer formats all inputs to torch tensors and broadcasts
        # all the inputs to have the same batch dimension where necessary.
        super().__init__(
            device=device,
            focal_length=focal_length,
            principal_point=principal_point,
            R=R,
            T=T,
            K=K,
            image_size=image_size,
        )

    def get_projection_transform(self, **kwargs) -> Transform3d:
        """
        Calculate the projection matrix using the
        multi-view geometry convention.

        Args:
            **kwargs: parameters for the projection can be passed in as keyword
                arguments to override the default values set in __init__.

        Returns:
            A `Transform3d` object with a batch of `N` projection transforms.

        .. code-block:: python

            fx = focal_length[:, 0]
            fy = focal_length[:, 1]
            px = principal_point[:, 0]
            py = principal_point[:, 1]

            K = [
                    [fx,   0,   px,   0],
                    [0,   fy,   py,   0],
                    [0,    0,    0,   1],
                    [0,    0,    1,   0],
            ]
        """
        K = kwargs.get("K", self.K)  # pyre-ignore[16]
        if K is not None:
            if K.shape != (self._N, 4, 4):
                msg = "Expected K to have shape of (%r, 4, 4)"
                raise ValueError(msg % (self._N))
        else:
            # pyre-ignore[16]
            image_size = kwargs.get("image_size", self.image_size)
            # if imwidth > 0, parameters are in screen space
            image_size = image_size if image_size[0][0] > 0 else None

            K = _get_sfm_calibration_matrix(
                self._N,
                self.device,
                kwargs.get("focal_length", self.focal_length),  # pyre-ignore[16]
                kwargs.get("principal_point", self.principal_point),  # pyre-ignore[16]
                orthographic=False,
                image_size=image_size,
            )

        transform = Transform3d(device=self.device)
        transform._matrix = K.transpose(1, 2).contiguous()
        return transform

    def unproject_points(
        self, xy_depth: torch.Tensor, world_coordinates: bool = True, **kwargs
    ) -> torch.Tensor:
        if world_coordinates:
            to_ndc_transform = self.get_full_projection_transform(**kwargs)
        else:
            to_ndc_transform = self.get_projection_transform(**kwargs)

        unprojection_transform = to_ndc_transform.inverse()
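        # The perspective projection stores 1 / depth in the third ndc coordinate
        # (see the K matrix in `get_projection_transform`), so the input depth is
        # inverted before applying the inverse transform.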
        xy_inv_depth = torch.cat(
            (xy_depth[..., :2], 1.0 / xy_depth[..., 2:3]), dim=-1  # type: ignore
        )
        return unprojection_transform.transform_points(xy_inv_depth)


def SfMOrthographicCameras(
    focal_length=1.0, principal_point=((0.0, 0.0),), R=_R, T=_T, device="cpu"
):
    """
    SfMOrthographicCameras has been DEPRECATED. Use OrthographicCameras instead.
    Preserving SfMOrthographicCameras for backward compatibility.
    """

    warnings.warn(
        """SfMOrthographicCameras is deprecated,
        Use OrthographicCameras instead.
        SfMOrthographicCameras will be removed in future releases.""",
        PendingDeprecationWarning,
    )

    return OrthographicCameras(
        focal_length=focal_length,
        principal_point=principal_point,
        R=R,
        T=T,
        device=device,
    )


class OrthographicCameras(CamerasBase):
    """
    A class which stores a batch of parameters to generate a batch of
    transformation matrices using the multi-view geometry convention for
    an orthographic camera.

    Parameters for this camera can be specified in NDC or in screen space.
    If you wish to provide parameters in screen space, you NEED to provide
    the image_size = (imwidth, imheight).
    If you wish to provide parameters in NDC space, you should NOT provide
    image_size. Providing a valid image_size will trigger a screen space to
    NDC space transformation in the camera.

    For example, here is how to define cameras on the two spaces.

    .. code-block:: python

        # camera defined in screen space
        cameras = OrthographicCameras(
            focal_length=((22.0, 15.0),),  # (fx, fy)
            principal_point=((192.0, 128.0),),  # (px, py)
            image_size=((256, 256),),  # (imwidth, imheight)
        )

        # the equivalent camera defined in NDC space
        cameras = OrthographicCameras(
            focal_length=((0.171875, 0.1171875),),  # := (fx / half_imwidth, fy / half_imheight)
            principal_point=((-0.5, 0),),  # := (- (px - half_imwidth) / half_imwidth,
                                                 - (py - half_imheight) / half_imheight)
        )
    """

    def __init__(
        self,
        focal_length=1.0,
        principal_point=((0.0, 0.0),),
        R=_R,
        T=_T,
        K=None,
        device="cpu",
        image_size=((-1, -1),),
    ):
        """

        Args:
            focal_length: Focal length of the camera in world units.
                A tensor of shape (N, 1) or (N, 2) for
                square and non-square pixels respectively.
            principal_point: xy coordinates of the center of
                the principal point of the camera in pixels.
                A tensor of shape (N, 2).
            R: Rotation matrix of shape (N, 3, 3)
            T: Translation matrix of shape (N, 3)
            K: (optional) A calibration matrix of shape (N, 4, 4)
                If provided, don't need focal_length, principal_point, image_size
            device: torch.device or string
            image_size: If image_size = (imwidth, imheight) with imwidth, imheight > 0
                is provided, the camera parameters are assumed to be in screen
                space. They will be converted to NDC space.
                If image_size is not provided, the parameters are assumed to
                be in NDC space.
        """
        # The initializer formats all inputs to torch tensors and broadcasts
        # all the inputs to have the same batch dimension where necessary.
        super().__init__(
            device=device,
            focal_length=focal_length,
            principal_point=principal_point,
            R=R,
            T=T,
            K=K,
            image_size=image_size,
        )

    def get_projection_transform(self, **kwargs) -> Transform3d:
        """
        Calculate the projection matrix using
        the multi-view geometry convention.

        Args:
            **kwargs: parameters for the projection can be passed in as keyword
                arguments to override the default values set in __init__.

        Returns:
            A `Transform3d` object with a batch of `N` projection transforms.

        .. code-block:: python

            fx = focal_length[:,0]
            fy = focal_length[:,1]
            px = principal_point[:,0]
            py = principal_point[:,1]

            K = [
                    [fx,   0,    0,  px],
                    [0,   fy,    0,  py],
                    [0,    0,    1,   0],
                    [0,    0,    0,   1],
            ]
        """
        K = kwargs.get("K", self.K)  # pyre-ignore[16]
        if K is not None:
            if K.shape != (self._N, 4, 4):
                msg = "Expected K to have shape of (%r, 4, 4)"
                raise ValueError(msg % (self._N))
        else:
            # pyre-ignore[16]
            image_size = kwargs.get("image_size", self.image_size)
            # if imwidth > 0, parameters are in screen space
            image_size = image_size if image_size[0][0] > 0 else None

            K = _get_sfm_calibration_matrix(
                self._N,
                self.device,
                kwargs.get("focal_length", self.focal_length),  # pyre-ignore[16]
                kwargs.get("principal_point", self.principal_point),  # pyre-ignore[16]
                orthographic=True,
                image_size=image_size,
            )

        transform = Transform3d(device=self.device)
        transform._matrix = K.transpose(1, 2).contiguous()
        return transform

    def unproject_points(
        self, xy_depth: torch.Tensor, world_coordinates: bool = True, **kwargs
    ) -> torch.Tensor:
        if world_coordinates:
            to_ndc_transform = self.get_full_projection_transform(**kwargs)
        else:
            to_ndc_transform = self.get_projection_transform(**kwargs)

        unprojection_transform = to_ndc_transform.inverse()
        return unprojection_transform.transform_points(xy_depth)
1071


################################################
#       Helper functions for cameras           #
################################################


def _get_sfm_calibration_matrix(
    N,
    device,
    focal_length,
    principal_point,
    orthographic: bool = False,
    image_size=None,
) -> torch.Tensor:
    """
    Returns a calibration matrix of a perspective/orthographic camera.

    Args:
        N: Number of cameras.
        focal_length: Focal length of the camera in world units.
        principal_point: xy coordinates of the center of
            the principal point of the camera in pixels.
        orthographic: Boolean specifying if the camera is orthographic or not
        image_size: (Optional) Specifying the image_size = (imwidth, imheight).
            If not None, the camera parameters are assumed to be in screen space
            and are transformed to NDC space.

        The calibration matrix `K` is set up as follows:

        .. code-block:: python

            fx = focal_length[:,0]
            fy = focal_length[:,1]
            px = principal_point[:,0]
            py = principal_point[:,1]

            for orthographic==True:
                K = [
                        [fx,   0,    0,  px],
                        [0,   fy,    0,  py],
                        [0,    0,    1,   0],
                        [0,    0,    0,   1],
                ]
            else:
                K = [
                        [fx,   0,   px,   0],
                        [0,   fy,   py,   0],
                        [0,    0,    0,   1],
                        [0,    0,    1,   0],
                ]

    Returns:
        A calibration matrix `K` of the SfM-conventioned camera
        of shape (N, 4, 4).
    """

    if not torch.is_tensor(focal_length):
        focal_length = torch.tensor(focal_length, device=device)

    if focal_length.ndim in (0, 1) or focal_length.shape[1] == 1:
        fx = fy = focal_length
    else:
        fx, fy = focal_length.unbind(1)

    if not torch.is_tensor(principal_point):
        principal_point = torch.tensor(principal_point, device=device)

    px, py = principal_point.unbind(1)

    if image_size is not None:
        if not torch.is_tensor(image_size):
            image_size = torch.tensor(image_size, device=device)
        imwidth, imheight = image_size.unbind(1)
        # make sure imwidth, imheight are valid (>0)
        if (imwidth < 1).any() or (imheight < 1).any():
            raise ValueError(
                "Camera parameters provided in screen space. Image width or height invalid."
            )
        half_imwidth = imwidth / 2.0
        half_imheight = imheight / 2.0
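        # Rescale screen-space parameters to ndc: focal lengths are divided by
        # half the image size, and the principal point is recentered and negated
        # since +X/+Y point left/up in the PyTorch3D ndc convention.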
        fx = fx / half_imwidth
        fy = fy / half_imheight
        px = -(px - half_imwidth) / half_imwidth
        py = -(py - half_imheight) / half_imheight

    K = fx.new_zeros(N, 4, 4)
    K[:, 0, 0] = fx
    K[:, 1, 1] = fy
    if orthographic:
        K[:, 0, 3] = px
        K[:, 1, 3] = py
        K[:, 2, 2] = 1.0
        K[:, 3, 3] = 1.0
    else:
        K[:, 0, 2] = px
        K[:, 1, 2] = py
        K[:, 3, 2] = 1.0
        K[:, 2, 3] = 1.0

    return K


################################################
# Helper functions for world to view transforms
################################################


def get_world_to_view_transform(R=_R, T=_T) -> Transform3d:
    """
    This function returns a Transform3d representing the transformation
    matrix to go from world space to view space by applying a rotation and
    a translation.

    PyTorch3D uses the same convention as Hartley & Zisserman.
    I.e., for camera extrinsic parameters R (rotation) and T (translation),
    we map a 3D point `X_world` in world coordinates to
    a point `X_cam` in camera coordinates with:
    `X_cam = X_world R + T`

    Args:
        R: (N, 3, 3) matrix representing the rotation.
        T: (N, 3) matrix representing the translation.

    Returns:
        a Transform3d object which represents the composed RT transformation.
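
    Example (a minimal usage sketch; values are illustrative only):

    .. code-block:: python

        R = torch.eye(3)[None]        # (1, 3, 3) identity rotation
        T = torch.zeros(1, 3)         # (1, 3) zero translation
        world_to_view = get_world_to_view_transform(R=R, T=T)
        X_world = torch.rand(1, 8, 3)  # batch of 8 points in world coordinates
        X_cam = world_to_view.transform_points(X_world)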

    """
    # TODO: also support the case where RT is specified as one matrix
    # of shape (N, 4, 4).

    if T.shape[0] != R.shape[0]:
        msg = "Expected R, T to have the same batch dimension; got %r, %r"
        raise ValueError(msg % (R.shape[0], T.shape[0]))
    if T.dim() != 2 or T.shape[1:] != (3,):
        msg = "Expected T to have shape (N, 3); got %r"
        raise ValueError(msg % repr(T.shape))
    if R.dim() != 3 or R.shape[1:] != (3, 3):
        msg = "Expected R to have shape (N, 3, 3); got %r"
        raise ValueError(msg % repr(R.shape))

    # Create a Transform3d object
    T = Translate(T, device=T.device)
    R = Rotate(R, device=R.device)
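    # The rotation is applied first and then the translation, matching
    # X_cam = X_world R + T.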
    return R.compose(T)


def camera_position_from_spherical_angles(
    distance, elevation, azimuth, degrees: bool = True, device: str = "cpu"
) -> torch.Tensor:
    """
    Calculate the location of the camera based on the distance away from
    the target point, the elevation and azimuth angles.

    Args:
        distance: distance of the camera from the object.
        elevation, azimuth: angles.
            The inputs distance, elevation and azimuth can be one of the following
                - Python scalar
                - Torch scalar
                - Torch tensor of shape (N) or (1)
        degrees: bool, whether the angles are specified in degrees or radians.
        device: str or torch.device, device for new tensors to be placed on.

    The vectors are broadcast against each other so they all have shape (N, 1).

    Returns:
        camera_position: (N, 3) xyz location of the camera.
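
    Example (illustrative values only):

    .. code-block:: python

        # A camera 2.7 units from the target, elevated 10 degrees and
        # rotated 20 degrees about the +Y (up) axis.
        C = camera_position_from_spherical_angles(2.7, 10.0, 20.0)  # (1, 3)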
    """
    broadcasted_args = convert_to_tensors_and_broadcast(
        distance, elevation, azimuth, device=device
    )
    dist, elev, azim = broadcasted_args
    if degrees:
        elev = math.pi / 180.0 * elev
        azim = math.pi / 180.0 * azim
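    # Spherical to Cartesian with +Y up: azimuth is measured in the xz-plane
    # from the +Z axis, and elevation from the xz-plane towards +Y.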
    x = dist * torch.cos(elev) * torch.sin(azim)
    y = dist * torch.sin(elev)
    z = dist * torch.cos(elev) * torch.cos(azim)
    camera_position = torch.stack([x, y, z], dim=1)
    if camera_position.dim() == 0:
        camera_position = camera_position.view(1, -1)  # add batch dim.
    return camera_position.view(-1, 3)


def look_at_rotation(
    camera_position, at=((0, 0, 0),), up=((0, 1, 0),), device: str = "cpu"
) -> torch.Tensor:
    """
    This function takes a vector 'camera_position' which specifies the location
    of the camera in world coordinates and two vectors `at` and `up` which
    indicate the position of the object and the up direction of the world
    coordinate system respectively. The object is assumed to be centered at
    the origin.

    The output is a rotation matrix representing the transformation
    from world coordinates -> view coordinates.

    Args:
        camera_position: position of the camera in world coordinates
        at: position of the object in world coordinates
        up: vector specifying the up direction in the world coordinate frame.

    The inputs camera_position, at and up can each be a
        - 3 element tuple/list
        - torch tensor of shape (1, 3)
        - torch tensor of shape (N, 3)

    The vectors are broadcast against each other so they all have shape (N, 3).

    Returns:
        R: (N, 3, 3) batched rotation matrices
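
    Example (illustrative values only):

    .. code-block:: python

        # A camera placed at (0, 0, -3) looking at the world origin with
        # the default +Y up direction.
        R = look_at_rotation(((0.0, 0.0, -3.0),))  # (1, 3, 3)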
    """
    # Format input and broadcast
    broadcasted_args = convert_to_tensors_and_broadcast(
        camera_position, at, up, device=device
    )
    camera_position, at, up = broadcasted_args
    for t, n in zip([camera_position, at, up], ["camera_position", "at", "up"]):
        if t.shape[-1] != 3:
            msg = "Expected arg %s to have shape (N, 3); got %r"
            raise ValueError(msg % (n, t.shape))
    z_axis = F.normalize(at - camera_position, eps=1e-5)
    x_axis = F.normalize(torch.cross(up, z_axis, dim=1), eps=1e-5)
    y_axis = F.normalize(torch.cross(z_axis, x_axis, dim=1), eps=1e-5)
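    # If the `up` vector is (nearly) parallel to the viewing direction, the
    # cross product above degenerates to ~0; rebuild the x axis from y and z.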
    is_close = torch.isclose(x_axis, torch.tensor(0.0), atol=5e-3).all(
        dim=1, keepdim=True
    )
    if is_close.any():
        replacement = F.normalize(torch.cross(y_axis, z_axis, dim=1), eps=1e-5)
        x_axis = torch.where(is_close, replacement, x_axis)
    R = torch.cat((x_axis[:, None, :], y_axis[:, None, :], z_axis[:, None, :]), dim=1)
    return R.transpose(1, 2)


def look_at_view_transform(
    dist=1.0,
    elev=0.0,
    azim=0.0,
    degrees: bool = True,
    eye: Optional[Sequence] = None,
    at=((0, 0, 0),),  # (1, 3)
    up=((0, 1, 0),),  # (1, 3)
    device="cpu",
) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    This function returns a rotation and translation matrix
    to apply the 'Look At' transformation from world -> view coordinates [0].

    Args:
        dist: distance of the camera from the object
        elev: angle in degrees or radians. This is the angle between the
            vector from the object to the camera, and the horizontal plane y = 0 (xz-plane).
        azim: angle in degrees or radians. The vector from the object to
            the camera is projected onto a horizontal plane y = 0.
            azim is the angle between the projected vector and a
            reference vector at (0, 0, 1) on the reference plane (the horizontal plane).
        dist, elev and azim can be of shape (1), (N).
        degrees: boolean flag to indicate if the elevation and azimuth
            angles are specified in degrees or radians.
        eye: the position of the camera(s) in world coordinates. If eye is not
            None, it will override the camera position derived from dist, elev, azim.
        up: the up direction of the world coordinate system ((0, 1, 0), i.e. +Y, by default).
        at: the position of the object(s) in world coordinates.
        eye, up and at can be of shape (1, 3) or (N, 3).

    Returns:
        2-element tuple containing

        - **R**: the rotation to apply to the points to align with the camera.
        - **T**: the translation to apply to the points to align with the camera.
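
    Example (illustrative values only):

    .. code-block:: python

        # Either specify spherical angles relative to the `at` point ...
        R, T = look_at_view_transform(dist=2.7, elev=10, azim=20)
        # ... or give the camera position directly via `eye`.
        R, T = look_at_view_transform(eye=((0.0, 0.0, -3.0),), at=((0.0, 0.0, 0.0),))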

    References:
    [0] https://www.scratchapixel.com
    """

    if eye is not None:
        broadcasted_args = convert_to_tensors_and_broadcast(eye, at, up, device=device)
        eye, at, up = broadcasted_args
        C = eye
    else:
        broadcasted_args = convert_to_tensors_and_broadcast(
            dist, elev, azim, at, up, device=device
        )
        dist, elev, azim, at, up = broadcasted_args
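        # The spherical angles give the camera position relative to `at`,
        # so offset the computed position by `at` below.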
        C = (
            camera_position_from_spherical_angles(
                dist, elev, azim, degrees=degrees, device=device
            )
            + at
        )

    R = look_at_rotation(C, at, up, device=device)
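    # With the row-vector convention X_cam = X_world R + T, mapping the camera
    # center C to the view-space origin requires T = -C R, i.e. -R^T C below.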
    T = -torch.bmm(R.transpose(1, 2), C[:, :, None])[:, :, 0]
    return R, T