Unverified Commit 4eed122d authored by Yezhen Cong, committed by GitHub

[Feature] Support ImVoteNet complete model (#352)



* Added image loading in SUNRGB-D dataset (#195)

* image loading

* format and docstring fix

* removed irrelevant files

* removed irrelevant files

* load image only if modality is pc+img

* added modality like nuscenes

* Added imvotenet image branch pretrain (#217)

* image loading

* naive commit

* format and docstring fix

* removed irrelevant files

* removed irrelevant files

* load image only if modality is pc+img

* added modality like nuscenes

* pretrain_2d_model

* finetune sunrgbd

* finetune sunrgbd

* deleted unused code

* fixed a bug

* resolve conflict

* update config file

* fix docstring and configs

* integrated vote fusion

* coords transform and unit test

* Update docstring

* refactor and add unit test

* fix bug caused by mmcv upgrade; delete pdb breakpoint

* add point fusion unittest

* remove unused file

* fix typos

* updates

* add assertion info

* update

* add unittest

* add vote fusion unittest

* add vote fusion unittest

* [Refactor] VoteNet refactor (#322)

* votenet refactor

* remove file

* minor update

* docstring

* initial update of imvotenet

* [Feature] Support vote fusion (#297)

* integrated vote fusion

* coords transform and unit test

* Update docstring

* refactor and add unit test

* add point fusion unittest

* remove unused file

* updates

* add assertion info

* update

* add unittest

* add vote fusion unittest

* add vote fusion unittest

* minor update

* docstring

* change np ops to torch

* refactor test

* update

* refactor of image mlp and np random ops to torch

* add docstring

* add config and mod dataset

* fix bugs

* add_comments

* fix bugs

* fix_bug

* fix bug

* fix bug

* fix bug

* fix bug

* final fix

* fix bug

* ?

* add docstring

* move train/test cfg

* change img mlp default param

* rename config

* minor mod

* change config name

* move train/test cfg

* some fixes and 2d utils

* fix config name

* fix config override issue

* config simplify & reformat

* explicitly set eval mode->override train()

* add fix_img_branch to config

* remove set_img_branch_eval_mode

* temporal fix, change calibs to calib

* more docstring and view/reshape, expand/repeat change

* complete imvotenet docstring

* fix docstring

* add config and some minor fix

* rename config
Co-authored-by: ZwwWayne <wayne.zw@outlook.com>
parent 097b66ee
from .clip_sigmoid import clip_sigmoid
+ from .mlp import MLP
- __all__ = ['clip_sigmoid']
+ __all__ = ['clip_sigmoid', 'MLP']
from mmcv.cnn import ConvModule
from torch import nn as nn
class MLP(nn.Module):
"""A simple MLP module.
Pass features (B, C, N) through an MLP.
Args:
in_channel (int): Number of channels of input features.
Default: 18.
conv_channels (tuple[int]): Out channels of the convolution.
Default: (256, 256).
conv_cfg (dict): Config of convolution.
Default: dict(type='Conv1d').
norm_cfg (dict): Config of normalization.
Default: dict(type='BN1d').
act_cfg (dict): Config of activation.
Default: dict(type='ReLU').
"""
def __init__(self,
in_channel=18,
conv_channels=(256, 256),
conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d'),
act_cfg=dict(type='ReLU')):
super().__init__()
self.mlp = nn.Sequential()
prev_channels = in_channel
for i, conv_channel in enumerate(conv_channels):
self.mlp.add_module(
f'layer{i}',
ConvModule(
prev_channels,
conv_channel,
1,
padding=0,
conv_cfg=conv_cfg,
norm_cfg=norm_cfg,
act_cfg=act_cfg,
bias=True,
inplace=True))
prev_channels = conv_channel
def forward(self, img_features):
return self.mlp(img_features)
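A quick usage sketch (added for illustration, not part of the commit): with the defaults above, the MLP maps per-point image features of shape (B, 18, N) to (B, 256, N) through shared 1x1 convolutions.

import torch

mlp = MLP(in_channel=18, conv_channels=(256, 256))
feats = torch.rand(2, 18, 128)  # (batch, channels, num_points)
out = mlp(feats)
assert out.shape == (2, 256, 128)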
"""Tests coords transformation in fusion modules.
CommandLine:
pytest tests/test_models/test_fusion/test_fusion_coord_trans.py
"""
import torch
from mmdet3d.models.fusion_layers import apply_3d_transformation
def test_coords_transformation():
"""Test the transformation of 3d coords."""
# H+R+S+T, not reverse, depth
img_meta = {
'pcd_scale_factor':
1.2311e+00,
'pcd_rotation': [[8.660254e-01, 0.5, 0], [-0.5, 8.660254e-01, 0],
[0, 0, 1.0e+00]],
'pcd_trans': [1.111e-02, -8.88e-03, 0.0],
'pcd_horizontal_flip':
True,
'transformation_3d_flow': ['HF', 'R', 'S', 'T']
}
pcd = torch.tensor([[-5.2422e+00, -2.9757e-01, 4.0021e+01],
[-9.1435e-01, 2.6675e+01, -5.5950e+00],
[2.0089e-01, 5.8098e+00, -3.5409e+01],
[-1.9461e-01, 3.1309e+01, -1.0901e+00]])
pcd_transformed = apply_3d_transformation(
pcd, 'DEPTH', img_meta, reverse=False)
expected_tensor = torch.tensor(
[[5.78332345e+00, 2.900697e+00, 4.92698531e+01],
[-1.5433839e+01, 2.8993850e+01, -6.8880045e+00],
[-3.77929405e+00, 6.061661e+00, -4.35920199e+01],
[-1.9053658e+01, 3.3491436e+01, -1.34202211e+00]])
assert torch.allclose(expected_tensor, pcd_transformed, 1e-4)
# H+R+S+T, reverse, depth
img_meta = {
'pcd_scale_factor':
7.07106781e-01,
'pcd_rotation': [[7.07106781e-01, 7.07106781e-01, 0.0],
[-7.07106781e-01, 7.07106781e-01, 0.0],
[0.0, 0.0, 1.0e+00]],
'pcd_trans': [0.0, 0.0, 0.0],
'pcd_horizontal_flip':
False,
'transformation_3d_flow': ['HF', 'R', 'S', 'T']
}
pcd = torch.tensor([[-5.2422e+00, -2.9757e-01, 4.0021e+01],
[-9.1435e+01, 2.6675e+01, -5.5950e+00],
[6.061661e+00, -0.0, -1.0e+02]])
pcd_transformed = apply_3d_transformation(
pcd, 'DEPTH', img_meta, reverse=True)
expected_tensor = torch.tensor(
[[-5.53977e+00, 4.94463e+00, 5.65982409e+01],
[-6.476e+01, 1.1811e+02, -7.91252488e+00],
[6.061661e+00, -6.061661e+00, -1.41421356e+02]])
assert torch.allclose(expected_tensor, pcd_transformed, 1e-4)
# H+R+S+T, not reverse, camera
img_meta = {
'pcd_scale_factor':
1.0 / 7.07106781e-01,
'pcd_rotation': [[7.07106781e-01, 0.0, 7.07106781e-01],
[0.0, 1.0e+00, 0.0],
[-7.07106781e-01, 0.0, 7.07106781e-01]],
'pcd_trans': [1.0e+00, -1.0e+00, 0.0],
'pcd_horizontal_flip':
True,
'transformation_3d_flow': ['HF', 'S', 'R', 'T']
}
pcd = torch.tensor([[-5.2422e+00, 4.0021e+01, -2.9757e-01],
[-9.1435e+01, -5.5950e+00, 2.6675e+01],
[6.061661e+00, -1.0e+02, -0.0]])
pcd_transformed = apply_3d_transformation(
pcd, 'CAMERA', img_meta, reverse=False)
expected_tensor = torch.tensor(
[[6.53977e+00, 5.55982409e+01, 4.94463e+00],
[6.576e+01, -8.91252488e+00, 1.1811e+02],
[-5.061661e+00, -1.42421356e+02, -6.061661e+00]])
assert torch.allclose(expected_tensor, pcd_transformed, 1e-4)
# V, reverse, camera
img_meta = {'pcd_vertical_flip': True, 'transformation_3d_flow': ['VF']}
pcd_transformed = apply_3d_transformation(
pcd, 'CAMERA', img_meta, reverse=True)
expected_tensor = torch.tensor([[-5.2422e+00, 4.0021e+01, 2.9757e-01],
[-9.1435e+01, -5.5950e+00, -2.6675e+01],
[6.061661e+00, -1.0e+02, 0.0]])
assert torch.allclose(expected_tensor, pcd_transformed, 1e-4)
# V+H, not reverse, depth
img_meta = {
'pcd_vertical_flip': True,
'pcd_horizontal_flip': True,
'transformation_3d_flow': ['VF', 'HF']
}
pcd_transformed = apply_3d_transformation(
pcd, 'DEPTH', img_meta, reverse=False)
expected_tensor = torch.tensor([[5.2422e+00, -4.0021e+01, -2.9757e-01],
[9.1435e+01, 5.5950e+00, 2.6675e+01],
[-6.061661e+00, 1.0e+02, 0.0]])
assert torch.allclose(expected_tensor, pcd_transformed, 1e-4)
# V+H, reverse, lidar
img_meta = {
'pcd_vertical_flip': True,
'pcd_horizontal_flip': True,
'transformation_3d_flow': ['VF', 'HF']
}
pcd_transformed = apply_3d_transformation(
pcd, 'LIDAR', img_meta, reverse=True)
expected_tensor = torch.tensor([[5.2422e+00, -4.0021e+01, -2.9757e-01],
[9.1435e+01, 5.5950e+00, 2.6675e+01],
[-6.061661e+00, 1.0e+02, 0.0]])
assert torch.allclose(expected_tensor, pcd_transformed, 1e-4)
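The cases above pin down exact values; the defining property of the reverse flag is easier to see in isolation. A round-trip sketch (added for illustration, reusing the first img_meta fixture above): applying the recorded flow with reverse=False and then reverse=True should recover the original points up to numerical precision.

import torch
from mmdet3d.models.fusion_layers import apply_3d_transformation

img_meta = {
    'pcd_scale_factor': 1.2311,
    'pcd_rotation': [[8.660254e-01, 0.5, 0], [-0.5, 8.660254e-01, 0],
                     [0, 0, 1.0]],
    'pcd_trans': [1.111e-02, -8.88e-03, 0.0],
    'pcd_horizontal_flip': True,
    'transformation_3d_flow': ['HF', 'R', 'S', 'T']
}
pcd = torch.rand(8, 3)  # arbitrary points in depth coordinates
augmented = apply_3d_transformation(pcd, 'DEPTH', img_meta, reverse=False)
restored = apply_3d_transformation(augmented, 'DEPTH', img_meta, reverse=True)
assert torch.allclose(pcd, restored, atol=1e-5)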
"""Tests the core function of point fusion.
CommandLine:
pytest tests/test_models/test_fusion/test_point_fusion.py
"""
import torch
from mmdet3d.models.fusion_layers import PointFusion
def test_sample_single():
# this test ensures that the rewritten 3d coords transformation
# in point fusion does not change the original behaviour
lidar2img = torch.tensor(
[[6.0294e+02, -7.0791e+02, -1.2275e+01, -1.7094e+02],
[1.7678e+02, 8.8088e+00, -7.0794e+02, -1.0257e+02],
[9.9998e-01, -1.5283e-03, -5.2907e-03, -3.2757e-01],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00]])
# all use default
img_meta = {
'transformation_3d_flow': ['R', 'S', 'T', 'HF'],
'input_shape': [370, 1224],
'img_shape': [370, 1224],
'lidar2img': lidar2img,
}
# dummy parameters
fuse = PointFusion(1, 1, 1, 1)
img_feat = torch.arange(370 * 1224)[None, ...].view(
370, 1224)[None, None, ...].float() / (370 * 1224)
pts = torch.tensor([[8.356, -4.312, -0.445], [11.777, -6.724, -0.564],
[6.453, 2.53, -1.612], [6.227, -3.839, -0.563]])
out = fuse.sample_single(img_feat, pts, img_meta)
expected_tensor = torch.tensor(
[0.5560822, 0.5476625, 0.9687978, 0.6241757])
assert torch.allclose(expected_tensor, out, 1e-4)
pcd_rotation = torch.tensor([[8.660254e-01, 0.5, 0],
[-0.5, 8.660254e-01, 0], [0, 0, 1.0e+00]])
pcd_scale_factor = 1.111
pcd_trans = torch.tensor([1.0, -1.0, 0.5])
pts = pts @ pcd_rotation
pts *= pcd_scale_factor
pts += pcd_trans
pts[:, 1] = -pts[:, 1]
# not use default
img_meta.update({
'pcd_scale_factor': pcd_scale_factor,
'pcd_rotation': pcd_rotation,
'pcd_trans': pcd_trans,
'pcd_horizontal_flip': True
})
out = fuse.sample_single(img_feat, pts, img_meta)
expected_tensor = torch.tensor(
[0.5560822, 0.5476625, 0.9687978, 0.6241757])
assert torch.allclose(expected_tensor, out, 1e-4)
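Conceptually, sample_single undoes the recorded 3D augmentations, projects the points into the image with lidar2img, and bilinearly samples the feature map at the projected pixels. The sketch below is an assumption about that mechanism, not the exact mmdet3d implementation (sub-pixel alignment conventions may differ slightly), so it prints rather than asserts:

import torch
import torch.nn.functional as F

lidar2img = torch.tensor(
    [[6.0294e+02, -7.0791e+02, -1.2275e+01, -1.7094e+02],
     [1.7678e+02, 8.8088e+00, -7.0794e+02, -1.0257e+02],
     [9.9998e-01, -1.5283e-03, -5.2907e-03, -3.2757e-01],
     [0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00]])
img_feat = torch.arange(370 * 1224)[None, ...].view(
    370, 1224)[None, None, ...].float() / (370 * 1224)
pts = torch.tensor([[8.356, -4.312, -0.445], [11.777, -6.724, -0.564]])
pts_hom = torch.cat([pts, pts.new_ones(pts.shape[0], 1)], dim=1)  # (N, 4)
uvd = pts_hom @ lidar2img.t()  # rows are roughly (u*d, v*d, d, 1)
uv = uvd[:, :2] / uvd[:, 2:3]  # perspective divide -> pixel coordinates
h, w = 370, 1224
grid = torch.stack([uv[:, 0] / (w - 1), uv[:, 1] / (h - 1)], dim=-1) * 2 - 1
sampled = F.grid_sample(img_feat, grid[None, None], align_corners=True)
print(sampled.view(-1))  # should land near [0.5561, 0.5477] under this reading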
"""Tests the core function of vote fusion.
CommandLine:
pytest tests/test_models/test_fusion/test_vote_fusion.py
"""
import torch
from mmdet3d.models.fusion_layers import VoteFusion
def test_vote_fusion():
img_meta = {
'ori_shape': (530, 730, 3),
'img_shape': (600, 826, 3),
'pad_shape': (608, 832, 3),
'scale_factor':
torch.tensor([1.1315, 1.1321, 1.1315, 1.1321]),
'flip':
False,
'pcd_horizontal_flip':
False,
'pcd_vertical_flip':
False,
'pcd_trans':
torch.tensor([0., 0., 0.]),
'pcd_scale_factor':
1.0308290128214932,
'pcd_rotation':
torch.tensor([[0.9747, 0.2234, 0.0000], [-0.2234, 0.9747, 0.0000],
[0.0000, 0.0000, 1.0000]]),
'transformation_3d_flow': ['HF', 'R', 'S', 'T']
}
calibs = {
'Rt':
torch.tensor([[[0.979570, 0.047954, -0.195330],
[0.047954, 0.887470, 0.458370],
[0.195330, -0.458370, 0.867030]]]),
'K':
torch.tensor([[[529.5000, 0.0000, 365.0000],
[0.0000, 529.5000, 265.0000], [0.0000, 0.0000,
1.0000]]])
}
bboxes = torch.tensor([[[
5.4286e+02, 9.8283e+01, 6.1700e+02, 1.6742e+02, 9.7922e-01, 3.0000e+00
], [
4.2613e+02, 8.4646e+01, 4.9091e+02, 1.6237e+02, 9.7848e-01, 3.0000e+00
], [
2.5606e+02, 7.3244e+01, 3.7883e+02, 1.8471e+02, 9.7317e-01, 3.0000e+00
], [
6.0104e+02, 1.0648e+02, 6.6757e+02, 1.9216e+02, 8.4607e-01, 3.0000e+00
], [
2.2923e+02, 1.4984e+02, 7.0163e+02, 4.6537e+02, 3.5719e-01, 0.0000e+00
], [
2.5614e+02, 7.4965e+01, 3.3275e+02, 1.5908e+02, 2.8688e-01, 3.0000e+00
], [
9.8718e+00, 1.4142e+02, 2.0213e+02, 3.3878e+02, 1.0935e-01, 3.0000e+00
], [
6.1930e+02, 1.1768e+02, 6.8505e+02, 2.0318e+02, 1.0720e-01, 3.0000e+00
]]])
seeds_3d = torch.tensor([[[0.044544, 1.675476, -1.531831],
[2.500625, 7.238662, -0.737675],
[-0.600003, 4.827733, -0.084022],
[1.396212, 3.994484, -1.551180],
[-2.054746, 2.012759, -0.357472],
[-0.582477, 6.580470, -1.466052],
[1.313331, 5.722039, 0.123904],
[-1.107057, 3.450359, -1.043422],
[1.759746, 5.655951, -1.519564],
[-0.203003, 6.453243, 0.137703],
[-0.910429, 0.904407, -0.512307],
[0.434049, 3.032374, -0.763842],
[1.438146, 2.289263, -1.546332],
[0.575622, 5.041906, -0.891143],
[-1.675931, 1.417597, -1.588347]]])
imgs = torch.linspace(
-1, 1, steps=608 * 832).reshape(1, 608, 832).repeat(3, 1, 1)[None]
expected_tensor1 = torch.tensor(
[[[
0.000000e+00, -0.000000e+00, 0.000000e+00, -0.000000e+00,
0.000000e+00, 1.193706e-01, -0.000000e+00, -2.879214e-01,
-0.000000e+00, 0.000000e+00, 1.422463e-01, -6.474612e-01,
-0.000000e+00, 1.490057e-02, 0.000000e+00
],
[
0.000000e+00, -0.000000e+00, -0.000000e+00, 0.000000e+00,
0.000000e+00, -1.873745e+00, -0.000000e+00, 1.576240e-01,
0.000000e+00, -0.000000e+00, -3.646177e-02, -7.751858e-01,
0.000000e+00, 9.593642e-02, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, -6.263277e-02, 0.000000e+00, -3.646387e-01,
0.000000e+00, 0.000000e+00, -5.875812e-01, -6.263450e-02,
0.000000e+00, 1.149264e-01, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 8.899736e-01, 0.000000e+00, 9.019017e-01,
0.000000e+00, 0.000000e+00, 6.917775e-01, 8.899733e-01,
0.000000e+00, 9.812444e-01, 0.000000e+00
],
[
-0.000000e+00, -0.000000e+00, -0.000000e+00, -0.000000e+00,
-0.000000e+00, -4.516903e-01, -0.000000e+00, -2.315422e-01,
-0.000000e+00, -0.000000e+00, -4.197519e-01, -4.516906e-01,
-0.000000e+00, -1.547615e-01, -0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 3.571937e-01, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 3.571937e-01,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 9.731653e-01,
0.000000e+00, 0.000000e+00, 1.093455e-01, 0.000000e+00,
0.000000e+00, 8.460656e-01, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
2.316288e-03, -1.948284e-03, -3.694394e-03, 2.176163e-04,
-3.882605e-03, -1.901490e-03, -3.355042e-03, -1.774631e-03,
-6.981542e-04, -3.886823e-03, -1.302233e-03, -1.189933e-03,
2.540967e-03, -1.834944e-03, 1.032048e-03
],
[
2.316288e-03, -1.948284e-03, -3.694394e-03, 2.176163e-04,
-3.882605e-03, -1.901490e-03, -3.355042e-03, -1.774631e-03,
-6.981542e-04, -3.886823e-03, -1.302233e-03, -1.189933e-03,
2.540967e-03, -1.834944e-03, 1.032048e-03
],
[
2.316288e-03, -1.948284e-03, -3.694394e-03, 2.176163e-04,
-3.882605e-03, -1.901490e-03, -3.355042e-03, -1.774631e-03,
-6.981542e-04, -3.886823e-03, -1.302233e-03, -1.189933e-03,
2.540967e-03, -1.834944e-03, 1.032048e-03
]]])
expected_tensor2 = torch.tensor([[
False, False, False, False, False, True, False, True, False, False,
True, True, False, True, False, False, False, False, False, False,
False, False, True, False, False, False, False, False, True, False,
False, False, False, False, False, False, False, False, False, False,
False, False, False, True, False
]])
expected_tensor3 = torch.tensor(
[[[
-0.000000e+00, -0.000000e+00, -0.000000e+00, -0.000000e+00,
0.000000e+00, -0.000000e+00, -0.000000e+00, 0.000000e+00,
-0.000000e+00, -0.000000e+00, 0.000000e+00, -0.000000e+00,
-0.000000e+00, 1.720988e-01, 0.000000e+00
],
[
0.000000e+00, -0.000000e+00, -0.000000e+00, 0.000000e+00,
-0.000000e+00, 0.000000e+00, -0.000000e+00, 0.000000e+00,
0.000000e+00, -0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 4.824460e-02, 0.000000e+00
],
[
-0.000000e+00, -0.000000e+00, -0.000000e+00, -0.000000e+00,
-0.000000e+00, -0.000000e+00, -0.000000e+00, 0.000000e+00,
-0.000000e+00, -0.000000e+00, -0.000000e+00, -0.000000e+00,
-0.000000e+00, 1.447314e-01, -0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 9.759269e-01, 0.000000e+00
],
[
-0.000000e+00, -0.000000e+00, -0.000000e+00, -0.000000e+00,
-0.000000e+00, -0.000000e+00, -0.000000e+00, -0.000000e+00,
-0.000000e+00, -0.000000e+00, -0.000000e+00, -0.000000e+00,
-0.000000e+00, -1.631542e-01, -0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 1.072001e-01, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
0.000000e+00, 0.000000e+00, 0.000000e+00
],
[
2.316288e-03, -1.948284e-03, -3.694394e-03, 2.176163e-04,
-3.882605e-03, -1.901490e-03, -3.355042e-03, -1.774631e-03,
-6.981542e-04, -3.886823e-03, -1.302233e-03, -1.189933e-03,
2.540967e-03, -1.834944e-03, 1.032048e-03
],
[
2.316288e-03, -1.948284e-03, -3.694394e-03, 2.176163e-04,
-3.882605e-03, -1.901490e-03, -3.355042e-03, -1.774631e-03,
-6.981542e-04, -3.886823e-03, -1.302233e-03, -1.189933e-03,
2.540967e-03, -1.834944e-03, 1.032048e-03
],
[
2.316288e-03, -1.948284e-03, -3.694394e-03, 2.176163e-04,
-3.882605e-03, -1.901490e-03, -3.355042e-03, -1.774631e-03,
-6.981542e-04, -3.886823e-03, -1.302233e-03, -1.189933e-03,
2.540967e-03, -1.834944e-03, 1.032048e-03
]]])
fusion = VoteFusion()
out1, out2 = fusion(imgs, bboxes, seeds_3d, [img_meta], calibs)
assert torch.allclose(expected_tensor1, out1[:, :, :15], 1e-3)
assert torch.allclose(expected_tensor2.float(), out2.float(), 1e-3)
assert torch.allclose(expected_tensor3, out1[:, :, 30:45], 1e-3)
out1, out2 = fusion(imgs, bboxes[:, :2], seeds_3d, [img_meta], calibs)
out1 = out1[:, :15, 30:45]
out2 = out2[:, 30:45].float()
assert torch.allclose(torch.zeros_like(out1), out1, 1e-3)
assert torch.allclose(torch.zeros_like(out2), out2, 1e-3)
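One way to read the two outputs, inferred from the expected tensors above rather than from a documented contract: the 45 columns of out1 are 15 seeds times 3 votes per seed, the first 15 channels hold bounding-box cues while the remaining channels carry sampled image features, and out2 flags exactly the columns whose cue channels are non-zero, i.e. the seeds that project inside some detected 2D box. Appended inside the test, this hypothetical check would make that reading explicit:

out1, out2 = fusion(imgs, bboxes, seeds_3d, [img_meta], calibs)
cue_energy = out1[0, :15, :].abs().sum(dim=0)  # per-column bbox-cue energy
print(torch.equal(cue_energy > 0, out2[0]))  # expected True under this reading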
@@ -62,21 +62,21 @@ def test_points_conversion():
convert_depth_points = cam_points.convert_to(Coord3DMode.DEPTH)
expected_tensor = torch.tensor([[
- -5.2422e+00, -2.9757e-01, 4.0021e+01, 6.6660e-01, 1.9560e-01,
+ -5.2422e+00, 2.9757e-01, -4.0021e+01, 6.6660e-01, 1.9560e-01,
4.9740e-01, 9.4090e-01
],
[
- -2.6675e+01, 9.1435e-01, 5.5950e+00,
+ -2.6675e+01, -9.1435e-01, -5.5950e+00,
1.5020e-01, 3.7070e-01, 1.0860e-01,
6.2970e-01
],
[
- -5.8098e+00, -2.0089e-01, 3.5409e+01,
+ -5.8098e+00, 2.0089e-01, -3.5409e+01,
6.5650e-01, 6.2480e-01, 6.9540e-01,
2.5380e-01
],
[
- -3.1309e+01, 1.9461e-01, 1.0901e+00,
+ -3.1309e+01, -1.9461e-01, -1.0901e+00,
2.8030e-01, 2.5800e-02, 4.8960e-01,
3.2690e-01
]])
@@ -157,21 +157,21 @@ def test_points_conversion():
convert_cam_points = depth_points.convert_to(Coord3DMode.CAM)
expected_tensor = torch.tensor([[
- -5.2422e+00, 2.9757e-01, -4.0021e+01, 6.6660e-01, 1.9560e-01,
+ -5.2422e+00, -2.9757e-01, 4.0021e+01, 6.6660e-01, 1.9560e-01,
4.9740e-01, 9.4090e-01
],
[
- -2.6675e+01, -9.1435e-01, -5.5950e+00,
+ -2.6675e+01, 9.1435e-01, 5.5950e+00,
1.5020e-01, 3.7070e-01, 1.0860e-01,
6.2970e-01
],
[
- -5.8098e+00, 2.0089e-01, -3.5409e+01,
+ -5.8098e+00, -2.0089e-01, 3.5409e+01,
6.5650e-01, 6.2480e-01, 6.9540e-01,
2.5380e-01
],
[
- -3.1309e+01, -1.9461e-01, -1.0901e+00,
+ -3.1309e+01, 1.9461e-01, 1.0901e+00,
2.8030e-01, 2.5800e-02, 4.8960e-01,
3.2690e-01
]])
@@ -182,6 +182,22 @@ def test_points_conversion():
assert torch.allclose(expected_tensor, convert_cam_points.tensor, 1e-4)
assert torch.allclose(cam_point_tensor, convert_cam_points.tensor, 1e-4)
rt_mat_provided = torch.tensor([[0.99789, -0.012698, -0.063678],
[-0.012698, 0.92359, -0.38316],
[0.063678, 0.38316, 0.92148]])
depth_points_new = torch.cat([
depth_points.tensor[:, :3] @ rt_mat_provided.t(),
depth_points.tensor[:, 3:]
],
dim=1)
cam_point_tensor_new = Coord3DMode.convert_point(
depth_points_new,
Coord3DMode.DEPTH,
Coord3DMode.CAM,
rt_mat=rt_mat_provided)
assert torch.allclose(expected_tensor, cam_point_tensor_new, 1e-4)
convert_lidar_points = depth_points.convert_to(Coord3DMode.LIDAR)
expected_tensor = torch.tensor([[
4.0021e+01, 5.2422e+00, 2.9757e-01, 6.6660e-01, 1.9560e-01, 4.9740e-01,
......
@@ -111,8 +111,9 @@ class SUNRGBDData(object):
calib_filepath = osp.join(self.calib_dir, f'{idx:06d}.txt')
lines = [line.rstrip() for line in open(calib_filepath)]
Rt = np.array([float(x) for x in lines[0].split(' ')])
- Rt = np.reshape(Rt, (3, 3), order='F')
+ Rt = np.reshape(Rt, (3, 3), order='F').astype(np.float32)
K = np.array([float(x) for x in lines[1].split(' ')])
+ K = np.reshape(K, (3, 3), order='F').astype(np.float32)
return K, Rt
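For context, a hedged sketch of how such calib matrices are used downstream (schematic pinhole projection only; the axis conventions inside mmdet3d's fusion code may differ): with Rt mapping depth coordinates into the camera frame and K the intrinsic matrix, a 3D point projects to pixels via a matrix product and a perspective divide.

import numpy as np

K = np.array([[529.5, 0.0, 365.0],
              [0.0, 529.5, 265.0],
              [0.0, 0.0, 1.0]], dtype=np.float32)  # intrinsic, as in the test above
Rt = np.eye(3, dtype=np.float32)  # identity extrinsic for this sketch

xyz = np.array([1.0, 2.0, 5.0], dtype=np.float32)  # a point in front of the camera
cam = Rt @ xyz  # into the camera frame
uv = (K @ cam)[:2] / cam[2]  # perspective divide
print(uv)  # -> [470.9, 476.8]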
def get_label_objects(self, idx):
@@ -155,8 +156,7 @@ class SUNRGBDData(object):
osp.join(self.root_dir, 'points', f'{sample_idx:06d}.bin'))
info['pts_path'] = osp.join('points', f'{sample_idx:06d}.bin')
- img_name = osp.join(self.image_dir, f'{sample_idx:06d}')
- img_path = osp.join(self.image_dir, img_name)
+ img_path = osp.join('image', f'{sample_idx:06d}.jpg')
image_info = {
'image_idx': sample_idx,
'image_shape': self.get_image_shape(sample_idx),
......