Unverified Commit 83954d03 authored by yukang, committed by GitHub

Add support for VoxelNeXt (#1309)

* VoxelNeXt
parent 31f6758a
......@@ -22,6 +22,8 @@ It is also the official code release of [`[PointRCNN]`](https://arxiv.org/abs/18
## Changelog
[2023-04-02] Added support for [`VoxelNeXt`](https://github.com/dvlab-research/VoxelNeXt) on the nuScenes, Waymo, and Argoverse2 datasets. It is a fully sparse 3D object detection network, built purely on sparse CNNs, that predicts 3D objects directly from voxels.
[2022-09-02] **NEW:** Update `OpenPCDet` to v0.6.0:
* Official code release of [MPPNet](https://arxiv.org/abs/2205.05979) for temporal 3D object detection, which supports long-term multi-frame 3D object detection and ranked 1st on the [3D detection leaderboard](https://waymo.com/open/challenges/2020/3d-detection) of the Waymo Open Dataset as of Sept. 2, 2022. On the validation set, MPPNet achieves 74.96%, 75.06% and 74.52% mAPH@Level_2 for the vehicle, pedestrian and cyclist classes, respectively (see the [guideline](docs/guidelines_of_approaches/mppnet.md) on how to train/test with MPPNet).
* Support multi-frame training/testing on Waymo Open Dataset (see the [change log](docs/changelog.md) for more details on how to process data).
......@@ -172,7 +174,6 @@ By default, all models are trained with **a single frame** of **20% data (~32k f
| [PV-RCNN++](tools/cfgs/waymo_models/pv_rcnn_plusplus.yaml) | 77.82/77.32| 69.07/68.62| 77.99/71.36| 69.92/63.74| 71.80/70.71| 69.31/68.26|
| [PV-RCNN++ (ResNet)](tools/cfgs/waymo_models/pv_rcnn_plusplus_resnet.yaml) |77.61/77.14| 69.18/68.75| 79.42/73.31| 70.88/65.21| 72.50/71.39| 69.84/68.77|
Here we also provide the performance of several models trained on the full training set (refer to the [PV-RCNN++](https://arxiv.org/abs/2102.00463) paper):
| Performance@(train with 100\% Data) | Vec_L1 | Vec_L2 | Ped_L1 | Ped_L2 | Cyc_L1 | Cyc_L2 |
......@@ -180,6 +181,7 @@ Here we also provide the performance of several models trained on the full train
| [SECOND](tools/cfgs/waymo_models/second.yaml) | 72.27/71.69 | 63.85/63.33 | 68.70/58.18 | 60.72/51.31 | 60.62/59.28 | 58.34/57.05 |
| [CenterPoint-Pillar](tools/cfgs/waymo_models/centerpoint_pillar_1x.yaml)| 73.37/72.86 | 65.09/64.62 | 75.35/65.11 | 67.61/58.25 | 67.76/66.22 | 65.25/63.77 |
| [Part-A2-Anchor](tools/cfgs/waymo_models/PartA2.yaml) | 77.05/76.51 | 68.47/67.97 | 75.24/66.87 | 66.18/58.62 | 68.60/67.36 | 66.13/64.93 |
| [VoxelNeXt-2D](tools/cfgs/waymo_models/voxelnext2d_ioubranch.yaml) | 77.94/77.47 |69.68/69.25 |80.24/73.47 |72.23/65.88 |73.33/72.20 |70.66/69.56 |
| [PV-RCNN (CenterHead)](tools/cfgs/waymo_models/pv_rcnn_with_centerhead_rpn.yaml) | 78.00/77.50 | 69.43/68.98 | 79.21/73.03 | 70.42/64.72 | 71.46/70.27 | 68.95/67.79 |
| [PV-RCNN++](tools/cfgs/waymo_models/pv_rcnn_plusplus.yaml) | 79.10/78.63 | 70.34/69.91 | 80.62/74.62 | 71.86/66.30 | 73.49/72.38 | 70.70/69.62 |
| [PV-RCNN++ (ResNet)](tools/cfgs/waymo_models/pv_rcnn_plusplus_resnet.yaml) | 79.25/78.78 | 70.61/70.18 | 81.83/76.28 | 73.17/68.00 | 73.72/72.66 | 71.21/70.19 |
......@@ -199,12 +201,13 @@ but you could easily achieve similar performance by training with the default co
All models are trained with 8 GTX 1080Ti GPUs and are available for download.
| | mATE | mASE | mAOE | mAVE | mAAE | mAP | NDS | download |
|----------------------------------------------------------------------------------------------------|-------:|:------:|:------:|:-----:|:-----:|:-----:|:------:|:--------------------------------------------------------------------------------------------------:|
| [PointPillar-MultiHead](tools/cfgs/nuscenes_models/cbgs_pp_multihead.yaml) | 33.87 | 26.00 | 32.07 | 28.74 | 20.15 | 44.63 | 58.23 | [model-23M](https://drive.google.com/file/d/1p-501mTWsq0G9RzroTWSXreIMyTUUpBM/view?usp=sharing) |
| [SECOND-MultiHead (CBGS)](tools/cfgs/nuscenes_models/cbgs_second_multihead.yaml) | 31.15 | 25.51 | 26.64 | 26.26 | 20.46 | 50.59 | 62.29 | [model-35M](https://drive.google.com/file/d/1bNzcOnE3u9iooBFMk2xK7HqhdeQ_nwTq/view?usp=sharing) |
| [CenterPoint-PointPillar](tools/cfgs/nuscenes_models/cbgs_dyn_pp_centerpoint.yaml) | 31.13 | 26.04 | 42.92 | 23.90 | 19.14 | 50.03 | 60.70 | [model-23M](https://drive.google.com/file/d/1UvGm6mROMyJzeSRu7OD1leU_YWoAZG7v/view?usp=sharing) |
| [CenterPoint (voxel_size=0.1)](tools/cfgs/nuscenes_models/cbgs_voxel01_res3d_centerpoint.yaml) | 30.11 | 25.55 | 38.28 | 21.94 | 18.87 | 56.03 | 64.54 | [model-34M](https://drive.google.com/file/d/1Cz-J1c3dw7JAWc25KRG1XQj8yCaOlexQ/view?usp=sharing) |
| [CenterPoint (voxel_size=0.075)](tools/cfgs/nuscenes_models/cbgs_voxel0075_res3d_centerpoint.yaml) | 28.80 | 25.43 | 37.27 | 21.55 | 18.24 | 59.22 | 66.48 | [model-34M](https://drive.google.com/file/d/1XOHAWm1MPkCKr1gqmc3TWi5AYZgPsgxU/view?usp=sharing) |
| [VoxelNeXt (voxel_size=0.075)](tools/cfgs/nuscenes_models/cbgs_voxel0075_voxelnext.yaml) | 30.11 | 25.23 | 40.57 | 21.69 | 18.56 | 60.53 | 66.65 | [model-31M](https://drive.google.com/file/d/1IV7e7G9X-61KXSjMGtQo579pzDNbhwvf/view?usp=share_link) |
### ONCE 3D Object Detection Baselines
......@@ -218,6 +221,14 @@ All models are trained with 8 GPUs.
| [PV-RCNN](tools/cfgs/once_models/pv_rcnn.yaml) | 77.77 | 23.50 | 59.37 | 53.55 |
| [CenterPoint](tools/cfgs/once_models/centerpoint.yaml) | 78.02 | 49.74 | 67.22 | 64.99 |
### Argoverse2 3D Object Detection Baselines
All models are trained with 4 GPUs.
| | mAP | download |
|---------------------------------------------------------|:----:|:--------------------------------------------------------------------------------------------------:|
| [VoxelNeXt](tools/cfgs/argo2_models/cbgs_voxel01_voxelnext.yaml) | 30.0 | [model-30M](https://drive.google.com/file/d/1zr-it1ERJzLQ3a3hP060z_EQqS_RkNaC/view?usp=share_link) |
| [VoxelNeXt-K3](tools/cfgs/argo2_models/cbgs_voxel01_voxelnext_headkernel3.yaml) | 30.7 | [model-45M](https://drive.google.com/file/d/1NrYRsiKbuWyL8jE4SY27IHpFMY9K0o__/view?usp=share_link) |
### Other datasets
Contributions that add support for other datasets are welcome; please submit a pull request.
......
......@@ -12,6 +12,7 @@ from .waymo.waymo_dataset import WaymoDataset
from .pandaset.pandaset_dataset import PandasetDataset
from .lyft.lyft_dataset import LyftDataset
from .once.once_dataset import ONCEDataset
from .argo2.argo2_dataset import Argo2Dataset
from .custom.custom_dataset import CustomDataset
__all__ = {
......@@ -22,7 +23,8 @@ __all__ = {
'PandasetDataset': PandasetDataset,
'LyftDataset': LyftDataset,
'ONCEDataset': ONCEDataset,
'CustomDataset': CustomDataset,
'Argo2Dataset': Argo2Dataset
}
......
import copy
import pickle
import torch
import numpy as np
from ..dataset import DatasetTemplate
from .argo2_utils.so3 import yaw_to_quat
from .argo2_utils.constants import LABEL_ATTR
from os import path as osp
from pathlib import Path
class Argo2Dataset(DatasetTemplate):
def __init__(self, dataset_cfg, class_names, training=True, root_path=None, logger=None):
"""
Args:
root_path:
dataset_cfg:
class_names:
training:
logger:
"""
super().__init__(
dataset_cfg=dataset_cfg, class_names=class_names, training=training, root_path=root_path, logger=logger
)
self.split = self.dataset_cfg.DATA_SPLIT[self.mode]
self.root_split_path = self.root_path / ('training' if self.split != 'test' else 'testing')
split_dir = self.root_path / 'ImageSets' / (self.split + '.txt')
self.sample_id_list = [x.strip() for x in open(split_dir).readlines()] if split_dir.exists() else None
self.kitti_infos = []
self.include_kitti_data(self.mode)
def include_kitti_data(self, mode):
if self.logger is not None:
self.logger.info('Loading Argoverse2 dataset')
kitti_infos = []
for info_path in self.dataset_cfg.INFO_PATH[mode]:
info_path = self.root_path / info_path
if not info_path.exists():
continue
with open(info_path, 'rb') as f:
infos = pickle.load(f)
kitti_infos.extend(infos)
self.kitti_infos.extend(kitti_infos)
if self.logger is not None:
self.logger.info('Total samples for Argo2 dataset: %d' % (len(kitti_infos)))
def set_split(self, split):
super().__init__(
dataset_cfg=self.dataset_cfg, class_names=self.class_names, training=self.training, root_path=self.root_path, logger=self.logger
)
self.split = split
self.root_split_path = self.root_path / ('training' if self.split != 'test' else 'testing')
split_dir = self.root_path / 'ImageSets' / (self.split + '.txt')
self.sample_id_list = [x.strip() for x in open(split_dir).readlines()] if split_dir.exists() else None
def get_lidar(self, idx):
lidar_file = self.root_split_path / 'velodyne' / ('%s.bin' % idx)
assert lidar_file.exists()
return np.fromfile(str(lidar_file), dtype=np.float32).reshape(-1, 4)
@staticmethod
def generate_prediction_dicts(batch_dict, pred_dicts, class_names, output_path=None):
"""
Args:
batch_dict:
frame_id:
pred_dicts: list of pred_dicts
pred_boxes: (N, 7), Tensor
pred_scores: (N), Tensor
pred_labels: (N), Tensor
class_names:
output_path:
Returns:
"""
def get_template_prediction(num_samples):
ret_dict = {
'name': np.zeros(num_samples), 'truncated': np.zeros(num_samples),
'occluded': np.zeros(num_samples), 'alpha': np.zeros(num_samples),
'bbox': np.zeros([num_samples, 4]), 'dimensions': np.zeros([num_samples, 3]),
'location': np.zeros([num_samples, 3]), 'rotation_y': np.zeros(num_samples),
'score': np.zeros(num_samples), 'boxes_lidar': np.zeros([num_samples, 7])
}
return ret_dict
def generate_single_sample_dict(batch_index, box_dict):
pred_scores = box_dict['pred_scores'].cpu().numpy()
pred_boxes = box_dict['pred_boxes'].cpu().numpy()
pred_labels = box_dict['pred_labels'].cpu().numpy()
pred_dict = get_template_prediction(pred_scores.shape[0])
if pred_scores.shape[0] == 0:
return pred_dict
pred_boxes_img = pred_boxes
pred_boxes_camera = pred_boxes
pred_dict['name'] = np.array(class_names)[pred_labels - 1]
pred_dict['alpha'] = -np.arctan2(-pred_boxes[:, 1], pred_boxes[:, 0]) + pred_boxes_camera[:, 6]
pred_dict['bbox'] = pred_boxes_img
pred_dict['dimensions'] = pred_boxes_camera[:, 3:6]
pred_dict['location'] = pred_boxes_camera[:, 0:3]
pred_dict['rotation_y'] = pred_boxes_camera[:, 6]
pred_dict['score'] = pred_scores
pred_dict['boxes_lidar'] = pred_boxes
return pred_dict
annos = []
for index, box_dict in enumerate(pred_dicts):
frame_id = batch_dict['frame_id'][index]
single_pred_dict = generate_single_sample_dict(index, box_dict)
single_pred_dict['frame_id'] = frame_id
annos.append(single_pred_dict)
if output_path is not None:
cur_det_file = output_path / ('%s.txt' % frame_id)
with open(cur_det_file, 'w') as f:
bbox = single_pred_dict['bbox']
loc = single_pred_dict['location']
dims = single_pred_dict['dimensions'] # lhw -> hwl
for idx in range(len(bbox)):
print('%s -1 -1 %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f'
% (single_pred_dict['name'][idx], single_pred_dict['alpha'][idx],
bbox[idx][0], bbox[idx][1], bbox[idx][2], bbox[idx][3],
dims[idx][1], dims[idx][2], dims[idx][0], loc[idx][0],
loc[idx][1], loc[idx][2], single_pred_dict['rotation_y'][idx],
single_pred_dict['score'][idx]), file=f)
return annos
def __len__(self):
if self._merge_all_iters_to_one_epoch:
return len(self.kitti_infos) * self.total_epochs
return len(self.kitti_infos)
def __getitem__(self, index):
# index = 4
if self._merge_all_iters_to_one_epoch:
index = index % len(self.kitti_infos)
info = copy.deepcopy(self.kitti_infos[index])
sample_idx = Path(info['point_cloud']['velodyne_path']).stem  # note: rstrip('.bin') would strip a character set, not the suffix
calib = None
get_item_list = self.dataset_cfg.get('GET_ITEM_LIST', ['points'])
input_dict = {
'frame_id': sample_idx,
'calib': calib,
}
if 'annos' in info:
annos = info['annos']
loc, dims, rots = annos['location'], annos['dimensions'], annos['rotation_y']
gt_names = annos['name']
gt_bboxes_3d = np.concatenate([loc, dims, rots[..., np.newaxis]], axis=1).astype(np.float32)
input_dict.update({
'gt_names': gt_names,
'gt_boxes': gt_bboxes_3d
})
if "points" in get_item_list:
points = self.get_lidar(sample_idx)
input_dict['points'] = points
input_dict['calib'] = calib
data_dict = self.prepare_data(data_dict=input_dict)
return data_dict
def format_results(self,
outputs,
class_names,
pklfile_prefix=None,
submission_prefix=None,
):
"""Format the results to .feather file with argo2 format.
Args:
outputs (list[dict]): Testing results of the dataset.
pklfile_prefix (str | None): The prefix of pkl files. It includes
the file path and the prefix of filename, e.g., "a/b/prefix".
If not specified, a temp file will be created. Default: None.
submission_prefix (str | None): The prefix of submitted files. It
includes the file path and the prefix of filename, e.g.,
"a/b/prefix". If not specified, a temp file will be created.
Default: None.
Returns:
pd.DataFrame: Detections in Argoverse 2 format, indexed by
(log_id, timestamp_ns).
"""
import pandas as pd
assert len(self.kitti_infos) == len(outputs)
num_samples = len(outputs)
print('\nGot {} samples'.format(num_samples))
serialized_dts_list = []
print('\nConvert predictions to Argoverse 2 format')
for i in range(num_samples):
out_i = outputs[i]
log_id, ts = self.kitti_infos[i]['uuid'].split('/')
track_uuid = None
#cat_id = out_i['labels_3d'].numpy().tolist()
#category = [class_names[i].upper() for i in cat_id]
category = [class_name.upper() for class_name in out_i['name']]
serialized_dts = pd.DataFrame(
self.lidar_box_to_argo2(out_i['bbox']).numpy(), columns=list(LABEL_ATTR)
)
serialized_dts["score"] = out_i['score']
serialized_dts["log_id"] = log_id
serialized_dts["timestamp_ns"] = int(ts)
serialized_dts["category"] = category
serialized_dts_list.append(serialized_dts)
dts = (
pd.concat(serialized_dts_list)
.set_index(["log_id", "timestamp_ns"])
.sort_index()
)
dts = dts.sort_values("score", ascending=False).reset_index()
if pklfile_prefix is not None:
if not pklfile_prefix.endswith('.feather'):
pklfile_prefix = f'{pklfile_prefix}.feather'
dts.to_feather(pklfile_prefix)
print(f'Result is saved to {pklfile_prefix}.')
dts = dts.set_index(["log_id", "timestamp_ns"]).sort_index()
return dts
def lidar_box_to_argo2(self, boxes):
boxes = torch.Tensor(boxes)
cnt_xyz = boxes[:, :3]
#cnt_xyz[:, 2] += boxes[:, 5] * 0.5
lwh = boxes[:, [4, 3, 5]]
#yaw = -boxes[:, 6] - np.pi/2
yaw = boxes[:, 6] #- np.pi/2
yaw = -yaw - 0.5 * np.pi
while (yaw < -np.pi).any():
yaw[yaw < -np.pi] += 2 * np.pi
while (yaw > np.pi).any():
yaw[yaw > np.pi] -= 2 * np.pi
quat = yaw_to_quat(yaw)
argo_cuboid = torch.cat([cnt_xyz, lwh, quat], dim=1)
return argo_cuboid
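For reference, a minimal sketch of the heading remap above on one hypothetical box (the numbers are illustrative only):

```python
import numpy as np
import torch

# One OpenPCDet LiDAR box: [x, y, z, dx, dy, dz, yaw]
box = torch.tensor([[10.0, 5.0, -1.0, 4.0, 2.0, 1.5, 2.0]])

# Heading remap used by lidar_box_to_argo2, then wrap into (-pi, pi]
yaw = -box[:, 6] - 0.5 * np.pi     # tensor([-3.5708])
yaw[yaw < -np.pi] += 2 * np.pi     # tensor([2.7124])
```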
def evaluation(self,
results,
class_names,
eval_metric='waymo',
logger=None,
pklfile_prefix=None,
submission_prefix=None,
show=False,
output_path=None,
pipeline=None):
"""Evaluation in KITTI protocol.
Args:
results (list[dict]): Testing results of the dataset.
metric (str | list[str]): Metrics to be evaluated.
Default: 'waymo'. Another supported metric is 'kitti'.
logger (logging.Logger | str | None): Logger used for printing
related information during evaluation. Default: None.
pklfile_prefix (str | None): The prefix of pkl files. It includes
the file path and the prefix of filename, e.g., "a/b/prefix".
If not specified, a temp file will be created. Default: None.
submission_prefix (str | None): The prefix of submission datas.
If not specified, the submission data will not be generated.
show (bool): Whether to visualize.
Default: False.
out_dir (str): Path to save the visualization results.
Default: None.
pipeline (list[dict], optional): raw data loading for showing.
Default: None.
Returns:
dict[str: float]: results of each evaluation metric
"""
from av2.evaluation.detection.constants import CompetitionCategories
from av2.evaluation.detection.utils import DetectionCfg
from av2.evaluation.detection.eval import evaluate
from av2.utils.io import read_feather
dts = self.format_results(results, class_names, pklfile_prefix, submission_prefix)
argo2_root = "../data/argo2/"
val_anno_path = osp.join(argo2_root, 'val_anno.feather')
gts = read_feather(val_anno_path)
gts = gts.set_index(["log_id", "timestamp_ns"]).sort_values("category")
valid_uuids_gts = gts.index.tolist()
valid_uuids_dts = dts.index.tolist()
valid_uuids = set(valid_uuids_gts) & set(valid_uuids_dts)
gts = gts.loc[list(valid_uuids)].sort_index()
categories = set(x.value for x in CompetitionCategories)
categories &= set(gts["category"].unique().tolist())
split = 'val'
dataset_dir = Path(argo2_root) / 'sensor' / split
cfg = DetectionCfg(
dataset_dir=dataset_dir,
categories=tuple(sorted(categories)),
#split=split,
max_range_m=200.0,
eval_only_roi_instances=True,
)
# Evaluate using Argoverse detection API.
eval_dts, eval_gts, metrics = evaluate(
dts.reset_index(), gts.reset_index(), cfg
)
valid_categories = sorted(categories) + ["AVERAGE_METRICS"]
ap_dict = {}
for index, row in metrics.iterrows():
ap_dict[index] = row.to_json()
return metrics.loc[valid_categories], ap_dict
LABEL_ATTR = (
"tx_m",
"ty_m",
"tz_m",
"length_m",
"width_m",
"height_m",
"qw",
"qx",
"qy",
"qz",
)
"""SO(3) group transformations."""
import kornia.geometry.conversions as C
import torch
from torch import Tensor
from math import pi as PI
@torch.jit.script
def quat_to_mat(quat_wxyz: Tensor) -> Tensor:
"""Convert scalar first quaternion to rotation matrix.
Args:
quat_wxyz: (...,4) Scalar first quaternions.
Returns:
(...,3,3) 3D rotation matrices.
"""
return C.quaternion_to_rotation_matrix(
quat_wxyz, order=C.QuaternionCoeffOrder.WXYZ
)
# @torch.jit.script
def mat_to_quat(mat: Tensor) -> Tensor:
"""Convert rotation matrix to scalar first quaternion.
Args:
mat: (...,3,3) 3D rotation matrices.
Returns:
(...,4) Scalar first quaternions.
"""
return C.rotation_matrix_to_quaternion(
mat, order=C.QuaternionCoeffOrder.WXYZ
)
@torch.jit.script
def quat_to_xyz(
quat_wxyz: Tensor, singularity_value: float = PI / 2
) -> Tensor:
"""Convert scalar first quaternion to Tait-Bryan angles.
Reference:
https://en.wikipedia.org/wiki/Conversion_between_quaternions_and_Euler_angles#Source_code_2
Args:
quat_wxyz: (...,4) Scalar first quaternions.
singularity_value: Value that's set at the singularities.
Returns:
(...,3) The Tait-Bryan angles --- roll, pitch, and yaw.
"""
qw = quat_wxyz[..., 0]
qx = quat_wxyz[..., 1]
qy = quat_wxyz[..., 2]
qz = quat_wxyz[..., 3]
# roll (x-axis rotation)
sinr_cosp = 2 * (qw * qx + qy * qz)
cosr_cosp = 1 - 2 * (qx * qx + qy * qy)
roll = torch.atan2(sinr_cosp, cosr_cosp)
# pitch (y-axis rotation)
pitch = 2 * (qw * qy - qz * qx)
is_out_of_range = torch.abs(pitch) >= 1
pitch[is_out_of_range] = torch.copysign(
torch.as_tensor(singularity_value), pitch[is_out_of_range]
)
pitch[~is_out_of_range] = torch.asin(pitch[~is_out_of_range])
# yaw (z-axis rotation)
siny_cosp = 2 * (qw * qz + qx * qy)
cosy_cosp = 1 - 2 * (qy * qy + qz * qz)
yaw = torch.atan2(siny_cosp, cosy_cosp)
xyz = torch.stack([roll, pitch, yaw], dim=-1)
return xyz
@torch.jit.script
def quat_to_yaw(quat_wxyz: Tensor) -> Tensor:
"""Convert scalar first quaternion to yaw (rotation about vertical axis).
Reference:
https://en.wikipedia.org/wiki/Conversion_between_quaternions_and_Euler_angles#Source_code_2
Args:
quat_wxyz: (...,4) Scalar first quaternions.
Returns:
(...,) The rotation about the z-axis in radians.
"""
xyz = quat_to_xyz(quat_wxyz)
yaw_rad: Tensor = xyz[..., -1]
return yaw_rad
@torch.jit.script
def xyz_to_quat(xyz_rad: Tensor) -> Tensor:
"""Convert euler angles (xyz - pitch, roll, yaw) to scalar first quaternions.
Args:
xyz_rad: (...,3) Tensor of roll, pitch, and yaw in radians.
Returns:
(...,4) Scalar first quaternions (wxyz).
"""
x_rad = xyz_rad[..., 0]
y_rad = xyz_rad[..., 1]
z_rad = xyz_rad[..., 2]
cy = torch.cos(z_rad * 0.5)
sy = torch.sin(z_rad * 0.5)
cp = torch.cos(y_rad * 0.5)
sp = torch.sin(y_rad * 0.5)
cr = torch.cos(x_rad * 0.5)
sr = torch.sin(x_rad * 0.5)
qw = cr * cp * cy + sr * sp * sy
qx = sr * cp * cy - cr * sp * sy
qy = cr * sp * cy + sr * cp * sy
qz = cr * cp * sy - sr * sp * cy
quat_wxyz = torch.stack([qw, qx, qy, qz], dim=-1)
return quat_wxyz
@torch.jit.script
def yaw_to_quat(yaw_rad: Tensor) -> Tensor:
"""Convert yaw (rotation about the vertical axis) to scalar first quaternions.
Args:
yaw_rad: (...,1) Rotations about the z-axis.
Returns:
(...,4) scalar first quaternions (wxyz).
"""
xyz_rad = torch.zeros_like(yaw_rad)[..., None].repeat_interleave(3, dim=-1)
xyz_rad[..., -1] = yaw_rad
quat_wxyz: Tensor = xyz_to_quat(xyz_rad)
return quat_wxyz
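A quick round-trip sanity check, assuming `yaw_to_quat` and `quat_to_yaw` from this module are importable:

```python
import torch
# from pcdet.datasets.argo2.argo2_utils.so3 import yaw_to_quat, quat_to_yaw  # assumed module path

yaw = torch.tensor([0.0, 0.5, -2.0])
quat = yaw_to_quat(yaw)                              # (3, 4) scalar-first (w, x, y, z)
assert torch.allclose(quat_to_yaw(quat), yaw, atol=1e-5)
```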
......@@ -199,13 +199,19 @@ class DatasetTemplate(torch_data.Dataset):
data_dict[key].append(val)
batch_size = len(batch_list)
ret = {}
batch_size_ratio = 1
for key, val in data_dict.items():
try:
if key in ['voxels', 'voxel_num_points']:
if isinstance(val[0], list):
batch_size_ratio = len(val[0])
val = [i for item in val for i in item]
ret[key] = np.concatenate(val, axis=0)
elif key in ['points', 'voxel_coords']:
coors = []
if isinstance(val[0], list):
val = [i for item in val for i in item]
for i, coor in enumerate(val):
coor_pad = np.pad(coor, ((0, 0), (1, 0)), mode='constant', constant_values=i)
coors.append(coor_pad)
......@@ -287,5 +293,5 @@ class DatasetTemplate(torch_data.Dataset):
print('Error in collate_batch: key=%s' % key)
raise TypeError
ret['batch_size'] = batch_size * batch_size_ratio
return ret
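The `batch_size_ratio` bookkeeping above exists for the double-flip path: each sample then carries a list of four voxel sets, so the collated batch grows accordingly. A small sketch under that assumption:

```python
import numpy as np

# Two samples, each with 4 flip variants of its voxel coords (toy arrays)
val = [[np.zeros((5, 3)) for _ in range(4)],
       [np.zeros((7, 3)) for _ in range(4)]]
batch_size_ratio = len(val[0])                           # 4
flat = [coords for sample in val for coords in sample]   # flattened to 8 coord arrays
batch_size = len(val) * batch_size_ratio                 # 2 * 4 = 8
# VoxelNeXtHead.forward later restores the true batch size via
# data_dict['batch_size'] // 4 before decoding boxes.
```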
......@@ -113,6 +113,22 @@ class DataProcessor(object):
return data_dict
def double_flip(self, points):
# y flip
points_yflip = points.copy()
points_yflip[:, 1] = -points_yflip[:, 1]
# x flip
points_xflip = points.copy()
points_xflip[:, 0] = -points_xflip[:, 0]
# x y flip
points_xyflip = points.copy()
points_xyflip[:, 0] = -points_xyflip[:, 0]
points_xyflip[:, 1] = -points_xyflip[:, 1]
return points_yflip, points_xflip, points_xyflip
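A hypothetical standalone check of the flip logic, with `processor` standing in for a `DataProcessor` instance:

```python
import numpy as np

pts = np.array([[1.0, 2.0, 0.5, 0.3]])        # (x, y, z, intensity)
y_f, x_f, xy_f = processor.double_flip(pts)   # processor: assumed DataProcessor
# y_f[:, :2]  -> [[ 1., -2.]]
# x_f[:, :2]  -> [[-1.,  2.]]
# xy_f[:, :2] -> [[-1., -2.]]
```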
def transform_points_to_voxels(self, data_dict=None, config=None):
if data_dict is None:
grid_size = (self.point_cloud_range[3:6] - self.point_cloud_range[0:3]) / np.array(config.VOXEL_SIZE)
......@@ -138,6 +154,25 @@ class DataProcessor(object):
if not data_dict['use_lead_xyz']:
voxels = voxels[..., 3:] # remove xyz in voxels(N, 3)
if config.get('DOUBLE_FLIP', False):
voxels_list, voxel_coords_list, voxel_num_points_list = [voxels], [coordinates], [num_points]
points_yflip, points_xflip, points_xyflip = self.double_flip(points)
points_list = [points_yflip, points_xflip, points_xyflip]
keys = ['yflip', 'xflip', 'xyflip']
for i, key in enumerate(keys):
voxel_output = self.voxel_generator.generate(points_list[i])
voxels, coordinates, num_points = voxel_output
if not data_dict['use_lead_xyz']:
voxels = voxels[..., 3:]
voxels_list.append(voxels)
voxel_coords_list.append(coordinates)
voxel_num_points_list.append(num_points)
data_dict['voxels'] = voxels_list
data_dict['voxel_coords'] = voxel_coords_list
data_dict['voxel_num_points'] = voxel_num_points_list
else:
data_dict['voxels'] = voxels
data_dict['voxel_coords'] = coordinates
data_dict['voxel_num_points'] = num_points
......
......@@ -2,6 +2,7 @@ from .pointnet2_backbone import PointNet2Backbone, PointNet2MSG
from .spconv_backbone import VoxelBackBone8x, VoxelResBackBone8x
from .spconv_backbone_2d import PillarBackBone8x, PillarRes18BackBone8x
from .spconv_backbone_focal import VoxelBackBone8xFocal
from .spconv_backbone_voxelnext import VoxelResBackBone8xVoxelNeXt
from .spconv_unet import UNetV2
__all__ = {
......@@ -11,6 +12,7 @@ __all__ = {
'PointNet2MSG': PointNet2MSG,
'VoxelResBackBone8x': VoxelResBackBone8x,
'VoxelBackBone8xFocal': VoxelBackBone8xFocal,
'VoxelResBackBone8xVoxelNeXt': VoxelResBackBone8xVoxelNeXt,
'PillarBackBone8x': PillarBackBone8x,
'PillarRes18BackBone8x': PillarRes18BackBone8x
}
from functools import partial
import torch
import torch.nn as nn
from ...utils.spconv_utils import replace_feature, spconv
def post_act_block(in_channels, out_channels, kernel_size, indice_key=None, stride=1, padding=0,
conv_type='subm', norm_fn=None):
if conv_type == 'subm':
conv = spconv.SubMConv3d(in_channels, out_channels, kernel_size, bias=False, indice_key=indice_key)
elif conv_type == 'spconv':
conv = spconv.SparseConv3d(in_channels, out_channels, kernel_size, stride=stride, padding=padding,
bias=False, indice_key=indice_key)
elif conv_type == 'inverseconv':
conv = spconv.SparseInverseConv3d(in_channels, out_channels, kernel_size, indice_key=indice_key, bias=False)
else:
raise NotImplementedError
m = spconv.SparseSequential(
conv,
norm_fn(out_channels),
nn.ReLU(),
)
return m
class SparseBasicBlock(spconv.SparseModule):
expansion = 1
def __init__(self, inplanes, planes, stride=1, norm_fn=None, downsample=None, indice_key=None):
super(SparseBasicBlock, self).__init__()
assert norm_fn is not None
bias = norm_fn is not None
self.conv1 = spconv.SubMConv3d(
inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=bias, indice_key=indice_key
)
self.bn1 = norm_fn(planes)
self.relu = nn.ReLU()
self.conv2 = spconv.SubMConv3d(
planes, planes, kernel_size=3, stride=stride, padding=1, bias=bias, indice_key=indice_key
)
self.bn2 = norm_fn(planes)
self.downsample = downsample
self.stride = stride
def forward(self, x):
identity = x
out = self.conv1(x)
out = replace_feature(out, self.bn1(out.features))
out = replace_feature(out, self.relu(out.features))
out = self.conv2(out)
out = replace_feature(out, self.bn2(out.features))
if self.downsample is not None:
identity = self.downsample(x)
out = replace_feature(out, out.features + identity.features)
out = replace_feature(out, self.relu(out.features))
return out
class VoxelResBackBone8xVoxelNeXt(nn.Module):
def __init__(self, model_cfg, input_channels, grid_size, **kwargs):
super().__init__()
self.model_cfg = model_cfg
norm_fn = partial(nn.BatchNorm1d, eps=1e-3, momentum=0.01)
spconv_kernel_sizes = model_cfg.get('SPCONV_KERNEL_SIZES', [3, 3, 3, 3])
channels = model_cfg.get('CHANNELS', [16, 32, 64, 128, 128])
out_channel = model_cfg.get('OUT_CHANNEL', 128)
self.sparse_shape = grid_size[::-1] + [1, 0, 0]
self.conv_input = spconv.SparseSequential(
spconv.SubMConv3d(input_channels, channels[0], 3, padding=1, bias=False, indice_key='subm1'),
norm_fn(channels[0]),
nn.ReLU(),
)
block = post_act_block
self.conv1 = spconv.SparseSequential(
SparseBasicBlock(channels[0], channels[0], norm_fn=norm_fn, indice_key='res1'),
SparseBasicBlock(channels[0], channels[0], norm_fn=norm_fn, indice_key='res1'),
)
self.conv2 = spconv.SparseSequential(
# [1600, 1408, 41] <- [800, 704, 21]
block(channels[0], channels[1], spconv_kernel_sizes[0], norm_fn=norm_fn, stride=2, padding=int(spconv_kernel_sizes[0]//2), indice_key='spconv2', conv_type='spconv'),
SparseBasicBlock(channels[1], channels[1], norm_fn=norm_fn, indice_key='res2'),
SparseBasicBlock(channels[1], channels[1], norm_fn=norm_fn, indice_key='res2'),
)
self.conv3 = spconv.SparseSequential(
# [800, 704, 21] <- [400, 352, 11]
block(channels[1], channels[2], spconv_kernel_sizes[1], norm_fn=norm_fn, stride=2, padding=int(spconv_kernel_sizes[1]//2), indice_key='spconv3', conv_type='spconv'),
SparseBasicBlock(channels[2], channels[2], norm_fn=norm_fn, indice_key='res3'),
SparseBasicBlock(channels[2], channels[2], norm_fn=norm_fn, indice_key='res3'),
)
self.conv4 = spconv.SparseSequential(
# [400, 352, 11] <- [200, 176, 6]
block(channels[2], channels[3], spconv_kernel_sizes[2], norm_fn=norm_fn, stride=2, padding=int(spconv_kernel_sizes[2]//2), indice_key='spconv4', conv_type='spconv'),
SparseBasicBlock(channels[3], channels[3], norm_fn=norm_fn, indice_key='res4'),
SparseBasicBlock(channels[3], channels[3], norm_fn=norm_fn, indice_key='res4'),
)
self.conv5 = spconv.SparseSequential(
# [200, 176, 6] <- [100, 88, 3]
block(channels[3], channels[4], spconv_kernel_sizes[3], norm_fn=norm_fn, stride=2, padding=int(spconv_kernel_sizes[3]//2), indice_key='spconv5', conv_type='spconv'),
SparseBasicBlock(channels[4], channels[4], norm_fn=norm_fn, indice_key='res5'),
SparseBasicBlock(channels[4], channels[4], norm_fn=norm_fn, indice_key='res5'),
)
self.conv6 = spconv.SparseSequential(
# [100, 88, 3] <- [50, 44, 2]
block(channels[4], channels[4], spconv_kernel_sizes[3], norm_fn=norm_fn, stride=2, padding=int(spconv_kernel_sizes[3]//2), indice_key='spconv6', conv_type='spconv'),
SparseBasicBlock(channels[4], channels[4], norm_fn=norm_fn, indice_key='res6'),
SparseBasicBlock(channels[4], channels[4], norm_fn=norm_fn, indice_key='res6'),
)
self.conv_out = spconv.SparseSequential(
# 2D stride-1 conv on the BEV-collapsed tensor: channels[3] -> out_channel
spconv.SparseConv2d(channels[3], out_channel, 3, stride=1, padding=1, bias=False, indice_key='spconv_down2'),
norm_fn(out_channel),
nn.ReLU(),
)
self.shared_conv = spconv.SparseSequential(
spconv.SubMConv2d(out_channel, out_channel, 3, stride=1, padding=1, bias=True),
nn.BatchNorm1d(out_channel),
nn.ReLU(True),
)
self.forward_ret_dict = {}
self.num_point_features = out_channel
self.backbone_channels = {
'x_conv1': channels[0],
'x_conv2': channels[1],
'x_conv3': channels[2],
'x_conv4': channels[3]
}
def bev_out(self, x_conv):
features_cat = x_conv.features
indices_cat = x_conv.indices[:, [0, 2, 3]]
spatial_shape = x_conv.spatial_shape[1:]
indices_unique, _inv = torch.unique(indices_cat, dim=0, return_inverse=True)
features_unique = features_cat.new_zeros((indices_unique.shape[0], features_cat.shape[1]))
features_unique.index_add_(0, _inv, features_cat)
x_out = spconv.SparseConvTensor(
features=features_unique,
indices=indices_unique,
spatial_shape=spatial_shape,
batch_size=x_conv.batch_size
)
return x_out
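The height compression in `bev_out` merges voxels that share the same (batch, y, x) cell by summing their features; a minimal sketch of that mechanism:

```python
import torch

indices_cat = torch.tensor([[0, 3, 5],
                            [0, 3, 5],    # same BEV cell as the row above
                            [0, 1, 2]])
features_cat = torch.tensor([[1.0], [2.0], [5.0]])

indices_unique, inv = torch.unique(indices_cat, dim=0, return_inverse=True)
features_unique = features_cat.new_zeros((indices_unique.shape[0], 1))
features_unique.index_add_(0, inv, features_cat)
# indices_unique: [[0, 1, 2], [0, 3, 5]]; features_unique: [[5.], [3.]]
```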
def forward(self, batch_dict):
"""
Args:
batch_dict:
batch_size: int
vfe_features: (num_voxels, C)
voxel_coords: (num_voxels, 4), [batch_idx, z_idx, y_idx, x_idx]
Returns:
batch_dict:
encoded_spconv_tensor: sparse tensor
"""
voxel_features, voxel_coords = batch_dict['voxel_features'], batch_dict['voxel_coords']
batch_size = batch_dict['batch_size']
input_sp_tensor = spconv.SparseConvTensor(
features=voxel_features,
indices=voxel_coords.int(),
spatial_shape=self.sparse_shape,
batch_size=batch_size
)
x = self.conv_input(input_sp_tensor)
x_conv1 = self.conv1(x)
x_conv2 = self.conv2(x_conv1)
x_conv3 = self.conv3(x_conv2)
x_conv4 = self.conv4(x_conv3)
x_conv5 = self.conv5(x_conv4)
x_conv6 = self.conv6(x_conv5)
x_conv5.indices[:, 1:] *= 2
x_conv6.indices[:, 1:] *= 4
x_conv4 = x_conv4.replace_feature(torch.cat([x_conv4.features, x_conv5.features, x_conv6.features]))
x_conv4.indices = torch.cat([x_conv4.indices, x_conv5.indices, x_conv6.indices])
out = self.bev_out(x_conv4)
out = self.conv_out(out)
out = self.shared_conv(out)
batch_dict.update({
'encoded_spconv_tensor': out,
'encoded_spconv_tensor_stride': 8
})
batch_dict.update({
'multi_scale_3d_features': {
'x_conv1': x_conv1,
'x_conv2': x_conv2,
'x_conv3': x_conv3,
'x_conv4': x_conv4,
}
})
batch_dict.update({
'multi_scale_3d_strides': {
'x_conv1': 1,
'x_conv2': 2,
'x_conv3': 4,
'x_conv4': 8,
}
})
return batch_dict
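The index rescaling in `forward` aligns the stride-16 (`x_conv5`) and stride-32 (`x_conv6`) voxels onto the stride-8 grid of `x_conv4`, so `bev_out` can merge all three scales; for example:

```python
import torch

idx5 = torch.tensor([[0, 1, 3, 5]])   # (batch, z, y, x) of a stride-16 voxel
idx6 = torch.tensor([[0, 0, 2, 4]])   # a stride-32 voxel
idx5[:, 1:] *= 2                      # -> [[0, 2, 6, 10]] on the stride-8 grid
idx6[:, 1:] *= 4                      # -> [[0, 0, 8, 16]] on the stride-8 grid
```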
......@@ -5,6 +5,7 @@ from .point_head_box import PointHeadBox
from .point_head_simple import PointHeadSimple
from .point_intra_part_head import PointIntraPartOffsetHead
from .center_head import CenterHead
from .voxelnext_head import VoxelNeXtHead
__all__ = {
'AnchorHeadTemplate': AnchorHeadTemplate,
......@@ -13,5 +14,6 @@ __all__ = {
'PointHeadSimple': PointHeadSimple,
'PointHeadBox': PointHeadBox,
'AnchorHeadMulti': AnchorHeadMulti,
'CenterHead': CenterHead,
'VoxelNeXtHead': VoxelNeXtHead,
}
import numpy as np
import torch
import torch.nn as nn
from torch.nn.init import kaiming_normal_
from ..model_utils import centernet_utils
from ..model_utils import model_nms_utils
from ...utils import loss_utils
import spconv.pytorch as spconv
import copy
from easydict import EasyDict
class SeparateHead(nn.Module):
def __init__(self, input_channels, sep_head_dict, kernel_size, init_bias=-2.19, use_bias=False):
super().__init__()
self.sep_head_dict = sep_head_dict
for cur_name in self.sep_head_dict:
output_channels = self.sep_head_dict[cur_name]['out_channels']
num_conv = self.sep_head_dict[cur_name]['num_conv']
fc_list = []
for k in range(num_conv - 1):
fc_list.append(spconv.SparseSequential(
spconv.SubMConv2d(input_channels, input_channels, kernel_size, padding=int(kernel_size//2), bias=use_bias, indice_key=cur_name),
nn.BatchNorm1d(input_channels),
nn.ReLU()
))
fc_list.append(spconv.SubMConv2d(input_channels, output_channels, 1, bias=True, indice_key=cur_name+'out'))
fc = nn.Sequential(*fc_list)
if 'hm' in cur_name:
fc[-1].bias.data.fill_(init_bias)
else:
for m in fc.modules():
if isinstance(m, spconv.SubMConv2d):
kaiming_normal_(m.weight.data)
if hasattr(m, "bias") and m.bias is not None:
nn.init.constant_(m.bias, 0)
self.__setattr__(cur_name, fc)
def forward(self, x):
ret_dict = {}
for cur_name in self.sep_head_dict:
ret_dict[cur_name] = self.__getattr__(cur_name)(x).features
return ret_dict
class VoxelNeXtHead(nn.Module):
def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range, voxel_size,
predict_boxes_when_training=False):
super().__init__()
self.model_cfg = model_cfg
self.num_class = num_class
self.grid_size = grid_size
self.point_cloud_range = torch.Tensor(point_cloud_range).cuda()
self.voxel_size = torch.Tensor(voxel_size).cuda()
self.feature_map_stride = self.model_cfg.TARGET_ASSIGNER_CONFIG.get('FEATURE_MAP_STRIDE', None)
self.class_names = class_names
self.class_names_each_head = []
self.class_id_mapping_each_head = []
self.gaussian_ratio = self.model_cfg.get('GAUSSIAN_RATIO', 1)
self.gaussian_type = self.model_cfg.get('GAUSSIAN_TYPE', ['nearst', 'gt_center'])  # 'nearst' (sic) is the key expected by existing configs
# The iou branch is only used for Waymo dataset
self.iou_branch = self.model_cfg.get('IOU_BRANCH', False)
if self.iou_branch:
self.rectifier = self.model_cfg.get('RECTIFIER')
nms_configs = self.model_cfg.POST_PROCESSING.NMS_CONFIG
self.nms_configs = [EasyDict(NMS_TYPE=nms_configs.NMS_TYPE,
NMS_THRESH=nms_configs.NMS_THRESH[i],
NMS_PRE_MAXSIZE=nms_configs.NMS_PRE_MAXSIZE[i],
NMS_POST_MAXSIZE=nms_configs.NMS_POST_MAXSIZE[i]) for i in range(num_class)]
self.double_flip = self.model_cfg.get('DOUBLE_FLIP', False)
for cur_class_names in self.model_cfg.CLASS_NAMES_EACH_HEAD:
self.class_names_each_head.append([x for x in cur_class_names if x in class_names])
cur_class_id_mapping = torch.from_numpy(np.array(
[self.class_names.index(x) for x in cur_class_names if x in class_names]
)).cuda()
self.class_id_mapping_each_head.append(cur_class_id_mapping)
total_classes = sum([len(x) for x in self.class_names_each_head])
assert total_classes == len(self.class_names), f'class_names_each_head={self.class_names_each_head}'
kernel_size_head = self.model_cfg.get('KERNEL_SIZE_HEAD', 3)
self.heads_list = nn.ModuleList()
self.separate_head_cfg = self.model_cfg.SEPARATE_HEAD_CFG
for idx, cur_class_names in enumerate(self.class_names_each_head):
cur_head_dict = copy.deepcopy(self.separate_head_cfg.HEAD_DICT)
cur_head_dict['hm'] = dict(out_channels=len(cur_class_names), num_conv=self.model_cfg.NUM_HM_CONV)
self.heads_list.append(
SeparateHead(
input_channels=self.model_cfg.get('SHARED_CONV_CHANNEL', 128),
sep_head_dict=cur_head_dict,
kernel_size=kernel_size_head,
init_bias=-2.19,
use_bias=self.model_cfg.get('USE_BIAS_BEFORE_NORM', False),
)
)
self.predict_boxes_when_training = predict_boxes_when_training
self.forward_ret_dict = {}
self.build_losses()
def build_losses(self):
self.add_module('hm_loss_func', loss_utils.FocalLossSparse())
self.add_module('reg_loss_func', loss_utils.RegLossSparse())
if self.iou_branch:
self.add_module('crit_iou', loss_utils.IouLossSparse())
self.add_module('crit_iou_reg', loss_utils.IouRegLossSparse())
def assign_targets(self, gt_boxes, num_voxels, spatial_indices, spatial_shape):
"""
Args:
gt_boxes: (B, M, 8)
Returns:
"""
target_assigner_cfg = self.model_cfg.TARGET_ASSIGNER_CONFIG
batch_size = gt_boxes.shape[0]
ret_dict = {
'heatmaps': [],
'target_boxes': [],
'inds': [],
'masks': [],
'heatmap_masks': [],
'gt_boxes': []
}
all_names = np.array(['bg', *self.class_names])
for idx, cur_class_names in enumerate(self.class_names_each_head):
heatmap_list, target_boxes_list, inds_list, masks_list, gt_boxes_list = [], [], [], [], []
for bs_idx in range(batch_size):
cur_gt_boxes = gt_boxes[bs_idx]
gt_class_names = all_names[cur_gt_boxes[:, -1].cpu().long().numpy()]
gt_boxes_single_head = []
for idx, name in enumerate(gt_class_names):
if name not in cur_class_names:
continue
temp_box = cur_gt_boxes[idx]
temp_box[-1] = cur_class_names.index(name) + 1
gt_boxes_single_head.append(temp_box[None, :])
if len(gt_boxes_single_head) == 0:
gt_boxes_single_head = cur_gt_boxes[:0, :]
else:
gt_boxes_single_head = torch.cat(gt_boxes_single_head, dim=0)
heatmap, ret_boxes, inds, mask = self.assign_target_of_single_head(
num_classes=len(cur_class_names), gt_boxes=gt_boxes_single_head,
num_voxels=num_voxels[bs_idx], spatial_indices=spatial_indices[bs_idx],
spatial_shape=spatial_shape,
feature_map_stride=target_assigner_cfg.FEATURE_MAP_STRIDE,
num_max_objs=target_assigner_cfg.NUM_MAX_OBJS,
gaussian_overlap=target_assigner_cfg.GAUSSIAN_OVERLAP,
min_radius=target_assigner_cfg.MIN_RADIUS,
)
heatmap_list.append(heatmap.to(gt_boxes_single_head.device))
target_boxes_list.append(ret_boxes.to(gt_boxes_single_head.device))
inds_list.append(inds.to(gt_boxes_single_head.device))
masks_list.append(mask.to(gt_boxes_single_head.device))
gt_boxes_list.append(gt_boxes_single_head[:, :-1])
ret_dict['heatmaps'].append(torch.cat(heatmap_list, dim=1).permute(1, 0))
ret_dict['target_boxes'].append(torch.stack(target_boxes_list, dim=0))
ret_dict['inds'].append(torch.stack(inds_list, dim=0))
ret_dict['masks'].append(torch.stack(masks_list, dim=0))
ret_dict['gt_boxes'].append(gt_boxes_list)
return ret_dict
def distance(self, voxel_indices, center):
distances = ((voxel_indices - center.unsqueeze(0))**2).sum(-1)
return distances
def assign_target_of_single_head(
self, num_classes, gt_boxes, num_voxels, spatial_indices, spatial_shape, feature_map_stride, num_max_objs=500,
gaussian_overlap=0.1, min_radius=2
):
"""
Args:
gt_boxes: (N, 8)
feature_map_size: (2), [x, y]
Returns:
"""
heatmap = gt_boxes.new_zeros(num_classes, num_voxels)
ret_boxes = gt_boxes.new_zeros((num_max_objs, gt_boxes.shape[-1] - 1 + 1))
inds = gt_boxes.new_zeros(num_max_objs).long()
mask = gt_boxes.new_zeros(num_max_objs).long()
x, y, z = gt_boxes[:, 0], gt_boxes[:, 1], gt_boxes[:, 2]
coord_x = (x - self.point_cloud_range[0]) / self.voxel_size[0] / feature_map_stride
coord_y = (y - self.point_cloud_range[1]) / self.voxel_size[1] / feature_map_stride
coord_x = torch.clamp(coord_x, min=0, max=spatial_shape[1] - 0.5) # bugfixed: 1e-6 does not work for center.int()
coord_y = torch.clamp(coord_y, min=0, max=spatial_shape[0] - 0.5)
center = torch.cat((coord_x[:, None], coord_y[:, None]), dim=-1)
center_int = center.int()
center_int_float = center_int.float()
dx, dy, dz = gt_boxes[:, 3], gt_boxes[:, 4], gt_boxes[:, 5]
dx = dx / self.voxel_size[0] / feature_map_stride
dy = dy / self.voxel_size[1] / feature_map_stride
radius = centernet_utils.gaussian_radius(dx, dy, min_overlap=gaussian_overlap)
radius = torch.clamp_min(radius.int(), min=min_radius)
for k in range(min(num_max_objs, gt_boxes.shape[0])):
if dx[k] <= 0 or dy[k] <= 0:
continue
if not (0 <= center_int[k][0] <= spatial_shape[1] and 0 <= center_int[k][1] <= spatial_shape[0]):
continue
cur_class_id = (gt_boxes[k, -1] - 1).long()
distance = self.distance(spatial_indices, center[k])
inds[k] = distance.argmin()
mask[k] = 1
if 'gt_center' in self.gaussian_type:
centernet_utils.draw_gaussian_to_heatmap_voxels(heatmap[cur_class_id], distance, radius[k].item() * self.gaussian_ratio)
if 'nearst' in self.gaussian_type:
centernet_utils.draw_gaussian_to_heatmap_voxels(heatmap[cur_class_id], self.distance(spatial_indices, spatial_indices[inds[k]]), radius[k].item() * self.gaussian_ratio)
ret_boxes[k, 0:2] = center[k] - spatial_indices[inds[k]][:2]
ret_boxes[k, 2] = z[k]
ret_boxes[k, 3:6] = gt_boxes[k, 3:6].log()
ret_boxes[k, 6] = torch.cos(gt_boxes[k, 6])
ret_boxes[k, 7] = torch.sin(gt_boxes[k, 6])
if gt_boxes.shape[1] > 8:
ret_boxes[k, 8:] = gt_boxes[k, 7:-1]
return heatmap, ret_boxes, inds, mask
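To make the encoding above concrete, here is a hypothetical single-box target with `voxel_size=0.1`, `feature_map_stride=8`, a point-cloud range starting at 0, and the nearest active voxel at BEV cell (50, 50); all numbers are illustrative:

```python
import torch

x, y, z, dx, dy, dz, yaw = 41.3, 40.2, -1.0, 4.0, 2.0, 1.5, 0.3
coord = torch.tensor([x / 0.1 / 8, y / 0.1 / 8])     # (51.625, 50.25)
nearest = torch.tensor([50.0, 50.0])

target = torch.empty(8)
target[0:2] = coord - nearest                        # sub-voxel center offset
target[2] = z                                        # absolute height
target[3:6] = torch.log(torch.tensor([dx, dy, dz]))  # log-scaled extents
target[6] = torch.cos(torch.tensor(yaw))             # heading as (cos, sin)
target[7] = torch.sin(torch.tensor(yaw))
```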
def sigmoid(self, x):
y = torch.clamp(x.sigmoid(), min=1e-4, max=1 - 1e-4)
return y
def get_loss(self):
pred_dicts = self.forward_ret_dict['pred_dicts']
target_dicts = self.forward_ret_dict['target_dicts']
batch_index = self.forward_ret_dict['batch_index']
tb_dict = {}
loss = 0
batch_indices = self.forward_ret_dict['voxel_indices'][:, 0]
spatial_indices = self.forward_ret_dict['voxel_indices'][:, 1:]
for idx, pred_dict in enumerate(pred_dicts):
pred_dict['hm'] = self.sigmoid(pred_dict['hm'])
hm_loss = self.hm_loss_func(pred_dict['hm'], target_dicts['heatmaps'][idx])
hm_loss *= self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['cls_weight']
target_boxes = target_dicts['target_boxes'][idx]
pred_boxes = torch.cat([pred_dict[head_name] for head_name in self.separate_head_cfg.HEAD_ORDER], dim=1)
reg_loss = self.reg_loss_func(
pred_boxes, target_dicts['masks'][idx], target_dicts['inds'][idx], target_boxes, batch_index
)
loc_loss = (reg_loss * reg_loss.new_tensor(self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['code_weights'])).sum()
loc_loss = loc_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']
tb_dict['hm_loss_head_%d' % idx] = hm_loss.item()
tb_dict['loc_loss_head_%d' % idx] = loc_loss.item()
if self.iou_branch:
batch_box_preds = self._get_predicted_boxes(pred_dict, spatial_indices)
pred_boxes_for_iou = batch_box_preds.detach()
iou_loss = self.crit_iou(pred_dict['iou'], target_dicts['masks'][idx], target_dicts['inds'][idx],
pred_boxes_for_iou, target_dicts['gt_boxes'][idx], batch_indices)
iou_reg_loss = self.crit_iou_reg(batch_box_preds, target_dicts['masks'][idx], target_dicts['inds'][idx],
target_dicts['gt_boxes'][idx], batch_indices)
iou_weight = self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['iou_weight'] if 'iou_weight' in self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS else self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']
iou_reg_loss = iou_reg_loss * iou_weight #self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']
loss += (hm_loss + loc_loss + iou_loss + iou_reg_loss)
tb_dict['iou_loss_head_%d' % idx] = iou_loss.item()
tb_dict['iou_reg_loss_head_%d' % idx] = iou_reg_loss.item()
else:
loss += hm_loss + loc_loss
tb_dict['rpn_loss'] = loss.item()
return loss, tb_dict
def _get_predicted_boxes(self, pred_dict, spatial_indices):
center = pred_dict['center']
center_z = pred_dict['center_z']
#dim = pred_dict['dim'].exp()
dim = torch.exp(torch.clamp(pred_dict['dim'], min=-5, max=5))
rot_cos = pred_dict['rot'][:, 0].unsqueeze(dim=1)
rot_sin = pred_dict['rot'][:, 1].unsqueeze(dim=1)
angle = torch.atan2(rot_sin, rot_cos)
xs = (spatial_indices[:, 1:2] + center[:, 0:1]) * self.feature_map_stride * self.voxel_size[0] + self.point_cloud_range[0]
ys = (spatial_indices[:, 0:1] + center[:, 1:2]) * self.feature_map_stride * self.voxel_size[1] + self.point_cloud_range[1]
box_part_list = [xs, ys, center_z, dim, angle]
pred_box = torch.cat((box_part_list), dim=-1)
return pred_box
def rotate_class_specific_nms_iou(self, boxes, scores, iou_preds, labels, rectifier, nms_configs):
"""
:param boxes: (N, 5) [x, y, z, l, w, h, theta]
:param scores: (N)
:param thresh:
:return:
"""
assert isinstance(rectifier, list)
box_preds_list, scores_list, labels_list = [], [], []
for cls in range(self.num_class):
mask = labels == cls
boxes_cls = boxes[mask]
scores_cls = torch.pow(scores[mask], 1 - rectifier[cls]) * torch.pow(iou_preds[mask].squeeze(-1), rectifier[cls])
labels_cls = labels[mask]
selected, selected_scores = model_nms_utils.class_agnostic_nms(box_scores=scores_cls, box_preds=boxes_cls,
nms_config=nms_configs[cls], score_thresh=None)
box_preds_list.append(boxes_cls[selected])
scores_list.append(scores_cls[selected])
labels_list.append(labels_cls[selected])
return torch.cat(box_preds_list, dim=0), torch.cat(scores_list, dim=0), torch.cat(labels_list, dim=0)
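The IoU rectification above blends the classification score with the predicted IoU as score^(1-r) * iou^r, with a per-class rectifier r; a quick numeric check (values illustrative):

```python
import torch

score, iou, r = torch.tensor(0.9), torch.tensor(0.6), 0.68
rectified = score ** (1 - r) * iou ** r   # ~= 0.68
```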
def merge_double_flip(self, pred_dict, batch_size, voxel_indices, spatial_shape):
# spatial_shape (Z, Y, X)
pred_dict['hm'] = pred_dict['hm'].sigmoid()
pred_dict['dim'] = pred_dict['dim'].exp()
batch_indices = voxel_indices[:, 0]
spatial_indices = voxel_indices[:, 1:]
pred_dict_ = {k: [] for k in pred_dict.keys()}
counts = []
spatial_indices_ = []
for bs_idx in range(batch_size):
spatial_indices_batch = []
pred_dict_batch = {k: [] for k in pred_dict.keys()}
for i in range(4):
bs_indices = batch_indices == (bs_idx * 4 + i)
if i in [1, 3]:
spatial_indices[bs_indices, 0] = spatial_shape[0] - spatial_indices[bs_indices, 0]
if i in [2, 3]:
spatial_indices[bs_indices, 1] = spatial_shape[1] - spatial_indices[bs_indices, 1]
if i == 1:
pred_dict['center'][bs_indices, 1] = - pred_dict['center'][bs_indices, 1]
pred_dict['rot'][bs_indices, 1] *= -1
pred_dict['vel'][bs_indices, 1] *= -1
if i == 2:
pred_dict['center'][bs_indices, 0] = - pred_dict['center'][bs_indices, 0]
pred_dict['rot'][bs_indices, 0] *= -1
pred_dict['vel'][bs_indices, 0] *= -1
if i == 3:
pred_dict['center'][bs_indices, 0] = - pred_dict['center'][bs_indices, 0]
pred_dict['center'][bs_indices, 1] = - pred_dict['center'][bs_indices, 1]
pred_dict['rot'][bs_indices, 1] *= -1
pred_dict['rot'][bs_indices, 0] *= -1
pred_dict['vel'][bs_indices] *= -1
spatial_indices_batch.append(spatial_indices[bs_indices])
for k in pred_dict.keys():
pred_dict_batch[k].append(pred_dict[k][bs_indices])
spatial_indices_batch = torch.cat(spatial_indices_batch)
spatial_indices_unique, _inv, count = torch.unique(spatial_indices_batch, dim=0, return_inverse=True,
return_counts=True)
spatial_indices_.append(spatial_indices_unique)
counts.append(count)
for k in pred_dict.keys():
pred_dict_batch[k] = torch.cat(pred_dict_batch[k])
features_unique = pred_dict_batch[k].new_zeros(
(spatial_indices_unique.shape[0], pred_dict_batch[k].shape[1]))
features_unique.index_add_(0, _inv, pred_dict_batch[k])
pred_dict_[k].append(features_unique)
for k in pred_dict.keys():
pred_dict_[k] = torch.cat(pred_dict_[k])
counts = torch.cat(counts).unsqueeze(-1).float()
voxel_indices_ = torch.cat([torch.cat(
[torch.full((indices.shape[0], 1), i, device=indices.device, dtype=indices.dtype), indices], dim=1
) for i, indices in enumerate(spatial_indices_)])
batch_hm = pred_dict_['hm']
batch_center = pred_dict_['center']
batch_center_z = pred_dict_['center_z']
batch_dim = pred_dict_['dim']
batch_rot_cos = pred_dict_['rot'][:, 0].unsqueeze(dim=1)
batch_rot_sin = pred_dict_['rot'][:, 1].unsqueeze(dim=1)
batch_vel = pred_dict_['vel'] if 'vel' in self.separate_head_cfg.HEAD_ORDER else None
batch_hm /= counts
batch_center /= counts
batch_center_z /= counts
batch_dim /= counts
batch_rot_cos /= counts
batch_rot_sin /= counts
if batch_vel is not None:
batch_vel /= counts
return batch_hm, batch_center, batch_center_z, batch_dim, batch_rot_cos, batch_rot_sin, batch_vel, None, voxel_indices_
def generate_predicted_boxes(self, batch_size, pred_dicts, voxel_indices, spatial_shape):
post_process_cfg = self.model_cfg.POST_PROCESSING
post_center_limit_range = torch.tensor(post_process_cfg.POST_CENTER_LIMIT_RANGE).cuda().float()
ret_dict = [{
'pred_boxes': [],
'pred_scores': [],
'pred_labels': [],
'pred_ious': [],
} for k in range(batch_size)]
for idx, pred_dict in enumerate(pred_dicts):
if self.double_flip:
batch_hm, batch_center, batch_center_z, batch_dim, batch_rot_cos, batch_rot_sin, batch_vel, batch_iou, voxel_indices_ = \
self.merge_double_flip(pred_dict, batch_size, voxel_indices.clone(), spatial_shape)
else:
batch_hm = pred_dict['hm'].sigmoid()
batch_center = pred_dict['center']
batch_center_z = pred_dict['center_z']
batch_dim = pred_dict['dim'].exp()
batch_rot_cos = pred_dict['rot'][:, 0].unsqueeze(dim=1)
batch_rot_sin = pred_dict['rot'][:, 1].unsqueeze(dim=1)
batch_iou = (pred_dict['iou'] + 1) * 0.5 if self.iou_branch else None
batch_vel = pred_dict['vel'] if 'vel' in self.separate_head_cfg.HEAD_ORDER else None
voxel_indices_ = voxel_indices
final_pred_dicts = centernet_utils.decode_bbox_from_voxels_nuscenes(
batch_size=batch_size, indices=voxel_indices_,
obj=batch_hm,
rot_cos=batch_rot_cos,
rot_sin=batch_rot_sin,
center=batch_center, center_z=batch_center_z,
dim=batch_dim, vel=batch_vel, iou=batch_iou,
point_cloud_range=self.point_cloud_range, voxel_size=self.voxel_size,
feature_map_stride=self.feature_map_stride,
K=post_process_cfg.MAX_OBJ_PER_SAMPLE,
#circle_nms=(post_process_cfg.NMS_CONFIG.NMS_TYPE == 'circle_nms'),
score_thresh=post_process_cfg.SCORE_THRESH,
post_center_limit_range=post_center_limit_range
)
for k, final_dict in enumerate(final_pred_dicts):
final_dict['pred_labels'] = self.class_id_mapping_each_head[idx][final_dict['pred_labels'].long()]
if not self.iou_branch:
selected, selected_scores = model_nms_utils.class_agnostic_nms(
box_scores=final_dict['pred_scores'], box_preds=final_dict['pred_boxes'],
nms_config=post_process_cfg.NMS_CONFIG,
score_thresh=None
)
final_dict['pred_boxes'] = final_dict['pred_boxes'][selected]
final_dict['pred_scores'] = selected_scores
final_dict['pred_labels'] = final_dict['pred_labels'][selected]
ret_dict[k]['pred_boxes'].append(final_dict['pred_boxes'])
ret_dict[k]['pred_scores'].append(final_dict['pred_scores'])
ret_dict[k]['pred_labels'].append(final_dict['pred_labels'])
ret_dict[k]['pred_ious'].append(final_dict['pred_ious'])
for k in range(batch_size):
pred_boxes = torch.cat(ret_dict[k]['pred_boxes'], dim=0)
pred_scores = torch.cat(ret_dict[k]['pred_scores'], dim=0)
pred_labels = torch.cat(ret_dict[k]['pred_labels'], dim=0)
if self.iou_branch:
pred_ious = torch.cat(ret_dict[k]['pred_ious'], dim=0)
pred_boxes, pred_scores, pred_labels = self.rotate_class_specific_nms_iou(pred_boxes, pred_scores, pred_ious, pred_labels, self.rectifier, self.nms_configs)
ret_dict[k]['pred_boxes'] = pred_boxes
ret_dict[k]['pred_scores'] = pred_scores
ret_dict[k]['pred_labels'] = pred_labels + 1
return ret_dict
@staticmethod
def reorder_rois_for_refining(batch_size, pred_dicts):
num_max_rois = max([len(cur_dict['pred_boxes']) for cur_dict in pred_dicts])
num_max_rois = max(1, num_max_rois)  # keep at least one fake RoI to avoid empty-tensor errors
pred_boxes = pred_dicts[0]['pred_boxes']
rois = pred_boxes.new_zeros((batch_size, num_max_rois, pred_boxes.shape[-1]))
roi_scores = pred_boxes.new_zeros((batch_size, num_max_rois))
roi_labels = pred_boxes.new_zeros((batch_size, num_max_rois)).long()
for bs_idx in range(batch_size):
num_boxes = len(pred_dicts[bs_idx]['pred_boxes'])
rois[bs_idx, :num_boxes, :] = pred_dicts[bs_idx]['pred_boxes']
roi_scores[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_scores']
roi_labels[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_labels']
return rois, roi_scores, roi_labels
def _get_voxel_infos(self, x):
spatial_shape = x.spatial_shape
voxel_indices = x.indices
spatial_indices = []
num_voxels = []
batch_size = x.batch_size
batch_index = voxel_indices[:, 0]
for bs_idx in range(batch_size):
batch_inds = batch_index==bs_idx
spatial_indices.append(voxel_indices[batch_inds][:, [2, 1]])
num_voxels.append(batch_inds.sum())
return spatial_shape, batch_index, voxel_indices, spatial_indices, num_voxels
def forward(self, data_dict):
x = data_dict['encoded_spconv_tensor']
spatial_shape, batch_index, voxel_indices, spatial_indices, num_voxels = self._get_voxel_infos(x)
self.forward_ret_dict['batch_index'] = batch_index
pred_dicts = []
for head in self.heads_list:
pred_dicts.append(head(x))
if self.training:
target_dict = self.assign_targets(
data_dict['gt_boxes'], num_voxels, spatial_indices, spatial_shape
)
self.forward_ret_dict['target_dicts'] = target_dict
self.forward_ret_dict['pred_dicts'] = pred_dicts
self.forward_ret_dict['voxel_indices'] = voxel_indices
if not self.training or self.predict_boxes_when_training:
if self.double_flip:
data_dict['batch_size'] = data_dict['batch_size'] // 4
pred_dicts = self.generate_predicted_boxes(
data_dict['batch_size'],
pred_dicts, voxel_indices, spatial_shape
)
if self.predict_boxes_when_training:
rois, roi_scores, roi_labels = self.reorder_rois_for_refining(data_dict['batch_size'], pred_dicts)
data_dict['rois'] = rois
data_dict['roi_scores'] = roi_scores
data_dict['roi_labels'] = roi_labels
data_dict['has_class_labels'] = True
else:
data_dict['final_box_dicts'] = pred_dicts
return data_dict
......@@ -12,6 +12,7 @@ from .pv_rcnn_plusplus import PVRCNNPlusPlus
from .mppnet import MPPNet
from .mppnet_e2e import MPPNetE2E
from .pillarnet import PillarNet
from .voxelnext import VoxelNeXt
__all__ = {
'Detector3DTemplate': Detector3DTemplate,
......@@ -28,7 +29,8 @@ __all__ = {
'PVRCNNPlusPlus': PVRCNNPlusPlus,
'MPPNet': MPPNet,
'MPPNetE2E': MPPNetE2E,
'PillarNet': PillarNet,
'VoxelNeXt': VoxelNeXt
}
......
......@@ -127,7 +127,7 @@ class Detector3DTemplate(nn.Module):
return None, model_info_dict
dense_head_module = dense_heads.__all__[self.model_cfg.DENSE_HEAD.NAME](
model_cfg=self.model_cfg.DENSE_HEAD,
input_channels=model_info_dict['num_bev_features'] if 'num_bev_features' in model_info_dict else self.model_cfg.DENSE_HEAD.INPUT_FEATURES,
num_class=self.num_class if not self.model_cfg.DENSE_HEAD.CLASS_AGNOSTIC else 1,
class_names=self.class_names,
grid_size=model_info_dict['grid_size'],
......
from .detector3d_template import Detector3DTemplate
class VoxelNeXt(Detector3DTemplate):
def __init__(self, model_cfg, num_class, dataset):
super().__init__(model_cfg=model_cfg, num_class=num_class, dataset=dataset)
self.module_list = self.build_networks()
def forward(self, batch_dict):
for cur_module in self.module_list:
batch_dict = cur_module(batch_dict)
if self.training:
loss, tb_dict, disp_dict = self.get_training_loss()
ret_dict = {
'loss': loss
}
return ret_dict, tb_dict, disp_dict
else:
pred_dicts, recall_dicts = self.post_processing(batch_dict)
return pred_dicts, recall_dicts
def get_training_loss(self):
disp_dict = {}
loss, tb_dict = self.dense_head.get_loss()
return loss, tb_dict, disp_dict
def post_processing(self, batch_dict):
post_process_cfg = self.model_cfg.POST_PROCESSING
batch_size = batch_dict['batch_size']
final_pred_dict = batch_dict['final_box_dicts']
recall_dict = {}
for index in range(batch_size):
pred_boxes = final_pred_dict[index]['pred_boxes']
recall_dict = self.generate_recall_record(
box_preds=pred_boxes,
recall_dict=recall_dict, batch_index=index, data_dict=batch_dict,
thresh_list=post_process_cfg.RECALL_THRESH_LIST
)
return final_pred_dict, recall_dict
......@@ -77,6 +77,25 @@ def _nms(heat, kernel=3):
return heat * keep
def gaussian3D(shape, sigma=1):
m, n = [(ss - 1.) / 2. for ss in shape]
y, x = np.ogrid[-m:m + 1, -n:n + 1]
h = np.exp(-(x * x + y * y) / (2 * sigma * sigma))
h[h < np.finfo(h.dtype).eps * h.max()] = 0
return h
def draw_gaussian_to_heatmap_voxels(heatmap, distances, radius, k=1):
diameter = 2 * radius + 1
sigma = diameter / 6
masked_gaussian = torch.exp(- distances / (2 * sigma * sigma))
torch.max(heatmap, masked_gaussian, out=heatmap)
return heatmap
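A small numeric sketch of the kernel above: with radius 2, sigma = (2*2+1)/6, and the gaussian of the squared voxel distances is max-merged into the heatmap:

```python
import torch

heatmap = torch.zeros(4)
distances = torch.tensor([0.0, 1.0, 4.0, 9.0])   # squared distances to the center
sigma = (2 * 2 + 1) / 6                          # radius = 2
torch.max(heatmap, torch.exp(-distances / (2 * sigma * sigma)), out=heatmap)
# heatmap ≈ [1.000, 0.487, 0.056, 0.002]
```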
@numba.jit(nopython=True)
def circle_nms(dets, thresh):
x1 = dets[:, 0]
......@@ -214,3 +233,116 @@ def decode_bbox_from_heatmap(heatmap, rot_cos, rot_sin, center, center_z, dim,
'pred_labels': cur_labels
})
return ret_pred_dicts
def _topk_1d(scores, batch_size, batch_idx, obj, K=40, nuscenes=False):
# scores: (N, num_classes)
topk_score_list = []
topk_inds_list = []
topk_classes_list = []
for bs_idx in range(batch_size):
batch_inds = batch_idx==bs_idx
if obj.shape[-1] == 1 and not nuscenes:
score = scores[batch_inds].permute(1, 0)
topk_scores, topk_inds = torch.topk(score, K)
topk_score, topk_ind = torch.topk(obj[topk_inds.view(-1)].squeeze(-1), K) #torch.topk(topk_scores.view(-1), K)
else:
score = obj[batch_inds].permute(1, 0)
topk_scores, topk_inds = torch.topk(score, min(K, score.shape[-1]))
topk_score, topk_ind = torch.topk(topk_scores.view(-1), min(K, topk_scores.view(-1).shape[-1]))
#topk_score, topk_ind = torch.topk(score.reshape(-1), K)
topk_classes = (topk_ind // K).int()
topk_inds = topk_inds.view(-1).gather(0, topk_ind)
#print('topk_inds', topk_inds)
if obj is not None and obj.shape[-1] == 1:
topk_score_list.append(obj[batch_inds][topk_inds])
else:
topk_score_list.append(topk_score)
topk_inds_list.append(topk_inds)
topk_classes_list.append(topk_classes)
topk_score = torch.stack(topk_score_list)
topk_inds = torch.stack(topk_inds_list)
topk_classes = torch.stack(topk_classes_list)
return topk_score, topk_inds, topk_classes
def gather_feat_idx(feats, inds, batch_size, batch_idx):
feats_list = []
dim = feats.size(-1)
_inds = inds.unsqueeze(-1).expand(inds.size(0), inds.size(1), dim)
for bs_idx in range(batch_size):
batch_inds = batch_idx==bs_idx
feat = feats[batch_inds]
feats_list.append(feat.gather(0, _inds[bs_idx]))
feats = torch.stack(feats_list)
return feats
def decode_bbox_from_voxels_nuscenes(batch_size, indices, obj, rot_cos, rot_sin,
center, center_z, dim, vel=None, iou=None, point_cloud_range=None, voxel_size=None, voxels_3d=None,
feature_map_stride=None, K=100, score_thresh=None, post_center_limit_range=None, add_features=None):
batch_idx = indices[:, 0]
spatial_indices = indices[:, 1:]
scores, inds, class_ids = _topk_1d(None, batch_size, batch_idx, obj, K=K, nuscenes=True)
center = gather_feat_idx(center, inds, batch_size, batch_idx)
rot_sin = gather_feat_idx(rot_sin, inds, batch_size, batch_idx)
rot_cos = gather_feat_idx(rot_cos, inds, batch_size, batch_idx)
center_z = gather_feat_idx(center_z, inds, batch_size, batch_idx)
dim = gather_feat_idx(dim, inds, batch_size, batch_idx)
spatial_indices = gather_feat_idx(spatial_indices, inds, batch_size, batch_idx)
if add_features is not None:
add_features = [gather_feat_idx(add_feature, inds, batch_size, batch_idx) for add_feature in add_features]
if not isinstance(feature_map_stride, int):
feature_map_stride = gather_feat_idx(feature_map_stride.unsqueeze(-1), inds, batch_size, batch_idx)
angle = torch.atan2(rot_sin, rot_cos)
xs = (spatial_indices[:, :, -1:] + center[:, :, 0:1]) * feature_map_stride * voxel_size[0] + point_cloud_range[0]
ys = (spatial_indices[:, :, -2:-1] + center[:, :, 1:2]) * feature_map_stride * voxel_size[1] + point_cloud_range[1]
#zs = (spatial_indices[:, :, 0:1]) * feature_map_stride * voxel_size[2] + point_cloud_range[2] + center_z
box_part_list = [xs, ys, center_z, dim, angle]
if vel is not None:
vel = gather_feat_idx(vel, inds, batch_size, batch_idx)
box_part_list.append(vel)
if iou is not None:
iou = gather_feat_idx(iou, inds, batch_size, batch_idx)
iou = torch.clamp(iou, min=0, max=1.)
final_box_preds = torch.cat((box_part_list), dim=-1)
final_scores = scores.view(batch_size, K)
final_class_ids = class_ids.view(batch_size, K)
if add_features is not None:
add_features = [add_feature.view(batch_size, K, add_feature.shape[-1]) for add_feature in add_features]
assert post_center_limit_range is not None
mask = (final_box_preds[..., :3] >= post_center_limit_range[:3]).all(2)
mask &= (final_box_preds[..., :3] <= post_center_limit_range[3:]).all(2)
if score_thresh is not None:
mask &= (final_scores > score_thresh)
ret_pred_dicts = []
for k in range(batch_size):
cur_mask = mask[k]
cur_boxes = final_box_preds[k, cur_mask]
cur_scores = final_scores[k, cur_mask]
cur_labels = final_class_ids[k, cur_mask]
cur_add_features = [add_feature[k, cur_mask] for add_feature in add_features] if add_features is not None else None
cur_iou = iou[k, cur_mask] if iou is not None else None
ret_pred_dicts.append({
'pred_boxes': cur_boxes,
'pred_scores': cur_scores,
'pred_labels': cur_labels,
'pred_ious': cur_iou,
'add_features': cur_add_features,
})
return ret_pred_dicts
......@@ -80,6 +80,42 @@ def boxes_iou3d_gpu(boxes_a, boxes_b):
return iou3d
def boxes_aligned_iou3d_gpu(boxes_a, boxes_b):
"""
Args:
boxes_a: (N, 7) [x, y, z, dx, dy, dz, heading]
boxes_b: (N, 7) [x, y, z, dx, dy, dz, heading]
Returns:
ans_iou: (N,)
"""
assert boxes_a.shape[0] == boxes_b.shape[0]
assert boxes_a.shape[1] == boxes_b.shape[1] == 7
# height overlap
boxes_a_height_max = (boxes_a[:, 2] + boxes_a[:, 5] / 2).view(-1, 1)
boxes_a_height_min = (boxes_a[:, 2] - boxes_a[:, 5] / 2).view(-1, 1)
boxes_b_height_max = (boxes_b[:, 2] + boxes_b[:, 5] / 2).view(-1, 1)
boxes_b_height_min = (boxes_b[:, 2] - boxes_b[:, 5] / 2).view(-1, 1)
# bev overlap
overlaps_bev = torch.cuda.FloatTensor(torch.Size((boxes_a.shape[0], 1))).zero_()  # (N, 1)
iou3d_nms_cuda.boxes_aligned_overlap_bev_gpu(boxes_a.contiguous(), boxes_b.contiguous(), overlaps_bev)
max_of_min = torch.max(boxes_a_height_min, boxes_b_height_min)
min_of_max = torch.min(boxes_a_height_max, boxes_b_height_max)
overlaps_h = torch.clamp(min_of_max - max_of_min, min=0)
# 3d iou
overlaps_3d = overlaps_bev * overlaps_h
vol_a = (boxes_a[:, 3] * boxes_a[:, 4] * boxes_a[:, 5]).view(-1, 1)
vol_b = (boxes_b[:, 3] * boxes_b[:, 4] * boxes_b[:, 5]).view(-1, 1)
iou3d = overlaps_3d / torch.clamp(vol_a + vol_b - overlaps_3d, min=1e-6)
return iou3d
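A hypothetical sanity check for the aligned pairwise IoU above (requires a CUDA build of the ops): identical boxes give IoU 1, and shifting one box up by its full height removes all vertical overlap:

```python
import torch

a = torch.tensor([[0., 0., 0., 4., 2., 1.5, 0.]]).cuda()
print(boxes_aligned_iou3d_gpu(a, a))    # tensor([[1.]])
b = a.clone()
b[:, 2] += 1.5                          # raise by dz -> no height overlap
print(boxes_aligned_iou3d_gpu(a, b))    # tensor([[0.]])
```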
def nms_gpu(boxes, scores, thresh, pre_maxsize=None, **kwargs):
"""
......
......@@ -250,3 +250,24 @@ int boxes_iou_bev_cpu(at::Tensor boxes_a_tensor, at::Tensor boxes_b_tensor, at::
}
return 1;
}
int boxes_aligned_iou_bev_cpu(at::Tensor boxes_a_tensor, at::Tensor boxes_b_tensor, at::Tensor ans_iou_tensor){
// params boxes_a_tensor: (N, 7) [x, y, z, dx, dy, dz, heading]
// params boxes_b_tensor: (N, 7) [x, y, z, dx, dy, dz, heading]
// params ans_iou_tensor: (N, 1)
CHECK_CONTIGUOUS(boxes_a_tensor);
CHECK_CONTIGUOUS(boxes_b_tensor);
int num_boxes = boxes_a_tensor.size(0);
int num_boxes_b = boxes_b_tensor.size(0);
assert(num_boxes == num_boxes_b);
const float *boxes_a = boxes_a_tensor.data<float>();
const float *boxes_b = boxes_b_tensor.data<float>();
float *ans_iou = ans_iou_tensor.data<float>();
for (int i = 0; i < num_boxes; i++){
ans_iou[i] = iou_bev(boxes_a + i * 7, boxes_b + i * 7);
}
return 1;
}