Unverified commit 32a4328b authored by Wenwei Zhang, committed by GitHub

Bump version to V1.0.0rc0

parents 86cc487c a8817998
......@@ -19,12 +19,6 @@
**Note**: We have fully supported pycocotools since version 0.13.0.
- If you face the following issue and your environment contains numba == 0.48.0 and numpy >= 1.20.0:
  ``TypeError: expected dtype object, got 'numpy.dtype[bool_]'``
  please downgrade numpy to < 1.20.0 or install numba == 0.48 from source. This is because numpy == 1.20.0 changed its API so that calling `np.dtype` produces a subclass. Please refer to [here](https://github.com/numba/numba/issues/6041) for more details.
- If you face the following issue when importing pycocotools-related packages:
  ``ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject``
......
......@@ -10,6 +10,7 @@
| MMDetection3D version | MMDetection version | MMSegmentation version | MMCV version |
|:-------------------:|:-------------------:|:-------------------:|:-------------------:|
| master | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| v1.0.0rc0 | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| 0.18.1 | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| 0.18.0 | mmdet>=2.19.0, <=3.0.0| mmseg>=0.20.0, <=1.0.0 | mmcv-full>=1.3.8, <=1.5.0|
| 0.17.3 | mmdet>=2.14.0, <=3.0.0| mmseg>=0.14.1, <=1.0.0 | mmcv-full>=1.3.8, <=1.4.0|
......
......@@ -75,3 +75,31 @@
### ImVoxelNet
Please refer to [ImVoxelNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/imvoxelnet) for more details. We provide results on the KITTI dataset.
### PAConv
Please refer to [PAConv](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/paconv) for more details. We provide results on the S3DIS dataset.
### DGCNN
Please refer to [DGCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/dgcnn) for more details. We provide results on the S3DIS dataset.
### SMOKE
Please refer to [SMOKE](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/smoke) for more details. We provide results on the KITTI dataset.
### PGD
Please refer to [PGD](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pgd) for more details. We provide results on the KITTI and nuScenes datasets.
### PointRCNN
Please refer to [PointRCNN](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/point_rcnn) for more details. We provide results on the KITTI dataset.
### MonoFlex
Please refer to [MonoFlex](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/monoflex) for more details. We provide results on the KITTI dataset.
### Mixed Precision (FP16) Training
Please refer to the [Mixed Precision (FP16) Training on PointPillars](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0.dev0/configs/pointpillars/hv_pointpillars_fpn_sbn-all_fp16_2x8_2x_nus-3d.py) example for more details.
#!/usr/bin/env python
import functools as func
import glob
import numpy as np
import re
from os import path as osp
import numpy as np
url_prefix = 'https://github.com/open-mmlab/mmdetection3d/blob/master/'
files = sorted(glob.glob('../configs/*/README.md'))
......
......@@ -6,3 +6,4 @@
data_pipeline.md
customize_models.md
customize_runtime.md
coord_sys_tutorial.md
......@@ -71,7 +71,7 @@ python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --show --show-dir ${SHOW_DIR}
python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --eval 'mAP' --eval-options 'show=True' 'out_dir=${SHOW_DIR}'
```
After running this command, you will get the input data, the network outputs visualized on the input, and the ground-truth labels in `${SHOW_DIR}` (e.g. `***_points.obj`, `***_pred.obj`, `***_gt.obj`, `***_img.png`, `***_pred.png` for multi-modality detection tasks). When `show` is enabled, [Open3D](http://www.open3d.org/) will be used to visualize the results online. You need to set `show=False` when running the test on a remote server without a GUI.
After running this command, you will get the input data, the network outputs visualized on the input, and the ground-truth labels in `${SHOW_DIR}` (e.g. `***_points.obj`, `***_pred.obj`, `***_gt.obj`, `***_img.png`, `***_pred.png` for multi-modality detection tasks). When `show` is enabled, [Open3D](http://www.open3d.org/) will be used to visualize the results online. Online visualization is not possible when running the test on a remote server without a GUI; in that case you can set `show=False` to save the output results in `${SHOW_DIR}`.
For offline visualization, you have two options.
To visualize the results with the `Open3D` backend, you can run the following command
......@@ -97,6 +97,12 @@ python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py -
**Note**: Once `--output-dir` is specified, the images of the user-specified views will be saved when pressing `_ESC_` in the open3d window. If you do not have a monitor, you can remove the `--online` flag to only save the visualization results and browse them offline.
To verify the consistency of the data and the effect of data augmentation, you can also add the `--aug` flag to visualize the augmented data with the following command:
```shell
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task det --aug --output-dir ${OUTPUT_DIR} --online
```
If you also want to show 2D images with the projected 3D bounding boxes, you need to find a config file that supports multi-modality data loading and then change the `--task` argument to `multi_modality-det`. An example is shown below
```shell
......@@ -123,6 +129,64 @@ python tools/misc/browse_dataset.py configs/_base_/datasets/nus-mono3d.py --task
&emsp;
# Model Deployment
**Note**: This tool is still experimental. For now only SECOND is supported to be served with [`TorchServe`](https://pytorch.org/serve/). We will support more models in the future.
To serve an `MMDetection3D` model with [`TorchServe`](https://pytorch.org/serve/), you can follow the steps below:
## 1. Convert the model from MMDetection3D to TorchServe
```shell
python tools/deployment/mmdet3d2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
--output-folder ${MODEL_STORE} \
--model-name ${MODEL_NAME}
```
**Note**: ${MODEL_STORE} needs to be an absolute path to a folder.
## 2. Build the `mmdet3d-serve` docker image
```shell
docker build -t mmdet3d-serve:latest docker/serve/
```
## 3. Run `mmdet3d-serve`
Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).
In order to run on a GPU, you need to install [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). You can omit the `--gpus` argument to run on a CPU instead.
Example:
```shell
docker run --rm \
--cpus 8 \
--gpus device=0 \
-p8080:8080 -p8081:8081 -p8082:8082 \
--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \
mmdet3d-serve:latest
```
[Read the docs](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md/) about the Inference (8080), Management (8081) and Metrics (8082) APIs.
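As a quick sanity check that the container is up, the minimal sketch below (assuming the server runs on localhost with the default ports and that the `requests` package is available) queries the standard TorchServe health and management endpoints:
```python
import requests

# Health check against the Inference API (port 8080); TorchServe answers
# with {"status": "Healthy"} once the server is ready.
print(requests.get('http://127.0.0.1:8080/ping').json())

# List the models registered with the Management API (port 8081); the model
# converted in step 1 should appear here.
print(requests.get('http://127.0.0.1:8081/models').json())
```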
## 4. Test deployment
You can use `test_torchserver.py` to test the deployed model and compare the results from TorchServe and PyTorch.
```shell
python tools/deployment/test_torchserver.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${MODEL_NAME}
[--inference-addr ${INFERENCE_ADDR}] [--device ${DEVICE}] [--score-thr ${SCORE_THR}]
```
Example:
```shell
python tools/deployment/test_torchserver.py demo/data/kitti/kitti_000008.bin configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth second
```
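If you prefer to call the REST endpoint directly instead of using `test_torchserver.py`, a minimal sketch (assuming the model was registered under the name `second` and the server from step 3 is running on localhost) looks like this:
```python
import requests

# Send the raw KITTI point cloud binary to the TorchServe Inference API.
# The model name in the URL must match ${MODEL_NAME} used during conversion.
with open('demo/data/kitti/kitti_000008.bin', 'rb') as f:
    response = requests.post(
        'http://127.0.0.1:8080/predictions/second', data=f)
print(response.json())
```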
&emsp;
# Model Complexity
You can use `tools/analysis_tools/get_flops.py` in MMDetection, a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch), to compute the FLOPs and parameters of a given model.
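If you would rather estimate the complexity from Python directly, the sketch below (shown on a toy `torch.nn` module; for a real detector the `get_flops.py` script takes care of building the model from its config) uses mmcv's `get_model_complexity_info` helper:
```python
import torch.nn as nn
from mmcv.cnn import get_model_complexity_info

# A toy network standing in for a detector backbone.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1))

# Returns human-readable strings for FLOPs and the number of parameters.
flops, params = get_model_complexity_info(model, (3, 224, 224))
print(f'FLOPs: {flops}, Params: {params}')
```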
......
......@@ -4,10 +4,11 @@ from .inference import (convert_SyncBN, inference_detector,
inference_multi_modality_detector, inference_segmentor,
init_model, show_result_meshlab)
from .test import single_gpu_test
from .train import train_model
from .train import init_random_seed, train_model
__all__ = [
'inference_detector', 'init_model', 'single_gpu_test',
'inference_mono_3d_detector', 'show_result_meshlab', 'convert_SyncBN',
'train_model', 'inference_multi_modality_detector', 'inference_segmentor'
'train_model', 'inference_multi_modality_detector', 'inference_segmentor',
'init_random_seed'
]
# Copyright (c) OpenMMLab. All rights reserved.
import re
from copy import deepcopy
from os import path as osp
import mmcv
import numpy as np
import re
import torch
from copy import deepcopy
from mmcv.parallel import collate, scatter
from mmcv.runner import load_checkpoint
from os import path as osp
from mmdet3d.core import (Box3DMode, CameraInstance3DBoxes,
from mmdet3d.core import (Box3DMode, CameraInstance3DBoxes, Coord3DMode,
DepthInstance3DBoxes, LiDARInstance3DBoxes,
show_multi_modality_result, show_result,
show_seg_result)
......@@ -83,26 +84,53 @@ def inference_detector(model, pcd):
"""
cfg = model.cfg
device = next(model.parameters()).device # model device
if not isinstance(pcd, str):
cfg = cfg.copy()
# set loading pipeline type
cfg.data.test.pipeline[0].type = 'LoadPointsFromDict'
# build the data pipeline
test_pipeline = deepcopy(cfg.data.test.pipeline)
test_pipeline = Compose(test_pipeline)
box_type_3d, box_mode_3d = get_box_type(cfg.data.test.box_type_3d)
data = dict(
pts_filename=pcd,
box_type_3d=box_type_3d,
box_mode_3d=box_mode_3d,
# for ScanNet demo we need axis_align_matrix
ann_info=dict(axis_align_matrix=np.eye(4)),
sweeps=[],
# set timestamp = 0
timestamp=[0],
img_fields=[],
bbox3d_fields=[],
pts_mask_fields=[],
pts_seg_fields=[],
bbox_fields=[],
mask_fields=[],
seg_fields=[])
if isinstance(pcd, str):
# load from point clouds file
data = dict(
pts_filename=pcd,
box_type_3d=box_type_3d,
box_mode_3d=box_mode_3d,
# for ScanNet demo we need axis_align_matrix
ann_info=dict(axis_align_matrix=np.eye(4)),
sweeps=[],
# set timestamp = 0
timestamp=[0],
img_fields=[],
bbox3d_fields=[],
pts_mask_fields=[],
pts_seg_fields=[],
bbox_fields=[],
mask_fields=[],
seg_fields=[])
else:
# load from http
data = dict(
points=pcd,
box_type_3d=box_type_3d,
box_mode_3d=box_mode_3d,
# for ScanNet demo we need axis_align_matrix
ann_info=dict(axis_align_matrix=np.eye(4)),
sweeps=[],
# set timestamp = 0
timestamp=[0],
img_fields=[],
bbox3d_fields=[],
pts_mask_fields=[],
pts_seg_fields=[],
bbox_fields=[],
mask_fields=[],
seg_fields=[])
data = test_pipeline(data)
data = collate([data], samples_per_gpu=1)
if next(model.parameters()).is_cuda:
......@@ -317,8 +345,7 @@ def show_det_result_meshlab(data,
# for now we convert points into depth mode
box_mode = data['img_metas'][0][0]['box_mode_3d']
if box_mode != Box3DMode.DEPTH:
points = points[..., [1, 0, 2]]
points[..., 0] *= -1
points = Coord3DMode.convert(points, box_mode, Coord3DMode.DEPTH)
show_bboxes = Box3DMode.convert(pred_bboxes, box_mode, Box3DMode.DEPTH)
else:
show_bboxes = deepcopy(pred_bboxes)
......@@ -462,15 +489,17 @@ def show_result_meshlab(data,
data (dict): Contain data from pipeline.
result (dict): Predicted result from model.
out_dir (str): Directory to save visualized result.
score_thr (float): Minimum score of bboxes to be shown. Default: 0.0
show (bool): Visualize the results online. Defaults to False.
snapshot (bool): Whether to save the online results. Defaults to False.
task (str): Distinguish which task result to visualize. Currently we
support 3D detection, multi-modality detection and 3D segmentation.
Defaults to 'det'.
palette (list[list[int]]] | np.ndarray | None): The palette of
segmentation map. If None is given, random palette will be
generated. Defaults to None.
score_thr (float, optional): Minimum score of bboxes to be shown.
Default: 0.0
show (bool, optional): Visualize the results online. Defaults to False.
snapshot (bool, optional): Whether to save the online results.
Defaults to False.
task (str, optional): Distinguish which task result to visualize.
Currently we support 3D detection, multi-modality detection and
3D segmentation. Defaults to 'det'.
palette (list[list[int]]] | np.ndarray, optional): The palette
of segmentation map. If None is given, random palette will be
generated. Defaults to None.
"""
assert task in ['det', 'multi_modality-det', 'seg', 'mono-det'], \
f'unsupported visualization task {task}'
......
# Copyright (c) OpenMMLab. All rights reserved.
from os import path as osp
import mmcv
import torch
from mmcv.image import tensor2imgs
from os import path as osp
from mmdet3d.models import (Base3DDetector, Base3DSegmentor,
SingleStageMono3DDetector)
......@@ -22,9 +23,9 @@ def single_gpu_test(model,
Args:
model (nn.Module): Model to be tested.
data_loader (nn.Dataloader): Pytorch data loader.
show (bool): Whether to save visualization results.
show (bool, optional): Whether to save visualization results.
Default: True.
out_dir (str): The path to save visualization results.
out_dir (str, optional): The path to save visualization results.
Default: None.
Returns:
......
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from mmcv.runner import get_dist_info
from torch import distributed as dist
from mmdet.apis import train_detector
from mmseg.apis import train_segmentor
def init_random_seed(seed=None, device='cuda'):
"""Initialize random seed.
If the seed is not set, the seed will be automatically randomized,
and then broadcast to all processes to prevent some potential bugs.
Args:
seed (int, optional): The seed. Default to None.
device (str, optional): The device where the seed will be put on.
Default to 'cuda'.
Returns:
int: Seed to be used.
"""
if seed is not None:
return seed
# Make sure all ranks share the same random seed to prevent
# some potential bugs. Please refer to
# https://github.com/open-mmlab/mmdetection/issues/6339
rank, world_size = get_dist_info()
seed = np.random.randint(2**31)
if world_size == 1:
return seed
if rank == 0:
random_num = torch.tensor(seed, dtype=torch.int32, device=device)
else:
random_num = torch.tensor(0, dtype=torch.int32, device=device)
dist.broadcast(random_num, src=0)
return random_num.item()
def train_model(model,
dataset,
cfg,
......
......@@ -19,20 +19,26 @@ class Anchor3DRangeGenerator(object):
ranges (list[list[float]]): Ranges of different anchors.
The ranges are the same across different feature levels. But may
vary for different anchor sizes if size_per_range is True.
sizes (list[list[float]]): 3D sizes of anchors.
scales (list[int]): Scales of anchors in different feature levels.
rotations (list[float]): Rotations of anchors in a feature grid.
custom_values (tuple[float]): Customized values of that anchor. For
example, in nuScenes the anchors have velocities.
reshape_out (bool): Whether to reshape the output into (N x 4).
size_per_range: Whether to use separate ranges for different sizes.
If size_per_range is True, the ranges should have the same length
as the sizes, if not, it will be duplicated.
sizes (list[list[float]], optional): 3D sizes of anchors.
Defaults to [[3.9, 1.6, 1.56]].
scales (list[int], optional): Scales of anchors in different feature
levels. Defaults to [1].
rotations (list[float], optional): Rotations of anchors in a feature
grid. Defaults to [0, 1.5707963].
custom_values (tuple[float], optional): Customized values of that
anchor. For example, in nuScenes the anchors have velocities.
Defaults to ().
reshape_out (bool, optional): Whether to reshape the output into
(N x 4). Defaults to True.
size_per_range (bool, optional): Whether to use separate ranges for
different sizes. If size_per_range is True, the ranges should have
the same length as the sizes, if not, it will be duplicated.
Defaults to True.
"""
def __init__(self,
ranges,
sizes=[[1.6, 3.9, 1.56]],
sizes=[[3.9, 1.6, 1.56]],
scales=[1],
rotations=[0, 1.5707963],
custom_values=(),
......@@ -86,13 +92,14 @@ class Anchor3DRangeGenerator(object):
Args:
featmap_sizes (list[tuple]): List of feature map sizes in
multiple feature levels.
device (str): Device where the anchors will be put on.
device (str, optional): Device where the anchors will be put on.
Defaults to 'cuda'.
Returns:
list[torch.Tensor]: Anchors in multiple feature levels. \
The sizes of each tensor should be [N, 4], where \
N = width * height * num_base_anchors, width and height \
are the sizes of the corresponding feature lavel, \
list[torch.Tensor]: Anchors in multiple feature levels.
The sizes of each tensor should be [N, 4], where
N = width * height * num_base_anchors, width and height
are the sizes of the corresponding feature level,
num_base_anchors is the number of anchors for that level.
"""
assert self.num_levels == len(featmap_sizes)
......@@ -149,7 +156,7 @@ class Anchor3DRangeGenerator(object):
feature_size,
anchor_range,
scale=1,
sizes=[[1.6, 3.9, 1.56]],
sizes=[[3.9, 1.6, 1.56]],
rotations=[0, 1.5707963],
device='cuda'):
"""Generate anchors in a single range.
......@@ -161,14 +168,18 @@ class Anchor3DRangeGenerator(object):
shape [6]. The order is consistent with that of anchors, i.e.,
(x_min, y_min, z_min, x_max, y_max, z_max).
scale (float | int, optional): The scale factor of anchors.
sizes (list[list] | np.ndarray | torch.Tensor): Anchor size with
shape [N, 3], in order of x, y, z.
rotations (list[float] | np.ndarray | torch.Tensor): Rotations of
anchors in a single feature grid.
Defaults to 1.
sizes (list[list] | np.ndarray | torch.Tensor, optional):
Anchor size with shape [N, 3], in order of x, y, z.
Defaults to [[3.9, 1.6, 1.56]].
rotations (list[float] | np.ndarray | torch.Tensor, optional):
Rotations of anchors in a single feature grid.
Defaults to [0, 1.5707963].
device (str): Devices that the anchors will be put on.
Defaults to 'cuda'.
Returns:
torch.Tensor: Anchors with shape \
torch.Tensor: Anchors with shape
[*feature_size, num_sizes, num_rots, 7].
"""
if len(feature_size) == 2:
......@@ -231,10 +242,10 @@ class AlignedAnchor3DRangeGenerator(Anchor3DRangeGenerator):
up corner to distribute anchors.
Args:
anchor_corner (bool): Whether to align with the corner of the voxel
grid. By default it is False and the anchor's center will be
align_corner (bool, optional): Whether to align with the corner of the
voxel grid. By default it is False and the anchor's center will be
the same as the corresponding voxel's center, which is also the
center of the corresponding feature grid.
center of the corresponding feature grid. Defaults to False.
"""
def __init__(self, align_corner=False, **kwargs):
......@@ -245,7 +256,7 @@ class AlignedAnchor3DRangeGenerator(Anchor3DRangeGenerator):
feature_size,
anchor_range,
scale,
sizes=[[1.6, 3.9, 1.56]],
sizes=[[3.9, 1.6, 1.56]],
rotations=[0, 1.5707963],
device='cuda'):
"""Generate anchors in a single range.
......@@ -256,15 +267,18 @@ class AlignedAnchor3DRangeGenerator(Anchor3DRangeGenerator):
anchor_range (torch.Tensor | list[float]): Range of anchors with
shape [6]. The order is consistent with that of anchors, i.e.,
(x_min, y_min, z_min, x_max, y_max, z_max).
scale (float | int, optional): The scale factor of anchors.
sizes (list[list] | np.ndarray | torch.Tensor): Anchor size with
shape [N, 3], in order of x, y, z.
rotations (list[float] | np.ndarray | torch.Tensor): Rotations of
anchors in a single feature grid.
device (str): Devices that the anchors will be put on.
scale (float | int): The scale factor of anchors.
sizes (list[list] | np.ndarray | torch.Tensor, optional):
Anchor size with shape [N, 3], in order of x, y, z.
Defaults to [[3.9, 1.6, 1.56]].
rotations (list[float] | np.ndarray | torch.Tensor, optional):
Rotations of anchors in a single feature grid.
Defaults to [0, 1.5707963].
device (str, optional): Devices that the anchors will be put on.
Defaults to 'cuda'.
Returns:
torch.Tensor: Anchors with shape \
torch.Tensor: Anchors with shape
[*feature_size, num_sizes, num_rots, 7].
"""
if len(feature_size) == 2:
......@@ -334,7 +348,7 @@ class AlignedAnchor3DRangeGeneratorPerCls(AlignedAnchor3DRangeGenerator):
Note that feature maps of different classes may be different.
Args:
kwargs (dict): Arguments are the same as those in \
kwargs (dict): Arguments are the same as those in
:class:`AlignedAnchor3DRangeGenerator`.
"""
......@@ -347,15 +361,16 @@ class AlignedAnchor3DRangeGeneratorPerCls(AlignedAnchor3DRangeGenerator):
"""Generate grid anchors in multiple feature levels.
Args:
featmap_sizes (list[tuple]): List of feature map sizes for \
featmap_sizes (list[tuple]): List of feature map sizes for
different classes in a single feature level.
device (str): Device where the anchors will be put on.
device (str, optional): Device where the anchors will be put on.
Defaults to 'cuda'.
Returns:
list[list[torch.Tensor]]: Anchors in multiple feature levels. \
Note that in this anchor generator, we currently only \
support single feature level. The sizes of each tensor \
should be [num_sizes/ranges*num_rots*featmap_size, \
list[list[torch.Tensor]]: Anchors in multiple feature levels.
Note that in this anchor generator, we currently only
support single feature level. The sizes of each tensor
should be [num_sizes/ranges*num_rots*featmap_size,
box_code_size].
"""
multi_level_anchors = []
......@@ -371,7 +386,7 @@ class AlignedAnchor3DRangeGeneratorPerCls(AlignedAnchor3DRangeGenerator):
This function is usually called by method ``self.grid_anchors``.
Args:
featmap_sizes (list[tuple]): List of feature map sizes for \
featmap_sizes (list[tuple]): List of feature map sizes for
different classes in a single feature level.
scale (float): Scale factor of the anchors in the current level.
device (str, optional): Device the tensor will be put on.
......
......@@ -12,7 +12,8 @@ from .samplers import (BaseSampler, CombinedSampler,
from .structures import (BaseInstance3DBoxes, Box3DMode, CameraInstance3DBoxes,
Coord3DMode, DepthInstance3DBoxes,
LiDARInstance3DBoxes, get_box_type, limit_period,
mono_cam_box2vis, points_cam2img, xywhr2xyxyr)
mono_cam_box2vis, points_cam2img, points_img2cam,
xywhr2xyxyr)
from .transforms import bbox3d2result, bbox3d2roi, bbox3d_mapping_back
__all__ = [
......@@ -25,5 +26,5 @@ __all__ = [
'LiDARInstance3DBoxes', 'CameraInstance3DBoxes', 'bbox3d2roi',
'bbox3d2result', 'DepthInstance3DBoxes', 'BaseInstance3DBoxes',
'bbox3d_mapping_back', 'xywhr2xyxyr', 'limit_period', 'points_cam2img',
'get_box_type', 'Coord3DMode', 'mono_cam_box2vis'
'points_img2cam', 'get_box_type', 'Coord3DMode', 'mono_cam_box2vis'
]
# Copyright (c) OpenMMLab. All rights reserved.
# TODO: clean the functions in this file and move the APIs into box structures
# in the future
# NOTICE: All functions in this file are valid for LiDAR or depth boxes only
# if we use default parameters.
import numba
import numpy as np
from .structures.utils import limit_period, points_cam2img, rotation_3d_in_axis
def camera_to_lidar(points, r_rect, velo2cam):
"""Convert points in camera coordinate to lidar coordinate.
Note:
This function is for KITTI only.
Args:
points (np.ndarray, shape=[N, 3]): Points in camera coordinate.
r_rect (np.ndarray, shape=[4, 4]): Matrix to project points in
......@@ -27,7 +34,10 @@ def camera_to_lidar(points, r_rect, velo2cam):
def box_camera_to_lidar(data, r_rect, velo2cam):
"""Covert boxes in camera coordinate to lidar coordinate.
"""Convert boxes in camera coordinate to lidar coordinate.
Note:
This function is for KITTI only.
Args:
data (np.ndarray, shape=[N, 7]): Boxes in camera coordinate.
......@@ -40,10 +50,13 @@ def box_camera_to_lidar(data, r_rect, velo2cam):
np.ndarray, shape=[N, 3]: Boxes in lidar coordinate.
"""
xyz = data[:, 0:3]
l, h, w = data[:, 3:4], data[:, 4:5], data[:, 5:6]
x_size, y_size, z_size = data[:, 3:4], data[:, 4:5], data[:, 5:6]
r = data[:, 6:7]
xyz_lidar = camera_to_lidar(xyz, r_rect, velo2cam)
return np.concatenate([xyz_lidar, w, l, h, r], axis=1)
# yaw and dims also needs to be converted
r_new = -r - np.pi / 2
r_new = limit_period(r_new, period=np.pi * 2)
return np.concatenate([xyz_lidar, x_size, z_size, y_size, r_new], axis=1)
def corners_nd(dims, origin=0.5):
......@@ -80,26 +93,9 @@ def corners_nd(dims, origin=0.5):
return corners
def rotation_2d(points, angles):
"""Rotation 2d points based on origin point clockwise when angle positive.
Args:
points (np.ndarray): Points to be rotated with shape \
(N, point_size, 2).
angles (np.ndarray): Rotation angle with shape (N).
Returns:
np.ndarray: Same shape as points.
"""
rot_sin = np.sin(angles)
rot_cos = np.cos(angles)
rot_mat_T = np.stack([[rot_cos, -rot_sin], [rot_sin, rot_cos]])
return np.einsum('aij,jka->aik', points, rot_mat_T)
def center_to_corner_box2d(centers, dims, angles=None, origin=0.5):
"""Convert kitti locations, dimensions and angles to corners.
format: center(xy), dims(xy), angles(clockwise when positive)
format: center(xy), dims(xy), angles(counterclockwise when positive)
Args:
centers (np.ndarray): Locations in kitti label file with shape (N, 2).
......@@ -118,7 +114,7 @@ def center_to_corner_box2d(centers, dims, angles=None, origin=0.5):
corners = corners_nd(dims, origin=origin)
# corners: [N, 4, 2]
if angles is not None:
corners = rotation_2d(corners, angles)
corners = rotation_3d_in_axis(corners, angles)
corners += centers.reshape([-1, 1, 2])
return corners
......@@ -172,37 +168,6 @@ def depth_to_lidar_points(depth, trunc_pixel, P2, r_rect, velo2cam):
return lidar_points
def rotation_3d_in_axis(points, angles, axis=0):
"""Rotate points in specific axis.
Args:
points (np.ndarray, shape=[N, point_size, 3]]):
angles (np.ndarray, shape=[N]]):
axis (int, optional): Axis to rotate at. Defaults to 0.
Returns:
np.ndarray: Rotated points.
"""
# points: [N, point_size, 3]
rot_sin = np.sin(angles)
rot_cos = np.cos(angles)
ones = np.ones_like(rot_cos)
zeros = np.zeros_like(rot_cos)
if axis == 1:
rot_mat_T = np.stack([[rot_cos, zeros, -rot_sin], [zeros, ones, zeros],
[rot_sin, zeros, rot_cos]])
elif axis == 2 or axis == -1:
rot_mat_T = np.stack([[rot_cos, -rot_sin, zeros],
[rot_sin, rot_cos, zeros], [zeros, zeros, ones]])
elif axis == 0:
rot_mat_T = np.stack([[zeros, rot_cos, -rot_sin],
[zeros, rot_sin, rot_cos], [ones, zeros, zeros]])
else:
raise ValueError('axis should in range')
return np.einsum('aij,jka->aik', points, rot_mat_T)
def center_to_corner_box3d(centers,
dims,
angles=None,
......@@ -225,7 +190,7 @@ def center_to_corner_box3d(centers,
np.ndarray: Corners with the shape of (N, 8, 3).
"""
# 'length' in kitti format is in x axis.
# yzx(hwl)(kitti label file)<->xyz(lhw)(camera)<->z(-x)(-y)(wlh)(lidar)
# yzx(hwl)(kitti label file)<->xyz(lhw)(camera)<->z(-x)(-y)(lwh)(lidar)
# center in kitti format is [0.5, 1.0, 0.5] in xyz.
corners = corners_nd(dims, origin=origin)
# corners: [N, 8, 3]
......@@ -259,8 +224,8 @@ def box2d_to_corner_jit(boxes):
rot_sin = np.sin(boxes[i, -1])
rot_cos = np.cos(boxes[i, -1])
rot_mat_T[0, 0] = rot_cos
rot_mat_T[0, 1] = -rot_sin
rot_mat_T[1, 0] = rot_sin
rot_mat_T[0, 1] = rot_sin
rot_mat_T[1, 0] = -rot_sin
rot_mat_T[1, 1] = rot_cos
box_corners[i] = corners[i] @ rot_mat_T + boxes[i, :2]
return box_corners
......@@ -327,15 +292,15 @@ def rotation_points_single_angle(points, angle, axis=0):
rot_cos = np.cos(angle)
if axis == 1:
rot_mat_T = np.array(
[[rot_cos, 0, -rot_sin], [0, 1, 0], [rot_sin, 0, rot_cos]],
[[rot_cos, 0, rot_sin], [0, 1, 0], [-rot_sin, 0, rot_cos]],
dtype=points.dtype)
elif axis == 2 or axis == -1:
rot_mat_T = np.array(
[[rot_cos, -rot_sin, 0], [rot_sin, rot_cos, 0], [0, 0, 1]],
[[rot_cos, rot_sin, 0], [-rot_sin, rot_cos, 0], [0, 0, 1]],
dtype=points.dtype)
elif axis == 0:
rot_mat_T = np.array(
[[1, 0, 0], [0, rot_cos, -rot_sin], [0, rot_sin, rot_cos]],
[[1, 0, 0], [0, rot_cos, rot_sin], [0, -rot_sin, rot_cos]],
dtype=points.dtype)
else:
raise ValueError('axis should in range')
......@@ -343,44 +308,6 @@ def rotation_points_single_angle(points, angle, axis=0):
return points @ rot_mat_T, rot_mat_T
def points_cam2img(points_3d, proj_mat, with_depth=False):
"""Project points in camera coordinates to image coordinates.
Args:
points_3d (np.ndarray): Points in shape (N, 3)
proj_mat (np.ndarray): Transformation matrix between coordinates.
with_depth (bool, optional): Whether to keep depth in the output.
Defaults to False.
Returns:
np.ndarray: Points in image coordinates with shape [N, 2].
"""
points_shape = list(points_3d.shape)
points_shape[-1] = 1
assert len(proj_mat.shape) == 2, 'The dimension of the projection'\
f' matrix should be 2 instead of {len(proj_mat.shape)}.'
d1, d2 = proj_mat.shape[:2]
assert (d1 == 3 and d2 == 3) or (d1 == 3 and d2 == 4) or (
d1 == 4 and d2 == 4), 'The shape of the projection matrix'\
f' ({d1}*{d2}) is not supported.'
if d1 == 3:
proj_mat_expanded = np.eye(4, dtype=proj_mat.dtype)
proj_mat_expanded[:d1, :d2] = proj_mat
proj_mat = proj_mat_expanded
points_4 = np.concatenate([points_3d, np.ones(points_shape)], axis=-1)
point_2d = points_4 @ proj_mat.T
point_2d_res = point_2d[..., :2] / point_2d[..., 2:3]
if with_depth:
points_2d_depth = np.concatenate([point_2d_res, point_2d[..., 2:3]],
axis=-1)
return points_2d_depth
return point_2d_res
def box3d_to_bbox(box3d, P2):
"""Convert box3d in camera coordinates to bbox in image coordinates.
......@@ -424,7 +351,10 @@ def corner_to_surfaces_3d(corners):
def points_in_rbbox(points, rbbox, z_axis=2, origin=(0.5, 0.5, 0)):
"""Check points in rotated bbox and return indicces.
"""Check points in rotated bbox and return indices.
Note:
This function is for counterclockwise boxes.
Args:
points (np.ndarray, shape=[N, 3+dim]): Points to query.
......@@ -461,25 +391,9 @@ def minmax_to_corner_2d(minmax_box):
return center_to_corner_box2d(center, dims, origin=0.0)
def limit_period(val, offset=0.5, period=np.pi):
"""Limit the value into a period for periodic function.
Args:
val (np.ndarray): The value to be converted.
offset (float, optional): Offset to set the value range. \
Defaults to 0.5.
period (float, optional): Period of the value. Defaults to np.pi.
Returns:
torch.Tensor: Value in the range of \
[-offset * period, (1-offset) * period]
"""
return val - np.floor(val / period + offset) * period
def create_anchors_3d_range(feature_size,
anchor_range,
sizes=((1.6, 3.9, 1.56), ),
sizes=((3.9, 1.6, 1.56), ),
rotations=(0, np.pi / 2),
dtype=np.float32):
"""Create anchors 3d by range.
......@@ -492,14 +406,14 @@ def create_anchors_3d_range(feature_size,
(x_min, y_min, z_min, x_max, y_max, z_max).
sizes (list[list] | np.ndarray | torch.Tensor, optional):
Anchor size with shape [N, 3], in order of x, y, z.
Defaults to ((1.6, 3.9, 1.56), ).
Defaults to ((3.9, 1.6, 1.56), ).
rotations (list[float] | np.ndarray | torch.Tensor, optional):
Rotations of anchors in a single feature grid.
Defaults to (0, np.pi / 2).
dtype (type, optional): Data type. Default to np.float32.
dtype (type, optional): Data type. Defaults to np.float32.
Returns:
np.ndarray: Range based anchors with shape of \
np.ndarray: Range based anchors with shape of
(*feature_size, num_sizes, num_rots, 7).
"""
anchor_range = np.array(anchor_range, dtype)
......@@ -550,11 +464,11 @@ def rbbox2d_to_near_bbox(rbboxes):
"""convert rotated bbox to nearest 'standing' or 'lying' bbox.
Args:
rbboxes (np.ndarray): Rotated bboxes with shape of \
rbboxes (np.ndarray): Rotated bboxes with shape of
(N, 5(x, y, xdim, ydim, rad)).
Returns:
np.ndarray: Bounding boxes with the shpae of
np.ndarray: Bounding boxes with the shape of
(N, 4(xmin, ymin, xmax, ymax)).
"""
rots = rbboxes[..., -1]
......@@ -570,6 +484,9 @@ def iou_jit(boxes, query_boxes, mode='iou', eps=0.0):
"""Calculate box iou. Note that jit version runs ~10x faster than the
box_overlaps function in mmdet3d.core.evaluation.
Note:
This function is for counterclockwise boxes.
Args:
boxes (np.ndarray): Input bounding boxes with shape of (N, 4).
query_boxes (np.ndarray): Query boxes with shape of (K, 4).
......@@ -607,7 +524,10 @@ def iou_jit(boxes, query_boxes, mode='iou', eps=0.0):
def projection_matrix_to_CRT_kitti(proj):
"""Split projection matrix of kitti.
"""Split projection matrix of KITTI.
Note:
This function is for KITTI only.
P = C @ [R|T]
C is upper triangular matrix, so we need to inverse CR and use QR
......@@ -633,6 +553,9 @@ def projection_matrix_to_CRT_kitti(proj):
def remove_outside_points(points, rect, Trv2c, P2, image_shape):
"""Remove points which are outside of image.
Note:
This function is for KITTI only.
Args:
points (np.ndarray, shape=[N, 3+dims]): Total points.
rect (np.ndarray, shape=[4, 4]): Matrix to project points in
......@@ -782,8 +705,8 @@ def points_in_convex_polygon_3d_jit(points,
normal_vec, d, num_surfaces)
@numba.jit
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
@numba.njit
def points_in_convex_polygon_jit(points, polygon, clockwise=False):
"""Check points is in 2d convex polygons. True when point in polygon.
Args:
......@@ -800,14 +723,16 @@ def points_in_convex_polygon_jit(points, polygon, clockwise=True):
num_points_of_polygon = polygon.shape[1]
num_points = points.shape[0]
num_polygons = polygon.shape[0]
# if clockwise:
# vec1 = polygon - polygon[:, [num_points_of_polygon - 1] +
# list(range(num_points_of_polygon - 1)), :]
# else:
# vec1 = polygon[:, [num_points_of_polygon - 1] +
# list(range(num_points_of_polygon - 1)), :] - polygon
# vec1: [num_polygon, num_points_of_polygon, 2]
vec1 = np.zeros((2), dtype=polygon.dtype)
# vec for all the polygons
if clockwise:
vec1 = polygon - polygon[:,
np.array([num_points_of_polygon - 1] + list(
range(num_points_of_polygon - 1))), :]
else:
vec1 = polygon[:,
np.array([num_points_of_polygon - 1] +
list(range(num_points_of_polygon -
1))), :] - polygon
ret = np.zeros((num_points, num_polygons), dtype=np.bool_)
success = True
cross = 0.0
......@@ -815,12 +740,9 @@ def points_in_convex_polygon_jit(points, polygon, clockwise=True):
for j in range(num_polygons):
success = True
for k in range(num_points_of_polygon):
if clockwise:
vec1 = polygon[j, k] - polygon[j, k - 1]
else:
vec1 = polygon[j, k - 1] - polygon[j, k]
cross = vec1[1] * (polygon[j, k, 0] - points[i, 0])
cross -= vec1[0] * (polygon[j, k, 1] - points[i, 1])
vec = vec1[j, k]
cross = vec[1] * (polygon[j, k, 0] - points[i, 0])
cross -= vec[0] * (polygon[j, k, 1] - points[i, 1])
if cross >= 0:
success = False
break
......@@ -839,10 +761,13 @@ def boxes3d_to_corners3d_lidar(boxes3d, bottom_center=True):
|/ |/
2 -------- 1
Note:
This function is for LiDAR boxes only.
Args:
boxes3d (np.ndarray): Boxes with shape of (N, 7)
[x, y, z, w, l, h, ry] in LiDAR coords, see the definition of ry
in KITTI dataset.
[x, y, z, x_size, y_size, z_size, ry] in LiDAR coords,
see the definition of ry in KITTI dataset.
bottom_center (bool, optional): Whether z is on the bottom center
of object. Defaults to True.
......@@ -850,19 +775,25 @@ def boxes3d_to_corners3d_lidar(boxes3d, bottom_center=True):
np.ndarray: Box corners with the shape of [N, 8, 3].
"""
boxes_num = boxes3d.shape[0]
w, l, h = boxes3d[:, 3], boxes3d[:, 4], boxes3d[:, 5]
x_corners = np.array(
[w / 2., -w / 2., -w / 2., w / 2., w / 2., -w / 2., -w / 2., w / 2.],
dtype=np.float32).T
y_corners = np.array(
[-l / 2., -l / 2., l / 2., l / 2., -l / 2., -l / 2., l / 2., l / 2.],
dtype=np.float32).T
x_size, y_size, z_size = boxes3d[:, 3], boxes3d[:, 4], boxes3d[:, 5]
x_corners = np.array([
x_size / 2., -x_size / 2., -x_size / 2., x_size / 2., x_size / 2.,
-x_size / 2., -x_size / 2., x_size / 2.
],
dtype=np.float32).T
y_corners = np.array([
-y_size / 2., -y_size / 2., y_size / 2., y_size / 2., -y_size / 2.,
-y_size / 2., y_size / 2., y_size / 2.
],
dtype=np.float32).T
if bottom_center:
z_corners = np.zeros((boxes_num, 8), dtype=np.float32)
z_corners[:, 4:8] = h.reshape(boxes_num, 1).repeat(4, axis=1) # (N, 8)
z_corners[:, 4:8] = z_size.reshape(boxes_num, 1).repeat(
4, axis=1) # (N, 8)
else:
z_corners = np.array([
-h / 2., -h / 2., -h / 2., -h / 2., h / 2., h / 2., h / 2., h / 2.
-z_size / 2., -z_size / 2., -z_size / 2., -z_size / 2.,
z_size / 2., z_size / 2., z_size / 2., z_size / 2.
],
dtype=np.float32).T
......@@ -870,9 +801,9 @@ def boxes3d_to_corners3d_lidar(boxes3d, bottom_center=True):
zeros, ones = np.zeros(
ry.size, dtype=np.float32), np.ones(
ry.size, dtype=np.float32)
rot_list = np.array([[np.cos(ry), -np.sin(ry), zeros],
[np.sin(ry), np.cos(ry), zeros], [zeros, zeros,
ones]]) # (3, 3, N)
rot_list = np.array([[np.cos(ry), np.sin(ry), zeros],
[-np.sin(ry), np.cos(ry), zeros],
[zeros, zeros, ones]]) # (3, 3, N)
R_list = np.transpose(rot_list, (2, 0, 1)) # (N, 3, 3)
temp_corners = np.concatenate((x_corners.reshape(
......
......@@ -3,10 +3,17 @@ from mmdet.core.bbox import build_bbox_coder
from .anchor_free_bbox_coder import AnchorFreeBBoxCoder
from .centerpoint_bbox_coders import CenterPointBBoxCoder
from .delta_xyzwhlr_bbox_coder import DeltaXYZWLHRBBoxCoder
from .fcos3d_bbox_coder import FCOS3DBBoxCoder
from .groupfree3d_bbox_coder import GroupFree3DBBoxCoder
from .monoflex_bbox_coder import MonoFlexCoder
from .partial_bin_based_bbox_coder import PartialBinBasedBBoxCoder
from .pgd_bbox_coder import PGDBBoxCoder
from .point_xyzwhlr_bbox_coder import PointXYZWHLRBBoxCoder
from .smoke_bbox_coder import SMOKECoder
__all__ = [
'build_bbox_coder', 'DeltaXYZWLHRBBoxCoder', 'PartialBinBasedBBoxCoder',
'CenterPointBBoxCoder', 'AnchorFreeBBoxCoder', 'GroupFree3DBBoxCoder'
'CenterPointBBoxCoder', 'AnchorFreeBBoxCoder', 'GroupFree3DBBoxCoder',
'PointXYZWHLRBBoxCoder', 'FCOS3DBBoxCoder', 'PGDBBoxCoder', 'SMOKECoder',
'MonoFlexCoder'
]
......@@ -25,7 +25,7 @@ class AnchorFreeBBoxCoder(PartialBinBasedBBoxCoder):
"""Encode ground truth to prediction targets.
Args:
gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes \
gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes
with shape (n, 7).
gt_labels_3d (torch.Tensor): Ground truth classes.
......
......@@ -13,12 +13,12 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
pc_range (list[float]): Range of point cloud.
out_size_factor (int): Downsample factor of the model.
voxel_size (list[float]): Size of voxel.
post_center_range (list[float]): Limit of the center.
post_center_range (list[float], optional): Limit of the center.
Default: None.
max_num (int): Max number to be kept. Default: 100.
score_threshold (float): Threshold to filter boxes based on score.
Default: None.
code_size (int): Code size of bboxes. Default: 9
max_num (int, optional): Max number to be kept. Default: 100.
score_threshold (float, optional): Threshold to filter boxes
based on score. Default: None.
code_size (int, optional): Code size of bboxes. Default: 9
"""
def __init__(self,
......@@ -45,7 +45,8 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
feats (torch.Tensor): Features to be transposed and gathered
with the shape of [B, 2, W, H].
inds (torch.Tensor): Indexes with the shape of [B, N].
feat_masks (torch.Tensor): Mask of the feats. Default: None.
feat_masks (torch.Tensor, optional): Mask of the feats.
Default: None.
Returns:
torch.Tensor: Gathered feats.
......@@ -64,7 +65,7 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
Args:
scores (torch.Tensor): scores with the shape of [B, N, W, H].
K (int): Number to be kept. Defaults to 80.
K (int, optional): Number to be kept. Defaults to 80.
Returns:
tuple[torch.Tensor]
......@@ -135,9 +136,9 @@ class CenterPointBBoxCoder(BaseBBoxCoder):
dim (torch.Tensor): Dim of the boxes with the shape of
[B, 1, W, H].
vel (torch.Tensor): Velocity with the shape of [B, 1, W, H].
reg (torch.Tensor): Regression value of the boxes in 2D with
the shape of [B, 2, W, H]. Default: None.
task_id (int): Index of task. Default: -1.
reg (torch.Tensor, optional): Regression value of the boxes in
2D with the shape of [B, 2, W, H]. Default: None.
task_id (int, optional): Index of task. Default: -1.
Returns:
list[dict]: Decoded boxes.
......
......@@ -19,9 +19,9 @@ class DeltaXYZWLHRBBoxCoder(BaseBBoxCoder):
@staticmethod
def encode(src_boxes, dst_boxes):
"""Get box regression transformation deltas (dx, dy, dz, dw, dh, dl,
dr, dv*) that can be used to transform the `src_boxes` into the
`target_boxes`.
"""Get box regression transformation deltas (dx, dy, dz, dx_size,
dy_size, dz_size, dr, dv*) that can be used to transform the
`src_boxes` into the `target_boxes`.
Args:
src_boxes (torch.Tensor): source boxes, e.g., object proposals.
......@@ -56,13 +56,13 @@ class DeltaXYZWLHRBBoxCoder(BaseBBoxCoder):
@staticmethod
def decode(anchors, deltas):
"""Apply transformation `deltas` (dx, dy, dz, dw, dh, dl, dr, dv*) to
`boxes`.
"""Apply transformation `deltas` (dx, dy, dz, dx_size, dy_size,
dz_size, dr, dv*) to `boxes`.
Args:
anchors (torch.Tensor): Parameters of anchors with shape (N, 7).
deltas (torch.Tensor): Encoded boxes with shape
(N, 7+n) [x, y, z, w, l, h, r, velo*].
(N, 7+n) [x, y, z, x_size, y_size, z_size, r, velo*].
Returns:
torch.Tensor: Decoded boxes.
......
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
from ..structures import limit_period
@BBOX_CODERS.register_module()
class FCOS3DBBoxCoder(BaseBBoxCoder):
"""Bounding box coder for FCOS3D.
Args:
base_depths (tuple[tuple[float]]): Depth references for decoding box
depth. Defaults to None.
base_dims (tuple[tuple[float]]): Dimension references for decoding box
dimensions. Defaults to None.
code_size (int): The dimension of boxes to be encoded. Defaults to 7.
norm_on_bbox (bool): Whether to apply normalization on the bounding
box 2D attributes. Defaults to True.
"""
def __init__(self,
base_depths=None,
base_dims=None,
code_size=7,
norm_on_bbox=True):
super(FCOS3DBBoxCoder, self).__init__()
self.base_depths = base_depths
self.base_dims = base_dims
self.bbox_code_size = code_size
self.norm_on_bbox = norm_on_bbox
def encode(self, gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels):
# TODO: refactor the encoder in the FCOS3D and PGD head
pass
def decode(self, bbox, scale, stride, training, cls_score=None):
"""Decode regressed results into 3D predictions.
Note that offsets are not transformed to the projected 3D centers.
Args:
bbox (torch.Tensor): Raw bounding box predictions in shape
[N, C, H, W].
scale (tuple[`Scale`]): Learnable scale parameters.
stride (int): Stride for a specific feature level.
training (bool): Whether the decoding is in the training
procedure.
cls_score (torch.Tensor): Classification score map for deciding
which base depth or dim is used. Defaults to None.
Returns:
torch.Tensor: Decoded boxes.
"""
# scale the bbox of different level
# only apply to offset, depth and size prediction
scale_offset, scale_depth, scale_size = scale[0:3]
clone_bbox = bbox.clone()
bbox[:, :2] = scale_offset(clone_bbox[:, :2]).float()
bbox[:, 2] = scale_depth(clone_bbox[:, 2]).float()
bbox[:, 3:6] = scale_size(clone_bbox[:, 3:6]).float()
if self.base_depths is None:
bbox[:, 2] = bbox[:, 2].exp()
elif len(self.base_depths) == 1: # only single prior
mean = self.base_depths[0][0]
std = self.base_depths[0][1]
bbox[:, 2] = mean + bbox.clone()[:, 2] * std
else: # multi-class priors
assert len(self.base_depths) == cls_score.shape[1], \
'The number of multi-class depth priors should be equal to ' \
'the number of categories.'
indices = cls_score.max(dim=1)[1]
depth_priors = cls_score.new_tensor(
self.base_depths)[indices, :].permute(0, 3, 1, 2)
mean = depth_priors[:, 0]
std = depth_priors[:, 1]
bbox[:, 2] = mean + bbox.clone()[:, 2] * std
bbox[:, 3:6] = bbox[:, 3:6].exp()
if self.base_dims is not None:
assert len(self.base_dims) == cls_score.shape[1], \
'The number of anchor sizes should be equal to the number ' \
'of categories.'
indices = cls_score.max(dim=1)[1]
size_priors = cls_score.new_tensor(
self.base_dims)[indices, :].permute(0, 3, 1, 2)
bbox[:, 3:6] = size_priors * bbox.clone()[:, 3:6]
assert self.norm_on_bbox is True, 'Setting norm_on_bbox to False '\
'has not been thoroughly tested for FCOS3D.'
if self.norm_on_bbox:
if not training:
# Note that this line is conducted only when testing
bbox[:, :2] *= stride
return bbox
@staticmethod
def decode_yaw(bbox, centers2d, dir_cls, dir_offset, cam2img):
"""Decode yaw angle and change it from local to global.i.
Args:
bbox (torch.Tensor): Bounding box predictions in shape
[N, C] with yaws to be decoded.
centers2d (torch.Tensor): Projected 3D-center on the image planes
corresponding to the box predictions.
dir_cls (torch.Tensor): Predicted direction classes.
dir_offset (float): Direction offset before dividing all the
directions into several classes.
cam2img (torch.Tensor): Camera intrinsic matrix in shape [4, 4].
Returns:
torch.Tensor: Bounding boxes with decoded yaws.
"""
if bbox.shape[0] > 0:
dir_rot = limit_period(bbox[..., 6] - dir_offset, 0, np.pi)
bbox[..., 6] = \
dir_rot + dir_offset + np.pi * dir_cls.to(bbox.dtype)
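# Adding the viewing-ray angle of the projected 3D center converts the
# local (allocentric) yaw decoded above into the global (egocentric) yaw.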
bbox[:, 6] = torch.atan2(centers2d[:, 0] - cam2img[0, 2],
cam2img[0, 0]) + bbox[:, 6]
return bbox
......@@ -14,9 +14,10 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
num_dir_bins (int): Number of bins to encode direction angle.
num_sizes (int): Number of size clusters.
mean_sizes (list[list[int]]): Mean size of bboxes in each class.
with_rot (bool): Whether the bbox is with rotation. Defaults to True.
size_cls_agnostic (bool): Whether the predicted size is class-agnostic.
with_rot (bool, optional): Whether the bbox is with rotation.
Defaults to True.
size_cls_agnostic (bool, optional): Whether the predicted size is
class-agnostic. Defaults to True.
"""
def __init__(self,
......@@ -36,7 +37,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
"""Encode ground truth to prediction targets.
Args:
gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes \
gt_bboxes_3d (BaseInstance3DBoxes): Ground truth bboxes
with shape (n, 7).
gt_labels_3d (torch.Tensor): Ground truth classes.
......@@ -76,7 +77,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
- size_class: predicted bbox size class.
- size_res: predicted bbox size residual.
- size: predicted class-agnostic bbox size
prefix (str): Decode predictions with specific prefix.
prefix (str, optional): Decode predictions with specific prefix.
Defaults to ''.
Returns:
......@@ -122,7 +123,7 @@ class GroupFree3DBBoxCoder(PartialBinBasedBBoxCoder):
cls_preds (torch.Tensor): Class predicted features to split.
reg_preds (torch.Tensor): Regression predicted features to split.
base_xyz (torch.Tensor): Coordinates of points.
prefix (str): Decode predictions with specific prefix.
prefix (str, optional): Decode predictions with specific prefix.
Defaults to ''.
Returns:
......
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch
from torch.nn import functional as F
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
@BBOX_CODERS.register_module()
class MonoFlexCoder(BaseBBoxCoder):
"""Bbox Coder for MonoFlex.
Args:
depth_mode (str): The mode for depth calculation.
Available options are "linear", "inv_sigmoid", and "exp".
base_depth (tuple[float]): References for decoding box depth.
depth_range (list): Depth range of predicted depth.
combine_depth (bool): Whether to use combined depth (direct depth
and depth from keypoints) or use direct depth only.
uncertainty_range (list): Uncertainty range of predicted depth.
base_dims (tuple[tuple[float]]): Mean and standard deviation of bbox
dimensions [l, h, w] for each category, used to decode the dimensions.
dims_mode (str): The mode for dimension calculation.
Available options are "linear" and "exp".
multibin (bool): Whether to use multibin representation.
num_dir_bins (int): Number of bins to encode the
direction angle.
bin_centers (list[float]): Local yaw centers while using multibin
representations.
bin_margin (float): Margin of multibin representations.
code_size (int): The dimension of boxes to be encoded.
eps (float, optional): A value added to the denominator for numerical
stability. Default 1e-3.
"""
def __init__(self,
depth_mode,
base_depth,
depth_range,
combine_depth,
uncertainty_range,
base_dims,
dims_mode,
multibin,
num_dir_bins,
bin_centers,
bin_margin,
code_size,
eps=1e-3):
super(MonoFlexCoder, self).__init__()
# depth related
self.depth_mode = depth_mode
self.base_depth = base_depth
self.depth_range = depth_range
self.combine_depth = combine_depth
self.uncertainty_range = uncertainty_range
# dimensions related
self.base_dims = base_dims
self.dims_mode = dims_mode
# orientation related
self.multibin = multibin
self.num_dir_bins = num_dir_bins
self.bin_centers = bin_centers
self.bin_margin = bin_margin
# output related
self.bbox_code_size = code_size
self.eps = eps
def encode(self, gt_bboxes_3d):
"""Encode ground truth to prediction targets.
Args:
gt_bboxes_3d (`BaseInstance3DBoxes`): Ground truth 3D bboxes.
shape: (N, 7).
Returns:
torch.Tensor: Targets of orientations.
"""
local_yaw = gt_bboxes_3d.local_yaw
# encode local yaw (-pi ~ pi) to multibin format
encode_local_yaw = local_yaw.new_zeros(
[local_yaw.shape[0], self.num_dir_bins * 2])
bin_size = 2 * np.pi / self.num_dir_bins
margin_size = bin_size * self.bin_margin
bin_centers = local_yaw.new_tensor(self.bin_centers)
range_size = bin_size / 2 + margin_size
offsets = local_yaw.unsqueeze(1) - bin_centers.unsqueeze(0)
offsets[offsets > np.pi] = offsets[offsets > np.pi] - 2 * np.pi
offsets[offsets < -np.pi] = offsets[offsets < -np.pi] + 2 * np.pi
for i in range(self.num_dir_bins):
offset = offsets[:, i]
inds = abs(offset) < range_size
encode_local_yaw[inds, i] = 1
encode_local_yaw[inds, i + self.num_dir_bins] = offset[inds]
orientation_target = encode_local_yaw
return orientation_target
def decode(self, bbox, base_centers2d, labels, downsample_ratio, cam2imgs):
"""Decode bounding box regression into 3D predictions.
Args:
bbox (Tensor): Raw bounding box predictions for each predicted
center2d point.
shape: (N, C)
base_centers2d (torch.Tensor): Base centers2d for 3D bboxes.
shape: (N, 2).
labels (Tensor): Predicted class label for each predicted
center2d point.
shape: (N, )
downsample_ratio (int): The stride of feature map.
cam2imgs (Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
Return:
dict: The 3D prediction dict decoded from regression map.
The dict has the components below:
- bboxes2d (torch.Tensor): Decoded [x1, y1, x2, y2] format
2D bboxes.
- dimensions (torch.Tensor): Decoded dimensions for each
object.
- offsets2d (torch.Tensor): Offsets between base centers2d
and real centers2d.
- direct_depth (torch.Tensor): Decoded directly regressed
depth.
- keypoints2d (torch.Tensor): Keypoints of each projected
3D box on image.
- keypoints_depth (torch.Tensor): Decoded depth from keypoints.
- combined_depth (torch.Tensor): Combined depth using direct
depth and keypoints depth with depth uncertainty.
- orientations (torch.Tensor): Multibin format orientations
(local yaw) for each object.
"""
# 4 dimensions for FCOS style regression
pred_bboxes2d = bbox[:, 0:4]
# change FCOS style to [x1, y1, x2, y2] format for IOU Loss
pred_bboxes2d = self.decode_bboxes2d(pred_bboxes2d, base_centers2d)
# 2 dimensions for projected centers2d offsets
pred_offsets2d = bbox[:, 4:6]
# 3 dimensions for 3D bbox dimensions offsets
pred_dimensions_offsets3d = bbox[:, 29:32]
# the first 8 dimensions are for orientation bin classification
# and the second 8 dimensions are for orientation offsets.
pred_orientations = torch.cat((bbox[:, 32:40], bbox[:, 40:48]), dim=1)
# 3 dimensions for the uncertainties of the solved depths from
# groups of keypoints
pred_keypoints_depth_uncertainty = bbox[:, 26:29]
# 1 dimension for the uncertainty of directly regressed depth
pred_direct_depth_uncertainty = bbox[:, 49:50].squeeze(-1)
# 2-dimensional offsets for each of the 10 keypoints (8 corners + top/bottom centers)
pred_keypoints2d = bbox[:, 6:26].reshape(-1, 10, 2)
# 1 dimension for depth offsets
pred_direct_depth_offsets = bbox[:, 48:49].squeeze(-1)
# decode the pred residual dimensions to real dimensions
pred_dimensions = self.decode_dims(labels, pred_dimensions_offsets3d)
pred_direct_depth = self.decode_direct_depth(pred_direct_depth_offsets)
pred_keypoints_depth = self.keypoints2depth(pred_keypoints2d,
pred_dimensions, cam2imgs,
downsample_ratio)
pred_direct_depth_uncertainty = torch.clamp(
pred_direct_depth_uncertainty, self.uncertainty_range[0],
self.uncertainty_range[1])
pred_keypoints_depth_uncertainty = torch.clamp(
pred_keypoints_depth_uncertainty, self.uncertainty_range[0],
self.uncertainty_range[1])
if self.combine_depth:
pred_depth_uncertainty = torch.cat(
(pred_direct_depth_uncertainty.unsqueeze(-1),
pred_keypoints_depth_uncertainty),
dim=1).exp()
pred_depth = torch.cat(
(pred_direct_depth.unsqueeze(-1), pred_keypoints_depth), dim=1)
pred_combined_depth = \
self.combine_depths(pred_depth, pred_depth_uncertainty)
else:
pred_combined_depth = None
preds = dict(
bboxes2d=pred_bboxes2d,
dimensions=pred_dimensions,
offsets2d=pred_offsets2d,
keypoints2d=pred_keypoints2d,
orientations=pred_orientations,
direct_depth=pred_direct_depth,
keypoints_depth=pred_keypoints_depth,
combined_depth=pred_combined_depth,
direct_depth_uncertainty=pred_direct_depth_uncertainty,
keypoints_depth_uncertainty=pred_keypoints_depth_uncertainty,
)
return preds
def decode_direct_depth(self, depth_offsets):
"""Transform depth offset to directly regressed depth.
Args:
depth_offsets (torch.Tensor): Predicted depth offsets.
shape: (N, )
Return:
torch.Tensor: Directly regressed depth.
shape: (N, )
"""
if self.depth_mode == 'exp':
direct_depth = depth_offsets.exp()
elif self.depth_mode == 'linear':
base_depth = depth_offsets.new_tensor(self.base_depth)
direct_depth = depth_offsets * base_depth[1] + base_depth[0]
elif self.depth_mode == 'inv_sigmoid':
direct_depth = 1 / torch.sigmoid(depth_offsets) - 1
else:
raise ValueError
if self.depth_range is not None:
direct_depth = torch.clamp(
direct_depth, min=self.depth_range[0], max=self.depth_range[1])
return direct_depth
def decode_location(self,
base_centers2d,
offsets2d,
depths,
cam2imgs,
downsample_ratio,
pad_mode='default'):
"""Retrieve object location.
Args:
base_centers2d (torch.Tensor): predicted base centers2d.
shape: (N, 2)
offsets2d (torch.Tensor): The offsets between real centers2d
and base centers2d.
shape: (N , 2)
depths (torch.Tensor): Depths of objects.
shape: (N, )
cam2imgs (torch.Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
downsample_ratio (int): The stride of feature map.
pad_mode (str, optional): Padding mode used in
training data augmentation.
Return:
tuple(torch.Tensor): Centers of 3D boxes.
shape: (N, 3)
"""
N = cam2imgs.shape[0]
# (N, 4, 4)
cam2imgs_inv = cam2imgs.inverse()
if pad_mode == 'default':
centers2d_img = (base_centers2d + offsets2d) * downsample_ratio
else:
raise NotImplementedError
# (N, 3)
centers2d_img = \
torch.cat((centers2d_img, depths.unsqueeze(-1)), dim=1)
# (N, 4, 1)
centers2d_extend = \
torch.cat((centers2d_img, centers2d_img.new_ones(N, 1)),
dim=1).unsqueeze(-1)
locations = torch.matmul(cam2imgs_inv, centers2d_extend).squeeze(-1)
return locations[:, :3]
def keypoints2depth(self,
keypoints2d,
dimensions,
cam2imgs,
downsample_ratio=4,
group0_index=[(7, 3), (0, 4)],
group1_index=[(2, 6), (1, 5)]):
"""Decode depth form three groups of keypoints and geometry projection
model. 2D keypoints inlucding 8 coreners and top/bottom centers will be
divided into three groups which will be used to calculate three depths
of object.
.. code-block:: none
Group center keypoints:
+ --------------- +
/| top center /|
/ | . / |
/ | | / |
+ ---------|----- + +
| / | | /
| / . | /
|/ bottom center |/
+ --------------- +
Group 0 keypoints:
0
+ -------------- +
/| /|
/ | / |
/ | 5/ |
+ -------------- + +
| /3 | /
| / | /
|/ |/
+ -------------- + 6
Group 1 keypoints:
4
+ -------------- +
/| /|
/ | / |
/ | / |
1 + -------------- + + 7
| / | /
| / | /
|/ |/
2 + -------------- +
Args:
keypoints2d (torch.Tensor): Keypoints of objects.
8 vertices + top/bottom center.
shape: (N, 10, 2)
dimensions (torch.Tensor): Dimensions of objects.
shape: (N, 3)
cam2imgs (torch.Tensor): Batch images' camera intrinsic matrix.
shape: kitti (N, 4, 4) nuscenes (N, 3, 3)
downsample_ratio (int, optional): The stride of the feature map.
Defaults to 4.
group0_index (list[tuple[int]], optional): Keypoint index pairs of
group 0 used to calculate the depth.
Defaults to [(7, 3), (0, 4)].
group1_index (list[tuple[int]], optional): Keypoint index pairs of
group 1 used to calculate the depth.
Defaults to [(2, 6), (1, 5)].
Return:
tuple(torch.Tensor): Depth computed from three groups of
keypoints (top/bottom, group0, group1)
shape: (N, 3)
"""
pred_height_3d = dimensions[:, 1].clone()
f_u = cam2imgs[:, 0, 0]
center_height = keypoints2d[:, -2, 1] - keypoints2d[:, -1, 1]
corner_group0_height = keypoints2d[:, group0_index[0], 1] \
- keypoints2d[:, group0_index[1], 1]
corner_group1_height = keypoints2d[:, group1_index[0], 1] \
- keypoints2d[:, group1_index[1], 1]
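# Pinhole geometry: an object of real height H at depth Z spans
# h = f_u * H / Z pixels in the image, so Z = f_u * H / h. The keypoint
# heights are measured on the feature map, hence the downsample_ratio factor.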
center_depth = f_u * pred_height_3d / (
F.relu(center_height) * downsample_ratio + self.eps)
corner_group0_depth = (f_u * pred_height_3d).unsqueeze(-1) / (
F.relu(corner_group0_height) * downsample_ratio + self.eps)
corner_group1_depth = (f_u * pred_height_3d).unsqueeze(-1) / (
F.relu(corner_group1_height) * downsample_ratio + self.eps)
corner_group0_depth = corner_group0_depth.mean(dim=1)
corner_group1_depth = corner_group1_depth.mean(dim=1)
keypoints_depth = torch.stack(
(center_depth, corner_group0_depth, corner_group1_depth), dim=1)
keypoints_depth = torch.clamp(
keypoints_depth, min=self.depth_range[0], max=self.depth_range[1])
return keypoints_depth
def decode_dims(self, labels, dims_offset):
"""Retrieve object dimensions.
Args:
labels (torch.Tensor): Each points' category id.
shape: (N, K)
dims_offset (torch.Tensor): Dimension offsets.
shape: (N, 3)
Returns:
torch.Tensor: Shape (N, 3)
"""
if self.dims_mode == 'exp':
dims_offset = dims_offset.exp()
elif self.dims_mode == 'linear':
labels = labels.long()
base_dims = dims_offset.new_tensor(self.base_dims)
dims_mean = base_dims[:, :3]
dims_std = base_dims[:, 3:6]
cls_dimension_mean = dims_mean[labels, :]
cls_dimension_std = dims_std[labels, :]
dimensions = dims_offset * cls_dimension_mean + cls_dimension_std
else:
raise ValueError
return dimensions
def decode_orientation(self, ori_vector, locations):
"""Retrieve object orientation.
Args:
ori_vector (torch.Tensor): Local orientation vector
in [axis_cls, head_cls, sin, cos] format.
shape: (N, num_dir_bins * 4)
locations (torch.Tensor): Object location.
shape: (N, 3)
Returns:
tuple[torch.Tensor]: yaws and local yaws of 3d bboxes.
"""
if self.multibin:
pred_bin_cls = ori_vector[:, :self.num_dir_bins * 2].view(
-1, self.num_dir_bins, 2)
pred_bin_cls = pred_bin_cls.softmax(dim=2)[..., 1]
orientations = ori_vector.new_zeros(ori_vector.shape[0])
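# For each sample, pick the direction bin with the highest confidence and
# recover its residual angle from the predicted (sin, cos) pair below.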
for i in range(self.num_dir_bins):
mask_i = (pred_bin_cls.argmax(dim=1) == i)
start_bin = self.num_dir_bins * 2 + i * 2
end_bin = start_bin + 2
pred_bin_offset = ori_vector[mask_i, start_bin:end_bin]
orientations[mask_i] = pred_bin_offset[:, 0].atan2(
pred_bin_offset[:, 1]) + self.bin_centers[i]
else:
axis_cls = ori_vector[:, :2].softmax(dim=1)
axis_cls = axis_cls[:, 0] < axis_cls[:, 1]
head_cls = ori_vector[:, 2:4].softmax(dim=1)
head_cls = head_cls[:, 0] < head_cls[:, 1]
# cls axis
orientations = self.bin_centers[axis_cls + head_cls * 2]
sin_cos_offset = F.normalize(ori_vector[:, 4:])
orientations += sin_cos_offset[:, 0].atan2(sin_cos_offset[:, 1])
locations = locations.view(-1, 3)
rays = locations[:, 0].atan2(locations[:, 2])
local_yaws = orientations
yaws = local_yaws + rays
larger_idx = (yaws > np.pi).nonzero(as_tuple=False)
small_idx = (yaws < -np.pi).nonzero(as_tuple=False)
if len(larger_idx) != 0:
yaws[larger_idx] -= 2 * np.pi
if len(small_idx) != 0:
yaws[small_idx] += 2 * np.pi
larger_idx = (local_yaws > np.pi).nonzero(as_tuple=False)
small_idx = (local_yaws < -np.pi).nonzero(as_tuple=False)
if len(larger_idx) != 0:
local_yaws[larger_idx] -= 2 * np.pi
if len(small_idx) != 0:
local_yaws[small_idx] += 2 * np.pi
return yaws, local_yaws
def decode_bboxes2d(self, reg_bboxes2d, base_centers2d):
"""Retrieve [x1, y1, x2, y2] format 2D bboxes.
Args:
reg_bboxes2d (torch.Tensor): Predicted FCOS style
2D bboxes.
shape: (N, 4)
base_centers2d (torch.Tensor): predicted base centers2d.
shape: (N, 2)
Returns:
torch.Tensor: [x1, y1, x2, y2] format 2D bboxes.
"""
centers_x = base_centers2d[:, 0]
centers_y = base_centers2d[:, 1]
xs_min = centers_x - reg_bboxes2d[..., 0]
ys_min = centers_y - reg_bboxes2d[..., 1]
xs_max = centers_x + reg_bboxes2d[..., 2]
ys_max = centers_y + reg_bboxes2d[..., 3]
bboxes2d = torch.stack([xs_min, ys_min, xs_max, ys_max], dim=-1)
return bboxes2d
def combine_depths(self, depth, depth_uncertainty):
"""Combine all the prediced depths with depth uncertainty.
Args:
depth (torch.Tensor): Predicted depths of each object.
shape: (N, 4)
depth_uncertainty (torch.Tensor): Depth uncertainty for
each depth of each object.
shape: (N, 4)
Returns:
torch.Tensor: Combined depth.
"""
uncertainty_weights = 1 / depth_uncertainty
uncertainty_weights = \
uncertainty_weights / \
uncertainty_weights.sum(dim=1, keepdim=True)
combined_depth = torch.sum(depth * uncertainty_weights, dim=1)
return combined_depth