Commit 4eebf2c6 authored by zhangwenwei

Merge branch 'freeanchor' into 'master'

Freeanchor

See merge request open-mmlab/mmdet.3d!101
parents 37f317e6 140af75d
......@@ -51,6 +51,7 @@ Results and models are available in the [model zoo](docs/model_zoo.md).
|--------------------|:--------:|:--------:|:--------:|:---------:|:-----:|:--------:|:-----:|
| SECOND | ☐ | ☐ | ☐ | ✗ | ✓ | ✓ | ☐ |
| PointPillars | ☐ | ☐ | ☐ | ✗ | ✓ | ✓ | ☐ |
| FreeAnchor | ☐ | ☐ | ☐ | ✗ | ✓ | ✓ | ☐ |
| VoteNet | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Part-A2 | ☐ | ☐ | ☐ | ✗ | ✓ | ✓ | ☐ |
| MVXNet | ☐ | ☐ | ☐ | ✗ | ✓ | ✓ | ☐ |
......
......@@ -16,16 +16,16 @@ input_modality = dict(
use_radar=False,
use_map=False,
use_external=False)
# file_client_args = dict(backend='disk')
file_client_args = dict(backend='disk')
# Uncomment the following if you use ceph or other file clients.
# See https://mmcv.readthedocs.io/en/latest/api.html#mmcv.fileio.FileClient
# for more details.
file_client_args = dict(
backend='petrel',
path_mapping=dict({
'./data/nuscenes/': 's3://nuscenes/nuscenes/',
'data/nuscenes/': 's3://nuscenes/nuscenes/'
}))
# file_client_args = dict(
# backend='petrel',
# path_mapping=dict({
# './data/nuscenes/': 's3://nuscenes/nuscenes/',
# 'data/nuscenes/': 's3://nuscenes/nuscenes/'
# }))
train_pipeline = [
dict(
type='LoadPointsFromFile',
......@@ -45,6 +45,7 @@ train_pipeline = [
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='PointShuffle'),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
......
# FreeAnchor for 3D Object Detection
## Introduction
We implement FreeAnchor in 3D detection systems and provide the first results of FreeAnchor with PointPillars on the nuScenes dataset.
With the implemented `FreeAnchor3DHead`, a PointPillars detector with a large backbone (e.g., RegNet-3.2GF) achieves top performance
on the nuScenes benchmark.
```
@inproceedings{zhang2019freeanchor,
title = {{FreeAnchor}: Learning to Match Anchors for Visual Object Detection},
author = {Zhang, Xiaosong and Wan, Fang and Liu, Chang and Ji, Rongrong and Ye, Qixiang},
booktitle = {Neural Information Processing Systems},
year = {2019}
}
```
## Usage
### Modify config
As in the [baseline config](hv_pointpillars_fpn_sbn-all_free-anchor_4x8_2x_nus-3d.py), we only need to replace the head of an existing one-stage detector to use the FreeAnchor head.
Since the config inherits a common detector head, `_delete_=True` is necessary to avoid conflicts.
The hyperparameters are tuned following the original paper.
```python
_base_ = [
'../_base_/models/pointpillars_second_fpn.py',
'../_base_/datasets/nus-3d.py', '../_base_/schedules/schedule_2x.py',
'../_base_/default_runtime.py'
]
model = dict(
pts_bbox_head=dict(
_delete_=True,
type='FreeAnchor3DHead',
num_classes=10,
in_channels=256,
feat_channels=256,
use_direction_classifier=True,
pre_anchor_topk=25,
bbox_thr=0.5,
gamma=2.0,
alpha=0.5,
anchor_generator=dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[[-50, -50, -1.8, 50, 50, -1.8]],
scales=[1, 2, 4],
sizes=[
[0.8660, 2.5981, 1.], # 1.5/sqrt(3)
[0.5774, 1.7321, 1.], # 1/sqrt(3)
[1., 1., 1.],
[0.4, 0.4, 1],
],
custom_values=[0, 0],
rotations=[0, 1.57],
reshape_out=True),
assigner_per_size=False,
diff_rad_by_sin=True,
dir_offset=0.7854, # pi/4
dir_limit_offset=0,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=9),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=0.8),
loss_dir=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)))
# model training and testing settings
train_cfg = dict(
pts=dict(code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.25, 0.25]))
```
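For intuition, `bbox_thr` above is the `t1` of the saturated linear function FreeAnchor uses to turn IoU into a matching probability, with `t2` being each object's maximum IoU over anchors. Below is a minimal sketch of that function, assuming only PyTorch; the helper name is ours and not part of the codebase:

```python
import torch


def saturated_linear_prob(iou, t1):
    """P{a_j -> b_i}: 0 below t1, 1 at the per-object max IoU, linear between."""
    t2 = iou.max(dim=1, keepdim=True).values.clamp(min=t1 + 1e-12)
    return ((iou - t1) / (t2 - t1)).clamp(min=0, max=1)


# 2 ground truths x 3 anchors; with bbox_thr=0.5 only IoUs above 0.5 get a
# non-zero probability, and the best anchor of each object gets 1.0.
iou = torch.tensor([[0.2, 0.6, 0.8], [0.3, 0.4, 0.7]])
print(saturated_linear_prob(iou, t1=0.5))
```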
## Results
### PointPillars
| Backbone |FreeAnchor|Lr schd | Mem (GB) | Inf time (fps) | mAP |NDS| Download |
| :---------: |:-----: |:-----: | :------: | :------------: | :----: |:----: | :------: |
|[FPN](../pointpillars/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py)|✗|2x|17.1||40.0|53.3||
|[FPN](./hv_pointpillars_fpn_sbn-all_free-anchor_4x8_2x_nus-3d.py)|✓|2x|||43.7|55.1||
|[RegNetX-400MF-FPN](../regnet/hv_pointpillars_regnet-400mf_fpn_sbn-all_4x8_2x_nus-3d.py)|✗|2x|17.3||44.8|56.4||
|[RegNetX-400MF-FPN](./hv_pointpillars_regnet-400mf_fpn_sbn-all_free-anchor_4x8_2x_nus-3d.py)|✓|2x||||||
|[RegNetX-1.6GF-FPN](./hv_pointpillars_regnet-1.6gf_fpn_sbn-all_free-anchor_4x8_2x_nus-3d.py)|✓|2x||||||
|[RegNetX-3.2GF-FPN](./hv_pointpillars_regnet-3.2gf_fpn_sbn-all_free-anchor_4x8_2x_nus-3d.py)|✓|2x||||||
_base_ = [
'../_base_/models/pointpillars_second_fpn.py',
'../_base_/datasets/nus-3d.py', '../_base_/schedules/schedule_2x.py',
'../_base_/default_runtime.py'
]
model = dict(
pts_bbox_head=dict(
_delete_=True,
type='FreeAnchor3DHead',
num_classes=10,
in_channels=256,
feat_channels=256,
use_direction_classifier=True,
pre_anchor_topk=25,
bbox_thr=0.5,
gamma=2.0,
alpha=0.5,
anchor_generator=dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[[-50, -50, -1.8, 50, 50, -1.8]],
scales=[1, 2, 4],
sizes=[
[0.8660, 2.5981, 1.], # 1.5/sqrt(3)
[0.5774, 1.7321, 1.], # 1/sqrt(3)
[1., 1., 1.],
[0.4, 0.4, 1],
],
custom_values=[0, 0],
rotations=[0, 1.57],
reshape_out=True),
assigner_per_size=False,
diff_rad_by_sin=True,
dir_offset=0.7854, # pi/4
dir_limit_offset=0,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=9),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=0.8),
loss_dir=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)))
# model training and testing settings
train_cfg = dict(
pts=dict(code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.25, 0.25]))
_base_ = './hv_pointpillars_fpn_sbn-all_free-anchor_4x8_2x_nus-3d.py'
model = dict(
pretrained=dict(pts='open-mmlab://regnetx_1.6gf'),
pts_backbone=dict(
_delete_=True,
type='NoStemRegNet',
arch='regnetx_1.6gf',
out_indices=(1, 2, 3),
frozen_stages=-1,
strides=(1, 2, 2, 2),
base_channels=64,
stem_channels=64,
norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
norm_eval=False,
style='pytorch'),
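    # RegNetX-1.6GF stage widths are (72, 168, 408, 912); out_indices=(1, 2, 3)
    # selects the last three stages, hence the neck input channels below.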
pts_neck=dict(in_channels=[168, 408, 912]))
_base_ = [
'../_base_/models/pointpillars_second_fpn.py',
'../_base_/datasets/nus-3d.py',
'../_base_/schedules/schedule_2x.py',
'../_base_/default_runtime.py',
]
# model settings
model = dict(
type='MVXFasterRCNN',
pretrained=dict(pts='open-mmlab://regnetx_1.6gf'),
pts_backbone=dict(
_delete_=True,
type='NoStemRegNet',
arch='regnetx_1.6gf',
out_indices=(1, 2, 3),
frozen_stages=-1,
strides=(1, 2, 2, 2),
base_channels=64,
stem_channels=64,
norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
norm_eval=False,
style='pytorch'),
pts_neck=dict(in_channels=[168, 408, 912]))
......@@ -10,7 +10,7 @@ def merge_aug_bboxes_3d(aug_results, img_metas, test_cfg):
Args:
aug_results (list[dict]): The dict of detection results.
The dict contains the following keys
- boxes_3d (:obj:BaseInstance3DBoxes): detection bbox
- boxes_3d (:obj:`BaseInstance3DBoxes`): detection bbox
- scores_3d (torch.Tensor): detection scores
- labels_3d (torch.Tensor): predicted box labels
img_metas (list[dict]): Meta information of each sample
......@@ -18,7 +18,7 @@ def merge_aug_bboxes_3d(aug_results, img_metas, test_cfg):
Returns:
dict: bbox results in cpu mode, containing the merged results
- boxes_3d (:obj:BaseInstance3DBoxes): merged detection bbox
- boxes_3d (:obj:`BaseInstance3DBoxes`): merged detection bbox
- scores_3d (torch.Tensor): merged detection scores
- labels_3d (torch.Tensor): merged predicted box labels
"""
......
from .anchor3d_head import Anchor3DHead
from .free_anchor3d_head import FreeAnchor3DHead
from .parta2_rpn_head import PartA2RPNHead
from .vote_head import VoteHead
__all__ = ['Anchor3DHead', 'PartA2RPNHead', 'VoteHead']
__all__ = ['Anchor3DHead', 'FreeAnchor3DHead', 'PartA2RPNHead', 'VoteHead']
......@@ -281,7 +281,7 @@ class Anchor3DHead(nn.Module, AnchorTrainMixin):
bbox_preds (list[torch.Tensor]): Multi-level bbox predictions.
dir_cls_preds (list[torch.Tensor]): Multi-level direction
class predictions.
gt_bboxes (list[:obj:BaseInstance3DBoxes]): Gt bboxes
gt_bboxes (list[:obj:`BaseInstance3DBoxes`]): Gt bboxes
of each sample.
gt_labels (list[torch.Tensor]): Gt labels of each sample.
input_metas (list[dict]): Contain pcd and img's meta info.
......@@ -405,7 +405,7 @@ class Anchor3DHead(nn.Module, AnchorTrainMixin):
Returns:
tuple: Contain predictions of single batch.
- bboxes (:obj:BaseInstance3DBoxes): Predicted 3d bboxes.
- bboxes (:obj:`BaseInstance3DBoxes`): Predicted 3d bboxes.
- scores (torch.Tensor): Class score of each bbox.
- labels (torch.Tensor): Label of each bbox.
"""
......
import torch
import torch.nn.functional as F
from mmdet3d.core.bbox import bbox_overlaps_nearest_3d
from mmdet.models import HEADS
from .anchor3d_head import Anchor3DHead
from .train_mixins import get_direction_target
@HEADS.register_module()
class FreeAnchor3DHead(Anchor3DHead):
"""`FreeAnchor <https://arxiv.org/abs/1909.02466>`_ head for 3D detection
Note:
This implementation is directly modified from the `mmdet implementation
<https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/dense_heads/free_anchor_retina_head.py>`_ # noqa
We find it also works on 3D detection with minor modifications, i.e.,
different hyper-parameters and an additional direction classifier.
Args:
pre_anchor_topk (int): Number of anchors taken into each bag.
bbox_thr (float): The threshold of the saturated linear function. It is
usually the same as the IoU threshold used in NMS.
gamma (float): Gamma parameter in focal loss.
alpha (float): Alpha parameter in focal loss.
kwargs (dict): Other arguments are the same as those in :class:`Anchor3DHead`.
"""
def __init__(self,
pre_anchor_topk=50,
bbox_thr=0.6,
gamma=2.0,
alpha=0.5,
**kwargs):
super().__init__(**kwargs)
self.pre_anchor_topk = pre_anchor_topk
self.bbox_thr = bbox_thr
self.gamma = gamma
self.alpha = alpha
def loss(self,
cls_scores,
bbox_preds,
dir_cls_preds,
gt_bboxes,
gt_labels,
input_metas,
gt_bboxes_ignore=None):
"""Calculate loss of FreeAnchor head.
Args:
cls_scores (list[torch.Tensor]): Classification scores of
different samples.
bbox_preds (list[torch.Tensor]): Box predictions of
different samples.
dir_cls_preds (list[torch.Tensor]): Direction predictions of
different samples.
gt_bboxes (list[:obj:`BaseInstance3DBoxes`]): Ground truth boxes.
gt_labels (list[torch.Tensor]): Ground truth labels.
input_metas (list[dict]): List of input meta information.
gt_bboxes_ignore (list[:obj:`BaseInstance3DBoxes`], optional):
Ground truth boxes that should be ignored. Defaults to None.
Returns:
dict: Loss items.
"""
featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
assert len(featmap_sizes) == self.anchor_generator.num_levels
anchor_list = self.get_anchors(featmap_sizes, input_metas)
anchors = [torch.cat(anchor) for anchor in anchor_list]
# concatenate each level
cls_scores = [
cls_score.permute(0, 2, 3, 1).reshape(
cls_score.size(0), -1, self.num_classes)
for cls_score in cls_scores
]
bbox_preds = [
bbox_pred.permute(0, 2, 3, 1).reshape(
bbox_pred.size(0), -1, self.box_code_size)
for bbox_pred in bbox_preds
]
dir_cls_preds = [
dir_cls_pred.permute(0, 2, 3,
1).reshape(dir_cls_pred.size(0), -1, 2)
for dir_cls_pred in dir_cls_preds
]
cls_scores = torch.cat(cls_scores, dim=1)
bbox_preds = torch.cat(bbox_preds, dim=1)
dir_cls_preds = torch.cat(dir_cls_preds, dim=1)
cls_prob = torch.sigmoid(cls_scores)
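# after flattening and concatenating all levels:
# cls_prob: [batch, num_anchors, num_classes]
# bbox_preds: [batch, num_anchors, box_code_size]
# dir_cls_preds: [batch, num_anchors, 2]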
box_prob = []
num_pos = 0
positive_losses = []
for _, (anchors_, gt_labels_, gt_bboxes_, cls_prob_, bbox_preds_,
dir_cls_preds_) in enumerate(
zip(anchors, gt_labels, gt_bboxes, cls_prob, bbox_preds,
dir_cls_preds)):
gt_bboxes_ = gt_bboxes_.tensor.to(anchors_.device)
with torch.no_grad():
# box_localization: a_{j}^{loc}, shape: [j, box_code_size]
pred_boxes = self.bbox_coder.decode(anchors_, bbox_preds_)
# object_box_iou: IoU_{ij}^{loc}, shape: [i, j]
object_box_iou = bbox_overlaps_nearest_3d(
gt_bboxes_, pred_boxes)
# object_box_prob: P{a_{j} -> b_{i}}, shape: [i, j]
t1 = self.bbox_thr
t2 = object_box_iou.max(
dim=1, keepdim=True).values.clamp(min=t1 + 1e-12)
object_box_prob = ((object_box_iou - t1) / (t2 - t1)).clamp(
min=0, max=1)
# object_cls_box_prob: P{a_{j} -> b_{i}}, shape: [i, c, j]
num_obj = gt_labels_.size(0)
indices = torch.stack(
[torch.arange(num_obj).type_as(gt_labels_), gt_labels_],
dim=0)
object_cls_box_prob = torch.sparse_coo_tensor(
indices, object_box_prob)
# image_box_prob: P{a_{j} \in A_{+}}, shape: [j, c]
"""
from "start" to "end" implement:
image_box_prob = torch.sparse.max(object_cls_box_prob,
dim=0).t()
"""
# start
box_cls_prob = torch.sparse.sum(
object_cls_box_prob, dim=0).to_dense()
indices = torch.nonzero(box_cls_prob).t_()
if indices.numel() == 0:
image_box_prob = torch.zeros(
anchors_.size(0),
self.num_classes).type_as(object_box_prob)
else:
nonzero_box_prob = torch.where(
(gt_labels_.unsqueeze(dim=-1) == indices[0]),
object_box_prob[:, indices[1]],
torch.tensor(
[0]).type_as(object_box_prob)).max(dim=0).values
# unmap back to dense shape [j, c]
image_box_prob = torch.sparse_coo_tensor(
indices.flip([0]),
nonzero_box_prob,
size=(anchors_.size(0), self.num_classes)).to_dense()
# end
box_prob.append(image_box_prob)
# construct bags for objects
match_quality_matrix = bbox_overlaps_nearest_3d(
gt_bboxes_, anchors_)
_, matched = torch.topk(
match_quality_matrix,
self.pre_anchor_topk,
dim=1,
sorted=False)
del match_quality_matrix
# matched_cls_prob: P_{ij}^{cls}
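# cls_prob_[matched] has shape [num_gts, topk, num_classes]; for each
# object and each of its top-k anchors, gather the probability of the
# object's class, giving shape [num_gts, topk]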
matched_cls_prob = torch.gather(
cls_prob_[matched], 2,
gt_labels_.view(-1, 1, 1).repeat(1, self.pre_anchor_topk,
1)).squeeze(2)
# matched_box_prob: P_{ij}^{loc}
matched_anchors = anchors_[matched]
matched_object_targets = self.bbox_coder.encode(
matched_anchors,
gt_bboxes_.unsqueeze(dim=1).expand_as(matched_anchors))
# direction classification loss
loss_dir = None
if self.use_direction_classifier:
# also calculate direction prob: P_{ij}^{dir}
matched_dir_targets = get_direction_target(
matched_anchors,
matched_object_targets,
self.dir_offset,
one_hot=False)
loss_dir = self.loss_dir(
dir_cls_preds_[matched].transpose(-2, -1),
matched_dir_targets,
reduction_override='none')
# generate bbox weights
if self.diff_rad_by_sin:
bbox_preds_[matched], matched_object_targets = \
self.add_sin_difference(
bbox_preds_[matched], matched_object_targets)
bbox_weights = matched_anchors.new_ones(matched_anchors.size())
# Using `pop` here is not right; check performance.
code_weight = self.train_cfg.get('code_weight', None)
if code_weight:
bbox_weights = bbox_weights * bbox_weights.new_tensor(
code_weight)
loss_bbox = self.loss_bbox(
bbox_preds_[matched],
matched_object_targets,
bbox_weights,
reduction_override='none').sum(-1)
if loss_dir is not None:
loss_bbox += loss_dir
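# interpret the (direction-augmented) regression loss as a negative
# log-likelihood, so that P_{ij}^{loc} = exp(-loss_bbox)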
matched_box_prob = torch.exp(-loss_bbox)
# positive_losses: {-log( Mean-max(P_{ij}^{cls} * P_{ij}^{loc}) )}
num_pos += len(gt_bboxes_)
positive_losses.append(
self.positive_bag_loss(matched_cls_prob, matched_box_prob))
positive_loss = torch.cat(positive_losses).sum() / max(1, num_pos)
# box_prob: P{a_{j} \in A_{+}}
box_prob = torch.stack(box_prob, dim=0)
# negative_loss:
# \sum_{j}{ FL((1 - P{a_{j} \in A_{+}}) * (1 - P_{j}^{bg})) } / n||B||
negative_loss = self.negative_bag_loss(cls_prob, box_prob).sum() / max(
1, num_pos * self.pre_anchor_topk)
losses = {
'positive_bag_loss': positive_loss,
'negative_bag_loss': negative_loss
}
return losses
def positive_bag_loss(self, matched_cls_prob, matched_box_prob):
"""Generate positive bag loss
Args:
matched_cls_prob (torch.Tensor): Classification probability
of matched positive samples.
matched_box_prob (torch.Tensor): Bounding box probability
of matched positive samples.
Returns:
torch.Tensor: Loss of positive samples.
"""
# bag_prob = Mean-max(matched_prob)
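# the normalized 1 / (1 - p) weights below interpolate between a mean
# (all probabilities small) and a max (one probability near 1), i.e. the
# Mean-max function of the FreeAnchor paper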
matched_prob = matched_cls_prob * matched_box_prob
weight = 1 / torch.clamp(1 - matched_prob, 1e-12, None)
weight /= weight.sum(dim=1).unsqueeze(dim=-1)
bag_prob = (weight * matched_prob).sum(dim=1)
# positive_bag_loss = -self.alpha * log(bag_prob)
bag_prob = bag_prob.clamp(0, 1)  # clamp to avoid numerical issues in BCE
return self.alpha * F.binary_cross_entropy(
bag_prob, torch.ones_like(bag_prob), reduction='none')
def negative_bag_loss(self, cls_prob, box_prob):
"""Generate negative bag loss
Args:
cls_prob (torch.Tensor): Classification probability
of negative samples.
box_prob (torch.Tensor): Bounding box probability
of negative samples.
Returns:
torch.Tensor: Loss of negative samples.
"""
prob = cls_prob * (1 - box_prob)
prob = prob.clamp(0, 1)  # clamp to avoid numerical issues in BCE
negative_bag_loss = prob**self.gamma * F.binary_cross_entropy(
prob, torch.zeros_like(prob), reduction='none')
return (1 - self.alpha) * negative_bag_loss
......@@ -121,7 +121,7 @@ class PartA2RPNHead(Anchor3DHead):
Returns:
dict: Predictions of single batch. Contain the keys:
- boxes_3d (:obj:BaseInstance3DBoxes): Predicted 3d bboxes.
- boxes_3d (:obj:`BaseInstance3DBoxes`): Predicted 3d bboxes.
- scores_3d (torch.Tensor): Score of each bbox.
- labels_3d (torch.Tensor): Label of each bbox.
- cls_preds (torch.Tensor): Class score of each bbox.
......@@ -217,7 +217,7 @@ class PartA2RPNHead(Anchor3DHead):
Returns:
dict: Predictions of single batch. Contain the keys:
- boxes_3d (:obj:BaseInstance3DBoxes): Predicted 3d bboxes.
- boxes_3d (:obj:`BaseInstance3DBoxes`): Predicted 3d bboxes.
- scores_3d (torch.Tensor): Score of each bbox.
- labels_3d (torch.Tensor): Label of each bbox.
- cls_preds (torch.Tensor): Class score of each bbox.
......
......@@ -20,7 +20,7 @@ class AnchorTrainMixin(object):
Args:
anchor_list (list[list]): Multi level anchors of each image.
gt_bboxes_list (list[:obj:BaseInstance3DBoxes]): Ground truth
gt_bboxes_list (list[:obj:`BaseInstance3DBoxes`]): Ground truth
bboxes of each image.
input_metas (list[dict]): Meta info of each image.
gt_bboxes_ignore_list (None | list): Ignore list of gt bboxes.
......@@ -96,7 +96,7 @@ class AnchorTrainMixin(object):
Args:
anchors (torch.Tensor): Concatenated multi-level anchor.
gt_bboxes (:obj:BaseInstance3DBoxes): Gt bboxes.
gt_bboxes (:obj:`BaseInstance3DBoxes`): Gt bboxes.
gt_bboxes_ignore (torch.Tensor): Ignored gt bboxes.
gt_labels (torch.Tensor): Gt class labels.
input_meta (dict): Meta info of each image.
......@@ -185,7 +185,7 @@ class AnchorTrainMixin(object):
Args:
bbox_assigner (BaseAssigner): assign positive and negative boxes.
anchors (torch.Tensor): Concatenated multi-level anchor.
gt_bboxes (:obj:BaseInstance3DBoxes): Gt bboxes.
gt_bboxes (:obj:`BaseInstance3DBoxes`): Gt bboxes.
gt_bboxes_ignore (torch.Tensor): Ignored gt bboxes.
gt_labels (torch.Tensor): Gt class labels.
input_meta (dict): Meta info of each image.
......
......@@ -189,7 +189,7 @@ class VoteHead(nn.Module):
Args:
bbox_preds (dict): Predictions from forward of vote head.
points (list[torch.Tensor]): Input points.
gt_bboxes_3d (list[:obj:BaseInstance3DBoxes]): Gt bboxes
gt_bboxes_3d (list[:obj:`BaseInstance3DBoxes`]): Gt bboxes
of each sample.
gt_labels_3d (list[torch.Tensor]): Gt labels of each sample.
pts_semantic_mask (None | list[torch.Tensor]): Point-wise
......@@ -296,7 +296,7 @@ class VoteHead(nn.Module):
Args:
points (list[torch.Tensor]): Points of each batch.
gt_bboxes_3d (list[:obj:BaseInstance3DBoxes]): gt bboxes of
gt_bboxes_3d (list[:obj:`BaseInstance3DBoxes`]): gt bboxes of
each batch.
gt_labels_3d (list[torch.Tensor]): gt class labels of each batch.
pts_semantic_mask (None | list[torch.Tensor]): point-wise semantic
......@@ -382,7 +382,7 @@ class VoteHead(nn.Module):
Args:
points (torch.Tensor): Points of each batch.
gt_bboxes_3d (:obj:BaseInstance3DBoxes): gt bboxes of each batch.
gt_bboxes_3d (:obj:`BaseInstance3DBoxes`): gt bboxes of each batch.
gt_labels_3d (torch.Tensor): gt class labels of each batch.
pts_semantic_mask (None | torch.Tensor): point-wise semantic
label of each batch.
......
......@@ -38,7 +38,7 @@ class VoteNet(SingleStage3DDetector):
Args:
points (list[torch.Tensor]): Points of each batch.
img_metas (list): Image metas.
gt_bboxes_3d (:obj:BaseInstance3DBoxes): gt bboxes of each batch.
gt_bboxes_3d (:obj:`BaseInstance3DBoxes`): gt bboxes of each batch.
gt_labels_3d (list[torch.Tensor]): gt class labels of each batch.
pts_semantic_mask (None | list[torch.Tensor]): point-wise semantic
label of each batch.
......
......@@ -63,7 +63,7 @@ class Base3DRoIHead(nn.Module, metaclass=ABCMeta):
x (dict): Contains features from the first stage.
img_metas (list[dict]): Meta info of each image.
proposal_list (list[dict]): Proposal information from rpn.
gt_bboxes (list[:obj:BaseInstance3DBoxes]):
gt_bboxes (list[:obj:`BaseInstance3DBoxes`]):
GT bboxes of each sample. The bboxes are encapsulated
by 3D box structures.
gt_labels (list[LongTensor]): GT labels of each sample.
......
......@@ -78,7 +78,7 @@ class PointwiseSemanticHead(nn.Module):
Args:
voxel_centers (torch.Tensor): shape [voxel_num, 3],
the center of voxels
gt_bboxes_3d (:obj:BaseInstance3DBoxes): gt boxes containing tensor
gt_bboxes_3d (:obj:`BaseInstance3DBoxes`): gt boxes with tensor
of shape [box_num, 7].
gt_labels_3d (torch.Tensor): shape [box_num], class label of gt
......@@ -125,7 +125,7 @@ class PointwiseSemanticHead(nn.Module):
Args:
voxel_centers (torch.Tensor): shape [voxel_num, 3],
the center of voxels
gt_bboxes_3d (list[:obj:BaseInstance3DBoxes]): list of gt boxes
gt_bboxes_3d (list[:obj:`BaseInstance3DBoxes`]): list of gt boxes
containing tensor of shape [box_num, 7].
gt_labels_3d (list[torch.Tensor]): list of GT labels.
......
......@@ -79,10 +79,10 @@ class PartAggregationROIHead(Base3DRoIHead):
img_metas (list[dict]): Meta info of each image.
proposal_list (list[dict]): Proposal information from rpn.
The dictionary should contain the following keys:
- boxes_3d (:obj:BaseInstance3DBoxes): Proposal bboxes
- boxes_3d (:obj:`BaseInstance3DBoxes`): Proposal bboxes
- labels_3d (torch.Tensor): Labels of proposals
- cls_preds (torch.Tensor): Original scores of proposals
gt_bboxes_3d (list[:obj:BaseInstance3DBoxes]):
gt_bboxes_3d (list[:obj:`BaseInstance3DBoxes`]):
GT bboxes of each sample. The bboxes are encapsulated
by 3D box structures.
gt_labels_3d (list[LongTensor]): GT labels of each sample.
......