Use pre-commit to reformat code

Commit 41b18fd8 authored Jan 06, 2025 by zhe chen
20 changed files
--- a/autonomous_driving/occupancy_prediction/README.md
+++ b/autonomous_driving/occupancy_prediction/README.md
 ## InternImage-based Baseline for CVPR23 Occupancy Prediction Challenge!!!!
-We improve our baseline with a more powerful image backbone: **InternImage**, which shows its excellent ability within a series of leaderboards and benchmarks, such as *COCO* and *nuScenes*.
+We improve our baseline with a more powerful image backbone: **InternImage**, which shows its excellent ability within a
+series of leaderboards and benchmarks, such as *COCO* and *nuScenes*.
 #### 1. Requirements
 ```bash
 python>=3.8
 torch==1.12 # recommend
@@ -16,8 +16,8 @@ numpy==1.22
 mmdet3d==0.18.1 # recommend
 ```
 ### 2. Install DCNv3 for InternImage
 ```bash
 cd projects/mmdet3d_plugin/bevformer/backbones/ops_dcnv3
 bash make.sh # requires torch>=1.10
@@ -31,32 +31,33 @@ bash make.sh # requires torch>=1.10
 Notes: InternImage provides abundant pre-trained model weights that can be used!!!
 ### 4. Performance compared to baseline
-model name|weight| mIoU | others | barrier | bicycle | bus | car | construction_vehicle | motorcycle | pedestrian | traffic_cone | trailer |  truck | driveable_surface | other_flat | sidewalk | terrain | manmade | vegetation | 
+| model name             |                                                weight                                                 | mIoU  | others | barrier | bicycle |  bus  |  car  | construction_vehicle | motorcycle | pedestrian | traffic_cone | trailer | truck | driveable_surface | other_flat | sidewalk | terrain | manmade | vegetation |
----|:----------:| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :----------------------: | :---: | :------: | :------: |
+| ---------------------- | :---------------------------------------------------------------------------------------------------: | :---: | :----: | :-----: | :-----: | :---: | :---: | :------------------: | :--------: | :--------: | :----------: | :-----: | :---: | :---------------: | :--------: | :------: | :-----: | :-----: | :--------: |
-bevformer_intern-s_occ|[Google Drive](https://drive.google.com/file/d/1LV9K8hrskKf51xY1wbqTKzK7WZmVXEV_/view?usp=sharing)| 25.11 | 6.93 | 35.57 | 10.40 | 35.97 | 41.23 | 13.72 | 20.30 | 21.10 | 18.34 | 19.18 | 28.64 | 49.82 | 30.74 | 31.00 | 27.44 | 19.29 | 17.29 | 
+| bevformer_intern-s_occ |  [Google Drive](https://drive.google.com/file/d/1LV9K8hrskKf51xY1wbqTKzK7WZmVXEV_/view?usp=sharing)   | 25.11 |  6.93  |  35.57  |  10.40  | 35.97 | 41.23 |        13.72         |   20.30    |   21.10    |    18.34     |  19.18  | 28.64 |       49.82       |   30.74    |  31.00   |  27.44  |  19.29  |   17.29    |
-bevformer_base_occ|[Google Drive](https://drive.google.com/file/d/1NyoiosafAmne1qiABeNOPXR-P-y0i7_I/view?usp=share_link)| 23.67 | 5.03 | 38.79 | 9.98 | 34.41 | 41.09 | 13.24 | 16.50 | 18.15 | 17.83 | 18.66 | 27.70 | 48.95 | 27.73 | 29.08 | 25.38 | 15.41 | 14.46 | 
+| bevformer_base_occ     | [Google Drive](https://drive.google.com/file/d/1NyoiosafAmne1qiABeNOPXR-P-y0i7_I/view?usp=share_link) | 23.67 |  5.03  |  38.79  |  9.98   | 34.41 | 41.09 |        13.24         |   16.50    |   18.15    |    17.83     |  18.66  | 27.70 |       48.95       |   27.73    |  29.08   |  25.38  |  15.41  |   14.46    |
 ## Challenge Timeline
 - Pending - Challenge Period Open.
 - Jun 01, 2023 - Challenge Period End.
 - Jun 03, 2023 - Finalist Notification.
 - Jun 10, 2023 - Technical Report Deadline.
 - Jun 12, 2023 - Winner Announcement.
 <p align="right">(<a href="#top">back to top</a>)</p>
+## Leaderboard
-## Leaderboard 
 To be released.
 <p align="right">(<a href="#top">back to top</a>)</p>
 ## License
-Before using the dataset, you should register on the website and agree to the terms of use of the [nuScenes](https://www.nuscenes.org/nuscenes).
-All code within this repository is under [Apache License 2.0](./LICENSE).
+Before using the dataset, you should register on the website and agree to the terms of use of
+the [nuScenes](https://www.nuscenes.org/nuscenes). All code within this repository is
+under [Apache License 2.0](./LICENSE).
 <p align="right">(<a href="#top">back to top</a>)</p>
--- a/autonomous_driving/occupancy_prediction/docs/getting_started.md
+++ b/autonomous_driving/occupancy_prediction/docs/getting_started.md
 ## Installation
-Follow https://github.com/fundamentalvision/BEVFormer/blob/master/docs/install.md to prepare the environment.
+Follow https://github.com/fundamentalvision/BEVFormer/blob/master/docs/install.md to prepare the environment.
-## Preparing Dataset
-1. Download the gts and annotations.json we provided. You can download our imgs.tar.gz or using the original sample files of the nuScenes dataset.
+## Preparing Dataset
-2. Download the CAN bus expansion data and maps [HERE](https://www.nuscenes.org/download).
+1. Download the gts and annotations.json we provided. You can download our imgs.tar.gz or using the original sample
+   files of the nuScenes dataset.
-3. Organize your folder structure as below：
-```
+2. Download the CAN bus expansion data and maps [HERE](https://www.nuscenes.org/download).
-Occupancy3D
-├── projects/
+3. Organize your folder structure as below：
-├── tools/
-├── ckpts/
+```
-│   ├── r101_dcn_fcos3d_pretrain.pth
+Occupancy3D
-├── data/
+├── projects/
-│   ├── can_bus/
+├── tools/
-│   ├── occ3d-nus/
+├── ckpts/
-│   │   ├── maps/
+│   ├── r101_dcn_fcos3d_pretrain.pth
-│   │   ├── samples/     # You can download our imgs.tar.gz or using the original sample files of the nuScenes dataset
+├── data/
-│   │   ├── v1.0-trainval/
+│   ├── can_bus/
-│   │   ├── gts/
+│   ├── occ3d-nus/
-│   │   │── annotations.json
+│   │   ├── maps/
-```
+│   │   ├── samples/     # You can download our imgs.tar.gz or using the original sample files of the nuScenes dataset
+│   │   ├── v1.0-trainval/
+│   │   ├── gts/
-4. Generate the info files for training and validation:
+│   │   │── annotations.json
 ```
-python tools/create_data.py occ --root-path ./data/occ3d-nus --out-dir ./data/occ3d-nus --extra-tag occ --version v1.0-trainval --canbus ./data --occ-path ./data/occ3d-nus
-``` 
+4. Generate the info files for training and validation:
-## Training
+```
-```
+python tools/create_data.py occ --root-path ./data/occ3d-nus --out-dir ./data/occ3d-nus --extra-tag occ --version v1.0-trainval --canbus ./data --occ-path ./data/occ3d-nus
-./tools/dist_train.sh projects/configs/bevformer/bevformer_base_occ.py 8
+```
-```
+## Training
-## Testing
 ```
-./tools/dist_test.sh projects/configs/bevformer/bevformer_base_occ.py work_dirs/bevformer_base_occ/epoch_24.pth 8
+./tools/dist_train.sh projects/configs/bevformer/bevformer_base_occ.py 8
 ```
-You can evaluate the F-score at the same time by adding `--eval_fscore`.
+## Testing
-### Performance
+```
+./tools/dist_test.sh projects/configs/bevformer/bevformer_base_occ.py work_dirs/bevformer_base_occ/epoch_24.pth 8
-model name|weight| mIoU | others | barrier | bicycle | bus | car | construction_vehicle | motorcycle | pedestrian | traffic_cone | trailer |  truck | driveable_surface | other_flat | sidewalk | terrain | manmade | vegetation | 
+```
----|:----------:| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :----------------------: | :---: | :------: | :------: |
-bevformer_base_occ|[Google Drive](https://drive.google.com/file/d/1NyoiosafAmne1qiABeNOPXR-P-y0i7_I/view?usp=share_link)| 23.67 | 5.03 | 38.79 | 9.98 | 34.41 | 41.09 | 13.24 | 16.50 | 18.15 | 17.83 | 18.66 | 27.7 | 48.95 | 27.73 | 29.08 | 25.38 | 15.41 | 14.46 | 
+You can evaluate the F-score at the same time by adding `--eval_fscore`.
+### Performance
+| model name         |                                                weight                                                 | mIoU  | others | barrier | bicycle |  bus  |  car  | construction_vehicle | motorcycle | pedestrian | traffic_cone | trailer | truck | driveable_surface | other_flat | sidewalk | terrain | manmade | vegetation |
+| ------------------ | :---------------------------------------------------------------------------------------------------: | :---: | :----: | :-----: | :-----: | :---: | :---: | :------------------: | :--------: | :--------: | :----------: | :-----: | :---: | :---------------: | :--------: | :------: | :-----: | :-----: | :--------: |
+| bevformer_base_occ | [Google Drive](https://drive.google.com/file/d/1NyoiosafAmne1qiABeNOPXR-P-y0i7_I/view?usp=share_link) | 23.67 |  5.03  |  38.79  |  9.98   | 34.41 | 41.09 |        13.24         |   16.50    |   18.15    |    17.83     |  18.66  | 27.7  |       48.95       |   27.73    |  29.08   |  25.38  |  15.41  |   14.46    |
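
The folder layout listed in `getting_started.md` can be checked before running `tools/create_data.py`. The following is a minimal sketch, not part of the repository: `EXPECTED_ENTRIES` is transcribed from the tree in the diff above, and `missing_entries` is a hypothetical helper name.

```python
from pathlib import Path

# Paths transcribed from the "Organize your folder structure" tree,
# relative to the Occupancy3D root directory.
EXPECTED_ENTRIES = [
    "projects",
    "tools",
    "ckpts/r101_dcn_fcos3d_pretrain.pth",
    "data/can_bus",
    "data/occ3d-nus/maps",
    "data/occ3d-nus/samples",
    "data/occ3d-nus/v1.0-trainval",
    "data/occ3d-nus/gts",
    "data/occ3d-nus/annotations.json",
]


def missing_entries(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when the layout is complete.
    for entry in missing_entries("."):
        print(f"missing: {entry}")
```

The check only tests for existence, so it does not tell a file from a directory and does not validate the contents of `gts/` or `annotations.json`.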
--- a/autonomous_driving/occupancy_prediction/projects/configs/_base_/datasets/s3dis_seg-3d-13class.py
+++ b/autonomous_driving/occupancy_prediction/projects/configs/_base_/datasets/s3dis_seg-3d-13class.py
@@ -125,8 +125,7 @@ data = dict(
        classes=class_names,
        test_mode=True,
        ignore_index=len(class_names),
-        scene_idxs=data_root +
+        scene_idxs=data_root + f'seg_info/Area_{test_area}_resampled_scene_idxs.npy'),
-        f'seg_info/Area_{test_area}_resampled_scene_idxs.npy'),
    test=dict(
        type=dataset_type,
        data_root=data_root,

--- a/autonomous_driving/occupancy_prediction/projects/configs/_base_/models/3dssd.py
+++ b/autonomous_driving/occupancy_prediction/projects/configs/_base_/models/3dssd.py
@@ -25,7 +25,7 @@ model = dict(
            in_channels=256,
            num_points=256,
            gt_per_seed=1,
-            conv_channels=(128, ),
+            conv_channels=(128,),
            conv_cfg=dict(type='Conv1d'),
            norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1),
            with_res_feat=False,
@@ -43,8 +43,8 @@ model = dict(
        pred_layer_cfg=dict(
            in_channels=1536,
            shared_conv_channels=(512, 128),
-            cls_conv_channels=(128, ),
+            cls_conv_channels=(128,),
-            reg_conv_channels=(128, ),
+            reg_conv_channels=(128,),
            conv_cfg=dict(type='Conv1d'),
            norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1),
            bias=True),

--- a/autonomous_driving/occupancy_prediction/projects/configs/_base_/models/fcos3d.py
+++ b/autonomous_driving/occupancy_prediction/projects/configs/_base_/models/fcos3d.py
@@ -31,16 +31,16 @@ model = dict(
        dir_offset=0.7854,  # pi/4
        strides=[8, 16, 32, 64, 128],
        group_reg_dims=(2, 1, 3, 1, 2),  # offset, depth, size, rot, velo
-        cls_branch=(256, ),
+        cls_branch=(256,),
        reg_branch=(
-            (256, ),  # offset
+            (256,),  # offset
-            (256, ),  # depth
+            (256,),  # depth
-            (256, ),  # size
+            (256,),  # size
-            (256, ),  # rot
+            (256,),  # rot
            ()  # velo
        ),
-        dir_branch=(256, ),
+        dir_branch=(256,),
-        attr_branch=(256, ),
+        attr_branch=(256,),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,

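The `(256, )` to `(256,)` edits in the two model configs above change whitespace only. A minimal check, parsing both spellings with `ast.literal_eval` (the variable names are illustrative and do not come from the repository):

```python
import ast

# The two spellings that appear on the removed and added lines of the diff.
old_spelling = "(256, )"
new_spelling = "(256,)"

old_value = ast.literal_eval(old_spelling)
new_value = ast.literal_eval(new_spelling)

# Both parse to the same one-element tuple, so the config value is unchanged.
assert old_value == new_value == (256,)

# The trailing comma is what makes it a tuple; without it the parentheses
# are only grouping and the value is a plain int.
without_comma = ast.literal_eval("(256)")
assert isinstance(without_comma, int)
```

This is why a formatter can normalise the spacing safely but must keep the comma.
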
--- a/autonomous_driving/occupancy_prediction/projects/configs/bevformer/.ipynb_checkpoints/bevformer_small_occ-checkpoint.py
+++ b/autonomous_driving/occupancy_prediction/projects/configs/bevformer/.ipynb_checkpoints/bevformer_small_occ-checkpoint.py
 _base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
 ]
 #
 plugin = True
 plugin_dir = 'projects/mmdet3d_plugin/'
 # If point cloud range is changed, the models should also change their point
 # cloud range accordingly
 point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
 voxel_size = [0.2, 0.2, 8]
+img_norm_cfg = dict(
+    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
-img_norm_cfg = dict(
+# For nuScenes we usually do 10-class detection
-    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
+class_names = [
-# For nuScenes we usually do 10-class detection
+    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
-class_names = [
+    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
-    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
+]
-    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
-]
+input_modality = dict(
+    use_lidar=False,
-input_modality = dict(
+    use_camera=True,
-    use_lidar=False,
+    use_radar=False,
-    use_camera=True,
+    use_map=False,
-    use_radar=False,
+    use_external=True)
-    use_map=False,
-    use_external=True)
+_dim_ = 256
+_pos_dim_ = _dim_ // 2
-_dim_ = 256
+_ffn_dim_ = _dim_ * 2
-_pos_dim_ = _dim_//2
+_num_levels_ = 2
-_ffn_dim_ = _dim_*2
+bev_h_ = 200
-_num_levels_ = 2
+bev_w_ = 200
-bev_h_ = 200
+queue_length = 4  # each sequence contains `queue_length` frames.
-bev_w_ = 200
+model = dict(
-queue_length = 4 # each sequence contains `queue_length` frames.
+    type='BEVFormerOcc',
-model = dict(
+    use_grid_mask=True,
-    type='BEVFormerOcc',
+    video_test_mode=True,
-    use_grid_mask=True,
+    img_backbone=dict(
-    video_test_mode=True,
+        type='ResNet',
-    img_backbone=dict(
+        depth=50,
-        type='ResNet',
+        num_stages=4,
-        depth=50,
+        out_indices=(2, 3),
-        num_stages=4,
+        frozen_stages=1,
-        out_indices=(2, 3),
+        norm_cfg=dict(type='BN2d', requires_grad=False),
-        frozen_stages=1,
+        norm_eval=True,
-        norm_cfg=dict(type='BN2d', requires_grad=False),
+        style='caffe',
-        norm_eval=True,
+        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
-        style='caffe',
+        # original DCNv2 will print log when perform load_state_dict
-        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), # original DCNv2 will print log when perform load_state_dict
+        stage_with_dcn=(False, False, True, True)),
-        stage_with_dcn=(False, False, True, True)),
+    img_neck=dict(
-    img_neck=dict(
+        type='FPN',
-        type='FPN',
+        in_channels=[1024, 2048],
-        in_channels=[1024, 2048],
+        out_channels=_dim_,
-        out_channels=_dim_,
+        start_level=0,
-        start_level=0,
+        add_extra_convs='on_output',
-        add_extra_convs='on_output',
+        num_outs=_num_levels_,
-        num_outs=_num_levels_,
+        relu_before_extra_convs=True),
-        relu_before_extra_convs=True),
+    pts_bbox_head=dict(
-    pts_bbox_head=dict(
+        type='BEVFormerOccHead',
-        type='BEVFormerOccHead',
+        pc_range=point_cloud_range,
-        pc_range=point_cloud_range,
+        bev_h=bev_h_,
-        bev_h=bev_h_,
+        bev_w=bev_w_,
-        bev_w=bev_w_,
+        num_classes=18,
-        num_classes=18,
+        in_channels=_dim_,
-        in_channels=_dim_,
+        sync_cls_avg_factor=True,
-        sync_cls_avg_factor=True,
+        with_box_refine=True,
-        with_box_refine=True,
+        as_two_stage=False,
-        as_two_stage=False,
+        # loss_occ=dict(
-        # loss_occ=dict(
+        #     type='FocalLoss',
-        #     type='FocalLoss',
+        #     use_sigmoid=False,
-        #     use_sigmoid=False,
+        #     gamma=2.0,
-        #     gamma=2.0,
+        #     alpha=0.25,
-        #     alpha=0.25,
+        #     loss_weight=10.0),
-        #     loss_weight=10.0),
+        use_mask=False,
-        use_mask=False,
+        loss_occ=dict(
-        loss_occ= dict(
+            type='CrossEntropyLoss',
-            type='CrossEntropyLoss',
+            use_sigmoid=False,
-            use_sigmoid=False,
+            loss_weight=1.0),
-            loss_weight=1.0),
+        transformer=dict(
-        transformer=dict(
+            type='TransformerOcc',
-            type='TransformerOcc',
+            pillar_h=16,
-            pillar_h=16,
+            num_classes=18,
-            num_classes=18,
+            norm_cfg=dict(type='BN', ),
-            norm_cfg=dict(type='BN', ),
+            norm_cfg_3d=dict(type='BN3d', ),
-            norm_cfg_3d=dict(type='BN3d', ),
+            use_3d=True,
-            use_3d=True,
+            use_conv=False,
-            use_conv=False,
+            rotate_prev_bev=True,
-            rotate_prev_bev=True,
+            use_shift=True,
-            use_shift=True,
+            use_can_bus=True,
-            use_can_bus=True,
+            embed_dims=_dim_,
-            embed_dims=_dim_,
+            encoder=dict(
-            encoder=dict(
+                type='BEVFormerEncoder',
-                type='BEVFormerEncoder',
+                num_layers=1,
-                num_layers=1,
+                pc_range=point_cloud_range,
-                pc_range=point_cloud_range,
+                num_points_in_pillar=8,
-                num_points_in_pillar=8,
+                return_intermediate=False,
-                return_intermediate=False,
+                transformerlayers=dict(
-                transformerlayers=dict(
+                    type='BEVFormerLayer',
-                    type='BEVFormerLayer',
+                    attn_cfgs=[
-                    attn_cfgs=[
+                        dict(
-                        dict(
+                            type='TemporalSelfAttention',
-                            type='TemporalSelfAttention',
+                            embed_dims=_dim_,
-                            embed_dims=_dim_,
+                            num_levels=1),
-                            num_levels=1),
+                        dict(
-                        dict(
+                            type='SpatialCrossAttention',
-                            type='SpatialCrossAttention',
+                            pc_range=point_cloud_range,
-                            pc_range=point_cloud_range,
+                            deformable_attention=dict(
-                            deformable_attention=dict(
+                                type='MSDeformableAttention3D',
-                                type='MSDeformableAttention3D',
+                                embed_dims=_dim_,
-                                embed_dims=_dim_,
+                                num_points=8,
-                                num_points=8,
+                                num_levels=_num_levels_),
-                                num_levels=_num_levels_),
+                            embed_dims=_dim_,
-                            embed_dims=_dim_,
+                        )
-                        )
+                    ],
-                    ],
+                    feedforward_channels=_ffn_dim_,
-                    feedforward_channels=_ffn_dim_,
+                    ffn_dropout=0.1,
-                    ffn_dropout=0.1,
+                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
-                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
+                                     'ffn', 'norm'))),
-                                     'ffn', 'norm'))),
+        ),
-        ),
+        positional_encoding=dict(
-        positional_encoding=dict(
+            type='LearnedPositionalEncoding',
-            type='LearnedPositionalEncoding',
+            num_feats=_pos_dim_,
-            num_feats=_pos_dim_,
+            row_num_embed=bev_h_,
-            row_num_embed=bev_h_,
+            col_num_embed=bev_w_,
-            col_num_embed=bev_w_,
+        ),
-        ),
+        # model training and testing settings
-    # model training and testing settings
+        train_cfg=dict(pts=dict(
-    train_cfg=dict(pts=dict(
+            grid_size=[512, 512, 1],
-        grid_size=[512, 512, 1],
+            voxel_size=voxel_size,
-        voxel_size=voxel_size,
+            point_cloud_range=point_cloud_range,
-        point_cloud_range=point_cloud_range,
+            out_size_factor=4,
-        out_size_factor=4,
+            assigner=dict(
-        assigner=dict(
+                type='HungarianAssigner3D',
-            type='HungarianAssigner3D',
+                cls_cost=dict(type='FocalLossCost', weight=2.0),
-            cls_cost=dict(type='FocalLossCost', weight=2.0),
+                reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
-            reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
+                iou_cost=dict(type='IoUCost', weight=0.0),
-            iou_cost=dict(type='IoUCost', weight=0.0), # Fake cost. This is just to make it compatible with DETR head.
+                # Fake cost. This is just to make it compatible with DETR head.
-            pc_range=point_cloud_range)))))
+                pc_range=point_cloud_range)))))
 dataset_type = 'NuSceneOcc'
 data_root = 'data/occ3d-nus/'
 file_client_args = dict(backend='disk')
-occ_gt_data_root='data/occ3d-nus'
+occ_gt_data_root = 'data/occ3d-nus'
 train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
-    dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root),
+    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
    dict(type='RandomScaleImageMultiViewImage', scales=[0.2]),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
-    dict(type='CustomCollect3D', keys=[ 'img','voxel_semantics','mask_lidar','mask_camera'] )
+    dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
 ]
 test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
-    dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root),
+    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1600, 900),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
-        dict(type='RandomScaleImageMultiViewImage', scales=[0.8]),
+            dict(type='RandomScaleImageMultiViewImage', scales=[0.8]),
-        dict(type='PadMultiViewImage', size_divisor=32),
+            dict(type='PadMultiViewImage', size_divisor=32),
-        dict(
+            dict(
-            type='DefaultFormatBundle3D',
+                type='DefaultFormatBundle3D',
-            class_names=class_names,
+                class_names=class_names,
-            with_label=False),
+                with_label=False),
-        dict(type='CustomCollect3D', keys=['img'])
+            dict(type='CustomCollect3D', keys=['img'])
-    ])
+        ])
 ]
 data = dict(
    samples_per_gpu=1,
    workers_per_gpu=0,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'occ_infos_temporal_train.pkl',
        pipeline=train_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=False,
        use_valid_flag=True,
        bev_size=(bev_h_, bev_w_),
        queue_length=queue_length,
        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
        box_type_3d='LiDAR'),
    val=dict(type=dataset_type,
             data_root=data_root,
             ann_file=data_root + 'occ_infos_temporal_val.pkl',
-             pipeline=test_pipeline,  bev_size=(bev_h_, bev_w_),
+             pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
             classes=class_names, modality=input_modality, samples_per_gpu=1),
    test=dict(type=dataset_type,
              data_root=data_root,
              ann_file=data_root + 'occ_infos_temporal_val.pkl',
              pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
              classes=class_names, modality=input_modality),
    shuffler_sampler=dict(type='DistributedGroupSampler'),
    nonshuffler_sampler=dict(type='DistributedSampler')
 )
 optimizer = dict(
    type='AdamW',
    lr=2e-4,
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }),
    weight_decay=0.01)
 optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
 # learning policy
 lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    min_lr_ratio=1e-3)
 total_epochs = 24
 evaluation = dict(interval=1, pipeline=test_pipeline)
 runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
 # load_from = 'ckpts/r101_dcn_fcos3d_pretrain.pth'
 log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
 checkpoint_config = dict(interval=1)
--- a/autonomous_driving/occupancy_prediction/projects/configs/bevformer/bevformer_base_occ.py
+++ b/autonomous_driving/occupancy_prediction/projects/configs/bevformer/bevformer_base_occ.py
 _base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
 ]
 #
 plugin = True
 plugin_dir = 'projects/mmdet3d_plugin/'
 # If point cloud range is changed, the models should also change their point
 # cloud range accordingly
 point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
 voxel_size = [0.2, 0.2, 8]
+img_norm_cfg = dict(
+    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
-img_norm_cfg = dict(
+# For nuScenes we usually do 10-class detection
-    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
+class_names = [
-# For nuScenes we usually do 10-class detection
+    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
-class_names = [
+    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
-    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
+]
-    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
-]
+input_modality = dict(
+    use_lidar=False,
-input_modality = dict(
+    use_camera=True,
-    use_lidar=False,
+    use_radar=False,
-    use_camera=True,
+    use_map=False,
-    use_radar=False,
+    use_external=True)
-    use_map=False,
-    use_external=True)
+_dim_ = 256
+_pos_dim_ = _dim_ // 2
-_dim_ = 256
+_ffn_dim_ = _dim_ * 2
-_pos_dim_ = _dim_//2
+_num_levels_ = 4
-_ffn_dim_ = _dim_*2
+bev_h_ = 200
-_num_levels_ = 4
+bev_w_ = 200
-bev_h_ = 200
+queue_length = 4  # each sequence contains `queue_length` frames.
-bev_w_ = 200
+model = dict(
-queue_length = 4 # each sequence contains `queue_length` frames.
+    type='BEVFormerOcc',
-model = dict(
+    use_grid_mask=True,
-    type='BEVFormerOcc',
+    video_test_mode=True,
-    use_grid_mask=True,
+    img_backbone=dict(
-    video_test_mode=True,
+        type='ResNet',
-    img_backbone=dict(
+        depth=101,
-        type='ResNet',
+        num_stages=4,
-        depth=101,
+        out_indices=(1, 2, 3),
-        num_stages=4,
+        frozen_stages=1,
-        out_indices=(1, 2, 3),
+        norm_cfg=dict(type='BN2d', requires_grad=False),
-        frozen_stages=1,
+        norm_eval=True,
-        norm_cfg=dict(type='BN2d', requires_grad=False),
+        style='caffe',
-        norm_eval=True,
+        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
-        style='caffe',
+        # original DCNv2 will print log when perform load_state_dict
-        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), # original DCNv2 will print log when perform load_state_dict
+        stage_with_dcn=(False, False, True, True)),
-        stage_with_dcn=(False, False, True, True)),
+    img_neck=dict(
-    img_neck=dict(
+        type='FPN',
-        type='FPN',
+        in_channels=[512, 1024, 2048],
-        in_channels=[512, 1024, 2048],
+        out_channels=_dim_,
-        out_channels=_dim_,
+        start_level=0,
-        start_level=0,
+        add_extra_convs='on_output',
-        add_extra_convs='on_output',
+        num_outs=4,
-        num_outs=4,
+        relu_before_extra_convs=True),
-        relu_before_extra_convs=True),
+    pts_bbox_head=dict(
-    pts_bbox_head=dict(
+        type='BEVFormerOccHead',
-        type='BEVFormerOccHead',
+        pc_range=point_cloud_range,
-        pc_range=point_cloud_range,
+        bev_h=bev_h_,
-        bev_h=bev_h_,
+        bev_w=bev_w_,
-        bev_w=bev_w_,
+        num_classes=18,
-        num_classes=18,
+        in_channels=_dim_,
-        in_channels=_dim_,
+        sync_cls_avg_factor=True,
-        sync_cls_avg_factor=True,
+        with_box_refine=True,
-        with_box_refine=True,
+        as_two_stage=False,
-        as_two_stage=False,
+        # loss_occ=dict(
-        # loss_occ=dict(
+        #     type='FocalLoss',
-        #     type='FocalLoss',
+        #     use_sigmoid=False,
-        #     use_sigmoid=False,
+        #     gamma=2.0,
-        #     gamma=2.0,
+        #     alpha=0.25,
-        #     alpha=0.25,
+        #     loss_weight=10.0),
-        #     loss_weight=10.0),
+        use_mask=False,
-        use_mask=False,
+        loss_occ=dict(
-        loss_occ= dict(
+            type='CrossEntropyLoss',
-            type='CrossEntropyLoss',
+            use_sigmoid=False,
-            use_sigmoid=False,
+            loss_weight=1.0),
-            loss_weight=1.0),
+        transformer=dict(
-        transformer=dict(
+            type='TransformerOcc',
-            type='TransformerOcc',
+            pillar_h=16,
-            pillar_h=16,
+            num_classes=18,
-            num_classes=18,
+            norm_cfg=dict(type='BN', ),
-            norm_cfg=dict(type='BN', ),
+            norm_cfg_3d=dict(type='BN3d', ),
-            norm_cfg_3d=dict(type='BN3d', ),
+            use_3d=True,
-            use_3d=True,
+            use_conv=False,
-            use_conv=False,
+            rotate_prev_bev=True,
-            rotate_prev_bev=True,
+            use_shift=True,
-            use_shift=True,
+            use_can_bus=True,
-            use_can_bus=True,
+            embed_dims=_dim_,
-            embed_dims=_dim_,
+            encoder=dict(
-            encoder=dict(
+                type='BEVFormerEncoder',
-                type='BEVFormerEncoder',
+                num_layers=4,
-                num_layers=4,
+                pc_range=point_cloud_range,
-                pc_range=point_cloud_range,
+                num_points_in_pillar=8,
-                num_points_in_pillar=8,
+                return_intermediate=False,
-                return_intermediate=False,
+                transformerlayers=dict(
-                transformerlayers=dict(
+                    type='BEVFormerLayer',
-                    type='BEVFormerLayer',
+                    attn_cfgs=[
-                    attn_cfgs=[
+                        dict(
-                        dict(
+                            type='TemporalSelfAttention',
-                            type='TemporalSelfAttention',
+                            embed_dims=_dim_,
-                            embed_dims=_dim_,
+                            num_levels=1),
-                            num_levels=1),
+                        dict(
-                        dict(
+                            type='SpatialCrossAttention',
-                            type='SpatialCrossAttention',
+                            pc_range=point_cloud_range,
-                            pc_range=point_cloud_range,
+                            deformable_attention=dict(
-                            deformable_attention=dict(
+                                type='MSDeformableAttention3D',
-                                type='MSDeformableAttention3D',
+                                embed_dims=_dim_,
-                                embed_dims=_dim_,
+                                num_points=8,
-                                num_points=8,
+                                num_levels=_num_levels_),
-                                num_levels=_num_levels_),
+                            embed_dims=_dim_,
-                            embed_dims=_dim_,
+                        )
-                        )
+                    ],
-                    ],
+                    feedforward_channels=_ffn_dim_,
-                    feedforward_channels=_ffn_dim_,
+                    ffn_dropout=0.1,
-                    ffn_dropout=0.1,
+                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
-                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
+                                     'ffn', 'norm'))),
-                                     'ffn', 'norm'))),
+        ),
-        ),
+        positional_encoding=dict(
-        positional_encoding=dict(
+            type='LearnedPositionalEncoding',
-            type='LearnedPositionalEncoding',
+            num_feats=_pos_dim_,
-            num_feats=_pos_dim_,
+            row_num_embed=bev_h_,
-            row_num_embed=bev_h_,
+            col_num_embed=bev_w_,
-            col_num_embed=bev_w_,
+        ),
-        ),
+        # model training and testing settings
+        train_cfg=dict(pts=dict(
-    # model training and testing settings
+            grid_size=[512, 512, 1],
-    train_cfg=dict(pts=dict(
+            voxel_size=voxel_size,
-        grid_size=[512, 512, 1],
+            point_cloud_range=point_cloud_range,
-        voxel_size=voxel_size,
+            out_size_factor=4,
-        point_cloud_range=point_cloud_range,
+            assigner=dict(
-        out_size_factor=4,
+                type='HungarianAssigner3D',
-        assigner=dict(
+                cls_cost=dict(type='FocalLossCost', weight=2.0),
-            type='HungarianAssigner3D',
+                reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
-            cls_cost=dict(type='FocalLossCost', weight=2.0),
+                iou_cost=dict(type='IoUCost', weight=0.0),
-            reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
+                # Fake cost. This is just to make it compatible with DETR head.
-            iou_cost=dict(type='IoUCost', weight=0.0), # Fake cost. This is just to make it compatible with DETR head.
+                pc_range=point_cloud_range)))))
-            pc_range=point_cloud_range)))))
+dataset_type = 'NuSceneOcc'
-dataset_type = 'NuSceneOcc'
+data_root = 'data/occ3d-nus/'
-data_root = 'data/occ3d-nus/'
+file_client_args = dict(backend='disk')
-file_client_args = dict(backend='disk')
+occ_gt_data_root = 'data/occ3d-nus'
-occ_gt_data_root='data/occ3d-nus'
+train_pipeline = [
-train_pipeline = [
+    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
-    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
+    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
-    dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root),
+    dict(type='PhotoMetricDistortionMultiViewImage'),
-    dict(type='PhotoMetricDistortionMultiViewImage'),
+    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
-    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
+    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
-    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='ObjectNameFilter', classes=class_names),
-    dict(type='ObjectNameFilter', classes=class_names),
+    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
-    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
+    dict(type='PadMultiViewImage', size_divisor=32),
-    dict(type='PadMultiViewImage', size_divisor=32),
+    dict(type='DefaultFormatBundle3D', class_names=class_names),
-    dict(type='DefaultFormatBundle3D', class_names=class_names),
+    dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
-    dict(type='CustomCollect3D', keys=[ 'img','voxel_semantics','mask_lidar','mask_camera'] )
+]
-]
+test_pipeline = [
-test_pipeline = [
+    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
-    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
+    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
-    dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root),
+    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
-    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
+    dict(type='PadMultiViewImage', size_divisor=32),
-    dict(type='PadMultiViewImage', size_divisor=32),
+    dict(
-    dict(
+        type='MultiScaleFlipAug3D',
-        type='MultiScaleFlipAug3D',
+        img_scale=(1600, 900),
-        img_scale=(1600, 900),
+        pts_scale_ratio=1,
-        pts_scale_ratio=1,
+        flip=False,
-        flip=False,
+        transforms=[
-        transforms=[
+            dict(
-            dict(
+                type='DefaultFormatBundle3D',
-                type='DefaultFormatBundle3D',
+                class_names=class_names,
-                class_names=class_names,
+                with_label=False),
-                with_label=False),
+            dict(type='CustomCollect3D', keys=['img'])
-            dict(type='CustomCollect3D', keys=['img'])
+        ])
-        ])
+]
-]
+data = dict(
-data = dict(
+    samples_per_gpu=1,
-    samples_per_gpu=1,
+    workers_per_gpu=0,
-    workers_per_gpu=0,
+    train=dict(
-    train=dict(
+        type=dataset_type,
-        type=dataset_type,
+        data_root=data_root,
-        data_root=data_root,
+        ann_file=data_root + 'occ_infos_temporal_train.pkl',
-        ann_file=data_root + 'occ_infos_temporal_train.pkl',
+        pipeline=train_pipeline,
-        pipeline=train_pipeline,
+        classes=class_names,
-        classes=class_names,
+        modality=input_modality,
-        modality=input_modality,
+        test_mode=False,
-        test_mode=False,
+        use_valid_flag=True,
-        use_valid_flag=True,
+        bev_size=(bev_h_, bev_w_),
-        bev_size=(bev_h_, bev_w_),
+        queue_length=queue_length,
-        queue_length=queue_length,
+        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
-        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
+        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
-        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
+        box_type_3d='LiDAR'),
-        box_type_3d='LiDAR'),
+    val=dict(type=dataset_type,
-    val=dict(type=dataset_type,
+             data_root=data_root,
-             data_root=data_root,
+             ann_file=data_root + 'occ_infos_temporal_val.pkl',
-             ann_file=data_root + 'occ_infos_temporal_val.pkl',
+             pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
-             pipeline=test_pipeline,  bev_size=(bev_h_, bev_w_),
+             classes=class_names, modality=input_modality, samples_per_gpu=1),
-             classes=class_names, modality=input_modality, samples_per_gpu=1),
+    test=dict(type=dataset_type,
-    test=dict(type=dataset_type,
+              data_root=data_root,
-              data_root=data_root,
+              ann_file=data_root + 'occ_infos_temporal_val.pkl',
-              ann_file=data_root + 'occ_infos_temporal_val.pkl',
+              pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
-              pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
+              classes=class_names, modality=input_modality),
-              classes=class_names, modality=input_modality),
+    shuffler_sampler=dict(type='DistributedGroupSampler'),
-    shuffler_sampler=dict(type='DistributedGroupSampler'),
+    nonshuffler_sampler=dict(type='DistributedSampler')
-    nonshuffler_sampler=dict(type='DistributedSampler')
+)
-)
+optimizer = dict(
-optimizer = dict(
+    type='AdamW',
-    type='AdamW',
+    lr=2e-4,
-    lr=2e-4,
+    paramwise_cfg=dict(
-    paramwise_cfg=dict(
+        custom_keys={
-        custom_keys={
+            'img_backbone': dict(lr_mult=0.1),
-            'img_backbone': dict(lr_mult=0.1),
+        }),
-        }),
+    weight_decay=0.01)
-    weight_decay=0.01)
+optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
-optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
+# learning policy
-# learning policy
+lr_config = dict(
-lr_config = dict(
+    policy='CosineAnnealing',
-    policy='CosineAnnealing',
+    warmup='linear',
-    warmup='linear',
+    warmup_iters=500,
-    warmup_iters=500,
+    warmup_ratio=1.0 / 3,
-    warmup_ratio=1.0 / 3,
+    min_lr_ratio=1e-3)
-    min_lr_ratio=1e-3)
+total_epochs = 24
-total_epochs = 24
+evaluation = dict(interval=1, pipeline=test_pipeline)
-evaluation = dict(interval=1, pipeline=test_pipeline)
+runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
-runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
+load_from = 'ckpts/r101_dcn_fcos3d_pretrain.pth'
-load_from = 'ckpts/r101_dcn_fcos3d_pretrain.pth'
+log_config = dict(
-log_config = dict(
+    interval=50,
-    interval=50,
+    hooks=[
-    hooks=[
+        dict(type='TextLoggerHook'),
-        dict(type='TextLoggerHook'),
+        dict(type='TensorboardLoggerHook')
-        dict(type='TensorboardLoggerHook')
+    ])
-    ])
+checkpoint_config = dict(interval=1)
-checkpoint_config = dict(interval=1)
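As a reading aid for the `lr_config` block above, the following is a minimal sketch of the schedule those values describe: cosine annealing from `lr` down to `lr * min_lr_ratio`, with a linear warmup over the first `warmup_iters` iterations. It is a standalone approximation and not the runner's actual hook. The helper name `sketch_lr` and its `progress` argument are hypothetical, and whether progress advances per epoch or per iteration depends on the runner settings, which this diff does not show.

```python
import math


def sketch_lr(progress, cur_iter, base_lr=2e-4, warmup_iters=500,
              warmup_ratio=1.0 / 3, min_lr_ratio=1e-3):
    """Approximate learning rate for cosine annealing with linear warmup.

    progress: fraction of training completed, in [0, 1] (hypothetical argument).
    cur_iter: global iteration index, used only during the warmup phase.
    """
    # Cosine annealing between base_lr and base_lr * min_lr_ratio.
    target_lr = base_lr * min_lr_ratio
    lr = target_lr + 0.5 * (base_lr - target_lr) * (1 + math.cos(math.pi * progress))

    # Linear warmup: scale the annealed value from warmup_ratio up to 1.
    if cur_iter < warmup_iters:
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        lr *= 1 - k
    return lr


if __name__ == '__main__':
    # Sample points: start of warmup, mid-warmup, mid-training, end of training.
    for progress, cur_iter in [(0.0, 0), (0.0, 250), (0.5, 10_000), (1.0, 20_000)]:
        print(progress, cur_iter, sketch_lr(progress, cur_iter))
```

With the defaults taken from the config, the first iteration starts at one third of `lr`, and the final value is `lr * 1e-3`.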
--- a/autonomous_driving/occupancy_prediction/projects/configs/bevformer/bevformer_intern-s_occ.py
+++ b/autonomous_driving/occupancy_prediction/projects/configs/bevformer/bevformer_intern-s_occ.py
 _base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
 ]
 #
 plugin = True
 plugin_dir = 'projects/mmdet3d_plugin/'
 # If point cloud range is changed, the models should also change their point
 # cloud range accordingly
 point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
 voxel_size = [0.2, 0.2, 8]
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
-img_norm_cfg = dict(
+# For nuScenes we usually do 10-class detection
-   mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+class_names = [
-# For nuScenes we usually do 10-class detection
+    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
-class_names = [
+    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
-    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
+]
-    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
-]
+input_modality = dict(
+    use_lidar=False,
-input_modality = dict(
+    use_camera=True,
-    use_lidar=False,
+    use_radar=False,
-    use_camera=True,
+    use_map=False,
-    use_radar=False,
+    use_external=True)
-    use_map=False,
-    use_external=True)
+_dim_ = 256
+_pos_dim_ = _dim_ // 2
-_dim_ = 256
+_ffn_dim_ = _dim_ * 2
-_pos_dim_ = _dim_//2
+_num_levels_ = 4
-_ffn_dim_ = _dim_*2
+bev_h_ = 200
-_num_levels_ = 4
+bev_w_ = 200
-bev_h_ = 200
+queue_length = 4  # each sequence contains `queue_length` frames.
-bev_w_ = 200
+pretrained = 'https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_3x_coco.pth'
-queue_length = 4 # each sequence contains `queue_length` frames.
+model = dict(
-pretrained = 'https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_3x_coco.pth'
+    type='BEVFormerOcc',
-model = dict(
+    use_grid_mask=True,
-    type='BEVFormerOcc',
+    video_test_mode=True,
-    use_grid_mask=True,
+    img_backbone=dict(
-    video_test_mode=True,
+        _delete_=True,
-    img_backbone=dict(
+        type='InternImage',
-        _delete_=True,
+        core_op='DCNv3',
-        type='InternImage',
+        channels=80,
-        core_op='DCNv3',
+        depths=[4, 4, 21, 4],
-        channels=80,
+        groups=[5, 10, 20, 40],
-        depths=[4, 4, 21, 4],
+        mlp_ratio=4.,
-        groups=[5, 10, 20, 40],
+        drop_path_rate=0.3,
-        mlp_ratio=4.,
+        norm_layer='LN',
-        drop_path_rate=0.3,
+        layer_scale=1.0,
-        norm_layer='LN',
+        offset_scale=1.0,
-        layer_scale=1.0,
+        post_norm=True,
-        offset_scale=1.0,
+        with_cp=True,
-        post_norm=True,
+        out_indices=(1, 2, 3),
-        with_cp=True,
+        init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
-        out_indices=(1, 2, 3),
+    img_neck=dict(
-        init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
+        type='FPN',
-    img_neck=dict(
+        in_channels=[160, 320, 640],
-        type='FPN',
+        out_channels=_dim_,
-        in_channels=[160, 320, 640],
+        start_level=0,
-        out_channels=_dim_,
+        add_extra_convs='on_output',
-        start_level=0,
+        num_outs=4,
-        add_extra_convs='on_output',
+        relu_before_extra_convs=True),
-        num_outs=4,
+    pts_bbox_head=dict(
-        relu_before_extra_convs=True),
+        type='BEVFormerOccHead',
-    pts_bbox_head=dict(
+        pc_range=point_cloud_range,
-        type='BEVFormerOccHead',
+        bev_h=bev_h_,
-        pc_range=point_cloud_range,
+        bev_w=bev_w_,
-        bev_h=bev_h_,
+        num_classes=18,
-        bev_w=bev_w_,
+        in_channels=_dim_,
-        num_classes=18,
+        sync_cls_avg_factor=True,
-        in_channels=_dim_,
+        with_box_refine=True,
-        sync_cls_avg_factor=True,
+        as_two_stage=False,
-        with_box_refine=True,
+        # loss_occ=dict(
-        as_two_stage=False,
+        #     type='FocalLoss',
-        # loss_occ=dict(
+        #     use_sigmoid=False,
-        #     type='FocalLoss',
+        #     gamma=2.0,
-        #     use_sigmoid=False,
+        #     alpha=0.25,
-        #     gamma=2.0,
+        #     loss_weight=10.0),
-        #     alpha=0.25,
+        use_mask=False,
-        #     loss_weight=10.0),
+        loss_occ=dict(
-        use_mask=False,
+            type='CrossEntropyLoss',
-        loss_occ= dict(
+            use_sigmoid=False,
-            type='CrossEntropyLoss',
+            loss_weight=1.0),
-            use_sigmoid=False,
+        transformer=dict(
-            loss_weight=1.0),
+            type='TransformerOcc',
-        transformer=dict(
+            pillar_h=16,
-            type='TransformerOcc',
+            num_classes=18,
-            pillar_h=16,
+            norm_cfg=dict(type='BN', ),
-            num_classes=18,
+            norm_cfg_3d=dict(type='BN3d', ),
-            norm_cfg=dict(type='BN', ),
+            use_3d=True,
-            norm_cfg_3d=dict(type='BN3d', ),
+            use_conv=False,
-            use_3d=True,
+            rotate_prev_bev=True,
-            use_conv=False,
+            use_shift=True,
-            rotate_prev_bev=True,
+            use_can_bus=True,
-            use_shift=True,
+            embed_dims=_dim_,
-            use_can_bus=True,
+            encoder=dict(
-            embed_dims=_dim_,
+                type='BEVFormerEncoder',
-            encoder=dict(
+                num_layers=4,
-                type='BEVFormerEncoder',
+                pc_range=point_cloud_range,
-                num_layers=4,
+                num_points_in_pillar=8,
-                pc_range=point_cloud_range,
+                return_intermediate=False,
-                num_points_in_pillar=8,
+                transformerlayers=dict(
-                return_intermediate=False,
+                    type='BEVFormerLayer',
-                transformerlayers=dict(
+                    attn_cfgs=[
-                    type='BEVFormerLayer',
+                        dict(
-                    attn_cfgs=[
+                            type='TemporalSelfAttention',
-                        dict(
+                            embed_dims=_dim_,
-                            type='TemporalSelfAttention',
+                            num_levels=1),
-                            embed_dims=_dim_,
+                        dict(
-                            num_levels=1),
+                            type='SpatialCrossAttention',
-                        dict(
+                            pc_range=point_cloud_range,
-                            type='SpatialCrossAttention',
+                            deformable_attention=dict(
-                            pc_range=point_cloud_range,
+                                type='MSDeformableAttention3D',
-                            deformable_attention=dict(
+                                embed_dims=_dim_,
-                                type='MSDeformableAttention3D',
+                                num_points=8,
-                                embed_dims=_dim_,
+                                num_levels=_num_levels_),
-                                num_points=8,
+                            embed_dims=_dim_,
-                                num_levels=_num_levels_),
+                        )
-                            embed_dims=_dim_,
+                    ],
-                        )
+                    feedforward_channels=_ffn_dim_,
-                    ],
+                    ffn_dropout=0.1,
-                    feedforward_channels=_ffn_dim_,
+                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
-                    ffn_dropout=0.1,
+                                     'ffn', 'norm'))),
-                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
+        ),
-                                     'ffn', 'norm'))),
+        positional_encoding=dict(
-        ),
+            type='LearnedPositionalEncoding',
-        positional_encoding=dict(
+            num_feats=_pos_dim_,
-            type='LearnedPositionalEncoding',
+            row_num_embed=bev_h_,
-            num_feats=_pos_dim_,
+            col_num_embed=bev_w_,
-            row_num_embed=bev_h_,
-            col_num_embed=bev_w_,
+        ),
-        ),
+        # model training and testing settings
+        train_cfg=dict(pts=dict(
+            grid_size=[512, 512, 1],
-    # model training and testing settings
+            voxel_size=voxel_size,
-    train_cfg=dict(pts=dict(
+            point_cloud_range=point_cloud_range,
-        grid_size=[512, 512, 1],
+            out_size_factor=4,
-        voxel_size=voxel_size,
+            assigner=dict(
-        point_cloud_range=point_cloud_range,
+                type='HungarianAssigner3D',
-        out_size_factor=4,
+                cls_cost=dict(type='FocalLossCost', weight=2.0),
-        assigner=dict(
+                reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
-            type='HungarianAssigner3D',
+                iou_cost=dict(type='IoUCost', weight=0.0),
-            cls_cost=dict(type='FocalLossCost', weight=2.0),
+                # Fake cost. This is just to make it compatible with DETR head.
-            reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
+                pc_range=point_cloud_range)))))
-            iou_cost=dict(type='IoUCost', weight=0.0), # Fake cost. This is just to make it compatible with DETR head.
-            pc_range=point_cloud_range)))))
+dataset_type = 'NuSceneOcc'
+data_root = 'data/occ3d-nus/'
-dataset_type = 'NuSceneOcc'
+file_client_args = dict(backend='disk')
-data_root = 'data/occ3d-nus/'
+occ_gt_data_root = 'data/occ3d-nus'
-file_client_args = dict(backend='disk')
-occ_gt_data_root='data/occ3d-nus'
+train_pipeline = [
+    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
-train_pipeline = [
+    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
-    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
+    dict(type='PhotoMetricDistortionMultiViewImage'),
-    dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root),
+    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
-    dict(type='PhotoMetricDistortionMultiViewImage'),
+    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
-    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
+    dict(type='ObjectNameFilter', classes=class_names),
-    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
+    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
-    dict(type='ObjectNameFilter', classes=class_names),
+    dict(type='PadMultiViewImage', size_divisor=32),
-    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
+    dict(type='DefaultFormatBundle3D', class_names=class_names),
-    dict(type='PadMultiViewImage', size_divisor=32),
+    dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
-    dict(type='DefaultFormatBundle3D', class_names=class_names),
+]
-    dict(type='CustomCollect3D', keys=[ 'img','voxel_semantics','mask_lidar','mask_camera'] )
-]
+test_pipeline = [
+    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
-test_pipeline = [
+    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
-    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
+    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
-    dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root),
+    dict(type='PadMultiViewImage', size_divisor=32),
-    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
+    dict(
-    dict(type='PadMultiViewImage', size_divisor=32),
+        type='MultiScaleFlipAug3D',
-    dict(
+        img_scale=(1600, 900),
-        type='MultiScaleFlipAug3D',
+        pts_scale_ratio=1,
-        img_scale=(1600, 900),
+        flip=False,
-        pts_scale_ratio=1,
+        transforms=[
-        flip=False,
+            dict(
-        transforms=[
+                type='DefaultFormatBundle3D',
-            dict(
+                class_names=class_names,
-                type='DefaultFormatBundle3D',
+                with_label=False),
-                class_names=class_names,
+            dict(type='CustomCollect3D', keys=['img'])
-                with_label=False),
+        ])
-            dict(type='CustomCollect3D', keys=['img'])
+]
-        ])
-]
+data = dict(
+    samples_per_gpu=1,
-data = dict(
+    workers_per_gpu=6,
-    samples_per_gpu=1,
+    train=dict(
-    workers_per_gpu=6,
+        type=dataset_type,
-    train=dict(
+        data_root=data_root,
-        type=dataset_type,
+        ann_file=data_root + 'occ_infos_temporal_train.pkl',
-        data_root=data_root,
+        pipeline=train_pipeline,
-        ann_file=data_root + 'occ_infos_temporal_train.pkl',
+        classes=class_names,
-        pipeline=train_pipeline,
+        modality=input_modality,
-        classes=class_names,
+        test_mode=False,
-        modality=input_modality,
+        use_valid_flag=True,
-        test_mode=False,
+        bev_size=(bev_h_, bev_w_),
-        use_valid_flag=True,
+        queue_length=queue_length,
-        bev_size=(bev_h_, bev_w_),
+        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
-        queue_length=queue_length,
+        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
-        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
+        box_type_3d='LiDAR'),
-        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
+    val=dict(type=dataset_type,
-        box_type_3d='LiDAR'),
+             data_root=data_root,
-    val=dict(type=dataset_type,
+             ann_file=data_root + 'occ_infos_temporal_val.pkl',
-             data_root=data_root,
+             pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
-             ann_file=data_root + 'occ_infos_temporal_val.pkl',
+             classes=class_names, modality=input_modality, samples_per_gpu=1),
-             pipeline=test_pipeline,  bev_size=(bev_h_, bev_w_),
+    test=dict(type=dataset_type,
-             classes=class_names, modality=input_modality, samples_per_gpu=1),
+              data_root=data_root,
-    test=dict(type=dataset_type,
-              data_root=data_root,
+              ann_file=data_root + 'occ_infos_temporal_val.pkl',
+              pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
-              ann_file=data_root + 'occ_infos_temporal_val.pkl',
+              classes=class_names, modality=input_modality),
-              pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
+    shuffler_sampler=dict(type='DistributedGroupSampler'),
-              classes=class_names, modality=input_modality),
+    nonshuffler_sampler=dict(type='DistributedSampler')
-    shuffler_sampler=dict(type='DistributedGroupSampler'),
+)
-    nonshuffler_sampler=dict(type='DistributedSampler')
+optimizer = dict(
-)
+    type='AdamW',
-optimizer = dict(
+    lr=2e-4,
-    type='AdamW',
+    weight_decay=0.05,
-    lr=2e-4,
+    constructor='CustomLayerDecayOptimizerConstructor',
-    weight_decay=0.05,
+    paramwise_cfg=dict(
-    constructor='CustomLayerDecayOptimizerConstructor',
+        num_layers=33, layer_decay_rate=1.0,
-    paramwise_cfg=dict(
+        depths=[4, 4, 21, 4]))
-        num_layers=33, layer_decay_rate=1.0,
-                       depths=[4, 4, 21,4]))
+optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
+# learning policy
-optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
+lr_config = dict(
-# learning policy
+    policy='CosineAnnealing',
-lr_config = dict(
+    warmup='linear',
-    policy='CosineAnnealing',
+    warmup_iters=500,
-    warmup='linear',
+    warmup_ratio=1.0 / 3,
-    warmup_iters=500,
+    min_lr_ratio=1e-3)
-    warmup_ratio=1.0 / 3,
+total_epochs = 24
-    min_lr_ratio=1e-3)
+evaluation = dict(interval=1, pipeline=test_pipeline)
-total_epochs = 24
+runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
-evaluation = dict(interval=1, pipeline=test_pipeline)
+log_config = dict(
-runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
+    interval=50,
-log_config = dict(
+    hooks=[
-    interval=50,
+        dict(type='TextLoggerHook'),
-    hooks=[
+        dict(type='TensorboardLoggerHook')
-        dict(type='TextLoggerHook'),
+    ])
-        dict(type='TensorboardLoggerHook')
-    ])
+checkpoint_config = dict(interval=1)
-checkpoint_config = dict(interval=1)
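Several values in `bevformer_intern-s_occ.py` depend on one another, so a small consistency check may help when editing the file. The sketch below copies the literals from the config and assumes, without confirmation from this diff, that the backbone doubles its channel width at each stage and that the BEV grid spans the full `point_cloud_range`. The helper `stage_channels` is hypothetical.

```python
def stage_channels(channels, num_stages=4):
    # Assumption: stage i of the backbone has width channels * 2**i.
    return [channels * 2 ** i for i in range(num_stages)]


# Literals copied from the config above.
channels = 80
depths = [4, 4, 21, 4]
out_indices = (1, 2, 3)
fpn_in_channels = [160, 320, 640]
num_layers = 33
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
bev_h, bev_w, pillar_h = 200, 200, 16

# The FPN input widths should match the backbone stages selected by out_indices.
selected = [stage_channels(channels)[i] for i in out_indices]
assert selected == fpn_in_channels

# num_layers in paramwise_cfg equals the total number of backbone blocks.
assert sum(depths) == num_layers

# Size of one BEV cell and one pillar slice, in the units of point_cloud_range.
cell_x = (point_cloud_range[3] - point_cloud_range[0]) / bev_w
cell_y = (point_cloud_range[4] - point_cloud_range[1]) / bev_h
cell_z = (point_cloud_range[5] - point_cloud_range[2]) / pillar_h
print(round(cell_x, 3), round(cell_y, 3), round(cell_z, 3))
```

If `channels`, `depths`, or `out_indices` change, `in_channels` of `img_neck` and `num_layers` of the optimizer would likely need to change with them. Note also that `layer_decay_rate=1.0` raised to any power is 1, so under that setting every layer would receive the same learning-rate scale.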
--- a/autonomous_driving/occupancy_prediction/projects/configs/bevformer/bevformer_small_occ.py
+++ b/autonomous_driving/occupancy_prediction/projects/configs/bevformer/bevformer_small_occ.py
 _base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
 ]
 #
 plugin = True
 plugin_dir = 'projects/mmdet3d_plugin/'
 # If point cloud range is changed, the models should also change their point
 # cloud range accordingly
 point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
 voxel_size = [0.2, 0.2, 8]
+img_norm_cfg = dict(
+    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
-img_norm_cfg = dict(
+# For nuScenes we usually do 10-class detection
-    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
+class_names = [
-# For nuScenes we usually do 10-class detection
+    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
-class_names = [
+    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
-    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
+]
-    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
-]
+input_modality = dict(
+    use_lidar=False,
-input_modality = dict(
+    use_camera=True,
-    use_lidar=False,
+    use_radar=False,
-    use_camera=True,
+    use_map=False,
-    use_radar=False,
+    use_external=True)
-    use_map=False,
-    use_external=True)
+_dim_ = 256
+_pos_dim_ = _dim_ // 2
-_dim_ = 256
+_ffn_dim_ = _dim_ * 2
-_pos_dim_ = _dim_//2
+_num_levels_ = 2
-_ffn_dim_ = _dim_*2
+bev_h_ = 200
-_num_levels_ = 2
+bev_w_ = 200
-bev_h_ = 200
+queue_length = 4  # each sequence contains `queue_length` frames.
-bev_w_ = 200
+model = dict(
-queue_length = 4 # each sequence contains `queue_length` frames.
+    type='BEVFormerOcc',
-model = dict(
+    use_grid_mask=True,
-    type='BEVFormerOcc',
+    video_test_mode=True,
-    use_grid_mask=True,
+    img_backbone=dict(
-    video_test_mode=True,
+        type='ResNet',
-    img_backbone=dict(
+        depth=50,
-        type='ResNet',
+        num_stages=4,
-        depth=50,
+        out_indices=(2, 3),
-        num_stages=4,
+        frozen_stages=1,
-        out_indices=(2, 3),
+        norm_cfg=dict(type='BN2d', requires_grad=False),
-        frozen_stages=1,
+        norm_eval=True,
-        norm_cfg=dict(type='BN2d', requires_grad=False),
+        style='caffe',
-        norm_eval=True,
+        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
-        style='caffe',
+        # original DCNv2 will print log when perform load_state_dict
-        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), # original DCNv2 will print log when perform load_state_dict
+        stage_with_dcn=(False, False, True, True)),
-        stage_with_dcn=(False, False, True, True)),
+    img_neck=dict(
-    img_neck=dict(
+        type='FPN',
-        type='FPN',
+        in_channels=[1024, 2048],
-        in_channels=[1024, 2048],
+        out_channels=_dim_,
-        out_channels=_dim_,
+        start_level=0,
-        start_level=0,
+        add_extra_convs='on_output',
-        add_extra_convs='on_output',
+        num_outs=_num_levels_,
-        num_outs=_num_levels_,
+        relu_before_extra_convs=True),
-        relu_before_extra_convs=True),
+    pts_bbox_head=dict(
-    pts_bbox_head=dict(
+        type='BEVFormerOccHead',
-        type='BEVFormerOccHead',
+        pc_range=point_cloud_range,
-        pc_range=point_cloud_range,
+        bev_h=bev_h_,
-        bev_h=bev_h_,
+        bev_w=bev_w_,
-        bev_w=bev_w_,
+        num_classes=18,
-        num_classes=18,
+        in_channels=_dim_,
-        in_channels=_dim_,
+        sync_cls_avg_factor=True,
-        sync_cls_avg_factor=True,
+        with_box_refine=True,
-        with_box_refine=True,
+        as_two_stage=False,
-        as_two_stage=False,
+        # loss_occ=dict(
-        # loss_occ=dict(
+        #     type='FocalLoss',
-        #     type='FocalLoss',
+        #     use_sigmoid=False,
-        #     use_sigmoid=False,
+        #     gamma=2.0,
-        #     gamma=2.0,
+        #     alpha=0.25,
-        #     alpha=0.25,
+        #     loss_weight=10.0),
-        #     loss_weight=10.0),
+        use_mask=False,
-        use_mask=False,
+        loss_occ=dict(
-        loss_occ= dict(
+            type='CrossEntropyLoss',
-            type='CrossEntropyLoss',
+            use_sigmoid=False,
-            use_sigmoid=False,
+            loss_weight=1.0),
-            loss_weight=1.0),
+        transformer=dict(
-        transformer=dict(
+            type='TransformerOcc',
-            type='TransformerOcc',
+            pillar_h=16,
-            pillar_h=16,
+            num_classes=18,
-            num_classes=18,
+            norm_cfg=dict(type='BN', ),
-            norm_cfg=dict(type='BN', ),
+            norm_cfg_3d=dict(type='BN3d', ),
-            norm_cfg_3d=dict(type='BN3d', ),
+            use_3d=True,
-            use_3d=True,
+            use_conv=False,
-            use_conv=False,
+            rotate_prev_bev=True,
-            rotate_prev_bev=True,
+            use_shift=True,
-            use_shift=True,
+            use_can_bus=True,
-            use_can_bus=True,
+            embed_dims=_dim_,
-            embed_dims=_dim_,
+            encoder=dict(
-            encoder=dict(
+                type='BEVFormerEncoder',
-                type='BEVFormerEncoder',
+                num_layers=1,
-                num_layers=1,
+                pc_range=point_cloud_range,
-                pc_range=point_cloud_range,
+                num_points_in_pillar=8,
-                num_points_in_pillar=8,
+                return_intermediate=False,
-                return_intermediate=False,
+                transformerlayers=dict(
-                transformerlayers=dict(
+                    type='BEVFormerLayer',
-                    type='BEVFormerLayer',
+                    attn_cfgs=[
-                    attn_cfgs=[
+                        dict(
-                        dict(
+                            type='TemporalSelfAttention',
-                            type='TemporalSelfAttention',
+                            embed_dims=_dim_,
-                            embed_dims=_dim_,
+                            num_levels=1),
-                            num_levels=1),
+                        dict(
-                        dict(
+                            type='SpatialCrossAttention',
-                            type='SpatialCrossAttention',
+                            pc_range=point_cloud_range,
-                            pc_range=point_cloud_range,
+                            deformable_attention=dict(
-                            deformable_attention=dict(
+                                type='MSDeformableAttention3D',
-                                type='MSDeformableAttention3D',
+                                embed_dims=_dim_,
-                                embed_dims=_dim_,
+                                num_points=8,
-                                num_points=8,
+                                num_levels=_num_levels_),
-                                num_levels=_num_levels_),
+                            embed_dims=_dim_,
-                            embed_dims=_dim_,
+                        )
-                        )
+                    ],
-                    ],
+                    feedforward_channels=_ffn_dim_,
-                    feedforward_channels=_ffn_dim_,
+                    ffn_dropout=0.1,
-                    ffn_dropout=0.1,
+                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
-                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
+                                     'ffn', 'norm'))),
-                                     'ffn', 'norm'))),
+        ),
-        ),
+        positional_encoding=dict(
-        positional_encoding=dict(
+            type='LearnedPositionalEncoding',
-            type='LearnedPositionalEncoding',
+            num_feats=_pos_dim_,
-            num_feats=_pos_dim_,
+            row_num_embed=bev_h_,
-            row_num_embed=bev_h_,
+            col_num_embed=bev_w_,
-            col_num_embed=bev_w_,
+        ),
-        ),
+        # model training and testing settings
-    # model training and testing settings
+        train_cfg=dict(pts=dict(
-    train_cfg=dict(pts=dict(
+            grid_size=[512, 512, 1],
-        grid_size=[512, 512, 1],
+            voxel_size=voxel_size,
-        voxel_size=voxel_size,
+            point_cloud_range=point_cloud_range,
-        point_cloud_range=point_cloud_range,
+            out_size_factor=4,
-        out_size_factor=4,
+            assigner=dict(
-        assigner=dict(
+                type='HungarianAssigner3D',
-            type='HungarianAssigner3D',
+                cls_cost=dict(type='FocalLossCost', weight=2.0),
-            cls_cost=dict(type='FocalLossCost', weight=2.0),
+                reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
-            reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
+                iou_cost=dict(type='IoUCost', weight=0.0),
-            iou_cost=dict(type='IoUCost', weight=0.0), # Fake cost. This is just to make it compatible with DETR head.
+                # Fake cost. This is just to make it compatible with DETR head.
-            pc_range=point_cloud_range)))))
+                pc_range=point_cloud_range)))))
 dataset_type = 'NuSceneOcc'
 data_root = 'data/occ3d-nus/'
 file_client_args = dict(backend='disk')
-occ_gt_data_root='data/occ3d-nus'
+occ_gt_data_root = 'data/occ3d-nus'
 train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
-    dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root),
+    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
    dict(type='RandomScaleImageMultiViewImage', scales=[0.2]),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
-    dict(type='CustomCollect3D', keys=[ 'img','voxel_semantics','mask_lidar','mask_camera'] )
+    dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
 ]
 test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
-    dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root),
+    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1600, 900),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
-        dict(type='RandomScaleImageMultiViewImage', scales=[0.8]),
+            dict(type='RandomScaleImageMultiViewImage', scales=[0.8]),
-        dict(type='PadMultiViewImage', size_divisor=32),
+            dict(type='PadMultiViewImage', size_divisor=32),
-        dict(
+            dict(
-            type='DefaultFormatBundle3D',
+                type='DefaultFormatBundle3D',
-            class_names=class_names,
+                class_names=class_names,
-            with_label=False),
+                with_label=False),
-        dict(type='CustomCollect3D', keys=['img'])
+            dict(type='CustomCollect3D', keys=['img'])
-    ])
+        ])
 ]
 data = dict(
    samples_per_gpu=1,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'occ_infos_temporal_train.pkl',
        pipeline=train_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=False,
        use_valid_flag=True,
        bev_size=(bev_h_, bev_w_),
        queue_length=queue_length,
        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
        box_type_3d='LiDAR'),
    val=dict(type=dataset_type,
             data_root=data_root,
             ann_file=data_root + 'occ_infos_temporal_val.pkl',
-             pipeline=test_pipeline,  bev_size=(bev_h_, bev_w_),
+             pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
             classes=class_names, modality=input_modality, samples_per_gpu=1),
    test=dict(type=dataset_type,
              data_root=data_root,
              ann_file=data_root + 'occ_infos_temporal_val.pkl',
              pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
              classes=class_names, modality=input_modality),
    shuffler_sampler=dict(type='DistributedGroupSampler'),
    nonshuffler_sampler=dict(type='DistributedSampler')
 )
 optimizer = dict(
    type='AdamW',
    lr=2e-4,
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }),
    weight_decay=0.01)
 optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
 # learning policy
 lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    min_lr_ratio=1e-3)
 total_epochs = 24
 evaluation = dict(interval=1, pipeline=test_pipeline)
 runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
 # load_from = 'ckpts/r101_dcn_fcos3d_pretrain.pth'
 log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
 checkpoint_config = dict(interval=1)
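Review note: the `optimizer` and `lr_config` blocks above are only reindented by this commit, but their combined effect is easy to misread. The sketch below works through the numbers under stated assumptions: it uses the textbook cosine-annealing formula and a linear warmup factor that starts at `warmup_ratio`, which may differ in detail from what the runner's hook computes (for example, whether annealing progresses per epoch or per iteration). The helper names `cosine_lr`, `warmup_scale` and `lr_at` are hypothetical and do not exist in the repository.

```python
import math

# Values copied from the config above.
BASE_LR = 2e-4
WARMUP_ITERS = 500
WARMUP_RATIO = 1.0 / 3
MIN_LR_RATIO = 1e-3
TOTAL_EPOCHS = 24
BACKBONE_LR_MULT = 0.1  # paramwise_cfg: 'img_backbone': dict(lr_mult=0.1)


def cosine_lr(progress, base_lr=BASE_LR, min_lr_ratio=MIN_LR_RATIO):
    """Cosine annealing from base_lr down to base_lr * min_lr_ratio.

    `progress` is the fraction of training completed, in [0, 1].
    """
    min_lr = base_lr * min_lr_ratio
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


def warmup_scale(cur_iter, warmup_iters=WARMUP_ITERS, warmup_ratio=WARMUP_RATIO):
    """Linear warmup factor: warmup_ratio at iteration 0, 1.0 once warmup ends."""
    if cur_iter >= warmup_iters:
        return 1.0
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return 1 - k


def lr_at(epoch, cur_iter):
    """Learning rate for the non-backbone parameters (assumed per-epoch annealing)."""
    return cosine_lr(epoch / TOTAL_EPOCHS) * warmup_scale(cur_iter)


if __name__ == '__main__':
    head_lr = lr_at(epoch=0, cur_iter=0)
    print('first-iteration lr (head):', head_lr)
    print('first-iteration lr (img_backbone):', head_lr * BACKBONE_LR_MULT)
    print('final lr (head):', cosine_lr(1.0))
```

Under these assumptions the first iteration runs at one third of `lr`, the image backbone always trains at a tenth of the head's rate, and the schedule bottoms out at `lr * min_lr_ratio`.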
--- a/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/__init__.py
+++ b/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/__init__.py
+from .bevformer import *
 from .core.bbox.assigners.hungarian_assigner_3d import HungarianAssigner3D
 from .core.bbox.coders.nms_free_coder import NMSFreeCoder
 from .core.bbox.match_costs import BBox3DL1Cost
 from .core.evaluation.eval_hooks import CustomDistEvalHook
-from .datasets.pipelines import (
+from .datasets.pipelines import (CustomCollect3D, NormalizeMultiviewImage,
-  PhotoMetricDistortionMultiViewImage, PadMultiViewImage, 
+                                 PadMultiViewImage,
-  NormalizeMultiviewImage,  CustomCollect3D)
+                                 PhotoMetricDistortionMultiViewImage)
 from .models.backbones.vovnet import VoVNet
-from .models.utils import *
 from .models.opt.adamw import AdamW2
-from .bevformer import *
+from .models.utils import *
--- a/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/__init__.py
+++ b/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/__init__.py
+from .backbones import *
 from .dense_heads import *
 from .detectors import *
-from .modules import *
+from .hooks import *
-from .runner import *
+from .modules import *
-from .hooks import *
+from .runner import *
-from .backbones import *
\ No newline at end of file
--- a/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/apis/__init__.py
+++ b/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/apis/__init__.py
-from .train import custom_train_model
 from .mmdet_train import custom_train_detector
-# from .test import custom_multi_gpu_test
+from .train import custom_train_model
\ No newline at end of file
+# from .test import custom_multi_gpu_test
--- a/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/apis/mmdet_train.py
+++ b/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/apis/mmdet_train.py
@@ -3,42 +3,39 @@
 # ---------------------------------------------
 #  Modified by Zhiqi Li
 # ---------------------------------------------
-import random
+import os.path as osp
+import time
 import warnings
-import numpy as np
 import torch
-import torch.distributed as dist
 from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
 from mmcv.runner import (HOOKS, DistSamplerSeedHook, EpochBasedRunner,
                         Fp16OptimizerHook, OptimizerHook, build_optimizer,
-                         build_runner, get_dist_info)
+                         build_runner)
 from mmcv.utils import build_from_cfg
 from mmdet.core import EvalHook
+from mmdet.datasets import replace_ImageToTensor
-from mmdet.datasets import (build_dataset,
-                            replace_ImageToTensor)
 from mmdet.utils import get_root_logger
-import time
+from projects.mmdet3d_plugin.core.evaluation.eval_hooks import \
-import os.path as osp
+    CustomDistEvalHook
-from projects.mmdet3d_plugin.datasets.builder import build_dataloader
-from projects.mmdet3d_plugin.core.evaluation.eval_hooks import CustomDistEvalHook
 from projects.mmdet3d_plugin.datasets import custom_build_dataset
+from projects.mmdet3d_plugin.datasets.builder import build_dataloader
 def custom_train_detector(model,
-                   dataset,
+                          dataset,
-                   cfg,
+                          cfg,
-                   distributed=False,
+                          distributed=False,
-                   validate=False,
+                          validate=False,
-                   timestamp=None,
+                          timestamp=None,
-                   eval_model=None,
+                          eval_model=None,
-                   meta=None):
+                          meta=None):
    logger = get_root_logger(cfg.log_level)
    # prepare data loaders
    dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset]
-    #assert len(dataset)==1s
+    # assert len(dataset)==1s
    if 'imgs_per_gpu' in cfg.data:
        logger.warning('"imgs_per_gpu" is deprecated in MMDet V2.0. '
                       'Please use "samples_per_gpu" instead')
@@ -90,7 +87,6 @@ def custom_train_detector(model,
            eval_model = MMDataParallel(
                eval_model.cuda(cfg.gpu_ids[0]), device_ids=cfg.gpu_ids)
    # build runner
    optimizer = build_optimizer(model, cfg.optimizer)
@@ -142,12 +138,12 @@ def custom_train_detector(model,
    runner.register_training_hooks(cfg.lr_config, optimizer_config,
                                   cfg.checkpoint_config, cfg.log_config,
                                   cfg.get('momentum_config', None))
    # register profiler hook
-    #trace_config = dict(type='tb_trace', dir_name='work_dir')
+    # trace_config = dict(type='tb_trace', dir_name='work_dir')
-    #profiler_config = dict(on_trace_ready=trace_config)
+    # profiler_config = dict(on_trace_ready=trace_config)
-    #runner.register_profiler_hook(profiler_config)
+    # runner.register_profiler_hook(profiler_config)
    if distributed:
        if isinstance(runner, EpochBasedRunner):
            runner.register_hook(DistSamplerSeedHook())
@@ -174,7 +170,7 @@ def custom_train_detector(model,
        )
        eval_cfg = cfg.get('evaluation', {})
        eval_cfg['by_epoch'] = cfg.runner['type'] != 'IterBasedRunner'
-        eval_cfg['jsonfile_prefix'] = osp.join('val', cfg.work_dir, time.ctime().replace(' ','_').replace(':','_'))
+        eval_cfg['jsonfile_prefix'] = osp.join('val', cfg.work_dir, time.ctime().replace(' ', '_').replace(':', '_'))
        eval_hook = CustomDistEvalHook if distributed else EvalHook
        runner.register_hook(eval_hook(val_dataloader, **eval_cfg))
@@ -197,4 +193,3 @@ def custom_train_detector(model,
    elif cfg.load_from:
        runner.load_checkpoint(cfg.load_from)
    runner.run(data_loaders, cfg.workflow)
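Review note: the reformatted `jsonfile_prefix` line builds a timestamped path from `time.ctime()`. A minimal sketch of what that expression yields is shown below; the `work_dir` values are made up for illustration, and `posixpath` is used instead of `os.path` only so the result is the same on every platform.

```python
import posixpath as osp  # stand-in for os.path on POSIX systems
import time


def make_jsonfile_prefix(work_dir, stamp=None):
    """Mirror of the expression in the diff, with an injectable timestamp."""
    if stamp is None:
        stamp = time.ctime()
    # Spaces and colons are replaced so the stamp is a safe path component.
    safe_stamp = stamp.replace(' ', '_').replace(':', '_')
    return osp.join('val', work_dir, safe_stamp)


if __name__ == '__main__':
    # Hypothetical relative work_dir: the 'val' prefix is kept.
    print(make_jsonfile_prefix('work_dirs/demo', 'Mon Jan  6 10:00:00 2025'))
    # Hypothetical absolute work_dir: join() discards the leading 'val'.
    print(make_jsonfile_prefix('/tmp/work_dirs/demo', 'Mon Jan  6 10:00:00 2025'))
```

Two details are worth knowing when looking for the output files: single-digit days produce a double underscore because `time.ctime()` pads the day with a space, and an absolute `work_dir` causes `join` to drop the leading `'val'` component.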
--- a/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/apis/test.py
+++ b/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/apis/test.py
 # ---------------------------------------------
 # Copyright (c) OpenMMLab. All rights reserved.
 # ---------------------------------------------
 #  Modified by Xiaoyu Tian
 # ---------------------------------------------
 import os.path as osp
-import pickle
+import shutil
-import shutil
+import tempfile
-import tempfile
+import time
-import time
+import mmcv
-import mmcv
+import numpy as np
-import torch
+import pycocotools.mask as mask_util
-import torch.distributed as dist
+import torch
-from mmcv.image import tensor2imgs
+import torch.distributed as dist
 from mmcv.runner import get_dist_info
-from mmdet.core import encode_mask_results
+def custom_encode_mask_results(mask_results):
+    """Encode bitmap mask to RLE code. Semantic Masks only
-import mmcv
+    Args:
-import numpy as np
+        mask_results (list | tuple[list]): bitmap mask results.
-import pycocotools.mask as mask_util
+            In mask scoring rcnn, mask_results is a tuple of (segm_results,
+            segm_cls_score).
-def custom_encode_mask_results(mask_results):
+    Returns:
-    """Encode bitmap mask to RLE code. Semantic Masks only
+        list | tuple: RLE encoded mask.
-    Args:
+    """
-        mask_results (list | tuple[list]): bitmap mask results.
+    cls_segms = mask_results
-            In mask scoring rcnn, mask_results is a tuple of (segm_results,
+    num_classes = len(cls_segms)
-            segm_cls_score).
+    encoded_mask_results = []
-    Returns:
+    for i in range(len(cls_segms)):
-        list | tuple: RLE encoded mask.
+        encoded_mask_results.append(
-    """
+            mask_util.encode(
-    cls_segms = mask_results
+                np.array(
-    num_classes = len(cls_segms)
+                    cls_segms[i][:, :, np.newaxis], order='F',
-    encoded_mask_results = []
+                    dtype='uint8'))[0])  # encoded with RLE
-    for i in range(len(cls_segms)):
+    return [encoded_mask_results]
-        encoded_mask_results.append(
-            mask_util.encode(
-                np.array(
+def custom_multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
-                    cls_segms[i][:, :, np.newaxis], order='F',
+    """Test model with multiple gpus.
-                        dtype='uint8'))[0])  # encoded with RLE
+    This method tests model with multiple gpus and collects the results
-    return [encoded_mask_results]
+    under two different modes: gpu and cpu modes. By setting 'gpu_collect=True'
+    it encodes results to gpu tensors and use gpu communication for results
-def custom_multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
+    collection. On cpu mode it saves the results on different gpus to 'tmpdir'
-    """Test model with multiple gpus.
+    and collects them by the rank 0 worker.
-    This method tests model with multiple gpus and collects the results
+    Args:
-    under two different modes: gpu and cpu modes. By setting 'gpu_collect=True'
+        model (nn.Module): Model to be tested.
-    it encodes results to gpu tensors and use gpu communication for results
+        data_loader (nn.Dataloader): Pytorch data loader.
-    collection. On cpu mode it saves the results on different gpus to 'tmpdir'
+        tmpdir (str): Path of directory to save the temporary results from
-    and collects them by the rank 0 worker.
+            different gpus under cpu mode.
-    Args:
+        gpu_collect (bool): Option to use either gpu or cpu to collect results.
-        model (nn.Module): Model to be tested.
+    Returns:
-        data_loader (nn.Dataloader): Pytorch data loader.
+        list: The prediction results.
-        tmpdir (str): Path of directory to save the temporary results from
+    """
-            different gpus under cpu mode.
+    model.eval()
-        gpu_collect (bool): Option to use either gpu or cpu to collect results.
+    bbox_results = []
-    Returns:
+    mask_results = []
-        list: The prediction results.
+    occ_results = []
-    """
+    dataset = data_loader.dataset
-    model.eval()
+    rank, world_size = get_dist_info()
-    bbox_results = []
+    if rank == 0:
-    mask_results = []
+        prog_bar = mmcv.ProgressBar(len(dataset))
-    occ_results = []
+    time.sleep(2)  # This line can prevent deadlock problem in some cases.
-    dataset = data_loader.dataset
+    have_mask = False
-    rank, world_size = get_dist_info()
+    for i, data in enumerate(data_loader):
-    if rank == 0:
+        with torch.no_grad():
-        prog_bar = mmcv.ProgressBar(len(dataset))
+            result = model(return_loss=False, rescale=True, **data)
-    time.sleep(2)  # This line can prevent deadlock problem in some cases.
+            bs = result.shape[0]
-    have_mask = False
+            assert bs == 1, \
-    for i, data in enumerate(data_loader):
+                'Evaluation only supports batch_size=1 in this version'
-        with torch.no_grad():
+            # encode mask results
-            result = model(return_loss=False, rescale=True, **data)
+            if isinstance(result, dict):
-            bs=result.shape[0]
+                if 'bbox_results' in result.keys():
-            assert bs==1, \
+                    bbox_result = result['bbox_results']
-                'Evaluation only supports batch_size=1 in this version'
+                    batch_size = len(result['bbox_results'])
-            # encode mask results
+                    bbox_results.extend(bbox_result)
-            if isinstance(result, dict):
+                if 'mask_results' in result.keys() and result['mask_results'] is not None:
-                if 'bbox_results' in result.keys():
+                    mask_result = custom_encode_mask_results(result['mask_results'])
-                    bbox_result = result['bbox_results']
+                    mask_results.extend(mask_result)
-                    batch_size = len(result['bbox_results'])
+                    have_mask = True
-                    bbox_results.extend(bbox_result)
+            else:
-                if 'mask_results' in result.keys() and result['mask_results'] is not None:
+                batch_size = 1
-                    mask_result = custom_encode_mask_results(result['mask_results'])
+                occ_results.extend([result.squeeze(dim=0).cpu().numpy().astype(np.uint8)])
-                    mask_results.extend(mask_result)
+                # batch_size = len(result)
-                    have_mask = True
+                # bbox_results.extend(result)
-            else:
-                batch_size = 1
+            # if isinstance(result[0], tuple):
-                occ_results.extend([result.squeeze(dim=0).cpu().numpy().astype(np.uint8)])
+            #    assert False, 'this code is for instance segmentation, which our code will not utilize.'
-                # batch_size = len(result)
+            #    result = [(bbox_results, encode_mask_results(mask_results))
-                # bbox_results.extend(result)
+            #              for bbox_results, mask_results in result]
+        if rank == 0:
-            #if isinstance(result[0], tuple):
-            #    assert False, 'this code is for instance segmentation, which our code will not utilize.'
+            for _ in range(batch_size * world_size):
-            #    result = [(bbox_results, encode_mask_results(mask_results))
+                prog_bar.update()
-            #              for bbox_results, mask_results in result]
-        if rank == 0:
+    # collect results from all ranks
+    if gpu_collect:
-            for _ in range(batch_size * world_size):
+        bbox_results = collect_results_gpu(bbox_results, len(dataset))
-                prog_bar.update()
+        if have_mask:
+            mask_results = collect_results_gpu(mask_results, len(dataset))
-    # collect results from all ranks
+        else:
-    if gpu_collect:
+            mask_results = None
-        bbox_results = collect_results_gpu(bbox_results, len(dataset))
+    else:
-        if have_mask:
+        # bbox_results = collect_results_cpu(bbox_results, len(dataset), tmpdir)
-            mask_results = collect_results_gpu(mask_results, len(dataset))
+        # tmpdir = tmpdir+'_mask' if tmpdir is not None else None
-        else:
+        # if have_mask:
-            mask_results = None
+        #     mask_results = collect_results_cpu(mask_results, len(dataset), tmpdir)
-    else:
+        # else:
-        # bbox_results = collect_results_cpu(bbox_results, len(dataset), tmpdir)
+        #     mask_results = None
-        # tmpdir = tmpdir+'_mask' if tmpdir is not None else None
+        tmpdir = tmpdir + '_occ' if tmpdir is not None else None
-        # if have_mask:
+        occ_results = collect_results_cpu(occ_results, len(dataset), tmpdir)
-        #     mask_results = collect_results_cpu(mask_results, len(dataset), tmpdir)
-        # else:
+    return occ_results
-        #     mask_results = None
-        tmpdir = tmpdir + '_occ' if tmpdir is not None else None
-        occ_results = collect_results_cpu(occ_results, len(dataset), tmpdir)
+def collect_results_cpu(result_part, size, tmpdir=None):
+    rank, world_size = get_dist_info()
-    return occ_results
+    # create a tmp dir if it is not specified
+    if tmpdir is None:
+        MAX_LEN = 512
-def collect_results_cpu(result_part, size, tmpdir=None):
+        # 32 is whitespace
-    rank, world_size = get_dist_info()
+        dir_tensor = torch.full((MAX_LEN,),
-    # create a tmp dir if it is not specified
+                                32,
-    if tmpdir is None:
+                                dtype=torch.uint8,
-        MAX_LEN = 512
+                                device='cuda')
-        # 32 is whitespace
+        if rank == 0:
-        dir_tensor = torch.full((MAX_LEN, ),
+            mmcv.mkdir_or_exist('.dist_test')
-                                32,
+            tmpdir = tempfile.mkdtemp(dir='.dist_test')
-                                dtype=torch.uint8,
+            tmpdir = torch.tensor(
-                                device='cuda')
+                bytearray(tmpdir.encode()), dtype=torch.uint8, device='cuda')
-        if rank == 0:
+            dir_tensor[:len(tmpdir)] = tmpdir
-            mmcv.mkdir_or_exist('.dist_test')
+        dist.broadcast(dir_tensor, 0)
-            tmpdir = tempfile.mkdtemp(dir='.dist_test')
+        tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip()
-            tmpdir = torch.tensor(
+    else:
-                bytearray(tmpdir.encode()), dtype=torch.uint8, device='cuda')
+        mmcv.mkdir_or_exist(tmpdir)
-            dir_tensor[:len(tmpdir)] = tmpdir
+    # dump the part result to the dir
-        dist.broadcast(dir_tensor, 0)
+    mmcv.dump(result_part, osp.join(tmpdir, f'part_{rank}.pkl'))
-        tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip()
+    dist.barrier()
-    else:
+    # collect all parts
-        mmcv.mkdir_or_exist(tmpdir)
+    if rank != 0:
-    # dump the part result to the dir
+        return None
-    mmcv.dump(result_part, osp.join(tmpdir, f'part_{rank}.pkl'))
+    else:
-    dist.barrier()
+        # load results of all parts from tmp dir
-    # collect all parts
+        part_list = []
-    if rank != 0:
+        for i in range(world_size):
-        return None
+            part_file = osp.join(tmpdir, f'part_{i}.pkl')
-    else:
+            part_list.append(mmcv.load(part_file))
-        # load results of all parts from tmp dir
+        # sort the results
-        part_list = []
+        ordered_results = []
-        for i in range(world_size):
+        '''
-            part_file = osp.join(tmpdir, f'part_{i}.pkl')
+        because we change the sample of the evaluation stage to make sure that each gpu will handle continuous sample,
-            part_list.append(mmcv.load(part_file))
+        '''
-        # sort the results
+        # for res in zip(*part_list):
-        ordered_results = []
+        for res in part_list:
-        '''
+            ordered_results.extend(list(res))
-        because we change the sample of the evaluation stage to make sure that each gpu will handle continuous sample,
+        # the dataloader may pad some samples
-        '''
+        ordered_results = ordered_results[:size]
-        #for res in zip(*part_list):
+        # remove tmp dir
-        for res in part_list:  
+        shutil.rmtree(tmpdir)
-            ordered_results.extend(list(res))
+        return ordered_results
-        # the dataloader may pad some samples
-        ordered_results = ordered_results[:size]
-        # remove tmp dir
+def single_gpu_test(model,
-        shutil.rmtree(tmpdir)
+                    data_loader,
-        return ordered_results
+                    show=False,
+                    out_dir=None,
-def single_gpu_test(model,
+                    show_score_thr=0.3):
-                    data_loader,
+    """Test model with single gpu.
-                    show=False,
-                    out_dir=None,
+    This method tests model with single gpu and gives the 'show' option.
-                    show_score_thr=0.3):
+    By setting ``show=True``, it saves the visualization results under
-    """Test model with single gpu.
+    ``out_dir``.
-    This method tests model with single gpu and gives the 'show' option.
+    Args:
-    By setting ``show=True``, it saves the visualization results under
+        model (nn.Module): Model to be tested.
-    ``out_dir``.
+        data_loader (nn.Dataloader): Pytorch data loader.
+        show (bool): Whether to save visualization results.
-    Args:
+            Default: True.
-        model (nn.Module): Model to be tested.
+        out_dir (str): The path to save visualization results.
-        data_loader (nn.Dataloader): Pytorch data loader.
+            Default: None.
-        show (bool): Whether to save visualization results.
-            Default: True.
+    Returns:
-        out_dir (str): The path to save visualization results.
+        list[dict]: The prediction results.
-            Default: None.
+    """
+    model.eval()
-    Returns:
+    results = []
-        list[dict]: The prediction results.
+    dataset = data_loader.dataset
-    """
+    prog_bar = mmcv.ProgressBar(len(dataset))
-    model.eval()
+    for i, data in enumerate(data_loader):
-    results = []
+        with torch.no_grad():
-    dataset = data_loader.dataset
+            result = model(return_loss=False, rescale=True, **data)
-    prog_bar = mmcv.ProgressBar(len(dataset))
-    for i, data in enumerate(data_loader):
+        results.extend([result.squeeze(dim=0).cpu().numpy().astype(np.uint8)])
-        with torch.no_grad():
-            result = model(return_loss=False, rescale=True, **data)
+        batch_size = len(result)
+        for _ in range(batch_size):
-        results.extend([result.squeeze(dim=0).cpu().numpy().astype(np.uint8)])
+            prog_bar.update()
+    return results
-        batch_size = len(result)
-        for _ in range(batch_size):
-            prog_bar.update()
+def collect_results_gpu(result_part, size):
-    return results
+    collect_results_cpu(result_part, size)
-def collect_results_gpu(result_part, size):
-    collect_results_cpu(result_part, size)
\ No newline at end of file
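Review note: `collect_results_cpu` in this file replaces the usual `zip(*part_list)` merge with a plain concatenation, and its inline comment says this is because each GPU handles a continuous block of samples during evaluation. The sketch below illustrates why the merge has to match the sharding. The two sharding helpers are hypothetical stand-ins, not the repository's samplers, and padding by wrapping around to the first samples is an assumption.

```python
def shard_contiguous(indices, world_size):
    """Hypothetical sampler: rank r gets the r-th contiguous block (padded by wrap-around)."""
    per_rank = -(-len(indices) // world_size)  # ceiling division
    padded = indices + indices[:per_rank * world_size - len(indices)]
    return [padded[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]


def shard_interleaved(indices, world_size):
    """Hypothetical sampler: rank r gets indices r, r + W, r + 2W, ... (padded by wrap-around)."""
    per_rank = -(-len(indices) // world_size)
    padded = indices + indices[:per_rank * world_size - len(indices)]
    return [padded[r::world_size] for r in range(world_size)]


def merge_contiguous(part_list, size):
    """Merge used in the diff: concatenate the per-rank parts, then drop the padding."""
    ordered = []
    for res in part_list:
        ordered.extend(list(res))
    return ordered[:size]


def merge_interleaved(part_list, size):
    """The commented-out alternative: interleave the parts with zip, then drop the padding."""
    ordered = []
    for res in zip(*part_list):
        ordered.extend(list(res))
    return ordered[:size]


if __name__ == '__main__':
    samples = list(range(7))
    parts = shard_contiguous(samples, world_size=3)
    print(merge_contiguous(parts, len(samples)))   # dataset order is restored
    print(merge_interleaved(parts, len(samples)))  # order is scrambled
```

Pairing contiguous shards with the concatenating merge restores dataset order, while mixing the two schemes silently misaligns predictions with ground truth. Separately, `collect_results_gpu` as shown in the diff calls `collect_results_cpu` without returning its value, so callers of the GPU path receive `None`.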
--- a/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/apis/train.py
+++ b/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/apis/train.py
@@ -4,18 +4,20 @@
 #  Modified by Zhiqi Li
 # ---------------------------------------------
-from .mmdet_train import custom_train_detector
-from mmseg.apis import train_segmentor
 from mmdet.apis import train_detector
+from mmseg.apis import train_segmentor
+from .mmdet_train import custom_train_detector
 def custom_train_model(model,
-                dataset,
+                       dataset,
-                cfg,
+                       cfg,
-                distributed=False,
+                       distributed=False,
-                validate=False,
+                       validate=False,
-                timestamp=None,
+                       timestamp=None,
-                eval_model=None,
+                       eval_model=None,
-                meta=None):
+                       meta=None):
    """A function wrapper for launching model training according to cfg.
    Because we need different eval_hook in runner. Should be deprecated in the

--- a/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/backbones/__init__.py
+++ b/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/backbones/__init__.py
-from .internimage import  InternImage
+from .custom_layer_decay_optimizer_constructor import \
-from .custom_layer_decay_optimizer_constructor import CustomLayerDecayOptimizerConstructor
+    CustomLayerDecayOptimizerConstructor
\ No newline at end of file
+from .internimage import InternImage
--- a/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/backbones/custom_layer_decay_optimizer_constructor.py
+++ b/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/backbones/custom_layer_decay_optimizer_constructor.py
@@ -10,18 +10,18 @@ https://github.com/microsoft/unilm/blob/master/beit/semantic_segmentation/mmcv_c
 import json
-from mmcv.runner import OPTIMIZER_BUILDERS, DefaultOptimizerConstructor
+from mmcv.runner import (OPTIMIZER_BUILDERS, DefaultOptimizerConstructor,
-from mmcv.runner import get_dist_info
+                         get_dist_info)
 from mmdet.utils import get_root_logger
 def get_num_layer_for_swin(var_name, num_max_layer, depths):
-    if var_name.startswith("img_backbone.patch_embed"):
+    if var_name.startswith('img_backbone.patch_embed'):
        return 0
-    elif "level_embeds" in var_name:
+    elif 'level_embeds' in var_name:
        return 0
-    elif var_name.startswith("img_backbone.layers") or var_name.startswith(
+    elif var_name.startswith('img_backbone.layers') or var_name.startswith(
-            "img_backbone.levels"):
+            'img_backbone.levels'):
        if var_name.split('.')[3] not in ['downsample', 'norm']:
            stage_id = int(var_name.split('.')[2])
            layer_id = int(var_name.split('.')[4])
@@ -74,64 +74,64 @@ class CustomLayerDecayOptimizerConstructor(DefaultOptimizerConstructor):
        depths = self.paramwise_cfg.get('depths')
        offset_lr_scale = self.paramwise_cfg.get('offset_lr_scale', 1.0)
-        logger.info("Build CustomLayerDecayOptimizerConstructor %f - %d" %
+        logger.info('Build CustomLayerDecayOptimizerConstructor %f - %d' %
                    (layer_decay_rate, num_layers))
        weight_decay = self.base_wd
        for name, param in module.named_parameters():
            if not param.requires_grad:
                continue  # frozen weights
-            if len(param.shape) == 1 or name.endswith(".bias") or \
+            if len(param.shape) == 1 or name.endswith('.bias') or \
-                    "relative_position" in name or \
+                    'relative_position' in name or \
-                    "norm" in name or\
+                    'norm' in name or \
-                    "sampling_offsets" in name:
+                    'sampling_offsets' in name:
-                group_name = "no_decay"
+                group_name = 'no_decay'
                this_weight_decay = 0.
            else:
-                group_name = "decay"
+                group_name = 'decay'
                this_weight_decay = weight_decay
            layer_id = get_num_layer_for_swin(name, num_layers, depths)
            if layer_id == num_layers - 1 and dino_head and \
-                    ("sampling_offsets" in name or "reference_points" in name):
+                    ('sampling_offsets' in name or 'reference_points' in name):
-                group_name = "layer_%d_%s_0.1x" % (layer_id, group_name)
+                group_name = 'layer_%d_%s_0.1x' % (layer_id, group_name)
-            elif "sampling_offsets" in name or "reference_points" in name:
+            elif 'sampling_offsets' in name or 'reference_points' in name:
-                group_name = "layer_%d_%s_offset_lr_scale" % (layer_id,
+                group_name = 'layer_%d_%s_offset_lr_scale' % (layer_id,
                                                              group_name)
            else:
-                group_name = "layer_%d_%s" % (layer_id, group_name)
+                group_name = 'layer_%d_%s' % (layer_id, group_name)
            if group_name not in parameter_groups:
                scale = layer_decay_rate ** (num_layers - layer_id - 1)
                if scale < 1 and backbone_small_lr == True:
                    scale = scale * 0.1
-                if "0.1x" in group_name:
+                if '0.1x' in group_name:
                    scale = scale * 0.1
-                if "offset_lr_scale" in group_name:
+                if 'offset_lr_scale' in group_name:
                    scale = scale * offset_lr_scale
                parameter_groups[group_name] = {
-                    "weight_decay": this_weight_decay,
+                    'weight_decay': this_weight_decay,
-                    "params": [],
+                    'params': [],
-                    "param_names": [],
+                    'param_names': [],
-                    "lr_scale": scale,
+                    'lr_scale': scale,
-                    "group_name": group_name,
+                    'group_name': group_name,
-                    "lr": scale * self.base_lr,
+                    'lr': scale * self.base_lr,
                }
-            parameter_groups[group_name]["params"].append(param)
+            parameter_groups[group_name]['params'].append(param)
-            parameter_groups[group_name]["param_names"].append(name)
+            parameter_groups[group_name]['param_names'].append(name)
        rank, _ = get_dist_info()
        if rank == 0:
            to_display = {}
            for key in parameter_groups:
                to_display[key] = {
-                    "param_names": parameter_groups[key]["param_names"],
+                    'param_names': parameter_groups[key]['param_names'],
-                    "lr_scale": parameter_groups[key]["lr_scale"],
+                    'lr_scale': parameter_groups[key]['lr_scale'],
-                    "lr": parameter_groups[key]["lr"],
+                    'lr': parameter_groups[key]['lr'],
-                    "weight_decay": parameter_groups[key]["weight_decay"],
+                    'weight_decay': parameter_groups[key]['weight_decay'],
                }
-            logger.info("Param groups = %s" % json.dumps(to_display, indent=2))
+            logger.info('Param groups = %s' % json.dumps(to_display, indent=2))
        # state_dict = module.state_dict()
        # for group_name in parameter_groups:
@@ -139,4 +139,4 @@ class CustomLayerDecayOptimizerConstructor(DefaultOptimizerConstructor):
        #     for name in group["param_names"]:
        #         group["params"].append(state_dict[name])
        params.extend(parameter_groups.values())
\ No newline at end of file
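Review note on the constructor above: this commit only changes quoting and import wrapping, so the learning-rate logic is untouched. For readers unfamiliar with it, the per-group multiplier is `layer_decay_rate ** (num_layers - layer_id - 1)`, optionally scaled down further. The following is a minimal sketch of that arithmetic, not the repository's code. The helper name `lr_scale`, the `is_offset` flag, and the example values are assumptions for illustration, and the `0.1x` branch used with `dino_head` is left out.

```python
def lr_scale(layer_id, num_layers, layer_decay_rate,
             backbone_small_lr=False, is_offset=False, offset_lr_scale=1.0):
    """Hypothetical helper mirroring the scale computation in the diff above."""
    # Deeper layers (larger layer_id) get a scale closer to 1.
    scale = layer_decay_rate ** (num_layers - layer_id - 1)
    # Optionally shrink every group whose scale is already below 1.
    if scale < 1 and backbone_small_lr:
        scale = scale * 0.1
    # Groups holding sampling offsets / reference points get an extra factor.
    if is_offset:
        scale = scale * offset_lr_scale
    return scale


# Illustrative values only: 4 layer ids, decay rate 0.5.
scales = [lr_scale(i, 4, 0.5) for i in range(4)]
print(scales)  # [0.125, 0.25, 0.5, 1.0]
```

The last layer id always keeps the base learning rate, because the exponent is zero there.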
--- a/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/backbones/internimage.py
+++ b/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/backbones/internimage.py
@@ -4,16 +4,17 @@
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------
+from collections import OrderedDict
 import torch
 import torch.nn as nn
-from collections import OrderedDict
+import torch.nn.functional as F
 import torch.utils.checkpoint as checkpoint
-from timm.models.layers import trunc_normal_, DropPath
-from mmcv.runner import _load_checkpoint
 from mmcv.cnn import constant_init, trunc_normal_init
-from mmdet.utils import get_root_logger
+from mmcv.runner import _load_checkpoint
 from mmdet.models.builder import BACKBONES
-import torch.nn.functional as F
+from mmdet.utils import get_root_logger
+from timm.models.layers import DropPath, trunc_normal_
 from .ops_dcnv3 import modules as opsm
@@ -86,7 +87,7 @@ class CrossAttention(nn.Module):
        attn_head_dim (int, optional): Dimension of attention head.
        out_dim (int, optional): Dimension of output.
    """
    def __init__(self,
                 dim,
                 num_heads=8,
@@ -178,7 +179,7 @@ class AttentiveBlock(nn.Module):
        attn_head_dim (int, optional): Dimension of attention head. Default: None.
        out_dim (int, optional): Dimension of output. Default: None.
    """
    def __init__(self,
                 dim,
                 num_heads,
@@ -187,7 +188,7 @@ class AttentiveBlock(nn.Module):
                 drop=0.,
                 attn_drop=0.,
                 drop_path=0.,
-                 norm_layer="LN",
+                 norm_layer='LN',
                 attn_head_dim=None,
                 out_dim=None):
        super().__init__()
@@ -363,9 +364,9 @@ class InternImageLayer(nn.Module):
                 layer_scale=None,
                 offset_scale=1.0,
                 with_cp=False,
-                 dw_kernel_size=None, # for InternImage-H/G
+                 dw_kernel_size=None,  # for InternImage-H/G
-                 res_post_norm=False, # for InternImage-H/G
+                 res_post_norm=False,  # for InternImage-H/G
-                 center_feature_scale=False): # for InternImage-H/G
+                 center_feature_scale=False):  # for InternImage-H/G
        super().__init__()
        self.channels = channels
        self.groups = groups
@@ -384,8 +385,8 @@ class InternImageLayer(nn.Module):
            offset_scale=offset_scale,
            act_layer=act_layer,
            norm_layer=norm_layer,
-            dw_kernel_size=dw_kernel_size, # for InternImage-H/G
+            dw_kernel_size=dw_kernel_size,  # for InternImage-H/G
-            center_feature_scale=center_feature_scale) # for InternImage-H/G
+            center_feature_scale=center_feature_scale)  # for InternImage-H/G
        self.drop_path = DropPath(drop_path) if drop_path > 0. \
            else nn.Identity()
        self.norm2 = build_norm_layer(channels, 'LN')
@@ -411,7 +412,7 @@ class InternImageLayer(nn.Module):
                if self.post_norm:
                    x = x + self.drop_path(self.norm1(self.dcn(x)))
                    x = x + self.drop_path(self.norm2(self.mlp(x)))
-                elif self.res_post_norm: # for InternImage-H/G
+                elif self.res_post_norm:  # for InternImage-H/G
                    x = x + self.drop_path(self.res_post_norm1(self.dcn(self.norm1(x))))
                    x = x + self.drop_path(self.res_post_norm2(self.mlp(self.norm2(x))))
                else:
@@ -466,10 +467,10 @@ class InternImageBlock(nn.Module):
                 offset_scale=1.0,
                 layer_scale=None,
                 with_cp=False,
-                 dw_kernel_size=None, # for InternImage-H/G
+                 dw_kernel_size=None,  # for InternImage-H/G
-                 post_norm_block_ids=None, # for InternImage-H/G
+                 post_norm_block_ids=None,  # for InternImage-H/G
-                 res_post_norm=False, # for InternImage-H/G
+                 res_post_norm=False,  # for InternImage-H/G
-                 center_feature_scale=False): # for InternImage-H/G
+                 center_feature_scale=False):  # for InternImage-H/G
        super().__init__()
        self.channels = channels
        self.depth = depth
@@ -491,15 +492,15 @@ class InternImageBlock(nn.Module):
                layer_scale=layer_scale,
                offset_scale=offset_scale,
                with_cp=with_cp,
-                dw_kernel_size=dw_kernel_size, # for InternImage-H/G
+                dw_kernel_size=dw_kernel_size,  # for InternImage-H/G
-                res_post_norm=res_post_norm, # for InternImage-H/G
+                res_post_norm=res_post_norm,  # for InternImage-H/G
-                center_feature_scale=center_feature_scale # for InternImage-H/G
+                center_feature_scale=center_feature_scale  # for InternImage-H/G
            ) for i in range(depth)
        ])
        if not self.post_norm or center_feature_scale:
            self.norm = build_norm_layer(channels, 'LN')
        self.post_norm_block_ids = post_norm_block_ids
-        if post_norm_block_ids is not None: # for InternImage-H/G
+        if post_norm_block_ids is not None:  # for InternImage-H/G
            self.post_norms = nn.ModuleList(
                [build_norm_layer(channels, 'LN', eps=1e-6) for _ in post_norm_block_ids]
            )
@@ -511,7 +512,7 @@ class InternImageBlock(nn.Module):
            x = blk(x)
            if (self.post_norm_block_ids is not None) and (i in self.post_norm_block_ids):
                index = self.post_norm_block_ids.index(i)
-                x = self.post_norms[index](x) # for InternImage-H/G
+                x = self.post_norms[index](x)  # for InternImage-H/G
        if not self.post_norm or self.center_feature_scale:
            x = self.norm(x)
        if return_wo_downsample:
@@ -577,7 +578,7 @@ class InternImage(nn.Module):
        self.num_levels = len(depths)
        self.depths = depths
        self.channels = channels
-        self.num_features = int(channels * 2**(self.num_levels - 1))
+        self.num_features = int(channels * 2 ** (self.num_levels - 1))
        self.post_norm = post_norm
        self.mlp_ratio = mlp_ratio
        self.init_cfg = init_cfg
@@ -588,9 +589,9 @@ class InternImage(nn.Module):
        logger.info(f'using activation layer: {act_layer}')
        logger.info(f'using main norm layer: {norm_layer}')
        logger.info(f'using dpr: {drop_path_type}, {drop_path_rate}')
-        logger.info(f"level2_post_norm: {level2_post_norm}")
+        logger.info(f'level2_post_norm: {level2_post_norm}')
-        logger.info(f"level2_post_norm_block_ids: {level2_post_norm_block_ids}")
+        logger.info(f'level2_post_norm_block_ids: {level2_post_norm_block_ids}')
-        logger.info(f"res_post_norm: {res_post_norm}")
+        logger.info(f'res_post_norm: {res_post_norm}')
        in_chans = 3
        self.patch_embed = StemLayer(in_chans=in_chans,
@@ -609,10 +610,10 @@ class InternImage(nn.Module):
        self.levels = nn.ModuleList()
        for i in range(self.num_levels):
            post_norm_block_ids = level2_post_norm_block_ids if level2_post_norm and (
-                i == 2) else None # for InternImage-H/G
+                    i == 2) else None  # for InternImage-H/G
            level = InternImageBlock(
                core_op=getattr(opsm, core_op),
-                channels=int(channels * 2**i),
+                channels=int(channels * 2 ** i),
                depth=depths[i],
                groups=groups[i],
                mlp_ratio=self.mlp_ratio,
@@ -626,9 +627,9 @@ class InternImage(nn.Module):
                offset_scale=offset_scale,
                with_cp=with_cp,
                dw_kernel_size=dw_kernel_size,  # for InternImage-H/G
-                post_norm_block_ids=post_norm_block_ids, # for InternImage-H/G
+                post_norm_block_ids=post_norm_block_ids,  # for InternImage-H/G
-                res_post_norm=res_post_norm, # for InternImage-H/G
+                res_post_norm=res_post_norm,  # for InternImage-H/G
-                center_feature_scale=center_feature_scale # for InternImage-H/G
+                center_feature_scale=center_feature_scale  # for InternImage-H/G
            )
            self.levels.append(level)
@@ -699,4 +700,4 @@ class InternImage(nn.Module):
            x, x_ = level(x, return_wo_downsample=True)
            if level_idx in self.out_indices:
                seq_out.append(x_.permute(0, 3, 1, 2).contiguous())
        return seq_out
\ No newline at end of file
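Review note on `internimage.py`: the `2 ** i` spacing changes above do not alter behavior. Each level doubles the channel width, and `num_features` equals the width of the last level. This can be checked with a small sketch; the helper name `level_channels` and the sample widths are assumptions, not part of the repository.

```python
def level_channels(channels, num_levels):
    """Hypothetical helper: channel width per level, as in `channels * 2 ** i`."""
    return [int(channels * 2 ** i) for i in range(num_levels)]


# Illustrative values only: base width 64 with 4 levels.
widths = level_channels(64, 4)
num_features = int(64 * 2 ** (4 - 1))
print(widths, num_features)  # [64, 128, 256, 512] 512
```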
--- a/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/backbones/ops_dcnv3/functions/dcnv3_func.py
+++ b/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/backbones/ops_dcnv3/functions/dcnv3_func.py
@@ -4,16 +4,14 @@
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------
-from __future__ import absolute_import
+from __future__ import absolute_import, division, print_function
-from __future__ import print_function
-from __future__ import division
+import DCNv3
 import torch
 import torch.nn.functional as F
 from torch.autograd import Function
 from torch.autograd.function import once_differentiable
 from torch.cuda.amp import custom_bwd, custom_fwd
-import DCNv3
 class DCNv3Function(Function):
@@ -58,7 +56,7 @@ class DCNv3Function(Function):
                ctx.group_channels, ctx.offset_scale, grad_output.contiguous(), ctx.im2col_step)
        return grad_input, grad_offset, grad_mask, \
-            None, None, None, None, None, None, None, None, None, None, None, None
+               None, None, None, None, None, None, None, None, None, None, None, None
    @staticmethod
    def symbolic(g, input, offset, mask, kernel_h, kernel_w, stride_h,
@@ -88,7 +86,9 @@ class DCNv3Function(Function):
            im2col_step_i=int(im2col_step),
        )
-def _get_reference_points(spatial_shapes, device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h=0, pad_w=0, stride_h=1, stride_w=1):
+def _get_reference_points(spatial_shapes, device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h=0, pad_w=0,
+                          stride_h=1, stride_w=1):
    _, H_, W_, _ = spatial_shapes
    H_out = (H_ - (dilation_h * (kernel_h - 1) + 1)) // stride_h + 1
    W_out = (W_ - (dilation_w * (kernel_w - 1) + 1)) // stride_w + 1
@@ -137,7 +137,7 @@ def _generate_dilation_grids(spatial_shapes, kernel_h, kernel_w, dilation_h, dil
            device=device))
    points_list.extend([x / W_, y / H_])
-    grid = torch.stack(points_list, -1).reshape(-1, 1, 2).\
+    grid = torch.stack(points_list, -1).reshape(-1, 1, 2). \
        repeat(1, group, 1).permute(1, 0, 2)
    grid = grid.reshape(1, 1, 1, group * kernel_h * kernel_w, 2)
@@ -161,28 +161,28 @@ def dcnv3_core_pytorch(
        input.shape, input.device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h, pad_w, stride_h, stride_w)
    grid = _generate_dilation_grids(
        input.shape, kernel_h, kernel_w, dilation_h, dilation_w, group, input.device)
-    spatial_norm = torch.tensor([W_in, H_in]).reshape(1, 1, 1, 2).\
+    spatial_norm = torch.tensor([W_in, H_in]).reshape(1, 1, 1, 2). \
-        repeat(1, 1, 1, group*kernel_h*kernel_w).to(input.device)
+        repeat(1, 1, 1, group * kernel_h * kernel_w).to(input.device)
    sampling_locations = (ref + grid * offset_scale).repeat(N_, 1, 1, 1, 1).flatten(3, 4) + \
-        offset * offset_scale / spatial_norm
+                         offset * offset_scale / spatial_norm
    P_ = kernel_h * kernel_w
    sampling_grids = 2 * sampling_locations - 1
    # N_, H_in, W_in, group*group_channels -> N_, H_in*W_in, group*group_channels -> N_, group*group_channels, H_in*W_in -> N_*group, group_channels, H_in, W_in
-    input_ = input.view(N_, H_in*W_in, group*group_channels).transpose(1, 2).\
+    input_ = input.view(N_, H_in * W_in, group * group_channels).transpose(1, 2). \
-        reshape(N_*group, group_channels, H_in, W_in)
+        reshape(N_ * group, group_channels, H_in, W_in)
    # N_, H_out, W_out, group*P_*2 -> N_, H_out*W_out, group, P_, 2 -> N_, group, H_out*W_out, P_, 2 -> N_*group, H_out*W_out, P_, 2
-    sampling_grid_ = sampling_grids.view(N_, H_out*W_out, group, P_, 2).transpose(1, 2).\
+    sampling_grid_ = sampling_grids.view(N_, H_out * W_out, group, P_, 2).transpose(1, 2). \
        flatten(0, 1)
    # N_*group, group_channels, H_out*W_out, P_
    sampling_input_ = F.grid_sample(
        input_, sampling_grid_, mode='bilinear', padding_mode='zeros', align_corners=False)
    # (N_, H_out, W_out, group*P_) -> N_, H_out*W_out, group, P_ -> (N_, group, H_out*W_out, P_) -> (N_*group, 1, H_out*W_out, P_)
-    mask = mask.view(N_, H_out*W_out, group, P_).transpose(1, 2).\
+    mask = mask.view(N_, H_out * W_out, group, P_).transpose(1, 2). \
-        reshape(N_*group, 1, H_out*W_out, P_)
+        reshape(N_ * group, 1, H_out * W_out, P_)
    output = (sampling_input_ * mask).sum(-1).view(N_,
-                                                   group*group_channels, H_out*W_out)
+                                                   group * group_channels, H_out * W_out)
    return output.transpose(1, 2).reshape(N_, H_out, W_out, -1).contiguous()
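Review note on `dcnv3_func.py`: the hunks above only re-space operators and re-indent continuation lines. Two pieces of arithmetic in the unchanged context are worth spelling out: the output size used by `_get_reference_points`, and the mapping `2 * sampling_locations - 1` that converts normalized `[0, 1]` locations to the `[-1, 1]` range expected by `F.grid_sample`. Below is a rough sketch in plain Python. The helper names `conv_out_size` and `to_grid_sample_coord` are hypothetical, and the input size is assumed to already include any padding.

```python
def conv_out_size(size, kernel, dilation=1, stride=1):
    """Hypothetical helper: output length along one axis.

    Mirrors `(H_ - (dilation_h * (kernel_h - 1) + 1)) // stride_h + 1`,
    assuming `size` is the already padded input length.
    """
    effective_kernel = dilation * (kernel - 1) + 1
    return (size - effective_kernel) // stride + 1


def to_grid_sample_coord(location):
    """Hypothetical helper: map a normalized [0, 1] location to [-1, 1]."""
    return 2 * location - 1


print(conv_out_size(10, 3))        # 8
print(conv_out_size(10, 3, 2, 2))  # 3
print(to_grid_sample_coord(0.5))   # 0.0
```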
--- a/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/backbones/ops_dcnv3/modules/__init__.py
+++ b/autonomous_driving/occupancy_prediction/projects/mmdet3d_plugin/bevformer/backbones/ops_dcnv3/modules/__init__.py
@@ -4,4 +4,4 @@
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------
 from .dcnv3 import DCNv3, DCNv3_pytorch
\ No newline at end of file