Commit 41b18fd8 authored by zhe chen

Use pre-commit to reformat code


parent ff20ea39
## InternImage-based Baseline for CVPR23 Occupancy Prediction Challenge!!!!

We improve our baseline with a more powerful image backbone: **InternImage**, which shows its excellent ability within a
series of leaderboards and benchmarks, such as *COCO* and *nuScenes*.

#### 1. Requirements

```bash
python>=3.8
torch==1.12 # recommend
...@@ -16,8 +16,8 @@ numpy==1.22
mmdet3d==0.18.1 # recommend
```

### 2. Install DCNv3 for InternImage
```bash ```bash
cd projects/mmdet3d_plugin/bevformer/backbones/ops_dcnv3 cd projects/mmdet3d_plugin/bevformer/backbones/ops_dcnv3
bash make.sh # requires torch>=1.10 bash make.sh # requires torch>=1.10
...@@ -31,32 +31,33 @@ bash make.sh # requires torch>=1.10 ...@@ -31,32 +31,33 @@ bash make.sh # requires torch>=1.10
Notes: InternImage provides abundant pre-trained model weights that can be used!!! Notes: InternImage provides abundant pre-trained model weights that can be used!!!
### 4. Performance compared to baseline ### 4. Performance compared to baseline
| model name | weight | mIoU | others | barrier | bicycle | bus | car | construction_vehicle | motorcycle | pedestrian | traffic_cone | trailer | truck | driveable_surface | other_flat | sidewalk | terrain | manmade | vegetation |
| ---------------------- | :---------------------------------------------------------------------------------------------------: | :---: | :----: | :-----: | :-----: | :---: | :---: | :------------------: | :--------: | :--------: | :----------: | :-----: | :---: | :---------------: | :--------: | :------: | :-----: | :-----: | :--------: |
| bevformer_intern-s_occ | [Google Drive](https://drive.google.com/file/d/1LV9K8hrskKf51xY1wbqTKzK7WZmVXEV_/view?usp=sharing) | 25.11 | 6.93 | 35.57 | 10.40 | 35.97 | 41.23 | 13.72 | 20.30 | 21.10 | 18.34 | 19.18 | 28.64 | 49.82 | 30.74 | 31.00 | 27.44 | 19.29 | 17.29 |
| bevformer_base_occ | [Google Drive](https://drive.google.com/file/d/1NyoiosafAmne1qiABeNOPXR-P-y0i7_I/view?usp=share_link) | 23.67 | 5.03 | 38.79 | 9.98 | 34.41 | 41.09 | 13.24 | 16.50 | 18.15 | 17.83 | 18.66 | 27.70 | 48.95 | 27.73 | 29.08 | 25.38 | 15.41 | 14.46 |
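
As a sanity check on the table above, the mIoU column appears to be the unweighted mean of the 17 per-class IoU columns. The sketch below recomputes it from the rounded table values and lists the classes where the InternImage-S model falls behind the baseline. The variable and function names are illustrative and are not part of the repository.

```python
# Per-class IoU values copied from the table, in column order
# (others, barrier, ..., vegetation). mIoU itself is excluded.
CLASSES = [
    "others", "barrier", "bicycle", "bus", "car", "construction_vehicle",
    "motorcycle", "pedestrian", "traffic_cone", "trailer", "truck",
    "driveable_surface", "other_flat", "sidewalk", "terrain", "manmade",
    "vegetation",
]
INTERN_S = [6.93, 35.57, 10.40, 35.97, 41.23, 13.72, 20.30, 21.10, 18.34,
            19.18, 28.64, 49.82, 30.74, 31.00, 27.44, 19.29, 17.29]
BASE = [5.03, 38.79, 9.98, 34.41, 41.09, 13.24, 16.50, 18.15, 17.83,
        18.66, 27.70, 48.95, 27.73, 29.08, 25.38, 15.41, 14.46]


def mean_iou(per_class):
    """Unweighted mean over the per-class IoU values."""
    return sum(per_class) / len(per_class)


base_miou = mean_iou(BASE)
intern_miou = mean_iou(INTERN_S)

# Classes where the InternImage-S row is lower than the baseline row.
regressions = [
    name for name, new, old in zip(CLASSES, INTERN_S, BASE) if new < old
]

print(f"base mIoU      ~ {base_miou:.2f}")
print(f"intern-s mIoU  ~ {intern_miou:.2f}")
print(f"classes that regress: {regressions}")
```

Because the inputs are already rounded to two decimals, the recomputed means can differ from the published column in the last digit.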
## Challenge Timeline

- Pending - Challenge Period Open.
- Jun 01, 2023 - Challenge Period End.
- Jun 03, 2023 - Finalist Notification.
- Jun 10, 2023 - Technical Report Deadline.
- Jun 12, 2023 - Winner Announcement.

<p align="right">(<a href="#top">back to top</a>)</p>

## Leaderboard

To be released.

<p align="right">(<a href="#top">back to top</a>)</p>

## License

Before using the dataset, you should register on the website and agree to the terms of use of
the [nuScenes](https://www.nuscenes.org/nuscenes). All code within this repository is
under [Apache License 2.0](./LICENSE).

<p align="right">(<a href="#top">back to top</a>)</p>

## Installation

Follow https://github.com/fundamentalvision/BEVFormer/blob/master/docs/install.md to prepare the environment.

## Preparing Dataset

1. Download the gts and annotations.json we provided. You can download our imgs.tar.gz or use the original sample
   files of the nuScenes dataset.

2. Download the CAN bus expansion data and maps [HERE](https://www.nuscenes.org/download).

3. Organize your folder structure as below:

```
Occupancy3D
├── projects/
├── tools/
├── ckpts/
│   ├── r101_dcn_fcos3d_pretrain.pth
├── data/
│   ├── can_bus/
│   ├── occ3d-nus/
│   │   ├── maps/
│   │   ├── samples/     # You can download our imgs.tar.gz or use the original sample files of the nuScenes dataset
│   │   ├── v1.0-trainval/
│   │   ├── gts/
│   │   │── annotations.json
```

4. Generate the info files for training and validation:

```
python tools/create_data.py occ --root-path ./data/occ3d-nus --out-dir ./data/occ3d-nus --extra-tag occ --version v1.0-trainval --canbus ./data --occ-path ./data/occ3d-nus
```

## Training

```
./tools/dist_train.sh projects/configs/bevformer/bevformer_base_occ.py 8
```

## Testing

```
./tools/dist_test.sh projects/configs/bevformer/bevformer_base_occ.py work_dirs/bevformer_base_occ/epoch_24.pth 8
```

You can evaluate the F-score at the same time by adding `--eval_fscore`.

### Performance
| model name | weight | mIoU | others | barrier | bicycle | bus | car | construction_vehicle | motorcycle | pedestrian | traffic_cone | trailer | truck | driveable_surface | other_flat | sidewalk | terrain | manmade | vegetation |
| ------------------ | :---------------------------------------------------------------------------------------------------: | :---: | :----: | :-----: | :-----: | :---: | :---: | :------------------: | :--------: | :--------: | :----------: | :-----: | :---: | :---------------: | :--------: | :------: | :-----: | :-----: | :--------: |
| bevformer_base_occ | [Google Drive](https://drive.google.com/file/d/1NyoiosafAmne1qiABeNOPXR-P-y0i7_I/view?usp=share_link) | 23.67 | 5.03 | 38.79 | 9.98 | 34.41 | 41.09 | 13.24 | 16.50 | 18.15 | 17.83 | 18.66 | 27.7 | 48.95 | 27.73 | 29.08 | 25.38 | 15.41 | 14.46 |
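
Before running `tools/create_data.py`, it can help to confirm that the folder layout from the Preparing Dataset section is in place. The following is a minimal sketch under the assumption that it is run from the repository root; `EXPECTED` and `missing_entries` are hypothetical names introduced here, and the paths are simply those listed in the README tree.

```python
from pathlib import Path

# Relative paths taken from the folder tree in "Preparing Dataset".
EXPECTED = [
    "ckpts/r101_dcn_fcos3d_pretrain.pth",
    "data/can_bus",
    "data/occ3d-nus/maps",
    "data/occ3d-nus/samples",
    "data/occ3d-nus/v1.0-trainval",
    "data/occ3d-nus/gts",
    "data/occ3d-nus/annotations.json",
]


def missing_entries(repo_root):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


missing = missing_entries(".")
if missing:
    print("Missing entries:")
    for rel in missing:
        print(f"  {rel}")
else:
    print("Dataset layout looks complete.")
```

The check only tests for existence. It does not validate the contents of `annotations.json` or the `gts` folder.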
...@@ -125,8 +125,7 @@ data = dict( ...@@ -125,8 +125,7 @@ data = dict(
classes=class_names, classes=class_names,
test_mode=True, test_mode=True,
ignore_index=len(class_names), ignore_index=len(class_names),
scene_idxs=data_root + scene_idxs=data_root + f'seg_info/Area_{test_area}_resampled_scene_idxs.npy'),
f'seg_info/Area_{test_area}_resampled_scene_idxs.npy'),
test=dict( test=dict(
type=dataset_type, type=dataset_type,
data_root=data_root, data_root=data_root,
......
...@@ -25,7 +25,7 @@ model = dict( ...@@ -25,7 +25,7 @@ model = dict(
in_channels=256, in_channels=256,
num_points=256, num_points=256,
gt_per_seed=1, gt_per_seed=1,
conv_channels=(128, ), conv_channels=(128,),
conv_cfg=dict(type='Conv1d'), conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1), norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1),
with_res_feat=False, with_res_feat=False,
...@@ -43,8 +43,8 @@ model = dict( ...@@ -43,8 +43,8 @@ model = dict(
pred_layer_cfg=dict( pred_layer_cfg=dict(
in_channels=1536, in_channels=1536,
shared_conv_channels=(512, 128), shared_conv_channels=(512, 128),
cls_conv_channels=(128, ), cls_conv_channels=(128,),
reg_conv_channels=(128, ), reg_conv_channels=(128,),
conv_cfg=dict(type='Conv1d'), conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1), norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1),
bias=True), bias=True),
......
...@@ -31,16 +31,16 @@ model = dict( ...@@ -31,16 +31,16 @@ model = dict(
dir_offset=0.7854, # pi/4 dir_offset=0.7854, # pi/4
strides=[8, 16, 32, 64, 128], strides=[8, 16, 32, 64, 128],
group_reg_dims=(2, 1, 3, 1, 2), # offset, depth, size, rot, velo group_reg_dims=(2, 1, 3, 1, 2), # offset, depth, size, rot, velo
cls_branch=(256, ), cls_branch=(256,),
reg_branch=( reg_branch=(
(256, ), # offset (256,), # offset
(256, ), # depth (256,), # depth
(256, ), # size (256,), # size
(256, ), # rot (256,), # rot
() # velo () # velo
), ),
dir_branch=(256, ), dir_branch=(256,),
attr_branch=(256, ), attr_branch=(256,),
loss_cls=dict( loss_cls=dict(
type='FocalLoss', type='FocalLoss',
use_sigmoid=True, use_sigmoid=True,
......
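
The three hunks above change only layout: the space inside one-element tuples such as `(128, )` is dropped, and a split expression is joined onto one line. One way to confirm that such a reformat preserves behavior is to compare syntax trees, which ignore whitespace. This is a minimal sketch using Python's standard `ast` module; `same_ast` is a hypothetical helper and is not something the pre-commit hooks are known to run.

```python
import ast


def same_ast(old_src: str, new_src: str) -> bool:
    """True if both snippets parse to the same syntax tree (layout ignored)."""
    return ast.dump(ast.parse(old_src)) == ast.dump(ast.parse(new_src))


# (old, new) pairs modeled on the hunks above; the snippets are only
# parsed, never executed, so undefined names such as data_root are fine.
pairs = [
    ("dict(conv_channels=(128, ))", "dict(conv_channels=(128,))"),
    ("dict(reg_branch=((256, ), ()))", "dict(reg_branch=((256,), ()))"),
    (
        "dict(scene_idxs=data_root +\n"
        "     f'seg_info/Area_{test_area}_resampled_scene_idxs.npy')",
        "dict(scene_idxs=data_root + "
        "f'seg_info/Area_{test_area}_resampled_scene_idxs.npy')",
    ),
]

for old, new in pairs:
    print(same_ast(old, new), "|", new)

# Dropping the trailing comma would NOT be a pure reformat:
# (128) is an int, (128,) is a tuple.
print(same_ast("x = (128)", "x = (128,)"))
```

The last line shows why the trailing comma has to stay: without it the value is no longer a tuple, and the trees differ.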
_base_ = [ _base_ = [
'../datasets/custom_nus-3d.py', '../datasets/custom_nus-3d.py',
'../_base_/default_runtime.py' '../_base_/default_runtime.py'
] ]
# #
plugin = True plugin = True
plugin_dir = 'projects/mmdet3d_plugin/' plugin_dir = 'projects/mmdet3d_plugin/'
# If point cloud range is changed, the models should also change their point # If point cloud range is changed, the models should also change their point
# cloud range accordingly # cloud range accordingly
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4] point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8] voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
img_norm_cfg = dict( # For nuScenes we usually do 10-class detection
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) class_names = [
# For nuScenes we usually do 10-class detection 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
class_names = [ 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', ]
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
] input_modality = dict(
use_lidar=False,
input_modality = dict( use_camera=True,
use_lidar=False, use_radar=False,
use_camera=True, use_map=False,
use_radar=False, use_external=True)
use_map=False,
use_external=True) _dim_ = 256
_pos_dim_ = _dim_ // 2
_dim_ = 256 _ffn_dim_ = _dim_ * 2
_pos_dim_ = _dim_//2 _num_levels_ = 2
_ffn_dim_ = _dim_*2 bev_h_ = 200
_num_levels_ = 2 bev_w_ = 200
bev_h_ = 200 queue_length = 4 # each sequence contains `queue_length` frames.
bev_w_ = 200 model = dict(
queue_length = 4 # each sequence contains `queue_length` frames. type='BEVFormerOcc',
model = dict( use_grid_mask=True,
type='BEVFormerOcc', video_test_mode=True,
use_grid_mask=True, img_backbone=dict(
video_test_mode=True, type='ResNet',
img_backbone=dict( depth=50,
type='ResNet', num_stages=4,
depth=50, out_indices=(2, 3),
num_stages=4, frozen_stages=1,
out_indices=(2, 3), norm_cfg=dict(type='BN2d', requires_grad=False),
frozen_stages=1, norm_eval=True,
norm_cfg=dict(type='BN2d', requires_grad=False), style='caffe',
norm_eval=True, dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
style='caffe', # original DCNv2 will print log when perform load_state_dict
dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), # original DCNv2 will print log when perform load_state_dict stage_with_dcn=(False, False, True, True)),
stage_with_dcn=(False, False, True, True)), img_neck=dict(
img_neck=dict( type='FPN',
type='FPN', in_channels=[1024, 2048],
in_channels=[1024, 2048], out_channels=_dim_,
out_channels=_dim_, start_level=0,
start_level=0, add_extra_convs='on_output',
add_extra_convs='on_output', num_outs=_num_levels_,
num_outs=_num_levels_, relu_before_extra_convs=True),
relu_before_extra_convs=True), pts_bbox_head=dict(
pts_bbox_head=dict( type='BEVFormerOccHead',
type='BEVFormerOccHead', pc_range=point_cloud_range,
pc_range=point_cloud_range, bev_h=bev_h_,
bev_h=bev_h_, bev_w=bev_w_,
bev_w=bev_w_, num_classes=18,
num_classes=18, in_channels=_dim_,
in_channels=_dim_, sync_cls_avg_factor=True,
sync_cls_avg_factor=True, with_box_refine=True,
with_box_refine=True, as_two_stage=False,
as_two_stage=False, # loss_occ=dict(
# loss_occ=dict( # type='FocalLoss',
# type='FocalLoss', # use_sigmoid=False,
# use_sigmoid=False, # gamma=2.0,
# gamma=2.0, # alpha=0.25,
# alpha=0.25, # loss_weight=10.0),
# loss_weight=10.0), use_mask=False,
use_mask=False, loss_occ=dict(
loss_occ= dict( type='CrossEntropyLoss',
type='CrossEntropyLoss', use_sigmoid=False,
use_sigmoid=False, loss_weight=1.0),
loss_weight=1.0), transformer=dict(
transformer=dict( type='TransformerOcc',
type='TransformerOcc', pillar_h=16,
pillar_h=16, num_classes=18,
num_classes=18, norm_cfg=dict(type='BN', ),
norm_cfg=dict(type='BN', ), norm_cfg_3d=dict(type='BN3d', ),
norm_cfg_3d=dict(type='BN3d', ), use_3d=True,
use_3d=True, use_conv=False,
use_conv=False, rotate_prev_bev=True,
rotate_prev_bev=True, use_shift=True,
use_shift=True, use_can_bus=True,
use_can_bus=True, embed_dims=_dim_,
embed_dims=_dim_, encoder=dict(
encoder=dict( type='BEVFormerEncoder',
type='BEVFormerEncoder', num_layers=1,
num_layers=1, pc_range=point_cloud_range,
pc_range=point_cloud_range, num_points_in_pillar=8,
num_points_in_pillar=8, return_intermediate=False,
return_intermediate=False, transformerlayers=dict(
transformerlayers=dict( type='BEVFormerLayer',
type='BEVFormerLayer', attn_cfgs=[
attn_cfgs=[ dict(
dict( type='TemporalSelfAttention',
type='TemporalSelfAttention', embed_dims=_dim_,
embed_dims=_dim_, num_levels=1),
num_levels=1), dict(
dict( type='SpatialCrossAttention',
type='SpatialCrossAttention', pc_range=point_cloud_range,
pc_range=point_cloud_range, deformable_attention=dict(
deformable_attention=dict( type='MSDeformableAttention3D',
type='MSDeformableAttention3D', embed_dims=_dim_,
embed_dims=_dim_, num_points=8,
num_points=8, num_levels=_num_levels_),
num_levels=_num_levels_), embed_dims=_dim_,
embed_dims=_dim_, )
) ],
], feedforward_channels=_ffn_dim_,
feedforward_channels=_ffn_dim_, ffn_dropout=0.1,
ffn_dropout=0.1, operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm'))),
'ffn', 'norm'))), ),
), positional_encoding=dict(
positional_encoding=dict( type='LearnedPositionalEncoding',
type='LearnedPositionalEncoding', num_feats=_pos_dim_,
num_feats=_pos_dim_, row_num_embed=bev_h_,
row_num_embed=bev_h_, col_num_embed=bev_w_,
col_num_embed=bev_w_,
),
), # model training and testing settings
# model training and testing settings train_cfg=dict(pts=dict(
train_cfg=dict(pts=dict( grid_size=[512, 512, 1],
grid_size=[512, 512, 1], voxel_size=voxel_size,
voxel_size=voxel_size, point_cloud_range=point_cloud_range,
point_cloud_range=point_cloud_range, out_size_factor=4,
out_size_factor=4, assigner=dict(
assigner=dict( type='HungarianAssigner3D',
type='HungarianAssigner3D', cls_cost=dict(type='FocalLossCost', weight=2.0),
cls_cost=dict(type='FocalLossCost', weight=2.0), reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
reg_cost=dict(type='BBox3DL1Cost', weight=0.25), iou_cost=dict(type='IoUCost', weight=0.0),
iou_cost=dict(type='IoUCost', weight=0.0), # Fake cost. This is just to make it compatible with DETR head. # Fake cost. This is just to make it compatible with DETR head.
pc_range=point_cloud_range))))) pc_range=point_cloud_range)))))
dataset_type = 'NuSceneOcc' dataset_type = 'NuSceneOcc'
data_root = 'data/occ3d-nus/' data_root = 'data/occ3d-nus/'
file_client_args = dict(backend='disk') file_client_args = dict(backend='disk')
occ_gt_data_root='data/occ3d-nus' occ_gt_data_root = 'data/occ3d-nus'
train_pipeline = [ train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root), dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='PhotoMetricDistortionMultiViewImage'), dict(type='PhotoMetricDistortionMultiViewImage'),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False), dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
dict(type='RandomScaleImageMultiViewImage', scales=[0.2]), dict(type='RandomScaleImageMultiViewImage', scales=[0.2]),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range), dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names), dict(type='ObjectNameFilter', classes=class_names),
dict(type='NormalizeMultiviewImage', **img_norm_cfg), dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='PadMultiViewImage', size_divisor=32), dict(type='PadMultiViewImage', size_divisor=32),
dict(type='DefaultFormatBundle3D', class_names=class_names), dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='CustomCollect3D', keys=[ 'img','voxel_semantics','mask_lidar','mask_camera'] ) dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
] ]
test_pipeline = [ test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root), dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='NormalizeMultiviewImage', **img_norm_cfg), dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict( dict(
type='MultiScaleFlipAug3D', type='MultiScaleFlipAug3D',
img_scale=(1600, 900), img_scale=(1600, 900),
pts_scale_ratio=1, pts_scale_ratio=1,
flip=False, flip=False,
transforms=[ transforms=[
dict(type='RandomScaleImageMultiViewImage', scales=[0.8]), dict(type='RandomScaleImageMultiViewImage', scales=[0.8]),
dict(type='PadMultiViewImage', size_divisor=32), dict(type='PadMultiViewImage', size_divisor=32),
dict( dict(
type='DefaultFormatBundle3D', type='DefaultFormatBundle3D',
class_names=class_names, class_names=class_names,
with_label=False), with_label=False),
dict(type='CustomCollect3D', keys=['img']) dict(type='CustomCollect3D', keys=['img'])
]) ])
] ]
data = dict( data = dict(
samples_per_gpu=1, samples_per_gpu=1,
workers_per_gpu=0, workers_per_gpu=0,
train=dict( train=dict(
type=dataset_type, type=dataset_type,
data_root=data_root, data_root=data_root,
ann_file=data_root + 'occ_infos_temporal_train.pkl', ann_file=data_root + 'occ_infos_temporal_train.pkl',
pipeline=train_pipeline, pipeline=train_pipeline,
classes=class_names, classes=class_names,
modality=input_modality, modality=input_modality,
test_mode=False, test_mode=False,
use_valid_flag=True, use_valid_flag=True,
bev_size=(bev_h_, bev_w_), bev_size=(bev_h_, bev_w_),
queue_length=queue_length, queue_length=queue_length,
# we use box_type_3d='LiDAR' in kitti and nuscenes dataset # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
# and box_type_3d='Depth' in sunrgbd and scannet dataset. # and box_type_3d='Depth' in sunrgbd and scannet dataset.
box_type_3d='LiDAR'), box_type_3d='LiDAR'),
val=dict(type=dataset_type, val=dict(type=dataset_type,
data_root=data_root, data_root=data_root,
ann_file=data_root + 'occ_infos_temporal_val.pkl', ann_file=data_root + 'occ_infos_temporal_val.pkl',
pipeline=test_pipeline, bev_size=(bev_h_, bev_w_), pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
classes=class_names, modality=input_modality, samples_per_gpu=1), classes=class_names, modality=input_modality, samples_per_gpu=1),
test=dict(type=dataset_type, test=dict(type=dataset_type,
data_root=data_root, data_root=data_root,
ann_file=data_root + 'occ_infos_temporal_val.pkl', ann_file=data_root + 'occ_infos_temporal_val.pkl',
pipeline=test_pipeline, bev_size=(bev_h_, bev_w_), pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
classes=class_names, modality=input_modality), classes=class_names, modality=input_modality),
shuffler_sampler=dict(type='DistributedGroupSampler'), shuffler_sampler=dict(type='DistributedGroupSampler'),
nonshuffler_sampler=dict(type='DistributedSampler') nonshuffler_sampler=dict(type='DistributedSampler')
) )
optimizer = dict( optimizer = dict(
type='AdamW', type='AdamW',
lr=2e-4, lr=2e-4,
paramwise_cfg=dict( paramwise_cfg=dict(
custom_keys={ custom_keys={
'img_backbone': dict(lr_mult=0.1), 'img_backbone': dict(lr_mult=0.1),
}), }),
weight_decay=0.01) weight_decay=0.01)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy # learning policy
lr_config = dict( lr_config = dict(
policy='CosineAnnealing', policy='CosineAnnealing',
warmup='linear', warmup='linear',
warmup_iters=500, warmup_iters=500,
warmup_ratio=1.0 / 3, warmup_ratio=1.0 / 3,
min_lr_ratio=1e-3) min_lr_ratio=1e-3)
total_epochs = 24 total_epochs = 24
evaluation = dict(interval=1, pipeline=test_pipeline) evaluation = dict(interval=1, pipeline=test_pipeline)
runner = dict(type='EpochBasedRunner', max_epochs=total_epochs) runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
# load_from = 'ckpts/r101_dcn_fcos3d_pretrain.pth' # load_from = 'ckpts/r101_dcn_fcos3d_pretrain.pth'
log_config = dict( log_config = dict(
interval=50, interval=50,
hooks=[ hooks=[
dict(type='TextLoggerHook'), dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook') dict(type='TensorboardLoggerHook')
]) ])
checkpoint_config = dict(interval=1) checkpoint_config = dict(interval=1)
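
The `optimizer` and `lr_config` settings in the config above combine a linear warmup with cosine annealing. The sketch below approximates the resulting learning rate, assuming the schedule is stepped once per epoch and the warmup is applied per iteration, which is how mmcv's cosine-annealing hook is understood to behave by default. It is a simplified illustration with made-up function names, not the library's code. Parameters under `img_backbone` would additionally be scaled by `lr_mult=0.1`.

```python
import math

BASE_LR = 2e-4          # optimizer.lr
MIN_LR_RATIO = 1e-3     # lr_config.min_lr_ratio
WARMUP_ITERS = 500      # lr_config.warmup_iters
WARMUP_RATIO = 1.0 / 3  # lr_config.warmup_ratio
TOTAL_EPOCHS = 24       # total_epochs


def cosine_lr(epoch: int) -> float:
    """Cosine annealing from BASE_LR down to BASE_LR * MIN_LR_RATIO."""
    target = BASE_LR * MIN_LR_RATIO
    cos_out = math.cos(math.pi * epoch / TOTAL_EPOCHS) + 1
    return target + 0.5 * (BASE_LR - target) * cos_out


def lr_at(epoch: int, cur_iter: int) -> float:
    """Learning rate at a given epoch and global iteration count."""
    regular = cosine_lr(epoch)
    if cur_iter < WARMUP_ITERS:
        # Linear warmup: start at WARMUP_RATIO * lr and ramp up to lr.
        k = (1 - cur_iter / WARMUP_ITERS) * (1 - WARMUP_RATIO)
        return regular * (1 - k)
    return regular


for epoch in (0, 12, 23):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.3e}")
print(f"first iteration (warmup): lr = {lr_at(0, 0):.3e}")
```

Under these assumptions the run starts at one third of the base rate, reaches the full 2e-4 after 500 iterations, and decays toward 2e-7 by the end of the 24 epochs.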
_base_ = [ _base_ = [
'../datasets/custom_nus-3d.py', '../datasets/custom_nus-3d.py',
'../_base_/default_runtime.py' '../_base_/default_runtime.py'
] ]
# #
plugin = True plugin = True
plugin_dir = 'projects/mmdet3d_plugin/' plugin_dir = 'projects/mmdet3d_plugin/'
# If point cloud range is changed, the models should also change their point # If point cloud range is changed, the models should also change their point
# cloud range accordingly # cloud range accordingly
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4] point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8] voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
img_norm_cfg = dict( # For nuScenes we usually do 10-class detection
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) class_names = [
# For nuScenes we usually do 10-class detection 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
class_names = [ 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', ]
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
] input_modality = dict(
use_lidar=False,
input_modality = dict( use_camera=True,
use_lidar=False, use_radar=False,
use_camera=True, use_map=False,
use_radar=False, use_external=True)
use_map=False,
use_external=True) _dim_ = 256
_pos_dim_ = _dim_ // 2
_dim_ = 256 _ffn_dim_ = _dim_ * 2
_pos_dim_ = _dim_//2 _num_levels_ = 4
_ffn_dim_ = _dim_*2 bev_h_ = 200
_num_levels_ = 4 bev_w_ = 200
bev_h_ = 200 queue_length = 4 # each sequence contains `queue_length` frames.
bev_w_ = 200 model = dict(
queue_length = 4 # each sequence contains `queue_length` frames. type='BEVFormerOcc',
model = dict( use_grid_mask=True,
type='BEVFormerOcc', video_test_mode=True,
use_grid_mask=True, img_backbone=dict(
video_test_mode=True, type='ResNet',
img_backbone=dict( depth=101,
type='ResNet', num_stages=4,
depth=101, out_indices=(1, 2, 3),
num_stages=4, frozen_stages=1,
out_indices=(1, 2, 3), norm_cfg=dict(type='BN2d', requires_grad=False),
frozen_stages=1, norm_eval=True,
norm_cfg=dict(type='BN2d', requires_grad=False), style='caffe',
norm_eval=True, dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
style='caffe', # original DCNv2 will print log when perform load_state_dict
dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), # original DCNv2 will print log when perform load_state_dict stage_with_dcn=(False, False, True, True)),
stage_with_dcn=(False, False, True, True)), img_neck=dict(
img_neck=dict( type='FPN',
type='FPN', in_channels=[512, 1024, 2048],
in_channels=[512, 1024, 2048], out_channels=_dim_,
out_channels=_dim_, start_level=0,
start_level=0, add_extra_convs='on_output',
add_extra_convs='on_output', num_outs=4,
num_outs=4, relu_before_extra_convs=True),
relu_before_extra_convs=True), pts_bbox_head=dict(
pts_bbox_head=dict( type='BEVFormerOccHead',
type='BEVFormerOccHead', pc_range=point_cloud_range,
pc_range=point_cloud_range, bev_h=bev_h_,
bev_h=bev_h_, bev_w=bev_w_,
bev_w=bev_w_, num_classes=18,
num_classes=18, in_channels=_dim_,
in_channels=_dim_, sync_cls_avg_factor=True,
sync_cls_avg_factor=True, with_box_refine=True,
with_box_refine=True, as_two_stage=False,
as_two_stage=False, # loss_occ=dict(
# loss_occ=dict( # type='FocalLoss',
# type='FocalLoss', # use_sigmoid=False,
# use_sigmoid=False, # gamma=2.0,
# gamma=2.0, # alpha=0.25,
# alpha=0.25, # loss_weight=10.0),
# loss_weight=10.0), use_mask=False,
use_mask=False, loss_occ=dict(
loss_occ= dict( type='CrossEntropyLoss',
type='CrossEntropyLoss', use_sigmoid=False,
use_sigmoid=False, loss_weight=1.0),
loss_weight=1.0), transformer=dict(
transformer=dict( type='TransformerOcc',
type='TransformerOcc', pillar_h=16,
pillar_h=16, num_classes=18,
num_classes=18, norm_cfg=dict(type='BN', ),
norm_cfg=dict(type='BN', ), norm_cfg_3d=dict(type='BN3d', ),
norm_cfg_3d=dict(type='BN3d', ), use_3d=True,
use_3d=True, use_conv=False,
use_conv=False, rotate_prev_bev=True,
rotate_prev_bev=True, use_shift=True,
use_shift=True, use_can_bus=True,
use_can_bus=True, embed_dims=_dim_,
embed_dims=_dim_, encoder=dict(
encoder=dict( type='BEVFormerEncoder',
type='BEVFormerEncoder', num_layers=4,
num_layers=4, pc_range=point_cloud_range,
pc_range=point_cloud_range, num_points_in_pillar=8,
num_points_in_pillar=8, return_intermediate=False,
return_intermediate=False, transformerlayers=dict(
transformerlayers=dict( type='BEVFormerLayer',
type='BEVFormerLayer', attn_cfgs=[
attn_cfgs=[ dict(
dict( type='TemporalSelfAttention',
type='TemporalSelfAttention', embed_dims=_dim_,
embed_dims=_dim_, num_levels=1),
num_levels=1), dict(
dict( type='SpatialCrossAttention',
type='SpatialCrossAttention', pc_range=point_cloud_range,
pc_range=point_cloud_range, deformable_attention=dict(
deformable_attention=dict( type='MSDeformableAttention3D',
type='MSDeformableAttention3D', embed_dims=_dim_,
embed_dims=_dim_, num_points=8,
num_points=8, num_levels=_num_levels_),
num_levels=_num_levels_), embed_dims=_dim_,
embed_dims=_dim_, )
) ],
], feedforward_channels=_ffn_dim_,
feedforward_channels=_ffn_dim_, ffn_dropout=0.1,
ffn_dropout=0.1, operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm'))),
'ffn', 'norm'))), ),
), positional_encoding=dict(
positional_encoding=dict( type='LearnedPositionalEncoding',
type='LearnedPositionalEncoding', num_feats=_pos_dim_,
num_feats=_pos_dim_, row_num_embed=bev_h_,
row_num_embed=bev_h_, col_num_embed=bev_w_,
col_num_embed=bev_w_,
),
),
# model training and testing settings
train_cfg=dict(pts=dict(
# model training and testing settings grid_size=[512, 512, 1],
train_cfg=dict(pts=dict( voxel_size=voxel_size,
grid_size=[512, 512, 1], point_cloud_range=point_cloud_range,
voxel_size=voxel_size, out_size_factor=4,
point_cloud_range=point_cloud_range, assigner=dict(
out_size_factor=4, type='HungarianAssigner3D',
assigner=dict( cls_cost=dict(type='FocalLossCost', weight=2.0),
type='HungarianAssigner3D', reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
cls_cost=dict(type='FocalLossCost', weight=2.0), iou_cost=dict(type='IoUCost', weight=0.0),
reg_cost=dict(type='BBox3DL1Cost', weight=0.25), # Fake cost. This is just to make it compatible with DETR head.
iou_cost=dict(type='IoUCost', weight=0.0), # Fake cost. This is just to make it compatible with DETR head. pc_range=point_cloud_range)))))
pc_range=point_cloud_range)))))
dataset_type = 'NuSceneOcc'
dataset_type = 'NuSceneOcc' data_root = 'data/occ3d-nus/'
data_root = 'data/occ3d-nus/' file_client_args = dict(backend='disk')
file_client_args = dict(backend='disk') occ_gt_data_root = 'data/occ3d-nus'
occ_gt_data_root='data/occ3d-nus'
train_pipeline = [
train_pipeline = [ dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root), dict(type='PhotoMetricDistortionMultiViewImage'),
dict(type='PhotoMetricDistortionMultiViewImage'), dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False), dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range), dict(type='ObjectNameFilter', classes=class_names),
dict(type='ObjectNameFilter', classes=class_names), dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='NormalizeMultiviewImage', **img_norm_cfg), dict(type='PadMultiViewImage', size_divisor=32),
dict(type='PadMultiViewImage', size_divisor=32), dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='DefaultFormatBundle3D', class_names=class_names), dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
dict(type='CustomCollect3D', keys=[ 'img','voxel_semantics','mask_lidar','mask_camera'] ) ]
]
test_pipeline = [
test_pipeline = [ dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root), dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='NormalizeMultiviewImage', **img_norm_cfg), dict(type='PadMultiViewImage', size_divisor=32),
dict(type='PadMultiViewImage', size_divisor=32), dict(
dict( type='MultiScaleFlipAug3D',
type='MultiScaleFlipAug3D', img_scale=(1600, 900),
img_scale=(1600, 900), pts_scale_ratio=1,
pts_scale_ratio=1, flip=False,
flip=False, transforms=[
transforms=[ dict(
dict( type='DefaultFormatBundle3D',
type='DefaultFormatBundle3D', class_names=class_names,
class_names=class_names, with_label=False),
with_label=False), dict(type='CustomCollect3D', keys=['img'])
dict(type='CustomCollect3D', keys=['img']) ])
]) ]
]
data = dict(
data = dict( samples_per_gpu=1,
samples_per_gpu=1, workers_per_gpu=0,
workers_per_gpu=0, train=dict(
train=dict( type=dataset_type,
type=dataset_type, data_root=data_root,
data_root=data_root, ann_file=data_root + 'occ_infos_temporal_train.pkl',
ann_file=data_root + 'occ_infos_temporal_train.pkl', pipeline=train_pipeline,
pipeline=train_pipeline, classes=class_names,
classes=class_names, modality=input_modality,
modality=input_modality, test_mode=False,
test_mode=False, use_valid_flag=True,
use_valid_flag=True, bev_size=(bev_h_, bev_w_),
bev_size=(bev_h_, bev_w_), queue_length=queue_length,
queue_length=queue_length, # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
# we use box_type_3d='LiDAR' in kitti and nuscenes dataset # and box_type_3d='Depth' in sunrgbd and scannet dataset.
# and box_type_3d='Depth' in sunrgbd and scannet dataset. box_type_3d='LiDAR'),
box_type_3d='LiDAR'), val=dict(type=dataset_type,
val=dict(type=dataset_type, data_root=data_root,
data_root=data_root, ann_file=data_root + 'occ_infos_temporal_val.pkl',
ann_file=data_root + 'occ_infos_temporal_val.pkl', pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
pipeline=test_pipeline, bev_size=(bev_h_, bev_w_), classes=class_names, modality=input_modality, samples_per_gpu=1),
classes=class_names, modality=input_modality, samples_per_gpu=1), test=dict(type=dataset_type,
test=dict(type=dataset_type, data_root=data_root,
data_root=data_root,
ann_file=data_root + 'occ_infos_temporal_val.pkl',
ann_file=data_root + 'occ_infos_temporal_val.pkl', pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
pipeline=test_pipeline, bev_size=(bev_h_, bev_w_), classes=class_names, modality=input_modality),
classes=class_names, modality=input_modality), shuffler_sampler=dict(type='DistributedGroupSampler'),
shuffler_sampler=dict(type='DistributedGroupSampler'), nonshuffler_sampler=dict(type='DistributedSampler')
nonshuffler_sampler=dict(type='DistributedSampler') )
) optimizer = dict(
optimizer = dict( type='AdamW',
type='AdamW', lr=2e-4,
lr=2e-4, paramwise_cfg=dict(
paramwise_cfg=dict( custom_keys={
custom_keys={ 'img_backbone': dict(lr_mult=0.1),
'img_backbone': dict(lr_mult=0.1), }),
}), weight_decay=0.01)
weight_decay=0.01)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) # learning policy
# learning policy lr_config = dict(
lr_config = dict( policy='CosineAnnealing',
policy='CosineAnnealing', warmup='linear',
warmup='linear', warmup_iters=500,
warmup_iters=500, warmup_ratio=1.0 / 3,
warmup_ratio=1.0 / 3, min_lr_ratio=1e-3)
min_lr_ratio=1e-3) total_epochs = 24
total_epochs = 24 evaluation = dict(interval=1, pipeline=test_pipeline)
evaluation = dict(interval=1, pipeline=test_pipeline)
runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
runner = dict(type='EpochBasedRunner', max_epochs=total_epochs) load_from = 'ckpts/r101_dcn_fcos3d_pretrain.pth'
load_from = 'ckpts/r101_dcn_fcos3d_pretrain.pth' log_config = dict(
log_config = dict( interval=50,
interval=50, hooks=[
hooks=[ dict(type='TextLoggerHook'),
dict(type='TextLoggerHook'), dict(type='TensorboardLoggerHook')
dict(type='TensorboardLoggerHook') ])
])
checkpoint_config = dict(interval=1)
checkpoint_config = dict(interval=1)
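The `lr_config` above combines a linear warmup with cosine annealing. A minimal sketch of how such a schedule can be evaluated is given below, using the values from this config. The helper names `cosine_lr` and `lr_at` are hypothetical, and the sketch assumes the cosine decay advances per iteration; the actual runner hook may advance it per epoch instead.

```python
import math

# Values copied from the config above.
BASE_LR = 2e-4
WARMUP_ITERS = 500
WARMUP_RATIO = 1.0 / 3
MIN_LR_RATIO = 1e-3


def cosine_lr(progress: float) -> float:
    """Cosine decay from BASE_LR down to BASE_LR * MIN_LR_RATIO, progress in [0, 1]."""
    target = BASE_LR * MIN_LR_RATIO
    return target + 0.5 * (BASE_LR - target) * (1.0 + math.cos(math.pi * progress))


def lr_at(cur_iter: int, total_iters: int) -> float:
    """Hypothetical helper: scheduled LR at one iteration (per-iteration decay assumed)."""
    lr = cosine_lr(cur_iter / total_iters)
    if cur_iter < WARMUP_ITERS:
        # Linear warmup: the scale factor rises from WARMUP_RATIO to 1.
        k = (1 - cur_iter / WARMUP_ITERS) * (1 - WARMUP_RATIO)
        lr *= 1 - k
    return lr


if __name__ == "__main__":
    total = 10_000  # illustrative iteration count, not taken from the config
    for it in (0, 250, 500, 5_000, 10_000):
        print(it, lr_at(it, total))
```

With `custom_keys={'img_backbone': dict(lr_mult=0.1)}`, the backbone parameters would follow the same curve scaled by 0.1.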
_base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
]
#
plugin = True
plugin_dir = 'projects/mmdet3d_plugin/'
# If point cloud range is changed, the models should also change their point
# cloud range accordingly
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
img_norm_cfg = dict( # For nuScenes we usually do 10-class detection
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) class_names = [
# For nuScenes we usually do 10-class detection 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
class_names = [ 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', ]
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
] input_modality = dict(
use_lidar=False,
input_modality = dict( use_camera=True,
use_lidar=False, use_radar=False,
use_camera=True, use_map=False,
use_radar=False, use_external=True)
use_map=False,
use_external=True) _dim_ = 256
_pos_dim_ = _dim_ // 2
_dim_ = 256 _ffn_dim_ = _dim_ * 2
_pos_dim_ = _dim_//2 _num_levels_ = 4
_ffn_dim_ = _dim_*2 bev_h_ = 200
_num_levels_ = 4 bev_w_ = 200
bev_h_ = 200 queue_length = 4 # each sequence contains `queue_length` frames.
bev_w_ = 200 pretrained = 'https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_3x_coco.pth'
queue_length = 4 # each sequence contains `queue_length` frames. model = dict(
pretrained = 'https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask_rcnn_internimage_s_fpn_3x_coco.pth' type='BEVFormerOcc',
model = dict( use_grid_mask=True,
type='BEVFormerOcc', video_test_mode=True,
use_grid_mask=True, img_backbone=dict(
video_test_mode=True, _delete_=True,
img_backbone=dict( type='InternImage',
_delete_=True, core_op='DCNv3',
type='InternImage', channels=80,
core_op='DCNv3', depths=[4, 4, 21, 4],
channels=80, groups=[5, 10, 20, 40],
depths=[4, 4, 21, 4], mlp_ratio=4.,
groups=[5, 10, 20, 40], drop_path_rate=0.3,
mlp_ratio=4., norm_layer='LN',
drop_path_rate=0.3, layer_scale=1.0,
norm_layer='LN', offset_scale=1.0,
layer_scale=1.0, post_norm=True,
offset_scale=1.0, with_cp=True,
post_norm=True, out_indices=(1, 2, 3),
with_cp=True, init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
out_indices=(1, 2, 3), img_neck=dict(
init_cfg=dict(type='Pretrained', checkpoint=pretrained)), type='FPN',
img_neck=dict( in_channels=[160, 320, 640],
type='FPN', out_channels=_dim_,
in_channels=[160, 320, 640], start_level=0,
out_channels=_dim_, add_extra_convs='on_output',
start_level=0, num_outs=4,
add_extra_convs='on_output', relu_before_extra_convs=True),
num_outs=4, pts_bbox_head=dict(
relu_before_extra_convs=True), type='BEVFormerOccHead',
pts_bbox_head=dict( pc_range=point_cloud_range,
type='BEVFormerOccHead', bev_h=bev_h_,
pc_range=point_cloud_range, bev_w=bev_w_,
bev_h=bev_h_, num_classes=18,
bev_w=bev_w_, in_channels=_dim_,
num_classes=18, sync_cls_avg_factor=True,
in_channels=_dim_, with_box_refine=True,
sync_cls_avg_factor=True, as_two_stage=False,
with_box_refine=True, # loss_occ=dict(
as_two_stage=False, # type='FocalLoss',
# loss_occ=dict( # use_sigmoid=False,
# type='FocalLoss', # gamma=2.0,
# use_sigmoid=False, # alpha=0.25,
# gamma=2.0, # loss_weight=10.0),
# alpha=0.25, use_mask=False,
# loss_weight=10.0), loss_occ=dict(
use_mask=False, type='CrossEntropyLoss',
loss_occ= dict( use_sigmoid=False,
type='CrossEntropyLoss', loss_weight=1.0),
use_sigmoid=False, transformer=dict(
loss_weight=1.0), type='TransformerOcc',
transformer=dict( pillar_h=16,
type='TransformerOcc', num_classes=18,
pillar_h=16, norm_cfg=dict(type='BN', ),
num_classes=18, norm_cfg_3d=dict(type='BN3d', ),
norm_cfg=dict(type='BN', ), use_3d=True,
norm_cfg_3d=dict(type='BN3d', ), use_conv=False,
use_3d=True, rotate_prev_bev=True,
use_conv=False, use_shift=True,
rotate_prev_bev=True, use_can_bus=True,
use_shift=True, embed_dims=_dim_,
use_can_bus=True, encoder=dict(
embed_dims=_dim_, type='BEVFormerEncoder',
encoder=dict( num_layers=4,
type='BEVFormerEncoder', pc_range=point_cloud_range,
num_layers=4, num_points_in_pillar=8,
pc_range=point_cloud_range, return_intermediate=False,
num_points_in_pillar=8, transformerlayers=dict(
return_intermediate=False, type='BEVFormerLayer',
transformerlayers=dict( attn_cfgs=[
type='BEVFormerLayer', dict(
attn_cfgs=[ type='TemporalSelfAttention',
dict( embed_dims=_dim_,
type='TemporalSelfAttention', num_levels=1),
embed_dims=_dim_, dict(
num_levels=1), type='SpatialCrossAttention',
dict( pc_range=point_cloud_range,
type='SpatialCrossAttention', deformable_attention=dict(
pc_range=point_cloud_range, type='MSDeformableAttention3D',
deformable_attention=dict( embed_dims=_dim_,
type='MSDeformableAttention3D', num_points=8,
embed_dims=_dim_, num_levels=_num_levels_),
num_points=8, embed_dims=_dim_,
num_levels=_num_levels_), )
embed_dims=_dim_, ],
) feedforward_channels=_ffn_dim_,
], ffn_dropout=0.1,
feedforward_channels=_ffn_dim_, operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
ffn_dropout=0.1, 'ffn', 'norm'))),
operation_order=('self_attn', 'norm', 'cross_attn', 'norm', ),
'ffn', 'norm'))), positional_encoding=dict(
), type='LearnedPositionalEncoding',
positional_encoding=dict( num_feats=_pos_dim_,
type='LearnedPositionalEncoding', row_num_embed=bev_h_,
num_feats=_pos_dim_, col_num_embed=bev_w_,
row_num_embed=bev_h_,
col_num_embed=bev_w_, ),
), # model training and testing settings
train_cfg=dict(pts=dict(
grid_size=[512, 512, 1],
# model training and testing settings voxel_size=voxel_size,
train_cfg=dict(pts=dict( point_cloud_range=point_cloud_range,
grid_size=[512, 512, 1], out_size_factor=4,
voxel_size=voxel_size, assigner=dict(
point_cloud_range=point_cloud_range, type='HungarianAssigner3D',
out_size_factor=4, cls_cost=dict(type='FocalLossCost', weight=2.0),
assigner=dict( reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
type='HungarianAssigner3D', iou_cost=dict(type='IoUCost', weight=0.0),
cls_cost=dict(type='FocalLossCost', weight=2.0), # Fake cost. This is just to make it compatible with DETR head.
reg_cost=dict(type='BBox3DL1Cost', weight=0.25), pc_range=point_cloud_range)))))
iou_cost=dict(type='IoUCost', weight=0.0), # Fake cost. This is just to make it compatible with DETR head.
pc_range=point_cloud_range))))) dataset_type = 'NuSceneOcc'
data_root = 'data/occ3d-nus/'
dataset_type = 'NuSceneOcc' file_client_args = dict(backend='disk')
data_root = 'data/occ3d-nus/' occ_gt_data_root = 'data/occ3d-nus'
file_client_args = dict(backend='disk')
occ_gt_data_root='data/occ3d-nus' train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True),
train_pipeline = [ dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict(type='PhotoMetricDistortionMultiViewImage'),
dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root), dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
dict(type='PhotoMetricDistortionMultiViewImage'), dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False), dict(type='ObjectNameFilter', classes=class_names),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range), dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='ObjectNameFilter', classes=class_names), dict(type='PadMultiViewImage', size_divisor=32),
dict(type='NormalizeMultiviewImage', **img_norm_cfg), dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='PadMultiViewImage', size_divisor=32), dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
dict(type='DefaultFormatBundle3D', class_names=class_names), ]
dict(type='CustomCollect3D', keys=[ 'img','voxel_semantics','mask_lidar','mask_camera'] )
] test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True),
test_pipeline = [ dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root), dict(type='PadMultiViewImage', size_divisor=32),
dict(type='NormalizeMultiviewImage', **img_norm_cfg), dict(
dict(type='PadMultiViewImage', size_divisor=32), type='MultiScaleFlipAug3D',
dict( img_scale=(1600, 900),
type='MultiScaleFlipAug3D', pts_scale_ratio=1,
img_scale=(1600, 900), flip=False,
pts_scale_ratio=1, transforms=[
flip=False, dict(
transforms=[ type='DefaultFormatBundle3D',
dict( class_names=class_names,
type='DefaultFormatBundle3D', with_label=False),
class_names=class_names, dict(type='CustomCollect3D', keys=['img'])
with_label=False), ])
dict(type='CustomCollect3D', keys=['img']) ]
])
] data = dict(
samples_per_gpu=1,
data = dict( workers_per_gpu=6,
samples_per_gpu=1, train=dict(
workers_per_gpu=6, type=dataset_type,
train=dict( data_root=data_root,
type=dataset_type, ann_file=data_root + 'occ_infos_temporal_train.pkl',
data_root=data_root, pipeline=train_pipeline,
ann_file=data_root + 'occ_infos_temporal_train.pkl', classes=class_names,
pipeline=train_pipeline, modality=input_modality,
classes=class_names, test_mode=False,
modality=input_modality, use_valid_flag=True,
test_mode=False, bev_size=(bev_h_, bev_w_),
use_valid_flag=True, queue_length=queue_length,
bev_size=(bev_h_, bev_w_), # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
queue_length=queue_length, # and box_type_3d='Depth' in sunrgbd and scannet dataset.
# we use box_type_3d='LiDAR' in kitti and nuscenes dataset box_type_3d='LiDAR'),
# and box_type_3d='Depth' in sunrgbd and scannet dataset. val=dict(type=dataset_type,
box_type_3d='LiDAR'), data_root=data_root,
val=dict(type=dataset_type, ann_file=data_root + 'occ_infos_temporal_val.pkl',
data_root=data_root, pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
ann_file=data_root + 'occ_infos_temporal_val.pkl', classes=class_names, modality=input_modality, samples_per_gpu=1),
pipeline=test_pipeline, bev_size=(bev_h_, bev_w_), test=dict(type=dataset_type,
classes=class_names, modality=input_modality, samples_per_gpu=1), data_root=data_root,
test=dict(type=dataset_type,
data_root=data_root, ann_file=data_root + 'occ_infos_temporal_val.pkl',
pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
ann_file=data_root + 'occ_infos_temporal_val.pkl', classes=class_names, modality=input_modality),
pipeline=test_pipeline, bev_size=(bev_h_, bev_w_), shuffler_sampler=dict(type='DistributedGroupSampler'),
classes=class_names, modality=input_modality), nonshuffler_sampler=dict(type='DistributedSampler')
shuffler_sampler=dict(type='DistributedGroupSampler'), )
nonshuffler_sampler=dict(type='DistributedSampler') optimizer = dict(
) type='AdamW',
optimizer = dict( lr=2e-4,
type='AdamW', weight_decay=0.05,
lr=2e-4, constructor='CustomLayerDecayOptimizerConstructor',
weight_decay=0.05, paramwise_cfg=dict(
constructor='CustomLayerDecayOptimizerConstructor', num_layers=33, layer_decay_rate=1.0,
paramwise_cfg=dict( depths=[4, 4, 21, 4]))
num_layers=33, layer_decay_rate=1.0,
depths=[4, 4, 21,4])) optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) lr_config = dict(
# learning policy policy='CosineAnnealing',
lr_config = dict( warmup='linear',
policy='CosineAnnealing', warmup_iters=500,
warmup='linear', warmup_ratio=1.0 / 3,
warmup_iters=500, min_lr_ratio=1e-3)
warmup_ratio=1.0 / 3, total_epochs = 24
min_lr_ratio=1e-3) evaluation = dict(interval=1, pipeline=test_pipeline)
total_epochs = 24 runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
evaluation = dict(interval=1, pipeline=test_pipeline) log_config = dict(
runner = dict(type='EpochBasedRunner', max_epochs=total_epochs) interval=50,
log_config = dict( hooks=[
interval=50, dict(type='TextLoggerHook'),
hooks=[ dict(type='TensorboardLoggerHook')
dict(type='TextLoggerHook'), ])
dict(type='TensorboardLoggerHook')
]) checkpoint_config = dict(interval=1)
checkpoint_config = dict(interval=1)
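Several numbers in the InternImage config above are related to each other. The sketch below checks two of those relations. It assumes that each backbone stage doubles the channel width of the previous one, which matches `in_channels=[160, 320, 640]` in the neck. It also notes that `num_layers=33` in `paramwise_cfg` equals the sum of `depths`. That is an observation about the values, not a statement of how the optimizer constructor uses them.

```python
# Values copied from the config above.
channels = 80
depths = [4, 4, 21, 4]
out_indices = (1, 2, 3)

# Assumption: stage i has width channels * 2**i.
stage_channels = [channels * 2 ** i for i in range(len(depths))]

# The FPN only sees the stages selected by out_indices.
fpn_in_channels = [stage_channels[i] for i in out_indices]

# Total number of backbone blocks across all stages.
num_layers = sum(depths)

print(stage_channels)   # [80, 160, 320, 640]
print(fpn_in_channels)  # [160, 320, 640]
print(num_layers)       # 33
```

If `channels`, `depths`, or `out_indices` are changed for a different backbone size, `img_neck.in_channels` and the `paramwise_cfg` entries likely need to change with them.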
_base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
]
#
plugin = True
plugin_dir = 'projects/mmdet3d_plugin/'
# If point cloud range is changed, the models should also change their point
# cloud range accordingly
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
img_norm_cfg = dict( # For nuScenes we usually do 10-class detection
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) class_names = [
# For nuScenes we usually do 10-class detection 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
class_names = [ 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', ]
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
] input_modality = dict(
use_lidar=False,
input_modality = dict( use_camera=True,
use_lidar=False, use_radar=False,
use_camera=True, use_map=False,
use_radar=False, use_external=True)
use_map=False,
use_external=True) _dim_ = 256
_pos_dim_ = _dim_ // 2
_dim_ = 256 _ffn_dim_ = _dim_ * 2
_pos_dim_ = _dim_//2 _num_levels_ = 2
_ffn_dim_ = _dim_*2 bev_h_ = 200
_num_levels_ = 2 bev_w_ = 200
bev_h_ = 200 queue_length = 4 # each sequence contains `queue_length` frames.
bev_w_ = 200 model = dict(
queue_length = 4 # each sequence contains `queue_length` frames. type='BEVFormerOcc',
model = dict( use_grid_mask=True,
type='BEVFormerOcc', video_test_mode=True,
use_grid_mask=True, img_backbone=dict(
video_test_mode=True, type='ResNet',
img_backbone=dict( depth=50,
type='ResNet', num_stages=4,
depth=50, out_indices=(2, 3),
num_stages=4, frozen_stages=1,
out_indices=(2, 3), norm_cfg=dict(type='BN2d', requires_grad=False),
frozen_stages=1, norm_eval=True,
norm_cfg=dict(type='BN2d', requires_grad=False), style='caffe',
norm_eval=True, dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
style='caffe', # original DCNv2 will print log when perform load_state_dict
dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), # original DCNv2 will print log when perform load_state_dict stage_with_dcn=(False, False, True, True)),
stage_with_dcn=(False, False, True, True)), img_neck=dict(
img_neck=dict( type='FPN',
type='FPN', in_channels=[1024, 2048],
in_channels=[1024, 2048], out_channels=_dim_,
out_channels=_dim_, start_level=0,
start_level=0, add_extra_convs='on_output',
add_extra_convs='on_output', num_outs=_num_levels_,
num_outs=_num_levels_, relu_before_extra_convs=True),
relu_before_extra_convs=True), pts_bbox_head=dict(
pts_bbox_head=dict( type='BEVFormerOccHead',
type='BEVFormerOccHead', pc_range=point_cloud_range,
pc_range=point_cloud_range, bev_h=bev_h_,
bev_h=bev_h_, bev_w=bev_w_,
bev_w=bev_w_, num_classes=18,
num_classes=18, in_channels=_dim_,
in_channels=_dim_, sync_cls_avg_factor=True,
sync_cls_avg_factor=True, with_box_refine=True,
with_box_refine=True, as_two_stage=False,
as_two_stage=False, # loss_occ=dict(
# loss_occ=dict( # type='FocalLoss',
# type='FocalLoss', # use_sigmoid=False,
# use_sigmoid=False, # gamma=2.0,
# gamma=2.0, # alpha=0.25,
# alpha=0.25, # loss_weight=10.0),
# loss_weight=10.0), use_mask=False,
use_mask=False, loss_occ=dict(
loss_occ= dict( type='CrossEntropyLoss',
type='CrossEntropyLoss', use_sigmoid=False,
use_sigmoid=False, loss_weight=1.0),
loss_weight=1.0), transformer=dict(
transformer=dict( type='TransformerOcc',
type='TransformerOcc', pillar_h=16,
pillar_h=16, num_classes=18,
num_classes=18, norm_cfg=dict(type='BN', ),
norm_cfg=dict(type='BN', ), norm_cfg_3d=dict(type='BN3d', ),
norm_cfg_3d=dict(type='BN3d', ), use_3d=True,
use_3d=True, use_conv=False,
use_conv=False, rotate_prev_bev=True,
rotate_prev_bev=True, use_shift=True,
use_shift=True, use_can_bus=True,
use_can_bus=True, embed_dims=_dim_,
embed_dims=_dim_, encoder=dict(
encoder=dict( type='BEVFormerEncoder',
type='BEVFormerEncoder', num_layers=1,
num_layers=1, pc_range=point_cloud_range,
pc_range=point_cloud_range, num_points_in_pillar=8,
num_points_in_pillar=8, return_intermediate=False,
return_intermediate=False, transformerlayers=dict(
transformerlayers=dict( type='BEVFormerLayer',
type='BEVFormerLayer', attn_cfgs=[
attn_cfgs=[ dict(
dict( type='TemporalSelfAttention',
type='TemporalSelfAttention', embed_dims=_dim_,
embed_dims=_dim_, num_levels=1),
num_levels=1), dict(
dict( type='SpatialCrossAttention',
type='SpatialCrossAttention', pc_range=point_cloud_range,
pc_range=point_cloud_range, deformable_attention=dict(
deformable_attention=dict( type='MSDeformableAttention3D',
type='MSDeformableAttention3D', embed_dims=_dim_,
embed_dims=_dim_, num_points=8,
num_points=8, num_levels=_num_levels_),
num_levels=_num_levels_), embed_dims=_dim_,
embed_dims=_dim_, )
) ],
], feedforward_channels=_ffn_dim_,
feedforward_channels=_ffn_dim_, ffn_dropout=0.1,
ffn_dropout=0.1, operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm'))),
'ffn', 'norm'))), ),
), positional_encoding=dict(
positional_encoding=dict( type='LearnedPositionalEncoding',
type='LearnedPositionalEncoding', num_feats=_pos_dim_,
num_feats=_pos_dim_, row_num_embed=bev_h_,
row_num_embed=bev_h_, col_num_embed=bev_w_,
col_num_embed=bev_w_,
),
), # model training and testing settings
# model training and testing settings train_cfg=dict(pts=dict(
train_cfg=dict(pts=dict( grid_size=[512, 512, 1],
grid_size=[512, 512, 1], voxel_size=voxel_size,
voxel_size=voxel_size, point_cloud_range=point_cloud_range,
point_cloud_range=point_cloud_range, out_size_factor=4,
out_size_factor=4, assigner=dict(
assigner=dict( type='HungarianAssigner3D',
type='HungarianAssigner3D', cls_cost=dict(type='FocalLossCost', weight=2.0),
cls_cost=dict(type='FocalLossCost', weight=2.0), reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
reg_cost=dict(type='BBox3DL1Cost', weight=0.25), iou_cost=dict(type='IoUCost', weight=0.0),
iou_cost=dict(type='IoUCost', weight=0.0), # Fake cost. This is just to make it compatible with DETR head. # Fake cost. This is just to make it compatible with DETR head.
pc_range=point_cloud_range))))) pc_range=point_cloud_range)))))
dataset_type = 'NuSceneOcc' dataset_type = 'NuSceneOcc'
data_root = 'data/occ3d-nus/' data_root = 'data/occ3d-nus/'
file_client_args = dict(backend='disk') file_client_args = dict(backend='disk')
occ_gt_data_root='data/occ3d-nus' occ_gt_data_root = 'data/occ3d-nus'
train_pipeline = [ train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root), dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='PhotoMetricDistortionMultiViewImage'), dict(type='PhotoMetricDistortionMultiViewImage'),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False), dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
dict(type='RandomScaleImageMultiViewImage', scales=[0.2]), dict(type='RandomScaleImageMultiViewImage', scales=[0.2]),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range), dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names), dict(type='ObjectNameFilter', classes=class_names),
dict(type='NormalizeMultiviewImage', **img_norm_cfg), dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='PadMultiViewImage', size_divisor=32), dict(type='PadMultiViewImage', size_divisor=32),
dict(type='DefaultFormatBundle3D', class_names=class_names), dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='CustomCollect3D', keys=[ 'img','voxel_semantics','mask_lidar','mask_camera'] ) dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
] ]
test_pipeline = [ test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile',data_root=occ_gt_data_root), dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1600, 900),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(type='RandomScaleImageMultiViewImage', scales=[0.8]),
            dict(type='PadMultiViewImage', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='CustomCollect3D', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'occ_infos_temporal_train.pkl',
        pipeline=train_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=False,
        use_valid_flag=True,
        bev_size=(bev_h_, bev_w_),
        queue_length=queue_length,
        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
        box_type_3d='LiDAR'),
    val=dict(type=dataset_type,
             data_root=data_root,
             ann_file=data_root + 'occ_infos_temporal_val.pkl',
             pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
             classes=class_names, modality=input_modality, samples_per_gpu=1),
    test=dict(type=dataset_type,
              data_root=data_root,
              ann_file=data_root + 'occ_infos_temporal_val.pkl',
              pipeline=test_pipeline, bev_size=(bev_h_, bev_w_),
              classes=class_names, modality=input_modality),
    shuffler_sampler=dict(type='DistributedGroupSampler'),
    nonshuffler_sampler=dict(type='DistributedSampler')
)
optimizer = dict(
    type='AdamW',
    lr=2e-4,
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }),
    weight_decay=0.01)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    min_lr_ratio=1e-3)
total_epochs = 24
evaluation = dict(interval=1, pipeline=test_pipeline)
runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
# load_from = 'ckpts/r101_dcn_fcos3d_pretrain.pth'
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
checkpoint_config = dict(interval=1)
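The comment in this config warns that changing `point_cloud_range` requires matching changes elsewhere. One way to see why is to compute the metric size of a single BEV cell, as sketched below. The sketch assumes `bev_w_` spans the x range, `bev_h_` spans the y range, and `pillar_h` spans the z range; the variable names `cell_x`, `cell_y`, and `cell_z` are illustrative only.

```python
# Values copied from the config above.
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
bev_h_, bev_w_ = 200, 200
pillar_h = 16

x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range

# Assumption: the BEV grid and the pillar height tile the full range evenly.
cell_x = (x_max - x_min) / bev_w_
cell_y = (y_max - y_min) / bev_h_
cell_z = (z_max - z_min) / pillar_h

print(round(cell_x, 3), round(cell_y, 3), round(cell_z, 3))  # 0.4 0.4 0.4
```

Under these assumptions every cell is a cube of about 0.4 m per side, so widening the range without also raising `bev_h_`, `bev_w_`, or `pillar_h` would coarsen the grid.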
from .bevformer import *
from .core.bbox.assigners.hungarian_assigner_3d import HungarianAssigner3D from .core.bbox.assigners.hungarian_assigner_3d import HungarianAssigner3D
from .core.bbox.coders.nms_free_coder import NMSFreeCoder from .core.bbox.coders.nms_free_coder import NMSFreeCoder
from .core.bbox.match_costs import BBox3DL1Cost from .core.bbox.match_costs import BBox3DL1Cost
from .core.evaluation.eval_hooks import CustomDistEvalHook from .core.evaluation.eval_hooks import CustomDistEvalHook
from .datasets.pipelines import ( from .datasets.pipelines import (CustomCollect3D, NormalizeMultiviewImage,
PhotoMetricDistortionMultiViewImage, PadMultiViewImage, PadMultiViewImage,
NormalizeMultiviewImage, CustomCollect3D) PhotoMetricDistortionMultiViewImage)
from .models.backbones.vovnet import VoVNet from .models.backbones.vovnet import VoVNet
from .models.utils import *
from .models.opt.adamw import AdamW2 from .models.opt.adamw import AdamW2
from .bevformer import * from .models.utils import *
from .backbones import *
from .dense_heads import * from .dense_heads import *
from .detectors import * from .detectors import *
from .modules import * from .hooks import *
from .runner import * from .modules import *
from .hooks import * from .runner import *
from .backbones import *
\ No newline at end of file
from .train import custom_train_model
from .mmdet_train import custom_train_detector from .mmdet_train import custom_train_detector
# from .test import custom_multi_gpu_test from .train import custom_train_model
\ No newline at end of file
# from .test import custom_multi_gpu_test
@@ -3,42 +3,39 @@
# ---------------------------------------------
# Modified by Zhiqi Li
# ---------------------------------------------
import random import os.path as osp
import time
import warnings import warnings
import numpy as np
import torch import torch
import torch.distributed as dist
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
from mmcv.runner import (HOOKS, DistSamplerSeedHook, EpochBasedRunner, from mmcv.runner import (HOOKS, DistSamplerSeedHook, EpochBasedRunner,
Fp16OptimizerHook, OptimizerHook, build_optimizer, Fp16OptimizerHook, OptimizerHook, build_optimizer,
build_runner, get_dist_info) build_runner)
from mmcv.utils import build_from_cfg from mmcv.utils import build_from_cfg
from mmdet.core import EvalHook from mmdet.core import EvalHook
from mmdet.datasets import replace_ImageToTensor
from mmdet.datasets import (build_dataset,
replace_ImageToTensor)
from mmdet.utils import get_root_logger from mmdet.utils import get_root_logger
import time from projects.mmdet3d_plugin.core.evaluation.eval_hooks import \
import os.path as osp CustomDistEvalHook
from projects.mmdet3d_plugin.datasets.builder import build_dataloader
from projects.mmdet3d_plugin.core.evaluation.eval_hooks import CustomDistEvalHook
from projects.mmdet3d_plugin.datasets import custom_build_dataset from projects.mmdet3d_plugin.datasets import custom_build_dataset
from projects.mmdet3d_plugin.datasets.builder import build_dataloader
def custom_train_detector(model, def custom_train_detector(model,
dataset, dataset,
cfg, cfg,
distributed=False, distributed=False,
validate=False, validate=False,
timestamp=None, timestamp=None,
eval_model=None, eval_model=None,
meta=None): meta=None):
logger = get_root_logger(cfg.log_level) logger = get_root_logger(cfg.log_level)
# prepare data loaders # prepare data loaders
dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset] dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset]
#assert len(dataset)==1s # assert len(dataset)==1s
if 'imgs_per_gpu' in cfg.data: if 'imgs_per_gpu' in cfg.data:
logger.warning('"imgs_per_gpu" is deprecated in MMDet V2.0. ' logger.warning('"imgs_per_gpu" is deprecated in MMDet V2.0. '
'Please use "samples_per_gpu" instead') 'Please use "samples_per_gpu" instead')
...@@ -90,7 +87,6 @@ def custom_train_detector(model, ...@@ -90,7 +87,6 @@ def custom_train_detector(model,
eval_model = MMDataParallel( eval_model = MMDataParallel(
eval_model.cuda(cfg.gpu_ids[0]), device_ids=cfg.gpu_ids) eval_model.cuda(cfg.gpu_ids[0]), device_ids=cfg.gpu_ids)
# build runner # build runner
optimizer = build_optimizer(model, cfg.optimizer) optimizer = build_optimizer(model, cfg.optimizer)
...@@ -142,12 +138,12 @@ def custom_train_detector(model, ...@@ -142,12 +138,12 @@ def custom_train_detector(model,
runner.register_training_hooks(cfg.lr_config, optimizer_config, runner.register_training_hooks(cfg.lr_config, optimizer_config,
cfg.checkpoint_config, cfg.log_config, cfg.checkpoint_config, cfg.log_config,
cfg.get('momentum_config', None)) cfg.get('momentum_config', None))
# register profiler hook # register profiler hook
#trace_config = dict(type='tb_trace', dir_name='work_dir') # trace_config = dict(type='tb_trace', dir_name='work_dir')
#profiler_config = dict(on_trace_ready=trace_config) # profiler_config = dict(on_trace_ready=trace_config)
#runner.register_profiler_hook(profiler_config) # runner.register_profiler_hook(profiler_config)
if distributed: if distributed:
if isinstance(runner, EpochBasedRunner): if isinstance(runner, EpochBasedRunner):
runner.register_hook(DistSamplerSeedHook()) runner.register_hook(DistSamplerSeedHook())
...@@ -174,7 +170,7 @@ def custom_train_detector(model, ...@@ -174,7 +170,7 @@ def custom_train_detector(model,
) )
eval_cfg = cfg.get('evaluation', {}) eval_cfg = cfg.get('evaluation', {})
eval_cfg['by_epoch'] = cfg.runner['type'] != 'IterBasedRunner' eval_cfg['by_epoch'] = cfg.runner['type'] != 'IterBasedRunner'
eval_cfg['jsonfile_prefix'] = osp.join('val', cfg.work_dir, time.ctime().replace(' ','_').replace(':','_')) eval_cfg['jsonfile_prefix'] = osp.join('val', cfg.work_dir, time.ctime().replace(' ', '_').replace(':', '_'))
eval_hook = CustomDistEvalHook if distributed else EvalHook eval_hook = CustomDistEvalHook if distributed else EvalHook
runner.register_hook(eval_hook(val_dataloader, **eval_cfg)) runner.register_hook(eval_hook(val_dataloader, **eval_cfg))
...@@ -197,4 +193,3 @@ def custom_train_detector(model, ...@@ -197,4 +193,3 @@ def custom_train_detector(model,
elif cfg.load_from: elif cfg.load_from:
runner.load_checkpoint(cfg.load_from) runner.load_checkpoint(cfg.load_from)
runner.run(data_loaders, cfg.workflow) runner.run(data_loaders, cfg.workflow)
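Review note on `eval_cfg['jsonfile_prefix']` in the hunk above: the prefix joins `'val'`, `cfg.work_dir`, and a `time.ctime()` stamp whose spaces and colons are replaced by underscores. The sketch below reproduces only that path construction. `make_jsonfile_prefix` is a hypothetical helper name, not something in the repository, and `posixpath` is used here only so the result is the same on every platform. It also shows one thing worth checking: when `work_dir` is an absolute path, `join` drops the leading `'val'` component.

```python
import posixpath as osp
import time


def make_jsonfile_prefix(work_dir, stamp=None):
    """Hypothetical helper mirroring the jsonfile_prefix expression in the diff."""
    if stamp is None:
        stamp = time.ctime()
    # Spaces and colons are replaced so the stamp is safe inside a file name.
    safe_stamp = stamp.replace(' ', '_').replace(':', '_')
    # join() discards earlier components if a later one is absolute.
    return osp.join('val', work_dir, safe_stamp)


if __name__ == '__main__':
    stamp = 'Mon Jan 12 03:04:05 2023'
    print(make_jsonfile_prefix('work_dirs/demo', stamp))
    print(make_jsonfile_prefix('/abs/work_dirs/demo', stamp))
```

If the `'val'` prefix is meant to survive for absolute work directories, the join order would need to change; that is a design question for the authors rather than something this reformatting commit addresses.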
# --------------------------------------------- # ---------------------------------------------
# Copyright (c) OpenMMLab. All rights reserved. # Copyright (c) OpenMMLab. All rights reserved.
# --------------------------------------------- # ---------------------------------------------
# Modified by Xiaoyu Tian # Modified by Xiaoyu Tian
# --------------------------------------------- # ---------------------------------------------
import os.path as osp import os.path as osp
import pickle import shutil
import shutil import tempfile
import tempfile import time
import time
import mmcv
import mmcv import numpy as np
import torch import pycocotools.mask as mask_util
import torch.distributed as dist import torch
from mmcv.image import tensor2imgs import torch.distributed as dist
from mmcv.runner import get_dist_info from mmcv.runner import get_dist_info
from mmdet.core import encode_mask_results
def custom_encode_mask_results(mask_results):
"""Encode bitmap mask to RLE code. Semantic Masks only
import mmcv Args:
import numpy as np mask_results (list | tuple[list]): bitmap mask results.
import pycocotools.mask as mask_util In mask scoring rcnn, mask_results is a tuple of (segm_results,
segm_cls_score).
def custom_encode_mask_results(mask_results): Returns:
"""Encode bitmap mask to RLE code. Semantic Masks only list | tuple: RLE encoded mask.
Args: """
mask_results (list | tuple[list]): bitmap mask results. cls_segms = mask_results
In mask scoring rcnn, mask_results is a tuple of (segm_results, num_classes = len(cls_segms)
segm_cls_score). encoded_mask_results = []
Returns: for i in range(len(cls_segms)):
list | tuple: RLE encoded mask. encoded_mask_results.append(
""" mask_util.encode(
cls_segms = mask_results np.array(
num_classes = len(cls_segms) cls_segms[i][:, :, np.newaxis], order='F',
encoded_mask_results = [] dtype='uint8'))[0]) # encoded with RLE
for i in range(len(cls_segms)): return [encoded_mask_results]
encoded_mask_results.append(
mask_util.encode(
np.array( def custom_multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
cls_segms[i][:, :, np.newaxis], order='F', """Test model with multiple gpus.
dtype='uint8'))[0]) # encoded with RLE This method tests model with multiple gpus and collects the results
return [encoded_mask_results] under two different modes: gpu and cpu modes. By setting 'gpu_collect=True'
it encodes results to gpu tensors and use gpu communication for results
def custom_multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False): collection. On cpu mode it saves the results on different gpus to 'tmpdir'
"""Test model with multiple gpus. and collects them by the rank 0 worker.
This method tests model with multiple gpus and collects the results Args:
under two different modes: gpu and cpu modes. By setting 'gpu_collect=True' model (nn.Module): Model to be tested.
it encodes results to gpu tensors and use gpu communication for results data_loader (nn.Dataloader): Pytorch data loader.
collection. On cpu mode it saves the results on different gpus to 'tmpdir' tmpdir (str): Path of directory to save the temporary results from
and collects them by the rank 0 worker. different gpus under cpu mode.
Args: gpu_collect (bool): Option to use either gpu or cpu to collect results.
model (nn.Module): Model to be tested. Returns:
data_loader (nn.Dataloader): Pytorch data loader. list: The prediction results.
tmpdir (str): Path of directory to save the temporary results from """
different gpus under cpu mode. model.eval()
gpu_collect (bool): Option to use either gpu or cpu to collect results. bbox_results = []
Returns: mask_results = []
list: The prediction results. occ_results = []
""" dataset = data_loader.dataset
model.eval() rank, world_size = get_dist_info()
bbox_results = [] if rank == 0:
mask_results = [] prog_bar = mmcv.ProgressBar(len(dataset))
occ_results = [] time.sleep(2) # This line can prevent deadlock problem in some cases.
dataset = data_loader.dataset have_mask = False
rank, world_size = get_dist_info() for i, data in enumerate(data_loader):
if rank == 0: with torch.no_grad():
prog_bar = mmcv.ProgressBar(len(dataset)) result = model(return_loss=False, rescale=True, **data)
time.sleep(2) # This line can prevent deadlock problem in some cases. bs = result.shape[0]
have_mask = False assert bs == 1, \
for i, data in enumerate(data_loader): 'Evaluation only supports batch_size=1 in this version'
with torch.no_grad(): # encode mask results
result = model(return_loss=False, rescale=True, **data) if isinstance(result, dict):
bs=result.shape[0] if 'bbox_results' in result.keys():
assert bs==1, \ bbox_result = result['bbox_results']
'Evaluation only supports batch_size=1 in this version' batch_size = len(result['bbox_results'])
# encode mask results bbox_results.extend(bbox_result)
if isinstance(result, dict): if 'mask_results' in result.keys() and result['mask_results'] is not None:
if 'bbox_results' in result.keys(): mask_result = custom_encode_mask_results(result['mask_results'])
bbox_result = result['bbox_results'] mask_results.extend(mask_result)
batch_size = len(result['bbox_results']) have_mask = True
bbox_results.extend(bbox_result) else:
if 'mask_results' in result.keys() and result['mask_results'] is not None: batch_size = 1
mask_result = custom_encode_mask_results(result['mask_results']) occ_results.extend([result.squeeze(dim=0).cpu().numpy().astype(np.uint8)])
mask_results.extend(mask_result) # batch_size = len(result)
have_mask = True # bbox_results.extend(result)
else:
batch_size = 1 # if isinstance(result[0], tuple):
occ_results.extend([result.squeeze(dim=0).cpu().numpy().astype(np.uint8)]) # assert False, 'this code is for instance segmentation, which our code will not utilize.'
# batch_size = len(result) # result = [(bbox_results, encode_mask_results(mask_results))
# bbox_results.extend(result) # for bbox_results, mask_results in result]
if rank == 0:
#if isinstance(result[0], tuple):
# assert False, 'this code is for instance segmentation, which our code will not utilize.' for _ in range(batch_size * world_size):
# result = [(bbox_results, encode_mask_results(mask_results)) prog_bar.update()
# for bbox_results, mask_results in result]
if rank == 0: # collect results from all ranks
if gpu_collect:
for _ in range(batch_size * world_size): bbox_results = collect_results_gpu(bbox_results, len(dataset))
prog_bar.update() if have_mask:
mask_results = collect_results_gpu(mask_results, len(dataset))
# collect results from all ranks else:
if gpu_collect: mask_results = None
bbox_results = collect_results_gpu(bbox_results, len(dataset)) else:
if have_mask: # bbox_results = collect_results_cpu(bbox_results, len(dataset), tmpdir)
mask_results = collect_results_gpu(mask_results, len(dataset)) # tmpdir = tmpdir+'_mask' if tmpdir is not None else None
else: # if have_mask:
mask_results = None # mask_results = collect_results_cpu(mask_results, len(dataset), tmpdir)
else: # else:
# bbox_results = collect_results_cpu(bbox_results, len(dataset), tmpdir) # mask_results = None
# tmpdir = tmpdir+'_mask' if tmpdir is not None else None tmpdir = tmpdir + '_occ' if tmpdir is not None else None
# if have_mask: occ_results = collect_results_cpu(occ_results, len(dataset), tmpdir)
# mask_results = collect_results_cpu(mask_results, len(dataset), tmpdir)
# else: return occ_results
# mask_results = None
tmpdir = tmpdir + '_occ' if tmpdir is not None else None
occ_results = collect_results_cpu(occ_results, len(dataset), tmpdir) def collect_results_cpu(result_part, size, tmpdir=None):
rank, world_size = get_dist_info()
return occ_results # create a tmp dir if it is not specified
if tmpdir is None:
MAX_LEN = 512
def collect_results_cpu(result_part, size, tmpdir=None): # 32 is whitespace
rank, world_size = get_dist_info() dir_tensor = torch.full((MAX_LEN,),
# create a tmp dir if it is not specified 32,
if tmpdir is None: dtype=torch.uint8,
MAX_LEN = 512 device='cuda')
# 32 is whitespace if rank == 0:
dir_tensor = torch.full((MAX_LEN, ), mmcv.mkdir_or_exist('.dist_test')
32, tmpdir = tempfile.mkdtemp(dir='.dist_test')
dtype=torch.uint8, tmpdir = torch.tensor(
device='cuda') bytearray(tmpdir.encode()), dtype=torch.uint8, device='cuda')
if rank == 0: dir_tensor[:len(tmpdir)] = tmpdir
mmcv.mkdir_or_exist('.dist_test') dist.broadcast(dir_tensor, 0)
tmpdir = tempfile.mkdtemp(dir='.dist_test') tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip()
tmpdir = torch.tensor( else:
bytearray(tmpdir.encode()), dtype=torch.uint8, device='cuda') mmcv.mkdir_or_exist(tmpdir)
dir_tensor[:len(tmpdir)] = tmpdir # dump the part result to the dir
dist.broadcast(dir_tensor, 0) mmcv.dump(result_part, osp.join(tmpdir, f'part_{rank}.pkl'))
tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip() dist.barrier()
else: # collect all parts
mmcv.mkdir_or_exist(tmpdir) if rank != 0:
# dump the part result to the dir return None
mmcv.dump(result_part, osp.join(tmpdir, f'part_{rank}.pkl')) else:
dist.barrier() # load results of all parts from tmp dir
# collect all parts part_list = []
if rank != 0: for i in range(world_size):
return None part_file = osp.join(tmpdir, f'part_{i}.pkl')
else: part_list.append(mmcv.load(part_file))
# load results of all parts from tmp dir # sort the results
part_list = [] ordered_results = []
for i in range(world_size): '''
 part_file = osp.join(tmpdir, f'part_{i}.pkl') because we change the sample of the evaluation stage to make sure that each gpu will handle continuous sample,
part_list.append(mmcv.load(part_file)) '''
# sort the results # for res in zip(*part_list):
ordered_results = [] for res in part_list:
''' ordered_results.extend(list(res))
 because we change the sample of the evaluation stage to make sure that each gpu will handle continuous sample, # the dataloader may pad some samples
''' ordered_results = ordered_results[:size]
#for res in zip(*part_list): # remove tmp dir
for res in part_list: shutil.rmtree(tmpdir)
ordered_results.extend(list(res)) return ordered_results
# the dataloader may pad some samples
ordered_results = ordered_results[:size]
# remove tmp dir def single_gpu_test(model,
shutil.rmtree(tmpdir) data_loader,
return ordered_results show=False,
out_dir=None,
def single_gpu_test(model, show_score_thr=0.3):
data_loader, """Test model with single gpu.
show=False,
out_dir=None, This method tests model with single gpu and gives the 'show' option.
show_score_thr=0.3): By setting ``show=True``, it saves the visualization results under
"""Test model with single gpu. ``out_dir``.
This method tests model with single gpu and gives the 'show' option. Args:
By setting ``show=True``, it saves the visualization results under model (nn.Module): Model to be tested.
``out_dir``. data_loader (nn.Dataloader): Pytorch data loader.
 show (bool): Whether to save visualization results.
Args: Default: True.
model (nn.Module): Model to be tested. out_dir (str): The path to save visualization results.
data_loader (nn.Dataloader): Pytorch data loader. Default: None.
 show (bool): Whether to save visualization results.
Default: True. Returns:
out_dir (str): The path to save visualization results. list[dict]: The prediction results.
Default: None. """
model.eval()
Returns: results = []
list[dict]: The prediction results. dataset = data_loader.dataset
""" prog_bar = mmcv.ProgressBar(len(dataset))
model.eval() for i, data in enumerate(data_loader):
results = [] with torch.no_grad():
dataset = data_loader.dataset result = model(return_loss=False, rescale=True, **data)
prog_bar = mmcv.ProgressBar(len(dataset))
for i, data in enumerate(data_loader): results.extend([result.squeeze(dim=0).cpu().numpy().astype(np.uint8)])
with torch.no_grad():
result = model(return_loss=False, rescale=True, **data) batch_size = len(result)
for _ in range(batch_size):
results.extend([result.squeeze(dim=0).cpu().numpy().astype(np.uint8)]) prog_bar.update()
return results
batch_size = len(result)
for _ in range(batch_size):
prog_bar.update() def collect_results_gpu(result_part, size):
return results collect_results_cpu(result_part, size)
def collect_results_gpu(result_part, size):
collect_results_cpu(result_part, size)
\ No newline at end of file
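Review note on `collect_results_cpu` in the file above: rank 0 shares its temporary directory by packing the path into a fixed 512-byte buffer padded with spaces (ASCII 32), broadcasting it, and stripping the padding on the other side. The per-rank parts are then concatenated in rank order instead of being interleaved with `zip`, because each GPU is said to handle a contiguous slice of samples. The sketch below imitates both steps with plain Python bytes and lists, so no GPU or `torch.distributed` is involved. `encode_dir`, `decode_dir`, and `merge_parts` are hypothetical names used for illustration only.

```python
MAX_LEN = 512  # same buffer length as in collect_results_cpu
PAD = 32       # ASCII code of a space, used as padding


def encode_dir(path, max_len=MAX_LEN):
    """Pack a path into a fixed-length, space-padded byte buffer."""
    raw = path.encode()
    if len(raw) > max_len:
        raise ValueError('path is longer than the buffer')
    buf = bytearray([PAD] * max_len)
    buf[:len(raw)] = raw
    return buf


def decode_dir(buf):
    """Recover the path by stripping the trailing padding."""
    return bytes(buf).decode().rstrip()


def merge_parts(part_list, size):
    """Concatenate per-rank results in rank order and drop padded samples.

    Assumes each rank processed a contiguous slice of the dataset, as the
    comment in collect_results_cpu states.
    """
    ordered = []
    for part in part_list:
        ordered.extend(list(part))
    # The dataloader may pad the last ranks, so keep only `size` results.
    return ordered[:size]


if __name__ == '__main__':
    print(decode_dir(encode_dir('.dist_test/tmpabc')))
    print(merge_parts([[0, 1, 2], [3, 4, 5]], 5))
```

Separately, `collect_results_gpu` as shown calls `collect_results_cpu(result_part, size)` without returning its value, so the `gpu_collect=True` path appears to yield `None`. This commit only reformats the file, so that behaviour is unchanged by it.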
...@@ -4,18 +4,20 @@ ...@@ -4,18 +4,20 @@
# Modified by Zhiqi Li # Modified by Zhiqi Li
# --------------------------------------------- # ---------------------------------------------
from .mmdet_train import custom_train_detector
from mmseg.apis import train_segmentor
from mmdet.apis import train_detector from mmdet.apis import train_detector
from mmseg.apis import train_segmentor
from .mmdet_train import custom_train_detector
def custom_train_model(model, def custom_train_model(model,
dataset, dataset,
cfg, cfg,
distributed=False, distributed=False,
validate=False, validate=False,
timestamp=None, timestamp=None,
eval_model=None, eval_model=None,
meta=None): meta=None):
"""A function wrapper for launching model training according to cfg. """A function wrapper for launching model training according to cfg.
Because we need different eval_hook in runner. Should be deprecated in the Because we need different eval_hook in runner. Should be deprecated in the
......
from .internimage import InternImage from .custom_layer_decay_optimizer_constructor import \
from .custom_layer_decay_optimizer_constructor import CustomLayerDecayOptimizerConstructor CustomLayerDecayOptimizerConstructor
\ No newline at end of file from .internimage import InternImage
...@@ -10,18 +10,18 @@ https://github.com/microsoft/unilm/blob/master/beit/semantic_segmentation/mmcv_c ...@@ -10,18 +10,18 @@ https://github.com/microsoft/unilm/blob/master/beit/semantic_segmentation/mmcv_c
import json import json
from mmcv.runner import OPTIMIZER_BUILDERS, DefaultOptimizerConstructor from mmcv.runner import (OPTIMIZER_BUILDERS, DefaultOptimizerConstructor,
from mmcv.runner import get_dist_info get_dist_info)
from mmdet.utils import get_root_logger from mmdet.utils import get_root_logger
def get_num_layer_for_swin(var_name, num_max_layer, depths): def get_num_layer_for_swin(var_name, num_max_layer, depths):
if var_name.startswith("img_backbone.patch_embed"): if var_name.startswith('img_backbone.patch_embed'):
return 0 return 0
elif "level_embeds" in var_name: elif 'level_embeds' in var_name:
return 0 return 0
elif var_name.startswith("img_backbone.layers") or var_name.startswith( elif var_name.startswith('img_backbone.layers') or var_name.startswith(
"img_backbone.levels"): 'img_backbone.levels'):
if var_name.split('.')[3] not in ['downsample', 'norm']: if var_name.split('.')[3] not in ['downsample', 'norm']:
stage_id = int(var_name.split('.')[2]) stage_id = int(var_name.split('.')[2])
layer_id = int(var_name.split('.')[4]) layer_id = int(var_name.split('.')[4])
...@@ -74,64 +74,64 @@ class CustomLayerDecayOptimizerConstructor(DefaultOptimizerConstructor): ...@@ -74,64 +74,64 @@ class CustomLayerDecayOptimizerConstructor(DefaultOptimizerConstructor):
depths = self.paramwise_cfg.get('depths') depths = self.paramwise_cfg.get('depths')
offset_lr_scale = self.paramwise_cfg.get('offset_lr_scale', 1.0) offset_lr_scale = self.paramwise_cfg.get('offset_lr_scale', 1.0)
logger.info("Build CustomLayerDecayOptimizerConstructor %f - %d" % logger.info('Build CustomLayerDecayOptimizerConstructor %f - %d' %
(layer_decay_rate, num_layers)) (layer_decay_rate, num_layers))
weight_decay = self.base_wd weight_decay = self.base_wd
for name, param in module.named_parameters(): for name, param in module.named_parameters():
if not param.requires_grad: if not param.requires_grad:
continue # frozen weights continue # frozen weights
if len(param.shape) == 1 or name.endswith(".bias") or \ if len(param.shape) == 1 or name.endswith('.bias') or \
"relative_position" in name or \ 'relative_position' in name or \
"norm" in name or\ 'norm' in name or \
"sampling_offsets" in name: 'sampling_offsets' in name:
group_name = "no_decay" group_name = 'no_decay'
this_weight_decay = 0. this_weight_decay = 0.
else: else:
group_name = "decay" group_name = 'decay'
this_weight_decay = weight_decay this_weight_decay = weight_decay
layer_id = get_num_layer_for_swin(name, num_layers, depths) layer_id = get_num_layer_for_swin(name, num_layers, depths)
if layer_id == num_layers - 1 and dino_head and \ if layer_id == num_layers - 1 and dino_head and \
("sampling_offsets" in name or "reference_points" in name): ('sampling_offsets' in name or 'reference_points' in name):
group_name = "layer_%d_%s_0.1x" % (layer_id, group_name) group_name = 'layer_%d_%s_0.1x' % (layer_id, group_name)
elif "sampling_offsets" in name or "reference_points" in name: elif 'sampling_offsets' in name or 'reference_points' in name:
group_name = "layer_%d_%s_offset_lr_scale" % (layer_id, group_name = 'layer_%d_%s_offset_lr_scale' % (layer_id,
group_name) group_name)
else: else:
group_name = "layer_%d_%s" % (layer_id, group_name) group_name = 'layer_%d_%s' % (layer_id, group_name)
if group_name not in parameter_groups: if group_name not in parameter_groups:
scale = layer_decay_rate ** (num_layers - layer_id - 1) scale = layer_decay_rate ** (num_layers - layer_id - 1)
if scale < 1 and backbone_small_lr == True: if scale < 1 and backbone_small_lr == True:
scale = scale * 0.1 scale = scale * 0.1
if "0.1x" in group_name: if '0.1x' in group_name:
scale = scale * 0.1 scale = scale * 0.1
if "offset_lr_scale" in group_name: if 'offset_lr_scale' in group_name:
scale = scale * offset_lr_scale scale = scale * offset_lr_scale
parameter_groups[group_name] = { parameter_groups[group_name] = {
"weight_decay": this_weight_decay, 'weight_decay': this_weight_decay,
"params": [], 'params': [],
"param_names": [], 'param_names': [],
"lr_scale": scale, 'lr_scale': scale,
"group_name": group_name, 'group_name': group_name,
"lr": scale * self.base_lr, 'lr': scale * self.base_lr,
} }
parameter_groups[group_name]["params"].append(param) parameter_groups[group_name]['params'].append(param)
parameter_groups[group_name]["param_names"].append(name) parameter_groups[group_name]['param_names'].append(name)
rank, _ = get_dist_info() rank, _ = get_dist_info()
if rank == 0: if rank == 0:
to_display = {} to_display = {}
for key in parameter_groups: for key in parameter_groups:
to_display[key] = { to_display[key] = {
"param_names": parameter_groups[key]["param_names"], 'param_names': parameter_groups[key]['param_names'],
"lr_scale": parameter_groups[key]["lr_scale"], 'lr_scale': parameter_groups[key]['lr_scale'],
"lr": parameter_groups[key]["lr"], 'lr': parameter_groups[key]['lr'],
"weight_decay": parameter_groups[key]["weight_decay"], 'weight_decay': parameter_groups[key]['weight_decay'],
} }
logger.info("Param groups = %s" % json.dumps(to_display, indent=2)) logger.info('Param groups = %s' % json.dumps(to_display, indent=2))
# state_dict = module.state_dict() # state_dict = module.state_dict()
# for group_name in parameter_groups: # for group_name in parameter_groups:
...@@ -139,4 +139,4 @@ class CustomLayerDecayOptimizerConstructor(DefaultOptimizerConstructor): ...@@ -139,4 +139,4 @@ class CustomLayerDecayOptimizerConstructor(DefaultOptimizerConstructor):
# for name in group["param_names"]: # for name in group["param_names"]:
# group["params"].append(state_dict[name]) # group["params"].append(state_dict[name])
params.extend(parameter_groups.values()) params.extend(parameter_groups.values())
\ No newline at end of file
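Review note on `CustomLayerDecayOptimizerConstructor` above: each parameter group gets a learning-rate scale of `layer_decay_rate ** (num_layers - layer_id - 1)`, which is then multiplied by 0.1 when `backbone_small_lr` is set and the scale is below 1, by another 0.1 for `0.1x` groups, and by `offset_lr_scale` for offset groups. The sketch below restates that arithmetic as a standalone function. `lr_scale` and its boolean flags are hypothetical simplifications of the group-name checks in the diff, not the repository's API.

```python
def lr_scale(layer_id, num_layers, layer_decay_rate,
             backbone_small_lr=False, tenth_group=False,
             offset_group=False, offset_lr_scale=1.0):
    """Hypothetical restatement of the per-group scale computed in the diff."""
    # Deeper layers (larger layer_id) keep more of the base learning rate.
    scale = layer_decay_rate ** (num_layers - layer_id - 1)
    if scale < 1 and backbone_small_lr:
        scale = scale * 0.1
    if tenth_group:       # stands in for the '0.1x' group-name check
        scale = scale * 0.1
    if offset_group:      # stands in for the 'offset_lr_scale' group-name check
        scale = scale * offset_lr_scale
    return scale


if __name__ == '__main__':
    # Assumed example values: 4 layers and a decay rate of 0.5.
    for layer_id in range(4):
        print(layer_id, lr_scale(layer_id, 4, 0.5))
```

The group's actual learning rate in the diff is this scale times `self.base_lr`.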
...@@ -4,16 +4,17 @@ ...@@ -4,16 +4,17 @@
# Licensed under The MIT License [see LICENSE for details] # Licensed under The MIT License [see LICENSE for details]
# -------------------------------------------------------- # --------------------------------------------------------
from collections import OrderedDict
import torch import torch
import torch.nn as nn import torch.nn as nn
from collections import OrderedDict import torch.nn.functional as F
import torch.utils.checkpoint as checkpoint import torch.utils.checkpoint as checkpoint
from timm.models.layers import trunc_normal_, DropPath
from mmcv.runner import _load_checkpoint
from mmcv.cnn import constant_init, trunc_normal_init from mmcv.cnn import constant_init, trunc_normal_init
from mmdet.utils import get_root_logger from mmcv.runner import _load_checkpoint
from mmdet.models.builder import BACKBONES from mmdet.models.builder import BACKBONES
import torch.nn.functional as F from mmdet.utils import get_root_logger
from timm.models.layers import DropPath, trunc_normal_
from .ops_dcnv3 import modules as opsm from .ops_dcnv3 import modules as opsm
...@@ -86,7 +87,7 @@ class CrossAttention(nn.Module): ...@@ -86,7 +87,7 @@ class CrossAttention(nn.Module):
attn_head_dim (int, optional): Dimension of attention head. attn_head_dim (int, optional): Dimension of attention head.
out_dim (int, optional): Dimension of output. out_dim (int, optional): Dimension of output.
""" """
def __init__(self, def __init__(self,
dim, dim,
num_heads=8, num_heads=8,
...@@ -178,7 +179,7 @@ class AttentiveBlock(nn.Module): ...@@ -178,7 +179,7 @@ class AttentiveBlock(nn.Module):
attn_head_dim (int, optional): Dimension of attention head. Default: None. attn_head_dim (int, optional): Dimension of attention head. Default: None.
out_dim (int, optional): Dimension of output. Default: None. out_dim (int, optional): Dimension of output. Default: None.
""" """
def __init__(self, def __init__(self,
dim, dim,
num_heads, num_heads,
...@@ -187,7 +188,7 @@ class AttentiveBlock(nn.Module): ...@@ -187,7 +188,7 @@ class AttentiveBlock(nn.Module):
drop=0., drop=0.,
attn_drop=0., attn_drop=0.,
drop_path=0., drop_path=0.,
norm_layer="LN", norm_layer='LN',
attn_head_dim=None, attn_head_dim=None,
out_dim=None): out_dim=None):
super().__init__() super().__init__()
...@@ -363,9 +364,9 @@ class InternImageLayer(nn.Module): ...@@ -363,9 +364,9 @@ class InternImageLayer(nn.Module):
layer_scale=None, layer_scale=None,
offset_scale=1.0, offset_scale=1.0,
with_cp=False, with_cp=False,
dw_kernel_size=None, # for InternImage-H/G dw_kernel_size=None, # for InternImage-H/G
res_post_norm=False, # for InternImage-H/G res_post_norm=False, # for InternImage-H/G
center_feature_scale=False): # for InternImage-H/G center_feature_scale=False): # for InternImage-H/G
super().__init__() super().__init__()
self.channels = channels self.channels = channels
self.groups = groups self.groups = groups
...@@ -384,8 +385,8 @@ class InternImageLayer(nn.Module): ...@@ -384,8 +385,8 @@ class InternImageLayer(nn.Module):
offset_scale=offset_scale, offset_scale=offset_scale,
act_layer=act_layer, act_layer=act_layer,
norm_layer=norm_layer, norm_layer=norm_layer,
dw_kernel_size=dw_kernel_size, # for InternImage-H/G dw_kernel_size=dw_kernel_size, # for InternImage-H/G
center_feature_scale=center_feature_scale) # for InternImage-H/G center_feature_scale=center_feature_scale) # for InternImage-H/G
self.drop_path = DropPath(drop_path) if drop_path > 0. \ self.drop_path = DropPath(drop_path) if drop_path > 0. \
else nn.Identity() else nn.Identity()
self.norm2 = build_norm_layer(channels, 'LN') self.norm2 = build_norm_layer(channels, 'LN')
...@@ -411,7 +412,7 @@ class InternImageLayer(nn.Module): ...@@ -411,7 +412,7 @@ class InternImageLayer(nn.Module):
if self.post_norm: if self.post_norm:
x = x + self.drop_path(self.norm1(self.dcn(x))) x = x + self.drop_path(self.norm1(self.dcn(x)))
x = x + self.drop_path(self.norm2(self.mlp(x))) x = x + self.drop_path(self.norm2(self.mlp(x)))
elif self.res_post_norm: # for InternImage-H/G elif self.res_post_norm: # for InternImage-H/G
x = x + self.drop_path(self.res_post_norm1(self.dcn(self.norm1(x)))) x = x + self.drop_path(self.res_post_norm1(self.dcn(self.norm1(x))))
x = x + self.drop_path(self.res_post_norm2(self.mlp(self.norm2(x)))) x = x + self.drop_path(self.res_post_norm2(self.mlp(self.norm2(x))))
else: else:
...@@ -466,10 +467,10 @@ class InternImageBlock(nn.Module): ...@@ -466,10 +467,10 @@ class InternImageBlock(nn.Module):
offset_scale=1.0, offset_scale=1.0,
layer_scale=None, layer_scale=None,
with_cp=False, with_cp=False,
dw_kernel_size=None, # for InternImage-H/G dw_kernel_size=None, # for InternImage-H/G
post_norm_block_ids=None, # for InternImage-H/G post_norm_block_ids=None, # for InternImage-H/G
res_post_norm=False, # for InternImage-H/G res_post_norm=False, # for InternImage-H/G
center_feature_scale=False): # for InternImage-H/G center_feature_scale=False): # for InternImage-H/G
super().__init__() super().__init__()
self.channels = channels self.channels = channels
self.depth = depth self.depth = depth
...@@ -491,15 +492,15 @@ class InternImageBlock(nn.Module): ...@@ -491,15 +492,15 @@ class InternImageBlock(nn.Module):
layer_scale=layer_scale, layer_scale=layer_scale,
offset_scale=offset_scale, offset_scale=offset_scale,
with_cp=with_cp, with_cp=with_cp,
dw_kernel_size=dw_kernel_size, # for InternImage-H/G dw_kernel_size=dw_kernel_size, # for InternImage-H/G
res_post_norm=res_post_norm, # for InternImage-H/G res_post_norm=res_post_norm, # for InternImage-H/G
center_feature_scale=center_feature_scale # for InternImage-H/G center_feature_scale=center_feature_scale # for InternImage-H/G
) for i in range(depth) ) for i in range(depth)
]) ])
if not self.post_norm or center_feature_scale: if not self.post_norm or center_feature_scale:
self.norm = build_norm_layer(channels, 'LN') self.norm = build_norm_layer(channels, 'LN')
self.post_norm_block_ids = post_norm_block_ids self.post_norm_block_ids = post_norm_block_ids
if post_norm_block_ids is not None: # for InternImage-H/G if post_norm_block_ids is not None: # for InternImage-H/G
self.post_norms = nn.ModuleList( self.post_norms = nn.ModuleList(
[build_norm_layer(channels, 'LN', eps=1e-6) for _ in post_norm_block_ids] [build_norm_layer(channels, 'LN', eps=1e-6) for _ in post_norm_block_ids]
) )
...@@ -511,7 +512,7 @@ class InternImageBlock(nn.Module): ...@@ -511,7 +512,7 @@ class InternImageBlock(nn.Module):
x = blk(x) x = blk(x)
if (self.post_norm_block_ids is not None) and (i in self.post_norm_block_ids): if (self.post_norm_block_ids is not None) and (i in self.post_norm_block_ids):
index = self.post_norm_block_ids.index(i) index = self.post_norm_block_ids.index(i)
x = self.post_norms[index](x) # for InternImage-H/G x = self.post_norms[index](x) # for InternImage-H/G
if not self.post_norm or self.center_feature_scale: if not self.post_norm or self.center_feature_scale:
x = self.norm(x) x = self.norm(x)
if return_wo_downsample: if return_wo_downsample:
...@@ -577,7 +578,7 @@ class InternImage(nn.Module): ...@@ -577,7 +578,7 @@ class InternImage(nn.Module):
self.num_levels = len(depths) self.num_levels = len(depths)
self.depths = depths self.depths = depths
self.channels = channels self.channels = channels
self.num_features = int(channels * 2**(self.num_levels - 1)) self.num_features = int(channels * 2 ** (self.num_levels - 1))
self.post_norm = post_norm self.post_norm = post_norm
self.mlp_ratio = mlp_ratio self.mlp_ratio = mlp_ratio
self.init_cfg = init_cfg self.init_cfg = init_cfg
...@@ -588,9 +589,9 @@ class InternImage(nn.Module): ...@@ -588,9 +589,9 @@ class InternImage(nn.Module):
logger.info(f'using activation layer: {act_layer}') logger.info(f'using activation layer: {act_layer}')
logger.info(f'using main norm layer: {norm_layer}') logger.info(f'using main norm layer: {norm_layer}')
logger.info(f'using dpr: {drop_path_type}, {drop_path_rate}') logger.info(f'using dpr: {drop_path_type}, {drop_path_rate}')
logger.info(f"level2_post_norm: {level2_post_norm}") logger.info(f'level2_post_norm: {level2_post_norm}')
logger.info(f"level2_post_norm_block_ids: {level2_post_norm_block_ids}") logger.info(f'level2_post_norm_block_ids: {level2_post_norm_block_ids}')
logger.info(f"res_post_norm: {res_post_norm}") logger.info(f'res_post_norm: {res_post_norm}')
in_chans = 3 in_chans = 3
self.patch_embed = StemLayer(in_chans=in_chans, self.patch_embed = StemLayer(in_chans=in_chans,
...@@ -609,10 +610,10 @@ class InternImage(nn.Module): ...@@ -609,10 +610,10 @@ class InternImage(nn.Module):
self.levels = nn.ModuleList() self.levels = nn.ModuleList()
for i in range(self.num_levels): for i in range(self.num_levels):
post_norm_block_ids = level2_post_norm_block_ids if level2_post_norm and ( post_norm_block_ids = level2_post_norm_block_ids if level2_post_norm and (
i == 2) else None # for InternImage-H/G i == 2) else None # for InternImage-H/G
level = InternImageBlock( level = InternImageBlock(
core_op=getattr(opsm, core_op), core_op=getattr(opsm, core_op),
channels=int(channels * 2**i), channels=int(channels * 2 ** i),
depth=depths[i], depth=depths[i],
groups=groups[i], groups=groups[i],
mlp_ratio=self.mlp_ratio, mlp_ratio=self.mlp_ratio,
...@@ -626,9 +627,9 @@ class InternImage(nn.Module): ...@@ -626,9 +627,9 @@ class InternImage(nn.Module):
offset_scale=offset_scale, offset_scale=offset_scale,
with_cp=with_cp, with_cp=with_cp,
dw_kernel_size=dw_kernel_size, # for InternImage-H/G dw_kernel_size=dw_kernel_size, # for InternImage-H/G
post_norm_block_ids=post_norm_block_ids, # for InternImage-H/G post_norm_block_ids=post_norm_block_ids, # for InternImage-H/G
res_post_norm=res_post_norm, # for InternImage-H/G res_post_norm=res_post_norm, # for InternImage-H/G
center_feature_scale=center_feature_scale # for InternImage-H/G center_feature_scale=center_feature_scale # for InternImage-H/G
) )
self.levels.append(level) self.levels.append(level)
...@@ -699,4 +700,4 @@ class InternImage(nn.Module): ...@@ -699,4 +700,4 @@ class InternImage(nn.Module):
x, x_ = level(x, return_wo_downsample=True) x, x_ = level(x, return_wo_downsample=True)
if level_idx in self.out_indices: if level_idx in self.out_indices:
seq_out.append(x_.permute(0, 3, 1, 2).contiguous()) seq_out.append(x_.permute(0, 3, 1, 2).contiguous())
return seq_out return seq_out
\ No newline at end of file
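The reformatted lines in `InternImage.__init__` only change spacing around `**`, so the channel arithmetic is unchanged: level `i` gets `channels * 2 ** i` channels, and `num_features` equals the width of the last level. A minimal sketch of that arithmetic follows; the helper name `level_channels` and the sample values `channels=64`, `depths=[4, 4, 18, 4]` are illustrative assumptions, not taken from this commit.

```python
def level_channels(channels, depths):
    """Return per-level channel widths and the final feature width.

    Hypothetical helper that mirrors the expressions in the diff:
    channels * 2 ** i per level, channels * 2 ** (num_levels - 1) overall.
    """
    num_levels = len(depths)
    widths = [int(channels * 2 ** i) for i in range(num_levels)]
    num_features = int(channels * 2 ** (num_levels - 1))
    return widths, num_features


# Illustrative values only (assumed, not read from a config in this commit).
widths, num_features = level_channels(channels=64, depths=[4, 4, 18, 4])
print(widths, num_features)
```

Since `**` binds tighter than `*`, `channels * 2**i` and `channels * 2 ** i` parse identically, which is why this hunk is a pure formatting change.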
...@@ -4,16 +4,14 @@ ...@@ -4,16 +4,14 @@
# Licensed under The MIT License [see LICENSE for details] # Licensed under The MIT License [see LICENSE for details]
# -------------------------------------------------------- # --------------------------------------------------------
from __future__ import absolute_import from __future__ import absolute_import, division, print_function
from __future__ import print_function
from __future__ import division
import DCNv3
import torch import torch
import torch.nn.functional as F import torch.nn.functional as F
from torch.autograd import Function from torch.autograd import Function
from torch.autograd.function import once_differentiable from torch.autograd.function import once_differentiable
from torch.cuda.amp import custom_bwd, custom_fwd from torch.cuda.amp import custom_bwd, custom_fwd
import DCNv3
class DCNv3Function(Function): class DCNv3Function(Function):
...@@ -58,7 +56,7 @@ class DCNv3Function(Function): ...@@ -58,7 +56,7 @@ class DCNv3Function(Function):
ctx.group_channels, ctx.offset_scale, grad_output.contiguous(), ctx.im2col_step) ctx.group_channels, ctx.offset_scale, grad_output.contiguous(), ctx.im2col_step)
return grad_input, grad_offset, grad_mask, \ return grad_input, grad_offset, grad_mask, \
None, None, None, None, None, None, None, None, None, None, None, None None, None, None, None, None, None, None, None, None, None, None, None
@staticmethod @staticmethod
def symbolic(g, input, offset, mask, kernel_h, kernel_w, stride_h, def symbolic(g, input, offset, mask, kernel_h, kernel_w, stride_h,
...@@ -88,7 +86,9 @@ class DCNv3Function(Function): ...@@ -88,7 +86,9 @@ class DCNv3Function(Function):
im2col_step_i=int(im2col_step), im2col_step_i=int(im2col_step),
) )
def _get_reference_points(spatial_shapes, device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h=0, pad_w=0, stride_h=1, stride_w=1):
def _get_reference_points(spatial_shapes, device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h=0, pad_w=0,
stride_h=1, stride_w=1):
_, H_, W_, _ = spatial_shapes _, H_, W_, _ = spatial_shapes
H_out = (H_ - (dilation_h * (kernel_h - 1) + 1)) // stride_h + 1 H_out = (H_ - (dilation_h * (kernel_h - 1) + 1)) // stride_h + 1
W_out = (W_ - (dilation_w * (kernel_w - 1) + 1)) // stride_w + 1 W_out = (W_ - (dilation_w * (kernel_w - 1) + 1)) // stride_w + 1
...@@ -137,7 +137,7 @@ def _generate_dilation_grids(spatial_shapes, kernel_h, kernel_w, dilation_h, dil ...@@ -137,7 +137,7 @@ def _generate_dilation_grids(spatial_shapes, kernel_h, kernel_w, dilation_h, dil
device=device)) device=device))
points_list.extend([x / W_, y / H_]) points_list.extend([x / W_, y / H_])
grid = torch.stack(points_list, -1).reshape(-1, 1, 2).\ grid = torch.stack(points_list, -1).reshape(-1, 1, 2). \
repeat(1, group, 1).permute(1, 0, 2) repeat(1, group, 1).permute(1, 0, 2)
grid = grid.reshape(1, 1, 1, group * kernel_h * kernel_w, 2) grid = grid.reshape(1, 1, 1, group * kernel_h * kernel_w, 2)
...@@ -161,28 +161,28 @@ def dcnv3_core_pytorch( ...@@ -161,28 +161,28 @@ def dcnv3_core_pytorch(
input.shape, input.device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h, pad_w, stride_h, stride_w) input.shape, input.device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h, pad_w, stride_h, stride_w)
grid = _generate_dilation_grids( grid = _generate_dilation_grids(
input.shape, kernel_h, kernel_w, dilation_h, dilation_w, group, input.device) input.shape, kernel_h, kernel_w, dilation_h, dilation_w, group, input.device)
spatial_norm = torch.tensor([W_in, H_in]).reshape(1, 1, 1, 2).\ spatial_norm = torch.tensor([W_in, H_in]).reshape(1, 1, 1, 2). \
repeat(1, 1, 1, group*kernel_h*kernel_w).to(input.device) repeat(1, 1, 1, group * kernel_h * kernel_w).to(input.device)
sampling_locations = (ref + grid * offset_scale).repeat(N_, 1, 1, 1, 1).flatten(3, 4) + \ sampling_locations = (ref + grid * offset_scale).repeat(N_, 1, 1, 1, 1).flatten(3, 4) + \
offset * offset_scale / spatial_norm offset * offset_scale / spatial_norm
P_ = kernel_h * kernel_w P_ = kernel_h * kernel_w
sampling_grids = 2 * sampling_locations - 1 sampling_grids = 2 * sampling_locations - 1
# N_, H_in, W_in, group*group_channels -> N_, H_in*W_in, group*group_channels -> N_, group*group_channels, H_in*W_in -> N_*group, group_channels, H_in, W_in # N_, H_in, W_in, group*group_channels -> N_, H_in*W_in, group*group_channels -> N_, group*group_channels, H_in*W_in -> N_*group, group_channels, H_in, W_in
input_ = input.view(N_, H_in*W_in, group*group_channels).transpose(1, 2).\ input_ = input.view(N_, H_in * W_in, group * group_channels).transpose(1, 2). \
reshape(N_*group, group_channels, H_in, W_in) reshape(N_ * group, group_channels, H_in, W_in)
# N_, H_out, W_out, group*P_*2 -> N_, H_out*W_out, group, P_, 2 -> N_, group, H_out*W_out, P_, 2 -> N_*group, H_out*W_out, P_, 2 # N_, H_out, W_out, group*P_*2 -> N_, H_out*W_out, group, P_, 2 -> N_, group, H_out*W_out, P_, 2 -> N_*group, H_out*W_out, P_, 2
sampling_grid_ = sampling_grids.view(N_, H_out*W_out, group, P_, 2).transpose(1, 2).\ sampling_grid_ = sampling_grids.view(N_, H_out * W_out, group, P_, 2).transpose(1, 2). \
flatten(0, 1) flatten(0, 1)
# N_*group, group_channels, H_out*W_out, P_ # N_*group, group_channels, H_out*W_out, P_
sampling_input_ = F.grid_sample( sampling_input_ = F.grid_sample(
input_, sampling_grid_, mode='bilinear', padding_mode='zeros', align_corners=False) input_, sampling_grid_, mode='bilinear', padding_mode='zeros', align_corners=False)
# (N_, H_out, W_out, group*P_) -> N_, H_out*W_out, group, P_ -> (N_, group, H_out*W_out, P_) -> (N_*group, 1, H_out*W_out, P_) # (N_, H_out, W_out, group*P_) -> N_, H_out*W_out, group, P_ -> (N_, group, H_out*W_out, P_) -> (N_*group, 1, H_out*W_out, P_)
mask = mask.view(N_, H_out*W_out, group, P_).transpose(1, 2).\ mask = mask.view(N_, H_out * W_out, group, P_).transpose(1, 2). \
reshape(N_*group, 1, H_out*W_out, P_) reshape(N_ * group, 1, H_out * W_out, P_)
output = (sampling_input_ * mask).sum(-1).view(N_, output = (sampling_input_ * mask).sum(-1).view(N_,
group*group_channels, H_out*W_out) group * group_channels, H_out * W_out)
return output.transpose(1, 2).reshape(N_, H_out, W_out, -1).contiguous() return output.transpose(1, 2).reshape(N_, H_out, W_out, -1).contiguous()
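Two pieces of arithmetic in this file are easy to check in isolation: the output size computed in `_get_reference_points`, and the rescaling `2 * sampling_locations - 1` that maps normalized `[0, 1]` locations to the `[-1, 1]` range expected by `F.grid_sample`. The sketch below restates both in plain Python; the function names `conv_output_size` and `to_grid_sample_coords` are hypothetical, and `size` is assumed to be the spatial extent taken from `spatial_shapes`.

```python
def conv_output_size(size, kernel, dilation=1, stride=1):
    """Output extent along one axis, as in _get_reference_points.

    The effective kernel span is dilation * (kernel - 1) + 1.
    """
    effective_kernel = dilation * (kernel - 1) + 1
    return (size - effective_kernel) // stride + 1


def to_grid_sample_coords(location):
    """Map a normalized location in [0, 1] to the [-1, 1] range."""
    return 2 * location - 1


# A 3-wide kernel over 16 positions, with and without dilation/stride.
print(conv_output_size(16, 3))              # plain 3x3 kernel
print(conv_output_size(16, 3, dilation=2))  # dilated kernel spans 5 positions
print(conv_output_size(16, 3, stride=2))    # strided sampling
print(to_grid_sample_coords(0.5))           # centre of the feature map
```

None of the reformatted lines touch these expressions beyond whitespace, so the values above should be the same before and after the commit.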
...@@ -4,4 +4,4 @@ ...@@ -4,4 +4,4 @@
# Licensed under The MIT License [see LICENSE for details] # Licensed under The MIT License [see LICENSE for details]
# -------------------------------------------------------- # --------------------------------------------------------
from .dcnv3 import DCNv3, DCNv3_pytorch from .dcnv3 import DCNv3, DCNv3_pytorch
\ No newline at end of file