"pcdet/models/vscode:/vscode.git/clone" did not exist on "c1d93158891a044e00c7ff0d41873d89eea20fa9"
Commit 41b18fd8 authored by zhe chen's avatar zhe chen
Browse files

Use pre-commit to reformat code


Use pre-commit to reformat code
parent ff20ea39
## InternImage-based Baseline for the CVPR23 Occupancy Prediction Challenge

We improve our baseline with a more powerful image backbone, **InternImage**, which has shown excellent performance across a series of leaderboards and benchmarks, such as *COCO* and *nuScenes*.
### 1. Requirements

```bash
python>=3.8
torch==1.12      # recommended
...
numpy==1.22
mmdet3d==0.18.1  # recommended
```
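The pins above are recommendations rather than hard requirements, so a loose runtime check can flag obvious mismatches early. A minimal illustrative sketch (these helpers are not part of the repo):

```python
def parse(version):
    # "1.12.1+cu113" -> (1, 12, 1); local build tags after "+" are ignored
    return tuple(int(p) for p in version.split("+")[0].split(".")[:3])

def satisfies(installed, spec):
    # spec is ">=X.Y" or "==X.Y"; "==" compares only the listed components,
    # so "1.12.1" still satisfies "==1.12"
    op, want = spec[:2], parse(spec[2:])
    got = parse(installed)
    if op == ">=":
        return got >= want
    if op == "==":
        return got[:len(want)] == want
    raise ValueError(f"unsupported operator: {op}")

# e.g. torch.__version__ could be checked against the "torch==1.12" pin:
print(satisfies("1.12.1+cu113", "==1.12"))  # True
```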
### 2. Install DCNv3 for InternImage

```bash
cd projects/mmdet3d_plugin/bevformer/backbones/ops_dcnv3
bash make.sh  # requires torch>=1.10
```

...

Note: InternImage provides abundant pre-trained model weights that can be used.
### 4. Performance compared to baseline

| model name | weight | mIoU | others | barrier | bicycle | bus | car | construction_vehicle | motorcycle | pedestrian | traffic_cone | trailer | truck | driveable_surface | other_flat | sidewalk | terrain | manmade | vegetation |
| ---- | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| bevformer_intern-s_occ | [Google Drive](https://drive.google.com/file/d/1LV9K8hrskKf51xY1wbqTKzK7WZmVXEV_/view?usp=sharing) | 25.11 | 6.93 | 35.57 | 10.40 | 35.97 | 41.23 | 13.72 | 20.30 | 21.10 | 18.34 | 19.18 | 28.64 | 49.82 | 30.74 | 31.00 | 27.44 | 19.29 | 17.29 |
| bevformer_base_occ | [Google Drive](https://drive.google.com/file/d/1NyoiosafAmne1qiABeNOPXR-P-y0i7_I/view?usp=share_link) | 23.67 | 5.03 | 38.79 | 9.98 | 34.41 | 41.09 | 13.24 | 16.50 | 18.15 | 17.83 | 18.66 | 27.70 | 48.95 | 27.73 | 29.08 | 25.38 | 15.41 | 14.46 |
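As a quick consistency check on the table above, the mIoU column appears to be the unweighted mean of the 17 per-class IoUs; for the bevformer_base_occ row (values copied from the table):

```python
# Per-class IoUs for bevformer_base_occ, in the table's column order
# (others ... vegetation).
ious = [5.03, 38.79, 9.98, 34.41, 41.09, 13.24, 16.50, 18.15, 17.83,
        18.66, 27.70, 48.95, 27.73, 29.08, 25.38, 15.41, 14.46]
miou = round(sum(ious) / len(ious), 2)
print(miou)  # 23.67, matching the mIoU column
```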
## Challenge Timeline

- Pending - Challenge Period Open.
- Jun 01, 2023 - Challenge Period End.
- Jun 03, 2023 - Finalist Notification.
- Jun 10, 2023 - Technical Report Deadline.
- Jun 12, 2023 - Winner Announcement.

<p align="right">(<a href="#top">back to top</a>)</p>
## Leaderboard

To be released.

<p align="right">(<a href="#top">back to top</a>)</p>
## License

Before using the dataset, you should register on the website and agree to the terms of use of [nuScenes](https://www.nuscenes.org/nuscenes). All code within this repository is under the [Apache License 2.0](./LICENSE).

<p align="right">(<a href="#top">back to top</a>)</p>
## Installation

Follow https://github.com/fundamentalvision/BEVFormer/blob/master/docs/install.md to prepare the environment.
## Preparing Dataset

1. Download the gts and annotations.json we provide. You can download our imgs.tar.gz or use the original sample files of the nuScenes dataset.
2. Download the CAN bus expansion data and maps [HERE](https://www.nuscenes.org/download).
3. Organize your folder structure as below:
```
Occupancy3D
├── projects/
...
│   │   │── annotations.json
```
4. Generate the info files for training and validation:

```
python tools/create_data.py occ --root-path ./data/occ3d-nus --out-dir ./data/occ3d-nus --extra-tag occ --version v1.0-trainval --canbus ./data --occ-path ./data/occ3d-nus
```
## Training

```
./tools/dist_train.sh projects/configs/bevformer/bevformer_base_occ.py 8
```
## Testing

```
./tools/dist_test.sh projects/configs/bevformer/bevformer_base_occ.py work_dirs/bevformer_base_occ/epoch_24.pth 8
```

You can evaluate the F-score at the same time by adding `--eval_fscore`.
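`--eval_fscore` reports an F-score alongside mIoU. The official challenge metric has its own matching rules, but the precision/recall combination it relies on can be sketched on binary voxel grids; this is an illustrative simplification, not the challenge's exact implementation:

```python
import numpy as np

def voxel_fscore(pred, gt, eps=1e-8):
    # pred, gt: boolean occupancy grids of the same shape.
    # F-score is the harmonic mean of voxel-level precision and recall.
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    return 2 * precision * recall / (precision + recall + eps)

pred = np.array([True, True, False, False])
gt = np.array([True, False, True, False])
print(voxel_fscore(pred, gt))  # precision = recall = 0.5, so F ~= 0.5
```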
### Performance

| model name | weight | mIoU | others | barrier | bicycle | bus | car | construction_vehicle | motorcycle | pedestrian | traffic_cone | trailer | truck | driveable_surface | other_flat | sidewalk | terrain | manmade | vegetation |
| ---- | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| bevformer_base_occ | [Google Drive](https://drive.google.com/file/d/1NyoiosafAmne1qiABeNOPXR-P-y0i7_I/view?usp=share_link) | 23.67 | 5.03 | 38.79 | 9.98 | 34.41 | 41.09 | 13.24 | 16.50 | 18.15 | 17.83 | 18.66 | 27.70 | 48.95 | 27.73 | 29.08 | 25.38 | 15.41 | 14.46 |
...@@ -125,8 +125,7 @@ data = dict(
classes=class_names,
test_mode=True,
ignore_index=len(class_names),
scene_idxs=data_root + f'seg_info/Area_{test_area}_resampled_scene_idxs.npy'),
test=dict(
type=dataset_type,
data_root=data_root,
...
...@@ -25,7 +25,7 @@ model = dict(
in_channels=256,
num_points=256,
gt_per_seed=1,
conv_channels=(128,),
conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1),
with_res_feat=False,
...@@ -43,8 +43,8 @@ model = dict(
pred_layer_cfg=dict(
in_channels=1536,
shared_conv_channels=(512, 128),
cls_conv_channels=(128,),
reg_conv_channels=(128,),
conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1),
bias=True),
...
...@@ -31,16 +31,16 @@ model = dict(
dir_offset=0.7854,  # pi/4
strides=[8, 16, 32, 64, 128],
group_reg_dims=(2, 1, 3, 1, 2),  # offset, depth, size, rot, velo
cls_branch=(256,),
reg_branch=(
    (256,),  # offset
    (256,),  # depth
    (256,),  # size
    (256,),  # rot
    ()  # velo
),
dir_branch=(256,),
attr_branch=(256,),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
...
...@@ -11,8 +11,6 @@ plugin_dir = 'projects/mmdet3d_plugin/'
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
# For nuScenes we usually do 10-class detection
...@@ -29,8 +27,8 @@ input_modality = dict(
use_external=True)
_dim_ = 256
_pos_dim_ = _dim_ // 2
_ffn_dim_ = _dim_ * 2
_num_levels_ = 2
bev_h_ = 200
bev_w_ = 200
...@@ -48,7 +46,8 @@ model = dict(
norm_cfg=dict(type='BN2d', requires_grad=False),
norm_eval=True,
style='caffe',
dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
# the original DCNv2 will print a log when performing load_state_dict
stage_with_dcn=(False, False, True, True)),
img_neck=dict(
type='FPN',
...@@ -75,7 +74,7 @@ model = dict(
# alpha=0.25,
# loss_weight=10.0),
use_mask=False,
loss_occ=dict(
    type='CrossEntropyLoss',
    use_sigmoid=False,
    loss_weight=1.0),
...@@ -137,17 +136,18 @@ model = dict(
type='HungarianAssigner3D',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
iou_cost=dict(type='IoUCost', weight=0.0),
# Fake cost. This is just to make it compatible with the DETR head.
pc_range=point_cloud_range)))))
dataset_type = 'NuSceneOcc'
data_root = 'data/occ3d-nus/'
file_client_args = dict(backend='disk')
occ_gt_data_root = 'data/occ3d-nus'
train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
    dict(type='RandomScaleImageMultiViewImage', scales=[0.2]),
...@@ -156,11 +156,11 @@ train_pipeline = [
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
]
test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(
        type='MultiScaleFlipAug3D',
...
...@@ -11,8 +11,6 @@ plugin_dir = 'projects/mmdet3d_plugin/'
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
# For nuScenes we usually do 10-class detection
...@@ -29,8 +27,8 @@ input_modality = dict(
use_external=True)
_dim_ = 256
_pos_dim_ = _dim_ // 2
_ffn_dim_ = _dim_ * 2
_num_levels_ = 4
bev_h_ = 200
bev_w_ = 200
...@@ -48,7 +46,8 @@ model = dict(
norm_cfg=dict(type='BN2d', requires_grad=False),
norm_eval=True,
style='caffe',
dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
# the original DCNv2 will print a log when performing load_state_dict
stage_with_dcn=(False, False, True, True)),
img_neck=dict(
type='FPN',
...@@ -75,7 +74,7 @@ model = dict(
# alpha=0.25,
# loss_weight=10.0),
use_mask=False,
loss_occ=dict(
    type='CrossEntropyLoss',
    use_sigmoid=False,
    loss_weight=1.0),
...@@ -128,7 +127,6 @@ model = dict(
),
# model training and testing settings
train_cfg=dict(pts=dict(
    grid_size=[512, 512, 1],
...@@ -139,17 +137,18 @@ model = dict(
type='HungarianAssigner3D',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
iou_cost=dict(type='IoUCost', weight=0.0),
# Fake cost. This is just to make it compatible with the DETR head.
pc_range=point_cloud_range)))))
dataset_type = 'NuSceneOcc'
data_root = 'data/occ3d-nus/'
file_client_args = dict(backend='disk')
occ_gt_data_root = 'data/occ3d-nus'
train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
...@@ -157,12 +156,12 @@ train_pipeline = [
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
]
test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(
...
...@@ -11,8 +11,6 @@ plugin_dir = 'projects/mmdet3d_plugin/'
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
# For nuScenes we usually do 10-class detection
...@@ -29,8 +27,8 @@ input_modality = dict(
use_external=True)
_dim_ = 256
_pos_dim_ = _dim_ // 2
_ffn_dim_ = _dim_ * 2
_num_levels_ = 4
bev_h_ = 200
bev_w_ = 200
...@@ -81,7 +79,7 @@ model = dict(
# alpha=0.25,
# loss_weight=10.0),
use_mask=False,
loss_occ=dict(
    type='CrossEntropyLoss',
    use_sigmoid=False,
    loss_weight=1.0),
...@@ -134,7 +132,6 @@ model = dict(
),
# model training and testing settings
train_cfg=dict(pts=dict(
    grid_size=[512, 512, 1],
...@@ -145,17 +142,18 @@ model = dict(
type='HungarianAssigner3D',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
iou_cost=dict(type='IoUCost', weight=0.0),
# Fake cost. This is just to make it compatible with the DETR head.
pc_range=point_cloud_range)))))
dataset_type = 'NuSceneOcc'
data_root = 'data/occ3d-nus/'
file_client_args = dict(backend='disk')
occ_gt_data_root = 'data/occ3d-nus'
train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
...@@ -163,12 +161,12 @@ train_pipeline = [
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
]
test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(
...@@ -223,7 +221,7 @@ optimizer = dict(
constructor='CustomLayerDecayOptimizerConstructor',
paramwise_cfg=dict(
    num_layers=33, layer_decay_rate=1.0,
    depths=[4, 4, 21, 4]))
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
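The layer-decay settings above (note that `depths=[4, 4, 21, 4]` sums to `num_layers=33`) scale each layer's learning rate by a power of `layer_decay_rate`; with a rate of 1.0 this is a no-op. The scaling rule can be sketched as follows, where the exact layer-id mapping is our assumption, not copied from `CustomLayerDecayOptimizerConstructor`:

```python
def lr_scale(layer_id, num_layers=33, rate=1.0):
    # Deeper layers (larger layer_id) keep more of the base LR;
    # with rate=1.0 every layer's scale is exactly 1.0.
    return rate ** (num_layers + 1 - layer_id)

assert sum([4, 4, 21, 4]) == 33  # the stage depths cover all counted layers
print(lr_scale(1), lr_scale(33))  # 1.0 1.0 at the configured rate of 1.0
```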
# learning policy
...
...@@ -11,8 +11,6 @@ plugin_dir = 'projects/mmdet3d_plugin/'
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
# For nuScenes we usually do 10-class detection
...@@ -29,8 +27,8 @@ input_modality = dict(
use_external=True)
_dim_ = 256
_pos_dim_ = _dim_ // 2
_ffn_dim_ = _dim_ * 2
_num_levels_ = 2
bev_h_ = 200
bev_w_ = 200
...@@ -48,7 +46,8 @@ model = dict(
norm_cfg=dict(type='BN2d', requires_grad=False),
norm_eval=True,
style='caffe',
dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
# the original DCNv2 will print a log when performing load_state_dict
stage_with_dcn=(False, False, True, True)),
img_neck=dict(
type='FPN',
...@@ -75,7 +74,7 @@ model = dict(
# alpha=0.25,
# loss_weight=10.0),
use_mask=False,
loss_occ=dict(
    type='CrossEntropyLoss',
    use_sigmoid=False,
    loss_weight=1.0),
...@@ -137,17 +136,18 @@ model = dict(
type='HungarianAssigner3D',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
iou_cost=dict(type='IoUCost', weight=0.0),
# Fake cost. This is just to make it compatible with the DETR head.
pc_range=point_cloud_range)))))
dataset_type = 'NuSceneOcc'
data_root = 'data/occ3d-nus/'
file_client_args = dict(backend='disk')
occ_gt_data_root = 'data/occ3d-nus'
train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
    dict(type='RandomScaleImageMultiViewImage', scales=[0.2]),
...@@ -156,11 +156,11 @@ train_pipeline = [
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
]
test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(
        type='MultiScaleFlipAug3D',
...
from .bevformer import *
from .core.bbox.assigners.hungarian_assigner_3d import HungarianAssigner3D
from .core.bbox.coders.nms_free_coder import NMSFreeCoder
from .core.bbox.match_costs import BBox3DL1Cost
from .core.evaluation.eval_hooks import CustomDistEvalHook
from .datasets.pipelines import (CustomCollect3D, NormalizeMultiviewImage,
                                 PadMultiViewImage,
                                 PhotoMetricDistortionMultiViewImage)
from .models.backbones.vovnet import VoVNet
from .models.opt.adamw import AdamW2
from .models.utils import *

from .backbones import *
from .dense_heads import *
from .detectors import *
from .hooks import *
from .modules import *
from .runner import *

from .mmdet_train import custom_train_detector
from .train import custom_train_model
# from .test import custom_multi_gpu_test
@@ -3,28 +3,25 @@
# ---------------------------------------------
# Modified by Zhiqi Li
# ---------------------------------------------
import os.path as osp
import time
import warnings

import torch
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
from mmcv.runner import (HOOKS, DistSamplerSeedHook, EpochBasedRunner,
                         Fp16OptimizerHook, OptimizerHook, build_optimizer,
                         build_runner)
from mmcv.utils import build_from_cfg
from mmdet.core import EvalHook
from mmdet.datasets import replace_ImageToTensor
from mmdet.utils import get_root_logger

from projects.mmdet3d_plugin.core.evaluation.eval_hooks import \
    CustomDistEvalHook
from projects.mmdet3d_plugin.datasets import custom_build_dataset
from projects.mmdet3d_plugin.datasets.builder import build_dataloader
def custom_train_detector(model,
                          dataset,
                          cfg,
@@ -38,7 +35,7 @@ def custom_train_detector(model,
    # prepare data loaders
    dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset]
    # assert len(dataset)==1s
    if 'imgs_per_gpu' in cfg.data:
        logger.warning('"imgs_per_gpu" is deprecated in MMDet V2.0. '
                       'Please use "samples_per_gpu" instead')
@@ -90,7 +87,6 @@ def custom_train_detector(model,
        eval_model = MMDataParallel(
            eval_model.cuda(cfg.gpu_ids[0]), device_ids=cfg.gpu_ids)
    # build runner
    optimizer = build_optimizer(model, cfg.optimizer)
@@ -144,9 +140,9 @@ def custom_train_detector(model,
                             cfg.get('momentum_config', None))
    # register profiler hook
    # trace_config = dict(type='tb_trace', dir_name='work_dir')
    # profiler_config = dict(on_trace_ready=trace_config)
    # runner.register_profiler_hook(profiler_config)
    if distributed:
        if isinstance(runner, EpochBasedRunner):
@@ -174,7 +170,7 @@ def custom_train_detector(model,
        )
        eval_cfg = cfg.get('evaluation', {})
        eval_cfg['by_epoch'] = cfg.runner['type'] != 'IterBasedRunner'
        eval_cfg['jsonfile_prefix'] = osp.join('val', cfg.work_dir, time.ctime().replace(' ', '_').replace(':', '_'))
        eval_hook = CustomDistEvalHook if distributed else EvalHook
        runner.register_hook(eval_hook(val_dataloader, **eval_cfg))
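For reference, the `jsonfile_prefix` built above is just the work dir joined with a slugged `time.ctime()`; a minimal sketch of the slugging (standalone, outside the repo):

```python
import time

# replace the spaces and colons in ctime() with underscores,
# as the eval_cfg['jsonfile_prefix'] line above does
slug = time.ctime().replace(' ', '_').replace(':', '_')
```

The result is filesystem-safe, so each evaluation run dumps its JSON results under a distinct timestamped prefix.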
@@ -197,4 +193,3 @@ def custom_train_detector(model,
    elif cfg.load_from:
        runner.load_checkpoint(cfg.load_from)
    runner.run(data_loaders, cfg.workflow)
@@ -4,23 +4,17 @@
# ---------------------------------------------
# Modified by Xiaoyu Tian
# ---------------------------------------------
import os.path as osp
import pickle
import shutil
import tempfile
import time

import mmcv
import numpy as np
import pycocotools.mask as mask_util
import torch
import torch.distributed as dist
from mmcv.runner import get_dist_info
def custom_encode_mask_results(mask_results):
    """Encode bitmap mask to RLE code. Semantic Masks only
@@ -42,6 +36,7 @@ def custom_encode_mask_results(mask_results):
                    dtype='uint8'))[0])  # encoded with RLE
    return [encoded_mask_results]


def custom_multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
    """Test model with multiple gpus.
    This method tests model with multiple gpus and collects the results
@@ -71,8 +66,8 @@ def custom_multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            result = model(return_loss=False, rescale=True, **data)
            bs = result.shape[0]
            assert bs == 1, \
                'Evaluation only supports batch_size=1 in this version'
            # encode mask results
            if isinstance(result, dict):
@@ -90,7 +85,7 @@ def custom_multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
            # batch_size = len(result)
            # bbox_results.extend(result)
            # if isinstance(result[0], tuple):
            #     assert False, 'this code is for instance segmentation, which our code will not utilize.'
            #     result = [(bbox_results, encode_mask_results(mask_results))
            #               for bbox_results, mask_results in result]
@@ -125,7 +120,7 @@ def collect_results_cpu(result_part, size, tmpdir=None):
    if tmpdir is None:
        MAX_LEN = 512
        # 32 is whitespace
        dir_tensor = torch.full((MAX_LEN,),
                                32,
                                dtype=torch.uint8,
                                device='cuda')
@@ -156,7 +151,7 @@ def collect_results_cpu(result_part, size, tmpdir=None):
    '''
    because we changed the sampler for the evaluation stage, each gpu handles a continuous slice of samples,
    '''
    # for res in zip(*part_list):
    for res in part_list:
        ordered_results.extend(list(res))
    # the dataloader may pad some samples
@@ -165,6 +160,7 @@ def collect_results_cpu(result_part, size, tmpdir=None):
        shutil.rmtree(tmpdir)
        return ordered_results

def single_gpu_test(model,
                    data_loader,
                    show=False,
...
@@ -4,9 +4,11 @@
# ---------------------------------------------
# Modified by Zhiqi Li
# ---------------------------------------------
from mmdet.apis import train_detector
from mmseg.apis import train_segmentor

from .mmdet_train import custom_train_detector


def custom_train_model(model,
                       dataset,
...
from .custom_layer_decay_optimizer_constructor import \
    CustomLayerDecayOptimizerConstructor
from .internimage import InternImage
@@ -10,18 +10,18 @@ https://github.com/microsoft/unilm/blob/master/beit/semantic_segmentation/mmcv_c
import json

from mmcv.runner import (OPTIMIZER_BUILDERS, DefaultOptimizerConstructor,
                         get_dist_info)
from mmdet.utils import get_root_logger


def get_num_layer_for_swin(var_name, num_max_layer, depths):
    if var_name.startswith('img_backbone.patch_embed'):
        return 0
    elif 'level_embeds' in var_name:
        return 0
    elif var_name.startswith('img_backbone.layers') or var_name.startswith(
            'img_backbone.levels'):
        if var_name.split('.')[3] not in ['downsample', 'norm']:
            stage_id = int(var_name.split('.')[2])
            layer_id = int(var_name.split('.')[4])
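To make the `split('.')` indexing above concrete: index 2 is the stage id, index 3 distinguishes blocks from `downsample`/`norm` modules, and index 4 is the block id within the stage. The parameter name below is a hypothetical example, not taken from a real checkpoint:

```python
# hypothetical InternImage-style parameter name
name = 'img_backbone.levels.2.blocks.5.norm1.weight'

stage_id = int(name.split('.')[2])   # '2'  -> stage index
layer_id = int(name.split('.')[4])   # '5'  -> block index inside the stage
```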
@@ -74,64 +74,64 @@ class CustomLayerDecayOptimizerConstructor(DefaultOptimizerConstructor):
        depths = self.paramwise_cfg.get('depths')
        offset_lr_scale = self.paramwise_cfg.get('offset_lr_scale', 1.0)
        logger.info('Build CustomLayerDecayOptimizerConstructor %f - %d' %
                    (layer_decay_rate, num_layers))
        weight_decay = self.base_wd
        for name, param in module.named_parameters():
            if not param.requires_grad:
                continue  # frozen weights
            if len(param.shape) == 1 or name.endswith('.bias') or \
                    'relative_position' in name or \
                    'norm' in name or \
                    'sampling_offsets' in name:
                group_name = 'no_decay'
                this_weight_decay = 0.
            else:
                group_name = 'decay'
                this_weight_decay = weight_decay
            layer_id = get_num_layer_for_swin(name, num_layers, depths)
            if layer_id == num_layers - 1 and dino_head and \
                    ('sampling_offsets' in name or 'reference_points' in name):
                group_name = 'layer_%d_%s_0.1x' % (layer_id, group_name)
            elif 'sampling_offsets' in name or 'reference_points' in name:
                group_name = 'layer_%d_%s_offset_lr_scale' % (layer_id,
                                                              group_name)
            else:
                group_name = 'layer_%d_%s' % (layer_id, group_name)
            if group_name not in parameter_groups:
                scale = layer_decay_rate ** (num_layers - layer_id - 1)
                if scale < 1 and backbone_small_lr == True:
                    scale = scale * 0.1
                if '0.1x' in group_name:
                    scale = scale * 0.1
                if 'offset_lr_scale' in group_name:
                    scale = scale * offset_lr_scale
                parameter_groups[group_name] = {
                    'weight_decay': this_weight_decay,
                    'params': [],
                    'param_names': [],
                    'lr_scale': scale,
                    'group_name': group_name,
                    'lr': scale * self.base_lr,
                }
            parameter_groups[group_name]['params'].append(param)
            parameter_groups[group_name]['param_names'].append(name)
        rank, _ = get_dist_info()
        if rank == 0:
            to_display = {}
            for key in parameter_groups:
                to_display[key] = {
                    'param_names': parameter_groups[key]['param_names'],
                    'lr_scale': parameter_groups[key]['lr_scale'],
                    'lr': parameter_groups[key]['lr'],
                    'weight_decay': parameter_groups[key]['weight_decay'],
                }
            logger.info('Param groups = %s' % json.dumps(to_display, indent=2))
        # state_dict = module.state_dict()
        # for group_name in parameter_groups:
...
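The core of the layer-decay rule above: parameters in layer `layer_id` out of `num_layers` get their learning rate scaled by `layer_decay_rate ** (num_layers - layer_id - 1)`, so the deepest layers train at the base lr and the shallowest at the smallest lr, with an extra multiplier for the sampling-offset / reference-point groups. A standalone sketch (the function name is ours, not the repo's API):

```python
def lr_scale(layer_id, num_layers, decay_rate,
             offset_lr_scale=1.0, is_offset_group=False):
    """Layer-wise lr multiplier, mirroring the constructor's scale computation."""
    scale = decay_rate ** (num_layers - layer_id - 1)
    if is_offset_group:
        # the 'offset_lr_scale' groups get an extra multiplier
        scale *= offset_lr_scale
    return scale
```

With `num_layers=4` and `decay_rate=0.9`, the last layer keeps scale 1.0 while the first is scaled to `0.9 ** 3 = 0.729`.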
@@ -4,16 +4,17 @@
# Licensed under The MIT License [see LICENSE for details]
# --------------------------------------------------------
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.checkpoint as checkpoint
from mmcv.cnn import constant_init, trunc_normal_init
from mmcv.runner import _load_checkpoint
from mmdet.models.builder import BACKBONES
from mmdet.utils import get_root_logger
from timm.models.layers import DropPath, trunc_normal_

from .ops_dcnv3 import modules as opsm
@@ -187,7 +188,7 @@ class AttentiveBlock(nn.Module):
                 drop=0.,
                 attn_drop=0.,
                 drop_path=0.,
                 norm_layer='LN',
                 attn_head_dim=None,
                 out_dim=None):
        super().__init__()
@@ -577,7 +578,7 @@ class InternImage(nn.Module):
        self.num_levels = len(depths)
        self.depths = depths
        self.channels = channels
        self.num_features = int(channels * 2 ** (self.num_levels - 1))
        self.post_norm = post_norm
        self.mlp_ratio = mlp_ratio
        self.init_cfg = init_cfg
@@ -588,9 +589,9 @@ class InternImage(nn.Module):
        logger.info(f'using activation layer: {act_layer}')
        logger.info(f'using main norm layer: {norm_layer}')
        logger.info(f'using dpr: {drop_path_type}, {drop_path_rate}')
        logger.info(f'level2_post_norm: {level2_post_norm}')
        logger.info(f'level2_post_norm_block_ids: {level2_post_norm_block_ids}')
        logger.info(f'res_post_norm: {res_post_norm}')
        in_chans = 3
        self.patch_embed = StemLayer(in_chans=in_chans,
@@ -612,7 +613,7 @@ class InternImage(nn.Module):
                i == 2) else None  # for InternImage-H/G
            level = InternImageBlock(
                core_op=getattr(opsm, core_op),
                channels=int(channels * 2 ** i),
                depth=depths[i],
                groups=groups[i],
                mlp_ratio=self.mlp_ratio,
...
@@ -4,16 +4,14 @@
# Licensed under The MIT License [see LICENSE for details]
# --------------------------------------------------------
from __future__ import absolute_import, division, print_function

import torch
import torch.nn.functional as F
from torch.autograd import Function
from torch.autograd.function import once_differentiable
from torch.cuda.amp import custom_bwd, custom_fwd

import DCNv3


class DCNv3Function(Function):
@@ -88,7 +86,9 @@ class DCNv3Function(Function):
            im2col_step_i=int(im2col_step),
        )


def _get_reference_points(spatial_shapes, device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h=0, pad_w=0,
                          stride_h=1, stride_w=1):
    _, H_, W_, _ = spatial_shapes
    H_out = (H_ - (dilation_h * (kernel_h - 1) + 1)) // stride_h + 1
    W_out = (W_ - (dilation_w * (kernel_w - 1) + 1)) // stride_w + 1
@@ -137,7 +137,7 @@ def _generate_dilation_grids(spatial_shapes, kernel_h, kernel_w, dilation_h, dil
                         device=device))
        points_list.extend([x / W_, y / H_])
    grid = torch.stack(points_list, -1).reshape(-1, 1, 2). \
        repeat(1, group, 1).permute(1, 0, 2)
    grid = grid.reshape(1, 1, 1, group * kernel_h * kernel_w, 2)
@@ -161,8 +161,8 @@ def dcnv3_core_pytorch(
        input.shape, input.device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h, pad_w, stride_h, stride_w)
    grid = _generate_dilation_grids(
        input.shape, kernel_h, kernel_w, dilation_h, dilation_w, group, input.device)
    spatial_norm = torch.tensor([W_in, H_in]).reshape(1, 1, 1, 2). \
        repeat(1, 1, 1, group * kernel_h * kernel_w).to(input.device)
    sampling_locations = (ref + grid * offset_scale).repeat(N_, 1, 1, 1, 1).flatten(3, 4) + \
        offset * offset_scale / spatial_norm
@@ -170,19 +170,19 @@ def dcnv3_core_pytorch(
    P_ = kernel_h * kernel_w
    sampling_grids = 2 * sampling_locations - 1
    # N_, H_in, W_in, group*group_channels -> N_, H_in*W_in, group*group_channels -> N_, group*group_channels, H_in*W_in -> N_*group, group_channels, H_in, W_in
    input_ = input.view(N_, H_in * W_in, group * group_channels).transpose(1, 2). \
        reshape(N_ * group, group_channels, H_in, W_in)
    # N_, H_out, W_out, group*P_*2 -> N_, H_out*W_out, group, P_, 2 -> N_, group, H_out*W_out, P_, 2 -> N_*group, H_out*W_out, P_, 2
    sampling_grid_ = sampling_grids.view(N_, H_out * W_out, group, P_, 2).transpose(1, 2). \
        flatten(0, 1)
    # N_*group, group_channels, H_out*W_out, P_
    sampling_input_ = F.grid_sample(
        input_, sampling_grid_, mode='bilinear', padding_mode='zeros', align_corners=False)
    # (N_, H_out, W_out, group*P_) -> N_, H_out*W_out, group, P_ -> (N_, group, H_out*W_out, P_) -> (N_*group, 1, H_out*W_out, P_)
    mask = mask.view(N_, H_out * W_out, group, P_).transpose(1, 2). \
        reshape(N_ * group, 1, H_out * W_out, P_)
    output = (sampling_input_ * mask).sum(-1).view(N_,
                                                   group * group_channels, H_out * W_out)
    return output.transpose(1, 2).reshape(N_, H_out, W_out, -1).contiguous()
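The output-size arithmetic in `_get_reference_points` above is the standard unpadded convolution formula. A minimal pure-Python check of that one line (the helper name is ours, not part of the repo):

```python
def conv_out_size(size, kernel, dilation=1, stride=1):
    """Spatial output size of an unpadded (dilated, strided) convolution,
    matching H_out/W_out in _get_reference_points."""
    return (size - (dilation * (kernel - 1) + 1)) // stride + 1
```

For instance, a 3x3 kernel over a 7-pixel axis yields 5 output positions; with dilation 2 and stride 2 over 8 pixels it yields 2.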