Commit 41b18fd8 authored by zhe chen

Use pre-commit to reformat code


parent ff20ea39
## InternImage-based Baseline for CVPR23 Occupancy Prediction Challenge
We improve our baseline with a more powerful image backbone, **InternImage**, which has demonstrated excellent performance across a series of leaderboards and benchmarks such as *COCO* and *nuScenes*.
### 1. Requirements
```bash
python>=3.8
torch==1.12 # recommend
numpy==1.22
mmdet3d==0.18.1 # recommend
```
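To check your environment against these recommendations, a quick version comparison can help (a minimal sketch — the helper below is illustrative and not part of this repo):

```python
import sys

def meets_min_version(installed: str, required: str) -> bool:
    """Numerically compare dotted version strings, e.g. '1.12.0' >= '1.10'."""
    def to_tuple(v):
        return tuple(int(p) for p in v.split('.') if p.isdigit())
    return to_tuple(installed) >= to_tuple(required)

# Compare the running interpreter against the python>=3.8 recommendation.
python_version = '.'.join(str(p) for p in sys.version_info[:3])
print(meets_min_version(python_version, '3.8'))
```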
### 2. Install DCNv3 for InternImage
```bash
cd projects/mmdet3d_plugin/bevformer/backbones/ops_dcnv3
bash make.sh # requires torch>=1.10
```
Note: InternImage provides abundant pre-trained model weights that can be used.
### 4. Performance compared to baseline
| model name | weight | mIoU | others | barrier | bicycle | bus | car | construction_vehicle | motorcycle | pedestrian | traffic_cone | trailer | truck | driveable_surface | other_flat | sidewalk | terrain | manmade | vegetation |
| ---------------------- | :---------------------------------------------------------------------------------------------------: | :---: | :----: | :-----: | :-----: | :---: | :---: | :------------------: | :--------: | :--------: | :----------: | :-----: | :---: | :---------------: | :--------: | :------: | :-----: | :-----: | :--------: |
| bevformer_intern-s_occ | [Google Drive](https://drive.google.com/file/d/1LV9K8hrskKf51xY1wbqTKzK7WZmVXEV_/view?usp=sharing) | 25.11 | 6.93 | 35.57 | 10.40 | 35.97 | 41.23 | 13.72 | 20.30 | 21.10 | 18.34 | 19.18 | 28.64 | 49.82 | 30.74 | 31.00 | 27.44 | 19.29 | 17.29 |
| bevformer_base_occ | [Google Drive](https://drive.google.com/file/d/1NyoiosafAmne1qiABeNOPXR-P-y0i7_I/view?usp=share_link) | 23.67 | 5.03 | 38.79 | 9.98 | 34.41 | 41.09 | 13.24 | 16.50 | 18.15 | 17.83 | 18.66 | 27.70 | 48.95 | 27.73 | 29.08 | 25.38 | 15.41 | 14.46 |
## Challenge Timeline
- Pending - Challenge Period Open.
- Jun 01, 2023 - Challenge Period End.
- Jun 03, 2023 - Finalist Notification.
- Jun 10, 2023 - Technical Report Deadline.
- Jun 12, 2023 - Winner Announcement.
<p align="right">(<a href="#top">back to top</a>)</p>
## Leaderboard
To be released.
<p align="right">(<a href="#top">back to top</a>)</p>
## License
Before using the dataset, you should register on the website and agree to the terms of use of
the [nuScenes](https://www.nuscenes.org/nuscenes). All code within this repository is
under [Apache License 2.0](./LICENSE).
<p align="right">(<a href="#top">back to top</a>)</p>
## Installation
Follow https://github.com/fundamentalvision/BEVFormer/blob/master/docs/install.md to prepare the environment.
## Preparing Dataset
1. Download the gts and annotations.json we provided. You can download our imgs.tar.gz or use the original sample files of the nuScenes dataset.
2. Download the CAN bus expansion data and maps [HERE](https://www.nuscenes.org/download).
3. Organize your folder structure as below:
```
Occupancy3D
├── projects/
│   ...
│ │ │── annotations.json
```
4. Generate the info files for training and validation:
```
python tools/create_data.py occ --root-path ./data/occ3d-nus --out-dir ./data/occ3d-nus --extra-tag occ --version v1.0-trainval --canbus ./data --occ-path ./data/occ3d-nus
```
## Training
```
./tools/dist_train.sh projects/configs/bevformer/bevformer_base_occ.py 8
```
## Testing
```
./tools/dist_test.sh projects/configs/bevformer/bevformer_base_occ.py work_dirs/bevformer_base_occ/epoch_24.pth 8
```
You can evaluate the F-score at the same time by adding `--eval_fscore`.
### Performance
| model name | weight | mIoU | others | barrier | bicycle | bus | car | construction_vehicle | motorcycle | pedestrian | traffic_cone | trailer | truck | driveable_surface | other_flat | sidewalk | terrain | manmade | vegetation |
| ------------------ | :---------------------------------------------------------------------------------------------------: | :---: | :----: | :-----: | :-----: | :---: | :---: | :------------------: | :--------: | :--------: | :----------: | :-----: | :---: | :---------------: | :--------: | :------: | :-----: | :-----: | :--------: |
| bevformer_base_occ | [Google Drive](https://drive.google.com/file/d/1NyoiosafAmne1qiABeNOPXR-P-y0i7_I/view?usp=share_link) | 23.67 | 5.03 | 38.79 | 9.98 | 34.41 | 41.09 | 13.24 | 16.50 | 18.15 | 17.83 | 18.66 | 27.7 | 48.95 | 27.73 | 29.08 | 25.38 | 15.41 | 14.46 |
classes=class_names,
test_mode=True,
ignore_index=len(class_names),
scene_idxs=data_root + f'seg_info/Area_{test_area}_resampled_scene_idxs.npy'),
test=dict(
type=dataset_type,
data_root=data_root,
in_channels=256,
num_points=256,
gt_per_seed=1,
conv_channels=(128, ),
conv_channels=(128,),
conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1),
with_res_feat=False,
pred_layer_cfg=dict(
in_channels=1536,
shared_conv_channels=(512, 128),
cls_conv_channels=(128,),
reg_conv_channels=(128,),
conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.1),
bias=True),
dir_offset=0.7854, # pi/4
strides=[8, 16, 32, 64, 128],
group_reg_dims=(2, 1, 3, 1, 2), # offset, depth, size, rot, velo
cls_branch=(256,),
reg_branch=(
(256,), # offset
(256,), # depth
(256,), # size
(256,), # rot
() # velo
),
dir_branch=(256,),
attr_branch=(256,),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
# For nuScenes we usually do 10-class detection
use_external=True)
_dim_ = 256
_pos_dim_ = _dim_ // 2
_ffn_dim_ = _dim_ * 2
_num_levels_ = 2
bev_h_ = 200
bev_w_ = 200
norm_cfg=dict(type='BN2d', requires_grad=False),
norm_eval=True,
style='caffe',
dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
# original DCNv2 will print log when perform load_state_dict
stage_with_dcn=(False, False, True, True)),
img_neck=dict(
type='FPN',
# alpha=0.25,
# loss_weight=10.0),
use_mask=False,
loss_occ=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
type='HungarianAssigner3D',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
iou_cost=dict(type='IoUCost', weight=0.0),
# Fake cost. This is just to make it compatible with DETR head.
pc_range=point_cloud_range)))))
dataset_type = 'NuSceneOcc'
data_root = 'data/occ3d-nus/'
file_client_args = dict(backend='disk')
occ_gt_data_root = 'data/occ3d-nus'
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='PhotoMetricDistortionMultiViewImage'),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
dict(type='RandomScaleImageMultiViewImage', scales=[0.2]),
dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='PadMultiViewImage', size_divisor=32),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
]
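Each entry in `train_pipeline` is a config dict whose `type` names a registered transform; the framework instantiates them and applies them in order. A minimal stand-in (a sketch with a hypothetical registry — the real project uses mmcv's `Registry` and `Compose`) illustrates the mechanism with `PadMultiViewImage`-style size-divisor padding:

```python
# Hypothetical, minimal transform registry and composer.
TRANSFORMS = {}

def register(cls):
    TRANSFORMS[cls.__name__] = cls
    return cls

@register
class PadDivisor:
    """Pad an image shape up to the nearest multiple of size_divisor."""
    def __init__(self, size_divisor):
        self.size_divisor = size_divisor
    def __call__(self, results):
        d = self.size_divisor
        h, w = results['img_shape']
        results['pad_shape'] = ((h + d - 1) // d * d, (w + d - 1) // d * d)
        return results

def compose(cfgs):
    # Instantiate each transform from a copy of its config dict.
    steps = [TRANSFORMS[c.pop('type')](**c) for c in [dict(c) for c in cfgs]]
    def run(results):
        for step in steps:
            results = step(results)
        return results
    return run

pipeline = compose([dict(type='PadDivisor', size_divisor=32)])
out = pipeline({'img_shape': (900, 1600)})
print(out['pad_shape'])  # (928, 1600)
```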
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(
type='MultiScaleFlipAug3D',
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
# For nuScenes we usually do 10-class detection
use_external=True)
_dim_ = 256
_pos_dim_ = _dim_ // 2
_ffn_dim_ = _dim_ * 2
_num_levels_ = 4
bev_h_ = 200
bev_w_ = 200
norm_cfg=dict(type='BN2d', requires_grad=False),
norm_eval=True,
style='caffe',
dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
# original DCNv2 will print log when perform load_state_dict
stage_with_dcn=(False, False, True, True)),
img_neck=dict(
type='FPN',
# alpha=0.25,
# loss_weight=10.0),
use_mask=False,
loss_occ=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
),
# model training and testing settings
train_cfg=dict(pts=dict(
grid_size=[512, 512, 1],
type='HungarianAssigner3D',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
iou_cost=dict(type='IoUCost', weight=0.0),
# Fake cost. This is just to make it compatible with DETR head.
pc_range=point_cloud_range)))))
dataset_type = 'NuSceneOcc'
data_root = 'data/occ3d-nus/'
file_client_args = dict(backend='disk')
occ_gt_data_root = 'data/occ3d-nus'
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='PhotoMetricDistortionMultiViewImage'),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='PadMultiViewImage', size_divisor=32),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
]
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='PadMultiViewImage', size_divisor=32),
dict(
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
# For nuScenes we usually do 10-class detection
use_external=True)
_dim_ = 256
_pos_dim_ = _dim_ // 2
_ffn_dim_ = _dim_ * 2
_num_levels_ = 4
bev_h_ = 200
bev_w_ = 200
# alpha=0.25,
# loss_weight=10.0),
use_mask=False,
loss_occ=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
),
# model training and testing settings
train_cfg=dict(pts=dict(
grid_size=[512, 512, 1],
type='HungarianAssigner3D',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
iou_cost=dict(type='IoUCost', weight=0.0),
# Fake cost. This is just to make it compatible with DETR head.
pc_range=point_cloud_range)))))
dataset_type = 'NuSceneOcc'
data_root = 'data/occ3d-nus/'
file_client_args = dict(backend='disk')
occ_gt_data_root = 'data/occ3d-nus'
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='PhotoMetricDistortionMultiViewImage'),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='PadMultiViewImage', size_divisor=32),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
]
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='PadMultiViewImage', size_divisor=32),
dict(
constructor='CustomLayerDecayOptimizerConstructor',
paramwise_cfg=dict(
num_layers=33, layer_decay_rate=1.0,
depths=[4, 4, 21, 4]))
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
# For nuScenes we usually do 10-class detection
use_external=True)
_dim_ = 256
_pos_dim_ = _dim_ // 2
_ffn_dim_ = _dim_ * 2
_num_levels_ = 2
bev_h_ = 200
bev_w_ = 200
norm_cfg=dict(type='BN2d', requires_grad=False),
norm_eval=True,
style='caffe',
dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
# original DCNv2 will print log when perform load_state_dict
stage_with_dcn=(False, False, True, True)),
img_neck=dict(
type='FPN',
# alpha=0.25,
# loss_weight=10.0),
use_mask=False,
loss_occ=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
type='HungarianAssigner3D',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
iou_cost=dict(type='IoUCost', weight=0.0),
# Fake cost. This is just to make it compatible with DETR head.
pc_range=point_cloud_range)))))
dataset_type = 'NuSceneOcc'
data_root = 'data/occ3d-nus/'
file_client_args = dict(backend='disk')
occ_gt_data_root = 'data/occ3d-nus'
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='PhotoMetricDistortionMultiViewImage'),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
dict(type='RandomScaleImageMultiViewImage', scales=[0.2]),
dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(type='PadMultiViewImage', size_divisor=32),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='CustomCollect3D', keys=['img', 'voxel_semantics', 'mask_lidar', 'mask_camera'])
]
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=True),
dict(type='LoadOccGTFromFile', data_root=occ_gt_data_root),
dict(type='NormalizeMultiviewImage', **img_norm_cfg),
dict(
type='MultiScaleFlipAug3D',
from .bevformer import *
from .core.bbox.assigners.hungarian_assigner_3d import HungarianAssigner3D
from .core.bbox.coders.nms_free_coder import NMSFreeCoder
from .core.bbox.match_costs import BBox3DL1Cost
from .core.evaluation.eval_hooks import CustomDistEvalHook
from .datasets.pipelines import (CustomCollect3D, NormalizeMultiviewImage,
PadMultiViewImage,
PhotoMetricDistortionMultiViewImage)
from .models.backbones.vovnet import VoVNet
from .models.utils import *
from .models.opt.adamw import AdamW2
from .backbones import *
from .dense_heads import *
from .detectors import *
from .hooks import *
from .modules import *
from .runner import *
from .mmdet_train import custom_train_detector
from .train import custom_train_model
# from .test import custom_multi_gpu_test
# ---------------------------------------------
# Modified by Zhiqi Li
# ---------------------------------------------
import random
import os.path as osp
import time
import warnings
import numpy as np
import torch
import torch.distributed as dist
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
from mmcv.runner import (HOOKS, DistSamplerSeedHook, EpochBasedRunner,
Fp16OptimizerHook, OptimizerHook, build_optimizer,
build_runner)
from mmcv.utils import build_from_cfg
from mmdet.core import EvalHook
from mmdet.datasets import replace_ImageToTensor
from mmdet.utils import get_root_logger
from projects.mmdet3d_plugin.core.evaluation.eval_hooks import \
CustomDistEvalHook
from projects.mmdet3d_plugin.datasets import custom_build_dataset
from projects.mmdet3d_plugin.datasets.builder import build_dataloader
def custom_train_detector(model,
dataset,
cfg,
# prepare data loaders
dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset]
# assert len(dataset)==1s
if 'imgs_per_gpu' in cfg.data:
logger.warning('"imgs_per_gpu" is deprecated in MMDet V2.0. '
'Please use "samples_per_gpu" instead')
eval_model = MMDataParallel(
eval_model.cuda(cfg.gpu_ids[0]), device_ids=cfg.gpu_ids)
# build runner
optimizer = build_optimizer(model, cfg.optimizer)
cfg.get('momentum_config', None))
# register profiler hook
# trace_config = dict(type='tb_trace', dir_name='work_dir')
# profiler_config = dict(on_trace_ready=trace_config)
# runner.register_profiler_hook(profiler_config)
if distributed:
if isinstance(runner, EpochBasedRunner):
)
eval_cfg = cfg.get('evaluation', {})
eval_cfg['by_epoch'] = cfg.runner['type'] != 'IterBasedRunner'
eval_cfg['jsonfile_prefix'] = osp.join('val', cfg.work_dir, time.ctime().replace(' ', '_').replace(':', '_'))
eval_hook = CustomDistEvalHook if distributed else EvalHook
runner.register_hook(eval_hook(val_dataloader, **eval_cfg))
elif cfg.load_from:
runner.load_checkpoint(cfg.load_from)
runner.run(data_loaders, cfg.workflow)
# Modified by Xiaoyu Tian
# ---------------------------------------------
import os.path as osp
import pickle
import shutil
import tempfile
import time
import mmcv
import numpy as np
import pycocotools.mask as mask_util
import torch
import torch.distributed as dist
from mmcv.image import tensor2imgs
from mmcv.runner import get_dist_info
from mmdet.core import encode_mask_results
def custom_encode_mask_results(mask_results):
"""Encode bitmap mask to RLE code. Semantic Masks only
dtype='uint8'))[0]) # encoded with RLE
return [encoded_mask_results]
def custom_multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
"""Test model with multiple gpus.
This method tests model with multiple gpus and collects the results
for i, data in enumerate(data_loader):
with torch.no_grad():
result = model(return_loss=False, rescale=True, **data)
bs = result.shape[0]
assert bs == 1, \
'Evaluation only supports batch_size=1 in this version'
# encode mask results
if isinstance(result, dict):
# batch_size = len(result)
# bbox_results.extend(result)
# if isinstance(result[0], tuple):
# assert False, 'this code is for instance segmentation, which our code will not utilize.'
# result = [(bbox_results, encode_mask_results(mask_results))
# for bbox_results, mask_results in result]
if tmpdir is None:
MAX_LEN = 512
# 32 is whitespace
dir_tensor = torch.full((MAX_LEN,),
32,
dtype=torch.uint8,
device='cuda')
'''
because we change the sampling of the evaluation stage to make sure that each gpu handles a continuous range of samples,
'''
# for res in zip(*part_list):
for res in part_list:
ordered_results.extend(list(res))
# the dataloader may pad some samples
shutil.rmtree(tmpdir)
return ordered_results
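The `dir_tensor` trick above broadcasts the temporary directory name by packing it into a fixed-length buffer padded with byte 32 (ASCII space). A pure-Python sketch of the same encode/decode round trip (illustrative only, without torch or distributed collectives):

```python
MAX_LEN = 512  # same fixed buffer size as above

def pack_dirname(tmpdir: str, max_len: int = MAX_LEN) -> bytes:
    """Pack a directory name into a fixed-length, space-padded buffer."""
    data = tmpdir.encode('ascii')
    assert len(data) <= max_len, 'tmpdir name too long to broadcast'
    return data + b' ' * (max_len - len(data))  # 32 is whitespace

def unpack_dirname(buf: bytes) -> str:
    """Decode and strip the padding to recover the directory name."""
    return buf.decode('ascii').rstrip()

packed = pack_dirname('/tmp/.dist_test')
print(len(packed), unpack_dirname(packed))  # 512 /tmp/.dist_test
```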
def single_gpu_test(model,
data_loader,
show=False,
# Modified by Zhiqi Li
# ---------------------------------------------
from mmdet.apis import train_detector
from mmseg.apis import train_segmentor
from .mmdet_train import custom_train_detector
def custom_train_model(model,
dataset,
from .custom_layer_decay_optimizer_constructor import \
CustomLayerDecayOptimizerConstructor
from .internimage import InternImage
import json
from mmcv.runner import (OPTIMIZER_BUILDERS, DefaultOptimizerConstructor,
get_dist_info)
from mmdet.utils import get_root_logger
def get_num_layer_for_swin(var_name, num_max_layer, depths):
if var_name.startswith('img_backbone.patch_embed'):
return 0
elif 'level_embeds' in var_name:
return 0
elif var_name.startswith('img_backbone.layers') or var_name.startswith(
'img_backbone.levels'):
if var_name.split('.')[3] not in ['downsample', 'norm']:
stage_id = int(var_name.split('.')[2])
layer_id = int(var_name.split('.')[4])
depths = self.paramwise_cfg.get('depths')
offset_lr_scale = self.paramwise_cfg.get('offset_lr_scale', 1.0)
logger.info('Build CustomLayerDecayOptimizerConstructor %f - %d' %
(layer_decay_rate, num_layers))
weight_decay = self.base_wd
for name, param in module.named_parameters():
if not param.requires_grad:
continue # frozen weights
if len(param.shape) == 1 or name.endswith('.bias') or \
'relative_position' in name or \
'norm' in name or \
'sampling_offsets' in name:
group_name = 'no_decay'
this_weight_decay = 0.
else:
group_name = 'decay'
this_weight_decay = weight_decay
layer_id = get_num_layer_for_swin(name, num_layers, depths)
if layer_id == num_layers - 1 and dino_head and \
('sampling_offsets' in name or 'reference_points' in name):
group_name = 'layer_%d_%s_0.1x' % (layer_id, group_name)
elif 'sampling_offsets' in name or 'reference_points' in name:
group_name = 'layer_%d_%s_offset_lr_scale' % (layer_id,
group_name)
else:
group_name = 'layer_%d_%s' % (layer_id, group_name)
if group_name not in parameter_groups:
scale = layer_decay_rate ** (num_layers - layer_id - 1)
if scale < 1 and backbone_small_lr:
scale = scale * 0.1
if '0.1x' in group_name:
scale = scale * 0.1
if 'offset_lr_scale' in group_name:
scale = scale * offset_lr_scale
parameter_groups[group_name] = {
'weight_decay': this_weight_decay,
'params': [],
'param_names': [],
'lr_scale': scale,
'group_name': group_name,
'lr': scale * self.base_lr,
}
parameter_groups[group_name]['params'].append(param)
parameter_groups[group_name]['param_names'].append(name)
rank, _ = get_dist_info()
if rank == 0:
to_display = {}
for key in parameter_groups:
to_display[key] = {
'param_names': parameter_groups[key]['param_names'],
'lr_scale': parameter_groups[key]['lr_scale'],
'lr': parameter_groups[key]['lr'],
'weight_decay': parameter_groups[key]['weight_decay'],
}
logger.info('Param groups = %s' % json.dumps(to_display, indent=2))
# state_dict = module.state_dict()
# for group_name in parameter_groups:
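The grouping above boils down to a per-layer learning-rate scale of `layer_decay_rate ** (num_layers - layer_id - 1)`, optionally shrunk further for offset/reference-point parameters. A standalone sketch of that arithmetic (the helper name is illustrative):

```python
def lr_scale(layer_id, num_layers, layer_decay_rate,
             offset_group=False, offset_lr_scale=1.0):
    """Layer-wise decay: deeper layers (higher layer_id) get larger scales."""
    scale = layer_decay_rate ** (num_layers - layer_id - 1)
    if offset_group:
        scale *= offset_lr_scale  # extra shrink for offset-style params
    return scale

# With decay 0.9 and 4 layers, scales grow toward the last layer:
print([round(lr_scale(i, 4, 0.9), 3) for i in range(4)])  # [0.729, 0.81, 0.9, 1.0]
```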
# Licensed under The MIT License [see LICENSE for details]
# --------------------------------------------------------
from collections import OrderedDict
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.checkpoint as checkpoint
from mmcv.cnn import constant_init, trunc_normal_init
from mmcv.runner import _load_checkpoint
from mmdet.models.builder import BACKBONES
from mmdet.utils import get_root_logger
from timm.models.layers import DropPath, trunc_normal_
from .ops_dcnv3 import modules as opsm
drop=0.,
attn_drop=0.,
drop_path=0.,
norm_layer='LN',
attn_head_dim=None,
out_dim=None):
super().__init__()
self.num_levels = len(depths)
self.depths = depths
self.channels = channels
self.num_features = int(channels * 2 ** (self.num_levels - 1))
self.post_norm = post_norm
self.mlp_ratio = mlp_ratio
self.init_cfg = init_cfg
logger.info(f'using activation layer: {act_layer}')
logger.info(f'using main norm layer: {norm_layer}')
logger.info(f'using dpr: {drop_path_type}, {drop_path_rate}')
logger.info(f'level2_post_norm: {level2_post_norm}')
logger.info(f'level2_post_norm_block_ids: {level2_post_norm_block_ids}')
logger.info(f'res_post_norm: {res_post_norm}')
in_chans = 3
self.patch_embed = StemLayer(in_chans=in_chans,
@@ -612,7 +613,7 @@
i == 2) else None # for InternImage-H/G
level = InternImageBlock(
core_op=getattr(opsm, core_op),
channels=int(channels * 2 ** i),
depth=depths[i],
groups=groups[i],
mlp_ratio=self.mlp_ratio,
@@ -4,16 +4,14 @@
# Licensed under The MIT License [see LICENSE for details]
# --------------------------------------------------------
from __future__ import absolute_import, division, print_function
import DCNv3
import torch
import torch.nn.functional as F
from torch.autograd import Function
from torch.autograd.function import once_differentiable
from torch.cuda.amp import custom_bwd, custom_fwd
class DCNv3Function(Function):
@@ -88,7 +86,9 @@ class DCNv3Function(Function):
im2col_step_i=int(im2col_step),
)
def _get_reference_points(spatial_shapes, device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h=0, pad_w=0,
stride_h=1, stride_w=1):
_, H_, W_, _ = spatial_shapes
H_out = (H_ - (dilation_h * (kernel_h - 1) + 1)) // stride_h + 1
W_out = (W_ - (dilation_w * (kernel_w - 1) + 1)) // stride_w + 1
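The `H_out` / `W_out` lines above are the standard convolution output-size formula, where the effective kernel extent grows with dilation. A pure-Python sketch (the `pad` parameter is my generalization for illustration; in the fragment above, padding is already folded into `H_` and `W_`):

```python
def conv_out_size(size, kernel, dilation=1, stride=1, pad=0):
    # Effective kernel extent grows with dilation; with pad=0 this mirrors
    # the H_out / W_out computation in _get_reference_points above.
    effective = dilation * (kernel - 1) + 1
    return (size + 2 * pad - effective) // stride + 1

# A 3x3 kernel, stride 1, no dilation: a 32-pixel side shrinks by 2.
print(conv_out_size(32, kernel=3))              # 30
# Dilation 2 widens the effective kernel to 5, so the side shrinks by 4.
print(conv_out_size(32, kernel=3, dilation=2))  # 28
```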
@@ -137,7 +137,7 @@ def _generate_dilation_grids(spatial_shapes, kernel_h, kernel_w, dilation_h, dil
device=device))
points_list.extend([x / W_, y / H_])
grid = torch.stack(points_list, -1).reshape(-1, 1, 2). \
repeat(1, group, 1).permute(1, 0, 2)
grid = grid.reshape(1, 1, 1, group * kernel_h * kernel_w, 2)
@@ -161,8 +161,8 @@ def dcnv3_core_pytorch(
input.shape, input.device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h, pad_w, stride_h, stride_w)
grid = _generate_dilation_grids(
input.shape, kernel_h, kernel_w, dilation_h, dilation_w, group, input.device)
spatial_norm = torch.tensor([W_in, H_in]).reshape(1, 1, 1, 2). \
repeat(1, 1, 1, group * kernel_h * kernel_w).to(input.device)
sampling_locations = (ref + grid * offset_scale).repeat(N_, 1, 1, 1, 1).flatten(3, 4) + \
offset * offset_scale / spatial_norm
@@ -170,19 +170,19 @@ def dcnv3_core_pytorch(
P_ = kernel_h * kernel_w
sampling_grids = 2 * sampling_locations - 1
# N_, H_in, W_in, group*group_channels -> N_, H_in*W_in, group*group_channels -> N_, group*group_channels, H_in*W_in -> N_*group, group_channels, H_in, W_in
input_ = input.view(N_, H_in * W_in, group * group_channels).transpose(1, 2). \
reshape(N_ * group, group_channels, H_in, W_in)
# N_, H_out, W_out, group*P_*2 -> N_, H_out*W_out, group, P_, 2 -> N_, group, H_out*W_out, P_, 2 -> N_*group, H_out*W_out, P_, 2
sampling_grid_ = sampling_grids.view(N_, H_out * W_out, group, P_, 2).transpose(1, 2). \
flatten(0, 1)
# N_*group, group_channels, H_out*W_out, P_
sampling_input_ = F.grid_sample(
input_, sampling_grid_, mode='bilinear', padding_mode='zeros', align_corners=False)
# (N_, H_out, W_out, group*P_) -> N_, H_out*W_out, group, P_ -> (N_, group, H_out*W_out, P_) -> (N_*group, 1, H_out*W_out, P_)
mask = mask.view(N_, H_out * W_out, group, P_).transpose(1, 2). \
reshape(N_ * group, 1, H_out * W_out, P_)
output = (sampling_input_ * mask).sum(-1).view(N_,
group * group_channels, H_out * W_out)
return output.transpose(1, 2).reshape(N_, H_out, W_out, -1).contiguous()
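In `dcnv3_core_pytorch` above, sampling locations are first normalized to [0, 1] via `spatial_norm`, mapped to `F.grid_sample`'s [-1, 1] convention with `2 * sampling_locations - 1`, and then sampled bilinearly with `align_corners=False` and zero padding. A pure-Python stand-in for that per-location sampling step (a sketch of `grid_sample`'s semantics on a single 2D channel, not the real batched kernel):

```python
import math

def bilinear_sample(img, gx, gy):
    # Pure-Python stand-in for one lookup of F.grid_sample(..., mode='bilinear',
    # padding_mode='zeros', align_corners=False) on a single 2D channel.
    # gx, gy are normalized coordinates in [-1, 1], as produced by
    # `2 * sampling_locations - 1` above.
    H, W = len(img), len(img[0])
    # align_corners=False: normalized coordinate -> continuous pixel coordinate
    x = ((gx + 1) * W - 1) / 2
    y = ((gy + 1) * H - 1) / 2
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    val = 0.0
    for xi, yi in ((x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)):
        w = (1 - abs(x - xi)) * (1 - abs(y - yi))
        if 0 <= xi < W and 0 <= yi < H:  # out-of-bounds taps read zero padding
            val += w * img[yi][xi]
    return val

img = [[4.0, 1.0],
       [2.0, 3.0]]
# Sampling at the exact center averages all four pixels.
print(bilinear_sample(img, 0.0, 0.0))  # 2.5
```

The modulated deformable aggregation then multiplies each sampled value by its `mask` weight and sums over the `kernel_h * kernel_w` sampling points, which is what the `(sampling_input_ * mask).sum(-1)` line above does in batched form.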