Commit 85529f35 authored by unknown

Add OpenMMLab test cases

parent b21b0c01
# Feature Pyramid Grids
## Introduction
```latex
@article{chen2020feature,
title={Feature pyramid grids},
author={Chen, Kai and Cao, Yuhang and Loy, Chen Change and Lin, Dahua and Feichtenhofer, Christoph},
journal={arXiv preprint arXiv:2004.03580},
year={2020}
}
```
## Results and Models
We benchmark FPG with the training schedule introduced in NAS-FPN (crop training, large batch, unfrozen BN, 50 epochs).
All backbones are ResNet-50 in PyTorch style.
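As a rough illustration of the 50-epoch schedule, the sketch below reproduces the learning-rate curve implied by the config values in this directory (`lr=0.08`, `step=[30, 40]`, `warmup_iters=1000`, `warmup_ratio=0.1`). The helper `lr_at` is hypothetical, the decay factor `gamma=0.1` is an assumption (it is not set in these configs), and the warmup formula is only an approximation of a linear warmup, not the library's code.

```python
def lr_at(epoch, cur_iter, base_lr=0.08, steps=(30, 40), gamma=0.1,
          warmup_iters=1000, warmup_ratio=0.1):
    """Approximate learning rate at a 0-based epoch and global iteration.

    gamma=0.1 is an assumed decay factor; the configs here do not set it.
    """
    # Step decay: multiply by gamma once for every milestone already reached.
    passed = sum(1 for s in steps if epoch >= s)
    lr = base_lr * gamma ** passed
    # Linear warmup: ramp from warmup_ratio * lr up to lr over warmup_iters.
    if cur_iter < warmup_iters:
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        lr = lr * (1 - k)
    return lr


if __name__ == "__main__":
    # The iteration counts below are arbitrary illustrative values.
    for epoch, it in [(0, 0), (0, 500), (10, 5000), (30, 60000), (45, 90000)]:
        print(f"epoch {epoch:2d}, iter {it:6d}: lr = {lr_at(epoch, it):.5f}")
```

Under these assumptions the rate starts at one tenth of the base value, reaches 0.08 after 1000 iterations, and drops by a factor of ten at epochs 30 and 40.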
| Method | Neck | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download |
|:------------:|:-----------:|:-------:|:--------:|:--------------:|:------:|:-------:|:-------:|:--------:|
| Faster R-CNN | FPG | 50e | 20.0 | - | 42.2 | - |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fpg/faster_rcnn_r50_fpg_crop640_50e_coco.py) |[model](https://download.openmmlab.com/mmdetection/v2.0/fpg/faster_rcnn_r50_fpg_crop640_50e_coco/faster_rcnn_r50_fpg_crop640_50e_coco-76220505.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/fpg/faster_rcnn_r50_fpg_crop640_50e_coco/20210218_223520.log.json) |
| Faster R-CNN | FPG-chn128 | 50e | 11.9 | - | 41.2 | - |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fpg/faster_rcnn_r50_fpg-chn128_crop640_50e_coco.py) |[model](https://download.openmmlab.com/mmdetection/v2.0/fpg/faster_rcnn_r50_fpg-chn128_crop640_50e_coco/faster_rcnn_r50_fpg-chn128_crop640_50e_coco-24257de9.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/fpg/faster_rcnn_r50_fpg-chn128_crop640_50e_coco/20210218_221412.log.json) |
| Mask R-CNN | FPG | 50e | 23.2 | - | 42.7 | 37.8 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fpg/mask_rcnn_r50_fpg_crop640_50e_coco.py) |[model](https://download.openmmlab.com/mmdetection/v2.0/fpg/mask_rcnn_r50_fpg_crop640_50e_coco/mask_rcnn_r50_fpg_crop640_50e_coco-c5860453.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/fpg/mask_rcnn_r50_fpg_crop640_50e_coco/20210222_205447.log.json) |
| Mask R-CNN | FPG-chn128 | 50e | 15.3 | - | 41.7 | 36.9 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fpg/mask_rcnn_r50_fpg-chn128_crop640_50e_coco.py) |[model](https://download.openmmlab.com/mmdetection/v2.0/fpg/mask_rcnn_r50_fpg-chn128_crop640_50e_coco/mask_rcnn_r50_fpg-chn128_crop640_50e_coco-5c6ea10d.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/fpg/mask_rcnn_r50_fpg-chn128_crop640_50e_coco/20210223_025039.log.json) |
| RetinaNet | FPG | 50e | 20.8 | - | 40.5 | - |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fpg/retinanet_r50_fpg_crop640_50e_coco.py) |[model](https://download.openmmlab.com/mmdetection/v2.0/fpg/retinanet_r50_fpg_crop640_50e_coco/retinanet_r50_fpg_crop640_50e_coco-46fdd1c6.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/fpg/retinanet_r50_fpg_crop640_50e_coco/20210225_143957.log.json) |
| RetinaNet | FPG-chn128 | 50e | 19.9 | - | 40.3 | - |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fpg/retinanet_r50_fpg-chn128_crop640_50e_coco.py) |[model](https://download.openmmlab.com/mmdetection/v2.0/fpg/retinanet_r50_fpg-chn128_crop640_50e_coco/retinanet_r50_fpg-chn128_crop640_50e_coco-5cf33c76.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/fpg/retinanet_r50_fpg-chn128_crop640_50e_coco/20210225_184328.log.json) |
**Note**: `chn128` means reducing the number of channels of features and convs from 256 (default) to 128 in
the neck and bbox head, which substantially reduces memory consumption at a small cost in precision.
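To make the trade-off in the note concrete, the following sketch computes the relative memory saving and the box AP drop for each detector. The numbers are copied from the table above; the helper name `chn128_tradeoff` is made up for this illustration.

```python
# (method, neck) -> (training memory in GB, box AP), copied from the table above.
results = {
    ("Faster R-CNN", "FPG"): (20.0, 42.2),
    ("Faster R-CNN", "FPG-chn128"): (11.9, 41.2),
    ("Mask R-CNN", "FPG"): (23.2, 42.7),
    ("Mask R-CNN", "FPG-chn128"): (15.3, 41.7),
    ("RetinaNet", "FPG"): (20.8, 40.5),
    ("RetinaNet", "FPG-chn128"): (19.9, 40.3),
}


def chn128_tradeoff(method):
    """Return (relative memory saving in %, box AP drop) for one detector."""
    mem_full, ap_full = results[(method, "FPG")]
    mem_half, ap_half = results[(method, "FPG-chn128")]
    saving = 100.0 * (mem_full - mem_half) / mem_full
    return round(saving, 1), round(ap_full - ap_half, 1)


for method in ("Faster R-CNN", "Mask R-CNN", "RetinaNet"):
    saving, ap_drop = chn128_tradeoff(method)
    print(f"{method}: {saving}% less memory, {ap_drop} box AP lower")
```

On these figures the saving is largest for the two-stage detectors (roughly 34 to 41 percent for about 1.0 box AP), while for RetinaNet both the saving and the AP drop are small.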
_base_ = 'faster_rcnn_r50_fpg_crop640_50e_coco.py'
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    neck=dict(out_channels=128, inter_channels=128),
    rpn_head=dict(in_channels=128),
    roi_head=dict(
        bbox_roi_extractor=dict(out_channels=128),
        bbox_head=dict(in_channels=128)))
_base_ = 'faster_rcnn_r50_fpn_crop640_50e_coco.py'
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    neck=dict(
        type='FPG',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        inter_channels=256,
        num_outs=5,
        stack_times=9,
        paths=['bu'] * 9,
        same_down_trans=None,
        same_up_trans=dict(
            type='conv',
            kernel_size=3,
            stride=2,
            padding=1,
            norm_cfg=norm_cfg,
            inplace=False,
            order=('act', 'conv', 'norm')),
        across_lateral_trans=dict(
            type='conv',
            kernel_size=1,
            norm_cfg=norm_cfg,
            inplace=False,
            order=('act', 'conv', 'norm')),
        across_down_trans=dict(
            type='interpolation_conv',
            mode='nearest',
            kernel_size=3,
            norm_cfg=norm_cfg,
            order=('act', 'conv', 'norm'),
            inplace=False),
        across_up_trans=None,
        across_skip_trans=dict(
            type='conv',
            kernel_size=1,
            norm_cfg=norm_cfg,
            inplace=False,
            order=('act', 'conv', 'norm')),
        output_trans=dict(
            type='last_conv',
            kernel_size=3,
            order=('act', 'conv', 'norm'),
            inplace=False),
        norm_cfg=norm_cfg,
        skip_inds=[(0, 1, 2, 3), (0, 1, 2), (0, 1), (0, ), ()]))
_base_ = [
    '../_base_/models/faster_rcnn_r50_fpn.py',
    '../_base_/datasets/coco_detection.py',
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    backbone=dict(norm_cfg=norm_cfg, norm_eval=False),
    neck=dict(norm_cfg=norm_cfg),
    roi_head=dict(bbox_head=dict(norm_cfg=norm_cfg)))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(
        type='Resize',
        img_scale=(640, 640),
        ratio_range=(0.8, 1.2),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=(640, 640)),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=(640, 640)),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(640, 640),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=64),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD',
    lr=0.08,
    momentum=0.9,
    weight_decay=0.0001,
    paramwise_cfg=dict(norm_decay_mult=0, bypass_duplicate=True))
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.1,
    step=[30, 40])
# runtime settings
runner = dict(max_epochs=50)
evaluation = dict(interval=2)
_base_ = 'mask_rcnn_r50_fpg_crop640_50e_coco.py'
model = dict(
    neck=dict(out_channels=128, inter_channels=128),
    rpn_head=dict(in_channels=128),
    roi_head=dict(
        bbox_roi_extractor=dict(out_channels=128),
        bbox_head=dict(in_channels=128),
        mask_roi_extractor=dict(out_channels=128),
        mask_head=dict(in_channels=128)))
_base_ = 'mask_rcnn_r50_fpn_crop640_50e_coco.py'
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    neck=dict(
        type='FPG',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        inter_channels=256,
        num_outs=5,
        stack_times=9,
        paths=['bu'] * 9,
        same_down_trans=None,
        same_up_trans=dict(
            type='conv',
            kernel_size=3,
            stride=2,
            padding=1,
            norm_cfg=norm_cfg,
            inplace=False,
            order=('act', 'conv', 'norm')),
        across_lateral_trans=dict(
            type='conv',
            kernel_size=1,
            norm_cfg=norm_cfg,
            inplace=False,
            order=('act', 'conv', 'norm')),
        across_down_trans=dict(
            type='interpolation_conv',
            mode='nearest',
            kernel_size=3,
            norm_cfg=norm_cfg,
            order=('act', 'conv', 'norm'),
            inplace=False),
        across_up_trans=None,
        across_skip_trans=dict(
            type='conv',
            kernel_size=1,
            norm_cfg=norm_cfg,
            inplace=False,
            order=('act', 'conv', 'norm')),
        output_trans=dict(
            type='last_conv',
            kernel_size=3,
            order=('act', 'conv', 'norm'),
            inplace=False),
        norm_cfg=norm_cfg,
        skip_inds=[(0, 1, 2, 3), (0, 1, 2), (0, 1), (0, ), ()]))
_base_ = [
    '../_base_/models/mask_rcnn_r50_fpn.py',
    '../_base_/datasets/coco_instance.py',
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    backbone=dict(norm_cfg=norm_cfg, norm_eval=False),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        norm_cfg=norm_cfg,
        num_outs=5),
    roi_head=dict(
        bbox_head=dict(norm_cfg=norm_cfg), mask_head=dict(norm_cfg=norm_cfg)))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(
        type='Resize',
        img_scale=(640, 640),
        ratio_range=(0.8, 1.2),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=(640, 640)),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=(640, 640)),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(640, 640),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=64),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD',
    lr=0.08,
    momentum=0.9,
    weight_decay=0.0001,
    paramwise_cfg=dict(norm_decay_mult=0, bypass_duplicate=True))
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.1,
    step=[30, 40])
# runtime settings
runner = dict(max_epochs=50)
evaluation = dict(interval=2)
Collections:
  - Name: Feature Pyramid Grids
    Metadata:
      Training Data: COCO
      Training Techniques:
        - SGD with Momentum
        - Weight Decay
      Training Resources: 8x NVIDIA V100 GPUs
      Architecture:
        - Feature Pyramid Grids
    Paper: https://arxiv.org/abs/2004.03580
    README: configs/fpg/README.md
Models:
  - Name: faster_rcnn_r50_fpg_crop640_50e_coco
    In Collection: Feature Pyramid Grids
    Config: configs/fpg/faster_rcnn_r50_fpg_crop640_50e_coco.py
    Metadata:
      Training Memory (GB): 20.0
      Epochs: 50
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 42.2
    Weights: https://download.openmmlab.com/mmdetection/v2.0/fpg/faster_rcnn_r50_fpg_crop640_50e_coco/faster_rcnn_r50_fpg_crop640_50e_coco-76220505.pth
  - Name: faster_rcnn_r50_fpg-chn128_crop640_50e_coco
    In Collection: Feature Pyramid Grids
    Config: configs/fpg/faster_rcnn_r50_fpg-chn128_crop640_50e_coco.py
    Metadata:
      Training Memory (GB): 11.9
      Epochs: 50
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 41.2
    Weights: https://download.openmmlab.com/mmdetection/v2.0/fpg/faster_rcnn_r50_fpg-chn128_crop640_50e_coco/faster_rcnn_r50_fpg-chn128_crop640_50e_coco-24257de9.pth
  - Name: mask_rcnn_r50_fpg_crop640_50e_coco
    In Collection: Feature Pyramid Grids
    Config: configs/fpg/mask_rcnn_r50_fpg_crop640_50e_coco.py
    Metadata:
      Training Memory (GB): 23.2
      Epochs: 50
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 42.7
      - Task: Instance Segmentation
        Dataset: COCO
        Metrics:
          mask AP: 37.8
    Weights: https://download.openmmlab.com/mmdetection/v2.0/fpg/mask_rcnn_r50_fpg_crop640_50e_coco/mask_rcnn_r50_fpg_crop640_50e_coco-c5860453.pth
  - Name: mask_rcnn_r50_fpg-chn128_crop640_50e_coco
    In Collection: Feature Pyramid Grids
    Config: configs/fpg/mask_rcnn_r50_fpg-chn128_crop640_50e_coco.py
    Metadata:
      Training Memory (GB): 15.3
      Epochs: 50
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 41.7
      - Task: Instance Segmentation
        Dataset: COCO
        Metrics:
          mask AP: 36.9
    Weights: https://download.openmmlab.com/mmdetection/v2.0/fpg/mask_rcnn_r50_fpg-chn128_crop640_50e_coco/mask_rcnn_r50_fpg-chn128_crop640_50e_coco-5c6ea10d.pth
  - Name: retinanet_r50_fpg_crop640_50e_coco
    In Collection: Feature Pyramid Grids
    Config: configs/fpg/retinanet_r50_fpg_crop640_50e_coco.py
    Metadata:
      Training Memory (GB): 20.8
      Epochs: 50
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 40.5
    Weights: https://download.openmmlab.com/mmdetection/v2.0/fpg/retinanet_r50_fpg_crop640_50e_coco/retinanet_r50_fpg_crop640_50e_coco-46fdd1c6.pth
  - Name: retinanet_r50_fpg-chn128_crop640_50e_coco
    In Collection: Feature Pyramid Grids
    Config: configs/fpg/retinanet_r50_fpg-chn128_crop640_50e_coco.py
    Metadata:
      Training Memory (GB): 19.9
      Epochs: 50
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 40.3
    Weights: https://download.openmmlab.com/mmdetection/v2.0/fpg/retinanet_r50_fpg-chn128_crop640_50e_coco/retinanet_r50_fpg-chn128_crop640_50e_coco-5cf33c76.pth
_base_ = 'retinanet_r50_fpg_crop640_50e_coco.py'
model = dict(
    neck=dict(out_channels=128, inter_channels=128),
    bbox_head=dict(in_channels=128))
_base_ = '../nas_fpn/retinanet_r50_nasfpn_crop640_50e_coco.py'
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    neck=dict(
        _delete_=True,
        type='FPG',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        inter_channels=256,
        num_outs=5,
        add_extra_convs=True,
        start_level=1,
        stack_times=9,
        paths=['bu'] * 9,
        same_down_trans=None,
        same_up_trans=dict(
            type='conv',
            kernel_size=3,
            stride=2,
            padding=1,
            norm_cfg=norm_cfg,
            inplace=False,
            order=('act', 'conv', 'norm')),
        across_lateral_trans=dict(
            type='conv',
            kernel_size=1,
            norm_cfg=norm_cfg,
            inplace=False,
            order=('act', 'conv', 'norm')),
        across_down_trans=dict(
            type='interpolation_conv',
            mode='nearest',
            kernel_size=3,
            norm_cfg=norm_cfg,
            order=('act', 'conv', 'norm'),
            inplace=False),
        across_up_trans=None,
        across_skip_trans=dict(
            type='conv',
            kernel_size=1,
            norm_cfg=norm_cfg,
            inplace=False,
            order=('act', 'conv', 'norm')),
        output_trans=dict(
            type='last_conv',
            kernel_size=3,
            order=('act', 'conv', 'norm'),
            inplace=False),
        norm_cfg=norm_cfg,
        skip_inds=[(0, 1, 2, 3), (0, 1, 2), (0, 1), (0, ), ()]))
evaluation = dict(interval=2)
# FreeAnchor: Learning to Match Anchors for Visual Object Detection
## Introduction
<!-- [ALGORITHM] -->
```latex
@inproceedings{zhang2019freeanchor,
title = {{FreeAnchor}: Learning to Match Anchors for Visual Object Detection},
author = {Zhang, Xiaosong and Wan, Fang and Liu, Chang and Ji, Rongrong and Ye, Qixiang},
booktitle = {Neural Information Processing Systems},
year = {2019}
}
```
## Results and Models
| Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download |
|:--------:|:-------:|:-------:|:--------:|:--------------:|:------:|:------:|:--------:|
| R-50 | pytorch | 1x | 4.9 | 18.4 | 38.7 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/free_anchor/retinanet_free_anchor_r50_fpn_1x_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/free_anchor/retinanet_free_anchor_r50_fpn_1x_coco/retinanet_free_anchor_r50_fpn_1x_coco_20200130-0f67375f.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/free_anchor/retinanet_free_anchor_r50_fpn_1x_coco/retinanet_free_anchor_r50_fpn_1x_coco_20200130_095625.log.json) |
| R-101 | pytorch | 1x | 6.8 | 14.9 | 40.3 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/free_anchor/retinanet_free_anchor_r101_fpn_1x_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/free_anchor/retinanet_free_anchor_r101_fpn_1x_coco/retinanet_free_anchor_r101_fpn_1x_coco_20200130-358324e6.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/free_anchor/retinanet_free_anchor_r101_fpn_1x_coco/retinanet_free_anchor_r101_fpn_1x_coco_20200130_100723.log.json) |
| X-101-32x4d | pytorch | 1x | 8.1 | 11.1 | 41.9 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/free_anchor/retinanet_free_anchor_x101_32x4d_fpn_1x_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/free_anchor/retinanet_free_anchor_x101_32x4d_fpn_1x_coco/retinanet_free_anchor_x101_32x4d_fpn_1x_coco_20200130-d4846968.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/free_anchor/retinanet_free_anchor_x101_32x4d_fpn_1x_coco/retinanet_free_anchor_x101_32x4d_fpn_1x_coco_20200130_095627.log.json) |
**Notes:**
- We use 8 GPUs with 2 images/GPU.
- For more settings and models, please refer to the [official repo](https://github.com/zhangxiaosong18/FreeAnchor).
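The table above reports inference speed in frames per second, while the metafile for these models records `inference time (s/im)`. A minimal sketch of the conversion between the two, using a hypothetical helper `seconds_per_image` and assuming the metafile values are simply the reciprocal rounded to five decimals:

```python
def seconds_per_image(fps, ndigits=5):
    """Convert throughput (frames per second) to latency (seconds per image)."""
    return round(1.0 / fps, ndigits)


# "Inf time (fps)" values copied from the table above.
for backbone, fps in [("R-50", 18.4), ("R-101", 14.9), ("X-101-32x4d", 11.1)]:
    print(backbone, seconds_per_image(fps))
```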
Collections:
  - Name: FreeAnchor
    Metadata:
      Training Data: COCO
      Training Techniques:
        - SGD with Momentum
        - Weight Decay
      Training Resources: 8x NVIDIA V100 GPUs
      Architecture:
        - FreeAnchor
        - ResNet
    Paper: https://arxiv.org/abs/1909.02466
    README: configs/free_anchor/README.md
Models:
  - Name: retinanet_free_anchor_r50_fpn_1x_coco
    In Collection: FreeAnchor
    Config: configs/free_anchor/retinanet_free_anchor_r50_fpn_1x_coco.py
    Metadata:
      Training Memory (GB): 4.9
      inference time (s/im): 0.05435
      Epochs: 12
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 38.7
    Weights: https://download.openmmlab.com/mmdetection/v2.0/free_anchor/retinanet_free_anchor_r50_fpn_1x_coco/retinanet_free_anchor_r50_fpn_1x_coco_20200130-0f67375f.pth
  - Name: retinanet_free_anchor_r101_fpn_1x_coco
    In Collection: FreeAnchor
    Config: configs/free_anchor/retinanet_free_anchor_r101_fpn_1x_coco.py
    Metadata:
      Training Memory (GB): 6.8
      inference time (s/im): 0.06711
      Epochs: 12
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 40.3
    Weights: https://download.openmmlab.com/mmdetection/v2.0/free_anchor/retinanet_free_anchor_r101_fpn_1x_coco/retinanet_free_anchor_r101_fpn_1x_coco_20200130-358324e6.pth
  - Name: retinanet_free_anchor_x101_32x4d_fpn_1x_coco
    In Collection: FreeAnchor
    Config: configs/free_anchor/retinanet_free_anchor_x101_32x4d_fpn_1x_coco.py
    Metadata:
      Training Memory (GB): 8.1
      inference time (s/im): 0.09009
      Epochs: 12
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 41.9
    Weights: https://download.openmmlab.com/mmdetection/v2.0/free_anchor/retinanet_free_anchor_x101_32x4d_fpn_1x_coco/retinanet_free_anchor_x101_32x4d_fpn_1x_coco_20200130-d4846968.pth
_base_ = './retinanet_free_anchor_r50_fpn_1x_coco.py'
model = dict(pretrained='torchvision://resnet101', backbone=dict(depth=101))
_base_ = '../retinanet/retinanet_r50_fpn_1x_coco.py'
model = dict(
    bbox_head=dict(
        _delete_=True,
        type='FreeAnchorRetinaHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=4,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.75)))
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
_base_ = './retinanet_free_anchor_r50_fpn_1x_coco.py'
model = dict(
    pretrained='open-mmlab://resnext101_32x4d',
    backbone=dict(
        type='ResNeXt',
        depth=101,
        groups=32,
        base_width=4,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'))
# Feature Selective Anchor-Free Module for Single-Shot Object Detection
<!-- [ALGORITHM] -->
FSAF is an anchor-free method published at CVPR 2019 ([https://arxiv.org/pdf/1903.00621.pdf](https://arxiv.org/pdf/1903.00621.pdf)).
It is equivalent to an anchor-based method with only one anchor at each feature map position in each FPN level, and that is how it is implemented here.
Only the anchor-free branch is released, because it is more compatible with the current framework and has a lower computational cost.
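The "one anchor per position" equivalence can be illustrated by counting the anchors that a given generator setting yields at a single feature-map location. The sketch below is a simplified, standalone re-implementation written for this illustration (the function `base_anchor_sizes` is hypothetical and does not call MMDetection); the parameter values are taken from the `anchor_generator` entries in the FSAF and FreeAnchor configs in this commit.

```python
import math


def base_anchor_sizes(stride, octave_base_scale, scales_per_octave, ratios):
    """List the (width, height) of every anchor placed at one position.

    Simplified for illustration: the anchor base size is assumed to equal
    the stride of the FPN level.
    """
    scales = [octave_base_scale * 2 ** (i / scales_per_octave)
              for i in range(scales_per_octave)]
    sizes = []
    for ratio in ratios:  # ratio is interpreted as height / width
        h_ratio = math.sqrt(ratio)
        w_ratio = 1.0 / h_ratio
        for scale in scales:
            sizes.append((stride * scale * w_ratio, stride * scale * h_ratio))
    return sizes


# RetinaNet-style settings (as in the FreeAnchor config): 3 scales x 3 ratios.
retina = base_anchor_sizes(8, octave_base_scale=4, scales_per_octave=3,
                           ratios=[0.5, 1.0, 2.0])
# FSAF settings: a single square anchor whose side equals the stride.
fsaf = base_anchor_sizes(8, octave_base_scale=1, scales_per_octave=1,
                         ratios=[1.0])
print(len(retina), len(fsaf), fsaf)
```

With the FSAF settings each location carries exactly one square anchor, which then serves only as a stand-in for the feature-map grid point.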
In the original paper, feature maps within the central 0.2-0.5 area of a gt box are tagged as ignored. However,
we found empirically that a hard threshold (0.2-0.2) further improves performance (see the table below).
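The difference between the 0.2-0.5 and 0.2-0.2 ignore ranges can be sketched as follows. This is a simplified illustration under stated assumptions, not the assigner used in the codebase: the helpers `scale_box`, `inside` and `label_point` are hypothetical, a single gt box is considered, and overlaps between objects are not handled. The parameter names `pos_scale` and `neg_scale` mirror those in the FSAF config.

```python
def scale_box(box, scale):
    """Shrink an (x1, y1, x2, y2) box around its center by `scale`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def inside(point, box):
    """Return True if the (x, y) point lies within the box (inclusive)."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def label_point(point, gt_box, pos_scale=0.2, neg_scale=0.2):
    """Label a feature-map point as 'positive', 'ignored' or 'negative'."""
    if inside(point, scale_box(gt_box, pos_scale)):
        return "positive"
    # Points between the pos_scale and neg_scale regions are ignored.
    if inside(point, scale_box(gt_box, neg_scale)):
        return "ignored"
    return "negative"


gt = (0.0, 0.0, 100.0, 100.0)
print(label_point((50, 50), gt, 0.2, 0.5))  # center of the box
print(label_point((70, 50), gt, 0.2, 0.5))  # inside the 0.2-0.5 ring
print(label_point((70, 50), gt, 0.2, 0.2))  # same point, hard threshold
```

With `pos_scale == neg_scale` the ignored ring is empty, so every point outside the central positive region is treated as a negative.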
## Main Results
### Results on R50/R101/X101-FPN
| Backbone | ignore range | ms-train| Lr schd |Train Mem (GB)| Train time (s/iter) | Inf time (fps) | box AP | Config | Download |
|:----------:| :-------: |:-------:|:-------:|:------------:|:---------------:|:--------------:|:-------------:|:------:|:--------:|
| R-50 | 0.2-0.5 | N | 1x | 3.15 | 0.43 | 12.3 | 36.0 (35.9) | | [model](https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_pscale0.2_nscale0.5_r50_fpn_1x_coco/fsaf_pscale0.2_nscale0.5_r50_fpn_1x_coco_20200715-b555b0e0.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_pscale0.2_nscale0.5_r50_fpn_1x_coco/fsaf_pscale0.2_nscale0.5_r50_fpn_1x_coco_20200715_094657.log.json) |
| R-50 | 0.2-0.2 | N | 1x | 3.15 | 0.43 | 13.0 | 37.4 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fsaf/fsaf_r50_fpn_1x_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_r50_fpn_1x_coco/fsaf_r50_fpn_1x_coco-94ccc51f.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_r50_fpn_1x_coco/fsaf_r50_fpn_1x_coco_20200428_072327.log.json)|
| R-101 | 0.2-0.2 | N | 1x | 5.08 | 0.58 | 10.8 | 39.3 (37.9) | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fsaf/fsaf_r101_fpn_1x_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_r101_fpn_1x_coco/fsaf_r101_fpn_1x_coco-9e71098f.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_r101_fpn_1x_coco/fsaf_r101_fpn_1x_coco_20200428_160348.log.json)|
| X-101 | 0.2-0.2 | N | 1x | 9.38 | 1.23 | 5.6 | 42.4 (41.0) | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fsaf/fsaf_x101_64x4d_fpn_1x_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_x101_64x4d_fpn_1x_coco/fsaf_x101_64x4d_fpn_1x_coco-e3f6e6fd.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_x101_64x4d_fpn_1x_coco/fsaf_x101_64x4d_fpn_1x_coco_20200428_160424.log.json)|
**Notes:**
- *1x means the model is trained for 12 epochs.*
- *AP values in the brackets represent those reported in the original paper.*
- *All results are obtained with a single model and single-scale test.*
- *X-101 backbone represents ResNeXt-101-64x4d.*
- *All pretrained backbones use pytorch style.*
- *All models are trained on 8 Titan-XP GPUs and tested on a single GPU.*
## Citations
BibTeX reference is as follows.
```latex
@inproceedings{zhu2019feature,
title={Feature Selective Anchor-Free Module for Single-Shot Object Detection},
author={Zhu, Chenchen and He, Yihui and Savvides, Marios},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={840--849},
year={2019}
}
```
_base_ = './fsaf_r50_fpn_1x_coco.py'
model = dict(pretrained='torchvision://resnet101', backbone=dict(depth=101))
_base_ = '../retinanet/retinanet_r50_fpn_1x_coco.py'
# model settings
model = dict(
    type='FSAF',
    bbox_head=dict(
        type='FSAFHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        reg_decoded_bbox=True,
        # Only anchor-free branch is implemented. The anchor generator only
        # generates 1 anchor at each feature point, as a substitute of the
        # grid of features.
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=1,
            scales_per_octave=1,
            ratios=[1.0],
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(_delete_=True, type='TBLRBBoxCoder', normalizer=4.0),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0,
            reduction='none'),
        loss_bbox=dict(
            _delete_=True,
            type='IoULoss',
            eps=1e-6,
            loss_weight=1.0,
            reduction='none')),
    # training and testing settings
    train_cfg=dict(
        assigner=dict(
            _delete_=True,
            type='CenterRegionAssigner',
            pos_scale=0.2,
            neg_scale=0.2,
            min_pos_iof=0.01),
        allowed_border=-1,
        pos_weight=-1,
        debug=False))
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=10, norm_type=2))
_base_ = './fsaf_r50_fpn_1x_coco.py'
model = dict(
    pretrained='open-mmlab://resnext101_64x4d',
    backbone=dict(
        type='ResNeXt',
        depth=101,
        groups=64,
        base_width=4,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        style='pytorch'))
Collections:
  - Name: FSAF
    Metadata:
      Training Data: COCO
      Training Techniques:
        - SGD with Momentum
        - Weight Decay
      Training Resources: 8x NVIDIA Titan-XP GPUs
      Architecture:
        - FPN
        - FSAF
        - ResNet
    Paper: https://arxiv.org/abs/1903.00621
    README: configs/fsaf/README.md
Models:
  - Name: fsaf_r50_fpn_1x_coco
    In Collection: FSAF
    Config: configs/fsaf/fsaf_r50_fpn_1x_coco.py
    Metadata:
      Training Memory (GB): 3.15
      inference time (s/im): 0.07692
      Epochs: 12
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 37.4
    Weights: https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_r50_fpn_1x_coco/fsaf_r50_fpn_1x_coco-94ccc51f.pth
  - Name: fsaf_r101_fpn_1x_coco
    In Collection: FSAF
    Config: configs/fsaf/fsaf_r101_fpn_1x_coco.py
    Metadata:
      Training Memory (GB): 5.08
      inference time (s/im): 0.09259
      Epochs: 12
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 39.3
    Weights: https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_r101_fpn_1x_coco/fsaf_r101_fpn_1x_coco-9e71098f.pth
  - Name: fsaf_x101_64x4d_fpn_1x_coco
    In Collection: FSAF
    Config: configs/fsaf/fsaf_x101_64x4d_fpn_1x_coco.py
    Metadata:
      Training Memory (GB): 9.38
      inference time (s/im): 0.17857
      Epochs: 12
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 42.4
    Weights: https://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_x101_64x4d_fpn_1x_coco/fsaf_x101_64x4d_fpn_1x_coco-e3f6e6fd.pth