OpenDAS / detectron2 · Commits

Commit c732df65, authored Jan 18, 2024 by limm
    push v0.1.3 version commit bd2ea47

Parent: 5b3792fc
Pipeline #706 failed with stages in 0 seconds
Changes: 424 · Pipelines: 1
Showing 20 changed files with 2746 additions and 0 deletions (+2746, -0)
detectron2/layers/nms.py                         +146  -0
detectron2/layers/roi_align.py                   +105  -0
detectron2/layers/roi_align_rotated.py           +88   -0
detectron2/layers/rotated_boxes.py               +22   -0
detectron2/layers/shape_spec.py                  +20   -0
detectron2/layers/wrappers.py                    +215  -0
detectron2/model_zoo/__init__.py                 +9    -0
detectron2/model_zoo/model_zoo.py                +150  -0
detectron2/modeling/__init__.py                  +56   -0
detectron2/modeling/anchor_generator.py          +382  -0
detectron2/modeling/backbone/__init__.py         +9    -0
detectron2/modeling/backbone/backbone.py         +53   -0
detectron2/modeling/backbone/build.py            +33   -0
detectron2/modeling/backbone/fpn.py              +245  -0
detectron2/modeling/backbone/resnet.py           +591  -0
detectron2/modeling/box_regression.py            +247  -0
detectron2/modeling/matcher.py                   +123  -0
detectron2/modeling/meta_arch/__init__.py        +11   -0
detectron2/modeling/meta_arch/build.py           +23   -0
detectron2/modeling/meta_arch/panoptic_fpn.py    +218  -0
Too many changes to show: to preserve performance, only 424 of 424+ files are displayed.
detectron2/layers/nms.py (new file, mode 100644)

# -*- coding: utf-8 -*-
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import torch
from torchvision.ops import boxes as box_ops
from torchvision.ops import nms  # BC-compat


def batched_nms(boxes, scores, idxs, iou_threshold):
    """
    Same as torchvision.ops.boxes.batched_nms, but safer.
    """
    assert boxes.shape[-1] == 4
    # TODO may need better strategy.
    # Investigate after having a fully-cuda NMS op.
    if len(boxes) < 40000:
        return box_ops.batched_nms(boxes, scores, idxs, iou_threshold)

    result_mask = scores.new_zeros(scores.size(), dtype=torch.bool)
    for id in torch.unique(idxs).cpu().tolist():
        mask = (idxs == id).nonzero().view(-1)
        keep = nms(boxes[mask], scores[mask], iou_threshold)
        result_mask[mask[keep]] = True
    keep = result_mask.nonzero().view(-1)
    keep = keep[scores[keep].argsort(descending=True)]
    return keep


# Note: this function (nms_rotated) might be moved into
# torchvision/ops/boxes.py in the future
def nms_rotated(boxes, scores, iou_threshold):
    """
    Performs non-maximum suppression (NMS) on the rotated boxes according
    to their intersection-over-union (IoU).

    Rotated NMS iteratively removes lower-scoring rotated boxes which have an
    IoU greater than iou_threshold with another (higher-scoring) rotated box.

    Note that RotatedBox (5, 3, 4, 2, -90) covers exactly the same region as
    RotatedBox (5, 3, 4, 2, 90) does, and their IoU will be 1. However, they
    can represent completely different objects in certain tasks, e.g., OCR.

    As for the question of whether rotated NMS should treat them as faraway boxes
    even though their IoU is 1, it depends on the application and/or ground truth annotation.

    As an extreme example, consider a single character v and the square box around it.

    If the angle is 0 degrees, the object (text) would be read as 'v';
    If the angle is 90 degrees, the object (text) would become '>';
    If the angle is 180 degrees, the object (text) would become '^';
    If the angle is 270/-90 degrees, the object (text) would become '<'

    All of these cases have an IoU of 1 to each other, and rotated NMS that only
    uses IoU as its criterion would only keep one of them with the highest score -
    which, practically, still makes sense in most cases because typically
    only one of these orientations is the correct one. Also, it does not matter
    as much if the box is only used to classify the object (instead of transcribing
    it with a sequential OCR recognition model) later.

    On the other hand, when we use IoU to filter proposals that are close to the
    ground truth during training, we should definitely take the angle into account if
    we know the ground truth is labeled with the strictly correct orientation (as in,
    upside-down words are annotated with -180 degrees even though they can be covered
    with a 0/90/-90 degree box, etc.)

    The way the original dataset is annotated also matters. For example, if the dataset
    is a 4-point polygon dataset that does not enforce ordering of vertices/orientation,
    we can estimate a minimum rotated bounding box to this polygon, but there's no way
    we can tell the correct angle with 100% confidence (as shown above, there could be 4 different
    rotated boxes, with angles differing by 90 degrees from each other, covering exactly the
    same region). In that case we have to just use IoU to determine the box
    proximity (as many detection benchmarks, even for text, do) unless there are other
    assumptions we can make (like width is always larger than height, or the object is not
    rotated by more than 90 degrees CCW/CW, etc.)

    In summary, not considering angles in rotated NMS seems to be a good option for now,
    but we should be aware of its implications.

    Args:
        boxes (Tensor[N, 5]): Rotated boxes to perform NMS on. They are expected to be in
            (x_center, y_center, width, height, angle_degrees) format.
        scores (Tensor[N]): Scores for each one of the rotated boxes
        iou_threshold (float): Discards all overlapping rotated boxes with IoU > iou_threshold

    Returns:
        keep (Tensor): int64 tensor with the indices of the elements that have been kept
            by Rotated NMS, sorted in decreasing order of scores
    """
    from detectron2 import _C

    return _C.nms_rotated(boxes, scores, iou_threshold)


# Note: this function (batched_nms_rotated) might be moved into
# torchvision/ops/boxes.py in the future
def batched_nms_rotated(boxes, scores, idxs, iou_threshold):
    """
    Performs non-maximum suppression in a batched fashion.

    Each index value corresponds to a category, and NMS
    will not be applied between elements of different categories.

    Args:
        boxes (Tensor[N, 5]):
            boxes where NMS will be performed. They
            are expected to be in (x_ctr, y_ctr, width, height, angle_degrees) format
        scores (Tensor[N]):
            scores for each one of the boxes
        idxs (Tensor[N]):
            indices of the categories for each one of the boxes.
        iou_threshold (float):
            discards all overlapping boxes with IoU > iou_threshold

    Returns:
        Tensor:
            int64 tensor with the indices of the elements that have been kept
            by NMS, sorted in decreasing order of scores
    """
    assert boxes.shape[-1] == 5

    if boxes.numel() == 0:
        return torch.empty((0,), dtype=torch.int64, device=boxes.device)
    # Strategy: in order to perform NMS independently per class,
    # we add an offset to all the boxes. The offset is dependent
    # only on the class idx, and is large enough so that boxes
    # from different classes do not overlap

    # Note that batched_nms in torchvision/ops/boxes.py only uses max_coordinate,
    # which won't handle negative coordinates correctly.
    # Here by using min_coordinate we can make sure the negative coordinates are
    # correctly handled.
    max_coordinate = (
        torch.max(boxes[:, 0], boxes[:, 1]) + torch.max(boxes[:, 2], boxes[:, 3]) / 2
    ).max()
    min_coordinate = (
        torch.min(boxes[:, 0], boxes[:, 1]) - torch.max(boxes[:, 2], boxes[:, 3]) / 2
    ).min()
    offsets = idxs.to(boxes) * (max_coordinate - min_coordinate + 1)
    boxes_for_nms = boxes.clone()  # avoid modifying the original values in boxes
    boxes_for_nms[:, :2] += offsets[:, None]
    keep = nms_rotated(boxes_for_nms, scores, iou_threshold)
    return keep
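A minimal usage sketch of the batched rotated NMS above (not part of this commit), assuming the package's compiled _C extension is built; the box values are made up for illustration.

import torch
from detectron2.layers.nms import batched_nms_rotated

boxes = torch.tensor(
    [[10.0, 10.0, 8.0, 4.0, 0.0],   # (x_ctr, y_ctr, w, h, angle_degrees)
     [10.5, 10.0, 8.0, 4.0, 0.0],   # near-duplicate of the first box
     [10.0, 10.0, 8.0, 4.0, 0.0]]   # same region, but a different class
)
scores = torch.tensor([0.9, 0.8, 0.7])
idxs = torch.tensor([0, 0, 1])      # class index per box
keep = batched_nms_rotated(boxes, scores, idxs, iou_threshold=0.5)
# The first box suppresses the second (same class, IoU ~0.88), while the
# third survives because NMS is never applied across classes.
print(keep)  # expected: tensor([0, 2])

The class-dependent offset added to the box centers is what keeps classes from interacting: boxes of different classes are moved far enough apart that their IoU becomes zero before a single plain NMS pass is run.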
detectron2/layers/roi_align.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from torch import nn
from torch.autograd import Function
from torch.autograd.function import once_differentiable
from torch.nn.modules.utils import _pair

from detectron2 import _C


class _ROIAlign(Function):
    @staticmethod
    def forward(ctx, input, roi, output_size, spatial_scale, sampling_ratio, aligned):
        ctx.save_for_backward(roi)
        ctx.output_size = _pair(output_size)
        ctx.spatial_scale = spatial_scale
        ctx.sampling_ratio = sampling_ratio
        ctx.input_shape = input.size()
        ctx.aligned = aligned
        output = _C.roi_align_forward(
            input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned
        )
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output):
        (rois,) = ctx.saved_tensors
        output_size = ctx.output_size
        spatial_scale = ctx.spatial_scale
        sampling_ratio = ctx.sampling_ratio
        bs, ch, h, w = ctx.input_shape
        grad_input = _C.roi_align_backward(
            grad_output,
            rois,
            spatial_scale,
            output_size[0],
            output_size[1],
            bs,
            ch,
            h,
            w,
            sampling_ratio,
            ctx.aligned,
        )
        return grad_input, None, None, None, None, None


roi_align = _ROIAlign.apply


class ROIAlign(nn.Module):
    def __init__(self, output_size, spatial_scale, sampling_ratio, aligned=True):
        """
        Args:
            output_size (tuple): h, w
            spatial_scale (float): scale the input boxes by this number
            sampling_ratio (int): number of input samples to take for each output
                sample. 0 to take samples densely.
            aligned (bool): if False, use the legacy implementation in
                Detectron. If True, align the results more perfectly.

        Note:
            The meaning of aligned=True:

            Given a continuous coordinate c, its two neighboring pixel indices (in our
            pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example,
            c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled
            from the underlying signal at continuous coordinates 0.5 and 1.5). But the original
            roi_align (aligned=False) does not subtract the 0.5 when computing neighboring
            pixel indices and therefore it uses pixels with a slightly incorrect alignment
            (relative to our pixel model) when performing bilinear interpolation.

            With `aligned=True`,
            we first appropriately scale the ROI and then shift it by -0.5
            prior to calling roi_align. This produces the correct neighbors; see
            detectron2/tests/test_roi_align.py for verification.

            The difference does not make a difference to the model's performance if
            ROIAlign is used together with conv layers.
        """
        super(ROIAlign, self).__init__()
        self.output_size = output_size
        self.spatial_scale = spatial_scale
        self.sampling_ratio = sampling_ratio
        self.aligned = aligned

    def forward(self, input, rois):
        """
        Args:
            input: NCHW images
            rois: Bx5 boxes. First column is the index into N. The other 4 columns are xyxy.
        """
        assert rois.dim() == 2 and rois.size(1) == 5
        return roi_align(
            input, rois, self.output_size, self.spatial_scale, self.sampling_ratio, self.aligned
        )

    def __repr__(self):
        tmpstr = self.__class__.__name__ + "("
        tmpstr += "output_size=" + str(self.output_size)
        tmpstr += ", spatial_scale=" + str(self.spatial_scale)
        tmpstr += ", sampling_ratio=" + str(self.sampling_ratio)
        tmpstr += ", aligned=" + str(self.aligned)
        tmpstr += ")"
        return tmpstr
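A small sketch of how the ROIAlign module above is typically called (not part of this commit). It assumes the compiled _C extension is available; the feature map, box coordinates, and spatial_scale are made-up illustrative values for stride-4 features.

import torch
from detectron2.layers.roi_align import ROIAlign

features = torch.randn(1, 256, 56, 56)               # NCHW feature map (stride 4)
rois = torch.tensor([[0.0, 8.0, 8.0, 40.0, 40.0]])   # [batch_index, x0, y0, x1, y1] in image coords
pooler = ROIAlign(output_size=(7, 7), spatial_scale=1.0 / 4, sampling_ratio=0, aligned=True)
crops = pooler(features, rois)
print(crops.shape)  # torch.Size([1, 256, 7, 7])

Boxes are given in input-image coordinates; spatial_scale maps them onto the feature map, and aligned=True applies the half-pixel shift described in the docstring before sampling.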
detectron2/layers/roi_align_rotated.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from torch import nn
from torch.autograd import Function
from torch.autograd.function import once_differentiable
from torch.nn.modules.utils import _pair

from detectron2 import _C


class _ROIAlignRotated(Function):
    @staticmethod
    def forward(ctx, input, roi, output_size, spatial_scale, sampling_ratio):
        ctx.save_for_backward(roi)
        ctx.output_size = _pair(output_size)
        ctx.spatial_scale = spatial_scale
        ctx.sampling_ratio = sampling_ratio
        ctx.input_shape = input.size()
        output = _C.roi_align_rotated_forward(
            input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio
        )
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output):
        (rois,) = ctx.saved_tensors
        output_size = ctx.output_size
        spatial_scale = ctx.spatial_scale
        sampling_ratio = ctx.sampling_ratio
        bs, ch, h, w = ctx.input_shape
        grad_input = _C.roi_align_rotated_backward(
            grad_output,
            rois,
            spatial_scale,
            output_size[0],
            output_size[1],
            bs,
            ch,
            h,
            w,
            sampling_ratio,
        )
        return grad_input, None, None, None, None, None


roi_align_rotated = _ROIAlignRotated.apply


class ROIAlignRotated(nn.Module):
    def __init__(self, output_size, spatial_scale, sampling_ratio):
        """
        Args:
            output_size (tuple): h, w
            spatial_scale (float): scale the input boxes by this number
            sampling_ratio (int): number of input samples to take for each output
                sample. 0 to take samples densely.

        Note:
            ROIAlignRotated supports continuous coordinates by default:

            Given a continuous coordinate c, its two neighboring pixel indices (in our
            pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example,
            c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled
            from the underlying signal at continuous coordinates 0.5 and 1.5).
        """
        super(ROIAlignRotated, self).__init__()
        self.output_size = output_size
        self.spatial_scale = spatial_scale
        self.sampling_ratio = sampling_ratio

    def forward(self, input, rois):
        """
        Args:
            input: NCHW images
            rois: Bx6 boxes. First column is the index into N.
                The other 5 columns are (x_ctr, y_ctr, width, height, angle_degrees).
        """
        assert rois.dim() == 2 and rois.size(1) == 6
        return roi_align_rotated(
            input, rois, self.output_size, self.spatial_scale, self.sampling_ratio
        )

    def __repr__(self):
        tmpstr = self.__class__.__name__ + "("
        tmpstr += "output_size=" + str(self.output_size)
        tmpstr += ", spatial_scale=" + str(self.spatial_scale)
        tmpstr += ", sampling_ratio=" + str(self.sampling_ratio)
        tmpstr += ")"
        return tmpstr
detectron2/layers/rotated_boxes.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from __future__ import absolute_import, division, print_function, unicode_literals

from detectron2 import _C


def pairwise_iou_rotated(boxes1, boxes2):
    """
    Return intersection-over-union (Jaccard index) of boxes.

    Both sets of boxes are expected to be in
    (x_center, y_center, width, height, angle) format.

    Arguments:
        boxes1 (Tensor[N, 5])
        boxes2 (Tensor[M, 5])

    Returns:
        iou (Tensor[N, M]): the NxM matrix containing the pairwise
            IoU values for every element in boxes1 and boxes2
    """
    return _C.box_iou_rotated(boxes1, boxes2)
detectron2/layers/shape_spec.py (new file, mode 100644)

# -*- coding: utf-8 -*-
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from collections import namedtuple


class ShapeSpec(namedtuple("_ShapeSpec", ["channels", "height", "width", "stride"])):
    """
    A simple structure that contains basic shape specification about a tensor.
    It is often used as the auxiliary inputs/outputs of models,
    to obtain the shape inference ability among pytorch modules.

    Attributes:
        channels:
        height:
        width:
        stride:
    """

    def __new__(cls, *, channels=None, height=None, width=None, stride=None):
        return super().__new__(cls, channels, height, width, stride)
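A quick illustration (not part of this commit): ShapeSpec is a plain namedtuple with keyword-only construction, and any field left unspecified defaults to None.

from detectron2.layers import ShapeSpec

spec = ShapeSpec(channels=256, stride=4)
print(spec.channels, spec.stride)  # 256 4
print(spec.height, spec.width)     # None None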
detectron2/layers/wrappers.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
"""
Wrappers around some nn functions, mainly to support empty tensors.

Ideally, add support directly in PyTorch to empty tensors in those functions.

These can be removed once https://github.com/pytorch/pytorch/issues/12013
is implemented
"""

import math
import torch
from torch.nn.modules.utils import _ntuple

TORCH_VERSION = tuple(int(x) for x in torch.__version__.split(".")[:2])


def cat(tensors, dim=0):
    """
    Efficient version of torch.cat that avoids a copy if there is only a single element in a list
    """
    assert isinstance(tensors, (list, tuple))
    if len(tensors) == 1:
        return tensors[0]
    return torch.cat(tensors, dim)


class _NewEmptyTensorOp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, new_shape):
        ctx.shape = x.shape
        return x.new_empty(new_shape)

    @staticmethod
    def backward(ctx, grad):
        shape = ctx.shape
        return _NewEmptyTensorOp.apply(grad, shape), None


class Conv2d(torch.nn.Conv2d):
    """
    A wrapper around :class:`torch.nn.Conv2d` to support empty inputs and more features.
    """

    def __init__(self, *args, **kwargs):
        """
        Extra keyword arguments supported in addition to those in `torch.nn.Conv2d`:

        Args:
            norm (nn.Module, optional): a normalization layer
            activation (callable(Tensor) -> Tensor): a callable activation function

        It assumes that the norm layer is used before activation.
        """
        norm = kwargs.pop("norm", None)
        activation = kwargs.pop("activation", None)
        super().__init__(*args, **kwargs)

        self.norm = norm
        self.activation = activation

    def forward(self, x):
        if x.numel() == 0 and self.training:
            # https://github.com/pytorch/pytorch/issues/12013
            assert not isinstance(
                self.norm, torch.nn.SyncBatchNorm
            ), "SyncBatchNorm does not support empty inputs!"

        if x.numel() == 0 and TORCH_VERSION <= (1, 4):
            assert not isinstance(
                self.norm, torch.nn.GroupNorm
            ), "GroupNorm does not support empty inputs in PyTorch <=1.4!"
            # When input is empty, we want to return an empty tensor with "correct" shape,
            # so that the following operations will not panic
            # if they check for the shape of the tensor.
            # This computes the height and width of the output tensor
            output_shape = [
                (i + 2 * p - (di * (k - 1) + 1)) // s + 1
                for i, p, di, k, s in zip(
                    x.shape[-2:], self.padding, self.dilation, self.kernel_size, self.stride
                )
            ]
            output_shape = [x.shape[0], self.weight.shape[0]] + output_shape
            empty = _NewEmptyTensorOp.apply(x, output_shape)
            if self.training:
                # This is to make DDP happy.
                # DDP expects all workers to have gradient w.r.t the same set of parameters.
                _dummy = sum(x.view(-1)[0] for x in self.parameters()) * 0.0
                return empty + _dummy
            else:
                return empty

        x = super().forward(x)
        if self.norm is not None:
            x = self.norm(x)
        if self.activation is not None:
            x = self.activation(x)
        return x


if TORCH_VERSION > (1, 4):
    ConvTranspose2d = torch.nn.ConvTranspose2d
else:

    class ConvTranspose2d(torch.nn.ConvTranspose2d):
        """
        A wrapper around :class:`torch.nn.ConvTranspose2d` to support zero-size tensor.
        """

        def forward(self, x):
            if x.numel() > 0:
                return super(ConvTranspose2d, self).forward(x)
            # get output shape

            # When input is empty, we want to return an empty tensor with "correct" shape,
            # so that the following operations will not panic
            # if they check for the shape of the tensor.
            # This computes the height and width of the output tensor
            output_shape = [
                (i - 1) * d - 2 * p + (di * (k - 1) + 1) + op
                for i, p, di, k, d, op in zip(
                    x.shape[-2:],
                    self.padding,
                    self.dilation,
                    self.kernel_size,
                    self.stride,
                    self.output_padding,
                )
            ]
            output_shape = [x.shape[0], self.out_channels] + output_shape
            # This is to make DDP happy.
            # DDP expects all workers to have gradient w.r.t the same set of parameters.
            _dummy = sum(x.view(-1)[0] for x in self.parameters()) * 0.0
            return _NewEmptyTensorOp.apply(x, output_shape) + _dummy


if TORCH_VERSION > (1, 4):
    BatchNorm2d = torch.nn.BatchNorm2d
else:

    class BatchNorm2d(torch.nn.BatchNorm2d):
        """
        A wrapper around :class:`torch.nn.BatchNorm2d` to support zero-size tensor.
        """

        def forward(self, x):
            if x.numel() > 0:
                return super(BatchNorm2d, self).forward(x)
            # get output shape
            output_shape = x.shape
            return _NewEmptyTensorOp.apply(x, output_shape)


if TORCH_VERSION > (1, 5):
    Linear = torch.nn.Linear
else:

    class Linear(torch.nn.Linear):
        """
        A wrapper around :class:`torch.nn.Linear` to support empty inputs and more features.
        Because of https://github.com/pytorch/pytorch/issues/34202
        """

        def forward(self, x):
            if x.numel() == 0:
                output_shape = [x.shape[0], self.weight.shape[0]]

                empty = _NewEmptyTensorOp.apply(x, output_shape)
                if self.training:
                    # This is to make DDP happy.
                    # DDP expects all workers to have gradient w.r.t the same set of parameters.
                    _dummy = sum(x.view(-1)[0] for x in self.parameters()) * 0.0
                    return empty + _dummy
                else:
                    return empty

            x = super().forward(x)
            return x


def interpolate(input, size=None, scale_factor=None, mode="nearest", align_corners=None):
    """
    A wrapper around :func:`torch.nn.functional.interpolate` to support zero-size tensor.
    """
    if TORCH_VERSION > (1, 4) or input.numel() > 0:
        return torch.nn.functional.interpolate(
            input, size, scale_factor, mode, align_corners=align_corners
        )

    def _check_size_scale_factor(dim):
        if size is None and scale_factor is None:
            raise ValueError("either size or scale_factor should be defined")
        if size is not None and scale_factor is not None:
            raise ValueError("only one of size or scale_factor should be defined")
        if (
            scale_factor is not None
            and isinstance(scale_factor, tuple)
            and len(scale_factor) != dim
        ):
            raise ValueError(
                "scale_factor shape must match input shape. "
                "Input is {}D, scale_factor size is {}".format(dim, len(scale_factor))
            )

    def _output_size(dim):
        _check_size_scale_factor(dim)
        if size is not None:
            return size
        scale_factors = _ntuple(dim)(scale_factor)
        # math.floor might return float in py2.7
        return [int(math.floor(input.size(i + 2) * scale_factors[i])) for i in range(dim)]

    output_shape = tuple(_output_size(2))
    output_shape = input.shape[:-2] + output_shape
    return _NewEmptyTensorOp.apply(input, output_shape)
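A sketch of the extended Conv2d wrapper above (not part of this commit). The norm/activation keywords are consumed by the wrapper before the base torch.nn.Conv2d constructor sees them; the empty-batch call below assumes a recent PyTorch, where the base layers already accept zero-size batches and the wrapper falls through to them.

import torch
import torch.nn.functional as F
from detectron2.layers.wrappers import Conv2d

conv = Conv2d(3, 8, kernel_size=3, padding=1, norm=torch.nn.BatchNorm2d(8), activation=F.relu)
y = conv(torch.randn(2, 3, 32, 32))
print(y.shape)           # torch.Size([2, 8, 32, 32])

# Zero-size batches propagate through with the correct output shape instead of
# crashing downstream shape checks (eval mode so BatchNorm uses running stats).
conv.eval()
empty = conv(torch.randn(0, 3, 32, 32))
print(empty.shape)       # torch.Size([0, 8, 32, 32])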
detectron2/model_zoo/__init__.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
"""
Model Zoo API for Detectron2: a collection of functions to create common model architectures and
optionally load pre-trained weights as released in
`MODEL_ZOO.md <https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md>`_.
"""
from .model_zoo import get, get_config_file, get_checkpoint_url

__all__ = ["get_checkpoint_url", "get", "get_config_file"]
detectron2/model_zoo/model_zoo.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import os
import pkg_resources
import torch

from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.modeling import build_model


class _ModelZooUrls(object):
    """
    Mapping from names to officially released Detectron2 pre-trained models.
    """

    S3_PREFIX = "https://dl.fbaipublicfiles.com/detectron2/"

    # format: {config_path.yaml} -> model_id/model_final_{commit}.pkl
    CONFIG_PATH_TO_URL_SUFFIX = {
        # COCO Detection with Faster R-CNN
        "COCO-Detection/faster_rcnn_R_50_C4_1x.yaml": "137257644/model_final_721ade.pkl",
        "COCO-Detection/faster_rcnn_R_50_DC5_1x.yaml": "137847829/model_final_51d356.pkl",
        "COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml": "137257794/model_final_b275ba.pkl",
        "COCO-Detection/faster_rcnn_R_50_C4_3x.yaml": "137849393/model_final_f97cb7.pkl",
        "COCO-Detection/faster_rcnn_R_50_DC5_3x.yaml": "137849425/model_final_68d202.pkl",
        "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml": "137849458/model_final_280758.pkl",
        "COCO-Detection/faster_rcnn_R_101_C4_3x.yaml": "138204752/model_final_298dad.pkl",
        "COCO-Detection/faster_rcnn_R_101_DC5_3x.yaml": "138204841/model_final_3e0943.pkl",
        "COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml": "137851257/model_final_f6e8b1.pkl",
        "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml": "139173657/model_final_68b088.pkl",
        # COCO Detection with RetinaNet
        "COCO-Detection/retinanet_R_50_FPN_1x.yaml": "137593951/model_final_b796dc.pkl",
        "COCO-Detection/retinanet_R_50_FPN_3x.yaml": "137849486/model_final_4cafe0.pkl",
        "COCO-Detection/retinanet_R_101_FPN_3x.yaml": "138363263/model_final_59f53c.pkl",
        # COCO Detection with RPN and Fast R-CNN
        "COCO-Detection/rpn_R_50_C4_1x.yaml": "137258005/model_final_450694.pkl",
        "COCO-Detection/rpn_R_50_FPN_1x.yaml": "137258492/model_final_02ce48.pkl",
        "COCO-Detection/fast_rcnn_R_50_FPN_1x.yaml": "137635226/model_final_e5f7ce.pkl",
        # COCO Instance Segmentation Baselines with Mask R-CNN
        "COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x.yaml": "137259246/model_final_9243eb.pkl",
        "COCO-InstanceSegmentation/mask_rcnn_R_50_DC5_1x.yaml": "137260150/model_final_4f86c3.pkl",
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml": "137260431/model_final_a54504.pkl",
        "COCO-InstanceSegmentation/mask_rcnn_R_50_C4_3x.yaml": "137849525/model_final_4ce675.pkl",
        "COCO-InstanceSegmentation/mask_rcnn_R_50_DC5_3x.yaml": "137849551/model_final_84107b.pkl",
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml": "137849600/model_final_f10217.pkl",
        "COCO-InstanceSegmentation/mask_rcnn_R_101_C4_3x.yaml": "138363239/model_final_a2914c.pkl",
        "COCO-InstanceSegmentation/mask_rcnn_R_101_DC5_3x.yaml": "138363294/model_final_0464b7.pkl",
        "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml": "138205316/model_final_a3ec72.pkl",
        "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml": "139653917/model_final_2d9806.pkl",  # noqa
        # COCO Person Keypoint Detection Baselines with Keypoint R-CNN
        "COCO-Keypoints/keypoint_rcnn_R_50_FPN_1x.yaml": "137261548/model_final_04e291.pkl",
        "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml": "137849621/model_final_a6e10b.pkl",
        "COCO-Keypoints/keypoint_rcnn_R_101_FPN_3x.yaml": "138363331/model_final_997cc7.pkl",
        "COCO-Keypoints/keypoint_rcnn_X_101_32x8d_FPN_3x.yaml": "139686956/model_final_5ad38f.pkl",
        # COCO Panoptic Segmentation Baselines with Panoptic FPN
        "COCO-PanopticSegmentation/panoptic_fpn_R_50_1x.yaml": "139514544/model_final_dbfeb4.pkl",
        "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml": "139514569/model_final_c10459.pkl",
        "COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml": "139514519/model_final_cafdb1.pkl",
        # LVIS Instance Segmentation Baselines with Mask R-CNN
        "LVIS-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml": "144219072/model_final_571f7c.pkl",
        "LVIS-InstanceSegmentation/mask_rcnn_R_101_FPN_1x.yaml": "144219035/model_final_824ab5.pkl",
        "LVIS-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_1x.yaml": "144219108/model_final_5e3439.pkl",  # noqa
        # Cityscapes & Pascal VOC Baselines
        "Cityscapes/mask_rcnn_R_50_FPN.yaml": "142423278/model_final_af9cf5.pkl",
        "PascalVOC-Detection/faster_rcnn_R_50_C4.yaml": "142202221/model_final_b1acc2.pkl",
        # Other Settings
        "Misc/mask_rcnn_R_50_FPN_1x_dconv_c3-c5.yaml": "138602867/model_final_65c703.pkl",
        "Misc/mask_rcnn_R_50_FPN_3x_dconv_c3-c5.yaml": "144998336/model_final_821d0b.pkl",
        "Misc/cascade_mask_rcnn_R_50_FPN_1x.yaml": "138602847/model_final_e9d89b.pkl",
        "Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml": "144998488/model_final_480dd8.pkl",
        "Misc/mask_rcnn_R_50_FPN_3x_syncbn.yaml": "169527823/model_final_3b3c51.pkl",
        "Misc/mask_rcnn_R_50_FPN_3x_gn.yaml": "138602888/model_final_dc5d9e.pkl",
        "Misc/scratch_mask_rcnn_R_50_FPN_3x_gn.yaml": "138602908/model_final_01ca85.pkl",
        "Misc/panoptic_fpn_R_101_dconv_cascade_gn_3x.yaml": "139797668/model_final_be35db.pkl",
        "Misc/cascade_mask_rcnn_X_152_32x8d_FPN_IN5k_gn_dconv.yaml": "18131413/model_0039999_e76410.pkl",  # noqa
        # D1 Comparisons
        "Detectron1-Comparisons/faster_rcnn_R_50_FPN_noaug_1x.yaml": "137781054/model_final_7ab50c.pkl",  # noqa
        "Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml": "137781281/model_final_62ca52.pkl",  # noqa
        "Detectron1-Comparisons/keypoint_rcnn_R_50_FPN_1x.yaml": "137781195/model_final_cce136.pkl",
    }


def get_checkpoint_url(config_path):
    """
    Returns the URL to the model trained using the given config

    Args:
        config_path (str): config file name relative to detectron2's "configs/"
            directory, e.g., "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml"

    Returns:
        str: a URL to the model
    """
    name = config_path.replace(".yaml", "")
    if config_path in _ModelZooUrls.CONFIG_PATH_TO_URL_SUFFIX:
        suffix = _ModelZooUrls.CONFIG_PATH_TO_URL_SUFFIX[config_path]
        return _ModelZooUrls.S3_PREFIX + name + "/" + suffix
    raise RuntimeError("{} not available in Model Zoo!".format(name))


def get_config_file(config_path):
    """
    Returns path to a builtin config file.

    Args:
        config_path (str): config file name relative to detectron2's "configs/"
            directory, e.g., "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml"

    Returns:
        str: the real path to the config file.
    """
    cfg_file = pkg_resources.resource_filename(
        "detectron2.model_zoo", os.path.join("configs", config_path)
    )
    if not os.path.exists(cfg_file):
        raise RuntimeError("{} not available in Model Zoo!".format(config_path))
    return cfg_file


def get(config_path, trained: bool = False):
    """
    Get a model specified by relative path under Detectron2's official ``configs/`` directory.

    Args:
        config_path (str): config file name relative to detectron2's "configs/"
            directory, e.g., "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml"
        trained (bool): If True, will initialize the model with the trained model zoo weights.
            If False, the checkpoint specified in the config file's ``MODEL.WEIGHTS`` is used
            instead; this will typically (though not always) initialize a subset of weights using
            an ImageNet pre-trained model, while randomly initializing the other weights.

    Example:

    .. code-block:: python

        from detectron2 import model_zoo
        model = model_zoo.get("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml", trained=True)
    """
    cfg_file = get_config_file(config_path)

    cfg = get_cfg()
    cfg.merge_from_file(cfg_file)
    if trained:
        cfg.MODEL.WEIGHTS = get_checkpoint_url(config_path)
    if not torch.cuda.is_available():
        cfg.MODEL.DEVICE = "cpu"

    model = build_model(cfg)
    DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)
    return model
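A short usage sketch of the model zoo helpers above (not part of this commit). get_checkpoint_url only composes a URL from the table; get additionally builds the model and downloads weights, so it needs network access and the rest of the detectron2 package.

from detectron2 import model_zoo

url = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
print(url)
# https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl

# Build the model and load the released weights; falls back to CPU when CUDA is unavailable.
model = model_zoo.get("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml", trained=True)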
detectron2/modeling/__init__.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import torch

from detectron2.layers import ShapeSpec

from .anchor_generator import build_anchor_generator, ANCHOR_GENERATOR_REGISTRY
from .backbone import (
    BACKBONE_REGISTRY,
    FPN,
    Backbone,
    ResNet,
    ResNetBlockBase,
    build_backbone,
    build_resnet_backbone,
    make_stage,
)
from .meta_arch import (
    META_ARCH_REGISTRY,
    SEM_SEG_HEADS_REGISTRY,
    GeneralizedRCNN,
    PanopticFPN,
    ProposalNetwork,
    RetinaNet,
    SemanticSegmentor,
    build_model,
    build_sem_seg_head,
)
from .postprocessing import detector_postprocess
from .proposal_generator import (
    PROPOSAL_GENERATOR_REGISTRY,
    build_proposal_generator,
    RPN_HEAD_REGISTRY,
    build_rpn_head,
)
from .roi_heads import (
    ROI_BOX_HEAD_REGISTRY,
    ROI_HEADS_REGISTRY,
    ROI_KEYPOINT_HEAD_REGISTRY,
    ROI_MASK_HEAD_REGISTRY,
    ROIHeads,
    StandardROIHeads,
    BaseMaskRCNNHead,
    BaseKeypointRCNNHead,
    build_box_head,
    build_keypoint_head,
    build_mask_head,
    build_roi_heads,
)
from .test_time_augmentation import DatasetMapperTTA, GeneralizedRCNNWithTTA

_EXCLUDE = {"torch", "ShapeSpec"}
__all__ = [k for k in globals().keys() if k not in _EXCLUDE and not k.startswith("_")]

assert (
    torch.Tensor([1]) == torch.Tensor([2])
).dtype == torch.bool, "Your Pytorch is too old. Please update to contain https://github.com/pytorch/pytorch/pull/21113"
detectron2/modeling/anchor_generator.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import math
from typing import List
import torch
from torch import nn

from detectron2.config import configurable
from detectron2.layers import ShapeSpec
from detectron2.structures import Boxes, RotatedBoxes
from detectron2.utils.registry import Registry

ANCHOR_GENERATOR_REGISTRY = Registry("ANCHOR_GENERATOR")
ANCHOR_GENERATOR_REGISTRY.__doc__ = """
Registry for modules that create object detection anchors for feature maps.

The registered object will be called with `obj(cfg, input_shape)`.
"""


class BufferList(nn.Module):
    """
    Similar to nn.ParameterList, but for buffers
    """

    def __init__(self, buffers=None):
        super(BufferList, self).__init__()
        if buffers is not None:
            self.extend(buffers)

    def extend(self, buffers):
        offset = len(self)
        for i, buffer in enumerate(buffers):
            self.register_buffer(str(offset + i), buffer)
        return self

    def __len__(self):
        return len(self._buffers)

    def __iter__(self):
        return iter(self._buffers.values())


def _create_grid_offsets(size: List[int], stride: int, offset: float, device: torch.device):
    grid_height, grid_width = size
    shifts_x = torch.arange(
        offset * stride, grid_width * stride, step=stride, dtype=torch.float32, device=device
    )
    shifts_y = torch.arange(
        offset * stride, grid_height * stride, step=stride, dtype=torch.float32, device=device
    )
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
    shift_x = shift_x.reshape(-1)
    shift_y = shift_y.reshape(-1)
    return shift_x, shift_y


def _broadcast_params(params, num_features, name):
    """
    If one size (or aspect ratio) is specified and there are multiple feature
    maps, we "broadcast" anchors of that single size (or aspect ratio)
    over all feature maps.

    If params is list[float], or list[list[float]] with len(params) == 1, repeat
    it num_features times.

    Returns:
        list[list[float]]: param for each feature
    """
    assert isinstance(
        params, (list, tuple)
    ), f"{name} in anchor generator has to be a list! Got {params}."
    assert len(params), f"{name} in anchor generator cannot be empty!"
    if not isinstance(params[0], (list, tuple)):  # list[float]
        return [params] * num_features
    if len(params) == 1:
        return list(params) * num_features
    assert len(params) == num_features, (
        f"Got {name} of length {len(params)} in anchor generator, "
        f"but the number of input features is {num_features}!"
    )
    return params


@ANCHOR_GENERATOR_REGISTRY.register()
class DefaultAnchorGenerator(nn.Module):
    """
    Compute anchors in the standard ways described in
    "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks".
    """

    box_dim: int = 4
    """
    the dimension of each anchor box.
    """

    @configurable
    def __init__(self, *, sizes, aspect_ratios, strides, offset=0.5):
        """
        This interface is experimental.

        Args:
            sizes (list[list[float]] or list[float]):
                If sizes is list[list[float]], sizes[i] is the list of anchor sizes
                (i.e. sqrt of anchor area) to use for the i-th feature map.
                If sizes is list[float], the sizes are used for all feature maps.
                Anchor sizes are given in absolute lengths in units of
                the input image; they do not dynamically scale if the input image size changes.
            aspect_ratios (list[list[float]] or list[float]): list of aspect ratios
                (i.e. height / width) to use for anchors. Same "broadcast" rule for `sizes` applies.
            strides (list[int]): stride of each input feature.
            offset (float): Relative offset between the center of the first anchor and the top-left
                corner of the image. Value has to be in [0, 1).
                Recommend to use 0.5, which means half stride.
        """
        super().__init__()

        self.strides = strides
        self.num_features = len(self.strides)
        sizes = _broadcast_params(sizes, self.num_features, "sizes")
        aspect_ratios = _broadcast_params(aspect_ratios, self.num_features, "aspect_ratios")
        self.cell_anchors = self._calculate_anchors(sizes, aspect_ratios)

        self.offset = offset
        assert 0.0 <= self.offset < 1.0, self.offset

    @classmethod
    def from_config(cls, cfg, input_shape: List[ShapeSpec]):
        return {
            "sizes": cfg.MODEL.ANCHOR_GENERATOR.SIZES,
            "aspect_ratios": cfg.MODEL.ANCHOR_GENERATOR.ASPECT_RATIOS,
            "strides": [x.stride for x in input_shape],
            "offset": cfg.MODEL.ANCHOR_GENERATOR.OFFSET,
        }

    def _calculate_anchors(self, sizes, aspect_ratios):
        cell_anchors = [
            self.generate_cell_anchors(s, a).float() for s, a in zip(sizes, aspect_ratios)
        ]
        return BufferList(cell_anchors)

    @property
    def num_cell_anchors(self):
        """
        Alias of `num_anchors`.
        """
        return self.num_anchors

    @property
    def num_anchors(self):
        """
        Returns:
            list[int]: Each int is the number of anchors at every pixel
                location, on that feature map.
                For example, if at every pixel we use anchors of 3 aspect
                ratios and 5 sizes, the number of anchors is 15.
                (See also ANCHOR_GENERATOR.SIZES and ANCHOR_GENERATOR.ASPECT_RATIOS in config)

                In standard RPN models, `num_anchors` on every feature map is the same.
        """
        return [len(cell_anchors) for cell_anchors in self.cell_anchors]

    def _grid_anchors(self, grid_sizes: List[List[int]]):
        """
        Returns:
            list[Tensor]: #featuremap tensors, each is (#locations x #cell_anchors) x 4
        """
        anchors = []
        for size, stride, base_anchors in zip(grid_sizes, self.strides, self.cell_anchors):
            shift_x, shift_y = _create_grid_offsets(size, stride, self.offset, base_anchors.device)
            shifts = torch.stack((shift_x, shift_y, shift_x, shift_y), dim=1)

            anchors.append((shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4))

        return anchors

    def generate_cell_anchors(self, sizes=(32, 64, 128, 256, 512), aspect_ratios=(0.5, 1, 2)):
        """
        Generate a tensor storing canonical anchor boxes, which are all anchor
        boxes of different sizes and aspect_ratios centered at (0, 0).
        We can later build the set of anchors for a full feature map by
        shifting and tiling these tensors (see `meth:_grid_anchors`).

        Args:
            sizes (tuple[float]):
            aspect_ratios (tuple[float]):

        Returns:
            Tensor of shape (len(sizes) * len(aspect_ratios), 4) storing anchor boxes
                in XYXY format.
        """

        # This is different from the anchor generator defined in the original Faster R-CNN
        # code or Detectron. They yield the same AP, however the old version defines cell
        # anchors in a less natural way with a shift relative to the feature grid and
        # quantization that results in slightly different sizes for different aspect ratios.
        # See also https://github.com/facebookresearch/Detectron/issues/227

        anchors = []
        for size in sizes:
            area = size ** 2.0
            for aspect_ratio in aspect_ratios:
                # s * s = w * h
                # a = h / w
                # ... some algebra ...
                # w = sqrt(s * s / a)
                # h = a * w
                w = math.sqrt(area / aspect_ratio)
                h = aspect_ratio * w
                x0, y0, x1, y1 = -w / 2.0, -h / 2.0, w / 2.0, h / 2.0
                anchors.append([x0, y0, x1, y1])
        return torch.tensor(anchors)

    def forward(self, features):
        """
        Args:
            features (list[Tensor]): list of backbone feature maps on which to generate anchors.

        Returns:
            list[Boxes]: a list of Boxes containing all the anchors for each feature map
                (i.e. the cell anchors repeated over all locations in the feature map).
                The number of anchors of each feature map is Hi x Wi x num_cell_anchors,
                where Hi, Wi are resolution of the feature map divided by anchor stride.
        """
        grid_sizes = [feature_map.shape[-2:] for feature_map in features]
        anchors_over_all_feature_maps = self._grid_anchors(grid_sizes)
        return [Boxes(x) for x in anchors_over_all_feature_maps]


@ANCHOR_GENERATOR_REGISTRY.register()
class RotatedAnchorGenerator(nn.Module):
    """
    Compute rotated anchors used by Rotated RPN (RRPN), described in
    "Arbitrary-Oriented Scene Text Detection via Rotation Proposals".
    """

    box_dim: int = 5
    """
    the dimension of each anchor box.
    """

    @configurable
    def __init__(self, *, sizes, aspect_ratios, strides, angles, offset=0.5):
        """
        This interface is experimental.

        Args:
            sizes (list[list[float]] or list[float]):
                If sizes is list[list[float]], sizes[i] is the list of anchor sizes
                (i.e. sqrt of anchor area) to use for the i-th feature map.
                If sizes is list[float], the sizes are used for all feature maps.
                Anchor sizes are given in absolute lengths in units of
                the input image; they do not dynamically scale if the input image size changes.
            aspect_ratios (list[list[float]] or list[float]): list of aspect ratios
                (i.e. height / width) to use for anchors. Same "broadcast" rule for `sizes` applies.
            strides (list[int]): stride of each input feature.
            angles (list[list[float]] or list[float]): list of angles (in degrees CCW)
                to use for anchors. Same "broadcast" rule for `sizes` applies.
            offset (float): Relative offset between the center of the first anchor and the top-left
                corner of the image. Value has to be in [0, 1).
                Recommend to use 0.5, which means half stride.
        """
        super().__init__()

        self.strides = strides
        self.num_features = len(self.strides)
        sizes = _broadcast_params(sizes, self.num_features, "sizes")
        aspect_ratios = _broadcast_params(aspect_ratios, self.num_features, "aspect_ratios")
        angles = _broadcast_params(angles, self.num_features, "angles")
        self.cell_anchors = self._calculate_anchors(sizes, aspect_ratios, angles)

        self.offset = offset
        assert 0.0 <= self.offset < 1.0, self.offset

    @classmethod
    def from_config(cls, cfg, input_shape: List[ShapeSpec]):
        return {
            "sizes": cfg.MODEL.ANCHOR_GENERATOR.SIZES,
            "aspect_ratios": cfg.MODEL.ANCHOR_GENERATOR.ASPECT_RATIOS,
            "strides": [x.stride for x in input_shape],
            "offset": cfg.MODEL.ANCHOR_GENERATOR.OFFSET,
            "angles": cfg.MODEL.ANCHOR_GENERATOR.ANGLES,
        }

    def _calculate_anchors(self, sizes, aspect_ratios, angles):
        cell_anchors = [
            self.generate_cell_anchors(size, aspect_ratio, angle).float()
            for size, aspect_ratio, angle in zip(sizes, aspect_ratios, angles)
        ]
        return BufferList(cell_anchors)

    @property
    def num_cell_anchors(self):
        """
        Alias of `num_anchors`.
        """
        return self.num_anchors

    @property
    def num_anchors(self):
        """
        Returns:
            list[int]: Each int is the number of anchors at every pixel
                location, on that feature map.
                For example, if at every pixel we use anchors of 3 aspect
                ratios, 2 sizes and 5 angles, the number of anchors is 30.
                (See also ANCHOR_GENERATOR.SIZES, ANCHOR_GENERATOR.ASPECT_RATIOS
                and ANCHOR_GENERATOR.ANGLES in config)

                In standard RRPN models, `num_anchors` on every feature map is the same.
        """
        return [len(cell_anchors) for cell_anchors in self.cell_anchors]

    def _grid_anchors(self, grid_sizes):
        anchors = []
        for size, stride, base_anchors in zip(grid_sizes, self.strides, self.cell_anchors):
            shift_x, shift_y = _create_grid_offsets(size, stride, self.offset, base_anchors.device)
            zeros = torch.zeros_like(shift_x)
            shifts = torch.stack((shift_x, shift_y, zeros, zeros, zeros), dim=1)

            anchors.append((shifts.view(-1, 1, 5) + base_anchors.view(1, -1, 5)).reshape(-1, 5))

        return anchors

    def generate_cell_anchors(
        self,
        sizes=(32, 64, 128, 256, 512),
        aspect_ratios=(0.5, 1, 2),
        angles=(-90, -60, -30, 0, 30, 60, 90),
    ):
        """
        Generate a tensor storing canonical anchor boxes, which are all anchor
        boxes of different sizes, aspect_ratios, angles centered at (0, 0).
        We can later build the set of anchors for a full feature map by
        shifting and tiling these tensors (see `meth:_grid_anchors`).

        Args:
            sizes (tuple[float]):
            aspect_ratios (tuple[float]):
            angles (tuple[float]):

        Returns:
            Tensor of shape (len(sizes) * len(aspect_ratios) * len(angles), 5)
                storing anchor boxes in (x_ctr, y_ctr, w, h, angle) format.
        """
        anchors = []
        for size in sizes:
            area = size ** 2.0
            for aspect_ratio in aspect_ratios:
                # s * s = w * h
                # a = h / w
                # ... some algebra ...
                # w = sqrt(s * s / a)
                # h = a * w
                w = math.sqrt(area / aspect_ratio)
                h = aspect_ratio * w
                anchors.extend([0, 0, w, h, a] for a in angles)

        return torch.tensor(anchors)

    def forward(self, features):
        """
        Args:
            features (list[Tensor]): list of backbone feature maps on which to generate anchors.

        Returns:
            list[RotatedBoxes]: a list of Boxes containing all the anchors for each feature map
                (i.e. the cell anchors repeated over all locations in the feature map).
                The number of anchors of each feature map is Hi x Wi x num_cell_anchors,
                where Hi, Wi are resolution of the feature map divided by anchor stride.
        """
        grid_sizes = [feature_map.shape[-2:] for feature_map in features]
        anchors_over_all_feature_maps = self._grid_anchors(grid_sizes)
        return [RotatedBoxes(x) for x in anchors_over_all_feature_maps]


def build_anchor_generator(cfg, input_shape):
    """
    Build an anchor generator from `cfg.MODEL.ANCHOR_GENERATOR.NAME`.
    """
    anchor_generator = cfg.MODEL.ANCHOR_GENERATOR.NAME
    return ANCHOR_GENERATOR_REGISTRY.get(anchor_generator)(cfg, input_shape)
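A quick check of the cell-anchor math above (not part of this commit). For size s and aspect ratio a = h / w, generate_cell_anchors solves w = sqrt(s*s / a) and h = a * w, so every cell anchor has area s*s regardless of aspect ratio; the sizes, ratios, and strides below are illustrative.

import torch
from detectron2.modeling.anchor_generator import DefaultAnchorGenerator

gen = DefaultAnchorGenerator(sizes=[32], aspect_ratios=[0.5, 1.0, 2.0], strides=[4])
cell = list(gen.cell_anchors)[0]     # (3, 4) XYXY anchors centered at (0, 0)
w = cell[:, 2] - cell[:, 0]
h = cell[:, 3] - cell[:, 1]
print(w * h)                         # all approximately 1024 == 32 * 32

# Anchors for a single stride-4 feature map of spatial size 2x2: 2 * 2 * 3 = 12 boxes.
features = [torch.randn(1, 256, 2, 2)]
(anchors,) = gen(features)
print(len(anchors))                  # 12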
detectron2/modeling/backbone/__init__.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from .build import build_backbone, BACKBONE_REGISTRY  # noqa F401 isort:skip

from .backbone import Backbone
from .fpn import FPN
from .resnet import ResNet, ResNetBlockBase, build_resnet_backbone, make_stage

__all__ = [k for k in globals().keys() if not k.startswith("_")]
# TODO can expose more resnet blocks after careful consideration
detectron2/modeling/backbone/backbone.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from abc import ABCMeta, abstractmethod
import torch.nn as nn

from detectron2.layers import ShapeSpec

__all__ = ["Backbone"]


class Backbone(nn.Module, metaclass=ABCMeta):
    """
    Abstract base class for network backbones.
    """

    def __init__(self):
        """
        The `__init__` method of any subclass can specify its own set of arguments.
        """
        super().__init__()

    @abstractmethod
    def forward(self):
        """
        Subclasses must override this method, but adhere to the same return type.

        Returns:
            dict[str->Tensor]: mapping from feature name (e.g., "res2") to tensor
        """
        pass

    @property
    def size_divisibility(self):
        """
        Some backbones require the input height and width to be divisible by a
        specific integer. This is typically true for encoder / decoder type networks
        with lateral connection (e.g., FPN) for which feature maps need to match
        dimension in the "bottom up" and "top down" paths. Set to 0 if no specific
        input size divisibility is required.
        """
        return 0

    def output_shape(self):
        """
        Returns:
            dict[str->ShapeSpec]
        """
        # this is a backward-compatible default
        return {
            name: ShapeSpec(
                channels=self._out_feature_channels[name], stride=self._out_feature_strides[name]
            )
            for name in self._out_features
        }
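A toy Backbone subclass (not part of this commit), illustrating the contract that forward() returns a name -> tensor dict and that output_shape() describes those outputs. ToyBackbone and its feature name "toy2" are hypothetical names used only for this sketch.

import torch
import torch.nn as nn
from detectron2.layers import ShapeSpec
from detectron2.modeling.backbone.backbone import Backbone

class ToyBackbone(Backbone):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 64, kernel_size=3, stride=4, padding=1)
        # Bookkeeping used by the default output_shape() implementation above.
        self._out_features = ["toy2"]
        self._out_feature_channels = {"toy2": 64}
        self._out_feature_strides = {"toy2": 4}

    def forward(self, x):
        return {"toy2": self.conv(x)}

backbone = ToyBackbone()
print(backbone.output_shape())
# {'toy2': ShapeSpec(channels=64, height=None, width=None, stride=4)}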
detectron2/modeling/backbone/build.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from detectron2.layers import ShapeSpec
from detectron2.utils.registry import Registry

from .backbone import Backbone

BACKBONE_REGISTRY = Registry("BACKBONE")
BACKBONE_REGISTRY.__doc__ = """
Registry for backbones, which extract feature maps from images

The registered object must be a callable that accepts two arguments:

1. A :class:`detectron2.config.CfgNode`
2. A :class:`detectron2.layers.ShapeSpec`, which contains the input shape specification.

It must return an instance of :class:`Backbone`.
"""


def build_backbone(cfg, input_shape=None):
    """
    Build a backbone from `cfg.MODEL.BACKBONE.NAME`.

    Returns:
        an instance of :class:`Backbone`
    """
    if input_shape is None:
        input_shape = ShapeSpec(channels=len(cfg.MODEL.PIXEL_MEAN))

    backbone_name = cfg.MODEL.BACKBONE.NAME
    backbone = BACKBONE_REGISTRY.get(backbone_name)(cfg, input_shape)
    assert isinstance(backbone, Backbone)
    return backbone
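A sketch of how build_backbone resolves a name through the registry (not part of this commit). It assumes the default detectron2 config, where MODEL.BACKBONE.NAME points at the registered build_resnet_backbone and MODEL.RESNETS.OUT_FEATURES is ["res4"]; the printed values are what those defaults should yield.

from detectron2.config import get_cfg
from detectron2.modeling.backbone.build import build_backbone

cfg = get_cfg()                 # default: cfg.MODEL.BACKBONE.NAME == "build_resnet_backbone"
backbone = build_backbone(cfg)  # input_shape defaults to ShapeSpec(channels=len(PIXEL_MEAN))
print(type(backbone).__name__)  # ResNet
print(backbone.output_shape())  # {'res4': ShapeSpec(channels=1024, ..., stride=16)}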
detectron2/modeling/backbone/fpn.py (new file, mode 100644)

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import math
import fvcore.nn.weight_init as weight_init
import torch.nn.functional as F
from torch import nn

from detectron2.layers import Conv2d, ShapeSpec, get_norm

from .backbone import Backbone
from .build import BACKBONE_REGISTRY
from .resnet import build_resnet_backbone

__all__ = ["build_resnet_fpn_backbone", "build_retinanet_resnet_fpn_backbone", "FPN"]


class FPN(Backbone):
    """
    This module implements :paper:`FPN`.
    It creates pyramid features built on top of some input feature maps.
    """

    def __init__(
        self, bottom_up, in_features, out_channels, norm="", top_block=None, fuse_type="sum"
    ):
        """
        Args:
            bottom_up (Backbone): module representing the bottom up subnetwork.
                Must be a subclass of :class:`Backbone`. The multi-scale feature
                maps generated by the bottom up network, and listed in `in_features`,
                are used to generate FPN levels.
            in_features (list[str]): names of the input feature maps coming
                from the backbone to which FPN is attached. For example, if the
                backbone produces ["res2", "res3", "res4"], any *contiguous* sublist
                of these may be used; order must be from high to low resolution.
            out_channels (int): number of channels in the output feature maps.
            norm (str): the normalization to use.
            top_block (nn.Module or None): if provided, an extra operation will
                be performed on the output of the last (smallest resolution)
                FPN output, and the result will extend the result list. The top_block
                further downsamples the feature map. It must have an attribute
                "num_levels", meaning the number of extra FPN levels added by
                this block, and "in_feature", which is a string representing
                its input feature (e.g., p5).
            fuse_type (str): types for fusing the top down features and the lateral
                ones. It can be "sum" (default), which sums up element-wise; or "avg",
                which takes the element-wise mean of the two.
        """
        super(FPN, self).__init__()
        assert isinstance(bottom_up, Backbone)

        # Feature map strides and channels from the bottom up network (e.g. ResNet)
        input_shapes = bottom_up.output_shape()
        in_strides = [input_shapes[f].stride for f in in_features]
        in_channels = [input_shapes[f].channels for f in in_features]

        _assert_strides_are_log2_contiguous(in_strides)
        lateral_convs = []
        output_convs = []

        use_bias = norm == ""
        for idx, in_channels in enumerate(in_channels):
            lateral_norm = get_norm(norm, out_channels)
            output_norm = get_norm(norm, out_channels)

            lateral_conv = Conv2d(
                in_channels, out_channels, kernel_size=1, bias=use_bias, norm=lateral_norm
            )
            output_conv = Conv2d(
                out_channels,
                out_channels,
                kernel_size=3,
                stride=1,
                padding=1,
                bias=use_bias,
                norm=output_norm,
            )
            weight_init.c2_xavier_fill(lateral_conv)
            weight_init.c2_xavier_fill(output_conv)
            stage = int(math.log2(in_strides[idx]))
            self.add_module("fpn_lateral{}".format(stage), lateral_conv)
            self.add_module("fpn_output{}".format(stage), output_conv)

            lateral_convs.append(lateral_conv)
            output_convs.append(output_conv)
        # Place convs into top-down order (from low to high resolution)
        # to make the top-down computation in forward clearer.
        self.lateral_convs = lateral_convs[::-1]
        self.output_convs = output_convs[::-1]
        self.top_block = top_block
        self.in_features = in_features
        self.bottom_up = bottom_up
        # Return feature names are "p<stage>", like ["p2", "p3", ..., "p6"]
        self._out_feature_strides = {"p{}".format(int(math.log2(s))): s for s in in_strides}
        # top block output feature maps.
        if self.top_block is not None:
            for s in range(stage, stage + self.top_block.num_levels):
                self._out_feature_strides["p{}".format(s + 1)] = 2 ** (s + 1)

        self._out_features = list(self._out_feature_strides.keys())
        self._out_feature_channels = {k: out_channels for k in self._out_features}
        self._size_divisibility = in_strides[-1]
        assert fuse_type in {"avg", "sum"}
        self._fuse_type = fuse_type

    @property
    def size_divisibility(self):
        return self._size_divisibility

    def forward(self, x):
        """
        Args:
            input (dict[str->Tensor]): mapping feature map name (e.g., "res5") to
                feature map tensor for each feature level in high to low resolution order.

        Returns:
            dict[str->Tensor]:
                mapping from feature map name to FPN feature map tensor
                in high to low resolution order. Returned feature names follow the FPN
                paper convention: "p<stage>", where stage has stride = 2 ** stage e.g.,
                ["p2", "p3", ..., "p6"].
        """
        # Reverse feature maps into top-down order (from low to high resolution)
        bottom_up_features = self.bottom_up(x)
        x = [bottom_up_features[f] for f in self.in_features[::-1]]
        results = []
        prev_features = self.lateral_convs[0](x[0])
        results.append(self.output_convs[0](prev_features))
        for features, lateral_conv, output_conv in zip(
            x[1:], self.lateral_convs[1:], self.output_convs[1:]
        ):
            top_down_features = F.interpolate(prev_features, scale_factor=2, mode="nearest")
            lateral_features = lateral_conv(features)
            prev_features = lateral_features + top_down_features
            if self._fuse_type == "avg":
                prev_features /= 2
            results.insert(0, output_conv(prev_features))

        if self.top_block is not None:
            top_block_in_feature = bottom_up_features.get(self.top_block.in_feature, None)
            if top_block_in_feature is None:
                top_block_in_feature = results[self._out_features.index(self.top_block.in_feature)]
            results.extend(self.top_block(top_block_in_feature))
        assert len(self._out_features) == len(results)
        return dict(zip(self._out_features, results))

    def output_shape(self):
        return {
            name: ShapeSpec(
                channels=self._out_feature_channels[name], stride=self._out_feature_strides[name]
            )
            for name in self._out_features
        }


def _assert_strides_are_log2_contiguous(strides):
    """
    Assert that each stride is 2x times its preceding stride, i.e. "contiguous in log2".
    """
    for i, stride in enumerate(strides[1:], 1):
        assert stride == 2 * strides[i - 1], "Strides {} {} are not log2 contiguous".format(
            stride, strides[i - 1]
        )


class LastLevelMaxPool(nn.Module):
    """
    This module is used in the original FPN to generate a downsampled
    P6 feature from P5.
    """

    def __init__(self):
        super().__init__()
        self.num_levels = 1
        self.in_feature = "p5"

    def forward(self, x):
        return [F.max_pool2d(x, kernel_size=1, stride=2, padding=0)]


class LastLevelP6P7(nn.Module):
    """
    This module is used in RetinaNet to generate extra layers, P6 and P7 from
    C5 feature.
    """

    def __init__(self, in_channels, out_channels, in_feature="res5"):
        super().__init__()
        self.num_levels = 2
        self.in_feature = in_feature
        self.p6 = nn.Conv2d(in_channels, out_channels, 3, 2, 1)
        self.p7 = nn.Conv2d(out_channels, out_channels, 3, 2, 1)
        for module in [self.p6, self.p7]:
            weight_init.c2_xavier_fill(module)

    def forward(self, c5):
        p6 = self.p6(c5)
        p7 = self.p7(F.relu(p6))
        return [p6, p7]


@BACKBONE_REGISTRY.register()
def build_resnet_fpn_backbone(cfg, input_shape: ShapeSpec):
    """
    Args:
        cfg: a detectron2 CfgNode

    Returns:
        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.
    """
    bottom_up = build_resnet_backbone(cfg, input_shape)
    in_features = cfg.MODEL.FPN.IN_FEATURES
    out_channels = cfg.MODEL.FPN.OUT_CHANNELS
    backbone = FPN(
        bottom_up=bottom_up,
        in_features=in_features,
        out_channels=out_channels,
        norm=cfg.MODEL.FPN.NORM,
        top_block=LastLevelMaxPool(),
        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,
    )
    return backbone


@BACKBONE_REGISTRY.register()
def build_retinanet_resnet_fpn_backbone(cfg, input_shape: ShapeSpec):
    """
    Args:
        cfg: a detectron2 CfgNode

    Returns:
        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.
    """
    bottom_up = build_resnet_backbone(cfg, input_shape)
    in_features = cfg.MODEL.FPN.IN_FEATURES
    out_channels = cfg.MODEL.FPN.OUT_CHANNELS
    in_channels_p6p7 = bottom_up.output_shape()["res5"].channels
    backbone = FPN(
        bottom_up=bottom_up,
        in_features=in_features,
        out_channels=out_channels,
        norm=cfg.MODEL.FPN.NORM,
        top_block=LastLevelMaxPool(),
        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,
    ) if False else FPN(
        bottom_up=bottom_up,
        in_features=in_features,
        out_channels=out_channels,
        norm=cfg.MODEL.FPN.NORM,
        top_block=LastLevelP6P7(in_channels_p6p7, out_channels),
        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,
    )
    return backbone
detectron2/modeling/backbone/resnet.py
0 → 100644
View file @
c732df65
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import numpy as np
import fvcore.nn.weight_init as weight_init
import torch
import torch.nn.functional as F
from torch import nn

from detectron2.layers import (
    CNNBlockBase,
    Conv2d,
    DeformConv,
    ModulatedDeformConv,
    ShapeSpec,
    get_norm,
)

from .backbone import Backbone
from .build import BACKBONE_REGISTRY

__all__ = [
    "ResNetBlockBase",
    "BasicBlock",
    "BottleneckBlock",
    "DeformBottleneckBlock",
    "BasicStem",
    "ResNet",
    "make_stage",
    "build_resnet_backbone",
]


ResNetBlockBase = CNNBlockBase
"""
Alias for backward compatibility.
"""
class BasicBlock(CNNBlockBase):
    """
    The basic residual block for ResNet-18 and ResNet-34 defined in :paper:`ResNet`,
    with two 3x3 conv layers and a projection shortcut if needed.
    """

    def __init__(self, in_channels, out_channels, *, stride=1, norm="BN"):
        """
        Args:
            in_channels (int): Number of input channels.
            out_channels (int): Number of output channels.
            stride (int): Stride for the first conv.
            norm (str or callable): normalization for all conv layers.
                See :func:`layers.get_norm` for supported format.
        """
        super().__init__(in_channels, out_channels, stride)

        if in_channels != out_channels:
            self.shortcut = Conv2d(
                in_channels,
                out_channels,
                kernel_size=1,
                stride=stride,
                bias=False,
                norm=get_norm(norm, out_channels),
            )
        else:
            self.shortcut = None

        self.conv1 = Conv2d(
            in_channels,
            out_channels,
            kernel_size=3,
            stride=stride,
            padding=1,
            bias=False,
            norm=get_norm(norm, out_channels),
        )

        self.conv2 = Conv2d(
            out_channels,
            out_channels,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False,
            norm=get_norm(norm, out_channels),
        )

        for layer in [self.conv1, self.conv2, self.shortcut]:
            if layer is not None:  # shortcut can be None
                weight_init.c2_msra_fill(layer)

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu_(out)
        out = self.conv2(out)

        if self.shortcut is not None:
            shortcut = self.shortcut(x)
        else:
            shortcut = x

        out += shortcut
        out = F.relu_(out)
        return out
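# --- Illustrative sketch (editorial addition): expected shapes for BasicBlock ---
# With in_channels != out_channels a 1x1 projection shortcut is created; stride=2
# halves the spatial size. The values below are made up for illustration.
#
#   block = BasicBlock(64, 128, stride=2, norm="BN")
#   y = block(torch.randn(1, 64, 56, 56))
#   assert y.shape == (1, 128, 28, 28)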
class BottleneckBlock(CNNBlockBase):
    """
    The standard bottleneck residual block used by ResNet-50, 101 and 152
    defined in :paper:`ResNet`. It contains 3 conv layers with kernels
    1x1, 3x3, 1x1, and a projection shortcut if needed.
    """

    def __init__(
        self,
        in_channels,
        out_channels,
        *,
        bottleneck_channels,
        stride=1,
        num_groups=1,
        norm="BN",
        stride_in_1x1=False,
        dilation=1,
    ):
        """
        Args:
            bottleneck_channels (int): number of output channels for the 3x3
                "bottleneck" conv layers.
            num_groups (int): number of groups for the 3x3 conv layer.
            norm (str or callable): normalization for all conv layers.
                See :func:`layers.get_norm` for supported format.
            stride_in_1x1 (bool): when stride>1, whether to put stride in the
                first 1x1 convolution or the bottleneck 3x3 convolution.
            dilation (int): the dilation rate of the 3x3 conv layer.
        """
        super().__init__(in_channels, out_channels, stride)

        if in_channels != out_channels:
            self.shortcut = Conv2d(
                in_channels,
                out_channels,
                kernel_size=1,
                stride=stride,
                bias=False,
                norm=get_norm(norm, out_channels),
            )
        else:
            self.shortcut = None

        # The original MSRA ResNet models have stride in the first 1x1 conv
        # The subsequent fb.torch.resnet and Caffe2 ResNe[X]t implementations have
        # stride in the 3x3 conv
        stride_1x1, stride_3x3 = (stride, 1) if stride_in_1x1 else (1, stride)

        self.conv1 = Conv2d(
            in_channels,
            bottleneck_channels,
            kernel_size=1,
            stride=stride_1x1,
            bias=False,
            norm=get_norm(norm, bottleneck_channels),
        )

        self.conv2 = Conv2d(
            bottleneck_channels,
            bottleneck_channels,
            kernel_size=3,
            stride=stride_3x3,
            padding=1 * dilation,
            bias=False,
            groups=num_groups,
            dilation=dilation,
            norm=get_norm(norm, bottleneck_channels),
        )

        self.conv3 = Conv2d(
            bottleneck_channels,
            out_channels,
            kernel_size=1,
            bias=False,
            norm=get_norm(norm, out_channels),
        )

        for layer in [self.conv1, self.conv2, self.conv3, self.shortcut]:
            if layer is not None:  # shortcut can be None
                weight_init.c2_msra_fill(layer)

        # Zero-initialize the last normalization in each residual branch,
        # so that at the beginning, the residual branch starts with zeros,
        # and each residual block behaves like an identity.
        # See Sec 5.1 in "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour":
        # "For BN layers, the learnable scaling coefficient γ is initialized
        # to be 1, except for each residual block's last BN
        # where γ is initialized to be 0."

        # nn.init.constant_(self.conv3.norm.weight, 0)
        # TODO this somehow hurts performance when training GN models from scratch.
        # Add it as an option when we need to use this code to train a backbone.

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu_(out)

        out = self.conv2(out)
        out = F.relu_(out)

        out = self.conv3(out)

        if self.shortcut is not None:
            shortcut = self.shortcut(x)
        else:
            shortcut = x

        out += shortcut
        out = F.relu_(out)
        return out
class DeformBottleneckBlock(ResNetBlockBase):
    """
    Similar to :class:`BottleneckBlock`, but with :paper:`deformable conv <deformconv>`
    in the 3x3 convolution.
    """

    def __init__(
        self,
        in_channels,
        out_channels,
        *,
        bottleneck_channels,
        stride=1,
        num_groups=1,
        norm="BN",
        stride_in_1x1=False,
        dilation=1,
        deform_modulated=False,
        deform_num_groups=1,
    ):
        super().__init__(in_channels, out_channels, stride)
        self.deform_modulated = deform_modulated

        if in_channels != out_channels:
            self.shortcut = Conv2d(
                in_channels,
                out_channels,
                kernel_size=1,
                stride=stride,
                bias=False,
                norm=get_norm(norm, out_channels),
            )
        else:
            self.shortcut = None

        stride_1x1, stride_3x3 = (stride, 1) if stride_in_1x1 else (1, stride)

        self.conv1 = Conv2d(
            in_channels,
            bottleneck_channels,
            kernel_size=1,
            stride=stride_1x1,
            bias=False,
            norm=get_norm(norm, bottleneck_channels),
        )

        if deform_modulated:
            deform_conv_op = ModulatedDeformConv
            # offset channels are 2 or 3 (if with modulated) * kernel_size * kernel_size
            offset_channels = 27
        else:
            deform_conv_op = DeformConv
            offset_channels = 18

        self.conv2_offset = Conv2d(
            bottleneck_channels,
            offset_channels * deform_num_groups,
            kernel_size=3,
            stride=stride_3x3,
            padding=1 * dilation,
            dilation=dilation,
        )
        self.conv2 = deform_conv_op(
            bottleneck_channels,
            bottleneck_channels,
            kernel_size=3,
            stride=stride_3x3,
            padding=1 * dilation,
            bias=False,
            groups=num_groups,
            dilation=dilation,
            deformable_groups=deform_num_groups,
            norm=get_norm(norm, bottleneck_channels),
        )

        self.conv3 = Conv2d(
            bottleneck_channels,
            out_channels,
            kernel_size=1,
            bias=False,
            norm=get_norm(norm, out_channels),
        )

        for layer in [self.conv1, self.conv2, self.conv3, self.shortcut]:
            if layer is not None:  # shortcut can be None
                weight_init.c2_msra_fill(layer)

        nn.init.constant_(self.conv2_offset.weight, 0)
        nn.init.constant_(self.conv2_offset.bias, 0)

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu_(out)

        if self.deform_modulated:
            offset_mask = self.conv2_offset(out)
            offset_x, offset_y, mask = torch.chunk(offset_mask, 3, dim=1)
            offset = torch.cat((offset_x, offset_y), dim=1)
            mask = mask.sigmoid()
            out = self.conv2(out, offset, mask)
        else:
            offset = self.conv2_offset(out)
            out = self.conv2(out, offset)
        out = F.relu_(out)

        out = self.conv3(out)

        if self.shortcut is not None:
            shortcut = self.shortcut(x)
        else:
            shortcut = x

        out += shortcut
        out = F.relu_(out)
        return out
def make_stage(block_class, num_blocks, first_stride, *, in_channels, out_channels, **kwargs):
    """
    Create a list of blocks just like those in a ResNet stage.

    Args:
        block_class (type): a subclass of ResNetBlockBase
        num_blocks (int):
        first_stride (int): the stride of the first block. The other blocks will have stride=1.
        in_channels (int): input channels of the entire stage.
        out_channels (int): output channels of **every block** in the stage.
        kwargs: other arguments passed to the constructor of every block.

    Returns:
        list[nn.Module]: a list of block module.
    """
    assert "stride" not in kwargs, "Stride of blocks in make_stage cannot be changed."
    blocks = []
    for i in range(num_blocks):
        blocks.append(
            block_class(
                in_channels=in_channels,
                out_channels=out_channels,
                stride=first_stride if i == 0 else 1,
                **kwargs,
            )
        )
        in_channels = out_channels
    return blocks
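# --- Illustrative sketch (editorial addition): building one stage by hand ---
# The block/channel counts mirror the ResNet-50 "res2" stage assembled later by
# build_resnet_backbone; they are shown here only as an example.
#
#   res2_blocks = make_stage(
#       BottleneckBlock, 3, first_stride=1,
#       in_channels=64, out_channels=256, bottleneck_channels=64, norm="BN",
#   )
#   res2 = nn.Sequential(*res2_blocks)   # 3 bottleneck blocks, stride 1 overall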
class BasicStem(CNNBlockBase):
    """
    The standard ResNet stem (layers before the first residual block).
    """

    def __init__(self, in_channels=3, out_channels=64, norm="BN"):
        """
        Args:
            norm (str or callable): norm after the first conv layer.
                See :func:`layers.get_norm` for supported format.
        """
        super().__init__(in_channels, out_channels, 4)
        self.in_channels = in_channels
        self.conv1 = Conv2d(
            in_channels,
            out_channels,
            kernel_size=7,
            stride=2,
            padding=3,
            bias=False,
            norm=get_norm(norm, out_channels),
        )
        weight_init.c2_msra_fill(self.conv1)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu_(x)
        x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)
        return x
class ResNet(Backbone):
    """
    Implement :paper:`ResNet`.
    """

    def __init__(self, stem, stages, num_classes=None, out_features=None):
        """
        Args:
            stem (nn.Module): a stem module
            stages (list[list[CNNBlockBase]]): several (typically 4) stages,
                each contains multiple :class:`CNNBlockBase`.
            num_classes (None or int): if None, will not perform classification.
                Otherwise, will create a linear layer.
            out_features (list[str]): name of the layers whose outputs should
                be returned in forward. Can be anything in "stem", "linear", or "res2" ...
                If None, will return the output of the last layer.
        """
        super(ResNet, self).__init__()
        self.stem = stem
        self.num_classes = num_classes

        current_stride = self.stem.stride
        self._out_feature_strides = {"stem": current_stride}
        self._out_feature_channels = {"stem": self.stem.out_channels}

        self.stages_and_names = []
        for i, blocks in enumerate(stages):
            assert len(blocks) > 0, len(blocks)
            for block in blocks:
                assert isinstance(block, CNNBlockBase), block

            name = "res" + str(i + 2)
            stage = nn.Sequential(*blocks)

            self.add_module(name, stage)
            self.stages_and_names.append((stage, name))

            self._out_feature_strides[name] = current_stride = int(
                current_stride * np.prod([k.stride for k in blocks])
            )
            self._out_feature_channels[name] = curr_channels = blocks[-1].out_channels

        if num_classes is not None:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
            self.linear = nn.Linear(curr_channels, num_classes)

            # Sec 5.1 in "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour":
            # "The 1000-way fully-connected layer is initialized by
            # drawing weights from a zero-mean Gaussian with standard deviation of 0.01."
            nn.init.normal_(self.linear.weight, std=0.01)
            name = "linear"

        if out_features is None:
            out_features = [name]
        self._out_features = out_features
        assert len(self._out_features)
        children = [x[0] for x in self.named_children()]
        for out_feature in self._out_features:
            assert out_feature in children, "Available children: {}".format(", ".join(children))

    def forward(self, x):
        outputs = {}
        x = self.stem(x)
        if "stem" in self._out_features:
            outputs["stem"] = x
        for stage, name in self.stages_and_names:
            x = stage(x)
            if name in self._out_features:
                outputs[name] = x
        if self.num_classes is not None:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.linear(x)
            if "linear" in self._out_features:
                outputs["linear"] = x
        return outputs

    def output_shape(self):
        return {
            name: ShapeSpec(
                channels=self._out_feature_channels[name], stride=self._out_feature_strides[name]
            )
            for name in self._out_features
        }

    def freeze(self, freeze_at=0):
        """
        Freeze the first several stages of the ResNet. Commonly used in
        fine-tuning.

        Layers that produce the same feature map spatial size are defined as one
        "stage" by :paper:`FPN`.

        Args:
            freeze_at (int): number of stages to freeze.
                `1` means freezing the stem. `2` means freezing the stem and
                one residual stage, etc.

        Returns:
            nn.Module: this ResNet itself
        """
        if freeze_at >= 1:
            self.stem.freeze()
        for idx, (stage, _) in enumerate(self.stages_and_names, start=2):
            if freeze_at >= idx:
                for block in stage.children():
                    block.freeze()
        return self
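# --- Illustrative sketch (editorial addition): using ResNet directly ---
# A hand-built ResNet-18-style trunk returning res2..res5; the channel plan is an
# assumption for illustration, not taken from any config in this repository.
#
#   stem = BasicStem(in_channels=3, out_channels=64, norm="BN")
#   plan = [(64, 64), (64, 128), (128, 256), (256, 512)]
#   stages = [
#       make_stage(BasicBlock, 2, first_stride=1 if i == 0 else 2,
#                  in_channels=c_in, out_channels=c_out, norm="BN")
#       for i, (c_in, c_out) in enumerate(plan)
#   ]
#   model = ResNet(stem, stages, out_features=["res2", "res3", "res4", "res5"]).freeze(2)
#   feats = model(torch.randn(1, 3, 224, 224))
#   assert feats["res5"].shape == (1, 512, 7, 7)   # stem and res2 are frozen by freeze(2)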
@BACKBONE_REGISTRY.register()
def build_resnet_backbone(cfg, input_shape):
    """
    Create a ResNet instance from config.

    Returns:
        ResNet: a :class:`ResNet` instance.
    """
    # need registration of new blocks/stems?
    norm = cfg.MODEL.RESNETS.NORM
    stem = BasicStem(
        in_channels=input_shape.channels,
        out_channels=cfg.MODEL.RESNETS.STEM_OUT_CHANNELS,
        norm=norm,
    )

    # fmt: off
    freeze_at           = cfg.MODEL.BACKBONE.FREEZE_AT
    out_features        = cfg.MODEL.RESNETS.OUT_FEATURES
    depth               = cfg.MODEL.RESNETS.DEPTH
    num_groups          = cfg.MODEL.RESNETS.NUM_GROUPS
    width_per_group     = cfg.MODEL.RESNETS.WIDTH_PER_GROUP
    bottleneck_channels = num_groups * width_per_group
    in_channels         = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS
    out_channels        = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS
    stride_in_1x1       = cfg.MODEL.RESNETS.STRIDE_IN_1X1
    res5_dilation       = cfg.MODEL.RESNETS.RES5_DILATION
    deform_on_per_stage = cfg.MODEL.RESNETS.DEFORM_ON_PER_STAGE
    deform_modulated    = cfg.MODEL.RESNETS.DEFORM_MODULATED
    deform_num_groups   = cfg.MODEL.RESNETS.DEFORM_NUM_GROUPS
    # fmt: on
    assert res5_dilation in {1, 2}, "res5_dilation cannot be {}.".format(res5_dilation)

    num_blocks_per_stage = {
        18: [2, 2, 2, 2],
        34: [3, 4, 6, 3],
        50: [3, 4, 6, 3],
        101: [3, 4, 23, 3],
        152: [3, 8, 36, 3],
    }[depth]

    if depth in [18, 34]:
        assert out_channels == 64, "Must set MODEL.RESNETS.RES2_OUT_CHANNELS = 64 for R18/R34"
        assert not any(
            deform_on_per_stage
        ), "MODEL.RESNETS.DEFORM_ON_PER_STAGE unsupported for R18/R34"
        assert res5_dilation == 1, "Must set MODEL.RESNETS.RES5_DILATION = 1 for R18/R34"
        assert num_groups == 1, "Must set MODEL.RESNETS.NUM_GROUPS = 1 for R18/R34"

    stages = []

    # Avoid creating variables without gradients
    # It consumes extra memory and may cause allreduce to fail
    out_stage_idx = [{"res2": 2, "res3": 3, "res4": 4, "res5": 5}[f] for f in out_features]
    max_stage_idx = max(out_stage_idx)
    for idx, stage_idx in enumerate(range(2, max_stage_idx + 1)):
        dilation = res5_dilation if stage_idx == 5 else 1
        first_stride = 1 if idx == 0 or (stage_idx == 5 and dilation == 2) else 2
        stage_kargs = {
            "num_blocks": num_blocks_per_stage[idx],
            "first_stride": first_stride,
            "in_channels": in_channels,
            "out_channels": out_channels,
            "norm": norm,
        }
        # Use BasicBlock for R18 and R34.
        if depth in [18, 34]:
            stage_kargs["block_class"] = BasicBlock
        else:
            stage_kargs["bottleneck_channels"] = bottleneck_channels
            stage_kargs["stride_in_1x1"] = stride_in_1x1
            stage_kargs["dilation"] = dilation
            stage_kargs["num_groups"] = num_groups
            if deform_on_per_stage[idx]:
                stage_kargs["block_class"] = DeformBottleneckBlock
                stage_kargs["deform_modulated"] = deform_modulated
                stage_kargs["deform_num_groups"] = deform_num_groups
            else:
                stage_kargs["block_class"] = BottleneckBlock
        blocks = make_stage(**stage_kargs)
        in_channels = out_channels
        out_channels *= 2
        bottleneck_channels *= 2
        stages.append(blocks)
    return ResNet(stem, stages, out_features=out_features).freeze(freeze_at)
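A minimal, hedged usage sketch for the builder above (an editorial addition, not part of the file). With the stock detectron2 defaults, MODEL.RESNETS.OUT_FEATURES is ["res4"], so calling the function directly on a fresh config yields a single-feature ResNet-50; the printed values assume those defaults.

import torch
from detectron2.config import get_cfg
from detectron2.layers import ShapeSpec
from detectron2.modeling import build_resnet_backbone

cfg = get_cfg()                                    # DEPTH defaults to 50
resnet = build_resnet_backbone(cfg, ShapeSpec(channels=3))
print(resnet.output_shape())                       # e.g. {"res4": ShapeSpec(channels=1024, stride=16)}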
detectron2/modeling/box_regression.py
0 → 100644
View file @
c732df65
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import math
from typing import Tuple
import torch

# Value for clamping large dw and dh predictions. The heuristic is that we clamp
# such that dw and dh are no larger than what would transform a 16px box into a
# 1000px box (based on a small anchor, 16px, and a typical image size, 1000px).
_DEFAULT_SCALE_CLAMP = math.log(1000.0 / 16)

__all__ = ["Box2BoxTransform", "Box2BoxTransformRotated"]


def apply_deltas_broadcast(box2box_transform, deltas, boxes):
    """
    Apply transform deltas to boxes. Similar to `box2box_transform.apply_deltas`,
    but allow broadcasting boxes when the second dimension of deltas is a multiple
    of box dimension.

    Args:
        box2box_transform (Box2BoxTransform or Box2BoxTransformRotated): the transform to apply
        deltas (Tensor): tensor of shape (N,B) or (N,KxB)
        boxes (Tensor): tensor of shape (N,B)

    Returns:
        Tensor: same shape as deltas.
    """
    assert deltas.dim() == boxes.dim() == 2, f"{deltas.shape}, {boxes.shape}"
    N, B = boxes.shape
    assert (
        deltas.shape[1] % B == 0
    ), f"Second dim of deltas should be a multiple of {B}. Got {deltas.shape}"
    K = deltas.shape[1] // B
    ret = box2box_transform.apply_deltas(
        deltas.view(N * K, B), boxes.unsqueeze(1).expand(N, K, B).reshape(N * K, B)
    )
    return ret.view(N, K * B)
@torch.jit.script
class Box2BoxTransform(object):
    """
    The box-to-box transform defined in R-CNN. The transformation is parameterized
    by 4 deltas: (dx, dy, dw, dh). The transformation scales the box's width and height
    by exp(dw), exp(dh) and shifts a box's center by the offset (dx * width, dy * height).
    """

    def __init__(
        self, weights: Tuple[float, float, float, float], scale_clamp: float = _DEFAULT_SCALE_CLAMP
    ):
        """
        Args:
            weights (4-element tuple): Scaling factors that are applied to the
                (dx, dy, dw, dh) deltas. In Fast R-CNN, these were originally set
                such that the deltas have unit variance; now they are treated as
                hyperparameters of the system.
            scale_clamp (float): When predicting deltas, the predicted box scaling
                factors (dw and dh) are clamped such that they are <= scale_clamp.
        """
        self.weights = weights
        self.scale_clamp = scale_clamp

    def get_deltas(self, src_boxes, target_boxes):
        """
        Get box regression transformation deltas (dx, dy, dw, dh) that can be used
        to transform the `src_boxes` into the `target_boxes`. That is, the relation
        ``target_boxes == self.apply_deltas(deltas, src_boxes)`` is true (unless
        any delta is too large and is clamped).

        Args:
            src_boxes (Tensor): source boxes, e.g., object proposals
            target_boxes (Tensor): target of the transformation, e.g., ground-truth
                boxes.
        """
        assert isinstance(src_boxes, torch.Tensor), type(src_boxes)
        assert isinstance(target_boxes, torch.Tensor), type(target_boxes)

        src_widths = src_boxes[:, 2] - src_boxes[:, 0]
        src_heights = src_boxes[:, 3] - src_boxes[:, 1]
        src_ctr_x = src_boxes[:, 0] + 0.5 * src_widths
        src_ctr_y = src_boxes[:, 1] + 0.5 * src_heights

        target_widths = target_boxes[:, 2] - target_boxes[:, 0]
        target_heights = target_boxes[:, 3] - target_boxes[:, 1]
        target_ctr_x = target_boxes[:, 0] + 0.5 * target_widths
        target_ctr_y = target_boxes[:, 1] + 0.5 * target_heights

        wx, wy, ww, wh = self.weights
        dx = wx * (target_ctr_x - src_ctr_x) / src_widths
        dy = wy * (target_ctr_y - src_ctr_y) / src_heights
        dw = ww * torch.log(target_widths / src_widths)
        dh = wh * torch.log(target_heights / src_heights)

        deltas = torch.stack((dx, dy, dw, dh), dim=1)
        assert (src_widths > 0).all().item(), "Input boxes to Box2BoxTransform are not valid!"
        return deltas

    def apply_deltas(self, deltas, boxes):
        """
        Apply transformation `deltas` (dx, dy, dw, dh) to `boxes`.

        Args:
            deltas (Tensor): transformation deltas of shape (N, k*4), where k >= 1.
                deltas[i] represents k potentially different class-specific
                box transformations for the single box boxes[i].
            boxes (Tensor): boxes to transform, of shape (N, 4)
        """
        boxes = boxes.to(deltas.dtype)

        widths = boxes[:, 2] - boxes[:, 0]
        heights = boxes[:, 3] - boxes[:, 1]
        ctr_x = boxes[:, 0] + 0.5 * widths
        ctr_y = boxes[:, 1] + 0.5 * heights

        wx, wy, ww, wh = self.weights
        dx = deltas[:, 0::4] / wx
        dy = deltas[:, 1::4] / wy
        dw = deltas[:, 2::4] / ww
        dh = deltas[:, 3::4] / wh

        # Prevent sending too large values into torch.exp()
        dw = torch.clamp(dw, max=self.scale_clamp)
        dh = torch.clamp(dh, max=self.scale_clamp)

        pred_ctr_x = dx * widths[:, None] + ctr_x[:, None]
        pred_ctr_y = dy * heights[:, None] + ctr_y[:, None]
        pred_w = torch.exp(dw) * widths[:, None]
        pred_h = torch.exp(dh) * heights[:, None]

        pred_boxes = torch.zeros_like(deltas)
        pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w  # x1
        pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h  # y1
        pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w  # x2
        pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h  # y2
        return pred_boxes
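# --- Illustrative sketch (editorial addition): delta round trip ---
# Weights (10., 10., 5., 5.) are a common Fast R-CNN choice; the boxes are made up.
#
#   t = Box2BoxTransform(weights=(10.0, 10.0, 5.0, 5.0))
#   src = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
#   tgt = torch.tensor([[2.0, 2.0, 12.0, 14.0]])
#   deltas = t.get_deltas(src, tgt)
#   assert torch.allclose(t.apply_deltas(deltas, src), tgt, atol=1e-4)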
@torch.jit.script
class Box2BoxTransformRotated(object):
    """
    The box-to-box transform defined in Rotated R-CNN. The transformation is parameterized
    by 5 deltas: (dx, dy, dw, dh, da). The transformation scales the box's width and height
    by exp(dw), exp(dh), shifts a box's center by the offset (dx * width, dy * height),
    and rotates a box's angle by da (radians).
    Note: angles of deltas are in radians while angles of boxes are in degrees.
    """

    def __init__(
        self,
        weights: Tuple[float, float, float, float, float],
        scale_clamp: float = _DEFAULT_SCALE_CLAMP,
    ):
        """
        Args:
            weights (5-element tuple): Scaling factors that are applied to the
                (dx, dy, dw, dh, da) deltas. These are treated as
                hyperparameters of the system.
            scale_clamp (float): When predicting deltas, the predicted box scaling
                factors (dw and dh) are clamped such that they are <= scale_clamp.
        """
        self.weights = weights
        self.scale_clamp = scale_clamp

    def get_deltas(self, src_boxes, target_boxes):
        """
        Get box regression transformation deltas (dx, dy, dw, dh, da) that can be used
        to transform the `src_boxes` into the `target_boxes`. That is, the relation
        ``target_boxes == self.apply_deltas(deltas, src_boxes)`` is true (unless
        any delta is too large and is clamped).

        Args:
            src_boxes (Tensor): Nx5 source boxes, e.g., object proposals
            target_boxes (Tensor): Nx5 target of the transformation, e.g., ground-truth
                boxes.
        """
        assert isinstance(src_boxes, torch.Tensor), type(src_boxes)
        assert isinstance(target_boxes, torch.Tensor), type(target_boxes)

        src_ctr_x, src_ctr_y, src_widths, src_heights, src_angles = torch.unbind(src_boxes, dim=1)

        target_ctr_x, target_ctr_y, target_widths, target_heights, target_angles = torch.unbind(
            target_boxes, dim=1
        )

        wx, wy, ww, wh, wa = self.weights
        dx = wx * (target_ctr_x - src_ctr_x) / src_widths
        dy = wy * (target_ctr_y - src_ctr_y) / src_heights
        dw = ww * torch.log(target_widths / src_widths)
        dh = wh * torch.log(target_heights / src_heights)
        # Angles of deltas are in radians while angles of boxes are in degrees.
        # The conversion to radians serves as a way to normalize the values.
        da = target_angles - src_angles
        da = (da + 180.0) % 360.0 - 180.0  # make it in [-180, 180)
        da *= wa * math.pi / 180.0

        deltas = torch.stack((dx, dy, dw, dh, da), dim=1)
        assert (
            (src_widths > 0).all().item()
        ), "Input boxes to Box2BoxTransformRotated are not valid!"
        return deltas

    def apply_deltas(self, deltas, boxes):
        """
        Apply transformation `deltas` (dx, dy, dw, dh, da) to `boxes`.

        Args:
            deltas (Tensor): transformation deltas of shape (N, 5).
                deltas[i] represents box transformation for the single box boxes[i].
            boxes (Tensor): boxes to transform, of shape (N, 5)
        """
        assert deltas.shape[1] == 5 and boxes.shape[1] == 5

        boxes = boxes.to(deltas.dtype)

        ctr_x = boxes[:, 0]
        ctr_y = boxes[:, 1]
        widths = boxes[:, 2]
        heights = boxes[:, 3]
        angles = boxes[:, 4]

        wx, wy, ww, wh, wa = self.weights
        dx = deltas[:, 0] / wx
        dy = deltas[:, 1] / wy
        dw = deltas[:, 2] / ww
        dh = deltas[:, 3] / wh
        da = deltas[:, 4] / wa

        # Prevent sending too large values into torch.exp()
        dw = torch.clamp(dw, max=self.scale_clamp)
        dh = torch.clamp(dh, max=self.scale_clamp)

        pred_boxes = torch.zeros_like(deltas)
        pred_boxes[:, 0] = dx * widths + ctr_x  # x_ctr
        pred_boxes[:, 1] = dy * heights + ctr_y  # y_ctr
        pred_boxes[:, 2] = torch.exp(dw) * widths  # width
        pred_boxes[:, 3] = torch.exp(dh) * heights  # height

        # Following original RRPN implementation,
        # angles of deltas are in radians while angles of boxes are in degrees.
        pred_angle = da * 180.0 / math.pi + angles
        pred_angle = (pred_angle + 180.0) % 360.0 - 180.0  # make it in [-180, 180)

        pred_boxes[:, 4] = pred_angle

        return pred_boxes
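A short round-trip example for the rotated variant (an editorial addition): rows are (cx, cy, w, h, angle-in-degrees) boxes and the fifth weight scales the angle delta; all numbers are made up for illustration.

import torch
from detectron2.modeling.box_regression import Box2BoxTransformRotated

tr = Box2BoxTransformRotated(weights=(10.0, 10.0, 5.0, 5.0, 1.0))
src = torch.tensor([[50.0, 50.0, 20.0, 10.0, 0.0]])
tgt = torch.tensor([[52.0, 48.0, 22.0, 12.0, 30.0]])
deltas = tr.get_deltas(src, tgt)
assert torch.allclose(tr.apply_deltas(deltas, src), tgt, atol=1e-4)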
detectron2/modeling/matcher.py
0 → 100644
View file @
c732df65
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from typing import List
import torch


class Matcher(object):
    """
    This class assigns to each predicted "element" (e.g., a box) a ground-truth
    element. Each predicted element will have exactly zero or one matches; each
    ground-truth element may be matched to zero or more predicted elements.

    The matching is determined by the MxN match_quality_matrix, that characterizes
    how well each (ground-truth, prediction)-pair matches each other. For example,
    if the elements are boxes, this matrix may contain box intersection-over-union
    overlap values.

    The matcher returns (a) a vector of length N containing the index of the
    ground-truth element m in [0, M) that matches to prediction n in [0, N).
    (b) a vector of length N containing the labels for each prediction.
    """

    def __init__(
        self, thresholds: List[float], labels: List[int], allow_low_quality_matches: bool = False
    ):
        """
        Args:
            thresholds (list): a list of thresholds used to stratify predictions
                into levels.
            labels (list): a list of values to label predictions belonging at
                each level. A label can be one of {-1, 0, 1} signifying
                {ignore, negative class, positive class}, respectively.
            allow_low_quality_matches (bool): if True, produce additional matches
                for predictions with maximum match quality lower than high_threshold.
                See set_low_quality_matches_ for more details.

            For example,
                thresholds = [0.3, 0.5]
                labels = [0, -1, 1]
                All predictions with iou < 0.3 will be marked with 0 and
                thus will be considered as false positives while training.
                All predictions with 0.3 <= iou < 0.5 will be marked with -1 and
                thus will be ignored.
                All predictions with 0.5 <= iou will be marked with 1 and
                thus will be considered as true positives.
        """
        # Add -inf and +inf to first and last position in thresholds
        thresholds = thresholds[:]
        assert thresholds[0] > 0
        thresholds.insert(0, -float("inf"))
        thresholds.append(float("inf"))
        assert all(low <= high for (low, high) in zip(thresholds[:-1], thresholds[1:]))
        assert all(l in [-1, 0, 1] for l in labels)
        assert len(labels) == len(thresholds) - 1
        self.thresholds = thresholds
        self.labels = labels
        self.allow_low_quality_matches = allow_low_quality_matches

    def __call__(self, match_quality_matrix):
        """
        Args:
            match_quality_matrix (Tensor[float]): an MxN tensor, containing the
                pairwise quality between M ground-truth elements and N predicted
                elements. All elements must be >= 0 (due to the use of `torch.nonzero`
                for selecting indices in :meth:`set_low_quality_matches_`).

        Returns:
            matches (Tensor[int64]): a vector of length N, where matches[i] is a matched
                ground-truth index in [0, M)
            match_labels (Tensor[int8]): a vector of length N, where pred_labels[i] indicates
                whether a prediction is a true or false positive or ignored
        """
        assert match_quality_matrix.dim() == 2
        if match_quality_matrix.numel() == 0:
            default_matches = match_quality_matrix.new_full(
                (match_quality_matrix.size(1),), 0, dtype=torch.int64
            )
            # When no gt boxes exist, we define IOU = 0 and therefore set labels
            # to `self.labels[0]`, which usually defaults to background class 0
            # To choose to ignore instead, can make labels=[-1,0,-1,1] + set appropriate thresholds
            default_match_labels = match_quality_matrix.new_full(
                (match_quality_matrix.size(1),), self.labels[0], dtype=torch.int8
            )
            return default_matches, default_match_labels

        assert torch.all(match_quality_matrix >= 0)

        # match_quality_matrix is M (gt) x N (predicted)
        # Max over gt elements (dim 0) to find best gt candidate for each prediction
        matched_vals, matches = match_quality_matrix.max(dim=0)

        match_labels = matches.new_full(matches.size(), 1, dtype=torch.int8)

        for (l, low, high) in zip(self.labels, self.thresholds[:-1], self.thresholds[1:]):
            low_high = (matched_vals >= low) & (matched_vals < high)
            match_labels[low_high] = l

        if self.allow_low_quality_matches:
            self.set_low_quality_matches_(match_labels, match_quality_matrix)

        return matches, match_labels

    def set_low_quality_matches_(self, match_labels, match_quality_matrix):
        """
        Produce additional matches for predictions that have only low-quality matches.
        Specifically, for each ground-truth G find the set of predictions that have
        maximum overlap with it (including ties); for each prediction in that set, if
        it is unmatched, then match it to the ground-truth G.

        This function implements the RPN assignment case (i) in Sec. 3.1.2 of
        :paper:`Faster R-CNN`.
        """
        # For each gt, find the prediction with which it has highest quality
        highest_quality_foreach_gt, _ = match_quality_matrix.max(dim=1)

        # Find the highest quality match available, even if it is low, including ties.
        # Note that the match qualities must be positive due to the use of
        # `torch.nonzero`.
        _, pred_inds_with_highest_quality = torch.nonzero(
            match_quality_matrix == highest_quality_foreach_gt[:, None], as_tuple=True
        )

        # If an anchor was labeled positive only due to a low-quality match
        # with gt_A, but it has larger overlap with gt_B, its matched index will still be gt_B.
        # This follows the implementation in Detectron, and is found to have no significant impact.
        match_labels[pred_inds_with_highest_quality] = 1
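A small worked example for the Matcher (an editorial addition). With the RPN-style thresholds from the docstring above, each prediction (a column of the quality matrix) receives its best ground-truth index plus a {-1, 0, 1} label; the IoU values are made up.

import torch
from detectron2.modeling.matcher import Matcher

matcher = Matcher(thresholds=[0.3, 0.5], labels=[0, -1, 1], allow_low_quality_matches=False)
iou = torch.tensor([[0.9, 0.4, 0.1],     # 2 ground-truth boxes x 3 predictions
                    [0.2, 0.1, 0.0]])
matches, match_labels = matcher(iou)
# matches      -> tensor([0, 0, 0])           best gt index per prediction
# match_labels -> tensor([1, -1, 0], int8)    positive / ignored / negative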
detectron2/modeling/meta_arch/__init__.py
0 → 100644
View file @
c732df65
# -*- coding: utf-8 -*-
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved

from .build import META_ARCH_REGISTRY, build_model  # isort:skip

from .panoptic_fpn import PanopticFPN

# import all the meta_arch, so they will be registered
from .rcnn import GeneralizedRCNN, ProposalNetwork
from .retinanet import RetinaNet
from .semantic_seg import SEM_SEG_HEADS_REGISTRY, SemanticSegmentor, build_sem_seg_head
detectron2/modeling/meta_arch/build.py
0 → 100644
View file @
c732df65
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import torch

from detectron2.utils.registry import Registry

META_ARCH_REGISTRY = Registry("META_ARCH")  # noqa F401 isort:skip
META_ARCH_REGISTRY.__doc__ = """
Registry for meta-architectures, i.e. the whole model.

The registered object will be called with `obj(cfg)`
and expected to return a `nn.Module` object.
"""


def build_model(cfg):
    """
    Build the whole model architecture, defined by ``cfg.MODEL.META_ARCHITECTURE``.
    Note that it does not load any weights from ``cfg``.
    """
    meta_arch = cfg.MODEL.META_ARCHITECTURE
    model = META_ARCH_REGISTRY.get(meta_arch)(cfg)
    model.to(torch.device(cfg.MODEL.DEVICE))
    return model
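A hedged usage sketch for build_model (an editorial addition): any class decorated with @META_ARCH_REGISTRY.register() can be selected through cfg.MODEL.META_ARCHITECTURE; the config file name below is only an example of a standard detectron2 config.

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.modeling import build_model

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.DEVICE = "cpu"      # keep the sketch runnable without a GPU
model = build_model(cfg)      # architecture only; no weights are loaded here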
detectron2/modeling/meta_arch/panoptic_fpn.py
0 → 100644
View file @
c732df65
# -*- coding: utf-8 -*-
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import torch
from torch import nn

from detectron2.structures import ImageList

from ..backbone import build_backbone
from ..postprocessing import detector_postprocess, sem_seg_postprocess
from ..proposal_generator import build_proposal_generator
from ..roi_heads import build_roi_heads
from .build import META_ARCH_REGISTRY
from .semantic_seg import build_sem_seg_head

__all__ = ["PanopticFPN"]


@META_ARCH_REGISTRY.register()
class PanopticFPN(nn.Module):
    """
    Implement the paper :paper:`PanopticFPN`.
    """

    def __init__(self, cfg):
        super().__init__()

        self.instance_loss_weight = cfg.MODEL.PANOPTIC_FPN.INSTANCE_LOSS_WEIGHT

        # options when combining instance & semantic outputs
        self.combine_on = cfg.MODEL.PANOPTIC_FPN.COMBINE.ENABLED
        self.combine_overlap_threshold = cfg.MODEL.PANOPTIC_FPN.COMBINE.OVERLAP_THRESH
        self.combine_stuff_area_limit = cfg.MODEL.PANOPTIC_FPN.COMBINE.STUFF_AREA_LIMIT
        self.combine_instances_confidence_threshold = (
            cfg.MODEL.PANOPTIC_FPN.COMBINE.INSTANCES_CONFIDENCE_THRESH
        )

        self.backbone = build_backbone(cfg)
        self.proposal_generator = build_proposal_generator(cfg, self.backbone.output_shape())
        self.roi_heads = build_roi_heads(cfg, self.backbone.output_shape())
        self.sem_seg_head = build_sem_seg_head(cfg, self.backbone.output_shape())
        self.register_buffer("pixel_mean", torch.Tensor(cfg.MODEL.PIXEL_MEAN).view(-1, 1, 1))
        self.register_buffer("pixel_std", torch.Tensor(cfg.MODEL.PIXEL_STD).view(-1, 1, 1))

    @property
    def device(self):
        return self.pixel_mean.device

    def forward(self, batched_inputs):
        """
        Args:
            batched_inputs: a list, batched outputs of :class:`DatasetMapper`.
                Each item in the list contains the inputs for one image.

                For now, each item in the list is a dict that contains:

                * "image": Tensor, image in (C, H, W) format.
                * "instances": Instances
                * "sem_seg": semantic segmentation ground truth.
                * Other information that's included in the original dicts, such as:
                  "height", "width" (int): the output resolution of the model, used in inference.
                  See :meth:`postprocess` for details.

        Returns:
            list[dict]:
                each dict is the results for one image. The dict contains the following keys:

                * "instances": see :meth:`GeneralizedRCNN.forward` for its format.
                * "sem_seg": see :meth:`SemanticSegmentor.forward` for its format.
                * "panoptic_seg": available when `PANOPTIC_FPN.COMBINE.ENABLED`.
                  See the return value of
                  :func:`combine_semantic_and_instance_outputs` for its format.
        """
        images = [x["image"].to(self.device) for x in batched_inputs]
        images = [(x - self.pixel_mean) / self.pixel_std for x in images]
        images = ImageList.from_tensors(images, self.backbone.size_divisibility)
        features = self.backbone(images.tensor)

        if "proposals" in batched_inputs[0]:
            proposals = [x["proposals"].to(self.device) for x in batched_inputs]
            proposal_losses = {}

        if "sem_seg" in batched_inputs[0]:
            gt_sem_seg = [x["sem_seg"].to(self.device) for x in batched_inputs]
            gt_sem_seg = ImageList.from_tensors(
                gt_sem_seg, self.backbone.size_divisibility, self.sem_seg_head.ignore_value
            ).tensor
        else:
            gt_sem_seg = None
        sem_seg_results, sem_seg_losses = self.sem_seg_head(features, gt_sem_seg)

        if "instances" in batched_inputs[0]:
            gt_instances = [x["instances"].to(self.device) for x in batched_inputs]
        else:
            gt_instances = None
        if self.proposal_generator:
            proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
        detector_results, detector_losses = self.roi_heads(
            images, features, proposals, gt_instances
        )

        if self.training:
            losses = {}
            losses.update(sem_seg_losses)
            losses.update({k: v * self.instance_loss_weight for k, v in detector_losses.items()})
            losses.update(proposal_losses)
            return losses

        processed_results = []
        for sem_seg_result, detector_result, input_per_image, image_size in zip(
            sem_seg_results, detector_results, batched_inputs, images.image_sizes
        ):
            height = input_per_image.get("height", image_size[0])
            width = input_per_image.get("width", image_size[1])
            sem_seg_r = sem_seg_postprocess(sem_seg_result, image_size, height, width)
            detector_r = detector_postprocess(detector_result, height, width)

            processed_results.append({"sem_seg": sem_seg_r, "instances": detector_r})

            if self.combine_on:
                panoptic_r = combine_semantic_and_instance_outputs(
                    detector_r,
                    sem_seg_r.argmax(dim=0),
                    self.combine_overlap_threshold,
                    self.combine_stuff_area_limit,
                    self.combine_instances_confidence_threshold,
                )
                processed_results[-1]["panoptic_seg"] = panoptic_r
        return processed_results
def combine_semantic_and_instance_outputs(
    instance_results,
    semantic_results,
    overlap_threshold,
    stuff_area_limit,
    instances_confidence_threshold,
):
    """
    Implement a simple combining logic following
    "combine_semantic_and_instance_predictions.py" in panopticapi
    to produce panoptic segmentation outputs.

    Args:
        instance_results: output of :func:`detector_postprocess`.
        semantic_results: an (H, W) tensor, each is the contiguous semantic
            category id

    Returns:
        panoptic_seg (Tensor): of shape (height, width) where the values are ids for each segment.
        segments_info (list[dict]): Describe each segment in `panoptic_seg`.
            Each dict contains keys "id", "category_id", "isthing".
    """
    panoptic_seg = torch.zeros_like(semantic_results, dtype=torch.int32)

    # sort instance outputs by scores
    sorted_inds = torch.argsort(-instance_results.scores)

    current_segment_id = 0
    segments_info = []

    instance_masks = instance_results.pred_masks.to(dtype=torch.bool, device=panoptic_seg.device)

    # Add instances one-by-one, check for overlaps with existing ones
    for inst_id in sorted_inds:
        score = instance_results.scores[inst_id].item()
        if score < instances_confidence_threshold:
            break
        mask = instance_masks[inst_id]  # H,W
        mask_area = mask.sum().item()

        if mask_area == 0:
            continue

        intersect = (mask > 0) & (panoptic_seg > 0)
        intersect_area = intersect.sum().item()

        if intersect_area * 1.0 / mask_area > overlap_threshold:
            continue

        if intersect_area > 0:
            mask = mask & (panoptic_seg == 0)

        current_segment_id += 1
        panoptic_seg[mask] = current_segment_id
        segments_info.append(
            {
                "id": current_segment_id,
                "isthing": True,
                "score": score,
                "category_id": instance_results.pred_classes[inst_id].item(),
                "instance_id": inst_id.item(),
            }
        )

    # Add semantic results to remaining empty areas
    semantic_labels = torch.unique(semantic_results).cpu().tolist()
    for semantic_label in semantic_labels:
        if semantic_label == 0:  # 0 is a special "thing" class
            continue
        mask = (semantic_results == semantic_label) & (panoptic_seg == 0)
        mask_area = mask.sum().item()
        if mask_area < stuff_area_limit:
            continue

        current_segment_id += 1
        panoptic_seg[mask] = current_segment_id
        segments_info.append(
            {
                "id": current_segment_id,
                "isthing": False,
                "category_id": semantic_label,
                "area": mask_area,
            }
        )

    return panoptic_seg, segments_info
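A minimal sketch of the combination step (an editorial addition). instance_results only needs .scores, .pred_classes and .pred_masks, so a hand-built Instances object is enough to see the merging; every number below is made up for illustration.

import torch
from detectron2.structures import Instances
from detectron2.modeling.meta_arch.panoptic_fpn import combine_semantic_and_instance_outputs

h, w = 4, 4
inst = Instances((h, w))
inst.scores = torch.tensor([0.9])
inst.pred_classes = torch.tensor([0])
inst.pred_masks = torch.zeros(1, h, w, dtype=torch.bool)
inst.pred_masks[0, :2, :2] = True                  # one "thing" segment in the corner
sem = torch.full((h, w), 7, dtype=torch.int64)     # a single "stuff" class elsewhere

panoptic_seg, segments_info = combine_semantic_and_instance_outputs(
    inst, sem, overlap_threshold=0.5, stuff_area_limit=4, instances_confidence_threshold=0.5
)
# segments_info -> [{"id": 1, "isthing": True, ...}, {"id": 2, "isthing": False, ...}]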