We provide correctly dilated pre-trained ResNet and DenseNet models (output stride of 8) for semantic segmentation.
For ResNet, we set the stride-2 Conv3x3 at the beginning of the affected stages to stride 1 and update the dilation of the conv layers that follow.
For DenseNet, we provide :class:`encoding.nn.DilatedAvgPool2d`, which handles the dilation of the transition layers; the dilation of the conv layers that follow is updated accordingly.
All provided models have been verified.
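As a rough illustration (not the library's implementation), a torchvision ResNet can be dilated to an output stride of 8 by turning the stride-2 convolutions of the last two stages into stride-1 convolutions and applying dilation to the 3x3 convolutions; the helper ``dilate_stage`` below is hypothetical::

    import torch.nn as nn
    from torchvision.models import resnet50

    def dilate_stage(stage, dilation):
        """Set stride-2 convs in a ResNet stage to stride 1 and dilate the 3x3 convs."""
        for m in stage.modules():
            if isinstance(m, nn.Conv2d):
                if m.stride == (2, 2):
                    m.stride = (1, 1)
                if m.kernel_size == (3, 3):
                    m.dilation = (dilation, dilation)
                    m.padding = (dilation, dilation)

    model = resnet50(pretrained=True)
    dilate_stage(model.layer3, dilation=2)   # stage 3 no longer downsamples
    dilate_stage(model.layer4, dilation=4)   # overall output stride: 32 -> 8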
.. note::
    This code is released together with the paper (coming soon); please cite our work if you use it.
* Please follow the `PyTorch instructions <https://github.com/pytorch/pytorch#from-source>`_ to install PyTorch from source into the ``$HOME`` directory (recommended), or simply clone a copy into the ``$HOME`` directory::
To avoid the checkerboard artifacts of standard fractionally-strided convolution, we adopt an integer-stride convolution that produces :math:`2\times 2` outputs for each convolutional window.
.. image:: _static/img/upconv.png
:width: 50%
:align: center
Reference:
Hang Zhang and Kristin Dana. "Multi-style Generative Network for Real-time Transfer." *arXiv preprint arXiv:1703.06953 (2017)*
Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): Zero-padding added to both sides of the input. Default: 0
output_padding (int or tuple, optional): Zero-padding added to one side of the output. Default: 0
groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional): If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
scale_factor (int): scaling factor for upsampling convolution. Default: 1
Shape:
- Input: :math:`(N, C_{in}, H_{in}, W_{in})`
- Output: :math:`(N, C_{out}, H_{out}, W_{out})` where
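The output-size formula is elided above. As a minimal sketch of the mechanism (not the library's implementation), an integer-stride convolution can emit :math:`scale^2` times the output channels and rearrange them into a :math:`scale\times scale` block per window, e.g. with a pixel shuffle; the class name below is illustrative::

    import torch
    import torch.nn as nn

    class NaiveUpsampleConv2d(nn.Module):
        """Sketch: conv emits scale*scale outputs per window, rearranged spatially."""
        def __init__(self, in_channels, out_channels, kernel_size, scale_factor=2):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, out_channels * scale_factor ** 2,
                                  kernel_size, padding=kernel_size // 2)
            self.shuffle = nn.PixelShuffle(scale_factor)

        def forward(self, x):
            return self.shuffle(self.conv(x))

    x = torch.randn(1, 16, 32, 32)
    y = NaiveUpsampleConv2d(16, 8, kernel_size=3, scale_factor=2)(x)
    print(y.shape)  # torch.Size([1, 8, 64, 64])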
Encoding Layer: a learnable residual encoder over a 3D or 4D input, which is treated as a mini-batch.
...
...
@@ -35,6 +36,9 @@ class Encoding(nn.Module):
Please see the `example of training Deep TEN <./experiments/texture.html>`_.
Reference:
Hang Zhang, Jia Xue, and Kristin Dana. "Deep TEN: Texture Encoding Network." *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017*
Args:
D: dimension of the features (number of feature channels)
K: number of codewords
...
...
@@ -51,22 +55,19 @@ class Encoding(nn.Module):
>>> import encoding
>>> import torch
>>> import torch.nn.functional as F
>>> from torch.autograd import Variable
>>> B,C,H,W,K = 2,3,4,5,6
>>> X = Variable(torch.cuda.DoubleTensor(B,C,H,W).uniform_(-0.5,0.5), requires_grad=True)
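A hedged continuation of this example (the constructor path ``encoding.nn.Encoding`` and the aggregate output shape :math:`(B, K, C)` are assumptions based on the description above)::

    >>> layer = encoding.nn.Encoding(C, K).double().cuda()  # assumed constructor
    >>> E = layer(X)  # assumed aggregate shape: (B, K, C)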
@@ -159,7 +159,7 @@ class EncodingShake(nn.Module):
+ str(self.D) + ')'
class Inspiration(Module):
r"""
Inspiration Layer (CoMatch Layer) enables multi-style transfer in a feed-forward network by learning to match the target feature statistics during training.
This module is differentiable and can be inserted into a standard feed-forward network to be learned directly from the loss function without additional supervision.
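A minimal sketch of the CoMatch idea, assuming the target style is summarized by a :math:`C\times C` Gram matrix :math:`G`; the class and method names below are illustrative, not the library's API::

    import torch
    import torch.nn as nn

    class CoMatchSketch(nn.Module):
        """Learn a CxC matrix W that maps the target Gram matrix onto the content features."""
        def __init__(self, channels):
            super().__init__()
            self.weight = nn.Parameter(torch.eye(channels))
            self.register_buffer('gram', torch.eye(channels))

        def set_target(self, gram):
            self.gram = gram  # (C, C) Gram matrix of the style features

        def forward(self, x):
            b, c, h, w = x.size()
            p = self.weight @ self.gram                   # (C, C)
            out = p.expand(b, c, c) @ x.view(b, c, -1)    # batched matmul
            return out.view(b, c, h, w)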
args: :attr:`args.lr_scheduler` lr scheduler mode (``cos``, ``poly``, ``step``), :attr:`args.lr` base learning rate, :attr:`args.epochs` number of epochs, :attr:`args.lr_step` step size for the ``step`` mode
niters: number of iterations per epoch
"""
def __init__(self, args, niters=0):
    self.mode = args.lr_scheduler
    print('Using {} LR Scheduler!'.format(self.mode))
    self.lr = args.lr
    if self.mode == 'step':
        self.lr_step = args.lr_step
    else:
        self.niters = niters
        self.N = args.epochs * niters
    self.epoch = -1
def __call__(self, optimizer, i, epoch):
    if self.mode == 'cos':
        T = (epoch - 1) * self.niters + i
        lr = 0.5 * self.lr * (1 + math.cos(1.0 * T / self.N * math.pi))
    elif self.mode == 'poly':
        T = (epoch - 1) * self.niters + i
        lr = self.lr * pow((1 - 1.0 * T / self.N), 0.9)
    elif self.mode == 'step':
        lr = self.lr * (0.1 ** ((epoch - 1) // self.lr_step))
    else:
        raise RuntimeError('Unknown LR scheduler!')
    if epoch > self.epoch:
        print('\n=> Epoch %i, learning rate = %.4f' % (epoch, lr))
        self.epoch = epoch
    self._adjust_learning_rate(optimizer, lr)
def _adjust_learning_rate(self, optimizer, lr):
    if len(optimizer.param_groups) == 1:
        optimizer.param_groups[0]['lr'] = lr
    else:
        # enlarge the lr at the head
        optimizer.param_groups[0]['lr'] = lr
        for i in range(1, len(optimizer.param_groups)):
            optimizer.param_groups[i]['lr'] = lr * 10
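A hedged usage sketch: the scheduler class name ``LR_Scheduler`` and the ``optimizer``/``train_loader`` objects are assumptions for illustration, ``math`` must be importable where the class is defined, and ``args`` only needs the attributes read in ``__init__``::

    import argparse

    # hypothetical: LR_Scheduler is the class defined above; optimizer and
    # train_loader come from the surrounding training script
    args = argparse.Namespace(lr_scheduler='cos', lr=0.1, epochs=60, lr_step=20)
    scheduler = LR_Scheduler(args, niters=len(train_loader))

    for epoch in range(1, args.epochs + 1):           # epochs are 1-indexed here
        for i, (image, target) in enumerate(train_loader):
            scheduler(optimizer, i, epoch)            # sets optimizer.param_groups[*]['lr']
            # ... forward / backward / optimizer.step() ...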
# refer to https://github.com/xternalz/WideResNet-pytorch