Aggregate operation, aggregate the residuals of inputs (:math:`X`) with repect to the codewords (:math:`C`) with assignment weights (:math:`A`).
Aggregate operation, aggregate the residuals of inputs (:math:`X`) with repect
to the codewords (:math:`C`) with assignment weights (:math:`A`).
.. math::
e_{k} = \sum_{i=1}^{N} a_{ik} (x_i - d_k)
Shape:
- Input: :math:`A\in\mathcal{R}^{B\times N\times K}` :math:`X\in\mathcal{R}^{B\times N\times D}` :math:`C\in\mathcal{R}^{K\times D}` (where :math:`B` is batch, :math:`N` is total number of features, :math:`K` is number is codewords, :math:`D` is feature dimensions.)
- Input: :math:`X\in\mathcal{R}^{B\times N\times D}` :math:`C\in\mathcal{R}^{K\times D}` :math:`S\in \mathcal{R}^K` (where :math:`B` is batch, :math:`N` is total number of features, :math:`K` is number is codewords, :math:`D` is feature dimensions.)
r"""Applies a 1D convolution over an input signal composed of several
input planes.
In the simplest case, the output value of the layer with input size
:math:`(N, C_{in}, L)` and output :math:`(N, C_{out}, L_{out})` can be
precisely described as:
.. math::
\begin{array}{ll}
out(N_i, C_{out_j}) = bias(C_{out_j})
+ \sum_{{k}=0}^{C_{in}-1} weight(C_{out_j}, k)
\star input(N_i, k)
\end{array}
where :math:`\star` is the valid `cross-correlation`_ operator
| :attr:`stride` controls the stride for the cross-correlation.
| If :attr:`padding` is non-zero, then the input is implicitly zero-padded on both sides for :attr:`padding` number of points.
| :attr:`dilation` controls the spacing between the kernel points; also
known as the à trous algorithm. It is harder to describe, but this `link`_ has a nice visualization of what :attr:`dilation` does.
| :attr:`groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`.
| At groups=1, all inputs are convolved to all outputs.
| At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups=`in_channels`, each input channel is convolved with its own set of filters (of size `out_channels // in_channels`).
Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): Zero-padding added to both sides of
the input. Default: 0
dilation (int or tuple, optional): Spacing between kernel
elements. Default: 1
groups (int, optional): Number of blocked connections from input
channels to output channels. Default: 1
bias (bool, optional): If True, adds a learnable bias to the output. Default: True
where :math:`\star` is the valid 2D `cross-correlation`_ operator
| :attr:`stride` controls the stride for the cross-correlation.
| If :attr:`padding` is non-zero, then the input is implicitly zero-padded on both sides for :attr:`padding` number of points.
| :attr:`dilation` controls the spacing between the kernel points; also
known as the à trous algorithm. It is harder to describe, but this `link`_ has a nice visualization of what :attr:`dilation` does.
| :attr:`groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`.
| At groups=1, all inputs are convolved to all outputs.
| At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups=`in_channels`, each input channel is convolved with its own set of filters (of size `out_channels // in_channels`).
The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:
- a single ``int`` -- in which case the same value is used for the height and width dimension
- a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
and the second `int` for the width dimension
Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): Zero-padding added to both sides of the input. Default: 0
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional): If True, adds a learnable bias to the output. Default: True
Shape:
- Input: :math:`(N, C_{in}, H_{in}, W_{in})`
- Output: :math:`(N, C_{out}, H_{out}, W_{out})` where
r"""Applies a 2D transposed convolution operator over an input image
composed of several input planes.
This module can be seen as the gradient of Conv2d with respect to its input.
It is also known as a fractionally-strided convolution or
a deconvolution (although it is not an actual deconvolution operation).
| :attr:`stride` controls the stride for the cross-correlation.
| If :attr:`padding` is non-zero, then the input is implicitly zero-padded on both sides for :attr:`padding` number of points.
| If :attr:`output_padding` is non-zero, then the output is implicitly zero-padded on one side for :attr:`output_padding` number of points.
| :attr:`dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this `link`_ has a nice visualization of what :attr:`dilation` does.
| :attr:`groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`.
| At groups=1, all inputs are convolved to all outputs.
| At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups=`in_channels`, each input channel is convolved with its own set of filters (of size `out_channels // in_channels`).
The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`output_padding` can either be:
- a single ``int`` -- in which case the same value is used for the height and width dimensions
- a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
and the second `int` for the width dimension
Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): Zero-padding added to both sides of the input. Default: 0
output_padding (int or tuple, optional): Zero-padding added to one side of the output. Default: 0
groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional): If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
Shape:
- Input: :math:`(N, C_{in}, H_{in}, W_{in})`
- Output: :math:`(N, C_{out}, H_{out}, W_{out})` where
input(N_i, C_j, stride[0] * h + m, stride[1] * w + n)
\end{array}
| If :attr:`padding` is non-zero, then the input is implicitly zero-padded on both sides
for :attr:`padding` number of points
| :attr:`dilation` controls the spacing between the kernel points. It is harder to describe, but this `link`_ has a nice visualization of what :attr:`dilation` does.
The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:
- a single ``int`` -- in which case the same value is used for the height and width dimension
- a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
and the second `int` for the width dimension
Args:
kernel_size: the size of the window to take a max over
stride: the stride of the window. Default value is :attr:`kernel_size`
padding: implicit zero padding to be added on both sides
dilation: a parameter that controls the stride of elements in the window
return_indices: if True, will return the max indices along with the outputs.
Useful when Unpooling later
ceil_mode: when True, will use `ceil` instead of `floor` to compute the output shape
Hang Zhang, Jia Xue, and Kristin Dana. "Deep TEN: Texture Encoding Network." *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017*
and the residuals are given by :math:`r_{ik} = x_i - d_k`. The output encoders are
:math:`E=\{e_1,...e_K\}`.
Args:
D: dimention of the features or feature channels
K: number of codeswords
Shape:
- Input: :math:`X\in\mathcal{R}^{B\times N\times D}` or :math:`\mathcal{R}^{B\times D\times H\times W}` (where :math:`B` is batch, :math:`N` is total number of features or :math:`H\times W`.)
- Input: :math:`X\in\mathcal{R}^{B\times N\times D}` or
:math:`\mathcal{R}^{B\times D\times H\times W}` (where :math:`B` is batch,
:math:`N` is total number of features or :math:`H\times W`.)
Inspiration Layer (CoMatch Layer) enables the multi-style transfer in feed-forward network, which learns to match the target feature statistics during the training.
This module is differentialble and can be inserted in standard feed-forward network to be learned directly from the loss function without additional supervision.
r"""
Inspiration Layer (CoMatch Layer) enables the multi-style transfer in feed-forward
network, which learns to match the target feature statistics during the training.
This module is differentialble and can be inserted in standard feed-forward network
to be learned directly from the loss function without additional supervision.
.. math::
Y = \phi^{-1}[\phi(\mathcal{F}^T)W\mathcal{G}]
Please see the `example of MSG-Net <./experiments/style.html>`_
Please see the `example of MSG-Net <./experiments/style.html>`_
training multi-style generative network for real-time transfer.
Reference:
Hang Zhang and Kristin Dana. "Multi-style Generative Network for Real-time Transfer." *arXiv preprint arXiv:1703.06953 (2017)*
Hang Zhang and Kristin Dana. "Multi-style Generative Network for Real-time Transfer."
@@ -275,14 +281,17 @@ class DilatedAvgPool2d(Module):
classUpsampleConv2d(Module):
r"""
To avoid the checkerboard artifacts of standard Fractionally-strided Convolution, we adapt an integer stride convolution but producing a :math:`2\times 2` outputs for each convolutional window.
To avoid the checkerboard artifacts of standard Fractionally-strided Convolution,
we adapt an integer stride convolution but producing a :math:`2\times 2` outputs for
each convolutional window.
.. image:: _static/img/upconv.png
:width: 50%
:align: center
Reference:
Hang Zhang and Kristin Dana. "Multi-style Generative Network for Real-time Transfer." *arXiv preprint arXiv:1703.06953 (2017)*
Hang Zhang and Kristin Dana. "Multi-style Generative Network for Real-time Transfer."
*arXiv preprint arXiv:1703.06953 (2017)*
Args:
in_channels (int): Number of channels in the input image
...
...
@@ -290,8 +299,10 @@ class UpsampleConv2d(Module):
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): Zero-padding added to both sides of the input. Default: 0
output_padding (int or tuple, optional): Zero-padding added to one side of the output. Default: 0
groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
output_padding (int or tuple, optional): Zero-padding added to one side of the output.
Default: 0
groups (int, optional): Number of blocked connections from input channels to output
channels. Default: 1
bias (bool, optional): If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
scale_factor (int): scaling factor for upsampling convolution. Default: 1
The mean and standard-deviation are calculated per-dimension over
the mini-batches and gamma and beta are learnable parameter vectors
...
...
@@ -246,6 +151,9 @@ class BatchNorm2d(Module):
During evaluation, this running mean/variance is used for normalization.
Because the BatchNorm is done over the `C` dimension, computing statistics
on `(N, H, W)` slices, it's common terminology to call this Spatial BatchNorm
Args:
num_features: num_features from an expected input of
size batch_size x num_features x height x width
...
...
@@ -253,16 +161,20 @@ class BatchNorm2d(Module):
Default: 1e-5
momentum: the value used for the running_mean and running_var
computation. Default: 0.1
affine: a boolean value that when set to true, gives the layer learnable
affine parameters. Default: True
affine: a boolean value that when set to ``True``, gives the layer learnable
affine parameters. Default: ``True``
Shape:
- Input: :math:`(N, C, H, W)`
- Output: :math:`(N, C, H, W)` (same shape as input)
Reference:
.. [1] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." *ICML 2015*
.. [2] Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. "Context Encoding for Semantic Segmentation." *CVPR 2018*
args: :attr:`args.lr_scheduler` lr scheduler mode (`cos`, `poly`), :attr:`args.lr` base learning rate, :attr:`args.epochs` number of epochs, :attr:`args.lr_step`
args: :attr:`args.lr_scheduler` lr scheduler mode (`cos`, `poly`),
:attr:`args.lr` base learning rate, :attr:`args.epochs` number of epochs,