Commit 8fbc9bb6 authored by Hang Zhang

v0.2.0

parent 01946d40
# PyTorch-Encoding
created by [Hang Zhang](http://hangzh.com/)
## [Documentation](http://hangzh.com/PyTorch-Encoding/)
- Please visit the [**Docs**](http://hangzh.com/PyTorch-Encoding/) for detailed instructions on installation and usage.
## Citations
**Context Encoding for Semantic Segmentation**
[Hang Zhang](http://hangzh.com/), [Kristin Dana](http://eceweb1.rutgers.edu/vision/dana.html), [Jianping Shi](http://shijianping.me/), [Zhongyue Zhang](http://zhongyuezhang.com/), [Xiaogang Wang](http://www.ee.cuhk.edu.hk/~xgwang/), [Ambrish Tyagi](https://scholar.google.com/citations?user=GaSWCoUAAAAJ&hl=en), [Amit Agrawal](http://www.amitkagrawal.com/)
```
@InProceedings{Zhang_2018_CVPR,
  author    = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
  title     = {Context Encoding for Semantic Segmentation},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2018}
}
```
**Deep TEN: Texture Encoding Network** [[arXiv]](https://arxiv.org/pdf/1612.02844.pdf)
[Hang Zhang](http://hangzh.com/), [Jia Xue](http://jiaxueweb.com/), [Kristin Dana](http://eceweb1.rutgers.edu/vision/dana.html)
......
...@@ -29,6 +29,7 @@ if platform.system() == 'Darwin':
    ENCODING_LIB = os.path.join(cwd, 'encoding/lib/libENCODING.dylib')
else:
    os.environ['CFLAGS'] = '-std=c99'
    os.environ['TH_LIBRARIES'] = os.path.join(lib_path, 'libATen.so.1')
    ENCODING_LIB = os.path.join(cwd, 'encoding/lib/libENCODING.so')
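# ('-std=c99' is presumably needed so the bundled C sources build under
#  GCC's default pre-C99 mode on Linux; the Darwin/clang branch above does
#  not set it.)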
......
...@@ -8,20 +8,18 @@ Deep TEN: Deep Texture Encoding Network Example
In this section, we show an example of training/testing Encoding-Net for texture recognition on the MINC-2500 dataset. Compared to the original Torch implementation, we use *different learning rates* for the pre-trained base network and the encoding layer (10x), disable color jittering after reducing the learning rate, and adopt a much *smaller training image size* (224 instead of 352); a sketch of the per-group learning rates follows the note below.

.. note::
    **Make sure** to `install PyTorch Encoding <../notes/compile.html>`_ first.
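The 10x learning rate amounts to two optimizer parameter groups (a minimal sketch; ``model.pretrained`` and ``model.head`` are assumed attribute names, not necessarily those used in this repo)::

    import torch

    def get_optimizer(model, base_lr=0.01):
        # pre-trained backbone at the base lr, encoding layer (head) at 10x
        params = [{'params': model.pretrained.parameters(), 'lr': base_lr},
                  {'params': model.head.parameters(), 'lr': base_lr * 10}]
        return torch.optim.SGD(params, lr=base_lr, momentum=0.9,
                               weight_decay=1e-4)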
Test Pre-trained Model
----------------------
- Clone the GitHub repo::

    git clone git@github.com:zhanghang1989/PyTorch-Encoding.git
- Download the `MINC-2500 <http://opensurfaces.cs.cornell.edu/publications/minc/>`_ dataset to the ``$HOME/data/minc-2500/`` folder. Download the pre-trained model (training `curve`_ as below; pre-trained on the train-1 split using a single training size of 224, with an error rate of :math:`19.98\%` using a single crop on the test-1 set)::
    cd PyTorch-Encoding/experiments/recognition
    bash model/download_models.sh
.. _curve:
...@@ -41,14 +39,14 @@ Train Your Own Model
- Example training command for training the above model::

    python main.py --model deepten --nclass 23 --batch-size 64 --lr 0.01 --epochs 60

- Detailed training options::
    -h, --help           show this help message and exit
    --dataset DATASET    training dataset (default: cifar10)
    --model MODEL        network model type (default: densenet)
    --backbone BACKBONE  backbone name (default: resnet50)
    --batch-size N       batch size for training (default: 128)
    --test-batch-size N  batch size for testing (default: 1000)
    --epochs N           number of epochs to train (default: 300)
...@@ -69,7 +67,7 @@ Train Your Own Model
Extending the Software
----------------------
This code is easy to use and extend for your own models or datasets:

- Add your own Dataloader ``mydataset.py`` to the ``dataset/`` folder
......
...@@ -25,7 +25,18 @@ Reference
---------

.. note::
    If you use the code in your research, please cite our papers.
* Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. "Context Encoding for Semantic Segmentation" *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018*::
    @InProceedings{Zhang_2018_CVPR,
      author    = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
      title     = {Context Encoding for Semantic Segmentation},
      booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      month     = {June},
      year      = {2018}
    }
* Hang Zhang, Jia Xue, and Kristin Dana. "Deep TEN: Texture Encoding Network." *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017*::
......
Implementing Synchronized Multi-GPU Batch Normalization
=======================================================
In this tutorial, we discuss the implementation details of Multi-GPU Batch Normalization (BN) (classic implementation: :class:`encoding.nn.BatchNorm2d` and the compatible :class:`encoding.parallel.SelfDataParallel`). We will provide a training example in a later version.
How does BN work?
-----------------
...@@ -23,7 +23,7 @@ BN layer was introduced in the paper `Batch Normalization: Accelerating Deep Net
    \frac{d_\ell}{d_{x_i}} = \frac{d_\ell}{d_{y_i}}\cdot\frac{d_{y_i}}{d_{x_i}} + \frac{d_\ell}{d_\mu}\cdot\frac{d_\mu}{d_{x_i}} + \frac{d_\ell}{d_\sigma}\cdot\frac{d_\sigma}{d_{x_i}}

where :math:`\frac{d_{y_i}}{d_{x_i}}=\frac{\gamma}{\sigma}, \frac{d_\ell}{d_\mu}=-\frac{\gamma}{\sigma}\sum_i^N\frac{d_\ell}{d_{y_i}}
\text{ and } \frac{d_\sigma}{d_{x_i}}=-\frac{1}{\sigma}(\frac{x_i-\mu}{N})`.
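As a toy sanity check (a sketch; any small input works), autograd applied to a manual BN forward yields exactly this combined gradient::

    import torch
    from torch.autograd import Variable

    x = Variable(torch.randn(6).double(), requires_grad=True)
    gamma, beta = 1.5, 0.2
    mu = x.mean()
    sigma = ((x - mu).pow(2).mean() + 1e-5).sqrt()
    y = gamma * (x - mu) / sigma + beta
    y.pow(2).sum().backward()
    print(x.grad)  # sums the y_i, mu and sigma paths of the chain rule above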
Why Synchronize BN?
...@@ -49,6 +49,9 @@ Suppose we have :math:`K` number of GPUs, :math:`sum(x)_k` and :math:`sum(x^2)_k
* Then sync the gradients (automatically handled by :class:`encoding.parallel.AllReduce`) and continue the backward pass.
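The statistics gathering itself is only a few lines (a sketch, with plain tensors standing in for per-GPU results)::

    import torch

    def global_mean_std(sums, sqsums, N, eps=1e-5):
        # E[x] = sum(x)/N and Var[x] = E[x^2] - E[x]^2 from per-device sums
        mean = sum(sums) / N
        var = sum(sqsums) / N - mean ** 2
        return mean, (var + eps).sqrt()

    xs = [torch.randn(8, 4) for _ in range(2)]  # pretend: one chunk per GPU
    mean, std = global_mean_std([x.sum(0) for x in xs],
                                [(x * x).sum(0) for x in xs], N=16)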
Classic Implementation
~~~~~~~~~~~~~~~~~~~~~~
- Synchronized DataParallel:

    The standard DataParallel pipeline of public frameworks (MXNet, PyTorch, etc.) in each training iteration (sketched below):
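    In functional form (a sketch; ``scatter``, ``replicate``, ``parallel_apply`` and ``gather`` are PyTorch's real primitives, the wrapper function itself is illustrative)::

        from torch.nn.parallel import replicate, scatter, parallel_apply, gather

        def data_parallel_step(module, input, device_ids):
            inputs = scatter(input, device_ids)               # split the mini-batch
            replicas = replicate(module, device_ids[:len(inputs)])
            outputs = parallel_apply(replicas, inputs)        # forward in parallel
            return gather(outputs, device_ids[0])             # collect on one GPU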
......
...@@ -62,7 +62,6 @@ IF(MSVC)
ENDIF()

TARGET_LINK_LIBRARIES(ENCODING
  ${TH_LIBRARIES}
  ${CUDA_cusparse_LIBRARY}
)
......
...@@ -36,4 +36,3 @@ SET(Torch_INSTALL_INCLUDE "${TORCH_BUILD_DIR}/include" ${TORCH_TH_INCLUDE_DIR} $
# Find the libs. We need to find libraries one by one.
SET(TH_LIBRARIES "$ENV{TH_LIBRARIES}")
...@@ -120,7 +120,9 @@ class _Transition(nn.Sequential):
class DenseNet(nn.Module):
    r"""Dilated DenseNet.

    To dilate the transition layers of DenseNet correctly, we implement
    :class:`encoding.nn.DilatedAvgPool2d`.
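    A quick shape check (a sketch; ``densenet161(True)`` mirrors the
    torchvision constructor, as exercised in this repo's tests)::

        >>> import torch
        >>> from torch.autograd import Variable
        >>> import encoding
        >>> net = encoding.dilated.densenet161(True).eval()
        >>> y = net.features(Variable(torch.randn(1, 3, 224, 224)))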
    Args:
        growth_rate (int) - how many filters to add each layer (`k` in paper)
......
...@@ -62,9 +62,9 @@ def sum_square(input):
class _batchnorm(Function):
    def __init__(ctx, training=False):
        super(_batchnorm, ctx).__init__()
        ctx.training = training

    def forward(ctx, input, gamma, beta, mean, std):
        ctx.save_for_backward(input, gamma, beta, mean, std)
...@@ -99,13 +99,13 @@ class _batchnorm(Function):
                encoding_lib.Encoding_Float_batchnorm_Backward(
                    gradOutput, input, gradInput, gradGamma, gradBeta,
                    mean, invstd, gamma, beta, gradMean, gradStd,
                    ctx.training)
        elif isinstance(input, torch.cuda.DoubleTensor):
            with torch.cuda.device_of(input):
                encoding_lib.Encoding_Double_batchnorm_Backward(
                    gradOutput, input, gradInput, gradGamma, gradBeta,
                    mean, invstd, gamma, beta, gradMean, gradStd,
                    ctx.training)
        else:
            raise RuntimeError('Unimplemented data type!')
        return gradInput, gradGamma, gradBeta, gradMean, gradStd
......
...@@ -35,7 +35,7 @@ __global__ void Encoding_(DilatedAvgPool_Forward_kernel) (
    c = bc - b*C;
    /* boundary check for output */
    if (w >= Y.getSize(3) || h >= Y.getSize(2)) return;
    int hstart = h*dH - padH;
    int wstart = w*dW - padW;
    int hend = min(hstart + kH*dilationH, X.getSize(2));
    int wend = min(wstart + kW*dilationW, X.getSize(3));
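    /* Indexing sketch for the lines above: each output pixel (h, w) averages a
       dilated window anchored at (h*dH - padH, w*dW - padW); dH/dW act as the
       strides, dilationH/dilationW space the taps, and hend/wend clip the
       window to the input extent. */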
......
...@@ -13,13 +13,14 @@ import torch
from torch.nn import Module, Parameter
import torch.nn.functional as F
from torch.autograd import Function, Variable
from torch.nn.modules.utils import _single, _pair, _triple

from .._ext import encoding_lib
from ..functions import scaledL2, aggregate
from ..parallel import my_data_parallel
from ..functions import dilatedavgpool2d

__all__ = ['Encoding', 'EncodingDrop', 'Inspiration', 'DilatedAvgPool2d', 'UpsampleConv2d']

class Encoding(Module):
    r"""
...@@ -104,9 +105,9 @@ class Encoding(Module):
            + 'N x ' + str(self.D) + '=>' + str(self.K) + 'x' \
            + str(self.D) + ')'

class EncodingDrop(Module):
    def __init__(self, D, K):
        super(EncodingDrop, self).__init__()
        # init codewords and smoothing factor
        self.D, self.K = D, K
        self.codewords = Parameter(torch.Tensor(K, D),
...@@ -119,7 +120,7 @@ class EncodingShake(Module):
        self.codewords.data.uniform_(-std1, std1)
        self.scale.data.uniform_(-1, 0)

    def _drop(self):
        if self.training:
            self.scale.data.uniform_(-1, 0)
        else:
...@@ -143,14 +144,12 @@ class EncodingShake(Module):
            X = X.view(B, D, -1).transpose(1, 2).contiguous()
        else:
            raise RuntimeError('Encoding Layer unknown input dims!')
        self._drop()
        # assignment weights
        A = F.softmax(scaledL2(X, self.codewords, self.scale), dim=1)
        # aggregate
        E = aggregate(A, X, self.codewords)
        self._drop()
        return E

    def __repr__(self):
...@@ -202,27 +201,27 @@ class DilatedAvgPool2d(Module):
    r"""We provide Dilated Average Pooling for the dilation of DenseNet as
    in :class:`encoding.dilated.DenseNet`.

    Reference:
        We provide this code for a coming paper.

    Applies a 2D average pooling over an input signal composed of several input planes.
    In the simplest case, the output value of the layer with input size :math:`(B, C, H, W)`,
    output :math:`(B, C, H_{out}, W_{out})`, :attr:`kernel_size` :math:`(k_H, k_W)`,
    :attr:`stride` :math:`(s_H, s_W)` and :attr:`dilation` :math:`(d_H, d_W)`
    can be precisely described as:
    .. math::

        \begin{array}{ll}
        out(b, c, h, w) = 1 / (k_H \cdot k_W) \cdot
        \sum_{m=0}^{k_H-1} \sum_{n=0}^{k_W-1}
        input(b, c, s_H \cdot h + d_H \cdot m, s_W \cdot w + d_W \cdot n)
        \end{array}
    | If :attr:`padding` is non-zero, then the input is implicitly zero-padded on both sides
      for :attr:`padding` number of points
    | The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:

    - a single ``int`` -- in which case the same value is used for the height and width dimension
    - a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
...@@ -235,10 +234,11 @@ class DilatedAvgPool2d(Module):
        dilation: the dilation parameter similar to Conv2d

    Shape:
        - Input: :math:`(B, C, H_{in}, W_{in})`
        - Output: :math:`(B, C, H_{out}, W_{out})` where
          :math:`H_{out} = floor((H_{in} + 2 * padding[0] - kernel\_size[0]) / stride[0] + 1)`
          :math:`W_{out} = floor((W_{in} + 2 * padding[1] - kernel\_size[1]) / stride[1] + 1)`
          For :attr:`stride` = 1, the output feature map has the same size as the input.
    Examples::
...@@ -306,7 +306,7 @@ class UpsampleConv2d(Module):
            (in_channels, scale * scale * out_channels, kernel_size[0], kernel_size[1])
        bias (Tensor): the learnable bias of the module of shape (scale * scale * out_channels)

    Examples:

        >>> # With square kernels and equal stride
        >>> m = nn.UpsampleConv2d(16, 33, 3, stride=2)
        >>> # non-square kernels and unequal stride and with padding
......
...@@ -19,7 +19,7 @@ from torch.nn.parallel.scatter_gather import scatter, scatter_kwargs, \
from torch.nn.parallel.replicate import replicate
from torch.nn.parallel.parallel_apply import parallel_apply

__all__ = ['Reduce', 'AllReduce', 'Broadcast', 'ModelDataParallel',
           'CriterionDataParallel', 'SelfDataParallel']

def nccl_all_reduce(inputs):
...@@ -45,6 +45,22 @@ def comm_all_reduce(inputs):
        results.append(result.clone().cuda(i))
    return results

class Reduce(Function):
    def forward(ctx, *inputs):
        # sum the per-GPU tensors onto the first input's device
        ctx.save_for_backward(*inputs)
        if len(inputs) == 1:
            return inputs[0]
        return comm.reduce_add(inputs)

    def backward(ctx, gradOutput):
        # broadcast the incoming gradient back to every input's device
        inputs = tuple(ctx.saved_tensors)
        if len(inputs) == 1:
            return gradOutput
        gradInputs = []
        for i in range(len(inputs)):
            with torch.cuda.device_of(inputs[i]):
                gradInputs.append(gradOutput.cuda())
        return tuple(gradInputs)
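# Hypothetical usage sketch (not part of this commit): with the old-style
# Function call syntax, per-GPU tensors are summed with autograd support, e.g.
#   total = Reduce()(x_on_gpu0, x_on_gpu1)  # result lives on the first device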
class AllReduce(Function):
    """Cross GPU all reduce autograd operation for calculating mean and
......
- [Link to the EncNet CIFAR experiments and pre-trained models](http://hangzh.com/PyTorch-Encoding/experiments/cifar.html)
- [Link to the Deep TEN experiments and pre-trained models](http://hangzh.com/PyTorch-Encoding/experiments/texture.html)
...@@ -24,6 +24,8 @@ class Options():
                            help='number of classes (default: 10)')
        parser.add_argument('--widen', type=int, default=4, metavar='N',
                            help='widen factor of the network (default: 4)')
        parser.add_argument('--ncodes', type=int, default=32, metavar='N',
                            help='number of codewords in Encoding Layer (default: 32)')
        parser.add_argument('--backbone', type=str, default='resnet50',
                            help='backbone name (default: resnet50)')
        # training hyper params
...@@ -31,8 +33,8 @@ class Options():
                            metavar='N', help='batch size for training (default: 128)')
        parser.add_argument('--test-batch-size', type=int, default=256,
                            metavar='N', help='batch size for testing (default: 256)')
        parser.add_argument('--epochs', type=int, default=600, metavar='N',
                            help='number of epochs to train (default: 600)')
        parser.add_argument('--start_epoch', type=int, default=1,
                            metavar='N', help='the epoch number to start (default: 1)')
        # lr setting
...@@ -65,4 +67,7 @@ class Options():
        self.parser = parser

    def parse(self):
        args = self.parser.parse_args()
        if args.dataset == 'minc':
            args.nclass = 23
        return args
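# Usage note (a sketch): `python main.py --dataset minc --model deepten` now
# picks up nclass=23 automatically via the override above, matching the 23
# MINC-2500 categories.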
...@@ -32,7 +32,7 @@ class install(setuptools.command.install.install):
        with open(version_path, 'w') as f:
            f.write("__version__ = '{}'\n".format(version))

version = '0.2.0'
try:
    sha = subprocess.check_output(['git', 'rev-parse', 'HEAD'],
                                  cwd=cwd).decode('ascii').strip()
......
...@@ -62,19 +62,6 @@ def test_sum_square():
    print('Testing sum_square(): {}'.format(test))
def test_dilated_avgpool():
    X = Variable(torch.cuda.FloatTensor(1, 3, 75, 75).uniform_(-0.5, 0.5))
    input = (X,)
...@@ -89,6 +76,3 @@ if __name__ == '__main__':
    test_aggregate()
    test_sum_square()
    test_dilated_avgpool()
"""
test_dilated_densenet()
"""