"docs/git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "62cbde8d41ac39e4b3a1f5bbbbc546cc93f1d84d"
Commit 9276aa93 authored by VoVAllen's avatar VoVAllen
Browse files

add tutorial

parent 8801154b
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Capsule Network\n",
"================\n",
"**Author**: `Jinjing Zhou`\n",
"\n",
"This Tutorial is for blablabla"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"Capsule Network is proposed by blablabla"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What's a capsule?\n",
"> A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part. \n",
"-- <cite>Geoffrey E. Hinton</cite>\n",
"\n",
"Generally Speaking, "
]
},
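{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch (following equation 1 of Sabour et al.), the `squash` non-linearity below turns a capsule's total input `s_j` into an output vector `v_j` whose length lies in [0, 1) while preserving its direction. The `eps` term is an added safeguard, not part of the paper's formula:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"def squash(s, dim=-1, eps=1e-8):\n",
"    \"\"\"Squash non-linearity (equation 1 in the paper).\n",
"\n",
"    Shrinks short vectors toward zero and long vectors toward unit\n",
"    length, keeping the direction unchanged.\n",
"    \"\"\"\n",
"    mag_sq = torch.sum(s ** 2, dim=dim, keepdim=True)\n",
"    mag = torch.sqrt(mag_sq + eps)  # eps guards against division by zero\n",
"    return (mag_sq / (1.0 + mag_sq)) * (s / mag)\n",
"\n",
"# A long input keeps its direction but its norm stays below 1.\n",
"v = squash(torch.tensor([[3.0, 4.0]]))\n",
"print(v, v.norm())"
]
},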
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
## [0.4.0] - 2018-01-30
### Added
- Supports and works with CIFAR10 dataset.
### Changed
- Upgrade to PyTorch 0.3.0.
- Supports CUDA 9.
- Drop our custom softmax function and switch to PyTorch softmax function.
- Modify the save_image utils function to handle 3-channel (RGB) images.
### Fixed
- Compatibilities with PyTorch 0.3.0.
## [0.3.0] - 2017-11-27
### Added
- Decoder network PyTorch module.
- Reconstruct image with Decoder network during testing.
- Save the original and reconstructed images into the file system.
- Log the original and reconstructed images using TensorBoard.
### Changed
- Refactor reconstruction loss function and decoder network.
- Remove image reconstruction from training.
## [0.2.0] - 2017-11-26
### Added
- New dependencies for TensorBoard and tqdm.
- Logging losses and accuracies with TensorBoard.
- New utils functions for:
  - computing accuracy
  - converting values of the model parameters to numpy.array
  - parsing boolean values with argparse
- Softmax function that takes a dimension.
- More detailed code comments.
- Show margin loss and reconstruction loss in logs.
- Show accuracy in train logs.
### Changed
- Refactor loss functions.
- Clean codes.
### Fixed
- Runtime error during `pip install -r requirements.txt`.
- Bug in routing algorithm.
## [0.1.0] - 2017-11-12
### Added
- Implemented reconstruction loss.
- Saving reconstructed image as file.
- Improve training speed by using PyTorch DataParallel to wrap our model.
  - PyTorch will parallelize the model and data over multiple GPUs.
- Supports training:
  - on CPU (tested with macOS Sierra)
  - on one GPU (tested with 1 Tesla K80 GPU)
  - on multiple GPUs (tested with 8 GPUs)
  - with or without CUDA (tested with CUDA version 8.0.61)
  - with cuDNN 5 (tested with cuDNN 5.1.3)
### Changed
- More intuitive variable naming.
### Fixed
- Resolve Pylint warnings and reformat code.
- Missing square in equation 4 for margin (class) loss.
## 0.0.1 - 2017-11-04
### Added
- Initial release. This is the first beta version: the API is stable and the code runs, so it should be safe for development use, but it is not ready for general production usage.
[Unreleased]: https://github.com/cedrickchee/capsule-net-pytorch/compare/v0.4.0...HEAD
[0.1.0]: https://github.com/cedrickchee/capsule-net-pytorch/compare/v0.0.1...v0.1.0
[0.2.0]: https://github.com/cedrickchee/capsule-net-pytorch/compare/v0.1.0...v0.2.0
[0.3.0]: https://github.com/cedrickchee/capsule-net-pytorch/compare/v0.2.0...v0.3.0
[0.4.0]: https://github.com/cedrickchee/capsule-net-pytorch/compare/v0.3.0...v0.4.0
COPYRIGHT
All contributions by Cedric Chee:
Copyright (c) 2017, Cedric Chee.
All rights reserved.
All other contributions:
Copyright (c) 2017, the respective contributors.
All rights reserved.
Each contributor holds copyright over their respective contributions.
The project versioning (Git) records all such contribution source information.
LICENSE
The MIT License (MIT)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# PyTorch CapsNet: Capsule Network for PyTorch
[![license](https://img.shields.io/github/license/mashape/apistatus.svg?maxAge=2592000)](https://github.com/cedrickchee/capsule-net-pytorch/blob/master/LICENSE)
![completion](https://img.shields.io/badge/completion%20state-95%25-green.svg?style=plastic)
A CUDA-enabled PyTorch implementation of CapsNet (Capsule Network) based on this paper:
[Sara Sabour, Nicholas Frosst, Geoffrey E Hinton. Dynamic Routing Between Capsules. NIPS 2017](https://arxiv.org/abs/1710.09829)
The current `test error is 0.21%` and the `best test error is 0.20%`. The current `test accuracy is 99.31%` and the `best test accuracy is 99.32%`.
**What is a Capsule**
> A Capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or object part.
You can learn more about Capsule Networks [here](#learning-resources).
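Concretely, the length of a capsule's output vector can be read as the probability that the entity it represents is present. The sketch below (mirroring what `utils.accuracy` in this repo does, with a randomly generated stand-in for the real `DigitCaps` output) turns capsule outputs into a class prediction:

```python
import torch

# Stand-in for DigitCaps output: [batch_size, 10 classes, 16D capsule, 1].
output = torch.randn(128, 10, 16, 1)

# The length (L2 norm) of each 16D capsule vector encodes how likely
# the corresponding digit is to be present.
v_length = torch.sqrt((output ** 2).sum(dim=2, keepdim=True))  # [128, 10, 1, 1]

# Predict the digit whose capsule output is longest.
pred = v_length.max(dim=1)[1].view(-1)  # [128]
```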
**Why another CapsNet implementation?**
I wanted a decent PyTorch implementation of CapsNet and I couldn't find one when I started. The goal of this implementation is to help newcomers learn and understand the CapsNet architecture and the idea of Capsules. The implementation is **NOT** focused on rigorous correctness of the results. In addition, the code is not optimized for speed. To make the code easier to read and understand, it comes with ample comments, and the Python classes and functions are documented with docstrings.
I will try my best to check and fix reported issues. Contributions are highly welcome. If you find any bugs or errors in the code, please do not hesitate to open an issue or a pull request. Thank you.
**Status and Latest Updates:**
See the [CHANGELOG](CHANGELOG.md)
**Datasets**
The model was trained on the standard [MNIST](http://yann.lecun.com/exdb/mnist/) data.
*Note: you don't have to manually download, preprocess, and load the MNIST dataset as [TorchVision](https://github.com/pytorch/vision) will take care of this step for you.*
I have tried using other datasets. See the [Other Datasets](#other-datasets) section below for more details.
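For reference, here is a minimal sketch of what `utils.load_mnist` in this repo does under the hood (the normalization constants are MNIST's mean and standard deviation):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Normalize MNIST with its per-channel mean and standard deviation.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

# download=True fetches the dataset into ./data on first use.
train_loader = DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=transform),
    batch_size=128, shuffle=True)
```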
## Requirements
- Python 3
  - Tested with version 3.6.4
- [PyTorch](http://pytorch.org/)
  - Tested with version 0.3.0.post4
  - Migrating existing code to work with version 0.4.0 is Work-In-Progress.
  - Code will not run with version 0.1.2 because `keepdim` is not available in that version.
  - Code will not run with version 0.2.0 because the `softmax` function does not take a dimension argument in that version.
- CUDA 8 and above
  - Tested with CUDA 8 and CUDA 9.
- [TorchVision](https://github.com/pytorch/vision)
- [tensorboardX](https://github.com/lanpa/tensorboard-pytorch)
- [tqdm](https://github.com/tqdm/tqdm)
## Usage
### Training and Evaluation
**Step 1.**
Clone this repository with ``git`` and install project dependencies.
```bash
$ git clone https://github.com/cedrickchee/capsule-net-pytorch.git
$ cd capsule-net-pytorch
$ pip install -r requirements.txt
```
**Step 2.**
Start the CapsNet on MNIST training and evaluation:
- Training with default settings:
```bash
$ python main.py
```
- Training on 8 GPUs with 30 epochs and 1 routing iteration:
```bash
$ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --epochs 30 --num-routing 1 --threads 16 --batch-size 128 --test-batch-size 128
```
**Step 3.**
Test a pre-trained model:
If you have trained a model in Step 2 above, then the weights for the trained model will be saved to `results/trained_model/model_epoch_10.pth`. [WIP] Now just run the following command to get test results.
```bash
$ python main.py --is-training 0 --weights results/trained_model/model_epoch_10.pth
```
### Pre-trained Model
You can download the weights for the pre-trained model from my Google Drive. We saved the weights (model state dict) and the optimizer state for the model at the end of every training epoch.
- Weights from [epoch 50 checkpoint](https://drive.google.com/uc?export=download&id=1lYtOMSreP4I9hr9un4DsBJZrzodI6l2d) [84 MB].
- Weights from [epoch 40 to 50](https://drive.google.com/uc?export=download&id=1VMuVtJrecz47czsT5HqLxZpFjkLoMKaL) checkpoints [928 MB].
Uncompress and put the weights (.pth files) into `./results/trained_model/`.
*Note: the model was **last trained** on 2017-11-26 and the weights **last updated** on 2017-11-28.*
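If you want to inspect or resume from one of these checkpoints, the sketch below shows one way to load them (assuming the `{'epoch', 'state_dict', 'optimizer'}` layout saved by `utils.checkpoint`; the epoch number in the path is just an example):

```python
import torch

from model import Net

# Load on CPU regardless of where the checkpoint was saved.
checkpoint = torch.load('results/trained_model/model_epoch_50.pth',
                        map_location=lambda storage, loc: storage)

# Default MNIST hyperparameters (see the table below).
model = Net(num_conv_in_channel=1, num_conv_out_channel=256,
            num_primary_unit=8, primary_unit_size=1152,
            num_classes=10, output_unit_size=16, num_routing=3,
            use_reconstruction_loss=True, regularization_scale=0.0005,
            input_width=28, input_height=28, cuda_enabled=False)

# Checkpoints saved while the model was wrapped in DataParallel have
# keys prefixed with 'module.'; strip that prefix before loading.
state_dict = {k.replace('module.', '', 1): v
              for k, v in checkpoint['state_dict'].items()}
model.load_state_dict(state_dict)
print('Loaded checkpoint from epoch', checkpoint['epoch'])
```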
### The Default Hyper Parameters
| Parameter | Value | CLI arguments |
| --- | --- | --- |
| Training epochs | 10 | --epochs 10 |
| Learning rate | 0.01 | --lr 0.01 |
| Training batch size | 128 | --batch-size 128 |
| Testing batch size | 128 | --test-batch-size 128 |
| Log interval | 10 | --log-interval 10 |
| Disables CUDA training | false | --no-cuda |
| Num. of channels produced by the convolution | 256 | --num-conv-out-channel 256 |
| Num. of input channels to the convolution | 1 | --num-conv-in-channel 1 |
| Num. of primary unit | 8 | --num-primary-unit 8 |
| Primary unit size | 1152 | --primary-unit-size 1152 |
| Num. of digit classes | 10 | --num-classes 10 |
| Output unit size | 16 | --output-unit-size 16 |
| Num. routing iteration | 3 | --num-routing 3 |
| Use reconstruction loss | true | --use-reconstruction-loss |
| Regularization coefficient for reconstruction loss | 0.0005 | --regularization-scale 0.0005 |
| Dataset name (mnist, cifar10) | mnist | --dataset mnist |
| Input image width to the convolution | 28 | --input-width 28 |
| Input image height to the convolution | 28 | --input-height 28 |
## Results
### Test Error
CapsNet classification test error on MNIST. The MNIST average and standard deviation results are reported from 3 trials.
The results can be reproduced by running the following commands.
```bash
python main.py --epochs 50 --num-routing 1 --use-reconstruction-loss no --regularization-scale 0.0 #CapsNet-v1
python main.py --epochs 50 --num-routing 1 --use-reconstruction-loss yes --regularization-scale 0.0005 #CapsNet-v2
python main.py --epochs 50 --num-routing 3 --use-reconstruction-loss no --regularization-scale 0.0 #CapsNet-v3
python main.py --epochs 50 --num-routing 3 --use-reconstruction-loss yes --regularization-scale 0.0005 #CapsNet-v4
```
Method | Routing | Reconstruction | MNIST (%) | *Paper*
:---------|:------:|:---:|:----:|:----:
Baseline | -- | -- | -- | *0.39*
CapsNet-v1 | 1 | no | -- | *0.34 (0.032)*
CapsNet-v2 | 1 | yes | -- | *0.29 (0.011)*
CapsNet-v3 | 3 | no | -- | *0.35 (0.036)*
CapsNet-v4 | 3 | yes | 0.21 | *0.25 (0.005)*
### Training Loss and Accuracy
The training losses and accuracies for CapsNet-v4 (50 epochs, 3 routing iterations, using reconstruction, regularization scale of 0.0005):
![](results/train_loss_accuracy.png)
Training accuracy. Highest training accuracy: 100%
![](results/train_accuracy.png)
Training loss. Lowest training error: 0.1938%
![](results/train_loss.png)
### Test Loss and Accuracy
The test losses and accuracies for CapsNet-v4 (50 epochs, 3 routing iterations, using reconstruction, regularization scale of 0.0005):
![](results/test_loss_accuracy.png)
Test accuracy. Highest test accuracy: 99.32%
![](results/test_accuracy.png)
Test loss. Lowest test error: 0.2002%
![](results/test_loss.png)
### Training Speed
- Around `5.97s / batch` or `8min / epoch` on a single Tesla K80 GPU with a batch size of 704.
- Around `3.25s / batch` or `25min / epoch` on a single Tesla K80 GPU with a batch size of 128.
![](results/training_speed.png)
In my case, these are the hyperparameters I used for the training setup:
- Batch size: 128
- Epochs: 50
- Num. of routing: 3
- Use reconstruction loss: yes
- Regularization scale for reconstruction loss: 0.0005
### Reconstruction
The results of CapsNet-v4.
Digits at left are reconstructed images.
<table>
<tr>
<td>
<img src="results/reconstructed_images.png"/>
</td>
<td>
<p>[WIP] Ground truth image from dataset</p>
</td>
</tr>
</table>
### Model Design
```bash
Model architecture:
------------------
Net (
(conv1): ConvLayer (
(conv0): Conv2d(1, 256, kernel_size=(9, 9), stride=(1, 1))
(relu): ReLU (inplace)
)
(primary): CapsuleLayer (
(conv_units): ModuleList (
(0): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
(1): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
(2): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
(3): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
(4): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
(5): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
(6): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
(7): Conv2d(256, 32, kernel_size=(9, 9), stride=(2, 2))
)
)
(digits): CapsuleLayer (
)
(decoder): Decoder (
(fc1): Linear (160 -> 512)
(fc2): Linear (512 -> 1024)
(fc3): Linear (1024 -> 784)
(relu): ReLU (inplace)
(sigmoid): Sigmoid ()
)
)
Parameters and size:
-------------------
conv1.conv0.weight: [256, 1, 9, 9]
conv1.conv0.bias: [256]
primary.conv_units.0.weight: [32, 256, 9, 9]
primary.conv_units.0.bias: [32]
primary.conv_units.1.weight: [32, 256, 9, 9]
primary.conv_units.1.bias: [32]
primary.conv_units.2.weight: [32, 256, 9, 9]
primary.conv_units.2.bias: [32]
primary.conv_units.3.weight: [32, 256, 9, 9]
primary.conv_units.3.bias: [32]
primary.conv_units.4.weight: [32, 256, 9, 9]
primary.conv_units.4.bias: [32]
primary.conv_units.5.weight: [32, 256, 9, 9]
primary.conv_units.5.bias: [32]
primary.conv_units.6.weight: [32, 256, 9, 9]
primary.conv_units.6.bias: [32]
primary.conv_units.7.weight: [32, 256, 9, 9]
primary.conv_units.7.bias: [32]
digits.weight: [1, 1152, 10, 16, 8]
decoder.fc1.weight: [512, 160]
decoder.fc1.bias: [512]
decoder.fc2.weight: [1024, 512]
decoder.fc2.bias: [1024]
decoder.fc3.weight: [784, 1024]
decoder.fc3.bias: [784]
Total number of parameters (with reconstruction network): 8227088 (8.2 million)
```
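The `--primary-unit-size` values (1152 for MNIST, 2048 for CIFAR10) follow directly from the convolution arithmetic above. A small sketch (using the standard output-size formula for a valid convolution; not code from this repo) that reproduces both numbers:

```python
def conv_out(size, kernel, stride):
    """Spatial output size of a convolution with no padding."""
    return (size - kernel) // stride + 1

def primary_unit_size(input_size, num_capsule_channels=32):
    # Conv1: 9x9 kernel, stride 1; PrimaryCaps conv units: 9x9 kernel, stride 2.
    after_conv1 = conv_out(input_size, kernel=9, stride=1)
    after_primary = conv_out(after_conv1, kernel=9, stride=2)
    return num_capsule_channels * after_primary ** 2

print(primary_unit_size(28))  # MNIST:   32 * 6 * 6 = 1152
print(primary_unit_size(32))  # CIFAR10: 32 * 8 * 8 = 2048
```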
### TensorBoard
We logged the training and test losses and accuracies using tensorboardX. TensorBoard helps us visualize how the model learns over time. We can visualize statistics, such as how the objective function changes, or how weights and accuracies vary during training.
TensorBoard operates by reading TensorFlow data (event files).
#### How to Use TensorBoard
1. Download a [copy of the events files](https://drive.google.com/uc?export=download&id=1lZVffeZTkUQfSxmZmYDViRzmhb59wBWL) for the latest run from my Google Drive.
2. Uncompress the file and put it into `./runs`.
3. Check that you have TensorFlow (CPU version) installed. We need it for the TensorBoard server and dashboard.
4. Start TensorBoard.
```bash
$ tensorboard --logdir runs
```
5. Open the TensorBoard dashboard in your web browser using this URL: http://localhost:6006
### Other Datasets
#### CIFAR10
In the spirit of experimentation, I have tried using other datasets. I have updated the implementation so that it supports and works with CIFAR10. Note that I have not thoroughly tested the capsule model on CIFAR10.
Here's how to train and test the model on CIFAR10:
```bash
python main.py --dataset cifar10 --num-conv-in-channel 3 --input-width 32 --input-height 32 --primary-unit-size 2048 --epochs 80 --num-routing 1 --use-reconstruction-loss yes --regularization-scale 0.0005
```
##### Training Loss and Accuracy
The training losses and accuracies for CapsNet-v4 (80 epochs, 3 routing iterations, using reconstruction, regularization scale of 0.0005):
![](results/cifar10/train_loss_accuracy.png)
- Highest training accuracy: 100%
- Lowest training error: 0.3589%
##### Test Loss and Accuracy
The test losses and accuracies for CapsNet-v4 (80 epochs, 3 routing iterations, using reconstruction, regularization scale of 0.0005):
![](results/cifar10/test_loss_accuracy.png)
- Highest test accuracy: 71%
- Lowest test error: 0.5735%
## TODO
- [x] Publish results.
- [x] More testing.
- [ ] Inference mode - command to test a pre-trained model.
- [ ] Jupyter Notebook version.
- [x] Create a sample to show how we can apply CapsNet to a real-world application.
- [ ] Experiment with CapsNet:
* [x] Try using another dataset.
* [ ] Come up with a more creative model structure.
- [x] Pre-trained model and weights.
- [x] Add visualization for training and evaluation metrics.
- [x] Implement reconstruction loss.
- [x] Check algorithm for correctness.
- [x] Update results from TensorBoard after making improvements and bug fixes.
- [x] Publish updated pre-trained model weights.
- [x] Log the original and reconstructed images using TensorBoard.
- [ ] Update results with reconstructed image and original image.
- [ ] Resume training by loading model checkpoint.
- [ ] Migrate existing code to work in PyTorch 0.4.0.
*WIP is an acronym for Work-In-Progress*
## Credits
I referenced these implementations mainly for sanity checks:
1. [TensorFlow implementation by @naturomics](https://github.com/naturomics/CapsNet-Tensorflow)
## Learning Resources
Here are some resources that we think will be helpful if you want to learn more about Capsule Networks:
- Articles and blog posts:
- [Understanding Hinton's Capsule Networks. Part I: Intuition.](https://medium.com/@pechyonkin/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b)
- [Dynamic routing between capsules](https://blog.acolyer.org/2017/11/13/dynamic-routing-between-capsules/)
- [What is a CapsNet or Capsule Network?](https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc)
- [Capsule Networks Are Shaking up AI — Here's How to Use Them](https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952)
- [Capsule Networks Explained](https://kndrck.co/posts/capsule_networks_explained/)
- Videos:
- [Capsule Networks: An Improvement to Convolutional Networks](https://www.youtube.com/watch?v=VKoLGnq15RM)
- [Capsule Networks (CapsNets) – Tutorial](https://www.youtube.com/watch?v=pPN8d0E3900)
## Other Implementations
- TensorFlow:
- The first author of the paper, [Sara Sabour has released the code](https://github.com/Sarasra/models/tree/master/research/capsules).
## Real-world Application of CapsNet
The following are a few samples in the wild that show how we can apply CapsNet to real-world use cases.
- [An attempt to implement CapsNet for car make-model classification](https://www.reddit.com/r/MachineLearning/comments/80eiz3/p_implementing_a_capsnet_for_car_makemodel/)
- [A Keras implementation of Capsule Network on Fashion MNIST dataset](https://github.com/XifengGuo/CapsNet-Fashion-MNIST)
"""Capsule layer
PyTorch implementation of CapsNet in Sabour, Hinton et al.'s paper
Dynamic Routing Between Capsules. NIPS 2017.
https://arxiv.org/abs/1710.09829
Author: Cedric Chee
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import utils
class CapsuleLayer(nn.Module):
"""
The core implementation of the idea of capsules
"""
def __init__(self, in_unit, in_channel, num_unit, unit_size, use_routing,
num_routing, cuda_enabled):
super(CapsuleLayer, self).__init__()
self.in_unit = in_unit
self.in_channel = in_channel
self.num_unit = num_unit
self.use_routing = use_routing
self.num_routing = num_routing
self.cuda_enabled = cuda_enabled
if self.use_routing:
"""
Based on the paper, DigitCaps which is capsule layer(s) with
capsule inputs use a routing algorithm that uses this weight matrix, Wij
"""
# weight shape:
# [1 x primary_unit_size x num_classes x output_unit_size x num_primary_unit]
# == [1 x 1152 x 10 x 16 x 8]
self.weight = nn.Parameter(torch.randn(1, in_channel, num_unit, unit_size, in_unit))
else:
"""
According to the CapsNet architecture section in the paper,
we have routing only between two consecutive capsule layers (e.g. PrimaryCapsules and DigitCaps).
No routing is used between Conv1 and PrimaryCapsules.
This means PrimaryCapsules is composed of several convolutional units.
"""
# Define 8 convolutional units.
self.conv_units = nn.ModuleList([
nn.Conv2d(self.in_channel, 32, 9, 2) for u in range(self.num_unit)
])
def forward(self, x):
if self.use_routing:
# Currently used by DigitCaps layer.
return self.routing(x)
else:
# Currently used by PrimaryCaps layer.
return self.no_routing(x)
def routing(self, x):
"""
Routing algorithm for capsule.
:input: tensor x of shape [128, 8, 1152]
:return: vector output of capsule j
"""
batch_size = x.size(0)
x = x.transpose(1, 2) # dim 1 and dim 2 are swapped. out tensor shape: [128, 1152, 8]
# Stacking and adding a dimension to a tensor.
# stack ops output shape: [128, 1152, 10, 8]
# unsqueeze ops output shape: [128, 1152, 10, 8, 1]
x = torch.stack([x] * self.num_unit, dim=2).unsqueeze(4)
# Convert single weight to batch weight.
# [1 x 1152 x 10 x 16 x 8] to: [128, 1152, 10, 16, 8]
batch_weight = torch.cat([self.weight] * batch_size, dim=0)
# u_hat is "prediction vectors" from the capsules in the layer below.
# Transform inputs by weight matrix.
# Matrix product of 2 tensors with shape: [128, 1152, 10, 16, 8] x [128, 1152, 10, 8, 1]
# u_hat shape: [128, 1152, 10, 16, 1]
u_hat = torch.matmul(batch_weight, x)
# All the routing logits (b_ij in the paper) are initialized to zero.
# self.in_channel = primary_unit_size = 32 * 6 * 6 = 1152
# self.num_unit = num_classes = 10
# b_ij shape: [1, 1152, 10, 1]
b_ij = Variable(torch.zeros(1, self.in_channel, self.num_unit, 1))
if self.cuda_enabled:
b_ij = b_ij.cuda()
# From the paper in the "Capsules on MNIST" section,
# the sample MNIST test reconstructions of a CapsNet with 3 routing iterations.
num_iterations = self.num_routing
for iteration in range(num_iterations):
# Routing algorithm
# Calculate routing or also known as coupling coefficients (c_ij).
# c_ij shape: [1, 1152, 10, 1]
c_ij = F.softmax(b_ij, dim=2) # Convert routing logits (b_ij) to softmax.
            # c_ij shape from: [1, 1152, 10, 1] to: [128, 1152, 10, 1, 1]
c_ij = torch.cat([c_ij] * batch_size, dim=0).unsqueeze(4)
# Implement equation 2 in the paper.
            # s_j is the total input to a capsule: a weighted sum over all "prediction vectors".
            # u_hat is the weighted input: the prediction û_j|i made by capsule i.
# c_ij * u_hat shape: [128, 1152, 10, 16, 1]
# s_j output shape: [batch_size=128, 1, 10, 16, 1]
# Sum of Primary Capsules outputs, 1152D becomes 1D.
s_j = (c_ij * u_hat).sum(dim=1, keepdim=True)
# Squash the vector output of capsule j.
# v_j shape: [batch_size, weighted sum of PrimaryCaps output,
# num_classes, output_unit_size from u_hat, 1]
# == [128, 1, 10, 16, 1]
# So, the length of the output vector of a capsule is 16, which is in dim 3.
v_j = utils.squash(s_j, dim=3)
# in_channel is 1152.
# v_j1 shape: [128, 1152, 10, 16, 1]
v_j1 = torch.cat([v_j] * self.in_channel, dim=1)
# The agreement.
# Transpose u_hat with shape [128, 1152, 10, 16, 1] to [128, 1152, 10, 1, 16],
# so we can do matrix product u_hat and v_j1.
# u_vj1 shape: [1, 1152, 10, 1]
u_vj1 = torch.matmul(u_hat.transpose(3, 4), v_j1).squeeze(4).mean(dim=0, keepdim=True)
# Update routing (b_ij) by adding the agreement to the initial logit.
b_ij = b_ij + u_vj1
return v_j.squeeze(1) # shape: [128, 10, 16, 1]
def no_routing(self, x):
"""
Get output for each unit.
A unit has batch, channels, height, width.
An example of a unit output shape is [128, 32, 6, 6]
:return: vector output of capsule j
"""
# Create 8 convolutional unit.
# A convolutional unit uses normal convolutional layer with a non-linearity (squash).
        unit = [conv_unit(x) for conv_unit in self.conv_units]
# Stack all unit outputs.
# Stacked of 8 unit output shape: [128, 8, 32, 6, 6]
unit = torch.stack(unit, dim=1)
batch_size = x.size(0)
# Flatten the 32 of 6x6 grid into 1152.
# Shape: [128, 8, 1152]
unit = unit.view(batch_size, self.num_unit, -1)
# Add non-linearity
# Return squashed outputs of shape: [128, 8, 1152]
return utils.squash(unit, dim=2) # dim 2 is the third dim (1152D array) in our tensor
"""Convolutional layer
PyTorch implementation of CapsNet in Sabour, Hinton et al.'s paper
Dynamic Routing Between Capsules. NIPS 2017.
https://arxiv.org/abs/1710.09829
Author: Cedric Chee
"""
import torch
import torch.nn as nn
class ConvLayer(nn.Module):
"""
Conventional Conv2d layer
"""
def __init__(self, in_channel, out_channel, kernel_size):
super(ConvLayer, self).__init__()
self.conv0 = nn.Conv2d(in_channels=in_channel,
out_channels=out_channel,
kernel_size=kernel_size,
stride=1)
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
"""Forward pass"""
# x shape: [128, 1, 28, 28]
# out_conv0 shape: [128, 256, 20, 20]
out_conv0 = self.conv0(x)
# out_relu shape: [128, 256, 20, 20]
out_relu = self.relu(out_conv0)
return out_relu
"""Decoder Network
PyTorch implementation of CapsNet in Sabour, Hinton et al.'s paper
Dynamic Routing Between Capsules. NIPS 2017.
https://arxiv.org/abs/1710.09829
Author: Cedric Chee
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import utils
class Decoder(nn.Module):
"""
Implement Decoder structure in section 4.1, Figure 2 to reconstruct a digit
from the `DigitCaps` layer representation.
The decoder network consists of 3 fully connected layers. For each
[10, 16] output, we mask out the incorrect predictions, and send
the [16,] vector to the decoder network to reconstruct a [784,] size
image.
This Decoder network is used in training and prediction (testing).
"""
def __init__(self, num_classes, output_unit_size, input_width,
input_height, num_conv_in_channel, cuda_enabled):
"""
The decoder network consists of 3 fully connected layers, with
512, 1024, 784 (or 3072 for CIFAR10) neurons each.
"""
super(Decoder, self).__init__()
self.cuda_enabled = cuda_enabled
fc1_output_size = 512
fc2_output_size = 1024
self.fc3_output_size = input_width * input_height * num_conv_in_channel
self.fc1 = nn.Linear(num_classes * output_unit_size, fc1_output_size) # input dim 10 * 16.
self.fc2 = nn.Linear(fc1_output_size, fc2_output_size)
self.fc3 = nn.Linear(fc2_output_size, self.fc3_output_size)
# Activation functions
self.relu = nn.ReLU(inplace=True)
self.sigmoid = nn.Sigmoid()
def forward(self, x, target):
"""
We send the outputs of the `DigitCaps` layer, which is a
[batch_size, 10, 16] size tensor into the Decoder network, and
reconstruct a [batch_size, fc3_output_size] size tensor representing the image.
Args:
x: [batch_size, 10, 16] The output of the digit capsule.
target: [batch_size, 10] One-hot MNIST dataset labels.
Returns:
reconstruction: [batch_size, fc3_output_size] Tensor of reconstructed images.
"""
batch_size = target.size(0)
"""
First, do masking.
"""
# Method 1: mask with y.
        # Note: we have not implemented method 2, which masks with the true label.
# masked_caps shape: [batch_size, 10, 16, 1]
masked_caps = utils.mask(x, self.cuda_enabled)
"""
Second, reconstruct the images with 3 Fully Connected layers.
"""
# vector_j shape: [batch_size, 160=10*16]
vector_j = masked_caps.view(x.size(0), -1) # reshape the masked_caps tensor
# Forward pass of the network
fc1_out = self.relu(self.fc1(vector_j))
fc2_out = self.relu(self.fc2(fc1_out)) # shape: [batch_size, 1024]
reconstruction = self.sigmoid(self.fc3(fc2_out)) # shape: [batch_size, fc3_output_size]
assert reconstruction.size() == torch.Size([batch_size, self.fc3_output_size])
return reconstruction
import dgl
import torch
import torch.nn.functional as F
from torch import nn
from capsule_layer import CapsuleLayer
# import main
from utils import writer, step
# global_step = main.global_step
device = "cuda" if torch.cuda.is_available() else "cpu"
class DGLFeature():
"""
To wrap different shape of representation tensor into the same shape
"""
def __init__(self, tensor, pad_to):
# self.tensor = tensor
self.node_num = tensor.size(0)
self.flat_tensor = tensor.contiguous().view(self.node_num, -1)
self.node_feature_dim = self.flat_tensor.size(1)
self.flat_pad_tensor = F.pad(self.flat_tensor, (0, pad_to - self.flat_tensor.size(1)))
self.shape = tensor.shape
@property
def tensor(self):
"""
:return: Tensor with original shape
"""
return self.flat_tensor.index_select(1, torch.arange(0, self.node_feature_dim).to(device)).view(self.shape)
@property
def padded_tensor(self):
"""
:return: Flatted and padded Tensor
"""
return self.flat_pad_tensor
class DGLBatchCapsuleLayer(CapsuleLayer):
def __init__(self, in_unit, in_channel, num_unit, unit_size, use_routing,
num_routing, cuda_enabled):
super(DGLBatchCapsuleLayer, self).__init__(in_unit, in_channel, num_unit, unit_size, use_routing,
num_routing, cuda_enabled)
self.unit_size = unit_size
self.weight = nn.Parameter(torch.randn(in_channel, num_unit, unit_size, in_unit))
def routing(self, x):
self.batch_size = x.size(0)
self.g = dgl.DGLGraph()
self.g.add_nodes_from([i for i in range(self.in_channel)])
self.g.add_nodes_from([i + self.in_channel for i in range(self.num_unit)])
for i in range(self.in_channel):
for j in range(self.num_unit):
index_j = j + self.in_channel
self.g.add_edge(i, index_j)
        self.edge_features = torch.zeros(self.in_channel, self.num_unit).to(device)
x_ = x.transpose(0, 2)
x_ = DGLFeature(x_, self.batch_size * self.unit_size)
x = x.transpose(1, 2)
x = torch.stack([x] * self.num_unit, dim=2).unsqueeze(4)
W = torch.cat([self.weight.unsqueeze(0)] * self.batch_size, dim=0)
u_hat = torch.matmul(W, x).permute(1, 2, 0, 3, 4).squeeze().contiguous()
        self.node_feature = DGLFeature(torch.zeros(self.num_unit, self.batch_size, self.unit_size).to(device),
                                       self.batch_size * self.unit_size)
nf = torch.cat([x_.padded_tensor, self.node_feature.padded_tensor], dim=0)
self.g.set_e_repr({'b_ij': self.edge_features.view(-1)})
self.g.set_n_repr({'h': nf})
self.g.set_e_repr({'u_hat': u_hat.view(-1, self.batch_size, self.unit_size)})
for i in range(self.num_routing):
self.i = i
self.g.update_all(self.capsule_msg, self.capsule_reduce,
lambda x: {'h': DGLFeature(x['h'], self.batch_size * self.unit_size).padded_tensor},
batchable=True)
self.g.update_edge(dgl.base.ALL, dgl.base.ALL, self.update_edge, batchable=True)
self.node_feature = self.g.get_n_repr()['h'] \
.index_select(0, torch.arange(self.in_channel, self.in_channel + self.num_unit).to(device)) \
.view(self.num_unit, self.batch_size, self.unit_size)
return self.node_feature.transpose(0, 1).unsqueeze(1).unsqueeze(4).squeeze(1)
def update_edge(self, u, v, edge):
return {
'b_ij': edge['b_ij'] + (v['h'].view(-1, self.batch_size, self.unit_size) * edge['u_hat']).mean(dim=1).sum(
dim=1)}
@staticmethod
def capsule_msg(src, edge):
return {'b_ij': edge['b_ij'], 'h': src['h'], 'u_hat': edge['u_hat']}
def capsule_reduce(self, node, msg):
b_ij_c, h_c, u_hat_c = msg['b_ij'], msg['h'], msg['u_hat']
u_hat = u_hat_c
c_i = F.softmax(b_ij_c, dim=0)
writer.add_histogram(f"c_i{self.i}", c_i, step['step'])
s_j = (c_i.unsqueeze(2).unsqueeze(3) * u_hat).sum(dim=1)
v_j = self.squash(s_j)
return {'h': v_j.view(-1, self.batch_size * self.unit_size)}
@staticmethod
def squash(s):
# This is equation 1 from the paper.
mag_sq = torch.sum(s ** 2, dim=2, keepdim=True)
mag = torch.sqrt(mag_sq)
s = (mag_sq / (1.0 + mag_sq)) * (s / mag)
return s
"""
PyTorch implementation of CapsNet in Sabour, Hinton et al.'s paper
Dynamic Routing Between Capsules. NIPS 2017.
https://arxiv.org/abs/1710.09829
Usage:
python main.py
python main.py --epochs 30
python main.py --epochs 30 --num-routing 1
Author: Cedric Chee
"""
from __future__ import print_function
import argparse
import os
from timeit import default_timer as timer
import torch
import torch.optim as optim
import torchvision.utils as vutils
from torch.autograd import Variable
from torch.backends import cudnn
from tqdm import tqdm
import utils
from model import Net
from utils import writer, step
def train(model, data_loader, optimizer, epoch, writer):
"""
Train CapsuleNet model on training set
Args:
model: The CapsuleNet model.
        data_loader: An iterator over the dataset. It combines a dataset and a sampler.
optimizer: Optimization algorithm.
epoch: Current epoch.
"""
print('===> Training mode')
num_batches = len(data_loader) # iteration per epoch. e.g: 469
total_step = args.epochs * num_batches
epoch_tot_acc = 0
# Switch to train mode
model.train()
if args.cuda:
# When we wrap a Module in DataParallel for multi-GPUs
model = model.module
start_time = timer()
for batch_idx, (data, target) in enumerate(tqdm(data_loader, unit='batch')):
batch_size = data.size(0)
global_step = batch_idx + (epoch * num_batches) - num_batches
step['step'] = global_step
labels = target
target_one_hot = utils.one_hot_encode(target, length=args.num_classes)
assert target_one_hot.size() == torch.Size([batch_size, 10])
data, target = Variable(data), Variable(target_one_hot)
if args.cuda:
data = data.cuda()
target = target.cuda()
# Train step - forward, backward and optimize
optimizer.zero_grad()
output = model(data) # output from DigitCaps (out_digit_caps)
loss, margin_loss, recon_loss = model.loss(data, output, target)
loss.backward()
optimizer.step()
# Calculate accuracy for each step and average accuracy for each epoch
acc = utils.accuracy(output, labels, args.cuda)
epoch_tot_acc += acc
epoch_avg_acc = epoch_tot_acc / (batch_idx + 1)
# TensorBoard logging
# 1) Log the scalar values
writer.add_scalar('train/total_loss', loss.item(), global_step)
writer.add_scalar('train/margin_loss', margin_loss.item(), global_step)
if args.use_reconstruction_loss:
writer.add_scalar('train/reconstruction_loss', recon_loss.item(), global_step)
writer.add_scalar('train/batch_accuracy', acc, global_step)
writer.add_scalar('train/accuracy', epoch_avg_acc, global_step)
# 2) Log values and gradients of the parameters (histogram)
# for tag, value in model.named_parameters():
# tag = tag.replace('.', '/')
# writer.add_histogram(tag, utils.to_np(value), global_step)
# writer.add_histogram(tag + '/grad', utils.to_np(value.grad), global_step)
# Print losses
if batch_idx % args.log_interval == 0:
template = 'Epoch {}/{}, ' \
'Step {}/{}: ' \
'[Total loss: {:.6f},' \
'\tMargin loss: {:.6f},' \
'\tReconstruction loss: {:.6f},' \
'\tBatch accuracy: {:.6f},' \
'\tAccuracy: {:.6f}]'
tqdm.write(template.format(
epoch,
args.epochs,
global_step,
total_step,
loss.item(),
margin_loss.item(),
recon_loss.item() if args.use_reconstruction_loss else 0,
acc,
epoch_avg_acc))
# Print time elapsed for an epoch
end_time = timer()
print('Time elapsed for epoch {}: {:.0f}s.'.format(epoch, end_time - start_time))
def test(model, data_loader, num_train_batches, epoch, writer):
"""
Evaluate model on validation set
Args:
model: The CapsuleNet model.
        data_loader: An iterator over the dataset. It combines a dataset and a sampler.
"""
print('===> Evaluate mode')
# Switch to evaluate mode
model.eval()
if args.cuda:
# When we wrap a Module in DataParallel for multi-GPUs
model = model.module
loss = 0
margin_loss = 0
recon_loss = 0
correct = 0
num_batches = len(data_loader)
global_step = epoch * num_train_batches + num_train_batches
step['step'] = global_step
for data, target in data_loader:
batch_size = data.size(0)
target_indices = target
target_one_hot = utils.one_hot_encode(target_indices, length=args.num_classes)
assert target_one_hot.size() == torch.Size([batch_size, 10])
data, target = Variable(data, volatile=True), Variable(target_one_hot)
if args.cuda:
data = data.cuda()
target = target.cuda()
# Output predictions
output = model(data) # output from DigitCaps (out_digit_caps)
# Sum up batch loss
t_loss, m_loss, r_loss = model.loss(data, output, target, size_average=False)
loss += t_loss.data[0]
margin_loss += m_loss.data[0]
recon_loss += r_loss.data[0]
# Count number of correct predictions
# v_magnitude shape: [128, 10, 1, 1]
v_magnitude = torch.sqrt((output ** 2).sum(dim=2, keepdim=True))
# pred shape: [128, 1, 1, 1]
pred = v_magnitude.data.max(1, keepdim=True)[1].cpu()
correct += pred.eq(target_indices.view_as(pred)).sum()
# Get the reconstructed images of the last batch
if args.use_reconstruction_loss:
reconstruction = model.decoder(output, target)
# Input image size and number of channel.
# By default, for MNIST, the image width and height is 28x28 and 1 channel for black/white.
image_width = args.input_width
image_height = args.input_height
image_channel = args.num_conv_in_channel
recon_img = reconstruction.view(-1, image_channel, image_width, image_height)
assert recon_img.size() == torch.Size([batch_size, image_channel, image_width, image_height])
# Save the image into file system
utils.save_image(recon_img, 'results/recons_image_test_{}_{}.png'.format(epoch, global_step))
utils.save_image(data, 'results/original_image_test_{}_{}.png'.format(epoch, global_step))
# Add and visualize the image in TensorBoard
recon_img = vutils.make_grid(recon_img.data, normalize=True, scale_each=True)
original_img = vutils.make_grid(data.data, normalize=True, scale_each=True)
writer.add_image('test/recons-image-{}-{}'.format(epoch, global_step), recon_img, global_step)
writer.add_image('test/original-image-{}-{}'.format(epoch, global_step), original_img, global_step)
# Log test losses
loss /= num_batches
margin_loss /= num_batches
recon_loss /= num_batches
# Log test accuracies
num_test_data = len(data_loader.dataset)
accuracy = correct / num_test_data
accuracy_percentage = 100. * accuracy
# TensorBoard logging
# 1) Log the scalar values
writer.add_scalar('test/total_loss', loss, global_step)
writer.add_scalar('test/margin_loss', margin_loss, global_step)
if args.use_reconstruction_loss:
writer.add_scalar('test/reconstruction_loss', recon_loss, global_step)
writer.add_scalar('test/accuracy', accuracy, global_step)
# Print test losses and accuracy
print('Test: [Loss: {:.6f},' \
'\tMargin loss: {:.6f},' \
'\tReconstruction loss: {:.6f}]'.format(
loss,
margin_loss,
recon_loss if args.use_reconstruction_loss else 0))
print('Test Accuracy: {}/{} ({:.0f}%)\n'.format(
correct, num_test_data, accuracy_percentage))
def main():
"""The main function
Entry point.
"""
global args
# Setting the hyper parameters
parser = argparse.ArgumentParser(description='Example of Capsule Network')
parser.add_argument('--epochs', type=int, default=10,
help='number of training epochs. default=10')
parser.add_argument('--lr', type=float, default=0.01,
help='learning rate. default=0.01')
parser.add_argument('--batch-size', type=int, default=128,
help='training batch size. default=128')
parser.add_argument('--test-batch-size', type=int,
default=128, help='testing batch size. default=128')
parser.add_argument('--log-interval', type=int, default=10,
help='how many batches to wait before logging training status. default=10')
parser.add_argument('--no-cuda', action='store_true', default=False,
help='disables CUDA training. default=false')
parser.add_argument('--threads', type=int, default=4,
help='number of threads for data loader to use. default=4')
parser.add_argument('--seed', type=int, default=42,
help='random seed for training. default=42')
parser.add_argument('--num-conv-out-channel', type=int, default=256,
help='number of channels produced by the convolution. default=256')
parser.add_argument('--num-conv-in-channel', type=int, default=1,
help='number of input channels to the convolution. default=1')
parser.add_argument('--num-primary-unit', type=int, default=8,
help='number of primary unit. default=8')
parser.add_argument('--primary-unit-size', type=int,
default=1152, help='primary unit size is 32 * 6 * 6. default=1152')
parser.add_argument('--num-classes', type=int, default=10,
help='number of digit classes. 1 unit for one MNIST digit. default=10')
parser.add_argument('--output-unit-size', type=int,
default=16, help='output unit size. default=16')
parser.add_argument('--num-routing', type=int,
default=3, help='number of routing iteration. default=3')
parser.add_argument('--use-reconstruction-loss', type=utils.str2bool, nargs='?', default=True,
help='use an additional reconstruction loss. default=True')
parser.add_argument('--regularization-scale', type=float, default=0.0005,
help='regularization coefficient for reconstruction loss. default=0.0005')
parser.add_argument('--dataset', help='the name of dataset (mnist, cifar10)', default='mnist')
parser.add_argument('--input-width', type=int,
default=28, help='input image width to the convolution. default=28 for MNIST')
parser.add_argument('--input-height', type=int,
default=28, help='input image height to the convolution. default=28 for MNIST')
args = parser.parse_args()
print(args)
# Check GPU or CUDA is available
args.cuda = not args.no_cuda and torch.cuda.is_available()
# Get reproducible results by manually seed the random number generator
torch.manual_seed(args.seed)
if args.cuda:
torch.cuda.manual_seed(args.seed)
# Load data
train_loader, test_loader = utils.load_data(args)
# Build Capsule Network
print('===> Building model')
model = Net(num_conv_in_channel=args.num_conv_in_channel,
num_conv_out_channel=args.num_conv_out_channel,
num_primary_unit=args.num_primary_unit,
primary_unit_size=args.primary_unit_size,
num_classes=args.num_classes,
output_unit_size=args.output_unit_size,
num_routing=args.num_routing,
use_reconstruction_loss=args.use_reconstruction_loss,
regularization_scale=args.regularization_scale,
input_width=args.input_width,
input_height=args.input_height,
cuda_enabled=args.cuda)
if args.cuda:
print('Utilize GPUs for computation')
print('Number of GPU available', torch.cuda.device_count())
model.cuda()
cudnn.benchmark = True
model = torch.nn.DataParallel(model)
# Print the model architecture and parameters
print('Model architectures:\n{}\n'.format(model))
print('Parameters and size:')
for name, param in model.named_parameters():
print('{}: {}'.format(name, list(param.size())))
# CapsNet has:
# - 8.2M parameters and 6.8M parameters without the reconstruction subnet on MNIST.
# - 11.8M parameters and 8.0M parameters without the reconstruction subnet on CIFAR10.
num_params = sum([param.nelement() for param in model.parameters()])
    # The coupling coefficients c_ij are not included in the parameter list,
    # so we add them manually: 1152 * 10 = 11520 (on MNIST) or 2048 * 10 = 20480 (on CIFAR10).
print('\nTotal number of parameters: {}\n'.format(num_params + (11520 if args.dataset == 'mnist' else 20480)))
# Optimizer
optimizer = optim.Adam(model.parameters(), lr=args.lr)
# Make model checkpoint directory
if not os.path.exists('results/trained_model'):
os.makedirs('results/trained_model')
# Train and test
for epoch in range(1, args.epochs + 1):
train(model, train_loader, optimizer, epoch, writer)
test(model, test_loader, len(train_loader), epoch, writer)
# Save model checkpoint
utils.checkpoint({
'epoch': epoch + 1,
'state_dict': model.state_dict(),
'optimizer': optimizer.state_dict()
}, epoch)
writer.close()
if __name__ == "__main__":
main()
"""CapsNet Architecture
PyTorch implementation of CapsNet in Sabour, Hinton et al.'s paper
Dynamic Routing Between Capsules. NIPS 2017.
https://arxiv.org/abs/1710.09829
Author: Cedric Chee
"""
import torch
import torch.nn as nn
from torch.autograd import Variable
from capsule_layer import CapsuleLayer
from conv_layer import ConvLayer
from decoder import Decoder
from dgl_capsule_batch import DGLBatchCapsuleLayer
class Net(nn.Module):
"""
A simple CapsNet with 3 layers
"""
def __init__(self, num_conv_in_channel, num_conv_out_channel, num_primary_unit,
primary_unit_size, num_classes, output_unit_size, num_routing,
use_reconstruction_loss, regularization_scale, input_width, input_height,
cuda_enabled):
"""
In the constructor we instantiate one ConvLayer module and two CapsuleLayer modules
and assign them as member variables.
"""
super(Net, self).__init__()
self.cuda_enabled = cuda_enabled
# Configurations used for image reconstruction.
self.use_reconstruction_loss = use_reconstruction_loss
# Input image size and number of channel.
# By default, for MNIST, the image width and height is 28x28
# and 1 channel for black/white.
self.image_width = input_width
self.image_height = input_height
self.image_channel = num_conv_in_channel
# Also known as lambda reconstruction. Default value is 0.0005.
# We use sum of squared errors (SSE) similar to paper.
self.regularization_scale = regularization_scale
# Layer 1: Conventional Conv2d layer.
self.conv1 = ConvLayer(in_channel=num_conv_in_channel,
out_channel=num_conv_out_channel,
kernel_size=9)
# PrimaryCaps
# Layer 2: Conv2D layer with `squash` activation.
self.primary = CapsuleLayer(in_unit=0,
in_channel=num_conv_out_channel,
num_unit=num_primary_unit,
unit_size=primary_unit_size, # capsule outputs
use_routing=False,
num_routing=num_routing,
cuda_enabled=cuda_enabled)
# DigitCaps
# Final layer: Capsule layer where the routing algorithm is.
self.digits = CapsuleLayer(in_unit=num_primary_unit,
in_channel=primary_unit_size,
num_unit=num_classes,
unit_size=output_unit_size, # 16D capsule per digit class
use_routing=True,
num_routing=num_routing,
cuda_enabled=cuda_enabled)
# Reconstruction network
if use_reconstruction_loss:
self.decoder = Decoder(num_classes, output_unit_size, input_width,
input_height, num_conv_in_channel, cuda_enabled)
def forward(self, x):
"""
Defines the computation performed at every forward pass.
"""
# x shape: [128, 1, 28, 28]. 128 is for the batch size.
# out_conv1 shape: [128, 256, 20, 20]
out_conv1 = self.conv1(x)
# out_primary_caps shape: [128, 8, 1152].
# Total PrimaryCapsules has [32 × 6 × 6 = 1152] capsule outputs.
out_primary_caps = self.primary(out_conv1)
# out_digit_caps shape: [128, 10, 16, 1]
# batch size: 128, 10 digit class, 16D capsule per digit class.
out_digit_caps = self.digits(out_primary_caps)
return out_digit_caps
def loss(self, image, out_digit_caps, target, size_average=True):
"""Custom loss function
Args:
image: [batch_size, 1, 28, 28] MNIST samples.
out_digit_caps: [batch_size, 10, 16, 1] The output from `DigitCaps` layer.
target: [batch_size, 10] One-hot MNIST dataset labels.
size_average: A boolean to enable mean loss (average loss over batch size).
Returns:
total_loss: A scalar Variable of total loss.
m_loss: A scalar of margin loss.
recon_loss: A scalar of reconstruction loss.
"""
recon_loss = 0
m_loss = self.margin_loss(out_digit_caps, target)
if size_average:
m_loss = m_loss.mean()
total_loss = m_loss
if self.use_reconstruction_loss:
# Reconstruct the image from the Decoder network
reconstruction = self.decoder(out_digit_caps, target)
recon_loss = self.reconstruction_loss(reconstruction, image)
# Mean squared error
if size_average:
recon_loss = recon_loss.mean()
# In order to keep in line with the paper,
# they scale down the reconstruction loss by 0.0005
# so that it does not dominate the margin loss.
total_loss = m_loss + recon_loss * self.regularization_scale
return total_loss, m_loss, (recon_loss * self.regularization_scale)
def margin_loss(self, input, target):
"""
Class loss
Implement equation 4 in section 3 'Margin loss for digit existence' in the paper.
Args:
input: [batch_size, 10, 16, 1] The output from `DigitCaps` layer.
            target: [batch_size, 10] One-hot MNIST labels.
        Returns:
            l_c: A scalar of class loss, also known as margin loss.
"""
batch_size = input.size(0)
# ||vc|| also known as norm.
v_c = torch.sqrt((input ** 2).sum(dim=2, keepdim=True))
# Calculate left and right max() terms.
zero = Variable(torch.zeros(1))
if self.cuda_enabled:
zero = zero.cuda()
m_plus = 0.9
m_minus = 0.1
loss_lambda = 0.5
max_left = torch.max(m_plus - v_c, zero).view(batch_size, -1) ** 2
max_right = torch.max(v_c - m_minus, zero).view(batch_size, -1) ** 2
t_c = target
# Lc is margin loss for each digit of class c
l_c = t_c * max_left + loss_lambda * (1.0 - t_c) * max_right
l_c = l_c.sum(dim=1)
return l_c
def reconstruction_loss(self, reconstruction, image):
"""
The reconstruction loss is the sum of squared differences between
the reconstructed image (outputs of the logistic units) and
the original image (input image).
Implement section 4.1 'Reconstruction as a regularization method' in the paper.
Based on naturomics's implementation.
Args:
reconstruction: [batch_size, 784] Decoder outputs of reconstructed image tensor.
image: [batch_size, 1, 28, 28] MNIST samples.
Returns:
recon_error: A scalar Variable of reconstruction loss.
"""
# Calculate reconstruction loss.
batch_size = image.size(0) # or another way recon_img.size(0)
# error = (recon_img - image).view(batch_size, -1)
image = image.view(batch_size, -1) # flatten 28x28 by reshaping to [batch_size, 784]
error = reconstruction - image
squared_error = error ** 2
# Scalar Variable
recon_error = torch.sum(squared_error, dim=1)
return recon_error
http://download.pytorch.org/whl/cu90/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl ; sys_platform == "linux"
http://download.pytorch.org/whl/torch-0.3.0.post4-cp36-cp36m-macosx_10_7_x86_64.whl ; sys_platform == "darwin"
torchvision
tensorboardX
tensorflow
tqdm
"""Utilities
PyTorch implementation of CapsNet in Sabour, Hinton et al.'s paper
Dynamic Routing Between Capsules. NIPS 2017.
https://arxiv.org/abs/1710.09829
Author: Cedric Chee
"""
import argparse
import torch
import torch.nn.functional as F
import torchvision.utils as vutils
from tensorboardX import SummaryWriter
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision import transforms, datasets
# Set the logger
writer = SummaryWriter()
step = {'step': 0}
def one_hot_encode(target, length):
"""Converts batches of class indices to classes of one-hot vectors."""
batch_s = target.size(0)
one_hot_vec = torch.zeros(batch_s, length)
for i in range(batch_s):
one_hot_vec[i, target[i]] = 1.0
return one_hot_vec
def checkpoint(state, epoch):
"""Save checkpoint"""
model_out_path = 'results/trained_model/model_epoch_{}.pth'.format(epoch)
torch.save(state, model_out_path)
print('Checkpoint saved to {}'.format(model_out_path))
def load_mnist(args):
"""Load MNIST dataset.
The data is split and normalized between train and test sets.
"""
# Normalize MNIST dataset.
data_transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
kwargs = {'num_workers': args.threads,
'pin_memory': True} if args.cuda else {}
print('===> Loading MNIST training datasets')
# MNIST dataset
training_set = datasets.MNIST(
'./data', train=True, download=True, transform=data_transform)
# Input pipeline
training_data_loader = DataLoader(
training_set, batch_size=args.batch_size, shuffle=True, **kwargs)
print('===> Loading MNIST testing datasets')
testing_set = datasets.MNIST(
'./data', train=False, download=True, transform=data_transform)
testing_data_loader = DataLoader(
testing_set, batch_size=args.test_batch_size, shuffle=True, **kwargs)
return training_data_loader, testing_data_loader
def load_cifar10(args):
"""Load CIFAR10 dataset.
The data is split and normalized between train and test sets.
"""
# Normalize CIFAR10 dataset.
data_transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
kwargs = {'num_workers': args.threads,
'pin_memory': True} if args.cuda else {}
print('===> Loading CIFAR10 training datasets')
# CIFAR10 dataset
training_set = datasets.CIFAR10(
'./data', train=True, download=True, transform=data_transform)
# Input pipeline
training_data_loader = DataLoader(
training_set, batch_size=args.batch_size, shuffle=True, **kwargs)
print('===> Loading CIFAR10 testing datasets')
testing_set = datasets.CIFAR10(
'./data', train=False, download=True, transform=data_transform)
testing_data_loader = DataLoader(
testing_set, batch_size=args.test_batch_size, shuffle=True, **kwargs)
return training_data_loader, testing_data_loader
def load_data(args):
"""
Load dataset.
"""
dst = args.dataset
if dst == 'mnist':
return load_mnist(args)
elif dst == 'cifar10':
return load_cifar10(args)
else:
raise Exception('Invalid dataset, please check the name of dataset:', dst)
def squash(sj, dim=2):
"""
    The non-linear activation used in Capsule.
    It drives the length of a large vector to near 1 and a small vector toward 0.
    This implements equation 1 from the paper.
"""
sj_mag_sq = torch.sum(sj ** 2, dim, keepdim=True)
# ||sj||
sj_mag = torch.sqrt(sj_mag_sq)
v_j = (sj_mag_sq / (1.0 + sj_mag_sq)) * (sj / sj_mag)
return v_j
def mask(out_digit_caps, cuda_enabled=True):
"""
In the paper, they mask out all but the activity vector of the correct digit capsule.
This means:
a) during training, mask all but the capsule (1x16 vector) which match the ground-truth.
b) during testing, mask all but the longest capsule (1x16 vector).
Args:
out_digit_caps: [batch_size, 10, 16] Tensor output of `DigitCaps` layer.
Returns:
masked: [batch_size, 10, 16, 1] The masked capsules tensors.
"""
# a) Get capsule outputs lengths, ||v_c||
v_length = torch.sqrt((out_digit_caps ** 2).sum(dim=2))
# b) Pick out the index of longest capsule output, v_length by
# masking the tensor by the max value in dim=1.
_, max_index = v_length.max(dim=1)
max_index = max_index.data
# Method 1: masking with y.
# c) In all batches, get the most active capsule
    # It's not easy to understand the indexing process with max_index
    # as we are 3D animals.
batch_size = out_digit_caps.size(0)
masked_v = [None] * batch_size # Python list
for batch_ix in range(batch_size):
# Batch sample
sample = out_digit_caps[batch_ix]
# Masks out the other capsules in this sample.
v = Variable(torch.zeros(sample.size()))
if cuda_enabled:
v = v.cuda()
# Get the maximum capsule index from this batch sample.
max_caps_index = max_index[batch_ix]
v[max_caps_index] = sample[max_caps_index]
masked_v[batch_ix] = v # append v to masked_v
# Concatenates sequence of masked capsules tensors along the batch dimension.
masked = torch.stack(masked_v, dim=0)
return masked
def save_image(image, file_name):
"""
Save a given image into an image file
"""
# Check number of channels in an image.
if image.size(1) == 2:
# 2-channel image
zeros = torch.zeros(image.size(0), 1, image.size(2), image.size(3))
image_tensor = torch.cat([zeros, image.data.cpu()], dim=1)
else:
# Grayscale or RGB image
image_tensor = image.data.cpu() # get Tensor from Variable
vutils.save_image(image_tensor, file_name)
def accuracy(output, target, cuda_enabled=True):
"""
Compute accuracy.
Args:
output: [batch_size, 10, 16, 1] The output from DigitCaps layer.
target: [batch_size] Labels for dataset.
Returns:
accuracy (float): The accuracy for a batch.
"""
batch_size = target.size(0)
v_length = torch.sqrt((output ** 2).sum(dim=2, keepdim=True))
softmax_v = F.softmax(v_length, dim=1)
assert softmax_v.size() == torch.Size([batch_size, 10, 1, 1])
_, max_index = softmax_v.max(dim=1)
assert max_index.size() == torch.Size([batch_size, 1, 1])
pred = max_index.squeeze() # max_index.view(batch_size)
assert pred.size() == torch.Size([batch_size])
if cuda_enabled:
target = target.cuda()
pred = pred.cuda()
correct_pred = torch.eq(target, pred.data) # tensor
# correct_pred_sum = correct_pred.sum() # scalar. e.g: 6 correct out of 128 images.
acc = correct_pred.float().mean() # e.g: 6 / 128 = 0.046875
return acc
def to_np(param):
"""
Convert values of the model parameters to numpy.array.
"""
return param.clone().cpu().data.numpy()
def str2bool(v):
"""
Parsing boolean values with argparse.
"""
if v.lower() in ('yes', 'true', 't', 'y', '1'):
return True
elif v.lower() in ('no', 'false', 'f', 'n', '0'):
return False
else:
raise argparse.ArgumentTypeError('Boolean value expected.')