"This tutorial explains how to use the DGL library and its message-passing interface to implement the [capsule network](http://arxiv.org/abs/1710.09829) proposed by Geoffrey Hinton and his team. The algorithm aims to provide a better alternative to existing neural network architectures, and DGL lets users implement it in a more intuitive, graph-based way."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Overview\n",
"\n",
"### Introduction\n",
"Capsule networks were first introduced in 2011 by Geoffrey Hinton et al. in a paper called [Transforming Autoencoders](https://www.cs.toronto.edu/~fritz/absps/transauto6.pdf), but it was only in November 2017 that Sara Sabour, Nicholas Frosst, and Geoffrey Hinton published [Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829), introducing a CapsNet architecture that reached state-of-the-art performance on MNIST.\n",
"\n",
"### What's a capsule?\n",
"> A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part. \n",
"\n",
"Generally speaking, the idea of a capsule is to encode all the information about a feature in vector form: the scalars of a traditional neural network are replaced by vectors, and the norm of each vector plays the role of the original scalar (for example, how strongly the feature is present). \n",
"\n",
"\n",
"### Dynamic Routing Algorithm\n",
"Because capsule networks are structured differently, they compute their outputs with different operations. This figure shows the comparison, drawn by [Max Pechyonkin](https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66). \n",
"The key idea is that the output of each capsule is a weighted sum of its input vectors. We will go into the details, with code, in a later section.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Implementations\n",
"\n",
"### 1. Consider capsule routing as a graph structure\n",
"\n",
"We can treat each capsule as a node in a graph, and connect every node in one layer to every node in the next.\n",
"At this stage, we need to define a reduce function that aggregates the messages produced by the message function into node features.\n",
"This step implements lines 4 and 5 of the routing algorithm: a softmax over $b_{ij}$ followed by a weighted sum of the input features. Note that the softmax runs over dimension $j$, not $i$. \n",
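The two steps just described can be sketched in plain NumPy; the capsule counts and dimensions below (6 input capsules, 10 output capsules, 16-dimensional predictions) are illustrative choices, not the model's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Routing logits b_ij for 6 input capsules and 10 output capsules,
# initialized to zero as in the paper.
b = np.zeros((6, 10))

# Line 4: coupling coefficients c_i = softmax(b_i), taken over the
# output-capsule dimension j, so each row of c sums to 1.
e = np.exp(b - b.max(axis=1, keepdims=True))
c = e / e.sum(axis=1, keepdims=True)

# Prediction vectors u_hat_{j|i}: one 16-d vector per (i, j) pair.
u_hat = rng.standard_normal((6, 10, 16))

# Line 5: the raw input to output capsule j is the weighted sum
# s_j = sum_i c_ij * u_hat_{j|i}.
s = (c[:, :, None] * u_hat).sum(axis=0)  # shape (10, 16)
```

Taking the softmax over dimension $i$ instead would normalize each output capsule's incoming weights, which is not what the algorithm prescribes.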
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
## [0.4.0] - 2018-01-30
### Added
- Support for the CIFAR10 dataset.
### Changed
- Upgrade to PyTorch 0.3.0.
- Support CUDA 9.
- Drop our custom softmax function and switch to PyTorch's softmax function.
- Modify the `save_image` utils function to handle 3-channel (RGB) images.
### Fixed
- Compatibility issues with PyTorch 0.3.0.
## [0.3.0] - 2017-11-27
### Added
- Decoder network PyTorch module.
- Reconstruct image with Decoder network during testing.
- Save the original and reconstructed images into the file system.
- Log the original and reconstructed images using TensorBoard.
### Changed
- Refactor reconstruction loss function and decoder network.
- Remove image reconstruction from training.
## [0.2.0] - 2017-11-26
### Added
- New dependencies for TensorBoard and tqdm.
- Logging losses and accuracies with TensorBoard.
- New utils functions for:
  - computing accuracy
  - converting model parameter values to numpy.array
  - parsing boolean values with argparse
- Softmax function that takes a dimension.
- More detailed code comments.
- Show margin loss and reconstruction loss in logs.
- Show accuracy in train logs.
### Changed
- Refactor loss functions.
- Clean codes.
### Fixed
- Runtime error during pip install requirements.txt
- Bug in routing algorithm.
## [0.1.0] - 2017-11-12
### Added
- Implemented reconstruction loss.
- Saving reconstructed image as file.
- Improve training speed by using PyTorch DataParallel to wrap our model.
  - PyTorch will parallelize the model and data over multiple GPUs.
- Supports training:
- on CPU (tested with macOS Sierra)
- on one GPU (tested with 1 Tesla K80 GPU)
- on multiple GPU (tested with 8 GPUs)
- with or without CUDA (tested with CUDA version 8.0.61)
- cuDNN 5 (tested with cuDNN 5.1.3)
### Changed
- More intuitive variable naming.
### Fixed
- Resolve Pylint warnings and reformat code.
- Missing square in equation 4 for margin (class) loss.
## 0.0.1 - 2017-11-04
### Added
- Initial release: the first beta version. The API is stable and the code runs, so I think it's safe to use for development, but it is not ready for general production usage.
A CUDA-enabled PyTorch implementation of CapsNet (Capsule Network) based on this paper:
[Sara Sabour, Nicholas Frosst, Geoffrey E Hinton. Dynamic Routing Between Capsules. NIPS 2017](https://arxiv.org/abs/1710.09829)
The current test error is `0.21%` and the best test error is `0.20%`. The current test accuracy is `99.31%` and the best test accuracy is `99.32%`.
**What is a Capsule**
> A Capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or object part.
You can learn more about Capsule Networks [here](#learning-resources).
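The "activity vector" idea can be made concrete with the squashing nonlinearity from the paper (its equation 1), which keeps a vector's direction but maps its length into [0, 1) so the length can be read as a probability. The function below is a minimal NumPy sketch of my own, not this repo's implementation:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Equation 1 of the paper: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
    # Short vectors shrink toward zero; long vectors approach unit length,
    # so the output norm behaves like a presence probability.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)

# A vector of length 5 is squashed to length 25/26, just under 1.
v = squash(np.array([3.0, 4.0]))
```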
**Why another CapsNet implementation?**
I wanted a decent PyTorch implementation of CapsNet and I couldn't find one when I started. The goal of this implementation is to help newcomers learn and understand the CapsNet architecture and the idea of Capsules. The implementation is **NOT** focused on rigorous correctness of the results, and the code is not optimized for speed. To make the code easier to read and understand, it comes with ample comments, and the Python classes and functions are documented with docstrings.
I will try my best to check and fix reported issues. Contributions are highly welcome. If you find any bugs or errors in the code, please do not hesitate to open an issue or a pull request. Thank you.
**Status and Latest Updates:**
See the [CHANGELOG](CHANGELOG.md)
**Datasets**
The model was trained on the standard [MNIST](http://yann.lecun.com/exdb/mnist/) data.
*Note: you don't have to manually download, preprocess, and load the MNIST dataset as [TorchVision](https://github.com/pytorch/vision) will take care of this step for you.*
I have tried using other datasets. See the [Other Datasets](#other-datasets) section below for more details.
## Requirements
- Python 3
- Tested with version 3.6.4
- [PyTorch](http://pytorch.org/)
- Tested with version 0.3.0.post4
- Migrate existing code to work in version 0.4.0. [Work-In-Progress]
- Code will not run with version 0.1.2 because `keepdim` is not available in that version.
- Code will not run with version 0.2.0 because the `softmax` function does not take a dimension argument in that version.
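To illustrate why the softmax needs a dimension argument (the 0.2.0 incompatibility above), here is a minimal NumPy stand-in; it sketches the behavior the code relies on, and is not the repo's actual helper:

```python
import numpy as np

def softmax(x, dim=-1):
    # Numerically stable softmax along a chosen dimension -- the behavior
    # the code needs from PyTorch >= 0.3's softmax(dim=...). Routing
    # requires normalizing over the output-capsule axis specifically.
    e = np.exp(x - x.max(axis=dim, keepdims=True))
    return e / e.sum(axis=dim, keepdims=True)

logits = np.zeros((2, 3))
p = softmax(logits, dim=1)  # each row sums to 1
```

With `dim=0` the same logits would instead be normalized down each column, which gives a different (and for routing, wrong) result.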
If you have trained a model in Step 2 above, then the weights for the trained model will be saved to `results/trained_model/model_epoch_10.pth`. [WIP] Now just run the following command to get test results.
You can download the weights for the pre-trained model from my Google Drive. We saved the weights (model state dict) and the optimizer state for the model at the end of every training epoch.
- Weights from [epoch 50 checkpoint](https://drive.google.com/uc?export=download&id=1lYtOMSreP4I9hr9un4DsBJZrzodI6l2d)[84 MB].
- Weights from [epoch 40 to 50](https://drive.google.com/uc?export=download&id=1VMuVtJrecz47czsT5HqLxZpFjkLoMKaL) checkpoints [928 MB].
Uncompress and put the weights (.pth files) into `./results/trained_model/`.
*Note: the model was **last trained** on 2017-11-26 and the weights **last updated** on 2017-11-28.*
Total number of parameters (with reconstruction network): 8,227,088 (about 8.2 million)
```
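As a rough sketch of how a checkpoint that stores both the model state dict and the optimizer state (as described above) could be restored: the `nn.Linear` stand-in model and the `'model'`/`'optimizer'` key names below are assumptions for illustration, not this repo's actual checkpoint format.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical stand-in for the CapsNet model.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters())

# Save both the model weights and the optimizer state in one file,
# as the pre-trained checkpoints described above do.
path = os.path.join(tempfile.gettempdir(), "model_epoch_10.pth")
checkpoint = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}
torch.save(checkpoint, path)

# Restore the weights into a fresh model instance.
restored = nn.Linear(4, 2)
state = torch.load(path)
restored.load_state_dict(state["model"])
```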
### TensorBoard
We logged the training and test losses and accuracies using tensorboardX. TensorBoard helps us visualize how the model learns over time: we can watch statistics such as the objective function, the weights, and the accuracy change during training.
TensorBoard operates by reading TensorFlow data (events files).
#### How to Use TensorBoard
1. Download a [copy of the events files](https://drive.google.com/uc?export=download&id=1lZVffeZTkUQfSxmZmYDViRzmhb59wBWL) for the latest run from my Google Drive.
2. Uncompress the file and put it into `./runs`.
3. Check that you have TensorFlow (CPU version) installed; TensorBoard needs it for its server and dashboard.
4. Start TensorBoard.
```bash
$ tensorboard --logdir runs
```
5. Open TensorBoard dashboard in your web browser using this URL: http://localhost:6006
### Other Datasets
#### CIFAR10
In the spirit of experimentation, I have tried other datasets and updated the implementation so that it supports and works with CIFAR10. Note that I have not thoroughly tested the capsule model on CIFAR10.
We can train and test the model on CIFAR10 by running the usual training and testing commands with the dataset switched to CIFAR10.
##### Training Loss and Accuracy
The training losses and accuracies for CapsNet-v4 (80 epochs, 3 routing iterations, with reconstruction, regularization scale of 0.0005):

- Highest training accuracy: 100%
- Lowest training error: 0.3589%
##### Test Loss and Accuracy
The test losses and accuracies for CapsNet-v4 (80 epochs, 3 routing iterations, with reconstruction, regularization scale of 0.0005):

- Highest test accuracy: 71%
- Lowest test error: 0.5735%
## TODO
- [x] Publish results.
- [x] More testing.
- [ ] Inference mode - command to test a pre-trained model.
- [ ] Jupyter Notebook version.
- [x] Create a sample to show how we can apply CapsNet to real-world application.
- [ ] Experiment with CapsNet:
* [x] Try using another dataset.
    * [ ] Come up with a more creative model structure.
- [x] Pre-trained model and weights.
- [x] Add visualization for training and evaluation metrics.
- [x] Implement reconstruction loss.
- [x] Check algorithm for correctness.
- [x] Update results from TensorBoard after making improvements and bug fixes.
- [x] Publish updated pre-trained model weights.
- [x] Log the original and reconstructed images using TensorBoard.
- [ ] Update results with reconstructed image and original image.
- [ ] Resume training by loading model checkpoint.
- [ ] Migrate existing code to work in PyTorch 0.4.0.
*WIP is an acronym for Work-In-Progress*
## Credits
Referenced these implementations mainly for sanity check:
1. [TensorFlow implementation by @naturomics](https://github.com/naturomics/CapsNet-Tensorflow)
## Learning Resources
Here are some resources that we think will be helpful if you want to learn more about Capsule Networks:
- Articles and blog posts:
  - [Understanding Hinton's Capsule Networks. Part I: Intuition.](https://medium.com/@pechyonkin/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b)
  - [Dynamic routing between capsules](https://blog.acolyer.org/2017/11/13/dynamic-routing-between-capsules/)
  - [What is a CapsNet or Capsule Network?](https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc)
  - [Capsule Networks Are Shaking up AI — Here's How to Use Them](https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952)
- The first author of the paper, [Sara Sabour has released the code](https://github.com/Sarasra/models/tree/master/research/capsules).
## Real-world Application of CapsNet
The following are a few samples in the wild that show how CapsNet can be applied to real-world use cases.
- [An attempt to implement CapsNet for car make-model classification](https://www.reddit.com/r/MachineLearning/comments/80eiz3/p_implementing_a_capsnet_for_car_makemodel/)
- [A Keras implementation of Capsule Network on Fashion MNIST dataset](https://github.com/XifengGuo/CapsNet-Fashion-MNIST)