"git@developer.sourcefind.cn:OpenDAS/ollama.git" did not exist on "4dcf5c3e0bfd95e1fd25cbc0ca2c2544d2ba4e0c"
Commit 16cc5287 authored by VoVAllen, committed by Minjie Wang

[Doc] Improve Capsule with Jinyang & Fix wrong tutorial level layout (#236)

* improve capsule tutorial with jinyang

* fix wrong layout of second-level tutorial

* delete transformer
parent dafe4671
@@ -195,8 +195,8 @@ intersphinx_mapping = {
# sphinx gallery configurations
from sphinx_gallery.sorting import FileNameSortKey
examples_dirs = ['../../tutorials/basics','../../tutorials/models']   # path to find sources
gallery_dirs = ['tutorials/basics','tutorials/models']                # path to generate docs
reference_url = {
    'dgl' : None,
    'numpy': 'http://docs.scipy.org/doc/numpy/',
...
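# For reference, sphinx-gallery consumes these two lists through the
# ``sphinx_gallery_conf`` dictionary. A typical wiring (a sketch; the
# repository's actual conf.py may differ) looks like:
sphinx_gallery_conf = {
    'examples_dirs': examples_dirs,              # where the tutorial sources live
    'gallery_dirs': gallery_dirs,                # where rendered pages are written
    'within_subsection_order': FileNameSortKey,  # keep the numbered file order
    'reference_url': reference_url,              # cross-link APIs via intersphinx
}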
@@ -65,7 +65,8 @@ credit, see `here <https://www.dgl.ai/ack>`_.
   :caption: Tutorials
   :glob:

   tutorials/basics/index
   tutorials/models/index

.. toctree::
   :maxdepth: 2
...
@@ -156,7 +156,7 @@ def pagerank_level2(g):
###############################################################################
# Besides ``update_all``, we also have ``pull``, ``push``, and ``send_and_recv``
# in this level-2 category. Please refer to the :doc:`API reference <../../api/python/graph>`
# for more details.
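# As a quick reminder, a level-2 PageRank step with user-defined message and
# reduce functions looks roughly like the sketch below (a sketch, not
# necessarily identical to the tutorial's ``pagerank_level2`` above; it assumes
# the node features ``pv`` and ``deg`` set up earlier in the tutorial):
import torch
import dgl

N = 100          # number of nodes, assumed from the tutorial setup
DAMP = 0.85      # damping factor

def pagerank_message_func(edges):
    # each node sends its current PageRank divided by its out-degree
    return {'pv': edges.src['pv'] / edges.src['deg']}

def pagerank_reduce_func(nodes):
    # each node sums its incoming messages and applies the damping factor
    msgs = torch.sum(nodes.mailbox['pv'], dim=1)
    return {'pv': (1 - DAMP) / N + DAMP * msgs}

def pagerank_one_iteration(g):
    # one full PageRank iteration expressed as a single level-2 call
    g.update_all(pagerank_message_func, pagerank_reduce_func)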
@@ -200,7 +200,7 @@ def pagerank_builtin(g):
#
# `This section <spmv_>`_ describes why spMV can speed up the scatter-gather
# phase in PageRank. For more details about the builtin functions in DGL,
# please read the :doc:`API reference <../../api/python/function>`.
#
# You can also download and run the codes to feel the difference.
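# For comparison, here is a sketch of the builtin-function variant (again
# assuming the ``pv``/``deg`` features and the ``N``/``DAMP`` constants from
# the sketch above); pairing ``fn.copy_src`` with ``fn.sum`` is what DGL can
# fuse into a single sparse matrix-vector multiply:
import dgl.function as fn

def pagerank_spmv_sketch(g):
    g.ndata['pv'] = g.ndata['pv'] / g.ndata['deg']
    g.update_all(message_func=fn.copy_src(src='pv', out='m'),
                 reduce_func=fn.sum(msg='m', out='m_sum'))
    g.ndata['pv'] = (1 - DAMP) / N + DAMP * g.ndata['m_sum']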
@@ -241,5 +241,5 @@ print(g.ndata['pv'])
###############################################################################
# Next steps
# ----------
# Check out :doc:`GCN <../models/1_gnn/1_gcn>` and :doc:`Capsule <../models/4_old_wines/2_capsule>`
# for more model implementations in DGL.
Basic Tutorials
===============

These tutorials cover the basics of DGL.
@@ -10,7 +10,7 @@ Yu Gai, Quan Gan, Zheng Zhang

This is a gentle introduction to using DGL to implement Graph Convolutional
Networks (Kipf & Welling, `Semi-Supervised Classification with Graph
Convolutional Networks <https://arxiv.org/pdf/1609.02907.pdf>`_). We build upon
the :doc:`earlier tutorial <../../basics/3_pagerank>` on DGLGraph and demonstrate
how DGL combines graphs with deep neural networks to learn structural representations.
"""
@@ -160,4 +160,4 @@ for epoch in range(30):
# multiplication kernels (such as Kipf's
# `pygcn <https://github.com/tkipf/pygcn>`_ code). The above DGL implementation
# has in fact already used this trick due to the use of builtin functions. To
# understand what is under the hood, please read our tutorial on :doc:`PageRank <../../basics/3_pagerank>`.
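# To make the connection concrete, the sketch below shows a minimal GCN layer
# built from the same builtin functions (a simplified illustration under the
# API described above, not a copy of the tutorial's exact code):
import torch.nn as nn
import dgl.function as fn

# neighbor aggregation expressed with builtins, so DGL can run it as spMV
gcn_msg = fn.copy_src(src='h', out='m')
gcn_reduce = fn.sum(msg='m', out='h')

class GCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(GCNLayer, self).__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, g, feature):
        g.ndata['h'] = feature
        g.update_all(gcn_msg, gcn_reduce)   # gather and sum neighbor features
        h = g.ndata.pop('h')
        return self.linear(h)               # node-wise linear transformation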
.. _tutorials1-index:

Graph neural networks and their variants
-----------------------------------------

* **GCN** `[paper] <https://arxiv.org/abs/1609.02907>`__ `[tutorial] <models/1_gcn.html>`__
  `[code] <https://github.com/jermainewang/dgl/blob/master/examples/pytorch/gcn/gcn.py>`__:
  this is the vanilla GCN. The tutorial covers the basic uses of DGL APIs.

* **GAT** `[paper] <https://arxiv.org/abs/1710.10903>`__
  `[code] <https://github.com/jermainewang/dgl/blob/master/examples/pytorch/gat/gat.py>`__:
  the key extension of GAT over the vanilla GCN is deploying multi-head attention
  over the neighborhood of a node, which greatly enhances the capacity and
  expressiveness of the model.

* **R-GCN** `[paper] <https://arxiv.org/abs/1703.06103>`__ `[tutorial] <models/4_rgcn.html>`__
  [code (wip)]: the key
  difference of R-GCN is that it allows multiple edges between two entities of a graph, and
  edges with distinct relationships are encoded differently. This is an
  interesting extension of GCN that has many applications of its own.

* **LGNN** `[paper] <https://arxiv.org/abs/1705.08415>`__ `[tutorial (wip)]` `[code (wip)]`:
  this model focuses on community detection by inspecting graph structures. It
  uses representations of both the original graph and its line-graph companion. In
  addition to demonstrating how an algorithm can harness multiple graphs, our
  implementation shows how one can judiciously mix vanilla tensor operations,
  sparse-matrix tensor operations, and message passing with DGL.

* **SSE** `[paper] <http://proceedings.mlr.press/v80/dai18a/dai18a.pdf>`__ `[tutorial (wip)]`
  `[code] <https://github.com/jermainewang/dgl/blob/master/examples/mxnet/sse/sse_batch.py>`__:
  the emphasis here is on *giant* graphs that cannot fit comfortably on one GPU
  card. SSE is an example that illustrates the co-design of algorithm and
  system: sampling to guarantee asymptotic convergence while lowering the
  complexity, and batching across samples for maximum parallelism.
.. _tutorials2-index:

Dealing with many small graphs
------------------------------

* **Tree-LSTM** `[paper] <https://arxiv.org/abs/1503.00075>`__ `[tutorial] <models/3_tree-lstm.html>`__
  `[code] <https://github.com/jermainewang/dgl/blob/master/examples/pytorch/tree_lstm/tree_lstm.py>`__:
  sentences of natural languages have inherent structures, which are thrown away
  by treating them simply as sequences. Tree-LSTM is a powerful model that learns
  the representation by leveraging prior syntactic structures (e.g. parse trees).
  The challenge in training it well is that simply padding sentences to the
  maximum length no longer works, since trees of different sentences have
  different sizes and topologies. DGL solves this problem by throwing the trees
  into a bigger "container" graph and using message passing to exploit maximum
  parallelism. The key API we use is batching, sketched below.
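Below is a minimal, self-contained sketch of the batching idea; the toy trees,
feature size, and the helper name are made up for illustration::

   import dgl
   import torch as th

   def make_tree(num_nodes, parents):
       # build a small tree; edges point from child to parent
       g = dgl.DGLGraph()
       g.add_nodes(num_nodes)
       for child, parent in enumerate(parents):
           if parent >= 0:
               g.add_edges(child, parent)
       return g

   t1 = make_tree(3, [-1, 0, 0])          # a root with two children
   t2 = make_tree(5, [-1, 0, 0, 1, 1])    # a deeper, differently shaped tree

   bg = dgl.batch([t1, t2])               # one big "container" graph
   bg.ndata['h'] = th.zeros(bg.number_of_nodes(), 4)
   # message passing on `bg` now runs over all trees in parallel;
   # dgl.unbatch(bg) recovers the individual trees afterwards.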
.. _tutorials3-index:

Generative models
------------------------------

* **DGMG** `[paper] <https://arxiv.org/abs/1803.03324>`__ `[tutorial] <models/5_dgmg.html>`__
  `[code] <https://github.com/jermainewang/dgl/tree/master/examples/pytorch/dgmg>`__:
  this model belongs to the important family that deals with structural
  generation. DGMG is interesting because its state-machine approach is the most
  general. It is also very challenging because, unlike Tree-LSTM, every sample
  has a dynamic, probability-driven structure that is not available before
  training. We are able to progressively leverage intra- and inter-graph
  parallelism to steadily improve the performance.

* **JTNN** `[paper] <https://arxiv.org/abs/1802.04364>`__ `[code (wip)]`: unlike DGMG, this
  paper generates molecular graphs using the framework of a variational
  auto-encoder. Perhaps more interesting is its approach to building structures
  hierarchically, in the case of molecules with a junction tree as the intermediate
  scaffolding.
@@ -4,62 +4,62 @@
Capsule Network Tutorial
===========================

**Author**: Jinjing Zhou, `Jake Zhao <https://cs.nyu.edu/~jakezhao/>`_, Zheng Zhang, Jinyang Li

It is perhaps a little surprising that some of the more classical models
can also be described in terms of graphs, offering a different
perspective. This tutorial describes how this can be done for the
`capsule network <http://arxiv.org/abs/1710.09829>`__.
"""
#######################################################################################
# Key ideas of Capsule
# --------------------
#
# The Capsule model offers two key ideas.
#
# **Richer representation** In classic convolutional networks, a scalar
# value represents the activation of a given feature. By contrast, a
# capsule outputs a vector. The vector's length represents the probability
# of a feature being present. The vector's orientation represents the
# various properties of the feature (such as pose, deformation, texture,
# etc.).
#
# |image0|
#
# **Dynamic routing** The output of a capsule is preferentially sent to
# certain parents in the layer above based on how well the capsule's
# prediction agrees with that of a parent. Such dynamic
# "routing-by-agreement" generalizes the static routing of max-pooling.
#
# During training, routing is done iteratively; each iteration adjusts
# "routing weights" between capsules based on their observed agreements,
# in a manner similar to a k-means algorithm or `competitive
# learning <https://en.wikipedia.org/wiki/Competitive_learning>`__.
#
# In this tutorial, we show how the capsule's dynamic routing algorithm can
# be naturally expressed as a graph algorithm. Our implementation is
# adapted from `Cedric
# Chee <https://github.com/cedrickchee/capsule-net-pytorch>`__, replacing
# only the routing layer. Our version achieves similar speed and accuracy.
#
# Model Implementation
# ----------------------
# Step 1: Setup and Graph Initialization
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# The connectivity between two layers of capsules forms a directed,
# bipartite graph, as shown in the figure below.
#
# |image1|
#
# Each node :math:`j` is associated with feature :math:`v_j`,
# representing its capsule's output. Each edge is associated with
# features :math:`b_{ij}` and :math:`\hat{u}_{j|i}`. :math:`b_{ij}`
# determines routing weights, and :math:`\hat{u}_{j|i}` represents the
# prediction of capsule :math:`i` for :math:`j`.
#
# Here's how we set up the graph and initialize node and edge features.
import torch.nn as nn
import torch as th
import torch.nn.functional as F
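# The next hunk only shows the signature of ``init_graph``. As a rough sketch
# of what such a setup can look like (the feature names ``'v'``/``'b'``, the
# edge ordering, and the helper name are assumptions for illustration, using
# the DGL API of that time):
import dgl

def init_graph_sketch(in_nodes, out_nodes, f_size):
    # Build a complete bipartite graph: every in-capsule connects to every
    # out-capsule, and all features start at zero.
    g = dgl.DGLGraph()
    g.add_nodes(in_nodes + out_nodes)
    out_indx = list(range(in_nodes, in_nodes + out_nodes))
    for u in range(in_nodes):
        g.add_edges(u, out_indx)                             # edges grouped by in-capsule
    g.ndata['v'] = th.zeros(in_nodes + out_nodes, f_size)    # capsule outputs v_j
    g.edata['b'] = th.zeros(in_nodes * out_nodes, 1)         # routing logits b_ij
    return g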
@@ -88,32 +88,33 @@ def init_graph(in_nodes, out_nodes, f_size):
#########################################################################################
# Step 2: Define message passing functions
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# This is the pseudo code for Capsule's routing algorithm as given in the
# paper:
#
# |image2|
#
# We implement pseudo code lines 4-7 in the class ``DGLRoutingLayer`` as the
# following steps (a dense-tensor sketch of the same loop appears after the
# code below):
#
# 1. Calculate coupling coefficients:
#
#    - Coefficients are the softmax over all out-edges of in-capsules:
#      :math:`\textbf{c}_{i,j} = \text{softmax}(\textbf{b}_{i,j})`.
#
# 2. Calculate weighted sum over all in-capsules:
#
#    - The output of a capsule is equal to the weighted sum of its in-capsules:
#      :math:`s_j=\sum_i c_{ij}\hat{u}_{j|i}`
#
# 3. Squash outputs:
#
#    - Squash the length of a capsule's output vector to the range (0, 1), so it
#      can represent the probability (of some feature being present).
#    - :math:`v_j=\text{squash}(s_j)=\frac{||s_j||^2}{1+||s_j||^2}\frac{s_j}{||s_j||}`
#
# 4. Update weights by the amount of agreement:
#
#    - The scalar product :math:`\hat{u}_{j|i}\cdot v_j` can be considered as how
#      well capsule :math:`i` agrees with :math:`j`. It is used to update
#      :math:`b_{ij}=b_{ij}+\hat{u}_{j|i}\cdot v_j`
class DGLRoutingLayer(nn.Module):
    def __init__(self, in_nodes, out_nodes, f_size):
        super(DGLRoutingLayer, self).__init__()
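# The diff elides the rest of ``DGLRoutingLayer``. As a reference for the four
# steps above, here is a plain dense-tensor sketch of the routing loop; the DGL
# version distributes the same quantities over the graph's edge and node
# features. Shapes, names, and the iteration count here are illustrative
# assumptions, not the tutorial's exact code.
def dense_routing_sketch(u_hat, num_iters=3):
    # u_hat: predictions of shape (in_nodes, out_nodes, f_size)
    in_nodes, out_nodes, f_size = u_hat.shape
    b = th.zeros(in_nodes, out_nodes)                   # routing logits b_ij
    for _ in range(num_iters):
        c = F.softmax(b, dim=1)                         # step 1: coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=0)        # step 2: weighted sum s_j
        sq = (s ** 2).sum(dim=1, keepdim=True)
        v = (sq / (1.0 + sq)) * s / th.sqrt(sq + 1e-9)  # step 3: squash
        b = b + (u_hat * v.unsqueeze(0)).sum(dim=-1)    # step 4: agreement update
    return v                                            # out-capsule outputs v_j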
@@ -172,9 +173,9 @@ u_hat = th.randn(in_nodes * out_nodes, f_size)
routing = DGLRoutingLayer(in_nodes, out_nodes, f_size)
############################################################################################################
# We can visualize a capsule network's behavior by monitoring the entropy
# of coupling coefficients. They should start high and then drop, as the
# weights gradually concentrate on fewer edges:
entropy_list = []
dist_list = []
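# The monitoring loop itself is elided by the diff; a rough sketch of what it
# could look like (assuming the routing layer exposes its graph as
# ``routing.g`` and stores the coupling coefficients in ``edata['c']``, as in
# Step 2):
for i in range(10):
    routing(u_hat)                                       # one routing iteration
    c = routing.g.edata['c'].view(in_nodes, out_nodes)
    entropy = (-c * th.log(c)).sum(dim=1)                # entropy per in-capsule
    entropy_list.append(entropy.data.numpy())
    dist_list.append(c.data.numpy())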
@@ -193,9 +194,9 @@ plt.xlabel("Number of Routing")
plt.xticks(np.arange(len(entropy_list)))
plt.close()
############################################################################################################
# |image3|
#
# Alternatively, we can also watch the evolution of histograms:
import seaborn as sns
import matplotlib.animation as animation
@@ -216,8 +217,10 @@ ani = animation.FuncAnimation(fig, dist_animate, frames=len(entropy_list), inter
plt.close()
############################################################################################################
# |image4|
#
# Or monitor how lower-level capsules gradually attach to one of the
# higher-level ones:
import networkx as nx
from networkx.algorithms import bipartite
@@ -251,14 +254,16 @@ ani2 = animation.FuncAnimation(fig2, weight_animate, frames=len(dist_list), inte
plt.close()
############################################################################################################
# |image5|
#
# The full code of this visualization is provided at
# `link <https://github.com/jermainewang/dgl/blob/master/examples/pytorch/capsule/simple_routing.py>`__; the complete
# code that trains on MNIST is at `link <https://github.com/jermainewang/dgl/tree/tutorial/examples/pytorch/capsule>`__.
#
# .. |image0| image:: https://i.imgur.com/55Ovkdh.png
# .. |image1| image:: https://i.imgur.com/9tc6GLl.png
# .. |image2| image:: https://i.imgur.com/mv1W9Rv.png
# .. |image3| image:: https://i.imgur.com/dMvu7p3.png
# .. |image4| image:: https://github.com/VoVAllen/DGL_Capsule/raw/master/routing_dist.gif
# .. |image5| image:: https://github.com/VoVAllen/DGL_Capsule/raw/master/routing_vis.gif
.. _tutorials4-index:

Old (new) wines in new bottle
-----------------------------

* **Capsule** `[paper] <https://arxiv.org/abs/1710.09829>`__ `[tutorial] <models/2_capsule.html>`__
  `[code] <https://github.com/jermainewang/dgl/tree/master/examples/pytorch/capsule>`__: this new
  computer vision model has two key ideas -- enhancing the feature representation
  in a vector form (instead of a scalar), called a *capsule*, and replacing
  max-pooling with dynamic routing. The idea of dynamic routing is to integrate a
  lower-level capsule into one (or several) of the higher-level ones with
  non-parametric message passing. We show how the latter can be nicely implemented
  with DGL APIs.

* **Transformer** `[paper] <https://arxiv.org/abs/1706.03762>`__ `[tutorial (wip)]` `[code (wip)]` and
  **Universal Transformer** `[paper] <https://arxiv.org/abs/1807.03819>`__ `[tutorial (wip)]`
  `[code (wip)]`: these
  two models replace RNNs with several layers of multi-head attention to encode
  and discover structures among the tokens of a sentence. These attention mechanisms
  can similarly be formulated as graph operations with message passing.
@@ -15,85 +15,4 @@ We categorize the models below, providing links to the original code and
tutorial when appropriate. As will become apparent, these models stress the use
of different DGL APIs.