"git@developer.sourcefind.cn:OpenDAS/torch-sparce.git" did not exist on "ae35b8a596530494a02e5a63becd2e77e29bb384"
Commit 10e18ed9 authored by Minjie Wang, committed by GitHub

[Doc] overview and others (#222)

* model tutorials overview

* update the 1_first

* revised overview and summarized "what is DGL" in glance
parent 2c170a8c
@@ -3,25 +3,74 @@
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Overview of DGL
===============

Deep Graph Library (DGL) is a Python package built for easy implementation of
the graph neural network model family on top of existing DL frameworks (e.g.
PyTorch, MXNet, Gluon).

DGL reduces the implementation of graph neural networks to declaring a set of
*functions* (or *modules*, in PyTorch terminology). In addition, DGL provides:

* Versatile control over message passing, ranging from low-level operations
  such as sending along selected edges and receiving on specific nodes, to
  high-level control such as graph-wide feature updates (see the short sketch
  after this list).
* Transparent speed optimization with automatic batching of computations and
  sparse matrix multiplication.
* Seamless integration with existing deep learning frameworks.
* Easy and friendly interfaces for node/edge feature access and graph
  structure manipulation.
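
A minimal sketch of these two levels of control, assuming the message-passing
API used in the tutorials below (``send``/``recv`` and ``update_all``); details
may differ slightly across versions:

.. code:: python

   import dgl
   import torch

   g = dgl.DGLGraph()
   g.add_nodes(3)
   g.add_edges([0, 1, 2], [1, 2, 0])    # a directed 3-cycle
   g.ndata['h'] = torch.ones(3, 2)      # a toy node feature

   def message_func(edges):             # what each edge sends
       return {'m': edges.src['h']}

   def reduce_func(nodes):              # how each node aggregates its mailbox
       return {'h': nodes.mailbox['m'].sum(1)}

   # low-level control: send along selected edges, then receive on chosen nodes
   g.send(g.edges(), message_func)
   g.recv(g.nodes(), reduce_func)

   # high-level control: one graph-wide feature update
   g.update_all(message_func, reduce_func)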

To begin with, we have prototyped 10 models across various domains:
semi-supervised learning on graphs (with potentially billions of nodes/edges),
generative models on graphs, (previously) difficult-to-parallelize tree-based
models like TreeLSTM, etc. We also implement some conventional models in DGL
from a new graphical perspective, yielding simplicity.

Relationship of DGL to other frameworks
---------------------------------------

DGL is designed to be compatible with, and agnostic to, existing tensor
frameworks. It provides a backend adapter interface that allows easy porting
to other tensor-based, autograd-enabled frameworks. Currently, our prototype
works with MXNet/Gluon and PyTorch.

Free software
-------------

DGL is free software; you can redistribute it and/or modify it under the terms
of the Apache License 2.0. We welcome contributions.
Join us on `GitHub <https://github.com/jermainewang/dgl>`_.

History
-------

Prototyping of DGL started in early spring 2018 at NYU Shanghai by Prof. Zheng
Zhang and Quan Gan. Serious development began when Minjie, Lingfan and Prof.
Jinyang Li from NYU's system group joined, flanked by a team of student
volunteers at NYU Shanghai, Fudan and other universities (Yu, Zihao, Murphy,
Allen, Qipeng, Qi, Hao), as well as early adopters at the CILVR lab (Jake
Zhao). Development accelerated when the AWS MXNet Science team joined forces,
with Da Zheng, Alex Smola, Haibin Lin, Chao Ma and a number of others. For full
credit, see `here <https://www.dgl.ai/ack>`_.

.. toctree::
   :maxdepth: 1
   :caption: Get Started
   :glob:

   install/index

.. toctree::
   :maxdepth: 2
   :caption: Tutorials
   :glob:

   tutorials/index

.. toctree::
   :maxdepth: 2
   :caption: API Reference
   :glob:

   api/python/index
@@ -7,51 +7,42 @@ DGL at a Glance
**Author**: `Minjie Wang <https://jermainewang.github.io/>`_, Quan Gan, `Jake
Zhao <https://cs.nyu.edu/~jakezhao/>`_, Zheng Zhang

DGL is a Python package dedicated to deep learning on graphs, built atop
existing tensor DL frameworks (e.g. PyTorch, MXNet) and simplifying the
implementation of graph-based neural networks.

The goal of this tutorial:

- Understand how DGL enables computation on graphs at a high level.
- Train a simple graph neural network in DGL to classify nodes in a graph.

At the end of this tutorial, we hope you get a brief feeling of how DGL works.

*This tutorial assumes basic familiarity with PyTorch.*
"""
###############################################################################
# Step 0: Problem description
# ---------------------------
#
# We start with the well-known "Zachary's karate club" problem. The karate club
# is a social network that captures 34 members and documents pairwise links
# between members who interact outside the club. The club later divides into
# two communities led by the instructor (node 0) and the club president (node
# 33). The network is visualized as follows, with the color indicating the
# community:
#
# .. image:: https://s3.us-east-2.amazonaws.com/dgl.ai/tutorial/img/karate-club.png
#    :align: center
#
# The task is to predict which side (0 or 33) each member tends to join given
# the social network itself.
###############################################################################
# Step 1: Creating a graph in DGL
# -------------------------------
# Creating the graph for Zachary's karate club goes as follows:

import dgl
@@ -59,7 +50,7 @@ def build_karate_club_graph():
    g = dgl.DGLGraph()
    # add 34 nodes into the graph; nodes are labeled from 0~33
    g.add_nodes(34)
    # all 78 edges as a list of tuples
    edge_list = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2),
                 (4, 0), (5, 0), (6, 0), (6, 4), (6, 5), (7, 0), (7, 1),
                 (7, 2), (7, 3), (8, 0), (8, 2), (9, 2), (10, 0), (10, 4),
@@ -73,37 +64,44 @@ def build_karate_club_graph():
                 (33, 14), (33, 15), (33, 18), (33, 19), (33, 20), (33, 22),
                 (33, 23), (33, 26), (33, 27), (33, 28), (33, 29), (33, 30),
                 (33, 31), (33, 32)]
    # add edges via two lists of nodes: src and dst
    src, dst = tuple(zip(*edge_list))
    g.add_edges(src, dst)
    # edges are directional in DGL; make them bi-directional
    g.add_edges(dst, src)
    return g
###############################################################################
# We can print out the number of nodes and edges in our newly constructed graph:

G = build_karate_club_graph()
print('We have %d nodes.' % G.number_of_nodes())
print('We have %d edges.' % G.number_of_edges())
###############################################################################
# We can also visualize the graph by converting it to a `networkx
# <https://networkx.github.io/documentation/stable/>`_ graph:

import networkx as nx
# Since the actual graph is undirected, we convert it for visualization
# purposes.
nx_G = G.to_networkx().to_undirected()
# The Kamada-Kawai layout usually looks pretty for arbitrary graphs
pos = nx.kamada_kawai_layout(nx_G)
nx.draw(nx_G, pos, with_labels=True, node_color=[[.7, .7, .7]])
###############################################################################
# Step 2: assign features to nodes or edges
# -----------------------------------------
# Graph neural networks associate features with nodes and edges for training.
# For our classification example, we assign each node an input feature that is
# a one-hot vector: node :math:`v_i`'s feature vector is
# :math:`[0,\ldots,1,\dots,0]`, where the :math:`i^{th}` position is one.
#
# In DGL, we can add features for all nodes at once, using a feature tensor that
# batches node features along the first dimension. The code below adds the one-hot
# feature for all nodes:
import torch
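# One way to attach these features (a minimal sketch; the exact assignment in
# the elided lines may differ slightly): a 34 x 34 identity matrix gives every
# node its one-hot row.
G.ndata['feat'] = torch.eye(34)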
@@ -120,46 +118,42 @@ print(G.nodes[2].data['feat'])
print(G.nodes[[10, 11]].data['feat'])
###############################################################################
# Step 3: define a Graph Convolutional Network (GCN)
# ---------------------------------------------------
# To perform node classification, we use the Graph Convolutional Network
# (GCN) developed by `Kipf and Welling <https://arxiv.org/abs/1609.02907>`_. Here
# we provide the simplest definition of a GCN framework, but we recommend that the
# reader consult the original paper for more details.
#
# - At layer :math:`l`, each node :math:`v_i^l` carries a feature vector :math:`h_i^l`.
# - Each layer of the GCN aggregates the features :math:`h_u^{l}` of the
#   neighborhood nodes :math:`u` of :math:`v_i` into the next-layer representation
#   :math:`h_i^{l+1}`. This is followed by an affine transformation with some
#   non-linearity.
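#
# Ignoring the normalization constant :math:`c_{ij}` for the moment (as the
# code below also does), the two points above can be summarized in one sketched
# layer update:
#
# .. math::
#
#    h_i^{l+1} = \sigma\left(W^{l} \sum_{j \in \mathcal{N}(i)} h_j^{l}\right)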
#
# The above definition of GCN fits into a **message-passing** paradigm: each
# node updates its own feature with information sent from neighboring
# nodes. A graphical demonstration is displayed below.
#
# .. image:: https://s3.us-east-2.amazonaws.com/dgl.ai/tutorial/1_first/mailbox.png
#    :alt: mailbox
#    :align: center
#
# Now, we show that the GCN layer can be easily implemented in DGL.
import torch.nn as nn
import torch.nn.functional as F

# Define the message & reduce function
# NOTE: we ignore the GCN's normalization constant c_ij for this tutorial.
def gcn_message(edges):
    # The argument is a batch of edges.
    # This computes a (batch of) message called 'msg' using the source node's feature 'h'.
    return {'msg' : edges.src['h']}

def gcn_reduce(nodes):
    # The argument is a batch of nodes.
    # This computes the new 'h' features by summing received 'msg' in each node's mailbox.
    return {'h' : torch.sum(nodes.mailbox['msg'], dim=1)}

# Define the GCNLayer module
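# A minimal sketch of the layer's constructor (assumed here for readability;
# the elided lines define it in the same spirit): the layer only needs one
# linear transformation, applied after neighborhood aggregation.
class GCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(GCNLayer, self).__init__()
        self.linear = nn.Linear(in_feats, out_feats)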
@@ -172,8 +166,9 @@ class GCNLayer(nn.Module):
        # g is the graph and inputs is the input node features
        # first set the node features
        g.ndata['h'] = inputs
        # trigger message passing on all edges
        g.send(g.edges(), gcn_message)
        # trigger aggregation at all nodes
        g.recv(g.nodes(), gcn_reduce)
        # get the result node features
        h = g.ndata.pop('h')
@@ -181,12 +176,15 @@ class GCNLayer(nn.Module):
        return self.linear(h)
###############################################################################
# In general, the nodes send information computed via the *message functions*,
# and aggregate incoming information with the *reduce functions*.
#
# We then define a deeper GCN model that contains two GCN layers:
# Define a 2-layer GCN model
class GCN(nn.Module):
    def __init__(self, in_feats, hidden_size, num_classes):
        super(GCN, self).__init__()
        self.gcn1 = GCNLayer(in_feats, hidden_size)
        self.gcn2 = GCNLayer(hidden_size, num_classes)
@@ -195,26 +193,29 @@ class Net(nn.Module):
        h = torch.relu(h)
        h = self.gcn2(g, h)
        return h

# The first layer transforms input features of size 34 to a hidden size of 5.
# The second layer transforms the hidden layer and produces output features of
# size 2, corresponding to the two groups of the karate club.
net = GCN(34, 5, 2)
###############################################################################
# Step 4: data preparation and initialization
# -------------------------------------------
#
# We use one-hot vectors to initialize the node features. Since this is a
# semi-supervised setting, only the instructor (node 0) and the club president
# (node 33) are assigned labels. The implementation is as follows.

inputs = torch.eye(34)
labeled_nodes = torch.tensor([0, 33])  # only the instructor and the president nodes are labeled
labels = torch.tensor([0, 1])  # their labels are different
###############################################################################
# Step 5: train then visualize
# ----------------------------
# The training loop is the same as for any other PyTorch model.
# We (1) create an optimizer, (2) feed the inputs to the model,
# (3) calculate the loss and (4) use autograd to optimize the model.
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
all_logits = []
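
# A minimal sketch of the loop body described above (assumed; the elided lines
# of the actual loop differ only in detail):
for epoch in range(30):
    logits = net(G, inputs)
    all_logits.append(logits.detach())              # keep logits for the animation below
    logp = F.log_softmax(logits, 1)
    loss = F.nll_loss(logp[labeled_nodes], labels)  # loss on the two labeled nodes only

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()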
@@ -233,8 +234,13 @@ for epoch in range(30):
    print('Epoch %d | Loss: %.4f' % (epoch, loss.item()))
###############################################################################
# This is a rather toy example, so it does not even have a validation or test
# set. Since the model produces an output feature of size 2 for each node, we
# can visualize it by plotting the output features in a 2D space.
# The following code animates the training process from the initial guess
# (where the nodes are not classified correctly at all) to the end
# (where the nodes are linearly separable).
import matplotlib.animation as animation
import matplotlib.pyplot as plt
@@ -253,10 +259,6 @@ def draw(i):
    nx.draw_networkx(nx_G.to_undirected(), pos, node_color=colors,
                     with_labels=True, node_size=300, ax=ax)

fig = plt.figure(dpi=150)
fig.clf()
ax = fig.subplots()
@@ -271,7 +273,7 @@ plt.close()

###############################################################################
# The following animation shows how the model correctly predicts the community
# after a series of training epochs.

ani = animation.FuncAnimation(fig, draw, frames=len(all_logits), interval=200)

@@ -284,5 +286,6 @@ ani = animation.FuncAnimation(fig, draw, frames=len(all_logits), interval=200)
###############################################################################
# Next steps
# ----------
#
# In the :doc:`next tutorial <2_basics>`, we will go through some more basics
# of DGL, such as reading and writing node/edge features.
Basic Tutorials
===============

These tutorials cover the basics of DGL.
Graph-based Neural Network Models
=================================

We developed DGL with a broad range of applications in mind. Building
state-of-the-art models forces us to think hard about the most common and
useful APIs, learn the hard lessons, and push the system design.

We have prototyped altogether 10 different models. All of them are ready to run
out of the box, and some of them are very new graph-based algorithms. In most
cases, they demonstrate the performance, flexibility, and expressiveness of
DGL. Where we still fall short, these exercises point to future directions.

We categorize the models below, providing links to the original code and a
tutorial when appropriate. As will become apparent, these models stress the use
of different DGL APIs.
Graph neural network and its variants
--------------------------------------

* **GCN** `[paper] <https://arxiv.org/abs/1609.02907>`__ `[tutorial] <models/1_gcn.html>`__
  `[code] <https://github.com/jermainewang/dgl/blob/master/examples/pytorch/gcn/gcn.py>`__:
  this is the vanilla GCN. The tutorial covers the basic uses of DGL APIs.
* **GAT** `[paper] <https://arxiv.org/abs/1710.10903>`__
  `[code] <https://github.com/jermainewang/dgl/blob/master/examples/pytorch/gat/gat.py>`__:
  the key extension of GAT w.r.t. the vanilla GCN is deploying multi-head attention
  over the neighborhood of a node, which greatly enhances the capacity and
  expressiveness of the model.
* **R-GCN** `[paper] <https://arxiv.org/abs/1703.06103>`__ `[tutorial] <models/4_rgcn.html>`__
  [code (wip)]: the key
  difference of R-GCN is to allow multiple edges between two entities of a
  graph, where edges with distinct relationships are encoded differently. This
  is an interesting extension of GCN that can have a lot of applications of its own.
* **LGNN** `[paper] <https://arxiv.org/abs/1705.08415>`__ `[tutorial (wip)]` `[code (wip)]`:
  this model focuses on community detection by inspecting graph structures. It
  uses representations of both the original graph and its line-graph companion. In
  addition to demonstrating how an algorithm can harness multiple graphs, our
  implementation shows how one can judiciously mix vanilla tensor operations,
  sparse-matrix tensor operations, and message passing with DGL.
* **SSE** `[paper] <http://proceedings.mlr.press/v80/dai18a/dai18a.pdf>`__ `[tutorial (wip)]`
  `[code] <https://github.com/jermainewang/dgl/blob/master/examples/mxnet/sse/sse_batch.py>`__:
  the emphasis here is on *giant* graphs that cannot fit comfortably on one GPU
  card. SSE is an example that illustrates the co-design of algorithm and
  system: sampling to guarantee asymptotic convergence while lowering the
  complexity, and batching across samples for maximum parallelism.
Dealing with many small graphs
------------------------------

* **Tree-LSTM** `[paper] <https://arxiv.org/abs/1503.00075>`__ `[tutorial] <models/3_tree-lstm.html>`__
  `[code] <https://github.com/jermainewang/dgl/blob/master/examples/pytorch/tree_lstm/tree_lstm.py>`__:
  sentences of natural languages have inherent structures, which are thrown away
  by treating them simply as sequences. Tree-LSTM is a powerful model that learns
  the representation by leveraging prior syntactic structures (e.g. parse trees).
  The challenge in training it well is that simply padding a sentence to the
  maximum length no longer works, since trees of different sentences have
  different sizes and topologies. DGL solves this problem by throwing the trees
  into one bigger "container" graph, and using message passing to exploit maximum
  parallelism. The key API we use is batching, sketched below.
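
  A minimal sketch of that batching step (assuming the ``dgl.batch`` API;
  ``tree1``, ``tree2``, ``tree3`` are placeholder per-sentence graphs):

  .. code:: python

     import dgl

     # trees of different sizes/topologies, each already built as a DGLGraph
     forest = dgl.batch([tree1, tree2, tree3])
     # message passing on ``forest`` now runs over all the trees in parallel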

Generative models
-----------------

* **DGMG** `[paper] <https://arxiv.org/abs/1803.03324>`__ `[tutorial] <models/5_dgmg.html>`__
  `[code] <https://github.com/jermainewang/dgl/tree/master/examples/pytorch/dgmg>`__:
  this model belongs to the important family that deals with structural
  generation. DGMG is interesting because its state-machine approach is the most
  general. It is also very challenging because, unlike Tree-LSTM, every sample
  has a dynamic, probability-driven structure that is not available before
  training. We are able to progressively leverage intra- and inter-graph
  parallelism to steadily improve the performance.
* **JTNN** `[paper] <https://arxiv.org/abs/1802.04364>`__ `[code (wip)]`: unlike DGMG, this
  paper generates molecular graphs using the framework of a variational
  auto-encoder. Perhaps more interesting is its approach to building structures
  hierarchically, in the case of molecules, with a junction tree as the
  intermediate scaffolding.
Old (new) wines in new bottles
------------------------------

* **Capsule** `[paper] <https://arxiv.org/abs/1710.09829>`__ `[tutorial] <models/2_capsule.html>`__
  `[code] <https://github.com/jermainewang/dgl/tree/master/examples/pytorch/capsule>`__: this new
  computer vision model has two key ideas -- enhancing the feature representation
  in a vector form (instead of a scalar) called a *capsule*, and replacing
  max-pooling with dynamic routing. The idea of dynamic routing is to integrate a
  lower-level capsule into one (or several) higher-level capsules with
  non-parametric message passing. We show how the latter can be nicely implemented
  with DGL APIs.
* **Transformer** `[paper] <https://arxiv.org/abs/1706.03762>`__ `[tutorial (wip)]` `[code (wip)]` and
  **Universal Transformer** `[paper] <https://arxiv.org/abs/1807.03819>`__ `[tutorial (wip)]`
  `[code (wip)]`: these
  two models replace RNNs with several layers of multi-head attention to encode
  and discover structures among the tokens of a sentence. These attention mechanisms
  can similarly be formulated as graph operations with message passing.