Unverified Commit 20469802 authored by Minjie Wang's avatar Minjie Wang Committed by GitHub

[DOC] API tutorials (#96)

* setup sphinx-gallery; work on graph tutorial

* draft dglgraph tutorial

* update readme to include document url

* rm obsolete file

* Draft the message passing tutorial
parent 3de20385
@@ -2,62 +2,11 @@
[![Build Status](http://216.165.71.225:8080/buildStatus/icon?job=DGL/master)](http://216.165.71.225:8080/job/DGL/job/master/)
[![GitHub license](https://dmlc.github.io/img/apache2.svg)](./LICENSE)
## Architecture
Shown below, there are three sets of APIs for different models.
- `update_all` and `propagate` are more global
- `update_by_edge`, `update_to` and `update_from` give finer control when updates are applied to a path or a group of nodes
- `sendto` and `recvfrom` are the bottom primitives that update a message and a node.
![Screenshot](graph-api.png)

For how to install and how to play with DGL, please read our
[Documentation](http://216.165.71.225:23232/index.html)

## For Model developers
- Always choose the API at the *highest* possible level.
- Refer to the [GCN example](examples/pytorch/gcn/gcn_batch.py) to see how to register message and node update functions.

## Contribution rules
No direct push to the master branch. All changes need to be PRed and reviewed before merging to the master branch.

## How to build (the `cpp` branch)
Before building, make sure that the submodules are cloned. If you haven't initialized the submodules, run
```sh
$ git submodule init
```
To sync the submodules, run
```sh
$ git submodule update
```
### Linux
At the root directory of the repo:
```sh
$ mkdir build
$ cd build
$ cmake ..
$ make
$ export DGL_LIBRARY_PATH=$PWD
```
The `DGL_LIBRARY_PATH` environment variable should point to the library `libdgl.so` built by CMake.
### Windows/MinGW (Experimental)
Make sure you have the following installed:
* CMake
* MinGW/GCC (G++)
* MinGW/Make
You can grab them from Anaconda.
In the command line prompt, run:
```
> md build
> cd build
> cmake -DCMAKE_CXX_FLAGS="-DDMLC_LOG_STACK_TRACE=0 -DTVM_EXPORTS" .. -G "MinGW Makefiles"
> mingw32-make
> set DGL_LIBRARY_PATH=%CD%
```
build
# tutorials are auto-generated
tutorials
DGL document and tutorial folder
================================
To build,
```
make html
```
and then render the page `build/html/index.html`.
@@ -45,6 +45,8 @@ extensions = [
    'sphinx.ext.mathjax',
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode',
    'sphinx.ext.intersphinx',
    'sphinx_gallery.gen_gallery',
]

# Add any paths that contain templates here, relative to this directory.
@@ -133,7 +135,7 @@ latex_elements = {
# (source start file, target name, title,
#  author, documentclass [howto, manual, or own class]).
latex_documents = [
    (master_doc, 'dgl.tex', 'DGL Documentation',
     'DGL Team', 'manual'),
]
@@ -143,7 +145,7 @@ latex_documents = [
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
    (master_doc, 'dgl', 'DGL Documentation',
     [author], 1)
]
@@ -154,8 +156,8 @@ man_pages = [
# (source start file, target name, title, author,
#  dir menu entry, description, category)
texinfo_documents = [
    (master_doc, 'dgl', 'DGL Documentation',
     author, 'dgl', 'Library for deep learning on graphs.',
     'Miscellaneous'),
]
@@ -179,3 +181,16 @@ epub_exclude_files = ['search.html']
# -- Extension configuration -------------------------------------------------
# sphinx gallery configurations
from sphinx_gallery.sorting import FileNameSortKey
examples_dirs = ['../../tutorials'] # path to find sources
gallery_dirs = ['tutorials'] # path to generate docs
sphinx_gallery_conf = {
'examples_dirs' : examples_dirs,
'gallery_dirs' : gallery_dirs,
'within_subsection_order' : FileNameSortKey,
'filename_pattern' : '.py',
}
@@ -72,3 +72,23 @@ built above. Use following command to test whether the installation is successfu
Install from docker
-------------------
TBD
Install on Windows/MinGW
------------------------
Make sure you have the following installed:
* CMake
* MinGW/GCC (G++)
* MinGW/Make
You can grab them from Anaconda.
In the command line prompt, run:
.. code:: bash
md build
cd build
cmake -DCMAKE_CXX_FLAGS="-DDMLC_LOG_STACK_TRACE=0 -DTVM_EXPORTS" .. -G "MinGW Makefiles"
mingw32-make
set DGL_LIBRARY_PATH=%CD%
@@ -251,7 +251,7 @@ class Frame(MutableMapping):
        if self.num_rows == 0:
            raise DGLError('Cannot add column "%s" using column schemes because'
                           ' number of rows is unknown. Make sure there is at least'
                           ' one column in the frame so number of rows can be inferred.' % name)
        if self.initializer is None:
            dgl_warning('Initializer is not set. Use zero initializer instead.'
                        ' To suppress this warning, use `set_initializer` to'
...
@@ -905,6 +905,8 @@ class DGLGraph(object):
        Currently, we require the message functions of consecutive send's to
        return the same keys. Otherwise the behavior will be undefined.

        TODO(minjie): document on multiple send behavior

        Parameters
        ----------
        u : optional, node, container or tensor
@@ -1077,6 +1079,10 @@ class DGLGraph(object):
        All node_reprs and edge_reprs support tensor and dictionary types.

        TODO(minjie): document on zero-in-degree case
        TODO(minjie): document on how returned new features are merged with the old features
        TODO(minjie): document on how many times UDFs will be called

        Parameters
        ----------
        u : node, container or tensor
...
"""
.. _tutorial-first:
Your first example in DGL
=========================
TODO: either a pagerank or SSSP example
"""
###############################################################################
# Create a DGLGraph
# -----------------
#
# To start with, let's first import dgl
import dgl
"""
.. _tutorial-graph:
Use DGLGraph
============
**Author**: `Minjie Wang <https://jermainewang.github.io/>`_
In this tutorial, we introduce how to use our graph class -- ``DGLGraph``.
The ``DGLGraph`` is the very core data structure in our library. It provides the basic
interfaces to manipulate graph structure, set/get node/edge features and convert
from/to many other graph formats. You can also perform computation on the graph
using our message passing APIs (see :ref:`tutorial-mp`).
"""
###############################################################################
# Construct a graph
# -----------------
#
# In ``DGLGraph``, all nodes are represented using consecutive integers starting from
# zero. All edges are directed. Let us start by creating a star network of 10 nodes
# where all the edges point to the center node (node#0).
# TODO(minjie): it's better to plot the graph here.
import dgl
star = dgl.DGLGraph()
star.add_nodes(10) # add 10 nodes
for i in range(1, 10):
star.add_edge(i, 0)
print('#Nodes:', star.number_of_nodes())
print('#Edges:', star.number_of_edges())
###############################################################################
# ``DGLGraph`` also supports adding multiple edges at once by providing multiple
# source and destination nodes. Multiple nodes are represented using either a
# list or a 1D integer tensor (vector). In addition to this, we also support
# "edge broadcasting":
#
# .. _note-edge-broadcast:
#
# .. note::
#
#    Given source and destination node lists/tensors ``u`` and ``v``:
#
# - If ``len(u) == len(v)``, then this is a many-many edge set and
# each edge is represented by ``(u[i], v[i])``.
# - If ``len(u) == 1``, then this is a one-many edge set.
# - If ``len(v) == 1``, then this is a many-one edge set.
#
# Edge broadcasting is supported in many APIs whenever a bunch of edges need
# to be specified. The example below creates the same star graph as the previous one.
star.clear() # clear the previous graph
star.add_nodes(10)
u = list(range(1, 10)) # can also use tensor type here (e.g. torch.Tensor)
star.add_edges(u, 0) # many-one edge set
print('#Nodes:', star.number_of_nodes())
print('#Edges:', star.number_of_edges())
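The broadcasting rules in the note above can be sketched as a small framework-free Python helper (an illustration of the semantics only; ``broadcast_edges`` is a hypothetical name for this tutorial, not a DGL API):

```python
def broadcast_edges(u, v):
    """Expand source/destination lists into explicit (u[i], v[i]) pairs
    following the edge-broadcasting rules described in the note above."""
    if len(u) == len(v):   # many-many: pair element-wise
        return list(zip(u, v))
    if len(u) == 1:        # one-many: one source fans out to every destination
        return [(u[0], dst) for dst in v]
    if len(v) == 1:        # many-one: every source points to one destination
        return [(src, v[0]) for src in u]
    raise ValueError('lengths of u and v must match, or one of them must be 1')

# many-one: the same star-graph edges as in the example above
print(broadcast_edges(list(range(1, 10)), [0])[:3])  # [(1, 0), (2, 0), (3, 0)]
```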
###############################################################################
# In ``DGLGraph``, each edge is assigned an internal edge id (also a consecutive
# integer starting from zero). The ids follow the addition order of the edges
# and you can query the id using the ``edge_ids`` interface.
print(star.edge_ids(1, 0)) # the first edge
print(star.edge_ids([8, 9], 0)) # ask for ids of multiple edges
###############################################################################
# Assigning consecutive integer ids for nodes and edges makes it easier to batch
# their features together (see next section). As a result, removing nodes or edges
# of a ``DGLGraph`` is currently not supported because this will break the assumption
# that the ids form a consecutive range from zero.
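As a quick illustration of why the consecutive-id invariant matters, here is a minimal framework-free sketch (plain Python lists standing in for tensors; not DGL's storage code):

```python
# With consecutive node ids 0..N-1, all node features can live in a single
# table whose i-th row belongs to node i, so fetching a batch of nodes is
# plain row indexing. Removing a node would leave a hole (or shift later
# rows), breaking the id == row-index invariant.
N, D = 10, 3
feats = [[0.0] * D for _ in range(N)]  # one length-D feature row per node
batch = [0, 2, 4]
picked = [feats[i] for i in batch]     # gather a batch of node features
print(len(picked), len(picked[0]))     # 3 3
```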
###############################################################################
# Node and edge features
# ----------------------
# Nodes and edges can have feature data in tensor type. They can be accessed/updated
# through a key-value storage interface. The key must be hashable. The value should
# be features of each node and edge batched on the *first* dimension. For example,
# the following code creates features for all nodes (``hv``) and for all
# edges (``he``). Each feature is a vector of length 3.
#
# .. note::
#
#    The first dimension is usually reserved as the batch dimension in DGL. Thus, even
#    when setting only one node/edge, the feature tensor still needs an extra dimension (of length one).
import torch as th
D = 3 # the feature dimension
N = star.number_of_nodes()
M = star.number_of_edges()
nfeat = th.randn((N, D)) # some random node features
efeat = th.randn((M, D)) # some random edge features
# TODO(minjie): enable following syntax
# star.nodes[:]['hv'] = nfeat
# star.edges[:]['he'] = efeat
star.set_n_repr({'hv' : nfeat})
star.set_e_repr({'he' : efeat})
###############################################################################
# We can then set some nodes' features to be zero.
# TODO(minjie): enable following syntax
# print(star.nodes[:]['hv'])
print(star.get_n_repr()['hv'])
# set node 0, 2, 4 feature to zero
star.set_n_repr({'hv' : th.zeros((3, D))}, [0, 2, 4])
print(star.get_n_repr()['hv'])
###############################################################################
# Once created, each node/edge feature is associated with a *scheme* containing
# the shape and dtype information of the feature tensor. Updating features using
# data of a different scheme will raise an error, unless all the features are
# updated, in which case the scheme is replaced with the new one.
print(star.node_attr_schemes())
# updating features with a different scheme will raise an error
# star.set_n_repr({'hv' : th.zeros((3, 2*D))}, [0, 2, 4])
# updating all the nodes is fine; the old scheme will be replaced
star.set_n_repr({'hv' : th.zeros((N, 2*D))})
print(star.node_attr_schemes())
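The scheme check described above amounts to comparing per-row shape signatures before a partial update; here is a framework-free sketch of the idea (hypothetical helper names, not DGL's code):

```python
def scheme_of(rows):
    """A feature 'scheme' here is just the per-row length (batch dim excluded)."""
    return len(rows[0]) if rows else None

def set_rows(table, rows, idx=None):
    """Partial updates must match the existing scheme; a full update may
    replace the scheme wholesale, mirroring the behavior described above."""
    if idx is None:                  # full update: new scheme replaces the old
        return list(rows)
    if scheme_of(rows) != scheme_of(table):
        raise ValueError('scheme mismatch on partial update')
    for i, r in zip(idx, rows):
        table[i] = r
    return table

table = [[0.0] * 3 for _ in range(10)]            # current scheme: 3
table = set_rows(table, [[0.0] * 6 for _ in range(10)])  # full update is fine
print(scheme_of(table))  # 6
```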
###############################################################################
# If a new feature is added for some but not all of the nodes/edges, we will
# automatically create empty features for the others to make sure that features are
# always aligned. By default, we fill zero for the empty features. The behavior
# can be changed using ``set_n_initializer`` and ``set_e_initializer``.
star.set_n_repr({'hv_1' : th.randn((3, D+1))}, [0, 2, 4])
print(star.node_attr_schemes())
print(star.get_n_repr()['hv_1'])
###############################################################################
# Convert from/to other formats
# -----------------------------
# DGLGraph can be easily converted from/to ``networkx`` graph.
import networkx as nx
# note that networkx creates undirected graphs by default, so when converting
# to DGLGraph, directed edges in both directions will be added.
nx_star = nx.star_graph(9)
star = dgl.DGLGraph(nx_star)
print('#Nodes:', star.number_of_nodes())
print('#Edges:', star.number_of_edges())
###############################################################################
# Node and edge attributes can be automatically batched when converting from a
# ``networkx`` graph. Since a ``networkx`` graph by default does not record the
# order in which edges were added, we use the ``"id"`` edge attribute as a hint
# if available.
for i in range(10):
nx_star.nodes[i]['feat'] = th.randn((D,))
star = dgl.DGLGraph()
star.from_networkx(nx_star, node_attrs=['feat']) # auto-batch specified node features
print(star.get_n_repr()['feat'])
###############################################################################
# Multi-edge graph
# ----------------
# Many applications work on graphs containing multi-edges. To enable
# this, construct ``DGLGraph`` with ``multigraph=True``.
g = dgl.DGLGraph(multigraph=True)
g.add_nodes(5)
g.add_edge(0, 1)
g.add_edge(1, 2)
g.add_edge(0, 1)
print('#Nodes:', g.number_of_nodes())
print('#Edges:', g.number_of_edges())
# init random edge features
M = g.number_of_edges()
g.set_e_repr({'he' : th.randn((M, D))})
###############################################################################
# Because an edge in a multigraph cannot be uniquely identified by its incident
# nodes ``u`` and ``v``, you need to use edge ids to access edge features. The
# edge ids can be queried using the ``edge_id`` interface.
eid_01 = g.edge_id(0, 1)
print(eid_01)
###############################################################################
# We can then use the edge id to set/get the features of the corresponding edge.
g.set_e_repr_by_id({'he' : th.ones(len(eid_01), D)}, eid=eid_01)
print(g.get_e_repr()['he'])
"""
.. _tutorial-mp:
Message passing on graph
========================
**Author**: `Minjie Wang <https://jermainewang.github.io/>`_
Many of the graph-based deep neural networks are based on *"message passing"* --
nodes compute messages that are sent to others and the features are updated
using the messages. In this tutorial, we introduce the basic mechanism of message
passing in DGL.
"""
###############################################################################
# Let us start by importing DGL and creating the example graph used throughout this
# tutorial. The graph has 10 nodes, with node#0 being the source and node#9 the
# sink. The source node (node#0) connects to all other nodes besides the sink
# node. Similarly, the sink node is connected from all other nodes besides the
# source node. We also initialize the feature vector of the source node to be
# all ones, while the others have features of all zeros.
# The code to create such a graph is as follows (using PyTorch syntax):
import dgl
import torch as th
g = dgl.DGLGraph()
g.add_nodes(10)
g.add_edges(0, list(range(1, 9)))
g.add_edges(list(range(1, 9)), 9)
# TODO(minjie): plot the graph here.
N = g.number_of_nodes()
M = g.number_of_edges()
print('#Nodes:', N)
print('#Edges:', M)
# initialize the node features
D = 1 # feature size
g.set_n_repr({'feat' : th.zeros((N, D))})
g.set_n_repr({'feat' : th.ones((1, D))}, 0)
print(g.get_n_repr()['feat'])
###############################################################################
# User-defined functions and high-level APIs
# ------------------------------------------
#
# There are two core components in DGL's message passing programming model:
#
# * **User-defined functions (UDFs)** that specify how messages are computed and consumed.
# * **High-level APIs** that specify who sends messages to whom and which nodes are updated.
#
# For example, one simple user-defined message function can be as follows:
def send_source(src, edge):
return {'msg' : src['feat']}
###############################################################################
# The above function computes the messages over **a batch of edges**.
# It has two arguments: ``src`` for the source node features and
# ``edge`` for the edge features, and it returns the computed messages. Both
# arguments and the return value are dictionaries mapping feature/message names
# to tensor values. We can trigger this function using our ``send`` API:
g.send(0, 1, message_func=send_source)
###############################################################################
# Here, the message is computed using the feature of node#0. The result message
# (on 0->1) is not returned but directly saved in ``DGLGraph`` for the later
# receive phase.
#
# You can send multiple messages at once using the
# :ref:`multi-edge semantics <note-edge-broadcast>`.
# In that case, the source node and edge features are batched on the first dimension.
# You can simply print the shape of the feature tensor in your message
# function:
def send_source_print(src, edge):
print('src feat shape:', src['feat'].shape)
return {'msg' : src['feat']}
g.send(0, [4, 5, 6], message_func=send_source_print)
###############################################################################
# To receive and aggregate in-coming messages, users can define a reduce function
# that operates on **a batch of nodes**.
def simple_reduce(node, msgs):
return {'feat' : th.sum(msgs['msg'], dim=1)}
###############################################################################
# The reduce function has two arguments: ``node`` for the node features and
# ``msgs`` for the in-coming messages. It returns the updated node features.
# The function can be triggered using the ``recv`` API. Again, DGL supports
# receiving messages for multiple nodes at the same time, in which case the
# node features are batched on the first dimension. Because each node can
# receive a different number of in-coming messages, we divide the receiving
# nodes into buckets based on how many messages they receive. As a result,
# the message tensor has at least three dimensions (B, n, D), where the second
# dimension stacks all the messages of each node together. This also means
# the reduce UDF is called once per bucket. You can simply print out
# the shape of the message tensor as follows:
def simple_reduce_print(node, msgs):
print('msg shape:', msgs['msg'].shape)
return {'feat' : th.sum(msgs['msg'], dim=1)}
g.recv([1, 4, 5, 6], reduce_func=simple_reduce_print)
print(g.get_n_repr()['feat'])
###############################################################################
# You can see that, after send and recv, the value of node#0 has been propagated
# to nodes 1, 4, 5 and 6.
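The bucketing behavior can be made concrete with a small framework-free sketch (``bucket_by_degree`` is a hypothetical helper for illustration, not DGL's implementation):

```python
from collections import defaultdict

def bucket_by_degree(mailbox):
    """Group receiving nodes by their number of in-coming messages,
    mimicking the bucketing strategy described above.
    mailbox maps node id -> list of messages."""
    buckets = defaultdict(list)
    for node, msgs in mailbox.items():
        buckets[len(msgs)].append(node)
    return dict(buckets)

# Nodes 1, 4, 5 each received one message; pretend node 9 received three.
mailbox = {1: [1.0], 4: [1.0], 5: [1.0], 9: [1.0, 2.0, 3.0]}
buckets = bucket_by_degree(mailbox)
print(buckets)  # {1: [1, 4, 5], 3: [9]}
# The reduce UDF would be called once per bucket: here twice, with message
# batches of shape (3, 1, D) and (1, 3, D) respectively.
```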
###############################################################################
# DGL message passing APIs
# ------------------------
#
# TODO(minjie): enable backreference for all the mentioned APIs below.
#
# In DGL, we categorize the message passing APIs into three levels. All of them
# can be configured using UDFs such as the message and reduce functions.
#
# **Level-1 routines:** APIs that trigger computation on either a batch of nodes
# or a batch of edges. This includes:
#
# * ``send(u, v)`` and ``recv(v)``
# * ``update_edge(u, v)``: This updates the edge features using the current edge
# features and the source and destination node features.
# * ``apply_nodes(v)``: This transforms the node features using the current node
# features.
# * ``apply_edges(u, v)``: This transforms the edge features using the current edge
# features.
###############################################################################
# **Level-2 routines:** APIs that combine several level-1 routines.
#
# * ``send_and_recv(u, v)``: This first computes messages over u->v, then reduces
# them on v. An optional node apply function can be provided.
# * ``pull(v)``: This computes the messages over all the in-edges of v, then reduces
# them on v. An optional node apply function can be provided.
# * ``push(v)``: This computes the messages over all the out-edges of v, then
# reduces them on the successors. An optional node apply function can be provided.
# * ``update_all()``: Sends out and reduces messages on every node. An optional node
# apply function can be provided.
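To make these semantics concrete, here is a minimal plain-Python sketch of ``pull`` and ``update_all`` over an adjacency map (an illustration only: plain numbers stand in for tensors, the message UDF is simplified to take only the source feature, and none of this is DGL's implementation):

```python
def pull(feat, in_edges, v, message_func, reduce_func):
    """Compute messages over all in-edges of each node in v, then reduce them.
    feat maps node id -> feature; in_edges maps node id -> list of sources."""
    new_feat = dict(feat)
    for dst in v:
        msgs = [message_func(feat[src]) for src in in_edges.get(dst, [])]
        if msgs:  # nodes with no in-coming messages keep their old feature
            new_feat[dst] = reduce_func(feat[dst], msgs)
    return new_feat

def update_all(feat, in_edges, message_func, reduce_func):
    """Send on every edge and reduce on every node: a pull on all nodes."""
    return pull(feat, in_edges, list(feat), message_func, reduce_func)

# Same star-graph idea as earlier: nodes 1..9 all point to node 0.
in_edges = {0: list(range(1, 10))}
feat = {i: 1.0 for i in range(10)}
feat = update_all(feat, in_edges,
                  message_func=lambda src: src,             # copy source feature
                  reduce_func=lambda old, msgs: sum(msgs))  # sum the messages
print(feat[0])  # node 0 aggregates 9 messages of 1.0 each -> 9.0
```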
#
# The following example uses ``send_and_recv`` to continue propagating the signal to the
# sink node#9:
g.send_and_recv([1, 4, 5, 6], 9, message_func=send_source, reduce_func=simple_reduce)
print(g.get_n_repr()['feat'])
###############################################################################
# **Level-3 routines:** APIs that call multiple level-2 routines.
#
# * ``propagate()``: TBD after Yu's traversal PR.
###############################################################################
# Builtin functions
# -----------------
#
# Since many message and reduce UDFs are very common (such as sending source
# node features as the message and aggregating messages using summation), DGL
# actually provides builtin functions that can be directly used:
import dgl.function as fn
g.send_and_recv(0, [2, 3], fn.copy_src(src='feat', out='msg'), fn.sum(msg='msg', out='feat'))
print(g.get_n_repr()['feat'])
###############################################################################
# TODO(minjie): document on multiple builtin function syntax after Lingfan
# finished his change.
###############################################################################
# Using builtin functions not only saves you time writing code, but also
# allows DGL to automatically use more efficient implementations. To see this,
# you can continue to our tutorial on Graph Convolutional Networks.
# TODO(minjie): need a hyperref to the GCN tutorial here.
Tutorials
=========
DGL tutorials and examples