Unverified Commit 20469802 authored by Minjie Wang, committed by GitHub

[DOC] API tutorials (#96)

* setup sphinx-gallery; work on graph tutorial

* draft dglgraph tutorial

* update readme to include document url

* rm obsolete file

* Draft the message passing tutorial
parent 3de20385
@@ -2,62 +2,11 @@
[![Build Status](http://216.165.71.225:8080/buildStatus/icon?job=DGL/master)](http://216.165.71.225:8080/job/DGL/job/master/)
[![GitHub license](https://dmlc.github.io/img/apache2.svg)](./LICENSE)
## Architecture
Shown below are three sets of APIs for different models.
- `update_all`, `propagate` are the most global
- `update_by_edge`, `update_to` and `update_from` give finer control when updates are applied to a path or a group of nodes
- `sendto` and `recvfrom` are the bottom-level primitives that send a message and update a node.
![Screenshot](graph-api.png)
For how to install and use DGL, please read our
[Documentation](http://216.165.71.225:23232/index.html)
## For Model developers
- Always choose the API at the *highest* possible level.
- Refer to the [GCN example](examples/pytorch/gcn/gcn_batch.py) to see how to register message and node update functions.
## How to build (the `cpp` branch)
Before building, make sure that the submodules are cloned. If you haven't initialized the submodules, run
```sh
$ git submodule init
```
To sync the submodules, run
```sh
$ git submodule update
```
### Linux
At the root directory of the repo:
```sh
$ mkdir build
$ cd build
$ cmake ..
$ make
$ export DGL_LIBRARY_PATH=$PWD
```
The `DGL_LIBRARY_PATH` environment variable should point to the library `libdgl.so` built by CMake.
### Windows/MinGW (Experimental)
Make sure you have the following installed:
* CMake
* MinGW/GCC (G++)
* MinGW/Make
You can grab them from Anaconda.
At the command prompt, run:
```
> md build
> cd build
> cmake -DCMAKE_CXX_FLAGS="-DDMLC_LOG_STACK_TRACE=0 -DTVM_EXPORTS" .. -G "MinGW Makefiles"
> mingw32-make
> set DGL_LIBRARY_PATH=%CD%
```
## Contribution rules
No direct pushes to the master branch. All changes must be submitted as pull requests
and reviewed before being merged into the master branch.
build
# tutorials are auto-generated
tutorials
DGL documentation and tutorial folder
=====================================
To build,
```
make html
```
and then open the page `build/html/index.html` in a browser.
@@ -45,6 +45,8 @@ extensions = [
'sphinx.ext.mathjax',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'sphinx.ext.intersphinx',
'sphinx_gallery.gen_gallery',
]
# Add any paths that contain templates here, relative to this directory.
@@ -133,7 +135,7 @@ latex_elements = {
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'dgl.tex', 'dgl Documentation',
(master_doc, 'dgl.tex', 'DGL Documentation',
'DGL Team', 'manual'),
]
@@ -143,7 +145,7 @@ latex_documents = [
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'dgl', 'dgl Documentation',
(master_doc, 'dgl', 'DGL Documentation',
[author], 1)
]
@@ -154,8 +156,8 @@ man_pages = [
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'dgl', 'dgl Documentation',
author, 'dgl', 'One line description of project.',
(master_doc, 'dgl', 'DGL Documentation',
author, 'dgl', 'Library for deep learning on graphs.',
'Miscellaneous'),
]
@@ -179,3 +181,16 @@ epub_exclude_files = ['search.html']
# -- Extension configuration -------------------------------------------------
# sphinx gallery configurations
from sphinx_gallery.sorting import FileNameSortKey
examples_dirs = ['../../tutorials'] # path to find sources
gallery_dirs = ['tutorials'] # path to generate docs
sphinx_gallery_conf = {
'examples_dirs' : examples_dirs,
'gallery_dirs' : gallery_dirs,
'within_subsection_order' : FileNameSortKey,
'filename_pattern' : '.py',
}
@@ -72,3 +72,23 @@ built above. Use following command to test whether the installation is successfu
Install from docker
-------------------
TBD
Install on Windows/MinGW
------------------------
Make sure you have the following installed:
* CMake
* MinGW/GCC (G++)
* MinGW/Make
You can grab them from Anaconda.
At the command prompt, run:
.. code:: bat
md build
cd build
cmake -DCMAKE_CXX_FLAGS="-DDMLC_LOG_STACK_TRACE=0 -DTVM_EXPORTS" .. -G "MinGW Makefiles"
mingw32-make
set DGL_LIBRARY_PATH=%CD%
@@ -251,7 +251,7 @@ class Frame(MutableMapping):
if self.num_rows == 0:
raise DGLError('Cannot add column "%s" using column schemes because'
' number of rows is unknown. Make sure there is at least'
' one column in the frame so number of rows can be inferred.')
' one column in the frame so number of rows can be inferred.' % name)
if self.initializer is None:
dgl_warning('Initializer is not set. Use zero initializer instead.'
' To suppress this warning, use `set_initializer` to'
......
@@ -905,6 +905,8 @@ class DGLGraph(object):
Currently, we require the message functions of consecutive ``send`` calls to
return the same keys; otherwise, the behavior is undefined.
TODO(minjie): document on multiple send behavior
Parameters
----------
u : optional, node, container or tensor
@@ -1077,6 +1079,10 @@ class DGLGraph(object):
All node_reprs and edge_reprs support tensor and dictionary types.
TODO(minjie): document on zero-in-degree case
TODO(minjie): document on how returned new features are merged with the old features
TODO(minjie): document on how many times UDFs will be called
Parameters
----------
u : node, container or tensor
......
"""
.. _tutorial-first:
Your first example in DGL
=========================
TODO: either a pagerank or SSSP example
"""
###############################################################################
# Create a DGLGraph
# -----------------
#
# To start, let's first import dgl:
import dgl
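###############################################################################
# The full PageRank/SSSP example is still TODO. As a minimal sketch using only
# the APIs covered in :ref:`tutorial-graph`, we can already build a small graph:
g = dgl.DGLGraph()
g.add_nodes(4)             # four nodes: 0, 1, 2, 3
g.add_edges([0, 1, 2], 3)  # many-to-one edge broadcasting: 0->3, 1->3, 2->3
print('#Nodes:', g.number_of_nodes())
print('#Edges:', g.number_of_edges())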
"""
.. _tutorial-graph:
Use DGLGraph
============
**Author**: `Minjie Wang <https://jermainewang.github.io/>`_
In this tutorial, we introduce how to use our graph class -- ``DGLGraph``.
``DGLGraph`` is the core data structure of our library. It provides the basic
interfaces to manipulate graph structure, set/get node/edge features, and convert
to/from many other graph formats. You can also perform computation on the graph
using our message passing APIs (see :ref:`tutorial-mp`).
"""
###############################################################################
# Construct a graph
# -----------------
#
# In ``DGLGraph``, all nodes are represented using consecutive integers starting from
# zero. All edges are directed. Let us start by creating a star network of 10 nodes
# where all the edges point to the center node (node#0).
# TODO(minjie): it's better to plot the graph here.
import dgl
star = dgl.DGLGraph()
star.add_nodes(10) # add 10 nodes
for i in range(1, 10):
star.add_edge(i, 0)
print('#Nodes:', star.number_of_nodes())
print('#Edges:', star.number_of_edges())
###############################################################################
# ``DGLGraph`` also supports adding multiple edges at once by providing multiple
# source and destination nodes. Multiple nodes are represented using either a
# list or a 1D integer tensor (vector). In addition, we also support
# "edge broadcasting":
#
# .. _note-edge-broadcast:
#
# .. note::
#
# Given a source node list/tensor ``u`` and a destination node list/tensor ``v``:
#
# - If ``len(u) == len(v)``, then this is a many-to-many edge set and
#   each edge is represented by ``(u[i], v[i])``.
# - If ``len(u) == 1``, then this is a one-to-many edge set.
# - If ``len(v) == 1``, then this is a many-to-one edge set.
#
# Edge broadcasting is supported by many APIs whenever a set of edges needs
# to be specified. The example below creates the same star graph as the previous one.
star.clear() # clear the previous graph
star.add_nodes(10)
u = list(range(1, 10)) # can also use tensor type here (e.g. torch.Tensor)
star.add_edges(u, 0) # many-to-one edge set
print('#Nodes:', star.number_of_nodes())
print('#Edges:', star.number_of_edges())
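###############################################################################
# For completeness, here is a short sketch of the many-to-many case of the same
# broadcasting rule, where each edge is ``(u[i], v[i])``:
chain = dgl.DGLGraph()
chain.add_nodes(4)
chain.add_edges([0, 1, 2], [1, 2, 3])  # many-to-many: edges 0->1, 1->2, 2->3
print('#Edges:', chain.number_of_edges())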
###############################################################################
# In ``DGLGraph``, each edge is assigned an internal edge id (also a consecutive
# integer starting from zero). Edge ids follow the order in which edges are added,
# and you can query them using the ``edge_ids`` interface.
print(star.edge_ids(1, 0)) # the first edge
print(star.edge_ids([8, 9], 0)) # ask for ids of multiple edges
###############################################################################
# Assigning consecutive integer ids to nodes and edges makes it easier to batch
# their features together (see the next section). As a result, removing nodes or
# edges from a ``DGLGraph`` is currently not supported, because this would break
# the assumption that the ids form a consecutive range starting from zero.
###############################################################################
# Node and edge features
# ----------------------
# Nodes and edges can have feature data of tensor type. They can be accessed/updated
# through a key-value storage interface. The key must be hashable. The value should
# be the features of all nodes or edges, batched on the *first* dimension. For example,
# the following code creates features for all nodes (``hv``) and features for all
# edges (``he``). Each feature is a vector of length 3.
#
# .. note::
#
# The first dimension is usually reserved as the batch dimension in DGL. Thus, even
# when setting features of a single node/edge, the tensor still needs an extra first
# dimension (of length one).
import torch as th
D = 3 # the feature dimension
N = star.number_of_nodes()
M = star.number_of_edges()
nfeat = th.randn((N, D)) # some random node features
efeat = th.randn((M, D)) # some random edge features
# TODO(minjie): enable following syntax
# star.nodes[:]['hv'] = nfeat
# star.edges[:]['he'] = efeat
star.set_n_repr({'hv' : nfeat})
star.set_e_repr({'he' : efeat})
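###############################################################################
# For example, setting the feature of a single node still requires a 2D tensor
# with a leading dimension of one:
star.set_n_repr({'hv' : th.randn((1, D))}, 0)  # update the feature of node 0 only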
###############################################################################
# We can then set some nodes' features to be zero.
# TODO(minjie): enable following syntax
# print(star.nodes[:]['hv'])
print(star.get_n_repr()['hv'])
# set the features of nodes 0, 2 and 4 to zero
star.set_n_repr({'hv' : th.zeros((3, D))}, [0, 2, 4])
print(star.get_n_repr()['hv'])
###############################################################################
# Once created, each node/edge feature is associated with a *scheme* containing
# the shape and dtype information of the feature tensor. Updating features with data
# of a different scheme raises an error, unless all the features are updated at once,
# in which case the scheme is replaced with the new one.
print(star.node_attr_schemes())
# updating features with a different scheme would raise an error:
# star.set_n_repr({'hv' : th.zeros((3, 2*D))}, [0, 2, 4])
# updating all the nodes is fine; the old scheme is replaced
star.set_n_repr({'hv' : th.zeros((N, 2*D))})
print(star.node_attr_schemes())
###############################################################################
# If a new feature is added for some but not all of the nodes/edges, empty features
# are automatically created for the others so that features are always aligned. By
# default, the empty features are filled with zeros. This behavior can be changed
# using ``set_n_initializer`` and ``set_e_initializer``; see the sketch below.
star.set_n_repr({'hv_1' : th.randn((3, D+1))}, [0, 2, 4])
print(star.node_attr_schemes())
print(star.get_n_repr()['hv_1'])
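###############################################################################
# A hedged sketch of a custom initializer. The exact signature expected by
# ``set_n_initializer`` is an assumption here (a callable mapping a shape and
# dtype to a tensor), so the registering call is left commented out:
def ones_initializer(shape, dtype):
    # fill empty features with ones instead of zeros
    return th.ones(shape, dtype=dtype)
# star.set_n_initializer(ones_initializer)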
###############################################################################
# Convert from/to other formats
# -----------------------------
# A ``DGLGraph`` can be easily converted from/to a ``networkx`` graph.
import networkx as nx
# note that networkx creates undirected graphs by default, so when converting
# to DGLGraph, directed edges in both directions will be added.
nx_star = nx.star_graph(9)
star = dgl.DGLGraph(nx_star)
print('#Nodes:', star.number_of_nodes())
print('#Edges:', star.number_of_edges())
###############################################################################
# Node and edge attributes can be automatically batched when converting from a
# ``networkx`` graph. Since a ``networkx`` graph by default does not record which
# edge was added first, we use the ``"id"`` edge attribute as a hint
# if it is available.
for i in range(10):
nx_star.nodes[i]['feat'] = th.randn((D,))
star = dgl.DGLGraph()
star.from_networkx(nx_star, node_attrs=['feat']) # auto-batch specified node features
print(star.get_n_repr()['feat'])
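###############################################################################
# Edge attributes can presumably be batched the same way. The ``edge_attrs``
# argument below mirrors ``node_attrs`` but is an assumption, so the call is
# left commented out:
# star.from_networkx(nx_star, edge_attrs=['w'])  # 'w' is a hypothetical attribute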
###############################################################################
# Multi-edge graph
# ----------------
# Many applications work on graphs containing multi-edges. To enable
# this, construct the ``DGLGraph`` with ``multigraph=True``.
g = dgl.DGLGraph(multigraph=True)
g.add_nodes(5)
g.add_edge(0, 1)
g.add_edge(1, 2)
g.add_edge(0, 1)
print('#Nodes:', g.number_of_nodes())
print('#Edges:', g.number_of_edges())
# init random edge features
M = g.number_of_edges()
g.set_e_repr({'he' : th.randn((M, D))})
###############################################################################
# Because an edge in a multigraph cannot be uniquely identified by its incident
# nodes ``u`` and ``v``, you need to use edge ids to access edge features. Edge
# ids can be queried with the ``edge_id`` interface.
eid_01 = g.edge_id(0, 1)
print(eid_01)
###############################################################################
# We can then use the edge id to set/get the features of the corresponding edge.
g.set_e_repr_by_id({'he' : th.ones(len(eid_01), D)}, eid=eid_01)
print(g.get_e_repr()['he'])
"""
.. _tutorial-mp:
Message passing on graph
========================
**Author**: `Minjie Wang <https://jermainewang.github.io/>`_
Many graph-based deep neural networks are based on *message passing*:
nodes compute messages that are sent to other nodes, and node features are
updated using the received messages. In this tutorial, we introduce the basic
mechanism of message passing in DGL.
"""
###############################################################################
# Let us start by importing DGL and creating the example graph used throughout this
# tutorial. The graph has 10 nodes, with node#0 as the source and node#9 as the
# sink. The source node (node#0) connects to all other nodes except the sink
# node. Similarly, the sink node is connected from all other nodes except the
# source node. We also initialize the feature vector of the source node to all
# ones, while the others have features of all zeros.
# The code to create such a graph is as follows (using PyTorch syntax):
import dgl
import torch as th
g = dgl.DGLGraph()
g.add_nodes(10)
g.add_edges(0, list(range(1, 9)))
g.add_edges(list(range(1, 9)), 9)
# TODO(minjie): plot the graph here.
N = g.number_of_nodes()
M = g.number_of_edges()
print('#Nodes:', N)
print('#Edges:', M)
# initialize the node features
D = 1 # feature size
g.set_n_repr({'feat' : th.zeros((N, D))})
g.set_n_repr({'feat' : th.ones((1, D))}, 0)
print(g.get_n_repr()['feat'])
###############################################################################
# User-defined functions and high-level APIs
# ------------------------------------------
#
# There are two core components in DGL's message passing programming model:
#
# * **User-defined functions (UDFs)** that specify how messages are computed and used.
# * **High-level APIs** that specify which nodes send messages to which, and which
#   nodes are updated.
#
# For example, one simple user-defined message function can be as follows:
def send_source(src, edge):
return {'msg' : src['feat']}
###############################################################################
# The above function computes the messages over **a batch of edges**.
# It has two arguments: ``src`` for the source node features and
# ``edge`` for the edge features, and it returns the computed messages. The arguments
# and return value are dictionaries mapping feature/message names to tensor values.
# We can trigger this function using our ``send`` API:
g.send(0, 1, message_func=send_source)
###############################################################################
# Here, the message is computed using the feature of node#0. The resulting message
# (on 0->1) is not returned but stored directly in ``DGLGraph`` for the later
# receive phase.
#
# You can send multiple messages at once using the
# :ref:`edge broadcasting semantics <note-edge-broadcast>`.
# In that case, the source node and edge features are batched on the first dimension.
# You can simply print out the shape of the feature tensor in your message
# function.
def send_source_print(src, edge):
print('src feat shape:', src['feat'].shape)
return {'msg' : src['feat']}
g.send(0, [4, 5, 6], message_func=send_source_print)
###############################################################################
# To receive and aggregate in-coming messages, users can define a reduce function
# that operates on **a batch of nodes**.
def simple_reduce(node, msgs):
return {'feat' : th.sum(msgs['msg'], dim=1)}
###############################################################################
# The reduce function has two arguments: ``node`` for the node features and
# ``msgs`` for the in-coming messages. It returns the updated node features.
# The function can be triggered using the ``recv`` API. Again, DGL supports
# receiving messages for multiple nodes at the same time. In that case, the
# node features are batched on the first dimension. Because each node can
# receive a different number of in-coming messages, we divide the receiving
# nodes into buckets based on the number of messages they receive. As a result,
# the message tensor has at least three dimensions (B, n, D), where the second
# dimension stacks the n messages of each node together. This also means
# the reduce UDF will be called once per bucket. You can simply print out
# the shape of the message tensor as follows:
def simple_reduce_print(node, msgs):
print('msg shape:', msgs['msg'].shape)
return {'feat' : th.sum(msgs['msg'], dim=1)}
g.recv([1, 4, 5, 6], reduce_func=simple_reduce_print)
print(g.get_n_repr()['feat'])
###############################################################################
# You can see that, after send and recv, the value of node#0 has been propagated
# to nodes 1, 4, 5 and 6.
###############################################################################
# DGL message passing APIs
# ------------------------
#
# TODO(minjie): enable backreference for all the mentioned APIs below.
#
# In DGL, we categorize the message passing APIs into three levels. All of them
# can be configured using UDFs such as the message and reduce functions.
#
# **Level-1 routines:** APIs that trigger computation on either a batch of nodes
# or a batch of edges. This includes:
#
# * ``send(u, v)`` and ``recv(v)``
# * ``update_edge(u, v)``: This updates the edge features using the current edge
#   features and the source and destination node features.
# * ``apply_nodes(v)``: This transforms the node features using the current node
#   features (a sketch follows this list).
# * ``apply_edges(u, v)``: This transforms the edge features using the current edge
#   features.
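#
# As a hedged sketch of ``apply_nodes``, the node UDF below follows the same
# dictionary convention as the reduce functions; the keyword arguments of the
# call are an assumption patterned after ``recv``, so it is left commented out:
def double_feat(node):
    # double the feature of the given batch of nodes
    return {'feat' : node['feat'] * 2}
# g.apply_nodes([0], apply_node_func=double_feat)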
###############################################################################
# **Level-2 routines:** APIs that combine several level-1 routines.
#
# * ``send_and_recv(u, v)``: This first computes messages over u->v, then reduces
#   them on v. An optional node apply function can be provided.
# * ``pull(v)``: This computes the messages over all the in-edges of v, then reduces
#   them on v. An optional node apply function can be provided.
# * ``push(v)``: This computes the messages over all the out-edges of v, then
#   reduces them on the successors. An optional node apply function can be provided.
# * ``update_all()``: This sends out and reduces messages on every node; a sketch
#   follows the example below. An optional node apply function can be provided.
#
# The following example uses ``send_and_recv`` to continue propagating the signal to
# the sink node#9:
g.send_and_recv([1, 4, 5, 6], 9, message_func=send_source, reduce_func=simple_reduce)
print(g.get_n_repr()['feat'])
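###############################################################################
# A hedged sketch of ``update_all`` reusing the same UDFs. It is left commented
# out because the zero-in-degree behavior (e.g. for node#0 here) is still
# undocumented (see the TODOs on ``recv``):
# g.update_all(message_func=send_source, reduce_func=simple_reduce)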
###############################################################################
# **Level-3 routines:** APIs that call multiple level-2 routines.
#
# * ``propagate()``: TBD after Yu's traversal PR.
###############################################################################
# Builtin functions
# -----------------
#
# Since many message and reduce UDFs are very common (such as sending the source
# node features as the message and aggregating messages by summation), DGL
# provides builtin functions that can be used directly:
import dgl.function as fn
g.send_and_recv(0, [2, 3], fn.copy_src(src='feat', out='msg'), fn.sum(msg='msg', out='feat'))
print(g.get_n_repr()['feat'])
###############################################################################
# TODO(minjie): document on multiple builtin function syntax after Lingfan
# finished his change.
###############################################################################
# Using builtin functions not only saves you time writing code, but also
# allows DGL to automatically use more efficient implementations. To see this,
# you can continue to our tutorial on Graph Convolutional Networks.
# TODO(minjie): need a hyperref to the GCN tutorial here.
Tutorials
=========
TBD: Get started on DGL
DGL tutorials and examples