"...pipelines/controlnet/pipeline_flax_controlnet.py" did not exist on "b562b6611fb53dae9bcffcaaf44d944539ae22de"
Unverified Commit 8801154b authored by VoVAllen, committed by GitHub

Merge pull request #1 from jermainewang/cpp

Cpp
parents b46abb09 b2c1c4fa
# CI docker CPU env
# Adapted from github.com/dmlc/tvm/docker/Dockerfile.ci_cpu
FROM ubuntu:16.04
RUN apt-get update --fix-missing
COPY install/ubuntu_install_core.sh /install/ubuntu_install_core.sh
RUN bash /install/ubuntu_install_core.sh
COPY install/ubuntu_install_python.sh /install/ubuntu_install_python.sh
RUN bash /install/ubuntu_install_python.sh
COPY install/ubuntu_install_python_package.sh /install/ubuntu_install_python_package.sh
RUN bash /install/ubuntu_install_python_package.sh
# CI docker GPU env
FROM nvidia/cuda:9.0-cudnn7-devel
# Base scripts
RUN apt-get update --fix-missing
COPY install/ubuntu_install_core.sh /install/ubuntu_install_core.sh
RUN bash /install/ubuntu_install_core.sh
COPY install/ubuntu_install_python.sh /install/ubuntu_install_python.sh
RUN bash /install/ubuntu_install_python.sh
COPY install/ubuntu_install_python_package.sh /install/ubuntu_install_python_package.sh
RUN bash /install/ubuntu_install_python_package.sh
# Environment variables
ENV PATH=/usr/local/nvidia/bin:${PATH}
ENV PATH=/usr/local/cuda/bin:${PATH}
ENV CPLUS_INCLUDE_PATH=/usr/local/cuda/include:${CPLUS_INCLUDE_PATH}
ENV C_INCLUDE_PATH=/usr/local/cuda/include:${C_INCLUDE_PATH}
ENV LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/nvidia/lib64:${LIBRARY_PATH}
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}
## Build docker image for CI
### CPU image
docker build -t dgl-cpu -f Dockerfile.ci_cpu .
### GPU image
docker build -t dgl-gpu -f Dockerfile.ci_gpu .
# install libraries for building c++ core on ubuntu
apt update && apt install -y --no-install-recommends --force-yes \
apt-utils git build-essential make cmake wget unzip sudo libz-dev libxml2-dev
# install python and pip, don't modify this, modify install_python_package.sh
# apt-get update && apt-get install -y python-dev python-pip
# python 3.6
apt-get update && yes | apt-get install software-properties-common
add-apt-repository ppa:jonathonf/python-3.6
apt-get update && apt-get install -y python3.6 python3.6-dev
rm -f /usr/bin/python3 && ln -s /usr/bin/python3.6 /usr/bin/python3
# Install pip
cd /tmp && wget https://bootstrap.pypa.io/get-pip.py
# python2 get-pip.py
python3.6 get-pip.py
# install libraries for python package on ubuntu
# pip2 install pylint numpy cython scipy nltk requests[security]
pip3 install pylint numpy cython scipy nltk requests[security]
# install DL Framework
# pip2 install torch torchvision
pip3 install torch torchvision
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
\ No newline at end of file
dgl.BatchedDGLGraph
-------------------
.. autoclass:: dgl.BatchedDGLGraph
:members:
:show-inheritance:
.. autofunction:: dgl.batch
.. autofunction:: dgl.unbatch
dgl.DGLGraph
------------
.. automodule:: dgl.graph
.. autoclass:: dgl.DGLGraph
:members:
:inherited-members:
Python APIs
===========
.. toctree::
:maxdepth: 2
graph
batch
# -*- coding: utf-8 -*-
#
# Configuration file for the Sphinx documentation builder.
#
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# http://www.sphinx-doc.org/en/master/config
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('../../python'))
# -- Project information -----------------------------------------------------
project = 'DGL'
copyright = '2018, DGL Team'
author = 'DGL Team'
# The short X.Y version
version = '0.0.1'
# The full version, including alpha/beta/rc tags
release = '0.0.1'
# -- General configuration ---------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.coverage',
'sphinx.ext.mathjax',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = ['.rst', '.md']
# The master toctree document.
master_doc = 'index'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = None
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself. Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
#
# html_sidebars = {}
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'dgldoc'
# -- Options for LaTeX output ------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'dgl.tex', 'dgl Documentation',
'DGL Team', 'manual'),
]
# -- Options for manual page output ------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'dgl', 'dgl Documentation',
[author], 1)
]
# -- Options for Texinfo output ----------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'dgl', 'dgl Documentation',
author, 'dgl', 'One line description of project.',
'Miscellaneous'),
]
# -- Options for Epub output -------------------------------------------------
# Bibliographic Dublin Core info.
epub_title = project
# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
#
# epub_identifier = ''
# A unique identification for the text.
#
# epub_uid = ''
# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']
# -- Extension configuration -------------------------------------------------
.. DGL documentation master file, created by
sphinx-quickstart on Fri Oct 5 14:18:01 2018.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to DGL's documentation!
===============================
.. toctree::
:maxdepth: 2
:caption: Contents:
Get Started
-----------
.. toctree::
:maxdepth: 2
install/index
tutorials/index
API Reference
-------------
.. toctree::
:maxdepth: 2
api/python/index
Index
-----
* :ref:`genindex`
Install DGL
============
At this stage, we recommend installing DGL from source. To quickly try out DGL and its demos/tutorials, check out `Install from docker`_.
Get the source code
--------------------
First, clone the source repository from GitHub. Note that you need the ``--recursive`` option to
also clone the submodules.
.. code:: bash
git clone --recursive https://github.com/jermainewang/dgl.git
You can also clone the repository first and then run the following commands:
.. code:: bash
git submodule init
git submodule update
Build shared library
--------------------
Before building the library, please make sure the following dependencies are installed
(using Ubuntu as an example):
.. code:: bash
sudo apt-get update
sudo apt-get install -y python
We use cmake (minimum version 2.8) to build the library.
.. code:: bash
mkdir build
cd build
cmake ..
make -j4
Build python binding
--------------------
DGL's python binding depends on the following packages (tested versions):
* numpy (>= 1.14.0)
* scipy (>= 1.1.0)
* networkx (>= 2.1)
To install them, use the following command:
.. code:: bash
pip install --user numpy scipy networkx
There are several ways to set up DGL's python binding. At the current stage, we recommend that
developers use environment variables to locate the python package and the shared library.
.. code:: bash
export DGL_HOME=/path/to/dgl
export PYTHONPATH=$DGL_HOME/python:${PYTHONPATH}
export DGL_LIBRARY_PATH=$DGL_HOME/build
The ``DGL_LIBRARY_PATH`` variable is used by the python package to locate the shared library
built above. Use the following command to test whether the installation was successful.
.. code:: bash
python -c 'import dgl'
Install from docker
-------------------
TBD
Tutorials
=========
TBD: Get started on DGL
@@ -9,33 +9,35 @@ GAT with batch processing
import argparse
import numpy as np
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import mxnet as mx
from mxnet import gluon
import dgl
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
def elu(data):
return mx.nd.LeakyReLU(data, act_type='elu')
def gat_message(src, edge):
return {'ft' : src['ft'], 'a2' : src['a2']}
class GATReduce(nn.Module):
class GATReduce(gluon.Block):
def __init__(self, attn_drop):
super(GATReduce, self).__init__()
self.attn_drop = attn_drop
def forward(self, node, msgs):
a1 = torch.unsqueeze(node['a1'], 1) # shape (B, 1, 1)
a1 = mx.nd.expand_dims(node['a1'], 1) # shape (B, 1, 1)
a2 = msgs['a2'] # shape (B, deg, 1)
ft = msgs['ft'] # shape (B, deg, D)
# attention
a = a1 + a2 # shape (B, deg, 1)
e = F.softmax(F.leaky_relu(a), dim=1)
e = mx.nd.softmax(mx.nd.LeakyReLU(a))
if self.attn_drop != 0.0:
e = F.dropout(e, self.attn_drop)
return {'accum' : torch.sum(e * ft, dim=1)} # shape (B, D)
e = mx.nd.Dropout(e, self.attn_drop)
return {'accum' : mx.nd.sum(e * ft, axis=1)} # shape (B, D)
class GATFinalize(nn.Module):
class GATFinalize(gluon.Block):
def __init__(self, headid, indim, hiddendim, activation, residual):
super(GATFinalize, self).__init__()
self.headid = headid
@@ -44,7 +46,7 @@ class GATFinalize(nn.Module):
self.residual_fc = None
if residual:
if indim != hiddendim:
self.residual_fc = nn.Linear(indim, hiddendim)
self.residual_fc = gluon.nn.Dense(hiddendim)
def forward(self, node):
ret = node['accum']
@@ -55,24 +57,24 @@ class GATFinalize(nn.Module):
ret = node['h'] + ret
return {'head%d' % self.headid : self.activation(ret)}
class GATPrepare(nn.Module):
class GATPrepare(gluon.Block):
def __init__(self, indim, hiddendim, drop):
super(GATPrepare, self).__init__()
self.fc = nn.Linear(indim, hiddendim)
self.fc = gluon.nn.Dense(hiddendim)
self.drop = drop
self.attn_l = nn.Linear(hiddendim, 1)
self.attn_r = nn.Linear(hiddendim, 1)
self.attn_l = gluon.nn.Dense(1)
self.attn_r = gluon.nn.Dense(1)
def forward(self, feats):
h = feats
if self.drop != 0.0:
h = F.dropout(h, self.drop)
h = mx.nd.Dropout(h, self.drop)
ft = self.fc(h)
a1 = self.attn_l(ft)
a2 = self.attn_r(ft)
return {'h' : h, 'ft' : ft, 'a1' : a1, 'a2' : a2}
class GAT(nn.Module):
class GAT(gluon.Block):
def __init__(self,
g,
num_layers,
@@ -88,27 +90,27 @@ class GAT(nn.Module):
self.g = g
self.num_layers = num_layers
self.num_heads = num_heads
self.prp = nn.ModuleList()
self.red = nn.ModuleList()
self.fnl = nn.ModuleList()
self.prp = gluon.nn.Sequential()
self.red = gluon.nn.Sequential()
self.fnl = gluon.nn.Sequential()
# input projection (no residual)
for hid in range(num_heads):
self.prp.append(GATPrepare(in_dim, num_hidden, in_drop))
self.red.append(GATReduce(attn_drop))
self.fnl.append(GATFinalize(hid, in_dim, num_hidden, activation, False))
self.prp.add(GATPrepare(in_dim, num_hidden, in_drop))
self.red.add(GATReduce(attn_drop))
self.fnl.add(GATFinalize(hid, in_dim, num_hidden, activation, False))
# hidden layers
for l in range(num_layers - 1):
for hid in range(num_heads):
# due to multi-head, the in_dim = num_hidden * num_heads
self.prp.append(GATPrepare(num_hidden * num_heads, num_hidden, in_drop))
self.red.append(GATReduce(attn_drop))
self.fnl.append(GATFinalize(hid, num_hidden * num_heads,
num_hidden, activation, residual))
self.prp.add(GATPrepare(num_hidden * num_heads, num_hidden, in_drop))
self.red.add(GATReduce(attn_drop))
self.fnl.add(GATFinalize(hid, num_hidden * num_heads,
num_hidden, activation, residual))
# output projection
self.prp.append(GATPrepare(num_hidden * num_heads, num_classes, in_drop))
self.red.append(GATReduce(attn_drop))
self.fnl.append(GATFinalize(0, num_hidden * num_heads,
num_classes, activation, residual))
self.prp.add(GATPrepare(num_hidden * num_heads, num_classes, in_drop))
self.red.add(GATReduce(attn_drop))
self.fnl.add(GATFinalize(0, num_hidden * num_heads,
num_classes, activation, residual))
# sanity check
assert len(self.prp) == self.num_layers * self.num_heads + 1
assert len(self.red) == self.num_layers * self.num_heads + 1
@@ -122,23 +124,23 @@ class GAT(nn.Module):
# prepare
self.g.set_n_repr(self.prp[i](last))
# message passing
self.g.update_all(gat_message, self.red[i], self.fnl[i], batchable=True)
self.g.update_all(gat_message, self.red[i], self.fnl[i])
# merge all the heads
last = torch.cat(
[self.g.pop_n_repr('head%d' % hid) for hid in range(self.num_heads)],
last = mx.nd.concat(
*[self.g.pop_n_repr('head%d' % hid) for hid in range(self.num_heads)],
dim=1)
# output projection
self.g.set_n_repr(self.prp[-1](last))
self.g.update_all(gat_message, self.red[-1], self.fnl[-1], batchable=True)
self.g.update_all(gat_message, self.red[-1], self.fnl[-1])
return self.g.pop_n_repr('head0')
def main(args):
# load and preprocess dataset
data = load_data(args)
features = torch.FloatTensor(data.features)
labels = torch.LongTensor(data.labels)
mask = torch.ByteTensor(data.train_mask)
features = mx.nd.array(data.features)
labels = mx.nd.array(data.labels)
mask = mx.nd.array(data.train_mask)
in_feats = features.shape[1]
n_classes = data.num_labels
n_edges = data.graph.number_of_edges()
@@ -162,16 +164,17 @@ def main(args):
args.num_hidden,
n_classes,
args.num_heads,
F.elu,
elu,
args.in_drop,
args.attn_drop,
args.residual)
if cuda:
model.cuda()
model.initialize()
# use optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
trainer = gluon.Trainer(model.collect_params(), 'adam', {'learning_rate': args.lr})
# initialize graph
dur = []
@@ -179,19 +182,18 @@
if epoch >= 3:
t0 = time.time()
# forward
logits = model(features)
logp = F.log_softmax(logits, 1)
loss = F.nll_loss(logp, labels)
with mx.autograd.record():
logits = model(features)
loss = mx.nd.softmax_cross_entropy(logits, labels)
optimizer.zero_grad()
#optimizer.zero_grad()
loss.backward()
optimizer.step()
trainer.step(features.shape[0])
if epoch >= 3:
dur.append(time.time() - t0)
print("Epoch {:05d} | Loss {:.4f} | Time(s) {:.4f} | ETputs(KTEPS) {:.2f}".format(
epoch, loss.item(), np.mean(dur), n_edges / np.mean(dur) / 1000))
print("Epoch {:05d} | Loss {:.4f} | Time(s) {:.4f} | ETputs(KTEPS) {:.2f}".format(
epoch, loss.asnumpy()[0], np.mean(dur), n_edges / np.mean(dur) / 1000))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GAT')
......
Graph Convolutional Networks (GCN)
============
Paper link: [https://arxiv.org/abs/1609.02907](https://arxiv.org/abs/1609.02907)
Author's code repo: [https://github.com/tkipf/gcn](https://github.com/tkipf/gcn)
The folder contains three different implementations using DGL.
Naive GCN (gcn.py)
-------
The model is defined at the finest granularity (i.e., on *one* edge and *one* node).
* The message function `gcn_msg` computes the message for one edge. It simply returns the `h` representation of the source node.
```python
def gcn_msg(src, edge):
# src['h'] is a tensor of shape (D,). D is the feature length.
return src['h']
```
* The reduce function `gcn_reduce` accumulates the incoming messages for one node. The `msgs` argument is a list of all the messages. In GCN, the incoming messages are summed up.
```python
def gcn_reduce(node, msgs):
# msgs is a list of in-coming messages.
return sum(msgs)
```
* The update function `NodeUpdateModule` computes the new node representation `h` by applying a non-linear transformation to the reduced messages.
```python
class NodeUpdateModule(nn.Module):
def __init__(self, in_feats, out_feats, activation=None):
super(NodeUpdateModule, self).__init__()
self.linear = nn.Linear(in_feats, out_feats)
self.activation = activation
def forward(self, node, accum):
# accum is a tensor of shape (D,).
h = self.linear(accum)
if self.activation:
h = self.activation(h)
return {'h' : h}
```
After defining the functions on each node/edge, the message passing is triggered by calling `update_all` on the DGLGraph object (in the GCN module); a sketch of what that can look like is given below.
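For reference, here is a minimal sketch of how the GCN module could wrap this call, reusing `gcn_msg`, `gcn_reduce`, and `NodeUpdateModule` from above. The class body and the `set_n_repr`/`pop_n_repr` calls are assumptions for illustration (they mirror the batched example later in this README), not the exact code in gcn.py:
```python
import torch.nn as nn

class GCN(nn.Module):
    # Hypothetical single-layer wrapper around the per-node/per-edge functions above.
    def __init__(self, g, in_feats, n_hidden, activation=None):
        super(GCN, self).__init__()
        self.g = g  # a dgl.DGLGraph
        self.layer = NodeUpdateModule(in_feats, n_hidden, activation)

    def forward(self, features):
        # store the input features as the node representation 'h'
        self.g.set_n_repr({'h': features})
        # one round of message passing: per-edge message, per-node reduce,
        # then the node update module
        self.g.update_all(gcn_msg, gcn_reduce, self.layer)
        return self.g.pop_n_repr('h')
```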
Batched GCN (gcn_batch.py)
-----------
Defining the model on only one node and edge makes it hard to fully utilize GPUs. As a result, we also allow users to define the model on a *batch of* nodes and edges.
* The message function `gcn_msg` computes the message for a batch of edges. Here, the `src` argument is the batched representation of the source endpoints of the edges. The function simply returns the source node representations.
```python
def gcn_msg(src, edge):
# src is a tensor of shape (B, D). B is the number of edges being batched.
return src
```
* The reduce function `gcn_reduce` also accumulates messages for a batch of nodes. We batch the messages on the second dimension of the `msgs` argument:
```python
def gcn_reduce(node, msgs):
# msgs is a tensor of shape (B, deg, D). B is the number of nodes in the batch;
# deg is the number of messages per node; D is the message feature dimension. DGL guarantees
# that all the nodes in a batch have the same in-degree (through "degree bucketing").
# Reducing along the second dimension is equivalent to summing up all the incoming messages.
return torch.sum(msgs, 1)
```
* The update module is similar. The first dimension of each tensor is the batch dimension. Since PyTorch operations are typically batch-aware, the code is the same as in the naive GCN; a sketch is shown below.
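For completeness, a sketch of the batched update module under that assumption; it is identical to the naive one except that the tensors now carry a leading batch dimension:
```python
import torch.nn as nn

class NodeUpdateModule(nn.Module):
    def __init__(self, in_feats, out_feats, activation=None):
        super(NodeUpdateModule, self).__init__()
        self.linear = nn.Linear(in_feats, out_feats)
        self.activation = activation

    def forward(self, node, accum):
        # accum is now a tensor of shape (B, D): B nodes in the batch,
        # D features per node. nn.Linear operates on the last dimension,
        # so the body is unchanged from the naive version.
        h = self.linear(accum)
        if self.activation:
            h = self.activation(h)
        return {'h': h}
```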
Triggering message passing is also similar. The user needs to set `batchable=True` to indicate that the functions all support batching.
```python
self.g.update_all(gcn_msg, gcn_reduce, layer, batchable=True)
```
Batched GCN with spMV optimization (gcn_spmv.py)
-----------
Batched computation is much more efficient than the naive vertex-centric approach, but it is still not ideal. For example, the batched message function needs to look up source node data and save it on the edges. Such lookups are very common and incur extra memory copies. In fact, the message and reduce phases of the GCN model can be fused into one sparse matrix-vector multiplication (spMV). Therefore, DGL provides many built-in message/reduce functions so that it can recognize such optimization opportunities. In gcn_spmv.py, the user only needs to write the update module and trigger the message passing as follows:
```python
self.g.update_all('from_src', 'sum', layer, batchable=True)
```
Here, `'from_src'` and `'sum'` are the builtin message and reduce functions.
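Putting it together, a hypothetical sketch of how an spMV-based GCN module could drive its layers (the class and attribute names are assumptions for illustration; only the update modules are user-defined, while the message/reduce steps are the builtin strings):
```python
import torch.nn as nn

class GCNSpMV(nn.Module):
    def __init__(self, g, layers):
        super(GCNSpMV, self).__init__()
        self.g = g                           # a dgl.DGLGraph
        self.layers = nn.ModuleList(layers)  # NodeUpdateModule instances

    def forward(self, features):
        self.g.set_n_repr(features)
        for layer in self.layers:
            # 'from_src' copies the source node representation onto the edges,
            # 'sum' accumulates the incoming messages, and `layer` applies the
            # user-defined node update module.
            self.g.update_all('from_src', 'sum', layer, batchable=True)
        return self.g.pop_n_repr()
```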
@@ -8,9 +8,8 @@ GCN with batch processing
import argparse
import numpy as np
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import mxnet as mx
from mxnet import gluon
import dgl
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
@@ -19,21 +18,17 @@ def gcn_msg(src, edge):
return src
def gcn_reduce(node, msgs):
return torch.sum(msgs, 1)
return mx.nd.sum(msgs, 1)
class NodeApplyModule(nn.Module):
def __init__(self, in_feats, out_feats, activation=None):
super(NodeApplyModule, self).__init__()
self.linear = nn.Linear(in_feats, out_feats)
self.activation = activation
class NodeUpdateModule(gluon.Block):
def __init__(self, out_feats, activation=None):
super(NodeUpdateModule, self).__init__()
self.linear = gluon.nn.Dense(out_feats, activation=activation)
def forward(self, node):
h = self.linear(node)
if self.activation:
h = self.activation(h)
return h
return self.linear(node)
class GCN(nn.Module):
class GCN(gluon.Block):
def __init__(self,
g,
in_feats,
@@ -46,12 +41,13 @@ class GCN(nn.Module):
self.g = g
self.dropout = dropout
# input layer
self.layers = nn.ModuleList([NodeApplyModule(in_feats, n_hidden, activation)])
self.layers = gluon.nn.Sequential()
self.layers.add(NodeUpdateModule(n_hidden, activation))
# hidden layers
for i in range(n_layers - 1):
self.layers.append(NodeApplyModule(n_hidden, n_hidden, activation))
self.layers.add(NodeUpdateModule(n_hidden, activation))
# output layer
self.layers.append(NodeApplyModule(n_hidden, n_classes))
self.layers.add(NodeUpdateModule(n_classes))
def forward(self, features):
self.g.set_n_repr(features)
@@ -60,28 +56,29 @@ class GCN(nn.Module):
if self.dropout:
val = F.dropout(self.g.get_n_repr(), p=self.dropout)
self.g.set_n_repr(val)
self.g.update_all(gcn_msg, gcn_reduce, layer, batchable=True)
self.g.update_all(gcn_msg, gcn_reduce, layer)
return self.g.pop_n_repr()
def main(args):
# load and preprocess dataset
data = load_data(args)
features = torch.FloatTensor(data.features)
labels = torch.LongTensor(data.labels)
mask = torch.ByteTensor(data.train_mask)
features = mx.nd.array(data.features)
labels = mx.nd.array(data.labels)
mask = mx.nd.array(data.train_mask)
in_feats = features.shape[1]
n_classes = data.num_labels
n_edges = data.graph.number_of_edges()
if args.gpu < 0:
if args.gpu <= 0:
cuda = False
ctx = mx.cpu(0)
else:
cuda = True
torch.cuda.set_device(args.gpu)
features = features.cuda()
labels = labels.cuda()
mask = mask.cuda()
features = features.as_in_context(mx.gpu(0))
labels = labels.as_in_context(mx.gpu(0))
mask = mask.as_in_context(mx.gpu(0))
ctx = mx.gpu(0)
# create GCN model
g = DGLGraph(data.graph)
@@ -90,14 +87,12 @@ def main(args):
args.n_hidden,
n_classes,
args.n_layers,
F.relu,
'relu',
args.dropout)
if cuda:
model.cuda()
model.initialize(ctx=ctx)
# use optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
trainer = gluon.Trainer(model.collect_params(), 'adam', {'learning_rate': args.lr})
# initialize graph
dur = []
@@ -105,19 +100,18 @@
if epoch >= 3:
t0 = time.time()
# forward
logits = model(features)
logp = F.log_softmax(logits, 1)
loss = F.nll_loss(logp[mask], labels[mask])
with mx.autograd.record():
logits = model(features)
loss = mx.nd.softmax_cross_entropy(logits, labels)
optimizer.zero_grad()
#optimizer.zero_grad()
loss.backward()
optimizer.step()
trainer.step(features.shape[0])
if epoch >= 3:
dur.append(time.time() - t0)
print("Epoch {:05d} | Loss {:.4f} | Time(s) {:.4f} | ETputs(KTEPS) {:.2f}".format(
epoch, loss.item(), np.mean(dur), n_edges / np.mean(dur) / 1000))
print("Epoch {:05d} | Loss {:.4f} | Time(s) {:.4f} | ETputs(KTEPS) {:.2f}".format(
epoch, loss.asnumpy()[0], np.mean(dur), n_edges / np.mean(dur) / 1000))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
@@ -135,6 +129,5 @@ if __name__ == '__main__':
parser.add_argument("--n-layers", type=int, default=1,
help="number of hidden gcn layers")
args = parser.parse_args()
print(args)
main(args)
@@ -2,6 +2,8 @@
Graph Attention Networks
Paper: https://arxiv.org/abs/1710.10903
Code: https://github.com/PetarV-/GAT
GAT with batch processing
"""
import argparse
@@ -10,6 +12,7 @@ import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
@@ -22,15 +25,15 @@ class GATReduce(nn.Module):
self.attn_drop = attn_drop
def forward(self, node, msgs):
a1 = torch.unsqueeze(node['a1'], 0) # shape (1, 1)
a2 = torch.cat([torch.unsqueeze(m['a2'], 0) for m in msgs], dim=0) # shape (deg, 1)
ft = torch.cat([torch.unsqueeze(m['ft'], 0) for m in msgs], dim=0) # shape (deg, D)
a1 = torch.unsqueeze(node['a1'], 1) # shape (B, 1, 1)
a2 = msgs['a2'] # shape (B, deg, 1)
ft = msgs['ft'] # shape (B, deg, D)
# attention
a = a1 + a2 # shape (deg, 1)
e = F.softmax(F.leaky_relu(a), dim=0)
a = a1 + a2 # shape (B, deg, 1)
e = F.softmax(F.leaky_relu(a), dim=1)
if self.attn_drop != 0.0:
e = F.dropout(e, self.attn_drop)
return {'accum' : torch.sum(e * ft, dim=0)} # shape (D,)
return {'accum' : torch.sum(e * ft, dim=1)} # shape (B, D)
class GATFinalize(nn.Module):
def __init__(self, headid, indim, hiddendim, activation, residual):
@@ -71,7 +74,7 @@ class GATPrepare(nn.Module):
class GAT(nn.Module):
def __init__(self,
nx_graph,
g,
num_layers,
in_dim,
num_hidden,
@@ -82,8 +85,8 @@ class GAT(nn.Module):
attn_drop,
residual):
super(GAT, self).__init__()
self.g = DGLGraph(nx_graph)
self.num_layers = num_layers # one extra output projection
self.g = g
self.num_layers = num_layers
self.num_heads = num_heads
self.prp = nn.ModuleList()
self.red = nn.ModuleList()
@@ -104,48 +107,39 @@ class GAT(nn.Module):
# output projection
self.prp.append(GATPrepare(num_hidden * num_heads, num_classes, in_drop))
self.red.append(GATReduce(attn_drop))
self.fnl.append(GATFinalize(0, num_hidden * num_heads, num_classes, activation, residual))
self.fnl.append(GATFinalize(0, num_hidden * num_heads,
num_classes, activation, residual))
# sanity check
assert len(self.prp) == self.num_layers * self.num_heads + 1
assert len(self.red) == self.num_layers * self.num_heads + 1
assert len(self.fnl) == self.num_layers * self.num_heads + 1
def forward(self, features, train_nodes):
def forward(self, features):
last = features
for l in range(self.num_layers):
for hid in range(self.num_heads):
i = l * self.num_heads + hid
# prepare
for n, h in last.items():
self.g.nodes[n].update(self.prp[i](h))
self.g.set_n_repr(self.prp[i](last))
# message passing
self.g.update_all(gat_message, self.red[i], self.fnl[i])
# merge all the heads
last = {}
for n in self.g.nodes():
last[n] = torch.cat(
[self.g.nodes[n]['head%d' % hid] for hid in range(self.num_heads)])
last = torch.cat(
[self.g.pop_n_repr('head%d' % hid) for hid in range(self.num_heads)],
dim=1)
# output projection
for n, h in last.items():
self.g.nodes[n].update(self.prp[-1](h))
self.g.set_n_repr(self.prp[-1](last))
self.g.update_all(gat_message, self.red[-1], self.fnl[-1])
return torch.cat([torch.unsqueeze(self.g.nodes[n]['head0'], 0) for n in train_nodes])
return self.g.pop_n_repr('head0')
def main(args):
# load and preprocess dataset
data = load_data(args)
# features of each samples
features = {}
labels = []
train_nodes = []
for n in data.graph.nodes():
features[n] = torch.FloatTensor(data.features[n, :])
if data.train_mask[n] == 1:
train_nodes.append(n)
labels.append(data.labels[n])
labels = torch.LongTensor(labels)
in_feats = data.features.shape[1]
features = torch.FloatTensor(data.features)
labels = torch.LongTensor(data.labels)
mask = torch.ByteTensor(data.train_mask)
in_feats = features.shape[1]
n_classes = data.num_labels
n_edges = data.graph.number_of_edges()
@@ -154,11 +148,15 @@ def main(args):
else:
cuda = True
torch.cuda.set_device(args.gpu)
features = {k : v.cuda() for k, v in features.items()}
features = features.cuda()
labels = labels.cuda()
mask = mask.cuda()
# create GCN model
g = DGLGraph(data.graph)
# create model
model = GAT(data.graph,
model = GAT(g,
args.num_layers,
in_feats,
args.num_hidden,
@@ -181,7 +179,7 @@ def main(args):
if epoch >= 3:
t0 = time.time()
# forward
logits = model(features, train_nodes)
logits = model(features)
logp = F.log_softmax(logits, 1)
loss = F.nll_loss(logp, labels)
@@ -202,7 +200,7 @@ if __name__ == '__main__':
help="Which GPU to use. Set -1 to use CPU.")
parser.add_argument("--epochs", type=int, default=20,
help="number of training epochs")
parser.add_argument("--num-heads", type=int, default=8,
parser.add_argument("--num-heads", type=int, default=3,
help="number of attentional heads to use")
parser.add_argument("--num-layers", type=int, default=1,
help="number of hidden layers")
......
@@ -4,43 +4,9 @@ Graph Convolutional Networks (GCN)
Paper link: [https://arxiv.org/abs/1609.02907](https://arxiv.org/abs/1609.02907)
Author's code repo: [https://github.com/tkipf/gcn](https://github.com/tkipf/gcn)
The folder contains three different implementations using DGL.
The folder contains two different implementations using DGL.
Naive GCN (gcn.py)
-------
The model is defined at the finest granularity (i.e., on *one* edge and *one* node).
* The message function `gcn_msg` computes the message for one edge. It simply returns the `h` representation of the source node.
```python
def gcn_msg(src, edge):
# src['h'] is a tensor of shape (D,). D is the feature length.
return src['h']
```
* The reduce function `gcn_reduce` accumulates the incoming messages for one node. The `msgs` argument is a list of all the messages. In GCN, the incoming messages are summed up.
```python
def gcn_reduce(node, msgs):
# msgs is a list of in-coming messages.
return sum(msgs)
```
* The update function `NodeUpdateModule` computes the new node representation `h` using a non-linear transformation on the reduced messages.
```python
class NodeUpdateModule(nn.Module):
def __init__(self, in_feats, out_feats, activation=None):
super(NodeUpdateModule, self).__init__()
self.linear = nn.Linear(in_feats, out_feats)
self.activation = activation
def forward(self, node, accum):
# accum is a tensor of shape (D,).
h = self.linear(accum)
if self.activation:
h = self.activation(h)
return {'h' : h}
```
After defining the functions on each node/edge, the message passing is triggered by calling `update_all` on the DGLGraph object (in GCN module).
Batched GCN (gcn_batch.py)
Batched GCN (gcn.py)
-----------
Defining the model on only one node and edge makes it hard to fully utilize GPUs. As a result, we allow users to define model on a *batch of* nodes and edges.
......