"doc/vscode:/vscode.git/clone" did not exist on "41ce92bb2287f99d66cd614b4c838af056a289ce"
Unverified Commit 8a07ab77 authored by Minjie Wang, committed by GitHub

[Doc] Tutorials re-organization (#2683)

* reorg

* change titles

* rm some stale API doc; minor fix

* fix docs

* add warning

* rm new-tutorial run in ci

* lint
parent 0fc64952
......@@ -107,7 +107,7 @@ DGL provides an interface for distributed tensors similar to that of ordinary single-machine tensors, for accessing
.. code:: python
tensor = dgl.distributed.DistTensor((g.number_of_nodes(), 10), th.float32, name=test)
tensor = dgl.distributed.DistTensor((g.number_of_nodes(), 10), th.float32, name='test')
**Note**: Creating a :class:`~dgl.distributed.DistTensor` is a synchronous operation. All trainers must invoke the creation,
and it succeeds only after all trainers have called it.
......
......@@ -53,7 +53,7 @@ The JSON file contains the configurations of all the partitions. If this API does not assign nodes and edges
.. code:: python
dgl.distributed.partition_graph(g, graph_name, 4, /tmp/test, balance_ntypes=g.ndata[train_mask])
dgl.distributed.partition_graph(g, 'graph_name', 4, '/tmp/test', balance_ntypes=g.ndata['train_mask'])
In addition to balancing the node types, :func:`dgl.distributed.partition_graph` also allows balancing the in-degrees
of nodes of each type across partitions by specifying ``balance_edges``. This balances the number of edges incident to nodes of different types.
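For example, a sketch reusing the call above with edge balancing also enabled:

.. code:: python

    dgl.distributed.partition_graph(g, 'graph_name', 4, '/tmp/test',
                                    balance_ntypes=g.ndata['train_mask'],
                                    balance_edges=True)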
......
......@@ -27,9 +27,9 @@ Through its core data structure :class:`~dgl.DGLGraph`, DGL provides a graph-centric
:hidden:
:glob:
graph_cn-basic
graph_cn-graphs-nodes-edges
graph_cn-feature
graph_cn-external
graph_cn-heterogeneous
graph_cn-gpu
graph-basic
graph-graphs-nodes-edges
graph-feature
graph-external
graph-heterogeneous
graph-gpu
User Guide
==========
(work in progress)
.. toctree::
:maxdepth: 2
:titlesonly:
......
.. _guide_cn-message-passing:
Chapter 2: Message Passing
================
===========================
:ref:`(English Version) <guide-message-passing>`
Message passing is a general framework and programming paradigm for implementing GNNs. It summarizes the implementations of a variety of GNN models from the perspective of aggregation and update.
The Message Passing Paradigm
----------
----------------------
Suppose that node :math:`v` has feature :math:`x_v\in\mathbb{R}^{d_1}` and edge :math:`({u}, {v})` has feature :math:`w_{e}\in\mathbb{R}^{d_2}`.
The **message passing paradigm** defines the following edge-wise and node-wise computations:
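Restating the formulation from the corresponding English guide chapter, with :math:`\phi` denoting the **message function** defined on each edge:

Edge-wise: :math:`m_{e}^{(t+1)} = \phi \left( x_v^{(t)}, x_u^{(t)}, w_{e}^{(t)} \right), \quad (u, e, v) \in \mathcal{E}.`

Node-wise: :math:`x_v^{(t+1)} = \psi \left( x_v^{(t)}, \rho \left( \left\lbrace m_{e}^{(t+1)} : (u, e, v) \in \mathcal{E} \right\rbrace \right) \right).`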
......@@ -21,7 +21,7 @@
The **aggregate function** :math:`\rho` aggregates the messages a node has received. The **update function** :math:`\psi` combines the aggregated messages with the node's own feature to update the node's feature.
Roadmap of This Chapter
--------
--------------------
This chapter first introduces DGL's message passing APIs, and then explains how to apply them efficiently on nodes and edges. Its final section explains how to implement message passing on heterographs.
......
......@@ -3,75 +3,8 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Overview of DGL
===============
Deep Graph Library (DGL) is a Python package built for easy implementation of
the graph neural network model family, on top of existing DL frameworks (e.g.,
PyTorch, MXNet, Gluon).
DGL reduces the implementation of graph neural networks to declaring a set
of *functions* (or *modules* in PyTorch terminology). In addition, DGL
provides:
* Versatile controls over message passing, ranging from low-level operations
such as sending along selected edges and receiving on specific nodes, to
high-level control such as graph-wide feature updates.
* Transparent speed optimization with automatic batching of computations and
sparse matrix multiplication.
* Seamless integration with existing deep learning frameworks.
* Easy and friendly interfaces for node/edge feature access and graph
structure manipulation.
* Good scalability to graphs with tens of millions of vertices.
To begin with, we have prototyped 10 models across various domains:
semi-supervised learning on graphs (with potentially billions of nodes/edges),
generative models on graphs, (previously) difficult-to-parallelize tree-based
models like TreeLSTM, etc. We also implement some conventional models in DGL
from a new graph-centric perspective that yields simpler implementations.
Getting Started
---------------
* :doc:`Installation<install/index>`.
* :doc:`Quickstart tutorial<tutorials/basics/1_first>` for absolute beginners.
* :doc:`User guide<guide/index>`.
* :doc:`用户指南(User guide)中文版<guide_cn/index>`.
* :doc:`API reference manual<api/python/index>`.
* :doc:`End-to-end model tutorials<tutorials/models/index>` for learning DGL by popular models on graphs.
..
Follow the :doc:`instructions<install/index>` to install DGL.
:doc:`The introductory tutorial <new-tutorial/1_introduction>` is the most common place to get started.
It offers a broad experience of using DGL for deep learning on graph data.
The API reference lists detailed specifications of each API and GNN module,
a useful manual for in-depth developers.
You can learn other basic concepts of DGL through the dedicated tutorials.
* Learn constructing, saving and loading graphs with node and edge features :doc:`here<new-tutorial/2_dglgraph>`.
* Learn performing computation on graph using message passing :doc:`here<new-tutorial/3_message_passing>`.
* Learn link prediction with DGL :doc:`here<new-tutorial/4_link_predict>`.
* Learn graph classification with DGL :doc:`here<new-tutorial/5_graph_classification>`.
* Learn creating your own dataset for DGL :doc:`here<new-tutorial/6_load_data>`.
* Learn working with heterogeneous graph data :doc:`here<tutorials/basics/5_hetero>`.
End-to-end model tutorials are other good starting points for learning DGL and popular
models on graphs. The model tutorials are categorized based on the way they utilize DGL APIs.
* :ref:`Graph Neural Network and its variant <tutorials1-index>`: Learn how to use DGL to train
popular **GNN models** on one input graph.
* :ref:`Dealing with many small graphs <tutorials2-index>`: Learn how to train models for
many graph samples such as sentence parse trees.
* :ref:`Generative models <tutorials3-index>`: Learn how to deal with **dynamically-changing graphs**.
* :ref:`Old (new) wines in new bottle <tutorials4-index>`: Learn how to combine DGL with tensor-based
  frameworks in a flexible way, and explore new perspectives on traditional models through graphs.
* :ref:`Training on giant graphs <tutorials5-index>`: Learn how to train graph neural networks
on giant graphs.
Each tutorial is accompanied by a runnable Python script and Jupyter notebook that
can be downloaded. If you would like the tutorials improved, please raise a GitHub issue.
Welcome to Deep Graph Library Tutorials and Documentation
=========================================================
.. toctree::
:maxdepth: 1
......@@ -80,33 +13,19 @@ Getting Started
:glob:
install/index
install/backend
tutorials/blitz/index
.. toctree::
:maxdepth: 2
:caption: Tutorials
:hidden:
:glob:
new-tutorial/blitz/index
new-tutorial/large/index
.. toctree::
:maxdepth: 3
:caption: Model Examples
:hidden:
:glob:
tutorials/models/index
.. toctree::
:maxdepth: 2
:caption: User Guide
:caption: Advanced Materials
:hidden:
:titlesonly:
:glob:
guide/index
guide_cn/index
tutorials/large/index
tutorials/models/index
.. toctree::
:maxdepth: 2
......@@ -121,6 +40,7 @@ Getting Started
api/python/dgl.distributed
api/python/dgl.function
api/python/nn
api/python/nn.functional
api/python/dgl.ops
api/python/dgl.optim
api/python/dgl.sampling
......@@ -145,34 +65,39 @@ Getting Started
env_var
resources
Relationship of DGL to other frameworks
---------------------------------------
DGL is designed to be compatible with and agnostic to existing tensor
frameworks. It provides a backend adapter interface that allows easy porting
to other tensor-based, autograd-enabled frameworks.
Deep Graph Library (DGL) is a Python package built for easy implementation of
the graph neural network model family, on top of existing DL frameworks (currently
supporting PyTorch, MXNet and TensorFlow). It offers versatile control of message passing,
speed optimization via auto-batching and highly tuned sparse matrix kernels,
and multi-GPU/CPU training to scale to graphs of hundreds of millions of
nodes and edges.
Free software
Getting Started
---------------
For absolute beginners, start with the :doc:`Blitz Introduction to DGL <tutorials/blitz/index>`.
It covers the basic concepts of common graph machine learning tasks and gives a step-by-step
walkthrough of building Graph Neural Networks (GNNs) to solve them.
For experienced users who wish to learn more advanced usage,
* `Learn DGL by examples <https://github.com/dmlc/dgl/tree/master/examples>`_.
* Read the :doc:`User Guide<guide/index>` (:doc:`Chinese version<guide_cn/index>`), which explains the concepts
  and usage of DGL in much more detail.
* Go through the tutorials for :doc:`Stochastic Training of GNNs <tutorials/large/index>`,
which covers the basic steps for training GNNs on large graphs in mini-batches.
* :doc:`Study classical papers <tutorials/models/index>` on graph machine learning alongside DGL.
* Search for the usage of a specific API in the :doc:`API reference manual <api/python/index>`,
which organizes all DGL APIs by their namespace.
Contribution
-------------
DGL is free software; you can redistribute it and/or modify it under the terms
of the Apache License 2.0. We welcome contributions.
Join us on `GitHub <https://github.com/dmlc/dgl>`_ and check out our
:doc:`contribution guidelines <contribute>`.
History
-------
The prototype of DGL started in early spring 2018 at NYU Shanghai by Prof. `Zheng
Zhang <https://shanghai.nyu.edu/academics/faculty/directory/zheng-zhang>`_ and
Quan Gan. Serious development began when `Minjie
<https://jermainewang.github.io/>`_, `Lingfan <https://cs.nyu.edu/~lingfan/>`_
and Prof. `Jinyang Li <http://www.news.cs.nyu.edu/~jinyang/>`_ from NYU's
system group joined, flanked by a team of student volunteers at NYU Shanghai,
Fudan and other universities (Yu, Zihao, Murphy, Allen, Qipeng, Qi, Hao), as
well as early adopters at the CILVR lab (Jake Zhao). Development accelerated
when the AWS MXNet Science team joined forces, with Da Zheng, Alex Smola, Haibin
Lin, Chao Ma and a number of others. For full credit, see `here
<https://www.dgl.ai/ack>`_.
Index
-----
* :ref:`genindex`
.. _backends:
Working with different backends
===============================
DGL supports PyTorch, MXNet and TensorFlow backends.
DGL chooses the backend based on the following options (from highest to lowest priority):
- The `DGLBACKEND` environment variable
- You can use `DGLBACKEND=[BACKEND] python gcn.py ...` to specify the backend
- Or `export DGLBACKEND=[BACKEND]` to set the global environment variable
- `config.json` file under "~/.dgl"
- You can use `python -m dgl.backend.set_default_backend [BACKEND]` to set the default backend
Currently, `BACKEND` can be one of `mxnet`, `pytorch`, or `tensorflow`.
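For example, using the two commands listed above:

.. code:: bash

    DGLBACKEND=tensorflow python gcn.py                    # one-off, per run
    python -m dgl.backend.set_default_backend tensorflow   # persistent default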
PyTorch backend
---------------
Export ``DGLBACKEND`` as ``pytorch`` to specify the PyTorch backend. The required PyTorch
version is 1.5.0 or later. See `pytorch.org <https://pytorch.org>`_ for installation instructions.
MXNet backend
-------------
Export ``DGLBACKEND`` as ``mxnet`` to specify the MXNet backend. The required MXNet version is
1.5 or later. See `mxnet.apache.org <https://mxnet.apache.org/get_started>`_ for installation
instructions.
MXNet uses uint32 as the default data type for integer tensors, which only supports graphs
with fewer than 2^32 nodes or edges. To enable large graph training, *build* MXNet with the ``USE_INT64_TENSOR_SIZE=1``
flag. See `this FAQ <https://mxnet.apache.org/api/faq/large_tensor_support>`_ for more information.
MXNet 1.5 and later has an option to enable Numpy shape mode for ``NDArray`` objects, and some
DGL models need this mode to run correctly. However, this mode may not be compatible with
model parameters pretrained with the mode disabled, e.g., pretrained models from GluonCV and GluonNLP.
By setting ``DGL_MXNET_SET_NP_SHAPE``, users can switch this mode on or off.
Tensorflow backend
------------------
Export ``DGLBACKEND`` as ``tensorflow`` to specify the TensorFlow backend. The required TensorFlow
version is 2.2.0 or later. See `tensorflow.org <https://www.tensorflow.org/install>`_ for installation
instructions. In addition, DGL will set ``TF_FORCE_GPU_ALLOW_GROWTH`` to ``true`` to prevent TensorFlow from taking over the whole GPU memory:
.. code:: bash
pip install "tensorflow>=2.2.0" # when using tensorflow cpu version
Install DGL
===========
This topic explains how to install DGL. We recommend installing DGL by using ``conda`` or ``pip``.
Install and Setup
=================
System requirements
-------------------
......@@ -22,7 +20,8 @@ CPU build, then the CPU build is overwritten.
Install from Conda or Pip
-------------------------
Check out the `Get Started page <https://www.dgl.ai/pages/start.html>`_.
We recommend installing DGL by ``conda`` or ``pip``.
Check out the instructions on the `Get Started page <https://www.dgl.ai/pages/start.html>`_.
.. _install-from-source:
......@@ -63,15 +62,14 @@ configuration as you wish. For example, changing ``USE_CUDA`` to ``ON`` will
enable a CUDA build. You could also pass ``-DKEY=VALUE`` to the cmake command
for the same purpose.
- CPU-only build
.. code:: bash
* CPU-only build::
mkdir build
cd build
cmake ..
make -j4
- CUDA build
.. code:: bash
* CUDA build::
mkdir build
cd build
......@@ -125,8 +123,7 @@ You can build DGL with MSBuild. With `MS Build Tools <https://go.microsoft.com/
and `CMake on Windows <https://cmake.org/download/>`_ installed, run the following
in VS2019 x64 Native tools command prompt.
- CPU only build
.. code::
* CPU only build::
MD build
CD build
......@@ -134,8 +131,8 @@ in VS2019 x64 Native tools command prompt.
msbuild dgl.sln /m
CD ..\python
python setup.py install
- CUDA build
.. code::
* CUDA build::
MD build
CD build
......@@ -144,9 +141,61 @@ in VS2019 x64 Native tools command prompt.
CD ..\python
python setup.py install
Optional Flags
``````````````
Compilation Flags
`````````````````
See `config.cmake <https://github.com/dmlc/dgl/blob/master/cmake/config.cmake>`_.
.. _backends:
Working with different backends
-------------------------------
DGL supports PyTorch, MXNet and TensorFlow backends.
DGL chooses the backend based on the following options (from highest to lowest priority):
* Use the ``DGLBACKEND`` environment variable:
- You can use ``DGLBACKEND=[BACKEND] python gcn.py ...`` to specify the backend
- Or ``export DGLBACKEND=[BACKEND]`` to set the global environment variable
* Modify the ``config.json`` file under "~/.dgl":
- You can use ``python -m dgl.backend.set_default_backend [BACKEND]`` to set the default backend
Currently, ``BACKEND`` can be one of ``mxnet``, ``pytorch``, or ``tensorflow``.
PyTorch backend
```````````````
Export ``DGLBACKEND`` as ``pytorch`` to specify the PyTorch backend. The required PyTorch
version is 1.5.0 or later. See `pytorch.org <https://pytorch.org>`_ for installation instructions.
MXNet backend
`````````````
Export ``DGLBACKEND`` as ``mxnet`` to specify the MXNet backend. The required MXNet version is
1.5 or later. See `mxnet.apache.org <https://mxnet.apache.org/get_started>`_ for installation
instructions.
MXNet uses uint32 as the default data type for integer tensors, which only supports graphs
with fewer than 2^32 nodes or edges. To enable large graph training, *build* MXNet with the ``USE_INT64_TENSOR_SIZE=1``
flag. See `this FAQ <https://mxnet.apache.org/api/faq/large_tensor_support>`_ for more information.
MXNet 1.5 and later has an option to enable Numpy shape mode for ``NDArray`` objects, and some
DGL models need this mode to run correctly. However, this mode may not be compatible with
model parameters pretrained with the mode disabled, e.g., pretrained models from GluonCV and GluonNLP.
By setting ``DGL_MXNET_SET_NP_SHAPE``, users can switch this mode on or off.
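For example, a shell sketch (assuming the conventional ``1``/``0`` encoding for on/off):

.. code:: bash

    export DGL_MXNET_SET_NP_SHAPE=1   # enable Numpy shape mode for the MXNet backend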
Tensorflow backend
``````````````````
Export ``DGLBACKEND`` as ``tensorflow`` to specify the TensorFlow backend. The required TensorFlow
version is 2.2.0 or later. See `tensorflow.org <https://www.tensorflow.org/install>`_ for installation
instructions. In addition, DGL will set ``TF_FORCE_GPU_ALLOW_GROWTH`` to ``true`` to prevent TensorFlow from taking over the whole GPU memory:
.. code:: bash
pip install "tensorflow>=2.2.0" # when using tensorflow cpu version
- If you are using PyTorch, you can add the ``-DBUILD_TORCH=ON`` flag in CMake
  to build PyTorch plugins for further performance optimization. This applies to Linux,
  Windows, and Mac; see the sketch below.
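For example, a sketch of a CUDA build with the plugin enabled, combining flags named above:

.. code:: bash

    cmake -DUSE_CUDA=ON -DBUILD_TORCH=ON ..
    make -j4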
......@@ -9,8 +9,6 @@ __all__ = ['edge_softmax']
def edge_softmax(graph, logits, eids=ALL, norm_by='dst'):
r"""Compute softmax over weights of incoming edges for every node.
Description
-----------
For a node :math:`i`, edge softmax is an operation that computes
.. math::
......@@ -28,6 +26,9 @@ def edge_softmax(graph, logits, eids=ALL, norm_by='dst'):
An example of using edge softmax is in
`Graph Attention Network <https://arxiv.org/pdf/1710.10903.pdf>`__ where
the attention weights are computed with this operation.
Other non-GNN examples using this are
`Transformer <https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf>`__,
`Capsule <https://arxiv.org/pdf/1710.09829.pdf>`__, etc.
Parameters
----------
......
......@@ -13,6 +13,7 @@ def gsddmm(g, op, lhs_data, rhs_data, lhs_target='u', rhs_target='v'):
It computes edge features by applying :attr:`op` to the lhs and rhs features.
.. math::
x_{e} = \phi(x_{lhs}, x_{rhs}), \forall (u,e,v)\in \mathcal{G}
where :math:`x_{e}` is the returned feature on edges and :math:`x_u`,
......@@ -33,9 +34,9 @@ def gsddmm(g, op, lhs_data, rhs_data, lhs_target='u', rhs_target='v'):
rhs_data : tensor or None
The right operand, could be None if it's not required by op.
lhs_target: str
Choice of `u`(source), `e`(edge) or `v`(destination) for left operand.
Choice of ``u``(source), ``e``(edge) or ``v``(destination) for left operand.
rhs_target: str
Choice of `u`(source), `e`(edge) or `v`(destination) for right operand.
Choice of ``u``(source), ``e``(edge) or ``v``(destination) for right operand.
Returns
-------
......
......@@ -4,7 +4,6 @@
. /opt/conda/etc/profile.d/conda.sh
conda activate pytorch-ci
TUTORIAL_ROOT="./tutorials"
NEW_TUTORIAL_ROOT="./new-tutorial"
function fail {
echo FAIL: $@
......@@ -29,11 +28,3 @@ do
done
popd > /dev/null
pushd ${NEW_TUTORIAL_ROOT} > /dev/null
for f in $(find . -name "*.py" ! -name "*_mx.py")
do
echo "Running tutorial ${f} ..."
python3 $f || fail "run ${f}"
done
popd > /dev/null
"""
.. currentmodule:: dgl
DGL at a Glance
=========================
**Author**: `Minjie Wang <https://jermainewang.github.io/>`_, Quan Gan, `Jake
Zhao <https://cs.nyu.edu/~jakezhao/>`_, Zheng Zhang
DGL is a Python package dedicated to deep learning on graphs, built atop
existing tensor DL frameworks (e.g., PyTorch, MXNet), simplifying the
implementation of graph-based neural networks.
The goals of this tutorial:
- Understand how DGL enables computation on graphs from a high level.
- Train a simple graph neural network in DGL to classify nodes in a graph.
At the end of this tutorial, we hope you have gained a basic feel for how DGL works.
*This tutorial assumes basic familiarity with PyTorch.*
"""
###############################################################################
# Tutorial problem description
# ----------------------------
#
# The tutorial is based on the "Zachary's karate club" problem. The karate club
# is a social network that includes 34 members and documents pairwise links
# between members who interact outside the club. The club later divides into
# two communities led by the instructor (node 0) and the club president (node
# 33). The network is visualized as follows with the color indicating the
# community:
#
# .. image:: https://data.dgl.ai/tutorial/img/karate-club.png
# :align: center
#
# The task is to predict which side (0 or 33) each member tends to join given
# the social network itself.
###############################################################################
# Step 1: Creating a graph in DGL
# -------------------------------
# Create the graph for Zachary's karate club as follows:
import dgl
import numpy as np
def build_karate_club_graph():
    # All 78 edges are stored in two numpy arrays, one for the source endpoints
    # and the other for the destination endpoints.
src = np.array([1, 2, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 10, 10,
10, 11, 12, 12, 13, 13, 13, 13, 16, 16, 17, 17, 19, 19, 21, 21,
25, 25, 27, 27, 27, 28, 29, 29, 30, 30, 31, 31, 31, 31, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33])
dst = np.array([0, 0, 1, 0, 1, 2, 0, 0, 0, 4, 5, 0, 1, 2, 3, 0, 2, 2, 0, 4,
5, 0, 0, 3, 0, 1, 2, 3, 5, 6, 0, 1, 0, 1, 0, 1, 23, 24, 2, 23,
24, 2, 23, 26, 1, 8, 0, 24, 25, 28, 2, 8, 14, 15, 18, 20, 22, 23,
29, 30, 31, 8, 9, 13, 14, 15, 18, 19, 20, 22, 23, 26, 27, 28, 29, 30,
31, 32])
    # Edges are directional in DGL; make them bidirectional.
u = np.concatenate([src, dst])
v = np.concatenate([dst, src])
# Construct a DGLGraph
return dgl.graph((u, v))
###############################################################################
# Print out the number of nodes and edges in our newly constructed graph:
G = build_karate_club_graph()
print('We have %d nodes.' % G.number_of_nodes())
print('We have %d edges.' % G.number_of_edges())
###############################################################################
# Visualize the graph by converting it to a `networkx
# <https://networkx.github.io/documentation/stable/>`_ graph:
import networkx as nx
# Since the actual graph is undirected, we convert it for visualization
# purpose.
nx_G = G.to_networkx().to_undirected()
# The Kamada-Kawai layout usually looks good for arbitrary graphs
pos = nx.kamada_kawai_layout(nx_G)
nx.draw(nx_G, pos, with_labels=True, node_color=[[.7, .7, .7]])
###############################################################################
# Step 2: Assign features to nodes or edges
# --------------------------------------------
# Graph neural networks associate features with nodes and edges for training.
# For our classification example, since there is no input feature, we assign
# each node a learnable embedding vector.
# In DGL, you can add features for all nodes at once, using a feature tensor that
# batches node features along the first dimension. The code below adds the learnable
# embeddings for all nodes:
import torch
import torch.nn as nn
import torch.nn.functional as F
embed = nn.Embedding(34, 5) # 34 nodes with embedding dim equal to 5
G.ndata['feat'] = embed.weight
###############################################################################
# Print out the node features to verify:
# print out node 2's input feature
print(G.ndata['feat'][2])
# print out node 10 and 11's input features
print(G.ndata['feat'][[10, 11]])
###############################################################################
# Step 3: Define a Graph Convolutional Network (GCN)
# --------------------------------------------------
# To perform node classification, use the Graph Convolutional Network
# (GCN) developed by `Kipf and Welling <https://arxiv.org/abs/1609.02907>`_. Here
# is the simplest definition of a GCN framework. We recommend that you
# read the original paper for more details.
#
# - At layer :math:`l`, each node :math:`v_i^l` carries a feature vector :math:`h_i^l`.
# - Each GCN layer aggregates the features :math:`h_u^{l}` of the neighbors :math:`u`
#   of node :math:`v_i` to compute its next-layer representation
#   :math:`h_i^{l+1}`. This is followed by an affine transformation with some
#   non-linearity.
#
# The above definition of GCN fits into a **message-passing** paradigm: Each
# node will update its own feature with information sent from neighboring
# nodes. A graphical demonstration is displayed below.
#
# .. image:: https://data.dgl.ai/tutorial/1_first/mailbox.png
# :alt: mailbox
# :align: center
#
# In DGL, we provide implementations of popular Graph Neural Network layers under
# the ``dgl.nn.<backend>`` subpackage. The :class:`~dgl.nn.pytorch.GraphConv` module
# implements one Graph Convolutional layer.
from dgl.nn.pytorch import GraphConv
###############################################################################
# Define a deeper GCN model that contains two GCN layers:
class GCN(nn.Module):
def __init__(self, in_feats, hidden_size, num_classes):
super(GCN, self).__init__()
self.conv1 = GraphConv(in_feats, hidden_size)
self.conv2 = GraphConv(hidden_size, num_classes)
def forward(self, g, inputs):
h = self.conv1(g, inputs)
h = torch.relu(h)
h = self.conv2(g, h)
return h
# The first layer transforms input features of size 5 to a hidden size of 5.
# The second layer transforms the hidden layer and produces output features of
# size 2, corresponding to the two groups of the karate club.
net = GCN(5, 5, 2)
###############################################################################
# Step 4: Data preparation and initialization
# -------------------------------------------
#
# We use learnable embeddings to initialize the node features. Since this is a
# semi-supervised setting, only the instructor (node 0) and the club president
# (node 33) are assigned labels. The implementation is as follows.
inputs = embed.weight
labeled_nodes = torch.tensor([0, 33]) # only the instructor and the president nodes are labeled
labels = torch.tensor([0, 1]) # their labels are different
###############################################################################
# Step 5: Train then visualize
# ----------------------------
# The training loop is exactly the same as for other PyTorch models.
# We (1) create an optimizer, (2) feed the inputs to the model,
# (3) calculate the loss and (4) use autograd to optimize the model.
import itertools
optimizer = torch.optim.Adam(itertools.chain(net.parameters(), embed.parameters()), lr=0.01)
all_logits = []
for epoch in range(50):
logits = net(G, inputs)
# we save the logits for visualization later
all_logits.append(logits.detach())
logp = F.log_softmax(logits, 1)
# we only compute loss for labeled nodes
loss = F.nll_loss(logp[labeled_nodes], labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print('Epoch %d | Loss: %.4f' % (epoch, loss.item()))
###############################################################################
# This is a rather toy example, so it does not even have a validation or test
# set. Instead, since the model produces an output feature of size 2 for each
# node, we can visualize it by plotting the output features in a 2D space.
# The following code animates the training process from initial guess
# (where the nodes are not classified correctly at all) to the end
# (where the nodes are linearly separable).
import matplotlib.animation as animation
import matplotlib.pyplot as plt
def draw(i):
cls1color = '#00FFFF'
cls2color = '#FF00FF'
pos = {}
colors = []
for v in range(34):
pos[v] = all_logits[i][v].numpy()
cls = pos[v].argmax()
colors.append(cls1color if cls else cls2color)
ax.cla()
ax.axis('off')
ax.set_title('Epoch: %d' % i)
nx.draw_networkx(nx_G.to_undirected(), pos, node_color=colors,
with_labels=True, node_size=300, ax=ax)
fig = plt.figure(dpi=150)
fig.clf()
ax = fig.subplots()
draw(0) # draw the prediction of the first epoch
plt.close()
###############################################################################
# .. image:: https://data.dgl.ai/tutorial/1_first/karate0.png
# :height: 300px
# :width: 400px
# :align: center
###############################################################################
# The following animation shows how the model correctly predicts the community
# after a series of training epochs.
ani = animation.FuncAnimation(fig, draw, frames=len(all_logits), interval=200)
###############################################################################
# .. image:: https://data.dgl.ai/tutorial/1_first/karate.gif
# :height: 300px
# :width: 400px
# :align: center
###############################################################################
# Next steps
# ----------
#
# In the :doc:`next tutorial <2_basics>`, we will go through some more basics
# of DGL, such as reading and writing node/edge features.
"""
.. currentmodule:: dgl
DGLGraph and Node/edge Features
===============================
**Author**: `Minjie Wang <https://jermainewang.github.io/>`_, Quan Gan, Yu Gai,
Zheng Zhang
In this tutorial, you learn how to create a graph and how to read and write node and edge representations.
"""
###############################################################################
# Creating a graph
# ----------------
# The design of :class:`DGLGraph` was influenced by other graph libraries. You
# can create a graph from networkx and convert it into a :class:`DGLGraph` and
# vice versa.
import networkx as nx
import dgl
g_nx = nx.petersen_graph()
g_dgl = dgl.DGLGraph(g_nx)
import matplotlib.pyplot as plt
plt.subplot(121)
nx.draw(g_nx, with_labels=True)
plt.subplot(122)
nx.draw(g_dgl.to_networkx(), with_labels=True)
plt.show()
###############################################################################
# There are many ways to construct a :class:`DGLGraph`. Below are the allowed
# data types, ordered by our recommendation.
#
# * A pair of arrays ``(u, v)`` storing the source and destination nodes respectively.
# They can be numpy arrays or tensor objects from the backend framework.
# * ``scipy`` sparse matrix representing the adjacency matrix of the graph to be
# constructed.
# * ``networkx`` graph object.
# * A list of edges in the form of integer pairs.
#
# The examples below construct the same star graph via different methods.
#
# :class:`DGLGraph` nodes are a consecutive range of integers between 0 and
# :func:`number_of_nodes() <DGLGraph.number_of_nodes>`.
# :class:`DGLGraph` edges are in the order of their addition. Note that
# edges are accessed in much the same way as nodes, with one extra feature:
# *edge broadcasting*.
import torch as th
import numpy as np
import scipy.sparse as spp
# Create a star graph from a pair of arrays (using ``numpy.array`` works too).
u = th.tensor([0, 0, 0, 0, 0])
v = th.tensor([1, 2, 3, 4, 5])
star1 = dgl.DGLGraph((u, v))
# Create the same graph from a scipy sparse matrix (using ``scipy.sparse.csr_matrix`` works too).
adj = spp.coo_matrix((np.ones(len(u)), (u.numpy(), v.numpy())))
star3 = dgl.DGLGraph(adj)
###############################################################################
# You can also create a graph by progressively adding more nodes and edges.
# Although it is not as efficient as the above constructors, it is suitable
# for applications where the graph cannot be constructed in one shot.
g = dgl.DGLGraph()
g.add_nodes(10)
# A couple edges one-by-one
for i in range(1, 4):
g.add_edge(i, 0)
# A few more with a paired list
src = list(range(5, 8)); dst = [0]*3
g.add_edges(src, dst)
# finish with a pair of tensors
src = th.tensor([8, 9]); dst = th.tensor([0, 0])
g.add_edges(src, dst)
# Edge broadcasting will build the star graph in one go!
g = dgl.DGLGraph()
g.add_nodes(10)
src = th.tensor(list(range(1, 10)));
g.add_edges(src, 0)
# Visualize the graph.
nx.draw(g.to_networkx(), with_labels=True)
plt.show()
###############################################################################
# Assigning a feature
# -------------------
# You can also assign features to nodes and edges of a :class:`DGLGraph`. The
# features are represented as a dictionary mapping names (strings) to tensors,
# called **fields**.
#
# The following code snippet assigns each node a vector (len=3).
#
# .. note::
#
# DGL aims to be framework-agnostic, and currently it supports PyTorch and
# MXNet tensors. The following examples use PyTorch only.
import dgl
import torch as th
x = th.randn(10, 3)
g.ndata['x'] = x
###############################################################################
# :func:`ndata <DGLGraph.ndata>` is syntactic sugar for accessing the feature
# data of all nodes. To get the features of some particular nodes, slice out
# the corresponding rows.
g.ndata['x'][0] = th.zeros(1, 3)
g.ndata['x'][[0, 1, 2]] = th.zeros(3, 3)
g.ndata['x'][th.tensor([0, 1, 2])] = th.randn((3, 3))
###############################################################################
# Assigning edge features is similar to assigning node features,
# except that you can also do it by specifying the endpoints of the edges.
g.edata['w'] = th.randn(9, 2)
# Access edge set with IDs in integer, list, or integer tensor
g.edata['w'][1] = th.randn(1, 2)
g.edata['w'][[0, 1, 2]] = th.zeros(3, 2)
g.edata['w'][th.tensor([0, 1, 2])] = th.zeros(3, 2)
# You can get the edge IDs from their endpoints, which is useful for accessing the features.
g.edata['w'][g.edge_id(1, 0)] = th.ones(1, 2) # edge 1 -> 0
g.edata['w'][g.edge_ids([1, 2, 3], [0, 0, 0])] = th.ones(3, 2) # edges [1, 2, 3] -> 0
# Use edge broadcasting whenever applicable.
g.edata['w'][g.edge_ids([1, 2, 3], 0)] = th.ones(3, 2)   # edges [1, 2, 3] -> 0
###############################################################################
# After assignments, each node or edge field will be associated with a scheme
# containing the shape and data type (dtype) of its field value.
print(g.node_attr_schemes())
g.ndata['x'] = th.zeros((10, 4))
print(g.node_attr_schemes())
###############################################################################
# You can also remove node or edge states from the graph. This is particularly
# useful to save memory during inference.
g.ndata.pop('x')
g.edata.pop('w')
###############################################################################
# Working with multigraphs
# ~~~~~~~~~~~~~~~~~~~~~~~~
# Many graph applications need parallel edges,
# which :class:`DGLGraph` supports by default.
g_multi = dgl.DGLGraph()
g_multi.add_nodes(10)
g_multi.ndata['x'] = th.randn(10, 2)
g_multi.add_edges(list(range(1, 10)), 0)
g_multi.add_edge(1, 0) # two edges on 1->0
g_multi.edata['w'] = th.randn(10, 2)
g_multi.edges[1].data['w'] = th.zeros(1, 2)
print(g_multi.edges())
###############################################################################
# An edge in a multigraph cannot be uniquely identified by its incident nodes
# :math:`u` and :math:`v`; to query its edge IDs, use the ``edge_id`` interface.
_, _, eid_10 = g_multi.edge_id(1, 0, return_uv=True)
g_multi.edges[eid_10].data['w'] = th.ones(len(eid_10), 2)
print(g_multi.edata['w'])
###############################################################################
# .. note::
#
# * Updating a feature with a different scheme (shape or dtype) on individual
#   nodes (or a node subset) risks an error.
###############################################################################
# Next steps
# ----------
# In the :doc:`next tutorial <3_pagerank>` you learn the
# DGL message passing interface by implementing PageRank.
"""
.. currentmodule:: dgl
Message Passing Tutorial
========================
**Author**: `Minjie Wang <https://jermainewang.github.io/>`_, Quan Gan, Yu Gai,
Zheng Zhang
In this tutorial, you learn how to use different levels of the message
passing API with PageRank on a small graph. In DGL, the message passing and
feature transformations are **user-defined functions** (UDFs).
"""
###############################################################################
# The PageRank algorithm
# ----------------------
# In each iteration of PageRank, every node (web page) first scatters its
# PageRank value uniformly to its downstream nodes. The new PageRank value of
# each node is computed by aggregating the received PageRank values from its
# neighbors, which is then adjusted by the damping factor:
#
# .. math::
#
# PV(u) = \frac{1-d}{N} + d \times \sum_{v \in \mathcal{N}(u)}
# \frac{PV(v)}{D(v)}
#
# where :math:`N` is the number of nodes in the graph; :math:`D(v)` is the
# out-degree of a node :math:`v`; and :math:`\mathcal{N}(u)` is the set of
# neighbors of :math:`u`.
###############################################################################
# A naive implementation
# ----------------------
# Create a graph with 100 nodes by using ``networkx`` and then convert it to a
# :class:`DGLGraph`.
import networkx as nx
import matplotlib.pyplot as plt
import torch
import dgl
N = 100 # number of nodes
DAMP = 0.85 # damping factor
K = 10 # number of iterations
g = nx.erdos_renyi_graph(N, 0.1)
g = dgl.DGLGraph(g)
nx.draw(g.to_networkx(), node_size=50, node_color=[[.5, .5, .5,]])
plt.show()
###############################################################################
# According to the algorithm, PageRank consists of two phases in a typical
# scatter-gather pattern. Initialize the PageRank value of each node
# to :math:`\frac{1}{N}` and then store each node's out-degree as a node feature.
g.ndata['pv'] = torch.ones(N) / N
g.ndata['deg'] = g.out_degrees(g.nodes()).float()
###############################################################################
# Define the message function, which divides every node's PageRank
# value by its out-degree and passes the result as message to its neighbors.
def pagerank_message_func(edges):
return {'pv' : edges.src['pv'] / edges.src['deg']}
###############################################################################
# In DGL, the message functions are expressed as **Edge UDFs**. Edge UDFs
# take in a single argument ``edges``. It has three members ``src``, ``dst``,
# and ``data`` for accessing source node features, destination node features,
# and edge features. Here, the function computes messages only
# from source node features.
#
# Define the reduce function, which removes and aggregates the
# messages from its ``mailbox``, and computes its new PageRank value.
def pagerank_reduce_func(nodes):
msgs = torch.sum(nodes.mailbox['pv'], dim=1)
pv = (1 - DAMP) / N + DAMP * msgs
return {'pv' : pv}
###############################################################################
# The reduce functions are **Node UDFs**. Node UDFs have a single argument
# ``nodes``, which has two members ``data`` and ``mailbox``. ``data``
# contains the node features and ``mailbox`` contains all incoming message
# features, stacked along the second dimension (hence the ``dim=1`` argument).
#
# The message UDF works on a batch of edges, whereas the reduce UDF works on
# a batch of nodes, each of which reduces the messages received on its incoming
# edges. Their relationships are as follows:
#
# .. image:: https://i.imgur.com/kIMiuFb.png
#
###############################################################################
# The algorithm is straightforward. Here is the code for one
# PageRank iteration.
def pagerank_naive(g):
# Phase #1: send out messages along all edges.
for u, v in zip(*g.edges()):
g.send((u, v), pagerank_message_func)
# Phase #2: receive messages to compute new PageRank values.
for v in g.nodes():
g.recv(v, pagerank_reduce_func)
###############################################################################
# Batching semantics for a large graph
# ------------------------------------
# The above code does not scale to a large graph because it iterates over all
# the nodes. DGL solves this by allowing you to compute on a *batch* of nodes or
# edges. For example, the following code triggers message and reduce functions
# on multiple nodes and edges at one time.
def pagerank_batch(g):
g.send(g.edges(), pagerank_message_func)
g.recv(g.nodes(), pagerank_reduce_func)
###############################################################################
# You are still using the same reduce function ``pagerank_reduce_func``,
# where ``nodes.mailbox['pv']`` is a *single* tensor, stacking the incoming
# messages along the second dimension.
#
# You might wonder whether it is even possible to perform reduce on all
# nodes in parallel, since each node may have a different number of incoming
# messages and you cannot really "stack" tensors of different lengths together.
# In general, DGL solves the problem by grouping the nodes by the number of
# incoming messages, and calling the reduce function for each group.
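# As a rough sketch of this degree-bucketing idea (an illustration only, not
# DGL's actual internal implementation): nodes with the same in-degree share
# one dense mailbox tensor, so each bucket is reduced in a single batched call.

msgs_deg2 = torch.randn(5, 2)    # a bucket of 5 nodes, each with 2 incoming messages
msgs_deg3 = torch.randn(7, 3)    # a bucket of 7 nodes, each with 3 incoming messages
out_deg2 = msgs_deg2.sum(dim=1)  # one batched reduce per bucket
out_deg3 = msgs_deg3.sum(dim=1)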
###############################################################################
# Use higher-level APIs for efficiency
# ---------------------------------------
# DGL provides many routines that combine basic ``send`` and ``recv`` in
# various ways. These routines are called **level-2 APIs**. For example, the next code example
# shows how to further simplify the PageRank example with such an API.
def pagerank_level2(g):
    g.update_all(pagerank_message_func, pagerank_reduce_func)
###############################################################################
# In addition to ``update_all``, you can use ``pull``, ``push``, and ``send_and_recv``
# in this level-2 category. For more information, see :doc:`API reference <../../api/python/graph>`.
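# For instance, a sketch of the same PageRank step written with ``pull``
# (assuming the ``pull(nodes, message_func, reduce_func)`` signature of the
# send/recv-era API used throughout this tutorial):

def pagerank_pull(g):
    # Every node pulls messages from its predecessors, then reduces them.
    g.pull(g.nodes(), pagerank_message_func, pagerank_reduce_func)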
###############################################################################
# Use DGL ``builtin`` functions for efficiency
# ------------------------------------------------
# Some of the message and reduce functions are used frequently. For this reason, DGL also
# provides ``builtin`` functions. For example, two ``builtin`` functions can be
# used in the PageRank example.
#
# * :func:`dgl.function.copy_src(src, out) <function.copy_src>` - A builtin
#   edge UDF that computes the output using the source node feature data.
#   To use this, specify the name of the source feature data (``src``) and the
#   output name (``out``).
#
# * :func:`dgl.function.sum(msg, out) <function.sum>` - A builtin node UDF
#   that sums the messages in the node's mailbox. To use this, specify the
#   message name (``msg``) and the output name (``out``).
#
# The following PageRank example shows such functions.
import dgl.function as fn
def pagerank_builtin(g):
g.ndata['pv'] = g.ndata['pv'] / g.ndata['deg']
g.update_all(message_func=fn.copy_src(src='pv', out='m'),
reduce_func=fn.sum(msg='m',out='m_sum'))
g.ndata['pv'] = (1 - DAMP) / N + DAMP * g.ndata['m_sum']
###############################################################################
# In the previous example code, you directly provide the UDFs to
# :func:`update_all <DGLGraph.update_all>` as its arguments.
# This will override any previously registered UDFs.
#
# In addition to cleaner code, using ``builtin`` functions also gives DGL the
# opportunity to fuse operations together. This results in faster execution. For
# example, DGL will fuse the ``copy_src`` message function and ``sum`` reduce
# function into one sparse matrix-vector (spMV) multiplication.
#
# `The following section <spmv_>`_ describes why spMV can speed up the scatter-gather
# phase in PageRank. For more details about the ``builtin`` functions in DGL,
# see :doc:`API reference <../../api/python/function>`.
#
# You can also download and run the different code examples to see the differences.
for k in range(K):
# Uncomment the corresponding line to select different version.
# pagerank_naive(g)
# pagerank_batch(g)
# pagerank_level2(g)
pagerank_builtin(g)
print(g.ndata['pv'])
###############################################################################
# .. _spmv:
#
# Using spMV for PageRank
# -----------------------
# Using ``builtin`` functions allows DGL to understand the semantics of UDFs.
# This allows you to create an efficient implementation. For example, in the case
# of PageRank, one common method to accelerate it is by using its linear algebra
# form.
#
# .. math::
#
#    \mathbf{R}^{k} = \frac{1-d}{N} \mathbf{1} + d \mathbf{A} \mathbf{R}^{k-1}
#
# Here, :math:`\mathbf{R}^k` is the vector of the PageRank values of all nodes
# at iteration :math:`k`; :math:`\mathbf{A}` is the sparse adjacency matrix
# of the graph.
# Computing this equation is quite efficient because there is an efficient
# GPU kernel for the sparse matrix-vector multiplication (spMV). DGL
# detects whether such optimization is available through the ``builtin``
# functions. If a certain combination of ``builtin`` can be mapped to an spMV
# kernel (e.g., the PageRank example), DGL uses it automatically. We recommend
# using ``builtin`` functions whenever possible.
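# A sketch of this matrix form written directly with torch sparse ops
# (assuming ``g``, ``N``, ``K``, and ``DAMP`` from above); entry (u, v) of the
# matrix holds 1/D(v) for every edge v -> u:

src, dst = g.edges()
vals = 1.0 / g.ndata['deg'][src]
A_hat = torch.sparse_coo_tensor(torch.stack([dst, src]), vals, (N, N))
pv = torch.ones(N) / N
for _ in range(K):
    pv = (1 - DAMP) / N + DAMP * torch.sparse.mm(A_hat, pv.unsqueeze(1)).squeeze(1)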
###############################################################################
# Next steps
# ----------
#
# * Learn how to use DGL (:doc:`builtin functions<../../features/builtin>`) to write
# more efficient message passing.
# * To see model tutorials, see the :doc:`overview page<../models/index>`.
# * To learn about Graph Neural Networks, see :doc:`GCN tutorial<../models/1_gnn/1_gcn>`.
# * To see how DGL batches multiple graphs, see :doc:`TreeLSTM tutorial<../models/2_small_graph/3_tree-lstm>`.
# * Play with some graph generative models by following tutorial for :doc:`Deep Generative Model of Graphs<../models/3_generative_model/5_dgmg>`.
# * To learn how traditional models are interpreted from a graph perspective, see
# the tutorials on :doc:`CapsuleNet<../models/4_old_wines/2_capsule>` and
# :doc:`Transformer<../models/4_old_wines/7_transformer>`.
"""
.. currentmodule:: dgl
Graph Classification Tutorial
=============================
**Author**: `Mufei Li <https://github.com/mufeili>`_,
`Minjie Wang <https://jermainewang.github.io/>`_,
`Zheng Zhang <https://shanghai.nyu.edu/academics/faculty/directory/zheng-zhang>`_.
In this tutorial, you learn how to use DGL to batch multiple graphs of variable size and shape. The
tutorial also demonstrates training a graph neural network for a simple graph classification task.
Graph classification is an important problem
with applications across many fields, such as bioinformatics, chemoinformatics, social
network analysis, urban computing, and cybersecurity. Applying graph neural
networks to this problem has been a popular approach recently. This can be seen in the following research references:
`Ying et al., 2018 <https://arxiv.org/abs/1806.08804>`_,
`Cangea et al., 2018 <https://arxiv.org/abs/1811.01287>`_,
`Knyazev et al., 2018 <https://arxiv.org/abs/1811.09595>`_,
`Bianchi et al., 2019 <https://arxiv.org/abs/1901.01343>`_,
`Liao et al., 2019 <https://arxiv.org/abs/1901.01484>`_,
`Gao et al., 2019 <https://openreview.net/forum?id=HJePRoAct7>`_.
"""
###############################################################################
# Simple graph classification task
# --------------------------------
# In this tutorial, you learn how to perform batched graph classification
# with DGL. The example task objective is to classify eight types of topologies shown here.
#
# .. image:: https://data.dgl.ai/tutorial/batch/dataset_overview.png
# :align: center
#
# The tutorial uses a synthetic dataset, :class:`data.MiniGCDataset`, implemented in
# DGL. The dataset has eight different types of graphs, and each class has the same
# number of graph samples.
import dgl
import torch
from dgl.data import MiniGCDataset
import matplotlib.pyplot as plt
import networkx as nx
# A dataset with 80 samples, where each graph
# has 10 to 20 nodes
dataset = MiniGCDataset(80, 10, 20)
graph, label = dataset[0]
fig, ax = plt.subplots()
nx.draw(graph.to_networkx(), ax=ax)
ax.set_title('Class: {:d}'.format(label))
plt.show()
###############################################################################
# Form a graph mini-batch
# -----------------------
# To train neural networks efficiently, a common practice is to batch
# multiple samples together to form a mini-batch. Batching fixed-shaped tensor
# inputs is common. For example, batching two images of size 28 x 28
# gives a tensor of shape 2 x 28 x 28. By contrast, batching graph inputs
# has two challenges:
#
# * Graphs are sparse.
# * Graphs can vary in size, for example, in their numbers of nodes and edges.
#
# To address this, DGL provides a :func:`dgl.batch` API. It leverages the idea that
# a batch of graphs can be viewed as a large graph that has many disjoint
# connected components. Below is a visualization that gives the general idea.
#
# .. image:: https://data.dgl.ai/tutorial/batch/batch.png
# :width: 400pt
# :align: center
#
# The return type of :func:`dgl.batch` is still a graph. In the same way,
# a batch of tensors is still a tensor. This means that any code that works
# for one graph immediately works for a batch of graphs. More importantly,
# because DGL processes messages on all nodes and edges in parallel, this greatly
# improves efficiency.
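# A minimal sketch of :func:`dgl.batch` on two toy graphs:

g1 = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])))  # 3 nodes, 2 edges
g2 = dgl.graph((torch.tensor([0]), torch.tensor([1])))        # 2 nodes, 1 edge
bg = dgl.batch([g1, g2])
print(bg.number_of_nodes())  # 5 -- g2's node IDs are relabeled to follow g1's
print(bg.batch_size)         # 2

###############################################################################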
#
# Graph classifier
# ----------------
# Graph classification proceeds as follows.
#
# .. image:: https://data.dgl.ai/tutorial/batch/graph_classifier.png
#
# From a batch of graphs, perform message passing and graph convolution
# for nodes to communicate with others. After message passing, compute a
# tensor for graph representation from node (and edge) attributes. This step might
# be called readout or aggregation. Finally, the graph
# representations are fed into a classifier :math:`g` to predict the graph labels.
#
# Graph convolution layer can be found in the ``dgl.nn.<backend>`` submodule.
from dgl.nn.pytorch import GraphConv
###############################################################################
# Readout and classification
# --------------------------
# For this demonstration, consider initial node features to be their degrees.
# After two rounds of graph convolution, perform a graph readout by averaging
# over all node features for each graph in the batch.
#
# .. math::
#
# h_g=\frac{1}{|\mathcal{V}|}\sum_{v\in\mathcal{V}}h_{v}
#
# In DGL, :func:`dgl.mean_nodes` handles this task for a batch of
# graphs with variable size. You then feed the graph representations into a
# classifier with one linear layer to obtain pre-softmax logits.
import torch.nn as nn
import torch.nn.functional as F
class Classifier(nn.Module):
def __init__(self, in_dim, hidden_dim, n_classes):
super(Classifier, self).__init__()
self.conv1 = GraphConv(in_dim, hidden_dim)
self.conv2 = GraphConv(hidden_dim, hidden_dim)
self.classify = nn.Linear(hidden_dim, n_classes)
def forward(self, g):
        # Use node degree as the initial node feature. For undirected graphs, the in-degree
        # is the same as the out-degree.
h = g.in_degrees().view(-1, 1).float()
# Perform graph convolution and activation function.
h = F.relu(self.conv1(g, h))
h = F.relu(self.conv2(g, h))
g.ndata['h'] = h
# Calculate graph representation by averaging all the node representations.
hg = dgl.mean_nodes(g, 'h')
return self.classify(hg)
###############################################################################
# Setup and training
# ------------------
# Create a synthetic dataset of :math:`400` graphs with :math:`10` ~
# :math:`20` nodes. :math:`320` graphs constitute a training set and
# :math:`80` graphs constitute a test set.
import torch.optim as optim
from dgl.dataloading import GraphDataLoader
# Create training and test sets.
trainset = MiniGCDataset(320, 10, 20)
testset = MiniGCDataset(80, 10, 20)
# Use DGL's GraphDataLoader. It by default handles the
# graph batching operation for every mini-batch.
data_loader = GraphDataLoader(trainset, batch_size=32, shuffle=True)
# Create model
model = Classifier(1, 256, trainset.num_classes)
loss_func = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
model.train()
epoch_losses = []
for epoch in range(80):
epoch_loss = 0
for iter, (bg, label) in enumerate(data_loader):
prediction = model(bg)
loss = loss_func(prediction, label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
epoch_loss += loss.detach().item()
epoch_loss /= (iter + 1)
print('Epoch {}, loss {:.4f}'.format(epoch, epoch_loss))
epoch_losses.append(epoch_loss)
###############################################################################
# The learning curve of a run is presented below.
plt.title('cross entropy averaged over minibatches')
plt.plot(epoch_losses)
plt.show()
###############################################################################
# The trained model is evaluated on the test set created. Note that to keep the
# running time of the tutorial short, training is restricted; with more training
# you are likely to get a higher accuracy (:math:`80` % ~ :math:`90` %) than the
# ones printed below.
model.eval()
# Convert a list of tuples to two lists
test_X, test_Y = map(list, zip(*testset))
test_bg = dgl.batch(test_X)
test_Y = torch.tensor(test_Y).float().view(-1, 1)
probs_Y = torch.softmax(model(test_bg), 1)
sampled_Y = torch.multinomial(probs_Y, 1)
argmax_Y = torch.max(probs_Y, 1)[1].view(-1, 1)
print('Accuracy of sampled predictions on the test set: {:.4f}%'.format(
(test_Y == sampled_Y.float()).sum().item() / len(test_Y) * 100))
print('Accuracy of argmax predictions on the test set: {:.4f}%'.format(
(test_Y == argmax_Y.float()).sum().item() / len(test_Y) * 100))
###############################################################################
# The animation here plots the probability that a trained model predicts the correct graph type.
#
# .. image:: https://data.dgl.ai/tutorial/batch/test_eval4.gif
#
# To understand the node and graph representations that a trained model learned,
# we use `t-SNE <https://lvdmaaten.github.io/tsne/>`_ for dimensionality reduction
# and visualization.
#
# .. image:: https://data.dgl.ai/tutorial/batch/tsne_node2.png
# :align: center
#
# .. image:: https://data.dgl.ai/tutorial/batch/tsne_graph2.png
# :align: center
#
# The two small figures on the top separately visualize node representations after one and two
# layers of graph convolution. The figure on the bottom visualizes
# the pre-softmax logits for graphs as graph representations.
#
# While the visualization does suggest some clustering effect in the node features,
# you would not expect a perfect result, as node degrees are deterministic for
# these node features. The graph features are much better separated.
#
# What's next?
# ------------
# Graph classification with graph neural networks is still a new field.
# It's waiting for people to bring more exciting discoveries. The work requires
# mapping different graphs to different embeddings, while preserving
# their structural similarity in the embedding space. To learn more about it, see
# `How Powerful Are Graph Neural Networks? <https://arxiv.org/abs/1810.00826>`_, a research paper
# published at the International Conference on Learning Representations 2019.
#
# For more examples about batched graph processing, see the following:
#
# * Tutorials for `Tree LSTM <https://docs.dgl.ai/tutorials/models/2_small_graph/3_tree-lstm.html>`_ and `Deep Generative Models of Graphs <https://docs.dgl.ai/tutorials/models/3_generative_model/5_dgmg.html>`_
# * An example implementation of `Junction Tree VAE <https://github.com/dmlc/dgl/tree/master/examples/pytorch/jtnn>`_
"""
.. currentmodule:: dgl
Working with Heterogeneous Graphs
=================================
**Author**: Quan Gan, `Minjie Wang <https://jermainewang.github.io/>`_, Mufei Li,
George Karypis, Zheng Zhang
In this tutorial, you learn about:
* Examples of heterogeneous graph data and typical applications.
* Creating and manipulating a heterogeneous graph in DGL.
* Implementing `Relational-GCN <https://arxiv.org/abs/1703.06103>`_, a popular GNN model,
  for heterogeneous graph input.
* Training a model to solve a node classification task.
Heterogeneous graphs, or *heterographs* for short, are graphs that contain
different types of nodes and edges. The different types of nodes and edges tend
to have different types of attributes that are designed to capture the
characteristics of each node and edge type. Within the context of
graph neural networks, depending on their complexity, certain node and edge types
might need to be modeled with representations that have a different number of dimensions.
DGL supports graph neural network computations on such heterogeneous graphs, by
using the heterograph class and its associated API.
"""
###############################################################################
# Examples of heterographs
# ------------------------
# Many graph datasets represent relationships among various types of entities.
# This section provides an overview for several graph use-cases that show such relationships
# and can have their data represented as heterographs.
#
# Citation graph
# ~~~~~~~~~~~~~~~
# The Association for Computing Machinery publishes an `ACM dataset <https://aminer.org/citation>`_ that contains two
# million papers, their authors, publication venues, and the other papers
# that were cited. This information can be represented as a heterogeneous graph.
#
# The following diagram shows several entities in the ACM dataset and the relationships among them
# (taken from `Shi et al., 2015 <https://arxiv.org/pdf/1511.04854.pdf>`_).
#
# .. figure:: https://data.dgl.ai/tutorial/hetero/acm-example.png#
#
# This graph has three types of entities that correspond to papers, authors, and publication venues.
# It also contains three types of edges that connect the following:
#
# * Authors with papers corresponding to *written-by* relationships
#
# * Papers with publication venues corresponding to *published-in* relationships
#
# * Papers with other papers corresponding to *cited-by* relationships
#
#
# Recommender systems
# ~~~~~~~~~~~~~~~~~~~~
# The datasets used in recommender systems often contain
# interactions between users and items. For example, the data could include the
# ratings that users have provided to movies. Such interactions can be modeled
# as heterographs.
#
# The nodes in these heterographs will have two types, *users* and *movies*. The edges
# will correspond to the user-movie interactions. Furthermore, if an interaction is
# marked with a rating, then each rating value could correspond to a different edge type.
# The following diagram shows an example of user-item interactions as a heterograph.
#
# .. figure:: https://data.dgl.ai/tutorial/hetero/recsys-example.png
#
#
# Knowledge graph
# ~~~~~~~~~~~~~~~~
# Knowledge graphs are inherently heterogeneous. For example, in
# Wikidata, Barack Obama (item Q76) is an instance of a human, which could be viewed as
# the entity class, whose spouse (item P26) is Michelle Obama (item Q13133) and
# occupation (item P106) is politician (item Q82955). The relationships are shown in the following.
# diagram.
#
# .. figure:: https://data.dgl.ai/tutorial/hetero/kg-example.png
#
###############################################################################
# Creating a heterograph in DGL
# -----------------------------
# You can create a heterograph in DGL using the :func:`dgl.heterograph` API.
# The argument to :func:`dgl.heterograph` is a dictionary. The keys are tuples
# in the form of ``(srctype, edgetype, dsttype)`` specifying the relation name
# and the two entity types it connects. Such tuples are called *canonical edge types*.
# The values are the data used to initialize the graph structure, that is, which
# nodes the edges actually connect.
#
# For instance, the following code creates the user-item interactions heterograph shown earlier.
# Each value of the dictionary is a pair of source and destination arrays.
# Nodes are assigned integer IDs starting from zero, and node IDs of
# different types are counted separately.
import dgl
import numpy as np
ratings = dgl.heterograph(
{('user', '+1', 'movie') : (np.array([0, 0, 1]), np.array([0, 1, 0])),
('user', '-1', 'movie') : (np.array([2]), np.array([1]))})
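###############################################################################
# As a quick check (an extra step, not required by the rest of the tutorial),
# you can inspect the toy graph just created. Each node and edge type has its
# own ID space, so the counts below are per type.
print(ratings.ntypes)                   # node types: 'user' and 'movie'
print(ratings.etypes)                   # edge types: '+1' and '-1'
print(ratings.number_of_nodes('user'))  # 3 users (IDs 0, 1, 2)
print(ratings.number_of_edges('+1'))    # 3 positive ratings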
###############################################################################
# Manipulating a heterograph
# ---------------------------
# You can create a more realistic heterograph using the ACM dataset. To do this, first
# download the dataset as follows:
import scipy.io
import urllib.request
data_url = 'https://data.dgl.ai/dataset/ACM.mat'
data_file_path = '/tmp/ACM.mat'
urllib.request.urlretrieve(data_url, data_file_path)
data = scipy.io.loadmat(data_file_path)
print(list(data.keys()))
###############################################################################
# The dataset stores node information by type: ``P`` for paper, ``A``
# for author, ``C`` for conference, ``L`` for subject code, and so on. The relationships
# are stored as SciPy sparse matrices under keys of the form ``XvsY``, where ``X`` and ``Y``
# can be any of the node type codes.
#
# The following code prints out some statistics about the paper-author relationships.
print(type(data['PvsA']))
print('#Papers:', data['PvsA'].shape[0])
print('#Authors:', data['PvsA'].shape[1])
print('#Links:', data['PvsA'].nnz)
###############################################################################
# Converting this SciPy matrix to a heterograph in DGL is straightforward.
pa_g = dgl.heterograph({('paper', 'written-by', 'author') : data['PvsA'].nonzero()})
###############################################################################
# You can easily print out the type names and other structural information.
print('Node types:', pa_g.ntypes)
print('Edge types:', pa_g.etypes)
print('Canonical edge types:', pa_g.canonical_etypes)
# Nodes and edges are assigned integer IDs starting from zero, counted separately per type.
# To distinguish the nodes and edges of different types, specify the type name as the argument.
print(pa_g.number_of_nodes('paper'))
# A canonical edge type name can be shortened to just the edge type name if it is
# uniquely distinguishable.
print(pa_g.number_of_edges(('paper', 'written-by', 'author')))
print(pa_g.number_of_edges('written-by'))
print(pa_g.successors(1, etype='written-by'))  # get the authors who wrote paper #1
# The type name argument can be omitted whenever the behavior is unambiguous.
print(pa_g.number_of_edges())  # Only one edge type, so the edge type argument can be omitted
###############################################################################
# A homogeneous graph is just a special case of a heterograph with only one type
# of node and edge.
# A paper-citing-paper graph is a homogeneous graph.
pp_g = dgl.heterograph({('paper', 'citing', 'paper') : data['PvsP'].nonzero()})
# An equivalent (shorter) API for creating a homogeneous graph:
pp_g = dgl.from_scipy(data['PvsP'])
# All the ntype and etype arguments can be omitted because the behavior is unambiguous.
print(pp_g.number_of_nodes())
print(pp_g.number_of_edges())
print(pp_g.successors(3))
###############################################################################
# Create a subset of the ACM graph using the paper-author, paper-paper,
# and paper-subject relationships. Also add the reverse relationships
# to prepare for the later sections.
G = dgl.heterograph({
('paper', 'written-by', 'author') : data['PvsA'].nonzero(),
('author', 'writing', 'paper') : data['PvsA'].transpose().nonzero(),
('paper', 'citing', 'paper') : data['PvsP'].nonzero(),
('paper', 'cited', 'paper') : data['PvsP'].transpose().nonzero(),
('paper', 'is-about', 'subject') : data['PvsL'].nonzero(),
('subject', 'has', 'paper') : data['PvsL'].transpose().nonzero(),
})
print(G)
###############################################################################
# **Metagraph** (or network schema) is a useful summary of a heterograph.
# Serving as a template for a heterograph, it tells how many types of objects
# exist in the network and where the possible links are.
#
# DGL provides easy access to the metagraph, which can be visualized using
# external tools.
# Draw the metagraph using graphviz.
import pygraphviz as pgv
def plot_graph(nxg):
ag = pgv.AGraph(strict=False, directed=True)
for u, v, k in nxg.edges(keys=True):
ag.add_edge(u, v, label=k)
ag.layout('dot')
ag.draw('graph.png')
plot_graph(G.metagraph())
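###############################################################################
# If ``pygraphviz`` is not installed, you can still inspect the metagraph in
# plain text: ``G.metagraph()`` returns the same NetworkX ``MultiDiGraph`` used
# by ``plot_graph`` above, so its edges can simply be printed.
for srctype, dsttype, etype in G.metagraph().edges(keys=True):
    print('(%s) --[%s]--> (%s)' % (srctype, etype, dsttype))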
###############################################################################
# Learning tasks associated with heterographs
# -------------------------------------------
# Some of the typical learning tasks that involve heterographs include:
#
# * *Node classification and regression* to predict the class of each node or
# estimate a value associated with it.
#
# * *Link prediction* to predict if there is an edge of a certain
# type between a pair of nodes, or predict which other nodes a particular
# node is connected with (and optionally the edge types of such connections).
#
# * *Graph classification/regression* to assign an entire
# heterograph into one of the target classes or to estimate a numerical
# value associated with it.
#
# In this tutorial, we design a simple example for the first task.
#
# A semi-supervised node classification example
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Our goal is to predict the publishing conference of a paper using the ACM
# academic graph we just created. To further simplify the task, we only focus
# on papers published in three conferences: *KDD*, *ICML*, and *VLDB*. All
# other papers are left unlabeled, making this a semi-supervised setting.
#
# The following code extracts those papers from the raw dataset and prepares
# the training, validation, testing split.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
pvc = data['PvsC'].tocsr()
# find all papers published in KDD, ICML, VLDB
c_selected = [0, 11, 13] # KDD, ICML, VLDB
p_selected = pvc[:, c_selected].tocoo()
# generate labels
# Each paper appears in exactly one conference in this dataset, so for the CSR
# matrix ``pvc`` the column index of each row's single nonzero entry gives the
# conference of the corresponding paper.
labels = pvc.indices
# Remap the selected conference IDs to class labels: KDD (0) stays 0,
# ICML (11) becomes 1, VLDB (13) becomes 2.
labels[labels == 11] = 1
labels[labels == 13] = 2
labels = torch.tensor(labels).long()
# generate train/val/test split
pid = p_selected.row
shuffle = np.random.permutation(pid)
train_idx = torch.tensor(shuffle[0:800]).long()
val_idx = torch.tensor(shuffle[800:900]).long()
test_idx = torch.tensor(shuffle[900:]).long()
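###############################################################################
# Before training, it is worth sanity-checking the split (an extra step on top
# of the original flow). Classes 0, 1, and 2 correspond to KDD, ICML, and VLDB.
print('Train/val/test sizes:', len(train_idx), len(val_idx), len(test_idx))
print('Training label distribution:', torch.bincount(labels[train_idx]))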
###############################################################################
# Relational-GCN on heterograph
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# We use `Relational-GCN <https://arxiv.org/abs/1703.06103>`_ to learn the
# representation of nodes in the graph. Its message-passing equation is as
# follows:
#
# .. math::
#
# h_i^{(l+1)} = \sigma\left(\sum_{r\in \mathcal{R}}
# \sum_{j\in\mathcal{N}_r(i)}W_r^{(l)}h_j^{(l)}\right)
#
# Breaking down the equation, you see that there are two parts in the
# computation.
#
# (i) Message computation and aggregation within each relation :math:`r`
#
# (ii) Reduction that merges the results from multiple relationships
#
# Following this intuition, perform message passing on a heterograph in
# two steps.
#
# (i) Per-edge-type message passing
#
# (ii) Type-wise reduction
import dgl.function as fn
class HeteroRGCNLayer(nn.Module):
def __init__(self, in_size, out_size, etypes):
super(HeteroRGCNLayer, self).__init__()
# W_r for each relation
self.weight = nn.ModuleDict({
name : nn.Linear(in_size, out_size) for name in etypes
})
def forward(self, G, feat_dict):
# The input is a dictionary of node features for each type
funcs = {}
for srctype, etype, dsttype in G.canonical_etypes:
# Compute W_r * h
Wh = self.weight[etype](feat_dict[srctype])
# Save it in graph for message passing
G.nodes[srctype].data['Wh_%s' % etype] = Wh
# Specify per-relation message passing functions: (message_func, reduce_func).
# Note that the results are saved to the same destination feature 'h', which
            # hints at the type-wise reducer for aggregation.
funcs[etype] = (fn.copy_u('Wh_%s' % etype, 'm'), fn.mean('m', 'h'))
        # Trigger message passing over multiple edge types.
        # The first argument is the message passing functions for each relation.
        # The second one is the type-wise reducer, which can be "sum", "max",
        # "min", "mean", or "stack".
G.multi_update_all(funcs, 'sum')
# return the updated node feature dictionary
return {ntype : G.nodes[ntype].data['h'] for ntype in G.ntypes}
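###############################################################################
# As a quick smoke test of the layer (an addition for illustration), feed
# random features for every node type and confirm that the output dictionary
# contains one tensor of the expected shape per node type.
layer = HeteroRGCNLayer(10, 10, G.etypes)
feat_dict = {ntype : torch.randn(G.number_of_nodes(ntype), 10) for ntype in G.ntypes}
out_dict = layer(G, feat_dict)
print({ntype : h.shape for ntype, h in out_dict.items()})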
###############################################################################
# Create a simple GNN by stacking two ``HeteroRGCNLayer`` modules. Since the
# nodes do not have input features, make their embeddings trainable.
class HeteroRGCN(nn.Module):
def __init__(self, G, in_size, hidden_size, out_size):
super(HeteroRGCN, self).__init__()
# Use trainable node embeddings as featureless inputs.
embed_dict = {ntype : nn.Parameter(torch.Tensor(G.number_of_nodes(ntype), in_size))
for ntype in G.ntypes}
for key, embed in embed_dict.items():
nn.init.xavier_uniform_(embed)
self.embed = nn.ParameterDict(embed_dict)
# create layers
self.layer1 = HeteroRGCNLayer(in_size, hidden_size, G.etypes)
self.layer2 = HeteroRGCNLayer(hidden_size, out_size, G.etypes)
def forward(self, G):
h_dict = self.layer1(G, self.embed)
h_dict = {k : F.leaky_relu(h) for k, h in h_dict.items()}
h_dict = self.layer2(G, h_dict)
# get paper logits
return h_dict['paper']
###############################################################################
# Train and evaluate
# ~~~~~~~~~~~~~~~~~~
# Train and evaluate this network.
# Create the model. The output has three logits for three classes.
model = HeteroRGCN(G, 10, 10, 3)
opt = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
best_val_acc = 0
best_test_acc = 0
for epoch in range(100):
logits = model(G)
# The loss is computed only for labeled nodes.
loss = F.cross_entropy(logits[train_idx], labels[train_idx])
pred = logits.argmax(1)
train_acc = (pred[train_idx] == labels[train_idx]).float().mean()
val_acc = (pred[val_idx] == labels[val_idx]).float().mean()
test_acc = (pred[test_idx] == labels[test_idx]).float().mean()
if best_val_acc < val_acc:
best_val_acc = val_acc
best_test_acc = test_acc
opt.zero_grad()
loss.backward()
opt.step()
if epoch % 5 == 0:
print('Loss %.4f, Train Acc %.4f, Val Acc %.4f (Best %.4f), Test Acc %.4f (Best %.4f)' % (
loss.item(),
train_acc.item(),
val_acc.item(),
best_val_acc.item(),
test_acc.item(),
best_test_acc.item(),
))
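###############################################################################
# As a final (extra) step, you can report the test accuracy of the trained
# model without tracking gradients.
with torch.no_grad():
    pred = model(G).argmax(1)
    print('Final test accuracy: %.4f' % (pred[test_idx] == labels[test_idx]).float().mean().item())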
###############################################################################
# What's next?
# ------------
# * Check out our full implementation in PyTorch
# `here <https://github.com/dmlc/dgl/tree/master/examples/pytorch/rgcn-hetero>`_.
#
# * We also provide the following model examples:
#
# * `Graph Convolutional Matrix Completion <https://arxiv.org/abs/1706.02263>`_,
# which we implement in MXNet
# `here <https://github.com/dmlc/dgl/tree/v0.4.0/examples/mxnet/gcmc>`_.
#
# * `Heterogeneous Graph Attention Network <https://arxiv.org/abs/1903.07293>`_
# requires transforming a heterograph into a homogeneous graph according to
# a given metapath (i.e. a path template consisting of edge types). We
# provide :func:`dgl.transform.metapath_reachable_graph` to do this. See full
# implementation
# `here <https://github.com/dmlc/dgl/tree/master/examples/pytorch/han>`_.
#
# * `Metapath2vec <https://dl.acm.org/citation.cfm?id=3098036>`_ requires
# generating random walk paths according to a given metapath. Please
# refer to the full metapath2vec implementation
# `here <https://github.com/dmlc/dgl/tree/master/examples/pytorch/metapath2vec>`_.
#
# * :doc:`Full heterograph API reference <../../api/python/heterograph>`.