Unverified Commit 44089c8b authored by Minjie Wang, committed by GitHub

[Refactor][Graph] Merge DGLGraph and DGLHeteroGraph (#1862)



* Merge

* [Graph][CUDA] Graph on GPU and many refactoring (#1791)

* change edge_ids behavior and C++ impl

* fix unittests; remove utils.Index in edge_id

* pass mx and th tests

* pass tf test

* add aten::Scatter_

* Add nonzero; impl CSRGetDataAndIndices/CSRSliceMatrix

* CSRGetData and CSRGetDataAndIndices passed tests

* CSRSliceMatrix basic tests

* fix bug in empty slice

* CUDA CSRHasDuplicate

* has_node; has_edge_between

* predecessors, successors

* deprecate send/recv; fix send_and_recv

* deprecate send/recv; fix send_and_recv

* in_edges; out_edges; all_edges; apply_edges

* in deg/out deg

* subgraph/edge_subgraph

* adj

* in_subgraph/out_subgraph

* sample neighbors

* set/get_n/e_repr

* wip: working on refactoring all idtypes

* pass ndata/edata tests on gpu

* fix

* stash

* workaround nonzero issue

* stash

* nx conversion

* test_hetero_basics except update routines

* test_update_routines

* test_hetero_basics for pytorch

* more fixes

* WIP: flatten graph

* wip: flatten

* test_flatten

* test_to_device

* fix bug in to_homo

* fix bug in CSRSliceMatrix

* pass subgraph test

* fix send_and_recv

* fix filter

* test_heterograph

* passed all pytorch tests

* fix mx unittest

* fix pytorch test_nn

* fix all unittests for PyTorch

* passed all mxnet tests

* lint

* fix tf nn test

* pass all tf tests

* lint

* lint

* change deprecation

* try fix compile

* lint

* update METIS

* fix utest

* fix

* fix utests

* try debug

* revert

* small fix

* fix utests

* upd

* upd

* upd

* fix

* upd

* upd

* upd

* upd

* upd

* trigger

* +1s

* [kernel] Use heterograph index instead of unitgraph index (#1813)

* upd

* upd

* upd

* fix

* upd

* upd

* upd

* upd

* upd

* trigger

* +1s

* [Graph] Mutation for Heterograph (#1818)

* mutation add_nodes and add_edges

* Add support for remove_edges, remove_nodes, add_selfloop, remove_selfloop

* Fix
Co-authored-by: Ubuntu <ubuntu@ip-172-31-51-214.ec2.internal>

* upd

* upd

* upd

* fix

* [Transfom] Mutable transform (#1833)

* add nodes

* All three

* Fix

* lint

* Add some test case

* Fix

* Fix

* Fix

* Fix

* Fix

* Fix

* fix

* trigger

* Fix

* fix
Co-authored-by: Ubuntu <ubuntu@ip-172-31-51-214.ec2.internal>

* [Graph] Migrate Batch & Readout module to heterograph (#1836)

* dgl.batch

* unbatch

* fix to device

* reduce readout; segment reduce

* change batch_num_nodes|edges to function

* reduce readout/ softmax

* broadcast

* topk

* fix

* fix tf and mx

* fix some ci

* fix batch but unbatch differently

* new check

* upd

* upd

* upd

* idtype behavior; code reorg

* idtype behavior; code reorg

* wip: test_basics

* pass test_basics

* WIP: from nx/ to nx

* missing files

* upd

* pass test_basics:test_nx_conversion

* Fix test

* Fix inplace update

* WIP: fixing tests

* upd

* pass test_transform cpu

* pass gpu test_transform

* pass test_batched_graph

* GPU graph auto cast to int32

* missing file

* stash

* WIP: rgcn-hetero

* Fix two datasets

* upd

* weird

* Fix capsule

* fuck you

* fuck matthias

* Fix dgmg

* fix bug in block degrees; pass rgcn-hetero

* rgcn

* gat and diffpool fix
also fix ppi and tu dataset

* Tree LSTM

* pointcloud

* rrn; wip: sgc

* resolve conflicts

* upd

* sgc and reddit dataset

* upd

* Fix deepwalk, gindt and gcn

* fix datasets and sign

* optimization

* optimization

* upd

* upd

* Fix GIN

* fix bug in add_nodes add_edges; tagcn

* adaptive sampling and gcmc

* upd

* upd

* fix geometric

* fix

* metapath2vec

* fix agnn

* fix pickling problem of block

* fix utests

* miss file

* linegraph

* upd

* upd

* upd

* graphsage

* stgcn_wave

* fix hgt

* on unittests

* Fix transformer

* Fix HAN

* passed pytorch unittests

* lint

* fix

* Fix cluster gcn

* cluster-gcn is ready

* on fixing block related codes

* 2nd order derivative

* Revert "2nd order derivative"

This reverts commit 523bf6c249bee61b51b1ad1babf42aad4167f206.

* passed torch utests again

* fix all mxnet unittests

* delete some useless tests

* pass all tf cpu tests

* disable

* disable distributed unittest

* fix

* fix

* lint

* fix

* fix

* fix script

* fix tutorial

* fix apply edges bug

* fix 2 basics

* fix tutorial
Co-authored-by: yzh119 <expye@outlook.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-51-214.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-7-42.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-1-5.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-68-185.ec2.internal>
parent 015acfd2
......@@ -152,7 +152,7 @@ class SortPooling(nn.Block):
feat = feat.sort(axis=-1)
graph.ndata['h'] = feat
# Sort nodes according to their last features.
ret = topk_nodes(graph, 'h', self.k)[0].reshape(
ret = topk_nodes(graph, 'h', self.k, sortby=-1)[0].reshape(
-1, self.k * feat.shape[-1])
return ret
......
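The hunk above switches SortPooling to the renamed `sortby` keyword of `topk_nodes` (formerly `idx`). A minimal usage sketch, assuming the refactored readout API from this PR with PyTorch; the graph and feature values are illustrative only:

import dgl
import torch as th

g = dgl.graph(([0, 1, 2], [1, 2, 0]))
g.ndata['h'] = th.rand(3, 4)
# Rank nodes by their last feature column and keep the top 2 per graph.
vals, idx = dgl.topk_nodes(g, 'h', 2, sortby=-1)
print(vals.shape)   # (1, 2, 4): (batch size, k, feature size)
print(idx.shape)    # (1, 2)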
......@@ -2,7 +2,7 @@
# pylint: disable= no-member, arguments-differ, access-member-before-definition, unpacking-non-sequence
import mxnet as mx
from ... import function as fn
from ... import backend as F
from ...base import ALL, is_all
__all__ = ['edge_softmax']
......@@ -28,7 +28,7 @@ class EdgeSoftmax(mx.autograd.Function):
def __init__(self, g, eids):
super(EdgeSoftmax, self).__init__()
if not is_all(eids):
g = g.edge_subgraph(eids.astype('int64'))
g = g.edge_subgraph(eids.astype(g.idtype), preserve_nodes=True)
self.g = g
def forward(self, score):
......@@ -46,16 +46,12 @@ class EdgeSoftmax(mx.autograd.Function):
return out.data
"""
g = self.g
with g.local_scope():
g.edata['s'] = score
g.update_all(fn.copy_e('s', 'm'), fn.max('m', 'smax'))
g.apply_edges(fn.e_sub_v('s', 'smax', 'out'))
g.edata['out'] = g.edata['out'].exp()
g.update_all(fn.copy_e('out', 'm'), fn.sum('m', 'out_sum'))
g.apply_edges(fn.e_div_v('out', 'out_sum', 'out'))
out = g.edata['out']
self.save_for_backward(out)
return out
score_max = F.copy_e_max(g, score)
score = mx.nd.exp(F.e_sub_v(g, score, score_max))
score_sum = F.copy_e_sum(g, score)
out = F.e_div_v(g, score, score_sum)
self.save_for_backward(out)
return out
def backward(self, grad_out):
"""Backward function.
......@@ -71,17 +67,13 @@ class EdgeSoftmax(mx.autograd.Function):
sds_sum = sds.dst_sum() # type dgl.NData
grad_score = sds - sds * sds_sum # multiple expressions
"""
out, = self.saved_tensors
g = self.g
with g.local_scope():
out, = self.saved_tensors
# clear saved tensors explicitly
self.saved_tensors = None
g.edata['out'] = out
g.edata['grad_score'] = out * grad_out
g.update_all(fn.copy_e('grad_score', 'm'), fn.sum('m', 'accum'))
g.apply_edges(fn.e_mul_v('out', 'accum', 'out'))
grad_score = g.edata['grad_score'] - g.edata['out']
return grad_score
sds = out * grad_out
accum = F.copy_e_sum(g, sds)
grad_score = sds - F.e_mul_v(g, out, accum)
self.save_tensors = None
return grad_score
def edge_softmax(graph, logits, eids=ALL):
r"""Compute edge softmax.
......
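The forward pass above now calls fused backend kernels (copy_e_max, e_sub_v, copy_e_sum, e_div_v) instead of staging intermediates in edata. Semantically it is a numerically stable softmax over all edges that share a destination node; a plain-PyTorch reference sketch of that semantics (illustrative only, not the DGL kernel path):

import torch as th

def edge_softmax_reference(dst, score, num_nodes):
    # Normalize each edge score over the edges pointing at the same destination.
    out = th.empty_like(score)
    for v in range(num_nodes):
        mask = dst == v
        if mask.any():
            out[mask] = th.softmax(score[mask], dim=0)
    return out

dst = th.tensor([1, 1, 2])            # destination node of each edge
score = th.tensor([1.0, 2.0, 3.0])    # one scalar score per edge
print(edge_softmax_reference(dst, score, num_nodes=3))
# tensor([0.2689, 0.7311, 1.0000])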
......@@ -78,7 +78,7 @@ class GatedGraphConv(nn.Module):
is the output feature size.
"""
with graph.local_scope():
assert graph.is_homograph(), \
assert graph.is_homogeneous(), \
"not a homograph; convert it with to_homo and pass in the edge type as argument"
zero_pad = feat.new_zeros((feat.shape[0], self._out_feats - feat.shape[1]))
feat = th.cat([feat, zero_pad], -1)
......@@ -86,7 +86,7 @@ class GatedGraphConv(nn.Module):
for _ in range(self._n_steps):
graph.ndata['h'] = feat
for i in range(self._n_etypes):
eids = (etypes == i).nonzero().view(-1)
eids = (etypes == i).nonzero().view(-1).type(graph.idtype)
if len(eids) > 0:
graph.apply_edges(
lambda edges: {'W_e*h': self.linears[i](edges.src['h'])},
......
......@@ -138,7 +138,7 @@ class GraphConv(nn.Module):
feat_src, feat_dst = expand_as_pair(feat, graph)
if self._norm == 'both':
degs = graph.out_degrees().to(feat_src.device).float().clamp(min=1)
degs = graph.out_degrees().float().clamp(min=1)
norm = th.pow(degs, -0.5)
shp = norm.shape + (1,) * (feat_src.dim() - 1)
norm = th.reshape(norm, shp)
......@@ -170,7 +170,7 @@ class GraphConv(nn.Module):
rst = th.matmul(rst, weight)
if self._norm != 'none':
degs = graph.in_degrees().to(feat_dst.device).float().clamp(min=1)
degs = graph.in_degrees().float().clamp(min=1)
if self._norm == 'both':
norm = th.pow(degs, -0.5)
else:
......
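The dropped `.to(feat_src.device)` call reflects that graphs can now live on the GPU, so degree tensors already come back on the feature's device. A small sketch of that behavior, assuming a CUDA-enabled build (device name is illustrative):

import dgl
import torch as th

g = dgl.graph(([0, 1], [1, 2]))
if th.cuda.is_available():
    g = g.to(th.device('cuda:0'))    # the graph structure itself moves to the GPU
print(g.out_degrees().device)        # same device as the graph; no extra copy needed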
......@@ -74,7 +74,7 @@ class TAGConv(nn.Module):
is size of output feature.
"""
with graph.local_scope():
assert graph.is_homograph(), 'Graph is not homogeneous'
assert graph.is_homogeneous(), 'Graph is not homogeneous'
norm = th.pow(graph.in_degrees().float().clamp(min=1), -0.5)
shp = norm.shape + (1,) * (feat.dim() - 1)
......
......@@ -145,7 +145,7 @@ class SortPooling(nn.Module):
feat, _ = feat.sort(dim=-1)
graph.ndata['h'] = feat
# Sort nodes according to their last features.
ret = topk_nodes(graph, 'h', self.k, idx=-1)[0].view(
ret = topk_nodes(graph, 'h', self.k, sortby=-1)[0].view(
-1, self.k * feat.shape[-1])
return ret
......@@ -564,7 +564,7 @@ class SetTransformerEncoder(nn.Module):
torch.Tensor
The output feature with shape :math:`(N, D)`.
"""
lengths = graph.batch_num_nodes
lengths = graph.batch_num_nodes()
for layer in self.layers:
feat = layer(feat, lengths)
return feat
......@@ -626,7 +626,7 @@ class SetTransformerDecoder(nn.Module):
The output feature with shape :math:`(B, D)`, where
:math:`B` refers to the batch size.
"""
len_pma = graph.batch_num_nodes
len_pma = graph.batch_num_nodes()
len_sab = [self.k] * graph.batch_size
feat = self.pma(feat, len_pma)
for layer in self.layers:
......
......@@ -43,13 +43,11 @@ class EdgeSoftmax(th.autograd.Function):
# remember to save the graph to backward cache before making it
# a local variable
if not is_all(eids):
g = g.edge_subgraph(eids.long())
g = g.edge_subgraph(eids.type(g.idtype), preserve_nodes=True)
score_max = F.copy_e_max(g, score)
score = th.exp(F.e_sub_v(g, score, score_max))
score_sum = F.copy_e_sum(g, score)
out = F.e_div_v(g, score, score_sum)
ctx.backward_cache = g
ctx.save_for_backward(out)
return out
......
......@@ -214,7 +214,7 @@ class RelGraphConv(layers.Layer):
tf.Tensor
New node features.
"""
assert g.is_homograph(), \
assert g.is_homogeneous(), \
"not a homograph; convert it with to_homo and pass in the edge type as argument"
with g.local_scope():
g.ndata['h'] = x
......
......@@ -148,7 +148,7 @@ class SortPooling(layers.Layer):
feat = tf.sort(feat, -1)
graph.ndata['h'] = feat
# Sort nodes according to their last features.
ret = tf.reshape(topk_nodes(graph, 'h', self.k, idx=-1)[0], (
ret = tf.reshape(topk_nodes(graph, 'h', self.k, sortby=-1)[0], (
-1, self.k * feat.shape[-1]))
return ret
......
......@@ -2,7 +2,7 @@
# pylint: disable= no-member, arguments-differ
import tensorflow as tf
from ... import function as fn
from ...sparse import _gspmm, _gsddmm
from ...base import ALL, is_all
__all__ = ['edge_softmax']
......@@ -11,25 +11,18 @@ __all__ = ['edge_softmax']
def edge_softmax_real(graph, score, eids=ALL):
"""Edge Softmax function"""
if not is_all(eids):
graph = graph.edge_subgraph(tf.cast(eids, tf.int64))
with graph.local_scope():
graph.edata['s'] = score
graph.update_all(fn.copy_e('s', 'm'), fn.max('m', 'smax'))
graph.apply_edges(fn.e_sub_v('s', 'smax', 'out'))
graph.edata['out'] = tf.math.exp(graph.edata['out'])
graph.update_all(fn.copy_e('out', 'm'), fn.sum('m', 'out_sum'))
graph.apply_edges(fn.e_div_v('out', 'out_sum', 'out'))
out = graph.edata['out']
graph = graph.edge_subgraph(tf.cast(eids, graph.idtype))
gidx = graph._graph
score_max = _gspmm(gidx, 'copy_rhs', 'max', None, score)[0]
score = tf.math.exp(_gsddmm(gidx, 'sub', score, score_max, 'e', 'v'))
score_sum = _gspmm(gidx, 'copy_rhs', 'sum', None, score)[0]
out = _gsddmm(gidx, 'div', score, score_sum, 'e', 'v')
def edge_softmax_backward(grad_out):
with graph.local_scope():
# clear backward cache explicitly
graph.edata['out'] = out
graph.edata['grad_s'] = out * grad_out
graph.update_all(fn.copy_e('grad_s', 'm'), fn.sum('m', 'accum'))
graph.apply_edges(fn.e_mul_v('out', 'accum', 'out'))
grad_score = graph.edata['grad_s'] - graph.edata['out']
return grad_score
sds = out * grad_out
accum = _gspmm(gidx, 'copy_rhs', 'sum', None, sds)[0]
grad_score = sds - _gsddmm(gidx, 'mul', out, accum, 'e', 'v')
return grad_score
return out, edge_softmax_backward
......
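The rewritten backward computes grad = sds - out * accum, where sds = out * grad_out and accum is the per-destination sum of sds. That is the softmax Jacobian applied independently per destination node; for an edge e with v = dst(e) and normalized score z_e,

\frac{\partial L}{\partial x_e}
  = \sum_{e' : \mathrm{dst}(e') = v} \frac{\partial L}{\partial z_{e'}} \frac{\partial z_{e'}}{\partial x_e}
  = z_e \frac{\partial L}{\partial z_e} - z_e \sum_{e' : \mathrm{dst}(e') = v} z_{e'} \frac{\partial L}{\partial z_{e'}},

so with sds_e = z_e \, \partial L / \partial z_e and accum_v = \sum_{e'} sds_{e'}, the gradient is exactly sds_e - z_e \cdot accum_v.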
......@@ -3,7 +3,7 @@ from __future__ import absolute_import
from ._ffi.object import register_object, ObjectBase
from ._ffi.function import _init_api
from .base import ALL, is_all, DGLError
from .base import ALL, is_all, DGLError, dgl_warning
from . import backend as F
from .frame import Frame, FrameRef
from .graph import DGLBaseGraph
......@@ -89,13 +89,15 @@ class NodeFlow(DGLBaseGraph):
Parameters
----------
parent : DGLGraph
parent : DGLGraphStale
The parent graph.
nfobj : NodeFlowObject
The nodeflow object
"""
def __init__(self, parent, nfobj):
super(NodeFlow, self).__init__(nfobj.graph)
dgl_warning('NodeFlow APIs are deprecated starting from v0.5. Please read our'
' guide<link> for how to use the new sampling APIs.')
self._parent = parent
self._node_mapping = utils.toindex(nfobj.node_mapping)
self._edge_mapping = utils.toindex(nfobj.edge_mapping)
......@@ -891,7 +893,7 @@ class NodeFlow(DGLBaseGraph):
def block_compute(self, block_id, message_func="default", reduce_func="default",
apply_node_func="default", v=ALL, inplace=False):
"""Perform the computation on the specified block. It's similar to `pull`
in DGLGraph.
in DGLGraphStale.
On the given block i, it runs `pull` on nodes in layer i+1, which generates
messages on edges in block i, runs the reduce function and node update
function on nodes in layer i+1.
......
......@@ -112,8 +112,8 @@ def partition_graph_with_halo(g, node_part, extra_cached_hops, reshuffle=False):
# This creates a subgraph from subgraphs returned from the CAPI above.
def create_subgraph(subg, induced_nodes, induced_edges):
subg1 = DGLHeteroGraph(gidx=subg.graph, ntypes=['_N'], etypes=['_E'])
subg1.ndata[NID] = induced_nodes[0].tousertensor()
subg1.edata[EID] = induced_edges[0].tousertensor()
subg1.ndata[NID] = induced_nodes[0]
subg1.edata[EID] = induced_edges[0]
return subg1
for i, subg in enumerate(subgs):
......
"""Module for message propagation."""
from __future__ import absolute_import
from . import backend as F
from . import traversal as trv
from .heterograph import DGLHeteroGraph
......@@ -86,7 +87,11 @@ def prop_nodes_bfs(graph,
'DGLGraph is deprecated, Please use DGLHeteroGraph'
assert len(graph.canonical_etypes) == 1, \
'prop_nodes_bfs only support homogeneous graph'
nodes_gen = trv.bfs_nodes_generator(graph, source, reverse)
# TODO(murphy): Graph traversal currently is only supported on
# CPP graphs. Move graph to CPU as a workaround,
# which should be fixed in the future.
nodes_gen = trv.bfs_nodes_generator(graph.cpu(), source, reverse)
nodes_gen = [F.copy_to(frontier, graph.device) for frontier in nodes_gen]
prop_nodes(graph, nodes_gen, message_func, reduce_func, apply_node_func)
def prop_nodes_topo(graph,
......@@ -117,7 +122,11 @@ def prop_nodes_topo(graph,
'DGLGraph is deprecated, Please use DGLHeteroGraph'
assert len(graph.canonical_etypes) == 1, \
'prop_nodes_topo only support homogeneous graph'
nodes_gen = trv.topological_nodes_generator(graph, reverse)
# TODO(murphy): Graph traversal currently is only supported on
# CPP graphs. Move graph to CPU as a workaround,
# which should be fixed in the future.
nodes_gen = trv.topological_nodes_generator(graph.cpu(), reverse)
nodes_gen = [F.copy_to(frontier, graph.device) for frontier in nodes_gen]
prop_nodes(graph, nodes_gen, message_func, reduce_func, apply_node_func)
def prop_edges_dfs(graph,
......@@ -157,7 +166,11 @@ def prop_edges_dfs(graph,
'DGLGraph is deprecated, Please use DGLHeteroGraph'
assert len(graph.canonical_etypes) == 1, \
'prop_edges_dfs only support homogeneous graph'
# TODO(murphy): Graph traversal currently is only supported on
# CPP graphs. Move graph to CPU as a workaround,
# which should be fixed in the future.
edges_gen = trv.dfs_labeled_edges_generator(
graph, source, reverse, has_reverse_edge, has_nontree_edge,
graph.cpu(), source, reverse, has_reverse_edge, has_nontree_edge,
return_labels=False)
edges_gen = [F.copy_to(frontier, graph.device) for frontier in edges_gen]
prop_edges(graph, edges_gen, message_func, reduce_func, apply_node_func)
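The traversal workaround above runs the BFS/topological/DFS generators on a CPU copy of the graph and copies each frontier back to the graph's device. A minimal sketch of the same pattern, assuming DGL's traversal API; the toy graph and printed frontiers are illustrative:

import dgl
import torch as th

g = dgl.graph(([0, 0, 1], [1, 2, 3]))          # small DAG: 0 -> {1, 2}, 1 -> 3
# g = g.to(th.device('cuda:0'))                # traversal kernels are CPU-only
frontiers = dgl.bfs_nodes_generator(g.cpu(), 0)
frontiers = [f.to(g.device) for f in frontiers]
print(frontiers)   # roughly [tensor([0]), tensor([1, 2]), tensor([3])]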
"""Classes and functions for batching multiple graphs together."""
from __future__ import absolute_import
import numpy as np
from .base import DGLError
from . import backend as F
from . import segment
__all__ = ['sum_nodes', 'sum_edges', 'mean_nodes', 'mean_edges',
__all__ = ['readout_nodes', 'readout_edges',
'sum_nodes', 'sum_edges', 'mean_nodes', 'mean_edges',
'max_nodes', 'max_edges', 'softmax_nodes', 'softmax_edges',
'broadcast_nodes', 'broadcast_edges', 'topk_nodes', 'topk_edges']
READOUT_ON_ATTRS = {
'nodes': ('ndata', 'batch_num_nodes', 'number_of_nodes'),
'edges': ('edata', 'batch_num_edges', 'number_of_edges'),
}
def readout_nodes(graph, feat, weight=None, *, op='sum', ntype=None):
"""Generate a graph-level representation by aggregating node features
:attr:`feat`.
def _sum_on(graph, typestr, feat, weight):
"""Internal function to sum node or edge features.
Parameters
----------
graph : DGLGraph
The graph.
typestr : str
'nodes' or 'edges'
feat : str
The feature field name.
weight : str
The weight field name.
Returns
-------
tensor
The (weighted) summed node or edge features.
"""
data_attr, batch_num_objs_attr, _ = READOUT_ON_ATTRS[typestr]
data = getattr(graph, data_attr)
feat = data[feat]
if weight is not None:
weight = data[weight]
weight = F.reshape(weight, (-1,) + (1,) * (F.ndim(feat) - 1))
feat = weight * feat
n_graphs = graph.batch_size
batch_num_objs = getattr(graph, batch_num_objs_attr)
seg_id = F.zerocopy_from_numpy(np.arange(n_graphs, dtype='int64').repeat(batch_num_objs))
seg_id = F.copy_to(seg_id, F.context(feat))
y = F.unsorted_1d_segment_sum(feat, seg_id, n_graphs, 0)
return y
def sum_nodes(graph, feat, weight=None):
"""Sums all the values of node field :attr:`feat` in :attr:`graph`, optionally
multiplies the field by a scalar node field :attr:`weight`.
The function is commonly used as a *readout* function on a batch of graphs
to generate graph-level representation. Thus, the result tensor shape
depends on the batch size of the input graph. Given a graph of batch size
:math:`B`, and a feature size of :math:`D`, the result shape will be
:math:`(B, D)`, with each row being the aggregated node features of each
graph.
Parameters
----------
graph : DGLGraph.
The graph.
Input graph.
feat : str
The feature field.
Node feature name.
weight : str, optional
The weight field. If None, no weighting will be performed,
Node weight name. If None, no weighting will be performed,
otherwise, weight each node feature with field :attr:`feat`.
for summation. The weight feature associated in the :attr:`graph`
should be a tensor of shape ``[graph.number_of_nodes(), 1]``.
for aggregation. The weight feature shape must be compatible with
an element-wise multiplication with the feature tensor.
op : str, optional
Readout operator. Can be 'sum', 'max', 'min', 'mean'.
ntype : str, optional
Node type. Can be omitted if there is only one node type in the graph.
Returns
-------
tensor
The summed tensor.
Notes
-----
Return a stacked tensor with an extra first dimension whose size equals
batch size of the input graph.
The i-th row of the stacked tensor contains the readout result of the
i-th graph in the batched graph. If a graph has no nodes,
a zero tensor with the same shape is returned at the corresponding row.
Result tensor.
Examples
--------
......@@ -88,177 +51,73 @@ def sum_nodes(graph, feat, weight=None):
Create two :class:`~dgl.DGLGraph` objects and initialize their
node features.
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(2)
>>> g1.ndata['h'] = th.tensor([[1.], [2.]])
>>> g1.ndata['w'] = th.tensor([[3.], [6.]])
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(3)
>>> g2.ndata['h'] = th.tensor([[1.], [2.], [3.]])
Sum over node attribute :attr:`h` without weighting for each graph in a
batched graph.
>>> bg = dgl.batch([g1, g2], node_attrs='h')
>>> dgl.sum_nodes(bg, 'h')
tensor([[3.], # 1 + 2
[6.]]) # 1 + 2 + 3
Sum node attribute :attr:`h` with weight from node attribute :attr:`w`
for a single graph.
>>> dgl.sum_nodes(g1, 'h', 'w')
tensor([[15.]]) # 1 * 3 + 2 * 6
See Also
--------
mean_nodes
sum_edges
mean_edges
"""
return _sum_on(graph, 'nodes', feat, weight)
def sum_edges(graph, feat, weight=None):
"""Sums all the values of edge field :attr:`feat` in :attr:`graph`,
optionally multiplies the field by a scalar edge field :attr:`weight`.
Parameters
----------
graph : DGLGraph
The graph.
feat : str
The feature field.
weight : str, optional
The weight field. If None, no weighting will be performed,
otherwise, weight each edge feature with field :attr:`feat`.
for summation. The weight feature associated in the :attr:`graph`
should be a tensor of shape ``[graph.number_of_edges(), 1]``.
Returns
-------
tensor
The summed tensor.
Notes
-----
Return a stacked tensor with an extra first dimension whose size equals
batch size of the input graph.
The i-th row of the stacked tensor contains the readout result of the
i-th graph in the batched graph. If a graph has no edges,
a zero tensor with the same shape is returned at the corresponding row.
Examples
--------
>>> g1 = dgl.graph(([0, 1], [1, 0])) # Graph 1
>>> g1.ndata['h'] = th.tensor([1., 2.])
>>> g2 = dgl.graph(([0, 1], [1, 2])) # Graph 2
>>> g2.ndata['h'] = th.tensor([1., 2., 3.])
>>> import dgl
>>> import torch as th
Sum over one graph:
Create two :class:`~dgl.DGLGraph` objects and initialize their
edge features.
>>> dgl.readout_nodes(g1, 'h')
tensor([3.]) # 1 + 2
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(2)
>>> g1.add_edges([0, 1], [1, 0])
>>> g1.edata['h'] = th.tensor([[1.], [2.]])
>>> g1.edata['w'] = th.tensor([[3.], [6.]])
Sum over a batched graph:
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(3)
>>> g2.add_edges([0, 1, 2], [1, 2, 0])
>>> g2.edata['h'] = th.tensor([[1.], [2.], [3.]])
>>> bg = dgl.batch([g1, g2])
>>> dgl.readout_nodes(bg, 'h')
tensor([3., 6.]) # [1 + 2, 1 + 2 + 3]
Sum over edge attribute :attr:`h` without weighting for each graph in a
batched graph.
Weighted sum:
>>> bg = dgl.batch([g1, g2], edge_attrs='h')
>>> dgl.sum_edges(bg, 'h')
tensor([[3.], # 1 + 2
[6.]]) # 1 + 2 + 3
>>> bg.ndata['w'] = th.tensor([.1, .2, .1, .5, .2])
>>> dgl.readout_nodes(bg, 'h', 'w')
tensor([.5, 1.7])
Sum edge attribute :attr:`h` with weight from edge attribute :attr:`w`
for a single graph.
Readout by max:
>>> dgl.sum_edges(g1, 'h', 'w')
tensor([[15.]]) # 1 * 3 + 2 * 6
>>> dgl.readout_nodes(bg, 'h', op='max')
tensor([2., 3.])
See Also
--------
sum_nodes
mean_nodes
mean_edges
readout_edges
"""
return _sum_on(graph, 'edges', feat, weight)
def _mean_on(graph, typestr, feat, weight):
"""Internal function to sum node or edge features.
Parameters
----------
graph : DGLGraph
The graph.
typestr : str
'nodes' or 'edges'
feat : str
The feature field name.
weight : str
The weight field name.
Returns
-------
tensor
The (weighted) summed node or edge features.
"""
data_attr, batch_num_objs_attr, _ = READOUT_ON_ATTRS[typestr]
data = getattr(graph, data_attr)
feat = data[feat]
x = graph.nodes[ntype].data[feat]
if weight is not None:
weight = data[weight]
weight = F.reshape(weight, (-1,) + (1,) * (F.ndim(feat) - 1))
feat = weight * feat
n_graphs = graph.batch_size
batch_num_objs = getattr(graph, batch_num_objs_attr)
seg_id = F.zerocopy_from_numpy(np.arange(n_graphs, dtype='int64').repeat(batch_num_objs))
seg_id = F.copy_to(seg_id, F.context(feat))
if weight is not None:
w = F.unsorted_1d_segment_sum(weight, seg_id, n_graphs, 0)
y = F.unsorted_1d_segment_sum(feat, seg_id, n_graphs, 0)
y = y / w
else:
y = F.unsorted_1d_segment_mean(feat, seg_id, n_graphs, 0)
return y
x = x * graph.nodes[ntype].data[weight]
return segment.segment_reduce(graph.batch_num_nodes(ntype), x, reducer=op)
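readout_nodes now delegates to dgl.segment.segment_reduce over batch_num_nodes. A plain-PyTorch sketch of the same segmented sum (illustrative values, not the dgl.segment kernel):

import torch as th

seglen = [2, 3]                                   # nodes per graph in the batch
x = th.tensor([[1.], [2.], [1.], [2.], [3.]])     # stacked node features
out = th.stack([part.sum(0) for part in x.split(seglen)])
print(out)   # tensor([[3.], [6.]]) — per-graph sums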
def mean_nodes(graph, feat, weight=None):
"""Averages all the values of node field :attr:`feat` in :attr:`graph`,
optionally multiplies the field by a scalar node field :attr:`weight`.
def readout_edges(graph, feat, weight=None, *, op='sum', etype=None):
"""Sum the edge feature :attr:`feat` in :attr:`graph`, optionally
multiplies it by a edge :attr:`weight`.
The function is commonly used as a *readout* function on a batch of graphs
to generate graph-level representation. Thus, the result tensor shape
depends on the batch size of the input graph. Given a graph of batch size
:math:`B`, and a feature size of :math:`D`, the result shape will be
:math:`(B, D)`, with each row being the aggregated edge features of each
graph.
Parameters
----------
graph : DGLGraph
The graph.
graph : DGLGraph.
Input graph.
feat : str
The feature field.
Edge feature name.
weight : str, optional
The weight field. If None, no weighting will be performed,
otherwise, weight each node feature with field :attr:`feat`.
for calculating mean. The weight feature associated in the :attr:`graph`
should be a tensor of shape ``[graph.number_of_nodes(), 1]``.
Edge weight name. If None, no weighting will be performed,
otherwise, weight each edge feature with field :attr:`feat`.
for summation. The weight feature shape must be compatible with
an element-wise multiplication with the feature tensor.
op : str, optional
Readout operator. Can be 'sum', 'max', 'min', 'mean'.
etype : str, tuple of str, optional
Edge type. Can be omitted if there is only one edge type in the graph.
Returns
-------
tensor
The averaged tensor.
Notes
-----
Return a stacked tensor with an extra first dimension whose size equals
batch size of the input graph.
The i-th row of the stacked tensor contains the readout result of
the i-th graph in the batch. If a graph has no nodes,
a zero tensor with the same shape is returned at the corresponding row.
Result tensor.
Examples
--------
......@@ -267,302 +126,100 @@ def mean_nodes(graph, feat, weight=None):
>>> import torch as th
Create two :class:`~dgl.DGLGraph` objects and initialize their
node features.
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(2)
>>> g1.ndata['h'] = th.tensor([[1.], [2.]])
>>> g1.ndata['w'] = th.tensor([[3.], [6.]])
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(3)
>>> g2.ndata['h'] = th.tensor([[1.], [2.], [3.]])
Average over node attribute :attr:`h` without weighting for each graph in a
batched graph.
>>> bg = dgl.batch([g1, g2], node_attrs='h')
>>> dgl.mean_nodes(bg, 'h')
tensor([[1.5000], # (1 + 2) / 2
[2.0000]]) # (1 + 2 + 3) / 3
Sum node attribute :attr:`h` with normalized weight from node attribute :attr:`w`
for a single graph.
>>> dgl.mean_nodes(g1, 'h', 'w') # h1 * (w1 / (w1 + w2)) + h2 * (w2 / (w1 + w2))
tensor([[1.6667]]) # 1 * (3 / (3 + 6)) + 2 * (6 / (3 + 6))
See Also
--------
sum_nodes
sum_edges
mean_edges
"""
return _mean_on(graph, 'nodes', feat, weight)
def mean_edges(graph, feat, weight=None):
"""Averages all the values of edge field :attr:`feat` in :attr:`graph`,
optionally multiplies the field by a scalar edge field :attr:`weight`.
Parameters
----------
graph : DGLGraph
The graph.
feat : str
The feature field.
weight : optional, str
The weight field. If None, no weighting will be performed,
otherwise, weight each edge feature with field :attr:`feat`.
for calculating mean. The weight feature associated in the :attr:`graph`
should be a tensor of shape ``[graph.number_of_edges(), 1]``.
edge features.
Returns
-------
tensor
The averaged tensor.
>>> g1 = dgl.graph(([0, 1], [1, 0])) # Graph 1
>>> g1.edata['h'] = th.tensor([1., 2.])
>>> g2 = dgl.graph(([0, 1], [1, 2])) # Graph 2
>>> g2.edata['h'] = th.tensor([2., 3.])
Notes
-----
Return a stacked tensor with an extra first dimension whose size equals
batch size of the input graph.
The i-th row of the stacked tensor contains the readout result of
the i-th graph in the batched graph. If a graph has no edges,
a zero tensor with the same shape is returned at the corresponding row.
Sum over one graph:
Examples
--------
>>> dgl.readout_edges(g1, 'h')
tensor([3.]) # 1 + 2
>>> import dgl
>>> import torch as th
Create two :class:`~dgl.DGLGraph` objects and initialize their
edge features.
Sum over a batched graph:
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(2)
>>> g1.add_edges([0, 1], [1, 0])
>>> g1.edata['h'] = th.tensor([[1.], [2.]])
>>> g1.edata['w'] = th.tensor([[3.], [6.]])
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(3)
>>> g2.add_edges([0, 1, 2], [1, 2, 0])
>>> g2.edata['h'] = th.tensor([[1.], [2.], [3.]])
>>> bg = dgl.batch([g1, g2])
>>> dgl.readout_edges(bg, 'h')
tensor([3., 5.]) # [1 + 2, 2 + 3]
Average over edge attribute :attr:`h` without weighting for each graph in a
batched graph.
Weighted sum:
>>> bg = dgl.batch([g1, g2], edge_attrs='h')
>>> dgl.mean_edges(bg, 'h')
tensor([[1.5000], # (1 + 2) / 2
[2.0000]]) # (1 + 2 + 3) / 3
>>> bg.edata['w'] = th.tensor([.1, .2, .1, .5])
>>> dgl.readout_edges(bg, 'h', 'w')
tensor([.5, 1.7])
Sum edge attribute :attr:`h` with normalized weight from edge attribute :attr:`w`
for a single graph.
Readout by max:
>>> dgl.mean_edges(g1, 'h', 'w') # h1 * (w1 / (w1 + w2)) + h2 * (w2 / (w1 + w2))
tensor([[1.6667]]) # 1 * (3 / (3 + 6)) + 2 * (6 / (3 + 6))
>>> dgl.readout_edges(bg, 'h', op='max')
tensor([2., 3.])
See Also
--------
sum_nodes
mean_nodes
sum_edges
readout_nodes
"""
return _mean_on(graph, 'edges', feat, weight)
def _max_on(graph, typestr, feat):
"""Internal function to take elementwise maximum
over node or edge features.
Parameters
----------
graph : DGLGraph
The graph.
typestr : str
'nodes' or 'edges'
feat : str
The feature field name.
x = graph.edges[etype].data[feat]
if weight is not None:
x = x * graph.edges[etype].data[weight]
return segment.segment_reduce(graph.batch_num_edges(etype), x, reducer=op)
Returns
-------
tensor
The (weighted) summed node or edge features.
def sum_nodes(graph, feat, weight=None, *, ntype=None):
"""Syntax sugar for ``dgl.readout_nodes(graph, feat, weight, ntype=ntype, op='sum')``.
"""
data_attr, batch_num_objs_attr, _ = READOUT_ON_ATTRS[typestr]
data = getattr(graph, data_attr)
feat = data[feat]
# TODO: the current solution pads the different graph sizes to the same,
# a more efficient way is to use segment max, we need to implement it in
# the future.
batch_num_objs = getattr(graph, batch_num_objs_attr)
feat = F.pad_packed_tensor(feat, batch_num_objs, -float('inf'))
return F.max(feat, 1)
def _softmax_on(graph, typestr, feat):
"""Internal function of applying batch-wise graph-level softmax
over node or edge features of a given field.
return readout_nodes(graph, feat, weight, ntype=ntype, op='sum')
Parameters
----------
graph : DGLGraph
The graph
typestr : str
'nodes' or 'edges'
feat : str
The feature field name.
Returns
-------
tensor
The obtained tensor.
def sum_edges(graph, feat, weight=None, *, etype=None):
"""Syntax sugar for ``dgl.readout_edges(graph, feat, weight, etype=etype, op='sum')``.
"""
data_attr, batch_num_objs_attr, _ = READOUT_ON_ATTRS[typestr]
data = getattr(graph, data_attr)
feat = data[feat]
# TODO: the current solution pads the different graph sizes to the same,
# a more efficient way is to use segment sum/max, we need to implement
# it in the future.
batch_num_objs = getattr(graph, batch_num_objs_attr)
feat = F.pad_packed_tensor(feat, batch_num_objs, -float('inf'))
feat = F.softmax(feat, 1)
return F.pack_padded_tensor(feat, batch_num_objs)
def _broadcast_on(graph, typestr, feat_data):
"""Internal function of broadcasting features to all nodes/edges.
Parameters
----------
graph : DGLGraph
The graph
typestr : str
'nodes' or 'edges'
feat_data : tensor
The feature to broadcast. Tensor shape is :math:`(*)` for single graph,
and :math:`(B, *)` for batched graph.
return readout_edges(graph, feat, weight, etype=etype, op='sum')
Returns
-------
tensor
The node/edge features tensor with shape :math:`(N, *)`.
def mean_nodes(graph, feat, weight=None, *, ntype=None):
"""Syntax sugar for ``dgl.readout_nodes(graph, feat, weight, ntype=ntype, op='mean')``.
"""
_, batch_num_objs_attr, _ = READOUT_ON_ATTRS[typestr]
batch_num_objs = getattr(graph, batch_num_objs_attr)
index = []
for i, num_obj in enumerate(batch_num_objs):
index.extend([i] * num_obj)
ctx = F.context(feat_data)
index = F.copy_to(F.tensor(index), ctx)
return F.gather_row(feat_data, index)
def _topk_on(graph, typestr, feat, k, descending=True, idx=None):
"""Internal function to take graph-wise top-k node/edge features of
field :attr:`feat` in :attr:`graph` ranked by keys at given
index :attr:`idx`. If :attr:`descending` is set to False, return the
k smallest elements instead.
If idx is set to None, the function would return top-k value of all
indices, which is equivalent to calling `th.topk(graph.ndata[feat], dim=0)`
for each single graph of the input batched-graph.
Parameters
---------
graph : DGLGraph
The graph
typestr : str
'nodes' or 'edges'
feat : str
The feature field name.
k : int
The :math:`k` in "top-:math`k`".
descending : bool
Controls whether to return the largest or smallest elements,
defaults to True.
idx : int or None, defaults to None
The key index we sort :attr:`feat` on, if set to None, we sort
the whole :attr:`feat`.
Returns
-------
tuple of tensors:
The first tensor returns top-k features of each single graph of
the input graph:
a tensor with shape :math:`(B, K, D)` would be returned, where
:math:`B` is the batch size of the input graph.
The second tensor returns the top-k indices of each single graph
of the input graph:
a tensor with shape :math:`(B, K)`(:math:`(B, K, D)` if` idx
is set to None) would be returned, where
:math:`B` is the batch size of the input graph.
return readout_nodes(graph, feat, weight, ntype=ntype, op='mean')
Notes
-----
If an example has :math:`n` nodes/edges and :math:`n<k`, in the first
returned tensor the :math:`n+1` to :math:`k`th rows would be padded
with all zero; in the second returned tensor, the behavior of :math:`n+1`
to :math:`k`th elements is not defined.
def mean_edges(graph, feat, weight=None, *, etype=None):
"""Syntax sugar for ``dgl.readout_edges(graph, feat, weight, etype=etype, op='mean')``.
"""
data_attr, batch_num_objs_attr, _ = READOUT_ON_ATTRS[typestr]
data = getattr(graph, data_attr)
if F.ndim(data[feat]) > 2:
raise DGLError('The {} feature `{}` should have dimension less than or'
' equal to 2'.format(typestr, feat))
return readout_edges(graph, feat, weight, etype=etype, op='mean')
feat = data[feat]
hidden_size = F.shape(feat)[-1]
batch_num_objs = getattr(graph, batch_num_objs_attr)
batch_size = len(batch_num_objs)
length = max(max(batch_num_objs), k)
fill_val = -float('inf') if descending else float('inf')
feat_ = F.pad_packed_tensor(feat, batch_num_objs, fill_val, l_min=k)
if idx is not None:
keys = F.squeeze(F.slice_axis(feat_, -1, idx, idx+1), -1)
order = F.argsort(keys, -1, descending=descending)
else:
order = F.argsort(feat_, 1, descending=descending)
def max_nodes(graph, feat, weight=None, *, ntype=None):
"""Syntax sugar for ``dgl.readout_nodes(graph, feat, weight, ntype=ntype, op='max')``.
"""
return readout_nodes(graph, feat, weight, ntype=ntype, op='max')
topk_indices = F.slice_axis(order, 1, 0, k)
def max_edges(graph, feat, weight=None, *, etype=None):
"""Syntax sugar for ``dgl.readout_edges(graph, feat, weight, etype=etype, op='max')``.
"""
return readout_edges(graph, feat, weight, etype=etype, op='max')
# zero padding
feat_ = F.pad_packed_tensor(feat, batch_num_objs, 0, l_min=k)
def softmax_nodes(graph, feat, *, ntype=None):
r"""Perform graph-wise softmax on the node features.
if idx is not None:
feat_ = F.reshape(feat_, (batch_size * length, -1))
shift = F.repeat(F.arange(0, batch_size) * length, k, -1)
shift = F.copy_to(shift, F.context(feat))
topk_indices_ = F.reshape(topk_indices, (-1,)) + shift
else:
feat_ = F.reshape(feat_, (-1,))
shift = F.repeat(F.arange(0, batch_size), k * hidden_size, -1) * length * hidden_size +\
F.cat([F.arange(0, hidden_size)] * batch_size * k, -1)
shift = F.copy_to(shift, F.context(feat))
topk_indices_ = F.reshape(topk_indices, (-1,)) * hidden_size + shift
For each node :math:`v\in\mathcal{V}` and its feature :math:`x_v`,
calculate its normalized feature as follows:
return F.reshape(F.gather_row(feat_, topk_indices_), (batch_size, k, -1)),\
topk_indices
.. math::
z_v = \frac{\exp(x_v)}{\sum_{u\in\mathcal{V}}\exp(x_u)}
def max_nodes(graph, feat):
"""Take elementwise maximum over all the values of node field
:attr:`feat` in :attr:`graph`
If the graph is a batch of multiple graphs, each graph computes softmax
independently. The result tensor has the same shape as the original node
feature.
Parameters
----------
graph : DGLGraph
The graph.
graph : DGLGraph.
Input graph.
feat : str
The feature field.
Node feature name.
ntype : str, optional
Node type. Can be omitted if there is only one node type in the graph.
Returns
-------
tensor
The tensor obtained.
Result tensor.
Examples
--------
......@@ -573,167 +230,55 @@ def max_nodes(graph, feat):
Create two :class:`~dgl.DGLGraph` objects and initialize their
node features.
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(2)
>>> g1.ndata['h'] = th.tensor([[1.], [2.]])
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(3)
>>> g2.ndata['h'] = th.tensor([[1.], [2.], [3.]])
Max over node attribute :attr:`h` in a batched graph.
>>> bg = dgl.batch([g1, g2], node_attrs='h')
>>> dgl.max_nodes(bg, 'h')
tensor([[2.], # max(1, 2)
[3.]]) # max(1, 2, 3)
Max over node attribute :attr:`h` in a single graph.
>>> g1 = dgl.graph(([0, 1], [1, 0])) # Graph 1
>>> g1.ndata['h'] = th.tensor([1., 1.])
>>> g2 = dgl.graph(([0, 1], [1, 2])) # Graph 2
>>> g2.ndata['h'] = th.tensor([1., 1., 1.])
>>> dgl.max_nodes(g1, 'h')
tensor([[2.]])
Softmax over one graph:
Notes
-----
Return a stacked tensor with an extra first dimension whose size equals
batch size of the input graph.
The i-th row of the stacked tensor contains the readout result of
the i-th graph in the batched graph. If a graph has no nodes,
a tensor filled with -inf of the same shape is returned at the
corresponding row.
"""
return _max_on(graph, 'nodes', feat)
def max_edges(graph, feat):
"""Take elementwise maximum over all the values of edge field
:attr:`feat` in :attr:`graph`
>>> dgl.softmax_nodes(g1, 'h')
tensor([.5000, .5000])
Parameters
----------
graph : DGLGraph
The graph.
feat : str
The feature field.
Softmax over a batched graph:
Returns
-------
tensor
The tensor obtained.
>>> bg = dgl.batch([g1, g2])
>>> dgl.softmax_nodes(bg, 'h')
tensor([.5000, .5000, .3333, .3333, .3333])
Examples
See Also
--------
>>> import dgl
>>> import torch as th
Create two :class:`~dgl.DGLGraph` objects and initialize their
edge features.
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(2)
>>> g1.add_edges([0, 1], [1, 0])
>>> g1.edata['h'] = th.tensor([[1.], [2.]])
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(3)
>>> g2.add_edges([0, 1, 2], [1, 2, 0])
>>> g2.edata['h'] = th.tensor([[1.], [2.], [3.]])
Max over edge attribute :attr:`h` in a batched graph.
>>> bg = dgl.batch([g1, g2], edge_attrs='h')
>>> dgl.max_edges(bg, 'h')
tensor([[2.], # max(1, 2)
[3.]]) # max(1, 2, 3)
Max over edge attribute :attr:`h` in a single graph.
>>> dgl.max_edges(g1, 'h')
tensor([[2.]])
Notes
-----
Return a stacked tensor with an extra first dimension whose size equals
batch size of the input graph.
The i-th row of the stacked tensor contains the readout result of
the i-th graph in the batched graph. If a graph has no edges,
a tensor filled with -inf of the same shape is returned at the
corresponding row.
softmax_edges
"""
return _max_on(graph, 'edges', feat)
def softmax_nodes(graph, feat):
"""Apply batch-wise graph-level softmax over all the values of node field
:attr:`feat` in :attr:`graph`.
Parameters
----------
graph : DGLGraph
The graph.
feat : str
The feature field.
Returns
-------
tensor
The tensor obtained.
Examples
--------
>>> import dgl
>>> import torch as th
Create two :class:`~dgl.DGLGraph` objects and initialize their
node features.
x = graph.nodes[ntype].data[feat]
return segment.segment_softmax(graph.batch_num_nodes(ntype), x)
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(2)
>>> g1.ndata['h'] = th.tensor([[1., 0.], [2., 0.]])
def softmax_edges(graph, feat, *, etype=None):
r"""Perform graph-wise softmax on the edge features.
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(3)
>>> g2.ndata['h'] = th.tensor([[1., 0.], [2., 0.], [3., 0.]])
For each edge :math:`e\in\mathcal{E}` and its feature :math:`x_e`,
calculate its normalized feature as follows:
Softmax over node attribute :attr:`h` in a batched graph.
>>> bg = dgl.batch([g1, g2], node_attrs='h')
>>> dgl.softmax_nodes(bg, 'h')
tensor([[0.2689, 0.5000], # [0.2689, 0.7311] = softmax([1., 2.])
[0.7311, 0.5000], # [0.5000, 0.5000] = softmax([0., 0.])
[0.0900, 0.3333], # [0.0900, 0.2447, 0.6652] = softmax([1., 2., 3.])
[0.2447, 0.3333], # [0.3333, 0.3333, 0.3333] = softmax([0., 0., 0.])
[0.6652, 0.3333]])
Softmax over node attribute :attr:`h` in a single graph.
>>> dgl.softmax_nodes(g1, 'h')
tensor([[0.2689, 0.5000], # [0.2689, 0.7311] = softmax([1., 2.])
[0.7311, 0.5000]]), # [0.5000, 0.5000] = softmax([0., 0.])
Notes
-----
If the input graph has batch size greater than one, the softmax is applied to
each single graph in the batched graph.
"""
return _softmax_on(graph, 'nodes', feat)
.. math::
z_e = \frac{\exp(x_e)}{\sum_{e'\in\mathcal{E}}\exp(x_{e'})}
def softmax_edges(graph, feat):
"""Apply batch-wise graph-level softmax over all the values of edge field
:attr:`feat` in :attr:`graph`.
If the graph is a batch of multiple graphs, each graph computes softmax
independently. The result tensor has the same shape as the original edge
feature.
Parameters
----------
graph : DGLGraph
The graph.
graph : DGLGraph.
Input graph.
feat : str
The feature field.
Edge feature name.
etype : str, tuple of str, optional
Edge type. Can be omitted if there is only one edge type in the graph.
Returns
-------
tensor
The tensor obtained.
Result tensor.
Examples
--------
......@@ -744,55 +289,56 @@ def softmax_edges(graph, feat):
Create two :class:`~dgl.DGLGraph` objects and initialize their
edge features.
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(2)
>>> g1.add_edges([0, 1], [1, 0])
>>> g1.edata['h'] = th.tensor([[1., 0.], [2., 0.]])
>>> g1 = dgl.graph(([0, 1], [1, 0])) # Graph 1
>>> g1.edata['h'] = th.tensor([1., 1.])
>>> g2 = dgl.graph(([0, 1, 0], [1, 2, 2])) # Graph 2
>>> g2.edata['h'] = th.tensor([1., 1., 1.])
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(3)
>>> g2.add_edges([0, 1, 2], [1, 2, 0])
>>> g2.edata['h'] = th.tensor([[1., 0.], [2., 0.], [3., 0.]])
Softmax over one graph:
Softmax over edge attribute :attr:`h` in a batched graph.
>>> dgl.softmax_edges(g1, 'h')
tensor([.5000, .5000])
Softmax over a batched graph:
>>> bg = dgl.batch([g1, g2], edge_attrs='h')
>>> bg = dgl.batch([g1, g2])
>>> dgl.softmax_edges(bg, 'h')
tensor([[0.2689, 0.5000], # [0.2689, 0.7311] = softmax([1., 2.])
[0.7311, 0.5000], # [0.5000, 0.5000] = softmax([0., 0.])
[0.0900, 0.3333], # [0.0900, 0.2447, 0.6652] = softmax([1., 2., 3.])
[0.2447, 0.3333], # [0.3333, 0.3333, 0.3333] = softmax([0., 0., 0.])
[0.6652, 0.3333]])
tensor([.5000, .5000, .3333, .3333, .3333])
Softmax over edge attribute :attr:`h` in a single graph.
See Also
--------
softmax_nodes
"""
x = graph.edges[etype].data[feat]
return segment.segment_softmax(graph.batch_num_edges(etype), x)
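segment_softmax applies an independent softmax within each graph's segment of the batch; a plain-PyTorch sketch (illustrative, not the dgl.segment kernel):

import torch as th

seglen = [2, 3]                            # edges per graph in the batch
x = th.tensor([1., 1., 1., 1., 1.])
out = th.cat([th.softmax(part, dim=0) for part in x.split(seglen)])
print(out)   # tensor([0.5000, 0.5000, 0.3333, 0.3333, 0.3333])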
>>> dgl.softmax_edges(g1, 'h')
tensor([[0.2689, 0.5000], # [0.2689, 0.7311] = softmax([1., 2.])
[0.7311, 0.5000]]), # [0.5000, 0.5000] = softmax([0., 0.])
def broadcast_nodes(graph, graph_feat, *, ntype=None):
"""Generate a node feature equal to the graph-level feature :attr:`graph_feat`.
Notes
-----
If the input graph has batch size greater than one, the softmax is applied to each
example in the batch.
"""
return _softmax_on(graph, 'edges', feat)
The operation is similar to ``numpy.repeat`` (or ``torch.repeat_interleave``).
It is commonly used to normalize node features by a global vector. For example,
to normalize node features across graph to range :math:`[0~1)`:
def broadcast_nodes(graph, feat_data):
"""Broadcast :attr:`feat_data` to all nodes in :attr:`graph`, and return a
tensor of node features.
>>> g = dgl.batch([...]) # batch multiple graphs
>>> g.ndata['h'] = ... # some node features
>>> h_sum = dgl.broadcast_nodes(g, dgl.sum_nodes(g, 'h'))
>>> g.ndata['h'] /= h_sum # normalize by summation
Parameters
----------
graph : DGLGraph
The graph.
feat_data : tensor
graph_feat : tensor
The feature to broadcast. Tensor shape is :math:`(*)` for single graph, and
:math:`(B, *)` for batched graph.
ntype : str, optional
Node type. Can be omitted if there is only one node type.
Returns
-------
tensor
The node features tensor with shape :math:`(N, *)`.
Tensor
The node features tensor with shape :math:`(N, *)`, where :math:`N` is the
number of nodes.
Examples
--------
......@@ -803,12 +349,8 @@ def broadcast_nodes(graph, feat_data):
Create two :class:`~dgl.DGLGraph` objects and initialize their
node features.
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(2)
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(3)
>>> g1 = dgl.graph(([0], [1])) # Graph 1
>>> g2 = dgl.graph(([0, 1], [1, 2])) # Graph 2
>>> bg = dgl.batch([g1, g2])
>>> feat = th.rand(2, 5)
>>> feat
......@@ -825,34 +367,45 @@ def broadcast_nodes(graph, feat_data):
[0.2721, 0.4629, 0.7269, 0.0724, 0.1014],
[0.2721, 0.4629, 0.7269, 0.0724, 0.1014]])
Broadcast feature to all nodes in the batched graph.
Broadcast feature to all nodes in the single graph.
>>> dgl.broadcast_nodes(g1, feat[0])
tensor([[0.4325, 0.7710, 0.5541, 0.0544, 0.9368],
[0.4325, 0.7710, 0.5541, 0.0544, 0.9368]])
Notes
-----
feat[i] is broadcast to the nodes in i-th graph in the batched graph.
See Also
--------
broadcast_edges
"""
return _broadcast_on(graph, 'nodes', feat_data)
return F.repeat(graph_feat, graph.batch_num_nodes(ntype), dim=0)
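The new broadcast is a single repeat of the per-graph feature over batch_num_nodes; an equivalent plain-PyTorch sketch (illustrative values):

import torch as th

graph_feat = th.tensor([[1., 2.], [3., 4.]])    # one row per graph in the batch
batch_num_nodes = th.tensor([2, 3])
out = th.repeat_interleave(graph_feat, batch_num_nodes, dim=0)
print(out)   # rows 0-1 repeat [1., 2.]; rows 2-4 repeat [3., 4.]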
def broadcast_edges(graph, feat_data):
"""Broadcast :attr:`feat_data` to all edges in :attr:`graph`, and return a
tensor of edge features.
def broadcast_edges(graph, graph_feat, *, etype=None):
"""Generate an edge feature equal to the graph-level feature :attr:`graph_feat`.
The operation is similar to ``numpy.repeat`` (or ``torch.repeat_interleave``).
It is commonly used to normalize edge features by a global vector. For example,
to normalize edge features across graph to range :math:`[0~1)`:
>>> g = dgl.batch([...]) # batch multiple graphs
>>> g.edata['h'] = ... # some edge features
>>> h_sum = dgl.broadcast_edges(g, dgl.sum_edges(g, 'h'))
>>> g.edata['h'] /= h_sum # normalize by summation
Parameters
----------
graph : DGLGraph
The graph.
feat_data : tensor
The feature to broadcast. Tensor shape is :math:`(*)` for single
graph, and :math:`(B, *)` for batched graph.
graph_feat : tensor
The feature to broadcast. Tensor shape is :math:`(*)` for single graph, and
:math:`(B, *)` for batched graph.
etype : str, tuple of str, optional
Edge type. Can be omitted if there is only one edge type in the graph.
Returns
-------
tensor
The edge features tensor with shape :math:`(E, *)`
Tensor
The edge features tensor with shape :math:`(M, *)`, where :math:`M` is the
number of edges.
Examples
--------
......@@ -863,14 +416,8 @@ def broadcast_edges(graph, feat_data):
Create two :class:`~dgl.DGLGraph` objects and initialize their
edge features.
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(2)
>>> g1.add_edges([0, 1], [1, 0])
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(3)
>>> g2.add_edges([0, 1, 2], [1, 2, 0])
>>> g1 = dgl.graph(([0], [1])) # Graph 1
>>> g2 = dgl.graph(([0, 1], [1, 2])) # Graph 2
>>> bg = dgl.batch([g1, g2])
>>> feat = th.rand(2, 5)
>>> feat
......@@ -882,32 +429,119 @@ def broadcast_edges(graph, feat_data):
>>> dgl.broadcast_edges(bg, feat)
tensor([[0.4325, 0.7710, 0.5541, 0.0544, 0.9368],
[0.4325, 0.7710, 0.5541, 0.0544, 0.9368],
[0.2721, 0.4629, 0.7269, 0.0724, 0.1014],
[0.2721, 0.4629, 0.7269, 0.0724, 0.1014],
[0.2721, 0.4629, 0.7269, 0.0724, 0.1014]])
Broadcast feature to all edges in the batched graph.
Broadcast feature to all edges in the single graph.
>>> dgl.broadcast_edges(g2, feat[1])
tensor([[0.2721, 0.4629, 0.7269, 0.0724, 0.1014],
[0.2721, 0.4629, 0.7269, 0.0724, 0.1014]])
See Also
--------
broadcast_nodes
"""
return F.repeat(graph_feat, graph.batch_num_edges(etype), dim=0)
READOUT_ON_ATTRS = {
'nodes': ('ndata', 'batch_num_nodes', 'number_of_nodes'),
'edges': ('edata', 'batch_num_edges', 'number_of_edges'),
}
def _topk_on(graph, typestr, feat, k, descending, sortby, ntype_or_etype):
"""Internal function to take graph-wise top-k node/edge features of
field :attr:`feat` in :attr:`graph` ranked by keys at given
index :attr:`sortby`. If :attr:`descending` is set to False, return the
k smallest elements instead.
Parameters
---------
graph : DGLGraph
The graph
typestr : str
'nodes' or 'edges'
feat : str
The feature field name.
k : int
The :math:`k` in "top-:math`k`".
descending : bool
Controls whether to return the largest or smallest elements,
defaults to True.
sortby : int
The key index we sort :attr:`feat` on, if set to None, we sort
the whole :attr:`feat`.
ntype_or_etype : str, tuple of str
Node/edge type.
Returns
-------
sorted_feat : Tensor
A tensor with shape :math:`(B, K, D)`, where
:math:`B` is the batch size of the input graph.
sorted_idx : Tensor
A tensor with shape :math:`(B, K)`(:math:`(B, K, D)` if sortby
is set to None), where
:math:`B` is the batch size of the input graph, :math:`D`
is the feature size.
>>> dgl.broadcast_edges(g1, feat[0])
tensor([[0.4325, 0.7710, 0.5541, 0.0544, 0.9368],
[0.4325, 0.7710, 0.5541, 0.0544, 0.9368]])
Notes
-----
feat[i] is broadcast to the edges in i-th graph in the batched graph.
If an example has :math:`n` nodes/edges and :math:`n<k`, in the first
returned tensor the :math:`n+1` to :math:`k`th rows would be padded
with all zero; in the second returned tensor, the behavior of :math:`n+1`
to :math:`k`th elements is not defined.
"""
return _broadcast_on(graph, 'edges', feat_data)
_, batch_num_objs_attr, _ = READOUT_ON_ATTRS[typestr]
data = getattr(graph, typestr)[ntype_or_etype].data
if F.ndim(data[feat]) > 2:
raise DGLError('Only support {} feature `{}` with dimension less than or'
' equal to 2'.format(typestr, feat))
feat = data[feat]
hidden_size = F.shape(feat)[-1]
batch_num_objs = getattr(graph, batch_num_objs_attr)(ntype_or_etype)
batch_size = len(batch_num_objs)
length = max(max(F.asnumpy(batch_num_objs)), k)
fill_val = -float('inf') if descending else float('inf')
feat_ = F.pad_packed_tensor(feat, batch_num_objs, fill_val, l_min=k)
if sortby is not None:
keys = F.squeeze(F.slice_axis(feat_, -1, sortby, sortby+1), -1)
order = F.argsort(keys, -1, descending=descending)
else:
order = F.argsort(feat_, 1, descending=descending)
def topk_nodes(graph, feat, k, descending=True, idx=None):
"""Return graph-wise top-k node features of field :attr:`feat` in
:attr:`graph` ranked by keys at given index :attr:`idx`. If :attr:
topk_indices = F.slice_axis(order, 1, 0, k)
# zero padding
feat_ = F.pad_packed_tensor(feat, batch_num_objs, 0, l_min=k)
if sortby is not None:
feat_ = F.reshape(feat_, (batch_size * length, -1))
shift = F.repeat(F.arange(0, batch_size) * length, k, -1)
shift = F.copy_to(shift, F.context(feat))
topk_indices_ = F.reshape(topk_indices, (-1,)) + shift
else:
feat_ = F.reshape(feat_, (-1,))
shift = F.repeat(F.arange(0, batch_size), k * hidden_size, -1) * length * hidden_size +\
F.cat([F.arange(0, hidden_size)] * batch_size * k, -1)
shift = F.copy_to(shift, F.context(feat))
topk_indices_ = F.reshape(topk_indices, (-1,)) * hidden_size + shift
return F.reshape(F.gather_row(feat_, topk_indices_), (batch_size, k, -1)),\
topk_indices
def topk_nodes(graph, feat, k, *, descending=True, sortby=None, ntype=None):
"""Perform a graph-wise top-k on node features :attr:`feat` in
:attr:`graph` by feature at index :attr:`sortby`. If :attr:
`descending` is set to False, return the k smallest elements instead.
If idx is set to None, the function would return top-k value of all
indices, which is equivalent to calling
:code:`torch.topk(graph.ndata[feat], dim=0)`
for each example of the input graph.
If :attr:`sortby` is set to None, the function would perform top-k on
all dimensions independently, equivalent to calling
:code:`torch.topk(graph.ndata[feat], dim=0)`.
Parameters
----------
......@@ -919,22 +553,21 @@ def topk_nodes(graph, feat, k, descending=True, idx=None):
The k in "top-k"
descending : bool
Controls whether to return the largest or smallest elements.
idx : int or None, defaults to None
The index of keys we rank :attr:`feat` on, if set to None, we sort
the whole :attr:`feat`.
sortby : int, optional
Sort according to which feature. If is None, all features are sorted independently.
ntype : str, optional
Node type. Can be omitted if there is only one node type in the graph.
Returns
-------
tuple of tensors
The first tensor returns top-k node features of each single graph of
the input graph:
a tensor with shape :math:`(B, K, D)` would be returned, where
:math:`B` is the batch size of the input graph.
The second tensor returns the top-k node indices of each single graph
of the input graph:
a tensor with shape :math:`(B, K)`(:math:`(B, K, D)` if` idx
is set to None) would be returned, where
sorted_feat : Tensor
A tensor with shape :math:`(B, K, D)`, where
:math:`B` is the batch size of the input graph.
sorted_idx : Tensor
A tensor with shape :math:`(B, K)`(:math:`(B, K, D)` if sortby
is set to None), where
:math:`B` is the batch size of the input graph, :math:`D`
is the feature size.
Examples
--------
......@@ -945,8 +578,7 @@ def topk_nodes(graph, feat, k, descending=True, idx=None):
Create two :class:`~dgl.DGLGraph` objects and initialize their
node features.
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(4)
>>> g1 = dgl.graph(([0, 1], [2, 3])) # Graph 1
>>> g1.ndata['h'] = th.rand(4, 5)
>>> g1.ndata['h']
tensor([[0.0297, 0.8307, 0.9140, 0.6702, 0.3346],
......@@ -954,8 +586,7 @@ def topk_nodes(graph, feat, k, descending=True, idx=None):
[0.0880, 0.6515, 0.4451, 0.7507, 0.5297],
[0.5171, 0.6379, 0.2695, 0.8954, 0.5197]])
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(5)
>>> g2 = dgl.graph(([0, 1, 2], [2, 3, 4])) # Graph 2
>>> g2.ndata['h'] = th.rand(5, 5)
>>> g2.ndata['h']
tensor([[0.3168, 0.3174, 0.5303, 0.0804, 0.3808],
......@@ -966,63 +597,58 @@ def topk_nodes(graph, feat, k, descending=True, idx=None):
Top-k over node attribute :attr:`h` in a batched graph.
>>> bg = dgl.batch([g1, g2], node_attrs='h')
>>> bg = dgl.batch([g1, g2], ndata=['h'])
>>> dgl.topk_nodes(bg, 'h', 3)
(tensor([[[0.5901, 0.8307, 0.9280, 0.8954, 0.7997],
[0.5171, 0.6515, 0.9140, 0.7507, 0.5297],
[0.0880, 0.6379, 0.4451, 0.6893, 0.5197]],
[[0.5065, 0.9105, 0.5692, 0.8489, 0.3872],
[0.3168, 0.5182, 0.5418, 0.6114, 0.3808],
[0.1931, 0.4954, 0.5303, 0.3934, 0.1458]]]), tensor([[[1, 0, 1, 3, 1],
[3, 2, 0, 2, 2],
[2, 3, 2, 1, 3]],
[[4, 2, 2, 2, 4],
[0, 4, 4, 1, 0],
[3, 3, 0, 3, 1]]]))
Top-k over node attribute :attr:`h` along index -1 in a batched graph.
[0.5171, 0.6515, 0.9140, 0.7507, 0.5297],
[0.0880, 0.6379, 0.4451, 0.6893, 0.5197]],
[[0.5065, 0.9105, 0.5692, 0.8489, 0.3872],
[0.3168, 0.5182, 0.5418, 0.6114, 0.3808],
[0.1931, 0.4954, 0.5303, 0.3934, 0.1458]]]), tensor([[[1, 0, 1, 3, 1],
[3, 2, 0, 2, 2],
[2, 3, 2, 1, 3]],
[[4, 2, 2, 2, 4],
[0, 4, 4, 1, 0],
[3, 3, 0, 3, 1]]]))
Top-k over node attribute :attr:`h` along the last dimension in a batched graph.
(used in SortPooling)
>>> dgl.topk_nodes(bg, 'h', 3, idx=-1)
>>> dgl.topk_nodes(bg, 'h', 3, sortby=-1)
(tensor([[[0.5901, 0.3030, 0.9280, 0.6893, 0.7997],
[0.0880, 0.6515, 0.4451, 0.7507, 0.5297],
[0.5171, 0.6379, 0.2695, 0.8954, 0.5197]],
[[0.5065, 0.5182, 0.5418, 0.1520, 0.3872],
[0.3168, 0.3174, 0.5303, 0.0804, 0.3808],
[0.1323, 0.2766, 0.4318, 0.6114, 0.1458]]]), tensor([[1, 2, 3],
[4, 0, 1]]))
[0.0880, 0.6515, 0.4451, 0.7507, 0.5297],
[0.5171, 0.6379, 0.2695, 0.8954, 0.5197]],
[[0.5065, 0.5182, 0.5418, 0.1520, 0.3872],
[0.3168, 0.3174, 0.5303, 0.0804, 0.3808],
[0.1323, 0.2766, 0.4318, 0.6114, 0.1458]]]), tensor([[1, 2, 3],
[4, 0, 1]]))
Top-k over node attribute :attr:`h` in a single graph.
>>> dgl.topk_nodes(g1, 'h', 3)
(tensor([[[0.5901, 0.8307, 0.9280, 0.8954, 0.7997],
[0.5171, 0.6515, 0.9140, 0.7507, 0.5297],
[0.0880, 0.6379, 0.4451, 0.6893, 0.5197]]]), tensor([[[1, 0, 1, 3, 1],
[3, 2, 0, 2, 2],
[2, 3, 2, 1, 3]]]))
[0.5171, 0.6515, 0.9140, 0.7507, 0.5297],
[0.0880, 0.6379, 0.4451, 0.6893, 0.5197]]]), tensor([[[1, 0, 1, 3, 1],
[3, 2, 0, 2, 2],
[2, 3, 2, 1, 3]]]))
Notes
-----
If an example has :math:`n` nodes and :math:`n<k`, in the first
returned tensor the :math:`n+1` to :math:`k`th rows would be padded
with all zero; in the second returned tensor, the behavior of :math:`n+1`
to :math:`k`th elements is not defined.
If an example has :math:`n` nodes and :math:`n<k`, the ``sorted_feat``
tensor will pad the :math:`n+1` to :math:`k`th rows with zeros; the
corresponding entries of ``sorted_idx`` are undefined.
"""
return _topk_on(graph, 'nodes', feat, k, descending=descending, idx=idx)
return _topk_on(graph, 'nodes', feat, k,
descending=descending, sortby=sortby,
ntype_or_etype=ntype)
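As a quick check of the behavior described above, a minimal sketch (assuming the PyTorch backend and a single, non-batched graph): with ``sortby=None`` every feature dimension is ranked independently, so the result should agree with ``torch.topk`` along the node dimension.
import dgl
import torch as th

g = dgl.graph(([0, 1], [2, 3]))          # 4 nodes, same toy graph as above
g.ndata['h'] = th.rand(4, 5)

# sortby=None ranks every feature dimension on its own.
feat, idx = dgl.topk_nodes(g, 'h', 3)            # shapes (1, 3, 5)
ref_feat, _ = th.topk(g.ndata['h'], 3, dim=0)    # shape (3, 5)
assert th.allclose(feat[0], ref_feat)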
def topk_edges(graph, feat, k, descending=True, idx=None):
"""Return graph-wise top-k edge features of field :attr:`feat` in
:attr:`graph` ranked by keys at given index :attr:`idx`. If
:attr:`descending` is set to False, return the k smallest elements
instead.
def topk_edges(graph, feat, k, *, descending=True, sortby=None, etype=None):
"""Perform a graph-wise top-k on node features :attr:`feat` in
:attr:`graph` by feature at index :attr:`sortby`. If :attr:
`descending` is set to False, return the k smallest elements instead.
If idx is set to None, the function would return top-k value of all
indices, which is equivalent to calling
:code:`torch.topk(graph.edata[feat], dim=0)`
for each example of the input graph.
If :attr:`sortby` is set to None, the function would perform top-k on
all dimensions independently, equivalent to calling
:code:`torch.topk(graph.edata[feat], dim=0)`.
Parameters
----------
......@@ -1031,25 +657,24 @@ def topk_edges(graph, feat, k, descending=True, idx=None):
feat : str
The feature field.
k : int
The k in "top-k".
The k in "top-k"
descending : bool
Controls whether to return the largest or smallest elements.
idx : int or None, defaults to None
The key index we sort :attr:`feat` on, if set to None, we sort
the whole :attr:`feat`.
sortby : int, optional
The feature index to sort by. If None, all features are sorted independently.
etype : str or tuple of str, optional
Edge type. Can be omitted if there is only one edge type in the graph.
Returns
-------
tuple of tensors
The first tensor returns top-k edge features of each single graph of
the input graph:
a tensor with shape :math:`(B, K, D)` would be returned, where
:math:`B` is the batch size of the input graph.
The second tensor returns the top-k edge indices of each single graph
of the input graph:
a tensor with shape :math:`(B, K)` (:math:`(B, K, D)` if ``idx``
is set to None) would be returned, where
sorted_feat : Tensor
A tensor with shape :math:`(B, K, D)`, where
:math:`B` is the batch size of the input graph.
sorted_idx : Tensor
A tensor with shape :math:`(B, K)` (:math:`(B, K, D)` if ``sortby``
is set to None), where
:math:`B` is the batch size of the input graph, :math:`D`
is the feature size.
Examples
--------
......@@ -1060,9 +685,7 @@ def topk_edges(graph, feat, k, descending=True, idx=None):
Create two :class:`~dgl.DGLGraph` objects and initialize their
edge features.
>>> g1 = dgl.DGLGraph() # Graph 1
>>> g1.add_nodes(4)
>>> g1.add_edges([0, 1, 2, 3], [1, 2, 3, 0])
>>> g1 = dgl.graph(([0, 1, 2, 3], [1, 2, 3, 0])) # Graph 1
>>> g1.edata['h'] = th.rand(4, 5)
>>> g1.edata['h']
tensor([[0.0297, 0.8307, 0.9140, 0.6702, 0.3346],
......@@ -1070,9 +693,7 @@ def topk_edges(graph, feat, k, descending=True, idx=None):
[0.0880, 0.6515, 0.4451, 0.7507, 0.5297],
[0.5171, 0.6379, 0.2695, 0.8954, 0.5197]])
>>> g2 = dgl.DGLGraph() # Graph 2
>>> g2.add_nodes(5)
>>> g2.add_edges([0, 1, 2, 3, 4], [1, 2, 3, 4, 0])
>>> g2 = dgl.graph(([0, 1, 2, 3, 4], [1, 2, 3, 4, 0])) # Graph 2
>>> g2.edata['h'] = th.rand(5, 5)
>>> g2.edata['h']
tensor([[0.3168, 0.3174, 0.5303, 0.0804, 0.3808],
......@@ -1083,49 +704,46 @@ def topk_edges(graph, feat, k, descending=True, idx=None):
Top-k over edge attribute :attr:`h` in a batched graph.
>>> bg = dgl.batch([g1, g2], edge_attrs='h')
>>> bg = dgl.batch([g1, g2], edata=['h'])
>>> dgl.topk_edges(bg, 'h', 3)
(tensor([[[0.5901, 0.8307, 0.9280, 0.8954, 0.7997],
[0.5171, 0.6515, 0.9140, 0.7507, 0.5297],
[0.0880, 0.6379, 0.4451, 0.6893, 0.5197]],
[[0.5065, 0.9105, 0.5692, 0.8489, 0.3872],
[0.3168, 0.5182, 0.5418, 0.6114, 0.3808],
[0.1931, 0.4954, 0.5303, 0.3934, 0.1458]]]), tensor([[[1, 0, 1, 3, 1],
[3, 2, 0, 2, 2],
[2, 3, 2, 1, 3]],
[[4, 2, 2, 2, 4],
[0, 4, 4, 1, 0],
[3, 3, 0, 3, 1]]]))
Top-k over edge attribute :attr:`h` along index -1 in a batched graph.
(used in SortPooling)
>>> dgl.topk_edges(bg, 'h', 3, idx=-1)
>>> dgl.topk_edges(bg, 'h', 3, sortby=-1)
(tensor([[[0.5901, 0.3030, 0.9280, 0.6893, 0.7997],
[0.0880, 0.6515, 0.4451, 0.7507, 0.5297],
[0.5171, 0.6379, 0.2695, 0.8954, 0.5197]],
[[0.5065, 0.5182, 0.5418, 0.1520, 0.3872],
[0.3168, 0.3174, 0.5303, 0.0804, 0.3808],
[0.1323, 0.2766, 0.4318, 0.6114, 0.1458]]]), tensor([[1, 2, 3],
[4, 0, 1]]))
Top-k over edge attribute :attr:`h` in a single graph.
>>> dgl.topk_edges(g1, 'h', 3)
(tensor([[[0.5901, 0.8307, 0.9280, 0.8954, 0.7997],
[0.5171, 0.6515, 0.9140, 0.7507, 0.5297],
[0.0880, 0.6379, 0.4451, 0.6893, 0.5197]]]), tensor([[[1, 0, 1, 3, 1],
[3, 2, 0, 2, 2],
[2, 3, 2, 1, 3]]]))
Notes
-----
If an example has :math:`n` edges and :math:`n<k`, in the first
returned tensor the :math:`n+1` to :math:`k`th rows would be padded
with all zero; in the second returned tensor, the behavior of :math:`n+1`
to :math:`k`th elements is not defined.
If an example has :math:`n` edges and :math:`n<k`, the ``sorted_feat``
tensor will pad the :math:`n+1` to :math:`k`th rows with zeros; the
corresponding entries of ``sorted_idx`` are undefined.
"""
return _topk_on(graph, 'edges', feat, k, descending=descending, idx=idx)
return _topk_on(graph, 'edges', feat, k,
descending=descending, sortby=sortby,
ntype_or_etype=etype)
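The ``sortby`` argument instead picks one feature column as the ranking key and keeps whole feature rows, as in SortPooling. A small sketch of that reading (PyTorch backend assumed):
import dgl
import torch as th

g = dgl.graph(([0, 1, 2, 3], [1, 2, 3, 0]))
g.edata['h'] = th.rand(4, 5)

# Rank edges by their last feature value; whole feature rows are returned.
feat, idx = dgl.topk_edges(g, 'h', 2, sortby=-1)
order = th.argsort(g.edata['h'][:, -1], descending=True)[:2]
assert th.allclose(feat[0], g.edata['h'][order])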
......@@ -168,7 +168,8 @@ def build_gidx_and_mapping_uv(edge_tuples, num_src, num_dst):
Number of ints needed to represent the graph
"""
u, v, eid = edge_tuples
gidx = create_unitgraph_from_coo(2, num_src, num_dst, u, v, 'any')
gidx = create_unitgraph_from_coo(2, num_src, num_dst,
u.tousertensor(), v.tousertensor(), ['coo', 'csr', 'csc'])
forward, backward = gidx.get_csr_shuffle_order(0)
eid = eid.tousertensor()
nbits = gidx.bits_needed(0)
......
......@@ -59,10 +59,11 @@ def sample_neighbors(g, nodes, fanout, edge_dir='in', prob=None, replace=False):
if len(g.ntypes) > 1:
raise DGLError("Must specify node type when the graph is not homogeneous.")
nodes = {g.ntypes[0] : nodes}
nodes = utils.prepare_tensor_dict(g, nodes, 'nodes')
nodes_all_types = []
for ntype in g.ntypes:
if ntype in nodes:
nodes_all_types.append(utils.toindex(nodes[ntype], g._idtype_str).todgltensor())
nodes_all_types.append(F.to_dgl_nd(nodes[ntype]))
else:
nodes_all_types.append(nd.array([], ctx=nd.cpu()))
......@@ -75,7 +76,7 @@ def sample_neighbors(g, nodes, fanout, edge_dir='in', prob=None, replace=False):
fanout_array = [None] * len(g.etypes)
for etype, value in fanout.items():
fanout_array[g.get_etype_id(etype)] = value
fanout_array = utils.toindex(fanout_array).todgltensor()
fanout_array = F.to_dgl_nd(F.tensor(fanout_array, dtype=F.int64))
if prob is None:
prob_arrays = [nd.array([], ctx=nd.cpu())] * len(g.etypes)
......@@ -83,7 +84,7 @@ def sample_neighbors(g, nodes, fanout, edge_dir='in', prob=None, replace=False):
prob_arrays = []
for etype in g.canonical_etypes:
if prob in g.edges[etype].data:
prob_arrays.append(F.zerocopy_to_dgl_ndarray(g.edges[etype].data[prob]))
prob_arrays.append(F.to_dgl_nd(g.edges[etype].data[prob]))
else:
prob_arrays.append(nd.array([], ctx=nd.cpu()))
......@@ -92,7 +93,7 @@ def sample_neighbors(g, nodes, fanout, edge_dir='in', prob=None, replace=False):
induced_edges = subgidx.induced_edges
ret = DGLHeteroGraph(subgidx.graph, g.ntypes, g.etypes)
for i, etype in enumerate(ret.canonical_etypes):
ret.edges[etype].data[EID] = induced_edges[i].tousertensor()
ret.edges[etype].data[EID] = induced_edges[i]
return ret
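For reference, a minimal usage sketch of the sampler above (assuming it is exposed as ``dgl.sampling.sample_neighbors``; the module path is an assumption here):
import dgl
import torch as th

g = dgl.graph((th.tensor([0, 1, 2, 3]), th.tensor([1, 2, 3, 0])))
# Sample up to 2 inbound edges for seed nodes 0 and 1; all nodes are kept
# and the picked edge IDs are recorded in edata[dgl.EID].
sub = dgl.sampling.sample_neighbors(g, th.tensor([0, 1]), 2)
print(sub.num_edges(), sub.edata[dgl.EID])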
def select_topk(g, k, weight, nodes=None, edge_dir='in', ascending=False):
......@@ -140,10 +141,11 @@ def select_topk(g, k, weight, nodes=None, edge_dir='in', ascending=False):
nodes = {g.ntypes[0] : nodes}
# Parse nodes into a list of NDArrays.
nodes = utils.prepare_tensor_dict(g, nodes, 'nodes')
nodes_all_types = []
for ntype in g.ntypes:
if ntype in nodes:
nodes_all_types.append(utils.toindex(nodes[ntype], g._idtype_str).todgltensor())
nodes_all_types.append(F.to_dgl_nd(nodes[ntype]))
else:
nodes_all_types.append(nd.array([], ctx=nd.cpu()))
......@@ -156,12 +158,12 @@ def select_topk(g, k, weight, nodes=None, edge_dir='in', ascending=False):
k_array = [None] * len(g.etypes)
for etype, value in k.items():
k_array[g.get_etype_id(etype)] = value
k_array = utils.toindex(k_array).todgltensor()
k_array = F.to_dgl_nd(F.tensor(k_array, dtype=F.int64))
weight_arrays = []
for etype in g.canonical_etypes:
if weight in g.edges[etype].data:
weight_arrays.append(F.zerocopy_to_dgl_ndarray(g.edges[etype].data[weight]))
weight_arrays.append(F.to_dgl_nd(g.edges[etype].data[weight]))
else:
raise DGLError('Edge weights "{}" do not exist for relation graph "{}".'.format(
weight, etype))
......@@ -171,7 +173,7 @@ def select_topk(g, k, weight, nodes=None, edge_dir='in', ascending=False):
induced_edges = subgidx.induced_edges
ret = DGLHeteroGraph(subgidx.graph, g.ntypes, g.etypes)
for i, etype in enumerate(ret.canonical_etypes):
ret.edges[etype].data[EID] = induced_edges[i].tousertensor()
ret.edges[etype].data[EID] = induced_edges[i]
return ret
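Similarly, a small sketch for the top-k neighbor selector above (module path ``dgl.sampling.select_topk`` assumed; ``weight`` names the edge feature used as the ranking key):
import dgl
import torch as th

g = dgl.graph((th.tensor([0, 1, 2, 3]), th.tensor([0, 0, 0, 0])))
g.edata['w'] = th.tensor([0.1, 0.9, 0.5, 0.3])
# Keep the 2 inbound edges of node 0 with the largest weight 'w';
# with the weights above these should be edge IDs 1 and 2.
sub = dgl.sampling.select_topk(g, 2, 'w', nodes=th.tensor([0]))
print(sub.edata[dgl.EID])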
......
"""Segment aggregation operators implemented using DGL graph."""
from .base import DGLError
from . import backend as F
from . import convert
from . import function as fn
def segment_reduce(seglen, value, reducer='sum'):
"""Segment reduction operator.
It aggregates the value tensor along the first dimension by segments.
The first argument ``seglen`` stores the length of each segment. Its
summation must be equal to the first dimension of the ``value`` tensor.
Zero-length segments are allowed.
Parameters
----------
seglen : Tensor
Segment lengths.
value : Tensor
Value to aggregate.
reducer : str, optional
Aggregation method. Can be 'sum', 'max', 'min', 'mean'.
Returns
-------
Tensor
Aggregated tensor of shape ``(len(seglen), value.shape[1:])``.
Examples
--------
>>> import dgl
>>> import torch as th
>>> val = th.ones(10, 3)
>>> seg = th.tensor([1, 0, 5, 4]) # 4 segments
>>> dgl.segment_reduce(seg, val)
tensor([[1., 1., 1.],
[0., 0., 0.],
[5., 5., 5.],
[4., 4., 4.]])
"""
ctx = F.context(seglen)
# TODO(minjie): a more efficient implementation is to create a graph
# directly from a CSR structure.
u = F.copy_to(F.arange(0, F.shape(value)[0], F.int32), ctx)
v = F.repeat(F.copy_to(F.arange(0, len(seglen), F.int32), ctx),
seglen, dim=0)
if len(u) != len(v):
raise DGLError("Invalid seglen array:", seglen,
". Its summation must be equal to value.shape[0].")
g = convert.bipartite((u, v))
g.srcdata['h'] = value
g.update_all(fn.copy_u('h', 'm'), getattr(fn, reducer)('m', 'h'))
return g.dstdata['h']
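For clarity, a plain-PyTorch reference of what ``segment_reduce`` computes for the ``'sum'`` reducer (a sketch for illustration, not how DGL executes it):
import torch as th

def segment_sum_ref(seglen, value):
    # Map each row of `value` to its segment id, then scatter-add per segment.
    seg_ids = th.repeat_interleave(th.arange(len(seglen)), seglen)
    out = th.zeros((len(seglen),) + value.shape[1:], dtype=value.dtype)
    return out.index_add(0, seg_ids, value)

val = th.ones(10, 3)
seg = th.tensor([1, 0, 5, 4])
print(segment_sum_ref(seg, val))   # matches dgl.segment_reduce(seg, val)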
def segment_softmax(seglen, value):
"""Performa softmax on each segment.
The first argument ``seglen`` stores the length of each segment. Its
summation must be equal to the first dimension of the ``value`` tensor.
Zero-length segments are allowed.
Parameters
----------
seglen : Tensor
Segment lengths.
value : Tensor
Values to compute the softmax over.
Returns
-------
Tensor
Result tensor of the same shape as the ``value`` tensor.
Examples
--------
>>> import dgl
>>> import torch as th
>>> val = th.ones(10, 3)
>>> seg = th.tensor([1, 0, 5, 4]) # 4 segments
>>> dgl.segment_softmax(seg, val)
tensor([[1.0000, 1.0000, 1.0000],
[0.2000, 0.2000, 0.2000],
[0.2000, 0.2000, 0.2000],
[0.2000, 0.2000, 0.2000],
[0.2000, 0.2000, 0.2000],
[0.2000, 0.2000, 0.2000],
[0.2500, 0.2500, 0.2500],
[0.2500, 0.2500, 0.2500],
[0.2500, 0.2500, 0.2500],
[0.2500, 0.2500, 0.2500]])
"""
value_max = segment_reduce(seglen, value, reducer='max')
value = F.exp(value - F.repeat(value_max, seglen, dim=0))
value_sum = segment_reduce(seglen, value, reducer='sum')
return value / F.repeat(value_sum, seglen, dim=0)
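The function above is the usual numerically stable softmax applied within each segment; an equivalent plain-PyTorch sketch for reference:
import torch as th

def segment_softmax_ref(seglen, value):
    # Split the rows into segments and apply a standard softmax inside each.
    chunks = th.split(value, seglen.tolist(), dim=0)
    return th.cat([th.softmax(c, dim=0) for c in chunks], dim=0)

val = th.ones(10, 3)
seg = th.tensor([1, 0, 5, 4])
print(segment_softmax_ref(seg, val))   # matches dgl.segment_softmax(seg, val)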
......@@ -5,7 +5,6 @@ from __future__ import absolute_import
import dgl.ndarray as nd
from ._ffi.function import _init_api
from .base import DGLError
from .utils import to_dgl_context
from . import backend as F
def infer_broadcast_shape(op, shp1, shp2):
......@@ -115,43 +114,44 @@ def _gspmm(gidx, op, reduce_op, u, e):
(90,) to (90, 1) for a graph with 90 nodes/edges).
"""
if gidx.number_of_etypes() != 1:
raise DGLError("We only support gsddmm on graph with one edge type")
raise DGLError("We only support gspmm on graph with one edge type")
use_u = op != 'copy_rhs'
use_e = op != 'copy_lhs'
# deal with scalar features.
expand_u, expand_e = False, False
if use_u:
if F.ndim(u) == 1:
u = F.unsqueeze(u, -1)
expand_u = True
if use_e:
if F.ndim(e) == 1:
e = F.unsqueeze(e, -1)
expand_e = True
ctx = F.context(u) if use_u else F.context(e)
dtype = F.dtype(u) if use_u else F.dtype(e)
u_shp = F.shape(u) if use_u else (0,)
e_shp = F.shape(e) if use_e else (0,)
_, dsttype = gidx.metagraph.find_edge(0)
v_shp = (gidx.number_of_nodes(dsttype), ) +\
infer_broadcast_shape(op, u_shp[1:], e_shp[1:])
v = F.zeros(v_shp, dtype, ctx)
use_cmp = reduce_op in ['max', 'min']
arg_u, arg_e = None, None
ugi = gidx.get_unitgraph(0, to_dgl_context(ctx))
idtype = getattr(F, ugi.dtype)
idtype = getattr(F, gidx.dtype)
if use_cmp:
if use_u:
arg_u = F.zeros(v_shp, idtype, ctx)
if use_e:
arg_e = F.zeros(v_shp, idtype, ctx)
if gidx.number_of_edges(0) > 0:
_CAPI_DGLKernelSpMM(ugi, op, reduce_op,
_CAPI_DGLKernelSpMM(gidx, op, reduce_op,
to_dgl_nd(u if use_u else None),
to_dgl_nd(e if use_e else None),
to_dgl_nd_for_write(v),
to_dgl_nd_for_write(arg_u),
to_dgl_nd_for_write(arg_e))
if (expand_u or not use_u) and (expand_e or not use_e):
v = F.squeeze(v, -1)
return v, (arg_u, arg_e)
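Conceptually, for a single edge type the kernel call above forms a message on every edge and reduces the messages arriving at each destination node. A rough loop-based reference for the common ``op='mul'``, ``reduce_op='sum'`` case (broadcasting, the argmax/argmin outputs, and the other ops are omitted; this is only a sketch of the semantics):
import torch as th

def u_mul_e_sum_ref(src, dst, num_dst, u, e):
    # For each edge (s -> d): message = u[s] * e[edge_id]; sum messages per d.
    out = th.zeros((num_dst,) + u.shape[1:], dtype=u.dtype)
    for eid, (s, d) in enumerate(zip(src.tolist(), dst.tolist())):
        out[d] += u[s] * e[eid]
    return out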
def _gsddmm(gidx, op, lhs, rhs, lhs_target='u', rhs_target='v'):
......@@ -200,13 +200,16 @@ def _gsddmm(gidx, op, lhs, rhs, lhs_target='u', rhs_target='v'):
raise DGLError("We only support gsddmm on graph with one edge type")
use_lhs = op != 'copy_rhs'
use_rhs = op != 'copy_lhs'
# deal with scalar features.
expand_lhs, expand_rhs = False, False
if use_lhs:
if F.ndim(lhs) == 1:
lhs = F.unsqueeze(lhs, -1)
expand_lhs = True
if use_rhs:
if F.ndim(rhs) == 1:
rhs = F.unsqueeze(rhs, -1)
expand_rhs = True
lhs_target = target_mapping[lhs_target]
rhs_target = target_mapping[rhs_target]
ctx = F.context(lhs) if use_lhs else F.context(rhs)
......@@ -217,12 +220,13 @@ def _gsddmm(gidx, op, lhs, rhs, lhs_target='u', rhs_target='v'):
infer_broadcast_shape(op, lhs_shp[1:], rhs_shp[1:])
out = F.zeros(out_shp, dtype, ctx)
if gidx.number_of_edges(0) > 0:
ugi = gidx.get_unitgraph(0, to_dgl_context(ctx))
_CAPI_DGLKernelSDDMM(ugi, op,
_CAPI_DGLKernelSDDMM(gidx, op,
to_dgl_nd(lhs if use_lhs else None),
to_dgl_nd(rhs if use_rhs else None),
to_dgl_nd_for_write(out),
lhs_target, rhs_target)
if (expand_lhs or not use_lhs) and (expand_rhs or not use_rhs):
out = F.squeeze(out, -1)
return out
_init_api("dgl.sparse")
......@@ -7,43 +7,43 @@ import numpy as np
from scipy import sparse
from ._ffi.function import _init_api
from .graph import DGLGraph
from .base import EID, NID, dgl_warning, DGLError, is_internal_column
from . import convert
from .heterograph import DGLHeteroGraph, DGLBlock
from . import ndarray as nd
from . import backend as F
from .graph_index import from_coo
from .graph_index import _get_halo_subgraph_inner_node
from .graph import unbatch
from .convert import graph, bipartite, heterograph
from . import utils
from .base import EID, NID, DGLError, is_internal_column
from . import ndarray as nd
from . import utils, batch
from .partition import metis_partition_assignment as hetero_metis_partition_assignment
from .partition import partition_graph_with_halo as hetero_partition_graph_with_halo
from .partition import metis_partition as hetero_metis_partition
# TO BE DEPRECATED
from .graph import DGLGraph as DGLGraphStale
from .graph_index import _get_halo_subgraph_inner_node
__all__ = [
'line_graph',
'line_heterograph',
'khop_adj',
'khop_graph',
'reverse',
'reverse_heterograph',
'to_simple_graph',
'to_bidirected',
'to_bidirected_stale',
'laplacian_lambda_max',
'knn_graph',
'segmented_knn_graph',
'add_edges',
'add_nodes',
'remove_edges',
'remove_nodes',
'add_self_loop',
'remove_self_loop',
'metapath_reachable_graph',
'compact_graphs',
'to_block',
'to_simple',
'to_simple_graph',
'in_subgraph',
'out_subgraph',
'remove_edges',
'as_immutable_graph',
'as_heterograph']
......@@ -102,8 +102,7 @@ def knn_graph(x, k):
(F.asnumpy(F.zeros_like(dst) + 1), (F.asnumpy(dst), F.asnumpy(src))),
shape=(n_samples * n_points, n_samples * n_points))
g = DGLGraph(adj, readonly=True)
return g
return convert.graph(adj)
#pylint: disable=invalid-name
def segmented_knn_graph(x, k, segs):
......@@ -145,7 +144,7 @@ def segmented_knn_graph(x, k, segs):
src = F.reshape(src, (-1,))
adj = sparse.csr_matrix((F.asnumpy(F.zeros_like(dst) + 1), (F.asnumpy(dst), F.asnumpy(src))))
g = DGLGraph(adj, readonly=True)
g = convert.graph(adj)
return g
def to_bidirected(g, readonly=None, copy_ndata=True,
......@@ -262,7 +261,7 @@ def to_bidirected(g, readonly=None, copy_ndata=True,
u, v = g.edges(form='uv', order='eid', etype=c_etype)
subgs[c_etype] = (F.cat([u, v], dim=0), F.cat([v, u], dim=0))
new_g = heterograph(subgs)
new_g = convert.heterograph(subgs)
else:
subgs = {}
for c_etype in canonical_etypes:
......@@ -273,7 +272,7 @@ def to_bidirected(g, readonly=None, copy_ndata=True,
u, v = g.edges(form='uv', order='eid', etype=c_etype)
subgs[c_etype] = (F.cat([u, v], dim=0), F.cat([v, u], dim=0))
new_g = heterograph(subgs)
new_g = convert.heterograph(subgs)
# handle features
if copy_ndata:
......@@ -299,27 +298,6 @@ def to_bidirected(g, readonly=None, copy_ndata=True,
def line_graph(g, backtracking=True, shared=False):
"""Return the line graph of this graph.
Parameters
----------
g : dgl.DGLGraph
The input graph.
backtracking : bool, optional
Whether the returned line graph is backtracking.
shared : bool, optional
Whether the returned line graph shares representations with `self`.
Returns
-------
DGLGraph
The line graph of this graph.
"""
graph_data = g._graph.line_graph(backtracking)
node_frame = g._edge_frame if shared else None
return DGLGraph(graph_data, node_frame)
def line_heterograph(g, backtracking=True):
"""Return the line graph of this graph.
The graph should be a directed homogeneous graph. Other types of graphs
are not supported right now.
......@@ -327,8 +305,13 @@ def line_heterograph(g, backtracking=True):
Parameters
----------
backtracking : bool
g : DGLGraph
Input graph.
backtracking : bool, optional
Whether the pair of edges (v, u) and (u, v) is treated as linked. Default: True.
shared : bool, optional
Whether to copy the edge features of the original graph as the node features
of the result line graph.
Returns
-------
......@@ -357,12 +340,15 @@ def line_heterograph(g, backtracking=True):
... (tensor([0, 1, 2, 4]), tensor([4, 0, 3, 1]))
"""
assert g.is_homograph(), \
assert g.is_homogeneous(), \
'line_heterograph only support directed homogeneous graph right now'
lg = DGLHeteroGraph(_CAPI_DGLHeteroLineGraph(g._graph, backtracking))
if shared:
# copy edge features
lg.ndata.update(g.edata)
return lg
hgidx = _CAPI_DGLHeteroLineGraph(g._graph, backtracking)
hg = DGLHeteroGraph(hgidx, g._etypes, g._ntypes)
return hg
DGLHeteroGraph.line_graph = line_graph
def khop_adj(g, k):
"""Return the matrix of :math:`A^k` where :math:`A` is the adjacency matrix of :math:`g`,
......@@ -456,82 +442,9 @@ def khop_graph(g, k):
col = np.repeat(adj_k.col, multiplicity)
# TODO(zihao): we should support creating multi-graph from scipy sparse matrix
# in the future.
return DGLGraph(from_coo(n, row, col, True))
def reverse(g, copy_ndata=False, copy_edata=False):
"""Return the reverse of a graph
The reverse (also called converse, transpose) of a directed graph is another directed
graph on the same nodes with edges reversed in terms of direction.
Given a :class:`dgl.DGLGraph` object, we return another :class:`dgl.DGLGraph` object
representing its reverse.
Parameters
----------
g : dgl.DGLGraph
The input graph.
copy_ndata: bool, optional
If True, node attributes are copied from the original graph to the reversed graph.
Otherwise the reversed graph will not be initialized with node attributes.
copy_edata: bool, optional
If True, edge attributes are copied from the original graph to the reversed graph.
Otherwise the reversed graph will not have edge attributes.
Return
------
dgl.DGLGraph
The reversed graph.
Notes
-----
* We do not dynamically update the topology of a graph once that of its reverse changes.
This can be particularly problematic when the node/edge attrs are shared. For example,
if the topology of both the original graph and its reverse get changed independently,
you can get a mismatched node/edge feature.
Examples
--------
Create a graph to reverse.
>>> import dgl
>>> import torch as th
>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.add_edges([0, 1, 2], [1, 2, 0])
>>> g.ndata['h'] = th.tensor([[0.], [1.], [2.]])
>>> g.edata['h'] = th.tensor([[3.], [4.], [5.]])
Reverse the graph and examine its structure.
>>> rg = g.reverse(copy_ndata=True, copy_edata=True)
>>> print(rg)
DGLGraph with 3 nodes and 3 edges.
Node data: {'h': Scheme(shape=(1,), dtype=torch.float32)}
Edge data: {'h': Scheme(shape=(1,), dtype=torch.float32)}
The edges are reversed now.
>>> rg.has_edges_between([1, 2, 0], [0, 1, 2])
tensor([1, 1, 1])
Reversed edges have the same feature as the original ones.
>>> g.edges[[0, 2], [1, 0]].data['h'] == rg.edges[[1, 0], [0, 2]].data['h']
tensor([[1],
[1]], dtype=torch.uint8)
The node/edge features of the reversed graph share memory with the original
graph, which is helpful for both forward computation and back propagation.
>>> g.ndata['h'] = g.ndata['h'] + 1
>>> rg.ndata['h']
tensor([[1.],
[2.],
[3.]])
"""
g_reversed = DGLGraph()
g_reversed.add_nodes(g.number_of_nodes())
g_edges = g.all_edges(order='eid')
g_reversed.add_edges(g_edges[1], g_edges[0])
g_reversed._batch_num_nodes = g._batch_num_nodes
g_reversed._batch_num_edges = g._batch_num_edges
if copy_ndata:
g_reversed._node_frame = g._node_frame
if copy_edata:
g_reversed._edge_frame = g._edge_frame
return g_reversed
return convert.graph((row, col), num_nodes=n)
def reverse_heterograph(g, copy_ndata=True, copy_edata=False):
def reverse(g, copy_ndata=True, copy_edata=False, *, share_ndata=None, share_edata=None):
r"""Return the reverse of a graph.
The reverse (also called converse, transpose) of a graph with edges
......@@ -649,6 +562,14 @@ def reverse_heterograph(g, copy_ndata=True, copy_edata=False):
>>> rg.edges['plays'].data
{}
"""
if share_ndata is not None:
dgl_warning('share_ndata argument has been renamed to copy_ndata.')
copy_ndata = share_ndata
if share_edata is not None:
dgl_warning('share_edata argument has been renamed to copy_edata.')
copy_edata = share_edata
if g.is_block:
raise DGLError('Reversing a block graph is not allowed.')
# TODO(0.5 release, xiangsx) need to handle BLOCK
# currently reversing a block results in undefined behavior
gidx = g._graph.reverse()
......@@ -672,12 +593,12 @@ def reverse_heterograph(g, copy_ndata=True, copy_edata=False):
return new_g
DGLHeteroGraph.reverse = reverse_heterograph
DGLHeteroGraph.reverse = reverse
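A short usage sketch of ``dgl.reverse`` with node features carried over (PyTorch backend assumed):
import dgl
import torch as th

g = dgl.graph((th.tensor([0, 1, 2]), th.tensor([1, 2, 0])))
g.ndata['h'] = th.tensor([[0.], [1.], [2.]])
rg = dgl.reverse(g, copy_ndata=True)   # edges flipped, node features kept
print(rg.edges())                      # (tensor([1, 2, 0]), tensor([0, 1, 2]))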
def to_simple_graph(g):
"""Convert the graph to a simple graph with no multi-edge.
The function generates a new *readonly* graph with no node/edge feature.
DEPRECATED: renamed to dgl.to_simple
Parameters
----------
......@@ -689,8 +610,8 @@ def to_simple_graph(g):
DGLGraph
A simple graph.
"""
gidx = _CAPI_DGLToSimpleGraph(g._graph)
return DGLGraph(gidx, readonly=True)
dgl_warning('dgl.to_simple_graph is renamed to dgl.to_simple in v0.5.')
return to_simple(g)
def to_bidirected_stale(g, readonly=True):
"""Convert the graph to a bidirected graph.
......@@ -733,7 +654,7 @@ def to_bidirected_stale(g, readonly=True):
newgidx = _CAPI_DGLToBidirectedImmutableGraph(g._graph)
else:
newgidx = _CAPI_DGLToBidirectedMutableGraph(g._graph)
return DGLGraph(newgidx)
return DGLGraphStale(newgidx)
def laplacian_lambda_max(g):
"""Return the largest eigenvalue of the normalized symmetric laplacian of g.
......@@ -762,7 +683,7 @@ def laplacian_lambda_max(g):
>>> dgl.laplacian_lambda_max(g)
[1.809016994374948]
"""
g_arr = unbatch(g)
g_arr = batch.unbatch(g)
rst = []
for g_i in g_arr:
n = g_i.number_of_nodes()
......@@ -803,7 +724,6 @@ def metapath_reachable_graph(g, metapath):
A homogeneous or bipartite graph.
"""
adj = 1
index_dtype = g._idtype_str
for etype in metapath:
adj = adj * g.adj(etype=etype, scipy_fmt='csr', transpose=True)
......@@ -812,83 +732,490 @@ def metapath_reachable_graph(g, metapath):
dsttype = g.to_canonical_etype(metapath[-1])[2]
if srctype == dsttype:
assert adj.shape[0] == adj.shape[1]
new_g = graph(adj, ntype=srctype, index_dtype=index_dtype)
new_g = convert.graph(adj, ntype=srctype, idtype=g.idtype, device=g.device)
else:
new_g = bipartite(adj, utype=srctype, vtype=dsttype, index_dtype=index_dtype)
new_g = convert.bipartite(adj, utype=srctype, vtype=dsttype,
idtype=g.idtype, device=g.device)
# copy srcnode features
for key, value in g.nodes[srctype].data.items():
new_g.nodes[srctype].data[key] = value
# copy dstnode features
if srctype != dsttype:
for key, value in g.nodes[dsttype].data.items():
new_g.nodes[dsttype].data[key] = value
return new_g
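A small usage sketch of the metapath helper above, on a toy 'user plays game' schema (schema and values are only illustrative):
import dgl
import torch as th

g = dgl.heterograph({
    ('user', 'plays', 'game'): (th.tensor([0, 1]), th.tensor([0, 1])),
    ('game', 'played-by', 'user'): (th.tensor([0, 1]), th.tensor([0, 1])),
})
# user -> game -> user: users reachable from each user through a shared game.
new_g = dgl.metapath_reachable_graph(g, ['plays', 'played-by'])
print(new_g.edges())   # expected: (tensor([0, 1]), tensor([0, 1]))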
def add_self_loop(g):
"""Return a new graph containing all the edges in the input graph plus self loops
of every nodes.
No duplicate self loop will be added for nodes already having self loops.
Self-loop edges id are not preserved. All self-loop edges would be added at the end.
def add_nodes(g, num, data=None, ntype=None):
r"""Add new nodes of the same node type.
A new graph with newly added nodes is returned.
Parameters
----------
num : int
Number of nodes to add.
data : dict, optional
Feature data of the added nodes.
ntype : str, optional
The type of the new nodes. Can be omitted if there is
only one node type in the graph.
Return
------
DGLHeteroGraph
The graph with newly added nodes.
Notes
-----
* If ``data`` does not contain some existing feature fields, those
features for the new nodes will be filled with zeros.
* If ``data`` contains new feature fields, those features for the
old nodes will be filled with zeros.
Examples
---------
--------
>>> g = DGLGraph()
>>> g.add_nodes(5)
>>> g.add_edges([0, 1, 2], [1, 1, 2])
>>> new_g = dgl.transform.add_self_loop(g) # Nodes 0, 3, 4 don't have self-loop
>>> new_g.edges()
(tensor([0, 0, 1, 2, 3, 4]), tensor([1, 0, 1, 2, 3, 4]))
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
**Homogeneous Graphs or Heterogeneous Graphs with A Single Node Type**
>>> g = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])))
>>> g.num_nodes()
3
>>> g = dgl.add_nodes(g, 2)
>>> g.num_nodes()
5
If the graph has some node features and new nodes are added without
features, their features will be created by initializers defined
with :func:`set_n_initializer`.
>>> g.ndata['h'] = torch.ones(5, 1)
>>> g = dgl.add_nodes(g, 1)
>>> g.ndata['h']
tensor([[1.], [1.], [1.], [1.], [1.], [0.]])
We can also assign features for the new nodes in adding new nodes.
>>> g = dgl.add_nodes(g, 1, {'h': torch.ones(1, 1), 'w': torch.ones(1, 1)})
>>> g.ndata['h']
tensor([[1.], [1.], [1.], [1.], [1.], [0.], [1.]])
Since ``data`` contains new feature fields, the features for old nodes
will be created by initializers defined with :func:`set_n_initializer`.
>>> g.ndata['w']
tensor([[0.], [0.], [0.], [0.], [0.], [0.], [1.]])
**Heterogeneous Graphs with Multiple Node Types**
>>> g = dgl.heterograph({
>>> ('user', 'plays', 'game'): (torch.tensor([0, 1, 1, 2]),
>>> torch.tensor([0, 0, 1, 1])),
>>> ('developer', 'develops', 'game'): (torch.tensor([0, 1]),
>>> torch.tensor([0, 1]))
>>> })
>>> g = dgl.add_nodes(g, 2)
DGLError: Node type name must be specified
if there are more than one node types.
>>> g.num_nodes('user')
3
>>> g = dgl.add_nodes(g, 2, ntype='user')
>>> g.num_nodes('user')
5
See Also
--------
remove_nodes
add_edges
remove_edges
"""
g = g.clone()
g.add_nodes(num, data=data, ntype=ntype)
return g
def add_edges(g, u, v, data=None, etype=None):
r"""Add multiple new edges for the specified edge type.
A new graph with newly added edges is returned.
The i-th new edge will be from ``u[i]`` to ``v[i]``.
Parameters
------------
g: DGLGraph
----------
u : int, tensor, numpy.ndarray, list
Source node IDs, ``u[i]`` gives the source node for the i-th new edge.
v : int, tensor, numpy.ndarray, list
Destination node IDs, ``v[i]`` gives the destination node for the i-th new edge.
data : dict, optional
Feature data of the added edges. The i-th row of the feature data
corresponds to the i-th new edge.
etype : str or tuple of str, optional
The type of the new edges. Can be omitted if there is
only one edge type in the graph.
Returns
Return
------
DGLHeteroGraph
The graph with newly added edges.
Notes
-----
* If the end nodes of the added edges do not exist, add_nodes is invoked
to add them. The node features of the new nodes will be created
by initializers defined with :func:`set_n_initializer` (default
initializer fills zeros). In some cases it is recommended to call
add_nodes first and then add_edges.
* If the key of ``data`` does not contain some existing feature fields,
those features for the new edges will be created by initializers
defined with :func:`set_n_initializer` (default initializer fills zeros).
* If the key of ``data`` contains new feature fields, those features for
the old edges will be created by initializers defined with
:func:`set_n_initializer` (default initializer fills zeros).
Examples
--------
DGLGraph
The following example uses PyTorch backend.
>>> import dgl
>>> import torch
**Homogeneous Graphs or Heterogeneous Graphs with A Single Edge Type**
>>> g = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])))
>>> g.num_edges()
2
>>> g = dgl.add_edges(g, torch.tensor([1, 3]), torch.tensor([0, 1]))
>>> g.num_edges()
4
Since ``u`` or ``v`` contains a non-existing node ID, the nodes are
added implicitly.
>>> g.num_nodes()
4
If the graph has some edge features and new edges are added without
features, their features will be created by initializers defined
with :func:`set_n_initializer`.
>>> g.edata['h'] = torch.ones(4, 1)
>>> g = dgl.add_edges(g, torch.tensor([1]), torch.tensor([1]))
>>> g.edata['h']
tensor([[1.], [1.], [1.], [1.], [0.]])
We can also assign features for the new edges in adding new edges.
>>> g = dgl.add_edges(g, torch.tensor([0, 0]), torch.tensor([2, 2]),
>>> {'h': torch.tensor([[1.], [2.]]), 'w': torch.ones(2, 1)})
>>> g.edata['h']
tensor([[1.], [1.], [1.], [1.], [0.], [1.], [2.]])
Since ``data`` contains new feature fields, the features for old edges
will be created by initializers defined with :func:`set_n_initializer`.
>>> g.edata['w']
tensor([[0.], [0.], [0.], [0.], [0.], [1.], [1.]])
**Heterogeneous Graphs with Multiple Edge Types**
>>> g = dgl.heterograph({
>>> ('user', 'plays', 'game'): (torch.tensor([0, 1, 1, 2]),
>>> torch.tensor([0, 0, 1, 1])),
>>> ('developer', 'develops', 'game'): (torch.tensor([0, 1]),
>>> torch.tensor([0, 1]))
>>> })
>>> g = dgl.add_edges(g, torch.tensor([3]), torch.tensor([3]))
DGLError: Edge type name must be specified
if there are more than one edge types.
>>> g.number_of_edges('plays')
4
>>> g = dgl.add_edges(g, torch.tensor([3]), torch.tensor([3]), etype='plays')
>>> g.number_of_edges('plays')
5
See Also
--------
add_nodes
remove_nodes
remove_edges
"""
new_g = DGLGraph()
new_g.add_nodes(g.number_of_nodes())
src, dst = g.all_edges(order="eid")
src = F.zerocopy_to_numpy(src)
dst = F.zerocopy_to_numpy(dst)
non_self_edges_idx = src != dst
nodes = np.arange(g.number_of_nodes())
new_g.add_edges(src[non_self_edges_idx], dst[non_self_edges_idx])
new_g.add_edges(nodes, nodes)
g = g.clone()
g.add_edges(u, v, data=data, etype=etype)
return g
def remove_edges(g, eids, etype=None):
r"""Remove multiple edges with the specified edge type.
A new graph with certain edges deleted is returned.
Nodes will not be removed. After removing edges, the rest
edges will be re-indexed using consecutive integers from 0,
with their relative order preserved.
The features for the removed edges will be removed accordingly.
Parameters
----------
eids : int, tensor, numpy.ndarray, list
IDs for the edges to remove.
etype : str or tuple of str, optional
The type of the edges to remove. Can be omitted if there is
only one edge type in the graph.
Return
------
DGLHeteroGraph
The graph with edges deleted.
Examples
--------
>>> import dgl
>>> import torch
**Homogeneous Graphs or Heterogeneous Graphs with A Single Edge Type**
>>> g = dgl.graph((torch.tensor([0, 0, 2]), torch.tensor([0, 1, 2])))
>>> g.edata['he'] = torch.arange(3).float().reshape(-1, 1)
>>> g = dgl.remove_edges(g, torch.tensor([0, 1]))
>>> g
Graph(num_nodes=3, num_edges=1,
ndata_schemes={}
edata_schemes={'he': Scheme(shape=(1,), dtype=torch.float32)})
>>> g.edges('all')
(tensor([2]), tensor([2]), tensor([0]))
>>> g.edata['he']
tensor([[2.]])
**Heterogeneous Graphs with Multiple Edge Types**
>>> g = dgl.heterograph({
>>> ('user', 'plays', 'game'): (torch.tensor([0, 1, 1, 2]),
>>> torch.tensor([0, 0, 1, 1])),
>>> ('developer', 'develops', 'game'): (torch.tensor([0, 1]),
>>> torch.tensor([0, 1]))
>>> })
>>> g = dgl.remove_edges(g, torch.tensor([0, 1]))
DGLError: Edge type name must be specified
if there are more than one edge types.
>>> g = dgl.remove_edges(g, torch.tensor([0, 1]), 'plays')
>>> g.edges('all', etype='plays')
(tensor([0, 1]), tensor([0, 0]), tensor([0, 1]))
See Also
--------
add_nodes
add_edges
remove_nodes
"""
g = g.clone()
g.remove_edges(eids, etype=etype)
return g
def remove_nodes(g, nids, ntype=None):
r"""Remove multiple nodes with the specified node type.
A new graph with certain nodes deleted is returned.
Edges that connect to the nodes will be removed as well. After removing
nodes and edges, the rest nodes and edges will be re-indexed using
consecutive integers from 0, with their relative order preserved.
The features for the removed nodes/edges will be removed accordingly.
Parameters
----------
nids : int, tensor, numpy.ndarray, list
Nodes to remove.
ntype : str, optional
The type of the nodes to remove. Can be omitted if there is
only one node type in the graph.
Return
------
DGLHeteroGraph
The graph with nodes deleted.
Examples
--------
>>> import dgl
>>> import torch
**Homogeneous Graphs or Heterogeneous Graphs with A Single Node Type**
>>> g = dgl.graph((torch.tensor([0, 0, 2]), torch.tensor([0, 1, 2])))
>>> g.ndata['hv'] = torch.arange(3).float().reshape(-1, 1)
>>> g.edata['he'] = torch.arange(3).float().reshape(-1, 1)
>>> g = dgl.remove_nodes(g, torch.tensor([0, 1]))
>>> g
Graph(num_nodes=1, num_edges=1,
ndata_schemes={'hv': Scheme(shape=(1,), dtype=torch.float32)}
edata_schemes={'he': Scheme(shape=(1,), dtype=torch.float32)})
>>> g.ndata['hv']
tensor([[2.]])
>>> g.edata['he']
tensor([[2.]])
**Heterogeneous Graphs with Multiple Node Types**
>>> g = dgl.heterograph({
>>> ('user', 'plays', 'game'): (torch.tensor([0, 1, 1, 2]),
>>> torch.tensor([0, 0, 1, 1])),
>>> ('developer', 'develops', 'game'): (torch.tensor([0, 1]),
>>> torch.tensor([0, 1]))
>>> })
>>> g = dgl.remove_nodes(g, torch.tensor([0, 1]))
DGLError: Node type name must be specified
if there are more than one node types.
>>> g = dgl.remove_nodes(g, torch.tensor([0, 1]), ntype='game')
>>> g.num_nodes('user')
3
>>> g.num_nodes('game')
0
>>> g.num_edges('plays')
0
See Also
--------
add_nodes
add_edges
remove_edges
"""
g = g.clone()
g.remove_nodes(nids, ntype=ntype)
return g
def add_self_loop(g, etype=None):
r""" Add self loop for each node in the graph.
A new graph with self-loop is returned.
Since **selfloop is not well defined for unidirectional
bipartite graphs**, we simply skip the nodes corresponding
to unidirectional bipartite graphs.
Return
------
DGLHeteroGraph
The graph with self-loop.
Notes
-----
* It is recommended to call ``remove_self_loop`` before invoking
``add_self_loop``.
* Features for the new edges (self-loop edges) will be created
by initializers defined with :func:`set_n_initializer`
(default initializer fills zeros).
Examples
--------
>>> import dgl
>>> import torch
**Homogeneous Graphs or Heterogeneous Graphs with A Single Node Type**
>>> g = dgl.graph((torch.tensor([0, 0, 2]), torch.tensor([2, 1, 0])))
>>> g.ndata['hv'] = torch.arange(3).float().reshape(-1, 1)
>>> g.edata['he'] = torch.arange(3).float().reshape(-1, 1)
>>> g = dgl.add_self_loop(g)
>>> g
Graph(num_nodes=3, num_edges=6,
ndata_schemes={'hv': Scheme(shape=(1,), dtype=torch.float32)}
edata_schemes={'he': Scheme(shape=(1,), dtype=torch.float32)})
>>> g.edata['he']
tensor([[0.],
[1.],
[2.],
[0.],
[0.],
[0.]])
**Heterogeneous Graphs with Multiple Node Types**
>>> g = dgl.heterograph({
('user', 'follows', 'user'): (torch.tensor([1, 2]),
torch.tensor([0, 1])),
('user', 'plays', 'game'): (torch.tensor([0, 1]),
torch.tensor([0, 1]))})
>>> g = dgl.add_self_loop(g, etype='follows')
>>> g
Graph(num_nodes={'user': 3, 'game': 2},
num_edges={('user', 'plays', 'game'): 2, ('user', 'follows', 'user'): 5},
metagraph=[('user', 'user'), ('user', 'game')])
"""
etype = g.to_canonical_etype(etype)
if etype[0] != etype[2]:
raise DGLError(
'add_self_loop does not support unidirectional bipartite graphs: {}.' \
'Please make sure the types of head node and tail node are identical.' \
''.format(etype))
nodes = g.nodes(etype[0])
new_g = add_edges(g, nodes, nodes, etype=etype)
return new_g
def remove_self_loop(g):
"""Return a new graph with all self-loop edges removed
DGLHeteroGraph.add_self_loop = add_self_loop
def remove_self_loop(g, etype=None):
r""" Remove self loops for each node in the graph.
A new graph with self-loop removed is returned.
If there are multiple self loops for a certain node,
all of them will be removed.
Parameters
----------
etype : str or tuple of str, optional
The type of the edges to remove. Can be omitted if there is
only one edge type in the graph.
Examples
---------
>>> g = DGLGraph()
>>> g.add_nodes(5)
>>> g.add_edges([0, 1, 2], [1, 1, 2])
>>> new_g = dgl.transform.remove_self_loop(g) # Nodes 1, 2 have self-loop
>>> new_g.edges()
(tensor([0]), tensor([1]))
>>> import dgl
>>> import torch
Parameters
------------
g: DGLGraph
**Homogeneous Graphs or Heterogeneous Graphs with A Single Node Type**
Returns
>>> g = dgl.graph((torch.tensor([0, 0, 0, 1]), torch.tensor([1, 0, 0, 2])))
>>> g.edata['he'] = torch.arange(4).float().reshape(-1, 1)
>>> g = dgl.remove_self_loop(g)
>>> g
Graph(num_nodes=3, num_edges=2,
ndata_schemes={}
edata_schemes={'he': Scheme(shape=(1,), dtype=torch.float32)})
>>> g.edata['he']
tensor([[0.],[3.]])
**Heterogeneous Graphs with Multiple Node Types**
>>> g = dgl.heterograph({
>>> ('user', 'follows', 'user'): (torch.tensor([0, 1, 1, 1, 2]),
>>> torch.tensor([0, 0, 1, 1, 1])),
>>> ('user', 'plays', 'game'): (torch.tensor([0, 1]),
>>> torch.tensor([0, 1]))
>>> })
>>> g = dgl.remove_self_loop(g)
>>> g.num_nodes('user')
3
>>> g.num_nodes('game')
2
>>> g.num_edges('follows')
2
>>> g.num_edges('plays')
2
See Also
--------
DGLGraph
add_self_loop
"""
new_g = DGLGraph()
new_g.add_nodes(g.number_of_nodes())
src, dst = g.all_edges(order="eid")
src = F.zerocopy_to_numpy(src)
dst = F.zerocopy_to_numpy(dst)
non_self_edges_idx = src != dst
new_g.add_edges(src[non_self_edges_idx], dst[non_self_edges_idx])
etype = g.to_canonical_etype(etype)
if etype[0] != etype[2]:
raise DGLError(
'remove_self_loop does not support unidirectional bipartite graphs: {}.' \
'Please make sure the types of head node and tail node are identical.' \
''.format(etype))
u, v = g.edges(form='uv', order='eid', etype=etype)
self_loop_eids = F.tensor(F.nonzero_1d(u == v), dtype=F.dtype(u))
new_g = remove_edges(g, self_loop_eids, etype=etype)
return new_g
DGLHeteroGraph.remove_self_loop = remove_self_loop
def reorder_nodes(g, new_node_ids):
""" Generate a new graph with new node Ids.
......@@ -914,7 +1241,7 @@ def reorder_nodes(g, new_node_ids):
and F.asnumpy(sorted_ids[-1]) == g.number_of_nodes() - 1, \
"The new node Ids are incorrect."
new_gidx = _CAPI_DGLReorderGraph(g._graph, new_node_ids.todgltensor())
new_g = DGLGraph(new_gidx)
new_g = DGLGraphStale(new_gidx)
new_g.ndata['orig_id'] = idx
return new_g
......@@ -981,7 +1308,7 @@ def partition_graph_with_halo(g, node_part, extra_cached_hops, reshuffle=False):
# This creates a subgraph from the subgraphs returned from the CAPI above.
def create_subgraph(subg, induced_nodes, induced_edges):
subg1 = DGLGraph(graph_data=subg.graph, readonly=True)
subg1 = DGLGraphStale(graph_data=subg.graph, readonly=True)
subg1.ndata[NID] = induced_nodes.tousertensor()
subg1.edata[EID] = induced_edges.tousertensor()
return subg1
......@@ -1246,20 +1573,22 @@ def compact_graphs(graphs, always_preserve=None):
return_single = True
if len(graphs) == 0:
return []
if graphs[0].is_block:
raise DGLError('Compacting a block graph is not allowed.')
# Ensure the node types are ordered the same.
# TODO(BarclayII): we ideally need to remove this constraint.
ntypes = graphs[0].ntypes
graph_dtype = graphs[0]._idtype_str
graph_ctx = graphs[0]._graph.ctx
idtype = graphs[0].idtype
device = graphs[0].device
for g in graphs:
assert ntypes == g.ntypes, \
("All graphs should have the same node types in the same order, got %s and %s" %
(ntypes, g.ntypes))
assert graph_dtype == g._idtype_str, "Expect graph data type to be {}, but got {}".format(
graph_dtype, g._idtype_str)
assert graph_ctx == g._graph.ctx, "Expect graph device to be {}, but got {}".format(
graph_ctx, g._graph.ctx)
assert idtype == g.idtype, "Expect graph data type to be {}, but got {}".format(
idtype, g.idtype)
assert device == g.device, "Expect graph device to be {}, but got {}".format(
device, g.device)
# Process the dictionary or tensor of "always preserve" nodes
if always_preserve is None:
......@@ -1269,19 +1598,18 @@ def compact_graphs(graphs, always_preserve=None):
raise ValueError("Node type must be given if multiple node types exist.")
always_preserve = {ntypes[0]: always_preserve}
always_preserve = utils.prepare_tensor_dict(graphs[0], always_preserve, 'always_preserve')
always_preserve_nd = []
for ntype in ntypes:
nodes = always_preserve.get(ntype, None)
if nodes is None:
nodes = nd.empty([0], graph_dtype, graph_ctx)
else:
nodes = F.zerocopy_to_dgl_ndarray(nodes)
always_preserve_nd.append(nodes)
nodes = F.copy_to(F.tensor([], idtype), device)
always_preserve_nd.append(F.to_dgl_nd(nodes))
# Compact and construct heterographs
new_graph_indexes, induced_nodes = _CAPI_DGLCompactGraphs(
[g._graph for g in graphs], always_preserve_nd)
induced_nodes = [F.zerocopy_from_dgl_ndarray(nodes) for nodes in induced_nodes]
induced_nodes = [F.from_dgl_nd(nodes) for nodes in induced_nodes]
new_graphs = [
DGLHeteroGraph(new_graph_index, graph.ntypes, graph.etypes)
......@@ -1446,7 +1774,7 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True, copy_ndata=True, copy_e
g._graph, dst_nodes_nd, include_dst_in_src)
# The new graph duplicates the original node types to SRC and DST sets.
new_ntypes = ([ntype for ntype in g.ntypes], [ntype for ntype in g.ntypes])
new_ntypes = (g.ntypes, g.ntypes)
new_graph = DGLBlock(new_graph_index, new_ntypes, g.etypes)
assert new_graph.is_unibipartite # sanity check
......@@ -1494,55 +1822,6 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True, copy_ndata=True, copy_e
return new_graph
def remove_edges(g, edge_ids):
"""Return a new graph with given edge IDs removed.
The nodes are preserved.
Parameters
----------
graph : DGLHeteroGraph
The graph
edge_ids : Tensor or dict[etypes, Tensor]
The edge IDs for each edge type.
Returns
-------
DGLHeteroGraph
The new graph.
The edge ID mapping from the new graph to the original graph is stored as
``dgl.EID`` on edge features.
"""
if not isinstance(edge_ids, Mapping):
if len(g.etypes) != 1:
raise ValueError(
"Graph has more than one edge type; specify a dict for edge_id instead.")
edge_ids = {g.canonical_etypes[0]: edge_ids}
edge_ids_nd = [nd.NULL[g._idtype_str]] * len(g.etypes)
for key, value in edge_ids.items():
if value.dtype != g.idtype:
# if didn't check, this function still works, but returns wrong result
raise utils.InconsistentDtypeException("Expect edge id tensors({}) to have \
the same index type as graph({})".format(value.dtype, g.idtype))
edge_ids_nd[g.get_etype_id(key)] = F.zerocopy_to_dgl_ndarray(value)
new_graph_index, induced_eids_nd = _CAPI_DGLRemoveEdges(g._graph, edge_ids_nd)
new_graph = DGLHeteroGraph(new_graph_index, g.ntypes, g.etypes)
for i, canonical_etype in enumerate(g.canonical_etypes):
data = induced_eids_nd[i]
if len(data) == 0:
# Empty means that either
# (1) no edges are removed and edges are not shuffled.
# (2) all edges are removed.
# The following statement deals with both cases.
new_graph.edges[canonical_etype].data[EID] = F.arange(
0, new_graph.number_of_edges(canonical_etype))
else:
new_graph.edges[canonical_etype].data[EID] = F.zerocopy_from_dgl_ndarray(data)
return new_graph
def in_subgraph(g, nodes):
"""Extract the subgraph containing only the in edges of the given nodes.
......@@ -1564,14 +1843,17 @@ def in_subgraph(g, nodes):
DGLHeteroGraph
The subgraph.
"""
if g.is_block:
raise DGLError('Extracting subgraph of a block graph is not allowed.')
if not isinstance(nodes, dict):
if len(g.ntypes) > 1:
raise DGLError("Must specify node type when the graph is not homogeneous.")
nodes = {g.ntypes[0] : nodes}
nodes = utils.prepare_tensor_dict(g, nodes, 'nodes')
nodes_all_types = []
for ntype in g.ntypes:
if ntype in nodes:
nodes_all_types.append(utils.toindex(nodes[ntype], g._idtype_str).todgltensor())
nodes_all_types.append(F.to_dgl_nd(nodes[ntype]))
else:
nodes_all_types.append(nd.NULL[g._idtype_str])
......@@ -1579,7 +1861,7 @@ def in_subgraph(g, nodes):
induced_edges = subgidx.induced_edges
ret = DGLHeteroGraph(subgidx.graph, g.ntypes, g.etypes)
for i, etype in enumerate(ret.canonical_etypes):
ret.edges[etype].data[EID] = induced_edges[i].tousertensor()
ret.edges[etype].data[EID] = induced_edges[i]
return ret
def out_subgraph(g, nodes):
......@@ -1603,14 +1885,17 @@ def out_subgraph(g, nodes):
DGLHeteroGraph
The subgraph.
"""
if g.is_block:
raise DGLError('Extracting subgraph of a block graph is not allowed.')
if not isinstance(nodes, dict):
if len(g.ntypes) > 1:
raise DGLError("Must specify node type when the graph is not homogeneous.")
nodes = {g.ntypes[0] : nodes}
nodes = utils.prepare_tensor_dict(g, nodes, 'nodes')
nodes_all_types = []
for ntype in g.ntypes:
if ntype in nodes:
nodes_all_types.append(utils.toindex(nodes[ntype], g._idtype_str).todgltensor())
nodes_all_types.append(F.to_dgl_nd(nodes[ntype]))
else:
nodes_all_types.append(nd.NULL[g._idtype_str])
......@@ -1618,7 +1903,7 @@ def out_subgraph(g, nodes):
induced_edges = subgidx.induced_edges
ret = DGLHeteroGraph(subgidx.graph, g.ntypes, g.etypes)
for i, etype in enumerate(ret.canonical_etypes):
ret.edges[etype].data[EID] = induced_edges[i].tousertensor()
ret.edges[etype].data[EID] = induced_edges[i]
return ret
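For completeness, a minimal sketch of how the two subgraph helpers above are typically called on a homogeneous graph:
import dgl
import torch as th

g = dgl.graph((th.tensor([0, 1, 2, 3]), th.tensor([1, 2, 3, 0])))
# Keep only edges that point INTO nodes 1 and 2; all nodes are preserved.
sg_in = dgl.in_subgraph(g, th.tensor([1, 2]))
print(sg_in.edges())           # expected: (tensor([0, 1]), tensor([1, 2]))
# Keep only edges that leave nodes 1 and 2; original edge IDs live in edata[dgl.EID].
sg_out = dgl.out_subgraph(g, th.tensor([1, 2]))
print(sg_out.edata[dgl.EID])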
def to_simple(g, return_counts='count', writeback_mapping=False, copy_ndata=True, copy_edata=False):
......@@ -1775,6 +2060,8 @@ def to_simple(g, return_counts='count', writeback_mapping=False, copy_ndata=True
{('user', 'wins', 'user'): tensor([1, 2, 1, 1])
('user', 'plays', 'game'): tensor([1, 1, 1])}
"""
if g.is_block:
raise DGLError('Cannot convert a block graph to a simple graph.')
simple_graph_index, counts, edge_maps = _CAPI_DGLToSimpleHetero(g._graph)
simple_graph = DGLHeteroGraph(simple_graph_index, g.ntypes, g.etypes)
counts = [F.zerocopy_from_dgl_ndarray(count) for count in counts]
......@@ -1814,50 +2101,24 @@ def to_simple(g, return_counts='count', writeback_mapping=False, copy_ndata=True
DGLHeteroGraph.to_simple = to_simple
def as_heterograph(g, ntype='_U', etype='_E'):
def as_heterograph(g, ntype='_U', etype='_E'): # pylint: disable=unused-argument
"""Convert a DGLGraph to a DGLHeteroGraph with one node and edge type.
Node and edge features are preserved. Returns 64 bits graph
Parameters
----------
g : DGLGraph
The graph
ntype : str, optional
The node type name
etype : str, optional
The edge type name
Returns
-------
DGLHeteroGraph
The heterograph.
DEPRECATED: DGLGraph and DGLHeteroGraph have been merged. This function will
do nothing and can be removed safely in all cases.
"""
hgi = _CAPI_DGLAsHeteroGraph(g._graph)
hg = DGLHeteroGraph(hgi, [ntype], [etype])
hg.ndata.update(g.ndata)
hg.edata.update(g.edata)
return hg
dgl_warning('DEPRECATED: DGLGraph and DGLHeteroGraph have been merged in v0.5.\n'
'\tdgl.as_heterograph will do nothing and can be removed safely in all cases.')
return g
def as_immutable_graph(hg):
"""Convert a DGLHeteroGraph with one node and edge type into a DGLGraph.
Node and edge features are preserved.
Parameters
----------
g : DGLHeteroGraph
The heterograph
Returns
-------
DGLGraph
The graph.
DEPRECATED: DGLGraph and DGLHeteroGraph have been merged. This function will
do nothing and can be removed safely in all cases.
"""
gidx = _CAPI_DGLAsImmutableGraph(hg._graph)
g = DGLGraph(gidx)
g.ndata.update(hg.ndata)
g.edata.update(hg.edata)
return g
dgl_warning('DEPRECATED: DGLGraph and DGLHeteroGraph have been merged in v0.5.\n'
'\tdgl.as_immutable_graph will do nothing and can be removed safely in all cases.')
return hg
_init_api("dgl.transform")
"""Internal utilities."""
from .internal import *
from .data import *
from .checks import *