Unverified Commit 44089c8b authored by Minjie Wang, committed by GitHub

[Refactor][Graph] Merge DGLGraph and DGLHeteroGraph (#1862)



* Merge

* [Graph][CUDA] Graph on GPU and many refactoring (#1791)

* change edge_ids behavior and C++ impl

* fix unittests; remove utils.Index in edge_id

* pass mx and th tests

* pass tf test

* add aten::Scatter_

* Add nonzero; impl CSRGetDataAndIndices/CSRSliceMatrix

* CSRGetData and CSRGetDataAndIndices passed tests

* CSRSliceMatrix basic tests

* fix bug in empty slice

* CUDA CSRHasDuplicate

* has_node; has_edge_between

* predecessors, successors

* deprecate send/recv; fix send_and_recv

* deprecate send/recv; fix send_and_recv

* in_edges; out_edges; all_edges; apply_edges

* in deg/out deg

* subgraph/edge_subgraph

* adj

* in_subgraph/out_subgraph

* sample neighbors

* set/get_n/e_repr

* wip: working on refactoring all idtypes

* pass ndata/edata tests on gpu

* fix

* stash

* workaround nonzero issue

* stash

* nx conversion

* test_hetero_basics except update routines

* test_update_routines

* test_hetero_basics for pytorch

* more fixes

* WIP: flatten graph

* wip: flatten

* test_flatten

* test_to_device

* fix bug in to_homo

* fix bug in CSRSliceMatrix

* pass subgraph test

* fix send_and_recv

* fix filter

* test_heterograph

* passed all pytorch tests

* fix mx unittest

* fix pytorch test_nn

* fix all unittests for PyTorch

* passed all mxnet tests

* lint

* fix tf nn test

* pass all tf tests

* lint

* lint

* change deprecation

* try fix compile

* lint

* update METIS

* fix utest

* fix

* fix utests

* try debug

* revert

* small fix

* fix utests

* upd

* upd

* upd

* fix

* upd

* upd

* upd

* upd

* upd

* trigger

* +1s

* [kernel] Use heterograph index instead of unitgraph index (#1813)

* upd

* upd

* upd

* fix

* upd

* upd

* upd

* upd

* upd

* trigger

* +1s

* [Graph] Mutation for Heterograph (#1818)

* mutation add_nodes and add_edges

* Add support for remove_edges, remove_nodes, add_selfloop, remove_selfloop

* Fix
Co-authored-by: Ubuntu <ubuntu@ip-172-31-51-214.ec2.internal>

* upd

* upd

* upd

* fix

* [Transform] Mutable transform (#1833)

* add nodes

* All three

* Fix

* lint

* Add some test case

* Fix

* Fix

* Fix

* Fix

* Fix

* Fix

* fix

* trigger

* Fix

* fix
Co-authored-by: Ubuntu <ubuntu@ip-172-31-51-214.ec2.internal>

* [Graph] Migrate Batch & Readout module to heterograph (#1836)

* dgl.batch

* unbatch

* fix to device

* reduce readout; segment reduce

* change batch_num_nodes|edges to function

* reduce readout/ softmax

* broadcast

* topk

* fix

* fix tf and mx

* fix some ci

* fix batch but unbatch differently

* new check

* upd

* upd

* upd

* idtype behavior; code reorg

* idtype behavior; code reorg

* wip: test_basics

* pass test_basics

* WIP: from nx/ to nx

* missing files

* upd

* pass test_basics:test_nx_conversion

* Fix test

* Fix inplace update

* WIP: fixing tests

* upd

* pass test_transform cpu

* pass gpu test_transform

* pass test_batched_graph

* GPU graph auto cast to int32

* missing file

* stash

* WIP: rgcn-hetero

* Fix two datasets

* upd

* weird

* Fix capsule

* fuck you

* fuck matthias

* Fix dgmg

* fix bug in block degrees; pass rgcn-hetero

* rgcn

* gat and diffpool fix
also fix ppi and tu dataset

* Tree LSTM

* pointcloud

* rrn; wip: sgc

* resolve conflicts

* upd

* sgc and reddit dataset

* upd

* Fix deepwalk, gindt and gcn

* fix datasets and sign

* optimization

* optimization

* upd

* upd

* Fix GIN

* fix bug in add_nodes add_edges; tagcn

* adaptive sampling and gcmc

* upd

* upd

* fix geometric

* fix

* metapath2vec

* fix agnn

* fix pickling problem of block

* fix utests

* miss file

* linegraph

* upd

* upd

* upd

* graphsage

* stgcn_wave

* fix hgt

* on unittests

* Fix transformer

* Fix HAN

* passed pytorch unittests

* lint

* fix

* Fix cluster gcn

* cluster-gcn is ready

* on fixing block related codes

* 2nd order derivative

* Revert "2nd order derivative"

This reverts commit 523bf6c249bee61b51b1ad1babf42aad4167f206.

* passed torch utests again

* fix all mxnet unittests

* delete some useless tests

* pass all tf cpu tests

* disable

* disable distributed unittest

* fix

* fix

* lint

* fix

* fix

* fix script

* fix tutorial

* fix apply edges bug

* fix 2 basics

* fix tutorial
Co-authored-by: yzh119 <expye@outlook.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-51-214.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-7-42.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-1-5.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-68-185.ec2.internal>
parent 015acfd2
......@@ -71,7 +71,7 @@ def generate_feature(g, broadcast='none', binary_op='none'):
u = F.tensor(np.random.uniform(-1, 1, (nv, D1, D2, D3)))
e = F.tensor(np.random.uniform(-1, 1, (ne, D1, D2, D3)))
v = F.tensor(np.random.uniform(-1, 1, (nv, D1, D2, D3)))
return u, v, e
return F.astype(u, F.float32), F.astype(v, F.float32), F.astype(e, F.float32)
def test_copy_src_reduce():
......@@ -80,9 +80,10 @@ def test_copy_src_reduce():
# NOTE(zihao): add self-loop to avoid zero-degree nodes.
# https://github.com/dmlc/dgl/issues/761
g.add_edges(g.nodes(), g.nodes())
g = g.to(F.ctx())
hu, hv, he = generate_feature(g, 'none', 'none')
if partial:
nid = F.tensor(list(range(0, 100, 2)))
nid = F.tensor(list(range(0, 100, 2)), g.idtype)
g.ndata['u'] = F.attach_grad(F.clone(hu))
g.ndata['v'] = F.attach_grad(F.clone(hv))
......@@ -141,9 +142,10 @@ def test_copy_edge_reduce():
g = dgl.DGLGraph(nx.erdos_renyi_graph(100, 0.1))
# NOTE(zihao): add self-loop to avoid zero-degree nodes.
g.add_edges(g.nodes(), g.nodes())
g = g.to(F.ctx())
hu, hv, he = generate_feature(g, 'none', 'none')
if partial:
nid = F.tensor(list(range(0, 100, 2)))
nid = F.tensor(list(range(0, 100, 2)), g.idtype)
g.ndata['u'] = F.attach_grad(F.clone(hu))
g.ndata['v'] = F.attach_grad(F.clone(hv))
......@@ -348,7 +350,8 @@ def test_all_binary_builtins():
g.add_edge(18, 1)
g.add_edge(19, 0)
g.add_edge(19, 1)
nid = F.tensor([0, 1, 4, 5, 7, 12, 14, 15, 18, 19])
g = g.to(F.ctx())
nid = F.tensor([0, 1, 4, 5, 7, 12, 14, 15, 18, 19], g.idtype)
target = ["u", "v", "e"]
for lhs, rhs in product(target, target):
......
import numpy as np
import dgl
from dgl.graph import DGLGraph
from collections import defaultdict as ddict
import scipy.sparse as sp
import backend as F
D = 5
def message_func(edges):
assert len(edges.src['h'].shape) == 2
assert edges.src['h'].shape[1] == D
return {'m' : edges.src['h']}
def reduce_func(nodes):
msgs = nodes.mailbox['m']
assert len(msgs.shape) == 3
assert msgs.shape[2] == D
return {'accum' : F.sum(msgs, 1)}
def apply_node_func(nodes):
return {'h' : nodes.data['h'] + nodes.data['accum']}
def generate_graph(grad=False):
g = DGLGraph()
g.add_nodes(10) # 10 nodes.
# create a graph where 0 is the source and 9 is the sink
# 16 edges
for i in range(1, 9):
g.add_edge(0, i)
g.add_edge(i, 9)
ncol = F.randn((10, D))
ecol = F.randn((16, D))
if grad:
ncol = F.attach_grad(ncol)
ecol = F.attach_grad(ecol)
g.set_n_initializer(dgl.init.zero_initializer)
g.set_e_initializer(dgl.init.zero_initializer)
g.ndata['h'] = ncol
g.edata['w'] = ecol
return g
def test_multi_send():
g = generate_graph()
def _fmsg(edges):
assert edges.src['h'].shape == (5, D)
return {'m' : edges.src['h']}
g.register_message_func(_fmsg)
# many-many send
u = F.tensor([0, 0, 0, 0, 0])
v = F.tensor([1, 2, 3, 4, 5])
g.send((u, v))
# duplicate send
u = F.tensor([0])
v = F.tensor([1, 2, 3, 4, 5])
g.send((u, v))
# send more
u = F.tensor([1, 2, 3, 4, 5])
v = F.tensor([9])
g.send((u, v))
# check if message indicator is as expected
expected = F.copy_to(F.zeros((g.number_of_edges(),), dtype=F.int64), F.cpu())
eid = g.edge_ids([0, 0, 0, 0, 0, 1, 2, 3, 4, 5],
[1, 2, 3, 4, 5, 9, 9, 9, 9, 9])
expected = F.asnumpy(expected)
eid = F.asnumpy(eid)
expected[eid] = 1
assert np.array_equal(g._get_msg_index().tonumpy(), expected)
def test_multi_recv():
# basic recv test
g = generate_graph()
h = g.ndata['h']
g.register_message_func(message_func)
g.register_reduce_func(reduce_func)
g.register_apply_node_func(apply_node_func)
expected = F.copy_to(F.zeros((g.number_of_edges(),), dtype=F.int64), F.cpu())
# two separate round of send and recv
u = [4, 5, 6]
v = [9]
g.send((u, v))
eid = g.edge_ids(u, v)
expected = F.asnumpy(expected)
eid = F.asnumpy(eid)
expected[eid] = 1
assert np.array_equal(g._get_msg_index().tonumpy(), expected)
g.recv(v)
expected[eid] = 0
assert np.array_equal(g._get_msg_index().tonumpy(), expected)
u = [0]
v = [1, 2, 3]
g.send((u, v))
eid = g.edge_ids(u, v)
eid = F.asnumpy(eid)
expected[eid] = 1
assert np.array_equal(g._get_msg_index().tonumpy(), expected)
g.recv(v)
expected[eid] = 0
assert np.array_equal(g._get_msg_index().tonumpy(), expected)
h1 = g.ndata['h']
# one send, two recv
g.ndata['h'] = h
u = F.tensor([0, 0, 0, 4, 5, 6])
v = F.tensor([1, 2, 3, 9, 9, 9])
g.send((u, v))
eid = g.edge_ids(u, v)
eid = F.asnumpy(eid)
expected[eid] = 1
assert np.array_equal(g._get_msg_index().tonumpy(), expected)
u = [4, 5, 6]
v = [9]
g.recv(v)
eid = g.edge_ids(u, v)
eid = F.asnumpy(eid)
expected[eid] = 0
assert np.array_equal(g._get_msg_index().tonumpy(), expected)
u = [0]
v = [1, 2, 3]
g.recv(v)
eid = g.edge_ids(u, v)
eid = F.asnumpy(eid)
expected[eid] = 0
assert np.array_equal(g._get_msg_index().tonumpy(), expected)
h2 = g.ndata['h']
assert F.allclose(h1, h2)
def test_multi_recv_0deg():
# test recv with 0deg nodes;
g = DGLGraph()
def _message(edges):
return {'m' : edges.src['h']}
def _reduce(nodes):
return {'h' : nodes.data['h'] + F.sum(nodes.mailbox['m'], 1)}
def _apply(nodes):
return {'h' : nodes.data['h'] * 2}
def _init2(shape, dtype, ctx, ids):
return 2 + F.zeros(shape, dtype=dtype, ctx=ctx)
g.register_message_func(_message)
g.register_reduce_func(_reduce)
g.register_apply_node_func(_apply)
g.set_n_initializer(_init2)
g.add_nodes(2)
g.add_edge(0, 1)
# recv both 0deg and non-0deg nodes
old = F.randn((2, 5))
g.ndata['h'] = old
g.send((0, 1))
g.recv([0, 1])
new = g.ndata['h']
# 0deg check: initialized with the func and got applied
assert F.allclose(new[0], F.full((5,), 4, F.float32))
# non-0deg check
assert F.allclose(new[1], F.sum(old, 0) * 2)
# recv again on zero degree node
g.recv([0])
assert F.allclose(g.nodes[0].data['h'], F.full((5,), 8, F.float32))
# recv again on node with no incoming message
g.recv([1])
assert F.allclose(g.nodes[1].data['h'], F.sum(old, 0) * 4)
def test_send_twice_different_shape():
g = generate_graph()
def _message_1(edges):
return {'h': edges.src['h']}
def _message_2(edges):
return {'h': F.cat((edges.src['h'], edges.data['w']), dim=1)}
g.send(message_func=_message_1)
g.send(message_func=_message_2)
def test_send_twice_different_msg():
g = DGLGraph()
g.set_n_initializer(dgl.init.zero_initializer)
g.add_nodes(3)
g.add_edge(0, 1)
g.add_edge(2, 1)
def _message_a(edges):
return {'a': edges.src['a']}
def _message_b(edges):
return {'a': edges.src['a'] * 3}
def _reduce(nodes):
return {'a': F.max(nodes.mailbox['a'], 1)}
old_repr = F.randn((3, 5))
g.ndata['a'] = old_repr
g.send((0, 1), _message_a)
g.send((0, 1), _message_b)
g.recv(1, _reduce)
new_repr = g.ndata['a']
assert F.allclose(new_repr[1], old_repr[0] * 3)
g.ndata['a'] = old_repr
g.send((0, 1), _message_a)
g.send((2, 1), _message_b)
g.recv(1, _reduce)
new_repr = g.ndata['a']
assert F.allclose(new_repr[1], F.max(F.stack([old_repr[0], old_repr[2] * 3], 0), 0))
def test_send_twice_different_field():
g = DGLGraph()
g.set_n_initializer(dgl.init.zero_initializer)
g.add_nodes(2)
g.add_edge(0, 1)
def _message_a(edges):
return {'a': edges.src['a']}
def _message_b(edges):
return {'b': edges.src['b']}
def _reduce(nodes):
return {'a': F.sum(nodes.mailbox['a'], 1), 'b': F.sum(nodes.mailbox['b'], 1)}
old_a = F.randn((2, 5))
old_b = F.randn((2, 5))
g.set_n_repr({'a': old_a, 'b': old_b})
g.send((0, 1), _message_a)
g.send((0, 1), _message_b)
g.recv([1], _reduce)
new_repr = g.get_n_repr()
assert F.allclose(new_repr['a'][1], old_a[0])
assert F.allclose(new_repr['b'][1], old_b[0])
def test_dynamic_addition():
N = 3
D = 1
g = DGLGraph()
def _init(shape, dtype, ctx, ids):
return F.copy_to(F.astype(F.randn(shape), dtype), ctx)
g.set_n_initializer(_init)
g.set_e_initializer(_init)
def _message(edges):
return {'m' : edges.src['h1'] + edges.dst['h2'] + edges.data['h1'] +
edges.data['h2']}
def _reduce(nodes):
return {'h' : F.sum(nodes.mailbox['m'], 1)}
def _apply(nodes):
return {'h' : nodes.data['h']}
g.register_message_func(_message)
g.register_reduce_func(_reduce)
g.register_apply_node_func(_apply)
g.set_n_initializer(dgl.init.zero_initializer)
g.set_e_initializer(dgl.init.zero_initializer)
# add nodes and edges
g.add_nodes(N)
g.ndata.update({'h1': F.randn((N, D)),
'h2': F.randn((N, D))})
g.add_nodes(3)
g.add_edge(0, 1)
g.add_edge(1, 0)
g.edata.update({'h1': F.randn((2, D)),
'h2': F.randn((2, D))})
g.send()
expected = F.copy_to(F.ones((g.number_of_edges(),), dtype=F.int64), F.cpu())
assert F.array_equal(g._get_msg_index().tousertensor(), expected)
# add more edges
g.add_edges([0, 2], [2, 0], {'h1': F.randn((2, D))})
g.send(([0, 2], [2, 0]))
g.recv(0)
g.add_edge(1, 2)
g.edges[4].data['h1'] = F.randn((1, D))
g.send((1, 2))
g.recv([1, 2])
h = g.ndata.pop('h')
# a complete round of send and recv
g.send()
g.recv()
assert F.allclose(h, g.ndata['h'])
def test_recv_no_send():
g = generate_graph()
g.recv(1, reduce_func)
# test recv after clear
g.clear()
g.add_nodes(3)
g.add_edges([0, 1], [1, 2])
g.set_n_initializer(dgl.init.zero_initializer)
g.ndata['h'] = F.randn((3, D))
g.send((1, 2), message_func)
expected = F.copy_to(F.zeros(2, dtype=F.int64), F.cpu())
expected = F.asnumpy(expected)
expected[1] = 1
assert np.array_equal(g._get_msg_index().tonumpy(), expected)
g.recv(2, reduce_func)
expected[1] = 0
assert np.array_equal(g._get_msg_index().tonumpy(), expected)
def test_send_recv_after_conversion():
# test send and recv after converting from a graph with edges
g = generate_graph()
# nx graph
nxg = g.to_networkx(node_attrs=['h'])
g1 = DGLGraph()
# some random node and edges
g1.add_nodes(4)
g1.add_edges([1, 2], [2, 3])
g1.set_n_initializer(dgl.init.zero_initializer)
g1.from_networkx(nxg, node_attrs=['h'])
# sparse matrix
row, col= g.all_edges()
data = range(len(row))
n = g.number_of_nodes()
a = sp.coo_matrix(
(data, (F.zerocopy_to_numpy(row), F.zerocopy_to_numpy(col))),
shape=(n, n))
g2 = DGLGraph()
# some random node and edges
g2.add_nodes(5)
g2.add_edges([1, 2, 4], [2, 3, 0])
g2.set_n_initializer(dgl.init.zero_initializer)
g2.from_scipy_sparse_matrix(a)
g2.ndata['h'] = g.ndata['h']
# on dgl graph
g.send(message_func=message_func)
g.recv([0, 1, 3, 5], reduce_func=reduce_func,
apply_node_func=apply_node_func)
g.recv([0, 2, 4, 8], reduce_func=reduce_func,
apply_node_func=apply_node_func)
# nx
g1.send(message_func=message_func)
g1.recv([0, 1, 3, 5], reduce_func=reduce_func,
apply_node_func=apply_node_func)
g1.recv([0, 2, 4, 8], reduce_func=reduce_func,
apply_node_func=apply_node_func)
# sparse matrix
g2.send(message_func=message_func)
g2.recv([0, 1, 3, 5], reduce_func=reduce_func,
apply_node_func=apply_node_func)
g2.recv([0, 2, 4, 8], reduce_func=reduce_func,
apply_node_func=apply_node_func)
assert F.allclose(g.ndata['h'], g1.ndata['h'])
assert F.allclose(g.ndata['h'], g2.ndata['h'])
if __name__ == '__main__':
test_multi_send()
test_multi_recv()
test_multi_recv_0deg()
test_dynamic_addition()
test_send_twice_different_shape()
test_send_twice_different_msg()
test_send_twice_different_field()
test_recv_no_send()
test_send_recv_after_conversion()
......@@ -28,40 +28,17 @@ def generate_rand_graph(n, connect_more=False, complete=False, add_self_loop=Fal
arr[0] = 1
arr[:, 0] = 1
if add_self_loop:
g = dgl.DGLGraph(arr, readonly=False)
g = dgl.DGLGraphStale(arr, readonly=False)
nodes = np.arange(g.number_of_nodes())
g.add_edges(nodes, nodes)
g.readonly()
else:
g = dgl.DGLGraph(arr, readonly=True)
g = dgl.DGLGraphStale(arr, readonly=True)
g.ndata['h1'] = F.randn((g.number_of_nodes(), 10))
g.edata['h2'] = F.randn((g.number_of_edges(), 3))
return g
def test_self_loop():
n = 100
num_hops = 2
g = generate_rand_graph(n, complete=True)
nf = create_mini_batch(g, num_hops, add_self_loop=True)
for i in range(1, nf.num_layers):
in_deg = nf.layer_in_degree(i)
deg = F.copy_to(F.ones(in_deg.shape, dtype=F.int64), F.cpu()) * n
assert_array_equal(F.asnumpy(in_deg), F.asnumpy(deg))
g = generate_rand_graph(n, complete=True, add_self_loop=True)
g = dgl.to_simple_graph(g)
nf = create_mini_batch(g, num_hops, add_self_loop=True)
for i in range(nf.num_blocks):
parent_eid = F.asnumpy(nf.block_parent_eid(i))
parent_nid = F.asnumpy(nf.layer_parent_nid(i + 1))
# The loop eid in the parent graph must exist in the block parent eid.
parent_loop_eid = F.asnumpy(g.edge_ids(parent_nid, parent_nid))
assert len(parent_loop_eid) == len(parent_nid)
for eid in parent_loop_eid:
assert eid in parent_eid
def create_mini_batch(g, num_hops, add_self_loop=False):
seed_ids = np.array([1, 2, 0, 3])
sampler = NeighborSampler(g, batch_size=4, expand_factor=g.number_of_nodes(),
......
......@@ -9,7 +9,9 @@ import backend as F
import dgl.function as fn
import pickle
import io
import unittest
import unittest, pytest
import test_utils
from test_utils import parametrize_dtype, get_cases
def _assert_is_identical(g, g2):
assert g.is_readonly == g2.is_readonly
......@@ -147,130 +149,15 @@ def test_pickling_frame():
def _global_message_func(nodes):
return {'x': nodes.data['x']}
def test_pickling_graph():
# graph structures and frames are pickled
g = dgl.DGLGraph()
g.add_nodes(3)
src = F.tensor([0, 0])
dst = F.tensor([1, 2])
g.add_edges(src, dst)
x = F.randn((3, 7))
y = F.randn((3, 5))
a = F.randn((2, 6))
b = F.randn((2, 4))
g.ndata['x'] = x
g.ndata['y'] = y
g.edata['a'] = a
g.edata['b'] = b
# registered functions are pickled
g.register_message_func(_global_message_func)
reduce_func = fn.sum('x', 'x')
g.register_reduce_func(reduce_func)
# custom attributes should be pickled
g.foo = 2
new_g = _reconstruct_pickle(g)
_assert_is_identical(g, new_g)
assert new_g.foo == 2
assert new_g._message_func == _global_message_func
assert isinstance(new_g._reduce_func, type(reduce_func))
assert new_g._reduce_func._name == 'sum'
assert new_g._reduce_func.msg_field == 'x'
assert new_g._reduce_func.out_field == 'x'
# test batched graph with partial set case
g2 = dgl.DGLGraph()
g2.add_nodes(4)
src2 = F.tensor([0, 1])
dst2 = F.tensor([2, 3])
g2.add_edges(src2, dst2)
x2 = F.randn((4, 7))
y2 = F.randn((3, 5))
a2 = F.randn((2, 6))
b2 = F.randn((2, 4))
g2.ndata['x'] = x2
g2.nodes[[0, 1, 3]].data['y'] = y2
g2.edata['a'] = a2
g2.edata['b'] = b2
bg = dgl.batch([g, g2])
bg2 = _reconstruct_pickle(bg)
_assert_is_identical(bg, bg2)
new_g, new_g2 = dgl.unbatch(bg2)
_assert_is_identical(g, new_g)
_assert_is_identical(g2, new_g2)
# readonly graph
g = dgl.DGLGraph([(0, 1), (1, 2)], readonly=True)
new_g = _reconstruct_pickle(g)
_assert_is_identical(g, new_g)
# multigraph
g = dgl.DGLGraph([(0, 1), (0, 1), (1, 2)])
new_g = _reconstruct_pickle(g)
_assert_is_identical(g, new_g)
# readonly multigraph
g = dgl.DGLGraph([(0, 1), (0, 1), (1, 2)], readonly=True)
new_g = _reconstruct_pickle(g)
_assert_is_identical(g, new_g)
def test_pickling_nodeflow():
elist = [(0, 1), (1, 2), (2, 3), (3, 0)]
g = dgl.DGLGraph(elist, readonly=True)
g.ndata['x'] = F.randn((4, 5))
g.edata['y'] = F.randn((4, 3))
nf = contrib.sampling.sampler.create_full_nodeflow(g, 5)
nf.copy_from_parent() # add features
new_nf = _reconstruct_pickle(nf)
_assert_is_identical_nodeflow(nf, new_nf)
def test_pickling_batched_graph():
glist = [nx.path_graph(i + 5) for i in range(5)]
glist = [dgl.DGLGraph(g) for g in glist]
bg = dgl.batch(glist)
bg.ndata['x'] = F.randn((35, 5))
bg.edata['y'] = F.randn((60, 3))
new_bg = _reconstruct_pickle(bg)
_assert_is_identical_batchedgraph(bg, new_bg)
def test_pickling_heterograph():
# copied from test_heterograph.create_test_heterograph()
plays_spmat = ssp.coo_matrix(([1, 1, 1, 1], ([0, 1, 2, 1], [0, 0, 1, 1])))
wishes_nx = nx.DiGraph()
wishes_nx.add_nodes_from(['u0', 'u1', 'u2'], bipartite=0)
wishes_nx.add_nodes_from(['g0', 'g1'], bipartite=1)
wishes_nx.add_edge('u0', 'g1', id=0)
wishes_nx.add_edge('u2', 'g0', id=1)
follows_g = dgl.graph([(0, 1), (1, 2)], 'user', 'follows')
plays_g = dgl.bipartite(plays_spmat, 'user', 'plays', 'game')
wishes_g = dgl.bipartite(wishes_nx, 'user', 'wishes', 'game')
develops_g = dgl.bipartite([(0, 0), (1, 1)], 'developer', 'develops', 'game')
g = dgl.hetero_from_relations([follows_g, plays_g, wishes_g, develops_g])
g.nodes['user'].data['u_h'] = F.randn((3, 4))
g.nodes['game'].data['g_h'] = F.randn((2, 5))
g.edges['plays'].data['p_h'] = F.randn((4, 6))
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
@pytest.mark.parametrize('g', get_cases(exclude=['dglgraph']))
def test_pickling_graph(g, idtype):
g = g.astype(idtype)
new_g = _reconstruct_pickle(g)
_assert_is_identical_hetero(g, new_g)
block = dgl.to_block(g, {'user': [1, 2], 'game': [0, 1], 'developer': []})
new_block = _reconstruct_pickle(block)
_assert_is_identical_hetero(block, new_block)
assert block.is_block
assert new_block.is_block
test_utils.check_graph_equal(g, new_g, check_feature=True)
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
def test_pickling_batched_heterograph():
# copied from test_heterograph.create_test_heterograph()
plays_spmat = ssp.coo_matrix(([1, 1, 1, 1], ([0, 1, 2, 1], [0, 0, 1, 1])))
......@@ -296,8 +183,9 @@ def test_pickling_batched_heterograph():
bg = dgl.batch_hetero([g, g2])
new_bg = _reconstruct_pickle(bg)
_assert_is_identical_batchedhetero(bg, new_bg)
test_utils.check_graph_equal(bg, new_bg)
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@unittest.skipIf(dgl.backend.backend_name != "pytorch", reason="Only test for pytorch format file")
def test_pickling_heterograph_index_compatibility():
plays_spmat = ssp.coo_matrix(([1, 1, 1, 1], ([0, 1, 2, 1], [0, 0, 1, 1])))
......
import dgl
import networkx as nx
import backend as F
import unittest
import utils as U
from utils import parametrize_dtype
def mfunc(edges):
return {'m' : edges.src['x']}
......@@ -10,18 +12,20 @@ def rfunc(nodes):
msg = F.sum(nodes.mailbox['m'], 1)
return {'x' : nodes.data['x'] + msg}
def test_prop_nodes_bfs():
g = dgl.DGLGraph(nx.path_graph(5))
g = dgl.graph(g.edges())
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
def test_prop_nodes_bfs(idtype):
g = dgl.graph(nx.path_graph(5), idtype=idtype, device=F.ctx())
g.ndata['x'] = F.ones((5, 2))
dgl.prop_nodes_bfs(g, 0, message_func=mfunc, reduce_func=rfunc, apply_node_func=None)
# pull nodes using bfs order will result in a cumsum[i] + data[i] + data[i+1]
assert F.allclose(g.ndata['x'],
F.tensor([[2., 2.], [4., 4.], [6., 6.], [8., 8.], [9., 9.]]))
def test_prop_edges_dfs():
g = dgl.DGLGraph(nx.path_graph(5))
g = dgl.graph(g.edges())
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
def test_prop_edges_dfs(idtype):
g = dgl.graph(nx.path_graph(5), idtype=idtype, device=F.ctx())
g.ndata['x'] = F.ones((5, 2))
dgl.prop_edges_dfs(g, 0, message_func=mfunc, reduce_func=rfunc, apply_node_func=None)
# snr using dfs results in a cumsum
......@@ -40,10 +44,11 @@ def test_prop_edges_dfs():
assert F.allclose(g.ndata['x'],
F.tensor([[3., 3.], [5., 5.], [7., 7.], [9., 9.], [5., 5.]]))
def test_prop_nodes_topo():
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
def test_prop_nodes_topo(idtype):
# bi-directional chain
g = dgl.DGLGraph(nx.path_graph(5))
g = dgl.graph(g.edges())
g = dgl.graph(nx.path_graph(5), idtype=idtype, device=F.ctx())
assert U.check_fail(dgl.prop_nodes_topo, g) # has loop
# tree
......
......@@ -12,7 +12,7 @@ def test_random_walk():
n_traces = 3
n_hops = 4
g = dgl.DGLGraph(edge_list, readonly=True)
g = dgl.DGLGraphStale(edge_list, readonly=True)
traces = dgl.contrib.sampling.random_walk(g, seeds, n_traces, n_hops)
traces = F.zerocopy_to_numpy(traces)
......@@ -31,7 +31,7 @@ def test_random_walk_with_restart():
seeds = [0, 1]
max_nodes = 10
g = dgl.DGLGraph(edge_list)
g = dgl.DGLGraphStale(edge_list)
# test normal RWR
traces = dgl.contrib.sampling.random_walk_with_restart(g, seeds, 0.2, max_nodes)
......@@ -61,9 +61,9 @@ def test_random_walk_with_restart():
assert (trace_diff % 2 == 0).all()
@parametrize_dtype
def test_metapath_random_walk(index_dtype):
g1 = dgl.bipartite(([0, 1, 2, 3], [0, 1, 2, 3]), 'a', 'ab', 'b', index_dtype=index_dtype)
g2 = dgl.bipartite(([0, 0, 1, 1, 2, 2, 3, 3], [1, 3, 2, 0, 3, 1, 0, 2]), 'b', 'ba', 'a', index_dtype=index_dtype)
def test_metapath_random_walk(idtype):
g1 = dgl.bipartite(([0, 1, 2, 3], [0, 1, 2, 3]), 'a', 'ab', 'b', idtype=idtype)
g2 = dgl.bipartite(([0, 0, 1, 1, 2, 2, 3, 3], [1, 3, 2, 0, 3, 1, 0, 2]), 'b', 'ba', 'a', idtype=idtype)
G = dgl.hetero_from_relations([g1, g2])
seeds = [0, 1]
traces = dgl.contrib.sampling.metapath_random_walk(G, ['ab', 'ba'] * 4, seeds, 3)
......
import dgl
import numpy as np
import backend as F
import networkx as nx
import unittest
def test_simple_readout():
g1 = dgl.DGLGraph()
g1.add_nodes(3)
g2 = dgl.DGLGraph()
g2.add_nodes(4) # no edges
g1.add_edges([0, 1, 2], [2, 0, 1])
n1 = F.randn((3, 5))
n2 = F.randn((4, 5))
e1 = F.randn((3, 5))
s1 = F.sum(n1, 0) # node sums
s2 = F.sum(n2, 0)
se1 = F.sum(e1, 0) # edge sums
m1 = F.mean(n1, 0) # node means
m2 = F.mean(n2, 0)
me1 = F.mean(e1, 0) # edge means
w1 = F.randn((3,))
w2 = F.randn((4,))
max1 = F.max(n1, 0)
max2 = F.max(n2, 0)
maxe1 = F.max(e1, 0)
ws1 = F.sum(n1 * F.unsqueeze(w1, 1), 0)
ws2 = F.sum(n2 * F.unsqueeze(w2, 1), 0)
wm1 = F.sum(n1 * F.unsqueeze(w1, 1), 0) / F.sum(F.unsqueeze(w1, 1), 0)
wm2 = F.sum(n2 * F.unsqueeze(w2, 1), 0) / F.sum(F.unsqueeze(w2, 1), 0)
g1.ndata['x'] = n1
g2.ndata['x'] = n2
g1.ndata['w'] = w1
g2.ndata['w'] = w2
g1.edata['x'] = e1
assert F.allclose(dgl.sum_nodes(g1, 'x'), s1)
assert F.allclose(dgl.sum_nodes(g1, 'x', 'w'), ws1)
assert F.allclose(dgl.sum_edges(g1, 'x'), se1)
assert F.allclose(dgl.mean_nodes(g1, 'x'), m1)
assert F.allclose(dgl.mean_nodes(g1, 'x', 'w'), wm1)
assert F.allclose(dgl.mean_edges(g1, 'x'), me1)
assert F.allclose(dgl.max_nodes(g1, 'x'), max1)
assert F.allclose(dgl.max_edges(g1, 'x'), maxe1)
g = dgl.batch([g1, g2])
s = dgl.sum_nodes(g, 'x')
m = dgl.mean_nodes(g, 'x')
max_bg = dgl.max_nodes(g, 'x')
assert F.allclose(s, F.stack([s1, s2], 0))
assert F.allclose(m, F.stack([m1, m2], 0))
assert F.allclose(max_bg, F.stack([max1, max2], 0))
ws = dgl.sum_nodes(g, 'x', 'w')
wm = dgl.mean_nodes(g, 'x', 'w')
assert F.allclose(ws, F.stack([ws1, ws2], 0))
assert F.allclose(wm, F.stack([wm1, wm2], 0))
s = dgl.sum_edges(g, 'x')
m = dgl.mean_edges(g, 'x')
max_bg_e = dgl.max_edges(g, 'x')
assert F.allclose(s, F.stack([se1, F.zeros(5)], 0))
assert F.allclose(m, F.stack([me1, F.zeros(5)], 0))
# TODO(zihao): fix -inf issue
# assert F.allclose(max_bg_e, F.stack([maxe1, F.zeros(5)], 0))
def test_topk_nodes():
# test#1: basic
g0 = dgl.DGLGraph(nx.path_graph(14))
feat0 = F.randn((g0.number_of_nodes(), 10))
g0.ndata['x'] = feat0
# to test the case where k > number of nodes.
dgl.topk_nodes(g0, 'x', 20, idx=-1)
# test correctness
val, indices = dgl.topk_nodes(g0, 'x', 5, idx=-1)
ground_truth = F.reshape(
F.argsort(F.slice_axis(feat0, -1, 9, 10), 0, True)[:5], (5,))
assert F.allclose(ground_truth, indices)
g0.ndata.pop('x')
# test#2: batched graph
g1 = dgl.DGLGraph(nx.path_graph(12))
feat1 = F.randn((g1.number_of_nodes(), 10))
bg = dgl.batch([g0, g1])
bg.ndata['x'] = F.cat([feat0, feat1], 0)
# to test the case where k > number of nodes.
dgl.topk_nodes(bg, 'x', 16, idx=1)
# test correctness
val, indices = dgl.topk_nodes(bg, 'x', 6, descending=False, idx=0)
ground_truth_0 = F.reshape(
F.argsort(F.slice_axis(feat0, -1, 0, 1), 0, False)[:6], (6,))
ground_truth_1 = F.reshape(
F.argsort(F.slice_axis(feat1, -1, 0, 1), 0, False)[:6], (6,))
ground_truth = F.stack([ground_truth_0, ground_truth_1], 0)
assert F.allclose(ground_truth, indices)
# test idx=None
val, indices = dgl.topk_nodes(bg, 'x', 6, descending=True)
assert F.allclose(val, F.stack([F.topk(feat0, 6, 0), F.topk(feat1, 6, 0)], 0))
def test_topk_edges():
# test#1: basic
g0 = dgl.DGLGraph(nx.path_graph(14))
feat0 = F.randn((g0.number_of_edges(), 10))
g0.edata['x'] = feat0
# to test the case where k > number of edges.
dgl.topk_edges(g0, 'x', 30, idx=-1)
# test correctness
val, indices = dgl.topk_edges(g0, 'x', 7, idx=-1)
ground_truth = F.reshape(
F.argsort(F.slice_axis(feat0, -1, 9, 10), 0, True)[:7], (7,))
assert F.allclose(ground_truth, indices)
g0.edata.pop('x')
# test#2: batched graph
g1 = dgl.DGLGraph(nx.path_graph(12))
feat1 = F.randn((g1.number_of_edges(), 10))
bg = dgl.batch([g0, g1])
bg.edata['x'] = F.cat([feat0, feat1], 0)
# to test the case where k > number of edges.
dgl.topk_edges(bg, 'x', 33, idx=1)
# test correctness
val, indices = dgl.topk_edges(bg, 'x', 4, descending=False, idx=0)
ground_truth_0 = F.reshape(
F.argsort(F.slice_axis(feat0, -1, 0, 1), 0, False)[:4], (4,))
ground_truth_1 = F.reshape(
F.argsort(F.slice_axis(feat1, -1, 0, 1), 0, False)[:4], (4,))
ground_truth = F.stack([ground_truth_0, ground_truth_1], 0)
assert F.allclose(ground_truth, indices)
# test idx=None
val, indices = dgl.topk_edges(bg, 'x', 6, descending=True)
assert F.allclose(val, F.stack([F.topk(feat0, 6, 0), F.topk(feat1, 6, 0)], 0))
def test_softmax_nodes():
# test#1: basic
g0 = dgl.DGLGraph(nx.path_graph(9))
feat0 = F.randn((g0.number_of_nodes(), 10))
g0.ndata['x'] = feat0
ground_truth = F.softmax(feat0, dim=0)
assert F.allclose(dgl.softmax_nodes(g0, 'x'), ground_truth)
g0.ndata.pop('x')
# test#2: batched graph
g1 = dgl.DGLGraph(nx.path_graph(5))
g2 = dgl.DGLGraph(nx.path_graph(3))
g3 = dgl.DGLGraph()
g4 = dgl.DGLGraph(nx.path_graph(10))
bg = dgl.batch([g0, g1, g2, g3, g4])
feat1 = F.randn((g1.number_of_nodes(), 10))
feat2 = F.randn((g2.number_of_nodes(), 10))
feat4 = F.randn((g4.number_of_nodes(), 10))
bg.ndata['x'] = F.cat([feat0, feat1, feat2, feat4], 0)
ground_truth = F.cat([
F.softmax(feat0, 0),
F.softmax(feat1, 0),
F.softmax(feat2, 0),
F.softmax(feat4, 0)
], 0)
assert F.allclose(dgl.softmax_nodes(bg, 'x'), ground_truth)
def test_softmax_edges():
# test#1: basic
g0 = dgl.DGLGraph(nx.path_graph(10))
feat0 = F.randn((g0.number_of_edges(), 10))
g0.edata['x'] = feat0
ground_truth = F.softmax(feat0, dim=0)
assert F.allclose(dgl.softmax_edges(g0, 'x'), ground_truth)
g0.edata.pop('x')
# test#2: batched graph
g1 = dgl.DGLGraph(nx.path_graph(5))
g2 = dgl.DGLGraph(nx.path_graph(3))
g3 = dgl.DGLGraph()
g4 = dgl.DGLGraph(nx.path_graph(10))
bg = dgl.batch([g0, g1, g2, g3, g4])
feat1 = F.randn((g1.number_of_edges(), 10))
feat2 = F.randn((g2.number_of_edges(), 10))
feat4 = F.randn((g4.number_of_edges(), 10))
bg.edata['x'] = F.cat([feat0, feat1, feat2, feat4], 0)
ground_truth = F.cat([
F.softmax(feat0, 0),
F.softmax(feat1, 0),
F.softmax(feat2, 0),
F.softmax(feat4, 0)
], 0)
assert F.allclose(dgl.softmax_edges(bg, 'x'), ground_truth)
def test_broadcast_nodes():
# test#1: basic
g0 = dgl.DGLGraph(nx.path_graph(10))
feat0 = F.randn((1, 40))
ground_truth = F.stack([feat0] * g0.number_of_nodes(), 0)
assert F.allclose(dgl.broadcast_nodes(g0, feat0), ground_truth)
# test#2: batched graph
g1 = dgl.DGLGraph(nx.path_graph(3))
g2 = dgl.DGLGraph()
g3 = dgl.DGLGraph(nx.path_graph(12))
bg = dgl.batch([g0, g1, g2, g3])
feat1 = F.randn((1, 40))
feat2 = F.randn((1, 40))
feat3 = F.randn((1, 40))
ground_truth = F.cat(
[feat0] * g0.number_of_nodes() +\
[feat1] * g1.number_of_nodes() +\
[feat2] * g2.number_of_nodes() +\
[feat3] * g3.number_of_nodes(), 0
)
assert F.allclose(dgl.broadcast_nodes(
bg, F.cat([feat0, feat1, feat2, feat3], 0)
), ground_truth)
def test_broadcast_edges():
# test#1: basic
g0 = dgl.DGLGraph(nx.path_graph(10))
feat0 = F.randn((1, 40))
ground_truth = F.stack([feat0] * g0.number_of_edges(), 0)
assert F.allclose(dgl.broadcast_edges(g0, feat0), ground_truth)
# test#2: batched graph
g1 = dgl.DGLGraph(nx.path_graph(3))
g2 = dgl.DGLGraph()
g3 = dgl.DGLGraph(nx.path_graph(12))
bg = dgl.batch([g0, g1, g2, g3])
feat1 = F.randn((1, 40))
feat2 = F.randn((1, 40))
feat3 = F.randn((1, 40))
ground_truth = F.cat(
[feat0] * g0.number_of_edges() +\
[feat1] * g1.number_of_edges() +\
[feat2] * g2.number_of_edges() +\
[feat3] * g3.number_of_edges(), 0
)
assert F.allclose(dgl.broadcast_edges(
bg, F.cat([feat0, feat1, feat2, feat3], 0)
), ground_truth)
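`broadcast_nodes` / `broadcast_edges` copy each graph-level feature row to every node (or edge) of the corresponding graph in the batch, which is what the concatenated `ground_truth` above encodes. A NumPy sketch of that behavior (function name is illustrative, not DGL API):

```python
import numpy as np

def broadcast_rows(graph_feat, seg_sizes):
    """Repeat row i of graph_feat seg_sizes[i] times along axis 0.

    graph_feat: (B, D); seg_sizes: per-graph node (or edge) counts,
    zeros allowed for empty graphs. Mirrors dgl.broadcast_nodes /
    dgl.broadcast_edges on a batched graph.
    """
    return np.repeat(graph_feat, seg_sizes, axis=0)
```

Note that an empty graph (size 0, like `g2` above) simply contributes no rows.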
if __name__ == '__main__':
test_simple_readout()
test_topk_nodes()
test_topk_edges()
test_softmax_nodes()
test_softmax_edges()
test_broadcast_nodes()
test_broadcast_edges()
import pytest
from test_utils.graph_cases import get_cases
from utils import parametrize_dtype
@parametrize_dtype
def test_sum_case1(idtype):
# NOTE: If you want to update this test case, remember to update the docstring
# example too!!!
g1 = dgl.graph(([0, 1], [1, 0]), idtype=idtype, device=F.ctx())
g1.ndata['h'] = F.tensor([1., 2.])
g2 = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
g2.ndata['h'] = F.tensor([1., 2., 3.])
bg = dgl.batch([g1, g2])
bg.ndata['w'] = F.tensor([.1, .2, .1, .5, .2])
assert F.allclose(F.tensor([3.]), dgl.sum_nodes(g1, 'h'))
assert F.allclose(F.tensor([3., 6.]), dgl.sum_nodes(bg, 'h'))
assert F.allclose(F.tensor([.5, 1.7]), dgl.sum_nodes(bg, 'h', 'w'))
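With a weight field, `sum_nodes(bg, 'h', 'w')` computes a per-graph weighted sum. A NumPy sketch reproducing the expected values in the assertion above (assuming 1-D features, for illustration only):

```python
import numpy as np

def segment_weighted_sum(h, w, seg_sizes):
    """Per-segment sum of h * w; mirrors dgl.sum_nodes(bg, 'h', 'w')
    for scalar node features on a batched graph."""
    hw = h * w
    out = []
    start = 0
    for size in seg_sizes:
        out.append(hw[start:start + size].sum())
        start += size
    return np.array(out)
```

For `h = [1, 2, 1, 2, 3]`, `w = [.1, .2, .1, .5, .2]` and segments `[2, 3]` this yields `[0.5, 1.7]`, matching the test.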
@parametrize_dtype
@pytest.mark.parametrize('g', get_cases(['homo'], exclude=['dglgraph']))
@pytest.mark.parametrize('reducer', ['sum', 'max', 'mean'])
def test_reduce_readout(g, idtype, reducer):
g = g.astype(idtype).to(F.ctx())
g.ndata['h'] = F.randn((g.number_of_nodes(), 3))
g.edata['h'] = F.randn((g.number_of_edges(), 2))
# Test.1: node readout
x = dgl.readout_nodes(g, 'h', op=reducer)
# check correctness
subg = dgl.unbatch(g)
subx = []
for sg in subg:
sx = dgl.readout_nodes(sg, 'h', op=reducer)
subx.append(sx)
assert F.allclose(x, F.cat(subx, dim=0))
x = getattr(dgl, '{}_nodes'.format(reducer))(g, 'h')
# check correctness
subg = dgl.unbatch(g)
subx = []
for sg in subg:
sx = getattr(dgl, '{}_nodes'.format(reducer))(sg, 'h')
subx.append(sx)
assert F.allclose(x, F.cat(subx, dim=0))
# Test.2: edge readout
x = dgl.readout_edges(g, 'h', op=reducer)
# check correctness
subg = dgl.unbatch(g)
subx = []
for sg in subg:
sx = dgl.readout_edges(sg, 'h', op=reducer)
subx.append(sx)
assert F.allclose(x, F.cat(subx, dim=0))
x = getattr(dgl, '{}_edges'.format(reducer))(g, 'h')
# check correctness
subg = dgl.unbatch(g)
subx = []
for sg in subg:
sx = getattr(dgl, '{}_edges'.format(reducer))(sg, 'h')
subx.append(sx)
assert F.allclose(x, F.cat(subx, dim=0))
@parametrize_dtype
@pytest.mark.parametrize('g', get_cases(['homo'], exclude=['dglgraph']))
@pytest.mark.parametrize('reducer', ['sum', 'max', 'mean'])
def test_weighted_reduce_readout(g, idtype, reducer):
g = g.astype(idtype).to(F.ctx())
g.ndata['h'] = F.randn((g.number_of_nodes(), 3))
g.ndata['w'] = F.randn((g.number_of_nodes(), 1))
g.edata['h'] = F.randn((g.number_of_edges(), 2))
g.edata['w'] = F.randn((g.number_of_edges(), 1))
# Test.1: node readout
x = dgl.readout_nodes(g, 'h', 'w', op=reducer)
# check correctness
subg = dgl.unbatch(g)
subx = []
for sg in subg:
sx = dgl.readout_nodes(sg, 'h', 'w', op=reducer)
subx.append(sx)
assert F.allclose(x, F.cat(subx, dim=0))
x = getattr(dgl, '{}_nodes'.format(reducer))(g, 'h', 'w')
# check correctness
subg = dgl.unbatch(g)
subx = []
for sg in subg:
sx = getattr(dgl, '{}_nodes'.format(reducer))(sg, 'h', 'w')
subx.append(sx)
assert F.allclose(x, F.cat(subx, dim=0))
# Test.2: edge readout
x = dgl.readout_edges(g, 'h', 'w', op=reducer)
# check correctness
subg = dgl.unbatch(g)
subx = []
for sg in subg:
sx = dgl.readout_edges(sg, 'h', 'w', op=reducer)
subx.append(sx)
assert F.allclose(x, F.cat(subx, dim=0))
x = getattr(dgl, '{}_edges'.format(reducer))(g, 'h', 'w')
# check correctness
subg = dgl.unbatch(g)
subx = []
for sg in subg:
sx = getattr(dgl, '{}_edges'.format(reducer))(sg, 'h', 'w')
subx.append(sx)
assert F.allclose(x, F.cat(subx, dim=0))
@parametrize_dtype
@pytest.mark.parametrize('g', get_cases(['homo'], exclude=['dglgraph']))
@pytest.mark.parametrize('descending', [True, False])
def test_topk(g, idtype, descending):
g = g.astype(idtype).to(F.ctx())
g.ndata['x'] = F.randn((g.number_of_nodes(), 3))
# Test.1: the case where k > number of nodes.
dgl.topk_nodes(g, 'x', 100, sortby=-1)
# Test.2: test correctness
min_nnodes = F.asnumpy(g.batch_num_nodes()).min()
if min_nnodes <= 1:
return
k = min_nnodes - 1
val, indices = dgl.topk_nodes(g, 'x', k, descending=descending, sortby=-1)
print(k)
print(g.ndata['x'])
print('val', val)
print('indices', indices)
subg = dgl.unbatch(g)
subval, subidx = [], []
for sg in subg:
subx = F.asnumpy(sg.ndata['x'])
ai = np.argsort(subx[:,-1:].flatten())
if descending:
ai = np.ascontiguousarray(ai[::-1])
subx = np.expand_dims(subx[ai[:k]], 0)
subval.append(F.tensor(subx))
subidx.append(F.tensor(np.expand_dims(ai[:k], 0)))
print(F.cat(subval, dim=0))
assert F.allclose(val, F.cat(subval, dim=0))
assert F.allclose(indices, F.cat(subidx, dim=0))
# Test.3: sortby=None
dgl.topk_nodes(g, 'x', k, sortby=None)
g.edata['x'] = F.randn((g.number_of_edges(), 3))
# Test.4: topk edges where k > number of edges.
dgl.topk_edges(g, 'x', 100, sortby=-1)
# Test.5: test correctness of topk edges
min_nedges = F.asnumpy(g.batch_num_edges()).min()
if min_nedges <= 1:
return
k = min_nedges - 1
val, indices = dgl.topk_edges(g, 'x', k, descending=descending, sortby=-1)
print(k)
print(g.edata['x'])
print('val', val)
print('indices', indices)
subg = dgl.unbatch(g)
subval, subidx = [], []
for sg in subg:
subx = F.asnumpy(sg.edata['x'])
ai = np.argsort(subx[:,-1:].flatten())
if descending:
ai = np.ascontiguousarray(ai[::-1])
subx = np.expand_dims(subx[ai[:k]], 0)
subval.append(F.tensor(subx))
subidx.append(F.tensor(np.expand_dims(ai[:k], 0)))
print(F.cat(subval, dim=0))
assert F.allclose(val, F.cat(subval, dim=0))
assert F.allclose(indices, F.cat(subidx, dim=0))
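The reference computation in the loop above boils down to a per-graph top-k of rows ranked by a sort-by column. A NumPy sketch of that single-graph step (assuming `sortby=-1`, i.e. rank by the last feature column):

```python
import numpy as np

def topk_by_last_col(feat, k, descending=True):
    """Select the k rows of feat with the largest (or smallest)
    values in its last column; returns (values, indices).

    Mirrors the per-subgraph reference check used above for
    dgl.topk_nodes / dgl.topk_edges with sortby=-1.
    """
    order = np.argsort(feat[:, -1])
    if descending:
        order = order[::-1]
    idx = order[:k]
    return feat[idx], idx
```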
@parametrize_dtype
@pytest.mark.parametrize('g', get_cases(['homo'], exclude=['dglgraph']))
def test_softmax(g, idtype):
g = g.astype(idtype).to(F.ctx())
g.ndata['h'] = F.randn((g.number_of_nodes(), 3))
g.edata['h'] = F.randn((g.number_of_edges(), 2))
# Test.1: node readout
x = dgl.softmax_nodes(g, 'h')
subg = dgl.unbatch(g)
subx = []
for sg in subg:
subx.append(F.softmax(sg.ndata['h'], dim=0))
assert F.allclose(x, F.cat(subx, dim=0))
# Test.2: edge readout
x = dgl.softmax_edges(g, 'h')
subg = dgl.unbatch(g)
subx = []
for sg in subg:
subx.append(F.softmax(sg.edata['h'], dim=0))
assert F.allclose(x, F.cat(subx, dim=0))
@parametrize_dtype
@pytest.mark.parametrize('g', get_cases(['homo'], exclude=['dglgraph']))
def test_broadcast(idtype, g):
g = g.astype(idtype).to(F.ctx())
gfeat = F.randn((g.batch_size, 3))
# Test.0: broadcast_nodes
g.ndata['h'] = dgl.broadcast_nodes(g, gfeat)
subg = dgl.unbatch(g)
for i, sg in enumerate(subg):
assert F.allclose(sg.ndata['h'],
F.repeat(F.reshape(gfeat[i], (1,3)), sg.number_of_nodes(), dim=0))
# Test.1: broadcast_edges
g.edata['h'] = dgl.broadcast_edges(g, gfeat)
subg = dgl.unbatch(g)
for i, sg in enumerate(subg):
assert F.allclose(sg.edata['h'],
F.repeat(F.reshape(gfeat[i], (1,3)), sg.number_of_edges(), dim=0))
......@@ -3,9 +3,12 @@ import backend as F
import networkx as nx
import numpy as np
import dgl
from test_utils import parametrize_dtype
def test_node_removal():
@parametrize_dtype
def test_node_removal(idtype):
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(10)
g.add_edge(0, 0)
assert g.number_of_nodes() == 10
......@@ -26,8 +29,10 @@ def test_node_removal():
assert g.number_of_nodes() == 7
assert F.array_equal(g.ndata['id'], F.tensor([0, 7, 8, 9, 0, 0, 0]))
def test_multigraph_node_removal():
@parametrize_dtype
def test_multigraph_node_removal(idtype):
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(5)
for i in range(5):
g.add_edge(i, i)
......@@ -52,8 +57,10 @@ def test_multigraph_node_removal():
assert g.number_of_nodes() == 3
assert g.number_of_edges() == 6
def test_multigraph_edge_removal():
@parametrize_dtype
def test_multigraph_edge_removal(idtype):
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(5)
for i in range(5):
g.add_edge(i, i)
......@@ -77,8 +84,10 @@ def test_multigraph_edge_removal():
assert g.number_of_nodes() == 5
assert g.number_of_edges() == 8
def test_edge_removal():
@parametrize_dtype
def test_edge_removal(idtype):
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(5)
for i in range(5):
for j in range(5):
......@@ -103,8 +112,10 @@ def test_edge_removal():
assert g.number_of_edges() == 11
assert F.array_equal(g.edata['id'], F.tensor([0, 1, 10, 11, 12, 20, 21, 22, 23, 24, 0]))
def test_node_and_edge_removal():
@parametrize_dtype
def test_node_and_edge_removal(idtype):
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(10)
for i in range(10):
for j in range(10):
......@@ -140,46 +151,54 @@ def test_node_and_edge_removal():
assert g.number_of_nodes() == 10
assert g.number_of_edges() == 48
def test_node_frame():
@parametrize_dtype
def test_node_frame(idtype):
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(10)
data = np.random.rand(10, 3)
new_data = data.take([0, 1, 2, 7, 8, 9], axis=0)
g.ndata['h'] = F.zerocopy_from_numpy(data)
g.ndata['h'] = F.tensor(data)
# remove nodes
g.remove_nodes(range(3, 7))
assert F.allclose(g.ndata['h'], F.zerocopy_from_numpy(new_data))
assert F.allclose(g.ndata['h'], F.tensor(new_data))
def test_edge_frame():
@parametrize_dtype
def test_edge_frame(idtype):
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(10)
g.add_edges(list(range(10)), list(range(1, 10)) + [0])
data = np.random.rand(10, 3)
new_data = data.take([0, 1, 2, 7, 8, 9], axis=0)
g.edata['h'] = F.zerocopy_from_numpy(data)
g.edata['h'] = F.tensor(data)
# remove edges
g.remove_edges(range(3, 7))
assert F.allclose(g.edata['h'], F.zerocopy_from_numpy(new_data))
assert F.allclose(g.edata['h'], F.tensor(new_data))
def test_frame_size():
@parametrize_dtype
def test_issue1287(idtype):
# reproduce https://github.com/dmlc/dgl/issues/1287:
# setting features after removing nodes/edges used to fail.
# remove nodes
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(5)
g.add_edges([0, 2, 3, 1, 1], [1, 0, 3, 1, 0])
g.remove_nodes([0, 1])
assert g._node_frame.num_rows == 3
assert g._edge_frame.num_rows == 1
g.ndata['h'] = F.randn((g.number_of_nodes(), 3))
g.edata['h'] = F.randn((g.number_of_edges(), 2))
# remove edges
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(5)
g.add_edges([0, 2, 3, 1, 1], [1, 0, 3, 1, 0])
g.remove_edges([0, 1])
assert g._node_frame.num_rows == 5
assert g._edge_frame.num_rows == 3
g = g.to(F.ctx())
g.ndata['h'] = F.randn((g.number_of_nodes(), 3))
g.edata['h'] = F.randn((g.number_of_edges(), 2))
if __name__ == '__main__':
test_node_removal()
......
......@@ -10,7 +10,7 @@ np.random.seed(42)
def generate_rand_graph(n):
arr = (sp.sparse.random(n, n, density=0.1, format='coo') != 0).astype(np.int64)
return dgl.DGLGraph(arr, readonly=True)
return dgl.DGLGraphStale(arr, readonly=True)
def test_create_full():
g = generate_rand_graph(100)
......@@ -171,7 +171,7 @@ def test_nonuniform_neighbor_sampler():
if edge not in edges:
edges.append(edge)
src, dst = zip(*edges)
g = dgl.DGLGraph()
g = dgl.DGLGraphStale()
g.add_nodes(100)
g.add_edges(src, dst)
g.readonly()
......
......@@ -17,7 +17,7 @@ def check_random_walk(g, metapath, traces, ntypes, prob=None):
traces[i, j], traces[i, j+1], etype=metapath[j])
if prob is not None and prob in g.edges[metapath[j]].data:
p = F.asnumpy(g.edges[metapath[j]].data['p'])
eids = g.edge_id(traces[i, j], traces[i, j+1], etype=metapath[j])
eids = g.edge_ids(traces[i, j], traces[i, j+1], etype=metapath[j])
assert p[eids] != 0
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU random walk not implemented")
......@@ -113,6 +113,7 @@ def test_pack_traces():
assert F.array_equal(result[2], F.tensor([2, 7], dtype=F.int64))
assert F.array_equal(result[3], F.tensor([0, 2], dtype=F.int64))
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
def test_pinsage_sampling():
def _test_sampler(g, sampler, ntype):
neighbor_g = sampler(F.tensor([0, 2], dtype=F.int64))
......@@ -228,7 +229,7 @@ def _test_sample_neighbors(hypersparse):
assert subg.number_of_edges() == 4
u, v = subg.edges()
assert set(F.asnumpy(F.unique(v))) == {0, 1}
assert F.array_equal(g.has_edges_between(u, v), F.ones((4,), dtype=F.int64))
assert F.array_equal(F.astype(g.has_edges_between(u, v), F.int64), F.ones((4,), dtype=F.int64))
assert F.array_equal(g.edge_ids(u, v), subg.edata[dgl.EID])
edge_set = set(zip(list(F.asnumpy(u)), list(F.asnumpy(v))))
if not replace:
......@@ -258,7 +259,7 @@ def _test_sample_neighbors(hypersparse):
assert subg.number_of_edges() == num_edges
u, v = subg.edges()
assert set(F.asnumpy(F.unique(v))) == {0, 2}
assert F.array_equal(g.has_edges_between(u, v), F.ones((num_edges,), dtype=F.int64))
assert F.array_equal(F.astype(g.has_edges_between(u, v), F.int64), F.ones((num_edges,), dtype=F.int64))
assert F.array_equal(g.edge_ids(u, v), subg.edata[dgl.EID])
edge_set = set(zip(list(F.asnumpy(u)), list(F.asnumpy(v))))
if not replace:
......@@ -326,7 +327,7 @@ def _test_sample_neighbors_outedge(hypersparse):
assert subg.number_of_edges() == 4
u, v = subg.edges()
assert set(F.asnumpy(F.unique(u))) == {0, 1}
assert F.array_equal(g.has_edges_between(u, v), F.ones((4,), dtype=F.int64))
assert F.array_equal(F.astype(g.has_edges_between(u, v), F.int64), F.ones((4,), dtype=F.int64))
assert F.array_equal(g.edge_ids(u, v), subg.edata[dgl.EID])
edge_set = set(zip(list(F.asnumpy(u)), list(F.asnumpy(v))))
if not replace:
......@@ -356,7 +357,7 @@ def _test_sample_neighbors_outedge(hypersparse):
assert subg.number_of_edges() == num_edges
u, v = subg.edges()
assert set(F.asnumpy(F.unique(u))) == {0, 2}
assert F.array_equal(g.has_edges_between(u, v), F.ones((num_edges,), dtype=F.int64))
assert F.array_equal(F.astype(g.has_edges_between(u, v), F.int64), F.ones((num_edges,), dtype=F.int64))
assert F.array_equal(g.edge_ids(u, v), subg.edata[dgl.EID])
edge_set = set(zip(list(F.asnumpy(u)), list(F.asnumpy(v))))
if not replace:
......
......@@ -5,6 +5,7 @@ import time
import tempfile
import os
import pytest
import unittest
from dgl import DGLGraph
import dgl
......@@ -34,6 +35,7 @@ def construct_graph(n, is_hetero):
return g_list
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@pytest.mark.parametrize('is_hetero', [True, False])
def test_graph_serialize_with_feature(is_hetero):
num_graphs = 100
......@@ -75,6 +77,7 @@ def test_graph_serialize_with_feature(is_hetero):
os.unlink(path)
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@pytest.mark.parametrize('is_hetero', [True, False])
def test_graph_serialize_without_feature(is_hetero):
num_graphs = 100
......@@ -102,6 +105,7 @@ def test_graph_serialize_without_feature(is_hetero):
os.unlink(path)
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@pytest.mark.parametrize('is_hetero', [True, False])
def test_graph_serialize_with_labels(is_hetero):
num_graphs = 100
......@@ -212,10 +216,10 @@ def test_load_old_files2():
assert np.allclose(F.asnumpy(load_edges[1]), edges1)
def create_heterographs(index_dtype):
def create_heterographs(idtype):
g_x = dgl.graph(([0, 1, 2], [1, 2, 3]), 'user',
'follows', index_dtype=index_dtype, restrict_format='any')
g_y = dgl.graph(([0, 2], [2, 3]), 'user', 'knows', index_dtype=index_dtype, restrict_format='csr')
'follows', idtype=idtype)
g_y = dgl.graph(([0, 2], [2, 3]), 'user', 'knows', idtype=idtype).formats('csr')
g_x.nodes['user'].data['h'] = F.randn((4, 3))
g_x.edges['follows'].data['w'] = F.randn((3, 2))
g_y.nodes['user'].data['hh'] = F.ones((4, 5))
......@@ -223,11 +227,11 @@ def create_heterographs(index_dtype):
g = dgl.hetero_from_relations([g_x, g_y])
return [g, g_x, g_y]
def create_heterographs2(index_dtype):
def create_heterographs2(idtype):
g_x = dgl.graph(([0, 1, 2], [1, 2, 3]), 'user',
'follows', index_dtype=index_dtype, restrict_format='any')
g_y = dgl.graph(([0, 2], [2, 3]), 'user', 'knows', index_dtype=index_dtype, restrict_format='csr')
g_z = dgl.bipartite(([0, 1, 3], [2, 3, 4]), 'user', 'knows', 'knowledge', index_dtype=index_dtype)
'follows', idtype=idtype)
g_y = dgl.graph(([0, 2], [2, 3]), 'user', 'knows', idtype=idtype).formats('csr')
g_z = dgl.bipartite(([0, 1, 3], [2, 3, 4]), 'user', 'knows', 'knowledge', idtype=idtype)
g_x.nodes['user'].data['h'] = F.randn((4, 3))
g_x.edges['follows'].data['w'] = F.randn((3, 2))
g_y.nodes['user'].data['hh'] = F.ones((4, 5))
......@@ -253,16 +257,17 @@ def test_deserialize_old_heterograph_file():
def create_old_heterograph_files():
path = os.path.join(
os.path.dirname(__file__), "data/hetero1.bin")
g_list0 = create_heterographs("int64") + create_heterographs("int32")
g_list0 = create_heterographs(F.int64) + create_heterographs(F.int32)
labels_dict = {"graph_label": F.ones(54)}
save_graphs(path, g_list0, labels_dict)
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
def test_serialize_heterograph():
f = tempfile.NamedTemporaryFile(delete=False)
path = f.name
f.close()
g_list0 = create_heterographs2("int64") + create_heterographs2("int32")
g_list0 = create_heterographs2(F.int64) + create_heterographs2(F.int32)
save_graphs(path, g_list0)
g_list, _ = load_graphs(path)
......@@ -271,8 +276,9 @@ def test_serialize_heterograph():
for i in range(len(g_list0)):
for j, etypes in enumerate(g_list0[i].canonical_etypes):
assert g_list[i].canonical_etypes[j] == etypes
assert g_list[1].restrict_format() == 'any'
assert g_list[2].restrict_format() == 'csr'
#assert g_list[1].restrict_format() == 'any'
#assert g_list[2].restrict_format() == 'csr'
assert g_list[4].idtype == F.int32
assert np.allclose(
F.asnumpy(g_list[2].nodes['user'].data['hh']), np.ones((4, 5)))
......@@ -291,15 +297,16 @@ def test_serialize_heterograph():
os.unlink(path)
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@pytest.mark.skip(reason="lack of permission on CI")
def test_serialize_heterograph_s3():
path = "s3://dglci-data-test/graph2.bin"
g_list0 = create_heterographs("int64") + create_heterographs("int32")
g_list0 = create_heterographs(F.int64) + create_heterographs(F.int32)
save_graphs(path, g_list0)
g_list = load_graphs(path, [0, 2, 5])
assert g_list[0].idtype == F.int64
assert g_list[1].restrict_format() == 'csr'
#assert g_list[1].restrict_format() == 'csr'
assert np.allclose(
F.asnumpy(g_list[1].nodes['user'].data['hh']), np.ones((4, 5)))
assert np.allclose(
......
......@@ -5,7 +5,8 @@ import random
import pytest
import networkx as nx
import backend as F
import numpy as np
from utils import parametrize_dtype
random.seed(42)
np.random.seed(42)
......@@ -98,13 +99,10 @@ sddmm_shapes = [
@pytest.mark.parametrize('msg', ['add', 'sub', 'mul', 'div', 'copy_lhs', 'copy_rhs'])
@pytest.mark.parametrize('reducer', ['sum', 'min', 'max'])
@parametrize_dtype
def test_spmm(g, shp, msg, reducer, index_dtype):
if dgl.backend.backend_name == 'tensorflow' and (reducer in ['min', 'max'] or index_dtype == 'int32'):
def test_spmm(idtype, g, shp, msg, reducer):
g = g.astype(idtype).to(F.ctx())
if dgl.backend.backend_name == 'tensorflow' and (reducer in ['min', 'max']):
pytest.skip() # tensorflow dlpack has problem writing into int32 arrays on GPU.
if index_dtype == 'int32':
g = g.int()
else:
g = g.long()
print(g)
print(g.idtype)
......@@ -164,15 +162,12 @@ def test_spmm(g, shp, msg, reducer, index_dtype):
@pytest.mark.parametrize('rhs_target', ['u', 'v', 'e'])
@pytest.mark.parametrize('msg', ['add', 'sub', 'mul', 'div', 'dot', 'copy_lhs', 'copy_rhs'])
@parametrize_dtype
def test_sddmm(g, shp, lhs_target, rhs_target, msg, index_dtype):
def test_sddmm(g, shp, lhs_target, rhs_target, msg, idtype):
if lhs_target == rhs_target:
return
g = g.astype(idtype).to(F.ctx())
if dgl.backend.backend_name == 'mxnet' and g.number_of_edges() == 0:
pytest.skip() # mxnet does not support zero-shape tensors
if dgl.backend.backend_name == 'tensorflow' and index_dtype == 'int32':
pytest.skip() # tensorflow dlpack has problem with int32 ndarray.
if index_dtype == 'int32':
g = g.int()
else:
g = g.long()
print(g)
print(g.idtype)
......@@ -234,4 +229,4 @@ def test_sddmm(g, shp, lhs_target, rhs_target, msg, index_dtype):
if 'm' in g.edata: g.edata.pop('m')
if __name__ == '__main__':
test_spmm(graphs[0], spmm_shapes[5], 'copy_lhs', 'sum')
test_spmm(F.int32, graphs[0], spmm_shapes[5], 'copy_lhs', 'sum')
......@@ -3,11 +3,13 @@ import scipy.sparse as sp
import dgl
import dgl.function as fn
import backend as F
from test_utils import parametrize_dtype
D = 5
def generate_graph():
def generate_graph(idtype):
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(10)
# create a graph where 0 is the source and 9 is the sink
for i in range(1, 9):
......@@ -15,12 +17,13 @@ def generate_graph():
g.add_edge(i, 9)
# add a back flow from 9 to 0
g.add_edge(9, 0)
g.set_n_repr({'f1' : F.randn((10,)), 'f2' : F.randn((10, D))})
g.ndata.update({'f1' : F.randn((10,)), 'f2' : F.randn((10, D))})
weights = F.randn((17,))
g.set_e_repr({'e1': weights, 'e2': F.unsqueeze(weights, 1)})
g.edata.update({'e1': weights, 'e2': F.unsqueeze(weights, 1)})
return g
def test_v2v_update_all():
@parametrize_dtype
def test_v2v_update_all(idtype):
def _test(fld):
def message_func(edges):
return {'m' : edges.src[fld]}
......@@ -36,12 +39,12 @@ def test_v2v_update_all():
def apply_func(nodes):
return {fld : 2 * nodes.data[fld]}
g = generate_graph()
g = generate_graph(idtype)
# update all
v1 = g.ndata[fld]
g.update_all(fn.copy_src(src=fld, out='m'), fn.sum(msg='m', out=fld), apply_func)
v2 = g.ndata[fld]
g.set_n_repr({fld : v1})
g.ndata.update({fld : v1})
g.update_all(message_func, reduce_func, apply_func)
v3 = g.ndata[fld]
assert F.allclose(v2, v3)
......@@ -50,7 +53,7 @@ def test_v2v_update_all():
g.update_all(fn.src_mul_edge(src=fld, edge='e1', out='m'),
fn.sum(msg='m', out=fld), apply_func)
v2 = g.ndata[fld]
g.set_n_repr({fld : v1})
g.ndata.update({fld : v1})
g.update_all(message_func_edge, reduce_func, apply_func)
v4 = g.ndata[fld]
assert F.allclose(v2, v4)
......@@ -59,9 +62,10 @@ def test_v2v_update_all():
# test 2d node features
_test('f2')
def test_v2v_snr():
u = F.tensor([0, 0, 0, 3, 4, 9])
v = F.tensor([1, 2, 3, 9, 9, 0])
@parametrize_dtype
def test_v2v_snr(idtype):
u = F.tensor([0, 0, 0, 3, 4, 9], idtype)
v = F.tensor([1, 2, 3, 9, 9, 0], idtype)
def _test(fld):
def message_func(edges):
return {'m' : edges.src[fld]}
......@@ -77,13 +81,13 @@ def test_v2v_snr():
def apply_func(nodes):
return {fld : 2 * nodes.data[fld]}
g = generate_graph()
g = generate_graph(idtype)
# send and recv
v1 = g.ndata[fld]
g.send_and_recv((u, v), fn.copy_src(src=fld, out='m'),
fn.sum(msg='m', out=fld), apply_func)
v2 = g.ndata[fld]
g.set_n_repr({fld : v1})
g.ndata.update({fld : v1})
g.send_and_recv((u, v), message_func, reduce_func, apply_func)
v3 = g.ndata[fld]
assert F.allclose(v2, v3)
......@@ -92,7 +96,7 @@ def test_v2v_snr():
g.send_and_recv((u, v), fn.src_mul_edge(src=fld, edge='e1', out='m'),
fn.sum(msg='m', out=fld), apply_func)
v2 = g.ndata[fld]
g.set_n_repr({fld : v1})
g.ndata.update({fld : v1})
g.send_and_recv((u, v), message_func_edge, reduce_func, apply_func)
v4 = g.ndata[fld]
assert F.allclose(v2, v4)
......@@ -102,8 +106,9 @@ def test_v2v_snr():
_test('f2')
def test_v2v_pull():
nodes = F.tensor([1, 2, 3, 9])
@parametrize_dtype
def test_v2v_pull(idtype):
nodes = F.tensor([1, 2, 3, 9], idtype)
def _test(fld):
def message_func(edges):
return {'m' : edges.src[fld]}
......@@ -119,7 +124,7 @@ def test_v2v_pull():
def apply_func(nodes):
return {fld : 2 * nodes.data[fld]}
g = generate_graph()
g = generate_graph(idtype)
# pull
v1 = g.ndata[fld]
g.pull(nodes, fn.copy_src(src=fld, out='m'), fn.sum(msg='m', out=fld), apply_func)
......@@ -142,7 +147,8 @@ def test_v2v_pull():
# test 2d node features
_test('f2')
def test_v2v_update_all_multi_fn():
@parametrize_dtype
def test_v2v_update_all_multi_fn(idtype):
def message_func(edges):
return {'m2': edges.src['f2']}
......@@ -152,8 +158,8 @@ def test_v2v_update_all_multi_fn():
def reduce_func(nodes):
return {'v1': F.sum(nodes.mailbox['m2'], 1)}
g = generate_graph()
g.set_n_repr({'v1' : F.zeros((10,)), 'v2' : F.zeros((10,))})
g = generate_graph(idtype)
g.ndata.update({'v1' : F.zeros((10,)), 'v2' : F.zeros((10,))})
fld = 'f2'
g.update_all(message_func, reduce_func)
......@@ -181,9 +187,10 @@ def test_v2v_update_all_multi_fn():
v2 = g.ndata['v2']
assert F.allclose(v1, v2)
def test_v2v_snr_multi_fn():
u = F.tensor([0, 0, 0, 3, 4, 9])
v = F.tensor([1, 2, 3, 9, 9, 0])
@parametrize_dtype
def test_v2v_snr_multi_fn(idtype):
u = F.tensor([0, 0, 0, 3, 4, 9], idtype)
v = F.tensor([1, 2, 3, 9, 9, 0], idtype)
def message_func(edges):
return {'m2': edges.src['f2']}
......@@ -194,8 +201,8 @@ def test_v2v_snr_multi_fn():
def reduce_func(nodes):
return {'v1' : F.sum(nodes.mailbox['m2'], 1)}
g = generate_graph()
g.set_n_repr({'v1' : F.zeros((10, D)), 'v2' : F.zeros((10, D)),
g = generate_graph(idtype)
g.ndata.update({'v1' : F.zeros((10, D)), 'v2' : F.zeros((10, D)),
'v3' : F.zeros((10, D))})
fld = 'f2'
......@@ -229,7 +236,8 @@ def test_v2v_snr_multi_fn():
v2 = g.ndata['v2']
assert F.allclose(v1, v2)
def test_e2v_update_all_multi_fn():
@parametrize_dtype
def test_e2v_update_all_multi_fn(idtype):
def _test(fld):
def message_func(edges):
return {'m1' : edges.src[fld] + edges.dst[fld],
......@@ -244,19 +252,19 @@ def test_e2v_update_all_multi_fn():
def apply_func_2(nodes):
return {fld : 2 * nodes.data['r1'] + 2 * nodes.data['r2']}
g = generate_graph()
g = generate_graph(idtype)
# update all
v1 = g.get_n_repr()[fld]
v1 = g.ndata[fld]
# no specialization
g.update_all(message_func, reduce_func, apply_func)
v2 = g.get_n_repr()[fld]
v2 = g.ndata[fld]
# user breaks the reduce func into 2 builtins
g.set_n_repr({fld : v1})
g.ndata.update({fld : v1})
g.update_all(message_func,
[fn.sum(msg='m1', out='r1'), fn.sum(msg='m2', out='r2')],
apply_func_2)
v3 = g.get_n_repr()[fld]
v3 = g.ndata[fld]
assert F.allclose(v2, v3)
......@@ -265,9 +273,10 @@ def test_e2v_update_all_multi_fn():
# test 2d node features
_test('f2')
def test_e2v_snr_multi_fn():
u = F.tensor([0, 0, 0, 3, 4, 9])
v = F.tensor([1, 2, 3, 9, 9, 0])
@parametrize_dtype
def test_e2v_snr_multi_fn(idtype):
u = F.tensor([0, 0, 0, 3, 4, 9], idtype)
v = F.tensor([1, 2, 3, 9, 9, 0], idtype)
def _test(fld):
def message_func(edges):
return {'m1' : edges.src[fld] + edges.dst[fld],
......@@ -282,59 +291,19 @@ def test_e2v_snr_multi_fn():
def apply_func_2(nodes):
return {fld : 2 * nodes.data['r1'] + 2 * nodes.data['r2']}
g = generate_graph()
g = generate_graph(idtype)
# send_and_recv
v1 = g.get_n_repr()[fld]
v1 = g.ndata[fld]
# no specialization
g.send_and_recv((u, v), message_func, reduce_func, apply_func)
v2 = g.get_n_repr()[fld]
v2 = g.ndata[fld]
# user breaks the reduce func into 2 builtins
g.set_n_repr({fld : v1})
g.ndata.update({fld : v1})
g.send_and_recv((u, v), message_func,
[fn.sum(msg='m1', out='r1'), fn.sum(msg='m2', out='r2')],
apply_func_2)
v3 = g.get_n_repr()[fld]
assert F.allclose(v2, v3)
# test 1d node features
_test('f1')
# test 2d node features
_test('f2')
def test_e2v_recv_multi_fn():
u = F.tensor([0, 0, 0, 3, 4, 9])
v = F.tensor([1, 2, 3, 9, 9, 0])
def _test(fld):
def message_func(edges):
return {'m1' : edges.src[fld] + edges.dst[fld],
'm2' : edges.src[fld] * edges.dst[fld]}
def reduce_func(nodes):
return {fld : F.sum(nodes.mailbox['m1'] + nodes.mailbox['m2'], 1)}
def apply_func(nodes):
return {fld : 2 * nodes.data[fld]}
def apply_func_2(nodes):
return {fld : 2 * nodes.data['r1'] + 2 * nodes.data['r2']}
g = generate_graph()
# recv
v1 = g.get_n_repr()[fld]
# no specialization
g.send((u, v), message_func)
g.recv([0,1,2,3,9], reduce_func, apply_func)
v2 = g.get_n_repr()[fld]
# user breaks the reduce func into 2 builtins
g.set_n_repr({fld : v1})
g.send((u, v), message_func)
g.recv([0,1,2,3,9],
[fn.sum(msg='m1', out='r1'), fn.sum(msg='m2', out='r2')],
apply_func_2)
v3 = g.get_n_repr()[fld]
v3 = g.ndata[fld]
assert F.allclose(v2, v3)
......@@ -343,9 +312,11 @@ def test_e2v_recv_multi_fn():
# test 2d node features
_test('f2')
def test_update_all_multi_fallback():
@parametrize_dtype
def test_update_all_multi_fallback(idtype):
# create a graph with zero in-degree nodes
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(10)
for i in range(1, 9):
g.add_edge(0, i)
......@@ -399,9 +370,11 @@ def test_update_all_multi_fallback():
assert F.allclose(o1, g.ndata.pop('o1'))
assert F.allclose(o2, g.ndata.pop('o2'))
def test_pull_multi_fallback():
@parametrize_dtype
def test_pull_multi_fallback(idtype):
# create a graph with zero in-degree nodes
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(10)
for i in range(1, 9):
g.add_edge(0, i)
......@@ -465,7 +438,8 @@ def test_pull_multi_fallback():
nodes = [0, 1, 2, 9]
_pull_nodes(nodes)
def test_spmv_3d_feat():
@parametrize_dtype
def test_spmv_3d_feat(idtype):
def src_mul_edge_udf(edges):
return {'sum': edges.src['h'] * F.unsqueeze(F.unsqueeze(edges.data['h'], 1), 1)}
......@@ -476,6 +450,7 @@ def test_spmv_3d_feat():
p = 0.1
a = sp.random(n, n, p, data_rvs=lambda n: np.ones(n))
g = dgl.DGLGraph(a)
g = g.astype(idtype).to(F.ctx())
m = g.number_of_edges()
# test#1: v2v with adj data
......
import dgl
import backend as F
import unittest
@unittest.skipIf(F._default_context_str == 'cpu', reason="Need gpu for this test")
def test_to_device():
g = dgl.DGLGraph()
g.add_nodes(5, {'h' : F.ones((5, 2))})
g.add_edges([0, 1], [1, 2], {'m' : F.ones((2, 2))})
if F.is_cuda_available():
g = g.to(F.cuda())
assert g is not None
if __name__ == '__main__':
test_to_device()
from scipy import sparse as spsp
import unittest
import networkx as nx
import numpy as np
import dgl
......@@ -9,37 +8,28 @@ from dgl.graph_index import from_scipy_sparse_matrix
import unittest
from utils import parametrize_dtype
from test_heterograph import create_test_heterograph4, create_test_heterograph5, create_test_heterograph6
D = 5
# line graph related
def test_line_graph():
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
def test_line_graph1():
N = 5
G = dgl.DGLGraph(nx.star_graph(N))
G.edata['h'] = F.randn((2 * N, D))
n_edges = G.number_of_edges()
L = G.line_graph(shared=True)
assert L.number_of_nodes() == 2 * N
L.ndata['h'] = F.randn((2 * N, D))
# updating node features on the line graph should be reflected in the
# edge features of the original graph.
u = [0, 0, 2, 3]
v = [1, 2, 0, 0]
eid = G.edge_ids(u, v)
L.nodes[eid].data['h'] = F.zeros((4, D))
assert F.allclose(G.edges[u, v].data['h'], F.zeros((4, D)))
# adding a new node feature to the line graph should also create a new
# edge feature on the original graph
data = F.randn((n_edges, D))
L.ndata['w'] = data
assert F.allclose(G.edata['w'], data)
assert F.allclose(L.ndata['h'], G.edata['h'])
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
def test_hetero_linegraph(index_dtype):
def test_line_graph2(idtype):
g = dgl.graph(([0, 1, 1, 2, 2],[2, 0, 2, 0, 1]),
'user', 'follows', index_dtype=index_dtype)
lg = dgl.line_heterograph(g)
'user', 'follows', idtype=idtype)
lg = dgl.line_graph(g)
assert lg.number_of_nodes() == 5
assert lg.number_of_edges() == 8
row, col = lg.edges()
......@@ -48,7 +38,7 @@ def test_hetero_linegraph(index_dtype):
assert np.array_equal(F.asnumpy(col),
np.array([3, 4, 0, 3, 4, 0, 1, 2]))
lg = dgl.line_heterograph(g, backtracking=False)
lg = dgl.line_graph(g, backtracking=False)
assert lg.number_of_nodes() == 5
assert lg.number_of_edges() == 4
row, col = lg.edges()
@@ -57,8 +47,8 @@ def test_hetero_linegraph(index_dtype):
assert np.array_equal(F.asnumpy(col),
np.array([4, 0, 3, 1]))
g = dgl.graph(([0, 1, 1, 2, 2],[2, 0, 2, 0, 1]),
'user', 'follows', restrict_format='csr', index_dtype=index_dtype)
lg = dgl.line_heterograph(g)
'user', 'follows', idtype=idtype).formats('csr')
lg = dgl.line_graph(g)
assert lg.number_of_nodes() == 5
assert lg.number_of_edges() == 8
row, col = lg.edges()
@@ -67,9 +57,9 @@ def test_hetero_linegraph(index_dtype):
assert np.array_equal(F.asnumpy(col),
np.array([3, 4, 0, 3, 4, 0, 1, 2]))
g = dgl.graph(([0, 1, 1, 2, 2],[2, 0, 2, 0, 1]),
'user', 'follows', restrict_format='csc', index_dtype=index_dtype)
lg = dgl.line_heterograph(g)
g = dgl.graph(([0, 1, 1, 2, 2],[2, 0, 2, 0, 1]),
'user', 'follows', idtype=idtype).formats('csc')
lg = dgl.line_graph(g)
assert lg.number_of_nodes() == 5
assert lg.number_of_edges() == 8
row, col, eid = lg.edges('all')
@@ -82,6 +72,7 @@ def test_hetero_linegraph(index_dtype):
assert np.array_equal(col[order],
np.array([3, 4, 0, 3, 4, 0, 1, 2]))
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
def test_no_backtracking():
N = 5
G = dgl.DGLGraph(nx.star_graph(N))
@@ -94,8 +85,10 @@ def test_no_backtracking():
assert not L.has_edge_between(e2, e1)
# reverse graph related
def test_reverse():
@parametrize_dtype
def test_reverse(idtype):
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(5)
# The graph need not be completely connected.
g.add_edges([0, 1, 2], [1, 2, 1])
@@ -113,12 +106,12 @@ def test_reverse():
assert g.edge_id(1, 2) == rg.edge_id(2, 1)
assert g.edge_id(2, 1) == rg.edge_id(1, 2)
# test dgl.reverse_heterograph
# test dgl.reverse
# test homogeneous graph
g = dgl.graph((F.tensor([0, 1, 2]), F.tensor([1, 2, 0])))
g.ndata['h'] = F.tensor([[0.], [1.], [2.]])
g.edata['h'] = F.tensor([[3.], [4.], [5.]])
g_r = dgl.reverse_heterograph(g)
g_r = dgl.reverse(g)
assert g.number_of_nodes() == g_r.number_of_nodes()
assert g.number_of_edges() == g_r.number_of_edges()
u_g, v_g, eids_g = g.all_edges(form='all')
@@ -130,14 +123,14 @@ def test_reverse():
assert len(g_r.edata) == 0
# without sharing ndata
g_r = dgl.reverse_heterograph(g, copy_ndata=False)
g_r = dgl.reverse(g, copy_ndata=False)
assert g.number_of_nodes() == g_r.number_of_nodes()
assert g.number_of_edges() == g_r.number_of_edges()
assert len(g_r.ndata) == 0
assert len(g_r.edata) == 0
# with shared ndata and edata
g_r = dgl.reverse_heterograph(g, copy_ndata=True, copy_edata=True)
g_r = dgl.reverse(g, copy_ndata=True, copy_edata=True)
assert g.number_of_nodes() == g_r.number_of_nodes()
assert g.number_of_edges() == g_r.number_of_edges()
assert F.array_equal(g.ndata['h'], g_r.ndata['h'])
@@ -157,13 +150,14 @@ def test_reverse():
g = dgl.heterograph({
('user', 'follows', 'user'): ([0, 1, 2, 4, 3 ,1, 3], [1, 2, 3, 2, 0, 0, 1]),
('user', 'plays', 'game'): ([0, 0, 2, 3, 3, 4, 1], [1, 0, 1, 0, 1, 0, 0]),
('developer', 'develops', 'game'): ([0, 1, 1, 2], [0, 0, 1, 1])})
('developer', 'develops', 'game'): ([0, 1, 1, 2], [0, 0, 1, 1])},
idtype=idtype, device=F.ctx())
g.nodes['user'].data['h'] = F.tensor([0, 1, 2, 3, 4])
g.nodes['user'].data['hh'] = F.tensor([1, 1, 1, 1, 1])
g.nodes['game'].data['h'] = F.tensor([0, 1])
g.edges['follows'].data['h'] = F.tensor([0, 1, 2, 4, 3 ,1, 3])
g.edges['follows'].data['hh'] = F.tensor([1, 2, 3, 2, 0, 0, 1])
g_r = dgl.reverse_heterograph(g)
g_r = dgl.reverse(g)
for etype_g, etype_gr in zip(g.canonical_etypes, g_r.canonical_etypes):
assert etype_g[0] == etype_gr[2]
@@ -193,7 +187,7 @@ def test_reverse():
assert F.array_equal(eids_g, eids_rg)
# without sharing ndata
g_r = dgl.reverse_heterograph(g, copy_ndata=False)
g_r = dgl.reverse(g, copy_ndata=False)
for etype_g, etype_gr in zip(g.canonical_etypes, g_r.canonical_etypes):
assert etype_g[0] == etype_gr[2]
assert etype_g[1] == etype_gr[1]
@@ -204,7 +198,7 @@ def test_reverse():
assert len(g_r.nodes['user'].data) == 0
assert len(g_r.nodes['game'].data) == 0
g_r = dgl.reverse_heterograph(g, copy_ndata=True, copy_edata=True)
g_r = dgl.reverse(g, copy_ndata=True, copy_edata=True)
print(g_r)
for etype_g, etype_gr in zip(g.canonical_etypes, g_r.canonical_etypes):
assert etype_g[0] == etype_gr[2]
@@ -225,8 +219,10 @@ def test_reverse():
assert ('hhh' in g_r.edges['follows'].data) is True
def test_reverse_shared_frames():
@parametrize_dtype
def test_reverse_shared_frames(idtype):
g = dgl.DGLGraph()
g = g.astype(idtype).to(F.ctx())
g.add_nodes(3)
g.add_edges([0, 1, 2], [1, 2, 1])
g.ndata['h'] = F.tensor([[0.], [1.], [2.]])
@@ -238,18 +234,6 @@ def test_reverse_shared_frames():
assert F.allclose(g.edges[[0, 2], [1, 1]].data['h'],
rg.edges[[1, 1], [0, 2]].data['h'])
rg.ndata['h'] = rg.ndata['h'] + 1
assert F.allclose(rg.ndata['h'], g.ndata['h'])
g.edata['h'] = g.edata['h'] - 1
assert F.allclose(rg.edata['h'], g.edata['h'])
src_msg = fn.copy_src(src='h', out='m')
sum_reduce = fn.sum(msg='m', out='h')
rg.update_all(src_msg, sum_reduce)
assert F.allclose(g.ndata['h'], rg.ndata['h'])
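A minimal sketch of the reversal semantics exercised above, in plain Python (`reverse_graph` is a hypothetical helper; the real `dgl.reverse` operates on graph objects): edge i of the result is edge i of the input flipped, node ids stay unchanged, and `copy_ndata`/`copy_edata` decide which feature dicts are carried over.

```python
def reverse_graph(edges, ndata=None, edata=None,
                  copy_ndata=True, copy_edata=False):
    """Flip every edge, preserving edge order and node ids.

    Feature dicts are shallow-copied only when the corresponding
    copy_* flag is set, mirroring the assertions in the tests above.
    """
    r_edges = [(v, u) for (u, v) in edges]
    r_ndata = dict(ndata) if (copy_ndata and ndata) else {}
    r_edata = dict(edata) if (copy_edata and edata) else {}
    return r_edges, r_ndata, r_edata

# Same shape as the homogeneous case above: the cycle 0->1->2->0, reversed.
r_edges, r_ndata, r_edata = reverse_graph(
    [(0, 1), (1, 2), (2, 0)],
    ndata={'h': [0., 1., 2.]},
    edata={'h': [3., 4., 5.]})
```

With the defaults, node features survive the reversal while edge features are dropped, which is exactly the `len(g_r.edata) == 0` check above.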
def test_to_bidirected():
# homogeneous graph
g = dgl.graph((F.tensor([0, 1, 3, 1]), F.tensor([1, 2, 0, 2])))
@@ -329,6 +313,7 @@ def test_to_bidirected():
assert F.array_equal(v, vb)
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
def test_simple_graph():
elist = [(0, 1), (0, 2), (1, 2), (0, 1)]
g = dgl.DGLGraph(elist, readonly=True)
@@ -340,8 +325,8 @@ def test_simple_graph():
eset = set(zip(list(F.asnumpy(src)), list(F.asnumpy(dst))))
assert eset == set(elist)
def test_bidirected_graph():
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
def _test_bidirected_graph():
def _test(in_readonly, out_readonly):
elist = [(0, 0), (0, 1), (1, 0),
(1, 1), (2, 1), (2, 2)]
@@ -361,6 +346,7 @@ def test_bidirected_graph():
_test(False, False)
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
def test_khop_graph():
N = 20
feat = F.randn((N, 5))
@@ -386,6 +372,7 @@ def test_khop_graph():
g = dgl.DGLGraph(nx.erdos_renyi_graph(N, 0.3, directed=True))
_test(g)
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
def test_khop_adj():
N = 20
feat = F.randn((N, 5))
@@ -402,6 +389,7 @@ def test_khop_adj():
assert F.allclose(h_0, h_1, rtol=1e-3, atol=1e-3)
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
def test_laplacian_lambda_max():
N = 20
eps = 1e-6
@@ -410,6 +398,7 @@ def test_laplacian_lambda_max():
l_max = dgl.laplacian_lambda_max(g)
assert (l_max[0] < 2 + eps)
# test batched DGLGraph
'''
N_arr = [20, 30, 10, 12]
bg = dgl.batch([
dgl.DGLGraph(nx.erdos_renyi_graph(N, 0.3))
@@ -419,25 +408,7 @@ def test_laplacian_lambda_max():
assert len(l_max_arr) == len(N_arr)
for l_max in l_max_arr:
assert l_max < 2 + eps
def test_add_self_loop():
g = dgl.DGLGraph()
g.add_nodes(5)
g.add_edges([0, 1, 2], [1, 1, 2])
# Nodes 0, 3, 4 don't have self-loop
new_g = dgl.transform.add_self_loop(g)
assert F.allclose(new_g.edges()[0], F.tensor([0, 0, 1, 2, 3, 4]))
assert F.allclose(new_g.edges()[1], F.tensor([1, 0, 1, 2, 3, 4]))
def test_remove_self_loop():
g = dgl.DGLGraph()
g.add_nodes(5)
g.add_edges([0, 1, 2], [1, 1, 2])
new_g = dgl.transform.remove_self_loop(g)
assert F.allclose(new_g.edges()[0], F.tensor([0]))
assert F.allclose(new_g.edges()[1], F.tensor([1]))
'''
def create_large_graph_index(num_nodes):
row = np.random.choice(num_nodes, num_nodes * 10)
@@ -454,8 +425,9 @@ def get_nodeflow(g, node_ids, num_layers):
seed_nodes=node_ids)
return next(iter(sampler))
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
def test_partition_with_halo():
g = dgl.DGLGraph(create_large_graph_index(1000), readonly=True)
g = dgl.DGLGraphStale(create_large_graph_index(1000), readonly=True)
node_part = np.random.choice(4, g.number_of_nodes())
subgs = dgl.transform.partition_graph_with_halo(g, node_part, 2)
for part_id, subg in subgs.items():
@@ -484,7 +456,7 @@ def test_partition_with_halo():
@unittest.skipIf(F._default_context_str == 'gpu', reason="METIS doesn't support GPU")
def test_metis_partition():
# TODO(zhengda) Metis fails to partition a small graph.
g = dgl.DGLGraph(create_large_graph_index(1000), readonly=True)
g = dgl.DGLGraphStale(create_large_graph_index(1000), readonly=True)
check_metis_partition(g, 0)
check_metis_partition(g, 1)
check_metis_partition(g, 2)
@@ -493,7 +465,7 @@ def test_metis_partition():
@unittest.skipIf(F._default_context_str == 'gpu', reason="METIS doesn't support GPU")
def test_hetero_metis_partition():
# TODO(zhengda) Metis fails to partition a small graph.
g = dgl.DGLGraph(create_large_graph_index(1000), readonly=True)
g = dgl.DGLGraphStale(create_large_graph_index(1000), readonly=True)
g = dgl.as_heterograph(g)
check_metis_partition(g, 0)
check_metis_partition(g, 1)
@@ -583,7 +555,7 @@ def check_metis_partition(g, extra_hops):
@unittest.skipIf(F._default_context_str == 'gpu', reason="It doesn't support GPU")
def test_reorder_nodes():
g = dgl.DGLGraph(create_large_graph_index(1000), readonly=True)
g = dgl.DGLGraphStale(create_large_graph_index(1000), readonly=True)
new_nids = np.random.permutation(g.number_of_nodes())
# TODO(zhengda) we need to test both CSR and COO.
new_g = dgl.transform.reorder_nodes(g, new_nids)
@@ -618,14 +590,14 @@ def test_reorder_nodes():
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
def test_in_subgraph(index_dtype):
g1 = dgl.graph([(1,0),(2,0),(3,0),(0,1),(2,1),(3,1),(0,2)], 'user', 'follow', index_dtype=index_dtype)
g2 = dgl.bipartite([(0,0),(0,1),(1,2),(3,2)], 'user', 'play', 'game', index_dtype=index_dtype)
g3 = dgl.bipartite([(2,0),(2,1),(2,2),(1,0),(1,3),(0,0)], 'game', 'liked-by', 'user', index_dtype=index_dtype)
g4 = dgl.bipartite([(0,0),(1,0),(2,0),(3,0)], 'user', 'flips', 'coin', index_dtype=index_dtype)
def test_in_subgraph(idtype):
g1 = dgl.graph([(1,0),(2,0),(3,0),(0,1),(2,1),(3,1),(0,2)], 'user', 'follow', idtype=idtype)
g2 = dgl.bipartite([(0,0),(0,1),(1,2),(3,2)], 'user', 'play', 'game', idtype=idtype)
g3 = dgl.bipartite([(2,0),(2,1),(2,2),(1,0),(1,3),(0,0)], 'game', 'liked-by', 'user', idtype=idtype)
g4 = dgl.bipartite([(0,0),(1,0),(2,0),(3,0)], 'user', 'flips', 'coin', idtype=idtype)
hg = dgl.hetero_from_relations([g1, g2, g3, g4])
subg = dgl.in_subgraph(hg, {'user' : [0,1], 'game' : 0})
assert subg._idtype_str == index_dtype
assert subg.idtype == idtype
assert len(subg.ntypes) == 3
assert len(subg.etypes) == 4
u, v = subg['follow'].edges()
@@ -644,14 +616,14 @@ def test_in_subgraph(index_dtype):
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
def test_out_subgraph(index_dtype):
g1 = dgl.graph([(1,0),(2,0),(3,0),(0,1),(2,1),(3,1),(0,2)], 'user', 'follow', index_dtype=index_dtype)
g2 = dgl.bipartite([(0,0),(0,1),(1,2),(3,2)], 'user', 'play', 'game', index_dtype=index_dtype)
g3 = dgl.bipartite([(2,0),(2,1),(2,2),(1,0),(1,3),(0,0)], 'game', 'liked-by', 'user', index_dtype=index_dtype)
g4 = dgl.bipartite([(0,0),(1,0),(2,0),(3,0)], 'user', 'flips', 'coin', index_dtype=index_dtype)
def test_out_subgraph(idtype):
g1 = dgl.graph([(1,0),(2,0),(3,0),(0,1),(2,1),(3,1),(0,2)], 'user', 'follow', idtype=idtype)
g2 = dgl.bipartite([(0,0),(0,1),(1,2),(3,2)], 'user', 'play', 'game', idtype=idtype)
g3 = dgl.bipartite([(2,0),(2,1),(2,2),(1,0),(1,3),(0,0)], 'game', 'liked-by', 'user', idtype=idtype)
g4 = dgl.bipartite([(0,0),(1,0),(2,0),(3,0)], 'user', 'flips', 'coin', idtype=idtype)
hg = dgl.hetero_from_relations([g1, g2, g3, g4])
subg = dgl.out_subgraph(hg, {'user' : [0,1], 'game' : 0})
assert subg._idtype_str == index_dtype
assert subg.idtype == idtype
assert len(subg.ntypes) == 3
assert len(subg.etypes) == 4
u, v = subg['follow'].edges()
@@ -673,20 +645,20 @@ def test_out_subgraph(index_dtype):
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU compaction not implemented")
@parametrize_dtype
def test_compact(index_dtype):
def test_compact(idtype):
g1 = dgl.heterograph({
('user', 'follow', 'user'): [(1, 3), (3, 5)],
('user', 'plays', 'game'): [(2, 4), (3, 4), (2, 5)],
('game', 'wished-by', 'user'): [(6, 7), (5, 7)]},
{'user': 20, 'game': 10}, index_dtype=index_dtype)
{'user': 20, 'game': 10}, idtype=idtype)
g2 = dgl.heterograph({
('game', 'clicked-by', 'user'): [(3, 1)],
('user', 'likes', 'user'): [(1, 8), (8, 9)]},
{'user': 20, 'game': 10}, index_dtype=index_dtype)
{'user': 20, 'game': 10}, idtype=idtype)
g3 = dgl.graph([(0, 1), (1, 2)], num_nodes=10, ntype='user', index_dtype=index_dtype)
g4 = dgl.graph([(1, 3), (3, 5)], num_nodes=10, ntype='user', index_dtype=index_dtype)
g3 = dgl.graph([(0, 1), (1, 2)], num_nodes=10, ntype='user', idtype=idtype)
g4 = dgl.graph([(1, 3), (3, 5)], num_nodes=10, ntype='user', idtype=idtype)
def _check(g, new_g, induced_nodes):
assert g.ntypes == new_g.ntypes
@@ -709,15 +681,15 @@ def test_compact(index_dtype):
new_g1 = dgl.compact_graphs(g1)
induced_nodes = {ntype: new_g1.nodes[ntype].data[dgl.NID] for ntype in new_g1.ntypes}
induced_nodes = {k: F.asnumpy(v) for k, v in induced_nodes.items()}
assert new_g1._idtype_str == index_dtype
assert new_g1.idtype == idtype
assert set(induced_nodes['user']) == set([1, 3, 5, 2, 7])
assert set(induced_nodes['game']) == set([4, 5, 6])
_check(g1, new_g1, induced_nodes)
# Test with always_preserve given a dict
new_g1 = dgl.compact_graphs(
g1, always_preserve={'game': F.tensor([4, 7], dtype=getattr(F, index_dtype))})
assert new_g1._idtype_str == index_dtype
g1, always_preserve={'game': F.tensor([4, 7], idtype)})
assert new_g1.idtype == idtype
induced_nodes = {ntype: new_g1.nodes[ntype].data[dgl.NID] for ntype in new_g1.ntypes}
induced_nodes = {k: F.asnumpy(v) for k, v in induced_nodes.items()}
assert set(induced_nodes['user']) == set([1, 3, 5, 2, 7])
@@ -726,11 +698,11 @@ def test_compact(index_dtype):
# Test with always_preserve given a tensor
new_g3 = dgl.compact_graphs(
g3, always_preserve=F.tensor([1, 7], dtype=getattr(F, index_dtype)))
g3, always_preserve=F.tensor([1, 7], idtype))
induced_nodes = {ntype: new_g3.nodes[ntype].data[dgl.NID] for ntype in new_g3.ntypes}
induced_nodes = {k: F.asnumpy(v) for k, v in induced_nodes.items()}
assert new_g3._idtype_str == index_dtype
assert new_g3.idtype == idtype
assert set(induced_nodes['user']) == set([0, 1, 2, 7])
_check(g3, new_g3, induced_nodes)
@@ -738,8 +710,8 @@ def test_compact(index_dtype):
new_g1, new_g2 = dgl.compact_graphs([g1, g2])
induced_nodes = {ntype: new_g1.nodes[ntype].data[dgl.NID] for ntype in new_g1.ntypes}
induced_nodes = {k: F.asnumpy(v) for k, v in induced_nodes.items()}
assert new_g1._idtype_str == index_dtype
assert new_g2._idtype_str == index_dtype
assert new_g1.idtype == idtype
assert new_g2.idtype == idtype
assert set(induced_nodes['user']) == set([1, 3, 5, 2, 7, 8, 9])
assert set(induced_nodes['game']) == set([3, 4, 5, 6])
_check(g1, new_g1, induced_nodes)
@@ -747,11 +719,11 @@ def test_compact(index_dtype):
# Test multiple graphs with always_preserve given a dict
new_g1, new_g2 = dgl.compact_graphs(
[g1, g2], always_preserve={'game': F.tensor([4, 7], dtype=getattr(F, index_dtype))})
[g1, g2], always_preserve={'game': F.tensor([4, 7], dtype=idtype)})
induced_nodes = {ntype: new_g1.nodes[ntype].data[dgl.NID] for ntype in new_g1.ntypes}
induced_nodes = {k: F.asnumpy(v) for k, v in induced_nodes.items()}
assert new_g1._idtype_str == index_dtype
assert new_g2._idtype_str == index_dtype
assert new_g1.idtype == idtype
assert new_g2.idtype == idtype
assert set(induced_nodes['user']) == set([1, 3, 5, 2, 7, 8, 9])
assert set(induced_nodes['game']) == set([3, 4, 5, 6, 7])
_check(g1, new_g1, induced_nodes)
@@ -759,19 +731,20 @@ def test_compact(index_dtype):
# Test multiple graphs with always_preserve given a tensor
new_g3, new_g4 = dgl.compact_graphs(
[g3, g4], always_preserve=F.tensor([1, 7], dtype=getattr(F, index_dtype)))
[g3, g4], always_preserve=F.tensor([1, 7], dtype=idtype))
induced_nodes = {ntype: new_g3.nodes[ntype].data[dgl.NID] for ntype in new_g3.ntypes}
induced_nodes = {k: F.asnumpy(v) for k, v in induced_nodes.items()}
assert new_g3._idtype_str == index_dtype
assert new_g4._idtype_str == index_dtype
assert new_g3.idtype == idtype
assert new_g4.idtype == idtype
assert set(induced_nodes['user']) == set([0, 1, 2, 3, 5, 7])
_check(g3, new_g3, induced_nodes)
_check(g4, new_g4, induced_nodes)
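The compaction semantics these checks rely on can be sketched without DGL (`compact_ids` is an illustrative single-node-type helper): keep every node incident to at least one edge, plus anything in `always_preserve`, and relabel the survivors with consecutive ids; the returned `induced` list plays the role of the `dgl.NID` mapping read back above.

```python
def compact_ids(edges, always_preserve=()):
    """Drop isolated nodes (unless preserved) and relabel the rest.

    Returns (relabeled_edges, induced) where induced[new_id] = old_id.
    """
    used = sorted({u for u, _ in edges} | {v for _, v in edges}
                  | set(always_preserve))
    new_id = {old: new for new, old in enumerate(used)}
    return [(new_id[u], new_id[v]) for u, v in edges], used

# Mirrors g3 above: edges (0,1),(1,2) in a 10-node graph, preserving {1, 7}.
new_edges, induced = compact_ids([(0, 1), (1, 2)], always_preserve=[1, 7])
```

As in the `new_g3` case above, the induced node set is {0, 1, 2, 7} even though the input graph had 10 nodes.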
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU to simple not implemented")
@parametrize_dtype
def test_to_simple(index_dtype):
def test_to_simple(idtype):
# homogeneous graph
g = dgl.graph((F.tensor([0, 1, 2, 1]), F.tensor([1, 2, 0, 2])))
g.ndata['h'] = F.tensor([[0.], [1.], [2.]])
@@ -809,7 +782,7 @@ def test_to_simple(index_dtype):
('user', 'follow', 'user'): ([0, 1, 2, 1, 1, 1],
[1, 3, 2, 3, 4, 4]),
('user', 'plays', 'game'): ([3, 2, 1, 1, 3, 2, 2], [5, 3, 4, 4, 5, 3, 3])},
index_dtype=index_dtype)
idtype=idtype, device=F.ctx())
g.nodes['user'].data['h'] = F.tensor([0, 1, 2, 3, 4])
g.nodes['user'].data['hh'] = F.tensor([0, 1, 2, 3, 4])
g.edges['follow'].data['h'] = F.tensor([0, 1, 2, 3, 4, 5])
@@ -855,7 +828,7 @@ def test_to_simple(index_dtype):
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU compaction not implemented")
@parametrize_dtype
def test_to_block(index_dtype):
def test_to_block(idtype):
def check(g, bg, ntype, etype, dst_nodes, include_dst_in_src=True):
if dst_nodes is not None:
assert F.array_equal(bg.dstnodes[ntype].data[dgl.NID], dst_nodes)
@@ -892,7 +865,7 @@ def test_to_block(index_dtype):
g = dgl.heterograph({
('A', 'AA', 'A'): [(0, 1), (2, 3), (1, 2), (3, 4)],
('A', 'AB', 'B'): [(0, 1), (1, 3), (3, 5), (1, 6)],
('B', 'BA', 'A'): [(2, 3), (3, 2)]}, index_dtype=index_dtype)
('B', 'BA', 'A'): [(2, 3), (3, 2)]}, idtype=idtype)
g.nodes['A'].data['x'] = F.randn((5, 10))
g.nodes['B'].data['x'] = F.randn((7, 5))
g.edges['AA'].data['x'] = F.randn((4, 3))
@@ -929,7 +902,7 @@ def test_to_block(index_dtype):
assert bg.number_of_src_nodes() == 4
assert bg.number_of_dst_nodes() == 4
dst_nodes = F.tensor([4, 3, 2, 1], dtype=getattr(F, index_dtype))
dst_nodes = F.tensor([4, 3, 2, 1], dtype=idtype)
bg = dgl.to_block(g_a, dst_nodes)
check(g_a, bg, 'A', 'AA', dst_nodes)
check_features(g_a, bg)
@@ -937,14 +910,14 @@ def test_to_block(index_dtype):
g_ab = g['AB']
bg = dgl.to_block(g_ab)
assert bg._idtype_str == index_dtype
assert bg.idtype == idtype
assert bg.number_of_nodes('SRC/B') == 4
assert F.array_equal(bg.srcnodes['B'].data[dgl.NID], bg.dstnodes['B'].data[dgl.NID])
assert bg.number_of_nodes('DST/A') == 0
checkall(g_ab, bg, None)
check_features(g_ab, bg)
dst_nodes = {'B': F.tensor([5, 6, 3, 1], dtype=getattr(F, index_dtype))}
dst_nodes = {'B': F.tensor([5, 6, 3, 1], dtype=idtype)}
bg = dgl.to_block(g, dst_nodes)
assert bg.number_of_nodes('SRC/B') == 4
assert F.array_equal(bg.srcnodes['B'].data[dgl.NID], bg.dstnodes['B'].data[dgl.NID])
@@ -952,14 +925,14 @@ def test_to_block(index_dtype):
checkall(g, bg, dst_nodes)
check_features(g, bg)
dst_nodes = {'A': F.tensor([4, 3, 2, 1], dtype=getattr(F, index_dtype)), 'B': F.tensor([3, 5, 6, 1], dtype=getattr(F, index_dtype))}
dst_nodes = {'A': F.tensor([4, 3, 2, 1], dtype=idtype), 'B': F.tensor([3, 5, 6, 1], dtype=idtype)}
bg = dgl.to_block(g, dst_nodes=dst_nodes)
checkall(g, bg, dst_nodes)
check_features(g, bg)
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
def test_remove_edges(index_dtype):
def test_remove_edges(idtype):
def check(g1, etype, g, edges_removed):
src, dst, eid = g.edges(etype=etype, form='all')
src1, dst1 = g1.edges(etype=etype, order='eid')
@@ -982,82 +955,503 @@ def test_remove_edges(index_dtype):
for fmt in ['coo', 'csr', 'csc']:
for edges_to_remove in [[2], [2, 2], [3, 2], [1, 3, 1, 2]]:
g = dgl.graph([(0, 1), (2, 3), (1, 2), (3, 4)], restrict_format=fmt, index_dtype=index_dtype)
g1 = dgl.remove_edges(g, F.tensor(edges_to_remove, getattr(F, index_dtype)))
g = dgl.graph([(0, 1), (2, 3), (1, 2), (3, 4)], idtype=idtype).formats(fmt)
g1 = dgl.remove_edges(g, F.tensor(edges_to_remove, idtype))
check(g1, None, g, edges_to_remove)
g = dgl.graph(
spsp.csr_matrix(([1, 1, 1, 1], ([0, 2, 1, 3], [1, 3, 2, 4])), shape=(5, 5)),
restrict_format=fmt, index_dtype=index_dtype)
g1 = dgl.remove_edges(g, F.tensor(edges_to_remove, getattr(F, index_dtype)))
idtype=idtype).formats(fmt)
g1 = dgl.remove_edges(g, F.tensor(edges_to_remove, idtype))
check(g1, None, g, edges_to_remove)
g = dgl.heterograph({
('A', 'AA', 'A'): [(0, 1), (2, 3), (1, 2), (3, 4)],
('A', 'AB', 'B'): [(0, 1), (1, 3), (3, 5), (1, 6)],
('B', 'BA', 'A'): [(2, 3), (3, 2)]}, index_dtype=index_dtype)
g2 = dgl.remove_edges(g, {'AA': F.tensor([2], getattr(F, index_dtype)), 'AB': F.tensor([3], getattr(F, index_dtype)), 'BA': F.tensor([1], getattr(F, index_dtype))})
('B', 'BA', 'A'): [(2, 3), (3, 2)]}, idtype=idtype)
g2 = dgl.remove_edges(g, {'AA': F.tensor([2], idtype), 'AB': F.tensor([3], idtype), 'BA': F.tensor([1], idtype)})
check(g2, 'AA', g, [2])
check(g2, 'AB', g, [3])
check(g2, 'BA', g, [1])
g3 = dgl.remove_edges(g, {'AA': F.tensor([], getattr(F, index_dtype)), 'AB': F.tensor([3], getattr(F, index_dtype)), 'BA': F.tensor([1], getattr(F, index_dtype))})
g3 = dgl.remove_edges(g, {'AA': F.tensor([], idtype), 'AB': F.tensor([3], idtype), 'BA': F.tensor([1], idtype)})
check(g3, 'AA', g, [])
check(g3, 'AB', g, [3])
check(g3, 'BA', g, [1])
g4 = dgl.remove_edges(g, {'AB': F.tensor([3, 1, 2, 0], getattr(F, index_dtype))})
g4 = dgl.remove_edges(g, {'AB': F.tensor([3, 1, 2, 0], idtype)})
check(g4, 'AA', g, [])
check(g4, 'AB', g, [3, 1, 2, 0])
check(g4, 'BA', g, [])
def test_cast():
m = spsp.coo_matrix(([1, 1], ([0, 1], [1, 2])), (4, 4))
g = dgl.DGLGraph(m, readonly=True)
gsrc, gdst = g.edges(order='eid')
ndata = F.randn((4, 5))
edata = F.randn((2, 4))
g.ndata['x'] = ndata
g.edata['y'] = edata
hg = dgl.as_heterograph(g, 'A', 'AA')
assert hg.ntypes == ['A']
assert hg.etypes == ['AA']
assert hg.canonical_etypes == [('A', 'AA', 'A')]
assert hg.number_of_nodes() == 4
assert hg.number_of_edges() == 2
hgsrc, hgdst = hg.edges(order='eid')
assert F.array_equal(gsrc, hgsrc)
assert F.array_equal(gdst, hgdst)
g2 = dgl.as_immutable_graph(hg)
assert g2.number_of_nodes() == 4
assert g2.number_of_edges() == 2
g2src, g2dst = hg.edges(order='eid')
assert F.array_equal(g2src, gsrc)
assert F.array_equal(g2dst, gdst)
@parametrize_dtype
def test_add_edges(idtype):
# homogeneous graph
g = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
u = 0
v = 1
g = dgl.add_edges(g, u, v)
assert g.device == F.ctx()
assert g.number_of_nodes() == 3
assert g.number_of_edges() == 3
u = [0]
v = [1]
g = dgl.add_edges(g, u, v)
assert g.device == F.ctx()
assert g.number_of_nodes() == 3
assert g.number_of_edges() == 4
u = F.tensor(u, dtype=idtype)
v = F.tensor(v, dtype=idtype)
g = dgl.add_edges(g, u, v)
assert g.device == F.ctx()
assert g.number_of_nodes() == 3
assert g.number_of_edges() == 5
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0, 1, 0, 0, 0], dtype=idtype))
assert F.array_equal(v, F.tensor([1, 2, 1, 1, 1], dtype=idtype))
# node id larger than current max node id
g = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
u = F.tensor([0, 1], dtype=idtype)
v = F.tensor([2, 3], dtype=idtype)
g = dgl.add_edges(g, u, v)
assert g.number_of_nodes() == 4
assert g.number_of_edges() == 4
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0, 1, 0, 1], dtype=idtype))
assert F.array_equal(v, F.tensor([1, 2, 2, 3], dtype=idtype))
# has data
g = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
g.ndata['h'] = F.copy_to(F.tensor([1, 1, 1], dtype=idtype), ctx=F.ctx())
g.edata['h'] = F.copy_to(F.tensor([1, 1], dtype=idtype), ctx=F.ctx())
u = F.tensor([0, 1], dtype=idtype)
v = F.tensor([2, 3], dtype=idtype)
e_feat = {'h' : F.copy_to(F.tensor([2, 2], dtype=idtype), ctx=F.ctx()),
'hh' : F.copy_to(F.tensor([2, 2], dtype=idtype), ctx=F.ctx())}
g = dgl.add_edges(g, u, v, e_feat)
assert g.number_of_nodes() == 4
assert g.number_of_edges() == 4
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0, 1, 0, 1], dtype=idtype))
assert F.array_equal(v, F.tensor([1, 2, 2, 3], dtype=idtype))
assert F.array_equal(g.ndata['h'], F.tensor([1, 1, 1, 0], dtype=idtype))
assert F.array_equal(g.edata['h'], F.tensor([1, 1, 2, 2], dtype=idtype))
assert F.array_equal(g.edata['hh'], F.tensor([0, 0, 2, 2], dtype=idtype))
# empty input graph (zero nodes)
g = dgl.graph([], num_nodes=0, idtype=idtype, device=F.ctx())
u = F.tensor([0, 1], dtype=idtype)
v = F.tensor([2, 2], dtype=idtype)
e_feat = {'h' : F.copy_to(F.tensor([2, 2], dtype=idtype), ctx=F.ctx()),
'hh' : F.copy_to(F.tensor([2, 2], dtype=idtype), ctx=F.ctx())}
g = dgl.add_edges(g, u, v, e_feat)
assert g.number_of_nodes() == 3
assert g.number_of_edges() == 2
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0, 1], dtype=idtype))
assert F.array_equal(v, F.tensor([2, 2], dtype=idtype))
assert F.array_equal(g.edata['h'], F.tensor([2, 2], dtype=idtype))
assert F.array_equal(g.edata['hh'], F.tensor([2, 2], dtype=idtype))
# bipartite graph
g = dgl.bipartite(([0, 1], [1, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
u = 0
v = 1
g = dgl.add_edges(g, u, v)
assert g.device == F.ctx()
assert g.number_of_nodes('user') == 2
assert g.number_of_nodes('game') == 3
assert g.number_of_edges() == 3
u = [0]
v = [1]
g = dgl.add_edges(g, u, v)
assert g.device == F.ctx()
assert g.number_of_nodes('user') == 2
assert g.number_of_nodes('game') == 3
assert g.number_of_edges() == 4
u = F.tensor(u, dtype=idtype)
v = F.tensor(v, dtype=idtype)
g = dgl.add_edges(g, u, v)
assert g.device == F.ctx()
assert g.number_of_nodes('user') == 2
assert g.number_of_nodes('game') == 3
assert g.number_of_edges() == 5
u, v = g.edges(form='uv')
assert F.array_equal(u, F.tensor([0, 1, 0, 0, 0], dtype=idtype))
assert F.array_equal(v, F.tensor([1, 2, 1, 1, 1], dtype=idtype))
# node id larger than current max node id
g = dgl.bipartite(([0, 1], [1, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
u = F.tensor([0, 2], dtype=idtype)
v = F.tensor([2, 3], dtype=idtype)
g = dgl.add_edges(g, u, v)
assert g.device == F.ctx()
assert g.number_of_nodes('user') == 3
assert g.number_of_nodes('game') == 4
assert g.number_of_edges() == 4
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0, 1, 0, 2], dtype=idtype))
assert F.array_equal(v, F.tensor([1, 2, 2, 3], dtype=idtype))
# has data
g = dgl.bipartite(([0, 1], [1, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
g.ndata['h'] = {'user' : F.copy_to(F.tensor([1, 1], dtype=idtype), ctx=F.ctx()),
'game' : F.copy_to(F.tensor([2, 2, 2], dtype=idtype), ctx=F.ctx())}
g.edata['h'] = F.copy_to(F.tensor([1, 1], dtype=idtype), ctx=F.ctx())
u = F.tensor([0, 2], dtype=idtype)
v = F.tensor([2, 3], dtype=idtype)
e_feat = {'h' : F.copy_to(F.tensor([2, 2], dtype=idtype), ctx=F.ctx()),
'hh' : F.copy_to(F.tensor([2, 2], dtype=idtype), ctx=F.ctx())}
g = dgl.add_edges(g, u, v, e_feat)
assert g.number_of_nodes('user') == 3
assert g.number_of_nodes('game') == 4
assert g.number_of_edges() == 4
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0, 1, 0, 2], dtype=idtype))
assert F.array_equal(v, F.tensor([1, 2, 2, 3], dtype=idtype))
assert F.array_equal(g.nodes['user'].data['h'], F.tensor([1, 1, 0], dtype=idtype))
assert F.array_equal(g.nodes['game'].data['h'], F.tensor([2, 2, 2, 0], dtype=idtype))
assert F.array_equal(g.edata['h'], F.tensor([1, 1, 2, 2], dtype=idtype))
assert F.array_equal(g.edata['hh'], F.tensor([0, 0, 2, 2], dtype=idtype))
# heterogeneous graph
g = create_test_heterograph4(idtype)
u = F.tensor([0, 2], dtype=idtype)
v = F.tensor([2, 3], dtype=idtype)
g = dgl.add_edges(g, u, v, etype='plays')
assert g.number_of_nodes('user') == 3
assert g.number_of_nodes('game') == 4
assert g.number_of_nodes('developer') == 2
assert g.number_of_edges('plays') == 6
assert g.number_of_edges('develops') == 2
u, v = g.edges(form='uv', order='eid', etype='plays')
assert F.array_equal(u, F.tensor([0, 1, 1, 2, 0, 2], dtype=idtype))
assert F.array_equal(v, F.tensor([0, 0, 1, 1, 2, 3], dtype=idtype))
assert F.array_equal(g.nodes['user'].data['h'], F.tensor([1, 1, 1], dtype=idtype))
assert F.array_equal(g.nodes['game'].data['h'], F.tensor([2, 2, 0, 0], dtype=idtype))
assert F.array_equal(g.edges['plays'].data['h'], F.tensor([1, 1, 1, 1, 0, 0], dtype=idtype))
# add with feature
e_feat = {'h': F.copy_to(F.tensor([2, 2], dtype=idtype), ctx=F.ctx())}
u = F.tensor([0, 2], dtype=idtype)
v = F.tensor([2, 3], dtype=idtype)
g.nodes['game'].data['h'] = F.copy_to(F.tensor([2, 2, 1, 1], dtype=idtype), ctx=F.ctx())
g = dgl.add_edges(g, u, v, data=e_feat, etype='develops')
assert g.number_of_nodes('user') == 3
assert g.number_of_nodes('game') == 4
assert g.number_of_nodes('developer') == 3
assert g.number_of_edges('plays') == 6
assert g.number_of_edges('develops') == 4
u, v = g.edges(form='uv', order='eid', etype='develops')
assert F.array_equal(u, F.tensor([0, 1, 0, 2], dtype=idtype))
assert F.array_equal(v, F.tensor([0, 1, 2, 3], dtype=idtype))
assert F.array_equal(g.nodes['developer'].data['h'], F.tensor([3, 3, 0], dtype=idtype))
assert F.array_equal(g.nodes['game'].data['h'], F.tensor([2, 2, 1, 1], dtype=idtype))
assert F.array_equal(g.edges['develops'].data['h'], F.tensor([0, 0, 2, 2], dtype=idtype))
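The feature-padding behaviour asserted throughout this test can be sketched on a plain dict-based graph (a hypothetical structure, not the DGL one): appended edges may grow the node count, and any edge feature present on only one side of the append is zero-filled on the other.

```python
def add_edges(graph, u, v, data=None):
    """Append edges u[i] -> v[i] to a {'num_nodes', 'edges', 'edata'} dict.

    Endpoints beyond the current node count grow the graph; edge features
    missing on either the old or the new edges are zero-filled.
    """
    data = data or {}
    edges = graph['edges'] + list(zip(u, v))
    num_nodes = max([graph['num_nodes']] + [x + 1 for x in list(u) + list(v)])
    n_old, n_new = len(graph['edges']), len(u)
    edata = {}
    for key in set(graph['edata']) | set(data):
        old = graph['edata'].get(key, [0] * n_old)  # zero-fill absent old rows
        new = data.get(key, [0] * n_new)            # zero-fill absent new rows
        edata[key] = list(old) + list(new)
    return {'num_nodes': num_nodes, 'edges': edges, 'edata': edata}

# Mirrors the "has data" case above: 'hh' exists only on the appended edges.
g = {'num_nodes': 3, 'edges': [(0, 1), (1, 2)], 'edata': {'h': [1, 1]}}
g = add_edges(g, [0, 1], [2, 3], data={'h': [2, 2], 'hh': [2, 2]})
```

Note how the node count grows from 3 to 4 because endpoint 3 exceeds the current maximum node id, matching the first block of assertions above.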
@parametrize_dtype
def test_add_nodes(idtype):
# homogeneous graphs
g = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
g.ndata['h'] = F.copy_to(F.tensor([1,1,1], dtype=idtype), ctx=F.ctx())
new_g = dgl.add_nodes(g, 1)
assert g.number_of_nodes() == 3
assert new_g.number_of_nodes() == 4
assert F.array_equal(new_g.ndata['h'], F.tensor([1, 1, 1, 0], dtype=idtype))
# graph with no edges
g = dgl.graph([], num_nodes=3, idtype=idtype, device=F.ctx())
g.ndata['h'] = F.copy_to(F.tensor([1,1,1], dtype=idtype), ctx=F.ctx())
g = dgl.add_nodes(g, 1, data={'h' : F.copy_to(F.tensor([2], dtype=idtype), ctx=F.ctx())})
assert g.number_of_nodes() == 4
assert F.array_equal(g.ndata['h'], F.tensor([1, 1, 1, 2], dtype=idtype))
# bipartite graph
g = dgl.bipartite(([0, 1], [1, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
g = dgl.add_nodes(g, 2, data={'h' : F.copy_to(F.tensor([2, 2], dtype=idtype), ctx=F.ctx())}, ntype='user')
assert g.number_of_nodes('user') == 4
assert g.number_of_nodes('game') == 3
assert F.array_equal(g.nodes['user'].data['h'], F.tensor([0, 0, 2, 2], dtype=idtype))
g = dgl.add_nodes(g, 2, ntype='game')
assert g.number_of_nodes('user') == 4
assert g.number_of_nodes('game') == 5
# heterogeneous graph
g = create_test_heterograph4(idtype)
g = dgl.add_nodes(g, 1, ntype='user')
g = dgl.add_nodes(g, 2, data={'h' : F.copy_to(F.tensor([2, 2], dtype=idtype), ctx=F.ctx())}, ntype='game')
assert g.number_of_nodes('user') == 4
assert g.number_of_nodes('game') == 4
assert g.number_of_nodes('developer') == 2
assert F.array_equal(g.nodes['user'].data['h'], F.tensor([1, 1, 1, 0], dtype=idtype))
assert F.array_equal(g.nodes['game'].data['h'], F.tensor([2, 2, 2, 2], dtype=idtype))
@parametrize_dtype
def test_remove_edges(idtype):
# homogeneous graphs
g = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
e = 0
g = dgl.remove_edges(g, e)
assert g.number_of_edges() == 1
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([1], dtype=idtype))
assert F.array_equal(v, F.tensor([2], dtype=idtype))
g = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
e = [0]
g = dgl.remove_edges(g, e)
assert g.number_of_edges() == 1
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([1], dtype=idtype))
assert F.array_equal(v, F.tensor([2], dtype=idtype))
e = F.tensor([0], dtype=idtype)
g = dgl.remove_edges(g, e)
assert g.number_of_edges() == 0
# has node data
g = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
g.ndata['h'] = F.copy_to(F.tensor([1, 2, 3], dtype=idtype), ctx=F.ctx())
g = dgl.remove_edges(g, 1)
assert g.number_of_edges() == 1
assert F.array_equal(g.ndata['h'], F.tensor([1, 2, 3], dtype=idtype))
# has edge data
g = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
g.edata['h'] = F.copy_to(F.tensor([1, 2], dtype=idtype), ctx=F.ctx())
g = dgl.remove_edges(g, 0)
assert g.number_of_edges() == 1
assert F.array_equal(g.edata['h'], F.tensor([2], dtype=idtype))
# invalid eid
assert_fail = False
try:
g = dgl.remove_edges(g, 1)
except Exception:
assert_fail = True
assert assert_fail
# bipartite graph
g = dgl.bipartite(([0, 1], [1, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
e = 0
g = dgl.remove_edges(g, e)
assert g.number_of_edges() == 1
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([1], dtype=idtype))
assert F.array_equal(v, F.tensor([2], dtype=idtype))
g = dgl.bipartite(([0, 1], [1, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
e = [0]
g = dgl.remove_edges(g, e)
assert g.number_of_edges() == 1
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([1], dtype=idtype))
assert F.array_equal(v, F.tensor([2], dtype=idtype))
e = F.tensor([0], dtype=idtype)
g = dgl.remove_edges(g, e)
assert g.number_of_edges() == 0
# has data
g = dgl.bipartite(([0, 1], [1, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
g.ndata['h'] = {'user' : F.copy_to(F.tensor([1, 1], dtype=idtype), ctx=F.ctx()),
'game' : F.copy_to(F.tensor([2, 2, 2], dtype=idtype), ctx=F.ctx())}
g.edata['h'] = F.copy_to(F.tensor([1, 2], dtype=idtype), ctx=F.ctx())
g = dgl.remove_edges(g, 1)
assert g.number_of_edges() == 1
assert F.array_equal(g.nodes['user'].data['h'], F.tensor([1, 1], dtype=idtype))
assert F.array_equal(g.nodes['game'].data['h'], F.tensor([2, 2, 2], dtype=idtype))
assert F.array_equal(g.edata['h'], F.tensor([1], dtype=idtype))
# heterogeneous graph
g = create_test_heterograph4(idtype)
g.edges['plays'].data['h'] = F.copy_to(F.tensor([1, 2, 3, 4], dtype=idtype), ctx=F.ctx())
g = dgl.remove_edges(g, 1, etype='plays')
assert g.number_of_edges('plays') == 3
u, v = g.edges(form='uv', order='eid', etype='plays')
assert F.array_equal(u, F.tensor([0, 1, 2], dtype=idtype))
assert F.array_equal(v, F.tensor([0, 1, 1], dtype=idtype))
assert F.array_equal(g.edges['plays'].data['h'], F.tensor([1, 3, 4], dtype=idtype))
# remove all edges of 'develops'
g = dgl.remove_edges(g, [0, 1], etype='develops')
assert g.number_of_edges('develops') == 0
assert F.array_equal(g.nodes['user'].data['h'], F.tensor([1, 1, 1], dtype=idtype))
assert F.array_equal(g.nodes['game'].data['h'], F.tensor([2, 2], dtype=idtype))
assert F.array_equal(g.nodes['developer'].data['h'], F.tensor([3, 3], dtype=idtype))
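The removal semantics checked above can be sketched in plain Python (an illustrative helper, not the DGL API): surviving edges keep their original relative order and the edge data is sliced to match.

```python
def remove_edges(src, dst, edata, eids):
    # Dropping edges by id keeps the remaining edges in their original
    # order, with edge data sliced to stay aligned (illustrative only).
    drop = set(eids)
    keep = [i for i in range(len(src)) if i not in drop]
    return ([src[i] for i in keep],
            [dst[i] for i in keep],
            [edata[i] for i in keep])

remove_edges([0, 1], [1, 2], [1, 2], [0])  # -> ([1], [2], [2])
```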
@parametrize_dtype
def test_remove_nodes(idtype):
# homogeneous graphs
g = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
n = 0
g = dgl.remove_nodes(g, n)
assert g.number_of_nodes() == 2
assert g.number_of_edges() == 1
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0], dtype=idtype))
assert F.array_equal(v, F.tensor([1], dtype=idtype))
g = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
n = [1]
g = dgl.remove_nodes(g, n)
assert g.number_of_nodes() == 2
assert g.number_of_edges() == 0
g = dgl.graph(([0, 1], [1, 2]), idtype=idtype, device=F.ctx())
n = F.tensor([2], dtype=idtype)
g = dgl.remove_nodes(g, n)
assert g.number_of_nodes() == 2
assert g.number_of_edges() == 1
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0], dtype=idtype))
assert F.array_equal(v, F.tensor([1], dtype=idtype))
# invalid nid
assert_fail = False
try:
g.remove_nodes(3)
except Exception:
assert_fail = True
assert assert_fail
# has node and edge data
g = dgl.graph(([0, 0, 2], [0, 1, 2]), idtype=idtype, device=F.ctx())
g.ndata['hv'] = F.copy_to(F.tensor([1, 2, 3], dtype=idtype), ctx=F.ctx())
g.edata['he'] = F.copy_to(F.tensor([1, 2, 3], dtype=idtype), ctx=F.ctx())
g = dgl.remove_nodes(g, F.tensor([0], dtype=idtype))
assert g.number_of_nodes() == 2
assert g.number_of_edges() == 1
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([1], dtype=idtype))
assert F.array_equal(v, F.tensor([1], dtype=idtype))
assert F.array_equal(g.ndata['hv'], F.tensor([2, 3], dtype=idtype))
assert F.array_equal(g.edata['he'], F.tensor([3], dtype=idtype))
# bipartite graph
g = dgl.bipartite(([0, 1], [1, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
n = 0
g = dgl.remove_nodes(g, n, ntype='user')
assert g.number_of_nodes('user') == 1
assert g.number_of_nodes('game') == 3
assert g.number_of_edges() == 1
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0], dtype=idtype))
assert F.array_equal(v, F.tensor([2], dtype=idtype))
g = dgl.bipartite(([0, 1], [1, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
n = [1]
g = dgl.remove_nodes(g, n, ntype='user')
assert g.number_of_nodes('user') == 1
assert g.number_of_nodes('game') == 3
assert g.number_of_edges() == 1
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0], dtype=idtype))
assert F.array_equal(v, F.tensor([1], dtype=idtype))
g = dgl.bipartite(([0, 1], [1, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
n = F.tensor([0], dtype=idtype)
g = dgl.remove_nodes(g, n, ntype='game')
assert g.number_of_nodes('user') == 2
assert g.number_of_nodes('game') == 2
assert g.number_of_edges() == 2
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0, 1], dtype=idtype))
assert F.array_equal(v, F.tensor([0, 1], dtype=idtype))
# heterogeneous graph
g = create_test_heterograph4(idtype)
g.edges['plays'].data['h'] = F.copy_to(F.tensor([1, 2, 3, 4], dtype=idtype), ctx=F.ctx())
g = dgl.remove_nodes(g, 0, ntype='game')
assert g.number_of_nodes('user') == 3
assert g.number_of_nodes('game') == 1
assert g.number_of_nodes('developer') == 2
assert g.number_of_edges('plays') == 2
assert g.number_of_edges('develops') == 1
assert F.array_equal(g.nodes['user'].data['h'], F.tensor([1, 1, 1], dtype=idtype))
assert F.array_equal(g.nodes['game'].data['h'], F.tensor([2], dtype=idtype))
assert F.array_equal(g.nodes['developer'].data['h'], F.tensor([3, 3], dtype=idtype))
u, v = g.edges(form='uv', order='eid', etype='plays')
assert F.array_equal(u, F.tensor([1, 2], dtype=idtype))
assert F.array_equal(v, F.tensor([0, 0], dtype=idtype))
assert F.array_equal(g.edges['plays'].data['h'], F.tensor([3, 4], dtype=idtype))
u, v = g.edges(form='uv', order='eid', etype='develops')
assert F.array_equal(u, F.tensor([1], dtype=idtype))
assert F.array_equal(v, F.tensor([0], dtype=idtype))
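The node-removal behavior these assertions rely on can be sketched in plain Python (illustrative helper, not the DGL API): removing nodes drops their incident edges and compacts the remaining node ids.

```python
def remove_nodes(num_nodes, src, dst, nids):
    # Removing nodes drops every incident edge and relabels the
    # surviving nodes to a contiguous id range (illustrative only).
    removed = set(nids)
    remap = {}
    for old in range(num_nodes):
        if old not in removed:
            remap[old] = len(remap)
    edges = [(remap[u], remap[v]) for u, v in zip(src, dst)
             if u not in removed and v not in removed]
    return len(remap), edges

remove_nodes(3, [0, 1], [1, 2], [0])  # -> (2, [(0, 1)])
```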
@parametrize_dtype
def test_add_selfloop(idtype):
# homogeneous graph
g = dgl.graph(([0, 0, 2], [2, 1, 0]), idtype=idtype, device=F.ctx())
g.edata['he'] = F.copy_to(F.tensor([1, 2, 3], dtype=idtype), ctx=F.ctx())
g.ndata['hn'] = F.copy_to(F.tensor([1, 2, 3], dtype=idtype), ctx=F.ctx())
g = dgl.add_self_loop(g)
assert g.number_of_nodes() == 3
assert g.number_of_edges() == 6
u, v = g.edges(form='uv', order='eid')
assert F.array_equal(u, F.tensor([0, 0, 2, 0, 1, 2], dtype=idtype))
assert F.array_equal(v, F.tensor([2, 1, 0, 0, 1, 2], dtype=idtype))
assert F.array_equal(g.edata['he'], F.tensor([1, 2, 3, 0, 0, 0], dtype=idtype))
# bipartite graph
g = dgl.bipartite(([0, 1, 2], [1, 2, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
# adding self-loops to a bipartite graph is invalid and should raise an error
raise_error = False
try:
g = dgl.add_self_loop(g)
except Exception:
raise_error = True
assert raise_error
g = create_test_heterograph6(idtype)
g = dgl.add_self_loop(g, etype='follows')
assert g.number_of_nodes('user') == 3
assert g.number_of_nodes('game') == 2
assert g.number_of_edges('follows') == 5
assert g.number_of_edges('plays') == 2
u, v = g.edges(form='uv', order='eid', etype='follows')
assert F.array_equal(u, F.tensor([1, 2, 0, 1, 2], dtype=idtype))
assert F.array_equal(v, F.tensor([0, 1, 0, 1, 2], dtype=idtype))
assert F.array_equal(g.edges['follows'].data['h'], F.tensor([1, 2, 0, 0, 0], dtype=idtype))
assert F.array_equal(g.edges['plays'].data['h'], F.tensor([1, 2], dtype=idtype))
raise_error = False
try:
g = dgl.add_self_loop(g, etype='plays')
except Exception:
raise_error = True
assert raise_error
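The edge ordering asserted in the homogeneous case can be sketched in plain Python: existing edges (including any existing self-loops) are kept, and one `(i, i)` edge per node is appended at the end, with zero-filled data for the new edges. An illustrative helper, not the DGL API:

```python
def add_self_loop_edges(src, dst, num_nodes):
    # Keep all existing edges and append one (i, i) edge per node at
    # the end, mirroring the ordering dgl.add_self_loop produces.
    return src + list(range(num_nodes)), dst + list(range(num_nodes))

add_self_loop_edges([0, 0, 2], [2, 1, 0], 3)
# -> ([0, 0, 2, 0, 1, 2], [2, 1, 0, 0, 1, 2])
```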
@parametrize_dtype
def test_remove_selfloop(idtype):
# homogeneous graph
g = dgl.graph(([0, 0, 0, 1], [1, 0, 0, 2]), idtype=idtype, device=F.ctx())
g.edata['he'] = F.copy_to(F.tensor([1, 2, 3, 4], dtype=idtype), ctx=F.ctx())
g = dgl.remove_self_loop(g)
assert g.number_of_nodes() == 3
assert g.number_of_edges() == 2
assert F.array_equal(g.edata['he'], F.tensor([1, 4], dtype=idtype))
# bipartite graph
g = dgl.bipartite(([0, 1, 2], [1, 2, 2]), 'user', 'plays', 'game', idtype=idtype, device=F.ctx())
# removing self-loops from a bipartite graph is invalid and should raise an error
raise_error = False
try:
g = dgl.remove_self_loop(g, etype='plays')
except Exception:
raise_error = True
assert raise_error
g = create_test_heterograph5(idtype)
g = dgl.remove_self_loop(g, etype='follows')
assert g.number_of_nodes('user') == 3
assert g.number_of_nodes('game') == 2
assert g.number_of_edges('follows') == 2
assert g.number_of_edges('plays') == 2
u, v = g.edges(form='uv', order='eid', etype='follows')
assert F.array_equal(u, F.tensor([1, 2], dtype=idtype))
assert F.array_equal(v, F.tensor([0, 1], dtype=idtype))
assert F.array_equal(g.edges['follows'].data['h'], F.tensor([2, 4], dtype=idtype))
assert F.array_equal(g.edges['plays'].data['h'], F.tensor([1, 2], dtype=idtype))
raise_error = False
try:
g = dgl.remove_self_loop(g, etype='plays')
except Exception:
raise_error = True
assert raise_error
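The homogeneous case above checks that edges with `src == dst` are dropped and the edge data stays aligned. A plain-Python sketch of that behavior (illustrative helper, not the DGL API):

```python
def remove_self_loop_edges(src, dst, edata):
    # Drop every edge whose endpoints coincide, slicing the edge data
    # so it stays aligned with the surviving edges (illustrative only).
    keep = [i for i, (u, v) in enumerate(zip(src, dst)) if u != v]
    return ([src[i] for i in keep],
            [dst[i] for i in keep],
            [edata[i] for i in keep])

remove_self_loop_edges([0, 0, 0, 1], [1, 0, 0, 2], [1, 2, 3, 4])
# -> ([0, 1], [1, 2], [1, 4])
```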
if __name__ == '__main__':
# test_reorder_nodes()
# test_line_graph()
# test_no_backtracking()
# test_reverse()
# test_reverse_shared_frames()
# test_to_bidirected()
# test_simple_graph()
# test_bidirected_graph()
# test_khop_adj()
# test_khop_graph()
# test_laplacian_lambda_max()
# test_remove_self_loop()
# test_add_self_loop()
# test_partition_with_halo()
test_metis_partition()
test_hetero_metis_partition()
# test_hetero_linegraph('int32')
# test_compact()
# test_to_simple("int32")
# test_in_subgraph("int32")
# test_out_subgraph()
# test_to_block("int32")
# test_remove_edges()
pass
import random
import sys
import time
import unittest
import dgl
import networkx as nx
......@@ -17,8 +18,9 @@ def toset(x):
# F.zerocopy_to_numpy may return an int
return set(F.zerocopy_to_numpy(x).tolist())
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
def test_bfs(index_dtype, n=100):
def test_bfs(idtype, n=100):
def _bfs_nx(g_nx, src):
edges = nx.bfs_edges(g_nx, src)
layers_nx = [set([src])]
......@@ -28,25 +30,20 @@ def test_bfs(index_dtype, n=100):
for u, v in edges:
if u in layers_nx[-1]:
frontier.add(v)
edge_frontier.add(g.edge_id(u, v))
edge_frontier.add(g.edge_ids(u, v))
else:
layers_nx.append(frontier)
edges_nx.append(edge_frontier)
frontier = set([v])
edge_frontier = set([g.edge_id(u, v)])
edge_frontier = set([g.edge_ids(u, v)])
# avoids empty successors
if len(frontier) > 0 and len(edge_frontier) > 0:
layers_nx.append(frontier)
edges_nx.append(edge_frontier)
return layers_nx, edges_nx
g = dgl.DGLGraph()
a = sp.random(n, n, 3 / n, data_rvs=lambda n: np.ones(n))
g.from_scipy_sparse_matrix(a)
if index_dtype == 'int32':
g = dgl.graph(g.edges()).int()
else:
g = dgl.graph(g.edges()).long()
g = dgl.graph(a).astype(idtype)
g_nx = g.to_networkx()
src = random.choice(range(n))
......@@ -56,29 +53,19 @@ def test_bfs(index_dtype, n=100):
assert all(toset(x) == y for x, y in zip(layers_dgl, layers_nx))
g_nx = nx.random_tree(n, seed=42)
g = dgl.DGLGraph()
g.from_networkx(g_nx)
if index_dtype == 'int32':
g = dgl.graph(g.edges()).int()
else:
g = dgl.graph(g.edges()).long()
g = dgl.graph(g_nx).astype(idtype)
src = 0
_, edges_nx = _bfs_nx(g_nx, src)
edges_dgl = dgl.bfs_edges_generator(g, src)
assert len(edges_dgl) == len(edges_nx)
assert all(toset(x) == y for x, y in zip(edges_dgl, edges_nx))
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
def test_topological_nodes(index_dtype, n=100):
g = dgl.DGLGraph()
def test_topological_nodes(idtype, n=100):
a = sp.random(n, n, 3 / n, data_rvs=lambda n: np.ones(n))
b = sp.tril(a, -1).tocoo()
g.from_scipy_sparse_matrix(b)
if index_dtype == 'int32':
g = dgl.graph(g.edges()).int()
else:
g = dgl.graph(g.edges()).long()
g = dgl.graph(b).astype(idtype)
layers_dgl = dgl.topological_nodes_generator(g)
......@@ -101,15 +88,12 @@ def test_topological_nodes(index_dtype, n=100):
assert all(toset(x) == toset(y) for x, y in zip(layers_dgl, layers_spmv))
DFS_LABEL_NAMES = ['forward', 'reverse', 'nontree']
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
def test_dfs_labeled_edges(index_dtype, example=False):
dgl_g = dgl.DGLGraph()
def test_dfs_labeled_edges(idtype, example=False):
dgl_g = dgl.DGLGraph().astype(idtype)
dgl_g.add_nodes(6)
dgl_g.add_edges([0, 1, 0, 3, 3], [1, 2, 2, 4, 5])
if index_dtype == 'int32':
dgl_g = dgl.graph(dgl_g.edges()).int()
else:
dgl_g = dgl.graph(dgl_g.edges()).long()
dgl_edges, dgl_labels = dgl.dfs_labeled_edges_generator(
dgl_g, [0, 3], has_reverse_edge=True, has_nontree_edge=True)
dgl_edges = [toset(t) for t in dgl_edges]
......@@ -141,6 +125,6 @@ def test_dfs_labeled_edges(index_dtype, example=False):
assert False
if __name__ == '__main__':
test_bfs(index_dtype='int32')
test_topological_nodes(index_dtype='int32')
test_dfs_labeled_edges(index_dtype='int32')
test_bfs(idtype='int32')
test_topological_nodes(idtype='int32')
test_dfs_labeled_edges(idtype='int32')
import backend as F
import dgl
import networkx as nx
import dgl.utils as utils
from dgl import DGLGraph, ALL
from dgl.udf import NodeBatch, EdgeBatch
def test_node_batch():
g = dgl.DGLGraph(nx.path_graph(20))
feat = F.randn((g.number_of_nodes(), 10))
g.ndata['x'] = feat
# test all
v = utils.toindex(slice(0, g.number_of_nodes()))
n_repr = g.get_n_repr(v)
nbatch = NodeBatch(v, n_repr)
assert F.allclose(nbatch.data['x'], feat)
assert nbatch.mailbox is None
assert F.allclose(nbatch.nodes(), g.nodes())
assert nbatch.batch_size() == g.number_of_nodes()
assert len(nbatch) == g.number_of_nodes()
# test partial
v = utils.toindex(F.tensor([0, 3, 5, 7, 9]))
n_repr = g.get_n_repr(v)
nbatch = NodeBatch(v, n_repr)
assert F.allclose(nbatch.data['x'], F.gather_row(feat, F.tensor([0, 3, 5, 7, 9])))
assert nbatch.mailbox is None
assert F.allclose(nbatch.nodes(), F.tensor([0, 3, 5, 7, 9]))
assert nbatch.batch_size() == 5
assert len(nbatch) == 5
def test_edge_batch():
d = 10
g = dgl.DGLGraph(nx.path_graph(20))
nfeat = F.randn((g.number_of_nodes(), d))
efeat = F.randn((g.number_of_edges(), d))
g.ndata['x'] = nfeat
g.edata['x'] = efeat
# test all
eid = utils.toindex(slice(0, g.number_of_edges()))
u, v, _ = g._graph.edges('eid')
src_data = g.get_n_repr(u)
edge_data = g.get_e_repr(eid)
dst_data = g.get_n_repr(v)
ebatch = EdgeBatch((u, v, eid), src_data, edge_data, dst_data)
assert F.shape(ebatch.src['x'])[0] == g.number_of_edges() and\
F.shape(ebatch.src['x'])[1] == d
assert F.shape(ebatch.dst['x'])[0] == g.number_of_edges() and\
F.shape(ebatch.dst['x'])[1] == d
assert F.shape(ebatch.data['x'])[0] == g.number_of_edges() and\
F.shape(ebatch.data['x'])[1] == d
assert F.allclose(ebatch.edges()[0], u.tousertensor())
assert F.allclose(ebatch.edges()[1], v.tousertensor())
assert F.allclose(ebatch.edges()[2], F.arange(0, g.number_of_edges()))
assert ebatch.batch_size() == g.number_of_edges()
assert len(ebatch) == g.number_of_edges()
# test partial
eid = utils.toindex(F.tensor([0, 3, 5, 7, 11, 13, 15, 27]))
u, v, _ = g._graph.find_edges(eid)
src_data = g.get_n_repr(u)
edge_data = g.get_e_repr(eid)
dst_data = g.get_n_repr(v)
ebatch = EdgeBatch((u, v, eid), src_data, edge_data, dst_data)
assert F.shape(ebatch.src['x'])[0] == 8 and\
F.shape(ebatch.src['x'])[1] == d
assert F.shape(ebatch.dst['x'])[0] == 8 and\
F.shape(ebatch.dst['x'])[1] == d
assert F.shape(ebatch.data['x'])[0] == 8 and\
F.shape(ebatch.data['x'])[1] == d
assert F.allclose(ebatch.edges()[0], u.tousertensor())
assert F.allclose(ebatch.edges()[1], v.tousertensor())
assert F.allclose(ebatch.edges()[2], eid.tousertensor())
assert ebatch.batch_size() == 8
assert len(ebatch) == 8
if __name__ == '__main__':
test_node_batch()
test_edge_batch()
import pytest
parametrize_dtype = pytest.mark.parametrize("index_dtype", ['int32', 'int64'])
import backend as F
if F._default_context_str == 'cpu':
parametrize_dtype = pytest.mark.parametrize("idtype", [F.int32, F.int64])
else:
# only test int32 on GPU because many graph operators are not supported for int64.
parametrize_dtype = pytest.mark.parametrize("idtype", [F.int32])
def check_fail(fn, *args, **kwargs):
try:
......
......@@ -189,19 +189,23 @@ TEST(ArrayTest, TestArith) {
};
template <typename IDX>
void _TestHStack() {
IdArray a = aten::Range(0, 100, sizeof(IDX)*8, CTX);
IdArray b = aten::Range(100, 200, sizeof(IDX)*8, CTX);
IdArray c = aten::HStack(a, b);
void _TestHStack(DLContext ctx) {
IdArray a = aten::Range(0, 100, sizeof(IDX)*8, ctx);
IdArray b = aten::Range(100, 200, sizeof(IDX)*8, ctx);
IdArray c = aten::HStack(a, b).CopyTo(aten::CPU);
ASSERT_EQ(c->ndim, 1);
ASSERT_EQ(c->shape[0], 200);
for (int i = 0; i < 200; ++i)
ASSERT_EQ(Ptr<IDX>(c)[i], i);
}
TEST(ArrayTest, TestHStack) {
_TestHStack<int32_t>();
_TestHStack<int64_t>();
TEST(ArrayTest, HStack) {
_TestHStack<int32_t>(CPU);
_TestHStack<int64_t>(CPU);
#ifdef DGL_USE_CUDA
_TestHStack<int32_t>(GPU);
_TestHStack<int64_t>(GPU);
#endif
}
template <typename IDX>
......@@ -1238,6 +1242,61 @@ TEST(ArrayTest, CumSum) {
#endif
}
template <typename IDX, typename D>
void _TestScatter_(DLContext ctx) {
IdArray out = aten::Full(1, 10, 8*sizeof(IDX), ctx);
IdArray idx = aten::VecToIdArray(std::vector<IDX>({2, 3, 9}), sizeof(IDX)*8, ctx);
IdArray val = aten::VecToIdArray(std::vector<IDX>({-20, 30, 90}), sizeof(IDX)*8, ctx);
aten::Scatter_(idx, val, out);
IdArray tout = aten::VecToIdArray(std::vector<IDX>({1, 1, -20, 30, 1, 1, 1, 1, 1, 90}), sizeof(IDX)*8, ctx);
ASSERT_TRUE(ArrayEQ<IDX>(out, tout));
}
TEST(ArrayTest, Scatter_) {
_TestScatter_<int32_t, int32_t>(CPU);
_TestScatter_<int64_t, int32_t>(CPU);
_TestScatter_<int32_t, int64_t>(CPU);
_TestScatter_<int64_t, int64_t>(CPU);
#ifdef DGL_USE_CUDA
_TestScatter_<int32_t, int32_t>(GPU);
_TestScatter_<int64_t, int32_t>(GPU);
_TestScatter_<int32_t, int64_t>(GPU);
_TestScatter_<int64_t, int64_t>(GPU);
#endif
}
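`aten::Scatter_` writes `val[i]` into `out[idx[i]]` in place. A pure-Python sketch of the semantics the C++ test checks (illustrative helper, not the aten API):

```python
def scatter_(idx, val, out):
    # In-place scatter: out[idx[i]] = val[i], mirroring the semantics
    # exercised by _TestScatter_ (illustrative list version).
    for i, v in zip(idx, val):
        out[i] = v
    return out

out = [1] * 10
scatter_([2, 3, 9], [-20, 30, 90], out)
# out == [1, 1, -20, 30, 1, 1, 1, 1, 1, 90]
```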
template <typename IDX>
void _TestNonZero(DLContext ctx) {
auto val = aten::VecToIdArray(std::vector<IDX>({0, 1, 2, 0, -10, 0, 0, 23}), sizeof(IDX)*8, ctx);
auto idx = aten::NonZero(val);
auto tidx = aten::VecToIdArray(std::vector<int64_t>({1, 2, 4, 7}), 64, ctx);
ASSERT_TRUE(ArrayEQ<IDX>(idx, tidx));
val = aten::VecToIdArray(std::vector<IDX>({}), sizeof(IDX)*8, ctx);
idx = aten::NonZero(val);
tidx = aten::VecToIdArray(std::vector<int64_t>({}), 64, ctx);
ASSERT_TRUE(ArrayEQ<IDX>(idx, tidx));
val = aten::VecToIdArray(std::vector<IDX>({0, 0, 0, 0}), sizeof(IDX)*8, ctx);
idx = aten::NonZero(val);
tidx = aten::VecToIdArray(std::vector<int64_t>({}), 64, ctx);
ASSERT_TRUE(ArrayEQ<IDX>(idx, tidx));
val = aten::Full(1, 3, sizeof(IDX)*8, ctx);
idx = aten::NonZero(val);
tidx = aten::VecToIdArray(std::vector<int64_t>({0, 1, 2}), 64, ctx);
ASSERT_TRUE(ArrayEQ<IDX>(idx, tidx));
}
TEST(ArrayTest, NonZero) {
_TestNonZero<int32_t>(CPU);
_TestNonZero<int64_t>(CPU);
#ifdef DGL_USE_CUDA
_TestNonZero<int32_t>(GPU);
_TestNonZero<int64_t>(GPU);
#endif
}
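`aten::NonZero` returns the indices of the nonzero entries, as an int64 array regardless of the input id type. A pure-Python sketch of the behavior the test cases cover, including empty and all-zero inputs (illustrative, not the aten API):

```python
def nonzero(arr):
    # Indices of nonzero entries, in increasing order; an empty or
    # all-zero input yields an empty index list (illustrative only).
    return [i for i, x in enumerate(arr) if x != 0]

nonzero([0, 1, 2, 0, -10, 0, 0, 23])  # -> [1, 2, 4, 7]
nonzero([0, 0, 0, 0])                 # -> []
```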
template <typename IdType>
void _TestLineGraphCOO(DLContext ctx) {
/*
......@@ -1344,16 +1403,3 @@ TEST(LineGraphTest, LineGraphCOO) {
_TestLineGraphCOO<int32_t>(CPU);
_TestLineGraphCOO<int64_t>(CPU);
}
template <typename IDX>
void _TestNonZero() {
BoolArray a = aten::VecToIdArray(std::vector<IDX>({1, 0, 1, 1, 0, 0, 1}));
IdArray indices = aten::NonZero(a);
IdArray expected = aten::VecToIdArray(std::vector<IDX>({0, 2, 3, 6}));
ASSERT_TRUE(ArrayEQ<IDX>(indices, expected));
}
TEST(ArrayTest, NonZero) {
_TestNonZero<int32_t>();
_TestNonZero<int64_t>();
}
......@@ -4,6 +4,7 @@
#include <gtest/gtest.h>
#include <algorithm>
#include <iostream>
#include <memory>
#include <vector>
#include "../../src/graph/heterograph.h"
#include "../../src/graph/unit_graph.h"
......@@ -18,7 +19,7 @@ TEST(Serialize, UnitGraph_COO) {
auto src = VecToIdArray<int64_t>({1, 2, 5, 3});
auto dst = VecToIdArray<int64_t>({1, 6, 2, 6});
auto mg = std::dynamic_pointer_cast<UnitGraph>(
dgl::UnitGraph::CreateFromCOO(2, 9, 8, src, dst, dgl::SparseFormat::kCOO));
dgl::UnitGraph::CreateFromCOO(2, 9, 8, src, dst, coo_code));
std::string blob;
dmlc::MemoryStringStream ifs(&blob);
......@@ -39,13 +40,15 @@ TEST(Serialize, UnitGraph_CSR) {
aten::CSRMatrix csr_matrix;
auto src = VecToIdArray<int64_t>({1, 2, 5, 3});
auto dst = VecToIdArray<int64_t>({1, 6, 2, 6});
auto mg = std::dynamic_pointer_cast<UnitGraph>(
dgl::UnitGraph::CreateFromCOO(2, 9, 8, src, dst, dgl::SparseFormat::kCSR));
auto coo_g = std::dynamic_pointer_cast<UnitGraph>(
dgl::UnitGraph::CreateFromCOO(2, 9, 8, src, dst));
auto csr_g =
std::dynamic_pointer_cast<UnitGraph>(coo_g->GetGraphInFormat(csr_code));
std::string blob;
dmlc::MemoryStringStream ifs(&blob);
static_cast<dmlc::Stream *>(&ifs)->Write(mg);
static_cast<dmlc::Stream *>(&ifs)->Write(csr_g);
dmlc::MemoryStringStream ofs(&blob);
auto ug2 = Serializer::make_shared<UnitGraph>();
......