Unverified commit 44089c8b authored by Minjie Wang, committed by GitHub

[Refactor][Graph] Merge DGLGraph and DGLHeteroGraph (#1862)



* Merge

* [Graph][CUDA] Graph on GPU and many refactoring (#1791)

* change edge_ids behavior and C++ impl

* fix unittests; remove utils.Index in edge_id

* pass mx and th tests

* pass tf test

* add aten::Scatter_

* Add nonzero; impl CSRGetDataAndIndices/CSRSliceMatrix

* CSRGetData and CSRGetDataAndIndices passed tests

* CSRSliceMatrix basic tests

* fix bug in empty slice

* CUDA CSRHasDuplicate

* has_node; has_edge_between

* predecessors, successors

* deprecate send/recv; fix send_and_recv

* deprecate send/recv; fix send_and_recv

* in_edges; out_edges; all_edges; apply_edges

* in deg/out deg

* subgraph/edge_subgraph

* adj

* in_subgraph/out_subgraph

* sample neighbors

* set/get_n/e_repr

* wip: working on refactoring all idtypes

* pass ndata/edata tests on gpu

* fix

* stash

* workaround nonzero issue

* stash

* nx conversion

* test_hetero_basics except update routines

* test_update_routines

* test_hetero_basics for pytorch

* more fixes

* WIP: flatten graph

* wip: flatten

* test_flatten

* test_to_device

* fix bug in to_homo

* fix bug in CSRSliceMatrix

* pass subgraph test

* fix send_and_recv

* fix filter

* test_heterograph

* passed all pytorch tests

* fix mx unittest

* fix pytorch test_nn

* fix all unittests for PyTorch

* passed all mxnet tests

* lint

* fix tf nn test

* pass all tf tests

* lint

* lint

* change deprecation

* try fix compile

* lint

* update METIS

* fix utest

* fix

* fix utests

* try debug

* revert

* small fix

* fix utests

* upd

* upd

* upd

* fix

* upd

* upd

* upd

* upd

* upd

* trigger

* +1s

* [kernel] Use heterograph index instead of unitgraph index (#1813)

* upd

* upd

* upd

* fix

* upd

* upd

* upd

* upd

* upd

* trigger

* +1s

* [Graph] Mutation for Heterograph (#1818)

* mutation add_nodes and add_edges

* Add support for remove_edges, remove_nodes, add_selfloop, remove_selfloop

* Fix
Co-authored-by: Ubuntu <ubuntu@ip-172-31-51-214.ec2.internal>

* upd

* upd

* upd

* fix

* [Transform] Mutable transform (#1833)

* add nodes

* All three

* Fix

* lint

* Add some test case

* Fix

* Fix

* Fix

* Fix

* Fix

* Fix

* fix

* trigger

* Fix

* fix
Co-authored-by: Ubuntu <ubuntu@ip-172-31-51-214.ec2.internal>

* [Graph] Migrate Batch & Readout module to heterograph (#1836)

* dgl.batch

* unbatch

* fix to device

* reduce readout; segment reduce

* change batch_num_nodes|edges to function

* reduce readout/ softmax

* broadcast

* topk

* fix

* fix tf and mx

* fix some ci

* fix batch but unbatch differently

* new check

* upd

* upd

* upd

* idtype behavior; code reorg

* idtype behavior; code reorg

* wip: test_basics

* pass test_basics

* WIP: from nx/ to nx

* missing files

* upd

* pass test_basics:test_nx_conversion

* Fix test

* Fix inplace update

* WIP: fixing tests

* upd

* pass test_transform cpu

* pass gpu test_transform

* pass test_batched_graph

* GPU graph auto cast to int32

* missing file

* stash

* WIP: rgcn-hetero

* Fix two datasets

* upd

* weird

* Fix capsule

* fuck you

* fuck matthias

* Fix dgmg

* fix bug in block degrees; pass rgcn-hetero

* rgcn

* gat and diffpool fix
also fix ppi and tu dataset

* Tree LSTM

* pointcloud

* rrn; wip: sgc

* resolve conflicts

* upd

* sgc and reddit dataset

* upd

* Fix deepwalk, gindt and gcn

* fix datasets and sign

* optimization

* optimization

* upd

* upd

* Fix GIN

* fix bug in add_nodes add_edges; tagcn

* adaptive sampling and gcmc

* upd

* upd

* fix geometric

* fix

* metapath2vec

* fix agnn

* fix pickling problem of block

* fix utests

* miss file

* linegraph

* upd

* upd

* upd

* graphsage

* stgcn_wave

* fix hgt

* on unittests

* Fix transformer

* Fix HAN

* passed pytorch unittests

* lint

* fix

* Fix cluster gcn

* cluster-gcn is ready

* on fixing block related codes

* 2nd order derivative

* Revert "2nd order derivative"

This reverts commit 523bf6c249bee61b51b1ad1babf42aad4167f206.

* passed torch utests again

* fix all mxnet unittests

* delete some useless tests

* pass all tf cpu tests

* disable

* disable distributed unittest

* fix

* fix

* lint

* fix

* fix

* fix script

* fix tutorial

* fix apply edges bug

* fix 2 basics

* fix tutorial
Co-authored-by: yzh119 <expye@outlook.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-51-214.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-7-42.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-1-5.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-68-185.ec2.internal>
parent 015acfd2
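The log above is terse; as a quick illustration, here is a sketch of the user-facing surface after the merge, assuming DGL 0.5-era semantics (`dgl.graph` as the single constructor, mutation on the merged class, `batch_num_nodes()`/`batch_num_edges()` as methods, and whole-graph device moves). It summarizes the change log and is not authoritative documentation.

```
import dgl
import torch

# One constructor now covers the old DGLGraph use case; a homogeneous
# graph is a single-ntype/single-etype heterograph under the hood.
g = dgl.graph(([0, 1, 2], [1, 2, 3]), num_nodes=4)

# Mutation works directly on the merged class ("Mutation for Heterograph").
g.add_nodes(1)
g.add_edges(3, 4)
g = g.remove_self_loop().add_self_loop()

# batch_num_nodes / batch_num_edges are methods now, not attributes.
bg = dgl.batch([g, g])
print(bg.batch_num_nodes(), bg.batch_num_edges())

# A graph moves to the GPU as a whole, ndata/edata included.
if torch.cuda.is_available():
    bg = bg.to('cuda:0')
```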
......@@ -15,7 +15,7 @@ application on sudoku solving.
## Usage
- To train the RNN for sudoku, run the following
- To train the RRN for sudoku, run the following
```
python3 train_sudoku.py --output_dir out/ --do_train --do_eval
```
......
......@@ -27,13 +27,8 @@ def main(args):
for epoch in range(args.epochs):
model.train()
for i, g in enumerate(train_dataloader):
g.ndata['q'] = g.ndata['q'].to(device)
g.ndata['a'] = g.ndata['a'].to(device)
g.ndata['row'] = g.ndata['row'].to(device)
g.ndata['col'] = g.ndata['col'].to(device)
g = g.to(device)
_, loss = model(g)
opt.zero_grad()
loss.backward()
opt.step()
......@@ -46,11 +41,7 @@ def main(args):
dev_loss = []
dev_res = []
for g in dev_dataloader:
g.ndata['q'] = g.ndata['q'].to(device)
g.ndata['a'] = g.ndata['a'].to(device)
g.ndata['row'] = g.ndata['row'].to(device)
g.ndata['col'] = g.ndata['col'].to(device)
g = g.to(device)
target = g.ndata['a']
target = target.view([-1, 81])
......@@ -85,11 +76,7 @@ def main(args):
test_loss = []
test_res = []
for g in test_dataloader:
g.ndata['q'] = g.ndata['q'].to(device)
g.ndata['a'] = g.ndata['a'].to(device)
g.ndata['row'] = g.ndata['row'].to(device)
g.ndata['col'] = g.ndata['col'].to(device)
g = g.to(device)
target = g.ndata['a']
target = target.view([-1, 81])
......
......@@ -76,7 +76,9 @@ def main(args):
cached=True,
bias=args.bias)
if cuda: model.cuda()
if cuda:
model.cuda()
g = g.to(args.gpu)
loss_fcn = torch.nn.CrossEntropyLoss()
# use optimizer
......
......@@ -31,16 +31,18 @@ def main(args):
# load and preprocess dataset
args.dataset = "reddit-self-loop"
data = load_data(args)
features = torch.FloatTensor(data.features)
labels = torch.LongTensor(data.labels)
if hasattr(torch, 'BoolTensor'):
train_mask = torch.BoolTensor(data.train_mask)
val_mask = torch.BoolTensor(data.val_mask)
test_mask = torch.BoolTensor(data.test_mask)
g = data.graph
if args.gpu < 0:
cuda = False
else:
train_mask = torch.ByteTensor(data.train_mask)
val_mask = torch.ByteTensor(data.val_mask)
test_mask = torch.ByteTensor(data.test_mask)
cuda = True
g = g.to(args.gpu)
features = g.ndata['feat']
labels = g.ndata['label']
train_mask = g.ndata['train_mask']
val_mask = g.ndata['val_mask']
test_mask = g.ndata['test_mask']
in_feats = features.shape[1]
n_classes = data.num_labels
n_edges = data.graph.number_of_edges()
......@@ -51,29 +53,16 @@ def main(args):
#Val samples %d
#Test samples %d""" %
(n_edges, n_classes,
train_mask.int().sum().item(),
val_mask.int().sum().item(),
test_mask.int().sum().item()))
if args.gpu < 0:
cuda = False
else:
cuda = True
torch.cuda.set_device(args.gpu)
features = features.cuda()
labels = labels.cuda()
train_mask = train_mask.cuda()
val_mask = val_mask.cuda()
test_mask = test_mask.cuda()
g.ndata['train_mask'].int().sum().item(),
g.ndata['val_mask'].int().sum().item(),
g.ndata['test_mask'].int().sum().item()))
# graph preprocess and calculate normalization factor
g = DGLGraph(data.graph)
n_edges = g.number_of_edges()
# normalization
degs = g.in_degrees().float()
norm = torch.pow(degs, -0.5)
norm[torch.isinf(norm)] = 0
if cuda: norm = norm.cuda()
g.ndata['norm'] = norm.unsqueeze(1)
# create SGC model
......
......@@ -13,29 +13,27 @@ def load_dataset(name):
val_nid = splitted_idx["valid"]
test_nid = splitted_idx["test"]
g, labels = dataset[0]
labels = labels.squeeze()
n_classes = int(labels.max() - labels.min() + 1)
features = g.ndata.pop("feat").float()
g.ndata['label'] = labels.squeeze()
g.ndata['feat'] = g.ndata['feat'].float()
elif dataset in ["reddit", "cora"]:
if dataset == "reddit":
from dgl.data import RedditDataset
data = RedditDataset(self_loop=True)
g = data.graph
g = data[0]
else:
from dgl.data import CitationGraphDataset
data = CitationGraphDataset('cora')
g = dgl.DGLGraph(data.graph)
train_mask = data.train_mask
val_mask = data.val_mask
test_mask = data.test_mask
features = torch.Tensor(data.features)
labels = torch.LongTensor(data.labels)
g = data[0]
n_classes = data.num_labels
train_nid = torch.LongTensor(np.nonzero(train_mask)[0])
val_nid = torch.LongTensor(np.nonzero(val_mask)[0])
test_nid = torch.LongTensor(np.nonzero(test_mask)[0])
train_mask = g.ndata['train_mask']
val_mask = g.ndata['val_mask']
test_mask = g.ndata['test_mask']
train_nid = torch.LongTensor(train_mask.nonzero().squeeze())
val_nid = torch.LongTensor(val_mask.nonzero().squeeze())
test_nid = torch.LongTensor(test_mask.nonzero().squeeze())
else:
print("Dataset {} is not supported".format(name))
assert(0)
return g, features, labels, n_classes, train_nid, val_nid, test_nid
return g, n_classes, train_nid, val_nid, test_nid
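After this refactor, features, labels, and masks travel with the graph instead of as separate tensors. A minimal sketch of the consuming side, assuming a dataset that follows the `ndata` convention used in the hunk above (`load_dataset` is the function defined there):

```
# Hypothetical consumer: tensors now live in g.ndata.
g, n_classes, train_nid, val_nid, test_nid = load_dataset('reddit')
features = g.ndata['feat']
labels = g.ndata['label']
# Node ids can be recovered from a boolean mask when needed:
train_nid = g.ndata['train_mask'].nonzero().squeeze()
```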
......@@ -61,14 +61,14 @@ class Model(nn.Module):
return out
def calc_weight(g, device):
def calc_weight(g):
"""
Compute row_normalized(D^(-1/2)AD^(-1/2))
"""
with g.local_scope():
# compute D^(-1/2) * D^(-1/2), assuming A is the identity
g.ndata["in_deg"] = g.in_degrees().float().to(device).pow(-0.5)
g.ndata["out_deg"] = g.out_degrees().float().to(device).pow(-0.5)
g.ndata["in_deg"] = g.in_degrees().float().pow(-0.5)
g.ndata["out_deg"] = g.out_degrees().float().pow(-0.5)
g.apply_edges(fn.u_mul_v("out_deg", "in_deg", "weight"))
# row-normalize weight
g.update_all(fn.copy_e("weight", "msg"), fn.sum("msg", "norm"))
......@@ -76,13 +76,13 @@ def calc_weight(g, device):
return g.edata["weight"]
def preprocess(g, features, args, device):
def preprocess(g, features, args):
"""
Pre-compute the average of n-th hop neighbors
"""
with torch.no_grad():
g.edata["weight"] = calc_weight(g, device)
g.ndata["feat_0"] = features.to(device)
g.edata["weight"] = calc_weight(g)
g.ndata["feat_0"] = features
for hop in range(1, args.R + 1):
g.update_all(fn.u_mul_e(f"feat_{hop-1}", "weight", "msg"),
fn.sum("msg", f"feat_{hop}"))
......@@ -94,11 +94,12 @@ def preprocess(g, features, args, device):
def prepare_data(device, args):
data = load_dataset(args.dataset)
g, features, labels, n_classes, train_nid, val_nid, test_nid = data
in_feats = features.shape[1]
feats = preprocess(g, features, args, device)
g, n_classes, train_nid, val_nid, test_nid = data
g = g.to(device)
in_feats = g.ndata['feat'].shape[1]
feats = preprocess(g, g.ndata['feat'], args)
labels = g.ndata['label']
# move to device
labels = labels.to(device)
train_nid = train_nid.to(device)
val_nid = val_nid.to(device)
test_nid = test_nid.to(device)
......
......@@ -8,7 +8,7 @@ Dependencies
- PyTorch 1.1.0+
- sklearn
- dgl
- tables
How to run
......
......@@ -22,7 +22,7 @@ parser.add_argument('--window', type=int, default=144, help='window length')
parser.add_argument('--sensorsfilepath', type=str, default='./data/sensor_graph/graph_sensor_ids.txt', help='sensors file path')
parser.add_argument('--disfilepath', type=str, default='./data/sensor_graph/distances_la_2012.csv', help='distance file path')
parser.add_argument('--tsfilepath', type=str, default='./data/metr-la.h5', help='ts file path')
parser.add_argument('--savemodelpath', type=str, default='./save/stgcnwavemodel.pt', help='save model path')
parser.add_argument('--savemodelpath', type=str, default='stgcnwavemodel.pt', help='save model path')
parser.add_argument('--pred_len', type=int, default=5, help='how many steps away we want to predict')
parser.add_argument('--control_str', type=str, default='TNTSTNTST', help='model structure controller, T: Temporal Layer, S: Spatio Layer, N: Norm Layer')
parser.add_argument('--channels', type=int, nargs='+', default=[1, 16, 32, 64, 32, 128], help='channel size of each layer block')
......@@ -37,8 +37,7 @@ distance_df = pd.read_csv(args.disfilepath, dtype={'from': 'str', 'to': 'str'})
adj_mx = get_adjacency_matrix(distance_df, sensor_ids)
sp_mx = sp.coo_matrix(adj_mx)
G = dgl.DGLGraph()
G.from_scipy_sparse_matrix(sp_mx)
G = dgl.from_scipy(sp_mx)
df = pd.read_hdf(args.tsfilepath)
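The construction change in isolation, as a standalone sketch (`dgl.from_scipy` replaces the two-step `DGLGraph()` + `from_scipy_sparse_matrix` pattern):

```
import dgl
import scipy.sparse as sp

# Toy 2-node adjacency; one call now builds the graph.
sp_mx = sp.coo_matrix(([1.0], ([0], [1])), shape=(2, 2))
G = dgl.from_scipy(sp_mx)
```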
......@@ -91,6 +90,7 @@ test_iter = torch.utils.data.DataLoader(test_data, batch_size)
loss = nn.MSELoss()
G = G.to(device)
model = STGCN_WAVE(blocks, n_his, n_route, G, drop_prob, num_layers, args.control_str).to(device)
optimizer = torch.optim.RMSprop(model.parameters(), lr=lr)
......
......@@ -22,19 +22,20 @@ def evaluate(model, features, labels, mask):
def main(args):
# load and preprocess dataset
data = load_data(args)
features = torch.FloatTensor(data.features)
labels = torch.LongTensor(data.labels)
if hasattr(torch, 'BoolTensor'):
train_mask = torch.BoolTensor(data.train_mask)
val_mask = torch.BoolTensor(data.val_mask)
test_mask = torch.BoolTensor(data.test_mask)
g = data[0]
if args.gpu < 0:
cuda = False
else:
train_mask = torch.ByteTensor(data.train_mask)
val_mask = torch.ByteTensor(data.val_mask)
test_mask = torch.ByteTensor(data.test_mask)
cuda = True
g = g.to(args.gpu)
features = g.ndata['feat']
labels = g.ndata['label']
train_mask = g.ndata['train_mask']
val_mask = g.ndata['val_mask']
test_mask = g.ndata['test_mask']
in_feats = features.shape[1]
n_classes = data.num_labels
n_edges = data.graph.number_of_edges()
n_edges = g.number_of_edges()
print("""----Data statistics------'
#Edges %d
#Classes %d
......@@ -46,24 +47,10 @@ def main(args):
val_mask.int().sum().item(),
test_mask.int().sum().item()))
if args.gpu < 0:
cuda = False
else:
cuda = True
torch.cuda.set_device(args.gpu)
features = features.cuda()
labels = labels.cuda()
train_mask = train_mask.cuda()
val_mask = val_mask.cuda()
test_mask = test_mask.cuda()
# graph preprocess and calculate normalization factor
g = data.graph
# add self loop
if args.self_loop:
g.remove_edges_from(nx.selfloop_edges(g))
g.add_edges_from(zip(g.nodes(), g.nodes()))
g = DGLGraph(g)
g = g.remove_self_loop().add_self_loop()
n_edges = g.number_of_edges()
# create TAGCN model
......
......@@ -9,6 +9,7 @@ The folder contains training module and inferencing module (beam decoder) for Tr
- networkx
- tqdm
- requests
- matplotlib
## Usage
......
......@@ -19,7 +19,7 @@ class GraphPool:
print('start creating graph pool...')
tic = time.time()
self.n, self.m = n, m
g_pool = [[dgl.DGLGraph() for _ in range(m)] for _ in range(n)]
g_pool = [[dgl.graph([]) for _ in range(m)] for _ in range(n)]
num_edges = {
'ee': np.zeros((n, n)).astype(int),
'ed': np.zeros((n, m)).astype(int),
......@@ -103,6 +103,7 @@ class GraphPool:
g.set_n_initializer(dgl.init.zero_initializer)
g.set_e_initializer(dgl.init.zero_initializer)
g = g.to(device).long()
return Graph(g=g,
src=(th.cat(src), th.cat(src_pos)),
......@@ -160,6 +161,7 @@ class GraphPool:
g.set_n_initializer(dgl.init.zero_initializer)
g.set_e_initializer(dgl.init.zero_initializer)
g = g.to(device).long()
return Graph(g=g,
src=(th.cat(src), th.cat(src_pos)),
......
......@@ -170,9 +170,9 @@ class Transformer(nn.Module):
y = y.view(-1)
tgt_embed = self.tgt_embed(y)
g.ndata['x'][nids['dec']] = self.pos_enc.dropout(tgt_embed + tgt_pos)
edges_ed = g.filter_edges(lambda e: (e.dst['pos'] < step) & ~e.dst['mask'] , eids['ed'])
edges_dd = g.filter_edges(lambda e: (e.dst['pos'] < step) & ~e.dst['mask'], eids['dd'])
nodes_d = g.filter_nodes(lambda v: (v.data['pos'] < step) & ~v.data['mask'], nids['dec'])
edges_ed = g.filter_edges(lambda e: (e.dst['pos'] < step) & ~e.dst['mask'].bool(), eids['ed'])
edges_dd = g.filter_edges(lambda e: (e.dst['pos'] < step) & ~e.dst['mask'].bool(), eids['dd'])
nodes_d = g.filter_nodes(lambda v: (v.data['pos'] < step) & ~v.data['mask'].bool(), nids['dec'])
for i in range(self.decoder.N):
pre_func, post_func = self.decoder.pre_func(i, 'qkv'), self.decoder.post_func(i)
nodes, edges = nodes_d, edges_dd
......
......@@ -35,7 +35,7 @@ if __name__ == '__main__':
model.load_state_dict(th.load(f, map_location=lambda storage, loc: storage))
model = model.to(device)
model.eval()
test_iter = dataset(graph_pool, mode='test', batch_size=args.batch, devices=[device], k=k)
test_iter = dataset(graph_pool, mode='test', batch_size=args.batch, device=device, k=k)
for i, g in enumerate(test_iter):
with th.no_grad():
output = model.infer(g, dataset.MAX_LENGTH, dataset.eos_id, k, alpha=0.6)
......
......@@ -3,7 +3,6 @@ from loss import *
from optims import *
from dataset import *
from modules.config import *
#from modules.viz import *
import numpy as np
import argparse
import torch
......
......@@ -77,14 +77,14 @@ def main(args):
t_epoch = time.time()
model.train()
for step, batch in enumerate(train_loader):
g = batch.graph
g = batch.graph.to(device)
n = g.number_of_nodes()
h = th.zeros((n, args.h_size)).to(device)
c = th.zeros((n, args.h_size)).to(device)
if step >= 3:
t0 = time.time() # tik
logits = model(batch, h, c)
logits = model(batch, g, h, c)
logp = F.log_softmax(logits, 1)
loss = F.nll_loss(logp, batch.label, reduction='sum')
......@@ -98,7 +98,7 @@ def main(args):
if step > 0 and step % args.log_every == 0:
pred = th.argmax(logits, 1)
acc = th.sum(th.eq(batch.label, pred))
root_ids = [i for i in range(batch.graph.number_of_nodes()) if batch.graph.out_degree(i)==0]
root_ids = [i for i in range(g.number_of_nodes()) if g.out_degree(i)==0]
root_acc = np.sum(batch.label.cpu().data.numpy()[root_ids] == pred.cpu().data.numpy()[root_ids])
print("Epoch {:05d} | Step {:05d} | Loss {:.4f} | Acc {:.4f} | Root Acc {:.4f} | Time(s) {:.4f}".format(
......@@ -110,17 +110,17 @@ def main(args):
root_accs = []
model.eval()
for step, batch in enumerate(dev_loader):
g = batch.graph
g = batch.graph.to(device)
n = g.number_of_nodes()
with th.no_grad():
h = th.zeros((n, args.h_size)).to(device)
c = th.zeros((n, args.h_size)).to(device)
logits = model(batch, h, c)
logits = model(batch, g, h, c)
pred = th.argmax(logits, 1)
acc = th.sum(th.eq(batch.label, pred)).item()
accs.append([acc, len(batch.label)])
root_ids = [i for i in range(batch.graph.number_of_nodes()) if batch.graph.out_degree(i)==0]
root_ids = [i for i in range(g.number_of_nodes()) if g.out_degree(i)==0]
root_acc = np.sum(batch.label.cpu().data.numpy()[root_ids] == pred.cpu().data.numpy()[root_ids])
root_accs.append([root_acc, len(root_ids)])
......@@ -148,17 +148,17 @@ def main(args):
root_accs = []
model.eval()
for step, batch in enumerate(test_loader):
g = batch.graph
g = batch.graph.to(device)
n = g.number_of_nodes()
with th.no_grad():
h = th.zeros((n, args.h_size)).to(device)
c = th.zeros((n, args.h_size)).to(device)
logits = model(batch, h, c)
logits = model(batch, g, h, c)
pred = th.argmax(logits, 1)
acc = th.sum(th.eq(batch.label, pred)).item()
accs.append([acc, len(batch.label)])
root_ids = [i for i in range(batch.graph.number_of_nodes()) if batch.graph.out_degree(i)==0]
root_ids = [i for i in range(g.number_of_nodes()) if g.out_degree(i)==0]
root_acc = np.sum(batch.label.cpu().data.numpy()[root_ids] == pred.cpu().data.numpy()[root_ids])
root_accs.append([root_acc, len(root_ids)])
......@@ -172,7 +172,7 @@ if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=int, default=-1)
parser.add_argument('--seed', type=int, default=41)
parser.add_argument('--batch-size', type=int, default=25)
parser.add_argument('--batch-size', type=int, default=20)
parser.add_argument('--child-sum', action='store_true')
parser.add_argument('--x-size', type=int, default=300)
parser.add_argument('--h-size', type=int, default=150)
......
......@@ -82,12 +82,14 @@ class TreeLSTM(nn.Module):
cell = TreeLSTMCell if cell_type == 'nary' else ChildSumTreeLSTMCell
self.cell = cell(x_size, h_size)
def forward(self, batch, h, c):
def forward(self, batch, g, h, c):
"""Compute tree-lstm prediction given a batch.
Parameters
----------
batch : dgl.data.SSTBatch
The data batch.
g : dgl.DGLGraph
Tree for computation.
h : Tensor
Initial hidden state.
c : Tensor
......@@ -97,17 +99,13 @@ class TreeLSTM(nn.Module):
logits : Tensor
The prediction of each node.
"""
g = batch.graph
g.register_message_func(self.cell.message_func)
g.register_reduce_func(self.cell.reduce_func)
g.register_apply_node_func(self.cell.apply_node_func)
# feed embedding
embeds = self.embedding(batch.wordid * batch.mask)
g.ndata['iou'] = self.cell.W_iou(self.dropout(embeds)) * batch.mask.float().unsqueeze(-1)
g.ndata['h'] = h
g.ndata['c'] = c
# propagate
dgl.prop_nodes_topo(g)
dgl.prop_nodes_topo(g, self.cell.message_func, self.cell.reduce_func, apply_node_func=self.cell.apply_node_func)
# compute logits
h = self.dropout(g.ndata.pop('h'))
logits = self.linear(h)
......
......@@ -66,7 +66,7 @@ def main(args):
g = data.graph
# add self loop
g.remove_edges_from(nx.selfloop_edges(g))
g = DGLGraph(g)
g = DGLGraph(g).to(device)
g.add_edges(g.nodes(), g.nodes())
n_edges = g.number_of_edges()
# create model
......
......@@ -51,7 +51,7 @@ def main(args):
if args.self_loop:
g.remove_edges_from(nx.selfloop_edges(g))
g.add_edges_from(zip(g.nodes(), g.nodes()))
g = DGLGraph(g)
g = DGLGraph(g).to(device)
n_edges = g.number_of_edges()
# normalization
degs = tf.cast(tf.identity(g.in_degrees()), dtype=tf.float32)
......
......@@ -24,8 +24,9 @@ namespace aten {
//////////////////////////////////////////////////////////////////////
/*! \return A special array to represent null. */
inline NDArray NullArray() {
return NDArray::Empty({0}, DLDataType{kDLInt, 64, 1}, DLContext{kDLCPU, 0});
inline NDArray NullArray(const DLDataType& dtype = DLDataType{kDLInt, 64, 1},
const DLContext& ctx = DLContext{kDLCPU, 0}) {
return NDArray::Empty({0}, dtype, ctx);
}
/*!
......@@ -150,6 +151,8 @@ NDArray IndexSelect(NDArray array, int64_t start, int64_t end);
/*!
* \brief Permute the elements of an array according to given indices.
*
* Only supports 1D arrays.
*
* Equivalent to:
*
* <code>
......@@ -159,6 +162,17 @@ NDArray IndexSelect(NDArray array, int64_t start, int64_t end);
*/
NDArray Scatter(NDArray array, IdArray indices);
/*!
* \brief Scatter data into the output array.
*
* Equivalent to:
*
* <code>
* out[index] = value
* </code>
*/
void Scatter_(IdArray index, NDArray value, NDArray out);
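To make the in-place semantics concrete, a NumPy-style illustration of what `Scatter_` computes (intuition only, not the C++ API):

```
import numpy as np

out = np.zeros(5, dtype=np.int64)
index = np.array([0, 2, 4])
value = np.array([10, 20, 30])
out[index] = value   # Scatter_(index, value, out)
# out -> [10, 0, 20, 0, 30]
```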
/*!
* \brief Repeat each element a number of times. Equivalent to np.repeat(array, repeats)
* \param array A 1D vector
......@@ -280,6 +294,16 @@ std::pair<NDArray, IdArray> ConcatSlices(NDArray array, IdArray lengths);
*/
IdArray CumSum(IdArray array, bool prepend_zero = false);
/*!
* \brief Return the indices of the nonzero values.
*
* Only supports 1D arrays. The returned index array is in int64.
*
* \param array The input array.
* \return A 1D index array storing the positions of the nonzero values.
*/
IdArray NonZero(NDArray array);
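Again in NumPy terms, for intuition only:

```
import numpy as np

array = np.array([0, 5, 0, 7])
idx = np.nonzero(array)[0]   # positions of the nonzero values (aten::NonZero returns int64)
# idx -> [1, 3]
```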
/*!
* \brief Return a string that prints out some debug information.
*/
......
......@@ -163,15 +163,45 @@ inline bool COOHasData(COOMatrix csr) {
*/
std::pair<bool, bool> COOIsSorted(COOMatrix coo);
/*! \brief Get data. The return type is an ndarray due to possible duplicate entries. */
runtime::NDArray COOGetData(COOMatrix , int64_t row, int64_t col);
/*!
* \brief Get the data and the (row, col) indices for each returned entry.
*
* The operator supports matrices with duplicate entries, and all matched
* entries are returned. It assumes there are no duplicate (row, col) pairs
* in the given input; otherwise, the returned result is undefined.
*
* \note This operator allows broadcasting (i.e., either row or col can be of length 1).
* \param mat Sparse matrix
* \param rows Row index
* \param cols Column index
* \return Three arrays {rows, cols, data}
*/
std::vector<runtime::NDArray> COOGetDataAndIndices(
COOMatrix , runtime::NDArray rows, runtime::NDArray cols);
COOMatrix mat, runtime::NDArray rows, runtime::NDArray cols);
/*! \brief Get data. The return type is an ndarray due to possible duplicate entries. */
inline runtime::NDArray COOGetAllData(COOMatrix mat, int64_t row, int64_t col) {
IdArray rows = VecToIdArray<int64_t>({row}, mat.row->dtype.bits, mat.row->ctx);
IdArray cols = VecToIdArray<int64_t>({col}, mat.row->dtype.bits, mat.row->ctx);
const auto& rst = COOGetDataAndIndices(mat, rows, cols);
return rst[2];
}
/*!
* \brief Get the data for each (row, col) pair.
*
* The operator supports matrices with duplicate entries, but only one matched
* entry is returned for each (row, col) pair. Duplicate input (row, col)
* pairs are supported.
*
* \note This operator allows broadcasting (i.e., either row or col can be of length 1).
*
* \param mat Sparse matrix.
* \param rows Row index.
* \param cols Column index.
* \return Data array. The i^th element is the data of (rows[i], cols[i])
*/
runtime::NDArray COOGetData(COOMatrix mat, runtime::NDArray rows, runtime::NDArray cols);
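A NumPy sketch of the lookup semantics, including the length-1 broadcasting both comments mention (illustrative only):

```
import numpy as np

# Toy COO matrix: data ids attached to (row, col) entries.
row = np.array([0, 0, 1])
col = np.array([1, 2, 2])
data = np.array([10, 11, 12])

# COOGetData-style lookup: one matched id per query pair; the length-1
# cols query broadcasts against rows.
rows_q = np.array([0, 1])
cols_q = np.broadcast_to(np.array([2]), rows_q.shape)
hits = [data[(row == r) & (col == c)][0] for r, c in zip(rows_q, cols_q)]
# hits -> [11, 12]
```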
/*! \brief Return a transposed COO matrix */
COOMatrix COOTranspose(COOMatrix coo);
......