Unverified commit 25ac3344, authored by Da Zheng, committed by GitHub

[Distributed] Heterogeneous graph support (#2457)



* Distributed heterograph (#3)

* heterogeneous graph partition.

* fix graph partition book for heterograph.

* load heterograph partitions.

* update DistGraphServer to support heterograph.

* make DistGraph runnable for heterograph.

* partition a graph and store parts with homogeneous graph structure.

* update DistGraph server&client to use homogeneous graph.

* shuffle node Ids based on node types.

* load mag in heterograph.

* fix per-node-type mapping.

* balance node types.

* fix for homogeneous graph

* store etype for now.

* fix data name.

* fix a bug in example.

* add profiler in rgcn.

* heterogeneous RGCN.

* map homogeneous node ids to hetero node ids.

* fix graph partition book.

* fix DistGraph.

* shuffle eids.

* verify eids and their mappings when loading a partition.

* Id map from homogeneous Ids to per-type Ids.

* verify partitioned results.

* add test for distributed sampler.

* add mapping from per-type Ids to homogeneous Ids.

* update example.

* fix DistGraph.

* Revert "add profiler in rgcn."

This reverts commit 36daaed8b660933dac8f61a39faec3da2467d676.

* add tests for homogeneous graphs.

* fix a bug.

* fix test.

* fix for one partition.

* fix for standalone training and evaluation.

* small fix.

* fix two bugs.

* initialize projection matrix.

* small fix on RGCN.

* Fix rgcn performance (#17)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>

* fix lint.

* fix lint.

* fix lint.

* fix lint.

* fix lint.

* fix lint.

* fix.

* fix test.

* fix lint.

* test partitions.

* remove redundant test for partitioning.

* remove commented code.

* fix partition.

* fix tests.

* fix RGCN.

* fix test.

* fix test.

* fix test.

* fix.

* fix a bug.

* update dmlc-core.

* fix.

* fix rgcn.

* update readme.

* add comments.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>

* fix.

* fix.

* add div_int.

* fix.

* fix.

* fix lint.

* fix.

* fix.

* fix.

* adjust.

* move code.

* handle heterograph.

* return pytorch tensor in GPB.

* remove some tests in example.

* add to_block for distributed training.

* use distributed to_block.

* remove unnecessary function in DistGraph.

* remove distributed to_block.

* use pytorch tensor.

* fix a bug in ntypes and etypes.

* enable norm.

* make the data loader compatible with the old format.

* fix.

* add comments.

* fix a bug.

* add test for heterograph.

* support partition without reshuffle.

* add test.

* support partition without reshuffle.

* fix.

* add test.

* fix bugs.

* fix lint.

* fix dataset.

* fix for mxnet.

* update docstring.

* rename to floor_div

* avoid exposing NodePartitionPolicy and EdgePartitionPolicy.

* fix docstring.

* fix error.

* fixes.

* fix comments.

* rename.

* rename.

* explain IdMap.

* fix docstring.

* fix docstring.

* update docstring.

* remove the code of returning heterograph.

* remove argument.

* fix example.

* make GraphPartitionBook an abstract class.

* fix.

* fix.

* fix a bug.

* fix a bug in example

* fix a bug

* reverse heterograph sampling.

* temp fix.

* fix lint.

* Revert "temp fix."

This reverts commit c450717b9f578b8c48769c675f2a19d6c1e64381.

* compute norm.

* Revert "reverse heterograph sampling."

This reverts commit bd6deb7f52998de76508f800441ff518e2fadcb9.

* fix.

* move id_map.py

* remove check

* add more comments.

* update docstring.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
parent aa884d43
...@@ -39,6 +39,8 @@ the number of nodes, the number of edges and the number of labelled nodes. ...@@ -39,6 +39,8 @@ the number of nodes, the number of edges and the number of labelled nodes.
python3 partition_graph.py --dataset ogb-product --num_parts 4 --balance_train --balance_edges python3 partition_graph.py --dataset ogb-product --num_parts 4 --balance_train --balance_edges
``` ```
This script generates partitioned graphs and stores them in a directory called `data`.
### Step 2: copy the partitioned data and files to the cluster ### Step 2: copy the partitioned data and files to the cluster
DGL provides a script for copying partitioned data and files to the cluster. Before that, copy the training script to a local folder: DGL provides a script for copying partitioned data and files to the cluster. Before that, copy the training script to a local folder:
......
## Distributed training ## Distributed training
This is an example of training RGCN node classification in a distributed fashion. Currently, the example only supports training RGCN on graphs with no input features. The current implementation follows ../rgcn/entity_classify_mp.py. This is an example of training RGCN node classification in a distributed fashion. Currently, the example trains RGCN on graphs with input node features. The current implementation follows ../rgcn/entity_classify_mp.py.
Before training, please install some python libs by pip: Before training, please install some python libs by pip:
...@@ -36,6 +36,8 @@ the number of nodes, the number of edges and the number of labelled nodes. ...@@ -36,6 +36,8 @@ the number of nodes, the number of edges and the number of labelled nodes.
python3 partition_graph.py --dataset ogbn-mag --num_parts 4 --balance_train --balance_edges python3 partition_graph.py --dataset ogbn-mag --num_parts 4 --balance_train --balance_edges
``` ```
This script generates partitioned graphs and stores them in a directory called `data`.
### Step 2: copy the partitioned data to the cluster ### Step 2: copy the partitioned data to the cluster
DGL provides a script for copying partitioned data to the cluster. Before that, copy the training script to a local folder: DGL provides a script for copying partitioned data to the cluster. Before that, copy the training script to a local folder:
...@@ -78,7 +80,7 @@ python3 ~/dgl/tools/launch.py \ ...@@ -78,7 +80,7 @@ python3 ~/dgl/tools/launch.py \
--num_samplers 4 \ --num_samplers 4 \
--part_config data/ogbn-mag.json \ --part_config data/ogbn-mag.json \
--ip_config ip_config.txt \ --ip_config ip_config.txt \
"python3 dgl_code/entity_classify_dist.py --graph-name ogbn-mag --dataset ogbn-mag --fanout='25,25' --batch-size 512 --n-hidden 64 --lr 0.01 --eval-batch-size 16 --low-mem --dropout 0.5 --use-self-loop --n-bases 2 --n-epochs 3 --layer-norm --ip-config ip_config.txt --num-workers 4 --num-servers 1 --sparse-embedding --sparse-lr 0.06" "python3 dgl_code/entity_classify_dist.py --graph-name ogbn-mag --dataset ogbn-mag --fanout='25,25' --batch-size 512 --n-hidden 64 --lr 0.01 --eval-batch-size 16 --low-mem --dropout 0.5 --use-self-loop --n-bases 2 --n-epochs 3 --layer-norm --ip-config ip_config.txt --num-workers 4 --num-servers 1 --sparse-embedding --sparse-lr 0.06 --node-feats"
``` ```
We can get the performance score at the second epoch: We can get the performance score at the second epoch:
...@@ -98,5 +100,5 @@ python3 partition_graph.py --dataset ogbn-mag --num_parts 1 ...@@ -98,5 +100,5 @@ python3 partition_graph.py --dataset ogbn-mag --num_parts 1
### Step 2: run the training script ### Step 2: run the training script
```bash ```bash
python3 entity_classify_dist.py --graph-name ogbn-mag --dataset ogbn-mag --fanout='25,25' --batch-size 256 --n-hidden 64 --lr 0.01 --eval-batch-size 8 --low-mem --dropout 0.5 --use-self-loop --n-bases 2 --n-epochs 3 --layer-norm --ip-config ip_config.txt --conf-path 'data/ogbn-mag.json' --standalone python3 entity_classify_dist.py --graph-name ogbn-mag --dataset ogbn-mag --fanout='25,25' --batch-size 512 --n-hidden 64 --lr 0.01 --eval-batch-size 128 --low-mem --dropout 0.5 --use-self-loop --n-bases 2 --n-epochs 3 --layer-norm --ip-config ip_config.txt --conf-path 'data/ogbn-mag.json' --standalone --sparse-embedding --sparse-lr 0.06 --node-feats
``` ```
...@@ -6,7 +6,7 @@ import time ...@@ -6,7 +6,7 @@ import time
from ogb.nodeproppred import DglNodePropPredDataset from ogb.nodeproppred import DglNodePropPredDataset
def load_ogb(dataset, global_norm): def load_ogb(dataset):
if dataset == 'ogbn-mag': if dataset == 'ogbn-mag':
dataset = DglNodePropPredDataset(name=dataset) dataset = DglNodePropPredDataset(name=dataset)
split_idx = dataset.get_idx_split() split_idx = dataset.get_idx_split()
...@@ -33,54 +33,24 @@ def load_ogb(dataset, global_norm): ...@@ -33,54 +33,24 @@ def load_ogb(dataset, global_norm):
print('Number of valid: {}'.format(len(val_idx))) print('Number of valid: {}'.format(len(val_idx)))
print('Number of test: {}'.format(len(test_idx))) print('Number of test: {}'.format(len(test_idx)))
# currently we do not support node feature in mag dataset.
# calculate norm for each edge type and store in edge
if global_norm is False:
for canonical_etype in hg.canonical_etypes:
u, v, eid = hg.all_edges(form='all', etype=canonical_etype)
_, inverse_index, count = th.unique(v, return_inverse=True, return_counts=True)
degrees = count[inverse_index]
norm = th.ones(eid.shape[0]) / degrees
norm = norm.unsqueeze(1)
hg.edges[canonical_etype].data['norm'] = norm
# get target category id # get target category id
category_id = len(hg.ntypes) category_id = len(hg.ntypes)
for i, ntype in enumerate(hg.ntypes): for i, ntype in enumerate(hg.ntypes):
if ntype == category: if ntype == category:
category_id = i category_id = i
g = dgl.to_homogeneous(hg, edata=['norm']) train_mask = th.zeros((hg.number_of_nodes('paper'),), dtype=th.bool)
if global_norm:
u, v, eid = g.all_edges(form='all')
_, inverse_index, count = th.unique(v, return_inverse=True, return_counts=True)
degrees = count[inverse_index]
norm = th.ones(eid.shape[0]) / degrees
norm = norm.unsqueeze(1)
g.edata['norm'] = norm
node_ids = th.arange(g.number_of_nodes())
# find out the target node ids
node_tids = g.ndata[dgl.NTYPE]
loc = (node_tids == category_id)
target_idx = node_ids[loc]
train_idx = target_idx[train_idx]
val_idx = target_idx[val_idx]
test_idx = target_idx[test_idx]
train_mask = th.zeros((g.number_of_nodes(),), dtype=th.bool)
train_mask[train_idx] = True train_mask[train_idx] = True
val_mask = th.zeros((g.number_of_nodes(),), dtype=th.bool) val_mask = th.zeros((hg.number_of_nodes('paper'),), dtype=th.bool)
val_mask[val_idx] = True val_mask[val_idx] = True
test_mask = th.zeros((g.number_of_nodes(),), dtype=th.bool) test_mask = th.zeros((hg.number_of_nodes('paper'),), dtype=th.bool)
test_mask[test_idx] = True test_mask[test_idx] = True
g.ndata['train_mask'] = train_mask hg.nodes['paper'].data['train_mask'] = train_mask
g.ndata['val_mask'] = val_mask hg.nodes['paper'].data['val_mask'] = val_mask
g.ndata['test_mask'] = test_mask hg.nodes['paper'].data['test_mask'] = test_mask
labels = th.full((g.number_of_nodes(),), -1, dtype=paper_labels.dtype) hg.nodes['paper'].data['labels'] = paper_labels
labels[target_idx] = paper_labels return hg
g.ndata['labels'] = labels
return g
else: else:
raise RuntimeError("Do not support other ogbn datasets.") raise RuntimeError("Do not support other ogbn datasets.")
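The per-edge normalization that the removed `load_ogb` code computed (`1 / in-degree` of each edge's destination, via `th.unique`) can be sketched in plain NumPy; the edge list below is a made-up example:

```python
import numpy as np

# Sketch of the removed norm computation: for each edge,
# norm = 1 / in-degree of its destination node.
dst = np.array([1, 1, 2, 3, 3, 3])  # destination node of each edge

# unique() yields, for every edge, the multiplicity of its destination
_, inverse_index, count = np.unique(dst, return_inverse=True, return_counts=True)
degrees = count[inverse_index]      # in-degree per edge: [2, 2, 1, 3, 3, 3]
norm = 1.0 / degrees                # per-edge normalization weight
```

The `count[inverse_index]` gather is the same trick the removed code used with `th.unique(..., return_inverse=True, return_counts=True)`.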
...@@ -98,21 +68,19 @@ if __name__ == '__main__': ...@@ -98,21 +68,19 @@ if __name__ == '__main__':
help='turn the graph into an undirected graph.') help='turn the graph into an undirected graph.')
argparser.add_argument('--balance_edges', action='store_true', argparser.add_argument('--balance_edges', action='store_true',
help='balance the number of edges in each partition.') help='balance the number of edges in each partition.')
argparser.add_argument('--global-norm', default=False, action='store_true',
help='User global norm instead of per node type norm')
args = argparser.parse_args() args = argparser.parse_args()
start = time.time() start = time.time()
g = load_ogb(args.dataset, args.global_norm) g = load_ogb(args.dataset)
print('load {} takes {:.3f} seconds'.format(args.dataset, time.time() - start)) print('load {} takes {:.3f} seconds'.format(args.dataset, time.time() - start))
print('|V|={}, |E|={}'.format(g.number_of_nodes(), g.number_of_edges())) print('|V|={}, |E|={}'.format(g.number_of_nodes(), g.number_of_edges()))
print('train: {}, valid: {}, test: {}'.format(th.sum(g.ndata['train_mask']), print('train: {}, valid: {}, test: {}'.format(th.sum(g.nodes['paper'].data['train_mask']),
th.sum(g.ndata['val_mask']), th.sum(g.nodes['paper'].data['val_mask']),
th.sum(g.ndata['test_mask']))) th.sum(g.nodes['paper'].data['test_mask'])))
if args.balance_train: if args.balance_train:
balance_ntypes = g.ndata['train_mask'] balance_ntypes = {'paper': g.nodes['paper'].data['train_mask']}
else: else:
balance_ntypes = None balance_ntypes = None
......
...@@ -355,6 +355,22 @@ def sum(input, dim, keepdims=False): ...@@ -355,6 +355,22 @@ def sum(input, dim, keepdims=False):
""" """
pass pass
def floor_div(in1, in2):
"""Element-wise integer division, rounding each quotient down towards negative infinity (floor).
Parameters
----------
in1 : Tensor
The dividend tensor
in2 : Tensor or integer
The divisor
Returns
-------
Tensor
A framework-specific tensor.
"""
pass
def reduce_sum(input): def reduce_sum(input):
"""Returns the sum of all elements in the input tensor. """Returns the sum of all elements in the input tensor.
......
...@@ -149,6 +149,9 @@ def sum(input, dim, keepdims=False): ...@@ -149,6 +149,9 @@ def sum(input, dim, keepdims=False):
return nd.array([0.], dtype=input.dtype, ctx=input.context) return nd.array([0.], dtype=input.dtype, ctx=input.context)
return nd.sum(input, axis=dim, keepdims=keepdims) return nd.sum(input, axis=dim, keepdims=keepdims)
def floor_div(in1, in2):
return in1 / in2
def reduce_sum(input): def reduce_sum(input):
return input.sum() return input.sum()
......
...@@ -117,6 +117,9 @@ def copy_to(input, ctx, **kwargs): ...@@ -117,6 +117,9 @@ def copy_to(input, ctx, **kwargs):
def sum(input, dim, keepdims=False): def sum(input, dim, keepdims=False):
return th.sum(input, dim=dim, keepdim=keepdims) return th.sum(input, dim=dim, keepdim=keepdims)
def floor_div(in1, in2):
return in1 // in2
def reduce_sum(input): def reduce_sum(input):
return input.sum() return input.sum()
......
...@@ -168,6 +168,8 @@ def sum(input, dim, keepdims=False): ...@@ -168,6 +168,8 @@ def sum(input, dim, keepdims=False):
input = tf.cast(input, tf.int32) input = tf.cast(input, tf.int32)
return tf.reduce_sum(input, axis=dim, keepdims=keepdims) return tf.reduce_sum(input, axis=dim, keepdims=keepdims)
def floor_div(in1, in2):
return astype(in1 / in2, dtype(in1))
def reduce_sum(input): def reduce_sum(input):
if input.dtype == tf.bool: if input.dtype == tf.bool:
......
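As a sanity check on the backend implementations above, Python's `//` operator (used by the PyTorch backend) floors the quotient, i.e. rounds towards negative infinity:

```python
# Minimal sketch of the semantics the PyTorch backend implements.
def floor_div(in1, in2):
    # Python's // floors the quotient (rounds towards negative infinity)
    return in1 // in2
```

Note that the TensorFlow variant casts the true quotient back to an integer, and casting truncates towards zero, so the backends may disagree for negative operands.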
...@@ -184,9 +184,9 @@ class CitationGraphDataset(DGLBuiltinDataset): ...@@ -184,9 +184,9 @@ class CitationGraphDataset(DGLBuiltinDataset):
self._graph = nx.DiGraph(graph) self._graph = nx.DiGraph(graph)
self._num_classes = info['num_classes'] self._num_classes = info['num_classes']
self._g.ndata['train_mask'] = generate_mask_tensor(self._g.ndata['train_mask'].numpy()) self._g.ndata['train_mask'] = generate_mask_tensor(F.asnumpy(self._g.ndata['train_mask']))
self._g.ndata['val_mask'] = generate_mask_tensor(self._g.ndata['val_mask'].numpy()) self._g.ndata['val_mask'] = generate_mask_tensor(F.asnumpy(self._g.ndata['val_mask']))
self._g.ndata['test_mask'] = generate_mask_tensor(self._g.ndata['test_mask'].numpy()) self._g.ndata['test_mask'] = generate_mask_tensor(F.asnumpy(self._g.ndata['test_mask']))
# hack for mxnet compatability # hack for mxnet compatability
if self.verbose: if self.verbose:
......
...@@ -133,7 +133,7 @@ class DistDataLoader: ...@@ -133,7 +133,7 @@ class DistDataLoader:
if not self.drop_last and len(dataset) % self.batch_size != 0: if not self.drop_last and len(dataset) % self.batch_size != 0:
self.expected_idxs += 1 self.expected_idxs += 1
# We need to have a unique Id for each data loader to identify itself # We need to have a unique ID for each data loader to identify itself
# in the sampler processes. # in the sampler processes.
global DATALOADER_ID global DATALOADER_ID
self.name = "dataloader-" + str(DATALOADER_ID) self.name = "dataloader-" + str(DATALOADER_ID)
......
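The batch-count logic in `DistDataLoader` above can be sketched as a small helper (the function name is hypothetical):

```python
# Sketch of DistDataLoader's expected_idxs computation.
def expected_num_batches(num_samples, batch_size, drop_last):
    n = num_samples // batch_size
    if not drop_last and num_samples % batch_size != 0:
        n += 1  # a final partial batch is produced
    return n
```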
...@@ -8,17 +8,10 @@ from .role import get_role ...@@ -8,17 +8,10 @@ from .role import get_role
from .. import utils from .. import utils
from .. import backend as F from .. import backend as F
def _get_data_name(name, part_policy):
''' This is to get the name of data in the kvstore.
KVStore doesn't understand node data or edge data. We'll use a prefix to distinguish them.
'''
return part_policy + ':' + name
def _default_init_data(shape, dtype): def _default_init_data(shape, dtype):
return F.zeros(shape, dtype, F.cpu()) return F.zeros(shape, dtype, F.cpu())
# These Ids can identify the anonymous distributed tensors. # These IDs can identify the anonymous distributed tensors.
DIST_TENSOR_ID = 0 DIST_TENSOR_ID = 0
class DistTensor: class DistTensor:
...@@ -144,10 +137,12 @@ class DistTensor: ...@@ -144,10 +137,12 @@ class DistTensor:
assert not persistent, 'We cannot generate anonymous persistent distributed tensors' assert not persistent, 'We cannot generate anonymous persistent distributed tensors'
global DIST_TENSOR_ID global DIST_TENSOR_ID
# All processes of the same role should create DistTensor synchronously. # All processes of the same role should create DistTensor synchronously.
# Thus, all of them should have the same Ids. # Thus, all of them should have the same IDs.
name = 'anonymous-' + get_role() + '-' + str(DIST_TENSOR_ID) name = 'anonymous-' + get_role() + '-' + str(DIST_TENSOR_ID)
DIST_TENSOR_ID += 1 DIST_TENSOR_ID += 1
self._name = _get_data_name(name, part_policy.policy_str) assert isinstance(name, str), 'name {} is type {}'.format(name, type(name))
data_name = part_policy.get_data_name(name)
self._name = str(data_name)
self._persistent = persistent self._persistent = persistent
if self._name not in exist_names: if self._name not in exist_names:
self.kvstore.init_data(self._name, shape, dtype, part_policy, init_func) self.kvstore.init_data(self._name, shape, dtype, part_policy, init_func)
......
...@@ -47,10 +47,10 @@ class FindEdgeResponse(Response): ...@@ -47,10 +47,10 @@ class FindEdgeResponse(Response):
def _sample_neighbors(local_g, partition_book, seed_nodes, fan_out, edge_dir, prob, replace): def _sample_neighbors(local_g, partition_book, seed_nodes, fan_out, edge_dir, prob, replace):
""" Sample from local partition. """ Sample from local partition.
The input nodes use global Ids. We need to map the global node Ids to local node Ids, The input nodes use global IDs. We need to map the global node IDs to local node IDs,
perform sampling and map the sampled results to the global Ids space again. perform sampling and map the sampled results to the global IDs space again.
The sampled results are stored in three vectors that store source nodes, destination nodes The sampled results are stored in three vectors that store source nodes, destination nodes
and edge Ids. and edge IDs.
""" """
local_ids = partition_book.nid2localnid(seed_nodes, partition_book.partid) local_ids = partition_book.nid2localnid(seed_nodes, partition_book.partid)
local_ids = F.astype(local_ids, local_g.idtype) local_ids = F.astype(local_ids, local_g.idtype)
...@@ -59,7 +59,8 @@ def _sample_neighbors(local_g, partition_book, seed_nodes, fan_out, edge_dir, pr ...@@ -59,7 +59,8 @@ def _sample_neighbors(local_g, partition_book, seed_nodes, fan_out, edge_dir, pr
local_g, local_ids, fan_out, edge_dir, prob, replace, _dist_training=True) local_g, local_ids, fan_out, edge_dir, prob, replace, _dist_training=True)
global_nid_mapping = local_g.ndata[NID] global_nid_mapping = local_g.ndata[NID]
src, dst = sampled_graph.edges() src, dst = sampled_graph.edges()
global_src, global_dst = global_nid_mapping[src], global_nid_mapping[dst] global_src, global_dst = F.gather_row(global_nid_mapping, src), \
F.gather_row(global_nid_mapping, dst)
global_eids = F.gather_row(local_g.edata[EID], sampled_graph.edata[EID]) global_eids = F.gather_row(local_g.edata[EID], sampled_graph.edata[EID])
return global_src, global_dst, global_eids return global_src, global_dst, global_eids
...@@ -78,10 +79,10 @@ def _find_edges(local_g, partition_book, seed_edges): ...@@ -78,10 +79,10 @@ def _find_edges(local_g, partition_book, seed_edges):
def _in_subgraph(local_g, partition_book, seed_nodes): def _in_subgraph(local_g, partition_book, seed_nodes):
""" Get in subgraph from local partition. """ Get in subgraph from local partition.
The input nodes use global Ids. We need to map the global node Ids to local node Ids, The input nodes use global IDs. We need to map the global node IDs to local node IDs,
get in-subgraph and map the sampled results to the global Ids space again. get in-subgraph and map the sampled results to the global IDs space again.
The results are stored in three vectors that store source nodes, destination nodes The results are stored in three vectors that store source nodes, destination nodes
and edge Ids. and edge IDs.
""" """
local_ids = partition_book.nid2localnid(seed_nodes, partition_book.partid) local_ids = partition_book.nid2localnid(seed_nodes, partition_book.partid)
local_ids = F.astype(local_ids, local_g.idtype) local_ids = F.astype(local_ids, local_g.idtype)
...@@ -254,7 +255,19 @@ def sample_neighbors(g, nodes, fanout, edge_dir='in', prob=None, replace=False): ...@@ -254,7 +255,19 @@ def sample_neighbors(g, nodes, fanout, edge_dir='in', prob=None, replace=False):
Node/edge features are not preserved. The original IDs of Node/edge features are not preserved. The original IDs of
the sampled edges are stored as the `dgl.EID` feature in the returned graph. the sampled edges are stored as the `dgl.EID` feature in the returned graph.
For now, we only support the input graph with one node type and one edge type. This version provides an experimental support for heterogeneous graphs.
When the input graph is heterogeneous, the sampled subgraph is still stored in
the homogeneous graph format. That is, all nodes and edges are assigned with
unique IDs (in contrast, we typically use a type name and a node/edge ID to
identify a node or an edge in ``DGLGraph``). We refer to this type of IDs
as *homogeneous ID*.
Users can use :func:`dgl.distributed.GraphPartitionBook.map_to_per_ntype`
and :func:`dgl.distributed.GraphPartitionBook.map_to_per_etype`
to identify their node/edge types and node/edge IDs of that type.
For heterogeneous graphs, ``nodes`` can be a dictionary whose key is node type
and the value is type-specific node IDs; ``nodes`` can also be a tensor of
*homogeneous ID*.
Parameters Parameters
---------- ----------
...@@ -292,9 +305,17 @@ def sample_neighbors(g, nodes, fanout, edge_dir='in', prob=None, replace=False): ...@@ -292,9 +305,17 @@ def sample_neighbors(g, nodes, fanout, edge_dir='in', prob=None, replace=False):
DGLGraph DGLGraph
A sampled subgraph containing only the sampled neighboring edges. It is on CPU. A sampled subgraph containing only the sampled neighboring edges. It is on CPU.
""" """
gpb = g.get_partition_book()
if isinstance(nodes, dict): if isinstance(nodes, dict):
assert len(nodes) == 1, 'The distributed sampler only supports one node type for now.' homo_nids = []
nodes = list(nodes.values())[0] for ntype in nodes:
assert ntype in g.ntypes, 'The sampled node type does not exist in the input graph'
if F.is_tensor(nodes[ntype]):
typed_nodes = nodes[ntype]
else:
typed_nodes = toindex(nodes[ntype]).tousertensor()
homo_nids.append(gpb.map_to_homo_nid(typed_nodes, ntype))
nodes = F.cat(homo_nids, 0)
def issue_remote_req(node_ids): def issue_remote_req(node_ids):
return SamplingRequest(node_ids, fanout, edge_dir=edge_dir, return SamplingRequest(node_ids, fanout, edge_dir=edge_dir,
prob=prob, replace=replace) prob=prob, replace=replace)
......
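The per-type-to-homogeneous conversion that `sample_neighbors` above performs via `gpb.map_to_homo_nid` can be sketched in NumPy. The `id_ranges` layout is a hypothetical example following the partition-book description in this PR (per node type, one `[start, end)` homogeneous-ID range per partition):

```python
import numpy as np

# Hypothetical per-type, per-partition homogeneous-ID ranges:
# id_ranges["T"][p] = [start, end) of type T's IDs in partition p.
id_ranges = {'T0': np.array([[0, 100], [200, 300]]),
             'T1': np.array([[100, 200], [300, 400]])}

def map_to_homo_nid(per_type_ids, ntype):
    """Sketch: convert type-wise node IDs of one type to homogeneous IDs."""
    rng = id_ranges[ntype]
    sizes = rng[:, 1] - rng[:, 0]
    ends = np.cumsum(sizes)    # type-wise ID boundary after each partition
    starts = ends - sizes      # type-wise ID offset of each partition
    out = np.empty(len(per_type_ids), dtype=np.int64)
    for i, tid in enumerate(per_type_ids):
        p = np.searchsorted(ends, tid, side='right')  # partition holding tid
        out[i] = rng[p, 0] + (tid - starts[p])
    return out
```

For example, type-wise ID 101 of `'T0'` falls in partition 1 and maps to homogeneous ID 201.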
"""Module for mapping between node/edge IDs and node/edge types."""
import numpy as np
from .._ffi.function import _init_api
from .. import backend as F
from .. import utils
class IdMap:
'''A map for converting node/edge IDs to their type IDs and type-wise IDs.
For a heterogeneous graph, DGL assigns an integer ID to each node/edge type;
node and edge of different types have independent IDs starting from zero.
Therefore, a node/edge can be uniquely identified by an ID pair,
``(type_id, type_wise_id)``. To make it convenient for distributed processing,
DGL further encodes the ID pair into one integer ID, which we refer to
as *homogeneous ID*.
DGL arranges nodes and edges so that all nodes of the same type have contiguous
homogeneous IDs. If the graph is partitioned, the nodes/edges of the same type
within a partition have contiguous homogeneous IDs.
Below is an example adjacency matrix of an unpartitioned heterogeneous graph
stored using the above ID assignment. Here, the graph has two types of nodes
(``T0`` and ``T1``), and four types of edges (``R0``, ``R1``, ``R2``, ``R3``).
There are a total of 400 nodes in the graph and each type has 200 nodes. Nodes
of type 0 have IDs in [0,200), while nodes of type 1 have IDs in [200, 400).
```
0 <- T0 -> 200 <- T1 -> 400
0 +-----------+------------+
| | |
^ | R0 | R1 |
T0 | | |
v | | |
200 +-----------+------------+
| | |
^ | R2 | R3 |
T1 | | |
v | | |
400 +-----------+------------+
```
Below shows the adjacency matrix after the graph is partitioned into two.
Note that each partition still has two node types and four edge types,
and nodes/edges of the same type have contiguous IDs.
```
         partition 0                 partition 1
    0 <- T0 -> 100 <- T1 -> 200 <- T0 -> 300 <- T1 -> 400
  0 +-----------+------------+-----------+------------+
    |           |            |                        |
  ^ |    R0     |     R1     |                        |
 T0 |           |            |                        |
  v |           |            |                        |
100 +-----------+------------+                        |
    |           |            |                        |
  ^ |    R2     |     R3     |                        |
 T1 |           |            |                        |
  v |           |            |                        |
200 +-----------+------------+-----------+------------+
    |                        |           |            |
  ^ |                        |    R0     |     R1     |
 T0 |                        |           |            |
  v |                        |           |            |
300 |                        +-----------+------------+
    |                        |           |            |
  ^ |                        |    R2     |     R3     |
 T1 |                        |           |            |
  v |                        |           |            |
400 +-----------+------------+-----------+------------+
```
The following table is an alternative way to represent the above ID assignments.
It is easy to see that the homogeneous ID range [0, 100) is used for nodes of type 0
in partition 0, [100, 200) is used for nodes of type 1 in partition 0, and so on.
```
   range   | type | partition
-----------+------+-----------
 [0, 100)  |  0   |     0
 [100,200) |  1   |     0
 [200,300) |  0   |     1
 [300,400) |  1   |     1
```
The goal of this class is to, given a node's homogeneous ID, convert it into the
ID pair ``(type_id, type_wise_id)``. For example, homogeneous node ID 90 is mapped
to (0, 90); homogeneous node ID 201 is mapped to (0, 101).
Parameters
----------
id_ranges : dict[str, Tensor].
Node ID ranges within partitions for each node type. The key is the node type
name in string. The value is a tensor of shape :math:`(K, 2)`, where :math:`K` is
the number of partitions. Each row has two integers: the starting and the ending IDs
for a particular node type in a partition. For example, all nodes of type ``"T"`` in
partition ``i`` has ID range ``id_ranges["T"][i][0]`` to ``id_ranges["T"][i][1]``.
It is the same as the `node_map` argument in `RangePartitionBook`.
'''
def __init__(self, id_ranges):
self.num_parts = list(id_ranges.values())[0].shape[0]
self.num_types = len(id_ranges)
ranges = np.zeros((self.num_parts * self.num_types, 2), dtype=np.int64)
typed_map = []
id_ranges = list(id_ranges.values())
id_ranges.sort(key=lambda a: a[0, 0])
for i, id_range in enumerate(id_ranges):
ranges[i::self.num_types] = id_range
map1 = np.cumsum(id_range[:, 1] - id_range[:, 0])
typed_map.append(map1)
assert np.all(np.diff(ranges[:, 0]) >= 0)
assert np.all(np.diff(ranges[:, 1]) >= 0)
self.range_start = utils.toindex(np.ascontiguousarray(ranges[:, 0]))
self.range_end = utils.toindex(np.ascontiguousarray(ranges[:, 1]) - 1)
self.typed_map = utils.toindex(np.concatenate(typed_map))
def __call__(self, ids):
'''Convert the homogeneous IDs to (type_id, type_wise_id).
Parameters
----------
ids : 1D tensor
The homogeneous IDs.
Returns
-------
type_ids : Tensor
Type IDs
per_type_ids : Tensor
Type-wise IDs
'''
if self.num_types == 0:
return F.zeros((len(ids),), F.dtype(ids), F.cpu()), ids
if len(ids) == 0:
return ids, ids
ids = utils.toindex(ids)
ret = _CAPI_DGLHeteroMapIds(ids.todgltensor(),
self.range_start.todgltensor(),
self.range_end.todgltensor(),
self.typed_map.todgltensor(),
self.num_parts, self.num_types)
ret = utils.toindex(ret).tousertensor()
return ret[:len(ids)], ret[len(ids):]
_init_api("dgl.distributed.id_map")
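For illustration, here is a pure-NumPy sketch of the forward mapping that `IdMap.__call__` delegates to `_CAPI_DGLHeteroMapIds`, reusing the docstring's example ranges (all names below are hypothetical):

```python
import numpy as np

# Example ranges from the IdMap docstring: two node types, two partitions.
id_ranges = {'T0': np.array([[0, 100], [200, 300]]),
             'T1': np.array([[100, 200], [300, 400]])}

def map_ids(ids, id_ranges):
    """Sketch: homogeneous ID -> (type_id, type_wise_id)."""
    rows = []  # (range_start, range_end, type_id, type-wise offset)
    items = sorted(id_ranges.items(), key=lambda kv: kv[1][0, 0])
    for type_id, (_, rng) in enumerate(items):
        offset = 0
        for start, end in rng:
            rows.append((start, end, type_id, offset))
            offset += end - start   # type-wise IDs are contiguous across partitions
    rows.sort()
    starts = np.array([r[0] for r in rows])
    type_ids, per_type = [], []
    for hid in ids:
        # locate the range containing this homogeneous ID
        j = np.searchsorted(starts, hid, side='right') - 1
        start, _, tid, off = rows[j]
        type_ids.append(tid)
        per_type.append(off + (hid - start))
    return type_ids, per_type
```

This reproduces the docstring's example: homogeneous ID 90 maps to (0, 90) and homogeneous ID 201 maps to (0, 101).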
...@@ -886,9 +886,9 @@ class KVClient(object): ...@@ -886,9 +886,9 @@ class KVClient(object):
def push_handler(data_store, name, local_offset, data) def push_handler(data_store, name, local_offset, data)
``` ```
`data_store` is a dict that contains all tensors in the kvstore. `name` is the name ``data_store`` is a dict that contains all tensors in the kvstore. ``name`` is the name
of the tensor where new data is pushed to. `local_offset` is the offset where new of the tensor where new data is pushed to. ``local_offset`` is the offset where new
data should be written in the tensor in the local partition. `data` is the new data data should be written in the tensor in the local partition. ``data`` is the new data
to be written. to be written.
Parameters Parameters
...@@ -919,8 +919,8 @@ class KVClient(object): ...@@ -919,8 +919,8 @@ class KVClient(object):
def pull_handler(data_store, name, local_offset) def pull_handler(data_store, name, local_offset)
``` ```
`data_store` is a dict that contains all tensors in the kvstore. `name` is the name ``data_store`` is a dict that contains all tensors in the kvstore. ``name`` is the name
of the tensor from which data is pulled. `local_offset` is the offset from which of the tensor from which data is pulled. ``local_offset`` is the offset from which
data should be read in the tensor in the local partition. data should be read in the tensor in the local partition.
Parameters Parameters
......
...@@ -4,7 +4,6 @@ This kvstore is used when running in the standalone mode ...@@ -4,7 +4,6 @@ This kvstore is used when running in the standalone mode
""" """
from .. import backend as F from .. import backend as F
from .graph_partition_book import PartitionPolicy, NODE_PART_POLICY, EDGE_PART_POLICY
class KVClient(object): class KVClient(object):
''' The fake KVStore client. ''' The fake KVStore client.
...@@ -34,9 +33,11 @@ class KVClient(object): ...@@ -34,9 +33,11 @@ class KVClient(object):
'''register pull handler''' '''register pull handler'''
self._pull_handlers[name] = func self._pull_handlers[name] = func
def add_data(self, name, tensor): def add_data(self, name, tensor, part_policy):
'''add data to the client''' '''add data to the client'''
self._data[name] = tensor self._data[name] = tensor
if part_policy.policy_str not in self._all_possible_part_policy:
self._all_possible_part_policy[part_policy.policy_str] = part_policy
def init_data(self, name, shape, dtype, part_policy, init_func): def init_data(self, name, shape, dtype, part_policy, init_func):
'''add new data to the client''' '''add new data to the client'''
...@@ -72,7 +73,3 @@ class KVClient(object): ...@@ -72,7 +73,3 @@ class KVClient(object):
def map_shared_data(self, partition_book): def map_shared_data(self, partition_book):
'''Mapping shared-memory tensor from server to client.''' '''Mapping shared-memory tensor from server to client.'''
-self._all_possible_part_policy[NODE_PART_POLICY] = PartitionPolicy(NODE_PART_POLICY,
-                                                                   partition_book)
-self._all_possible_part_policy[EDGE_PART_POLICY] = PartitionPolicy(EDGE_PART_POLICY,
-                                                                   partition_book)
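The new `add_data` keys partition policies by their `policy_str`, so each distinct policy is recorded exactly once no matter how many tensors use it. A minimal sketch of that deduplication, with a stand-in policy class (not the real `PartitionPolicy`):

```python
class StubPolicy:
    """Stand-in for a partition policy; only `policy_str` matters here."""
    def __init__(self, policy_str):
        self.policy_str = policy_str

all_possible_part_policy = {}

def record_policy(part_policy):
    # Same bookkeeping as add_data: one entry per distinct policy_str.
    if part_policy.policy_str not in all_possible_part_policy:
        all_possible_part_policy[part_policy.policy_str] = part_policy

record_policy(StubPolicy('node'))
record_policy(StubPolicy('node'))  # ignored: same policy_str already recorded
record_policy(StubPolicy('edge'))
```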
...@@ -6,16 +6,16 @@ from ._ffi.function import _init_api
from .heterograph import DGLHeteroGraph
from . import backend as F
from . import utils
-from .base import EID, NID
+from .base import EID, NID, NTYPE, ETYPE
__all__ = ["metis_partition", "metis_partition_assignment",
"partition_graph_with_halo"]
def reorder_nodes(g, new_node_ids):
""" Generate a new graph with new node IDs.
We assign each node in the input graph with a new node ID. This results in
a new graph.
Parameters
...@@ -23,11 +23,11 @@ def reorder_nodes(g, new_node_ids):
g : DGLGraph
The input graph
new_node_ids : a tensor
The new node IDs
Returns
-------
DGLGraph
The graph with new node IDs.
"""
assert len(new_node_ids) == g.number_of_nodes(), \
"The number of new node ids must match #nodes in the graph."
...@@ -35,7 +35,7 @@ def reorder_nodes(g, new_node_ids):
sorted_ids, idx = F.sort_1d(new_node_ids.tousertensor())
assert F.asnumpy(sorted_ids[0]) == 0 \
and F.asnumpy(sorted_ids[-1]) == g.number_of_nodes() - 1, \
"The new node IDs are incorrect."
new_gidx = _CAPI_DGLReorderGraph_Hetero(
g._graph, new_node_ids.todgltensor())
new_g = DGLHeteroGraph(gidx=new_gidx, ntypes=['_N'], etypes=['_E'])
...@@ -46,6 +46,74 @@ def reorder_nodes(g, new_node_ids):
def _get_halo_heterosubgraph_inner_node(halo_subg):
return _CAPI_GetHaloSubgraphInnerNodes_Hetero(halo_subg)
def reshuffle_graph(g, node_part=None):
'''Reshuffle the node IDs and edge IDs of a graph.
This function reshuffles nodes and edges in a graph so that all nodes/edges of the same type
have contiguous IDs. If a graph is partitioned and nodes are assigned to different partitions,
all nodes/edges in a partition should
get contiguous IDs; within a partition, all nodes/edges of the same type have contiguous IDs.
Parameters
----------
g : DGLGraph
The input graph.
node_part : Tensor
This is a vector whose length is the same as the number of nodes in the input graph.
Each element indicates the partition ID the corresponding node is assigned to.
Returns
-------
(DGLGraph, Tensor)
The graph whose nodes and edges are reshuffled.
The 1D tensor that indicates the partition IDs of the nodes in the reshuffled graph.
'''
# In this case, we don't need to reshuffle node IDs and edge IDs.
if node_part is None:
g.ndata['orig_id'] = F.arange(0, g.number_of_nodes())
g.edata['orig_id'] = F.arange(0, g.number_of_edges())
return g, None
start = time.time()
if node_part is not None:
node_part = utils.toindex(node_part)
node_part = node_part.tousertensor()
if NTYPE in g.ndata:
is_hetero = len(F.unique(g.ndata[NTYPE])) > 1
else:
is_hetero = False
if is_hetero:
num_node_types = F.max(g.ndata[NTYPE], 0) + 1
if node_part is not None:
sorted_part, new2old_map = F.sort_1d(node_part * num_node_types + g.ndata[NTYPE])
else:
sorted_part, new2old_map = F.sort_1d(g.ndata[NTYPE])
sorted_part = F.floor_div(sorted_part, num_node_types)
elif node_part is not None:
sorted_part, new2old_map = F.sort_1d(node_part)
else:
g.ndata['orig_id'] = g.ndata[NID]
g.edata['orig_id'] = g.edata[EID]
return g, None
new_node_ids = np.zeros((g.number_of_nodes(),), dtype=np.int64)
new_node_ids[F.asnumpy(new2old_map)] = np.arange(0, g.number_of_nodes())
# If the input graph is homogeneous, we only need to create an empty array, so that
# _CAPI_DGLReassignEdges_Hetero knows how to handle it.
etype = g.edata[ETYPE] if ETYPE in g.edata else F.zeros((0), F.dtype(sorted_part), F.cpu())
g = reorder_nodes(g, new_node_ids)
node_part = utils.toindex(sorted_part)
# We reassign edge IDs based on the in-CSR format. In this way, after partitioning,
# we can ensure that all edges in a partition are in a contiguous ID space.
etype_idx = utils.toindex(etype)
orig_eids = _CAPI_DGLReassignEdges_Hetero(g._graph, etype_idx.todgltensor(),
node_part.todgltensor(), True)
orig_eids = utils.toindex(orig_eids)
orig_eids = orig_eids.tousertensor()
g.edata['orig_id'] = orig_eids
print('Reshuffle nodes and edges: {:.3f} seconds'.format(time.time() - start))
return g, node_part.tousertensor()
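As a sketch of the ordering `reshuffle_graph` produces in the heterogeneous case, sorting by the composite key `node_part * num_node_types + node_type` groups nodes first by partition and, within a partition, by node type. NumPy stands in for the backend ops here, and the toy arrays are made up:

```python
import numpy as np

node_type = np.array([0, 1, 0, 1, 0, 1])   # per-node type IDs
node_part = np.array([1, 0, 0, 1, 1, 0])   # per-node partition IDs
num_types = int(node_type.max()) + 1

# Composite sort key: partition is the major key, node type the minor key.
key = node_part * num_types + node_type
new2old = np.argsort(key, kind='stable')

# Invert the permutation to get each node's new ID.
new_ids = np.empty_like(new2old)
new_ids[new2old] = np.arange(len(new2old))

# In the new order, partition IDs are contiguous, and within each
# partition the node types are contiguous as well.
```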
def partition_graph_with_halo(g, node_part, extra_cached_hops, reshuffle=False):
'''Partition a graph.
...@@ -55,10 +123,10 @@ def partition_graph_with_halo(g, node_part, extra_cached_hops, reshuffle=False):
not belong to the partition of a subgraph but are connected to the nodes
in the partition within a fixed number of hops.
If `reshuffle` is turned on, the function reshuffles node IDs and edge IDs
of the input graph before partitioning. After reshuffling, all nodes and edges
in a partition fall in a contiguous ID range in the input graph.
The partitioned subgraphs have node data 'orig_id', which stores the node IDs
in the original input graph.
Parameters
...@@ -68,37 +136,24 @@ def partition_graph_with_halo(g, node_part, extra_cached_hops, reshuffle=False):
node_part: 1D tensor
Specify which partition a node is assigned to. The length of this tensor
needs to be the same as the number of nodes of the graph. Each element
indicates the partition ID of a node.
extra_cached_hops: int
The number of hops a HALO node can be accessed.
reshuffle : bool
Reshuffle nodes so that nodes in the same partition are in the same ID range.
Returns
--------
a dict of DGLGraphs
The key is the partition ID and the value is the DGLGraph of the partition.
'''
assert len(node_part) == g.number_of_nodes()
-node_part = utils.toindex(node_part)
if reshuffle:
-    start = time.time()
-    node_part = node_part.tousertensor()
-    sorted_part, new2old_map = F.sort_1d(node_part)
-    new_node_ids = np.zeros((g.number_of_nodes(),), dtype=np.int64)
-    new_node_ids[F.asnumpy(new2old_map)] = np.arange(0, g.number_of_nodes())
-    g = reorder_nodes(g, new_node_ids)
-    node_part = utils.toindex(sorted_part)
-    # We reassign edges in in-CSR. In this way, after partitioning, we can ensure
-    # that all edges in a partition are in the contiguous Id space.
-    orig_eids = _CAPI_DGLReassignEdges_Hetero(g._graph, True)
-    orig_eids = utils.toindex(orig_eids)
-    orig_eids = orig_eids.tousertensor()
+    g, node_part = reshuffle_graph(g, node_part)
    orig_nids = g.ndata['orig_id']
-    print('Reshuffle nodes and edges: {:.3f} seconds'.format(time.time() - start))
+    orig_eids = g.edata['orig_id']
+node_part = utils.toindex(node_part)
start = time.time()
subgs = _CAPI_DGLPartitionWithHalo_Hetero(
g._graph, node_part.todgltensor(), extra_cached_hops)
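To illustrate what `extra_cached_hops` controls, here is a toy halo computation on a 4-cycle. This is an illustrative sketch of the concept only, not DGL's C API implementation; all names and data are made up:

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]    # a 4-cycle, as (src, dst) pairs
node_part = np.array([0, 0, 1, 1])          # nodes 0,1 -> part 0; nodes 2,3 -> part 1

def halo_nodes(part_id, hops=1):
    # Nodes owned by this partition.
    owned = {v for v in range(len(node_part)) if node_part[v] == part_id}
    frontier = set(owned)
    for _ in range(hops):
        # Expand one hop in both edge directions.
        nbrs = {v for u, v in edges if u in frontier}
        nbrs |= {u for u, v in edges if v in frontier}
        frontier = owned | nbrs
    return frontier - owned   # the halo: cached by the partition but not owned
```

With `hops=1`, partition 0 caches nodes 2 and 3, since both are one hop away from its owned nodes in the cycle.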
...@@ -171,7 +226,7 @@ def metis_partition_assignment(g, k, balance_ntypes=None, balance_edges=False):
Returns
-------
a 1-D tensor
A vector with each element that indicates the partition ID of a vertex.
'''
# METIS works only on symmetric graphs.
# The METIS runs on the symmetric graph to generate the node assignment to partitions.
...@@ -252,10 +307,10 @@ def metis_partition(g, k, extra_cached_hops=0, reshuffle=False,
To balance the node types, a user needs to pass a vector of N elements to indicate
the type of each node. N is the number of nodes in the input graph.
If `reshuffle` is turned on, the function reshuffles node IDs and edge IDs
of the input graph before partitioning. After reshuffling, all nodes and edges
in a partition fall in a contiguous ID range in the input graph.
The partitioned subgraphs have node data 'orig_id', which stores the node IDs
in the original input graph.
The partitioned subgraph is stored in DGLGraph. The DGLGraph has the `part_id` The partitioned subgraph is stored in DGLGraph. The DGLGraph has the `part_id`
...@@ -271,7 +326,7 @@ def metis_partition(g, k, extra_cached_hops=0, reshuffle=False,
extra_cached_hops: int
The number of hops a HALO node can be accessed.
reshuffle : bool
Reshuffle nodes so that nodes in the same partition are in the same ID range.
balance_ntypes : tensor
Node type of each node
balance_edges : bool
...@@ -280,7 +335,7 @@ def metis_partition(g, k, extra_cached_hops=0, reshuffle=False,
Returns
--------
a dict of DGLGraphs
The key is the partition ID and the value is the DGLGraph of the partition.
'''
node_part = metis_partition_assignment(g, k, balance_ntypes, balance_edges)
if node_part is None:
...@@ -289,5 +344,4 @@ def metis_partition(g, k, extra_cached_hops=0, reshuffle=False,
# Then we split the original graph into parts based on the METIS partitioning results.
return partition_graph_with_halo(g, node_part, extra_cached_hops, reshuffle)
_init_api("dgl.partition")
...@@ -719,4 +719,61 @@ DGL_REGISTER_GLOBAL("graph_index._CAPI_DGLMapSubgraphNID")
*rv = GraphOp::MapParentIdToSubgraphId(parent_vids, query);
});
template<class IdType>
IdArray MapIds(IdArray ids, IdArray range_starts, IdArray range_ends, IdArray typed_map,
int num_parts, int num_types) {
int64_t num_ids = ids->shape[0];
int64_t num_ranges = range_starts->shape[0];
IdArray ret = IdArray::Empty({num_ids * 2}, ids->dtype, ids->ctx);
const IdType *range_start_data = static_cast<IdType *>(range_starts->data);
const IdType *range_end_data = static_cast<IdType *>(range_ends->data);
const IdType *ids_data = static_cast<IdType *>(ids->data);
const IdType *typed_map_data = static_cast<IdType *>(typed_map->data);
IdType *types_data = static_cast<IdType *>(ret->data);
IdType *per_type_ids_data = static_cast<IdType *>(ret->data) + num_ids;
#pragma omp parallel for
for (int64_t i = 0; i < ids->shape[0]; i++) {
IdType id = ids_data[i];
auto it = std::lower_bound(range_end_data, range_end_data + num_ranges, id);
// The range must exist.
BUG_ON(it != range_end_data + num_ranges);
size_t range_id = it - range_end_data;
int type_id = range_id % num_types;
types_data[i] = type_id;
int part_id = range_id / num_types;
BUG_ON(part_id < num_parts);
if (part_id == 0) {
per_type_ids_data[i] = id - range_start_data[range_id];
} else {
per_type_ids_data[i] = id - range_start_data[range_id]
+ typed_map_data[num_parts * type_id + part_id - 1];
}
}
return ret;
}
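The range search in `MapIds` above can be paraphrased in Python with `bisect`. The sketch assumes the range ends are inclusive upper bounds, so that `bisect_left` (the analogue of `std::lower_bound`) lands in the range containing the ID; the layout and all numbers below are made up for illustration:

```python
import bisect

# Made-up layout: 2 partitions x 2 node types, ranges flattened as
# (part, type): part0/type0 -> IDs 0..3,  part0/type1 -> IDs 4..9,
#               part1/type0 -> IDs 10..15, part1/type1 -> IDs 16..19.
range_starts = [0, 4, 10, 16]
range_ends = [3, 9, 15, 19]   # inclusive upper bounds (an assumption here)
num_types = 2
# typed_map[type][part] = cumulative count of that type over parts 0..part.
typed_map = {0: [4, 10], 1: [6, 10]}

def map_id(hid):
    # First range whose upper bound covers hid (mirrors std::lower_bound).
    range_id = bisect.bisect_left(range_ends, hid)
    type_id = range_id % num_types
    part_id = range_id // num_types
    per_type = hid - range_starts[range_id]
    if part_id > 0:
        # Offset by the number of same-type nodes in earlier partitions.
        per_type += typed_map[type_id][part_id - 1]
    return type_id, per_type
```

For example, homogeneous ID 12 falls in partition 1's type-0 range; it is the third node there, and partition 0 already holds four type-0 nodes, so its per-type ID is 6.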
DGL_REGISTER_GLOBAL("distributed.id_map._CAPI_DGLHeteroMapIds")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
const IdArray ids = args[0];
const IdArray range_starts = args[1];
const IdArray range_ends = args[2];
const IdArray typed_map = args[3];
int num_parts = args[4];
int num_types = args[5];
int num_ranges = range_starts->shape[0];
CHECK_EQ(range_starts->dtype.bits, ids->dtype.bits);
CHECK_EQ(range_ends->dtype.bits, ids->dtype.bits);
CHECK_EQ(typed_map->dtype.bits, ids->dtype.bits);
CHECK_EQ(num_ranges, num_parts * num_types);
CHECK_EQ(num_ranges, range_ends->shape[0]);
IdArray ret;
ATEN_ID_TYPE_SWITCH(ids->dtype, IdType, {
ret = MapIds<IdType>(ids, range_starts, range_ends, typed_map, num_parts, num_types);
});
*rv = ret;
});
}  // namespace dgl