Unverified Commit 41349dce authored by Da Zheng, committed by GitHub

[Feature] Range partition (#1522)



* add reorder immutable graph.

* add python API.

* add reorder for csr.

* remove gk version.

* fix

* add cpp test.

* bug fixes

* fix tests.

* fix bugs and add check

* fix test.

* add omp.

* add comments.

* add coo reorder.

* fix a bug.

* handle reorder for different graph structures.

* fix lint.

* fix.

* add original ids.

* reshuffle nodes before metis partition.

* inner nodes are in contiguous Id range.

* reshuffle nodes/edges when partitioning.

* load partition return graph partition book.

* use inner_node/inner_edges

* add and test range partition book.

* count inner_edge correctly.

* fix lint.

* fix lint.

* fix lint.

* fix errors.

* fix errors.

* fix for TF.

* fix.

* fix.

* change docstring.

* support logical and.

* add comments.

* avoid copy.

* fix

* update docstring.

* fix a bug.

* add range search.

* fix

* fix a bug.

* add more tests.

* load graph partition book.

* support shared memory for range partition book.

* fix a bug.

* fix.

* fix lint.

* remove check

* fix test.

* remove num_nodes and num_edges

* fix lint.

* fix graph partition book.

* address comments.

* use makedirs.

* fix compile
Co-authored-by: xiang song (charlie.song) <classicxsong@gmail.com>
Co-authored-by: Chao Ma <mctt90@gmail.com>
parent 3e72c53a
...@@ -589,6 +589,14 @@ bool CSRHasDuplicate(CSRMatrix csr); ...@@ -589,6 +589,14 @@ bool CSRHasDuplicate(CSRMatrix csr);
*/ */
void CSRSort_(CSRMatrix* csr); void CSRSort_(CSRMatrix* csr);
/*!
* \brief Reorder the rows and columns according to the new row and column order.
* \param csr The input CSR matrix.
* \param new_row_ids the new row Ids (the index is the old row Id).
* \param new_col_ids the new column Ids (the index is the old column Id).
*/
CSRMatrix CSRReorder(CSRMatrix csr, runtime::NDArray new_row_ids, runtime::NDArray new_col_ids);
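The convention above is that the index into new_row_ids/new_col_ids is the old Id and the value is the new Id. A minimal dense NumPy sketch of that permutation semantics (an illustration only, not the DGL kernel):

    import numpy as np

    # Toy adjacency matrix; CSRReorder permutes rows and columns of such a matrix.
    A = np.array([[1, 2, 0],
                  [0, 3, 0],
                  [4, 0, 5]])
    # Index = old Id, value = new Id (the convention documented above).
    new_row_ids = np.array([2, 0, 1])
    new_col_ids = np.array([1, 2, 0])

    B = np.zeros_like(A)
    # Entry A[i, j] moves to B[new_row_ids[i], new_col_ids[j]].
    B[new_row_ids[:, None], new_col_ids[None, :]] = A
    assert B[new_row_ids[0], new_col_ids[1]] == A[0, 1]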
/*! /*!
* \brief Remove entries from CSR matrix by entry indices (data indices) * \brief Remove entries from CSR matrix by entry indices (data indices)
* \return A new CSR matrix as well as a mapping from the new CSR entries to the old CSR * \return A new CSR matrix as well as a mapping from the new CSR entries to the old CSR
...@@ -778,6 +786,14 @@ COOMatrix COOSort(COOMatrix mat, bool sort_column = false); ...@@ -778,6 +786,14 @@ COOMatrix COOSort(COOMatrix mat, bool sort_column = false);
*/ */
COOMatrix COORemove(COOMatrix coo, IdArray entries); COOMatrix COORemove(COOMatrix coo, IdArray entries);
/*!
* \brief Reorder the rows and columns according to the new row and column order.
* \param coo The input COO matrix.
* \param new_row_ids the new row Ids (the index is the old row Id).
* \param new_col_ids the new column Ids (the index is the old column Id).
*/
COOMatrix COOReorder(COOMatrix coo, runtime::NDArray new_row_ids, runtime::NDArray new_col_ids);
/*! /*!
* \brief Randomly select a fixed number of non-zero entries along each given row independently. * \brief Randomly select a fixed number of non-zero entries along each given row independently.
* *
......
...@@ -159,6 +159,14 @@ class GraphOp { ...@@ -159,6 +159,14 @@ class GraphOp {
*/ */
static HaloSubgraph GetSubgraphWithHalo(GraphPtr graph, IdArray nodes, int num_hops); static HaloSubgraph GetSubgraphWithHalo(GraphPtr graph, IdArray nodes, int num_hops);
/*!
* \brief Reorder the nodes in the immutable graph.
* \param ig The input immutable graph.
* \param new_order The new node Ids (the index in `new_order` is the old node Id).
* \return The graph with reordered node Ids.
*/
static GraphPtr ReorderImmutableGraph(ImmutableGraphPtr ig, IdArray new_order);
/*! /*!
* \brief Partition a graph with Metis. * \brief Partition a graph with Metis.
* The partitioning algorithm assigns each vertex to a partition. * The partitioning algorithm assigns each vertex to a partition.
......
...@@ -982,6 +982,9 @@ def logical_not(input): ...@@ -982,6 +982,9 @@ def logical_not(input):
""" """
pass pass
def logical_and(input1, input2):
pass
def clone(input): def clone(input):
"""Return a clone of the input tensor. """Return a clone of the input tensor.
......
...@@ -313,6 +313,9 @@ def equal(x, y): ...@@ -313,6 +313,9 @@ def equal(x, y):
def logical_not(input): def logical_not(input):
return nd.logical_not(input) return nd.logical_not(input)
def logical_and(input1, input2):
return nd.logical_and(input1, input2)
def clone(input): def clone(input):
return input.copy() return input.copy()
......
...@@ -249,6 +249,9 @@ def equal(x, y): ...@@ -249,6 +249,9 @@ def equal(x, y):
def logical_not(input): def logical_not(input):
return ~input return ~input
def logical_and(input1, input2):
return input1 & input2
def clone(input): def clone(input):
return input.clone() return input.clone()
......
...@@ -350,6 +350,9 @@ def equal(x, y): ...@@ -350,6 +350,9 @@ def equal(x, y):
def logical_not(input): def logical_not(input):
return ~input return ~input
def logical_and(input1, input2):
return tf.math.logical_and(input1, input2)
def clone(input): def clone(input):
# TF tensor is always immutable so returning the input is safe. # TF tensor is always immutable so returning the input is safe.
return input return input
......
...@@ -10,7 +10,7 @@ from ..graph_index import from_shared_mem_graph_index ...@@ -10,7 +10,7 @@ from ..graph_index import from_shared_mem_graph_index
from .._ffi.ndarray import empty_shared_mem from .._ffi.ndarray import empty_shared_mem
from ..frame import infer_scheme from ..frame import infer_scheme
from .partition import load_partition from .partition import load_partition
from .graph_partition_book import GraphPartitionBook, PartitionPolicy, get_shared_mem_partition_book from .graph_partition_book import PartitionPolicy, get_shared_mem_partition_book
from .. import utils from .. import utils
from .shared_mem_utils import _to_shared_mem, _get_ndata_path, _get_edata_path, DTYPE_DICT from .shared_mem_utils import _to_shared_mem, _get_ndata_path, _get_edata_path, DTYPE_DICT
from .rpc_client import connect_to_server from .rpc_client import connect_to_server
...@@ -25,17 +25,16 @@ def _copy_graph_to_shared_mem(g, graph_name): ...@@ -25,17 +25,16 @@ def _copy_graph_to_shared_mem(g, graph_name):
new_g = DGLGraph(gidx) new_g = DGLGraph(gidx)
# We should share the node/edge data to the client explicitly instead of putting them # We should share the node/edge data to the client explicitly instead of putting them
# in the KVStore because some of the node/edge data may be duplicated. # in the KVStore because some of the node/edge data may be duplicated.
local_node_path = _get_ndata_path(graph_name, 'local_node') local_node_path = _get_ndata_path(graph_name, 'inner_node')
new_g.ndata['local_node'] = _to_shared_mem(g.ndata['local_node'], new_g.ndata['inner_node'] = _to_shared_mem(g.ndata['inner_node'], local_node_path)
local_node_path) local_edge_path = _get_edata_path(graph_name, 'inner_edge')
local_edge_path = _get_edata_path(graph_name, 'local_edge') new_g.edata['inner_edge'] = _to_shared_mem(g.edata['inner_edge'], local_edge_path)
new_g.edata['local_edge'] = _to_shared_mem(g.edata['local_edge'], local_edge_path)
new_g.ndata[NID] = _to_shared_mem(g.ndata[NID], _get_ndata_path(graph_name, NID)) new_g.ndata[NID] = _to_shared_mem(g.ndata[NID], _get_ndata_path(graph_name, NID))
new_g.edata[EID] = _to_shared_mem(g.edata[EID], _get_edata_path(graph_name, EID)) new_g.edata[EID] = _to_shared_mem(g.edata[EID], _get_edata_path(graph_name, EID))
return new_g return new_g
FIELD_DICT = {'local_node': F.int64, FIELD_DICT = {'inner_node': F.int64,
'local_edge': F.int64, 'inner_edge': F.int64,
NID: F.int64, NID: F.int64,
EID: F.int64} EID: F.int64}
...@@ -99,8 +98,8 @@ def _get_graph_from_shared_mem(graph_name): ...@@ -99,8 +98,8 @@ def _get_graph_from_shared_mem(graph_name):
return gidx return gidx
g = DGLGraph(gidx) g = DGLGraph(gidx)
g.ndata['local_node'] = _get_shared_mem_ndata(g, graph_name, 'local_node') g.ndata['inner_node'] = _get_shared_mem_ndata(g, graph_name, 'inner_node')
g.edata['local_edge'] = _get_shared_mem_edata(g, graph_name, 'local_edge') g.edata['inner_edge'] = _get_shared_mem_edata(g, graph_name, 'inner_edge')
g.ndata[NID] = _get_shared_mem_ndata(g, graph_name, NID) g.ndata[NID] = _get_shared_mem_ndata(g, graph_name, NID)
g.edata[EID] = _get_shared_mem_edata(g, graph_name, EID) g.edata[EID] = _get_shared_mem_edata(g, graph_name, EID)
return g return g
...@@ -271,12 +270,10 @@ class DistGraphServer(KVServer): ...@@ -271,12 +270,10 @@ class DistGraphServer(KVServer):
self.ip_config = ip_config self.ip_config = ip_config
# Load graph partition data. # Load graph partition data.
self.client_g, node_feats, edge_feats, self.meta = load_partition(conf_file, server_id) self.client_g, node_feats, edge_feats, self.gpb = load_partition(conf_file, server_id)
_, _, node_map, edge_map, num_partitions = self.meta
self.client_g = _copy_graph_to_shared_mem(self.client_g, graph_name) self.client_g = _copy_graph_to_shared_mem(self.client_g, graph_name)
# Init kvstore. # Init kvstore.
self.gpb = GraphPartitionBook(server_id, num_partitions, node_map, edge_map, self.client_g)
self.gpb.shared_memory(graph_name) self.gpb.shared_memory(graph_name)
self.add_part_policy(PartitionPolicy('node', server_id, self.gpb)) self.add_part_policy(PartitionPolicy('node', server_id, self.gpb))
self.add_part_policy(PartitionPolicy('edge', server_id, self.gpb)) self.add_part_policy(PartitionPolicy('edge', server_id, self.gpb))
...@@ -332,6 +329,12 @@ class DistGraph: ...@@ -332,6 +329,12 @@ class DistGraph:
self._default_init_ndata = _default_init_data self._default_init_ndata = _default_init_data
self._default_init_edata = _default_init_data self._default_init_edata = _default_init_data
self._num_nodes = 0
self._num_edges = 0
for part_md in self._gpb.metadata():
self._num_nodes += int(part_md['num_nodes'])
self._num_edges += int(part_md['num_edges'])
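The global node/edge counts are now derived from the partition book's per-partition metadata rather than from the book itself. A small runnable sketch of that accumulation (the counts are made up):

    # Mirrors the loop above; metadata() returns one dict per partition.
    metadata = [{'machine_id': 0, 'num_nodes': 3000, 'num_edges': 5000},
                {'machine_id': 1, 'num_nodes': 2000, 'num_edges': 4888}]
    num_nodes = sum(int(md['num_nodes']) for md in metadata)
    num_edges = sum(int(md['num_edges']) for md in metadata)
    assert (num_nodes, num_edges) == (5000, 9888)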
def init_ndata(self, ndata_name, shape, dtype): def init_ndata(self, ndata_name, shape, dtype):
'''Initialize node data '''Initialize node data
...@@ -425,11 +428,11 @@ class DistGraph: ...@@ -425,11 +428,11 @@ class DistGraph:
def number_of_nodes(self): def number_of_nodes(self):
"""Return the number of nodes""" """Return the number of nodes"""
return self._gpb.num_nodes() return self._num_nodes
def number_of_edges(self): def number_of_edges(self):
"""Return the number of edges""" """Return the number of edges"""
return self._gpb.num_edges() return self._num_edges
def node_attr_schemes(self): def node_attr_schemes(self):
"""Return the node feature and embedding schemes.""" """Return the node feature and embedding schemes."""
......
...@@ -8,13 +8,13 @@ from .. import utils ...@@ -8,13 +8,13 @@ from .. import utils
from .shared_mem_utils import _to_shared_mem, _get_ndata_path, _get_edata_path, DTYPE_DICT from .shared_mem_utils import _to_shared_mem, _get_ndata_path, _get_edata_path, DTYPE_DICT
from .._ffi.ndarray import empty_shared_mem from .._ffi.ndarray import empty_shared_mem
def _move_metadata_to_shared_mam(graph_name, num_nodes, num_edges, part_id, def _move_metadata_to_shared_mem(graph_name, num_nodes, num_edges, part_id,
num_partitions, node_map, edge_map): num_partitions, node_map, edge_map, is_range_part):
''' Move all metadata to the shared memory. ''' Move all metadata of the partition book to the shared memory.
We need these metadata to construct graph partition book. We need these metadata to construct graph partition book.
''' '''
meta = _to_shared_mem(F.tensor([num_nodes, num_edges, meta = _to_shared_mem(F.tensor([int(is_range_part), num_nodes, num_edges,
num_partitions, part_id]), num_partitions, part_id]),
_get_ndata_path(graph_name, 'meta')) _get_ndata_path(graph_name, 'meta'))
node_map = _to_shared_mem(node_map, _get_ndata_path(graph_name, 'node_map')) node_map = _to_shared_mem(node_map, _get_ndata_path(graph_name, 'node_map'))
...@@ -27,25 +27,29 @@ def _get_shared_mem_metadata(graph_name): ...@@ -27,25 +27,29 @@ def _get_shared_mem_metadata(graph_name):
The metadata includes the number of nodes and the number of edges. In the future, The metadata includes the number of nodes and the number of edges. In the future,
we can add more information, especially for heterograph. we can add more information, especially for heterograph.
''' '''
shape = (4,) # The metadata has 5 elements: is_range_part, num_nodes, num_edges, num_partitions, part_id
# We might need to extend the list in the future.
shape = (5,)
dtype = F.int64 dtype = F.int64
dtype = DTYPE_DICT[dtype] dtype = DTYPE_DICT[dtype]
data = empty_shared_mem(_get_ndata_path(graph_name, 'meta'), False, shape, dtype) data = empty_shared_mem(_get_ndata_path(graph_name, 'meta'), False, shape, dtype)
dlpack = data.to_dlpack() dlpack = data.to_dlpack()
meta = F.asnumpy(F.zerocopy_from_dlpack(dlpack)) meta = F.asnumpy(F.zerocopy_from_dlpack(dlpack))
num_nodes, num_edges, num_partitions, part_id = meta[0], meta[1], meta[2], meta[3] is_range_part, num_nodes, num_edges, num_partitions, part_id = meta
# Load node map # Load node map
data = empty_shared_mem(_get_ndata_path(graph_name, 'node_map'), False, (num_nodes,), dtype) length = num_partitions if is_range_part else num_nodes
data = empty_shared_mem(_get_ndata_path(graph_name, 'node_map'), False, (length,), dtype)
dlpack = data.to_dlpack() dlpack = data.to_dlpack()
node_map = F.zerocopy_from_dlpack(dlpack) node_map = F.zerocopy_from_dlpack(dlpack)
# Load edge_map # Load edge_map
data = empty_shared_mem(_get_edata_path(graph_name, 'edge_map'), False, (num_edges,), dtype) length = num_partitions if is_range_part else num_edges
data = empty_shared_mem(_get_edata_path(graph_name, 'edge_map'), False, (length,), dtype)
dlpack = data.to_dlpack() dlpack = data.to_dlpack()
edge_map = F.zerocopy_from_dlpack(dlpack) edge_map = F.zerocopy_from_dlpack(dlpack)
return part_id, num_partitions, node_map, edge_map return is_range_part, part_id, num_partitions, node_map, edge_map
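For reference, the shared-memory metadata written by _move_metadata_to_shared_mem and read back here is a 5-element int64 tensor, and the length of the node/edge maps depends on the partition-book type. A small sketch of the layout (values are made up):

    # Layout of the 'meta' shared-memory tensor (5 int64 values):
    #   [is_range_part, num_nodes, num_edges, num_partitions, part_id]
    meta = [1, 6000, 9888, 2, 0]
    is_range_part, num_nodes, num_edges, num_partitions, part_id = meta

    # Range partition book: one boundary per partition.
    # Hash/Metis partition book: one partition Id per node (or edge).
    node_map_len = num_partitions if is_range_part else num_nodes
    edge_map_len = num_partitions if is_range_part else num_edges
    assert (node_map_len, edge_map_len) == (2, 2)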
def get_shared_mem_partition_book(graph_name, graph_part): def get_shared_mem_partition_book(graph_name, graph_part):
...@@ -63,11 +67,14 @@ def get_shared_mem_partition_book(graph_name, graph_part): ...@@ -63,11 +67,14 @@ def get_shared_mem_partition_book(graph_name, graph_part):
Returns Returns
------- -------
GraphPartitionBook GraphPartitionBook or RangePartitionBook
A graph partition book for a particular partition. A graph partition book for a particular partition.
''' '''
part_id, num_parts, node_map, edge_map = _get_shared_mem_metadata(graph_name) is_range_part, part_id, num_parts, node_map, edge_map = _get_shared_mem_metadata(graph_name)
return GraphPartitionBook(part_id, num_parts, node_map, edge_map, graph_part) if is_range_part == 1:
return RangePartitionBook(part_id, num_parts, node_map, edge_map)
else:
return GraphPartitionBook(part_id, num_parts, node_map, edge_map, graph_part)
class GraphPartitionBook: class GraphPartitionBook:
"""GraphPartitionBook is used to store parition information. """GraphPartitionBook is used to store parition information.
...@@ -75,7 +82,7 @@ class GraphPartitionBook: ...@@ -75,7 +82,7 @@ class GraphPartitionBook:
Parameters Parameters
---------- ----------
part_id : int part_id : int
partition id of current GraphPartitionBook partition id of current partition book
num_parts : int num_parts : int
number of total partitions number of total partitions
node_map : tensor node_map : tensor
...@@ -94,16 +101,15 @@ class GraphPartitionBook: ...@@ -94,16 +101,15 @@ class GraphPartitionBook:
self._nid2partid = node_map.tousertensor() self._nid2partid = node_map.tousertensor()
edge_map = utils.toindex(edge_map) edge_map = utils.toindex(edge_map)
self._eid2partid = edge_map.tousertensor() self._eid2partid = edge_map.tousertensor()
self._graph = part_graph # Get meta data of the partition book.
# Get meta data of GraphPartitionBook
self._partition_meta_data = [] self._partition_meta_data = []
_, nid_count = np.unique(F.asnumpy(self._nid2partid), return_counts=True) _, nid_count = np.unique(F.asnumpy(self._nid2partid), return_counts=True)
_, eid_count = np.unique(F.asnumpy(self._eid2partid), return_counts=True) _, eid_count = np.unique(F.asnumpy(self._eid2partid), return_counts=True)
for partid in range(self._num_partitions): for partid in range(self._num_partitions):
part_info = {} part_info = {}
part_info['machine_id'] = partid part_info['machine_id'] = partid
part_info['num_nodes'] = nid_count[partid] part_info['num_nodes'] = int(nid_count[partid])
part_info['num_edges'] = eid_count[partid] part_info['num_edges'] = int(eid_count[partid])
self._partition_meta_data.append(part_info) self._partition_meta_data.append(part_info)
# Get partid2nids # Get partid2nids
self._partid2nids = [] self._partid2nids = []
...@@ -123,7 +129,7 @@ class GraphPartitionBook: ...@@ -123,7 +129,7 @@ class GraphPartitionBook:
self._partid2eids.append(part_eids) self._partid2eids.append(part_eids)
# Get nidg2l # Get nidg2l
self._nidg2l = [None] * self._num_partitions self._nidg2l = [None] * self._num_partitions
global_id = self._graph.ndata[NID] global_id = part_graph.ndata[NID]
max_global_id = np.amax(F.asnumpy(global_id)) max_global_id = np.amax(F.asnumpy(global_id))
# TODO(chao): support int32 index # TODO(chao): support int32 index
g2l = F.zeros((max_global_id+1), F.int64, F.context(global_id)) g2l = F.zeros((max_global_id+1), F.int64, F.context(global_id))
...@@ -131,7 +137,7 @@ class GraphPartitionBook: ...@@ -131,7 +137,7 @@ class GraphPartitionBook:
self._nidg2l[self._part_id] = g2l self._nidg2l[self._part_id] = g2l
# Get eidg2l # Get eidg2l
self._eidg2l = [None] * self._num_partitions self._eidg2l = [None] * self._num_partitions
global_id = self._graph.edata[EID] global_id = part_graph.edata[EID]
max_global_id = np.amax(F.asnumpy(global_id)) max_global_id = np.amax(F.asnumpy(global_id))
# TODO(chao): support int32 index # TODO(chao): support int32 index
g2l = F.zeros((max_global_id+1), F.int64, F.context(global_id)) g2l = F.zeros((max_global_id+1), F.int64, F.context(global_id))
...@@ -149,9 +155,9 @@ class GraphPartitionBook: ...@@ -149,9 +155,9 @@ class GraphPartitionBook:
graph_name : str graph_name : str
The graph name The graph name
""" """
self._meta, self._nid2partid, self._eid2partid = _move_metadata_to_shared_mam( self._meta, self._nid2partid, self._eid2partid = _move_metadata_to_shared_mem(
graph_name, self.num_nodes(), self.num_edges(), self._part_id, self._num_partitions, graph_name, self._num_nodes(), self._num_edges(), self._part_id, self._num_partitions,
self._nid2partid, self._eid2partid) self._nid2partid, self._eid2partid, False)
def num_partitions(self): def num_partitions(self):
"""Return the number of partitions. """Return the number of partitions.
...@@ -169,7 +175,6 @@ class GraphPartitionBook: ...@@ -169,7 +175,6 @@ class GraphPartitionBook:
The meta data includes: The meta data includes:
* The machine ID. * The machine ID.
* The machine IP address.
* Number of nodes and edges of each partition. * Number of nodes and edges of each partition.
Examples Examples
...@@ -186,12 +191,12 @@ class GraphPartitionBook: ...@@ -186,12 +191,12 @@ class GraphPartitionBook:
""" """
return self._partition_meta_data return self._partition_meta_data
def num_nodes(self): def _num_nodes(self):
""" The total number of nodes """ The total number of nodes
""" """
return len(self._nid2partid) return len(self._nid2partid)
def num_edges(self): def _num_edges(self):
""" The total number of edges """ The total number of edges
""" """
return len(self._eid2partid) return len(self._eid2partid)
...@@ -227,7 +232,7 @@ class GraphPartitionBook: ...@@ -227,7 +232,7 @@ class GraphPartitionBook:
return F.gather_row(self._eid2partid, eids) return F.gather_row(self._eid2partid, eids)
def partid2nids(self, partid): def partid2nids(self, partid):
"""From partition id to node IDs """From partition id to global node IDs
Parameters Parameters
---------- ----------
...@@ -242,7 +247,7 @@ class GraphPartitionBook: ...@@ -242,7 +247,7 @@ class GraphPartitionBook:
return self._partid2nids[partid] return self._partid2nids[partid]
def partid2eids(self, partid): def partid2eids(self, partid):
"""From partition id to edge IDs """From partition id to global edge IDs
Parameters Parameters
---------- ----------
...@@ -309,11 +314,7 @@ class GraphPartitionBook: ...@@ -309,11 +314,7 @@ class GraphPartitionBook:
DGLGraph DGLGraph
The graph of the partition. The graph of the partition.
""" """
if partid != self._part_id: #TODO(zhengda) add implementation later.
raise RuntimeError('Now GraphPartitionBook does not support \
getting remote partitions.')
return self._graph
def get_node_size(self): def get_node_size(self):
"""Get the number of nodes in the current partition. """Get the number of nodes in the current partition.
...@@ -335,6 +336,260 @@ class GraphPartitionBook: ...@@ -335,6 +336,260 @@ class GraphPartitionBook:
""" """
return self._edge_size return self._edge_size
class RangePartitionBook:
"""RangePartitionBook is used to store parition information.
Parameters
----------
part_id : int
partition id of current partition book
num_parts : int
number of total partitions
node_map : tensor
the exclusive end of the global node Id range of each partition;
partition i owns the node Ids in [node_map[i-1], node_map[i])
edge_map : tensor
the exclusive end of the global edge Id range of each partition,
following the same convention as node_map
"""
def __init__(self, part_id, num_parts, node_map, edge_map):
assert part_id >= 0, 'part_id cannot be a negative number.'
assert num_parts > 0, 'num_parts must be greater than zero.'
self._partid = part_id
self._num_partitions = num_parts
node_map = utils.toindex(node_map)
edge_map = utils.toindex(edge_map)
self._node_map = node_map.tonumpy()
self._edge_map = edge_map.tonumpy()
# Get meta data of the partition book
self._partition_meta_data = []
for partid in range(self._num_partitions):
nrange_start = node_map[partid - 1] if partid > 0 else 0
nrange_end = node_map[partid]
erange_start = edge_map[partid - 1] if partid > 0 else 0
erange_end = edge_map[partid]
part_info = {}
part_info['machine_id'] = partid
part_info['num_nodes'] = int(nrange_end - nrange_start)
part_info['num_edges'] = int(erange_end - erange_start)
self._partition_meta_data.append(part_info)
def shared_memory(self, graph_name):
"""Move data to shared memory.
Parameters
----------
graph_name : str
The graph name
"""
self._meta = _move_metadata_to_shared_mem(
graph_name, self._num_nodes(), self._num_edges(), self._partid,
self._num_partitions, F.tensor(self._node_map), F.tensor(self._edge_map), True)
def num_partitions(self):
"""Return the number of partitions.
Returns
-------
int
number of partitions
"""
return self._num_partitions
def _num_nodes(self):
""" The total number of nodes
"""
return int(self._node_map[-1])
def _num_edges(self):
""" The total number of edges
"""
return int(self._edge_map[-1])
def metadata(self):
"""Return the partition meta data.
The meta data includes:
* The machine ID.
* Number of nodes and edges of each partition.
Examples
--------
>>> print(g.get_partition_book().metadata())
>>> [{'machine_id' : 0, 'num_nodes' : 3000, 'num_edges' : 5000},
... {'machine_id' : 1, 'num_nodes' : 2000, 'num_edges' : 4888},
... ...]
Returns
-------
list[dict[str, any]]
Meta data of each partition.
"""
return self._partition_meta_data
def nid2partid(self, nids):
"""From global node IDs to partition IDs
Parameters
----------
nids : tensor
global node IDs
Returns
-------
tensor
partition IDs
"""
nids = utils.toindex(nids)
ret = np.searchsorted(self._node_map, nids.tonumpy(), side='right')
ret = utils.toindex(ret)
return ret.tousertensor()
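Because a range partition book only stores the exclusive end Id of each partition, mapping a node Id to its partition reduces to a binary search. A small NumPy sketch of the logic used above:

    import numpy as np

    # Exclusive range ends: partition 0 owns nodes [0, 3000),
    # partition 1 owns [3000, 5000), partition 2 owns [5000, 6000).
    node_map = np.array([3000, 5000, 6000])
    nids = np.array([0, 2999, 3000, 5999])
    part_ids = np.searchsorted(node_map, nids, side='right')
    assert list(part_ids) == [0, 0, 1, 2]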
def eid2partid(self, eids):
"""From global edge IDs to partition IDs
Parameters
----------
eids : tensor
global edge IDs
Returns
-------
tensor
partition IDs
"""
eids = utils.toindex(eids)
ret = np.searchsorted(self._edge_map, eids.tonumpy(), side='right')
ret = utils.toindex(ret)
return ret.tousertensor()
def partid2nids(self, partid):
"""From partition id to global node IDs
Parameters
----------
partid : int
partition id
Returns
-------
tensor
node IDs
"""
# TODO do we need to cache it?
start = self._node_map[partid - 1] if partid > 0 else 0
end = self._node_map[partid]
return F.arange(start, end)
def partid2eids(self, partid):
"""From partition id to global edge IDs
Parameters
----------
partid : int
partition id
Returns
-------
tensor
edge IDs
"""
# TODO do we need to cache it?
start = self._edge_map[partid - 1] if partid > 0 else 0
end = self._edge_map[partid]
return F.arange(start, end)
def nid2localnid(self, nids, partid):
"""Get local node IDs within the given partition.
Parameters
----------
nids : tensor
global node IDs
partid : int
partition ID
Returns
-------
tensor
local node IDs
"""
if partid != self._partid:
raise RuntimeError('RangePartitionBook does not support \
getting nid2localnid of a remote partition.')
start = self._node_map[partid - 1] if partid > 0 else 0
return nids - start
def eid2localeid(self, eids, partid):
"""Get the local edge ids within the given partition.
Parameters
----------
eids : tensor
global edge ids
partid : int
partition ID
Returns
-------
tensor
local edge ids
"""
if partid != self._partid:
raise RuntimeError('RangePartitionBook does not support \
getting eid2localeid of a remote partition.')
start = self._edge_map[partid - 1] if partid > 0 else 0
return eids - start
def get_partition(self, partid):
"""Get the graph of one partition.
Parameters
----------
partid : int
Partition ID.
Returns
-------
DGLGraph
The graph of the partition.
"""
#TODO(zhengda) add implementation later.
def get_node_size(self):
"""Get the number of nodes in the current partition.
Return
------
int
The number of nodes in current partition
"""
range_start = self._node_map[self._partid - 1] if self._partid > 0 else 0
range_end = self._node_map[self._partid]
return range_end - range_start
def get_edge_size(self):
"""Get the number of edges in the current partition.
Return
------
int
The number of edges in current partition
"""
range_start = self._edge_map[self._partid - 1] if self._partid > 0 else 0
range_end = self._edge_map[self._partid]
return range_end - range_start
class PartitionPolicy(object): class PartitionPolicy(object):
"""Wrapper for GraphPartitionBook and RangePartitionBook. """Wrapper for GraphPartitionBook and RangePartitionBook.
......
...@@ -85,6 +85,7 @@ from .. import backend as F ...@@ -85,6 +85,7 @@ from .. import backend as F
from ..base import NID, EID from ..base import NID, EID
from ..data.utils import load_graphs, save_graphs, load_tensors, save_tensors from ..data.utils import load_graphs, save_graphs, load_tensors, save_tensors
from ..transform import metis_partition_assignment, partition_graph_with_halo from ..transform import metis_partition_assignment, partition_graph_with_halo
from .graph_partition_book import GraphPartitionBook, RangePartitionBook
def load_partition(conf_file, part_id): def load_partition(conf_file, part_id):
''' Load data of a partition from the data path in the DistGraph server. ''' Load data of a partition from the data path in the DistGraph server.
...@@ -115,8 +116,8 @@ def load_partition(conf_file, part_id): ...@@ -115,8 +116,8 @@ def load_partition(conf_file, part_id):
All node features. All node features.
dict of tensors dict of tensors
All edge features. All edge features.
(int, int, NumPy ndarray, Numpy ndarray)) GraphPartitionBook
The metadata of the global graph: number of nodes, number of edges, node map, edge map. The global partition information.
''' '''
with open(conf_file) as conf_f: with open(conf_file) as conf_f:
part_metadata = json.load(conf_f) part_metadata = json.load(conf_f)
...@@ -134,24 +135,27 @@ def load_partition(conf_file, part_id): ...@@ -134,24 +135,27 @@ def load_partition(conf_file, part_id):
assert 'num_edges' in part_metadata, "cannot get the number of edges of the global graph." assert 'num_edges' in part_metadata, "cannot get the number of edges of the global graph."
assert 'node_map' in part_metadata, "cannot get the node map." assert 'node_map' in part_metadata, "cannot get the node map."
assert 'edge_map' in part_metadata, "cannot get the edge map." assert 'edge_map' in part_metadata, "cannot get the edge map."
node_map = np.load(part_metadata['node_map'], allow_pickle=True)
edge_map = np.load(part_metadata['edge_map'], allow_pickle=True) # If this is a range partitioning, node_map actually stores a list, whose elements
meta = (part_metadata['num_nodes'], part_metadata['num_edges'], node_map, edge_map, num_parts) # indicate the boundary of range partitioning. Otherwise, node_map stores a filename
# that contains node map in a NumPy array.
is_range_part = isinstance(part_metadata['node_map'], list)
node_map = part_metadata['node_map'] if is_range_part else np.load(part_metadata['node_map'])
edge_map = part_metadata['edge_map'] if is_range_part else np.load(part_metadata['edge_map'])
assert isinstance(node_map, list) == isinstance(edge_map, list), \
"The node map and edge map need to have the same format"
assert NID in graph.ndata, "the partition graph should contain node mapping to global node Id" assert NID in graph.ndata, "the partition graph should contain node mapping to global node Id"
assert EID in graph.edata, "the partition graph should contain edge mapping to global edge Id" assert EID in graph.edata, "the partition graph should contain edge mapping to global edge Id"
# TODO we need to fix this. DGL backend doesn't support boolean or byte. if is_range_part:
# int64 is unnecessary. gpb = RangePartitionBook(part_id, num_parts, np.array(node_map), np.array(edge_map))
node_map = F.zerocopy_from_numpy(node_map) else:
part_ids = F.gather_row(node_map, graph.ndata[NID]) gpb = GraphPartitionBook(part_id, num_parts, node_map, edge_map, graph)
graph.ndata['local_node'] = F.astype(part_ids == part_id, F.int64) return graph, node_feats, edge_feats, gpb
edge_map = F.zerocopy_from_numpy(edge_map)
part_ids = F.gather_row(edge_map, graph.edata[EID])
graph.edata['local_edge'] = F.astype(part_ids == part_id, F.int64)
return graph, node_feats, edge_feats, meta
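With this change, the node_map/edge_map entries of the partition config come in one of two forms and load_partition dispatches on the type. A hedged sketch of the two shapes (file names and counts are examples only):

    # Range partitioning (reshuffle=True): the map is a list of exclusive range ends.
    range_conf = {'node_map': [3000, 6000], 'edge_map': [5000, 9888]}
    # Per-node partitioning (reshuffle=False): the map is the path of a saved NumPy array.
    file_conf = {'node_map': 'out/node_map.npy', 'edge_map': 'out/edge_map.npy'}

    for conf in (range_conf, file_conf):
        is_range_part = isinstance(conf['node_map'], list)
        print('range partition book' if is_range_part else 'per-node partition book')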
def partition_graph(g, graph_name, num_parts, out_path, num_hops=1, part_method="metis"): def partition_graph(g, graph_name, num_parts, out_path, num_hops=1, part_method="metis",
reshuffle=True):
''' Partition a graph for distributed training and store the partitions on files. ''' Partition a graph for distributed training and store the partitions on files.
The partitioning occurs in three steps: 1) run a partition algorithm (e.g., Metis) to The partitioning occurs in three steps: 1) run a partition algorithm (e.g., Metis) to
...@@ -192,31 +196,37 @@ def partition_graph(g, graph_name, num_parts, out_path, num_hops=1, part_method= ...@@ -192,31 +196,37 @@ def partition_graph(g, graph_name, num_parts, out_path, num_hops=1, part_method=
The partition method. It supports "random" and "metis". The partition method. It supports "random" and "metis".
out_path : str out_path : str
The path to store the files for all partitioned data. The path to store the files for all partitioned data.
reshuffle : bool
Reshuffle nodes and edges so that nodes and edges in a partition are in
a contiguous Id range.
''' '''
if num_parts == 1: if num_parts == 1:
client_parts = {0: g} client_parts = {0: g}
node_parts = F.zeros((g.number_of_nodes(),), F.int64, F.cpu()) node_parts = F.zeros((g.number_of_nodes(),), F.int64, F.cpu())
g.ndata[NID] = F.arange(0, g.number_of_nodes()) g.ndata[NID] = F.arange(0, g.number_of_nodes())
g.edata[EID] = F.arange(0, g.number_of_edges()) g.edata[EID] = F.arange(0, g.number_of_edges())
g.ndata['inner_node'] = F.ones((g.number_of_nodes(),), F.int64, F.cpu())
g.edata['inner_edge'] = F.ones((g.number_of_edges(),), F.int64, F.cpu())
g.ndata['orig_id'] = F.arange(0, g.number_of_nodes())
elif part_method == 'metis': elif part_method == 'metis':
node_parts = metis_partition_assignment(g, num_parts) node_parts = metis_partition_assignment(g, num_parts)
client_parts = partition_graph_with_halo(g, node_parts, num_hops) client_parts = partition_graph_with_halo(g, node_parts, num_hops, reshuffle=reshuffle)
elif part_method == 'random': elif part_method == 'random':
node_parts = dgl.random.choice(num_parts, g.number_of_nodes()) node_parts = dgl.random.choice(num_parts, g.number_of_nodes())
client_parts = partition_graph_with_halo(g, node_parts, num_hops) client_parts = partition_graph_with_halo(g, node_parts, num_hops, reshuffle=reshuffle)
else: else:
raise Exception('Unknown partitioning method: ' + part_method) raise Exception('Unknown partitioning method: ' + part_method)
# Let's calculate edge assignment. # Let's calculate edge assignment.
# TODO(zhengda) we should replace int64 with int16. int16 should be sufficient. # TODO(zhengda) we should replace int64 with int16. int16 should be sufficient.
if num_parts > 1: if not reshuffle:
edge_parts = np.zeros((g.number_of_edges(),), dtype=np.int64) - 1 edge_parts = np.zeros((g.number_of_edges(),), dtype=np.int64) - 1
num_edges = 0 num_edges = 0
lnodes_list = [] # The node ids of each partition lnodes_list = [] # The node ids of each partition
ledges_list = [] # The edge Ids of each partition ledges_list = [] # The edge Ids of each partition
for part_id in range(num_parts): for part_id in range(num_parts):
part = client_parts[part_id] part = client_parts[part_id]
local_nodes = F.boolean_mask(part.ndata[NID], part.ndata['inner_node'] == 1) local_nodes = F.boolean_mask(part.ndata[NID], part.ndata['inner_node'])
local_edges = F.asnumpy(g.in_edges(local_nodes, form='eid')) local_edges = F.asnumpy(g.in_edges(local_nodes, form='eid'))
edge_parts[local_edges] = part_id edge_parts[local_edges] = part_id
num_edges += len(local_edges) num_edges += len(local_edges)
...@@ -224,23 +234,53 @@ def partition_graph(g, graph_name, num_parts, out_path, num_hops=1, part_method= ...@@ -224,23 +234,53 @@ def partition_graph(g, graph_name, num_parts, out_path, num_hops=1, part_method=
ledges_list.append(local_edges) ledges_list.append(local_edges)
assert num_edges == g.number_of_edges() assert num_edges == g.number_of_edges()
else: else:
edge_parts = np.zeros((g.number_of_edges(),), dtype=np.int64) num_edges = 0
num_nodes = 0
lnodes_list = [] # The node ids of each partition
ledges_list = [] # The edge Ids of each partition
for part_id in range(num_parts):
part = client_parts[part_id]
num_local_nodes = F.asnumpy(F.sum(part.ndata['inner_node'], 0))
# To get the edges in the input graph, we should use original node Ids.
local_nodes = F.boolean_mask(part.ndata['orig_id'], part.ndata['inner_node'])
num_local_edges = F.asnumpy(F.sum(g.in_degrees(local_nodes), 0))
num_edges += int(num_local_edges)
num_nodes += int(num_local_nodes)
lnodes_list.append(num_nodes)
ledges_list.append(num_edges)
assert num_edges == g.number_of_edges()
assert num_nodes == g.number_of_nodes()
os.makedirs(out_path, mode=0o775, exist_ok=True) os.makedirs(out_path, mode=0o775, exist_ok=True)
tot_num_inner_edges = 0 tot_num_inner_edges = 0
out_path = os.path.abspath(out_path) out_path = os.path.abspath(out_path)
node_part_file = os.path.join(out_path, "node_map")
edge_part_file = os.path.join(out_path, "edge_map") # Without reshuffling, we have to store the entire node/edge mapping in a file.
np.save(node_part_file, F.asnumpy(node_parts), allow_pickle=True) if not reshuffle:
np.save(edge_part_file, edge_parts, allow_pickle=True) node_part_file = os.path.join(out_path, "node_map")
edge_part_file = os.path.join(out_path, "edge_map")
np.save(node_part_file, F.asnumpy(node_parts), allow_pickle=False)
np.save(edge_part_file, edge_parts, allow_pickle=False)
node_map_val = node_part_file + ".npy"
edge_map_val = edge_part_file + ".npy"
else:
# With reshuffling, we can ensure that all nodes and edges are reshuffled
# and are in a contiguous Id space.
if num_parts > 1:
node_map_val = lnodes_list
edge_map_val = ledges_list
else:
node_map_val = [g.number_of_nodes()]
edge_map_val = [g.number_of_edges()]
part_metadata = {'graph_name': graph_name, part_metadata = {'graph_name': graph_name,
'num_nodes': g.number_of_nodes(), 'num_nodes': g.number_of_nodes(),
'num_edges': g.number_of_edges(), 'num_edges': g.number_of_edges(),
'part_method': part_method, 'part_method': part_method,
'num_parts': num_parts, 'num_parts': num_parts,
'halo_hops': num_hops, 'halo_hops': num_hops,
'node_map': node_part_file + ".npy", 'node_map': node_map_val,
'edge_map': edge_part_file + ".npy"} 'edge_map': edge_map_val}
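With reshuffling, the per-partition inner-node/edge counts are accumulated into cumulative boundaries, so the metadata stays O(num_parts) instead of O(num_nodes). A small worked example of the accumulation (counts are made up):

    # Hypothetical inner-node counts of 3 partitions after reshuffling.
    part_node_counts = [3000, 2000, 1000]
    lnodes_list, total = [], 0
    for cnt in part_node_counts:
        total += cnt
        lnodes_list.append(total)
    # node_map boundaries: partition i owns nodes [lnodes_list[i-1], lnodes_list[i]).
    assert lnodes_list == [3000, 5000, 6000]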
for part_id in range(num_parts): for part_id in range(num_parts):
part = client_parts[part_id] part = client_parts[part_id]
...@@ -248,8 +288,15 @@ def partition_graph(g, graph_name, num_parts, out_path, num_hops=1, part_method= ...@@ -248,8 +288,15 @@ def partition_graph(g, graph_name, num_parts, out_path, num_hops=1, part_method=
node_feats = {} node_feats = {}
edge_feats = {} edge_feats = {}
if num_parts > 1: if num_parts > 1:
local_nodes = lnodes_list[part_id] if reshuffle and part_id == 0:
local_edges = ledges_list[part_id] local_nodes = F.arange(0, lnodes_list[part_id])
local_edges = F.arange(0, ledges_list[part_id])
elif reshuffle:
local_nodes = F.arange(lnodes_list[part_id - 1], lnodes_list[part_id])
local_edges = F.arange(ledges_list[part_id - 1], ledges_list[part_id])
else:
local_nodes = lnodes_list[part_id]
local_edges = ledges_list[part_id]
print('part {} has {} nodes and {} edges.'.format( print('part {} has {} nodes and {} edges.'.format(
part_id, part.number_of_nodes(), part.number_of_edges())) part_id, part.number_of_nodes(), part.number_of_edges()))
print('{} nodes and {} edges are inside the partition'.format( print('{} nodes and {} edges are inside the partition'.format(
......
...@@ -11,7 +11,6 @@ from . import ndarray as nd ...@@ -11,7 +11,6 @@ from . import ndarray as nd
from . import backend as F from . import backend as F
from .graph_index import from_coo from .graph_index import from_coo
from .graph_index import _get_halo_subgraph_inner_node from .graph_index import _get_halo_subgraph_inner_node
from .graph_index import _get_halo_subgraph_inner_edge
from .graph import unbatch from .graph import unbatch
from .convert import graph, bipartite from .convert import graph, bipartite
from . import utils from . import utils
...@@ -549,9 +548,48 @@ def remove_self_loop(g): ...@@ -549,9 +548,48 @@ def remove_self_loop(g):
new_g.add_edges(src[non_self_edges_idx], dst[non_self_edges_idx]) new_g.add_edges(src[non_self_edges_idx], dst[non_self_edges_idx])
return new_g return new_g
def partition_graph_with_halo(g, node_part, num_hops): def reorder_nodes(g, new_node_ids):
''' This is to partition a graph. Each partition contains HALO nodes """ Generate a new graph with new node Ids.
so that we can generate NodeFlow in each partition correctly.
We assign a new node Id to each node in the input graph. This results in
a new graph.
Parameters
----------
g : DGLGraph
The input graph
new_node_ids : a tensor
The new node Ids
Returns
-------
DGLGraph
The graph with new node Ids.
"""
assert len(new_node_ids) == g.number_of_nodes(), \
"The number of new node ids must match #nodes in the graph."
new_node_ids = utils.toindex(new_node_ids)
sorted_ids, idx = F.sort_1d(new_node_ids.tousertensor())
assert F.asnumpy(sorted_ids[0]) == 0 \
and F.asnumpy(sorted_ids[-1]) == g.number_of_nodes() - 1, \
"The new node Ids are incorrect."
new_gidx = _CAPI_DGLReorderGraph(g._graph, new_node_ids.todgltensor())
new_g = DGLGraph(new_gidx)
new_g.ndata['orig_id'] = idx
return new_g
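The 'orig_id' data attached here is simply the inverse permutation of new_node_ids. A NumPy sketch of that relationship (it does not call DGL):

    import numpy as np

    new_node_ids = np.array([2, 0, 3, 1])   # index = old Id, value = new Id
    orig_id = np.argsort(new_node_ids)      # index = new Id, value = old Id
    assert list(orig_id) == [1, 3, 0, 2]
    # Round trip: mapping an old Id to its new Id and back returns the old Id.
    assert all(orig_id[new_node_ids] == np.arange(4))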
def partition_graph_with_halo(g, node_part, extra_cached_hops, reshuffle=False):
'''Partition a graph.
Based on the given node assignments for each partition, the function splits
the input graph into subgraphs. A subgraph may contain HALO nodes which do
not belong to the partition of the subgraph but are connected to the nodes
in the partition within a fixed number of hops.
If `reshuffle` is turned on, the function reshuffles node Ids and edge Ids
of the input graph before partitioning. After reshuffling, all nodes and edges
in a partition fall in a contiguous Id range in the input graph.
The partitioned subgraphs have node data 'orig_id', which stores the node Ids
in the original input graph.
Parameters Parameters
------------ ------------
...@@ -563,9 +601,12 @@ def partition_graph_with_halo(g, node_part, num_hops): ...@@ -563,9 +601,12 @@ def partition_graph_with_halo(g, node_part, num_hops):
needs to be the same as the number of nodes of the graph. Each element needs to be the same as the number of nodes of the graph. Each element
indicates the partition Id of a node. indicates the partition Id of a node.
num_hops: int extra_cached_hops: int
The number of hops a HALO node can be accessed. The number of hops a HALO node can be accessed.
reshuffle : bool
Reshuffle nodes so that nodes in the same partition are in a contiguous Id range.
Returns Returns
-------- --------
a dict of DGLGraphs a dict of DGLGraphs
...@@ -573,15 +614,41 @@ def partition_graph_with_halo(g, node_part, num_hops): ...@@ -573,15 +614,41 @@ def partition_graph_with_halo(g, node_part, num_hops):
''' '''
assert len(node_part) == g.number_of_nodes() assert len(node_part) == g.number_of_nodes()
node_part = utils.toindex(node_part) node_part = utils.toindex(node_part)
subgs = _CAPI_DGLPartitionWithHalo(g._graph, node_part.todgltensor(), num_hops) if reshuffle:
node_part = node_part.tousertensor()
sorted_part, new2old_map = F.sort_1d(node_part)
new_node_ids = F.gather_row(F.arange(0, g.number_of_nodes()), new2old_map)
g = reorder_nodes(g, new_node_ids)
node_part = utils.toindex(sorted_part)
# We reassign edges in in-CSR. In this way, after partitioning, we can ensure
# that all edges in a partition are in the contiguous Id space.
orig_eids = _CAPI_DGLReassignEdges(g._graph, True)
orig_eids = utils.toindex(orig_eids)
g.edata['orig_id'] = orig_eids.tousertensor()
subgs = _CAPI_DGLPartitionWithHalo(g._graph, node_part.todgltensor(), extra_cached_hops)
subg_dict = {} subg_dict = {}
node_part = node_part.tousertensor()
for i, subg in enumerate(subgs): for i, subg in enumerate(subgs):
inner_node = _get_halo_subgraph_inner_node(subg) inner_node = _get_halo_subgraph_inner_node(subg)
inner_edge = _get_halo_subgraph_inner_edge(subg)
subg = g._create_subgraph(subg, subg.induced_nodes, subg.induced_edges) subg = g._create_subgraph(subg, subg.induced_nodes, subg.induced_edges)
inner_node = F.zerocopy_from_dlpack(inner_node.to_dlpack()) inner_node = F.zerocopy_from_dlpack(inner_node.to_dlpack())
subg.ndata['inner_node'] = inner_node subg.ndata['inner_node'] = inner_node
inner_edge = F.zerocopy_from_dlpack(inner_edge.to_dlpack()) subg.ndata['part_id'] = F.gather_row(node_part, subg.parent_nid)
if reshuffle:
subg.ndata['orig_id'] = F.gather_row(g.ndata['orig_id'], subg.ndata[NID])
subg.edata['orig_id'] = F.gather_row(g.edata['orig_id'], subg.edata[EID])
if extra_cached_hops >= 1:
inner_edge = F.zeros((subg.number_of_edges(),), F.int64, F.cpu())
inner_nids = F.nonzero_1d(subg.ndata['inner_node'])
# TODO(zhengda) we need to fix utils.toindex() to avoid the dtype cast below.
inner_nids = F.astype(inner_nids, F.int64)
inner_eids = subg.in_edges(inner_nids, form='eid')
inner_edge = F.scatter_row(inner_edge, inner_eids,
F.ones((len(inner_eids),), F.dtype(inner_edge), F.cpu()))
else:
inner_edge = F.ones((subg.number_of_edges(),), F.int64, F.cpu())
subg.edata['inner_edge'] = inner_edge subg.edata['inner_edge'] = inner_edge
subg_dict[i] = subg subg_dict[i] = subg
return subg_dict return subg_dict
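With reshuffle=True the idea is to renumber nodes so that each partition's nodes occupy a contiguous Id range before the halo subgraphs are extracted. A NumPy sketch of one way to obtain such a renumbering (an illustration of the idea, not a transcription of the code above):

    import numpy as np

    node_part = np.array([1, 0, 1, 0, 2])           # partition assignment per node
    new2old = np.argsort(node_part, kind='stable')  # nodes grouped by partition
    new_node_ids = np.empty_like(new2old)
    new_node_ids[new2old] = np.arange(len(node_part))
    # Partition 0 gets Ids 0..1, partition 1 gets 2..3, partition 2 gets 4.
    assert list(node_part[new2old]) == [0, 0, 1, 1, 2]
    assert list(new_node_ids) == [2, 0, 3, 1, 4]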
...@@ -613,14 +680,23 @@ def metis_partition_assignment(g, k): ...@@ -613,14 +680,23 @@ def metis_partition_assignment(g, k):
node_part = utils.toindex(node_part) node_part = utils.toindex(node_part)
return node_part.tousertensor() return node_part.tousertensor()
def metis_partition(g, k, extra_cached_hops=0): def metis_partition(g, k, extra_cached_hops=0, reshuffle=False):
''' This is to partition a graph with Metis partitioning. ''' This is to partition a graph with Metis partitioning.
Metis assigns vertices to partitions. This API constructs graphs with the vertices assigned Metis assigns vertices to partitions. This API constructs subgraphs with the vertices assigned
to the partitions and their incoming edges. to the partitions and their incoming edges. A subgraph may contain HALO nodes which do
not belong to the partition of the subgraph but are connected to the nodes
in the partition within a fixed number of hops.
The partitioned graph is stored in DGLGraph. The DGLGraph has the `part_id` If `reshuffle` is turned on, the function reshuffles node Ids and edge Ids
node data that indicates the partition a node belongs to. of the input graph before partitioning. After reshuffling, all nodes and edges
in a partition fall in a contiguous Id range in the input graph.
The partitioned subgraphs have node data 'orig_id', which stores the node Ids
in the original input graph.
The partitioned subgraph is stored in DGLGraph. The DGLGraph has the `part_id`
node data that indicates the partition a node belongs to. The subgraphs do not contain
the node/edge data in the input graph.
Parameters Parameters
------------ ------------
...@@ -633,6 +709,9 @@ def metis_partition(g, k, extra_cached_hops=0): ...@@ -633,6 +709,9 @@ def metis_partition(g, k, extra_cached_hops=0):
extra_cached_hops: int extra_cached_hops: int
The number of hops a HALO node can be accessed. The number of hops a HALO node can be accessed.
reshuffle : bool
Reshuffle nodes so that nodes in the same partition are in a contiguous Id range.
Returns Returns
-------- --------
a dict of DGLGraphs a dict of DGLGraphs
...@@ -645,14 +724,8 @@ def metis_partition(g, k, extra_cached_hops=0): ...@@ -645,14 +724,8 @@ def metis_partition(g, k, extra_cached_hops=0):
if len(node_part) == 0: if len(node_part) == 0:
return None return None
node_part = utils.toindex(node_part)
# Then we split the original graph into parts based on the METIS partitioning results. # Then we split the original graph into parts based on the METIS partitioning results.
parts = partition_graph_with_halo(g, node_part, extra_cached_hops) return partition_graph_with_halo(g, node_part, extra_cached_hops, reshuffle)
node_part = node_part.tousertensor()
for part_id in parts:
part = parts[part_id]
part.ndata['part_id'] = F.gather_row(node_part, part.parent_nid)
return parts
def compact_graphs(graphs, always_preserve=None): def compact_graphs(graphs, always_preserve=None):
"""Given a list of graphs with the same set of nodes, find and eliminate the common """Given a list of graphs with the same set of nodes, find and eliminate the common
...@@ -953,10 +1026,6 @@ def remove_edges(g, edge_ids): ...@@ -953,10 +1026,6 @@ def remove_edges(g, edge_ids):
The nodes are preserved. The nodes are preserved.
Note: `remove_edges` is slow especially when removing a small number of edges from
a large graph. It creates a new graph with all remaining edges and return the new graph.
Please use it with caution especially when using it in mini-batch training.
Parameters Parameters
---------- ----------
graph : DGLHeteroGraph graph : DGLHeteroGraph
......
...@@ -441,6 +441,22 @@ void CSRSort_(CSRMatrix* csr) { ...@@ -441,6 +441,22 @@ void CSRSort_(CSRMatrix* csr) {
}); });
} }
CSRMatrix CSRReorder(CSRMatrix csr, runtime::NDArray new_row_ids, runtime::NDArray new_col_ids) {
CSRMatrix ret;
ATEN_CSR_SWITCH(csr, XPU, IdType, "CSRReorder", {
ret = impl::CSRReorder<XPU, IdType>(csr, new_row_ids, new_col_ids);
});
return ret;
}
COOMatrix COOReorder(COOMatrix coo, runtime::NDArray new_row_ids, runtime::NDArray new_col_ids) {
COOMatrix ret;
ATEN_COO_SWITCH(coo, XPU, IdType, "COOReorder", {
ret = impl::COOReorder<XPU, IdType>(coo, new_row_ids, new_col_ids);
});
return ret;
}
CSRMatrix CSRRemove(CSRMatrix csr, IdArray entries) { CSRMatrix CSRRemove(CSRMatrix csr, IdArray entries) {
CSRMatrix ret; CSRMatrix ret;
ATEN_CSR_SWITCH(csr, XPU, IdType, "CSRRemove", { ATEN_CSR_SWITCH(csr, XPU, IdType, "CSRRemove", {
......
...@@ -113,6 +113,12 @@ CSRMatrix CSRSliceMatrix(CSRMatrix csr, runtime::NDArray rows, runtime::NDArray ...@@ -113,6 +113,12 @@ CSRMatrix CSRSliceMatrix(CSRMatrix csr, runtime::NDArray rows, runtime::NDArray
template <DLDeviceType XPU, typename IdType> template <DLDeviceType XPU, typename IdType>
void CSRSort_(CSRMatrix* csr); void CSRSort_(CSRMatrix* csr);
template <DLDeviceType XPU, typename IdType>
CSRMatrix CSRReorder(CSRMatrix csr, runtime::NDArray new_row_ids, runtime::NDArray new_col_ids);
template <DLDeviceType XPU, typename IdType>
COOMatrix COOReorder(COOMatrix coo, runtime::NDArray new_row_ids, runtime::NDArray new_col_ids);
template <DLDeviceType XPU, typename IdType> template <DLDeviceType XPU, typename IdType>
CSRMatrix CSRRemove(CSRMatrix csr, IdArray entries); CSRMatrix CSRRemove(CSRMatrix csr, IdArray entries);
......
...@@ -556,6 +556,8 @@ template CSRMatrix CSRSliceMatrix<kDLCPU, int32_t>( ...@@ -556,6 +556,8 @@ template CSRMatrix CSRSliceMatrix<kDLCPU, int32_t>(
template CSRMatrix CSRSliceMatrix<kDLCPU, int64_t>( template CSRMatrix CSRSliceMatrix<kDLCPU, int64_t>(
CSRMatrix csr, runtime::NDArray rows, runtime::NDArray cols); CSRMatrix csr, runtime::NDArray rows, runtime::NDArray cols);
///////////////////////////// CSRSort /////////////////////////////
template <DLDeviceType XPU, typename IdType> template <DLDeviceType XPU, typename IdType>
void CSRSort_(CSRMatrix* csr) { void CSRSort_(CSRMatrix* csr) {
typedef std::pair<IdType, IdType> ShufflePair; typedef std::pair<IdType, IdType> ShufflePair;
...@@ -597,6 +599,82 @@ void CSRSort_(CSRMatrix* csr) { ...@@ -597,6 +599,82 @@ void CSRSort_(CSRMatrix* csr) {
template void CSRSort_<kDLCPU, int64_t>(CSRMatrix* csr); template void CSRSort_<kDLCPU, int64_t>(CSRMatrix* csr);
template void CSRSort_<kDLCPU, int32_t>(CSRMatrix* csr); template void CSRSort_<kDLCPU, int32_t>(CSRMatrix* csr);
///////////////////////////// CSRReorder /////////////////////////////
template <DLDeviceType XPU, typename IdType>
CSRMatrix CSRReorder(CSRMatrix csr, runtime::NDArray new_row_id_arr,
runtime::NDArray new_col_id_arr) {
CHECK_SAME_DTYPE(csr.indices, new_row_id_arr);
CHECK_SAME_DTYPE(csr.indices, new_col_id_arr);
// Input CSR
const IdType* in_indptr = static_cast<IdType*>(csr.indptr->data);
const IdType* in_indices = static_cast<IdType*>(csr.indices->data);
const IdType* in_data = static_cast<IdType*>(csr.data->data);
int64_t num_rows = csr.num_rows;
int64_t num_cols = csr.num_cols;
int64_t nnz = csr.indices->shape[0];
CHECK_EQ(nnz, in_indptr[num_rows]);
CHECK_EQ(num_rows, new_row_id_arr->shape[0])
<< "The length of the new row Id array needs to equal the number of rows of the CSR matrix";
CHECK_EQ(num_cols, new_col_id_arr->shape[0])
<< "The length of the new col Id array needs to equal the number of cols of the CSR matrix";
// New row/col Ids.
const IdType* new_row_ids = static_cast<IdType*>(new_row_id_arr->data);
const IdType* new_col_ids = static_cast<IdType*>(new_col_id_arr->data);
// Output CSR
NDArray out_indptr_arr = NDArray::Empty({num_rows + 1}, csr.indptr->dtype, csr.indptr->ctx);
NDArray out_indices_arr = NDArray::Empty({nnz}, csr.indices->dtype, csr.indices->ctx);
NDArray out_data_arr = NDArray::Empty({nnz}, csr.data->dtype, csr.data->ctx);
IdType *out_indptr = static_cast<IdType*>(out_indptr_arr->data);
IdType *out_indices = static_cast<IdType*>(out_indices_arr->data);
IdType *out_data = static_cast<IdType*>(out_data_arr->data);
// Compute the length of rows for the new matrix.
std::vector<IdType> new_row_lens(num_rows, -1);
#pragma omp parallel for
for (int64_t i = 0; i < num_rows; i++) {
int64_t new_row_id = new_row_ids[i];
new_row_lens[new_row_id] = in_indptr[i + 1] - in_indptr[i];
}
// Compute the starting location of each row in the new matrix.
out_indptr[0] = 0;
// This is sequential. It should be pretty fast.
for (int64_t i = 0; i < num_rows; i++) {
CHECK_GE(new_row_lens[i], 0);
out_indptr[i + 1] = out_indptr[i] + new_row_lens[i];
}
CHECK_EQ(out_indptr[num_rows], nnz);
// Copy indices and data in the new order.
// Here I iterate rows in the order of the old matrix.
#pragma omp parallel for
for (int64_t i = 0; i < num_rows; i++) {
const IdType *in_row = in_indices + in_indptr[i];
const IdType *in_row_data = in_data + in_indptr[i];
int64_t new_row_id = new_row_ids[i];
IdType *out_row = out_indices + out_indptr[new_row_id];
IdType *out_row_data = out_data + out_indptr[new_row_id];
int64_t row_len = new_row_lens[new_row_id];
// Here I iterate col indices in a row in the order of the old matrix.
for (int64_t j = 0; j < row_len; j++) {
out_row[j] = new_col_ids[in_row[j]];
out_row_data[j] = in_row_data[j];
}
// TODO(zhengda) maybe we should sort the column indices.
}
return CSRMatrix(num_rows, num_cols,
out_indptr_arr, out_indices_arr, out_data_arr);
}
template CSRMatrix CSRReorder<kDLCPU, int64_t>(CSRMatrix csr, runtime::NDArray new_row_ids,
runtime::NDArray new_col_ids);
template CSRMatrix CSRReorder<kDLCPU, int32_t>(CSRMatrix csr, runtime::NDArray new_row_ids,
runtime::NDArray new_col_ids);
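The same row-permutation algorithm, sketched in NumPy against scipy-style CSR arrays (an illustration of the algorithm, not the DGL kernel; values are made up):

    import numpy as np

    indptr = np.array([0, 2, 3, 5])            # 3 rows, 5 non-zeros
    indices = np.array([0, 1, 1, 0, 2])
    new_row_ids = np.array([2, 0, 1])          # index = old row, value = new row
    new_col_ids = np.array([1, 2, 0])

    # Row lengths permuted to their new positions, then prefix-summed.
    row_lens = np.diff(indptr)
    new_row_lens = np.empty_like(row_lens)
    new_row_lens[new_row_ids] = row_lens
    out_indptr = np.concatenate(([0], np.cumsum(new_row_lens)))

    # Copy each old row into its new slot and relabel the column indices.
    out_indices = np.empty_like(indices)
    for old_row in range(len(row_lens)):
        new_row = new_row_ids[old_row]
        start, end = out_indptr[new_row], out_indptr[new_row + 1]
        out_indices[start:end] = new_col_ids[indices[indptr[old_row]:indptr[old_row + 1]]]

    assert list(out_indptr) == [0, 1, 3, 5]
    assert list(out_indices) == [2, 1, 0, 1, 2]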
} // namespace impl } // namespace impl
} // namespace aten } // namespace aten
} // namespace dgl } // namespace dgl
...@@ -427,6 +427,51 @@ template COOMatrix COOSliceMatrix<kDLCPU, int32_t>( ...@@ -427,6 +427,51 @@ template COOMatrix COOSliceMatrix<kDLCPU, int32_t>(
template COOMatrix COOSliceMatrix<kDLCPU, int64_t>( template COOMatrix COOSliceMatrix<kDLCPU, int64_t>(
COOMatrix coo, runtime::NDArray rows, runtime::NDArray cols); COOMatrix coo, runtime::NDArray rows, runtime::NDArray cols);
///////////////////////////// COOReorder /////////////////////////////
template <DLDeviceType XPU, typename IdType>
COOMatrix COOReorder(COOMatrix coo, runtime::NDArray new_row_id_arr,
runtime::NDArray new_col_id_arr) {
CHECK_SAME_DTYPE(coo.row, new_row_id_arr);
CHECK_SAME_DTYPE(coo.col, new_col_id_arr);
// Input COO
const IdType* in_rows = static_cast<IdType*>(coo.row->data);
const IdType* in_cols = static_cast<IdType*>(coo.col->data);
const IdType* in_data = COOHasData(coo) ? static_cast<IdType*>(coo.data->data) : nullptr;
int64_t num_rows = coo.num_rows;
int64_t num_cols = coo.num_cols;
int64_t nnz = coo.row->shape[0];
CHECK_EQ(num_rows, new_row_id_arr->shape[0])
<< "The length of the new row Id array needs to equal the number of rows of the COO matrix";
CHECK_EQ(num_cols, new_col_id_arr->shape[0])
<< "The length of the new col Id array needs to equal the number of cols of the COO matrix";
// New row/col Ids.
const IdType* new_row_ids = static_cast<IdType*>(new_row_id_arr->data);
const IdType* new_col_ids = static_cast<IdType*>(new_col_id_arr->data);
// Output COO
NDArray out_row_arr = NDArray::Empty({nnz}, coo.row->dtype, coo.row->ctx);
NDArray out_col_arr = NDArray::Empty({nnz}, coo.col->dtype, coo.col->ctx);
NDArray out_data_arr = COOHasData(coo) ? coo.data : NullArray();
IdType *out_row = static_cast<IdType*>(out_row_arr->data);
IdType *out_col = static_cast<IdType*>(out_col_arr->data);
#pragma omp parallel for
for (int64_t i = 0; i < nnz; i++) {
out_row[i] = new_row_ids[in_rows[i]];
out_col[i] = new_col_ids[in_cols[i]];
}
return COOMatrix(num_rows, num_cols, out_row_arr, out_col_arr, out_data_arr);
}
template COOMatrix COOReorder<kDLCPU, int64_t>(COOMatrix coo, runtime::NDArray new_row_ids,
runtime::NDArray new_col_ids);
template COOMatrix COOReorder<kDLCPU, int32_t>(COOMatrix coo, runtime::NDArray new_row_ids,
runtime::NDArray new_col_ids);
} // namespace impl } // namespace impl
} // namespace aten } // namespace aten
} // namespace dgl } // namespace dgl
...@@ -403,7 +403,13 @@ GraphPtr GraphOp::ToBidirectedImmutableGraph(GraphPtr g) { ...@@ -403,7 +403,13 @@ GraphPtr GraphOp::ToBidirectedImmutableGraph(GraphPtr g) {
HaloSubgraph GraphOp::GetSubgraphWithHalo(GraphPtr g, IdArray nodes, int num_hops) { HaloSubgraph GraphOp::GetSubgraphWithHalo(GraphPtr g, IdArray nodes, int num_hops) {
const dgl_id_t *nid = static_cast<dgl_id_t *>(nodes->data); const dgl_id_t *nid = static_cast<dgl_id_t *>(nodes->data);
const auto id_len = nodes->shape[0]; const auto id_len = nodes->shape[0];
// A map contains all nodes in the subgraph.
// The key is the old node Id; the value indicates whether the node is an inner
// node.
std::unordered_map<dgl_id_t, bool> all_nodes; std::unordered_map<dgl_id_t, bool> all_nodes;
// The old Ids of all nodes. We want to preserve the order of the nodes in the
// vector. The first few nodes are the inner nodes in the subgraph.
std::vector<dgl_id_t> old_node_ids(nid, nid + id_len);
std::vector<std::vector<dgl_id_t>> outer_nodes(num_hops); std::vector<std::vector<dgl_id_t>> outer_nodes(num_hops);
for (int64_t i = 0; i < id_len; i++) for (int64_t i = 0; i < id_len; i++)
all_nodes[nid[i]] = true; all_nodes[nid[i]] = true;
...@@ -436,6 +442,7 @@ HaloSubgraph GraphOp::GetSubgraphWithHalo(GraphPtr g, IdArray nodes, int num_hop
auto it = all_nodes.find(src_data[i]);
if (it == all_nodes.end() && num_hops > 0) {
all_nodes[src_data[i]] = false;
old_node_ids.push_back(src_data[i]);
outer_nodes[0].push_back(src_data[i]);
}
}
...@@ -461,22 +468,14 @@ HaloSubgraph GraphOp::GetSubgraphWithHalo(GraphPtr g, IdArray nodes, int num_hop
auto it = all_nodes.find(src_data[i]);
if (it == all_nodes.end()) {
all_nodes[src_data[i]] = false;
old_node_ids.push_back(src_data[i]);
outer_nodes[k].push_back(src_data[i]);
}
}
}
// We assign new Ids to the nodes in the subgraph. We ensure that the HALO
// nodes are behind the input nodes.
std::unordered_map<dgl_id_t, dgl_id_t> old2new;
for (size_t i = 0; i < old_node_ids.size(); i++) {
old2new[old_node_ids[i]] = i;
...@@ -508,6 +507,32 @@ HaloSubgraph GraphOp::GetSubgraphWithHalo(GraphPtr g, IdArray nodes, int num_hop
return halo_subg;
}
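The Id assignment in GetSubgraphWithHalo can be pictured with a small sketch: the inner (input) nodes keep the first subgraph Ids in their input order, and HALO nodes discovered later are appended behind them. The concrete Ids below are hypothetical and only illustrate the old2new mapping built above.

# Sketch of the old-to-new Id mapping for a halo subgraph: inner nodes come
# first (in input order), HALO nodes are appended after them.
inner = [7, 3, 9]        # old Ids of the partition's inner nodes (hypothetical)
halo = [2, 11]           # old Ids of HALO nodes discovered afterwards (hypothetical)
old_node_ids = inner + halo
old2new = {old: new for new, old in enumerate(old_node_ids)}
assert old2new[7] == 0 and old2new[9] == 2 and old2new[2] == 3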
GraphPtr GraphOp::ReorderImmutableGraph(ImmutableGraphPtr ig, IdArray new_order) {
CSRPtr in_csr, out_csr;
COOPtr coo;
// We only need to reorder one of the graph structures.
if (ig->HasInCSR()) {
in_csr = ig->GetInCSR();
auto csrmat = in_csr->ToCSRMatrix();
auto new_csrmat = aten::CSRReorder(csrmat, new_order, new_order);
in_csr = CSRPtr(new CSR(new_csrmat.indptr, new_csrmat.indices, new_csrmat.data));
} else if (ig->HasOutCSR()) {
out_csr = ig->GetOutCSR();
auto csrmat = out_csr->ToCSRMatrix();
auto new_csrmat = aten::CSRReorder(csrmat, new_order, new_order);
out_csr = CSRPtr(new CSR(new_csrmat.indptr, new_csrmat.indices, new_csrmat.data));
} else {
coo = ig->GetCOO();
auto coomat = coo->ToCOOMatrix();
auto new_coomat = aten::COOReorder(coomat, new_order, new_order);
coo = COOPtr(new COO(ig->NumVertices(), new_coomat.row, new_coomat.col));
}
if (in_csr || out_csr)
return GraphPtr(new ImmutableGraph(in_csr, out_csr));
else
return GraphPtr(new ImmutableGraph(coo));
}
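ReorderImmutableGraph applies the same permutation to the rows and columns of whichever structure is available. As a rough reference for the CSR path, here is a SciPy sketch of the reordering semantics (new_ids maps an old node Id to its new node Id; this is an illustration, not the DGL implementation):

import numpy as np
from scipy.sparse import csr_matrix

def reorder_csr(mat, new_ids):
    # new_ids[old] = new; invert it so inv[new] = old, then gather rows and
    # columns in the new order.
    inv = np.empty_like(new_ids)
    inv[new_ids] = np.arange(len(new_ids))
    return mat[inv][:, inv]

mat = csr_matrix(np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]]))
new_ids = np.array([2, 0, 1])   # node i is renamed to new_ids[i]
print(reorder_csr(mat, new_ids).toarray())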
DGL_REGISTER_GLOBAL("transform._CAPI_DGLPartitionWithHalo")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
GraphRef graph = args[0];
...@@ -537,6 +562,7 @@ DGL_REGISTER_GLOBAL("transform._CAPI_DGLPartitionWithHalo")
part_nodes.push_back(it->second);
}
auto graph_ptr = std::dynamic_pointer_cast<ImmutableGraph>(graph.sptr());
CHECK(graph_ptr) << "The input graph has to be an immutable graph";
// When we construct subgraphs, we only access in-edges.
// We need to make sure the in-CSR exists. Otherwise, we'll
// try to construct in-CSR in openmp for loop, which will lead
...@@ -573,6 +599,7 @@ DGL_REGISTER_GLOBAL("graph_index._CAPI_GetHaloSubgraphInnerNodes")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
SubgraphRef g = args[0];
auto gptr = std::dynamic_pointer_cast<HaloSubgraph>(g.sptr());
CHECK(gptr) << "The input subgraph has to be a HaloSubgraph";
*rv = gptr->inner_nodes;
});
...@@ -580,6 +607,7 @@ DGL_REGISTER_GLOBAL("graph_index._CAPI_GetHaloSubgraphInnerEdges")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
SubgraphRef g = args[0];
auto gptr = std::dynamic_pointer_cast<HaloSubgraph>(g.sptr());
CHECK(gptr) << "The input subgraph has to be a HaloSubgraph";
*rv = gptr->inner_edges;
});
...@@ -642,6 +670,42 @@ DGL_REGISTER_GLOBAL("transform._CAPI_DGLToBidirectedMutableGraph")
*rv = GraphOp::ToBidirectedMutableGraph(g.sptr());
});
DGL_REGISTER_GLOBAL("transform._CAPI_DGLReorderGraph")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
GraphRef g = args[0];
const IdArray new_order = args[1];
auto gptr = std::dynamic_pointer_cast<ImmutableGraph>(g.sptr());
CHECK(gptr) << "The input graph has to be an immutable graph";
*rv = GraphOp::ReorderImmutableGraph(gptr, new_order);
});
DGL_REGISTER_GLOBAL("transform._CAPI_DGLReassignEdges")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
GraphRef graph = args[0];
bool is_incsr = args[1];
auto gptr = std::dynamic_pointer_cast<ImmutableGraph>(graph.sptr());
CHECK(gptr) << "We can only reassign edge Ids on immutable graphs";
CSRPtr csr = is_incsr ? gptr->GetInCSR() : gptr->GetOutCSR();
auto csrmat = csr->ToCSRMatrix();
int64_t num_edges = csrmat.data->shape[0];
IdArray new_data = IdArray::Empty({num_edges}, csrmat.data->dtype, csrmat.data->ctx);
// Return the original edge Ids.
*rv = new_data;
// TODO(zhengda) I need to invalidate out-CSR and COO.
// Generate new edge Ids.
// TODO(zhengda) after assignment, we actually don't need to store them
// physically.
ATEN_ID_TYPE_SWITCH(new_data->dtype, IdType, {
IdType *typed_new_data = static_cast<IdType*>(new_data->data);
IdType *typed_data = static_cast<IdType*>(csrmat.data->data);
for (int64_t i = 0; i < num_edges; i++) {
typed_new_data[i] = typed_data[i];
typed_data[i] = i;
}
});
});
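For clarity, the edge-Id exchange done by _CAPI_DGLReassignEdges boils down to the following NumPy sketch: hand back the original edge Ids and relabel the edges consecutively in the CSR's storage order (the data values here are hypothetical).

import numpy as np

def reassign_edge_ids(csr_data):
    # Return the original edge Ids and relabel edges by their CSR position.
    orig = csr_data.copy()
    csr_data[:] = np.arange(len(csr_data))
    return orig

data = np.array([5, 2, 0, 4, 1, 3])   # hypothetical original edge Ids in CSR order
orig = reassign_edge_ids(data)
print(orig, data)                      # data becomes [0 1 2 3 4 5]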
DGL_REGISTER_GLOBAL("transform._CAPI_DGLToBidirectedImmutableGraph")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
GraphRef g = args[0];
...
...@@ -244,23 +244,35 @@ def test_partition_with_halo():
@unittest.skipIf(F._default_context_str == 'gpu', reason="METIS doesn't support GPU")
def test_metis_partition():
# TODO(zhengda) Metis fails to partition a small graph.
g = dgl.DGLGraph(create_large_graph_index(1000), readonly=True)
check_metis_partition(g, 0)
check_metis_partition(g, 1)
check_metis_partition(g, 2)
def check_metis_partition(g, extra_hops):
# partitions with extra_hops-hop HALO nodes
subgs = dgl.transform.metis_partition(g, 4, extra_cached_hops=extra_hops)
num_inner_nodes = 0
num_inner_edges = 0
if subgs is not None:
for part_id, subg in subgs.items():
lnode_ids = np.nonzero(F.asnumpy(subg.ndata['inner_node']))[0]
ledge_ids = np.nonzero(F.asnumpy(subg.edata['inner_edge']))[0]
num_inner_nodes += len(lnode_ids)
num_inner_edges += len(ledge_ids)
assert np.sum(F.asnumpy(subg.ndata['part_id']) == part_id) == len(lnode_ids)
assert num_inner_nodes == g.number_of_nodes()
print(g.number_of_edges() - num_inner_edges)
if extra_hops == 0:
return
# partitions with extra_hops-hop HALO nodes and reshuffling nodes
subgs = dgl.transform.metis_partition(g, 4, extra_cached_hops=extra_hops, reshuffle=True)
num_inner_nodes = 0
num_inner_edges = 0
edge_cnts = np.zeros((g.number_of_edges(),))
if subgs is not None:
for part_id, subg in subgs.items():
lnode_ids = np.nonzero(F.asnumpy(subg.ndata['inner_node']))[0]
...@@ -268,9 +280,61 @@ def test_metis_partition():
num_inner_nodes += len(lnode_ids)
num_inner_edges += len(ledge_ids)
assert np.sum(F.asnumpy(subg.ndata['part_id']) == part_id) == len(lnode_ids)
nids = F.asnumpy(subg.ndata[dgl.NID])
# ensure the local node Ids are contiguous.
parent_ids = F.asnumpy(subg.ndata[dgl.NID])
parent_ids = parent_ids[:len(lnode_ids)]
assert np.all(parent_ids == np.arange(parent_ids[0], parent_ids[-1] + 1))
# count the local edges.
parent_ids = F.asnumpy(subg.edata[dgl.EID])[ledge_ids]
edge_cnts[parent_ids] += 1
orig_ids = subg.ndata['orig_id']
inner_node = F.asnumpy(subg.ndata['inner_node'])
for nid in range(subg.number_of_nodes()):
neighs = subg.predecessors(nid)
old_neighs1 = F.gather_row(orig_ids, neighs)
old_nid = F.asnumpy(orig_ids[nid])
old_neighs2 = g.predecessors(old_nid)
# If this is an inner node, it should have the full neighborhood.
if inner_node[nid]:
assert np.all(np.sort(F.asnumpy(old_neighs1)) == np.sort(F.asnumpy(old_neighs2)))
# Normally, local edges are only counted once.
assert np.all(edge_cnts == 1)
assert num_inner_nodes == g.number_of_nodes()
print(g.number_of_edges() - num_inner_edges)
@unittest.skipIf(F._default_context_str == 'gpu', reason="It doesn't support GPU")
def test_reorder_nodes():
g = dgl.DGLGraph(create_large_graph_index(1000), readonly=True)
new_nids = np.random.permutation(g.number_of_nodes())
# TODO(zhengda) we need to test both CSR and COO.
new_g = dgl.transform.reorder_nodes(g, new_nids)
new_in_deg = new_g.in_degrees()
new_out_deg = new_g.out_degrees()
in_deg = g.in_degrees()
out_deg = g.out_degrees()
new_in_deg1 = F.scatter_row(in_deg, F.tensor(new_nids), in_deg)
new_out_deg1 = F.scatter_row(out_deg, F.tensor(new_nids), out_deg)
assert np.all(F.asnumpy(new_in_deg == new_in_deg1))
assert np.all(F.asnumpy(new_out_deg == new_out_deg1))
orig_ids = F.asnumpy(new_g.ndata['orig_id'])
for nid in range(new_g.number_of_nodes()):
neighs = F.asnumpy(new_g.successors(nid))
old_neighs1 = orig_ids[neighs]
old_nid = orig_ids[nid]
old_neighs2 = g.successors(old_nid)
assert np.all(np.sort(old_neighs1) == np.sort(F.asnumpy(old_neighs2)))
neighs = F.asnumpy(new_g.predecessors(nid))
old_neighs1 = orig_ids[neighs]
old_nid = orig_ids[nid]
old_neighs2 = g.predecessors(old_nid)
assert np.all(np.sort(old_neighs1) == np.sort(F.asnumpy(old_neighs2)))
@unittest.skipIf(F._default_context_str == 'gpu', reason="GPU not implemented")
@parametrize_dtype
def test_in_subgraph(index_dtype):
...@@ -615,6 +679,7 @@ def test_cast():
assert F.array_equal(g2dst, gdst)
if __name__ == '__main__':
test_reorder_nodes()
# test_line_graph()
# test_no_backtracking()
# test_reverse()
...@@ -627,7 +692,7 @@ if __name__ == '__main__':
# test_remove_self_loop()
# test_add_self_loop()
# test_partition_with_halo()
test_metis_partition()
# test_compact()
# test_to_simple()
# test_in_subgraph("int32")
...
...@@ -480,3 +480,37 @@ TEST(SpmatTest, TestCOOSort) {
_TestCOOSort<int32_t>(GPU);
#endif
}
template <typename IDX>
void _TestCSRReorder() {
auto csr = CSR2<IDX>();
auto new_row = aten::VecToIdArray(
std::vector<IDX>({2, 0, 3, 1}), sizeof(IDX)*8, CTX);
auto new_col = aten::VecToIdArray(
std::vector<IDX>({2, 0, 4, 3, 1}), sizeof(IDX)*8, CTX);
auto new_csr = CSRReorder(csr, new_row, new_col);
ASSERT_EQ(new_csr.num_rows, csr.num_rows);
ASSERT_EQ(new_csr.num_cols, csr.num_cols);
}
TEST(SpmatTest, TestCSRReorder) {
_TestCSRReorder<int32_t>();
_TestCSRReorder<int64_t>();
}
template <typename IDX>
void _TestCOOReorder() {
auto coo = COO2<IDX>();
auto new_row = aten::VecToIdArray(
std::vector<IDX>({2, 0, 3, 1}), sizeof(IDX)*8, CTX);
auto new_col = aten::VecToIdArray(
std::vector<IDX>({2, 0, 4, 3, 1}), sizeof(IDX)*8, CTX);
auto new_coo = COOReorder(coo, new_row, new_col);
ASSERT_EQ(new_coo.num_rows, coo.num_rows);
ASSERT_EQ(new_coo.num_cols, coo.num_cols);
}
TEST(SpmatTest, TestCOOReorder) {
_TestCOOReorder<int32_t>();
_TestCOOReorder<int64_t>();
}
...@@ -51,15 +51,13 @@ def create_random_graph(n):
ig = create_graph_index(arr, readonly=True)
return dgl.DGLGraph(ig)
def run_server(graph_name, server_id, num_clients):
g = DistGraphServer(server_id, "kv_ip_config.txt", num_clients, graph_name,
'/tmp/dist_graph/{}.json'.format(graph_name))
print('start server', server_id)
g.start()
def run_client(graph_name, num_nodes, num_edges):
g = DistGraph("kv_ip_config.txt", graph_name)
# Test API
...@@ -132,19 +130,18 @@ def test_server_client():
# let's just test on one partition for now.
# We cannot run multiple servers and clients on the same machine.
serv_ps = []
ctx = mp.get_context('spawn')
for serv_id in range(1):
p = ctx.Process(target=run_server, args=(graph_name, serv_id, 1))
serv_ps.append(p)
p.start()
cli_ps = []
for cli_id in range(1):
print('start client', cli_id)
p = ctx.Process(target=run_client, args=(graph_name, g.number_of_nodes(),
g.number_of_edges()))
p.start()
cli_ps.append(p)
...@@ -168,14 +165,8 @@ def test_split():
selected_nodes = np.nonzero(node_mask)[0]
selected_edges = np.nonzero(edge_mask)[0]
for i in range(num_parts):
part_g, node_feats, edge_feats, gpb = load_partition('/tmp/dist_graph/dist_graph_test.json', i)
local_nids = F.nonzero_1d(part_g.ndata['inner_node'])
local_nids = F.gather_row(part_g.ndata[dgl.NID], local_nids)
nodes1 = np.intersect1d(selected_nodes, F.asnumpy(local_nids))
nodes2 = node_split(node_mask, gpb, i)
...@@ -184,7 +175,7 @@ def test_split():
for n in nodes1:
assert n in local_nids
local_eids = F.nonzero_1d(part_g.edata['inner_edge'])
local_eids = F.gather_row(part_g.edata[dgl.EID], local_eids)
edges1 = np.intersect1d(selected_edges, F.asnumpy(local_eids))
edges2 = edge_split(edge_mask, gpb, i)
...@@ -200,6 +191,6 @@ def prepare_dist():
ip_config.close()
if __name__ == '__main__':
os.makedirs('/tmp/dist_graph', exist_ok=True)
#test_split()
test_server_client()
import dgl
import sys
import os
import numpy as np
from scipy import sparse as spsp
from numpy.testing import assert_array_equal
from dgl.graph_index import create_graph_index
from dgl.distributed import partition_graph, load_partition, GraphPartitionBook
import backend as F
import unittest
import pickle
def create_random_graph(n):
arr = (spsp.random(n, n, density=0.001, format='coo') != 0).astype(np.int64)
ig = create_graph_index(arr, readonly=True)
return dgl.DGLGraph(ig)
@unittest.skipIf(F._default_context_str == 'gpu', reason="METIS doesn't support GPU")
def test_graph_partition_book():
g = create_random_graph(10000)
g.ndata['labels'] = F.arange(0, g.number_of_nodes())
g.ndata['feats'] = F.tensor(np.random.randn(g.number_of_nodes(), 10))
num_parts = 4
num_hops = 2
partition_graph(g, 'gpb_test', num_parts, '/tmp/gpb', num_hops=num_hops, part_method='metis')
for i in range(num_parts):
part_g, node_feats, edge_feats, meta = load_partition('/tmp/gpb/gpb_test.json', i)
num_nodes, num_edges, node_map, edge_map, num_partitions = meta
gpb = GraphPartitionBook(part_id=i,
num_parts=num_partitions,
node_map=node_map,
edge_map=edge_map,
part_graph=part_g)
assert gpb.num_partitions() == num_parts
gpb_meta = gpb.metadata()
assert len(gpb_meta) == num_parts
assert np.all(F.asnumpy(gpb.nid2partid(F.arange(0, len(node_map)))) == node_map)
assert np.all(F.asnumpy(gpb.eid2partid(F.arange(0, len(edge_map)))) == edge_map)
assert len(gpb.partid2nids(i)) == gpb_meta[i]['num_nodes']
assert len(gpb.partid2eids(i)) == gpb_meta[i]['num_edges']
local_nid = gpb.nid2localnid(part_g.ndata[dgl.NID], i)
assert np.all(F.asnumpy(local_nid) == F.asnumpy(F.arange(0, len(local_nid))))
local_eid = gpb.eid2localeid(part_g.edata[dgl.EID], i)
assert np.all(F.asnumpy(local_eid) == F.asnumpy(F.arange(0, len(local_eid))))
if __name__ == '__main__':
os.makedirs('/tmp/gpb', exist_ok=True)
test_graph_partition_book()
...@@ -17,34 +17,44 @@ def create_random_graph(n):
ig = create_graph_index(arr, readonly=True)
return dgl.DGLGraph(ig)
def check_partition(reshuffle):
g = create_random_graph(10000)
g.ndata['labels'] = F.arange(0, g.number_of_nodes())
g.ndata['feats'] = F.tensor(np.random.randn(g.number_of_nodes(), 10))
num_parts = 4
num_hops = 2
partition_graph(g, 'test', num_parts, '/tmp/partition', num_hops=num_hops,
part_method='metis', reshuffle=reshuffle)
part_sizes = []
for i in range(num_parts):
part_g, node_feats, edge_feats, gpb = load_partition('/tmp/partition/test.json', i)
# Check the metadata
assert gpb._num_nodes() == g.number_of_nodes()
assert gpb._num_edges() == g.number_of_edges()
assert gpb.num_partitions() == num_parts
gpb_meta = gpb.metadata()
assert len(gpb_meta) == num_parts
assert len(gpb.partid2nids(i)) == gpb_meta[i]['num_nodes']
assert len(gpb.partid2eids(i)) == gpb_meta[i]['num_edges']
part_sizes.append((gpb_meta[i]['num_nodes'], gpb_meta[i]['num_edges']))
local_nid = gpb.nid2localnid(F.boolean_mask(part_g.ndata[dgl.NID], part_g.ndata['inner_node']), i)
assert np.all(F.asnumpy(local_nid) == np.arange(0, len(local_nid)))
local_eid = gpb.eid2localeid(F.boolean_mask(part_g.edata[dgl.EID], part_g.edata['inner_edge']), i)
assert np.all(F.asnumpy(local_eid) == np.arange(0, len(local_eid)))
# Check the node map.
local_nodes = F.asnumpy(F.boolean_mask(part_g.ndata[dgl.NID], part_g.ndata['inner_node']))
local_nodes1 = F.asnumpy(gpb.partid2nids(i))
assert np.all(np.sort(local_nodes) == np.sort(local_nodes1))
# Check the edge map.
local_edges = F.asnumpy(F.boolean_mask(part_g.edata[dgl.EID], part_g.edata['inner_edge']))
local_edges1 = F.asnumpy(gpb.partid2eids(i))
assert np.all(np.sort(local_edges) == np.sort(local_edges1))
for name in ['labels', 'feats']:
assert name in node_feats
...@@ -53,7 +63,22 @@ def test_partition():
assert np.all(F.asnumpy(g.ndata[name])[local_nodes] == F.asnumpy(node_feats[name]))
assert len(edge_feats) == 0
if reshuffle:
node_map = []
edge_map = []
for i, (num_nodes, num_edges) in enumerate(part_sizes):
node_map.append(np.ones(num_nodes) * i)
edge_map.append(np.ones(num_edges) * i)
node_map = np.concatenate(node_map)
edge_map = np.concatenate(edge_map)
assert np.all(F.asnumpy(gpb.nid2partid(F.arange(0, len(node_map)))) == node_map)
assert np.all(F.asnumpy(gpb.eid2partid(F.arange(0, len(edge_map)))) == edge_map)
def test_partition():
check_partition(True)
check_partition(False)
if __name__ == '__main__':
os.makedirs('/tmp/partition', exist_ok=True)
test_partition()
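A note on why the reshuffle case above can check partition membership with simple ranges: after reshuffling, each partition owns a contiguous block of node and edge Ids, so a range partition book only needs the per-partition boundaries and a binary search to map a global Id to its partition. A minimal NumPy sketch with hypothetical boundaries (this is not the DistGraph API):

import numpy as np

# Hypothetical cumulative node-count boundaries for 4 partitions after
# reshuffling: partition 0 owns Ids [0, 2500), partition 1 owns [2500, 5100), ...
node_range_ends = np.array([2500, 5100, 7600, 10000])

def nid2partid(nids):
    # Binary-search the range boundaries to find each node's partition.
    return np.searchsorted(node_range_ends, nids, side='right')

print(nid2partid(np.array([0, 2499, 2500, 9999])))   # -> [0 0 1 3]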