Commit 48a1794f authored by Da Zheng, committed by GitHub

[Doc] Fix the user guide for distributed partitioning. (#2684)

* fix doc.

* explain the schema file.

* fix.
parent 80c26877
@@ -90,14 +90,24 @@ a graph in a cluster of machines. This solution requires users to prepare data f
ParMETIS Installation
^^^^^^^^^^^^^^^^^^^^^
ParMETIS requires METIS and GKLib. Please follow the instructions `here <https://github.com/KarypisLab/GKlib>`__
to compile and install GKLib. To compile and install METIS, follow the instructions below to
clone METIS with Git and compile it with int64 support.
.. code-block:: none

    git clone https://github.com/KarypisLab/METIS.git
    cd METIS
    make config shared=1 cc=gcc prefix=~/local i64=1
    make install
For now, we need to compile and install ParMETIS manually. We clone the DGL branch of ParMETIS as follows:

.. code-block:: none

    git clone --branch dgl https://github.com/KarypisLab/ParMETIS.git

Then compile and install ParMETIS.

.. code-block:: none
@@ -172,6 +182,8 @@ All fields are separated by whitespace:
* `<attributes>` are optional fields. They can be used to store any values and ParMETIS does not
  interpret these fields.

**Note**: please make sure that the edge file contains no duplicate edges and no self-loop edges.
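
As a hedged illustration of the note above, the sketch below drops self-loops and duplicate edges from source/destination ID arrays with NumPy before the edge file is written. The function name `clean_edges` is hypothetical and is not part of DGL or ParMETIS. It treats edges as directed (ordered) pairs and ignores edge attributes, which would need their own handling.

```python
import numpy as np


def clean_edges(src, dst):
    """Return (src, dst) arrays with self-loops and duplicate edges removed.

    Hypothetical helper for illustration only. Edges are treated as ordered
    (src, dst) pairs, and the result is sorted lexicographically by pair.
    """
    src = np.asarray(src)
    dst = np.asarray(dst)
    # Drop self-loops, i.e. edges whose two endpoints are the same node.
    keep = src != dst
    pairs = np.stack([src[keep], dst[keep]], axis=1)
    # Keep exactly one copy of every remaining (src, dst) pair.
    pairs = np.unique(pairs, axis=0)
    return pairs[:, 0], pairs[:, 1]


if __name__ == "__main__":
    s, d = clean_edges([0, 0, 1, 2, 0], [1, 1, 1, 0, 1])
    print(list(zip(s.tolist(), d.tolist())))
```

Because `np.unique` sorts the pairs, the original edge order is not preserved. If the order matters, or if attributes must stay aligned with their edges, a different deduplication strategy is needed.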
`xxx_stats.txt` stores some basic statistics of the graph. It has only one line with three fields
separated by whitespace:
@@ -225,17 +237,8 @@ an edge with the following fields:
* `<attributes>` are optional fields that contain any edge attributes in the input edge file.
When invoking `pm_dglpart`, the three input files `xxx_nodes.txt`, `xxx_edges.txt`, and `xxx_stats.txt`
should be located in the directory where `pm_dglpart` runs. The following command partitions the graph
named `xxx` into two partitions.

.. code-block:: none

    pm_dglpart xxx 2

The following command runs four ParMETIS processes to partition the graph named `xxx` into eight
partitions (each process handles two partitions).

.. code-block:: none
@@ -255,6 +258,8 @@ for loading data in csv files.
* `--input-dir INPUT_DIR` specifies the directory that contains the partition files generated by ParMETIS.
* `--graph-name GRAPH_NAME` specifies the graph name.
* `--schema SCHEMA` provides a file that specifies the schema of the input heterogeneous graph.
  The schema file is a JSON file that lists node types and edge types as well as homogeneous ID ranges
  for each node type and edge type.
* `--num-parts NUM_PARTS` specifies the number of partitions.
* `--num-node-weights NUM_NODE_WEIGHTS` specifies the number of node weights used by ParMETIS
  to balance partitions.
@@ -286,6 +291,85 @@ assumes all nodes/edges of any types have exactly these attributes. Therefore, i
nodes or edges of different types contain different numbers of attributes, users need to construct
them manually.
Below is an example of the schema of the OGBN-MAG graph for `convert_partition.py`. It has two fields:
"nid" and "eid". The "nid" field lists all node types and the homogeneous ID range of each node type;
the "eid" field lists all edge types and the homogeneous ID range of each edge type. Each range is
given as a `[start, end]` pair in which the end is exclusive.
.. code-block:: none

    {
        "nid": {
            "author": [0, 1134649],
            "field_of_study": [1134649, 1194614],
            "institution": [1194614, 1203354],
            "paper": [1203354, 1939743]
        },
        "eid": {
            "affiliated_with": [0, 1043998],
            "writes": [1043998, 8189658],
            "rev-has_topic": [8189658, 15694736],
            "rev-affiliated_with": [15694736, 16738734],
            "cites": [16738734, 22155005],
            "has_topic": [22155005, 29660083],
            "rev-cites": [29660083, 35076354],
            "rev-writes": [35076354, 42222014]
        }
    }
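
To make the range convention concrete, here is a minimal sketch that looks up which type a homogeneous ID falls into. The helper name `locate` is hypothetical and is not part of DGL. It assumes each range is half-open (`[start, end)`), and it reports the offset from the start of the range; whether that offset equals the per-type ID in a given pipeline is an assumption to verify.

```python
import json


def locate(ranges, homo_id):
    """Return (type_name, offset_in_range) for a homogeneous ID.

    `ranges` maps a type name to a [start, end) pair, as in the "nid" and
    "eid" fields of the schema. Hypothetical helper for illustration only.
    """
    for name, (start, end) in ranges.items():
        if start <= homo_id < end:
            return name, homo_id - start
    raise ValueError(f"ID {homo_id} is outside every range")


# The "nid" part of the OGBN-MAG schema shown above.
schema = json.loads("""
{
    "nid": {
        "author": [0, 1134649],
        "field_of_study": [1134649, 1194614],
        "institution": [1194614, 1203354],
        "paper": [1203354, 1939743]
    }
}
""")

if __name__ == "__main__":
    print(locate(schema["nid"], 1134649))  # ('field_of_study', 0)
```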
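The following demo code constructs the schema file. It assumes that `hg` is the heterogeneous graph,
that `g` is its homogeneous form (for example, the result of `dgl.to_homogeneous(hg)`), and that
the nodes and edges of each type occupy a contiguous ID range in `g`.

.. code-block:: none

    import json

    import dgl
    import torch as th

    nid_ranges = {}
    eid_ranges = {}
    # Map each node type to its [start, end) range of homogeneous node IDs.
    for ntype in hg.ntypes:
        ntype_id = hg.get_ntype_id(ntype)
        nid = th.nonzero(g.ndata[dgl.NTYPE] == ntype_id, as_tuple=True)[0]
        nid_ranges[ntype] = [int(nid[0]), int(nid[-1] + 1)]
    # Map each edge type to its [start, end) range of homogeneous edge IDs.
    for etype in hg.etypes:
        etype_id = hg.get_etype_id(etype)
        eid = th.nonzero(g.edata[dgl.ETYPE] == etype_id, as_tuple=True)[0]
        eid_ranges[etype] = [int(eid[0]), int(eid[-1] + 1)]
    # Write both mappings to the schema file.
    with open('mag.json', 'w') as outfile:
        json.dump({'nid': nid_ranges, 'eid': eid_ranges}, outfile, indent=4)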
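
Before passing a schema file to `convert_partition.py`, it can help to sanity-check the ranges. The sketch below is one possible check, not something the script is known to require. It assumes, as in the OGBN-MAG example, that the ranges of all types start at 0 and follow one another without gaps or overlaps. The name `check_ranges` is hypothetical.

```python
def check_ranges(ranges):
    """Verify that {type: [start, end)} ranges tile [0, total) and return total.

    Hypothetical helper for illustration only. Raises ValueError on an empty
    range, a gap, or an overlap.
    """
    expected_start = 0
    # Visit the ranges in order of their start ID.
    for name, (start, end) in sorted(ranges.items(), key=lambda kv: kv[1][0]):
        if end <= start:
            raise ValueError(f"range of {name!r} is empty or reversed")
        if start != expected_start:
            raise ValueError(
                f"range of {name!r} starts at {start}, expected {expected_start}"
            )
        expected_start = end
    return expected_start


if __name__ == "__main__":
    nid_ranges = {"author": [0, 1134649], "field_of_study": [1134649, 1194614]}
    print(check_ranges(nid_ranges))  # total number of IDs covered
```

The same function can be applied to the "eid" field, since it uses the same layout.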
Construct node/edge features for a heterogeneous graph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......