Commit 48a1794f authored by Da Zheng, committed by GitHub

[Doc] Fix the user guide for distributed partitioning. (#2684)

* fix doc.

* explain the schema file.

* fix.
parent 80c26877
......@@ -90,14 +90,24 @@ a graph in a cluster of machines. This solution requires users to prepare data f
ParMETIS Installation
^^^^^^^^^^^^^^^^^^^^^
ParMETIS requires METIS and GKlib. Please follow the instructions `here <https://github.com/KarypisLab/GKlib>`__
to compile and install GKlib. To compile and install METIS, follow the instructions below to
clone METIS with Git and compile it with 64-bit integer support.

.. code-block:: none

    git clone https://github.com/KarypisLab/METIS.git
    cd METIS
    make config shared=1 cc=gcc prefix=~/local i64=1
    make install

For now, we need to compile and install ParMETIS manually. We clone the DGL branch of ParMETIS as follows:

.. code-block:: none

    git clone --branch dgl https://github.com/KarypisLab/ParMETIS.git

Then compile and install ParMETIS.

.. code-block:: none
......@@ -172,6 +182,8 @@ All fields are separated by whitespace:
* `<attributes>` are optional fields. They can be used to store any values and ParMETIS does not
interpret these fields.
**Note**: make sure that the edge file contains no duplicate edges and no self-loops.
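One way to meet this requirement is to filter the edge file before partitioning. The following is a minimal Python sketch, not part of DGL or ParMETIS: `clean_edge_lines` and `clean_edge_file` are hypothetical helpers, and they assume that each line starts with the source and destination node IDs. Edges are treated as directed, so `(u, v)` and `(v, u)` are kept as distinct edges, and only the first occurrence of each pair survives.

.. code-block:: python

    from typing import Iterable, List


    def clean_edge_lines(lines: Iterable[str]) -> List[str]:
        """Drop self-loops and duplicate edges from whitespace-separated edge lines.

        Assumption: the first two fields of each line are the source and
        destination node IDs; any remaining fields are kept untouched.
        The first occurrence of each (src, dst) pair is kept.
        """
        seen = set()
        cleaned = []
        for line in lines:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            src, dst = int(fields[0]), int(fields[1])
            if src == dst:
                continue  # self-loop
            if (src, dst) in seen:
                continue  # duplicate edge
            seen.add((src, dst))
            cleaned.append(line.rstrip("\n"))
        return cleaned


    def clean_edge_file(in_path: str, out_path: str) -> int:
        """Write a cleaned copy of an edge file and return the number of edges kept."""
        with open(in_path) as f:
            cleaned = clean_edge_lines(f)
        with open(out_path, "w") as f:
            for line in cleaned:
                f.write(line + "\n")
        return len(cleaned)

For example, `clean_edge_file("xxx_edges_raw.txt", "xxx_edges.txt")` would write the filtered edges, where the raw file name is only a placeholder.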
`xxx_stats.txt` stores some basic statistics of the graph. It has only one line with three fields
separated by whitespace:
......@@ -225,17 +237,8 @@ an edge with the following fields:
* `<attributes>` are optional fields that contain any edge attributes in the input edge file.
When invoking `pm_dglpart`, the three input files: `xxx_nodes.txt`, `xxx_edges.txt`, `xxx_stats.txt`
should be located in the directory where `pm_dglpart` runs. The following command partitions the graph
named `xxx` into two partitions.

.. code-block:: none

    pm_dglpart xxx 2
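Judging from the examples in this section, the second argument of `pm_dglpart` is the number of partitions each ParMETIS process produces, so the total number of partitions is the number of processes times that argument. The Python sketch below illustrates this arithmetic and the input-file requirement. `pm_dglpart_command` and `missing_inputs` are hypothetical helpers, and launching several processes with `mpirun -np` is an assumption about your MPI setup.

.. code-block:: python

    import os
    from typing import List


    def missing_inputs(graph_name: str, directory: str = ".") -> List[str]:
        """Return the pm_dglpart input files that are missing from `directory`."""
        names = [f"{graph_name}_nodes.txt", f"{graph_name}_edges.txt", f"{graph_name}_stats.txt"]
        return [n for n in names if not os.path.exists(os.path.join(directory, n))]


    def pm_dglpart_command(graph_name: str, num_parts: int, num_procs: int = 1) -> List[str]:
        """Build the argv for pm_dglpart (hypothetical helper).

        The last argument is the number of partitions per process, so
        num_parts must be a multiple of num_procs.
        """
        if num_procs < 1 or num_parts < 1:
            raise ValueError("num_parts and num_procs must be positive")
        if num_parts % num_procs != 0:
            raise ValueError("num_parts must be a multiple of num_procs")
        parts_per_proc = num_parts // num_procs
        cmd = ["pm_dglpart", graph_name, str(parts_per_proc)]
        if num_procs > 1:
            # Assumption: multiple ParMETIS processes are launched with mpirun.
            cmd = ["mpirun", "-np", str(num_procs)] + cmd
        return cmd

For example, `subprocess.run(pm_dglpart_command("xxx", 8, num_procs=4), check=True)` would start four processes that each produce two partitions, provided `missing_inputs("xxx")` returns an empty list.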
The following command runs four ParMETIS processes to partition the graph named `xxx` into eight
partitions (each process handles two partitions). In this case, the three input files `xxx_nodes.txt`,
`xxx_edges.txt` and `xxx_stats.txt` should still be located in the directory where `pm_dglpart` runs.
**Note**: although the last argument is 2, the command splits the input graph into eight partitions
in total.

.. code-block:: none
......@@ -255,6 +258,8 @@ for loading data in csv files.
* `--input-dir INPUT_DIR` specifies the directory that contains the partition files generated by ParMETIS.
* `--graph-name GRAPH_NAME` specifies the graph name.
* `--schema SCHEMA` provides a file that specifies the schema of the input heterogeneous graph.
The schema file is a JSON file that lists node types and edge types as well as homogeneous ID ranges
for each node type and edge type.
* `--num-parts NUM_PARTS` specifies the number of partitions.
* `--num-node-weights NUM_NODE_WEIGHTS` specifies the number of node weights used by ParMETIS
to balance partitions.
......@@ -286,6 +291,85 @@ assumes all nodes/edges of any types have exactly these attributes. Therefore, i
nodes or edges of different types contain different numbers of attributes, users need to construct
them manually.
Below is an example schema of the OGBN-MAG graph for `convert_partition.py`. It has two fields:
"nid" and "eid". Inside "nid", it lists all node types and the homogeneous ID range of each node type;
inside "eid", it lists all edge types and the homogeneous ID range of each edge type. Each range is
half-open: `[start, end)` covers the IDs from `start` to `end - 1`.

.. code-block:: none

    {
        "nid": {
            "author": [0, 1134649],
            "field_of_study": [1134649, 1194614],
            "institution": [1194614, 1203354],
            "paper": [1203354, 1939743]
        },
        "eid": {
            "affiliated_with": [0, 1043998],
            "writes": [1043998, 8189658],
            "rev-has_topic": [8189658, 15694736],
            "rev-affiliated_with": [15694736, 16738734],
            "cites": [16738734, 22155005],
            "has_topic": [22155005, 29660083],
            "rev-cites": [29660083, 35076354],
            "rev-writes": [35076354, 42222014]
        }
    }
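Because the ranges are contiguous and sorted by start ID, a homogeneous ID can be mapped back to its type and to its ID within that type. The sketch below is illustrative only; `to_typed_id` is a hypothetical helper and not part of DGL or `convert_partition.py`.

.. code-block:: python

    import bisect
    from typing import Dict, List, Tuple


    def to_typed_id(ranges: Dict[str, List[int]], hom_id: int) -> Tuple[str, int]:
        """Map a homogeneous ID to (type name, ID within that type).

        `ranges` maps each type to a half-open [start, end) range, as in the
        "nid" or "eid" section of the schema file.
        """
        # Sort types by range start so the starts can be binary-searched.
        items = sorted(ranges.items(), key=lambda kv: kv[1][0])
        starts = [r[0] for _, r in items]
        idx = bisect.bisect_right(starts, hom_id) - 1
        if idx < 0:
            raise KeyError(hom_id)
        name, (start, end) = items[idx]
        if hom_id >= end:
            raise KeyError(hom_id)
        return name, hom_id - start

With the "nid" ranges above, `to_typed_id(nid, 1134649)` gives `("field_of_study", 0)`. Sorting on every call keeps the sketch short; a real lookup would sort once.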
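The following demo code constructs the schema file. Here `hg` is assumed to be the heterogeneous graph
(for example, OGBN-MAG) loaded beforehand, and `g` the homogeneous graph converted from it with
`dgl.to_homogeneous`, which places the nodes and edges of each type in a contiguous range of homogeneous IDs.

.. code-block:: python

    import json

    import dgl
    import torch as th

    # Assumption: `hg` is the heterogeneous graph, already loaded.
    # `dgl.to_homogeneous` stores the type ID of every node/edge in dgl.NTYPE / dgl.ETYPE.
    g = dgl.to_homogeneous(hg)

    nid_ranges = {}
    eid_ranges = {}
    for ntype in hg.ntypes:
        ntype_id = hg.get_ntype_id(ntype)
        # Homogeneous IDs of all nodes of this type; record the range [first, last + 1).
        nid = th.nonzero(g.ndata[dgl.NTYPE] == ntype_id, as_tuple=True)[0]
        nid_ranges[ntype] = [int(nid[0]), int(nid[-1]) + 1]
    for etype in hg.etypes:
        etype_id = hg.get_etype_id(etype)
        # Homogeneous IDs of all edges of this type; record the range [first, last + 1).
        eid = th.nonzero(g.edata[dgl.ETYPE] == etype_id, as_tuple=True)[0]
        eid_ranges[etype] = [int(eid[0]), int(eid[-1]) + 1]

    with open('mag.json', 'w') as outfile:
        json.dump({'nid': nid_ranges, 'eid': eid_ranges}, outfile, indent=4)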
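Before passing the schema file to `convert_partition.py`, it can be sanity-checked. The sketch below assumes, as in the OGBN-MAG example, that the ranges in each of "nid" and "eid" start at 0 and cover the ID space without gaps or overlaps; `check_ranges` and `check_schema` are hypothetical helpers.

.. code-block:: python

    import json
    from typing import Dict, List


    def check_ranges(ranges: Dict[str, List[int]]) -> int:
        """Check that per-type [start, end) ranges tile [0, total) without gaps or overlaps.

        Returns the total number of IDs covered; raises ValueError otherwise.
        """
        expected_start = 0
        for name, (start, end) in sorted(ranges.items(), key=lambda kv: kv[1][0]):
            if start != expected_start:
                raise ValueError(f"{name}: range starts at {start}, expected {expected_start}")
            if end <= start:
                raise ValueError(f"{name}: empty or inverted range [{start}, {end})")
            expected_start = end
        return expected_start


    def check_schema(path: str) -> Dict[str, int]:
        """Load a schema file and return the total ID count for "nid" and "eid"."""
        with open(path) as f:
            schema = json.load(f)
        return {key: check_ranges(schema[key]) for key in ("nid", "eid")}

The totals returned by `check_schema('mag.json')` should equal `g.num_nodes()` and `g.num_edges()` of the homogeneous graph.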
Construct node/edge features for a heterogeneous graph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^