Commit 48a1794f in OpenDAS/dgl
Authored Feb 22, 2021 by Da Zheng; committed by GitHub on Feb 22, 2021
[Doc] Fix the user guide for distributed partitioning. (#2684)
* fix doc.
* explain the schema file.
* fix.
Parent: 80c26877
Showing 1 changed file with 97 additions and 13 deletions
docs/source/guide/distributed-preprocessing.rst
...
...
@@ -90,14 +90,24 @@ a graph in a cluster of machines. This solution requires users to prepare data f
ParMETIS Installation
^^^^^^^^^^^^^^^^^^^^^
ParMETIS requires METIS and GKLib. Please follow the instructions `here <https://github.com/KarypisLab/GKlib>`__
to compile and install GKLib. To compile and install METIS, follow the instructions below to
clone METIS with Git and compile it with int64 support.

.. code-block:: none

    git clone https://github.com/KarypisLab/METIS.git
    cd METIS
    make config shared=1 cc=gcc prefix=~/local i64=1
    make install

For now, we need to compile and install ParMETIS manually. We clone the DGL branch of ParMETIS as follows:
.. code-block:: none

    git clone --branch dgl https://github.com/KarypisLab/ParMETIS.git

Then we follow the instructions in its GitHub repository to install its dependencies, including METIS,
and then compile and install ParMETIS.
Then compile and install ParMETIS.
.. code-block:: none
...
...
@@ -172,6 +182,8 @@ All fields are separated by whitespace:
* `<attributes>` are optional fields. They can be used to store any values and ParMETIS does not
  interpret these fields.
**Note**: make sure that the edge file contains no duplicate edges and no self-loop edges.
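The cleanup described in the note can be sketched as below. This is a minimal illustration, not part of DGL or ParMETIS: `clean_edges` is a hypothetical helper, and it assumes that edges are directed `(src, dst)` pairs held in memory and that any extra per-edge fields are handled separately.

```python
def clean_edges(edges):
    """Drop self-loops and repeated (src, dst) pairs, keeping first occurrences.

    `edges` is an iterable of (src, dst) integer pairs. Edges are treated as
    directed, so (0, 1) and (1, 0) are considered different edges.
    """
    seen = set()
    cleaned = []
    for src, dst in edges:
        if src == dst:
            # Self-loop: source and destination are the same node.
            continue
        if (src, dst) in seen:
            # Duplicate of an edge that was already kept.
            continue
        seen.add((src, dst))
        cleaned.append((src, dst))
    return cleaned


if __name__ == "__main__":
    raw = [(0, 1), (1, 2), (1, 2), (2, 2), (2, 0)]
    print(clean_edges(raw))  # [(0, 1), (1, 2), (2, 0)]
```

For large graphs a vectorized approach is likely preferable; the loop above only shows the two conditions that the edge file has to satisfy.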
`xxx_stats.txt` stores some basic statistics of the graph. It has only one line with three fields
separated by whitespace:
...
...
@@ -225,17 +237,8 @@ an edge with the following fields:
* `<attributes>` are optional fields that contain any edge attributes in the input edge file.
When invoking `pm_dglpart`, the three input files: `xxx_nodes.txt`, `xxx_edges.txt`, `xxx_stats.txt`
should be located in the directory where `pm_dglpart` runs. The following command partitions the graph
named `xxx` into two partitions.
.. code-block:: none

    pm_dglpart xxx 2

The following command partitions the graph named `xxx` into eight partitions. In this case,
the three input files: `xxx_nodes.txt`, `xxx_edges.txt`, `xxx_stats.txt` should still be located
in the directory where `pm_dglpart` runs. **Note**: the command actually splits the input graph
into eight partitions.
should be located in the directory where `pm_dglpart` runs. The following command runs four ParMETIS
processes to partition the graph named `xxx` into eight partitions (each process handles two partitions).
.. code-block:: none
...
...
@@ -255,6 +258,8 @@ for loading data in csv files.
* `--input-dir INPUT_DIR` specifies the directory that contains the partition files generated by ParMETIS.
* `--graph-name GRAPH_NAME` specifies the graph name.
* `--schema SCHEMA` provides a file that specifies the schema of the input heterogeneous graph.
  The schema file is a JSON file that lists node types and edge types as well as homogeneous ID ranges
  for each node type and edge type.
* `--num-parts NUM_PARTS` specifies the number of partitions.
* `--num-node-weights NUM_NODE_WEIGHTS` specifies the number of node weights used by ParMETIS
  to balance partitions.
...
...
@@ -286,6 +291,85 @@ assumes all nodes/edges of any types have exactly these attributes. Therefore, i
nodes or edges of different types contain different numbers of attributes, users need to construct
them manually.
Below is an example of the schema of the OGBN-MAG graph for `convert_partition.py`. It has two fields:
"nid" and "eid". The "nid" field lists all node types and the homogeneous ID range of each node type;
the "eid" field lists all edge types and the homogeneous ID range of each edge type.

.. code-block:: none

    {
        "nid": {
            "author": [
                0,
                1134649
            ],
            "field_of_study": [
                1134649,
                1194614
            ],
            "institution": [
                1194614,
                1203354
            ],
            "paper": [
                1203354,
                1939743
            ]
        },
        "eid": {
            "affiliated_with": [
                0,
                1043998
            ],
            "writes": [
                1043998,
                8189658
            ],
            "rev-has_topic": [
                8189658,
                15694736
            ],
            "rev-affiliated_with": [
                15694736,
                16738734
            ],
            "cites": [
                16738734,
                22155005
            ],
            "has_topic": [
                22155005,
                29660083
            ],
            "rev-cites": [
                29660083,
                35076354
            ],
            "rev-writes": [
                35076354,
                42222014
            ]
        }
    }

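To make the meaning of the ranges concrete, the sketch below derives the number of nodes of each type from the "nid" field. It assumes that each range is half-open, i.e. `[start, end)`, which is consistent with how the ranges line up in the example above. `type_counts` is a hypothetical helper, not part of DGL, and only the "nid" part of the schema is reproduced.

```python
import json

# The "nid" part of the OGBN-MAG schema shown above, in compact form.
SCHEMA_TEXT = """
{
    "nid": {
        "author": [0, 1134649],
        "field_of_study": [1134649, 1194614],
        "institution": [1194614, 1203354],
        "paper": [1203354, 1939743]
    }
}
"""


def type_counts(ranges):
    """Return {type name: number of IDs}, treating each range as [start, end)."""
    return {name: end - start for name, (start, end) in ranges.items()}


if __name__ == "__main__":
    schema = json.loads(SCHEMA_TEXT)
    for name, count in type_counts(schema["nid"]).items():
        print(name, count)
```

The same helper applies unchanged to the "eid" field, since edge types use the same range layout.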
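The following demo code constructs the schema file.

.. code-block:: none

    import json
    import dgl
    import torch as th

    # Assumption: `hg` is the heterogeneous graph and `g` is its homogeneous
    # form (e.g. g = dgl.to_homogeneous(hg)), so g.ndata[dgl.NTYPE] and
    # g.edata[dgl.ETYPE] store the type ID of every node and edge.
    nid_ranges = {}
    eid_ranges = {}
    for ntype in hg.ntypes:
        ntype_id = hg.get_ntype_id(ntype)
        nid = th.nonzero(g.ndata[dgl.NTYPE] == ntype_id, as_tuple=True)[0]
        # Half-open range: [first ID, last ID + 1).
        nid_ranges[ntype] = [int(nid[0]), int(nid[-1] + 1)]
    for etype in hg.etypes:
        etype_id = hg.get_etype_id(etype)
        eid = th.nonzero(g.edata[dgl.ETYPE] == etype_id, as_tuple=True)[0]
        eid_ranges[etype] = [int(eid[0]), int(eid[-1] + 1)]
    with open('mag.json', 'w') as outfile:
        json.dump({'nid': nid_ranges, 'eid': eid_ranges}, outfile, indent=4)
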
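Before passing a schema file to `convert_partition.py`, it may help to sanity-check the ranges. The sketch below assumes, as in the OGBN-MAG example, that the ranges of all types are half-open, start at 0, and follow one another without gaps or overlaps; whether `convert_partition.py` itself requires this is not stated here. `check_contiguous` and `type_of` are hypothetical helpers, not part of DGL.

```python
from bisect import bisect_right


def check_contiguous(ranges):
    """Return True if the [start, end) ranges tile 0..N with no gap or overlap."""
    expected_start = 0
    for start, end in sorted(ranges.values()):
        if start != expected_start or end <= start:
            return False
        expected_start = end
    return True


def type_of(ranges, homo_id):
    """Return the type name whose [start, end) range contains `homo_id`."""
    items = sorted(ranges.items(), key=lambda kv: kv[1][0])
    starts = [span[0] for _, span in items]
    # Index of the last range whose start is <= homo_id.
    idx = bisect_right(starts, homo_id) - 1
    if idx < 0:
        raise ValueError(f"ID {homo_id} is below every range")
    name, (start, end) = items[idx]
    if not start <= homo_id < end:
        raise ValueError(f"ID {homo_id} is not covered by any range")
    return name


if __name__ == "__main__":
    nid_ranges = {
        "author": [0, 1134649],
        "field_of_study": [1134649, 1194614],
        "institution": [1194614, 1203354],
        "paper": [1203354, 1939743],
    }
    print(check_contiguous(nid_ranges))   # True
    print(type_of(nid_ranges, 1134649))   # field_of_study
```

The lookup uses binary search over the range starts, so it stays cheap even when a graph has many node or edge types.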
Construct node/edge features for a heterogeneous graph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
...