Unverified Commit bc2fef9c authored by Mufei Li, committed by GitHub

[Doc] Misc Fix for User Guide 7.1 Data Preprocessing (#4433)



* Update

* rollback for partition_algo/random.py
Co-authored-by: Ubuntu <ubuntu@ip-172-31-20-21.us-west-2.compute.internal>
parent d248e768
@@ -20,7 +20,7 @@ training. For example,
  import dgl
- g = ... # create or load an DGLGraph object
+ g = ... # create or load a DGLGraph object
  dgl.distributed.partition_graph(g, 'mygraph', 2, 'data_root_dir')
  will output the following data files.
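The file listing referenced by the last line above falls outside this hunk. As a concrete, hedged illustration of the call shown here, the sketch below builds a small random graph and partitions it into two parts; the toy graph, the feature name, and the ``tmp/partitioned`` output directory are placeholders for illustration, not anything prescribed by the guide.

.. code-block:: python

   import dgl
   import torch

   # Toy homogeneous graph standing in for "create or load a DGLGraph object".
   src = torch.randint(0, 1000, (5000,))
   dst = torch.randint(0, 1000, (5000,))
   g = dgl.graph((src, dst), num_nodes=1000)
   g.ndata['feat'] = torch.randn(1000, 16)

   # Split into 2 parts (METIS by default) and write the partition JSON plus
   # one part directory per partition under the output path.
   dgl.distributed.partition_graph(
       g, graph_name='mygraph', num_parts=2, out_path='tmp/partitioned')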
@@ -243,7 +243,7 @@ strict requirement as long as ``metadata.json`` contains valid file paths.
    in each chunk.
  * ``edge_type``: List of string. Edge type names in the form of
    ``<source node type>:<relation>:<destination node type>``.
- * ``num_edges_per_chunk``: List of list of integer. For graphs with :math:`R` edge
+ * ``num_edges_per_chunk``: List of list of integer. For graphs with :math:`R` edge
    types stored in :math:`P` chunks, the value contains :math:`R` integer lists.
    Each list contains :math:`P` integers, which specify the number of edges
    in each chunk.
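To make these fields concrete, here is a hedged sketch of just that fragment of ``metadata.json`` for a toy graph with two node types, :math:`R = 2` edge types, and :math:`P = 2` chunks; the type names and counts are invented, and a real ``metadata.json`` contains further keys not shown in this hunk.

.. code-block:: python

   import json

   metadata_fragment = {
       # One list per node type; each list holds P integers (nodes per chunk).
       "num_nodes_per_chunk": [
           [500, 500],    # hypothetical "paper" nodes: 1000 split into 2 chunks
           [200, 200],    # hypothetical "author" nodes: 400 split into 2 chunks
       ],
       # R edge type names in <source node type>:<relation>:<destination node type> form.
       "edge_type": [
           "paper:cites:paper",
           "author:writes:paper",
       ],
       # R lists, each with P integers: edges of that type stored in each chunk.
       "num_edges_per_chunk": [
           [2500, 2500],
           [800, 800],
       ],
   }
   print(json.dumps(metadata_fragment, indent=2))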
......@@ -262,8 +262,7 @@ strict requirement as long as ``metadata.json`` contains valid file paths.
details about how to parse each data file.
- ``"csv"``: CSV file. Use the ``delimiter`` key to specify delimiter in use.
- ``"numpy"``: NumPy array binary file created by :func:`numpy.save`.
* ``data``: List of string. File path to each data chunk. Support absolute path
or path relative to the location of ``metadata.json``.
* ``data``: List of string. File path to each data chunk. Support absolute path.
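As a hedged sketch of the ``"numpy"`` option and the ``data`` list described above, the snippet below saves two feature chunks with :func:`numpy.save` and records their paths. The feature name, shapes, and the nesting of the ``format`` entry are illustrative assumptions; per the updated line, the paths recorded in the real ``metadata.json`` should be absolute.

.. code-block:: python

   import json
   import os

   import numpy as np

   # Two hypothetical feature chunks for one node type; sizes must line up
   # with num_nodes_per_chunk for that type.
   chunk_sizes = [500, 500]
   paths = []
   for i, n in enumerate(chunk_sizes):
       path = os.path.abspath(f"paper_feat_chunk_{i}.npy")
       np.save(path, np.random.rand(n, 16).astype("float32"))
       paths.append(path)

   # Illustrative entry pointing at the chunks; absolute paths, per the text above.
   feat_entry = {
       "format": {"name": "numpy"},
       "data": paths,
   }
   print(json.dumps(feat_entry, indent=2))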
Tips for making chunked graph data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -297,9 +296,9 @@ For example, to randomly partition MAG240M-LSC to two parts, run the
  .. code-block:: bash

     python /my/repo/dgl/tools/partition_algo/random.py
-        --in-dir=/mydata/MAG240M-LSC_chunked/
-        --out-dir=/mydata/MAG240M-LSC_2parts/
-        --num-parts=2
+        --metadata /mydata/MAG240M-LSC_chunked/metadata.json
+        --output_path /mydata/MAG240M-LSC_2parts/
+        --num_partitions 2
  , which outputs files as follows:
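The output file listing that the sentence above introduces sits outside this hunk. Conceptually, random partitioning just draws one partition ID per node; the sketch below mimics that with made-up node counts, and the per-node-type ``<node type>.txt`` one-ID-per-line layout is an assumption for illustration rather than the tool's documented output format.

.. code-block:: python

   import numpy as np

   num_parts = 2
   num_nodes = {"paper": 1000, "author": 400}   # invented toy counts

   rng = np.random.default_rng(0)
   for ntype, count in num_nodes.items():
       # One random partition ID in [0, num_parts) per node of this type.
       assignment = rng.integers(0, num_parts, size=count)
       np.savetxt(f"{ntype}.txt", assignment, fmt="%d")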
@@ -345,13 +344,13 @@ efficiently. The entire step can be further accelerated using multi-processing.
  .. code-block:: bash

     python /myrepo/dgl/tools/dispatch_data.py \
-        --in-dir=/mydata/MAG240M-LSC_chunked/ \
-        --partition-file=/mydata/MAG240M-LSC_2parts/ \
-        --out-dir=/data/MAG_LSC_partitioned \
-        --ip-config=ip_config.txt
+        --in-dir /mydata/MAG240M-LSC_chunked/ \
+        --partitions-dir /mydata/MAG240M-LSC_2parts/ \
+        --out-dir data/MAG_LSC_partitioned \
+        --ip-config ip_config.txt

- * ``--in-dir`` specifies the path to the folder of the input chunked graph data produced by Step.1.
- * ``--partition-file`` specifies the path to the partition assignment file produced by Step.2.
+ * ``--in-dir`` specifies the path to the folder of the input chunked graph data produced by Step.1.
+ * ``--partitions-dir`` specifies the path to the partition assignment folder produced by Step.1.
  * ``--out-dir`` specifies the path to store the data partition on each machine.
  * ``--ip-config`` specifies the IP configuration file of the cluster.
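To round out the ``--ip-config`` flag above, here is a hedged sketch that writes an ``ip_config.txt`` with one machine per line; the addresses are placeholders, and whether a port column is also required should be checked against the tool's documentation for your DGL version.

.. code-block:: python

   # Hypothetical cluster of two machines; replace with real addresses.
   machines = ["172.31.20.1", "172.31.20.2"]

   with open("ip_config.txt", "w") as f:
       for ip in machines:
           f.write(f"{ip}\n")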