[DistGB] update documentation (#7201)

Unverified Commit 34ae70b5 authored Mar 07, 2024 by Rhett Ying Committed by GitHub Mar 07, 2024
Showing with 54 additions and 0 deletions

docs/source/api/python/dgl.distributed.rst +1 -0

tutorials/dist/1_node_classification.py +53 -0

--- a/docs/source/api/python/dgl.distributed.rst
+++ b/docs/source/api/python/dgl.distributed.rst
@@ -104,3 +104,4 @@ Split and Load Partitions
    load_partition_feats
    load_partition_book
    partition_graph
+    dgl_partition_to_graphbolt
--- a/tutorials/dist/1_node_classification.py
+++ b/tutorials/dist/1_node_classification.py
@@ -436,4 +436,57 @@ If we split the graph into four partitions as demonstrated at the beginning of t
  ip_addr3
  ip_addr4
+
+Sample neighbors with `GraphBolt`
+----------------------------------
+
+Since DGL 2.0, DGL ships a new data loading framework,
+`GraphBolt <https://doc.dgl.ai/stochastic_training/index.html>`_, whose
+sampling is significantly improved over previous implementations in DGL.
+`GraphBolt` can now also be used in distributed training to improve the
+performance of distributed sampling. In addition, partitions stored in
+`GraphBolt` format can be much smaller than before, which reduces loading
+time and memory usage during distributed training.
+
+Graph partitioning
+^^^^^^^^^^^^^^^^^^^
+
+To use `GraphBolt` for distributed sampling, the partitions must be in
+`GraphBolt` format. You can either convert existing `DGL`-format partitions
+with ``dgl.distributed.dgl_partition_to_graphbolt``, or generate
+`GraphBolt`-format partitions directly with ``dgl.distributed.partition_graph``.
+
+1. Convert existing partitions from `DGL` format to `GraphBolt` format:
+
+   .. code-block:: python
+
+       import dgl
+
+       part_config = "4part_data/ogbn-products.json"  # written by partition_graph()
+       dgl.distributed.dgl_partition_to_graphbolt(part_config)
+
+   The converted partitions are stored in the same directory as the original
+   partitions.
+
+2. Generate partitions in `GraphBolt` format directly by setting
+   ``use_graphbolt=True`` when calling ``partition_graph``:
+
+   .. code-block:: python
+
+       # `graph` is the ogbn-products graph partitioned earlier in this tutorial.
+       dgl.distributed.partition_graph(graph, graph_name='ogbn-products', num_parts=4,
+                                       out_path='4part_data',
+                                       balance_ntypes=graph.ndata['train_mask'],
+                                       balance_edges=True,
+                                       use_graphbolt=True)
+
+Enable `GraphBolt` sampling in the training script
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Set ``use_graphbolt=True`` when calling ``dgl.distributed.initialize``. This
+is the only change needed in the training script to enable `GraphBolt`
+sampling. Note that the partitions must already be in `GraphBolt` format.
+
+.. code-block:: python
+
+    dgl.distributed.initialize('ip_config.txt', use_graphbolt=True)
 """