[Doc] Add user guide for GPU-based sampling (#3070)

* add user guide for gpu sampling
* Update minibatch-gpu-sampling.rst

Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

Commit ced0e443, authored Jul 05, 2021 by Quan (Andy) Gan, committed by GitHub Jul 05, 2021
3 changed files
--- a/docs/source/guide/minibatch-gpu-sampling.rst
+++ b/docs/source/guide/minibatch-gpu-sampling.rst
+.. _guide-minibatch-gpu-sampling:
+
+6.7 Using GPU for Neighborhood Sampling
+---------------------------------------
+
+Since version 0.7, DGL has supported GPU-based neighborhood sampling, which is significantly
+faster than CPU-based neighborhood sampling.  If you estimate that your graph and
+its features fit in GPU memory and your model does not need much GPU memory, then it is
+best to put the graph onto the GPU and use GPU-based neighborhood sampling.
+
+For example, `OGB Products <https://ogb.stanford.edu/docs/nodeprop/#ogbn-products>`_ has
+2.4M nodes and 61M edges, and each node has 100-dimensional features.  The node features
+themselves take less than 1GB of memory, and the graph also takes less than 1GB since the
+memory consumption of a graph depends on the number of edges.  Therefore it is entirely
+possible to fit the whole graph onto GPU.
+
+.. note::
+
+   This feature is experimental and a work-in-progress.  Please stay tuned for further
+   updates.
+
+Using GPU-based neighborhood sampling in DGL data loaders
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To use GPU-based neighborhood sampling with DGL data loaders:
+
+* Put the graph onto GPU.
+
+* Set the ``num_workers`` argument to 0, because CUDA does not allow multiple processes
+  to access the same context.
+
+* Set the ``device`` argument to a GPU device.
+
+All the other arguments for the :class:`~dgl.dataloading.pytorch.NodeDataLoader` can be
+the same as in the other user guides and tutorials.
+
+.. code:: python
+
+   import dgl
+   import torch
+
+   # g, train_nid and sampler are assumed to be defined as in the other guides.
+   g = g.to('cuda:0')
+   dataloader = dgl.dataloading.NodeDataLoader(
+       g,                                # The graph must be on GPU.
+       train_nid,
+       sampler,
+       device=torch.device('cuda:0'),    # The device argument must be GPU.
+       num_workers=0,                    # Number of workers must be 0.
+       batch_size=1000,
+       drop_last=False,
+       shuffle=True)
+
+GPU-based neighbor sampling also works for custom neighborhood samplers as long as
+(1) your sampler is subclassed from :class:`~dgl.dataloading.BlockSampler`, and (2)
+all the code in your sampler runs on GPU.
+
+.. note::
+
+   Currently :class:`~dgl.dataloading.pytorch.EdgeDataLoader` and heterogeneous graphs
+   are not supported.
+
+Using GPU-based neighbor sampling with DGL functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following sampling functions support operating on GPU:
+
+* :func:`dgl.sampling.sample_neighbors`
+
+  * Only uniform sampling is supported on GPU; non-uniform sampling can only run on CPU.
+
+Besides the functions above, :func:`dgl.to_block` can also run on GPU.
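As a rough check of the memory estimate the guide above gives for OGB Products, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope sketch: the helper name `estimate_bytes` is hypothetical, and it assumes 32-bit float features and an edge list stored as two 64-bit integer arrays, neither of which the guide states. Actual usage may be higher, for example if additional sparse formats are kept in memory.

```python
def estimate_bytes(num_nodes, num_edges, feat_dim,
                   feat_itemsize=4, id_itemsize=8):
    """Estimate memory for node features and graph structure, in bytes.

    Assumptions (not taken from the guide): features are float32
    (4 bytes each) and the graph is an edge list with one int64
    source ID and one int64 destination ID per edge.
    """
    feature_bytes = num_nodes * feat_dim * feat_itemsize
    graph_bytes = num_edges * 2 * id_itemsize
    return feature_bytes, graph_bytes


# Rounded OGB Products sizes quoted in the guide: 2.4M nodes, 61M edges,
# 100-dimensional node features.
feat_bytes, graph_bytes = estimate_bytes(2_400_000, 61_000_000, 100)
print(f"features: {feat_bytes / 1e9:.2f} GB")
print(f"graph:    {graph_bytes / 1e9:.2f} GB")
```

Under these assumptions both numbers come out just under 1 GB, which matches the claim that the whole graph and its features can fit on a single GPU.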
--- a/docs/source/guide/minibatch.rst
+++ b/docs/source/guide/minibatch.rst
@@ -58,6 +58,10 @@ conducted in mini-batches.
 * :ref:`guide-minibatch-custom-gnn-module`
 * :ref:`guide-minibatch-inference`

+The following are performance tips for implementing and using neighborhood
+sampling:
+
+* :ref:`guide-minibatch-gpu-sampling`

 .. toctree::
    :maxdepth: 1
@@ -70,3 +74,4 @@ conducted in mini-batches.
    minibatch-custom-sampler
    minibatch-nn
    minibatch-inference
+    minibatch-gpu-sampling
--- a/tutorials/large/L1_large_node_classification.py
+++ b/tutorials/large/L1_large_node_classification.py
@@ -122,6 +122,15 @@ train_dataloader = dgl.dataloading.NodeDataLoader(
 )


+######################################################################
+# .. note::
+#
+#    Since DGL 0.7, neighborhood sampling on GPU is supported.  Please
+#    refer to :ref:`guide-minibatch-gpu-sampling` if you are
+#    interested.
+#
+
+
 ######################################################################
 # You can iterate over the data loader and see what it yields.
 #