Unverified Commit 8425c936 authored by Xin Yao, committed by GitHub

update doc for gpu&uva sampling (#3787)


Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
parent 861666fa
......@@ -13,35 +13,15 @@ For example, `OGB Products <https://ogb.stanford.edu/docs/nodeprop/#ogbn-product
a graph depends on the number of edges. Therefore it is entirely possible to fit the
whole graph onto GPU.
Put the node features onto GPU memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the node features can also fit onto GPU memory, it is recommended to put them onto GPU
to reduce the time for data transfer from CPU to GPU, which usually becomes a bottleneck
when using GPU for sampling. For example, in the above OGB Products, each node has
100-dimensional features and they take less than 1 GB of memory in total. It is easy to
transfer these features to GPU before training via the following code.
.. code:: python
# pop the features and labels
features = g.ndata.pop('features')
labels = g.ndata.pop('labels')
# put them onto GPU
features = features.to('cuda:0')
labels = labels.to('cuda:0')
If the node features are too large to fit onto GPU memory, :class:`~dgl.contrib.UnifiedTensor`
enables GPU zero-copy access to the features stored on CPU memory and greatly reduces
the time for data transfer from CPU to GPU.
Using GPU-based neighborhood sampling in DGL data loaders
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One can use GPU-based neighborhood sampling with DGL data loaders via:
* Putting the graph onto GPU.
* Put the graph onto GPU.
* Put the ``train_nid`` onto GPU.
* Set ``device`` argument to a GPU device.
......@@ -54,9 +34,10 @@ the same as the other user guides and tutorials.
.. code:: python
g = g.to('cuda:0')
train_nid = train_nid.to('cuda:0')
dataloader = dgl.dataloading.DataLoader(
g, # The graph must be on GPU.
train_nid,
train_nid, # train_nid must be on GPU.
sampler,
device=torch.device('cuda:0'), # The device argument must be GPU.
num_workers=0, # Number of workers must be 0.
......@@ -82,28 +63,31 @@ CUDA UVA (Unified Virtual Addressing)-based sampling, in which GPUs perform the
on the graph pinned on CPU memory via zero-copy access.
You can enable UVA-based neighborhood sampling in DGL data loaders via:
* Pin the graph to page-locked memory via :func:`dgl.DGLGraph.pin_memory_`.
* Put the ``train_nid`` onto GPU.
* Set ``device`` argument to a GPU device.
* Set ``num_workers`` argument to 0, because CUDA does not allow multiple processes
to access the same context.
* Set ``use_uva=True``.
All the other arguments for the :class:`~dgl.dataloading.DataLoader` can be
the same as the other user guides and tutorials.
.. code:: python
g = g.pin_memory_()
train_nid = train_nid.to('cuda:0')
dataloader = dgl.dataloading.DataLoader(
g, # The graph must be pinned.
train_nid,
g,
train_nid, # train_nid must be on GPU.
sampler,
device=torch.device('cuda:0'), # The device argument must be GPU.
num_workers=0, # Number of workers must be 0.
batch_size=1000,
drop_last=False,
shuffle=True)
shuffle=True,
use_uva=True) # Set use_uva=True
UVA-based sampling is the recommended solution for mini-batch training on large graphs,
especially for multi-GPU training.
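The argument constraints listed above for GPU-based and UVA-based sampling can be summarized in a small sketch. The helper ``sampling_loader_kwargs`` below is hypothetical and not part of DGL; it only collects the ``device``, ``num_workers`` and ``use_uva`` settings described in this guide. It assumes the data loader accepts a device string; wrap the value in ``torch.device`` if your version requires it.

```python
def sampling_loader_kwargs(mode, gpu="cuda:0"):
    """Collect the DataLoader keyword arguments this guide requires.

    Hypothetical helper (not a DGL API). `mode` is "gpu" for GPU-based
    sampling or "uva" for UVA-based sampling.
    """
    if mode not in ("gpu", "uva"):
        raise ValueError("mode must be 'gpu' or 'uva', got %r" % (mode,))
    kwargs = {
        "device": gpu,     # the device argument must be a GPU
        "num_workers": 0,  # the number of workers must be 0
    }
    if mode == "uva":
        kwargs["use_uva"] = True  # enable UVA-based sampling
    return kwargs


# Intended use (assumes dgl is installed and g, train_nid, sampler exist):
#   dataloader = dgl.dataloading.DataLoader(
#       g, train_nid, sampler, batch_size=1000,
#       **sampling_loader_kwargs("uva"))
```

In both modes ``train_nid`` still has to be moved to the GPU separately, and for GPU-based sampling the graph itself must be on the GPU as well.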
......@@ -111,9 +95,8 @@ especially for multi-GPU training.
.. note::
To use UVA-based sampling in multi-GPU training, you should first materialize all the
necessary sparse formats of the graph and copy them to the shared memory explicitly
before spawning training processes. Then you should pin the shared graph in each training
process respectively. Refer to our `GraphSAGE example <https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/multi_gpu_node_classification.py>`_ for more details.
necessary sparse formats of the graph before spawning training processes.
Refer to our `GraphSAGE example <https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/multi_gpu_node_classification.py>`_ for more details.
Using GPU-based neighbor sampling with DGL functions
......