Unverified commit e02caa67 authored by Rhett Ying, committed by GitHub

[doc] update user_guide/minibatch/node_classification (#6628)

parent dbedce51
DGL provides several neighborhood sampler classes that generate the
computation dependencies needed for each layer given the nodes we wish
to compute on.

The simplest neighborhood sampler is :class:`~dgl.graphbolt.NeighborSampler`
or the equivalent function-like interface :func:`~dgl.graphbolt.sample_neighbor`,
which makes each node gather messages from its neighbors.

To use a sampler provided by DGL, one also needs to combine it with
:class:`~dgl.graphbolt.MultiProcessDataLoader`, which iterates
over a set of indices (nodes in this case) in minibatches.

For example, the following code creates a DataLoader that
iterates over the training node ID set of ``ogbn-arxiv`` in batches,
putting the list of generated MFGs onto GPU.
.. code:: python

    import dgl
    import dgl.graphbolt as gb
    import dgl.nn as dglnn
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    dataset = gb.BuiltinDataset("ogbn-arxiv").load()
    # The graph and feature store of the dataset.
    g = dataset.graph
    feature = dataset.feature
    train_set = dataset.tasks[0].train_set
    datapipe = gb.ItemSampler(train_set, batch_size=1024, shuffle=True)
    datapipe = datapipe.sample_neighbor(g, [10, 10])  # 2 layers.
    # Or equivalently:
    # datapipe = gb.NeighborSampler(datapipe, g, [10, 10])
    datapipe = datapipe.fetch_feature(feature, node_feature_keys=["feat"])
    datapipe = datapipe.to_dgl()
    datapipe = datapipe.copy_to(device)
    dataloader = gb.MultiProcessDataLoader(datapipe, num_workers=0)
Iterating over the DataLoader will yield a :class:`~dgl.graphbolt.DGLMiniBatch`,
which contains a list of specially created graphs representing the computation
dependencies on each layer. They are called *message flow graphs* (MFGs) in DGL.

.. code:: python

    mini_batch = next(iter(dataloader))
    print(mini_batch.blocks)

Each MFG in ``mini_batch.blocks`` describes, for one GNN layer, which node
representations are to be computed as output, which node representations are
needed as input, and how representations from the input nodes propagate to
the output nodes.
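To get a concrete feel for these dependencies, one can inspect the sampled
MFGs directly. The following is a minimal sketch (the exact numbers depend on
the sampled fan-outs and the batch), assuming the ``mini_batch`` obtained
above:

.. code:: python

    for layer, block in enumerate(mini_batch.blocks):
        # Each MFG is a bipartite-structured graph: messages flow from the
        # source nodes (inputs of this layer) to the destination nodes
        # (outputs of this layer).
        print(layer, block.num_src_nodes(), block.num_dst_nodes(), block.num_edges())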
.. note::

   See the `Stochastic Training Tutorial
   <../notebooks/stochastic_training/neighbor_sampling_overview.nblink>`__
   for the concept of message flow graph.

If you wish to develop your own neighborhood sampler or you want a more
detailed explanation of the concept of MFGs, please refer to
Training Loop
~~~~~~~~~~~~~
The training loop simply consists of iterating over the dataset with the
customized batching iterator. During each iteration that yields a
:class:`~dgl.graphbolt.DGLMiniBatch` ``data``, we:

1. Access the node features corresponding to the input nodes via
   ``data.node_features["feat"]``. These features are already moved to the
   target device (CPU or GPU) by the data loader.

2. Access the node labels corresponding to the output nodes via
   ``data.labels``. These labels are already moved to the target device
   (CPU or GPU) by the data loader.

3. Feed the list of MFGs and the input node features to the multilayer
   GNN and get the outputs.

4. Compute the loss and backpropagate.
.. code:: python

    model = StochasticTwoLayerGCN(in_features, hidden_features, out_features)
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters())

    for data in dataloader:
        input_features = data.node_features["feat"]
        output_labels = data.labels
        output_predictions = model(data.blocks, input_features)
        loss = compute_loss(output_labels, output_predictions)
        opt.zero_grad()
        loss.backward()
        opt.step()
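Note that ``compute_loss`` above is a user-defined helper rather than a DGL
API. For a standard multi-class node classification task it could be as
simple as the following sketch (assuming integer class labels):

.. code:: python

    import torch.nn.functional as F

    def compute_loss(labels, predictions):
        # Plain multi-class cross-entropy over the output (seed) nodes.
        return F.cross_entropy(predictions, labels)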
DGL provides an end-to-end stochastic training example `GraphSAGE
implementation <https://github.com/dmlc/dgl/blob/master/examples/sampling/graphbolt/node_classification.py>`__.
For heterogeneous graphs
~~~~~~~~~~~~~~~~~~~~~~~~
removed for simplicity):

        x = self.conv2(blocks[1], x)
        return x
The samplers provided by DGL also support heterogeneous graphs.
For example, one can still use the provided
:class:`~dgl.graphbolt.NeighborSampler` class and
:class:`~dgl.graphbolt.MultiProcessDataLoader` class for
stochastic training. The only difference is that the itemset is now an
instance of :class:`~dgl.graphbolt.ItemSetDict`, a dictionary mapping
node types to node IDs.
.. code:: python

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    dataset = gb.BuiltinDataset("ogbn-mag").load()
    # The graph and feature store of the dataset.
    g = dataset.graph
    feature = dataset.feature
    train_set = dataset.tasks[0].train_set
    datapipe = gb.ItemSampler(train_set, batch_size=1024, shuffle=True)
    datapipe = datapipe.sample_neighbor(g, [10, 10])  # 2 layers.
    # Or equivalently:
    # datapipe = gb.NeighborSampler(datapipe, g, [10, 10])
    # For heterogeneous graphs, we need to specify the node feature keys
    # for each node type.
    datapipe = datapipe.fetch_feature(
        feature, node_feature_keys={"author": ["feat"], "paper": ["feat"]}
    )
    datapipe = datapipe.to_dgl()
    datapipe = datapipe.copy_to(device)
    dataloader = gb.MultiProcessDataLoader(datapipe, num_workers=0)
The training loop is almost the same as that of homogeneous graphs,
except for the implementation of ``compute_loss``, which here needs to
handle predictions keyed by node type.
.. code:: python

    model = StochasticTwoLayerRGCN(in_features, hidden_features, out_features, etypes)
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters())

    for data in dataloader:
        # For heterogeneous graphs, we need to specify the node type and
        # feature name when accessing the node features. The same applies
        # to the labels.
        input_features = {
            "author": data.node_features[("author", "feat")],
            "paper": data.node_features[("paper", "feat")]
        }
        output_labels = data.labels["paper"]
        output_predictions = model(data.blocks, input_features)
        loss = compute_loss(output_labels, output_predictions)
        opt.zero_grad()
        loss.backward()
        opt.step()
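For reference, a minimal sketch of a ``compute_loss`` that works with the
loop above, assuming that only ``paper`` nodes carry labels (as in
``ogbn-mag``) and that the model returns one prediction tensor per node type:

.. code:: python

    import torch.nn.functional as F

    def compute_loss(labels, predictions):
        # `predictions` is a dict keyed by node type; only the "paper"
        # nodes are labeled in this task, so the loss is computed on them.
        return F.cross_entropy(predictions["paper"], labels)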
DGL provides an end-to-end stochastic training example `RGCN
implementation <https://github.com/dmlc/dgl/blob/master/examples/sampling/graphbolt/rgcn/hetero_rgcn.py>`__.