Unverified commit e02caa67 authored by Rhett Ying, committed by GitHub

[doc] update user_guide/minibatch/node_classification (#6628)

parent dbedce51
DGL provides several neighborhood sampler classes that generate the
computation dependencies needed for each layer given the nodes we wish
to compute on.

The simplest neighborhood sampler is :class:`~dgl.graphbolt.NeighborSampler`
or the equivalent function-like interface :func:`~dgl.graphbolt.sample_neighbor`,
which makes each node gather messages from its neighbors.

To use a sampler provided by DGL, one also needs to combine it with
:class:`~dgl.graphbolt.MultiProcessDataLoader`, which iterates
over a set of indices (nodes in this case) in minibatches.

For example, the following code creates a DataLoader that
iterates over the training node ID set of ``ogbn-arxiv`` in batches,
putting the list of generated MFGs onto GPU.
.. code:: python

    import dgl
    import dgl.graphbolt as gb
    import dgl.nn as dglnn
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    dataset = gb.BuiltinDataset("ogbn-arxiv").load()
    # The graph and feature store of the dataset.
    g = dataset.graph
    feature = dataset.feature
    train_set = dataset.tasks[0].train_set
    datapipe = gb.ItemSampler(train_set, batch_size=1024, shuffle=True)
    datapipe = datapipe.sample_neighbor(g, [10, 10])  # 2 layers.
    # Or equivalently:
    # datapipe = gb.NeighborSampler(datapipe, g, [10, 10])
    datapipe = datapipe.fetch_feature(feature, node_feature_keys=["feat"])
    datapipe = datapipe.to_dgl()
    datapipe = datapipe.copy_to(device)
    dataloader = gb.MultiProcessDataLoader(datapipe, num_workers=0)
Iterating over the DataLoader will yield a :class:`~dgl.graphbolt.DGLMiniBatch`,
which contains a list of specially created graphs representing the computation
dependencies on each layer. They are called *message flow graphs* (MFGs) in DGL.

.. code:: python

    mini_batch = next(iter(dataloader))
    print(mini_batch.blocks)

Each MFG in ``mini_batch.blocks`` describes, for one GNN layer, which node
representations are to be computed as output, which node representations are
needed as input, and how representations from the input nodes propagate to
the output nodes.
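To get a concrete feel for these dependencies, one can inspect the sampled
MFGs directly. The following is a minimal sketch (the exact numbers depend on
the sampled fan-outs and the batch), assuming the ``mini_batch`` obtained
above:

.. code:: python

    for layer, block in enumerate(mini_batch.blocks):
        # Each MFG is a bipartite-structured graph: messages flow from the
        # source nodes (inputs of this layer) to the destination nodes
        # (outputs of this layer).
        print(layer, block.num_src_nodes(), block.num_dst_nodes(), block.num_edges())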
.. note::

   See the `Stochastic Training Tutorial
   <../notebooks/stochastic_training/neighbor_sampling_overview.nblink>`__
   for the concept of message flow graph.

If you wish to develop your own neighborhood sampler or you want a more
detailed explanation of the concept of MFGs, please refer to
Training Loop
~~~~~~~~~~~~~
The training loop simply consists of iterating over the dataset with the
customized batching iterator. During each iteration that yields a
:class:`~dgl.graphbolt.DGLMiniBatch` ``data``, we:

1. Access the node features corresponding to the input nodes via
   ``data.node_features["feat"]``. These features are already moved to the
   target device (CPU or GPU) by the data loader.

2. Access the node labels corresponding to the output nodes via
   ``data.labels``. These labels are already moved to the target device
   (CPU or GPU) by the data loader.

3. Feed the list of MFGs and the input node features to the multilayer
   GNN and get the outputs.

4. Compute the loss and backpropagate.
.. code:: python

    model = StochasticTwoLayerGCN(in_features, hidden_features, out_features)
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters())

    for data in dataloader:
        input_features = data.node_features["feat"]
        output_labels = data.labels
        output_predictions = model(data.blocks, input_features)
        loss = compute_loss(output_labels, output_predictions)
        opt.zero_grad()
        loss.backward()
        opt.step()
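Note that ``compute_loss`` above is a user-defined helper rather than a DGL
API. For a standard multi-class node classification task it could be as
simple as the following sketch (assuming integer class labels):

.. code:: python

    import torch.nn.functional as F

    def compute_loss(labels, predictions):
        # Plain multi-class cross-entropy over the output (seed) nodes.
        return F.cross_entropy(predictions, labels)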
DGL provides an end-to-end stochastic training example `GraphSAGE
implementation <https://github.com/dmlc/dgl/blob/master/examples/sampling/graphbolt/node_classification.py>`__.
For heterogeneous graphs
~~~~~~~~~~~~~~~~~~~~~~~~
removed for simplicity):

        x = self.conv2(blocks[1], x)
        return x
The samplers provided by DGL also support heterogeneous graphs.
For example, one can still use the provided
:class:`~dgl.graphbolt.NeighborSampler` class and
:class:`~dgl.graphbolt.MultiProcessDataLoader` class for
stochastic training. The only difference is that the itemset is now an
instance of :class:`~dgl.graphbolt.ItemSetDict`, a dictionary mapping
node types to node IDs.
.. code:: python

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    dataset = gb.BuiltinDataset("ogbn-mag").load()
    # The graph and feature store of the dataset.
    g = dataset.graph
    feature = dataset.feature
    train_set = dataset.tasks[0].train_set
    datapipe = gb.ItemSampler(train_set, batch_size=1024, shuffle=True)
    datapipe = datapipe.sample_neighbor(g, [10, 10])  # 2 layers.
    # Or equivalently:
    # datapipe = gb.NeighborSampler(datapipe, g, [10, 10])
    # For heterogeneous graphs, we need to specify the node feature keys
    # for each node type.
    datapipe = datapipe.fetch_feature(
        feature, node_feature_keys={"author": ["feat"], "paper": ["feat"]}
    )
    datapipe = datapipe.to_dgl()
    datapipe = datapipe.copy_to(device)
    dataloader = gb.MultiProcessDataLoader(datapipe, num_workers=0)
The training loop is almost the same as that of homogeneous graphs,
except for the implementation of ``compute_loss``, which here needs to
handle predictions keyed by node type.
.. code:: python

    model = StochasticTwoLayerRGCN(in_features, hidden_features, out_features, etypes)
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters())

    for data in dataloader:
        # For heterogeneous graphs, we need to specify the node type and
        # feature name when accessing the node features. The same applies
        # to the labels.
        input_features = {
            "author": data.node_features[("author", "feat")],
            "paper": data.node_features[("paper", "feat")]
        }
        output_labels = data.labels["paper"]
        output_predictions = model(data.blocks, input_features)
        loss = compute_loss(output_labels, output_predictions)
        opt.zero_grad()
        loss.backward()
        opt.step()
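For reference, a minimal sketch of a ``compute_loss`` that works with the
loop above, assuming that only ``paper`` nodes carry labels (as in
``ogbn-mag``) and that the model returns one prediction tensor per node type:

.. code:: python

    import torch.nn.functional as F

    def compute_loss(labels, predictions):
        # `predictions` is a dict keyed by node type; only the "paper"
        # nodes are labeled in this task, so the loss is computed on them.
        return F.cross_entropy(predictions["paper"], labels)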
DGL provides an end-to-end stochastic training example `RGCN
implementation <https://github.com/dmlc/dgl/blob/master/examples/sampling/graphbolt/rgcn/hetero_rgcn.py>`__.