.. _guide-minibatch-node-classification-sampler:

6.1 Training GNN for Node Classification with Neighborhood Sampling
-----------------------------------------------------------------------

:ref:`(Chinese version) <guide_cn-minibatch-node-classification-sampler>`

To train your model stochastically, you need to do the
following:

-  Define a neighborhood sampler.
-  Adapt your model for minibatch training.
-  Modify your training loop.

The following sub-subsections address these steps one by one.

Define a neighborhood sampler and data loader
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DGL provides several neighborhood sampler classes that generate the
computation dependencies needed for each layer, given the nodes we wish
to compute on.

The simplest neighborhood sampler is
:class:`~dgl.dataloading.neighbor.MultiLayerFullNeighborSampler`,
which makes every node gather messages from all of its neighbors.
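
If gathering from all neighbors is too costly, DGL also provides
samplers that cap the number of neighbors sampled per layer (see the
sampler API reference linked below). A minimal sketch using
``dgl.dataloading.NeighborSampler``, with hypothetical fanouts of 15 and
10 for a two-layer model:

.. code:: python

    # Sample at most 15 neighbors per node for the first layer and at
    # most 10 for the second, instead of taking all neighbors.
    sampler = dgl.dataloading.NeighborSampler([15, 10])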

To use a sampler provided by DGL, one also needs to combine it with
:class:`~dgl.dataloading.DataLoader`, which iterates
over a set of indices (nodes in this case) in minibatches.

For example, the following code creates a PyTorch DataLoader that
iterates over the training node ID array ``train_nids`` in batches and
generates the list of MFGs needed for each batch.

.. code:: python

    import dgl
    import dgl.nn as dglnn
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    
    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
    dataloader = dgl.dataloading.DataLoader(
        g, train_nids, sampler,
        batch_size=1024,
        shuffle=True,
        drop_last=False,
        num_workers=4)
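
If you would rather have the sampled MFGs placed on GPU directly, recent
DGL releases let you pass a ``device`` argument to the DataLoader
(otherwise you can move the MFGs inside the training loop, as shown
later). A sketch, assuming a CUDA device is available:

.. code:: python

    dataloader = dgl.dataloading.DataLoader(
        g, train_nids, sampler,
        device=torch.device('cuda'),  # output MFGs will live on this device
        batch_size=1024,
        shuffle=True,
        drop_last=False,
        num_workers=0)                # GPU output is typically paired with num_workers=0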

Iterating over the DataLoader will yield a list of specially created
graphs representing the computation dependencies on each layer. They are
called *message flow graphs* (MFGs) in DGL.

.. code:: python

    input_nodes, output_nodes, blocks = next(iter(dataloader))
    print(blocks)

The iterator generates three items at a time. ``input_nodes`` describe
the nodes needed to compute the representation of ``output_nodes``.
``blocks`` describe, for each GNN layer, which node representations are
to be computed as output, which node representations are needed as
input, and how representations from the input nodes propagate to the
output nodes.
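
For instance, you can check the size of each MFG in the batch fetched
above; a small sketch:

.. code:: python

    print('input nodes :', len(input_nodes))
    print('output nodes:', len(output_nodes))
    for layer, block in enumerate(blocks):
        # Each MFG is bipartite-structured: messages flow from its
        # source nodes to its destination nodes.
        print('layer {}: {} src nodes -> {} dst nodes, {} edges'.format(
            layer, block.num_src_nodes(), block.num_dst_nodes(),
            block.num_edges()))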

.. note::

   See the :doc:`Stochastic Training Tutorial
   <tutorials/large/L0_neighbor_sampling_overview>` for the concept of
   message flow graphs.

   For a complete list of supported builtin samplers, please refer to the
   :ref:`neighborhood sampler API reference <api-dataloading-neighbor-sampling>`.

   If you wish to develop your own neighborhood sampler or you want a more
   detailed explanation of the concept of MFGs, please refer to
   :ref:`guide-minibatch-customizing-neighborhood-sampler`.


.. _guide-minibatch-node-classification-model:

Adapt your model for minibatch training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If your message passing modules are all provided by DGL, the changes
required to adapt your model to minibatch training are minimal. Take a
multi-layer GCN as an example. If your model on the full graph is
implemented as follows:

.. code:: python

    class TwoLayerGCN(nn.Module):
        def __init__(self, in_features, hidden_features, out_features):
            super().__init__()
            self.conv1 = dglnn.GraphConv(in_features, hidden_features)
            self.conv2 = dglnn.GraphConv(hidden_features, out_features)
    
        def forward(self, g, x):
            x = F.relu(self.conv1(g, x))
            x = F.relu(self.conv2(g, x))
            return x

Then all you need is to replace ``g`` with ``blocks`` generated above.

.. code:: python

    class StochasticTwoLayerGCN(nn.Module):
        def __init__(self, in_features, hidden_features, out_features):
            super().__init__()
            self.conv1 = dglnn.GraphConv(in_features, hidden_features)
            self.conv2 = dglnn.GraphConv(hidden_features, out_features)
    
        def forward(self, blocks, x):
            x = F.relu(self.conv1(blocks[0], x))
            x = F.relu(self.conv2(blocks[1], x))
            return x

The DGL ``GraphConv`` modules above accept an element in ``blocks``
generated by the data loader as an argument.

:ref:`The API reference of each NN module <apinn>` will tell you
whether it supports accepting an MFG as an argument.

If you wish to use your own message passing module, please refer to
:ref:`guide-minibatch-custom-gnn-module`.

Training Loop
~~~~~~~~~~~~~

The training loop simply consists of iterating over the dataset with the
customized batching iterator. During each iteration that yields a list
of MFGs, we:

1. Load the node features corresponding to the input nodes onto GPU. The
   node features can be stored in either memory or external storage.
   Note that we only need to load the features of the input nodes, as
   opposed to loading the features of all nodes as in full graph
   training.

   If the features are stored in ``g.ndata``, then they can be loaded
   by accessing ``blocks[0].srcdata``, the data of the source nodes of
   the first MFG, which are exactly the nodes needed for computing the
   final representations.

2. Feed the list of MFGs and the input node features to the multilayer
   GNN and get the outputs.

3. Load the node labels corresponding to the output nodes onto GPU.
   Similarly, the node labels can be stored in either memory or external
   storage. Again, note that we only need to load the labels of the
   output nodes, as opposed to loading the labels of all nodes as in
   full graph training.

   If the labels are stored in ``g.ndata``, then they can be loaded by
   accessing ``blocks[-1].dstdata``, the data of the destination nodes
   of the last MFG, which are exactly the nodes for which we wish to
   compute the final representations.

4. Compute the loss and backpropagate.

.. code:: python

    model = StochasticTwoLayerGCN(in_features, hidden_features, out_features)
    model = model.cuda()
    opt = torch.optim.Adam(model.parameters())

    for input_nodes, output_nodes, blocks in dataloader:
        # Move the MFGs of this minibatch to GPU.
        blocks = [b.to(torch.device('cuda')) for b in blocks]
        # Features of the input nodes and labels of the output nodes.
        input_features = blocks[0].srcdata['features']
        output_labels = blocks[-1].dstdata['label']
        output_predictions = model(blocks, input_features)
        loss = compute_loss(output_labels, output_predictions)
        opt.zero_grad()
        loss.backward()
        opt.step()
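
The ``compute_loss`` above is left unspecified. A minimal sketch,
assuming an ordinary multi-class node classification objective, is plain
cross-entropy:

.. code:: python

    def compute_loss(labels, predictions):
        # Cross-entropy over the output nodes; substitute any loss that
        # suits your task.
        return F.cross_entropy(predictions, labels)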

DGL provides an end-to-end stochastic training example `GraphSAGE
implementation <https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/node_classification.py>`__.

For heterogeneous graphs
~~~~~~~~~~~~~~~~~~~~~~~~

Training a graph neural network for node classification on a
heterogeneous graph is similar.

For instance, we have previously seen
:ref:`how to train a 2-layer RGCN on a full graph <guide-training-rgcn-node-classification>`.
The code for minibatch RGCN training looks very similar to that (with
self-loops, non-linearity and basis decomposition removed for
simplicity):

.. code:: python

    class StochasticTwoLayerRGCN(nn.Module):
        def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
            super().__init__()
            self.conv1 = dglnn.HeteroGraphConv({
                    rel : dglnn.GraphConv(in_feat, hidden_feat, norm='right')
                    for rel in rel_names
                })
            self.conv2 = dglnn.HeteroGraphConv({
                    rel : dglnn.GraphConv(hidden_feat, out_feat, norm='right')
                    for rel in rel_names
                })
    
        def forward(self, blocks, x):
            x = self.conv1(blocks[0], x)
            x = self.conv2(blocks[1], x)
            return x
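
Note that ``HeteroGraphConv`` consumes and produces dictionaries keyed
by node type, so the input ``x`` of this model is a dictionary of
feature tensors rather than a single tensor. A hypothetical sketch (the
node type names and random features are made up; in the training loop
below this dictionary comes from ``blocks[0].srcdata``):

.. code:: python

    # Hypothetical node types 'user' and 'item'; one feature tensor per
    # node type among the source nodes of the first MFG.
    x = {
        'user': torch.randn(blocks[0].num_src_nodes('user'), in_features),
        'item': torch.randn(blocks[0].num_src_nodes('item'), in_features),
    }
    # Forward through an instance of StochasticTwoLayerRGCN; the output
    # is again a dict keyed by node type.
    h = model(blocks, x)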

Some of the samplers provided by DGL also support heterogeneous graphs.
For example, one can still use the provided
:class:`~dgl.dataloading.neighbor.MultiLayerFullNeighborSampler` class and
:class:`~dgl.dataloading.DataLoader` class for
stochastic training. For full-neighbor sampling, the only difference
would be that you would specify a dictionary of node
types and node IDs for the training set.

.. code:: python

    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
    dataloader = dgl.dataloading.DataLoader(
        g, train_nid_dict, sampler,
        batch_size=1024,
        shuffle=True,
        drop_last=False,
        num_workers=4)
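
Here ``train_nid_dict`` is assumed to map each node type to the IDs of
its training nodes. The node type name in the following sketch is
purely hypothetical:

.. code:: python

    # Only nodes of type 'user' are classified in this example, so only
    # they appear as seed nodes for sampling.
    train_nid_dict = {'user': torch.arange(1000)}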

The training loop is almost the same as that of homogeneous graphs,
except for the implementation of ``compute_loss``, which here takes in
two dictionaries keyed by node type: one of labels and one of
predictions.

.. code:: python

    model = StochasticTwoLayerRGCN(in_features, hidden_features, out_features, etypes)
    model = model.cuda()
    opt = torch.optim.Adam(model.parameters())
    
    for input_nodes, output_nodes, blocks in dataloader:
        blocks = [b.to(torch.device('cuda')) for b in blocks]
        input_features = blocks[0].srcdata     # returns a dict
        output_labels = blocks[-1].dstdata     # returns a dict
        output_predictions = model(blocks, input_features)
        loss = compute_loss(output_labels, output_predictions)
        opt.zero_grad()
        loss.backward()
        opt.step()
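
As above, ``compute_loss`` is left to the user. A minimal sketch,
assuming both arguments are dictionaries mapping node types to label
tensors and prediction tensors respectively:

.. code:: python

    def compute_loss(label_dict, pred_dict):
        # Sum a cross-entropy term over every node type that has both
        # labels and predictions.
        losses = [
            F.cross_entropy(pred_dict[ntype], labels)
            for ntype, labels in label_dict.items()
            if ntype in pred_dict
        ]
        return torch.stack(losses).sum()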

DGL provides an end-to-end stochastic training example `RGCN
implementation <https://github.com/dmlc/dgl/blob/master/examples/pytorch/rgcn-hetero/entity_classify_mb.py>`__.