.. _guide-minibatch-edge-classification-sampler:

6.2 Training GNN for Edge Classification with Neighborhood Sampling
----------------------------------------------------------------------

:ref:`(Chinese version) <guide_cn-minibatch-edge-classification-sampler>`

Training for edge classification/regression is similar to that for node
classification/regression, with several notable differences.
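
The code snippets in this section assume the usual imports below (a sketch of
common conventions; the exact aliases are an assumption, not mandated by the
guide):

.. code:: python

    import dgl
    import dgl.nn as dglnn
    import torch
    import torch.nn as nn
    import torch.nn.functional as F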

Define a neighborhood sampler and data loader
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can use the
:ref:`same neighborhood samplers as node classification <guide-minibatch-node-classification-sampler>`.

.. code:: python

    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
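
If you instead want to restrict the fanout rather than taking all neighbors,
a sketch of an alternative (using the same sampler classes as node
classification) is:

.. code:: python

    # Sample at most 10 neighbors per node for each of the two GNN layers.
    sampler = dgl.dataloading.NeighborSampler([10, 10])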

To use the neighborhood samplers provided by DGL for edge classification,
you need to combine them with
:func:`~dgl.dataloading.as_edge_prediction_sampler`, which iterates
over a set of edges in minibatches, yielding the subgraph induced by the
edge minibatch as well as the *message flow graphs* (MFGs) to be consumed by
the module below.

For example, the following code creates a PyTorch DataLoader that
iterates over the training edge IDs ``train_eid_dict`` in batches,
putting the list of generated MFGs onto the GPU.

.. code:: python

    sampler = dgl.dataloading.as_edge_prediction_sampler(sampler)
    dataloader = dgl.dataloading.DataLoader(
        g, train_eid_dict, sampler,
        batch_size=1024,
        shuffle=True,
        drop_last=False,
        num_workers=4)

.. note::

   See the :doc:`Stochastic Training Tutorial
   <tutorials/large/L0_neighbor_sampling_overview>` for the concept of
   message flow graphs (MFGs).

   For a complete list of supported builtin samplers, please refer to the
   :ref:`neighborhood sampler API reference <api-dataloading-neighbor-sampling>`.

   If you wish to develop your own neighborhood sampler or you want a more
   detailed explanation of the concept of MFGs, please refer to
   :ref:`guide-minibatch-customizing-neighborhood-sampler`.

.. _guide-minibatch-edge-classification-sampler-exclude:

Removing edges in the minibatch from the original graph for neighbor sampling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When training edge classification models, you sometimes wish to remove
the edges appearing in the training data from the computation dependency,
as if they never existed. Otherwise, the model would know that an edge
exists between the two nodes and could use that information to its
advantage.

Therefore, in edge classification you sometimes want to exclude the
edges sampled in the minibatch from the original graph for neighborhood
sampling, as well as the reverse edges of the sampled edges on an
undirected graph. You can specify ``exclude='reverse_id'`` when calling
:func:`~dgl.dataloading.as_edge_prediction_sampler`, together with the
mapping from edge IDs to their reverse edge IDs. Doing so usually leads to a
much slower sampling process, because the reverse edges involved in the
minibatch must be located and removed. The example below assumes that edge
``i`` and edge ``i + n_edges // 2`` are reverses of each other.

.. code:: python

    n_edges = g.num_edges()
    sampler = dgl.dataloading.as_edge_prediction_sampler(
        sampler, exclude='reverse_id', reverse_eids=torch.cat([
            torch.arange(n_edges // 2, n_edges), torch.arange(0, n_edges // 2)]))
    dataloader = dgl.dataloading.DataLoader(
        g, train_eid_dict, sampler,
        batch_size=1024,
        shuffle=True,
        drop_last=False,
        num_workers=4)
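
A minimal sketch of how a graph with the reverse-edge layout assumed above
(edge ``i`` and edge ``i + n_edges // 2`` being reverses of each other) might
be constructed; the ``src``/``dst`` tensors are hypothetical:

.. code:: python

    # Hypothetical original directed edges.
    src = torch.tensor([0, 1, 2])
    dst = torch.tensor([1, 2, 0])

    # Appending the reversed direction after the original edges makes edge i
    # and edge i + g.num_edges() // 2 reverses of each other.
    g = dgl.graph((torch.cat([src, dst]), torch.cat([dst, src])))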

Adapt your model for minibatch training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The edge classification model usually consists of two parts:

-  One part that obtains the representation of incident nodes.
-  The other part that computes the edge score from the incident node
   representations.

The former part is exactly the same as
:ref:`that from node classification <guide-minibatch-node-classification-model>`
and we can simply reuse it. The input is still the list of
MFGs generated from a data loader provided by DGL, as well as the
input features.

.. code:: python

    class StochasticTwoLayerGCN(nn.Module):
        def __init__(self, in_features, hidden_features, out_features):
            super().__init__()
            self.conv1 = dglnn.GraphConv(in_features, hidden_features)
            self.conv2 = dglnn.GraphConv(hidden_features, out_features)
    
        def forward(self, blocks, x):
            x = F.relu(self.conv1(blocks[0], x))
            x = F.relu(self.conv2(blocks[1], x))
            return x

The input to the latter part is usually the output from the
former part, as well as the subgraph of the original graph induced by the
edges in the minibatch. The subgraph is yielded from the same data
loader. One can call :meth:`dgl.DGLGraph.apply_edges` to compute the
scores on the edges with the edge subgraph.

The following code shows an example of predicting scores on the edges by
concatenating the incident node features and projecting the result with a
dense layer.

.. code:: python

    class ScorePredictor(nn.Module):
        def __init__(self, num_classes, in_features):
            super().__init__()
            self.W = nn.Linear(2 * in_features, num_classes)
    
        def apply_edges(self, edges):
            data = torch.cat([edges.src['x'], edges.dst['x']], 1)
            return {'score': self.W(data)}
    
        def forward(self, edge_subgraph, x):
            with edge_subgraph.local_scope():
                edge_subgraph.ndata['x'] = x
                edge_subgraph.apply_edges(self.apply_edges)
                return edge_subgraph.edata['score']

The entire model will take the list of MFGs and the edge subgraph
generated by the data loader, as well as the input node features as
follows:

.. code:: python

    class Model(nn.Module):
        def __init__(self, in_features, hidden_features, out_features, num_classes):
            super().__init__()
            self.gcn = StochasticTwoLayerGCN(
                in_features, hidden_features, out_features)
            self.predictor = ScorePredictor(num_classes, out_features)
    
        def forward(self, edge_subgraph, blocks, x):
            x = self.gcn(blocks, x)
            return self.predictor(edge_subgraph, x)

DGL ensures that the nodes in the edge subgraph are the same as the
output nodes of the last MFG in the generated list of MFGs.
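
A quick sanity check illustrating this correspondence (a sketch, assuming the
``dataloader`` defined earlier):

.. code:: python

    input_nodes, edge_subgraph, blocks = next(iter(dataloader))
    # The edge subgraph's nodes map to the same original node IDs as the
    # destination (output) nodes of the last MFG.
    assert torch.equal(edge_subgraph.ndata[dgl.NID], blocks[-1].dstdata[dgl.NID])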

Training Loop
~~~~~~~~~~~~~

The training loop is very similar to node classification. You can
iterate over the dataloader and get a subgraph induced by the edges in
the minibatch, as well as the list of MFGs necessary for computing
their incident node representations.

.. code:: python

    model = Model(in_features, hidden_features, out_features, num_classes)
    model = model.cuda()
    opt = torch.optim.Adam(model.parameters())
    
    for input_nodes, edge_subgraph, blocks in dataloader:
        blocks = [b.to(torch.device('cuda')) for b in blocks]
        edge_subgraph = edge_subgraph.to(torch.device('cuda'))
        input_features = blocks[0].srcdata['features']
        edge_labels = edge_subgraph.edata['labels']
        edge_predictions = model(edge_subgraph, blocks, input_features)
        loss = compute_loss(edge_labels, edge_predictions)
        opt.zero_grad()
        loss.backward()
        opt.step()
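
``compute_loss`` is left to the user; a minimal sketch for multi-class edge
classification (an assumption; any task-appropriate loss works) could be:

.. code:: python

    def compute_loss(labels, predictions):
        # Cross-entropy over per-edge class scores; `labels` holds integer class IDs.
        return F.cross_entropy(predictions, labels)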

For heterogeneous graphs
~~~~~~~~~~~~~~~~~~~~~~~~

The models computing the node representations on heterogeneous graphs
can also be used for computing incident node representations for edge
classification/regression.

.. code:: python

    class StochasticTwoLayerRGCN(nn.Module):
        def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
            super().__init__()
            self.conv1 = dglnn.HeteroGraphConv({
                    rel : dglnn.GraphConv(in_feat, hidden_feat, norm='right')
                    for rel in rel_names
                })
            self.conv2 = dglnn.HeteroGraphConv({
                    rel : dglnn.GraphConv(hidden_feat, out_feat, norm='right')
                    for rel in rel_names
                })
    
        def forward(self, blocks, x):
            x = self.conv1(blocks[0], x)
            x = self.conv2(blocks[1], x)
            return x

For score prediction, the only implementation difference between the
homogeneous graph and the heterogeneous graph is that we are looping
over the edge types for :meth:`~dgl.DGLGraph.apply_edges`.

.. code:: python

    class ScorePredictor(nn.Module):
        def __init__(self, num_classes, in_features):
            super().__init__()
            self.W = nn.Linear(2 * in_features, num_classes)
    
        def apply_edges(self, edges):
            data = torch.cat([edges.src['x'], edges.dst['x']], 1)
            return {'score': self.W(data)}
    
        def forward(self, edge_subgraph, x):
            with edge_subgraph.local_scope():
                edge_subgraph.ndata['x'] = x
                for etype in edge_subgraph.canonical_etypes:
                    edge_subgraph.apply_edges(self.apply_edges, etype=etype)
                return edge_subgraph.edata['score']

    class Model(nn.Module):
        def __init__(self, in_features, hidden_features, out_features, num_classes,
                     etypes):
            super().__init__()
            self.rgcn = StochasticTwoLayerRGCN(
                in_features, hidden_features, out_features, etypes)
            self.pred = ScorePredictor(num_classes, out_features)

        def forward(self, edge_subgraph, blocks, x):
            x = self.rgcn(blocks, x)
            return self.pred(edge_subgraph, x)

Data loader definition is also very similar to that of node
classification. The only difference is that you need
:func:`~dgl.dataloading.as_edge_prediction_sampler`, and you supply a
dictionary of edge types and edge ID tensors instead of a dictionary of
node types and node ID tensors.

.. code:: python

    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
    sampler = dgl.dataloading.as_edge_prediction_sampler(sampler)
    dataloader = dgl.dataloading.DataLoader(
        g, train_eid_dict, sampler,
        batch_size=1024,
        shuffle=True,
        drop_last=False,
        num_workers=4)
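
For illustration, the edge ID dictionary might be built as follows (a sketch
assuming hypothetical edge types ``'follow'`` and ``'purchase'``):

.. code:: python

    train_eid_dict = {
        etype: torch.arange(g.num_edges(etype))
        for etype in ['follow', 'purchase']}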

Things become a little different if you wish to exclude the reverse
edges on heterogeneous graphs. On heterogeneous graphs, reverse edges
usually have a different edge type from the edges themselves, in order
to differentiate the forward and backward relationships (e.g.
``follow`` and ``followed by`` are reverse relations of each other, as are
``purchase`` and ``purchased by``).

If each edge in a type has a reverse edge with the same ID in another
type, you can specify the mapping between edge types and their reverse
types. The way to exclude the edges in the minibatch as well as their
reverse edges then goes as follows.

.. code:: python

    sampler = dgl.dataloading.as_edge_prediction_sampler(
        sampler, exclude='reverse_types',
        reverse_etypes={'follow': 'followed by', 'followed by': 'follow',
                        'purchase': 'purchased by', 'purchased by': 'purchase'})
    dataloader = dgl.dataloading.DataLoader(
        g, train_eid_dict, sampler,
        batch_size=1024,
        shuffle=True,
        drop_last=False,
        num_workers=4)

The training loop is again almost the same as that on a homogeneous graph,
except for the implementation of ``compute_loss``, which here takes two
dictionaries mapping edge types to labels and predictions.

.. code:: python

    model = Model(in_features, hidden_features, out_features, num_classes, etypes)
    model = model.cuda()
    opt = torch.optim.Adam(model.parameters())
    
    for input_nodes, edge_subgraph, blocks in dataloader:
        blocks = [b.to(torch.device('cuda')) for b in blocks]
        edge_subgraph = edge_subgraph.to(torch.device('cuda'))
        input_features = blocks[0].srcdata['features']
        edge_labels = edge_subgraph.edata['labels']
        edge_predictions = model(edge_subgraph, blocks, input_features)
        loss = compute_loss(edge_labels, edge_predictions)
        opt.zero_grad()
        loss.backward()
        opt.step()
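
A minimal sketch of a possible ``compute_loss`` for this case (an assumption;
both arguments are dictionaries keyed by canonical edge type):

.. code:: python

    def compute_loss(label_dict, pred_dict):
        losses = [F.cross_entropy(pred_dict[etype], label_dict[etype])
                  for etype in pred_dict]
        return torch.stack(losses).mean()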

`GCMC <https://github.com/dmlc/dgl/tree/master/examples/pytorch/gcmc>`__
is an example of edge classification on a bipartite graph.