Unverified Commit 1d2a1cdc authored by Rhett Ying, committed by GitHub

[doc] update edge classification chapter (#6642)

parent e3752754
@@ -16,36 +16,45 @@ You can use the
.. code:: python

    datapipe = datapipe.sample_neighbor(g, [10, 10])
    # Or equivalently
    datapipe = dgl.graphbolt.NeighborSampler(datapipe, g, [10, 10])

The code for defining a data loader is also the same as that of node
classification. The only difference is that it iterates over the
edges (namely, node pairs) in the training set instead of the nodes.
.. code:: python

    import dgl.graphbolt as gb

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    g = gb.SamplingGraph()
    node_pairs = torch.arange(0, 1000).reshape(-1, 2)
    labels = torch.randint(0, 2, (node_pairs.shape[0],))
    train_set = gb.ItemSet((node_pairs, labels), names=("node_pairs", "labels"))
    datapipe = gb.ItemSampler(train_set, batch_size=128, shuffle=True)
    datapipe = datapipe.sample_neighbor(g, [10, 10])  # 2 layers.
    # Or equivalently:
    # datapipe = gb.NeighborSampler(datapipe, g, [10, 10])
    datapipe = datapipe.fetch_feature(feature, node_feature_keys=["feat"])
    datapipe = datapipe.to_dgl()
    datapipe = datapipe.copy_to(device)
    dataloader = gb.MultiProcessDataLoader(datapipe, num_workers=0)

Iterating over the DataLoader will yield :class:`~dgl.graphbolt.DGLMiniBatch`
which contains a list of specially created graphs representing the computation
dependencies on each layer. They are called *message flow graphs* (MFGs) in DGL.
.. code:: python

    mini_batch = next(iter(dataloader))
    print(mini_batch.blocks)

.. note::

    See the :doc:`Stochastic Training Tutorial
    <../notebooks/stochastic_training/neighbor_sampling_overview.nblink>`__
    for the concept of message flow graph.

If you wish to develop your own neighborhood sampler or you want a more
detailed explanation of the concept of MFGs, please refer to
@@ -63,26 +72,29 @@ an edge exists between the two nodes, and potentially use it for

advantage.
Therefore in edge classification you sometimes would like to exclude the
seed edges as well as their reverse edges from the sampled minibatch.
You can use :func:`~dgl.graphbolt.exclude_seed_edges` along with
:class:`~dgl.graphbolt.MiniBatchTransformer` to achieve this.
.. code:: python

    import dgl.graphbolt as gb
    from functools import partial

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    g = gb.SamplingGraph()
    node_pairs = torch.arange(0, 1000).reshape(-1, 2)
    labels = torch.randint(0, 2, (node_pairs.shape[0],))
    train_set = gb.ItemSet((node_pairs, labels), names=("node_pairs", "labels"))
    datapipe = gb.ItemSampler(train_set, batch_size=128, shuffle=True)
    datapipe = datapipe.sample_neighbor(g, [10, 10])  # 2 layers.
    exclude_seed_edges = partial(gb.exclude_seed_edges, include_reverse_edges=True)
    datapipe = datapipe.transform(exclude_seed_edges)
    datapipe = datapipe.fetch_feature(feature, node_feature_keys=["feat"])
    datapipe = datapipe.to_dgl()
    datapipe = datapipe.copy_to(device)
    dataloader = gb.MultiProcessDataLoader(datapipe, num_workers=0)

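The ``datapipe.transform(...)`` call above is the datapipe-style shorthand for
wrapping the pipeline in a :class:`~dgl.graphbolt.MiniBatchTransformer`. If you
prefer the explicit class, the following is assumed to be the equivalent form
(a sketch, not taken verbatim from the library documentation):

.. code:: python

    # Assumed-equivalent explicit form of `datapipe.transform(exclude_seed_edges)`.
    datapipe = gb.MiniBatchTransformer(datapipe, exclude_seed_edges)
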
Adapt your model for minibatch training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -113,14 +125,12 @@ input features.

            return x
The input to the latter part is usually the output from the
former part, as well as the subgraph (node pairs) of the original graph
induced by the edges in the minibatch. The subgraph is yielded from the
same data loader.

The following code shows an example of predicting scores on the edges by
concatenating the incident node features and projecting them with a dense
layer.
.. code:: python

@@ -129,19 +139,15 @@ layer.

            super().__init__()
            self.W = nn.Linear(2 * in_features, num_classes)

        def forward(self, node_pairs, x):
            src_x = x[node_pairs[0]]
            dst_x = x[node_pairs[1]]
            data = torch.cat([src_x, dst_x], 1)
            return self.W(data)

The entire model will take the list of MFGs and the edges generated by the
data loader, as well as the input node features as follows:
.. code:: python

@@ -151,10 +157,10 @@ follows:

            self.gcn = StochasticTwoLayerGCN(
                in_features, hidden_features, out_features)
            self.predictor = ScorePredictor(num_classes, out_features)

        def forward(self, blocks, x, node_pairs):
            x = self.gcn(blocks, x)
            return self.predictor(node_pairs, x)

DGL ensures that the nodes in the edge subgraph are the same as the
output nodes of the last MFG in the generated list of MFGs.
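To make this invariant concrete, here is a small sanity-check sketch. It is
illustrative only: it assumes ``in_features``, ``hidden_features`` and
``out_features`` are defined, that node features were fetched under the
``"feat"`` key as above, and that the mini-batch exposes ``node_features``,
``blocks`` and ``positive_node_pairs`` as used in the training loop below.

.. code:: python

    # Hypothetical sanity check: the GCN output has one row per destination
    # node of the last MFG, and the seed node pairs index into those rows.
    data = next(iter(dataloader))
    x = data.node_features["feat"]

    gcn = StochasticTwoLayerGCN(in_features, hidden_features, out_features).to(device)
    h = gcn(data.blocks, x)

    assert h.shape[0] == data.blocks[-1].num_dst_nodes()
    src, dst = data.positive_node_pairs
    assert int(src.max()) < h.shape[0] and int(dst.max()) < h.shape[0]
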
@@ -169,21 +175,21 @@ their incident node representations.

.. code:: python

    import torch.nn.functional as F

    model = Model(in_features, hidden_features, out_features, num_classes)
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters())

    for data in dataloader:
        blocks = data.blocks
        # Input node features fetched by fetch_feature above.
        x = data.node_features["feat"]
        y_hat = model(data.blocks, x, data.positive_node_pairs)
        loss = F.cross_entropy(y_hat, data.labels)
        opt.zero_grad()
        loss.backward()
        opt.step()

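You may also want to evaluate on a held-out set during training. The following
is a minimal, hypothetical sketch: ``valid_dataloader`` is an assumption and is
taken to be built with the same data pipe as above, just over a validation item
set.

.. code:: python

    # Hypothetical evaluation sketch; `valid_dataloader` mirrors the training
    # data pipe over a validation item set.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for data in valid_dataloader:
            x = data.node_features["feat"]
            y_hat = model(data.blocks, x, data.positive_node_pairs)
            correct += (y_hat.argmax(dim=1) == data.labels).sum().item()
            total += data.labels.shape[0]
    print(f"Validation accuracy: {correct / total:.4f}")
    model.train()
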
For heterogeneous graphs
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -212,7 +218,7 @@ classification/regression.

For score prediction, the only implementation difference between the
homogeneous graph and the heterogeneous graph is that we are looping
over the edge types.
.. code:: python

@@ -221,16 +227,13 @@ over the edge types for :meth:`~dgl.DGLGraph.apply_edges`.

            super().__init__()
            self.W = nn.Linear(2 * in_features, num_classes)

        def forward(self, node_pairs, x):
            scores = {}
            for etype, (src, dst) in node_pairs.items():
                # Canonical edge types such as "user:like:item" carry the
                # source and destination node types, which key the node
                # representation dict.
                src_type, _, dst_type = etype.split(":")
                data = torch.cat([x[src_type][src], x[dst_type][dst]], 1)
                scores[etype] = self.W(data)
            return scores

    class Model(nn.Module):
        def __init__(self, in_features, hidden_features, out_features, num_classes,

@@ -240,34 +243,46 @@ over the edge types for :meth:`~dgl.DGLGraph.apply_edges`.

                in_features, hidden_features, out_features, etypes)
            self.pred = ScorePredictor(num_classes, out_features)

        def forward(self, node_pairs, blocks, x):
            x = self.rgcn(blocks, x)
            return self.pred(node_pairs, x)

Data loader definition is almost identical to that of the homogeneous graph.
The only difference is that ``train_set`` is now an instance of
:class:`~dgl.graphbolt.ItemSetDict` instead of :class:`~dgl.graphbolt.ItemSet`.
.. code:: python

    import dgl.graphbolt as gb

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    g = gb.SamplingGraph()
    node_pairs = torch.arange(0, 1000).reshape(-1, 2)
    labels = torch.randint(0, 3, (node_pairs.shape[0],))
    node_pairs_labels = {
        "user:like:item": gb.ItemSet(
            (node_pairs, labels), names=("node_pairs", "labels")
        ),
        "user:follow:user": gb.ItemSet(
            (node_pairs, labels), names=("node_pairs", "labels")
        ),
    }
    train_set = gb.ItemSetDict(node_pairs_labels)
    datapipe = gb.ItemSampler(train_set, batch_size=128, shuffle=True)
    datapipe = datapipe.sample_neighbor(g, [10, 10])  # 2 layers.
    datapipe = datapipe.fetch_feature(
        feature, node_feature_keys={"item": ["feat"], "user": ["feat"]}
    )
    datapipe = datapipe.to_dgl()
    datapipe = datapipe.copy_to(device)
    dataloader = gb.MultiProcessDataLoader(datapipe, num_workers=0)

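As in the homogeneous case, it can be helpful to pull one mini-batch and
inspect its structure. The sketch below assumes the pipeline above is runnable
end to end and that the mini-batch exposes the same ``blocks`` and
``positive_node_pairs`` fields used earlier, the latter keyed by edge type
here.

.. code:: python

    # Hypothetical inspection sketch for one heterogeneous mini-batch.
    mini_batch = next(iter(dataloader))
    print(mini_batch.blocks)                # one MFG per GNN layer
    print(mini_batch.positive_node_pairs)   # dict: edge type -> (src, dst)
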
Things become a little different if you wish to exclude the reverse
edges on heterogeneous graphs. On heterogeneous graphs, reverse edges
usually have a different edge type from the edges themselves, in order
to differentiate the forward and backward relationships (e.g.
``follow`` and ``followed_by`` are reverse relations of each other,
``like`` and ``liked_by`` are reverse relations of each other,
etc.).
If each edge in a type has a reverse edge with the same ID in another If each edge in a type has a reverse edge with the same ID in another
@@ -277,16 +292,17 @@ reverse edges then goes as follows.

.. code:: python

    exclude_seed_edges = partial(
        gb.exclude_seed_edges,
        include_reverse_edges=True,
        reverse_etypes_mapping={
            "user:like:item": "item:liked_by:user",
            "user:follow:user": "user:followed_by:user",
        },
    )
    datapipe = datapipe.transform(exclude_seed_edges)

The training loop is again almost the same as that on a homogeneous graph,
except for the implementation of ``compute_loss`` that will take in two
@@ -309,7 +325,3 @@ dictionaries of node types and predictions here.

        loss.backward()
        opt.step()
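For reference, a minimal sketch of such a loop and of ``compute_loss`` over
per-edge-type dictionaries could look as follows. This is an illustrative
sketch, not the library's canonical implementation: the exact layout of the
heterogeneous mini-batch fields (``node_features``, ``positive_node_pairs``,
``labels``) is assumed to mirror the homogeneous case, keyed by node type or
edge type, and the ``Model`` constructor arguments follow the class sketched
above.

.. code:: python

    import torch.nn.functional as F

    def compute_loss(labels, predictions):
        # Both arguments are dicts keyed by edge type; sum the per-type losses.
        return sum(
            F.cross_entropy(predictions[etype], labels[etype])
            for etype in predictions.keys()
        )

    model = Model(in_features, hidden_features, out_features, num_classes, etypes)
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters())

    for data in dataloader:
        # Assumed layout: features keyed by node type, pairs/labels by edge type.
        x = data.node_features
        y_hat = model(data.positive_node_pairs, data.blocks, x)
        loss = compute_loss(data.labels, y_hat)
        opt.zero_grad()
        loss.backward()
        opt.step()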