Unverified Commit 1e40c81b authored by Rhett Ying, committed by GitHub

[doc] update minibatch link prediction (#6649)

parent aa9c1101

:ref:`(中文版) <guide_cn-minibatch-link-classification-sampler>`

Define a data loader with neighbor and negative sampling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can still use the same data loader as the one in node/edge classification.
The only difference is that you need to add an additional `negative sampling`
stage before the neighbor sampling stage. The following data loader will pick
5 negative destination nodes uniformly for each source node of an edge.

.. code:: python

    datapipe = datapipe.sample_uniform_negative(graph, 5)

The whole data loader pipeline is as follows:

.. code:: python

    datapipe = gb.ItemSampler(itemset, batch_size=1024, shuffle=True)
    datapipe = datapipe.sample_uniform_negative(graph, 5)
    datapipe = datapipe.sample_neighbor(graph, [10, 10])  # 2 layers.
    datapipe = datapipe.transform(gb.exclude_seed_edges)
    datapipe = datapipe.fetch_feature(feature, node_feature_keys=["feat"])
    datapipe = datapipe.to_dgl()
    datapipe = datapipe.copy_to(device)
    dataloader = gb.MultiProcessDataLoader(datapipe, num_workers=0)
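
As a quick sanity check (a minimal sketch, assuming ``itemset``, ``graph``,
``feature`` and ``device`` have been set up as in the surrounding example), you
can pull one minibatch from the data loader and inspect the fields that the
later sections rely on:

.. code:: python

    # Minimal sketch: inspect one minibatch produced by the pipeline above.
    data = next(iter(dataloader))
    print(data.positive_node_pairs)           # seed (source, destination) pairs
    print(data.negative_node_pairs)           # negative pairs, 5 per seed edge
    print(len(data.blocks))                   # 2 MFGs, one per sampled layer
    print(data.node_features["feat"].shape)   # input features for the MFGs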

For details about the built-in uniform negative sampler, please see
:class:`~dgl.graphbolt.UniformNegativeSampler`.

You can also provide your own negative sampler, as long as it inherits from
:class:`~dgl.graphbolt.NegativeSampler` and overrides the
:meth:`~dgl.graphbolt.NegativeSampler._sample_with_etype` method, which takes
the node pairs in the minibatch and returns the negative node pairs.

The following gives an example of a custom negative sampler that samples
negative destination nodes according to a probability distribution
proportional to a power of degrees.

.. code:: python

    @functional_datapipe("customized_sample_negative")
    class CustomizedNegativeSampler(dgl.graphbolt.NegativeSampler):
        def __init__(self, datapipe, k, node_degrees):
            super().__init__(datapipe, k)
            # Caches the probability distribution.
            self.weights = node_degrees ** 0.75
            self.k = k

        def _sample_with_etype(self, node_pairs, etype=None):
            src, _ = node_pairs
            src = src.repeat_interleave(self.k)
            dst = self.weights.multinomial(len(src), replacement=True)
            return src, dst

    datapipe = datapipe.customized_sample_negative(5, node_degrees)
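
Note that ``node_degrees`` is not defined in this snippet; it is assumed to be
a precomputed float tensor with one entry per node. A hypothetical way to
obtain it, if the original graph is also available as a ``dgl.DGLGraph``
``g``, is:

.. code:: python

    # Hypothetical setup for ``node_degrees`` (not part of the original example).
    node_degrees = g.in_degrees().float()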

Define a GraphSAGE model for minibatch training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    class SAGE(nn.Module):
        def __init__(self, in_size, hidden_size):
            super().__init__()
            self.layers = nn.ModuleList()
            self.layers.append(dglnn.SAGEConv(in_size, hidden_size, "mean"))
            self.layers.append(dglnn.SAGEConv(hidden_size, hidden_size, "mean"))
            self.layers.append(dglnn.SAGEConv(hidden_size, hidden_size, "mean"))
            self.hidden_size = hidden_size
            self.predictor = nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.ReLU(),
                nn.Linear(hidden_size, hidden_size),
                nn.ReLU(),
                nn.Linear(hidden_size, 1),
            )

        def forward(self, blocks, x):
            hidden_x = x
            for layer_idx, (layer, block) in enumerate(zip(self.layers, blocks)):
                hidden_x = layer(block, hidden_x)
                is_last_layer = layer_idx == len(self.layers) - 1
                if not is_last_layer:
                    hidden_x = F.relu(hidden_x)
            return hidden_x
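
The ``model`` used in the training loop below is an instance of this class. A
hypothetical instantiation (the concrete sizes here are illustrative only, not
from the original example) might look like:

.. code:: python

    # Hypothetical instantiation; ``in_size`` must match the dimensionality of
    # the "feat" node feature, and ``hidden_size`` is a free choice.
    in_size, hidden_size = 128, 256
    model = SAGE(in_size, hidden_size).to(device)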

When a negative sampler is provided, the data loader will generate positive and
negative node pairs for each minibatch, in addition to the *message flow
graphs* (MFGs). Let's define a utility function that combines the positive and
negative node pairs and builds the corresponding label tensor:

.. code:: python

    def to_binary_link_dgl_computing_pack(data: gb.DGLMiniBatch):
        """Convert the minibatch to a training pair and a label tensor."""
        pos_src, pos_dst = data.positive_node_pairs
        neg_src, neg_dst = data.negative_node_pairs
        node_pairs = (
            torch.cat((pos_src, neg_src), dim=0),
            torch.cat((pos_dst, neg_dst), dim=0),
        )
        pos_label = torch.ones_like(pos_src)
        neg_label = torch.zeros_like(neg_src)
        labels = torch.cat([pos_label, neg_label], dim=0)
        return (node_pairs, labels.float())

Training loop
~~~~~~~~~~~~~

The training loop simply involves iterating over the data loader and feeding
the minibatches into the model defined above.

.. code:: python

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in tqdm.trange(args.epochs):
        model.train()
        total_loss = 0
        start_epoch_time = time.time()
        for step, data in enumerate(dataloader):
            # Unpack MiniBatch.
            compacted_pairs, labels = to_binary_link_dgl_computing_pack(data)
            node_feature = data.node_features["feat"]
            # Convert sampled subgraphs to DGL blocks.
            blocks = data.blocks

            # Get the embeddings of the input nodes.
            y = model(blocks, node_feature)
            logits = model.predictor(
                y[compacted_pairs[0]] * y[compacted_pairs[1]]
            ).squeeze()

            # Compute loss.
            loss = F.binary_cross_entropy_with_logits(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        end_epoch_time = time.time()
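
The accumulated ``total_loss`` and the epoch timestamps are not used further in
the snippet above; a minimal, hypothetical way to report them at the end of
each epoch iteration is:

.. code:: python

    # Hypothetical per-epoch summary (not part of the original example).
    print(
        f"Epoch {epoch:03d} | loss {total_loss / (step + 1):.4f} | "
        f"time {end_epoch_time - start_epoch_time:.2f}s"
    )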

DGL provides an
`example <https://github.com/dmlc/dgl/blob/master/examples/sampling/graphbolt/link_prediction.py>`__
that shows link prediction on homogeneous graphs.

For heterogeneous graphs
~~~~~~~~~~~~~~~~~~~~~~~~

The previous model can easily be extended to heterogeneous graphs. The only
difference is that you need to use :class:`~dgl.nn.HeteroGraphConv` to wrap
:class:`~dgl.nn.SAGEConv` according to the edge types.

.. code:: python

    class SAGE(nn.Module):
        def __init__(self, in_size, hidden_size):
            super().__init__()
            self.layers = nn.ModuleList()
            self.layers.append(dglnn.HeteroGraphConv({
                rel: dglnn.SAGEConv(in_size, hidden_size, "mean")
                for rel in rel_names
            }))
            self.layers.append(dglnn.HeteroGraphConv({
                rel: dglnn.SAGEConv(hidden_size, hidden_size, "mean")
                for rel in rel_names
            }))
            self.layers.append(dglnn.HeteroGraphConv({
                rel: dglnn.SAGEConv(hidden_size, hidden_size, "mean")
                for rel in rel_names
            }))
            self.hidden_size = hidden_size
            self.predictor = nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.ReLU(),
                nn.Linear(hidden_size, hidden_size),
                nn.ReLU(),
                nn.Linear(hidden_size, 1),
            )

        def forward(self, blocks, x):
            hidden_x = x
            for layer_idx, (layer, block) in enumerate(zip(self.layers, blocks)):
                hidden_x = layer(block, hidden_x)
                is_last_layer = layer_idx == len(self.layers) - 1
                if not is_last_layer:
                    # HeteroGraphConv returns a dict of per-node-type tensors,
                    # so apply the non-linearity per node type.
                    hidden_x = {
                        ntype: F.relu(h) for ntype, h in hidden_x.items()
                    }
            return hidden_x
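
The ``rel_names`` used above is not defined in this snippet; it is assumed to
be the list of relation (edge type) names of the heterogeneous graph. A
hypothetical way to obtain it when the graph is available as a
``dgl.DGLGraph`` ``g``:

.. code:: python

    # Hypothetical setup for ``rel_names`` (not part of the original example).
    rel_names = g.etypes  # e.g. ["follow", "purchase"]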

The data loader definition is also very similar to that for the homogeneous
graph. The only difference is that you need to give node types for feature
fetching.

.. code:: python

    datapipe = gb.ItemSampler(itemset, batch_size=1024, shuffle=True)
    datapipe = datapipe.sample_uniform_negative(graph, 5)
    datapipe = datapipe.sample_neighbor(graph, [10, 10])  # 2 layers.
    datapipe = datapipe.transform(gb.exclude_seed_edges)
    datapipe = datapipe.fetch_feature(
        feature,
        node_feature_keys={"user": ["feat"], "item": ["feat"]}
    )
    datapipe = datapipe.to_dgl()
    datapipe = datapipe.copy_to(device)
    dataloader = gb.MultiProcessDataLoader(datapipe, num_workers=0)

If you want to give your own negative sampling function, just inherit from the
:class:`~dgl.graphbolt.NegativeSampler` class and override the
:meth:`~dgl.graphbolt.NegativeSampler._sample_with_etype` method.

.. code:: python

    @functional_datapipe("customized_sample_negative")
    class CustomizedNegativeSampler(dgl.graphbolt.NegativeSampler):
        def __init__(self, datapipe, k, node_degrees):
            super().__init__(datapipe, k)
            # Caches the probability distribution per edge type.
            self.weights = {
                etype: node_degrees[etype] ** 0.75 for etype in node_degrees
            }
            self.k = k

        def _sample_with_etype(self, node_pairs, etype):
            src, _ = node_pairs
            src = src.repeat_interleave(self.k)
            dst = self.weights[etype].multinomial(len(src), replacement=True)
            return src, dst

    datapipe = datapipe.customized_sample_negative(5, node_degrees)

For heterogeneous graphs, node pairs are grouped by edge types.

.. code:: python

    def to_binary_link_dgl_computing_pack(data: gb.DGLMiniBatch, etype):
        """Convert the minibatch to a training pair and a label tensor."""
        pos_src, pos_dst = data.positive_node_pairs[etype]
        neg_src, neg_dst = data.negative_node_pairs[etype]
        node_pairs = (
            torch.cat((pos_src, neg_src), dim=0),
            torch.cat((pos_dst, neg_dst), dim=0),
        )
        pos_label = torch.ones_like(pos_src)
        neg_label = torch.zeros_like(neg_src)
        labels = torch.cat([pos_label, neg_label], dim=0)
        return (node_pairs, labels.float())

The training loop is again almost the same as that on the homogeneous graph,
except that the loss is computed on a specific edge type.

.. code:: python

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    category = "user"
    for epoch in tqdm.trange(args.epochs):
        model.train()
        total_loss = 0
        start_epoch_time = time.time()
        for step, data in enumerate(dataloader):
            # Unpack MiniBatch.
            compacted_pairs, labels = to_binary_link_dgl_computing_pack(data, category)
            node_features = {
                ntype: data.node_features[(ntype, "feat")]
                for ntype in data.blocks[0].srctypes
            }
            # Convert sampled subgraphs to DGL blocks.
            blocks = data.blocks

            # Get the embeddings of the input nodes.
            y = model(blocks, node_features)
            logits = model.predictor(
                y[category][compacted_pairs[0]] * y[category][compacted_pairs[1]]
            ).squeeze()

            # Compute loss.
            loss = F.binary_cross_entropy_with_logits(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        end_epoch_time = time.time()