.. _guide-minibatch-link-classification-sampler:

6.3 Training GNN for Link Prediction with Neighborhood Sampling
--------------------------------------------------------------------

Define a neighborhood sampler and data loader with negative sampling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can still use the same neighborhood sampler as the one in node/edge
classification.

.. code:: python

    import dgl

    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)

:class:`~dgl.dataloading.pytorch.EdgeDataLoader` in DGL also supports
generating negative samples for link prediction. To do so, you need to
provide a negative sampling function.
:class:`~dgl.dataloading.negative_sampler.Uniform` is a negative sampler
that samples uniformly: for each source node of an edge, it samples ``k``
negative destination nodes.

The following data loader picks 5 negative destination nodes uniformly
for each source node of an edge.

.. code:: python

    dataloader = dgl.dataloading.EdgeDataLoader(
        g, train_seeds, sampler,
        negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
        batch_size=args.batch_size,
        shuffle=True,
        drop_last=False,
        pin_memory=True,
        num_workers=args.num_workers)

For the builtin negative samplers please see
:ref:`api-dataloading-negative-sampling`.

You can also give your own negative sampler function, as long as it
takes in the original graph ``g`` and the minibatch edge ID array
``eids``, and returns a pair of source ID and destination ID arrays.

The following is an example of a custom negative sampler that samples
negative destination nodes according to a probability distribution
proportional to a power of the node in-degrees.

.. code:: python

    class NegativeSampler(object):
        def __init__(self, g, k):
            # Cache the (unnormalized) probability distribution:
            # in-degree raised to the power 0.75.
            self.weights = g.in_degrees().float() ** 0.75
            self.k = k

        def __call__(self, g, eids):
            # Source nodes of the positive edges in the minibatch.
            src, _ = g.find_edges(eids)
            # Repeat each source node k times, once per negative sample.
            src = src.repeat_interleave(self.k)
            # Draw one negative destination node per repeated source node.
            dst = self.weights.multinomial(len(src), replacement=True)
            return src, dst

    dataloader = dgl.dataloading.EdgeDataLoader(
        g, train_seeds, sampler,
        negative_sampler=NegativeSampler(g, 5),
        batch_size=args.batch_size,
        shuffle=True,
        drop_last=False,
        pin_memory=True,
        num_workers=args.num_workers)

Adapt your model for minibatch training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As explained in :ref:`guide-training-link-prediction`, link prediction
is trained by comparing the score of an edge (positive example) against
the score of a non-existent edge (negative example). To compute the
scores of edges you can reuse the node representation computation model
you have seen in edge classification/regression.

.. code:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import dgl.nn as dglnn

    class StochasticTwoLayerGCN(nn.Module):
        def __init__(self, in_features, hidden_features, out_features):
            super().__init__()
            self.conv1 = dgl.nn.GraphConv(in_features, hidden_features)
            self.conv2 = dgl.nn.GraphConv(hidden_features, out_features)

        def forward(self, blocks, x):
            x = F.relu(self.conv1(blocks[0], x))
            x = F.relu(self.conv2(blocks[1], x))
            return x

For score prediction, you only need to predict a scalar score for each
edge instead of a probability distribution. This example computes the
score as a dot product of the incident node representations.

.. code:: python

    class ScorePredictor(nn.Module):
        def forward(self, edge_subgraph, x):
            with edge_subgraph.local_scope():
                edge_subgraph.ndata['x'] = x
                edge_subgraph.apply_edges(dgl.function.u_dot_v('x', 'x', 'score'))
                return edge_subgraph.edata['score']

When a negative sampler is provided, DGL’s data loader will generate
three items per minibatch:

- A positive graph containing all the edges sampled in the minibatch.
- A negative graph containing all the non-existent edges generated by
  the negative sampler.
- A list of blocks generated by the neighborhood sampler.

So the link prediction model can be defined as follows. It takes in the
three items as well as the input features.

.. code:: python

    class Model(nn.Module):
        def __init__(self, in_features, hidden_features, out_features):
            super().__init__()
            self.gcn = StochasticTwoLayerGCN(
                in_features, hidden_features, out_features)
            # The score predictor used in forward() below.
            self.predictor = ScorePredictor()

        def forward(self, positive_graph, negative_graph, blocks, x):
            x = self.gcn(blocks, x)
            pos_score = self.predictor(positive_graph, x)
            neg_score = self.predictor(negative_graph, x)
            return pos_score, neg_score

Training loop
~~~~~~~~~~~~~

The training loop simply involves iterating over the data loader and
feeding the graphs as well as the input features to the model defined
above. Here ``compute_loss`` stands for a link prediction loss of your
choice that compares the positive scores against the negative scores.

.. code:: python

    model = Model(in_features, hidden_features, out_features)
    model = model.cuda()
    opt = torch.optim.Adam(model.parameters())

    for input_nodes, positive_graph, negative_graph, blocks in dataloader:
        # Move the sampled graphs to the GPU.
        blocks = [b.to(torch.device('cuda')) for b in blocks]
        positive_graph = positive_graph.to(torch.device('cuda'))
        negative_graph = negative_graph.to(torch.device('cuda'))
        # Input features of the source nodes of the first block.
        input_features = blocks[0].srcdata['features']
        pos_score, neg_score = model(positive_graph, negative_graph, blocks, input_features)
        loss = compute_loss(pos_score, neg_score)
        opt.zero_grad()
        loss.backward()
        opt.step()

DGL provides an unsupervised learning GraphSAGE example that shows link
prediction on homogeneous graphs.

For heterogeneous graphs
~~~~~~~~~~~~~~~~~~~~~~~~

The models that compute node representations on heterogeneous graphs
for edge classification/regression can also be used to compute incident
node representations for link prediction.

.. code:: python

    class StochasticTwoLayerRGCN(nn.Module):
        def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
            super().__init__()
            self.conv1 = dglnn.HeteroGraphConv({
                    rel : dglnn.GraphConv(in_feat, hidden_feat, norm='right')
                    for rel in rel_names
                })
            self.conv2 = dglnn.HeteroGraphConv({
                    rel : dglnn.GraphConv(hidden_feat, out_feat, norm='right')
                    for rel in rel_names
                })

        def forward(self, blocks, x):
            x = self.conv1(blocks[0], x)
            x = self.conv2(blocks[1], x)
            return x

For score prediction, the only implementation difference between the
homogeneous graph and the heterogeneous graph is that we loop over the
edge types when calling :meth:`dgl.DGLHeteroGraph.apply_edges`.

.. code:: python

    class ScorePredictor(nn.Module):
        def forward(self, edge_subgraph, x):
            with edge_subgraph.local_scope():
                edge_subgraph.ndata['x'] = x
                for etype in edge_subgraph.canonical_etypes:
                    edge_subgraph.apply_edges(
                        dgl.function.u_dot_v('x', 'x', 'score'), etype=etype)
                return edge_subgraph.edata['score']

    class Model(nn.Module):
        def __init__(self, in_features, hidden_features, out_features, num_classes,
                     etypes):
            super().__init__()
            self.rgcn = StochasticTwoLayerRGCN(
                in_features, hidden_features, out_features, etypes)
            self.pred = ScorePredictor()

        def forward(self, positive_graph, negative_graph, blocks, x):
            x = self.rgcn(blocks, x)
            pos_score = self.pred(positive_graph, x)
            neg_score = self.pred(negative_graph, x)
            return pos_score, neg_score

The data loader definition is also very similar to that of edge
classification/regression: you supply a dictionary of edge types and
edge ID tensors. The only difference is that you also need to give the
negative sampler.

.. code:: python

    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
    dataloader = dgl.dataloading.EdgeDataLoader(
        g, train_eid_dict, sampler,
        negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
        batch_size=1024,
        shuffle=True,
        drop_last=False,
        num_workers=4)

If you want to give your own negative sampling function, the function
should take in the original graph and the dictionary of edge types and
edge ID tensors. It should return a dictionary of edge types and
source-destination array pairs. An example is given as follows:

.. code:: python

    class NegativeSampler(object):
        def __init__(self, g, k):
            # Cache one (unnormalized) probability distribution per edge type.
            # Keyed by canonical edge type so the keys match those of the
            # edge ID dictionary given to the data loader below.
            self.weights = {
                etype: g.in_degrees(etype=etype).float() ** 0.75
                for etype in g.canonical_etypes
            }
            self.k = k

        def __call__(self, g, eids_dict):
            result_dict = {}
            for etype, eids in eids_dict.items():
                src, _ = g.find_edges(eids, etype=etype)
                src = src.repeat_interleave(self.k)
                dst = self.weights[etype].multinomial(len(src), replacement=True)
                result_dict[etype] = (src, dst)
            return result_dict

Then you can give the dataloader a dictionary of edge types and edge IDs
as well as the negative sampler. For instance, the following iterates
over all edges of the heterogeneous graph.

.. code:: python

    # Map every canonical edge type to the IDs of all its edges.
    train_eid_dict = {
        etype: g.edges(etype=etype, form='eid')
        for etype in g.canonical_etypes}

    dataloader = dgl.dataloading.EdgeDataLoader(
        g, train_eid_dict, sampler,
        negative_sampler=NegativeSampler(g, 5),
        batch_size=1024,
        shuffle=True,
        drop_last=False,
        num_workers=4)

The training loop is again almost the same as that on homogeneous
graphs, except for the implementation of ``compute_loss``, which here
takes in two dictionaries of edge types and predictions.

.. code:: python

    model = Model(in_features, hidden_features, out_features, num_classes, etypes)
    model = model.cuda()
    opt = torch.optim.Adam(model.parameters())

    for input_nodes, positive_graph, negative_graph, blocks in dataloader:
        # Move the sampled graphs to the GPU.
        blocks = [b.to(torch.device('cuda')) for b in blocks]
        positive_graph = positive_graph.to(torch.device('cuda'))
        negative_graph = negative_graph.to(torch.device('cuda'))
        # Input features of the source nodes of the first block.
        input_features = blocks[0].srcdata['features']
        pos_score, neg_score = model(positive_graph, negative_graph, blocks, input_features)
        loss = compute_loss(pos_score, neg_score)
        opt.zero_grad()
        loss.backward()
        opt.step()
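The training loops above call ``compute_loss`` without defining it. As a rough sketch only, one possible choice is a margin (hinge) loss averaged over edge types. The ``margin`` parameter, the edge type name, and the toy scores below are assumptions made for illustration and are not part of DGL. The sketch also assumes that the negative scores belonging to one positive edge are stored contiguously, which is the layout that ``src.repeat_interleave(k)`` in the negative samplers above would produce.

```python
import torch


def compute_loss(pos_score, neg_score, margin=1.0):
    """Margin (hinge) loss averaged over edge types.

    Assumption: both arguments are dicts mapping an edge type to a score
    tensor, and the negative scores of each positive edge are contiguous.
    """
    losses = []
    for etype, pos in pos_score.items():
        if pos.numel() == 0:
            # No edges of this type in the minibatch.
            continue
        pos = pos.view(-1, 1)                          # shape (E, 1)
        neg = neg_score[etype].view(pos.shape[0], -1)  # shape (E, k)
        # Penalize every negative that scores within `margin` of its positive.
        losses.append((margin - pos + neg).clamp(min=0).mean())
    if not losses:
        raise ValueError("the minibatch contains no positive edges")
    return torch.stack(losses).mean()


# Toy scores: 2 positive edges with k = 2 negatives each (made-up values).
etype = ("user", "follows", "user")  # hypothetical edge type
pos_score = {etype: torch.tensor([[2.0], [0.0]], requires_grad=True)}
neg_score = {etype: torch.tensor([[0.0], [1.0], [0.5], [-2.0]])}
loss = compute_loss(pos_score, neg_score)
loss.backward()  # the loss is differentiable, so it can drive opt.step()
```

For a homogeneous graph the scores are plain tensors rather than dictionaries, so the body of the loop would apply directly to ``pos_score`` and ``neg_score``. Other losses, such as binary cross entropy on the scores, would fit the same interface.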