Unverified Commit 1e40c81b authored by Rhett Ying, committed by GitHub

[doc] update minibatch link prediction (#6649)

parent aa9c1101

:ref:`(中文版) <guide_cn-minibatch-link-classification-sampler>`

Define a data loader with neighbor and negative sampling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can still use the same data loader as the one in node/edge classification.
The only difference is that you need to add an additional `negative sampling`
stage before the neighbor sampling stage. The following data loader will pick
5 negative destination nodes uniformly for each source node of an edge.

.. code:: python

    datapipe = datapipe.sample_uniform_negative(graph, 5)

The whole data loader pipeline is as follows:

.. code:: python

    datapipe = gb.ItemSampler(itemset, batch_size=1024, shuffle=True)
    datapipe = datapipe.sample_uniform_negative(graph, 5)
    datapipe = datapipe.sample_neighbor(graph, [10, 10])  # 2 layers.
    datapipe = datapipe.transform(gb.exclude_seed_edges)
    datapipe = datapipe.fetch_feature(feature, node_feature_keys=["feat"])
    datapipe = datapipe.to_dgl()
    datapipe = datapipe.copy_to(device)
    dataloader = gb.MultiProcessDataLoader(datapipe, num_workers=0)
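
As a quick sanity check (a minimal sketch, assuming ``itemset``, ``graph``,
``feature`` and ``device`` have been set up as in the surrounding example), you
can pull one minibatch from the data loader and inspect the fields that the
later sections rely on:

.. code:: python

    # Minimal sketch: inspect one minibatch produced by the pipeline above.
    data = next(iter(dataloader))
    print(data.positive_node_pairs)           # seed (source, destination) pairs
    print(data.negative_node_pairs)           # negative pairs, 5 per seed edge
    print(len(data.blocks))                   # 2 MFGs, one per sampled layer
    print(data.node_features["feat"].shape)   # input features for the MFGs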

For details about the built-in uniform negative sampler, please see
:class:`~dgl.graphbolt.UniformNegativeSampler`.

You can also provide your own negative sampler, as long as it inherits from
:class:`~dgl.graphbolt.NegativeSampler` and overrides the
:meth:`~dgl.graphbolt.NegativeSampler._sample_with_etype` method, which takes
the node pairs in the minibatch and returns the negative node pairs.

The following gives an example of a custom negative sampler that samples
negative destination nodes according to a probability distribution
proportional to a power of degrees.

.. code:: python

    @functional_datapipe("customized_sample_negative")
    class CustomizedNegativeSampler(dgl.graphbolt.NegativeSampler):
        def __init__(self, datapipe, k, node_degrees):
            super().__init__(datapipe, k)
            # Caches the probability distribution.
            self.weights = node_degrees ** 0.75
            self.k = k

        def _sample_with_etype(self, node_pairs, etype=None):
            src, _ = node_pairs
            src = src.repeat_interleave(self.k)
            dst = self.weights.multinomial(len(src), replacement=True)
            return src, dst

    datapipe = datapipe.customized_sample_negative(5, node_degrees)
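
Note that ``node_degrees`` is not defined in this snippet; it is assumed to be
a precomputed float tensor with one entry per node. A hypothetical way to
obtain it, if the original graph is also available as a ``dgl.DGLGraph``
``g``, is:

.. code:: python

    # Hypothetical setup for ``node_degrees`` (not part of the original example).
    node_degrees = g.in_degrees().float()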

Define a GraphSAGE model for minibatch training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    class SAGE(nn.Module):
        def __init__(self, in_size, hidden_size):
            super().__init__()
            self.layers = nn.ModuleList()
            self.layers.append(dglnn.SAGEConv(in_size, hidden_size, "mean"))
            self.layers.append(dglnn.SAGEConv(hidden_size, hidden_size, "mean"))
            self.layers.append(dglnn.SAGEConv(hidden_size, hidden_size, "mean"))
            self.hidden_size = hidden_size
            self.predictor = nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.ReLU(),
                nn.Linear(hidden_size, hidden_size),
                nn.ReLU(),
                nn.Linear(hidden_size, 1),
            )

        def forward(self, blocks, x):
            hidden_x = x
            for layer_idx, (layer, block) in enumerate(zip(self.layers, blocks)):
                hidden_x = layer(block, hidden_x)
                is_last_layer = layer_idx == len(self.layers) - 1
                if not is_last_layer:
                    hidden_x = F.relu(hidden_x)
            return hidden_x
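
The ``model`` used in the training loop below is an instance of this class. A
hypothetical instantiation (the concrete sizes here are illustrative only, not
from the original example) might look like:

.. code:: python

    # Hypothetical instantiation; ``in_size`` must match the dimensionality of
    # the "feat" node feature, and ``hidden_size`` is a free choice.
    in_size, hidden_size = 128, 256
    model = SAGE(in_size, hidden_size).to(device)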

When a negative sampler is provided, the data loader will generate positive and
negative node pairs for each minibatch, in addition to the *message flow
graphs* (MFGs). Let's define a utility function that combines the positive and
negative node pairs and builds the corresponding label tensor:

.. code:: python

    def to_binary_link_dgl_computing_pack(data: gb.DGLMiniBatch):
        """Convert the minibatch to a training pair and a label tensor."""
        pos_src, pos_dst = data.positive_node_pairs
        neg_src, neg_dst = data.negative_node_pairs
        node_pairs = (
            torch.cat((pos_src, neg_src), dim=0),
            torch.cat((pos_dst, neg_dst), dim=0),
        )
        pos_label = torch.ones_like(pos_src)
        neg_label = torch.zeros_like(neg_src)
        labels = torch.cat([pos_label, neg_label], dim=0)
        return (node_pairs, labels.float())

Training loop
~~~~~~~~~~~~~

The training loop simply involves iterating over the data loader and feeding
the minibatches into the model defined above.

.. code:: python

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in tqdm.trange(args.epochs):
        model.train()
        total_loss = 0
        start_epoch_time = time.time()
        for step, data in enumerate(dataloader):
            # Unpack MiniBatch.
            compacted_pairs, labels = to_binary_link_dgl_computing_pack(data)
            node_feature = data.node_features["feat"]
            # Convert sampled subgraphs to DGL blocks.
            blocks = data.blocks

            # Get the embeddings of the input nodes.
            y = model(blocks, node_feature)
            logits = model.predictor(
                y[compacted_pairs[0]] * y[compacted_pairs[1]]
            ).squeeze()

            # Compute loss.
            loss = F.binary_cross_entropy_with_logits(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        end_epoch_time = time.time()
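
The accumulated ``total_loss`` and the epoch timestamps are not used further in
the snippet above; a minimal, hypothetical way to report them at the end of
each epoch iteration is:

.. code:: python

    # Hypothetical per-epoch summary (not part of the original example).
    print(
        f"Epoch {epoch:03d} | loss {total_loss / (step + 1):.4f} | "
        f"time {end_epoch_time - start_epoch_time:.2f}s"
    )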

DGL provides an
`example <https://github.com/dmlc/dgl/blob/master/examples/sampling/graphbolt/link_prediction.py>`__
that shows link prediction on homogeneous graphs.

For heterogeneous graphs
~~~~~~~~~~~~~~~~~~~~~~~~

The previous model can easily be extended to heterogeneous graphs. The only
difference is that you need to use :class:`~dgl.nn.HeteroGraphConv` to wrap
:class:`~dgl.nn.SAGEConv` according to the edge types.

.. code:: python

    class SAGE(nn.Module):
        def __init__(self, in_size, hidden_size):
            super().__init__()
            self.layers = nn.ModuleList()
            self.layers.append(dglnn.HeteroGraphConv({
                rel: dglnn.SAGEConv(in_size, hidden_size, "mean")
                for rel in rel_names
            }))
            self.layers.append(dglnn.HeteroGraphConv({
                rel: dglnn.SAGEConv(hidden_size, hidden_size, "mean")
                for rel in rel_names
            }))
            self.layers.append(dglnn.HeteroGraphConv({
                rel: dglnn.SAGEConv(hidden_size, hidden_size, "mean")
                for rel in rel_names
            }))
            self.hidden_size = hidden_size
            self.predictor = nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.ReLU(),
                nn.Linear(hidden_size, hidden_size),
                nn.ReLU(),
                nn.Linear(hidden_size, 1),
            )

        def forward(self, blocks, x):
            hidden_x = x
            for layer_idx, (layer, block) in enumerate(zip(self.layers, blocks)):
                hidden_x = layer(block, hidden_x)
                is_last_layer = layer_idx == len(self.layers) - 1
                if not is_last_layer:
                    # HeteroGraphConv returns a dict of per-node-type tensors,
                    # so apply the non-linearity per node type.
                    hidden_x = {
                        ntype: F.relu(h) for ntype, h in hidden_x.items()
                    }
            return hidden_x
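
The ``rel_names`` used above is not defined in this snippet; it is assumed to
be the list of relation (edge type) names of the heterogeneous graph. A
hypothetical way to obtain it when the graph is available as a
``dgl.DGLGraph`` ``g``:

.. code:: python

    # Hypothetical setup for ``rel_names`` (not part of the original example).
    rel_names = g.etypes  # e.g. ["follow", "purchase"]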

The data loader definition is also very similar to that for the homogeneous
graph. The only difference is that you need to give node types for feature
fetching.

.. code:: python

    datapipe = gb.ItemSampler(itemset, batch_size=1024, shuffle=True)
    datapipe = datapipe.sample_uniform_negative(graph, 5)
    datapipe = datapipe.sample_neighbor(graph, [10, 10])  # 2 layers.
    datapipe = datapipe.transform(gb.exclude_seed_edges)
    datapipe = datapipe.fetch_feature(
        feature,
        node_feature_keys={"user": ["feat"], "item": ["feat"]}
    )
    datapipe = datapipe.to_dgl()
    datapipe = datapipe.copy_to(device)
    dataloader = gb.MultiProcessDataLoader(datapipe, num_workers=0)

If you want to give your own negative sampling function, just inherit from the
:class:`~dgl.graphbolt.NegativeSampler` class and override the
:meth:`~dgl.graphbolt.NegativeSampler._sample_with_etype` method.

.. code:: python

    @functional_datapipe("customized_sample_negative")
    class CustomizedNegativeSampler(dgl.graphbolt.NegativeSampler):
        def __init__(self, datapipe, k, node_degrees):
            super().__init__(datapipe, k)
            # Caches the probability distribution per edge type.
            self.weights = {
                etype: node_degrees[etype] ** 0.75 for etype in node_degrees
            }
            self.k = k

        def _sample_with_etype(self, node_pairs, etype):
            src, _ = node_pairs
            src = src.repeat_interleave(self.k)
            dst = self.weights[etype].multinomial(len(src), replacement=True)
            return src, dst

    datapipe = datapipe.customized_sample_negative(5, node_degrees)

For heterogeneous graphs, node pairs are grouped by edge types.

.. code:: python

    def to_binary_link_dgl_computing_pack(data: gb.DGLMiniBatch, etype):
        """Convert the minibatch to a training pair and a label tensor."""
        pos_src, pos_dst = data.positive_node_pairs[etype]
        neg_src, neg_dst = data.negative_node_pairs[etype]
        node_pairs = (
            torch.cat((pos_src, neg_src), dim=0),
            torch.cat((pos_dst, neg_dst), dim=0),
        )
        pos_label = torch.ones_like(pos_src)
        neg_label = torch.zeros_like(neg_src)
        labels = torch.cat([pos_label, neg_label], dim=0)
        return (node_pairs, labels.float())

The training loop is again almost the same as that on the homogeneous graph,
except that the loss is computed on a specific edge type.

.. code:: python

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    category = "user"
    for epoch in tqdm.trange(args.epochs):
        model.train()
        total_loss = 0
        start_epoch_time = time.time()
        for step, data in enumerate(dataloader):
            # Unpack MiniBatch.
            compacted_pairs, labels = to_binary_link_dgl_computing_pack(data, category)
            node_features = {
                ntype: data.node_features[(ntype, "feat")]
                for ntype in data.blocks[0].srctypes
            }
            # Convert sampled subgraphs to DGL blocks.
            blocks = data.blocks

            # Get the embeddings of the input nodes.
            y = model(blocks, node_features)
            logits = model.predictor(
                y[category][compacted_pairs[0]] * y[category][compacted_pairs[1]]
            ).squeeze()

            # Compute loss.
            loss = F.binary_cross_entropy_with_logits(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        end_epoch_time = time.time()