"tests/git@developer.sourcefind.cn:OpenDAS/dgl.git" did not exist on "de54891fc89c958a4c1b1b37bf3b3d2155c6e9ab"
Commit 85231dc9 authored by xnouhz, committed by GitHub

[Example] Label Propagation and Correct&Smooth (#2852)



* [example] label propagation and correct&smooth

* update

* update

* update

* update

* [docs] add speed for c&s

* update

* [fix] remove gat&consistent with the author's code

* [feat] multi-type adj norm supported

* update
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
parent 5de4edd1
@@ -95,12 +95,17 @@ The folder contains example implementations of selected research papers related
| [DeeperGCN: All You Need to Train Deeper GCNs](#deepergcn) | | | :heavy_check_mark: | | :heavy_check_mark: |
| [Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting](#dcrnn) | | | :heavy_check_mark: | | |
| [GaAN: Gated Attention Networks for Learning on large and Spatiotemporal Graphs](#gaan) | | | :heavy_check_mark: | | |
| [Combining Label Propagation and Simple Models Out-performs Graph Neural Networks](#correct_and_smooth) | :heavy_check_mark: | | | | :heavy_check_mark: |
| [Learning from Labeled and Unlabeled Data with Label Propagation](#label_propagation) | :heavy_check_mark: | | | | |
## 2021
- <a name="bgnn"></a> Ivanov et al. Boost then Convolve: Gradient Boosting Meets Graph Neural Networks. [Paper link](https://openreview.net/forum?id=ebS5NUfoMKL).
- Example code: [PyTorch](../examples/pytorch/bgnn)
- Tags: semi-supervised node classification, tabular data, GBDT
- <a name="correct_and_smooth"></a> Huang et al. Combining Label Propagation and Simple Models Out-performs Graph Neural Networks. [Paper link](https://arxiv.org/abs/2010.13993).
- Example code: [PyTorch](../examples/pytorch/correct_and_smooth)
- Tags: efficiency, node classification, label propagation
## 2020
@@ -142,7 +147,7 @@ The folder contains example implementations of selected research papers related
- Tags: molecules, molecular property prediction, quantum chemistry
- <a name="tgn"></a> Rossi et al. Temporal Graph Networks For Deep Learning on Dynamic Graphs. [Paper link](https://arxiv.org/abs/2006.10637).
- Example code: [Pytorch](../examples/pytorch/tgn)
- Tags: temporal, node classification
- <a name="compgcn"></a> Vashishth, Shikhar, et al. Composition-based Multi-Relational Graph Convolutional Networks. [Paper link](https://arxiv.org/abs/1911.03082).
- Example code: [PyTorch](../examples/pytorch/compGCN)
- Tags: multi-relational graphs, graph neural network
@@ -152,7 +157,6 @@ The folder contains example implementations of selected research papers related
## 2019
- <a name="infograph"></a> Sun et al. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. [Paper link](https://arxiv.org/abs/1908.01000).
- Example code: [PyTorch](../examples/pytorch/infograph)
- Tags: semi-supervised graph regression, unsupervised graph classification
@@ -229,7 +233,6 @@ The folder contains example implementations of selected research papers related
- Example code: [PyTorch](../examples/pytorch/gnn_explainer)
- Tags: Graph Neural Network, Explainability
## 2018
- <a name="dgmg"></a> Li et al. Learning Deep Generative Models of Graphs. [Paper link](https://arxiv.org/abs/1803.03324).
@@ -419,6 +422,12 @@ The folder contains example implementations of selected research papers related
- Example code: [PyTorch](../examples/pytorch/graph_matching)
- Tags: graph edit distance, graph matching
## 2002
- <a name="label_propagation"></a> Zhu & Ghahramani. Learning from Labeled and Unlabeled Data with Label Propagation. [Paper link](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.3864&rep=rep1&type=pdf).
- Example code: [PyTorch](../examples/pytorch/label_propagation)
- Tags: node classification, label propagation
## 1998
- <a name="pagerank"></a> Page et al. The PageRank Citation Ranking: Bringing Order to the Web. [Paper link](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.5427).
# DGL Implementation of CorrectAndSmooth
This DGL example implements the method proposed in the paper [Combining Label Propagation and Simple Models Out-performs Graph Neural Networks](https://arxiv.org/abs/2010.13993). For the original implementation, see [here](https://github.com/CUAI/CorrectAndSmooth).
Contributor: [xnuohz](https://github.com/xnuohz)
### Requirements
The codebase is implemented in Python 3.7. Package version requirements are listed below.
```
dgl 0.6.0.post1
torch 1.7.0
ogb 1.3.0
```
### The graph datasets used in this example
Open Graph Benchmark (OGB). Dataset summary:
| Dataset | #Nodes | #Edges | #Node Feats | Metric |
| :-----------: | :-------: | :--------: | :---------: | :------: |
| ogbn-arxiv | 169,343 | 1,166,243 | 128 | Accuracy |
| ogbn-products | 2,449,029 | 61,859,140 | 100 | Accuracy |
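For reference, here is a minimal sketch of loading one of these datasets with OGB's DGL loader; these are the same calls used in `main.py` below.

```python
from ogb.nodeproppred import DglNodePropPredDataset

# Downloads the dataset on first use and returns a DGLGraph plus node labels
dataset = DglNodePropPredDataset(name='ogbn-arxiv')
g, labels = dataset[0]               # graph and label tensor of shape (num_nodes, 1)
split_idx = dataset.get_idx_split()  # dict with 'train' / 'valid' / 'test' node indices
print(g.num_nodes(), g.num_edges(), labels.shape)
```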
### Usage
Train a **base predictor** first, then apply **Correct & Smooth**; the commands below follow the original hyperparameters for each dataset.
##### ogbn-arxiv
* **MLP + C&S**
```bash
python main.py --dropout 0.5
python main.py --pretrain --correction-adj DA --smoothing-adj AD
```
* **Linear + C&S**
```bash
python main.py --model linear --dropout 0.5 --epochs 1000
python main.py --model linear --pretrain --correction-alpha 0.8 --smoothing-alpha 0.6 --correction-adj AD
```
##### ogbn-products
* **Linear + C&S**
```bash
python main.py --dataset ogbn-products --model linear --dropout 0.5 --epochs 1000 --lr 0.1
python main.py --dataset ogbn-products --model linear --pretrain --correction-alpha 0.6 --smoothing-alpha 0.9
```
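Each second command loads the saved base predictor and post-processes its probabilities in two stages. Below is a minimal sketch of that post-processing step using the classes defined in `model.py`; variable names follow `main.py`, and `g`, `y_soft`, `labels` and `mask_idx` are assumed to be prepared as in the script.

```python
from model import CorrectAndSmooth

# y_soft: (N, C) class probabilities from the frozen base predictor
# labels: (N, 1) ground-truth labels; mask_idx: train + valid node indices
cs = CorrectAndSmooth(num_correction_layers=50, correction_alpha=0.979, correction_adj='DAD',
                      num_smoothing_layers=50, smoothing_alpha=0.756, smoothing_adj='DAD',
                      scale=20.)
y_soft = cs.correct(g, y_soft, labels[mask_idx], mask_idx)  # "correct": propagate residual errors
y_soft = cs.smooth(g, y_soft, labels[mask_idx], mask_idx)   # "smooth": propagate corrected labels
y_pred = y_soft.argmax(dim=-1, keepdim=True)
```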
### Performance
#### ogbn-arxiv
| | MLP | MLP + C&S | Linear | Linear + C&S |
| :-------------: | :---: | :-------: | :----: | :----------: |
| Results (Author) | 55.58 | 68.72 | 51.06 | 70.24 |
| Results (DGL) | 56.12 | 68.63 | 52.49 | 71.69 |
#### ogbn-products
| | Linear | Linear + C&S |
| :-------------: | :----: | :----------: |
| Results (Author) | 47.67 | 82.34 |
| Results (DGL) | 47.71 | 79.57 |
### Speed
| ogbn-arxiv | Time | GPU Memory | Params |
| :------------------: | :-----------: | :--------: | :-----: |
| Author, Linear + C&S | 6.3 × 10^-3 | 1,248 MB | 5,160 |
| DGL, Linear + C&S | 5.6 × 10^-3 | 1,252 MB | 5,160 |
import argparse
import copy
import os
import torch
import torch.nn.functional as F
import torch.optim as optim
import dgl
from ogb.nodeproppred import DglNodePropPredDataset, Evaluator
from model import MLP, MLPLinear, CorrectAndSmooth
def evaluate(y_pred, y_true, idx, evaluator):
return evaluator.eval({
'y_true': y_true[idx],
'y_pred': y_pred[idx]
})['acc']
def main():
# check cuda
device = f'cuda:{args.gpu}' if torch.cuda.is_available() and args.gpu >= 0 else 'cpu'
# load data
dataset = DglNodePropPredDataset(name=args.dataset)
evaluator = Evaluator(name=args.dataset)
split_idx = dataset.get_idx_split()
g, labels = dataset[0] # graph: DGLGraph object, label: torch tensor of shape (num_nodes, num_tasks)
if args.dataset == 'ogbn-arxiv':
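# ogbn-arxiv is stored as a directed graph: make it bidirected and standardize the node features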
g = dgl.to_bidirected(g, copy_ndata=True)
feat = g.ndata['feat']
feat = (feat - feat.mean(0)) / feat.std(0)
g.ndata['feat'] = feat
g = g.to(device)
feats = g.ndata['feat']
labels = labels.to(device)
# load masks for train / validation / test
train_idx = split_idx["train"].to(device)
valid_idx = split_idx["valid"].to(device)
test_idx = split_idx["test"].to(device)
n_features = feats.size()[-1]
n_classes = dataset.num_classes
# load model
if args.model == 'mlp':
model = MLP(n_features, args.hid_dim, n_classes, args.num_layers, args.dropout)
elif args.model == 'linear':
model = MLPLinear(n_features, n_classes)
else:
raise NotImplementedError(f'Model {args.model} is not supported.')
model = model.to(device)
print(f'Model parameters: {sum(p.numel() for p in model.parameters())}')
if args.pretrain:
print('---------- Before ----------')
model.load_state_dict(torch.load(f'base/{args.dataset}-{args.model}.pt'))
model.eval()
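# the base predictor outputs log-probabilities (log_softmax); exponentiate to get class probabilities for C&S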
y_soft = model(feats).exp()
y_pred = y_soft.argmax(dim=-1, keepdim=True)
valid_acc = evaluate(y_pred, labels, valid_idx, evaluator)
test_acc = evaluate(y_pred, labels, test_idx, evaluator)
print(f'Valid acc: {valid_acc:.4f} | Test acc: {test_acc:.4f}')
print('---------- Correct & Smoothing ----------')
cs = CorrectAndSmooth(num_correction_layers=args.num_correction_layers,
correction_alpha=args.correction_alpha,
correction_adj=args.correction_adj,
num_smoothing_layers=args.num_smoothing_layers,
smoothing_alpha=args.smoothing_alpha,
smoothing_adj=args.smoothing_adj,
scale=args.scale)
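# both train and validation labels serve as the ground-truth set for the correct & smooth steps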
mask_idx = torch.cat([train_idx, valid_idx])
y_soft = cs.correct(g, y_soft, labels[mask_idx], mask_idx)
y_soft = cs.smooth(g, y_soft, labels[mask_idx], mask_idx)
y_pred = y_soft.argmax(dim=-1, keepdim=True)
valid_acc = evaluate(y_pred, labels, valid_idx, evaluator)
test_acc = evaluate(y_pred, labels, test_idx, evaluator)
print(f'Valid acc: {valid_acc:.4f} | Test acc: {test_acc:.4f}')
else:
opt = optim.Adam(model.parameters(), lr=args.lr)
best_acc = 0
best_model = copy.deepcopy(model)
# training
print('---------- Training ----------')
for i in range(args.epochs):
model.train()
opt.zero_grad()
logits = model(feats)
train_loss = F.nll_loss(logits[train_idx], labels.squeeze(1)[train_idx])
train_loss.backward()
opt.step()
model.eval()
with torch.no_grad():
logits = model(feats)
y_pred = logits.argmax(dim=-1, keepdim=True)
train_acc = evaluate(y_pred, labels, train_idx, evaluator)
valid_acc = evaluate(y_pred, labels, valid_idx, evaluator)
print(f'Epoch {i} | Train loss: {train_loss.item():.4f} | Train acc: {train_acc:.4f} | Valid acc {valid_acc:.4f}')
if valid_acc > best_acc:
best_acc = valid_acc
best_model = copy.deepcopy(model)
# testing & saving model
print('---------- Testing ----------')
best_model.eval()
logits = best_model(feats)
y_pred = logits.argmax(dim=-1, keepdim=True)
test_acc = evaluate(y_pred, labels, test_idx, evaluator)
print(f'Test acc: {test_acc:.4f}')
if not os.path.exists('base'):
os.makedirs('base')
torch.save(best_model.state_dict(), f'base/{args.dataset}-{args.model}.pt')
if __name__ == '__main__':
"""
Correct & Smoothing Hyperparameters
"""
parser = argparse.ArgumentParser(description='Base predictor(C&S)')
# Dataset
parser.add_argument('--gpu', type=int, default=0, help='-1 for cpu')
parser.add_argument('--dataset', type=str, default='ogbn-arxiv', choices=['ogbn-arxiv', 'ogbn-products'])
# Base predictor
parser.add_argument('--model', type=str, default='mlp', choices=['mlp', 'linear'])
parser.add_argument('--num-layers', type=int, default=3)
parser.add_argument('--hid-dim', type=int, default=256)
parser.add_argument('--dropout', type=float, default=0.4)
parser.add_argument('--lr', type=float, default=0.01)
parser.add_argument('--epochs', type=int, default=300)
# extra options for gat
parser.add_argument('--n-heads', type=int, default=3)
parser.add_argument('--attn_drop', type=float, default=0.05)
# C & S
parser.add_argument('--pretrain', action='store_true', help='Load the saved base predictor and apply C&S')
parser.add_argument('--num-correction-layers', type=int, default=50)
parser.add_argument('--correction-alpha', type=float, default=0.979)
parser.add_argument('--correction-adj', type=str, default='DAD')
parser.add_argument('--num-smoothing-layers', type=int, default=50)
parser.add_argument('--smoothing-alpha', type=float, default=0.756)
parser.add_argument('--smoothing-adj', type=str, default='DAD')
parser.add_argument('--scale', type=float, default=20.)
args = parser.parse_args()
print(args)
main()
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.function as fn
class MLPLinear(nn.Module):
def __init__(self, in_dim, out_dim):
super(MLPLinear, self).__init__()
self.linear = nn.Linear(in_dim, out_dim)
self.reset_parameters()
def reset_parameters(self):
self.linear.reset_parameters()
def forward(self, x):
return F.log_softmax(self.linear(x), dim=-1)
class MLP(nn.Module):
def __init__(self, in_dim, hid_dim, out_dim, num_layers, dropout=0.):
super(MLP, self).__init__()
assert num_layers >= 2
self.linears = nn.ModuleList()
self.bns = nn.ModuleList()
self.linears.append(nn.Linear(in_dim, hid_dim))
self.bns.append(nn.BatchNorm1d(hid_dim))
for _ in range(num_layers - 2):
self.linears.append(nn.Linear(hid_dim, hid_dim))
self.bns.append(nn.BatchNorm1d(hid_dim))
self.linears.append(nn.Linear(hid_dim, out_dim))
self.dropout = dropout
self.reset_parameters()
def reset_parameters(self):
for layer in self.linears:
layer.reset_parameters()
for layer in self.bns:
layer.reset_parameters()
def forward(self, x):
for linear, bn in zip(self.linears[:-1], self.bns):
x = linear(x)
x = F.relu(x, inplace=True)
x = bn(x)
x = F.dropout(x, p=self.dropout, training=self.training)
x = self.linears[-1](x)
return F.log_softmax(x, dim=-1)
class LabelPropagation(nn.Module):
r"""
Description
-----------
Introduced in `Learning from Labeled and Unlabeled Data with Label Propagation <https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.3864&rep=rep1&type=pdf>`_
.. math::
\mathbf{Y}^{\prime} = \alpha \cdot \mathbf{D}^{-1/2} \mathbf{A}
\mathbf{D}^{-1/2} \mathbf{Y} + (1 - \alpha) \mathbf{Y},
where labels on unlabeled nodes are inferred from labeled nodes via propagation.
Parameters
----------
num_layers: int
The number of propagations.
alpha: float
The :math:`\alpha` coefficient.
adj: str
'DAD': D^-0.5 * A * D^-0.5
'DA': D^-1 * A
'AD': A * D^-1
"""
def __init__(self, num_layers, alpha, adj='DAD'):
super(LabelPropagation, self).__init__()
self.num_layers = num_layers
self.alpha = alpha
self.adj = adj
@torch.no_grad()
def forward(self, g, labels, mask=None, post_step=lambda y: y.clamp_(0., 1.)):
with g.local_scope():
if labels.dtype == torch.long:
labels = F.one_hot(labels.view(-1)).to(torch.float32)
y = labels
if mask is not None:
y = torch.zeros_like(labels)
y[mask] = labels[mask]
last = (1 - self.alpha) * y
degs = g.in_degrees().float().clamp(min=1)
norm = torch.pow(degs, -0.5 if self.adj == 'DAD' else -1).to(labels.device).unsqueeze(1)
for _ in range(self.num_layers):
# Assume the graphs to be undirected
if self.adj in ['DAD', 'AD']:
y = norm * y
g.ndata['h'] = y
g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
y = self.alpha * g.ndata.pop('h')
if self.adj in ['DAD', 'DA']:
y = y * norm
y = post_step(last + y)
return y
class CorrectAndSmooth(nn.Module):
r"""
Description
-----------
Introduced in `Combining Label Propagation and Simple Models Out-performs Graph Neural Networks <https://arxiv.org/abs/2010.13993>`_
Parameters
----------
num_correction_layers: int
The number of propagation steps in the correction phase.
correction_alpha: float
The coefficient of correction.
correction_adj: str
'DAD': D^-0.5 * A * D^-0.5
'DA': D^-1 * A
'AD': A * D^-1
num_smoothing_layers: int
The number of propagation steps in the smoothing phase.
smoothing_alpha: float
The coefficient of smoothing.
smoothing_adj: str
'DAD': D^-0.5 * A * D^-0.5
'DA': D^-1 * A
'AD': A * D^-1
autoscale: bool, optional
If set to True, will automatically determine the scaling factor :math:`\sigma`. Default is True.
scale: float, optional
The scaling factor :math:`\sigma`, in case :obj:`autoscale = False`. Default is 1.
"""
def __init__(self,
num_correction_layers,
correction_alpha,
correction_adj,
num_smoothing_layers,
smoothing_alpha,
smoothing_adj,
autoscale=True,
scale=1.):
super(CorrectAndSmooth, self).__init__()
self.autoscale = autoscale
self.scale = scale
self.prop1 = LabelPropagation(num_correction_layers,
correction_alpha,
correction_adj)
self.prop2 = LabelPropagation(num_smoothing_layers,
smoothing_alpha,
smoothing_adj)
def correct(self, g, y_soft, y_true, mask):
with g.local_scope():
assert abs(float(y_soft.sum()) / y_soft.size(0) - 1.0) < 1e-2
numel = int(mask.sum()) if mask.dtype == torch.bool else mask.size(0)
assert y_true.size(0) == numel
if y_true.dtype == torch.long:
y_true = F.one_hot(y_true.view(-1), y_soft.size(-1)).to(y_soft.dtype)
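# residual error: zero on unlabeled nodes, (one-hot label - soft prediction) on labeled nodes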
error = torch.zeros_like(y_soft)
error[mask] = y_true - y_soft[mask]
if self.autoscale:
smoothed_error = self.prop1(g, error, post_step=lambda x: x.clamp_(-1., 1.))
sigma = error[mask].abs().sum() / numel
scale = sigma / smoothed_error.abs().sum(dim=1, keepdim=True)
scale[scale.isinf() | (scale > 1000)] = 1.0
result = y_soft + scale * smoothed_error
result[result.isnan()] = y_soft[result.isnan()]
return result
else:
def fix_input(x):
x[mask] = error[mask]
return x
smoothed_error = self.prop1(g, error, post_step=fix_input)
result = y_soft + self.scale * smoothed_error
result[result.isnan()] = y_soft[result.isnan()]
return result
def smooth(self, g, y_soft, y_true, mask):
with g.local_scope():
numel = int(mask.sum()) if mask.dtype == torch.bool else mask.size(0)
assert y_true.size(0) == numel
if y_true.dtype == torch.long:
y_true = F.one_hot(y_true.view(-1), y_soft.size(-1)).to(y_soft.dtype)
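# replace predictions on labeled nodes with their true labels, then propagate (smooth) over the graph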
y_soft[mask] = y_true
return self.prop2(g, y_soft)
# DGL Implementation of Label Propagation
This DGL example implements the method proposed in the paper [Learning from Labeled and Unlabeled Data with Label Propagation](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.3864&rep=rep1&type=pdf).
Contributor: [xnuohz](https://github.com/xnuohz)
### Requirements
The codebase is implemented in Python 3.7. Package version requirements are listed below.
```
dgl 0.6.0.post1
torch 1.7.0
```
### The graph datasets used in this example
DGL's built-in Cora, Pubmed and Citeseer datasets. Dataset summary:
| Dataset | #Nodes | #Edges | #Feats | #Classes | #Train Nodes | #Val Nodes | #Test Nodes |
| :------: | :----: | :----: | :----: | :------: | :----------: | :--------: | :---------: |
| Citeseer | 3,327 | 9,228 | 3,703 | 6 | 120 | 500 | 1000 |
| Cora | 2,708 | 10,556 | 1,433 | 7 | 140 | 500 | 1000 |
| Pubmed | 19,717 | 88,651 | 500 | 3 | 60 | 500 | 1000 |
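A minimal sketch of loading one of these built-in datasets; these are the same calls used in `main.py` below.

```python
from dgl.data import CoraGraphDataset

# Downloads the dataset on first use; the graph carries features, labels and split masks
dataset = CoraGraphDataset()
g = dataset[0]
print(g.num_nodes(), g.num_edges())
print(list(g.ndata.keys()))  # includes 'feat', 'label', 'train_mask', 'val_mask', 'test_mask'
```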
### Usage
```bash
# Cora
python main.py
# Citeseer
python main.py --dataset Citeseer --num-layers 100 --alpha 0.99
# Pubmed
python main.py --dataset Pubmed --num-layers 60 --alpha 1
```
### Performance
| Dataset | Cora | Citeseer | Pubmed |
| :----------: | :---: | :------: | :----: |
| Results (DGL) | 69.20 | 51.30 | 71.40 |
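The commands above boil down to a single propagation call. A minimal end-to-end sketch on Cora, using the `LabelPropagation` class from `model.py` below (default hyperparameters):

```python
from dgl import add_self_loop
from dgl.data import CoraGraphDataset
from model import LabelPropagation

# Load Cora and add self-loops so each node keeps part of its own label mass
g = add_self_loop(CoraGraphDataset()[0])
labels = g.ndata['label']
train_idx = g.ndata['train_mask'].nonzero(as_tuple=False).squeeze()
test_idx = g.ndata['test_mask'].nonzero(as_tuple=False).squeeze()

lp = LabelPropagation(num_layers=10, alpha=0.5)  # no learnable parameters
logits = lp(g, labels, mask=train_idx)           # (N, C) propagated soft labels
acc = (logits[test_idx].argmax(dim=1) == labels[test_idx]).float().mean().item()
print(f'Test acc: {acc:.4f}')
```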
import argparse
import torch
import dgl
from dgl.data import CoraGraphDataset, CiteseerGraphDataset, PubmedGraphDataset
from model import LabelPropagation
def main():
# check cuda
device = f'cuda:{args.gpu}' if torch.cuda.is_available() and args.gpu >= 0 else 'cpu'
# load data
if args.dataset == 'Cora':
dataset = CoraGraphDataset()
elif args.dataset == 'Citeseer':
dataset = CiteseerGraphDataset()
elif args.dataset == 'Pubmed':
dataset = PubmedGraphDataset()
else:
raise ValueError('Dataset {} is invalid.'.format(args.dataset))
g = dataset[0]
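# add self-loops so each node also aggregates its own label during propagation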
g = dgl.add_self_loop(g)
labels = g.ndata.pop('label').to(device).long()
# load masks for train / test, valid is not used.
train_mask = g.ndata.pop('train_mask')
test_mask = g.ndata.pop('test_mask')
train_idx = torch.nonzero(train_mask, as_tuple=False).squeeze().to(device)
test_idx = torch.nonzero(test_mask, as_tuple=False).squeeze().to(device)
g = g.to(device)
# label propagation
lp = LabelPropagation(args.num_layers, args.alpha)
logits = lp(g, labels, mask=train_idx)
test_acc = torch.sum(logits[test_idx].argmax(dim=1) == labels[test_idx]).item() / len(test_idx)
print("Test Acc {:.4f}".format(test_acc))
if __name__ == '__main__':
"""
Label Propagation Hyperparameters
"""
parser = argparse.ArgumentParser(description='LP')
parser.add_argument('--gpu', type=int, default=0)
parser.add_argument('--dataset', type=str, default='Cora')
parser.add_argument('--num-layers', type=int, default=10)
parser.add_argument('--alpha', type=float, default=0.5)
args = parser.parse_args()
print(args)
main()
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.function as fn
class LabelPropagation(nn.Module):
r"""
Description
-----------
Introduced in `Learning from Labeled and Unlabeled Data with Label Propagation <https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.3864&rep=rep1&type=pdf>`_
.. math::
\mathbf{Y}^{\prime} = \alpha \cdot \mathbf{D}^{-1/2} \mathbf{A}
\mathbf{D}^{-1/2} \mathbf{Y} + (1 - \alpha) \mathbf{Y},
where labels on unlabeled nodes are inferred from labeled nodes via propagation.
Parameters
----------
num_layers: int
The number of propagations.
alpha: float
The :math:`\alpha` coefficient.
"""
def __init__(self, num_layers, alpha):
super(LabelPropagation, self).__init__()
self.num_layers = num_layers
self.alpha = alpha
@torch.no_grad()
def forward(self, g, labels, mask=None, post_step=lambda y: y.clamp_(0., 1.)):
with g.local_scope():
if labels.dtype == torch.long:
labels = F.one_hot(labels.view(-1)).to(torch.float32)
y = labels
if mask is not None:
y = torch.zeros_like(labels)
y[mask] = labels[mask]
last = (1 - self.alpha) * y
degs = g.in_degrees().float().clamp(min=1)
norm = torch.pow(degs, -0.5).to(labels.device).unsqueeze(1)
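# symmetric normalization: scale labels by D^-1/2 before and after neighbor aggregation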
for _ in range(self.num_layers):
# Assume the graphs to be undirected
g.ndata['h'] = y * norm
g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
y = last + self.alpha * g.ndata.pop('h') * norm
y = post_step(y)
last = (1 - self.alpha) * y
return y