Unverified commit 8900450d authored by lt610, committed by GitHub

[Example] DAGNN (#2545)



* dagnn

* dagnn

* Update README.md

* Update README.md

* fixed some details

* fixed some details

* Update README.md

* Update README.md
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
parent c45f6eb5
@@ -45,28 +45,38 @@ The folder contains example implementations of selected research papers related
| [GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation](#gnnfilm) | :heavy_check_mark: | | | | |
| [Hierarchical Graph Pooling with Structure Learning](#hgp-sl) | | | :heavy_check_mark: | | |
| [Graph Representation Learning via Hard and Channel-Wise Attention Networks](#hardgat) |:heavy_check_mark: | | | | |
| [Towards Deeper Graph Neural Networks](#dagnn) | :heavy_check_mark: | | | | |
## 2020
- <a name="grand"></a> Feng et al. Graph Random Neural Network for Semi-Supervised Learning on Graphs. [Paper link](https://arxiv.org/abs/2005.11079).
- Example code: [PyTorch](../examples/pytorch/grand)
- Tags: semi-supervised node classification, simplifying graph convolution, data augmentation
- <a name="hgt"></a> Hu et al. Heterogeneous Graph Transformer. [Paper link](https://arxiv.org/abs/2003.01332).
- Example code: [PyTorch](../examples/pytorch/hgt)
- Tags: dynamic heterogeneous graphs, large-scale, node classification, link prediction
- <a name="mwe"></a> Chen. Graph Convolutional Networks for Graphs with Multi-Dimensionally Weighted Edges. [Paper link](https://cims.nyu.edu/~chenzh/files/GCN_with_edge_weights.pdf).
- Example code: [PyTorch on ogbn-proteins](../examples/pytorch/ogb/ogbn-proteins)
- Tags: node classification, weighted graphs, OGB
- <a name="sign"></a> Frasca et al. SIGN: Scalable Inception Graph Neural Networks. [Paper link](https://arxiv.org/abs/2004.11198).
- Example code: [PyTorch on ogbn-arxiv/products/mag](../examples/pytorch/ogb/sign), [PyTorch](../examples/pytorch/sign)
- Tags: node classification, OGB, large-scale, heterogeneous graphs
- <a name="prestrategy"></a> Hu et al. Strategies for Pre-training Graph Neural Networks. [Paper link](https://arxiv.org/abs/1905.12265).
- Example code: [Molecule embedding](https://github.com/awslabs/dgl-lifesci/tree/master/examples/molecule_embeddings), [PyTorch for custom data](https://github.com/awslabs/dgl-lifesci/tree/master/examples/property_prediction/csv_data_configuration)
- Tags: molecules, graph classification, unsupervised learning, self-supervised learning, molecular property prediction
- <a name="GNN-FiLM"></a> Marc Brockschmidt. GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation. [Paper link](https://arxiv.org/abs/1906.12192).
- Example code: [PyTorch](../examples/pytorch/GNN-FiLM)
- Tags: multi-relational graphs, hypernetworks, GNN architectures
- <a name="dagnn"></a> Liu et al. Towards Deeper Graph Neural Networks. [Paper link](https://arxiv.org/abs/2007.09296).
- Example code: [PyTorch](../examples/pytorch/dagnn)
- Tags: over-smoothing, node classification
## 2019
- <a name="appnp"></a> Klicpera et al. Predict then Propagate: Graph Neural Networks meet Personalized PageRank. [Paper link](https://arxiv.org/abs/1810.05997).
......
# DAGNN
This DGL example implements the GNN model proposed in the paper [Towards Deeper Graph Neural Networks](https://arxiv.org/abs/2007.09296).
Paper link: https://arxiv.org/abs/2007.09296
Author's code: https://github.com/divelab/DeeperGNN
Contributor: Liu Tang ([@lt610](https://github.com/lt610))
## Dependencies
- Python 3.6.10
- PyTorch 1.4.0
- numpy 1.18.1
- dgl 0.5.3
## Dataset
DGL's built-in Cora, Citeseer and Pubmed datasets are used (a minimal loading sketch follows the table). Dataset summary:
| Dataset | #Nodes | #Edges | #Feats | #Classes | #Train Nodes | #Val Nodes | #Test Nodes |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| Citeseer | 3,327 | 9,228 | 3,703 | 6 | 120 | 500 | 1000 |
| Cora | 2,708 | 10,556 | 1,433 | 7 | 140 | 500 | 1000 |
| Pubmed | 19,717 | 88,651 | 500 | 3 | 60 | 500 | 1000 |
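For reference, these datasets can be loaded directly from DGL's built-in data module. A minimal sketch (assuming the DGL ≥ 0.5 version listed under Dependencies):
```python
from dgl.data import CoraGraphDataset

# load Cora with the standard split shipped with DGL
dataset = CoraGraphDataset()
g = dataset[0]                       # a single DGLGraph with features, labels and masks attached
print(g.num_nodes(), g.num_edges())  # 2708 10556
feats = g.ndata['feat']              # node features, shape (2708, 1433)
labels = g.ndata['label']            # node labels, dataset.num_classes == 7
train_mask = g.ndata['train_mask']   # boolean mask selecting the 140 training nodes
```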
## Arguments
###### Dataset options
```
--dataset str The graph dataset name. Default is 'Cora'.
```
###### GPU options
```
--gpu int GPU index. Default is -1, using CPU.
```
###### Model options
```
--runs int Number of training runs. Default is 1.
--epochs int Number of training epochs. Default is 1500.
--early-stopping int Early stopping patience in epochs. Default is 100.
--lr float Adam optimizer learning rate. Default is 0.01.
--lamb float L2 regularization coefficient. Default is 5e-3.
--k int Number of propagation layers. Default is 12.
--hid-dim int Hidden layer dimensionality. Default is 64.
--dropout float Dropout rate. Default is 0.8.
```
## Examples
Train a model with the hyperparameters from the original paper on each dataset:
```bash
# Cora:
python main.py --dataset Cora --gpu 0 --runs 100 --lamb 0.005 --k 12
# Citeseer:
python main.py --dataset Citeseer --gpu 0 --runs 100 --lamb 0.02 --k 16
# Pubmed:
python main.py --dataset Pubmed --gpu 0 --runs 100 --lamb 0.005 --k 20
```
### Performance
#### On Cora, Citeseer and Pubmed
| Dataset | Cora | Citeseer | Pubmed |
| :-: | :-: | :-: | :-: |
| Accuracy reported (100 runs) | 84.4 ± 0.5 | 73.3 ± 0.6 | 80.5 ± 0.5 |
| Accuracy DGL (100 runs) | 84.3 ± 0.5 | 73.1 ± 0.9 | 80.5 ± 0.4 |
import argparse
from torch import nn
from torch.nn import Parameter
import dgl.function as fn
from torch.nn import functional as F
from dgl.data import CoraGraphDataset, CiteseerGraphDataset, PubmedGraphDataset
import numpy as np
import torch
from tqdm import trange
from utils import generate_random_seeds, set_random_state, evaluate
class DAGNNConv(nn.Module):
def __init__(self,
in_dim,
k):
super(DAGNNConv, self).__init__()
self.s = Parameter(torch.FloatTensor(in_dim, 1))
self.k = k
self.reset_parameters()
def reset_parameters(self):
gain = nn.init.calculate_gain('sigmoid')
nn.init.xavier_uniform_(self.s, gain=gain)
def forward(self, graph, feats):
with graph.local_scope():
results = [feats]
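# results holds the node representations after 0, 1, ..., k propagation steps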
degs = graph.in_degrees().float()
norm = torch.pow(degs, -0.5)
norm = norm.to(feats.device).unsqueeze(1)
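# each hop applies symmetrically normalized propagation, i.e. D^{-1/2} A D^{-1/2} H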
for _ in range(self.k):
feats = feats * norm
graph.ndata['h'] = feats
graph.update_all(fn.copy_u('h', 'm'),
fn.sum('m', 'h'))
feats = graph.ndata['h']
feats = feats * norm
results.append(feats)
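# adaptive adjustment: project every hop's representation onto s to get a
# retainment score per hop, then combine all hops as a score-weighted sum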
H = torch.stack(results, dim=1)
S = torch.sigmoid(torch.matmul(H, self.s))
S = S.permute(0, 2, 1)
H = torch.matmul(S, H).squeeze()
return H
class MLPLayer(nn.Module):
def __init__(self,
in_dim,
out_dim,
bias=True,
activation=None,
dropout=0):
super(MLPLayer, self).__init__()
self.linear = nn.Linear(in_dim, out_dim, bias=bias)
self.activation = activation
self.dropout = nn.Dropout(dropout)
self.reset_parameters()
def reset_parameters(self):
gain = 1.
if self.activation is F.relu:
gain = nn.init.calculate_gain('relu')
nn.init.xavier_uniform_(self.linear.weight, gain=gain)
if self.linear.bias is not None:
nn.init.zeros_(self.linear.bias)
def forward(self, feats):
feats = self.dropout(feats)
feats = self.linear(feats)
if self.activation:
feats = self.activation(feats)
return feats
class DAGNN(nn.Module):
def __init__(self,
k,
in_dim,
hid_dim,
out_dim,
bias=True,
activation=F.relu,
dropout=0):
super(DAGNN, self).__init__()
self.mlp = nn.ModuleList()
self.mlp.append(MLPLayer(in_dim=in_dim, out_dim=hid_dim, bias=bias,
activation=activation, dropout=dropout))
self.mlp.append(MLPLayer(in_dim=hid_dim, out_dim=out_dim, bias=bias,
activation=None, dropout=dropout))
self.dagnn = DAGNNConv(in_dim=out_dim, k=k)
def forward(self, graph, feats):
for layer in self.mlp:
feats = layer(feats)
feats = self.dagnn(graph, feats)
return feats
def main(args):
# Step 1: Prepare graph data and retrieve train/validation/test index ============================= #
# Load from DGL dataset
if args.dataset == 'Cora':
dataset = CoraGraphDataset()
elif args.dataset == 'Citeseer':
dataset = CiteseerGraphDataset()
elif args.dataset == 'Pubmed':
dataset = PubmedGraphDataset()
else:
raise ValueError('Dataset {} is invalid.'.format(args.dataset))
graph = dataset[0]
graph = graph.add_self_loop()
# check cuda
if args.gpu >= 0 and torch.cuda.is_available():
device = 'cuda:{}'.format(args.gpu)
else:
device = 'cpu'
# retrieve the number of classes
n_classes = dataset.num_classes
# retrieve labels of ground truth
labels = graph.ndata.pop('label').to(device).long()
# Extract node features
feats = graph.ndata.pop('feat').to(device)
n_features = feats.shape[-1]
# retrieve masks for train/validation/test
train_mask = graph.ndata.pop('train_mask')
val_mask = graph.ndata.pop('val_mask')
test_mask = graph.ndata.pop('test_mask')
train_idx = torch.nonzero(train_mask, as_tuple=False).squeeze().to(device)
val_idx = torch.nonzero(val_mask, as_tuple=False).squeeze().to(device)
test_idx = torch.nonzero(test_mask, as_tuple=False).squeeze().to(device)
graph = graph.to(device)
# Step 2: Create model =================================================================== #
model = DAGNN(k=args.k,
in_dim=n_features,
hid_dim=args.hid_dim,
out_dim=n_classes,
dropout=args.dropout)
model = model.to(device)
# Step 3: Create training components ===================================================== #
loss_fn = F.cross_entropy
opt = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.lamb)
# Step 4: training epochs =============================================================== #
loss = float('inf')
best_acc = 0
no_improvement = 0
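# early stopping: stop once validation loss has not improved for args.early_stopping consecutive epochs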
epochs = trange(args.epochs, desc='Accuracy & Loss')
for _ in epochs:
model.train()
logits = model(graph, feats)
# compute loss
train_loss = loss_fn(logits[train_idx], labels[train_idx])
# backward
opt.zero_grad()
train_loss.backward()
opt.step()
train_loss, train_acc, valid_loss, valid_acc, test_loss, test_acc = evaluate(model, graph, feats, labels,
(train_idx, val_idx, test_idx))
# Print out performance
epochs.set_description('Train Acc {:.4f} | Train Loss {:.4f} | Val Acc {:.4f} | Val loss {:.4f}'.format(
train_acc, train_loss.item(), valid_acc, valid_loss.item()))
if valid_loss > loss:
no_improvement += 1
if no_improvement == args.early_stopping:
print('Early stop.')
break
else:
no_improvement = 0
loss = valid_loss
best_acc = test_acc
print("Test Acc {:.4f}".format(best_acc))
return best_acc
if __name__ == "__main__":
"""
DAGNN Model Hyperparameters
"""
parser = argparse.ArgumentParser(description='DAGNN')
# data source params
parser.add_argument('--dataset', type=str, default='Cora', choices=["Cora", "Citeseer", "Pubmed"], help='Name of dataset.')
# cuda params
parser.add_argument('--gpu', type=int, default=-1, help='GPU index. Default: -1, using CPU.')
# training params
parser.add_argument('--runs', type=int, default=1, help='Training runs.')
parser.add_argument('--epochs', type=int, default=1500, help='Training epochs.')
parser.add_argument('--early-stopping', type=int, default=100, help='Patience (in epochs) before early stopping.')
parser.add_argument('--lr', type=float, default=0.01, help='Learning rate.')
parser.add_argument('--lamb', type=float, default=0.005, help='L2 reg.')
# model params
parser.add_argument('--k', type=int, default=12, help='Number of propagation layers.')
parser.add_argument("--hid-dim", type=int, default=64, help='Hidden layer dimensionalities.')
parser.add_argument('--dropout', type=float, default=0.8, help='Dropout rate.')
args = parser.parse_args()
print(args)
acc_lists = []
random_seeds = generate_random_seeds(seed=1222, nums=args.runs)
for run in range(args.runs):
set_random_state(random_seeds[run])
acc_lists.append(main(args))
acc_lists = np.array(acc_lists)
mean = np.around(np.mean(acc_lists, axis=0), decimals=4)
std = np.around(np.std(acc_lists, axis=0), decimals=4)
print('Total acc: ', acc_lists)
print('mean', mean)
print('std', std)
\ No newline at end of file
import numpy as np
import random
from torch.nn import functional as F
import torch
def evaluate(model, graph, feats, labels, idxs):
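# compute cross-entropy loss and accuracy for each index set in idxs and return them
# as a flat tuple, e.g. (train_loss, train_acc, val_loss, val_acc, test_loss, test_acc)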
model.eval()
with torch.no_grad():
logits = model(graph, feats)
results = ()
for idx in idxs:
loss = F.cross_entropy(logits[idx], labels[idx])
acc = torch.sum(logits[idx].argmax(dim=1) == labels[idx]).item() / len(idx)
results += (loss, acc)
return results
def generate_random_seeds(seed, nums):
random.seed(seed)
return [random.randint(1, 999999999) for _ in range(nums)]
def set_random_state(seed):
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
\ No newline at end of file