"examples/git@developer.sourcefind.cn:OpenDAS/vision.git" did not exist on "876117b5d3c72aa7b2472ab3051bab6380f379d2"
Unverified commit b2b531e0 authored by Hengrui Zhang, committed by GitHub

[Example] add implementation of grace (#2828)



* [Example] add implementation of grace

* [Doc] add grace in the index file

* fix

* fix typos
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
parent e7046f1e
@@ -11,6 +11,7 @@ The folder contains example implementations of selected research papers related
| [Network Embedding with Completely-imbalanced Labels](#rect) | :heavy_check_mark: | | | | |
| [Boost then Convolve: Gradient Boosting Meets Graph Neural Networks](#bgnn) | :heavy_check_mark: | | | | |
| [Contrastive Multi-View Representation Learning on Graphs](#mvgrl) | :heavy_check_mark: | | :heavy_check_mark: | | |
| [Deep Graph Contrastive Representation Learning](#grace) | :heavy_check_mark: | | | | |
| [Graph Random Neural Network for Semi-Supervised Learning on Graphs](#grand) | :heavy_check_mark: | | | | |
| [Heterogeneous Graph Transformer](#hgt) | :heavy_check_mark: | :heavy_check_mark: | | | |
| [Graph Convolutional Networks for Graphs with Multi-Dimensionally Weighted Edges](#mwe) | :heavy_check_mark: | | | | :heavy_check_mark: |
@@ -109,6 +110,9 @@ The folder contains example implementations of selected research papers related
- <a name="mvgrl"></a> Hassani and Khasahmadi. Contrastive Multi-View Representation Learning on Graphs. [Paper link](https://arxiv.org/abs/2006.05582).
- Example code: [PyTorch](../examples/pytorch/mvgrl)
- Tags: graph diffusion, self-supervised learning on graphs.
- <a name="grace"></a> Zhu et al. Deep Graph Contrastive Representation Learning. [Paper link](https://arxiv.org/abs/2006.04131).
- Example code: [PyTorch](../examples/pytorch/grace)
- Tags: contrastive learning for node classification.
- <a name="grand"></a> Feng et al. Graph Random Neural Network for Semi-Supervised Learning on Graphs. [Paper link](https://arxiv.org/abs/2005.11079).
- Example code: [PyTorch](../examples/pytorch/grand)
- Tags: semi-supervised node classification, simplifying graph convolution, data augmentation
@@ -136,19 +140,12 @@ The folder contains example implementations of selected research papers related
- <a name="dimenet"></a> Klicpera et al. Directional Message Passing for Molecular Graphs. [Paper link](https://arxiv.org/abs/2003.03123).
- Example code: [PyTorch](../examples/pytorch/dimenet)
- Tags: molecules, molecular property prediction, quantum chemistry
- <a name="dagnn"></a> Rossi et al. Temporal Graph Networks For Deep Learning on Dynamic Graphs. [Paper link](https://arxiv.org/abs/2006.10637).
- Example code: [PyTorch](../examples/pytorch/tgn)
- Tags: over-smoothing, node classification
- <a name="dagnn"></a> Rossi et al. Temporal Graph Networks For Deep Learning on Dynamic Graphs. [Paper link](https://arxiv.org/abs/2006.10637).
- Example code: [PyTorch](../examples/pytorch/tgn)
- <a name="tgn"></a> Rossi et al. Temporal Graph Networks For Deep Learning on Dynamic Graphs. [Paper link](https://arxiv.org/abs/2006.10637).
- Example code: [PyTorch](../examples/pytorch/tgn)
- Tags: over-smoothing, node classification
- <a name="compgcn"></a> Vashishth, Shikhar, et al. Composition-based Multi-Relational Graph Convolutional Networks. [Paper link](https://arxiv.org/abs/1911.03082).
- Example code: [PyTorch](../examples/pytorch/compGCN)
- Tags: multi-relational graphs, graph neural network
- <a name="deepergcn"></a> Li et al. DeeperGCN: All You Need to Train Deeper GCNs. [Paper link](https://arxiv.org/abs/2006.07739).
- Example code: [PyTorch](../examples/pytorch/deepergcn)
- Tags: over-smoothing, deeper gnn, OGB
examples/pytorch/grace/README.md
# DGL Implementation of GRACE
This DGL example implements the model proposed in the paper [Deep Graph Contrastive Representation Learning](https://arxiv.org/abs/2006.04131).
Author's code: https://github.com/CRIPAC-DIG/GRACE
## Example Implementor
This example was implemented by [Hengrui Zhang](https://github.com/hengruizhang98) when he was an applied scientist intern at AWS Shanghai AI Lab.
## Dependencies
- Python 3.7
- PyTorch 1.7.1
- dgl 0.6.0
## Datasets
##### Unsupervised Node Classification Datasets:
'Cora', 'Citeseer' and 'Pubmed'
| Dataset | # Nodes | # Edges | # Classes |
| -------- | ------- | ------- | --------- |
| Cora | 2,708 | 10,556 | 7 |
| Citeseer | 3,327 | 9,228 | 6 |
| Pubmed | 19,717 | 88,651 | 3 |
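These citation graphs ship with DGL and can be loaded directly through its built-in dataset classes (the same classes `dataset.py` in this example uses). A minimal sketch; the printed counts match the Cora row above:
```python
from dgl.data import CoraGraphDataset

# Downloads the data on first use, then returns the single citation graph.
dataset = CoraGraphDataset()
graph = dataset[0]

print(graph.num_nodes(), graph.num_edges())  # 2708, 10556
print(graph.ndata['feat'].shape)             # bag-of-words node features
```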
## Arguments
```
--dataname  str   The graph dataset name ('cora', 'citeseer' or 'pubmed'). Default is 'cora'.
--gpu       int   GPU index; set to -1 to run on CPU. Default is 0.
--split     str   Dataset splitting method ('random' or 'public'). Default is 'random'.
```
## How to run examples
In the paper (as well as in the authors' repo), the training and test sets are split randomly at a 1:9 ratio. To allow a fair comparison with models evaluated on the standard split of these datasets, termed the public split, this repo also provides results under the public split. To run the examples:
```bash
# Cora with random split
python main.py --dataname cora
# Cora with public split
python main.py --dataname cora --split public
```
Replace 'cora' with 'citeseer' or 'pubmed' to run the example on the other datasets.
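For instance, to train on Pubmed with the public split on CPU (main.py falls back to CPU when `--gpu -1` is passed):
```bash
# Pubmed with public split, forced onto CPU
python main.py --dataname pubmed --split public --gpu -1
```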
## Performance
We use the same hyperparameter settings as the authors; see config.yaml for the detailed hyperparameters of each dataset.
Random split (Train/Test = 1:9)
| Dataset | Cora | Citeseer | Pubmed |
| :---------------: | :--: | :------: | :----: |
| Accuracy Reported | 83.3 | 72.1 | 86.7 |
| Author's Code | 83.1 | 71.0 | 86.3 |
| DGL | 83.4 | 71.4 | 86.1 |
Public split
| Dataset | Cora | Citeseer | Pubmed |
| :-----------: | :--: | :------: | :----: |
| Author's Code | 79.9 | 68.6 | 81.3 |
| DGL | 80.1 | 68.9 | 81.2 |
examples/pytorch/grace/aug.py

# Data augmentation on graphs via edge dropping and feature masking
import torch as th
import numpy as np
import dgl


def aug(graph, x, feat_drop_rate, edge_mask_rate):
    # Build one augmented view: drop edges, mask feature dimensions, re-add self-loops.
    ng = drop_edge(graph, edge_mask_rate)
    feat = drop_feat(x, feat_drop_rate)
    ng = ng.add_self_loop()
    return ng, feat


def drop_edge(graph, drop_prob):
    # Keep each edge independently with probability 1 - drop_prob.
    E = graph.num_edges()
    mask_rates = th.FloatTensor(np.ones(E) * drop_prob)
    masks = th.bernoulli(1 - mask_rates)
    edge_idx = masks.nonzero().squeeze(1)
    sg = dgl.edge_subgraph(graph, edge_idx, preserve_nodes=True)
    return sg


def drop_feat(x, drop_prob):
    # Zero out each feature dimension independently with probability drop_prob.
    D = x.shape[1]
    mask_rates = th.FloatTensor(np.ones(D) * drop_prob)
    drop_mask = th.bernoulli(mask_rates).to(th.bool)
    x = x.clone()
    x[:, drop_mask] = 0
    return x
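As a usage sketch (not part of the example files), the training script imports this module with `from aug import aug` and calls it twice per step with different drop rates to build the two contrastive views. On a hypothetical toy graph:
```python
import dgl
import torch as th

from aug import aug  # the augmentation module above

# Toy 4-node cycle with random 8-dimensional node features (hypothetical sizes).
g = dgl.graph(([0, 1, 2, 3], [1, 2, 3, 0]))
x = th.randn(4, 8)

# Two stochastic views with different edge/feature drop rates, as in GRACE.
g1, x1 = aug(g, x, feat_drop_rate=0.3, edge_mask_rate=0.2)
g2, x2 = aug(g, x, feat_drop_rate=0.4, edge_mask_rate=0.0)

print(g1.num_edges(), g2.num_edges())  # counts differ across views due to edge dropping
```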
examples/pytorch/grace/config.yaml

cora:
  learning_rate: 0.0005
  num_hidden: 128
  num_proj_hidden: 128
  activation: 'relu'
  num_layers: 2
  drop_edge_rate_1: 0.2
  drop_edge_rate_2: 0.4
  drop_feature_rate_1: 0.3
  drop_feature_rate_2: 0.4
  tau: 0.4
  num_epochs: 400
  weight_decay: 0.00001

citeseer:
  learning_rate: 0.001
  num_hidden: 256
  num_proj_hidden: 256
  activation: 'prelu'
  num_layers: 2
  drop_edge_rate_1: 0.2
  drop_edge_rate_2: 0.0
  drop_feature_rate_1: 0.3
  drop_feature_rate_2: 0.2
  tau: 0.9
  num_epochs: 200
  weight_decay: 0.00001

pubmed:
  learning_rate: 0.001
  num_hidden: 256
  num_proj_hidden: 256
  activation: 'relu'
  num_layers: 2
  drop_edge_rate_1: 0.4
  drop_edge_rate_2: 0.1
  drop_feature_rate_1: 0.0
  drop_feature_rate_2: 0.2
  tau: 0.7
  num_epochs: 1500
  weight_decay: 0.00001
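The training script reads this file with PyYAML and indexes it by dataset name. A minimal sketch of that lookup, assuming config.yaml sits in the working directory:
```python
import yaml
from yaml import SafeLoader

# Load all per-dataset hyperparameter blocks and select one of them.
with open('config.yaml') as f:
    config = yaml.load(f, Loader=SafeLoader)['cora']

print(config['learning_rate'], config['tau'])  # 0.0005 0.4
```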
examples/pytorch/grace/dataset.py

from dgl.data import CoraGraphDataset, CiteseerGraphDataset, PubmedGraphDataset


def load(name):
    if name == 'cora':
        dataset = CoraGraphDataset()
    elif name == 'citeseer':
        dataset = CiteseerGraphDataset()
    elif name == 'pubmed':
        dataset = PubmedGraphDataset()

    graph = dataset[0]

    train_mask = graph.ndata.pop('train_mask')
    test_mask = graph.ndata.pop('test_mask')

    feat = graph.ndata.pop('feat')
    labels = graph.ndata.pop('label')

    return graph, feat, labels, train_mask, test_mask
examples/pytorch/grace/eval.py

'''
Code adapted from https://github.com/CRIPAC-DIG/GRACE
Linear evaluation on learned node embeddings
'''
import numpy as np
import functools

from sklearn.metrics import f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import normalize, OneHotEncoder


def repeat(n_times):
    # Decorator: run the evaluation n_times and report the mean/std of each metric.
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            results = [f(*args, **kwargs) for _ in range(n_times)]
            statistics = {}
            for key in results[0].keys():
                values = [r[key] for r in results]
                statistics[key] = {
                    'mean': np.mean(values),
                    'std': np.std(values)}
            print_statistics(statistics, f.__name__)
            return statistics
        return wrapper
    return decorator


def prob_to_one_hot(y_pred):
    # Convert class probabilities to a one-hot prediction matrix.
    ret = np.zeros(y_pred.shape, bool)
    indices = np.argmax(y_pred, axis=1)
    for i in range(y_pred.shape[0]):
        ret[i][indices[i]] = True
    return ret


def print_statistics(statistics, function_name):
    print(f'(E) | {function_name}:', end=' ')
    for i, key in enumerate(statistics.keys()):
        mean = statistics[key]['mean']
        std = statistics[key]['std']
        print(f'{key}={mean:.4f}+-{std:.4f}', end='')
        if i != len(statistics.keys()) - 1:
            print(',', end=' ')
        else:
            print()


@repeat(3)
def label_classification(embeddings, y, train_mask, test_mask, split='random', ratio=0.1):
    # Fit an l2-regularized logistic regression on the frozen embeddings and
    # report micro- and macro-F1 on the held-out nodes.
    X = embeddings.detach().cpu().numpy()
    Y = y.detach().cpu().numpy()
    Y = Y.reshape(-1, 1)
    onehot_encoder = OneHotEncoder(categories='auto').fit(Y)
    Y = onehot_encoder.transform(Y).toarray().astype(bool)

    X = normalize(X, norm='l2')

    if split == 'random':
        X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=1 - ratio)
    elif split == 'public':
        X_train = X[train_mask]
        X_test = X[test_mask]
        y_train = Y[train_mask]
        y_test = Y[test_mask]

    logreg = LogisticRegression(solver='liblinear')
    c = 2.0 ** np.arange(-10, 10)

    clf = GridSearchCV(estimator=OneVsRestClassifier(logreg),
                       param_grid=dict(estimator__C=c), n_jobs=8, cv=5,
                       verbose=0)
    clf.fit(X_train, y_train)

    y_pred = clf.predict_proba(X_test)
    y_pred = prob_to_one_hot(y_pred)

    micro = f1_score(y_test, y_pred, average="micro")
    macro = f1_score(y_test, y_pred, average="macro")

    return {
        'F1Mi': micro,
        'F1Ma': macro
    }
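As a usage sketch with hypothetical sizes (not taken from the example), `label_classification` only needs frozen embeddings, integer labels and two boolean masks, so the evaluation pipeline can be smoke-tested on random tensors:
```python
import torch as th

from eval import label_classification  # the module above

N, D, C = 200, 32, 3                   # hypothetical node / feature / class counts
embeds = th.randn(N, D)                # stand-in for model.get_embedding(...)
labels = th.randint(0, C, (N,))

train_mask = th.zeros(N, dtype=th.bool)
train_mask[:100] = True
test_mask = ~train_mask

# Runs 3 times (see @repeat) and prints mean/std of micro- and macro-F1.
label_classification(embeds, labels, train_mask, test_mask, split='public')
```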
examples/pytorch/grace/main.py

import argparse
import warnings

import torch as th
import torch.nn as nn
import yaml
from yaml import SafeLoader

from model import Grace
from aug import aug
from dataset import load
from eval import label_classification

warnings.filterwarnings('ignore')

parser = argparse.ArgumentParser()
parser.add_argument('--dataname', type=str, default='cora', choices=['cora', 'citeseer', 'pubmed'])
parser.add_argument('--gpu', type=int, default=0)
parser.add_argument('--split', type=str, default='random', choices=['random', 'public'])
args = parser.parse_args()

if args.gpu != -1 and th.cuda.is_available():
    args.device = 'cuda:{}'.format(args.gpu)
else:
    args.device = 'cpu'

if __name__ == '__main__':
    # Step 1: Load hyperparameters =================================================================== #
    config = 'config.yaml'
    config = yaml.load(open(config), Loader=SafeLoader)[args.dataname]

    lr = config['learning_rate']
    hid_dim = config['num_hidden']
    out_dim = config['num_proj_hidden']

    num_layers = config['num_layers']
    act_fn = ({'relu': nn.ReLU(), 'prelu': nn.PReLU()})[config['activation']]

    drop_edge_rate_1 = config['drop_edge_rate_1']
    drop_edge_rate_2 = config['drop_edge_rate_2']
    drop_feature_rate_1 = config['drop_feature_rate_1']
    drop_feature_rate_2 = config['drop_feature_rate_2']

    temp = config['tau']
    epochs = config['num_epochs']
    wd = config['weight_decay']

    # Step 2: Prepare data =================================================================== #
    graph, feat, labels, train_mask, test_mask = load(args.dataname)
    in_dim = feat.shape[1]

    # Step 3: Create model =================================================================== #
    model = Grace(in_dim, hid_dim, out_dim, num_layers, act_fn, temp)
    model = model.to(args.device)

    optimizer = th.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)

    # Step 4: Training ======================================================================= #
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()

        # Generate two stochastically augmented views of the same graph.
        graph1, feat1 = aug(graph, feat, drop_feature_rate_1, drop_edge_rate_1)
        graph2, feat2 = aug(graph, feat, drop_feature_rate_2, drop_edge_rate_2)

        graph1 = graph1.to(args.device)
        graph2 = graph2.to(args.device)

        feat1 = feat1.to(args.device)
        feat2 = feat2.to(args.device)

        loss = model(graph1, graph2, feat1, feat2)
        loss.backward()
        optimizer.step()

        print(f'Epoch={epoch:03d}, loss={loss.item():.4f}')

    # Step 5: Linear evaluation ============================================================== #
    print("=== Final Evaluation ===")
    graph = graph.add_self_loop()
    graph = graph.to(args.device)
    feat = feat.to(args.device)
    embeds = model.get_embedding(graph, feat)

    # Evaluate the frozen embeddings with a logistic-regression probe.
    label_classification(embeds, labels, train_mask, test_mask, split=args.split, ratio=0.1)
examples/pytorch/grace/model.py

import torch as th
import torch.nn as nn
import torch.nn.functional as F

from dgl.nn import GraphConv


# Multi-layer graph convolutional network encoder
class GCN(nn.Module):
    def __init__(self, in_dim, out_dim, act_fn, num_layers=2):
        super(GCN, self).__init__()
        assert num_layers >= 2

        self.num_layers = num_layers
        self.convs = nn.ModuleList()
        self.convs.append(GraphConv(in_dim, out_dim * 2))

        for _ in range(self.num_layers - 2):
            self.convs.append(GraphConv(out_dim * 2, out_dim * 2))

        self.convs.append(GraphConv(out_dim * 2, out_dim))
        self.act_fn = act_fn

    def forward(self, graph, feat):
        for i in range(self.num_layers):
            feat = self.act_fn(self.convs[i](graph, feat))
        return feat


# Two-layer perceptron used as the projection head
class MLP(nn.Module):
    def __init__(self, in_dim, out_dim):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(in_dim, out_dim)
        self.fc2 = nn.Linear(out_dim, in_dim)

    def forward(self, x):
        z = F.elu(self.fc1(x))
        return self.fc2(z)


class Grace(nn.Module):
    r"""
    GRACE model

    Parameters
    ----------
    in_dim: int
        Input feature size.
    hid_dim: int
        Hidden feature size.
    out_dim: int
        Output feature size.
    num_layers: int
        Number of GNN encoder layers.
    act_fn: nn.Module
        Activation function.
    temp: float
        Temperature constant.
    """
    def __init__(self, in_dim, hid_dim, out_dim, num_layers, act_fn, temp):
        super(Grace, self).__init__()
        self.encoder = GCN(in_dim, hid_dim, act_fn, num_layers)
        self.temp = temp
        self.proj = MLP(hid_dim, out_dim)

    def sim(self, z1, z2):
        # normalize embeddings across the feature dimension, then cosine similarity
        z1 = F.normalize(z1)
        z2 = F.normalize(z2)
        s = th.mm(z1, z2.t())
        return s

    def get_loss(self, z1, z2):
        # SimCLR-style (NT-Xent) contrastive loss: for node i, the positive pair is
        # the same node in the other view; all other nodes in both views are negatives.
        f = lambda x: th.exp(x / self.temp)
        refl_sim = f(self.sim(z1, z1))       # intra-view pairs
        between_sim = f(self.sim(z1, z2))    # inter-view pairs

        # between_sim.diag(): positive pairs
        x1 = refl_sim.sum(1) + between_sim.sum(1) - refl_sim.diag()
        loss = -th.log(between_sim.diag() / x1)
        return loss

    def get_embedding(self, graph, feat):
        # get embeddings from the encoder for evaluation (gradients detached)
        h = self.encoder(graph, feat)
        return h.detach()

    def forward(self, graph1, graph2, feat1, feat2):
        # encoding
        h1 = self.encoder(graph1, feat1)
        h2 = self.encoder(graph2, feat2)

        # projection
        z1 = self.proj(h1)
        z2 = self.proj(h2)

        # symmetric loss over both view orderings
        l1 = self.get_loss(z1, z2)
        l2 = self.get_loss(z2, z1)

        ret = (l1 + l2) * 0.5
        return ret.mean()
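For a quick end-to-end check with toy sizes (hypothetical, not from the example), the model can be driven the same way main.py does: encode two views, take one optimization step, then read the frozen embeddings:
```python
import dgl
import torch as th
import torch.nn as nn

from model import Grace  # the module above

# Tiny random graph with self-loops added (GraphConv expects non-zero in-degrees).
g = dgl.rand_graph(20, 60).add_self_loop()
feat = th.randn(20, 16)

model = Grace(in_dim=16, hid_dim=32, out_dim=32, num_layers=2,
              act_fn=nn.ReLU(), temp=0.5)
opt = th.optim.Adam(model.parameters(), lr=5e-4)

# In the real training loop the two views come from aug(); identical views
# are enough for a smoke test.
loss = model(g, g, feat, feat)
loss.backward()
opt.step()

embeds = model.get_embedding(g, feat)
print(loss.item(), embeds.shape)  # scalar loss, torch.Size([20, 32])
```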