Unverified Commit bcffdb82 authored by Hengrui Zhang, committed by GitHub

[Example] Add implementation of mvgrl (#2739)



* [Example] add mvgrl

* [Doc] add mvgrl to readme

* add more comments

* fix typos

* replace tab with space

* [doc] replace tab with space

* [Doc] fix a typo

* fix minor typos

* fix typos

* fix typos

* fix typos

* fix typos

* fix
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
parent e6f6c2eb
......@@ -8,6 +8,7 @@ The folder contains example implementations of selected research papers related
| Paper | node classification | link prediction / classification | graph property prediction | sampling | OGB |
| ------------------------------------------------------------ | ------------------- | -------------------------------- | ------------------------- | ------------------ | ------------------ |
| [Contrastive Multi-View Representation Learning on Graphs](#mvgrl) | :heavy_check_mark: | | :heavy_check_mark: | | |
| [Graph Random Neural Network for Semi-Supervised Learning on Graphs](#grand) | :heavy_check_mark: | | | | |
| [Heterogeneous Graph Transformer](#hgt) | :heavy_check_mark: | :heavy_check_mark: | | | |
| [Graph Convolutional Networks for Graphs with Multi-Dimensionally Weighted Edges](#mwe) | :heavy_check_mark: | | | | :heavy_check_mark: |
......@@ -86,6 +87,9 @@ The folder contains example implementations of selected research papers related
| [GNNExplainer: Generating Explanations for Graph Neural Networks](#gnnexplainer) | :heavy_check_mark: | | | | |
## 2020
- <a name="mvgrl"></a> Hassani and Khasahmadi. Contrastive Multi-View Representation Learning on Graphs. [Paper link](https://arxiv.org/abs/2006.05582).
- Example code: [PyTorch](../examples/pytorch/mvgrl)
- Tags: graph diffusion, self-supervised learning on graphs.
- <a name="grand"></a> Feng et al. Graph Random Neural Network for Semi-Supervised Learning on Graphs. [Paper link](https://arxiv.org/abs/2005.11079).
- Example code: [PyTorch](../examples/pytorch/grand)
- Tags: semi-supervised node classification, simplifying graph convolution, data augmentation
......@@ -110,7 +114,6 @@ The folder contains example implementations of selected research papers related
- <a name="dagnn"></a> Liu et al. Towards Deeper Graph Neural Networks. [Paper link](https://arxiv.org/abs/2007.09296).
- Example code: [PyTorch](../examples/pytorch/dagnn)
- Tags: over-smoothing, node classification
- <a name="dimenet"></a> Klicpera et al. Directional Message Passing for Molecular Graphs. [Paper link](https://arxiv.org/abs/2003.03123).
- Example code: [PyTorch](../examples/pytorch/dimenet)
- Tags: molecules, molecular property prediction, quantum chemistry
......
# DGL Implementation of MVGRL
This DGL example implements the model proposed in the paper [Contrastive Multi-View Representation Learning on Graphs](https://arxiv.org/abs/2006.05582).
Author's code: https://github.com/kavehhassani/mvgrl
## Example Implementor
This example was implemented by [Hengrui Zhang](https://github.com/hengruizhang98) when he was an applied scientist intern at AWS Shanghai AI Lab.
## Dependencies
- Python 3.7
- PyTorch 1.7.1
- dgl 0.6.0
- networkx
- scipy
## Datasets
##### Unsupervised Graph Classification Datasets:
'MUTAG', 'PTC_MR', 'REDDIT-BINARY', 'IMDB-BINARY', 'IMDB-MULTI'.
| Dataset | MUTAG | PTC_MR | RDT-B | IMDB-B | IMDB-M |
| --------------- | ----- | ------ | ------ | ------ | ------ |
| # Graphs | 188 | 344 | 2000 | 1000 | 1500 |
| # Classes | 2 | 2 | 2 | 2 | 3 |
| Avg. Graph Size | 17.93 | 14.29 | 429.63 | 19.77 | 13.00 |
* RDT-B, IMDB-B, IMDB-M are short for REDDIT-BINARY, IMDB-BINARY and IMDB-MULTI respectively.
##### Unsupervised Node Classification Datasets:
'Cora', 'Citeseer' and 'Pubmed'
| Dataset | # Nodes | # Edges | # Classes |
| -------- | ------- | ------- | --------- |
| Cora | 2,708 | 10,556 | 7 |
| Citeseer | 3,327 | 9,228 | 6 |
| Pubmed | 19,717 | 88,651 | 3 |
## Arguments
##### Graph Classification:
```
--dataname str The graph dataset name. Default is 'MUTAG'.
--gpu int GPU index. Default is -1, using cpu.
--epochs                int     Number of training epochs. Default is 200.
--patience int Early stopping steps. Default is 20.
--lr float Learning rate. Default is 0.001.
--wd float Weight decay. Default is 0.0.
--batch_size int Size of a training batch. Default is 64.
--n_layers int Number of GNN layers. Default is 4.
--hid_dim int Embedding dimension. Default is 32.
```
##### Node Classification:
```
--dataname str The graph dataset name. Default is 'cora'.
--gpu int GPU index. Default is -1, using cpu.
--epochs                int     Number of training epochs. Default is 500.
--patience int Early stopping steps. Default is 20.
--lr1 float Learning rate of main model. Default is 0.001.
--lr2                   float   Learning rate of linear classifier. Default is 0.01.
--wd1 float Weight decay of main model. Default is 0.0.
--wd2 float Weight decay of linear classifier. Default is 0.0.
--epsilon float Edge mask threshold. Default is 0.01.
--hid_dim int Embedding dimension. Default is 512.
```
## How to run examples
###### Graph Classification
```bash
# Enter the 'graph' directory
cd graph
# MUTAG:
python main.py --dataname MUTAG --epochs 20
# PTC_MR:
python main.py --dataname PTC_MR --epochs 32 --hid_dim 128
# REDDIT-BINARY
python main.py --dataname REDDIT-BINARY --epochs 20 --hid_dim 128
# IMDB-BINARY
python main.py --dataname IMDB-BINARY --epochs 20 --hid_dim 512 --n_layers 2
# IMDB-MULTI
python main.py --dataname IMDB-MULTI --epochs 20 --hid_dim 512 --n_layers 2
```
###### Node Classification
For semi-supervised node classification on 'Cora', 'Citeseer' and 'Pubmed', we provide two implementations:
1. Full-graph training, see 'main.py', where we contrast the local and global representations of the whole graph.
2. Subgraph training, see 'main_sample.py', where we contrast the local and global representations of a sampled subgraph with a fixed number of nodes.

For larger graphs (e.g., Pubmed), computing the exact graph diffusion matrix (i.e., the PPR matrix) is expensive, so we approximate it with [APPNP](https://arxiv.org/abs/1810.05997); see the function 'process_dataset_appnp' in 'node/dataset.py' for details.
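Below is a minimal sketch of this APPNP-based approximation, mirroring `process_dataset_appnp` (the helper name `approx_ppr` is only illustrative):

```python
import torch as th
from dgl.nn import APPNPConv

def approx_ppr(graph, k=20, alpha=0.2, epsilon=0.01):
    # Propagate an identity matrix through APPNP to approximate the dense PPR diffusion matrix.
    appnp = APPNPConv(k, alpha)
    identity = th.eye(graph.number_of_nodes()).float()
    diff_adj = appnp(graph.add_self_loop(), identity).numpy()
    # Sparsify with the edge-mask threshold (the --epsilon argument above).
    diff_adj[diff_adj < epsilon] = 0
    return diff_adj
```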
```bash
# Enter the 'node' directory
cd node
# Cora with full graph
python main.py --dataname cora --gpu 0
# Cora with sampled subgraphs
python main_sample.py --dataname cora --gpu 0
# Citeseer with full graph
python main.py --dataname citeseer --wd1 0.001 --wd2 0.01 --epochs 200 --gpu 0
# Citeseer with sampled subgraphs
python main_sample.py --dataname citeseer --wd2 0.01 --gpu 0
# Pubmed with sampled subgraphs
python main_sample.py --dataname pubmed --sample_size 4000 --epochs 400 --patience 999 --gpu 0
```
## Performance
We use the same hyper-parameter settings as stated in the original paper.
##### Graph classification:
| Dataset | MUTAG | PTC-MR | REDDIT-B | IMDB-B | IMDB-M |
| :---------------: | :---: | :----: | :------: | :----: | :----: |
| Accuracy Reported | 89.7 | 62.5 | 84.5 | 74.2 | 51.2 |
| DGL | 89.4 | 62.2 | 85.0 | 73.8 | 51.1 |
* The datasets used by the authors differ slightly from the standard TUDataset (see dgl.data.GINDataset) in their node features (e.g., the node features of the 'MUTAG' dataset have dimensionality 11 rather than 7).
##### Node classification:
| Dataset | Cora | Citeseer | Pubmed |
| :---------------: | :--: | :------: | :----: |
| Accuracy Reported | 86.8 | 73.3 | 80.1 |
| DGL-sample | 83.2 | 72.6 | 79.8 |
| DGL-full | 83.5 | 73.7 | OOM |
* We were unable to reproduce the reported accuracy on 'Cora', even with the authors' code.
* The accuracy reported in the original paper is based on fixed-size subgraph training.
''' Code adapted from https://github.com/kavehhassani/mvgrl '''
import os
import re
import numpy as np
import dgl
import torch as th
import networkx as nx
from dgl.data import DGLDataset
from collections import Counter
from scipy.linalg import fractional_matrix_power, inv
''' Compute the Personalized PageRank (PPR) diffusion matrix '''
def compute_ppr(graph: nx.Graph, alpha=0.2, self_loop=True):
a = nx.convert_matrix.to_numpy_array(graph)
if self_loop:
a = a + np.eye(a.shape[0]) # A^ = A + I_n
    d = np.diag(np.sum(a, 1))                     # D^_ii = Sigma_j A^_ij
dinv = fractional_matrix_power(d, -0.5) # D^(-1/2)
at = np.matmul(np.matmul(dinv, a), dinv) # A~ = D^(-1/2) x A^ x D^(-1/2)
return alpha * inv((np.eye(a.shape[0]) - (1 - alpha) * at)) # a(I_n-(1-a)A~)^-1
def download(dataset, datadir):
os.makedirs(datadir)
url = 'https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/{0}.zip'.format(dataset)
zipfile = os.path.basename(url)
os.system('wget {0}; unzip {1}'.format(url, zipfile))
os.system('mv {0}/* {1}'.format(dataset, datadir))
os.system('rm -r {0}'.format(dataset))
os.system('rm {0}'.format(zipfile))
def process(dataset):
src = os.path.join(os.path.dirname(__file__), 'data')
prefix = os.path.join(src, dataset, dataset)
# assign each node to the corresponding graph
graph_node_dict = {}
with open('{0}_graph_indicator.txt'.format(prefix), 'r') as f:
for idx, line in enumerate(f):
graph_node_dict[idx + 1] = int(line.strip('\n'))
node_labels = []
if os.path.exists('{0}_node_labels.txt'.format(prefix)):
with open('{0}_node_labels.txt'.format(prefix), 'r') as f:
for line in f:
node_labels += [int(line.strip('\n')) - 1]
num_unique_node_labels = max(node_labels) + 1
else:
print('No node labels')
node_attrs = []
if os.path.exists('{0}_node_attributes.txt'.format(prefix)):
with open('{0}_node_attributes.txt'.format(prefix), 'r') as f:
for line in f:
node_attrs.append(
                    np.array([float(attr) for attr in re.split("[,\s]+", line.strip("\s\n")) if attr], dtype=float)
)
else:
print('No node attributes')
graph_labels = []
unique_labels = set()
with open('{0}_graph_labels.txt'.format(prefix), 'r') as f:
for line in f:
val = int(line.strip('\n'))
if val not in unique_labels:
unique_labels.add(val)
graph_labels.append(val)
label_idx_dict = {val: idx for idx, val in enumerate(unique_labels)}
graph_labels = np.array([label_idx_dict[l] for l in graph_labels])
adj_list = {idx: [] for idx in range(1, len(graph_labels) + 1)}
index_graph = {idx: [] for idx in range(1, len(graph_labels) + 1)}
with open('{0}_A.txt'.format(prefix), 'r') as f:
for line in f:
u, v = tuple(map(int, line.strip('\n').split(',')))
adj_list[graph_node_dict[u]].append((u, v))
index_graph[graph_node_dict[u]] += [u, v]
for k in index_graph.keys():
index_graph[k] = [u - 1 for u in set(index_graph[k])]
graphs, pprs = [], []
for idx in range(1, 1 + len(adj_list)):
graph = nx.from_edgelist(adj_list[idx])
graph.graph['label'] = graph_labels[idx - 1]
for u in graph.nodes():
if len(node_labels) > 0:
node_label_one_hot = [0] * num_unique_node_labels
node_label = node_labels[u - 1]
node_label_one_hot[node_label] = 1
graph.nodes[u]['label'] = node_label_one_hot
if len(node_attrs) > 0:
graph.nodes[u]['feat'] = node_attrs[u - 1]
if len(node_attrs) > 0:
graph.graph['feat_dim'] = node_attrs[0].shape[0]
# relabeling
mapping = {}
for node_idx, node in enumerate(graph.nodes()):
mapping[node] = node_idx
graphs.append(nx.relabel_nodes(graph, mapping))
pprs.append(compute_ppr(graph, alpha=0.2))
    # When node attributes are absent, use a one-hot degree encoding (optionally concatenated
    # with the one-hot node label) as node features.
    if 'feat_dim' not in graphs[0].graph:
        max_deg = max([max(dict(graph.degree).values()) for graph in graphs])
        for graph in graphs:
            for u in graph.nodes(data=True):
                f = np.zeros(max_deg + 1)
                f[graph.degree[u[0]]] = 1.0
                if 'label' in u[1]:
                    f = np.concatenate((np.array(u[1]['label'], dtype=float), f))
                graph.nodes[u[0]]['feat'] = f
return graphs, pprs
def load(dataset):
basedir = os.path.dirname(os.path.abspath(__file__))
datadir = os.path.join(basedir, 'data', dataset)
if not os.path.exists(datadir):
download(dataset, datadir)
graphs, diff = process(dataset)
feat, adj, labels = [], [], []
for idx, graph in enumerate(graphs):
adj.append(nx.to_numpy_array(graph))
labels.append(graph.graph['label'])
feat.append(np.array(list(nx.get_node_attributes(graph, 'feat').values())))
adj, diff, feat, labels = np.array(adj), np.array(diff), np.array(feat), np.array(labels)
np.save(f'{datadir}/adj.npy', adj)
np.save(f'{datadir}/diff.npy', diff)
np.save(f'{datadir}/feat.npy', feat)
np.save(f'{datadir}/labels.npy', labels)
else:
adj = np.load(f'{datadir}/adj.npy', allow_pickle=True)
diff = np.load(f'{datadir}/diff.npy', allow_pickle=True)
feat = np.load(f'{datadir}/feat.npy', allow_pickle=True)
labels = np.load(f'{datadir}/labels.npy', allow_pickle=True)
n_graphs = adj.shape[0]
graphs = []
diff_graphs = []
lbls = []
for i in range(n_graphs):
a = adj[i]
edge_indexes = a.nonzero()
graph = dgl.graph(edge_indexes)
graph = graph.add_self_loop()
graph.ndata['feat'] = th.tensor(feat[i]).float()
diff_adj = diff[i]
diff_indexes = diff_adj.nonzero()
diff_weight = th.tensor(diff_adj[diff_indexes]).float()
diff_graph = dgl.graph(diff_indexes)
diff_graph.edata['edge_weight'] = diff_weight
label = labels[i]
graphs.append(graph)
diff_graphs.append(diff_graph)
lbls.append(label)
labels = th.tensor(lbls)
dataset = TUDataset(graphs, diff_graphs, labels)
return dataset
class TUDataset(DGLDataset):
def __init__(self, graphs, diff_graphs, labels):
super(TUDataset, self).__init__(name='tu')
self.graphs = graphs
self.diff_graphs = diff_graphs
self.labels = labels
def process(self):
return
def __len__(self):
return len(self.graphs)
def __getitem__(self, idx):
return self.graphs[idx], self.diff_graphs[idx], self.labels[idx]
import argparse
import torch as th
import dgl
from dgl.dataloading import GraphDataLoader
import warnings
from dataset import load
warnings.filterwarnings('ignore')
from utils import linearsvc
from model import MVGRL
parser = argparse.ArgumentParser(description='mvgrl')
parser.add_argument('--dataname', type=str, default='MUTAG', help='Name of dataset.')
parser.add_argument('--gpu', type=int, default=-1, help='GPU index. Default: -1, using cpu.')
parser.add_argument('--epochs', type=int, default=200, help='Number of training epochs.')
parser.add_argument('--patience', type=int, default=20, help='Early stopping steps.')
parser.add_argument('--lr', type=float, default=0.001, help='Learning rate of mvgrl.')
parser.add_argument('--wd', type=float, default=0., help='Weight decay of mvgrl.')
parser.add_argument('--batch_size', type=int, default=64, help='Batch size.')
parser.add_argument('--n_layers', type=int, default=4, help='Number of GNN layers.')
parser.add_argument("--hid_dim", type=int, default=32, help='Hidden layer dim.')
args = parser.parse_args()
# check cuda
if args.gpu != -1 and th.cuda.is_available():
args.device = 'cuda:{}'.format(args.gpu)
else:
args.device = 'cpu'
def collate(samples):
''' collate function for building the graph dataloader'''
graphs, diff_graphs, labels = map(list, zip(*samples))
# generate batched graphs and labels
batched_graph = dgl.batch(graphs)
batched_labels = th.tensor(labels)
batched_diff_graph = dgl.batch(diff_graphs)
n_graphs = len(graphs)
graph_id = th.arange(n_graphs)
graph_id = dgl.broadcast_nodes(batched_graph, graph_id)
batched_graph.ndata['graph_id'] = graph_id
return batched_graph, batched_diff_graph, batched_labels
if __name__ == '__main__':
# Step 1: Prepare data =================================================================== #
dataset = load(args.dataname)
graphs, diff_graphs, labels = map(list, zip(*dataset))
print('Number of graphs:', len(graphs))
# generate a full-graph with all examples for evaluation
wholegraph = dgl.batch(graphs)
whole_dg = dgl.batch(diff_graphs)
# create dataloader for batch training
dataloader = GraphDataLoader(dataset,
batch_size=args.batch_size,
collate_fn=collate,
drop_last=False,
shuffle=True)
in_dim = wholegraph.ndata['feat'].shape[1]
# Step 2: Create model =================================================================== #
model = MVGRL(in_dim, args.hid_dim, args.n_layers)
model = model.to(args.device)
# Step 3: Create training components ===================================================== #
optimizer = th.optim.Adam(model.parameters(), lr=args.lr)
print('===== Before training ======')
wholegraph = wholegraph.to(args.device)
whole_dg = whole_dg.to(args.device)
wholefeat = wholegraph.ndata.pop('feat')
whole_weight = whole_dg.edata.pop('edge_weight')
embs = model.get_embedding(wholegraph, whole_dg, wholefeat, whole_weight)
lbls = th.LongTensor(labels)
acc_mean, acc_std = linearsvc(embs, lbls)
    print('accuracy_mean, {:.4f}, accuracy_std, {:.4f}'.format(acc_mean, acc_std))
best = float('inf')
cnt_wait = 0
# Step 4: Training epochs =============================================================== #
for epoch in range(args.epochs):
loss_all = 0
model.train()
for graph, diff_graph, label in dataloader:
graph = graph.to(args.device)
diff_graph = diff_graph.to(args.device)
feat = graph.ndata['feat']
graph_id = graph.ndata['graph_id']
edge_weight = diff_graph.edata['edge_weight']
n_graph = label.shape[0]
optimizer.zero_grad()
loss = model(graph, diff_graph, feat, edge_weight, graph_id)
loss_all += loss.item()
loss.backward()
optimizer.step()
print('Epoch {}, Loss {:.4f}'.format(epoch, loss_all))
if loss < best:
best = loss
best_t = epoch
cnt_wait = 0
th.save(model.state_dict(), f'{args.dataname}.pkl')
else:
cnt_wait += 1
if cnt_wait == args.patience:
print('Early stopping')
break
print('Training End')
# Step 5: Linear evaluation ========================================================== #
model.load_state_dict(th.load(f'{args.dataname}.pkl'))
embs = model.get_embedding(wholegraph, whole_dg, wholefeat, whole_weight)
acc_mean, acc_std = linearsvc(embs, lbls)
    print('accuracy_mean, {:.4f}, accuracy_std, {:.4f}'.format(acc_mean, acc_std))
import torch as th
import torch.nn as nn
from dgl.nn.pytorch import GraphConv
from dgl.nn.pytorch.glob import SumPooling
from utils import local_global_loss_
class MLP(nn.Module):
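    # A 3-layer projection head with PReLU activations plus a linear shortcut from the
    # input (residual connection); used to project node-level and graph-level embeddings
    # before the contrastive loss.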
def __init__(self, in_dim, out_dim):
super(MLP, self).__init__()
self.fcs = nn.Sequential(
nn.Linear(in_dim, out_dim),
nn.PReLU(),
nn.Linear(out_dim, out_dim),
nn.PReLU(),
nn.Linear(out_dim, out_dim),
nn.PReLU()
)
self.linear_shortcut = nn.Linear(in_dim, out_dim)
def forward(self, x):
return self.fcs(x) + self.linear_shortcut(x)
class GCN(nn.Module):
def __init__(self, in_dim, out_dim, num_layers, norm):
super(GCN, self).__init__()
self.num_layers = num_layers
self.layers = nn.ModuleList()
self.layers.append(GraphConv(in_dim, out_dim, bias=False, norm=norm, activation = nn.PReLU()))
self.pooling = SumPooling()
for _ in range(num_layers - 1):
self.layers.append(GraphConv(out_dim, out_dim, bias=False, norm=norm, activation = nn.PReLU()))
def forward(self, graph, feat, edge_weight = None):
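        # `h` holds the per-node embeddings of the current layer; `hg` concatenates the
        # sum-pooled node embeddings of every layer, so the graph-level embedding has
        # dimension num_layers * out_dim (matching MVGRL.global_mlp).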
h = self.layers[0](graph, feat, edge_weight=edge_weight)
hg = self.pooling(graph, h)
for idx in range(self.num_layers - 1):
h = self.layers[idx + 1](graph, h, edge_weight=edge_weight)
hg = th.cat((hg, self.pooling(graph, h)), -1)
return h, hg
class MVGRL(nn.Module):
r"""
mvgrl model
Parameters
-----------
in_dim: int
Input feature size.
out_dim: int
Output feature size.
num_layers: int
Number of the GNN encoder layers.
Functions
-----------
forward(graph1, graph2, feat, edge_weight):
graph1: DGLGraph
The original graph
graph2: DGLGraph
The diffusion graph
feat: tensor
Node features
edge_weight: tensor
Edge weight of the diffusion graph
"""
def __init__(self, in_dim, out_dim, num_layers):
super(MVGRL, self).__init__()
self.local_mlp = MLP(out_dim, out_dim)
self.global_mlp = MLP(num_layers * out_dim, out_dim)
self.encoder1 = GCN(in_dim, out_dim, num_layers, norm='both')
self.encoder2 = GCN(in_dim, out_dim, num_layers, norm='none')
def get_embedding(self, graph1, graph2, feat, edge_weight):
local_v1, global_v1 = self.encoder1(graph1, feat)
local_v2, global_v2 = self.encoder2(graph2, feat, edge_weight=edge_weight)
global_v1 = self.global_mlp(global_v1)
global_v2 = self.global_mlp(global_v2)
return (global_v1 + global_v2).detach()
def forward(self, graph1, graph2, feat, edge_weight, graph_id):
# calculate node embeddings and graph embeddings
local_v1, global_v1 = self.encoder1(graph1, feat)
local_v2, global_v2 = self.encoder2(graph2, feat, edge_weight=edge_weight)
local_v1 = self.local_mlp(local_v1)
local_v2 = self.local_mlp(local_v2)
global_v1 = self.global_mlp(global_v1)
global_v2 = self.global_mlp(global_v2)
# calculate loss
loss1 = local_global_loss_(local_v1, global_v2, graph_id)
loss2 = local_global_loss_(local_v2, global_v1, graph_id)
loss = loss1 + loss2
return loss
''' Code adapted from https://github.com/fanyun-sun/InfoGraph '''
import torch as th
import torch.nn.functional as F
import math
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
def linearsvc(embeds, labels):
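    # Evaluation protocol adapted from InfoGraph: 10-fold stratified cross-validation over
    # the frozen embeddings, grid-searching LinearSVC's C on each training fold; returns the
    # mean and standard deviation of the test accuracy.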
x = embeds.cpu().numpy()
y = labels.cpu().numpy()
params = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]}
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=None)
accuracies = []
for train_index, test_index in kf.split(x, y):
x_train, x_test = x[train_index], x[test_index]
y_train, y_test = y[train_index], y[test_index]
classifier = GridSearchCV(LinearSVC(), params, cv=5, scoring='accuracy', verbose=0)
classifier.fit(x_train, y_train)
accuracies.append(accuracy_score(y_test, classifier.predict(x_test)))
return np.mean(accuracies), np.std(accuracies)
def get_positive_expectation(p_samples, average=True):
"""Computes the positive part of a JS Divergence.
Args:
p_samples: Positive samples.
average: Average the result over samples.
Returns:
th.Tensor
"""
log_2 = math.log(2.)
Ep = log_2 - F.softplus(- p_samples)
if average:
return Ep.mean()
else:
return Ep
def get_negative_expectation(q_samples, average=True):
"""Computes the negative part of a JS Divergence.
Args:
q_samples: Negative samples.
average: Average the result over samples.
Returns:
th.Tensor
"""
log_2 = math.log(2.)
Eq = F.softplus(-q_samples) + q_samples - log_2
if average:
return Eq.mean()
else:
return Eq
def local_global_loss_(l_enc, g_enc, graph_id):
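    # Jensen-Shannon MI estimator between node (local) and graph (global) embeddings:
    # pos_mask pairs each node with its own graph, neg_mask pairs it with every other
    # graph in the batch; the returned value is minimized as the contrastive loss.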
num_graphs = g_enc.shape[0]
num_nodes = l_enc.shape[0]
device = g_enc.device
pos_mask = th.zeros((num_nodes, num_graphs)).to(device)
neg_mask = th.ones((num_nodes, num_graphs)).to(device)
for nodeidx, graphidx in enumerate(graph_id):
pos_mask[nodeidx][graphidx] = 1.
neg_mask[nodeidx][graphidx] = 0.
res = th.mm(l_enc, g_enc.t())
E_pos = get_positive_expectation(res * pos_mask, average=False).sum()
E_pos = E_pos / num_nodes
E_neg = get_negative_expectation(res * neg_mask, average=False).sum()
E_neg = E_neg / (num_nodes * (num_graphs - 1))
return E_neg - E_pos
''' Code adapted from https://github.com/kavehhassani/mvgrl '''
import numpy as np
import torch as th
import scipy.sparse as sp
from scipy.linalg import fractional_matrix_power, inv
import dgl
from dgl.data import CoraGraphDataset, CiteseerGraphDataset, PubmedGraphDataset
import networkx as nx
from sklearn.preprocessing import MinMaxScaler
from dgl.nn import APPNPConv
def preprocess_features(features):
"""Row-normalize feature matrix and convert to tuple representation"""
rowsum = np.array(features.sum(1))
r_inv = np.power(rowsum, -1).flatten()
r_inv[np.isinf(r_inv)] = 0.
r_mat_inv = sp.diags(r_inv)
features = r_mat_inv.dot(features)
if isinstance(features, np.ndarray):
return features
else:
return features.todense(), sparse_to_tuple(features)
def sparse_to_tuple(sparse_mx):
"""Convert sparse matrix to tuple representation."""
def to_tuple(mx):
if not sp.isspmatrix_coo(mx):
mx = mx.tocoo()
coords = np.vstack((mx.row, mx.col)).transpose()
values = mx.data
shape = mx.shape
return coords, values, shape
if isinstance(sparse_mx, list):
for i in range(len(sparse_mx)):
sparse_mx[i] = to_tuple(sparse_mx[i])
else:
sparse_mx = to_tuple(sparse_mx)
return sparse_mx
def compute_ppr(graph: nx.Graph, alpha=0.2, self_loop=True):
a = nx.convert_matrix.to_numpy_array(graph)
if self_loop:
a = a + np.eye(a.shape[0]) # A^ = A + I_n
    d = np.diag(np.sum(a, 1))                     # D^_ii = Sigma_j A^_ij
dinv = fractional_matrix_power(d, -0.5) # D^(-1/2)
at = np.matmul(np.matmul(dinv, a), dinv) # A~ = D^(-1/2) x A^ x D^(-1/2)
return alpha * inv((np.eye(a.shape[0]) - (1 - alpha) * at)) # a(I_n-(1-a)A~)^-1
def process_dataset(name, epsilon):
if name == 'cora':
dataset = CoraGraphDataset()
elif name == 'citeseer':
dataset = CiteseerGraphDataset()
graph = dataset[0]
feat = graph.ndata.pop('feat')
label = graph.ndata.pop('label')
train_mask = graph.ndata.pop('train_mask')
val_mask = graph.ndata.pop('val_mask')
test_mask = graph.ndata.pop('test_mask')
train_idx = th.nonzero(train_mask, as_tuple=False).squeeze()
val_idx = th.nonzero(val_mask, as_tuple=False).squeeze()
test_idx = th.nonzero(test_mask, as_tuple=False).squeeze()
nx_g = dgl.to_networkx(graph)
print('computing ppr')
diff_adj = compute_ppr(nx_g, 0.2)
print('computing end')
if name == 'citeseer':
print('additional processing')
feat = th.tensor(preprocess_features(feat.numpy())).float()
diff_adj[diff_adj < epsilon] = 0
scaler = MinMaxScaler()
scaler.fit(diff_adj)
diff_adj = scaler.transform(diff_adj)
diff_edges = np.nonzero(diff_adj)
diff_weight = diff_adj[diff_edges]
diff_graph = dgl.graph(diff_edges)
graph = graph.add_self_loop()
return graph, diff_graph, feat, label, train_idx, val_idx, test_idx, diff_weight
def process_dataset_appnp(epsilon):
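    # For Pubmed the exact PPR matrix is too expensive to compute, so the diffusion is
    # approximated by propagating an identity matrix through APPNPConv (k=20, alpha=0.2)
    # and sparsifying the result with the `epsilon` threshold.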
k = 20
alpha = 0.2
dataset = PubmedGraphDataset()
graph = dataset[0]
feat = graph.ndata.pop('feat')
label = graph.ndata.pop('label')
train_mask = graph.ndata.pop('train_mask')
val_mask = graph.ndata.pop('val_mask')
test_mask = graph.ndata.pop('test_mask')
train_idx = th.nonzero(train_mask, as_tuple=False).squeeze()
val_idx = th.nonzero(val_mask, as_tuple=False).squeeze()
test_idx = th.nonzero(test_mask, as_tuple=False).squeeze()
appnp = APPNPConv(k, alpha)
id = th.eye(graph.number_of_nodes()).float()
diff_adj = appnp(graph.add_self_loop(), id).numpy()
diff_adj[diff_adj < epsilon] = 0
scaler = MinMaxScaler()
scaler.fit(diff_adj)
diff_adj = scaler.transform(diff_adj)
diff_edges = np.nonzero(diff_adj)
diff_weight = diff_adj[diff_edges]
diff_graph = dgl.graph(diff_edges)
return graph, diff_graph, feat, label, train_idx, val_idx, test_idx, diff_weight
import argparse
import numpy as np
import torch as th
import torch.nn as nn
import warnings
warnings.filterwarnings('ignore')
from dataset import process_dataset
from model import MVGRL, LogReg
parser = argparse.ArgumentParser(description='mvgrl')
parser.add_argument('--dataname', type=str, default='cora', help='Name of dataset.')
parser.add_argument('--gpu', type=int, default=-1, help='GPU index. Default: -1, using cpu.')
parser.add_argument('--epochs', type=int, default=500, help='Training epochs.')
parser.add_argument('--patience', type=int, default=20, help='Patient epochs to wait before early stopping.')
parser.add_argument('--lr1', type=float, default=0.001, help='Learning rate of mvgrl.')
parser.add_argument('--lr2', type=float, default=0.01, help='Learning rate of linear evaluator.')
parser.add_argument('--wd1', type=float, default=0., help='Weight decay of mvgrl.')
parser.add_argument('--wd2', type=float, default=0., help='Weight decay of linear evaluator.')
parser.add_argument('--epsilon', type=float, default=0.01, help='Edge mask threshold of diffusion graph.')
parser.add_argument("--hid_dim", type=int, default=512, help='Hidden layer dim.')
args = parser.parse_args()
# check cuda
if args.gpu != -1 and th.cuda.is_available():
args.device = 'cuda:{}'.format(args.gpu)
else:
args.device = 'cpu'
if __name__ == '__main__':
print(args)
# Step 1: Prepare data =================================================================== #
graph, diff_graph, feat, label, train_idx, val_idx, test_idx, edge_weight = process_dataset(args.dataname, args.epsilon)
n_feat = feat.shape[1]
n_classes = np.unique(label).shape[0]
graph = graph.to(args.device)
diff_graph = diff_graph.to(args.device)
feat = feat.to(args.device)
edge_weight = th.tensor(edge_weight).float().to(args.device)
train_idx = train_idx.to(args.device)
val_idx = val_idx.to(args.device)
test_idx = test_idx.to(args.device)
n_node = graph.number_of_nodes()
lbl1 = th.ones(n_node * 2)
lbl2 = th.zeros(n_node * 2)
lbl = th.cat((lbl1, lbl2))
# Step 2: Create model =================================================================== #
model = MVGRL(n_feat, args.hid_dim)
model = model.to(args.device)
lbl = lbl.to(args.device)
# Step 3: Create training components ===================================================== #
optimizer = th.optim.Adam(model.parameters(), lr=args.lr1, weight_decay=args.wd1)
loss_fn = nn.BCEWithLogitsLoss()
# Step 4: Training epochs ================================================================ #
best = float('inf')
cnt_wait = 0
for epoch in range(args.epochs):
model.train()
optimizer.zero_grad()
shuf_idx = np.random.permutation(n_node)
shuf_feat = feat[shuf_idx, :]
shuf_feat = shuf_feat.to(args.device)
out = model(graph, diff_graph, feat, shuf_feat, edge_weight)
loss = loss_fn(out, lbl)
loss.backward()
optimizer.step()
print('Epoch: {0}, Loss: {1:0.4f}'.format(epoch, loss.item()))
if loss < best:
best = loss
cnt_wait = 0
th.save(model.state_dict(), 'model.pkl')
else:
cnt_wait += 1
if cnt_wait == args.patience:
print('Early stopping')
break
model.load_state_dict(th.load('model.pkl'))
embeds = model.get_embedding(graph, diff_graph, feat, edge_weight)
train_embs = embeds[train_idx]
test_embs = embeds[test_idx]
label = label.to(args.device)
train_labels = label[train_idx]
test_labels = label[test_idx]
accs = []
# Step 5: Linear evaluation ========================================================== #
for _ in range(5):
model = LogReg(args.hid_dim, n_classes)
opt = th.optim.Adam(model.parameters(), lr=args.lr2, weight_decay=args.wd2)
model = model.to(args.device)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(300):
model.train()
opt.zero_grad()
logits = model(train_embs)
loss = loss_fn(logits, train_labels)
loss.backward()
opt.step()
model.eval()
logits = model(test_embs)
preds = th.argmax(logits, dim=1)
acc = th.sum(preds == test_labels).float() / test_labels.shape[0]
accs.append(acc * 100)
accs = th.stack(accs)
print(accs.mean().item(), accs.std().item())
import argparse
import numpy as np
import torch as th
import torch.nn as nn
import random
import dgl
import warnings
warnings.filterwarnings('ignore')
from dataset import process_dataset, process_dataset_appnp
from model import MVGRL, LogReg
parser = argparse.ArgumentParser(description='mvgrl')
parser.add_argument('--dataname', type=str, default='cora', help='Name of dataset.')
parser.add_argument('--gpu', type=int, default=-1, help='GPU index. Default: -1, using cpu.')
parser.add_argument('--epochs', type=int, default=500, help='Training epochs.')
parser.add_argument('--patience', type=int, default=20, help='Patient epochs to wait before early stopping.')
parser.add_argument('--lr1', type=float, default=0.001, help='Learning rate of mvgrl.')
parser.add_argument('--lr2', type=float, default=0.01, help='Learning rate of linear evaluator.')
parser.add_argument('--wd1', type=float, default=0., help='Weight decay of mvgrl.')
parser.add_argument('--wd2', type=float, default=0., help='Weight decay of linear evaluator.')
parser.add_argument('--epsilon', type=float, default=0.01, help='Edge mask threshold of diffusion graph.')
parser.add_argument("--hid_dim", type=int, default=512, help='Hidden layer dim.')
args = parser.parse_args()
# check cuda
if args.gpu != -1 and th.cuda.is_available():
args.device = 'cuda:{}'.format(args.gpu)
else:
args.device = 'cpu'
if __name__ == '__main__':
print(args)
# Step 1: Prepare data =================================================================== #
if args.dataname == 'pubmed':
graph, diff_graph, feat, label, train_idx, val_idx, test_idx, edge_weight = process_dataset_appnp(args.epsilon)
else:
graph, diff_graph, feat, label, train_idx, val_idx, test_idx, edge_weight = process_dataset(args.dataname, args.epsilon)
edge_weight = th.tensor(edge_weight).float()
graph.ndata['feat'] = feat
diff_graph.edata['edge_weight'] = edge_weight
n_feat = feat.shape[1]
n_classes = np.unique(label).shape[0]
edge_weight = th.tensor(edge_weight).float()
train_idx = train_idx.to(args.device)
val_idx = val_idx.to(args.device)
test_idx = test_idx.to(args.device)
n_node = graph.number_of_nodes()
    sample_size = args.sample_size
lbl1 = th.ones(sample_size * 2)
lbl2 = th.zeros(sample_size * 2)
lbl = th.cat((lbl1, lbl2))
lbl = lbl.to(args.device)
# Step 2: Create model =================================================================== #
model = MVGRL(n_feat, args.hid_dim)
model = model.to(args.device)
# Step 3: Create training components ===================================================== #
optimizer = th.optim.Adam(model.parameters(), lr=args.lr1, weight_decay=args.wd1)
loss_fn = nn.BCEWithLogitsLoss()
node_list = list(range(n_node))
# Step 4: Training epochs ================================================================ #
best = float('inf')
cnt_wait = 0
for epoch in range(args.epochs):
model.train()
optimizer.zero_grad()
sample_idx = random.sample(node_list, sample_size)
g = dgl.node_subgraph(graph, sample_idx)
dg = dgl.node_subgraph(diff_graph, sample_idx)
f = g.ndata.pop('feat')
ew = dg.edata.pop('edge_weight')
shuf_idx = np.random.permutation(sample_size)
sf = f[shuf_idx, :]
g = g.to(args.device)
dg = dg.to(args.device)
f = f.to(args.device)
ew = ew.to(args.device)
sf = sf.to(args.device)
out = model(g, dg, f, sf, ew)
loss = loss_fn(out, lbl)
loss.backward()
optimizer.step()
print('Epoch: {0}, Loss: {1:0.4f}'.format(epoch, loss.item()))
if loss < best:
best = loss
cnt_wait = 0
th.save(model.state_dict(), 'model.pkl')
else:
cnt_wait += 1
if cnt_wait == args.patience:
print('Early stopping')
break
model.load_state_dict(th.load('model.pkl'))
graph = graph.to(args.device)
diff_graph = diff_graph.to(args.device)
feat = feat.to(args.device)
edge_weight = edge_weight.to(args.device)
embeds = model.get_embedding(graph, diff_graph, feat, edge_weight)
train_embs = embeds[train_idx]
test_embs = embeds[test_idx]
label = label.to(args.device)
train_labels = label[train_idx]
test_labels = label[test_idx]
accs = []
# Step 5: Linear evaluation ========================================================== #
for _ in range(5):
model = LogReg(args.hid_dim, n_classes)
opt = th.optim.Adam(model.parameters(), lr=args.lr2, weight_decay=args.wd2)
model = model.to(args.device)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(300):
model.train()
opt.zero_grad()
logits = model(train_embs)
loss = loss_fn(logits, train_labels)
loss.backward()
opt.step()
model.eval()
logits = model(test_embs)
preds = th.argmax(logits, dim=1)
acc = th.sum(preds == test_labels).float() / test_labels.shape[0]
accs.append(acc * 100)
accs = th.stack(accs)
print(accs.mean().item(), accs.std().item())
import torch as th
import torch.nn as nn
from dgl.nn.pytorch import GraphConv
from dgl.nn.pytorch.glob import AvgPooling
class LogReg(nn.Module):
def __init__(self, hid_dim, n_classes):
super(LogReg, self).__init__()
self.fc = nn.Linear(hid_dim, n_classes)
def forward(self, x):
ret = self.fc(x)
return ret
class Discriminator(nn.Module):
def __init__(self, dim):
super(Discriminator, self).__init__()
self.fn = nn.Bilinear(dim, dim, 1)
def forward(self, h1, h2, h3, h4, c1, c2):
c_x1 = c1.expand_as(h1).contiguous()
c_x2 = c2.expand_as(h2).contiguous()
# positive
sc_1 = self.fn(h2, c_x1).squeeze(1)
sc_2 = self.fn(h1, c_x2).squeeze(1)
# negative
sc_3 = self.fn(h4, c_x1).squeeze(1)
sc_4 = self.fn(h3, c_x2).squeeze(1)
logits = th.cat((sc_1, sc_2, sc_3, sc_4))
return logits
class MVGRL(nn.Module):
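    # Node-level MVGRL: encoder1 is a GCN layer over the original graph, encoder2 over the
    # diffusion graph (with PPR edge weights); a bilinear discriminator scores node embeddings
    # against sigmoid-activated mean-pooled graph summaries, using shuffled-feature embeddings
    # as negatives (DGI-style).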
def __init__(self, in_dim, out_dim):
super(MVGRL, self).__init__()
self.encoder1 = GraphConv(in_dim, out_dim, norm='both', bias=True, activation=nn.PReLU())
self.encoder2 = GraphConv(in_dim, out_dim, norm='none', bias=True, activation=nn.PReLU())
self.pooling = AvgPooling()
self.disc = Discriminator(out_dim)
self.act_fn = nn.Sigmoid()
def get_embedding(self, graph, diff_graph, feat, edge_weight):
h1 = self.encoder1(graph, feat)
h2 = self.encoder2(diff_graph, feat, edge_weight=edge_weight)
return (h1 + h2).detach()
def forward(self, graph, diff_graph, feat, shuf_feat, edge_weight):
h1 = self.encoder1(graph, feat)
h2 = self.encoder2(diff_graph, feat, edge_weight=edge_weight)
h3 = self.encoder1(graph, shuf_feat)
h4 = self.encoder2(diff_graph, shuf_feat, edge_weight=edge_weight)
c1 = self.act_fn(self.pooling(graph, h1))
c2 = self.act_fn(self.pooling(graph, h2))
out = self.disc(h1, h2, h3, h4, c1, c2)
return out
......@@ -20,7 +20,7 @@ class QM9EdgeDataset(DGLDataset):
    2. It provides edge features and node features in addition to the atoms' coordinates and atomic numbers.
    3. It provides another 7 regression tasks (targets 12 to 19).
This class is built based on a preprocessed dataset version, and we provide the preprocessing datails `here <https://gist.github.com/hengruizhang98/a2da30213b2356fff18b25385c9d3cd2>`_
    This class is built based on a preprocessed version of the dataset, and we provide the preprocessing details `here <https://gist.github.com/hengruizhang98/a2da30213b2356fff18b25385c9d3cd2>`_.
Reference:
......@@ -84,7 +84,7 @@ class QM9EdgeDataset(DGLDataset):
| B | :math:`B` | Rotational constant | :math:`\textrm{GHz}` |
+--------+----------------------------------+-----------------------------------------------------------------------------------+---------------------------------------------+
| C | :math:`C` | Rotational constant | :math:`\textrm{GHz}` |
+--------+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
+--------+----------------------------------+-----------------------------------------------------------------------------------+---------------------------------------------+
Parameters
----------
......